RubyGems - tagmemics - Versions diffs - 0.0.0.beta - Mend

tagmemics 0.0.0.beta

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +7 -0
data/README.md +177 -0
data/Rakefile +29 -0
data/lib/tagmemics.rb +35 -0
data/lib/tagmemics/config.rb +51 -0
data/lib/tagmemics/sentence.rb +0 -0
data/lib/tagmemics/word.rb +69 -0
data/lib/tagmemics/word/wordnet.rb +43 -0
metadata +162 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: e27b491016415e6938d939946daf4176b8f9afce
+  data.tar.gz: 518f34821b597f5c58c125b2558c059e0d83363e
+SHA512:
+  metadata.gz: b0847592c853bca02019b517022cbfe754dbfe00a93dc181753f497d245e0218a55d917c965dcfbdaeaea08b938ebd1b2fce8bbeb0eed6bc8d764b49c26af8aa
+  data.tar.gz: 54a2e571e6a6ee5c24c29039fdd9905e0000e0ded8475e200546cd6e7c01dda947ff86fdea09bbc1e368c2164a91cefad9117e17196622f3d7dbe4a33281cf70
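The SHA512 entries can be used to check a downloaded copy of this release. The Ruby sketch below assumes the `.gem` file has already been unpacked with `tar` (a `.gem` is a tar archive that holds `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`). `sha512_matches?` is a hypothetical helper and is not part of RubyGems:

```ruby
require 'digest'

# Hypothetical helper: true when the SHA512 digest of the file at +path+
# equals the hex string recorded in checksums.yaml.
def sha512_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage, from a directory where the .gem archive has been unpacked:
#   sha512_matches?('data.tar.gz',
#                   '54a2e571e6a6ee5c24c29039fdd9905e0000e0ded8475e200546cd6e7c01dda947ff86fdea09bbc1e368c2164a91cefad9117e17196622f3d7dbe4a33281cf70')
```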

data/README.md ADDED

@@ -0,0 +1,177 @@
+# Tagmemics
+## Description
+The English language is extremely complicated.  We have words that can have multiple
+parts of speech.  Natural language processing is difficult because it is hard to
+tell if a word is a noun when it could be a verb or an adjective, etc.
+The purpose of this project is to develop an algorithm that, given a sentence string,
+has a ranking system that detects the part of speech of each word.
+Why is this useful?  Because **understanding the correct parts of speech in a sentence
+is the first step to teaching a robot how to read**.
+## The Goal
+The end goal is usage like this:
+```ruby
+Tagmemics.parse('I am the best thing since sliced bread and binary numbers')
+# =>
+# <ParsedSentence:0x007fc7ebba47e8
+# @adjectives=["best", "binary", "sliced"],
+# @articles=["the"],
+# @conjunctions=["and"],
+# @nouns=["bread", "numbers", "thing"],
+# @prepositions=["since"],
+# @pronouns=["I"],
+# @str="I am the best thing since sliced bread and binary numbers"
+# @verbs=["am"]
+# >
+```
+Notice that `sliced` is an adjective here, but could also be a *past-tense verb*.
+Also, `binary` is an adjective, but could also be a *noun*.
+This rules out a simple hash that maps each word to a single part of speech.  Instead,
+the goal is to leverage the [WordNet database](http://wordnet.princeton.edu/) to list
+the many possibilities of a given word and **rank the possibilities by
+the part of speech of the word's neighbors**.
+For example, we know `sliced` and `binary` are both adjectives because each
+directly precedes a noun.
+The algorithm that handles this ranking is the dream behind this project.
+## Current Thought Process
+*Note: this is informal knowledge of grammar and most likely needs improvement.*
+> **Cheat Sheet**
+> - Nouns (including pronouns) are a person, place or thing.
+> - Verbs are the action.
+> - Adjectives describe the *what* of a noun or pronoun.
+> - Adverbs describe the *how* of a verb, adjective, or another adverb.
+> - Articles are adjectives but have little meaning: "the, a, an" (zero probability of confusion)
+> - Prepositions add context to a noun or verb in the form of a [*prepositional phrase*](https://en.wikipedia.org/wiki/Adpositional_phrase) (low probability of confusion).
+> - Conjunctions combine words or phrases together (low probability of confusion).
+#### A noun appears:
+  - after an adjective (including articles)
+    - The red **fox** jumped the **fence**.
+  - before a verb
+    - The bank **robber** stole the money.
+    - **Mary** likes strawberries.
+  - at end of prepositional phrase (as the object)
+    - I went across **town**.
+    - The red fox jumped over the **fence**.
+#### An adjective appears:
+  - before a noun
+    - The **red** fox jumped the **tall** fence.
+    - The **tasty** food got eaten.
+  - after a linking verb (predicate adjective)
+    - The food tasted **great**.
+    - I am **tired**.
+    - Nancy is **thoughtful**.
+    - That looked **amazing**.
+#### A verb appears:
+  - directly after a noun
+    - The red fox **jumped** the tall fence.
+    - The tasty food **got eaten**.
+  - directly after a pronoun
+    - The man who **stole** it is Bob.
+    - They **said** that maybe he **stole** it.
+    - Bob is a thief that **had** a bad childhood.
+    - I **know** he **needs** to learn some Ruby.
+#### An adverb appears:
+  - directly after a verb
+    - He walked **quickly** to the store.
+    - Run as **fast** as you can!
+    - Mary ate the cheeseburger **ridiculously quickly**.
+  - before a verb
+    - He **quickly** walked to the store.
+    - Mary **ridiculously** ate the cheeseburger.
+    - She **sometimes** takes medication.
+  - before an adjective
+    - Mary is **really** fat.
+    - Bob is **especially** clever.
+  - before another adverb
+    - He speaks **very** slowly.
+    - He exercises **remarkably** well.
+#### A preposition appears:
+  - directly after a verb
+    - He walked **across** the room.
+    - I jumped **over** the rope.
+    - The ball is **over** there.
+    - She will arrive **at** noon.
+  - beginning of a sentence
+    - **In** the morning, I usually drink coffee.
+    - **Around** the mountain, here she comes!
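The README's ranking idea ("rank the possibilities by the part of speech of the word's neighbors") can be sketched with a single rule: an adjective directly precedes a noun. This is a hypothetical illustration, not code from the gem. `CANDIDATES` is a hard-coded stand-in for a WordNet lookup, and `tag_with_neighbors` simply falls back to the first listed candidate when the rule does not apply:

```ruby
# Hypothetical candidate table standing in for a WordNet lookup:
# each word maps to the parts of speech it *could* be.
CANDIDATES = {
  'sliced'  => %i[verb adjective],
  'bread'   => %i[noun],
  'binary'  => %i[noun adjective],
  'numbers' => %i[noun verb]
}.freeze

# Picks one part of speech per word using the rule that an adjective
# directly precedes a noun; otherwise takes the first candidate.
def tag_with_neighbors(words, candidates)
  words.each_with_index.map do |word, i|
    options = candidates.fetch(word, [:unknown])
    following = candidates.fetch(words[i + 1], []) # [] past the last word
    if options.include?(:adjective) && following.include?(:noun)
      [word, :adjective]
    else
      [word, options.first]
    end
  end
end
```

Calling `tag_with_neighbors(%w[sliced bread and binary numbers], CANDIDATES)` returns `[["sliced", :adjective], ["bread", :noun], ["and", :unknown], ["binary", :adjective], ["numbers", :noun]]`. A real ranker would weigh several of the neighbor rules listed above (noun, verb, adverb and preposition positions) rather than one rule with a fixed fallback.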

data/Rakefile ADDED

@@ -0,0 +1,29 @@
+require 'rubygems'
+require 'rake'
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |test|
+  test.libs << 'lib' << 'test'
+  test.pattern = 'test/**/test*.rb'
+  test.verbose = true
+end
+Rake::TestTask.new(:dummy) do |test|
+  test.libs << 'lib' << 'test'
+  test.pattern = 'test/**/dummy*.rb'
+  test.verbose = true
+end
+desc 'Open console with tagmemics loaded'
+task :console do
+  exec 'pry -r ./lib/tagmemics.rb'
+end
+desc 'make a release'
+task :release do
+  exec './script/release'
+end
+task c: :console # alias 'c' for console
+task d: :dummy
+task default: :test
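The `test` and `dummy` tasks differ only in their filename pattern. The sketch below shows which task would pick up a given file, assuming `File.fnmatch` with `File::FNM_PATHNAME` is a close enough approximation of the `Dir.glob` matching that Rake's `FileList` performs. `TASK_PATTERNS` and `tasks_for` are illustrative names, and the paths are hypothetical since this release ships no test files:

```ruby
# Glob patterns copied from the two Rake::TestTask definitions above.
TASK_PATTERNS = {
  test: 'test/**/test*.rb',
  dummy: 'test/**/dummy*.rb'
}.freeze

# Returns the task(s) whose pattern matches +path+. FNM_PATHNAME makes
# '*' stop at '/' while '**/' spans any number of directories.
def tasks_for(path)
  TASK_PATTERNS.select { |_task, pattern| File.fnmatch(pattern, path, File::FNM_PATHNAME) }.keys
end
```

For example, `tasks_for('test/unit/test_word.rb')` gives `[:test]` and `tasks_for('test/unit/dummy_sentence.rb')` gives `[:dummy]`, so plain `rake` (default task `:test`) skips the dummy files and `rake d` runs only them.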

data/lib/tagmemics.rb ADDED

@@ -0,0 +1,35 @@
+require_relative './tagmemics/word'
+require_relative './tagmemics/sentence'
+module Lexicon
+  def self.parse(str)
+    ParsedSentence.new(str)
+  end
+end
+# The output of Lexicon.parse
+class ParsedSentence
+  attr_accessor :nouns, :verbs, :articles, :adjectives, :adverbs,
+                :prepositions, :conjunctions, :pronouns
+  def initialize(str)
+    @str = str
+  end
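+  # Splits on runs of non-word characters; empty strings (e.g. from
+  # leading punctuation) are dropped.
+  def sentence_to_array(sentence)
+    sentence.split(/\W+/).reject(&:empty?)
+  end
+  # Maps each word to a closed-class part of speech, or nil when unknown.
+  # The word lists live on Lexicon::Word, so they must be fully qualified.
+  def start_hash(arr)
+    arr.map do |word|
+      result =
+        case
+        when in_list?(Lexicon::Word::ARTICLES, word) then :article
+        when in_list?(Lexicon::Word::CONJUNCTIONS, word) then :conjunction
+        when in_list?(Lexicon::Word::PRONOUNS, word) then :pronoun
+        end
+      [word, result]
+    end.to_h
+  end
+  # Case-insensitive membership test against one of the word lists.
+  def in_list?(list, word)
+    list.any? { |entry| entry.casecmp(word).zero? }
+  end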
+end
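`ParsedSentence` does not yet fill its attributes; `start_hash` only handles the closed word classes (articles, conjunctions, pronouns). A self-contained sketch of that first pass, assuming a short hypothetical pronoun list in place of `config/pronouns.txt` (which is not among the packaged files). `tokenize` and `closed_class_tags` are illustrative names:

```ruby
# ARTICLES and CONJUNCTIONS mirror Lexicon::Word; PRONOUNS is a short
# hypothetical stand-in for config/pronouns.txt.
ARTICLES = %w[the an a].freeze
CONJUNCTIONS = %w[for and nor but or yet so].freeze
PRONOUNS = %w[i you he she it we they].freeze

# Splits on runs of non-word characters and drops the empty leading
# token that a sentence starting with punctuation would produce.
def tokenize(sentence)
  sentence.split(/\W+/).reject(&:empty?)
end

# Tags each word with a closed-class part of speech, or nil when the
# word still needs the WordNet-based ranking.
def closed_class_tags(words)
  words.map do |word|
    tag =
      case word.downcase
      when *ARTICLES then :article
      when *CONJUNCTIONS then :conjunction
      when *PRONOUNS then :pronoun
      end
    [word, tag]
  end
end
```

This returns an array of pairs rather than a hash, so a word that appears twice keeps both positions; `to_h` in `start_hash` would collapse repeated words, which matters once neighbors are used for ranking.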

data/lib/tagmemics/config.rb ADDED

@@ -0,0 +1,51 @@
+# Retrieves data from config folder to save to constants.
+module Config
+  def self.config_path
+    File.join(File.dirname(__FILE__), '../../config')
+  end
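+  # Returns the path to the word list for the given part of speech,
+  # e.g. config/pronouns.txt for 'pronouns'.
+  def self.list_path(part_of_speech)
+    File.join(config_path, "#{part_of_speech}.txt")
+  end
+  # Reads a word list as UTF-8; File.read closes the file afterwards.
+  def self.list_contents(part_of_speech)
+    File.read(list_path(part_of_speech), encoding: 'utf-8')
+  end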
+  def self.contents_to_a(part_of_speech)
+    list_contents(part_of_speech).split("\n")
+  end
+  def self.update_list(part_of_speech, uri, css_selector)
+    require 'mechanize'
+    agent = Mechanize.new
+    page = agent.get(uri)
+    destination = list_path(part_of_speech) # the same file list_contents reads
+    target = page.search(css_selector)
+    regx = /[^'a-zA-Z\s]/ # anything beside letters, apostrophe or space
+    arr = []
+    target.each do |x|
+      x = x.text
+      x.gsub(/\r\n\s/, "\n").split("\n").each do |word|
+        next if arr.include? word
+        next if regx =~ word
+        arr << word
+      end
+    end
+    arr.sort!
+    puts "Starting list from #{uri}"
+    puts "There are #{arr.count} #{part_of_speech} to save."
+    File.open(destination, 'w') do |file|
+      arr.each { |word| file.puts(word) }
+    end
+    puts 'Save complete.'
+  end
+end
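The filtering inside `Config.update_list` can be separated from the Mechanize download and tested on plain strings. A sketch under that assumption (`clean_word_list` is a hypothetical name): it applies the same `/[^'a-zA-Z\s]/` filter, but splits on both `\n` and `\r\n`, strips surrounding whitespace and drops blank entries, which the original loop does not do:

```ruby
# Same character filter as update_list: anything besides letters,
# apostrophes or whitespace disqualifies an entry.
DISALLOWED = /[^'a-zA-Z\s]/

# Hypothetical pure version of update_list's cleaning step. Takes the
# text of each node matched by the CSS selector and returns a sorted,
# de-duplicated word list.
def clean_word_list(node_texts)
  node_texts
    .flat_map { |text| text.split(/\r?\n/) }
    .map(&:strip)
    .reject { |word| word.empty? || DISALLOWED.match?(word) }
    .uniq
    .sort
end
```

With the list logic isolated, `update_list` would only need to pass `page.search(css_selector).map(&:text)` to it and write the result.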

data/lib/tagmemics/sentence.rb ADDED

File without changes

data/lib/tagmemics/word.rb ADDED

@@ -0,0 +1,69 @@
+require 'wordnet'
+require 'facets'
+require_relative './config'
+require_relative './word/wordnet'
+module Lexicon
+  class Word
+    include Config
+    ARTICLES = %w(the an a)
+    CONJUNCTIONS = %w(for and nor but or yet so )
+    PRONOUNS = Config.contents_to_a('pronouns')
+    def part_of_speech(constant, str)
+      arr = []
+      constant.each do |word|
+        regx = /\b#{word}\b/i
+        arr << word if regx =~ str # word phrase matches
+      end
+      arr
+    end
+    def decimal_complete(hsh)
+      total = hsh.length
+      complete = hsh.count { |_k, v| v } # not nil
+      complete / total.to_f
+    end
+    def initialize(word)
+      @word = word
+      @confidence_levels = confidence_levels(word)
+    end
+    def confidence_levels(word)
+      {
+        :noun => noun_confidence(word),
+        :verb => verb_confidence(word),
+        :adjective => adjective_confidence(word),
+        :adverb => adverb_confidence(word),
+        :article => article_confidence(word),
+        :preposition => preposition_confidence(word),
+        :conjunction => conjunction_confidence(word)
+      }
+    end
+    def noun_confidence(str)
+      (WordNet.orig_probability(str) / 1) * 3
+    end
+    def verb_confidence(str)
+    end
+    def adjective_confidence(str)
+    end
+    def adverb_confidence(str)
+    end
+    def article_confidence(str)
+    end
+    def preposition_confidence(str)
+    end
+    def conjunction_confidence(str)
+    end
+  end
+end
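Two of `Lexicon::Word`'s helpers do not depend on WordNet and can be tried in isolation. The copies below are a sketch whose behavior matches the originals except for two assumptions: `Regexp.escape` is added in case a list entry contains regex metacharacters, and `decimal_complete` guards against an empty hash:

```ruby
# Returns every entry of +list+ that appears as a whole word in +str+,
# case-insensitively (same \b...\b match as Word#part_of_speech).
def part_of_speech(list, str)
  list.select { |word| /\b#{Regexp.escape(word)}\b/i =~ str }
end

# Fraction of entries in a confidence hash that are filled in (not nil),
# as in Word#decimal_complete.
def decimal_complete(hsh)
  return 0.0 if hsh.empty? # avoids 0 / 0.0 => NaN
  hsh.count { |_k, v| v } / hsh.length.to_f
end
```

`part_of_speech(%w[the an a], 'The cat ate an apple')` returns `["the", "an"]`: the `\b` anchors stop `a` from matching inside `cat`, `ate` or `an`. For `decimal_complete`, a confidence of `0.0` still counts as filled in, since only `nil` and `false` are falsy in Ruby.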

data/lib/tagmemics/word/wordnet.rb ADDED

@@ -0,0 +1,43 @@
+require 'wordnet'
+require 'facets'
+module Lexicon
+  module WordNet
+    class << self
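+      # ::WordNet is the wordnet gem's namespace; a bare WordNet here would
+      # resolve to this Lexicon::WordNet module. Memoized so the database
+      # connection is opened only once.
+      def lex
+        @lex ||= ::WordNet::Lexicon.new
+      end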
+      def word_definitions(word)
+        word = word.to_sym
+        lex.lookup_synsets(word)
+      end
+      def parts_of_speech_frequency(word, arr = [])
+        word_definitions(word).each do |x|
+          arr << x.part_of_speech
+        end
+        arr.frequency
+      end
+      def total_possibilities(word)
+        parts_of_speech_frequency(word).values.reduce(:+)
+      end
+      def orig_probability(word)
+        hsh = parts_of_speech_frequency(word)
+        denom = total_possibilities(word)
+        hsh.each { |k, v| hsh[k] = v / denom.to_f }
+      end
+      def most_likely(word)
+        hsh = orig_probability(word)
+        max = hsh.values.max
+        hsh.select { |_k, v| v == max }
+      end
+    end
+  end
+end
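`orig_probability` turns per-synset part-of-speech counts into shares of the total, and `most_likely` keeps the maximum. The same arithmetic on a plain list of tags, assuming Ruby 2.7+ so that the core `Array#tally` can stand in for Facets' `#frequency`; the tag lists are made-up input, not real WordNet data:

```ruby
# Share of each part of speech among the synset tags, e.g.
# [:noun, :adjective, :adjective] -> noun 1/3, adjective 2/3.
def pos_probabilities(tags)
  counts = tags.tally
  total = counts.values.sum.to_f
  counts.transform_values { |n| n / total }
end

# Keeps every part of speech tied for the highest share.
def most_likely(tags)
  probs = pos_probabilities(tags)
  max = probs.values.max
  probs.select { |_pos, p| p == max }
end
```

With `%i[noun adjective adjective]`, `most_likely` returns only the `:adjective` entry. Returning a hash rather than a single symbol keeps ties, as the original `select` does.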

metadata ADDED

@@ -0,0 +1,162 @@
+--- !ruby/object:Gem::Specification
+name: tagmemics
+version: !ruby/object:Gem::Version
+  version: 0.0.0.beta
+platform: ruby
+authors:
+- John Mason
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-10-26 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: facets
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: wordnet-defaultdb
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+- !ruby/object:Gem::Dependency
+  name: shoulda
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.5'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.5'
+- !ruby/object:Gem::Dependency
+  name: shoulda-context
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
+- !ruby/object:Gem::Dependency
+  name: minitest-reporters
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.3'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.3'
+- !ruby/object:Gem::Dependency
+  name: mechanize
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.7'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.7'
+description: One day this will be great.  Until then, it will be a project.
+email: mace2345@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.md
+- Rakefile
+- lib/tagmemics.rb
+- lib/tagmemics/config.rb
+- lib/tagmemics/sentence.rb
+- lib/tagmemics/word.rb
+- lib/tagmemics/word/wordnet.rb
+homepage: http://github.com/m8ss/tagmemics
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 1.3.1
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.5
+signing_key:
+specification_version: 4
+summary: A more organized way of accessing a language.
+test_files: []
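The `metadata` entry is a YAML dump of the gem's `Gem::Specification`. A gemspec that would produce equivalent name, version, file and dependency entries might look like the sketch below; it is a reconstruction from the fields above, not the author's actual `tagmemics.gemspec`. Note that `files` lists no `config/*.txt`, although `word.rb` reads `config/pronouns.txt` when it is loaded.

```ruby
require 'rubygems'

# Hypothetical reconstruction of tagmemics.gemspec from the metadata above.
SPEC = Gem::Specification.new do |s|
  s.name        = 'tagmemics'
  s.version     = '0.0.0.beta'
  s.authors     = ['John Mason']
  s.email       = 'mace2345@gmail.com'
  s.summary     = 'A more organized way of accessing a language.'
  s.description = 'One day this will be great.  Until then, it will be a project.'
  s.homepage    = 'http://github.com/m8ss/tagmemics'
  s.license     = 'MIT'
  s.files       = %w[README.md Rakefile lib/tagmemics.rb lib/tagmemics/config.rb
                     lib/tagmemics/sentence.rb lib/tagmemics/word.rb
                     lib/tagmemics/word/wordnet.rb]

  # Runtime dependencies
  s.add_runtime_dependency 'facets', '~> 3.0'
  s.add_runtime_dependency 'wordnet-defaultdb', '~> 1.0'

  # Development-only dependencies (tests, console, list scraping)
  s.add_development_dependency 'shoulda', '~> 3.5'
  s.add_development_dependency 'shoulda-context', '~> 1.2'
  s.add_development_dependency 'minitest-reporters', '~> 1.1'
  s.add_development_dependency 'pry', '~> 0.9'
  s.add_development_dependency 'rake', '~> 10.3'
  s.add_development_dependency 'mechanize', '~> 2.7'
end
```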