tagmemics 0.0.0.beta

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: e27b491016415e6938d939946daf4176b8f9afce
+   data.tar.gz: 518f34821b597f5c58c125b2558c059e0d83363e
+ SHA512:
+   metadata.gz: b0847592c853bca02019b517022cbfe754dbfe00a93dc181753f497d245e0218a55d917c965dcfbdaeaea08b938ebd1b2fce8bbeb0eed6bc8d764b49c26af8aa
+   data.tar.gz: 54a2e571e6a6ee5c24c29039fdd9905e0000e0ded8475e200546cd6e7c01dda947ff86fdea09bbc1e368c2164a91cefad9117e17196622f3d7dbe4a33281cf70
data/README.md ADDED
@@ -0,0 +1,177 @@
+ # Tagmemics
+
+ ## Description
+
+ The English language is extremely complicated. Many words can serve as more than
+ one part of speech. Natural language processing is difficult because it is hard to
+ tell if a word is a noun when it could be a verb or an adjective, etc.
+
+ The purpose of this project is to develop an algorithm that, given a sentence string,
+ uses a ranking system to detect the part of speech of each word.
+
+ Why is this useful? Because **understanding the correct parts of speech in a sentence
+ is the first step to teaching a robot how to read**.
+
+ ## The Goal
+
+ The end state is to have usage like this:
+
+ ```ruby
+
+ Tagmemics.parse('I am the best thing since sliced bread and binary numbers')
+
+ # =>
+ # <ParsedSentence:0x007fc7ebba47e8
+ #   @adjectives=["best", "binary", "sliced"],
+ #   @articles=["the"],
+ #   @conjunctions=["and"],
+ #   @nouns=["bread", "numbers", "thing"],
+ #   @prepositions=["since"],
+ #   @pronouns=["I"],
+ #   @str="I am the best thing since sliced bread and binary numbers",
+ #   @verbs=["am"]
+ # >
+
+ ```
+
+ Notice that `sliced` is an adjective here, but could also be a *past-tense verb*.
+ Also, `binary` is an adjective, but could also be a *noun*.
+
+ This throws the possibility of having a simple hash of words out the window. Instead,
+ the goal is to leverage the [WordNet database](http://wordnet.princeton.edu/) to list
+ the many possibilities of a given word and **rank the possibilities by
+ the part of speech of the word's neighbors**.
+
+ For example, we know `sliced` and `binary` are both adjectives because each
+ directly precedes a noun.
+
+ The algorithm that handles this ranking is the dream behind this project.
+
+
+ ## Current Thought Process
+
+ *Note: this is informal knowledge of grammar and most likely needs improvement.*
+
+ > **Cheat Sheet**
+
+ > - Nouns (including pronouns) name a person, place, or thing.
+
+ > - Verbs express the action.
+
+ > - Adjectives describe the *what* of a noun or pronoun.
+
+ > - Adverbs describe the *how* of a verb, adjective, or another adverb.
+
+ > - Articles are adjectives but have little meaning: "the, a, an" (zero probability of confusion).
+
+ > - Prepositions add context to a noun or verb in the form of a
+ [*prepositional phrase*](https://en.wikipedia.org/wiki/Adpositional_phrase)
+ (low probability of confusion).
+
+ > - Conjunctions combine words or phrases together (low probability of confusion).
+
+
+
+ #### A noun appears:
+
+ - after an adjective (including articles)
+
+   - The red **fox** jumped the **fence**.
+
+ - before a verb
+
+   - The bank **robber** stole the money.
+
+   - **Mary** likes strawberries.
+
+ - at the end of a prepositional phrase (as the object)
+
+   - I went across **town**.
+
+   - The red fox jumped over the **fence**.
+
+
+ #### An adjective appears:
+
+ - before a noun
+
+   - The **red** fox jumped the **tall** fence.
+
+   - The **tasty** food got eaten.
+
+ - after a linking verb (predicate adjective)
+
+   - The food tasted **great**.
+
+   - I am **tired**.
+
+   - Nancy is **thoughtful**.
+
+   - That looked **amazing**.
+
+ #### A verb appears:
+
+ - directly after a noun
+
+   - The red fox **jumped** the tall fence.
+
+   - The tasty food **got eaten**.
+
+ - directly after a pronoun
+
+   - The man who **stole** it is Bob.
+
+   - They **said** that maybe he **stole** it.
+
+   - Bob is a thief that **had** a bad childhood.
+
+   - I **know** he **needs** to learn some Ruby.
+
+ #### An adverb appears:
+
+ - directly after a verb
+
+   - He walked **quickly** to the store.
+
+   - Run as **fast** as you can!
+
+   - Mary ate the cheeseburger **ridiculously quickly**.
+
+ - before a verb
+
+   - He **quickly** walked to the store.
+
+   - Mary **ridiculously** ate the cheeseburger.
+
+   - She **sometimes** takes medication.
+
+ - before an adjective
+
+   - Mary is **really** fat.
+
+   - Bob is **especially** clever.
+
+ - before another adverb
+
+   - He speaks **very** slowly.
+
+   - He exercises **remarkably** well.
+
+ #### A preposition appears:
+
+ - directly after a verb
+
+   - He walked **across** the room.
+
+   - I jumped **over** the rope.
+
+   - The ball is **over** there.
+
+   - She will arrive **at** noon.
+
+ - at the beginning of a sentence
+
+   - **In** the morning, I usually drink coffee.
+
+   - **Around** the mountain, here she comes!
+
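Reviewer sketch (not part of the package): the neighbor-based ranking that the README describes could look roughly like the following. This is only an illustration under stated assumptions, not the project's algorithm. `CANDIDATES` is a hypothetical hand-written stand-in for WordNet lookups, and `COMES_AFTER`, `COMES_BEFORE`, and `rank` are invented names that encode a few of the positional rules from the README lists.

```ruby
# Hypothetical mini-lexicon: each word maps to its possible parts of speech.
# In the project, this role is meant to be played by WordNet lookups.
CANDIDATES = {
  'the'     => [:article],
  'sliced'  => [:verb, :adjective],
  'bread'   => [:noun],
  'and'     => [:conjunction],
  'binary'  => [:noun, :adjective],
  'numbers' => [:noun]
}.freeze

# tag => tags that may appear directly before it (from the README lists)
COMES_AFTER = {
  noun:        [:adjective, :article],
  verb:        [:noun, :pronoun],
  adverb:      [:verb],
  preposition: [:verb]
}.freeze

# tag => tags that may appear directly after it (from the README lists)
COMES_BEFORE = {
  noun:      [:verb],
  adjective: [:noun],
  adverb:    [:verb, :adjective, :adverb]
}.freeze

# Picks, for every word, the candidate tag that agrees with the most
# neighbor rules. Unknown words get nil.
def rank(words)
  words.each_with_index.map do |word, i|
    prev_tags = i.zero? ? [] : CANDIDATES.fetch(words[i - 1], [])
    next_tags = CANDIDATES.fetch(words[i + 1], [])
    best = CANDIDATES.fetch(word, []).max_by do |tag|
      (COMES_AFTER.fetch(tag, []) & prev_tags).size +
        (COMES_BEFORE.fetch(tag, []) & next_tags).size
    end
    [word, best]
  end
end

p rank(%w[the sliced bread])
# => [["the", :article], ["sliced", :adjective], ["bread", :noun]]
```

A real implementation would need weighted scores and a tie-breaking policy; this sketch simply counts how many rules each candidate satisfies.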
data/Rakefile ADDED
@@ -0,0 +1,29 @@
+ require 'rubygems'
+ require 'rake'
+ require 'rake/testtask'
+
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/test*.rb'
+   test.verbose = true
+ end
+
+ Rake::TestTask.new(:dummy) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/dummy*.rb'
+   test.verbose = true
+ end
+
+ desc 'Open console with tagmemics loaded'
+ task :console do
+   exec 'pry -r ./lib/tagmemics.rb'
+ end
+
+ desc 'make a release'
+ task :release do
+   exec './script/release'
+ end
+
+ task c: :console # alias 'c' for console
+ task d: :dummy
+ task default: :test
data/lib/tagmemics.rb ADDED
@@ -0,0 +1,35 @@
+ require_relative './tagmemics/word'
+ require_relative './tagmemics/sentence'
+
+ module Lexicon
+   def self.parse(str)
+     ParsedSentence.new(str)
+   end
+ end
+
+ # The output of Lexicon.parse
+ class ParsedSentence
+   attr_accessor :nouns, :verbs, :articles, :adjectives, :adverbs,
+                 :prepositions, :conjunctions, :pronouns
+
+   def initialize(str)
+     @str = str
+   end
+
+   def sentence_to_array(sentence)
+     sentence.split(/\W+/) # split on runs of non-word characters
+   end
+
+   def start_hash(arr)
+     arr.map do |word|
+       known = ->(list) { list.any? { |w| w.casecmp(word).zero? } }
+       result =
+         case
+         when known[Lexicon::Word::ARTICLES] then :article
+         when known[Lexicon::Word::CONJUNCTIONS] then :conjunction
+         when known[Lexicon::Word::PRONOUNS] then :pronoun
+         end
+       [word, result]
+     end.to_h
+   end
+ end
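Reviewer sketch (not part of the package): the tokenizing and closed-class seeding step in `lib/tagmemics.rb` can be tried in isolation with plain Ruby. The names `CLOSED_CLASSES`, `tokenize`, and `seed_tags` are hypothetical, and the pronoun list is a short made-up sample standing in for `config/pronouns.txt`, which is not shown in this diff.

```ruby
# Closed word classes. The :pronoun list is a hypothetical sample; the gem
# reads its pronouns from config/pronouns.txt instead.
CLOSED_CLASSES = {
  article:     %w[the an a],
  conjunction: %w[for and nor but or yet so],
  pronoun:     %w[i you he she it we they]
}.freeze

# Extracts words, keeping apostrophes so contractions stay in one piece.
def tokenize(sentence)
  sentence.scan(/[\w']+/)
end

# Maps each word to its closed-class tag, or nil when it still needs ranking.
# Repeated words collapse into one key because the result is a Hash.
def seed_tags(words)
  words.map do |word|
    match = CLOSED_CLASSES.find { |_tag, list| list.include?(word.downcase) }
    [word, match && match.first]
  end.to_h
end

p tokenize("I don't know, the fox!")
# => ["I", "don't", "know", "the", "fox"]
p seed_tags(%w[I am the best])
# => {"I"=>:pronoun, "am"=>nil, "the"=>:article, "best"=>nil}
```

Scanning for word characters avoids the empty strings that `split(/\W/)` produces between adjacent separators such as a comma followed by a space.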
data/lib/tagmemics/config.rb ADDED
@@ -0,0 +1,51 @@
+ # Retrieves data from config folder to save to constants.
+ module Config
+   def self.config_path
+     File.join(File.dirname(__FILE__), '../../config')
+   end
+
+   # Returns the path to the word list for the given part of speech
+   def self.list_path(part_of_speech)
+     File.join(config_path, "#{part_of_speech}.txt")
+   end
+
+   def self.list_contents(part_of_speech)
+     File.read(list_path(part_of_speech), encoding: 'utf-8')
+   end
+
+   def self.contents_to_a(part_of_speech)
+     list_contents(part_of_speech).split("\n")
+   end
+
+   def self.update_list(part_of_speech, uri, css_selector)
+     require 'mechanize'
+
+     agent = Mechanize.new
+     page = agent.get(uri)
+     destination = "./config/#{part_of_speech}.txt"
+     target = page.search(css_selector)
+     regx = /[^'a-zA-Z\s]/ # anything besides letters, apostrophes, or whitespace
+
+     arr = []
+     target.each do |x|
+       x = x.text
+       x.gsub(/\r\n\s/, "\n").split("\n").each do |word|
+         next if arr.include? word
+         next if regx =~ word
+         arr << word
+       end
+     end
+     arr.sort!
+
+     puts "Starting list from #{uri}"
+     puts "There are #{arr.count} #{part_of_speech} to save."
+
+     File.open(destination, 'w') do |file|
+       arr.each do |word|
+         file << word + "\n"
+       end
+     end
+
+     puts 'Save complete.'
+   end
+ end
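Reviewer sketch (not part of the package): the filtering that `Config.update_list` applies to scraped text can be exercised without Mechanize or network access. The method name `clean_word_list` and the constant `NON_WORD_CHARS` are hypothetical. Unlike the original, this version also strips surrounding whitespace and skips empty lines, which is an assumption about the desired behavior.

```ruby
require 'set'

# Same character class as the original: anything besides letters,
# apostrophes, or whitespace marks an entry as invalid.
NON_WORD_CHARS = /[^'a-zA-Z\s]/

# Takes an array of text blocks (one per scraped element) and returns a
# sorted array of unique, valid entries.
def clean_word_list(blocks)
  seen = Set.new
  blocks.each do |text|
    text.split(/\r?\n/).each do |line|
      word = line.strip
      next if word.empty? || word =~ NON_WORD_CHARS
      seen << word
    end
  end
  seen.sort
end

p clean_word_list(["he\r\nshe\nit", "she\nthey2\n each other "])
# => ["each other", "he", "it", "she"]
```

A `Set` keeps the uniqueness check constant-time, whereas `arr.include?` rescans the array for every candidate word.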
data/lib/tagmemics/sentence.rb ADDED
File without changes
data/lib/tagmemics/word.rb ADDED
@@ -0,0 +1,69 @@
+ require 'wordnet'
+ require 'facets'
+ require_relative './config'
+ require_relative './word/wordnet'
+
+ module Lexicon
+   class Word
+     include Config
+
+     ARTICLES = %w(the an a)
+     CONJUNCTIONS = %w(for and nor but or yet so)
+     PRONOUNS = Config.contents_to_a('pronouns')
+
+     def part_of_speech(constant, str)
+       arr = []
+       constant.each do |word|
+         regx = /\b#{word}\b/i
+         arr << word if regx =~ str # word phrase matches
+       end
+       arr
+     end
+
+     def decimal_complete(hsh)
+       total = hsh.length
+       complete = hsh.count { |_k, v| v } # not nil
+       complete / total.to_f
+     end
+
+     def initialize(word)
+       @word = word
+       @confidence_levels = confidence_levels(word)
+     end
+
+     def confidence_levels(word)
+       {
+         :noun => noun_confidence(word),
+         :verb => verb_confidence(word),
+         :adjective => adjective_confidence(word),
+         :adverb => adverb_confidence(word),
+         :article => article_confidence(word),
+         :preposition => preposition_confidence(word),
+         :conjunction => conjunction_confidence(word)
+       }
+     end
+
+     def noun_confidence(str)
+       probs = WordNet.orig_probability(str) # hash: part of speech => probability
+       (probs['noun'] || probs[:noun] || 0.0) * 3 # key type is an assumption
+     end
+
+     def verb_confidence(str)
+     end
+
+     def adjective_confidence(str)
+     end
+
+     def adverb_confidence(str)
+     end
+
+     def article_confidence(str)
+     end
+
+     def preposition_confidence(str)
+     end
+
+     def conjunction_confidence(str)
+     end
+   end
+ end
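Reviewer sketch (not part of the package): the two helpers at the top of `Lexicon::Word` can be tried standalone. `whole_word_matches` mirrors `part_of_speech` and `fraction_tagged` mirrors `decimal_complete`; both names are hypothetical. The `Regexp.escape` call and the empty-hash guard are additions that the original does not have.

```ruby
# Returns the entries of +list+ that occur in +str+ as whole words,
# ignoring case. Regexp.escape guards against entries with special characters.
def whole_word_matches(list, str)
  list.select { |entry| str =~ /\b#{Regexp.escape(entry)}\b/i }
end

# Fraction of words that already carry a tag (a non-nil value).
# Returns 0.0 for an empty hash instead of dividing by zero.
def fraction_tagged(tags)
  return 0.0 if tags.empty?
  tags.count { |_word, tag| tag } / tags.length.to_f
end

p whole_word_matches(%w[the an a], 'The red fox ran')
# => ["the"]
p fraction_tagged('I' => :pronoun, 'am' => nil, 'the' => :article, 'best' => nil)
# => 0.5
```

The word-boundary anchors are what keep `an` from matching inside `ran`.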
data/lib/tagmemics/word/wordnet.rb ADDED
@@ -0,0 +1,43 @@
+ require 'wordnet'
+ require 'facets'
+
+ puts "you're at the right place"
+
+ module Lexicon
+   module WordNet
+     class << self
+       def lex
+         ::WordNet::Lexicon.new # leading :: reaches the gem, not Lexicon::WordNet
+       end
+
+       def word_definitions(word)
+         word = word.to_sym
+         lex.lookup_synsets(word)
+       end
+
+       def parts_of_speech_frequency(word, arr = [])
+         word_definitions(word).each do |x|
+           arr << x.part_of_speech
+         end
+         arr.frequency
+       end
+
+       def total_possibilities(word)
+         parts_of_speech_frequency(word).values.reduce(:+)
+       end
+
+       def orig_probability(word)
+         hsh = parts_of_speech_frequency(word)
+         denom = total_possibilities(word)
+
+         hsh.each { |k, v| hsh[k] = v / denom.to_f }
+       end
+
+       def most_likely(word)
+         hsh = orig_probability(word)
+         max = hsh.values.max
+         hsh.select { |_k, v| v == max }
+       end
+     end
+   end
+ end
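Reviewer sketch (not part of the package): the arithmetic behind `parts_of_speech_frequency`, `orig_probability`, and `most_likely` can be checked without a WordNet database. The input below is a made-up list of part-of-speech labels, one per synset, and the method names `frequencies`, `probabilities`, and `most_likely_tags` are hypothetical. It uses only the standard library (Ruby 2.4 or later) rather than the `frequency` extension from facets.

```ruby
# Counts how often each label occurs.
def frequencies(labels)
  labels.each_with_object(Hash.new(0)) { |label, counts| counts[label] += 1 }
end

# Normalizes the counts so the values sum to 1.0.
def probabilities(labels)
  counts = frequencies(labels)
  total = counts.values.sum.to_f
  counts.transform_values { |n| n / total }
end

# Returns every label that ties for the highest probability.
def most_likely_tags(labels)
  probs = probabilities(labels)
  max = probs.values.max
  probs.select { |_label, p| p == max }
end

labels = %w[noun noun verb adjective] # hypothetical synset labels for one word
p probabilities(labels)
# => {"noun"=>0.5, "verb"=>0.25, "adjective"=>0.25}
p most_likely_tags(labels)
# => {"noun"=>0.5}
```

Returning a hash from `most_likely_tags` keeps ties visible, which matters because the README's plan is to let neighboring words break them.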
metadata ADDED
@@ -0,0 +1,162 @@
+ --- !ruby/object:Gem::Specification
+ name: tagmemics
+ version: !ruby/object:Gem::Version
+   version: 0.0.0.beta
+ platform: ruby
+ authors:
+ - John Mason
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-10-26 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: facets
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: wordnet-defaultdb
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ - !ruby/object:Gem::Dependency
+   name: shoulda
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.5'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.5'
+ - !ruby/object:Gem::Dependency
+   name: shoulda-context
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
+ - !ruby/object:Gem::Dependency
+   name: minitest-reporters
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.1'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.1'
+ - !ruby/object:Gem::Dependency
+   name: pry
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.9'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.9'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.3'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.3'
+ - !ruby/object:Gem::Dependency
+   name: mechanize
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.7'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.7'
+ description: One day this will be great. Until then, it will be a project.
+ email: mace2345@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - README.md
+ - Rakefile
+ - lib/tagmemics.rb
+ - lib/tagmemics/config.rb
+ - lib/tagmemics/sentence.rb
+ - lib/tagmemics/word.rb
+ - lib/tagmemics/word/wordnet.rb
+ homepage: http://github.com/m8ss/tagmemics
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">"
+     - !ruby/object:Gem::Version
+       version: 1.3.1
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.5
+ signing_key:
+ specification_version: 4
+ summary: A more organized way of accessing a language.
+ test_files: []