tagmemics 0.0.0.beta

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: e27b491016415e6938d939946daf4176b8f9afce
4
+ data.tar.gz: 518f34821b597f5c58c125b2558c059e0d83363e
5
+ SHA512:
6
+ metadata.gz: b0847592c853bca02019b517022cbfe754dbfe00a93dc181753f497d245e0218a55d917c965dcfbdaeaea08b938ebd1b2fce8bbeb0eed6bc8d764b49c26af8aa
7
+ data.tar.gz: 54a2e571e6a6ee5c24c29039fdd9905e0000e0ded8475e200546cd6e7c01dda947ff86fdea09bbc1e368c2164a91cefad9117e17196622f3d7dbe4a33281cf70
data/README.md ADDED
@@ -0,0 +1,177 @@
1
+ # Tagmemics
2
+
3
+ ## Description
4
+
5
+ The English language is extremely complicated. We have words that can have multiple
6
+ parts of speech. Natural language processing is difficult because it is hard to
7
+ tell if a word is a noun when it could be a verb or an adjective, etc.
8
+
9
+ The purpose of this project is to develop an algorithm that, given a sentence string,
10
+ has a ranking system that detects the part of speech of each word.
11
+
12
+ Why is this useful? Because **understanding the correct parts of speech in a sentence
13
+ is the first step to teaching a robot how to read**.
14
+
15
+ ## The Goal
16
+
17
+ The end state is to have usage like this:
18
+
19
+ ```ruby
20
+
21
+ Tagmemics.parse('I am the best thing since sliced bread and binary numbers')
22
+
23
+ # =>
24
+ # <ParsedSentence:0x007fc7ebba47e8
25
+ # @adjectives=["best", "binary", "sliced"],
26
+ # @articles=["the"],
27
+ # @conjunctions=["and"],
28
+ # @nouns=["bread", "numbers", "thing"],
29
+ # @prepositions=["since"],
30
+ # @pronouns=["I"],
31
+ # @str="I am the best thing since sliced bread and binary numbers"
32
+ # @verbs=["am"]
33
+ # >
34
+
35
+ ```
36
+
37
+ Notice that `sliced` is an adjective here, but could also be a *past-tense verb*.
38
+ Also, `binary` is an adjective, but could also be a *noun*.
39
+
40
+ This throws the possibility of having a simple hash of words out the window. Instead,
41
+ the goal is to leverage the [WordNet database](http://wordnet.princeton.edu/) to list
42
+ the many possibilities of a given word and **rank the possibilities by
43
+ the part of speech of the word's neighbors**.
44
+
45
+ For example, we know `sliced` and `binary` are both adjectives because they are
46
+ both directly preceding a noun.
47
+
48
+ The algorithm that handles this ranking is the dream behind this project.
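As a rough illustration of the neighbor-based ranking idea, here is a minimal sketch. It is not the project's algorithm: the `CANDIDATES` table, the `NOUN_FOLLOWS_BONUS` weights, and the `rank` and `best_tag` helpers are all hypothetical, and the scores are made-up stand-ins for what a WordNet lookup might provide.

```ruby
# Hypothetical candidate scores per word (made-up numbers, not WordNet data).
CANDIDATES = {
  'sliced'  => { adjective: 0.5, verb: 0.5 },
  'binary'  => { adjective: 0.5, noun: 0.5 },
  'bread'   => { noun: 1.0 },
  'numbers' => { noun: 0.8, verb: 0.2 }
}.freeze

# Hypothetical multiplier applied when the following word is most likely a noun.
NOUN_FOLLOWS_BONUS = { adjective: 2.0 }.freeze

# Returns the highest-scoring tag from a { tag => score } hash.
def best_tag(scores)
  scores.max_by { |_tag, score| score }.first
end

# Re-scores a word's candidate tags using the most likely tag of its right neighbor.
def rank(word, next_word)
  scores = CANDIDATES.fetch(word)
  next_scores = CANDIDATES[next_word]
  return scores unless next_scores && best_tag(next_scores) == :noun

  scores.map { |tag, score| [tag, score * NOUN_FOLLOWS_BONUS.fetch(tag, 1.0)] }.to_h
end

puts best_tag(rank('sliced', 'bread'))
puts best_tag(rank('binary', 'numbers'))
```

Without the neighbor, `sliced` is a tie between adjective and verb; the bonus for preceding a noun is what breaks the tie in this sketch.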
49
+
50
+
51
+ ## Current Thought Process
52
+
53
+ *Note: this is informal knowledge of grammar and most likely needs improvement.*
54
+
55
+ > **Cheat Sheet**
56
+
57
+ > - Nouns (including pronouns) name a person, place, or thing.
58
+
59
+ > - Verbs are the action.
60
+
61
+ > - Adjectives describe the *what* of a noun or pronoun.
62
+
63
+ > - Adverbs describe the *how* of a verb, adjective, or another adverb.
64
+
65
+ > - Articles are adjectives but have little meaning: "the, a, an" (zero probability of confusion)
66
+
67
+ > - Prepositions add context to a noun or verb in the form of a
68
+ [*prepositional phrase*](https://en.wikipedia.org/wiki/Adpositional_phrase)
69
+ (low probability of confusion).
70
+
71
+ > - Conjunctions combine words or phrases together (low probability of confusion).
72
+
73
+
74
+
75
+ #### A noun appears:
76
+
77
+ - after an adjective (including articles)
78
+
79
+ - The red **fox** jumped the **fence**.
80
+
81
+ - before a verb
82
+
83
+ - The bank **robber** stole the money.
84
+
85
+ - **Mary** likes strawberries.
86
+
87
+ - at end of prepositional phrase (as the object)
88
+
89
+ - I went across **town**.
90
+
91
+ - The red fox jumped over the **fence**.
92
+
93
+
94
+ #### An adjective appears:
95
+
96
+ - before a noun
97
+
98
+ - The **red** fox jumped the **tall** fence.
99
+
100
+ - The **tasty** food got eaten.
101
+
102
+ - after a linking verb (predicate adjective)
103
+
104
+ - The food tasted **great**.
105
+
106
+ - I am **tired**.
107
+
108
+ - Nancy is **thoughtful**.
109
+
110
+ - That looked **amazing**.
111
+
112
+ #### A verb appears:
113
+
114
+ - directly after a noun
115
+
116
+ - The red fox **jumped** the tall fence.
117
+
118
+ - The tasty food **got eaten**.
119
+
120
+ - directly after a pronoun
121
+
122
+ - The man who **stole** it is Bob.
123
+
124
+ - They **said** that maybe he **stole** it.
125
+
126
+ - Bob is a thief who **had** a bad childhood.
127
+
128
+ - I **know** he **needs** to learn some ruby.
129
+
130
+ #### An adverb appears:
131
+
132
+ - directly after a verb
133
+
134
+ - He walked **quickly** to the store.
135
+
136
+ - Run as **fast** as you can!
137
+
138
+ - Mary ate the cheeseburger **ridiculously quickly**.
139
+
140
+ - before a verb
141
+
142
+ - He **quickly** walked to the store.
143
+
144
+ - Mary **ridiculously** ate the cheeseburger.
145
+
146
+ - She **sometimes** takes medication.
147
+
148
+ - before an adjective
149
+
150
+ - Mary is **really** fat.
151
+
152
+ - Bob is **especially** clever.
153
+
154
+ - before another adverb
155
+
156
+ - He speaks **very** slowly.
157
+
158
+ - He exercises **remarkably** well.
159
+
160
+ #### A preposition appears:
161
+
162
+ - directly after a verb
163
+
164
+ - He walked **across** the room.
165
+
166
+ - I jumped **over** the rope.
167
+
168
+ - The ball is **over** there.
169
+
170
+ - She will arrive **at** noon.
171
+
172
+ - beginning of a sentence
173
+
174
+ - **In** the morning, I usually drink coffee.
175
+
176
+ - **Around** the mountain, here she comes!
177
+
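The positional rules above could be encoded as plain data before any scoring is attempted. The following is only a sketch under that assumption: `LIKELY_AFTER`, `likely_after`, and `consistent?` are hypothetical names, and the table transcribes the informal rules from this README, so it inherits their gaps (for example, it does not allow an adjective directly after an article).

```ruby
# Tags that the rules above say may follow a given tag.
# :start is a hypothetical marker for the beginning of a sentence.
LIKELY_AFTER = {
  start:     [:preposition],
  article:   [:noun],
  adjective: [:noun],
  noun:      [:verb],
  pronoun:   [:verb],
  verb:      [:adjective, :adverb, :preposition],
  adverb:    [:verb, :adjective, :adverb]
}.freeze

# Returns the tags considered likely after previous_tag (empty if unknown).
def likely_after(previous_tag)
  LIKELY_AFTER.fetch(previous_tag, [])
end

# True when tag is one of the likely followers of previous_tag.
def consistent?(previous_tag, tag)
  likely_after(previous_tag).include?(tag)
end

puts consistent?(:adjective, :noun)
puts consistent?(:noun, :article)
```

Keeping the rules in a table rather than in branching code would make it easier to revise them as the grammar notes improve.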
data/Rakefile ADDED
@@ -0,0 +1,29 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+ require 'rake/testtask'
4
+
5
+ Rake::TestTask.new(:test) do |test|
6
+ test.libs << 'lib' << 'test'
7
+ test.pattern = 'test/**/test*.rb'
8
+ test.verbose = true
9
+ end
10
+
11
+ Rake::TestTask.new(:dummy) do |test|
12
+ test.libs << 'lib' << 'test'
13
+ test.pattern = 'test/**/dummy*.rb'
14
+ test.verbose = true
15
+ end
16
+
17
+ desc 'Open console with tagmemics loaded'
18
+ task :console do
19
+ exec 'pry -r ./lib/tagmemics.rb'
20
+ end
21
+
22
+ desc 'make a release'
23
+ task :release do
24
+ exec './script/release'
25
+ end
26
+
27
+ task c: :console # alias 'c' for console
28
+ task d: :dummy
29
+ task default: :test
data/lib/tagmemics.rb ADDED
@@ -0,0 +1,35 @@
1
+ require_relative './tagmemics/word'
2
+ require_relative './tagmemics/sentence'
3
+
4
+
5
+ module Lexicon
6
+ def self.parse(str)
7
+ ParsedSentence.new(str)
8
+ end
9
+ end
10
+
11
+ # The output of Lexicon.parse
12
+ class ParsedSentence
13
+ attr_accessor :nouns, :verbs, :articles, :adjectives, :adverbs,
14
+ :prepositions, :conjunctions, :pronouns
15
+
16
+ def initialize(str)
17
+ @str = str
18
+ end
19
+
20
+ def sentence_to_array(sentence)
21
+ sentence.split(/\W/)
22
+ end
23
+
24
+ def start_hash(arr)
25
+ arr.map do |word|
26
+ result =
27
+ case
28
+ when part_of_speech(ARTICLES, word).any? then :article
29
+ when part_of_speech(CONJUNCTIONS, word).any? then :conjunction
30
+ when part_of_speech(PRONOUNS, word).any? then :pronoun
31
+ end
32
+ [word, result]
33
+ end.to_h
34
+ end
35
+ end
@@ -0,0 +1,51 @@
1
+ # Retrieves data from config folder to save to constants.
2
+ module Config
3
+ def self.config_path
4
+ File.join(File.dirname(__FILE__), '../../config')
5
+ end
6
+
7
+ # Returns the path to the word list for the given part of speech
8
+ def self.list_path(part_of_speech)
9
+ File.join(config_path, "#{part_of_speech}.txt")
10
+ end
11
+
12
+ def self.list_contents(part_of_speech)
13
+ File.new(list_path(part_of_speech), 'r:utf-8').read
14
+ end
15
+
16
+ def self.contents_to_a(part_of_speech)
17
+ list_contents(part_of_speech).split("\n")
18
+ end
19
+
20
+ def self.update_list(part_of_speech, uri, css_selector)
21
+ require 'mechanize'
22
+
23
+ agent = Mechanize.new
24
+ page = agent.get(uri)
25
+ destination = "./config/#{part_of_speech}.txt"
26
+ target = page.search(css_selector)
27
+ regx = /[^'a-zA-Z\s]/ # anything beside letters, apostrophe or space
28
+
29
+ arr = []
30
+ target.each do |x|
31
+ x = x.text
32
+ x.gsub(/\r\n\s/, "\n").split("\n").each do |word|
33
+ next if arr.include? word
34
+ next if regx =~ word
35
+ arr << word
36
+ end
37
+ end
38
+ arr.sort!
39
+
40
+ puts "Starting list from #{uri}"
41
+ puts "There are #{arr.count} #{part_of_speech} to save."
42
+
43
+ File.open(destination, 'w') do |line|
44
+ arr.each do |word|
45
+ line << word + "\n"
46
+ end
47
+ end
48
+
49
+ puts 'Save complete.'
50
+ end
51
+ end
File without changes
@@ -0,0 +1,69 @@
1
+ require 'wordnet'
2
+ require 'facets'
3
+ require_relative './config'
4
+ require_relative './word/wordnet'
5
+
6
+ module Lexicon
7
+ class Word
8
+ include Config
9
+
10
+ ARTICLES = %w(the an a)
11
+ CONJUNCTIONS = %w(for and nor but or yet so )
12
+ PRONOUNS = Config.contents_to_a('pronouns')
13
+
14
+
15
+ def part_of_speech(constant, str)
16
+ arr = []
17
+ constant.each do |word|
18
+ regx = /\b#{word}\b/i
19
+ arr << word if regx =~ str # word phrase matches
20
+ end
21
+ arr
22
+ end
23
+
24
+ def decimal_complete(hsh)
25
+ total = hsh.length
26
+ complete = hsh.count { |_k, v| v } # not nil
27
+ complete / total.to_f
28
+ end
29
+
30
+ def initialize(word)
31
+ @word = word
32
+ @confidence_levels = confidence_levels(word)
33
+ end
34
+
35
+ def confidence_levels(word)
36
+ {
37
+ :noun => noun_confidence(word),
38
+ :verb => verb_confidence(word),
39
+ :adjective => adjective_confidence(word),
40
+ :adverb => adverb_confidence(word),
41
+ :article => article_confidence(word),
42
+ :preposition => preposition_confidence(word),
43
+ :conjunction => conjunction_confidence(word)
44
+ }
45
+ end
46
+
47
+ def noun_confidence(str)
48
+ (WordNet.orig_probability(str) / 1) * 3
49
+ end
50
+
51
+ def verb_confidence(str)
52
+ end
53
+
54
+ def adjective_confidence(str)
55
+ end
56
+
57
+ def adverb_confidence(str)
58
+ end
59
+
60
+ def article_confidence(str)
61
+ end
62
+
63
+ def preposition_confidence(str)
64
+ end
65
+
66
+ def conjunction_confidence(str)
67
+ end
68
+ end
69
+ end
@@ -0,0 +1,43 @@
1
+ require 'wordnet'
2
+ require 'facets'
3
+
4
+ puts "you're at the right place"
5
+
6
+ module Lexicon
7
+ module WordNet
8
+ class << self
9
+ def lex
10
+ ::WordNet::Lexicon.new # leading :: reaches the wordnet gem, not Lexicon::WordNet
11
+ end
12
+
13
+ def word_definitions(word)
14
+ word = word.to_sym
15
+ lex.lookup_synsets(word)
16
+ end
17
+
18
+ def parts_of_speech_frequency(word, arr = [])
19
+ word_definitions(word).each do |x|
20
+ arr << x.part_of_speech
21
+ end
22
+ arr.frequency
23
+ end
24
+
25
+ def total_possibilities(word)
26
+ parts_of_speech_frequency(word).values.reduce(:+)
27
+ end
28
+
29
+ def orig_probability(word)
30
+ hsh = parts_of_speech_frequency(word)
31
+ denom = total_possibilities(word)
32
+
33
+ hsh.each { |k, v| hsh[k] = v / denom.to_f }
34
+ end
35
+
36
+ def most_likely(word)
37
+ hsh = orig_probability(word)
38
+ max = hsh.values.max
39
+ hsh.select { |_k, v| v == max }
40
+ end
41
+ end
42
+ end
43
+ end
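The frequency-to-probability step in the module above can be tried without WordNet or facets. This is a dependency-free sketch, not the gem's code: `frequency`, `probabilities`, and `most_likely` are stand-in helpers that take a plain array of part-of-speech symbols instead of synsets, and the input arrays are made-up examples.

```ruby
# Counts how often each tag occurs (a stand-in for facets' Array#frequency).
def frequency(tags)
  tags.each_with_object(Hash.new(0)) { |tag, counts| counts[tag] += 1 }
end

# Divides each count by the total so the values sum to 1.
def probabilities(tags)
  counts = frequency(tags)
  total = counts.values.reduce(0, :+).to_f
  counts.map { |tag, count| [tag, count / total] }.to_h
end

# Keeps every tag that shares the highest probability (ties are preserved).
def most_likely(tags)
  probs = probabilities(tags)
  max = probs.values.max
  probs.select { |_tag, prob| prob == max }
end

p probabilities(%i[noun noun verb adjective])
p most_likely(%i[noun noun verb adjective])
```

Returning a hash from `most_likely` rather than a single tag keeps ties visible, which matters for words whose senses are evenly split across parts of speech.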
metadata ADDED
@@ -0,0 +1,162 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: tagmemics
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.0.beta
5
+ platform: ruby
6
+ authors:
7
+ - John Mason
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-10-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: facets
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '3.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '3.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: wordnet-defaultdb
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: shoulda
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.5'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.5'
55
+ - !ruby/object:Gem::Dependency
56
+ name: shoulda-context
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '1.2'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '1.2'
69
+ - !ruby/object:Gem::Dependency
70
+ name: minitest-reporters
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.1'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.1'
83
+ - !ruby/object:Gem::Dependency
84
+ name: pry
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '0.9'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '0.9'
97
+ - !ruby/object:Gem::Dependency
98
+ name: rake
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - "~>"
102
+ - !ruby/object:Gem::Version
103
+ version: '10.3'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - "~>"
109
+ - !ruby/object:Gem::Version
110
+ version: '10.3'
111
+ - !ruby/object:Gem::Dependency
112
+ name: mechanize
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - "~>"
116
+ - !ruby/object:Gem::Version
117
+ version: '2.7'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - "~>"
123
+ - !ruby/object:Gem::Version
124
+ version: '2.7'
125
+ description: One day this will be great. Until then, it will be a project.
126
+ email: mace2345@gmail.com
127
+ executables: []
128
+ extensions: []
129
+ extra_rdoc_files: []
130
+ files:
131
+ - README.md
132
+ - Rakefile
133
+ - lib/tagmemics.rb
134
+ - lib/tagmemics/config.rb
135
+ - lib/tagmemics/sentence.rb
136
+ - lib/tagmemics/word.rb
137
+ - lib/tagmemics/word/wordnet.rb
138
+ homepage: http://github.com/m8ss/tagmemics
139
+ licenses:
140
+ - MIT
141
+ metadata: {}
142
+ post_install_message:
143
+ rdoc_options: []
144
+ require_paths:
145
+ - lib
146
+ required_ruby_version: !ruby/object:Gem::Requirement
147
+ requirements:
148
+ - - ">="
149
+ - !ruby/object:Gem::Version
150
+ version: '0'
151
+ required_rubygems_version: !ruby/object:Gem::Requirement
152
+ requirements:
153
+ - - ">"
154
+ - !ruby/object:Gem::Version
155
+ version: 1.3.1
156
+ requirements: []
157
+ rubyforge_project:
158
+ rubygems_version: 2.4.5
159
+ signing_key:
160
+ specification_version: 4
161
+ summary: A more organized way of accessing a language.
162
+ test_files: []