words 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -2,11 +2,43 @@
 
  ## About ##
 
- Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and a FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatability and blistering speed. I have attempted to provide ease of use in the form of a simple yet powerful api and installation is a sintch, we even include the data in it's tokyo data format (subject to the original wordnet licencing.)
+ Words implements a fast interface to [WordNet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI-powered backend behind the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo) to provide cross-ruby-distribution compatibility and blistering speed. The pure ruby backend operates on a ruby-optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!
 
- ## Installation ##
+ ## Pre-Installation ##
 
- First ensure you have [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy...
+ First ensure you have a copy of the WordNet data files. These are generally available from your Linux/OSX package manager:
+
+     # Ubuntu
+     sudo apt-get install wordnet-base
+
+     # Fedora/RHEL
+     sudo yum install wordnet
+
+     # MacPorts
+     sudo port install wordnet
+
+ or you can simply download and install them yourself (Unix/OSX):
+
+     wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+     sudo mkdir /usr/local/share/wordnet
+     sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+ or (Windows):
+
+     Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+     Unzip
+
+ ## For the Tokyo Backend Only ##
+
+ Unless you want to use the tokyo backend, you are now ready to install Words and build the data. If you do want the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+     wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+     cd tokyocabinet-1.4.41/
+     ./configure
+     make
+     sudo make install
+
+ ## Gem Installation ##
 
  After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
  Then you're ready to rock and roll. :)
 
- ## Build Data (Optional) ##
+ ## Build Data ##
 
- If you want to build the wordnet dataset file yourself, from the original wordnet files, you can use the bundled "build_dataset.rb"
+ To build the wordnet dataset (or, for the pure backend, the index) file yourself from the original wordnet files, you can use the bundled "build_wordnet" command:
 
-     ./build_dataset.rb -h # this will give you the usage
-     sudo ./build_dataset.rb # this will attempt to build the data, locating the original wordnet files through a search...
+     build_wordnet -h # this will give you the usage information
+     sudo build_wordnet -v --build-tokyo # this builds the tokyo backend data, locating the original wordnet files through a search...
+     sudo build_wordnet -v --build-pure # this builds the pure backend index, locating the original wordnet files through a search...
 
  ## Usage ##
 
  Here's a few little examples of using words within your programs.
 
-
      require 'rubygems'
      require 'words'
 
-     data = Words::Words.new
+     data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
      # locate a word
      lemma = data.find("bat")
@@ -45,6 +77,7 @@ Here's a few little examples of using words within your programs.
      lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
      # or
      lemma.nouns # => array of synsets which represent nouns of the lemma bat
+     lemma.noun_ids # => array of synset ids which represent nouns of the lemma bat
      lemma.verbs? # => true
 
      # specify a sense
@@ -53,6 +86,7 @@ Here's a few little examples of using words within your programs.
 
      sense.gloss # => a club used for hitting a ball in various games
      sense2.words # => ["cricket bat", "bat"]
+     sense2.lexical_description # => a description of the lexical meaning of the synset
      sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
      sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Here's a few little examples of using words within your programs.
      sense.derivationally_related_forms.first.source_word # => "bat"
      sense.derivationally_related_forms.first.destination_word # => "bat"
      sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+ These and more examples are available in the examples.rb file!
 
  ## Note on Patches/Pull Requests ##
 
data/Rakefile CHANGED
@@ -10,9 +10,10 @@ begin
     gem.email = "roja@arbia.co.uk"
     gem.homepage = "http://github.com/roja/words"
     gem.authors = ["Roja Buck"]
-    gem.add_development_dependency "trollop", ">= 1.15"
+    gem.add_dependency "trollop", ">= 1.15"
     gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-    # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+    gem.executables = [ "build_wordnet" ]
+    gem.default_executable = "build_wordnet"
   end
   Jeweler::GemcutterTasks.new
 rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.2.0
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
- require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+     opt :build_tokyo, "Build the tokyo dataset?", :default => false
+     opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+   Trollop::die :build_tokyo, "Either the tokyo dataset or the pure ruby dataset is required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-   hash = Rufus::Tokyo::Table.new("data/wordnet.tct")
+   index_hash = Hash.new
+   data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-       hash[lemma] = { "synset_ids" => '', "tagsense_counts" => '' } if hash[lemma].nil?
+       index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+       index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-       hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                       "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-     puts "Adding #{file_pos} data..." if VERBOSE
-
-     # add data
-     (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-       next if data_line[0, 2] == "  "
-       data_line, gloss = data_line.split(" | ")
-       data_parts = data_line.split(" ")
-
-       synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-       words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-       relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+     if opts[:build_tokyo]
+       puts "Building #{file_pos} data..." if VERBOSE
 
-       hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
-                           "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss }
+       # add data
+       (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+         next if data_line[0, 2] == "  "
+         data_line, gloss = data_line.split(" | ")
+         data_parts = data_line.split(" ")
+
+         synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+         words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+         relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+         data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                  "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+       end
     end
 
-
   end
-   hash.close
+
+   if opts[:build_tokyo]
+     tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+     index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+     data_hash.each { |k,v| tokyo_hash[k] = v }
+     tokyo_hash.close
+   end
+
+   if opts[:build_pure]
+     index = Hash.new
+     index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+     File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+       file.write Marshal.dump(index)
+     end
+   end
+
 
 end
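The index-building loop above leans on the fixed layout of WordNet's index.* files. A minimal standalone sketch of that parse, using the same pos-prefixing convention as build_wordnet; the sample line below is illustrative, not taken from the real database:

```ruby
# A hypothetical line in WordNet's index.noun layout:
# lemma pos synset_cnt p_cnt [ptr_symbols...] sense_cnt tagsense_cnt offsets...
line = "bat n 5 3 @ ~ %p 5 4 02806379 02139199 05220126 02139671 03053474"

parts = line.split(" ")
lemma = parts.shift
pos = parts.shift
synset_count = parts.shift.to_i
pointer_count = parts.shift.to_i
pointers = Array.new(pointer_count).map { parts.shift } # pointer symbols for this lemma
sense_count = parts.shift.to_i
tagsense_count = pos + parts.shift                      # pos-prefixed, e.g. "n4"
synset_ids = Array.new(synset_count).map { pos + parts.shift } # e.g. "n02806379"
```

Prefixing each offset with its part of speech is what lets the gem later filter synset ids by their first character.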
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-   wordnet = Words::Words.new
+   wordnet = Words::Words.new # :pure
+
+   puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+   puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-   puts wordnet.find('bat').synsets('noun').last.words.inspect
-   puts wordnet.find('bat').synsets('noun').last.relations
+   puts wordnet.find('bat').noun_ids
+   puts wordnet.find('bat').synsets(:noun).last.words.inspect
+   puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-   puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
 
+   puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+   puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-     def self.wordnet_connection
-       @@wordnet_connection
+     SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+     attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+     def initialize(type, path, wordnet_path)
+       @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+       @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+       @connection_type = type
+
+       if @data_path.exist?
+         if @connection_type == :tokyo
+           @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+           @connected = true
+         elsif @connection_type == :pure
+           # open the index if it's there
+           File.open(@data_path,'r') do |file|
+             @connection = Marshal.load file.read
+           end
+           # search for the wordnet files
+           if locate_wordnet?(wordnet_path)
+             @connected = true
+           else
+             @connected = false
+             raise "Failed to locate the wordnet database. Please ensure it is installed and that, if it resides at a custom path, that path is given as an argument when constructing the Words object."
+           end
+         else
+           @connected = false
+         end
+       else
+         @connected = false
+         raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please ensure you have created it using the words gem's provided 'build_wordnet' command."
+       end
+
+     end
+
+     def close
+       if @connected && connection_type == :tokyo
+         @connection.close
+       end
+       @connected = false
+       return true
+     end
+
+     def lemma(term)
+       if connection_type == :pure
+         raw_lemma = @connection[term]
+         { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2] }
+       else
+         @connection[term]
+       end
+     end
+
+     def synset(synset_id)
+       if connection_type == :pure
+         pos = synset_id[0,1]
+         File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}","r") do |file|
+           file.seek(synset_id[1..-1].to_i)
+           data_line, gloss = file.readline.strip.split(" | ")
+           data_parts = data_line.split(" ")
+           synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+           words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+           relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+           { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+         end
+       else
+         @connection[synset_id]
+       end
     end
 
-     def self.wordnet_connection=(x)
-       @@wordnet_connection = x
+     def locate_wordnet?(base_dirs)
+
+       base_dirs = case base_dirs
+       when :search
+         ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+       else
+         [ base_dirs ]
+       end
+
+       base_dirs.each do |dir|
+         ["", "dict"].each do |sub_folder|
+           path = Pathname.new(dir) + sub_folder
+           @wordnet_dir = path if (path + "data.noun").exist?
+           break if !@wordnet_dir.nil?
+         end
+       end
+
+       return !@wordnet_dir.nil?
+
     end
 
   end
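In pure mode, `synset` seeks to a byte offset in the relevant data.* file and re-parses the line it finds there. A standalone sketch of that line parse over an illustrative sample (the offset, filenum, and gloss below are made up, not real WordNet data); note the word count field is hexadecimal:

```ruby
# A hypothetical line in WordNet's data.noun layout:
# offset lex_filenum ss_type w_cnt word lex_id [word lex_id...] p_cnt [ptrs...] | gloss
data_line = "02806379 06 n 01 bat 0 000 | a club used for hitting a ball in various games"

line, gloss = data_line.split(" | ")
parts = line.split(" ")
synset_id = "n" + parts.shift      # pos-prefixed offset, e.g. "n02806379"
lexical_filenum = parts.shift      # index into the lexicographer-file table
synset_type = parts.shift          # "n" for noun
word_count = parts.shift.to_i(16)  # word count is a 2-digit hex field
words = Array.new(word_count).map { "#{parts.shift}.#{parts.shift}" } # word.lex_id pairs
relation_count = parts.shift.to_i
```

Seeking by offset is what makes the pure backend workable without loading the data files into memory: the pos-prefixed offset in the index doubles as a byte address into the matching data file.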
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-     def initialize(relation_construct, source_synset)
+     def initialize(relation_construct, source_synset, wordnet_connection)
+       @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
       @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-       @destination = Synset.new(@dest_synset_id) unless defined? @destination
+       @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-     def initialize(synset_id)
-       @synset_hash = WordnetConnection::wordnet_connection[synset_id]
+     NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                    { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                    { :lex => :adv_all, :description => "all adverbs" },
+                    { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                    { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                    { :lex => :noun_animal, :description => "nouns denoting animals" },
+                    { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                    { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                    { :lex => :noun_body, :description => "nouns denoting body parts" },
+                    { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                    { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                    { :lex => :noun_event, :description => "nouns denoting natural events" },
+                    { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                    { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                    { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                    { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                    { :lex => :noun_motive, :description => "nouns denoting goals" },
+                    { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                    { :lex => :noun_person, :description => "nouns denoting people" },
+                    { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                    { :lex => :noun_plant, :description => "nouns denoting plants" },
+                    { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                    { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                    { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                    { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                    { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                    { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                    { :lex => :noun_substance, :description => "nouns denoting substances" },
+                    { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                    { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                    { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                    { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                    { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                    { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                    { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                    { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                    { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                    { :lex => :verb_emotion, :description => "verbs of feeling" },
+                    { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                    { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                    { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                    { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                    { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                    { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                    { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+     def initialize(synset_id, wordnet_connection)
+       @wordnet_connection = wordnet_connection
+       @synset_hash = wordnet_connection.synset(synset_id)
       # construct some convenience methods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
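The `define_method` trick used in the initializer above generates one predicate per relation type at construction time. A standalone sketch of the same pattern on a hypothetical stand-in class (`RelationHolder` and its two relation types are illustrative, not part of the gem):

```ruby
# Generate "#{type}s?" predicates the way Words does, but on a toy class.
class RelationHolder
  RELATION_TYPES = [:hyponym, :hypernym]

  def initialize(relation_types)
    @relation_types = relation_types
    RELATION_TYPES.each do |relation_type|
      # defines hyponyms? and hypernyms? on the class; the block closes
      # over relation_type, and @relation_types resolves per instance
      self.class.send(:define_method, "#{relation_type}s?") do
        @relation_types.include?(relation_type)
      end
    end
  end
end

holder = RelationHolder.new([:hyponym])
holder.hyponyms?  # true
holder.hypernyms? # false
```

Because the block closes over `relation_type` while `@relation_types` is looked up on the receiver, one loop yields a correct predicate for every relation symbol.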
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+     def lexical_filenum
+       @synset_hash["lexical_filenum"].to_i
+     end
+
+     def lexical_catagory
+       lexical[:lex]
+     end
+
+     def lexical_description
+       lexical[:description]
+     end
+
+     def lexical
+       NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+     end
+
     def synset_id
       @synset_hash["synset_id"]
     end
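The lexical accessors above are a plain positional lookup: the synset's `lexical_filenum` indexes into the NUM_TO_LEX table. A minimal sketch using a two-entry slice of that table (indices 5 and 6, matching the listing above):

```ruby
# Slice of the NUM_TO_LEX table keyed by index for illustration
NUM_TO_LEX_SAMPLE = {
  5 => { :lex => :noun_animal,   :description => "nouns denoting animals" },
  6 => { :lex => :noun_artifact, :description => "nouns denoting man-made objects" }
}

filenum = "06" # lexical_filenum is stored zero-padded in the synset hash
entry = NUM_TO_LEX_SAMPLE[filenum.to_i]
entry[:lex]         # :noun_artifact
entry[:description] # "nouns denoting man-made objects"
```

`to_i` drops the zero padding, so the stored string field maps straight onto the table index.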
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-       @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+       @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
       case
       when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
         @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-     def initialize(lemma_hash)
-       @lemma_hash = lemma_hash
+     def initialize(raw_lemma, wordnet_connection)
+       @wordnet_connection = wordnet_connection
+       @lemma_hash = raw_lemma
       # construct some convenience methods for part-of-speech access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+         self.class.send(:define_method, "#{pos}_ids") do
+           synset_ids(pos)
+         end
       end
     end
 
+     def tagsense_counts
+       @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+       @tagsense_counts
+     end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
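The `tagsense_counts` method above unpacks the pos-prefixed, pipe-delimited counts that build_wordnet stored. A standalone sketch of that one-liner with an illustrative stored value:

```ruby
POS_TO_SYMBOL = { "n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb }

# stored form: one "#{pos}#{count}" token per part of speech, pipe-delimited
stored = "n4|v1" # sample value, not from the real database
counts = stored.split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } }
counts # [{ :noun => 4 }, { :verb => 1 }]
```

The single-character pos prefix keeps the stored string compact while still round-tripping into symbols on read.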
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-       relevent_synsets = case
+       synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+     end
+
+     def synset_ids(pos = :all)
+       @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+       case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-         synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+         @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
       when POS_TO_SYMBOL.include?(pos.to_s)
-         synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+         @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
       else
-         synset_ids
+         @synset_ids
       end
-       relevent_synsets.map { |synset_id| Synset.new synset_id }
-     end
-
-     def synset_ids
-       @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-       @synset_ids
     end
 
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+     alias pos available_pos
 
   end
 
   class Words
 
-     def initialize(path = 'data/wordnet.tct')
-       if (Pathname.new path).exist?
-         WordnetConnection::wordnet_connection = Rufus::Tokyo::Table.new(path)
-       else
-         abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-       end
+     @wordnet_connection = nil
+
+     def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+       @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-       Lemma.new WordnetConnection::wordnet_connection[word]
+       Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+     end
+
+     def connection_type
+       @wordnet_connection.connection_type
+     end
+
+     def wordnet_dir
+       @wordnet_connection.wordnet_dir
     end
 
     def close
-       WordnetConnection::wordnet_connection.close
+       @wordnet_connection.close
+     end
+
+     def connected
+       @wordnet_connection.connected
+     end
+
+     def to_s
+       return "Words not connected" if !connected
+       return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+       return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
   end
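The pure backend's whole persistence story is a Marshal round trip: build_wordnet dumps a `lemma => [lemma, tagsense_counts, synset_ids]` hash, and WordnetConnection loads it back on construction. A standalone sketch of that round trip through a temp file (the index contents here are made up):

```ruby
require 'tempfile'

# Illustrative index in the shape build_wordnet dumps
index = { "bat" => ["bat", "n4|v1", "n02806379|v01413191"] }

file = Tempfile.new('index.dmp')
file.write Marshal.dump(index)   # build side: serialise the whole hash
file.rewind
loaded = Marshal.load(file.read) # load side: pure-mode WordnetConnection
file.close
```

Marshal keeps the load path dependency-free, which is what lets the pure backend run on any ruby distribution without Tokyo Cabinet.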
@@ -0,0 +1,60 @@
+ # Generated by jeweler
+ # DO NOT EDIT THIS FILE DIRECTLY
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |s|
+   s.name = %q{words}
+   s.version = "0.2.0"
+
+   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+   s.authors = ["Roja Buck"]
+   s.date = %q{2010-01-16}
+   s.default_executable = %q{build_wordnet}
+   s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+   s.email = %q{roja@arbia.co.uk}
+   s.executables = ["build_wordnet"]
+   s.extra_rdoc_files = [
+     "LICENSE",
+     "README.markdown"
+   ]
+   s.files = [
+     ".gitignore",
+     "LICENSE",
+     "README.markdown",
+     "Rakefile",
+     "VERSION",
+     "bin/build_wordnet",
+     "examples.rb",
+     "lib/words.rb",
+     "test/helper.rb",
+     "test/test_words.rb",
+     "words.gemspec"
+   ]
+   s.homepage = %q{http://github.com/roja/words}
+   s.rdoc_options = ["--charset=UTF-8"]
+   s.require_paths = ["lib"]
+   s.rubygems_version = %q{1.3.5}
+   s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+   s.test_files = [
+     "test/test_words.rb",
+     "test/helper.rb"
+   ]
+
+   if s.respond_to? :specification_version then
+     current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+     s.specification_version = 3
+
+     if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+       s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+       s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+     else
+       s.add_dependency(%q<trollop>, [">= 1.15"])
+       s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+     end
+   else
+     s.add_dependency(%q<trollop>, [">= 1.15"])
+     s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+   end
+ end
+
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
- date: 2010-01-14 00:00:00 +00:00
- default_executable:
+ date: 2010-01-16 00:00:00 +00:00
+ default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-   type: :development
+   type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
   version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
- executables: []
-
+ executables:
+ - build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
- - build_dataset.rb
- - data/wordnet.tct
+ - bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+ - words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
Binary file