words 0.1.0 → 0.2.0

data/README.markdown CHANGED
@@ -2,11 +2,43 @@
 
 ## About ##
 
-Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and a FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatability and blistering speed. I have attempted to provide ease of use in the form of a simple yet powerful api and installation is a sintch, we even include the data in it's tokyo data format (subject to the original wordnet licencing.)
+Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI powered backend over the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatibility and blistering speed. The pure ruby backend operates on a special ruby optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!
 
-## Installation ##
+## Pre-Installation ##
 
-First ensure you have [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy...
+First ensure you have a copy of the wordnet data files. These are generally available from your Linux/OSX package manager:
+
+    # Ubuntu
+    sudo apt-get install wordnet-base
+
+    # Fedora/RHEL
+    sudo yum install wordnet
+
+    # MacPorts
+    sudo port install wordnet
+
+or you can simply download and install them yourself (Unix/OSX):
+
+    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    sudo mkdir /usr/local/share/wordnet
+    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+or (Windows): download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz and unzip it.
+
+## For Tokyo Backend Only ##
+
+If you only want the pure ruby backend you are now ready to install Words and build the data. If you want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+    tar -xzf tokyocabinet-1.4.41.tar.gz
+    cd tokyocabinet-1.4.41/
+    ./configure
+    make
+    sudo make install
+
+## GEM Installation ##
 
 After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
 Then you're ready to rock and roll. :)
 
-## Build Data (Optional) ##
+## Build Data ##
 
-If you want to build the wordnet dataset file yourself, from the original wordnet files, you can use the bundled "build_dataset.rb"
+To build the wordnet dataset (or the index for the pure backend) yourself, from the original wordnet files, you can use the bundled "build_wordnet" command:
 
-    ./build_dataset.rb -h #this will give you the usage
-    sudo ./build_dataset.rb #this will attempt to build the data locating the original wordnet files through a search...
+    build_wordnet -h # this will give you the usage information
+    sudo build_wordnet -v --build-tokyo # this builds the tokyo backend data, locating the original wordnet files through a search...
+    sudo build_wordnet -v --build-pure # this builds the pure backend index, locating the original wordnet files through a search...
 
 ## Usage ##
 
 Here's a few little examples of using words within your programs.
 
-
     require 'rubygems'
     require 'words'
 
-    data = Words::Words.new
+    data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
     # locate a word
     lemma = data.find("bat")
@@ -45,6 +77,7 @@ Here's a few little examples of using words within your programs.
     lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
     # or
     lemma.nouns # => array of synsets which represent nouns of the lemma bat
+    lemma.noun_ids # => array of synset ids which represent nouns of the lemma bat
     lemma.verbs? # => true
 
     # specify a sense
@@ -53,6 +86,7 @@ Here's a few little examples of using words within your programs.
 
     sense.gloss # => a club used for hitting a ball in various games
     sense2.words # => ["cricket bat", "bat"]
+    sense2.lexical_description # => a description of the lexical meaning of the synset
     sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
     sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Here's a few little examples of using words within your programs.
     sense.derivationally_related_forms.first.source_word # => "bat"
     sense.derivationally_related_forms.first.destination_word # => "bat"
     sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+These and more examples are available in the examples.rb file!
 
 ## Note on Patches/Pull Requests ##
 
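The per-POS helpers in the README above (`lemma.nouns`, `lemma.noun_ids`, `lemma.verbs?`) all rest on WordNet's synset id convention: a part-of-speech letter (`n`, `v`, `a`, `r`) prefixed to the synset's offset in the data file. A standalone sketch of that filtering, using ids that appear elsewhere in this README (the grouping of them under one lemma is illustrative):

```ruby
# WordNet synset ids as used by words: POS letter + file offset, e.g. "n02806379".
POS_TO_SYMBOL = { "n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb }

# Illustrative id list for a lemma; the ids come from the README examples.
synset_ids = ["n02806379", "n03053474", "v01413191"]

# Selecting by part of speech is just a filter on the first character,
# which is essentially what noun_ids / verb_ids do for a Lemma.
noun_ids = synset_ids.select { |id| POS_TO_SYMBOL[id[0, 1]] == :noun }
verb_ids = synset_ids.select { |id| POS_TO_SYMBOL[id[0, 1]] == :verb }

puts noun_ids.inspect # => ["n02806379", "n03053474"]
puts verb_ids.inspect # => ["v01413191"]
```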
data/Rakefile CHANGED
@@ -10,9 +10,10 @@ begin
   gem.email = "roja@arbia.co.uk"
   gem.homepage = "http://github.com/roja/words"
   gem.authors = ["Roja Buck"]
-  gem.add_development_dependency "trollop", ">= 1.15"
+  gem.add_dependency "trollop", ">= 1.15"
   gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-  # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+  gem.executables = [ "build_wordnet" ]
+  gem.default_executable = "build_wordnet"
 end
 Jeweler::GemcutterTasks.new
rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION CHANGED
@@ -1 +1 @@
-0.1.0
+0.2.0
data/bin/build_wordnet CHANGED
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
-require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+    opt :build_tokyo, "Build the tokyo dataset?", :default => false
+    opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+  Trollop::die :build_tokyo, "either the tokyo dataset or the pure ruby dataset is required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-  hash = Rufus::Tokyo::Table.new("data/wordnet.tct")
+  index_hash = Hash.new
+  data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-      hash[lemma] = { "synset_ids" => '', "tagsense_counts" => '' } if hash[lemma].nil?
+      index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+      index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-      hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                      "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-    puts "Adding #{file_pos} data..." if VERBOSE
-
-    # add data
-    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-      next if data_line[0, 2] == "  "
-      data_line, gloss = data_line.split(" | ")
-      data_parts = data_line.split(" ")
-
-      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+    if opts[:build_tokyo]
+      puts "Building #{file_pos} data..." if VERBOSE
 
-      hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
-                          "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss }
+      # add data
+      (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+        next if data_line[0, 2] == "  "
+        data_line, gloss = data_line.split(" | ")
+        data_parts = data_line.split(" ")
+
+        synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+        words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+        relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+        data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                 "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+      end
     end
 
-
   end
-  hash.close
+
+  if opts[:build_tokyo]
+    tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+    index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+    data_hash.each { |k,v| tokyo_hash[k] = v }
+    tokyo_hash.close
+  end
+
+  if opts[:build_pure]
+    index = Hash.new
+    index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+    File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+      file.write Marshal.dump(index)
    end
+  end
 
 end
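The data-file parsing that build_wordnet does above is easier to follow in isolation. A standalone sketch of the same `data.*` line format; the sample line here is fabricated for illustration, though it follows the real WordNet layout (hexadecimal word count, pipe-separated gloss):

```ruby
# A fabricated but format-faithful WordNet data.noun line:
# offset lex_filenum ss_type w_cnt (word lex_id)* p_cnt (pointer)* | gloss
data_line = "02806379 06 n 02 bat 0 cricket_bat 0 001 @ 03053474 n 0000 | a club used for hitting a ball in various games"

data_line, gloss = data_line.split(" | ")
data_parts = data_line.split(" ")

# First fields: synset id gets the POS letter prefixed; the word count is hex.
synset_id       = "n" + data_parts.shift
lexical_filenum = data_parts.shift
synset_type     = data_parts.shift
word_count      = data_parts.shift.to_i(16)

# Words arrive as word/lex_id pairs; relations as 4-field pointer tuples.
words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }

puts synset_id         # => "n02806379"
puts words.inspect     # => ["bat.0", "cricket_bat.0"]
puts relations.inspect # => ["@.03053474.n.0000"]
puts gloss
```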
data/examples.rb CHANGED
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-  wordnet = Words::Words.new
+  wordnet = Words::Words.new # or Words::Words.new(:pure)
+
+  puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+  puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-  puts wordnet.find('bat').synsets('noun').last.words.inspect
-  puts wordnet.find('bat').synsets('noun').last.relations
+  puts wordnet.find('bat').noun_ids
+  puts wordnet.find('bat').synsets(:noun).last.words.inspect
+  puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-  puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+  puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
 
+  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
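The pure ruby backend exercised above round-trips its index through `Marshal`: `build_wordnet --build-pure` dumps a Hash of `lemma => [lemma, tagsense_counts, synset_ids]`, and the library loads it back and rebuilds the lemma hash. A standalone sketch of that round trip (the index values here are illustrative, not real WordNet data):

```ruby
require 'tempfile'

# Shape of the pure-ruby index: lemma => [lemma, tagsense_counts, synset_ids],
# with the multi-valued fields pipe-joined, as build_wordnet writes them.
index = { "bat" => ["bat", "n5|v0", "n02806379|v01413191"] }

file = Tempfile.new('index.dmp')
file.write Marshal.dump(index)   # written once at build time by --build-pure
file.rewind
loaded = Marshal.load(file.read) # read back at runtime by the pure backend
file.close

# Rebuilding the lemma hash the way WordnetConnection#lemma does.
raw_lemma = loaded["bat"]
lemma_hash = { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2] }
puts lemma_hash['synset_ids'] # => "n02806379|v01413191"
```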
data/lib/words.rb CHANGED
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-    def self.wordnet_connection
-      @@wordnet_connection
+    SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+    attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+    def initialize(type, path, wordnet_path)
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+      @connection_type = type
+
+      if @data_path.exist?
+        if @connection_type == :tokyo
+          @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+          @connected = true
+        elsif @connection_type == :pure
+          # open the index if it is there
+          File.open(@data_path, 'r') do |file|
+            @connection = Marshal.load file.read
+          end
+          # search for the wordnet files
+          if locate_wordnet?(wordnet_path)
+            @connected = true
+          else
+            @connected = false
+            raise "Failed to locate the wordnet database. Please ensure it is installed and that if it resides at a custom path that path is given as an argument when constructing the Words object."
+          end
+        else
+          @connected = false
+        end
+      else
+        @connected = false
+        raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please ensure you have created it using the words gem's provided 'build_wordnet' command."
+      end
+    end
+
+    def close
+      # close the backend connection before marking ourselves disconnected
+      @connection.close if @connected && connection_type == :tokyo
+      @connected = false
+      return true
+    end
+
+    def lemma(term)
+      if connection_type == :pure
+        raw_lemma = @connection[term]
+        { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2] }
+      else
+        @connection[term]
+      end
+    end
+
+    def synset(synset_id)
+      if connection_type == :pure
+        pos = synset_id[0,1]
+        File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}", "r") do |file|
+          file.seek(synset_id[1..-1].to_i)
+          data_line, gloss = file.readline.strip.split(" | ")
+          data_parts = data_line.split(" ")
+          synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+          words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+          relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+          { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+        end
+      else
+        @connection[synset_id]
+      end
     end
 
-    def self.wordnet_connection=(x)
-      @@wordnet_connection = x
+    def locate_wordnet?(base_dirs)
+      base_dirs = case base_dirs
+      when :search
+        ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+      else
+        [ base_dirs ]
+      end
+
+      base_dirs.each do |dir|
+        ["", "dict"].each do |sub_folder|
+          path = Pathname.new(File.join(dir, sub_folder))
+          @wordnet_dir = path if (path + "data.noun").exist?
+          break if !@wordnet_dir.nil?
+        end
+      end
+
+      return !@wordnet_dir.nil?
     end
 
   end
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-    def initialize(relation_construct, source_synset)
+    def initialize(relation_construct, source_synset, wordnet_connection)
+      @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
       @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-      @destination = Synset.new(@dest_synset_id) unless defined? @destination
+      @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-    def initialize(synset_id)
-      @synset_hash = WordnetConnection::wordnet_connection[synset_id]
+    NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                   { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                   { :lex => :adv_all, :description => "all adverbs" },
+                   { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                   { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                   { :lex => :noun_animal, :description => "nouns denoting animals" },
+                   { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                   { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                   { :lex => :noun_body, :description => "nouns denoting body parts" },
+                   { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                   { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                   { :lex => :noun_event, :description => "nouns denoting natural events" },
+                   { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                   { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                   { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                   { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                   { :lex => :noun_motive, :description => "nouns denoting goals" },
+                   { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                   { :lex => :noun_person, :description => "nouns denoting people" },
+                   { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                   { :lex => :noun_plant, :description => "nouns denoting plants" },
+                   { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                   { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                   { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                   { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                   { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                   { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                   { :lex => :noun_substance, :description => "nouns denoting substances" },
+                   { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                   { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                   { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                   { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                   { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                   { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                   { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                   { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                   { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                   { :lex => :verb_emotion, :description => "verbs of feeling" },
+                   { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                   { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                   { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                   { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                   { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                   { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                   { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+    def initialize(synset_id, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @synset_hash = wordnet_connection.synset(synset_id)
       # construct some convenience methods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+    def lexical_filenum
+      @synset_hash["lexical_filenum"].to_i
+    end
+
+    def lexical_catagory
+      lexical[:lex]
+    end
+
+    def lexical_description
+      lexical[:description]
+    end
+
+    def lexical
+      NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+    end
+
     def synset_id
       @synset_hash["synset_id"]
     end
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
      case
      when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
        @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-    def initialize(lemma_hash)
-      @lemma_hash = lemma_hash
+    def initialize(raw_lemma, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @lemma_hash = raw_lemma
       # construct some convenience methods for relation type access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+        self.class.send(:define_method, "#{pos}_ids") do
+          synset_ids(pos)
+        end
       end
     end
 
+    def tagsense_counts
+      @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+      @tagsense_counts
+    end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-      relevent_synsets = case
+      synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+    end
+
+    def synset_ids(pos = :all)
+      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+      case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-        synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+        @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
      when POS_TO_SYMBOL.include?(pos.to_s)
-        synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+        @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
      else
-        synset_ids
+        @synset_ids
      end
-      relevent_synsets.map { |synset_id| Synset.new synset_id }
-    end
-
-    def synset_ids
-      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-      @synset_ids
     end
 
     def inspect
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+    alias pos available_pos
 
   end
 
   class Words
 
-    def initialize(path = 'data/wordnet.tct')
-      if (Pathname.new path).exist?
-        WordnetConnection::wordnet_connection = Rufus::Tokyo::Table.new(path)
-      else
-        abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-      end
+    def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+      @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-      Lemma.new WordnetConnection::wordnet_connection[word]
+      Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+    end
+
+    def connection_type
+      @wordnet_connection.connection_type
+    end
+
+    def wordnet_dir
+      @wordnet_connection.wordnet_dir
     end
 
     def close
-      WordnetConnection::wordnet_connection.close
+      @wordnet_connection.close
+    end
+
+    def connected
+      @wordnet_connection.connected
+    end
+
+    def to_s
+      return "Words not connected" if !connected
+      return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+      return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
   end
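The `nouns` / `nouns?` / `noun_ids` style accessors added to `Lemma` above are not written out per part of speech; they are generated in `initialize` with `define_method`. A stripped-down sketch of the same pattern (the class and id values here are hypothetical stand-ins, not the gem's own code):

```ruby
# Minimal stand-in for Lemma's metaprogrammed per-POS accessors.
class TinyLemma
  SYMBOL_TO_POS = { :noun => "n", :verb => "v" }

  def initialize(synset_ids)
    @synset_ids = synset_ids
    SYMBOL_TO_POS.keys.each do |pos|
      # defines noun_ids, nouns?, verb_ids, verbs? on the class
      self.class.send(:define_method, "#{pos}_ids") { synset_ids(pos) }
      self.class.send(:define_method, "#{pos}s?") { !synset_ids(pos).empty? }
    end
  end

  def synset_ids(pos = :all)
    return @synset_ids if pos == :all
    # ids are POS-letter-prefixed, so POS selection is a first-character filter
    @synset_ids.select { |id| id[0, 1] == SYMBOL_TO_POS[pos] }
  end
end

lemma = TinyLemma.new(["n02806379", "v01413191"])
puts lemma.noun_ids.inspect # => ["n02806379"]
puts lemma.verbs?           # => true
```

Defining the methods inside `initialize` mirrors the gem's choice; it keeps the POS table in one place at the cost of redefining the methods on every construction.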
data/words.gemspec ADDED
@@ -0,0 +1,60 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{words}
+  s.version = "0.2.0"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Roja Buck"]
+  s.date = %q{2010-01-16}
+  s.default_executable = %q{build_wordnet}
+  s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+  s.email = %q{roja@arbia.co.uk}
+  s.executables = ["build_wordnet"]
+  s.extra_rdoc_files = [
+    "LICENSE",
+    "README.markdown"
+  ]
+  s.files = [
+    ".gitignore",
+    "LICENSE",
+    "README.markdown",
+    "Rakefile",
+    "VERSION",
+    "bin/build_wordnet",
+    "examples.rb",
+    "lib/words.rb",
+    "test/helper.rb",
+    "test/test_words.rb",
+    "words.gemspec"
+  ]
+  s.homepage = %q{http://github.com/roja/words}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+  s.test_files = [
+    "test/test_words.rb",
+    "test/helper.rb"
+  ]
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+      s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    else
+      s.add_dependency(%q<trollop>, [">= 1.15"])
+      s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    end
+  else
+    s.add_dependency(%q<trollop>, [">= 1.15"])
+    s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+  end
+end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-14 00:00:00 +00:00
-default_executable:
+date: 2010-01-16 00:00:00 +00:00
+default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-  type: :development
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
   version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
-executables: []
-
+executables:
+- build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
-- build_dataset.rb
-- data/wordnet.tct
+- bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+- words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
Binary file data/wordnet.tct has changed