words 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.markdown +45 -10
- data/Rakefile +4 -3
- data/VERSION +1 -1
- data/{build_dataset.rb → bin/build_wordnet} +38 -20
- data/examples.rb +11 -4
- data/lib/words.rb +200 -30
- data/words.gemspec +60 -0
- metadata +8 -8
- data/data/wordnet.tct +0 -0
data/README.markdown
CHANGED
@@ -2,11 +2,43 @@
 
 ## About ##
 
-Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and
+Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI powered backend over the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatibility and blistering speed. The pure ruby interface operates on a special ruby optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!
 
-## Installation ##
+## Pre-Installation ##
 
-First ensure you have
+First ensure you have a copy of the wordnet data files. This is generally available from your Linux/OSX package manager:
+
+    #Ubuntu
+    sudo apt-get install wordnet-base
+
+    #Fedora/RHL
+    sudo yum update wordnet
+
+    #MacPorts
+    sudo port install wordnet
+
+or you can simply download and install (Unix/OSX):
+
+    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    sudo mkdir /usr/local/share/wordnet
+    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+or (Windows)
+
+Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+Unzip
+
+## For Tokyo Backend Only ##
+
+Unless you want to use the tokyo backend you are now ready to install Words and build the data; if you do want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+    cd tokyo-cabinet/
+    ./configure
+    make
+    sudo make install
+
+## GEM Installation ##
 
 After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
 Then you're ready to rock and roll. :)
 
-## Build Data
+## Build Data ##
 
-
+To build the wordnet dataset (or index for pure) file yourself, from the original wordnet files, you can use the bundled "build_wordnet" command:
 
-
-    sudo
+    build_wordnet -h # this will give you the usage information
+    sudo build_wordnet -v --build-tokyo # this would attempt to build the tokyo backend data locating the original wordnet files through a search...
+    sudo build_wordnet -v --build-pure # this would attempt to build the pure backend index locating the original wordnet files through a search...
 
 ## Usage ##
 
 Here's a few little examples of using words within your programs.
 
-
     require 'rubygems'
    require 'words'
 
-    data = Words::Words.new
+    data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
     # locate a word
    lemma = data.find("bat")
@@ -45,6 +77,7 @@ Here's a few little examples of using words within your programs.
     lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
     # or
     lemma.nouns # => array of synsets which represent nouns of the lemma bat
+    lemma.noun_ids # => array of synset ids which represent nouns of the lemma bat
     lemma.verbs? #=> true
 
     # specify a sense
@@ -53,6 +86,7 @@ Here's a few little examples of using words within your programs.
 
     sense.gloss # => a club used for hitting a ball in various games
     sense2.words # => ["cricket bat", "bat"]
+    sense2.lexical_description # => a description of the lexical meaning of the synset
     sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
     sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Here's a few little examples of using words within your programs.
     sense.derivationally_related_forms.first.source_word # => "bat"
     sense.derivationally_related_forms.first.destination_word # => "bat"
     sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+These and more examples are available from within the examples.rb file!
 
 ## Note on Patches/Pull Requests ##
 
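The README changes above describe a new pure ruby backend whose index is a marshalled Hash on disk (see the `--build-pure` branch of the build script later in this diff, which writes `Marshal.dump` of `lemma => [lemma, tagsense_counts, synset_ids]`). A minimal sketch of that round trip, using only the Ruby standard library (the file name and sample values here are illustrative, not taken from a real index):

```ruby
require 'tmpdir'

# Miniature stand-in for the pure backend's index: each lemma maps to
# [lemma, pipe-separated tagsense counts, pipe-separated synset ids].
index = { "bat" => ["bat", "n5|v3", "n02806379|v01413191"] }

path = File.join(Dir.mktmpdir, "index.dmp")

# Writing mirrors what build_wordnet --build-pure does with data/index.dmp.
File.open(path, "wb") { |file| file.write Marshal.dump(index) }

# Loading mirrors what the pure backend does when it connects.
loaded = File.open(path, "rb") { |file| Marshal.load file.read }
```

Marshal keeps lookup simple: the whole index becomes an in-memory Hash, and only synset detail reads touch the original WordNet data files.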
data/Rakefile
CHANGED
@@ -10,9 +10,10 @@ begin
     gem.email = "roja@arbia.co.uk"
     gem.homepage = "http://github.com/roja/words"
     gem.authors = ["Roja Buck"]
-    gem.
+    gem.add_dependency "trollop", ">= 1.15"
     gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-
+    gem.executables = [ "build_wordnet" ]
+    gem.default_executable = "build_wordnet"
   end
   Jeweler::GemcutterTasks.new
 rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.0
+0.2.0
data/{build_dataset.rb → bin/build_wordnet}
CHANGED
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
-require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+    opt :build_tokyo, "Build the tokyo dataset?", :default => false
+    opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+  Trollop::die :build_tokyo, "Either tokyo dataset or pure ruby dataset are required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-
+  index_hash = Hash.new
+  data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-
+      index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+      index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-      hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                      "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-
-
-    # add data
-    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-      next if data_line[0, 2] == "  "
-      data_line, gloss = data_line.split(" | ")
-      data_parts = data_line.split(" ")
-
-      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+    if opts[:build_tokyo]
+      puts "Building #{file_pos} data..." if VERBOSE
 
-
-
+      # add data
+      (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+        next if data_line[0, 2] == "  "
+        data_line, gloss = data_line.split(" | ")
+        data_parts = data_line.split(" ")
+
+        synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+        words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+        relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+        data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                 "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+      end
     end
 
-
   end
-
+
+  if opts[:build_tokyo]
+    tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+    index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+    data_hash.each { |k,v| tokyo_hash[k] = v }
+    tokyo_hash.close
+  end
+
+  if opts[:build_pure]
+    index = Hash.new
+    index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+    File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+      file.write Marshal.dump(index)
+    end
+  end
+
 
 end
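The build script above splits each line of a WordNet `data.pos` file on `" | "` to separate the gloss, then shifts whitespace-delimited fields off the front, reading the word count as hexadecimal. A self-contained sketch of that parse on one illustrative line (the sample line and its values are invented for demonstration, following the WordNet data-file layout):

```ruby
# One data.pos-style line: offset, lex filenum, type, hex word count,
# word/lex_id pairs, pointer count, pointer tuples, then "| gloss".
sample = "02806379 03 n 02 bat 0 squash_racket 0 001 @ 03053474 n 0000 | a club used for hitting a ball"

data_line, gloss = sample.split(" | ")
parts = data_line.split(" ")

synset_id = "n" + parts.shift          # POS shorthand + byte offset
lexical_filenum = parts.shift          # lexicographer file number
synset_type = parts.shift              # n / v / a / r / s
word_count = parts.shift.to_i(16)      # word count is hexadecimal
words = Array.new(word_count).map { "#{parts.shift}.#{parts.shift}" }
relations = Array.new(parts.shift.to_i).map { "#{parts.shift}.#{parts.shift}.#{parts.shift}.#{parts.shift}" }
```

Each relation tuple keeps the pointer symbol, target offset, target POS and source/target word numbers joined with dots, which is the pipe-and-dot encoding the rest of the gem consumes.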
data/examples.rb
CHANGED
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-  wordnet = Words::Words.new
+  wordnet = Words::Words.new # :pure
+
+  puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+  puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-  puts wordnet.find('bat').
-  puts wordnet.find('bat').synsets(
+  puts wordnet.find('bat').noun_ids
+  puts wordnet.find('bat').synsets(:noun).last.words.inspect
+  puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-  puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+  puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
 
+  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
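The `noun_ids` / `synsets('noun')` calls exercised above both come down to filtering a lemma's pipe-separated synset id string by its one-letter POS prefix (as the `synset_ids(pos)` method added in lib/words.rb below does). A standalone sketch with made-up ids:

```ruby
# POS symbol to prefix letter, as used throughout the gem.
SYMBOL_TO_POS = { :noun => "n", :verb => "v", :adjective => "a", :adverb => "r" }

# An index entry stores ids pipe-separated, each prefixed with its POS letter
# (sample ids here are illustrative).
synset_ids = "n02806379|n02806578|v01413191".split('|')

noun_ids = synset_ids.select { |id| id[0, 1] == SYMBOL_TO_POS[:noun] }
verb_ids = synset_ids.select { |id| id[0, 1] == SYMBOL_TO_POS[:verb] }
```

Because the prefix doubles as the key into the data files (`data.noun`, `data.verb`, ...), no extra metadata is needed to resolve an id to its synset.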
data/lib/words.rb
CHANGED
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-
-
+    SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+    attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+    def initialize(type, path, wordnet_path)
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+      @connection_type = type
+
+      if @data_path.exist?
+        if @connection_type == :tokyo
+          @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+          @connected = true
+        elsif @connection_type == :pure
+          # open the index is there
+          File.open(@data_path,'r') do |file|
+            @connection = Marshal.load file.read
+          end
+          # search for the wordnet files
+          if locate_wordnet?(wordnet_path)
+            @connected = true
+          else
+            @connected = false
+            raise "Failed to locate the wordnet database. Please ensure it is installed and that if it resides at a custom path that path is given as an argument when constructing the Words object."
+          end
+        else
+          @connected = false
+        end
+      else
+        @connected = false
+        raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please insure you have created it using the words gems provided 'build_dataset.rb' command."
+      end
+
+    end
+
+    def close
+      @connected = false
+      if @connected && connection_type == :tokyo
+        connection.close
+      end
+      return true
+    end
+
+    def lemma(term)
+      if connection_type == :pure
+        raw_lemma = @connection[term]
+        { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2]}
+      else
+        @connection[term]
+      end
+    end
+
+    def synset(synset_id)
+      if connection_type == :pure
+        pos = synset_id[0,1]
+        File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}","r") do |file|
+          file.seek(synset_id[1..-1].to_i)
+          data_line, gloss = file.readline.strip.split(" | ")
+          data_parts = data_line.split(" ")
+          synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+          words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+          relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+          { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+        end
+      else
+        @connection[synset_id]
+      end
     end
 
-    def
-
+    def locate_wordnet?(base_dirs)
+
+      base_dirs = case base_dirs
+      when :search
+        ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+      else
+        [ base_dirs ]
+      end
+
+      base_dirs.each do |dir|
+        ["", "dict"].each do |sub_folder|
+          path = Pathname.new(dir + sub_folder)
+          @wordnet_dir = path if (path + "data.noun").exist?
+          break if !@wordnet_dir.nil?
+        end
+      end
+
+      return !@wordnet_dir.nil?
+
    end
 
  end
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-    def initialize(relation_construct, source_synset)
+    def initialize(relation_construct, source_synset, wordnet_connection)
+      @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
       @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-      @destination = Synset.new
+      @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-
-
+    NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                   { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                   { :lex => :adv_all, :description => "all adverbs" },
+                   { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                   { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                   { :lex => :noun_animal, :description => "nouns denoting animals" },
+                   { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                   { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                   { :lex => :noun_body, :description => "nouns denoting body parts" },
+                   { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                   { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                   { :lex => :noun_event, :description => "nouns denoting natural events" },
+                   { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                   { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                   { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                   { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                   { :lex => :noun_motive, :description => "nouns denoting goals" },
+                   { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                   { :lex => :noun_person, :description => "nouns denoting people" },
+                   { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                   { :lex => :noun_plant, :description => "nouns denoting plants" },
+                   { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                   { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                   { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                   { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                   { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                   { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                   { :lex => :noun_substance, :description => "nouns denoting substances" },
+                   { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                   { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                   { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                   { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                   { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                   { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                   { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                   { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                   { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                   { :lex => :verb_emotion, :description => "verbs of feeling" },
+                   { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                   { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                   { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                   { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                   { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                   { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                   { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+    def initialize(synset_id, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @synset_hash = wordnet_connection.synset(synset_id)
       # construct some conveniance menthods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+    def lexical_filenum
+      @synset_hash["lexical_filenum"].to_i
+    end
+
+    def lexical_catagory
+      lexical[:lex]
+    end
+
+    def lexical_description
+      lexical[:description]
+    end
+
+    def lexical
+      NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+    end
+
     def synset_id
       @synset_hash["synset_id"]
     end
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
       case
       when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
         @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-    def initialize(
-      @
+    def initialize(raw_lemma, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @lemma_hash = raw_lemma
       # construct some conveniance menthods for relation type access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+        self.class.send(:define_method, "#{pos}_ids") do
+          synset_ids(pos)
+        end
       end
     end
 
+    def tagsense_counts
+      @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+      @tagsense_counts
+    end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-
+      synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+    end
+
+    def synset_ids(pos = :all)
+      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+      case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-        synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+        @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
       when POS_TO_SYMBOL.include?(pos.to_s)
-        synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+        @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
       else
-        synset_ids
+        @synset_ids
       end
-      relevent_synsets.map { |synset_id| Synset.new synset_id }
-    end
-
-    def synset_ids
-      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-      @synset_ids
     end
 
     def inspect
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+    alias pos available_pos
 
   end
 
   class Words
 
-
-
-
-
-      abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-    end
+    @wordnet_connection = nil
+
+    def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+      @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-      Lemma.new
+      Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+    end
+
+    def connection_type
+      @wordnet_connection.connection_type
+    end
+
+    def wordnet_dir
+      @wordnet_connection.wordnet_dir
     end
 
     def close
-
+      @wordnet_connection.close
+    end
+
+    def connected
+      @wordnet_connection.connected
+    end
+
+    def to_s
+      return "Words not connected" if !connected
+      return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+      return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
 end
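The `tagsense_counts` method added in this version turns the pipe-separated, POS-prefixed count string stored in the index into an array of one-entry hashes. A standalone sketch of that transformation (the `"n5|v3"` sample is illustrative, not a real index value):

```ruby
# POS prefix letter to symbol, as in the Lemma class above.
POS_TO_SYMBOL = { "n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb }

# "n5|v3" would mean: tagged 5 times as a noun, 3 times as a verb.
counts = "n5|v3".split('|').map { |count| { POS_TO_SYMBOL[count[0, 1]] => count[1..-1].to_i } }
```

Keeping the counts as a compact string in the index and expanding them lazily is the same memoized pattern (`unless defined? @tagsense_counts`) the class uses for synset ids.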
data/words.gemspec
ADDED
@@ -0,0 +1,60 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{words}
+  s.version = "0.2.0"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Roja Buck"]
+  s.date = %q{2010-01-16}
+  s.default_executable = %q{build_wordnet}
+  s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+  s.email = %q{roja@arbia.co.uk}
+  s.executables = ["build_wordnet"]
+  s.extra_rdoc_files = [
+    "LICENSE",
+    "README.markdown"
+  ]
+  s.files = [
+    ".gitignore",
+    "LICENSE",
+    "README.markdown",
+    "Rakefile",
+    "VERSION",
+    "bin/build_wordnet",
+    "examples.rb",
+    "lib/words.rb",
+    "test/helper.rb",
+    "test/test_words.rb",
+    "words.gemspec"
+  ]
+  s.homepage = %q{http://github.com/roja/words}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+  s.test_files = [
+    "test/test_words.rb",
+    "test/helper.rb"
+  ]
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+      s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    else
+      s.add_dependency(%q<trollop>, [">= 1.15"])
+      s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    end
+  else
+    s.add_dependency(%q<trollop>, [">= 1.15"])
+    s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+  end
+end
+
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
-default_executable:
+date: 2010-01-16 00:00:00 +00:00
+default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-  type: :
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
   version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
-executables:
-
+executables:
+- build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
--
-- data/wordnet.tct
+- bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+- words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
data/data/wordnet.tct
DELETED
Binary file
|