words 0.1.0 → 0.2.0
- data/README.markdown +45 -10
- data/Rakefile +4 -3
- data/VERSION +1 -1
- data/{build_dataset.rb → bin/build_wordnet} +38 -20
- data/examples.rb +11 -4
- data/lib/words.rb +200 -30
- data/words.gemspec +60 -0
- metadata +8 -8
- data/data/wordnet.tct +0 -0
data/README.markdown CHANGED
@@ -2,11 +2,43 @@
 
 ## About ##
 
-Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and
+Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI powered backend over the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatability and blistering speed. The pure ruby interface operates on a special ruby optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful api and installation is a sintch!
 
-## Installation ##
+## Pre-Installation ##
 
-First ensure you have
+First ensure you have a copy of the wordnet data files. This is generally available from your Linux/OSX package manager:
+
+    #Ubuntu
+    sudo apt-get install wordnet-base
+
+    #Fedora/RHL
+    sudo yum update wordnet
+
+    #MacPorts
+    sudo port install wordnet
+
+or you can simply download and install (Unix/OSX):
+
+    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    sudo mkdir /usr/local/share/wordnet
+    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+or (Windows)
+
+    Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    Unzip
+
+## For Tokyo Backend Only ##
+
+Unless you want to use the tokyo backend you are now ready to install Words && build the data, otherwise if you want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+    cd tokyo-cabinet/
+    ./configure
+    make
+    sudo make install
+
+## GEM Installation ##
 
 After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
 Then your ready to rock and roll. :)
 
-## Build Data
+## Build Data ##
 
-
+To build the wordnet dataset (or index for pure) file yourself, from the original wordnet files, you can use the bundled "build_wordnet" command
 
-
-    sudo
+    build_wordnet -h # this will give you the usage information
+    sudo build_wordnet -v --build-tokyo # this would attempt to build the tokyo backend data locating the original wordnet files through a search...
+    sudo build_wordnet -v --build-pure # this would attempt to build the pure backend index locating the original wordnet files through a search...
 
 ## Usage ##
 
 Heres a few little examples of using words within your programs.
 
-
     require 'rubygems'
     require 'words'
 
-    data = Words::Words.new
+    data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
     # locate a word
     lemma = data.find("bat")
@@ -45,6 +77,7 @@ Heres a few little examples of using words within your programs.
     lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
     # or
    lemma.nouns # => array of synsets which represent nouns of the lemma bat
+    lemma.noun_ids # => array of synsets ids which represent nouns of the lemma bat
     lemma.verbs? #=> true
 
     # specify a sense
@@ -53,6 +86,7 @@ Heres a few little examples of using words within your programs.
 
     sense.gloss # => a club used for hitting a ball in various games
     sense2.words # => ["cricket bat", "bat"]
+    sense2.lexical_description # => a description of the lexical meaning of the synset
     sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
     sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Heres a few little examples of using words within your programs.
     sense.derivationally_related_forms.first.source_word # => "bat"
     sense.derivationally_related_forms.first.destination_word # => "bat"
     sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+These and more examples are available from within the examples.rb file!
 
 ## Note on Patches/Pull Requests ##
 
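The README changes above describe the new 0.2.0 design: two interchangeable backends (Tokyo Cabinet via rufus-tokyo, or pure ruby) behind one lookup API. As a rough, self-contained illustration of that one-API-two-backends idea (this is not the gem's actual code; `TinyWords` and its in-memory sample data are invented for this sketch):

```ruby
# Sketch of the one-API-two-backends idea from the README above.
# TinyWords and the sample data are invented for illustration; the real
# gem wires a Rufus::Tokyo::Table or a Marshal-loaded index in here.
class TinyWords
  def initialize(type = :tokyo)
    @type = type # which backend was requested (:tokyo or :pure)
    # both real backends expose the same hash-like lookup interface,
    # so a plain Hash stands in for either store here
    @backend = { "bat" => "n02806379|n02139199" }
  end

  def find(word)
    ids = @backend[word]
    ids && ids.split("|") # nil when the word is not in the index
  end
end

words = TinyWords.new(:pure)
p words.find("bat") # => ["n02806379", "n02139199"]
```

Because callers only ever see `find`, swapping the storage layer never changes application code, which is the point of the 0.2.0 refactor.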
data/Rakefile CHANGED
@@ -10,9 +10,10 @@ begin
     gem.email = "roja@arbia.co.uk"
     gem.homepage = "http://github.com/roja/words"
     gem.authors = ["Roja Buck"]
-    gem.
+    gem.add_dependency "trollop", ">= 1.15"
     gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-
+    gem.executables = [ "build_wordnet" ]
+    gem.default_executable = "build_wordnet"
   end
   Jeweler::GemcutterTasks.new
 rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION CHANGED
@@ -1 +1 @@
-0.
+0.2.0
data/{build_dataset.rb → bin/build_wordnet} RENAMED
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
-require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+    opt :build_tokyo, "Build the tokyo dataset?", :default => false
+    opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+  Trollop::die :build_tokyo, "Either tokyo dataset or pure ruby dataset are required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-
+  index_hash = Hash.new
+  data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-
+      index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+      index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-      hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                      "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-
-
-    # add data
-    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-      next if data_line[0, 2] == "  "
-      data_line, gloss = data_line.split(" | ")
-      data_parts = data_line.split(" ")
-
-      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+    if opts[:build_tokyo]
+      puts "Building #{file_pos} data..." if VERBOSE
 
-
-
+      # add data
+      (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+        next if data_line[0, 2] == "  "
+        data_line, gloss = data_line.split(" | ")
+        data_parts = data_line.split(" ")
+
+        synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+        words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+        relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+        data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                 "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+      end
     end
 
-
     end
-
+
+  if opts[:build_tokyo]
+    tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+    index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+    data_hash.each { |k,v| tokyo_hash[k] = v }
+    tokyo_hash.close
+  end
+
+  if opts[:build_pure]
+    index = Hash.new
+    index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+    File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+      file.write Marshal.dump(index)
+    end
+  end
+
 
 end
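The build script above parses each WordNet `data.*` line by splitting off the gloss at `" | "` and then shifting whitespace-separated fields, with the word count read as hexadecimal. A minimal standalone sketch of that parsing step (the sample line is a shortened stand-in, not a complete real WordNet record):

```ruby
# Minimal sketch of the data-line parsing used in build_wordnet above.
# The sample line is an abbreviated stand-in for a real data.noun record.
line = "02806379 03 n 02 bat 0 cricket_bat 0 001 @ 03053474 n 0000 | a club used for hitting a ball"

data_line, gloss = line.split(" | ")
parts = data_line.split(" ")

synset_id = "n" + parts.shift       # byte offset prefixed with the POS short code
lexical_filenum = parts.shift       # lexicographer file number ("03")
synset_type = parts.shift           # "n" for noun
word_count = parts.shift.to_i(16)   # word count is hexadecimal in WordNet data files
words = Array.new(word_count).map { "#{parts.shift}.#{parts.shift}" } # word + lex id pairs

puts synset_id     # => n02806379
puts words.inspect # => ["bat.0", "cricket_bat.0"]
puts gloss.strip
```

Shifting through `parts` in order mirrors how the script consumes the fixed-layout record without ever indexing by column position.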
data/examples.rb CHANGED
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-  wordnet = Words::Words.new
+  wordnet = Words::Words.new # :pure
+
+  puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+  puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-  puts wordnet.find('bat').
-  puts wordnet.find('bat').synsets(
+  puts wordnet.find('bat').noun_ids
+  puts wordnet.find('bat').synsets(:noun).last.words.inspect
+  puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-  puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+  puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
data/lib/words.rb CHANGED
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-
-
+    SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+    attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+    def initialize(type, path, wordnet_path)
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+      @connection_type = type
+
+      if @data_path.exist?
+        if @connection_type == :tokyo
+          @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+          @connected = true
+        elsif @connection_type == :pure
+          # open the index is there
+          File.open(@data_path,'r') do |file|
+            @connection = Marshal.load file.read
+          end
+          # search for the wordnet files
+          if locate_wordnet?(wordnet_path)
+            @connected = true
+          else
+            @connected = false
+            raise "Failed to locate the wordnet database. Please ensure it is installed and that if it resides at a custom path that path is given as an argument when constructing the Words object."
+          end
+        else
+          @connected = false
+        end
+      else
+        @connected = false
+        raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please insure you have created it using the words gems provided 'build_dataset.rb' command."
+      end
+
+    end
+
+    def close
+      @connected = false
+      if @connected && connection_type == :tokyo
+        connection.close
+      end
+      return true
+    end
+
+    def lemma(term)
+      if connection_type == :pure
+        raw_lemma = @connection[term]
+        { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2]}
+      else
+        @connection[term]
+      end
+    end
+
+    def synset(synset_id)
+      if connection_type == :pure
+        pos = synset_id[0,1]
+        File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}","r") do |file|
+          file.seek(synset_id[1..-1].to_i)
+          data_line, gloss = file.readline.strip.split(" | ")
+          data_parts = data_line.split(" ")
+          synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+          words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+          relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+          { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+        end
+      else
+        @connection[synset_id]
+      end
     end
 
-    def
-
+    def locate_wordnet?(base_dirs)
+
+      base_dirs = case base_dirs
+      when :search
+        ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+      else
+        [ base_dirs ]
+      end
+
+      base_dirs.each do |dir|
+        ["", "dict"].each do |sub_folder|
+          path = Pathname.new(dir + sub_folder)
+          @wordnet_dir = path if (path + "data.noun").exist?
+          break if !@wordnet_dir.nil?
+        end
+      end
+
+      return !@wordnet_dir.nil?
+
     end
 
   end
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-    def initialize(relation_construct, source_synset)
+    def initialize(relation_construct, source_synset, wordnet_connection)
+      @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
      @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-      @destination = Synset.new
+      @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-
-
+    NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                   { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                   { :lex => :adv_all, :description => "all adverbs" },
+                   { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                   { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                   { :lex => :noun_animal, :description => "nouns denoting animals" },
+                   { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                   { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                   { :lex => :noun_body, :description => "nouns denoting body parts" },
+                   { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                   { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                   { :lex => :noun_event, :description => "nouns denoting natural events" },
+                   { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                   { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                   { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                   { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                   { :lex => :noun_motive, :description => "nouns denoting goals" },
+                   { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                   { :lex => :noun_person, :description => "nouns denoting people" },
+                   { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                   { :lex => :noun_plant, :description => "nouns denoting plants" },
+                   { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                   { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                   { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                   { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                   { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                   { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                   { :lex => :noun_substance, :description => "nouns denoting substances" },
+                   { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                   { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                   { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                   { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                   { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                   { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                   { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                   { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                   { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                   { :lex => :verb_emotion, :description => "verbs of feeling" },
+                   { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                   { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                   { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                   { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                   { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                   { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                   { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+    def initialize(synset_id, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @synset_hash = wordnet_connection.synset(synset_id)
       # construct some conveniance menthods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+    def lexical_filenum
+      @synset_hash["lexical_filenum"].to_i
+    end
+
+    def lexical_catagory
+      lexical[:lex]
+    end
+
+    def lexical_description
+      lexical[:description]
+    end
+
+    def lexical
+      NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+    end
+
     def synset_id
       @synset_hash["synset_id"]
     end
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
       case
       when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
         @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-    def initialize(
-      @
+    def initialize(raw_lemma, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @lemma_hash = raw_lemma
       # construct some conveniance menthods for relation type access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+        self.class.send(:define_method, "#{pos}_ids") do
+          synset_ids(pos)
+        end
       end
     end
 
+    def tagsense_counts
+      @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+      @tagsense_counts
+    end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-
+      synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+    end
+
+    def synset_ids(pos = :all)
+      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+      case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-        synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+        @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
       when POS_TO_SYMBOL.include?(pos.to_s)
-        synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+        @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
       else
-        synset_ids
+        @synset_ids
       end
-      relevent_synsets.map { |synset_id| Synset.new synset_id }
-    end
-
-    def synset_ids
-      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-      @synset_ids
     end
 
     def inspect
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+    alias pos available_pos
 
   end
 
   class Words
 
-
-
-
-
-      abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-    end
+    @wordnet_connection = nil
+
+    def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+      @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-      Lemma.new
+      Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+    end
+
+    def connection_type
+      @wordnet_connection.connection_type
+    end
+
+    def wordnet_dir
+      @wordnet_connection.wordnet_dir
     end
 
     def close
-
+      @wordnet_connection.close
+    end
+
+    def connected
+      @wordnet_connection.connected
+    end
+
+    def to_s
+      return "Words not connected" if !connected
+      return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+      return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
   end
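The pure backend added above loads a Marshal-dumped index file and unpacks each entry into a lemma hash with `lemma`, `tagsense_counts`, and `synset_ids` keys. A self-contained sketch of that round trip (the sample entry mirrors the `[lemma, tagsense_counts, synset_ids]` array layout written by `build_wordnet --build-pure`, but the ids and counts are made up for illustration):

```ruby
require 'tempfile'

# Sketch of the pure backend's Marshal index round trip.
# The entry layout matches build_wordnet's pure output; values are invented.
index = { "bat" => ["bat", "n5|v2", "n02806379|v01413191"] }

# write the index the way build_wordnet does
file = Tempfile.new("index.dmp")
file.write Marshal.dump(index)
file.close

# load it back the way WordnetConnection does
connection = Marshal.load(File.read(file.path))

# unpack the raw array into the hash shape WordnetConnection#lemma returns
raw = connection["bat"]
lemma = { 'lemma' => raw[0], 'tagsense_counts' => raw[1], 'synset_ids' => raw[2] }

puts lemma['synset_ids'].split('|').inspect # => ["n02806379", "v01413191"]
```

Marshal keeps the whole index in memory after one read, which is why the pure backend only needs the on-disk WordNet `data.*` files afterwards: each synset id encodes the byte offset to `seek` to.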
data/words.gemspec ADDED
@@ -0,0 +1,60 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{words}
+  s.version = "0.2.0"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Roja Buck"]
+  s.date = %q{2010-01-16}
+  s.default_executable = %q{build_wordnet}
+  s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+  s.email = %q{roja@arbia.co.uk}
+  s.executables = ["build_wordnet"]
+  s.extra_rdoc_files = [
+    "LICENSE",
+    "README.markdown"
+  ]
+  s.files = [
+    ".gitignore",
+    "LICENSE",
+    "README.markdown",
+    "Rakefile",
+    "VERSION",
+    "bin/build_wordnet",
+    "examples.rb",
+    "lib/words.rb",
+    "test/helper.rb",
+    "test/test_words.rb",
+    "words.gemspec"
+  ]
+  s.homepage = %q{http://github.com/roja/words}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+  s.test_files = [
+    "test/test_words.rb",
+    "test/helper.rb"
+  ]
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+      s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    else
+      s.add_dependency(%q<trollop>, [">= 1.15"])
+      s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    end
+  else
+    s.add_dependency(%q<trollop>, [">= 1.15"])
+    s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+  end
+end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
-default_executable:
+date: 2010-01-16 00:00:00 +00:00
+default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-  type: :
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
     version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
-executables:
-
+executables:
+- build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
--
-- data/wordnet.tct
+- bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+- words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
DELETED
Binary file
|