rwordnet 0.1.3 → 1.0.0

Files changed (24)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 4bea9b677b6a581d27c04ad1912c3034bf8329d8
+  data.tar.gz: 4fdef6fdfdbe2373b445f7857d3fe5ff5071fba2
+SHA512:
+  metadata.gz: c1471523dcb27e496eb72b406f37ddd83184dfcecbcabe95c492f6018b29fa53180d06035b7f9c0a938bfde0e4c28a37f89473c0b3a712be527ed2abf6261af3
+  data.tar.gz: 25cf40f306f3dfcf8076d9430cfd5d8d805af9198762a530ed4f293e5b0c4e5ccc680e941b2256f63c5141d5499ad77b7821eec5b4e245a72a47e1f9ca6e83b3

data/History.txt CHANGED

@@ -1,5 +1,14 @@
+# rWordNet 1.0.0
+* Performance fixes for the lookup
+* Find using Lemma.find / Lemma.find_all
+* using ruby style constant names like `VerbPointers` -> `VERB_POINTERS`
+* renamed WordNet::WordNetDB to WordNet::DB
+* renaming a few methods in Lemma like `p_cnt` -> `pointer_count`
+* make Pointer a real class
+* renaming a few methods in SynSet like `get_relation` -> `relation`
 # rWordNet 0.1.3
-# Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
+* Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
 # rWordNet 0.1.2
 * Added unique (integer) ids to lemmas [Wolfram Sieber]

data/README.markdown CHANGED

@@ -1,59 +1,69 @@
 # A pure Ruby interface to WordNet #
+[![Build Status](https://travis-ci.org/doches/rwordnet.png)](https://travis-ci.org/doches/rwordnet)
+## Summary ##
++ Works directly on the database that comes with WordNet
++ No gem or native dependencies
++ *Very* easy to install
++ Small footprint (8.1M vs 24M for Ruby-Wordnet+DB)
++ Can use a custom, existing WordNet installation
 ## About ##
 This library implements a pure Ruby interface to the WordNet lexical/semantic
-database. Unlike existing ruby bindings, this one doesn't require you to convert
+database. Unlike existing ruby bindings, this one doesn't require you to convert
 the original WordNet database into a new database format; instead it can work directly
 on the database that comes with WordNet.
 If you're doing something data-intensive you will achieve much better performance
-with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
-since it converts the WordNet database into a BerkelyDB file for quicker access. In
-writing rwordnet, I've focused more on usability and ease of installation ( *gem install
+with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
+since it converts the WordNet database into a BerkeleyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the disk space of Ruby-Wordnet + DB. In
+writing rwordnet, I've focused more on usability and ease of installation ( *gem install
 rwordnet* ) at the expense of some performance. Use at your own risk, etc.
 ## Installation ##
 One of the chief benefits of rwordnet over Ruby-WordNet is how easy it is to install:
-    gem install gemcutter  # These two steps are only necessary if you haven't
-    gem tumble             # yet installed the gemcutter tools
     gem install rwordnet
-That's it! rwordnet comes bundled with the WordNet database which it uses by default,
+That's it! rwordnet comes bundled with the WordNet database which it uses by default,
 so there's absolutely nothing else to download, install, or configure.
-Of course, if you want to use your own WordNet installation, that's easy too -- just
+Of course, if you want to use your own WordNet installation, that's easy too -- just
 set the path to WordNet's database files before using the library (see examples below).
 ## Usage ##
 The other benefit of rwordnet over Ruby-WordNet is that it's so much easier (IMHO) to
-use.
+use.
+As an example, consider finding all of the noun glosses for a given word:
-As a quick example, consider finding all of the noun glosses for a given word:
+```Ruby
+require 'wordnet'
-    require 'rubygems'
-    require 'wordnet'
-    index = WordNet::NounIndex.instance
-    lemma = index.find("fruit")
-    lemma.synsets.each { |synset| puts synset.gloss }
+lemma = WordNet::Lemma.find("fruit", :noun)
+lemma.synsets.each { |synset| puts synset.gloss }
+```
 ...or all of the glosses, period:
-    lemmas = WordNet::WordNetDB.find("fruit")
-    synsets = lemmas.map { |lemma| lemma.synsets }
-    words = synsets.flatten
-    words.each { |word| puts word.gloss }
+```Ruby
+lemmas = WordNet::Lemma.find_all("fruit")
+synsets = lemmas.map { |lemma| lemma.synsets }
+words = synsets.flatten
+words.each { |word| puts word.gloss }
+```
 Have your own WordNet database that you've marked up with extra attributes and whatnot?
 No problem:
-    require 'rubygems'
-    require 'wordnet'
-    include WordNet
-    WordNetDB.path = "/path/to/WordNet-3.0"
-    lemmas = WordNetDB.find("fruit")
-    ...
+```Ruby
+require 'wordnet'
+WordNet::DB.path = "/path/to/WordNet-3.0"
+lemmas = WordNet::Lemma.find_all("fruit")
+...
+```

data/examples/benchmark.rb ADDED

@@ -0,0 +1,14 @@
+require 'benchmark'
+require 'wordnet'
+initial = Benchmark.realtime do
+  WordNet::Lemma.find(ARGV[0] || raise("Usage: ruby benchmark.rb noun"), :noun)
+end
+puts "Time to initial word #{initial}"
+lookup = Benchmark.realtime do
+  1000.times { WordNet::Lemma.find('fruit', :noun) }
+end
+puts "Time for 1k lookups #{lookup}"

data/examples/dictionary.rb CHANGED

@@ -1,5 +1,4 @@
 # Use WordNet as a command-line dictionary.
-require 'rubygems'
 require 'wordnet'
 if ARGV.size != 1
@@ -10,10 +9,10 @@ end
 word = ARGV[0]
 # Find all the lemmas for a word (i.e., whether it occurs as a noun, verb, etc.)
-lemmas = WordNet::WordNetDB.find(word)
+lemmas = WordNet::Lemma.find_all(word)
 # Print out each lemma with a list of possible meanings.
-lemmas.each do |lemma|
+lemmas.each do |lemma|
   puts lemma
   lemma.synsets.each_with_index do |synset,i|
     puts "\t#{i+1}) #{synset.gloss}"

data/examples/full_hypernym.rb CHANGED

@@ -1,10 +1,7 @@
-require 'rubygems'
 require 'wordnet'
-# Open the index file for nouns
-index = WordNet::NounIndex.new
 # Find the word 'fruit'
-lemma = index.find("fruit")
+lemma = WordNet::Lemma.find("fruit", :noun)
 # Find all the synsets for 'fruit', and pick the first one.
 synset = lemma.synsets[0]
 puts synset

data/lib/wordnet/db.rb ADDED

@@ -0,0 +1,17 @@
+module WordNet
+  # Represents the WordNet database, and provides some basic interaction.
+  class DB
+    # By default, use the bundled WordNet
+    @path = File.expand_path("../../../WordNet-3.0/", __FILE__)
+    class << self
+      # Path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
+      # Set it to use your own WordNet installation rather than the one bundled with rwordnet.
+      attr_accessor :path
+      def open(path, &block)
+        File.open(File.join(self.path, path), "r", &block)
+      end
+    end
+  end
+end
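
The `DB` class above keeps its path in a class-level instance variable and resolves every file relative to it. A minimal standalone sketch of that pattern follows; `SketchDB`, `sketch_db_first_line`, and the `index.demo` file are hypothetical names used only for illustration, and the sketch works on a throwaway temp directory rather than the gem's bundled files.

```ruby
require "tmpdir"

# Hypothetical stand-in for WordNet::DB; not the gem's code.
class SketchDB
  # Class-level instance variable: one shared path, no instance needed.
  @path = Dir.tmpdir

  class << self
    attr_accessor :path

    # Open a file relative to the configured path. With a block, File.open
    # closes the file afterwards and returns the block's value.
    def open(relative, &block)
      File.open(File.join(path, relative), "r", &block)
    end
  end
end

# Demo helper (also hypothetical): writes a one-line file, points SketchDB
# at its directory, reads the first line back, then restores the old path.
def sketch_db_first_line
  old = SketchDB.path
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "index.demo"), "fruit n 3\n")
    SketchDB.path = dir
    SketchDB.open("index.demo") { |f| f.gets }
  end
ensure
  SketchDB.path = old
end
```

Passing a block, as `Synset#initialize` does in this release, guarantees the handle is closed. `build_cache` in lemma.rb calls `DB.open` without a block, so that handle stays open until it is garbage collected.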

data/lib/wordnet/lemma.rb CHANGED

@@ -1,39 +1,60 @@
 module WordNet
+  # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
+  class Lemma
+    SPACE = ' '
+    attr_accessor :word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets, :id
-# Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
-class Lemma
-  attr_accessor :lemma, :pos, :synset_cnt, :p_cnt, :ptr_symbol, :tagsense_cnt, :synset_offset, :id
-  # Create a lemma from a line in an index file. You should be creating Lemmas by hand; instead,
-  # use the WordNet#find and Index#find methods to find the Lemma for a word.
-  def initialize(index_line, id = 0)
-    @id = (id > 0) ? id : nil
-    line = index_line.split(" ")
-    @lemma = line.shift
-    @pos = line.shift
-    @synset_cnt = line.shift.to_i
-    @p_cnt = line.shift.to_i
-    @ptr_symbol = []
-    @p_cnt.times { @ptr_symbol.push line.shift }
-    line.shift # Throw away redundant sense_cnt
-    @tagsense_cnt = line.shift.to_i
-    @synset_offset = []
-    @synset_cnt.times { @synset_offset.push line.shift.to_i }
-  end
-  # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
-  def get_synsets
-    return @synset_offset.map { |offset| Synset.new(@pos, offset) }
-  end
-  def to_s
-    [@lemma, @pos].join(",")
-  end
-  alias synsets get_synsets
-  alias word lemma
-end
+    # Create a lemma from a line in a lexicon file. You shouldn't be creating Lemmas by hand; instead,
+    # use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.
+    def initialize(lexicon_line, id)
+      @id = id
+      line = lexicon_line.split(" ")
+      @word = line.shift
+      @pos = line.shift
+      synset_count = line.shift.to_i
+      @pointer_symbols = line.slice!(0, line.shift.to_i)
+      line.shift # Throw away redundant sense_cnt
+      @tagsense_count = line.shift.to_i
+      @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
+    end
+    # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
+    def synsets
+      @synset_offsets.map { |offset| Synset.new(@pos, offset) }
+    end
+    def to_s
+      [@word, @pos].join(",")
+    end
+    class << self
+      @@cache = {}
+      def find_all(word)
+        [:noun, :verb, :adj, :adv].flat_map do |pos|
+          find(word, pos) || []
+        end
+      end
+      # Find a lemma for a given word and pos
+      def find(word, pos)
+        cache = @@cache[pos] ||= build_cache(pos)
+        if found = cache[word]
+          Lemma.new(*found)
+        end
+      end
+      private
+      def build_cache(pos)
+        cache = {}
+        DB.open(File.join("dict", "index.#{pos}")).each_line.each_with_index do |line, index|
+          word = line.slice(0, line.index(SPACE))
+          cache[word] = [line, index+1]
+        end
+        cache
+      end
+    end
+  end
 end
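
The new `Lemma.find` scans each index file once, keys every raw line by its first token, and parses a line only when that word is requested. The sketch below reproduces that strategy on an in-memory array so it runs without the gem. `LemmaCacheSketch` and `SKETCH_INDEX_LINES` are hypothetical names; the `fruit` line is the one used in lemma_test.rb, and the `demo` line is synthetic.

```ruby
# Hypothetical standalone sketch of the lookup strategy; not the gem's code.
module LemmaCacheSketch
  SPACE = " "

  # One pass over the index: key each raw line by its first token and
  # remember the 1-based line number, which serves as the lemma id.
  def self.build_cache(lines)
    cache = {}
    lines.each_with_index do |line, index|
      word = line.slice(0, line.index(SPACE))
      cache[word] = [line, index + 1]
    end
    cache
  end

  # Parse a raw index line only when it is actually requested.
  def self.parse(line, id)
    fields = line.split(" ")
    word = fields.shift
    pos = fields.shift
    synset_count = fields.shift.to_i
    pointer_count = fields.shift.to_i
    pointer_symbols = fields.slice!(0, pointer_count)
    fields.shift # redundant sense count
    tagsense_count = fields.shift.to_i
    synset_offsets = fields.slice!(0, synset_count).map(&:to_i)
    { id: id, word: word, pos: pos, pointer_symbols: pointer_symbols,
      tagsense_count: tagsense_count, synset_offsets: synset_offsets }
  end

  # Returns the parsed entry, or nil when the word is not in the cache.
  def self.find(cache, word)
    found = cache[word]
    parse(*found) if found
  end
end

SKETCH_INDEX_LINES = [
  "demo n 1 0 1 0 00000001\n", # synthetic line, for illustration only
  "fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550\n"
].freeze
```

Storing the raw line rather than a parsed object keeps the one-time scan cheap. The trade-off is that each `find` call builds a fresh object, so two lookups of the same word return equal-looking but distinct lemmas.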

data/lib/wordnet/pointer.rb CHANGED

@@ -1,15 +1,14 @@
 module WordNet
+  class Pointer
+    attr_reader :symbol, :offset, :pos, :source, :target
-# Convenience class for treating hashes as objects, i.e. obj[:key] <=> obj.key. I know
-# this is probably a bad idea, but it's so convenient...
-class Pointer < Hash
-  def method_missing(msg, *args)
-    if self.include?(msg)
-      return self[msg]
-    else
-      throw NoMethodError.new("undefined method `#{msg}' for #{self}:Pointer")
+    def initialize(symbol: raise, offset: raise, pos: raise, source: raise)
+      @symbol, @offset, @pos, @source = symbol, offset, pos, source
+      @target = source.slice!(2,2)
     end
-  end
-end
+    def is_semantic?
+      source == "00" && target == "00"
+    end
+  end
 end
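
The `symbol: raise` defaults above emulate required keyword arguments, presumably because the gemspec still allows Ruby 2.0, where the bare `symbol:` form is not available. The sketch below shows how the same class might look assuming Ruby 2.1 or later. `PointerSketch`, the `source_target` parameter, and `semantic?` are hypothetical names, and the split is done without mutating the caller's string, unlike `source.slice!(2,2)`.

```ruby
# Hypothetical variant of Pointer for Ruby 2.1+; not the gem's code.
class PointerSketch
  attr_reader :symbol, :offset, :pos, :source, :target

  # Bare `name:` keywords are required: omitting one raises ArgumentError.
  def initialize(symbol:, offset:, pos:, source_target:)
    @symbol = symbol
    @offset = offset
    @pos = pos
    # The four-character field holds two 2-character word numbers.
    @source = source_target[0, 2]
    @target = source_target[2, 2]
  end

  # "0000" means the pointer relates whole synsets rather than single words.
  def semantic?
    source == "00" && target == "00"
  end
end
```

In synset.rb the string passed as `source:` is a fresh token from `split`, so the in-place `slice!` in the real class is harmless there; the copy-based split only matters if a caller reuses the string.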

data/lib/wordnet/pointers.rb CHANGED

@@ -1,37 +1,82 @@
-# A container for various constants. In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
-# Use these symbols in conjunction with the Synset#get_relation method.
+# A container for various constants.
+# In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
+# Use these symbols in conjunction with the Synset#relation method.
 module WordNet
+  NOUN_POINTERS = {
+    "-c" => "Member of this domain - TOPIC",
+    "+" => "Derivationally related form",
+    "%p" => "Part meronym",
+    "~i" => "Instance Hyponym",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "#p" => "Part holonym",
+    "%s" => "Substance meronym",
+    ";u" => "Domain of synset - USAGE",
+    "-r" => "Member of this domain - REGION",
+    "#s" => "Substance holonym",
+    "=" => "Attribute",
+    "-u" => "Member of this domain - USAGE",
+    ";c" => "Domain of synset - TOPIC",
+    "%m" => "Member meronym",
+    "~" => "Hyponym",
+    "@i" => "Instance Hypernym",
+    "#m" => "Member holonym"
+  }
+  VERB_POINTERS = {
+    "+" => "Derivationally related form",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "$" => "Verb Group",
+    ";c" => "Domain of synset - TOPIC",
+    ">" => "Cause",
+    "~" => "Hyponym",
+    "*" => "Entailment"
+  }
+  ADJECTIVE_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "\\" => "Pertainym (pertains to noun)",
+    "<" => "Participle of verb",
+    "&" => "Similar to",
+    "=" => "Attribute",
+    ";c" => "Domain of synset - TOPIC"
+  }
+  ADVERB_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "\\" => "Derived from adjective",
+    ";c" => "Domain of synset - TOPIC"
+  }
-  NounPointers = {"-c"=>"Member of this domain - TOPIC", "+"=>"Derivationally related form", "%p"=>"Part meronym", "~i"=>"Instance Hyponym", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", "#p"=>"Part holonym", "%s"=>"Substance meronym", ";u"=>"Domain of synset - USAGE", "-r"=>"Member of this domain - REGION", "#s"=>"Substance holonym", "="=>"Attribute", "-u"=>"Member of this domain - USAGE", ";c"=>"Domain of synset - TOPIC", "%m"=>"Member meronym", "~"=>"Hyponym", "@i"=>"Instance Hypernym", "#m"=>"Member holonym"}
-  VerbPointers = {"+"=>"Derivationally related form", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "$"=>"Verb Group", ";c"=>"Domain of synset - TOPIC", ">"=>"Cause", "~"=>"Hyponym", "*"=>"Entailment"}
-  AdjectivePointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", "\\"=>"Pertainym (pertains to noun)", "<"=>"Participle of verb", "&"=>"Similar to", "="=>"Attribute", ";c"=>"Domain of synset - TOPIC"}
-  AdverbPointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "\\"=>"Derived from adjective", ";c"=>"Domain of synset - TOPIC"}
-  MemberOfThisDomainTopic = "-c"
-  DerivationallyRelatedForm = "+"
-  PartMeronym = "%p"
+  MEMBER_OF_THIS_DOMAIN_TOPIC = "-c"
+  DERIVATIONALLY_RELATED_FORM = "+"
+  PART_MERONYM = "%p"
   InstanceHyponym = "~i"
-  Hypernym = "@"
-  DomainOfSynsetRegion = ";r"
-  Antonym = "!"
-  PartHolonym = "#p"
-  SubstanceMeronym = "%s"
-  VerbGroup = "$"
-  DomainOfSynsetUsage = ";u"
-  MemberOfThisDomainRegion = "-r"
-  SubstanceHolonym = "#s"
-  DerivedFromAdjective = "\\"
-  ParticipleOfVerb = "<"
-  SimilarTo = "&"
-  Attribute = "="
-  AlsoSee = "^"
-  Cause = ">"
-  MemberOfThisDomainUsage = "-u"
-  DomainOfSynsetTopic = ";c"
-  MemberMeronym = "%m"
-  Hyponym = "~"
-  InstanceHypernym = "@i"
-  Entailment = "*"
-  MemberHolonym = "#m"
+  HYPERNYM = "@"
+  DOMAIN_OF_SYNSET_REGION = ";r"
+  ANTONYM = "!"
+  PART_HOLONYM = "#p"
+  SUBSTANCE_MERONYM = "%s"
+  VERB_GROUP = "$"
+  DOMAIN_OF_SYNSET_USAGE = ";u"
+  MEMBER_OF_THIS_DOMAIN_REGION = "-r"
+  SUBSTANCE_HOLONYM = "#s"
+  DERIVED_FROM_ADJECTIVE = "\\"
+  PARTICIPLE_OF_VERB = "<"
+  SIMILAR_TO = "&"
+  ATTRIBUTE = "="
+  ALSO_SEE = "^"
+  CAUSE = ">"
+  MEMBER_OF_THIS_DOMAIN_USAGE = "-u"
+  DOMAIN_OF_SYNSET_TOPIC = ";c"
+  MEMBER_MERONYM = "%m"
+  HYPONYM = "~"
+  INSTANCE_HYPERNYM = "@i"
+  ENTAILMENT = "*"
+  MEMBER_HOLONYM = "#m"
 end
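
Code written against 0.1.3 refers to the CamelCase constants removed in this hunk. As a migration aid, the renaming rule can be sketched as a small helper; `ConstantRenameSketch` is a hypothetical name and is not part of rwordnet.

```ruby
# Hypothetical migration helper; not part of rwordnet.
module ConstantRenameSketch
  # Insert "_" at each lowercase-to-uppercase boundary, then upcase.
  def self.screaming_snake(name)
    name.gsub(/([a-z])([A-Z])/, '\1_\2').upcase
  end
end

puts ConstantRenameSketch.screaming_snake("VerbPointers")
puts ConstantRenameSketch.screaming_snake("DomainOfSynsetRegion")
```

One caveat visible in the hunk itself: `InstanceHyponym = "~i"` appears as an unchanged context line, so according to this diff that constant keeps its old spelling in 1.0.0 while `INSTANCE_HYPERNYM` is renamed.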

data/lib/wordnet/synset.rb CHANGED

@@ -1,90 +1,98 @@
 module WordNet
+  SYNSET_TYPES = {"n" => "noun", "v" => "verb", "a" => "adj", "r" => "adv"}
-# Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
-# relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
-class Synset
-  attr_reader :gloss, :synset_offset, :lex_filenum, :ss_type, :w_cnt, :wordcounts
-  # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
-  # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
-  def initialize(pos, offset)
-    data = File.open(File.join(WordNetDB.path,"dict","data.#{SynsetType[pos]}"),"r")
-    data.seek(offset)
-    data_line = data.readline.strip
-    data.close
-    info_line, @gloss = data_line.split(" | ")
-    line = info_line.split(" ")
-    @synset_offset = line.shift
-    @lex_filenum = line.shift
-    @ss_type = line.shift
-    @w_cnt = line.shift.to_i
-    @wordcounts = {}
-    @w_cnt.times do
-      @wordcounts[line.shift] = line.shift.to_i
+  # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
+  # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
+  class Synset
+    attr_reader :gloss, :synset_offset, :lex_filenum, :synset_type, :word_counts, :pos_offset, :pos
+    # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
+    # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
+    def initialize(pos, offset)
+      data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
+        f.seek(offset)
+        f.readline.strip
+      end
+      info_line, @gloss = data_line.split(" | ", 2)
+      line = info_line.split(" ")
+      @pos = pos
+      @pos_offset = offset
+      @synset_offset = line.shift
+      @lex_filenum = line.shift
+      @synset_type = line.shift
+      @word_counts = {}
+      word_count = line.shift.to_i
+      word_count.times do
+        @word_counts[line.shift] = line.shift.to_i
+      end
+      pointer_count = line.shift.to_i
+      @pointers = Array.new(pointer_count).map do
+        Pointer.new(
+          symbol: line.shift[0],
+          offset: line.shift.to_i,
+          pos: line.shift,
+          source: line.shift
+        )
+      end
     end
-    @p_cnt = line.shift.to_i
-    @pointers = []
-    @p_cnt.times do
-      pointer = Pointer.new
-      pointer[:symbol] = line.shift,
-      pointer[:offset] = line.shift.to_i
-      pointer[:pos] = line.shift
-      pointer[:source] = line.shift
-      pointer[:is_semantic?] = (pointer[:source] == "0000")
-      pointer[:target] = pointer[:source][2..3]
-      pointer[:source] = pointer[:source][0..1]
-      pointer[:symbol] = pointer[:symbol][0]
-      @pointers.push pointer
+    # How many words does this Synset include?
+    def word_count
+      @word_counts.size
     end
-  end
-  # How many words does this Synset include?
-  def size
-    @wordcounts.size
-  end
-  # Get a list of words included in this Synset
-  def words
-    @wordcounts.keys
-  end
-  # List of valid +pointer_symbol+s is in pointers.rb
-  def get_relation(pointer_symbol)
-    @pointers.reject { |pointer| pointer.symbol != pointer_symbol }.map { |pointer| Synset.new(@ss_type, pointer.offset) }
-  end
-  # Get the Synset of this sense's antonym
-  def antonym
-    get_relation(Antonym)
-  end
-  # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
-  def hypernym
-    get_relation(Hypernym)[0]
-  end
-  # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
-  def hyponym
-    get_relation(Hyponym)
-  end
-  # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
-  def expanded_hypernym
-    parent = self.hypernym
-    return [] if parent.nil?
-    return [parent, parent.expanded_hypernym].flatten
-  end
-  def to_s
-    "(#{@ss_type}) #{words.map {|x| x.gsub('_',' ')}.join(', ')} (#{@gloss})"
-  end
-  alias parent hypernym
-  alias children hyponym
-end
+    # Get a list of words included in this Synset
+    def words
+      @word_counts.keys
+    end
+    # List of valid +pointer_symbol+s is in pointers.rb
+    def relation(pointer_symbol)
+      @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
+        map! { |pointer| Synset.new(@synset_type, pointer.offset) }
+    end
+    # Get the Synset of this sense's antonym
+    def antonym
+      relation(ANTONYM)
+    end
+    # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
+    def hypernym
+      relation(HYPERNYM)[0]
+    end
+    # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
+    def hyponym
+      relation(HYPONYM)
+    end
+    # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
+    def expanded_hypernym
+      parent = hypernym
+      list = []
+      return list unless parent
+      while parent
+        break if list.include? parent.pos_offset
+        list.push parent.pos_offset
+        parent = parent.parent
+      end
+      list.flatten!
+      list.map! { |offset| Synset.new(@pos, offset)}
+    end
+    def to_s
+      "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
+    end
+    alias size word_count
+    alias parent hypernym
+    alias children hyponym
+  end
 end
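
The rewritten `expanded_hypernym` replaces recursion with a loop that records visited offsets and stops when one repeats, which is what the "does not produce a circular reference" test in lemma_test.rb exercises. The sketch below isolates that guard using a plain hash of parent links; `HypernymWalkSketch` and the sample hashes in the usage are hypothetical.

```ruby
# Hypothetical standalone sketch of the cycle-guarded walk; not the gem's code.
module HypernymWalkSketch
  # Follow parent links from +start+, collecting each ancestor once.
  # +parents+ maps a node to its parent (or has no entry at the root).
  def self.ancestors(start, parents)
    seen = []
    current = parents[start]
    while current
      break if seen.include?(current) # already visited: we hit a cycle
      seen << current
      current = parents[current]
    end
    seen
  end
end

chain = { "fruit" => "reproductive structure", "reproductive structure" => "entity" }
p HypernymWalkSketch.ancestors("fruit", chain)
```

As in the diff, the guard compares against ancestors only, so in a two-node cycle the starting node itself shows up once in the result before the walk stops.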

data/lib/wordnet/version.rb ADDED

@@ -0,0 +1,3 @@
+module WordNet
+  VERSION = "1.0.0"
+end

data/lib/wordnet.rb CHANGED

@@ -1,7 +1,5 @@
 require 'wordnet/pointer'
-require 'wordnet/wordnetdb'
-require 'wordnet/index'
+require 'wordnet/db'
 require 'wordnet/lemma'
 require 'wordnet/pointers'
-require 'wordnet/pos'
 require 'wordnet/synset'

data/test/test_helper.rb CHANGED

@@ -1,17 +1,16 @@
-require "test/unit"
-require File.dirname(__FILE__) + "/../lib/wordnet"
+require "bundler/setup"
+require "maxitest/autorun"
+$LOAD_PATH.unshift Bundler.root.join("lib")
+require "wordnet"
-class << Test::Unit::TestCase
-  def test(name, &block)
-    test_name = :"test_#{name.gsub(' ','_')}"
-    raise ArgumentError, "#{test_name} is already defined" if self.instance_methods.include? test_name.to_s
-    define_method test_name, &block
-  end
-  def expect(expected_value, &block)
-    define_method :"test_#{caller.first.split("/").last}" do
-      assert_equal expected_value, instance_eval(&block)
-    end
+Minitest::Test.class_eval do
+  def with_db_path(path)
+    begin
+      old, WordNet::DB.path = WordNet::DB.path, path
+      yield
+    ensure
+      WordNet::DB.path = old
+    end
   end
 end
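
The `with_db_path` helper swaps a global setting for the duration of a block and restores it in `ensure`, which lets the lemma tests point the library at a nonexistent path to check that lookups are served from the cache. A standalone sketch of the same swap-and-restore pattern follows; `SketchSettings` and `with_path` are hypothetical names. It attaches `ensure` directly to the method body, which Ruby allows without an explicit `begin`.

```ruby
# Hypothetical settings holder standing in for WordNet::DB; not the gem's code.
class SketchSettings
  @path = "default"

  class << self
    attr_accessor :path

    # Swap the path for the duration of the block and restore it afterwards,
    # even if the block raises.
    def with_path(temporary)
      old, self.path = path, temporary
      yield
    ensure
      self.path = old
    end
  end
end

puts SketchSettings.with_path("does-not-exist") { SketchSettings.path }
puts SketchSettings.path
```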

data/test/unit/db_test.rb ADDED

@@ -0,0 +1,14 @@
+require_relative "../test_helper"
+describe WordNet::DB do
+  it 'sets and reads path' do
+    with_db_path("WordNetPath") { WordNet::DB.path.must_equal "WordNetPath" }
+  end
+  it "opens a relative path" do
+    result = WordNet::DB.open(File.join("dict", "index.verb")) do |f|
+      f.gets
+    end
+    result.must_equal "  1 This software and database is being provided to you, the LICENSEE, by  \n"
+  end
+end

data/test/unit/lemma_test.rb ADDED

@@ -0,0 +1,94 @@
+require_relative "../test_helper"
+describe WordNet::Lemma do
+  describe ".find" do
+    it 'finds a lemma by string' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      lemma.to_s.must_equal "fruit,n"
+    end
+    it 'caches found' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("fruit", :noun)
+      end
+      lemma1.word.must_equal lemma2.word
+    end
+    it 'only scans the db once' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("table", :noun)
+      end
+      lemma2.word.must_equal "table"
+    end
+    it 'can lookup different things' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = WordNet::Lemma.find("banana", :noun)
+      lemma1.word.must_equal "fruit"
+      lemma2.word.must_equal "banana"
+    end
+    it 'does not find word in wrong file' do
+      lemma = WordNet::Lemma.find("elephant", :verb)
+      lemma.must_equal nil
+    end
+    it 'caches unfound' do
+      WordNet::Lemma.find("elephant", :verb)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("elephant", :verb)
+      end
+      lemma2.must_equal nil
+    end
+    it 'fails on unknown type' do
+      assert_raises Errno::ENOENT do
+        WordNet::Lemma.find("fruit", :sdjksdfjkdfskjsdfjk)
+      end
+    end
+    it "does not find by regexp" do
+      WordNet::Lemma.find(".", :verb).must_equal nil
+    end
+  end
+  describe ".find_all" do
+    it "finds all pos" do
+      result = WordNet::Lemma.find_all("fruit")
+      result.size.must_equal 2
+      result.map(&:pos).sort.must_equal ["n", "v"]
+    end
+    it "returns empty array for unfound" do
+      WordNet::Lemma.find_all("sdjkhdfsjfdsjhkfds").must_equal []
+    end
+    it "does not produce a circular reference" do
+      l = WordNet::Lemma.find_all("blink")[1]
+      l.synsets[1].expanded_hypernym.wont_be_nil
+    end
+  end
+  describe "#synsets" do
+    it 'finds them' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      synsets = lemma.synsets
+      synsets.size.must_equal 3
+      synsets[1].to_s.must_equal "(n) yield, fruit (an amount of a product)"
+    end
+  end
+  describe ".new" do
+    it "builds all fields" do
+      lemma = WordNet::Lemma.new("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550", 123)
+      lemma.id.must_equal 123
+      lemma.word.must_equal "fruit"
+      lemma.pos.must_equal "n"
+      lemma.pointer_symbols.must_equal ["@", "~", "+"]
+      lemma.tagsense_count.must_equal 3
+      lemma.synset_offsets.must_equal [13134947, 4612722, 7294550]
+    end
+  end
+end

data/test/unit/pointer_test.rb ADDED

@@ -0,0 +1,26 @@
+require_relative "../test_helper"
+describe WordNet::Pointer do
+  let(:pointer) { WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "1234") }
+  describe "#initialize" do
+    it "sets all values" do
+      pointer.symbol.must_equal "s"
+      pointer.offset.must_equal 123
+      pointer.pos.must_equal "v"
+      pointer.source.must_equal "12"
+      pointer.target.must_equal "34"
+    end
+  end
+  describe "#is_semantic?" do
+    it "is not semantic for non-0" do
+      pointer.is_semantic?.must_equal false
+    end
+    it "is semantic for all-0" do
+      pointer = WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "0000")
+      pointer.is_semantic?.must_equal true
+    end
+  end
+end

data/test/unit/synset_test.rb CHANGED

@@ -1,43 +1,39 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
+require_relative "../test_helper"
-class TestSynset < Test::Unit::TestCase
-  @@synsets = nil
-  def setup
-    if @@synsets.nil?
-      index = WordNet::NounIndex.instance
-      lemma = index.find("fruit")
-      @@synsets = lemma.get_synsets
-    end
+describe WordNet::Synset do
+  def self.synsets
+    @synsets ||= WordNet::Lemma.find("fruit", :noun).synsets
   end
-  test 'get synsets for a lemma' do
-    assert_equal 3, @@synsets.size
-    assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",@@synsets[0].to_s
-    assert_equal "an amount of a product",@@synsets[1].gloss
+  let(:synsets) { self.class.synsets }
+  it 'get synsets for a lemma' do
+    assert_equal 3, synsets.size
+    assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",synsets[0].to_s
+    assert_equal "an amount of a product",synsets[1].gloss
   end
-  test 'get hypernym for a synset' do
-    hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
-    hypernym = @@synsets[0].hypernym
+  it 'get hypernym for a synset' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym = synsets[0].hypernym
     assert_equal 1,hypernym.size
     assert_equal "(n) reproductive structure (the parts of a plant involved in its reproduction)",hypernym.to_s
   end
-  test 'test shorthand for get_relation' do
-    hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
-    hypernym2 = @@synsets[0].hypernym
+  it 'test shorthand for get_relation' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym2 = synsets[0].hypernym
     assert_equal hypernym[0].gloss, hypernym2.gloss
   end
-  test 'get hyponyms for a synset' do
-    hyponym = @@synsets[0].get_relation(WordNet::Hyponym)
+  it 'get hyponyms for a synset' do
+    hyponym = synsets[0].relation(WordNet::HYPONYM)
     assert_equal 29,hyponym.size
     assert_equal "fruit of various buckthorns yielding dyes or pigments",hyponym[26].gloss
   end
-  test 'test expanded hypernym tree' do
-    expanded = @@synsets[0].expanded_hypernym
+  it 'test expanded hypernym tree' do
+    expanded = synsets[0].expanded_hypernym
     assert_equal 8, expanded.size
     assert_equal "entity", expanded[expanded.size-1].words[0]
   end

metadata CHANGED

@@ -1,33 +1,23 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: rwordnet
-version: !ruby/object:Gem::Version
-  prerelease: false
-  segments:
-  - 0
-  - 1
-  - 3
-  version: 0.1.3
+version: !ruby/object:Gem::Version
+  version: 1.0.0
 platform: ruby
-authors:
+authors:
 - Trevor Fountain
 - Wolfram Sieber
+- Michael Grosser
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2010-10-15 00:00:00 +01:00
-default_executable:
+date: 2015-03-08 00:00:00.000000000 Z
 dependencies: []
-description: A pure Ruby interface to the WordNet database
+description:
 email: doches@gmail.com
 executables: []
 extensions: []
-extra_rdoc_files:
-- README.markdown
-files:
+extra_rdoc_files: []
+files:
 - History.txt
 - README.markdown
 - WordNet-3.0/AUTHORS
@@ -42,54 +32,43 @@ files:
 - WordNet-3.0/dict/index.adv
 - WordNet-3.0/dict/index.noun
 - WordNet-3.0/dict/index.verb
+- examples/benchmark.rb
 - examples/dictionary.rb
 - examples/full_hypernym.rb
 - lib/wordnet.rb
-- lib/wordnet/index.rb
+- lib/wordnet/db.rb
 - lib/wordnet/lemma.rb
 - lib/wordnet/pointer.rb
 - lib/wordnet/pointers.rb
-- lib/wordnet/pos.rb
 - lib/wordnet/synset.rb
-- lib/wordnet/wordnetdb.rb
+- lib/wordnet/version.rb
 - test/test_helper.rb
-- test/unit/index_test.rb
+- test/unit/db_test.rb
+- test/unit/lemma_test.rb
+- test/unit/pointer_test.rb
 - test/unit/synset_test.rb
-- test/unit/wordnetdb_test.rb
-has_rdoc: true
-homepage: http://github.com/doches/rwordnet
-licenses: []
+homepage: https://github.com/doches/rwordnet
+licenses:
+- MIT
+metadata: {}
 post_install_message:
-rdoc_options:
-- --charset=UTF-8
-require_paths:
+rdoc_options: []
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-  requirements:
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-      segments:
-      - 0
-      version: "0"
-required_rubygems_version: !ruby/object:Gem::Requirement
-  requirements:
+    - !ruby/object:Gem::Version
+      version: 2.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-      segments:
-      - 0
-      version: "0"
+    - !ruby/object:Gem::Version
+      version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 1.3.6
+rubygems_version: 2.2.2
 signing_key:
-specification_version: 3
+specification_version: 4
 summary: A pure Ruby interface to the WordNet database
-test_files:
-- test/unit/index_test.rb
-- test/unit/synset_test.rb
-- test/unit/wordnetdb_test.rb
-- test/test_helper.rb
-- examples/full_hypernym.rb
-- examples/dictionary.rb
+test_files: []

data/lib/wordnet/index.rb DELETED

@@ -1,82 +0,0 @@
-require 'singleton'
-module WordNet
-# Index is a WordNet lexicon. Note that Index is the base class; you probably want to be using the NounIndex, VerbIndex, etc. classes instead.
-# Note that Indices are Singletons -- get an Index object by calling <POS>Index.instance, not <POS>Index.new.
-class Index
-  # Create a new index for the given part of speech. +pos+ can be one of +noun+, +verb+, +adj+, or +adv+.
-  def initialize(pos)
-    @pos = pos
-    @db = {}
-    @finished_reading = false
-  end
-  # Find a lemma for a given word. Returns a Lemma which can then be used to access the synsets for the word.
-  def find(lemma_str)
-    # Look for the lemma in the part of the DB already read...
-    return @db[lemma_str] if @db.include?(lemma_str)
-    return nil if @finished_reading
-    # If we didn't find it, read in some more from the DB.
-    index = WordNetDB.open(File.join(WordNetDB.path,"dict","index.#{@pos}"))
-    lemma_counter = 1
-    if not index.closed?
-      loop do
-        break if index.eof?
-        line = index.readline
-        lemma = Lemma.new(line, lemma_counter); lemma_counter += 1
-        @db[lemma.word] = lemma
-        if line =~ /^#{lemma_str} /
-          return lemma
-        end
-      end
-      index.close
-    end
-    @finished_reading = true
-    # If we *still* didn't find it, return nil. It must not be in the database...
-    return nil
-  end
-end
-# An Index of nouns. Create a NounIndex by calling `NounIndex.instance`
-class NounIndex < Index
-  include Singleton
-  def initialize
-    super("noun")
-  end
-end
-# An Index of verbs. Create a VerbIndex by calling `VerbIndex.instance`
-class VerbIndex < Index
-  include Singleton
-  def initialize
-    super("verb")
-  end
-end
-# An Index of adjectives. Create an AdjectiveIndex by `AdjectiveIndex.instance`
-class AdjectiveIndex < Index
-  include Singleton
-  def initialize
-    super("adj")
-  end
-end
-# An Index of adverbs. Create an AdverbIndex by `AdverbIndex.instance`
-class AdverbIndex < Index
-  include Singleton
-  def initialize
-    super("adv")
-  end
-end
-end
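
For readers migrating off the deleted `Index` classes, the lookup strategy they used can be sketched in isolation: read the index line by line, cache every entry passed, and stop as soon as the requested word turns up. This is a minimal sketch under stated assumptions, not the gem's code. `LazyIndexSketch` and the three sample lines are hypothetical, the real index files use a richer line format, and the match is an exact comparison on the first field rather than the interpolated regex in the removed code.

```ruby
require 'stringio'

# Hypothetical stand-in for the removed Index class (not part of rwordnet).
# It reads an index lazily and caches every line it passes, so later
# lookups for already-seen words never touch the underlying IO again.
class LazyIndexSketch
  def initialize(io)
    @io = io            # any IO-like object positioned at the first entry
    @cache = {}         # word => raw index line
    @finished = false   # true once the whole index has been consumed
  end

  # Returns the raw index line for +word+, or nil if the word is absent.
  def find(word)
    return @cache[word] if @cache.key?(word)
    return nil if @finished

    # Resume reading where the previous lookup stopped.
    until @io.eof?
      line = @io.readline.chomp
      key = line.split(' ', 2).first
      @cache[key] = line
      return line if key == word
    end

    @finished = true
    nil
  end
end

# Made-up sample data for illustration only; not real WordNet index lines.
sample = StringIO.new("apple n 1\nfruit n 3\npear n 1\n")
index = LazyIndexSketch.new(sample)
found = index.find("fruit")    # reads two lines, then stops
missing = index.find("zebra")  # reads the rest, then marks the index finished
```

The `@finished` flag plays the role of `@finished_reading` in the removed class: once the index is exhausted, a miss returns immediately instead of triggering another read.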

data/lib/wordnet/pos.rb DELETED

@@ -1,3 +0,0 @@
-module WordNet
-  SynsetType = {"n" => "noun", "v" => "verb", "adj" => "adj", "adv" => "adv"}
-end

data/lib/wordnet/wordnetdb.rb DELETED

@@ -1,54 +0,0 @@
-module WordNet
-# Represents the WordNet database, and provides some basic interaction.
-class WordNetDB
-  # By default, use the bundled WordNet
-  @@path = File.join(File.dirname(__FILE__),"/../../WordNet-3.0/")
-  @@files = {}
-  # To use your own WordNet installation (rather than the one bundled with rwordnet:
-  def WordNetDB.path=(path_to_wordnet)
-    @@path = path_to_wordnet
-  end
-  # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
-  def WordNetDB.path
-    @@path
-  end
-  # Look up a word in WordNet. Returns a list of lemmas occuring in any of the index files (noun, verb, adjective, adverb).
-  def WordNetDB.find(word)
-    lemmas = []
-    [NounIndex, VerbIndex, AdjectiveIndex, AdverbIndex].each do |index|
-      lemmas.push index.instance.find(word)
-    end
-    return lemmas.flatten.reject { |x| x.nil? }
-  end
-  # Register a new DB file handle. You shouldn't need to call this method; it's called automatically every time you open an index or data file.
-  def WordNetDB.open(path)
-    # If the file is already open, just return the handle.
-    return @@files[path] if @@files.include?(path) and not @@files[path].closed?
-    # Open and store
-    @@files[path] = File.open(path,"r")
-    return @@files[path]
-  end
-  # You should call this method after you're done using WordNet.
-  def WordNetDB.close
-    WordNetDB.finalize(0)
-  end
-  def WordNetDB.finalize(id)
-    @@files.each_value do |handle|
-      begin
-        handle.close
-      rescue IOError
-        ; # Keep going, close the next file.
-      end
-    end
-  end
-end
-end
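
The file-handle bookkeeping in the removed `WordNetDB.open` and `WordNetDB.close` can be illustrated on its own: keep one handle per path, reuse it while it is still open, and close everything in one call. The sketch below is an assumption-laden illustration, not the replacement `WordNet::DB` implementation. `HandleRegistrySketch` is a hypothetical name, and it holds its state in instance variables rather than the `@@` class variables used in the removed code.

```ruby
require 'tempfile'

# Hypothetical registry of read-only file handles (not part of rwordnet).
class HandleRegistrySketch
  def initialize
    @files = {}   # path => File
  end

  # Returns the cached handle for +path+ if it is still open,
  # otherwise opens the file and remembers the new handle.
  def open(path)
    handle = @files[path]
    return handle if handle && !handle.closed?

    @files[path] = File.open(path, 'r')
  end

  # Closes every handle that is still open and forgets them all.
  def close
    @files.each_value do |handle|
      handle.close unless handle.closed?
    end
    @files.clear
  end
end

# Usage with a throwaway file standing in for an index or data file.
Tempfile.create('index-sketch') do |tmp|
  registry = HandleRegistrySketch.new
  first = registry.open(tmp.path)
  second = registry.open(tmp.path)
  puts first.equal?(second)   # true: the same handle is reused
  registry.close
  puts first.closed?          # true: close released the handle
end
```

Checking `closed?` before closing avoids the `rescue IOError` branch the removed `finalize` method relied on.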

data/test/unit/index_test.rb DELETED

@@ -1,21 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-class TestIndex < Test::Unit::TestCase
-  @@index = nil
-  def setup
-    @@index = WordNet::NounIndex.instance if @@index.nil?
-  end
-  test 'find a lemma by string' do
-    lemma = @@index.find("fruit")
-    assert_equal "fruit,n",lemma.to_s
-  end
-  test 'get synsets for a lemma' do
-    lemma = @@index.find("fruit")
-    synsets = lemma.get_synsets
-    assert_equal 3, synsets.size
-    assert_equal "(n) yield, fruit (an amount of a product)",synsets[1].to_s
-  end
-end

data/test/unit/wordnetdb_test.rb DELETED

@@ -1,15 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-class TestWordNetDB < Test::Unit::TestCase
-  include WordNet
-  test 'set and read path' do
-    WordNetDB.path = "WordNetPath"
-    assert_equal "WordNetPath",WordNetDB.path
-  end
-  test 'find a word' do
-    lemmas = WordNetDB.find("fruit")
-    assert_equal 2,lemmas.size
-  end
-end