rwordnet 0.1.3 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 4bea9b677b6a581d27c04ad1912c3034bf8329d8
+ data.tar.gz: 4fdef6fdfdbe2373b445f7857d3fe5ff5071fba2
+ SHA512:
+ metadata.gz: c1471523dcb27e496eb72b406f37ddd83184dfcecbcabe95c492f6018b29fa53180d06035b7f9c0a938bfde0e4c28a37f89473c0b3a712be527ed2abf6261af3
+ data.tar.gz: 25cf40f306f3dfcf8076d9430cfd5d8d805af9198762a530ed4f293e5b0c4e5ccc680e941b2256f63c5141d5499ad77b7821eec5b4e245a72a47e1f9ca6e83b3
data/History.txt CHANGED
@@ -1,5 +1,14 @@
+ # rWordNet 1.0.0
+ * Performance fixes for the lookup
+ * Find using Lemma.find / Lemma.find_all
+ * using ruby style constant names like `VerbPointers` -> `VERB_POINTERS`
+ * renamed WordNet::WordNetDB to WordNet::DB
+ * renaming a few methods in Lemma like `p_cnt` -> `pointer_count`
+ * make Pointer a real class
+ * renaming a few methods in SynSet like `get_relation` -> `relation`
+
  # rWordNet 0.1.3
- # Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
+ * Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
 
  # rWordNet 0.1.2
  * Added unique (integer) ids to lemmas [Wolfram Sieber]
data/README.markdown CHANGED
@@ -1,59 +1,69 @@
  # A pure Ruby interface to WordNet #
 
+ [![Build Status](https://travis-ci.org/doches/rwordnet.png)](https://travis-ci.org/doches/rwordnet)
+
+ ## Summary ##
+
+ + Works directly on the database that comes with WordNet
+ + No gem or native dependencies
+ + *Very* easy to install
+ + Small footprint (8.1M vs 24M for Ruby-Wordnet+DB)
+ + Can use a custom, existing WordNet installation
+
  ## About ##
 
  This library implements a pure Ruby interface to the WordNet lexical/semantic
- database. Unlike existing ruby bindings, this one doesn't require you to convert
+ database. Unlike existing ruby bindings, this one doesn't require you to convert
  the original WordNet database into a new database format; instead it can work directly
  on the database that comes with WordNet.
 
  If you're doing something data-intensive you will achieve much better performance
- with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
- since it converts the WordNet database into a BerkelyDB file for quicker access. In
- writing rwordnet, I've focused more on usability and ease of installation ( *gem install
+ with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
+ since it converts the WordNet database into a BerkelyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the space on disk as Ruby-Wordnet + DB. In
+ writing rwordnet, I've focused more on usability and ease of installation ( *gem install
  rwordnet* ) at the expense of some performance. Use at your own risk, etc.
 
  ## Installation ##
 
  One of the chief benefits of rwordnet over Ruby-WordNet is how easy it is to install:
 
- gem install gemcutter # These two steps are only necessary if you haven't
- gem tumble # yet installed the gemcutter tools
  gem install rwordnet
-
- That's it! rwordnet comes bundled with the WordNet database which it uses by default,
+
+ That's it! rwordnet comes bundled with the WordNet database which it uses by default,
  so there's absolutely nothing else to download, install, or configure.
- Of course, if you want to use your own WordNet installation, that's easy too -- just
+ Of course, if you want to use your own WordNet installation, that's easy too -- just
  set the path to WordNet's database files before using the library (see examples below).
 
  ## Usage ##
 
  The other benefit of rwordnet over Ruby-WordNet is that it's so much easier (IMHO) to
- use.
+ use.
+
+ As an example, consider finding all of the noun glosses for a given word:
 
- As a quick example, consider finding all of the noun glosses for a given word:
+ ```Ruby
+ require 'wordnet'
 
- require 'rubygems'
- require 'wordnet'
-
- index = WordNet::NounIndex.instance
- lemma = index.find("fruit")
- lemma.synsets.each { |synset| puts synset.gloss }
+ lemma = WordNet::Lemma.find("fruit", :noun)
+ lemma.synsets.each { |synset| puts synset.gloss }
+ ```
 
  ...or all of the glosses, period:
 
- lemmas = WordNet::WordNetDB.find("fruit")
- synsets = lemmas.map { |lemma| lemma.synsets }
- words = synsets.flatten
- words.each { |word| puts word.gloss }
+ ```Ruby
+ lemmas = WordNet::Lemma.find_all("fruit")
+ synsets = lemmas.map { |lemma| lemma.synsets }
+ words = synsets.flatten
+ words.each { |word| puts word.gloss }
+ ```
 
  Have your own WordNet database that you've marked up with extra attributes and whatnot?
  No problem:
 
- require 'rubygems'
- require 'wordnet'
-
- include WordNet
- WordNetDB.path = "/path/to/WordNet-3.0"
- lemmas = WordNetDB.find("fruit")
- ...
+ ```Ruby
+ require 'wordnet'
+
+ WordNet::DB.path = "/path/to/WordNet-3.0"
+ lemmas = WordNet::Lemma.find_all("fruit")
+ ...
+ ```
data/examples/benchmark.rb ADDED
@@ -0,0 +1,14 @@
+ require 'benchmark'
+ require 'wordnet'
+
+ initial = Benchmark.realtime do
+ WordNet::Lemma.find(ARGV[0] || raise("Usage: ruby benchmark.rb noun"), :noun)
+ end
+
+ puts "Time to initial word #{initial}"
+
+ lookup = Benchmark.realtime do
+ 1000.times { WordNet::Lemma.find('fruit', :noun) }
+ end
+
+ puts "Time for 1k lookups #{lookup}"
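The benchmark times the first lookup separately because `Lemma.find` (in lemma.rb) reads `dict/index.<pos>` once per part of speech into a hash keyed on each line's first word; every later lookup, hit or miss, is a hash read. A minimal sketch of that cache-once pattern over an in-memory `StringIO`, under the assumption that the index behaves like any line-oriented IO; `IndexCache` and the `banana` line are hypothetical, and only the `fruit` line comes from lemma_test.rb.

```ruby
require 'stringio'

# Hypothetical stand-in for Lemma's per-POS cache: each index source is
# scanned once, and every later lookup is a plain Hash access.
class IndexCache
  attr_reader :scans

  # +sources+ maps a POS symbol to an IO-like object that responds to #each_line.
  def initialize(sources)
    @sources = sources
    @cache = {}
    @scans = 0
  end

  # Returns [line, 1-based line number] for +word+, or nil when it is absent.
  def lookup(word, pos)
    table = (@cache[pos] ||= build(pos))
    table[word]
  end

  private

  def build(pos)
    @scans += 1
    table = {}
    @sources.fetch(pos).each_line.each_with_index do |line, index|
      # Key on the text before the first space, as build_cache does.
      key = line[0, line.index(' ') || line.length]
      table[key] = [line, index + 1]
    end
    table
  end
end

# Illustrative index lines; only the "fruit" line is taken from lemma_test.rb.
noun_index = StringIO.new(
  "banana n 1 1 @ 1 0 00000001\n" \
  "fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550\n"
)
cache = IndexCache.new(noun: noun_index)

1000.times { cache.lookup("fruit", :noun) }
p cache.lookup("fruit", :noun)[1]  # → 2
p cache.lookup("elephant", :noun)  # → nil
p cache.scans                      # → 1
```

The design trades one full read of the index file (and the memory for the hash) for constant-time lookups afterwards, which is what the two timings in the benchmark separate.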
data/examples/dictionary.rb CHANGED
@@ -1,5 +1,4 @@
  # Use WordNet as a command-line dictionary.
- require 'rubygems'
  require 'wordnet'
 
  if ARGV.size != 1
@@ -10,10 +9,10 @@ end
  word = ARGV[0]
 
  # Find all the lemmas for a word (i.e., whether it occurs as a noun, verb, etc.)
- lemmas = WordNet::WordNetDB.find(word)
+ lemmas = WordNet::Lemma.find_all(word)
 
  # Print out each lemma with a list of possible meanings.
- lemmas.each do |lemma|
+ lemmas.each do |lemma|
  puts lemma
  lemma.synsets.each_with_index do |synset,i|
  puts "\t#{i+1}) #{synset.gloss}"
data/examples/full_hypernym.rb CHANGED
@@ -1,10 +1,7 @@
- require 'rubygems'
  require 'wordnet'
 
- # Open the index file for nouns
- index = WordNet::NounIndex.new
  # Find the word 'fruit'
- lemma = index.find("fruit")
+ lemma = WordNet::Lemma.find("fruit", :noun)
  # Find all the synsets for 'fruit', and pick the first one.
  synset = lemma.synsets[0]
  puts synset
data/lib/wordnet/db.rb ADDED
@@ -0,0 +1,17 @@
+ module WordNet
+ # Represents the WordNet database, and provides some basic interaction.
+ class DB
+ # By default, use the bundled WordNet
+ @path = File.expand_path("../../../WordNet-3.0/", __FILE__)
+
+ class << self
+ # To use your own WordNet installation (rather than the one bundled with rwordnet:
+ # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
+ attr_accessor :path
+
+ def open(path, &block)
+ File.open(File.join(self.path, path), "r", &block)
+ end
+ end
+ end
+ end
data/lib/wordnet/lemma.rb CHANGED
@@ -1,39 +1,60 @@
  module WordNet
+ # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
+ class Lemma
+ SPACE = ' '
+ attr_accessor :word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets, :id
 
- # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
- class Lemma
- attr_accessor :lemma, :pos, :synset_cnt, :p_cnt, :ptr_symbol, :tagsense_cnt, :synset_offset, :id
-
- # Create a lemma from a line in an index file. You should be creating Lemmas by hand; instead,
- # use the WordNet#find and Index#find methods to find the Lemma for a word.
- def initialize(index_line, id = 0)
- @id = (id > 0) ? id : nil
- line = index_line.split(" ")
-
- @lemma = line.shift
- @pos = line.shift
- @synset_cnt = line.shift.to_i
- @p_cnt = line.shift.to_i
-
- @ptr_symbol = []
- @p_cnt.times { @ptr_symbol.push line.shift }
- line.shift # Throw away redundant sense_cnt
- @tagsense_cnt = line.shift.to_i
- @synset_offset = []
- @synset_cnt.times { @synset_offset.push line.shift.to_i }
- end
-
- # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
- def get_synsets
- return @synset_offset.map { |offset| Synset.new(@pos, offset) }
- end
-
- def to_s
- [@lemma, @pos].join(",")
- end
-
- alias synsets get_synsets
- alias word lemma
- end
+ # Create a lemma from a line in an lexicon file. You should be creating Lemmas by hand; instead,
+ # use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.
+ def initialize(lexicon_line, id)
+ @id = id
+ line = lexicon_line.split(" ")
+
+ @word = line.shift
+ @pos = line.shift
+ synset_count = line.shift.to_i
+ @pointer_symbols = line.slice!(0, line.shift.to_i)
+ line.shift # Throw away redundant sense_cnt
+ @tagsense_count = line.shift.to_i
+ @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
+ end
+
+ # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
+ def synsets
+ @synset_offsets.map { |offset| Synset.new(@pos, offset) }
+ end
 
+ def to_s
+ [@word, @pos].join(",")
+ end
+
+ class << self
+ @@cache = {}
+
+ def find_all(word)
+ [:noun, :verb, :adj, :adv].flat_map do |pos|
+ find(word, pos) || []
+ end
+ end
+
+ # Find a lemma for a given word and pos
+ def find(word, pos)
+ cache = @@cache[pos] ||= build_cache(pos)
+ if found = cache[word]
+ Lemma.new(*found)
+ end
+ end
+
+ private
+
+ def build_cache(pos)
+ cache = {}
+ DB.open(File.join("dict", "index.#{pos}")).each_line.each_with_index do |line, index|
+ word = line.slice(0, line.index(SPACE))
+ cache[word] = [line, index+1]
+ end
+ cache
+ end
+ end
+ end
  end
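`Lemma#initialize` above consumes an index line field by field: word, POS, synset count, a counted run of pointer symbols, the redundant sense count, the tagged-sense count, and finally one offset per synset. A minimal stand-alone sketch of the same parse, run on the sample line from lemma_test.rb; `IndexEntry` and `parse_index_line` are hypothetical names, and `Array#shift(n)` stands in for the `slice!(0, n)` calls in lemma.rb.

```ruby
# Hypothetical value object holding the fields Lemma#initialize extracts.
IndexEntry = Struct.new(:word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets)

# Parse one line of dict/index.<pos>, whose fields are:
#   lemma pos synset_cnt p_cnt ptr_symbol... sense_cnt tagsense_cnt synset_offset...
def parse_index_line(text)
  fields = text.split(' ')
  word = fields.shift
  pos = fields.shift
  synset_count = fields.shift.to_i
  pointer_count = fields.shift.to_i
  pointer_symbols = fields.shift(pointer_count) # same effect as slice!(0, n)
  fields.shift                                  # sense_cnt repeats synset_cnt
  tagsense_count = fields.shift.to_i
  synset_offsets = fields.shift(synset_count).map(&:to_i)
  IndexEntry.new(word, pos, pointer_symbols, tagsense_count, synset_offsets)
end

entry = parse_index_line("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550")
p entry.pointer_symbols # → ["@", "~", "+"]
p entry.synset_offsets  # → [13134947, 4612722, 7294550]
```

Because both counts are read before the runs they describe, no field needs a lookahead; the leading zero in `04612722` is simply dropped by `to_i`.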
data/lib/wordnet/pointer.rb CHANGED
@@ -1,15 +1,14 @@
  module WordNet
+ class Pointer
+ attr_reader :symbol, :offset, :pos, :source, :target
 
- # Convenience class for treating hashes as objects, i.e. obj[:key] <=> obj.key. I know
- # this is probably a bad idea, but it's so convenient...
- class Pointer < Hash
- def method_missing(msg, *args)
- if self.include?(msg)
- return self[msg]
- else
- throw NoMethodError.new("undefined method `#{msg}' for #{self}:Pointer")
+ def initialize(symbol: raise, offset: raise, pos: raise, source: raise)
+ @symbol, @offset, @pos, @source = symbol, offset, pos, source
+ @target = source.slice!(2,2)
  end
- end
- end
 
+ def is_semantic?
+ source == "00" && target == "00"
+ end
+ end
  end
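`Pointer#initialize` splits the four-character source/target field with `source.slice!(2,2)`, which also shortens the string it was given. In WordNet's data-file format (the wndb(5) page), the two halves are two-digit hexadecimal word numbers, and `0000` marks a semantic pointer, which is what `is_semantic?` checks. A minimal non-mutating sketch under those assumptions; `SourceTarget` and `split_source_target` are hypothetical names.

```ruby
# Hypothetical value object for a pointer's source/target field.
SourceTarget = Struct.new(:source, :target) do
  # "0000" marks a semantic (synset-to-synset) pointer.
  def semantic?
    source == "00" && target == "00"
  end

  # Each half is a two-digit hexadecimal word number.
  def source_word
    source.to_i(16)
  end

  def target_word
    target.to_i(16)
  end
end

# Split "SSTT" into its halves using index reads, leaving +field+ untouched.
def split_source_target(field)
  SourceTarget.new(field[0, 2], field[2, 2])
end

raw = "010a"
st = split_source_target(raw)
p st.target_word                        # → 10
p raw                                   # → "010a"
p split_source_target("0000").semantic? # → true
```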
data/lib/wordnet/pointers.rb CHANGED
@@ -1,37 +1,82 @@
- # A container for various constants. In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
- # Use these symbols in conjunction with the Synset#get_relation method.
+ # A container for various constants.
+ # In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
+ # Use these symbols in conjunction with the Synset#relation method.
 
  module WordNet
+ NOUN_POINTERS = {
+ "-c" => "Member of this domain - TOPIC",
+ "+" => "Derivationally related form",
+ "%p" => "Part meronym",
+ "~i" => "Instance Hyponym",
+ "@" => "Hypernym",
+ ";r" => "Domain of synset - REGION",
+ "!" => "Antonym",
+ "#p" => "Part holonym",
+ "%s" => "Substance meronym",
+ ";u" => "Domain of synset - USAGE",
+ "-r" => "Member of this domain - REGION",
+ "#s" => "Substance holonym",
+ "=" => "Attribute",
+ "-u" => "Member of this domain - USAGE",
+ ";c" => "Domain of synset - TOPIC",
+ "%m" => "Member meronym",
+ "~" => "Hyponym",
+ "@i" => "Instance Hypernym",
+ "#m" => "Member holonym"
+ }
+ VERB_POINTERS = {
+ "+" => "Derivationally related form",
+ "@" => "Hypernym",
+ ";r" => "Domain of synset - REGION",
+ "!" => "Antonym",
+ ";u" => "Domain of synset - USAGE",
+ "$" => "Verb Group",
+ ";c" => "Domain of synset - TOPIC",
+ ">" => "Cause",
+ "~" => "Hyponym",
+ "*" => "Entailment"
+ }
+ ADJECTIVE_POINTERS = {
+ ";r" => "Domain of synset - REGION",
+ "!" => "Antonym",
+ "\\" => "Pertainym (pertains to noun)",
+ "<" => "Participle of verb",
+ "&" => "Similar to",
+ "=" => "Attribute",
+ ";c" => "Domain of synset - TOPIC"
+ }
+ ADVERB_POINTERS = {
+ ";r" => "Domain of synset - REGION",
+ "!" => "Antonym",
+ ";u" => "Domain of synset - USAGE",
+ "\\" => "Derived from adjective",
+ ";c" => "Domain of synset - TOPIC"
+ }
 
- NounPointers = {"-c"=>"Member of this domain - TOPIC", "+"=>"Derivationally related form", "%p"=>"Part meronym", "~i"=>"Instance Hyponym", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", "#p"=>"Part holonym", "%s"=>"Substance meronym", ";u"=>"Domain of synset - USAGE", "-r"=>"Member of this domain - REGION", "#s"=>"Substance holonym", "="=>"Attribute", "-u"=>"Member of this domain - USAGE", ";c"=>"Domain of synset - TOPIC", "%m"=>"Member meronym", "~"=>"Hyponym", "@i"=>"Instance Hypernym", "#m"=>"Member holonym"}
- VerbPointers = {"+"=>"Derivationally related form", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "$"=>"Verb Group", ";c"=>"Domain of synset - TOPIC", ">"=>"Cause", "~"=>"Hyponym", "*"=>"Entailment"}
- AdjectivePointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", "\\"=>"Pertainym (pertains to noun)", "<"=>"Participle of verb", "&"=>"Similar to", "="=>"Attribute", ";c"=>"Domain of synset - TOPIC"}
- AdverbPointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "\\"=>"Derived from adjective", ";c"=>"Domain of synset - TOPIC"}
-
- MemberOfThisDomainTopic = "-c"
- DerivationallyRelatedForm = "+"
- PartMeronym = "%p"
+ MEMBER_OF_THIS_DOMAIN_TOPIC = "-c"
+ DERIVATIONALLY_RELATED_FORM = "+"
+ PART_MERONYM = "%p"
  InstanceHyponym = "~i"
- Hypernym = "@"
- DomainOfSynsetRegion = ";r"
- Antonym = "!"
- PartHolonym = "#p"
- SubstanceMeronym = "%s"
- VerbGroup = "$"
- DomainOfSynsetUsage = ";u"
- MemberOfThisDomainRegion = "-r"
- SubstanceHolonym = "#s"
- DerivedFromAdjective = "\\"
- ParticipleOfVerb = "<"
- SimilarTo = "&"
- Attribute = "="
- AlsoSee = "^"
- Cause = ">"
- MemberOfThisDomainUsage = "-u"
- DomainOfSynsetTopic = ";c"
- MemberMeronym = "%m"
- Hyponym = "~"
- InstanceHypernym = "@i"
- Entailment = "*"
- MemberHolonym = "#m"
+ HYPERNYM = "@"
+ DOMAIN_OF_SYNSET_REGION = ";r"
+ ANTONYM = "!"
+ PART_HOLONYM = "#p"
+ SUBSTANCE_MERONYM = "%s"
+ VERB_GROUP = "$"
+ DOMAIN_OF_SYNSET_USAGE = ";u"
+ MEMBER_OF_THIS_DOMAIN_REGION = "-r"
+ SUBSTANCE_HOLONYM = "#s"
+ DERIVED_FROM_ADJECTIVE = "\\"
+ PARTICIPLE_OF_VERB = "<"
+ SIMILAR_TO = "&"
+ ATTRIBUTE = "="
+ ALSO_SEE = "^"
+ CAUSE = ">"
+ MEMBER_OF_THIS_DOMAIN_USAGE = "-u"
+ DOMAIN_OF_SYNSET_TOPIC = ";c"
+ MEMBER_MERONYM = "%m"
+ HYPONYM = "~"
+ INSTANCE_HYPERNYM = "@i"
+ ENTAILMENT = "*"
+ MEMBER_HOLONYM = "#m"
  end
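pointers.rb keeps one symbol-to-description hash per part of speech, and the same symbol can mean different things in different tables: the backslash is "Pertainym (pertains to noun)" for adjectives but "Derived from adjective" for adverbs. A minimal sketch of choosing the table from a synset-type letter before describing a symbol; the table excerpts are a subset copied from the hashes above, `describe_pointer` and `POINTER_TABLES` are hypothetical, and mapping `s` (the adjective-satellite letter used in WordNet data files) to the adjective table is an assumption of this sketch.

```ruby
# Excerpts of the tables in pointers.rb (a subset, not the full lists).
NOUN_POINTERS = { "@" => "Hypernym", "~" => "Hyponym", "%p" => "Part meronym" }.freeze
VERB_POINTERS = { "@" => "Hypernym", "*" => "Entailment", ">" => "Cause" }.freeze
ADJECTIVE_POINTERS = { "&" => "Similar to", "\\" => "Pertainym (pertains to noun)" }.freeze
ADVERB_POINTERS = { "\\" => "Derived from adjective" }.freeze

# Hypothetical map from a synset-type letter to its table; "s" (adjective
# satellite) is assumed to share the adjective table.
POINTER_TABLES = {
  "n" => NOUN_POINTERS,
  "v" => VERB_POINTERS,
  "a" => ADJECTIVE_POINTERS,
  "s" => ADJECTIVE_POINTERS,
  "r" => ADVERB_POINTERS
}.freeze

# Describe +symbol+ for the given part of speech, or return nil when the
# symbol is not listed for that POS. Unknown POS letters raise KeyError.
def describe_pointer(pos, symbol)
  POINTER_TABLES.fetch(pos)[symbol]
end

puts describe_pointer("a", "\\") # prints: Pertainym (pertains to noun)
puts describe_pointer("r", "\\") # prints: Derived from adjective
p describe_pointer("v", "%p")    # → nil
```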
data/lib/wordnet/synset.rb CHANGED
@@ -1,90 +1,98 @@
  module WordNet
+ SYNSET_TYPES = {"n" => "noun", "v" => "verb", "a" => "adj", "r" => "adv"}
 
- # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
- # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
- class Synset
- attr_reader :gloss, :synset_offset, :lex_filenum, :ss_type, :w_cnt, :wordcounts
-
- # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
- # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
- def initialize(pos, offset)
- data = File.open(File.join(WordNetDB.path,"dict","data.#{SynsetType[pos]}"),"r")
- data.seek(offset)
- data_line = data.readline.strip
- data.close
-
- info_line, @gloss = data_line.split(" | ")
- line = info_line.split(" ")
-
- @synset_offset = line.shift
- @lex_filenum = line.shift
- @ss_type = line.shift
- @w_cnt = line.shift.to_i
- @wordcounts = {}
- @w_cnt.times do
- @wordcounts[line.shift] = line.shift.to_i
+ # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
+ # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
+ class Synset
+ attr_reader :gloss, :synset_offset, :lex_filenum, :synset_type, :word_counts, :pos_offset, :pos
+
+ # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
+ # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
+ def initialize(pos, offset)
+ data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
+ f.seek(offset)
+ f.readline.strip
+ end
+
+ info_line, @gloss = data_line.split(" | ", 2)
+ line = info_line.split(" ")
+
+ @pos = pos
+ @pos_offset = offset
+ @synset_offset = line.shift
+ @lex_filenum = line.shift
+ @synset_type = line.shift
+
+ @word_counts = {}
+ word_count = line.shift.to_i
+ word_count.times do
+ @word_counts[line.shift] = line.shift.to_i
+ end
+
+ pointer_count = line.shift.to_i
+ @pointers = Array.new(pointer_count).map do
+ Pointer.new(
+ symbol: line.shift[0],
+ offset: line.shift.to_i,
+ pos: line.shift,
+ source: line.shift
+ )
+ end
  end
-
- @p_cnt = line.shift.to_i
- @pointers = []
- @p_cnt.times do
- pointer = Pointer.new
- pointer[:symbol] = line.shift,
- pointer[:offset] = line.shift.to_i
- pointer[:pos] = line.shift
- pointer[:source] = line.shift
- pointer[:is_semantic?] = (pointer[:source] == "0000")
- pointer[:target] = pointer[:source][2..3]
- pointer[:source] = pointer[:source][0..1]
- pointer[:symbol] = pointer[:symbol][0]
- @pointers.push pointer
+
+ # How many words does this Synset include?
+ def word_count
+ @word_counts.size
  end
- end
-
- # How many words does this Synset include?
- def size
- @wordcounts.size
- end
-
- # Get a list of words included in this Synset
- def words
- @wordcounts.keys
- end
-
- # List of valid +pointer_symbol+s is in pointers.rb
- def get_relation(pointer_symbol)
- @pointers.reject { |pointer| pointer.symbol != pointer_symbol }.map { |pointer| Synset.new(@ss_type, pointer.offset) }
- end
-
- # Get the Synset of this sense's antonym
- def antonym
- get_relation(Antonym)
- end
-
- # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
- def hypernym
- get_relation(Hypernym)[0]
- end
-
- # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
- def hyponym
- get_relation(Hyponym)
- end
-
- # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
- def expanded_hypernym
- parent = self.hypernym
- return [] if parent.nil?
-
- return [parent, parent.expanded_hypernym].flatten
- end
-
- def to_s
- "(#{@ss_type}) #{words.map {|x| x.gsub('_',' ')}.join(', ')} (#{@gloss})"
- end
-
- alias parent hypernym
- alias children hyponym
- end
 
+ # Get a list of words included in this Synset
+ def words
+ @word_counts.keys
+ end
+
+ # List of valid +pointer_symbol+s is in pointers.rb
+ def relation(pointer_symbol)
+ @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
+ map! { |pointer| Synset.new(@synset_type, pointer.offset) }
+ end
+
+ # Get the Synset of this sense's antonym
+ def antonym
+ relation(ANTONYM)
+ end
+
+ # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
+ def hypernym
+ relation(HYPERNYM)[0]
+ end
+
+ # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
+ def hyponym
+ relation(HYPONYM)
+ end
+
+ # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
+ def expanded_hypernym
+ parent = hypernym
+ list = []
+ return list unless parent
+
+ while parent
+ break if list.include? parent.pos_offset
+ list.push parent.pos_offset
+ parent = parent.parent
+ end
+
+ list.flatten!
+ list.map! { |offset| Synset.new(@pos, offset)}
+ end
+
+ def to_s
+ "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
+ end
+
+ alias size word_count
+ alias parent hypernym
+ alias children hyponym
+ end
  end
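`Synset#initialize` opens `dict/data.<pos>` through `DB.open`, seeks to the synset's byte offset, reads one line, and splits the gloss off with `split(" | ", 2)`. A minimal sketch of that read using an in-memory `StringIO` with two made-up data lines; the helper name is hypothetical, and parsing `w_cnt` as hexadecimal follows the data-file description in WordNet's wndb(5) page rather than anything shown in this diff.

```ruby
require 'stringio'

# Two made-up data-file lines. Each line starts with its own byte offset,
# which is how the offsets stored on a Lemma locate synsets in the real files.
first  = "00000000 03 n 01 alpha 0 000 | first gloss\n"
second = "00000043 03 n 02 beta 0 gamma 0 000 | second gloss | with a bar\n"
data = StringIO.new(first + second)

# Read the single line that begins +offset+ bytes into +io+.
def read_synset_line(io, offset)
  io.seek(offset)
  io.readline.strip
end

line = read_synset_line(data, first.bytesize) # first.bytesize == 43
info, gloss = line.split(" | ", 2)            # limit 2 keeps later " | " in the gloss
fields = info.split(" ")
word_count = fields[3].to_i(16)               # w_cnt: two-digit hexadecimal
# Words alternate with one-digit lex_id values: word lex_id word lex_id ...
words = fields[4, word_count * 2].each_slice(2).map(&:first)

p gloss # → "second gloss | with a bar"
p words # → ["beta", "gamma"]
```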
data/lib/wordnet/version.rb ADDED
@@ -0,0 +1,3 @@
+ module WordNet
+ VERSION = "1.0.0"
+ end
data/lib/wordnet.rb CHANGED
@@ -1,7 +1,5 @@
  require 'wordnet/pointer'
- require 'wordnet/wordnetdb'
- require 'wordnet/index'
+ require 'wordnet/db'
  require 'wordnet/lemma'
  require 'wordnet/pointers'
- require 'wordnet/pos'
  require 'wordnet/synset'
data/test/test_helper.rb CHANGED
@@ -1,17 +1,16 @@
- require "test/unit"
- require File.dirname(__FILE__) + "/../lib/wordnet"
+ require "bundler/setup"
+ require "maxitest/autorun"
 
+ $LOAD_PATH.unshift Bundler.root.join("lib")
+ require "wordnet"
 
- class << Test::Unit::TestCase
- def test(name, &block)
- test_name = :"test_#{name.gsub(' ','_')}"
- raise ArgumentError, "#{test_name} is already defined" if self.instance_methods.include? test_name.to_s
- define_method test_name, &block
- end
-
- def expect(expected_value, &block)
- define_method :"test_#{caller.first.split("/").last}" do
- assert_equal expected_value, instance_eval(&block)
- end
+ Minitest::Test.class_eval do
+ def with_db_path(path)
+ begin
+ old, WordNet::DB.path = WordNet::DB.path, path
+ yield
+ ensure
+ WordNet::DB.path = old
+ end
  end
  end
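The new test helper swaps `WordNet::DB.path` for the duration of a block and restores it in `ensure`; the lemma tests rely on this to show that cached lookups never touch the disk. A minimal sketch of the same pattern against a hypothetical `FakeDB` class (standing in for `WordNet::DB`), showing that the old path comes back even when the block raises.

```ruby
# Hypothetical stand-in for WordNet::DB's class-level path setting.
class FakeDB
  @path = "/bundled/WordNet-3.0"

  class << self
    attr_accessor :path
  end
end

# Temporarily point FakeDB at +path+; the ensure clause restores the old
# value even when the block raises. The block's value is returned.
def with_db_path(path)
  old = FakeDB.path
  FakeDB.path = path
  yield
ensure
  FakeDB.path = old
end

seen = with_db_path("does-not-exist") { FakeDB.path }
p seen        # → "does-not-exist"
p FakeDB.path # → "/bundled/WordNet-3.0"

begin
  with_db_path("broken") { raise "lookup failed" }
rescue RuntimeError
  # expected: the block raised, but the path must still be restored
end
p FakeDB.path # → "/bundled/WordNet-3.0"
```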
data/test/unit/db_test.rb ADDED
@@ -0,0 +1,14 @@
+ require_relative "../test_helper"
+
+ describe WordNet::DB do
+ it 'sets and reads path' do
+ with_db_path("WordNetPath") { WordNet::DB.path.must_equal "WordNetPath" }
+ end
+
+ it "opens a relative path" do
+ result = WordNet::DB.open(File.join("dict", "index.verb")) do |f|
+ f.gets
+ end
+ result.must_equal " 1 This software and database is being provided to you, the LICENSEE, by \n"
+ end
+ end
data/test/unit/lemma_test.rb ADDED
@@ -0,0 +1,94 @@
+ require_relative "../test_helper"
+
+ describe WordNet::Lemma do
+ describe ".find" do
+ it 'finds a lemma by string' do
+ lemma = WordNet::Lemma.find("fruit", :noun)
+ lemma.to_s.must_equal "fruit,n"
+ end
+
+ it 'caches found' do
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
+ lemma2 = with_db_path "does-not-exist" do
+ WordNet::Lemma.find("fruit", :noun)
+ end
+ lemma1.word.must_equal lemma2.word
+ end
+
+ it 'only scans the db once' do
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
+ lemma2 = with_db_path "does-not-exist" do
+ WordNet::Lemma.find("table", :noun)
+ end
+ lemma2.word.must_equal "table"
+ end
+
+ it 'can lookup different things' do
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
+ lemma2 = WordNet::Lemma.find("banana", :noun)
+ lemma1.word.must_equal "fruit"
+ lemma2.word.must_equal "banana"
+ end
+
+ it 'does not find word in wrong file' do
+ lemma = WordNet::Lemma.find("elephant", :verb)
+ lemma.must_equal nil
+ end
+
+ it 'caches unfound' do
+ WordNet::Lemma.find("elephant", :verb)
+ lemma2 = with_db_path "does-not-exist" do
+ WordNet::Lemma.find("elephant", :verb)
+ end
+ lemma2.must_equal nil
+ end
+
+ it 'fails on unknown type' do
+ assert_raises Errno::ENOENT do
+ WordNet::Lemma.find("fruit", :sdjksdfjkdfskjsdfjk)
+ end
+ end
+
+ it "does not find by regexp" do
+ WordNet::Lemma.find(".", :verb).must_equal nil
+ end
+ end
+
+ describe ".find_all" do
+ it "finds all pos" do
+ result = WordNet::Lemma.find_all("fruit")
+ result.size.must_equal 2
+ result.map(&:pos).sort.must_equal ["n", "v"]
+ end
+
+ it "returns empty array for unfound" do
+ WordNet::Lemma.find_all("sdjkhdfsjfdsjhkfds").must_equal []
+ end
+
+ it "does not produce a circular reference" do
+ l = WordNet::Lemma.find_all("blink")[1]
+ l.synsets[1].expanded_hypernym.wont_be_nil
+ end
+ end
+
+ describe "#synsets" do
+ it 'finds them' do
+ lemma = WordNet::Lemma.find("fruit", :noun)
+ synsets = lemma.synsets
+ synsets.size.must_equal 3
+ synsets[1].to_s.must_equal "(n) yield, fruit (an amount of a product)"
+ end
+ end
+
+ describe ".new" do
+ it "builds all fields" do
+ lemma = WordNet::Lemma.new("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550", 123)
+ lemma.id.must_equal 123
+ lemma.word.must_equal "fruit"
+ lemma.pos.must_equal "n"
+ lemma.pointer_symbols.must_equal ["@", "~", "+"]
+ lemma.tagsense_count.must_equal 3
+ lemma.synset_offsets.must_equal [13134947, 4612722, 7294550]
+ end
+ end
+ end
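The "does not produce a circular reference" case exercises the loop guard in the new `expanded_hypernym`, which stops as soon as it revisits a parent offset instead of recursing the way 0.1.3 did. A minimal sketch of that walk over a hypothetical offset-to-parent hash with a deliberate loop; it swaps the `Array#include?` check in synset.rb for a `Set`, which yields the same chain with constant-time membership tests.

```ruby
require 'set'

# Hypothetical parent links (synset offset => hypernym offset), with a
# deliberate loop between 30 and 40 to mimic a cyclic hypernym chain.
PARENT = { 10 => 20, 20 => 30, 30 => 40, 40 => 30 }.freeze

# Walk hypernyms upward from +offset+, stopping at the root (no parent)
# or at the first offset that has already been visited.
def expanded_hypernym_offsets(offset, parents)
  visited = Set.new
  chain = []
  current = parents[offset]
  while current && visited.add?(current) # add? returns nil if already present
    chain << current
    current = parents[current]
  end
  chain
end

p expanded_hypernym_offsets(10, PARENT) # → [20, 30, 40]
p expanded_hypernym_offsets(99, PARENT) # → []
```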
data/test/unit/pointer_test.rb ADDED
@@ -0,0 +1,26 @@
+ require_relative "../test_helper"
+
+ describe WordNet::Pointer do
+ let(:pointer) { WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "1234") }
+
+ describe "#initialize" do
+ it "sets all values" do
+ pointer.symbol.must_equal "s"
+ pointer.offset.must_equal 123
+ pointer.pos.must_equal "v"
+ pointer.source.must_equal "12"
+ pointer.target.must_equal "34"
+ end
+ end
+
+ describe "#is_semantic?" do
+ it "is not semantic for non-0" do
+ pointer.is_semantic?.must_equal false
+ end
+
+ it "is semantic for all-0" do
+ pointer = WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "0000")
+ pointer.is_semantic?.must_equal true
+ end
+ end
+ end
data/test/unit/synset_test.rb CHANGED
@@ -1,43 +1,39 @@
- require File.dirname(__FILE__) + "/../test_helper.rb"
+ require_relative "../test_helper"
 
- class TestSynset < Test::Unit::TestCase
- @@synsets = nil
-
- def setup
- if @@synsets.nil?
- index = WordNet::NounIndex.instance
- lemma = index.find("fruit")
- @@synsets = lemma.get_synsets
- end
+ describe WordNet::Synset do
+ def self.synsets
+ @synsets ||= WordNet::Lemma.find("fruit", :noun).synsets
  end
-
- test 'get synsets for a lemma' do
- assert_equal 3, @@synsets.size
- assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",@@synsets[0].to_s
- assert_equal "an amount of a product",@@synsets[1].gloss
+
+ let(:synsets) { self.class.synsets }
+
+ it 'get synsets for a lemma' do
+ assert_equal 3, synsets.size
+ assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",synsets[0].to_s
+ assert_equal "an amount of a product",synsets[1].gloss
  end
-
- test 'get hypernym for a synset' do
- hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
- hypernym = @@synsets[0].hypernym
+
+ it 'get hypernym for a synset' do
+ hypernym = synsets[0].relation(WordNet::HYPERNYM)
+ hypernym = synsets[0].hypernym
  assert_equal 1,hypernym.size
  assert_equal "(n) reproductive structure (the parts of a plant involved in its reproduction)",hypernym.to_s
  end
 
- test 'test shorthand for get_relation' do
- hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
- hypernym2 = @@synsets[0].hypernym
+ it 'test shorthand for get_relation' do
+ hypernym = synsets[0].relation(WordNet::HYPERNYM)
+ hypernym2 = synsets[0].hypernym
  assert_equal hypernym[0].gloss, hypernym2.gloss
  end
-
- test 'get hyponyms for a synset' do
- hyponym = @@synsets[0].get_relation(WordNet::Hyponym)
+
+ it 'get hyponyms for a synset' do
+ hyponym = synsets[0].relation(WordNet::HYPONYM)
  assert_equal 29,hyponym.size
  assert_equal "fruit of various buckthorns yielding dyes or pigments",hyponym[26].gloss
  end
-
- test 'test expanded hypernym tree' do
- expanded = @@synsets[0].expanded_hypernym
+
+ it 'test expanded hypernym tree' do
+ expanded = synsets[0].expanded_hypernym
  assert_equal 8, expanded.size
  assert_equal "entity", expanded[expanded.size-1].words[0]
  end
metadata CHANGED
@@ -1,33 +1,23 @@
1
- --- !ruby/object:Gem::Specification
1
+ --- !ruby/object:Gem::Specification
2
2
  name: rwordnet
3
- version: !ruby/object:Gem::Version
4
- prerelease: false
5
- segments:
6
- - 0
7
- - 1
8
- - 3
9
- version: 0.1.3
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
10
5
  platform: ruby
11
- authors:
6
+ authors:
12
7
  - Trevor Fountain
13
8
  - Wolfram Sieber
9
+ - Michael Grosser
14
10
  autorequire:
15
11
  bindir: bin
16
12
  cert_chain: []
17
-
18
- date: 2010-10-15 00:00:00 +01:00
19
- default_executable:
13
+ date: 2015-03-08 00:00:00.000000000 Z
20
14
  dependencies: []
21
-
22
- description: A pure Ruby interface to the WordNet database
15
+ description:
23
16
  email: doches@gmail.com
24
17
  executables: []
25
-
26
18
  extensions: []
27
-
28
- extra_rdoc_files:
29
- - README.markdown
30
- files:
19
+ extra_rdoc_files: []
20
+ files:
31
21
  - History.txt
32
22
  - README.markdown
33
23
  - WordNet-3.0/AUTHORS
@@ -42,54 +32,43 @@ files:
42
32
  - WordNet-3.0/dict/index.adv
43
33
  - WordNet-3.0/dict/index.noun
44
34
  - WordNet-3.0/dict/index.verb
35
+ - examples/benchmark.rb
45
36
  - examples/dictionary.rb
46
37
  - examples/full_hypernym.rb
47
38
  - lib/wordnet.rb
48
- - lib/wordnet/index.rb
39
+ - lib/wordnet/db.rb
49
40
  - lib/wordnet/lemma.rb
50
41
  - lib/wordnet/pointer.rb
51
42
  - lib/wordnet/pointers.rb
52
- - lib/wordnet/pos.rb
53
43
  - lib/wordnet/synset.rb
54
- - lib/wordnet/wordnetdb.rb
44
+ - lib/wordnet/version.rb
55
45
  - test/test_helper.rb
56
- - test/unit/index_test.rb
46
+ - test/unit/db_test.rb
47
+ - test/unit/lemma_test.rb
48
+ - test/unit/pointer_test.rb
57
49
  - test/unit/synset_test.rb
58
- - test/unit/wordnetdb_test.rb
59
- has_rdoc: true
60
- homepage: http://github.com/doches/rwordnet
61
- licenses: []
62
-
50
+ homepage: https://github.com/doches/rwordnet
51
+ licenses:
52
+ - MIT
53
+ metadata: {}
63
54
  post_install_message:
64
- rdoc_options:
65
- - --charset=UTF-8
66
- require_paths:
55
+ rdoc_options: []
56
+ require_paths:
67
57
  - lib
68
- required_ruby_version: !ruby/object:Gem::Requirement
69
- requirements:
58
+ required_ruby_version: !ruby/object:Gem::Requirement
59
+ requirements:
70
60
  - - ">="
71
- - !ruby/object:Gem::Version
72
- segments:
73
- - 0
74
- version: "0"
75
- required_rubygems_version: !ruby/object:Gem::Requirement
76
- requirements:
61
+ - !ruby/object:Gem::Version
62
+ version: 2.0.0
63
+ required_rubygems_version: !ruby/object:Gem::Requirement
64
+ requirements:
77
65
  - - ">="
78
- - !ruby/object:Gem::Version
79
- segments:
80
- - 0
81
- version: "0"
66
+ - !ruby/object:Gem::Version
67
+ version: '0'
82
68
  requirements: []
83
-
84
69
  rubyforge_project:
85
- rubygems_version: 1.3.6
70
+ rubygems_version: 2.2.2
86
71
  signing_key:
87
- specification_version: 3
72
+ specification_version: 4
88
73
  summary: A pure Ruby interface to the WordNet database
89
- test_files:
90
- - test/unit/index_test.rb
91
- - test/unit/synset_test.rb
92
- - test/unit/wordnetdb_test.rb
93
- - test/test_helper.rb
94
- - examples/full_hypernym.rb
95
- - examples/dictionary.rb
74
+ test_files: []
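RubyGems generates the 1.0.0 metadata above from the gem's specification, which is not part of this diff. The sketch below is a hypothetical reconstruction of a few of the changed fields in the `Gem::Specification` DSL: the added author, the HTTPS homepage, the MIT license, and the new Ruby requirement. It is not the project's actual gemspec.

```ruby
require "rubygems"

# Hypothetical reconstruction: the real rwordnet gemspec is not in this diff.
# Only a few fields from the 1.0.0 metadata above are shown.
SPEC = Gem::Specification.new do |s|
  s.name                  = "rwordnet"
  s.version               = "1.0.0"
  s.summary               = "A pure Ruby interface to the WordNet database"
  s.authors               = ["Trevor Fountain", "Wolfram Sieber", "Michael Grosser"]
  s.email                 = "doches@gmail.com"
  s.homepage              = "https://github.com/doches/rwordnet"
  s.license               = "MIT"       # becomes the `licenses` list in the metadata
  s.require_paths         = ["lib"]
  s.required_ruby_version = ">= 2.0.0"  # was `>= 0` in 0.1.3
end

# Gem::Requirement#satisfied_by? shows which Ruby versions the requirement admits.
puts SPEC.required_ruby_version.satisfied_by?(Gem::Version.new("1.9.3")) # false
puts SPEC.required_ruby_version.satisfied_by?(Gem::Version.new("2.0.0")) # true
```

In practice, the `>= 2.0.0` requirement means RubyGems should refuse to install 1.0.0 on Ruby 1.9.x. The 0.1.3 metadata (`>= 0`) allowed that.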
data/lib/wordnet/index.rb DELETED
@@ -1,82 +0,0 @@
1
- require 'singleton'
2
- module WordNet
3
-
4
- # Index is a WordNet lexicon. Note that Index is the base class; you probably want to be using the NounIndex, VerbIndex, etc. classes instead.
5
- # Note that Indices are Singletons -- get an Index object by calling <POS>Index.instance, not <POS>Index.new.
6
- class Index
7
- # Create a new index for the given part of speech. +pos+ can be one of +noun+, +verb+, +adj+, or +adv+.
8
- def initialize(pos)
9
- @pos = pos
10
- @db = {}
11
-
12
- @finished_reading = false
13
- end
14
-
15
- # Find a lemma for a given word. Returns a Lemma which can then be used to access the synsets for the word.
16
- def find(lemma_str)
17
- # Look for the lemma in the part of the DB already read...
18
- return @db[lemma_str] if @db.include?(lemma_str)
19
-
20
- return nil if @finished_reading
21
-
22
- # If we didn't find it, read in some more from the DB.
23
- index = WordNetDB.open(File.join(WordNetDB.path,"dict","index.#{@pos}"))
24
-
25
- lemma_counter = 1
26
- if not index.closed?
27
- loop do
28
- break if index.eof?
29
- line = index.readline
30
- lemma = Lemma.new(line, lemma_counter); lemma_counter += 1
31
- @db[lemma.word] = lemma
32
- if line =~ /^#{lemma_str} /
33
- return lemma
34
- end
35
- end
36
- index.close
37
- end
38
-
39
- @finished_reading = true
40
-
41
- # If we *still* didn't find it, return nil. It must not be in the database...
42
- return nil
43
- end
44
- end
45
-
46
- # An Index of nouns. Create a NounIndex by calling `NounIndex.instance`
47
- class NounIndex < Index
48
- include Singleton
49
-
50
- def initialize
51
- super("noun")
52
- end
53
- end
54
-
55
- # An Index of verbs. Create a VerbIndex by calling `VerbIndex.instance`
56
- class VerbIndex < Index
57
- include Singleton
58
-
59
- def initialize
60
- super("verb")
61
- end
62
- end
63
-
64
- # An Index of adjectives. Create an AdjectiveIndex by `AdjectiveIndex.instance`
65
- class AdjectiveIndex < Index
66
- include Singleton
67
-
68
- def initialize
69
- super("adj")
70
- end
71
- end
72
-
73
- # An Index of adverbs. Create an AdverbIndex by `AdverbIndex.instance`
74
- class AdverbIndex < Index
75
- include Singleton
76
-
77
- def initialize
78
- super("adv")
79
- end
80
- end
81
-
82
- end
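The deleted `Index#find` read `index.<pos>` lazily. It returned a cached lemma if there was one. Otherwise it kept reading lines, caching each one, until it reached the requested word. Once the file was exhausted it set `@finished_reading`, so later misses return `nil` immediately; this appears to be the 0.1.3 fix for re-reading the whole database on every failed lookup.

Below is a minimal standalone sketch of that pattern. It assumes a hypothetical `LazyIndex` class that caches raw lines from any IO-like object rather than building `Lemma` objects. One deliberate difference from the original: it compares the first whitespace-separated field exactly instead of interpolating the word into a regular expression, so a word containing regex metacharacters cannot produce a false match.

```ruby
require "stringio"

# Hypothetical sketch of the lazy lookup in the deleted Index#find.
class LazyIndex
  attr_reader :lines_read

  def initialize(io)
    @io = io                  # any IO-like object yielding WordNet index lines
    @cache = {}               # word => raw index line
    @finished_reading = false
    @lines_read = 0
  end

  def find(word)
    # 1. Serve from the part of the file already read.
    return @cache[word] if @cache.key?(word)
    # 2. The whole file has been read before, so this is a genuine miss.
    return nil if @finished_reading

    # 3. Read further, caching every line, until the word turns up.
    until @io.eof?
      line = @io.readline
      @lines_read += 1
      key = line.split(" ", 2).first   # the lemma is the first field of an index line
      @cache[key] = line
      return line if key == word
    end

    # 4. Remember that the file is exhausted so misses never trigger a re-scan.
    @finished_reading = true
    nil
  end
end

# Usage with made-up index lines (the synset offsets are placeholders).
index = LazyIndex.new(StringIO.new("apple n 1 0 1 0 00000001\nbanana n 1 0 1 0 00000002\n"))
index.find("banana") # reads two lines
index.find("kiwi")   # already at EOF: marks the index finished, returns nil
index.find("kiwi")   # returns nil without touching the IO again
```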
data/lib/wordnet/pos.rb DELETED
@@ -1,3 +0,0 @@
1
- module WordNet
2
- SynsetType = {"n" => "noun", "v" => "verb", "adj" => "adj", "adv" => "adv"}
3
- end
data/lib/wordnet/wordnetdb.rb DELETED
@@ -1,54 +0,0 @@
1
- module WordNet
2
-
3
- # Represents the WordNet database, and provides some basic interaction.
4
- class WordNetDB
5
- # By default, use the bundled WordNet
6
- @@path = File.join(File.dirname(__FILE__),"/../../WordNet-3.0/")
7
- @@files = {}
8
-
9
- # To use your own WordNet installation (rather than the one bundled with rwordnet:
10
- def WordNetDB.path=(path_to_wordnet)
11
- @@path = path_to_wordnet
12
- end
13
-
14
- # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
15
- def WordNetDB.path
16
- @@path
17
- end
18
-
19
- # Look up a word in WordNet. Returns a list of lemmas occuring in any of the index files (noun, verb, adjective, adverb).
20
- def WordNetDB.find(word)
21
- lemmas = []
22
- [NounIndex, VerbIndex, AdjectiveIndex, AdverbIndex].each do |index|
23
- lemmas.push index.instance.find(word)
24
- end
25
- return lemmas.flatten.reject { |x| x.nil? }
26
- end
27
-
28
- # Register a new DB file handle. You shouldn't need to call this method; it's called automatically every time you open an index or data file.
29
- def WordNetDB.open(path)
30
- # If the file is already open, just return the handle.
31
- return @@files[path] if @@files.include?(path) and not @@files[path].closed?
32
-
33
- # Open and store
34
- @@files[path] = File.open(path,"r")
35
- return @@files[path]
36
- end
37
-
38
- # You should call this method after you're done using WordNet.
39
- def WordNetDB.close
40
- WordNetDB.finalize(0)
41
- end
42
-
43
- def WordNetDB.finalize(id)
44
- @@files.each_value do |handle|
45
- begin
46
- handle.close
47
- rescue IOError
48
- ; # Keep going, close the next file.
49
- end
50
- end
51
- end
52
- end
53
-
54
- end
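The deleted `WordNetDB.open` and `WordNetDB.close` stored one `File` handle per path in the class variable `@@files`. A handle was reused until it was closed, and all handles could be closed together while any `IOError` was swallowed. The sketch below reproduces that pattern with a hypothetical instance-based `HandleCache` class in place of class variables.

On Ruby 2.3 and later, closing an already-closed `File` just returns `nil`. The `rescue IOError` therefore only matters on older Rubies, where a second close raised.

```ruby
require "tempfile"

# Hypothetical sketch of the handle caching in the deleted WordNetDB.
class HandleCache
  def initialize
    @files = {} # path => File
  end

  # Return the cached handle for +path+ if it is still open; otherwise open and cache a new one.
  def open(path)
    handle = @files[path]
    return handle if handle && !handle.closed?
    @files[path] = File.open(path, "r")
  end

  # Close every cached handle, continuing past any IOError (raised by
  # double closes on Ruby < 2.3).
  def close_all
    @files.each_value do |handle|
      begin
        handle.close
      rescue IOError
        # keep going and close the next file
      end
    end
  end
end
```

Repeated `open` calls for the same path return the same object until `close_all` runs. After that, the next `open` transparently reopens the file.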
data/test/unit/index_test.rb DELETED
@@ -1,21 +0,0 @@
1
- require File.dirname(__FILE__) + "/../test_helper.rb"
2
-
3
- class TestIndex < Test::Unit::TestCase
4
- @@index = nil
5
-
6
- def setup
7
- @@index = WordNet::NounIndex.instance if @@index.nil?
8
- end
9
-
10
- test 'find a lemma by string' do
11
- lemma = @@index.find("fruit")
12
- assert_equal "fruit,n",lemma.to_s
13
- end
14
-
15
- test 'get synsets for a lemma' do
16
- lemma = @@index.find("fruit")
17
- synsets = lemma.get_synsets
18
- assert_equal 3, synsets.size
19
- assert_equal "(n) yield, fruit (an amount of a product)",synsets[1].to_s
20
- end
21
- end
data/test/unit/wordnetdb_test.rb DELETED
@@ -1,15 +0,0 @@
1
- require File.dirname(__FILE__) + "/../test_helper.rb"
2
-
3
- class TestWordNetDB < Test::Unit::TestCase
4
- include WordNet
5
-
6
- test 'set and read path' do
7
- WordNetDB.path = "WordNetPath"
8
- assert_equal "WordNetPath",WordNetDB.path
9
- end
10
-
11
- test 'find a word' do
12
- lemmas = WordNetDB.find("fruit")
13
- assert_equal 2,lemmas.size
14
- end
15
- end
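`WordNetDB.find` queried the noun, verb, adjective and adverb indices in turn and discarded the `nil` results. That is why the deleted test expects two lemmas for "fruit", one noun and one verb. The sketch below shows the same aggregation under stated assumptions: hypothetical in-memory hashes stand in for the four index files, and values use the `"word,pos"` strings that the deleted index test expects from `Lemma#to_s`.

```ruby
# Hypothetical stand-ins for the noun/verb/adj/adv index files.
INDICES = {
  "noun" => { "fruit" => "fruit,n", "apple" => "apple,n" },
  "verb" => { "fruit" => "fruit,v" },
  "adj"  => {},
  "adv"  => {}
}.freeze

# Ask every part-of-speech index for the word and keep only the hits.
# Hash#[] yields nil on a miss; compact drops those, which is what the
# original `flatten.reject { |x| x.nil? }` achieved.
def find_all_pos(word, indices = INDICES)
  indices.values.map { |index| index[word] }.compact
end

p find_all_pos("fruit") # => ["fruit,n", "fruit,v"]
p find_all_pos("kiwi")  # => []
```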