rwordnet 0.1.3 → 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 4bea9b677b6a581d27c04ad1912c3034bf8329d8
4
+ data.tar.gz: 4fdef6fdfdbe2373b445f7857d3fe5ff5071fba2
5
+ SHA512:
6
+ metadata.gz: c1471523dcb27e496eb72b406f37ddd83184dfcecbcabe95c492f6018b29fa53180d06035b7f9c0a938bfde0e4c28a37f89473c0b3a712be527ed2abf6261af3
7
+ data.tar.gz: 25cf40f306f3dfcf8076d9430cfd5d8d805af9198762a530ed4f293e5b0c4e5ccc680e941b2256f63c5141d5499ad77b7821eec5b4e245a72a47e1f9ca6e83b3
data/History.txt CHANGED
@@ -1,5 +1,14 @@
1
+ # rWordNet 1.0.0
2
+ * Performance fixes for the lookup
3
+ * Find using Lemma.find / Lemma.find_all
4
+ * using ruby style constant names like `VerbPointers` -> `VERB_POINTERS`
5
+ * renamed WordNet::WordNetDB to WordNet::DB
6
+ * renaming a few methods in Lemma like `p_cnt` -> `pointer_count`
7
+ * make Pointer a real class
8
+ * renaming a few methods in SynSet like `get_relation` -> `relation`
9
+
1
10
  # rWordNet 0.1.3
2
- # Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
11
+ * Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
3
12
 
4
13
  # rWordNet 0.1.2
5
14
  * Added unique (integer) ids to lemmas [Wolfram Sieber]
data/README.markdown CHANGED
@@ -1,59 +1,69 @@
1
1
  # A pure Ruby interface to WordNet #
2
2
 
3
+ [![Build Status](https://travis-ci.org/doches/rwordnet.png)](https://travis-ci.org/doches/rwordnet)
4
+
5
+ ## Summary ##
6
+
7
+ + Works directly on the database that comes with WordNet
8
+ + No gem or native dependencies
9
+ + *Very* easy to install
10
+ + Small footprint (8.1M vs 24M for Ruby-Wordnet+DB)
11
+ + Can use a custom, existing WordNet installation
12
+
3
13
  ## About ##
4
14
 
5
15
  This library implements a pure Ruby interface to the WordNet lexical/semantic
6
- database. Unlike existing ruby bindings, this one doesn't require you to convert
16
+ database. Unlike existing ruby bindings, this one doesn't require you to convert
7
17
  the original WordNet database into a new database format; instead it can work directly
8
18
  on the database that comes with WordNet.
9
19
 
10
20
  If you're doing something data-intensive you will achieve much better performance
11
- with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
12
- since it converts the WordNet database into a BerkelyDB file for quicker access. In
13
- writing rwordnet, I've focused more on usability and ease of installation ( *gem install
21
+ with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
22
+ since it converts the WordNet database into a BerkeleyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the disk space of Ruby-WordNet + DB. In
23
+ writing rwordnet, I've focused more on usability and ease of installation ( *gem install
14
24
  rwordnet* ) at the expense of some performance. Use at your own risk, etc.
15
25
 
16
26
  ## Installation ##
17
27
 
18
28
  One of the chief benefits of rwordnet over Ruby-WordNet is how easy it is to install:
19
29
 
20
- gem install gemcutter # These two steps are only necessary if you haven't
21
- gem tumble # yet installed the gemcutter tools
22
30
  gem install rwordnet
23
-
24
- That's it! rwordnet comes bundled with the WordNet database which it uses by default,
31
+
32
+ That's it! rwordnet comes bundled with the WordNet database which it uses by default,
25
33
  so there's absolutely nothing else to download, install, or configure.
26
- Of course, if you want to use your own WordNet installation, that's easy too -- just
34
+ Of course, if you want to use your own WordNet installation, that's easy too -- just
27
35
  set the path to WordNet's database files before using the library (see examples below).
28
36
 
29
37
  ## Usage ##
30
38
 
31
39
  The other benefit of rwordnet over Ruby-WordNet is that it's so much easier (IMHO) to
32
- use.
40
+ use.
41
+
42
+ As an example, consider finding all of the noun glosses for a given word:
33
43
 
34
- As a quick example, consider finding all of the noun glosses for a given word:
44
+ ```Ruby
45
+ require 'wordnet'
35
46
 
36
- require 'rubygems'
37
- require 'wordnet'
38
-
39
- index = WordNet::NounIndex.instance
40
- lemma = index.find("fruit")
41
- lemma.synsets.each { |synset| puts synset.gloss }
47
+ lemma = WordNet::Lemma.find("fruit", :noun)
48
+ lemma.synsets.each { |synset| puts synset.gloss }
49
+ ```
42
50
 
43
51
  ...or all of the glosses, period:
44
52
 
45
- lemmas = WordNet::WordNetDB.find("fruit")
46
- synsets = lemmas.map { |lemma| lemma.synsets }
47
- words = synsets.flatten
48
- words.each { |word| puts word.gloss }
53
+ ```Ruby
54
+ lemmas = WordNet::Lemma.find_all("fruit")
55
+ synsets = lemmas.map { |lemma| lemma.synsets }
56
+ words = synsets.flatten
57
+ words.each { |word| puts word.gloss }
58
+ ```
49
59
 
50
60
  Have your own WordNet database that you've marked up with extra attributes and whatnot?
51
61
  No problem:
52
62
 
53
- require 'rubygems'
54
- require 'wordnet'
55
-
56
- include WordNet
57
- WordNetDB.path = "/path/to/WordNet-3.0"
58
- lemmas = WordNetDB.find("fruit")
59
- ...
63
+ ```Ruby
64
+ require 'wordnet'
65
+
66
+ WordNet::DB.path = "/path/to/WordNet-3.0"
67
+ lemmas = WordNet::Lemma.find_all("fruit")
68
+ ...
69
+ ```
data/examples/benchmark.rb ADDED
@@ -0,0 +1,14 @@
1
+ require 'benchmark'
2
+ require 'wordnet'
3
+
4
+ initial = Benchmark.realtime do
5
+ WordNet::Lemma.find(ARGV[0] || raise("Usage: ruby benchmark.rb noun"), :noun)
6
+ end
7
+
8
+ puts "Time to initial word #{initial}"
9
+
10
+ lookup = Benchmark.realtime do
11
+ 1000.times { WordNet::Lemma.find('fruit', :noun) }
12
+ end
13
+
14
+ puts "Time for 1k lookups #{lookup}"
data/examples/dictionary.rb CHANGED
@@ -1,5 +1,4 @@
1
1
  # Use WordNet as a command-line dictionary.
2
- require 'rubygems'
3
2
  require 'wordnet'
4
3
 
5
4
  if ARGV.size != 1
@@ -10,10 +9,10 @@ end
10
9
  word = ARGV[0]
11
10
 
12
11
  # Find all the lemmas for a word (i.e., whether it occurs as a noun, verb, etc.)
13
- lemmas = WordNet::WordNetDB.find(word)
12
+ lemmas = WordNet::Lemma.find_all(word)
14
13
 
15
14
  # Print out each lemma with a list of possible meanings.
16
- lemmas.each do |lemma|
15
+ lemmas.each do |lemma|
17
16
  puts lemma
18
17
  lemma.synsets.each_with_index do |synset,i|
19
18
  puts "\t#{i+1}) #{synset.gloss}"
data/examples/full_hypernym.rb CHANGED
@@ -1,10 +1,7 @@
1
- require 'rubygems'
2
1
  require 'wordnet'
3
2
 
4
- # Open the index file for nouns
5
- index = WordNet::NounIndex.new
6
3
  # Find the word 'fruit'
7
- lemma = index.find("fruit")
4
+ lemma = WordNet::Lemma.find("fruit", :noun)
8
5
  # Find all the synsets for 'fruit', and pick the first one.
9
6
  synset = lemma.synsets[0]
10
7
  puts synset
data/lib/wordnet/db.rb ADDED
@@ -0,0 +1,17 @@
1
+ module WordNet
2
+ # Represents the WordNet database, and provides some basic interaction.
3
+ class DB
4
+ # By default, use the bundled WordNet
5
+ @path = File.expand_path("../../../WordNet-3.0/", __FILE__)
6
+
7
+ class << self
8
+ # Set this to use your own WordNet installation (rather than the one bundled with rwordnet).
9
+ # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
10
+ attr_accessor :path
11
+
12
+ def open(path, &block)
13
+ File.open(File.join(self.path, path), "r", &block)
14
+ end
15
+ end
16
+ end
17
+ end
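As a rough illustration of how the `DB.open` helper above resolves a relative name against the configured root, here is a self-contained sketch. `MiniDB`, `mini_db_demo`, and the temporary directory layout are hypothetical stand-ins, not part of rwordnet.

```ruby
require "tmpdir"

# Hypothetical stand-in for WordNet::DB: a class-level path plus an
# open helper that resolves names relative to that path.
class MiniDB
  # Placeholder default; rwordnet itself defaults to its bundled WordNet-3.0.
  @path = Dir.tmpdir

  class << self
    attr_accessor :path

    # With a block, File.open closes the file when the block returns
    # and hands the block's value back to the caller.
    def open(relative, &block)
      File.open(File.join(path, relative), "r", &block)
    end
  end
end

# Builds a throwaway "dict" directory, points MiniDB at it, and reads
# the first line of a fake index file through MiniDB.open.
def mini_db_demo
  Dir.mktmpdir do |dir|
    Dir.mkdir(File.join(dir, "dict"))
    File.write(File.join(dir, "dict", "index.noun"), "fruit n 3\n")
    MiniDB.path = dir
    MiniDB.open(File.join("dict", "index.noun")) { |f| f.gets }
  end
end
```

Keeping the path on the class rather than in a constant is what lets callers swap in their own WordNet directory at run time.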
data/lib/wordnet/lemma.rb CHANGED
@@ -1,39 +1,60 @@
1
1
  module WordNet
2
+ # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
3
+ class Lemma
4
+ SPACE = ' '
5
+ attr_accessor :word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets, :id
2
6
 
3
- # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
4
- class Lemma
5
- attr_accessor :lemma, :pos, :synset_cnt, :p_cnt, :ptr_symbol, :tagsense_cnt, :synset_offset, :id
6
-
7
- # Create a lemma from a line in an index file. You should be creating Lemmas by hand; instead,
8
- # use the WordNet#find and Index#find methods to find the Lemma for a word.
9
- def initialize(index_line, id = 0)
10
- @id = (id > 0) ? id : nil
11
- line = index_line.split(" ")
12
-
13
- @lemma = line.shift
14
- @pos = line.shift
15
- @synset_cnt = line.shift.to_i
16
- @p_cnt = line.shift.to_i
17
-
18
- @ptr_symbol = []
19
- @p_cnt.times { @ptr_symbol.push line.shift }
20
- line.shift # Throw away redundant sense_cnt
21
- @tagsense_cnt = line.shift.to_i
22
- @synset_offset = []
23
- @synset_cnt.times { @synset_offset.push line.shift.to_i }
24
- end
25
-
26
- # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
27
- def get_synsets
28
- return @synset_offset.map { |offset| Synset.new(@pos, offset) }
29
- end
30
-
31
- def to_s
32
- [@lemma, @pos].join(",")
33
- end
34
-
35
- alias synsets get_synsets
36
- alias word lemma
37
- end
7
+ # Create a lemma from a line in a lexicon file. You shouldn't be creating Lemmas by hand; instead,
8
+ # use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.
9
+ def initialize(lexicon_line, id)
10
+ @id = id
11
+ line = lexicon_line.split(" ")
12
+
13
+ @word = line.shift
14
+ @pos = line.shift
15
+ synset_count = line.shift.to_i
16
+ @pointer_symbols = line.slice!(0, line.shift.to_i)
17
+ line.shift # Throw away redundant sense_cnt
18
+ @tagsense_count = line.shift.to_i
19
+ @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
20
+ end
21
+
22
+ # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
23
+ def synsets
24
+ @synset_offsets.map { |offset| Synset.new(@pos, offset) }
25
+ end
38
26
 
27
+ def to_s
28
+ [@word, @pos].join(",")
29
+ end
30
+
31
+ class << self
32
+ @@cache = {}
33
+
34
+ def find_all(word)
35
+ [:noun, :verb, :adj, :adv].flat_map do |pos|
36
+ find(word, pos) || []
37
+ end
38
+ end
39
+
40
+ # Find a lemma for a given word and pos
41
+ def find(word, pos)
42
+ cache = @@cache[pos] ||= build_cache(pos)
43
+ if found = cache[word]
44
+ Lemma.new(*found)
45
+ end
46
+ end
47
+
48
+ private
49
+
50
+ def build_cache(pos)
51
+ cache = {}
52
+ DB.open(File.join("dict", "index.#{pos}")).each_line.each_with_index do |line, index|
53
+ word = line.slice(0, line.index(SPACE))
54
+ cache[word] = [line, index+1]
55
+ end
56
+ cache
57
+ end
58
+ end
59
+ end
39
60
  end
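The lookup speed-up in `Lemma.find` appears to come from scanning each index file once and keeping a word-to-line hash per part of speech. A minimal sketch of that idea follows, assuming an in-memory index; `MiniIndex`, `SAMPLE_LINES`, and the `banana` line are made up for illustration (only the `fruit` line is taken from the tests in this diff).

```ruby
require "stringio"

# Sample data in the index-file layout. The banana line is hypothetical.
SAMPLE_LINES = <<~LINES
  banana n 1 1 @ 1 0 00000001
  fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550
LINES

# Hypothetical miniature of the per-part-of-speech cache.
class MiniIndex
  attr_reader :scans

  # io_factory is any callable that returns an IO for a part of speech.
  def initialize(io_factory)
    @io_factory = io_factory
    @cache = {}
    @scans = 0
  end

  # Returns [line, 1-based line number], or nil for an unknown word.
  # A miss does not trigger a rescan because the hash itself is cached.
  def find(word, pos)
    cache = @cache[pos] ||= build_cache(pos)
    cache[word]
  end

  private

  # Reads the whole index once, keyed by the first space-delimited field.
  def build_cache(pos)
    @scans += 1
    cache = {}
    @io_factory.call(pos).each_line.with_index(1) do |line, number|
      word = line.slice(0, line.index(" "))
      cache[word] = [line, number]
    end
    cache
  end
end

MINI_INDEX = MiniIndex.new(->(_pos) { StringIO.new(SAMPLE_LINES) })
```

Calling `MINI_INDEX.find("fruit", :noun)` repeatedly only increments `scans` on the first call, which is the behaviour the "caches found" and "caches unfound" tests later in this diff check for.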
data/lib/wordnet/pointer.rb CHANGED
@@ -1,15 +1,14 @@
1
1
  module WordNet
2
+ class Pointer
3
+ attr_reader :symbol, :offset, :pos, :source, :target
2
4
 
3
- # Convenience class for treating hashes as objects, i.e. obj[:key] <=> obj.key. I know
4
- # this is probably a bad idea, but it's so convenient...
5
- class Pointer < Hash
6
- def method_missing(msg, *args)
7
- if self.include?(msg)
8
- return self[msg]
9
- else
10
- throw NoMethodError.new("undefined method `#{msg}' for #{self}:Pointer")
5
+ def initialize(symbol: raise, offset: raise, pos: raise, source: raise)
6
+ @symbol, @offset, @pos, @source = symbol, offset, pos, source
7
+ @target = source.slice!(2,2)
11
8
  end
12
- end
13
- end
14
9
 
10
+ def is_semantic?
11
+ source == "00" && target == "00"
12
+ end
13
+ end
15
14
  end
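To make the source/target split concrete, here is a hedged re-creation of the new `Pointer` value object. `SketchPointer` is a hypothetical name, and it deliberately differs from the diff in two ways: it uses required keyword arguments (available from Ruby 2.1) instead of the `key: raise` defaults, and it slices into new strings rather than calling `slice!` on the caller's argument.

```ruby
# Hypothetical re-creation of the Pointer value object, for illustration only.
class SketchPointer
  attr_reader :symbol, :offset, :pos, :source, :target

  def initialize(symbol:, offset:, pos:, source:)
    @symbol = symbol
    @offset = offset
    @pos = pos
    # The four-character field packs two 2-digit word numbers:
    # the first pair is the source word, the second pair the target word.
    @source = source[0, 2]
    @target = source[2, 2]
  end

  # "0000" marks a relation between whole synsets rather than single words.
  def semantic?
    source == "00" && target == "00"
  end
end
```

For example, `SketchPointer.new(symbol: "s", offset: 123, pos: "v", source: "1234")` yields a `source` of `"12"` and a `target` of `"34"`, matching the expectations in the pointer test added by this release.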
data/lib/wordnet/pointers.rb CHANGED
@@ -1,37 +1,82 @@
1
- # A container for various constants. In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
2
- # Use these symbols in conjunction with the Synset#get_relation method.
1
+ # A container for various constants.
2
+ # In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
3
+ # Use these symbols in conjunction with the Synset#relation method.
3
4
 
4
5
  module WordNet
6
+ NOUN_POINTERS = {
7
+ "-c" => "Member of this domain - TOPIC",
8
+ "+" => "Derivationally related form",
9
+ "%p" => "Part meronym",
10
+ "~i" => "Instance Hyponym",
11
+ "@" => "Hypernym",
12
+ ";r" => "Domain of synset - REGION",
13
+ "!" => "Antonym",
14
+ "#p" => "Part holonym",
15
+ "%s" => "Substance meronym",
16
+ ";u" => "Domain of synset - USAGE",
17
+ "-r" => "Member of this domain - REGION",
18
+ "#s" => "Substance holonym",
19
+ "=" => "Attribute",
20
+ "-u" => "Member of this domain - USAGE",
21
+ ";c" => "Domain of synset - TOPIC",
22
+ "%m" => "Member meronym",
23
+ "~" => "Hyponym",
24
+ "@i" => "Instance Hypernym",
25
+ "#m" => "Member holonym"
26
+ }
27
+ VERB_POINTERS = {
28
+ "+" => "Derivationally related form",
29
+ "@" => "Hypernym",
30
+ ";r" => "Domain of synset - REGION",
31
+ "!" => "Antonym",
32
+ ";u" => "Domain of synset - USAGE",
33
+ "$" => "Verb Group",
34
+ ";c" => "Domain of synset - TOPIC",
35
+ ">" => "Cause",
36
+ "~" => "Hyponym",
37
+ "*" => "Entailment"
38
+ }
39
+ ADJECTIVE_POINTERS = {
40
+ ";r" => "Domain of synset - REGION",
41
+ "!" => "Antonym",
42
+ "\\" => "Pertainym (pertains to noun)",
43
+ "<" => "Participle of verb",
44
+ "&" => "Similar to",
45
+ "=" => "Attribute",
46
+ ";c" => "Domain of synset - TOPIC"
47
+ }
48
+ ADVERB_POINTERS = {
49
+ ";r" => "Domain of synset - REGION",
50
+ "!" => "Antonym",
51
+ ";u" => "Domain of synset - USAGE",
52
+ "\\" => "Derived from adjective",
53
+ ";c" => "Domain of synset - TOPIC"
54
+ }
5
55
 
6
- NounPointers = {"-c"=>"Member of this domain - TOPIC", "+"=>"Derivationally related form", "%p"=>"Part meronym", "~i"=>"Instance Hyponym", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", "#p"=>"Part holonym", "%s"=>"Substance meronym", ";u"=>"Domain of synset - USAGE", "-r"=>"Member of this domain - REGION", "#s"=>"Substance holonym", "="=>"Attribute", "-u"=>"Member of this domain - USAGE", ";c"=>"Domain of synset - TOPIC", "%m"=>"Member meronym", "~"=>"Hyponym", "@i"=>"Instance Hypernym", "#m"=>"Member holonym"}
7
- VerbPointers = {"+"=>"Derivationally related form", "@"=>"Hypernym", ";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "$"=>"Verb Group", ";c"=>"Domain of synset - TOPIC", ">"=>"Cause", "~"=>"Hyponym", "*"=>"Entailment"}
8
- AdjectivePointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", "\\"=>"Pertainym (pertains to noun)", "<"=>"Participle of verb", "&"=>"Similar to", "="=>"Attribute", ";c"=>"Domain of synset - TOPIC"}
9
- AdverbPointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "\\"=>"Derived from adjective", ";c"=>"Domain of synset - TOPIC"}
10
-
11
- MemberOfThisDomainTopic = "-c"
12
- DerivationallyRelatedForm = "+"
13
- PartMeronym = "%p"
56
+ MEMBER_OF_THIS_DOMAIN_TOPIC = "-c"
57
+ DERIVATIONALLY_RELATED_FORM = "+"
58
+ PART_MERONYM = "%p"
14
59
  InstanceHyponym = "~i"
15
- Hypernym = "@"
16
- DomainOfSynsetRegion = ";r"
17
- Antonym = "!"
18
- PartHolonym = "#p"
19
- SubstanceMeronym = "%s"
20
- VerbGroup = "$"
21
- DomainOfSynsetUsage = ";u"
22
- MemberOfThisDomainRegion = "-r"
23
- SubstanceHolonym = "#s"
24
- DerivedFromAdjective = "\\"
25
- ParticipleOfVerb = "<"
26
- SimilarTo = "&"
27
- Attribute = "="
28
- AlsoSee = "^"
29
- Cause = ">"
30
- MemberOfThisDomainUsage = "-u"
31
- DomainOfSynsetTopic = ";c"
32
- MemberMeronym = "%m"
33
- Hyponym = "~"
34
- InstanceHypernym = "@i"
35
- Entailment = "*"
36
- MemberHolonym = "#m"
60
+ HYPERNYM = "@"
61
+ DOMAIN_OF_SYNSET_REGION = ";r"
62
+ ANTONYM = "!"
63
+ PART_HOLONYM = "#p"
64
+ SUBSTANCE_MERONYM = "%s"
65
+ VERB_GROUP = "$"
66
+ DOMAIN_OF_SYNSET_USAGE = ";u"
67
+ MEMBER_OF_THIS_DOMAIN_REGION = "-r"
68
+ SUBSTANCE_HOLONYM = "#s"
69
+ DERIVED_FROM_ADJECTIVE = "\\"
70
+ PARTICIPLE_OF_VERB = "<"
71
+ SIMILAR_TO = "&"
72
+ ATTRIBUTE = "="
73
+ ALSO_SEE = "^"
74
+ CAUSE = ">"
75
+ MEMBER_OF_THIS_DOMAIN_USAGE = "-u"
76
+ DOMAIN_OF_SYNSET_TOPIC = ";c"
77
+ MEMBER_MERONYM = "%m"
78
+ HYPONYM = "~"
79
+ INSTANCE_HYPERNYM = "@i"
80
+ ENTAILMENT = "*"
81
+ MEMBER_HOLONYM = "#m"
37
82
  end
data/lib/wordnet/synset.rb CHANGED
@@ -1,90 +1,98 @@
1
1
  module WordNet
2
+ SYNSET_TYPES = {"n" => "noun", "v" => "verb", "a" => "adj", "r" => "adv"}
2
3
 
3
- # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
4
- # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
5
- class Synset
6
- attr_reader :gloss, :synset_offset, :lex_filenum, :ss_type, :w_cnt, :wordcounts
7
-
8
- # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
9
- # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
10
- def initialize(pos, offset)
11
- data = File.open(File.join(WordNetDB.path,"dict","data.#{SynsetType[pos]}"),"r")
12
- data.seek(offset)
13
- data_line = data.readline.strip
14
- data.close
15
-
16
- info_line, @gloss = data_line.split(" | ")
17
- line = info_line.split(" ")
18
-
19
- @synset_offset = line.shift
20
- @lex_filenum = line.shift
21
- @ss_type = line.shift
22
- @w_cnt = line.shift.to_i
23
- @wordcounts = {}
24
- @w_cnt.times do
25
- @wordcounts[line.shift] = line.shift.to_i
4
+ # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
5
+ # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
6
+ class Synset
7
+ attr_reader :gloss, :synset_offset, :lex_filenum, :synset_type, :word_counts, :pos_offset, :pos
8
+
9
+ # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
10
+ # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
11
+ def initialize(pos, offset)
12
+ data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
13
+ f.seek(offset)
14
+ f.readline.strip
15
+ end
16
+
17
+ info_line, @gloss = data_line.split(" | ", 2)
18
+ line = info_line.split(" ")
19
+
20
+ @pos = pos
21
+ @pos_offset = offset
22
+ @synset_offset = line.shift
23
+ @lex_filenum = line.shift
24
+ @synset_type = line.shift
25
+
26
+ @word_counts = {}
27
+ word_count = line.shift.to_i
28
+ word_count.times do
29
+ @word_counts[line.shift] = line.shift.to_i
30
+ end
31
+
32
+ pointer_count = line.shift.to_i
33
+ @pointers = Array.new(pointer_count).map do
34
+ Pointer.new(
35
+ symbol: line.shift[0],
36
+ offset: line.shift.to_i,
37
+ pos: line.shift,
38
+ source: line.shift
39
+ )
40
+ end
26
41
  end
27
-
28
- @p_cnt = line.shift.to_i
29
- @pointers = []
30
- @p_cnt.times do
31
- pointer = Pointer.new
32
- pointer[:symbol] = line.shift,
33
- pointer[:offset] = line.shift.to_i
34
- pointer[:pos] = line.shift
35
- pointer[:source] = line.shift
36
- pointer[:is_semantic?] = (pointer[:source] == "0000")
37
- pointer[:target] = pointer[:source][2..3]
38
- pointer[:source] = pointer[:source][0..1]
39
- pointer[:symbol] = pointer[:symbol][0]
40
- @pointers.push pointer
42
+
43
+ # How many words does this Synset include?
44
+ def word_count
45
+ @word_counts.size
41
46
  end
42
- end
43
-
44
- # How many words does this Synset include?
45
- def size
46
- @wordcounts.size
47
- end
48
-
49
- # Get a list of words included in this Synset
50
- def words
51
- @wordcounts.keys
52
- end
53
-
54
- # List of valid +pointer_symbol+s is in pointers.rb
55
- def get_relation(pointer_symbol)
56
- @pointers.reject { |pointer| pointer.symbol != pointer_symbol }.map { |pointer| Synset.new(@ss_type, pointer.offset) }
57
- end
58
-
59
- # Get the Synset of this sense's antonym
60
- def antonym
61
- get_relation(Antonym)
62
- end
63
-
64
- # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
65
- def hypernym
66
- get_relation(Hypernym)[0]
67
- end
68
-
69
- # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
70
- def hyponym
71
- get_relation(Hyponym)
72
- end
73
-
74
- # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
75
- def expanded_hypernym
76
- parent = self.hypernym
77
- return [] if parent.nil?
78
-
79
- return [parent, parent.expanded_hypernym].flatten
80
- end
81
-
82
- def to_s
83
- "(#{@ss_type}) #{words.map {|x| x.gsub('_',' ')}.join(', ')} (#{@gloss})"
84
- end
85
-
86
- alias parent hypernym
87
- alias children hyponym
88
- end
89
47
 
48
+ # Get a list of words included in this Synset
49
+ def words
50
+ @word_counts.keys
51
+ end
52
+
53
+ # List of valid +pointer_symbol+s is in pointers.rb
54
+ def relation(pointer_symbol)
55
+ @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
56
+ map! { |pointer| Synset.new(@synset_type, pointer.offset) }
57
+ end
58
+
59
+ # Get the Synset of this sense's antonym
60
+ def antonym
61
+ relation(ANTONYM)
62
+ end
63
+
64
+ # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
65
+ def hypernym
66
+ relation(HYPERNYM)[0]
67
+ end
68
+
69
+ # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
70
+ def hyponym
71
+ relation(HYPONYM)
72
+ end
73
+
74
+ # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
75
+ def expanded_hypernym
76
+ parent = hypernym
77
+ list = []
78
+ return list unless parent
79
+
80
+ while parent
81
+ break if list.include? parent.pos_offset
82
+ list.push parent.pos_offset
83
+ parent = parent.parent
84
+ end
85
+
86
+ list.flatten!
87
+ list.map! { |offset| Synset.new(@pos, offset)}
88
+ end
89
+
90
+ def to_s
91
+ "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
92
+ end
93
+
94
+ alias size word_count
95
+ alias parent hypernym
96
+ alias children hyponym
97
+ end
90
98
  end
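The rewritten `expanded_hypernym` swaps recursion for a loop that stops when it revisits an offset. The same pattern can be sketched on a plain hash; `ancestor_chain`, `TAXONOMY`, and `CYCLIC` are hypothetical names and toy data, not rwordnet API.

```ruby
require "set"

# Walks parent links from +start+ until it reaches a root or a node it
# has already visited. +parents+ maps each node to its single parent.
def ancestor_chain(parents, start)
  chain = []
  seen = Set.new
  node = parents[start]
  # Set#add? returns nil when the element is already present,
  # which ends the loop on a cycle.
  while node && seen.add?(node)
    chain << node
    node = parents[node]
  end
  chain
end

# Toy data for illustration only.
TAXONOMY = {
  "edible_fruit" => "fruit",
  "fruit" => "reproductive_structure",
  "reproductive_structure" => "plant_organ",
}.freeze

CYCLIC = { "a" => "b", "b" => "a" }.freeze
```

Using a `Set` keeps the membership check constant-time, whereas the `list.include?` call in the diff scans the array on each step. For chains as short as a hypernym path, either choice should be fine.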
data/lib/wordnet/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module WordNet
2
+ VERSION = "1.0.0"
3
+ end
data/lib/wordnet.rb CHANGED
@@ -1,7 +1,5 @@
1
1
  require 'wordnet/pointer'
2
- require 'wordnet/wordnetdb'
3
- require 'wordnet/index'
2
+ require 'wordnet/db'
4
3
  require 'wordnet/lemma'
5
4
  require 'wordnet/pointers'
6
- require 'wordnet/pos'
7
5
  require 'wordnet/synset'
data/test/test_helper.rb CHANGED
@@ -1,17 +1,16 @@
1
- require "test/unit"
2
- require File.dirname(__FILE__) + "/../lib/wordnet"
1
+ require "bundler/setup"
2
+ require "maxitest/autorun"
3
3
 
4
+ $LOAD_PATH.unshift Bundler.root.join("lib")
5
+ require "wordnet"
4
6
 
5
- class << Test::Unit::TestCase
6
- def test(name, &block)
7
- test_name = :"test_#{name.gsub(' ','_')}"
8
- raise ArgumentError, "#{test_name} is already defined" if self.instance_methods.include? test_name.to_s
9
- define_method test_name, &block
10
- end
11
-
12
- def expect(expected_value, &block)
13
- define_method :"test_#{caller.first.split("/").last}" do
14
- assert_equal expected_value, instance_eval(&block)
15
- end
7
+ Minitest::Test.class_eval do
8
+ def with_db_path(path)
9
+ begin
10
+ old, WordNet::DB.path = WordNet::DB.path, path
11
+ yield
12
+ ensure
13
+ WordNet::DB.path = old
14
+ end
16
15
  end
17
16
  end
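The `with_db_path` helper above is an instance of the save, override, restore pattern. A standalone sketch, assuming a hypothetical `Settings` module in place of `WordNet::DB`:

```ruby
# Hypothetical settings holder standing in for WordNet::DB.
module Settings
  class << self
    attr_accessor :path
  end
  self.path = "default"
end

# Temporarily overrides Settings.path for the duration of the block.
# The ensure clause restores the old value even if the block raises.
def with_path(path)
  old = Settings.path
  Settings.path = path
  yield
ensure
  Settings.path = old
end
```

Because the restore lives in `ensure`, a failing assertion inside the block cannot leak the overridden path into later tests.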
data/test/unit/db_test.rb ADDED
@@ -0,0 +1,14 @@
1
+ require_relative "../test_helper"
2
+
3
+ describe WordNet::DB do
4
+ it 'sets and reads path' do
5
+ with_db_path("WordNetPath") { WordNet::DB.path.must_equal "WordNetPath" }
6
+ end
7
+
8
+ it "opens a relative path" do
9
+ result = WordNet::DB.open(File.join("dict", "index.verb")) do |f|
10
+ f.gets
11
+ end
12
+ result.must_equal " 1 This software and database is being provided to you, the LICENSEE, by \n"
13
+ end
14
+ end
data/test/unit/lemma_test.rb ADDED
@@ -0,0 +1,94 @@
1
+ require_relative "../test_helper"
2
+
3
+ describe WordNet::Lemma do
4
+ describe ".find" do
5
+ it 'finds a lemma by string' do
6
+ lemma = WordNet::Lemma.find("fruit", :noun)
7
+ lemma.to_s.must_equal "fruit,n"
8
+ end
9
+
10
+ it 'caches found' do
11
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
12
+ lemma2 = with_db_path "does-not-exist" do
13
+ WordNet::Lemma.find("fruit", :noun)
14
+ end
15
+ lemma1.word.must_equal lemma2.word
16
+ end
17
+
18
+ it 'only scans the db once' do
19
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
20
+ lemma2 = with_db_path "does-not-exist" do
21
+ WordNet::Lemma.find("table", :noun)
22
+ end
23
+ lemma2.word.must_equal "table"
24
+ end
25
+
26
+ it 'can lookup different things' do
27
+ lemma1 = WordNet::Lemma.find("fruit", :noun)
28
+ lemma2 = WordNet::Lemma.find("banana", :noun)
29
+ lemma1.word.must_equal "fruit"
30
+ lemma2.word.must_equal "banana"
31
+ end
32
+
33
+ it 'does not find word in wrong file' do
34
+ lemma = WordNet::Lemma.find("elephant", :verb)
35
+ lemma.must_equal nil
36
+ end
37
+
38
+ it 'caches unfound' do
39
+ WordNet::Lemma.find("elephant", :verb)
40
+ lemma2 = with_db_path "does-not-exist" do
41
+ WordNet::Lemma.find("elephant", :verb)
42
+ end
43
+ lemma2.must_equal nil
44
+ end
45
+
46
+ it 'fails on unknown type' do
47
+ assert_raises Errno::ENOENT do
48
+ WordNet::Lemma.find("fruit", :sdjksdfjkdfskjsdfjk)
49
+ end
50
+ end
51
+
52
+ it "does not find by regexp" do
53
+ WordNet::Lemma.find(".", :verb).must_equal nil
54
+ end
55
+ end
56
+
57
+ describe ".find_all" do
58
+ it "finds all pos" do
59
+ result = WordNet::Lemma.find_all("fruit")
60
+ result.size.must_equal 2
61
+ result.map(&:pos).sort.must_equal ["n", "v"]
62
+ end
63
+
64
+ it "returns empty array for unfound" do
65
+ WordNet::Lemma.find_all("sdjkhdfsjfdsjhkfds").must_equal []
66
+ end
67
+
68
+ it "does not produce a circular reference" do
69
+ l = WordNet::Lemma.find_all("blink")[1]
70
+ l.synsets[1].expanded_hypernym.wont_be_nil
71
+ end
72
+ end
73
+
74
+ describe "#synsets" do
75
+ it 'finds them' do
76
+ lemma = WordNet::Lemma.find("fruit", :noun)
77
+ synsets = lemma.synsets
78
+ synsets.size.must_equal 3
79
+ synsets[1].to_s.must_equal "(n) yield, fruit (an amount of a product)"
80
+ end
81
+ end
82
+
83
+ describe ".new" do
84
+ it "builds all fields" do
85
+ lemma = WordNet::Lemma.new("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550", 123)
86
+ lemma.id.must_equal 123
87
+ lemma.word.must_equal "fruit"
88
+ lemma.pos.must_equal "n"
89
+ lemma.pointer_symbols.must_equal ["@", "~", "+"]
90
+ lemma.tagsense_count.must_equal 3
91
+ lemma.synset_offsets.must_equal [13134947, 4612722, 7294550]
92
+ end
93
+ end
94
+ end
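The "builds all fields" test above exercises the index-line layout that `Lemma#initialize` parses. The field order can be sketched without the gem as follows; `ParsedLemma` and `parse_index_line` are hypothetical names, and the layout is inferred from the constructor in this diff.

```ruby
# Hypothetical struct; field names mirror the Lemma attributes in the diff.
ParsedLemma = Struct.new(:word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets)

# Parses one index line laid out as:
#   word pos synset_count pointer_count [symbols...] sense_count tagsense_count [offsets...]
def parse_index_line(line)
  fields = line.split(" ")
  word = fields.shift
  pos = fields.shift
  synset_count = fields.shift.to_i
  pointer_count = fields.shift.to_i
  pointer_symbols = fields.shift(pointer_count)
  fields.shift # sense_count repeats synset_count, so it is discarded
  tagsense_count = fields.shift.to_i
  # String#to_i parses base 10, so a leading zero is simply dropped.
  synset_offsets = fields.shift(synset_count).map { |offset| offset.to_i }
  ParsedLemma.new(word, pos, pointer_symbols, tagsense_count, synset_offsets)
end
```

The two counts have to be read before the variable-length runs they describe, which is why the diff captures `synset_count` in a local variable before slicing out the pointer symbols.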
data/test/unit/pointer_test.rb ADDED
@@ -0,0 +1,26 @@
1
+ require_relative "../test_helper"
2
+
3
+ describe WordNet::Pointer do
4
+ let(:pointer) { WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "1234") }
5
+
6
+ describe "#initialize" do
7
+ it "sets all values" do
8
+ pointer.symbol.must_equal "s"
9
+ pointer.offset.must_equal 123
10
+ pointer.pos.must_equal "v"
11
+ pointer.source.must_equal "12"
12
+ pointer.target.must_equal "34"
13
+ end
14
+ end
15
+
16
+ describe "#is_semantic?" do
17
+ it "is not semantic for non-0" do
18
+ pointer.is_semantic?.must_equal false
19
+ end
20
+
21
+ it "is semantic for all-0" do
22
+ pointer = WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "0000")
23
+ pointer.is_semantic?.must_equal true
24
+ end
25
+ end
26
+ end
data/test/unit/synset_test.rb CHANGED
@@ -1,43 +1,39 @@
1
- require File.dirname(__FILE__) + "/../test_helper.rb"
1
+ require_relative "../test_helper"
2
2
 
3
- class TestSynset < Test::Unit::TestCase
4
- @@synsets = nil
5
-
6
- def setup
7
- if @@synsets.nil?
8
- index = WordNet::NounIndex.instance
9
- lemma = index.find("fruit")
10
- @@synsets = lemma.get_synsets
11
- end
3
+ describe WordNet::Synset do
4
+ def self.synsets
5
+ @synsets ||= WordNet::Lemma.find("fruit", :noun).synsets
12
6
  end
13
-
14
- test 'get synsets for a lemma' do
15
- assert_equal 3, @@synsets.size
16
- assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",@@synsets[0].to_s
17
- assert_equal "an amount of a product",@@synsets[1].gloss
7
+
8
+ let(:synsets) { self.class.synsets }
9
+
10
+ it 'get synsets for a lemma' do
11
+ assert_equal 3, synsets.size
12
+ assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",synsets[0].to_s
13
+ assert_equal "an amount of a product",synsets[1].gloss
18
14
  end
19
-
20
- test 'get hypernym for a synset' do
21
- hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
22
- hypernym = @@synsets[0].hypernym
15
+
16
+ it 'get hypernym for a synset' do
17
+ hypernym = synsets[0].relation(WordNet::HYPERNYM)
18
+ hypernym = synsets[0].hypernym
23
19
  assert_equal 1,hypernym.size
24
20
  assert_equal "(n) reproductive structure (the parts of a plant involved in its reproduction)",hypernym.to_s
25
21
  end
26
22
 
27
- test 'test shorthand for get_relation' do
28
- hypernym = @@synsets[0].get_relation(WordNet::Hypernym)
29
- hypernym2 = @@synsets[0].hypernym
23
+ it 'test shorthand for get_relation' do
24
+ hypernym = synsets[0].relation(WordNet::HYPERNYM)
25
+ hypernym2 = synsets[0].hypernym
30
26
  assert_equal hypernym[0].gloss, hypernym2.gloss
31
27
  end
32
-
33
- test 'get hyponyms for a synset' do
34
- hyponym = @@synsets[0].get_relation(WordNet::Hyponym)
28
+
29
+ it 'get hyponyms for a synset' do
30
+ hyponym = synsets[0].relation(WordNet::HYPONYM)
35
31
  assert_equal 29,hyponym.size
36
32
  assert_equal "fruit of various buckthorns yielding dyes or pigments",hyponym[26].gloss
37
33
  end
38
-
39
- test 'test expanded hypernym tree' do
40
- expanded = @@synsets[0].expanded_hypernym
34
+
35
+ it 'test expanded hypernym tree' do
36
+ expanded = synsets[0].expanded_hypernym
41
37
  assert_equal 8, expanded.size
42
38
  assert_equal "entity", expanded[expanded.size-1].words[0]
43
39
  end
metadata CHANGED
@@ -1,33 +1,23 @@
1
- --- !ruby/object:Gem::Specification
1
+ --- !ruby/object:Gem::Specification
2
2
  name: rwordnet
3
- version: !ruby/object:Gem::Version
4
- prerelease: false
5
- segments:
6
- - 0
7
- - 1
8
- - 3
9
- version: 0.1.3
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
10
5
  platform: ruby
11
- authors:
6
+ authors:
12
7
  - Trevor Fountain
13
8
  - Wolfram Sieber
9
+ - Michael Grosser
14
10
  autorequire:
15
11
  bindir: bin
16
12
  cert_chain: []
17
-
18
- date: 2010-10-15 00:00:00 +01:00
19
- default_executable:
13
+ date: 2015-03-08 00:00:00.000000000 Z
20
14
  dependencies: []
21
-
22
- description: A pure Ruby interface to the WordNet database
15
+ description:
23
16
  email: doches@gmail.com
24
17
  executables: []
25
-
26
18
  extensions: []
27
-
28
- extra_rdoc_files:
29
- - README.markdown
30
- files:
19
+ extra_rdoc_files: []
20
+ files:
31
21
  - History.txt
32
22
  - README.markdown
33
23
  - WordNet-3.0/AUTHORS
@@ -42,54 +32,43 @@ files:
  - WordNet-3.0/dict/index.adv
  - WordNet-3.0/dict/index.noun
  - WordNet-3.0/dict/index.verb
+ - examples/benchmark.rb
  - examples/dictionary.rb
  - examples/full_hypernym.rb
  - lib/wordnet.rb
- - lib/wordnet/index.rb
+ - lib/wordnet/db.rb
  - lib/wordnet/lemma.rb
  - lib/wordnet/pointer.rb
  - lib/wordnet/pointers.rb
- - lib/wordnet/pos.rb
  - lib/wordnet/synset.rb
- - lib/wordnet/wordnetdb.rb
+ - lib/wordnet/version.rb
  - test/test_helper.rb
- - test/unit/index_test.rb
+ - test/unit/db_test.rb
+ - test/unit/lemma_test.rb
+ - test/unit/pointer_test.rb
  - test/unit/synset_test.rb
- - test/unit/wordnetdb_test.rb
- has_rdoc: true
- homepage: http://github.com/doches/rwordnet
- licenses: []
-
+ homepage: https://github.com/doches/rwordnet
+ licenses:
+ - MIT
+ metadata: {}
  post_install_message:
- rdoc_options:
- - --charset=UTF-8
- require_paths:
+ rdoc_options: []
+ require_paths:
  - lib
- required_ruby_version: !ruby/object:Gem::Requirement
- requirements:
+ required_ruby_version: !ruby/object:Gem::Requirement
+ requirements:
  - - ">="
- - !ruby/object:Gem::Version
- segments:
- - 0
- version: "0"
- required_rubygems_version: !ruby/object:Gem::Requirement
- requirements:
+ - !ruby/object:Gem::Version
+ version: 2.0.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ requirements:
  - - ">="
- - !ruby/object:Gem::Version
- segments:
- - 0
- version: "0"
+ - !ruby/object:Gem::Version
+ version: '0'
  requirements: []
-
  rubyforge_project:
- rubygems_version: 1.3.6
+ rubygems_version: 2.2.2
  signing_key:
- specification_version: 3
+ specification_version: 4
  summary: A pure Ruby interface to the WordNet database
- test_files:
- - test/unit/index_test.rb
- - test/unit/synset_test.rb
- - test/unit/wordnetdb_test.rb
- - test/test_helper.rb
- - examples/full_hypernym.rb
- - examples/dictionary.rb
+ test_files: []
data/lib/wordnet/index.rb DELETED
@@ -1,82 +0,0 @@
- require 'singleton'
- module WordNet
-
-   # Index is a WordNet lexicon. Note that Index is the base class; you probably want to be using the NounIndex, VerbIndex, etc. classes instead.
-   # Note that Indices are Singletons -- get an Index object by calling <POS>Index.instance, not <POS>Index.new.
-   class Index
-     # Create a new index for the given part of speech. +pos+ can be one of +noun+, +verb+, +adj+, or +adv+.
-     def initialize(pos)
-       @pos = pos
-       @db = {}
-
-       @finished_reading = false
-     end
-
-     # Find a lemma for a given word. Returns a Lemma which can then be used to access the synsets for the word.
-     def find(lemma_str)
-       # Look for the lemma in the part of the DB already read...
-       return @db[lemma_str] if @db.include?(lemma_str)
-
-       return nil if @finished_reading
-
-       # If we didn't find it, read in some more from the DB.
-       index = WordNetDB.open(File.join(WordNetDB.path,"dict","index.#{@pos}"))
-
-       lemma_counter = 1
-       if not index.closed?
-         loop do
-           break if index.eof?
-           line = index.readline
-           lemma = Lemma.new(line, lemma_counter); lemma_counter += 1
-           @db[lemma.word] = lemma
-           if line =~ /^#{lemma_str} /
-             return lemma
-           end
-         end
-         index.close
-       end
-
-       @finished_reading = true
-
-       # If we *still* didn't find it, return nil. It must not be in the database...
-       return nil
-     end
-   end
-
-   # An Index of nouns. Create a NounIndex by calling `NounIndex.instance`
-   class NounIndex < Index
-     include Singleton
-
-     def initialize
-       super("noun")
-     end
-   end
-
-   # An Index of verbs. Create a VerbIndex by calling `VerbIndex.instance`
-   class VerbIndex < Index
-     include Singleton
-
-     def initialize
-       super("verb")
-     end
-   end
-
-   # An Index of adjectives. Create an AdjectiveIndex by `AdjectiveIndex.instance`
-   class AdjectiveIndex < Index
-     include Singleton
-
-     def initialize
-       super("adj")
-     end
-   end
-
-   # An Index of adverbs. Create an AdverbIndex by `AdverbIndex.instance`
-   class AdverbIndex < Index
-     include Singleton
-
-     def initialize
-       super("adv")
-     end
-   end
-
- end
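The deleted `Index#find` above appears to read the index file incrementally: it caches every lemma it passes and returns as soon as the requested word turns up, so a later lookup can resume from the same handle. A minimal, self-contained sketch of that idea follows. It is not the library's code: `LazyIndex` is a hypothetical class, a `StringIO` stands in for the `index.<pos>` file, and raw lines are stored instead of `Lemma` objects.

```ruby
require 'stringio'

# Hypothetical stand-in for the deleted WordNet::Index (illustration only).
# Reads an index stream lazily and caches every entry it has passed.
class LazyIndex
  def initialize(io)
    @io = io            # any IO-like object positioned at the start of the index
    @cache = {}         # word => raw index line
    @finished = false   # true once the whole stream has been consumed
  end

  # Returns the raw index line for +word+, or nil if the word is absent.
  def find(word)
    return @cache[word] if @cache.key?(word)
    return nil if @finished

    # Resume reading where the previous lookup stopped.
    until @io.eof?
      line = @io.readline.chomp
      entry_word = line.split(' ', 2).first
      next if entry_word.nil?        # skip blank lines
      @cache[entry_word] = line
      return line if entry_word == word
    end

    @finished = true
    nil
  end
end

index = LazyIndex.new(StringIO.new("apple n 1\nfruit n 3\nzebra n 1\n"))
index.find("fruit")    # reads two lines, then stops
index.find("apple")    # served from the cache, no further reading
index.find("missing")  # consumes the rest of the stream once, returns nil
```

Comparing the first field with `==` instead of interpolating the word into a regular expression is a deliberate difference from the deleted code in this sketch: it avoids treating characters such as `.` in the query as pattern syntax.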
data/lib/wordnet/pos.rb DELETED
@@ -1,3 +0,0 @@
- module WordNet
-   SynsetType = {"n" => "noun", "v" => "verb", "adj" => "adj", "adv" => "adv"}
- end
data/lib/wordnet/wordnetdb.rb DELETED
@@ -1,54 +0,0 @@
- module WordNet
-
-   # Represents the WordNet database, and provides some basic interaction.
-   class WordNetDB
-     # By default, use the bundled WordNet
-     @@path = File.join(File.dirname(__FILE__),"/../../WordNet-3.0/")
-     @@files = {}
-
-     # To use your own WordNet installation (rather than the one bundled with rwordnet:
-     def WordNetDB.path=(path_to_wordnet)
-       @@path = path_to_wordnet
-     end
-
-     # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
-     def WordNetDB.path
-       @@path
-     end
-
-     # Look up a word in WordNet. Returns a list of lemmas occuring in any of the index files (noun, verb, adjective, adverb).
-     def WordNetDB.find(word)
-       lemmas = []
-       [NounIndex, VerbIndex, AdjectiveIndex, AdverbIndex].each do |index|
-         lemmas.push index.instance.find(word)
-       end
-       return lemmas.flatten.reject { |x| x.nil? }
-     end
-
-     # Register a new DB file handle. You shouldn't need to call this method; it's called automatically every time you open an index or data file.
-     def WordNetDB.open(path)
-       # If the file is already open, just return the handle.
-       return @@files[path] if @@files.include?(path) and not @@files[path].closed?
-
-       # Open and store
-       @@files[path] = File.open(path,"r")
-       return @@files[path]
-     end
-
-     # You should call this method after you're done using WordNet.
-     def WordNetDB.close
-       WordNetDB.finalize(0)
-     end
-
-     def WordNetDB.finalize(id)
-       @@files.each_value do |handle|
-         begin
-           handle.close
-         rescue IOError
-           ; # Keep going, close the next file.
-         end
-       end
-     end
-   end
-
- end
data/test/unit/index_test.rb DELETED
@@ -1,21 +0,0 @@
- require File.dirname(__FILE__) + "/../test_helper.rb"
-
- class TestIndex < Test::Unit::TestCase
-   @@index = nil
-
-   def setup
-     @@index = WordNet::NounIndex.instance if @@index.nil?
-   end
-
-   test 'find a lemma by string' do
-     lemma = @@index.find("fruit")
-     assert_equal "fruit,n",lemma.to_s
-   end
-
-   test 'get synsets for a lemma' do
-     lemma = @@index.find("fruit")
-     synsets = lemma.get_synsets
-     assert_equal 3, synsets.size
-     assert_equal "(n) yield, fruit (an amount of a product)",synsets[1].to_s
-   end
- end
data/test/unit/wordnetdb_test.rb DELETED
@@ -1,15 +0,0 @@
- require File.dirname(__FILE__) + "/../test_helper.rb"
-
- class TestWordNetDB < Test::Unit::TestCase
-   include WordNet
-
-   test 'set and read path' do
-     WordNetDB.path = "WordNetPath"
-     assert_equal "WordNetPath",WordNetDB.path
-   end
-
-   test 'find a word' do
-     lemmas = WordNetDB.find("fruit")
-     assert_equal 2,lemmas.size
-   end
- end