rwordnet 0.1.3 → 1.0.0
- checksums.yaml +7 -0
- data/History.txt +10 -1
- data/README.markdown +38 -28
- data/examples/benchmark.rb +14 -0
- data/examples/dictionary.rb +2 -3
- data/examples/full_hypernym.rb +1 -4
- data/lib/wordnet/db.rb +17 -0
- data/lib/wordnet/lemma.rb +56 -35
- data/lib/wordnet/pointer.rb +9 -10
- data/lib/wordnet/pointers.rb +77 -32
- data/lib/wordnet/synset.rb +92 -84
- data/lib/wordnet/version.rb +3 -0
- data/lib/wordnet.rb +1 -3
- data/test/test_helper.rb +12 -13
- data/test/unit/db_test.rb +14 -0
- data/test/unit/lemma_test.rb +94 -0
- data/test/unit/pointer_test.rb +26 -0
- data/test/unit/synset_test.rb +24 -28
- metadata +32 -53
- data/lib/wordnet/index.rb +0 -82
- data/lib/wordnet/pos.rb +0 -3
- data/lib/wordnet/wordnetdb.rb +0 -54
- data/test/unit/index_test.rb +0 -21
- data/test/unit/wordnetdb_test.rb +0 -15
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 4bea9b677b6a581d27c04ad1912c3034bf8329d8
+  data.tar.gz: 4fdef6fdfdbe2373b445f7857d3fe5ff5071fba2
+SHA512:
+  metadata.gz: c1471523dcb27e496eb72b406f37ddd83184dfcecbcabe95c492f6018b29fa53180d06035b7f9c0a938bfde0e4c28a37f89473c0b3a712be527ed2abf6261af3
+  data.tar.gz: 25cf40f306f3dfcf8076d9430cfd5d8d805af9198762a530ed4f293e5b0c4e5ccc680e941b2256f63c5141d5499ad77b7821eec5b4e245a72a47e1f9ca6e83b3
data/History.txt
CHANGED
@@ -1,5 +1,14 @@
+# rWordNet 1.0.0
+* Performance fixes for the lookup
+* Find using Lemma.find / Lemma.find_all
+* using ruby style constant names like `VerbPointers` -> `VERB_POINTERS`
+* renamed WordNet::WordNetDB to WordNet::DB
+* renaming a few methods in Lemma like `p_cnt` -> `pointer_count`
+* make Pointer a real class
+* renaming a few methods in SynSet like `get_relation` -> `relation`
+
 # rWordNet 0.1.3
-
+* Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
 
 # rWordNet 0.1.2
 * Added unique (integer) ids to lemmas [Wolfram Sieber]
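The constant rename noted in the changelog is mostly mechanical. As a minimal sketch for anyone porting 0.1.x code, the hypothetical helper below (`upgraded_constant_name` is an illustration, not part of rwordnet) maps an old CamelCase name to the upper snake case form. The diff appears to leave at least one constant (`InstanceHyponym`) unchanged, so the result should be checked against `pointers.rb` rather than trusted blindly.

```ruby
# Hypothetical porting helper; rwordnet itself does not ship this method.
# Inserts "_" at each lower-to-upper case boundary, then upcases the result.
def upgraded_constant_name(old_name)
  old_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').upcase
end

puts upgraded_constant_name("VerbPointers")
puts upgraded_constant_name("MemberOfThisDomainTopic")
```

A regular expression is enough here because the old names contain no acronyms or digits that would need special casing.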
data/README.markdown
CHANGED
@@ -1,59 +1,69 @@
 # A pure Ruby interface to WordNet #
 
+[![Build Status](https://travis-ci.org/doches/rwordnet.png)](https://travis-ci.org/doches/rwordnet)
+
+## Summary ##
+
++ Works directly on the database that comes with WordNet
++ No gem or native dependencies
++ *Very* easy to install
++ Small footprint (8.1M vs 24M for Ruby-Wordnet+DB)
++ Can use a custom, existing WordNet installation
+
 ## About ##
 
 This library implements a pure Ruby interface to the WordNet lexical/semantic
-database. Unlike existing ruby bindings, this one doesn't require you to convert
+database. Unlike existing ruby bindings, this one doesn't require you to convert
 the original WordNet database into a new database format; instead it can work directly
 on the database that comes with WordNet.
 
 If you're doing something data-intensive you will achieve much better performance
-with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
-since it converts the WordNet database into a BerkelyDB file for quicker access. In
-writing rwordnet, I've focused more on usability and ease of installation ( *gem install
+with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
+since it converts the WordNet database into a BerkelyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the space on disk as Ruby-Wordnet + DB. In
+writing rwordnet, I've focused more on usability and ease of installation ( *gem install
 rwordnet* ) at the expense of some performance. Use at your own risk, etc.
 
 ## Installation ##
 
 One of the chief benefits of rwordnet over Ruby-WordNet is how easy it is to install:
 
-gem install gemcutter # These two steps are only necessary if you haven't
-gem tumble # yet installed the gemcutter tools
 gem install rwordnet
-
-That's it! rwordnet comes bundled with the WordNet database which it uses by default,
+
+That's it! rwordnet comes bundled with the WordNet database which it uses by default,
 so there's absolutely nothing else to download, install, or configure.
-Of course, if you want to use your own WordNet installation, that's easy too -- just
+Of course, if you want to use your own WordNet installation, that's easy too -- just
 set the path to WordNet's database files before using the library (see examples below).
 
 ## Usage ##
 
 The other benefit of rwordnet over Ruby-WordNet is that it's so much easier (IMHO) to
-use.
+use.
+
+As an example, consider finding all of the noun glosses for a given word:
 
-
+```Ruby
+require 'wordnet'
 
-
-
-
-index = WordNet::NounIndex.instance
-lemma = index.find("fruit")
-lemma.synsets.each { |synset| puts synset.gloss }
+lemma = WordNet::Lemma.find("fruit", :noun)
+lemma.synsets.each { |synset| puts synset.gloss }
+```
 
 ...or all of the glosses, period:
 
-
-
-
-
+```Ruby
+lemmas = WordNet::Lemma.find_all("fruit")
+synsets = lemmas.map { |lemma| lemma.synsets }
+words = synsets.flatten
+words.each { |word| puts word.gloss }
+```
 
 Have your own WordNet database that you've marked up with extra attributes and whatnot?
 No problem:
 
-
-
-
-
-
-
-
+```Ruby
+require 'wordnet'
+
+WordNet::DB.path = "/path/to/WordNet-3.0"
+lemmas = WordNet::Lemma.find_all("fruit")
+...
+```
data/examples/benchmark.rb
ADDED
@@ -0,0 +1,14 @@
+require 'benchmark'
+require 'wordnet'
+
+initial = Benchmark.realtime do
+  WordNet::Lemma.find(ARGV[0] || raise("Usage: ruby benchmark.rb noun"), :noun)
+end
+
+puts "Time to initial word #{initial}"
+
+lookup = Benchmark.realtime do
+  1000.times { WordNet::Lemma.find('fruit', :noun) }
+end
+
+puts "Time for 1k lookups #{lookup}"
data/examples/dictionary.rb
CHANGED
@@ -1,5 +1,4 @@
 # Use WordNet as a command-line dictionary.
-require 'rubygems'
 require 'wordnet'
 
 if ARGV.size != 1
@@ -10,10 +9,10 @@ end
 word = ARGV[0]
 
 # Find all the lemmas for a word (i.e., whether it occurs as a noun, verb, etc.)
-lemmas = WordNet::
+lemmas = WordNet::Lemma.find_all(word)
 
 # Print out each lemma with a list of possible meanings.
-lemmas.each do |lemma|
+lemmas.each do |lemma|
   puts lemma
   lemma.synsets.each_with_index do |synset,i|
     puts "\t#{i+1}) #{synset.gloss}"
data/examples/full_hypernym.rb
CHANGED
@@ -1,10 +1,7 @@
-require 'rubygems'
 require 'wordnet'
 
-# Open the index file for nouns
-index = WordNet::NounIndex.new
 # Find the word 'fruit'
-lemma =
+lemma = WordNet::Lemma.find("fruit", :noun)
 # Find all the synsets for 'fruit', and pick the first one.
 synset = lemma.synsets[0]
 puts synset
data/lib/wordnet/db.rb
ADDED
@@ -0,0 +1,17 @@
+module WordNet
+  # Represents the WordNet database, and provides some basic interaction.
+  class DB
+    # By default, use the bundled WordNet
+    @path = File.expand_path("../../../WordNet-3.0/", __FILE__)
+
+    class << self
+      # To use your own WordNet installation (rather than the one bundled with rwordnet:
+      # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
+      attr_accessor :path
+
+      def open(path, &block)
+        File.open(File.join(self.path, path), "r", &block)
+      end
+    end
+  end
+end
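The new `DB` class keeps its root path in a class-level accessor and opens files relative to it. A minimal self-contained sketch of that pattern follows; `SketchDB` and the `index.demo` file are hypothetical stand-ins so the example runs without the gem or a WordNet installation.

```ruby
require "tmpdir"

# Stand-in for WordNet::DB; SketchDB is a hypothetical name used only here.
class SketchDB
  @path = Dir.tmpdir # class-level instance variable holding the default root

  class << self
    attr_accessor :path

    # Opens +relative+ below the configured root. With a block, File.open
    # closes the file afterwards and returns the block's value.
    def open(relative, &block)
      File.open(File.join(path, relative), "r", &block)
    end
  end
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "index.demo"), "fruit n 3\n")
  SketchDB.path = root
  puts SketchDB.open("index.demo") { |f| f.gets }
end
```

Naming the parameter `relative` avoids the shadowing that makes the method in the diff call `self.path` explicitly.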
data/lib/wordnet/lemma.rb
CHANGED
@@ -1,39 +1,60 @@
 module WordNet
+  # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
+  class Lemma
+    SPACE = ' '
+    attr_accessor :word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets, :id
 
-#
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    @synset_offset = []
-    @synset_cnt.times { @synset_offset.push line.shift.to_i }
-  end
-
-  # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
-  def get_synsets
-    return @synset_offset.map { |offset| Synset.new(@pos, offset) }
-  end
-
-  def to_s
-    [@lemma, @pos].join(",")
-  end
-
-  alias synsets get_synsets
-  alias word lemma
-end
+    # Create a lemma from a line in an lexicon file. You should be creating Lemmas by hand; instead,
+    # use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.
+    def initialize(lexicon_line, id)
+      @id = id
+      line = lexicon_line.split(" ")
+
+      @word = line.shift
+      @pos = line.shift
+      synset_count = line.shift.to_i
+      @pointer_symbols = line.slice!(0, line.shift.to_i)
+      line.shift # Throw away redundant sense_cnt
+      @tagsense_count = line.shift.to_i
+      @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
+    end
+
+    # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
+    def synsets
+      @synset_offsets.map { |offset| Synset.new(@pos, offset) }
+    end
 
+    def to_s
+      [@word, @pos].join(",")
+    end
+
+    class << self
+      @@cache = {}
+
+      def find_all(word)
+        [:noun, :verb, :adj, :adv].flat_map do |pos|
+          find(word, pos) || []
+        end
+      end
+
+      # Find a lemma for a given word and pos
+      def find(word, pos)
+        cache = @@cache[pos] ||= build_cache(pos)
+        if found = cache[word]
+          Lemma.new(*found)
+        end
+      end
+
+      private
+
+      def build_cache(pos)
+        cache = {}
+        DB.open(File.join("dict", "index.#{pos}")).each_line.each_with_index do |line, index|
+          word = line.slice(0, line.index(SPACE))
+          cache[word] = [line, index+1]
+        end
+        cache
+      end
+    end
+  end
 end
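The field handling in the new `Lemma#initialize` is easier to follow when each count is read before the list it sizes. The sketch below restates that parsing outside the gem; `ParsedLemma` and `parse_index_line` are hypothetical names, and the field order is taken from the constructor in the diff rather than from any other documentation.

```ruby
# Hypothetical value object; the real Lemma class exposes these as attributes.
ParsedLemma = Struct.new(:word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets)

# Parses one index line, assuming the field order used by the constructor:
# word pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt offset...
def parse_index_line(index_line)
  fields = index_line.split(" ")
  word = fields.shift
  pos = fields.shift
  synset_count = fields.shift.to_i
  pointer_count = fields.shift.to_i
  pointer_symbols = fields.shift(pointer_count)
  fields.shift # sense_cnt repeats synset_cnt, so it is discarded
  tagsense_count = fields.shift.to_i
  synset_offsets = fields.shift(synset_count).map(&:to_i)
  ParsedLemma.new(word, pos, pointer_symbols, tagsense_count, synset_offsets)
end

lemma = parse_index_line("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550")
p lemma.pointer_symbols
p lemma.synset_offsets
```

The sample line is the one used by the gem's own `.new` test, so the parsed values can be compared with the expectations there.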
data/lib/wordnet/pointer.rb
CHANGED
@@ -1,15 +1,14 @@
 module WordNet
+  class Pointer
+    attr_reader :symbol, :offset, :pos, :source, :target
 
-
-
-
-  def method_missing(msg, *args)
-    if self.include?(msg)
-      return self[msg]
-    else
-      throw NoMethodError.new("undefined method `#{msg}' for #{self}:Pointer")
+    def initialize(symbol: raise, offset: raise, pos: raise, source: raise)
+      @symbol, @offset, @pos, @source = symbol, offset, pos, source
+      @target = source.slice!(2,2)
     end
-  end
-end
 
+    def is_semantic?
+      source == "00" && target == "00"
+    end
+  end
 end
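The new `Pointer` splits WordNet's four-character source/target field into two halves, and a pointer counts as semantic only when both halves are `"00"`. The constructor in the diff does the split with `slice!`, which also shortens the string object the caller passed in. A minimal sketch of a non-mutating variant follows; `SketchPointer` and `build_pointer` are hypothetical names, not rwordnet API.

```ruby
# Hypothetical stand-in for WordNet::Pointer, built on Struct for brevity.
SketchPointer = Struct.new(:symbol, :offset, :pos, :source, :target) do
  # A pointer relates whole synsets (rather than single words) when both
  # halves of the source/target field are "00".
  def semantic?
    source == "00" && target == "00"
  end
end

# Splits the field with non-destructive indexing, leaving the input intact.
def build_pointer(symbol, offset, pos, source_target)
  SketchPointer.new(symbol, offset, pos, source_target[0, 2], source_target[2, 2])
end

field = "0103"
pointer = build_pointer("+", 123, "n", field)
p [pointer.source, pointer.target, pointer.semantic?, field]
```

Whether the mutation matters depends on the caller; in the diff the string comes from a freshly split line, so the in-place version is harmless there.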
data/lib/wordnet/pointers.rb
CHANGED
@@ -1,37 +1,82 @@
-# A container for various constants.
-#
+# A container for various constants.
+# In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
+# Use these symbols in conjunction with the Synset#relation method.
 
 module WordNet
+  NOUN_POINTERS = {
+    "-c" => "Member of this domain - TOPIC",
+    "+" => "Derivationally related form",
+    "%p" => "Part meronym",
+    "~i" => "Instance Hyponym",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "#p" => "Part holonym",
+    "%s" => "Substance meronym",
+    ";u" => "Domain of synset - USAGE",
+    "-r" => "Member of this domain - REGION",
+    "#s" => "Substance holonym",
+    "=" => "Attribute",
+    "-u" => "Member of this domain - USAGE",
+    ";c" => "Domain of synset - TOPIC",
+    "%m" => "Member meronym",
+    "~" => "Hyponym",
+    "@i" => "Instance Hypernym",
+    "#m" => "Member holonym"
+  }
+  VERB_POINTERS = {
+    "+" => "Derivationally related form",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "$" => "Verb Group",
+    ";c" => "Domain of synset - TOPIC",
+    ">" => "Cause",
+    "~" => "Hyponym",
+    "*" => "Entailment"
+  }
+  ADJECTIVE_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "\\" => "Pertainym (pertains to noun)",
+    "<" => "Participle of verb",
+    "&" => "Similar to",
+    "=" => "Attribute",
+    ";c" => "Domain of synset - TOPIC"
+  }
+  ADVERB_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "\\" => "Derived from adjective",
+    ";c" => "Domain of synset - TOPIC"
+  }
 
-
-
-
-  AdverbPointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "\\"=>"Derived from adjective", ";c"=>"Domain of synset - TOPIC"}
-
-  MemberOfThisDomainTopic = "-c"
-  DerivationallyRelatedForm = "+"
-  PartMeronym = "%p"
+  MEMBER_OF_THIS_DOMAIN_TOPIC = "-c"
+  DERIVATIONALLY_RELATED_FORM = "+"
+  PART_MERONYM = "%p"
   InstanceHyponym = "~i"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+  HYPERNYM = "@"
+  DOMAIN_OF_SYNSET_REGION = ";r"
+  ANTONYM = "!"
+  PART_HOLONYM = "#p"
+  SUBSTANCE_MERONYM = "%s"
+  VERB_GROUP = "$"
+  DOMAIN_OF_SYNSET_USAGE = ";u"
+  MEMBER_OF_THIS_DOMAIN_REGION = "-r"
+  SUBSTANCE_HOLONYM = "#s"
+  DERIVED_FROM_ADJECTIVE = "\\"
+  PARTICIPLE_OF_VERB = "<"
+  SIMILAR_TO = "&"
+  ATTRIBUTE = "="
+  ALSO_SEE = "^"
+  CAUSE = ">"
+  MEMBER_OF_THIS_DOMAIN_USAGE = "-u"
+  DOMAIN_OF_SYNSET_TOPIC = ";c"
+  MEMBER_MERONYM = "%m"
+  HYPONYM = "~"
+  INSTANCE_HYPERNYM = "@i"
+  ENTAILMENT = "*"
+  MEMBER_HOLONYM = "#m"
 end
data/lib/wordnet/synset.rb
CHANGED
@@ -1,90 +1,98 @@
 module WordNet
+  SYNSET_TYPES = {"n" => "noun", "v" => "verb", "a" => "adj", "r" => "adv"}
 
-# Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
-# relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
-class Synset
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    @
+  # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
+  # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
+  class Synset
+    attr_reader :gloss, :synset_offset, :lex_filenum, :synset_type, :word_counts, :pos_offset, :pos
+
+    # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
+    # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
+    def initialize(pos, offset)
+      data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
+        f.seek(offset)
+        f.readline.strip
+      end
+
+      info_line, @gloss = data_line.split(" | ", 2)
+      line = info_line.split(" ")
+
+      @pos = pos
+      @pos_offset = offset
+      @synset_offset = line.shift
+      @lex_filenum = line.shift
+      @synset_type = line.shift
+
+      @word_counts = {}
+      word_count = line.shift.to_i
+      word_count.times do
+        @word_counts[line.shift] = line.shift.to_i
+      end
+
+      pointer_count = line.shift.to_i
+      @pointers = Array.new(pointer_count).map do
+        Pointer.new(
+          symbol: line.shift[0],
+          offset: line.shift.to_i,
+          pos: line.shift,
+          source: line.shift
+        )
+      end
     end
-
-
-
-
-      pointer = Pointer.new
-      pointer[:symbol] = line.shift,
-      pointer[:offset] = line.shift.to_i
-      pointer[:pos] = line.shift
-      pointer[:source] = line.shift
-      pointer[:is_semantic?] = (pointer[:source] == "0000")
-      pointer[:target] = pointer[:source][2..3]
-      pointer[:source] = pointer[:source][0..1]
-      pointer[:symbol] = pointer[:symbol][0]
-      @pointers.push pointer
+
+    # How many words does this Synset include?
+    def word_count
+      @word_counts.size
     end
-  end
-
-  # How many words does this Synset include?
-  def size
-    @wordcounts.size
-  end
-
-  # Get a list of words included in this Synset
-  def words
-    @wordcounts.keys
-  end
-
-  # List of valid +pointer_symbol+s is in pointers.rb
-  def get_relation(pointer_symbol)
-    @pointers.reject { |pointer| pointer.symbol != pointer_symbol }.map { |pointer| Synset.new(@ss_type, pointer.offset) }
-  end
-
-  # Get the Synset of this sense's antonym
-  def antonym
-    get_relation(Antonym)
-  end
-
-  # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
-  def hypernym
-    get_relation(Hypernym)[0]
-  end
-
-  # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
-  def hyponym
-    get_relation(Hyponym)
-  end
-
-  # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
-  def expanded_hypernym
-    parent = self.hypernym
-    return [] if parent.nil?
-
-    return [parent, parent.expanded_hypernym].flatten
-  end
-
-  def to_s
-    "(#{@ss_type}) #{words.map {|x| x.gsub('_',' ')}.join(', ')} (#{@gloss})"
-  end
-
-  alias parent hypernym
-  alias children hyponym
-end
 
+    # Get a list of words included in this Synset
+    def words
+      @word_counts.keys
+    end
+
+    # List of valid +pointer_symbol+s is in pointers.rb
+    def relation(pointer_symbol)
+      @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
+        map! { |pointer| Synset.new(@synset_type, pointer.offset) }
+    end
+
+    # Get the Synset of this sense's antonym
+    def antonym
+      relation(ANTONYM)
+    end
+
+    # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
+    def hypernym
+      relation(HYPERNYM)[0]
+    end
+
+    # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
+    def hyponym
+      relation(HYPONYM)
+    end
+
+    # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
+    def expanded_hypernym
+      parent = hypernym
+      list = []
+      return list unless parent
+
+      while parent
+        break if list.include? parent.pos_offset
+        list.push parent.pos_offset
+        parent = parent.parent
+      end
+
+      list.flatten!
+      list.map! { |offset| Synset.new(@pos, offset)}
+    end
+
+    def to_s
+      "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
+    end
+
+    alias size word_count
+    alias parent hypernym
+    alias children hyponym
+  end
 end
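The rewritten `expanded_hypernym` replaces recursion with a loop that stops when an offset repeats, which is what keeps a circular hypernym chain from looping forever. The same guard can be sketched on a plain hash of parent links; `ancestor_chain` and both sample hashes are hypothetical illustrations, not real WordNet data.

```ruby
# Walks parent links from +start+ and returns the ancestors in order,
# stopping at the root or as soon as a node would be visited twice.
def ancestor_chain(start, parent_of)
  chain = []
  current = parent_of[start]
  while current && !chain.include?(current)
    chain << current
    current = parent_of[current]
  end
  chain
end

# Toy data: a straight chain and a chain whose tail loops back on itself.
tree = { "fruit" => "reproductive_structure", "reproductive_structure" => "plant_organ", "plant_organ" => "entity" }
cyclic = { "a" => "b", "b" => "c", "c" => "b" }

p ancestor_chain("fruit", tree)
p ancestor_chain("a", cyclic)
```

`Array#include?` makes each step linear in the chain length, which is fine for chains of this size; a `Set` of seen nodes would be the usual choice for longer walks.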
data/lib/wordnet.rb
CHANGED
data/test/test_helper.rb
CHANGED
@@ -1,17 +1,16 @@
-require "
-require
+require "bundler/setup"
+require "maxitest/autorun"
 
+$LOAD_PATH.unshift Bundler.root.join("lib")
+require "wordnet"
 
-
-  def
-
-
-
-
-
-
-    define_method :"test_#{caller.first.split("/").last}" do
-      assert_equal expected_value, instance_eval(&block)
-    end
+Minitest::Test.class_eval do
+  def with_db_path(path)
+    begin
+      old, WordNet::DB.path = WordNet::DB.path, path
+      yield
+    ensure
+      WordNet::DB.path = old
+    end
   end
 end
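The `with_db_path` helper temporarily swaps a global setting and restores it in an `ensure` clause, so a failing block cannot leak the override into later tests. A minimal sketch of the same idea without the gem follows; `SketchSettings` and `with_temporary_path` are hypothetical names.

```ruby
# Hypothetical stand-in for a class-level setting such as WordNet::DB.path.
module SketchSettings
  class << self
    attr_accessor :path
  end
  self.path = "bundled"
end

# Sets the path for the duration of the block and always restores the old
# value, even if the block raises.
def with_temporary_path(path)
  old, SketchSettings.path = SketchSettings.path, path
  yield
ensure
  SketchSettings.path = old
end

inside = with_temporary_path("custom") { SketchSettings.path }
puts inside
puts SketchSettings.path
```

The parallel assignment evaluates the right-hand side first, so `old` captures the previous value before the setter runs.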
data/test/unit/db_test.rb
ADDED
@@ -0,0 +1,14 @@
+require_relative "../test_helper"
+
+describe WordNet::DB do
+  it 'sets and reads path' do
+    with_db_path("WordNetPath") { WordNet::DB.path.must_equal "WordNetPath" }
+  end
+
+  it "opens a relative path" do
+    result = WordNet::DB.open(File.join("dict", "index.verb")) do |f|
+      f.gets
+    end
+    result.must_equal " 1 This software and database is being provided to you, the LICENSEE, by \n"
+  end
+end
data/test/unit/lemma_test.rb
ADDED
@@ -0,0 +1,94 @@
+require_relative "../test_helper"
+
+describe WordNet::Lemma do
+  describe ".find" do
+    it 'finds a lemma by string' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      lemma.to_s.must_equal "fruit,n"
+    end
+
+    it 'caches found' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("fruit", :noun)
+      end
+      lemma1.word.must_equal lemma2.word
+    end
+
+    it 'only scans the db once' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("table", :noun)
+      end
+      lemma2.word.must_equal "table"
+    end
+
+    it 'can lookup different things' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = WordNet::Lemma.find("banana", :noun)
+      lemma1.word.must_equal "fruit"
+      lemma2.word.must_equal "banana"
+    end
+
+    it 'does not find word in wrong file' do
+      lemma = WordNet::Lemma.find("elephant", :verb)
+      lemma.must_equal nil
+    end
+
+    it 'caches unfound' do
+      WordNet::Lemma.find("elephant", :verb)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("elephant", :verb)
+      end
+      lemma2.must_equal nil
+    end
+
+    it 'fails on unknown type' do
+      assert_raises Errno::ENOENT do
+        WordNet::Lemma.find("fruit", :sdjksdfjkdfskjsdfjk)
+      end
+    end
+
+    it "does not find by regexp" do
+      WordNet::Lemma.find(".", :verb).must_equal nil
+    end
+  end
+
+  describe ".find_all" do
+    it "finds all pos" do
+      result = WordNet::Lemma.find_all("fruit")
+      result.size.must_equal 2
+      result.map(&:pos).sort.must_equal ["n", "v"]
+    end
+
+    it "returns empty array for unfound" do
+      WordNet::Lemma.find_all("sdjkhdfsjfdsjhkfds").must_equal []
+    end
+
+    it "does not produce a circular reference" do
+      l = WordNet::Lemma.find_all("blink")[1]
+      l.synsets[1].expanded_hypernym.wont_be_nil
+    end
+  end
+
+  describe "#synsets" do
+    it 'finds them' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      synsets = lemma.synsets
+      synsets.size.must_equal 3
+      synsets[1].to_s.must_equal "(n) yield, fruit (an amount of a product)"
+    end
+  end
+
+  describe ".new" do
+    it "builds all fields" do
+      lemma = WordNet::Lemma.new("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550", 123)
+      lemma.id.must_equal 123
+      lemma.word.must_equal "fruit"
+      lemma.pos.must_equal "n"
+      lemma.pointer_symbols.must_equal ["@", "~", "+"]
+      lemma.tagsense_count.must_equal 3
+      lemma.synset_offsets.must_equal [13134947, 4612722, 7294550]
+    end
+  end
+end
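The 'caches found' and 'only scans the db once' tests rely on the index file being read in full on the first lookup for a part of speech and served from a hash afterwards. A minimal sketch of such a cache over an in-memory file follows; `build_index_cache` and the sample lines are hypothetical. Skipping lines that begin with a space (such as the license header) is an addition of this sketch and not something the gem's `build_cache` does.

```ruby
require "stringio"

# Builds a word => [raw_line, line_number] hash in a single pass over +io+.
# Lines with no leading word are skipped; that filter is an assumption of
# this sketch.
def build_index_cache(io)
  cache = {}
  io.each_line.with_index(1) do |line, line_number|
    space = line.index(" ")
    next if space.nil? || space.zero?
    cache[line[0, space]] = [line, line_number]
  end
  cache
end

index = StringIO.new("  1 license text\nfruit n 3 3 @ ~ + 3 3 1 2 3\ntable n 1 0 1 0 4\n")
cache = build_index_cache(index)
p cache.keys
p cache["fruit"][1]
```

Storing the raw line instead of a parsed object keeps the first pass cheap; parsing then happens only for words that are actually looked up, as `Lemma.new(*found)` does in the diff.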
data/test/unit/pointer_test.rb
ADDED
@@ -0,0 +1,26 @@
+require_relative "../test_helper"
+
+describe WordNet::Pointer do
+  let(:pointer) { WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "1234") }
+
+  describe "#initialize" do
+    it "sets all values" do
+      pointer.symbol.must_equal "s"
+      pointer.offset.must_equal 123
+      pointer.pos.must_equal "v"
+      pointer.source.must_equal "12"
+      pointer.target.must_equal "34"
+    end
+  end
+
+  describe "#is_semantic?" do
+    it "is not semantic for non-0" do
+      pointer.is_semantic?.must_equal false
+    end
+
+    it "is semantic for all-0" do
+      pointer = WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "0000")
+      pointer.is_semantic?.must_equal true
+    end
+  end
+end
data/test/unit/synset_test.rb
CHANGED
@@ -1,43 +1,39 @@
-
+require_relative "../test_helper"
 
-
-
-
-  def setup
-    if @@synsets.nil?
-      index = WordNet::NounIndex.instance
-      lemma = index.find("fruit")
-      @@synsets = lemma.get_synsets
-    end
+describe WordNet::Synset do
+  def self.synsets
+    @synsets ||= WordNet::Lemma.find("fruit", :noun).synsets
   end
-
-
-
-
-    assert_equal
+
+  let(:synsets) { self.class.synsets }
+
+  it 'get synsets for a lemma' do
+    assert_equal 3, synsets.size
+    assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",synsets[0].to_s
+    assert_equal "an amount of a product",synsets[1].gloss
   end
-
-
-    hypernym =
-    hypernym =
+
+  it 'get hypernym for a synset' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym = synsets[0].hypernym
     assert_equal 1,hypernym.size
     assert_equal "(n) reproductive structure (the parts of a plant involved in its reproduction)",hypernym.to_s
   end
 
-
-    hypernym =
-    hypernym2 =
+  it 'test shorthand for get_relation' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym2 = synsets[0].hypernym
     assert_equal hypernym[0].gloss, hypernym2.gloss
   end
-
-
-    hyponym =
+
+  it 'get hyponyms for a synset' do
+    hyponym = synsets[0].relation(WordNet::HYPONYM)
     assert_equal 29,hyponym.size
     assert_equal "fruit of various buckthorns yielding dyes or pigments",hyponym[26].gloss
   end
-
-
-    expanded =
+
+  it 'test expanded hypernym tree' do
+    expanded = synsets[0].expanded_hypernym
     assert_equal 8, expanded.size
     assert_equal "entity", expanded[expanded.size-1].words[0]
   end
metadata
CHANGED
@@ -1,33 +1,23 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: rwordnet
-version: !ruby/object:Gem::Version
-
-  segments:
-  - 0
-  - 1
-  - 3
-  version: 0.1.3
+version: !ruby/object:Gem::Version
+  version: 1.0.0
 platform: ruby
-authors:
+authors:
 - Trevor Fountain
 - Wolfram Sieber
+- Michael Grosser
 autorequire:
 bindir: bin
 cert_chain: []
-
-date: 2010-10-15 00:00:00 +01:00
-default_executable:
+date: 2015-03-08 00:00:00.000000000 Z
 dependencies: []
-
-description: A pure Ruby interface to the WordNet database
+description:
 email: doches@gmail.com
 executables: []
-
 extensions: []
-
-
-- README.markdown
-files:
+extra_rdoc_files: []
+files:
 - History.txt
 - README.markdown
 - WordNet-3.0/AUTHORS
@@ -42,54 +32,43 @@ files:
 - WordNet-3.0/dict/index.adv
 - WordNet-3.0/dict/index.noun
 - WordNet-3.0/dict/index.verb
+- examples/benchmark.rb
 - examples/dictionary.rb
 - examples/full_hypernym.rb
 - lib/wordnet.rb
-- lib/wordnet/
+- lib/wordnet/db.rb
 - lib/wordnet/lemma.rb
 - lib/wordnet/pointer.rb
 - lib/wordnet/pointers.rb
-- lib/wordnet/pos.rb
 - lib/wordnet/synset.rb
-- lib/wordnet/
+- lib/wordnet/version.rb
 - test/test_helper.rb
-- test/unit/
+- test/unit/db_test.rb
+- test/unit/lemma_test.rb
+- test/unit/pointer_test.rb
 - test/unit/synset_test.rb
-
-
-
-
-
+homepage: https://github.com/doches/rwordnet
+licenses:
+- MIT
+metadata: {}
 post_install_message:
-rdoc_options:
-
-require_paths:
+rdoc_options: []
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-  requirements:
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-
-
-
-required_rubygems_version: !ruby/object:Gem::Requirement
-  requirements:
+    - !ruby/object:Gem::Version
+      version: 2.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-
-      - 0
-      version: "0"
+    - !ruby/object:Gem::Version
+      version: '0'
 requirements: []
-
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.2.2
 signing_key:
-specification_version:
+specification_version: 4
 summary: A pure Ruby interface to the WordNet database
-test_files:
-- test/unit/index_test.rb
-- test/unit/synset_test.rb
-- test/unit/wordnetdb_test.rb
-- test/test_helper.rb
-- examples/full_hypernym.rb
-- examples/dictionary.rb
+test_files: []
data/lib/wordnet/index.rb
DELETED
@@ -1,82 +0,0 @@
-require 'singleton'
-module WordNet
-
-  # Index is a WordNet lexicon. Note that Index is the base class; you probably want to be using the NounIndex, VerbIndex, etc. classes instead.
-  # Note that Indices are Singletons -- get an Index object by calling <POS>Index.instance, not <POS>Index.new.
-  class Index
-    # Create a new index for the given part of speech. +pos+ can be one of +noun+, +verb+, +adj+, or +adv+.
-    def initialize(pos)
-      @pos = pos
-      @db = {}
-
-      @finished_reading = false
-    end
-
-    # Find a lemma for a given word. Returns a Lemma which can then be used to access the synsets for the word.
-    def find(lemma_str)
-      # Look for the lemma in the part of the DB already read...
-      return @db[lemma_str] if @db.include?(lemma_str)
-
-      return nil if @finished_reading
-
-      # If we didn't find it, read in some more from the DB.
-      index = WordNetDB.open(File.join(WordNetDB.path,"dict","index.#{@pos}"))
-
-      lemma_counter = 1
-      if not index.closed?
-        loop do
-          break if index.eof?
-          line = index.readline
-          lemma = Lemma.new(line, lemma_counter); lemma_counter += 1
-          @db[lemma.word] = lemma
-          if line =~ /^#{lemma_str} /
-            return lemma
-          end
-        end
-        index.close
-      end
-
-      @finished_reading = true
-
-      # If we *still* didn't find it, return nil. It must not be in the database...
-      return nil
-    end
-  end
-
-  # An Index of nouns. Create a NounIndex by calling `NounIndex.instance`
-  class NounIndex < Index
-    include Singleton
-
-    def initialize
-      super("noun")
-    end
-  end
-
-  # An Index of verbs. Create a VerbIndex by calling `VerbIndex.instance`
-  class VerbIndex < Index
-    include Singleton
-
-    def initialize
-      super("verb")
-    end
-  end
-
-  # An Index of adjectives. Create an AdjectiveIndex by `AdjectiveIndex.instance`
-  class AdjectiveIndex < Index
-    include Singleton
-
-    def initialize
-      super("adj")
-    end
-  end
-
-  # An Index of adverbs. Create an AdverbIndex by `AdverbIndex.instance`
-  class AdverbIndex < Index
-    include Singleton
-
-    def initialize
-      super("adv")
-    end
-  end
-
-end
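The deleted `Index#find` combines an in-memory cache with an incremental scan of the index file: a hit returns from the cache, a miss reads further lines (caching each one) until the word turns up or the file ends, and `@finished_reading` stops later misses from rescanning. The sketch below reproduces that pattern in isolation. `LazyIndex` is a hypothetical stand-in, it reads from a `StringIO` rather than the real `index.<pos>` files, and it stores raw lines instead of `Lemma` objects.

```ruby
require "stringio"

# Hypothetical stand-in for the removed Index class (not rwordnet API).
class LazyIndex
  def initialize(io)
    @io = io                  # any IO-like object positioned at the start
    @cache = {}               # word => raw index line seen so far
    @finished_reading = false
  end

  def find(word)
    # Serve from the part of the index already read.
    return @cache[word] if @cache.key?(word)
    # The whole file has been scanned: a miss is final.
    return nil if @finished_reading

    # Resume scanning where the previous call stopped.
    until @io.eof?
      line = @io.readline.chomp
      key = line.split(" ", 2).first
      @cache[key] = line
      return line if key == word
    end

    @finished_reading = true
    nil
  end
end

index = LazyIndex.new(StringIO.new("apple n 2\nfruit n 3\npear n 1\n"))
puts index.find("fruit")         # scans two lines, prints "fruit n 3"
puts index.find("apple")         # cache hit, prints "apple n 2"
puts index.find("kiwi").inspect  # scans to EOF, prints "nil"
puts index.find("kiwi").inspect  # no rescan, prints "nil"
```

The cost of this design is that a lookup for a word late in the file, or absent from it, reads and parses everything before it, which may be part of what the 1.0.0 "performance fixes for the lookup" address.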
data/lib/wordnet/pos.rb
DELETED
data/lib/wordnet/wordnetdb.rb
DELETED
@@ -1,54 +0,0 @@
-module WordNet
-
-  # Represents the WordNet database, and provides some basic interaction.
-  class WordNetDB
-    # By default, use the bundled WordNet
-    @@path = File.join(File.dirname(__FILE__),"/../../WordNet-3.0/")
-    @@files = {}
-
-    # To use your own WordNet installation (rather than the one bundled with rwordnet:
-    def WordNetDB.path=(path_to_wordnet)
-      @@path = path_to_wordnet
-    end
-
-    # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
-    def WordNetDB.path
-      @@path
-    end
-
-    # Look up a word in WordNet. Returns a list of lemmas occuring in any of the index files (noun, verb, adjective, adverb).
-    def WordNetDB.find(word)
-      lemmas = []
-      [NounIndex, VerbIndex, AdjectiveIndex, AdverbIndex].each do |index|
-        lemmas.push index.instance.find(word)
-      end
-      return lemmas.flatten.reject { |x| x.nil? }
-    end
-
-    # Register a new DB file handle. You shouldn't need to call this method; it's called automatically every time you open an index or data file.
-    def WordNetDB.open(path)
-      # If the file is already open, just return the handle.
-      return @@files[path] if @@files.include?(path) and not @@files[path].closed?
-
-      # Open and store
-      @@files[path] = File.open(path,"r")
-      return @@files[path]
-    end
-
-    # You should call this method after you're done using WordNet.
-    def WordNetDB.close
-      WordNetDB.finalize(0)
-    end
-
-    def WordNetDB.finalize(id)
-      @@files.each_value do |handle|
-        begin
-          handle.close
-        rescue IOError
-          ; # Keep going, close the next file.
-        end
-      end
-    end
-  end
-
-end
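The deleted `WordNetDB.open` and `WordNetDB.close` amount to a small file-handle cache: a path that is already open returns the same handle, a closed or unknown path is opened again, and `close` shuts every handle that was registered. A minimal sketch of that behaviour follows, under the assumption that an instance-level hash is an acceptable substitute for the original class variables. `HandleCache` is a hypothetical name, and a temporary file stands in for a WordNet index file.

```ruby
require "tempfile"

# Hypothetical helper mirroring the shape of the removed WordNetDB.open/close.
class HandleCache
  def initialize
    @files = {} # path => File handle
  end

  # Return the cached handle when it is still open, otherwise open and store.
  def open(path)
    handle = @files[path]
    return handle if handle && !handle.closed?

    @files[path] = File.open(path, "r")
  end

  # Close every handle that is still open.
  def close
    @files.each_value do |handle|
      handle.close unless handle.closed?
    end
  end
end

Tempfile.create("index.noun") do |tmp|
  tmp.write("fruit n 3\n")
  tmp.flush

  cache = HandleCache.new
  first  = cache.open(tmp.path)
  second = cache.open(tmp.path)
  puts first.equal?(second) # prints "true": the open handle is reused

  cache.close
  puts first.closed?        # prints "true"

  third = cache.open(tmp.path)
  puts third.equal?(first)  # prints "false": a closed handle is replaced
  cache.close
end
```

Because a reused handle keeps its read position, a caller such as the old `Index#find` can resume scanning where it left off, but that position is also shared state between every caller holding the same path.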
data/test/unit/index_test.rb
DELETED
@@ -1,21 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-
-class TestIndex < Test::Unit::TestCase
-  @@index = nil
-
-  def setup
-    @@index = WordNet::NounIndex.instance if @@index.nil?
-  end
-
-  test 'find a lemma by string' do
-    lemma = @@index.find("fruit")
-    assert_equal "fruit,n",lemma.to_s
-  end
-
-  test 'get synsets for a lemma' do
-    lemma = @@index.find("fruit")
-    synsets = lemma.get_synsets
-    assert_equal 3, synsets.size
-    assert_equal "(n) yield, fruit (an amount of a product)",synsets[1].to_s
-  end
-end
data/test/unit/wordnetdb_test.rb
DELETED
@@ -1,15 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-
-class TestWordNetDB < Test::Unit::TestCase
-  include WordNet
-
-  test 'set and read path' do
-    WordNetDB.path = "WordNetPath"
-    assert_equal "WordNetPath",WordNetDB.path
-  end
-
-  test 'find a word' do
-    lemmas = WordNetDB.find("fruit")
-    assert_equal 2,lemmas.size
-  end
-end