rwordnet 0.1.3 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/History.txt +10 -1
- data/README.markdown +38 -28
- data/examples/benchmark.rb +14 -0
- data/examples/dictionary.rb +2 -3
- data/examples/full_hypernym.rb +1 -4
- data/lib/wordnet/db.rb +17 -0
- data/lib/wordnet/lemma.rb +56 -35
- data/lib/wordnet/pointer.rb +9 -10
- data/lib/wordnet/pointers.rb +77 -32
- data/lib/wordnet/synset.rb +92 -84
- data/lib/wordnet/version.rb +3 -0
- data/lib/wordnet.rb +1 -3
- data/test/test_helper.rb +12 -13
- data/test/unit/db_test.rb +14 -0
- data/test/unit/lemma_test.rb +94 -0
- data/test/unit/pointer_test.rb +26 -0
- data/test/unit/synset_test.rb +24 -28
- metadata +32 -53
- data/lib/wordnet/index.rb +0 -82
- data/lib/wordnet/pos.rb +0 -3
- data/lib/wordnet/wordnetdb.rb +0 -54
- data/test/unit/index_test.rb +0 -21
- data/test/unit/wordnetdb_test.rb +0 -15
checksums.yaml
ADDED
```diff
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 4bea9b677b6a581d27c04ad1912c3034bf8329d8
+  data.tar.gz: 4fdef6fdfdbe2373b445f7857d3fe5ff5071fba2
+SHA512:
+  metadata.gz: c1471523dcb27e496eb72b406f37ddd83184dfcecbcabe95c492f6018b29fa53180d06035b7f9c0a938bfde0e4c28a37f89473c0b3a712be527ed2abf6261af3
+  data.tar.gz: 25cf40f306f3dfcf8076d9430cfd5d8d805af9198762a530ed4f293e5b0c4e5ccc680e941b2256f63c5141d5499ad77b7821eec5b4e245a72a47e1f9ca6e83b3
```
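Each entry in checksums.yaml is a hex digest of one of the two archives packed inside the .gem file (metadata.gz and data.tar.gz). The following minimal sketch shows how such a digest can be checked with Ruby's standard digest and yaml libraries. The payload and the YAML document are made-up stand-ins built inside the script; they are not the rwordnet values listed above, and `verify` is a hypothetical helper.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Stand-in for the raw bytes of an archive such as data.tar.gz.
payload = "example archive bytes"

# A document shaped like checksums.yaml, generated from the payload itself
# (values are quoted here so YAML always loads them as strings).
checksums_yaml = <<~YAML
  ---
  SHA1:
    data.tar.gz: "#{Digest::SHA1.hexdigest(payload)}"
  SHA512:
    data.tar.gz: "#{Digest::SHA512.hexdigest(payload)}"
YAML

checksums = YAML.safe_load(checksums_yaml)

# Recompute every recorded digest for +name+ and compare it with +bytes+.
def verify(checksums, name, bytes)
  { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.all? do |algo, digest|
    checksums.fetch(algo).fetch(name) == digest.hexdigest(bytes)
  end
end

puts verify(checksums, "data.tar.gz", payload)       # prints true
puts verify(checksums, "data.tar.gz", payload + "!") # prints false
```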
data/History.txt
CHANGED
```diff
@@ -1,5 +1,14 @@
+# rWordNet 1.0.0
+* Performance fixes for the lookup
+* Find using Lemma.find / Lemma.find_all
+* using ruby style constant names like `VerbPointers` -> `VERB_POINTERS`
+* renamed WordNet::WordNetDB to WordNet::DB
+* renaming a few methods in Lemma like `p_cnt` -> `pointer_count`
+* make Pointer a real class
+* renaming a few methods in SynSet like `get_relation` -> `relation`
+
 # rWordNet 0.1.3
-
+* Fixed a terrible bug that caused Indices to re-read the *entire* database on every failed lookup.
 
 # rWordNet 0.1.2
 * Added unique (integer) ids to lemmas [Wolfram Sieber]
```
data/README.markdown
CHANGED
```diff
@@ -1,59 +1,69 @@
 # A pure Ruby interface to WordNet #
 
+[](https://travis-ci.org/doches/rwordnet)
+
+## Summary ##
+
++ Works directly on the database that comes with WordNet
++ No gem or native dependencies
++ *Very* easy to install
++ Small footprint (8.1M vs 24M for Ruby-Wordnet+DB)
++ Can use a custom, existing WordNet installation
+
 ## About ##
 
 This library implements a pure Ruby interface to the WordNet lexical/semantic
-database. Unlike existing ruby bindings, this one doesn't require you to convert
+database. Unlike existing ruby bindings, this one doesn't require you to convert
 the original WordNet database into a new database format; instead it can work directly
 on the database that comes with WordNet.
 
 If you're doing something data-intensive you will achieve much better performance
-with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
-since it converts the WordNet database into a BerkelyDB file for quicker access. In
-writing rwordnet, I've focused more on usability and ease of installation ( *gem install
+with Michael Granger's [Ruby-WordNet](http://www.deveiate.org/projects/Ruby-WordNet/),
+since it converts the WordNet database into a BerkelyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the space on disk as Ruby-Wordnet + DB. In
+writing rwordnet, I've focused more on usability and ease of installation ( *gem install
 rwordnet* ) at the expense of some performance. Use at your own risk, etc.
 
 ## Installation ##
 
 One of the chief benefits of rwordnet over Ruby-WordNet is how easy it is to install:
 
-gem install gemcutter # These two steps are only necessary if you haven't
-gem tumble # yet installed the gemcutter tools
 gem install rwordnet
-
-That's it! rwordnet comes bundled with the WordNet database which it uses by default,
+
+That's it! rwordnet comes bundled with the WordNet database which it uses by default,
 so there's absolutely nothing else to download, install, or configure.
-Of course, if you want to use your own WordNet installation, that's easy too -- just
+Of course, if you want to use your own WordNet installation, that's easy too -- just
 set the path to WordNet's database files before using the library (see examples below).
 
 ## Usage ##
 
 The other benefit of rwordnet over Ruby-WordNet is that it's so much easier (IMHO) to
-use.
+use.
+
+As an example, consider finding all of the noun glosses for a given word:
 
-
+```Ruby
+require 'wordnet'
 
-
-
-
-index = WordNet::NounIndex.instance
-lemma = index.find("fruit")
-lemma.synsets.each { |synset| puts synset.gloss }
+lemma = WordNet::Lemma.find("fruit", :noun)
+lemma.synsets.each { |synset| puts synset.gloss }
+```
 
 ...or all of the glosses, period:
 
-
-
-
-
+```Ruby
+lemmas = WordNet::Lemma.find_all("fruit")
+synsets = lemmas.map { |lemma| lemma.synsets }
+words = synsets.flatten
+words.each { |word| puts word.gloss }
+```
 
 Have your own WordNet database that you've marked up with extra attributes and whatnot?
 No problem:
 
-
-
-
-
-
-
-
+```Ruby
+require 'wordnet'
+
+WordNet::DB.path = "/path/to/WordNet-3.0"
+lemmas = WordNet::Lemma.find_all("fruit")
+...
+```
```
data/examples/benchmark.rb
ADDED
```diff
@@ -0,0 +1,14 @@
+require 'benchmark'
+require 'wordnet'
+
+initial = Benchmark.realtime do
+  WordNet::Lemma.find(ARGV[0] || raise("Usage: ruby benchmark.rb noun"), :noun)
+end
+
+puts "Time to initial word #{initial}"
+
+lookup = Benchmark.realtime do
+  1000.times { WordNet::Lemma.find('fruit', :noun) }
+end
+
+puts "Time for 1k lookups #{lookup}"
```
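benchmark.rb times the first lookup, which pays for building the lookup table, separately from 1,000 repeated lookups. This sketch applies the same `Benchmark.realtime` pattern to a synthetic list of lines to contrast rescanning on every call with a hash built once. The data, sizes and method names are illustrative assumptions, and the timings say nothing about rwordnet itself.

```ruby
require 'benchmark'

# Synthetic stand-in for an index file (not real WordNet data).
LINES = (1..20_000).map { |i| "word#{i} n 1 0 1 0 #{i}\n" }

# Cache-less lookup: rescan all lines on every call.
def scan_lookup(word)
  LINES.find { |line| line.start_with?("#{word} ") }
end

# Build a word => line hash once, keyed on the text before the first space.
CACHE = LINES.each_with_object({}) do |line, cache|
  cache[line[0, line.index(" ")]] = line
end

scan = Benchmark.realtime { 50.times { scan_lookup("word19999") } }
hashed = Benchmark.realtime { 50.times { CACHE["word19999"] } }

puts "50 scans:        #{scan.round(4)}s"
puts "50 hash lookups: #{hashed.round(6)}s"
```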
data/examples/dictionary.rb
CHANGED
```diff
@@ -1,5 +1,4 @@
 # Use WordNet as a command-line dictionary.
-require 'rubygems'
 require 'wordnet'
 
 if ARGV.size != 1
@@ -10,10 +9,10 @@ end
 word = ARGV[0]
 
 # Find all the lemmas for a word (i.e., whether it occurs as a noun, verb, etc.)
-lemmas = WordNet::
+lemmas = WordNet::Lemma.find_all(word)
 
 # Print out each lemma with a list of possible meanings.
-lemmas.each do |lemma|
+lemmas.each do |lemma|
   puts lemma
   lemma.synsets.each_with_index do |synset,i|
     puts "\t#{i+1}) #{synset.gloss}"
```
data/examples/full_hypernym.rb
CHANGED
```diff
@@ -1,10 +1,7 @@
-require 'rubygems'
 require 'wordnet'
 
-# Open the index file for nouns
-index = WordNet::NounIndex.new
 # Find the word 'fruit'
-lemma =
+lemma = WordNet::Lemma.find("fruit", :noun)
 # Find all the synsets for 'fruit', and pick the first one.
 synset = lemma.synsets[0]
 puts synset
```
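full_hypernym.rb works from a synset up its chain of hypernyms. In 1.0.0, `Synset#expanded_hypernym` does this with a loop instead of the 0.1.3 recursion and stops as soon as it meets an offset it has already collected, so a cycle in the data cannot run forever. Here is a self-contained sketch of that walk over a hypothetical table of parent offsets; the offsets and the `PARENT` hash are invented, whereas real synsets come from the data files.

```ruby
# Hypothetical parent links: synset offset => hypernym offset (nil at the root).
# 4 and 5 point at each other to simulate a cycle in the data.
PARENT = {
  1 => 2,
  2 => 3,
  3 => nil,
  4 => 5,
  5 => 4
}

# Walk up from +offset+, collecting ancestor offsets and stopping
# at the root or at the first offset that was already collected.
def expanded_hypernym(offset)
  list = []
  parent = PARENT[offset]
  while parent
    break if list.include?(parent)
    list.push(parent)
    parent = PARENT[parent]
  end
  list
end

p expanded_hypernym(1) # => [2, 3]
p expanded_hypernym(4) # => [5, 4]
```

`list.include?` is a linear search, which is cheap for chains of a dozen or so synsets; a `Set` would keep the check constant-time for much longer chains.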
data/lib/wordnet/db.rb
ADDED
```diff
@@ -0,0 +1,17 @@
+module WordNet
+  # Represents the WordNet database, and provides some basic interaction.
+  class DB
+    # By default, use the bundled WordNet
+    @path = File.expand_path("../../../WordNet-3.0/", __FILE__)
+
+    class << self
+      # To use your own WordNet installation (rather than the one bundled with rwordnet:
+      # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
+      attr_accessor :path
+
+      def open(path, &block)
+        File.open(File.join(self.path, path), "r", &block)
+      end
+    end
+  end
+end
```
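`WordNet::DB` stores the database root in an instance variable of the class itself, exposes it with `attr_accessor` on the singleton class, and resolves every `open` relative to that root. The sketch below reproduces the pattern with a hypothetical `MiniDB` class and a temporary directory standing in for the bundled WordNet-3.0 tree.

```ruby
require 'tmpdir'

# Hypothetical stand-in for WordNet::DB: a class-level, reassignable root path.
class MiniDB
  # Default root, resolved relative to this file like the bundled WordNet-3.0.
  @path = File.expand_path("../data", __FILE__)

  class << self
    attr_accessor :path

    # Open +relative+ under the current root. With a block, the file is
    # closed afterwards and the block's value is returned.
    def open(relative, &block)
      File.open(File.join(path, relative), "r", &block)
    end
  end
end

first = Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, "dict"))
  File.write(File.join(dir, "dict", "index.verb"), "first line\nsecond line\n")

  MiniDB.path = dir
  MiniDB.open(File.join("dict", "index.verb")) { |f| f.gets }
end

p first # => "first line\n"
```

Because the root is a single class-level value, it is process-wide state; the test helper swaps it and restores it around each block that needs a different path.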
data/lib/wordnet/lemma.rb
CHANGED
```diff
@@ -1,39 +1,60 @@
 module WordNet
+  # Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.
+  class Lemma
+    SPACE = ' '
+    attr_accessor :word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets, :id
 
-  #
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-      @synset_offset = []
-      @synset_cnt.times { @synset_offset.push line.shift.to_i }
-    end
-
-    # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
-    def get_synsets
-      return @synset_offset.map { |offset| Synset.new(@pos, offset) }
-    end
-
-    def to_s
-      [@lemma, @pos].join(",")
-    end
-
-    alias synsets get_synsets
-    alias word lemma
-  end
+    # Create a lemma from a line in an lexicon file. You should be creating Lemmas by hand; instead,
+    # use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.
+    def initialize(lexicon_line, id)
+      @id = id
+      line = lexicon_line.split(" ")
+
+      @word = line.shift
+      @pos = line.shift
+      synset_count = line.shift.to_i
+      @pointer_symbols = line.slice!(0, line.shift.to_i)
+      line.shift # Throw away redundant sense_cnt
+      @tagsense_count = line.shift.to_i
+      @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
+    end
+
+    # Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.
+    def synsets
+      @synset_offsets.map { |offset| Synset.new(@pos, offset) }
+    end
 
+    def to_s
+      [@word, @pos].join(",")
+    end
+
+    class << self
+      @@cache = {}
+
+      def find_all(word)
+        [:noun, :verb, :adj, :adv].flat_map do |pos|
+          find(word, pos) || []
+        end
+      end
+
+      # Find a lemma for a given word and pos
+      def find(word, pos)
+        cache = @@cache[pos] ||= build_cache(pos)
+        if found = cache[word]
+          Lemma.new(*found)
+        end
+      end
+
+      private
+
+      def build_cache(pos)
+        cache = {}
+        DB.open(File.join("dict", "index.#{pos}")).each_line.each_with_index do |line, index|
+          word = line.slice(0, line.index(SPACE))
+          cache[word] = [line, index+1]
+        end
+        cache
+      end
+    end
+  end
 end
```
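`Lemma#initialize` reads an `index.<pos>` line strictly left to right: word, part of speech, synset count, pointer count followed by that many pointer symbols, a sense count it discards, the tagged-sense count, and finally the synset offsets. The following standalone sketch performs the same parse on the sample line from lemma_test.rb; `IndexEntry` and `parse_index_line` are hypothetical names, not part of rwordnet.

```ruby
# Fields of one index.<pos> line, in the order Lemma#initialize reads them.
IndexEntry = Struct.new(:word, :pos, :pointer_symbols, :tagsense_count, :synset_offsets)

def parse_index_line(line)
  fields = line.split(" ")
  word = fields.shift
  pos = fields.shift
  synset_count = fields.shift.to_i
  # p_cnt is shifted off first, then that many pointer symbols are sliced out.
  pointer_symbols = fields.slice!(0, fields.shift.to_i)
  fields.shift # sense_cnt, redundant with synset_cnt
  tagsense_count = fields.shift.to_i
  synset_offsets = fields.slice!(0, synset_count).map(&:to_i)
  IndexEntry.new(word, pos, pointer_symbols, tagsense_count, synset_offsets)
end

entry = parse_index_line("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550")
p entry.pointer_symbols # => ["@", "~", "+"]
p entry.synset_offsets  # => [13134947, 4612722, 7294550]
```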
data/lib/wordnet/pointer.rb
CHANGED
```diff
@@ -1,15 +1,14 @@
 module WordNet
+  class Pointer
+    attr_reader :symbol, :offset, :pos, :source, :target
 
-
-
-
-    def method_missing(msg, *args)
-      if self.include?(msg)
-        return self[msg]
-      else
-        throw NoMethodError.new("undefined method `#{msg}' for #{self}:Pointer")
+    def initialize(symbol: raise, offset: raise, pos: raise, source: raise)
+      @symbol, @offset, @pos, @source = symbol, offset, pos, source
+      @target = source.slice!(2,2)
     end
-    end
-  end
 
+    def is_semantic?
+      source == "00" && target == "00"
+    end
+  end
 end
```
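In 1.0.0 a `Pointer` splits WordNet's four-digit source/target field in half and treats `0000` as a semantic (synset-to-synset) pointer. Below is a sketch of that logic as a hypothetical `SketchPointer` class, with three deliberate differences from the diff. It uses Ruby 2.1 required keyword arguments instead of the `raise` defaults, which emulate required keywords on Ruby 2.0, the minimum the gemspec declares. It names the predicate `semantic?` rather than `is_semantic?`. It also copies `source` before calling `slice!`, so the caller's string is left unmodified.

```ruby
# Sketch of the 1.0.0 Pointer: "1234" becomes source "12" and target "34".
class SketchPointer
  attr_reader :symbol, :offset, :pos, :source, :target

  def initialize(symbol:, offset:, pos:, source:)
    @symbol, @offset, @pos = symbol, offset, pos
    @source = source.dup           # copy so slice! does not mutate the caller's string
    @target = @source.slice!(2, 2) # removes the last two digits from @source
  end

  # Both halves zero means the pointer relates whole synsets, not single words.
  def semantic?
    source == "00" && target == "00"
  end
end

lexical  = SketchPointer.new(symbol: "+", offset: 123, pos: "v", source: "0102")
semantic = SketchPointer.new(symbol: "@", offset: 456, pos: "n", source: "0000")
p [lexical.source, lexical.target] # => ["01", "02"]
p lexical.semantic?                # => false
p semantic.semantic?               # => true
```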
data/lib/wordnet/pointers.rb
CHANGED
```diff
@@ -1,37 +1,82 @@
-# A container for various constants.
-#
+# A container for various constants.
+# In particular, contains constants representing the WordNet symbols used to look up synsets by relation, i.e. Hypernym/Hyponym.
+# Use these symbols in conjunction with the Synset#relation method.
 
 module WordNet
+  NOUN_POINTERS = {
+    "-c" => "Member of this domain - TOPIC",
+    "+" => "Derivationally related form",
+    "%p" => "Part meronym",
+    "~i" => "Instance Hyponym",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "#p" => "Part holonym",
+    "%s" => "Substance meronym",
+    ";u" => "Domain of synset - USAGE",
+    "-r" => "Member of this domain - REGION",
+    "#s" => "Substance holonym",
+    "=" => "Attribute",
+    "-u" => "Member of this domain - USAGE",
+    ";c" => "Domain of synset - TOPIC",
+    "%m" => "Member meronym",
+    "~" => "Hyponym",
+    "@i" => "Instance Hypernym",
+    "#m" => "Member holonym"
+  }
+  VERB_POINTERS = {
+    "+" => "Derivationally related form",
+    "@" => "Hypernym",
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "$" => "Verb Group",
+    ";c" => "Domain of synset - TOPIC",
+    ">" => "Cause",
+    "~" => "Hyponym",
+    "*" => "Entailment"
+  }
+  ADJECTIVE_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    "\\" => "Pertainym (pertains to noun)",
+    "<" => "Participle of verb",
+    "&" => "Similar to",
+    "=" => "Attribute",
+    ";c" => "Domain of synset - TOPIC"
+  }
+  ADVERB_POINTERS = {
+    ";r" => "Domain of synset - REGION",
+    "!" => "Antonym",
+    ";u" => "Domain of synset - USAGE",
+    "\\" => "Derived from adjective",
+    ";c" => "Domain of synset - TOPIC"
+  }
 
-
-
-
-  AdverbPointers = {";r"=>"Domain of synset - REGION", "!"=>"Antonym", ";u"=>"Domain of synset - USAGE", "\\"=>"Derived from adjective", ";c"=>"Domain of synset - TOPIC"}
-
-  MemberOfThisDomainTopic = "-c"
-  DerivationallyRelatedForm = "+"
-  PartMeronym = "%p"
+  MEMBER_OF_THIS_DOMAIN_TOPIC = "-c"
+  DERIVATIONALLY_RELATED_FORM = "+"
+  PART_MERONYM = "%p"
   InstanceHyponym = "~i"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+  HYPERNYM = "@"
+  DOMAIN_OF_SYNSET_REGION = ";r"
+  ANTONYM = "!"
+  PART_HOLONYM = "#p"
+  SUBSTANCE_MERONYM = "%s"
+  VERB_GROUP = "$"
+  DOMAIN_OF_SYNSET_USAGE = ";u"
+  MEMBER_OF_THIS_DOMAIN_REGION = "-r"
+  SUBSTANCE_HOLONYM = "#s"
+  DERIVED_FROM_ADJECTIVE = "\\"
+  PARTICIPLE_OF_VERB = "<"
+  SIMILAR_TO = "&"
+  ATTRIBUTE = "="
+  ALSO_SEE = "^"
+  CAUSE = ">"
+  MEMBER_OF_THIS_DOMAIN_USAGE = "-u"
+  DOMAIN_OF_SYNSET_TOPIC = ";c"
+  MEMBER_MERONYM = "%m"
+  HYPONYM = "~"
+  INSTANCE_HYPERNYM = "@i"
+  ENTAILMENT = "*"
+  MEMBER_HOLONYM = "#m"
 end
```
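The `*_POINTERS` hashes map each pointer symbol to a readable description, one table per part of speech. This sketch picks a table by the one-letter synset type (the keys of `SYNSET_TYPES` in synset.rb) and describes a list of symbols. Only a few entries from each table are copied, and `TABLES` and `describe_pointers` are hypothetical helpers, not part of the gem.

```ruby
# Small excerpts of the tables in pointers.rb (not the full lists).
NOUN_POINTERS = { "@" => "Hypernym", "~" => "Hyponym", "!" => "Antonym", "%p" => "Part meronym" }
VERB_POINTERS = { "@" => "Hypernym", "~" => "Hyponym", "*" => "Entailment", ">" => "Cause" }
ADJECTIVE_POINTERS = { "!" => "Antonym", "&" => "Similar to", "=" => "Attribute" }
ADVERB_POINTERS = { "!" => "Antonym", "\\" => "Derived from adjective" }

# Hypothetical mapping from synset-type letter to the matching table.
TABLES = {
  "n" => NOUN_POINTERS,
  "v" => VERB_POINTERS,
  "a" => ADJECTIVE_POINTERS,
  "r" => ADVERB_POINTERS
}

# Describe each symbol; symbols missing from the table are reported, not dropped.
def describe_pointers(synset_type, symbols)
  table = TABLES.fetch(synset_type)
  symbols.map { |symbol| table.fetch(symbol, "Unknown pointer #{symbol.inspect}") }
end

p describe_pointers("n", ["@", "%p"]) # => ["Hypernym", "Part meronym"]
p describe_pointers("v", ["*", "%p"]) # => ["Entailment", "Unknown pointer \"%p\""]
```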
data/lib/wordnet/synset.rb
CHANGED
```diff
@@ -1,90 +1,98 @@
 module WordNet
+  SYNSET_TYPES = {"n" => "noun", "v" => "verb", "a" => "adj", "r" => "adv"}
 
-  # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
-  # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
-  class Synset
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-      @
+  # Represents a synset (or group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!)
+  # relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a child of y)
+  class Synset
+    attr_reader :gloss, :synset_offset, :lex_filenum, :synset_type, :word_counts, :pos_offset, :pos
+
+    # Create a new synset by reading from the data file specified by +pos+, at +offset+ bytes into the file. This is how
+    # the WordNet database is organized. You shouldn't be creating Synsets directly; instead, use Lemma#synsets.
+    def initialize(pos, offset)
+      data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
+        f.seek(offset)
+        f.readline.strip
+      end
+
+      info_line, @gloss = data_line.split(" | ", 2)
+      line = info_line.split(" ")
+
+      @pos = pos
+      @pos_offset = offset
+      @synset_offset = line.shift
+      @lex_filenum = line.shift
+      @synset_type = line.shift
+
+      @word_counts = {}
+      word_count = line.shift.to_i
+      word_count.times do
+        @word_counts[line.shift] = line.shift.to_i
+      end
+
+      pointer_count = line.shift.to_i
+      @pointers = Array.new(pointer_count).map do
+        Pointer.new(
+          symbol: line.shift[0],
+          offset: line.shift.to_i,
+          pos: line.shift,
+          source: line.shift
+        )
+      end
     end
-
-
-
-
-        pointer = Pointer.new
-        pointer[:symbol] = line.shift,
-        pointer[:offset] = line.shift.to_i
-        pointer[:pos] = line.shift
-        pointer[:source] = line.shift
-        pointer[:is_semantic?] = (pointer[:source] == "0000")
-        pointer[:target] = pointer[:source][2..3]
-        pointer[:source] = pointer[:source][0..1]
-        pointer[:symbol] = pointer[:symbol][0]
-        @pointers.push pointer
+
+    # How many words does this Synset include?
+    def word_count
+      @word_counts.size
     end
-    end
-
-    # How many words does this Synset include?
-    def size
-      @wordcounts.size
-    end
-
-    # Get a list of words included in this Synset
-    def words
-      @wordcounts.keys
-    end
-
-    # List of valid +pointer_symbol+s is in pointers.rb
-    def get_relation(pointer_symbol)
-      @pointers.reject { |pointer| pointer.symbol != pointer_symbol }.map { |pointer| Synset.new(@ss_type, pointer.offset) }
-    end
-
-    # Get the Synset of this sense's antonym
-    def antonym
-      get_relation(Antonym)
-    end
-
-    # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
-    def hypernym
-      get_relation(Hypernym)[0]
-    end
-
-    # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
-    def hyponym
-      get_relation(Hyponym)
-    end
-
-    # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
-    def expanded_hypernym
-      parent = self.hypernym
-      return [] if parent.nil?
-
-      return [parent, parent.expanded_hypernym].flatten
-    end
-
-    def to_s
-      "(#{@ss_type}) #{words.map {|x| x.gsub('_',' ')}.join(', ')} (#{@gloss})"
-    end
-
-    alias parent hypernym
-    alias children hyponym
-  end
 
+    # Get a list of words included in this Synset
+    def words
+      @word_counts.keys
+    end
+
+    # List of valid +pointer_symbol+s is in pointers.rb
+    def relation(pointer_symbol)
+      @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
+        map! { |pointer| Synset.new(@synset_type, pointer.offset) }
+    end
+
+    # Get the Synset of this sense's antonym
+    def antonym
+      relation(ANTONYM)
+    end
+
+    # Get the parent synset (higher-level category, i.e. fruit -> reproductive_structure).
+    def hypernym
+      relation(HYPERNYM)[0]
+    end
+
+    # Get the child synset(s) (i.e., lower-level categories, i.e. fruit -> edible_fruit)
+    def hyponym
+      relation(HYPONYM)
+    end
+
+    # Get the entire hypernym tree (from this synset all the way up to +entity+) as an array.
+    def expanded_hypernym
+      parent = hypernym
+      list = []
+      return list unless parent
+
+      while parent
+        break if list.include? parent.pos_offset
+        list.push parent.pos_offset
+        parent = parent.parent
+      end
+
+      list.flatten!
+      list.map! { |offset| Synset.new(@pos, offset)}
+    end
+
+    def to_s
+      "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
+    end
+
+    alias size word_count
+    alias parent hypernym
+    alias children hyponym
+  end
 end
```
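`Synset#initialize` seeks to the byte offset, reads one line from `data.<pos>`, splits off the gloss at `" | "`, and then consumes the remaining fields in order. This standalone sketch performs that parse on a made-up line; the offsets, words and gloss are invented for illustration. It differs from the diff in two deliberate ways. The WordNet data-file format specifies the word count and each `lex_id` as hexadecimal, so the sketch reads them with `to_i(16)`. It also keeps the whole pointer symbol rather than `line.shift[0]`, so two-character symbols such as `@i` or `%p` stay distinct.

```ruby
DataPointer = Struct.new(:symbol, :offset, :pos, :source_target)

# Parse one data.<pos> line: split off the gloss, then read the header
# fields, the word list and the pointer list strictly in order.
def parse_data_line(data_line)
  info_line, gloss = data_line.strip.split(" | ", 2)
  fields = info_line.split(" ")

  synset = {
    synset_offset: fields.shift,
    lex_filenum: fields.shift,
    synset_type: fields.shift,
    gloss: gloss
  }

  # w_cnt, then (word, lex_id) pairs; both numbers are hexadecimal.
  words = {}
  fields.shift.to_i(16).times { words[fields.shift] = fields.shift.to_i(16) }
  synset[:words] = words

  # p_cnt (decimal), then (symbol, offset, pos, source/target) groups.
  synset[:pointers] = Array.new(fields.shift.to_i) do
    DataPointer.new(fields.shift, fields.shift.to_i, fields.shift, fields.shift)
  end
  synset
end

line = "00001740 03 n 02 thing 0 stuff 1 001 @ 00002137 n 0000 | a made-up gloss; for illustration only\n"
synset = parse_data_line(line)
p synset[:words].keys              # => ["thing", "stuff"]
p synset[:pointers].map(&:symbol)  # => ["@"]
p synset[:gloss]                   # => "a made-up gloss; for illustration only"
```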
data/lib/wordnet.rb
CHANGED
data/test/test_helper.rb
CHANGED
```diff
@@ -1,17 +1,16 @@
-require "
-require
+require "bundler/setup"
+require "maxitest/autorun"
 
+$LOAD_PATH.unshift Bundler.root.join("lib")
+require "wordnet"
 
-
-  def
-
-
-
-
-
-
-    define_method :"test_#{caller.first.split("/").last}" do
-      assert_equal expected_value, instance_eval(&block)
-    end
+Minitest::Test.class_eval do
+  def with_db_path(path)
+    begin
+      old, WordNet::DB.path = WordNet::DB.path, path
+      yield
+    ensure
+      WordNet::DB.path = old
+    end
   end
 end
```
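`with_db_path` swaps `WordNet::DB.path` for the duration of a block and restores it in an `ensure`, so a failing assertion cannot leak a bogus path into later tests. Here is the same shape in isolation, with a hypothetical `Settings` module standing in for `WordNet::DB` and a hypothetical `with_path` helper.

```ruby
# Hypothetical global setting, standing in for WordNet::DB.path.
module Settings
  class << self
    attr_accessor :path
  end
  self.path = "default"
end

# Swap the setting for the duration of the block and always restore it,
# even when the block raises. Returns the block's value.
def with_path(path)
  old = Settings.path
  Settings.path = path
  yield
ensure
  Settings.path = old
end

inside = with_path("temporary") { Settings.path }
p inside        # => "temporary"
p Settings.path # => "default"

begin
  with_path("broken") { raise "boom" }
rescue RuntimeError
  # expected; the ensure clause has already restored the setting
end
p Settings.path # => "default"
```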
data/test/unit/db_test.rb
ADDED
```diff
@@ -0,0 +1,14 @@
+require_relative "../test_helper"
+
+describe WordNet::DB do
+  it 'sets and reads path' do
+    with_db_path("WordNetPath") { WordNet::DB.path.must_equal "WordNetPath" }
+  end
+
+  it "opens a relative path" do
+    result = WordNet::DB.open(File.join("dict", "index.verb")) do |f|
+      f.gets
+    end
+    result.must_equal " 1 This software and database is being provided to you, the LICENSEE, by \n"
+  end
+end
```
data/test/unit/lemma_test.rb
ADDED
```diff
@@ -0,0 +1,94 @@
+require_relative "../test_helper"
+
+describe WordNet::Lemma do
+  describe ".find" do
+    it 'finds a lemma by string' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      lemma.to_s.must_equal "fruit,n"
+    end
+
+    it 'caches found' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("fruit", :noun)
+      end
+      lemma1.word.must_equal lemma2.word
+    end
+
+    it 'only scans the db once' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("table", :noun)
+      end
+      lemma2.word.must_equal "table"
+    end
+
+    it 'can lookup different things' do
+      lemma1 = WordNet::Lemma.find("fruit", :noun)
+      lemma2 = WordNet::Lemma.find("banana", :noun)
+      lemma1.word.must_equal "fruit"
+      lemma2.word.must_equal "banana"
+    end
+
+    it 'does not find word in wrong file' do
+      lemma = WordNet::Lemma.find("elephant", :verb)
+      lemma.must_equal nil
+    end
+
+    it 'caches unfound' do
+      WordNet::Lemma.find("elephant", :verb)
+      lemma2 = with_db_path "does-not-exist" do
+        WordNet::Lemma.find("elephant", :verb)
+      end
+      lemma2.must_equal nil
+    end
+
+    it 'fails on unknown type' do
+      assert_raises Errno::ENOENT do
+        WordNet::Lemma.find("fruit", :sdjksdfjkdfskjsdfjk)
+      end
+    end
+
+    it "does not find by regexp" do
+      WordNet::Lemma.find(".", :verb).must_equal nil
+    end
+  end
+
+  describe ".find_all" do
+    it "finds all pos" do
+      result = WordNet::Lemma.find_all("fruit")
+      result.size.must_equal 2
+      result.map(&:pos).sort.must_equal ["n", "v"]
+    end
+
+    it "returns empty array for unfound" do
+      WordNet::Lemma.find_all("sdjkhdfsjfdsjhkfds").must_equal []
+    end
+
+    it "does not produce a circular reference" do
+      l = WordNet::Lemma.find_all("blink")[1]
+      l.synsets[1].expanded_hypernym.wont_be_nil
+    end
+  end
+
+  describe "#synsets" do
+    it 'finds them' do
+      lemma = WordNet::Lemma.find("fruit", :noun)
+      synsets = lemma.synsets
+      synsets.size.must_equal 3
+      synsets[1].to_s.must_equal "(n) yield, fruit (an amount of a product)"
+    end
+  end
+
+  describe ".new" do
+    it "builds all fields" do
+      lemma = WordNet::Lemma.new("fruit n 3 3 @ ~ + 3 3 13134947 04612722 07294550", 123)
+      lemma.id.must_equal 123
+      lemma.word.must_equal "fruit"
+      lemma.pos.must_equal "n"
+      lemma.pointer_symbols.must_equal ["@", "~", "+"]
+      lemma.tagsense_count.must_equal 3
+      lemma.synset_offsets.must_equal [13134947, 4612722, 7294550]
+    end
+  end
+end
```
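The `.find` tests pin down the caching contract: each part of speech is scanned once, and afterwards both hits and misses are answered from the cached table, which is why the lookups still succeed while the path points at `does-not-exist`. The following sketch captures that contract with an instance-level cache and a scan counter. `LemmaIndex` and the in-memory `FILES` are assumptions for illustration; rwordnet itself keeps the cache in a class variable shared by the whole process.

```ruby
# Hypothetical in-memory "files", standing in for dict/index.<pos>.
FILES = {
  noun: "fruit n ...\ntable n ...\n",
  verb: "fruit v ...\nblink v ...\n"
}

class LemmaIndex
  attr_reader :scans

  def initialize(files)
    @files = files
    @cache = {}
    @scans = 0
  end

  # Build the word => line table for +pos+ on first use, then reuse it.
  # Misses are answered from the same table, so they never trigger a rescan.
  def find(word, pos)
    table = (@cache[pos] ||= build(pos))
    table[word]
  end

  private

  # Scan one "file" once; an unknown pos raises KeyError from fetch.
  def build(pos)
    @scans += 1
    @files.fetch(pos).each_line.each_with_object({}) do |line, table|
      table[line[0, line.index(" ")]] = line
    end
  end
end

index = LemmaIndex.new(FILES)
index.find("fruit", :noun)
index.find("table", :noun)
index.find("elephant", :noun) # miss, answered without rescanning
p index.scans                 # => 1
index.find("blink", :verb)
p index.scans                 # => 2
```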
data/test/unit/pointer_test.rb
ADDED
```diff
@@ -0,0 +1,26 @@
+require_relative "../test_helper"
+
+describe WordNet::Pointer do
+  let(:pointer) { WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "1234") }
+
+  describe "#initialize" do
+    it "sets all values" do
+      pointer.symbol.must_equal "s"
+      pointer.offset.must_equal 123
+      pointer.pos.must_equal "v"
+      pointer.source.must_equal "12"
+      pointer.target.must_equal "34"
+    end
+  end
+
+  describe "#is_semantic?" do
+    it "is not semantic for non-0" do
+      pointer.is_semantic?.must_equal false
+    end
+
+    it "is semantic for all-0" do
+      pointer = WordNet::Pointer.new(symbol: "s", offset: 123, pos: "v", source: "0000")
+      pointer.is_semantic?.must_equal true
+    end
+  end
+end
```
data/test/unit/synset_test.rb
CHANGED
```diff
@@ -1,43 +1,39 @@
-
+require_relative "../test_helper"
 
-
-
-
-  def setup
-    if @@synsets.nil?
-      index = WordNet::NounIndex.instance
-      lemma = index.find("fruit")
-      @@synsets = lemma.get_synsets
-    end
+describe WordNet::Synset do
+  def self.synsets
+    @synsets ||= WordNet::Lemma.find("fruit", :noun).synsets
   end
-
-
-
-
-    assert_equal
+
+  let(:synsets) { self.class.synsets }
+
+  it 'get synsets for a lemma' do
+    assert_equal 3, synsets.size
+    assert_equal "(n) fruit (the ripened reproductive body of a seed plant)",synsets[0].to_s
+    assert_equal "an amount of a product",synsets[1].gloss
   end
-
-
-    hypernym =
-    hypernym =
+
+  it 'get hypernym for a synset' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym = synsets[0].hypernym
     assert_equal 1,hypernym.size
     assert_equal "(n) reproductive structure (the parts of a plant involved in its reproduction)",hypernym.to_s
   end
 
-
-    hypernym =
-    hypernym2 =
+  it 'test shorthand for get_relation' do
+    hypernym = synsets[0].relation(WordNet::HYPERNYM)
+    hypernym2 = synsets[0].hypernym
     assert_equal hypernym[0].gloss, hypernym2.gloss
   end
-
-
-    hyponym =
+
+  it 'get hyponyms for a synset' do
+    hyponym = synsets[0].relation(WordNet::HYPONYM)
     assert_equal 29,hyponym.size
     assert_equal "fruit of various buckthorns yielding dyes or pigments",hyponym[26].gloss
   end
-
-
-    expanded =
+
+  it 'test expanded hypernym tree' do
+    expanded = synsets[0].expanded_hypernym
     assert_equal 8, expanded.size
     assert_equal "entity", expanded[expanded.size-1].words[0]
   end
```
metadata
CHANGED
@@ -1,33 +1,23 @@
|
|
1
|
-
--- !ruby/object:Gem::Specification
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
2
|
name: rwordnet
|
3
|
-
version: !ruby/object:Gem::Version
|
4
|
-
|
5
|
-
segments:
|
6
|
-
- 0
|
7
|
-
- 1
|
8
|
-
- 3
|
9
|
-
version: 0.1.3
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 1.0.0
|
10
5
|
platform: ruby
|
11
|
-
authors:
|
6
|
+
authors:
|
12
7
|
- Trevor Fountain
|
13
8
|
- Wolfram Sieber
|
9
|
+
- Michael Grosser
|
14
10
|
autorequire:
|
15
11
|
bindir: bin
|
16
12
|
cert_chain: []
|
17
|
-
|
18
|
-
date: 2010-10-15 00:00:00 +01:00
|
19
|
-
default_executable:
|
13
|
+
date: 2015-03-08 00:00:00.000000000 Z
|
20
14
|
dependencies: []
|
21
|
-
|
22
|
-
description: A pure Ruby interface to the WordNet database
|
15
|
+
description:
|
23
16
|
email: doches@gmail.com
|
24
17
|
executables: []
|
25
|
-
|
26
18
|
extensions: []
|
27
|
-
|
28
|
-
|
29
|
-
- README.markdown
|
30
|
-
files:
|
19
|
+
extra_rdoc_files: []
|
20
|
+
files:
|
31
21
|
- History.txt
|
32
22
|
- README.markdown
|
33
23
|
- WordNet-3.0/AUTHORS
|
@@ -42,54 +32,43 @@ files:
|
|
42
32
|
- WordNet-3.0/dict/index.adv
|
43
33
|
- WordNet-3.0/dict/index.noun
|
44
34
|
- WordNet-3.0/dict/index.verb
|
35
|
+
- examples/benchmark.rb
|
45
36
|
- examples/dictionary.rb
 - examples/full_hypernym.rb
 - lib/wordnet.rb
-- lib/wordnet/
+- lib/wordnet/db.rb
 - lib/wordnet/lemma.rb
 - lib/wordnet/pointer.rb
 - lib/wordnet/pointers.rb
-- lib/wordnet/pos.rb
 - lib/wordnet/synset.rb
-- lib/wordnet/
+- lib/wordnet/version.rb
 - test/test_helper.rb
-- test/unit/
+- test/unit/db_test.rb
+- test/unit/lemma_test.rb
+- test/unit/pointer_test.rb
 - test/unit/synset_test.rb
-
-
-
-
-
+homepage: https://github.com/doches/rwordnet
+licenses:
+- MIT
+metadata: {}
 post_install_message:
-rdoc_options:
-
-require_paths:
+rdoc_options: []
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-requirements:
+required_ruby_version: !ruby/object:Gem::Requirement
+requirements:
 - - ">="
-- !ruby/object:Gem::Version
-
-
-
-required_rubygems_version: !ruby/object:Gem::Requirement
-requirements:
+- !ruby/object:Gem::Version
+version: 2.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+requirements:
 - - ">="
-- !ruby/object:Gem::Version
-
-- 0
-version: "0"
+- !ruby/object:Gem::Version
+version: '0'
 requirements: []
-
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.2.2
 signing_key:
-specification_version:
+specification_version: 4
 summary: A pure Ruby interface to the WordNet database
-test_files:
-- test/unit/index_test.rb
-- test/unit/synset_test.rb
-- test/unit/wordnetdb_test.rb
-- test/test_helper.rb
-- examples/full_hypernym.rb
-- examples/dictionary.rb
+test_files: []
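The metadata hunk above adds a homepage, an MIT license and a `>= 2.0.0` Ruby requirement, and empties `rdoc_options` and `test_files`. The project's actual `rwordnet.gemspec` is not part of this diff, so the following is only an assumed reconstruction: a spec built with the standard `Gem::Specification` setters that would carry the same new values. `to_yaml` is how RubyGems serialises a spec into the `metadata` file that this hunk compares.

```ruby
require 'rubygems'

# Assumed reconstruction, not the project's real gemspec: it only sets the
# fields whose new values appear in the 1.0.0 metadata hunk.
spec = Gem::Specification.new do |s|
  s.name          = 'rwordnet'
  s.version       = '1.0.0'
  s.summary       = 'A pure Ruby interface to the WordNet database'
  s.homepage      = 'https://github.com/doches/rwordnet'
  s.license       = 'MIT'   # serialised as `licenses:` / `- MIT`
  s.require_paths = ['lib']
  # Serialised as a Gem::Requirement with `- - ">="` and `version: 2.0.0`.
  s.required_ruby_version = '>= 2.0.0'
end

puts spec.to_yaml
```

Setting the requirement as a string lets RubyGems parse it into a `Gem::Requirement`, which is why the YAML shows the nested `requirements:` structure instead of a plain string.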
data/lib/wordnet/index.rb
DELETED
@@ -1,82 +0,0 @@
-require 'singleton'
-module WordNet
-
-  # Index is a WordNet lexicon. Note that Index is the base class; you probably want to be using the NounIndex, VerbIndex, etc. classes instead.
-  # Note that Indices are Singletons -- get an Index object by calling <POS>Index.instance, not <POS>Index.new.
-  class Index
-    # Create a new index for the given part of speech. +pos+ can be one of +noun+, +verb+, +adj+, or +adv+.
-    def initialize(pos)
-      @pos = pos
-      @db = {}
-
-      @finished_reading = false
-    end
-
-    # Find a lemma for a given word. Returns a Lemma which can then be used to access the synsets for the word.
-    def find(lemma_str)
-      # Look for the lemma in the part of the DB already read...
-      return @db[lemma_str] if @db.include?(lemma_str)
-
-      return nil if @finished_reading
-
-      # If we didn't find it, read in some more from the DB.
-      index = WordNetDB.open(File.join(WordNetDB.path,"dict","index.#{@pos}"))
-
-      lemma_counter = 1
-      if not index.closed?
-        loop do
-          break if index.eof?
-          line = index.readline
-          lemma = Lemma.new(line, lemma_counter); lemma_counter += 1
-          @db[lemma.word] = lemma
-          if line =~ /^#{lemma_str} /
-            return lemma
-          end
-        end
-        index.close
-      end
-
-      @finished_reading = true
-
-      # If we *still* didn't find it, return nil. It must not be in the database...
-      return nil
-    end
-  end
-
-  # An Index of nouns. Create a NounIndex by calling `NounIndex.instance`
-  class NounIndex < Index
-    include Singleton
-
-    def initialize
-      super("noun")
-    end
-  end
-
-  # An Index of verbs. Create a VerbIndex by calling `VerbIndex.instance`
-  class VerbIndex < Index
-    include Singleton
-
-    def initialize
-      super("verb")
-    end
-  end
-
-  # An Index of adjectives. Create an AdjectiveIndex by `AdjectiveIndex.instance`
-  class AdjectiveIndex < Index
-    include Singleton
-
-    def initialize
-      super("adj")
-    end
-  end
-
-  # An Index of adverbs. Create an AdverbIndex by `AdverbIndex.instance`
-  class AdverbIndex < Index
-    include Singleton
-
-    def initialize
-      super("adv")
-    end
-  end
-
-end
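The deleted `Index#find` performs a lazy, incremental lookup. Lemmas already read are served from the `@db` hash. Otherwise it keeps reading `index.<pos>` from the handle cached by `WordNetDB.open`, storing every lemma it passes until the requested word turns up, and marks the index as finished once it reaches the end of the file. Because that handle stays open between calls, each lookup resumes where the previous one stopped.

The following is a minimal sketch of the same idea under simplifying assumptions. `LazyIndex` is a hypothetical name, it caches raw lines instead of `Lemma` objects, and it reads from a `StringIO` rather than a WordNet dictionary file. It also compares the first field of each line with the word directly. The original interpolated the word unescaped into `/^#{lemma_str} /`, so a word containing a regex metacharacter such as `.` could match a different line.

```ruby
require 'stringio'

# Hypothetical sketch of the lazy lookup in the deleted WordNet::Index#find.
# Entries are raw index lines rather than Lemma objects, and the stream is any
# IO-like object (a StringIO here instead of dict/index.<pos>).
class LazyIndex
  def initialize(io)
    @io = io          # kept open between calls, so reading resumes where it stopped
    @cache = {}       # word => index line, for every line read so far
    @finished = false # set once the stream has been read to the end
  end

  def find(word)
    return @cache[word] if @cache.key?(word)
    return nil if @finished

    until @io.eof?
      line = @io.readline.chomp
      key = line.split(' ', 2).first # the first field of an index line is the word
      @cache[key] = line
      return line if key == word     # plain string comparison, no regex interpolation
    end

    @finished = true
    nil
  end
end

demo = LazyIndex.new(StringIO.new("apple n 1\nfruit n 3\npear n 1\n"))
p demo.find("fruit") # => "fruit n 3" (reads two lines)
p demo.find("apple") # => "apple n 1" (cache hit, nothing read)
p demo.find("kiwi")  # => nil (reads to the end and marks the index finished)
```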
data/lib/wordnet/pos.rb
DELETED
data/lib/wordnet/wordnetdb.rb
DELETED
@@ -1,54 +0,0 @@
-module WordNet
-
-  # Represents the WordNet database, and provides some basic interaction.
-  class WordNetDB
-    # By default, use the bundled WordNet
-    @@path = File.join(File.dirname(__FILE__),"/../../WordNet-3.0/")
-    @@files = {}
-
-    # To use your own WordNet installation (rather than the one bundled with rwordnet:
-    def WordNetDB.path=(path_to_wordnet)
-      @@path = path_to_wordnet
-    end
-
-    # Returns the path to the WordNet installation currently in use. Defaults to the bundled version of WordNet.
-    def WordNetDB.path
-      @@path
-    end
-
-    # Look up a word in WordNet. Returns a list of lemmas occuring in any of the index files (noun, verb, adjective, adverb).
-    def WordNetDB.find(word)
-      lemmas = []
-      [NounIndex, VerbIndex, AdjectiveIndex, AdverbIndex].each do |index|
-        lemmas.push index.instance.find(word)
-      end
-      return lemmas.flatten.reject { |x| x.nil? }
-    end
-
-    # Register a new DB file handle. You shouldn't need to call this method; it's called automatically every time you open an index or data file.
-    def WordNetDB.open(path)
-      # If the file is already open, just return the handle.
-      return @@files[path] if @@files.include?(path) and not @@files[path].closed?
-
-      # Open and store
-      @@files[path] = File.open(path,"r")
-      return @@files[path]
-    end
-
-    # You should call this method after you're done using WordNet.
-    def WordNetDB.close
-      WordNetDB.finalize(0)
-    end
-
-    def WordNetDB.finalize(id)
-      @@files.each_value do |handle|
-        begin
-          handle.close
-        rescue IOError
-          ; # Keep going, close the next file.
-        end
-      end
-    end
-  end
-
-end
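The deleted `WordNetDB` kept one read-only handle per file path in the `@@files` class variable. `WordNetDB.open` returned the cached handle while it was still open, and `WordNetDB.close` (through `finalize`) closed every handle while swallowing `IOError`.

The following is a minimal sketch of that handle cache. `HandleCache`, `open` and `close_all` are hypothetical names. The `rescue IOError` is kept defensively, as in the original. Unlike the original, the sketch also forgets the closed handles.

```ruby
require 'tempfile'

# Hypothetical module sketching the deleted WordNetDB.open / WordNetDB.close
# pair: one read-only handle per path, reused for as long as it stays open.
module HandleCache
  @files = {} # path => File

  def self.open(path)
    handle = @files[path]
    return handle if handle && !handle.closed?

    @files[path] = File.open(path, 'r')
  end

  def self.close_all
    @files.each_value do |handle|
      begin
        handle.close
      rescue IOError
        # Keep going and close the remaining handles.
      end
    end
    @files.clear # unlike the original, forget the closed handles
  end
end

# Usage: two opens of the same path share one handle, and its read position.
tmp = Tempfile.new('index')
tmp.write("apple n 1\nfruit n 3\n")
tmp.flush
first = HandleCache.open(tmp.path)
first.readline
second = HandleCache.open(tmp.path)
puts second.readline # prints "fruit n 3"
HandleCache.close_all
tmp.close!
```

Sharing the handle is what lets the old `Index#find` resume reading where an earlier lookup stopped. The cost is that position-dependent state lives in a process-wide cache.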
data/test/unit/index_test.rb
DELETED
@@ -1,21 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-
-class TestIndex < Test::Unit::TestCase
-  @@index = nil
-
-  def setup
-    @@index = WordNet::NounIndex.instance if @@index.nil?
-  end
-
-  test 'find a lemma by string' do
-    lemma = @@index.find("fruit")
-    assert_equal "fruit,n",lemma.to_s
-  end
-
-  test 'get synsets for a lemma' do
-    lemma = @@index.find("fruit")
-    synsets = lemma.get_synsets
-    assert_equal 3, synsets.size
-    assert_equal "(n) yield, fruit (an amount of a product)",synsets[1].to_s
-  end
-end
data/test/unit/wordnetdb_test.rb
DELETED
@@ -1,15 +0,0 @@
-require File.dirname(__FILE__) + "/../test_helper.rb"
-
-class TestWordNetDB < Test::Unit::TestCase
-  include WordNet
-
-  test 'set and read path' do
-    WordNetDB.path = "WordNetPath"
-    assert_equal "WordNetPath",WordNetDB.path
-  end
-
-  test 'find a word' do
-    lemmas = WordNetDB.find("fruit")
-    assert_equal 2,lemmas.size
-  end
-end
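The deleted `WordNetDB.find` asked each part-of-speech index for the word and returned only the hits. The 'find a word' test relied on this when it expected two lemmas for "fruit".

The following is a minimal sketch of that fan-out-and-filter step. Plain hashes stand in for the four `Index` singletons. `POS_INDEXES` and `find_everywhere` are hypothetical names, and the entries are toy data. Each index returns a single lemma or `nil`, so `map` plus `compact` does the same job as the original `push`/`flatten`/`reject` sequence.

```ruby
# Toy stand-ins for the NounIndex/VerbIndex/AdjectiveIndex/AdverbIndex singletons:
# each maps a word to a lemma label in the "word,pos" form used by the old tests.
POS_INDEXES = {
  noun: { "fruit" => "fruit,n", "apple" => "apple,n" },
  verb: { "fruit" => "fruit,v" },
  adj:  {},
  adv:  {}
}.freeze

# Ask every index for the word and keep only the hits (Hash#[] returns nil on a miss).
def find_everywhere(word, indexes = POS_INDEXES)
  indexes.values.map { |index| index[word] }.compact
end

p find_everywhere("fruit") # => ["fruit,n", "fruit,v"]
p find_everywhere("kiwi")  # => []
```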