words 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.markdown +45 -10
- data/Rakefile +4 -3
- data/VERSION +1 -1
- data/{build_dataset.rb → bin/build_wordnet} +38 -20
- data/examples.rb +11 -4
- data/lib/words.rb +200 -30
- data/words.gemspec +60 -0
- metadata +8 -8
- data/data/wordnet.tct +0 -0
data/README.markdown
CHANGED
@@ -2,11 +2,43 @@
 
 ## About ##
 
-Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and
+Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI powered backend over the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatibility and blistering speed. The pure ruby interface operates on a special ruby optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!
 
-## Installation ##
+## Pre-Installation ##
 
-First ensure you have
+First ensure you have a copy of the wordnet data files. This is generally available from your Linux/OSX package manager:
+
+    #Ubuntu
+    sudo apt-get install wordnet-base
+
+    #Fedora/RHL
+    sudo yum update wordnet
+
+    #MacPorts
+    sudo port install wordnet
+
+or you can simply download and install (Unix/OSX):
+
+    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    sudo mkdir /usr/local/share/wordnet
+    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+or (Windows)
+
+Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+Unzip
+
+## For Tokyo Backend Only ##
+
+Unless you want to use the tokyo backend you are now ready to install Words and build the data; if you do want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+    cd tokyo-cabinet/
+    ./configure
+    make
+    sudo make install
+
+## GEM Installation ##
 
 After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
 Then you're ready to rock and roll. :)
 
-## Build Data
+## Build Data ##
 
-
+To build the wordnet dataset (or index for pure) file yourself, from the original wordnet files, you can use the bundled "build_wordnet" command:
 
-
-    sudo
+    build_wordnet -h # this will give you the usage information
+    sudo build_wordnet -v --build-tokyo # this would attempt to build the tokyo backend data locating the original wordnet files through a search...
+    sudo build_wordnet -v --build-pure # this would attempt to build the pure backend index locating the original wordnet files through a search...
 
 ## Usage ##
 
 Here's a few little examples of using words within your programs.
 
-
     require 'rubygems'
    require 'words'
 
-    data = Words::Words.new
+    data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
     # locate a word
    lemma = data.find("bat")
@@ -45,6 +77,7 @@ Here's a few little examples of using words within your programs.
     lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
     # or
     lemma.nouns # => array of synsets which represent nouns of the lemma bat
+    lemma.noun_ids # => array of synset ids which represent nouns of the lemma bat
     lemma.verbs? #=> true
 
     # specify a sense
@@ -53,6 +86,7 @@ Here's a few little examples of using words within your programs.
 
     sense.gloss # => a club used for hitting a ball in various games
     sense2.words # => ["cricket bat", "bat"]
+    sense2.lexical_description # => a description of the lexical meaning of the synset
     sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
     sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Here's a few little examples of using words within your programs.
     sense.derivationally_related_forms.first.source_word # => "bat"
     sense.derivationally_related_forms.first.destination_word # => "bat"
     sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+These and more examples are available from within the examples.rb file!
 
 ## Note on Patches/Pull Requests ##
 
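The README changes above describe a new pure ruby backend whose index is a marshalled Hash on disk (see the `--build-pure` branch of the build script later in this diff, which writes `Marshal.dump` of `lemma => [lemma, tagsense_counts, synset_ids]`). A minimal sketch of that round trip, using only the Ruby standard library (the file name and sample values here are illustrative, not taken from a real index):

```ruby
require 'tmpdir'

# Miniature stand-in for the pure backend's index: each lemma maps to
# [lemma, pipe-separated tagsense counts, pipe-separated synset ids].
index = { "bat" => ["bat", "n5|v3", "n02806379|v01413191"] }

path = File.join(Dir.mktmpdir, "index.dmp")

# Writing mirrors what build_wordnet --build-pure does with data/index.dmp.
File.open(path, "wb") { |file| file.write Marshal.dump(index) }

# Loading mirrors what the pure backend does when it connects.
loaded = File.open(path, "rb") { |file| Marshal.load file.read }
```

Marshal keeps lookup simple: the whole index becomes an in-memory Hash, and only synset detail reads touch the original WordNet data files.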
data/Rakefile
CHANGED
@@ -10,9 +10,10 @@ begin
     gem.email = "roja@arbia.co.uk"
     gem.homepage = "http://github.com/roja/words"
     gem.authors = ["Roja Buck"]
-    gem.
+    gem.add_dependency "trollop", ">= 1.15"
     gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-
+    gem.executables = [ "build_wordnet" ]
+    gem.default_executable = "build_wordnet"
   end
   Jeweler::GemcutterTasks.new
 rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.0
+0.2.0
data/{build_dataset.rb → bin/build_wordnet}
CHANGED
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
-require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+    opt :build_tokyo, "Build the tokyo dataset?", :default => false
+    opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+  Trollop::die :build_tokyo, "Either tokyo dataset or pure ruby dataset are required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-
+  index_hash = Hash.new
+  data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-
+      index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+      index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-      hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                      "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-
-
-    # add data
-    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-      next if data_line[0, 2] == "  "
-      data_line, gloss = data_line.split(" | ")
-      data_parts = data_line.split(" ")
-
-      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+    if opts[:build_tokyo]
+      puts "Building #{file_pos} data..." if VERBOSE
 
-
-
+      # add data
+      (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+        next if data_line[0, 2] == "  "
+        data_line, gloss = data_line.split(" | ")
+        data_parts = data_line.split(" ")
+
+        synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+        words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+        relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+        data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                 "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+      end
     end
 
-
   end
-
+
+  if opts[:build_tokyo]
+    tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+    index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+    data_hash.each { |k,v| tokyo_hash[k] = v }
+    tokyo_hash.close
+  end
+
+  if opts[:build_pure]
+    index = Hash.new
+    index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+    File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+      file.write Marshal.dump(index)
+    end
+  end
+
 
 end
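The build script above splits each line of a WordNet `data.pos` file on `" | "` to separate the gloss, then shifts whitespace-delimited fields off the front, reading the word count as hexadecimal. A self-contained sketch of that parse on one illustrative line (the sample line and its values are invented for demonstration, following the WordNet data-file layout):

```ruby
# One data.pos-style line: offset, lex filenum, type, hex word count,
# word/lex_id pairs, pointer count, pointer tuples, then "| gloss".
sample = "02806379 03 n 02 bat 0 squash_racket 0 001 @ 03053474 n 0000 | a club used for hitting a ball"

data_line, gloss = sample.split(" | ")
parts = data_line.split(" ")

synset_id = "n" + parts.shift          # POS shorthand + byte offset
lexical_filenum = parts.shift          # lexicographer file number
synset_type = parts.shift              # n / v / a / r / s
word_count = parts.shift.to_i(16)      # word count is hexadecimal
words = Array.new(word_count).map { "#{parts.shift}.#{parts.shift}" }
relations = Array.new(parts.shift.to_i).map { "#{parts.shift}.#{parts.shift}.#{parts.shift}.#{parts.shift}" }
```

Each relation tuple keeps the pointer symbol, target offset, target POS and source/target word numbers joined with dots, which is the pipe-and-dot encoding the rest of the gem consumes.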
data/examples.rb
CHANGED
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-  wordnet = Words::Words.new
+  wordnet = Words::Words.new # :pure
+
+  puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+  puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-  puts wordnet.find('bat').
-  puts wordnet.find('bat').synsets(
+  puts wordnet.find('bat').noun_ids
+  puts wordnet.find('bat').synsets(:noun).last.words.inspect
+  puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-  puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+  puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
 
+  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
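The `noun_ids` / `synsets('noun')` calls exercised above both come down to filtering a lemma's pipe-separated synset id string by its one-letter POS prefix (as the `synset_ids(pos)` method added in lib/words.rb below does). A standalone sketch with made-up ids:

```ruby
# POS symbol to prefix letter, as used throughout the gem.
SYMBOL_TO_POS = { :noun => "n", :verb => "v", :adjective => "a", :adverb => "r" }

# An index entry stores ids pipe-separated, each prefixed with its POS letter
# (sample ids here are illustrative).
synset_ids = "n02806379|n02806578|v01413191".split('|')

noun_ids = synset_ids.select { |id| id[0, 1] == SYMBOL_TO_POS[:noun] }
verb_ids = synset_ids.select { |id| id[0, 1] == SYMBOL_TO_POS[:verb] }
```

Because the prefix doubles as the key into the data files (`data.noun`, `data.verb`, ...), no extra metadata is needed to resolve an id to its synset.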
data/lib/words.rb
CHANGED
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-
-
+    SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+    attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+    def initialize(type, path, wordnet_path)
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+      @connection_type = type
+
+      if @data_path.exist?
+        if @connection_type == :tokyo
+          @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+          @connected = true
+        elsif @connection_type == :pure
+          # open the index is there
+          File.open(@data_path,'r') do |file|
+            @connection = Marshal.load file.read
+          end
+          # search for the wordnet files
+          if locate_wordnet?(wordnet_path)
+            @connected = true
+          else
+            @connected = false
+            raise "Failed to locate the wordnet database. Please ensure it is installed and that if it resides at a custom path that path is given as an argument when constructing the Words object."
+          end
+        else
+          @connected = false
+        end
+      else
+        @connected = false
+        raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please insure you have created it using the words gems provided 'build_dataset.rb' command."
+      end
+
+    end
+
+    def close
+      @connected = false
+      if @connected && connection_type == :tokyo
+        connection.close
+      end
+      return true
+    end
+
+    def lemma(term)
+      if connection_type == :pure
+        raw_lemma = @connection[term]
+        { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2]}
+      else
+        @connection[term]
+      end
+    end
+
+    def synset(synset_id)
+      if connection_type == :pure
+        pos = synset_id[0,1]
+        File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}","r") do |file|
+          file.seek(synset_id[1..-1].to_i)
+          data_line, gloss = file.readline.strip.split(" | ")
+          data_parts = data_line.split(" ")
+          synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+          words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+          relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+          { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+        end
+      else
+        @connection[synset_id]
+      end
     end
 
-    def
-
+    def locate_wordnet?(base_dirs)
+
+      base_dirs = case base_dirs
+      when :search
+        ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+      else
+        [ base_dirs ]
+      end
+
+      base_dirs.each do |dir|
+        ["", "dict"].each do |sub_folder|
+          path = Pathname.new(dir + sub_folder)
+          @wordnet_dir = path if (path + "data.noun").exist?
+          break if !@wordnet_dir.nil?
+        end
+      end
+
+      return !@wordnet_dir.nil?
+
    end
 
  end
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-    def initialize(relation_construct, source_synset)
+    def initialize(relation_construct, source_synset, wordnet_connection)
+      @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
       @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-      @destination = Synset.new
+      @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-
-
+    NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                   { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                   { :lex => :adv_all, :description => "all adverbs" },
+                   { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                   { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                   { :lex => :noun_animal, :description => "nouns denoting animals" },
+                   { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                   { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                   { :lex => :noun_body, :description => "nouns denoting body parts" },
+                   { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                   { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                   { :lex => :noun_event, :description => "nouns denoting natural events" },
+                   { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                   { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                   { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                   { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                   { :lex => :noun_motive, :description => "nouns denoting goals" },
+                   { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                   { :lex => :noun_person, :description => "nouns denoting people" },
+                   { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                   { :lex => :noun_plant, :description => "nouns denoting plants" },
+                   { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                   { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                   { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                   { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                   { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                   { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                   { :lex => :noun_substance, :description => "nouns denoting substances" },
+                   { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                   { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                   { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                   { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                   { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                   { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                   { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                   { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                   { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                   { :lex => :verb_emotion, :description => "verbs of feeling" },
+                   { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                   { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                   { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                   { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                   { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                   { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                   { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+    def initialize(synset_id, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @synset_hash = wordnet_connection.synset(synset_id)
       # construct some conveniance menthods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+    def lexical_filenum
+      @synset_hash["lexical_filenum"].to_i
+    end
+
+    def lexical_catagory
+      lexical[:lex]
+    end
+
+    def lexical_description
+      lexical[:description]
+    end
+
+    def lexical
+      NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+    end
+
     def synset_id
       @synset_hash["synset_id"]
     end
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
       case
       when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
         @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-    def initialize(
-      @
+    def initialize(raw_lemma, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @lemma_hash = raw_lemma
       # construct some conveniance menthods for relation type access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+        self.class.send(:define_method, "#{pos}_ids") do
+          synset_ids(pos)
+        end
       end
     end
 
+    def tagsense_counts
+      @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+      @tagsense_counts
+    end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-
+      synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+    end
+
+    def synset_ids(pos = :all)
+      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+      case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-        synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+        @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
       when POS_TO_SYMBOL.include?(pos.to_s)
-        synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+        @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
       else
-        synset_ids
+        @synset_ids
       end
-      relevent_synsets.map { |synset_id| Synset.new synset_id }
-    end
-
-    def synset_ids
-      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-      @synset_ids
     end
 
     def inspect
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+    alias pos available_pos
 
   end
 
   class Words
 
-
-
-
-
-      abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-    end
+    @wordnet_connection = nil
+
+    def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+      @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-      Lemma.new
+      Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+    end
+
+    def connection_type
+      @wordnet_connection.connection_type
+    end
+
+    def wordnet_dir
+      @wordnet_connection.wordnet_dir
     end
 
     def close
-
+      @wordnet_connection.close
+    end
+
+    def connected
+      @wordnet_connection.connected
+    end
+
+    def to_s
+      return "Words not connected" if !connected
+      return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+      return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
 end
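The `tagsense_counts` method added in this version turns the pipe-separated, POS-prefixed count string stored in the index into an array of one-entry hashes. A standalone sketch of that transformation (the `"n5|v3"` sample is illustrative, not a real index value):

```ruby
# POS prefix letter to symbol, as in the Lemma class above.
POS_TO_SYMBOL = { "n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb }

# "n5|v3" would mean: tagged 5 times as a noun, 3 times as a verb.
counts = "n5|v3".split('|').map { |count| { POS_TO_SYMBOL[count[0, 1]] => count[1..-1].to_i } }
```

Keeping the counts as a compact string in the index and expanding them lazily is the same memoized pattern (`unless defined? @tagsense_counts`) the class uses for synset ids.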
data/words.gemspec
ADDED
@@ -0,0 +1,60 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{words}
+  s.version = "0.2.0"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Roja Buck"]
+  s.date = %q{2010-01-16}
+  s.default_executable = %q{build_wordnet}
+  s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+  s.email = %q{roja@arbia.co.uk}
+  s.executables = ["build_wordnet"]
+  s.extra_rdoc_files = [
+    "LICENSE",
+    "README.markdown"
+  ]
+  s.files = [
+    ".gitignore",
+    "LICENSE",
+    "README.markdown",
+    "Rakefile",
+    "VERSION",
+    "bin/build_wordnet",
+    "examples.rb",
+    "lib/words.rb",
+    "test/helper.rb",
+    "test/test_words.rb",
+    "words.gemspec"
+  ]
+  s.homepage = %q{http://github.com/roja/words}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+  s.test_files = [
+    "test/test_words.rb",
+    "test/helper.rb"
+  ]
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+      s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    else
+      s.add_dependency(%q<trollop>, [">= 1.15"])
+      s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    end
+  else
+    s.add_dependency(%q<trollop>, [">= 1.15"])
+    s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+  end
+end
+
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
-default_executable:
+date: 2010-01-16 00:00:00 +00:00
+default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-  type: :
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
   version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
-executables:
-
+executables:
+- build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
--
-- data/wordnet.tct
+- bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+- words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
data/data/wordnet.tct
DELETED
Binary file
|