words 0.1.0 → 0.2.0
- data/README.markdown +45 -10
- data/Rakefile +4 -3
- data/VERSION +1 -1
- data/{build_dataset.rb → bin/build_wordnet} +38 -20
- data/examples.rb +11 -4
- data/lib/words.rb +200 -30
- data/words.gemspec +60 -0
- metadata +8 -8
- data/data/wordnet.tct +0 -0
data/README.markdown CHANGED
@@ -2,11 +2,43 @@
 
 ## About ##
 
-Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and
+Words implements a fast interface to [Wordnet®](http://wordnet.princeton.edu) which provides both a pure ruby and an FFI powered backend over the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatability and blistering speed. The pure ruby interface operates on a special ruby optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful api and installation is a sintch!
 
-## Installation ##
+## Pre-Installation ##
 
-First ensure you have
+First ensure you have a copy of the wordnet data files. This is generally available from your Linux/OSX package manager:
+
+    #Ubuntu
+    sudo apt-get install wordnet-base
+
+    #Fedora/RHL
+    sudo yum update wordnet
+
+    #MacPorts
+    sudo port install wordnet
+
+or you can simply download and install (Unix/OSX):
+
+    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    sudo mkdir /usr/local/share/wordnet
+    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz
+
+or (Windows)
+
+    Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
+    Unzip
+
+## For Tokyo Backend Only ##
+
+Unless you want to use the tokyo backend you are now ready to install Words && build the data, otherwise if you want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:
+
+    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
+    cd tokyo-cabinet/
+    ./configure
+    make
+    sudo make install
+
+## GEM Installation ##
 
 After this it should be just a gem to install. For those of you with old rubygems versions first:
 
@@ -19,22 +51,22 @@ Otherwise and after it's simply:
 
 Then your ready to rock and roll. :)
 
-## Build Data
+## Build Data ##
 
-
+To build the wordnet dataset (or index for pure) file yourself, from the original wordnet files, you can use the bundled "build_wordnet" command
 
-
-    sudo
+    build_wordnet -h # this will give you the usage information
+    sudo build_wordnet -v --build-tokyo # this would attempt to build the tokyo backend data locating the original wordnet files through a search...
+    sudo build_wordnet -v --build-pure # this would attempt to build the pure backend index locating the original wordnet files through a search...
 
 ## Usage ##
 
 Heres a few little examples of using words within your programs.
 
-
     require 'rubygems'
     require 'words'
 
-    data = Words::Words.new
+    data = Words::Words.new # or: data = Words::Words.new(:pure) for the pure ruby backend
 
     # locate a word
     lemma = data.find("bat")
@@ -45,6 +77,7 @@ Heres a few little examples of using words within your programs.
     lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
     # or
    lemma.nouns # => array of synsets which represent nouns of the lemma bat
+    lemma.noun_ids # => array of synsets ids which represent nouns of the lemma bat
     lemma.verbs? #=> true
 
     # specify a sense
@@ -53,6 +86,7 @@ Heres a few little examples of using words within your programs.
 
     sense.gloss # => a club used for hitting a ball in various games
     sense2.words # => ["cricket bat", "bat"]
+    sense2.lexical_description # => a description of the lexical meaning of the synset
     sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"
 
     sense.relations(:hyponym) # => Array of hyponyms associated with the sense
@@ -68,7 +102,8 @@ Heres a few little examples of using words within your programs.
     sense.derivationally_related_forms.first.source_word # => "bat"
     sense.derivationally_related_forms.first.destination_word # => "bat"
     sense.derivationally_related_forms.first.destination # => the synset of v01413191
-
+
+These and more examples are available from within the examples.rb file!
 
 ## Note on Patches/Pull Requests ##
 
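The README changes above describe the new 0.2.0 design: two interchangeable backends (Tokyo Cabinet via rufus-tokyo, or pure ruby) behind one lookup API. As a rough, self-contained illustration of that one-API-two-backends idea (this is not the gem's actual code; `TinyWords` and its in-memory sample data are invented for this sketch):

```ruby
# Sketch of the one-API-two-backends idea from the README above.
# TinyWords and the sample data are invented for illustration; the real
# gem wires a Rufus::Tokyo::Table or a Marshal-loaded index in here.
class TinyWords
  def initialize(type = :tokyo)
    @type = type # which backend was requested (:tokyo or :pure)
    # both real backends expose the same hash-like lookup interface,
    # so a plain Hash stands in for either store here
    @backend = { "bat" => "n02806379|n02139199" }
  end

  def find(word)
    ids = @backend[word]
    ids && ids.split("|") # nil when the word is not in the index
  end
end

words = TinyWords.new(:pure)
p words.find("bat") # => ["n02806379", "n02139199"]
```

Because callers only ever see `find`, swapping the storage layer never changes application code, which is the point of the 0.2.0 refactor.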
data/Rakefile CHANGED
@@ -10,9 +10,10 @@ begin
     gem.email = "roja@arbia.co.uk"
     gem.homepage = "http://github.com/roja/words"
     gem.authors = ["Roja Buck"]
-    gem.
+    gem.add_dependency "trollop", ">= 1.15"
     gem.add_dependency 'rufus-tokyo', '>= 1.0.5'
-
+    gem.executables = [ "build_wordnet" ]
+    gem.default_executable = "build_wordnet"
   end
   Jeweler::GemcutterTasks.new
 rescue LoadError
@@ -46,7 +47,7 @@ task :default => :test
 require 'rake/rdoctask'
 Rake::RDocTask.new do |rdoc|
   version = File.exist?('VERSION') ? File.read('VERSION') : ""
-
+
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title = "words #{version}"
   rdoc.rdoc_files.include('README*')
data/VERSION CHANGED
@@ -1 +1 @@
-0.
+0.2.0
data/{build_dataset.rb → bin/build_wordnet} RENAMED
@@ -6,7 +6,6 @@ require 'pathname'
 # gem includes
 require 'rubygems'
 require 'trollop'
-require 'pstore'
 require 'rufus-tokyo'
 
 POS_FILE_TYPES = %w{ adj adv noun verb }
@@ -27,7 +26,10 @@ if __FILE__ == $0
   opts = Trollop::options do
     opt :verbose, "Output verbose program detail.", :default => false
     opt :wordnet, "Location of the wordnet dictionary directory", :default => "Search..."
+    opt :build_tokyo, "Build the tokyo dataset?", :default => false
+    opt :build_pure, "Build the pure ruby dataset?", :default => false
   end
+  Trollop::die :build_tokyo, "Either tokyo dataset or pure ruby dataset are required" if !opts[:build_tokyo] && !opts[:build_pure]
   puts "Verbose mode enabled" if (VERBOSE = opts[:verbose])
 
   wordnet_dir = nil
@@ -57,7 +59,8 @@ if __FILE__ == $0
 
   # Build data
 
-
+  index_hash = Hash.new
+  data_hash = Hash.new
   POS_FILE_TYPES.each do |file_pos|
 
     puts "Building #{file_pos} indexes..." if VERBOSE
@@ -73,30 +76,45 @@ if __FILE__ == $0
       tagsense_count = pos + index_parts.shift
       synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
 
-
+      index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
+      index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }
 
-      hash[lemma] = { "lemma" => lemma, "synset_ids" => (hash[lemma]["synset_ids"].split('|') + synset_ids).join('|'), # append synsets
-                      "tagsense_counts" => (hash[lemma]["tagsense_counts"].split('|') << tagsense_count).join('|') } # append pointer symbols
     end
 
-
-
-    # add data
-    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
-      next if data_line[0, 2] == "  "
-      data_line, gloss = data_line.split(" | ")
-      data_parts = data_line.split(" ")
-
-      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
-      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
-      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+    if opts[:build_tokyo]
+      puts "Building #{file_pos} data..." if VERBOSE
 
-
-
+      # add data
+      (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
+        next if data_line[0, 2] == "  "
+        data_line, gloss = data_line.split(" | ")
+        data_parts = data_line.split(" ")
+
+        synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+        words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+        relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+
+        data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
+                                 "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+      end
     end
 
-
     end
-
+
+  if opts[:build_tokyo]
+    tokyo_hash = Rufus::Tokyo::Table.new("#{File.dirname(__FILE__)}/../data/wordnet.tct")
+    index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
+    data_hash.each { |k,v| tokyo_hash[k] = v }
+    tokyo_hash.close
+  end
+
+  if opts[:build_pure]
+    index = Hash.new
+    index_hash.each { |k,v| index[k] = [v["lemma"], v["tagsense_counts"].join('|'), v["synset_ids"].join('|')] }
+    File.open("#{File.dirname(__FILE__)}/../data/index.dmp",'w') do |file|
+      file.write Marshal.dump(index)
+    end
+  end
+
 
 end
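The build script above parses each WordNet `data.*` line by splitting off the gloss at `" | "` and then shifting whitespace-separated fields, with the word count read as hexadecimal. A minimal standalone sketch of that parsing step (the sample line is a shortened stand-in, not a complete real WordNet record):

```ruby
# Minimal sketch of the data-line parsing used in build_wordnet above.
# The sample line is an abbreviated stand-in for a real data.noun record.
line = "02806379 03 n 02 bat 0 cricket_bat 0 001 @ 03053474 n 0000 | a club used for hitting a ball"

data_line, gloss = line.split(" | ")
parts = data_line.split(" ")

synset_id = "n" + parts.shift       # byte offset prefixed with the POS short code
lexical_filenum = parts.shift       # lexicographer file number ("03")
synset_type = parts.shift           # "n" for noun
word_count = parts.shift.to_i(16)   # word count is hexadecimal in WordNet data files
words = Array.new(word_count).map { "#{parts.shift}.#{parts.shift}" } # word + lex id pairs

puts synset_id     # => n02806379
puts words.inspect # => ["bat.0", "cricket_bat.0"]
puts gloss.strip
```

Shifting through `parts` in order mirrors how the script consumes the fixed-layout record without ever indexing by column position.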
data/examples.rb CHANGED
@@ -4,25 +4,32 @@ require 'lib/words'
 
 if __FILE__ == $0
 
-  wordnet = Words::Words.new
+  wordnet = Words::Words.new # :pure
+
+  puts wordnet
 
   puts wordnet.find('bat')
   puts wordnet.find('bat').available_pos.inspect
   puts wordnet.find('bat').lemma
+  puts wordnet.find('bat').nouns?
   puts wordnet.find('bat').synsets('noun')
-  puts wordnet.find('bat').
-  puts wordnet.find('bat').synsets(
+  puts wordnet.find('bat').noun_ids
+  puts wordnet.find('bat').synsets(:noun).last.words.inspect
+  puts wordnet.find('bat').nouns.last.relations
   wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
-  puts wordnet.find('bat').synsets('noun').last.methods
   puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?
 
   puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
+  puts wordnet.find('bat').synsets('noun').last.hyponyms?
   puts wordnet.find('bat').synsets('noun').last.relations("~")
   puts wordnet.find('bat').synsets('verb').last.inspect
   puts wordnet.find('bat').synsets('verb').last.words
   puts wordnet.find('bat').synsets('verb').last.words_with_num.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
+  puts wordnet.find('bat').synsets('verb').first.lexical_description
+
   wordnet.close
 
 end
data/lib/words.rb CHANGED
@@ -10,12 +10,94 @@ module Words
 
   class WordnetConnection
 
-
-
+    SHORT_TO_POS_FILE_TYPE = { 'a' => 'adj', 'r' => 'adv', 'n' => 'noun', 'v' => 'verb' }
+
+    attr_reader :connected, :connection_type, :data_path, :wordnet_dir
+
+    def initialize(type, path, wordnet_path)
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/wordnet.tct") if type == :tokyo && path == :default
+      @data_path = Pathname.new("#{File.dirname(__FILE__)}/../data/index.dmp") if type == :pure && path == :default
+      @connection_type = type
+
+      if @data_path.exist?
+        if @connection_type == :tokyo
+          @connection = Rufus::Tokyo::Table.new(@data_path.to_s)
+          @connected = true
+        elsif @connection_type == :pure
+          # open the index is there
+          File.open(@data_path,'r') do |file|
+            @connection = Marshal.load file.read
+          end
+          # search for the wordnet files
+          if locate_wordnet?(wordnet_path)
+            @connected = true
+          else
+            @connected = false
+            raise "Failed to locate the wordnet database. Please ensure it is installed and that if it resides at a custom path that path is given as an argument when constructing the Words object."
+          end
+        else
+          @connected = false
+        end
+      else
+        @connected = false
+        raise "Failed to locate the words #{ @connection_type == :pure ? 'index' : 'dataset' } at #{@data_path}. Please insure you have created it using the words gems provided 'build_dataset.rb' command."
+      end
+
+    end
+
+    def close
+      @connected = false
+      if @connected && connection_type == :tokyo
+        connection.close
+      end
+      return true
+    end
+
+    def lemma(term)
+      if connection_type == :pure
+        raw_lemma = @connection[term]
+        { 'lemma' => raw_lemma[0], 'tagsense_counts' => raw_lemma[1], 'synset_ids' => raw_lemma[2]}
+      else
+        @connection[term]
+      end
+    end
+
+    def synset(synset_id)
+      if connection_type == :pure
+        pos = synset_id[0,1]
+        File.open(@wordnet_dir + "data.#{SHORT_TO_POS_FILE_TYPE[pos]}","r") do |file|
+          file.seek(synset_id[1..-1].to_i)
+          data_line, gloss = file.readline.strip.split(" | ")
+          data_parts = data_line.split(" ")
+          synset_id, lexical_filenum, synset_type, word_count = pos + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
+          words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
+          relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }
+          { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type, "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
+        end
+      else
+        @connection[synset_id]
+      end
     end
 
-    def
-
+    def locate_wordnet?(base_dirs)
+
+      base_dirs = case base_dirs
+      when :search
+        ['/usr/share/wordnet', '/usr/local/share/wordnet', '/usr/local/WordNet-3.0']
+      else
+        [ base_dirs ]
+      end
+
+      base_dirs.each do |dir|
+        ["", "dict"].each do |sub_folder|
+          path = Pathname.new(dir + sub_folder)
+          @wordnet_dir = path if (path + "data.noun").exist?
+          break if !@wordnet_dir.nil?
+        end
+      end
+
+      return !@wordnet_dir.nil?
+
     end
 
   end
@@ -29,7 +111,8 @@ module Words
                            "\\" => :pertainym, "<" => :participle_of_verb, "&" => :similar_to, "^" => :see_also }
     SYMBOL_TO_RELATION = RELATION_TO_SYMBOL.invert
 
-    def initialize(relation_construct, source_synset)
+    def initialize(relation_construct, source_synset, wordnet_connection)
+      @wordnet_connection = wordnet_connection
       @symbol, @dest_synset_id, @pos, @source_dest = relation_construct.split('.')
      @dest_synset_id = @pos + @dest_synset_id
       @symbol = RELATION_TO_SYMBOL[@symbol]
@@ -66,7 +149,7 @@ module Words
     end
 
     def destination
-      @destination = Synset.new
+      @destination = Synset.new @dest_synset_id, @wordnet_connection unless defined? @destination
       @destination
     end
 
@@ -85,9 +168,55 @@ module Words
   class Synset
 
     SYNSET_TYPE_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb, "s" => :adjective_satallite }
-
-
-
+    NUM_TO_LEX = [ { :lex => :adj_all, :description => "all adjective clusters" },
+                   { :lex => :adj_pert, :description => "relational adjectives (pertainyms)" },
+                   { :lex => :adv_all, :description => "all adverbs" },
+                   { :lex => :noun_Tops, :description => "unique beginner for nouns" },
+                   { :lex => :noun_act, :description => "nouns denoting acts or actions" },
+                   { :lex => :noun_animal, :description => "nouns denoting animals" },
+                   { :lex => :noun_artifact, :description => "nouns denoting man-made objects" },
+                   { :lex => :noun_attribute, :description => "nouns denoting attributes of people and objects" },
+                   { :lex => :noun_body, :description => "nouns denoting body parts" },
+                   { :lex => :noun_cognition, :description => "nouns denoting cognitive processes and contents" },
+                   { :lex => :noun_communication, :description => "nouns denoting communicative processes and contents" },
+                   { :lex => :noun_event, :description => "nouns denoting natural events" },
+                   { :lex => :noun_feeling, :description => "nouns denoting feelings and emotions" },
+                   { :lex => :noun_food, :description => "nouns denoting foods and drinks" },
+                   { :lex => :noun_group, :description => "nouns denoting groupings of people or objects" },
+                   { :lex => :noun_location, :description => "nouns denoting spatial position" },
+                   { :lex => :noun_motive, :description => "nouns denoting goals" },
+                   { :lex => :noun_object, :description => "nouns denoting natural objects (not man-made)" },
+                   { :lex => :noun_person, :description => "nouns denoting people" },
+                   { :lex => :noun_phenomenon, :description => "nouns denoting natural phenomena" },
+                   { :lex => :noun_plant, :description => "nouns denoting plants" },
+                   { :lex => :noun_possession, :description => "nouns denoting possession and transfer of possession" },
+                   { :lex => :noun_process, :description => "nouns denoting natural processes" },
+                   { :lex => :noun_quantity, :description => "nouns denoting quantities and units of measure" },
+                   { :lex => :noun_relation, :description => "nouns denoting relations between people or things or ideas" },
+                   { :lex => :noun_shape, :description => "nouns denoting two and three dimensional shapes" },
+                   { :lex => :noun_state, :description => "nouns denoting stable states of affairs" },
+                   { :lex => :noun_substance, :description => "nouns denoting substances" },
+                   { :lex => :noun_time, :description => "nouns denoting time and temporal relations" },
+                   { :lex => :verb_body, :description => "verbs of grooming, dressing and bodily care" },
+                   { :lex => :verb_change, :description => "verbs of size, temperature change, intensifying, etc." },
+                   { :lex => :verb_cognition, :description => "verbs of thinking, judging, analyzing, doubting" },
+                   { :lex => :verb_communication, :description => "verbs of telling, asking, ordering, singing" },
+                   { :lex => :verb_competition, :description => "verbs of fighting, athletic activities" },
+                   { :lex => :verb_consumption, :description => "verbs of eating and drinking" },
+                   { :lex => :verb_contact, :description => "verbs of touching, hitting, tying, digging" },
+                   { :lex => :verb_creation, :description => "verbs of sewing, baking, painting, performing" },
+                   { :lex => :verb_emotion, :description => "verbs of feeling" },
+                   { :lex => :verb_motion, :description => "verbs of walking, flying, swimming" },
+                   { :lex => :verb_perception, :description => "verbs of seeing, hearing, feeling" },
+                   { :lex => :verb_possession, :description => "verbs of buying, selling, owning" },
+                   { :lex => :verb_social, :description => "verbs of political and social activities and events" },
+                   { :lex => :verb_stative, :description => "verbs of being, having, spatial relations" },
+                   { :lex => :verb_weather, :description => "verbs of raining, snowing, thawing, thundering" },
+                   { :lex => :adj_ppl, :description => "participial adjectives" } ]
+
+    def initialize(synset_id, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @synset_hash = wordnet_connection.synset(synset_id)
       # construct some conveniance menthods for relation type access
       Relation::SYMBOL_TO_RELATION.keys.each do |relation_type|
         self.class.send(:define_method, "#{relation_type}s?") do
@@ -117,6 +246,22 @@ module Words
       @words_with_num
     end
 
+    def lexical_filenum
+      @synset_hash["lexical_filenum"].to_i
+    end
+
+    def lexical_catagory
+      lexical[:lex]
+    end
+
+    def lexical_description
+      lexical[:description]
+    end
+
+    def lexical
+      NUM_TO_LEX[@synset_hash["lexical_filenum"].to_i]
+    end
+
     def synset_id
       @synset_hash["synset_id"]
     end
@@ -130,7 +275,7 @@ module Words
     end
 
     def relations(type = :all)
-      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self) } unless defined? @relations
+      @relations = @synset_hash["relations"].split('|').map { |relation| Relation.new(relation, self, @wordnet_connection) } unless defined? @relations
       case
       when Relation::SYMBOL_TO_RELATION.include?(type.to_sym)
         @relations.select { |relation| relation.relation_type == type.to_sym }
@@ -153,8 +298,9 @@ module Words
     POS_TO_SYMBOL = {"n" => :noun, "v" => :verb, "a" => :adjective, "r" => :adverb}
     SYMBOL_TO_POS = POS_TO_SYMBOL.invert
 
-    def initialize(
-      @
+    def initialize(raw_lemma, wordnet_connection)
+      @wordnet_connection = wordnet_connection
+      @lemma_hash = raw_lemma
       # construct some conveniance menthods for relation type access
       SYMBOL_TO_POS.keys.each do |pos|
         self.class.send(:define_method, "#{pos}s?") do
@@ -163,9 +309,17 @@ module Words
         self.class.send(:define_method, "#{pos}s") do
           synsets(pos)
         end
+        self.class.send(:define_method, "#{pos}_ids") do
+          synset_ids(pos)
+        end
       end
     end
 
+    def tagsense_counts
+      @tagsense_counts = @lemma_hash["tagsense_counts"].split('|').map { |count| { POS_TO_SYMBOL[count[0,1]] => count[1..-1].to_i } } unless defined? @tagsense_counts
+      @tagsense_counts
+    end
+
     def lemma
       @lemma = @lemma_hash["lemma"].gsub('_', ' ') unless defined? @lemma
       @lemma
@@ -182,20 +336,19 @@ module Words
     end
 
     def synsets(pos = :all)
-
+      synset_ids(pos).map { |synset_id| Synset.new synset_id, @wordnet_connection }
+    end
+
+    def synset_ids(pos = :all)
+      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
+      case
       when SYMBOL_TO_POS.include?(pos.to_sym)
-        synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
+        @synset_ids.select { |synset_id| synset_id[0,1] == SYMBOL_TO_POS[pos.to_sym] }
       when POS_TO_SYMBOL.include?(pos.to_s)
-        synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
+        @synset_ids.select { |synset_id| synset_id[0,1] == pos.to_s }
       else
-        synset_ids
+        @synset_ids
       end
-      relevent_synsets.map { |synset_id| Synset.new synset_id }
-    end
-
-    def synset_ids
-      @synset_ids = @lemma_hash["synset_ids"].split('|') unless defined? @synset_ids
-      @synset_ids
     end
 
     def inspect
@@ -203,25 +356,42 @@ module Words
     end
 
     alias word lemma
+    alias pos available_pos
 
   end
 
   class Words
 
-
-
-
-
-      abort("Failed to locate the words database at #{(Pathname.new path).realpath}")
-    end
+    @wordnet_connection = nil
+
+    def initialize(type = :tokyo, path = :default, wordnet_path = :search)
+      @wordnet_connection = WordnetConnection.new(type, path, wordnet_path)
     end
 
     def find(word)
-      Lemma.new
+      Lemma.new @wordnet_connection.lemma(word), @wordnet_connection
+    end
+
+    def connection_type
+      @wordnet_connection.connection_type
+    end
+
+    def wordnet_dir
+      @wordnet_connection.wordnet_dir
     end
 
     def close
-
+      @wordnet_connection.close
+    end
+
+    def connected
+      @wordnet_connection.connected
+    end
+
+    def to_s
+      return "Words not connected" if !connected
+      return "Words running in pure mode using wordnet files found at #{wordnet_dir} and index at #{@wordnet_connection.data_path}" if connection_type == :pure
+      return "Words running in tokyo mode with dataset at #{@wordnet_connection.data_path}" if connection_type == :tokyo
     end
 
   end
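The pure backend added above loads a Marshal-dumped index file and unpacks each entry into a lemma hash with `lemma`, `tagsense_counts`, and `synset_ids` keys. A self-contained sketch of that round trip (the sample entry mirrors the `[lemma, tagsense_counts, synset_ids]` array layout written by `build_wordnet --build-pure`, but the ids and counts are made up for illustration):

```ruby
require 'tempfile'

# Sketch of the pure backend's Marshal index round trip.
# The entry layout matches build_wordnet's pure output; values are invented.
index = { "bat" => ["bat", "n5|v2", "n02806379|v01413191"] }

# write the index the way build_wordnet does
file = Tempfile.new("index.dmp")
file.write Marshal.dump(index)
file.close

# load it back the way WordnetConnection does
connection = Marshal.load(File.read(file.path))

# unpack the raw array into the hash shape WordnetConnection#lemma returns
raw = connection["bat"]
lemma = { 'lemma' => raw[0], 'tagsense_counts' => raw[1], 'synset_ids' => raw[2] }

puts lemma['synset_ids'].split('|').inspect # => ["n02806379", "v01413191"]
```

Marshal keeps the whole index in memory after one read, which is why the pure backend only needs the on-disk WordNet `data.*` files afterwards: each synset id encodes the byte offset to `seek` to.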
data/words.gemspec ADDED
@@ -0,0 +1,60 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{words}
+  s.version = "0.2.0"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Roja Buck"]
+  s.date = %q{2010-01-16}
+  s.default_executable = %q{build_wordnet}
+  s.description = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use.}
+  s.email = %q{roja@arbia.co.uk}
+  s.executables = ["build_wordnet"]
+  s.extra_rdoc_files = [
+    "LICENSE",
+    "README.markdown"
+  ]
+  s.files = [
+    ".gitignore",
+    "LICENSE",
+    "README.markdown",
+    "Rakefile",
+    "VERSION",
+    "bin/build_wordnet",
+    "examples.rb",
+    "lib/words.rb",
+    "test/helper.rb",
+    "test/test_words.rb",
+    "words.gemspec"
+  ]
+  s.homepage = %q{http://github.com/roja/words}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{A fast, easy to use interface to WordNet® with cross ruby distribution compatability.}
+  s.test_files = [
+    "test/test_words.rb",
+    "test/helper.rb"
+  ]
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<trollop>, [">= 1.15"])
+      s.add_runtime_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    else
+      s.add_dependency(%q<trollop>, [">= 1.15"])
+      s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+    end
+  else
+    s.add_dependency(%q<trollop>, [">= 1.15"])
+    s.add_dependency(%q<rufus-tokyo>, [">= 1.0.5"])
+  end
+end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: words
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Roja Buck
@@ -9,12 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
-default_executable:
+date: 2010-01-16 00:00:00 +00:00
+default_executable: build_wordnet
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
-  type: :
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -34,8 +34,8 @@ dependencies:
     version:
 description: "A fast, easy to use interface to WordNet\xC2\xAE with cross ruby distribution compatability. We use TokyoCabinet to store the dataset and the excellent rufus-tokyo to interface with it. This allows us to have full compatability across ruby distributions while still remaining both fast and simple to use."
 email: roja@arbia.co.uk
-executables:
-
+executables:
+- build_wordnet
 extensions: []
 
 extra_rdoc_files:
@@ -47,12 +47,12 @@ files:
 - README.markdown
 - Rakefile
 - VERSION
--
-- data/wordnet.tct
+- bin/build_wordnet
 - examples.rb
 - lib/words.rb
 - test/helper.rb
 - test/test_words.rb
+- words.gemspec
 has_rdoc: true
 homepage: http://github.com/roja/words
 licenses: []
DELETED
Binary file
|