words-wordnet 0.4.6
- data/LICENSE +20 -0
- data/README.markdown +169 -0
- data/Rakefile +54 -0
- data/VERSION +1 -0
- data/bin/build_wordnet +177 -0
- data/examples.rb +55 -0
- data/lib/evocations.rb +81 -0
- data/lib/homographs.rb +100 -0
- data/lib/relation.rb +90 -0
- data/lib/synset.rb +201 -0
- data/lib/wordnet_connectors/pure_wordnet_connection.rb +224 -0
- data/lib/wordnet_connectors/tokyo_wordnet_connection.rb +141 -0
- data/lib/words.rb +172 -0
- data/spec/words_spec.rb +151 -0
- data/words.gemspec +57 -0
- metadata +95 -0
data/LICENSE
ADDED
@@ -0,0 +1,20 @@

Copyright (c) 2009 Roja Buck

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown
ADDED
@@ -0,0 +1,169 @@

# Words - A fast, easy-to-use interface to WordNet® with cross ruby distribution compatibility. #

## About ##

Words implements a fast interface to [WordNet®](http://wordnet.princeton.edu), providing both a pure ruby and an FFI-powered backend behind the same easy-to-use API. The FFI backend makes use of [Tokyo Cabinet](http://1978th.net/tokyocabinet/) and the FFI interface, [rufus-tokyo](http://github.com/jmettraux/rufus-tokyo), to provide cross ruby distribution compatibility and blistering speed. The pure ruby backend operates on a special ruby-optimised index along with the basic dictionary files provided by WordNet®. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!

* Version 0.2 introduced the pure ruby backend.
* Version 0.3 introduced evocation support (see examples & below) as developed by the [WordNet® Evocation Project](http://wordnet.cs.princeton.edu/downloads/evocation/release-0.4/README.TXT).
* Version 0.4 brought a substantial performance increase in pure mode (now faster at some things than the tokyo backend), simplification of use, a full refactoring and a move to RSpec for testing. API CHANGES: Words::Words -> Words::Wordnet, close -> close!, connected -> connected? and evocations_enabled? -> evocations?

Documentation: [Yardoc Live](http://yardoc.org/docs/roja-words)

## Pre-Installation ##

First ensure you have a copy of the WordNet data files. These are generally available from your Linux/OSX package manager:

    # Ubuntu
    sudo apt-get install wordnet-base

    # Fedora/RHEL
    sudo yum install wordnet

    # OSX Homebrew
    brew install wordnet

    # MacPorts
    sudo port install wordnet

or you can simply download and install (Unix/OSX):

    wget http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
    sudo mkdir /usr/local/share/wordnet
    sudo tar -C /usr/local/share/wordnet/ -xzf WNdb-3.0.tar.gz

or (Windows):

    Download http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
    Unzip

    # because windows has no standard location for these files you will
    # have to specify the location of the wordnet files when using the gem

## For Tokyo Backend Only ##

If you only want the pure ruby backend, you are now ready to install Words (and build the data if you want extras). If you want to use the tokyo backend (FAST!) you will also need [Tokyo Cabinet](http://1978th.net/tokyocabinet/) installed. It should be nice and easy... something like:

    # osx users with MacPorts installed can simply do
    sudo port install tokyocabinet

    # otherwise the best route is from source
    wget http://1978th.net/tokyocabinet/tokyocabinet-1.4.41.tar.gz
    tar -xzf tokyocabinet-1.4.41.tar.gz
    cd tokyocabinet-1.4.41/
    ./configure
    make
    sudo make install

## GEM Installation ##

After this it should just be a gem to install. For those of you with old rubygems versions, first:

    gem install gemcutter # These two steps are only necessary if you haven't
    gem tumble            # yet installed the gemcutter tools

Otherwise, and afterwards, it's simply:

    gem install words

Then you're ready to rock and roll. :)

## Build Data ##

If all you want to do is use WordNet in its standard form you don't have to do any data building and can skip this section. If, however, you either want to take advantage of evocations ([WordNet® Evocation Project](http://wordnet.cs.princeton.edu/downloads/evocation/release-0.4/README.TXT)) or want to use the tokyo backend, read on!

To build the wordnet dataset file yourself, from the original wordnet files, you can use the bundled "build_wordnet" command:

    build_wordnet -h # this will give you the usage information & additional options/features

    # this would attempt to build the tokyo backend data, locating the original
    # wordnet files through a search...
    sudo build_wordnet --build-tokyo

    # this would attempt to build the tokyo backend, locating the original wordnet
    # files through a search, with the addition of evocation support...
    sudo build_wordnet --build-tokyo-with-evocations

    # this would attempt to build evocation support for the pure backend
    # (remember no dataset needs to be built to use wordnet with the pure backend)
    sudo build_wordnet --build-pure-evocations
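The build step works directly on WordNet's plain-text `index.<pos>` files. As a rough illustration of what build_wordnet does with each line, here is a minimal, self-contained sketch: the field handling mirrors the bundled script's shift-based parsing, but `parse_index_line` is an illustrative helper (not part of the gem's API), and the sample line is format-faithful rather than copied from a real index file.

```ruby
# Parse one line of a WordNet index.<pos> file into named fields, shifting
# fields off the whitespace-split line the way build_wordnet does.
def parse_index_line(line, pos_short)
  parts = line.split(" ")
  lemma          = parts.shift
  pos            = parts.shift
  synset_count   = parts.shift.to_i
  pointer_count  = parts.shift.to_i
  pointers       = Array.new(pointer_count) { parts.shift }
  sense_count    = parts.shift.to_i
  tagsense_count = parts.shift.to_i
  # the remaining fields are synset offsets; prefix each with the pos letter
  synset_ids = Array.new(synset_count) { pos_short + parts.shift }
  { lemma: lemma, pos: pos, synset_ids: synset_ids, pointers: pointers,
    sense_count: sense_count, tagsense_count: tagsense_count }
end

# Format: lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt
#         synset_offset [synset_offset...]
sample = "bat n 5 3 @ ~ #p 5 2 02806379 02153203 05220126 03053474 02806261"
entry = parse_index_line(sample, "n")
puts entry[:lemma]            # => bat
puts entry[:synset_ids].first # => n02806379
```

The pos-letter prefix is what turns a bare byte offset like `02806379` into the synset ids (`n02806379`) that the rest of the gem keys on.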
## Usage ##

Here's a few little examples of using words within your programs.

    require 'rubygems'
    require 'words'

    data = Words::Wordnet.new # or: data = Words::Wordnet.new(:tokyo) for the tokyo backend

    # to specify a wordnet path: Words::Wordnet.new(:pure, '/path/to/wordnet')
    # to specify the tokyo dataset: Words::Wordnet.new(:pure, :search, '/path/to/data.tct')

    # play with connections
    data.connected? # => true
    data.close!
    data.connected? # => false
    data.open!
    data.connected? # => true
    data.connection_type # => :pure or :tokyo depending...

    # locate a word
    lemma = data.find("bat")

    lemma.to_s # => bat, noun/verb
    lemma.available_pos.inspect # => [:noun, :verb]

    lemma.synsets(:noun) # => array of synsets which represent nouns of the lemma bat
    # or
    lemma.nouns # => array of synsets which represent nouns of the lemma bat
    lemma.noun_ids # => array of synset ids which represent nouns of the lemma bat
    lemma.verbs? # => true

    # specify a sense
    sense = lemma.nouns.last
    sense2 = lemma.nouns[2]

    sense.gloss # => a club used for hitting a ball in various games
    sense2.words # => ["cricket bat", "bat"]
    sense2.lexical_description # => a description of the lexical meaning of the synset
    sense.relations.first # => "Semantic hypernym relation between n02806379 and n03053474"

    sense.relations(:hyponym) # => Array of hyponyms associated with the sense
    # or
    sense.hyponyms # => Array of hyponyms associated with the sense
    sense.hyponyms? # => true

    sense.relations.first.is_semantic? # => true
    sense.relations.first.source_word # => nil
    sense.relations.first.destination # => the synset of n03053474

    sense.derivationally_related_forms.first.is_semantic? # => false
    sense.derivationally_related_forms.first.source_word # => "bat"
    sense.derivationally_related_forms.first.destination_word # => "bat"
    sense.derivationally_related_forms.first.destination # => the synset of v01413191

    if data.evocations? # check for evocation support
      data.find("broadcast").senses.first.evocations # => sense relevant evocations
      data.find("broadcast").senses.first.evocations[1] # => the evocation at index 1
      data.find("broadcast").senses.first.evocations[1][:destination].words # => synset
    end

    data.close!

These and more examples are available from within the examples.rb file!
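Both backends ultimately serve synsets parsed out of WordNet's plain-text `data.<pos>` files, where the gloss sits after a `" | "` separator and the word count is hexadecimal. A self-contained sketch of that step, mirroring the bundled build_wordnet script's handling (`parse_data_line` is an illustrative helper, not part of the gem's API, and the sample line is format-faithful rather than copied from a real data file):

```ruby
# Split one data.<pos> line into its synset fields and its gloss,
# the way build_wordnet does before storing the record.
def parse_data_line(line, pos_short)
  fields, gloss = line.split(" | ")
  parts = fields.split(" ")
  synset_id       = pos_short + parts.shift
  lexical_filenum = parts.shift
  synset_type     = parts.shift
  word_count      = parts.shift.to_i(16) # w_cnt is a hex field
  # each word is a word/lex_id pair; each pointer is a 4-field record
  words     = Array.new(word_count) { "#{parts.shift}.#{parts.shift}" }
  relations = Array.new(parts.shift.to_i) { "#{parts.shift}.#{parts.shift}.#{parts.shift}.#{parts.shift}" }
  { synset_id: synset_id, lexical_filenum: lexical_filenum,
    synset_type: synset_type, words: words,
    relations: relations, gloss: gloss.strip }
end

# Format: synset_offset lex_filenum ss_type w_cnt word lex_id [word lex_id...]
#         p_cnt [ptr...] | gloss
sample = "03053474 06 n 02 cricket_bat 0 bat 0 001 @ 02806379 n 0000 " \
         "| the club used in playing cricket"
synset = parse_data_line(sample, "n")
puts synset[:words].inspect # => ["cricket_bat.0", "bat.0"]
puts synset[:gloss]         # => the club used in playing cricket
```

This is where the `gloss`, `words` and `relations` seen in the usage examples above originate before being joined with `|` and written into the tokyo table or the pure-ruby index.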
## Note on Patches/Pull Requests ##

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a
  future version unintentionally.
* Commit; do not mess with the rakefile, version, or history.
  (If you want to have your own version, that is fine, but bump the version in a commit by itself so I can ignore it when I pull.)
* Send me a pull request. Bonus points for topic branches.

## Copyright ##

Copyright (c) 2010 Roja Buck. See LICENSE for details.
data/Rakefile
ADDED
@@ -0,0 +1,54 @@

# coding: utf-8
require 'rubygems'
require 'rake'

begin
  require 'jeweler'
  Jeweler::Tasks.new do |gem|
    gem.name = "words-wordnet"
    gem.summary = %Q{A fast & easy-to-use interface to WordNet with cross ruby distribution compatibility.}
    gem.description = %Q{Words, with both pure ruby & tokyo-cabinet backends, implements a fast interface to WordNet over the same easy-to-use API. The FFI backend makes use of Tokyo Cabinet and the FFI interface, rufus-tokyo, to provide cross ruby distribution compatibility and blistering speed. The pure ruby interface operates on a special ruby optimised index along with the basic dictionary files provided by WordNet. I have attempted to provide ease of use in the form of a simple yet powerful API, and installation is a cinch!}
    gem.email = "roja@arbia.co.uk"
    gem.homepage = "http://github.com/roja/words"
    gem.authors = ["Roja Buck"]
    gem.executables = [ "build_wordnet" ]
    gem.default_executable = "build_wordnet"
    gem.rubyforge_project = 'words'
    gem.add_development_dependency "rspec", ">= 2.11.0"
  end
  Jeweler::GemcutterTasks.new
rescue LoadError
  puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
end

require 'rspec/core/rake_task'
RSpec::Core::RakeTask.new(:spec) do |spec|
  spec.pattern = 'spec/**/*_spec.rb'
end

begin
  require 'rcov'
  RSpec::Core::RakeTask.new(:rcov) do |spec|
    spec.libs << 'lib' << 'spec'
    spec.pattern = 'spec/**/*_spec.rb'
    spec.rcov = true
  end
rescue LoadError
  task :rcov do
    abort "RCov is not available. In order to run rcov, you must: sudo gem install rcov"
  end
end

task :spec => :check_dependencies

task :default => :spec

require 'rdoc/task'
Rake::RDocTask.new do |rdoc|
  version = File.exist?('VERSION') ? File.read('VERSION') : ""

  rdoc.rdoc_dir = 'rdoc'
  rdoc.title = "words-wordnet #{version}"
  rdoc.rdoc_files.include('README*')
  rdoc.rdoc_files.include('lib/**/*.rb')
end
data/VERSION
ADDED
@@ -0,0 +1 @@

0.4.6
data/bin/build_wordnet
ADDED
@@ -0,0 +1,177 @@

#!/usr/bin/env ruby

# std includes
require 'pathname'

# gem includes
require 'rubygems'

# standard library includes
#require 'trollop'
require 'zlib'
require 'net/http'
require 'optparse'
require 'pp'

# local includes
require File.join(File.dirname(__FILE__), '..', 'lib', 'words.rb')

POS_FILE_TYPES = %w{ adj adv noun verb }
POS_FILE_TYPE_TO_SHORT = { 'adj' => 'a', 'adv' => 'r', 'noun' => 'n', 'verb' => 'v' }

puts "Words Dataset Constructor 2010 (c) Roja Buck"

opts = { :quiet => false, :build_tokyo => false, :build_tokyo_with_evocations => false, :build_pure_evocations => false, :wordnet => 'Search...' }

optparse = OptionParser.new do |option|

  option.on( '-q', '--quiet', "Don't output verbose program detail. (Default: false)" ) do
    opts[:quiet] = true
  end

  option.on( '-w', '--wordnet FILE', "Location of the wordnet dictionary directory. (Default: Search)" ) do |f|
    opts[:wordnet] = f
  end

  option.on( '-t', '--build-tokyo', "Build the tokyo wordnet dataset? (Default: false)" ) do
    opts[:build_tokyo] = true
  end

  option.on( '-x', '--build-tokyo-with-evocations', "Build the tokyo dataset with the similarity dataset based on the wordnet evocation project? (Default: false) NOTE: requires internet connection." ) do
    opts[:build_tokyo_with_evocations] = true
  end

  option.on( '-e', '--build-pure-evocations', "Build the similarity dataset based on the wordnet evocation project for use with the pure words mode. (Default: false) NOTE: requires internet connection." ) do
    opts[:build_pure_evocations] = true
  end

  option.on( '-h', '--help', 'Display this screen' ) do
    puts option
    exit
  end

end

optparse.parse!

if !opts[:build_tokyo] && !opts[:build_tokyo_with_evocations] && !opts[:build_pure_evocations]
  puts "ERROR: You need to specify at least one dataset you want to build."
  exit
end
puts "Verbose mode enabled" if (VERBOSE = !opts[:quiet])

require 'rufus-tokyo' if opts[:build_tokyo] || opts[:build_tokyo_with_evocations]

gem_path = Pathname.new "#{File.dirname(__FILE__)}/.."
abort "Ensure you run the command using sudo or as a Superuser / Administrator" unless gem_path.writable?
data_path = gem_path + "data/"
data_path.mkpath

wordnet_dir = nil
if opts[:wordnet] == "Search..."
  wordnet_dir = Words::Wordnet.locate_wordnet :search
  abort( "Unable to locate wordnet dictionary. To specify check --help." ) if wordnet_dir.nil?
else
  wordnet_dir = Words::Wordnet.locate_wordnet opts[:wordnet]
  abort( "Unable to locate wordnet dictionary in directory #{opts[:wordnet]}. Please check and try again." ) if wordnet_dir.nil?
end

# At this point we know we should have a wordnet directory within wordnet_dir
puts "Found wordnet files in #{wordnet_dir}..." if VERBOSE

index_files = POS_FILE_TYPES.map { |pos| wordnet_dir + "index.#{pos}" }
data_files = POS_FILE_TYPES.map { |pos| wordnet_dir + "data.#{pos}" }

(index_files + data_files).each do |required_file|
  abort( "Unable to locate #{required_file} within the wordnet dictionary. Please check your wordnet copy is valid and try again." ) unless required_file.exist?
  abort( "Cannot get readable permissions to #{required_file} within the wordnet dictionary. Please check the file permissions and try again." ) unless required_file.readable?
end

# At this point we know we have the correct files, though we don't know their validity
puts "Validated existence of wordnet files in #{wordnet_dir}..." if VERBOSE

# Build data

index_hash = Hash.new
data_hash = Hash.new
POS_FILE_TYPES.each do |file_pos|

  puts "Building #{file_pos} indexes..." if VERBOSE

  # add indexes (skip the licence header lines, which start with two spaces)
  (wordnet_dir + "index.#{file_pos}").each_line do |index_line|
    next if index_line[0, 2] == "  "
    index_parts = index_line.split(" ")

    lemma, pos, synset_count, pointer_count = index_parts.shift, index_parts.shift, index_parts.shift.to_i, index_parts.shift.to_i
    pointer_symbols = Array.new(pointer_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }
    sense_count = index_parts.shift
    tagsense_count = pos + index_parts.shift
    synset_ids = Array.new(synset_count).map { POS_FILE_TYPE_TO_SHORT[file_pos] + index_parts.shift }

    index_hash[lemma] = { "synset_ids" => [], "tagsense_counts" => [] } if index_hash[lemma].nil?
    index_hash[lemma] = { "lemma" => lemma, "synset_ids" => index_hash[lemma]["synset_ids"] + synset_ids, "tagsense_counts" => index_hash[lemma]["tagsense_counts"] + [tagsense_count] }

  end

  if opts[:build_tokyo] || opts[:build_tokyo_with_evocations]
    puts "Building #{file_pos} data..." if VERBOSE

    # add data
    (wordnet_dir + "data.#{file_pos}").each_line do |data_line|
      next if data_line[0, 2] == "  "
      data_line, gloss = data_line.split(" | ")
      data_parts = data_line.split(" ")

      synset_id, lexical_filenum, synset_type, word_count = POS_FILE_TYPE_TO_SHORT[file_pos] + data_parts.shift, data_parts.shift, data_parts.shift, data_parts.shift.to_i(16)
      words = Array.new(word_count).map { "#{data_parts.shift}.#{data_parts.shift}" }
      relations = Array.new(data_parts.shift.to_i).map { "#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}.#{data_parts.shift}" }

      data_hash[synset_id] = { "synset_id" => synset_id, "lexical_filenum" => lexical_filenum, "synset_type" => synset_type,
                               "words" => words.join('|'), "relations" => relations.join('|'), "gloss" => gloss.strip }
    end
  end

end

score_hash = Hash.new
if opts[:build_tokyo_with_evocations] || opts[:build_pure_evocations]
  puts "Downloading score data..." if VERBOSE
  scores_file = data_path + "scores.txt.gz"
  scores_file.delete if scores_file.exist?
  File.open(scores_file, 'w') do |file|
    file.write Net::HTTP.get(URI.parse('http://cloud.github.com/downloads/roja/words/scores.txt.gz'))
  end
  abort( "Unable to gather similarities information from http://cloud.github.com/downloads/roja/words/scores.txt.gz... Try again later." ) unless scores_file.exist?

  puts "Compiling score data..." if VERBOSE
  Zlib::GzipReader.open(scores_file) do |gz|
    gz.each_line do |line|
      mean, median, sense1, sense2 = line.split(',')
      senses = [sense1, sense2].map! { |sense| sense.strip.split('.') }.map! { |sense| index_hash[sense[0]]["synset_ids"].select { |synset_id| synset_id[0,1] == sense[1].gsub("s", "a") }[sense[2].to_i-1] }
      senses.each do |sense|
        relation = (senses - [sense]).first.nil? ? sense : (senses - [sense]).first
        score_name = sense + "s"
        score_hash[score_name] = { "relations" => [], "means" => [], "medians" => [] } if score_hash[score_name].nil?
        score_hash[score_name] = { "relations" => score_hash[score_name]["relations"] << relation, "means" => score_hash[score_name]["means"] << mean, "medians" => score_hash[score_name]["medians"] << median }
      end unless senses.include? nil
    end
  end
end

if opts[:build_tokyo] || opts[:build_tokyo_with_evocations]
  tokyo_hash = Rufus::Tokyo::Table.new((data_path + "wordnet.tct").to_s)
  index_hash.each { |k,v| tokyo_hash[k] = { "lemma" => v["lemma"], "synset_ids" => v["synset_ids"].join('|'), "tagsense_counts" => v["tagsense_counts"].join('|') } }
  data_hash.each { |k,v| tokyo_hash[k] = v }
  score_hash.each { |k,v| tokyo_hash[k] = { "relations" => v["relations"].join('|'), "means" => v["means"].join('|'), "medians" => v["medians"].join('|') } } if opts[:build_tokyo_with_evocations]
  tokyo_hash.close
end

if opts[:build_pure_evocations]
  score = Hash.new
  score_hash.each { |k,v| score[k] = [v["relations"].join('|'), v["means"].join('|'), v["medians"].join('|')] }
  File.open(data_path + "evocations.dmp", 'w') do |file|
    file.write Marshal.dump(score)
  end
end
data/examples.rb
ADDED
@@ -0,0 +1,55 @@

#!/usr/bin/env ruby
# coding: utf-8

#require 'rubygems'
#require 'words'
require 'lib/words.rb'

if __FILE__ == $0

  wordnet = Words::Wordnet.new #:tokyo

  puts wordnet.connected?
  wordnet.close!
  puts wordnet.connected?
  wordnet.open!
  puts wordnet.connected?

  puts wordnet

  puts wordnet.find('squash racquet')

  puts wordnet.find('bat')
  puts wordnet.find('bat').available_pos.inspect
  puts wordnet.find('bat').lemma
  puts wordnet.find('bat').nouns?
  puts wordnet.find('bat').synsets('noun')
  puts wordnet.find('bat').noun_ids
  puts wordnet.find('bat').synsets(:noun)[2].words.inspect
  puts wordnet.find('bat').nouns.last.relations
  wordnet.find('bat').synsets('noun').last.relations.each { |relation| puts relation.inspect }
  puts wordnet.find('bat').synsets('noun').last.hyponyms?
  puts wordnet.find('bat').synsets('noun').last.participle_of_verbs?

  puts wordnet.find('bat').synsets('noun').last.relations(:hyponym)
  puts wordnet.find('bat').synsets('noun').last.hyponyms?
  puts wordnet.find('bat').synsets('noun').last.relations("~")
  puts wordnet.find('bat').synsets('verb').last.inspect
  puts wordnet.find('bat').synsets('verb').last.words.inspect
  puts wordnet.find('bat').synsets('verb').last.words_with_lexical_ids.inspect

  puts wordnet.find('bat').synsets('verb').first.lexical.inspect
  puts wordnet.find('bat').synsets('verb').first.lexical_description

  puts wordnet.find('jkashdfajkshfksjdhf')

  if wordnet.evocations?
    puts wordnet.find("broadcast").senses.first.evocations
    puts wordnet.find("broadcast").senses.first.evocations.means
    puts wordnet.find("broadcast").senses.first.evocations[1].inspect
    puts wordnet.find("broadcast").senses.first.evocations[20][:destination].words
  end

  wordnet.close!

end