linguistics 1.0.9 → 2.0.0

Files changed (69)
  1. data.tar.gz.sig +0 -0
  2. data/.gemtest +0 -0
  3. data/ChangeLog +849 -342
  4. data/History.rdoc +11 -0
  5. data/LICENSE +9 -9
  6. data/Manifest.txt +44 -0
  7. data/README.rdoc +226 -0
  8. data/Rakefile +32 -349
  9. data/examples/endocs.rb +272 -0
  10. data/examples/generalize_sentence.rb +2 -1
  11. data/examples/klingon.rb +22 -0
  12. data/lib/linguistics.rb +130 -292
  13. data/lib/linguistics/en.rb +337 -1628
  14. data/lib/linguistics/en/articles.rb +138 -0
  15. data/lib/linguistics/en/conjugation.rb +2245 -0
  16. data/lib/linguistics/en/conjunctions.rb +202 -0
  17. data/lib/linguistics/en/{infinitive.rb → infinitives.rb} +41 -55
  18. data/lib/linguistics/en/linkparser.rb +41 -49
  19. data/lib/linguistics/en/numbers.rb +483 -0
  20. data/lib/linguistics/en/participles.rb +33 -0
  21. data/lib/linguistics/en/pluralization.rb +810 -0
  22. data/lib/linguistics/en/stemmer.rb +75 -0
  23. data/lib/linguistics/en/titlecase.rb +121 -0
  24. data/lib/linguistics/en/wordnet.rb +63 -97
  25. data/lib/linguistics/inflector.rb +89 -0
  26. data/lib/linguistics/iso639.rb +534 -448
  27. data/lib/linguistics/languagebehavior.rb +36 -0
  28. data/lib/linguistics/monkeypatches.rb +42 -0
  29. data/spec/lib/constants.rb +15 -0
  30. data/spec/lib/helpers.rb +38 -0
  31. data/spec/linguistics/en/articles_spec.rb +797 -0
  32. data/spec/linguistics/en/conjugation_spec.rb +2083 -0
  33. data/spec/linguistics/en/conjunctions_spec.rb +154 -0
  34. data/spec/linguistics/en/infinitives_spec.rb +518 -0
  35. data/spec/linguistics/en/linkparser_spec.rb +66 -0
  36. data/spec/linguistics/en/numbers_spec.rb +1295 -0
  37. data/spec/linguistics/en/participles_spec.rb +55 -0
  38. data/spec/linguistics/en/pluralization_spec.rb +4636 -0
  39. data/spec/linguistics/en/stemmer_spec.rb +72 -0
  40. data/spec/linguistics/en/titlecase_spec.rb +841 -0
  41. data/spec/linguistics/en/wordnet_spec.rb +85 -0
  42. data/spec/linguistics/en_spec.rb +45 -167
  43. data/spec/linguistics/inflector_spec.rb +40 -0
  44. data/spec/linguistics/iso639_spec.rb +49 -53
  45. data/spec/linguistics/monkeypatches_spec.rb +40 -0
  46. data/spec/linguistics_spec.rb +46 -76
  47. metadata +241 -113
  48. metadata.gz.sig +0 -0
  49. data/README +0 -166
  50. data/README.english +0 -245
  51. data/rake/191_compat.rb +0 -26
  52. data/rake/dependencies.rb +0 -76
  53. data/rake/documentation.rb +0 -123
  54. data/rake/helpers.rb +0 -502
  55. data/rake/hg.rb +0 -318
  56. data/rake/manual.rb +0 -787
  57. data/rake/packaging.rb +0 -129
  58. data/rake/publishing.rb +0 -341
  59. data/rake/style.rb +0 -62
  60. data/rake/svn.rb +0 -668
  61. data/rake/testing.rb +0 -152
  62. data/rake/verifytask.rb +0 -64
  63. data/tests/en/infinitive.tests.rb +0 -207
  64. data/tests/en/inflect.tests.rb +0 -1389
  65. data/tests/en/lafcadio.tests.rb +0 -77
  66. data/tests/en/linkparser.tests.rb +0 -42
  67. data/tests/en/lprintf.tests.rb +0 -77
  68. data/tests/en/titlecase.tests.rb +0 -73
  69. data/tests/en/wordnet.tests.rb +0 -95
metadata.gz.sig CHANGED
Binary file
data/README DELETED
@@ -1,166 +0,0 @@
1
-
2
- = Linguistics
3
-
4
- == Authors
5
-
6
- * Michael Granger <ged@FaerieMUD.org>
7
- * Martin Chase <stillflame@FaerieMUD.org>
8
-
9
-
10
- == Requirements
11
-
12
- * Ruby >= 1.8.6
13
-
14
-
15
- == Optional
16
-
17
- * Ruby-WordNet (>= 0.0.5) - adds integration for the Ruby binding for the
18
- WordNet® lexical reference system.
19
-
20
- URL: http://deveiate.org/projects/Ruby-WordNet
21
-
22
- * LinkParser (>= 1.0.5)
23
-
24
- URL: http://deveiate.org/projects/Ruby-LinkParser
25
-
26
-
27
- == General Information
28
-
29
- Linguistics is a framework for building linguistic utilities for Ruby objects
30
- in any language. It includes a generic language-independent front end, a
31
- module for mapping language codes into language names, and a module which
32
- contains various English-language utilities.
33
-
34
-
35
- === Method Interface
36
-
37
- The Linguistics module comes with a language-independent mechanism for
38
- extending core Ruby classes with linguistic methods.
39
-
40
- It consists of three parts: a core linguistics module which contains the
41
- class-extension framework for languages, a generic inflector class that serves
42
- as a delegator for linguistic methods on Ruby objects, and one or more
43
- language-specific modules which contain the actual linguistic functions.
44
-
45
- The module works by adding a single instance method for each language named
46
- after the language's two-letter code (or three-letter code, if no two-letter
47
- code is defined by ISO639) to various Ruby classes. This allows many
48
- language-specific methods to be added to objects without cluttering up the
49
- interface or risking collision between them, albeit at the cost of three or four
50
- more characters per method invocation.
51
-
52
- If you don't like extending core Ruby classes, the language modules should
53
- also let you use them as a function library.
54
-
55
- For example, the English-language module contains a #plural function which can
56
- be accessed via a method on a core class:
57
-
58
- Linguistics::use( :en )
59
- "goose".en.plural
60
- # => "geese"
61
-
62
- or via the Linguistics::EN::plural function directly:
63
-
64
- include Linguistics::EN
65
- plural( "goose" )
66
- # => "geese"
67
-
68
- The class-extension mechanism actually uses the functional interface behind
69
- the scenes.
70
-
71
- A new feature with the 0.02 release: You can now omit the language-code method
72
- for unambiguous methods by calling Linguistics::use with the +:installProxy+
73
- configuration key, with the language code of the language module whose methods
74
- you wish to be available. For example, instead of having to call:
75
-
76
- "goose".en.plural
77
-
78
- from the example above, you can now do this:
79
-
80
- Linguistics::use( :en, :installProxy => :en )
81
- "goose".plural
82
- # => "geese"
83
-
84
- More about how this works in the documentation for Linguistics::use.
85
-
86
-
87
- ==== Adding Language Modules
88
-
89
- To add a new language to the framework, create a file named the same as the
90
- ISO639 2- or 3-letter language code for the language you're adding. It must be
91
- placed under lib/linguistics/ to be recognized by the linguistics module, but
92
- you can also just require it yourself prior to calling Linguistics::use().
93
- This file should define a module under Linguistics that is an all-caps version
94
- of the code used in the filename. Any methods you wish to be exposed to users
95
- should be declared as module functions (i.e., using Module#module_function).
96
-
97
- You may also wish to add your module to the list of default languages by
98
- adding the appropriate symbol to the Linguistics::DefaultLanguages array.
99
-
100
- For example, to create a Portuguese-language module, create a file called
101
- 'lib/linguistics/pt.rb' which contains the following:
102
-
103
- module Linguistics
104
- module PT
105
- Linguistics::DefaultLanguages << :pt
106
-
107
- module_function
108
- <language methods here>
109
- end
110
- end
111
-
112
- See the English language module (lib/linguistics/en.rb) for an example.
113
-
114
-
115
- === English Language Module
116
-
117
- See the README.english file for a synopsis.
118
-
119
- The English-language module currently contains linguistic functions ported
120
- from a few excellent Perl modules:
121
-
122
- Lingua::EN::Inflect
123
- Lingua::Conjunction
124
- Lingua::EN::Infinitive
125
-
126
- See the lib/linguistics/en.rb file for specific attributions.
127
-
128
- New with version 0.02: integration with the Ruby WordNet® and LinkParser
129
- modules (which must be installed separately).
130
-
131
-
132
- == To Do
133
-
134
- * I am planning on improving the results from the infinitive functions, which
135
- currently return useful results only part of the time. Investigations into
136
- additional stemming functions and some other strategies are ongoing.
137
-
138
- * Martin Chase <stillflame at FaerieMUD dot org> is working on an integration
139
- module for his excellent work on a Ruby interface to the CMU Link Grammar
140
- (an English-sentence parser). This will make writing fairly accurate natural
141
- language parsers in Ruby much easier.
142
-
143
- * Suggestions (and patches) for any of these items or additional features are
144
- welcomed.
145
-
146
-
147
-
148
- == Legal
149
-
150
- This module is Open Source Software which is Copyright (c) 2003 by The
151
- FaerieMUD Consortium. All rights reserved.
152
-
153
- You may use, modify, and/or redistribute this software under the terms of the
154
- Perl Artistic License, a copy of which should have been included in this
155
- distribution (See the file Artistic). If it was not, a copy of it may be
156
- obtained from http://language.perl.com/misc/Artistic.html or
157
- http://www.faeriemud.org/artistic.html).
158
-
159
- THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
160
- WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
161
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
162
-
163
-
164
- $Id$
165
-
166
-
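The method interface described in the deleted README above (one method per language code, returning an inflector that delegates to the language module's functions) can be sketched in plain Ruby. This is an illustrative sketch only and not the gem's code: `SketchLinguistics`, `LANGUAGES`, `Inflector`, the `classes:` keyword, and the toy pluralization rules are all assumptions made for this example.

```ruby
# Hypothetical sketch; none of these names are the linguistics gem's internals.
module SketchLinguistics
  # Language modules are registered here, keyed by language code.
  LANGUAGES = {}

  # A tiny stand-in for a language module such as Linguistics::EN.
  module EN
    IRREGULAR_PLURALS = { "goose" => "geese", "mouse" => "mice", "ox" => "oxen" }.freeze

    # Expose the methods below as module functions, as the README recommends.
    module_function

    # Deliberately naive pluralizer: irregulars first, then a few suffix rules.
    def plural(word)
      word = word.to_s
      return IRREGULAR_PLURALS[word] if IRREGULAR_PLURALS.key?(word)
      return word.sub(/y\z/, "ies") if word =~ /[^aeiou]y\z/
      return word + "es" if word =~ /(s|x|z|ch|sh)\z/
      word + "s"
    end
  end
  LANGUAGES[:en] = EN

  # Delegator: forwards each call to the language module's function,
  # passing the wrapped object as the first argument.
  class Inflector
    def initialize(language_module, object)
      @language_module = language_module
      @object = object
    end

    def method_missing(name, *args, &block)
      if @language_module.respond_to?(name)
        @language_module.public_send(name, @object, *args, &block)
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      @language_module.respond_to?(name) || super
    end
  end

  # Add one method named after the language code (e.g. #en) to each class.
  def self.use(code, classes: [String])
    language_module = LANGUAGES.fetch(code)
    classes.each do |klass|
      klass.send(:define_method, code) { Inflector.new(language_module, self) }
    end
  end
end

SketchLinguistics.use(:en)
puts "goose".en.plural                      # method interface, prints "geese"
puts SketchLinguistics::EN.plural("goose")  # function interface, prints "geese"
```

Keeping every language-specific method behind a single `#en` entry point is what lets several language modules coexist on `String` without their method names colliding, which is the trade-off the README describes.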
data/README.english DELETED
@@ -1,245 +0,0 @@
1
-
2
- = English Ruby Linguistics Module - Synopsis
3
-
4
- This is an overview of the functionality currently in the English functions of
5
- the Ruby Linguistics module as of version 0.02:
6
-
7
-
8
- == Pluralization
9
-
10
- require 'linguistics'
11
- Linguistics::use( :en ) # extends Array, String, and Numeric
12
-
13
- "box".en.plural
14
- # => "boxes"
15
-
16
- "mouse".en.plural
17
- # => "mice"
18
-
19
- "ruby".en.plural
20
- # => "rubies"
21
-
22
-
23
- == Indefinite Articles
24
-
25
- "book".en.a
26
- # => "a book"
27
-
28
- "article".en.a
29
- # => "an article"
30
-
31
-
32
- == Present Participles
33
-
34
- "runs".en.present_participle
35
- # => "running"
36
-
37
- "eats".en.present_participle
38
- # => "eating"
39
-
40
- "spies".en.present_participle
41
- # => "spying"
42
-
43
-
44
- == Ordinal Numbers
45
-
46
- 5.en.ordinal
47
- # => "5th"
48
-
49
- 2004.en.ordinal
50
- # => "2004th"
51
-
52
-
53
- == Numbers to Words
54
-
55
- 5.en.numwords
56
- # => "five"
57
-
58
- 2004.en.numwords
59
- # => "two thousand and four"
60
-
61
- 2385762345876.en.numwords
62
- # => "two trillion, three hundred and eighty-five billion,
63
- seven hundred and sixty-two million, three hundred and
64
- forty-five thousand, eight hundred and seventy-six"
65
-
66
-
67
- == Quantification
68
-
69
- "cow".en.quantify( 5 )
70
- # => "several cows"
71
-
72
- "cow".en.quantify( 1005 )
73
- # => "thousands of cows"
74
-
75
- "cow".en.quantify( 20_432_123_000_000 )
76
- # => "tens of trillions of cows"
77
-
78
-
79
- == Conjunctions
80
-
81
- animals = %w{dog cow ox chicken goose goat cow dog rooster llama
82
- pig goat dog cat cat dog cow goat goose goose ox alpaca}
83
- puts "The farm has: " + animals.en.conjunction
84
-
85
- # => The farm has: four dogs, three cows, three geese, three goats,
86
- two oxen, two cats, a chicken, a rooster, a llama, a pig,
87
- and an alpaca
88
-
89
- Note that 'goose' and 'ox' are both correctly pluralized, and the correct
90
- indefinite article 'an' has been used for 'alpaca'.
91
-
92
- You can also use the generalization function of the #quantify method to give
93
- general descriptions of object lists instead of literal counts:
94
-
95
- allobjs = []
96
- ObjectSpace::each_object {|obj| allobjs << obj.class.name}
97
-
98
- puts "The current Ruby objectspace contains: " +
99
- allobjs.en.conjunction( :generalize => true )
100
-
101
- which will print something like:
102
-
103
- The current Ruby objectspace contains: thousands of Strings,
104
- thousands of Arrays, hundreds of Hashes, hundreds of
105
- Classes, many Regexps, a number of Ranges, a number of
106
- Modules, several Floats, several Procs, several MatchDatas,
107
- several Objects, several IOS, several Files, a Binding, a
108
- NoMemoryError, a SystemStackError, a fatal, a ThreadGroup,
109
- and a Thread
110
-
111
-
112
- == Infinitives
113
-
114
- New in version 0.02:
115
-
116
- "leaving".en.infinitive
117
- # => "leave"
118
-
119
- "left".en.infinitive
120
- # => "leave"
121
-
122
- "leaving".en.infinitive.suffix
123
- # => "ing"
124
-
125
-
126
- == WordNet® Integration
127
-
128
- Also new in version 0.02, if you have the Ruby-WordNet module installed, you can
129
- look up WordNet synsets using the Linguistics interface:
130
-
131
- # Test to be sure the WordNet module loaded okay.
132
- Linguistics::EN.has_wordnet?
133
- # => true
134
-
135
- # Fetch the default synset for the word "balance"
136
- "balance".synset
137
- # => #<WordNet::Synset:0x40376844 balance (noun): "a state of equilibrium"
138
- (derivations: 3, antonyms: 1, hypernyms: 1, hyponyms: 3)>
139
-
140
- # Fetch the synset for the first verb sense of "balance"
141
- "balance".en.synset( :verb )
142
- # => #<WordNet::Synset:0x4033f448 balance, equilibrate, equilibrize, equilibrise
143
- (verb): "bring into balance or equilibrium; "She has to balance work and her
144
- domestic duties"; "balance the two weights"" (derivations: 7, antonyms: 1,
145
- verbGroups: 2, hypernyms: 1, hyponyms: 5)>
146
-
147
- # Fetch the second noun sense
148
- "balance".en.synset( 2, :noun )
149
- # => #<WordNet::Synset:0x404ebb24 balance (noun): "a scale for weighing; depends
150
- on pull of gravity" (hypernyms: 1, hyponyms: 5)>
151
-
152
- # Fetch the second noun sense's hypernyms (more-general words, like a superclass)
153
- "balance".en.synset( 2, :noun ).hypernyms
154
- # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
155
- instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
156
- hyponyms: 2)>]
157
-
158
- # A simpler way of doing the same thing:
159
- "balance".en.hypernyms( 2, :noun )
160
- # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
161
- instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
162
- hyponyms: 2)>]
163
-
164
- # Fetch the first hypernym's hypernyms
165
- "balance".en.synset( 2, :noun ).hypernyms.first.hypernyms
166
- # => [#<WordNet::Synset:0x404c60b8 measuring instrument, measuring system,
167
- measuring device (noun): "instrument that shows the extent or amount or quantity
168
- or degree of something" (hypernyms: 1, hyponyms: 83)>]
169
-
170
- # Find the synset to which both the second noun sense of "balance" and the
171
- # default sense of "shovel" belong.
172
- ("balance".en.synset( 2, :noun ) | "shovel".en.synset)
173
- # => #<WordNet::Synset:0x40473da4 instrumentality, instrumentation (noun): "an
174
- artifact (or system of artifacts) that is instrumental in accomplishing some
175
- end" (derivations: 1, hypernyms: 1, hyponyms: 13)>
176
-
177
- # Fetch just the words for the other kinds of "instruments"
178
- "instrument".en.hyponyms.collect {|synset| synset.words}.flatten
179
- # => ["analyzer", "analyser", "cautery", "cauterant", "drafting instrument",
180
- "extractor", "instrument of execution", "instrument of punishment", "measuring
181
- instrument", "measuring system", "measuring device", "medical instrument",
182
- "navigational instrument", "optical instrument", "plotter", "scientific
183
- instrument", "sonograph", "surveying instrument", "surveyor's instrument",
184
- "tracer", "weapon", "arm", "weapon system", "whip"]
185
-
186
- There are many more WordNet methods supported, too many to list here. See the
187
- documentation for the complete list.
188
-
189
-
190
- == LinkParser Integration
191
-
192
- Another new feature in version 0.02 is integration with the Ruby version of the
193
- CMU Link Grammar Parser by Martin Chase. If you have the LinkParser module
194
- installed, you can create linkages from English sentences that let you query for
195
- parts of speech:
196
-
197
- # Test to see whether or not the link parser is loaded.
198
- Linguistics::EN.has_link_parser?
199
- # => true
200
-
201
- # Diagram the first linkage for a test sentence
202
- puts "he is a big dog".sentence.linkages.first.to_s
203
- +---O*---+
204
- | +--Ds--+
205
- +Ss+ | +-A-+
206
- | | | | |
207
- he is a big dog
208
-
209
- # Find the verb in the sentence
210
- "he is a big dog".en.sentence.verb.to_s
211
- # => "is"
212
-
213
- # Combined infinitive + LinkParser: Find the infinitive form of the verb of the
214
- given sentence.
215
- "he is a big dog".en.sentence.verb.infinitive
216
- # => "be"
217
-
218
- # Find the direct object of the sentence
219
- "he is a big dog".en.sentence.object.to_s
220
- # => "dog"
221
-
222
- # Look at the raw LinkParser::Word for the direct object of the sentence.
223
- "he is a big dog".en.sentence.object
224
- # => #<LinkParser::Word:0x403da0a0 @definition=[[{@A-}, Ds-, {@M+}, J-], [{@A-},
225
- Ds-, {@M+}, Os-], [{@A-}, Ds-, {@M+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-, {@M+},
226
- Ss+, R-], [{@A-}, Ds-, {@M+}, SIs-], [{@A-}, Ds-, {R+}, {Bs+}, J-], [{@A-}, Ds-,
227
- {R+}, {Bs+}, Os-], [{@A-}, Ds-, {R+}, {Bs+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-,
228
- {R+}, {Bs+}, Ss+, R-], [{@A-}, Ds-, {R+}, {Bs+}, SIs-]], @right=[], @suffix="",
229
- @left=[#<LinkParser::Connection:0x403da028 @rword=#<LinkParser::Word:0x403da0a0
230
- ...>, @lword=#<LinkParser::Word:0x403da0b4 @definition=[[Ss-, O+, {@MV+}], [Ss-,
231
- B-, {@MV+}], [Ss-, P+], [Ss-, AF-], [RS-, Bs-, O+, {@MV+}], [RS-, Bs-, B-,
232
- {@MV+}], [RS-, Bs-, P+], [RS-, Bs-, AF-], [{Q-}, SIs+, O+, {@MV+}], [{Q-}, SIs+,
233
- B-, {@MV+}], [{Q-}, SIs+, P+], [{Q-}, SIs+, AF-]],
234
- @right=[#<LinkParser::Connection:0x403da028 ...>], @suffix="", @left=[],
235
- @name="is", @position=1>, @subName="*", @name="O", @length=3>], @name="dog",
236
- @position=4>
237
-
238
- # Combine WordNet + LinkParser to find the definition of the direct object of
239
- # the sentence
240
- "he is a big dog".en.sentence.object.gloss
241
- # => "a member of the genus Canis (probably descended from the common wolf) that
242
- has been domesticated by man since prehistoric times; occurs in many breeds;
243
- \"the dog barked all night\""
244
-
245
-
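The "Ordinal Numbers" examples in the deleted README.english above (`5.en.ordinal` giving "5th", `2004.en.ordinal` giving "2004th") follow the usual English suffix rule. A minimal sketch of that rule in plain Ruby is shown here; the `ordinal` function below is a hypothetical standalone helper written for illustration, and it is not taken from the gem's source.

```ruby
# Hypothetical helper: append the English ordinal suffix to an integer.
# Not the linguistics gem's implementation.
def ordinal(number)
  n = number.to_i.abs
  suffix =
    if (11..13).cover?(n % 100)
      "th"                      # 11th, 12th, 13th, 111th, ...
    else
      { 1 => "st", 2 => "nd", 3 => "rd" }.fetch(n % 10, "th")
    end
  "#{number}#{suffix}"
end

puts ordinal(5)     # prints "5th"
puts ordinal(2004)  # prints "2004th"
puts ordinal(22)    # prints "22nd"
```

The check on `n % 100` has to come first, because 11, 12, and 13 would otherwise pick up "st", "nd", and "rd" from their last digit.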
data/rake/191_compat.rb DELETED
@@ -1,26 +0,0 @@
1
- # 1.9.1 fixes
2
-
3
-
4
- # Make Pathname compatible with 1.8.7 Pathname.
5
- unless Pathname.instance_methods.include?( :=~ )
6
- class Pathname
7
- def self::glob( *args ) # :yield: p
8
- args = args.collect {|p| p.to_s }
9
- if block_given?
10
- Dir.glob(*args) {|f| yield self.new(f) }
11
- else
12
- Dir.glob(*args).map {|f| self.new(f) }
13
- end
14
- end
15
-
16
- def =~( other )
17
- self.to_s =~ other
18
- end
19
-
20
- def to_str
21
- self.to_s
22
- end
23
- end
24
- end
25
-
26
-
data/rake/dependencies.rb DELETED
@@ -1,76 +0,0 @@
1
- #
2
- # Dependency-checking and Installation Rake Tasks
3
-
4
- #
5
-
6
- require 'rubygems/dependency_installer'
7
- require 'rubygems/source_index'
8
- require 'rubygems/requirement'
9
- require 'rubygems/doc_manager'
10
-
11
- ### Install the specified +gems+ if they aren't already installed.
12
- def install_gems( gems )
13
-
14
- defaults = Gem::DependencyInstaller::DEFAULT_OPTIONS.merge({
15
- :generate_rdoc => true,
16
- :generate_ri => true,
17
- :install_dir => Gem.dir,
18
- :format_executable => false,
19
- :test => false,
20
- :version => Gem::Requirement.default,
21
- })
22
-
23
- # Check for root
24
- if Process.euid != 0
25
- $stderr.puts "This probably won't work, as you aren't root, but I'll try anyway"
26
- end
27
-
28
- gemindex = Gem::SourceIndex.from_installed_gems
29
-
30
- gems.each do |gemname, reqstring|
31
- requirement = Gem::Requirement.new( reqstring )
32
- trace "requirement is: %p" % [ requirement ]
33
-
34
- trace "Searching for an installed #{gemname}..."
35
- specs = gemindex.find_name( gemname )
36
- trace "...found %d specs: %s" %
37
- [ specs.length, specs.collect {|s| "%s %s" % [s.name, s.version] }.join(', ') ]
38
-
39
- if spec = specs.find {|spec| requirement.satisfied_by?(spec.version) }
40
- log "Version %s of %s is already installed (needs %s); skipping..." %
41
- [ spec.version, spec.name, requirement ]
42
- next
43
- end
44
-
45
- rgv = Gem::Version.new( Gem::RubyGemsVersion )
46
- installer = nil
47
-
48
- log "Trying to install #{gemname.inspect} #{requirement}..."
49
- if rgv >= Gem::Version.new( '1.1.1' )
50
- installer = Gem::DependencyInstaller.new
51
- installer.install( gemname, requirement )
52
- else
53
- installer = Gem::DependencyInstaller.new( gemname )
54
- installer.install
55
- end
56
-
57
- installer.installed_gems.each do |spec|
58
- log "Installed: %s" % [ spec.full_name ]
59
- end
60
-
61
- end
62
- end
63
-
64
-
65
- ### Task: install runtime dependencies
66
- desc "Install runtime dependencies as gems"
67
- task :install_dependencies do
68
- install_gems( DEPENDENCIES )
69
- end
70
-
71
- ### Task: install gems for development tasks
72
- desc "Install development dependencies as gems"
73
- task :install_dev_dependencies do
74
- install_gems( DEVELOPMENT_DEPENDENCIES )
75
- end
76
-