linguistics 1.0.9 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69)
  1. data.tar.gz.sig +0 -0
  2. data/.gemtest +0 -0
  3. data/ChangeLog +849 -342
  4. data/History.rdoc +11 -0
  5. data/LICENSE +9 -9
  6. data/Manifest.txt +44 -0
  7. data/README.rdoc +226 -0
  8. data/Rakefile +32 -349
  9. data/examples/endocs.rb +272 -0
  10. data/examples/generalize_sentence.rb +2 -1
  11. data/examples/klingon.rb +22 -0
  12. data/lib/linguistics.rb +130 -292
  13. data/lib/linguistics/en.rb +337 -1628
  14. data/lib/linguistics/en/articles.rb +138 -0
  15. data/lib/linguistics/en/conjugation.rb +2245 -0
  16. data/lib/linguistics/en/conjunctions.rb +202 -0
  17. data/lib/linguistics/en/{infinitive.rb → infinitives.rb} +41 -55
  18. data/lib/linguistics/en/linkparser.rb +41 -49
  19. data/lib/linguistics/en/numbers.rb +483 -0
  20. data/lib/linguistics/en/participles.rb +33 -0
  21. data/lib/linguistics/en/pluralization.rb +810 -0
  22. data/lib/linguistics/en/stemmer.rb +75 -0
  23. data/lib/linguistics/en/titlecase.rb +121 -0
  24. data/lib/linguistics/en/wordnet.rb +63 -97
  25. data/lib/linguistics/inflector.rb +89 -0
  26. data/lib/linguistics/iso639.rb +534 -448
  27. data/lib/linguistics/languagebehavior.rb +36 -0
  28. data/lib/linguistics/monkeypatches.rb +42 -0
  29. data/spec/lib/constants.rb +15 -0
  30. data/spec/lib/helpers.rb +38 -0
  31. data/spec/linguistics/en/articles_spec.rb +797 -0
  32. data/spec/linguistics/en/conjugation_spec.rb +2083 -0
  33. data/spec/linguistics/en/conjunctions_spec.rb +154 -0
  34. data/spec/linguistics/en/infinitives_spec.rb +518 -0
  35. data/spec/linguistics/en/linkparser_spec.rb +66 -0
  36. data/spec/linguistics/en/numbers_spec.rb +1295 -0
  37. data/spec/linguistics/en/participles_spec.rb +55 -0
  38. data/spec/linguistics/en/pluralization_spec.rb +4636 -0
  39. data/spec/linguistics/en/stemmer_spec.rb +72 -0
  40. data/spec/linguistics/en/titlecase_spec.rb +841 -0
  41. data/spec/linguistics/en/wordnet_spec.rb +85 -0
  42. data/spec/linguistics/en_spec.rb +45 -167
  43. data/spec/linguistics/inflector_spec.rb +40 -0
  44. data/spec/linguistics/iso639_spec.rb +49 -53
  45. data/spec/linguistics/monkeypatches_spec.rb +40 -0
  46. data/spec/linguistics_spec.rb +46 -76
  47. metadata +241 -113
  48. metadata.gz.sig +0 -0
  49. data/README +0 -166
  50. data/README.english +0 -245
  51. data/rake/191_compat.rb +0 -26
  52. data/rake/dependencies.rb +0 -76
  53. data/rake/documentation.rb +0 -123
  54. data/rake/helpers.rb +0 -502
  55. data/rake/hg.rb +0 -318
  56. data/rake/manual.rb +0 -787
  57. data/rake/packaging.rb +0 -129
  58. data/rake/publishing.rb +0 -341
  59. data/rake/style.rb +0 -62
  60. data/rake/svn.rb +0 -668
  61. data/rake/testing.rb +0 -152
  62. data/rake/verifytask.rb +0 -64
  63. data/tests/en/infinitive.tests.rb +0 -207
  64. data/tests/en/inflect.tests.rb +0 -1389
  65. data/tests/en/lafcadio.tests.rb +0 -77
  66. data/tests/en/linkparser.tests.rb +0 -42
  67. data/tests/en/lprintf.tests.rb +0 -77
  68. data/tests/en/titlecase.tests.rb +0 -73
  69. data/tests/en/wordnet.tests.rb +0 -95
metadata.gz.sig CHANGED
Binary file
data/README DELETED
@@ -1,166 +0,0 @@
1
-
2
- = Linguistics
3
-
4
- == Authors
5
-
6
- * Michael Granger <ged@FaerieMUD.org>
7
- * Martin Chase <stillflame@FaerieMUD.org>
8
-
9
-
10
- == Requirements
11
-
12
- * Ruby >= 1.8.6
13
-
14
-
15
- == Optional
16
-
17
- * Ruby-WordNet (>= 0.0.5) - adds integration for the Ruby binding for the
18
- WordNet® lexical reference system.
19
-
20
- URL: http://deveiate.org/projects/Ruby-WordNet
21
-
22
- * LinkParser (>= 1.0.5)
23
-
24
- URL: http://deveiate.org/projects/Ruby-LinkParser
25
-
26
-
27
- == General Information
28
-
29
- Linguistics is a framework for building linguistic utilities for Ruby objects
30
- in any language. It includes a generic language-independent front end, a
31
- module for mapping language codes into language names, and a module which
32
- contains various English-language utilities.
33
-
34
-
35
- === Method Interface
36
-
37
- The Linguistics module comes with a language-independent mechanism for
38
- extending core Ruby classes with linguistic methods.
39
-
40
- It consists of three parts: a core linguistics module which contains the
41
- class-extension framework for languages, a generic inflector class that serves
42
- as a delegator for linguistic methods on Ruby objects, and one or more
43
- language-specific modules which contain the actual linguistic functions.
44
-
45
- The module works by adding a single instance method for each language named
46
- after the language's two-letter code (or three-letter code, if no two-letter
47
- code is defined by ISO639) to various Ruby classes. This allows many
48
- language-specific methods to be added to objects without cluttering up the
49
- interface or risking collision between them, albeit at the cost of three or four
50
- more characters per method invocation.
51
-
52
- If you don't like extending core Ruby classes, the language modules should
53
- also allow you to use them as a function library as well.
54
-
55
- For example, the English-language module contains a #plural function which can
56
- be accessed via a method on a core class:
57
-
58
- Linguistics::use( :en )
59
- "goose".en.plural
60
- # => "geese"
61
-
62
- or via the Linguistics::EN::plural function directly:
63
-
64
- include Linguistics::EN
65
- plural( "goose" )
66
- # => "geese"
67
-
68
- The class-extension mechanism actually uses the functional interface behind
69
- the scenes.
70
-
71
- A new feature with the 0.02 release: You can now omit the language-code method
72
- for unambiguous methods by calling Linguistics::use with the +:installProxy+
73
- configuration key, with the language code of the language module whose methods
74
- you wish to be available. For example, instead of having to call:
75
-
76
- "goose".en.plural
77
-
78
- from the example above, you can now do this:
79
-
80
- Linguistics::use( :en, :installProxy => :en )
81
- "goose".plural
82
- # => "geese"
83
-
84
- More about how this works in the documentation for Linguistics::use.
85
-
86
-
87
- ==== Adding Language Modules
88
-
89
- To add a new language to the framework, create a file named the same as the
90
- ISO639 2- or 3-letter language code for the language you're adding. It must be
91
- placed under lib/linguistics/ to be recognized by the linguistics module, but
92
- you can also just require it yourself prior to calling Linguistics::use().
93
- This file should define a module under Linguistics that is an all-caps version
94
- of the code used in the filename. Any methods you wish to be exposed to users
95
- should be declared as module functions (i.e., using Module#module_function).
96
-
97
- You may also wish to add your module to the list of default languages by
98
- adding the appropriate symbol to the Linguistics::DefaultLanguages array.
99
-
100
- For example, to create a Portuguese-language module, create a file called
101
- 'lib/linguistics/pt.rb' which contains the following:
102
-
103
- module Linguistics
104
- module PT
105
- Linguistics::DefaultLanguages << :pt
106
-
107
- module_function
108
- <language methods here>
109
- end
110
- end
111
-
112
- See the English language module (lib/linguistics/en.rb) for an example.
113
-
114
-
115
- === English Language Module
116
-
117
- See the README.english file for a synopsis.
118
-
119
- The English-language module currently contains linguistic functions ported
120
- from a few excellent Perl modules:
121
-
122
- Lingua::EN::Inflect
123
- Lingua::Conjunction
124
- Lingua::EN::Infinitive
125
-
126
- See the lib/linguistics/en.rb file for specific attributions.
127
-
128
- New with version 0.02: integration with the Ruby WordNet® and LinkParser
129
- modules (which must be installed separately).
130
-
131
-
132
- == To Do
133
-
134
- * I am planning on improving the results from the infinitive functions, which
135
- currently return useful results only part of the time. Investigations into
136
- additional stemming functions and some other strategies are ongoing.
137
-
138
- * Martin Chase <stillflame at FaerieMUD dot org> is working on an integration
139
- module for his excellent work on a Ruby interface to the CMU Link Grammar
140
- (an English-sentence parser). This will make writing fairly accurate natural
141
- language parsers in Ruby much easier.
142
-
143
- * Suggestions (and patches) for any of these items or additional features are
144
- welcomed.
145
-
146
-
147
-
148
- == Legal
149
-
150
- This module is Open Source Software which is Copyright (c) 2003 by The
151
- FaerieMUD Consortium. All rights reserved.
152
-
153
- You may use, modify, and/or redistribute this software under the terms of the
154
- Perl Artistic License, a copy of which should have been included in this
155
- distribution (See the file Artistic). If it was not, a copy of it may be
156
- obtained from http://language.perl.com/misc/Artistic.html or
157
- http://www.faeriemud.org/artistic.html).
158
-
159
- THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
160
- WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
161
- MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
162
-
163
-
164
- $Id$
165
-
166
-
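Note on the removed README's "Method Interface" section: the per-language accessor it describes (one method named after the language code, returning a delegator that forwards to module functions) can be approximated in plain Ruby. The following is a minimal sketch under stated assumptions, not the gem's implementation: `MiniLinguistics`, its `Inflector` class, the `classes:` keyword, and the tiny irregular-plural table are all hypothetical names invented for illustration.

```ruby
# Illustrative sketch only: MiniLinguistics is a hypothetical stand-in,
# not the linguistics gem's real implementation.
module MiniLinguistics
  # Language module exposing its functions as module functions.
  module EN
    IRREGULAR_PLURALS = { "goose" => "geese", "mouse" => "mice" }.freeze

    module_function

    # Toy pluralizer: irregular lookup, otherwise append "s".
    def plural( word )
      IRREGULAR_PLURALS.fetch( word ) { "#{word}s" }
    end
  end

  # Delegator: forwards calls to the language module, passing the
  # wrapped object as the first argument.
  class Inflector
    def initialize( language_module, object )
      @language_module = language_module
      @object = object
    end

    def method_missing( name, *args, &block )
      if @language_module.respond_to?( name )
        @language_module.public_send( name, @object, *args, &block )
      else
        super
      end
    end

    def respond_to_missing?( name, include_private = false )
      @language_module.respond_to?( name ) || super
    end
  end

  # Add one accessor method, named after the language code, to each class.
  def self.use( code, classes: [String] )
    language_module = const_get( code.to_s.upcase )
    classes.each do |klass|
      klass.send( :define_method, code ) do
        Inflector.new( language_module, self )
      end
    end
  end
end

MiniLinguistics.use( :en )
puts "goose".en.plural                    # prints "geese"
puts MiniLinguistics::EN.plural( "cat" )  # prints "cats"
```

Keeping every linguistic method behind a single `en` accessor is what lets the core classes gain only one new method per language, which is the collision-avoidance trade-off the README describes.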
data/README.english DELETED
@@ -1,245 +0,0 @@
1
-
2
- = English Ruby Linguistics Module - Synopsis
3
-
4
- This is an overview of the functionality currently in the English functions of
5
- the Ruby Linguistics module as of version 0.02:
6
-
7
-
8
- == Pluralization
9
-
10
- require 'linguistics'
11
- Linguistics::use( :en ) # extends Array, String, and Numeric
12
-
13
- "box".en.plural
14
- # => "boxes"
15
-
16
- "mouse".en.plural
17
- # => "mice"
18
-
19
- "ruby".en.plural
20
- # => "rubies"
21
-
22
-
23
- == Indefinite Articles
24
-
25
- "book".en.a
26
- # => "a book"
27
-
28
- "article".en.a
29
- # => "an article"
30
-
31
-
32
- == Present Participles
33
-
34
- "runs".en.present_participle
35
- # => "running"
36
-
37
- "eats".en.present_participle
38
- # => "eating"
39
-
40
- "spies".en.present_participle
41
- # => "spying"
42
-
43
-
44
- == Ordinal Numbers
45
-
46
- 5.en.ordinal
47
- # => "5th"
48
-
49
- 2004.en.ordinal
50
- # => "2004th"
51
-
52
-
53
- == Numbers to Words
54
-
55
- 5.en.numwords
56
- # => "five"
57
-
58
- 2004.en.numwords
59
- # => "two thousand and four"
60
-
61
- 2385762345876.en.numwords
62
- # => "two trillion, three hundred and eighty-five billion,
63
- seven hundred and sixty-two million, three hundred and
64
- forty-five thousand, eight hundred and seventy-six"
65
-
66
-
67
- == Quantification
68
-
69
- "cow".en.quantify( 5 )
70
- # => "several cows"
71
-
72
- "cow".en.quantify( 1005 )
73
- # => "thousands of cows"
74
-
75
- "cow".en.quantify( 20_432_123_000_000 )
76
- # => "tens of trillions of cows"
77
-
78
-
79
- == Conjunctions
80
-
81
- animals = %w{dog cow ox chicken goose goat cow dog rooster llama
82
- pig goat dog cat cat dog cow goat goose goose ox alpaca}
83
- puts "The farm has: " + animals.en.conjunction
84
-
85
- # => The farm has: four dogs, three cows, three geese, three goats,
86
- two oxen, two cats, a chicken, a rooster, a llama, a pig,
87
- and an alpaca
88
-
89
- Note that 'goose' and 'ox' are both correctly pluralized, and the correct
90
- indefinite article 'an' has been used for 'alpaca'.
91
-
92
- You can also use the generalization function of the #quantify method to give
93
- general descriptions of object lists instead of literal counts:
94
-
95
- allobjs = []
96
- ObjectSpace::each_object {|obj| allobjs << obj.class.name}
97
-
98
- puts "The current Ruby objectspace contains: " +
99
- allobjs.en.conjunction( :generalize => true )
100
-
101
- which will print something like:
102
-
103
- The current Ruby objectspace contains: thousands of Strings,
104
- thousands of Arrays, hundreds of Hashes, hundreds of
105
- Classes, many Regexps, a number of Ranges, a number of
106
- Modules, several Floats, several Procs, several MatchDatas,
107
- several Objects, several IOS, several Files, a Binding, a
108
- NoMemoryError, a SystemStackError, a fatal, a ThreadGroup,
109
- and a Thread
110
-
111
-
112
- == Infinitives
113
-
114
- New in version 0.02:
115
-
116
- "leaving".en.infinitive
117
- # => "leave"
118
-
119
- "left".en.infinitive
120
- # => "leave"
121
-
122
- "leaving".en.infinitive.suffix
123
- # => "ing"
124
-
125
-
126
- == WordNet® Integration
127
-
128
- Also new in version 0.02, if you have the Ruby-WordNet module installed, you can
129
- look up WordNet synsets using the Linguistics interface:
130
-
131
- # Test to be sure the WordNet module loaded okay.
132
- Linguistics::EN.has_wordnet?
133
- # => true
134
-
135
- # Fetch the default synset for the word "balance"
136
- "balance".synset
137
- # => #<WordNet::Synset:0x40376844 balance (noun): "a state of equilibrium"
138
- (derivations: 3, antonyms: 1, hypernyms: 1, hyponyms: 3)>
139
-
140
- # Fetch the synset for the first verb sense of "balance"
141
- "balance".en.synset( :verb )
142
- # => #<WordNet::Synset:0x4033f448 balance, equilibrate, equilibrize, equilibrise
143
- (verb): "bring into balance or equilibrium; "She has to balance work and her
144
- domestic duties"; "balance the two weights"" (derivations: 7, antonyms: 1,
145
- verbGroups: 2, hypernyms: 1, hyponyms: 5)>
146
-
147
- # Fetch the second noun sense
148
- "balance".en.synset( 2, :noun )
149
- # => #<WordNet::Synset:0x404ebb24 balance (noun): "a scale for weighing; depends
150
- on pull of gravity" (hypernyms: 1, hyponyms: 5)>
151
-
152
- # Fetch the second noun sense's hypernyms (more-general words, like a superclass)
153
- "balance".en.synset( 2, :noun ).hypernyms
154
- # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
155
- instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
156
- hyponyms: 2)>]
157
-
158
- # A simpler way of doing the same thing:
159
- "balance".en.hypernyms( 2, :noun )
160
- # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
161
- instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
162
- hyponyms: 2)>]
163
-
164
- # Fetch the first hypernym's hypernyms
165
- "balance".en.synset( 2, :noun ).hypernyms.first.hypernyms
166
- # => [#<WordNet::Synset:0x404c60b8 measuring instrument, measuring system,
167
- measuring device (noun): "instrument that shows the extent or amount or quantity
168
- or degree of something" (hypernyms: 1, hyponyms: 83)>]
169
-
170
- # Find the synset to which both the second noun sense of "balance" and the
171
- # default sense of "shovel" belong.
172
- ("balance".en.synset( 2, :noun ) | "shovel".en.synset)
173
- # => #<WordNet::Synset:0x40473da4 instrumentality, instrumentation (noun): "an
174
- artifact (or system of artifacts) that is instrumental in accomplishing some
175
- end" (derivations: 1, hypernyms: 1, hyponyms: 13)>
176
-
177
- # Fetch just the words for the other kinds of "instruments"
178
- "instrument".en.hyponyms.collect {|synset| synset.words}.flatten
179
- # => ["analyzer", "analyser", "cautery", "cauterant", "drafting instrument",
180
- "extractor", "instrument of execution", "instrument of punishment", "measuring
181
- instrument", "measuring system", "measuring device", "medical instrument",
182
- "navigational instrument", "optical instrument", "plotter", "scientific
183
- instrument", "sonograph", "surveying instrument", "surveyor's instrument",
184
- "tracer", "weapon", "arm", "weapon system", "whip"]
185
-
186
- There are many more WordNet methods supported – too many to list here. See the
187
- documentation for the complete list.
188
-
189
-
190
- == LinkParser Integration
191
-
192
- Another new feature in version 0.02 is integration with the Ruby version of the
193
- CMU Link Grammar Parser by Martin Chase. If you have the LinkParser module
194
- installed, you can create linkages from English sentences that let you query for
195
- parts of speech:
196
-
197
- # Test to see whether or not the link parser is loaded.
198
- Linguistics::EN.has_link_parser?
199
- # => true
200
-
201
- # Diagram the first linkage for a test sentence
202
- puts "he is a big dog".sentence.linkages.first.to_s
203
- +---O*---+
204
- | +--Ds--+
205
- +Ss+ | +-A-+
206
- | | | | |
207
- he is a big dog
208
-
209
- # Find the verb in the sentence
210
- "he is a big dog".en.sentence.verb.to_s
211
- # => "is"
212
-
213
- # Combined infinitive + LinkParser: Find the infinitive form of the verb of the
214
- given sentence.
215
- "he is a big dog".en.sentence.verb.infinitive
216
- # => "be"
217
-
218
- # Find the direct object of the sentence
219
- "he is a big dog".en.sentence.object.to_s
220
- # => "dog"
221
-
222
- # Look at the raw LinkParser::Word for the direct object of the sentence.
223
- "he is a big dog".en.sentence.object
224
- # => #<LinkParser::Word:0x403da0a0 @definition=[[{@A-}, Ds-, {@M+}, J-], [{@A-},
225
- Ds-, {@M+}, Os-], [{@A-}, Ds-, {@M+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-, {@M+},
226
- Ss+, R-], [{@A-}, Ds-, {@M+}, SIs-], [{@A-}, Ds-, {R+}, {Bs+}, J-], [{@A-}, Ds-,
227
- {R+}, {Bs+}, Os-], [{@A-}, Ds-, {R+}, {Bs+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-,
228
- {R+}, {Bs+}, Ss+, R-], [{@A-}, Ds-, {R+}, {Bs+}, SIs-]], @right=[], @suffix="",
229
- @left=[#<LinkParser::Connection:0x403da028 @rword=#<LinkParser::Word:0x403da0a0
230
- ...>, @lword=#<LinkParser::Word:0x403da0b4 @definition=[[Ss-, O+, {@MV+}], [Ss-,
231
- B-, {@MV+}], [Ss-, P+], [Ss-, AF-], [RS-, Bs-, O+, {@MV+}], [RS-, Bs-, B-,
232
- {@MV+}], [RS-, Bs-, P+], [RS-, Bs-, AF-], [{Q-}, SIs+, O+, {@MV+}], [{Q-}, SIs+,
233
- B-, {@MV+}], [{Q-}, SIs+, P+], [{Q-}, SIs+, AF-]],
234
- @right=[#<LinkParser::Connection:0x403da028 ...>], @suffix="", @left=[],
235
- @name="is", @position=1>, @subName="*", @name="O", @length=3>], @name="dog",
236
- @position=4>
237
-
238
- # Combine WordNet + LinkParser to find the definition of the direct object of
239
- # the sentence
240
- "he is a big dog".en.sentence.object.gloss
241
- # => "a member of the genus Canis (probably descended from the common wolf) that
242
- has been domesticated by man since prehistoric times; occurs in many breeds;
243
- \"the dog barked all night\""
244
-
245
-
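Note on the "Ordinal Numbers" examples in the removed README.english: the suffix rule behind outputs such as "5th" and "2004th" can be sketched in a few lines. This is an illustrative approximation, not the gem's code; `ordinal_sketch` is a hypothetical helper name, and the rule shown (11 through 13 always take "th", otherwise the last digit decides) is the standard English convention rather than anything taken from the diff.

```ruby
# Hypothetical helper for illustration; not the linguistics gem's #ordinal.
def ordinal_sketch( number )
  suffix =
    if (11..13).cover?( number.abs % 100 )
      "th"                      # 11th, 12th, 13th, 111th, ...
    else
      case number.abs % 10
      when 1 then "st"
      when 2 then "nd"
      when 3 then "rd"
      else "th"
      end
    end
  "#{number}#{suffix}"
end

puts ordinal_sketch( 5 )     # prints "5th"
puts ordinal_sketch( 2004 )  # prints "2004th"
puts ordinal_sketch( 23 )    # prints "23rd"
puts ordinal_sketch( 112 )   # prints "112th"
```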
data/rake/191_compat.rb DELETED
@@ -1,26 +0,0 @@
1
- # 1.9.1 fixes
2
-
3
-
4
- # Make Pathname compatible with 1.8.7 Pathname.
5
- unless Pathname.instance_methods.include?( :=~ )
6
- class Pathname
7
- def self::glob( *args ) # :yield: p
8
- args = args.collect {|p| p.to_s }
9
- if block_given?
10
- Dir.glob(*args) {|f| yield self.new(f) }
11
- else
12
- Dir.glob(*args).map {|f| self.new(f) }
13
- end
14
- end
15
-
16
- def =~( other )
17
- self.to_s =~ other
18
- end
19
-
20
- def to_str
21
- self.to_s
22
- end
23
- end
24
- end
25
-
26
-
data/rake/dependencies.rb DELETED
@@ -1,76 +0,0 @@
1
- #
2
- # Dependency-checking and Installation Rake Tasks
3
-
4
- #
5
-
6
- require 'rubygems/dependency_installer'
7
- require 'rubygems/source_index'
8
- require 'rubygems/requirement'
9
- require 'rubygems/doc_manager'
10
-
11
- ### Install the specified +gems+ if they aren't already installed.
12
- def install_gems( gems )
13
-
14
- defaults = Gem::DependencyInstaller::DEFAULT_OPTIONS.merge({
15
- :generate_rdoc => true,
16
- :generate_ri => true,
17
- :install_dir => Gem.dir,
18
- :format_executable => false,
19
- :test => false,
20
- :version => Gem::Requirement.default,
21
- })
22
-
23
- # Check for root
24
- if Process.euid != 0
25
- $stderr.puts "This probably won't work, as you aren't root, but I'll try anyway"
26
- end
27
-
28
- gemindex = Gem::SourceIndex.from_installed_gems
29
-
30
- gems.each do |gemname, reqstring|
31
- requirement = Gem::Requirement.new( reqstring )
32
- trace "requirement is: %p" % [ requirement ]
33
-
34
- trace "Searching for an installed #{gemname}..."
35
- specs = gemindex.find_name( gemname )
36
- trace "...found %d specs: %s" %
37
- [ specs.length, specs.collect {|s| "%s %s" % [s.name, s.version] }.join(', ') ]
38
-
39
- if spec = specs.find {|spec| requirement.satisfied_by?(spec.version) }
40
- log "Version %s of %s is already installed (needs %s); skipping..." %
41
- [ spec.version, spec.name, requirement ]
42
- next
43
- end
44
-
45
- rgv = Gem::Version.new( Gem::RubyGemsVersion )
46
- installer = nil
47
-
48
- log "Trying to install #{gemname.inspect} #{requirement}..."
49
- if rgv >= Gem::Version.new( '1.1.1' )
50
- installer = Gem::DependencyInstaller.new
51
- installer.install( gemname, requirement )
52
- else
53
- installer = Gem::DependencyInstaller.new( gemname )
54
- installer.install
55
- end
56
-
57
- installer.installed_gems.each do |spec|
58
- log "Installed: %s" % [ spec.full_name ]
59
- end
60
-
61
- end
62
- end
63
-
64
-
65
- ### Task: install runtime dependencies
66
- desc "Install runtime dependencies as gems"
67
- task :install_dependencies do
68
- install_gems( DEPENDENCIES )
69
- end
70
-
71
- ### Task: install gems for development tasks
72
- desc "Install development dependencies as gems"
73
- task :install_dev_dependencies do
74
- install_gems( DEVELOPMENT_DEPENDENCIES )
75
- end
76
-
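Note on the removed install_gems helper: its skip-if-installed check rests on Gem::Requirement#satisfied_by?, which can be exercised on its own. The sketch below assumes a made-up list of version strings in place of the installed-gem index (the removed code used Gem::SourceIndex, which later RubyGems releases dropped); `first_satisfying_version` is a hypothetical helper name.

```ruby
require 'rubygems'

# Hypothetical helper mirroring the check in install_gems: return the first
# version that satisfies the requirement string, or nil if none does.
def first_satisfying_version( reqstring, version_strings )
  requirement = Gem::Requirement.new( reqstring )
  version_strings
    .map {|v| Gem::Version.new( v ) }
    .find {|version| requirement.satisfied_by?( version ) }
end

installed = %w[0.0.4 1.0.5 1.0.9]   # made-up version list for illustration
match = first_satisfying_version( '>= 1.0.5', installed )
puts match ? "already installed: #{match}" : "needs install"
# prints "already installed: 1.0.5"
```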