moby 1.0.5 → 1.0.6

@@ -0,0 +1,413 @@
1
+ The Project Gutenberg Etext of Moby Word II by Grady Ward
2
+
3
+ Copyright laws are changing all over the world, be sure to check
4
+ the laws for your country before redistributing these files!!!
5
+
6
+ Please take a look at the important information in this header.
7
+ We encourage you to keep this file on your own disk, keeping an
8
+ electronic path open for the next readers. Do not remove this.
9
+
10
+ This should be the first thing seen when anyone opens the book.
11
+ Do not change or edit it without written permission. The words
12
+ are carefully chosen to provide users with the information they
13
+ need about what they can legally do with the texts.
14
+
15
+ **Welcome To The World of Free Plain Vanilla Electronic Texts**
16
+
17
+ **Etexts Readable By Both Humans and By Computers, Since 1971**
18
+
19
+ *These Etexts Prepared By Hundreds of Volunteers and Donations*
20
+
21
+ Information on contacting Project Gutenberg to get Etexts, and
22
+ further information is included below. We need your donations.
23
+
24
+ Presently, contributions are only being solicited from people in:
25
+ Texas, Nevada, Idaho, Montana, Wyoming, Colorado, South Dakota,
26
+ Iowa, Indiana, and Vermont. As the requirements for other states
27
+ are met, additions to this list will be made and fund raising will
28
+ begin in the additional states. These donations should be made to:
29
+
30
+ Project Gutenberg Literary Archive Foundation
31
+ PMB 113
32
+ 1739 University Ave.
33
+ Oxford, MS 38655
34
+
35
+ Title: Moby Word II
36
+
37
+ Author: Grady Ward, grady@gradyward.com
38
+
39
+ Release Date: May, 2002 [Etext #3201]
40
+
41
+ Edition: 1.0
42
+
43
+ The Project Gutenberg Etext of Moby Word II by Grady Ward
44
+ ******This file should be named mword10.zip******
45
+
46
+ Corrected EDITIONS of our etexts get a new NUMBER, mword11.zip
47
+ VERSIONS based on separate sources get new LETTER, mword10a.zip
48
+
49
+ This etext was prepared by Mike Pullen,
50
+ globaltraveler5565@yahoo.com.
51
+
52
+ Project Gutenberg Etexts are usually created from multiple editions,
53
+ all of which are in the Public Domain in the United States, unless a
54
+ copyright notice is included. Therefore, we usually do NOT keep any
55
+ of these books in compliance with any particular paper edition.
56
+
57
+ We are now trying to release all our books one year in advance
58
+ of the official release dates, leaving time for better editing.
59
+ Please be encouraged to send us error messages even years after
60
+ the official publication date.
61
+
62
+ Please note: neither this list nor its contents are final till
63
+ midnight of the last day of the month of any such announcement.
64
+ The official release date of all Project Gutenberg Etexts is at
65
+ Midnight, Central Time, of the last day of the stated month. A
66
+ preliminary version may often be posted for suggestion, comment
67
+ and editing by those who wish to do so.
68
+
69
+ Most people start at our sites at:
70
+ http://gutenberg.net/pg
71
+ http://promo.net/pg
72
+
73
+ Those of you who want to download our Etexts before announcment
74
+ can surf to them as follows, and just download by date; this is
75
+ also a good way to get them instantly upon announcement, as the
76
+ indexes our cataloguers produce obviously take a while after an
77
+ announcement goes out in the Project Gutenberg Newsletter.
78
+
79
+ http://metalab.unc.edu/pub/docs/books/gutenberg/etext01
80
+ or
81
+ ftp://metalab.unc.edu/pub/docs/books/gutenberg/etext01
82
+
83
+ Or /etext00, 99, 98, 97, 96, 95, 94, 93, 92, 92, 91 or 90
84
+
85
+ Just search by the first five letters of the filename you want,
86
+ as it appears in our Newsletters.
87
+
88
+ Information about Project Gutenberg (one page)
89
+
90
+ We produce about two million dollars for each hour we work. The
91
+ time it takes us, a rather conservative estimate, is fifty hours
92
+ to get any etext selected, entered, proofread, edited, copyright
93
+ searched and analyzed, the copyright letters written, etc. This
94
+ projected audience is one hundred million readers. If our value
95
+ per text is nominally estimated at one dollar then we produce $2
96
+ million dollars per hour this year as we release fifty new Etext
97
+ files per month, or 500 more Etexts in 2000 for a total of 3000+
98
+ If they reach just 1-2% of the world's population then the total
99
+ should reach over 300 billion Etexts given away by year's end.
100
+
101
+ The Goal of Project Gutenberg is to Give Away One Trillion Etext
102
+ Files by December 31, 2001. [10,000 x 100,000,000 = 1 Trillion]
103
+ This is ten thousand titles each to one hundred million readers,
104
+ which is only about 4% of the present number of computer users.
105
+
106
+ At our revised rates of production, we will reach only one-third
107
+ of that goal by the end of 2001, or about 3,333 Etexts unless we
108
+ manage to get some real funding.
109
+
110
+ Something is needed to create a future for Project Gutenberg for
111
+ the next 100 years.
112
+
113
+ We need your donations more than ever!
114
+
115
+ Presently, contributions are only being solicited from people in:
116
+ Texas, Nevada, Idaho, Montana, Wyoming, Colorado, South Dakota,
117
+ Iowa, Indiana, and Vermont. As the requirements for other states
118
+ are met, additions to this list will be made and fund raising will
119
+ begin in the additional states.
120
+
121
+ All donations should be made to the Project Gutenberg Literary
122
+ Archive Foundation and will be tax deductible to the extent
123
+ permitted by law.
124
+
125
+ Mail to:
126
+
127
+ Project Gutenberg Literary Archive Foundation
128
+ PMB 113
129
+ 1739 University Avenue
130
+ Oxford, MS 38655 [USA]
131
+
132
+ We are working with the Project Gutenberg Literary Archive
133
+ Foundation to build more stable support and ensure the
134
+ future of Project Gutenberg.
135
+
136
+ We need your donations more than ever!
137
+
138
+ You can get up to date donation information at:
139
+
140
+ http://www.gutenberg.net/donation.html
141
+
142
+ ***
143
+
144
+ You can always email directly to:
145
+
146
+ Michael S. Hart <hart@pobox.com>
147
+
148
+ hart@pobox.com forwards to hart@prairienet.org and archive.org
149
+ if your mail bounces from archive.org, I will still see it, if
150
+ it bounces from prairienet.org, better resend later on. . . .
151
+
152
+ We would prefer to send you this information by email.
153
+
154
+ Example command-line FTP session:
155
+
156
+ ftp metalab.unc.edu
157
+ login: anonymous
158
+ password: your@login
159
+ cd pub/docs/books/gutenberg
160
+ cd etext90 through etext99 or etext00 through etext01, etc.
161
+ dir [to see files]
162
+ get or mget [to get files. . .set bin for zip files]
163
+ GET GUTINDEX.?? [to get a year's listing of books, e.g.,
164
+ GUTINDEX.99]
165
+ GET GUTINDEX.ALL [to get a listing of ALL books]
166
+
167
+ **The Legal Small Print**
168
+
169
+ (Three Pages)
170
+
171
+ ***START**THE SMALL PRINT!**FOR PUBLIC DOMAIN ETEXTS**START***
172
+ Why is this "Small Print!" statement here? You know: lawyers.
173
+ They tell us you might sue us if there is something wrong with
174
+ your copy of this etext, even if you got it for free from
175
+ someone other than us, and even if what's wrong is not our
176
+ fault. So, among other things, this "Small Print!" statement
177
+ disclaims most of our liability to you. It also tells you how
178
+ you can distribute copies of this etext if you want to.
179
+
180
+ *BEFORE!* YOU USE OR READ THIS ETEXT
181
+ By using or reading any part of this PROJECT GUTENBERG-tm
182
+ etext, you indicate that you understand, agree to and accept
183
+ this "Small Print!" statement. If you do not, you can receive
184
+ a refund of the money (if any) you paid for this etext by
185
+ sending a request within 30 days of receiving it to the person
186
+ you got it from. If you received this etext on a physical
187
+ medium (such as a disk), you must return it with your request.
188
+
189
+ ABOUT PROJECT GUTENBERG-TM ETEXTS
190
+ This PROJECT GUTENBERG-tm etext, like most PROJECT GUTENBERG-tm
191
+ etexts,
192
+ is a "public domain" work distributed by Professor Michael S. Hart
193
+ through the Project Gutenberg Association (the "Project").
194
+ Among other things, this means that no one owns a United States
195
+ copyright
196
+ on or for this work, so the Project (and you!) can copy and
197
+ distribute it in the United States without permission and
198
+ without paying copyright royalties. Special rules, set forth
199
+ below, apply if you wish to copy and distribute this etext
200
+ under the Project's "PROJECT GUTENBERG" trademark.
201
+
202
+ To create these etexts, the Project expends considerable
203
+ efforts to identify, transcribe and proofread public domain
204
+ works. Despite these efforts, the Project's etexts and any
205
+ medium they may be on may contain "Defects". Among other
206
+ things, Defects may take the form of incomplete, inaccurate or
207
+ corrupt data, transcription errors, a copyright or other
208
+ intellectual property infringement, a defective or damaged
209
+ disk or other etext medium, a computer virus, or computer
210
+ codes that damage or cannot be read by your equipment.
211
+
212
+ LIMITED WARRANTY; DISCLAIMER OF DAMAGES
213
+ But for the "Right of Replacement or Refund" described below,
214
+ [1] the Project (and any other party you may receive this
215
+ etext from as a PROJECT GUTENBERG-tm etext) disclaims all
216
+ liability to you for damages, costs and expenses, including
217
+ legal fees, and [2] YOU HAVE NO REMEDIES FOR NEGLIGENCE OR
218
+ UNDER STRICT LIABILITY, OR FOR BREACH OF WARRANTY OR CONTRACT,
219
+ INCLUDING BUT NOT LIMITED TO INDIRECT, CONSEQUENTIAL, PUNITIVE
220
+ OR INCIDENTAL DAMAGES, EVEN IF YOU GIVE NOTICE OF THE
221
+ POSSIBILITY OF SUCH DAMAGES.
222
+
223
+ If you discover a Defect in this etext within 90 days of
224
+ receiving it, you can receive a refund of the money (if any)
225
+ you paid for it by sending an explanatory note within that
226
+ time to the person you received it from. If you received it
227
+ on a physical medium, you must return it with your note, and
228
+ such person may choose to alternatively give you a replacement
229
+ copy. If you received it electronically, such person may
230
+ choose to alternatively give you a second opportunity to
231
+ receive it electronically.
232
+
233
+ THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER
234
+ WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS
235
+ TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT
236
+ LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
237
+ PARTICULAR PURPOSE.
238
+
239
+ Some states do not allow disclaimers of implied warranties or
240
+ the exclusion or limitation of consequential damages, so the
241
+ above disclaimers and exclusions may not apply to you, and you
242
+ may have other legal rights.
243
+
244
+ INDEMNITY
245
+ You will indemnify and hold the Project, its directors,
246
+ officers, members and agents harmless from all liability, cost
247
+ and expense, including legal fees, that arise directly or
248
+ indirectly from any of the following that you do or cause:
249
+ [1] distribution of this etext, [2] alteration, modification,
250
+ or addition to the etext, or [3] any Defect.
251
+
252
+ DISTRIBUTION UNDER "PROJECT GUTENBERG-tm"
253
+ You may distribute copies of this etext electronically, or by
254
+ disk, book or any other medium if you either delete this
255
+ "Small Print!" and all other references to Project Gutenberg,
256
+ or:
257
+
258
+ [1] Only give exact copies of it. Among other things, this
259
+ requires that you do not remove, alter or modify the
260
+ etext or this "small print!" statement. You may however,
261
+ if you wish, distribute this etext in machine readable
262
+ binary, compressed, mark-up, or proprietary form,
263
+ including any form resulting from conversion by word pro-
264
+ cessing or hypertext software, but only so long as
265
+ *EITHER*:
266
+
267
+ [*] The etext, when displayed, is clearly readable, and
268
+ does *not* contain characters other than those
269
+ intended by the author of the work, although tilde
270
+ (~), asterisk (*) and underline (_) characters may
271
+ be used to convey punctuation intended by the
272
+ author, and additional characters may be used to
273
+ indicate hypertext links; OR
274
+
275
+ [*] The etext may be readily converted by the reader at
276
+ no expense into plain ASCII, EBCDIC or equivalent
277
+ form by the program that displays the etext (as is
278
+ the case, for instance, with most word processors);
279
+ OR
280
+
281
+ [*] You provide, or agree to also provide on request at
282
+ no additional cost, fee or expense, a copy of the
283
+ etext in its original plain ASCII form (or in EBCDIC
284
+ or other equivalent proprietary form).
285
+
286
+ [2] Honor the etext refund and replacement provisions of this
287
+ "Small Print!" statement.
288
+
289
+ [3] Pay a trademark license fee to the Project of 20% of the
290
+ gross profits you derive calculated using the method you
291
+ already use to calculate your applicable taxes. If you
292
+ don't derive profits, no royalty is due. Royalties are
293
+ payable to "Project Gutenberg Literary Archive Foundation"
294
+ the 60 days following each date you prepare (or were
295
+ legally required to prepare) your annual (or equivalent
296
+ periodic) tax return. Please contact us beforehand to
297
+ let us know your plans and to work out the details.
298
+
299
+ WHAT IF YOU *WANT* TO SEND MONEY EVEN IF YOU DON'T HAVE TO?
300
+ The Project gratefully accepts contributions of money, time,
301
+ public domain etexts, and royalty free copyright licenses.
302
+ If you are interested in contributing scanning equipment or
303
+ software or other items, please contact Michael Hart at:
304
+ hart@pobox.com
305
+
306
+ *END THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.07.00*END*
307
+
308
+
309
+
310
+
311
+ Moby (tm) Words II Documentation Notes
312
+
313
+ This documentation, the software and/or database are:
314
+
315
+ Public Domain material by grant from the author, January, 2001.
316
+
317
+
318
+ Moby (tm) Words II for the MSDOS operating system is compressed and
319
+ distributed as a single zip file. After extraction, the vocabulary
320
+ files included with this product are in ordinary ASCII format with
321
+ CRLF (ASCII 13/10) delimiters.
322
+
323
+
324
+
325
+
326
+ MOBY WORDS II CONTENTS
327
+
328
+ 6,213 acronyms (acronyms.txt)
329
+ common acronyms & abbreviations
330
+
331
+ 74,550 common dictionary words (common.txt)
332
+ A list of words in common with two or more published dictionaries.
333
+ This gives the developer of a custom spelling checker a good
334
+ beginning pool of relatively common words.
335
+
336
+ 256,772 compound words (compound.txt)
337
+ Over 256,700 hyphenated or other entries containing more than one
338
+ word as well as all capitalized words and acronyms. Phrases were
339
+ considered 'common' if they or variations of them occur in standard
340
+ dictionaries or thesauruses.
341
+
342
+ 113,809 official crosswords (crosswd.txt)
343
+ A list of words permitted in crossword games such as Scrabble(tm).
344
+ Compatible with the first edition of the Official Scrabble Players
345
+ Dictionary(tm). Since this list has all forms: -ing, -ed, -s, and so
346
+ on of words, it makes a good addition when building a custom spelling
347
+ dictionary.
348
+
349
+ 4,160 official crosswords delta (crswd-d.txt)
350
+ When combined with the 113,809 crosswords file, it produces the
351
+ official crossword list compatible with the second edition of the
352
+ Official Scrabble Players Dictionary. (Scrabble is a registered
353
+ trademark of Milton-Bradley licensed to Merriam-Webster.)
354
+
355
+ 467 current fiction substrings (fiction.txt)
356
+ The 467 most frequently occurring substrings in a
357
+ best-selling novel by Amy Tan in 1990.
358
+
359
+ 1,000 by frequency (freq.txt)
360
+ This file consists of the 1,000 most frequently used English words
361
+ from a wide variety of common texts listed in decreasing order of
362
+ frequency.
363
+
364
+ 1,000 by frequency internet (freq-int.txt)
365
+ This file consists of the 1,000 most frequently used English words
366
+ as used on the Internet computer network in 1992.
367
+
368
+ 1,185 King James Version frequent substrings (KJVfreq.txt)
369
+ The most frequently occurring 1,185 substrings in the King James
370
+ Version Bible ranked and counted by order of frequency.
371
+
372
+ 21,986 names (names.txt)
373
+ This database contains the most common names used in the United
374
+ States and Great Britain. Spelling checkers may want to supplement
375
+ their basic word list with this one.
376
+
377
+ 4,946 female names (names-f.txt)
378
+ frequent given names of females in English speaking countries
379
+
380
+ 3,897 male names (names-m.txt)
381
+ frequent given names of males in English speaking countries
382
+
383
+ 366 often misspelled words (oftenmis.txt)
384
+ many of the most commonly misspelled words in English speaking countries
385
+
386
+ 10,196 places (places.txt)
387
+ a large selection of place names in the United States
388
+
389
+ 354,984 single words (single.txt)
390
+ Over 354,000 single words, excluding proper names, acronyms, or
391
+ compound words and phrases. This list does not exclude archaic words
392
+ or significant variant spellings.
393
+
394
+ USA Constitution (usaconst.txt)
395
+ The Constitution of the United States, including the Bill of Rights
396
+ and all amendments current to 1993.
397
+
398
+ NOTE: Accents have been stripped from words, e.g., 'etude' does not
399
+ mark the accent on the initial 'e'.
400
+
401
+
402
+
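The contents list above says the vocabulary files are plain ASCII with CRLF delimiters, and that crswd-d.txt combined with crosswd.txt yields the second-edition crossword list. A minimal Ruby sketch of reading and combining such lists follows. The helper names (`parse_word_list`, `load_word_list`, `merge_word_lists`) are hypothetical and are not part of the moby gem; the sketch also assumes one entry per line.

```ruby
# Hypothetical helpers for illustration only; not part of the moby gem.

# Split raw file contents into entries. The documentation says lines end
# in CRLF, so the carriage return is treated as part of the delimiter.
def parse_word_list(raw)
  raw.split(/\r?\n/).map(&:strip).reject(&:empty?)
end

# Read a word list from disk in binary mode so that no newline
# translation happens before parsing.
def load_word_list(path)
  parse_word_list(File.read(path, mode: "rb"))
end

# Combine a base list with a delta list, dropping duplicates.
def merge_word_lists(base, delta)
  (base | delta).sort
end

# Usage, assuming the files were extracted into the current directory:
#   second_edition = merge_word_lists(load_word_list("crosswd.txt"),
#                                     load_word_list("crswd-d.txt"))
```

Reading in binary mode keeps the parsing step responsible for line endings, so the same code behaves identically on platforms that would otherwise translate CRLF on read.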
403
+ Quick Start
404
+ 1) Ensure you have at least 10 MB of free disk space to hold the contents
405
+ of this zip file.
406
+ 2) Create a directory to hold the files listed above.
407
+ 3) Extract the contents of this zip file into the destination directory
408
+ using any compatible zip file extraction utility.
409
+ 4) Delete the original zip file from your disk to save space. (optional)
410
+
411
+
412
+ End of this Project Gutenberg etext of Moby Word II by Grady Ward.
413
+
@@ -0,0 +1,47 @@
+ require 'spec_helper'
+
+ module Moby
+   describe Hyphenator do
+     let(:hyph) { Hyphenator.new }
+
+     describe "#hyphenate" do
+       def h(word)
+         hyph.hyphenate(word)
+       end
+
+       describe "swordcraft" do
+         specify { h("swordcraft").should == "sword-craft" }
+       end
+
+       describe "California poppy" do
+         specify { h("California poppy").should == "Cal-i-for-ni-a pop-py" }
+       end
+
+       describe "unstandardized" do
+         specify { h("unstandardized").should == "un-stand-ard-ized" }
+       end
+
+       describe "recede" do
+         specify { h("recede").should == "re-cede" }
+       end
+
+       describe "unlisted words are returned unchanged" do
+         describe "Dwalin" do
+           specify { h("Dwalin").should == "Dwalin" }
+         end
+
+         describe "Balin" do
+           specify { h("Balin").should == "Balin" }
+         end
+
+         describe "Kili" do
+           specify { h("Kili").should == "Kili" }
+         end
+
+         describe "Fili" do
+           specify { h("Fili").should == "Fili" }
+         end
+       end
+     end
+   end
+ end
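The spec above pins down two behaviours: listed entries come back with hyphens inserted, and unlisted words come back unchanged. One way to get that contract is a plain table lookup with a fallback, sketched below. This is an assumption about the approach, not the gem's actual `Moby::Hyphenator`; the class name `LookupHyphenator` and the in-memory `SAMPLE_TABLE` are hypothetical, and the sample entries are copied from the spec's expectations.

```ruby
# Hypothetical sketch; NOT the implementation shipped in the moby gem.
class LookupHyphenator
  # table maps a plain entry (word or phrase) to its hyphenated form.
  def initialize(table)
    @table = table
  end

  # Return the hyphenated form if the entry is listed,
  # otherwise return the input unchanged.
  def hyphenate(word)
    @table.fetch(word, word)
  end
end

# Sample data taken from the expectations in the spec above.
SAMPLE_TABLE = {
  "swordcraft"       => "sword-craft",
  "California poppy" => "Cal-i-for-ni-a pop-py",
  "recede"           => "re-cede"
}.freeze

hyph = LookupHyphenator.new(SAMPLE_TABLE)
puts hyph.hyphenate("recede")  # prints "re-cede"
puts hyph.hyphenate("Dwalin")  # prints "Dwalin" (unlisted, returned as-is)
```

Using `Hash#fetch` with a default keeps the unlisted-word case in the same code path as the listed one, which is what the "returned unchanged" examples exercise.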