moby 1.0.5 → 1.0.6

@@ -0,0 +1,413 @@
1
+ The Project Gutenberg Etext of Moby Word II by Grady Ward
2
+
3
+ Copyright laws are changing all over the world, be sure to check
4
+ the laws for your country before redistributing these files!!!
5
+
6
+ Please take a look at the important information in this header.
7
+ We encourage you to keep this file on your own disk, keeping an
8
+ electronic path open for the next readers. Do not remove this.
9
+
10
+ This should be the first thing seen when anyone opens the book.
11
+ Do not change or edit it without written permission. The words
12
+ are carefully chosen to provide users with the information they
13
+ need about what they can legally do with the texts.
14
+
15
+ **Welcome To The World of Free Plain Vanilla Electronic Texts**
16
+
17
+ **Etexts Readable By Both Humans and By Computers, Since 1971**
18
+
19
+ *These Etexts Prepared By Hundreds of Volunteers and Donations*
20
+
21
+ Information on contacting Project Gutenberg to get Etexts, and
22
+ further information is included below. We need your donations.
23
+
24
+ Presently, contributions are only being solicited from people in:
25
+ Texas, Nevada, Idaho, Montana, Wyoming, Colorado, South Dakota,
26
+ Iowa, Indiana, and Vermont. As the requirements for other states
27
+ are met, additions to this list will be made and fund raising will
28
+ begin in the additional states. These donations should be made to:
29
+
30
+ Project Gutenberg Literary Archive Foundation
31
+ PMB 113
32
+ 1739 University Ave.
33
+ Oxford, MS 38655
34
+
35
+ Title: Moby Word II
36
+
37
+ Author: Grady Ward, grady@gradyward.com
38
+
39
+ Release Date: May, 2002 [Etext #3201]
40
+
41
+ Edition: 1.0
42
+
43
+ The Project Gutenberg Etext of Moby Word II by Grady Ward
44
+ ******This file should be named mword10.zip******
45
+
46
+ Corrected EDITIONS of our etexts get a new NUMBER, mword11.zip
47
+ VERSIONS based on separate sources get new LETTER, mword10a.zip
48
+
49
+ This etext was prepared by Mike Pullen,
50
+ globaltraveler5565@yahoo.com.
51
+
52
+ Project Gutenberg Etexts are usually created from multiple editions,
53
+ all of which are in the Public Domain in the United States, unless a
54
+ copyright notice is included. Therefore, we usually do NOT keep any
55
+ of these books in compliance with any particular paper edition.
56
+
57
+ We are now trying to release all our books one year in advance
58
+ of the official release dates, leaving time for better editing.
59
+ Please be encouraged to send us error messages even years after
60
+ the official publication date.
61
+
62
+ Please note: neither this list nor its contents are final till
63
+ midnight of the last day of the month of any such announcement.
64
+ The official release date of all Project Gutenberg Etexts is at
65
+ Midnight, Central Time, of the last day of the stated month. A
66
+ preliminary version may often be posted for suggestion, comment
67
+ and editing by those who wish to do so.
68
+
69
+ Most people start at our sites at:
70
+ http://gutenberg.net/pg
71
+ http://promo.net/pg
72
+
73
+ Those of you who want to download our Etexts before announcement
74
+ can surf to them as follows, and just download by date; this is
75
+ also a good way to get them instantly upon announcement, as the
76
+ indexes our cataloguers produce obviously take a while after an
77
+ announcement goes out in the Project Gutenberg Newsletter.
78
+
79
+ http://metalab.unc.edu/pub/docs/books/gutenberg/etext01
80
+ or
81
+ ftp://metalab.unc.edu/pub/docs/books/gutenberg/etext01
82
+
83
+ Or /etext00, 99, 98, 97, 96, 95, 94, 93, 92, 92, 91 or 90
84
+
85
+ Just search by the first five letters of the filename you want,
86
+ as it appears in our Newsletters.
87
+
88
+ Information about Project Gutenberg (one page)
89
+
90
+ We produce about two million dollars for each hour we work. The
91
+ time it takes us, a rather conservative estimate, is fifty hours
92
+ to get any etext selected, entered, proofread, edited, copyright
93
+ searched and analyzed, the copyright letters written, etc. This
94
+ projected audience is one hundred million readers. If our value
95
+ per text is nominally estimated at one dollar then we produce $2
96
+ million dollars per hour this year as we release fifty new Etext
97
+ files per month, or 500 more Etexts in 2000 for a total of 3000+
98
+ If they reach just 1-2% of the world's population then the total
99
+ should reach over 300 billion Etexts given away by year's end.
100
+
101
+ The Goal of Project Gutenberg is to Give Away One Trillion Etext
102
+ Files by December 31, 2001. [10,000 x 100,000,000 = 1 Trillion]
103
+ This is ten thousand titles each to one hundred million readers,
104
+ which is only about 4% of the present number of computer users.
105
+
106
+ At our revised rates of production, we will reach only one-third
107
+ of that goal by the end of 2001, or about 3,333 Etexts unless we
108
+ manage to get some real funding.
109
+
110
+ Something is needed to create a future for Project Gutenberg for
111
+ the next 100 years.
112
+
113
+ We need your donations more than ever!
114
+
115
+ Presently, contributions are only being solicited from people in:
116
+ Texas, Nevada, Idaho, Montana, Wyoming, Colorado, South Dakota,
117
+ Iowa, Indiana, and Vermont. As the requirements for other states
118
+ are met, additions to this list will be made and fund raising will
119
+ begin in the additional states.
120
+
121
+ All donations should be made to the Project Gutenberg Literary
122
+ Archive Foundation and will be tax deductible to the extent
123
+ permitted by law.
124
+
125
+ Mail to:
126
+
127
+ Project Gutenberg Literary Archive Foundation
128
+ PMB 113
129
+ 1739 University Avenue
130
+ Oxford, MS 38655 [USA]
131
+
132
+ We are working with the Project Gutenberg Literary Archive
133
+ Foundation to build more stable support and ensure the
134
+ future of Project Gutenberg.
135
+
136
+ We need your donations more than ever!
137
+
138
+ You can get up to date donation information at:
139
+
140
+ http://www.gutenberg.net/donation.html
141
+
142
+ ***
143
+
144
+ You can always email directly to:
145
+
146
+ Michael S. Hart <hart@pobox.com>
147
+
148
+ hart@pobox.com forwards to hart@prairienet.org and archive.org
149
+ if your mail bounces from archive.org, I will still see it, if
150
+ it bounces from prairienet.org, better resend later on. . . .
151
+
152
+ We would prefer to send you this information by email.
153
+
154
+ Example command-line FTP session:
155
+
156
+ ftp metalab.unc.edu
157
+ login: anonymous
158
+ password: your@login
159
+ cd pub/docs/books/gutenberg
160
+ cd etext90 through etext99 or etext00 through etext01, etc.
161
+ dir [to see files]
162
+ get or mget [to get files. . .set bin for zip files]
163
+ GET GUTINDEX.?? [to get a year's listing of books, e.g.,
164
+ GUTINDEX.99]
165
+ GET GUTINDEX.ALL [to get a listing of ALL books]
166
+
167
+ **The Legal Small Print**
168
+
169
+ (Three Pages)
170
+
171
+ ***START**THE SMALL PRINT!**FOR PUBLIC DOMAIN ETEXTS**START***
172
+ Why is this "Small Print!" statement here? You know: lawyers.
173
+ They tell us you might sue us if there is something wrong with
174
+ your copy of this etext, even if you got it for free from
175
+ someone other than us, and even if what's wrong is not our
176
+ fault. So, among other things, this "Small Print!" statement
177
+ disclaims most of our liability to you. It also tells you how
178
+ you can distribute copies of this etext if you want to.
179
+
180
+ *BEFORE!* YOU USE OR READ THIS ETEXT
181
+ By using or reading any part of this PROJECT GUTENBERG-tm
182
+ etext, you indicate that you understand, agree to and accept
183
+ this "Small Print!" statement. If you do not, you can receive
184
+ a refund of the money (if any) you paid for this etext by
185
+ sending a request within 30 days of receiving it to the person
186
+ you got it from. If you received this etext on a physical
187
+ medium (such as a disk), you must return it with your request.
188
+
189
+ ABOUT PROJECT GUTENBERG-TM ETEXTS
190
+ This PROJECT GUTENBERG-tm etext, like most PROJECT GUTENBERG-tm
191
+ etexts,
192
+ is a "public domain" work distributed by Professor Michael S. Hart
193
+ through the Project Gutenberg Association (the "Project").
194
+ Among other things, this means that no one owns a United States
195
+ copyright
196
+ on or for this work, so the Project (and you!) can copy and
197
+ distribute it in the United States without permission and
198
+ without paying copyright royalties. Special rules, set forth
199
+ below, apply if you wish to copy and distribute this etext
200
+ under the Project's "PROJECT GUTENBERG" trademark.
201
+
202
+ To create these etexts, the Project expends considerable
203
+ efforts to identify, transcribe and proofread public domain
204
+ works. Despite these efforts, the Project's etexts and any
205
+ medium they may be on may contain "Defects". Among other
206
+ things, Defects may take the form of incomplete, inaccurate or
207
+ corrupt data, transcription errors, a copyright or other
208
+ intellectual property infringement, a defective or damaged
209
+ disk or other etext medium, a computer virus, or computer
210
+ codes that damage or cannot be read by your equipment.
211
+
212
+ LIMITED WARRANTY; DISCLAIMER OF DAMAGES
213
+ But for the "Right of Replacement or Refund" described below,
214
+ [1] the Project (and any other party you may receive this
215
+ etext from as a PROJECT GUTENBERG-tm etext) disclaims all
216
+ liability to you for damages, costs and expenses, including
217
+ legal fees, and [2] YOU HAVE NO REMEDIES FOR NEGLIGENCE OR
218
+ UNDER STRICT LIABILITY, OR FOR BREACH OF WARRANTY OR CONTRACT,
219
+ INCLUDING BUT NOT LIMITED TO INDIRECT, CONSEQUENTIAL, PUNITIVE
220
+ OR INCIDENTAL DAMAGES, EVEN IF YOU GIVE NOTICE OF THE
221
+ POSSIBILITY OF SUCH DAMAGES.
222
+
223
+ If you discover a Defect in this etext within 90 days of
224
+ receiving it, you can receive a refund of the money (if any)
225
+ you paid for it by sending an explanatory note within that
226
+ time to the person you received it from. If you received it
227
+ on a physical medium, you must return it with your note, and
228
+ such person may choose to alternatively give you a replacement
229
+ copy. If you received it electronically, such person may
230
+ choose to alternatively give you a second opportunity to
231
+ receive it electronically.
232
+
233
+ THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER
234
+ WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS
235
+ TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT
236
+ LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
237
+ PARTICULAR PURPOSE.
238
+
239
+ Some states do not allow disclaimers of implied warranties or
240
+ the exclusion or limitation of consequential damages, so the
241
+ above disclaimers and exclusions may not apply to you, and you
242
+ may have other legal rights.
243
+
244
+ INDEMNITY
245
+ You will indemnify and hold the Project, its directors,
246
+ officers, members and agents harmless from all liability, cost
247
+ and expense, including legal fees, that arise directly or
248
+ indirectly from any of the following that you do or cause:
249
+ [1] distribution of this etext, [2] alteration, modification,
250
+ or addition to the etext, or [3] any Defect.
251
+
252
+ DISTRIBUTION UNDER "PROJECT GUTENBERG-tm"
253
+ You may distribute copies of this etext electronically, or by
254
+ disk, book or any other medium if you either delete this
255
+ "Small Print!" and all other references to Project Gutenberg,
256
+ or:
257
+
258
+ [1] Only give exact copies of it. Among other things, this
259
+ requires that you do not remove, alter or modify the
260
+ etext or this "small print!" statement. You may however,
261
+ if you wish, distribute this etext in machine readable
262
+ binary, compressed, mark-up, or proprietary form,
263
+ including any form resulting from conversion by word pro-
264
+ cessing or hypertext software, but only so long as
265
+ *EITHER*:
266
+
267
+ [*] The etext, when displayed, is clearly readable, and
268
+ does *not* contain characters other than those
269
+ intended by the author of the work, although tilde
270
+ (~), asterisk (*) and underline (_) characters may
271
+ be used to convey punctuation intended by the
272
+ author, and additional characters may be used to
273
+ indicate hypertext links; OR
274
+
275
+ [*] The etext may be readily converted by the reader at
276
+ no expense into plain ASCII, EBCDIC or equivalent
277
+ form by the program that displays the etext (as is
278
+ the case, for instance, with most word processors);
279
+ OR
280
+
281
+ [*] You provide, or agree to also provide on request at
282
+ no additional cost, fee or expense, a copy of the
283
+ etext in its original plain ASCII form (or in EBCDIC
284
+ or other equivalent proprietary form).
285
+
286
+ [2] Honor the etext refund and replacement provisions of this
287
+ "Small Print!" statement.
288
+
289
+ [3] Pay a trademark license fee to the Project of 20% of the
290
+ gross profits you derive calculated using the method you
291
+ already use to calculate your applicable taxes. If you
292
+ don't derive profits, no royalty is due. Royalties are
293
+ payable to "Project Gutenberg Literary Archive Foundation"
294
+ the 60 days following each date you prepare (or were
295
+ legally required to prepare) your annual (or equivalent
296
+ periodic) tax return. Please contact us beforehand to
297
+ let us know your plans and to work out the details.
298
+
299
+ WHAT IF YOU *WANT* TO SEND MONEY EVEN IF YOU DON'T HAVE TO?
300
+ The Project gratefully accepts contributions of money, time,
301
+ public domain etexts, and royalty free copyright licenses.
302
+ If you are interested in contributing scanning equipment or
303
+ software or other items, please contact Michael Hart at:
304
+ hart@pobox.com
305
+
306
+ *END THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.07.00*END*
307
+
308
+
309
+
310
+
+ Moby (tm) Words II Documentation Notes
+
+ This documentation, the software and/or database are:
+
+ Public Domain material by grant from the author, January, 2001.
+
+
+ Moby (tm) Words II for the MSDOS operating system is compressed and
+ distributed as a single zip file. After extraction, the vocabulary
+ files included with this product are in ordinary ASCII format with
+ CRLF (ASCII 13/10) delimiters.
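
Because the lists use CRLF line endings, a reader on a non-DOS system has to strip the trailing carriage return from every entry. A minimal sketch of one way to do that in Ruby follows; `parse_word_list` is a hypothetical helper name, not something shipped with the gem or the zip file.

```ruby
# Hypothetical helper: split the raw contents of a Moby word list into entries.
# String#chomp removes a trailing "\r\n", "\n" or "\r", so CRLF and LF files
# are both handled. Blank lines are dropped.
def parse_word_list(raw)
  raw.each_line.map(&:chomp).reject(&:empty?)
end

sample = "aardvark\r\nabacus\r\n"
parse_word_list(sample) # => ["aardvark", "abacus"]
```

For a real file, something like `parse_word_list(File.binread("common.txt"))` would apply the same logic. Reading in binary mode is a design choice here: it leaves the CR bytes in place on every platform, so the `chomp` is what removes them.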
322
+
323
+
324
+
325
+
+ MOBY WORDS II CONTENTS
+
+ 6,213 acronyms (acronyms.txt)
+ common acronyms & abbreviations
+
+ 74,550 common dictionary words (common.txt)
+ A list of words in common with two or more published dictionaries.
+ This gives the developer of a custom spelling checker a good
+ beginning pool of relatively common words.
+
+ 256,772 compound words (compound.txt)
+ Over 256,700 hyphenated or other entries containing more than one
+ word as well as all capitalized words and acronyms. Phrases were
+ considered 'common' if they or variations of them occur in standard
+ dictionaries or thesauruses.
+
+ 113,809 official crosswords (crosswd.txt)
+ A list of words permitted in crossword games such as Scrabble(tm).
+ Compatible with the first edition of the Official Scrabble Players
+ Dictionary(tm). Since this list has all forms: -ing, -ed, -s, and so
+ on of words, it makes a good addition when building a custom spelling
+ dictionary.
+
+ 4,160 official crosswords delta (crswd-d.txt)
+ When combined with the 113,809 crosswords file, it produces the
+ official crossword list compatible with the second edition of the
+ Official Scrabble Players Dictionary. (Scrabble is a registered
+ trademark of Milton-Bradley licensed to Merriam-Webster.)
+
+ 467 current fiction substrings (fiction.txt)
+ The 467 most frequently occurring substrings in a best-selling
+ novel by Amy Tan in 1990.
+
+ 1,000 by frequency (freq.txt)
+ This file consists of the 1,000 most frequently used English words
+ from a wide variety of common texts listed in decreasing order of
+ frequency.
+
+ 1,000 by frequency internet (freq-int.txt)
+ This file consists of the 1,000 most frequently used English words
+ as used on the Internet computer network in 1992.
+
+ 1,185 King James Version frequent substrings (KJVfreq.txt)
+ The most frequently occurring 1,185 substrings in the King James
+ Version Bible ranked and counted by order of frequency.
+
+ 21,986 names (names.txt)
+ This database contains the most common names used in the United
+ States and Great Britain. Spelling checkers may want to supplement
+ their basic word list with this one.
+
+ 4,946 female names (names-f.txt)
+ frequent given names of females in English speaking countries
+
+ 3,897 male names (names-m.txt)
+ frequent given names of males in English speaking countries
+
+ 366 often misspelled words (oftenmis.txt)
+ many of the most commonly misspelled words in English speaking countries
+
+ 10,196 places (places.txt)
+ a large selection of place names in the United States
+
+ 354,984 single words (single.txt)
+ Over 354,000 single words, excluding proper names, acronyms, or
+ compound words and phrases. This list does not exclude archaic words
+ or significant variant spellings.
+
+ USA Constitution (usaconst.txt)
+ The Constitution of the United States, including the Bill of Rights
+ and all amendments current to 1993.
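
The crswd-d.txt entry above says the delta is meant to be combined with crosswd.txt to obtain the second-edition list. A minimal sketch of that combination, assuming the delta is purely additive (the notes do not say whether it also removes words); `merge_word_lists` is a hypothetical name and the sample words are made up for illustration.

```ruby
# Hypothetical helper: combine a base word list with a delta list.
# Assumes the delta only adds words; duplicates are collapsed and the
# result is sorted so it can be searched or diffed easily.
def merge_word_lists(base, delta)
  (base + delta).uniq.sort
end

base  = %w[aa aah aahed]   # stand-in for entries from crosswd.txt
delta = %w[aal aah]        # stand-in for entries from crswd-d.txt
merge_word_lists(base, delta) # => ["aa", "aah", "aahed", "aal"]
```

The same helper could be used to pool common.txt, crosswd.txt and names.txt into a starting dictionary for a spelling checker, as the notes suggest.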
+
+ NOTE: Accents have been stripped from words, e.g., 'etude' does not
+ mark the accent on the initial 'e'.
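
Since the lists carry no accents, an accented query would need to be folded to plain ASCII before a lookup. One possible way to do that in Ruby is sketched below; `strip_accents` is a hypothetical helper, and the approach (Unicode decomposition followed by removal of combining marks) is an assumption about how a caller might do this, not a description of how the lists were produced.

```ruby
# Hypothetical helper: fold an accented word to the unaccented form
# used in the Moby lists.
def strip_accents(word)
  # NFD decomposition splits a precomposed letter into a base letter plus a
  # combining mark; \p{Mn} matches those nonspacing marks so they can be removed.
  word.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

strip_accents("\u00e9tude") # => "etude"
```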
400
+
401
+
402
+
+ Quick Start
+ 1) Ensure you have at least 10 MB of free disk space to hold the contents
+ of this zip file.
+ 2) Create a directory to hold the files listed above.
+ 3) Extract the contents of this zip file into the destination directory
+ using any compatible zip file extraction utility.
+ 4) Delete the original zip file from your disk to save space. (optional)
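
After step 3 it can be useful to confirm that every file named in the contents list was actually extracted. A minimal sketch of such a check follows; `MOBY_FILES` and `missing_moby_files` are hypothetical names, and the case-insensitive comparison is an assumption made because DOS-era tools may change the case of file names.

```ruby
# File names as given in the contents list of the documentation notes.
MOBY_FILES = %w[
  acronyms.txt common.txt compound.txt crosswd.txt crswd-d.txt fiction.txt
  freq.txt freq-int.txt KJVfreq.txt names.txt names-f.txt names-m.txt
  oftenmis.txt places.txt single.txt usaconst.txt
].freeze

# Hypothetical helper: given the file names present in a directory, return
# the expected Moby files that are missing (compared case-insensitively).
def missing_moby_files(present)
  have = present.map(&:downcase)
  MOBY_FILES.reject { |name| have.include?(name.downcase) }
end
```

With the files extracted into a directory called, say, `moby`, `missing_moby_files(Dir.children("moby"))` would return an empty array when nothing is missing.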
410
+
411
+
412
+ End of this Project Gutenberg etext of Moby Word II by Grady Ward.
413
+
@@ -0,0 +1,47 @@
+ require 'spec_helper'
+
+ module Moby
+   describe Hyphenator do
+     let(:hyph) { Hyphenator.new }
+
+     describe "#hyphenate" do
+       def h(word)
+         hyph.hyphenate(word)
+       end
+
+       describe "swordcraft" do
+         specify { h("swordcraft").should == "sword-craft" }
+       end
+
+       describe "California poppy" do
+         specify { h("California poppy").should == "Cal-i-for-ni-a pop-py" }
+       end
+
+       describe "unstandardized" do
+         specify { h("unstandardized").should == "un-stand-ard-ized" }
+       end
+
+       describe "recede" do
+         specify { h("recede").should == "re-cede" }
+       end
+
+       describe "unlisted words are returned unchanged" do
+         describe "Dwalin" do
+           specify { h("Dwalin").should == "Dwalin" }
+         end
+
+         describe "Balin" do
+           specify { h("Balin").should == "Balin" }
+         end
+
+         describe "Kili" do
+           specify { h("Kili").should == "Kili" }
+         end
+
+         describe "Fili" do
+           specify { h("Fili").should == "Fili" }
+         end
+       end
+     end
+   end
+ end
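
The spec above pins down two behaviours: listed words come back with hyphens inserted, and unlisted words come back unchanged. The gem's actual `Hyphenator` is not shown in this diff, so the following is only a sketch of a class that would satisfy that contract. `SketchHyphenator` and its explicit lookup table are hypothetical; the real class is constructed with no arguments and presumably loads its own data.

```ruby
# Hypothetical stand-in for the behaviour the spec describes; not the gem's code.
class SketchHyphenator
  # `table` maps a word or phrase to its hyphenated form.
  def initialize(table)
    @table = table
  end

  # Return the hyphenated form, or the input unchanged when it is not listed.
  def hyphenate(word)
    @table.fetch(word, word)
  end
end

hyph = SketchHyphenator.new({
  "swordcraft" => "sword-craft",
  "recede"     => "re-cede"
})
hyph.hyphenate("recede") # => "re-cede"
hyph.hyphenate("Dwalin") # => "Dwalin"
```

Using `Hash#fetch` with a default keeps the fallback for unlisted words in one place, which is the case the last four examples in the spec exercise.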