moby 1.0.1beta

@@ -0,0 +1,412 @@
1
+ The Project Gutenberg Etext of Moby Part-of-Speech II by Grady Ward
2
+
3
+ Copyright laws are changing all over the world, be sure to check
4
+ the laws for your country before redistributing these files!!!
5
+
6
+ Please take a look at the important information in this header.
7
+ We encourage you to keep this file on your own disk, keeping an
8
+ electronic path open for the next readers. Do not remove this.
9
+
10
+ This should be the first thing seen when anyone opens the book.
11
+ Do not change or edit it without written permission. The words
12
+ are carefully chosen to provide users with the information they
13
+ need about what they can legally do with the texts.
14
+
15
+ **Welcome To The World of Free Plain Vanilla Electronic Texts**
16
+
17
+ **Etexts Readable By Both Humans and By Computers, Since 1971**
18
+
19
+ *These Etexts Prepared By Hundreds of Volunteers and Donations*
20
+
21
+ Information on contacting Project Gutenberg to get Etexts, and
22
+ further information is included below. We need your donations.
23
+
24
+ Presently, contributions are only being solicited from people in:
25
+ Texas, Nevada, Idaho, Montana, Wyoming, Colorado, South Dakota,
26
+ Iowa, Indiana, and Vermont. As the requirements for other states
27
+ are met, additions to this list will be made and fund raising will
28
+ begin in the additional states. These donations should be made to:
29
+
30
+ Project Gutenberg Literary Archive Foundation
31
+ PMB 113
32
+ 1739 University Ave.
33
+ Oxford, MS 38655
34
+
35
+ Title: Moby Part-of-Speech II
36
+
37
+ Author: Grady Ward, grady@gradyward.com
38
+
39
+ Release Date: May, 2002 [Etext #3203]
40
+
41
+ Edition: 1.0
42
+
43
+ The Project Gutenberg Etext of Moby Part-of-Speech II by Grady Ward
44
+ ******This file should be named mposp10.zip******
45
+
46
+ Corrected EDITIONS of our etexts get a new NUMBER, mposp11.zip
47
+ VERSIONS based on separate sources get new LETTER, mposp10a.zip
48
+
49
+ This etext was prepared by Mike Pullen,
50
+ globaltraveler5565@yahoo.com.
51
+
52
+ Project Gutenberg Etexts are usually created from multiple editions,
53
+ all of which are in the Public Domain in the United States, unless a
54
+ copyright notice is included. Therefore, we usually do NOT keep any
55
+ of these books in compliance with any particular paper edition.
56
+
57
+ We are now trying to release all our books one year in advance
58
+ of the official release dates, leaving time for better editing.
59
+ Please be encouraged to send us error messages even years after
60
+ the official publication date.
61
+
62
+ Please note: neither this list nor its contents are final till
63
+ midnight of the last day of the month of any such announcement.
64
+ The official release date of all Project Gutenberg Etexts is at
65
+ Midnight, Central Time, of the last day of the stated month. A
66
+ preliminary version may often be posted for suggestion, comment
67
+ and editing by those who wish to do so.
68
+
69
+ Most people start at our sites at:
70
+ http://gutenberg.net/pg
71
+ http://promo.net/pg
72
+
73
+ Those of you who want to download our Etexts before announcement
74
+ can surf to them as follows, and just download by date; this is
75
+ also a good way to get them instantly upon announcement, as the
76
+ indexes our cataloguers produce obviously take a while after an
77
+ announcement goes out in the Project Gutenberg Newsletter.
78
+
79
+ http://metalab.unc.edu/pub/docs/books/gutenberg/etext01
80
+ or
81
+ ftp://metalab.unc.edu/pub/docs/books/gutenberg/etext01
82
+
83
+ Or /etext00, 99, 98, 97, 96, 95, 94, 93, 92, 91 or 90
84
+
85
+ Just search by the first five letters of the filename you want,
86
+ as it appears in our Newsletters.
87
+
88
+ Information about Project Gutenberg (one page)
89
+
90
+ We produce about two million dollars for each hour we work. The
91
+ time it takes us, a rather conservative estimate, is fifty hours
92
+ to get any etext selected, entered, proofread, edited, copyright
93
+ searched and analyzed, the copyright letters written, etc. This
94
+ projected audience is one hundred million readers. If our value
95
+ per text is nominally estimated at one dollar then we produce $2
96
+ million per hour this year as we release fifty new Etext
97
+ files per month, or 500 more Etexts in 2000 for a total of 3000+.
98
+ If they reach just 1-2% of the world's population then the total
99
+ should reach over 300 billion Etexts given away by year's end.
100
+
101
+ The Goal of Project Gutenberg is to Give Away One Trillion Etext
102
+ Files by December 31, 2001. [10,000 x 100,000,000 = 1 Trillion]
103
+ This is ten thousand titles each to one hundred million readers,
104
+ which is only about 4% of the present number of computer users.
105
+
106
+ At our revised rates of production, we will reach only one-third
107
+ of that goal by the end of 2001, or about 3,333 Etexts unless we
108
+ manage to get some real funding.
109
+
110
+ Something is needed to create a future for Project Gutenberg for
111
+ the next 100 years.
112
+
113
+ We need your donations more than ever!
114
+
121
+ All donations should be made to the Project Gutenberg Literary
122
+ Archive Foundation and will be tax deductible to the extent
123
+ permitted by law.
124
+
125
+ Mail to:
126
+
127
+ Project Gutenberg Literary Archive Foundation
128
+ PMB 113
129
+ 1739 University Avenue
130
+ Oxford, MS 38655 [USA]
131
+
132
+ We are working with the Project Gutenberg Literary Archive
133
+ Foundation to build more stable support and ensure the
134
+ future of Project Gutenberg.
135
+
138
+ You can get up-to-date donation information at:
139
+
140
+ http://www.gutenberg.net/donation.html
141
+
142
+ ***
143
+
144
+ You can always email directly to:
145
+
146
+ Michael S. Hart <hart@pobox.com>
147
+
148
+ hart@pobox.com forwards to hart@prairienet.org and archive.org.
148
+ If your mail bounces from archive.org, I will still see it; if
149
+ it bounces from prairienet.org, it is better to resend later.
151
+
152
+ We would prefer to send you this information by email.
153
+
154
+ Example command-line FTP session:
155
+
156
+ ftp metalab.unc.edu
157
+ login: anonymous
158
+ password: your@login
159
+ cd pub/docs/books/gutenberg
160
+ cd etext90 through etext99 or etext00 through etext01, etc.
161
+ dir [to see files]
162
+ get or mget [to get files. . .set bin for zip files]
163
+ GET GUTINDEX.?? [to get a year's listing of books, e.g.,
164
+ GUTINDEX.99]
165
+ GET GUTINDEX.ALL [to get a listing of ALL books]
166
+
167
+ **The Legal Small Print**
168
+
169
+ (Three Pages)
170
+
171
+ ***START**THE SMALL PRINT!**FOR PUBLIC DOMAIN ETEXTS**START***
172
+ Why is this "Small Print!" statement here? You know: lawyers.
173
+ They tell us you might sue us if there is something wrong with
174
+ your copy of this etext, even if you got it for free from
175
+ someone other than us, and even if what's wrong is not our
176
+ fault. So, among other things, this "Small Print!" statement
177
+ disclaims most of our liability to you. It also tells you how
178
+ you can distribute copies of this etext if you want to.
179
+
180
+ *BEFORE!* YOU USE OR READ THIS ETEXT
181
+ By using or reading any part of this PROJECT GUTENBERG-tm
182
+ etext, you indicate that you understand, agree to and accept
183
+ this "Small Print!" statement. If you do not, you can receive
184
+ a refund of the money (if any) you paid for this etext by
185
+ sending a request within 30 days of receiving it to the person
186
+ you got it from. If you received this etext on a physical
187
+ medium (such as a disk), you must return it with your request.
188
+
189
+ ABOUT PROJECT GUTENBERG-TM ETEXTS
190
+ This PROJECT GUTENBERG-tm etext, like most PROJECT GUTENBERG-tm
191
+ etexts,
192
+ is a "public domain" work distributed by Professor Michael S. Hart
193
+ through the Project Gutenberg Association (the "Project").
194
+ Among other things, this means that no one owns a United States
195
+ copyright
196
+ on or for this work, so the Project (and you!) can copy and
197
+ distribute it in the United States without permission and
198
+ without paying copyright royalties. Special rules, set forth
199
+ below, apply if you wish to copy and distribute this etext
200
+ under the Project's "PROJECT GUTENBERG" trademark.
201
+
202
+ To create these etexts, the Project expends considerable
203
+ efforts to identify, transcribe and proofread public domain
204
+ works. Despite these efforts, the Project's etexts and any
205
+ medium they may be on may contain "Defects". Among other
206
+ things, Defects may take the form of incomplete, inaccurate or
207
+ corrupt data, transcription errors, a copyright or other
208
+ intellectual property infringement, a defective or damaged
209
+ disk or other etext medium, a computer virus, or computer
210
+ codes that damage or cannot be read by your equipment.
211
+
212
+ LIMITED WARRANTY; DISCLAIMER OF DAMAGES
213
+ But for the "Right of Replacement or Refund" described below,
214
+ [1] the Project (and any other party you may receive this
215
+ etext from as a PROJECT GUTENBERG-tm etext) disclaims all
216
+ liability to you for damages, costs and expenses, including
217
+ legal fees, and [2] YOU HAVE NO REMEDIES FOR NEGLIGENCE OR
218
+ UNDER STRICT LIABILITY, OR FOR BREACH OF WARRANTY OR CONTRACT,
219
+ INCLUDING BUT NOT LIMITED TO INDIRECT, CONSEQUENTIAL, PUNITIVE
220
+ OR INCIDENTAL DAMAGES, EVEN IF YOU GIVE NOTICE OF THE
221
+ POSSIBILITY OF SUCH DAMAGES.
222
+
223
+ If you discover a Defect in this etext within 90 days of
224
+ receiving it, you can receive a refund of the money (if any)
225
+ you paid for it by sending an explanatory note within that
226
+ time to the person you received it from. If you received it
227
+ on a physical medium, you must return it with your note, and
228
+ such person may choose to alternatively give you a replacement
229
+ copy. If you received it electronically, such person may
230
+ choose to alternatively give you a second opportunity to
231
+ receive it electronically.
232
+
233
+ THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER
234
+ WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS
235
+ TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT
236
+ LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
237
+ PARTICULAR PURPOSE.
238
+
239
+ Some states do not allow disclaimers of implied warranties or
240
+ the exclusion or limitation of consequential damages, so the
241
+ above disclaimers and exclusions may not apply to you, and you
242
+ may have other legal rights.
243
+
244
+ INDEMNITY
245
+ You will indemnify and hold the Project, its directors,
246
+ officers, members and agents harmless from all liability, cost
247
+ and expense, including legal fees, that arise directly or
248
+ indirectly from any of the following that you do or cause:
249
+ [1] distribution of this etext, [2] alteration, modification,
250
+ or addition to the etext, or [3] any Defect.
251
+
252
+ DISTRIBUTION UNDER "PROJECT GUTENBERG-tm"
253
+ You may distribute copies of this etext electronically, or by
254
+ disk, book or any other medium if you either delete this
255
+ "Small Print!" and all other references to Project Gutenberg,
256
+ or:
257
+
258
+ [1] Only give exact copies of it. Among other things, this
259
+ requires that you do not remove, alter or modify the
260
+ etext or this "small print!" statement. You may however,
261
+ if you wish, distribute this etext in machine readable
262
+ binary, compressed, mark-up, or proprietary form,
263
+ including any form resulting from conversion by word pro-
264
+ cessing or hypertext software, but only so long as
265
+ *EITHER*:
266
+
267
+ [*] The etext, when displayed, is clearly readable, and
268
+ does *not* contain characters other than those
269
+ intended by the author of the work, although tilde
270
+ (~), asterisk (*) and underline (_) characters may
271
+ be used to convey punctuation intended by the
272
+ author, and additional characters may be used to
273
+ indicate hypertext links; OR
274
+
275
+ [*] The etext may be readily converted by the reader at
276
+ no expense into plain ASCII, EBCDIC or equivalent
277
+ form by the program that displays the etext (as is
278
+ the case, for instance, with most word processors);
279
+ OR
280
+
281
+ [*] You provide, or agree to also provide on request at
282
+ no additional cost, fee or expense, a copy of the
283
+ etext in its original plain ASCII form (or in EBCDIC
284
+ or other equivalent proprietary form).
285
+
286
+ [2] Honor the etext refund and replacement provisions of this
287
+ "Small Print!" statement.
288
+
289
+ [3] Pay a trademark license fee to the Project of 20% of the
290
+ gross profits you derive calculated using the method you
291
+ already use to calculate your applicable taxes. If you
292
+ don't derive profits, no royalty is due. Royalties are
293
+ payable to "Project Gutenberg Literary Archive Foundation"
294
+ within the 60 days following each date you prepare (or were
295
+ legally required to prepare) your annual (or equivalent
296
+ periodic) tax return. Please contact us beforehand to
297
+ let us know your plans and to work out the details.
298
+
299
+ WHAT IF YOU *WANT* TO SEND MONEY EVEN IF YOU DON'T HAVE TO?
300
+ The Project gratefully accepts contributions of money, time,
301
+ public domain etexts, and royalty free copyright licenses.
302
+ If you are interested in contributing scanning equipment or
303
+ software or other items, please contact Michael Hart at:
304
+ hart@pobox.com
305
+
306
+ *END THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.07.00*END*
307
+
308
+
309
+
310
+
311
+ Moby (tm) Part-of-Speech II Documentation Notes
312
+
313
+ This documentation, the software and/or database are:
314
+
315
+ Public Domain material by grant from the author, January, 2001.
316
+
317
+
318
+ Moby (tm) Part-of-Speech II for MS-DOS operating systems is compressed
319
+ and distributed as a single zip file. After decompression the
320
+ part-of-speech file included with this product is in ordinary ASCII
321
+ format with CRLF (ASCII 13/10) delimiters.
322
+
323
+
324
+
325
+
326
+
327
+ MOBY Part-of-Speech II CONTENTS
328
+
329
+ Read Me First File (aaREADME.txt)
330
+ Part-of-Speech (mobypos.txt)
331
+
332
+
333
+
334
+
335
+ Quick Start
336
+ 1) Ensure you have at least 3 MB of free disk space to hold the contents
337
+ of this zip file.
338
+ 2) Create a directory to hold the files listed above.
339
+ 3) Extract the contents of this zip file into the destination directory
340
+ using any compatible zip file extraction utility.
341
+ 4) Delete the original zip file from your disk to save space. (optional)
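
The Quick Start steps can also be scripted. The following is a minimal sketch, not part of the original distribution: the function name extract_archive and the directory name moby_pos are made up for illustration, and only the Python standard library is used.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Step 2: create the destination directory if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Step 3: extract the contents of the zip file.
        archive.extractall(dest)
        return archive.namelist()
```

For example, extract_archive("mposp10.zip", "moby_pos") would unpack the archive named in the header above. Step 4, deleting the zip file, is left to the reader because it is optional.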
342
+
343
+
344
+ This second edition is a particularly thorough revision of the
345
+ original Moby Part-of-Speech. Beyond the fifteen thousand new
346
+ entries, many thousand more entries have been scrutinized for
347
+ correctness and modernity. This is unquestionably the largest P-O-S
348
+ list in the world. Note that the many included phrases mean that
349
+ parsing algorithms can now tokenize in units larger than a single
350
+ word, increasing both speed *and* accuracy.
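
One way to tokenize in units larger than a single word is a greedy longest-match pass over the phrase list. The sketch below is only an illustration under stated assumptions: the function name tokenize, the max_words limit, and the tiny lexicon are hypothetical and are not taken from the Moby files.

```python
def tokenize(text, lexicon, max_words=4):
    """Greedy longest-match tokenizer: prefer the longest phrase found in lexicon."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to a single word.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted, so the loop always advances.
            if n == 1 or candidate.lower() in lexicon:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Hypothetical lexicon; real entries would come from mobypos.txt.
lexicon = {"a la carte", "order"}
print(tokenize("we order a la carte today", lexicon))
# prints ['we', 'order', 'a la carte', 'today']
```

Whether this approach actually improves speed or accuracy depends on the parser built around it; the sketch only shows how a phrase entry can be matched as one token.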
351
+
352
+
353
+ Database Legend:
354
+
355
+ Each part-of-speech vocabulary entry consists of a word or phrase
356
+ field followed by a field delimiter of the backslash (\) and the
357
+ part-of-speech field that is coded using the following ASCII symbols
358
+ (case is significant):
359
+
360
+
361
+ Noun N
362
+
363
+ Plural p
364
+
365
+ Noun Phrase h
366
+
367
+ Verb (usually participle) V
368
+
369
+ Verb (transitive) t
370
+
371
+ Verb (intransitive) i
372
+
373
+ Adjective A
374
+
375
+ Adverb v
376
+
377
+ Conjunction C
378
+
379
+ Preposition P
380
+
381
+ Interjection !
382
+
383
+ Pronoun r
384
+
385
+ Definite Article D
386
+
387
+ Indefinite Article I
388
+
389
+ Nominative o
390
+
391
+
392
+
393
+ This two-part vocabulary record is delimited from others with CRLF
394
+ (ASCII 13/10). For example, engineer\Nt means that the word engineer
395
+ has two main uses in English; the principal part-of-speech is as a
396
+ noun: "That engineer could write in microcode with one hand and in ADA
397
+ with the other" and its secondary part-of-speech is as a transitive
398
+ verb: "We sure engineered that software to death."
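
The record layout described above can be read with a few lines of code. This is a minimal sketch assuming one record per line; the names POS_CODES and parse_record are illustrative and do not come from the distribution.

```python
# Part-of-speech symbols from the Database Legend (case is significant).
POS_CODES = {
    "N": "Noun",
    "p": "Plural",
    "h": "Noun Phrase",
    "V": "Verb (usually participle)",
    "t": "Verb (transitive)",
    "i": "Verb (intransitive)",
    "A": "Adjective",
    "v": "Adverb",
    "C": "Conjunction",
    "P": "Preposition",
    "!": "Interjection",
    "r": "Pronoun",
    "D": "Definite Article",
    "I": "Indefinite Article",
    "o": "Nominative",
}


def parse_record(line):
    """Split one record into (word or phrase, list of part-of-speech names)."""
    # rpartition splits on the last backslash, which is the field delimiter.
    word, sep, codes = line.rstrip("\r\n").rpartition("\\")
    if not sep:
        raise ValueError(f"no backslash delimiter in record: {line!r}")
    # Codes keep their original order (principal use first);
    # symbols missing from the legend are passed through unchanged.
    return word, [POS_CODES.get(code, code) for code in codes]


print(parse_record("engineer\\Nt\r\n"))
# prints ('engineer', ['Noun', 'Verb (transitive)'])
```

When reading the file itself, open it in a way that preserves or strips the CRLF delimiters consistently; parse_record strips them per line.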
399
+
400
+ In many cases, the -ed, -ing, -ly, and -ic forms of words are not
401
+ explicitly listed; the participle forms of verbs will usually be
402
+ marked simply with the V sign rather than the more specific t or i
403
+ symbols. Words such as "be," which often have more than one head
404
+ entry in a dictionary, have one listing with all the parts-of-speech
405
+ for all senses concatenated. Foreign words commonly used in English
406
+ usually include their diacritical marks; for example, e with an acute
407
+ accent is denoted by the byte value 142 (outside 7-bit ASCII).
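
The notes do not say which character set gives the value 142 this meaning. One reading, offered here purely as an assumption, is Mac OS Roman, in which byte 142 is e with an acute accent. The sketch below decodes a made-up record under that assumption; the function name decode_record and the sample record are hypothetical.

```python
def decode_record(raw, encoding="mac_roman"):
    """Decode one raw record; the default encoding is an assumption, not from the notes."""
    return raw.decode(encoding).rstrip("\r\n")


# Hypothetical record: "caf", then byte 142, then the \N field, then CRLF.
sample = b"caf" + bytes([142]) + b"\\N\r\n"
decoded = decode_record(sample)
# Under Mac OS Roman, byte 142 becomes U+00E9 (e with an acute accent).
assert decoded == "caf\u00e9\\N"
```

If the decoded text shows unexpected characters in accented words, the file was probably produced with a different character set, and the encoding argument should be changed accordingly.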
408
+
409
+
410
+
411
+ End of this Project Gutenberg etext of Moby Part-of-Speech II by Grady Ward.
412
+