xapian-fu 0.2 → 1.0.1

@@ -0,0 +1,102 @@
1
+
2
+ | A Danish stop word list. Comments begin with vertical bar. Each stop
3
+ | word is at the start of a line.
4
+
5
+ | This is a ranked list (commonest to rarest) of stopwords derived from
6
+ | a large text sample.
7
+
8
+
9
+ og | and
10
+ i | in
11
+ jeg | I
12
+ det | that (dem. pronoun)/it (pers. pronoun)
13
+ at | that (in front of a sentence)/to (with infinitive)
14
+ en | a/an
15
+ den | it (pers. pronoun)/that (dem. pronoun)
16
+ til | to/at/for/until/against/by/of/into, more
17
+ er | present tense of "to be"
18
+ som | who, as
19
+ på | on/upon/in/on/at/to/after/of/with/for, on
20
+ de | they
21
+ med | with/by/in, along
22
+ han | he
23
+ af | of/by/from/off/for/in/with/on, off
24
+ for | at/for/to/from/by/of/ago, in front/before, because
25
+ ikke | not
26
+ der | who/which, there/those
27
+ var | past tense of "to be"
28
+ mig | me/myself
29
+ sig | oneself/himself/herself/itself/themselves
30
+ men | but
31
+ et | a/an/one, one (number), someone/somebody/one
32
+ har | present tense of "to have"
33
+ om | round/about/for/in/a, about/around/down, if
34
+ vi | we
35
+ min | my
36
+ havde | past tense of "to have"
37
+ ham | him
38
+ hun | she
39
+ nu | now
40
+ over | over/above/across/by/beyond/past/on/about, over/past
41
+ da | then, when/as/since
42
+ fra | from/off/since, off, since
43
+ du | you
44
+ ud | out
45
+ sin | his/her/its/one's
46
+ dem | them
47
+ os | us/ourselves
48
+ op | up
49
+ man | you/one
50
+ hans | his
51
+ hvor | where
52
+ eller | or
53
+ hvad | what
54
+ skal | must/shall etc.
55
+ selv | myself/yourself/herself/ourselves etc., even
56
+ her | here
57
+ alle | all/everyone/everybody etc.
58
+ vil | will (verb)
59
+ blev | past tense of "to stay/to remain/to get/to become"
60
+ kunne | could
61
+ ind | in
62
+ når | when
63
+ være | infinitive of "to be"
64
+ dog | however/yet/after all
65
+ noget | something
66
+ ville | would
67
+ jo | you know/you see (adv), yes
68
+ deres | their/theirs
69
+ efter | after/behind/according to/for/by/from, later/afterwards
70
+ ned | down
71
+ skulle | should
72
+ denne | this
73
+ end | than
74
+ dette | this
75
+ mit | my/mine
76
+ også | also
77
+ under | under/beneath/below/during, below/underneath
78
+ have | have
79
+ dig | you
80
+ anden | other
81
+ hende | her
82
+ mine | my
83
+ alt | everything
84
+ meget | much/very, plenty of
85
+ sit | his, her, its, one's
86
+ sine | his, her, its, one's
87
+ vor | our
88
+ mod | against
89
+ disse | these
90
+ hvis | if
91
+ din | your/yours
92
+ nogle | some
93
+ hos | by/at
94
+ blive | be/become
95
+ mange | many
96
+ ad | by/through
97
+ bliver | present tense of "to be/to become"
98
+ hendes | her/hers
99
+ været | been; past participle of "to be"
100
+ thi | for (conj)
101
+ jer | you
102
+ sådan | such, like this/like that
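The header comments above describe the file format: a vertical bar starts a comment, and each stop word sits at the start of a line. A minimal Python sketch of a reader for that format follows. It works on the raw file text (without the leading `+` of the diff), the function name `parse_stopwords` is hypothetical, and it assumes a line may carry several whitespace-separated words before the bar. It is not taken from xapian-fu itself.

```python
def parse_stopwords(text):
    """Return the stop words in `text`, in file order, without duplicates."""
    words = []
    seen = set()
    for line in text.splitlines():
        # Everything from the first vertical bar onward is a comment.
        entry = line.split("|", 1)[0]
        # Assumption: a line may hold several whitespace-separated words.
        for word in entry.split():
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words


sample = """
 | A Danish stop word list.

og           | and
i            | in
jeg          | I
 | us        | object
"""

print(parse_stopwords(sample))  # ['og', 'i', 'jeg']
```

Lines that begin with a bar, including commented-out entries, yield no words, so suppressed forms stay suppressed.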
@@ -0,0 +1,113 @@
1
+
2
+
3
+ | A Dutch stop word list. Comments begin with vertical bar. Each stop
4
+ | word is at the start of a line.
5
+
6
+ | This is a ranked list (commonest to rarest) of stopwords derived from
7
+ | a large sample of Dutch text.
8
+
9
+ | Dutch stop words frequently exhibit homonym clashes. These are indicated
10
+ | clearly below.
11
+
12
+ de | the
13
+ en | and
14
+ van | of, from
15
+ ik | I, the ego
16
+ te | (1) chez, at etc, (2) to, (3) too
17
+ dat | that, which
18
+ die | that, those, who, which
19
+ in | in, inside
20
+ een | a, an, one
21
+ hij | he
22
+ het | the, it
23
+ niet | not, nothing, naught
24
+ zijn | (1) to be, being, (2) his, one's, its
25
+ is | is
26
+ was | (1) was, past tense of all persons sing. of 'zijn' (to be) (2) wax, (3) the washing, (4) rise of river
27
+ op | on, upon, at, in, up, used up
28
+ aan | on, upon, to (as dative)
29
+ met | with, by
30
+ als | like, such as, when
31
+ voor | (1) before, in front of, (2) furrow
32
+ had | had, past tense all persons sing. of 'hebben' (have)
33
+ er | there
34
+ maar | but, only
35
+ om | round, about, for etc
36
+ hem | him
37
+ dan | then
38
+ zou | should/would, past tense all persons sing. of 'zullen'
39
+ of | or, whether, if
40
+ wat | what, something, anything
41
+ mijn | possessive and noun 'mine'
42
+ men | people, 'one'
43
+ dit | this
44
+ zo | so, thus, in this way
45
+ door | through, by
46
+ over | over, across
47
+ ze | she, her, they, them
48
+ zich | oneself
49
+ bij | (1) a bee, (2) by, near, at
50
+ ook | also, too
51
+ tot | till, until
52
+ je | you
53
+ mij | me
54
+ uit | out of, from
55
+ der | Old Dutch form of 'van der' still found in surnames
56
+ daar | (1) there, (2) because
57
+ haar | (1) her, their, them, (2) hair
58
+ naar | (1) unpleasant, unwell etc, (2) towards, (3) as
59
+ heb | present first person sing. of 'to have'
60
+ hoe | how, why
61
+ heeft | present third person sing. of 'to have'
62
+ hebben | 'to have' and various parts thereof
63
+ deze | this
64
+ u | you
65
+ want | (1) for, (2) mitten, (3) rigging
66
+ nog | yet, still
67
+ zal | 'shall', first and third person sing. of verb 'zullen' (will)
68
+ me | me
69
+ zij | she, they
70
+ nu | now
71
+ ge | 'thou', still used in Belgium and south Netherlands
72
+ geen | none
73
+ omdat | because
74
+ iets | something, somewhat
75
+ worden | to become, grow, get
76
+ toch | yet, still
77
+ al | all, every, each
78
+ waren | (1) 'were' (2) to wander, (3) wares
79
+ veel | much, many
80
+ meer | (1) more, (2) lake
81
+ doen | to do, to make
82
+ toen | then, when
83
+ moet | noun 'spot/mote' and present form of 'moeten' (must)
84
+ ben | (1) am, (2) 'are' in interrogative second person singular of 'to be'
85
+ zonder | without
86
+ kan | noun 'can' and present form of 'to be able'
87
+ hun | their, them
88
+ dus | so, consequently
89
+ alles | all, everything, anything
90
+ onder | under, beneath
91
+ ja | yes, of course
92
+ eens | once, one day
93
+ hier | here
94
+ wie | who
95
+ werd | imperfect third person sing. of 'become'
96
+ altijd | always
97
+ doch | yet, but etc
98
+ wordt | present third person sing. of 'become'
99
+ wezen | (1) to be, (2) 'been' as in 'been fishing', (3) orphans
100
+ kunnen | to be able
101
+ ons | us/our
102
+ zelf | self
103
+ tegen | against, towards, at
104
+ na | after, near
105
+ reeds | already
106
+ wil | (1) present tense of 'want', (2) 'will', noun, (3) fender
107
+ kon | could; past tense of 'to be able'
108
+ niets | nothing
109
+ uw | your
110
+ iemand | somebody
111
+ geweest | been; past participle of 'be'
112
+ andere | other
113
+
@@ -0,0 +1,312 @@
1
+
2
+ | An English stop word list. Comments begin with vertical bar. Each stop
3
+ | word is at the start of a line.
4
+
5
+ | Many of the forms below are quite rare (e.g. "yourselves") but included for
6
+ | completeness.
7
+
8
+ | PRONOUNS FORMS
9
+ | 1st person sing
10
+
11
+ i | subject, always in upper case of course
12
+
13
+ me | object
14
+ my | possessive adjective
15
+ | the possessive pronoun `mine' is best suppressed, because of the
16
+ | sense of coal-mine etc.
17
+ myself | reflexive
18
+ | 1st person plural
19
+ we | subject
20
+
21
+ | us | object
22
+ | care is required here because US = United States. It is usually
23
+ | safe to remove it if it is in lower case.
24
+ our | possessive adjective
25
+ ours | possessive pronoun
26
+ ourselves | reflexive
27
+ | second person (archaic `thou' forms not included)
28
+ you | subject and object
29
+ your | possessive adjective
30
+ yours | possessive pronoun
31
+ yourself | reflexive (singular)
32
+ yourselves | reflexive (plural)
33
+ | third person singular
34
+ he | subject
35
+ him | object
36
+ his | possessive adjective and pronoun
37
+ himself | reflexive
38
+
39
+ she | subject
40
+ her | object and possessive adjective
41
+ hers | possessive pronoun
42
+ herself | reflexive
43
+
44
+ it | subject and object
45
+ its | possessive adjective
46
+ itself | reflexive
47
+ | third person plural
48
+ they | subject
49
+ them | object
50
+ their | possessive adjective
51
+ theirs | possessive pronoun
52
+ themselves | reflexive
53
+ | other forms (demonstratives, interrogatives)
54
+ what
55
+ which
56
+ who
57
+ whom
58
+ this
59
+ that
60
+ these
61
+ those
62
+
63
+ | VERB FORMS (using F.R. Palmer's nomenclature)
64
+ | BE
65
+ am | 1st person, present
66
+ is | -s form (3rd person, present)
67
+ are | present
68
+ was | 1st person, past
69
+ were | past
70
+ be | infinitive
71
+ been | past participle
72
+ being | -ing form
73
+ | HAVE
74
+ have | simple
75
+ has | -s form
76
+ had | past
77
+ having | -ing form
78
+ | DO
79
+ do | simple
80
+ does | -s form
81
+ did | past
82
+ doing | -ing form
83
+
84
+ | The forms below are, I believe, best omitted, because of the significant
85
+ | homonym forms:
86
+
87
+ | He made a WILL
88
+ | old tin CAN
89
+ | merry month of MAY
90
+ | a smell of MUST
91
+ | fight the good fight with all thy MIGHT
92
+
93
+ | would, could, should and ought might, however, be included
94
+
95
+ | | AUXILIARIES
96
+ | | WILL
97
+ |will
98
+
99
+ would
100
+
101
+ | | SHALL
102
+ |shall
103
+
104
+ should
105
+
106
+ | | CAN
107
+ |can
108
+
109
+ could
110
+
111
+ | | MAY
112
+ |may
113
+ |might
114
+ | | MUST
115
+ |must
116
+ | | OUGHT
117
+
118
+ ought
119
+
120
+ | COMPOUND FORMS, increasingly encountered nowadays in 'formal' writing
121
+ | pronoun + verb
122
+
123
+ i'm
124
+ you're
125
+ he's
126
+ she's
127
+ it's
128
+ we're
129
+ they're
130
+ i've
131
+ you've
132
+ we've
133
+ they've
134
+ i'd
135
+ you'd
136
+ he'd
137
+ she'd
138
+ we'd
139
+ they'd
140
+ i'll
141
+ you'll
142
+ he'll
143
+ she'll
144
+ we'll
145
+ they'll
146
+
147
+ | verb + negation
148
+
149
+ isn't
150
+ aren't
151
+ wasn't
152
+ weren't
153
+ hasn't
154
+ haven't
155
+ hadn't
156
+ doesn't
157
+ don't
158
+ didn't
159
+
160
+ | auxiliary + negation
161
+
162
+ won't
163
+ wouldn't
164
+ shan't
165
+ shouldn't
166
+ can't
167
+ cannot
168
+ couldn't
169
+ mustn't
170
+
171
+ | miscellaneous forms
172
+
173
+ let's
174
+ that's
175
+ who's
176
+ what's
177
+ here's
178
+ there's
179
+ when's
180
+ where's
181
+ why's
182
+ how's
183
+
184
+ | rarer forms
185
+
186
+ | daren't needn't
187
+
188
+ | doubtful forms
189
+
190
+ | oughtn't mightn't
191
+
192
+ | ARTICLES
193
+ a
194
+ an
195
+ the
196
+
197
+ | THE REST (Overlap among prepositions, conjunctions, adverbs etc is so
198
+ | high, that classification is pointless.)
199
+ and
200
+ but
201
+ if
202
+ or
203
+ because
204
+ as
205
+ until
206
+ while
207
+
208
+ of
209
+ at
210
+ by
211
+ for
212
+ with
213
+ about
214
+ against
215
+ between
216
+ into
217
+ through
218
+ during
219
+ before
220
+ after
221
+ above
222
+ below
223
+ to
224
+ from
225
+ up
226
+ down
227
+ in
228
+ out
229
+ on
230
+ off
231
+ over
232
+ under
233
+
234
+ again
235
+ further
236
+ then
237
+ once
238
+
239
+ here
240
+ there
241
+ when
242
+ where
243
+ why
244
+ how
245
+
246
+ all
247
+ any
248
+ both
249
+ each
250
+ few
251
+ more
252
+ most
253
+ other
254
+ some
255
+ such
256
+
257
+ no
258
+ nor
259
+ not
260
+ only
261
+ own
262
+ same
263
+ so
264
+ than
265
+ too
266
+ very
267
+
268
+ | Just for the record, the following words are among the commonest in English
269
+
270
+ | one
271
+ | every
272
+ | least
273
+ | less
274
+ | many
275
+ | now
276
+ | ever
277
+ | never
278
+ | say
279
+ | says
280
+ | said
281
+ | also
282
+ | get
283
+ | go
284
+ | goes
285
+ | just
286
+ | made
287
+ | make
288
+ | put
289
+ | see
290
+ | seen
291
+ | whether
292
+ | like
293
+ | well
294
+ | back
295
+ | even
296
+ | still
297
+ | way
298
+ | take
299
+ | since
300
+ | another
301
+ | however
302
+ | two
303
+ | three
304
+ | four
305
+ | five
306
+ | first
307
+ | second
308
+ | new
309
+ | old
310
+ | high
311
+ | long
312
+
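The English list comments out `us` because of the clash with US (United States), and notes that removing it is usually safe when it is in lower case. One way to act on that note is sketched below in Python. The names `STOPWORDS`, `CASE_SENSITIVE` and `remove_stopwords` are hypothetical, the word sets are a small illustrative subset, and the approach is an assumption rather than anything the package is known to do.

```python
# Words matched regardless of case (illustrative subset of the list above).
STOPWORDS = {"we", "our", "ours", "ourselves", "the"}
# Hypothetical: words dropped only when they appear exactly in lower case.
CASE_SENSITIVE = {"us"}


def remove_stopwords(tokens):
    """Drop stop words, keeping upper-case forms such as 'US'."""
    kept = []
    for token in tokens:
        if token in CASE_SENSITIVE:
            continue  # exact lower-case match only, e.g. "us"
        if token.lower() in STOPWORDS:
            continue  # case-insensitive match for ordinary stop words
        kept.append(token)
    return kept


print(remove_stopwords(["We", "told", "us", "the", "US", "economy"]))
# ['told', 'US', 'economy']
```

This only works if filtering happens before any lower-casing of the tokens; once case is folded, `us` and `US` can no longer be told apart.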
@@ -0,0 +1,89 @@
1
+
2
+ | forms of BE
3
+
4
+ olla
5
+ olen
6
+ olet
7
+ on
8
+ olemme
9
+ olette
10
+ ovat
11
+ ole | negative form
12
+
13
+ oli
14
+ olisi
15
+ olisit
16
+ olisin
17
+ olisimme
18
+ olisitte
19
+ olisivat
20
+ olit
21
+ olin
22
+ olimme
23
+ olitte
24
+ olivat
25
+ ollut
26
+ olleet
27
+
28
+ en | negation
29
+ et
30
+ ei
31
+ emme
32
+ ette
33
+ eivät
34
+
35
+ |Nom Gen Acc Part Iness Elat Illat Adess Ablat Allat Ess Trans
36
+ minä minun minut minua minussa minusta minuun minulla minulta minulle | I
37
+ sinä sinun sinut sinua sinussa sinusta sinuun sinulla sinulta sinulle | you
38
+ hän hänen hänet häntä hänessä hänestä häneen hänellä häneltä hänelle | he she
39
+ me meidän meidät meitä meissä meistä meihin meillä meiltä meille | we
40
+ te teidän teidät teitä teissä teistä teihin teillä teiltä teille | you
41
+ he heidän heidät heitä heissä heistä heihin heillä heiltä heille | they
42
+
43
+ tämä tämän tätä tässä tästä tähän tällä tältä tälle tänä täksi | this
44
+ tuo tuon tuota tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
45
+ se sen sitä siinä siitä siihen sillä siltä sille sinä siksi | it
46
+ nämä näiden näitä näissä näistä näihin näillä näiltä näille näinä näiksi | these
47
+ nuo noiden noita noissa noista noihin noilla noilta noille noina noiksi | those
48
+ ne niiden niitä niissä niistä niihin niillä niiltä niille niinä niiksi | they
49
+
50
+ kuka kenen kenet ketä kenessä kenestä keneen kenellä keneltä kenelle kenenä keneksi | who
51
+ ketkä keiden ketkä keitä keissä keistä keihin keillä keiltä keille keinä keiksi | (pl)
52
+ mikä minkä minkä mitä missä mistä mihin millä miltä mille minä miksi | which what
53
+ mitkä | (pl)
54
+
55
+ joka jonka jota jossa josta johon jolla jolta jolle jona joksi | who which
56
+ jotka joiden joita joissa joista joihin joilla joilta joille joina joiksi | (pl)
57
+
58
+ | conjunctions
59
+
60
+ että | that
61
+ ja | and
62
+ jos | if
63
+ koska | because
64
+ kuin | than
65
+ mutta | but
66
+ niin | so
67
+ sekä | and
68
+ sillä | for
69
+ tai | or
70
+ vaan | but
71
+ vai | or
72
+ vaikka | although
73
+
74
+
75
+ | prepositions
76
+
77
+ kanssa | with
78
+ mukaan | according to
79
+ noin | about
80
+ poikki | across
81
+ yli | over, across
82
+
83
+ | other
84
+
85
+ kun | when
86
+ niin | so
87
+ nyt | now
88
+ itse | self
89
+