xapian-fu 0.2 → 1.0.1

@@ -0,0 +1,102 @@
1
+
2
+ | A Danish stop word list. Comments begin with vertical bar. Each stop
3
+ | word is at the start of a line.
4
+
5
+ | This is a ranked list (commonest to rarest) of stopwords derived from
6
+ | a large text sample.
7
+
8
+
9
+ og | and
10
+ i | in
11
+ jeg | I
12
+ det | that (dem. pronoun)/it (pers. pronoun)
13
+ at | that (in front of a sentence)/to (with infinitive)
14
+ en | a/an
15
+ den | it (pers. pronoun)/that (dem. pronoun)
16
+ til | to/at/for/until/against/by/of/into, more
17
+ er | present tense of "to be"
18
+ som | who, as
19
+ på | on/upon/in/on/at/to/after/of/with/for, on
20
+ de | they
21
+ med | with/by/in, along
22
+ han | he
23
+ af | of/by/from/off/for/in/with/on, off
24
+ for | at/for/to/from/by/of/ago, in front/before, because
25
+ ikke | not
26
+ der | who/which, there/those
27
+ var | past tense of "to be"
28
+ mig | me/myself
29
+ sig | oneself/himself/herself/itself/themselves
30
+ men | but
31
+ et | a/an/one, one (number), someone/somebody/one
32
+ har | present tense of "to have"
33
+ om | round/about/for/in/a, about/around/down, if
34
+ vi | we
35
+ min | my
36
+ havde | past tense of "to have"
37
+ ham | him
38
+ hun | she
39
+ nu | now
40
+ over | over/above/across/by/beyond/past/on/about, over/past
41
+ da | then, when/as/since
42
+ fra | from/off/since, off, since
43
+ du | you
44
+ ud | out
45
+ sin | his/her/its/one's
46
+ dem | them
47
+ os | us/ourselves
48
+ op | up
49
+ man | you/one
50
+ hans | his
51
+ hvor | where
52
+ eller | or
53
+ hvad | what
54
+ skal | must/shall etc.
55
+ selv | myself/yourself/herself/ourselves etc., even
56
+ her | here
57
+ alle | all/everyone/everybody etc.
58
+ vil | will (verb)
59
+ blev | past tense of "to stay/to remain/to get/to become"
60
+ kunne | could
61
+ ind | in
62
+ når | when
63
+ være | infinitive of "to be"
64
+ dog | however/yet/after all
65
+ noget | something
66
+ ville | would
67
+ jo | you know/you see (adv), yes
68
+ deres | their/theirs
69
+ efter | after/behind/according to/for/by/from, later/afterwards
70
+ ned | down
71
+ skulle | should
72
+ denne | this
73
+ end | than
74
+ dette | this
75
+ mit | my/mine
76
+ også | also
77
+ under | under/beneath/below/during, below/underneath
78
+ have | have
79
+ dig | you
80
+ anden | other
81
+ hende | her
82
+ mine | my
83
+ alt | everything
84
+ meget | much/very, plenty of
85
+ sit | his, her, its, one's
86
+ sine | his, her, its, one's
87
+ vor | our
88
+ mod | against
89
+ disse | these
90
+ hvis | if
91
+ din | your/yours
92
+ nogle | some
93
+ hos | by/at
94
+ blive | be/become
95
+ mange | many
96
+ ad | by/through
97
+ bliver | present tense of "to be/to become"
98
+ hendes | her/hers
99
+ været | past participle of "to be"
100
+ thi | for (conj)
101
+ jer | you
102
+ sådan | such, like this/like that
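The header comments of this list describe a simple line format: anything from the first vertical bar onward is a comment, and the stop word sits at the start of the line. A minimal sketch of a reader for that format follows, assuming the file content is already available as a string (without the `+` diff markers). The function name `parse_stopwords` and the sample are illustrative only; how xapian-fu itself loads these files is not shown in this diff.

```python
def parse_stopwords(text):
    """Return stop words in file order, ignoring '|' comments and blank lines."""
    words = []
    for line in text.splitlines():
        # Everything from the first vertical bar onward is a comment.
        entry = line.split("|", 1)[0].strip()
        if entry:
            # The stop word is the first token on the line.
            words.append(entry.split()[0])
    return words


# A short excerpt in the same format as the Danish list.
sample = """
 | A Danish stop word list.

og            | and
i             | in
jeg           | I
det           | that (dem. pronoun)/it (pers. pronoun)
"""

ranked = parse_stopwords(sample)
# The list is ranked commonest to rarest, so a prefix keeps the commonest words.
top_two = ranked[:2]

print(ranked)   # ['og', 'i', 'jeg', 'det']
print(top_two)  # ['og', 'i']
```

Returning a list rather than a set preserves the ranking, which matters only if a caller wants to truncate the list to the most frequent words.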
@@ -0,0 +1,113 @@
1
+
2
+
3
+ | A Dutch stop word list. Comments begin with vertical bar. Each stop
4
+ | word is at the start of a line.
5
+
6
+ | This is a ranked list (commonest to rarest) of stopwords derived from
7
+ | a large sample of Dutch text.
8
+
9
+ | Dutch stop words frequently exhibit homonym clashes. These are indicated
10
+ | clearly below.
11
+
12
+ de | the
13
+ en | and
14
+ van | of, from
15
+ ik | I, the ego
16
+ te | (1) chez, at etc, (2) to, (3) too
17
+ dat | that, which
18
+ die | that, those, who, which
19
+ in | in, inside
20
+ een | a, an, one
21
+ hij | he
22
+ het | the, it
23
+ niet | not, nothing, naught
24
+ zijn | (1) to be, being, (2) his, one's, its
25
+ is | is
26
+ was | (1) was, past tense of all persons sing. of 'zijn' (to be) (2) wax, (3) the washing, (4) rise of river
27
+ op | on, upon, at, in, up, used up
28
+ aan | on, upon, to (as dative)
29
+ met | with, by
30
+ als | like, such as, when
31
+ voor | (1) before, in front of, (2) furrow
32
+ had | had, past tense all persons sing. of 'hebben' (have)
33
+ er | there
34
+ maar | but, only
35
+ om | round, about, for etc
36
+ hem | him
37
+ dan | then
38
+ zou | should/would, past tense all persons sing. of 'zullen'
39
+ of | or, whether, if
40
+ wat | what, something, anything
41
+ mijn | possessive and noun 'mine'
42
+ men | people, 'one'
43
+ dit | this
44
+ zo | so, thus, in this way
45
+ door | through, by
46
+ over | over, across
47
+ ze | she, her, they, them
48
+ zich | oneself
49
+ bij | (1) a bee, (2) by, near, at
50
+ ook | also, too
51
+ tot | till, until
52
+ je | you
53
+ mij | me
54
+ uit | out of, from
55
+ der | Old Dutch form of 'van der' still found in surnames
56
+ daar | (1) there, (2) because
57
+ haar | (1) her, their, them, (2) hair
58
+ naar | (1) unpleasant, unwell etc, (2) towards, (3) as
59
+ heb | present first person sing. of 'to have'
60
+ hoe | how, why
61
+ heeft | present third person sing. of 'to have'
62
+ hebben | 'to have' and various parts thereof
63
+ deze | this
64
+ u | you
65
+ want | (1) for, (2) mitten, (3) rigging
66
+ nog | yet, still
67
+ zal | 'shall', first and third person sing. of verb 'zullen' (will)
68
+ me | me
69
+ zij | she, they
70
+ nu | now
71
+ ge | 'thou', still used in Belgium and south Netherlands
72
+ geen | none
73
+ omdat | because
74
+ iets | something, somewhat
75
+ worden | to become, grow, get
76
+ toch | yet, still
77
+ al | all, every, each
78
+ waren | (1) 'were', (2) to wander, (3) wares
79
+ veel | much, many
80
+ meer | (1) more, (2) lake
81
+ doen | to do, to make
82
+ toen | then, when
83
+ moet | noun 'spot/mote' and present form of 'to must'
84
+ ben | (1) am, (2) 'are' in interrogative second person singular of 'to be'
85
+ zonder | without
86
+ kan | noun 'can' and present form of 'to be able'
87
+ hun | their, them
88
+ dus | so, consequently
89
+ alles | all, everything, anything
90
+ onder | under, beneath
91
+ ja | yes, of course
92
+ eens | once, one day
93
+ hier | here
94
+ wie | who
95
+ werd | imperfect third person sing. of 'become'
96
+ altijd | always
97
+ doch | yet, but etc
98
+ wordt | present third person sing. of 'become'
99
+ wezen | (1) to be, (2) 'been' as in 'been fishing', (3) orphans
100
+ kunnen | to be able
101
+ ons | us/our
102
+ zelf | self
103
+ tegen | against, towards, at
104
+ na | after, near
105
+ reeds | already
106
+ wil | (1) present tense of 'want', (2) 'will', noun, (3) fender
107
+ kon | could; past tense of 'to be able'
108
+ niets | nothing
109
+ uw | your
110
+ iemand | somebody
111
+ geweest | been; past participle of 'be'
112
+ andere | other
113
+
@@ -0,0 +1,312 @@
1
+
2
+ | An English stop word list. Comments begin with vertical bar. Each stop
3
+ | word is at the start of a line.
4
+
5
+ | Many of the forms below are quite rare (e.g. "yourselves") but included for
6
+ | completeness.
7
+
8
+ | PRONOUNS FORMS
9
+ | 1st person sing
10
+
11
+ i | subject, always in upper case of course
12
+
13
+ me | object
14
+ my | possessive adjective
15
+ | the possessive pronoun `mine' is best suppressed, because of the
16
+ | sense of coal-mine etc.
17
+ myself | reflexive
18
+ | 1st person plural
19
+ we | subject
20
+
21
+ | us | object
22
+ | care is required here because US = United States. It is usually
23
+ | safe to remove it if it is in lower case.
24
+ our | possessive adjective
25
+ ours | possessive pronoun
26
+ ourselves | reflexive
27
+ | second person (archaic `thou' forms not included)
28
+ you | subject and object
29
+ your | possessive adjective
30
+ yours | possessive pronoun
31
+ yourself | reflexive (singular)
32
+ yourselves | reflexive (plural)
33
+ | third person singular
34
+ he | subject
35
+ him | object
36
+ his | possessive adjective and pronoun
37
+ himself | reflexive
38
+
39
+ she | subject
40
+ her | object and possessive adjective
41
+ hers | possessive pronoun
42
+ herself | reflexive
43
+
44
+ it | subject and object
45
+ its | possessive adjective
46
+ itself | reflexive
47
+ | third person plural
48
+ they | subject
49
+ them | object
50
+ their | possessive adjective
51
+ theirs | possessive pronoun
52
+ themselves | reflexive
53
+ | other forms (demonstratives, interrogatives)
54
+ what
55
+ which
56
+ who
57
+ whom
58
+ this
59
+ that
60
+ these
61
+ those
62
+
63
+ | VERB FORMS (using F.R. Palmer's nomenclature)
64
+ | BE
65
+ am | 1st person, present
66
+ is | -s form (3rd person, present)
67
+ are | present
68
+ was | 1st person, past
69
+ were | past
70
+ be | infinitive
71
+ been | past participle
72
+ being | -ing form
73
+ | HAVE
74
+ have | simple
75
+ has | -s form
76
+ had | past
77
+ having | -ing form
78
+ | DO
79
+ do | simple
80
+ does | -s form
81
+ did | past
82
+ doing | -ing form
83
+
84
+ | The forms below are, I believe, best omitted, because of the significant
85
+ | homonym forms:
86
+
87
+ | He made a WILL
88
+ | old tin CAN
89
+ | merry month of MAY
90
+ | a smell of MUST
91
+ | fight the good fight with all thy MIGHT
92
+
93
+ | 'would', 'could', 'should' and 'ought' might, however, be included
94
+
95
+ | | AUXILIARIES
96
+ | | WILL
97
+ |will
98
+
99
+ would
100
+
101
+ | | SHALL
102
+ |shall
103
+
104
+ should
105
+
106
+ | | CAN
107
+ |can
108
+
109
+ could
110
+
111
+ | | MAY
112
+ |may
113
+ |might
114
+ | | MUST
115
+ |must
116
+ | | OUGHT
117
+
118
+ ought
119
+
120
+ | COMPOUND FORMS, increasingly encountered nowadays in 'formal' writing
121
+ | pronoun + verb
122
+
123
+ i'm
124
+ you're
125
+ he's
126
+ she's
127
+ it's
128
+ we're
129
+ they're
130
+ i've
131
+ you've
132
+ we've
133
+ they've
134
+ i'd
135
+ you'd
136
+ he'd
137
+ she'd
138
+ we'd
139
+ they'd
140
+ i'll
141
+ you'll
142
+ he'll
143
+ she'll
144
+ we'll
145
+ they'll
146
+
147
+ | verb + negation
148
+
149
+ isn't
150
+ aren't
151
+ wasn't
152
+ weren't
153
+ hasn't
154
+ haven't
155
+ hadn't
156
+ doesn't
157
+ don't
158
+ didn't
159
+
160
+ | auxiliary + negation
161
+
162
+ won't
163
+ wouldn't
164
+ shan't
165
+ shouldn't
166
+ can't
167
+ cannot
168
+ couldn't
169
+ mustn't
170
+
171
+ | miscellaneous forms
172
+
173
+ let's
174
+ that's
175
+ who's
176
+ what's
177
+ here's
178
+ there's
179
+ when's
180
+ where's
181
+ why's
182
+ how's
183
+
184
+ | rarer forms
185
+
186
+ | daren't needn't
187
+
188
+ | doubtful forms
189
+
190
+ | oughtn't mightn't
191
+
192
+ | ARTICLES
193
+ a
194
+ an
195
+ the
196
+
197
+ | THE REST (Overlap among prepositions, conjunctions, adverbs etc is so
198
+ | high, that classification is pointless.)
199
+ and
200
+ but
201
+ if
202
+ or
203
+ because
204
+ as
205
+ until
206
+ while
207
+
208
+ of
209
+ at
210
+ by
211
+ for
212
+ with
213
+ about
214
+ against
215
+ between
216
+ into
217
+ through
218
+ during
219
+ before
220
+ after
221
+ above
222
+ below
223
+ to
224
+ from
225
+ up
226
+ down
227
+ in
228
+ out
229
+ on
230
+ off
231
+ over
232
+ under
233
+
234
+ again
235
+ further
236
+ then
237
+ once
238
+
239
+ here
240
+ there
241
+ when
242
+ where
243
+ why
244
+ how
245
+
246
+ all
247
+ any
248
+ both
249
+ each
250
+ few
251
+ more
252
+ most
253
+ other
254
+ some
255
+ such
256
+
257
+ no
258
+ nor
259
+ not
260
+ only
261
+ own
262
+ same
263
+ so
264
+ than
265
+ too
266
+ very
267
+
268
+ | Just for the record, the following words are among the commonest in English
269
+
270
+ | one
271
+ | every
272
+ | least
273
+ | less
274
+ | many
275
+ | now
276
+ | ever
277
+ | never
278
+ | say
279
+ | says
280
+ | said
281
+ | also
282
+ | get
283
+ | go
284
+ | goes
285
+ | just
286
+ | made
287
+ | make
288
+ | put
289
+ | see
290
+ | seen
291
+ | whether
292
+ | like
293
+ | well
294
+ | back
295
+ | even
296
+ | still
297
+ | way
298
+ | take
299
+ | since
300
+ | another
301
+ | however
302
+ | two
303
+ | three
304
+ | four
305
+ | five
306
+ | first
307
+ | second
308
+ | new
309
+ | old
310
+ | high
311
+ | long
312
+
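The English list leaves `us` commented out because it clashes with US (United States), and notes that removal is usually safe when the token is in lower case. One way to act on that note is to match most stop words case-insensitively but match a small set only in their exact lower-case form. The sketch below assumes tokens arrive with their original casing; `STOPWORDS`, `LOWERCASE_ONLY` and `remove_stopwords` are hypothetical names, and the word sets are a small excerpt, not the full list.

```python
# Excerpt of ordinary stop words, matched regardless of case.
STOPWORDS = {"we", "our", "ours", "ourselves", "the", "that"}

# Hypothetical extra set: treated as stop words only when written in lower case.
LOWERCASE_ONLY = {"us"}


def remove_stopwords(tokens):
    """Drop stop words, keeping upper-case 'US' while dropping lower-case 'us'."""
    kept = []
    for token in tokens:
        if token in LOWERCASE_ONLY:
            continue  # exact lower-case match, e.g. the pronoun "us"
        if token.lower() in STOPWORDS:
            continue  # ordinary stop word in any casing
        kept.append(token)
    return kept


print(remove_stopwords("We told us that the US team won".split()))
# ['told', 'US', 'team', 'won']
```

This only works if the check runs before any lower-casing step in the indexing pipeline; once tokens are normalised, the two senses can no longer be told apart.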
@@ -0,0 +1,89 @@
1
+
2
+ | forms of BE
3
+
4
+ olla
5
+ olen
6
+ olet
7
+ on
8
+ olemme
9
+ olette
10
+ ovat
11
+ ole | negative form
12
+
13
+ oli
14
+ olisi
15
+ olisit
16
+ olisin
17
+ olisimme
18
+ olisitte
19
+ olisivat
20
+ olit
21
+ olin
22
+ olimme
23
+ olitte
24
+ olivat
25
+ ollut
26
+ olleet
27
+
28
+ en | negation
29
+ et
30
+ ei
31
+ emme
32
+ ette
33
+ eivät
34
+
35
+ |Nom Gen Acc Part Iness Elat Illat Adess Ablat Allat Ess Trans
36
+ minä minun minut minua minussa minusta minuun minulla minulta minulle | I
37
+ sinä sinun sinut sinua sinussa sinusta sinuun sinulla sinulta sinulle | you
38
+ hän hänen hänet häntä hänessä hänestä häneen hänellä häneltä hänelle | he she
39
+ me meidän meidät meitä meissä meistä meihin meillä meiltä meille | we
40
+ te teidän teidät teitä teissä teistä teihin teillä teiltä teille | you
41
+ he heidän heidät heitä heissä heistä heihin heillä heiltä heille | they
42
+
43
+ tämä tämän tätä tässä tästä tähän tällä tältä tälle tänä täksi | this
44
+ tuo tuon tuota tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
45
+ se sen sitä siinä siitä siihen sillä siltä sille sinä siksi | it
46
+ nämä näiden näitä näissä näistä näihin näillä näiltä näille näinä näiksi | these
47
+ nuo noiden noita noissa noista noihin noilla noilta noille noina noiksi | those
48
+ ne niiden niitä niissä niistä niihin niillä niiltä niille niinä niiksi | they
49
+
50
+ kuka kenen kenet ketä kenessä kenestä keneen kenellä keneltä kenelle kenenä keneksi | who
51
+ ketkä keiden ketkä keitä keissä keistä keihin keillä keiltä keille keinä keiksi | (pl)
52
+ mikä minkä minkä mitä missä mistä mihin millä miltä mille minä miksi | which what
53
+ mitkä | (pl)
54
+
55
+ joka jonka jota jossa josta johon jolla jolta jolle jona joksi | who which
56
+ jotka joiden joita joissa joista joihin joilla joilta joille joina joiksi | (pl)
57
+
58
+ | conjunctions
59
+
60
+ että | that
61
+ ja | and
62
+ jos | if
63
+ koska | because
64
+ kuin | than
65
+ mutta | but
66
+ niin | so
67
+ sekä | and
68
+ sillä | for
69
+ tai | or
70
+ vaan | but
71
+ vai | or
72
+ vaikka | although
73
+
74
+
75
+ | prepositions
76
+
77
+ kanssa | with
78
+ mukaan | according to
79
+ noin | about
80
+ poikki | across
81
+ yli | over, across
82
+
83
+ | other
84
+
85
+ kun | when
86
+ niin | so
87
+ nyt | now
88
+ itse | self
89
+
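Unlike the other lists, the Finnish list puts a whole declension on one line, so a reader that keeps only the first token of each line would miss most of the forms. It also repeats some entries (`niin` appears under both conjunctions and other). A minimal sketch that collects every whitespace-separated form before the bar into a set is shown below; `parse_all_forms` is an illustrative name, and this is not necessarily how xapian-fu reads the file.

```python
def parse_all_forms(text):
    """Collect every whitespace-separated word that precedes a '|' comment."""
    forms = set()
    for line in text.splitlines():
        # Drop the comment part; a line starting with '|' leaves nothing.
        entry = line.split("|", 1)[0]
        # A line may hold many inflected forms, so keep all of them.
        forms.update(entry.split())
    return forms


# Excerpt in the same format as the Finnish list.
sample = """
|Nom Gen Acc Part Iness Elat Illat Adess Ablat Allat Ess Trans
minä minun minut minua minussa minusta minuun minulla minulta minulle | I

niin | so
kun | when
niin | so
"""

forms = parse_all_forms(sample)
print(len(forms))        # 12: ten forms of "minä", plus "niin" and "kun"
print("Nom" in forms)    # False: the case-name header is a comment line
```

Using a set removes the repeated entries automatically, at the cost of losing file order, which is acceptable here because this list is grouped by word class rather than ranked by frequency.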