queryparser 1.0.0
- data/COPYING +340 -0
- data/COPYRIGHT +18 -0
- data/README +14 -0
- data/Rakefile +33 -0
- data/lib/queryparser.rb +714 -0
- metadata +61 -0
data/COPYING
ADDED
@@ -0,0 +1,340 @@
+GNU GENERAL PUBLIC LICENSE
+Version 2, June 1991
+
+Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+
+Preamble
+
+The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+The precise terms and conditions for copying, distribution and
+modification follow.
+
+GNU GENERAL PUBLIC LICENSE
+TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+a) You must cause the modified files to carry prominent notices
+stating that you changed the files and the date of any change.
+
+b) You must cause any work that you distribute or publish, that in
+whole or in part contains or is derived from the Program or any
+part thereof, to be licensed as a whole at no charge to all third
+parties under the terms of this License.
+
+c) If the modified program normally reads commands interactively
+when run, you must cause it, when started running for such
+interactive use in the most ordinary way, to print or display an
+announcement including an appropriate copyright notice and a
+notice that there is no warranty (or else, saying that you provide
+a warranty) and that users may redistribute the program under
+these conditions, and telling the user how to view a copy of this
+License. (Exception: if the Program itself is interactive but
+does not normally print such an announcement, your work based on
+the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+a) Accompany it with the complete corresponding machine-readable
+source code, which must be distributed under the terms of Sections
+1 and 2 above on a medium customarily used for software interchange; or,
+
+b) Accompany it with a written offer, valid for at least three
+years, to give any third party, for a charge no more than your
+cost of physically performing source distribution, a complete
+machine-readable copy of the corresponding source code, to be
+distributed under the terms of Sections 1 and 2 above on a medium
+customarily used for software interchange; or,
+
+c) Accompany it with the information you received as to the offer
+to distribute corresponding source code. (This alternative is
+allowed only for noncommercial distribution and only if you
+received the program in object code or executable form with such
+an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+NO WARRANTY
+
+11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+END OF TERMS AND CONDITIONS
+
+How to Apply These Terms to Your New Programs
+
+If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+<one line to give the program's name and a brief idea of what it does.>
+Copyright (C) <year> <name of author>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+Gnomovision version 69, Copyright (C) year name of author
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+`Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+<signature of Ty Coon>, 1 April 1989
+Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
data/COPYRIGHT
ADDED
@@ -0,0 +1,18 @@
+
+QueryParser - Parse a plain language query into Lucene syntax
+Copyright (C) 2008 Peter Hickman
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
data/README
ADDED
@@ -0,0 +1,14 @@
+
+QueryParser - Parse a plain language query into Lucene syntax
+Copyright (C) 2008 Peter Hickman
+
+Parse plain language queries such as
+
+    apple not banana
+
+    apple or banana and fig or date
+
+and change them into a format suitable for use with Lucene or Solr.
+The plain text query can include parentheses to group terms, and the Lucene
+query can include both a similarity value and field boosting.
+
data/Rakefile
ADDED
@@ -0,0 +1,33 @@
+# -*- ruby -*-
+
+require 'rubygems'
+require 'rake/gempackagetask'
+
+$:.push 'lib'
+require 'queryparser'
+
+PKG_NAME = 'queryparser'
+PKG_VERSION = QueryParser::VERSION
+
+spec = Gem::Specification.new do |s|
+  s.name = PKG_NAME
+  s.version = PKG_VERSION
+  s.summary = 'Parse a natural language query into lucene query syntax'
+
+  s.files = FileList['README', 'COPY*', 'Rakefile', 'lib/**/*.rb']
+  s.test_files = FileList['test/*.rb']
+
+  s.has_rdoc = true
+  s.rdoc_options << '--title' << 'QueryParser' << '--charset' << 'utf-8'
+  s.extra_rdoc_files = FileList['README', 'COPYING']
+
+  s.author = 'Peter Hickman'
+  s.email = 'peterhi@ntlworld.com'
+
+  s.homepage = 'queryparser.rubyforge.org'
+  s.rubyforge_project = 'queryparser'
+end
+
+Rake::GemPackageTask.new(spec) do |pkg|
+  pkg.need_tar = true
+end
data/lib/queryparser.rb
ADDED
@@ -0,0 +1,714 @@
+# Takes a query in plain English and turns it into a string
+# suitable for passing to Lucene or Solr.
+#
+# Assuming a Lucene / Solr database that has the body of the
+# data in the +content+ field with the entry heading in a
+# +title+ field, sub headings in a +subheading+ field
+#
+#   p = QueryParser.new('content')
+#   l = p.parse("apple")
+#   => "content:apple"
+#
+#   l = p.parse("apple and banana")
+#   => "+(+content:apple +content:banana)"
+#
+#   l = p.parse('apple not banana or cherry')
+#   => "+((+content:apple -content:banana) content:cherry)"
+#
+# Here we boost the score of those queries that also match the
+# title field of the document
+#
+#   p = QueryParser.new("content", nil, 'title' => '^10')
+#   l = p.parse("apple")
+#   => "content:apple title:apple^10"
+#
+# Now with an extra boosting for subheadings
+#
+#   p = QueryParser.new("content", nil, 'title' => '^10', 'subheading' => '^5')
+#   l = p.parse("apple")
+#   => "content:apple title:apple^10 subheading:apple^5"
+#
+# We can also change the similarity of the match. In Lucene terms
+# a similarity of 1.0 will mean that 'banana' will only match 'banana'.
+# However a similarity of 0.6 (entered as ~0.6) will allow 'banana' to
+# match 'canada' which is only two letters different. The default similarity
+# in Lucene's classic query parser is 0.5.
+#
+#   p = QueryParser.new("content", '~0.6', 'title' => '^10')
+#   l = p.parse("apple not banana")
+#   => "+(+content:apple~0.6 -content:banana~0.6) title:apple~0.6^10"
+
+class QueryParser
+  VERSION = '1.0.0'
+
+  def initialize(field, similarity = nil, boosts = {})
+    @field = field
+    @similarity = similarity
+    @boosts = boosts
+  end
+
+  # Takes a plain English query and converts it into a string
+  # that can be fed into Lucene or Solr. It will apply the
+  # similarity and boostings set in the constructor.
+  def parse(text)
+    a = tokenise(text)
+    b = expand(a)
+    check_braces(b)
+    has_content(b)
+
+    c = add_implicit_and(b)
+
+    d = maketree(c)
+    if d.class != Array then
+      d = [d]
+    end
+
+    f = process_not(d)
+    g = process_and_or(f, 'and')
+    h = process_and_or(g, 'or')
+
+    # Wrap everything in an and
+    s = QueryParser::And.new
+    s.add(h)
+
+    t = reduce(s)
+
+    b = QueryParser::Or.new
+    b.add(t.boostable())
+
+    a = Array.new
+    x = t.lucene(@field, @similarity)
+    if x[0].chr == '(' then
+      x = "+#{x}"
+    end
+    a << x
+
+    @boosts.each_pair do |k, v|
+      x = [@similarity, v].join('')
+      a << b.lucene(k,x)
+    end
+
+    return a.join(' ')
+  end
+
+  private
+
+  # Split the string into tokens based on whitespace unless it is
+  # enclosed in ' or ". Initially we classify everything as either
+  # a term or quoted.
+  #
+  # The input is a text string and the output a flat list of terms
+  def tokenise(text)
+    r = Array.new()
+
+    delimiter = ''
+    token = ''
+
+    text.split("").each do |char|
+      if delimiter == '' then
+        if char == '"' or char == "'" then
+          token = remove_punctuation(token)
+          r << QueryParser::Term.new(token) if token != ''
+          delimiter = char.dup
+          token = char.dup
+        elsif char == " " then
+          token = remove_punctuation(token)
+          r << QueryParser::Term.new(token) if token != ''
+          token = ''
+        else
+          token << char.dup
+        end
+      elsif delimiter == char then
+        token << char.dup
+        token = remove_punctuation(token)
+        r << QueryParser::Term.new(token) if token != ''
+        token = ''
+        delimiter = ''
+      else
+        token << char.dup
+      end
+    end
+
+    token = remove_punctuation(token)
+    r << QueryParser::Term.new(token) if token != ''
+
+    return r
+  end
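
The quote-aware splitting done by `tokenise` can be tried in isolation. The following is a minimal standalone sketch, not the gem's code: the name `split_query` is hypothetical, it returns plain strings rather than `QueryParser::Term` objects, and it skips the punctuation clean-up step.

```ruby
# Hypothetical helper (not part of the gem): split on spaces, but keep
# everything between a matching pair of ' or " together as one token.
def split_query(text)
  tokens = []
  token = String.new
  delimiter = nil # the quote character we are currently inside, if any

  text.each_char do |char|
    if delimiter.nil?
      if char == '"' || char == "'"
        # A quote starts a new token; flush whatever came before it.
        tokens << token unless token.empty?
        delimiter = char
        token = String.new(char)
      elsif char == ' '
        tokens << token unless token.empty?
        token = String.new
      else
        token << char
      end
    elsif char == delimiter
      # Closing quote: the quoted token is complete.
      token << char
      tokens << token
      token = String.new
      delimiter = nil
    else
      token << char
    end
  end

  tokens << token unless token.empty?
  tokens
end

p split_query(%q{apple "fuji red" banana})
# => ["apple", "\"fuji red\"", "banana"]
```

As in the method above, an unterminated quote simply runs to the end of the input and is emitted as a single token.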
+
+  # Terms keep only alphanumeric characters plus ( and ). Everything else is replaced by a space
+  def remove_punctuation(a)
+    if a == '' then
+      return a
+    end
+
+    first = a[0].chr
+    last = a[-1].chr
+
+    quoted = false
+    if first == '"' or first == "'" then
+      if first == last then
+        quoted = true
+      end
+    end
+
+    b = a.gsub(/[^[:alnum:]()]/,' ')
+    c = b.gsub(/\s+/, ' ').strip
+
+    if quoted then
+      return ['"', c, '"'].join('')
+    else
+      return c
+    end
+  end
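
The effect of the two `gsub` calls in `remove_punctuation` is easiest to see on a small input. This is a standalone sketch using the same two regular expressions; the name `strip_punctuation` is hypothetical and the quote handling is left out.

```ruby
# Hypothetical helper (not part of the gem): replace every character that
# is not alphanumeric or a parenthesis with a space, then squeeze runs of
# whitespace down to a single space and trim the ends.
def strip_punctuation(text)
  text.gsub(/[^[:alnum:]()]/, ' ').gsub(/\s+/, ' ').strip
end

p strip_punctuation("it's (really) good!")
# => "it s (really) good"
```

Note that punctuation inside a word splits it in two ("it's" becomes "it s"), which matters when the result is later treated as a single term.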
+
+  # If any term contains '(' or ')', pad the parentheses with spaces and re-tokenise it
+  #
+  # The input is a list of terms, the output is a (possibly longer) list of terms
+  def expand(a)
+    r = Array.new
+
+    a.each do |i|
+      if i.type == 'term' and (i.data.index("(") or i.data.index(")")) then
+        x = i.data.gsub("(", " ( ").gsub(")", " ) ")
+        r << tokenise(x)
+      else
+        r << i
+      end
+    end
+
+    return r.flatten
+  end
+
+  # Create nested lists around the 'open' and 'close' ops
+  #
+  # The input is a list of terms, the output is a list of terms and lists of the same
+  def maketree(a)
+    r = Array.new
+
+    while x = a.shift do
+      case x.type
+      when "open"
+        y = maketree(a)
+        if y.size == 1 then
+          r << y[0]
+        elsif y.size > 1 then
+          r << y
+        end
+      when "close"
+        return r
+      else
+        r << x
+      end
+    end
+
+    if r.size == 1 then
+      return r[0]
+    else
+      return r
+    end
+  end
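
The recursive nesting in `maketree` follows a common pattern: consume tokens from a shared array, recurse on an opening parenthesis and return on a closing one. The sketch below shows that pattern on plain strings. The name `nest` is hypothetical, and unlike `maketree` it does not collapse single-element groups or drop empty ones.

```ruby
# Hypothetical helper (not part of the gem): turn a flat token list into
# nested arrays, one level per pair of parentheses. The input array is
# consumed (mutated) as the recursion proceeds.
def nest(tokens)
  result = []
  while (tok = tokens.shift)
    case tok
    when '(' then result << nest(tokens) # recurse until the matching ')'
    when ')' then return result          # close the current level
    else result << tok
    end
  end
  result
end

p nest(%w[a and ( b or c )])
# => ["a", "and", ["b", "or", "c"]]
```

Because the array is shared between recursion levels, the caller resumes exactly where the nested call stopped, which is why no index bookkeeping is needed.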
+
+  # Add the implicit 'and' after a term that is not itself an op
+  #
+  # The input is a flat list of terms, the output is a (possibly longer) flat list of terms
+  def add_implicit_and(a)
+    r = Array.new
+
+    a.each do |i|
+      if r.size > 0 then
+        if previous_type(r.last) then
+          if current_type(i) then
+            r << QueryParser::Term.new('and')
+          end
+        else
+          if not current_type(i) then
+            raise QueryParser::Exceptions::MalformedQuery
+          end
+        end
+      end
+
+      r << i
+    end
+
+    if r.last.type == 'op' then
+      raise QueryParser::Exceptions::MalformedQuery
+    end
+
+    return r
+  end
+
+  # All these behave the same for adding an 'and'
+  def previous_type(i)
+    return (i.type == 'term' or i.type == 'close')
+  end
+
+  def current_type(i)
+    return (i.type == 'term' or i.type == 'open' or i.data == 'not')
+  end
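
The rule applied by `add_implicit_and` is that two adjacent operands get an `and` between them, and a `not` counts as the start of an operand. The following standalone sketch works on plain strings and ignores parentheses and error handling; the names `OPS` and `insert_implicit_and` are hypothetical.

```ruby
# Hypothetical helper (not part of the gem): insert 'and' wherever a
# non-operator token is directly followed by another operand or by 'not'.
OPS = %w[and or not].freeze

def insert_implicit_and(tokens)
  result = []
  tokens.each do |tok|
    previous_is_operand = !result.empty? && !OPS.include?(result.last)
    current_starts_operand = !OPS.include?(tok) || tok == 'not'
    result << 'and' if previous_is_operand && current_starts_operand
    result << tok
  end
  result
end

p insert_implicit_and(%w[apple banana not cherry])
# => ["apple", "and", "banana", "and", "not", "cherry"]
```

This is why a query such as `apple not banana` ends up as a conjunction of `apple` and the negated `banana` in the documented output.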
+
+  # The not picks up the term to its right
+  #
+  # The 'Not' op terms in the list are converted into Not objects
+  def process_not(a)
+    r = Array.new
+
+    # So we can handle a 'not not not apple' and the like
+    b = a.reverse
+
+    b.each do |i|
+      if i.class == Array then
+        i = process_not(i)
+      end
+
+      if i.class == QueryParser::Term and i.type == 'op' and i.data == 'not' then
+        if r.size == 0 then
+          raise QueryParser::Exceptions::MalformedQuery
+        else
+          x = QueryParser::Not.new(r.pop)
+          r << x
+        end
+      else
+        r << i
+      end
+    end
+
+    return r.reverse
+  end
|
277
|
+
|
278
|
+
# Find all the 'and' and 'or' op terms and convert them into And and Or objects
|
279
|
+
def process_and_or(a, type)
|
280
|
+
# make sure that it is in an array
|
281
|
+
if a.class != Array then
|
282
|
+
a = [a]
|
283
|
+
end
|
284
|
+
|
285
|
+
r = Array.new
|
286
|
+
|
287
|
+
has_op = false
|
288
|
+
s = nil
|
289
|
+
|
290
|
+
a.each do |i|
|
291
|
+
# First recurse into each element
|
292
|
+
if i.class == Array then
|
293
|
+
x = process_and_or(i, type)
|
294
|
+
if x.class == Array and x.size == 1 then
|
295
|
+
i = x.first
|
296
|
+
else
|
297
|
+
i = x
|
298
|
+
end
|
299
|
+
elsif i.class == QueryParser::Not then
|
300
|
+
x = process_and_or(i.contents, type)
|
301
|
+
if x.class == Array and x.size == 1 then
|
302
|
+
x = x.first
|
303
|
+
end
|
304
|
+
i = QueryParser::Not.new(x)
|
305
|
+
elsif i.class == QueryParser::And then
|
306
|
+
x = process_and_or(i.contents, type)
|
307
|
+
i = QueryParser::And.new()
|
308
|
+
i.add(x)
|
309
|
+
elsif i.class == QueryParser::Or then
|
310
|
+
x = process_and_or(i.contents, type)
|
311
|
+
i = QueryParser::Or.new()
|
312
|
+
i.add(x)
|
313
|
+
end
|
314
|
+
|
315
|
+
if has_op == true then
|
316
|
+
s.add(i)
|
317
|
+
r << s
|
318
|
+
s = nil
|
319
|
+
has_op = false
|
320
|
+
elsif i.class == QueryParser::Term and i.type == 'op' and i.data == type then
|
321
|
+
has_op = true
|
322
|
+
if i.data == 'and' then
|
323
|
+
s = QueryParser::And.new
|
324
|
+
else
|
325
|
+
s = QueryParser::Or.new
|
326
|
+
end
|
327
|
+
|
328
|
+
if r.size == 0 then
|
329
|
+
raise QueryParser::Exceptions::MalformedQuery
|
330
|
+
else
|
331
|
+
s.add(r.pop)
|
332
|
+
end
|
333
|
+
else
|
334
|
+
r << i
|
335
|
+
end
|
336
|
+
end
|
337
|
+
|
338
|
+
if r.size == 1 then
|
339
|
+
return r[0]
|
340
|
+
else
|
341
|
+
return r
|
342
|
+
end
|
343
|
+
end
|
344
|
+
|
345
|
+
# Reduce the sets down
|
346
|
+
def reduce(a)
|
347
|
+
process = true
|
348
|
+
|
349
|
+
while process do
|
350
|
+
a = a.reduce
|
351
|
+
if a.reduced? == false then
|
352
|
+
process = false
|
353
|
+
end
|
354
|
+
end
|
355
|
+
|
356
|
+
return a
|
357
|
+
end
|
358
|
+
|
359
|
+
# Check that the "(" and ")" are balanced
|
360
|
+
def check_braces(a)
|
361
|
+
counter = 0
|
362
|
+
|
363
|
+
a.each do |i|
|
364
|
+
if i.type == 'open' then
|
365
|
+
counter += 1
|
366
|
+
elsif i.type == 'close' then
|
367
|
+
counter -= 1
|
368
|
+
if counter < 0 then
|
369
|
+
raise QueryParser::Exceptions::UnbalancedBraces
|
370
|
+
end
|
371
|
+
end
|
372
|
+
end
|
373
|
+
|
374
|
+
if counter != 0 then
|
375
|
+
raise QueryParser::Exceptions::UnbalancedBraces
|
376
|
+
end
|
377
|
+
end
|
378
|
+
|
379
|
+
def has_content(a)
|
380
|
+
counter = 0
|
381
|
+
|
382
|
+
a.each do |i|
|
383
|
+
if i.type == 'term' then
|
384
|
+
counter += 1
|
385
|
+
end
|
386
|
+
end
|
387
|
+
|
388
|
+
if counter == 0 then
|
389
|
+
raise QueryParser::Exceptions::EmptyQuery
|
390
|
+
end
|
391
|
+
end
|
392
|
+
end
|
393
|
+
|
394
|
+
# The custom exceptions that may be thrown if there is some
|
395
|
+
# problem with the query.
|
396
|
+
module QueryParser::Exceptions
|
397
|
+
# This exception will be thrown if the query is generally
|
398
|
+
# malformed such as <tt>"apple and and banana"</tt> (too many
|
399
|
+
# <tt>and</tt>s), <tt>"apple not"</tt> (no term after the +not+)
|
400
|
+
# or <tt>"and apple"</tt> (no term before the +and+) and the like
|
401
|
+
class MalformedQuery < Exception
|
402
|
+
end
|
403
|
+
|
404
|
+
# This exception will be thrown if the query contains
|
405
|
+
# unbalanced braces
|
406
|
+
class UnbalancedBraces < Exception
|
407
|
+
end
|
408
|
+
|
409
|
+
# This exception will be thrown if the supplied query string is
|
410
|
+
# empty after removing the +and+, +or+, +not+, ( and )
|
411
|
+
class EmptyQuery < Exception
|
412
|
+
end
|
413
|
+
end
|
414
|
+
|
415
|
+
# A basic search term. The input query is tokenised into
|
416
|
+
# terms which are then manipulated to create the query tree.
|
417
|
+
#
|
418
|
+
# Generally you should not need to handle this class unless
|
419
|
+
# you are changing how the parser works.
|
420
|
+
class QueryParser::Term
|
421
|
+
# Takes the token from the user's query and classifies it:
|
422
|
+
#
|
423
|
+
# open:: The opening ( used to indicate the start of a parenthesised part of the query.
|
424
|
+
# close:: The closing ) used to indicate the end of a parenthesised part of the query.
|
425
|
+
# and:: The term indicating conjunction
|
426
|
+
# or:: The term indicating disjunction
|
427
|
+
# not:: The term indicating negation
|
428
|
+
# term:: None of the above. A term to find.
|
429
|
+
def initialize(data)
|
430
|
+
@type = 'term'
|
431
|
+
@data = data
|
432
|
+
@was_reduced = false
|
433
|
+
|
434
|
+
if @data == nil then
|
435
|
+
@data = ''
|
436
|
+
else
|
437
|
+
case @data.downcase
|
438
|
+
when "("
|
439
|
+
@type = "open"
|
440
|
+
when ")"
|
441
|
+
@type = "close"
|
442
|
+
when "and", "or", "not"
|
443
|
+
@type = 'op'
|
444
|
+
@data = @data.downcase
|
445
|
+
end
|
446
|
+
end
|
447
|
+
end
|
448
|
+
|
449
|
+
attr_reader :type, :data
|
450
|
+
|
451
|
+
# Display the Term, useful for debugging and testing
|
452
|
+
# the Term class in isolation
|
453
|
+
def inspect
|
454
|
+
"#{@type}:#{@data}"
|
455
|
+
end
|
456
|
+
|
457
|
+
# Convert a term into a string usable in a Lucene query
|
458
|
+
# with an optional similarity
|
459
|
+
def lucene(field, suffix = nil)
|
460
|
+
"#{field}:#{@data}#{suffix}"
|
461
|
+
end
|
462
|
+
|
463
|
+
# Even though a term cannot, itself, be reduced the
|
464
|
+
# process will call this method on everything that
|
465
|
+
# is in the query. So we need to have this.
|
466
|
+
def reduce
|
467
|
+
@was_reduced = false
|
468
|
+
return self
|
469
|
+
end
|
470
|
+
|
471
|
+
# Return true if the previous call to #reduce did
|
472
|
+
# actually reduce the term. Again this is a method
|
473
|
+
# universal to all parts of the query and so we
|
474
|
+
# have to have it. But see #set_reduced to see why
|
475
|
+
# it can actually return true.
|
476
|
+
def reduced?
|
477
|
+
@was_reduced
|
478
|
+
end
|
479
|
+
|
480
|
+
# If the term was the only member of an +and+, +or+ or
|
481
|
+
# double (or any multiple of two) +not+ then it will replace
|
482
|
+
# the +and+, +or+ or +not+ in the query and therefore
|
483
|
+
# the original term has reduced and this, the replacement
|
484
|
+
# term, needs to indicate that fact. This allows us to
|
485
|
+
# flag that.
|
486
|
+
def set_reduced
|
487
|
+
@was_reduced = true
|
488
|
+
end
|
489
|
+
|
490
|
+
# The query can be traversed to return the terms
|
491
|
+
# that are considered *boostable*. In the following
|
492
|
+
# +apple+ will be considered positive and returned
|
493
|
+
# but +banana+ will not:
|
494
|
+
#
|
495
|
+
# apple not banana
|
496
|
+
#
|
497
|
+
# Terms that are boostable can be used to improve
|
498
|
+
# the document's relevance / position in the results list.
|
499
|
+
def boostable(negative = false)
|
500
|
+
if negative == true then
|
501
|
+
return nil
|
502
|
+
else
|
503
|
+
return self
|
504
|
+
end
|
505
|
+
end
|
506
|
+
end
|
507
|
+
|
508
|
+
# The base class for the +and+ and +or+ sets.
|
509
|
+
#
|
510
|
+
# Generally you should not need to handle this class unless
|
511
|
+
# you are changing how the parser works.
|
512
|
+
class QueryParser::Set
|
513
|
+
def initialize
|
514
|
+
@data = Array.new
|
515
|
+
@was_reduced = false
|
516
|
+
end
|
517
|
+
|
518
|
+
# Add a list of +terms+, +nots+ and other +sets+
|
519
|
+
# to the list of things that are in this part
|
520
|
+
# of the query.
|
521
|
+
#
|
522
|
+
# Can handle a list of items or just a single one.
|
523
|
+
def add(*data)
|
524
|
+
data.each do |i|
|
525
|
+
if i.class == Array then
|
526
|
+
i.each {|j| add(j)}
|
527
|
+
else
|
528
|
+
@data << i
|
529
|
+
end
|
530
|
+
end
|
531
|
+
end
|
532
|
+
|
533
|
+
# Returns all the data held by this set
|
534
|
+
def contents
|
535
|
+
@data
|
536
|
+
end
|
537
|
+
|
538
|
+
# Display the set, useful for debugging and testing
|
539
|
+
def inspect
|
540
|
+
r = Array.new
|
541
|
+
@data.each {|i|r << i.inspect}
|
542
|
+
"<#{self.inspect_class} #{r.join(' ')}>"
|
543
|
+
end
|
544
|
+
|
545
|
+
# Convert a set into a string usable in a Lucene query
|
546
|
+
# with an optional similarity that needs to be passed
|
547
|
+
# to the Terms
|
548
|
+
def lucene(field, similarity = nil)
|
549
|
+
r = Array.new
|
550
|
+
@data.each do |i|
|
551
|
+
x = ''
|
552
|
+
if self.class == QueryParser::And and i.class != QueryParser::Not then
|
553
|
+
x = '+'
|
554
|
+
end
|
555
|
+
x << i.lucene(field, similarity)
|
556
|
+
r << x
|
557
|
+
end
|
558
|
+
|
559
|
+
if r.size == 1 then
|
560
|
+
"#{r[0]}"
|
561
|
+
else
|
562
|
+
"(#{r.join(' ')})"
|
563
|
+
end
|
564
|
+
end
|
565
|
+
|
566
|
+
# A Set contained within a Set should fold the contents of the inner set
|
567
|
+
# into itself. Otherwise reduce the contents of the Set individually and
|
568
|
+
# set the flag if the contents reduced
|
569
|
+
def reduce
|
570
|
+
r = Array.new
|
571
|
+
@was_reduced = false
|
572
|
+
|
573
|
+
@data.each do |i|
|
574
|
+
if self.class == i.class then
|
575
|
+
@was_reduced = true
|
576
|
+
i.contents.each do |c|
|
577
|
+
r << c.reduce
|
578
|
+
end
|
579
|
+
else
|
580
|
+
x = i.reduce
|
581
|
+
if x.reduced? then
|
582
|
+
@was_reduced = true
|
583
|
+
end
|
584
|
+
r << x
|
585
|
+
end
|
586
|
+
end
|
587
|
+
|
588
|
+
if r.size == 1 then
|
589
|
+
@was_reduced = true
|
590
|
+
r[0].set_reduced
|
591
|
+
return r.first
|
592
|
+
else
|
593
|
+
@data = r
|
594
|
+
return self
|
595
|
+
end
|
596
|
+
end
|
597
|
+
|
598
|
+
# Did calling #reduce on this set actually reduce it
|
599
|
+
def reduced?
|
600
|
+
@was_reduced
|
601
|
+
end
|
602
|
+
|
603
|
+
# Force the reduced flag to true
|
604
|
+
def set_reduced
|
605
|
+
@was_reduced = true
|
606
|
+
end
|
607
|
+
|
608
|
+
# Return all the boostable terms that are held
|
609
|
+
# in the set. Thus for
|
610
|
+
#
|
611
|
+
# tom and dick and harry
|
612
|
+
#
|
613
|
+
# The terms +tom+, +dick+ and +harry+ are all considered
|
614
|
+
# boostable. However in
|
615
|
+
#
|
616
|
+
# tom and dick and not harry
|
617
|
+
#
|
618
|
+
# Only the terms +tom+ and +dick+ are considered boostable
|
619
|
+
def boostable(negative = false)
|
620
|
+
r = Array.new
|
621
|
+
|
622
|
+
@data.each do |i|
|
623
|
+
x = i.boostable(negative)
|
624
|
+
if x != nil then
|
625
|
+
r << x
|
626
|
+
end
|
627
|
+
end
|
628
|
+
|
629
|
+
return r.flatten
|
630
|
+
end
|
631
|
+
end
|
632
|
+
|
633
|
+
# A subclass just to distinguish the +and+ from the +or+
|
634
|
+
class QueryParser::And < QueryParser::Set
|
635
|
+
def inspect_class
|
636
|
+
"AND"
|
637
|
+
end
|
638
|
+
end
|
639
|
+
|
640
|
+
# A subclass just to distinguish the +or+ from the +and+
|
641
|
+
class QueryParser::Or < QueryParser::Set
|
642
|
+
def inspect_class
|
643
|
+
"OR"
|
644
|
+
end
|
645
|
+
end
|
646
|
+
|
647
|
+
# Something to handle the +not+ term in a query
|
648
|
+
#
|
649
|
+
# Generally you should not need to handle this class unless
|
650
|
+
# you are changing how the parser works.
|
651
|
+
class QueryParser::Not
|
652
|
+
# +not+ handles a single term and so it is added in
|
653
|
+
# initialisation rather than with an add method.
|
654
|
+
def initialize(data)
|
655
|
+
@data = data
|
656
|
+
@was_reduced = false
|
657
|
+
end
|
658
|
+
|
659
|
+
# Returns the data held by the +not+
|
660
|
+
def contents
|
661
|
+
@data
|
662
|
+
end
|
663
|
+
|
664
|
+
# Display the +not+, useful for debugging
|
665
|
+
def inspect
|
666
|
+
"<NOT #{@data.inspect}>"
|
667
|
+
end
|
668
|
+
|
669
|
+
# Convert a +not+ into a string usable in a Lucene query
|
670
|
+
# passing the similarity on to the term contained by
|
671
|
+
# the +not+
|
672
|
+
def lucene(field, similarity = nil)
|
673
|
+
"-#{@data.lucene(field, similarity)}"
|
674
|
+
end
|
675
|
+
|
676
|
+
# Double negatives should be eliminated, otherwise reduce the contents
|
677
|
+
def reduce
|
678
|
+
if @data.class == QueryParser::Not then
|
679
|
+
@was_reduced = true
|
680
|
+
x = @data.contents.reduce
|
681
|
+
x.set_reduced
|
682
|
+
return x
|
683
|
+
else
|
684
|
+
@data = @data.reduce
|
685
|
+
@was_reduced = @data.reduced?
|
686
|
+
return self
|
687
|
+
end
|
688
|
+
end
|
689
|
+
|
690
|
+
# Were the contents reduced?
|
691
|
+
def reduced?
|
692
|
+
@was_reduced
|
693
|
+
end
|
694
|
+
|
695
|
+
# Sets the reduced flag to true
|
696
|
+
def set_reduced
|
697
|
+
@was_reduced = true
|
698
|
+
end
|
699
|
+
|
700
|
+
# Return all the boostable terms that are held
|
701
|
+
# in the +not+. Thus for
|
702
|
+
#
|
703
|
+
# tom and dick and harry
|
704
|
+
#
|
705
|
+
# The terms +tom+, +dick+ and +harry+ are all considered
|
706
|
+
# boostable. However in
|
707
|
+
#
|
708
|
+
# tom and dick and not harry
|
709
|
+
#
|
710
|
+
# Only the terms +tom+ and +dick+ are considered boostable
|
711
|
+
def boostable(negative = false)
|
712
|
+
@data.boostable(!negative)
|
713
|
+
end
|
714
|
+
end
|
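The +lucene+ and +boostable+ methods in lib/queryparser.rb above can be illustrated with a small standalone sketch. It does not load the gem: SketchTerm, SketchNot and SketchAnd are hypothetical, simplified stand-ins for QueryParser::Term, QueryParser::Not and QueryParser::And, and the field name 'text' is an assumption. Unlike the gem, +boostable+ here returns the term's string rather than the Term object, purely to keep the output readable.

```ruby
# Simplified stand-ins for the Term, Not and And classes in the diff above.
# The Sketch* names are hypothetical and are not part of the gem.
class SketchTerm
  def initialize(data)
    @data = data
  end

  # Render as field:term, with an optional suffix such as a similarity.
  def lucene(field, suffix = nil)
    "#{field}:#{@data}#{suffix}"
  end

  # A term reached through an odd number of nots is not boostable.
  def boostable(negative = false)
    negative ? nil : @data
  end
end

class SketchNot
  def initialize(data)
    @data = data
  end

  # A negated item is rendered with a leading minus sign.
  def lucene(field, suffix = nil)
    "-#{@data.lucene(field, suffix)}"
  end

  # Each not flips the negative flag before passing it down.
  def boostable(negative = false)
    @data.boostable(!negative)
  end
end

class SketchAnd
  def initialize(*items)
    @items = items
  end

  # Required items get a leading +, negated items keep their own -.
  def lucene(field, suffix = nil)
    parts = @items.map do |i|
      prefix = i.is_a?(SketchNot) ? '' : '+'
      prefix + i.lucene(field, suffix)
    end
    parts.size == 1 ? parts[0] : "(#{parts.join(' ')})"
  end

  # Collect the boostable terms of every member, dropping the nils.
  def boostable(negative = false)
    @items.map { |i| i.boostable(negative) }.compact.flatten
  end
end

# Hand-built tree for "tom and dick and not harry".
query = SketchAnd.new(
  SketchTerm.new('tom'),
  SketchTerm.new('dick'),
  SketchNot.new(SketchTerm.new('harry'))
)

puts query.lucene('text')     # (+text:tom +text:dick -text:harry)
puts query.boostable.inspect  # ["tom", "dick"]
```

The tree is built by hand here; in the gem it would come out of the tokenising, process_not, process_and_or and reduce steps shown in the diff.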
metadata
ADDED
@@ -0,0 +1,61 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: queryparser
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 1.0.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Peter Hickman
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2008-12-21 00:00:00 +00:00
|
13
|
+
default_executable:
|
14
|
+
dependencies: []
|
15
|
+
|
16
|
+
description:
|
17
|
+
email: peterhi@ntlworld.com
|
18
|
+
executables: []
|
19
|
+
|
20
|
+
extensions: []
|
21
|
+
|
22
|
+
extra_rdoc_files:
|
23
|
+
- README
|
24
|
+
- COPYING
|
25
|
+
files:
|
26
|
+
- README
|
27
|
+
- COPYING
|
28
|
+
- COPYRIGHT
|
29
|
+
- Rakefile
|
30
|
+
- lib/queryparser.rb
|
31
|
+
has_rdoc: true
|
32
|
+
homepage: queryparser.rubyforge.org
|
33
|
+
post_install_message:
|
34
|
+
rdoc_options:
|
35
|
+
- --title
|
36
|
+
- QueryParser
|
37
|
+
- --charset
|
38
|
+
- utf-8
|
39
|
+
require_paths:
|
40
|
+
- lib
|
41
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
42
|
+
requirements:
|
43
|
+
- - ">="
|
44
|
+
- !ruby/object:Gem::Version
|
45
|
+
version: "0"
|
46
|
+
version:
|
47
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
48
|
+
requirements:
|
49
|
+
- - ">="
|
50
|
+
- !ruby/object:Gem::Version
|
51
|
+
version: "0"
|
52
|
+
version:
|
53
|
+
requirements: []
|
54
|
+
|
55
|
+
rubyforge_project: queryparser
|
56
|
+
rubygems_version: 1.3.1
|
57
|
+
signing_key:
|
58
|
+
specification_version: 2
|
59
|
+
summary: Parse a natural language query into lucene query syntax
|
60
|
+
test_files: []
|
61
|
+
|