dbldots_oedipus 0.0.16
- data/.gitignore +10 -0
- data/.rspec +1 -0
- data/Gemfile +4 -0
- data/LICENSE +20 -0
- data/README.md +435 -0
- data/Rakefile +26 -0
- data/ext/oedipus/extconf.rb +72 -0
- data/ext/oedipus/lexing.c +96 -0
- data/ext/oedipus/lexing.h +20 -0
- data/ext/oedipus/oedipus.c +339 -0
- data/ext/oedipus/oedipus.h +58 -0
- data/lib/oedipus.rb +40 -0
- data/lib/oedipus/comparison.rb +88 -0
- data/lib/oedipus/comparison/between.rb +21 -0
- data/lib/oedipus/comparison/equal.rb +21 -0
- data/lib/oedipus/comparison/gt.rb +21 -0
- data/lib/oedipus/comparison/gte.rb +21 -0
- data/lib/oedipus/comparison/in.rb +21 -0
- data/lib/oedipus/comparison/lt.rb +21 -0
- data/lib/oedipus/comparison/lte.rb +21 -0
- data/lib/oedipus/comparison/not.rb +25 -0
- data/lib/oedipus/comparison/not_equal.rb +21 -0
- data/lib/oedipus/comparison/not_in.rb +21 -0
- data/lib/oedipus/comparison/outside.rb +21 -0
- data/lib/oedipus/comparison/shortcuts.rb +144 -0
- data/lib/oedipus/connection.rb +124 -0
- data/lib/oedipus/connection/pool.rb +133 -0
- data/lib/oedipus/connection/registry.rb +56 -0
- data/lib/oedipus/connection_error.rb +14 -0
- data/lib/oedipus/index.rb +320 -0
- data/lib/oedipus/query_builder.rb +185 -0
- data/lib/oedipus/rspec/test_rig.rb +132 -0
- data/lib/oedipus/version.rb +12 -0
- data/oedipus.gemspec +42 -0
- data/spec/data/.gitkeep +0 -0
- data/spec/integration/connection/registry_spec.rb +50 -0
- data/spec/integration/connection_spec.rb +156 -0
- data/spec/integration/index_spec.rb +442 -0
- data/spec/spec_helper.rb +16 -0
- data/spec/unit/comparison/between_spec.rb +36 -0
- data/spec/unit/comparison/equal_spec.rb +22 -0
- data/spec/unit/comparison/gt_spec.rb +22 -0
- data/spec/unit/comparison/gte_spec.rb +22 -0
- data/spec/unit/comparison/in_spec.rb +22 -0
- data/spec/unit/comparison/lt_spec.rb +22 -0
- data/spec/unit/comparison/lte_spec.rb +22 -0
- data/spec/unit/comparison/not_equal_spec.rb +22 -0
- data/spec/unit/comparison/not_in_spec.rb +22 -0
- data/spec/unit/comparison/not_spec.rb +37 -0
- data/spec/unit/comparison/outside_spec.rb +36 -0
- data/spec/unit/comparison/shortcuts_spec.rb +125 -0
- data/spec/unit/comparison_spec.rb +109 -0
- data/spec/unit/query_builder_spec.rb +205 -0
- metadata +164 -0
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
--colour
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright © 2011 Chris Corbyn

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,435 @@
# Oedipus: Sphinx 2 Search Client for Ruby

Oedipus is a client for the Sphinx search engine (>= 2.0.2), with support for
real-time indexes and multi-dimensional faceted searches.

It is not a clone of the PHP API; rather, it is written from the ground up,
wrapping the SphinxQL API offered by searchd. Nor is it a plugin for
ActiveRecord or DataMapper, though these will be offered in separate gems (see
[oedipus-dm](https://github.com/d11wtq/oedipus-dm)).

Oedipus provides an abstraction that makes faceted search easy to implement,
while remaining light and simple.

Data structures are managed using core Ruby data types (Array and Hash), ensuring
simplicity and flexibility.

The current development focus is on supporting real-time indexes, where data is
indexed from your application, rather than by running the indexer tool that comes
with Sphinx. You may use indexes that are built with the indexer tool, but
Oedipus does not (yet) provide wrappers for indexing that data via Ruby [1].

## Dependencies

  * ruby >= 1.9
  * sphinx >= 2.0.2
  * mysql dev libs >= 4.1

## Installation

Via rubygems:

```
gem install oedipus
```

## Usage

The following features are all currently implemented.

### Connecting to Sphinx

``` ruby
require "oedipus"

sphinx = Oedipus.connect('127.0.0.1:9306') # sphinxql host
```

**NOTE:** Don't connect to the named host 'localhost', since the MySQL library
will try to use a UNIX socket instead of a TCP connection, which Sphinx doesn't
currently support.

Connections can be re-used by calling `Oedipus.connection` once connected.

``` ruby
sphinx = Oedipus.connection
```

If you're using Oedipus in a Rails application, you may wish to call `connect`
inside an initializer and then obtain that connection in your application, by
calling `connection`.

If you need to manage multiple connections, you may specify names for each
connection.

``` ruby
Oedipus.connect("other-host.tld:9306", :other)

sphinx = Oedipus.connection(:other)
```

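The note about 'localhost' is easy to forget when the host string comes from configuration. As a minimal sketch, the address could be normalised before it is handed to `Oedipus.connect`. The helper `tcp_address` and the constant `DEFAULT_SPHINXQL_PORT` are hypothetical names introduced here for illustration; they are not part of Oedipus, and the default port is an assumption taken from the example above.

```ruby
# Hypothetical helper (not part of Oedipus): rewrite the named host
# "localhost" to 127.0.0.1 so the MySQL client library uses TCP.
DEFAULT_SPHINXQL_PORT = "9306" # assumption: the port used in the example above

def tcp_address(host_and_port)
  host, port = host_and_port.split(":", 2)
  host = "127.0.0.1" if host == "localhost"
  "#{host}:#{port || DEFAULT_SPHINXQL_PORT}"
end

local   = tcp_address("localhost:9306")
remote  = tcp_address("other-host.tld:9306")
no_port = tcp_address("localhost")

puts local
puts remote
puts no_port
```

The result could then be passed straight through, for example `Oedipus.connect(tcp_address("localhost:9306"))`.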
### Inserting (real-time indexes)

``` ruby
sphinx[:articles].insert(
  7,
  title:     "Badgers in the wild",
  body:      "A big long wodge of text",
  author_id: 4,
  views:     102
)
```

### Replacing (real-time indexes)

``` ruby
sphinx[:articles].replace(
  7,
  title:     "Badgers in the wild",
  body:      "A big long wodge of text",
  author_id: 4,
  views:     102
)
```

### Updating (real-time indexes)

``` ruby
sphinx[:articles].update(7, views: 103)
```

### Deleting (real-time indexes)

``` ruby
sphinx[:articles].delete(7)
```

### Fetching a known document (by ID)

``` ruby
record = sphinx[:articles].fetch(7)
# => { id: 7, views: 984, author_id: 3 }
```

### Fulltext searching

You perform queries by invoking `#search` on the index.

``` ruby
results = sphinx[:articles].search("badgers", limit: 2)

# Metadata indicates the overall number of matched records, while the ':records'
# array contains the actual data returned.
#
# => {
#   total_found: 987,
#   time: 0.000,
#   keywords: [ "badgers" ],
#   docs: { "badgers" => 987 },
#   records: [
#     { id: 7, author_id: 4, views: 102 },
#     { id: 11, author_id: 6, views: 23 }
#   ]
# }
```

### Fetching only specific attributes

``` ruby
sphinx[:articles].search(
  "example",
  attrs: [:id, :views]
)
```

### Fetching additional attributes (including expressions)

Any valid field expression may be fetched. Be sure to alias it if you want to order by it.

``` ruby
sphinx[:articles].search(
  "example",
  attrs: [:*, "WEIGHT() AS wgt"]
)
```

### Attribute filters

Result formatting is the same as for a fulltext search. You can add as many
filters as you like.

``` ruby
# equality
sphinx[:articles].search(
  "example",
  author_id: 7
)

# less than or equal
sphinx[:articles].search(
  "example",
  views: -Float::INFINITY..100
)

sphinx[:articles].search(
  "example",
  views: Oedipus.lte(100)
)

# greater than
sphinx[:articles].search(
  "example",
  views: 100...Float::INFINITY
)

sphinx[:articles].search(
  "example",
  views: Oedipus.gt(100)
)

# not equal
sphinx[:articles].search(
  "example",
  author_id: Oedipus.not(7)
)

# between
sphinx[:articles].search(
  "example",
  views: 50..100
)

sphinx[:articles].search(
  "example",
  views: 50...100
)

# not between
sphinx[:articles].search(
  "example",
  views: Oedipus.not(50..100)
)

sphinx[:articles].search(
  "example",
  views: Oedipus.not(50...100)
)

# IN( ... )
sphinx[:articles].search(
  "example",
  author_id: [7, 22]
)

# NOT IN( ... )
sphinx[:articles].search(
  "example",
  author_id: Oedipus.not([7, 22])
)
```

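The range filters above use both of Ruby's range literals: `..` includes its end value, while `...` excludes it. The sketch below illustrates only that plain-Ruby distinction for finite ranges. `range_condition` is a hypothetical helper written for this illustration; it is not how Oedipus builds its SphinxQL, and the condition strings it produces are an assumption about what an equivalent SQL-like expression might look like.

```ruby
# Hypothetical helper (not part of Oedipus): describe a finite Range as a
# SQL-like condition, to make the `..` versus `...` difference visible.
def range_condition(attr, range)
  if range.exclude_end?
    # `...` excludes the upper bound
    "#{attr} >= #{range.begin} AND #{attr} < #{range.end}"
  else
    # `..` includes both bounds
    "#{attr} BETWEEN #{range.begin} AND #{range.end}"
  end
end

inclusive = range_condition(:views, 50..100)
exclusive = range_condition(:views, 50...100)

puts inclusive
puts exclusive
puts((50..100).include?(100))
puts((50...100).include?(100))
```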
### Ordering

``` ruby
sphinx[:articles].search("badgers", order: { views: :asc })
```

Special handling is done for ordering by relevance.

``` ruby
sphinx[:articles].search("badgers", order: { relevance: :desc })
```

In the above case, Oedipus explicitly adds `WEIGHT() AS relevance` to the `:attrs`
option. You can manually set up the relevance sort if you wish to name the weighting
attribute differently.

### Limits and offsets

Note that Sphinx applies a limit of 20 by default, so you probably want to specify
a limit yourself. You are bound by your `max_matches` setting in sphinx.conf.

The metadata will still indicate the actual number of results that matched; you simply
get a smaller collection of materialized records.

``` ruby
sphinx[:articles].search("bobcats", limit: 50)
sphinx[:articles].search("bobcats", limit: 50, offset: 150)
```

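Because `total_found` reports every match while `:records` holds only one window of them, the two values are enough to drive pagination. The following is a minimal sketch under that assumption; `page_options` and `total_pages` are hypothetical helpers, not part of Oedipus, and they do not account for the `max_matches` ceiling mentioned above.

```ruby
# Hypothetical helper: turn a 1-based page number into :limit/:offset options.
def page_options(page, per_page = 50)
  raise ArgumentError, "page must be >= 1" if page < 1
  { limit: per_page, offset: (page - 1) * per_page }
end

# Hypothetical helper: number of pages needed to show total_found matches,
# using integer arithmetic to round up.
def total_pages(total_found, per_page = 50)
  (total_found + per_page - 1) / per_page
end

fourth_page = page_options(4)
pages       = total_pages(987)

puts fourth_page.inspect
puts pages
```

The options hash could then be merged into a search call, for example `sphinx[:articles].search("bobcats", page_options(4))`.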
### Faceted searching

A faceted search takes a base query and a set of additional queries that are
variations on it. Oedipus makes this simple by allowing your facets to inherit
from the base query.

Oedipus allows you to replace `'%{query}'` in your facets with whatever was in
the original query. This can be useful if you want to provide facets that only
perform the search in the title of the document (`"@title (%{query})"`) for
example.

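The `%{query}` placeholder has the same shape as the named references accepted by Ruby's `Kernel#format`. Whether Oedipus uses `format` internally is not stated here; the sketch below only shows how such a substitution behaves in plain Ruby. `expand_facet_query` is a hypothetical helper introduced for illustration.

```ruby
# Hypothetical helper (not part of Oedipus): substitute the base query
# into a facet template that contains the %{query} placeholder.
def expand_facet_query(template, base_query)
  format(template, query: base_query)
end

in_title     = expand_facet_query("@title (%{query})", "badgers")
also_farming = expand_facet_query("%{query} & farming", "badgers")

puts in_title
puts also_farming
```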
Each facet is given a key, which is used to reference it in the results. This
key is any arbitrary object that can be used as a key in a ruby Hash. You may,
for example, use domain-specific objects as keys.

Sphinx optimizes the queries by figuring out what the common parts are. Currently
it does two optimizations, though in future this will likely improve further, so
using this technique to do your faceted searches is the correct approach.

``` ruby
results = sphinx[:articles].search(
  "badgers",
  facets: {
    popular:         { views: 100..10000 },
    also_farming:    "%{query} & farming",
    popular_farming: ["%{query} & farming", views: 100..10000 ]
  }
)
# => {
#   total_found: 987,
#   time: 0.000,
#   records: [ ... ],
#   facets: {
#     popular: {
#       total_found: 25,
#       time: 0.000,
#       records: [ ... ]
#     },
#     also_farming: {
#       total_found: 123,
#       time: 0.000,
#       records: [ ... ]
#     },
#     popular_farming: {
#       total_found: 2,
#       time: 0.000,
#       records: [ ... ]
#     }
#   }
# }
```

#### Multi-dimensional faceted search

If you can add facets to the root query, how about adding facets to the facets
themselves? Easy:

``` ruby
results = sphinx[:articles].search(
  "badgers",
  facets: {
    popular: {
      views: 100..10000,
      facets: {
        in_title: "@title (%{query})"
      }
    }
  }
)
# => {
#   total_found: 987,
#   time: 0.000,
#   records: [ ... ],
#   facets: {
#     popular: {
#       total_found: 25,
#       time: 0.000,
#       records: [ ... ],
#       facets: {
#         in_title: {
#           total_found: 24,
#           time: 0.000,
#           records: [ ... ]
#         }
#       }
#     }
#   }
# }
```

In the above example, the nested facet `:in_title` inherits the default
parameters from the facet `:popular`, which inherits its parameters from
the root query. The result is a search for "badgers" limited only to the
title, with views between 100 and 10000.

There is no limit imposed in Oedipus for how deeply facets can be nested.

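The inheritance chain described above can be modelled in a few lines of plain Ruby. This is a simplified sketch and not Oedipus's actual implementation: it assumes a search is just a `[query, filters]` pair, ignores the nested `:facets` key, and uses a hypothetical `inherit` helper to layer a facet over its parent.

```ruby
# Hypothetical model of facet inheritance, not Oedipus's actual code.
# A search is a [query, filters] pair. A facet may be a String (new query),
# a Hash (extra filters), or an Array of [query, filters].
def inherit(parent, facet)
  parent_query, parent_filters = parent
  facet_query, facet_filters =
    case facet
    when String then [facet, {}]
    when Hash   then ["%{query}", facet]
    when Array  then facet
    end
  # The facet's query may refer to the parent's query via %{query};
  # the facet's filters are layered over the parent's filters.
  [format(facet_query, query: parent_query), parent_filters.merge(facet_filters)]
end

root     = ["badgers", {}]
popular  = inherit(root, { views: 100..10000 })
in_title = inherit(popular, "@title (%{query})")

puts popular.inspect
puts in_title.inspect
```

In this model `in_title` ends up with the title-restricted query and the `views` filter it inherited from `popular`, mirroring the result described above.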
### General purpose multi-search

If you want to execute multiple queries in a batch that are not related to each
other (related queries would be a faceted search), then you can use `#multi_search`.

You pass a Hash of keyed queries and get back a Hash of keyed result sets.

``` ruby
results = sphinx[:articles].multi_search(
  badgers: ["badgers", limit: 30],
  frogs:   "frogs & wetlands",
  rabbits: ["rabbits | burrows", view_count: 20..100]
)
# => {
#   badgers: {
#     ...
#   },
#   frogs: {
#     ...
#   },
#   rabbits: {
#     ...
#   }
# }
```

## Disconnecting from Sphinx

Oedipus will automatically close connections that are idle, so you don't,
generally speaking, need to close connections manually. However, before
forking children, for example, it is important to dispose of any open
resources to avoid sharing resources across processes. To do this:

``` ruby
Oedipus.connections.each { |key, conn| conn.close }
```

## Running the specs

There are both unit tests and integration tests in the spec/ directory. By default they
will both run, but in order for the integration specs to work, you need a locally
installed copy of [Sphinx] [2]. You then execute the specs as follows:

    SEARCHD=/path/to/bin/searchd bundle exec rake spec

If you don't have Sphinx installed locally, you cannot run the integration specs (they need
to write config files and start and stop sphinx internally).

To run the unit tests alone, without the need for Sphinx:

    bundle exec rake spec:unit

If you have made changes to the C extension, those changes will be compiled and installed
(to the lib/ directory) before the specs are run.

### Footnotes

[1]: In practice I find such an abstraction not to be very useful, as it assumes a single-server setup

[2]: You can build a local copy of sphinx without installing it on the system:

    cd sphinx-2.0.4/
    ./configure
    make

The searchd binary will be found in /path/to/sphinx-2.0.4/src/searchd.

## Future Plans

  * Integration with ActiveRecord
  * Support for re-indexing non-realtime indexes from ruby code
  * Distributed index support (sharding writes between indexes)
  * Query translation layer for Lucene-style AND/OR/NOT and attribute:value interpretation
  * Fulltext query sanitization for unsafe user input (e.g. @@missing field)

## Copyright and Licensing

Refer to the LICENSE file for details.