dbldots_oedipus 0.0.16
- data/.gitignore +10 -0
- data/.rspec +1 -0
- data/Gemfile +4 -0
- data/LICENSE +20 -0
- data/README.md +435 -0
- data/Rakefile +26 -0
- data/ext/oedipus/extconf.rb +72 -0
- data/ext/oedipus/lexing.c +96 -0
- data/ext/oedipus/lexing.h +20 -0
- data/ext/oedipus/oedipus.c +339 -0
- data/ext/oedipus/oedipus.h +58 -0
- data/lib/oedipus.rb +40 -0
- data/lib/oedipus/comparison.rb +88 -0
- data/lib/oedipus/comparison/between.rb +21 -0
- data/lib/oedipus/comparison/equal.rb +21 -0
- data/lib/oedipus/comparison/gt.rb +21 -0
- data/lib/oedipus/comparison/gte.rb +21 -0
- data/lib/oedipus/comparison/in.rb +21 -0
- data/lib/oedipus/comparison/lt.rb +21 -0
- data/lib/oedipus/comparison/lte.rb +21 -0
- data/lib/oedipus/comparison/not.rb +25 -0
- data/lib/oedipus/comparison/not_equal.rb +21 -0
- data/lib/oedipus/comparison/not_in.rb +21 -0
- data/lib/oedipus/comparison/outside.rb +21 -0
- data/lib/oedipus/comparison/shortcuts.rb +144 -0
- data/lib/oedipus/connection.rb +124 -0
- data/lib/oedipus/connection/pool.rb +133 -0
- data/lib/oedipus/connection/registry.rb +56 -0
- data/lib/oedipus/connection_error.rb +14 -0
- data/lib/oedipus/index.rb +320 -0
- data/lib/oedipus/query_builder.rb +185 -0
- data/lib/oedipus/rspec/test_rig.rb +132 -0
- data/lib/oedipus/version.rb +12 -0
- data/oedipus.gemspec +42 -0
- data/spec/data/.gitkeep +0 -0
- data/spec/integration/connection/registry_spec.rb +50 -0
- data/spec/integration/connection_spec.rb +156 -0
- data/spec/integration/index_spec.rb +442 -0
- data/spec/spec_helper.rb +16 -0
- data/spec/unit/comparison/between_spec.rb +36 -0
- data/spec/unit/comparison/equal_spec.rb +22 -0
- data/spec/unit/comparison/gt_spec.rb +22 -0
- data/spec/unit/comparison/gte_spec.rb +22 -0
- data/spec/unit/comparison/in_spec.rb +22 -0
- data/spec/unit/comparison/lt_spec.rb +22 -0
- data/spec/unit/comparison/lte_spec.rb +22 -0
- data/spec/unit/comparison/not_equal_spec.rb +22 -0
- data/spec/unit/comparison/not_in_spec.rb +22 -0
- data/spec/unit/comparison/not_spec.rb +37 -0
- data/spec/unit/comparison/outside_spec.rb +36 -0
- data/spec/unit/comparison/shortcuts_spec.rb +125 -0
- data/spec/unit/comparison_spec.rb +109 -0
- data/spec/unit/query_builder_spec.rb +205 -0
- metadata +164 -0
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
--colour
|
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright © 2011 Chris Corbyn
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,435 @@
|
|
1
|
+
# Oedipus: Sphinx 2 Search Client for Ruby
|
2
|
+
|
3
|
+
Oedipus is a client for the Sphinx search engine (>= 2.0.2), with support for
|
4
|
+
real-time indexes and multi-dimensional faceted searches.
|
5
|
+
|
6
|
+
It is not a clone of the PHP API, rather it is written from the ground up,
|
7
|
+
wrapping the SphinxQL API offered by searchd. Nor is it a plugin for
|
8
|
+
ActiveRecord or DataMapper... though this will be offered in separate gems (see
|
9
|
+
[oedipus-dm](https://github.com/d11wtq/oedipus-dm)).
|
10
|
+
|
11
|
+
Oedipus provides a level of abstraction that makes faceted search easy to
|
12
|
+
implement, while remaining light and simple.
|
13
|
+
|
14
|
+
Data structures are managed using core ruby data types (Array and Hash), ensuring
|
15
|
+
simplicity and flexibility.
|
16
|
+
|
17
|
+
The current development focus is on supporting realtime indexes, where data is
|
18
|
+
indexed from your application, rather than by running the indexer tool that comes
|
19
|
+
with Sphinx. You may use indexes that are indexed with the indexer tool, but
|
20
|
+
Oedipus does not (yet) provide wrappers for indexing that data via ruby [1].
|
21
|
+
|
22
|
+
## Dependencies
|
23
|
+
|
24
|
+
* ruby >= 1.9
|
25
|
+
* sphinx >= 2.0.2
|
26
|
+
* mysql dev libs >= 4.1
|
27
|
+
|
28
|
+
## Installation
|
29
|
+
|
30
|
+
Via rubygems:
|
31
|
+
|
32
|
+
```
|
33
|
+
gem install oedipus
|
34
|
+
```
|
35
|
+
|
36
|
+
## Usage
|
37
|
+
|
38
|
+
The following features are all currently implemented.
|
39
|
+
|
40
|
+
### Connecting to Sphinx
|
41
|
+
|
42
|
+
``` ruby
|
43
|
+
require "oedipus"
|
44
|
+
|
45
|
+
sphinx = Oedipus.connect('127.0.0.1:9306') # sphinxql host
|
46
|
+
```
|
47
|
+
|
48
|
+
**NOTE:** Don't connect to the named host 'localhost', since the MySQL library
|
49
|
+
will try to use a UNIX socket instead of a TCP connection, which Sphinx doesn't
|
50
|
+
currently support.
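
One way to avoid the problem in application code is to rewrite the host before connecting. The sketch below is only an illustration: `tcp_host_for` is a hypothetical helper, not part of Oedipus, and it assumes the address is a plain `host:port` string.

```ruby
# Hypothetical helper (not part of Oedipus): make sure the named host
# "localhost" is never handed to the MySQL library, so that a TCP
# connection is used rather than a UNIX socket.
def tcp_host_for(address)
  host, port = address.split(":", 2)
  host = "127.0.0.1" if host == "localhost"
  port ? "#{host}:#{port}" : host
end

tcp_host_for("localhost:9306") # => "127.0.0.1:9306"
```

The returned string could then be passed to `Oedipus.connect`.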
|
51
|
+
|
52
|
+
Connections can be re-used by calling `Oedipus.connection` once connected.
|
53
|
+
|
54
|
+
``` ruby
|
55
|
+
sphinx = Oedipus.connection
|
56
|
+
```
|
57
|
+
|
58
|
+
If you're using Oedipus in a Rails application, you may wish to call `connect`
|
59
|
+
inside an initializer and then obtain that connection in your application, by
|
60
|
+
calling `connection`.
|
61
|
+
|
62
|
+
If you need to manage multiple connections, you may specify names for each
|
63
|
+
connection.
|
64
|
+
|
65
|
+
``` ruby
|
66
|
+
Oedipus.connect("other-host.tld:9306", :other)
|
67
|
+
|
68
|
+
sphinx = Oedipus.connection(:other)
|
69
|
+
```
|
70
|
+
|
71
|
+
### Inserting (real-time indexes)
|
72
|
+
|
73
|
+
``` ruby
|
74
|
+
sphinx[:articles].insert(
|
75
|
+
7,
|
76
|
+
title: "Badgers in the wild",
|
77
|
+
body: "A big long wodge of text",
|
78
|
+
author_id: 4,
|
79
|
+
views: 102
|
80
|
+
)
|
81
|
+
```
|
82
|
+
|
83
|
+
### Replacing (real-time indexes)
|
84
|
+
|
85
|
+
``` ruby
|
86
|
+
sphinx[:articles].replace(
|
87
|
+
7,
|
88
|
+
title: "Badgers in the wild",
|
89
|
+
body: "A big long wodge of text",
|
90
|
+
author_id: 4,
|
91
|
+
views: 102
|
92
|
+
)
|
93
|
+
```
|
94
|
+
|
95
|
+
### Updating (real-time indexes)
|
96
|
+
|
97
|
+
``` ruby
|
98
|
+
sphinx[:articles].update(7, views: 103)
|
99
|
+
```
|
100
|
+
|
101
|
+
### Deleting (real-time indexes)
|
102
|
+
|
103
|
+
``` ruby
|
104
|
+
sphinx[:articles].delete(7)
|
105
|
+
```
|
106
|
+
|
107
|
+
### Fetching a known document (by ID)
|
108
|
+
|
109
|
+
``` ruby
|
110
|
+
record = sphinx[:articles].fetch(7)
|
111
|
+
# => { id: 7, views: 984, author_id: 3 }
|
112
|
+
```
|
113
|
+
|
114
|
+
### Fulltext searching
|
115
|
+
|
116
|
+
You perform queries by invoking `#search` on the index.
|
117
|
+
|
118
|
+
|
119
|
+
``` ruby
|
120
|
+
results = sphinx[:articles].search("badgers", limit: 2)
|
121
|
+
|
122
|
+
# Metadata indicates the overall number of matched records, while the ':records'
|
123
|
+
# array contains the actual data returned.
|
124
|
+
#
|
125
|
+
# => {
|
126
|
+
# total_found: 987,
|
127
|
+
# time: 0.000,
|
128
|
+
# keywords: [ "badgers" ],
|
129
|
+
# docs: { "badgers" => 987 },
|
130
|
+
# records: [
|
131
|
+
# { id: 7, author_id: 4, views: 102 },
|
132
|
+
# { id: 11, author_id: 6, views: 23 }
|
133
|
+
# ]
|
134
|
+
# }
|
135
|
+
```
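
As a rough illustration of consuming this structure, the snippet below works on a literal copy of the Hash documented above, so it runs without Sphinx. The variable names `ids` and `remaining` are only for the example.

```ruby
# Literal copy of the documented result shape (no Sphinx connection needed).
results = {
  total_found: 987,
  time: 0.000,
  keywords: ["badgers"],
  docs: { "badgers" => 987 },
  records: [
    { id: 7,  author_id: 4, views: 102 },
    { id: 11, author_id: 6, views: 23 }
  ]
}

# IDs of the records that were actually returned.
ids = results[:records].map { |record| record[:id] } # => [7, 11]

# Matches that exist in the index but were cut off by the limit.
remaining = results[:total_found] - results[:records].size # => 985
```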
|
136
|
+
|
137
|
+
### Fetching only specific attributes
|
138
|
+
|
139
|
+
``` ruby
|
140
|
+
sphinx[:articles].search(
|
141
|
+
"example",
|
142
|
+
attrs: [:id, :views]
|
143
|
+
)
|
144
|
+
```
|
145
|
+
|
146
|
+
### Fetching additional attributes (including expressions)
|
147
|
+
|
148
|
+
Any valid field expression may be fetched. Be sure to alias it if you want to order by it.
|
149
|
+
|
150
|
+
``` ruby
|
151
|
+
sphinx[:articles].search(
|
152
|
+
"example",
|
153
|
+
attrs: [:*, "WEIGHT() AS wgt"]
|
154
|
+
)
|
155
|
+
```
|
156
|
+
|
157
|
+
### Attribute filters
|
158
|
+
|
159
|
+
Result formatting is the same as for a fulltext search. You can add as many
|
160
|
+
filters as you like.
|
161
|
+
|
162
|
+
``` ruby
|
163
|
+
# equality
|
164
|
+
sphinx[:articles].search(
|
165
|
+
"example",
|
166
|
+
author_id: 7
|
167
|
+
)
|
168
|
+
|
169
|
+
# less than or equal
|
170
|
+
sphinx[:articles].search(
|
171
|
+
"example",
|
172
|
+
views: -Float::INFINITY..100
|
173
|
+
)
|
174
|
+
|
175
|
+
sphinx[:articles].search(
|
176
|
+
"example",
|
177
|
+
views: Oedipus.lte(100)
|
178
|
+
)
|
179
|
+
|
180
|
+
# greater than
|
181
|
+
sphinx[:articles].search(
|
182
|
+
"example",
|
183
|
+
views: 100...Float::INFINITY
|
184
|
+
)
|
185
|
+
|
186
|
+
sphinx[:articles].search(
|
187
|
+
"example",
|
188
|
+
views: Oedipus.gt(100)
|
189
|
+
)
|
190
|
+
|
191
|
+
# not equal
|
192
|
+
sphinx[:articles].search(
|
193
|
+
"example",
|
194
|
+
author_id: Oedipus.not(7)
|
195
|
+
)
|
196
|
+
|
197
|
+
# between
|
198
|
+
sphinx[:articles].search(
|
199
|
+
"example",
|
200
|
+
views: 50..100
|
201
|
+
)
|
202
|
+
|
203
|
+
sphinx[:articles].search(
|
204
|
+
"example",
|
205
|
+
views: 50...100
|
206
|
+
)
|
207
|
+
|
208
|
+
# not between
|
209
|
+
sphinx[:articles].search(
|
210
|
+
"example",
|
211
|
+
views: Oedipus.not(50..100)
|
212
|
+
)
|
213
|
+
|
214
|
+
sphinx[:articles].search(
|
215
|
+
"example",
|
216
|
+
views: Oedipus.not(50...100)
|
217
|
+
)
|
218
|
+
|
219
|
+
# IN( ... )
|
220
|
+
sphinx[:articles].search(
|
221
|
+
"example",
|
222
|
+
author_id: [7, 22]
|
223
|
+
)
|
224
|
+
|
225
|
+
# NOT IN( ... )
|
226
|
+
sphinx[:articles].search(
|
227
|
+
"example",
|
228
|
+
author_id: Oedipus.not([7, 22])
|
229
|
+
)
|
230
|
+
```
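
To make the shorthands above more concrete, here is a minimal sketch of how such filter values could be turned into SQL-like conditions. `condition_for` is a hypothetical helper written for this illustration; it is not Oedipus's implementation, the exact SphinxQL that Oedipus emits may differ, and the `Oedipus.not(...)` wrappers are not covered.

```ruby
# Hypothetical helper, for illustration only (NOT Oedipus's own code):
# turn a filter value into a SQL-like condition string.
def condition_for(attr, value)
  case value
  when Range
    lo, hi = value.begin, value.end
    if lo == -Float::INFINITY
      # Open lower bound: "less than" or "less than or equal".
      "#{attr} #{value.exclude_end? ? '<' : '<='} #{hi}"
    elsif hi == Float::INFINITY
      # Open upper bound: mirrors the README, where 100...INFINITY is "greater than".
      "#{attr} #{value.exclude_end? ? '>' : '>='} #{lo}"
    elsif value.exclude_end?
      "#{attr} >= #{lo} AND #{attr} < #{hi}"
    else
      "#{attr} BETWEEN #{lo} AND #{hi}"
    end
  when Array
    "#{attr} IN (#{value.join(', ')})"
  else
    "#{attr} = #{value}"
  end
end

condition_for(:views, -Float::INFINITY..100) # => "views <= 100"
condition_for(:views, 50..100)               # => "views BETWEEN 50 AND 100"
condition_for(:author_id, [7, 22])           # => "author_id IN (7, 22)"
```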
|
231
|
+
|
232
|
+
### Ordering
|
233
|
+
|
234
|
+
``` ruby
|
235
|
+
sphinx[:articles].search("badgers", order: { views: :asc })
|
236
|
+
```
|
237
|
+
|
238
|
+
Special handling is done for ordering by relevance.
|
239
|
+
|
240
|
+
``` ruby
|
241
|
+
sphinx[:articles].search("badgers", order: { relevance: :desc })
|
242
|
+
```
|
243
|
+
|
244
|
+
In the above case, Oedipus explicitly adds `WEIGHT() AS relevance` to the `:attrs`
|
245
|
+
option. You can manually set up the relevance sort if you wish to name the weighting
|
246
|
+
attribute differently.
|
247
|
+
|
248
|
+
### Limits and offsets
|
249
|
+
|
250
|
+
Note that Sphinx applies a limit of 20 by default, so you probably want to specify
|
251
|
+
a limit yourself. You are bound by your `max_matches` setting in sphinx.conf.
|
252
|
+
|
253
|
+
The metadata will still indicate the actual number of results that matched; you simply
|
254
|
+
get a smaller collection of materialized records.
|
255
|
+
|
256
|
+
``` ruby
|
257
|
+
sphinx[:articles].search("bobcats", limit: 50)
|
258
|
+
sphinx[:articles].search("bobcats", limit: 50, offset: 150)
|
259
|
+
```
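
For page-based navigation, the `:limit` and `:offset` values can be derived from a page number. The helpers below are a sketch with hypothetical names (`page_options`, `page_count`); they are not part of Oedipus, and the offsets they produce are still subject to your `max_matches` setting.

```ruby
# Hypothetical helper: translate a 1-based page number into the :limit and
# :offset options used above.
def page_options(page, per_page = 50)
  { limit: per_page, offset: (page - 1) * per_page }
end

# Number of pages implied by the :total_found value in the metadata.
def page_count(total_found, per_page = 50)
  (total_found.to_f / per_page).ceil
end

page_options(4) # => { limit: 50, offset: 150 }
page_count(987) # => 20
```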
|
260
|
+
|
261
|
+
### Faceted searching
|
262
|
+
|
263
|
+
A faceted search takes a base query and a set of additional queries that are
|
264
|
+
variations on it. Oedipus makes this simple by allowing your facets to inherit
|
265
|
+
from the base query.
|
266
|
+
|
267
|
+
Oedipus allows you to replace `'%{query}'` in your facets with whatever was in
|
268
|
+
the original query. This can be useful if you want to provide facets that only
|
269
|
+
perform the search in the title of the document (`"@title (%{query})"`) for
|
270
|
+
example.
|
271
|
+
|
272
|
+
Each facet is given a key, which is used to reference it in the results. This
|
273
|
+
key is any arbitrary object that can be used as a key in a ruby Hash. You may,
|
274
|
+
for example, use domain-specific objects as keys.
|
275
|
+
|
276
|
+
Sphinx optimizes the queries by figuring out what the common parts are. Currently
|
277
|
+
it does two optimizations, though in future this will likely improve further, so
|
278
|
+
using this technique to do your faceted searches is the correct approach.
|
279
|
+
|
280
|
+
``` ruby
|
281
|
+
results = sphinx[:articles].search(
|
282
|
+
"badgers",
|
283
|
+
facets: {
|
284
|
+
popular: { views: 100..10000 },
|
285
|
+
also_farming: "%{query} & farming",
|
286
|
+
popular_farming: ["%{query} & farming", views: 100..10000 ]
|
287
|
+
}
|
288
|
+
)
|
289
|
+
# => {
|
290
|
+
# total_found: 987,
|
291
|
+
# time: 0.000,
|
292
|
+
# records: [ ... ],
|
293
|
+
# facets: {
|
294
|
+
# popular: {
|
295
|
+
# total_found: 25,
|
296
|
+
# time: 0.000,
|
297
|
+
# records: [ ... ]
|
298
|
+
# },
|
299
|
+
# also_farming: {
|
300
|
+
# total_found: 123,
|
301
|
+
# time: 0.000,
|
302
|
+
# records: [ ... ]
|
303
|
+
# },
|
304
|
+
# popular_farming: {
|
305
|
+
# total_found: 2,
|
306
|
+
# time: 0.000,
|
307
|
+
# records: [ ... ]
|
308
|
+
# }
|
309
|
+
# }
|
310
|
+
# }
|
311
|
+
```
|
312
|
+
|
313
|
+
#### Multi-dimensional faceted search
|
314
|
+
|
315
|
+
If you can add facets to the root query, how about adding facets to the facets
|
316
|
+
themselves? Easy:
|
317
|
+
|
318
|
+
``` ruby
|
319
|
+
results = sphinx[:articles].search(
|
320
|
+
"badgers",
|
321
|
+
facets: {
|
322
|
+
popular: {
|
323
|
+
views: 100..10000,
|
324
|
+
facets: {
|
325
|
+
in_title: "@title (%{query})"
|
326
|
+
}
|
327
|
+
}
|
328
|
+
}
|
329
|
+
)
|
330
|
+
# => {
|
331
|
+
# total_found: 987,
|
332
|
+
# time: 0.000,
|
333
|
+
# records: [ ... ],
|
334
|
+
# facets: {
|
335
|
+
# popular: {
|
336
|
+
# total_found: 25,
|
337
|
+
# time: 0.000,
|
338
|
+
# records: [ ... ],
|
339
|
+
# facets: {
|
340
|
+
# in_title: {
|
341
|
+
# total_found: 24,
|
342
|
+
# time: 0.000,
|
343
|
+
# records: [ ... ]
|
344
|
+
# }
|
345
|
+
# }
|
346
|
+
# }
|
347
|
+
# }
|
348
|
+
# }
|
349
|
+
```
|
350
|
+
|
351
|
+
In the above example, the nested facet `:in_title` inherits the default
|
352
|
+
parameters from the facet `:popular`, which inherits its parameters from
|
353
|
+
the root query. The result is a search for "badgers" limited only to the
|
354
|
+
title, with views between 100 and 10000.
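
The inheritance described here can be sketched in plain Ruby. This is an assumption-laden illustration rather than Oedipus's actual code: `resolve_facet` is a hypothetical helper that substitutes `%{query}` and merges the parent's filters into the facet's own.

```ruby
# Hypothetical sketch of facet inheritance, for illustration only; this is
# not Oedipus's actual code. A facet may be a String (query only), a Hash
# (filters only), or an Array of [query, filters].
def resolve_facet(parent_query, parent_filters, facet)
  query, filters =
    case facet
    when String then [facet, {}]
    when Hash   then [nil, facet]
    when Array  then [facet[0], facet[1] || {}]
    end

  # Nested :facets would be resolved in a later step, so they are not filters.
  filters = filters.reject { |key, _| key == :facets }

  # Substitute the parent's query for "%{query}"; a facet without a query of
  # its own reuses the parent's query unchanged.
  resolved = query ? query.gsub("%{query}") { parent_query } : parent_query
  [resolved, parent_filters.merge(filters)]
end

popular  = resolve_facet("badgers", {}, { views: 100..10000 })
in_title = resolve_facet(*popular, "@title (%{query})")
# in_title => ["@title (badgers)", { views: 100..10000 }]
```

Resolving each level against its parent in this way is what would let nesting continue to any depth.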
|
355
|
+
|
356
|
+
There is no limit imposed in Oedipus for how deeply facets can be nested.
|
357
|
+
|
358
|
+
### General purpose multi-search
|
359
|
+
|
360
|
+
If you want to execute multiple queries in a batch that are not related to each
|
361
|
+
other (if they were related, that would be a faceted search), then you can use `#multi_search`.
|
362
|
+
|
363
|
+
You pass a Hash of keyed-queries and get a Hash of keyed-resultsets.
|
364
|
+
|
365
|
+
``` ruby
|
366
|
+
results = sphinx[:articles].multi_search(
|
367
|
+
badgers: ["badgers", limit: 30],
|
368
|
+
frogs: "frogs & wetlands",
|
369
|
+
rabbits: ["rabbits | burrows", view_count: 20..100]
|
370
|
+
)
|
371
|
+
# => {
|
372
|
+
# badgers: {
|
373
|
+
# ...
|
374
|
+
# },
|
375
|
+
# frogs: {
|
376
|
+
# ...
|
377
|
+
# },
|
378
|
+
# rabbits: {
|
379
|
+
# ...
|
380
|
+
# }
|
381
|
+
# }
|
382
|
+
```
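
The values in the Hash above take two shapes: a bare query String, or an Array of a query followed by options. The sketch below shows one way such input could be normalised into uniform pairs; `normalize_queries` is a hypothetical helper for illustration and says nothing about how Oedipus handles this internally.

```ruby
# Hypothetical sketch: normalise each value (a String, or an Array of
# [query, options]) into a uniform [query, options] pair.
def normalize_queries(queries)
  queries.each_with_object({}) do |(key, value), out|
    query, options = Array(value)
    out[key] = [query, options || {}]
  end
end

queries = {
  badgers: ["badgers", limit: 30],
  frogs:   "frogs & wetlands"
}

normalized = normalize_queries(queries)
# normalized[:badgers] => ["badgers", { limit: 30 }]
# normalized[:frogs]   => ["frogs & wetlands", {}]
```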
|
383
|
+
|
384
|
+
## Disconnecting from Sphinx
|
385
|
+
|
386
|
+
Oedipus will automatically close connections that are idle, so you don't,
|
387
|
+
generally speaking, need to close connections manually. However, before
|
388
|
+
forking children, for example, it is important to dispose of any open
|
389
|
+
resources to avoid sharing resources across processes. To do this:
|
390
|
+
|
391
|
+
``` ruby
|
392
|
+
Oedipus.connections.each { |key, conn| conn.close }
|
393
|
+
```
|
394
|
+
|
395
|
+
## Running the specs
|
396
|
+
|
397
|
+
There are both unit tests and integration tests in the spec/ directory. By default they
|
398
|
+
will both run, but in order for the integration specs to work, you need a locally
|
399
|
+
installed copy of [Sphinx] [2]. You then execute the specs as follows:
|
400
|
+
|
401
|
+
SEARCHD=/path/to/bin/searchd bundle exec rake spec
|
402
|
+
|
403
|
+
If you don't have Sphinx installed locally, you cannot run the integration specs (they need
|
404
|
+
to write config files and start and stop sphinx internally).
|
405
|
+
|
406
|
+
To run the unit tests alone, without the need for Sphinx:
|
407
|
+
|
408
|
+
bundle exec rake spec:unit
|
409
|
+
|
410
|
+
If you have made changes to the C extension, those changes will be compiled and installed
|
411
|
+
(to the lib/ directory) before the specs are run.
|
412
|
+
|
413
|
+
### Footnotes
|
414
|
+
|
415
|
+
[1]: In practice I find such an abstraction not to be very useful, as it assumes a single-server setup.
|
416
|
+
|
417
|
+
[2]: You can build a local copy of sphinx without installing it on the system:
|
418
|
+
|
419
|
+
cd sphinx-2.0.4/
|
420
|
+
./configure
|
421
|
+
make
|
422
|
+
|
423
|
+
The searchd binary will be found in /path/to/sphinx-2.0.4/src/searchd.
|
424
|
+
|
425
|
+
## Future Plans
|
426
|
+
|
427
|
+
* Integration with ActiveRecord
|
428
|
+
* Support for re-indexing non-realtime indexes from ruby code
|
429
|
+
* Distributed index support (sharding writes between indexes)
|
430
|
+
* Query translation layer for Lucene-style AND/OR/NOT and attribute:value interpretation
|
431
|
+
* Fulltext query sanitization for unsafe user input (e.g. @@missing field)
|
432
|
+
|
433
|
+
## Copyright and Licensing
|
434
|
+
|
435
|
+
Refer to the LICENSE file for details.
|