dbldots_oedipus 0.0.16

Files changed (54)
  1. data/.gitignore +10 -0
  2. data/.rspec +1 -0
  3. data/Gemfile +4 -0
  4. data/LICENSE +20 -0
  5. data/README.md +435 -0
  6. data/Rakefile +26 -0
  7. data/ext/oedipus/extconf.rb +72 -0
  8. data/ext/oedipus/lexing.c +96 -0
  9. data/ext/oedipus/lexing.h +20 -0
  10. data/ext/oedipus/oedipus.c +339 -0
  11. data/ext/oedipus/oedipus.h +58 -0
  12. data/lib/oedipus.rb +40 -0
  13. data/lib/oedipus/comparison.rb +88 -0
  14. data/lib/oedipus/comparison/between.rb +21 -0
  15. data/lib/oedipus/comparison/equal.rb +21 -0
  16. data/lib/oedipus/comparison/gt.rb +21 -0
  17. data/lib/oedipus/comparison/gte.rb +21 -0
  18. data/lib/oedipus/comparison/in.rb +21 -0
  19. data/lib/oedipus/comparison/lt.rb +21 -0
  20. data/lib/oedipus/comparison/lte.rb +21 -0
  21. data/lib/oedipus/comparison/not.rb +25 -0
  22. data/lib/oedipus/comparison/not_equal.rb +21 -0
  23. data/lib/oedipus/comparison/not_in.rb +21 -0
  24. data/lib/oedipus/comparison/outside.rb +21 -0
  25. data/lib/oedipus/comparison/shortcuts.rb +144 -0
  26. data/lib/oedipus/connection.rb +124 -0
  27. data/lib/oedipus/connection/pool.rb +133 -0
  28. data/lib/oedipus/connection/registry.rb +56 -0
  29. data/lib/oedipus/connection_error.rb +14 -0
  30. data/lib/oedipus/index.rb +320 -0
  31. data/lib/oedipus/query_builder.rb +185 -0
  32. data/lib/oedipus/rspec/test_rig.rb +132 -0
  33. data/lib/oedipus/version.rb +12 -0
  34. data/oedipus.gemspec +42 -0
  35. data/spec/data/.gitkeep +0 -0
  36. data/spec/integration/connection/registry_spec.rb +50 -0
  37. data/spec/integration/connection_spec.rb +156 -0
  38. data/spec/integration/index_spec.rb +442 -0
  39. data/spec/spec_helper.rb +16 -0
  40. data/spec/unit/comparison/between_spec.rb +36 -0
  41. data/spec/unit/comparison/equal_spec.rb +22 -0
  42. data/spec/unit/comparison/gt_spec.rb +22 -0
  43. data/spec/unit/comparison/gte_spec.rb +22 -0
  44. data/spec/unit/comparison/in_spec.rb +22 -0
  45. data/spec/unit/comparison/lt_spec.rb +22 -0
  46. data/spec/unit/comparison/lte_spec.rb +22 -0
  47. data/spec/unit/comparison/not_equal_spec.rb +22 -0
  48. data/spec/unit/comparison/not_in_spec.rb +22 -0
  49. data/spec/unit/comparison/not_spec.rb +37 -0
  50. data/spec/unit/comparison/outside_spec.rb +36 -0
  51. data/spec/unit/comparison/shortcuts_spec.rb +125 -0
  52. data/spec/unit/comparison_spec.rb +109 -0
  53. data/spec/unit/query_builder_spec.rb +205 -0
  54. metadata +164 -0
@@ -0,0 +1,10 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
5
+ spec/data/index/*
6
+ spec/data/binlog/*
7
+ spec/data/searchd.*
8
+ spec/data/sphinx.*
9
+ lib/oedipus/oedipus.so
10
+ tmp/
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --colour
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "http://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in oedipus.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright © 2011 Chris Corbyn
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,435 @@
1
+ # Oedipus: Sphinx 2 Search Client for Ruby
2
+
3
+ Oedipus is a client for the Sphinx search engine (>= 2.0.2), with support for
4
+ real-time indexes and multi-dimensional faceted searches.
5
+
6
+ It is not a clone of the PHP API, rather it is written from the ground up,
7
+ wrapping the SphinxQL API offered by searchd. Nor is it a plugin for
8
+ ActiveRecord or DataMapper... though this will be offered in separate gems (see
9
+ [oedipus-dm](https://github.com/d11wtq/oedipus-dm)).
10
+
11
+ Oedipus provides a level of abstraction in terms of the ease with which faceted
12
+ search may be implemented, while remaining light and simple.
13
+
14
+ Data structures are managed using core ruby data types (Array and Hash), ensuring
15
+ simplicity and flexibility.
16
+
17
+ The current development focus is on supporting realtime indexes, where data is
18
+ indexed from your application, rather than by running the indexer tool that comes
19
+ with Sphinx. You may use indexes that are indexed with the indexer tool, but
20
+ Oedipus does not (yet) provide wrappers for indexing that data via ruby [1].
21
+
22
+ ## Dependencies
23
+
24
+ * ruby >= 1.9
25
+ * sphinx >= 2.0.2
26
+ * mysql dev libs >= 4.1
27
+
28
+ ## Installation
29
+
30
+ Via rubygems:
31
+
32
+ ```
33
+ gem install oedipus
34
+ ```
35
+
36
+ ## Usage
37
+
38
+ The following features are all currently implemented.
39
+
40
+ ### Connecting to Sphinx
41
+
42
+ ``` ruby
43
+ require "oedipus"
44
+
45
+ sphinx = Oedipus.connect('127.0.0.1:9306') # sphinxql host
46
+ ```
47
+
48
+ **NOTE:** Don't connect to the named host 'localhost', since the MySQL library
49
+ will try to use a UNIX socket instead of a TCP connection, which Sphinx doesn't
50
+ currently support.
51
+
52
+ Connections can be re-used by calling `Oedipus.connection` once connected.
53
+
54
+ ``` ruby
55
+ sphinx = Oedipus.connection
56
+ ```
57
+
58
+ If you're using Oedipus in a Rails application, you may wish to call `connect`
59
+ inside an initializer and then obtain that connection in your application, by
60
+ calling `connection`.
61
+
62
+ If you need to manage multiple connections, you may specify names for each
63
+ connection.
64
+
65
+ ``` ruby
66
+ Oedipus.connect("other-host.tld:9306", :other)
67
+
68
+ sphinx = Oedipus.connection(:other)
69
+ ```
70
+
71
+ ### Inserting (real-time indexes)
72
+
73
+ ``` ruby
74
+ sphinx[:articles].insert(
75
+ 7,
76
+ title: "Badgers in the wild",
77
+ body: "A big long wodge of text",
78
+ author_id: 4,
79
+ views: 102
80
+ )
81
+ ```
82
+
83
+ ### Replacing (real-time indexes)
84
+
85
+ ``` ruby
86
+ sphinx[:articles].replace(
87
+ 7,
88
+ title: "Badgers in the wild",
89
+ body: "A big long wodge of text",
90
+ author_id: 4,
91
+ views: 102
92
+ )
93
+ ```
94
+
95
+ ### Updating (real-time indexes)
96
+
97
+ ``` ruby
98
+ sphinx[:articles].update(7, views: 103)
99
+ ```
100
+
101
+ ### Deleting (real-time indexes)
102
+
103
+ ``` ruby
104
+ sphinx[:articles].delete(7)
105
+ ```
106
+
107
+ ### Fetching a known document (by ID)
108
+
109
+ ``` ruby
110
+ record = sphinx[:articles].fetch(7)
111
+ # => { id: 7, views: 984, author_id: 3 }
112
+ ```
113
+
114
+ ### Fulltext searching
115
+
116
+ You perform queries by invoking `#search` on the index.
117
+
118
+
119
+ ``` ruby
120
+ results = sphinx[:articles].search("badgers", limit: 2)
121
+
122
+ # Meta data indicates the overall number of matched records, while the ':records'
123
+ # array contains the actual data returned.
124
+ #
125
+ # => {
126
+ # total_found: 987,
127
+ # time: 0.000,
128
+ # keywords: [ "badgers" ],
129
+ # docs: { "badgers" => 987 },
130
+ # records: [
131
+ # { id: 7, author_id: 4, views: 102 },
132
+ # { id: 11, author_id: 6, views: 23 }
133
+ # ]
134
+ # }
135
+ ```
136
+
137
+ ### Fetching only specific attributes
138
+
139
+ ``` ruby
140
+ sphinx[:articles].search(
141
+ "example",
142
+ attrs: [:id, :views]
143
+ )
144
+ ```
145
+
146
+ ### Fetching additional attributes (including expressions)
147
+
148
+ Any valid field expression may be fetched. Be sure to alias it if you want to order by it.
149
+
150
+ ``` ruby
151
+ sphinx[:articles].search(
152
+ "example",
153
+ attrs: [:*, "WEIGHT() AS wgt"]
154
+ )
155
+ ```
156
+
157
+ ### Attribute filters
158
+
159
+ Result formatting is the same as for a fulltext search. You can add as many
160
+ filters as you like.
161
+
162
+ ``` ruby
163
+ # equality
164
+ sphinx[:articles].search(
165
+ "example",
166
+ author_id: 7
167
+ )
168
+
169
+ # less than or equal
170
+ sphinx[:articles].search(
171
+ "example",
172
+ views: -Float::INFINITY..100
173
+ )
174
+
175
+ sphinx[:articles].search(
176
+ "example",
177
+ views: Oedipus.lte(100)
178
+ )
179
+
180
+ # greater than
181
+ sphinx[:articles].search(
182
+ "example",
183
+ views: 100...Float::INFINITY
184
+ )
185
+
186
+ sphinx[:articles].search(
187
+ "example",
188
+ views: Oedipus.gt(100)
189
+ )
190
+
191
+ # not equal
192
+ sphinx[:articles].search(
193
+ "example",
194
+ author_id: Oedipus.not(7)
195
+ )
196
+
197
+ # between
198
+ sphinx[:articles].search(
199
+ "example",
200
+ views: 50..100
201
+ )
202
+
203
+ sphinx[:articles].search(
204
+ "example",
205
+ views: 50...100
206
+ )
207
+
208
+ # not between
209
+ sphinx[:articles].search(
210
+ "example",
211
+ views: Oedipus.not(50..100)
212
+ )
213
+
214
+ sphinx[:articles].search(
215
+ "example",
216
+ views: Oedipus.not(50...100)
217
+ )
218
+
219
+ # IN( ... )
220
+ sphinx[:articles].search(
221
+ "example",
222
+ author_id: [7, 22]
223
+ )
224
+
225
+ # NOT IN( ... )
226
+ sphinx[:articles].search(
227
+ "example",
228
+ author_id: Oedipus.not([7, 22])
229
+ )
230
+ ```
231
+
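The filter examples above lean on Ruby's `Range` semantics: `..` includes the end value, `...` excludes it, and an infinite bound leaves that side open. The sketch below is a hypothetical helper (`describe_filter` is not part of Oedipus) that spells out one plausible reading of each value as a comparison. The README only shows `<=` and `>` for open-ended ranges; the `<` and `>=` cases, and the rendering of bounded ranges, are assumptions extrapolated from that pattern.

```ruby
# Hypothetical helper, NOT part of Oedipus: describes how a filter value
# could be read as a comparison on the named attribute.
def describe_filter(attr, value)
  case value
  when Range
    lower, upper = value.begin, value.end
    if lower == -Float::INFINITY
      # Open lower bound: only the upper bound matters.
      "#{attr} #{value.exclude_end? ? '<' : '<='} #{upper}"
    elsif upper == Float::INFINITY
      # Open upper bound: following the README, `...` means strictly greater.
      "#{attr} #{value.exclude_end? ? '>' : '>='} #{lower}"
    else
      # Bounded range, rendered with plain Ruby Range semantics (assumption).
      "#{attr} >= #{lower} AND #{attr} #{value.exclude_end? ? '<' : '<='} #{upper}"
    end
  when Array
    "#{attr} IN (#{value.join(', ')})"
  else
    "#{attr} = #{value}"
  end
end

puts describe_filter(:views, -Float::INFINITY..100)
puts describe_filter(:views, 100...Float::INFINITY)
puts describe_filter(:author_id, [7, 22])
```

The strings are only a reading aid; the SQL that Oedipus actually generates may differ.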
232
+ ### Ordering
233
+
234
+ ``` ruby
235
+ sphinx[:articles].search("badgers", order: { views: :asc })
236
+ ```
237
+
238
+ Special handling is done for ordering by relevance.
239
+
240
+ ``` ruby
241
+ sphinx[:articles].search("badgers", order: { relevance: :desc })
242
+ ```
243
+
244
+ In the above case, Oedipus explicitly adds `WEIGHT() AS relevance` to the `:attrs`
245
+ option. You can manually set up the relevance sort if you wish to name the weighting
246
+ attribute differently.
247
+
248
+ ### Limits and offsets
249
+
250
+ Note that Sphinx applies a limit of 20 by default, so you probably want to specify
251
+ a limit yourself. You are bound by your `max_matches` setting in sphinx.conf.
252
+
253
+ The meta data will still indicate the actual number of results that matched; you simply
254
+ get a smaller collection of materialized records.
255
+
256
+ ``` ruby
257
+ sphinx[:articles].search("bobcats", limit: 50)
258
+ sphinx[:articles].search("bobcats", limit: 50, offset: 150)
259
+ ```
260
+
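For page-based navigation, the `limit` and `offset` values can be derived from a page number. This is a minimal sketch with hypothetical helpers (`page_options` and `page_count` are not part of Oedipus); it assumes 1-based page numbers and uses the `total_found` figure from the meta data to count pages.

```ruby
# Hypothetical helpers, NOT part of Oedipus.

# Converts a 1-based page number into :limit / :offset options.
def page_options(page, per_page = 50)
  raise ArgumentError, "page must be >= 1" if page < 1
  { limit: per_page, offset: (page - 1) * per_page }
end

# Number of pages needed to show total_found results (integer ceiling).
def page_count(total_found, per_page = 50)
  (total_found + per_page - 1) / per_page
end

page_options(4)  # => { limit: 50, offset: 150 }
page_count(987)  # => 20
```

The options hash could then be merged into the options passed to `#search`. Pages beyond the `max_matches` setting would presumably not be reachable, whatever `page_count` reports.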
261
+ ### Faceted searching
262
+
263
+ A faceted search takes a base query and a set of additional queries that are
264
+ variations on it. Oedipus makes this simple by allowing your facets to inherit
265
+ from the base query.
266
+
267
+ Oedipus allows you to replace `'%{query}'` in your facets with whatever was in
268
+ the original query. This can be useful if you want to provide facets that only
269
+ perform the search in the title of the document (`"@title (%{query})"`) for
270
+ example.
271
+
272
+ Each facet is given a key, which is used to reference it in the results. This
273
+ key is any arbitrary object that can be used as a key in a ruby Hash. You may,
274
+ for example, use domain-specific objects as keys.
275
+
276
+ Sphinx optimizes the queries by figuring out what the common parts are. Currently
277
+ it does two optimizations, though in future this will likely improve further, so
278
+ using this technique to do your faceted searches is the correct approach.
279
+
280
+ ``` ruby
281
+ results = sphinx[:articles].search(
282
+ "badgers",
283
+ facets: {
284
+ popular: { views: 100..10000 },
285
+ also_farming: "%{query} & farming",
286
+ popular_farming: ["%{query} & farming", views: 100..10000 ]
287
+ }
288
+ )
289
+ # => {
290
+ # total_found: 987,
291
+ # time: 0.000,
292
+ # records: [ ... ],
293
+ # facets: {
294
+ # popular: {
295
+ # total_found: 25,
296
+ # time: 0.000,
297
+ # records: [ ... ]
298
+ # },
299
+ # also_farming: {
300
+ # total_found: 123,
301
+ # time: 0.000,
302
+ # records: [ ... ]
303
+ # },
304
+ # popular_farming: {
305
+ # total_found: 2,
306
+ # time: 0.000,
307
+ # records: [ ... ]
308
+ # }
309
+ # }
310
+ # }
311
+ ```
312
+
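The `%{query}` placeholder can be pictured as plain string substitution. The sketch below uses a hypothetical helper (`expand_facet_query` is not part of Oedipus) built on `String#gsub`; whether Oedipus substitutes this way internally is an assumption.

```ruby
# Hypothetical helper, NOT part of Oedipus: replaces every literal
# "%{query}" in a facet template with the base query text.
def expand_facet_query(template, query)
  # The block form of gsub inserts the query verbatim, so backslashes in
  # user input are not treated as back-references.
  template.gsub("%{query}") { query }
end

expand_facet_query("%{query} & farming", "badgers") # => "badgers & farming"
expand_facet_query("@title (%{query})", "badgers")  # => "@title (badgers)"
```

Wrapping the placeholder in parentheses, as in the `@title` example, keeps a multi-word base query grouped under the field operator.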
313
+ #### Multi-dimensional faceted search
314
+
315
+ If you can add facets to the root query, how about adding facets to the facets
316
+ themselves? Easy:
317
+
318
+ ``` ruby
319
+ results = sphinx[:articles].search(
320
+ "badgers",
321
+ facets: {
322
+ popular: {
323
+ views: 100..10000,
324
+ facets: {
325
+ in_title: "@title (%{query})"
326
+ }
327
+ }
328
+ }
329
+ )
330
+ # => {
331
+ # total_found: 987,
332
+ # time: 0.000,
333
+ # records: [ ... ],
334
+ # facets: {
335
+ # popular: {
336
+ # total_found: 25,
337
+ # time: 0.000,
338
+ # records: [ ... ],
339
+ # facets: {
340
+ # in_title: {
341
+ # total_found: 24,
342
+ # time: 0.000,
343
+ # records: [ ... ]
344
+ # }
345
+ # }
346
+ # }
347
+ # }
348
+ # }
349
+ ```
350
+
351
+ In the above example, the nested facet `:in_title` inherits the default
352
+ parameters from the facet `:popular`, which inherits its parameters from
353
+ the root query. The result is a search for "badgers" limited only to the
354
+ title, with views between 100 and 10000.
355
+
356
+ There is no limit imposed in Oedipus for how deeply facets can be nested.
357
+
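The inheritance described above can be modelled as a recursive merge: each facet starts from its parent's query and filters, then applies its own. The following is a rough sketch with hypothetical helpers (`normalize_facet` and `resolve_facets` are not part of Oedipus, and its real implementation may differ); it accepts the three facet forms shown in the README (String, Hash, and `[String, Hash]` Array).

```ruby
# Hypothetical sketch, NOT Oedipus's implementation.

# Splits a facet definition into [query_template, options].
def normalize_facet(facet)
  case facet
  when String then [facet, {}]
  when Hash   then ["%{query}", facet]
  when Array  then [facet[0], facet[1] || {}]
  else raise ArgumentError, "unsupported facet: #{facet.inspect}"
  end
end

# Returns { [key, path] => [query, filters] } for every facet, at any depth.
def resolve_facets(query, filters, facets, path = [], out = {})
  facets.each do |key, facet|
    template, options = normalize_facet(facet)
    options = options.dup
    nested  = options.delete(:facets) || {}
    facet_query   = template.gsub("%{query}") { query } # inherit parent query
    facet_filters = filters.merge(options)               # inherit parent filters
    out[path + [key]] = [facet_query, facet_filters]
    resolve_facets(facet_query, facet_filters, nested, path + [key], out)
  end
  out
end

resolved = resolve_facets("badgers", {}, {
  popular: {
    views: 100..10000,
    facets: { in_title: "@title (%{query})" }
  }
})
resolved[[:popular]]            # => ["badgers", { views: 100..10000 }]
resolved[[:popular, :in_title]] # => ["@title (badgers)", { views: 100..10000 }]
```

Keying the flattened result by path keeps facets with the same name at different depths distinct.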
358
+ ### General purpose multi-search
359
+
360
+ If you want to execute multiple queries in a batch that are not related to each
360
+ other (related queries would be a faceted search), then you can use `#multi_search`.
362
+
363
+ You pass a Hash of keyed-queries and get a Hash of keyed-resultsets.
364
+
365
+ ``` ruby
366
+ results = sphinx[:articles].multi_search(
367
+ badgers: ["badgers", limit: 30],
368
+ frogs: "frogs & wetlands",
369
+ rabbits: ["rabbits | burrows", view_count: 20..100]
370
+ )
371
+ # => {
372
+ # badgers: {
373
+ # ...
374
+ # },
375
+ # frogs: {
376
+ # ...
377
+ # },
378
+ # rabbits: {
379
+ # ...
380
+ # }
381
+ # }
382
+ ```
383
+
384
+ ## Disconnecting from Sphinx
385
+
386
+ Oedipus will automatically close connections that are idle, so you don't,
387
+ generally speaking, need to close connections manually. However, before
388
+ forking children, for example, it is important to dispose of any open
389
+ resources to avoid sharing resources across processes. To do this:
390
+
391
+ ``` ruby
392
+ Oedipus.connections.each { |key, conn| conn.close }
393
+ ```
394
+
395
+ ## Running the specs
396
+
397
+ There are both unit tests and integration tests in the spec/ directory. By default they
398
+ will both run, but in order for the integration specs to work, you need a locally
399
+ installed copy of [Sphinx] [2]. You then execute the specs as follows:
400
+
401
+ SEARCHD=/path/to/bin/searchd bundle exec rake spec
402
+
403
+ If you don't have Sphinx installed locally, you cannot run the integration specs (they need
404
+ to write config files and start and stop sphinx internally).
405
+
406
+ To run the unit tests alone, without the need for Sphinx:
407
+
408
+ bundle exec rake spec:unit
409
+
410
+ If you have made changes to the C extension, those changes will be compiled and installed
411
+ (to the lib/ directory) before the specs are run.
412
+
413
+ ### Footnotes
414
+
415
+ [1]: In practice I find such an abstraction not to be very useful, as it assumes a single-server setup
416
+
417
+ [2]: You can build a local copy of sphinx without installing it on the system:
418
+
419
+ cd sphinx-2.0.4/
420
+ ./configure
421
+ make
422
+
423
+ The searchd binary will be found in /path/to/sphinx-2.0.4/src/searchd.
424
+
425
+ ## Future Plans
426
+
427
+ * Integration with ActiveRecord
428
+ * Support for re-indexing non-realtime indexes from ruby code
429
+ * Distributed index support (sharding writes between indexes)
430
+ * Query translation layer for Lucene-style AND/OR/NOT and attribute:value interpretation
431
+ * Fulltext query sanitization for unsafe user input (e.g. @@missing field)
432
+
433
+ ## Copyright and Licensing
434
+
435
+ Refer to the LICENSE file for details.