dbldots_oedipus 0.0.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. data/.gitignore +10 -0
  2. data/.rspec +1 -0
  3. data/Gemfile +4 -0
  4. data/LICENSE +20 -0
  5. data/README.md +435 -0
  6. data/Rakefile +26 -0
  7. data/ext/oedipus/extconf.rb +72 -0
  8. data/ext/oedipus/lexing.c +96 -0
  9. data/ext/oedipus/lexing.h +20 -0
  10. data/ext/oedipus/oedipus.c +339 -0
  11. data/ext/oedipus/oedipus.h +58 -0
  12. data/lib/oedipus.rb +40 -0
  13. data/lib/oedipus/comparison.rb +88 -0
  14. data/lib/oedipus/comparison/between.rb +21 -0
  15. data/lib/oedipus/comparison/equal.rb +21 -0
  16. data/lib/oedipus/comparison/gt.rb +21 -0
  17. data/lib/oedipus/comparison/gte.rb +21 -0
  18. data/lib/oedipus/comparison/in.rb +21 -0
  19. data/lib/oedipus/comparison/lt.rb +21 -0
  20. data/lib/oedipus/comparison/lte.rb +21 -0
  21. data/lib/oedipus/comparison/not.rb +25 -0
  22. data/lib/oedipus/comparison/not_equal.rb +21 -0
  23. data/lib/oedipus/comparison/not_in.rb +21 -0
  24. data/lib/oedipus/comparison/outside.rb +21 -0
  25. data/lib/oedipus/comparison/shortcuts.rb +144 -0
  26. data/lib/oedipus/connection.rb +124 -0
  27. data/lib/oedipus/connection/pool.rb +133 -0
  28. data/lib/oedipus/connection/registry.rb +56 -0
  29. data/lib/oedipus/connection_error.rb +14 -0
  30. data/lib/oedipus/index.rb +320 -0
  31. data/lib/oedipus/query_builder.rb +185 -0
  32. data/lib/oedipus/rspec/test_rig.rb +132 -0
  33. data/lib/oedipus/version.rb +12 -0
  34. data/oedipus.gemspec +42 -0
  35. data/spec/data/.gitkeep +0 -0
  36. data/spec/integration/connection/registry_spec.rb +50 -0
  37. data/spec/integration/connection_spec.rb +156 -0
  38. data/spec/integration/index_spec.rb +442 -0
  39. data/spec/spec_helper.rb +16 -0
  40. data/spec/unit/comparison/between_spec.rb +36 -0
  41. data/spec/unit/comparison/equal_spec.rb +22 -0
  42. data/spec/unit/comparison/gt_spec.rb +22 -0
  43. data/spec/unit/comparison/gte_spec.rb +22 -0
  44. data/spec/unit/comparison/in_spec.rb +22 -0
  45. data/spec/unit/comparison/lt_spec.rb +22 -0
  46. data/spec/unit/comparison/lte_spec.rb +22 -0
  47. data/spec/unit/comparison/not_equal_spec.rb +22 -0
  48. data/spec/unit/comparison/not_in_spec.rb +22 -0
  49. data/spec/unit/comparison/not_spec.rb +37 -0
  50. data/spec/unit/comparison/outside_spec.rb +36 -0
  51. data/spec/unit/comparison/shortcuts_spec.rb +125 -0
  52. data/spec/unit/comparison_spec.rb +109 -0
  53. data/spec/unit/query_builder_spec.rb +205 -0
  54. metadata +164 -0
data/.gitignore ADDED
@@ -0,0 +1,10 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
+ spec/data/index/*
+ spec/data/binlog/*
+ spec/data/searchd.*
+ spec/data/sphinx.*
+ lib/oedipus/oedipus.so
+ tmp/
data/.rspec ADDED
@@ -0,0 +1 @@
+ --colour
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in oedipus.gemspec
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright © 2011 Chris Corbyn
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,435 @@
+ # Oedipus: Sphinx 2 Search Client for Ruby
+
+ Oedipus is a client for the Sphinx search engine (>= 2.0.2), with support for
+ real-time indexes and multi-dimensional faceted searches.
+
+ It is not a clone of the PHP API; rather, it is written from the ground up,
+ wrapping the SphinxQL interface offered by searchd. Nor is it a plugin for
+ ActiveRecord or DataMapper, though such plugins will be offered in separate gems
+ (see [oedipus-dm](https://github.com/d11wtq/oedipus-dm)).
+
+ Oedipus abstracts away much of the work involved in implementing faceted
+ search, while remaining light and simple.
+
+ Data is represented using core Ruby types (Array and Hash), which keeps the
+ API simple and flexible.
+
+ The current development focus is on real-time indexes, where data is indexed
+ from your application rather than by running the `indexer` tool that ships with
+ Sphinx. You may use indexes built with `indexer`, but Oedipus does not (yet)
+ provide wrappers for indexing that data from Ruby [1].
+
+ ## Dependencies
+
+ * Ruby >= 1.9
+ * Sphinx >= 2.0.2
+ * MySQL development libraries >= 4.1
+
+ ## Installation
+
+ Via RubyGems:
+
+ ```
+ gem install oedipus
+ ```
+
+ ## Usage
+
+ The following features are all currently implemented.
+
+ ### Connecting to Sphinx
+
+ ``` ruby
+ require "oedipus"
+
+ sphinx = Oedipus.connect('127.0.0.1:9306') # sphinxql host
+ ```
+
+ **NOTE:** Don't connect to the host name 'localhost': the MySQL client library
+ will then try to use a UNIX socket instead of a TCP connection, which Sphinx
+ doesn't currently support.
+
+ Connections can be re-used by calling `Oedipus.connection` once connected.
+
+ ``` ruby
+ sphinx = Oedipus.connection
+ ```
+
+ If you're using Oedipus in a Rails application, you may wish to call `connect`
+ inside an initializer and then obtain that connection elsewhere in your
+ application by calling `connection`.
+
+ If you need to manage multiple connections, you may give each connection a
+ name.
+
+ ``` ruby
+ Oedipus.connect("other-host.tld:9306", :other)
+
+ sphinx = Oedipus.connection(:other)
+ ```
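To picture how named connections behave, here is a minimal sketch of a registry in the spirit of `Oedipus.connect` and `Oedipus.connection`. It is not the gem's code (the real registry lives in `lib/oedipus/connection/registry.rb`, which is not reproduced here); `MiniRegistry` and `FakeConnection` are hypothetical names, and the fake connection only records a host and port. It also shows one way to enforce the 'localhost' rule from the note above.

```ruby
# Hypothetical sketch, not Oedipus's code: a named connection registry.
module MiniRegistry
  # Stand-in for a real SphinxQL connection; it only records the address.
  FakeConnection = Struct.new(:host, :port)

  @connections = {}

  class << self
    # Parse "host:port" and store the connection under +name+.
    # "localhost" is refused because the MySQL client library would then
    # try a UNIX socket instead of TCP.
    def connect(address, name = :default)
      host, port = address.split(":", 2)
      raise ArgumentError, "use 127.0.0.1, not localhost" if host == "localhost"
      @connections[name] = FakeConnection.new(host, Integer(port))
    end

    # Look up a previously opened connection by name.
    def connection(name = :default)
      @connections.fetch(name) { raise KeyError, "no connection named #{name.inspect}" }
    end
  end
end

MiniRegistry.connect("127.0.0.1:9306")
MiniRegistry.connect("other-host.tld:9306", :other)
MiniRegistry.connection(:other).host # => "other-host.tld"
```

Keying the Hash by name, with `:default` as the fallback, is what lets a bare `connection` call return whatever a plain `connect` opened.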
+
+ ### Inserting (real-time indexes)
+
+ ``` ruby
+ sphinx[:articles].insert(
+   7,
+   title: "Badgers in the wild",
+   body: "A big long wodge of text",
+   author_id: 4,
+   views: 102
+ )
+ ```
+
+ ### Replacing (real-time indexes)
+
+ ``` ruby
+ sphinx[:articles].replace(
+   7,
+   title: "Badgers in the wild",
+   body: "A big long wodge of text",
+   author_id: 4,
+   views: 102
+ )
+ ```
+
+ ### Updating (real-time indexes)
+
+ ``` ruby
+ sphinx[:articles].update(7, views: 103)
+ ```
+
+ ### Deleting (real-time indexes)
+
+ ``` ruby
+ sphinx[:articles].delete(7)
+ ```
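Sphinx real-time indexes are written with the SphinxQL statements `INSERT`, `REPLACE`, `UPDATE` and `DELETE`. As a rough illustration of what the four calls above are assumed to send to searchd, the hypothetical helpers below build those statements. The helper names are invented for this sketch and the quoting is deliberately naive; the gem's own `lib/oedipus/query_builder.rb` is not reproduced here.

```ruby
# Hypothetical helpers (not Oedipus's API) that build SphinxQL write statements.

# Naive literal quoting: strings get single quotes with ' and \ escaped;
# everything else is rendered with to_s.
def sql_value(v)
  return v.to_s unless v.is_a?(String)
  escaped = v.gsub(/['\\]/) { |c| "\\" + c }
  "'#{escaped}'"
end

# verb is "INSERT" or "REPLACE"; the document id always comes first.
def insert_sql(verb, index, id, attrs)
  cols = ["id", *attrs.keys.map(&:to_s)]
  vals = [id, *attrs.values].map { |v| sql_value(v) }
  "#{verb} INTO #{index} (#{cols.join(', ')}) VALUES (#{vals.join(', ')})"
end

def update_sql(index, id, attrs)
  sets = attrs.map { |k, v| "#{k} = #{sql_value(v)}" }
  "UPDATE #{index} SET #{sets.join(', ')} WHERE id = #{Integer(id)}"
end

def delete_sql(index, id)
  "DELETE FROM #{index} WHERE id = #{Integer(id)}"
end

insert_sql("INSERT", :articles, 7, title: "Badgers in the wild", views: 102)
# => "INSERT INTO articles (id, title, views) VALUES (7, 'Badgers in the wild', 102)"
update_sql(:articles, 7, views: 103)
# => "UPDATE articles SET views = 103 WHERE id = 7"
```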
+
+ ### Fetching a known document (by ID)
+
+ ``` ruby
+ record = sphinx[:articles].fetch(7)
+ # => { id: 7, views: 984, author_id: 3 }
+ ```
+
+ ### Fulltext searching
+
+ You perform queries by invoking `#search` on the index.
+
+ ``` ruby
+ results = sphinx[:articles].search("badgers", limit: 2)
+
+ # The metadata indicates the overall number of matched records, while the
+ # :records array contains the actual data returned.
+ #
+ # => {
+ #   total_found: 987,
+ #   time: 0.000,
+ #   keywords: [ "badgers" ],
+ #   docs: { "badgers" => 987 },
+ #   records: [
+ #     { id: 7, author_id: 4, views: 102 },
+ #     { id: 11, author_id: 6, views: 23 }
+ #   ]
+ # }
+ ```
+
+ ### Fetching only specific attributes
+
+ ``` ruby
+ sphinx[:articles].search(
+   "example",
+   attrs: [:id, :views]
+ )
+ ```
+
+ ### Fetching additional attributes (including expressions)
+
+ Any valid expression may be fetched. Alias it (with `AS`) if you want to order by it.
+
+ ``` ruby
+ sphinx[:articles].search(
+   "example",
+   attrs: [:*, "WEIGHT() AS wgt"]
+ )
+ ```
+
+ ### Attribute filters
+
+ Result formatting is the same as for a fulltext search. You can add as many
+ filters as you like.
+
+ ``` ruby
+ # equality
+ sphinx[:articles].search(
+   "example",
+   author_id: 7
+ )
+
+ # less than or equal
+ sphinx[:articles].search(
+   "example",
+   views: -Float::INFINITY..100
+ )
+
+ sphinx[:articles].search(
+   "example",
+   views: Oedipus.lte(100)
+ )
+
+ # greater than
+ sphinx[:articles].search(
+   "example",
+   views: 100...Float::INFINITY
+ )
+
+ sphinx[:articles].search(
+   "example",
+   views: Oedipus.gt(100)
+ )
+
+ # not equal
+ sphinx[:articles].search(
+   "example",
+   author_id: Oedipus.not(7)
+ )
+
+ # between
+ sphinx[:articles].search(
+   "example",
+   views: 50..100
+ )
+
+ sphinx[:articles].search(
+   "example",
+   views: 50...100
+ )
+
+ # not between
+ sphinx[:articles].search(
+   "example",
+   views: Oedipus.not(50..100)
+ )
+
+ sphinx[:articles].search(
+   "example",
+   views: Oedipus.not(50...100)
+ )
+
+ # IN( ... )
+ sphinx[:articles].search(
+   "example",
+   author_id: [7, 22]
+ )
+
+ # NOT IN( ... )
+ sphinx[:articles].search(
+   "example",
+   author_id: Oedipus.not([7, 22])
+ )
+ ```
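One way to see what the plain Ruby filter values above mean is to render them as SphinxQL `WHERE` fragments. The sketch below is a hypothetical `filter_sql` helper, not the gem's translation (which lives in its `Oedipus::Comparison` classes and is not shown here); it deliberately skips `Oedipus.not` and the comparison objects and handles only scalars, Arrays and Ranges. In Ruby, `100...Float::INFINITY` includes 100, so this sketch renders it as `>=`; the strict form in the examples above is `Oedipus.gt(100)`.

```ruby
# Hypothetical sketch: render one attribute filter as a SphinxQL WHERE fragment.
def filter_sql(attr, value)
  case value
  when Array
    "#{attr} IN (#{value.join(', ')})"
  when Range
    lo, hi = value.begin, value.end
    upper = value.exclude_end? ? "<" : "<="
    if lo == -Float::INFINITY
      "#{attr} #{upper} #{hi}"                 # open lower bound
    elsif hi == Float::INFINITY
      "#{attr} >= #{lo}"                       # a Range always includes its start
    elsif value.exclude_end?
      "#{attr} >= #{lo} AND #{attr} < #{hi}"   # 50...100 excludes 100
    else
      "#{attr} BETWEEN #{lo} AND #{hi}"        # 50..100 is inclusive
    end
  else
    "#{attr} = #{value}"
  end
end

filter_sql(:views, 50...100)      # => "views >= 50 AND views < 100"
filter_sql(:author_id, [7, 22])   # => "author_id IN (7, 22)"
```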
+
+ ### Ordering
+
+ ``` ruby
+ sphinx[:articles].search("badgers", order: { views: :asc })
+ ```
+
+ Special handling is done for ordering by relevance.
+
+ ``` ruby
+ sphinx[:articles].search("badgers", order: { relevance: :desc })
+ ```
+
+ In the above case, Oedipus explicitly adds `WEIGHT() AS relevance` to the `:attrs`
+ option. You can set up the relevance sort manually if you wish to name the
+ weighting attribute differently.
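The relevance handling described above amounts to rewriting the options before the query is built. This is a hypothetical sketch of that idea, not Oedipus's code: the `[:*]` default for `:attrs` and the rule of skipping the rewrite when an attribute is already aliased `relevance` are assumptions made for illustration.

```ruby
# Hypothetical sketch: if ordering by :relevance and no attribute is already
# aliased "relevance", append WEIGHT() AS relevance to :attrs.
def with_relevance(options)
  order = options.fetch(:order, {})
  return options unless order.key?(:relevance)
  attrs = options.fetch(:attrs, [:*]) # assumed default: select everything
  return options if attrs.any? { |a| a.to_s =~ /\bAS\s+relevance\b/i }
  options.merge(attrs: attrs + ["WEIGHT() AS relevance"])
end

# Render the select list and ORDER BY clause as SphinxQL fragments.
def select_list(attrs)
  attrs.map(&:to_s).join(", ")
end

def order_sql(order)
  "ORDER BY " + order.map { |attr, dir| "#{attr} #{dir.to_s.upcase}" }.join(", ")
end

opts = with_relevance(order: { relevance: :desc })
select_list(opts[:attrs]) # => "*, WEIGHT() AS relevance"
order_sql(opts[:order])   # => "ORDER BY relevance DESC"
```

Naming the weight yourself (for example `attrs: [:*, "WEIGHT() AS wgt"], order: { wgt: :desc }`) needs no rewrite at all, because the ordering key is then just another selected attribute.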
+
+ ### Limits and offsets
+
+ Note that Sphinx applies a limit of 20 by default, so you will probably want to
+ specify a limit yourself. Results are also capped by the `max_matches` setting
+ in sphinx.conf.
+
+ The metadata still reports the actual number of matching results; you simply
+ get a smaller collection of materialized records.
+
+ ``` ruby
+ sphinx[:articles].search("bobcats", limit: 50)
+ sphinx[:articles].search("bobcats", limit: 50, offset: 150)
+ ```
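Paging through results means turning a page number into `:limit` and `:offset`. The helpers below are a hypothetical sketch, not part of Oedipus. They assume the common Sphinx default of `max_matches = 1000` and assume that a window whose offset plus limit exceeds `max_matches` is not useful; check your own sphinx.conf before relying on either assumption. SphinxQL accepts the MySQL-style `LIMIT offset, count` form.

```ruby
# Hypothetical helpers (not Oedipus's API) for paging through search results.
MAX_MATCHES = 1000 # assumed sphinx.conf default

# Map a 1-based page number to the :limit/:offset options used above.
def page_options(page, per_page, max_matches = MAX_MATCHES)
  raise ArgumentError, "page must be >= 1" if page < 1
  offset = (page - 1) * per_page
  # Assumption: windows reaching past max_matches cannot be served.
  if offset + per_page > max_matches
    raise ArgumentError, "offset #{offset} + limit #{per_page} exceeds max_matches #{max_matches}"
  end
  { limit: per_page, offset: offset }
end

# Render the MySQL-style "LIMIT offset, count" clause.
def limit_sql(options)
  "LIMIT #{options.fetch(:offset, 0)}, #{options.fetch(:limit)}"
end

page_options(4, 50)            # => { limit: 50, offset: 150 }
limit_sql(page_options(4, 50)) # => "LIMIT 150, 50"
```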
260
+
261
+ ### Faceted searching
262
+
263
+ A faceted search takes a base query and a set of additional queries that are
264
+ variations on it. Oedipus makes this simple by allowing your facets to inherit
265
+ from the base query.
266
+
267
+ Oedipus allows you to replace `'%{query}'` in your facets with whatever was in
268
+ the original query. This can be useful if you want to provide facets that only
269
+ perform the search in the title of the document (`"@title (%{query})"`) for
270
+ example.
271
+
272
+ Each facet is given a key, which is used to reference it in the results. This
273
+ key is any arbitrary object that can be used as a key in a ruby Hash. You may,
274
+ for example, use domain-specific objects as keys.
275
+
276
+ Sphinx optimizes the queries by figuring out what the common parts are. Currently
277
+ it does two optimizations, though in future this will likely improve further, so
278
+ using this technique to do your faceted searches is the correct approach.
279
+
280
+ ``` ruby
281
+ results = sphinx[:articles].search(
282
+ "badgers",
283
+ facets: {
284
+ popular: { views: 100..10000 },
285
+ also_farming: "%{query} & farming",
286
+ popular_farming: ["%{query} & farming", views: 100..10000 ]
287
+ }
288
+ )
289
+ # => {
290
+ # total_found: 987,
291
+ # time: 0.000,
292
+ # records: [ ... ],
293
+ # facets: {
294
+ # popular: {
295
+ # total_found: 25,
296
+ # time: 0.000,
297
+ # records: [ ... ]
298
+ # },
299
+ # also_farming: {
300
+ # total_found: 123,
301
+ # time: 0.000,
302
+ # records: [ ... ]
303
+ # },
304
+ # popular_farming: {
305
+ # total_found: 2,
306
+ # time: 0.000,
307
+ # records: [ ... ]
308
+ # }
309
+ # }
310
+ # }
311
+ ```
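The three facet shapes in the example above (a Hash of options, a query String, or a `[query, options]` Array) can be normalized with a few lines of Ruby. This is a hypothetical `expand_facet` sketch of the inheritance rule as the README describes it, not the gem's implementation: a facet keeps the base query unless it supplies its own, `%{query}` is replaced with the base query, and facet options are merged over the base options.

```ruby
# Hypothetical sketch: expand one facet spec into an inherited [query, options].
def expand_facet(base_query, base_options, facet)
  query, options =
    case facet
    when Hash   then [base_query, facet]          # options only
    when String then [facet, {}]                 # query only
    when Array  then [facet[0], facet[1] || {}]  # [query, options]
    else raise ArgumentError, "unsupported facet: #{facet.inspect}"
    end
  # Block form of gsub, so backslashes in the base query are copied literally.
  [query.gsub("%{query}") { base_query }, base_options.merge(options)]
end

facets = {
  popular: { views: 100..10000 },
  also_farming: "%{query} & farming",
  popular_farming: ["%{query} & farming", views: 100..10000]
}
expanded = {}
facets.each { |key, spec| expanded[key] = expand_facet("badgers", { limit: 2 }, spec) }
expanded[:also_farming] # => ["badgers & farming", { limit: 2 }]
```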
+
+ #### Multi-dimensional faceted search
+
+ If you can add facets to the root query, how about adding facets to the facets
+ themselves? Easy:
+
+ ``` ruby
+ results = sphinx[:articles].search(
+   "badgers",
+   facets: {
+     popular: {
+       views: 100..10000,
+       facets: {
+         in_title: "@title (%{query})"
+       }
+     }
+   }
+ )
+ # => {
+ #   total_found: 987,
+ #   time: 0.000,
+ #   records: [ ... ],
+ #   facets: {
+ #     popular: {
+ #       total_found: 25,
+ #       time: 0.000,
+ #       records: [ ... ],
+ #       facets: {
+ #         in_title: {
+ #           total_found: 24,
+ #           time: 0.000,
+ #           records: [ ... ]
+ #         }
+ #       }
+ #     }
+ #   }
+ # }
+ ```
+
+ In the above example, the nested facet `:in_title` inherits the default
+ parameters from the facet `:popular`, which inherits its parameters from
+ the root query. The result is a search for "badgers" limited to the
+ title, with views between 100 and 10000.
+
+ Oedipus imposes no limit on how deeply facets can be nested.
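Because inheritance is applied level by level, arbitrarily deep nesting reduces to a simple recursion. The hypothetical `flatten_facets` below is a sketch, not Oedipus's code. It flattens a nested facet tree into one batch keyed by path, where each entry is the query and options a facet ends up with after inheriting from its parent, and it reproduces the `:in_title` result described above.

```ruby
# Hypothetical sketch: flatten nested facets into { path => [query, options] },
# each level inheriting query and options from its parent.
def flatten_facets(query, options, path = [], out = {})
  nested = options.fetch(:facets, {})
  own = options.reject { |k, _| k == :facets }
  out[path] = [query, own]
  nested.each do |key, spec|
    sub_query, sub_options =
      case spec
      when Hash   then [query, spec]
      when String then [spec, {}]
      when Array  then [spec[0], spec[1] || {}]
      else raise ArgumentError, "unsupported facet: #{spec.inspect}"
      end
    sub_query = sub_query.gsub("%{query}") { query }
    flatten_facets(sub_query, own.merge(sub_options), path + [key], out)
  end
  out
end

nested_batch = flatten_facets(
  "badgers",
  facets: {
    popular: {
      views: 100..10000,
      facets: { in_title: "@title (%{query})" }
    }
  }
)
nested_batch[[:popular, :in_title]] # => ["@title (badgers)", { views: 100..10000 }]
```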
+
+ ### General purpose multi-search
+
+ If you want to execute a batch of queries that are not related to each other
+ (related variations on one query are better expressed as a faceted search), use
+ `#multi_search`.
+
+ You pass a Hash of keyed queries and get back a Hash of result sets under the
+ same keys.
+
+ ``` ruby
+ results = sphinx[:articles].multi_search(
+   badgers: ["badgers", limit: 30],
+   frogs: "frogs & wetlands",
+   rabbits: ["rabbits | burrows", views: 20..100]
+ )
+ # => {
+ #   badgers: {
+ #     ...
+ #   },
+ #   frogs: {
+ #     ...
+ #   },
+ #   rabbits: {
+ #     ...
+ #   }
+ # }
+ ```
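The keyed-in, keyed-out contract of `#multi_search` can be sketched in plain Ruby. Both helpers below are hypothetical and not part of Oedipus. `normalize_queries` turns each entry into a `[query, options]` pair, and `key_results` pairs result sets back up with their keys on the assumption that a batched query returns one result set per query, in submission order. Ruby Hashes preserve insertion order (1.9+), which is what makes the pairing line up.

```ruby
# Hypothetical sketch: normalise keyed queries, then re-key ordered results.
def normalize_queries(queries)
  queries.each_with_object({}) do |(key, spec), out|
    out[key] = spec.is_a?(Array) ? [spec[0], spec[1] || {}] : [spec, {}]
  end
end

# Assumes one result set per query, in the order the queries were sent.
def key_results(keys, result_sets)
  unless keys.size == result_sets.size
    raise ArgumentError, "expected #{keys.size} result sets, got #{result_sets.size}"
  end
  Hash[keys.zip(result_sets)]
end

queries = normalize_queries(
  badgers: ["badgers", limit: 30],
  frogs: "frogs & wetlands",
  rabbits: ["rabbits | burrows", views: 20..100]
)
queries[:frogs] # => ["frogs & wetlands", {}]
keyed = key_results(queries.keys, [{ total_found: 1 }, { total_found: 2 }, { total_found: 3 }])
keyed[:rabbits] # => { total_found: 3 }
```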
+
+ ## Disconnecting from Sphinx
+
+ Oedipus automatically closes idle connections, so you generally don't need to
+ close connections manually. However, before forking child processes, for
+ example, you should close any open connections so that they are not shared
+ across processes:
+
+ ``` ruby
+ Oedipus.connections.each { |key, conn| conn.close }
+ ```
+
+ ## Running the specs
+
+ There are both unit and integration specs in the spec/ directory. By default
+ both run, but the integration specs need a locally installed copy of Sphinx [2].
+ Run the specs as follows:
+
+     SEARCHD=/path/to/bin/searchd bundle exec rake spec
+
+ If you don't have Sphinx installed locally, you cannot run the integration specs
+ (they need to write config files and start and stop Sphinx internally).
+
+ To run only the unit specs, which don't need Sphinx:
+
+     bundle exec rake spec:unit
+
+ If you have made changes to the C extension, those changes will be compiled and
+ installed (to the lib/ directory) before the specs are run.
+
+ ### Footnotes
+
+ [1]: In practice I find such an abstraction not to be very useful, as it assumes
+ a single-server setup.
+
+ [2]: You can build a local copy of Sphinx without installing it on the system:
+
+     cd sphinx-2.0.4/
+     ./configure
+     make
+
+ The searchd binary will then be at /path/to/sphinx-2.0.4/src/searchd.
+
+ ## Future Plans
+
+ * ActiveRecord integration
+ * Support for re-indexing non-realtime indexes from Ruby code
+ * Distributed index support (sharding writes between indexes)
+ * Query translation layer for Lucene-style AND/OR/NOT and attribute:value interpretation
+ * Fulltext query sanitization for unsafe user input (e.g. @@missing field)
+
+ ## Copyright and Licensing
+
+ Refer to the LICENSE file for details.