pager-ultrasphinx 1.0.20080510

@@ -0,0 +1,456 @@
1
+
2
+ module Ultrasphinx
3
+
4
+ =begin rdoc
5
+ Command-interface Search object.
6
+
7
+ == Basic usage
8
+
9
+ To set up a search, instantiate an Ultrasphinx::Search object with a hash of parameters. Only the <tt>:query</tt> key is mandatory.
10
+ @search = Ultrasphinx::Search.new(
11
+ :query => @query,
12
+ :sort_mode => 'descending',
13
+ :sort_by => 'created_at'
14
+ )
15
+
16
+ Now, to run the query, call its <tt>run</tt> method. Your results will be available as ActiveRecord instances via the <tt>results</tt> method. Example:
17
+ @search.run
18
+ @search.results
19
+
20
+ = Options
21
+
22
+ == Query format
23
+
24
+ The query string supports boolean operations, parentheses, phrases, and field-specific search. Query words are stemmed and joined by an implicit <tt>AND</tt> by default.
25
+
26
+ * Valid boolean operators are <tt>AND</tt>, <tt>OR</tt>, and <tt>NOT</tt>.
27
+ * Field-specific searches should be formatted as <tt>fieldname:contents</tt>. (This will only work for text fields. For numeric and date fields, see the <tt>:filters</tt> parameter, below.)
28
+ * Phrases must be enclosed in double quotes.
29
+
30
+ A Sphinx::SphinxInternalError will be raised on invalid queries. In general, queries can only be nested to one level.
31
+ @query = 'dog OR cat OR "white tigers" NOT (lions OR bears) AND title:animals'
32
+
33
+ == Hash parameters
34
+
35
+ The hash lets you customize internal aspects of the search.
36
+
37
+ <tt>:per_page</tt>:: An integer. How many results per page.
38
+ <tt>:page</tt>:: An integer. Which page of the results to return.
39
+ <tt>:class_names</tt>:: An array or string. The class name of the model you want to search, an array of model names to search, or <tt>nil</tt> for all available models.
40
+ <tt>:sort_mode</tt>:: <tt>'relevance'</tt> or <tt>'ascending'</tt> or <tt>'descending'</tt>. How to order the result set. Note that <tt>'time'</tt> and <tt>'extended'</tt> modes are available, but not tested.
41
+ <tt>:sort_by</tt>:: A field name. What field to order by for <tt>'ascending'</tt> or <tt>'descending'</tt> mode. Has no effect for <tt>'relevance'</tt>.
42
+ <tt>:weights</tt>:: A hash. Text-field names and associated query weighting. The default weight for every field is 1.0. Example: <tt>:weights => {'title' => 2.0}</tt>
43
+ <tt>:filters</tt>:: A hash. Names of numeric or date fields and associated values. You can use a single value, an array of values, or a range. (See the bottom of the ActiveRecord::Base page for an example.)
44
+ <tt>:facets</tt>:: An array of fields for grouping/faceting. You can access the returned facet values and their result counts with the <tt>facets</tt> method.
45
+ <tt>:location</tt>:: A hash. Specify the names of your latitude and longitude attributes as declared in your is_indexed calls. To sort the results by distance, set <tt>:sort_mode => 'extended'</tt> and <tt>:sort_by => 'distance asc'</tt>.
46
+ <tt>:indexes</tt>:: An array of indexes to search. Currently only <tt>Ultrasphinx::MAIN_INDEX</tt> and <tt>Ultrasphinx::DELTA_INDEX</tt> are available. Defaults to both; changing this is rarely needed.
47
+
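The parameters above can be combined in one hash. The following is only a sketch of the shapes the list describes (single value, array, and range filters); the field names <tt>user_id</tt>, <tt>category_id</tt>, and <tt>created_at</tt> are hypothetical and must match fields your own models index.

```ruby
# Field names here are hypothetical; use the numeric or date fields your models index.
filters = {
  'user_id'     => 42,                                          # a single value
  'category_id' => [1, 3, 7],                                   # any of several values
  'created_at'  => Time.utc(2008, 1, 1)..Time.utc(2008, 5, 1)   # a range
}

# A parameter hash of the kind passed to Ultrasphinx::Search.new.
search_params = {
  :query   => 'pizza',
  :weights => { 'title' => 2.0 },
  :filters => filters,
  :facets  => ['category_id']
}

puts search_params.inspect
```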
48
+ == Query Defaults
49
+
50
+ Note that you can set up your own query defaults in <tt>environment.rb</tt>:
51
+
52
+ Ultrasphinx::Search.query_defaults = HashWithIndifferentAccess.new({
53
+ :per_page => 10,
54
+ :sort_mode => 'relevance',
55
+ :weights => {'title' => 2.0}
56
+ })
57
+
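How the defaults combine with per-search options can be sketched in plain Ruby. This is an illustration under assumptions, not the library code: the hashes below are trimmed stand-ins with String keys, and plain <tt>merge</tt> is used instead of ActiveSupport helpers. The point it shows is that the nested <tt>location</tt> hash needs its own merge, since a top-level merge would replace it wholesale.

```ruby
# Stand-in for Ultrasphinx::Search.query_defaults, trimmed to a few keys.
defaults = {
  'page'      => 1,
  'per_page'  => 20,
  'sort_mode' => 'relevance',
  'location'  => {
    'lat_attribute_name'  => 'lat',
    'long_attribute_name' => 'lng',
    'units'               => 'radians'
  }
}

# Options as a caller might pass them, already converted to String keys.
opts = {
  'query'    => 'pizza',
  'per_page' => 10,
  'location' => { 'lat' => 40.3, 'long' => -73.6 }
}

# Fill in the missing location keys first; the caller's values win.
opts['location'] = defaults['location'].merge(opts['location']) if opts['location']

# Then let the caller's top-level keys override the defaults.
options = defaults.merge(opts)

puts options.inspect
```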
58
+ = Advanced features
59
+
60
+ == Geographic distance
61
+
62
+ If you pass a <tt>:location</tt> Hash, distance from the location in meters will be available in your result records via the <tt>distance</tt> accessor:
63
+
64
+ @search = Ultrasphinx::Search.new(:class_names => 'Point',
65
+ :query => 'pizza',
66
+ :sort_mode => 'extended',
67
+ :sort_by => 'distance asc',
68
+ :location => {
69
+ :lat => 40.3,
70
+ :long => -73.6
71
+ })
72
+
73
+ @search.run.first.distance #=> 1402.4
74
+
75
+ Note that Sphinx expects lat/long to be indexed as radians. If you have degrees in your database, do the conversion in the <tt>is_indexed</tt> call, like so:
76
+
77
+ is_indexed 'fields' => [
78
+ 'name',
79
+ 'description',
80
+ {:field => 'lat', :function_sql => "RADIANS(?)"},
81
+ {:field => 'lng', :function_sql => "RADIANS(?)"}
82
+ ]
83
+
84
+ Then, set <tt>Ultrasphinx::Search.query_defaults[:location][:units] = 'degrees'</tt>.
85
+
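The same degrees-to-radians arithmetic can be done in Ruby, for instance on query coordinates when the <tt>:units</tt> setting is left at its <tt>'radians'</tt> default. The helper name <tt>degrees_to_radians</tt> is hypothetical and not part of Ultrasphinx; the coordinates are the ones from the example above, assumed here to be in degrees.

```ruby
# Hypothetical helper, not part of Ultrasphinx: converts degrees to radians.
def degrees_to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Coordinates from the example above, assumed to be in degrees.
location = {
  :lat  => degrees_to_radians(40.3),
  :long => degrees_to_radians(-73.6)
}

puts location.inspect
```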
86
+ The MySQL <tt>:double</tt> column type is recommended for storing location data. For Postgres, use <tt>:float</tt>.
87
+
88
+ == Interlock integration
89
+
90
+ Ultrasphinx uses the <tt>find_all_by_id</tt> method to instantiate records. If you set <tt>with_finders: true</tt> in {Interlock's}[http://blog.evanweaver.com/files/doc/fauna/interlock] <tt>config/memcached.yml</tt>, Interlock overrides <tt>find_all_by_id</tt> with a caching version.
91
+
92
+ == Will_paginate integration
93
+
94
+ The Search instance responds to the same methods as a WillPaginate::Collection object, so once you have called <tt>run</tt> or <tt>excerpt</tt> you can use it directly in your views:
95
+
96
+ will_paginate(@search)
97
+
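The pagination methods a paginated collection exposes reduce to simple arithmetic. The sketch below mirrors that arithmetic in a standalone class; <tt>PageMath</tt> and its constructor arguments are hypothetical names used for illustration only, and the cap on the total reflects the Sphinx <tt>max_matches</tt> setting.

```ruby
# Hypothetical stand-in for the pagination accessors; not the Ultrasphinx class itself.
class PageMath
  attr_reader :current_page, :per_page, :total_entries

  def initialize(current_page, per_page, total_found, max_matches)
    @current_page = current_page
    @per_page = per_page
    # The reported total is capped at the daemon's max_matches setting.
    @total_entries = [total_found, max_matches].min
  end

  # Last available page number.
  def page_count
    (total_entries / per_page.to_f).ceil
  end

  def previous_page
    current_page > 1 ? current_page - 1 : nil
  end

  def next_page
    current_page < page_count ? current_page + 1 : nil
  end

  # Global index position of the first result on the current page.
  def offset
    (current_page - 1) * per_page
  end
end

pages = PageMath.new(3, 20, 1500, 1000)
puts pages.page_count
puts pages.offset
```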
98
+ == Excerpt mode
99
+
100
+ You can have Sphinx excerpt and highlight the matched sections in the associated fields. Instead of calling <tt>run</tt>, call <tt>excerpt</tt>.
101
+
102
+ @search.excerpt
103
+
104
+ The returned models will be frozen and have their field contents temporarily changed to the excerpted and highlighted results.
105
+
106
+ You need to set the <tt>content_methods</tt> key on Ultrasphinx::Search.excerpting_options to whatever groups of methods you need the excerpter to try to excerpt. The first responding method in each group for each record will be excerpted. This way Ruby-only methods are supported (for example, a metadata method which combines various model fields, or an aliased field so that the original record contents are still available).
107
+
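The "first responding method in each group" rule can be sketched without Sphinx. The <tt>Article</tt> and <tt>Place</tt> structs and the <tt>excerptable_methods</tt> helper below are hypothetical stand-ins for your models and for the library's internal selection step.

```ruby
# Groups in precedence order, as in the default excerpting options.
content_methods = [['title', 'name'], ['body', 'description', 'content'], ['metadata']]

# Hypothetical record types standing in for ActiveRecord models.
Article = Struct.new(:title, :body)
Place   = Struct.new(:name, :description)

# For each group, pick the first method name the record responds to (nil if none).
def excerptable_methods(record, groups)
  groups.map do |group|
    group.detect { |name| record.respond_to?(name) }
  end
end

article = Article.new('Tigers', 'White tigers are rare.')
place   = Place.new('Zoo', 'Home of the bears.')

p excerptable_methods(article, content_methods)
p excerptable_methods(place, content_methods)
```

A <tt>nil</tt> entry means no method in that group applies to the record, so nothing is excerpted for that group.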
108
+ There are some other keys you can set, such as excerpt size, HTML tags to highlight with, and number of words on either side of each excerpt chunk. Example (in <tt>environment.rb</tt>):
109
+
110
+ Ultrasphinx::Search.excerpting_options = HashWithIndifferentAccess.new({
111
+ :before_match => '<strong>',
112
+ :after_match => '</strong>',
113
+ :chunk_separator => "...",
114
+ :limit => 256,
115
+ :around => 3,
116
+ :content_methods => [['title'], ['body', 'description', 'content'], ['metadata']]
117
+ })
118
+
119
+ Note that your database is never changed by anything Ultrasphinx does.
120
+
121
+ =end
122
+
123
+ class Search
124
+
125
+ include Internals
126
+ include Parser
127
+
128
+ cattr_accessor :query_defaults
129
+ self.query_defaults ||= HashWithIndifferentAccess.new({
130
+ :query => nil,
131
+ :page => 1,
132
+ :per_page => 20,
133
+ :sort_by => nil,
134
+ :sort_mode => 'relevance',
135
+ :indexes => [
136
+ MAIN_INDEX,
137
+ (DELTA_INDEX if Ultrasphinx.delta_index_present?)
138
+ ].compact,
139
+ :weights => {},
140
+ :class_names => [],
141
+ :filters => {},
142
+ :facets => [],
143
+ :location => HashWithIndifferentAccess.new({
144
+ :lat_attribute_name => 'lat',
145
+ :long_attribute_name => 'lng',
146
+ :units => 'radians'
147
+ })
148
+ })
149
+
150
+ cattr_accessor :excerpting_options
151
+ self.excerpting_options ||= HashWithIndifferentAccess.new({
152
+ :before_match => "<strong>", :after_match => "</strong>",
153
+ :chunk_separator => "...",
154
+ :limit => 256,
155
+ :around => 3,
156
+ # Results should respond to one in each group of these, in precedence order, for the
157
+ # excerpting to fire
158
+ :content_methods => [['title', 'name'], ['body', 'description', 'content'], ['metadata']]
159
+ })
160
+
161
+ cattr_accessor :client_options
162
+ self.client_options ||= HashWithIndifferentAccess.new({
163
+ :with_subtotals => false,
164
+ :ignore_missing_records => false,
165
+ # Has no effect if :ignore_missing_records => false
166
+ :max_missing_records => 5,
167
+ :max_retries => 4,
168
+ :retry_sleep_time => 0.5,
169
+ :max_facets => 1000,
170
+ :max_matches_offset => 1000,
171
+ # Whether to add an accessor to each returned result that specifies its global rank in
172
+ # the search.
173
+ :with_global_rank => false,
174
+ # Which method names to try to use for loading records. You can define your own (for
175
+ # example, with :includes) and then attach it here. Each method must accept an Array
176
+ # of ids, but does not have to preserve order. If the class does not respond_to? any
177
+ # method name in the array, :find_all_by_id will be used.
178
+ :finder_methods => []
179
+ })
180
+
181
+ # Friendly sort mode mappings
182
+ SPHINX_CLIENT_PARAMS = {
183
+ 'sort_mode' => {
184
+ 'relevance' => :relevance,
185
+ 'descending' => :attr_desc,
186
+ 'ascending' => :attr_asc,
187
+ 'time' => :time_segments,
188
+ 'extended' => :extended,
189
+ }
190
+ }
191
+
192
+ INTERNAL_KEYS = ['parsed_query'] #:nodoc:
193
+
194
+ MODELS_TO_IDS = Ultrasphinx.get_models_to_class_ids || {}
195
+
196
+ IDS_TO_MODELS = MODELS_TO_IDS.invert #:nodoc:
197
+
198
+ MAX_MATCHES = DAEMON_SETTINGS["max_matches"].to_i
199
+
200
+ FACET_CACHE = {} #:nodoc:
201
+
202
+ # Returns the options hash.
203
+ def options
204
+ @options
205
+ end
206
+
207
+ # Returns the query string used.
208
+ def query
209
+ # Redundant with method_missing
210
+ @options['query']
211
+ end
212
+
213
+ def parsed_query #:nodoc:
214
+ # Redundant with method_missing
215
+ @options['parsed_query']
216
+ end
217
+
218
+ # Returns an array of result objects.
219
+ def results
220
+ require_run
221
+ @results
222
+ end
223
+
224
+ # Returns the facet map for this query, if facets were used.
225
+ def facets
226
+ raise UsageError, "No facet field was configured" if @options['facets'].blank?
227
+ require_run
228
+ @facets
229
+ end
230
+
231
+ # Returns the raw response from the Sphinx client.
232
+ def response
233
+ require_run
234
+ @response
235
+ end
236
+
237
+ # Returns a hash of total result counts, scoped to each available model. Set <tt>Ultrasphinx::Search.client_options[:with_subtotals] = true</tt> to enable.
238
+ #
239
+ # The subtotals are implemented as a special type of facet.
240
+ def subtotals
241
+ raise UsageError, "Subtotals are not enabled" unless self.class.client_options['with_subtotals']
242
+ require_run
243
+ @subtotals
244
+ end
245
+
246
+ # Returns the total result count.
247
+ def total_entries
248
+ require_run
249
+ [response[:total_found] || 0, MAX_MATCHES].min
250
+ end
251
+
252
+ # Returns the response time of the query, in seconds.
253
+ def time
254
+ require_run
255
+ response[:time]
256
+ end
257
+
258
+ # Returns whether the query has been run.
259
+ def run?
260
+ !@response.blank?
261
+ end
262
+
263
+ # Returns the current page number of the result set. (Page indexes begin at 1.)
264
+ def current_page
265
+ @options['page']
266
+ end
267
+
268
+ # Returns the number of records per page.
269
+ def per_page
270
+ @options['per_page']
271
+ end
272
+
273
+ # Returns the last available page number in the result set.
274
+ def page_count
275
+ require_run
276
+ (total_entries / per_page.to_f).ceil
277
+ end
278
+
279
+ # Returns the previous page number.
280
+ def previous_page
281
+ current_page > 1 ? (current_page - 1) : nil
282
+ end
283
+
284
+ # Returns the next page number.
285
+ def next_page
286
+ current_page < page_count ? (current_page + 1) : nil
287
+ end
288
+
289
+ # Returns the global index position of the first result on this page.
290
+ def offset
291
+ (current_page - 1) * per_page
292
+ end
293
+
294
+ # Builds a new command-interface Search object.
295
+ def initialize opts = {}
296
+
297
+ # Change to normal hashes with String keys for speed
298
+ opts = Hash[HashWithIndifferentAccess.new(opts._deep_dup._coerce_basic_types)]
299
+ unless self.class.query_defaults.instance_of? Hash
300
+ self.class.query_defaults = Hash[self.class.query_defaults]
301
+ self.class.query_defaults['location'] = Hash[self.class.query_defaults['location']]
302
+
303
+ self.class.client_options = Hash[self.class.client_options]
304
+ self.class.excerpting_options = Hash[self.class.excerpting_options]
305
+ self.class.excerpting_options['content_methods'].map! {|ary| ary.map {|m| m.to_s}}
306
+ end
307
+
308
+ # We need an annoying deep merge on the :location parameter
309
+ opts['location'].reverse_merge!(self.class.query_defaults['location']) if opts['location']
310
+
311
+ # Merge the rest of the defaults
312
+ @options = self.class.query_defaults.merge(opts)
313
+
314
+ @options['query'] = @options['query'].to_s
315
+ @options['class_names'] = Array(@options['class_names'])
316
+ @options['facets'] = Array(@options['facets'])
317
+ @options['indexes'] = Array(@options['indexes']).join(" ")
318
+
319
+ raise UsageError, "Weights must be a Hash" unless @options['weights'].is_a? Hash
320
+ raise UsageError, "Filters must be a Hash" unless @options['filters'].is_a? Hash
321
+
322
+ @options['parsed_query'] = parse(query)
323
+
324
+ @results, @subtotals, @facets, @response = [], {}, {}, {}
325
+
326
+ extra_keys = @options.keys - (self.class.query_defaults.keys + INTERNAL_KEYS)
327
+ log "discarded invalid keys: #{extra_keys * ', '}" if extra_keys.any? and RAILS_ENV != "test"
328
+ end
329
+
330
+ # Run the search, filling results with an array of ActiveRecord objects. Set the parameter to false
331
+ # if you only want the ids returned.
332
+ def run(reify = true)
333
+ @request = build_request_with_options(@options)
334
+
335
+ log "searching for #{@options.inspect}"
336
+
337
+ perform_action_with_retries do
338
+ @response = @request.query(parsed_query, @options['indexes'])
339
+ log "search returned #{total_entries}/#{response[:total_found].to_i} in #{time.to_f} seconds."
340
+
341
+ if self.class.client_options['with_subtotals']
342
+ @subtotals = get_subtotals(@request, parsed_query)
343
+
344
+ # If the original query has a filter on this class, we will use its more accurate total rather than the facet's
345
+ # less accurate total.
346
+ if @options['class_names'].size == 1
347
+ @subtotals[@options['class_names'].first] = response[:total_found]
348
+ end
349
+
350
+ end
351
+
352
+ Array(@options['facets']).each do |facet|
353
+ @facets[facet] = get_facets(@request, parsed_query, facet)
354
+ end
355
+
356
+ @results = convert_sphinx_ids(response[:matches])
357
+ @results = reify_results(@results) if reify
358
+
359
+ say "warning: #{response[:warning]}" if response[:warning]
360
+ raise UsageError, response[:error] if response[:error]
361
+
362
+ end
363
+ self
364
+ end
365
+
366
+
367
+ # Overwrite the configured content attributes with excerpted and highlighted versions of themselves.
368
+ # Runs the search first if it has not already been run.
369
+ def excerpt
370
+
371
+ require_run
372
+ return if results.empty?
373
+
374
+ # See what fields in each result might respond to our excerptable methods
375
+ results_with_content_methods = results.map do |result|
376
+ [result,
377
+ self.class.excerpting_options['content_methods'].map do |methods|
378
+ methods.detect do |this|
379
+ result.respond_to? this
380
+ end
381
+ end
382
+ ]
383
+ end
384
+
385
+ # Fetch the actual field contents
386
+ docs = results_with_content_methods.map do |result, methods|
387
+ methods.map do |method|
388
+ method and strip_bogus_characters(result.send(method)) or ""
389
+ end
390
+ end.flatten
391
+
392
+ excerpting_options = {
393
+ :docs => docs,
394
+ :index => MAIN_INDEX, # http://www.sphinxsearch.com/forum/view.html?id=100
395
+ :words => strip_query_commands(parsed_query)
396
+ }
397
+ self.class.excerpting_options.except('content_methods').each do |key, value|
398
+ # Riddle only wants symbols
399
+ excerpting_options[key.to_sym] ||= value
400
+ end
401
+
402
+ responses = perform_action_with_retries do
403
+ # Ship to Sphinx to highlight and excerpt
404
+ @request.excerpts(excerpting_options)
405
+ end
406
+
407
+ responses = responses.in_groups_of(self.class.excerpting_options['content_methods'].size)
408
+
409
+ results_with_content_methods.each_with_index do |result_and_methods, i|
410
+ # Override the individual model accessors with the excerpted data
411
+ result, methods = result_and_methods
412
+ methods.each_with_index do |method, j|
413
+ data = responses[i][j]
414
+ if method
415
+ result._metaclass.send('define_method', method) { data }
416
+ attributes = result.instance_variable_get('@attributes')
417
+ attributes[method] = data if attributes[method]
418
+ end
419
+ end
420
+ end
421
+
422
+ @results = results_with_content_methods.map do |result_and_content_method|
423
+ result_and_content_method.first.freeze
424
+ end
425
+
426
+ self
427
+ end
428
+
429
+
430
+ # Delegates enumerable methods to @results, if possible. This allows us to behave directly like a WillPaginate::Collection. Failing that, we delegate to the options hash if a key is set. This lets us use <tt>self</tt> directly in view helpers.
431
+ def method_missing(*args, &block)
432
+ if @results.respond_to? args.first
433
+ @results.send(*args, &block)
434
+ elsif options.has_key? args.first.to_s
435
+ @options[args.first.to_s]
436
+ else
437
+ super
438
+ end
439
+ end
440
+
441
+ def log msg #:nodoc:
442
+ Ultrasphinx.log msg
443
+ end
444
+
445
+ def say msg #:nodoc:
446
+ Ultrasphinx.say msg
447
+ end
448
+
449
+ private
450
+
451
+ def require_run
452
+ run unless run?
453
+ end
454
+
455
+ end
456
+ end
@@ -0,0 +1,57 @@
1
+
2
+
3
+ module Ultrasphinx
4
+
5
+ =begin rdoc
6
+
7
+ In order to spellcheck your user's query, Ultrasphinx bundles a small spelling module.
8
+
9
+ == Setup
10
+
11
+ Make sure Aspell and the Rubygem <tt>raspell</tt> are installed. See http://blog.evanweaver.com/files/doc/fauna/raspell/ for detailed instructions.
12
+
13
+ Copy the <tt>examples/ap.multi</tt> file into your Aspell dictionary folder (<tt>/opt/local/share/aspell/</tt> on Mac, <tt>/usr/lib/aspell-0.60/</tt> on Linux). This file lets Aspell load a custom wordlist generated by Sphinx from your app data (you can configure its filename in the <tt>config/ultrasphinx/*.base</tt> files). Modify the file if you don't want to also use the default American English dictionary.
14
+
15
+ Finally, to build the custom wordlist, run:
16
+ sudo rake ultrasphinx:spelling:build
17
+
18
+ You need to use <tt>sudo</tt> because Ultrasphinx needs to write to the Aspell dictionary folder. Also note that Aspell, <tt>raspell</tt>, and the custom dictionary must be available on each application server, not on the Sphinx daemon server.
19
+
20
+
21
+ == Usage
22
+
23
+ Now you can check whether a query is spelled correctly, like so:
24
+ @correction = Ultrasphinx::Spell.correct(@search.query)
25
+
26
+ If <tt>@correction</tt> is not <tt>nil</tt>, go ahead and suggest it to the user.
27
+
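The correction logic can be tried without Aspell installed. In this sketch, <tt>FakeSpeller</tt> is a hypothetical stand-in for the Aspell object with an invented word list; only the word-by-word replacement and the "<tt>nil</tt> when nothing changed" behaviour follow the module.

```ruby
# Hypothetical stand-in for an Aspell instance; the word list and suggestions are invented.
class FakeSpeller
  KNOWN = ['pizza', 'near', 'me']
  SUGGESTIONS = { 'piza' => ['pizza', 'pisa'] }

  def check(word)
    KNOWN.include?(word.downcase)
  end

  def suggest(word)
    SUGGESTIONS.fetch(word.downcase, [word])
  end
end

# Replace each misspelled word with its first suggestion; return nil if nothing changed.
def correct(string, speller)
  correction = string.gsub(/[\w']+/) do |word|
    speller.check(word) ? word : speller.suggest(word).first
  end
  correction == string ? nil : correction
end

speller = FakeSpeller.new
p correct('piza near me', speller)
p correct('pizza near me', speller)
```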
28
+ =end
29
+
30
+ module Spell
31
+
32
+ begin
33
+ SP = Aspell.new(Ultrasphinx::DICTIONARY)
34
+ SP.suggestion_mode = Aspell::NORMAL
35
+ SP.set_option("ignore-case", "true")
36
+ Ultrasphinx.say "spelling support enabled"
37
+ rescue Object => e
38
+ SP = nil
39
+ Ultrasphinx.say "spelling support not available (raspell configuration raised \"#{e}\")"
40
+ end
41
+
42
+ def self.correct string
43
+ return nil unless SP
44
+ correction = string.gsub(/[\w\']+/) do |word|
45
+ unless SP.check(word)
46
+ SP.suggest(word).first
47
+ else
48
+ word
49
+ end
50
+ end
51
+
52
+ correction if correction != string
53
+ end
54
+
55
+ end
56
+ end
57
+