pager-ultrasphinx 1.0.20080510

@@ -0,0 +1,456 @@
1
+
2
+ module Ultrasphinx
3
+
4
+ =begin rdoc
5
+ Command-interface Search object.
6
+
7
+ == Basic usage
8
+
9
+ To set up a search, instantiate an Ultrasphinx::Search object with a hash of parameters. Only the <tt>:query</tt> key is mandatory.
10
+ @search = Ultrasphinx::Search.new(
11
+ :query => @query,
12
+ :sort_mode => 'descending',
13
+ :sort_by => 'created_at'
14
+ )
15
+
16
+ Now, to run the query, call its <tt>run</tt> method. Your results will be available as ActiveRecord instances via the <tt>results</tt> method. Example:
17
+ @search.run
18
+ @search.results
19
+
20
+ = Options
21
+
22
+ == Query format
23
+
24
+ The query string supports boolean operations, parentheses, phrases, and field-specific search. Query words are stemmed and joined by an implicit <tt>AND</tt> by default.
25
+
26
+ * Valid boolean operators are <tt>AND</tt>, <tt>OR</tt>, and <tt>NOT</tt>.
27
+ * Field-specific searches should be formatted as <tt>fieldname:contents</tt>. (This will only work for text fields. For numeric and date fields, see the <tt>:filters</tt> parameter, below.)
28
+ * Phrases must be enclosed in double quotes.
29
+
30
+ A Sphinx::SphinxInternalError will be raised on invalid queries. In general, queries can only be nested to one level.
31
+ @query = 'dog OR cat OR "white tigers" NOT (lions OR bears) AND title:animals'
32
+
33
+ == Hash parameters
34
+
35
+ The hash lets you customize internal aspects of the search.
36
+
37
+ <tt>:per_page</tt>:: An integer. How many results per page.
38
+ <tt>:page</tt>:: An integer. Which page of the results to return.
39
+ <tt>:class_names</tt>:: An array or string. The class name of the model you want to search, an array of model names to search, or <tt>nil</tt> for all available models.
40
+ <tt>:sort_mode</tt>:: <tt>'relevance'</tt> or <tt>'ascending'</tt> or <tt>'descending'</tt>. How to order the result set. Note that <tt>'time'</tt> and <tt>'extended'</tt> modes are available, but not tested.
41
+ <tt>:sort_by</tt>:: A field name. What field to order by for <tt>'ascending'</tt> or <tt>'descending'</tt> mode. Has no effect for <tt>'relevance'</tt>.
42
+ <tt>:weights</tt>:: A hash. Text-field names and associated query weighting. The default weight for every field is 1.0. Example: <tt>:weights => {'title' => 2.0}</tt>
43
+ <tt>:filters</tt>:: A hash. Names of numeric or date fields and associated values. You can use a single value, an array of values, or a range. (See the bottom of the ActiveRecord::Base page for an example.)
44
+ <tt>:facets</tt>:: An array of fields for grouping/faceting. You can access the returned facet values and their result counts with the <tt>facets</tt> method.
45
+ <tt>:location</tt>:: A hash. Specify the names of your latitude and longitude attributes as declared in your <tt>is_indexed</tt> calls. To sort the results by distance, set <tt>:sort_mode => 'extended'</tt> and <tt>:sort_by => 'distance asc'</tt>.
46
+ <tt>:indexes</tt>:: An array of indexes to search. Currently only <tt>Ultrasphinx::MAIN_INDEX</tt> and <tt>Ultrasphinx::DELTA_INDEX</tt> are available. Defaults to both; changing this is rarely needed.
47
+
48
+ == Query Defaults
49
+
50
+ Note that you can set up your own query defaults in <tt>environment.rb</tt>:
51
+
52
+ Ultrasphinx::Search.query_defaults = HashWithIndifferentAccess.new({
53
+ :per_page => 10,
54
+ :sort_mode => 'relevance',
55
+ :weights => {'title' => 2.0}
56
+ })
57
+
58
+ = Advanced features
59
+
60
+ == Geographic distance
61
+
62
+ If you pass a <tt>:location</tt> Hash, distance from the location in meters will be available in your result records via the <tt>distance</tt> accessor:
63
+
64
+ @search = Ultrasphinx::Search.new(:class_names => 'Point',
65
+ :query => 'pizza',
66
+ :sort_mode => 'extended',
67
+ :sort_by => 'distance asc',
68
+ :location => {
69
+ :lat => 40.3,
70
+ :long => -73.6
71
+ })
72
+
73
+ @search.run.first.distance #=> 1402.4
74
+
75
+ Note that Sphinx expects lat/long to be indexed as radians. If you have degrees in your database, do the conversion in the <tt>is_indexed</tt> call like so:
76
+
77
+ is_indexed 'fields' => [
78
+ 'name',
79
+ 'description',
80
+ {:field => 'lat', :function_sql => "RADIANS(?)"},
81
+ {:field => 'lng', :function_sql => "RADIANS(?)"}
82
+ ]
83
+
84
+ Then, set <tt>Ultrasphinx::Search.query_defaults[:location][:units] = 'degrees'</tt>.
85
+
86
+ The MySQL <tt>:double</tt> column type is recommended for storing location data. For Postgres, use <tt>:float</tt>.
87
+
88
+ == Interlock integration
89
+
90
+ Ultrasphinx uses the <tt>find_all_by_id</tt> method to instantiate records. If you set <tt>with_finders: true</tt> in {Interlock's}[http://blog.evanweaver.com/files/doc/fauna/interlock] <tt>config/memcached.yml</tt>, Interlock overrides <tt>find_all_by_id</tt> with a caching version.
91
+
92
+ == Will_paginate integration
93
+
94
+ The Search instance responds to the same methods as a WillPaginate::Collection object, so once you have called <tt>run</tt> or <tt>excerpt</tt> you can use it directly in your views:
95
+
96
+ will_paginate(@search)
97
+
98
+ == Excerpt mode
99
+
100
+ You can have Sphinx excerpt and highlight the matched sections in the associated fields. Instead of calling <tt>run</tt>, call <tt>excerpt</tt>.
101
+
102
+ @search.excerpt
103
+
104
+ The returned models will be frozen and have their field contents temporarily changed to the excerpted and highlighted results.
105
+
106
+ You need to set the <tt>content_methods</tt> key on Ultrasphinx::Search.excerpting_options to whatever groups of methods you need the excerpter to try to excerpt. The first responding method in each group for each record will be excerpted. This way Ruby-only methods are supported (for example, a metadata method which combines various model fields, or an aliased field so that the original record contents are still available).
107
+
108
+ There are some other keys you can set, such as excerpt size, HTML tags to highlight with, and number of words on either side of each excerpt chunk. Example (in <tt>environment.rb</tt>):
109
+
110
+ Ultrasphinx::Search.excerpting_options = HashWithIndifferentAccess.new({
111
+ :before_match => '<strong>',
112
+ :after_match => '</strong>',
113
+ :chunk_separator => "...",
114
+ :limit => 256,
115
+ :around => 3,
116
+ :content_methods => [['title'], ['body', 'description', 'content'], ['metadata']]
117
+ })
118
+
119
+ Note that your database is never changed by anything Ultrasphinx does.
120
+
121
+ =end
122
+
123
+ class Search
124
+
125
+ include Internals
126
+ include Parser
127
+
128
+ cattr_accessor :query_defaults
129
+ self.query_defaults ||= HashWithIndifferentAccess.new({
130
+ :query => nil,
131
+ :page => 1,
132
+ :per_page => 20,
133
+ :sort_by => nil,
134
+ :sort_mode => 'relevance',
135
+ :indexes => [
136
+ MAIN_INDEX,
137
+ (DELTA_INDEX if Ultrasphinx.delta_index_present?)
138
+ ].compact,
139
+ :weights => {},
140
+ :class_names => [],
141
+ :filters => {},
142
+ :facets => [],
143
+ :location => HashWithIndifferentAccess.new({
144
+ :lat_attribute_name => 'lat',
145
+ :long_attribute_name => 'lng',
146
+ :units => 'radians'
147
+ })
148
+ })
149
+
150
+ cattr_accessor :excerpting_options
151
+ self.excerpting_options ||= HashWithIndifferentAccess.new({
152
+ :before_match => "<strong>", :after_match => "</strong>",
153
+ :chunk_separator => "...",
154
+ :limit => 256,
155
+ :around => 3,
156
+ # Results should respond to one in each group of these, in precedence order, for the
157
+ # excerpting to fire
158
+ :content_methods => [['title', 'name'], ['body', 'description', 'content'], ['metadata']]
159
+ })
160
+
161
+ cattr_accessor :client_options
162
+ self.client_options ||= HashWithIndifferentAccess.new({
163
+ :with_subtotals => false,
164
+ :ignore_missing_records => false,
165
+ # Has no effect if :ignore_missing_records => false
166
+ :max_missing_records => 5,
167
+ :max_retries => 4,
168
+ :retry_sleep_time => 0.5,
169
+ :max_facets => 1000,
170
+ :max_matches_offset => 1000,
171
+ # Whether to add an accessor to each returned result that specifies its global rank in
172
+ # the search.
173
+ :with_global_rank => false,
174
+ # Which method names to try to use for loading records. You can define your own (for
175
+ # example, with :includes) and then attach it here. Each method must accept an Array
176
+ # of ids, but does not have to preserve order. If the class does not respond_to? any
177
+ # method name in the array, :find_all_by_id will be used.
178
+ :finder_methods => []
179
+ })
180
+
181
+ # Friendly sort mode mappings
182
+ SPHINX_CLIENT_PARAMS = {
183
+ 'sort_mode' => {
184
+ 'relevance' => :relevance,
185
+ 'descending' => :attr_desc,
186
+ 'ascending' => :attr_asc,
187
+ 'time' => :time_segments,
188
+ 'extended' => :extended,
189
+ }
190
+ }
191
+
192
+ INTERNAL_KEYS = ['parsed_query'] #:nodoc:
193
+
194
+ MODELS_TO_IDS = Ultrasphinx.get_models_to_class_ids || {}
195
+
196
+ IDS_TO_MODELS = MODELS_TO_IDS.invert #:nodoc:
197
+
198
+ MAX_MATCHES = DAEMON_SETTINGS["max_matches"].to_i
199
+
200
+ FACET_CACHE = {} #:nodoc:
201
+
202
+ # Returns the options hash.
203
+ def options
204
+ @options
205
+ end
206
+
207
+ # Returns the query string used.
208
+ def query
209
+ # Redundant with method_missing
210
+ @options['query']
211
+ end
212
+
213
+ def parsed_query #:nodoc:
214
+ # Redundant with method_missing
215
+ @options['parsed_query']
216
+ end
217
+
218
+ # Returns an array of result objects.
219
+ def results
220
+ require_run
221
+ @results
222
+ end
223
+
224
+ # Returns the facet map for this query, if facets were used.
225
+ def facets
226
+ raise UsageError, "No facet field was configured" unless @options['facets']
227
+ require_run
228
+ @facets
229
+ end
230
+
231
+ # Returns the raw response from the Sphinx client.
232
+ def response
233
+ require_run
234
+ @response
235
+ end
236
+
237
+ # Returns a hash of total result counts, scoped to each available model. Set <tt>Ultrasphinx::Search.client_options[:with_subtotals] = true</tt> to enable.
238
+ #
239
+ # The subtotals are implemented as a special type of facet.
240
+ def subtotals
241
+ raise UsageError, "Subtotals are not enabled" unless self.class.client_options['with_subtotals']
242
+ require_run
243
+ @subtotals
244
+ end
245
+
246
+ # Returns the total result count.
247
+ def total_entries
248
+ require_run
249
+ [response[:total_found] || 0, MAX_MATCHES].min
250
+ end
251
+
252
+ # Returns the response time of the query, in seconds.
253
+ def time
254
+ require_run
255
+ response[:time]
256
+ end
257
+
258
+ # Returns whether the query has been run.
259
+ def run?
260
+ !@response.blank?
261
+ end
262
+
263
+ # Returns the current page number of the result set. (Page indexes begin at 1.)
264
+ def current_page
265
+ @options['page']
266
+ end
267
+
268
+ # Returns the number of records per page.
269
+ def per_page
270
+ @options['per_page']
271
+ end
272
+
273
+ # Returns the last available page number in the result set.
274
+ def page_count
275
+ require_run
276
+ (total_entries / per_page.to_f).ceil
277
+ end
278
+
279
+ # Returns the previous page number.
280
+ def previous_page
281
+ current_page > 1 ? (current_page - 1) : nil
282
+ end
283
+
284
+ # Returns the next page number.
285
+ def next_page
286
+ current_page < page_count ? (current_page + 1) : nil
287
+ end
288
+
289
+ # Returns the global index position of the first result on this page.
290
+ def offset
291
+ (current_page - 1) * per_page
292
+ end
293
+
294
+ # Builds a new command-interface Search object.
295
+ def initialize opts = {}
296
+
297
+ # Change to normal hashes with String keys for speed
298
+ opts = Hash[HashWithIndifferentAccess.new(opts._deep_dup._coerce_basic_types)]
299
+ unless self.class.query_defaults.instance_of? Hash
300
+ self.class.query_defaults = Hash[self.class.query_defaults]
301
+ self.class.query_defaults['location'] = Hash[self.class.query_defaults['location']]
302
+
303
+ self.class.client_options = Hash[self.class.client_options]
304
+ self.class.excerpting_options = Hash[self.class.excerpting_options]
305
+ self.class.excerpting_options['content_methods'].map! {|ary| ary.map {|m| m.to_s}}
306
+ end
307
+
308
+ # We need an annoying deep merge on the :location parameter
309
+ opts['location'].reverse_merge!(self.class.query_defaults['location']) if opts['location']
310
+
311
+ # Merge the rest of the defaults
312
+ @options = self.class.query_defaults.merge(opts)
313
+
314
+ @options['query'] = @options['query'].to_s
315
+ @options['class_names'] = Array(@options['class_names'])
316
+ @options['facets'] = Array(@options['facets'])
317
+ @options['indexes'] = Array(@options['indexes']).join(" ")
318
+
319
+ raise UsageError, "Weights must be a Hash" unless @options['weights'].is_a? Hash
320
+ raise UsageError, "Filters must be a Hash" unless @options['filters'].is_a? Hash
321
+
322
+ @options['parsed_query'] = parse(query)
323
+
324
+ @results, @subtotals, @facets, @response = [], {}, {}, {}
325
+
326
+ extra_keys = @options.keys - (self.class.query_defaults.keys + INTERNAL_KEYS)
327
+ log "discarded invalid keys: #{extra_keys * ', '}" if extra_keys.any? and RAILS_ENV != "test"
328
+ end
329
+
330
+ # Run the search, filling results with an array of ActiveRecord objects. Set the parameter to false
331
+ # if you only want the ids returned.
332
+ def run(reify = true)
333
+ @request = build_request_with_options(@options)
334
+
335
+ log "searching for #{@options.inspect}"
336
+
337
+ perform_action_with_retries do
338
+ @response = @request.query(parsed_query, @options['indexes'])
339
+ log "search returned #{total_entries}/#{response[:total_found].to_i} in #{time.to_f} seconds."
340
+
341
+ if self.class.client_options['with_subtotals']
342
+ @subtotals = get_subtotals(@request, parsed_query)
343
+
344
+ # If the original query has a filter on this class, we will use its more accurate total rather than the facet's
345
+ # less accurate total.
346
+ if @options['class_names'].size == 1
347
+ @subtotals[@options['class_names'].first] = response[:total_found]
348
+ end
349
+
350
+ end
351
+
352
+ Array(@options['facets']).each do |facet|
353
+ @facets[facet] = get_facets(@request, parsed_query, facet)
354
+ end
355
+
356
+ @results = convert_sphinx_ids(response[:matches])
357
+ @results = reify_results(@results) if reify
358
+
359
+ say "warning: #{response[:warning]}" if response[:warning]
360
+ raise UsageError, response[:error] if response[:error]
361
+
362
+ end
363
+ self
364
+ end
365
+
366
+
367
+ # Overwrite the configured content attributes with excerpted and highlighted versions of themselves.
368
+ # Runs run if it hasn't already been done.
369
+ def excerpt
370
+
371
+ require_run
372
+ return if results.empty?
373
+
374
+ # See what fields in each result might respond to our excerptable methods
375
+ results_with_content_methods = results.map do |result|
376
+ [result,
377
+ self.class.excerpting_options['content_methods'].map do |methods|
378
+ methods.detect do |this|
379
+ result.respond_to? this
380
+ end
381
+ end
382
+ ]
383
+ end
384
+
385
+ # Fetch the actual field contents
386
+ docs = results_with_content_methods.map do |result, methods|
387
+ methods.map do |method|
388
+ method and strip_bogus_characters(result.send(method)) or ""
389
+ end
390
+ end.flatten
391
+
392
+ excerpting_options = {
393
+ :docs => docs,
394
+ :index => MAIN_INDEX, # http://www.sphinxsearch.com/forum/view.html?id=100
395
+ :words => strip_query_commands(parsed_query)
396
+ }
397
+ self.class.excerpting_options.except('content_methods').each do |key, value|
398
+ # Riddle only wants symbols
399
+ excerpting_options[key.to_sym] ||= value
400
+ end
401
+
402
+ responses = perform_action_with_retries do
403
+ # Ship to Sphinx to highlight and excerpt
404
+ @request.excerpts(excerpting_options)
405
+ end
406
+
407
+ responses = responses.in_groups_of(self.class.excerpting_options['content_methods'].size)
408
+
409
+ results_with_content_methods.each_with_index do |result_and_methods, i|
410
+ # Override the individual model accessors with the excerpted data
411
+ result, methods = result_and_methods
412
+ methods.each_with_index do |method, j|
413
+ data = responses[i][j]
414
+ if method
415
+ result._metaclass.send('define_method', method) { data }
416
+ attributes = result.instance_variable_get('@attributes')
417
+ attributes[method] = data if attributes[method]
418
+ end
419
+ end
420
+ end
421
+
422
+ @results = results_with_content_methods.map do |result_and_content_method|
423
+ result_and_content_method.first.freeze
424
+ end
425
+
426
+ self
427
+ end
428
+
429
+
430
+ # Delegates enumerable methods to @results, if possible. This allows us to behave directly like a WillPaginate::Collection. Failing that, we delegate to the options hash if a key is set. This lets us use <tt>self</tt> directly in view helpers.
431
+ def method_missing(*args, &block)
432
+ if @results.respond_to? args.first
433
+ @results.send(*args, &block)
434
+ elsif options.has_key? args.first.to_s
435
+ @options[args.first.to_s]
436
+ else
437
+ super
438
+ end
439
+ end
440
+
441
+ def log msg #:nodoc:
442
+ Ultrasphinx.log msg
443
+ end
444
+
445
+ def say msg #:nodoc:
446
+ Ultrasphinx.say msg
447
+ end
448
+
449
+ private
450
+
451
+ def require_run
452
+ run unless run?
453
+ end
454
+
455
+ end
456
+ end
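The pagination helpers above (<tt>page_count</tt>, <tt>previous_page</tt>, <tt>next_page</tt>, <tt>offset</tt>) are plain arithmetic over the total entry count, the current page, and the page size. The following is a minimal standalone sketch of that arithmetic; <tt>PageMath</tt> is a hypothetical stand-in for illustration, not a class provided by Ultrasphinx, and it does not talk to Sphinx.

```ruby
# Hypothetical stand-in for the pagination helpers; not part of the gem.
PageMath = Struct.new(:total_entries, :current_page, :per_page) do
  # Last available page number (0 when there are no results).
  def page_count
    (total_entries / per_page.to_f).ceil
  end

  # Previous page number, or nil on the first page.
  def previous_page
    current_page > 1 ? current_page - 1 : nil
  end

  # Next page number, or nil on the last page.
  def next_page
    current_page < page_count ? current_page + 1 : nil
  end

  # Zero-based global index of the first result on this page.
  def offset
    (current_page - 1) * per_page
  end
end

# 45 results, viewing page 3 with 20 results per page.
page = PageMath.new(45, 3, 20)
```

With these inputs the last page holds only five results, so <tt>next_page</tt> is <tt>nil</tt> and the page starts at global offset 40.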
@@ -0,0 +1,57 @@
1
+
2
+
3
+ module Ultrasphinx
4
+
5
+ =begin rdoc
6
+
7
+ To spellcheck your users' queries, Ultrasphinx bundles a small spelling module.
8
+
9
+ == Setup
10
+
11
+ Make sure Aspell and the Rubygem <tt>raspell</tt> are installed. See http://blog.evanweaver.com/files/doc/fauna/raspell/ for detailed instructions.
12
+
13
+ Copy the <tt>examples/ap.multi</tt> file into your Aspell dictionary folder (<tt>/opt/local/share/aspell/</tt> on Mac, <tt>/usr/lib/aspell-0.60/</tt> on Linux). This file lets Aspell load a custom wordlist generated by Sphinx from your app data (you can configure its filename in the <tt>config/ultrasphinx/*.base</tt> files). Modify the file if you don't want to also use the default American English dictionary.
14
+
15
+ Finally, to build the custom wordlist, run:
16
+ sudo rake ultrasphinx:spelling:build
17
+
18
+ You need to use <tt>sudo</tt> because Ultrasphinx needs to write to the Aspell dictionary folder. Also note that Aspell, <tt>raspell</tt>, and the custom dictionary must be available on each application server, not on the Sphinx daemon server.
19
+
20
+
21
+ == Usage
22
+
23
+ Now you can check whether a query is correctly spelled like so:
24
+ @correction = Ultrasphinx::Spell.correct(@search.query)
25
+
26
+ If <tt>@correction</tt> is not <tt>nil</tt>, go ahead and suggest it to the user.
27
+
28
+ =end
29
+
30
+ module Spell
31
+
32
+ begin
33
+ SP = Aspell.new(Ultrasphinx::DICTIONARY)
34
+ SP.suggestion_mode = Aspell::NORMAL
35
+ SP.set_option("ignore-case", "true")
36
+ Ultrasphinx.say "spelling support enabled"
37
+ rescue Object => e
38
+ SP = nil
39
+ Ultrasphinx.say "spelling support not available (raspell configuration raised \"#{e}\")"
40
+ end
41
+
42
+ def self.correct string
43
+ return nil unless SP
44
+ correction = string.gsub(/[\w\']+/) do |word|
45
+ unless SP.check(word)
46
+ SP.suggest(word).first || word
47
+ else
48
+ word
49
+ end
50
+ end
51
+
52
+ correction if correction != string
53
+ end
54
+
55
+ end
56
+ end
57
+
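The word-by-word correction in <tt>Spell.correct</tt> can be tried without Aspell installed. The sketch below assumes a hypothetical <tt>ToyChecker</tt> that mimics only the <tt>check</tt> and <tt>suggest</tt> calls used above; it is not raspell's implementation. The <tt>correct</tt> helper here takes the checker as an argument and keeps the original word when there is no suggestion.

```ruby
# Hypothetical checker standing in for Aspell; it only mimics #check and #suggest.
class ToyChecker
  def initialize(suggestions)
    @suggestions = suggestions # misspelled word => array of suggestions
  end

  # True when the word is considered correctly spelled.
  def check(word)
    !@suggestions.key?(word.downcase)
  end

  # Suggestions for a word, possibly empty.
  def suggest(word)
    @suggestions.fetch(word.downcase, [])
  end
end

# Replace each misspelled word with its first suggestion, keeping the
# original word when no suggestion exists. Returns nil if nothing changed.
def correct(string, checker)
  correction = string.gsub(/[\w']+/) do |word|
    checker.check(word) ? word : (checker.suggest(word).first || word)
  end
  correction unless correction == string
end

checker = ToyChecker.new("serach" => ["search"], "qury" => ["query"], "zzxq" => [])
fixed = correct("serach qury", checker)
unchanged = correct("search query", checker)
no_suggestion = correct("zzxq", checker)
```

Returning <tt>nil</tt> for an unchanged string lets the caller show a "did you mean" prompt only when there is a correction to offer.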