pager-ultrasphinx 1.0.20080510
- data/LICENSE +184 -0
- data/README +140 -0
- data/Rakefile +27 -0
- data/lib/ultrasphinx/associations.rb +26 -0
- data/lib/ultrasphinx/autoload.rb +12 -0
- data/lib/ultrasphinx/configure.rb +367 -0
- data/lib/ultrasphinx/core_extensions.rb +132 -0
- data/lib/ultrasphinx/fields.rb +198 -0
- data/lib/ultrasphinx/is_indexed.rb +227 -0
- data/lib/ultrasphinx/postgresql/concat_ws.sql +35 -0
- data/lib/ultrasphinx/postgresql/crc32.sql +15 -0
- data/lib/ultrasphinx/postgresql/group_concat.sql +23 -0
- data/lib/ultrasphinx/postgresql/hex_to_int.sql +15 -0
- data/lib/ultrasphinx/postgresql/language.sql +1 -0
- data/lib/ultrasphinx/postgresql/unix_timestamp.sql +12 -0
- data/lib/ultrasphinx/search/internals.rb +385 -0
- data/lib/ultrasphinx/search/parser.rb +139 -0
- data/lib/ultrasphinx/search.rb +456 -0
- data/lib/ultrasphinx/spell.rb +57 -0
- data/lib/ultrasphinx/ultrasphinx.rb +199 -0
- data/lib/ultrasphinx.rb +36 -0
- data/rails/init.rb +2 -0
- data/tasks/ultrasphinx.rake +206 -0
- data/vendor/riddle/MIT-LICENCE +20 -0
- data/vendor/riddle/README +74 -0
- data/vendor/riddle/Rakefile +117 -0
- data/vendor/riddle/lib/riddle/client/filter.rb +44 -0
- data/vendor/riddle/lib/riddle/client/message.rb +65 -0
- data/vendor/riddle/lib/riddle/client/response.rb +84 -0
- data/vendor/riddle/lib/riddle/client.rb +593 -0
- data/vendor/riddle/lib/riddle.rb +25 -0
- data/vendor/will_paginate/LICENSE +18 -0
- metadata +84 -0
@@ -0,0 +1,456 @@
+
+module Ultrasphinx
+
+=begin rdoc
+Command-interface Search object.
+
+== Basic usage
+
+To set up a search, instantiate an Ultrasphinx::Search object with a hash of parameters. Only the <tt>:query</tt> key is mandatory.
+  @search = Ultrasphinx::Search.new(
+    :query => @query,
+    :sort_mode => 'descending',
+    :sort_by => 'created_at'
+  )
+
+Now, to run the query, call its <tt>run</tt> method. Your results will be available as ActiveRecord instances via the <tt>results</tt> method. Example:
+  @search.run
+  @search.results
+
+= Options
+
+== Query format
+
+The query string supports boolean operations, parentheses, phrases, and field-specific search. Query words are stemmed and joined by an implicit <tt>AND</tt> by default.
+
+* Valid boolean operators are <tt>AND</tt>, <tt>OR</tt>, and <tt>NOT</tt>.
+* Field-specific searches should be formatted as <tt>fieldname:contents</tt>. (This will only work for text fields. For numeric and date fields, see the <tt>:filters</tt> parameter, below.)
+* Phrases must be enclosed in double quotes.
+
+A Sphinx::SphinxInternalError will be raised on invalid queries. In general, queries can only be nested to one level.
+  @query = 'dog OR cat OR "white tigers" NOT (lions OR bears) AND title:animals'
+
+== Hash parameters
+
+The hash lets you customize internal aspects of the search.
+
+<tt>:per_page</tt>:: An integer. How many results per page.
+<tt>:page</tt>:: An integer. Which page of the results to return.
+<tt>:class_names</tt>:: An array or string. The class name of the model you want to search, an array of model names to search, or <tt>nil</tt> for all available models.
+<tt>:sort_mode</tt>:: <tt>'relevance'</tt> or <tt>'ascending'</tt> or <tt>'descending'</tt>. How to order the result set. Note that <tt>'time'</tt> and <tt>'extended'</tt> modes are available, but not tested.
+<tt>:sort_by</tt>:: A field name. What field to order by for <tt>'ascending'</tt> or <tt>'descending'</tt> mode. Has no effect for <tt>'relevance'</tt>.
+<tt>:weights</tt>:: A hash. Text-field names and associated query weighting. The default weight for every field is 1.0. Example: <tt>:weights => {'title' => 2.0}</tt>
+<tt>:filters</tt>:: A hash. Names of numeric or date fields and associated values. You can use a single value, an array of values, or a range. (See the bottom of the ActiveRecord::Base page for an example.)
+<tt>:facets</tt>:: An array of fields for grouping/faceting. You can access the returned facet values and their result counts with the <tt>facets</tt> method.
+<tt>:location</tt>:: A hash. Specify the names of your latitude and longitude attributes as declared in your is_indexed calls. To sort the results by distance, set <tt>:sort_mode => 'extended'</tt> and <tt>:sort_by => 'distance asc'</tt>.
+<tt>:indexes</tt>:: An array of indexes to search. Currently only <tt>Ultrasphinx::MAIN_INDEX</tt> and <tt>Ultrasphinx::DELTA_INDEX</tt> are available. Defaults to both; changing this is rarely needed.
+
+== Query Defaults
+
+Note that you can set up your own query defaults in <tt>environment.rb</tt>:
+
+  Ultrasphinx::Search.query_defaults = HashWithIndifferentAccess.new({
+    :per_page => 10,
+    :sort_mode => 'relevance',
+    :weights => {'title' => 2.0}
+  })
+
+= Advanced features
+
+== Geographic distance
+
+If you pass a <tt>:location</tt> Hash, distance from the location in meters will be available in your result records via the <tt>distance</tt> accessor:
+
+  @search = Ultrasphinx::Search.new(:class_names => 'Point',
+            :query => 'pizza',
+            :sort_mode => 'extended',
+            :sort_by => 'distance asc',
+            :location => {
+              :lat => 40.3,
+              :long => -73.6
+            })
+
+  @search.run.first.distance #=> 1402.4
+
+Note that Sphinx expects lat/long to be indexed as radians. If you have degrees in your database, do the conversion in the <tt>is_indexed</tt> call like so:
+
+  is_indexed 'fields' => [
+      'name',
+      'description',
+      {:field => 'lat', :function_sql => "RADIANS(?)"},
+      {:field => 'lng', :function_sql => "RADIANS(?)"}
+    ]
+
+Then, set <tt>Ultrasphinx::Search.query_defaults[:location][:units] = 'degrees'</tt>.
+
+The MySQL <tt>:double</tt> column type is recommended for storing location data. For Postgres, use <tt>:float</tt>.
+
+== Interlock integration
+
+Ultrasphinx uses the <tt>find_all_by_id</tt> method to instantiate records. If you set <tt>with_finders: true</tt> in {Interlock's}[http://blog.evanweaver.com/files/doc/fauna/interlock] <tt>config/memcached.yml</tt>, Interlock overrides <tt>find_all_by_id</tt> with a caching version.
+
+== Will_paginate integration
+
+The Search instance responds to the same methods as a WillPaginate::Collection object, so once you have called <tt>run</tt> or <tt>excerpt</tt> you can use it directly in your views:
+
+  will_paginate(@search)
+
+== Excerpt mode
+
+You can have Sphinx excerpt and highlight the matched sections in the associated fields. Instead of calling <tt>run</tt>, call <tt>excerpt</tt>.
+
+  @search.excerpt
+
+The returned models will be frozen and have their field contents temporarily changed to the excerpted and highlighted results.
+
+You need to set the <tt>content_methods</tt> key on Ultrasphinx::Search.excerpting_options to whatever groups of methods you need the excerpter to try to excerpt. The first responding method in each group for each record will be excerpted. This way Ruby-only methods are supported (for example, a metadata method which combines various model fields, or an aliased field so that the original record contents are still available).
+
+There are some other keys you can set, such as excerpt size, HTML tags to highlight with, and number of words on either side of each excerpt chunk. Example (in <tt>environment.rb</tt>):
+
+  Ultrasphinx::Search.excerpting_options = HashWithIndifferentAccess.new({
+    :before_match => '<strong>',
+    :after_match => '</strong>',
+    :chunk_separator => "...",
+    :limit => 256,
+    :around => 3,
+    :content_methods => [['title'], ['body', 'description', 'content'], ['metadata']]
+  })
+
+Note that your database is never changed by anything Ultrasphinx does.
+
+=end
+
+  class Search
+
+    include Internals
+    include Parser
+
+    cattr_accessor :query_defaults
+    self.query_defaults ||= HashWithIndifferentAccess.new({
+      :query => nil,
+      :page => 1,
+      :per_page => 20,
+      :sort_by => nil,
+      :sort_mode => 'relevance',
+      :indexes => [
+        MAIN_INDEX,
+        (DELTA_INDEX if Ultrasphinx.delta_index_present?)
+      ].compact,
+      :weights => {},
+      :class_names => [],
+      :filters => {},
+      :facets => [],
+      :location => HashWithIndifferentAccess.new({
+        :lat_attribute_name => 'lat',
+        :long_attribute_name => 'lng',
+        :units => 'radians'
+      })
+    })
+
+    cattr_accessor :excerpting_options
+    self.excerpting_options ||= HashWithIndifferentAccess.new({
+      :before_match => "<strong>", :after_match => "</strong>",
+      :chunk_separator => "...",
+      :limit => 256,
+      :around => 3,
+      # Results should respond to one in each group of these, in precedence order, for the
+      # excerpting to fire
+      :content_methods => [['title', 'name'], ['body', 'description', 'content'], ['metadata']]
+    })
+
+    cattr_accessor :client_options
+    self.client_options ||= HashWithIndifferentAccess.new({
+      :with_subtotals => false,
+      :ignore_missing_records => false,
+      # Has no effect if :ignore_missing_records => false
+      :max_missing_records => 5,
+      :max_retries => 4,
+      :retry_sleep_time => 0.5,
+      :max_facets => 1000,
+      :max_matches_offset => 1000,
+      # Whether to add an accessor to each returned result that specifies its global rank in
+      # the search.
+      :with_global_rank => false,
+      # Which method names to try to use for loading records. You can define your own (for
+      # example, with :includes) and then attach it here. Each method must accept an Array
+      # of ids, but does not have to preserve order. If the class does not respond_to? any
+      # method name in the array, :find_all_by_id will be used.
+      :finder_methods => []
+    })
+
+    # Friendly sort mode mappings
+    SPHINX_CLIENT_PARAMS = {
+      'sort_mode' => {
+        'relevance' => :relevance,
+        'descending' => :attr_desc,
+        'ascending' => :attr_asc,
+        'time' => :time_segments,
+        'extended' => :extended,
+      }
+    }
+
+    INTERNAL_KEYS = ['parsed_query'] #:nodoc:
+
+    MODELS_TO_IDS = Ultrasphinx.get_models_to_class_ids || {}
+
+    IDS_TO_MODELS = MODELS_TO_IDS.invert #:nodoc:
+
+    MAX_MATCHES = DAEMON_SETTINGS["max_matches"].to_i
+
+    FACET_CACHE = {} #:nodoc:
+
+    # Returns the options hash.
+    def options
+      @options
+    end
+
+    # Returns the query string used.
+    def query
+      # Redundant with method_missing
+      @options['query']
+    end
+
+    def parsed_query #:nodoc:
+      # Redundant with method_missing
+      @options['parsed_query']
+    end
+
+    # Returns an array of result objects.
+    def results
+      require_run
+      @results
+    end
+
+    # Returns the facet map for this query, if facets were used.
+    def facets
+      raise UsageError, "No facet field was configured" unless @options['facets']
+      require_run
+      @facets
+    end
+
+    # Returns the raw response from the Sphinx client.
+    def response
+      require_run
+      @response
+    end
+
+    # Returns a hash of total result counts, scoped to each available model. Set <tt>Ultrasphinx::Search.client_options[:with_subtotals] = true</tt> to enable.
+    #
+    # The subtotals are implemented as a special type of facet.
+    def subtotals
+      raise UsageError, "Subtotals are not enabled" unless self.class.client_options['with_subtotals']
+      require_run
+      @subtotals
+    end
+
+    # Returns the total result count.
+    def total_entries
+      require_run
+      [response[:total_found] || 0, MAX_MATCHES].min
+    end
+
+    # Returns the response time of the query, in milliseconds.
+    def time
+      require_run
+      response[:time]
+    end
+
+    # Returns whether the query has been run.
+    def run?
+      !@response.blank?
+    end
+
+    # Returns the current page number of the result set. (Page indexes begin at 1.)
+    def current_page
+      @options['page']
+    end
+
+    # Returns the number of records per page.
+    def per_page
+      @options['per_page']
+    end
+
+    # Returns the last available page number in the result set.
+    def page_count
+      require_run
+      (total_entries / per_page.to_f).ceil
+    end
+
+    # Returns the previous page number.
+    def previous_page
+      current_page > 1 ? (current_page - 1) : nil
+    end
+
+    # Returns the next page number.
+    def next_page
+      current_page < page_count ? (current_page + 1) : nil
+    end
+
+    # Returns the global index position of the first result on this page.
+    def offset
+      (current_page - 1) * per_page
+    end
+
+    # Builds a new command-interface Search object.
+    def initialize opts = {}
+
+      # Change to normal hashes with String keys for speed
+      opts = Hash[HashWithIndifferentAccess.new(opts._deep_dup._coerce_basic_types)]
+      unless self.class.query_defaults.instance_of? Hash
+        self.class.query_defaults = Hash[self.class.query_defaults]
+        self.class.query_defaults['location'] = Hash[self.class.query_defaults['location']]
+
+        self.class.client_options = Hash[self.class.client_options]
+        self.class.excerpting_options = Hash[self.class.excerpting_options]
+        self.class.excerpting_options['content_methods'].map! {|ary| ary.map {|m| m.to_s}}
+      end
+
+      # We need an annoying deep merge on the :location parameter
+      opts['location'].reverse_merge!(self.class.query_defaults['location']) if opts['location']
+
+      # Merge the rest of the defaults
+      @options = self.class.query_defaults.merge(opts)
+
+      @options['query'] = @options['query'].to_s
+      @options['class_names'] = Array(@options['class_names'])
+      @options['facets'] = Array(@options['facets'])
+      @options['indexes'] = Array(@options['indexes']).join(" ")
+
+      raise UsageError, "Weights must be a Hash" unless @options['weights'].is_a? Hash
+      raise UsageError, "Filters must be a Hash" unless @options['filters'].is_a? Hash
+
+      @options['parsed_query'] = parse(query)
+
+      @results, @subtotals, @facets, @response = [], {}, {}, {}
+
+      extra_keys = @options.keys - (self.class.query_defaults.keys + INTERNAL_KEYS)
+      log "discarded invalid keys: #{extra_keys * ', '}" if extra_keys.any? and RAILS_ENV != "test"
+    end
+
+    # Run the search, filling results with an array of ActiveRecord objects. Set the parameter to false
+    # if you only want the ids returned.
+    def run(reify = true)
+      @request = build_request_with_options(@options)
+
+      log "searching for #{@options.inspect}"
+
+      perform_action_with_retries do
+        @response = @request.query(parsed_query, @options['indexes'])
+        log "search returned #{total_entries}/#{response[:total_found].to_i} in #{time.to_f} seconds."
+
+        if self.class.client_options['with_subtotals']
+          @subtotals = get_subtotals(@request, parsed_query)
+
+          # If the original query has a filter on this class, we will use its more accurate total rather than the facet's
+          # less accurate total.
+          if @options['class_names'].size == 1
+            @subtotals[@options['class_names'].first] = response[:total_found]
+          end
+
+        end
+
+        Array(@options['facets']).each do |facet|
+          @facets[facet] = get_facets(@request, parsed_query, facet)
+        end
+
+        @results = convert_sphinx_ids(response[:matches])
+        @results = reify_results(@results) if reify
+
+        say "warning; #{response[:warning]}" if response[:warning]
+        raise UsageError, response[:error] if response[:error]
+
+      end
+      self
+    end
+
+
+    # Overwrite the configured content attributes with excerpted and highlighted versions of themselves.
+    # Calls <tt>run</tt> first if the query has not been run yet.
+    def excerpt
+
+      require_run
+      return if results.empty?
+
+      # See what fields in each result might respond to our excerptable methods
+      results_with_content_methods = results.map do |result|
+        [result,
+          self.class.excerpting_options['content_methods'].map do |methods|
+            methods.detect do |this|
+              result.respond_to? this
+            end
+          end
+        ]
+      end
+
+      # Fetch the actual field contents
+      docs = results_with_content_methods.map do |result, methods|
+        methods.map do |method|
+          method and strip_bogus_characters(result.send(method)) or ""
+        end
+      end.flatten
+
+      excerpting_options = {
+        :docs => docs,
+        :index => MAIN_INDEX, # http://www.sphinxsearch.com/forum/view.html?id=100
+        :words => strip_query_commands(parsed_query)
+      }
+      self.class.excerpting_options.except('content_methods').each do |key, value|
+        # Riddle only wants symbols
+        excerpting_options[key.to_sym] ||= value
+      end
+
+      responses = perform_action_with_retries do
+        # Ship to Sphinx to highlight and excerpt
+        @request.excerpts(excerpting_options)
+      end
+
+      responses = responses.in_groups_of(self.class.excerpting_options['content_methods'].size)
+
+      results_with_content_methods.each_with_index do |result_and_methods, i|
+        # Override the individual model accessors with the excerpted data
+        result, methods = result_and_methods
+        methods.each_with_index do |method, j|
+          data = responses[i][j]
+          if method
+            result._metaclass.send('define_method', method) { data }
+            attributes = result.instance_variable_get('@attributes')
+            attributes[method] = data if attributes[method]
+          end
+        end
+      end
+
+      @results = results_with_content_methods.map do |result_and_content_method|
+        result_and_content_method.first.freeze
+      end
+
+      self
+    end
+
+
+    # Delegates enumerable methods to @results, if possible. This allows us to behave directly like a WillPaginate::Collection. Failing that, we delegate to the options hash if a key is set. This lets us use <tt>self</tt> directly in view helpers.
+    def method_missing(*args, &block)
+      if @results.respond_to? args.first
+        @results.send(*args, &block)
+      elsif options.has_key? args.first.to_s
+        @options[args.first.to_s]
+      else
+        super
+      end
+    end
+
+    def log msg #:nodoc:
+      Ultrasphinx.log msg
+    end
+
+    def say msg #:nodoc:
+      Ultrasphinx.say msg
+    end
+
+    private
+
+    def require_run
+      run unless run?
+    end
+
+  end
+end
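The paging accessors in the hunk above (<tt>total_entries</tt>, <tt>page_count</tt>, <tt>previous_page</tt>, <tt>next_page</tt>, <tt>offset</tt>) are plain arithmetic, so they can be tried without a Sphinx daemon. The following is a minimal sketch, not part of the gem: <tt>PageMath</tt> is a hypothetical stand-in class, and the cap of 1000 is an assumed value for <tt>MAX_MATCHES</tt>, which the real code reads from <tt>DAEMON_SETTINGS</tt>.

```ruby
# Hypothetical stand-in for the paging accessors of Ultrasphinx::Search.
# It repeats the same arithmetic but never talks to Sphinx.
class PageMath
  attr_reader :current_page, :per_page, :total_entries

  # max_matches is an assumed cap; the gem reads it from DAEMON_SETTINGS.
  def initialize(total_found, current_page, per_page, max_matches = 1000)
    # Mirrors [response[:total_found] || 0, MAX_MATCHES].min
    @total_entries = [total_found || 0, max_matches].min
    @current_page  = current_page
    @per_page      = per_page
  end

  # Last available page number; a partial final page rounds up.
  def page_count
    (total_entries / per_page.to_f).ceil
  end

  def previous_page
    current_page > 1 ? (current_page - 1) : nil
  end

  def next_page
    current_page < page_count ? (current_page + 1) : nil
  end

  # Global index of the first result on the current page.
  def offset
    (current_page - 1) * per_page
  end
end

pages = PageMath.new(45, 3, 20)
puts pages.page_count           # 45 / 20.0 = 2.25, rounded up to 3
puts pages.offset               # (3 - 1) * 20 = 40
puts pages.previous_page.inspect
puts pages.next_page.inspect    # nil, because page 3 is the last page
```

Because the total is capped before the division, a query that matches more documents than the cap still reports a finite <tt>page_count</tt>, which is what lets the Search object be handed to <tt>will_paginate</tt>.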
@@ -0,0 +1,57 @@
+
+
+module Ultrasphinx
+
+=begin rdoc
+
+In order to spellcheck your users' queries, Ultrasphinx bundles a small spelling module.
+
+== Setup
+
+Make sure Aspell and the Rubygem <tt>raspell</tt> are installed. See http://blog.evanweaver.com/files/doc/fauna/raspell/ for detailed instructions.
+
+Copy the <tt>examples/ap.multi</tt> file into your Aspell dictionary folder (<tt>/opt/local/share/aspell/</tt> on Mac, <tt>/usr/lib/aspell-0.60/</tt> on Linux). This file lets Aspell load a custom wordlist generated by Sphinx from your app data (you can configure its filename in the <tt>config/ultrasphinx/*.base</tt> files). Modify the file if you don't want to also use the default American English dictionary.
+
+Finally, to build the custom wordlist, run:
+  sudo rake ultrasphinx:spelling:build
+
+You need to use <tt>sudo</tt> because Ultrasphinx needs to write to the Aspell dictionary folder. Also note that Aspell, <tt>raspell</tt>, and the custom dictionary must be available on each application server, not on the Sphinx daemon server.
+
+
+== Usage
+
+Now you can check whether a query is correctly spelled like so:
+  @correction = Ultrasphinx::Spell.correct(@search.query)
+
+If <tt>@correction</tt> is not <tt>nil</tt>, go ahead and suggest it to the user.
+
+=end
+
+  module Spell
+
+    begin
+      SP = Aspell.new(Ultrasphinx::DICTIONARY)
+      SP.suggestion_mode = Aspell::NORMAL
+      SP.set_option("ignore-case", "true")
+      Ultrasphinx.say "spelling support enabled"
+    rescue Object => e
+      SP = nil
+      Ultrasphinx.say "spelling support not available (raspell configuration raised \"#{e}\")"
+    end
+
+    def self.correct string
+      return nil unless SP
+      correction = string.gsub(/[\w\']+/) do |word|
+        unless SP.check(word)
+          SP.suggest(word).first
+        else
+          word
+        end
+      end
+
+      correction if correction != string
+    end
+
+  end
+end
+
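<tt>Spell.correct</tt> in the hunk above rewrites the query word by word with <tt>String#gsub</tt> and returns <tt>nil</tt> when nothing changed. The sketch below reproduces that flow without Aspell installed. <tt>StubSpeller</tt> and the free-standing <tt>correct</tt> method are hypothetical names introduced only for illustration; the stub implements just <tt>check</tt> and <tt>suggest</tt>, the two speller calls the original makes. One deliberate difference, stated as an assumption: when the speller has no suggestion the sketch keeps the original word, whereas the gem's block would return <tt>nil</tt> and drop the word.

```ruby
# Hypothetical stand-in for the Aspell speller. It only needs #check and
# #suggest, the two calls Ultrasphinx::Spell.correct makes on SP.
class StubSpeller
  def initialize(suggestions)
    @suggestions = suggestions # misspelled word => array of suggestions
  end

  # A word counts as correctly spelled unless it is a known misspelling.
  def check(word)
    !@suggestions.key?(word.downcase)
  end

  def suggest(word)
    @suggestions.fetch(word.downcase, [])
  end
end

# Same shape as Spell.correct, with the speller passed in as an argument.
def correct(string, speller)
  return nil unless speller
  correction = string.gsub(/[\w']+/) do |word|
    # Assumption: fall back to the word itself when there is no suggestion.
    speller.check(word) ? word : (speller.suggest(word).first || word)
  end
  # nil signals "nothing to suggest", as in the original.
  correction if correction != string
end

speller = StubSpeller.new("tigrs" => ["tigers"])
p correct("white tigrs", speller)   # "white tigers"
p correct("white tigers", speller)  # nil, the query was already correct
```

Returning <tt>nil</tt> for an unchanged query is what makes the "suggest it only if <tt>@correction</tt> is not <tt>nil</tt>" check in the usage notes work.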