bento_search 1.5.0 → 1.6.0

@@ -9,18 +9,18 @@ require 'nokogiri'
9
9
 
10
10
  module BentoSearch
11
11
  # Usually raised by #get on an engine, when result for specified identifier
12
- # can't be found.
12
+ # can't be found.
13
13
  class ::BentoSearch::NotFound < ::BentoSearch::Error ; end
14
- # Usually raised by #get when identifier results in more than one record.
14
+ # Usually raised by #get when identifier results in more than one record.
15
15
  class ::BentoSearch::TooManyFound < ::BentoSearch::Error ; end
16
16
  # Raised for problem contacting or unexpected response from
17
- # remote service. Not yet universally used.
17
+ # remote service. Not yet universally used.
18
18
  class ::BentoSearch::FetchError < ::BentoSearch::Error ; end
19
19
 
20
-
21
- # Module mix-in for bento_search search engines.
20
+
21
+ # Module mix-in for bento_search search engines.
22
22
  #
23
- # ==Using a SearchEngine
23
+ # ==Using a SearchEngine
24
24
  #
25
25
  # See a whole bunch more examples in the project README.
26
26
  #
@@ -43,18 +43,18 @@ module BentoSearch
43
43
  # of BentoSearch::Results
44
44
  #
45
45
  # results = engine.search("query")
46
- #
47
- # See more docs under #search, as well as project README.
48
46
  #
49
- # == Standard configuration variables.
50
- #
47
+ # See more docs under #search, as well as project README.
48
+ #
49
+ # == Standard configuration variables.
50
+ #
51
51
  # Some engines require their own engine-specific configuration for api keys
52
52
  # and such, and offer their own engine-specific configuration for engine-specific
53
- # features.
53
+ # features.
54
54
  #
55
55
  # An additional semi-standard configuration variable, some engines take
56
56
  # an `:auth => true` to tell the engine to assume that all access is by
57
- # authenticated local users who should be given elevated access to results.
57
+ # authenticated local users who should be given elevated access to results.
58
58
  #
59
59
  # Additional standard configuration keys that are implemented by the bento_search
60
60
  # framework:
@@ -63,7 +63,7 @@ module BentoSearch
63
63
  # String name of decorator class that will be applied by #bento_decorate
64
64
  # helper in standard view. See wiki for more info on decorators. Must be
65
65
  # string name, actual class object not supported (to make it easier
66
- # to serialize and transport configuration).
66
+ # to serialize and transport configuration).
67
67
  #
68
68
  # == Implementing a SearchEngine
69
69
  #
@@ -71,7 +71,7 @@ module BentoSearch
71
71
  # generally only responsible for the parts specific to your search engine:
72
72
  # receiving a query, making a call to the external search engine, and
73
73
  # translating its result to a standard BentoSearch::Results full of
74
- # BentoSearch::ResultItems.
74
+ # BentoSearch::ResultItems.
75
75
  #
76
76
  # Start out by simply including the search engine module:
77
77
  #
@@ -85,64 +85,64 @@ module BentoSearch
85
85
  # BentoSearch::Results item.
86
86
  #
87
87
  # The Results object should have #total_items set with total hitcount, and
88
- # contain BentoSearch::ResultItem objects for each hit in the current page.
89
- # See individual class documentation for more info.
88
+ # contain BentoSearch::ResultItem objects for each hit in the current page.
89
+ # See individual class documentation for more info.
90
90
  #
91
91
  # That's about the extent of your responsibilities. If the search failed
92
92
  # for some reason due to an error, you should return a Results object
93
93
  # with its #error object set, so it will be `failed?`. The framework
94
94
  # will take care of this for you for certain uncaught exceptions you allow
95
95
  # to rise out of #search_implementation (timeouts, HTTPClient timeouts,
96
- # nokogiri and MultiJson parse errors).
96
+ # nokogiri and MultiJson parse errors).
97
97
  #
98
98
  # A SearchEngine object can be re-used for multiple searches, possibly
99
99
  # under concurrent multi-threading. Do not store search-specific state
100
100
  # in the search object, but you can store configuration-specific state there
101
- # of course.
102
- #
101
+ # of course.
102
+ #
103
103
  # Recommend use of HTTPClient, if possible, for http searches. Especially
104
104
  # using a class-level HTTPClient instance, to re-use persistent http
105
105
  # connections across searches (can be especially important if you need to contact
106
106
  # external search api via https/ssl).
107
107
  #
108
- # If you have required configuration keys, you can register that with
109
- # class-level required_configuration_keys method.
108
+ # If you have required configuration keys, you can register that with
109
+ # class-level required_configuration_keys method.
110
110
  #
111
- # You can also advertise max per-page value by overriding max_per_page.
111
+ # You can also advertise max per-page value by overriding max_per_page.
112
112
  #
113
- # If you support fielded searching, you should over-ride
113
+ # If you support fielded searching, you should over-ride
114
114
  # #search_field_definitions; if you support sorting, you should
115
115
  # override #sort_definitions. See BentoSearch::SearchEngine::Capabilities
116
- # module for documentation.
117
- #
116
+ # module for documentation.
117
+ #
118
118
  #
119
119
  module SearchEngine
120
120
  DefaultPerPage = 10
121
-
122
121
 
123
-
124
-
122
+
123
+
124
+
125
125
  extend ActiveSupport::Concern
126
-
126
+
127
127
  include Capabilities
128
-
128
+
129
129
  included do
130
- attr_accessor :configuration
130
+ attr_accessor :configuration
131
131
  end
132
-
132
+
133
133
  # If a specific SearchEngine defines initialize, it should call super,
134
134
  # which mostly handles configuration loading. Argument is a
135
- # Confstruct::Configuration or Hash.
135
+ # Confstruct::Configuration or Hash.
136
136
  def initialize(aConfiguration = Confstruct::Configuration.new)
137
137
  # To work around weird confstruct bug, we need to change
138
- # a hash to a Confstruct ourselves.
138
+ # a hash to a Confstruct ourselves.
139
139
  # https://github.com/mbklein/confstruct/issues/14
140
140
  unless aConfiguration.kind_of? Confstruct::Configuration
141
141
  aConfiguration = Confstruct::Configuration.new aConfiguration
142
142
  end
143
-
144
-
145
- # init, from copy of default, or new
143
+
144
+
145
+ # init, from copy of default, or new
146
146
  if self.class.default_configuration
147
147
  self.configuration = Confstruct::Configuration.new(self.class.default_configuration)
148
148
  else
@@ -150,187 +150,187 @@ module BentoSearch
150
150
  end
151
151
  # merge in current instance config
152
152
  self.configuration.configure ( aConfiguration )
153
-
154
- # global defaults?
153
+
154
+ # global defaults?
155
155
  self.configuration[:for_display] ||= {}
156
-
156
+
157
157
  # check for required keys -- have to be present, and not nil
158
158
  if self.class.required_configuration
159
- self.class.required_configuration.each do |required_key|
159
+ self.class.required_configuration.each do |required_key|
160
160
  if ["**NOT_FOUND**", nil].include? self.configuration.lookup!(required_key.to_s, "**NOT_FOUND**")
161
161
  raise ArgumentError.new("#{self.class.name} requires configuration key #{required_key}")
162
162
  end
163
163
  end
164
164
  end
165
-
165
+
166
166
  end
167
-
168
-
169
- # Method used to actually get results from a search engine.
167
+
168
+
169
+ # Method used to actually get results from a search engine.
170
170
  #
171
171
  # When implementing a search engine, you do not override this #search
172
172
  # method, but instead override #search_implementation. #search will
173
173
  # call your specific #search_implementation, first normalizing the query
174
- # arguments, and then normalizing and adding standard metadata to your return value.
174
+ # arguments, and then normalizing and adding standard metadata to your return value.
175
175
  #
176
176
  # Most engines support pagination, sorting, and searching in a specific
177
- # field.
177
+ # field.
178
178
  #
179
179
  # # 1-based page index
180
180
  # engine.search("query", :per_page => 20, :page => 5)
181
181
  # # or use 0-based per-record index, engines that don't
182
- # # support this will round to nearest page.
182
+ # # support this will round to nearest page.
183
183
  # engine.search("query", :start => 20)
184
184
  #
185
185
  # You can ask an engine what search fields it supports with engine.search_keys
186
186
  # engine.search("query", :search_field => "engine_search_field_name")
187
187
  #
188
188
  # There are also normalized 'semantic' names you can use across engines
189
- # (if they support them): :title, :author, :subject, maybe more.
189
+ # (if they support them): :title, :author, :subject, maybe more.
190
190
  #
191
191
  # engine.search("query", :semantic_search_field => :title)
192
192
  #
193
193
  # Ask an engine what semantic field names it supports with `engine.semantic_search_keys`
194
194
  #
195
- # Unrecognized search fields will be ignored, unless you pass in
196
- # :unrecognized_search_field => :raise (or do same in config).
195
+ # Unrecognized search fields will be ignored, unless you pass in
196
+ # :unrecognized_search_field => :raise (or do same in config).
197
197
  #
198
198
  # Ask an engine what sort fields it supports with `engine.sort_keys`. See
199
199
  # list of standard sort keys in I18n file at ./config/locales/en.yml, in
200
- # `en.bento_search.sort_keys`.
200
+ # `en.bento_search.sort_keys`.
201
201
  #
202
202
  # engine.search("query", :sort => "some_sort_key")
203
203
  #
204
204
  # Some engines support additional arguments to 'search', see individual
205
205
  # engine documentation. For instance, some engines support `:auth => true`
206
206
  # to give the user elevated search privileges when you have an authenticated
207
- # local user.
207
+ # local user.
208
208
  #
209
209
  # Query as first arg is just a convenience, you can also use a single hash
210
- # argument.
210
+ # argument.
211
211
  #
212
212
  # engine.search(:query => "query", :per_page => 20, :page => 4)
213
213
  #
214
214
  def search(*arguments)
215
215
  start_t = Time.now
216
-
216
+
217
217
  arguments = normalized_search_arguments(*arguments)
218
218
 
219
219
  results = search_implementation(arguments)
220
-
220
+
221
221
  fill_in_search_metadata_for(results, arguments)
222
-
222
+
223
223
  results.timing = (Time.now - start_t)
224
-
224
+
225
225
  return results
226
226
  rescue *auto_rescue_exceptions => e
227
227
  # Uncaught exception, log and turn into failed Results object. We
228
228
  # only catch certain types of exceptions, or it makes dev really
229
229
  # confusing eating exceptions. This is intentionally a convenience
230
230
  # to allow search engine implementations to just raise the exception
231
- # and we'll turn it into a proper error.
231
+ # and we'll turn it into a proper error.
232
232
  cleaned_backtrace = Rails.backtrace_cleaner.clean(e.backtrace)
233
233
  log_msg = "BentoSearch::SearchEngine failed results: #{e.inspect}\n #{cleaned_backtrace.join("\n ")}"
234
234
  Rails.logger.error log_msg
235
-
235
+
236
236
  failed = BentoSearch::Results.new
237
237
  failed.error ||= {}
238
238
  failed.error[:exception] = e
239
-
239
+
240
240
  failed.timing = (Time.now - start_t)
241
-
241
+
242
242
  fill_in_search_metadata_for(failed, arguments)
243
243
 
244
-
244
+
245
245
  return failed
246
246
  end
247
-
247
+
248
248
  # SOME of the elements of Results to be returned that SearchEngine implementation
249
249
  # fills in automatically post-search. Extracted into a method for DRY in
250
250
  # error handling to try to fill these in even in errors. Also can be used
251
- # as public method for de-serialized or mock results.
251
+ # as public method for de-serialized or mock results.
252
252
  def fill_in_search_metadata_for(results, normalized_arguments = {})
253
253
  results.search_args = normalized_arguments
254
254
  results.start = normalized_arguments[:start] || 0
255
255
  results.per_page = normalized_arguments[:per_page]
256
-
256
+
257
257
  results.engine_id = configuration.id
258
258
  results.display_configuration = configuration.for_display
259
259
 
260
260
  # We copy some configuration info over to each Item, as a convenience
261
261
  # to display logic that may have to decide what to do given only an item,
262
262
  # and may want to parameterize based on configuration.
263
- results.each do |item|
264
- item.engine_id = configuration.id
263
+ results.each do |item|
264
+ item.engine_id = configuration.id
265
265
  item.decorator = configuration.lookup!("for_display.decorator")
266
266
  item.display_configuration = configuration.for_display
267
267
  end
268
268
 
269
269
  results
270
270
  end
271
-
271
+
272
272
 
273
273
  # Take the arguments passed into #search, which can be flexibly given
274
274
  # in several ways, and normalize to an expected single hash that
275
275
  # will be passed to an engine's #search_implementation. The output
276
276
  # of this method is a single hash, and is what a #search_implementation
277
- # can expect to receive as an argument, with keys:
277
+ # can expect to receive as an argument, with keys:
278
278
  #
279
279
  # [:query] the query
280
280
  # [:per_page] will _always_ be present, using the default per_page if
281
281
  # none given by caller
282
282
  # [:start, :page] both :start and :page will _always_ be present, regardless
283
283
  # of which the caller used. They will both be integers, even if strings passed in.
284
- # [:search_field] A search field from the engine's #search_field_definitions, as string.
284
+ # [:search_field] A search field from the engine's #search_field_definitions, as string.
285
285
  # Even if the caller used :semantic_search_field, it'll be normalized
286
- # to the actual local search_field key on output.
287
- # [:sort] Sort key.
286
+ # to the actual local search_field key on output.
287
+ # [:sort] Sort key.
288
288
  #
289
289
  def normalized_search_arguments(*orig_arguments)
290
290
  arguments = {}
291
-
291
+
292
292
  # Two-arg style to one hash, if present
293
293
  if (orig_arguments.length > 1 ||
294
294
  (orig_arguments.length == 1 && ! orig_arguments.first.kind_of?(Hash)))
295
- arguments[:query] = orig_arguments.delete_at(0)
295
+ arguments[:query] = orig_arguments.delete_at(0)
296
296
  end
297
297
 
298
298
  arguments.merge!(orig_arguments.first) if orig_arguments.length > 0
299
-
300
-
299
+
300
+
301
301
  # allow strings for pagination (like from url query), change to
302
- # int please.
302
+ # int please.
303
303
  [:page, :per_page, :start].each do |key|
304
304
  arguments.delete(key) if arguments[key].blank?
305
305
  arguments[key] = arguments[key].to_i if arguments[key]
306
- end
306
+ end
307
307
  arguments[:per_page] ||= DefaultPerPage
308
-
309
- # illegal arguments
308
+
309
+ # illegal arguments
310
310
  if (arguments[:start] && arguments[:page])
311
311
  raise ArgumentError.new("Can't supply both :page and :start")
312
312
  end
313
- if ( arguments[:per_page] &&
314
- self.max_per_page &&
313
+ if ( arguments[:per_page] &&
314
+ self.max_per_page &&
315
315
  arguments[:per_page] > self.max_per_page)
316
316
  raise ArgumentError.new("#{arguments[:per_page]} is more than maximum :per_page of #{self.max_per_page} for #{self.class}")
317
317
  end
318
-
319
-
318
+
319
+
320
320
  # Normalize :page to :start, and vice versa
321
321
  if arguments[:page]
322
322
  arguments[:start] = (arguments[:page] - 1) * arguments[:per_page]
323
323
  elsif arguments[:start]
324
324
  arguments[:page] = (arguments[:start] / arguments[:per_page]) + 1
325
325
  end
326
-
326
+
327
327
  # normalize :sort from possibly symbol to string
328
328
  # TODO: raise if unrecognized sort key?
329
329
  if arguments[:sort]
330
330
  arguments[:sort] = arguments[:sort].to_s
331
331
  end
332
332
 
333
-
333
+
334
334
  # Multi-field search
335
335
  if arguments[:query].kind_of? Hash
336
336
  # Only if allowed
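The :page/:start handling in the hunk above can be tried in isolation. The following is a minimal sketch, not the gem's code: `normalize_pagination` and `DEFAULT_PER_PAGE` are hypothetical names, and a plain-Ruby blank check stands in for ActiveSupport's `blank?` so the snippet runs without Rails.

```ruby
# Hypothetical standalone helper mirroring the :page/:start arithmetic
# shown in the diff; it is not part of bento_search.
DEFAULT_PER_PAGE = 10

def normalize_pagination(args)
  args = args.dup

  # Accept strings (for example from URL query params); drop blank values.
  [:page, :per_page, :start].each do |key|
    value = args[key]
    if value.nil? || value.to_s.strip.empty?
      args.delete(key)
    else
      args[key] = value.to_i
    end
  end
  args[:per_page] ||= DEFAULT_PER_PAGE

  # The two styles are mutually exclusive.
  if args[:start] && args[:page]
    raise ArgumentError, "Can't supply both :page and :start"
  end

  # :page is 1-based, :start is a 0-based record offset.
  if args[:page]
    args[:start] = (args[:page] - 1) * args[:per_page]
  elsif args[:start]
    args[:page] = (args[:start] / args[:per_page]) + 1
  end

  args
end

by_page  = normalize_pagination(page: "5", per_page: "20")
by_start = normalize_pagination(start: 20)
puts by_page[:start]   # offset derived from page 5 at 20 per page
puts by_start[:page]   # page derived from offset 20 at the default per_page
```

Integer division means a :start that is not a multiple of :per_page maps to the page containing that record.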
@@ -348,7 +348,7 @@ module BentoSearch
348
348
  # translate semantic fields, raising for unfound fields if configured
349
349
  arguments[:query].transform_keys! do |key|
350
350
  new_key = self.semantic_search_map[key.to_s] || key
351
-
351
+
352
352
  if ( config_arg(arguments, :unrecognized_search_field) == "raise" &&
353
353
  ! self.search_keys.include?(new_key))
354
354
  raise ArgumentError.new("#{self.class.name} does not know about search_field #{new_key}, in query Hash #{arguments[:query]}")
@@ -358,60 +358,60 @@ module BentoSearch
358
358
  end
359
359
 
360
360
  end
361
-
361
+
362
362
  # translate semantic_search_field to search_field, or raise if
363
- # can't.
363
+ # can't.
364
364
  if (semantic = arguments.delete(:semantic_search_field)) && ! semantic.blank?
365
365
  semantic = semantic.to_s
366
366
  # Legacy publication_title is now called source_title
367
367
  semantic = "source_title" if semantic == "publication_title"
368
368
 
369
369
  mapped = self.semantic_search_map[semantic]
370
- if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped
370
+ if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped
371
371
  raise ArgumentError.new("#{self.class.name} does not know about :semantic_search_field #{semantic}")
372
372
  end
373
373
  arguments[:search_field] = mapped
374
- end
374
+ end
375
375
  if config_arg(arguments, :unrecognized_search_field) == "raise" && ! search_keys.include?(arguments[:search_field])
376
376
  raise ArgumentError.new("#{self.class.name} does not know about :search_field #{arguments[:search_field]}")
377
377
  end
378
-
379
-
378
+
379
+
380
380
  return arguments
381
381
  end
382
382
  alias_method :parse_search_arguments, :normalized_search_arguments
383
-
384
-
385
- # Used mainly/only by the AJAX results loading.
383
+
384
+
385
+ # Used mainly/only by the AJAX results loading.
386
386
  # an array WHITELIST of attributes that can be sent as non-verified
387
387
  # request params and used to execute a search. For instance, 'auth' is
388
- # NOT on there, you can't trust a web request as to 'auth' status.
388
+ # NOT on there, you can't trust a web request as to 'auth' status.
389
389
  # individual engines may over-ride, call super, and add additional
390
- # engine-specific attributes.
390
+ # engine-specific attributes.
391
391
  def public_settable_search_args
392
392
  [:query, :search_field, :semantic_search_field, :sort, :page, :start, :per_page]
393
393
  end
394
-
395
-
394
+
395
+
396
396
  protected
397
397
 
398
398
  # get value of an arg that can be supplied in search args OR config,
399
399
  # with search_args overriding config. Also normalizes value to_s
400
- # (for symbols/strings).
400
+ # (for symbols/strings).
401
401
  def config_arg(arguments, key, default = nil)
402
402
  value = if arguments[key].present?
403
403
  arguments[key]
404
404
  else
405
405
  configuration[key]
406
406
  end
407
-
407
+
408
408
  value = value.to_s if value.kind_of? Symbol
409
-
409
+
410
410
  return value
411
411
  end
412
-
412
+
413
413
  # What exceptions should our #search wrapper rescue and turn
414
- # into failed results instead of fatal errors?
414
+ # into failed results instead of fatal errors?
415
415
  #
416
416
  # Can't rescue everything, or we eat VCR/webmock errors, and lots
417
417
  # of other errors we don't want to eat either, making
@@ -420,29 +420,29 @@ module BentoSearch
420
420
  #
421
421
  # This default list is probably useful already, but individual
422
422
  # engines can override if it's convenient for their own error
423
- # handling.
423
+ # handling.
424
424
  def auto_rescue_exceptions
425
- [TimeoutError, HTTPClient::TimeoutError,
425
+ [BentoSearch::RubyTimeoutClass, HTTPClient::TimeoutError,
426
426
  HTTPClient::ConfigurationError, HTTPClient::BadResponseError,
427
427
  MultiJson::DecodeError, Nokogiri::SyntaxError]
428
428
  end
429
-
430
-
429
+
430
+
431
431
  module ClassMethods
432
-
433
- # Over-ride returning a hash or Confstruct with
434
- # any configuration values you want by default.
432
+
433
+ # Over-ride returning a hash or Confstruct with
434
+ # any configuration values you want by default.
435
435
  # actual user-specified config values will be deep-merged
436
- # into the defaults.
436
+ # into the defaults.
437
437
  def default_configuration
438
438
  end
439
-
439
+
440
440
  # Over-ride returning an array of symbols for required
441
441
  # configuration keys.
442
442
  def required_configuration
443
443
  end
444
-
444
+
445
445
  end
446
-
446
+
447
447
  end
448
448
  end
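The only non-whitespace change in this diff replaces TimeoutError with BentoSearch::RubyTimeoutClass in auto_rescue_exceptions. The definition of that constant is not part of this diff. The sketch below shows one plausible shape for such a compatibility constant and how a splatted rescue list turns a timeout into a failed result. `CompatSketch`, `TIMEOUT_CLASS`, and `guarded` are hypothetical names for illustration only. The motivation assumed here is that the top-level TimeoutError alias was deprecated in later Ruby 2.x releases and is not defined on current Rubies, while Timeout::Error is.

```ruby
require "timeout"

# Hypothetical module for illustration; the real constant lives in
# BentoSearch and its definition is not shown in this diff.
module CompatSketch
  # Prefer Timeout::Error; fall back to the legacy top-level TimeoutError
  # only where Timeout::Error is not available.
  TIMEOUT_CLASS = defined?(::Timeout::Error) ? ::Timeout::Error : ::TimeoutError

  # List of exception classes to convert into failed results.
  def self.auto_rescue_exceptions
    [TIMEOUT_CLASS]
  end

  # Run the block; listed exceptions become a failed-result hash
  # instead of propagating to the caller.
  def self.guarded
    { ok: true, value: yield }
  rescue *auto_rescue_exceptions => e
    { ok: false, error: e }
  end
end

result = CompatSketch.guarded { Timeout.timeout(0.01) { sleep 1 } }
puts result[:ok]
puts result[:error].class
```

Exceptions not in the list still propagate, which matches the intent stated in the comments above of not swallowing unrelated errors.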