bento_search 1.5.0 → 1.6.0

@@ -9,18 +9,18 @@ require 'nokogiri'
9
9
 
10
10
  module BentoSearch
11
11
  # Usually raised by #get on an engine, when result for specified identifier
12
- # can't be found.
12
+ # can't be found.
13
13
  class ::BentoSearch::NotFound < ::BentoSearch::Error ; end
14
- # Usually raised by #get when identifier results in more than one record.
14
+ # Usually raised by #get when identifier results in more than one record.
15
15
  class ::BentoSearch::TooManyFound < ::BentoSearch::Error ; end
16
16
  # Raised for problem contacting or unexpected response from
17
- # remote service. Not yet universally used.
17
+ # remote service. Not yet universally used.
18
18
  class ::BentoSearch::FetchError < ::BentoSearch::Error ; end
19
19
 
20
-
21
- # Module mix-in for bento_search search engines.
20
+
21
+ # Module mix-in for bento_search search engines.
22
22
  #
23
- # ==Using a SearchEngine
23
+ # ==Using a SearchEngine
24
24
  #
25
25
  # See a whole bunch more examples in the project README.
26
26
  #
@@ -43,18 +43,18 @@ module BentoSearch
43
43
  # of BentoSearch::Results
44
44
  #
45
45
  # results = engine.search("query")
46
- #
47
- # See more docs under #search, as well as project README.
48
46
  #
49
- # == Standard configuration variables.
50
- #
47
+ # See more docs under #search, as well as project README.
48
+ #
49
+ # == Standard configuration variables.
50
+ #
51
51
  # Some engines require their own engine-specific configuration for api keys
52
52
  # and such, and offer their own engine-specific configuration for engine-specific
53
- # features.
53
+ # features.
54
54
  #
55
55
  # An additional semi-standard configuration variable, some engines take
56
56
  # an `:auth => true` to tell the engine to assume that all access is by
57
- # authenticated local users who should be given elevated access to results.
57
+ # authenticated local users who should be given elevated access to results.
58
58
  #
59
59
  # Additional standard configuration keys that are implemented by the bento_search
60
60
  # framework:
@@ -63,7 +63,7 @@ module BentoSearch
63
63
  # String name of decorator class that will be applied by #bento_decorate
64
64
  # helper in standard view. See wiki for more info on decorators. Must be
65
65
  # string name, actual class object not supported (to make it easier
66
- # to serialize and transport configuration).
66
+ # to serialize and transport configuration).
67
67
  #
68
68
  # == Implementing a SearchEngine
69
69
  #
@@ -71,7 +71,7 @@ module BentoSearch
71
71
  # generally only responsible for the parts specific to your search engine:
72
72
  # receiving a query, making a call to the external search engine, and
73
73
  # translating its result to a standard BentoSearch::Results full of
74
- # BentoSearch::ResultItems.
74
+ # BentoSearch::ResultItems.
75
75
  #
76
76
  # Start out by simply including the search engine module:
77
77
  #
@@ -85,64 +85,64 @@ module BentoSearch
85
85
  # BentoSearch::Results item.
86
86
  #
87
87
  # The Results object should have #total_items set with total hitcount, and
88
- # contain BentoSearch::ResultItem objects for each hit in the current page.
89
- # See individual class documentation for more info.
88
+ # contain BentoSearch::ResultItem objects for each hit in the current page.
89
+ # See individual class documentation for more info.
90
90
  #
91
91
  # That's about the extent of your responsibilities. If the search failed
92
92
  # for some reason due to an error, you should return a Results object
93
93
  # with its #error object set, so it will be `failed?`. The framework
94
94
  # will take care of this for you for certain uncaught exceptions you allow
95
95
  # to rise out of #search_implementation (timeouts, HTTPClient timeouts,
96
- # nokogiri and MultiJson parse errors).
96
+ # nokogiri and MultiJson parse errors).
97
97
  #
98
98
  # A SearchEngine object can be re-used for multiple searches, possibly
99
99
  # under concurrent multi-threading. Do not store search-specific state
100
100
  # in the search object, but you can store configuration-specific state there
101
- # of course.
102
- #
101
+ # of course.
102
+ #
103
103
  # Recommend use of HTTPClient, if possible, for http searches. Especially
104
104
  # using a class-level HTTPClient instance, to re-use persistent http
105
105
  # connections across searches (can be esp important if you need to contact
106
106
  # external search api via https/ssl).
107
107
  #
108
- # If you have required configuration keys, you can register that with
109
- # class-level required_configuration_keys method.
108
+ # If you have required configuration keys, you can register that with
109
+ # class-level required_configuration_keys method.
110
110
  #
111
- # You can also advertise max per-page value by overriding max_per_page.
111
+ # You can also advertise max per-page value by overriding max_per_page.
112
112
  #
113
- # If you support fielded searching, you should over-ride
113
+ # If you support fielded searching, you should over-ride
114
114
  # #search_field_definitions; if you support sorting, you should
115
115
  # override #sort_definitions. See BentoSearch::SearchEngine::Capabilities
116
- # module for documentation.
117
- #
116
+ # module for documentation.
117
+ #
118
118
  #
119
119
  module SearchEngine
120
120
  DefaultPerPage = 10
121
-
122
121
 
123
-
124
-
122
+
123
+
124
+
125
125
  extend ActiveSupport::Concern
126
-
126
+
127
127
  include Capabilities
128
-
128
+
129
129
  included do
130
- attr_accessor :configuration
130
+ attr_accessor :configuration
131
131
  end
132
-
132
+
133
133
  # If a specific SearchEngine defines initialize, it should call super, which
134
134
  # handles configuration loading, mostly. Argument is a
135
- # Confstruct::Configuration or Hash.
135
+ # Confstruct::Configuration or Hash.
136
136
  def initialize(aConfiguration = Confstruct::Configuration.new)
137
137
  # To work around weird confstruct bug, we need to change
138
- # a hash to a Confstruct ourselves.
138
+ # a hash to a Confstruct ourselves.
139
139
  # https://github.com/mbklein/confstruct/issues/14
140
140
  unless aConfiguration.kind_of? Confstruct::Configuration
141
141
  aConfiguration = Confstruct::Configuration.new aConfiguration
142
142
  end
143
-
144
-
145
- # init, from copy of default, or new
143
+
144
+
145
+ # init, from copy of default, or new
146
146
  if self.class.default_configuration
147
147
  self.configuration = Confstruct::Configuration.new(self.class.default_configuration)
148
148
  else
@@ -150,187 +150,187 @@ module BentoSearch
150
150
  end
151
151
  # merge in current instance config
152
152
  self.configuration.configure ( aConfiguration )
153
-
154
- # global defaults?
153
+
154
+ # global defaults?
155
155
  self.configuration[:for_display] ||= {}
156
-
156
+
157
157
  # check for required keys -- have to be present, and not nil
158
158
  if self.class.required_configuration
159
- self.class.required_configuration.each do |required_key|
159
+ self.class.required_configuration.each do |required_key|
160
160
  if ["**NOT_FOUND**", nil].include? self.configuration.lookup!(required_key.to_s, "**NOT_FOUND**")
161
161
  raise ArgumentError.new("#{self.class.name} requires configuration key #{required_key}")
162
162
  end
163
163
  end
164
164
  end
165
-
165
+
166
166
  end
167
-
168
-
169
- # Method used to actually get results from a search engine.
167
+
168
+
169
+ # Method used to actually get results from a search engine.
170
170
  #
171
171
  # When implementing a search engine, you do not override this #search
172
172
  # method, but instead override #search_implementation. #search will
173
173
  # call your specific #search_implementation, first normalizing the query
174
- # arguments, and then normalizing and adding standard metadata to your return value.
174
+ # arguments, and then normalizing and adding standard metadata to your return value.
175
175
  #
176
176
  # Most engines support pagination, sorting, and searching in a specific
177
- # field.
177
+ # field.
178
178
  #
179
179
  # # 1-based page index
180
180
  # engine.search("query", :per_page => 20, :page => 5)
181
181
  # # or use 0-based per-record index, engines that don't
182
- # # support this will round to nearest page.
182
+ # # support this will round to nearest page.
183
183
  # engine.search("query", :start => 20)
184
184
  #
185
185
  # You can ask an engine what search fields it supports with engine.search_keys
186
186
  # engine.search("query", :search_field => "engine_search_field_name")
187
187
  #
188
188
  # There are also normalized 'semantic' names you can use across engines
189
- # (if they support them): :title, :author, :subject, maybe more.
189
+ # (if they support them): :title, :author, :subject, maybe more.
190
190
  #
191
191
  # engine.search("query", :semantic_search_field => :title)
192
192
  #
193
193
  # Ask an engine what semantic field names it supports with `engine.semantic_search_keys`
194
194
  #
195
- # Unrecognized search fields will be ignored, unless you pass in
196
- # :unrecognized_search_field => :raise (or do same in config).
195
+ # Unrecognized search fields will be ignored, unless you pass in
196
+ # :unrecognized_search_field => :raise (or do same in config).
197
197
  #
198
198
  # Ask an engine what sort fields it supports with `engine.sort_keys`. See
199
199
  # list of standard sort keys in I18n file at ./config/locales/en.yml, in
200
- # `en.bento_search.sort_keys`.
200
+ # `en.bento_search.sort_keys`.
201
201
  #
202
202
  # engine.search("query", :sort => "some_sort_key")
203
203
  #
204
204
  # Some engines support additional arguments to 'search', see individual
205
205
  # engine documentation. For instance, some engines support `:auth => true`
206
206
  # to give the user elevated search privileges when you have an authenticated
207
- # local user.
207
+ # local user.
208
208
  #
209
209
  # Query as first arg is just a convenience, you can also use a single hash
210
- # argument.
210
+ # argument.
211
211
  #
212
212
  # engine.search(:query => "query", :per_page => 20, :page => 4)
213
213
  #
214
214
  def search(*arguments)
215
215
  start_t = Time.now
216
-
216
+
217
217
  arguments = normalized_search_arguments(*arguments)
218
218
 
219
219
  results = search_implementation(arguments)
220
-
220
+
221
221
  fill_in_search_metadata_for(results, arguments)
222
-
222
+
223
223
  results.timing = (Time.now - start_t)
224
-
224
+
225
225
  return results
226
226
  rescue *auto_rescue_exceptions => e
227
227
  # Uncaught exception, log and turn into failed Results object. We
228
228
  # only catch certain types of exceptions, or it makes dev really
229
229
  # confusing eating exceptions. This is intentionally a convenience
230
230
  # to allow search engine implementations to just raise the exception
231
- # and we'll turn it into a proper error.
231
+ # and we'll turn it into a proper error.
232
232
  cleaned_backtrace = Rails.backtrace_cleaner.clean(e.backtrace)
233
233
  log_msg = "BentoSearch::SearchEngine failed results: #{e.inspect}\n #{cleaned_backtrace.join("\n ")}"
234
234
  Rails.logger.error log_msg
235
-
235
+
236
236
  failed = BentoSearch::Results.new
237
237
  failed.error ||= {}
238
238
  failed.error[:exception] = e
239
-
239
+
240
240
  failed.timing = (Time.now - start_t)
241
-
241
+
242
242
  fill_in_search_metadata_for(failed, arguments)
243
243
 
244
-
244
+
245
245
  return failed
246
246
  end
247
-
247
+
248
248
  # SOME of the elements of Results to be returned that SearchEngine implementation
249
249
  # fills in automatically post-search. Extracted into a method for DRY in
250
250
  # error handling to try to fill these in even in errors. Also can be used
251
- # as public method for de-serialized or mock results.
251
+ # as public method for de-serialized or mock results.
252
252
  def fill_in_search_metadata_for(results, normalized_arguments = {})
253
253
  results.search_args = normalized_arguments
254
254
  results.start = normalized_arguments[:start] || 0
255
255
  results.per_page = normalized_arguments[:per_page]
256
-
256
+
257
257
  results.engine_id = configuration.id
258
258
  results.display_configuration = configuration.for_display
259
259
 
260
260
  # We copy some configuration info over to each Item, as a convenience
261
261
  # to display logic that may have to decide what to do given only an item,
262
262
  # and may want to parameterize based on configuration.
263
- results.each do |item|
264
- item.engine_id = configuration.id
263
+ results.each do |item|
264
+ item.engine_id = configuration.id
265
265
  item.decorator = configuration.lookup!("for_display.decorator")
266
266
  item.display_configuration = configuration.for_display
267
267
  end
268
268
 
269
269
  results
270
270
  end
271
-
271
+
272
272
 
273
273
  # Take the arguments passed into #search, which can be flexibly given
274
274
  # in several ways, and normalize to an expected single hash that
275
275
  # will be passed to an engine's #search_implementation. The output
276
276
  # of this method is a single hash, and is what a #search_implementation
277
- # can expect to receive as an argument, with keys:
277
+ # can expect to receive as an argument, with keys:
278
278
  #
279
279
  # [:query] the query
280
280
  # [:per_page] will _always_ be present, using the default per_page if
281
281
  # none given by caller
282
282
  # [:start, :page] both :start and :page will _always_ be present, regardless
283
283
  # of which the caller used. They will both be integers, even if strings passed in.
284
- # [:search_field] A search field from the engine's #search_field_definitions, as string.
284
+ # [:search_field] A search field from the engine's #search_field_definitions, as string.
285
285
  # Even if the caller used :semantic_search_field, it'll be normalized
286
- # to the actual local search_field key on output.
287
- # [:sort] Sort key.
286
+ # to the actual local search_field key on output.
287
+ # [:sort] Sort key.
288
288
  #
289
289
  def normalized_search_arguments(*orig_arguments)
290
290
  arguments = {}
291
-
291
+
292
292
  # Two-arg style to one hash, if present
293
293
  if (orig_arguments.length > 1 ||
294
294
  (orig_arguments.length == 1 && ! orig_arguments.first.kind_of?(Hash)))
295
- arguments[:query] = orig_arguments.delete_at(0)
295
+ arguments[:query] = orig_arguments.delete_at(0)
296
296
  end
297
297
 
298
298
  arguments.merge!(orig_arguments.first) if orig_arguments.length > 0
299
-
300
-
299
+
300
+
301
301
  # allow strings for pagination (like from url query), change to
302
- # int please.
302
+ # int please.
303
303
  [:page, :per_page, :start].each do |key|
304
304
  arguments.delete(key) if arguments[key].blank?
305
305
  arguments[key] = arguments[key].to_i if arguments[key]
306
- end
306
+ end
307
307
  arguments[:per_page] ||= DefaultPerPage
308
-
309
- # illegal arguments
308
+
309
+ # illegal arguments
310
310
  if (arguments[:start] && arguments[:page])
311
311
  raise ArgumentError.new("Can't supply both :page and :start")
312
312
  end
313
- if ( arguments[:per_page] &&
314
- self.max_per_page &&
313
+ if ( arguments[:per_page] &&
314
+ self.max_per_page &&
315
315
  arguments[:per_page] > self.max_per_page)
316
316
  raise ArgumentError.new("#{arguments[:per_page]} is more than maximum :per_page of #{self.max_per_page} for #{self.class}")
317
317
  end
318
-
319
-
318
+
319
+
320
320
  # Normalize :page to :start, and vice versa
321
321
  if arguments[:page]
322
322
  arguments[:start] = (arguments[:page] - 1) * arguments[:per_page]
323
323
  elsif arguments[:start]
324
324
  arguments[:page] = (arguments[:start] / arguments[:per_page]) + 1
325
325
  end
326
-
326
+
327
327
  # normalize :sort from possibly symbol to string
328
328
  # TODO: raise if unrecognized sort key?
329
329
  if arguments[:sort]
330
330
  arguments[:sort] = arguments[:sort].to_s
331
331
  end
332
332
 
333
-
333
+
334
334
  # Multi-field search
335
335
  if arguments[:query].kind_of? Hash
336
336
  # Only if allowed
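The pagination handling in the hunk above (string-to-integer coercion, the default per_page, and the :page/:start conversion) can be sketched in isolation. This is a minimal standalone illustration, not the gem's code: the module name `PagingSketch` and the method `normalize_paging` are hypothetical, and the blank check is hand-rolled so the example runs without ActiveSupport.

```ruby
# Hypothetical standalone sketch of the :page / :start normalization.
module PagingSketch
  DEFAULT_PER_PAGE = 10 # mirrors DefaultPerPage in the diff

  def self.normalize_paging(arguments)
    args = arguments.dup

    # Accept strings (for example from URL query params), drop blanks,
    # and convert whatever remains to integers.
    [:page, :per_page, :start].each do |key|
      value = args[key]
      if value.nil? || value.to_s.strip.empty?
        args.delete(key)
      else
        args[key] = value.to_i
      end
    end
    args[:per_page] ||= DEFAULT_PER_PAGE

    if args[:start] && args[:page]
      raise ArgumentError, "Can't supply both :page and :start"
    end

    # :page is 1-based, :start is a 0-based record offset.
    if args[:page]
      args[:start] = (args[:page] - 1) * args[:per_page]
    elsif args[:start]
      args[:page] = (args[:start] / args[:per_page]) + 1
    end

    args
  end
end
```

With `:per_page => "20", :page => "5"` the sketch yields a `:start` of 80. With only `:start => 25` and the default per_page of 10, integer division places the offset on page 3.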
@@ -348,7 +348,7 @@ module BentoSearch
348
348
  # translate semantic fields, raising for unfound fields if configured
349
349
  arguments[:query].transform_keys! do |key|
350
350
  new_key = self.semantic_search_map[key.to_s] || key
351
-
351
+
352
352
  if ( config_arg(arguments, :unrecognized_search_field) == "raise" &&
353
353
  ! self.search_keys.include?(new_key))
354
354
  raise ArgumentError.new("#{self.class.name} does not know about search_field #{new_key}, in query Hash #{arguments[:query]}")
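The semantic-key translation in the hunk above can be illustrated without the gem. This is a hedged sketch: the map contents, the module name `SemanticFieldSketch`, and the `unrecognized:` keyword are all hypothetical, and it assumes the elided remainder of the block simply returns the translated key.

```ruby
# Hypothetical standalone sketch of translating semantic field names
# to engine-local search field names in a multi-field query Hash.
module SemanticFieldSketch
  # Assumed example mapping; real engines define their own.
  SEMANTIC_SEARCH_MAP = { "title" => "ti", "author" => "au" }.freeze
  SEARCH_KEYS = SEMANTIC_SEARCH_MAP.values.freeze

  def self.translate_query_keys(query, unrecognized: "ignore")
    query.each_with_object({}) do |(key, value), out|
      # Fall back to the original key when no semantic mapping exists.
      new_key = SEMANTIC_SEARCH_MAP[key.to_s] || key

      if unrecognized.to_s == "raise" && !SEARCH_KEYS.include?(new_key)
        raise ArgumentError, "does not know about search_field #{new_key}"
      end

      out[new_key] = value
    end
  end
end
```

Unknown keys pass through untouched unless the caller opts in to raising, which matches the "ignored unless :unrecognized_search_field => :raise" behavior described in the #search comments.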
@@ -358,60 +358,60 @@ module BentoSearch
358
358
  end
359
359
 
360
360
  end
361
-
361
+
362
362
  # translate semantic_search_field to search_field, or raise if
363
- # can't.
363
+ # can't.
364
364
  if (semantic = arguments.delete(:semantic_search_field)) && ! semantic.blank?
365
365
  semantic = semantic.to_s
366
366
  # Legacy publication_title is now called source_title
367
367
  semantic = "source_title" if semantic == "publication_title"
368
368
 
369
369
  mapped = self.semantic_search_map[semantic]
370
- if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped
370
+ if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped
371
371
  raise ArgumentError.new("#{self.class.name} does not know about :semantic_search_field #{semantic}")
372
372
  end
373
373
  arguments[:search_field] = mapped
374
- end
374
+ end
375
375
  if config_arg(arguments, :unrecognized_search_field) == "raise" && ! search_keys.include?(arguments[:search_field])
376
376
  raise ArgumentError.new("#{self.class.name} does not know about :search_field #{arguments[:search_field]}")
377
377
  end
378
-
379
-
378
+
379
+
380
380
  return arguments
381
381
  end
382
382
  alias_method :parse_search_arguments, :normalized_search_arguments
383
-
384
-
385
- # Used mainly/only by the AJAX results loading.
383
+
384
+
385
+ # Used mainly/only by the AJAX results loading.
386
386
  # an array WHITELIST of attributes that can be sent as non-verified
387
387
  # request params and used to execute a search. For instance, 'auth' is
388
- # NOT on there, you can't trust a web request as to 'auth' status.
388
+ # NOT on there, you can't trust a web request as to 'auth' status.
389
389
  # individual engines may over-ride, call super, and add additional
390
- # engine-specific attributes.
390
+ # engine-specific attributes.
391
391
  def public_settable_search_args
392
392
  [:query, :search_field, :semantic_search_field, :sort, :page, :start, :per_page]
393
393
  end
394
-
395
-
394
+
395
+
396
396
  protected
397
397
 
398
398
  # get value of an arg that can be supplied in search args OR config,
399
399
  # with search_args overriding config. Also normalizes value to_s
400
- # (for symbols/strings).
400
+ # (for symbols/strings).
401
401
  def config_arg(arguments, key, default = nil)
402
402
  value = if arguments[key].present?
403
403
  arguments[key]
404
404
  else
405
405
  configuration[key]
406
406
  end
407
-
407
+
408
408
  value = value.to_s if value.kind_of? Symbol
409
-
409
+
410
410
  return value
411
411
  end
412
-
412
+
413
413
  # What exceptions should our #search wrapper rescue and turn
414
- # into failed results instead of fatal errors?
414
+ # into failed results instead of fatal errors?
415
415
  #
416
416
  # Can't rescue everything, or we eat VCR/webmock errors, and lots
417
417
  # of other errors we don't want to eat either, making
@@ -420,29 +420,29 @@ module BentoSearch
420
420
  #
421
421
  # This default list is probably useful already, but individual
422
422
  # engines can override if it's convenient for their own error
423
- # handling.
423
+ # handling.
424
424
  def auto_rescue_exceptions
425
- [TimeoutError, HTTPClient::TimeoutError,
425
+ [BentoSearch::RubyTimeoutClass, HTTPClient::TimeoutError,
426
426
  HTTPClient::ConfigurationError, HTTPClient::BadResponseError,
427
427
  MultiJson::DecodeError, Nokogiri::SyntaxError]
428
428
  end
429
-
430
-
429
+
430
+
431
431
  module ClassMethods
432
-
433
- # Over-ride returning a hash or Confstruct with
434
- # any configuration values you want by default.
432
+
433
+ # Over-ride returning a hash or Confstruct with
434
+ # any configuration values you want by default.
435
435
  # actual user-specified config values will be deep-merged
436
- # into the defaults.
436
+ # into the defaults.
437
437
  def default_configuration
438
438
  end
439
-
439
+
440
440
  # Over-ride returning an array of symbols for required
441
441
  # configuration keys.
442
442
  def required_configuration
443
443
  end
444
-
444
+
445
445
  end
446
-
446
+
447
447
  end
448
448
  end
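The only functional change in this diff is that auto_rescue_exceptions now lists BentoSearch::RubyTimeoutClass instead of the bare TimeoutError. The diff does not show where that constant is defined. The sketch below assumes it is a compatibility alias that prefers Timeout::Error, since newer Rubies deprecate the top-level TimeoutError constant. The module `TimeoutCompatSketch`, the `FailedResult` struct, and the `guarded` method are hypothetical names used only for illustration.

```ruby
require 'timeout'

# Hypothetical sketch: a version-tolerant timeout constant, plus the
# "rescue a known list and return a failed result" pattern from #search.
module TimeoutCompatSketch
  # Assumed shape of BentoSearch::RubyTimeoutClass (not shown in the diff).
  RubyTimeoutClass = defined?(::Timeout::Error) ? ::Timeout::Error : ::TimeoutError

  # Stand-in for a BentoSearch::Results with its #error set.
  FailedResult = Struct.new(:error)

  def self.auto_rescue_exceptions
    [RubyTimeoutClass]
  end

  # Run the block; convert only the listed exceptions into a failed
  # result, and let everything else propagate to the caller.
  def self.guarded
    yield
  rescue *auto_rescue_exceptions => e
    FailedResult.new({ :exception => e })
  end
end
```

Rescuing a splatted list keeps the set of swallowed exceptions explicit, so unrelated errors (for example from test doubles) still surface during development.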