chewy 7.6.0 → 8.0.0

Files changed (73)
  1. checksums.yaml +4 -4
  2. data/.github/CODEOWNERS +1 -1
  3. data/.github/dependabot.yml +2 -2
  4. data/.github/workflows/ruby.yml +11 -10
  5. data/.rubocop.yml +1 -1
  6. data/.rubocop_todo.yml +132 -39
  7. data/CHANGELOG.md +18 -1
  8. data/CONTRIBUTING.md +1 -1
  9. data/LICENSE.txt +1 -1
  10. data/README.md +50 -1125
  11. data/chewy.gemspec +3 -2
  12. data/docker-compose.yml +14 -0
  13. data/docs/README.md +16 -0
  14. data/docs/configuration.md +440 -0
  15. data/docs/import.md +122 -0
  16. data/docs/indexing.md +329 -0
  17. data/docs/querying.md +72 -0
  18. data/docs/rake_tasks.md +108 -0
  19. data/docs/testing.md +41 -0
  20. data/docs/troubleshooting.md +101 -0
  21. data/gemfiles/base.gemfile +3 -3
  22. data/gemfiles/{rails.6.1.activerecord.gemfile → rails.7.2.activerecord.gemfile} +3 -3
  23. data/gemfiles/{rails.7.0.activerecord.gemfile → rails.8.0.activerecord.gemfile} +3 -3
  24. data/lib/chewy/config.rb +2 -2
  25. data/lib/chewy/errors.rb +3 -0
  26. data/lib/chewy/fields/root.rb +1 -1
  27. data/lib/chewy/index/actions.rb +5 -5
  28. data/lib/chewy/index/aliases.rb +1 -1
  29. data/lib/chewy/index/syncer.rb +5 -5
  30. data/lib/chewy/minitest/helpers.rb +1 -1
  31. data/lib/chewy/search/request.rb +4 -4
  32. data/lib/chewy/search/response.rb +7 -0
  33. data/lib/chewy/search/scrolling.rb +2 -1
  34. data/lib/chewy/strategy/delayed_sidekiq/worker.rb +1 -1
  35. data/lib/chewy/version.rb +1 -1
  36. data/lib/chewy.rb +4 -0
  37. data/migration_guide.md +1 -1
  38. data/spec/chewy/config_spec.rb +13 -14
  39. data/spec/chewy/elastic_client_spec.rb +1 -1
  40. data/spec/chewy/fields/base_spec.rb +2 -2
  41. data/spec/chewy/fields/time_fields_spec.rb +1 -1
  42. data/spec/chewy/index/actions_spec.rb +9 -70
  43. data/spec/chewy/index/aliases_spec.rb +1 -1
  44. data/spec/chewy/index/import/bulk_builder_spec.rb +2 -2
  45. data/spec/chewy/index/import/bulk_request_spec.rb +1 -1
  46. data/spec/chewy/index/import/routine_spec.rb +1 -1
  47. data/spec/chewy/index/import_spec.rb +15 -15
  48. data/spec/chewy/index/observe/callback_spec.rb +1 -1
  49. data/spec/chewy/index/specification_spec.rb +1 -4
  50. data/spec/chewy/index/syncer_spec.rb +1 -1
  51. data/spec/chewy/index_spec.rb +1 -1
  52. data/spec/chewy/journal_spec.rb +2 -2
  53. data/spec/chewy/minitest/helpers_spec.rb +2 -6
  54. data/spec/chewy/multi_search_spec.rb +1 -1
  55. data/spec/chewy/rake_helper_spec.rb +1 -1
  56. data/spec/chewy/repository_spec.rb +4 -4
  57. data/spec/chewy/rspec/update_index_spec.rb +2 -2
  58. data/spec/chewy/runtime_spec.rb +2 -2
  59. data/spec/chewy/search/loader_spec.rb +1 -1
  60. data/spec/chewy/search/pagination/kaminari_examples.rb +1 -1
  61. data/spec/chewy/search/query_proxy_spec.rb +0 -24
  62. data/spec/chewy/search/request_spec.rb +7 -3
  63. data/spec/chewy/search/response_spec.rb +2 -24
  64. data/spec/chewy/search/scrolling_spec.rb +1 -1
  65. data/spec/chewy/search_spec.rb +1 -1
  66. data/spec/chewy/stash_spec.rb +1 -1
  67. data/spec/chewy/strategy/delayed_sidekiq_spec.rb +27 -10
  68. data/spec/chewy/strategy_spec.rb +1 -1
  69. data/spec/chewy_spec.rb +5 -22
  70. data/spec/spec_helper.rb +26 -0
  71. data/spec/support/active_record.rb +35 -4
  72. metadata +22 -17
  73. data/gemfiles/rails.7.1.activerecord.gemfile +0 -14
data/README.md CHANGED
@@ -13,7 +13,7 @@ In this section we'll cover why you might want to use Chewy instead of the offic
 
  * Every index is observable by all the related models.
 
- Most of the indexed models are related to other and sometimes it is necessary to denormalize this related data and put at the same object. For example, you need to index an array of tags together with an article. Chewy allows you to specify an updateable index for every model separately - so corresponding articles will be reindexed on any tag update.
+ Most of the indexed models are related to each other and sometimes it is necessary to denormalize this related data and put it in the same object. For example, you need to index an array of tags together with an article. Chewy allows you to specify an updateable index for every model separately - so corresponding articles will be reindexed on any tag update.
 
  * Bulk import everywhere.
 
@@ -25,6 +25,8 @@ In this section we'll cover why you might want to use Chewy instead of the offic
 
  * Support for ActiveRecord.
 
+ Chewy provides out-of-the-box integration with ActiveRecord, including automatic index updates on model changes via `update_index` callbacks.
+
  ## Installation
 
  Add this line to your application's `Gemfile`:
@@ -41,16 +43,17 @@ Or install it yourself as:
 
  ## Compatibility
 
- ### Ruby
+ Chewy aims to support all Ruby and Rails versions that are currently maintained by their respective teams. When a version reaches end-of-life, we may drop support for it in a future release.
 
- Chewy is compatible with MRI 3.0-3.2¹.
+ ### Ruby
 
- > ¹ Ruby 3 is only supported with Rails 6.1
+ Chewy is compatible with MRI 3.2-3.4.
 
  ### Elasticsearch compatibility matrix
 
  | Chewy version | Elasticsearch version |
  | ------------- | ---------------------------------- |
+ | 8.0.0 | 8.x |
  | 7.2.x | 7.x |
  | 7.1.x | 7.x |
  | 7.0.x | 6.8, 7.x |
@@ -66,7 +69,10 @@ various Chewy versions.
 
  ### Active Record
 
- 5.2, 6.0, 6.1 Active Record versions are supported by all Chewy versions.
+ The following Active Record versions are supported by Chewy:
+
+ - 7.2
+ - 8.0
 
  ## Getting Started
 
@@ -97,7 +103,36 @@ development:
  Make sure you have Elasticsearch up and running. You can [install](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) it locally, but the easiest way is to use [Docker](https://www.docker.com/get-started):
 
  ```shell
- $ docker run --rm --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.11.1
+ $ docker run --rm --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.15.0
+ ```
+
+ ### Security
+
+ Please note that starting from version 8 Elasticsearch has security features enabled by default.
+ Docker command above has it disabled for local testing convenience. If you want to enable it, omit
+ `"xpack.security.enabled=false"` part from Docker command, and run these command after starting container (container name `es8` assumed):
+
+ Reset password for `elastic` user:
+ ```
+ docker container exec es8 '/usr/share/elasticsearch/bin/elasticsearch-reset-password' -u elastic
+ ```
+
+ Extract CA certificate generated by Elasticsearch on first run:
+ ```
+ docker container cp es8:/usr/share/elasticsearch/config/certs/http_ca.crt tmp/
+ ```
+
+ And then add them to settings:
+
+ ```yaml
+ # config/chewy.yml
+ development:
+   host: 'localhost:9200'
+   user: 'elastic'
+   password: 'SomeLongPassword'
+   transport_options:
+     ssl:
+       ca_file: './tmp/http_ca.crt'
  ```
 
  ### Index
@@ -199,1125 +234,15 @@ end
  ]
  ```
 
- ## Usage and configuration
-
- ### Client settings
-
- To configure the Chewy client you need to add `chewy.rb` file with `Chewy.settings` hash:
-
- ```ruby
- # config/initializers/chewy.rb
- Chewy.settings = {host: 'localhost:9250'} # do not use environments
- ```
-
- And add `chewy.yml` configuration file.
-
- You can create `chewy.yml` manually or run `rails g chewy:install` to generate it:
-
- ```yaml
- # config/chewy.yml
- # separate environment configs
- test:
-   host: 'localhost:9250'
-   prefix: 'test'
- development:
-   host: 'localhost:9200'
- ```
-
- The resulting config merges both hashes. Client options are passed as is to `Elasticsearch::Transport::Client` except for the `:prefix`, which is used internally by Chewy to create prefixed index names:
-
- ```ruby
- Chewy.settings = {prefix: 'test'}
- UsersIndex.index_name # => 'test_users'
- ```
-
- The logger may be set explicitly:
-
- ```ruby
- Chewy.logger = Logger.new(STDOUT)
- ```
-
- See [config.rb](lib/chewy/config.rb) for more details.
-
- #### AWS Elasticsearch
-
- If you would like to use AWS's Elasticsearch using an IAM user policy, you will need to sign your requests for the `es:*` action by injecting the appropriate headers passing a proc to `transport_options`.
- You'll need an additional gem for Faraday middleware: add `gem 'faraday_middleware-aws-sigv4'` to your Gemfile.
-
- ```ruby
- require 'faraday_middleware/aws_sigv4'
-
- Chewy.settings = {
-   host: 'http://my-es-instance-on-aws.us-east-1.es.amazonaws.com:80',
-   port: 80, # 443 for https host
-   transport_options: {
-     headers: { content_type: 'application/json' },
-     proc: -> (f) do
-       f.request :aws_sigv4,
-         service: 'es',
-         region: 'us-east-1',
-         access_key_id: ENV['AWS_ACCESS_KEY'],
-         secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
-     end
-   }
- }
- ```
-
- #### Index definition
-
- 1. Create `/app/chewy/users_index.rb`
-
- ```ruby
- class UsersIndex < Chewy::Index
-
- end
- ```
-
- 2. Define index scope (you can omit this part if you don't need to specify a scope (i.e. use PORO objects for import) or options)
-
- ```ruby
- class UsersIndex < Chewy::Index
-   index_scope User.active # or just model instead_of scope: index_scope User
- end
- ```
-
- 3. Add some mappings
-
- ```ruby
- class UsersIndex < Chewy::Index
-   index_scope User.active.includes(:country, :badges, :projects)
-   field :first_name, :last_name # multiple fields without additional options
-   field :email, analyzer: 'email' # Elasticsearch-related options
-   field :country, value: ->(user) { user.country.name } # custom value proc
-   field :badges, value: ->(user) { user.badges.map(&:name) } # passing array values to index
-   field :projects do # the same block syntax for multi_field, if `:type` is specified
-     field :title
-     field :description # default data type is `text`
-     # additional top-level objects passed to value proc:
-     field :categories, value: ->(project, user) { project.categories.map(&:name) if user.active? }
-   end
-   field :rating, type: 'integer' # custom data type
-   field :created, type: 'date', include_in_all: false,
-     value: ->{ created_at } # value proc for source object context
- end
- ```
-
- [See here for mapping definitions](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html).
-
- 4. Add some index-related settings. Analyzer repositories might be used as well. See `Chewy::Index.settings` docs for details:
-
- ```ruby
- class UsersIndex < Chewy::Index
-   settings analysis: {
-     analyzer: {
-       email: {
-         tokenizer: 'keyword',
-         filter: ['lowercase']
-       }
-     }
-   }
-
-   index_scope User.active.includes(:country, :badges, :projects)
-   root date_detection: false do
-     template 'about_translations.*', type: 'text', analyzer: 'standard'
-
-     field :first_name, :last_name
-     field :email, analyzer: 'email'
-     field :country, value: ->(user) { user.country.name }
-     field :badges, value: ->(user) { user.badges.map(&:name) }
-     field :projects do
-       field :title
-       field :description
-     end
-     field :about_translations, type: 'object' # pass object type explicitly if necessary
-     field :rating, type: 'integer'
-     field :created, type: 'date', include_in_all: false,
-       value: ->{ created_at }
-   end
- end
- ```
-
- [See index settings here](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html).
- [See root object settings here](https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html).
-
- See [mapping.rb](lib/chewy/index/mapping.rb) for more details.
-
- 5. Add model-observing code
-
- ```ruby
- class User < ActiveRecord::Base
-   update_index('users') { self } # specifying index and back-reference
-                                  # for updating after user save or destroy
- end
-
- class Country < ActiveRecord::Base
-   has_many :users
-
-   update_index('users') { users } # return single object or collection
- end
-
- class Project < ActiveRecord::Base
-   update_index('users') { user if user.active? } # you can return even `nil` from the back-reference
- end
-
- class Book < ActiveRecord::Base
-   update_index(->(book) {"books_#{book.language}"}) { self } # dynamic index name with proc.
-                                                              # For book with language == "en"
-                                                              # this code will generate `books_en`
- end
- ```
-
- Also, you can use the second argument for method name passing:
-
- ```ruby
- update_index('users', :self)
- update_index('users', :users)
- ```
-
- In the case of a belongs_to association you may need to update both associated objects, previous and current:
-
- ```ruby
- class City < ActiveRecord::Base
-   belongs_to :country
-
-   update_index('cities') { self }
-   update_index 'countries' do
-     previous_changes['country_id'] || country
-   end
- end
- ```
-
- ### Default import options
-
- Every index has `default_import_options` configuration to specify, suddenly, default import options:
-
- ```ruby
- class ProductsIndex < Chewy::Index
-   index_scope Post.includes(:tags)
-   default_import_options batch_size: 100, bulk_size: 10.megabytes, refresh: false
-
-   field :name
-   field :tags, value: -> { tags.map(&:name) }
- end
- ```
-
- See [import.rb](lib/chewy/index/import.rb) for available options.
-
- ### Multi (nested) and object field types
-
- To define an objects field you can simply nest fields in the DSL:
-
- ```ruby
- field :projects do
-   field :title
-   field :description
- end
- ```
-
- This will automatically set the type or root field to `object`. You may also specify `type: 'objects'` explicitly.
-
- To define a multi field you have to specify any type except for `object` or `nested` in the root field:
-
- ```ruby
- field :full_name, type: 'text', value: ->{ full_name.strip } do
-   field :ordered, analyzer: 'ordered'
-   field :untouched, type: 'keyword'
- end
- ```
-
- The `value:` option for internal fields will no longer be effective.
-
- ### Geo Point fields
-
- You can use [Elasticsearch's geo mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html) with the `geo_point` field type, allowing you to query, filter and order by latitude and longitude. You can use the following hash format:
-
- ```ruby
- field :coordinates, type: 'geo_point', value: ->{ {lat: latitude, lon: longitude} }
- ```
-
- or by using nested fields:
-
- ```ruby
- field :coordinates, type: 'geo_point' do
-   field :lat, value: ->{ latitude }
-   field :long, value: ->{ longitude }
- end
- ```
-
- See the section on *Script fields* for details on calculating distance in a search.
-
- ### Join fields
-
- You can use a [join field](https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html)
- to implement parent-child relationships between documents.
- It [replaces the old `parent_id` based parent-child mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#parent-child-mapping-types)
-
- To use it, you need to pass `relations` and `join` (with `type` and `id`) options:
- ```ruby
- field :hierarchy_link, type: :join, relations: {question: %i[answer comment], answer: :vote, vote: :subvote}, join: {type: :comment_type, id: :commented_id}
- ```
- assuming you have `comment_type` and `commented_id` fields in your model.
-
- Note that when you reindex a parent, its children and grandchildren will be reindexed as well.
- This may require additional queries to the primary database and to elastisearch.
-
- Also note that the join field doesn't support crutches (it should be a field directly defined on the model).
-
- ### Crutches™ technology
-
- Assume you are defining your index like this (product has_many categories through product_categories):
-
- ```ruby
- class ProductsIndex < Chewy::Index
-   index_scope Product.includes(:categories)
-   field :name
-   field :category_names, value: ->(product) { product.categories.map(&:name) } # or shorter just -> { categories.map(&:name) }
- end
- ```
-
- Then the Chewy reindexing flow will look like the following pseudo-code:
-
- ```ruby
- Product.includes(:categories).find_in_batches(1000) do |batch|
-   bulk_body = batch.map do |object|
-     {name: object.name, category_names: object.categories.map(&:name)}.to_json
-   end
-   # here we are sending every batch of data to ES
-   Chewy.client.bulk bulk_body
- end
- ```
-
- If you meet complicated cases when associations are not applicable you can replace Rails associations with Chewy Crutches™ technology:
-
- ```ruby
- class ProductsIndex < Chewy::Index
-   index_scope Product
-   crutch :categories do |collection| # collection here is a current batch of products
-     # data is fetched with a lightweight query without objects initialization
-     data = ProductCategory.joins(:category).where(product_id: collection.map(&:id)).pluck(:product_id, 'categories.name')
-     # then we have to convert fetched data to appropriate format
-     # this will return our data in structure like:
-     # {123 => ['sweets', 'juices'], 456 => ['meat']}
-     data.each.with_object({}) { |(id, name), result| (result[id] ||= []).push(name) }
-   end
-
-   field :name
-   # simply use crutch-fetched data as a value:
-   field :category_names, value: ->(product, crutches) { crutches[:categories][product.id] }
- end
- ```
-
- An example flow will look like this:
-
- ```ruby
- Product.includes(:categories).find_in_batches(1000) do |batch|
-   crutches[:categories] = ProductCategory.joins(:category).where(product_id: batch.map(&:id)).pluck(:product_id, 'categories.name')
-     .each.with_object({}) { |(id, name), result| (result[id] ||= []).push(name) }
-
-   bulk_body = batch.map do |object|
-     {name: object.name, category_names: crutches[:categories][object.id]}.to_json
-   end
-   Chewy.client.bulk bulk_body
- end
- ```
-
- So Chewy Crutches™ technology is able to increase your indexing performance in some cases up to a hundredfold or even more depending on your associations complexity.
-
- ### Witchcraft™ technology
-
- One more experimental technology to increase import performance. As far as you know, chewy defines value proc for every imported field in mapping, so at the import time each of these procs is executed on imported object to extract result document to import. It would be great for performance to use one huge whole-document-returning proc instead. So basically the idea or Witchcraft™ technology is to compile a single document-returning proc from the index definition.
-
- ```ruby
- index_scope Product
- witchcraft!
-
- field :title
- field :tags, value: -> { tags.map(&:name) }
- field :categories do
-   field :name, value: -> (product, category) { category.name }
-   field :type, value: -> (product, category, crutch) { crutch.types[category.name] }
- end
- ```
-
- The index definition above will be compiled to something close to:
-
- ```ruby
- -> (object, crutches) do
-   {
-     title: object.title,
-     tags: object.tags.map(&:name),
-     categories: object.categories.map do |object2|
-       {
-         name: object2.name
-         type: crutches.types[object2.name]
-       }
-     end
-   }
- end
- ```
-
- And don't even ask how is it possible, it is a witchcraft.
- Obviously not every type of definition might be compiled. There are some restrictions:
-
- 1. Use reasonable formatting to make `method_source` be able to extract field value proc sources.
- 2. Value procs with splat arguments are not supported right now.
- 3. If you are generating fields dynamically use value proc with arguments, argumentless value procs are not supported yet:
-
- ```ruby
- [:first_name, :last_name].each do |name|
-   field name, value: -> (o) { o.send(name) }
- end
- ```
-
- However, it is quite possible that your index definition will be supported by Witchcraft™ technology out of the box in most of the cases.
-
- ### Raw Import
-
- Another way to speed up import time is Raw Imports. This technology is only available in ActiveRecord adapter. Very often, ActiveRecord model instantiation is what consumes most of the CPU and RAM resources. Precious time is wasted on converting, say, timestamps from strings and then serializing them back to strings. Chewy can operate on raw hashes of data directly obtained from the database. All you need is to provide a way to convert that hash to a lightweight object that mimics the behaviour of the normal ActiveRecord object.
-
- ```ruby
- class LightweightProduct
-   def initialize(attributes)
-     @attributes = attributes
-   end
-
-   # Depending on the database, `created_at` might
-   # be in different formats. In PostgreSQL, for example,
-   # you might see the following format:
-   # "2016-03-22 16:23:22"
-   #
-   # Taking into account that Elastic expects something different,
-   # one might do something like the following, just to avoid
-   # unnecessary String -> DateTime -> String conversion.
-   #
-   # "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z"
-   def created_at
-     @attributes['created_at'].tr(' ', 'T') << 'Z'
-   end
- end
-
- index_scope Product
- default_import_options raw_import: ->(hash) {
-   LightweightProduct.new(hash)
- }
-
- field :created_at, 'datetime'
- ```
-
- Also, you can pass `:raw_import` option to the `import` method explicitly.
-
- ### Index creation during import
-
- By default, when you perform import Chewy checks whether an index exists and creates it if it's absent.
- You can turn off this feature to decrease Elasticsearch hits count.
- To do so you need to set `skip_index_creation_on_import` parameter to `false` in your `config/chewy.yml`
-
- ### Skip record fields during import
-
- You can use `ignore_blank: true` to skip fields that return `true` for the `.blank?` method:
-
- ```ruby
- index_scope Country
- field :id
- field :cities, ignore_blank: true do
-   field :id
-   field :name
-   field :surname, ignore_blank: true
-   field :description
- end
- ```
-
- #### Default values for different types
-
- By default `ignore_blank` is false on every type except `geo_point`.
-
- ### Journaling
-
- You can record all actions that were made to the separate journal index in ElasticSearch.
- When you create/update/destroy your documents, it will be saved in this special index.
- If you make something with a batch of documents (e.g. during index reset) it will be saved as a one record, including primary keys of each document that was affected.
- Common journal record looks like this:
-
- ```json
- {
-   "action": "index",
-   "object_id": [1, 2, 3],
-   "index_name": "...",
-   "created_at": "<timestamp>"
- }
- ```
-
- This feature is turned off by default.
- But you can turn it on by setting `journal` setting to `true` in `config/chewy.yml`.
- Also, you can specify journal index name. For example:
-
- ```yaml
- # config/chewy.yml
- production:
-   journal: true
-   journal_name: my_super_journal
- ```
-
- Also, you can provide this option while you're importing some index:
-
- ```ruby
- CityIndex.import journal: true
- ```
-
- Or as a default import option for an index:
-
- ```ruby
- class CityIndex
-   index_scope City
-   default_import_options journal: true
- end
- ```
-
- You may be wondering why do you need it? The answer is simple: not to lose the data.
-
- Imagine that you reset your index in a zero-downtime manner (to separate index), and in the meantime somebody keeps updating the data frequently (to old index). So all these actions will be written to the journal index and you'll be able to apply them after index reset using the `Chewy::Journal` interface.
-
- When enabled, journal can grow to enormous size, consider setting up cron job that would clean it occasionally using [`chewy:journal:clean` rake task](#chewyjournal).
-
- ### Index manipulation
-
- ```ruby
- UsersIndex.delete # destroy index if it exists
- UsersIndex.delete!
-
- UsersIndex.create
- UsersIndex.create! # use bang or non-bang methods
-
- UsersIndex.purge
- UsersIndex.purge! # deletes then creates index
-
- UsersIndex.import # import with 0 arguments process all the data specified in index_scope definition
- UsersIndex.import User.where('rating > 100') # or import specified users scope
- UsersIndex.import User.where('rating > 100').to_a # or import specified users array
- UsersIndex.import [1, 2, 42] # pass even ids for import, it will be handled in the most effective way
- UsersIndex.import User.where('rating > 100'), update_fields: [:email] # if update fields are specified - it will update their values only with the `update` bulk action
- UsersIndex.import! # raises an exception in case of any import errors
-
- UsersIndex.reset! # purges index and imports default data for all types
- ```
-
- If the passed user is `#destroyed?`, or satisfies a `delete_if` index_scope option, or the specified id does not exist in the database, import will perform delete from index action for this object.
-
- ```ruby
- index_scope User, delete_if: :deleted_at
- index_scope User, delete_if: -> { deleted_at }
- index_scope User, delete_if: ->(user) { user.deleted_at }
- ```
-
- See [actions.rb](lib/chewy/index/actions.rb) for more details.
-
- ### Index update strategies
-
- Assume you've got the following code:
-
- ```ruby
- class City < ActiveRecord::Base
-   update_index 'cities', :self
- end
-
- class CitiesIndex < Chewy::Index
-   index_scope City
-   field :name
- end
- ```
-
- If you do something like `City.first.save!` you'll get an UndefinedUpdateStrategy exception instead of the object saving and index updating. This exception forces you to choose an appropriate update strategy for the current context.
-
- If you want to return to the pre-0.7.0 behavior - just set `Chewy.root_strategy = :bypass`.
-
- #### `:atomic`
-
- The main strategy here is `:atomic`. Assume you have to update a lot of records in the db.
-
- ```ruby
- Chewy.strategy(:atomic) do
-   City.popular.map(&:do_some_update_action!)
- end
- ```
-
- Using this strategy delays the index update request until the end of the block. Updated records are aggregated and the index update happens with the bulk API. So this strategy is highly optimized.
-
- #### `:sidekiq`
-
- This does the same thing as `:atomic`, but asynchronously using sidekiq. Patch `Chewy::Strategy::Sidekiq::Worker` for index updates improving.
-
- ```ruby
- Chewy.strategy(:sidekiq) do
-   City.popular.map(&:do_some_update_action!)
- end
- ```
-
- The default queue name is `chewy`, you can customize it in settings: `sidekiq.queue_name`
- ```
- Chewy.settings[:sidekiq] = {queue: :low}
- ```
-
- #### `:lazy_sidekiq`
-
- This does the same thing as `:sidekiq`, but with lazy evaluation. Beware it does not allow you to use any non-persistent record state for indices and conditions because record will be re-fetched from database asynchronously using sidekiq. However for destroying records strategy will fallback to `:sidekiq` because it's not possible to re-fetch deleted records from database.
+ ## Documentation
 
- The purpose of this strategy is to improve the response time of the code that should update indexes, as it does not only defer actual ES calls to a background job but `update_index` callbacks evaluation (for created and updated objects) too. Similar to `:sidekiq`, index update is asynchronous so this strategy cannot be used when data and index synchronization is required.
-
- ```ruby
- Chewy.strategy(:lazy_sidekiq) do
-   City.popular.map(&:do_some_update_action!)
- end
- ```
-
- The default queue name is `chewy`, you can customize it in settings: `sidekiq.queue_name`
- ```
- Chewy.settings[:sidekiq] = {queue: :low}
- ```
-
- #### `:delayed_sidekiq`
778
-
779
- It accumulates IDs of records to be reindexed during the latency window in Redis and then performs the reindexing of all accumulated records at once.
780
- This strategy is very useful in the case of frequently mutated records.
781
- It supports the `update_fields` option, so it will attempt to select just enough data from the database.
782
-
783
- Keep in mind, this strategy does not guarantee reindexing in the event of Sidekiq worker termination or an error during the reindexing phase.
784
- This behavior is intentional to prevent continuous growth of Redis db.
785
-
786
- There are three options that can be defined in the index:
787
- ```ruby
788
- class CitiesIndex...
789
- strategy_config delayed_sidekiq: {
790
- latency: 3,
791
- margin: 2,
792
- ttl: 60 * 60 * 24,
793
- reindex_wrapper: ->(&reindex) {
794
- ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
795
- }
796
- # latency - will prevent scheduling identical jobs
797
- # margin - main purpose is to cover db replication lag by the margin
798
- # ttl - a chunk expiration time (in seconds)
799
- # reindex_wrapper - lambda that accepts block to wrap that reindex process AR connection block.
800
- }
801
-
802
- ...
803
- end
804
- ```
805
-
806
- Also you can define defaults in the `initializers/chewy.rb`
807
- ```ruby
808
- Chewy.settings = {
809
- strategy_config: {
810
- delayed_sidekiq: {
811
- latency: 3,
812
- margin: 2,
813
- ttl: 60 * 60 * 24,
814
- reindex_wrapper: ->(&reindex) {
815
- ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
816
- }
817
- }
818
- }
819
- }
820
-
821
- ```
822
- or in `config/chewy.yml`
823
- ```yaml
824
- strategy_config:
825
- delayed_sidekiq:
826
- latency: 3
827
- margin: 2
828
- ttl: <%= 60 * 60 * 24 %>
829
- # reindex_wrapper setting is not possible here!!! use the initializer instead
830
- ```
831
-
832
- You can use the strategy identically to other strategies
833
- ```ruby
834
- Chewy.strategy(:delayed_sidekiq) do
835
- City.popular.map(&:do_some_update_action!)
836
- end
837
- ```
838
-
839
- The default queue name is `chewy`. You can customize it with the `sidekiq.queue` setting:
840
- ```ruby
841
- Chewy.settings[:sidekiq] = {queue: :low}
842
- ```
843
-
844
- Explicit call of the reindex using the `:delayed_sidekiq` strategy
845
- ```ruby
846
- CitiesIndex.import([1, 2, 3], strategy: :delayed_sidekiq)
847
- ```
848
-
849
- Explicit call of the reindex using `:delayed_sidekiq` strategy with `:update_fields` support
850
- ```ruby
851
- CitiesIndex.import([1, 2, 3], update_fields: [:name], strategy: :delayed_sidekiq)
852
- ```
853
-
854
- When running tests with the `:delayed_sidekiq` strategy while Sidekiq uses a real Redis instance that is NOT cleaned up between tests (via e.g. `Sidekiq.redis(&:flushdb)`), you'll want to clean up some Redis keys between tests to avoid state leaking and flaky tests. Chewy provides a convenience method for that:
855
- ```ruby
856
- # it might be a good idea to also add to your testing setup, e.g.: a rspec `before` hook
857
- Chewy::Strategy::DelayedSidekiq.clear_timechunks!
858
- ```
859
-
860
- #### `:active_job`
861
-
862
- This does the same thing as `:atomic`, but using ActiveJob. It inherits the ActiveJob configuration settings, including the `active_job.queue_adapter` setting for the environment. Patch `Chewy::Strategy::ActiveJob::Worker` to improve index updates.
863
-
864
- ```ruby
865
- Chewy.strategy(:active_job) do
866
- City.popular.map(&:do_some_update_action!)
867
- end
868
- ```
869
-
870
- The default queue name is `chewy`. You can customize it with the `active_job.queue` setting:
871
- ```ruby
872
- Chewy.settings[:active_job] = {queue: :low}
873
- ```
874
-
875
- #### `:urgent`
876
-
877
- The following strategy is convenient if you are going to update documents in your index one by one.
878
-
879
- ```ruby
880
- Chewy.strategy(:urgent) do
881
- City.popular.map(&:do_some_update_action!)
882
- end
883
- ```
884
-
885
- This code will perform `City.popular.count` requests for ES documents update.
886
-
887
- It is convenient for use in e.g. the Rails console with non-block notation:
888
-
889
- ```ruby
890
- > Chewy.strategy(:urgent)
891
- > City.popular.map(&:do_some_update_action!)
892
- ```
893
-
894
- #### `:bypass`
895
-
896
- When the bypass strategy is active the index will not be automatically updated on object save.
897
-
898
- For example, on `City.first.save!` the cities index would not be updated.
899
-
900
- #### Nesting
901
-
902
- Strategies are designed to allow nesting, so it is possible to redefine the strategy for nested contexts.
903
-
904
- ```ruby
905
- Chewy.strategy(:atomic) do
906
- city1.do_update!
907
- Chewy.strategy(:urgent) do
908
- city2.do_update!
909
- city3.do_update!
910
- # there will be 2 update index requests for city2 and city3
911
- end
912
- city4.do_update!
913
- # city1 and city4 will be grouped in one index update request
914
- end
915
- ```
916
-
917
- #### Non-block notation
918
-
919
- It is possible to nest strategies without blocks:
920
-
921
- ```ruby
922
- Chewy.strategy(:urgent)
923
- city1.do_update! # index updated
924
- Chewy.strategy(:bypass)
925
- city2.do_update! # update bypassed
926
- Chewy.strategy.pop
927
- city3.do_update! # index updated again
928
- ```
929
-
930
- #### Designing your own strategies
931
-
932
- See [strategy/base.rb](lib/chewy/strategy/base.rb) for more details. See [strategy/atomic.rb](lib/chewy/strategy/atomic.rb) for an example.
933
-
934
- ### Rails application strategies integration
935
-
936
- There are a couple of predefined strategies for your Rails application. Initially, the Rails console uses the `:urgent` strategy by default, except in the sandbox case. When you are running sandbox it switches to the `:bypass` strategy to avoid polluting the index.
937
-
938
- Migrations are wrapped with the `:bypass` strategy. Because the main behavior implies that indices are reset after migration, there is no need for extra index updates. Also indexing might be broken during migrations because of the outdated schema.
939
-
940
- Controller actions are wrapped with the configurable value of `Chewy.request_strategy`, which defaults to `:atomic`. This is done at the middleware level to reduce the number of index update requests inside actions.
941
-
942
- It is also a good idea to set up the `:bypass` strategy inside your test suite and import objects manually only when needed, and use `Chewy.massacre` when needed to flush test ES indices before every example. This will allow you to minimize unnecessary ES requests and reduce overhead.
943
-
944
- ```ruby
945
- RSpec.configure do |config|
946
- config.before(:suite) do
947
- Chewy.strategy(:bypass)
948
- end
949
- end
950
- ```
951
-
952
- ### Elasticsearch client options
953
-
954
- All connection options, except the `:prefix`, are passed to the `Elasticsearch::Client.new` ([chewy/lib/chewy.rb](https://github.com/toptal/chewy/blob/f5bad9f83c21416ac10590f6f34009c645062e89/lib/chewy.rb#L153-L160)):
955
-
956
- Here's the relevant Elasticsearch documentation on the subject: https://rubydoc.info/gems/elasticsearch-transport#setting-hosts
957
-
958
- ### `ActiveSupport::Notifications` support
959
-
960
- Chewy emits the following events:
961
-
962
- #### `search_query.chewy` payload
963
-
964
- * `payload[:index]`: requested index class
965
- * `payload[:request]`: request hash
966
-
967
- #### `import_objects.chewy` payload
968
-
969
- * `payload[:index]`: currently imported index name
970
- * `payload[:import]`: import stats, the total counts of imported and deleted objects:
971
-
972
- ```ruby
973
- {index: 30, delete: 5}
974
- ```
975
-
976
- * `payload[:errors]`: might not exist. Contains grouped errors with objects ids list:
977
-
978
- ```ruby
979
- {index: {
980
- 'error 1 text' => ['1', '2', '3'],
981
- 'error 2 text' => ['4']
982
- }, delete: {
983
- 'delete error text' => ['10', '12']
984
- }}
985
- ```
986
-
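The two payloads above can be condensed into a single log line. The following is a minimal sketch, not part of Chewy: `summarize_import` is a hypothetical helper, and the subscription is guarded so the snippet also runs outside a Rails app.

```ruby
# Hypothetical helper: condenses an `import_objects.chewy` payload into one line.
def summarize_import(payload)
  stats = payload[:import] || {}
  # payload[:errors] might not exist, so fall back to an empty hash.
  errors = payload[:errors] || {}
  # Count affected object ids per action (:index / :delete) across all messages.
  failed = errors.transform_values { |by_message| by_message.values.sum(&:size) }
  format('%<index>s: indexed=%<indexed>d deleted=%<deleted>d ' \
         'failed_index=%<failed_index>d failed_delete=%<failed_delete>d',
         index: payload[:index],
         indexed: stats[:index].to_i,
         deleted: stats[:delete].to_i,
         failed_index: failed[:index].to_i,
         failed_delete: failed[:delete].to_i)
end

# Subscribe only when ActiveSupport is loaded (e.g. inside a Rails app).
if defined?(ActiveSupport::Notifications)
  ActiveSupport::Notifications.subscribe('import_objects.chewy') do |_name, _start, _finish, _id, payload|
    puts summarize_import(payload)
  end
end
```

Counting ids rather than messages keeps the line short while still showing how many objects were affected by each kind of failure.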
987
- ### NewRelic integration
988
-
989
- To integrate with NewRelic you may use the following example source (config/initializers/chewy.rb):
990
-
991
- ```ruby
992
- require 'new_relic/agent/instrumentation/evented_subscriber'
993
-
994
- class ChewySubscriber < NewRelic::Agent::Instrumentation::EventedSubscriber
995
- def start(name, id, payload)
996
- event = ChewyEvent.new(name, Time.current, nil, id, payload)
997
- push_event(event)
998
- end
999
-
1000
- def finish(_name, id, _payload)
1001
- pop_event(id).finish
1002
- end
1003
-
1004
- class ChewyEvent < NewRelic::Agent::Instrumentation::Event
1005
- OPERATIONS = {
1006
- 'import_objects.chewy' => 'import',
1007
- 'search_query.chewy' => 'search',
1008
- 'delete_query.chewy' => 'delete'
1009
- }.freeze
1010
-
1011
- def initialize(*args)
1012
- super
1013
- @segment = start_segment
1014
- end
1015
-
1016
- def start_segment
1017
- segment = NewRelic::Agent::Transaction::DatastoreSegment.new product, operation, collection, host, port
1018
- if (txn = state.current_transaction)
1019
- segment.transaction = txn
1020
- end
1021
- segment.notice_sql @payload[:request].to_s
1022
- segment.start
1023
- segment
1024
- end
1025
-
1026
- def finish
1027
- if (txn = state.current_transaction)
1028
- txn.add_segment @segment
1029
- end
1030
- @segment.finish
1031
- end
1032
-
1033
- private
1034
-
1035
- def state
1036
- @state ||= NewRelic::Agent::TransactionState.tl_get
1037
- end
1038
-
1039
- def product
1040
- 'Elasticsearch'
1041
- end
1042
-
1043
- def operation
1044
- OPERATIONS[name]
1045
- end
1046
-
1047
- def collection
1048
- payload.values_at(:type, :index)
1049
- .reject { |value| value.try(:empty?) }
1050
- .first
1051
- .to_s
1052
- end
1053
-
1054
- def host
1055
- Chewy.client.transport.hosts.first[:host]
1056
- end
1057
-
1058
- def port
1059
- Chewy.client.transport.hosts.first[:port]
1060
- end
1061
- end
1062
- end
1063
-
1064
- ActiveSupport::Notifications.subscribe(/\.chewy$/, ChewySubscriber.new)
1065
- ```
1066
-
1067
- ### Search requests
1068
-
1069
- Quick introduction.
1070
-
1071
- #### Composing requests
1072
-
1073
- The request DSL has the same chainable nature as AR. The main class is `Chewy::Search::Request`.
1074
-
1075
- ```ruby
1076
- CitiesIndex.query(match: {name: 'London'})
1077
- ```
1078
-
1079
- The main methods of the request DSL are `query`, `filter` and `post_filter`. You can pass plain query hashes or use `elasticsearch-dsl`.
1080
-
1081
- ```ruby
1082
- CitiesIndex
1083
- .filter(term: {name: 'Bangkok'})
1084
- .query(match: {name: 'London'})
1085
- .query.not(range: {population: {gt: 1_000_000}})
1086
- ```
1087
-
1088
- You can query a set of indexes at once:
1089
-
1090
- ```ruby
1091
- CitiesIndex.indices(CountriesIndex).query(match: {name: 'Some'})
1092
- ```
1093
-
1094
- See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html and https://github.com/elastic/elasticsearch-dsl-ruby for more details.
1095
-
1096
- An important part of requests manipulation is merging. There are 4 methods to perform it: `merge`, `and`, `or`, `not`. See [Chewy::Search::QueryProxy](lib/chewy/search/query_proxy.rb) for details. Also, `only` and `except` methods help to remove unneeded parts of the request.
1097
-
1098
- Every other request part is covered by a bunch of additional methods, see [Chewy::Search::Request](lib/chewy/search/request.rb) for details:
1099
-
1100
- ```ruby
1101
- CitiesIndex.limit(10).offset(30).order(:name, {population: {order: :desc}})
1102
- ```
1103
-
1104
- Request DSL also provides additional scope actions, like `delete_all`, `exists?`, `count`, `pluck`, etc.
1105
-
1106
- #### Pagination
1107
-
1108
- The request DSL supports pagination with `Kaminari`. An extension is enabled on initialization if `Kaminari` is available. See [Chewy::Search](lib/chewy/search.rb) and [Chewy::Search::Pagination::Kaminari](lib/chewy/search/pagination/kaminari.rb) for details.
1109
-
1110
- #### Named scopes
1111
-
1112
- Chewy supports named scopes. There is no specialized DSL for defining them; a named scope is simply a class method.
1113
-
1114
- See [Chewy::Search::Scoping](lib/chewy/search/scoping.rb) for details.
1115
-
1116
- #### Scroll API
1117
-
1118
- ElasticSearch scroll API is utilized by a bunch of methods: `scroll_batches`, `scroll_hits`, `scroll_wrappers` and `scroll_objects`.
1119
-
1120
- See [Chewy::Search::Scrolling](lib/chewy/search/scrolling.rb) for details.
1121
-
1122
- #### Loading objects
1123
-
1124
- It is possible to load ORM/ODM source objects with the `objects` method. To provide additional loading options use `load` method:
1125
-
1126
- ```ruby
1127
- CitiesIndex.load(scope: -> { active }).to_a # to_a returns `Chewy::Index` wrappers.
1128
- CitiesIndex.load(scope: -> { active }).objects # An array of AR source objects.
1129
- ```
1130
-
1131
- See [Chewy::Search::Loader](lib/chewy/search/loader.rb) for more details.
1132
-
1133
- When it is necessary to iterate through both the wrappers and the objects simultaneously, the `object_hash` method helps:
1134
-
1135
- ```ruby
1136
- scope = CitiesIndex.load(scope: -> { active })
1137
- scope.each do |wrapper|
1138
- scope.object_hash[wrapper]
1139
- end
1140
- ```
1141
-
1142
- ### Rake tasks
1143
-
1144
- For a Rails application, some index-maintaining rake tasks are defined.
1145
-
1146
- #### `chewy:reset`
1147
-
1148
- Performs zero-downtime reindexing as described [here](https://www.elastic.co/blog/changing-mapping-with-zero-downtime). So the rake task creates a new index with unique suffix and then simply aliases it to the common index name. The previous index is deleted afterwards (see `Chewy::Index.reset!` for more details).
1149
-
1150
- ```bash
1151
- rake chewy:reset # resets all the existing indices
1152
- rake chewy:reset[users] # resets UsersIndex only
1153
- rake chewy:reset[users,cities] # resets UsersIndex and CitiesIndex
1154
- rake chewy:reset[-users,cities] # resets every index in the application except specified ones
1155
- ```
1156
-
1157
- #### `chewy:upgrade`
1158
-
1159
- Performs reset exactly the same way as `chewy:reset` does, but only when the index specification (setting or mapping) was changed.
1160
-
1161
- It works only when index specification is locked in `Chewy::Stash::Specification` index. The first run will reset all indexes and lock their specifications.
1162
-
1163
- See [Chewy::Stash::Specification](lib/chewy/stash.rb) and [Chewy::Index::Specification](lib/chewy/index/specification.rb) for more details.
1164
-
1165
-
1166
- ```bash
1167
- rake chewy:upgrade # upgrades all the existing indices
1168
- rake chewy:upgrade[users] # upgrades UsersIndex only
1169
- rake chewy:upgrade[users,cities] # upgrades UsersIndex and CitiesIndex
1170
- rake chewy:upgrade[-users,cities] # upgrades every index in the application except specified ones
1171
- ```
1172
-
1173
- #### `chewy:update`
1174
-
1175
- It doesn't create indexes, it simply imports everything to the existing ones and fails if the index was not created before.
1176
-
1177
- ```bash
1178
- rake chewy:update # updates all the existing indices
1179
- rake chewy:update[users] # updates UsersIndex only
1180
- rake chewy:update[users,cities] # updates UsersIndex and CitiesIndex
1181
- rake chewy:update[-users,cities] # updates every index in the application except UsersIndex and CitiesIndex
1182
- ```
1183
-
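The argument convention shared by these tasks (a plain list selects indexes, a leading `-` excludes the listed ones) can be sketched in plain Ruby. `select_indexes` is a hypothetical helper written for illustration; it reflects the convention as documented here and is an assumption, not Chewy's actual task code.

```ruby
# Hypothetical illustration of the rake argument convention.
# Index names are plain strings here rather than index classes.
def select_indexes(all, args)
  # No arguments: operate on every index.
  return all if args.empty?

  names = args.map { |arg| arg.delete_prefix('-') }
  # A leading "-" on the first argument turns the whole list into an exclusion list.
  args.first.start_with?('-') ? all - names : all & names
end
```

For instance, `select_indexes(%w[users cities countries], %w[-users cities])` leaves only `countries`, matching the `[-users,cities]` examples above.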
1184
- #### `chewy:sync`
1185
-
1186
- Provides a way to synchronize outdated indexes with the source quickly and without doing a full reset. By default field `updated_at` is used to find outdated records, but this could be customized by `outdated_sync_field` as described at [Chewy::Index::Syncer](lib/chewy/index/syncer.rb).
1187
-
1188
- Arguments are similar to the ones taken by `chewy:update` task.
1189
-
1190
- See [Chewy::Index::Syncer](lib/chewy/index/syncer.rb) for more details.
1191
-
1192
- ```bash
1193
- rake chewy:sync # synchronizes all the existing indices
1194
- rake chewy:sync[users] # synchronizes UsersIndex only
1195
- rake chewy:sync[users,cities] # synchronizes UsersIndex and CitiesIndex
1196
- rake chewy:sync[-users,cities] # synchronizes every index in the application except UsersIndex and CitiesIndex
1197
- ```
1198
-
1199
- #### `chewy:deploy`
1200
-
1201
- This rake task is especially useful during the production deploy. It is a combination of `chewy:upgrade` and `chewy:sync` and the latter is called only for the indexes that were not reset during the first stage.
1202
-
1203
- It is not possible to specify any particular indexes for this task as it doesn't make much sense.
1204
-
1205
- Right now the approach is that if some data had been updated, but index definition was not changed (no changes satisfying the synchronization algorithm were done), it would be much faster to perform manual partial index update inside data migrations or even manually after the deploy.
1206
-
1207
- Also, there is always full reset alternative with `rake chewy:reset`.
1208
-
1209
- #### `chewy:create_missing_indexes`
1210
-
1211
- This rake task creates newly defined indexes in ElasticSearch and skips existing ones. Useful for production-like environments.
1212
-
1213
- #### Parallelizing rake tasks
1214
-
1215
- Every task described above has its own parallel version. Every parallel rake task takes the number of processes for execution as the first argument, and the rest of the arguments are exactly the same as for the non-parallel task version.
1216
-
1217
- [https://github.com/grosser/parallel](https://github.com/grosser/parallel) gem is required to use these tasks.
1218
-
1219
- If the number of processes is not specified explicitly, the `parallel` gem tries to derive the number of processes to use automatically.
1220
-
1221
- ```bash
1222
- rake chewy:parallel:reset
1223
- rake chewy:parallel:upgrade[4]
1224
- rake chewy:parallel:update[4,cities]
1225
- rake chewy:parallel:sync[4,-users]
1226
- rake chewy:parallel:deploy[4] # performs parallel upgrade and parallel sync afterwards
1227
- ```
1228
-
1229
- #### `chewy:journal`
1230
-
1231
- This namespace contains two tasks for journal manipulation: `chewy:journal:apply` and `chewy:journal:clean`. Both take a time as the first argument (optional for clean) and a list of indexes, exactly as the tasks above do. The time can be in any format parsable by ActiveSupport.
1232
-
1233
- ```bash
1234
- rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)"] # apply journaled changes for the past hour
1235
- rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)",users] # apply journaled changes for the past hour on UsersIndex only
1236
- ```
1237
-
1238
- When the size of the journal becomes very large, the classical way of deletion would be obstructive and resource consuming. Fortunately, Chewy internally uses [delete-by-query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html#docs-delete-by-query-task-api) ES function which supports async execution with batching and [throttling](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-throttle).
1239
-
1240
- The available options, which can be set by ENV variables, are listed below:
1241
- * `WAIT_FOR_COMPLETION` - a boolean flag that controls async execution. It waits by default. When set to `false` (`0`, `f`, `false` or `off`, in any letter case, are all accepted as `false`), Elasticsearch performs some preflight checks, launches the request, and returns a task reference you can use to cancel the task or get its status.
1242
- * `REQUESTS_PER_SECOND` - float. The throttle for this request in sub-requests per second. No throttling is enforced by default.
1243
- * `SCROLL_SIZE` - integer. The number of documents to be deleted in single sub-request. The default batch size is 1000.
1244
-
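The accepted spellings of the `WAIT_FOR_COMPLETION` flag can be illustrated with a small parser. This is a sketch under stated assumptions: `wait_for_completion?` and `FALSE_VALUES` are hypothetical names, and the logic only mirrors the rules listed above rather than reproducing Chewy's own implementation.

```ruby
# Hypothetical parser mirroring the documented rules; not Chewy's actual code.
FALSE_VALUES = %w[0 f false off].freeze

def wait_for_completion?(env = ENV)
  value = env['WAIT_FOR_COMPLETION']
  # The task waits by default when the variable is not set.
  return true if value.nil?

  # Any letter case of the listed spellings counts as false.
  !FALSE_VALUES.include?(value.strip.downcase)
end
```

Passing the environment as an argument keeps the helper easy to exercise with a plain hash instead of mutating `ENV`.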
1245
- ```bash
1246
- rake chewy:journal:clean WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000
1247
- ```
1248
-
1249
- ### RSpec integration
1250
-
1251
- Just add `require 'chewy/rspec'` to your spec_helper.rb and you will get additional features:
1252
-
1253
- * [update_index](lib/chewy/rspec/update_index.rb) helper
1254
- * `mock_elasticsearch_response` helper to mock elasticsearch response
1255
- * `mock_elasticsearch_response_sources` helper to mock elasticsearch response sources
1256
- * `build_query` matcher to compare request and expected query (returns `true`/`false`)
1257
-
1258
- To use `mock_elasticsearch_response` and `mock_elasticsearch_response_sources` helpers add `include Chewy::Rspec::Helpers` to your tests.
1259
-
1260
- See [chewy/rspec/](lib/chewy/rspec/) for more details.
1261
-
1262
- ### Minitest integration
1263
-
1264
- Add `require 'chewy/minitest'` to your test_helper.rb, and then for tests which you'd like indexing test hooks, `include Chewy::Minitest::Helpers`.
1265
-
1266
- You can set the `:bypass` strategy for test suites, handle imports for the index manually, and flush test indices manually using `Chewy.massacre`. This helps reduce unnecessary ES requests.
1267
-
1268
- But if you require chewy to index/update model regularly in your test suite then you can specify `:urgent` strategy for documents indexing. Add `Chewy.strategy(:urgent)` to test_helper.rb.
1269
-
1270
- Also, you can use additional helpers:
1271
-
1272
- * `mock_elasticsearch_response` to mock elasticsearch response
1273
- * `mock_elasticsearch_response_sources` to mock elasticsearch response sources
1274
- * `assert_elasticsearch_query` to compare request and expected query (returns `true`/`false`)
1275
-
1276
- See [chewy/minitest/](lib/chewy/minitest/) for more details.
1277
-
1278
- ### DatabaseCleaner
1279
-
1280
- If you use `DatabaseCleaner` in your tests with [the `transaction` strategy](https://github.com/DatabaseCleaner/database_cleaner#how-to-use), you may find that `ActiveRecord` models are not indexed automatically on save even though you set up callbacks for this with the `update_index` method. The issue arises because `chewy` indexes data in `after_commit` callbacks by default, and `after_commit` callbacks are not run under `DatabaseCleaner`'s `transaction` strategy. You can solve this by changing the `Chewy.use_after_commit_callbacks` option. Just add the following initializer to your Rails application:
1281
-
1282
- ```ruby
1283
- #config/initializers/chewy.rb
1284
- Chewy.use_after_commit_callbacks = !Rails.env.test?
1285
- ```
1286
-
1287
- ### Pre-request Filter
1288
-
1289
- Should you need to inspect the query prior to it being dispatched to ElasticSearch during any queries, you can use the `before_es_request_filter`. `before_es_request_filter` is a callable object, as demonstrated below:
1290
-
1291
- ```ruby
1292
- Chewy.before_es_request_filter = -> (method_name, args, kw_args) { ... }
1293
- ```
1294
-
1295
- While using the `before_es_request_filter`, please consider the following:
1296
-
1297
- * `before_es_request_filter` acts as a simple proxy before any request made via the `ElasticSearch::Client`. The arguments passed to this filter include:
1298
- * `method_name` - The name of the method being called. Examples include `search`, `count` and `bulk`.
1299
- * `args` and `kw_args` - The positional and keyword arguments provided in the method call.
1300
- * The operation is synchronous, so avoid executing any heavy or time-consuming operations within the filter to prevent performance degradation.
1301
- * The return value of the proc is disregarded. This filter is intended for inspection or modification of the query rather than generating a response.
1302
- * Any exception raised inside the callback will propagate upward and halt the execution of the query. It is essential to handle potential errors adequately to ensure the stability of your search functionality.
1303
-
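As an illustration of the points above, a filter can be any object that responds to `call`. The sketch below is an assumption-laden example: `RequestAudit` is a hypothetical module, it only records calls in memory, and it rescues its own errors so that a bug in the filter does not halt the query.

```ruby
# Hypothetical audit filter: any object responding to `call` can be used.
module RequestAudit
  @entries = []

  class << self
    attr_reader :entries

    # Receives the client method name plus its positional and keyword arguments.
    def call(method_name, args, kw_args)
      # Keep this cheap: the filter runs synchronously before every request.
      @entries << { method: method_name, args: args, kw_args: kw_args }
    rescue StandardError => e
      # Swallow filter bugs so they do not halt the query itself.
      warn "request audit failed: #{e.message}"
    end
  end
end

# Register the filter only when Chewy is loaded.
Chewy.before_es_request_filter = RequestAudit if defined?(Chewy)
```

Because the return value is disregarded, the filter communicates through side effects only, here the `RequestAudit.entries` array.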
1304
- ### Import scope clean-up behavior
1305
-
1306
- Whenever you set the `import_scope` for the index, in the case of ActiveRecord,
1307
- options for order, offset and limit will be removed. You can configure how Chewy
1308
- reacts before the clean-up happens.
1309
-
1310
- The default behavior is a warning sent to the Chewy logger (`:warn`). Another more
1311
- restrictive option is raising an exception (`:raise`). Both options have a
1312
- negative impact on performance since verifying whether the code uses any of
1313
- these options requires building AREL query.
1314
-
1315
- To avoid the loading time impact, you can ignore the check (`:ignore`) before
1316
- the clean-up.
1317
-
1318
- ```ruby
1319
- Chewy.import_scope_cleanup_behavior = :ignore
1320
- ```
239
+ - [Configuration](docs/configuration.md) — client settings, update strategies, notifications, integrations
240
+ - [Indexing](docs/indexing.md) — index definition, field types, crutches, witchcraft, index manipulation
241
+ - [Import](docs/import.md) — import options, raw import, journaling
242
+ - [Querying](docs/querying.md) — search requests, pagination, scopes, scroll, loading
243
+ - [Rake Tasks](docs/rake_tasks.md) — all rake tasks and parallelization
244
+ - [Testing](docs/testing.md) — RSpec, Minitest, DatabaseCleaner
245
+ - [Troubleshooting](docs/troubleshooting.md) — pre-request filter
1321
246
 
1322
247
  ## Contributing
1323
248
 
@@ -1337,5 +262,5 @@ rake elasticsearch:stop # stop Elasticsearch
1337
262
 
1338
263
  ## Copyright
1339
264
 
1340
- Copyright (c) 2013-2021 Toptal, LLC. See [LICENSE.txt](LICENSE.txt) for
265
+ Copyright (c) 2013-2025 Toptal, LLC. See [LICENSE.txt](LICENSE.txt) for
1341
266
  further details.