chewy 7.2.6 → 7.3.0
- checksums.yaml +4 -4
- data/.github/CODEOWNERS +1 -1
- data/.github/workflows/ruby.yml +10 -2
- data/CHANGELOG.md +25 -0
- data/README.md +94 -7
- data/chewy.gemspec +1 -0
- data/lib/chewy/index/import/bulk_builder.rb +4 -5
- data/lib/chewy/index/import.rb +29 -2
- data/lib/chewy/index.rb +25 -0
- data/lib/chewy/journal.rb +6 -2
- data/lib/chewy/rake_helper.rb +39 -5
- data/lib/chewy/search/request.rb +15 -3
- data/lib/chewy/stash.rb +3 -3
- data/lib/chewy/strategy/delayed_sidekiq/scheduler.rb +148 -0
- data/lib/chewy/strategy/delayed_sidekiq/worker.rb +52 -0
- data/lib/chewy/strategy/delayed_sidekiq.rb +17 -0
- data/lib/chewy/strategy.rb +1 -0
- data/lib/chewy/version.rb +1 -1
- data/lib/tasks/chewy.rake +7 -1
- data/spec/chewy/index/import/bulk_builder_spec.rb +4 -0
- data/spec/chewy/rake_helper_spec.rb +75 -0
- data/spec/chewy/search/request_spec.rb +25 -0
- data/spec/chewy/strategy/delayed_sidekiq_spec.rb +190 -0
- metadata +25 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fcd795c985f120c412c315bc0b87dd65e7b55a302aeae1bffbc5c71beb910652
+  data.tar.gz: ad70c5ff382f8405146da2be5643d9ac397d60fbbf16842a509a950cc90acbd2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0f4ac34a7c84c9fc6dd2743ab77982b634bddb0ceda0116b2a2bfa9a0b902655b375bc215857002041159a5cd0ce52e9c409342f06ce7a41f0667d3c74c07ab4
+  data.tar.gz: fa68110ccfc6acd942c54b279cfd99e36b980dcb408bd1508430598757c3efd57e8dc1eaa60c2bf77156eb9acba77c700715ac9676af0a36eacb600de730b1b5
data/.github/CODEOWNERS
CHANGED
@@ -1 +1 @@
-.github/workflows @toptal/
+.github/workflows @toptal/platform-sre
data/.github/workflows/ruby.yml
CHANGED
@@ -1,6 +1,14 @@
 name: CI

-on:
+on:
+  push:
+    branches: [master]
+  pull_request:
+    types: [
+      synchronize, # PR was updated
+      opened, # PR was opened
+      reopened # PR was reopened
+    ]

 jobs:
   ruby-2:
@@ -34,7 +42,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        ruby: [ '3.0', 3.1 ]
+        ruby: [ '3.0', '3.1', '3.2' ]
         gemfile: [ rails.6.1.activerecord, rails.7.0.activerecord ]
     name: ${{ matrix.ruby }}-${{ matrix.gemfile }}

data/CHANGELOG.md
CHANGED
@@ -8,6 +8,31 @@

 ### Bugs Fixed

+## 7.3.0 (2023-04-03)
+
+### New Features
+
+* [#869](https://github.com/toptal/chewy/pull/869): New strategy - `delayed_sidekiq`. Allow passing `strategy: :delayed_sidekiq` option to `SomeIndex.import([1, ...], strategy: :delayed_sidekiq)`. The strategy is compatible with `update_fields` option as well. ([@skcc321][])
+* [#879](https://github.com/toptal/chewy/pull/879): Configure CI to check for ruby 3.2 compatibility. ([@konalegi][])
+
+### Changes
+
+### Bugs Fixed
+
+* [#856](https://github.com/toptal/chewy/pull/856): Fix return value of subscribed_task_stats used in rake tasks. ([@fabiormoura][])
+
+## 7.2.7 (2022-11-15)
+
+### New Features
+
+* [#857](https://github.com/toptal/chewy/pull/857): Allow passing `wait_for_completion`, `requests_per_second` and `scroll_size` options to `chewy:journal:clean` rake task and `delete_all` query builder method. ([@konalegi][])([@barthez][])
+
+### Changes
+
+### Bugs Fixed
+
+* [#863](https://github.com/toptal/chewy/pull/863): Fix `crutches` call doesn't respect `update_fields` option. ([@skcc321][])
+
 ## 7.2.6 (2022-06-13)

 ### New Features
data/README.md
CHANGED
@@ -5,7 +5,7 @@

 # Chewy

-Chewy is an ODM (Object Document Mapper), built on top of
+Chewy is an ODM (Object Document Mapper), built on top of [the official Elasticsearch client](https://github.com/elastic/elasticsearch-ruby).

 ## Why Chewy?

@@ -458,7 +458,7 @@ field :hierarchy_link, type: :join, relations: {question: %i[answer comment], an
 ```
 assuming you have `comment_type` and `commented_id` fields in your model.

-Note that when you reindex a parent,
+Note that when you reindex a parent, its children and grandchildren will be reindexed as well.
 This may require additional queries to the primary database and to elastisearch.

 Also note that the join field doesn't support crutches (it should be a field directly defined on the model).
@@ -525,7 +525,7 @@ So Chewy Crutches™ technology is able to increase your indexing performance in

 ### Witchcraft™ technology

-One more experimental technology to increase import performance. As far as you know, chewy defines value proc for every imported field in mapping, so at the import time each of
+One more experimental technology to increase import performance. As far as you know, chewy defines value proc for every imported field in mapping, so at the import time each of these procs is executed on imported object to extract result document to import. It would be great for performance to use one huge whole-document-returning proc instead. So basically the idea of Witchcraft™ technology is to compile a single document-returning proc from the index definition.

 ```ruby
 index_scope Product
@@ -569,7 +569,7 @@ Obviously not every type of definition might be compiled. There are some restric
 end
 ```

-However, it is quite possible that your index definition will be supported by Witchcraft™ technology out of the box in
+However, it is quite possible that your index definition will be supported by Witchcraft™ technology out of the box in most of the cases.

 ### Raw Import

@@ -675,7 +675,9 @@ end

 You may be wondering why do you need it? The answer is simple: not to lose the data.

-Imagine that you reset your index in a zero-downtime manner (to separate index), and
+Imagine that you reset your index in a zero-downtime manner (to separate index), and in the meantime somebody keeps updating the data frequently (to old index). So all these actions will be written to the journal index and you'll be able to apply them after index reset using the `Chewy::Journal` interface.
+
+When enabled, journal can grow to enormous size, consider setting up cron job that would clean it occasionally using [`chewy:journal:clean` rake task](#chewyjournal).

 ### Index manipulation

@@ -772,6 +774,80 @@ The default queue name is `chewy`, you can customize it in settings: `sidekiq.qu
 Chewy.settings[:sidekiq] = {queue: :low}
 ```

+#### `:delayed_sidekiq`
+
+It accumulates ids of records to be reindexed during the latency window in redis and then does the reindexing of all accumulated records at once.
+The strategy is very useful in case of frequently mutated records.
+It supports `update_fields` option, so it will try to select just enough data from the DB.
+
+There are four options that can be defined in the index:
+```ruby
+class CitiesIndex...
+  strategy_config delayed_sidekiq: {
+    latency: 3,
+    margin: 2,
+    ttl: 60 * 60 * 24,
+    reindex_wrapper: ->(&reindex) {
+      ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
+    }
+    # latency - will prevent scheduling identical jobs
+    # margin - main purpose is to cover db replication lag by the margin
+    # ttl - a chunk expiration time (in seconds)
+    # reindex_wrapper - lambda that accepts block to wrap that reindex process AR connection block.
+  }
+
+  ...
+end
+```
+
+Also you can define defaults in the `initializers/chewy.rb`
+```ruby
+Chewy.settings = {
+  strategy_config: {
+    delayed_sidekiq: {
+      latency: 3,
+      margin: 2,
+      ttl: 60 * 60 * 24,
+      reindex_wrapper: ->(&reindex) {
+        ActiveRecord::Base.connected_to(role: :reading) { reindex.call }
+      }
+    }
+  }
+}
+
+```
+or in `config/chewy.yml`
+```yaml
+strategy_config:
+  delayed_sidekiq:
+    latency: 3
+    margin: 2
+    ttl: <%= 60 * 60 * 24 %>
+    # reindex_wrapper setting is not possible here!!! use the initializer instead
+```
+
+You can use the strategy identically to other strategies
+```ruby
+Chewy.strategy(:delayed_sidekiq) do
+  City.popular.map(&:do_some_update_action!)
+end
+```
+
+The default queue name is `chewy`, you can customize it in settings: `sidekiq.queue`
+```
+Chewy.settings[:sidekiq] = {queue: :low}
+```
+
+Explicit call of the reindex using `:delayed_sidekiq` strategy
+```ruby
+CitiesIndex.import([1, 2, 3], strategy: :delayed_sidekiq)
+```
+
+Explicit call of the reindex using `:delayed_sidekiq` strategy with `:update_fields` support
+```ruby
+CitiesIndex.import([1, 2, 3], update_fields: [:name], strategy: :delayed_sidekiq)
+```
+
 #### `:active_job`

 This does the same thing as `:atomic`, but using ActiveJob. This will inherit the ActiveJob configuration settings including the `active_job.queue_adapter` setting for the environment. Patch `Chewy::Strategy::ActiveJob::Worker` for index updates improving.
@@ -886,7 +962,7 @@ Chewy has notifying the following events:
 {index: 30, delete: 5}
 ```

-* `payload[:errors]`: might not
+* `payload[:errors]`: might not exist. Contains grouped errors with objects ids list:

 ```ruby
 {index: {
@@ -1018,7 +1094,7 @@ Request DSL also provides additional scope actions, like `delete_all`, `exists?`

 #### Pagination

-The request DSL supports pagination with `Kaminari`. An extension is enabled on
+The request DSL supports pagination with `Kaminari`. An extension is enabled on initialization if `Kaminari` is available. See [Chewy::Search](lib/chewy/search.rb) and [Chewy::Search::Pagination::Kaminari](lib/chewy/search/pagination/kaminari.rb) for details.

 #### Named scopes

@@ -1144,6 +1220,17 @@ rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)"] # apply journaled changes f
 rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)",users] # apply journaled changes for the past hour on UsersIndex only
 ```

+When the size of the journal becomes very large, the classical way of deletion would be obstructive and resource consuming. Fortunately, Chewy internally uses [delete-by-query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html#docs-delete-by-query-task-api) ES function which supports async execution with batching and [throttling](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-throttle).
+
+The available options, which can be set by ENV variables, are listed below:
+* `WAIT_FOR_COMPLETION` - a boolean flag. It controls async execution. It waits by default. When set to `false` (`0`, `f`, `false` or `off`, in any letter case, are all accepted as `false`), Elasticsearch performs some preflight checks, launches the request, and returns a task reference you can use to cancel the task or get its status.
+* `REQUESTS_PER_SECOND` - float. The throttle for this request in sub-requests per second. No throttling is enforced by default.
+* `SCROLL_SIZE` - integer. The number of documents to be deleted in single sub-request. The default batch size is 1000.
+
+```bash
+rake chewy:journal:clean WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000
+```
+
 ### RSpec integration

 Just add `require 'chewy/rspec'` to your spec_helper.rb and you will get additional features:
data/chewy.gemspec
CHANGED
@@ -19,6 +19,7 @@ Gem::Specification.new do |spec| # rubocop:disable Metrics/BlockLength

   spec.add_development_dependency 'database_cleaner'
   spec.add_development_dependency 'elasticsearch-extensions'
+  spec.add_development_dependency 'mock_redis'
   spec.add_development_dependency 'rake'
   spec.add_development_dependency 'rspec', '>= 3.7.0'
   spec.add_development_dependency 'rspec-collection_matchers'
data/lib/chewy/index/import/bulk_builder.rb
CHANGED
@@ -48,12 +48,11 @@ module Chewy
         def index_entry(object)
           entry = {}
           entry[:_id] = index_object_ids[object] if index_object_ids[object]
+          entry[:routing] = routing(object) if join_field?

-          data = data_for(object)
           parent = cache(entry[:_id])
-
-
-          if parent_changed?(data, parent)
+          data = data_for(object) if parent.present?
+          if parent.present? && parent_changed?(data, parent)
             reindex_entries(object, data) + reindex_descendants(object)
           elsif @fields.present?
             return [] unless entry[:_id]
@@ -61,7 +60,7 @@ module Chewy
             entry[:data] = {doc: data_for(object, fields: @fields)}
             [{update: entry}]
           else
-            entry[:data] = data
+            entry[:data] = data || data_for(object)
             [{index: entry}]
           end
         end
data/lib/chewy/index/import.rb
CHANGED
@@ -73,7 +73,7 @@ module Chewy
         # @option options [true, Integer, Hash] parallel enables parallel import processing with the Parallel gem, accepts the number of workers or any Parallel gem acceptable options
         # @return [true, false] false in case of errors
         ruby2_keywords def import(*args)
-
+          intercept_import_using_strategy(*args).blank?
         end

         # @!method import!(*collection, **options)
@@ -84,7 +84,8 @@ module Chewy
         #
         # @raise [Chewy::ImportFailed] in case of errors
         ruby2_keywords def import!(*args)
-          errors =
+          errors = intercept_import_using_strategy(*args)
+
           raise Chewy::ImportFailed.new(self, errors) if errors.present?

           true
@@ -126,6 +127,32 @@ module Chewy

         private

+        def intercept_import_using_strategy(*args)
+          args_clone = args.deep_dup
+          options = args_clone.extract_options!
+          strategy = options.delete(:strategy)
+
+          return import_routine(*args) if strategy.blank?
+
+          ids = args_clone.flatten
+          return {} if ids.blank?
+          return {argument: {"#{strategy} supports ids only!" => ids}} unless ids.all? do |id|
+            id.respond_to?(:to_i)
+          end
+
+          case strategy
+          when :delayed_sidekiq
+            begin
+              Chewy::Strategy::DelayedSidekiq::Scheduler.new(self, ids, options).postpone
+              {} # success. errors handling convention
+            rescue StandardError => e
+              {scheduler: {e.message => ids}}
+            end
+          else
+            {argument: {"unsupported strategy: '#{strategy}'" => ids}}
+          end
+        end
+
         def import_routine(*args)
           return if !args.first.nil? && empty_objects_or_scope?(args.first)

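The `import` / `import!` pair in this hunk relies on an error-hash convention: the interceptor returns an empty hash on success and a nested hash of `{kind => {message => ids}}` on failure. The following is a minimal plain-Ruby sketch of that convention, not the gem's code. The names `intercept_import`, `import_ok?`, `SKETCH_STRATEGIES` and the `schedule` lambda (standing in for `Scheduler#postpone`) are hypothetical, and `empty?` replaces ActiveSupport's `blank?`.

```ruby
# Hypothetical stand-in for the strategy interception shown in the diff.
SKETCH_STRATEGIES = %i[delayed_sidekiq].freeze

# Returns {} on success, or an error hash shaped like {kind => {message => ids}}.
def intercept_import(ids, strategy:, schedule: ->(_ids) {})
  ids = Array(ids).flatten
  return {} if ids.empty?

  # Only id-like values are accepted, mirroring the respond_to?(:to_i) check.
  unless ids.all? { |id| id.respond_to?(:to_i) }
    return {argument: {"#{strategy} supports ids only!" => ids}}
  end

  unless SKETCH_STRATEGIES.include?(strategy)
    return {argument: {"unsupported strategy: '#{strategy}'" => ids}}
  end

  begin
    schedule.call(ids) # assumption: this is where the real code enqueues work
    {}
  rescue StandardError => e
    {scheduler: {e.message => ids}}
  end
end

# Boolean wrapper, analogous to `import` returning true when no errors occurred.
def import_ok?(ids, **options)
  intercept_import(ids, **options).empty?
end
```

Note that `nil` and strings also respond to `to_i`, so this check only rejects arbitrary objects such as model instances.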
data/lib/chewy/index.rb
CHANGED
@@ -20,6 +20,10 @@ module Chewy
       pipeline raw_import refresh replication
     ].freeze

+    STRATEGY_OPTIONS = {
+      delayed_sidekiq: %i[latency margin ttl reindex_wrapper]
+    }.freeze
+
     include Search
     include Actions
     include Aliases
@@ -221,6 +225,27 @@ module Chewy
         params.assert_valid_keys(IMPORT_OPTIONS_KEYS)
         self._default_import_options = _default_import_options.merge(params)
       end
+
+      def strategy_config(params = {})
+        @strategy_config ||= begin
+          config_struct = Struct.new(*STRATEGY_OPTIONS.keys).new
+
+          STRATEGY_OPTIONS.each_with_object(config_struct) do |(strategy, options), res|
+            res[strategy] = case strategy
+            when :delayed_sidekiq
+              Struct.new(*STRATEGY_OPTIONS[strategy]).new.tap do |config|
+                options.each do |option|
+                  config[option] = params.dig(strategy, option) || Chewy.configuration.dig(:strategy_config, strategy, option)
+                end
+
+                config[:reindex_wrapper] ||= ->(&reindex) { reindex.call } # default wrapper
+              end
+            else
+              raise NotImplementedError, "Unsupported strategy: '#{strategy}'"
+            end
+          end
+        end
+      end
     end
   end
 end
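The `strategy_config` method added here resolves each option from the per-index params first and falls back to the global configuration, with a pass-through `reindex_wrapper` as the last resort. A rough standalone sketch of that lookup order follows. `build_strategy_config`, `SKETCH_STRATEGY_OPTIONS` and the `global_defaults` argument (standing in for `Chewy.configuration[:strategy_config]`) are illustrative names, and the memoization from the original is left out.

```ruby
SKETCH_STRATEGY_OPTIONS = {
  delayed_sidekiq: %i[latency margin ttl reindex_wrapper]
}.freeze

# params:          per-index overrides, e.g. {delayed_sidekiq: {latency: 3}}
# global_defaults: hypothetical stand-in for the application-wide settings
def build_strategy_config(params, global_defaults)
  top = Struct.new(*SKETCH_STRATEGY_OPTIONS.keys).new

  SKETCH_STRATEGY_OPTIONS.each do |strategy, options|
    config = Struct.new(*options).new
    options.each do |option|
      # Per-index value wins; otherwise fall back to the global default.
      config[option] = params.dig(strategy, option) || global_defaults.dig(strategy, option)
    end
    # Default wrapper simply runs the reindex block unchanged.
    config[:reindex_wrapper] ||= ->(&reindex) { reindex.call }
    top[strategy] = config
  end

  top
end
```

Because the fallback uses `||`, an option explicitly set to `nil` or `false` at the index level is treated as unset in this sketch.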
data/lib/chewy/journal.rb
CHANGED
@@ -43,8 +43,12 @@ module Chewy
     #
     # @param until_time [Time, DateTime] time to clean up until it
     # @return [Hash] delete_by_query ES API call result
-    def clean(until_time = nil)
-      Chewy::Stash::Journal.clean(
+    def clean(until_time = nil, delete_by_query_options: {})
+      Chewy::Stash::Journal.clean(
+        until_time,
+        only: @only,
+        delete_by_query_options: delete_by_query_options.merge(refresh: false)
+      )
     end

     private
data/lib/chewy/rake_helper.rb
CHANGED
@@ -19,6 +19,9 @@ module Chewy
       output.puts " Applying journal to #{targets}, #{count} entries, stage #{payload[:stage]}"
     end

+    DELETE_BY_QUERY_OPTIONS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
+    FALSE_VALUES = %w[0 f false off].freeze
+
     class << self
       # Performs zero-downtime reindexing of all documents for the specified indexes
       #
@@ -162,7 +165,7 @@ module Chewy

         subscribed_task_stats(output) do
           output.puts "Applying journal entries created after #{time}"
-          count = Chewy::Journal.new(
+          count = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).apply(time)
           output.puts 'No journal entries were created after the specified time' if count.zero?
         end
       end
@@ -181,12 +184,16 @@ module Chewy
       # @param except [Array<Chewy::Index, String>, Chewy::Index, String] indexes to exclude from processing
       # @param output [IO] output io for logging
       # @return [Array<Chewy::Index>] indexes that were actually updated
-      def journal_clean(time: nil, only: nil, except: nil, output: $stdout)
+      def journal_clean(time: nil, only: nil, except: nil, delete_by_query_options: {}, output: $stdout)
         subscribed_task_stats(output) do
           output.puts "Cleaning journal entries created before #{time}" if time
-          response = Chewy::Journal.new(
-
-
+          response = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).clean(time, delete_by_query_options: delete_by_query_options)
+          if response.key?('task')
+            output.puts "Task to cleanup the journal has been created, #{response['task']}"
+          else
+            count = response['deleted'] || response['_indices']['_all']['deleted']
+            output.puts "Cleaned up #{count} journal entries"
+          end
         end
       end

@@ -228,6 +235,26 @@ module Chewy
         end
       end

+      # Reads options that are required to run journal cleanup asynchronously from ENV hash
+      # @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
+      #
+      # @example
+      #   Chewy::RakeHelper.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => 'false','REQUESTS_PER_SECOND' => '10','SCROLL_SIZE' => '5000'})
+      #   # => { wait_for_completion: false, requests_per_second: 10.0, scroll_size: 5000 }
+      #
+      def delete_by_query_options_from_env(env)
+        env
+          .slice(*DELETE_BY_QUERY_OPTIONS)
+          .transform_keys { |k| k.downcase.to_sym }
+          .to_h do |key, value|
+            case key
+            when :wait_for_completion then [key, !FALSE_VALUES.include?(value.downcase)]
+            when :requests_per_second then [key, value.to_f]
+            when :scroll_size then [key, value.to_i]
+            end
+          end
+      end
+
       def normalize_indexes(*identifiers)
         identifiers.flatten(1).map { |identifier| normalize_index(identifier) }
       end
@@ -243,11 +270,18 @@ module Chewy
         ActiveSupport::Notifications.subscribed(JOURNAL_CALLBACK.curry[output], 'apply_journal.chewy') do
           ActiveSupport::Notifications.subscribed(IMPORT_CALLBACK.curry[output], 'import_objects.chewy', &block)
         end
+      ensure
         output.puts "Total: #{human_duration(Time.now - start)}"
       end

     private

+      def journal_indexes_from(only: nil, except: nil)
+        return if Array.wrap(only).empty? && Array.wrap(except).empty?
+
+        indexes_from(only: only, except: except)
+      end
+
       def indexes_from(only: nil, except: nil)
         indexes = if only.present?
           normalize_indexes(Array.wrap(only))
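The `delete_by_query_options_from_env` helper above turns three environment variables into typed options. The idea can be tried in isolation with a plain-Ruby sketch like the one below. The method and constant names are illustrative, and unlike the diff (which uses `to_f` / `to_i`) this variant uses the strict `Float()` / `Integer()` conversions, which is an assumption made here so that malformed input raises instead of silently becoming `0`.

```ruby
SKETCH_ENV_KEYS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
SKETCH_FALSE_STRINGS = %w[0 f false off].freeze

# env: any Hash-like object of String => String, e.g. ENV or a test hash.
def sketch_delete_by_query_options(env)
  env.to_h.slice(*SKETCH_ENV_KEYS).to_h do |name, value|
    key = name.downcase.to_sym
    parsed =
      case key
      # Anything not in the false list (compared case-insensitively) counts as true.
      when :wait_for_completion then !SKETCH_FALSE_STRINGS.include?(value.downcase)
      when :requests_per_second then Float(value)
      when :scroll_size then Integer(value, 10)
      end
    [key, parsed]
  end
end
```

Variables outside the whitelist are ignored, so passing the whole `ENV` is safe.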
data/lib/chewy/search/request.rb
CHANGED
@@ -962,10 +962,22 @@ module Chewy
       #
       # @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
       # @note The result hash is different for different API used.
-      # @param refresh [true, false]
+      # @param refresh [true, false] Refreshes all shards involved in the delete by query
+      # @param wait_for_completion [true, false] wait for request completion or run it asynchronously
+      #   and return task reference at `.tasks/task/${taskId}`.
+      # @param requests_per_second [Float] The throttle for this request in sub-requests per second
+      # @param scroll_size [Integer] Size of the scroll request that powers the operation
+
       # @return [Hash] the result of query execution
-      def delete_all(refresh: true)
-        request_body = only(WHERE_STORAGES).render.merge(
+      def delete_all(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
+        request_body = only(WHERE_STORAGES).render.merge(
+          {
+            refresh: refresh,
+            wait_for_completion: wait_for_completion,
+            requests_per_second: requests_per_second,
+            scroll_size: scroll_size
+          }.compact
+        )
         ActiveSupport::Notifications.instrument 'delete_query.chewy', notification_payload(request: request_body) do
           request_body[:body] = {query: {match_all: {}}} if request_body[:body].empty?
           Chewy.client.delete_by_query(request_body)
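The new `delete_all` signature defaults the three extra options to `nil` and relies on `Hash#compact` so that unset options never reach the request. A small sketch of just that step (the method name `sketch_delete_params` is made up for illustration) shows why `wait_for_completion: false` still survives:

```ruby
# Builds only the option part of the request; nil-valued options are dropped.
def sketch_delete_params(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
  {
    refresh: refresh,
    wait_for_completion: wait_for_completion,
    requests_per_second: requests_per_second,
    scroll_size: scroll_size
  }.compact # compact removes nil values only, so an explicit false is kept
end
```

This distinction matters because `false` is the meaningful value for asynchronous execution, while `nil` means "use the Elasticsearch default".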
data/lib/chewy/stash.rb
CHANGED
@@ -28,12 +28,12 @@ module Chewy
       # Cleans up all the journal entries until the specified time. If nothing is
       # specified - cleans up everything.
       #
-      # @param
+      # @param until_time [Time, DateTime] Clean everything before that date
       # @param only [Chewy::Index, Array<Chewy::Index>] indexes to clean up journal entries for
-      def self.clean(until_time = nil, only: [])
+      def self.clean(until_time = nil, only: [], delete_by_query_options: {})
         scope = self.for(only)
         scope = scope.filter(range: {created_at: {lte: until_time}}) if until_time
-        scope.delete_all
+        scope.delete_all(**delete_by_query_options)
       end

       # Selects all the journal entries for the specified indices.
data/lib/chewy/strategy/delayed_sidekiq/scheduler.rb
ADDED
@@ -0,0 +1,148 @@
+# frozen_string_literal: true
+
+require_relative '../../index'
+
+# The class is responsible for accumulating in redis [type, ids]
+# that were requested to be reindexed during `latency` seconds.
+# The reindex job is going to be scheduled after a `latency` seconds.
+# that job is going to read accumulated [type, ids] from the redis
+# and reindex all them at once.
+module Chewy
+  class Strategy
+    class DelayedSidekiq
+      require_relative 'worker'
+
+      class Scheduler
+        DEFAULT_TTL = 60 * 60 * 24 # in seconds
+        DEFAULT_LATENCY = 10
+        DEFAULT_MARGIN = 2
+        DEFAULT_QUEUE = 'chewy'
+        KEY_PREFIX = 'chewy:delayed_sidekiq'
+        FALLBACK_FIELDS = 'all'
+        FIELDS_IDS_SEPARATOR = ';'
+        IDS_SEPARATOR = ','
+
+        def initialize(type, ids, options = {})
+          @type = type
+          @ids = ids
+          @options = options
+        end
+
+        # the diagram:
+        #
+        # inputs:
+        # latency == 2
+        # reindex_time = Time.current
+        #
+        # Parallel OR Sequential triggers of reindex:        | What is going on in reindex store (Redis):
+        # --------------------------------------------------------------------------------------------------
+        #                                                    |
+        # process 1 (reindex_time):                          | chewy:delayed_sidekiq:CitiesIndex:1679347866 = [1]
+        # Scheduler.new(CitiesIndex, [1]).postpone           | chewy:delayed_sidekiq:timechunks = [{ score: 1679347866, "chewy:delayed_sidekiq:CitiesIndex:1679347866"}]
+        #                                                    | & schedule a DelayedSidekiq::Worker at 1679347869 (at + 3)
+        #                                                    | it will zpop chewy:delayed_sidekiq:timechunks up to 1679347866 score and reindex all ids with zpoped keys
+        #                                                    | chewy:delayed_sidekiq:CitiesIndex:1679347866
+        #                                                    |
+        #                                                    |
+        # process 2 (reindex_time):                          | chewy:delayed_sidekiq:CitiesIndex:1679347866 = [1, 2]
+        # Scheduler.new(CitiesIndex, [2]).postpone           | chewy:delayed_sidekiq:timechunks = [{ score: 1679347866, "chewy:delayed_sidekiq:CitiesIndex:1679347866"}]
+        #                                                    | & do not schedule a new worker
+        #                                                    |
+        #                                                    |
+        # process 1 (reindex_time + (latency - 1).seconds):  | chewy:delayed_sidekiq:CitiesIndex:1679347866 = [1, 2, 3]
+        # Scheduler.new(CitiesIndex, [3]).postpone           | chewy:delayed_sidekiq:timechunks = [{ score: 1679347866, "chewy:delayed_sidekiq:CitiesIndex:1679347866"}]
+        #                                                    | & do not schedule a new worker
+        #                                                    |
+        #                                                    |
+        # process 2 (reindex_time + (latency + 1).seconds):  | chewy:delayed_sidekiq:CitiesIndex:1679347866 = [1, 2, 3]
+        # Scheduler.new(CitiesIndex, [4]).postpone           | chewy:delayed_sidekiq:CitiesIndex:1679347868 = [4]
+        #                                                    | chewy:delayed_sidekiq:timechunks = [
+        #                                                    |   { score: 1679347866, "chewy:delayed_sidekiq:CitiesIndex:1679347866"}
+        #                                                    |   { score: 1679347868, "chewy:delayed_sidekiq:CitiesIndex:1679347868"}
+        #                                                    | ]
+        #                                                    | & schedule a DelayedSidekiq::Worker at 1679347871 (at + 3)
+        #                                                    | it will zpop chewy:delayed_sidekiq:timechunks up to 1679347868 score and reindex all ids with zpoped keys
+        #                                                    | chewy:delayed_sidekiq:CitiesIndex:1679347866 (in case of failed previous reindex),
+        #                                                    | chewy:delayed_sidekiq:CitiesIndex:1679347868
+        def postpone
+          ::Sidekiq.redis do |redis|
+            # warning: Redis#sadd will always return an Integer in Redis 5.0.0. Use Redis#sadd? instead
+            if redis.respond_to?(:sadd?)
+              redis.sadd?(timechunk_key, serialize_data)
+            else
+              redis.sadd(timechunk_key, serialize_data)
+            end
+
+            redis.expire(timechunk_key, ttl)
+
+            unless redis.zrank(timechunks_key, timechunk_key)
+              redis.zadd(timechunks_key, at, timechunk_key)
+              redis.expire(timechunks_key, ttl)
+
+              ::Sidekiq::Client.push(
+                'queue' => sidekiq_queue,
+                'at' => at + margin,
+                'class' => Chewy::Strategy::DelayedSidekiq::Worker,
+                'args' => [type_name, at]
+              )
+            end
+          end
+        end
+
+      private
+
+        attr_reader :type, :ids, :options
+
+        # this method returns a predictable value that jumps by the latency value;
+        # in other words, within each `latency`-second window it returns the same value
+        def at
+          @at ||= begin
+            schedule_at = latency.seconds.from_now.to_f
+
+            (schedule_at - (schedule_at % latency)).to_i
+          end
+        end
+
+        def fields
+          options[:update_fields].presence || [FALLBACK_FIELDS]
+        end
+
+        def timechunks_key
+          "#{KEY_PREFIX}:#{type_name}:timechunks"
+        end
+
+        def timechunk_key
+          "#{KEY_PREFIX}:#{type_name}:#{at}"
+        end
+
+        def serialize_data
+          [ids.join(IDS_SEPARATOR), fields.join(IDS_SEPARATOR)].join(FIELDS_IDS_SEPARATOR)
+        end
+
+        def type_name
+          type.name
+        end
+
+        def latency
+          strategy_config.latency || DEFAULT_LATENCY
+        end
+
+        def margin
+          strategy_config.margin || DEFAULT_MARGIN
+        end
+
+        def ttl
+          strategy_config.ttl || DEFAULT_TTL
+        end
+
+        def sidekiq_queue
+          Chewy.settings.dig(:sidekiq, :queue) || DEFAULT_QUEUE
+        end
+
+        def strategy_config
+          type.strategy_config.delayed_sidekiq
+        end
+      end
+    end
+  end
+end
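The core of the scheduler is the `at` method, which rounds "now + latency" down to a multiple of `latency` so that every call inside the same window produces the same chunk key. A self-contained sketch of that arithmetic, taking the current time as an explicit argument instead of calling `from_now` (the name `timechunk_at` is hypothetical), reproduces the timestamps used in the diagram:

```ruby
# now:     current time as epoch seconds (Integer or Float)
# latency: window size in seconds
# Returns the epoch second that identifies the time chunk for this call.
def timechunk_at(now, latency)
  schedule_at = now.to_f + latency
  # Subtracting the remainder floors schedule_at to a multiple of latency.
  (schedule_at - (schedule_at % latency)).to_i
end

# With latency == 2, calls at ...864 and ...865 share one chunk,
# while a call at ...866 starts the next one.
chunk_a = timechunk_at(1_679_347_864, 2)
chunk_b = timechunk_at(1_679_347_865, 2)
chunk_c = timechunk_at(1_679_347_866, 2)
```

In the diff the worker is then pushed with `'at' => at + margin`, so the job runs shortly after the window closes.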
@@ -0,0 +1,52 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module Chewy
|
4
|
+
class Strategy
|
5
|
+
class DelayedSidekiq
|
6
|
+
class Worker
|
7
|
+
include ::Sidekiq::Worker
|
8
|
+
|
9
|
+
def perform(type, score, options = {})
|
10
|
+
options[:refresh] = !Chewy.disable_refresh_async if Chewy.disable_refresh_async
|
11
|
+
|
12
|
+
::Sidekiq.redis do |redis|
|
13
|
+
timechunks_key = "#{Scheduler::KEY_PREFIX}:#{type}:timechunks"
|
14
|
+
timechunk_keys = redis.zrangebyscore(timechunks_key, -1, score)
|
15
|
+
members = timechunk_keys.flat_map { |timechunk_key| redis.smembers(timechunk_key) }.compact
|
16
|
+
|
17
|
+
# extract ids and fields & do the reset of records
|
18
|
+
ids, fields = extract_ids_and_fields(members)
|
19
|
+
options[:update_fields] = fields if fields
|
20
|
+
|
21
|
+
index = type.constantize
|
22
|
+
index.strategy_config.delayed_sidekiq.reindex_wrapper.call do
|
23
|
+
options.any? ? index.import!(ids, **options) : index.import!(ids)
|
24
|
+
end
|
25
|
+
|
26
|
+
redis.del(timechunk_keys)
|
27
|
+
redis.zremrangebyscore(timechunks_key, -1, score)
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
private
|
32
|
+
|
33
|
+
def extract_ids_and_fields(members)
|
34
|
+
ids = []
|
35
|
+
fields = []
|
36
|
+
|
37
|
+
members.each do |member|
|
38
|
+
member_ids, member_fields = member.split(Scheduler::FIELDS_IDS_SEPARATOR).map do |v|
|
39
|
+
v.split(Scheduler::IDS_SEPARATOR)
|
40
|
+
end
|
41
|
+
ids |= member_ids
|
42
|
+
fields |= member_fields
|
43
|
+
end
|
44
|
+
|
45
|
+
fields = nil if fields.include?(Scheduler::FALLBACK_FIELDS)
|
46
|
+
|
47
|
+
[ids.map(&:to_i), fields]
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
52
|
+
end
|
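A minimal standalone sketch of the merge step that `extract_ids_and_fields` performs in the worker above: each Redis set member carries an id list and a field list, and the worker unions them across members. The separator and fallback values below are placeholders assumed for illustration; the real ones are the `Scheduler` constants, whose values are not shown in this part of the diff. The method name `merge_members` is hypothetical.

```ruby
# Placeholder values, assumed for illustration only; the gem reads these
# from Scheduler::FIELDS_IDS_SEPARATOR, Scheduler::IDS_SEPARATOR and
# Scheduler::FALLBACK_FIELDS.
FIELDS_IDS_SEPARATOR = ';'
IDS_SEPARATOR = ','
FALLBACK_FIELDS = 'all'

# Hypothetical standalone version of the worker's private helper.
def merge_members(members)
  ids = []
  fields = []

  members.each do |member|
    # "1,2;name" => [["1", "2"], ["name"]]
    member_ids, member_fields = member.split(FIELDS_IDS_SEPARATOR).map do |part|
      part.split(IDS_SEPARATOR)
    end
    ids |= member_ids       # set union keeps each id once
    fields |= member_fields # set union keeps each field once
  end

  # One member asking for a full reindex overrides any partial field lists.
  fields = nil if fields.include?(FALLBACK_FIELDS)

  [ids.map(&:to_i), fields]
end

p merge_members(['1,2;name', '2,3;description'])
# => [[1, 2, 3], ["name", "description"]]
p merge_members(['1;name', '2;all'])
# => [[1, 2], nil]
```

When the merged field list is `nil`, the worker leaves `options[:update_fields]` unset, so `import!` reindexes whole documents; this matches the 'does reindex of all fields' case in the spec further down.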
@@ -0,0 +1,17 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module Chewy
|
4
|
+
class Strategy
|
5
|
+
class DelayedSidekiq < Sidekiq
|
6
|
+
require_relative 'delayed_sidekiq/scheduler'
|
7
|
+
|
8
|
+
def leave
|
9
|
+
@stash.each do |type, ids|
|
10
|
+
next if ids.empty?
|
11
|
+
|
12
|
+
DelayedSidekiq::Scheduler.new(type, ids).postpone
|
13
|
+
end
|
14
|
+
end
|
15
|
+
end
|
16
|
+
end
|
17
|
+
end
|
data/lib/chewy/strategy.rb
CHANGED
data/lib/chewy/version.rb
CHANGED
data/lib/tasks/chewy.rake
CHANGED
@@ -94,7 +94,13 @@ namespace :chewy do
|
|
94
94
|
|
95
95
|
desc 'Removes journal records created before the specified timestamp for the specified indexes/types or all of them'
|
96
96
|
task clean: :environment do |_task, args|
|
97
|
-
Chewy::RakeHelper.
|
97
|
+
delete_options = Chewy::RakeHelper.delete_by_query_options_from_env(ENV)
|
98
|
+
Chewy::RakeHelper.journal_clean(
|
99
|
+
[
|
100
|
+
parse_journal_args(args.extras),
|
101
|
+
{delete_by_query_options: delete_options}
|
102
|
+
].reduce({}, :merge)
|
103
|
+
)
|
98
104
|
end
|
99
105
|
end
|
100
106
|
end
|
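The task above now reads delete-by-query options from the environment and merges them into the journal arguments. The sketch below is one way such parsing could look, written only to be consistent with the expectations in the `.delete_by_query_options_from_env` spec further down; it is not the gem's implementation, and the names `delete_options_from_env` and `TRUE_VALUES` are hypothetical.

```ruby
# Hypothetical list of truthy spellings, chosen to satisfy the spec's
# examples (1, t, true, TRUE, on, ON); everything else is treated as false.
TRUE_VALUES = %w[1 t true on].freeze

# Picks the three known keys out of an ENV-like hash and casts them.
# Unknown keys are ignored; absent keys produce no entry at all.
def delete_options_from_env(env)
  options = {}

  if env.key?('WAIT_FOR_COMPLETION')
    options[:wait_for_completion] = TRUE_VALUES.include?(env['WAIT_FOR_COMPLETION'].downcase)
  end
  options[:requests_per_second] = env['REQUESTS_PER_SECOND'].to_f if env.key?('REQUESTS_PER_SECOND')
  options[:scroll_size] = env['SCROLL_SIZE'].to_i if env.key?('SCROLL_SIZE')

  options
end

p delete_options_from_env('WAIT_FOR_COMPLETION' => 'false', 'REQUESTS_PER_SECOND' => '10', 'SCROLL_SIZE' => '5000')
# => {:wait_for_completion=>false, :requests_per_second=>10.0, :scroll_size=>5000} (Ruby 3.4+ prints {wait_for_completion: false, ...})
```

In the task itself, `[a, b].reduce({}, :merge)` has the same effect as `a.merge(b)`: the parsed journal arguments and the `delete_by_query_options` hash end up in a single options hash passed to `journal_clean`.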
@@ -62,6 +62,8 @@ describe Chewy::Index::Import::BulkBuilder do
|
|
62
62
|
let(:to_index) { cities.first(2) }
|
63
63
|
let(:delete) { [cities.last] }
|
64
64
|
specify do
|
65
|
+
expect(subject).to receive(:data_for).with(cities.first).and_call_original
|
66
|
+
expect(subject).to receive(:data_for).with(cities.second).and_call_original
|
65
67
|
expect(subject.bulk_body).to eq([
|
66
68
|
{index: {_id: 1, data: {'name' => 'City17', 'rating' => 42}}},
|
67
69
|
{index: {_id: 2, data: {'name' => 'City18', 'rating' => 42}}},
|
@@ -72,6 +74,8 @@ describe Chewy::Index::Import::BulkBuilder do
|
|
72
74
|
context ':fields' do
|
73
75
|
let(:fields) { %w[name] }
|
74
76
|
specify do
|
77
|
+
expect(subject).to receive(:data_for).with(cities.first, fields: [:name]).and_call_original
|
78
|
+
expect(subject).to receive(:data_for).with(cities.second, fields: [:name]).and_call_original
|
75
79
|
expect(subject.bulk_body).to eq([
|
76
80
|
{update: {_id: 1, data: {doc: {'name' => 'City17'}}}},
|
77
81
|
{update: {_id: 2, data: {doc: {'name' => 'City18'}}}},
|
@@ -426,6 +426,33 @@ Total: \\d+s\\Z
|
|
426
426
|
described_class.journal_clean(except: CitiesIndex, output: output)
|
427
427
|
expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
|
428
428
|
\\ACleaned up 1 journal entries
|
429
|
+
Total: \\d+s\\Z
|
430
|
+
OUTPUT
|
431
|
+
end
|
432
|
+
|
433
|
+
it 'executes asynchronously' do
|
434
|
+
output = StringIO.new
|
435
|
+
expect(Chewy.client).to receive(:delete_by_query).with(
|
436
|
+
{
|
437
|
+
body: {query: {match_all: {}}},
|
438
|
+
index: ['chewy_journal'],
|
439
|
+
refresh: false,
|
440
|
+
requests_per_second: 10.0,
|
441
|
+
scroll_size: 200,
|
442
|
+
wait_for_completion: false
|
443
|
+
}
|
444
|
+
).and_call_original
|
445
|
+
described_class.journal_clean(
|
446
|
+
output: output,
|
447
|
+
delete_by_query_options: {
|
448
|
+
wait_for_completion: false,
|
449
|
+
requests_per_second: 10.0,
|
450
|
+
scroll_size: 200
|
451
|
+
}
|
452
|
+
)
|
453
|
+
|
454
|
+
expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
|
455
|
+
\\ATask to cleanup the journal has been created, [^\\n]*
|
429
456
|
Total: \\d+s\\Z
|
430
457
|
OUTPUT
|
431
458
|
end
|
@@ -502,4 +529,52 @@ Total: \\d+s\\Z
|
|
502
529
|
end
|
503
530
|
end
|
504
531
|
end
|
532
|
+
|
533
|
+
describe '.delete_by_query_options_from_env' do
|
534
|
+
subject(:options) { described_class.delete_by_query_options_from_env(env) }
|
535
|
+
let(:env) do
|
536
|
+
{
|
537
|
+
'WAIT_FOR_COMPLETION' => 'false',
|
538
|
+
'REQUESTS_PER_SECOND' => '10',
|
539
|
+
'SCROLL_SIZE' => '5000'
|
540
|
+
}
|
541
|
+
end
|
542
|
+
|
543
|
+
it 'parses the options' do
|
544
|
+
expect(options).to eq(
|
545
|
+
wait_for_completion: false,
|
546
|
+
requests_per_second: 10.0,
|
547
|
+
scroll_size: 5000
|
548
|
+
)
|
549
|
+
end
|
550
|
+
|
551
|
+
context 'with different boolean values' do
|
552
|
+
it 'parses the option correctly' do
|
553
|
+
%w[1 t true TRUE on ON].each do |v|
|
554
|
+
expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
|
555
|
+
.to eq(wait_for_completion: true)
|
556
|
+
end
|
557
|
+
|
558
|
+
%w[0 f false FALSE off OFF].each do |v|
|
559
|
+
expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
|
560
|
+
.to eq(wait_for_completion: false)
|
561
|
+
end
|
562
|
+
end
|
563
|
+
end
|
564
|
+
|
565
|
+
context 'with other env' do
|
566
|
+
let(:env) { {'SOME_ENV' => '123', 'REQUESTS_PER_SECOND' => '15'} }
|
567
|
+
|
568
|
+
it 'parses only the options' do
|
569
|
+
expect(options).to eq(requests_per_second: 15.0)
|
570
|
+
end
|
571
|
+
end
|
572
|
+
end
|
573
|
+
|
574
|
+
describe '.subscribed_task_stats' do
|
575
|
+
specify do
|
576
|
+
block_output = described_class.subscribed_task_stats(StringIO.new) { 'expected output' }
|
577
|
+
expect(block_output).to eq('expected output')
|
578
|
+
end
|
579
|
+
end
|
505
580
|
end
|
@@ -817,6 +817,31 @@ describe Chewy::Search::Request do
|
|
817
817
|
request: {index: ['products'], body: {query: {match: {name: 'name3'}}}, refresh: false}
|
818
818
|
)
|
819
819
|
end
|
820
|
+
|
821
|
+
it 'delete records asynchronously' do
|
822
|
+
outer_payload = nil
|
823
|
+
ActiveSupport::Notifications.subscribe('delete_query.chewy') do |_name, _start, _finish, _id, payload|
|
824
|
+
outer_payload = payload
|
825
|
+
end
|
826
|
+
subject.query(match: {name: 'name3'}).delete_all(
|
827
|
+
refresh: false,
|
828
|
+
wait_for_completion: false,
|
829
|
+
requests_per_second: 10.0,
|
830
|
+
scroll_size: 2000
|
831
|
+
)
|
832
|
+
expect(outer_payload).to eq(
|
833
|
+
index: ProductsIndex,
|
834
|
+
indexes: [ProductsIndex],
|
835
|
+
request: {
|
836
|
+
index: ['products'],
|
837
|
+
body: {query: {match: {name: 'name3'}}},
|
838
|
+
refresh: false,
|
839
|
+
wait_for_completion: false,
|
840
|
+
requests_per_second: 10.0,
|
841
|
+
scroll_size: 2000
|
842
|
+
}
|
843
|
+
)
|
844
|
+
end
|
820
845
|
end
|
821
846
|
|
822
847
|
describe '#response=' do
|
@@ -0,0 +1,190 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
if defined?(Sidekiq)
|
4
|
+
require 'sidekiq/testing'
|
5
|
+
require 'mock_redis'
|
6
|
+
|
7
|
+
describe Chewy::Strategy::DelayedSidekiq do
|
8
|
+
around do |example|
|
9
|
+
Chewy.strategy(:bypass) { example.run }
|
10
|
+
end
|
11
|
+
|
12
|
+
before do
|
13
|
+
redis = MockRedis.new
|
14
|
+
allow(Sidekiq).to receive(:redis).and_yield(redis)
|
15
|
+
Sidekiq::Worker.clear_all
|
16
|
+
end
|
17
|
+
|
18
|
+
before do
|
19
|
+
stub_model(:city) do
|
20
|
+
update_index('cities') { self }
|
21
|
+
end
|
22
|
+
|
23
|
+
stub_index(:cities) do
|
24
|
+
index_scope City
|
25
|
+
end
|
26
|
+
end
|
27
|
+
|
28
|
+
let(:city) { City.create!(name: 'hello') }
|
29
|
+
let(:other_city) { City.create!(name: 'world') }
|
30
|
+
|
31
|
+
it 'does not trigger immediate reindex due to its async nature' do
|
32
|
+
expect { [city, other_city].map(&:save!) }
|
33
|
+
.not_to update_index(CitiesIndex, strategy: :delayed_sidekiq)
|
34
|
+
end
|
35
|
+
|
36
|
+
it "respects 'refresh: false' options" do
|
37
|
+
allow(Chewy).to receive(:disable_refresh_async).and_return(true)
|
38
|
+
expect(CitiesIndex).to receive(:import!).with([city.id, other_city.id], refresh: false)
|
39
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id, other_city.id])
|
40
|
+
scheduler.postpone
|
41
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
42
|
+
end
|
43
|
+
|
44
|
+
context 'with default config' do
|
45
|
+
it 'does schedule a job that triggers reindex with default options' do
|
46
|
+
Timecop.freeze do
|
47
|
+
expect(Sidekiq::Client).to receive(:push).with(
|
48
|
+
hash_including(
|
49
|
+
'queue' => 'chewy',
|
50
|
+
'at' => (Time.current.to_i.ceil(-1) + 2.seconds).to_i,
|
51
|
+
'class' => Chewy::Strategy::DelayedSidekiq::Worker,
|
52
|
+
'args' => ['CitiesIndex', an_instance_of(Integer)]
|
53
|
+
)
|
54
|
+
).and_call_original
|
55
|
+
|
56
|
+
expect($stdout).not_to receive(:puts)
|
57
|
+
|
58
|
+
Sidekiq::Testing.inline! do
|
59
|
+
expect { [city, other_city].map(&:save!) }
|
60
|
+
.to update_index(CitiesIndex, strategy: :delayed_sidekiq)
|
61
|
+
.and_reindex(city, other_city).only
|
62
|
+
end
|
63
|
+
end
|
64
|
+
end
|
65
|
+
end
|
66
|
+
|
67
|
+
context 'with custom config' do
|
68
|
+
before do
|
69
|
+
CitiesIndex.strategy_config(
|
70
|
+
delayed_sidekiq: {
|
71
|
+
reindex_wrapper: lambda { |&reindex|
|
72
|
+
puts 'hello'
|
73
|
+
reindex.call
|
74
|
+
},
|
75
|
+
margin: 5,
|
76
|
+
latency: 60
|
77
|
+
}
|
78
|
+
)
|
79
|
+
end
|
80
|
+
|
81
|
+
it 'respects :strategy_config options' do
|
82
|
+
Timecop.freeze do
|
83
|
+
expect(Sidekiq::Client).to receive(:push).with(
|
84
|
+
hash_including(
|
85
|
+
'queue' => 'chewy',
|
86
|
+
'at' => (60.seconds.from_now.change(sec: 0) + 5.seconds).to_i,
|
87
|
+
'class' => Chewy::Strategy::DelayedSidekiq::Worker,
|
88
|
+
'args' => ['CitiesIndex', an_instance_of(Integer)]
|
89
|
+
)
|
90
|
+
).and_call_original
|
91
|
+
|
92
|
+
expect($stdout).to receive(:puts).with('hello') # check that reindex_wrapper works
|
93
|
+
|
94
|
+
Sidekiq::Testing.inline! do
|
95
|
+
expect { [city, other_city].map(&:save!) }
|
96
|
+
.to update_index(CitiesIndex, strategy: :delayed_sidekiq)
|
97
|
+
.and_reindex(city, other_city).only
|
98
|
+
end
|
99
|
+
end
|
100
|
+
end
|
101
|
+
end
|
102
|
+
|
103
|
+
context 'two reindex call within the timewindow' do
|
104
|
+
it 'accumulates all ids and does the reindex one time' do
|
105
|
+
Timecop.freeze do
|
106
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id, city.id]).once
|
107
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id])
|
108
|
+
scheduler.postpone
|
109
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id])
|
110
|
+
scheduler.postpone
|
111
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
112
|
+
end
|
113
|
+
end
|
114
|
+
|
115
|
+
context 'one call with update_fields another one without update_fields' do
|
116
|
+
it 'does reindex of all fields' do
|
117
|
+
Timecop.freeze do
|
118
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id, city.id]).once
|
119
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id], update_fields: ['name'])
|
120
|
+
scheduler.postpone
|
121
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id])
|
122
|
+
scheduler.postpone
|
123
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
124
|
+
end
|
125
|
+
end
|
126
|
+
end
|
127
|
+
|
128
|
+
context 'both calls with different update fields' do
|
129
|
+
it 'does reindex with union of fields' do
|
130
|
+
Timecop.freeze do
|
131
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id, city.id], update_fields: %w[description name]).once
|
132
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id], update_fields: ['name'])
|
133
|
+
scheduler.postpone
|
134
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id], update_fields: ['description'])
|
135
|
+
scheduler.postpone
|
136
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
137
|
+
end
|
138
|
+
end
|
139
|
+
end
|
140
|
+
end
|
141
|
+
|
142
|
+
context 'two calls within different timewindows' do
|
143
|
+
it 'does two separate reindexes' do
|
144
|
+
Timecop.freeze do
|
145
|
+
expect(CitiesIndex).to receive(:import!).with([city.id]).once
|
146
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id]).once
|
147
|
+
Timecop.travel(20.seconds.ago) do
|
148
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id])
|
149
|
+
scheduler.postpone
|
150
|
+
end
|
151
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id])
|
152
|
+
scheduler.postpone
|
153
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
154
|
+
end
|
155
|
+
end
|
156
|
+
end
|
157
|
+
|
158
|
+
context 'first call has update_fields' do
|
159
|
+
it 'does first reindex with the expected update_fields and second without update_fields' do
|
160
|
+
Timecop.freeze do
|
161
|
+
expect(CitiesIndex).to receive(:import!).with([city.id], update_fields: ['name']).once
|
162
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id]).once
|
163
|
+
Timecop.travel(20.seconds.ago) do
|
164
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id], update_fields: ['name'])
|
165
|
+
scheduler.postpone
|
166
|
+
end
|
167
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id])
|
168
|
+
scheduler.postpone
|
169
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
170
|
+
end
|
171
|
+
end
|
172
|
+
end
|
173
|
+
|
174
|
+
context 'both calls have update_fields option' do
|
175
|
+
it 'does both reindexes with their expected update_fields option' do
|
176
|
+
Timecop.freeze do
|
177
|
+
expect(CitiesIndex).to receive(:import!).with([city.id], update_fields: ['name']).once
|
178
|
+
expect(CitiesIndex).to receive(:import!).with([other_city.id], update_fields: ['description']).once
|
179
|
+
Timecop.travel(20.seconds.ago) do
|
180
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [city.id], update_fields: ['name'])
|
181
|
+
scheduler.postpone
|
182
|
+
end
|
183
|
+
scheduler = Chewy::Strategy::DelayedSidekiq::Scheduler.new(CitiesIndex, [other_city.id], update_fields: ['description'])
|
184
|
+
scheduler.postpone
|
185
|
+
Chewy::Strategy::DelayedSidekiq::Worker.drain
|
186
|
+
end
|
187
|
+
end
|
188
|
+
end
|
189
|
+
end
|
190
|
+
end
|
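The `'at'` expectations in the spec above suggest how calls are grouped into time windows: the scheduled time is rounded to a multiple of `latency`, then shifted by `margin`. The sketch below is a reading of those expectations only, not the `Scheduler` code; the function name `scheduled_at` is hypothetical, and the default values of 10 and 2 seconds are inferred from the 'with default config' example, so treat them as assumptions.

```ruby
# Hypothetical helper: rounds (now + latency) down to a multiple of
# latency, then adds margin. Calls that land in the same window get the
# same timestamp, so their ids can be accumulated into one job.
def scheduled_at(now, latency:, margin:)
  target = now.to_i + latency
  target - (target % latency) + margin
end

# Custom config from the spec: latency 60, margin 5.
p scheduled_at(Time.at(130), latency: 60, margin: 5) # => 185
p scheduled_at(Time.at(121), latency: 60, margin: 5) # => 185 (same window)

# Assumed defaults (latency 10, margin 2): calls 20 seconds apart fall
# into different windows, as in 'two calls within different timewindows'.
p scheduled_at(Time.at(133), latency: 10, margin: 2) # => 142
p scheduled_at(Time.at(113), latency: 10, margin: 2) # => 122
```

A larger `latency` batches more updates per job at the cost of staler search results; `margin` leaves a gap between the end of a window and the moment the worker reads it.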
metadata
CHANGED
@@ -1,15 +1,15 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: chewy
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 7.
|
4
|
+
version: 7.3.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Toptal, LLC
|
8
8
|
- pyromaniac
|
9
|
-
autorequire:
|
9
|
+
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date:
|
12
|
+
date: 2023-04-03 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: database_cleaner
|
@@ -39,6 +39,20 @@ dependencies:
|
|
39
39
|
- - ">="
|
40
40
|
- !ruby/object:Gem::Version
|
41
41
|
version: '0'
|
42
|
+
- !ruby/object:Gem::Dependency
|
43
|
+
name: mock_redis
|
44
|
+
requirement: !ruby/object:Gem::Requirement
|
45
|
+
requirements:
|
46
|
+
- - ">="
|
47
|
+
- !ruby/object:Gem::Version
|
48
|
+
version: '0'
|
49
|
+
type: :development
|
50
|
+
prerelease: false
|
51
|
+
version_requirements: !ruby/object:Gem::Requirement
|
52
|
+
requirements:
|
53
|
+
- - ">="
|
54
|
+
- !ruby/object:Gem::Version
|
55
|
+
version: '0'
|
42
56
|
- !ruby/object:Gem::Dependency
|
43
57
|
name: rake
|
44
58
|
requirement: !ruby/object:Gem::Requirement
|
@@ -344,6 +358,9 @@ files:
|
|
344
358
|
- lib/chewy/strategy/atomic_no_refresh.rb
|
345
359
|
- lib/chewy/strategy/base.rb
|
346
360
|
- lib/chewy/strategy/bypass.rb
|
361
|
+
- lib/chewy/strategy/delayed_sidekiq.rb
|
362
|
+
- lib/chewy/strategy/delayed_sidekiq/scheduler.rb
|
363
|
+
- lib/chewy/strategy/delayed_sidekiq/worker.rb
|
347
364
|
- lib/chewy/strategy/lazy_sidekiq.rb
|
348
365
|
- lib/chewy/strategy/sidekiq.rb
|
349
366
|
- lib/chewy/strategy/urgent.rb
|
@@ -437,6 +454,7 @@ files:
|
|
437
454
|
- spec/chewy/strategy/active_job_spec.rb
|
438
455
|
- spec/chewy/strategy/atomic_no_refresh_spec.rb
|
439
456
|
- spec/chewy/strategy/atomic_spec.rb
|
457
|
+
- spec/chewy/strategy/delayed_sidekiq_spec.rb
|
440
458
|
- spec/chewy/strategy/lazy_sidekiq_spec.rb
|
441
459
|
- spec/chewy/strategy/sidekiq_spec.rb
|
442
460
|
- spec/chewy/strategy_spec.rb
|
@@ -449,7 +467,7 @@ homepage: https://github.com/toptal/chewy
|
|
449
467
|
licenses:
|
450
468
|
- MIT
|
451
469
|
metadata: {}
|
452
|
-
post_install_message:
|
470
|
+
post_install_message:
|
453
471
|
rdoc_options: []
|
454
472
|
require_paths:
|
455
473
|
- lib
|
@@ -464,8 +482,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
464
482
|
- !ruby/object:Gem::Version
|
465
483
|
version: '0'
|
466
484
|
requirements: []
|
467
|
-
rubygems_version: 3.
|
468
|
-
signing_key:
|
485
|
+
rubygems_version: 3.3.26
|
486
|
+
signing_key:
|
469
487
|
specification_version: 4
|
470
488
|
summary: Elasticsearch ODM client wrapper
|
471
489
|
test_files:
|
@@ -554,6 +572,7 @@ test_files:
|
|
554
572
|
- spec/chewy/strategy/active_job_spec.rb
|
555
573
|
- spec/chewy/strategy/atomic_no_refresh_spec.rb
|
556
574
|
- spec/chewy/strategy/atomic_spec.rb
|
575
|
+
- spec/chewy/strategy/delayed_sidekiq_spec.rb
|
557
576
|
- spec/chewy/strategy/lazy_sidekiq_spec.rb
|
558
577
|
- spec/chewy/strategy/sidekiq_spec.rb
|
559
578
|
- spec/chewy/strategy_spec.rb
|