chewy 7.2.6 → 7.2.7
- checksums.yaml +4 -4
- data/.github/workflows/ruby.yml +8 -1
- data/CHANGELOG.md +12 -0
- data/README.md +13 -0
- data/lib/chewy/index/import/bulk_builder.rb +4 -5
- data/lib/chewy/journal.rb +6 -2
- data/lib/chewy/rake_helper.rb +38 -5
- data/lib/chewy/search/request.rb +15 -3
- data/lib/chewy/stash.rb +3 -3
- data/lib/chewy/version.rb +1 -1
- data/lib/tasks/chewy.rake +7 -1
- data/spec/chewy/index/import/bulk_builder_spec.rb +4 -0
- data/spec/chewy/rake_helper_spec.rb +68 -0
- data/spec/chewy/search/request_spec.rb +25 -0
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f720ba38bdda37de9d1a9b54d0b9c28efbe444e3c9cdfb8e3a1fef0f01aeaf85
+data.tar.gz: 01041cfe59a33c5b9b07bc46f4e5c0d9a68802a7ddf6ad99aebf7a34c85b32b2
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 01567be06ff1aa7deb0cd9e2459c9a094d80c113e088e1eac7dbdea0cf6b74bbb78579923bf7335bb2114884a3cb3a99ca946009fe3f762566b76a82c17ea7f2
+data.tar.gz: 0cd235971d30435c1d5ad2b33f70643733fb5af02d7d49891668d216396fdaa0326b7aad4c01bcbf3ed207ef3ee9ac37c02ff269087b9bb6c8e498304dd902d9
data/.github/workflows/ruby.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -8,6 +8,18 @@
 
 ### Bugs Fixed
 
+## 7.2.7 (2022-11-15)
+
+### New Features
+
+* [#857](https://github.com/toptal/chewy/pull/857): Allow passing `wait_for_completion`, `request_per_second` and `scroll_size` options to `chewy:journal:clean` rake task and `delete_all` query builder method. ([@konalegi][])([@barthez][])
+
+### Changes
+
+### Bugs Fixed
+
+* [#863](https://github.com/toptal/chewy/pull/863): Fix `crutches` call doesn't respect `update_fields` option. ([@skcc321][])
+
 ## 7.2.6 (2022-06-13)
 
 ### New Features
data/README.md
CHANGED
@@ -677,6 +677,8 @@ You may be wondering why do you need it? The answer is simple: not to lose the d
 
 Imagine that you reset your index in a zero-downtime manner (to separate index), and at the meantime somebody keeps updating the data frequently (to old index). So all these actions will be written to the journal index and you'll be able to apply them after index reset using the `Chewy::Journal` interface.
 
+When enabled, journal can grow to enormous size, consider setting up cron job that would clean it occasionally using [`chewy:journal:clean` rake task](#chewyjournal).
+
 ### Index manipulation
 
 ```ruby
@@ -1144,6 +1146,17 @@ rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)"] # apply journaled changes f
 rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)",users] # apply journaled changes for the past hour on UsersIndex only
 ```
 
+When the size of the journal becomes very large, the classical way of deletion would be obstructive and resource consuming. Fortunately, Chewy internally uses [delete-by-query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html#docs-delete-by-query-task-api) ES function which supports async execution with batching and [throttling](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-throttle).
+
+The available options, which can be set by ENV variables, are listed below:
+* `WAIT_FOR_COMPLETION` - a boolean flag. It controls async execution. It waits by default. When set to `false` (`0`, `f`, `false` or `off` in any case spelling is accepted as `false`), Elasticsearch performs some preflight checks, launches the request, and returns a task reference you can use to cancel the task or get its status.
+* `REQUESTS_PER_SECOND` - float. The throttle for this request in sub-requests per second. No throttling is enforced by default.
+* `SCROLL_SIZE` - integer. The number of documents to be deleted in single sub-request. The default batch size is 1000.
+
+```bash
+rake chewy:journal:clean WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000
+```
+
 ### RSpec integration
 
 Just add `require 'chewy/rspec'` to your spec_helper.rb and you will get additional features:
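The README links to the Elasticsearch throttling docs without spelling out what `REQUESTS_PER_SECOND` does to a run. As those docs describe it, each batch is padded with a wait time equal to the batch size divided by `requests_per_second`, minus the time spent writing the batch. The plain-Ruby sketch below only evaluates that formula; `throttle_wait_time` is a hypothetical helper (not part of Chewy), and the 0.5 s write time is an assumed figure for illustration.

```ruby
# Hypothetical helper (not part of Chewy): evaluates the padding formula from the
# Elasticsearch delete-by-query throttling docs.
#   target_time = batch_size / requests_per_second
#   wait_time   = target_time - write_time
def throttle_wait_time(batch_size:, requests_per_second:, write_time:)
  target_time = batch_size / requests_per_second.to_f
  target_time - write_time
end

# Default batch size (1000) throttled to 500, with an assumed 0.5 s write per batch.
default_wait = throttle_wait_time(batch_size: 1000, requests_per_second: 500, write_time: 0.5)

# README example settings (SCROLL_SIZE=5000, REQUESTS_PER_SECOND=10), same assumed write time.
readme_wait = throttle_wait_time(batch_size: 5000, requests_per_second: 10, write_time: 0.5)

puts default_wait # 1.5
puts readme_wait  # 499.5
```

Per that formula, the README's example values would pause each 5000-document batch for roughly 500 seconds, so `REQUESTS_PER_SECOND` is probably best chosen together with `SCROLL_SIZE` rather than in isolation.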
@@ -48,12 +48,11 @@ module Chewy
 def index_entry(object)
 entry = {}
 entry[:_id] = index_object_ids[object] if index_object_ids[object]
+entry[:routing] = routing(object) if join_field?
 
-data = data_for(object)
 parent = cache(entry[:_id])
-
-
-if parent_changed?(data, parent)
+data = data_for(object) if parent.present?
+if parent.present? && parent_changed?(data, parent)
 reindex_entries(object, data) + reindex_descendants(object)
 elsif @fields.present?
 return [] unless entry[:_id]
@@ -61,7 +60,7 @@ module Chewy
 entry[:data] = {doc: data_for(object, fields: @fields)}
 [{update: entry}]
 else
-entry[:data] = data
+entry[:data] = data || data_for(object)
 [{index: entry}]
 end
 end
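The `index_entry` change appears to be the code side of #863: before 7.2.7, full document data (`data_for(object)`) was built for every object, even when `@fields` is set and only a partial `update` is sent. Afterwards, full data is built only when a cached parent exists or in the final full-`index` branch. The sketch below reproduces just that control flow in plain Ruby; `EntrySketch`, the parent hash, and the `parent_changed?` rule are hypothetical stand-ins, and routing and descendant reindexing are omitted.

```ruby
# Hypothetical, simplified model of BulkBuilder#index_entry after 7.2.7.
# It records every data_for call so the laziness is observable.
class EntrySketch
  attr_reader :calls

  def initialize(fields: [], parents: {})
    @fields = fields
    @parents = parents # stands in for the parent cache lookup
    @calls = []
  end

  # Full data without fields, otherwise only the requested fields.
  def data_for(object, fields: [])
    @calls << [object, fields]
    full = {'name' => object, 'rating' => 42}
    fields.empty? ? full : full.slice(*fields.map(&:to_s))
  end

  # Hypothetical rule: the parent changed if the new data points elsewhere.
  def parent_changed?(data, parent)
    data['parent'] != parent
  end

  def index_entry(id, object)
    entry = {_id: id}
    parent = @parents[id]
    data = data_for(object) unless parent.nil? # full data only when a parent is cached

    if !parent.nil? && parent_changed?(data, parent)
      [{reindex: entry}] # placeholder for reindex_entries + reindex_descendants
    elsif !@fields.empty?
      entry[:data] = {doc: data_for(object, fields: @fields)} # partial build only
      [{update: entry}]
    else
      entry[:data] = data || data_for(object) # reuse data if it was already built
      [{index: entry}]
    end
  end
end

partial = EntrySketch.new(fields: [:name])
partial.index_entry(1, 'City17')
```

After that call, `partial.calls` contains only the partial build `['City17', [:name]]`, which is what the new `expect(subject).to receive(:data_for).with(..., fields: [:name])` spec expectations check in the real builder.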
data/lib/chewy/journal.rb
CHANGED
@@ -43,8 +43,12 @@ module Chewy
 #
 # @param until_time [Time, DateTime] time to clean up until it
 # @return [Hash] delete_by_query ES API call result
-def clean(until_time = nil)
-Chewy::Stash::Journal.clean(
+def clean(until_time = nil, delete_by_query_options: {})
+Chewy::Stash::Journal.clean(
+until_time,
+only: @only,
+delete_by_query_options: delete_by_query_options.merge(refresh: false)
+)
 end
 
 private
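In `Journal#clean`, the caller's options are the receiver of `merge(refresh: false)`, and `Hash#merge` keeps the argument's value on key conflicts, so journal cleanup always sends `refresh: false` even if a caller passes `refresh: true`. A minimal plain-Ruby check; `journal_delete_options` is a hypothetical helper mirroring that one expression.

```ruby
# Hypothetical helper mirroring the merge in Chewy::Journal#clean.
def journal_delete_options(delete_by_query_options = {})
  # The argument of merge wins on conflicting keys, so refresh is always false.
  delete_by_query_options.merge(refresh: false)
end

forced = journal_delete_options(refresh: true, scroll_size: 200)
```

Here `forced` keeps `scroll_size: 200` but carries `refresh: false`.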
data/lib/chewy/rake_helper.rb
CHANGED
@@ -19,6 +19,9 @@ module Chewy
 output.puts " Applying journal to #{targets}, #{count} entries, stage #{payload[:stage]}"
 end
 
+DELETE_BY_QUERY_OPTIONS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
+FALSE_VALUES = %w[0 f false off].freeze
+
 class << self
 # Performs zero-downtime reindexing of all documents for the specified indexes
 #
@@ -162,7 +165,7 @@ module Chewy
 
 subscribed_task_stats(output) do
 output.puts "Applying journal entries created after #{time}"
-count = Chewy::Journal.new(
+count = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).apply(time)
 output.puts 'No journal entries were created after the specified time' if count.zero?
 end
 end
@@ -181,12 +184,16 @@ module Chewy
 # @param except [Array<Chewy::Index, String>, Chewy::Index, String] indexes to exclude from processing
 # @param output [IO] output io for logging
 # @return [Array<Chewy::Index>] indexes that were actually updated
-def journal_clean(time: nil, only: nil, except: nil, output: $stdout)
+def journal_clean(time: nil, only: nil, except: nil, delete_by_query_options: {}, output: $stdout)
 subscribed_task_stats(output) do
 output.puts "Cleaning journal entries created before #{time}" if time
-response = Chewy::Journal.new(
-
-
+response = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).clean(time, delete_by_query_options: delete_by_query_options)
+if response.key?('task')
+output.puts "Task to cleanup the journal has been created, #{response['task']}"
+else
+count = response['deleted'] || response['_indices']['_all']['deleted']
+output.puts "Cleaned up #{count} journal entries"
+end
 end
 end
 
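`journal_clean` now reports differently depending on how Elasticsearch ran the deletion: an async call (`wait_for_completion: false`) yields a task reference under `'task'`, otherwise the count is read from `'deleted'`, falling back to `['_indices']['_all']['deleted']`. The sketch isolates that branch as a hypothetical pure function `cleanup_message`; the response hashes in the checks are hand-written examples, not responses captured from a cluster.

```ruby
# Hypothetical pure function mirroring the branch added to
# Chewy::RakeHelper.journal_clean.
def cleanup_message(response)
  if response.key?('task')
    # wait_for_completion: false -> Elasticsearch hands back a task reference
    "Task to cleanup the journal has been created, #{response['task']}"
  else
    # 0 is truthy in Ruby, so 'deleted' => 0 does not trigger the fallback.
    count = response['deleted'] || response['_indices']['_all']['deleted']
    "Cleaned up #{count} journal entries"
  end
end
```

Because only `nil`/missing `'deleted'` triggers the fallback, a synchronous run that deleted nothing still prints `Cleaned up 0 journal entries`.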
@@ -228,6 +235,26 @@ module Chewy
 end
 end
 
+# Reads options that are required to run journal cleanup asynchronously from ENV hash
+# @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
+#
+# @example
+# Chewy::RakeHelper.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => 'false','REQUESTS_PER_SECOND' => '10','SCROLL_SIZE' => '5000'})
+# # => { wait_for_completion: false, requests_per_second: 10.0, scroll_size: 5000 }
+#
+def delete_by_query_options_from_env(env)
+env
+.slice(*DELETE_BY_QUERY_OPTIONS)
+.transform_keys { |k| k.downcase.to_sym }
+.to_h do |key, value|
+case key
+when :wait_for_completion then [key, !FALSE_VALUES.include?(value.downcase)]
+when :requests_per_second then [key, value.to_f]
+when :scroll_size then [key, value.to_i]
+end
+end
+end
+
 def normalize_indexes(*identifiers)
 identifiers.flatten(1).map { |identifier| normalize_index(identifier) }
 end
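`delete_by_query_options_from_env` uses only core `Hash` methods, so its parsing rules can be tried outside Rails. The module below is a standalone copy of that logic under a hypothetical name, `EnvOptionsSketch`. It makes a few consequences of the rules easy to check: only `0`, `f`, `false` and `off` (any case) switch `WAIT_FOR_COMPLETION` off, so values such as `no` or an empty string count as `true`; a non-numeric `SCROLL_SIZE` becomes `0` through `String#to_i`; and unrelated ENV keys are ignored.

```ruby
# Standalone copy of the parsing logic from the diff, wrapped in a
# hypothetical module so it runs without Rails or Chewy.
module EnvOptionsSketch
  DELETE_BY_QUERY_OPTIONS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
  FALSE_VALUES = %w[0 f false off].freeze

  module_function

  def parse(env)
    env
      .slice(*DELETE_BY_QUERY_OPTIONS)          # keep only the supported keys
      .transform_keys { |k| k.downcase.to_sym } # 'SCROLL_SIZE' -> :scroll_size
      .to_h do |key, value|
        case key
        when :wait_for_completion then [key, !FALSE_VALUES.include?(value.downcase)]
        when :requests_per_second then [key, value.to_f]
        when :scroll_size then [key, value.to_i] # 'abc'.to_i is 0
        end
      end
  end
end
```

For example, `EnvOptionsSketch.parse('WAIT_FOR_COMPLETION' => 'no')` yields `wait_for_completion: true`, which may surprise anyone expecting `no` to disable waiting.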
@@ -248,6 +275,12 @@ module Chewy
 
 private
 
+def journal_indexes_from(only: nil, except: nil)
+return if Array.wrap(only).empty? && Array.wrap(except).empty?
+
+indexes_from(only: only, except: except)
+end
+
 def indexes_from(only: nil, except: nil)
 indexes = if only.present?
 normalize_indexes(Array.wrap(only))
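`journal_indexes_from` returns `nil` when neither `only` nor `except` is given, instead of resolving every known index; the new rake_helper spec in this diff expects a plain `match_all` delete against `chewy_journal` in that case. A plain-Ruby sketch of the guard follows. `wrap` is a rough stand-in for ActiveSupport's `Array.wrap` (the real one also handles objects responding to `to_ary`), and `indexes_from` with its fixed `ALL_INDEXES` list is a hypothetical simplification of the real resolver.

```ruby
# Rough, hypothetical stand-in for ActiveSupport's Array.wrap.
def wrap(object)
  return [] if object.nil?

  object.is_a?(Array) ? object : [object]
end

# Hypothetical index list and resolver standing in for RakeHelper.indexes_from.
ALL_INDEXES = %w[cities countries users].freeze

def indexes_from(only: nil, except: nil)
  selected = wrap(only).empty? ? ALL_INDEXES : wrap(only)
  selected - wrap(except)
end

# Same guard as in the diff: nil means "no index filter" for the journal.
def journal_indexes_from(only: nil, except: nil)
  return if wrap(only).empty? && wrap(except).empty?

  indexes_from(only: only, except: except)
end
```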
data/lib/chewy/search/request.rb
CHANGED
@@ -962,10 +962,22 @@ module Chewy
 #
 # @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
 # @note The result hash is different for different API used.
-# @param refresh [true, false]
+# @param refresh [true, false] Refreshes all shards involved in the delete by query
+# @param wait_for_completion [true, false] wait for request completion or run it asynchronously
+# and return task reference at `.tasks/task/${taskId}`.
+# @param requests_per_second [Float] The throttle for this request in sub-requests per second
+# @param scroll_size [Integer] Size of the scroll request that powers the operation
+
 # @return [Hash] the result of query execution
-def delete_all(refresh: true)
-request_body = only(WHERE_STORAGES).render.merge(
+def delete_all(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
+request_body = only(WHERE_STORAGES).render.merge(
+{
+refresh: refresh,
+wait_for_completion: wait_for_completion,
+requests_per_second: requests_per_second,
+scroll_size: scroll_size
+}.compact
+)
 ActiveSupport::Notifications.instrument 'delete_query.chewy', notification_payload(request: request_body) do
 request_body[:body] = {query: {match_all: {}}} if request_body[:body].empty?
 Chewy.client.delete_by_query(request_body)
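`delete_all` builds its options with `Hash#compact`, which removes only `nil` values. Options the caller leaves unset are therefore not sent to `delete_by_query` at all, while an explicit `wait_for_completion: false` survives. The hypothetical helper below reproduces just that merge; `rendered` stands in for the output of `only(WHERE_STORAGES).render`, and the `match_all` default and notification are left out.

```ruby
# Hypothetical helper reproducing the option merge in the new
# Chewy::Search::Request#delete_all.
def delete_by_query_params(rendered, refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
  rendered.merge(
    {
      refresh: refresh,
      wait_for_completion: wait_for_completion,
      requests_per_second: requests_per_second,
      scroll_size: scroll_size
    }.compact # drops nil values only; false is kept
  )
end

rendered = {index: ['products'], body: {query: {match: {name: 'name3'}}}}
defaults = delete_by_query_params(rendered)
async = delete_by_query_params(rendered, refresh: false, wait_for_completion: false, scroll_size: 2000)
```

`defaults` adds only `refresh: true`; `async` carries `refresh`, `wait_for_completion` and `scroll_size` but no `requests_per_second` key.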
data/lib/chewy/stash.rb
CHANGED
@@ -28,12 +28,12 @@ module Chewy
 # Cleans up all the journal entries until the specified time. If nothing is
 # specified - cleans up everything.
 #
-# @param
+# @param until_time [Time, DateTime] Clean everything before that date
 # @param only [Chewy::Index, Array<Chewy::Index>] indexes to clean up journal entries for
-def self.clean(until_time = nil, only: [])
+def self.clean(until_time = nil, only: [], delete_by_query_options: {})
 scope = self.for(only)
 scope = scope.filter(range: {created_at: {lte: until_time}}) if until_time
-scope.delete_all
+scope.delete_all(**delete_by_query_options)
 end
 
 # Selects all the journal entries for the specified indices.
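`Stash::Journal.clean` forwards its options with `delete_all(**delete_by_query_options)`. An empty hash forwards no keywords, so `delete_all`'s defaults apply; a key that `delete_all` does not declare raises `ArgumentError` instead of being passed through to Elasticsearch. The sketch uses a hypothetical stub with the same keyword signature as the new `delete_all`; `conflicts` is just an example of an undeclared key.

```ruby
# Hypothetical stub with the same keyword signature as the new
# Chewy::Search::Request#delete_all; it returns what it received.
def delete_all(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
  {refresh: refresh, wait_for_completion: wait_for_completion,
   requests_per_second: requests_per_second, scroll_size: scroll_size}
end

# Mirrors the forwarding in Chewy::Stash::Journal.clean.
def clean(delete_by_query_options: {})
  delete_all(**delete_by_query_options)
end

with_defaults = clean
with_scroll = clean(delete_by_query_options: {scroll_size: 500})

begin
  clean(delete_by_query_options: {conflicts: 'proceed'})
rescue ArgumentError => e
  warn "rejected: #{e.message}" # undeclared keywords are not forwarded
end
```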
data/lib/chewy/version.rb
CHANGED
data/lib/tasks/chewy.rake
CHANGED
@@ -94,7 +94,13 @@ namespace :chewy do
 
 desc 'Removes journal records created before the specified timestamp for the specified indexes/types or all of them'
 task clean: :environment do |_task, args|
-Chewy::RakeHelper.
+delete_options = Chewy::RakeHelper.delete_by_query_options_from_env(ENV)
+Chewy::RakeHelper.journal_clean(
+[
+parse_journal_args(args.extras),
+{delete_by_query_options: delete_options}
+].reduce({}, :merge)
+)
 end
 end
 end
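The `clean` task folds the parsed task arguments and the ENV-derived options into one Hash with `reduce({}, :merge)`. Because `journal_clean` declares only keyword arguments, Ruby 3.0 and later no longer convert a Hash passed positionally into keywords, so the positional call as written in the diff would be expected to raise `ArgumentError` there; the sketch double-splats the combined Hash instead. `journal_clean` here is a hypothetical echo stub with the same signature, and `journal_args` is an assumed example of what `parse_journal_args(args.extras)` might return.

```ruby
# Hypothetical stub with the same keyword-only signature as
# Chewy::RakeHelper.journal_clean; it echoes what it received.
def journal_clean(time: nil, only: nil, except: nil, delete_by_query_options: {}, output: $stdout)
  {time: time, only: only, except: except, delete_by_query_options: delete_by_query_options}
end

# Assumed example of parse_journal_args(args.extras) output.
journal_args = {time: '2022-11-15T00:00:00Z', only: ['users']}
# Options as delete_by_query_options_from_env would build them from ENV.
delete_options = {wait_for_completion: false, scroll_size: 5000}

combined = [journal_args, {delete_by_query_options: delete_options}].reduce({}, :merge)

# Double-splat so the Hash is received as keyword arguments on Ruby 2.x and 3.x alike.
received = journal_clean(**combined)
```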
@@ -62,6 +62,8 @@ describe Chewy::Index::Import::BulkBuilder do
 let(:to_index) { cities.first(2) }
 let(:delete) { [cities.last] }
 specify do
+expect(subject).to receive(:data_for).with(cities.first).and_call_original
+expect(subject).to receive(:data_for).with(cities.second).and_call_original
 expect(subject.bulk_body).to eq([
 {index: {_id: 1, data: {'name' => 'City17', 'rating' => 42}}},
 {index: {_id: 2, data: {'name' => 'City18', 'rating' => 42}}},
@@ -72,6 +74,8 @@ describe Chewy::Index::Import::BulkBuilder do
 context ':fields' do
 let(:fields) { %w[name] }
 specify do
+expect(subject).to receive(:data_for).with(cities.first, fields: [:name]).and_call_original
+expect(subject).to receive(:data_for).with(cities.second, fields: [:name]).and_call_original
 expect(subject.bulk_body).to eq([
 {update: {_id: 1, data: {doc: {'name' => 'City17'}}}},
 {update: {_id: 2, data: {doc: {'name' => 'City18'}}}},
@@ -426,6 +426,33 @@ Total: \\d+s\\Z
 described_class.journal_clean(except: CitiesIndex, output: output)
 expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
 \\ACleaned up 1 journal entries
+Total: \\d+s\\Z
+OUTPUT
+end
+
+it 'executes asynchronously' do
+output = StringIO.new
+expect(Chewy.client).to receive(:delete_by_query).with(
+{
+body: {query: {match_all: {}}},
+index: ['chewy_journal'],
+refresh: false,
+requests_per_second: 10.0,
+scroll_size: 200,
+wait_for_completion: false
+}
+).and_call_original
+described_class.journal_clean(
+output: output,
+delete_by_query_options: {
+wait_for_completion: false,
+requests_per_second: 10.0,
+scroll_size: 200
+}
+)
+
+expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
+\\ATask to cleanup the journal has been created, [^\\n]*
 Total: \\d+s\\Z
 OUTPUT
 end
@@ -502,4 +529,45 @@ Total: \\d+s\\Z
 end
 end
 end
+
+describe '.delete_by_query_options_from_env' do
+subject(:options) { described_class.delete_by_query_options_from_env(env) }
+let(:env) do
+{
+'WAIT_FOR_COMPLETION' => 'false',
+'REQUESTS_PER_SECOND' => '10',
+'SCROLL_SIZE' => '5000'
+}
+end
+
+it 'parses the options' do
+expect(options).to eq(
+wait_for_completion: false,
+requests_per_second: 10.0,
+scroll_size: 5000
+)
+end
+
+context 'with different boolean values' do
+it 'parses the option correctly' do
+%w[1 t true TRUE on ON].each do |v|
+expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
+.to eq(wait_for_completion: true)
+end
+
+%w[0 f false FALSE off OFF].each do |v|
+expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
+.to eq(wait_for_completion: false)
+end
+end
+end
+
+context 'with other env' do
+let(:env) { {'SOME_ENV' => '123', 'REQUESTS_PER_SECOND' => '15'} }
+
+it 'parses only the options' do
+expect(options).to eq(requests_per_second: 15.0)
+end
+end
+end
 end
@@ -817,6 +817,31 @@ describe Chewy::Search::Request do
 request: {index: ['products'], body: {query: {match: {name: 'name3'}}}, refresh: false}
 )
 end
+
+it 'delete records asynchronously' do
+outer_payload = nil
+ActiveSupport::Notifications.subscribe('delete_query.chewy') do |_name, _start, _finish, _id, payload|
+outer_payload = payload
+end
+subject.query(match: {name: 'name3'}).delete_all(
+refresh: false,
+wait_for_completion: false,
+requests_per_second: 10.0,
+scroll_size: 2000
+)
+expect(outer_payload).to eq(
+index: ProductsIndex,
+indexes: [ProductsIndex],
+request: {
+index: ['products'],
+body: {query: {match: {name: 'name3'}}},
+refresh: false,
+wait_for_completion: false,
+requests_per_second: 10.0,
+scroll_size: 2000
+}
+)
+end
 end
 
 describe '#response=' do
metadata
CHANGED
@@ -1,15 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: chewy
 version: !ruby/object:Gem::Version
-version: 7.2.
+version: 7.2.7
 platform: ruby
 authors:
 - Toptal, LLC
 - pyromaniac
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-
+date: 2022-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: database_cleaner
@@ -449,7 +449,7 @@ homepage: https://github.com/toptal/chewy
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -464,8 +464,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-rubygems_version: 3.2.
-signing_key:
+rubygems_version: 3.2.33
+signing_key:
 specification_version: 4
 summary: Elasticsearch ODM client wrapper
 test_files: