chewy 7.2.6 → 7.2.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 96329a4913f6308f4c7435cd50b962a36dc63b728b50567ff53ce2cb8011086f
-   data.tar.gz: a25b93955769a85fdf5de68a6d284060a981c2e3f6e10c929b0cd49816c19720
+   metadata.gz: f720ba38bdda37de9d1a9b54d0b9c28efbe444e3c9cdfb8e3a1fef0f01aeaf85
+   data.tar.gz: 01041cfe59a33c5b9b07bc46f4e5c0d9a68802a7ddf6ad99aebf7a34c85b32b2
  SHA512:
-   metadata.gz: 66a3dfa94536e8d7b262a6a0ccfe32ff76ca7431850722eef21b14a11b3d162ad0346c3a469293696ce79916c44db7fb17b56f462fd4083c619b2dde09fa6c29
-   data.tar.gz: 6e15e9fcb103c64a5565add1e5200caa4bf1018d3fc7e9ac79a8db6829a5bb1699ac77910e838d59df2f6f195d019a88c22987d51f1b02ebdbb8f37844cce8d3
+   metadata.gz: 01567be06ff1aa7deb0cd9e2459c9a094d80c113e088e1eac7dbdea0cf6b74bbb78579923bf7335bb2114884a3cb3a99ca946009fe3f762566b76a82c17ea7f2
+   data.tar.gz: 0cd235971d30435c1d5ad2b33f70643733fb5af02d7d49891668d216396fdaa0326b7aad4c01bcbf3ed207ef3ee9ac37c02ff269087b9bb6c8e498304dd902d9
@@ -1,6 +1,13 @@
  name: CI

- on: [push, pull_request]
+ on:
+   push:
+     branches: [master]
+   pull_request:
+     types: [
+       synchronize, # PR was updated
+       opened # PR was open
+     ]

  jobs:
    ruby-2:
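The new `on:` block narrows when CI runs: pushes trigger it only on `master`, and pull requests only when they are opened or updated. GitHub's default `pull_request` activity types are `opened`, `synchronize` and `reopened`, so this configuration also drops `reopened`. A simplified Ruby model of that filter, assuming exact branch-name matching (GitHub actually treats `branches` entries as glob patterns); `ci_triggered?` is a hypothetical helper, not a GitHub Actions API:

```ruby
# Hypothetical model of the workflow's `on:` filter (not a GitHub Actions API).
# Branch entries are compared exactly here; GitHub treats them as globs.
PUSH_BRANCHES = ['master'].freeze
PR_TYPES = %w[synchronize opened].freeze

def ci_triggered?(event, branch: nil, action: nil)
  case event
  when :push then PUSH_BRANCHES.include?(branch)
  when :pull_request then PR_TYPES.include?(action)
  else false
  end
end

ci_triggered?(:push, branch: 'master')           # => true
ci_triggered?(:push, branch: 'feature/journal')  # => false
ci_triggered?(:pull_request, action: 'reopened') # => false
```

Presumably the motivation is that a commit pushed to a same-repository PR branch previously started two runs (one for `push`, one for `pull_request`), and now starts one.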
data/CHANGELOG.md CHANGED
@@ -8,6 +8,18 @@

  ### Bugs Fixed

+ ## 7.2.7 (2022-11-15)
+
+ ### New Features
+
+ * [#857](https://github.com/toptal/chewy/pull/857): Allow passing `wait_for_completion`, `request_per_second` and `scroll_size` options to `chewy:journal:clean` rake task and `delete_all` query builder method. ([@konalegi][])([@barthez][])
+
+ ### Changes
+
+ ### Bugs Fixed
+
+ * [#863](https://github.com/toptal/chewy/pull/863): Fix `crutches` call doesn't respect `update_fields` option. ([@skcc321][])
+
  ## 7.2.6 (2022-06-13)

  ### New Features
data/README.md CHANGED
@@ -677,6 +677,8 @@ You may be wondering why do you need it? The answer is simple: not to lose the d

  Imagine that you reset your index in a zero-downtime manner (to separate index), and at the meantime somebody keeps updating the data frequently (to old index). So all these actions will be written to the journal index and you'll be able to apply them after index reset using the `Chewy::Journal` interface.

+ When enabled, journal can grow to enormous size, consider setting up cron job that would clean it occasionally using [`chewy:journal:clean` rake task](#chewyjournal).
+
  ### Index manipulation

  ```ruby
@@ -1144,6 +1146,17 @@ rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)"] # apply journaled changes f
  rake chewy:journal:apply["$(date -v-1H -u +%FT%TZ)",users] # apply journaled changes for the past hour on UsersIndex only
  ```

+ When the size of the journal becomes very large, the classical way of deletion would be obstructive and resource consuming. Fortunately, Chewy internally uses [delete-by-query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-delete-by-query.html#docs-delete-by-query-task-api) ES function which supports async execution with batching and [throttling](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-throttle).
+
+ The available options, which can be set by ENV variables, are listed below:
+ * `WAIT_FOR_COMPLETION` - a boolean flag. It controls async execution. It waits by default. When set to `false` (`0`, `f`, `false` or `off` in any case spelling is accepted as `false`), Elasticsearch performs some preflight checks, launches the request, and returns a task reference you can use to cancel the task or get its status.
+ * `REQUESTS_PER_SECOND` - float. The throttle for this request in sub-requests per second. No throttling is enforced by default.
+ * `SCROLL_SIZE` - integer. The number of documents to be deleted in single sub-request. The default batch size is 1000.
+
+ ```bash
+ rake chewy:journal:clean WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000
+ ```
+
  ### RSpec integration

  Just add `require 'chewy/rspec'` to your spec_helper.rb and you will get additional features:
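The README now recommends cleaning the journal periodically, and the rake task's description says it removes records created before a given timestamp. As a sketch, a scheduled job could compute that cutoff in the same `%FT%TZ` layout the README's `date` examples use and append the new ENV options. `journal_clean_command` is a hypothetical helper, and the 7-day retention and option values are arbitrary examples:

```ruby
# Hypothetical helper (not part of Chewy): builds a `chewy:journal:clean`
# invocation that removes entries older than `retention` seconds and passes
# the delete-by-query options introduced in 7.2.7 as ENV assignments.
def journal_clean_command(now: Time.now, retention: 7 * 24 * 3600,
                          wait_for_completion: false, requests_per_second: 10, scroll_size: 5000)
  # Same timestamp layout as the README's `date -u +%FT%TZ` examples.
  cutoff = (now - retention).getutc.strftime('%FT%TZ')
  env = {
    'WAIT_FOR_COMPLETION' => wait_for_completion,
    'REQUESTS_PER_SECOND' => requests_per_second,
    'SCROLL_SIZE' => scroll_size
  }.map { |name, value| "#{name}=#{value}" }.join(' ')
  %(rake chewy:journal:clean["#{cutoff}"] #{env})
end

puts journal_clean_command(now: Time.utc(2022, 11, 15, 12, 0, 0))
# rake chewy:journal:clean["2022-11-08T12:00:00Z"] WAIT_FOR_COMPLETION=false REQUESTS_PER_SECOND=10 SCROLL_SIZE=5000
```

A crontab entry would run the resulting string from the application directory, for example once a day.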
@@ -48,12 +48,11 @@ module Chewy
  def index_entry(object)
    entry = {}
    entry[:_id] = index_object_ids[object] if index_object_ids[object]
+   entry[:routing] = routing(object) if join_field?

-   data = data_for(object)
    parent = cache(entry[:_id])
-
-   entry[:routing] = routing(object) if join_field?
-   if parent_changed?(data, parent)
+   data = data_for(object) if parent.present?
+   if parent.present? && parent_changed?(data, parent)
      reindex_entries(object, data) + reindex_descendants(object)
    elsif @fields.present?
      return [] unless entry[:_id]
@@ -61,7 +60,7 @@ module Chewy
      entry[:data] = {doc: data_for(object, fields: @fields)}
      [{update: entry}]
    else
-     entry[:data] = data
+     entry[:data] = data || data_for(object)
      [{index: entry}]
    end
  end
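The `index_entry` change makes the full `data_for(object)` call lazy. It now runs only when a cached parent exists (to detect a parent change) or in the plain `index` branch, so an `update_fields` import serializes only the requested fields. That is presumably how #863 keeps `crutches` from being evaluated for the whole document. A stripped-down sketch that records serializer calls; `Builder`, `serialize_calls`, the hash-based objects and the `parent_id` comparison standing in for `parent_changed?` are all hypothetical, not Chewy code:

```ruby
# Hypothetical stand-in for BulkBuilder#index_entry, reduced to the data_for logic.
class Builder
  attr_reader :serialize_calls

  def initialize(fields: [], parents: {})
    @fields = fields
    @parents = parents # id => cached parent id (stands in for `cache`)
    @serialize_calls = []
  end

  # Records whether a full or a partial serialization was requested.
  def data_for(object, fields: [])
    @serialize_calls << (fields.empty? ? :full : fields)
    fields.empty? ? object : object.slice(*fields)
  end

  def index_entry(object)
    entry = {_id: object[:id]}
    parent = @parents[object[:id]]
    data = data_for(object) if parent # full data only when a parent is cached
    if parent && parent != data[:parent_id] # stands in for parent_changed?
      [{reindex: entry}]
    elsif !@fields.empty?
      entry[:data] = {doc: data_for(object, fields: @fields)}
      [{update: entry}]
    else
      entry[:data] = data || data_for(object) # reuse data if already computed
      [{index: entry}]
    end
  end
end

builder = Builder.new(fields: [:name])
builder.index_entry({id: 1, name: 'City17', rating: 42})
builder.serialize_calls # only the partial [:name] serialization was performed
```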
data/lib/chewy/journal.rb CHANGED
@@ -43,8 +43,12 @@ module Chewy
  #
  # @param until_time [Time, DateTime] time to clean up until it
  # @return [Hash] delete_by_query ES API call result
- def clean(until_time = nil)
-   Chewy::Stash::Journal.clean(until_time, only: @only)
+ def clean(until_time = nil, delete_by_query_options: {})
+   Chewy::Stash::Journal.clean(
+     until_time,
+     only: @only,
+     delete_by_query_options: delete_by_query_options.merge(refresh: false)
+   )
  end

  private
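`Journal#clean` merges `refresh: false` into whatever options the caller passes. Because `Hash#merge` lets the argument win on duplicate keys, a caller cannot re-enable `refresh` through `delete_by_query_options`, while the other options pass through untouched and the caller's hash is not modified. The option values below are arbitrary examples:

```ruby
caller_options = {refresh: true, wait_for_completion: false, scroll_size: 5000}

# Same expression as in Journal#clean: keys from the argument take precedence,
# and merge returns a new hash instead of mutating caller_options.
effective = caller_options.merge(refresh: false)

effective[:refresh]      # forced to false
effective[:scroll_size]  # passed through unchanged (5000)
```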
@@ -19,6 +19,9 @@ module Chewy
    output.puts " Applying journal to #{targets}, #{count} entries, stage #{payload[:stage]}"
  end

+ DELETE_BY_QUERY_OPTIONS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
+ FALSE_VALUES = %w[0 f false off].freeze
+
  class << self
    # Performs zero-downtime reindexing of all documents for the specified indexes
    #
@@ -162,7 +165,7 @@ module Chewy

      subscribed_task_stats(output) do
        output.puts "Applying journal entries created after #{time}"
-       count = Chewy::Journal.new(indexes_from(only: only, except: except)).apply(time)
+       count = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).apply(time)
        output.puts 'No journal entries were created after the specified time' if count.zero?
      end
    end
@@ -181,12 +184,16 @@ module Chewy
    # @param except [Array<Chewy::Index, String>, Chewy::Index, String] indexes to exclude from processing
    # @param output [IO] output io for logging
    # @return [Array<Chewy::Index>] indexes that were actually updated
-   def journal_clean(time: nil, only: nil, except: nil, output: $stdout)
+   def journal_clean(time: nil, only: nil, except: nil, delete_by_query_options: {}, output: $stdout)
      subscribed_task_stats(output) do
        output.puts "Cleaning journal entries created before #{time}" if time
-       response = Chewy::Journal.new(indexes_from(only: only, except: except)).clean(time)
-       count = response['deleted'] || response['_indices']['_all']['deleted']
-       output.puts "Cleaned up #{count} journal entries"
+       response = Chewy::Journal.new(journal_indexes_from(only: only, except: except)).clean(time, delete_by_query_options: delete_by_query_options)
+       if response.key?('task')
+         output.puts "Task to cleanup the journal has been created, #{response['task']}"
+       else
+         count = response['deleted'] || response['_indices']['_all']['deleted']
+         output.puts "Cleaned up #{count} journal entries"
+       end
      end
    end
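`journal_clean` now distinguishes two response shapes from delete-by-query. With `wait_for_completion: false`, Elasticsearch returns a task reference under `'task'`; otherwise the task reads a deletion count, either at the top level or, via the pre-existing fallback, from `response['_indices']['_all']['deleted']`. A plain-Ruby sketch of that branching; `clean_summary` is a hypothetical name and the sample hashes are illustrative, not captured Elasticsearch output:

```ruby
# Hypothetical extraction of the message logic from RakeHelper.journal_clean.
def clean_summary(response)
  if response.key?('task')
    # Asynchronous run: only a task reference is available at this point.
    "Task to cleanup the journal has been created, #{response['task']}"
  else
    # Synchronous run: the count sits at the top level or under _indices._all.
    count = response['deleted'] || response['_indices']['_all']['deleted']
    "Cleaned up #{count} journal entries"
  end
end

clean_summary({'task' => 'node-id:12345'})
clean_summary({'deleted' => 3})
clean_summary({'_indices' => {'_all' => {'deleted' => 7}}})
```

The returned task id can then be used with the Elasticsearch tasks API (`GET _tasks/<task_id>`) to check progress or cancel the cleanup.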

@@ -228,6 +235,26 @@ module Chewy
      end
    end

+   # Reads options that are required to run journal cleanup asynchronously from ENV hash
+   # @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
+   #
+   # @example
+   #   Chewy::RakeHelper.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => 'false','REQUESTS_PER_SECOND' => '10','SCROLL_SIZE' => '5000'})
+   #   # => { wait_for_completion: false, requests_per_second: 10.0, scroll_size: 5000 }
+   #
+   def delete_by_query_options_from_env(env)
+     env
+       .slice(*DELETE_BY_QUERY_OPTIONS)
+       .transform_keys { |k| k.downcase.to_sym }
+       .to_h do |key, value|
+         case key
+         when :wait_for_completion then [key, !FALSE_VALUES.include?(value.downcase)]
+         when :requests_per_second then [key, value.to_f]
+         when :scroll_size then [key, value.to_i]
+         end
+       end
+   end
+
    def normalize_indexes(*identifiers)
      identifiers.flatten(1).map { |identifier| normalize_index(identifier) }
    end
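`delete_by_query_options_from_env` relies only on core `Hash` methods (`slice`, `transform_keys`, and `to_h` with a block, available since Ruby 2.6), so it can be exercised outside Rails. The copy below is the same logic lifted into a standalone module (the name `EnvOptions` is hypothetical) to show its lenient edges: any value outside `FALSE_VALUES` counts as true, and `to_f`/`to_i` turn malformed numbers into `0.0`/`0` instead of raising.

```ruby
module EnvOptions
  DELETE_BY_QUERY_OPTIONS = %w[WAIT_FOR_COMPLETION REQUESTS_PER_SECOND SCROLL_SIZE].freeze
  FALSE_VALUES = %w[0 f false off].freeze

  # Same transformation as Chewy::RakeHelper.delete_by_query_options_from_env.
  def self.from_env(env)
    env
      .slice(*DELETE_BY_QUERY_OPTIONS)          # ignore unrelated variables
      .transform_keys { |k| k.downcase.to_sym } # 'SCROLL_SIZE' -> :scroll_size
      .to_h do |key, value|
        case key
        when :wait_for_completion then [key, !FALSE_VALUES.include?(value.downcase)]
        when :requests_per_second then [key, value.to_f]
        when :scroll_size then [key, value.to_i]
        end
      end
  end
end

EnvOptions.from_env({'WAIT_FOR_COMPLETION' => 'Off'}) # wait_for_completion: false
EnvOptions.from_env({'WAIT_FOR_COMPLETION' => 'no'})  # wait_for_completion: true ('no' is not a false value)
EnvOptions.from_env({'SCROLL_SIZE' => 'lots'})        # scroll_size: 0
```

Since a typo such as `WAIT_FOR_COMPLETION=no` silently keeps the synchronous behavior, sticking to the documented `0`, `f`, `false` or `off` spellings is the safe choice.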
@@ -248,6 +275,12 @@ module Chewy

    private

+   def journal_indexes_from(only: nil, except: nil)
+     return if Array.wrap(only).empty? && Array.wrap(except).empty?
+
+     indexes_from(only: only, except: except)
+   end
+
    def indexes_from(only: nil, except: nil)
      indexes = if only.present?
        normalize_indexes(Array.wrap(only))
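`journal_indexes_from` returns `nil` when neither `only` nor `except` is given, instead of expanding to every known index. With no index list, `Chewy::Journal` presumably applies or cleans entries without restricting them to particular indexes. A sketch of the guard in plain Ruby, with `Kernel#Array` standing in for ActiveSupport's `Array.wrap` (they agree for `nil`, single values and arrays) and a simplified `indexes_from` over a hypothetical `ALL_INDEXES` list:

```ruby
ALL_INDEXES = %w[cities countries users].freeze # hypothetical index names

# Simplified stand-in for RakeHelper.indexes_from.
def indexes_from(only: nil, except: nil)
  selected = Array(only).empty? ? ALL_INDEXES : Array(only)
  selected - Array(except)
end

# Same guard as RakeHelper.journal_indexes_from: no filters means nil.
def journal_indexes_from(only: nil, except: nil)
  return if Array(only).empty? && Array(except).empty?

  indexes_from(only: only, except: except)
end

journal_indexes_from                   # => nil
journal_indexes_from(only: 'users')    # => ["users"]
journal_indexes_from(except: 'cities') # => ["countries", "users"]
```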
@@ -962,10 +962,22 @@ module Chewy
  #
  # @see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
  # @note The result hash is different for different API used.
- # @param refresh [true, false] field names
+ # @param refresh [true, false] Refreshes all shards involved in the delete by query
+ # @param wait_for_completion [true, false] wait for request completion or run it asynchronously
+ #   and return task reference at `.tasks/task/${taskId}`.
+ # @param requests_per_second [Float] The throttle for this request in sub-requests per second
+ # @param scroll_size [Integer] Size of the scroll request that powers the operation
+
  # @return [Hash] the result of query execution
- def delete_all(refresh: true)
-   request_body = only(WHERE_STORAGES).render.merge(refresh: refresh)
+ def delete_all(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
+   request_body = only(WHERE_STORAGES).render.merge(
+     {
+       refresh: refresh,
+       wait_for_completion: wait_for_completion,
+       requests_per_second: requests_per_second,
+       scroll_size: scroll_size
+     }.compact
+   )
    ActiveSupport::Notifications.instrument 'delete_query.chewy', notification_payload(request: request_body) do
      request_body[:body] = {query: {match_all: {}}} if request_body[:body].empty?
      Chewy.client.delete_by_query(request_body)
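`delete_all` builds its options with `{...}.compact`, so arguments left at `nil` never reach the client and Elasticsearch's own defaults apply (no throttling, 1000-document batches, as the README notes); only `refresh` is always sent. In the Elasticsearch REST API these options are URL query parameters of `POST /<index>/_delete_by_query`. The sketch below shows that mapping with the standard library's `URI.encode_www_form`; `delete_all_options` and `delete_by_query_path` are hypothetical helpers, not how `elasticsearch-ruby` builds requests internally:

```ruby
require 'uri'

# Mirrors the option handling in Request#delete_all: nil options are dropped.
def delete_all_options(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
  {
    refresh: refresh,
    wait_for_completion: wait_for_completion,
    requests_per_second: requests_per_second,
    scroll_size: scroll_size
  }.compact
end

# Hypothetical helper: renders the options as a delete-by-query URL path.
def delete_by_query_path(index, params)
  "/#{index}/_delete_by_query?#{URI.encode_www_form(params)}"
end

defaults = delete_all_options
async = delete_all_options(refresh: false, wait_for_completion: false, scroll_size: 2000)

delete_by_query_path('chewy_journal', defaults)
# => "/chewy_journal/_delete_by_query?refresh=true"
delete_by_query_path('chewy_journal', async)
# => "/chewy_journal/_delete_by_query?refresh=false&wait_for_completion=false&scroll_size=2000"
```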
data/lib/chewy/stash.rb CHANGED
@@ -28,12 +28,12 @@ module Chewy
  # Cleans up all the journal entries until the specified time. If nothing is
  # specified - cleans up everything.
  #
- # @param since_time [Time, DateTime] the time top boundary
+ # @param until_time [Time, DateTime] Clean everything before that date
  # @param only [Chewy::Index, Array<Chewy::Index>] indexes to clean up journal entries for
- def self.clean(until_time = nil, only: [])
+ def self.clean(until_time = nil, only: [], delete_by_query_options: {})
    scope = self.for(only)
    scope = scope.filter(range: {created_at: {lte: until_time}}) if until_time
-   scope.delete_all
+   scope.delete_all(**delete_by_query_options)
  end

  # Selects all the journal entries for the specified indices.
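`Stash::Journal.clean` forwards its options with `**delete_by_query_options`. Splatting an empty hash passes no keyword arguments at all, so direct callers that omit the option keep `delete_all`'s defaults (`refresh: true`, everything else unset). Any key that `delete_all` does not declare, such as a misspelled `request_per_second`, raises `ArgumentError`. A minimal check with hypothetical stand-ins for `delete_all` and `clean`:

```ruby
# Hypothetical stand-in with the same keyword signature as Request#delete_all.
def fake_delete_all(refresh: true, wait_for_completion: nil, requests_per_second: nil, scroll_size: nil)
  {refresh: refresh, wait_for_completion: wait_for_completion,
   requests_per_second: requests_per_second, scroll_size: scroll_size}.compact
end

# Hypothetical stand-in for Stash::Journal.clean's forwarding.
def clean(delete_by_query_options: {})
  fake_delete_all(**delete_by_query_options)
end

clean                                              # defaults: refresh true only
clean(delete_by_query_options: {scroll_size: 500}) # refresh true, scroll_size 500

begin
  clean(delete_by_query_options: {request_per_second: 10}) # misspelled key
rescue ArgumentError => e
  puts e.message # reports the unknown keyword
end
```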
data/lib/chewy/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Chewy
-   VERSION = '7.2.6'.freeze
+   VERSION = '7.2.7'.freeze
  end
data/lib/tasks/chewy.rake CHANGED
@@ -94,7 +94,13 @@ namespace :chewy do

      desc 'Removes journal records created before the specified timestamp for the specified indexes/types or all of them'
      task clean: :environment do |_task, args|
-       Chewy::RakeHelper.journal_clean(**parse_journal_args(args.extras))
+       delete_options = Chewy::RakeHelper.delete_by_query_options_from_env(ENV)
+       Chewy::RakeHelper.journal_clean(
+         [
+           parse_journal_args(args.extras),
+           {delete_by_query_options: delete_options}
+         ].reduce({}, :merge)
+       )
      end
    end
  end
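The rake task combines the parsed task arguments with the ENV-derived options via `[...].reduce({}, :merge)`, which merges the hashes left to right (later hashes win on duplicate keys). The sketch assumes `parse_journal_args` returns a hash like `{time: ..., only: [...]}`, since its output is not shown in this diff, and uses a hypothetical keyword-only stand-in for `journal_clean`. One caveat, based on Ruby's keyword-argument rules rather than anything in this diff: the task passes the merged hash positionally to the keyword-only `journal_clean`, which Ruby 2.x converts implicitly but Ruby 3.0+ rejects with `ArgumentError`; splatting it with `**`, as below, works on both.

```ruby
# Hypothetical keyword-only stand-in for RakeHelper.journal_clean.
def journal_clean(time: nil, only: nil, except: nil, delete_by_query_options: {})
  {time: time, only: only, except: except, delete_by_query_options: delete_by_query_options}
end

journal_args = {time: Time.utc(2022, 11, 8, 12), only: ['users']} # assumed shape of parse_journal_args
delete_options = {wait_for_completion: false, scroll_size: 5000}

# Same combination as the rake task: equivalent to merging left to right.
kwargs = [journal_args, {delete_by_query_options: delete_options}].reduce({}, :merge)

# Double splat turns the hash into keyword arguments on every Ruby version.
result = journal_clean(**kwargs)
```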
@@ -62,6 +62,8 @@ describe Chewy::Index::Import::BulkBuilder do
  let(:to_index) { cities.first(2) }
  let(:delete) { [cities.last] }
  specify do
+   expect(subject).to receive(:data_for).with(cities.first).and_call_original
+   expect(subject).to receive(:data_for).with(cities.second).and_call_original
    expect(subject.bulk_body).to eq([
      {index: {_id: 1, data: {'name' => 'City17', 'rating' => 42}}},
      {index: {_id: 2, data: {'name' => 'City18', 'rating' => 42}}},
@@ -72,6 +74,8 @@ describe Chewy::Index::Import::BulkBuilder do
  context ':fields' do
    let(:fields) { %w[name] }
    specify do
+     expect(subject).to receive(:data_for).with(cities.first, fields: [:name]).and_call_original
+     expect(subject).to receive(:data_for).with(cities.second, fields: [:name]).and_call_original
      expect(subject.bulk_body).to eq([
        {update: {_id: 1, data: {doc: {'name' => 'City17'}}}},
        {update: {_id: 2, data: {doc: {'name' => 'City18'}}}},
@@ -426,6 +426,33 @@ Total: \\d+s\\Z
    described_class.journal_clean(except: CitiesIndex, output: output)
    expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
  \\ACleaned up 1 journal entries
+ Total: \\d+s\\Z
+   OUTPUT
+ end
+
+ it 'executes asynchronously' do
+   output = StringIO.new
+   expect(Chewy.client).to receive(:delete_by_query).with(
+     {
+       body: {query: {match_all: {}}},
+       index: ['chewy_journal'],
+       refresh: false,
+       requests_per_second: 10.0,
+       scroll_size: 200,
+       wait_for_completion: false
+     }
+   ).and_call_original
+   described_class.journal_clean(
+     output: output,
+     delete_by_query_options: {
+       wait_for_completion: false,
+       requests_per_second: 10.0,
+       scroll_size: 200
+     }
+   )
+
+   expect(output.string).to match(Regexp.new(<<-OUTPUT, Regexp::MULTILINE))
+ \\ATask to cleanup the journal has been created, [^\\n]*
  Total: \\d+s\\Z
    OUTPUT
  end
@@ -502,4 +529,45 @@ Total: \\d+s\\Z
        end
      end
    end
+
+   describe '.delete_by_query_options_from_env' do
+     subject(:options) { described_class.delete_by_query_options_from_env(env) }
+     let(:env) do
+       {
+         'WAIT_FOR_COMPLETION' => 'false',
+         'REQUESTS_PER_SECOND' => '10',
+         'SCROLL_SIZE' => '5000'
+       }
+     end
+
+     it 'parses the options' do
+       expect(options).to eq(
+         wait_for_completion: false,
+         requests_per_second: 10.0,
+         scroll_size: 5000
+       )
+     end
+
+     context 'with different boolean values' do
+       it 'parses the option correctly' do
+         %w[1 t true TRUE on ON].each do |v|
+           expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
+             .to eq(wait_for_completion: true)
+         end
+
+         %w[0 f false FALSE off OFF].each do |v|
+           expect(described_class.delete_by_query_options_from_env({'WAIT_FOR_COMPLETION' => v}))
+             .to eq(wait_for_completion: false)
+         end
+       end
+     end
+
+     context 'with other env' do
+       let(:env) { {'SOME_ENV' => '123', 'REQUESTS_PER_SECOND' => '15'} }
+
+       it 'parses only the options' do
+         expect(options).to eq(requests_per_second: 15.0)
+       end
+     end
+   end
  end
@@ -817,6 +817,31 @@ describe Chewy::Search::Request do
        request: {index: ['products'], body: {query: {match: {name: 'name3'}}}, refresh: false}
      )
    end
+
+   it 'delete records asynchronously' do
+     outer_payload = nil
+     ActiveSupport::Notifications.subscribe('delete_query.chewy') do |_name, _start, _finish, _id, payload|
+       outer_payload = payload
+     end
+     subject.query(match: {name: 'name3'}).delete_all(
+       refresh: false,
+       wait_for_completion: false,
+       requests_per_second: 10.0,
+       scroll_size: 2000
+     )
+     expect(outer_payload).to eq(
+       index: ProductsIndex,
+       indexes: [ProductsIndex],
+       request: {
+         index: ['products'],
+         body: {query: {match: {name: 'name3'}}},
+         refresh: false,
+         wait_for_completion: false,
+         requests_per_second: 10.0,
+         scroll_size: 2000
+       }
+     )
+   end
  end

  describe '#response=' do
metadata CHANGED
@@ -1,15 +1,15 @@
  --- !ruby/object:Gem::Specification
  name: chewy
  version: !ruby/object:Gem::Version
-   version: 7.2.6
+   version: 7.2.7
  platform: ruby
  authors:
  - Toptal, LLC
  - pyromaniac
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-06-13 00:00:00.000000000 Z
+ date: 2022-11-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: database_cleaner
@@ -449,7 +449,7 @@ homepage: https://github.com/toptal/chewy
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -464,8 +464,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
    - !ruby/object:Gem::Version
      version: '0'
  requirements: []
- rubygems_version: 3.2.29
- signing_key:
+ rubygems_version: 3.2.33
+ signing_key:
  specification_version: 4
  summary: Elasticsearch ODM client wrapper
  test_files: