sidekiq-iteration 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
4
- data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
3
+ metadata.gz: ea1fc3e6f5faff037ecfade45cc58bd2035385cafb888ef67ce253d12295aee8
4
+ data.tar.gz: a4bf097d4a1a8750f4d3e2ac9ce4f01191b5e7313635fed3d958b89647639c9a
5
5
  SHA512:
6
- metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
7
- data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
6
+ metadata.gz: 415f27277011c3721853ae64c1c78c566a3bfe2c690ebaeb8e9e05bb10b4f3faac00cda0f6f95ef9fd980993695edc6621085833c7901bb09b14ff6027467622
7
+ data.tar.gz: 68568c8205c0370a3d1765e5bd6431655d1a3d5fd0ff07ee65fc6e8bf1dd0447fba1a62978c45fea3850029a9046f4cbe7768bce885f5bcb4d9c8e322a539ab6
data/CHANGELOG.md CHANGED
@@ -1,5 +1,49 @@
1
1
  ## master (unreleased)
2
2
 
3
+ ## 0.4.0 (2024-05-10)
4
+
5
+ - Support ordering using multiple directions for ActiveRecord enumerators
6
+
7
+ ```ruby
8
+ active_record_records_enumerator(..., columns: [:shop_id, :id], order: [:asc, :desc])
9
+ ```
10
+
11
+ - Support iterating over ActiveRecord models with composite primary keys
12
+
13
+ - Use Arel to generate SQL in ActiveRecord enumerator
14
+
15
+ Previously, the enumerator coerced numeric ids to a string value (e.g.: `... AND id > '1'`),
16
+ which can cause problems on some DBMSes (like BigQuery).
17
+
18
+ - Require an explicitly passed `:columns` value for ActiveRecord enumerators to include the primary key
19
+
20
+ Previously, the primary key column was added implicitly if it was not in the list.
21
+
22
+ ```ruby
23
+ # before
24
+ active_record_records_enumerator(..., columns: [:updated_at])
25
+
26
+ # after
27
+ active_record_records_enumerator(..., columns: [:updated_at, :id])
28
+ ```
29
+
30
+ - Accept a single value for `:columns` in ActiveRecord enumerators
31
+ - Add `around_iteration` hook
32
+
33
+ ## 0.3.0 (2023-05-20)
34
+
35
+ - Allow a default retry backoff to be configured
36
+
37
+ ```ruby
38
+ SidekiqIteration.default_retry_backoff = 10.seconds
39
+ ```
40
+
41
+ - Add ability to iterate Active Record enumerators in reverse order
42
+
43
+ ```ruby
44
+ active_record_records_enumerator(User.all, order: :desc)
45
+ ```
46
+
3
47
  ## 0.2.0 (2022-11-11)
4
48
 
5
49
  - Fix storing run metadata when the job fails for sidekiq < 6.5.2
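As a rough illustration of the 0.4.0 multi-direction ordering entry in this changelog, the plain-Ruby sketch below sorts in-memory rows the way `columns: [:shop_id, :id], order: [:asc, :desc]` is meant to order them. `DEMO_ROWS`, `compare_rows` and `sort_rows` are hypothetical names that only model the ordering; the gem itself generates SQL and does not sort in Ruby.

```ruby
# Hypothetical in-memory rows standing in for database records.
DEMO_ROWS = [
  { shop_id: 2, id: 5 },
  { shop_id: 1, id: 3 },
  { shop_id: 1, id: 7 },
  { shop_id: 2, id: 9 },
].freeze

# Compares two rows column by column, flipping the result for :desc columns.
def compare_rows(left, right, columns, order)
  columns.zip(order).each do |column, direction|
    result = left[column] <=> right[column]
    result = -result if direction == :desc
    return result unless result.zero?
  end
  0
end

# Returns the rows in the order the enumerator is expected to visit them.
def sort_rows(rows, columns, order)
  rows.sort { |a, b| compare_rows(a, b, columns, order) }
end

sorted = sort_rows(DEMO_ROWS, [:shop_id, :id], [:asc, :desc])
puts sorted.map { |row| row.values_at(:shop_id, :id) }.inspect
```

Rows are grouped by ascending `shop_id`, and within each shop the higher `id` comes first.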
data/README.md CHANGED
@@ -4,6 +4,8 @@
4
4
 
5
5
  Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
6
6
 
7
+ You may consider using the [`pluck_in_batches`](https://github.com/fatkodima/pluck_in_batches) gem to speed up iteration over large database tables.
8
+
7
9
  ## Background
8
10
 
9
11
  Imagine the following job:
@@ -33,7 +35,7 @@ Software that is designed for high availability [must be resilient](https://12fa
33
35
  - Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
34
36
  - Sidekiq 6+
35
37
 
36
- ## Getting started
38
+ ## Installation
37
39
 
38
40
  Add this line to your application's Gemfile:
39
41
 
@@ -45,6 +47,8 @@ And then execute:
45
47
 
46
48
  $ bundle
47
49
 
50
+ ## Getting started
51
+
48
52
  In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
49
53
 
50
54
  ```ruby
@@ -97,6 +101,12 @@ class NotifyUsersJob
97
101
  # Will be called when the job starts iterating. Called only once, for the first time.
98
102
  end
99
103
 
104
+ def around_iteration
105
+ # Will be called around each iteration.
106
+ # Can be useful for some metrics collection, performance tracking etc.
107
+ yield
108
+ end
109
+
100
110
  def on_resume
101
111
  # Called when the job resumes iterating.
102
112
  end
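One possible use of the new `around_iteration` hook is timing each iteration. The sketch below is plain Ruby and does not include Sidekiq or `SidekiqIteration::Iteration`; `TimedJobSketch`, its `run` method and the `durations` list are hypothetical. The `run` method only imitates how the hook wraps `each_iteration`, as shown later in this diff.

```ruby
# Hypothetical job-like class; it does not include Sidekiq or SidekiqIteration.
class TimedJobSketch
  attr_reader :durations, :processed

  def initialize
    @durations = []
    @processed = []
  end

  # Wraps each iteration and records how long it took, even if it raises.
  def around_iteration
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @durations << (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at)
  end

  def each_iteration(item)
    @processed << item.upcase
  end

  # Simplified stand-in for the gem's loop that wraps `each_iteration`.
  def run(items)
    items.each do |item|
      around_iteration { each_iteration(item) }
    end
    self
  end
end

job = TimedJobSketch.new.run(%w[a b c])
puts "processed #{job.processed.size} items, recorded #{job.durations.size} timings"
```

In a real job the recorded durations would more likely be sent to a metrics client than kept in memory.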
@@ -184,10 +194,10 @@ class CsvJob
184
194
 
185
195
  def build_enumerator(import_id, cursor:)
186
196
  import = Import.find(import_id)
187
- csv_enumereator(import.csv, cursor: cursor)
197
+ csv_enumerator(import.csv, cursor: cursor)
188
198
  end
189
199
 
190
- def each_iteration(csv_row)
200
+ def each_iteration(csv_row, import_id)
191
201
  # insert csv_row to database
192
202
  end
193
203
  end
@@ -220,6 +230,7 @@ end
220
230
  ## Guides
221
231
 
222
232
  * [Iteration: how it works](guides/iteration-how-it-works.md)
233
+ * [Job argument semantics](guides/argument-semantics.md)
223
234
  * [Best practices](guides/best-practices.md)
224
235
  * [Writing custom enumerator](guides/custom-enumerator.md)
225
236
  * [Throttling](guides/throttling.md)
@@ -0,0 +1,130 @@
1
+ # Argument Semantics
2
+
3
+ `sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
4
+
5
+ The call sequence is usually 3 methods:
6
+
7
+ `perform -> build_enumerator -> each_iteration`
8
+
9
+ In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
10
+
11
+ ## Jobs without arguments
12
+
13
+ Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
14
+
15
+ ```ruby
16
+ class ArglessJob
17
+ include Sidekiq::Job
18
+ include SidekiqIteration::Iteration
19
+
20
+ def build_enumerator(cursor:)
21
+ # ...
22
+ end
23
+
24
+ def each_iteration(single_object_yielded_from_enumerator)
25
+ # ...
26
+ end
27
+ end
28
+ ```
29
+
30
+ To enqueue the job:
31
+
32
+ ```ruby
33
+ ArglessJob.perform_async
34
+ ```
35
+
36
+ ## Jobs with positional arguments
37
+
38
+ Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
39
+
40
+ ```ruby
41
+ class ArgumentativeJob
42
+ include Sidekiq::Job
43
+ include SidekiqIteration::Iteration
44
+
45
+ def build_enumerator(arg1, arg2, arg3, cursor:)
46
+ # ...
47
+ end
48
+
49
+ def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
50
+ # ...
51
+ end
52
+ end
53
+ ```
54
+
55
+ To enqueue the job:
56
+
57
+ ```ruby
58
+ ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
59
+ ```
60
+
61
+ ## Jobs with keyword arguments
62
+
63
+ Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
64
+
65
+ ```ruby
66
+ class ParameterizedJob
67
+ include Sidekiq::Job
68
+ include SidekiqIteration::Iteration
69
+
70
+ def build_enumerator(kwargs, cursor:)
71
+ name = kwargs.fetch("name")
72
+ email = kwargs.fetch("email")
73
+ # ...
74
+ end
75
+
76
+ def each_iteration(object_yielded_from_enumerator, kwargs)
77
+ name = kwargs.fetch("name")
78
+ email = kwargs.fetch("email")
79
+ # ...
80
+ end
81
+ end
82
+ ```
83
+
84
+ To enqueue the job:
85
+
86
+ ```ruby
87
+ ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
88
+ ```
89
+
90
+ ## Jobs with both positional and keyword arguments
91
+
92
+ Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array), the `Hash` containing keyword arguments comes after:
93
+
94
+ ```ruby
95
+ class HighlyConfigurableGreetingJob
96
+ include Sidekiq::Job
97
+ include SidekiqIteration::Iteration
98
+
99
+ def build_enumerator(subject_line, kwargs, cursor:)
100
+ name = kwargs.fetch("sender_name")
101
+ email = kwargs.fetch("sender_email")
102
+ # ...
103
+ end
104
+
105
+ def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
106
+ name = kwargs.fetch("sender_name")
107
+ email = kwargs.fetch("sender_email")
108
+ # ...
109
+ end
110
+ end
111
+ ```
112
+
113
+ To enqueue the job:
114
+
115
+ ```ruby
116
+ HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
117
+ ```
118
+
119
+ ## Returning (yielding) from enumerators
120
+
121
+ When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
122
+
123
+ ```ruby
124
+ Enumerator.new do |yielder|
125
+ # In this case `cursor` is an Integer
126
+ cursor.upto(99999) do |offset|
127
+ yielder.yield(fetch_record_at(offset), offset)
128
+ end
129
+ end
130
+ ```
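To make the `[object, cursor]` contract described in this guide more tangible, here is a self-contained plain-Ruby sketch. It assumes the cursor is `nil` on the first run and that a resumed enumerator starts after the persisted position, following the guide's wording; `RECORDS`, `build_records_enumerator` and `run_until_interrupted` are hypothetical names, and the loop only imitates how the gem consumes the enumerator.

```ruby
# Hypothetical data source standing in for `fetch_record_at` in the guide.
RECORDS = %w[a b c d e].freeze

# Builds an enumerator that yields [object, cursor] pairs.
# `cursor` is nil on the first run, otherwise the last persisted position.
def build_records_enumerator(cursor)
  first_offset = cursor.nil? ? 0 : cursor + 1 # resume after the persisted position
  Enumerator.new do |yielder|
    first_offset.upto(RECORDS.size - 1) do |offset|
      yielder.yield(RECORDS[offset], offset)
    end
  end
end

# Simplified stand-in for the iteration loop: processes up to `limit` objects
# and returns them together with the cursor that would be persisted.
def run_until_interrupted(cursor, limit)
  processed = []
  build_records_enumerator(cursor).each do |object, position|
    processed << object # only the object would reach `each_iteration`
    cursor = position   # the second yielded value becomes the cursor
    break if processed.size >= limit
  end
  [processed, cursor]
end

first_run, saved_cursor = run_until_interrupted(nil, 2)
second_run, final_cursor = run_until_interrupted(saved_cursor, 10)
puts "first run: #{first_run.inspect}, second run: #{second_run.inspect}, cursor: #{final_cursor}"
```

The second run picks up with the record after the one whose position was saved, so no record is handled twice in this model.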
@@ -2,7 +2,7 @@
2
2
 
3
3
  ## Considerations when writing jobs
4
4
 
5
- * Duration of `#each_iteration`: processing a single element from the enumerator builded in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
5
+ * Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
6
6
  * Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
7
7
 
8
8
  ## Batch iteration
@@ -2,6 +2,17 @@
2
2
 
3
3
  Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
4
4
 
5
+ Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
6
+ your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
7
+ arguments:
8
+ - An object to be processed in a job `each_iteration` method
9
+ - A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
10
+ down. It can be any data type your job backend can serialize and deserialize correctly.
11
+
12
+ A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
13
+ will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
14
+ find objects that come _after_ the last successfully processed object.
15
+
5
16
  ## Cursorless Enumerator
6
17
 
7
18
  Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
@@ -23,7 +34,7 @@ class ListJob
23
34
  end
24
35
  end
25
36
 
26
- def each_iteration(item)
37
+ def each_iteration(item_from_redis)
27
38
  # ...
28
39
  end
29
40
  end
@@ -31,14 +42,15 @@ end
31
42
 
32
43
  ## Enumerator with cursor
33
44
 
34
- But what about iterating based on a cursor? Consider this Enumerator that wraps third party API (Stripe) for paginated iteration:
45
+ For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
46
+ stores a string as the cursor position:
35
47
 
36
48
  ```ruby
37
49
  class StripeListEnumerator
38
50
  # @param resource [Stripe::APIResource] The type of Stripe object to request
39
51
  # @param params [Hash] Query parameters for the request
40
52
  # @param options [Hash] Request options, such as API key or version
41
- # @param cursor [String]
53
+ # @param cursor [nil, String] The Stripe ID of the last item iterated over
42
54
  def initialize(resource, params: {}, options: {}, cursor:)
43
55
  pagination_params = {}
44
56
  pagination_params[:starting_after] = cursor unless cursor.nil?
@@ -59,6 +71,9 @@ class StripeListEnumerator
59
71
  def each
60
72
  loop do
61
73
  @list.each do |item, _index|
74
+ # The first argument is what gets passed to `each_iteration`.
75
+ # The second argument (item.id) is going to be persisted as the cursor,
76
+ # it doesn't get passed to `each_iteration`.
62
77
  yield item, item.id
63
78
  end
64
79
 
@@ -71,26 +86,38 @@ class StripeListEnumerator
71
86
  end
72
87
  ```
73
88
 
89
+ Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
90
+ which uses such an `Enumerator` would then look like so:
91
+
74
92
  ```ruby
75
- class StripeJob
93
+ class LoadRefundsForChargeJob
76
94
  include Sidekiq::Job
77
95
  include SidekiqIteration::Iteration
78
96
 
79
- def build_enumerator(params, cursor:)
97
+ def build_enumerator(charge_id, cursor:)
80
98
  StripeListEnumerator.new(
81
99
  Stripe::Refund,
82
- params: { charge: "ch_123" },
100
+ params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
83
101
  options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
84
102
  cursor: cursor
85
103
  ).to_enumerator
86
104
  end
87
105
 
88
- def each_iteration(stripe_refund, _params)
106
+ # Note that in this case `each_iteration` will only receive one positional argument per iteration.
107
+ # If what your enumerator yields is a composite object you will need to unpack it yourself
108
+ # inside the `each_iteration`.
109
+ def each_iteration(stripe_refund, charge_id)
89
110
  # ...
90
111
  end
91
112
  end
92
113
  ```
93
114
 
115
+ and you initiate the job with
116
+
117
+ ```ruby
118
+ LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
119
+ ```
120
+
94
121
  ## Notes
95
122
 
96
123
  We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
@@ -36,6 +36,12 @@ SELECT "users".* FROM "users" ORDER BY "users"."id" LIMIT 100
36
36
  SELECT "users".* FROM "users" WHERE "users"."id" > 2 ORDER BY "users"."id" LIMIT 100
37
37
  ```
38
38
 
39
+ ## Exceptions inside `each_iteration`
40
+
41
+ When an unrescued exception happens inside the `each_iteration` block, the job will stop and re-enqueue itself with the last successful cursor. This means that the iteration that failed will be retried with the same parameters and the cursor will only move if that iteration succeeds. This behaviour may be enough for intermittent errors, such as network connection failures, but if your execution is deterministic and you have an error, subsequent iterations will never run.
42
+
43
+ In other words, if you are trying to process 100 records but the job consistently fails on the 61st, only the first 60 will be processed and the job will try to process the 61st record until retries are exhausted.
44
+
39
45
  ## Signals
40
46
 
41
47
  It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq uses: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal, which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push the job back to the queue with the last cursor position.
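The retry behaviour from the "Exceptions inside `each_iteration`" section added in this hunk can be modelled in a few lines of plain Ruby. This is only a simulation under simplifying assumptions: `run_once` and `RECORD_IDS` are hypothetical, and the cursor here is simply the count of records already processed rather than the gem's real cursor value.

```ruby
RECORD_IDS = (1..100).to_a.freeze

# Hypothetical stand-in for a single job run: iterates records starting at
# `cursor` and advances the cursor only after an iteration succeeds.
def run_once(records, cursor, failing_record)
  records.each_with_index do |record, index|
    next if index < cursor # already processed in an earlier run
    raise "cannot process #{record}" if record == failing_record
    cursor = index + 1
  end
  [:completed, cursor]
rescue RuntimeError
  [:failed, cursor]
end

status, cursor = run_once(RECORD_IDS, 0, 61)
puts "first run: #{status}, cursor=#{cursor}"

# A retry resumes from the same cursor and hits the same record again.
status, cursor = run_once(RECORD_IDS, cursor, 61)
puts "retry: #{status}, cursor=#{cursor}"
```

As long as record 61 keeps raising, the cursor stays at 60 and records 62 to 100 are never reached, which matches the 100-record example in the text.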
data/guides/throttling.md CHANGED
@@ -25,7 +25,7 @@ class DeleteAccountsThrottledJob
25
25
  end
26
26
  ```
27
27
 
28
- Note that its up to you to define a throttling condition that makes sense for your app.
28
+ Note that it's up to you to define a throttling condition that makes sense for your app.
29
29
  For example, `DatabaseStatus.healthy?` can check various MySQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
30
30
 
31
31
  Jobs can define multiple throttle conditions. Throttle conditions are inherited by descendants, and new conditions will be appended without impacting existing conditions.
@@ -5,41 +5,69 @@ module SidekiqIteration
5
5
  class ActiveRecordEnumerator
6
6
  SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
7
7
 
8
- def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
8
+ def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
9
9
  unless relation.is_a?(ActiveRecord::Relation)
10
10
  raise ArgumentError, "relation must be an ActiveRecord::Relation"
11
11
  end
12
12
 
13
- @primary_key = "#{relation.table_name}.#{relation.primary_key}"
14
- @columns = Array(columns&.map(&:to_s) || @primary_key)
15
- @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
16
- @pluck_columns = if @primary_key_index
17
- @columns
18
- else
19
- @columns + [@primary_key]
20
- end
21
- @batch_size = batch_size
22
- @cursor = Array.wrap(cursor)
23
- raise ArgumentError, "Must specify at least one column" if @columns.empty?
24
- if relation.joins_values.present? && !@columns.all?(/\./)
25
- raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
26
- end
27
-
28
13
  if relation.arel.orders.present? || relation.arel.taken.present?
29
14
  raise ArgumentError,
30
15
  "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
31
16
  "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
32
17
  end
33
18
 
34
- @base_relation = relation.reorder(@columns.join(", "))
19
+ @relation = relation
20
+ @primary_key = relation.primary_key
21
+ columns = Array(columns || @primary_key).map(&:to_s)
22
+
23
+ if (Array(order) - [:asc, :desc]).any?
24
+ raise ArgumentError, ":order must be :asc or :desc or an array consisting of :asc or :desc, got #{order.inspect}"
25
+ end
26
+
27
+ if order.is_a?(Array) && order.size != columns.size
28
+ raise ArgumentError, ":order must include a direction for each batching column"
29
+ end
30
+
31
+ @primary_key_index = primary_key_index(columns, relation)
32
+ if @primary_key_index.nil? || (composite_primary_key? && @primary_key_index.any?(nil))
33
+ raise ArgumentError, ":columns must include a primary key columns"
34
+ end
35
+
36
+ @batch_size = batch_size
37
+ @order = batch_order(columns, order)
38
+ @cursor = Array(cursor)
39
+
40
+ if @cursor.present? && @cursor.size != columns.size
41
+ raise ArgumentError, ":cursor must include values for all the columns from :columns"
42
+ end
43
+
44
+ if columns.any?(/\W/)
45
+ arel_columns = columns.map.with_index do |column, i|
46
+ arel_column(column).as("cursor_column_#{i + 1}")
47
+ end
48
+ @cursor_columns = arel_columns.map { |column| column.right.to_s }
49
+
50
+ relation =
51
+ if relation.select_values.empty?
52
+ relation.select(@relation.arel_table[Arel.star], arel_columns)
53
+ else
54
+ relation.select(arel_columns)
55
+ end
56
+ else
57
+ @cursor_columns = columns
58
+ end
59
+
60
+ @columns = columns
61
+ ordering = @columns.zip(@order).to_h
62
+ @base_relation = relation.reorder(ordering)
35
63
  @iteration_count = 0
36
64
  end
37
65
 
38
66
  def records
39
67
  Enumerator.new(-> { records_size }) do |yielder|
40
- batches.each do |batch, _|
68
+ batches.each do |batch, _| # rubocop:disable Style/HashEachMethods
41
69
  batch.each do |record|
42
- @iteration_count += 1
70
+ increment_iteration
43
71
  yielder.yield(record, cursor_value(record))
44
72
  end
45
73
  end
@@ -49,7 +77,7 @@ module SidekiqIteration
49
77
  def batches
50
78
  Enumerator.new(-> { records_size }) do |yielder|
51
79
  while (batch = next_batch(load: true))
52
- @iteration_count += 1
80
+ increment_iteration
53
81
  yielder.yield(batch, cursor_value(batch.last))
54
82
  end
55
83
  end
@@ -58,13 +86,44 @@ module SidekiqIteration
58
86
  def relations
59
87
  Enumerator.new(-> { relations_size }) do |yielder|
60
88
  while (batch = next_batch(load: false))
61
- @iteration_count += 1
89
+ increment_iteration
62
90
  yielder.yield(batch, unwrap_array(@cursor))
63
91
  end
64
92
  end
65
93
  end
66
94
 
67
95
  private
96
+ def primary_key_index(columns, relation)
97
+ indexes = Array(@primary_key).map do |pk_column|
98
+ columns.index do |column|
99
+ column == pk_column ||
100
+ (column.include?(relation.table_name) && column.include?(pk_column))
101
+ end
102
+ end
103
+
104
+ if composite_primary_key?
105
+ indexes
106
+ else
107
+ indexes.first
108
+ end
109
+ end
110
+
111
+ def batch_order(columns, order)
112
+ if order.is_a?(Array)
113
+ order
114
+ else
115
+ [order] * columns.size
116
+ end
117
+ end
118
+
119
+ def arel_column(column)
120
+ if column.include?(".")
121
+ Arel.sql(column)
122
+ else
123
+ @relation.arel_table[column]
124
+ end
125
+ end
126
+
68
127
  def records_size
69
128
  @base_relation.count(:all)
70
129
  end
@@ -75,8 +134,8 @@ module SidekiqIteration
75
134
 
76
135
  def next_batch(load:)
77
136
  batch_relation = @base_relation.limit(@batch_size)
78
- if conditions.any?
79
- batch_relation = batch_relation.where(*conditions)
137
+ if @cursor.present?
138
+ batch_relation = apply_cursor(batch_relation)
80
139
  end
81
140
 
82
141
  records = nil
@@ -92,9 +151,7 @@ module SidekiqIteration
92
151
  cursor = cursor_values.last
93
152
  return unless cursor.present?
94
153
 
95
- # The primary key was plucked, but original cursor did not include it, so we should remove it
96
- cursor.pop unless @primary_key_index
97
- @cursor = Array.wrap(cursor)
154
+ @cursor = Array(cursor)
98
155
 
99
156
  # Yields relations by selecting the primary keys of records in the batch.
100
157
  # Post.where(published: nil) results in an enumerator of relations like:
@@ -105,79 +162,89 @@ module SidekiqIteration
105
162
  end
106
163
 
107
164
  def pluck_columns(batch)
108
- columns =
109
- if batch.is_a?(Array)
110
- @pluck_columns.map { |column| column.to_s.split(".").last }
111
- else
112
- @pluck_columns
113
- end
114
-
115
- if columns.size == 1 # only the primary key
116
- column_values = batch.pluck(columns.first)
165
+ if @cursor_columns.size == 1 # only the primary key
166
+ column_values = batch.pluck(@cursor_columns.first)
117
167
  return [column_values, column_values]
118
168
  end
119
169
 
120
- column_values = batch.pluck(*columns)
121
- primary_key_index = @primary_key_index || -1
122
- primary_key_values = column_values.map { |values| values[primary_key_index] }
170
+ column_values = batch.pluck(*@cursor_columns)
171
+ primary_key_values =
172
+ if composite_primary_key?
173
+ column_values.map { |values| values.values_at(*@primary_key_index) }
174
+ else
175
+ column_values.map { |values| values[@primary_key_index] }
176
+ end
123
177
 
124
- serialize_column_values!(column_values)
178
+ column_values = serialize_column_values(column_values)
125
179
  [column_values, primary_key_values]
126
180
  end
127
181
 
128
182
  def cursor_value(record)
129
- positions = @columns.map do |column|
130
- attribute_name = column.to_s.split(".").last
131
- column_value(record[attribute_name])
183
+ positions = @cursor_columns.map do |column|
184
+ column_value(record[column])
132
185
  end
133
186
 
134
187
  unwrap_array(positions)
135
188
  end
136
189
 
137
- def conditions
138
- return [] if @cursor.empty?
190
+ # (x, y) >= (a, b) iff (x > a or (x = a and y >= b))
191
+ # (x, y) <= (a, b) iff (x < a or (x = a and y <= b))
192
+ def apply_cursor(relation)
193
+ arel_columns = @columns.map { |column| arel_column(column) }
194
+ cursor_positions = arel_columns.zip(@cursor, cursor_operators)
139
195
 
140
- binds = []
141
- sql = build_starts_after_conditions(0, binds)
142
-
143
- # Start from the record pointed by cursor.
144
- # We use the property that `>=` is equivalent to `> or =`.
145
- if @iteration_count == 0
146
- binds.unshift(*@cursor)
147
- columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
148
- sql = "(#{columns_equality}) OR (#{sql})"
196
+ where_clause = nil
197
+ cursor_positions.reverse_each.with_index do |(arel_column, value, operator), index|
198
+ where_clause =
199
+ if index == 0
200
+ arel_column.public_send(operator, value)
201
+ else
202
+ arel_column.public_send(operator, value).or(
203
+ arel_column.eq(value).and(where_clause),
204
+ )
205
+ end
149
206
  end
150
207
 
151
- [sql, *binds]
208
+ relation.where(where_clause)
152
209
  end
153
210
 
154
- # (x, y) > (a, b) iff (x > a or (x = a and y > b))
155
- def build_starts_after_conditions(index, binds)
156
- column = @columns[index]
211
+ def serialize_column_values(column_values)
212
+ column_values.map { |values| values.map { |value| column_value(value) } }
213
+ end
157
214
 
158
- if index < @cursor.size - 1
159
- binds << @cursor[index] << @cursor[index]
160
- "#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
215
+ def column_value(value)
216
+ if value.is_a?(Time)
217
+ value.strftime(SQL_DATETIME_WITH_NSEC)
161
218
  else
162
- binds << @cursor[index]
163
- if @columns.size == @cursor.size
164
- "#{column} > ?"
219
+ value
220
+ end
221
+ end
222
+
223
+ def cursor_operators
224
+ # Start from the record pointed by cursor when just starting.
225
+ @columns.zip(@order).map do |column, order|
226
+ if column == @columns.last
227
+ if order == :asc
228
+ first_iteration? ? :gteq : :gt
229
+ else
230
+ first_iteration? ? :lteq : :lt
231
+ end
165
232
  else
166
- "#{column} >= ?"
233
+ order == :asc ? :gt : :lt
167
234
  end
168
235
  end
169
236
  end
170
237
 
171
- def serialize_column_values!(column_values)
172
- column_values.map! { |values| values.map! { |value| column_value(value) } }
238
+ def increment_iteration
239
+ @iteration_count += 1
173
240
  end
174
241
 
175
- def column_value(value)
176
- if value.is_a?(Time)
177
- value.strftime(SQL_DATETIME_WITH_NSEC)
178
- else
179
- value
180
- end
242
+ def first_iteration?
243
+ @iteration_count == 0
244
+ end
245
+
246
+ def composite_primary_key?
247
+ @primary_key.is_a?(Array)
181
248
  end
182
249
 
183
250
  def unwrap_array(array)
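The nested condition built by `apply_cursor` and `cursor_operators` in this hunk can be checked against in-memory values. The plain-Ruby sketch below evaluates the same shape of predicate, `c1 op v1 OR (c1 = v1 AND (c2 op v2 OR ...))`, for a single row; `cursor_operators_sketch` and `after_cursor?` are hypothetical helpers that use Ruby comparison operators instead of Arel, so it illustrates the logic and is not the gem's code.

```ruby
# Mirrors the operator choice in the diff: the last column uses an inclusive
# operator on the first iteration; every other case is strict.
def cursor_operators_sketch(order, first_iteration)
  order.each_with_index.map do |direction, index|
    last = index == order.size - 1
    if last && first_iteration
      direction == :asc ? :>= : :<=
    else
      direction == :asc ? :> : :<
    end
  end
end

# Evaluates the nested predicate for one row (an array of column values),
# building it from the last column outwards like the diff does.
def after_cursor?(row, cursor, order, first_iteration: false)
  operators = cursor_operators_sketch(order, first_iteration)
  row.zip(cursor, operators).reverse.reduce(nil) do |inner, (value, bound, operator)|
    outer = value.public_send(operator, bound)
    inner.nil? ? outer : (outer || (value == bound && inner))
  end
end

# Columns [:shop_id, :id] ordered [:asc, :desc], cursor at shop 1, id 5.
puts after_cursor?([1, 3], [1, 5], [:asc, :desc]) # same shop, lower id
puts after_cursor?([1, 7], [1, 5], [:asc, :desc]) # same shop, higher id
puts after_cursor?([2, 9], [1, 5], [:asc, :desc]) # later shop
```

With a descending `id`, rows in the same shop come after the cursor only when their `id` is lower, while any row from a later shop qualifies regardless of `id`.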
@@ -36,7 +36,7 @@ module SidekiqIteration
36
36
  # SidekiqIteration::CsvEnumerator.new(csv).rows(cursor: cursor)
37
37
  #
38
38
  def initialize(csv)
39
- unless csv.instance_of?(CSV)
39
+ unless defined?(CSV) && csv.instance_of?(CSV)
40
40
  raise ArgumentError, "CsvEnumerator.new takes CSV object"
41
41
  end
42
42
 
@@ -17,10 +17,6 @@ module SidekiqIteration
17
17
  def array_enumerator(array, cursor:)
18
18
  raise ArgumentError, "array must be an Array" unless array.is_a?(Array)
19
19
 
20
- if defined?(ActiveRecord) && array.any?(ActiveRecord::Base)
21
- raise ArgumentError, "array cannot contain ActiveRecord objects"
22
- end
23
-
24
20
  array.each_with_index.drop(cursor || 0).to_enum { array.size }
25
21
  end
26
22
 
@@ -28,9 +24,10 @@ module SidekiqIteration
28
24
  #
29
25
  # @param scope [ActiveRecord::Relation] scope to iterate
30
26
  # @param cursor [Object] offset to start iteration from, usually an id
31
- # @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
27
+ # @option options :columns [Array<String, Symbol>, String, Symbol] used to build the actual query for iteration,
32
28
  # defaults to primary key
33
29
  # @option options :batch_size [Integer] (100) size of the batch
30
+ # @option options :order [:asc, :desc, Array<:asc, :desc>] (:asc) specifies iteration order
34
31
  #
35
32
  # +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
36
33
  #
@@ -58,7 +55,7 @@ module SidekiqIteration
58
55
  # As a result of this query pattern, if the values in these columns change for the records in scope during
59
56
  # iteration, they may be skipped or yielded multiple times depending on the nature of the update and the
60
57
  # cursor's value. If the value gets updated to a greater value than the cursor's value, it will get yielded
61
- # again. Similarly, if the value gets updated to a lesser value than the curor's value, it will get skipped.
58
+ # again. Similarly, if the value gets updated to a lesser value than the cursor's value, it will get skipped.
62
59
  #
63
60
  # @example
64
61
  # def build_enumerator(cursor:)
@@ -13,15 +13,14 @@ module SidekiqIteration
13
13
  base.extend(Throttling)
14
14
 
15
15
  base.class_eval do
16
- throttle_on(backoff: 0) do |job|
16
+ throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
17
17
  job.class.max_job_runtime &&
18
18
  job.start_time &&
19
19
  (Time.now.utc - job.start_time) > job.class.max_job_runtime
20
20
  end
21
21
 
22
- throttle_on(backoff: 0) do
23
- defined?(Sidekiq::CLI) &&
24
- Sidekiq::CLI.instance.launcher.stopping?
22
+ throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
23
+ SidekiqIteration.stopping
25
24
  end
26
25
  end
27
26
 
@@ -56,16 +55,22 @@ module SidekiqIteration
56
55
 
57
56
  attr_reader :executions,
58
57
  :cursor_position,
59
- :start_time,
60
58
  :times_interrupted,
61
- :total_time,
62
59
  :current_run_iterations
63
60
 
61
+ # The time when the job starts running. If the job is interrupted and runs again,
62
+ # the value is updated.
63
+ attr_reader :start_time
64
+
65
+ # The total time the job has been running, including multiple iterations.
66
+ # The time isn't reset if the job is interrupted.
67
+ attr_reader :total_time
68
+
64
69
  # @private
65
70
  def initialize
66
71
  super
67
72
  @arguments = nil
68
- @job_iteration_retry_backoff = nil
73
+ @job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
69
74
  @needs_reenqueue = false
70
75
  @current_run_iterations = 0
71
76
  end
@@ -82,6 +87,12 @@ module SidekiqIteration
82
87
  def on_start
83
88
  end
84
89
 
90
+ # A hook to override that will be called around each iteration.
91
+ # Can be useful for some metrics collection, performance tracking etc.
92
+ def around_iteration
93
+ yield
94
+ end
95
+
85
96
  # A hook to override that will be called when the job resumes iterating.
86
97
  def on_resume
87
98
  end
@@ -172,7 +183,9 @@ module SidekiqIteration
172
183
 
173
184
  enumerator.each do |object_from_enumerator, index|
174
185
  found_record = true
175
- each_iteration(object_from_enumerator, *arguments)
186
+ around_iteration do
187
+ each_iteration(object_from_enumerator, *arguments)
188
+ end
176
189
  @cursor_position = index
177
190
  @current_run_iterations += 1
178
191
 
@@ -191,14 +204,14 @@ module SidekiqIteration
191
204
  )
192
205
  end
193
206
 
194
- adjust_total_time
195
207
  true
208
+ ensure
209
+ adjust_total_time
196
210
  end
197
211
 
198
212
  def reenqueue_iteration_job
199
213
  SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
200
214
 
201
- adjust_total_time
202
215
  @times_interrupted += 1
203
216
 
204
217
  arguments = @arguments
@@ -252,13 +265,6 @@ module SidekiqIteration
252
265
  true
253
266
  when false, :skip_complete_callback
254
267
  false
255
- when Array # can be used to return early from the enumerator
256
- reason, backoff = completed
257
- raise "Unknown reason: #{reason}" unless reason == :retry
258
-
259
- @job_iteration_retry_backoff = backoff
260
- @needs_reenqueue = true
261
- false
262
268
  else
263
269
  raise "Unexpected thrown value: #{completed.inspect}"
264
270
  end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SidekiqIteration
4
- VERSION = "0.2.0"
4
+ VERSION = "0.4.0"
5
5
  end
@@ -1,6 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require "sidekiq"
4
+ require_relative "sidekiq_iteration/iteration"
5
+ require_relative "sidekiq_iteration/job_retry_patch"
4
6
  require_relative "sidekiq_iteration/version"
5
7
 
6
8
  module SidekiqIteration
@@ -22,6 +24,17 @@ module SidekiqIteration
22
24
  #
23
25
  attr_accessor :max_job_runtime
24
26
 
27
+ # Configures a delay duration to wait before resuming an interrupted job.
28
+ #
29
+ # @example
30
+ # SidekiqIteration.default_retry_backoff = 10.seconds
31
+ #
32
+ # Defaults to nil which means interrupted jobs will be retried immediately.
33
+ # This value will be ignored when an interruption is raised by a throttle enumerator,
34
+ # where the throttle backoff value will take precedence over this setting.
35
+ #
36
+ attr_accessor :default_retry_backoff
37
+
25
38
  # Set a custom logger for sidekiq-iteration.
26
39
  # Defaults to `Sidekiq.logger`.
27
40
  #
@@ -33,8 +46,14 @@ module SidekiqIteration
33
46
  def logger
34
47
  @logger ||= Sidekiq.logger
35
48
  end
49
+
50
+ # @private
51
+ attr_accessor :stopping
36
52
  end
37
53
  end
38
54
 
39
- require_relative "sidekiq_iteration/iteration"
40
- require_relative "sidekiq_iteration/job_retry_patch"
55
+ Sidekiq.configure_server do |config|
56
+ config.on(:quiet) do
57
+ SidekiqIteration.stopping = true
58
+ end
59
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sidekiq-iteration
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - fatkodima
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2022-11-11 00:00:00.000000000 Z
12
+ date: 2024-05-10 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: sidekiq
@@ -35,6 +35,7 @@ files:
35
35
  - CHANGELOG.md
36
36
  - LICENSE.txt
37
37
  - README.md
38
+ - guides/argument-semantics.md
38
39
  - guides/best-practices.md
39
40
  - guides/custom-enumerator.md
40
41
  - guides/iteration-how-it-works.md
@@ -71,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
71
72
  - !ruby/object:Gem::Version
72
73
  version: '0'
73
74
  requirements: []
74
- rubygems_version: 3.1.6
75
+ rubygems_version: 3.4.19
75
76
  signing_key:
76
77
  specification_version: 4
77
- summary: Makes your sidekiq jobs interruptible and resumable.
78
+ summary: Makes your long-running sidekiq jobs interruptible and resumable.
78
79
  test_files: []