sidekiq-iteration 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
4
- data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
3
+ metadata.gz: 40efca13e06cd7fdcfc1ff59ad08fea8fc731ee1b5560ae5b75b2591379bcb63
4
+ data.tar.gz: eec2991b40bb67ffc1dcea55f1c0e8acc98a18cd59ad2fd117cd80c1a94c3e79
5
5
  SHA512:
6
- metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
7
- data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
6
+ metadata.gz: 1162ffafc4d157e7a8f9d2b8f69163e90e83431daac129707e16605a9b0df250c1cf2dd9063f651782b01ab0dcd9a0f3848d381981f94ef5a2daf36f43d591be
7
+ data.tar.gz: 45a1efa4e1e65ae322b7c923cb5795ceda901863afc4d135cb25e1b342c9604e7f90719259b7fd93b8c38b15a5bcb4cab984617ea915fb58055f21936470269c
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
1
1
  ## master (unreleased)
2
2
 
3
+ ## 0.3.0 (2023-05-20)
4
+
5
+ - Allow a default retry backoff to be configured
6
+
7
+ ```ruby
8
+ SidekiqIteration.default_retry_backoff = 10.seconds
9
+ ```
10
+
11
+ - Add ability to iterate Active Record enumerators in reverse order
12
+
13
+ ```ruby
14
+ active_record_records_enumerator(User.all, order: :desc)
15
+ ```
16
+
3
17
  ## 0.2.0 (2022-11-11)
4
18
 
5
19
  - Fix storing run metadata when the job fails for sidekiq < 6.5.2
data/README.md CHANGED
@@ -33,7 +33,7 @@ Software that is designed for high availability [must be resilient](https://12fa
33
33
  - Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
34
34
  - Sidekiq 6+
35
35
 
36
- ## Getting started
36
+ ## Installation
37
37
 
38
38
  Add this line to your application's Gemfile:
39
39
 
@@ -45,6 +45,8 @@ And then execute:
45
45
 
46
46
  $ bundle
47
47
 
48
+ ## Getting started
49
+
48
50
  In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
49
51
 
50
52
  ```ruby
@@ -184,10 +186,10 @@ class CsvJob
184
186
 
185
187
  def build_enumerator(import_id, cursor:)
186
188
  import = Import.find(import_id)
187
- csv_enumereator(import.csv, cursor: cursor)
189
+ csv_enumerator(import.csv, cursor: cursor)
188
190
  end
189
191
 
190
- def each_iteration(csv_row)
192
+ def each_iteration(csv_row, import_id)
191
193
  # insert csv_row to database
192
194
  end
193
195
  end
@@ -220,6 +222,7 @@ end
220
222
  ## Guides
221
223
 
222
224
  * [Iteration: how it works](guides/iteration-how-it-works.md)
225
+ * [Job argument semantics](guides/argument-semantics.md)
223
226
  * [Best practices](guides/best-practices.md)
224
227
  * [Writing custom enumerator](guides/custom-enumerator.md)
225
228
  * [Throttling](guides/throttling.md)
@@ -0,0 +1,130 @@
1
+ # Argument Semantics
2
+
3
+ `sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
4
+
5
+ The call sequence is usually 3 methods:
6
+
7
+ `perform -> build_enumerator -> each_iteration`
8
+
9
+ In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
10
+
11
+ ## Jobs without arguments
12
+
13
+ Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
14
+
15
+ ```ruby
16
+ class ArglessJob
17
+ include Sidekiq::Job
18
+ include SidekiqIteration::Iteration
19
+
20
+ def build_enumerator(cursor:)
21
+ # ...
22
+ end
23
+
24
+ def each_iteration(single_object_yielded_from_enumerator)
25
+ # ...
26
+ end
27
+ end
28
+ ```
29
+
30
+ To enqueue the job:
31
+
32
+ ```ruby
33
+ ArglessJob.perform_async
34
+ ```
35
+
36
+ ## Jobs with positional arguments
37
+
38
+ Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
39
+
40
+ ```ruby
41
+ class ArgumentativeJob
42
+ include Sidekiq::Job
43
+ include SidekiqIteration::Iteration
44
+
45
+ def build_enumerator(arg1, arg2, arg3, cursor:)
46
+ # ...
47
+ end
48
+
49
+ def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
50
+ # ...
51
+ end
52
+ end
53
+ ```
54
+
55
+ To enqueue the job:
56
+
57
+ ```ruby
58
+ ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
59
+ ```
60
+
61
+ ## Jobs with keyword arguments
62
+
63
+ Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
64
+
65
+ ```ruby
66
+ class ParameterizedJob
67
+ include Sidekiq::Job
68
+ include SidekiqIteration::Iteration
69
+
70
+ def build_enumerator(kwargs, cursor:)
71
+ name = kwargs.fetch("name")
72
+ email = kwargs.fetch("email")
73
+ # ...
74
+ end
75
+
76
+ def each_iteration(object_yielded_from_enumerator, kwargs)
77
+ name = kwargs.fetch("name")
78
+ email = kwargs.fetch("email")
79
+ # ...
80
+ end
81
+ end
82
+ ```
83
+
84
+ To enqueue the job:
85
+
86
+ ```ruby
87
+ ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
88
+ ```
89
+
90
+ ## Jobs with both positional and keyword arguments
91
+
92
+ Jobs with both positional and keyword arguments will have all of them available to both `build_enumerator` and `each_iteration`, but the keyword arguments come packaged into a `Hash` in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array); the `Hash` containing keyword arguments comes after them:
93
+
94
+ ```ruby
95
+ class HighlyConfigurableGreetingJob
96
+ include Sidekiq::Job
97
+ include SidekiqIteration::Iteration
98
+
99
+ def build_enumerator(subject_line, kwargs, cursor:)
100
+ name = kwargs.fetch("sender_name")
101
+ email = kwargs.fetch("sender_email")
102
+ # ...
103
+ end
104
+
105
+ def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
106
+ name = kwargs.fetch("sender_name")
107
+ email = kwargs.fetch("sender_email")
108
+ # ...
109
+ end
110
+ end
111
+ ```
112
+
113
+ To enqueue the job:
114
+
115
+ ```ruby
116
+ HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
117
+ ```
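The argument rules in this guide can be approximated in plain Ruby. The following is a minimal sketch, not the gem's actual `perform` implementation: `IterationDispatchSketch` and `GreetingSketchJob` are hypothetical names, and the sketch ignores interruption, cursor persistence, and re-enqueueing. It only shows that the job arguments are forwarded unchanged to `build_enumerator` (with `cursor:` added) and appended after the yielded object in `each_iteration`.

```ruby
# Hypothetical stand-in for the framework side; not the gem's real code.
module IterationDispatchSketch
  def perform(*arguments)
    # Job arguments go to build_enumerator first, plus the cursor keyword.
    enumerator = build_enumerator(*arguments, cursor: nil)

    enumerator.each do |object, _cursor|
      # The yielded object comes first, then the same job arguments.
      each_iteration(object, *arguments)
    end
  end
end

class GreetingSketchJob
  include IterationDispatchSketch

  attr_reader :log

  def initialize
    @log = []
  end

  def build_enumerator(_subject_line, _kwargs, cursor:)
    recipients = %w[ann bob] # hypothetical data
    Enumerator.new do |yielder|
      recipients.each_with_index do |recipient, index|
        yielder.yield(recipient, index)
      end
    end
  end

  def each_iteration(recipient, subject_line, kwargs)
    @log << "#{subject_line} to #{recipient} from #{kwargs.fetch("sender_name")}"
  end
end

job = GreetingSketchJob.new
job.perform("Hello", { "sender_name" => "Jane" })
```

Because the hash uses string keys and is passed positionally, it arrives in both methods as an ordinary `Hash`, which is why the guide reads values with `fetch`.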
118
+
119
+ ## Returning (yielding) from enumerators
120
+
121
+ When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
122
+
123
+ ```ruby
124
+ Enumerator.new do |yielder|
125
+ # In this case `cursor` is an Integer
126
+ cursor.upto(99999) do |offset|
127
+ yielder.yield(fetch_record_at(offset), offset)
128
+ end
129
+ end
130
+ ```
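To make the cursor contract concrete, here is a self-contained sketch of an enumerator that yields `[object, cursor]` pairs and can be resumed from a persisted cursor. `RECORDS` and `build_offset_enumerator` are hypothetical names used only for illustration, and the "interruption" is simulated with `break` rather than by `sidekiq-iteration` itself.

```ruby
# Hypothetical in-memory data source standing in for a real backend.
RECORDS = %w[alpha bravo charlie delta echo].freeze

# Builds an enumerator that yields [record, cursor] pairs.
# `cursor` is nil on the first run, or the last persisted offset on resume.
def build_offset_enumerator(cursor:)
  start = cursor.nil? ? 0 : cursor + 1 # resume *after* the persisted position

  Enumerator.new do |yielder|
    start.upto(RECORDS.size - 1) do |offset|
      yielder.yield(RECORDS[offset], offset)
    end
  end
end

processed = []
persisted_cursor = nil

# First run: handle two records, then pretend the job is interrupted.
build_offset_enumerator(cursor: nil).each do |record, cursor|
  processed << record
  persisted_cursor = cursor
  break if processed.size == 2
end

# Resumed run: the enumerator is rebuilt with the persisted cursor.
build_offset_enumerator(cursor: persisted_cursor).each do |record, cursor|
  processed << record
  persisted_cursor = cursor
end
```

After both runs every record has been processed exactly once, which is the property a resumable enumerator needs to preserve.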
@@ -2,6 +2,17 @@
2
2
 
3
3
  Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
4
4
 
5
+ Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
6
+ your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
7
+ arguments:
8
+ - An object to be processed in a job `each_iteration` method
9
+ - A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
10
+ down. It can be any data type your job backend can serialize and deserialize correctly.
11
+
12
+ A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
13
+ will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
14
+ find objects that come _after_ the last successfully processed object.
15
+
5
16
  ## Cursorless Enumerator
6
17
 
7
18
  Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
@@ -23,7 +34,7 @@ class ListJob
23
34
  end
24
35
  end
25
36
 
26
- def each_iteration(item)
37
+ def each_iteration(item_from_redis)
27
38
  # ...
28
39
  end
29
40
  end
@@ -31,14 +42,15 @@ end
31
42
 
32
43
  ## Enumerator with cursor
33
44
 
34
- But what about iterating based on a cursor? Consider this Enumerator that wraps third party API (Stripe) for paginated iteration:
45
+ For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
46
+ stores a string as the cursor position:
35
47
 
36
48
  ```ruby
37
49
  class StripeListEnumerator
38
50
  # @param resource [Stripe::APIResource] The type of Stripe object to request
39
51
  # @param params [Hash] Query parameters for the request
40
52
  # @param options [Hash] Request options, such as API key or version
41
- # @param cursor [String]
53
+ # @param cursor [nil, String] The Stripe ID of the last item iterated over
42
54
  def initialize(resource, params: {}, options: {}, cursor:)
43
55
  pagination_params = {}
44
56
  pagination_params[:starting_after] = cursor unless cursor.nil?
@@ -59,6 +71,9 @@ class StripeListEnumerator
59
71
  def each
60
72
  loop do
61
73
  @list.each do |item, _index|
74
+ # The first argument is what gets passed to `each_iteration`.
75
+ # The second argument (item.id) is going to be persisted as the cursor,
76
+ # it doesn't get passed to `each_iteration`.
62
77
  yield item, item.id
63
78
  end
64
79
 
@@ -71,26 +86,38 @@ class StripeListEnumerator
71
86
  end
72
87
  ```
73
88
 
89
+ Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
90
+ which uses such an `Enumerator` would then look like so:
91
+
74
92
  ```ruby
75
- class StripeJob
93
+ class LoadRefundsForChargeJob
76
94
  include Sidekiq::Job
77
95
  include SidekiqIteration::Iteration
78
96
 
79
- def build_enumerator(params, cursor:)
97
+ def build_enumerator(charge_id, cursor:)
80
98
  StripeListEnumerator.new(
81
99
  Stripe::Refund,
82
- params: { charge: "ch_123" },
100
+ params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
83
101
  options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
84
102
  cursor: cursor
85
103
  ).to_enumerator
86
104
  end
87
105
 
88
- def each_iteration(stripe_refund, _params)
106
+ # Note that in this case `each_iteration` will only receive one positional argument per iteration.
107
+ # If what your enumerator yields is a composite object you will need to unpack it yourself
108
+ # inside the `each_iteration`.
109
+ def each_iteration(stripe_refund, charge_id)
89
110
  # ...
90
111
  end
91
112
  end
92
113
  ```
93
114
 
115
+ and you initiate the job with
116
+
117
+ ```ruby
118
+ LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
119
+ ```
120
+
94
121
  ## Notes
95
122
 
96
123
  We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
@@ -5,11 +5,15 @@ module SidekiqIteration
5
5
  class ActiveRecordEnumerator
6
6
  SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
7
7
 
8
- def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
8
+ def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
9
9
  unless relation.is_a?(ActiveRecord::Relation)
10
10
  raise ArgumentError, "relation must be an ActiveRecord::Relation"
11
11
  end
12
12
 
13
+ unless order == :asc || order == :desc
14
+ raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
15
+ end
16
+
13
17
  @primary_key = "#{relation.table_name}.#{relation.primary_key}"
14
18
  @columns = Array(columns&.map(&:to_s) || @primary_key)
15
19
  @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
@@ -19,6 +23,7 @@ module SidekiqIteration
19
23
  @columns + [@primary_key]
20
24
  end
21
25
  @batch_size = batch_size
26
+ @order = order
22
27
  @cursor = Array.wrap(cursor)
23
28
  raise ArgumentError, "Must specify at least one column" if @columns.empty?
24
29
  if relation.joins_values.present? && !@columns.all?(/\./)
@@ -31,7 +36,8 @@ module SidekiqIteration
31
36
  "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
32
37
  end
33
38
 
34
- @base_relation = relation.reorder(@columns.join(", "))
39
+ ordering = @columns.to_h { |column| [column, @order] }
40
+ @base_relation = relation.reorder(ordering)
35
41
  @iteration_count = 0
36
42
  end
37
43
 
@@ -152,18 +158,19 @@ module SidekiqIteration
152
158
  end
153
159
 
154
160
  # (x, y) > (a, b) iff (x > a or (x = a and y > b))
161
+ # (x, y) < (a, b) iff (x < a or (x = a and y < b))
155
162
  def build_starts_after_conditions(index, binds)
156
163
  column = @columns[index]
157
164
 
158
165
  if index < @cursor.size - 1
159
166
  binds << @cursor[index] << @cursor[index]
160
- "#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
167
+ "#{column} #{@order == :asc ? '>' : '<'} ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
161
168
  else
162
169
  binds << @cursor[index]
163
170
  if @columns.size == @cursor.size
164
- "#{column} > ?"
171
+ @order == :asc ? "#{column} > ?" : "#{column} < ?"
165
172
  else
166
- "#{column} >= ?"
173
+ @order == :asc ? "#{column} >= ?" : "#{column} <= ?"
167
174
  end
168
175
  end
169
176
  end
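The row-value comparison in the comments above can be tried outside the enumerator. The sketch below is a standalone re-creation for illustration only: `starts_after_conditions` is a hypothetical free function that takes the columns, cursor, and order as parameters instead of reading instance variables, and it returns the SQL fragment together with its bind values.

```ruby
# Builds "rows strictly after the cursor" conditions for :asc or :desc order.
# Returns [sql_fragment, binds]. Hypothetical helper, not part of the gem.
def starts_after_conditions(columns, cursor, order, index = 0, binds = [])
  column = columns[index]
  operator = order == :asc ? ">" : "<"

  if index < cursor.size - 1
    # (x, y) > (a, b) iff x > a OR (x = a AND y > b); mirrored with < for :desc.
    binds << cursor[index] << cursor[index]
    nested, = starts_after_conditions(columns, cursor, order, index + 1, binds)
    ["#{column} #{operator} ? OR (#{column} = ? AND (#{nested}))", binds]
  else
    binds << cursor[index]
    # A partial cursor cannot exclude ties, so the comparison becomes inclusive.
    operator += "=" unless columns.size == cursor.size
    ["#{column} #{operator} ?", binds]
  end
end

desc_sql, desc_binds = starts_after_conditions(
  %w[users.created_at users.id], ["2023-05-20", 42], :desc
)
asc_sql, asc_binds = starts_after_conditions(%w[users.id], [42], :asc)
```

Flipping only the comparison operator is enough because the relation itself is reordered with the same direction for every column, so "after the cursor" always means "later in the chosen order".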
@@ -31,6 +31,7 @@ module SidekiqIteration
31
31
  # @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
32
32
  # defaults to primary key
33
33
  # @option options :batch_size [Integer] (100) size of the batch
34
+ # @option options :order [:asc, :desc] (:asc) specifies iteration order
34
35
  #
35
36
  # +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
36
37
  #
@@ -13,13 +13,13 @@ module SidekiqIteration
13
13
  base.extend(Throttling)
14
14
 
15
15
  base.class_eval do
16
- throttle_on(backoff: 0) do |job|
16
+ throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
17
17
  job.class.max_job_runtime &&
18
18
  job.start_time &&
19
19
  (Time.now.utc - job.start_time) > job.class.max_job_runtime
20
20
  end
21
21
 
22
- throttle_on(backoff: 0) do
22
+ throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
23
23
  defined?(Sidekiq::CLI) &&
24
24
  Sidekiq::CLI.instance.launcher.stopping?
25
25
  end
@@ -56,16 +56,22 @@ module SidekiqIteration
56
56
 
57
57
  attr_reader :executions,
58
58
  :cursor_position,
59
- :start_time,
60
59
  :times_interrupted,
61
- :total_time,
62
60
  :current_run_iterations
63
61
 
62
+ # The time when the job starts running. If the job is interrupted and runs again,
63
+ # the value is updated.
64
+ attr_reader :start_time
65
+
66
+ # The total time the job has been running, including multiple iterations.
67
+ # The time isn't reset if the job is interrupted.
68
+ attr_reader :total_time
69
+
64
70
  # @private
65
71
  def initialize
66
72
  super
67
73
  @arguments = nil
68
- @job_iteration_retry_backoff = nil
74
+ @job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
69
75
  @needs_reenqueue = false
70
76
  @current_run_iterations = 0
71
77
  end
@@ -191,14 +197,14 @@ module SidekiqIteration
191
197
  )
192
198
  end
193
199
 
194
- adjust_total_time
195
200
  true
201
+ ensure
202
+ adjust_total_time
196
203
  end
197
204
 
198
205
  def reenqueue_iteration_job
199
206
  SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
200
207
 
201
- adjust_total_time
202
208
  @times_interrupted += 1
203
209
 
204
210
  arguments = @arguments
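Moving `adjust_total_time` into an `ensure` clause appears intended to keep the running total accurate even when the iteration raises. The following sketch shows the pattern in isolation; `TimedRunSketch` and its injectable `clock` are assumptions made for this example and are not part of the gem.

```ruby
class TimedRunSketch
  attr_reader :total_time

  # `clock` is injectable so the behaviour can be checked deterministically.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @total_time = 0.0
  end

  def run
    @started_at = @clock.call
    yield
    true
  ensure
    # Runs on normal return and when the block raises.
    adjust_total_time
  end

  private

  def adjust_total_time
    @total_time += @clock.call - @started_at
  end
end

timer = TimedRunSketch.new
begin
  timer.run { raise "simulated failure" }
rescue RuntimeError
  # The elapsed time of the failed run has still been added to total_time.
end
```

An `ensure` clause does not change the method's return value unless it returns explicitly, so a successful `run` still returns `true`.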
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SidekiqIteration
4
- VERSION = "0.2.0"
4
+ VERSION = "0.3.0"
5
5
  end
@@ -22,6 +22,17 @@ module SidekiqIteration
22
22
  #
23
23
  attr_accessor :max_job_runtime
24
24
 
25
+ # Configures a delay duration to wait before resuming an interrupted job.
26
+ #
27
+ # @example
28
+ # SidekiqIteration.default_retry_backoff = 10.seconds
29
+ #
30
+ # Defaults to nil which means interrupted jobs will be retried immediately.
31
+ # This value will be ignored when an interruption is raised by a throttle enumerator,
32
+ # where the throttle backoff value will take precedence over this setting.
33
+ #
34
+ attr_accessor :default_retry_backoff
35
+
25
36
  # Set a custom logger for sidekiq-iteration.
26
37
  # Defaults to `Sidekiq.logger`.
27
38
  #
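The precedence described in the `default_retry_backoff` comment can be modelled in a few lines. This is a sketch under stated assumptions: `BackoffConfigSketch` and `backoff_for` are hypothetical names, delays are plain numbers of seconds, and the gem's real resolution logic may differ in detail.

```ruby
module BackoffConfigSketch
  class << self
    # nil means "retry immediately", mirroring the documented default.
    attr_accessor :default_retry_backoff
  end

  # Picks the delay (in seconds) before an interrupted job is resumed.
  # A throttle-specific backoff, when present, wins over the global default.
  def self.backoff_for(throttle_backoff: nil)
    throttle_backoff || default_retry_backoff || 0
  end
end

BackoffConfigSketch.default_retry_backoff = 10

plain_interruption = BackoffConfigSketch.backoff_for
throttled_interruption = BackoffConfigSketch.backoff_for(throttle_backoff: 30)
```

Keeping the default in one module-level accessor means individual jobs need no configuration of their own unless they throttle with a specific backoff.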
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sidekiq-iteration
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - fatkodima
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2022-11-11 00:00:00.000000000 Z
12
+ date: 2023-05-20 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: sidekiq
@@ -35,6 +35,7 @@ files:
35
35
  - CHANGELOG.md
36
36
  - LICENSE.txt
37
37
  - README.md
38
+ - guides/argument-semantics.md
38
39
  - guides/best-practices.md
39
40
  - guides/custom-enumerator.md
40
41
  - guides/iteration-how-it-works.md
@@ -71,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
71
72
  - !ruby/object:Gem::Version
72
73
  version: '0'
73
74
  requirements: []
74
- rubygems_version: 3.1.6
75
+ rubygems_version: 3.4.12
75
76
  signing_key:
76
77
  specification_version: 4
77
- summary: Makes your sidekiq jobs interruptible and resumable.
78
+ summary: Makes your long-running sidekiq jobs interruptible and resumable.
78
79
  test_files: []