sidekiq-iteration 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
-   data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
+   metadata.gz: 40efca13e06cd7fdcfc1ff59ad08fea8fc731ee1b5560ae5b75b2591379bcb63
+   data.tar.gz: eec2991b40bb67ffc1dcea55f1c0e8acc98a18cd59ad2fd117cd80c1a94c3e79
  SHA512:
-   metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
-   data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
+   metadata.gz: 1162ffafc4d157e7a8f9d2b8f69163e90e83431daac129707e16605a9b0df250c1cf2dd9063f651782b01ab0dcd9a0f3848d381981f94ef5a2daf36f43d591be
+   data.tar.gz: 45a1efa4e1e65ae322b7c923cb5795ceda901863afc4d135cb25e1b342c9604e7f90719259b7fd93b8c38b15a5bcb4cab984617ea915fb58055f21936470269c
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  ## master (unreleased)

+ ## 0.3.0 (2023-05-20)
+
+ - Allow a default retry backoff to be configured
+
+ ```ruby
+ SidekiqIteration.default_retry_backoff = 10.seconds
+ ```
+
+ - Add ability to iterate Active Record enumerators in reverse order
+
+ ```ruby
+ active_record_records_enumerator(User.all, order: :desc)
+ ```
+
  ## 0.2.0 (2022-11-11)

  - Fix storing run metadata when the job fails for sidekiq < 6.5.2
data/README.md CHANGED
@@ -33,7 +33,7 @@ Software that is designed for high availability [must be resilient](https://12fa
  - Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
  - Sidekiq 6+

- ## Getting started
+ ## Installation

  Add this line to your application's Gemfile:

@@ -45,6 +45,8 @@ And then execute:

      $ bundle

+ ## Getting started
+
  In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:

  ```ruby
@@ -184,10 +186,10 @@ class CsvJob

    def build_enumerator(import_id, cursor:)
      import = Import.find(import_id)
-     csv_enumereator(import.csv, cursor: cursor)
+     csv_enumerator(import.csv, cursor: cursor)
    end

-   def each_iteration(csv_row)
+   def each_iteration(csv_row, import_id)
      # insert csv_row to database
    end
  end
@@ -220,6 +222,7 @@ end
  ## Guides

  * [Iteration: how it works](guides/iteration-how-it-works.md)
+ * [Job argument semantics](guides/argument-semantics.md)
  * [Best practices](guides/best-practices.md)
  * [Writing custom enumerator](guides/custom-enumerator.md)
  * [Throttling](guides/throttling.md)
data/guides/argument-semantics.md ADDED
@@ -0,0 +1,130 @@
+ # Argument Semantics
+
+ `sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
+
+ The call sequence is usually 3 methods:
+
+ `perform -> build_enumerator -> each_iteration`
+
+ In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
+
+ ## Jobs without arguments
+
+ Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
+
+ ```ruby
+ class ArglessJob
+   include Sidekiq::Job
+   include SidekiqIteration::Iteration
+
+   def build_enumerator(cursor:)
+     # ...
+   end
+
+   def each_iteration(single_object_yielded_from_enumerator)
+     # ...
+   end
+ end
+ ```
+
+ To enqueue the job:
+
+ ```ruby
+ ArglessJob.perform_async
+ ```
+
+ ## Jobs with positional arguments
+
+ Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
+
+ ```ruby
+ class ArgumentativeJob
+   include Sidekiq::Job
+   include SidekiqIteration::Iteration
+
+   def build_enumerator(arg1, arg2, arg3, cursor:)
+     # ...
+   end
+
+   def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
+     # ...
+   end
+ end
+ ```
+
+ To enqueue the job:
+
+ ```ruby
+ ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
+ ```
+
+ ## Jobs with keyword arguments
+
+ Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
+
+ ```ruby
+ class ParameterizedJob
+   include Sidekiq::Job
+   include SidekiqIteration::Iteration
+
+   def build_enumerator(kwargs, cursor:)
+     name = kwargs.fetch("name")
+     email = kwargs.fetch("email")
+     # ...
+   end
+
+   def each_iteration(object_yielded_from_enumerator, kwargs)
+     name = kwargs.fetch("name")
+     email = kwargs.fetch("email")
+     # ...
+   end
+ end
+ ```
+
+ To enqueue the job:
+
+ ```ruby
+ ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
+ ```
+
+ ## Jobs with both positional and keyword arguments
+
+ Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array), the `Hash` containing keyword arguments comes after:
+
+ ```ruby
+ class HighlyConfigurableGreetingJob
+   include Sidekiq::Job
+   include SidekiqIteration::Iteration
+
+   def build_enumerator(subject_line, kwargs, cursor:)
+     name = kwargs.fetch("sender_name")
+     email = kwargs.fetch("sender_email")
+     # ...
+   end
+
+   def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
+     name = kwargs.fetch("sender_name")
+     email = kwargs.fetch("sender_email")
+     # ...
+   end
+ end
+ ```
+
+ To enqueue the job:
+
+ ```ruby
+ HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
+ ```
+
+ ## Returning (yielding) from enumerators
+
+ When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
+
+ ```ruby
+ Enumerator.new do |yielder|
+   # In this case `cursor` is an Integer
+   cursor.upto(99999) do |offset|
+     yielder.yield(fetch_record_at(offset), offset)
+   end
+ end
+ ```
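The argument rules described in this guide can be imitated in plain Ruby without Sidekiq. The sketch below is not the gem's `perform` implementation. `simulate_perform` and `DemoJob` are hypothetical names introduced only to show which values reach `build_enumerator` and which reach `each_iteration`, assuming one positional argument followed by a `Hash` of string keys.

```ruby
# Hypothetical stand-in for the framework side: it forwards the job's
# arguments to both hooks, as the guide describes.
def simulate_perform(job, *arguments, cursor:)
  enumerator = job.build_enumerator(*arguments, cursor: cursor)
  enumerator.each do |object, _next_cursor|
    # Only the yielded object and the job arguments reach `each_iteration`;
    # the cursor value is kept by the framework.
    job.each_iteration(object, *arguments)
  end
end

# Hypothetical job with one positional argument and one Hash argument.
class DemoJob
  attr_reader :seen

  def initialize
    @seen = []
  end

  def build_enumerator(subject_line, kwargs, cursor:)
    names = kwargs.fetch("names")
    first_index = cursor.nil? ? 0 : cursor + 1
    Enumerator.new do |yielder|
      first_index.upto(names.size - 1) { |index| yielder.yield(names[index], index) }
    end
  end

  def each_iteration(name, subject_line, kwargs)
    @seen << "#{subject_line} #{name} from #{kwargs.fetch('sender')}"
  end
end

job = DemoJob.new
simulate_perform(job, "Hello", { "names" => %w[Ann Bob], "sender" => "Jane" }, cursor: nil)
p job.seen
```

Passing `cursor:` explicitly keeps the trailing `Hash` positional, which matches the guide's point that keyword-style arguments arrive packaged as a `Hash`.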
data/guides/custom-enumerator.md CHANGED
@@ -2,6 +2,17 @@

  Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.

+ Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
+ your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
+ arguments:
+ - An object to be processed in a job `each_iteration` method
+ - A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
+   down. It can be any data type your job backend can serialize and deserialize correctly.
+
+ A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
+ will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
+ find objects that come _after_ the last successfully processed object.
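A minimal, self-contained sketch of that contract follows. It assumes an in-memory array and an Integer index as the cursor. `RECORDS` and `build_demo_enumerator` are hypothetical names, not part of the gem.

```ruby
RECORDS = %w[alpha beta gamma delta].freeze

# Hypothetical helper: yields (object, cursor) pairs, starting with the
# element that comes after `cursor`, or with the first element when nil.
def build_demo_enumerator(cursor:)
  first_index = cursor.nil? ? 0 : cursor + 1
  Enumerator.new do |yielder|
    first_index.upto(RECORDS.size - 1) do |index|
      yielder.yield(RECORDS[index], index)
    end
  end
end

# First run: process two items, then pretend the job was interrupted.
first_run = build_demo_enumerator(cursor: nil).take(2)
persisted_cursor = first_run.last[1]

# Resumed run: the enumerator continues after the persisted cursor.
resumed_objects = build_demo_enumerator(cursor: persisted_cursor).map { |object, _cursor| object }
p first_run
p resumed_objects
```

Yielding the index of the item just produced, rather than the index of the next one, is what lets the resumed enumerator start strictly after the last processed object.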
+
  ## Cursorless Enumerator

  Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
@@ -23,7 +34,7 @@ class ListJob
      end
    end

-   def each_iteration(item)
+   def each_iteration(item_from_redis)
      # ...
    end
  end
@@ -31,14 +42,15 @@ end

  ## Enumerator with cursor

- But what about iterating based on a cursor? Consider this Enumerator that wraps third party API (Stripe) for paginated iteration:
+ For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
+ stores a string as the cursor position:

  ```ruby
  class StripeListEnumerator
    # @param resource [Stripe::APIResource] The type of Stripe object to request
    # @param params [Hash] Query parameters for the request
    # @param options [Hash] Request options, such as API key or version
-   # @param cursor [String]
+   # @param cursor [nil, String] The Stripe ID of the last item iterated over
    def initialize(resource, params: {}, options: {}, cursor:)
      pagination_params = {}
      pagination_params[:starting_after] = cursor unless cursor.nil?
@@ -59,6 +71,9 @@ class StripeListEnumerator
    def each
      loop do
        @list.each do |item, _index|
+         # The first argument is what gets passed to `each_iteration`.
+         # The second argument (item.id) is going to be persisted as the cursor,
+         # it doesn't get passed to `each_iteration`.
          yield item, item.id
        end

@@ -71,26 +86,38 @@ class StripeListEnumerator
  end
  ```

+ Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
+ which uses such an `Enumerator` would then look like so:
+
  ```ruby
- class StripeJob
+ class LoadRefundsForChargeJob
    include Sidekiq::Job
    include SidekiqIteration::Iteration

-   def build_enumerator(params, cursor:)
+   def build_enumerator(charge_id, cursor:)
      StripeListEnumerator.new(
        Stripe::Refund,
-       params: { charge: "ch_123" },
+       params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
        options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
        cursor: cursor
      ).to_enumerator
    end

-   def each_iteration(stripe_refund, _params)
+   # Note that in this case `each_iteration` will only receive one positional argument per iteration.
+   # If what your enumerator yields is a composite object you will need to unpack it yourself
+   # inside the `each_iteration`.
+   def each_iteration(stripe_refund, charge_id)
      # ...
    end
  end
  ```

+ and you initiate the job with
+
+ ```ruby
+ LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
+ ```
+
  ## Notes

  We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
@@ -5,11 +5,15 @@ module SidekiqIteration
    class ActiveRecordEnumerator
      SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"

-     def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
+     def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
        unless relation.is_a?(ActiveRecord::Relation)
          raise ArgumentError, "relation must be an ActiveRecord::Relation"
        end

+       unless order == :asc || order == :desc
+         raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
+       end
+
        @primary_key = "#{relation.table_name}.#{relation.primary_key}"
        @columns = Array(columns&.map(&:to_s) || @primary_key)
        @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
@@ -19,6 +23,7 @@ module SidekiqIteration
          @columns + [@primary_key]
        end
        @batch_size = batch_size
+       @order = order
        @cursor = Array.wrap(cursor)
        raise ArgumentError, "Must specify at least one column" if @columns.empty?
        if relation.joins_values.present? && !@columns.all?(/\./)
@@ -31,7 +36,8 @@ module SidekiqIteration
            "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
        end

-       @base_relation = relation.reorder(@columns.join(", "))
+       ordering = @columns.to_h { |column| [column, @order] }
+       @base_relation = relation.reorder(ordering)
        @iteration_count = 0
      end

@@ -152,18 +158,19 @@ module SidekiqIteration
      end

      # (x, y) > (a, b) iff (x > a or (x = a and y > b))
+     # (x, y) < (a, b) iff (x < a or (x = a and y < b))
      def build_starts_after_conditions(index, binds)
        column = @columns[index]

        if index < @cursor.size - 1
          binds << @cursor[index] << @cursor[index]
-         "#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
+         "#{column} #{@order == :asc ? '>' : '<'} ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
        else
          binds << @cursor[index]
          if @columns.size == @cursor.size
-           "#{column} > ?"
+           @order == :asc ? "#{column} > ?" : "#{column} < ?"
          else
-           "#{column} >= ?"
+           @order == :asc ? "#{column} >= ?" : "#{column} <= ?"
          end
        end
      end
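To see what the order-aware condition builder produces, the sketch below restates the same recursion as a free function that can run without Active Record. `starts_after_conditions` is a hypothetical name. It takes the columns, cursor and order as arguments instead of reading instance variables, and returns the SQL fragment together with its bind values.

```ruby
# Stand-alone restatement of the recursive tuple comparison from the diff:
#   (x, y) > (a, b) iff (x > a or (x = a and y > b))   for :asc
#   (x, y) < (a, b) iff (x < a or (x = a and y < b))   for :desc
def starts_after_conditions(columns, cursor, order, index = 0, binds = [])
  column = columns[index]
  operator = order == :asc ? ">" : "<"

  if index < cursor.size - 1
    # The cursor value is bound twice: once for the strict comparison,
    # once for the equality branch.
    binds << cursor[index] << cursor[index]
    nested, = starts_after_conditions(columns, cursor, order, index + 1, binds)
    ["#{column} #{operator} ? OR (#{column} = ? AND (#{nested}))", binds]
  else
    binds << cursor[index]
    # A cursor with fewer values than columns uses an inclusive comparison.
    operator += "=" unless columns.size == cursor.size
    ["#{column} #{operator} ?", binds]
  end
end

sql, binds = starts_after_conditions(%w[users.created_at users.id], ["2023-05-20", 42], :desc)
puts sql
p binds
```

With `:desc`, every comparison flips from `>` to `<`, so resuming continues with rows that sort before the cursor instead of after it.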
@@ -31,6 +31,7 @@ module SidekiqIteration
      # @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
      #   defaults to primary key
      # @option options :batch_size [Integer] (100) size of the batch
+     # @option options :order [:asc, :desc] (:asc) specifies iteration order
      #
      # +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
      #
@@ -13,13 +13,13 @@ module SidekiqIteration
        base.extend(Throttling)

        base.class_eval do
-         throttle_on(backoff: 0) do |job|
+         throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
            job.class.max_job_runtime &&
              job.start_time &&
              (Time.now.utc - job.start_time) > job.class.max_job_runtime
          end

-         throttle_on(backoff: 0) do
+         throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
            defined?(Sidekiq::CLI) &&
              Sidekiq::CLI.instance.launcher.stopping?
          end
@@ -56,16 +56,22 @@ module SidekiqIteration

      attr_reader :executions,
                  :cursor_position,
-                 :start_time,
                  :times_interrupted,
-                 :total_time,
                  :current_run_iterations

+     # The time when the job starts running. If the job is interrupted and runs again,
+     # the value is updated.
+     attr_reader :start_time
+
+     # The total time the job has been running, including multiple iterations.
+     # The time isn't reset if the job is interrupted.
+     attr_reader :total_time
+
      # @private
      def initialize
        super
        @arguments = nil
-       @job_iteration_retry_backoff = nil
+       @job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
        @needs_reenqueue = false
        @current_run_iterations = 0
      end
@@ -191,14 +197,14 @@ module SidekiqIteration
          )
        end

-       adjust_total_time
        true
+     ensure
+       adjust_total_time
      end

      def reenqueue_iteration_job
        SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")

-       adjust_total_time
        @times_interrupted += 1

        arguments = @arguments
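The last hunk above moves `adjust_total_time` into an `ensure` clause, so the elapsed time is recorded even when the iteration body raises. The sketch below illustrates that Ruby mechanism in isolation. `DemoRun` and its `run` method are hypothetical and do not mirror the gem's internals.

```ruby
# Hypothetical class showing time accounting inside `ensure`.
class DemoRun
  attr_reader :total_time

  def initialize
    @total_time = 0.0
  end

  # `ensure` runs whether the block returns normally or raises, and it does
  # not change the method's return value (`true` on success).
  def run
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    true
  ensure
    @total_time += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
  end
end

demo = DemoRun.new
result = demo.run { sleep 0.01 }
time_after_success = demo.total_time

begin
  demo.run do
    sleep 0.01
    raise "boom"
  end
rescue RuntimeError
  # The failure still counted towards total_time.
end
time_after_failure = demo.total_time
```

Keeping the accounting in one `ensure` clause also removes the need for separate calls on the success and re-enqueue paths, which is what the removed lines in the hunk did.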
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module SidekiqIteration
-   VERSION = "0.2.0"
+   VERSION = "0.3.0"
  end
@@ -22,6 +22,17 @@ module SidekiqIteration
      #
      attr_accessor :max_job_runtime

+     # Configures a delay duration to wait before resuming an interrupted job.
+     #
+     # @example
+     #   SidekiqIteration.default_retry_backoff = 10.seconds
+     #
+     # Defaults to nil which means interrupted jobs will be retried immediately.
+     # This value will be ignored when an interruption is raised by a throttle enumerator,
+     # where the throttle backoff value will take precedence over this setting.
+     #
+     attr_accessor :default_retry_backoff
+
      # Set a custom logger for sidekiq-iteration.
      # Defaults to `Sidekiq.logger`.
      #
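The new `default_retry_backoff` accessor is read when a job object is initialized (see the `@job_iteration_retry_backoff` change earlier in this diff). The sketch below shows that pattern with hypothetical names, `DemoIteration` and `DemoIteration::Job`. It uses a plain Integer number of seconds because `10.seconds` requires Active Support.

```ruby
# Hypothetical module mirroring a module-level setting with a nil default.
module DemoIteration
  class << self
    # nil stands for "retry immediately", as in the documented default.
    attr_accessor :default_retry_backoff
  end

  class Job
    attr_reader :retry_backoff

    def initialize
      # The setting is captured when the job object is created.
      @retry_backoff = DemoIteration.default_retry_backoff
    end
  end
end

job_before = DemoIteration::Job.new
DemoIteration.default_retry_backoff = 10 # seconds; the changelog example uses `10.seconds`
job_after = DemoIteration::Job.new

p job_before.retry_backoff
p job_after.retry_backoff
```

Because the value is copied at initialization in this sketch, changing the setting affects only job objects created afterwards.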
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: sidekiq-iteration
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.3.0
  platform: ruby
  authors:
  - fatkodima
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-11-11 00:00:00.000000000 Z
+ date: 2023-05-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: sidekiq
@@ -35,6 +35,7 @@ files:
  - CHANGELOG.md
  - LICENSE.txt
  - README.md
+ - guides/argument-semantics.md
  - guides/best-practices.md
  - guides/custom-enumerator.md
  - guides/iteration-how-it-works.md
@@ -71,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.1.6
+ rubygems_version: 3.4.12
  signing_key:
  specification_version: 4
- summary: Makes your sidekiq jobs interruptible and resumable.
+ summary: Makes your long-running sidekiq jobs interruptible and resumable.
  test_files: []