sidekiq-iteration 0.2.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +44 -0
- data/README.md +14 -3
- data/guides/argument-semantics.md +130 -0
- data/guides/best-practices.md +1 -1
- data/guides/custom-enumerator.md +34 -7
- data/guides/iteration-how-it-works.md +6 -0
- data/guides/throttling.md +1 -1
- data/lib/sidekiq_iteration/active_record_enumerator.rb +139 -72
- data/lib/sidekiq_iteration/csv_enumerator.rb +1 -1
- data/lib/sidekiq_iteration/enumerators.rb +3 -6
- data/lib/sidekiq_iteration/iteration.rb +23 -17
- data/lib/sidekiq_iteration/version.rb +1 -1
- data/lib/sidekiq_iteration.rb +21 -2
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: ea1fc3e6f5faff037ecfade45cc58bd2035385cafb888ef67ce253d12295aee8
|
4
|
+
data.tar.gz: a4bf097d4a1a8750f4d3e2ac9ce4f01191b5e7313635fed3d958b89647639c9a
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 415f27277011c3721853ae64c1c78c566a3bfe2c690ebaeb8e9e05bb10b4f3faac00cda0f6f95ef9fd980993695edc6621085833c7901bb09b14ff6027467622
|
7
|
+
data.tar.gz: 68568c8205c0370a3d1765e5bd6431655d1a3d5fd0ff07ee65fc6e8bf1dd0447fba1a62978c45fea3850029a9046f4cbe7768bce885f5bcb4d9c8e322a539ab6
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,49 @@
|
|
1
1
|
## master (unreleased)
|
2
2
|
|
3
|
+
## 0.4.0 (2024-05-10)
|
4
|
+
|
5
|
+
- Support ordering using multiple directions for ActiveRecord enumerators
|
6
|
+
|
7
|
+
```ruby
|
8
|
+
active_record_records_enumerator(..., columns: [:shop_id, :id], order: [:asc, :desc])
|
9
|
+
```
|
10
|
+
|
11
|
+
- Support iterating over ActiveRecord models with composite primary keys
|
12
|
+
|
13
|
+
- Use Arel to generate SQL in ActiveRecord enumerator
|
14
|
+
|
15
|
+
Previously, the enumerator coerced numeric ids to a string value (e.g.: `... AND id > '1'`),
|
16
|
+
which can cause problems on some DBMSes (like BigQuery).
|
17
|
+
|
18
|
+
- Enforce explicitly passed to ActiveRecord enumerators `:columns` value to include a primary key
|
19
|
+
|
20
|
+
Previously, the primary key column was added implicitly if it was not in the list.
|
21
|
+
|
22
|
+
```ruby
|
23
|
+
# before
|
24
|
+
active_record_records_enumerator(..., columns: [:updated_at])
|
25
|
+
|
26
|
+
# after
|
27
|
+
active_record_records_enumerator(..., columns: [:updated_at, :id])
|
28
|
+
```
|
29
|
+
|
30
|
+
- Accept single values as a `:columns` for ActiveRecord enumerators
|
31
|
+
- Add `around_iteration` hook
|
32
|
+
|
33
|
+
## 0.3.0 (2023-05-20)
|
34
|
+
|
35
|
+
- Allow a default retry backoff to be configured
|
36
|
+
|
37
|
+
```ruby
|
38
|
+
SidekiqIteration.default_retry_backoff = 10.seconds
|
39
|
+
```
|
40
|
+
|
41
|
+
- Add ability to iterate Active Record enumerators in reverse order
|
42
|
+
|
43
|
+
```ruby
|
44
|
+
active_record_records_enumerator(User.all, order: :desc)
|
45
|
+
```
|
46
|
+
|
3
47
|
## 0.2.0 (2022-11-11)
|
4
48
|
|
5
49
|
- Fix storing run metadata when the job fails for sidekiq < 6.5.2
|
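Note on the 0.4.0 entry about multiple directions: the sketch below illustrates in plain Ruby what `columns: [:shop_id, :id], order: [:asc, :desc]` means for the order in which rows are visited. It is an illustration only, not the gem's code; the gem performs the ordering in SQL, and `Row`, `compare_rows` and `sort_rows` are hypothetical names introduced here.

```ruby
# Plain-Ruby illustration only; the gem itself orders rows in SQL.
Row = Struct.new(:shop_id, :id)

# Compares two rows column by column, flipping the result for :desc columns.
def compare_rows(a, b, columns, order)
  columns.zip(order).each do |column, direction|
    result = a[column] <=> b[column]
    next if result.zero?

    return direction == :asc ? result : -result
  end
  0
end

# Sorts rows the way a multi-column, multi-direction ORDER BY would.
def sort_rows(rows, columns:, order:)
  rows.sort { |a, b| compare_rows(a, b, columns, order) }
end

rows = [Row.new(2, 10), Row.new(1, 5), Row.new(1, 7), Row.new(2, 11)]
sorted = sort_rows(rows, columns: [:shop_id, :id], order: [:asc, :desc])

# shop_id ascends, while id descends within each shop.
sorted_pairs = sorted.map(&:to_a)
```

One direction is supplied per batching column, which matches the size check added to `ActiveRecordEnumerator#initialize` in this release.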
data/README.md
CHANGED
@@ -4,6 +4,8 @@
|
|
4
4
|
|
5
5
|
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
6
6
|
|
7
|
+
You may consider [`pluck_in_batches`](https://github.com/fatkodima/pluck_in_batches) gem to speedup iterating over large database tables.
|
8
|
+
|
7
9
|
## Background
|
8
10
|
|
9
11
|
Imagine the following job:
|
@@ -33,7 +35,7 @@ Software that is designed for high availability [must be resilient](https://12fa
|
|
33
35
|
- Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
|
34
36
|
- Sidekiq 6+
|
35
37
|
|
36
|
-
##
|
38
|
+
## Installation
|
37
39
|
|
38
40
|
Add this line to your application's Gemfile:
|
39
41
|
|
@@ -45,6 +47,8 @@ And then execute:
|
|
45
47
|
|
46
48
|
$ bundle
|
47
49
|
|
50
|
+
## Getting started
|
51
|
+
|
48
52
|
In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
|
49
53
|
|
50
54
|
```ruby
|
@@ -97,6 +101,12 @@ class NotifyUsersJob
|
|
97
101
|
# Will be called when the job starts iterating. Called only once, for the first time.
|
98
102
|
end
|
99
103
|
|
104
|
+
def around_iteration
|
105
|
+
# Will be called around each iteration.
|
106
|
+
# Can be useful for some metrics collection, performance tracking etc.
|
107
|
+
yield
|
108
|
+
end
|
109
|
+
|
100
110
|
def on_resume
|
101
111
|
# Called when the job resumes iterating.
|
102
112
|
end
|
@@ -184,10 +194,10 @@ class CsvJob
|
|
184
194
|
|
185
195
|
def build_enumerator(import_id, cursor:)
|
186
196
|
import = Import.find(import_id)
|
187
|
-
|
197
|
+
csv_enumerator(import.csv, cursor: cursor)
|
188
198
|
end
|
189
199
|
|
190
|
-
def each_iteration(csv_row)
|
200
|
+
def each_iteration(csv_row, import_id)
|
191
201
|
# insert csv_row to database
|
192
202
|
end
|
193
203
|
end
|
@@ -220,6 +230,7 @@ end
|
|
220
230
|
## Guides
|
221
231
|
|
222
232
|
* [Iteration: how it works](guides/iteration-how-it-works.md)
|
233
|
+
* [Job argument semantics](guides/argument-semantics.md)
|
223
234
|
* [Best practices](guides/best-practices.md)
|
224
235
|
* [Writing custom enumerator](guides/custom-enumerator.md)
|
225
236
|
* [Throttling](guides/throttling.md)
|
data/guides/argument-semantics.md
ADDED
@@ -0,0 +1,130 @@
|
|
1
|
+
# Argument Semantics
|
2
|
+
|
3
|
+
`sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
|
4
|
+
|
5
|
+
The call sequence is usually 3 methods:
|
6
|
+
|
7
|
+
`perform -> build_enumerator -> each_iteration`
|
8
|
+
|
9
|
+
In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
|
10
|
+
|
11
|
+
## Jobs without arguments
|
12
|
+
|
13
|
+
Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
|
14
|
+
|
15
|
+
```ruby
|
16
|
+
class ArglessJob
|
17
|
+
include Sidekiq::Job
|
18
|
+
include SidekiqIteration::Iteration
|
19
|
+
|
20
|
+
def build_enumerator(cursor:)
|
21
|
+
# ...
|
22
|
+
end
|
23
|
+
|
24
|
+
def each_iteration(single_object_yielded_from_enumerator)
|
25
|
+
# ...
|
26
|
+
end
|
27
|
+
end
|
28
|
+
```
|
29
|
+
|
30
|
+
To enqueue the job:
|
31
|
+
|
32
|
+
```ruby
|
33
|
+
ArglessJob.perform_async
|
34
|
+
```
|
35
|
+
|
36
|
+
## Jobs with positional arguments
|
37
|
+
|
38
|
+
Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
|
39
|
+
|
40
|
+
```ruby
|
41
|
+
class ArgumentativeJob
|
42
|
+
include Sidekiq::Job
|
43
|
+
include SidekiqIteration::Iteration
|
44
|
+
|
45
|
+
def build_enumerator(arg1, arg2, arg3, cursor:)
|
46
|
+
# ...
|
47
|
+
end
|
48
|
+
|
49
|
+
def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
|
50
|
+
# ...
|
51
|
+
end
|
52
|
+
end
|
53
|
+
```
|
54
|
+
|
55
|
+
To enqueue the job:
|
56
|
+
|
57
|
+
```ruby
|
58
|
+
ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
|
59
|
+
```
|
60
|
+
|
61
|
+
## Jobs with keyword arguments
|
62
|
+
|
63
|
+
Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
|
64
|
+
|
65
|
+
```ruby
|
66
|
+
class ParameterizedJob
|
67
|
+
include Sidekiq::Job
|
68
|
+
include SidekiqIteration::Iteration
|
69
|
+
|
70
|
+
def build_enumerator(kwargs, cursor:)
|
71
|
+
name = kwargs.fetch("name")
|
72
|
+
email = kwargs.fetch("email")
|
73
|
+
# ...
|
74
|
+
end
|
75
|
+
|
76
|
+
def each_iteration(object_yielded_from_enumerator, kwargs)
|
77
|
+
name = kwargs.fetch("name")
|
78
|
+
email = kwargs.fetch("email")
|
79
|
+
# ...
|
80
|
+
end
|
81
|
+
end
|
82
|
+
```
|
83
|
+
|
84
|
+
To enqueue the job:
|
85
|
+
|
86
|
+
```ruby
|
87
|
+
ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
|
88
|
+
```
|
89
|
+
|
90
|
+
## Jobs with both positional and keyword arguments
|
91
|
+
|
92
|
+
Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array), the `Hash` containing keyword arguments comes after:
|
93
|
+
|
94
|
+
```ruby
|
95
|
+
class HighlyConfigurableGreetingJob
|
96
|
+
include Sidekiq::Job
|
97
|
+
include SidekiqIteration::Iteration
|
98
|
+
|
99
|
+
def build_enumerator(subject_line, kwargs, cursor:)
|
100
|
+
name = kwargs.fetch("sender_name")
|
101
|
+
email = kwargs.fetch("sender_email")
|
102
|
+
# ...
|
103
|
+
end
|
104
|
+
|
105
|
+
def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
|
106
|
+
name = kwargs.fetch("sender_name")
|
107
|
+
email = kwargs.fetch("sender_email")
|
108
|
+
# ...
|
109
|
+
end
|
110
|
+
end
|
111
|
+
```
|
112
|
+
|
113
|
+
To enqueue the job:
|
114
|
+
|
115
|
+
```ruby
|
116
|
+
HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
|
117
|
+
```
|
118
|
+
|
119
|
+
## Returning (yielding) from enumerators
|
120
|
+
|
121
|
+
When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
|
122
|
+
|
123
|
+
```ruby
|
124
|
+
Enumerator.new do |yielder|
|
125
|
+
# In this case `cursor` is an Integer
|
126
|
+
cursor.upto(99999) do |offset|
|
127
|
+
yielder.yield(fetch_record_at(offset), offset)
|
128
|
+
end
|
129
|
+
end
|
130
|
+
```
|
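Note on the argument rules in this guide: the sketch below imitates, in plain Ruby, how job arguments could flow from `perform` into `build_enumerator` and `each_iteration`. `ArgumentDispatchSketch` is a hypothetical stand-in written for this note and is not the gem's implementation. It assumes only what the guide and the `iteration.rb` diff show: positional arguments are splatted, keyword-style arguments arrive as one `Hash` with string keys, and the cursor is not passed to `each_iteration`.

```ruby
# Hypothetical stand-in for the `perform` method that the gem defines for you.
class ArgumentDispatchSketch
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Every job argument is forwarded to both methods; the cursor only goes
  # to `build_enumerator`.
  def perform(*arguments)
    enumerator = build_enumerator(*arguments, cursor: nil)
    enumerator.each do |object, _cursor|
      each_iteration(object, *arguments)
    end
  end

  # Positional argument first, then the Hash that carries the "keyword" arguments.
  def build_enumerator(_subject_line, _kwargs, cursor:)
    # Yields [object, cursor] pairs; a real job would honour `cursor` here.
    [["alice", 0], ["bob", 1]].each
  end

  def each_iteration(recipient, subject_line, kwargs)
    @calls << "#{subject_line} #{recipient} from #{kwargs.fetch('sender_name')}"
  end
end

job = ArgumentDispatchSketch.new
job.perform("Greetings", { "sender_name" => "Jane" })
```

The string keys mirror the guide's `kwargs.fetch("name")` calls: the arguments reach the job as a plain `Hash`, not as Ruby keyword arguments.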
data/guides/best-practices.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
|
3
3
|
## Considerations when writing jobs
|
4
4
|
|
5
|
-
* Duration of `#each_iteration`: processing a single element from the enumerator
|
5
|
+
* Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
|
6
6
|
* Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
|
7
7
|
|
8
8
|
## Batch iteration
|
data/guides/custom-enumerator.md
CHANGED
@@ -2,6 +2,17 @@
|
|
2
2
|
|
3
3
|
Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
|
4
4
|
|
5
|
+
Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
|
6
|
+
your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
|
7
|
+
arguments:
|
8
|
+
- An object to be processed in a job `each_iteration` method
|
9
|
+
- A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
|
10
|
+
down. It can be any data type your job backend can serialize and deserialize correctly.
|
11
|
+
|
12
|
+
A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
|
13
|
+
will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
|
14
|
+
find objects that come _after_ the last successfully processed object.
|
15
|
+
|
5
16
|
## Cursorless Enumerator
|
6
17
|
|
7
18
|
Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
|
@@ -23,7 +34,7 @@ class ListJob
|
|
23
34
|
end
|
24
35
|
end
|
25
36
|
|
26
|
-
def each_iteration(
|
37
|
+
def each_iteration(item_from_redis)
|
27
38
|
# ...
|
28
39
|
end
|
29
40
|
end
|
@@ -31,14 +42,15 @@ end
|
|
31
42
|
|
32
43
|
## Enumerator with cursor
|
33
44
|
|
34
|
-
|
45
|
+
For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
|
46
|
+
stores a string as the cursor position:
|
35
47
|
|
36
48
|
```ruby
|
37
49
|
class StripeListEnumerator
|
38
50
|
# @param resource [Stripe::APIResource] The type of Stripe object to request
|
39
51
|
# @param params [Hash] Query parameters for the request
|
40
52
|
# @param options [Hash] Request options, such as API key or version
|
41
|
-
# @param cursor [String]
|
53
|
+
# @param cursor [nil, String] The Stripe ID of the last item iterated over
|
42
54
|
def initialize(resource, params: {}, options: {}, cursor:)
|
43
55
|
pagination_params = {}
|
44
56
|
pagination_params[:starting_after] = cursor unless cursor.nil?
|
@@ -59,6 +71,9 @@ class StripeListEnumerator
|
|
59
71
|
def each
|
60
72
|
loop do
|
61
73
|
@list.each do |item, _index|
|
74
|
+
# The first argument is what gets passed to `each_iteration`.
|
75
|
+
# The second argument (item.id) is going to be persisted as the cursor,
|
76
|
+
# it doesn't get passed to `each_iteration`.
|
62
77
|
yield item, item.id
|
63
78
|
end
|
64
79
|
|
@@ -71,26 +86,38 @@ class StripeListEnumerator
|
|
71
86
|
end
|
72
87
|
```
|
73
88
|
|
89
|
+
Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
|
90
|
+
which uses such an `Enumerator` would then look like so:
|
91
|
+
|
74
92
|
```ruby
|
75
|
-
class
|
93
|
+
class LoadRefundsForChargeJob
|
76
94
|
include Sidekiq::Job
|
77
95
|
include SidekiqIteration::Iteration
|
78
96
|
|
79
|
-
def build_enumerator(
|
97
|
+
def build_enumerator(charge_id, cursor:)
|
80
98
|
StripeListEnumerator.new(
|
81
99
|
Stripe::Refund,
|
82
|
-
params: { charge: "
|
100
|
+
params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
|
83
101
|
options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
|
84
102
|
cursor: cursor
|
85
103
|
).to_enumerator
|
86
104
|
end
|
87
105
|
|
88
|
-
|
106
|
+
# Note that in this case `each_iteration` will only receive one positional argument per iteration.
|
107
|
+
# If what your enumerator yields is a composite object you will need to unpack it yourself
|
108
|
+
# inside the `each_iteration`.
|
109
|
+
def each_iteration(stripe_refund, charge_id)
|
89
110
|
# ...
|
90
111
|
end
|
91
112
|
end
|
92
113
|
```
|
93
114
|
|
115
|
+
and you initiate the job with
|
116
|
+
|
117
|
+
```ruby
|
118
|
+
LoadRefundsForChargeJob.perform_later(_charge_id = "chrg_345")
|
119
|
+
```
|
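Note on the cursor contract described in this guide: a minimal plain-Ruby sketch of an enumerator that yields `[object, cursor]` pairs and can be rebuilt from a persisted cursor. `RECORDS` and `build_letters_enumerator` are hypothetical names used only for illustration. The sketch assumes the cursor is the index of the last processed record, so a resumed run starts one position after it.

```ruby
RECORDS = %w[a b c d e].freeze

# Hypothetical helper: yields [record, cursor] pairs for everything after `cursor`.
# A nil cursor means the job is starting for the first time.
def build_letters_enumerator(cursor:)
  first_index = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    first_index.upto(RECORDS.size - 1) do |index|
      # First value is what each_iteration would receive,
      # second value is what would be persisted as the cursor.
      yielder.yield(RECORDS[index], index)
    end
  end
end

# Simulate an interruption after two records, then resume from the saved cursor.
first_run = build_letters_enumerator(cursor: nil).take(2)
last_cursor = first_run.last[1]
resumed = build_letters_enumerator(cursor: last_cursor).to_a
```

Rebuilding the enumerator with the last persisted cursor continues with the records that come after it, which is the behaviour the guide asks custom enumerators to provide.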
120
|
+
|
94
121
|
## Notes
|
95
122
|
|
96
123
|
We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
|
data/guides/iteration-how-it-works.md
CHANGED
@@ -36,6 +36,12 @@ SELECT "users".* FROM "users" ORDER BY "users"."id" LIMIT 100
|
|
36
36
|
SELECT "users".* FROM "users" WHERE "users"."id" > 2 ORDER BY "products"."id" LIMIT 100
|
37
37
|
```
|
38
38
|
|
39
|
+
## Exceptions inside `each_iteration`
|
40
|
+
|
41
|
+
When an unrescued exception happens inside the `each_iteration` block, the job will stop and re-enqueue itself with the last successful cursor. This means that the iteration that failed will be retried with the same parameters and the cursor will only move if that iteration succeeds. This behaviour may be enough for intermittent errors, such as network connection failures, but if your execution is deterministic and you have an error, subsequent iterations will never run.
|
42
|
+
|
43
|
+
In other words, if you are trying to process 100 records but the job consistently fails on the 61st, only the first 60 will be processed and the job will try to process the 61st record until retries are exhausted.
|
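Note on the retry behaviour described above: the plain-Ruby sketch below simulates a job that fails deterministically on the 61st of 100 records. `run_once` is a hypothetical stand-in for a single job run, and the fixed limit of three attempts stands in for Sidekiq's retry policy; neither is the gem's code.

```ruby
# Hypothetical stand-in for one run of an iteration job.
# Returns the status together with the last successfully processed cursor.
def run_once(records, cursor)
  start = cursor.nil? ? 0 : cursor + 1
  records.each_with_index.drop(start).each do |record, index|
    yield record
    cursor = index # the cursor only advances after a successful iteration
  end
  [:completed, cursor]
rescue StandardError
  [:failed, cursor]
end

records = (1..100).to_a
processed = []
cursor = nil
status = nil
attempts = 0

3.times do
  attempts += 1
  status, cursor = run_once(records, cursor) do |record|
    raise "cannot process #{record}" if record == 61

    processed << record
  end
  break if status == :completed
end
```

After the first attempt the cursor stays on the 60th record, so each later attempt starts at the 61st record and fails again without processing anything new.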
44
|
+
|
39
45
|
## Signals
|
40
46
|
|
41
47
|
It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq use: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push job back to the queue with the last cursor position.
|
data/guides/throttling.md
CHANGED
@@ -25,7 +25,7 @@ class DeleteAccountsThrottledJob
|
|
25
25
|
end
|
26
26
|
```
|
27
27
|
|
28
|
-
Note that it
|
28
|
+
Note that it's up to you to define a throttling condition that makes sense for your app.
|
29
29
|
For example, `DatabaseStatus.healthy?` can check various MySQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
|
30
30
|
|
31
31
|
Jobs can define multiple throttle conditions. Throttle conditions are inherited by descendants, and new conditions will be appended without impacting existing conditions.
|
data/lib/sidekiq_iteration/active_record_enumerator.rb
CHANGED
@@ -5,41 +5,69 @@ module SidekiqIteration
|
|
5
5
|
class ActiveRecordEnumerator
|
6
6
|
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
|
7
7
|
|
8
|
-
def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
|
8
|
+
def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
|
9
9
|
unless relation.is_a?(ActiveRecord::Relation)
|
10
10
|
raise ArgumentError, "relation must be an ActiveRecord::Relation"
|
11
11
|
end
|
12
12
|
|
13
|
-
@primary_key = "#{relation.table_name}.#{relation.primary_key}"
|
14
|
-
@columns = Array(columns&.map(&:to_s) || @primary_key)
|
15
|
-
@primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
|
16
|
-
@pluck_columns = if @primary_key_index
|
17
|
-
@columns
|
18
|
-
else
|
19
|
-
@columns + [@primary_key]
|
20
|
-
end
|
21
|
-
@batch_size = batch_size
|
22
|
-
@cursor = Array.wrap(cursor)
|
23
|
-
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
24
|
-
if relation.joins_values.present? && !@columns.all?(/\./)
|
25
|
-
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
26
|
-
end
|
27
|
-
|
28
13
|
if relation.arel.orders.present? || relation.arel.taken.present?
|
29
14
|
raise ArgumentError,
|
30
15
|
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
31
16
|
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
32
17
|
end
|
33
18
|
|
34
|
-
@
|
19
|
+
@relation = relation
|
20
|
+
@primary_key = relation.primary_key
|
21
|
+
columns = Array(columns || @primary_key).map(&:to_s)
|
22
|
+
|
23
|
+
if (Array(order) - [:asc, :desc]).any?
|
24
|
+
raise ArgumentError, ":order must be :asc or :desc or an array consisting of :asc or :desc, got #{order.inspect}"
|
25
|
+
end
|
26
|
+
|
27
|
+
if order.is_a?(Array) && order.size != columns.size
|
28
|
+
raise ArgumentError, ":order must include a direction for each batching column"
|
29
|
+
end
|
30
|
+
|
31
|
+
@primary_key_index = primary_key_index(columns, relation)
|
32
|
+
if @primary_key_index.nil? || (composite_primary_key? && @primary_key_index.any?(nil))
|
33
|
+
raise ArgumentError, ":columns must include a primary key columns"
|
34
|
+
end
|
35
|
+
|
36
|
+
@batch_size = batch_size
|
37
|
+
@order = batch_order(columns, order)
|
38
|
+
@cursor = Array(cursor)
|
39
|
+
|
40
|
+
if @cursor.present? && @cursor.size != columns.size
|
41
|
+
raise ArgumentError, ":cursor must include values for all the columns from :columns"
|
42
|
+
end
|
43
|
+
|
44
|
+
if columns.any?(/\W/)
|
45
|
+
arel_columns = columns.map.with_index do |column, i|
|
46
|
+
arel_column(column).as("cursor_column_#{i + 1}")
|
47
|
+
end
|
48
|
+
@cursor_columns = arel_columns.map { |column| column.right.to_s }
|
49
|
+
|
50
|
+
relation =
|
51
|
+
if relation.select_values.empty?
|
52
|
+
relation.select(@relation.arel_table[Arel.star], arel_columns)
|
53
|
+
else
|
54
|
+
relation.select(arel_columns)
|
55
|
+
end
|
56
|
+
else
|
57
|
+
@cursor_columns = columns
|
58
|
+
end
|
59
|
+
|
60
|
+
@columns = columns
|
61
|
+
ordering = @columns.zip(@order).to_h
|
62
|
+
@base_relation = relation.reorder(ordering)
|
35
63
|
@iteration_count = 0
|
36
64
|
end
|
37
65
|
|
38
66
|
def records
|
39
67
|
Enumerator.new(-> { records_size }) do |yielder|
|
40
|
-
batches.each do |batch, _|
|
68
|
+
batches.each do |batch, _| # rubocop:disable Style/HashEachMethods
|
41
69
|
batch.each do |record|
|
42
|
-
|
70
|
+
increment_iteration
|
43
71
|
yielder.yield(record, cursor_value(record))
|
44
72
|
end
|
45
73
|
end
|
@@ -49,7 +77,7 @@ module SidekiqIteration
|
|
49
77
|
def batches
|
50
78
|
Enumerator.new(-> { records_size }) do |yielder|
|
51
79
|
while (batch = next_batch(load: true))
|
52
|
-
|
80
|
+
increment_iteration
|
53
81
|
yielder.yield(batch, cursor_value(batch.last))
|
54
82
|
end
|
55
83
|
end
|
@@ -58,13 +86,44 @@ module SidekiqIteration
|
|
58
86
|
def relations
|
59
87
|
Enumerator.new(-> { relations_size }) do |yielder|
|
60
88
|
while (batch = next_batch(load: false))
|
61
|
-
|
89
|
+
increment_iteration
|
62
90
|
yielder.yield(batch, unwrap_array(@cursor))
|
63
91
|
end
|
64
92
|
end
|
65
93
|
end
|
66
94
|
|
67
95
|
private
|
96
|
+
def primary_key_index(columns, relation)
|
97
|
+
indexes = Array(@primary_key).map do |pk_column|
|
98
|
+
columns.index do |column|
|
99
|
+
column == pk_column ||
|
100
|
+
(column.include?(relation.table_name) && column.include?(pk_column))
|
101
|
+
end
|
102
|
+
end
|
103
|
+
|
104
|
+
if composite_primary_key?
|
105
|
+
indexes
|
106
|
+
else
|
107
|
+
indexes.first
|
108
|
+
end
|
109
|
+
end
|
110
|
+
|
111
|
+
def batch_order(columns, order)
|
112
|
+
if order.is_a?(Array)
|
113
|
+
order
|
114
|
+
else
|
115
|
+
[order] * columns.size
|
116
|
+
end
|
117
|
+
end
|
118
|
+
|
119
|
+
def arel_column(column)
|
120
|
+
if column.include?(".")
|
121
|
+
Arel.sql(column)
|
122
|
+
else
|
123
|
+
@relation.arel_table[column]
|
124
|
+
end
|
125
|
+
end
|
126
|
+
|
68
127
|
def records_size
|
69
128
|
@base_relation.count(:all)
|
70
129
|
end
|
@@ -75,8 +134,8 @@ module SidekiqIteration
|
|
75
134
|
|
76
135
|
def next_batch(load:)
|
77
136
|
batch_relation = @base_relation.limit(@batch_size)
|
78
|
-
if
|
79
|
-
batch_relation = batch_relation
|
137
|
+
if @cursor.present?
|
138
|
+
batch_relation = apply_cursor(batch_relation)
|
80
139
|
end
|
81
140
|
|
82
141
|
records = nil
|
@@ -92,9 +151,7 @@ module SidekiqIteration
|
|
92
151
|
cursor = cursor_values.last
|
93
152
|
return unless cursor.present?
|
94
153
|
|
95
|
-
|
96
|
-
cursor.pop unless @primary_key_index
|
97
|
-
@cursor = Array.wrap(cursor)
|
154
|
+
@cursor = Array(cursor)
|
98
155
|
|
99
156
|
# Yields relations by selecting the primary keys of records in the batch.
|
100
157
|
# Post.where(published: nil) results in an enumerator of relations like:
|
@@ -105,79 +162,89 @@ module SidekiqIteration
|
|
105
162
|
end
|
106
163
|
|
107
164
|
def pluck_columns(batch)
|
108
|
-
|
109
|
-
|
110
|
-
@pluck_columns.map { |column| column.to_s.split(".").last }
|
111
|
-
else
|
112
|
-
@pluck_columns
|
113
|
-
end
|
114
|
-
|
115
|
-
if columns.size == 1 # only the primary key
|
116
|
-
column_values = batch.pluck(columns.first)
|
165
|
+
if @cursor_columns.size == 1 # only the primary key
|
166
|
+
column_values = batch.pluck(@cursor_columns.first)
|
117
167
|
return [column_values, column_values]
|
118
168
|
end
|
119
169
|
|
120
|
-
column_values = batch.pluck(
|
121
|
-
|
122
|
-
|
170
|
+
column_values = batch.pluck(*@cursor_columns)
|
171
|
+
primary_key_values =
|
172
|
+
if composite_primary_key?
|
173
|
+
column_values.map { |values| values.values_at(*@primary_key_index) }
|
174
|
+
else
|
175
|
+
column_values.map { |values| values[@primary_key_index] }
|
176
|
+
end
|
123
177
|
|
124
|
-
serialize_column_values
|
178
|
+
column_values = serialize_column_values(column_values)
|
125
179
|
[column_values, primary_key_values]
|
126
180
|
end
|
127
181
|
|
128
182
|
def cursor_value(record)
|
129
|
-
positions = @
|
130
|
-
|
131
|
-
column_value(record[attribute_name])
|
183
|
+
positions = @cursor_columns.map do |column|
|
184
|
+
column_value(record[column])
|
132
185
|
end
|
133
186
|
|
134
187
|
unwrap_array(positions)
|
135
188
|
end
|
136
189
|
|
137
|
-
|
138
|
-
|
190
|
+
# (x, y) >= (a, b) iff (x > a or (x = a and y >= b))
|
191
|
+
# (x, y) <= (a, b) iff (x < a or (x = a and y <= b))
|
192
|
+
def apply_cursor(relation)
|
193
|
+
arel_columns = @columns.map { |column| arel_column(column) }
|
194
|
+
cursor_positions = arel_columns.zip(@cursor, cursor_operators)
|
139
195
|
|
140
|
-
|
141
|
-
|
142
|
-
|
143
|
-
|
144
|
-
|
145
|
-
|
146
|
-
|
147
|
-
|
148
|
-
|
196
|
+
where_clause = nil
|
197
|
+
cursor_positions.reverse_each.with_index do |(arel_column, value, operator), index|
|
198
|
+
where_clause =
|
199
|
+
if index == 0
|
200
|
+
arel_column.public_send(operator, value)
|
201
|
+
else
|
202
|
+
arel_column.public_send(operator, value).or(
|
203
|
+
arel_column.eq(value).and(where_clause),
|
204
|
+
)
|
205
|
+
end
|
149
206
|
end
|
150
207
|
|
151
|
-
|
208
|
+
relation.where(where_clause)
|
152
209
|
end
|
153
210
|
|
154
|
-
|
155
|
-
|
156
|
-
|
211
|
+
def serialize_column_values(column_values)
|
212
|
+
column_values.map { |values| values.map { |value| column_value(value) } }
|
213
|
+
end
|
157
214
|
|
158
|
-
|
159
|
-
|
160
|
-
|
215
|
+
def column_value(value)
|
216
|
+
if value.is_a?(Time)
|
217
|
+
value.strftime(SQL_DATETIME_WITH_NSEC)
|
161
218
|
else
|
162
|
-
|
163
|
-
|
164
|
-
|
219
|
+
value
|
220
|
+
end
|
221
|
+
end
|
222
|
+
|
223
|
+
def cursor_operators
|
224
|
+
# Start from the record pointed by cursor when just starting.
|
225
|
+
@columns.zip(@order).map do |column, order|
|
226
|
+
if column == @columns.last
|
227
|
+
if order == :asc
|
228
|
+
first_iteration? ? :gteq : :gt
|
229
|
+
else
|
230
|
+
first_iteration? ? :lteq : :lt
|
231
|
+
end
|
165
232
|
else
|
166
|
-
|
233
|
+
order == :asc ? :gt : :lt
|
167
234
|
end
|
168
235
|
end
|
169
236
|
end
|
170
237
|
|
171
|
-
def
|
172
|
-
|
238
|
+
def increment_iteration
|
239
|
+
@iteration_count += 1
|
173
240
|
end
|
174
241
|
|
175
|
-
def
|
176
|
-
|
177
|
-
|
178
|
-
|
179
|
-
|
180
|
-
|
242
|
+
def first_iteration?
|
243
|
+
@iteration_count == 0
|
244
|
+
end
|
245
|
+
|
246
|
+
def composite_primary_key?
|
247
|
+
@primary_key.is_a?(Array)
|
181
248
|
end
|
182
249
|
|
183
250
|
def unwrap_array(array)
|
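Note on `apply_cursor`: the tuple comparison in its comment can be checked with a small plain-Ruby sketch that folds the columns from last to first, in the same spirit as the Arel code above. `cursor_predicate` is a hypothetical helper that works on in-memory arrays, not the gem's method. It covers only the strict comparison used after the first iteration; the inclusive `:gteq` / `:lteq` case for the first iteration is left out.

```ruby
# Builds a lambda that is true for rows strictly after `cursor` in the given
# per-column ordering. Rows and cursor are arrays of column values.
def cursor_predicate(cursor, order)
  predicate = nil

  # Fold from the last column to the first, nesting the previous predicate.
  cursor.zip(order).each_with_index.reverse_each do |(cursor_value, direction), index|
    strictly_after = lambda do |row|
      direction == :asc ? row[index] > cursor_value : row[index] < cursor_value
    end
    inner = predicate

    predicate =
      if inner.nil?
        strictly_after
      else
        # column past the cursor, OR column equal AND the remaining columns past it
        ->(row) { strictly_after.call(row) || (row[index] == cursor_value && inner.call(row)) }
      end
  end

  predicate
end

# Rows already sorted by shop_id ASC, id DESC.
rows = [[1, 9], [1, 5], [1, 3], [2, 8], [2, 4]]
after_cursor = cursor_predicate([1, 5], [:asc, :desc])
remaining = rows.select { |row| after_cursor.call(row) }
```

With the cursor at `[1, 5]`, the predicate reads as `shop_id > 1 OR (shop_id = 1 AND id < 5)`, so only rows that follow the cursor in the mixed ordering are kept.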
data/lib/sidekiq_iteration/csv_enumerator.rb
CHANGED
@@ -36,7 +36,7 @@ module SidekiqIteration
|
|
36
36
|
# SidekiqIteration::CsvEnumerator.new(csv).rows(cursor: cursor)
|
37
37
|
#
|
38
38
|
def initialize(csv)
|
39
|
-
unless csv.instance_of?(CSV)
|
39
|
+
unless defined?(CSV) && csv.instance_of?(CSV)
|
40
40
|
raise ArgumentError, "CsvEnumerator.new takes CSV object"
|
41
41
|
end
|
42
42
|
|
data/lib/sidekiq_iteration/enumerators.rb
CHANGED
@@ -17,10 +17,6 @@ module SidekiqIteration
|
|
17
17
|
def array_enumerator(array, cursor:)
|
18
18
|
raise ArgumentError, "array must be an Array" unless array.is_a?(Array)
|
19
19
|
|
20
|
-
if defined?(ActiveRecord) && array.any?(ActiveRecord::Base)
|
21
|
-
raise ArgumentError, "array cannot contain ActiveRecord objects"
|
22
|
-
end
|
23
|
-
|
24
20
|
array.each_with_index.drop(cursor || 0).to_enum { array.size }
|
25
21
|
end
|
26
22
|
|
@@ -28,9 +24,10 @@ module SidekiqIteration
|
|
28
24
|
#
|
29
25
|
# @param scope [ActiveRecord::Relation] scope to iterate
|
30
26
|
# @param cursor [Object] offset to start iteration from, usually an id
|
31
|
-
# @option options :columns [Array<String, Symbol
|
27
|
+
# @option options :columns [Array<String, Symbol>, String, Symbol] used to build the actual query for iteration,
|
32
28
|
# defaults to primary key
|
33
29
|
# @option options :batch_size [Integer] (100) size of the batch
|
30
|
+
# @option options :order [:asc, :desc, Array<:asc, :desc>] (:asc) specifies iteration order
|
34
31
|
#
|
35
32
|
# +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
|
36
33
|
#
|
@@ -58,7 +55,7 @@ module SidekiqIteration
|
|
58
55
|
# As a result of this query pattern, if the values in these columns change for the records in scope during
|
59
56
|
# iteration, they may be skipped or yielded multiple times depending on the nature of the update and the
|
60
57
|
# cursor's value. If the value gets updated to a greater value than the cursor's value, it will get yielded
|
61
|
-
# again. Similarly, if the value gets updated to a lesser value than the
|
58
|
+
# again. Similarly, if the value gets updated to a lesser value than the cursor's value, it will get skipped.
|
62
59
|
#
|
63
60
|
# @example
|
64
61
|
# def build_enumerator(cursor:)
|
data/lib/sidekiq_iteration/iteration.rb
CHANGED
@@ -13,15 +13,14 @@ module SidekiqIteration
|
|
13
13
|
base.extend(Throttling)
|
14
14
|
|
15
15
|
base.class_eval do
|
16
|
-
throttle_on(backoff:
|
16
|
+
throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
|
17
17
|
job.class.max_job_runtime &&
|
18
18
|
job.start_time &&
|
19
19
|
(Time.now.utc - job.start_time) > job.class.max_job_runtime
|
20
20
|
end
|
21
21
|
|
22
|
-
throttle_on(backoff:
|
23
|
-
|
24
|
-
Sidekiq::CLI.instance.launcher.stopping?
|
22
|
+
throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
|
23
|
+
SidekiqIteration.stopping
|
25
24
|
end
|
26
25
|
end
|
27
26
|
|
@@ -56,16 +55,22 @@ module SidekiqIteration
|
|
56
55
|
|
57
56
|
attr_reader :executions,
|
58
57
|
:cursor_position,
|
59
|
-
:start_time,
|
60
58
|
:times_interrupted,
|
61
|
-
:total_time,
|
62
59
|
:current_run_iterations
|
63
60
|
|
61
|
+
# The time when the job starts running. If the job is interrupted and runs again,
|
62
|
+
# the value is updated.
|
63
|
+
attr_reader :start_time
|
64
|
+
|
65
|
+
# The total time the job has been running, including multiple iterations.
|
66
|
+
# The time isn't reset if the job is interrupted.
|
67
|
+
attr_reader :total_time
|
68
|
+
|
64
69
|
# @private
|
65
70
|
def initialize
|
66
71
|
super
|
67
72
|
@arguments = nil
|
68
|
-
@job_iteration_retry_backoff =
|
73
|
+
@job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
|
69
74
|
@needs_reenqueue = false
|
70
75
|
@current_run_iterations = 0
|
71
76
|
end
|
@@ -82,6 +87,12 @@ module SidekiqIteration
     def on_start
     end
 
+    # A hook to override that will be called around each iteration.
+    # Can be useful for some metrics collection, performance tracking etc.
+    def around_iteration
+      yield
+    end
+
     # A hook to override that will be called when the job resumes iterating.
     def on_resume
     end
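As a rough illustration of how the new `around_iteration` hook could be used, the plain-Ruby sketch below mimics the call pattern from this diff without loading Sidekiq. `MiniIterationJob`, `TimedJob`, and `run` are hypothetical names invented for the example; only `around_iteration` and `each_iteration` come from the gem, and the driver loop is a simplification of the real one.

```ruby
# Hypothetical stand-in for SidekiqIteration::Iteration; not the gem's class.
class MiniIterationJob
  # Default hook: just runs the iteration, as in the diff above.
  def around_iteration
    yield
  end

  # Simplified driver loop: wraps every each_iteration call in the hook.
  def run(items)
    items.each do |item|
      around_iteration { each_iteration(item) }
    end
  end
end

class TimedJob < MiniIterationJob
  attr_reader :durations, :processed

  def initialize
    super
    @durations = []
    @processed = []
  end

  # Override the hook to record how long each iteration takes.
  def around_iteration
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @durations << (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
  end

  def each_iteration(item)
    @processed << item * 2
  end
end

job = TimedJob.new
job.run([1, 2, 3])
```

Because the override records the duration in an `ensure` clause, a failing iteration is still measured before the error propagates.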
@@ -172,7 +183,9 @@ module SidekiqIteration
 
       enumerator.each do |object_from_enumerator, index|
         found_record = true
-
+        around_iteration do
+          each_iteration(object_from_enumerator, *arguments)
+        end
         @cursor_position = index
         @current_run_iterations += 1
 
@@ -191,14 +204,14 @@ module SidekiqIteration
         )
       end
 
-      adjust_total_time
       true
+    ensure
+      adjust_total_time
     end
 
     def reenqueue_iteration_job
       SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
 
-      adjust_total_time
       @times_interrupted += 1
 
       arguments = @arguments
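Moving `adjust_total_time` into an `ensure` clause means the elapsed time is recorded whether the iteration loop returns normally or raises. A minimal sketch of that pattern follows; `RunTimer` and `iterate` are hypothetical names, and the time calculation is only assumed to resemble the gem's private `adjust_total_time`.

```ruby
# Hypothetical helper; mirrors only the ensure-based bookkeeping from the diff.
class RunTimer
  attr_reader :total_time, :runs

  def initialize
    @total_time = 0.0
    @runs = 0
  end

  # Runs the block and always accounts for the elapsed time,
  # whether the block returns normally or raises.
  def iterate
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    true
  ensure
    adjust_total_time(started)
  end

  private

  def adjust_total_time(started)
    @total_time += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @runs += 1
  end
end

timer = RunTimer.new
timer.iterate { :ok }

begin
  timer.iterate { raise "boom" }
rescue RuntimeError
  # The error still propagates, but the time was recorded first.
end
```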
@@ -252,13 +265,6 @@ module SidekiqIteration
         true
       when false, :skip_complete_callback
         false
-      when Array # can be used to return early from the enumerator
-        reason, backoff = completed
-        raise "Unknown reason: #{reason}" unless reason == :retry
-
-        @job_iteration_retry_backoff = backoff
-        @needs_reenqueue = true
-        false
       else
         raise "Unexpected thrown value: #{completed.inspect}"
       end
data/lib/sidekiq_iteration.rb
CHANGED
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 require "sidekiq"
+require_relative "sidekiq_iteration/iteration"
+require_relative "sidekiq_iteration/job_retry_patch"
 require_relative "sidekiq_iteration/version"
 
 module SidekiqIteration
@@ -22,6 +24,17 @@ module SidekiqIteration
     #
     attr_accessor :max_job_runtime
 
+    # Configures a delay duration to wait before resuming an interrupted job.
+    #
+    # @example
+    #   SidekiqIteration.default_retry_backoff = 10.seconds
+    #
+    # Defaults to nil which means interrupted jobs will be retried immediately.
+    # This value will be ignored when an interruption is raised by a throttle enumerator,
+    # where the throttle backoff value will take precedence over this setting.
+    #
+    attr_accessor :default_retry_backoff
+
     # Set a custom logger for sidekiq-iteration.
     # Defaults to `Sidekiq.logger`.
     #
@@ -33,8 +46,14 @@ module SidekiqIteration
     def logger
       @logger ||= Sidekiq.logger
     end
+
+    # @private
+    attr_accessor :stopping
   end
 end
 
-
-
+Sidekiq.configure_server do |config|
+  config.on(:quiet) do
+    SidekiqIteration.stopping = true
+  end
+end
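The two module-level settings added in this file can be pictured with the plain-Ruby sketch below, which runs without Sidekiq. `MiniConfig` and `retry_delay` are hypothetical names used only for illustration; the precedence rule (a throttle backoff wins over `default_retry_backoff`, and `nil` means an immediate retry) is taken from the comment in the diff, while the way the gem applies it internally is an assumption here.

```ruby
# Hypothetical stand-in for the SidekiqIteration module-level settings.
module MiniConfig
  class << self
    # Delay before resuming an interrupted job; nil means "retry immediately".
    attr_accessor :default_retry_backoff
    # Set to true once the process starts shutting down.
    attr_accessor :stopping
  end

  # Picks the delay in seconds: a throttle-specific backoff wins over the default.
  def self.retry_delay(throttle_backoff = nil)
    throttle_backoff || default_retry_backoff || 0
  end
end

MiniConfig.default_retry_backoff = 10
MiniConfig.stopping = true # what a shutdown callback such as on(:quiet) would do
```

Keeping `stopping` as a simple flag that a lifecycle callback sets lets the throttle condition be a cheap read instead of a call into the Sidekiq launcher.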
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sidekiq-iteration
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - fatkodima
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2024-05-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sidekiq
@@ -35,6 +35,7 @@ files:
 - CHANGELOG.md
 - LICENSE.txt
 - README.md
+- guides/argument-semantics.md
 - guides/best-practices.md
 - guides/custom-enumerator.md
 - guides/iteration-how-it-works.md
@@ -71,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.19
 signing_key:
 specification_version: 4
-summary: Makes your sidekiq jobs interruptible and resumable.
+summary: Makes your long-running sidekiq jobs interruptible and resumable.
 test_files: []