sidekiq-iteration 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +265 -0
- data/guides/best-practices.md +71 -0
- data/guides/custom-enumerator.md +98 -0
- data/guides/iteration-how-it-works.md +71 -0
- data/guides/throttling.md +42 -0
- data/lib/sidekiq-iteration.rb +3 -0
- data/lib/sidekiq_iteration/active_record_batch_enumerator.rb +127 -0
- data/lib/sidekiq_iteration/active_record_cursor.rb +89 -0
- data/lib/sidekiq_iteration/active_record_enumerator.rb +69 -0
- data/lib/sidekiq_iteration/csv_enumerator.rb +85 -0
- data/lib/sidekiq_iteration/enumerators.rb +187 -0
- data/lib/sidekiq_iteration/iteration.rb +267 -0
- data/lib/sidekiq_iteration/job_retry_patch.rb +30 -0
- data/lib/sidekiq_iteration/nested_enumerator.rb +39 -0
- data/lib/sidekiq_iteration/throttling.rb +45 -0
- data/lib/sidekiq_iteration/version.rb +5 -0
- data/lib/sidekiq_iteration.rb +40 -0
- metadata +80 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 82316cffa840b2c9619792b6f0c5bb7ec696f964ad81edf6b0ed5861339ca064
|
4
|
+
data.tar.gz: 8337c0e87e6be8858d5b9d868c8a04b05f91de315be2e9aca9c40e8447c78644
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 63712780bca873613cbe3ef89ff0037c3eaa5633a28382eb02427311360714a89f092e5efef29873efa75067d22f32745eea3f04d7606442e29da33e2e2e6a08
|
7
|
+
data.tar.gz: 4ded7fc6ab772c019154e6559027c87d389a356a16915d2c869e318fb26f5339dd98355a0f1eb422358c1cd9f7fe31bc2a70d5627a577f8f45394db15d0593b7
|
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
The MIT License (MIT)
|
2
|
+
|
3
|
+
Copyright (c) 2022 fatkodima
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
13
|
+
all copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,265 @@
|
|
1
|
+
# Sidekiq Iteration
|
2
|
+
|
3
|
+
[![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)
|
4
|
+
|
5
|
+
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
6
|
+
|
7
|
+
## Background
|
8
|
+
|
9
|
+
Imagine the following job:
|
10
|
+
|
11
|
+
```ruby
|
12
|
+
class SimpleJob
|
13
|
+
include Sidekiq::Job
|
14
|
+
|
15
|
+
def perform
|
16
|
+
User.find_each do |user|
|
17
|
+
user.notify_about_something
|
18
|
+
end
|
19
|
+
end
|
20
|
+
end
|
21
|
+
```
|
22
|
+
|
23
|
+
The job would run fairly quickly when you only have a hundred `User` records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate and the job will end up taking hours or even days.
|
24
|
+
|
25
|
+
With frequent deploys and worker restarts, it would mean that a job will be either lost or restarted from the beginning. Some records (especially those in the beginning of the relation) will be processed more than once.
|
26
|
+
|
27
|
+
Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours or days. What if AWS diagnoses the instance as unhealthy and restarts it in 5 minutes? All job progress will be lost.
|
28
|
+
|
29
|
+
Software that is designed for high availability [must be resilient](https://12factor.net/disposability) to interruptions that come from the infrastructure. That's exactly what Iteration brings to Sidekiq.
|
30
|
+
|
31
|
+
## Requirements
|
32
|
+
|
33
|
+
- Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
|
34
|
+
- Sidekiq 6+
|
35
|
+
|
36
|
+
## Getting started
|
37
|
+
|
38
|
+
Add this line to your application's Gemfile:
|
39
|
+
|
40
|
+
```ruby
|
41
|
+
gem 'sidekiq-iteration'
|
42
|
+
```
|
43
|
+
|
44
|
+
And then execute:
|
45
|
+
|
46
|
+
$ bundle
|
47
|
+
|
48
|
+
In the job, include the `SidekiqIteration::Iteration` module and describe the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
|
49
|
+
|
50
|
+
```ruby
|
51
|
+
class NotifyUsersJob
|
52
|
+
include Sidekiq::Job
|
53
|
+
include SidekiqIteration::Iteration
|
54
|
+
|
55
|
+
def build_enumerator(cursor:)
|
56
|
+
active_record_records_enumerator(User.all, cursor: cursor)
|
57
|
+
end
|
58
|
+
|
59
|
+
def each_iteration(user)
|
60
|
+
user.notify_about_something
|
61
|
+
end
|
62
|
+
end
|
63
|
+
```
|
64
|
+
|
65
|
+
`each_iteration` will be called for each `User` record in the `User.all` relation. The relation will be ordered by primary key, exactly as `find_each` does.
|
66
|
+
Iteration hooks into Sidekiq out of the box to support graceful interruption. No extra configuration is required.
|
67
|
+
|
68
|
+
## Examples
|
69
|
+
|
70
|
+
### Job with custom arguments
|
71
|
+
|
72
|
+
```ruby
|
73
|
+
class ArgumentsJob
|
74
|
+
include Sidekiq::Job
|
75
|
+
include SidekiqIteration::Iteration
|
76
|
+
|
77
|
+
def build_enumerator(arg1, arg2, cursor:)
|
78
|
+
active_record_records_enumerator(User.all, cursor: cursor)
|
79
|
+
end
|
80
|
+
|
81
|
+
def each_iteration(user, arg1, arg2)
|
82
|
+
user.notify_about_something
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
ArgumentsJob.perform_async(arg1, arg2)
|
87
|
+
```
|
88
|
+
|
89
|
+
### Job with custom lifecycle callbacks
|
90
|
+
|
91
|
+
```ruby
|
92
|
+
class NotifyUsersJob
|
93
|
+
include Sidekiq::Job
|
94
|
+
include SidekiqIteration::Iteration
|
95
|
+
|
96
|
+
def on_start
|
97
|
+
# Will be called when the job starts iterating. Called only once, for the first time.
|
98
|
+
end
|
99
|
+
|
100
|
+
def on_resume
|
101
|
+
# Called when the job resumes iterating.
|
102
|
+
end
|
103
|
+
|
104
|
+
def on_shutdown
|
105
|
+
# Called each time the job is interrupted.
|
106
|
+
# This can be due to throttling, `max_job_runtime` configuration, or sidekiq restarting.
|
107
|
+
end
|
108
|
+
|
109
|
+
def on_complete
|
110
|
+
# Called when the job finished iterating.
|
111
|
+
end
|
112
|
+
|
113
|
+
# ...
|
114
|
+
end
|
115
|
+
```
|
116
|
+
|
117
|
+
### Iterating over batches of Active Record objects
|
118
|
+
|
119
|
+
```ruby
|
120
|
+
class BatchesJob
|
121
|
+
include Sidekiq::Job
|
122
|
+
include SidekiqIteration::Iteration
|
123
|
+
|
124
|
+
def build_enumerator(product_id, cursor:)
|
125
|
+
active_record_batches_enumerator(
|
126
|
+
Comment.where(product_id: product_id).select(:id),
|
127
|
+
cursor: cursor,
|
128
|
+
batch_size: 100,
|
129
|
+
)
|
130
|
+
end
|
131
|
+
|
132
|
+
def each_iteration(batch_of_comments, product_id)
|
133
|
+
comment_ids = batch_of_comments.map(&:id)
|
134
|
+
CommentService.call(comment_ids: comment_ids)
|
135
|
+
end
|
136
|
+
end
|
137
|
+
```
|
138
|
+
|
139
|
+
### Iterating over batches of Active Record Relations
|
140
|
+
|
141
|
+
```ruby
|
142
|
+
class BatchesAsRelationJob
|
143
|
+
include Sidekiq::Job
|
144
|
+
include SidekiqIteration::Iteration
|
145
|
+
|
146
|
+
def build_enumerator(product_id, cursor:)
|
147
|
+
active_record_relations_enumerator(
|
148
|
+
Product.find(product_id).comments,
|
149
|
+
cursor: cursor,
|
150
|
+
batch_size: 100,
|
151
|
+
)
|
152
|
+
end
|
153
|
+
|
154
|
+
def each_iteration(batch_of_comments, product_id)
|
155
|
+
# batch_of_comments will be a Comment::ActiveRecord_Relation
|
156
|
+
batch_of_comments.update_all(deleted: true)
|
157
|
+
end
|
158
|
+
end
|
159
|
+
```
|
160
|
+
|
161
|
+
### Iterating over arrays
|
162
|
+
|
163
|
+
```ruby
|
164
|
+
class ArrayJob
|
165
|
+
include Sidekiq::Job
|
166
|
+
include SidekiqIteration::Iteration
|
167
|
+
|
168
|
+
def build_enumerator(cursor:)
|
169
|
+
array_enumerator(['build', 'enumerator', 'from', 'any', 'array'], cursor: cursor)
|
170
|
+
end
|
171
|
+
|
172
|
+
def each_iteration(array_element)
|
173
|
+
# use array_element
|
174
|
+
end
|
175
|
+
end
|
176
|
+
```
|
177
|
+
|
178
|
+
### Iterating over CSV
|
179
|
+
|
180
|
+
```ruby
|
181
|
+
class CsvJob
|
182
|
+
include Sidekiq::Job
|
183
|
+
include SidekiqIteration::Iteration
|
184
|
+
|
185
|
+
def build_enumerator(import_id, cursor:)
|
186
|
+
import = Import.find(import_id)
|
187
|
+
csv_enumerator(import.csv, cursor: cursor)
|
188
|
+
end
|
189
|
+
|
190
|
+
def each_iteration(csv_row)
|
191
|
+
# insert csv_row to database
|
192
|
+
end
|
193
|
+
end
|
194
|
+
```
|
195
|
+
|
196
|
+
### Nested iteration
|
197
|
+
|
198
|
+
```ruby
|
199
|
+
class NestedIterationJob
|
200
|
+
include Sidekiq::Job
|
201
|
+
include SidekiqIteration::Iteration
|
202
|
+
|
203
|
+
def build_enumerator(cursor:)
|
204
|
+
nested_enumerator(
|
205
|
+
[
|
206
|
+
->(cursor) { active_record_records_enumerator(Shop.all, cursor: cursor) },
|
207
|
+
->(shop, cursor) { active_record_records_enumerator(shop.products, cursor: cursor) },
|
208
|
+
->(_shop, product, cursor) { active_record_relations_enumerator(product.product_variants, cursor: cursor) }
|
209
|
+
],
|
210
|
+
cursor: cursor
|
211
|
+
)
|
212
|
+
end
|
213
|
+
|
214
|
+
def each_iteration(product_variants_relation)
|
215
|
+
# do something
|
216
|
+
end
|
217
|
+
end
|
218
|
+
```
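
To make the idea of a cursor that spans several levels more concrete, here is a simplified two-level sketch in plain Ruby. The `nested_pairs` helper and its `[outer_index, inner_index]` cursor are assumptions made for illustration; they are not the gem's `nested_enumerator` implementation or its actual cursor format.

```ruby
# Hypothetical helper: walks every inner item of every outer item and yields
# [inner_item, [outer_index, inner_index]]. The cursor names the last pair
# that was fully processed, so iteration resumes right after it.
def nested_pairs(outer_items, inner_for, cursor:)
  outer_cursor, inner_cursor = cursor || [nil, nil]

  Enumerator.new do |yielder|
    outer_items.each_with_index do |outer, i|
      # Skip outer items that were finished before the interruption.
      next if outer_cursor && i < outer_cursor

      inner_for.call(outer).each_with_index do |inner, j|
        # Inside the outer item we stopped in, skip already processed inner items.
        next if outer_cursor == i && inner_cursor && j <= inner_cursor

        yielder.yield(inner, [i, j])
      end
    end
  end
end

shops = { "s1" => %w[p1 p2], "s2" => %w[p3] }
products_for = ->(shop) { shops.fetch(shop) }

nested_pairs(shops.keys, products_for, cursor: nil).to_a
# [["p1", [0, 0]], ["p2", [0, 1]], ["p3", [1, 0]]]
nested_pairs(shops.keys, products_for, cursor: [0, 0]).to_a
# [["p2", [0, 1]], ["p3", [1, 0]]]
```

The point of the composite cursor is that a resumed job neither revisits finished outer items nor repeats inner items it already handled.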
|
219
|
+
|
220
|
+
## Guides
|
221
|
+
|
222
|
+
* [Iteration: how it works](guides/iteration-how-it-works.md)
|
223
|
+
* [Best practices](guides/best-practices.md)
|
224
|
+
* [Writing custom enumerator](guides/custom-enumerator.md)
|
225
|
+
* [Throttling](guides/throttling.md)
|
226
|
+
|
227
|
+
For more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq-iteration).
|
228
|
+
|
229
|
+
## API
|
230
|
+
|
231
|
+
An Iteration job must respond to the `build_enumerator` and `each_iteration` methods. `build_enumerator` must return an [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.
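
As a rough illustration of what respecting the `cursor` value can look like, the sketch below builds a plain Ruby `Enumerator` over an array. It yields `[element, cursor]` pairs, following the shape used in the custom enumerator guide, and skips everything up to the saved cursor. The `cursor_respecting_enumerator` helper is hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of sidekiq-iteration: yields [element, index]
# pairs and resumes right after the index stored in `cursor`.
def cursor_respecting_enumerator(items, cursor:)
  start = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    items.each_with_index.drop(start).each do |item, index|
      yielder.yield(item, index) # the second value becomes the next cursor
    end
  end
end

cursor_respecting_enumerator(%w[a b c d], cursor: nil).to_a
# [["a", 0], ["b", 1], ["c", 2], ["d", 3]]
cursor_respecting_enumerator(%w[a b c d], cursor: 1).to_a
# [["c", 2], ["d", 3]]
```

With a `nil` cursor the enumerator starts from the beginning; with a saved cursor it continues after the last element that was handled.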
|
232
|
+
|
233
|
+
## FAQ
|
234
|
+
|
235
|
+
**Why can't I just iterate in `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.
|
236
|
+
|
237
|
+
**What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will resume from the next iteration.
|
238
|
+
|
239
|
+
**What happens with retries?** An interruption of a job does not count as a retry. The iteration that caused the job to fail will be retried, and progress will continue from there.
|
240
|
+
|
241
|
+
**What happens if my iteration takes a long time?** We recommend that a single `each_iteration` should take no longer than 30 seconds. In the future, this may raise an exception.
|
242
|
+
|
243
|
+
**Why is it important that `each_iteration` takes less than 30 seconds?** When the job worker is scheduled for restart or shutdown, it gets a notice to finish the remaining unit of work. To guarantee that no progress is lost, we need to make sure that `each_iteration` completes within a reasonable amount of time.
|
244
|
+
|
245
|
+
**What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job, which will run your nested business logic. If `each_iteration` performs some other iterations, like iterating over child records, consider using [nested iterations](#nested-iteration).
|
246
|
+
|
247
|
+
**My job has a complex flow. How do I write my own Enumerator?** See [the guide on Custom Enumerators](guides/custom-enumerator.md) for details.
|
248
|
+
|
249
|
+
## Credits
|
250
|
+
|
251
|
+
Thanks to [`job-iteration` gem](https://github.com/Shopify/job-iteration) for the original implementation and inspiration.
|
252
|
+
|
253
|
+
## Development
|
254
|
+
|
255
|
+
After checking out the repo, run `bundle install` to install dependencies, and start Redis. Run `bundle exec rake` to run the linter and tests. This project uses multiple Gemfiles to test against multiple versions of Sidekiq; you can run the tests against a specific version with `BUNDLE_GEMFILE=gemfiles/sidekiq_6.gemfile bundle exec rake test`.
|
256
|
+
|
257
|
+
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
258
|
+
|
259
|
+
## Contributing
|
260
|
+
|
261
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/sidekiq-iteration.
|
262
|
+
|
263
|
+
## License
|
264
|
+
|
265
|
+
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
data/guides/best-practices.md
ADDED
@@ -0,0 +1,71 @@
|
|
1
|
+
# Best practices
|
2
|
+
|
3
|
+
## Batch iteration
|
4
|
+
|
5
|
+
Regardless of the Active Record enumerator used in the job, the `sidekiq-iteration` gem loads records in batches of 100 (by default).
|
6
|
+
The following two jobs produce equivalent database queries; however, `RecordsJob` allows for more frequent interruptions by doing just one thing in the `each_iteration` method.
|
7
|
+
|
8
|
+
```ruby
|
9
|
+
# bad
|
10
|
+
class BatchesJob
|
11
|
+
include Sidekiq::Job
|
12
|
+
include SidekiqIteration::Iteration
|
13
|
+
|
14
|
+
def build_enumerator(product_id, cursor:)
|
15
|
+
active_record_batches_enumerator(
|
16
|
+
Comment.where(product_id: product_id),
|
17
|
+
cursor: cursor,
|
18
|
+
batch_size: 5,
|
19
|
+
)
|
20
|
+
end
|
21
|
+
|
22
|
+
def each_iteration(batch_of_comments, product_id)
|
23
|
+
batch_of_comments.each(&:destroy)
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
# good
|
28
|
+
class RecordsJob
|
29
|
+
include Sidekiq::Job
|
30
|
+
include SidekiqIteration::Iteration
|
31
|
+
|
32
|
+
def build_enumerator(product_id, cursor:)
|
33
|
+
active_record_records_enumerator(
|
34
|
+
Comment.where(product_id: product_id),
|
35
|
+
cursor: cursor,
|
36
|
+
batch_size: 5,
|
37
|
+
)
|
38
|
+
end
|
39
|
+
|
40
|
+
def each_iteration(comment, product_id)
|
41
|
+
comment.destroy
|
42
|
+
end
|
43
|
+
end
|
44
|
+
```
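
The difference can be made concrete with a small plain Ruby model. It assumes, as the "how it works" guide describes, that a job can only be interrupted between calls to `each_iteration`, so each yielded element is one possible interruption point. The `checkpoints` helper is hypothetical.

```ruby
# Plain Ruby model: every element an enumerator yields is one place where
# the job could stop and save its progress.
def checkpoints(enumerator)
  enumerator.count
end

comment_ids = (1..10).to_a

checkpoints(comment_ids.each_slice(5)) # 2, one per batch of 5
checkpoints(comment_ids.each)          # 10, one per record
```

Both variants touch the same ten records, but the per-record version can stop after any of them instead of only after every fifth.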
|
45
|
+
|
46
|
+
## Max job runtime
|
47
|
+
|
48
|
+
If a job is supposed to have millions of iterations and you expect it to run for hours and days, it's still a good idea to sometimes interrupt the job even if there are no interruption signals coming from deploys or the infrastructure.
|
49
|
+
|
50
|
+
```ruby
|
51
|
+
SidekiqIteration.max_job_runtime = 5.minutes # nil by default
|
52
|
+
```
|
53
|
+
|
54
|
+
Use this accessor to tweak how often you'd like the job to interrupt itself.
|
55
|
+
|
56
|
+
### Per job max job runtime
|
57
|
+
|
58
|
+
For more granular control, `max_job_runtime` can be set **per-job class**. This allows incremental adoption, as well as combining a conservative global setting with a more aggressive setting on a per-job basis.
|
59
|
+
|
60
|
+
```ruby
|
61
|
+
class MyJob
|
62
|
+
include Sidekiq::Job
|
63
|
+
include SidekiqIteration::Iteration
|
64
|
+
|
65
|
+
self.max_job_runtime = 3.minutes
|
66
|
+
|
67
|
+
# ...
|
68
|
+
end
|
69
|
+
```
|
70
|
+
|
71
|
+
This setting will be inherited by any child classes, although it can be further overridden.
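
One way to picture this lookup order (own value, then the parent class, then the global setting) is the plain Ruby sketch below. The `RuntimeSetting` module and the job class names are made up for illustration, and the assumption that a class-level value simply wins over the global one is ours; the gem's actual implementation may differ.

```ruby
# Illustrative only: not the gem's code.
module RuntimeSetting
  class << self
    attr_accessor :max_job_runtime # global default, nil means "no limit"
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_writer :max_job_runtime

    # Look up the class's own value, then the parent's, then the global one.
    def max_job_runtime
      return @max_job_runtime if defined?(@max_job_runtime)
      return superclass.max_job_runtime if superclass.respond_to?(:max_job_runtime)

      RuntimeSetting.max_job_runtime
    end
  end
end

RuntimeSetting.max_job_runtime = 300 # seconds

class BaseJob
  include RuntimeSetting
  self.max_job_runtime = 180
end

class ChildJob < BaseJob; end # inherits 180 from BaseJob

class OverridingJob < BaseJob
  self.max_job_runtime = 60   # overrides the inherited value
end

class UnconfiguredJob
  include RuntimeSetting      # falls back to the global 300
end
```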
|
data/guides/custom-enumerator.md
ADDED
@@ -0,0 +1,98 @@
|
|
1
|
+
# Custom Enumerator
|
2
|
+
|
3
|
+
Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
|
4
|
+
|
5
|
+
## Cursorless Enumerator
|
6
|
+
|
7
|
+
Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
|
8
|
+
|
9
|
+
```ruby
|
10
|
+
class ListJob
|
11
|
+
include Sidekiq::Job
|
12
|
+
include SidekiqIteration::Iteration
|
13
|
+
|
14
|
+
def build_enumerator(*)
|
15
|
+
@redis = Redis.new
|
16
|
+
Enumerator.new do |yielder|
|
17
|
+
loop do
|
18
|
+
item = @redis.lpop("mylist") # "mylist" is a placeholder list key
|
19
|
+
break unless item
|
20
|
+
|
21
|
+
yielder.yield(item, nil)
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
def each_iteration(item)
|
27
|
+
# ...
|
28
|
+
end
|
29
|
+
end
|
30
|
+
```
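
The same pattern can be tried without a Redis server. In the sketch below an in-memory array stands in for the Redis list and `Array#shift` plays the role of `LPOP`; the `queue_enumerator` helper is hypothetical.

```ruby
# In-memory stand-in for a Redis list. Popping removes the item, so the queue
# itself records the progress and the cursor can stay nil.
def queue_enumerator(queue)
  Enumerator.new do |yielder|
    loop do
      item = queue.shift
      break unless item

      yielder.yield(item, nil)
    end
  end
end

queue = %w[a b c]
seen = []
queue_enumerator(queue).each do |item, _cursor|
  seen << item
  break if seen.size == 2 # stands in for an interruption
end

# A fresh enumerator over the same queue picks up the remaining item.
remaining = queue_enumerator(queue).map { |item, _cursor| item }
```

After the simulated interruption, `seen` holds the first two items and `remaining` holds only the third, even though no cursor was ever stored.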
|
31
|
+
|
32
|
+
## Enumerator with cursor
|
33
|
+
|
34
|
+
But what about iterating based on a cursor? Consider this Enumerator that wraps a third-party API (Stripe) for paginated iteration:
|
35
|
+
|
36
|
+
```ruby
|
37
|
+
class StripeListEnumerator
|
38
|
+
# @param resource [Stripe::APIResource] The type of Stripe object to request
|
39
|
+
# @param params [Hash] Query parameters for the request
|
40
|
+
# @param options [Hash] Request options, such as API key or version
|
41
|
+
# @param cursor [String]
|
42
|
+
def initialize(resource, params: {}, options: {}, cursor:)
|
43
|
+
pagination_params = {}
|
44
|
+
pagination_params[:starting_after] = cursor unless cursor.nil?
|
45
|
+
|
46
|
+
# The following line makes a request, consider adding your rate limiter here.
|
47
|
+
@list = resource.public_send(:list, params.merge(pagination_params), options)
|
48
|
+
end
|
49
|
+
|
50
|
+
def to_enumerator
|
51
|
+
to_enum(:each).lazy
|
52
|
+
end
|
53
|
+
|
54
|
+
private
|
55
|
+
|
56
|
+
# We yield our enumerator with the object id as the index so it is persisted
|
57
|
+
# as the cursor on the job. This allows us to properly set the
|
58
|
+
# `starting_after` parameter for the API request when resuming.
|
59
|
+
def each
|
60
|
+
loop do
|
61
|
+
@list.each do |item, _index|
|
62
|
+
yield item, item.id
|
63
|
+
end
|
64
|
+
|
65
|
+
# The following line makes a request, consider adding your rate limiter here.
|
66
|
+
@list = @list.next_page
|
67
|
+
|
68
|
+
break if @list.empty?
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
72
|
+
```
|
73
|
+
|
74
|
+
```ruby
|
75
|
+
class StripeJob
|
76
|
+
include Sidekiq::Job
|
77
|
+
include SidekiqIteration::Iteration
|
78
|
+
|
79
|
+
def build_enumerator(params, cursor:)
|
80
|
+
StripeListEnumerator.new(
|
81
|
+
Stripe::Refund,
|
82
|
+
params: { charge: "ch_123" },
|
83
|
+
options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
|
84
|
+
cursor: cursor
|
85
|
+
).to_enumerator
|
86
|
+
end
|
87
|
+
|
88
|
+
def each_iteration(stripe_refund, _params)
|
89
|
+
# ...
|
90
|
+
end
|
91
|
+
end
|
92
|
+
```
|
93
|
+
|
94
|
+
## Notes
|
95
|
+
|
96
|
+
We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
|
97
|
+
|
98
|
+
Code that is written after the `yield` in a custom enumerator is not guaranteed to execute. If a job is forced to exit (`max_job_runtime` is reached or Sidekiq is stopping), the job is re-enqueued during the yield and the rest of the code in the enumerator does not run.
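
A small plain Ruby experiment shows the effect. Here a `break` in the consuming block stands in for the job being interrupted; that substitution is an assumption for illustration, and `run_until` is a hypothetical helper.

```ruby
# Returns the log of steps that ran after each yield.
def run_until(stop_at)
  log = []
  enum = Enumerator.new do |yielder|
    [1, 2, 3].each do |id|
      yielder.yield(id, id) # item and cursor
      log << "after #{id}"  # never runs for the item the consumer stops on
    end
  end

  enum.each do |item, _cursor|
    break if item == stop_at # stands in for the job being interrupted
  end
  log
end

run_until(2)   # ["after 1"]
run_until(nil) # ["after 1", "after 2", "after 3"]
```

Any bookkeeping that must happen for an item is therefore safer before the `yield` or inside `each_iteration`.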
|
data/guides/iteration-how-it-works.md
ADDED
@@ -0,0 +1,71 @@
|
|
1
|
+
# Iteration: how it works
|
2
|
+
|
3
|
+
The main idea behind Iteration is to provide an API to describe jobs in an interruptible manner, in contrast with implementing one massive `#perform` method that is impossible to interrupt safely.
|
4
|
+
|
5
|
+
Exposing the enumerator and the action to apply allows us to keep a cursor and interrupt between iterations. Let's see what this looks like with an `ActiveRecord::Relation` (and `Enumerator`).
|
6
|
+
|
7
|
+
```ruby
|
8
|
+
class NotifyUsers
|
9
|
+
include Sidekiq::Job
|
10
|
+
include SidekiqIteration::Iteration
|
11
|
+
|
12
|
+
def build_enumerator(cursor:)
|
13
|
+
active_record_records_enumerator(User.all, cursor: cursor)
|
14
|
+
end
|
15
|
+
|
16
|
+
def each_iteration(user)
|
17
|
+
user.notify_about_something
|
18
|
+
end
|
19
|
+
end
|
20
|
+
```
|
21
|
+
|
22
|
+
1. `build_enumerator` is called, which constructs `ActiveRecordEnumerator` from an ActiveRecord relation (`User.all`)
|
23
|
+
2. The first batch of records is loaded:
|
24
|
+
|
25
|
+
```sql
|
26
|
+
SELECT "users".* FROM "users" ORDER BY "users"."id" LIMIT 100
|
27
|
+
```
|
28
|
+
|
29
|
+
3. The job iterates over two records of the relation and then receives `SIGTERM` (graceful termination signal) caused by a deploy.
|
30
|
+
4. The signal handler sets a flag that makes `job_should_exit?` return `true`.
|
31
|
+
5. After the current iteration completes, the job checks `job_should_exit?`, which now returns `true`.
|
32
|
+
6. The job stops iterating and pushes itself back to the queue, with the latest `cursor_position` value.
|
33
|
+
7. Next time when the job is taken from the queue, we'll load records starting from the last primary key that was processed:
|
34
|
+
|
35
|
+
```sql
|
36
|
+
SELECT "users".* FROM "users" WHERE "users"."id" > 2 ORDER BY "users"."id" LIMIT 100
|
37
|
+
```
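
The steps above can be modeled in a few lines of plain Ruby. This is a simplified sketch, not the gem's code: `IterationModel`, `request_exit`, and `run` are hypothetical names, integer ids stand in for `User` records, and only `job_should_exit?` and `cursor_position` are taken from the description above.

```ruby
class IterationModel
  attr_reader :processed, :cursor_position

  def initialize(records, cursor: nil)
    @records = records
    @cursor_position = cursor
    @processed = []
    @should_exit = false
  end

  # Stands in for the SIGTERM handler setting a flag.
  def request_exit
    @should_exit = true
  end

  def job_should_exit?
    @should_exit
  end

  # Returns :interrupted (re-enqueue with cursor_position) or :completed.
  def run
    # Like WHERE id > cursor: only records after the saved position.
    remaining = @records.select { |id| @cursor_position.nil? || id > @cursor_position }
    remaining.each do |id|
      @processed << id
      @cursor_position = id
      yield id if block_given?
      # The flag is checked only between iterations, never in the middle of one.
      return :interrupted if job_should_exit?
    end
    :completed
  end
end

first_run = IterationModel.new([1, 2, 3, 4, 5])
first_run.run { |id| first_run.request_exit if id == 2 } # :interrupted, cursor is 2

second_run = IterationModel.new([1, 2, 3, 4, 5], cursor: first_run.cursor_position)
second_run.run                                            # :completed, processes 3, 4, 5
```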
|
38
|
+
|
39
|
+
## Signals
|
40
|
+
|
41
|
+
It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq uses: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal, which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push the job back to the queue with the last cursor position.
|
42
|
+
`SIGTERM` is what allows Iteration to work. In contrast, `SIGKILL` means immediate exit. It doesn't let the worker terminate gracefully; instead, it drops the job and exits as soon as possible.
|
43
|
+
|
44
|
+
Most deploy strategies (Heroku, Capistrano) send `SIGTERM` before shutting down a node, then wait for a timeout (usually 30 seconds to a minute) before sending `SIGKILL` if the process has not terminated yet.
|
45
|
+
|
46
|
+
Further reading: [Sidekiq signals](https://github.com/mperham/sidekiq/wiki/Signals).
|
47
|
+
|
48
|
+
## Enumerators
|
49
|
+
|
50
|
+
Iteration supports _any_ `Enumerator`. We expose helpers to build enumerators conveniently (`active_record_records_enumerator`), but it's up to the developer to implement a custom `Enumerator`.
|
51
|
+
|
52
|
+
Consider this example:
|
53
|
+
|
54
|
+
```ruby
|
55
|
+
class MyJob
|
56
|
+
include Sidekiq::Job
|
57
|
+
include SidekiqIteration::Iteration
|
58
|
+
|
59
|
+
def build_enumerator(cursor:)
|
60
|
+
Enumerator.new do
|
61
|
+
Redis.lpop("mylist") # or: Kafka.poll(timeout: 10.seconds)
|
62
|
+
end
|
63
|
+
end
|
64
|
+
|
65
|
+
def each_iteration(element_from_redis)
|
66
|
+
# ...
|
67
|
+
end
|
68
|
+
end
|
69
|
+
```
|
70
|
+
|
71
|
+
Further reading: [ruby-doc](https://ruby-doc.org/core-3.1.2/Enumerator.html), [a great post about Enumerators](http://blog.arkency.com/2014/01/ruby-to-enum-for-enumerator/).
|
data/guides/throttling.md
ADDED
@@ -0,0 +1,42 @@
|
|
1
|
+
# Throttling
|
2
|
+
|
3
|
+
The gem provides a throttling mechanism that can be used to throttle a job when a given condition is met.
|
4
|
+
If a job is throttled, it will be interrupted and retried after a backoff period has passed.
|
5
|
+
The default backoff is 30 seconds.
|
6
|
+
|
7
|
+
Specify the throttle condition as a block:
|
8
|
+
|
9
|
+
```ruby
|
10
|
+
class DeleteAccountsThrottledJob
|
11
|
+
include Sidekiq::Job
|
12
|
+
include SidekiqIteration::Iteration
|
13
|
+
|
14
|
+
throttle_on(backoff: 1.minute) do
|
15
|
+
DatabaseStatus.unhealthy?
|
16
|
+
end
|
17
|
+
|
18
|
+
def build_enumerator(cursor:)
|
19
|
+
active_record_relations_enumerator(Account.inactive, cursor: cursor)
|
20
|
+
end
|
21
|
+
|
22
|
+
def each_iteration(accounts)
|
23
|
+
accounts.delete_all
|
24
|
+
end
|
25
|
+
end
|
26
|
+
```
|
27
|
+
|
28
|
+
Note that it’s up to you to define a throttling condition that makes sense for your app.
|
29
|
+
For example, `DatabaseStatus.unhealthy?` can check various MySQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
|
30
|
+
|
31
|
+
Jobs can define multiple throttle conditions. Throttle conditions are inherited by descendants, and new conditions will be appended without impacting existing conditions.
|
32
|
+
|
33
|
+
The backoff can also be specified as a Proc:
|
34
|
+
|
35
|
+
```ruby
|
36
|
+
class DeleteAccountsThrottledJob
|
37
|
+
throttle_on(backoff: -> { RandomBackoffGenerator.generate_duration } ) do
|
38
|
+
DatabaseStatus.unhealthy?
|
39
|
+
end
|
40
|
+
# ...
|
41
|
+
end
|
42
|
+
```
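
The behavior described in this guide (a default backoff of 30 seconds, a backoff given as a number or a Proc, and conditions that are inherited and appended) can be modeled in plain Ruby as follows. This is a sketch for illustration, not the gem's implementation: `ThrottleModel`, `throttle_backoff`, `Health`, and the job classes are hypothetical, and checking parent conditions first is an assumption.

```ruby
module ThrottleModel
  DEFAULT_BACKOFF = 30 # seconds

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Parent conditions come first; a subclass only appends its own.
    def throttle_conditions
      inherited = superclass.respond_to?(:throttle_conditions) ? superclass.throttle_conditions : []
      inherited + (@throttle_conditions || [])
    end

    def throttle_on(backoff: DEFAULT_BACKOFF, &condition)
      (@throttle_conditions ||= []) << [condition, backoff]
    end
  end

  # Backoff in seconds of the first condition that holds, or nil if none does.
  def throttle_backoff
    _condition, backoff = self.class.throttle_conditions.find { |condition, _| condition.call }
    return nil if backoff.nil?

    backoff.respond_to?(:call) ? backoff.call : backoff
  end
end

# Hypothetical health switches used by the conditions below.
module Health
  class << self
    attr_accessor :database_unhealthy, :queue_backed_up
  end
end

class BaseThrottledJob
  include ThrottleModel
  throttle_on(backoff: 60) { Health.database_unhealthy }
end

class ChildThrottledJob < BaseThrottledJob
  throttle_on(backoff: -> { 5 + 5 }) { Health.queue_backed_up } # Proc backoff
end
```

With `Health.queue_backed_up = true`, `ChildThrottledJob.new.throttle_backoff` evaluates the Proc and returns 10, while `BaseThrottledJob` is unaffected because it never sees the child's condition.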
|