job-iteration 1.9.0 → 1.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +28 -1
- data/README.md +19 -1
- data/job-iteration.gemspec +4 -4
- data/lib/job-iteration/csv_enumerator.rb +6 -10
- data/lib/job-iteration/interruption_adapters/delayed_job_adapter.rb +54 -0
- data/lib/job-iteration/interruption_adapters.rb +1 -1
- data/lib/job-iteration/version.rb +1 -1
- data/lib/tapioca/dsl/compilers/job_iteration.rb +15 -6
- metadata +7 -26
- data/.github/dependabot.yml +0 -16
- data/.github/workflows/ci.yml +0 -98
- data/.github/workflows/cla.yml +0 -22
- data/.gitignore +0 -11
- data/.rubocop.yml +0 -16
- data/.ruby-version +0 -1
- data/.yardopts +0 -3
- data/CODE_OF_CONDUCT.md +0 -74
- data/Gemfile +0 -42
- data/Gemfile.lock +0 -192
- data/Rakefile +0 -12
- data/bin/setup +0 -23
- data/bin/test +0 -32
- data/dev.yml +0 -54
- data/gemfiles/rails_gems.gemfile +0 -18
- data/guides/argument-semantics.md +0 -128
- data/guides/best-practices.md +0 -108
- data/guides/custom-enumerator.md +0 -140
- data/guides/iteration-how-it-works.md +0 -51
- data/guides/throttling.md +0 -68
data/guides/custom-enumerator.md
DELETED
@@ -1,140 +0,0 @@
# Custom Enumerator

`Iteration` leverages the [Enumerator](https://ruby-doc.org/3.2.1/Enumerator.html) pattern from the Ruby standard library,
which allows us to use almost any resource as a collection to iterate.

Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
arguments:
- An object to be processed in a job `each_iteration` method
- A cursor position, which `Iteration` will persist if `each_iteration` returns successfully and the job is forced to shut
  down. It can be any data type your job backend can serialize and deserialize correctly.

A job that includes `Iteration` is first started with `nil` as the cursor. When resuming an interrupted job, `Iteration`
will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
find objects that come _after_ the last successfully processed object. The [array enumerator](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/enumerator_builder.rb#L50-L67)
is a simple example which uses the array index as the cursor position.

In addition to the remainder of this guide, we recommend you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building enumerators.

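As a minimal sketch of the contract described above (not the library's own implementation), the following plain-Ruby enumerator yields each object together with its array index as the cursor, and resumes after a given cursor. The helper name `array_enumerator_after` is hypothetical and used only for illustration.

```ruby
# Hypothetical helper, not part of job-iteration: builds an Enumerator that
# yields [object, cursor] pairs, using the array index as the cursor.
def array_enumerator_after(items, cursor:)
  # A nil cursor means "first run"; otherwise start after the persisted index.
  start = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    items.each_with_index.drop(start).each do |item, index|
      # First value: the object to process. Second value: the cursor.
      yielder.yield(item, index)
    end
  end
end

first_run = array_enumerator_after(%w[a b c d], cursor: nil).to_a
resumed   = array_enumerator_after(%w[a b c d], cursor: 1).to_a
```

On a first run the cursor is `nil`; on resume it is the index of the last successfully processed item, so enumeration continues with the item after it.
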
## Enumerator with cursor

For a more complex example, consider this `Enumerator` that wraps a third party API (Stripe) for paginated iteration and
stores a string as the cursor position:

```ruby
class StripeListEnumerator
  # @see https://stripe.com/docs/api/pagination
  # @param resource [Stripe::APIResource] The type of Stripe object to request
  # @param params [Hash] Query parameters for the request
  # @param options [Hash] Request options, such as API key or version
  # @param cursor [nil, String] The Stripe ID of the last item iterated over
  def initialize(resource, params: {}, options: {}, cursor:)
    pagination_params = {}
    pagination_params[:starting_after] = cursor unless cursor.nil?

    # The following line makes a request, consider adding your rate limiter here.
    @list = resource.public_send(:list, params.merge(pagination_params), options)
  end

  def to_enumerator
    to_enum(:each).lazy
  end

  private

  # We yield our enumerator with the object id as the index so it is persisted
  # as the cursor on the job. This allows us to properly set the
  # `starting_after` parameter for the API request when resuming.
  def each
    loop do
      @list.each do |item, _index|
        # The first argument is what gets passed to `each_iteration`.
        # The second argument (item.id) is going to be persisted as the cursor,
        # it doesn't get passed to `each_iteration`.
        yield item, item.id
      end

      # The following line makes a request, consider adding your rate limiter here.
      @list = @list.next_page

      break if @list.empty?
    end
  end
end
```

### Usage

Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
which uses such an `Enumerator` would then look like so:

```ruby
class LoadRefundsForChargeJob < ActiveJob::Base
  include JobIteration::Iteration

  # If you added your own rate limiting above, handle it here. For example:
  # retry_on(MyRateLimiter::LimitExceededError, wait: 30.seconds, attempts: :unlimited)
  # Use an exponential back-off strategy when Stripe's API returns errors.

  def build_enumerator(charge_id, cursor:)
    enumerator_builder.wrap(
      StripeListEnumerator.new(
        Stripe::Refund,
        params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
        options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
        cursor: cursor
      ).to_enumerator
    )
  end

  # Note that in this case `each_iteration` will only receive one positional argument per iteration.
  # If what your enumerator yields is a composite object you will need to unpack it yourself
  # inside the `each_iteration`.
  def each_iteration(stripe_refund, charge_id)
    # ...
  end
end
```

You then enqueue the job with:

```ruby
LoadRefundsForChargeJob.perform_later(charge_id = "chrg_345")
```

## Cursorless enumerator

Sometimes you can ignore the cursor. Consider the following custom `Enumerator` that takes items from a Redis list, which
is essentially a queue. Even if this job doesn't need to persist a cursor in order to resume, it can still use
`Iteration`'s signal handling to finish `each_iteration` and gracefully terminate.

```ruby
class RedisPopListJob < ActiveJob::Base
  include JobIteration::Iteration

  # @see https://redis.io/commands/lpop/
  def build_enumerator(*)
    @redis = Redis.new
    enumerator_builder.wrap(
      Enumerator.new do |yielder|
        yielder.yield @redis.lpop(key), nil
      end
    )
  end

  def each_iteration(item_from_redis)
    # ...
  end
end
```

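A minimal sketch of a cursorless enumerator that keeps popping until a queue is empty, assuming that is the behaviour you want. An in-memory array is a hypothetical stand-in for the Redis list here, with `shift` playing the role of `LPOP`.

```ruby
# Hypothetical in-memory stand-in for a Redis list.
queue = ["job-1", "job-2", "job-3"]

drain = Enumerator.new do |yielder|
  # Pop until nothing is left. The cursor is always nil because the queue
  # itself remembers which items remain.
  while (item = queue.shift)
    yielder.yield(item, nil)
  end
end

drained = drain.to_a
```

Because every popped item is removed from the queue, a resumed run simply continues with whatever is still in the list.
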
## Caveats

### Post-`yield` code

Code that is written after the `yield` in a custom enumerator is not guaranteed to execute. If a job is
forced to exit, i.e. `job_should_exit?` is true, the job is re-enqueued during the yield and the rest of the code in
the enumerator does not run. You can follow that logic
[here](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/iteration.rb#L161-L165) and
[here](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/iteration.rb#L131-L143).
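The effect can be reproduced in plain Ruby without the library. In this sketch the consumer's `break` is a stand-in for the job exiting during the yield, and the `log` array is only there to show which post-`yield` lines ran.

```ruby
log = []

enum = Enumerator.new do |yielder|
  [1, 2, 3].each do |n|
    yielder.yield(n, n)
    # Post-yield code: only runs if the consumer's block returned normally.
    log << "after #{n}"
  end
end

enum.each do |n, _cursor|
  # Stand-in for the job being forced to exit while handling item 2.
  break if n == 2
end
```

After this runs, `log` contains an entry for item 1 only, so cleanup that must always happen should not be placed after the `yield`.
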
data/guides/iteration-how-it-works.md
DELETED
@@ -1,51 +0,0 @@
# Iteration: how it works

The main idea behind Iteration is to provide an API to describe jobs in an interruptible manner, in contrast with implementing one massive `#perform` method that is impossible to interrupt safely.

Exposing the enumerator and the action to apply allows us to keep a cursor and interrupt between iterations. Let's see what this looks like with an ActiveRecord relation (and Enumerator).

1. `build_enumerator` is called, which constructs `ActiveRecordEnumerator` from an ActiveRecord relation (`Product.all`)
2. The first batch of records is loaded:

```sql
SELECT `products`.* FROM `products` ORDER BY products.id LIMIT 100
```

3. The job iterates over two records of the relation and then receives `SIGTERM` (graceful termination signal) caused by a deploy.
4. The signal handler sets a flag that makes `job_should_exit?` return `true`.
5. After the last iteration is completed, we will check `job_should_exit?` which now returns `true`.
6. The job stops iterating and pushes itself back to the queue, with the latest `cursor_position` value.
7. Next time when the job is taken from the queue, we'll load records starting from the last primary key that was processed:

```sql
SELECT `products`.* FROM `products` WHERE (products.id > 2) ORDER BY products.id LIMIT 100
```

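The steps above can be sketched as an in-memory simulation. This is not how the library is implemented: `run_once`, the list of ids and the `should_exit` lambda are all hypothetical stand-ins for the job run, the `products` table and `job_should_exit?`.

```ruby
# Processes ids greater than `cursor` until the collection is exhausted or
# `should_exit` returns true. Returns the cursor to persist when interrupted,
# or nil when there is nothing left to do.
def run_once(ids, cursor, processed, should_exit)
  ids.select { |id| cursor.nil? || id > cursor }.each do |id|
    processed << id # stand-in for each_iteration
    cursor = id     # the cursor only advances after a successful iteration
    # The exit flag is checked between iterations, never in the middle of one.
    return cursor if should_exit.call(processed)
  end
  nil
end

ids = (1..5).to_a
processed = []

# First run: the "SIGTERM" arrives after two records have been processed.
saved_cursor = run_once(ids, nil, processed, ->(done) { done.size == 2 })

# Second run: resume from the persisted cursor with no interruption.
final_cursor = run_once(ids, saved_cursor, processed, ->(_done) { false })
```

The first run stops with a cursor of `2`, matching the `products.id > 2` condition in the second query, and the second run processes the remaining ids exactly once.
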
## Exceptions inside `each_iteration`

Unrescued exceptions inside the `each_iteration` block are handled the same way as exceptions occurring in `perform` for a regular Active Job subclass, meaning you need to configure it to retry using [`retry_on`](https://api.rubyonrails.org/classes/ActiveJob/Exceptions/ClassMethods.html#method-i-retry_on) or manually call [`retry_job`](https://api.rubyonrails.org/classes/ActiveJob/Exceptions.html#method-i-retry_job). The job will re-enqueue itself with the last successful cursor. The iteration that failed will be retried with the same parameters, and the cursor will only move if that iteration succeeds. This behaviour may be enough for intermittent errors, such as network connection failures, but if your execution is deterministic and you have an error, subsequent iterations will never run.

In other words, if you are trying to process 100 records but the job consistently fails on the 61st, only the first 60 will be processed and the job will try to process the 61st record until retries are exhausted.

If no retries are configured or retries are exhausted, Active Job 'bubbles up' the exception to the job backend. Retries by the backend (e.g. Sidekiq) are not supported, meaning that jobs retried by the job backend instead of Active Job will restart from the beginning.

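The 100-record scenario can be illustrated with a small plain-Ruby simulation. The `process_from` helper is hypothetical and only models the cursor behaviour described above; it does not involve Active Job or its retry machinery.

```ruby
# Processes records after `cursor` (an array index) and returns the outcome
# together with the last cursor that was reached successfully.
def process_from(records, cursor)
  start = cursor.nil? ? 0 : cursor + 1
  records.each_with_index.drop(start).each do |record, index|
    yield record
    cursor = index # only reached when the iteration did not raise
  end
  [:completed, cursor]
rescue RuntimeError
  [:failed, cursor]
end

records = (1..100).to_a
always_fails_on_61 = ->(record) { raise "cannot process #{record}" if record == 61 }

first_attempt = process_from(records, nil, &always_fails_on_61)
retry_attempt = process_from(records, first_attempt.last, &always_fails_on_61)
```

Both attempts end with the cursor at index 59 (the 60th record), so a deterministic failure keeps the job stuck on the same record until something else changes.
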
## Stopping a job

Because jobs typically retry when exceptions are thrown, there is a special mechanism to fully stop a job that still has iterations remaining. To do this, you can `throw(:abort)`. This is then caught by job-iteration and signals that the job should complete now, regardless of its iteration state.

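`throw(:abort)` relies on Ruby's `catch`/`throw`, which unwinds the stack without raising an exception. A minimal plain-Ruby sketch of the mechanism, where the explicit `catch(:abort)` block is a stand-in for the handling that job-iteration performs for you:

```ruby
processed = []

catch(:abort) do
  [1, 2, 3, 4].each do |n|
    # Stop the whole loop as soon as item 3 is reached; no exception is
    # raised, so no retry logic is triggered.
    throw(:abort) if n == 3
    processed << n
  end
end
```

Only the items before the `throw` are processed, and execution continues normally after the `catch` block.
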
## Signals

It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq and Resque use: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push the job back to the queue with the last cursor position.
`SIGTERM` is what allows Iteration to work. In contrast, `SIGKILL` means immediate exit. It doesn't let the worker terminate gracefully; instead, it will drop the job and exit as soon as possible.

Most deploy strategies (Kubernetes, Heroku, Capistrano) send `SIGTERM` before shutting down a node, then wait for a timeout (usually from 30 seconds to a minute) before sending `SIGKILL` if the process has not terminated yet.

Further reading: [Sidekiq signals](https://github.com/mperham/sidekiq/wiki/Signals).

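A minimal sketch of the flag-based approach using Ruby's `Signal.trap`, assuming a UNIX-like system. In a real deployment the job backend installs its own handlers, so the `should_exit` flag here is purely illustrative and the process sends `SIGTERM` to itself to simulate a deploy.

```ruby
should_exit = false

# Install a handler that only sets a flag, and remember the previous handler.
previous_handler = Signal.trap("TERM") { should_exit = true }

processed = []
(1..5).each do |n|
  processed << n
  # Simulate a deploy sending SIGTERM during the second iteration.
  Process.kill("TERM", Process.pid) if n == 2
  sleep(0.1) # give the handler a moment to run
  # The flag is checked between iterations, so the current one always finishes.
  break if should_exit
end

# Restore whatever handler was installed before this example.
Signal.trap("TERM", previous_handler || "DEFAULT")
```

The loop finishes the iteration that was in progress when the signal arrived and then stops, which is the window `SIGTERM` provides and `SIGKILL` does not.
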
## Enumerators

In the early versions of Iteration, `build_enumerator` used to return ActiveRecord relations directly, and we would infer the Enumerator based on the type of object. We used to support ActiveRecord relations, arrays and CSVs. This made it hard to add support for other types of enumerations, and it was easy for developers to make a mistake and return an array of ActiveRecord objects, which we would then treat as an array instead of as an ActiveRecord relation.

The current version of Iteration supports _any_ Enumerator. We expose helpers to build common enumerators conveniently (`enumerator_builder.active_record_on_records`), but it's up to a developer to implement [a custom Enumerator](custom-enumerator.md).

Further reading: [ruby-doc](https://ruby-doc.org/3.2.1/Enumerator.html), [a great post about Enumerators](http://blog.arkency.com/2014/01/ruby-to-enum-for-enumerator/).
data/guides/throttling.md
DELETED
@@ -1,68 +0,0 @@
Iteration comes with a special wrapper enumerator that allows you to throttle iterations based on an external signal (e.g. database health).

Consider this example:

```ruby
class InactiveAccountDeleteJob < ActiveJob::Base
  include JobIteration::Iteration

  def build_enumerator(_params, cursor:)
    enumerator_builder.active_record_on_batches(
      Account.inactive,
      cursor: cursor
    )
  end

  def each_iteration(batch, _params)
    Account.where(id: batch.map(&:id)).delete_all
  end
end
```

For an app that keeps track of customer accounts, it's typical to purge old data that's no longer relevant for storage.

At the same time, if you've got a lot of DB writes to perform, this can cause extra load on the database and slow down other parts of your service.

You can change `build_enumerator` to wrap enumeration on DB rows into a throttle enumerator, which takes the signal as a proc and re-enqueues the job for later if the proc returns `true`.

```ruby
def build_enumerator(_params, cursor:)
  enumerator_builder.build_throttle_enumerator(
    enumerator_builder.active_record_on_batches(
      Account.inactive,
      cursor: cursor
    ),
    throttle_on: -> { DatabaseStatus.unhealthy? },
    backoff: 30.seconds
  )
end
```

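To make the idea concrete, here is a plain-Ruby sketch of a throttling wrapper. It is not the library's throttle enumerator: the `throttle` helper and its `on_throttle` callback are hypothetical, with the callback standing in for re-enqueuing the job after a backoff.

```ruby
# Wraps an enumerator that yields (item, cursor) pairs and stops as soon as
# the `throttle_on` proc reports that work should pause.
def throttle(enum, throttle_on:, on_throttle:)
  Enumerator.new do |yielder|
    enum.each do |item, cursor|
      if throttle_on.call
        on_throttle.call # e.g. re-enqueue the job to run again later
        break
      end
      yielder.yield(item, cursor)
    end
  end
end

source = Enumerator.new do |yielder|
  %w[a b c d].each_with_index { |item, index| yielder.yield(item, index) }
end

# Pretend the database becomes unhealthy on the third health check.
checks = 0
unhealthy_on_third_check = -> { (checks += 1) > 2 }
reenqueued = []

seen = throttle(
  source,
  throttle_on: unhealthy_on_third_check,
  on_throttle: -> { reenqueued << :retry_later }
).to_a
```

The health check runs before every item, so the wrapper stops between iterations and the cursor of the last yielded item is the one a resumed run would start after.
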
If you want to apply throttling on all jobs, you can subclass your own EnumeratorBuilder and override the default
enumerator builder. The builder always wraps the enumerators returned from `build_enumerator`.

```ruby
class MyOwnBuilder < JobIteration::EnumeratorBuilder
  class Wrapper < Enumerator
    class << self
      def wrap(_builder, enum)
        ThrottleEnumerator.new(
          enum,
          nil,
          throttle_on: -> { DatabaseStatus.unhealthy? },
          backoff: 30.seconds
        )
      end
    end
  end
end

JobIteration.enumerator_builder = MyOwnBuilder
```

Note that it's up to you to implement `DatabaseStatus.unhealthy?` that works for your database choice. At Shopify, a helper like `DatabaseStatus` checks the following MySQL metrics:

* Replication lag across all regions
* DB threads
* DB is available for writes (otherwise indicates a failover happening)
* [Semian](https://github.com/shopify/semian) open circuits
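As a rough sketch of what such a check could look like, the method below combines three of the metrics listed above. Everything here is an assumption made for illustration: the method name, the metric keys and the thresholds are hypothetical, collecting the metrics is left out, and the Semian check is omitted.

```ruby
# Hypothetical health check: `metrics` is a Hash you populate yourself.
def database_unhealthy?(metrics, max_lag_seconds: 5, max_threads: 50)
  metrics.fetch(:replication_lag_seconds) > max_lag_seconds ||
    metrics.fetch(:running_threads) > max_threads ||
    !metrics.fetch(:writable) # not writable may indicate a failover
end

healthy   = { replication_lag_seconds: 1, running_threads: 10, writable: true }
lagging   = { replication_lag_seconds: 30, running_threads: 10, writable: true }
read_only = { replication_lag_seconds: 1, running_threads: 10, writable: false }

results = [healthy, lagging, read_only].map { |m| database_unhealthy?(m) }
```

A proc such as `-> { database_unhealthy?(current_metrics) }`, where `current_metrics` is whatever collector you write, could then be passed as `throttle_on`.
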