job-iteration 1.9.0 → 1.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,140 +0,0 @@
# Custom Enumerator

`Iteration` leverages the [Enumerator](https://ruby-doc.org/3.2.1/Enumerator.html) pattern from the Ruby standard library,
which allows us to use almost any resource as a collection to iterate.

Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
your enumerator will be used by it. An enumerator must `yield` two things, in the following order, as positional
arguments:

- An object to be processed in the job's `each_iteration` method
- A cursor position, which `Iteration` will persist if `each_iteration` returns successfully and the job is forced to shut
  down. It can be any data type your job backend can serialize and deserialize correctly.

A job that includes `Iteration` is first started with `nil` as the cursor. When resuming an interrupted job, `Iteration`
will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
find objects that come _after_ the last successfully processed object. The [array enumerator](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/enumerator_builder.rb#L50-L67)
is a simple example which uses the array index as the cursor position.
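
As a rough illustration of that contract, here is a plain-Ruby sketch (not the library's actual array enumerator) that yields each element together with its index and resumes _after_ a persisted index. The method name `build_array_enumerator` is hypothetical.

```ruby
# Hypothetical helper, for illustration only: yields each element of `array`
# together with its index, and uses that index as the cursor.
def build_array_enumerator(array, cursor:)
  # A nil cursor means the job is starting fresh; otherwise resume *after*
  # the last index that was processed successfully.
  start = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    array.each_with_index.drop(start).each do |item, index|
      # 1st value: the object handed to each_iteration.
      # 2nd value: the cursor position Iteration would persist.
      yielder.yield(item, index)
    end
  end
end

build_array_enumerator(%w[a b c d], cursor: nil).to_a # => [["a", 0], ["b", 1], ["c", 2], ["d", 3]]
build_array_enumerator(%w[a b c d], cursor: 1).to_a   # => [["c", 2], ["d", 3]]
```

Passing the last persisted index back in skips everything up to and including it, which is the "find objects that come _after_" behaviour described above.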

In addition to the remainder of this guide, we recommend you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building enumerators.

## Enumerator with cursor

For a more complex example, consider this `Enumerator` that wraps a third-party API (Stripe) for paginated iteration and
stores a string as the cursor position:

```ruby
class StripeListEnumerator
  # @see https://stripe.com/docs/api/pagination
  # @param resource [Stripe::APIResource] The type of Stripe object to request
  # @param params [Hash] Query parameters for the request
  # @param options [Hash] Request options, such as API key or version
  # @param cursor [nil, String] The Stripe ID of the last item iterated over
  def initialize(resource, params: {}, options: {}, cursor:)
    pagination_params = {}
    pagination_params[:starting_after] = cursor unless cursor.nil?

    # The following line makes a request; consider adding your rate limiter here.
    @list = resource.public_send(:list, params.merge(pagination_params), options)
  end

  def to_enumerator
    to_enum(:each).lazy
  end

  private

  # We yield each item together with its Stripe ID. The ID is persisted as the
  # job's cursor, which lets us set the `starting_after` parameter for the API
  # request when resuming.
  def each
    loop do
      @list.each do |item|
        # The first argument is what gets passed to `each_iteration`.
        # The second argument (item.id) is persisted as the cursor;
        # it isn't passed to `each_iteration`.
        yield item, item.id
      end

      # The following line makes a request; consider adding your rate limiter here.
      @list = @list.next_page

      break if @list.empty?
    end
  end
end
```
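
The same pagination pattern can be exercised without Stripe. The sketch below assumes a hypothetical in-memory `FakePaginatedApi` whose `list(starting_after:)` mimics Stripe's `starting_after` parameter; `PaginatedListEnumerator` mirrors the shape of `StripeListEnumerator` but is not part of any library.

```ruby
# Hypothetical in-memory stand-in for a cursor-paginated API such as Stripe's.
# It is NOT the Stripe SDK; it only mimics the `starting_after` semantics.
class FakePaginatedApi
  Item = Struct.new(:id)

  def initialize(ids, page_size: 2)
    @items = ids.map { |id| Item.new(id) }
    @page_size = page_size
  end

  # Returns up to `page_size` items that come strictly after `starting_after`.
  def list(starting_after: nil)
    start = starting_after.nil? ? 0 : @items.index { |item| item.id == starting_after } + 1
    @items.slice(start, @page_size) || []
  end
end

# Same shape as StripeListEnumerator: yield each item with its ID as the cursor,
# then request the next page starting after the last ID seen.
class PaginatedListEnumerator
  def initialize(api, cursor:)
    @api = api
    @cursor = cursor
  end

  def to_enumerator
    Enumerator.new do |yielder|
      last_id = @cursor
      loop do
        page = @api.list(starting_after: last_id)
        break if page.empty?

        page.each { |item| yielder.yield(item, item.id) }
        last_id = page.last.id
      end
    end
  end
end

api = FakePaginatedApi.new(%w[re_1 re_2 re_3 re_4 re_5])
PaginatedListEnumerator.new(api, cursor: nil).to_enumerator.map { |_item, cursor| cursor }
# => ["re_1", "re_2", "re_3", "re_4", "re_5"]
PaginatedListEnumerator.new(api, cursor: "re_3").to_enumerator.map { |_item, cursor| cursor }
# => ["re_4", "re_5"]
```

Rebuilding the enumerator with the last yielded ID continues with the next item, which is what `Iteration` relies on when it passes the persisted cursor back to `build_enumerator`.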

### Usage

Here we leverage Stripe's cursor pagination, where the cursor is the ID of a specific item in the collection. The job
that uses such an `Enumerator` would then look like this:

```ruby
class LoadRefundsForChargeJob < ActiveJob::Base
  include JobIteration::Iteration

  # If you added your own rate limiting above, handle it here. For example:
  # retry_on(MyRateLimiter::LimitExceededError, wait: 30.seconds, attempts: :unlimited)
  # Consider also using an exponential back-off strategy when Stripe's API returns errors.

  def build_enumerator(charge_id, cursor:)
    enumerator_builder.wrap(
      StripeListEnumerator.new(
        Stripe::Refund,
        params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "ch_123"
        options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
        cursor: cursor
      ).to_enumerator
    )
  end

  # `each_iteration` receives the single object yielded by the enumerator (the refund),
  # followed by the job's own arguments (here `charge_id`). The cursor is not passed.
  # If what your enumerator yields is a composite object, you will need to unpack it yourself
  # inside `each_iteration`.
  def each_iteration(stripe_refund, charge_id)
    # ...
  end
end
```

You then enqueue the job with:

```ruby
charge_id = "ch_345"
LoadRefundsForChargeJob.perform_later(charge_id)
```
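
To make the argument order concrete, the following toy driver is a simplified model written for this guide, not job-iteration's real implementation. It pulls `[object, cursor]` pairs from a job's enumerator, calls `each_iteration(object, *arguments)`, and keeps the cursor for itself. `RefundLoggerJob` and its hard-coded refund IDs are hypothetical.

```ruby
# Toy driver (not job-iteration's implementation): it hands each_iteration the
# yielded object plus the job's own arguments, and keeps the cursor for itself.
def run_iteration(job, *arguments, cursor: nil)
  job.build_enumerator(*arguments, cursor: cursor).each do |object, next_cursor|
    job.each_iteration(object, *arguments)
    cursor = next_cursor # advanced only after each_iteration returns
  end
  cursor
end

# Hypothetical job with hard-coded refund IDs standing in for the Stripe API.
class RefundLoggerJob
  REFUNDS = { "ch_345" => %w[re_1 re_2 re_3] }.freeze

  attr_reader :seen

  def initialize
    @seen = []
  end

  def build_enumerator(charge_id, cursor:)
    ids = REFUNDS.fetch(charge_id)
    start = cursor.nil? ? 0 : ids.index(cursor) + 1
    Enumerator.new do |yielder|
      ids.drop(start).each { |id| yielder.yield(id, id) }
    end
  end

  def each_iteration(refund_id, charge_id)
    @seen << [charge_id, refund_id]
  end
end

job = RefundLoggerJob.new
run_iteration(job, "ch_345") # => "re_3"
job.seen # => [["ch_345", "re_1"], ["ch_345", "re_2"], ["ch_345", "re_3"]]
```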

## Cursorless enumerator

Sometimes you can ignore the cursor. Consider the following custom `Enumerator` that takes items from a Redis list, which
is essentially a queue. Even though this job doesn't need to persist a cursor in order to resume, it can still use
`Iteration`'s signal handling to finish `each_iteration` and gracefully terminate.

```ruby
class RedisPopListJob < ActiveJob::Base
  include JobIteration::Iteration

  # @see https://redis.io/commands/lpop/
  def build_enumerator(key, cursor:)
    @redis = Redis.new
    enumerator_builder.wrap(
      Enumerator.new do |yielder|
        # LPOP returns nil once the list is empty, which ends the iteration.
        # The cursor is always nil: a resumed job simply pops whatever is left.
        while (item = @redis.lpop(key))
          yielder.yield item, nil
        end
      end
    )
  end

  def each_iteration(item_from_redis, _key)
    # ...
  end
end
```
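
The drain-until-empty behaviour can be tried without Redis. In this sketch an `Array` stands in for the Redis list and `Array#shift` stands in for `LPOP`; both return `nil` once empty. `build_drain_enumerator` is a hypothetical name.

```ruby
# Redis-free sketch of the cursorless pattern: an Array stands in for the
# Redis list, and Array#shift stands in for LPOP (both return nil when empty).
def build_drain_enumerator(queue)
  Enumerator.new do |yielder|
    while (item = queue.shift)
      yielder.yield(item, nil) # nothing to resume from, so the cursor is nil
    end
  end
end

queue = %w[a b c]
build_drain_enumerator(queue).first(2) # => [["a", nil], ["b", nil]] (simulated interruption)
queue                                  # => ["c"]
build_drain_enumerator(queue).to_a     # => [["c", nil]]
```

Because consumed items are already gone from the queue, a rebuilt enumerator naturally picks up where the interrupted one stopped, with no cursor needed.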

## Caveats

### Post-`yield` code

Code that is written after the `yield` in a custom enumerator is not guaranteed to execute. If a job is forced to exit
(i.e., `job_should_exit?` returns `true`), the job is re-enqueued during the `yield`, and the rest of the code in the
enumerator does not run. You can follow that logic
[here](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/iteration.rb#L161-L165) and
[here](https://github.com/Shopify/job-iteration/blob/v1.3.6/lib/job-iteration/iteration.rb#L131-L143).
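
The following plain-Ruby sketch shows the effect. Here `break` stands in for Iteration stopping once `job_should_exit?` turns `true`; the real library re-enqueues the job instead, but in both cases the code after the last `yield` never runs.

```ruby
log = []

enum = Enumerator.new do |yielder|
  [1, 2, 3].each do |n|
    yielder.yield(n, n)
    log << "after yield #{n}" # skipped for the item where the consumer stops
  end
  log << "end of enumerator" # never reached if the consumer stops early
end

enum.each do |n, _cursor|
  log << "processed #{n}"
  break if n == 2 # stands in for job_should_exit? becoming true
end

log # => ["processed 1", "after yield 1", "processed 2"]
```

Any bookkeeping that must happen for every item therefore belongs before the `yield` or in `each_iteration`, not after the `yield`.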
@@ -1,51 +0,0 @@
# Iteration: how it works

The main idea behind Iteration is to provide an API to describe jobs in an interruptible manner, in contrast with implementing one massive `#perform` method that is impossible to interrupt safely.

Exposing the enumerator and the action to apply allows us to keep a cursor and interrupt between iterations. Let's see what this looks like with an ActiveRecord relation (and Enumerator).

1. `build_enumerator` is called, which constructs an `ActiveRecordEnumerator` from an ActiveRecord relation (`Product.all`).
2. The first batch of records is loaded:

   ```sql
   SELECT `products`.* FROM `products` ORDER BY products.id LIMIT 100
   ```

3. The job iterates over two records of the relation and then receives `SIGTERM` (the graceful termination signal) caused by a deploy.
4. The signal handler sets a flag that makes `job_should_exit?` return `true`.
5. After the current iteration completes, Iteration checks `job_should_exit?`, which now returns `true`.
6. The job stops iterating and pushes itself back to the queue with the latest `cursor_position` value.
7. The next time the job is taken from the queue, records are loaded starting after the last primary key that was processed:

   ```sql
   SELECT `products`.* FROM `products` WHERE (products.id > 2) ORDER BY products.id LIMIT 100
   ```
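
The steps above can be modelled in a few lines of plain Ruby. This is a toy simulation under stated assumptions, not the library's code: an in-memory array of `Product` structs replaces the `products` table, `load_batch` mimics the keyset query, and the `sigterm_after:` parameter is a hypothetical way to simulate the signal arriving after two records.

```ruby
# Toy model of the flow above; no database involved. Product IDs 1..5 play
# the role of the `products` table.
Product = Struct.new(:id)
PRODUCTS = (1..5).map { |id| Product.new(id) }

# Mimics: SELECT * FROM products WHERE id > cursor ORDER BY id LIMIT batch_size
def load_batch(after_id, batch_size)
  PRODUCTS.select { |p| after_id.nil? || p.id > after_id }.sort_by(&:id).first(batch_size)
end

# Runs until the records run out or the exit flag is set.
# Returns [processed_ids, cursor]; a nil cursor means the job finished.
def run_job(cursor:, sigterm_after: nil)
  processed = []
  loop do
    batch = load_batch(cursor, 100)
    return [processed, nil] if batch.empty?

    batch.each do |product|
      processed << product.id # each_iteration
      cursor = product.id     # the cursor only moves after a successful iteration
      # Simulates the SIGTERM handler flipping job_should_exit? mid-run.
      job_should_exit = sigterm_after && processed.size >= sigterm_after
      # Checked between iterations: stop and "re-enqueue" with the cursor.
      return [processed, cursor] if job_should_exit
    end
  end
end

first_run = run_job(cursor: nil, sigterm_after: 2) # => [[1, 2], 2]
run_job(cursor: first_run.last)                    # => [[3, 4, 5], nil]
```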

## Exceptions inside `each_iteration`

Unrescued exceptions inside the `each_iteration` block are handled the same way as exceptions occurring in `perform` for a regular Active Job subclass, meaning you need to configure it to retry using [`retry_on`](https://api.rubyonrails.org/classes/ActiveJob/Exceptions/ClassMethods.html#method-i-retry_on) or manually call [`retry_job`](https://api.rubyonrails.org/classes/ActiveJob/Exceptions.html#method-i-retry_job). The job will re-enqueue itself with the last successful cursor: the iteration that failed will be retried with the same parameters, and the cursor will only move if that iteration succeeds. This behaviour may be enough for intermittent errors, such as network connection failures, but if your execution is deterministic and an iteration keeps failing, subsequent iterations will never run.

In other words, if you are trying to process 100 records but the job consistently fails on the 61st, only the first 60 will be processed, and the job will keep retrying the 61st record until retries are exhausted.
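
A small simulation makes the "stuck on the 61st record" scenario concrete. It is a sketch, not Active Job or job-iteration code: `run_attempt` is a hypothetical stand-in for one job execution, the cursor is an array index, and `3.times` stands in for a retry policy allowing three attempts in total.

```ruby
# Processes records starting after the persisted cursor (an array index).
# The cursor only advances after a record is processed successfully.
def run_attempt(records, cursor, processed)
  start = cursor.nil? ? 0 : cursor + 1
  (start...records.size).each do |index|
    record = records[index]
    # Deterministic bug: record 61 fails on every attempt.
    raise ArgumentError, "cannot process record #{record}" if record == 61

    processed << record
    cursor = index
  end
  [:completed, cursor]
rescue ArgumentError
  [:failed, cursor] # re-enqueued with the last successful cursor
end

records = (1..100).to_a
processed = []
cursor = nil
status = nil

3.times do # stands in for three total attempts
  status, cursor = run_attempt(records, cursor, processed)
  break if status == :completed
end

status         # => :failed
processed.size # => 60
cursor         # => 59 (index of the 60th record)
```

Every retry starts at the same record and fails again, so the extra attempts make no progress; only fixing the bug (or skipping the record) lets records 62 to 100 run.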

If no retries are configured or retries are exhausted, Active Job 'bubbles up' the exception to the job backend. Retries by the backend (e.g., Sidekiq) are not supported: jobs retried by the job backend instead of Active Job will restart from the beginning.

## Stopping a job

Because jobs typically retry when exceptions are raised, there is a special mechanism to fully stop a job that still has iterations remaining: you can call `throw(:abort)`. This is caught by job-iteration, which signals that the job should complete now, regardless of its iteration state.
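
Ruby's `catch`/`throw` is what makes this work. The toy model below is written for this guide and is not the library's code: a driver wraps the loop in `catch(:abort)`, so `throw(:abort)` inside `each_iteration` ends the loop even though items remain. `ToyJob`, `run_toy_job`, and the `:stop` sentinel are hypothetical.

```ruby
# Toy model of the stop mechanism, not the library's code: the driver wraps the
# loop in catch(:abort), so throw(:abort) inside each_iteration ends the loop
# even though items remain.
class ToyJob
  attr_reader :processed

  def initialize
    @processed = []
  end

  def each_iteration(item)
    throw(:abort) if item == :stop # hypothetical "stop for good" condition
    @processed << item
  end
end

def run_toy_job(job, items)
  ran_out_of_items = catch(:abort) do
    items.each { |item| job.each_iteration(item) }
    true
  end
  # throw(:abort) without a value makes catch return nil. Either way the job
  # is finished: nothing is re-enqueued in this model.
  ran_out_of_items ? :all_items_processed : :stopped_early
end

job = ToyJob.new
run_toy_job(job, [1, 2, :stop, 3]) # => :stopped_early
job.processed                      # => [1, 2]
```

Unlike raising an exception, throwing skips any retry logic, which is why it is the tool for ending a job for good.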

## Signals

Understanding [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) is essential to understanding how interruption works. There are two main signals that Sidekiq and Resque use: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal, which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push the job back to the queue with the last cursor position.
`SIGTERM` is what allows Iteration to work. In contrast, `SIGKILL` means immediate exit: it doesn't let the worker terminate gracefully, and the job is dropped as the process exits.

Most deploy strategies (Kubernetes, Heroku, Capistrano) send `SIGTERM` before shutting down a node, then wait for a timeout (usually 30 seconds to a minute) before sending `SIGKILL` if the process has not yet terminated.

Further reading: [Sidekiq signals](https://github.com/mperham/sidekiq/wiki/Signals).
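
The "set a flag, check it between iterations" pattern can be sketched with Ruby's `Signal.trap` and `Process.kill`. This assumes a UNIX-like system, and `GracefulWorker` is a hypothetical class written for illustration; real workers such as Sidekiq install their own handlers.

```ruby
# Sketch of graceful shutdown: the TERM handler only sets a flag, and the work
# loop checks that flag *between* iterations, never in the middle of one.
class GracefulWorker
  attr_reader :processed, :cursor

  def initialize(items)
    @items = items
    @processed = []
    @cursor = nil
    @should_exit = false
    Signal.trap("TERM") { @should_exit = true } # plays the role of job_should_exit?
  end

  def should_exit?
    @should_exit
  end

  def run
    @items.each_with_index do |item, index|
      @processed << item # stands in for each_iteration
      @cursor = index
      yield index if block_given?
      break if should_exit? # the current iteration has finished; stop here
    end
    self
  end
end

worker = GracefulWorker.new(%w[a b c d e])
worker.run do |index|
  next unless index == 1

  # Simulate a deploy sending SIGTERM to this process during the 2nd iteration.
  Process.kill("TERM", Process.pid)
  # Give the trap handler a moment to run (bounded wait).
  20.times do
    break if worker.should_exit?

    sleep 0.01
  end
end

worker.processed # => ["a", "b"]
worker.cursor    # => 1
```

The handler itself does almost nothing, which keeps it safe to run at any point; all the real work of stopping happens in the loop, after the in-flight iteration has completed.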

## Enumerators

In early versions of Iteration, `build_enumerator` used to return ActiveRecord relations directly, and we would infer the Enumerator based on the type of object. We used to support ActiveRecord relations, arrays and CSVs. This made it hard to add support for other types of enumerations, and it was easy for developers to make mistakes, such as returning an array of ActiveRecord objects, which we would then treat as an array instead of as an ActiveRecord relation.

The current version of Iteration supports _any_ Enumerator. We expose helpers to build common enumerators conveniently (`enumerator_builder.active_record_on_records`), and developers can also implement [a custom Enumerator](custom-enumerator.md).

Further reading: [ruby-doc](https://ruby-doc.org/3.2.1/Enumerator.html), [a great post about Enumerators](http://blog.arkency.com/2014/01/ruby-to-enum-for-enumerator/).
data/guides/throttling.md DELETED
@@ -1,68 +0,0 @@
Iteration comes with a special wrapper enumerator that allows you to throttle iterations based on an external signal (e.g., database health).

Consider this example:

```ruby
class InactiveAccountDeleteJob < ActiveJob::Base
  include JobIteration::Iteration

  def build_enumerator(_params, cursor:)
    enumerator_builder.active_record_on_batches(
      Account.inactive,
      cursor: cursor
    )
  end

  def each_iteration(batch, _params)
    Account.where(id: batch.map(&:id)).delete_all
  end
end
```

For an app that keeps track of customer accounts, it's typical to purge old data that no longer needs to be stored.

At the same time, if you've got a lot of DB writes to perform, this can cause extra load on the database and slow down other parts of your service.

You can change `build_enumerator` to wrap the enumeration over DB rows in a throttle enumerator, which takes the signal as a proc and re-enqueues the job for later (after `backoff`) whenever the proc returns `true`.

```ruby
def build_enumerator(_params, cursor:)
  enumerator_builder.build_throttle_enumerator(
    enumerator_builder.active_record_on_batches(
      Account.inactive,
      cursor: cursor
    ),
    throttle_on: -> { DatabaseStatus.unhealthy? },
    backoff: 30.seconds
  )
end
```
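
Conceptually, a throttle wrapper asks `throttle_on` before handing out each item and stops early when the answer is `true`. The sketch below, `SimpleThrottle`, is a hypothetical, simplified stand-in written for this guide, not the library's `ThrottleEnumerator`; instead of re-enqueueing a job it just records the requested backoff.

```ruby
# Simplified, hypothetical stand-in for a throttle wrapper (not the library's
# ThrottleEnumerator): before each item it asks `throttle_on`, and if the answer
# is true it records the backoff and stops instead of yielding.
class SimpleThrottle
  attr_reader :retry_in

  def initialize(enum, throttle_on:, backoff:)
    @enum = enum
    @throttle_on = throttle_on
    @backoff = backoff
    @retry_in = nil
  end

  def to_enumerator
    Enumerator.new do |yielder|
      @enum.each do |*values|
        if @throttle_on.call
          @retry_in = @backoff # the real library re-enqueues the job here
          break
        end
        yielder.yield(*values)
      end
    end
  end
end

source = Enumerator.new do |yielder|
  %w[a b c d].each_with_index { |item, index| yielder.yield(item, index) }
end
checks = 0
unhealthy = -> { (checks += 1) >= 3 } # hypothetical: the DB turns unhealthy on the 3rd check
throttle = SimpleThrottle.new(source, throttle_on: unhealthy, backoff: 30)

processed = throttle.to_enumerator.map { |item, _cursor| item } # => ["a", "b"]
throttle.retry_in                                                # => 30
```

Because the check happens between items, the job never stops halfway through a batch, and the cursor of the last yielded item is what it would resume from.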

If you want to apply throttling to all jobs, you can subclass `JobIteration::EnumeratorBuilder` and set it as the default
enumerator builder. The builder always wraps the enumerators returned from `build_enumerator`.

```ruby
class MyOwnBuilder < JobIteration::EnumeratorBuilder
  class Wrapper < Enumerator
    class << self
      def wrap(_builder, enum)
        # Fully qualified: ThrottleEnumerator lives in the JobIteration namespace,
        # which is not in lexical scope here.
        JobIteration::ThrottleEnumerator.new(
          enum,
          nil,
          throttle_on: -> { DatabaseStatus.unhealthy? },
          backoff: 30.seconds
        )
      end
    end
  end
end

JobIteration.enumerator_builder = MyOwnBuilder
```

Note that it's up to you to implement a `DatabaseStatus.unhealthy?` check that works for your database. At Shopify, a helper like `DatabaseStatus` checks the following MySQL metrics:

* Replication lag across all regions
* DB threads
* Whether the DB is available for writes (if not, a failover is likely in progress)
* [Semian](https://github.com/shopify/semian) open circuits
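
As a starting point, a check along those lines might combine the signals with `||`. This is a hypothetical sketch: the module name `DatabaseHealthSketch`, the metric names, the `Metrics` struct, and every threshold are illustrative assumptions, and the metrics are passed in as an argument so the example stays self-contained. A real implementation would read them from your database and from Semian.

```ruby
# Hypothetical health check in the spirit of the list above. Metric names,
# the Struct, and thresholds are illustrative assumptions only.
module DatabaseHealthSketch
  MAX_REPLICATION_LAG_SECONDS = 5
  MAX_THREADS_RUNNING = 100

  Metrics = Struct.new(:replication_lag_by_region, :threads_running, :writable, :open_circuits,
                       keyword_init: true)

  # Unhealthy if ANY signal is bad: the worst regional lag, thread count,
  # write availability, or any open circuit breaker.
  def self.unhealthy?(metrics)
    metrics.replication_lag_by_region.values.max > MAX_REPLICATION_LAG_SECONDS ||
      metrics.threads_running > MAX_THREADS_RUNNING ||
      !metrics.writable || # not writable usually means a failover is in progress
      metrics.open_circuits.positive?
  end
end

healthy = DatabaseHealthSketch::Metrics.new(
  replication_lag_by_region: { "us-east" => 0.4, "eu-west" => 1.2 },
  threads_running: 12,
  writable: true,
  open_circuits: 0
)
DatabaseHealthSketch.unhealthy?(healthy) # => false
```

It could then back the throttle as `throttle_on: -> { DatabaseHealthSketch.unhealthy?(current_metrics) }`, where `current_metrics` is whatever (hypothetical) method collects the numbers for your setup.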