sidekiq-iteration 0.1.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +23 -0
- data/README.md +19 -11
- data/guides/argument-semantics.md +130 -0
- data/guides/best-practices.md +5 -0
- data/guides/custom-enumerator.md +34 -7
- data/lib/sidekiq_iteration/active_record_enumerator.rb +152 -23
- data/lib/sidekiq_iteration/csv_enumerator.rb +2 -10
- data/lib/sidekiq_iteration/enumerators.rb +3 -4
- data/lib/sidekiq_iteration/iteration.rb +13 -7
- data/lib/sidekiq_iteration/job_retry_patch.rb +16 -3
- data/lib/sidekiq_iteration/version.rb +1 -1
- data/lib/sidekiq_iteration.rb +11 -0
- metadata +5 -6
- data/lib/sidekiq_iteration/active_record_batch_enumerator.rb +0 -127
- data/lib/sidekiq_iteration/active_record_cursor.rb +0 -89
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 40efca13e06cd7fdcfc1ff59ad08fea8fc731ee1b5560ae5b75b2591379bcb63
|
4
|
+
data.tar.gz: eec2991b40bb67ffc1dcea55f1c0e8acc98a18cd59ad2fd117cd80c1a94c3e79
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 1162ffafc4d157e7a8f9d2b8f69163e90e83431daac129707e16605a9b0df250c1cf2dd9063f651782b01ab0dcd9a0f3848d381981f94ef5a2daf36f43d591be
|
7
|
+
data.tar.gz: 45a1efa4e1e65ae322b7c923cb5795ceda901863afc4d135cb25e1b342c9604e7f90719259b7fd93b8c38b15a5bcb4cab984617ea915fb58055f21936470269c
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,28 @@
|
|
1
1
|
## master (unreleased)
|
2
2
|
|
3
|
+
## 0.3.0 (2023-05-20)
|
4
|
+
|
5
|
+
- Allow a default retry backoff to be configured
|
6
|
+
|
7
|
+
```ruby
|
8
|
+
SidekiqIteration.default_retry_backoff = 10.seconds
|
9
|
+
```
|
10
|
+
|
11
|
+
- Add ability to iterate Active Record enumerators in reverse order
|
12
|
+
|
13
|
+
```ruby
|
14
|
+
active_record_records_enumerator(User.all, order: :desc)
|
15
|
+
```
|
16
|
+
|
17
|
+
## 0.2.0 (2022-11-11)
|
18
|
+
|
19
|
+
- Fix storing run metadata when the job fails for sidekiq < 6.5.2
|
20
|
+
|
21
|
+
- Make enumerators resume from the last cursor position
|
22
|
+
|
23
|
+
This fixes `NestedEnumerator` to work correctly. Previously, each intermediate enumerator
|
24
|
+
was resumed from the next cursor position, possibly skipping remaining inner items.
|
25
|
+
|
3
26
|
## 0.1.0 (2022-11-02)
|
4
27
|
|
5
28
|
- First release
|
data/README.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
|
3
3
|
[![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)
|
4
4
|
|
5
|
-
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
5
|
+
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
6
6
|
|
7
7
|
## Background
|
8
8
|
|
@@ -33,7 +33,7 @@ Software that is designed for high availability [must be resilient](https://12fa
|
|
33
33
|
- Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
|
34
34
|
- Sidekiq 6+
|
35
35
|
|
36
|
-
##
|
36
|
+
## Installation
|
37
37
|
|
38
38
|
Add this line to your application's Gemfile:
|
39
39
|
|
@@ -45,6 +45,8 @@ And then execute:
|
|
45
45
|
|
46
46
|
$ bundle
|
47
47
|
|
48
|
+
## Getting started
|
49
|
+
|
48
50
|
In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
|
49
51
|
|
50
52
|
```ruby
|
@@ -136,10 +138,10 @@ class BatchesJob
|
|
136
138
|
end
|
137
139
|
```
|
138
140
|
|
139
|
-
### Iterating over
|
141
|
+
### Iterating over Active Record Relations
|
140
142
|
|
141
143
|
```ruby
|
142
|
-
class
|
144
|
+
class RelationsJob
|
143
145
|
include Sidekiq::Job
|
144
146
|
include SidekiqIteration::Iteration
|
145
147
|
|
@@ -151,14 +153,14 @@ class BatchesAsRelationJob
|
|
151
153
|
)
|
152
154
|
end
|
153
155
|
|
154
|
-
def each_iteration(
|
155
|
-
#
|
156
|
-
|
156
|
+
def each_iteration(comments_relation, product_id)
|
157
|
+
# comments_relation will be a Comment::ActiveRecord_Relation
|
158
|
+
comments_relation.update_all(deleted: true)
|
157
159
|
end
|
158
160
|
end
|
159
161
|
```
|
160
162
|
|
161
|
-
### Iterating over arrays
|
163
|
+
### Iterating over arbitrary arrays
|
162
164
|
|
163
165
|
```ruby
|
164
166
|
class ArrayJob
|
@@ -184,10 +186,10 @@ class CsvJob
|
|
184
186
|
|
185
187
|
def build_enumerator(import_id, cursor:)
|
186
188
|
import = Import.find(import_id)
|
187
|
-
|
189
|
+
csv_enumerator(import.csv, cursor: cursor)
|
188
190
|
end
|
189
191
|
|
190
|
-
def each_iteration(csv_row)
|
192
|
+
def each_iteration(csv_row, import_id)
|
191
193
|
# insert csv_row to database
|
192
194
|
end
|
193
195
|
end
|
@@ -220,6 +222,7 @@ end
|
|
220
222
|
## Guides
|
221
223
|
|
222
224
|
* [Iteration: how it works](guides/iteration-how-it-works.md)
|
225
|
+
* [Job argument semantics](guides/argument-semantics.md)
|
223
226
|
* [Best practices](guides/best-practices.md)
|
224
227
|
* [Writing custom enumerator](guides/custom-enumerator.md)
|
225
228
|
* [Throttling](guides/throttling.md)
|
@@ -228,10 +231,15 @@ For more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq
|
|
228
231
|
|
229
232
|
## API
|
230
233
|
|
231
|
-
Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.
|
234
|
+
Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.
|
232
235
|
|
233
236
|
## FAQ
|
234
237
|
|
238
|
+
**Advantages of this pattern over splitting a large job into many small jobs?**
|
239
|
+
* Having one job is easier for redis in terms of memory, time and # of requests needed for enqueuing.
|
240
|
+
* It simplifies sidekiq monitoring, because you have a predictable number of jobs in the queues, instead of having thousands of them at one time and millions at another. Also easier to navigate its web UI.
|
241
|
+
* You can stop/pause/delete just one job, if something goes wrong. With many jobs it is harder and can take a long time, if it is critical to stop it right now.
|
242
|
+
|
235
243
|
**Why can't I just iterate in `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.
|
236
244
|
|
237
245
|
**What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.
|
data/guides/argument-semantics.md
ADDED
@@ -0,0 +1,130 @@
|
|
1
|
+
# Argument Semantics
|
2
|
+
|
3
|
+
`sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
|
4
|
+
|
5
|
+
The call sequence is usually 3 methods:
|
6
|
+
|
7
|
+
`perform -> build_enumerator -> each_iteration`
|
8
|
+
|
9
|
+
In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
|
10
|
+
|
11
|
+
## Jobs without arguments
|
12
|
+
|
13
|
+
Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
|
14
|
+
|
15
|
+
```ruby
|
16
|
+
class ArglessJob
|
17
|
+
include Sidekiq::Job
|
18
|
+
include SidekiqIteration::Iteration
|
19
|
+
|
20
|
+
def build_enumerator(cursor:)
|
21
|
+
# ...
|
22
|
+
end
|
23
|
+
|
24
|
+
def each_iteration(single_object_yielded_from_enumerator)
|
25
|
+
# ...
|
26
|
+
end
|
27
|
+
end
|
28
|
+
```
|
29
|
+
|
30
|
+
To enqueue the job:
|
31
|
+
|
32
|
+
```ruby
|
33
|
+
ArglessJob.perform_async
|
34
|
+
```
|
35
|
+
|
36
|
+
## Jobs with positional arguments
|
37
|
+
|
38
|
+
Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
|
39
|
+
|
40
|
+
```ruby
|
41
|
+
class ArgumentativeJob
|
42
|
+
include Sidekiq::Job
|
43
|
+
include SidekiqIteration::Iteration
|
44
|
+
|
45
|
+
def build_enumerator(arg1, arg2, arg3, cursor:)
|
46
|
+
# ...
|
47
|
+
end
|
48
|
+
|
49
|
+
def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
|
50
|
+
# ...
|
51
|
+
end
|
52
|
+
end
|
53
|
+
```
|
54
|
+
|
55
|
+
To enqueue the job:
|
56
|
+
|
57
|
+
```ruby
|
58
|
+
ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
|
59
|
+
```
|
60
|
+
|
61
|
+
## Jobs with keyword arguments
|
62
|
+
|
63
|
+
Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
|
64
|
+
|
65
|
+
```ruby
|
66
|
+
class ParameterizedJob
|
67
|
+
include Sidekiq::Job
|
68
|
+
include SidekiqIteration::Iteration
|
69
|
+
|
70
|
+
def build_enumerator(kwargs, cursor:)
|
71
|
+
name = kwargs.fetch("name")
|
72
|
+
email = kwargs.fetch("email")
|
73
|
+
# ...
|
74
|
+
end
|
75
|
+
|
76
|
+
def each_iteration(object_yielded_from_enumerator, kwargs)
|
77
|
+
name = kwargs.fetch("name")
|
78
|
+
email = kwargs.fetch("email")
|
79
|
+
# ...
|
80
|
+
end
|
81
|
+
end
|
82
|
+
```
|
83
|
+
|
84
|
+
To enqueue the job:
|
85
|
+
|
86
|
+
```ruby
|
87
|
+
ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
|
88
|
+
```
|
89
|
+
|
90
|
+
## Jobs with both positional and keyword arguments
|
91
|
+
|
92
|
+
Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array), the `Hash` containing keyword arguments comes after:
|
93
|
+
|
94
|
+
```ruby
|
95
|
+
class HighlyConfigurableGreetingJob
|
96
|
+
include Sidekiq::Job
|
97
|
+
include SidekiqIteration::Iteration
|
98
|
+
|
99
|
+
def build_enumerator(subject_line, kwargs, cursor:)
|
100
|
+
name = kwargs.fetch("sender_name")
|
101
|
+
email = kwargs.fetch("sender_email")
|
102
|
+
# ...
|
103
|
+
end
|
104
|
+
|
105
|
+
def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
|
106
|
+
name = kwargs.fetch("sender_name")
|
107
|
+
email = kwargs.fetch("sender_email")
|
108
|
+
# ...
|
109
|
+
end
|
110
|
+
end
|
111
|
+
```
|
112
|
+
|
113
|
+
To enqueue the job:
|
114
|
+
|
115
|
+
```ruby
|
116
|
+
HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
|
117
|
+
```
|
118
|
+
|
119
|
+
## Returning (yielding) from enumerators
|
120
|
+
|
121
|
+
When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
|
122
|
+
|
123
|
+
```ruby
|
124
|
+
Enumerator.new do |yielder|
|
125
|
+
# In this case `cursor` is an Integer
|
126
|
+
cursor.upto(99999) do |offset|
|
127
|
+
yielder.yield(fetch_record_at(offset), offset)
|
128
|
+
end
|
129
|
+
end
|
130
|
+
```
|
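The calling convention described in this guide can be modelled without Sidekiq. The sketch below is a simplified stand-in, not the gem's actual `perform` implementation: `ToyIterationRunner` and `GreetingJob` are hypothetical names, and the keyword-style arguments arrive as an ordinary trailing `Hash`, as the guide describes. It assumes Ruby 3 argument handling.

```ruby
# Hypothetical stand-in for the framework side: it calls the job's methods
# the way the guide describes. This is not sidekiq-iteration's own code.
class ToyIterationRunner
  def initialize(job)
    @job = job
  end

  # Positional arguments are forwarded unchanged to both methods;
  # only build_enumerator receives the cursor keyword.
  def perform(*args, cursor: nil)
    enumerator = @job.build_enumerator(*args, cursor: cursor)
    enumerator.each do |object, _next_cursor|
      # The cursor yielded by the enumerator is not passed to each_iteration.
      @job.each_iteration(object, *args)
    end
  end
end

# Hypothetical job with one positional argument and a Hash of options.
class GreetingJob
  attr_reader :log

  def initialize
    @log = []
  end

  def build_enumerator(subject_line, kwargs, cursor:)
    recipients = kwargs.fetch("recipients")
    # Yields [recipient, index] pairs, starting at the cursor position.
    recipients.each_with_index.drop(cursor || 0).each
  end

  def each_iteration(recipient, subject_line, kwargs)
    @log << "#{subject_line} #{recipient} (from #{kwargs.fetch("sender_name")})"
  end
end

job = GreetingJob.new
ToyIterationRunner.new(job).perform(
  "Hello", { "sender_name" => "Jane", "recipients" => %w[ann bob] }
)
puts job.log
```

The object yielded by the enumerator always comes first in `each_iteration`, followed by the original job arguments in their original order.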
data/guides/best-practices.md
CHANGED
@@ -1,5 +1,10 @@
|
|
1
1
|
# Best practices
|
2
2
|
|
3
|
+
## Considerations when writing jobs
|
4
|
+
|
5
|
+
* Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
|
6
|
+
* Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
|
7
|
+
|
3
8
|
## Batch iteration
|
4
9
|
|
5
10
|
Regardless of the active record enumerator used in the task, `sidekiq-iteration` gem loads records in batches of 100 (by default).
|
data/guides/custom-enumerator.md
CHANGED
@@ -2,6 +2,17 @@
|
|
2
2
|
|
3
3
|
Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
|
4
4
|
|
5
|
+
Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
|
6
|
+
your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
|
7
|
+
arguments:
|
8
|
+
- An object to be processed in a job `each_iteration` method
|
9
|
+
- A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
|
10
|
+
down. It can be any data type your job backend can serialize and deserialize correctly.
|
11
|
+
|
12
|
+
A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
|
13
|
+
will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
|
14
|
+
find objects that come _after_ the last successfully processed object.
|
15
|
+
|
5
16
|
## Cursorless Enumerator
|
6
17
|
|
7
18
|
Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
|
@@ -23,7 +34,7 @@ class ListJob
|
|
23
34
|
end
|
24
35
|
end
|
25
36
|
|
26
|
-
def each_iteration(
|
37
|
+
def each_iteration(item_from_redis)
|
27
38
|
# ...
|
28
39
|
end
|
29
40
|
end
|
@@ -31,14 +42,15 @@ end
|
|
31
42
|
|
32
43
|
## Enumerator with cursor
|
33
44
|
|
34
|
-
|
45
|
+
For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
|
46
|
+
stores a string as the cursor position:
|
35
47
|
|
36
48
|
```ruby
|
37
49
|
class StripeListEnumerator
|
38
50
|
# @param resource [Stripe::APIResource] The type of Stripe object to request
|
39
51
|
# @param params [Hash] Query parameters for the request
|
40
52
|
# @param options [Hash] Request options, such as API key or version
|
41
|
-
# @param cursor [String]
|
53
|
+
# @param cursor [nil, String] The Stripe ID of the last item iterated over
|
42
54
|
def initialize(resource, params: {}, options: {}, cursor:)
|
43
55
|
pagination_params = {}
|
44
56
|
pagination_params[:starting_after] = cursor unless cursor.nil?
|
@@ -59,6 +71,9 @@ class StripeListEnumerator
|
|
59
71
|
def each
|
60
72
|
loop do
|
61
73
|
@list.each do |item, _index|
|
74
|
+
# The first argument is what gets passed to `each_iteration`.
|
75
|
+
# The second argument (item.id) is going to be persisted as the cursor,
|
76
|
+
# it doesn't get passed to `each_iteration`.
|
62
77
|
yield item, item.id
|
63
78
|
end
|
64
79
|
|
@@ -71,26 +86,38 @@ class StripeListEnumerator
|
|
71
86
|
end
|
72
87
|
```
|
73
88
|
|
89
|
+
Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
|
90
|
+
which uses such an `Enumerator` would then look like so:
|
91
|
+
|
74
92
|
```ruby
|
75
|
-
class
|
93
|
+
class LoadRefundsForChargeJob
|
76
94
|
include Sidekiq::Job
|
77
95
|
include SidekiqIteration::Iteration
|
78
96
|
|
79
|
-
def build_enumerator(
|
97
|
+
def build_enumerator(charge_id, cursor:)
|
80
98
|
StripeListEnumerator.new(
|
81
99
|
Stripe::Refund,
|
82
|
-
params: { charge: "
|
100
|
+
params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
|
83
101
|
options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
|
84
102
|
cursor: cursor
|
85
103
|
).to_enumerator
|
86
104
|
end
|
87
105
|
|
88
|
-
|
106
|
+
# Note that in this case `each_iteration` will only receive one positional argument per iteration.
|
107
|
+
# If what your enumerator yields is a composite object you will need to unpack it yourself
|
108
|
+
# inside the `each_iteration`.
|
109
|
+
def each_iteration(stripe_refund, charge_id)
|
89
110
|
# ...
|
90
111
|
end
|
91
112
|
end
|
92
113
|
```
|
93
114
|
|
115
|
+
and you initiate the job with
|
116
|
+
|
117
|
+
```ruby
|
118
|
+
LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
|
119
|
+
```
|
120
|
+
|
94
121
|
## Notes
|
95
122
|
|
96
123
|
We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
|
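The contract described in this guide (yield the object and its cursor, then resume after that cursor) can be tried out without any external API. The following is a minimal sketch under stated assumptions: `RECORDS`, `fetch_page`, and `refunds_enumerator` are hypothetical names standing in for a paginated service, and an unknown `starting_after` id simply restarts from the beginning.

```ruby
# Hypothetical in-memory stand-in for a paginated API.
RECORDS = [
  { id: "re_1", amount: 100 },
  { id: "re_2", amount: 250 },
  { id: "re_3", amount: 75 },
].freeze

# Returns up to `limit` records that come after the record with the given id,
# or the first page when `starting_after` is nil.
def fetch_page(starting_after:, limit: 2)
  start = 0
  if starting_after
    position = RECORDS.index { |record| record[:id] == starting_after }
    start = position + 1 if position
  end
  RECORDS[start, limit] || []
end

# Builds an Enumerator that yields [record, cursor] pairs, where the cursor is
# the id of the record just yielded.
def refunds_enumerator(cursor:)
  Enumerator.new do |yielder|
    loop do
      page = fetch_page(starting_after: cursor)
      break if page.empty?

      page.each do |record|
        cursor = record[:id]
        yielder.yield(record, cursor)
      end
    end
  end
end

# Simulate an interruption after the first record, then resume.
first_run = refunds_enumerator(cursor: nil).take(1)
saved_cursor = first_run.last.last
resumed = refunds_enumerator(cursor: saved_cursor).map { |record, _cursor| record[:id] }
p resumed
```

Because the cursor is the id of the last yielded record, a resumed enumerator asks the service for records after that id and never needs to know how many records were processed earlier.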
data/lib/sidekiq_iteration/active_record_enumerator.rb
CHANGED
@@ -1,28 +1,51 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require_relative "active_record_cursor"
|
4
|
-
|
5
3
|
module SidekiqIteration
|
6
|
-
# Builds Enumerator based on ActiveRecord Relation. Supports enumerating on rows and batches.
|
7
4
|
# @private
|
8
5
|
class ActiveRecordEnumerator
|
9
|
-
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%
|
6
|
+
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
|
10
7
|
|
11
|
-
def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
|
8
|
+
def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
|
12
9
|
unless relation.is_a?(ActiveRecord::Relation)
|
13
10
|
raise ArgumentError, "relation must be an ActiveRecord::Relation"
|
14
11
|
end
|
15
12
|
|
16
|
-
|
13
|
+
unless order == :asc || order == :desc
|
14
|
+
raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
|
15
|
+
end
|
16
|
+
|
17
|
+
@primary_key = "#{relation.table_name}.#{relation.primary_key}"
|
18
|
+
@columns = Array(columns&.map(&:to_s) || @primary_key)
|
19
|
+
@primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
|
20
|
+
@pluck_columns = if @primary_key_index
|
21
|
+
@columns
|
22
|
+
else
|
23
|
+
@columns + [@primary_key]
|
24
|
+
end
|
17
25
|
@batch_size = batch_size
|
18
|
-
@
|
19
|
-
@cursor = cursor
|
26
|
+
@order = order
|
27
|
+
@cursor = Array.wrap(cursor)
|
28
|
+
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
29
|
+
if relation.joins_values.present? && !@columns.all?(/\./)
|
30
|
+
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
31
|
+
end
|
32
|
+
|
33
|
+
if relation.arel.orders.present? || relation.arel.taken.present?
|
34
|
+
raise ArgumentError,
|
35
|
+
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
36
|
+
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
37
|
+
end
|
38
|
+
|
39
|
+
ordering = @columns.to_h { |column| [column, @order] }
|
40
|
+
@base_relation = relation.reorder(ordering)
|
41
|
+
@iteration_count = 0
|
20
42
|
end
|
21
43
|
|
22
44
|
def records
|
23
|
-
Enumerator.new(-> {
|
45
|
+
Enumerator.new(-> { records_size }) do |yielder|
|
24
46
|
batches.each do |batch, _|
|
25
47
|
batch.each do |record|
|
48
|
+
@iteration_count += 1
|
26
49
|
yielder.yield(record, cursor_value(record))
|
27
50
|
end
|
28
51
|
end
|
@@ -30,40 +53,146 @@ module SidekiqIteration
|
|
30
53
|
end
|
31
54
|
|
32
55
|
def batches
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
yielder.yield(
|
56
|
+
Enumerator.new(-> { records_size }) do |yielder|
|
57
|
+
while (batch = next_batch(load: true))
|
58
|
+
@iteration_count += 1
|
59
|
+
yielder.yield(batch, cursor_value(batch.last))
|
37
60
|
end
|
38
61
|
end
|
39
62
|
end
|
40
63
|
|
41
|
-
def
|
42
|
-
|
64
|
+
def relations
|
65
|
+
Enumerator.new(-> { relations_size }) do |yielder|
|
66
|
+
while (batch = next_batch(load: false))
|
67
|
+
@iteration_count += 1
|
68
|
+
yielder.yield(batch, unwrap_array(@cursor))
|
69
|
+
end
|
70
|
+
end
|
43
71
|
end
|
44
72
|
|
45
73
|
private
|
74
|
+
def records_size
|
75
|
+
@base_relation.count(:all)
|
76
|
+
end
|
77
|
+
|
78
|
+
def relations_size
|
79
|
+
(records_size + @batch_size - 1) / @batch_size # ceiling division
|
80
|
+
end
|
81
|
+
|
82
|
+
def next_batch(load:)
|
83
|
+
batch_relation = @base_relation.limit(@batch_size)
|
84
|
+
if conditions.any?
|
85
|
+
batch_relation = batch_relation.where(*conditions)
|
86
|
+
end
|
87
|
+
|
88
|
+
records = nil
|
89
|
+
cursor_values, ids = batch_relation.uncached do
|
90
|
+
if load
|
91
|
+
records = batch_relation.records
|
92
|
+
pluck_columns(records)
|
93
|
+
else
|
94
|
+
pluck_columns(batch_relation)
|
95
|
+
end
|
96
|
+
end
|
97
|
+
|
98
|
+
cursor = cursor_values.last
|
99
|
+
return unless cursor.present?
|
100
|
+
|
101
|
+
# The primary key was plucked, but original cursor did not include it, so we should remove it
|
102
|
+
cursor.pop unless @primary_key_index
|
103
|
+
@cursor = Array.wrap(cursor)
|
104
|
+
|
105
|
+
# Yields relations by selecting the primary keys of records in the batch.
|
106
|
+
# Post.where(published: nil) results in an enumerator of relations like:
|
107
|
+
# Post.where(published: nil, ids: batch_of_ids)
|
108
|
+
relation = @base_relation.where(@primary_key => ids)
|
109
|
+
relation.send(:load_records, records) if load
|
110
|
+
relation
|
111
|
+
end
|
112
|
+
|
113
|
+
def pluck_columns(batch)
|
114
|
+
columns =
|
115
|
+
if batch.is_a?(Array)
|
116
|
+
@pluck_columns.map { |column| column.to_s.split(".").last }
|
117
|
+
else
|
118
|
+
@pluck_columns
|
119
|
+
end
|
120
|
+
|
121
|
+
if columns.size == 1 # only the primary key
|
122
|
+
column_values = batch.pluck(columns.first)
|
123
|
+
return [column_values, column_values]
|
124
|
+
end
|
125
|
+
|
126
|
+
column_values = batch.pluck(*columns)
|
127
|
+
primary_key_index = @primary_key_index || -1
|
128
|
+
primary_key_values = column_values.map { |values| values[primary_key_index] }
|
129
|
+
|
130
|
+
serialize_column_values!(column_values)
|
131
|
+
[column_values, primary_key_values]
|
132
|
+
end
|
133
|
+
|
46
134
|
def cursor_value(record)
|
47
135
|
positions = @columns.map do |column|
|
48
136
|
attribute_name = column.to_s.split(".").last
|
49
|
-
column_value(record
|
137
|
+
column_value(record[attribute_name])
|
138
|
+
end
|
139
|
+
|
140
|
+
unwrap_array(positions)
|
141
|
+
end
|
142
|
+
|
143
|
+
def conditions
|
144
|
+
return [] if @cursor.empty?
|
145
|
+
|
146
|
+
binds = []
|
147
|
+
sql = build_starts_after_conditions(0, binds)
|
148
|
+
|
149
|
+
# Start from the record pointed by cursor.
|
150
|
+
# We use the property that `>=` is equivalent to `> or =`.
|
151
|
+
if @iteration_count == 0
|
152
|
+
binds.unshift(*@cursor)
|
153
|
+
columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
|
154
|
+
sql = "(#{columns_equality}) OR (#{sql})"
|
50
155
|
end
|
51
156
|
|
52
|
-
|
53
|
-
|
157
|
+
[sql, *binds]
|
158
|
+
end
|
159
|
+
|
160
|
+
# (x, y) > (a, b) iff (x > a or (x = a and y > b))
|
161
|
+
# (x, y) < (a, b) iff (x < a or (x = a and y < b))
|
162
|
+
def build_starts_after_conditions(index, binds)
|
163
|
+
column = @columns[index]
|
164
|
+
|
165
|
+
if index < @cursor.size - 1
|
166
|
+
binds << @cursor[index] << @cursor[index]
|
167
|
+
"#{column} #{@order == :asc ? '>' : '<'} ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
|
54
168
|
else
|
55
|
-
|
169
|
+
binds << @cursor[index]
|
170
|
+
if @columns.size == @cursor.size
|
171
|
+
@order == :asc ? "#{column} > ?" : "#{column} < ?"
|
172
|
+
else
|
173
|
+
@order == :asc ? "#{column} >= ?" : "#{column} <= ?"
|
174
|
+
end
|
56
175
|
end
|
57
176
|
end
|
58
177
|
|
59
|
-
def
|
60
|
-
|
61
|
-
|
62
|
-
|
178
|
+
def serialize_column_values!(column_values)
|
179
|
+
column_values.map! { |values| values.map! { |value| column_value(value) } }
|
180
|
+
end
|
181
|
+
|
182
|
+
def column_value(value)
|
183
|
+
if value.is_a?(Time)
|
63
184
|
value.strftime(SQL_DATETIME_WITH_NSEC)
|
64
185
|
else
|
65
186
|
value
|
66
187
|
end
|
67
188
|
end
|
189
|
+
|
190
|
+
def unwrap_array(array)
|
191
|
+
if array.size == 1
|
192
|
+
array.first
|
193
|
+
else
|
194
|
+
array
|
195
|
+
end
|
196
|
+
end
|
68
197
|
end
|
69
198
|
end
|
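The tuple comparison noted in the comments of `build_starts_after_conditions` can be exercised on its own. This is a standalone re-statement for illustration, not the gem's method: `starts_after_sql` is a hypothetical name, it takes its inputs as arguments instead of instance variables, and it assumes the cursor has one value per column (so the `>=` / `<=` branch for partial cursors is left out).

```ruby
# (x, y) > (a, b)  iff  x > a OR (x = a AND y > b)
# (x, y) < (a, b)  iff  x < a OR (x = a AND y < b)
#
# Returns [sql, binds], where binds are listed in placeholder order.
def starts_after_sql(columns, cursor, order: :asc, index: 0, binds: [])
  column = columns[index]
  operator = order == :asc ? ">" : "<"

  if index < cursor.size - 1
    # This column's cursor value is bound twice: once for the strict
    # comparison and once for the equality that guards the nested condition.
    binds << cursor[index] << cursor[index]
    inner, = starts_after_sql(columns, cursor, order: order, index: index + 1, binds: binds)
    ["#{column} #{operator} ? OR (#{column} = ? AND (#{inner}))", binds]
  else
    binds << cursor[index]
    ["#{column} #{operator} ?", binds]
  end
end

sql, binds = starts_after_sql(
  %w[posts.created_at posts.id],
  ["2023-05-20 12:00:00.000000", 42]
)
puts sql
p binds
```

Expanding the comparison column by column keeps the condition portable across databases that do not support row-value comparisons such as `(x, y) > (a, b)`.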
data/lib/sidekiq_iteration/csv_enumerator.rb
CHANGED
@@ -49,7 +49,7 @@ module SidekiqIteration
|
|
49
49
|
def rows(cursor:)
|
50
50
|
@csv.lazy
|
51
51
|
.each_with_index
|
52
|
-
.drop(
|
52
|
+
.drop(cursor || 0)
|
53
53
|
.to_enum { count_of_rows_in_file }
|
54
54
|
end
|
55
55
|
|
@@ -60,7 +60,7 @@ module SidekiqIteration
|
|
60
60
|
@csv.lazy
|
61
61
|
.each_slice(batch_size)
|
62
62
|
.with_index
|
63
|
-
.drop(
|
63
|
+
.drop(cursor || 0)
|
64
64
|
.to_enum { (count_of_rows_in_file.to_f / batch_size).ceil }
|
65
65
|
end
|
66
66
|
|
@@ -73,13 +73,5 @@ module SidekiqIteration
|
|
73
73
|
count -= 1 if @csv.headers
|
74
74
|
count
|
75
75
|
end
|
76
|
-
|
77
|
-
def count_of_processed_rows(cursor)
|
78
|
-
if cursor
|
79
|
-
cursor + 1
|
80
|
-
else
|
81
|
-
0
|
82
|
-
end
|
83
|
-
end
|
84
76
|
end
|
85
77
|
end
|
data/lib/sidekiq_iteration/enumerators.rb
CHANGED
@@ -1,7 +1,6 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
require_relative "active_record_enumerator"
|
4
|
-
require_relative "active_record_batch_enumerator"
|
5
4
|
require_relative "csv_enumerator"
|
6
5
|
require_relative "nested_enumerator"
|
7
6
|
|
@@ -22,8 +21,7 @@ module SidekiqIteration
|
|
22
21
|
raise ArgumentError, "array cannot contain ActiveRecord objects"
|
23
22
|
end
|
24
23
|
|
25
|
-
drop
|
26
|
-
array.each_with_index.drop(drop).to_enum { array.size }
|
24
|
+
array.each_with_index.drop(cursor || 0).to_enum { array.size }
|
27
25
|
end
|
28
26
|
|
29
27
|
# Builds Enumerator from Active Record Relation. Each Enumerator tick moves the cursor one row forward.
|
@@ -33,6 +31,7 @@ module SidekiqIteration
|
|
33
31
|
# @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
|
34
32
|
# defaults to primary key
|
35
33
|
# @option options :batch_size [Integer] (100) size of the batch
|
34
|
+
# @option options :order [:asc, :desc] (:asc) specifies iteration order
|
36
35
|
#
|
37
36
|
# +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
|
38
37
|
#
|
@@ -115,7 +114,7 @@ module SidekiqIteration
|
|
115
114
|
# end
|
116
115
|
#
|
117
116
|
def active_record_relations_enumerator(scope, cursor:, **options)
|
118
|
-
|
117
|
+
ActiveRecordEnumerator.new(scope, cursor: cursor, **options).relations
|
119
118
|
end
|
120
119
|
|
121
120
|
# Builds Enumerator from a CSV file.
|
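The effect of `drop(cursor || 0)` in the array enumerator can be checked in plain Ruby. The sketch below copies that one expression into a free-standing method (the method name and sample data are illustrative); how the gem stores and advances the cursor between runs is not shown here.

```ruby
# Yields [element, index] pairs, starting at the cursor index.
# A nil cursor means the job has not run yet, so nothing is dropped.
def array_enumerator(array, cursor:)
  array.each_with_index.drop(cursor || 0).to_enum { array.size }
end

letters = %w[a b c d]

fresh = array_enumerator(letters, cursor: nil).to_a
resumed = array_enumerator(letters, cursor: 2).to_a

p fresh
p resumed
```

With this form, a cursor of `2` resumes at the element stored at index 2 instead of skipping past it, which is consistent with the 0.2.0 changelog entry about resuming from the last cursor position. The lazily computed size still reports the full array length.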
data/lib/sidekiq_iteration/iteration.rb
CHANGED
@@ -13,13 +13,13 @@ module SidekiqIteration
|
|
13
13
|
base.extend(Throttling)
|
14
14
|
|
15
15
|
base.class_eval do
|
16
|
-
throttle_on(backoff:
|
16
|
+
throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
|
17
17
|
job.class.max_job_runtime &&
|
18
18
|
job.start_time &&
|
19
19
|
(Time.now.utc - job.start_time) > job.class.max_job_runtime
|
20
20
|
end
|
21
21
|
|
22
|
-
throttle_on(backoff:
|
22
|
+
throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
|
23
23
|
defined?(Sidekiq::CLI) &&
|
24
24
|
Sidekiq::CLI.instance.launcher.stopping?
|
25
25
|
end
|
@@ -56,16 +56,22 @@ module SidekiqIteration
|
|
56
56
|
|
57
57
|
attr_reader :executions,
|
58
58
|
:cursor_position,
|
59
|
-
:start_time,
|
60
59
|
:times_interrupted,
|
61
|
-
:total_time,
|
62
60
|
:current_run_iterations
|
63
61
|
|
62
|
+
# The time when the job starts running. If the job is interrupted and runs again,
|
63
|
+
# the value is updated.
|
64
|
+
attr_reader :start_time
|
65
|
+
|
66
|
+
# The total time the job has been running, including multiple iterations.
|
67
|
+
# The time isn't reset if the job is interrupted.
|
68
|
+
attr_reader :total_time
|
69
|
+
|
64
70
|
# @private
|
65
71
|
def initialize
|
66
72
|
super
|
67
73
|
@arguments = nil
|
68
|
-
@job_iteration_retry_backoff =
|
74
|
+
@job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
|
69
75
|
@needs_reenqueue = false
|
70
76
|
@current_run_iterations = 0
|
71
77
|
end
|
@@ -191,14 +197,14 @@ module SidekiqIteration
|
|
191
197
|
)
|
192
198
|
end
|
193
199
|
|
194
|
-
adjust_total_time
|
195
200
|
true
|
201
|
+
ensure
|
202
|
+
adjust_total_time
|
196
203
|
end
|
197
204
|
|
198
205
|
def reenqueue_iteration_job
|
199
206
|
SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
|
200
207
|
|
201
|
-
adjust_total_time
|
202
208
|
@times_interrupted += 1
|
203
209
|
|
204
210
|
arguments = @arguments
|
data/lib/sidekiq_iteration/job_retry_patch.rb
CHANGED
@@ -7,6 +7,17 @@ module SidekiqIteration
|
|
7
7
|
module JobRetryPatch
|
8
8
|
private
|
9
9
|
def process_retry(jobinst, msg, queue, exception)
|
10
|
+
add_sidekiq_iteration_metadata(jobinst, msg)
|
11
|
+
super
|
12
|
+
end
|
13
|
+
|
14
|
+
# The method was renamed in https://github.com/mperham/sidekiq/commit/0676a5202e89aa9da4ad7991f4111b97a9d8a0a4.
|
15
|
+
def attempt_retry(jobinst, msg, queue, exception)
|
16
|
+
add_sidekiq_iteration_metadata(jobinst, msg)
|
17
|
+
super
|
18
|
+
end
|
19
|
+
|
20
|
+
def add_sidekiq_iteration_metadata(jobinst, msg)
|
10
21
|
if jobinst.is_a?(Iteration)
|
11
22
|
unless msg["args"].last.is_a?(Hash)
|
12
23
|
msg["args"].push({})
|
@@ -19,12 +30,14 @@ module SidekiqIteration
|
|
19
30
|
"total_time" => jobinst.total_time,
|
20
31
|
}
|
21
32
|
end
|
22
|
-
|
23
|
-
super
|
24
33
|
end
|
25
34
|
end
|
26
35
|
end
|
27
36
|
|
28
|
-
if Sidekiq::JobRetry.
|
37
|
+
if Sidekiq::JobRetry.private_method_defined?(:process_retry) ||
|
38
|
+
Sidekiq::JobRetry.private_method_defined?(:attempt_retry)
|
29
39
|
Sidekiq::JobRetry.prepend(SidekiqIteration::JobRetryPatch)
|
40
|
+
else
|
41
|
+
raise "Sidekiq #{Sidekiq::VERSION} removed the #process_retry method. " \
|
42
|
+
"Please open an issue at the `sidekiq-iteration` gem."
|
30
43
|
end
|
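The patching technique used here (prepend a module that wraps a private hook under both its old and new name, then fail loudly if neither exists) can be reproduced with toy classes. Everything in this sketch is hypothetical: `ToyRetryHandler` stands in for a library class whose hook was renamed, and the metadata value is a placeholder.

```ruby
# Stand-in for a library class. In this "version" the private hook is called
# attempt_retry; an older version would have called it process_retry.
class ToyRetryHandler
  def handle(msg)
    attempt_retry(msg)
  end

  private

  def attempt_retry(msg)
    msg.merge("retried" => true)
  end
end

# Wraps whichever hook exists. The wrapper that has no counterpart in the
# library is never called, so its `super` is never reached.
module ToyRetryPatch
  private

  def process_retry(msg)
    super(add_metadata(msg))
  end

  def attempt_retry(msg)
    super(add_metadata(msg))
  end

  def add_metadata(msg)
    msg.merge("cursor_position" => 42)
  end
end

if ToyRetryHandler.private_method_defined?(:process_retry) ||
   ToyRetryHandler.private_method_defined?(:attempt_retry)
  ToyRetryHandler.prepend(ToyRetryPatch)
else
  raise "ToyRetryHandler no longer defines a known retry hook"
end

result = ToyRetryHandler.new.handle({})
p result
```

Checking for the hook before prepending turns a silent no-op (a patch that wraps a method nobody calls) into an error at load time, which is the purpose of the `else` branch in the diff.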
data/lib/sidekiq_iteration.rb
CHANGED
@@ -22,6 +22,17 @@ module SidekiqIteration
|
|
22
22
|
#
|
23
23
|
attr_accessor :max_job_runtime
|
24
24
|
|
25
|
+
# Configures a delay duration to wait before resuming an interrupted job.
|
26
|
+
#
|
27
|
+
# @example
|
28
|
+
# SidekiqIteration.default_retry_backoff = 10.seconds
|
29
|
+
#
|
30
|
+
# Defaults to nil which means interrupted jobs will be retried immediately.
|
31
|
+
# This value will be ignored when an interruption is raised by a throttle enumerator,
|
32
|
+
# where the throttle backoff value will take precedence over this setting.
|
33
|
+
#
|
34
|
+
attr_accessor :default_retry_backoff
|
35
|
+
|
25
36
|
# Set a custom logger for sidekiq-iteration.
|
26
37
|
# Defaults to `Sidekiq.logger`.
|
27
38
|
#
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sidekiq-iteration
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.3.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- fatkodima
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date:
|
12
|
+
date: 2023-05-20 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: sidekiq
|
@@ -35,14 +35,13 @@ files:
|
|
35
35
|
- CHANGELOG.md
|
36
36
|
- LICENSE.txt
|
37
37
|
- README.md
|
38
|
+
- guides/argument-semantics.md
|
38
39
|
- guides/best-practices.md
|
39
40
|
- guides/custom-enumerator.md
|
40
41
|
- guides/iteration-how-it-works.md
|
41
42
|
- guides/throttling.md
|
42
43
|
- lib/sidekiq-iteration.rb
|
43
44
|
- lib/sidekiq_iteration.rb
|
44
|
-
- lib/sidekiq_iteration/active_record_batch_enumerator.rb
|
45
|
-
- lib/sidekiq_iteration/active_record_cursor.rb
|
46
45
|
- lib/sidekiq_iteration/active_record_enumerator.rb
|
47
46
|
- lib/sidekiq_iteration/csv_enumerator.rb
|
48
47
|
- lib/sidekiq_iteration/enumerators.rb
|
@@ -73,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
73
72
|
- !ruby/object:Gem::Version
|
74
73
|
version: '0'
|
75
74
|
requirements: []
|
76
|
-
rubygems_version: 3.
|
75
|
+
rubygems_version: 3.4.12
|
77
76
|
signing_key:
|
78
77
|
specification_version: 4
|
79
|
-
summary: Makes your sidekiq jobs interruptible and resumable.
|
78
|
+
summary: Makes your long-running sidekiq jobs interruptible and resumable.
|
80
79
|
test_files: []
|
data/lib/sidekiq_iteration/active_record_batch_enumerator.rb
DELETED
@@ -1,127 +0,0 @@
|
|
1
|
-
# frozen_string_literal: true
|
2
|
-
|
3
|
-
module SidekiqIteration
|
4
|
-
# Batch Enumerator based on ActiveRecord Relation.
|
5
|
-
# @private
|
6
|
-
class ActiveRecordBatchEnumerator
|
7
|
-
include Enumerable
|
8
|
-
|
9
|
-
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
|
10
|
-
|
11
|
-
def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
|
12
|
-
@primary_key = "#{relation.table_name}.#{relation.primary_key}"
|
13
|
-
@columns = Array(columns&.map(&:to_s) || @primary_key)
|
14
|
-
@primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
|
15
|
-
@pluck_columns = if @primary_key_index
|
16
|
-
@columns
|
17
|
-
else
|
18
|
-
@columns + [@primary_key]
|
19
|
-
end
|
20
|
-
@batch_size = batch_size
|
21
|
-
@cursor = Array.wrap(cursor)
|
22
|
-
@initial_cursor = @cursor
|
23
|
-
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
24
|
-
if relation.joins_values.present? && !@columns.all?(/\./)
|
25
|
-
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
26
|
-
end
|
27
|
-
|
28
|
-
if relation.arel.orders.present? || relation.arel.taken.present?
|
29
|
-
raise ArgumentError,
|
30
|
-
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
31
|
-
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
32
|
-
end
|
33
|
-
|
34
|
-
@base_relation = relation.reorder(@columns.join(", "))
|
35
|
-
end
|
36
|
-
|
37
|
-
def each
|
38
|
-
return to_enum { size } unless block_given?
|
39
|
-
|
40
|
-
while (relation = next_batch)
|
41
|
-
yield relation, cursor_value
|
42
|
-
end
|
43
|
-
end
|
44
|
-
|
45
|
-
def size
|
46
|
-
(@base_relation.count(:all) + @batch_size - 1) / @batch_size # ceiling division
|
47
|
-
end
|
48
|
-
|
49
|
-
private
|
50
|
-
def next_batch
|
51
|
-
relation = @base_relation.limit(@batch_size)
|
52
|
-
if conditions.any?
|
53
|
-
relation = relation.where(*conditions)
|
54
|
-
end
|
55
|
-
|
56
|
-
cursor_values, ids = relation.uncached do
|
57
|
-
pluck_columns(relation)
|
58
|
-
end
|
59
|
-
|
60
|
-
cursor = cursor_values.last
|
61
|
-
unless cursor.present?
|
62
|
-
@cursor = @initial_cursor
|
63
|
-
return
|
64
|
-
end
|
65
|
-
# The primary key was plucked, but original cursor did not include it, so we should remove it
|
66
|
-
cursor.pop unless @primary_key_index
|
67
|
-
@cursor = Array.wrap(cursor)
|
68
|
-
|
69
|
-
# Yields relations by selecting the primary keys of records in the batch.
|
70
|
-
# Post.where(published: nil) results in an enumerator of relations like:
|
71
|
-
# Post.where(published: nil, ids: batch_of_ids)
|
72
|
-
@base_relation.where(@primary_key => ids)
|
73
|
-
end
|
74
|
-
|
75
|
-
def pluck_columns(relation)
|
76
|
-
if @pluck_columns.size == 1 # only the primary key
|
77
|
-
column_values = relation.pluck(*@pluck_columns)
|
78
|
-
return [column_values, column_values]
|
79
|
-
end
|
80
|
-
|
81
|
-
column_values = relation.pluck(*@pluck_columns)
|
82
|
-
primary_key_index = @primary_key_index || -1
|
83
|
-
primary_key_values = column_values.map { |values| values[primary_key_index] }
|
84
|
-
|
85
|
-
serialize_column_values!(column_values)
|
86
|
-
[column_values, primary_key_values]
|
87
|
-
end
|
88
|
-
|
89
|
-
def cursor_value
|
90
|
-
if @cursor.size == 1
|
91
|
-
@cursor.first
|
92
|
-
else
|
93
|
-
@cursor
|
94
|
-
end
|
95
|
-
end
|
96
|
-
|
97
|
-
def conditions
|
98
|
-
column_index = @cursor.size - 1
|
99
|
-
column = @columns[column_index]
|
100
|
-
where_clause = if @columns.size == @cursor.size
|
101
|
-
"#{column} > ?"
|
102
|
-
else
|
103
|
-
"#{column} >= ?"
|
104
|
-
end
|
105
|
-
while column_index > 0
|
106
|
-
column_index -= 1
|
107
|
-
column = @columns[column_index]
|
108
|
-
where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
|
109
|
-
end
|
110
|
-
ret = @cursor.reduce([where_clause]) { |params, value| params << value << value }
|
111
|
-
ret.pop
|
112
|
-
ret
|
113
|
-
end
|
114
|
-
|
115
|
-
def serialize_column_values!(column_values)
|
116
|
-
column_values.map! { |values| values.map! { |value| column_value(value) } }
|
117
|
-
end
|
118
|
-
|
119
|
-
def column_value(value)
|
120
|
-
if value.is_a?(Time)
|
121
|
-
value.strftime(SQL_DATETIME_WITH_NSEC)
|
122
|
-
else
|
123
|
-
value
|
124
|
-
end
|
125
|
-
end
|
126
|
-
end
|
127
|
-
end
|
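The `conditions` method in the removed enumerator builds a keyset-pagination predicate over one or more cursor columns. A standalone sketch of the same construction is shown below; `cursor_conditions` is a hypothetical free function that takes the column list and cursor values as arguments instead of reading instance variables.

```ruby
# Builds a [sql_fragment, *bind_values] array selecting rows strictly after
# the given cursor, for columns compared in lexicographic order.
def cursor_conditions(columns, cursor)
  return [] if cursor.empty? # no cursor yet: start from the beginning

  index = cursor.size - 1
  # A full cursor excludes the last seen row; a partial cursor includes ties.
  operator = columns.size == cursor.size ? ">" : ">="
  clause = "#{columns[index]} #{operator} ?"

  # Wrap the clause once for every preceding column, innermost first.
  while index > 0
    index -= 1
    column = columns[index]
    clause = "#{column} > ? OR (#{column} = ? AND (#{clause}))"
  end

  # Every value is bound twice (for > and =), except the last, bound once.
  params = cursor.flat_map { |value| [value, value] }
  params.pop
  [clause, *params]
end

single = cursor_conditions(["id"], [7])
# => ["id > ?", 7]
composite = cursor_conditions(["created_at", "id"], ["2023-01-01", 42])
# => ["created_at > ? OR (created_at = ? AND (id > ?))", "2023-01-01", "2023-01-01", 42]
```

The resulting array has the shape expected by `relation.where(*conditions)`: an SQL fragment followed by its bind values.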
data/lib/sidekiq_iteration/active_record_cursor.rb
DELETED
@@ -1,89 +0,0 @@
|
|
1
|
-
# frozen_string_literal: true
|
2
|
-
|
3
|
-
module SidekiqIteration
|
4
|
-
# @private
|
5
|
-
class ActiveRecordCursor
|
6
|
-
include Comparable
|
7
|
-
|
8
|
-
attr_reader :position, :reached_end
|
9
|
-
|
10
|
-
def initialize(relation, columns = nil, position = nil)
|
11
|
-
columns ||= "#{relation.table_name}.#{relation.primary_key}"
|
12
|
-
@columns = Array.wrap(columns)
|
13
|
-
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
14
|
-
|
15
|
-
self.position = Array.wrap(position)
|
16
|
-
if relation.joins_values.present? && !@columns.all?(/\./)
|
17
|
-
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
18
|
-
end
|
19
|
-
|
20
|
-
if relation.arel.orders.present? || relation.arel.taken.present?
|
21
|
-
raise ArgumentError,
|
22
|
-
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
23
|
-
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
24
|
-
end
|
25
|
-
|
26
|
-
@base_relation = relation.reorder(@columns.join(", "))
|
27
|
-
@reached_end = false
|
28
|
-
end
|
29
|
-
|
30
|
-
def <=>(other)
|
31
|
-
if reached_end == other.reached_end
|
32
|
-
position <=> other.position
|
33
|
-
else
|
34
|
-
reached_end ? 1 : -1
|
35
|
-
end
|
36
|
-
end
|
37
|
-
|
38
|
-
def position=(position)
|
39
|
-
raise ArgumentError, "Cursor position cannot contain nil values" if position.any?(&:nil?)
|
40
|
-
|
41
|
-
@position = position
|
42
|
-
end
|
43
|
-
|
44
|
-
def next_batch(batch_size)
|
45
|
-
return if @reached_end
|
46
|
-
|
47
|
-
relation = @base_relation.limit(batch_size)
|
48
|
-
|
49
|
-
if (conditions = self.conditions).any?
|
50
|
-
relation = relation.where(*conditions)
|
51
|
-
end
|
52
|
-
|
53
|
-
records = relation.uncached do
|
54
|
-
relation.to_a
|
55
|
-
end
|
56
|
-
|
57
|
-
update_from_record(records.last) if records.any?
|
58
|
-
@reached_end = records.size < batch_size
|
59
|
-
|
60
|
-
records if records.any?
|
61
|
-
end
|
62
|
-
|
63
|
-
private
|
64
|
-
def conditions
|
65
|
-
i = @position.size - 1
|
66
|
-
column = @columns[i]
|
67
|
-
conditions = if @columns.size == @position.size
|
68
|
-
"#{column} > ?"
|
69
|
-
else
|
70
|
-
"#{column} >= ?"
|
71
|
-
end
|
72
|
-
while i > 0
|
73
|
-
i -= 1
|
74
|
-
column = @columns[i]
|
75
|
-
conditions = "#{column} > ? OR (#{column} = ? AND (#{conditions}))"
|
76
|
-
end
|
77
|
-
ret = @position.reduce([conditions]) { |params, value| params << value << value }
|
78
|
-
ret.pop
|
79
|
-
ret
|
80
|
-
end
|
81
|
-
|
82
|
-
def update_from_record(record)
|
83
|
-
self.position = @columns.map do |column|
|
84
|
-
method = column.to_s.split(".").last
|
85
|
-
record.send(method)
|
86
|
-
end
|
87
|
-
end
|
88
|
-
end
|
89
|
-
end
|
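The ordering implemented by `<=>` in the removed `ActiveRecordCursor` can be tried without a database. The sketch below keeps only the comparison and the nil check; `CursorPosition` is a hypothetical class, and passing `reached_end` through the constructor is an assumption made for the example.

```ruby
# Hypothetical, database-free model of a cursor's ordering semantics.
class CursorPosition
  include Comparable

  attr_reader :position, :reached_end

  def initialize(position, reached_end: false)
    if position.any?(&:nil?)
      raise ArgumentError, "Cursor position cannot contain nil values"
    end

    @position = position
    @reached_end = reached_end
  end

  # Cursors in the same state compare by position (arrays compare
  # element by element); a finished cursor sorts after an unfinished one.
  def <=>(other)
    if reached_end == other.reached_end
      position <=> other.position
    else
      reached_end ? 1 : -1
    end
  end
end

earlier  = CursorPosition.new([1, 5])
later    = CursorPosition.new([2, 0])
finished = CursorPosition.new([9], reached_end: true)
```

Here `earlier < later` holds because `[1, 5] <=> [2, 0]` is `-1`, and `finished` compares greater than any unfinished cursor regardless of position.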