sidekiq-iteration 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/README.md +13 -8
- data/guides/best-practices.md +5 -0
- data/lib/sidekiq_iteration/active_record_enumerator.rb +144 -22
- data/lib/sidekiq_iteration/csv_enumerator.rb +2 -10
- data/lib/sidekiq_iteration/enumerators.rb +2 -4
- data/lib/sidekiq_iteration/job_retry_patch.rb +16 -3
- data/lib/sidekiq_iteration/version.rb +1 -1
- metadata +2 -4
- data/lib/sidekiq_iteration/active_record_batch_enumerator.rb +0 -127
- data/lib/sidekiq_iteration/active_record_cursor.rb +0 -89
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
|
4
|
+
data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
|
7
|
+
data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
|
|
1
1
|
## master (unreleased)
|
2
2
|
|
3
|
+
## 0.2.0 (2022-11-11)
|
4
|
+
|
5
|
+
- Fix storing run metadata when the job fails for sidekiq < 6.5.2
|
6
|
+
|
7
|
+
- Make enumerators resume from the last cursor position
|
8
|
+
|
9
|
+
This fixes `NestedEnumerator` to work correctly. Previously, each intermediate enumerator
|
10
|
+
was resumed from the next cursor position, possibly skipping remaining inner items.
|
11
|
+
|
3
12
|
## 0.1.0 (2022-11-02)
|
4
13
|
|
5
14
|
- First release
|
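The changelog entry above changes where a resumed enumerator starts. The following is a minimal sketch of the difference, using plain arrays. The helper names are hypothetical; only the `drop(cursor || 0)` call comes from this diff, and the "previous" variant is an assumption based on the removed `count_of_processed_rows` helper (`cursor + 1`).

```ruby
# Hypothetical helpers for illustration; only `drop(cursor || 0)` is taken
# from the diff.
def resume_from_cursor(array, cursor)
  array.each_with_index.drop(cursor || 0)
end

# Assumed shape of the previous behavior, modeled on the removed
# `count_of_processed_rows` helper (cursor + 1, or 0 without a cursor).
def resume_after_cursor(array, cursor)
  array.each_with_index.drop(cursor ? cursor + 1 : 0)
end

items = %w[a b c d]

from_cursor = resume_from_cursor(items, 2)   # starts at index 2 ("c")
after_cursor = resume_after_cursor(items, 2) # starts at index 3 ("d")
```

Resuming from the cursor position itself means the element at the cursor is yielded again, which is what lets a nested enumerator finish the remaining inner items of that element.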
data/README.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
|
3
3
|
[![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)
|
4
4
|
|
5
|
-
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
5
|
+
Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
|
6
6
|
|
7
7
|
## Background
|
8
8
|
|
@@ -136,10 +136,10 @@ class BatchesJob
|
|
136
136
|
end
|
137
137
|
```
|
138
138
|
|
139
|
-
### Iterating over
|
139
|
+
### Iterating over Active Record Relations
|
140
140
|
|
141
141
|
```ruby
|
142
|
-
class
|
142
|
+
class RelationsJob
|
143
143
|
include Sidekiq::Job
|
144
144
|
include SidekiqIteration::Iteration
|
145
145
|
|
@@ -151,14 +151,14 @@ class BatchesAsRelationJob
|
|
151
151
|
)
|
152
152
|
end
|
153
153
|
|
154
|
-
def each_iteration(
|
155
|
-
#
|
156
|
-
|
154
|
+
def each_iteration(comments_relation, product_id)
|
155
|
+
# comments_relation will be a Comment::ActiveRecord_Relation
|
156
|
+
comments_relation.update_all(deleted: true)
|
157
157
|
end
|
158
158
|
end
|
159
159
|
```
|
160
160
|
|
161
|
-
### Iterating over arrays
|
161
|
+
### Iterating over arbitrary arrays
|
162
162
|
|
163
163
|
```ruby
|
164
164
|
class ArrayJob
|
@@ -228,10 +228,15 @@ For more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq
|
|
228
228
|
|
229
229
|
## API
|
230
230
|
|
231
|
-
Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.
|
231
|
+
Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.
|
232
232
|
|
233
233
|
## FAQ
|
234
234
|
|
235
|
+
**Advantages of this pattern over splitting a large job into many small jobs?**
|
236
|
+
* Having one job is easier for redis in terms of memory, time and # of requests needed for enqueuing.
|
237
|
+
* It simplifies sidekiq monitoring, because you have a predictable number of jobs in the queues, instead of having thousands of them at one time and millions at another. Also easier to navigate its web UI.
|
238
|
+
* You can stop/pause/delete just one job, if something goes wrong. With many jobs it is harder and can take a long time, if it is critical to stop it right now.
|
239
|
+
|
235
240
|
**Why can't I just iterate in `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.
|
236
241
|
|
237
242
|
**What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.
|
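The API section above asks `build_enumerator` to return an `Enumerator` that respects the `cursor` value. As a rough sketch of what that can look like without any gem helpers, the hypothetical `numbers_enumerator` below yields `[object, cursor]` pairs (the same shape the enumerators in this diff yield) and starts from the given cursor. The method name and the choice of integers as cursors are assumptions made for illustration.

```ruby
# Hypothetical enumerator builder; not part of the gem.
# Yields [number, cursor] pairs and starts from `cursor` when one is given.
def numbers_enumerator(last, cursor:)
  first = cursor || 1

  # The lambda gives the enumerator a lazily computed size.
  Enumerator.new(-> { last }) do |yielder|
    first.upto(last) { |number| yielder.yield(number, number) }
  end
end

fresh = numbers_enumerator(5, cursor: nil).to_a   # pairs for 1 through 5
resumed = numbers_enumerator(5, cursor: 4).to_a   # pairs for 4 and 5 only
```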
data/guides/best-practices.md
CHANGED
@@ -1,5 +1,10 @@
|
|
1
1
|
# Best practices
|
2
2
|
|
3
|
+
## Considerations when writing jobs
|
4
|
+
|
5
|
+
* Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
|
6
|
+
* Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
|
7
|
+
|
3
8
|
## Batch iteration
|
4
9
|
|
5
10
|
Regardless of the active record enumerator used in the task, `sidekiq-iteration` gem loads records in batches of 100 (by default).
|
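To make the idempotency consideration above concrete, here is a small sketch contrasting an operation that is safe to repeat with one that is not. `Comment`, `soft_delete`, and `count_view` are hypothetical names used only for this illustration.

```ruby
# Hypothetical record type standing in for a database row.
Comment = Struct.new(:id, :deleted, :views)

# Idempotent: running it twice leaves the record in the same state.
def soft_delete(comment)
  comment.deleted = true
end

# Not idempotent: every repeated run changes the result.
def count_view(comment)
  comment.views += 1
end

comment = Comment.new(1, false, 0)

# Simulate the same element being processed twice after a retry.
2.times { soft_delete(comment) }
2.times { count_view(comment) }
```

After the simulated retry, `deleted` is still simply `true`, while `views` has been incremented twice.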
data/lib/sidekiq_iteration/active_record_enumerator.rb
CHANGED
@@ -1,28 +1,45 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require_relative "active_record_cursor"
|
4
|
-
|
5
3
|
module SidekiqIteration
|
6
|
-
# Builds Enumerator based on ActiveRecord Relation. Supports enumerating on rows and batches.
|
7
4
|
# @private
|
8
5
|
class ActiveRecordEnumerator
|
9
|
-
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%
|
6
|
+
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
|
10
7
|
|
11
8
|
def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
|
12
9
|
unless relation.is_a?(ActiveRecord::Relation)
|
13
10
|
raise ArgumentError, "relation must be an ActiveRecord::Relation"
|
14
11
|
end
|
15
12
|
|
16
|
-
@
|
13
|
+
@primary_key = "#{relation.table_name}.#{relation.primary_key}"
|
14
|
+
@columns = Array(columns&.map(&:to_s) || @primary_key)
|
15
|
+
@primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
|
16
|
+
@pluck_columns = if @primary_key_index
|
17
|
+
@columns
|
18
|
+
else
|
19
|
+
@columns + [@primary_key]
|
20
|
+
end
|
17
21
|
@batch_size = batch_size
|
18
|
-
@
|
19
|
-
|
22
|
+
@cursor = Array.wrap(cursor)
|
23
|
+
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
24
|
+
if relation.joins_values.present? && !@columns.all?(/\./)
|
25
|
+
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
26
|
+
end
|
27
|
+
|
28
|
+
if relation.arel.orders.present? || relation.arel.taken.present?
|
29
|
+
raise ArgumentError,
|
30
|
+
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
31
|
+
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
32
|
+
end
|
33
|
+
|
34
|
+
@base_relation = relation.reorder(@columns.join(", "))
|
35
|
+
@iteration_count = 0
|
20
36
|
end
|
21
37
|
|
22
38
|
def records
|
23
|
-
Enumerator.new(-> {
|
39
|
+
Enumerator.new(-> { records_size }) do |yielder|
|
24
40
|
batches.each do |batch, _|
|
25
41
|
batch.each do |record|
|
42
|
+
@iteration_count += 1
|
26
43
|
yielder.yield(record, cursor_value(record))
|
27
44
|
end
|
28
45
|
end
|
@@ -30,40 +47,145 @@ module SidekiqIteration
|
|
30
47
|
end
|
31
48
|
|
32
49
|
def batches
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
yielder.yield(
|
50
|
+
Enumerator.new(-> { records_size }) do |yielder|
|
51
|
+
while (batch = next_batch(load: true))
|
52
|
+
@iteration_count += 1
|
53
|
+
yielder.yield(batch, cursor_value(batch.last))
|
37
54
|
end
|
38
55
|
end
|
39
56
|
end
|
40
57
|
|
41
|
-
def
|
42
|
-
|
58
|
+
def relations
|
59
|
+
Enumerator.new(-> { relations_size }) do |yielder|
|
60
|
+
while (batch = next_batch(load: false))
|
61
|
+
@iteration_count += 1
|
62
|
+
yielder.yield(batch, unwrap_array(@cursor))
|
63
|
+
end
|
64
|
+
end
|
43
65
|
end
|
44
66
|
|
45
67
|
private
|
68
|
+
def records_size
|
69
|
+
@base_relation.count(:all)
|
70
|
+
end
|
71
|
+
|
72
|
+
def relations_size
|
73
|
+
(records_size + @batch_size - 1) / @batch_size # ceiling division
|
74
|
+
end
|
75
|
+
|
76
|
+
def next_batch(load:)
|
77
|
+
batch_relation = @base_relation.limit(@batch_size)
|
78
|
+
if conditions.any?
|
79
|
+
batch_relation = batch_relation.where(*conditions)
|
80
|
+
end
|
81
|
+
|
82
|
+
records = nil
|
83
|
+
cursor_values, ids = batch_relation.uncached do
|
84
|
+
if load
|
85
|
+
records = batch_relation.records
|
86
|
+
pluck_columns(records)
|
87
|
+
else
|
88
|
+
pluck_columns(batch_relation)
|
89
|
+
end
|
90
|
+
end
|
91
|
+
|
92
|
+
cursor = cursor_values.last
|
93
|
+
return unless cursor.present?
|
94
|
+
|
95
|
+
# The primary key was plucked, but original cursor did not include it, so we should remove it
|
96
|
+
cursor.pop unless @primary_key_index
|
97
|
+
@cursor = Array.wrap(cursor)
|
98
|
+
|
99
|
+
# Yields relations by selecting the primary keys of records in the batch.
|
100
|
+
# Post.where(published: nil) results in an enumerator of relations like:
|
101
|
+
# Post.where(published: nil, ids: batch_of_ids)
|
102
|
+
relation = @base_relation.where(@primary_key => ids)
|
103
|
+
relation.send(:load_records, records) if load
|
104
|
+
relation
|
105
|
+
end
|
106
|
+
|
107
|
+
def pluck_columns(batch)
|
108
|
+
columns =
|
109
|
+
if batch.is_a?(Array)
|
110
|
+
@pluck_columns.map { |column| column.to_s.split(".").last }
|
111
|
+
else
|
112
|
+
@pluck_columns
|
113
|
+
end
|
114
|
+
|
115
|
+
if columns.size == 1 # only the primary key
|
116
|
+
column_values = batch.pluck(columns.first)
|
117
|
+
return [column_values, column_values]
|
118
|
+
end
|
119
|
+
|
120
|
+
column_values = batch.pluck(*columns)
|
121
|
+
primary_key_index = @primary_key_index || -1
|
122
|
+
primary_key_values = column_values.map { |values| values[primary_key_index] }
|
123
|
+
|
124
|
+
serialize_column_values!(column_values)
|
125
|
+
[column_values, primary_key_values]
|
126
|
+
end
|
127
|
+
|
46
128
|
def cursor_value(record)
|
47
129
|
positions = @columns.map do |column|
|
48
130
|
attribute_name = column.to_s.split(".").last
|
49
|
-
column_value(record
|
131
|
+
column_value(record[attribute_name])
|
132
|
+
end
|
133
|
+
|
134
|
+
unwrap_array(positions)
|
135
|
+
end
|
136
|
+
|
137
|
+
def conditions
|
138
|
+
return [] if @cursor.empty?
|
139
|
+
|
140
|
+
binds = []
|
141
|
+
sql = build_starts_after_conditions(0, binds)
|
142
|
+
|
143
|
+
# Start from the record pointed by cursor.
|
144
|
+
# We use the property that `>=` is equivalent to `> or =`.
|
145
|
+
if @iteration_count == 0
|
146
|
+
binds.unshift(*@cursor)
|
147
|
+
columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
|
148
|
+
sql = "(#{columns_equality}) OR (#{sql})"
|
50
149
|
end
|
51
150
|
|
52
|
-
|
53
|
-
|
151
|
+
[sql, *binds]
|
152
|
+
end
|
153
|
+
|
154
|
+
# (x, y) > (a, b) iff (x > a or (x = a and y > b))
|
155
|
+
def build_starts_after_conditions(index, binds)
|
156
|
+
column = @columns[index]
|
157
|
+
|
158
|
+
if index < @cursor.size - 1
|
159
|
+
binds << @cursor[index] << @cursor[index]
|
160
|
+
"#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
|
54
161
|
else
|
55
|
-
|
162
|
+
binds << @cursor[index]
|
163
|
+
if @columns.size == @cursor.size
|
164
|
+
"#{column} > ?"
|
165
|
+
else
|
166
|
+
"#{column} >= ?"
|
167
|
+
end
|
56
168
|
end
|
57
169
|
end
|
58
170
|
|
59
|
-
def
|
60
|
-
|
61
|
-
|
62
|
-
|
171
|
+
def serialize_column_values!(column_values)
|
172
|
+
column_values.map! { |values| values.map! { |value| column_value(value) } }
|
173
|
+
end
|
174
|
+
|
175
|
+
def column_value(value)
|
176
|
+
if value.is_a?(Time)
|
63
177
|
value.strftime(SQL_DATETIME_WITH_NSEC)
|
64
178
|
else
|
65
179
|
value
|
66
180
|
end
|
67
181
|
end
|
182
|
+
|
183
|
+
def unwrap_array(array)
|
184
|
+
if array.size == 1
|
185
|
+
array.first
|
186
|
+
else
|
187
|
+
array
|
188
|
+
end
|
189
|
+
end
|
68
190
|
end
|
69
191
|
end
|
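The `conditions` and `build_starts_after_conditions` methods above build a SQL fragment with bind values for resuming after a multi-column cursor. The sketch below is a standalone rewrite of that logic for illustration: `columns` and `cursor` are passed as arguments instead of instance variables, and the method names `starts_after_conditions` and `resume_conditions` are hypothetical. It assumes the cursor has one value per column.

```ruby
# (x, y) > (a, b) iff (x > a or (x = a and y > b))
# Returns [sql, binds]; binds are appended in placeholder order.
def starts_after_conditions(columns, cursor, index = 0, binds = [])
  column = columns[index]

  sql =
    if index < cursor.size - 1
      binds << cursor[index] << cursor[index]
      inner, = starts_after_conditions(columns, cursor, index + 1, binds)
      "#{column} > ? OR (#{column} = ? AND (#{inner}))"
    else
      binds << cursor[index]
      columns.size == cursor.size ? "#{column} > ?" : "#{column} >= ?"
    end

  [sql, binds]
end

# On the first iteration the row at the cursor is included as well,
# using the fact that `>=` is equivalent to `= OR >`.
def resume_conditions(columns, cursor, first_iteration:)
  return [] if cursor.empty?

  sql, binds = starts_after_conditions(columns, cursor)
  if first_iteration
    equality = columns.map { |column| "#{column} = ?" }.join(" AND ")
    sql = "(#{equality}) OR (#{sql})"
    binds = cursor + binds
  end

  [sql, *binds]
end

first = resume_conditions(["id"], [10], first_iteration: true)
later = resume_conditions(%w[posts.created_at posts.id], ["2022-11-11", 5], first_iteration: false)
```

The returned array has the shape that `where(*conditions)` expects: the SQL string first, followed by one bind value per `?` placeholder.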
data/lib/sidekiq_iteration/csv_enumerator.rb
CHANGED
@@ -49,7 +49,7 @@ module SidekiqIteration
|
|
49
49
|
def rows(cursor:)
|
50
50
|
@csv.lazy
|
51
51
|
.each_with_index
|
52
|
-
.drop(
|
52
|
+
.drop(cursor || 0)
|
53
53
|
.to_enum { count_of_rows_in_file }
|
54
54
|
end
|
55
55
|
|
@@ -60,7 +60,7 @@ module SidekiqIteration
|
|
60
60
|
@csv.lazy
|
61
61
|
.each_slice(batch_size)
|
62
62
|
.with_index
|
63
|
-
.drop(
|
63
|
+
.drop(cursor || 0)
|
64
64
|
.to_enum { (count_of_rows_in_file.to_f / batch_size).ceil }
|
65
65
|
end
|
66
66
|
|
@@ -73,13 +73,5 @@ module SidekiqIteration
|
|
73
73
|
count -= 1 if @csv.headers
|
74
74
|
count
|
75
75
|
end
|
76
|
-
|
77
|
-
def count_of_processed_rows(cursor)
|
78
|
-
if cursor
|
79
|
-
cursor + 1
|
80
|
-
else
|
81
|
-
0
|
82
|
-
end
|
83
|
-
end
|
84
76
|
end
|
85
77
|
end
|
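The batch variant above chains `lazy`, `each_slice`, `with_index`, and `drop(cursor || 0)`. A minimal sketch of the same chain on an in-memory array follows; `ROWS` and the top-level `batches` method are hypothetical stand-ins for the parsed CSV object and the enumerator method.

```ruby
# Hypothetical stand-in for parsed CSV rows; the enumerator in the diff
# wraps a CSV object instead.
ROWS = [%w[1 a], %w[2 b], %w[3 c], %w[4 d], %w[5 e]].freeze

def batches(rows, batch_size:, cursor:)
  rows.lazy
      .each_slice(batch_size)   # group rows into batches
      .with_index               # pair each batch with its index (the cursor)
      .drop(cursor || 0)        # resume from the batch at the cursor position
      .to_enum { (rows.size.to_f / batch_size).ceil }
end

all_batches = batches(ROWS, batch_size: 2, cursor: nil).to_a
resumed = batches(ROWS, batch_size: 2, cursor: 1).to_a
```

With a cursor of `1`, the first batch is skipped and enumeration restarts at the batch whose index is `1`, so that batch is yielded again on resume.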
data/lib/sidekiq_iteration/enumerators.rb
CHANGED
@@ -1,7 +1,6 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
require_relative "active_record_enumerator"
|
4
|
-
require_relative "active_record_batch_enumerator"
|
5
4
|
require_relative "csv_enumerator"
|
6
5
|
require_relative "nested_enumerator"
|
7
6
|
|
@@ -22,8 +21,7 @@ module SidekiqIteration
|
|
22
21
|
raise ArgumentError, "array cannot contain ActiveRecord objects"
|
23
22
|
end
|
24
23
|
|
25
|
-
drop
|
26
|
-
array.each_with_index.drop(drop).to_enum { array.size }
|
24
|
+
array.each_with_index.drop(cursor || 0).to_enum { array.size }
|
27
25
|
end
|
28
26
|
|
29
27
|
# Builds Enumerator from Active Record Relation. Each Enumerator tick moves the cursor one row forward.
|
@@ -115,7 +113,7 @@ module SidekiqIteration
|
|
115
113
|
# end
|
116
114
|
#
|
117
115
|
def active_record_relations_enumerator(scope, cursor:, **options)
|
118
|
-
|
116
|
+
ActiveRecordEnumerator.new(scope, cursor: cursor, **options).relations
|
119
117
|
end
|
120
118
|
|
121
119
|
# Builds Enumerator from a CSV file.
|
data/lib/sidekiq_iteration/job_retry_patch.rb
CHANGED
@@ -7,6 +7,17 @@ module SidekiqIteration
|
|
7
7
|
module JobRetryPatch
|
8
8
|
private
|
9
9
|
def process_retry(jobinst, msg, queue, exception)
|
10
|
+
add_sidekiq_iteration_metadata(jobinst, msg)
|
11
|
+
super
|
12
|
+
end
|
13
|
+
|
14
|
+
# The method was renamed in https://github.com/mperham/sidekiq/commit/0676a5202e89aa9da4ad7991f4111b97a9d8a0a4.
|
15
|
+
def attempt_retry(jobinst, msg, queue, exception)
|
16
|
+
add_sidekiq_iteration_metadata(jobinst, msg)
|
17
|
+
super
|
18
|
+
end
|
19
|
+
|
20
|
+
def add_sidekiq_iteration_metadata(jobinst, msg)
|
10
21
|
if jobinst.is_a?(Iteration)
|
11
22
|
unless msg["args"].last.is_a?(Hash)
|
12
23
|
msg["args"].push({})
|
@@ -19,12 +30,14 @@ module SidekiqIteration
|
|
19
30
|
"total_time" => jobinst.total_time,
|
20
31
|
}
|
21
32
|
end
|
22
|
-
|
23
|
-
super
|
24
33
|
end
|
25
34
|
end
|
26
35
|
end
|
27
36
|
|
28
|
-
if Sidekiq::JobRetry.
|
37
|
+
if Sidekiq::JobRetry.private_method_defined?(:process_retry) ||
|
38
|
+
Sidekiq::JobRetry.private_method_defined?(:attempt_retry)
|
29
39
|
Sidekiq::JobRetry.prepend(SidekiqIteration::JobRetryPatch)
|
40
|
+
else
|
41
|
+
raise "Sidekiq #{Sidekiq::VERSION} removed the #process_retry method. " \
|
42
|
+
"Please open an issue at the `sidekiq-iteration` gem."
|
30
43
|
end
|
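The patch above prepends a module that overrides a private hook under both its old and new name, then calls `super`. The following sketch shows that pattern in isolation. `ToyRetry` and `ToyRetryPatch` are hypothetical classes invented for this example and do not reflect Sidekiq internals beyond what the diff shows.

```ruby
# Hypothetical stand-in for a library class whose private hook was renamed
# from `process_retry` to `attempt_retry` in a newer version.
class ToyRetry
  def handle(job)
    attempt_retry(job)
  end

  private

  def attempt_retry(job)
    "retrying #{job[:name]} from cursor #{job.dig(:meta, 'cursor').inspect}"
  end
end

module ToyRetryPatch
  private

  # Old hook name; only reached if the library still calls it.
  def process_retry(job)
    add_metadata(job)
    super
  end

  # New hook name.
  def attempt_retry(job)
    add_metadata(job)
    super
  end

  def add_metadata(job)
    job[:meta] = { "cursor" => 3 }
  end
end

# Patch only when one of the known hook names exists, and fail loudly otherwise.
if ToyRetry.private_method_defined?(:process_retry) ||
   ToyRetry.private_method_defined?(:attempt_retry)
  ToyRetry.prepend(ToyRetryPatch)
else
  raise "ToyRetry no longer defines a known retry hook"
end

message = ToyRetry.new.handle({ name: "example" })
```

Defining both overrides keeps one patch module working across library versions, while the explicit `raise` surfaces a future rename instead of silently skipping the metadata.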
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sidekiq-iteration
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- fatkodima
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2022-11-
|
12
|
+
date: 2022-11-11 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: sidekiq
|
@@ -41,8 +41,6 @@ files:
|
|
41
41
|
- guides/throttling.md
|
42
42
|
- lib/sidekiq-iteration.rb
|
43
43
|
- lib/sidekiq_iteration.rb
|
44
|
-
- lib/sidekiq_iteration/active_record_batch_enumerator.rb
|
45
|
-
- lib/sidekiq_iteration/active_record_cursor.rb
|
46
44
|
- lib/sidekiq_iteration/active_record_enumerator.rb
|
47
45
|
- lib/sidekiq_iteration/csv_enumerator.rb
|
48
46
|
- lib/sidekiq_iteration/enumerators.rb
|
data/lib/sidekiq_iteration/active_record_batch_enumerator.rb
DELETED
@@ -1,127 +0,0 @@
|
|
1
|
-
# frozen_string_literal: true
|
2
|
-
|
3
|
-
module SidekiqIteration
|
4
|
-
# Batch Enumerator based on ActiveRecord Relation.
|
5
|
-
# @private
|
6
|
-
class ActiveRecordBatchEnumerator
|
7
|
-
include Enumerable
|
8
|
-
|
9
|
-
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
|
10
|
-
|
11
|
-
def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
|
12
|
-
@primary_key = "#{relation.table_name}.#{relation.primary_key}"
|
13
|
-
@columns = Array(columns&.map(&:to_s) || @primary_key)
|
14
|
-
@primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
|
15
|
-
@pluck_columns = if @primary_key_index
|
16
|
-
@columns
|
17
|
-
else
|
18
|
-
@columns + [@primary_key]
|
19
|
-
end
|
20
|
-
@batch_size = batch_size
|
21
|
-
@cursor = Array.wrap(cursor)
|
22
|
-
@initial_cursor = @cursor
|
23
|
-
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
24
|
-
if relation.joins_values.present? && !@columns.all?(/\./)
|
25
|
-
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
26
|
-
end
|
27
|
-
|
28
|
-
if relation.arel.orders.present? || relation.arel.taken.present?
|
29
|
-
raise ArgumentError,
|
30
|
-
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
31
|
-
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
32
|
-
end
|
33
|
-
|
34
|
-
@base_relation = relation.reorder(@columns.join(", "))
|
35
|
-
end
|
36
|
-
|
37
|
-
def each
|
38
|
-
return to_enum { size } unless block_given?
|
39
|
-
|
40
|
-
while (relation = next_batch)
|
41
|
-
yield relation, cursor_value
|
42
|
-
end
|
43
|
-
end
|
44
|
-
|
45
|
-
def size
|
46
|
-
(@base_relation.count(:all) + @batch_size - 1) / @batch_size # ceiling division
|
47
|
-
end
|
48
|
-
|
49
|
-
private
|
50
|
-
def next_batch
|
51
|
-
relation = @base_relation.limit(@batch_size)
|
52
|
-
if conditions.any?
|
53
|
-
relation = relation.where(*conditions)
|
54
|
-
end
|
55
|
-
|
56
|
-
cursor_values, ids = relation.uncached do
|
57
|
-
pluck_columns(relation)
|
58
|
-
end
|
59
|
-
|
60
|
-
cursor = cursor_values.last
|
61
|
-
unless cursor.present?
|
62
|
-
@cursor = @initial_cursor
|
63
|
-
return
|
64
|
-
end
|
65
|
-
# The primary key was plucked, but original cursor did not include it, so we should remove it
|
66
|
-
cursor.pop unless @primary_key_index
|
67
|
-
@cursor = Array.wrap(cursor)
|
68
|
-
|
69
|
-
# Yields relations by selecting the primary keys of records in the batch.
|
70
|
-
# Post.where(published: nil) results in an enumerator of relations like:
|
71
|
-
# Post.where(published: nil, ids: batch_of_ids)
|
72
|
-
@base_relation.where(@primary_key => ids)
|
73
|
-
end
|
74
|
-
|
75
|
-
def pluck_columns(relation)
|
76
|
-
if @pluck_columns.size == 1 # only the primary key
|
77
|
-
column_values = relation.pluck(*@pluck_columns)
|
78
|
-
return [column_values, column_values]
|
79
|
-
end
|
80
|
-
|
81
|
-
column_values = relation.pluck(*@pluck_columns)
|
82
|
-
primary_key_index = @primary_key_index || -1
|
83
|
-
primary_key_values = column_values.map { |values| values[primary_key_index] }
|
84
|
-
|
85
|
-
serialize_column_values!(column_values)
|
86
|
-
[column_values, primary_key_values]
|
87
|
-
end
|
88
|
-
|
89
|
-
def cursor_value
|
90
|
-
if @cursor.size == 1
|
91
|
-
@cursor.first
|
92
|
-
else
|
93
|
-
@cursor
|
94
|
-
end
|
95
|
-
end
|
96
|
-
|
97
|
-
def conditions
|
98
|
-
column_index = @cursor.size - 1
|
99
|
-
column = @columns[column_index]
|
100
|
-
where_clause = if @columns.size == @cursor.size
|
101
|
-
"#{column} > ?"
|
102
|
-
else
|
103
|
-
"#{column} >= ?"
|
104
|
-
end
|
105
|
-
while column_index > 0
|
106
|
-
column_index -= 1
|
107
|
-
column = @columns[column_index]
|
108
|
-
where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
|
109
|
-
end
|
110
|
-
ret = @cursor.reduce([where_clause]) { |params, value| params << value << value }
|
111
|
-
ret.pop
|
112
|
-
ret
|
113
|
-
end
|
114
|
-
|
115
|
-
def serialize_column_values!(column_values)
|
116
|
-
column_values.map! { |values| values.map! { |value| column_value(value) } }
|
117
|
-
end
|
118
|
-
|
119
|
-
def column_value(value)
|
120
|
-
if value.is_a?(Time)
|
121
|
-
value.strftime(SQL_DATETIME_WITH_NSEC)
|
122
|
-
else
|
123
|
-
value
|
124
|
-
end
|
125
|
-
end
|
126
|
-
end
|
127
|
-
end
|
data/lib/sidekiq_iteration/active_record_cursor.rb
DELETED
@@ -1,89 +0,0 @@
|
|
1
|
-
# frozen_string_literal: true
|
2
|
-
|
3
|
-
module SidekiqIteration
|
4
|
-
# @private
|
5
|
-
class ActiveRecordCursor
|
6
|
-
include Comparable
|
7
|
-
|
8
|
-
attr_reader :position, :reached_end
|
9
|
-
|
10
|
-
def initialize(relation, columns = nil, position = nil)
|
11
|
-
columns ||= "#{relation.table_name}.#{relation.primary_key}"
|
12
|
-
@columns = Array.wrap(columns)
|
13
|
-
raise ArgumentError, "Must specify at least one column" if @columns.empty?
|
14
|
-
|
15
|
-
self.position = Array.wrap(position)
|
16
|
-
if relation.joins_values.present? && !@columns.all?(/\./)
|
17
|
-
raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
|
18
|
-
end
|
19
|
-
|
20
|
-
if relation.arel.orders.present? || relation.arel.taken.present?
|
21
|
-
raise ArgumentError,
|
22
|
-
"The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
|
23
|
-
"You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
|
24
|
-
end
|
25
|
-
|
26
|
-
@base_relation = relation.reorder(@columns.join(", "))
|
27
|
-
@reached_end = false
|
28
|
-
end
|
29
|
-
|
30
|
-
def <=>(other)
|
31
|
-
if reached_end == other.reached_end
|
32
|
-
position <=> other.position
|
33
|
-
else
|
34
|
-
reached_end ? 1 : -1
|
35
|
-
end
|
36
|
-
end
|
37
|
-
|
38
|
-
def position=(position)
|
39
|
-
raise ArgumentError, "Cursor position cannot contain nil values" if position.any?(&:nil?)
|
40
|
-
|
41
|
-
@position = position
|
42
|
-
end
|
43
|
-
|
44
|
-
def next_batch(batch_size)
|
45
|
-
return if @reached_end
|
46
|
-
|
47
|
-
relation = @base_relation.limit(batch_size)
|
48
|
-
|
49
|
-
if (conditions = self.conditions).any?
|
50
|
-
relation = relation.where(*conditions)
|
51
|
-
end
|
52
|
-
|
53
|
-
records = relation.uncached do
|
54
|
-
relation.to_a
|
55
|
-
end
|
56
|
-
|
57
|
-
update_from_record(records.last) if records.any?
|
58
|
-
@reached_end = records.size < batch_size
|
59
|
-
|
60
|
-
records if records.any?
|
61
|
-
end
|
62
|
-
|
63
|
-
private
|
64
|
-
def conditions
|
65
|
-
i = @position.size - 1
|
66
|
-
column = @columns[i]
|
67
|
-
conditions = if @columns.size == @position.size
|
68
|
-
"#{column} > ?"
|
69
|
-
else
|
70
|
-
"#{column} >= ?"
|
71
|
-
end
|
72
|
-
while i > 0
|
73
|
-
i -= 1
|
74
|
-
column = @columns[i]
|
75
|
-
conditions = "#{column} > ? OR (#{column} = ? AND (#{conditions}))"
|
76
|
-
end
|
77
|
-
ret = @position.reduce([conditions]) { |params, value| params << value << value }
|
78
|
-
ret.pop
|
79
|
-
ret
|
80
|
-
end
|
81
|
-
|
82
|
-
def update_from_record(record)
|
83
|
-
self.position = @columns.map do |column|
|
84
|
-
method = column.to_s.split(".").last
|
85
|
-
record.send(method)
|
86
|
-
end
|
87
|
-
end
|
88
|
-
end
|
89
|
-
end
|