sidekiq-iteration 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 82316cffa840b2c9619792b6f0c5bb7ec696f964ad81edf6b0ed5861339ca064
4
- data.tar.gz: 8337c0e87e6be8858d5b9d868c8a04b05f91de315be2e9aca9c40e8447c78644
3
+ metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
4
+ data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
5
5
  SHA512:
6
- metadata.gz: 63712780bca873613cbe3ef89ff0037c3eaa5633a28382eb02427311360714a89f092e5efef29873efa75067d22f32745eea3f04d7606442e29da33e2e2e6a08
7
- data.tar.gz: 4ded7fc6ab772c019154e6559027c87d389a356a16915d2c869e318fb26f5339dd98355a0f1eb422358c1cd9f7fe31bc2a70d5627a577f8f45394db15d0593b7
6
+ metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
7
+ data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
1
1
  ## master (unreleased)
2
2
 
3
+ ## 0.2.0 (2022-11-11)
4
+
5
+ - Fix storing run metadata when the job fails for sidekiq < 6.5.2
6
+
7
+ - Make enumerators resume from the last cursor position
8
+
9
+ This fixes `NestedEnumerator` to work correctly. Previously, each intermediate enumerator
10
+ was resumed from the next cursor position, possibly skipping remaining inner items.
11
+
3
12
  ## 0.1.0 (2022-11-02)
4
13
 
5
14
  - First release
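
To illustrate the 0.2.0 changelog entry about resuming from the last cursor position, here is a minimal Ruby sketch on a plain array. It is not the gem's code: the helper names `resume_after_cursor` and `resume_at_cursor` are hypothetical, and they only mirror the `drop(cursor + 1)` and `drop(cursor || 0)` expressions that appear in the enumerator changes further down in this diff.

```ruby
# Hypothetical helpers, for illustration only (not part of sidekiq-iteration).

# 0.1.0-style resume: skip the element the cursor points at.
def resume_after_cursor(array, cursor)
  array.each_with_index.drop(cursor ? cursor + 1 : 0)
end

# 0.2.0-style resume: start again from the element the cursor points at.
def resume_at_cursor(array, cursor)
  array.each_with_index.drop(cursor || 0)
end

items = %w[a b c d]

p resume_after_cursor(items, 1) # pairs for "c" and "d" only
p resume_at_cursor(items, 1)    # pairs for "b", "c" and "d"
```

With nested enumerators the difference matters: if the outer cursor points at an element whose inner items are only partly processed, resuming after it would skip the remaining inner items, which is the bug the changelog describes.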
data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  [![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)
4
4
 
5
- Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
5
+ Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
6
6
 
7
7
  ## Background
8
8
 
@@ -136,10 +136,10 @@ class BatchesJob
136
136
  end
137
137
  ```
138
138
 
139
- ### Iterating over batches of Active Record Relations
139
+ ### Iterating over Active Record Relations
140
140
 
141
141
  ```ruby
142
- class BatchesAsRelationJob
142
+ class RelationsJob
143
143
  include Sidekiq::Job
144
144
  include SidekiqIteration::Iteration
145
145
 
@@ -151,14 +151,14 @@ class BatchesAsRelationJob
151
151
  )
152
152
  end
153
153
 
154
- def each_iteration(batch_of_comments, product_id)
155
- # batch_of_comments will be a Comment::ActiveRecord_Relation
156
- batch_of_comments.update_all(deleted: true)
154
+ def each_iteration(comments_relation, product_id)
155
+ # comments_relation will be a Comment::ActiveRecord_Relation
156
+ comments_relation.update_all(deleted: true)
157
157
  end
158
158
  end
159
159
  ```
160
160
 
161
- ### Iterating over arrays
161
+ ### Iterating over arbitrary arrays
162
162
 
163
163
  ```ruby
164
164
  class ArrayJob
@@ -228,10 +228,15 @@ For more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq
228
228
 
229
229
  ## API
230
230
 
231
- Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.htmll) object that respects the `cursor` value.
231
+ Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.
232
232
 
233
233
  ## FAQ
234
234
 
235
+ **Advantages of this pattern over splitting a large job into many small jobs?**
236
+ * Having one job is easier for redis in terms of memory, time and # of requests needed for enqueuing.
237
+ * It simplifies sidekiq monitoring, because you have a predictable number of jobs in the queues, instead of having thousands of them at one time and millions at another. Also easier to navigate its web UI.
238
+ * You can stop/pause/delete just one job, if something goes wrong. With many jobs it is harder and can take a long time, if it is critical to stop it right now.
239
+
235
240
  **Why can't I just iterate in `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.
236
241
 
237
242
  **What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.
@@ -1,5 +1,10 @@
1
1
  # Best practices
2
2
 
3
+ ## Considerations when writing jobs
4
+
5
+ * Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
6
+ * Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
7
+
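
The idempotency consideration above can be shown with a small sketch that does not depend on Sidekiq. Everything here is hypothetical: `COMMENTS` is an in-memory stand-in for a database table, and `soft_delete` and `increment_counter` are illustrative helpers, not part of the gem.

```ruby
# Hypothetical in-memory stand-in for a database table.
COMMENTS = { 1 => { deleted: false }, 2 => { deleted: false } }

# Idempotent: setting a flag to a fixed value gives the same result
# no matter how many times the same element is processed.
def soft_delete(comment_id)
  COMMENTS.fetch(comment_id)[:deleted] = true
end

# Not idempotent: each repeated run changes the outcome again.
def increment_counter(counters, key)
  counters[key] += 1
end

# Simulate an element being processed twice after a retry.
2.times { soft_delete(1) }
counters = Hash.new(0)
2.times { increment_counter(counters, :processed) }

p COMMENTS[1]  # the flag is simply true
p counters     # the counter was bumped twice
```

An `#each_iteration` body shaped like `soft_delete` is safe to re-run for the same element; one shaped like `increment_counter` would need extra bookkeeping to be safe.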
3
8
  ## Batch iteration
4
9
 
5
10
  Regardless of the active record enumerator used in the task, `sidekiq-iteration` gem loads records in batches of 100 (by default).
@@ -1,28 +1,45 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require_relative "active_record_cursor"
4
-
5
3
  module SidekiqIteration
6
- # Builds Enumerator based on ActiveRecord Relation. Supports enumerating on rows and batches.
7
4
  # @private
8
5
  class ActiveRecordEnumerator
9
- SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
6
+ SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
10
7
 
11
8
  def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
12
9
  unless relation.is_a?(ActiveRecord::Relation)
13
10
  raise ArgumentError, "relation must be an ActiveRecord::Relation"
14
11
  end
15
12
 
16
- @relation = relation
13
+ @primary_key = "#{relation.table_name}.#{relation.primary_key}"
14
+ @columns = Array(columns&.map(&:to_s) || @primary_key)
15
+ @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
16
+ @pluck_columns = if @primary_key_index
17
+ @columns
18
+ else
19
+ @columns + [@primary_key]
20
+ end
17
21
  @batch_size = batch_size
18
- @columns = Array(columns || "#{relation.table_name}.#{relation.primary_key}")
19
- @cursor = cursor
22
+ @cursor = Array.wrap(cursor)
23
+ raise ArgumentError, "Must specify at least one column" if @columns.empty?
24
+ if relation.joins_values.present? && !@columns.all?(/\./)
25
+ raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
26
+ end
27
+
28
+ if relation.arel.orders.present? || relation.arel.taken.present?
29
+ raise ArgumentError,
30
+ "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
31
+ "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
32
+ end
33
+
34
+ @base_relation = relation.reorder(@columns.join(", "))
35
+ @iteration_count = 0
20
36
  end
21
37
 
22
38
  def records
23
- Enumerator.new(-> { size }) do |yielder|
39
+ Enumerator.new(-> { records_size }) do |yielder|
24
40
  batches.each do |batch, _|
25
41
  batch.each do |record|
42
+ @iteration_count += 1
26
43
  yielder.yield(record, cursor_value(record))
27
44
  end
28
45
  end
@@ -30,40 +47,145 @@ module SidekiqIteration
30
47
  end
31
48
 
32
49
  def batches
33
- cursor = ActiveRecordCursor.new(@relation, @columns, @cursor)
34
- Enumerator.new(-> { size }) do |yielder|
35
- while (records = cursor.next_batch(@batch_size))
36
- yielder.yield(records, cursor_value(records.last)) if records.any?
50
+ Enumerator.new(-> { records_size }) do |yielder|
51
+ while (batch = next_batch(load: true))
52
+ @iteration_count += 1
53
+ yielder.yield(batch, cursor_value(batch.last))
37
54
  end
38
55
  end
39
56
  end
40
57
 
41
- def size
42
- @relation.count(:all)
58
+ def relations
59
+ Enumerator.new(-> { relations_size }) do |yielder|
60
+ while (batch = next_batch(load: false))
61
+ @iteration_count += 1
62
+ yielder.yield(batch, unwrap_array(@cursor))
63
+ end
64
+ end
43
65
  end
44
66
 
45
67
  private
68
+ def records_size
69
+ @base_relation.count(:all)
70
+ end
71
+
72
+ def relations_size
73
+ (records_size + @batch_size - 1) / @batch_size # ceiling division
74
+ end
75
+
76
+ def next_batch(load:)
77
+ batch_relation = @base_relation.limit(@batch_size)
78
+ if conditions.any?
79
+ batch_relation = batch_relation.where(*conditions)
80
+ end
81
+
82
+ records = nil
83
+ cursor_values, ids = batch_relation.uncached do
84
+ if load
85
+ records = batch_relation.records
86
+ pluck_columns(records)
87
+ else
88
+ pluck_columns(batch_relation)
89
+ end
90
+ end
91
+
92
+ cursor = cursor_values.last
93
+ return unless cursor.present?
94
+
95
+ # The primary key was plucked, but original cursor did not include it, so we should remove it
96
+ cursor.pop unless @primary_key_index
97
+ @cursor = Array.wrap(cursor)
98
+
99
+ # Yields relations by selecting the primary keys of records in the batch.
100
+ # Post.where(published: nil) results in an enumerator of relations like:
101
+ # Post.where(published: nil, ids: batch_of_ids)
102
+ relation = @base_relation.where(@primary_key => ids)
103
+ relation.send(:load_records, records) if load
104
+ relation
105
+ end
106
+
107
+ def pluck_columns(batch)
108
+ columns =
109
+ if batch.is_a?(Array)
110
+ @pluck_columns.map { |column| column.to_s.split(".").last }
111
+ else
112
+ @pluck_columns
113
+ end
114
+
115
+ if columns.size == 1 # only the primary key
116
+ column_values = batch.pluck(columns.first)
117
+ return [column_values, column_values]
118
+ end
119
+
120
+ column_values = batch.pluck(*columns)
121
+ primary_key_index = @primary_key_index || -1
122
+ primary_key_values = column_values.map { |values| values[primary_key_index] }
123
+
124
+ serialize_column_values!(column_values)
125
+ [column_values, primary_key_values]
126
+ end
127
+
46
128
  def cursor_value(record)
47
129
  positions = @columns.map do |column|
48
130
  attribute_name = column.to_s.split(".").last
49
- column_value(record, attribute_name)
131
+ column_value(record[attribute_name])
132
+ end
133
+
134
+ unwrap_array(positions)
135
+ end
136
+
137
+ def conditions
138
+ return [] if @cursor.empty?
139
+
140
+ binds = []
141
+ sql = build_starts_after_conditions(0, binds)
142
+
143
+ # Start from the record pointed by cursor.
144
+ # We use the property that `>=` is equivalent to `> or =`.
145
+ if @iteration_count == 0
146
+ binds.unshift(*@cursor)
147
+ columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
148
+ sql = "(#{columns_equality}) OR (#{sql})"
50
149
  end
51
150
 
52
- if positions.size == 1
53
- positions.first
151
+ [sql, *binds]
152
+ end
153
+
154
+ # (x, y) > (a, b) iff (x > a or (x = a and y > b))
155
+ def build_starts_after_conditions(index, binds)
156
+ column = @columns[index]
157
+
158
+ if index < @cursor.size - 1
159
+ binds << @cursor[index] << @cursor[index]
160
+ "#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
54
161
  else
55
- positions
162
+ binds << @cursor[index]
163
+ if @columns.size == @cursor.size
164
+ "#{column} > ?"
165
+ else
166
+ "#{column} >= ?"
167
+ end
56
168
  end
57
169
  end
58
170
 
59
- def column_value(record, attribute)
60
- value = record.read_attribute(attribute.to_sym)
61
- case record.class.columns_hash.fetch(attribute).type
62
- when :datetime
171
+ def serialize_column_values!(column_values)
172
+ column_values.map! { |values| values.map! { |value| column_value(value) } }
173
+ end
174
+
175
+ def column_value(value)
176
+ if value.is_a?(Time)
63
177
  value.strftime(SQL_DATETIME_WITH_NSEC)
64
178
  else
65
179
  value
66
180
  end
67
181
  end
182
+
183
+ def unwrap_array(array)
184
+ if array.size == 1
185
+ array.first
186
+ else
187
+ array
188
+ end
189
+ end
68
190
  end
69
191
  end
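
The tuple comparison noted in the comment on `build_starts_after_conditions` (`(x, y) > (a, b) iff (x > a or (x = a and y > b))`) can be tried outside Active Record. The sketch below is a standalone rewrite, not the gem's code: `starts_after_sql` is a hypothetical function that takes the columns and cursor as arguments instead of reading instance variables, and returns the SQL fragment together with its bind values.

```ruby
# Hypothetical standalone version of the recursive condition builder.
# Returns [sql_fragment, binds].
def starts_after_sql(columns, cursor, index = 0, binds = [])
  column = columns[index]

  if index < cursor.size - 1
    # This column needs two binds: one for ">" and one for "=".
    binds << cursor[index] << cursor[index]
    inner, = starts_after_sql(columns, cursor, index + 1, binds)
    sql = "#{column} > ? OR (#{column} = ? AND (#{inner}))"
  else
    binds << cursor[index]
    # A cursor shorter than the column list only fixes a prefix,
    # so rows equal on that prefix must still be included.
    sql = columns.size == cursor.size ? "#{column} > ?" : "#{column} >= ?"
  end

  [sql, binds]
end

sql, binds = starts_after_sql(%w[created_at id], ["2022-11-11 00:00:00.000000", 5])
puts sql
p binds
```

For the two-column call above, the fragment compares `created_at` first and falls back to `id` only when `created_at` is equal, with the `created_at` value bound twice.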
@@ -49,7 +49,7 @@ module SidekiqIteration
49
49
  def rows(cursor:)
50
50
  @csv.lazy
51
51
  .each_with_index
52
- .drop(count_of_processed_rows(cursor))
52
+ .drop(cursor || 0)
53
53
  .to_enum { count_of_rows_in_file }
54
54
  end
55
55
 
@@ -60,7 +60,7 @@ module SidekiqIteration
60
60
  @csv.lazy
61
61
  .each_slice(batch_size)
62
62
  .with_index
63
- .drop(count_of_processed_rows(cursor))
63
+ .drop(cursor || 0)
64
64
  .to_enum { (count_of_rows_in_file.to_f / batch_size).ceil }
65
65
  end
66
66
 
@@ -73,13 +73,5 @@ module SidekiqIteration
73
73
  count -= 1 if @csv.headers
74
74
  count
75
75
  end
76
-
77
- def count_of_processed_rows(cursor)
78
- if cursor
79
- cursor + 1
80
- else
81
- 0
82
- end
83
- end
84
76
  end
85
77
  end
@@ -1,7 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require_relative "active_record_enumerator"
4
- require_relative "active_record_batch_enumerator"
5
4
  require_relative "csv_enumerator"
6
5
  require_relative "nested_enumerator"
7
6
 
@@ -22,8 +21,7 @@ module SidekiqIteration
22
21
  raise ArgumentError, "array cannot contain ActiveRecord objects"
23
22
  end
24
23
 
25
- drop = cursor ? cursor + 1 : 0
26
- array.each_with_index.drop(drop).to_enum { array.size }
24
+ array.each_with_index.drop(cursor || 0).to_enum { array.size }
27
25
  end
28
26
 
29
27
  # Builds Enumerator from Active Record Relation. Each Enumerator tick moves the cursor one row forward.
@@ -115,7 +113,7 @@ module SidekiqIteration
115
113
  # end
116
114
  #
117
115
  def active_record_relations_enumerator(scope, cursor:, **options)
118
- ActiveRecordBatchEnumerator.new(scope, cursor: cursor, **options).each
116
+ ActiveRecordEnumerator.new(scope, cursor: cursor, **options).relations
119
117
  end
120
118
 
121
119
  # Builds Enumerator from a CSV file.
@@ -7,6 +7,17 @@ module SidekiqIteration
7
7
  module JobRetryPatch
8
8
  private
9
9
  def process_retry(jobinst, msg, queue, exception)
10
+ add_sidekiq_iteration_metadata(jobinst, msg)
11
+ super
12
+ end
13
+
14
+ # The method was renamed in https://github.com/mperham/sidekiq/commit/0676a5202e89aa9da4ad7991f4111b97a9d8a0a4.
15
+ def attempt_retry(jobinst, msg, queue, exception)
16
+ add_sidekiq_iteration_metadata(jobinst, msg)
17
+ super
18
+ end
19
+
20
+ def add_sidekiq_iteration_metadata(jobinst, msg)
10
21
  if jobinst.is_a?(Iteration)
11
22
  unless msg["args"].last.is_a?(Hash)
12
23
  msg["args"].push({})
@@ -19,12 +30,14 @@ module SidekiqIteration
19
30
  "total_time" => jobinst.total_time,
20
31
  }
21
32
  end
22
-
23
- super
24
33
  end
25
34
  end
26
35
  end
27
36
 
28
- if Sidekiq::JobRetry.instance_method(:process_retry)
37
+ if Sidekiq::JobRetry.private_method_defined?(:process_retry) ||
38
+ Sidekiq::JobRetry.private_method_defined?(:attempt_retry)
29
39
  Sidekiq::JobRetry.prepend(SidekiqIteration::JobRetryPatch)
40
+ else
41
+ raise "Sidekiq #{Sidekiq::VERSION} removed the #process_retry method. " \
42
+ "Please open an issue at the `sidekiq-iteration` gem."
30
43
  end
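
The patching approach in this hunk, defining both method names in one module and prepending it when either exists, can be sketched without Sidekiq. The names below are hypothetical: `FakeJobRetry` stands in for `Sidekiq::JobRetry`, and `MetadataPatch` for the gem's `JobRetryPatch`; the method bodies are simplified placeholders.

```ruby
# Hypothetical stand-in for Sidekiq::JobRetry; it defines only the newer name.
class FakeJobRetry
  private

  def attempt_retry(msg)
    msg["retried"] = true
    msg
  end
end

# Defines both names, so whichever one the host class has gets wrapped.
module MetadataPatch
  private

  def process_retry(msg)
    add_metadata(msg)
    super
  end

  def attempt_retry(msg)
    add_metadata(msg)
    super
  end

  def add_metadata(msg)
    msg["metadata_added"] = true
  end
end

if FakeJobRetry.private_method_defined?(:process_retry) ||
   FakeJobRetry.private_method_defined?(:attempt_retry)
  FakeJobRetry.prepend(MetadataPatch)
else
  raise "neither retry hook is defined"
end

# Helper for trying the patched private method from outside the class.
def run_fake_retry
  FakeJobRetry.new.send(:attempt_retry, {})
end

p run_fake_retry
```

Because the module is prepended, its `attempt_retry` runs first and `super` reaches the original method; the unused `process_retry` wrapper is harmless when the host class never calls it.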
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SidekiqIteration
4
- VERSION = "0.1.0"
4
+ VERSION = "0.2.0"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sidekiq-iteration
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - fatkodima
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2022-11-02 00:00:00.000000000 Z
12
+ date: 2022-11-11 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: sidekiq
@@ -41,8 +41,6 @@ files:
41
41
  - guides/throttling.md
42
42
  - lib/sidekiq-iteration.rb
43
43
  - lib/sidekiq_iteration.rb
44
- - lib/sidekiq_iteration/active_record_batch_enumerator.rb
45
- - lib/sidekiq_iteration/active_record_cursor.rb
46
44
  - lib/sidekiq_iteration/active_record_enumerator.rb
47
45
  - lib/sidekiq_iteration/csv_enumerator.rb
48
46
  - lib/sidekiq_iteration/enumerators.rb
@@ -1,127 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- module SidekiqIteration
4
- # Batch Enumerator based on ActiveRecord Relation.
5
- # @private
6
- class ActiveRecordBatchEnumerator
7
- include Enumerable
8
-
9
- SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
10
-
11
- def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
12
- @primary_key = "#{relation.table_name}.#{relation.primary_key}"
13
- @columns = Array(columns&.map(&:to_s) || @primary_key)
14
- @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
15
- @pluck_columns = if @primary_key_index
16
- @columns
17
- else
18
- @columns + [@primary_key]
19
- end
20
- @batch_size = batch_size
21
- @cursor = Array.wrap(cursor)
22
- @initial_cursor = @cursor
23
- raise ArgumentError, "Must specify at least one column" if @columns.empty?
24
- if relation.joins_values.present? && !@columns.all?(/\./)
25
- raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
26
- end
27
-
28
- if relation.arel.orders.present? || relation.arel.taken.present?
29
- raise ArgumentError,
30
- "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
31
- "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
32
- end
33
-
34
- @base_relation = relation.reorder(@columns.join(", "))
35
- end
36
-
37
- def each
38
- return to_enum { size } unless block_given?
39
-
40
- while (relation = next_batch)
41
- yield relation, cursor_value
42
- end
43
- end
44
-
45
- def size
46
- (@base_relation.count(:all) + @batch_size - 1) / @batch_size # ceiling division
47
- end
48
-
49
- private
50
- def next_batch
51
- relation = @base_relation.limit(@batch_size)
52
- if conditions.any?
53
- relation = relation.where(*conditions)
54
- end
55
-
56
- cursor_values, ids = relation.uncached do
57
- pluck_columns(relation)
58
- end
59
-
60
- cursor = cursor_values.last
61
- unless cursor.present?
62
- @cursor = @initial_cursor
63
- return
64
- end
65
- # The primary key was plucked, but original cursor did not include it, so we should remove it
66
- cursor.pop unless @primary_key_index
67
- @cursor = Array.wrap(cursor)
68
-
69
- # Yields relations by selecting the primary keys of records in the batch.
70
- # Post.where(published: nil) results in an enumerator of relations like:
71
- # Post.where(published: nil, ids: batch_of_ids)
72
- @base_relation.where(@primary_key => ids)
73
- end
74
-
75
- def pluck_columns(relation)
76
- if @pluck_columns.size == 1 # only the primary key
77
- column_values = relation.pluck(*@pluck_columns)
78
- return [column_values, column_values]
79
- end
80
-
81
- column_values = relation.pluck(*@pluck_columns)
82
- primary_key_index = @primary_key_index || -1
83
- primary_key_values = column_values.map { |values| values[primary_key_index] }
84
-
85
- serialize_column_values!(column_values)
86
- [column_values, primary_key_values]
87
- end
88
-
89
- def cursor_value
90
- if @cursor.size == 1
91
- @cursor.first
92
- else
93
- @cursor
94
- end
95
- end
96
-
97
- def conditions
98
- column_index = @cursor.size - 1
99
- column = @columns[column_index]
100
- where_clause = if @columns.size == @cursor.size
101
- "#{column} > ?"
102
- else
103
- "#{column} >= ?"
104
- end
105
- while column_index > 0
106
- column_index -= 1
107
- column = @columns[column_index]
108
- where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
109
- end
110
- ret = @cursor.reduce([where_clause]) { |params, value| params << value << value }
111
- ret.pop
112
- ret
113
- end
114
-
115
- def serialize_column_values!(column_values)
116
- column_values.map! { |values| values.map! { |value| column_value(value) } }
117
- end
118
-
119
- def column_value(value)
120
- if value.is_a?(Time)
121
- value.strftime(SQL_DATETIME_WITH_NSEC)
122
- else
123
- value
124
- end
125
- end
126
- end
127
- end
@@ -1,89 +0,0 @@
1
- # frozen_string_literal: true
2
-
3
- module SidekiqIteration
4
- # @private
5
- class ActiveRecordCursor
6
- include Comparable
7
-
8
- attr_reader :position, :reached_end
9
-
10
- def initialize(relation, columns = nil, position = nil)
11
- columns ||= "#{relation.table_name}.#{relation.primary_key}"
12
- @columns = Array.wrap(columns)
13
- raise ArgumentError, "Must specify at least one column" if @columns.empty?
14
-
15
- self.position = Array.wrap(position)
16
- if relation.joins_values.present? && !@columns.all?(/\./)
17
- raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
18
- end
19
-
20
- if relation.arel.orders.present? || relation.arel.taken.present?
21
- raise ArgumentError,
22
- "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
23
- "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
24
- end
25
-
26
- @base_relation = relation.reorder(@columns.join(", "))
27
- @reached_end = false
28
- end
29
-
30
- def <=>(other)
31
- if reached_end == other.reached_end
32
- position <=> other.position
33
- else
34
- reached_end ? 1 : -1
35
- end
36
- end
37
-
38
- def position=(position)
39
- raise ArgumentError, "Cursor position cannot contain nil values" if position.any?(&:nil?)
40
-
41
- @position = position
42
- end
43
-
44
- def next_batch(batch_size)
45
- return if @reached_end
46
-
47
- relation = @base_relation.limit(batch_size)
48
-
49
- if (conditions = self.conditions).any?
50
- relation = relation.where(*conditions)
51
- end
52
-
53
- records = relation.uncached do
54
- relation.to_a
55
- end
56
-
57
- update_from_record(records.last) if records.any?
58
- @reached_end = records.size < batch_size
59
-
60
- records if records.any?
61
- end
62
-
63
- private
64
- def conditions
65
- i = @position.size - 1
66
- column = @columns[i]
67
- conditions = if @columns.size == @position.size
68
- "#{column} > ?"
69
- else
70
- "#{column} >= ?"
71
- end
72
- while i > 0
73
- i -= 1
74
- column = @columns[i]
75
- conditions = "#{column} > ? OR (#{column} = ? AND (#{conditions}))"
76
- end
77
- ret = @position.reduce([conditions]) { |params, value| params << value << value }
78
- ret.pop
79
- ret
80
- end
81
-
82
- def update_from_record(record)
83
- self.position = @columns.map do |column|
84
- method = column.to_s.split(".").last
85
- record.send(method)
86
- end
87
- end
88
- end
89
- end