pluck_in_batches 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d86f3ce009db02836e820ec434ae17fa43685651a74e85e2260675d7f2aeb945
- data.tar.gz: e27703d5c07b89db1d75adc5331ce597c8b95669575a65596e0e141a44287412
+ metadata.gz: 7c30844758feb52e696cbde2155d47204b537201975ec2a7857db5400ca982c9
+ data.tar.gz: 69c73ac8249b95a2b9d8e4542396ee7969be7d28aa3e2277b06f5ed652eedc87
  SHA512:
- metadata.gz: f96ef381074b16ab8cd5fa48ca6f38245cae06ae0213fcb3c1ebb159b7c6712a441bbdf08c5ae36bb63a12bc46d04daa70c38ef2f5bd04462397bf254ab610f0
- data.tar.gz: bd5e5382cbd444f4cc0d9b81bad56792c6a6f436cad3376064d1b1cfe46699c2b129e36b4311cedda26aec827b5a140d7f344ebeb959bb1e22594414e1577d6b
+ metadata.gz: 7e49e5929fefd29e092c0dc5da7f051fd735dd2175a347dfa0a34af274c105b3dd5c4897f90b350ad2ac982b260f4cfbe64d2504878f5f554036bb7466210579
+ data.tar.gz: a1ba0c9f0a426c0fd6cd73985af955cad50dbca4c7777049b94cbf85f655721edffeb13dc4bc71d8c8bc78e357ff22b620b96b7718a64c59a8fb091e3029164d
data/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  ## master (unreleased)
 
+ ## 0.3.0 (2024-11-03)
+
+ - Support plucking custom Arel columns
+
+ ```ruby
+ User.pluck_in_batches(:id, Arel.sql("json_extract(users.metadata, '$.rank')"))
+ ```
+
+ ## 0.2.0 (2023-07-24)
+
+ - Support specifying per cursor column ordering when batching
+
+ ```ruby
+ Book.pluck_in_batches(:title, cursor_columns: [:author_id, :version], order: [:asc, :desc])
+ ```
+
+ - Add `:of` as an alias for `:batch_size` option
+
  ## 0.1.0 (2023-05-16)
 
  - First release
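
The 0.2.0 entry above adds the `:of` alias without showing it in use. A minimal sketch, assuming an illustrative `User` model and job name (not taken from the package), of how the alias is interchangeable with `:batch_size`:

```ruby
# Both calls iterate emails in batches of 500 rows; `:of` is simply an alias for `:batch_size`.
User.pluck_in_batches(:email, batch_size: 500) { |emails| NewsletterJob.perform_later(emails) }
User.pluck_in_batches(:email, of: 500)         { |emails| NewsletterJob.perform_later(emails) }
```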
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  [![Build Status](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml)
 
- ActiveRecord comes with `find_each` and `find_in_batches` methods to batch process records from a database.
+ ActiveRecord comes with `find_each` / `find_in_batches` / `in_batches` methods to batch process records from a database.
  ActiveRecord also has the `pluck` method which allows the selection of a set of fields without pulling
  the entire record into memory.
 
@@ -14,7 +14,7 @@ It performs half of the number of SQL queries, allocates up to half of the memor
 
  ```ruby
  # Before
- User.in_batches do |batch|
+ User.in_batches do |batch| # or .find_in_batches, or .select(:email).find_each etc
  emails = batch.pluck(:emails)
  # do something with emails
  end
@@ -25,6 +25,8 @@ User.pluck_in_batches(:email) do |emails|
  end
  ```
 
+ **Note**: You may also find [`sidekiq-iteration`](https://github.com/fatkodima/sidekiq-iteration) useful when iterating over large collections in Sidekiq jobs.
+
  ## Requirements
 
  - Ruby 2.7+
@@ -89,18 +91,25 @@ User.pluck_in_batches(:name, :email).with_index do |group, index|
  jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
  ActiveJob.perform_all_later(jobs)
  end
+
+ # Custom arel column
+ User.pluck_in_batches(:id, Arel.sql("json_extract(users.metadata, '$.rank')")).with_index do |group, index|
+ # ...
+ end
  ```
 
  Both methods support the following configuration options:
 
  * `:batch_size` - Specifies the size of the batch. Defaults to 1000.
+ Also aliased as `:of`.
  * `:start` - Specifies the primary key value to start from, inclusive of the value.
  * `:finish` - Specifies the primary key value to end at, inclusive of the value.
  * `:error_on_ignore` - Overrides the application config to specify if an error should be raised when
  an order is present in the relation.
  * `:cursor_column` - Specifies the column(s) on which the iteration should be done.
  This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
- * `:order` - Specifies the primary key order (can be `:asc` or `:desc`). Defaults to `:asc`.
+ * `:order` - Specifies the primary key order (can be `:asc` or `:desc` or
+ an array consisting of :asc or :desc). Defaults to `:asc`.
 
  ## Development
 
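As a quick illustration of the option list documented above, a sketch combining several of them (the model, column names, and values are assumptions for the example, not taken from the README):

```ruby
# Iterate ids 1_000..2_000 (inclusive) in batches of 500, using the default primary-key cursor.
User.pluck_in_batches(:id, :email, start: 1_000, finish: 2_000, batch_size: 500) do |rows|
  rows.each { |id, email| puts "#{id}: #{email}" }
end

# Batch over a non-primary-key cursor column, walking it in descending order.
User.pluck_in_batches(:email, cursor_column: :username, order: :desc) do |emails|
  # do something with emails
end
```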
@@ -13,7 +13,7 @@ module PluckInBatches
  #
  # See #pluck_in_batches for all the details.
  #
- def pluck_each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block)
+ def pluck_each(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block)
  iterator = Iterator.new(self)
  iterator.each(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
  end
@@ -37,13 +37,23 @@ module PluckInBatches
  #
  # ==== Options
  # * <tt>:batch_size</tt> - Specifies the size of the batch. Defaults to 1000.
+ # * <tt>:of</tt> - Same as +:batch_size+.
  # * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
  # * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
  # * <tt>:error_on_ignore</tt> - Overrides the application config to specify if an error should be raised when
  # an order is present in the relation.
  # * <tt>:cursor_column</tt> - Specifies the column(s) on which the iteration should be done.
  # This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
- # * <tt>:order</tt> - Specifies the cursor column(s) order (can be +:asc+ or +:desc+). Defaults to +:asc+.
+ # * <tt>:order</tt> - Specifies the cursor column(s) order (can be +:asc+ or +:desc+ or an array consisting
+ # of :asc or :desc). Defaults to +:asc+.
+ #
+ # class Book < ActiveRecord::Base
+ # self.primary_key = [:author_id, :version]
+ # end
+ #
+ # Book.pluck_in_batches(:title, order: [:asc, :desc])
+ #
+ # In the above code, +author_id+ is sorted in ascending order and +version+ in descending order.
  #
  # Limits are honored, and if present there is no requirement for the batch
  # size: it can be less than, equal to, or greater than the limit.
@@ -68,7 +78,7 @@ module PluckInBatches
  # NOTE: By its nature, batch processing is subject to race conditions if
  # other processes are modifying the database.
  #
- def pluck_in_batches(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block)
+ def pluck_in_batches(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block)
  iterator = Iterator.new(self)
  iterator.each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
  end
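
The `of: 1000, batch_size: of` signatures above implement the new alias purely with keyword-argument defaults: whichever keyword the caller passes becomes the effective batch size, and an explicit `batch_size:` still wins if both are given. A standalone sketch of the same trick, with a made-up method name:

```ruby
# Defaulting one keyword argument to another makes `:of` act as an alias for `:batch_size`.
def each_slice_demo(of: 3, batch_size: of)
  (1..10).each_slice(batch_size) { |slice| p slice }
end

each_slice_demo(of: 4)         # slices of 4
each_slice_demo(batch_size: 2) # slices of 2
```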
@@ -2,12 +2,15 @@
 
  module PluckInBatches
  class Iterator # :nodoc:
+ VALID_ORDERS = [:asc, :desc].freeze
+ DEFAULT_ORDER = :asc
+
  def initialize(relation)
  @relation = relation
  @klass = relation.klass
  end
 
- def each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc, &block)
+ def each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: DEFAULT_ORDER, &block)
  if columns.empty?
  raise ArgumentError, "Call `pluck_each' with at least one column."
  end
@@ -18,21 +21,28 @@ module PluckInBatches
  end
  else
  enum_for(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
- apply_limits(@relation, start, finish, order).size
+ apply_limits(@relation, start, finish, build_batch_orders(order)).size
  end
  end
  end
 
- def each_batch(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc)
+ def each_batch(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: DEFAULT_ORDER)
  if columns.empty?
  raise ArgumentError, "Call `pluck_in_batches' with at least one column."
  end
 
- unless order == :asc || order == :desc
- raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
+ unless Array(order).all? { |ord| VALID_ORDERS.include?(ord) }
+ raise ArgumentError, ":order must be :asc or :desc or an array consisting of :asc or :desc, got #{order.inspect}"
+ end
+
+ pluck_columns = columns.map do |column|
+ if Arel.arel_node?(column)
+ column
+ else
+ column.to_s
+ end
  end
 
- pluck_columns = columns.map(&:to_s)
  cursor_columns = Array(cursor_column).map(&:to_s)
  cursor_column_indexes = cursor_column_indexes(pluck_columns, cursor_columns)
  missing_cursor_columns = cursor_column_indexes.count(&:nil?)
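
The `Arel.arel_node?` branch above is what enables plucking raw Arel expressions in 0.3.0: Arel nodes pass through untouched, while plain column names are stringified as before. Roughly (the JSON expression is only an example; `Arel.arel_node?` and `Arel.sql` are public Active Record / Arel API):

```ruby
# With Active Record (which bundles Arel) loaded:
Arel.arel_node?(Arel.sql("json_extract(users.metadata, '$.rank')")) # => true  (kept as an Arel node)
Arel.arel_node?(:id)                                                # => false (converted to "id")
```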
@@ -44,10 +54,11 @@ module PluckInBatches
  end
 
  relation = @relation
+ batch_orders = build_batch_orders(cursor_columns, order)
 
  unless block_given?
  return to_enum(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
- total = apply_limits(relation, cursor_columns, start, finish, order).size
+ total = apply_limits(relation, cursor_columns, start, finish, batch_orders).size
  (total - 1).div(batch_size) + 1
  end
  end
@@ -62,8 +73,8 @@ module PluckInBatches
  batch_limit = remaining if remaining < batch_limit
  end
 
- relation = relation.reorder(*batch_order(cursor_columns, order)).limit(batch_limit)
- relation = apply_limits(relation, cursor_columns, start, finish, order)
+ relation = relation.reorder(batch_orders.to_h).limit(batch_limit)
+ relation = apply_limits(relation, cursor_columns, start, finish, batch_orders)
  relation.skip_query_cache! # Retaining the results in the query cache would undermine the point of batching
  batch_relation = relation
 
@@ -99,9 +110,13 @@ module PluckInBatches
  end
  end
 
- batch_relation = batch_condition(
- relation, cursor_columns, cursor_column_offsets, order == :desc ? :lt : :gt
- )
+ _last_column, last_order = batch_orders.last
+ operators = batch_orders.map do |_column, order| # rubocop:disable Lint/ShadowingOuterLocalVariable
+ order == :desc ? :lteq : :gteq
+ end
+ operators[-1] = (last_order == :desc ? :lt : :gt)
+
+ batch_relation = batch_condition(relation, cursor_columns, cursor_column_offsets, operators)
  end
  end
 
@@ -135,29 +150,33 @@ module PluckInBatches
  end
  end
 
- def apply_limits(relation, columns, start, finish, order)
- relation = apply_start_limit(relation, columns, start, order) if start
- relation = apply_finish_limit(relation, columns, finish, order) if finish
+ def apply_limits(relation, columns, start, finish, batch_orders)
+ relation = apply_start_limit(relation, columns, start, batch_orders) if start
+ relation = apply_finish_limit(relation, columns, finish, batch_orders) if finish
  relation
  end
 
- def apply_start_limit(relation, columns, start, order)
- batch_condition(relation, columns, start, order == :desc ? :lteq : :gteq)
+ def apply_start_limit(relation, columns, start, batch_orders)
+ operators = batch_orders.map do |_column, order|
+ order == :desc ? :lteq : :gteq
+ end
+ batch_condition(relation, columns, start, operators)
  end
 
- def apply_finish_limit(relation, columns, finish, order)
- batch_condition(relation, columns, finish, order == :desc ? :gteq : :lteq)
+ def apply_finish_limit(relation, columns, finish, batch_orders)
+ operators = batch_orders.map do |_column, order|
+ order == :desc ? :gteq : :lteq
+ end
+ batch_condition(relation, columns, finish, operators)
  end
 
- def batch_condition(relation, columns, values, operator)
- columns = Array(columns)
- values = Array(values)
- cursor_positions = columns.zip(values)
+ def batch_condition(relation, columns, values, operators)
+ cursor_positions = Array(columns).zip(Array(values), operators)
 
- first_clause_column, first_clause_value = cursor_positions.pop
+ first_clause_column, first_clause_value, operator = cursor_positions.pop
  where_clause = build_attribute_predicate(first_clause_column, first_clause_value, operator)
 
- cursor_positions.reverse_each do |column_name, value|
+ cursor_positions.reverse_each do |column_name, value, operator| # rubocop:disable Lint/ShadowingOuterLocalVariable
  where_clause = build_attribute_predicate(column_name, value, operator == :lteq ? :lt : :gt).or(
  build_attribute_predicate(column_name, value, :eq).and(where_clause)
  )
@@ -170,9 +189,9 @@ module PluckInBatches
  @relation.bind_attribute(column, value) { |attr, bind| attr.public_send(operator, bind) }
  end
 
- def batch_order(cursor_columns, order)
- cursor_columns.map do |column|
- @relation.arel_table[column].public_send(order)
+ def build_batch_orders(cursor_columns, order)
+ cursor_columns.zip(Array(order)).map do |column, ord|
+ [column, ord || DEFAULT_ORDER]
  end
  end
 
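To make the operator logic above concrete: every cursor column except the last gets a non-strict comparison and only the last gets a strict one, which is standard keyset (seek) pagination generalized to per-column ordering. A worked sketch for cursor columns `[:author_id, :version]` with `order: [:asc, :desc]`, where the `Book` table and the offset values 5 and 7 are placeholders rather than anything from the package:

```ruby
# The loop in each_batch yields operators [:gteq, :lt], and batch_condition
# nests them into the next-batch predicate:
#
#   author_id > 5 OR (author_id = 5 AND version < 7)
#
books = Book.arel_table
next_batch = books[:author_id].gt(5).or(
  books[:author_id].eq(5).and(books[:version].lt(7))
)
Book.where(next_batch).order(author_id: :asc, version: :desc).limit(1000)
```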
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module PluckInBatches
- VERSION = "0.1.0"
+ VERSION = "0.3.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pluck_in_batches
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.3.0
  platform: ruby
  authors:
  - fatkodima
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-05-16 00:00:00.000000000 Z
+ date: 2024-11-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord
@@ -60,8 +60,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.4.12
+ rubygems_version: 3.4.19
  signing_key:
  specification_version: 4
- summary: Change
+ summary: A faster alternative to the custom use of `in_batches` with `pluck`.
  test_files: []