pluck_in_batches 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: d86f3ce009db02836e820ec434ae17fa43685651a74e85e2260675d7f2aeb945
4
+ data.tar.gz: e27703d5c07b89db1d75adc5331ce597c8b95669575a65596e0e141a44287412
5
+ SHA512:
6
+ metadata.gz: f96ef381074b16ab8cd5fa48ca6f38245cae06ae0213fcb3c1ebb159b7c6712a441bbdf08c5ae36bb63a12bc46d04daa70c38ef2f5bd04462397bf254ab610f0
7
+ data.tar.gz: bd5e5382cbd444f4cc0d9b81bad56792c6a6f436cad3376064d1b1cfe46699c2b129e36b4311cedda26aec827b5a140d7f344ebeb959bb1e22594414e1577d6b
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## master (unreleased)
2
+
3
+ ## 0.1.0 (2023-05-16)
4
+
5
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 fatkodima
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,115 @@
1
+ # PluckInBatches
2
+
3
+ [![Build Status](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml)
4
+
5
+ ActiveRecord comes with `find_each` and `find_in_batches` methods to batch process records from a database.
6
+ ActiveRecord also has the `pluck` method which allows the selection of a set of fields without pulling
7
+ the entire record into memory.
8
+
9
+ This gem combines these ideas and provides `pluck_each` and `pluck_in_batches` methods to allow
10
+ batch processing of plucked fields from the database.
11
+
12
+ It performs half the number of SQL queries, allocates up to half the memory, and is up to 2x faster
13
+ (or more, depending on how far your database is from the application) than the available alternative:
14
+
15
+ ```ruby
16
+ # Before
17
+ User.in_batches do |batch|
18
+ emails = batch.pluck(:email)
19
+ # do something with emails
20
+ end
21
+
22
+ # Now, using this gem (up to 2x faster)
23
+ User.pluck_in_batches(:email) do |emails|
24
+ # do something with emails
25
+ end
26
+ ```
27
+
28
+ ## Requirements
29
+
30
+ - Ruby 2.7+
31
+ - ActiveRecord 6+
32
+
33
+ If you need support for older versions, [open an issue](https://github.com/fatkodima/pluck_in_batches/issues/new).
34
+
35
+ ## Installation
36
+
37
+ Add this line to your application's Gemfile:
38
+
39
+ ```ruby
40
+ gem 'pluck_in_batches'
41
+ ```
42
+
43
+ And then execute:
44
+
45
+ ```sh
46
+ $ bundle
47
+ ```
48
+
49
+ Or install it yourself as:
50
+
51
+ ```sh
52
+ $ gem install pluck_in_batches
53
+ ```
54
+
55
+ ## Usage
56
+
57
+ ### `pluck_each`
58
+
59
+ Behaves similarly to ActiveRecord's `find_each` method, but yields each set of values corresponding
60
+ to the specified columns.
61
+
62
+ ```ruby
63
+ # Single column
64
+ User.where(active: true).pluck_each(:email) do |email|
65
+ # do something with email
66
+ end
67
+
68
+ # Multiple columns
69
+ User.where(active: true).pluck_each(:id, :email) do |id, email|
70
+ # do something with id and email
71
+ end
72
+ ```
73
+
74
+ ### `pluck_in_batches`
75
+
76
+ Behaves similarly to ActiveRecord's `in_batches` method, but yields each batch
77
+ of values corresponding to the specified columns.
78
+
79
+ ```ruby
80
+ # Single column
81
+ User.where("age > 21").pluck_in_batches(:email) do |emails|
82
+ jobs = emails.map { |email| PartyReminderJob.new(email) }
83
+ ActiveJob.perform_all_later(jobs)
84
+ end
85
+
86
+ # Multiple columns
87
+ User.pluck_in_batches(:name, :email).with_index do |group, index|
88
+ puts "Processing group ##{index}"
89
+ jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
90
+ ActiveJob.perform_all_later(jobs)
91
+ end
92
+ ```
93
+
94
+ Both methods support the following configuration options:
95
+
96
+ * `:batch_size` - Specifies the size of the batch. Defaults to 1000.
97
+ * `:start` - Specifies the primary key value to start from, inclusive of the value.
98
+ * `:finish` - Specifies the primary key value to end at, inclusive of the value.
99
+ * `:error_on_ignore` - Overrides the application config to specify if an error should be raised when
100
+ an order is present in the relation.
101
+ * `:cursor_column` - Specifies the column(s) on which the iteration should be done.
102
+ This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
103
+ * `:order` - Specifies the cursor column(s) order (can be `:asc` or `:desc`). Defaults to `:asc`.
104
+
105
+ ## Development
106
+
107
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
108
+
109
+ ## Contributing
110
+
111
+ Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/pluck_in_batches.
112
+
113
+ ## License
114
+
115
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
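The README's "half the number of SQL queries" claim can be illustrated without a database. The sketch below is a simulation only: `FakeTable`, `ids_then_pluck`, and `pluck_with_cursor` are hypothetical names, not part of the gem or of ActiveRecord. It assumes the `in_batches` plus `pluck` approach costs one query to find a batch's ids and a second to read the column, while plucking the cursor column alongside the requested column needs only one query per batch.

```ruby
# Hypothetical in-memory stand-in for a users table; not part of the gem.
ROWS = (1..10).map { |id| { id: id, email: "user#{id}@example.com" } }.freeze

# Counts simulated queries so the two strategies can be compared.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows.sort_by { |row| row[:id] }
    @query_count = 0
  end

  # Simulates: SELECT columns FROM users WHERE id > after ORDER BY id LIMIT limit
  def select_after(columns, after:, limit:)
    @query_count += 1
    @rows.select { |row| after.nil? || row[:id] > after }
         .first(limit)
         .map { |row| row.values_at(*columns) }
  end

  # Simulates: SELECT columns FROM users WHERE id IN (ids)
  def select_where_ids(columns, ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row[:id]) }
         .map { |row| row.values_at(*columns) }
  end
end

# Two queries per batch: fetch the ids, then read the column for those ids.
def ids_then_pluck(table, column, batch_size:)
  batches = []
  cursor = nil
  loop do
    ids = table.select_after([:id], after: cursor, limit: batch_size).flatten
    break if ids.empty?

    batches << table.select_where_ids([column], ids).flatten
    break if ids.size < batch_size

    cursor = ids.last
  end
  batches
end

# One query per batch: read the column together with the cursor column,
# remember the last cursor value, then drop the cursor from the result.
def pluck_with_cursor(table, column, batch_size:)
  batches = []
  cursor = nil
  loop do
    rows = table.select_after([column, :id], after: cursor, limit: batch_size)
    break if rows.empty?

    cursor = rows.last.last
    batches << rows.map(&:first)
    break if rows.size < batch_size
  end
  batches
end
```

Both helpers return the same batches. Only the number of simulated queries differs, which is the effect the README describes.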
data/lib/pluck_in_batches/extensions.rb ADDED
@@ -0,0 +1,77 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ module Extensions
5
+ module ModelExtension
6
+ delegate :pluck_each, :pluck_in_batches, to: :all
7
+ end
8
+
9
+ module RelationExtension
10
+ # Yields each set of values corresponding to the specified columns for the records
11
+ # matched by the passed options. If a single column is specified, yields its value;
12
+ # if several columns are specified, yields an array of values.
13
+ #
14
+ # See #pluck_in_batches for all the details.
15
+ #
16
+ def pluck_each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block)
17
+ iterator = Iterator.new(self)
18
+ iterator.each(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
19
+ end
20
+
21
+ # Yields each batch of values corresponding to the specified columns that was found
22
+ # by the passed options as an array.
23
+ #
24
+ # User.where("age > 21").pluck_in_batches(:email) do |emails|
25
+ # jobs = emails.map { |email| PartyReminderJob.new(email) }
26
+ # ActiveJob.perform_all_later(jobs)
27
+ # end
28
+ #
29
+ # If you do not provide a block to #pluck_in_batches, it will return an Enumerator
30
+ # for chaining with other methods:
31
+ #
32
+ # User.pluck_in_batches(:name, :email).with_index do |group, index|
33
+ # puts "Processing group ##{index}"
34
+ # jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
35
+ # ActiveJob.perform_all_later(jobs)
36
+ # end
37
+ #
38
+ # ==== Options
39
+ # * <tt>:batch_size</tt> - Specifies the size of the batch. Defaults to 1000.
40
+ # * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
41
+ # * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
42
+ # * <tt>:error_on_ignore</tt> - Overrides the application config to specify if an error should be raised when
43
+ # an order is present in the relation.
44
+ # * <tt>:cursor_column</tt> - Specifies the column(s) on which the iteration should be done.
45
+ # This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
46
+ # * <tt>:order</tt> - Specifies the cursor column(s) order (can be +:asc+ or +:desc+). Defaults to +:asc+.
47
+ #
48
+ # Limits are honored, and if present there is no requirement for the batch
49
+ # size: it can be less than, equal to, or greater than the limit.
50
+ #
51
+ # The options +start+ and +finish+ are especially useful if you want
52
+ # multiple workers dealing with the same processing queue. You can make
53
+ # worker 1 handle all the records between id 1 and 9999 and worker 2
54
+ # handle from 10000 and beyond by setting the +:start+ and +:finish+
55
+ # option on each worker.
56
+ #
57
+ # # Let's process from record 10_000 on.
58
+ # User.pluck_in_batches(:email, start: 10_000) do |emails|
59
+ # jobs = emails.map { |email| PartyReminderJob.new(email) }
60
+ # ActiveJob.perform_all_later(jobs)
61
+ # end
62
+ #
63
+ # NOTE: Order can be ascending (:asc) or descending (:desc). It defaults to
64
+ # ascending on the primary key ("id ASC").
65
+ # This also means that this method only works when the cursor column is
66
+ # orderable (e.g. an integer or string).
67
+ #
68
+ # NOTE: By its nature, batch processing is subject to race conditions if
69
+ # other processes are modifying the database.
70
+ #
71
+ def pluck_in_batches(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block)
72
+ iterator = Iterator.new(self)
73
+ iterator.each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
74
+ end
75
+ end
76
+ end
77
+ end
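The comments above note that `pluck_in_batches` returns an Enumerator when no block is given. A minimal sketch of that pattern in plain Ruby follows, assuming an in-memory array instead of a relation; `BatchSketch.each_batch` is a hypothetical helper, not the gem's API. The block passed to `to_enum` computes the number of batches only when `Enumerator#size` is called.

```ruby
# Hypothetical stand-in for the batching methods; not part of the gem.
module BatchSketch
  module_function

  # Yields `values` in slices of `batch_size`. Without a block, returns an
  # Enumerator whose size (the number of batches) is computed lazily.
  def each_batch(values, batch_size)
    unless block_given?
      return to_enum(__method__, values, batch_size) do
        # Ceiling division. Integer#div floors, so an empty list gives 0.
        (values.size - 1).div(batch_size) + 1
      end
    end

    values.each_slice(batch_size) { |batch| yield batch }
  end
end

emails = %w[a@example.com b@example.com c@example.com d@example.com e@example.com]
batches = BatchSketch.each_batch(emails, 2)

# Chaining works because no block was passed to each_batch.
batches.with_index do |batch, index|
  puts "batch ##{index}: #{batch.join(', ')}"
end
```

Returning a sized Enumerator lets callers chain `with_index`, `lazy`, or similar methods, and lets progress reporting read the batch count without iterating.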
data/lib/pluck_in_batches/iterator.rb ADDED
@@ -0,0 +1,183 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ class Iterator # :nodoc:
5
+ def initialize(relation)
6
+ @relation = relation
7
+ @klass = relation.klass
8
+ end
9
+
10
+ def each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc, &block)
11
+ if columns.empty?
12
+ raise ArgumentError, "Call `pluck_each' with at least one column."
13
+ end
14
+
15
+ if block_given?
16
+ each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do |batch|
17
+ batch.each(&block)
18
+ end
19
+ else
20
+ enum_for(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
21
+ apply_limits(@relation, Array(cursor_column).map(&:to_s), start, finish, order).size
22
+ end
23
+ end
24
+ end
25
+
26
+ def each_batch(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc)
27
+ if columns.empty?
28
+ raise ArgumentError, "Call `pluck_in_batches' with at least one column."
29
+ end
30
+
31
+ unless order == :asc || order == :desc
32
+ raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
33
+ end
34
+
35
+ pluck_columns = columns.map(&:to_s)
36
+ cursor_columns = Array(cursor_column).map(&:to_s)
37
+ cursor_column_indexes = cursor_column_indexes(pluck_columns, cursor_columns)
38
+ missing_cursor_columns = cursor_column_indexes.count(&:nil?)
39
+ cursor_column_indexes.each_with_index do |column_index, index|
40
+ unless column_index
41
+ cursor_column_indexes[index] = pluck_columns.size
42
+ pluck_columns << cursor_columns[index]
43
+ end
44
+ end
45
+
46
+ relation = @relation
47
+
48
+ unless block_given?
49
+ return to_enum(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
50
+ total = apply_limits(relation, cursor_columns, start, finish, order).size
51
+ (total - 1).div(batch_size) + 1
52
+ end
53
+ end
54
+
55
+ if relation.arel.orders.present?
56
+ act_on_ignored_order(error_on_ignore)
57
+ end
58
+
59
+ batch_limit = batch_size
60
+ if relation.limit_value
61
+ remaining = relation.limit_value
62
+ batch_limit = remaining if remaining < batch_limit
63
+ end
64
+
65
+ relation = relation.reorder(*batch_order(cursor_columns, order)).limit(batch_limit)
66
+ relation = apply_limits(relation, cursor_columns, start, finish, order)
67
+ relation.skip_query_cache! # Retaining the results in the query cache would undermine the point of batching
68
+ batch_relation = relation
69
+
70
+ loop do
71
+ batch = batch_relation.pluck(*pluck_columns)
72
+ break if batch.empty?
73
+
74
+ cursor_column_offsets =
75
+ if pluck_columns.size == 1
76
+ Array(batch.last)
77
+ else
78
+ cursor_column_indexes.map.with_index do |column_index, index|
79
+ batch.last[column_index || (batch.last.size - cursor_column_indexes.size + index)]
80
+ end
81
+ end
82
+
83
+ missing_cursor_columns.times { batch.each(&:pop) }
84
+ batch.flatten!(1) if columns.size == 1
85
+
86
+ yield batch
87
+
88
+ break if batch.length < batch_limit
89
+
90
+ if @relation.limit_value
91
+ remaining -= batch.length
92
+
93
+ if remaining == 0
94
+ # Saves a useless iteration when the limit is a multiple of the
95
+ # batch size.
96
+ break
97
+ elsif remaining < batch_limit
98
+ relation = relation.limit(remaining)
99
+ end
100
+ end
101
+
102
+ batch_relation = batch_condition(
103
+ relation, cursor_columns, cursor_column_offsets, order == :desc ? :lt : :gt
104
+ )
105
+ end
106
+ end
107
+
108
+ private
109
+ def cursor_column_indexes(columns, cursor_column)
110
+ cursor_column.map do |column|
111
+ columns.index(column) ||
112
+ columns.index("#{@klass.table_name}.#{column}") ||
113
+ columns.index("#{@klass.quoted_table_name}.#{@klass.connection.quote_column_name(column)}")
114
+ end
115
+ end
116
+
117
+ def act_on_ignored_order(error_on_ignore)
118
+ raise_error =
119
+ if error_on_ignore.nil?
120
+ if ar_version >= 7.0
121
+ ActiveRecord.error_on_ignored_order
122
+ else
123
+ @klass.error_on_ignored_order
124
+ end
125
+ else
126
+ error_on_ignore
127
+ end
128
+
129
+ message = "Scoped order is ignored, it's forced to be batch order."
130
+
131
+ if raise_error
132
+ raise ArgumentError, message
133
+ elsif (logger = ActiveRecord::Base.logger)
134
+ logger.warn(message)
135
+ end
136
+ end
137
+
138
+ def apply_limits(relation, columns, start, finish, order)
139
+ relation = apply_start_limit(relation, columns, start, order) if start
140
+ relation = apply_finish_limit(relation, columns, finish, order) if finish
141
+ relation
142
+ end
143
+
144
+ def apply_start_limit(relation, columns, start, order)
145
+ batch_condition(relation, columns, start, order == :desc ? :lteq : :gteq)
146
+ end
147
+
148
+ def apply_finish_limit(relation, columns, finish, order)
149
+ batch_condition(relation, columns, finish, order == :desc ? :gteq : :lteq)
150
+ end
151
+
152
+ def batch_condition(relation, columns, values, operator)
153
+ columns = Array(columns)
154
+ values = Array(values)
155
+ cursor_positions = columns.zip(values)
156
+
157
+ first_clause_column, first_clause_value = cursor_positions.pop
158
+ where_clause = build_attribute_predicate(first_clause_column, first_clause_value, operator)
159
+
160
+ cursor_positions.reverse_each do |column_name, value|
161
+ where_clause = build_attribute_predicate(column_name, value, %i[lt lteq].include?(operator) ? :lt : :gt).or(
162
+ build_attribute_predicate(column_name, value, :eq).and(where_clause)
163
+ )
164
+ end
165
+
166
+ relation.where(where_clause)
167
+ end
168
+
169
+ def build_attribute_predicate(column, value, operator)
170
+ @relation.bind_attribute(column, value) { |attr, bind| attr.public_send(operator, bind) }
171
+ end
172
+
173
+ def batch_order(cursor_columns, order)
174
+ cursor_columns.map do |column|
175
+ @relation.arel_table[column].public_send(order)
176
+ end
177
+ end
178
+
179
+ def ar_version
180
+ ActiveRecord.version.to_s.to_f
181
+ end
182
+ end
183
+ end
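With several cursor columns, the condition for "rows after the last one seen" is a lexicographic comparison: the first column is strictly beyond the cursor, or it is equal and the remaining columns are beyond it. The sketch below restates that rule on plain Ruby arrays so it can be checked without a database. `beyond_cursor?` is a hypothetical helper, not part of the gem, and it assumes every column is sorted in the same direction.

```ruby
# Hypothetical helper: true when `row` sorts strictly after `cursor`
# (ascending order) or strictly before it (descending order).
# Both arguments are arrays of cursor column values of the same length.
def beyond_cursor?(row, cursor, order: :asc)
  strict = order == :desc ? :< : :>
  pairs = row.zip(cursor)

  # Start from the last column, then wrap each earlier column around it:
  #   col strictly beyond cursor OR (col == cursor AND rest matches)
  last_value, last_cursor = pairs.pop
  matches = last_value.public_send(strict, last_cursor)

  pairs.reverse_each do |value, cursor_value|
    matches = value.public_send(strict, cursor_value) ||
              (value == cursor_value && matches)
  end

  matches
end

rows = [[1, 9], [2, 1], [2, 2], [2, 3], [3, 0]]
cursor = [2, 2]

next_asc = rows.select { |row| beyond_cursor?(row, cursor) }
next_desc = rows.select { |row| beyond_cursor?(row, cursor, order: :desc) }
```

The result agrees with Ruby's own array ordering (`row <=> cursor`), which makes it convenient to test the rule in isolation before expressing it as a SQL predicate.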
data/lib/pluck_in_batches/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ VERSION = "0.1.0"
5
+ end
data/lib/pluck_in_batches.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "active_record"
4
+
5
+ require_relative "pluck_in_batches/iterator"
6
+ require_relative "pluck_in_batches/extensions"
7
+ require_relative "pluck_in_batches/version"
8
+
9
+ module PluckInBatches
10
+ end
11
+
12
+ ActiveSupport.on_load(:active_record) do
13
+ extend(PluckInBatches::Extensions::ModelExtension)
14
+ ActiveRecord::Relation.include(PluckInBatches::Extensions::RelationExtension)
15
+ end
metadata ADDED
@@ -0,0 +1,67 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: pluck_in_batches
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - fatkodima
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2023-05-16 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activerecord
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '6.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '6.0'
27
+ description:
28
+ email:
29
+ - fatkodima123@gmail.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - CHANGELOG.md
35
+ - LICENSE.txt
36
+ - README.md
37
+ - lib/pluck_in_batches.rb
38
+ - lib/pluck_in_batches/extensions.rb
39
+ - lib/pluck_in_batches/iterator.rb
40
+ - lib/pluck_in_batches/version.rb
41
+ homepage: https://github.com/fatkodima/pluck_in_batches
42
+ licenses:
43
+ - MIT
44
+ metadata:
45
+ homepage_uri: https://github.com/fatkodima/pluck_in_batches
46
+ source_code_uri: https://github.com/fatkodima/pluck_in_batches
47
+ changelog_uri: https://github.com/fatkodima/pluck_in_batches/blob/master/CHANGELOG.md
48
+ post_install_message:
49
+ rdoc_options: []
50
+ require_paths:
51
+ - lib
52
+ required_ruby_version: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: 2.7.0
57
+ required_rubygems_version: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ requirements: []
63
+ rubygems_version: 3.4.12
64
+ signing_key:
65
+ specification_version: 4
66
+ summary: Change
67
+ test_files: []