pluck_in_batches 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: d86f3ce009db02836e820ec434ae17fa43685651a74e85e2260675d7f2aeb945
4
+ data.tar.gz: e27703d5c07b89db1d75adc5331ce597c8b95669575a65596e0e141a44287412
5
+ SHA512:
6
+ metadata.gz: f96ef381074b16ab8cd5fa48ca6f38245cae06ae0213fcb3c1ebb159b7c6712a441bbdf08c5ae36bb63a12bc46d04daa70c38ef2f5bd04462397bf254ab610f0
7
+ data.tar.gz: bd5e5382cbd444f4cc0d9b81bad56792c6a6f436cad3376064d1b1cfe46699c2b129e36b4311cedda26aec827b5a140d7f344ebeb959bb1e22594414e1577d6b
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## master (unreleased)
2
+
3
+ ## 0.1.0 (2023-05-16)
4
+
5
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 fatkodima
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,115 @@
1
+ # PluckInBatches
2
+
3
+ [![Build Status](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/pluck_in_batches/actions/workflows/ci.yml)
4
+
5
+ ActiveRecord comes with `find_each` and `find_in_batches` methods to batch process records from a database.
6
+ ActiveRecord also has the `pluck` method which allows the selection of a set of fields without pulling
7
+ the entire record into memory.
8
+
9
+ This gem combines these ideas and provides `pluck_each` and `pluck_in_batches` methods to allow
10
+ batch processing of plucked fields from the database.
11
+
12
+ It performs half as many SQL queries, allocates up to half the memory, and is up to 2x faster
13
+ (or more, depending on how far your database is from the application) than the available alternative:
14
+
15
+ ```ruby
16
+ # Before
17
+ User.in_batches do |batch|
18
+ emails = batch.pluck(:email)
19
+ # do something with emails
20
+ end
21
+
22
+ # Now, using this gem (up to 2x faster)
23
+ User.pluck_in_batches(:email) do |emails|
24
+ # do something with emails
25
+ end
26
+ ```
27
+
28
+ ## Requirements
29
+
30
+ - Ruby 2.7+
31
+ - ActiveRecord 6+
32
+
33
+ If you need support for older versions, [open an issue](https://github.com/fatkodima/pluck_in_batches/issues/new).
34
+
35
+ ## Installation
36
+
37
+ Add this line to your application's Gemfile:
38
+
39
+ ```ruby
40
+ gem 'pluck_in_batches'
41
+ ```
42
+
43
+ And then execute:
44
+
45
+ ```sh
46
+ $ bundle
47
+ ```
48
+
49
+ Or install it yourself as:
50
+
51
+ ```sh
52
+ $ gem install pluck_in_batches
53
+ ```
54
+
55
+ ## Usage
56
+
57
+ ### `pluck_each`
58
+
59
+ Behaves similarly to ActiveRecord's `find_each` method, but yields each set of values corresponding
60
+ to the specified columns.
61
+
62
+ ```ruby
63
+ # Single column
64
+ User.where(active: true).pluck_each(:email) do |email|
65
+ # do something with email
66
+ end
67
+
68
+ # Multiple columns
69
+ User.where(active: true).pluck_each(:id, :email) do |id, email|
70
+ # do something with id and email
71
+ end
72
+ ```
73
+
74
+ ### `pluck_in_batches`
75
+
76
+ Behaves similarly to ActiveRecord's `in_batches` method, but yields each batch
77
+ of values corresponding to the specified columns.
78
+
79
+ ```ruby
80
+ # Single column
81
+ User.where("age > 21").pluck_in_batches(:email) do |emails|
82
+ jobs = emails.map { |email| PartyReminderJob.new(email) }
83
+ ActiveJob.perform_all_later(jobs)
84
+ end
85
+
86
+ # Multiple columns
87
+ User.pluck_in_batches(:name, :email).with_index do |group, index|
88
+ puts "Processing group ##{index}"
89
+ jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
90
+ ActiveJob.perform_all_later(jobs)
91
+ end
92
+ ```
93
+
94
+ Both methods support the following configuration options:
95
+
96
+ * `:batch_size` - Specifies the size of the batch. Defaults to 1000.
97
+ * `:start` - Specifies the primary key value to start from, inclusive of the value.
98
+ * `:finish` - Specifies the primary key value to end at, inclusive of the value.
99
+ * `:error_on_ignore` - Overrides the application config to specify if an error should be raised when
100
+ an order is present in the relation.
101
+ * `:cursor_column` - Specifies the column(s) on which the iteration should be done.
102
+ This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
103
+ * `:order` - Specifies the cursor column(s) order (can be `:asc` or `:desc`). Defaults to `:asc`.
104
+
105
+ ## Development
106
+
107
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
108
+
109
+ ## Contributing
110
+
111
+ Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/pluck_in_batches.
112
+
113
+ ## License
114
+
115
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
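The README's claim of half as many queries can be illustrated without a database. With `in_batches`, each batch first selects the primary keys, and `batch.pluck(:email)` then runs a second query for those keys; plucking the wanted column together with the cursor column needs only one query per batch. The sketch below is a minimal simulation under stated assumptions: `FakeTable`, `two_step_batches`, and `single_step_batches` are hypothetical names invented for illustration and are not part of the gem or of ActiveRecord.

```ruby
# Hypothetical in-memory "table" that counts how many queries it serves.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows.sort_by { |row| row[:id] }
    @query_count = 0
  end

  # Simulates: SELECT columns FROM rows WHERE id > after ORDER BY id LIMIT limit
  def query(columns, after:, limit:)
    @query_count += 1
    @rows.select { |row| after.nil? || row[:id] > after }
         .first(limit)
         .map { |row| row.values_at(*columns) }
  end

  # Simulates: SELECT columns FROM rows WHERE id IN (ids)
  def query_by_ids(columns, ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row[:id]) }
         .map { |row| row.values_at(*columns) }
  end
end

# Two queries per batch: fetch the ids, then fetch the wanted column for those ids.
def two_step_batches(table, column, batch_size:)
  cursor = nil
  loop do
    ids = table.query([:id], after: cursor, limit: batch_size).flatten
    break if ids.empty?

    yield table.query_by_ids([column], ids).flatten
    break if ids.size < batch_size

    cursor = ids.last
  end
end

# One query per batch: fetch the wanted column and the cursor column together.
def single_step_batches(table, column, batch_size:)
  cursor = nil
  loop do
    rows = table.query([column, :id], after: cursor, limit: batch_size)
    break if rows.empty?

    cursor = rows.last.last   # remember the last id as the next cursor
    yield rows.map(&:first)   # strip the cursor column before yielding
    break if rows.size < batch_size
  end
end

rows = (1..5).map { |id| { id: id, email: "user#{id}@example.com" } }

slow = FakeTable.new(rows)
two_step_batches(slow, :email, batch_size: 2) { |emails| emails }

fast = FakeTable.new(rows)
single_step_batches(fast, :email, batch_size: 2) { |emails| emails }

puts "two-step queries: #{slow.query_count}, single-step queries: #{fast.query_count}"
```

For five rows and a batch size of two, the two-step loop issues six simulated queries and the single-step loop issues three.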
data/lib/pluck_in_batches/extensions.rb ADDED
@@ -0,0 +1,77 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ module Extensions
5
+ module ModelExtension
6
+ delegate :pluck_each, :pluck_in_batches, to: :all
7
+ end
8
+
9
+ module RelationExtension
10
+ # Yields each set of values corresponding to the specified columns that was found
11
+ # by the passed options. If a single column is specified, yields its value; if several
12
+ # columns are specified, yields an array of values.
13
+ #
14
+ # See #pluck_in_batches for all the details.
15
+ #
16
+ def pluck_each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block)
17
+ iterator = Iterator.new(self)
18
+ iterator.each(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
19
+ end
20
+
21
+ # Yields each batch of values corresponding to the specified columns that was found
22
+ # by the passed options as an array.
23
+ #
24
+ # User.where("age > 21").pluck_in_batches(:email) do |emails|
25
+ # jobs = emails.map { |email| PartyReminderJob.new(email) }
26
+ # ActiveJob.perform_all_later(jobs)
27
+ # end
28
+ #
29
+ # If you do not provide a block to #pluck_in_batches, it will return an Enumerator
30
+ # for chaining with other methods:
31
+ #
32
+ # User.pluck_in_batches(:name, :email).with_index do |group, index|
33
+ # puts "Processing group ##{index}"
34
+ # jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
35
+ # ActiveJob.perform_all_later(jobs)
36
+ # end
37
+ #
38
+ # ==== Options
39
+ # * <tt>:batch_size</tt> - Specifies the size of the batch. Defaults to 1000.
40
+ # * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
41
+ # * <tt>:finish</tt> - Specifies the primary key value to end at, inclusive of the value.
42
+ # * <tt>:error_on_ignore</tt> - Overrides the application config to specify if an error should be raised when
43
+ # an order is present in the relation.
44
+ # * <tt>:cursor_column</tt> - Specifies the column(s) on which the iteration should be done.
45
+ # This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
46
+ # * <tt>:order</tt> - Specifies the cursor column(s) order (can be +:asc+ or +:desc+). Defaults to +:asc+.
47
+ #
48
+ # Limits are honored, and if present there is no requirement for the batch
49
+ # size: it can be less than, equal to, or greater than the limit.
50
+ #
51
+ # The options +start+ and +finish+ are especially useful if you want
52
+ # multiple workers dealing with the same processing queue. You can make
53
+ # worker 1 handle all the records between id 1 and 9999 and worker 2
54
+ # handle from 10000 and beyond by setting the +:start+ and +:finish+
55
+ # option on each worker.
56
+ #
57
+ # # Let's process from record 10_000 on.
58
+ # User.pluck_in_batches(:email, start: 10_000) do |emails|
59
+ # jobs = emails.map { |email| PartyReminderJob.new(email) }
60
+ # ActiveJob.perform_all_later(jobs)
61
+ # end
62
+ #
63
+ # NOTE: Order can be ascending (:asc) or descending (:desc). It defaults to
64
+ # ascending on the cursor column, which is the primary key unless +:cursor_column+ is given ("id ASC").
65
+ # This also means that this method only works when the cursor column is
66
+ # orderable (e.g. an integer or string).
67
+ #
68
+ # NOTE: By its nature, batch processing is subject to race conditions if
69
+ # other processes are modifying the database.
70
+ #
71
+ def pluck_in_batches(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block)
72
+ iterator = Iterator.new(self)
73
+ iterator.each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
74
+ end
75
+ end
76
+ end
77
+ end
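The doc comment above describes splitting one processing queue across workers with `:start` and `:finish`. As a rough sketch of how such ranges might be computed, the helper below returns inclusive bounds per worker. `worker_ranges` and its parameters are hypothetical, introduced only for this illustration; the gem does not provide them.

```ruby
# Hypothetical helper: split ids starting at `min_id` into `workers` contiguous,
# inclusive ranges of `span` ids each. The last worker gets finish: nil so that
# it also picks up records beyond the last planned range.
def worker_ranges(min_id:, span:, workers:)
  Array.new(workers) do |i|
    start = min_id + i * span
    finish = i == workers - 1 ? nil : start + span - 1
    { start: start, finish: finish }
  end
end

ranges = worker_ranges(min_id: 1, span: 9_999, workers: 2)
# ranges[0] has start 1 and finish 9999
# ranges[1] has start 10000 and no finish (open-ended)
```

Each worker could then pass its own bounds through, for example `User.pluck_in_batches(:email, **ranges[0]) { |emails| ... }`, since both `:start` and `:finish` are inclusive and default to `nil`.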
data/lib/pluck_in_batches/iterator.rb ADDED
@@ -0,0 +1,183 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ class Iterator # :nodoc:
5
+ def initialize(relation)
6
+ @relation = relation
7
+ @klass = relation.klass
8
+ end
9
+
10
+ def each(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc, &block)
11
+ if columns.empty?
12
+ raise ArgumentError, "Call `pluck_each' with at least one column."
13
+ end
14
+
15
+ if block_given?
16
+ each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do |batch|
17
+ batch.each(&block)
18
+ end
19
+ else
20
+ enum_for(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
21
+ apply_limits(@relation, Array(cursor_column).map(&:to_s), start, finish, order).size
22
+ end
23
+ end
24
+ end
25
+
26
+ def each_batch(*columns, start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, cursor_column: @relation.primary_key, order: :asc)
27
+ if columns.empty?
28
+ raise ArgumentError, "Call `pluck_in_batches' with at least one column."
29
+ end
30
+
31
+ unless order == :asc || order == :desc
32
+ raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
33
+ end
34
+
35
+ pluck_columns = columns.map(&:to_s)
36
+ cursor_columns = Array(cursor_column).map(&:to_s)
37
+ cursor_column_indexes = cursor_column_indexes(pluck_columns, cursor_columns)
38
+ missing_cursor_columns = cursor_column_indexes.count(&:nil?)
39
+ cursor_column_indexes.each_with_index do |column_index, index|
40
+ unless column_index
41
+ cursor_column_indexes[index] = pluck_columns.size
42
+ pluck_columns << cursor_columns[index]
43
+ end
44
+ end
45
+
46
+ relation = @relation
47
+
48
+ unless block_given?
49
+ return to_enum(__callee__, *columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order) do
50
+ total = apply_limits(relation, cursor_columns, start, finish, order).size
51
+ (total - 1).div(batch_size) + 1
52
+ end
53
+ end
54
+
55
+ if relation.arel.orders.present?
56
+ act_on_ignored_order(error_on_ignore)
57
+ end
58
+
59
+ batch_limit = batch_size
60
+ if relation.limit_value
61
+ remaining = relation.limit_value
62
+ batch_limit = remaining if remaining < batch_limit
63
+ end
64
+
65
+ relation = relation.reorder(*batch_order(cursor_columns, order)).limit(batch_limit)
66
+ relation = apply_limits(relation, cursor_columns, start, finish, order)
67
+ relation.skip_query_cache! # Retaining the results in the query cache would undermine the point of batching
68
+ batch_relation = relation
69
+
70
+ loop do
71
+ batch = batch_relation.pluck(*pluck_columns)
72
+ break if batch.empty?
73
+
74
+ cursor_column_offsets =
75
+ if pluck_columns.size == 1
76
+ Array(batch.last)
77
+ else
78
+ cursor_column_indexes.map.with_index do |column_index, index|
79
+ batch.last[column_index || (batch.last.size - cursor_column_indexes.size + index)]
80
+ end
81
+ end
82
+
83
+ missing_cursor_columns.times { batch.each(&:pop) }
84
+ batch.flatten!(1) if columns.size == 1
85
+
86
+ yield batch
87
+
88
+ break if batch.length < batch_limit
89
+
90
+ if @relation.limit_value
91
+ remaining -= batch.length
92
+
93
+ if remaining == 0
94
+ # Saves a useless iteration when the limit is a multiple of the
95
+ # batch size.
96
+ break
97
+ elsif remaining < batch_limit
98
+ relation = relation.limit(remaining)
99
+ end
100
+ end
101
+
102
+ batch_relation = batch_condition(
103
+ relation, cursor_columns, cursor_column_offsets, order == :desc ? :lt : :gt
104
+ )
105
+ end
106
+ end
107
+
108
+ private
109
+ def cursor_column_indexes(columns, cursor_column)
110
+ cursor_column.map do |column|
111
+ columns.index(column) ||
112
+ columns.index("#{@klass.table_name}.#{column}") ||
113
+ columns.index("#{@klass.quoted_table_name}.#{@klass.connection.quote_column_name(column)}")
114
+ end
115
+ end
116
+
117
+ def act_on_ignored_order(error_on_ignore)
118
+ raise_error =
119
+ if error_on_ignore.nil?
120
+ if ar_version >= 7.0
121
+ ActiveRecord.error_on_ignored_order
122
+ else
123
+ @klass.error_on_ignored_order
124
+ end
125
+ else
126
+ error_on_ignore
127
+ end
128
+
129
+ message = "Scoped order is ignored, it's forced to be batch order."
130
+
131
+ if raise_error
132
+ raise ArgumentError, message
133
+ elsif (logger = ActiveRecord::Base.logger)
134
+ logger.warn(message)
135
+ end
136
+ end
137
+
138
+ def apply_limits(relation, columns, start, finish, order)
139
+ relation = apply_start_limit(relation, columns, start, order) if start
140
+ relation = apply_finish_limit(relation, columns, finish, order) if finish
141
+ relation
142
+ end
143
+
144
+ def apply_start_limit(relation, columns, start, order)
145
+ batch_condition(relation, columns, start, order == :desc ? :lteq : :gteq)
146
+ end
147
+
148
+ def apply_finish_limit(relation, columns, finish, order)
149
+ batch_condition(relation, columns, finish, order == :desc ? :gteq : :lteq)
150
+ end
151
+
152
+ def batch_condition(relation, columns, values, operator)
153
+ columns = Array(columns)
154
+ values = Array(values)
155
+ cursor_positions = columns.zip(values)
156
+
157
+ first_clause_column, first_clause_value = cursor_positions.pop
158
+ where_clause = build_attribute_predicate(first_clause_column, first_clause_value, operator)
159
+
160
+ cursor_positions.reverse_each do |column_name, value|
161
+ where_clause = build_attribute_predicate(column_name, value, %i[lt lteq].include?(operator) ? :lt : :gt).or(
162
+ build_attribute_predicate(column_name, value, :eq).and(where_clause)
163
+ )
164
+ end
165
+
166
+ relation.where(where_clause)
167
+ end
168
+
169
+ def build_attribute_predicate(column, value, operator)
170
+ @relation.bind_attribute(column, value) { |attr, bind| attr.public_send(operator, bind) }
171
+ end
172
+
173
+ def batch_order(cursor_columns, order)
174
+ cursor_columns.map do |column|
175
+ @relation.arel_table[column].public_send(order)
176
+ end
177
+ end
178
+
179
+ def ar_version
180
+ ActiveRecord.version.to_s.to_f
181
+ end
182
+ end
183
+ end
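When several cursor columns are given, `batch_condition` nests its predicates into a lexicographic comparison: the last column uses the requested operator, and each earlier column wraps the clause built so far as "strictly beyond the cursor, or equal and the inner clause holds". The sketch below evaluates that same shape on plain Ruby values instead of Arel nodes. `cursor_match?` and `OPERATORS` are hypothetical names for this illustration, and the choice to map both `:lt` and `:lteq` to a strict "less than" on the leading columns is an assumption of the sketch.

```ruby
# Maps the symbolic operators used by the iterator to Ruby comparison methods.
OPERATORS = { gt: :>, gteq: :>=, lt: :<, lteq: :<= }.freeze

# Returns true when `row_values` satisfy `operator` against `cursor_values`
# in lexicographic order (first column most significant).
def cursor_match?(row_values, cursor_values, operator)
  strict = %i[lt lteq].include?(operator) ? :lt : :gt
  pairs = row_values.zip(cursor_values)

  # The last column uses the requested operator as-is.
  last_value, last_cursor = pairs.pop
  result = last_value.public_send(OPERATORS.fetch(operator), last_cursor)

  # Each earlier column wraps the clause built so far:
  #   (column strictly beyond cursor) OR (column == cursor AND inner clause)
  pairs.reverse_each do |value, cursor|
    result = value.public_send(OPERATORS.fetch(strict), cursor) ||
             (value == cursor && result)
  end

  result
end

cursor = [2, 5]
puts cursor_match?([2, 6], cursor, :gt)   # same first column, later second column
puts cursor_match?([3, 1], cursor, :gt)   # later first column wins outright
puts cursor_match?([2, 5], cursor, :gt)   # the cursor row itself is excluded
puts cursor_match?([2, 5], cursor, :gteq) # but included for the inclusive operator
```

For arrays of integers the result agrees with Ruby's own `Array#<=>`, which is a convenient way to check the nesting.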
data/lib/pluck_in_batches/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PluckInBatches
4
+ VERSION = "0.1.0"
5
+ end
data/lib/pluck_in_batches.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "active_record"
4
+
5
+ require_relative "pluck_in_batches/iterator"
6
+ require_relative "pluck_in_batches/extensions"
7
+ require_relative "pluck_in_batches/version"
8
+
9
+ module PluckInBatches
10
+ end
11
+
12
+ ActiveSupport.on_load(:active_record) do
13
+ extend(PluckInBatches::Extensions::ModelExtension)
14
+ ActiveRecord::Relation.include(PluckInBatches::Extensions::RelationExtension)
15
+ end
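The load hook wires the gem in two places: relations gain the batching methods through `include`, and model classes gain them through `extend`, where the call is forwarded to `all`. The following database-free sketch shows that wiring pattern in plain Ruby. Everything in it (`FakeRelation`, `FakeModel`, and the simplified top-level `RelationExtension` and `ModelExtension` modules) is a hypothetical stand-in; the real gem uses ActiveSupport's `delegate` and the `Iterator` class rather than this code.

```ruby
# Simplified stand-in for the relation-level extension.
module RelationExtension
  def pluck_in_batches(column, batch_size: 1000)
    # Without a block, return an Enumerator so calls can be chained.
    return enum_for(__method__, column, batch_size: batch_size) unless block_given?

    rows.each_slice(batch_size) do |slice|
      yield slice.map { |row| row.fetch(column) }
    end
  end
end

# Simplified stand-in for the model-level extension: forward to `all`.
module ModelExtension
  def pluck_in_batches(*columns, **options, &block)
    all.pluck_in_batches(*columns, **options, &block)
  end
end

# Hypothetical relation that just wraps an array of row hashes.
class FakeRelation
  include RelationExtension
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end
end

# Hypothetical model whose `all` returns a relation over fixed rows.
class FakeModel
  extend ModelExtension

  ROWS = [
    { email: "a@example.com" },
    { email: "b@example.com" },
    { email: "c@example.com" }
  ].freeze

  def self.all
    FakeRelation.new(ROWS)
  end
end

batches = FakeModel.pluck_in_batches(:email, batch_size: 2).to_a
puts batches.inspect
```

Keeping the logic on the relation and only forwarding from the model means scoped calls such as `User.where(...).pluck_in_batches(...)` and bare `User.pluck_in_batches(...)` share one implementation.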
metadata ADDED
@@ -0,0 +1,67 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: pluck_in_batches
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - fatkodima
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2023-05-16 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activerecord
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '6.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '6.0'
27
+ description:
28
+ email:
29
+ - fatkodima123@gmail.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - CHANGELOG.md
35
+ - LICENSE.txt
36
+ - README.md
37
+ - lib/pluck_in_batches.rb
38
+ - lib/pluck_in_batches/extensions.rb
39
+ - lib/pluck_in_batches/iterator.rb
40
+ - lib/pluck_in_batches/version.rb
41
+ homepage: https://github.com/fatkodima/pluck_in_batches
42
+ licenses:
43
+ - MIT
44
+ metadata:
45
+ homepage_uri: https://github.com/fatkodima/pluck_in_batches
46
+ source_code_uri: https://github.com/fatkodima/pluck_in_batches
47
+ changelog_uri: https://github.com/fatkodima/pluck_in_batches/blob/master/CHANGELOG.md
48
+ post_install_message:
49
+ rdoc_options: []
50
+ require_paths:
51
+ - lib
52
+ required_ruby_version: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: 2.7.0
57
+ required_rubygems_version: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ requirements: []
63
+ rubygems_version: 3.4.12
64
+ signing_key:
65
+ specification_version: 4
66
+ summary: Change
67
+ test_files: []