job-iteration 1.1.11 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 18324439fe7b98c7f1bca543c078ddbfdff18af5a7e97a87b21b48890e919769
4
- data.tar.gz: 3d912ea06a5a66ee841fbd605e8dfc414bd1760c61d9edf74445e79ae16a6466
3
+ metadata.gz: eff37be02274e3c0db10de1cf3e49a10eea5b5b838b19800d143853aee94c08f
4
+ data.tar.gz: '085dc8b7eec8897393461e43fb4ece8acaddc6705165a2b6de0856cef54b1215'
5
5
  SHA512:
6
- metadata.gz: 622368d1208ea23014188c028f832f51f899b6bdfbca8938c6396dcca48f913679ee7f41a035c5cbe0376346ea40a583c4615bd665251cf7dc36a30aeb37904a
7
- data.tar.gz: 51931d565cffb4600141e1e96dc9a1a6b8a522589c8b0b38dc4cd1846e8d59a9f150d0d0bfcde4aaee587478af57418093a9e313b2f17c9f2c54ffb6cf18382a
6
+ metadata.gz: 22352326237ca16a23fc4177611187d4c6d426c07397caecd09165303726ce9ba00acaa882e35fb580b19bbe53f056ddde642298268a329dd02c88f61f82b5b7
7
+ data.tar.gz: 583406e5180f2721895673aa0410a390dce378c9c30e625151b5455fa8c1331867e94cc1cf2518b8b29cb0fbe908af517ee88a88429fd5f71f34a80617f5d8e6
data/.github/workflows/ci.yml CHANGED
@@ -13,11 +13,9 @@ jobs:
13
13
  - 6379:6379
14
14
  strategy:
15
15
  matrix:
16
- ruby: [2.5, 2.6, 2.7, 3.0]
16
+ ruby: [2.6, 2.7, 3.0]
17
17
  gemfile: [rails_5_2, rails_6_0, rails_edge]
18
18
  exclude:
19
- - ruby: 2.5
20
- gemfile: rails_edge
21
19
  - ruby: 2.6
22
20
  gemfile: rails_edge
23
21
  - ruby: 3.0
data/.rubocop.yml CHANGED
@@ -2,7 +2,7 @@ inherit_gem:
2
2
  rubocop-shopify: rubocop.yml
3
3
 
4
4
  AllCops:
5
- TargetRubyVersion: 2.4.4
5
+ TargetRubyVersion: 2.6.5
6
6
  Exclude:
7
7
  - 'vendor/bundle/**/*'
8
8
  Lint/SuppressedException:
data/CHANGELOG.md CHANGED
@@ -1,15 +1,35 @@
1
1
  ### Master (unreleased)
2
2
 
3
+ ## v1.2.0 (Sept 21, 2021)
4
+ - [107](https://github.com/Shopify/job-iteration/pull/107) - Remove broken links from README
5
+ - [108](https://github.com/Shopify/job-iteration/pull/108) - Drop support for ruby 2.5
6
+ - [110](https://github.com/Shopify/job-iteration/pull/110) - Update rubocop TargetRubyVersion
7
+
8
+ ## v1.1.14 (May 28, 2021)
9
+
10
+ #### Bug fix
11
+ - [84](https://github.com/Shopify/job-iteration/pull/84) - Call adjust_total_time before running on_complete callbacks
12
+ - [94](https://github.com/Shopify/job-iteration/pull/94) - Remove unnecessary break
13
+ - [95](https://github.com/Shopify/job-iteration/pull/95) - ActiveRecordBatchEnumerator#each should rewind at the end
14
+ - [97](https://github.com/Shopify/job-iteration/pull/97) - Batch enumerator size returns the number of batches, not records
15
+
16
+ ## v1.1.13 (May 20, 2021)
17
+
3
18
  #### New feature
19
+ - [91](https://github.com/Shopify/job-iteration/pull/91) - Add enumerator yielding batches as Active Record Relations
20
+
21
+ ## v1.1.12 (April 19, 2021)
4
22
 
5
23
  #### Bug fix
6
24
 
25
+ - [77](https://github.com/Shopify/job-iteration/pull/77) - Defer enforce cursor be serializable until 2.0.0
7
26
 
8
27
  ## v1.1.11 (April 19, 2021)
9
28
 
10
29
  #### Bug fix
11
30
 
12
31
  - [73](https://github.com/Shopify/job-iteration/pull/73) - Enforce cursor be serializable
32
+ _This is reverted in 1.1.12 as it breaks behaviour in some apps._
13
33
 
14
34
  ## v1.1.10 (March 30, 2021)
15
35
 
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Job Iteration API
2
2
 
3
- [![Build Status](https://travis-ci.com/Shopify/job-iteration.svg?branch=master)](https://travis-ci.com/Shopify/job-iteration)
3
+ [![CI](https://github.com/Shopify/job-iteration/actions/workflows/ci.yml/badge.svg)](https://github.com/Shopify/job-iteration/actions/workflows/ci.yml)
4
4
 
5
5
  Meet Iteration, an extension for [ActiveJob](https://github.com/rails/rails/tree/master/activejob) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
6
6
 
@@ -77,7 +77,28 @@ class BatchesJob < ApplicationJob
77
77
 
78
78
  def each_iteration(batch_of_comments, product_id)
79
79
  # batch_of_comments will contain batches of 100 records
80
- Comment.where(id: batch_of_comments.map(&:id)).update_all(deleted: true)
80
+ batch_of_comments.each do |comment|
81
+ DeleteCommentJob.perform_later(comment)
82
+ end
83
+ end
84
+ end
85
+ ```
86
+
87
+ ```ruby
88
+ class BatchesAsRelationJob < ApplicationJob
89
+ include JobIteration::Iteration
90
+
91
+ def build_enumerator(product_id, cursor:)
92
+ enumerator_builder.active_record_on_batch_relations(
93
+ Product.find(product_id).comments,
94
+ cursor: cursor,
95
+ batch_size: 100,
96
+ )
97
+ end
98
+
99
+ def each_iteration(batch_of_comments, product_id)
100
+ # batch_of_comments will be a Comment::ActiveRecord_Relation
101
+ batch_of_comments.update_all(deleted: true)
81
102
  end
82
103
  end
83
104
  ```
@@ -150,7 +171,7 @@ There a few configuration assumptions that are required for Iteration to work wi
150
171
 
151
172
  **Why is it important that `each_iteration` takes less than 30 seconds?** When the job worker is scheduled for restart or shutdown, it gets a notice to finish remaining unit of work. To guarantee that no progress is lost we need to make sure that `each_iteration` completes within a reasonable amount of time.
152
173
 
153
- **What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job, which will run your nested business logic. We may expose primitives in the future to do this more effectively, but this is not terribly common today. We recommend to read https://goo.gl/UobaaU to learn more about nested operations.
174
+ **What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job, which will run your nested business logic. We may expose primitives in the future to do this more effectively, but this is not terribly common today.
154
175
 
155
176
  **Why do I have to use this ugly helper in `build_enumerator`? Why can't you automatically infer it?** This is how the first version of the API worked. We checked the type of object returned by `build_enumerable`, and depending on whether it was an ActiveRecord Relation or an Array, we used the matching adapter. This caused opaque type branching in Iteration internals and it didn’t allow developers to craft their own Enumerators and control the cursor value. We made a decision to _always_ return an Enumerator instance from `build_enumerator`. Now we provide explicit helpers to convert an ActiveRecord Relation or an Array to an Enumerator, and for more complex iteration flows developers can build their own `Enumerator` objects.
156
177
 
data/bin/test ADDED
@@ -0,0 +1,32 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ def main
+   begin
+     command = create_command
+   rescue ArgumentError => e
+     abort(e.message)
+   end
+   puts "Running #{command.join(" ")}"
+   system(*command)
+ end
+
+ def create_command
+   case ARGV.length
+   when 0
+     ["bundle", "exec", "rake", "test"]
+   when 1
+     filename = ARGV[0]
+     ["bundle", "exec", "rake", "test", "TEST=#{filename}"]
+   when 2
+     filename = ARGV[0]
+     test_name = ARGV[1]
+     test_name_with_underscores = test_name.tr(" ", "_")
+     test_name_pattern = "/#{Regexp.escape(test_name_with_underscores)}/"
+     ["bundle", "exec", "rake", "test", "TEST=#{filename}", "TESTOPTS=\"--name=#{test_name_pattern} -v\""]
+   else
+     raise ArgumentError, "Too many arguments. Did you forget to put the test name in quotes?"
+   end
+ end
+
+ main
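Note on `bin/test`: the two-argument branch turns a human-readable test name into a regular-expression pattern for the test runner's `--name` option. The sketch below isolates that step so it can be tried on its own; the helper name `test_name_pattern` is hypothetical and does not exist in the gem.

```ruby
# Hypothetical helper mirroring the two-argument branch of bin/test.
# Spaces become underscores, then regexp metacharacters are escaped
# so the name is matched literally inside the /.../ pattern.
def test_name_pattern(test_name)
  with_underscores = test_name.tr(" ", "_")
  "/#{Regexp.escape(with_underscores)}/"
end

puts test_name_pattern("that it has a version number")
puts test_name_pattern("version_number")
```

Because the result is a pattern and not an exact name, a fragment such as `version_number` selects every test whose name contains it.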
data/dev.yml CHANGED
@@ -13,7 +13,43 @@ up:
13
13
  - custom:
14
14
  name: Create Job Iteration database
15
15
  meet: mysql -uroot -h job-iteration.railgun -e "CREATE DATABASE job_iteration_test"
16
- met?: mysql -uroot -h job-iteration.railgun job_iteration_test -e "SELECT 1"
16
+ met?: mysql -uroot -h job-iteration.railgun job_iteration_test -e "SELECT 1" &> /dev/null
17
17
 
18
18
  commands:
19
- test: bundle exec rake
19
+ test:
20
+ run: bin/test "$@"
21
+ syntax:
22
+ optional: filename testnamepattern
23
+ aliases: [t]
24
+ desc: run tests
25
+ long_desc: |
26
+ {{bold:Default}}
27
+ =======
28
+ Run the entire test suite.
29
+
30
+ Examples:
31
+ {{command:dev test}}
32
+ {{command:dev t}}
33
+
34
+ {{bold:Run all tests in a file}}
35
+ ========================
36
+ Include the file path.
37
+
38
+ Example:
39
+ {{command:dev test test/unit/iteration_test.rb}}
40
+
41
+ {{bold:Run a single test in a given file}}
42
+ ========================
43
+ Include the file path and the name of the test you'd like to run.
44
+
45
+ Example:
46
+ {{command:dev test test/unit/iteration_test.rb test_that_it_has_a_version_number}}
47
+
48
+ {{bold:Run all tests in a given file whose name contains a string}}
49
+ ========================
50
+ Include the file path and the string that the test names should contain.
51
+
52
+ Example:
53
+ {{command:dev test test/unit/iteration_test.rb version_number}}
54
+ style:
55
+ run: bundle exec rubocop -a
data/job-iteration.gemspec CHANGED
@@ -5,9 +5,10 @@ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
5
5
  require "job-iteration/version"
6
6
 
7
7
  Gem::Specification.new do |spec|
8
+ spec.required_ruby_version = ">= 2.6"
8
9
  spec.name = "job-iteration"
9
10
  spec.version = JobIteration::VERSION
10
- spec.authors = %w(Shopify)
11
+ spec.authors = ["Shopify"]
11
12
  spec.email = ["ops-accounts+shipit@shopify.com"]
12
13
 
13
14
  spec.summary = "Makes your background jobs interruptible and resumable."
@@ -20,7 +21,7 @@ Gem::Specification.new do |spec|
20
21
  end
21
22
  spec.bindir = "exe"
22
23
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
23
- spec.require_paths = %w(lib)
24
+ spec.require_paths = ["lib"]
24
25
 
25
26
  spec.metadata["changelog_uri"] = "https://github.com/Shopify/job-iteration/blob/master/CHANGELOG.md"
26
27
  spec.metadata["allowed_push_host"] = "https://rubygems.org"
data/lib/job-iteration/active_record_batch_enumerator.rb ADDED
@@ -0,0 +1,117 @@
+ # frozen_string_literal: true
+
+ module JobIteration
+   # Builds Batch Enumerator based on ActiveRecord Relation.
+   # @see EnumeratorBuilder
+   class ActiveRecordBatchEnumerator
+     include Enumerable
+
+     SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
+
+     def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
+       @batch_size = batch_size
+       @primary_key = "#{relation.table_name}.#{relation.primary_key}"
+       @columns = Array(columns&.map(&:to_s) || @primary_key)
+       @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
+       @pluck_columns = if @primary_key_index
+         @columns
+       else
+         @columns.dup << @primary_key
+       end
+       @cursor = Array.wrap(cursor)
+       @initial_cursor = @cursor
+       raise ArgumentError, "Must specify at least one column" if @columns.empty?
+       if relation.joins_values.present? && !@columns.all? { |column| column.to_s.include?(".") }
+         raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
+       end
+
+       if relation.arel.orders.present? || relation.arel.taken.present?
+         raise ConditionNotSupportedError
+       end
+
+       @base_relation = relation.reorder(@columns.join(","))
+     end
+
+     def each
+       return to_enum { size } unless block_given?
+       while (relation = next_batch)
+         yield relation, cursor_value
+       end
+     end
+
+     def size
+       (@base_relation.count + @batch_size - 1) / @batch_size # ceiling division
+     end
+
+     private
+
+     def next_batch
+       relation = @base_relation.limit(@batch_size)
+       if conditions.any?
+         relation = relation.where(*conditions)
+       end
+
+       cursor_values, ids = relation.uncached do
+         pluck_columns(relation)
+       end
+
+       cursor = cursor_values.last
+       unless cursor.present?
+         @cursor = @initial_cursor
+         return
+       end
+       # The primary key was plucked, but original cursor did not include it, so we should remove it
+       cursor.pop unless @primary_key_index
+       @cursor = Array.wrap(cursor)
+
+       # Yields relations by selecting the primary keys of records in the batch.
+       # Post.where(published: nil) results in an enumerator of relations like: Post.where(id: batch_of_ids)
+       @base_relation.where(@primary_key => ids)
+     end
+
+     def pluck_columns(relation)
+       if @pluck_columns.size == 1 # only the primary key
+         column_values = relation.pluck(*@pluck_columns)
+         return [column_values, column_values]
+       end
+
+       column_values = relation.pluck(*@pluck_columns)
+       primary_key_index = @primary_key_index || -1
+       primary_key_values = column_values.map { |values| values[primary_key_index] }
+
+       serialize_column_values!(column_values)
+       [column_values, primary_key_values]
+     end
+
+     def cursor_value
+       return @cursor.first if @cursor.size == 1
+       @cursor
+     end
+
+     def conditions
+       column_index = @cursor.size - 1
+       column = @columns[column_index]
+       where_clause = if @columns.size == @cursor.size
+         "#{column} > ?"
+       else
+         "#{column} >= ?"
+       end
+       while column_index > 0
+         column_index -= 1
+         column = @columns[column_index]
+         where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
+       end
+       ret = @cursor.reduce([where_clause]) { |params, value| params << value << value }
+       ret.pop
+       ret
+     end
+
+     def serialize_column_values!(column_values)
+       column_values.map! { |values| values.map! { |value| column_value(value) } }
+     end
+
+     def column_value(value)
+       value.is_a?(Time) ? value.strftime(SQL_DATETIME_WITH_NSEC) : value
+     end
+   end
+ end
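Note on the private `conditions` method: it builds a keyset-style predicate from the cursor columns, nesting one `OR (... AND (...))` level per extra column. The following is a standalone sketch of the same logic with the instance variables turned into arguments; the function name `cursor_conditions` is an assumption made for illustration and is not part of the gem.

```ruby
# Standalone copy of the predicate-building logic, for experimentation only.
# Returns [where_clause, *bind_values], or [] when the cursor is empty.
def cursor_conditions(columns, cursor)
  column_index = cursor.size - 1
  column = columns[column_index]
  # A full cursor resumes strictly after the last row; a partial one uses >=.
  where_clause = columns.size == cursor.size ? "#{column} > ?" : "#{column} >= ?"
  while column_index > 0
    column_index -= 1
    column = columns[column_index]
    where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
  end
  # Every value is bound twice (for > and =), except the innermost one.
  params = cursor.reduce([where_clause]) { |acc, value| acc << value << value }
  params.pop
  params
end

p cursor_conditions(["created_at", "id"], ["2021-01-01", 42])
# => ["created_at > ? OR (created_at = ? AND (id > ?))", "2021-01-01", "2021-01-01", 42]
p cursor_conditions(["id"], [])
# => []
```

With an empty cursor the trailing `pop` removes the clause itself, which is why `next_batch` can guard on `conditions.any?` for the first batch.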
data/lib/job-iteration/enumerator_builder.rb CHANGED
@@ -1,4 +1,5 @@
1
1
  # frozen_string_literal: true
2
+ require_relative "./active_record_batch_enumerator"
2
3
  require_relative "./active_record_enumerator"
3
4
  require_relative "./csv_enumerator"
4
5
  require_relative "./throttle_enumerator"
@@ -86,6 +87,11 @@ module JobIteration
86
87
  # WHERE (created_at > '$LAST_CREATED_AT_CURSOR'
87
88
  # OR (created_at = '$LAST_CREATED_AT_CURSOR' AND (id > '$LAST_ID_CURSOR')))
88
89
  # ORDER BY created_at, id LIMIT 100
90
+ #
91
+ # As a result of this query pattern, if the values in these columns change for the records in scope during
92
+ # iteration, they may be skipped or yielded multiple times depending on the nature of the update and the
93
+ # cursor's value. If the value gets updated to a greater value than the cursor's value, it will get yielded
94
+ # again. Similarly, if the value gets updated to a lesser value than the cursor's value, it will get skipped.
89
95
  def build_active_record_enumerator_on_records(scope, cursor:, **args)
90
96
  enum = build_active_record_enumerator(
91
97
  scope,
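Note on the new comment about changing column values: the effect can be reproduced without a database. The sketch below is a simplified in-memory model, not the gem's implementation; it uses a single hypothetical `position` column as the cursor, and the names `Record`, `ids_yielded`, and `fresh_records` are invented for the example.

```ruby
# In-memory stand-in for a table with a cursor column.
Record = Struct.new(:id, :position)

# Walks the records in batches ordered by position, resuming each batch
# strictly after the last position seen, and returns the ids in yield order.
def ids_yielded(records, batch_size:)
  cursor = nil
  yielded = []
  loop do
    batch = records
      .select { |record| cursor.nil? || record.position > cursor }
      .sort_by(&:position)
      .first(batch_size)
    break if batch.empty?

    cursor = batch.last.position
    yielded.concat(batch.map(&:id))
    yield batch # the caller may mutate records here, as a job might
  end
  yielded
end

def fresh_records
  (1..4).map { |n| Record.new(n, n) }
end

# Record 1 is moved ahead of the cursor after it was processed.
records = fresh_records
twice = ids_yielded(records, batch_size: 2) { records[0].position = 10 }
p twice # => [1, 2, 3, 4, 1]

# Record 4 is moved behind the cursor before it was reached.
records = fresh_records
skipped = ids_yielded(records, batch_size: 2) { records[3].position = 0 }
p skipped # => [1, 2, 3]
```

Iterating on a column that the job itself does not modify (commonly the primary key) avoids both outcomes.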
@@ -95,7 +101,7 @@ module JobIteration
95
101
  wrap(self, enum)
96
102
  end
97
103
 
98
- # Builds Enumerator from Active Record Relation and enumerates on batches.
104
+ # Builds Enumerator from Active Record Relation and enumerates on batches of records.
99
105
  # Each Enumerator tick moves the cursor +batch_size+ rows forward.
100
106
  #
101
107
  # +batch_size:+ sets how many records will be fetched in one batch. Defaults to 100.
@@ -110,6 +116,16 @@ module JobIteration
110
116
  wrap(self, enum)
111
117
  end
112
118
 
119
+ # Builds Enumerator from Active Record Relation and enumerates on batches, yielding Active Record Relations.
120
+ # See documentation for #build_active_record_enumerator_on_batches.
121
+ def build_active_record_enumerator_on_batch_relations(scope, cursor:, **args)
122
+ JobIteration::ActiveRecordBatchEnumerator.new(
123
+ scope,
124
+ cursor: cursor,
125
+ **args
126
+ ).each
127
+ end
128
+
113
129
  def build_throttle_enumerator(enum, throttle_on:, backoff:)
114
130
  JobIteration::ThrottleEnumerator.new(
115
131
  enum,
@@ -124,6 +140,7 @@ module JobIteration
124
140
  alias_method :array, :build_array_enumerator
125
141
  alias_method :active_record_on_records, :build_active_record_enumerator_on_records
126
142
  alias_method :active_record_on_batches, :build_active_record_enumerator_on_batches
143
+ alias_method :active_record_on_batch_relations, :build_active_record_enumerator_on_batch_relations
127
144
  alias_method :throttle, :build_throttle_enumerator
128
145
 
129
146
  private
data/lib/job-iteration/iteration.rb CHANGED
@@ -142,7 +142,8 @@ module JobIteration
142
142
  arguments = arguments.dup.freeze
143
143
  found_record = false
144
144
  enumerator.each do |object_from_enumerator, index|
145
- assert_valid_cursor!(index)
145
+ # Deferred until 2.0.0
146
+ # assert_valid_cursor!(index)
146
147
 
147
148
  record_unit_of_work do
148
149
  found_record = true
@@ -161,6 +162,8 @@ module JobIteration
161
162
  "times_interrupted=#{times_interrupted} cursor_position=#{cursor_position}"
162
163
  ) unless found_record
163
164
 
165
+ adjust_total_time
166
+
164
167
  true
165
168
  end
166
169
 
@@ -248,8 +251,6 @@ module JobIteration
248
251
  end
249
252
 
250
253
  def output_interrupt_summary
251
- adjust_total_time
252
-
253
254
  message = "[JobIteration::Iteration] Completed iterating. times_interrupted=%d total_time=%.3f"
254
255
  logger.info(Kernel.format(message, times_interrupted, total_time))
255
256
  end
data/lib/job-iteration/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module JobIteration
4
- VERSION = "1.1.11"
4
+ VERSION = "1.2.0"
5
5
  end
data/lib/job-iteration.rb CHANGED
@@ -38,16 +38,14 @@ module JobIteration
38
38
  def load_integrations
39
39
  loaded = nil
40
40
  INTEGRATIONS.each do |integration|
41
- begin
42
- load_integration(integration)
43
- if loaded
44
- raise IntegrationLoadError,
45
- "#{loaded} integration has already been loaded, but #{integration} is also available. " \
46
- "Iteration will only work with one integration."
47
- end
48
- loaded = integration
49
- rescue LoadError
41
+ load_integration(integration)
42
+ if loaded
43
+ raise IntegrationLoadError,
44
+ "#{loaded} integration has already been loaded, but #{integration} is also available. " \
45
+ "Iteration will only work with one integration."
50
46
  end
47
+ loaded = integration
48
+ rescue LoadError
51
49
  end
52
50
  end
53
51
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: job-iteration
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.11
4
+ version: 1.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shopify
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2021-04-19 00:00:00.000000000 Z
11
+ date: 2021-09-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activerecord
@@ -56,6 +56,7 @@ files:
56
56
  - README.md
57
57
  - Rakefile
58
58
  - bin/setup
59
+ - bin/test
59
60
  - dev.yml
60
61
  - gemfiles/rails_5_2.gemfile
61
62
  - gemfiles/rails_6_0.gemfile
@@ -66,6 +67,7 @@ files:
66
67
  - guides/throttling.md
67
68
  - job-iteration.gemspec
68
69
  - lib/job-iteration.rb
70
+ - lib/job-iteration/active_record_batch_enumerator.rb
69
71
  - lib/job-iteration/active_record_cursor.rb
70
72
  - lib/job-iteration/active_record_enumerator.rb
71
73
  - lib/job-iteration/csv_enumerator.rb
@@ -91,14 +93,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
91
93
  requirements:
92
94
  - - ">="
93
95
  - !ruby/object:Gem::Version
94
- version: '0'
96
+ version: '2.6'
95
97
  required_rubygems_version: !ruby/object:Gem::Requirement
96
98
  requirements:
97
99
  - - ">="
98
100
  - !ruby/object:Gem::Version
99
101
  version: '0'
100
102
  requirements: []
101
- rubygems_version: 3.0.3
103
+ rubygems_version: 3.2.20
102
104
  signing_key:
103
105
  specification_version: 4
104
106
  summary: Makes your background jobs interruptible and resumable.