job-iteration 0.9.0

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: e042aee693c7b308e483fd6276a9966894dacf75
+   data.tar.gz: bd8fe09135913110159faaaa7498e0786877650a
+ SHA512:
+   metadata.gz: ef8ca88e85af8844864511165062f1dc75d23b34d8161849ae7db55ddb9f8e70bf53c58cb96465800cb82c717eb34b945c6a55a6f8d8e88861c661a302723f69
+   data.tar.gz: 2854303407e9d67bbc727970b0ee979f74a63917039ffa356efba86ed6b8f3b4eb29a42d56735e9516404d203cd890af8850f84e979f83ff1a581aafb76819d8
@@ -0,0 +1,10 @@
+ /.bundle/
+ /.yardoc
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ .ruby-version
+ .rubocop-http---shopify-github-io-ruby-style-guide-rubocop-yml
@@ -0,0 +1,13 @@
+ inherit_from:
+   - http://shopify.github.io/ruby-style-guide/rubocop.yml
+
+ AllCops:
+   TargetRubyVersion: 2.4.4
+   Exclude:
+     - 'vendor/bundle/**/*'
+ Lint/HandleExceptions:
+   Exclude:
+     - lib/job-iteration.rb
+ Style/GlobalVars:
+   Exclude:
+     - lib/job-iteration/integrations/resque.rb
@@ -0,0 +1,14 @@
+ services:
+   - mysql
+   - redis-server
+ language: ruby
+ rvm:
+   - 2.4.4
+   - 2.5.1
+ before_install:
+   - gem install bundler -v 1.16.0
+   - mysql -e 'CREATE DATABASE job_iteration_test;'
+ script:
+   - bundle exec rake test
+   - bundle exec rubocop
+
@@ -0,0 +1,74 @@
+ # Contributor Covenant Code of Conduct
+
+ ## Our Pledge
+
+ In the interest of fostering an open and welcoming environment, we as
+ contributors and maintainers pledge to making participation in our project and
+ our community a harassment-free experience for everyone, regardless of age, body
+ size, disability, ethnicity, gender identity and expression, level of experience,
+ nationality, personal appearance, race, religion, or sexual identity and
+ orientation.
+
+ ## Our Standards
+
+ Examples of behavior that contributes to creating a positive environment
+ include:
+
+ * Using welcoming and inclusive language
+ * Being respectful of differing viewpoints and experiences
+ * Gracefully accepting constructive criticism
+ * Focusing on what is best for the community
+ * Showing empathy towards other community members
+
+ Examples of unacceptable behavior by participants include:
+
+ * The use of sexualized language or imagery and unwelcome sexual attention or
+   advances
+ * Trolling, insulting/derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or electronic
+   address, without explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+
+ ## Our Responsibilities
+
+ Project maintainers are responsible for clarifying the standards of acceptable
+ behavior and are expected to take appropriate and fair corrective action in
+ response to any instances of unacceptable behavior.
+
+ Project maintainers have the right and responsibility to remove, edit, or
+ reject comments, commits, code, wiki edits, issues, and other contributions
+ that are not aligned to this Code of Conduct, or to ban temporarily or
+ permanently any contributor for other behaviors that they deem inappropriate,
+ threatening, offensive, or harmful.
+
+ ## Scope
+
+ This Code of Conduct applies both within project spaces and in public spaces
+ when an individual is representing the project or its community. Examples of
+ representing a project or community include using an official project e-mail
+ address, posting via an official social media account, or acting as an appointed
+ representative at an online or offline event. Representation of a project may be
+ further defined and clarified by project maintainers.
+
+ ## Enforcement
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported by contacting the project team at shatrov@me.com. All
+ complaints will be reviewed and investigated and will result in a response that
+ is deemed necessary and appropriate to the circumstances. The project team is
+ obligated to maintain confidentiality with regard to the reporter of an incident.
+ Further details of specific enforcement policies may be posted separately.
+
+ Project maintainers who do not follow or enforce the Code of Conduct in good
+ faith may face temporary or permanent repercussions as determined by other
+ members of the project's leadership.
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+ available at [http://contributor-covenant.org/version/1/4][version]
+
+ [homepage]: http://contributor-covenant.org
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ source "https://rubygems.org"
+
+ git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
+
+ # Specify your gem's dependencies in job-iteration.gemspec
+ gemspec
+
+ # for integration testing
+ gem 'sidekiq'
+ gem 'resque'
+
+ gem 'activerecord'
+ gem 'mysql2', '~> 0.4.4'
+ gem 'globalid'
+ gem 'i18n'
+ gem 'redis'
+ gem 'database_cleaner'
+
+ gem 'pry'
+ gem 'mocha'
+
+ gem 'rubocop'
@@ -0,0 +1,113 @@
+ PATH
+   remote: .
+   specs:
+     job-iteration (0.9.0)
+       activejob (~> 5.2)
+
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     activejob (5.2.0)
+       activesupport (= 5.2.0)
+       globalid (>= 0.3.6)
+     activemodel (5.2.0)
+       activesupport (= 5.2.0)
+     activerecord (5.2.0)
+       activemodel (= 5.2.0)
+       activesupport (= 5.2.0)
+       arel (>= 9.0)
+     activesupport (5.2.0)
+       concurrent-ruby (~> 1.0, >= 1.0.2)
+       i18n (>= 0.7, < 2)
+       minitest (~> 5.1)
+       tzinfo (~> 1.1)
+     arel (9.0.0)
+     ast (2.4.0)
+     coderay (1.1.2)
+     concurrent-ruby (1.0.5)
+     connection_pool (2.2.2)
+     database_cleaner (1.7.0)
+     globalid (0.4.1)
+       activesupport (>= 4.2.0)
+     i18n (1.0.1)
+       concurrent-ruby (~> 1.0)
+     jaro_winkler (1.5.1)
+     metaclass (0.0.4)
+     method_source (0.9.0)
+     minitest (5.11.3)
+     mocha (1.5.0)
+       metaclass (~> 0.0.1)
+     mono_logger (1.1.0)
+     multi_json (1.13.1)
+     mustermann (1.0.2)
+     mysql2 (0.4.10)
+     parallel (1.12.1)
+     parser (2.5.1.0)
+       ast (~> 2.4.0)
+     powerpack (0.1.2)
+     pry (0.11.3)
+       coderay (~> 1.1.0)
+       method_source (~> 0.9.0)
+     rack (2.0.5)
+     rack-protection (2.0.3)
+       rack
+     rainbow (3.0.0)
+     rake (10.5.0)
+     redis (4.0.1)
+     redis-namespace (1.6.0)
+       redis (>= 3.0.4)
+     resque (1.27.4)
+       mono_logger (~> 1.0)
+       multi_json (~> 1.0)
+       redis-namespace (~> 1.3)
+       sinatra (>= 0.9.2)
+       vegas (~> 0.1.2)
+     rubocop (0.57.2)
+       jaro_winkler (~> 1.5.1)
+       parallel (~> 1.10)
+       parser (>= 2.5)
+       powerpack (~> 0.1)
+       rainbow (>= 2.2.2, < 4.0)
+       ruby-progressbar (~> 1.7)
+       unicode-display_width (~> 1.0, >= 1.0.1)
+     ruby-progressbar (1.9.0)
+     sidekiq (5.1.3)
+       concurrent-ruby (~> 1.0)
+       connection_pool (~> 2.2, >= 2.2.0)
+       rack-protection (>= 1.5.0)
+       redis (>= 3.3.5, < 5)
+     sinatra (2.0.3)
+       mustermann (~> 1.0)
+       rack (~> 2.0)
+       rack-protection (= 2.0.3)
+       tilt (~> 2.0)
+     thread_safe (0.3.6)
+     tilt (2.0.8)
+     tzinfo (1.2.5)
+       thread_safe (~> 0.1)
+     unicode-display_width (1.4.0)
+     vegas (0.1.11)
+       rack (>= 1.0.0)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   activerecord
+   bundler (~> 1.16)
+   database_cleaner
+   globalid
+   i18n
+   job-iteration!
+   minitest (~> 5.0)
+   mocha
+   mysql2 (~> 0.4.4)
+   pry
+   rake (~> 10.0)
+   redis
+   resque
+   rubocop
+   sidekiq
+
+ BUNDLED WITH
+    1.16.1
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2018 Shopify
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
@@ -0,0 +1,191 @@
+ # Job Iteration API
+
+ Meet Iteration, an extension for [ActiveJob](https://github.com/rails/rails/tree/master/activejob) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
+
+ ## Background
+
+ Imagine the following job:
+
+ ```ruby
+ class SimpleJob < ActiveJob::Base
+   def perform
+     User.find_each do |user|
+       user.notify_about_something
+     end
+   end
+ end
+ ```
+
+ The job would run fairly quickly when you only have a hundred User records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate over, and the job will end up taking hours or even days.
+
+ With frequent deploys and worker restarts, it would mean that a job will be either lost or restarted from the beginning. Some records (especially those at the beginning of the relation) will be processed more than once.
+
+ Cloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours or days. What if AWS diagnoses the instance as unhealthy and restarts it in 5 minutes? What if a Kubernetes pod gets [evicted](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/)? Again, all job progress will be lost.
+
+ Software that is designed for high availability [must be friendly](https://12factor.net/disposability) to interruptions that come from the infrastructure. That's exactly what Iteration brings to ActiveJob. It's been developed at Shopify to safely process long-running jobs in the cloud, and has been working in production since May 2017.
+
+ We recommend watching a [conference talk](https://www.youtube.com/watch?v=XvnWjsmAl60) about the ideas and history behind the Iteration API.
+
+ ## Getting started
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'job-iteration'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ In the job, include the `JobIteration::Iteration` module and describe the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
+
+ ```ruby
+ class NotifyUsersJob < ActiveJob::Base
+   include JobIteration::Iteration
+
+   def build_enumerator(cursor:)
+     enumerator_builder.active_record_on_records(
+       User.all,
+       cursor: cursor,
+     )
+   end
+
+   def each_iteration(user)
+     user.notify_about_something
+   end
+ end
+ ```
+
+ `each_iteration` will be called for each `User` model in the `User.all` relation. The relation will be ordered by primary key, exactly as `find_each` does.
+
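To make the role of primary-key ordering concrete, here is a minimal plain-Ruby sketch that does not use the gem or ActiveRecord. The `records_after` helper and the in-memory `rows` are hypothetical stand-ins, and pairing each row with its id as the cursor is an assumption made for illustration, not a description of the gem's internals.

```ruby
# In-memory stand-in for a table; rows are deliberately out of order.
rows = [{ id: 7 }, { id: 2 }, { id: 5 }, { id: 9 }]

# Hypothetical helper: yields [row, id] pairs ordered by id, skipping every
# row whose id is not greater than the cursor (the last id already processed).
def records_after(rows, cursor:)
  Enumerator.new do |yielder|
    rows.sort_by { |row| row[:id] }
        .select { |row| cursor.nil? || row[:id] > cursor }
        .each { |row| yielder.yield(row, row[:id]) }
  end
end

all_ids = records_after(rows, cursor: nil).map { |row, _id| row[:id] }
resumed_ids = records_after(rows, cursor: 5).map { |row, _id| row[:id] }
```

Because the order is stable, a run that stopped after id 5 can resume with only the rows that come after it, without reprocessing earlier ones.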
+ Check out more examples of Iterations:
+
+ ```ruby
+ class BatchesJob < ActiveJob::Base
+   include JobIteration::Iteration
+
+   def build_enumerator(product_id, cursor:)
+     enumerator_builder.active_record_on_batches(
+       Product.find(product_id).comments,
+       cursor: cursor,
+       batch_size: 100,
+     )
+   end
+
+   def each_iteration(batch_of_comments, product_id)
+     # batch_of_comments will contain batches of 100 records
+     Comment.where(id: batch_of_comments.map(&:id)).update_all(deleted: true)
+   end
+ end
+ ```
+
+ ```ruby
+ class ArrayJob < ActiveJob::Base
+   include JobIteration::Iteration
+
+   def build_enumerator(cursor:)
+     enumerator_builder.array(['build', 'enumerator', 'from', 'any', 'array'], cursor: cursor)
+   end
+
+   def each_iteration(array_element)
+     # use array_element
+   end
+ end
+ ```
+
+ ```ruby
+ class CsvJob < ActiveJob::Base
+   include JobIteration::Iteration
+
+   def build_enumerator(import_id, cursor:)
+     import = Import.find(import_id)
+     JobIteration::CsvEnumerator.new(import.csv).rows(cursor: cursor)
+   end
+
+   def each_iteration(csv_row)
+     # insert csv_row to database
+   end
+ end
+ ```
+
+ ## Guides
+
+ * [Iteration: how it works](guides/iteration-how-it-works.md).
+ * [Best practices](guides/best-practices.md).
+
+ ## Requirements
+
+ ActiveJob is the primary requirement for Iteration. While there's nothing that prevents it, Iteration is not yet compatible with the [vanilla](https://github.com/mperham/sidekiq/wiki/Active-Job) Sidekiq API.
+
+ ### API
+
+ An Iteration job must respond to the `build_enumerator` and `each_iteration` methods. `build_enumerator` must return an [Enumerator](http://ruby-doc.org/core-2.5.1/Enumerator.html) object that respects the `cursor` value.
+
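As a rough illustration of what "respects the `cursor` value" can mean, the sketch below builds an Enumerator over an array in plain Ruby. The helper name `resumable_array_enumerator` is hypothetical, and the convention assumed here (the cursor is the index of the last processed element, and each element is yielded together with its index) is an assumption for the example rather than a statement of the gem's contract.

```ruby
# Hypothetical helper, not part of the gem: returns an Enumerator that
# resumes right after the element at index `cursor` (or from the start
# when the cursor is nil).
def resumable_array_enumerator(items, cursor:)
  start = cursor.nil? ? 0 : cursor + 1
  Enumerator.new do |yielder|
    items.each_with_index.drop(start).each do |item, index|
      yielder.yield(item, index) # index would become the next cursor value
    end
  end
end

fresh = resumable_array_enumerator(%w[a b c d], cursor: nil).to_a
resumed = resumable_array_enumerator(%w[a b c d], cursor: 1).to_a
```

With a `nil` cursor the enumerator covers the whole array; with `cursor: 1` it skips the two elements that were already handled.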
+ ### Sidekiq adapter
+
+ Unless you are running on Heroku, we recommend raising Sidekiq's [timeout](https://github.com/mperham/sidekiq/wiki/Deployment#overview) option from the default 8 seconds to 25-30 seconds, to allow the last `each_iteration` to complete and the worker to shut down gracefully.
+
+ ### Resque adapter
+
+ There are a few configuration assumptions that are required for Iteration to work with Resque. `GRACEFUL_TERM` must be enabled (giving the job the ability to be interrupted gracefully), and we recommend disabling `FORK_PER_JOB` (setting it to `false`).
+
+ ## FAQ
+
+ **Why can't I just iterate in the `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers--without exposing the underlying infrastructure.
+
+ **What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will resume from the next iteration.
+
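The interrupt-and-resume cycle can be modelled in a few lines of plain Ruby. This is a toy sketch, not the gem's implementation: the `store` hash stands in for Redis, `interrupt_after` stands in for a shutdown signal, and `run_job` is a hypothetical name.

```ruby
# Toy model only: runs iterations until done or "interrupted", persisting
# the index of the last completed iteration as the checkpoint.
def run_job(store, job_id, items, processed, interrupt_after: nil)
  cursor = store[job_id]
  start = cursor.nil? ? 0 : cursor + 1
  iterations = 0

  items.each_with_index.drop(start).each do |item, index|
    processed << item # stands in for each_iteration
    iterations += 1
    if interrupt_after && iterations >= interrupt_after
      store[job_id] = index # persist the checkpoint, then stop
      return :interrupted
    end
  end

  store.delete(job_id) # nothing left to resume
  :completed
end

store = {}
processed = []
items = %w[a b c d e]
first_run = run_job(store, "job-1", items, processed, interrupt_after: 2)
checkpoint = store["job-1"]
second_run = run_job(store, "job-1", items, processed)
```

The first run stops after two iterations and leaves a checkpoint behind; the second run picks up at the third element, so every element is processed exactly once.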
+ **What happens with retries?** An interruption of a job does not count as a retry. The iteration that caused the job to fail will be retried, and progress will continue from there.
+
+ **What happens if my iteration takes a long time?** We recommend that a single `each_iteration` take no longer than 30 seconds. In the future, this may raise an exception.
+
+ **Why is it important that `each_iteration` takes less than 30 seconds?** When the job worker is scheduled for restart or shutdown, it gets a notice to finish the remaining unit of work. To guarantee that no progress is lost, we need to make sure that `each_iteration` completes within a reasonable amount of time.
+
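One way to find iterations that exceed this budget during development is to time them. The sketch below is a hypothetical helper in plain Ruby (`timed_iteration` is not provided by the gem); it only measures and warns, and does not interrupt anything.

```ruby
# Hypothetical helper: runs the block, returns the elapsed seconds, and
# prints a warning to stderr when the block ran longer than the budget.
def timed_iteration(budget_seconds: 30)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  if elapsed > budget_seconds
    warn("iteration took #{elapsed.round(3)}s, over the #{budget_seconds}s budget")
  end
  elapsed
end

within_budget = timed_iteration { sleep(0.01) }
over_budget = timed_iteration(budget_seconds: 0.001) { sleep(0.01) }
```

In a real job, the body of `each_iteration` could be wrapped in such a block while profiling, and the wrapper removed afterwards.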
+ **What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job. We may expose primitives in the future to do this more effectively, but this is not terribly common today. We recommend reading https://goo.gl/UobaaU to learn more about nested operations.
+
+ **Why do I have to use this ugly helper in `build_enumerator`? Why can't you automatically infer it?** This is how the first version of the API worked. We checked the type of the object returned by `build_enumerable`, and depending on whether it was an ActiveRecord Relation or an Array, we used the matching adapter. This caused opaque type branching in Iteration internals and it didn’t allow developers to craft their own Enumerators and control the cursor value. We made a decision to _always_ return an Enumerator instance from `build_enumerator`. Now we provide explicit helpers to convert an ActiveRecord Relation or an Array to an Enumerator, and for more complex iteration flows developers can build their own `Enumerator` objects.
+
+ **What is the difference between Enumerable and Enumerator?** We recommend [this post](http://blog.arkency.com/2014/01/ruby-to-enum-for-enumerator/) to learn more about Enumerators in Ruby.
+
+ **My job has a complex flow. How do I write my own Enumerator?** The Iteration API takes care of persisting the cursor (that you may use to calculate an offset) and controlling the job state. The power of the Enumerator object is that you can use the cursor in any way you want. One example is a cursorless job that pops records from a datastore until the job is interrupted:
+
+ ```ruby
+ class MyJob < ActiveJob::Base
+   include JobIteration::Iteration
+
+   def build_enumerator(cursor:)
+     Enumerator.new do |yielder|
+       # Keep yielding popped elements; the cursor is nil because this job has none.
+       loop do
+         yielder.yield(Redis.lpop("mylist"), nil) # or: Kafka.poll(timeout: 10.seconds)
+       end
+     end
+   end
+
+   def each_iteration(element_from_redis)
+     # ...
+   end
+ end
+ ```
+
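The same cursorless pattern can be tried without Redis. In this minimal sketch an in-memory array stands in for the Redis list, and the enumerator stops once the list is empty; the variable names and the choice to yield a `nil` cursor alongside each element are assumptions made for illustration.

```ruby
queue = %w[first second third] # stands in for the Redis list "mylist"

enumerator = Enumerator.new do |yielder|
  # Array#shift returns nil once the queue is empty, which ends the loop.
  while (element = queue.shift)
    yielder.yield(element, nil) # nil cursor: there is nothing to resume from
  end
end

popped = enumerator.map { |element, _cursor| element }
```

Since every element is removed from the queue as it is yielded, an interrupted run loses no position: whatever is still in the queue is exactly what remains to be processed.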
+ ## Credits
+
+ This project would not be possible without these individuals (in alphabetical order):
+
+ * Daniella Niyonkuru
+ * Emil Stolarsky
+ * Florian Weingarten
+ * Guillaume Malette
+ * Hormoz Kheradmand
+ * Mohamed-Adam Chaieb
+ * Simon Eskildsen
+
+ ## Development
+
+ After checking out the repo, run `bundle install` to install dependencies. Then, run `bundle exec rake test` to run the tests.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/Shopify/job-iteration. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the Job::Iteration project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/kirs/job-iteration/blob/master/CODE_OF_CONDUCT.md).