plines 0.5.0
- data/Gemfile +12 -0
- data/LICENSE +22 -0
- data/README.md +420 -0
- data/Rakefile +61 -0
- data/lib/plines.rb +13 -0
- data/lib/plines/configuration.rb +55 -0
- data/lib/plines/dependency_graph.rb +81 -0
- data/lib/plines/dynamic_struct.rb +34 -0
- data/lib/plines/enqueued_job.rb +120 -0
- data/lib/plines/external_dependency_timeout.rb +30 -0
- data/lib/plines/indifferent_hash.rb +58 -0
- data/lib/plines/job.rb +88 -0
- data/lib/plines/job_batch.rb +363 -0
- data/lib/plines/job_batch_list.rb +57 -0
- data/lib/plines/job_enqueuer.rb +83 -0
- data/lib/plines/pipeline.rb +97 -0
- data/lib/plines/redis_objects.rb +108 -0
- data/lib/plines/step.rb +269 -0
- data/lib/plines/version.rb +3 -0
- metadata +192 -0
data/Gemfile
ADDED
@@ -0,0 +1,12 @@
+source 'https://rubygems.org'
+
+# Specify your gem's dependencies in plines.gemspec
+gemspec
+
+gem 'qless', git: 'git://github.com/seomoz/qless.git', branch: 'unified'
+
+group :extras do
+  gem 'debugger', platform: :mri
+end
+
+gem 'rspec-fire', git: 'git://github.com/xaviershay/rspec-fire.git'
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2012 Myron Marston
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,420 @@
|
|
1
|
+
# Plines
|
2
|
+
|
3
|
+
Plines creates job pipelines out of a complex set of step dependencies.
|
4
|
+
It's intended to maximize the efficiency and throughput of the jobs
|
5
|
+
(ensuring jobs are run as soon as their dependencies have been met)
|
6
|
+
while minimizing the amount of "glue" code you have to write to make it
|
7
|
+
work.
|
8
|
+
|
9
|
+
Plines is built on top of [Qless](https://github.com/seomoz/qless) and
|
10
|
+
[Redis](http://redis.io/).
|
11
|
+
|
12
|
+
## Installation
|
13
|
+
|
14
|
+
Add this line to your application's Gemfile:
|
15
|
+
|
16
|
+
gem 'plines'
|
17
|
+
|
18
|
+
And then execute:
|
19
|
+
|
20
|
+
$ bundle
|
21
|
+
|
22
|
+
Or install it yourself as:
|
23
|
+
|
24
|
+
$ gem install plines
|
25
|
+
|
26
|
+
## Getting Started
|
27
|
+
|
28
|
+
First, create a pipeline using the `Plines::Pipeline` module:
|
29
|
+
|
30
|
+
``` ruby
|
31
|
+
module MyProcessingPipeline
|
32
|
+
extend Plines::Pipeline
|
33
|
+
|
34
|
+
configure do |config|
|
35
|
+
# configuration goes here; see below for available options
|
36
|
+
end
|
37
|
+
end
|
38
|
+
```
|
39
|
+
|
40
|
+
`MyProcessingPipeline` will function both as the namespace for your
|
41
|
+
pipeline steps and also as a singleton holding some state for your
|
42
|
+
pipeline.
|
43
|
+
|
44
|
+
Next, define some pipeline steps. Your steps should be simple Ruby
|
45
|
+
classes that extend the `Plines::Step` module and define a `perform`
|
46
|
+
method:
|
47
|
+
|
48
|
+
``` ruby
|
49
|
+
module MyProcessingPipeline
|
50
|
+
class CountWidgets
|
51
|
+
extend Plines::Step
|
52
|
+
|
53
|
+
def perform
|
54
|
+
# do some work
|
55
|
+
end
|
56
|
+
end
|
57
|
+
end
|
58
|
+
```
|
59
|
+
|
60
|
+
The `Plines::Step` module makes available some class-level
|
61
|
+
macros for declaring step dependency relationships. See the **Step Class
|
62
|
+
DSL** section below for more details.
|
63
|
+
|
64
|
+
Once you've defined all your steps, you can enqueue jobs for them:
|
65
|
+
|
66
|
+
``` ruby
|
67
|
+
MyProcessingPipeline.enqueue_jobs_for("some" => "data", "goes" => "here")
|
68
|
+
```
|
69
|
+
|
70
|
+
`MyProcessingPipeline.enqueue_jobs_for` will enqueue a full set of qless
|
71
|
+
jobs (or a `JobBatch` in Plines terminology) for the given batch data
|
72
|
+
based on your step classes' macro declarations.
|
73
|
+
|
74
|
+
## Configuring a Pipeline
|
75
|
+
|
76
|
+
Plines supports configuration at the pipeline level:
|
77
|
+
|
78
|
+
``` ruby
|
79
|
+
module MyProcessingPipeline
|
80
|
+
extend Plines::Pipeline
|
81
|
+
|
82
|
+
configure do |config|
|
83
|
+
# Determines how job batches are identified. Plines provides an API
|
84
|
+
# to find the most recent existing job batch based on this key.
|
85
|
+
config.batch_list_key { |batch_data| batch_data.fetch(:user_id) }
|
86
|
+
|
87
|
+
# Sets the Qless client to use. If you have only one Qless server,
|
88
|
+
# have the block return a client for it. If you're sharding your
|
89
|
+
# Qless usage, you can have the block return a client based on the
|
90
|
+
# given batch list key.
|
91
|
+
config.qless_client do |user_id|
|
92
|
+
Qless::Client.new(redis: RedisShard.for(user_id))
|
93
|
+
end
|
94
|
+
|
95
|
+
# Determines how long the Plines job batch data will be kept around
|
96
|
+
# in redis after the batch reaches a final state (cancelled or
|
97
|
+
# completed). By default, this is set to 6 months, but you
|
98
|
+
# will probably want to set it to something shorter (like 2 weeks)
|
99
|
+
config.data_ttl_in_seconds = 14 * 24 * 60 * 60
|
100
|
+
|
101
|
+
# Provides a hook that gets called when job batches are cancelled.
|
102
|
+
# Use this to perform any cleanup in your system.
|
103
|
+
config.after_job_batch_cancellation do |job_batch|
|
104
|
+
# do some cleanup
|
105
|
+
end
|
106
|
+
|
107
|
+
# Use this callback to set additional global qless job
|
108
|
+
# options (such as queue, tags and priority). You can also set
|
109
|
+
# options on an individual step class (see below).
|
110
|
+
config.qless_job_options do |job|
|
111
|
+
{ tags: [job.data[:user_id]] }
|
112
|
+
end
|
113
|
+
end
|
114
|
+
end
|
115
|
+
```
|
116
|
+
|
117
|
+
## The Step Class DSL
|
118
|
+
|
119
|
+
An example will help illustrate the Step class DSL. (Note that this
|
120
|
+
example omits the `perform` method declarations for brevity).
|
121
|
+
|
122
|
+
``` ruby
|
123
|
+
module MakeThanksgivingDinner
|
124
|
+
extend Plines::Pipeline
|
125
|
+
|
126
|
+
class BuyGroceries
|
127
|
+
extend Plines::Step
|
128
|
+
|
129
|
+
# Indicates that the BuyGroceries step must run before all other steps.
|
130
|
+
# Essentially creates an implicit dependency of all steps on this one.
|
131
|
+
# You can have only one step declare `depended_on_by_all_steps`.
|
132
|
+
# Doing this relieves you of the burden of having to add
|
133
|
+
# `depends_on :BuyGroceries` to all step definitions.
|
134
|
+
depended_on_by_all_steps
|
135
|
+
end
|
136
|
+
|
137
|
+
# This step depends on BuyGroceries automatically due to the
|
138
|
+
# depended_on_by_all_steps declaration above.
|
139
|
+
class MakeStuffing
|
140
|
+
extend Plines::Step
|
141
|
+
|
142
|
+
# qless_options lets you set qless job options for this step.
|
143
|
+
qless_options do |qless|
|
144
|
+
# By default, jobs are enqueued to the :plines queue, but you can override that here.
|
145
|
+
# Options set in a Plines::Step take precedence over those configured on the Plines::Pipeline.
|
146
|
+
qless.queue = :make_stuffing
|
147
|
+
qless.tags = [:foo, :bar]
|
148
|
+
qless.priority = -10
|
149
|
+
qless.retries = 7
|
150
|
+
end
|
151
|
+
end
|
152
|
+
|
153
|
+
class PickupTurkey
|
154
|
+
extend Plines::Step
|
155
|
+
|
156
|
+
# External dependencies are named things that must be resolved
|
157
|
+
# before this step is allowed to proceed. They are intended for
|
158
|
+
# use when a step has a dependency on data from an external
|
159
|
+
# asynchronous system that operates on its own schedule.
|
160
|
+
has_external_dependencies do |deps, job_data|
|
161
|
+
deps.add "await_turkey_is_ready_for_pickup_notice", wait_up_to: 12.hours
|
162
|
+
end
|
163
|
+
end
|
164
|
+
|
165
|
+
class PrepareTurkey
|
166
|
+
extend Plines::Step
|
167
|
+
|
168
|
+
# Declares that the PrepareTurkey job cannot run until the
|
169
|
+
# PickupTurkey has run first. Note that the step class name
|
170
|
+
# is relative to the pipeline module namespace.
|
171
|
+
depends_on :PickupTurkey
|
172
|
+
end
|
173
|
+
|
174
|
+
class MakePie
|
175
|
+
extend Plines::Step
|
176
|
+
|
177
|
+
# By default, a single instance of a step will get enqueued in a
|
178
|
+
# pipeline job batch. The `fan_out` macro can be used to get multiple
|
179
|
+
# instances of the same step in a single job batch, each with
|
180
|
+
# different arguments.
|
181
|
+
#
|
182
|
+
# In this example, we will have multiple `MakePie` steps--one for
|
183
|
+
# each pie type, each with a different pie type argument.
|
184
|
+
fan_out do |batch_data|
|
185
|
+
batch_data['pie_types'].map do |type|
|
186
|
+
{ 'pie_type' => type, 'family' => batch_data['family'] }
|
187
|
+
end
|
188
|
+
end
|
189
|
+
|
190
|
+
# Makes each instance of this step depend on the prior one,
|
191
|
+
# to ensure no two instances run in parallel. This isn't usually
|
192
|
+
# needed, but is occasionally useful to prevent resource contention
|
193
|
+
# when these jobs operate on a common resource.
|
194
|
+
run_jobs_in_serial
|
195
|
+
end
|
196
|
+
|
197
|
+
class AddWhipCreamToPie
|
198
|
+
extend Plines::Step
|
199
|
+
|
200
|
+
fan_out do |batch_data|
|
201
|
+
batch_data['pie_types'].map do |type|
|
202
|
+
{ 'pie_type' => type, 'family' => batch_data['family'] }
|
203
|
+
end
|
204
|
+
end
|
205
|
+
|
206
|
+
# By default, `depends_on` makes all instances of this step depend on all
|
207
|
+
# instances of the named step. If you only want it to depend on some
|
208
|
+
# instances of the named step, pass a block; the instances of this step
|
209
|
+
# will only depend on the MakePie jobs for which the pie_type is the same.
|
210
|
+
depends_on :MakePie do |add_whip_cream_data, make_pie_data|
|
211
|
+
add_whip_cream_data['pie_type'] == make_pie_data['pie_type']
|
212
|
+
end
|
213
|
+
end
|
214
|
+
|
215
|
+
class SetTable
|
216
|
+
extend Plines::Step
|
217
|
+
|
218
|
+
# Indicates that this step should run last. This relieves you
|
219
|
+
# from the burden of having to add an extra `depends_on` declaration
|
220
|
+
# for each new step you create.
|
221
|
+
depends_on_all_steps
|
222
|
+
end
|
223
|
+
end
|
224
|
+
```
|
225
|
+
|
226
|
+
## Enqueuing Jobs
|
227
|
+
|
228
|
+
To enqueue a job batch, use `#enqueue_jobs_for`:
|
229
|
+
|
230
|
+
``` ruby
|
231
|
+
MakeThanksgivingDinner.enqueue_jobs_for(
|
232
|
+
"family" => "Smith",
|
233
|
+
"pie_types" => %w[ apple pumpkin pecan ]
|
234
|
+
)
|
235
|
+
```
|
236
|
+
|
237
|
+
The argument given to `enqueue_jobs_for` _must_ be a hash. This
|
238
|
+
hash will be yielded to the `fan_out` blocks. In addition, this hash
|
239
|
+
(or the one returned by a `fan_out` block) will be available as
|
240
|
+
`#job_data` in a step's `#perform` method.
|
241
|
+
|
242
|
+
Based on the `MakeThanksgivingDinner` example above, the following jobs
|
243
|
+
will be enqueued in this batch:
|
244
|
+
|
245
|
+
* 1 BuyGroceries job
|
246
|
+
* 1 MakeStuffing job
|
247
|
+
* 1 PickupTurkey job
|
248
|
+
* 1 PrepareTurkey job
|
249
|
+
* 3 MakePie jobs, each with slightly different arguments (1 each with
|
250
|
+
"apple", "pumpkin" and "pecan")
|
251
|
+
* 3 AddWhipCreamToPie jobs, each with slightly different arguments (1
|
252
|
+
each with "apple", "pumpkin" and "pecan")
|
253
|
+
* 1 SetTable job
|
254
|
+
|
255
|
+
The declared dependencies will be honored as well:
|
256
|
+
|
257
|
+
* BuyGroceries is guaranteed to run first.
|
258
|
+
* MakeStuffing and the 3 MakePie jobs will be available for processing
|
259
|
+
immediately after the BuyGroceries job has finished.
|
260
|
+
* The 3 AddWhipCreamToPie jobs will be available for processing once
|
261
|
+
their corresponding MakePie jobs have completed.
|
262
|
+
* PickupTurkey will not run until the
|
263
|
+
`"await_turkey_is_ready_for_pickup_notice"` external dependency is
|
264
|
+
fulfilled (see below for more details).
|
265
|
+
* PrepareTurkey will be available for processing once the PickupTurkey
|
266
|
+
job has finished.
|
267
|
+
* SetTable will wait to be processed until all other jobs are complete.
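
The pairing between `AddWhipCreamToPie` and `MakePie` jobs can be sketched in plain Ruby. This is only an illustration of what the `fan_out` and `depends_on` blocks compute, not Plines' actual implementation; the lambdas `fan_out` and `same_pie` and the `dependencies` hash are hypothetical names introduced for the example.

``` ruby
batch_data = { 'family' => 'Smith', 'pie_types' => %w[apple pumpkin pecan] }

# Mirrors the fan_out blocks: one job-data hash per pie type.
fan_out = lambda do |data|
  data['pie_types'].map do |type|
    { 'pie_type' => type, 'family' => data['family'] }
  end
end

make_pie_jobs   = fan_out.call(batch_data)
whip_cream_jobs = fan_out.call(batch_data)

# Mirrors the block passed to `depends_on :MakePie`.
same_pie = lambda do |add_whip_cream_data, make_pie_data|
  add_whip_cream_data['pie_type'] == make_pie_data['pie_type']
end

# For each whip-cream job, keep only the MakePie jobs the block accepts.
dependencies = whip_cream_jobs.each_with_object({}) do |whip, result|
  matching = make_pie_jobs.select { |pie| same_pie.call(whip, pie) }
  result[whip['pie_type']] = matching.map { |pie| pie['pie_type'] }
end

dependencies.each do |pie_type, depends_on|
  puts "AddWhipCreamToPie(#{pie_type}) waits for MakePie(#{depends_on.join(', ')})"
end
```

Each whip-cream job ends up with exactly one matching `MakePie` job. Without the block, every whip-cream job would wait for all three.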
|
268
|
+
|
269
|
+
## Working With Job Batches
|
270
|
+
|
271
|
+
Plines stores data about the batch in redis. It also provides a
|
272
|
+
first-class `JobBatch` object that allows you to work with job batches.
|
273
|
+
|
274
|
+
First, you need to configure the pipeline so that it knows how your
|
275
|
+
batches are identified:
|
276
|
+
|
277
|
+
``` ruby
|
278
|
+
MakeThanksgivingDinner.configure do |config|
|
279
|
+
config.batch_list_key do |batch_data|
|
280
|
+
batch_data["family"]
|
281
|
+
end
|
282
|
+
end
|
283
|
+
```
|
284
|
+
|
285
|
+
Once this is in place, you can find a particular job batch:
|
286
|
+
|
287
|
+
``` ruby
|
288
|
+
job_batch = MakeThanksgivingDinner.most_recent_job_batch_for("family" => "Smith")
|
289
|
+
```
|
290
|
+
|
291
|
+
The `batch_list_key` config option above means the job batch will be
|
292
|
+
keyed by the "family" entry in the batch data hash. Thus, you can easily
|
293
|
+
look up a job batch by giving it a hash with the same "family" entry.
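
As a rough mental model, the key block is applied both when a batch is enqueued and when one is looked up. The sketch below is a hypothetical in-memory stand-in (`BatchIndex` is not a Plines class, and the real data lives in Redis); it only shows why a hash with the same "family" entry finds the latest batch.

``` ruby
# Hypothetical in-memory stand-in for a keyed list of batches.
class BatchIndex
  def initialize(&key_block)
    @key_block = key_block
    @batches = Hash.new { |hash, key| hash[key] = [] }
  end

  # Files the batch data under the key computed by the block.
  def enqueue(batch_data)
    @batches[@key_block.call(batch_data)] << batch_data
    batch_data
  end

  # Applies the same block to the lookup hash and returns the newest match.
  def most_recent_for(lookup_data)
    @batches[@key_block.call(lookup_data)].last
  end
end

index = BatchIndex.new { |batch_data| batch_data["family"] }
index.enqueue("family" => "Smith", "pie_types" => %w[apple])
index.enqueue("family" => "Smith", "pie_types" => %w[apple pumpkin pecan])
index.enqueue("family" => "Jones", "pie_types" => %w[cherry])

latest = index.most_recent_for("family" => "Smith")
puts latest["pie_types"].join(", ")
```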
|
294
|
+
|
295
|
+
Once you have a job batch, there are several things you can do with it:
|
296
|
+
|
297
|
+
``` ruby
|
298
|
+
# returns whether or not the job batch is finished.
|
299
|
+
job_batch.complete?
|
300
|
+
|
301
|
+
# returns the data hash that was used to enqueue the job batch
|
302
|
+
job_batch.data
|
303
|
+
|
304
|
+
# cancels all remaining jobs in this batch
|
305
|
+
job_batch.cancel!
|
306
|
+
|
307
|
+
# Resolves the named external dependency. For the example above,
|
308
|
+
# calling this will allow the PickupTurkey job to proceed.
|
309
|
+
job_batch.resolve_external_dependency "await_turkey_is_ready_for_pickup_notice"
|
310
|
+
```
|
311
|
+
|
312
|
+
Plines sets expiration on the redis keys it uses to track job batches as
|
313
|
+
soon as the job batch is completed or cancelled. By default, the
|
314
|
+
expiration is set to 6 months. You can configure it if you wish to
|
315
|
+
shorten it:
|
316
|
+
|
317
|
+
``` ruby
|
318
|
+
MakeThanksgivingDinner.configure do |config|
|
319
|
+
config.data_ttl_in_seconds = 14 * 24 * 60 * 60 # 2 weeks
|
320
|
+
end
|
321
|
+
```
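
The TTL is a plain number of seconds, so the arithmetic can be spelled out with a small helper. `SECONDS_PER_DAY` and `days_in_seconds` are hypothetical names used only for this sketch:

``` ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Converts a whole number of days into seconds.
def days_in_seconds(days)
  days * SECONDS_PER_DAY
end

two_weeks = days_in_seconds(14)
puts two_weeks # 1209600
```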
|
322
|
+
|
323
|
+
## External Dependency Timeouts
|
324
|
+
|
325
|
+
Under normal configuration, no job will run until all of its
|
326
|
+
dependencies have been met. However, Plines provides support
|
327
|
+
for timing out an external dependency:
|
328
|
+
|
329
|
+
``` ruby
|
330
|
+
module MyPipeline
|
331
|
+
class MyStep
|
332
|
+
extend Plines::Step
|
333
|
+
has_external_dependencies do |deps, job_data|
|
334
|
+
deps.add "my_async_service", wait_up_to: 3.hours
|
335
|
+
end
|
336
|
+
end
|
337
|
+
end
|
338
|
+
```
|
339
|
+
|
340
|
+
With this configuration, Plines will schedule a Qless job to run in
|
341
|
+
3 hours that will time out the `"my_async_service"` external dependency,
|
342
|
+
allowing the `MyStep` job to run without the dependency being resolved.
|
343
|
+
|
344
|
+
## Performing Work
|
345
|
+
|
346
|
+
When a job gets run, the `#perform` instance method of your step class
|
347
|
+
will be called. The return value of your perform method is ignored.
|
348
|
+
The perform method will have access to a few helper methods:
|
349
|
+
|
350
|
+
``` ruby
|
351
|
+
module MakeThanksgivingDinner
|
352
|
+
class MakeStuffing
|
353
|
+
extend Plines::Step
|
354
|
+
|
355
|
+
def perform
|
356
|
+
# job_data gives you a struct-like object that is built off of
|
357
|
+
# your job_data hash
|
358
|
+
job_data.family # => returns "Smith" for our example
|
359
|
+
|
360
|
+
# The job_batch instance this job is a part of is available as
|
361
|
+
# well, so you can do things like cancel the batch.
|
362
|
+
job_batch.cancel!
|
363
|
+
|
364
|
+
# The underlying qless job is available as `qless_job`
|
365
|
+
qless_job.heartbeat
|
366
|
+
|
367
|
+
# An external dependency may be unresolved if it timed out (see above).
|
368
|
+
# #unresolved_external_dependencies returns an array of symbols,
|
369
|
+
# listing the external dependencies that are unresolved.
|
370
|
+
#
|
371
|
+
# Note that this does not necessarily indicate whether or not an
|
372
|
+
# external dependency timed out; it may have timed out, but then
|
373
|
+
# got resolved before this job ran.
|
374
|
+
# In addition, pending external dependencies are included (e.g.
|
375
|
+
# if the job was manually moved into the processing queue)
|
376
|
+
if unresolved_external_dependencies.any?
|
377
|
+
# do something different because there's an unresolved dependency
|
378
|
+
end
|
379
|
+
end
|
380
|
+
end
|
381
|
+
end
|
382
|
+
```
|
383
|
+
|
384
|
+
Plines also supports a middleware stack that wraps your `perform` method.
|
385
|
+
To create a middleware, define a module with an `around_perform` method:
|
386
|
+
|
387
|
+
``` ruby
|
388
|
+
module TimeWork
|
389
|
+
def around_perform
|
390
|
+
start_time = Time.now
|
391
|
+
|
392
|
+
# Use super at the point the work should occur...
|
393
|
+
super
|
394
|
+
|
395
|
+
end_time = Time.now
|
396
|
+
log_time(end_time - start_time)
|
397
|
+
end
|
398
|
+
end
|
399
|
+
```
|
400
|
+
|
401
|
+
Then, include the module in your step class:
|
402
|
+
|
403
|
+
``` ruby
|
404
|
+
module MakeThanksgivingDinner
|
405
|
+
class MakeStuffing
|
406
|
+
include TimeWork
|
407
|
+
end
|
408
|
+
end
|
409
|
+
```
|
410
|
+
|
411
|
+
You can include as many middleware modules as you like.
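
Ruby resolves `super` through the ancestor chain, so the module included last runs outermost. Below is a minimal sketch in plain Ruby, assuming a base `around_perform` that simply calls `perform`. `BaseAroundPerform`, `TraceInner`, `TraceOuter` and `ExampleStep` are hypothetical stand-ins, not part of Plines.

``` ruby
# Hypothetical stand-in for the innermost around_perform.
module BaseAroundPerform
  def around_perform
    perform
  end
end

module TraceInner
  def around_perform
    events << :inner_before
    super
    events << :inner_after
  end
end

module TraceOuter
  def around_perform
    events << :outer_before
    super
    events << :outer_after
  end
end

class ExampleStep
  include BaseAroundPerform
  include TraceInner
  include TraceOuter # included last, so it wraps everything else

  def events
    @events ||= []
  end

  def perform
    events << :perform
  end
end

step = ExampleStep.new
step.around_perform
p step.events
# [:outer_before, :inner_before, :perform, :inner_after, :outer_after]
```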
|
412
|
+
|
413
|
+
## Contributing
|
414
|
+
|
415
|
+
1. Fork it
|
416
|
+
2. Create your feature branch (`git checkout -b my-new-feature`)
|
417
|
+
3. Commit your changes (`git commit -am 'Added some feature'`)
|
418
|
+
4. Push to the branch (`git push origin my-new-feature`)
|
419
|
+
5. Create new Pull Request
|
420
|
+
|