plines 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/Gemfile +12 -0
- data/LICENSE +22 -0
- data/README.md +420 -0
- data/Rakefile +61 -0
- data/lib/plines.rb +13 -0
- data/lib/plines/configuration.rb +55 -0
- data/lib/plines/dependency_graph.rb +81 -0
- data/lib/plines/dynamic_struct.rb +34 -0
- data/lib/plines/enqueued_job.rb +120 -0
- data/lib/plines/external_dependency_timeout.rb +30 -0
- data/lib/plines/indifferent_hash.rb +58 -0
- data/lib/plines/job.rb +88 -0
- data/lib/plines/job_batch.rb +363 -0
- data/lib/plines/job_batch_list.rb +57 -0
- data/lib/plines/job_enqueuer.rb +83 -0
- data/lib/plines/pipeline.rb +97 -0
- data/lib/plines/redis_objects.rb +108 -0
- data/lib/plines/step.rb +269 -0
- data/lib/plines/version.rb +3 -0
- metadata +192 -0
data/Gemfile
ADDED
@@ -0,0 +1,12 @@
source 'https://rubygems.org'

# Specify your gem's dependencies in plines.gemspec
gemspec

gem 'qless', git: 'git://github.com/seomoz/qless.git', branch: 'unified'

group :extras do
  gem 'debugger', platform: :mri
end

gem 'rspec-fire', git: 'git://github.com/xaviershay/rspec-fire.git'
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2012 Myron Marston

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,420 @@
# Plines

Plines creates job pipelines out of a complex set of step dependencies.
It's intended to maximize the efficiency and throughput of the jobs
(ensuring jobs are run as soon as their dependencies have been met)
while minimizing the amount of "glue" code you have to write to make it
work.

Plines is built on top of [Qless](https://github.com/seomoz/qless) and
[Redis](http://redis.io/).

## Installation

Add this line to your application's Gemfile:

    gem 'plines'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install plines

## Getting Started

First, create a pipeline using the `Plines::Pipeline` module:

``` ruby
module MyProcessingPipeline
  extend Plines::Pipeline

  configure do |config|
    # configuration goes here; see below for available options
  end
end
```

`MyProcessingPipeline` will function both as the namespace for your
pipeline steps and also as a singleton holding some state for your
pipeline.

Next, define some pipeline steps. Your steps should be simple ruby
classes that extend the `Plines::Step` module and define a `perform`
method:

``` ruby
module MyProcessingPipeline
  class CountWidgets
    extend Plines::Step

    def perform
      # do some work
    end
  end
end
```

The `Plines::Step` module makes available some class-level
macros for declaring step dependency relationships. See the **Step Class
DSL** section below for more details.

Once you've defined all your steps, you can enqueue jobs for them:

``` ruby
MyProcessingPipeline.enqueue_jobs_for("some" => "data", "goes" => "here")
```

`MyProcessingPipeline.enqueue_jobs_for` will enqueue a full set of qless
jobs (or a `JobBatch` in Plines terminology) for the given batch data
based on your step classes' macro declarations.

## Configuring a Pipeline

Plines supports configuration at the pipeline level:

``` ruby
module MyProcessingPipeline
  extend Plines::Pipeline

  configure do |config|
    # Determines how job batches are identified. Plines provides an API
    # to find the most recent existing job batch based on this key.
    config.batch_list_key { |batch_data| batch_data.fetch(:user_id) }

    # Sets the Qless client to use. If you have only one Qless server,
    # have the block return a client for it. If you're sharding your
    # Qless usage, you can have the block return a client based on the
    # given batch list key.
    config.qless_client do |user_id|
      Qless::Client.new(redis: RedisShard.for(user_id))
    end

    # Determines how long the Plines job batch data will be kept around
    # in redis after the batch reaches a final state (cancelled or
    # completed). By default, this is set to 6 months, but you
    # will probably want to set it to something shorter (like 2 weeks)
    config.data_ttl_in_seconds = 14 * 24 * 60 * 60

    # Provides a hook that gets called when job batches are cancelled.
    # Use this to perform any cleanup in your system.
    config.after_job_batch_cancellation do |job_batch|
      # do some cleanup
    end

    # Use this callback to set additional global qless job
    # options (such as queue, tags and priority). You can also set
    # options on an individual step class (see below).
    config.qless_job_options do |job|
      { tags: [job.data[:user_id]] }
    end
  end
end
```
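
The configuration above refers to a `RedisShard.for` helper that the README does not define. As a rough sketch of one way such a helper could choose a shard, the example below maps a batch list key to one of several redis URLs. The module body, the `url_for` method and the URLs are all assumptions made for illustration; they are not part of Plines or Qless.

```ruby
require 'zlib'

# Hypothetical stand-in for the README's undefined `RedisShard` helper.
# The shard URLs are made up for this sketch.
module RedisShard
  URLS = %w[
    redis://shard-0.example.com:6379
    redis://shard-1.example.com:6379
    redis://shard-2.example.com:6379
  ].freeze

  # CRC32 is stable across processes (unlike Object#hash), so a given
  # batch list key always maps to the same shard.
  def self.url_for(batch_list_key)
    URLS[Zlib.crc32(batch_list_key.to_s) % URLS.size]
  end
end

# The TTL setting is plain arithmetic on seconds: 14 days of 24 hours.
TWO_WEEKS_IN_SECONDS = 14 * 24 * 60 * 60

puts RedisShard.url_for(42)
puts TWO_WEEKS_IN_SECONDS # => 1209600
```

Whatever scheme is used, the important property is that the same batch list key always resolves to the same client, since later lookups for that key must hit the shard that holds the batch.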

## The Step Class DSL

An example will help illustrate the Step class DSL. (Note that this
example omits the `perform` method declarations for brevity).

``` ruby
module MakeThanksgivingDinner
  extend Plines::Pipeline

  class BuyGroceries
    extend Plines::Step

    # Indicates that the BuyGroceries step must run before all other steps.
    # Essentially creates an implicit dependency of all steps on this one.
    # You can have only one step declare `depended_on_by_all_steps`.
    # Doing this relieves you of the burden of having to add
    # `depends_on :BuyGroceries` to all step definitions.
    depended_on_by_all_steps
  end

  # This step depends on BuyGroceries automatically due to the
  # depended_on_by_all_steps declaration above.
  class MakeStuffing
    extend Plines::Step

    # qless_options lets you set qless job options for this step.
    qless_options do |qless|
      # By default, jobs are enqueued to the :plines queue but you can override it
      # Plines::Step overrides here will override any configurations in a Plines::Pipeline class
      qless.queue = :make_stuffing
      qless.tags = [:foo, :bar]
      qless.priority = -10
      qless.retries = 7
    end
  end

  class PickupTurkey
    extend Plines::Step

    # External dependencies are named things that must be resolved
    # before this step is allowed to proceed. They are intended for
    # use when a step has a dependency on data from an external
    # asynchronous system that operates on its own schedule.
    has_external_dependencies do |deps, job_data|
      deps.add "await_turkey_is_ready_for_pickup_notice", wait_up_to: 12.hours
    end
  end

  class PrepareTurkey
    extend Plines::Step

    # Declares that the PrepareTurkey job cannot run until the
    # PickupTurkey has run first. Note that the step class name
    # is relative to the pipeline module namespace.
    depends_on :PickupTurkey
  end

  class MakePie
    extend Plines::Step

    # By default, a single instance of a step will get enqueued in a
    # pipeline job batch. The `fan_out` macro can be used to get multiple
    # instances of the same step in a single job batch, each with
    # different arguments.
    #
    # In this example, we will have multiple `MakePie` steps--one for
    # each pie type, each with a different pie type argument.
    fan_out do |batch_data|
      batch_data['pie_types'].map do |type|
        { 'pie_type' => type, 'family' => batch_data['family'] }
      end
    end

    # Makes each instance of this step depend on the prior one,
    # to ensure no two instances run in parallel. This isn't usually
    # needed, but is occasionally useful to prevent resource contention
    # when these jobs operate on a common resource.
    run_jobs_in_serial
  end

  class AddWhipCreamToPie
    extend Plines::Step

    fan_out do |batch_data|
      batch_data['pie_types'].map do |type|
        { 'pie_type' => type, 'family' => batch_data['family'] }
      end
    end

    # By default, `depends_on` makes all instances of this step depend on all
    # instances of the named step. If you only want it to depend on some
    # instances of the named step, pass a block; the instances of this step
    # will only depend on the MakePie jobs for which the pie_type is the same.
    depends_on :MakePie do |add_whip_cream_data, make_pie_data|
      add_whip_cream_data['pie_type'] == make_pie_data['pie_type']
    end
  end

  class SetTable
    extend Plines::Step

    # Indicates that this step should run last. This relieves you
    # from the burden of having to add an extra `depends_on` declaration
    # for each new step you create.
    depends_on_all_steps
  end
end
```

## Enqueuing Jobs

To enqueue a job batch, use `#enqueue_jobs_for`:

``` ruby
MakeThanksgivingDinner.enqueue_jobs_for(
  "family" => "Smith",
  "pie_types" => %w[ apple pumpkin pecan ]
)
```

The argument given to `enqueue_jobs_for` _must_ be a hash. This
hash will be yielded to the `fan_out` blocks. In addition, this hash
(or the one returned by a `fan_out` block) will be available as
`#job_data` in a step's `#perform` method.
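
To make the hand-off concrete, here is a small sketch of what a `fan_out` block returns for this batch data. The block body is copied from the `MakePie` step; wrapping it in a lambda and calling it directly, outside of Plines, is done here only for illustration.

```ruby
# Same block body as MakePie's fan_out, held in a lambda so it can be
# called without Plines (illustration only).
fan_out = lambda do |batch_data|
  batch_data['pie_types'].map do |type|
    { 'pie_type' => type, 'family' => batch_data['family'] }
  end
end

batch_data = { 'family' => 'Smith', 'pie_types' => %w[apple pumpkin pecan] }

# One hash per pie type; each hash would become the job data of one job.
job_data_hashes = fan_out.call(batch_data)
job_data_hashes.each { |job_data| p job_data }
```

Each returned hash carries only the keys the block chose to copy, so a step that fans out sees `pie_type` and `family` but not the original `pie_types` array.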

Based on the `MakeThanksgivingDinner` example above, the following jobs
will be enqueued in this batch:

* 1 BuyGroceries job
* 1 MakeStuffing job
* 1 PickupTurkey job
* 1 PrepareTurkey job
* 3 MakePie jobs, each with slightly different arguments (1 each with
  "apple", "pumpkin" and "pecan")
* 3 AddWhipCreamToPie jobs, each with slightly different arguments (1
  each with "apple", "pumpkin" and "pecan")
* 1 SetTable job

The declared dependencies will be honored as well:

* BuyGroceries is guaranteed to run first.
* MakeStuffing and the 3 MakePie jobs will be available for processing
  immediately after the BuyGroceries job has finished.
* The 3 AddWhipCreamToPie jobs will be available for processing once
  their corresponding MakePie jobs have completed.
* PickupTurkey will not run until the
  `"await_turkey_is_ready_for_pickup_notice"` external dependency is
  fulfilled (see below for more details).
* PrepareTurkey will be available for processing once the PickupTurkey
  job has finished.
* SetTable will wait to be processed until all other jobs are complete.
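
The ordering rules above amount to a topological sort of the job graph. The sketch below models the eleven jobs with Ruby's standard `TSort` module and prints one valid run order. It is not how Plines is implemented: the `JobGraph` class and the job labels such as `MakePie(apple)` are invented for this example, and it leaves out `run_jobs_in_serial` and the external dependency to stay short.

```ruby
require 'tsort'

# Illustrative model only: maps each job label to the labels it depends on.
class JobGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort hook: yield every node in the graph.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort hook: yield the nodes that must come before `job`.
  def tsort_each_child(job, &block)
    @dependencies.fetch(job).each(&block)
  end
end

deps = { 'BuyGroceries' => [] }                      # depended_on_by_all_steps
deps['MakeStuffing']  = ['BuyGroceries']
deps['PickupTurkey']  = ['BuyGroceries']
deps['PrepareTurkey'] = %w[BuyGroceries PickupTurkey] # depends_on :PickupTurkey

%w[apple pumpkin pecan].each do |pie|
  deps["MakePie(#{pie})"] = ['BuyGroceries']
  # depends_on :MakePie with a block matching on pie_type
  deps["AddWhipCreamToPie(#{pie})"] = ['BuyGroceries', "MakePie(#{pie})"]
end

everything_else = deps.keys
deps['SetTable'] = everything_else                   # depends_on_all_steps

order = JobGraph.new(deps).tsort
puts order
```

`tsort` emits dependencies before the jobs that need them, so any order it prints starts with `BuyGroceries` and ends with `SetTable`. A real scheduler would run independent jobs in parallel rather than in a single list.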

## Working With Job Batches

Plines stores data about the batch in redis. It also provides a
first-class `JobBatch` object that allows you to work with job batches.

First, you need to configure the pipeline so that it knows how your
batches are identified:

``` ruby
MakeThanksgivingDinner.configure do |config|
  config.batch_list_key do |batch_data|
    batch_data["family"]
  end
end
```

Once this is in place, you can find a particular job batch:

``` ruby
job_batch = MakeThanksgivingDinner.most_recent_job_batch_for("family" => "Smith")
```

The `batch_list_key` config option above means the job batch will be
keyed by the "family" entry in the batch data hash. Thus, you can easily
look up a job batch by giving it a hash with the same "family" entry.
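
The keyed lookup can be pictured with a small in-memory model. This is only a sketch of the idea: Plines keeps this data in redis, and the `BatchIndex` class and its method names are made up for the example.

```ruby
# In-memory illustration only; Plines itself stores batches in redis.
class BatchIndex
  def initialize(&batch_list_key)
    @batch_list_key = batch_list_key
    @batches = Hash.new { |hash, key| hash[key] = [] }
  end

  # Files the batch data under the key derived from it.
  def record(batch_data)
    @batches[@batch_list_key.call(batch_data)] << batch_data
    batch_data
  end

  # Derives the key from the lookup hash the same way, then returns the
  # newest batch recorded under it (nil if there is none).
  def most_recent_for(lookup_data)
    @batches[@batch_list_key.call(lookup_data)].last
  end
end

index = BatchIndex.new { |batch_data| batch_data['family'] }
index.record('family' => 'Smith', 'pie_types' => %w[apple])
index.record('family' => 'Smith', 'pie_types' => %w[apple pumpkin pecan])
index.record('family' => 'Jones', 'pie_types' => %w[cherry])

p index.most_recent_for('family' => 'Smith')
```

The lookup hash only needs the entries that the key block reads, which is why `"family" => "Smith"` alone is enough in the README's `most_recent_job_batch_for` call.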

Once you have a job batch, there are several things you can do with it:

``` ruby
# returns whether or not the job batch is finished.
job_batch.complete?

# returns the data hash that was used to enqueue the job batch
job_batch.data

# cancels all remaining jobs in this batch
job_batch.cancel!

# Resolves the named external dependency. For the example above,
# calling this will allow the PickupTurkey job to proceed.
job_batch.resolve_external_dependency "await_turkey_is_ready_for_pickup_notice"
```

Plines sets expiration on the redis keys it uses to track job batches as
soon as the job batch is completed or canceled. By default, the
expiration is set to 6 months. You can configure it if you wish to
shorten it:

``` ruby
MakeThanksgivingDinner.configure do |config|
  config.data_ttl_in_seconds = 14 * 24 * 60 * 60 # 2 weeks
end
```

## External Dependency Timeouts

Under normal configuration, no job will run until all of its
dependencies have been met. However, Plines provides support
for timing out an external dependency:

``` ruby
module MyPipeline
  class MyStep
    extend Plines::Step
    has_external_dependencies do |deps, job_data|
      deps.add "my_async_service", wait_up_to: 3.hours
    end
  end
end
```

With this configuration, Plines will schedule a Qless job to run in
3 hours that will time out the `"my_async_service"` external dependency,
allowing the `MyStep` job to run without the dependency being resolved.
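
The resulting behaviour can be described as a small state machine per dependency: pending, resolved, or timed out. The toy model below is not Plines code, and its class and method names are invented. It also assumes that a timeout only affects a dependency that is still pending, which the README does not state explicitly.

```ruby
# Toy model of external dependency states; not part of Plines.
class ExternalDependencies
  def initialize(*names)
    @states = names.to_h { |name| [name, :pending] }
  end

  def resolve(name)
    @states[name] = :resolved
  end

  # Assumption: a timeout is ignored once the dependency is resolved.
  def time_out(name)
    @states[name] = :timed_out if @states[name] == :pending
  end

  # The job may run once nothing is pending, whether each dependency
  # was resolved or merely timed out.
  def runnable?
    @states.each_value.none? { |state| state == :pending }
  end

  def unresolved
    @states.reject { |_name, state| state == :resolved }.keys
  end
end

deps = ExternalDependencies.new('my_async_service')
puts deps.runnable?            # still waiting on the service

deps.time_out('my_async_service')
puts deps.runnable?            # the timeout lets the job proceed
p deps.unresolved              # but the dependency is still unresolved
```

A job that runs after a timeout therefore has to cope with the dependency's data being absent, or with it arriving late if the dependency is resolved afterwards.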

## Performing Work

When a job gets run, the `#perform` instance method of your step class
will be called. The return value of your perform method is ignored.
The perform method will have access to a few helper methods:

``` ruby
module MakeThanksgivingDinner
  class MakeStuffing
    extend Plines::Step

    def perform
      # job_data gives you a struct-like object that is built off of
      # your job_data hash
      job_data.family # => returns "Smith" for our example

      # The job_batch instance this job is a part of is available as
      # well, so you can do things like cancel the batch.
      job_batch.cancel!

      # The underlying qless job is available as `qless_job`
      qless_job.heartbeat

      # An external dependency may be unresolved if it timed out (see above).
      # #unresolved_external_dependencies returns an array of symbols,
      # listing the external dependencies that are unresolved.
      #
      # Note that this does not necessarily indicate whether or not an
      # external dependency timed out; it may have timed out, but then
      # got resolved before this job ran.
      # In addition, pending external dependencies are included (e.g.
      # if the job was manually moved into the processing queue)
      if unresolved_external_dependencies.any?
        # do something different because there's an unresolved dependency
      end
    end
  end
end
```

Plines also supports a middleware stack that wraps your `perform` method.
To create a middleware, define a module with an `around_perform` method:

``` ruby
module TimeWork
  def around_perform
    start_time = Time.now

    # Use super at the point the work should occur...
    super

    end_time = Time.now
    log_time(end_time - start_time)
  end
end
```

Then, include the module in your step class:

``` ruby
module MakeThanksgivingDinner
  class MakeStuffing
    include TimeWork
  end
end
```

You can include as many middleware modules as you like.
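
When several middleware modules are included, plain Ruby method lookup decides how they nest. The sketch below runs without Plines to show the nesting. `BaseStep` is a stand-in we made up for whatever part of `Plines::Step` supplies the innermost `around_perform`, and the `Announce` and `Measure` modules and the `log` helper are likewise invented for the example.

```ruby
# Stand-in for the innermost around_perform; the name BaseStep is ours.
module BaseStep
  def around_perform
    perform
  end
end

module Announce
  def around_perform
    log << 'announce: before'
    super
    log << 'announce: after'
  end
end

module Measure
  def around_perform
    log << 'measure: before'
    super
    log << 'measure: after'
  end
end

class MakeStuffing
  include BaseStep
  include Announce
  include Measure # included last, so it sits first in the lookup chain

  def log
    @log ||= []
  end

  def perform
    log << 'perform'
  end
end

step = MakeStuffing.new
step.around_perform
puts step.log
```

Because Ruby places the most recently included module closest to the class, the module included last wraps all the others, and each `super` call moves one layer inward until `perform` runs.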

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request