plines 0.5.0

data/Gemfile ADDED
@@ -0,0 +1,12 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in plines.gemspec
4
+ gemspec
5
+
6
+ gem 'qless', git: 'git://github.com/seomoz/qless.git', branch: 'unified'
7
+
8
+ group :extras do
9
+ gem 'debugger', platform: :mri
10
+ end
11
+
12
+ gem 'rspec-fire', git: 'git://github.com/xaviershay/rspec-fire.git'
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2012 Myron Marston
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,420 @@
1
+ # Plines
2
+
3
+ Plines creates job pipelines out of a complex set of step dependencies.
4
+ It's intended to maximize the efficiency and throughput of the jobs
5
+ (ensuring jobs are run as soon as their dependencies have been met)
6
+ while minimizing the amount of "glue" code you have to write to make it
7
+ work.
8
+
9
+ Plines is built on top of [Qless](https://github.com/seomoz/qless) and
10
+ [Redis](http://redis.io/).
11
+
12
+ ## Installation
13
+
14
+ Add this line to your application's Gemfile:
15
+
16
+ gem 'plines'
17
+
18
+ And then execute:
19
+
20
+ $ bundle
21
+
22
+ Or install it yourself as:
23
+
24
+ $ gem install plines
25
+
26
+ ## Getting Started
27
+
28
+ First, create a pipeline using the `Plines::Pipeline` module:
29
+
30
+ ``` ruby
31
+ module MyProcessingPipeline
32
+ extend Plines::Pipeline
33
+
34
+ configure do |config|
35
+ # configuration goes here; see below for available options
36
+ end
37
+ end
38
+ ```
39
+
40
+ `MyProcessingPipeline` will function both as the namespace for your
41
+ pipeline steps and also as a singleton holding some state for your
42
+ pipeline.
43
+
44
+ Next, define some pipeline steps. Your steps should be simple Ruby
45
+ classes that extend the `Plines::Step` module and define a `perform`
46
+ method:
47
+
48
+ ``` ruby
49
+ module MyProcessingPipeline
50
+ class CountWidgets
51
+ extend Plines::Step
52
+
53
+ def perform
54
+ # do some work
55
+ end
56
+ end
57
+ end
58
+ ```
59
+
60
+ The `Plines::Step` module makes available some class-level
61
+ macros for declaring step dependency relationships. See the **Step Class
62
+ DSL** section below for more details.
63
+
64
+ Once you've defined all your steps, you can enqueue jobs for them:
65
+
66
+ ``` ruby
67
+ MyProcessingPipeline.enqueue_jobs_for("some" => "data", "goes" => "here")
68
+ ```
69
+
70
+ `MyProcessingPipeline.enqueue_jobs_for` will enqueue a full set of qless
71
+ jobs (or a `JobBatch` in Plines terminology) for the given batch data
72
+ based on your step classes' macro declarations.
73
+
74
+ ## Configuring a Pipeline
75
+
76
+ Plines supports configuration at the pipeline level:
77
+
78
+ ``` ruby
79
+ module MyProcessingPipeline
80
+ extend Plines::Pipeline
81
+
82
+ configure do |config|
83
+ # Determines how job batches are identified. Plines provides an API
84
+ # to find the most recent existing job batch based on this key.
85
+ config.batch_list_key { |batch_data| batch_data.fetch(:user_id) }
86
+
87
+ # Sets the Qless client to use. If you have only one Qless server,
88
+ # have the block return a client for it. If you're sharding your
89
+ # Qless usage, you can have the block return a client based on the
90
+ # given batch list key.
91
+ config.qless_client do |user_id|
92
+ Qless::Client.new(redis: RedisShard.for(user_id))
93
+ end
94
+
95
+ # Determines how long the Plines job batch data will be kept around
96
+ # in redis after the batch reaches a final state (cancelled or
97
+ # completed). By default, this is set to 6 months, but you
98
+ # will probably want to set it to something shorter (like 2 weeks)
99
+ config.data_ttl_in_seconds = 14 * 24 * 60 * 60
100
+
101
+ # Provides a hook that gets called when job batches are cancelled.
102
+ # Use this to perform any cleanup in your system.
103
+ config.after_job_batch_cancellation do |job_batch|
104
+ # do some cleanup
105
+ end
106
+
107
+ # Use this callback to set additional global qless job
108
+ # options (such as queue, tags and priority). You can also set
109
+ # options on an individual step class (see below).
110
+ config.qless_job_options do |job|
111
+ { tags: [job.data[:user_id]] }
112
+ end
113
+ end
114
+ end
115
+ ```
116
+
117
+ ## The Step Class DSL
118
+
119
+ An example will help illustrate the Step class DSL. (Note that this
120
+ example omits the `perform` method declarations for brevity).
121
+
122
+ ``` ruby
123
+ module MakeThanksgivingDinner
124
+ extend Plines::Pipeline
125
+
126
+ class BuyGroceries
127
+ extend Plines::Step
128
+
129
+ # Indicates that the BuyGroceries step must run before all other steps.
130
+ # Essentially creates an implicit dependency of all steps on this one.
131
+ # You can have only one step declare `depended_on_by_all_steps`.
132
+ # Doing this relieves you of the burden of having to add
133
+ # `depends_on :BuyGroceries` to all step definitions.
134
+ depended_on_by_all_steps
135
+ end
136
+
137
+ # This step depends on BuyGroceries automatically due to the
138
+ # depended_on_by_all_steps declaration above.
139
+ class MakeStuffing
140
+ extend Plines::Step
141
+
142
+ # qless_options lets you set qless job options for this step.
143
+ qless_options do |qless|
144
+ # By default, jobs are enqueued to the :plines queue, but you can override that here.
145
+ # Options set on a step take precedence over those configured on the pipeline.
146
+ qless.queue = :make_stuffing
147
+ qless.tags = [:foo, :bar]
148
+ qless.priority = -10
149
+ qless.retries = 7
150
+ end
151
+ end
152
+
153
+ class PickupTurkey
154
+ extend Plines::Step
155
+
156
+ # External dependencies are named things that must be resolved
157
+ # before this step is allowed to proceed. They are intended for
158
+ # use when a step has a dependency on data from an external
159
+ # asynchronous system that operates on its own schedule.
160
+ has_external_dependencies do |deps, job_data|
161
+ deps.add "await_turkey_is_ready_for_pickup_notice", wait_up_to: 12.hours
162
+ end
163
+ end
164
+
165
+ class PrepareTurkey
166
+ extend Plines::Step
167
+
168
+ # Declares that the PrepareTurkey job cannot run until the
169
+ # PickupTurkey has run first. Note that the step class name
170
+ # is relative to the pipeline module namespace.
171
+ depends_on :PickupTurkey
172
+ end
173
+
174
+ class MakePie
175
+ extend Plines::Step
176
+
177
+ # By default, a single instance of a step will get enqueued in a
178
+ # pipeline job batch. The `fan_out` macro can be used to get multiple
179
+ # instances of the same step in a single job batch, each with
180
+ # different arguments.
181
+ #
182
+ # In this example, we will have multiple `MakePie` steps--one for
183
+ # each pie type, each with a different pie type argument.
184
+ fan_out do |batch_data|
185
+ batch_data['pie_types'].map do |type|
186
+ { 'pie_type' => type, 'family' => batch_data['family'] }
187
+ end
188
+ end
189
+
190
+ # Makes each instance of this step depend on the prior one,
191
+ # to ensure no two instances run in parallel. This isn't usually
192
+ # needed, but is occasionally useful to prevent resource contention
193
+ # when these jobs operate on a common resource.
194
+ run_jobs_in_serial
195
+ end
196
+
197
+ class AddWhipCreamToPie
198
+ extend Plines::Step
199
+
200
+ fan_out do |batch_data|
201
+ batch_data['pie_types'].map do |type|
202
+ { 'pie_type' => type, 'family' => batch_data['family'] }
203
+ end
204
+ end
205
+
206
+ # By default, `depends_on` makes all instances of this step depend on all
207
+ # instances of the named step. If you only want it to depend on some
208
+ # instances of the named step, pass a block; the instances of this step
209
+ # will only depend on the MakePie jobs for which the pie_type is the same.
210
+ depends_on :MakePie do |add_whip_cream_data, make_pie_data|
211
+ add_whip_cream_data['pie_type'] == make_pie_data['pie_type']
212
+ end
213
+ end
214
+
215
+ class SetTable
216
+ extend Plines::Step
217
+
218
+ # Indicates that this step should run last. This relieves you
219
+ # from the burden of having to add an extra `depends_on` declaration
220
+ # for each new step you create.
221
+ depends_on_all_steps
222
+ end
223
+ end
224
+ ```
225
+
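The effect of passing a block to `depends_on` can be sketched in plain Ruby. This is only an illustration of the pairing rule described in the comments above, not Plines's implementation: `dependencies_for`, `MAKE_PIE_JOBS`, `WHIP_CREAM_JOBS` and `SAME_PIE` are hypothetical names, and the job data is reduced to the `pie_type` entry.

```ruby
# Hypothetical stand-ins for the fanned-out job data of each step.
MAKE_PIE_JOBS = %w[apple pumpkin pecan].map { |type| { 'pie_type' => type } }
WHIP_CREAM_JOBS = %w[apple pumpkin pecan].map { |type| { 'pie_type' => type } }

# Same predicate as the block passed to `depends_on :MakePie` above.
SAME_PIE = lambda do |add_whip_cream_data, make_pie_data|
  add_whip_cream_data['pie_type'] == make_pie_data['pie_type']
end

# For each dependent job, keep only the candidate jobs the predicate accepts.
def dependencies_for(dependent_jobs, candidate_jobs, predicate)
  dependent_jobs.each_with_object({}) do |dependent, result|
    matches = candidate_jobs.select { |candidate| predicate.call(dependent, candidate) }
    result[dependent['pie_type']] = matches.map { |job| job['pie_type'] }
  end
end

# With the block: each AddWhipCreamToPie job waits only for its own pie.
FILTERED = dependencies_for(WHIP_CREAM_JOBS, MAKE_PIE_JOBS, SAME_PIE)

# Without a block (modeled here as an always-true predicate): every
# AddWhipCreamToPie job waits for every MakePie job.
UNFILTERED = dependencies_for(WHIP_CREAM_JOBS, MAKE_PIE_JOBS, ->(_a, _b) { true })
```

In this model `FILTERED` maps each pie type to a single matching `MakePie` job, while `UNFILTERED` maps each pie type to all three.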
226
+ ## Enqueuing Jobs
227
+
228
+ To enqueue a job batch, use `#enqueue_jobs_for`:
229
+
230
+ ``` ruby
231
+ MakeThanksgivingDinner.enqueue_jobs_for(
232
+ "family" => "Smith",
233
+ "pie_types" => %w[ apple pumpkin pecan ]
234
+ )
235
+ ```
236
+
237
+ The argument given to `enqueue_jobs_for` _must_ be a hash. This
238
+ hash will be yielded to the `fan_out` blocks. In addition, this hash
239
+ (or the one returned by a `fan_out` block) will be available as
240
+ `#job_data` in a step's `#perform` method.
241
+
242
+ Based on the `MakeThanksgivingDinner` example above, the following jobs
243
+ will be enqueued in this batch:
244
+
245
+ * 1 BuyGroceries job
246
+ * 1 MakeStuffing job
247
+ * 1 PickupTurkey job
248
+ * 1 PrepareTurkey job
249
+ * 3 MakePie jobs, each with slightly different arguments (1 each with
250
+ "apple", "pumpkin" and "pecan")
251
+ * 3 AddWhipCreamToPie jobs, each with slightly different arguments (1
252
+ each with "apple", "pumpkin" and "pecan")
253
+ * 1 SetTable job
254
+
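The job counts listed above can be reproduced with a small plain-Ruby model of fan-out. It is a sketch under stated assumptions: `pie_fan_out`, `SINGLE_STEPS`, `FANNED_STEPS` and `JOBS` are hypothetical names, and the model only mirrors the rule that a step without `fan_out` gets one job carrying the batch data while a step with `fan_out` gets one job per returned hash.

```ruby
BATCH_DATA = { 'family' => 'Smith', 'pie_types' => %w[apple pumpkin pecan] }.freeze

# Same shape as the fan_out blocks in the MakeThanksgivingDinner example.
def pie_fan_out(batch_data)
  batch_data['pie_types'].map do |type|
    { 'pie_type' => type, 'family' => batch_data['family'] }
  end
end

# Steps that declare no fan_out: one job each, with the batch data as job data.
SINGLE_STEPS = %w[BuyGroceries MakeStuffing PickupTurkey PrepareTurkey SetTable].freeze

# Steps that declare fan_out: one job per hash returned by the block.
FANNED_STEPS = %w[MakePie AddWhipCreamToPie].freeze

JOBS = SINGLE_STEPS.map { |step| [step, BATCH_DATA] } +
       FANNED_STEPS.flat_map { |step| pie_fan_out(BATCH_DATA).map { |data| [step, data] } }

puts JOBS.size # 5 single jobs + 3 MakePie + 3 AddWhipCreamToPie
```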
255
+ The declared dependencies will be honored as well:
256
+
257
+ * BuyGroceries is guaranteed to run first.
258
+ * MakeStuffing and the first MakePie job will be available for processing
259
+ immediately after the BuyGroceries job has finished; because of `run_jobs_in_serial`, each later MakePie job waits for the prior one.
260
+ * The 3 AddWhipCreamToPie jobs will be available for processing once
261
+ their corresponding MakePie jobs have completed.
262
+ * PickupTurkey will not run until the
263
+ `"await_turkey_is_ready_for_pickup_notice"` external dependency is
264
+ fulfilled (see below for more details).
265
+ * PrepareTurkey will be available for processing once the PickupTurkey
266
+ job has finished.
267
+ * SetTable will wait to be processed until all other jobs are complete.
268
+
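The ordering above can be approximated with a step-level dependency table. The sketch below is a simplification rather than Plines's scheduler: it ignores fan-out, `run_jobs_in_serial` and the external dependency, and `STEP_DEPS`, `ALL_DEPS` and `waves` are hypothetical names. It groups steps into "waves", where each wave contains the steps whose dependencies were all satisfied by earlier waves.

```ruby
# Explicit and implicit step dependencies from the example.
# Every step depends on BuyGroceries because of depended_on_by_all_steps.
STEP_DEPS = {
  'BuyGroceries'      => [],
  'MakeStuffing'      => ['BuyGroceries'],
  'PickupTurkey'      => ['BuyGroceries'],
  'PrepareTurkey'     => ['BuyGroceries', 'PickupTurkey'],
  'MakePie'           => ['BuyGroceries'],
  'AddWhipCreamToPie' => ['BuyGroceries', 'MakePie']
}.freeze

# depends_on_all_steps: SetTable depends on every other step.
ALL_DEPS = STEP_DEPS.merge('SetTable' => STEP_DEPS.keys).freeze

# Repeatedly collect the steps whose dependencies are all done.
def waves(deps)
  done = []
  result = []
  until done.size == deps.size
    ready = deps.keys.reject { |step| done.include?(step) }
                .select { |step| (deps[step] - done).empty? }
    raise 'dependency cycle' if ready.empty?

    result << ready
    done.concat(ready)
  end
  result
end

waves(ALL_DEPS).each_with_index do |steps, index|
  puts "wave #{index + 1}: #{steps.join(', ')}"
end
```

In this model BuyGroceries is alone in the first wave and SetTable is alone in the last, matching the guarantees listed above.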
269
+ ## Working With Job Batches
270
+
271
+ Plines stores data about the batch in Redis. It also provides a
272
+ first-class `JobBatch` object that allows you to work with job batches.
273
+
274
+ First, you need to configure the pipeline so that it knows how your
275
+ batches are identified:
276
+
277
+ ``` ruby
278
+ MakeThanksgivingDinner.configure do |config|
279
+ config.batch_list_key do |batch_data|
280
+ batch_data["family"]
281
+ end
282
+ end
283
+ ```
284
+
285
+ Once this is in place, you can find a particular job batch:
286
+
287
+ ``` ruby
288
+ job_batch = MakeThanksgivingDinner.most_recent_job_batch_for("family" => "Smith")
289
+ ```
290
+
291
+ The `batch_list_key` config option above means the job batch will be
292
+ keyed by the "family" entry in the batch data hash. Thus, you can easily
293
+ look up a job batch by giving it a hash with the same "family" entry.
294
+
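The keyed lookup can be pictured with an in-memory model. This is a minimal sketch, not how Plines stores batches in Redis: `BatchRegistry`, `enqueue` and `most_recent_for` are hypothetical names, and the only behavior taken from the text above is that the same key block is applied both when a batch is enqueued and when it is looked up.

```ruby
# In-memory illustration of looking up the most recent batch by key.
class BatchRegistry
  # The block plays the role of the batch_list_key configuration.
  def initialize(&key_block)
    @key_block = key_block
    @batches = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records the batch data under the key derived from it.
  def enqueue(batch_data)
    @batches[@key_block.call(batch_data)] << batch_data
    batch_data
  end

  # Derives the key from the lookup hash and returns the last batch
  # recorded under it, or nil if there is none.
  def most_recent_for(lookup_data)
    @batches.fetch(@key_block.call(lookup_data), []).last
  end
end

registry = BatchRegistry.new { |batch_data| batch_data['family'] }
registry.enqueue('family' => 'Smith', 'pie_types' => %w[apple])
registry.enqueue('family' => 'Smith', 'pie_types' => %w[pecan])

p registry.most_recent_for('family' => 'Smith')
```

Only the "family" entry matters for the lookup here, so the lookup hash does not need to repeat the rest of the batch data.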
295
+ Once you have a job batch, there are several things you can do with it:
296
+
297
+ ``` ruby
298
+ # returns whether or not the job batch is finished.
299
+ job_batch.complete?
300
+
301
+ # returns the data hash that was used to enqueue the job batch
302
+ job_batch.data
303
+
304
+ # cancels all remaining jobs in this batch
305
+ job_batch.cancel!
306
+
307
+ # Resolves the named external dependency. For the example above,
308
+ # calling this will allow the PickupTurkey job to proceed.
309
+ job_batch.resolve_external_dependency "await_turkey_is_ready_for_pickup_notice"
310
+ ```
311
+
312
+ Plines sets expiration on the Redis keys it uses to track job batches as
313
+ soon as the job batch is completed or cancelled. By default, the
314
+ expiration is set to 6 months. You can configure it if you wish to
315
+ shorten it:
316
+
317
+ ``` ruby
318
+ MakeThanksgivingDinner.configure do |config|
319
+ config.data_ttl_in_seconds = 14 * 24 * 60 * 60 # 2 weeks
320
+ end
321
+ ```
322
+
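If the raw arithmetic is hard to read, small helpers can name the units. The helpers below (`hours_in_seconds`, `days_in_seconds`) are hypothetical and not part of Plines. Note also that the `12.hours` and `3.hours` expressions used elsewhere in this README are not core Ruby; they come from ActiveSupport's numeric extensions.

```ruby
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Converts a number of hours to seconds.
def hours_in_seconds(count)
  count * SECONDS_PER_HOUR
end

# Converts a number of days to seconds.
def days_in_seconds(count)
  count * SECONDS_PER_DAY
end

TWO_WEEKS = days_in_seconds(14)    # same value as 14 * 24 * 60 * 60
THREE_HOURS = hours_in_seconds(3)

puts TWO_WEEKS
puts THREE_HOURS
```

`TWO_WEEKS` could then be assigned to `config.data_ttl_in_seconds` in place of the inline multiplication.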
323
+ ## External Dependency Timeouts
324
+
325
+ Under normal configuration, no job will run until all of its
326
+ dependencies have been met. However, Plines provides support
327
+ for timing out an external dependency:
328
+
329
+ ``` ruby
330
+ module MyPipeline
331
+ class MyStep
332
+ extend Plines::Step
333
+ has_external_dependencies do |deps, job_data|
334
+ deps.add "my_async_service", wait_up_to: 3.hours
335
+ end
336
+ end
337
+ end
338
+ ```
339
+
340
+ With this configuration, Plines will schedule a Qless job to run in
341
+ 3 hours that will time out the `"my_async_service"` external dependency,
342
+ allowing the `MyStep` job to run without the dependency being resolved.
343
+
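The states an external dependency can pass through can be modeled without Redis or Qless. The class below is a hypothetical sketch (`SketchJob` and its methods are not Plines APIs, apart from borrowing the name `unresolved_external_dependencies`). It assumes that a timeout only affects a dependency that is still pending and that a later resolution replaces a timeout, which matches the behavior described in this README but is not taken from the Plines source.

```ruby
# Tracks each external dependency as :pending, :timed_out or :resolved.
class SketchJob
  def initialize(dependency_names)
    @states = {}
    dependency_names.each { |name| @states[name] = :pending }
  end

  # Resolution wins even if the dependency had already timed out.
  def resolve(name)
    @states[name] = :resolved
  end

  # Assumption: a timeout is a no-op unless the dependency is still pending.
  def timeout(name)
    @states[name] = :timed_out if @states[name] == :pending
  end

  # The job may run once nothing is pending any more.
  def runnable?
    @states.values.none? { |state| state == :pending }
  end

  # Everything that has not actually been resolved.
  def unresolved_external_dependencies
    @states.reject { |_name, state| state == :resolved }.keys
  end
end

job = SketchJob.new(['my_async_service'])
job.timeout('my_async_service')
p job.runnable?
p job.unresolved_external_dependencies
```

After the timeout the job is allowed to run, but the dependency is still reported as unresolved, so `perform` can branch on it.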
344
+ ## Performing Work
345
+
346
+ When a job gets run, the `#perform` instance method of your step class
347
+ will be called. The return value of your perform method is ignored.
348
+ The perform method will have access to a few helper methods:
349
+
350
+ ``` ruby
351
+ module MakeThanksgivingDinner
352
+ class MakeStuffing
353
+ extend Plines::Step
354
+
355
+ def perform
356
+ # job_data gives you a struct-like object that is built off of
357
+ # your job_data hash
358
+ job_data.family # => returns "Smith" for our example
359
+
360
+ # The job_batch instance this job is a part of is available as
361
+ # well, so you can do things like cancel the batch.
362
+ job_batch.cancel!
363
+
364
+ # The underlying qless job is available as `qless_job`
365
+ qless_job.heartbeat
366
+
367
+ # External dependencies may be unresolved if they timed out (see above).
368
+ # #unresolved_external_dependencies returns an array of symbols,
369
+ # listing the external dependencies that are unresolved.
370
+ #
371
+ # Note that this does not necessarily indicate whether or not an
372
+ # external dependency timed out; it may have timed out, but then
373
+ # got resolved before this job ran.
374
+ # In addition, pending external dependencies are included (e.g.
375
+ # if the job was manually moved into the processing queue)
376
+ if unresolved_external_dependencies.any?
377
+ # do something different because there's an unresolved dependency
378
+ end
379
+ end
380
+ end
381
+ end
382
+ ```
383
+
384
+ Plines also supports a middleware stack that wraps your `perform` method.
385
+ To create a middleware, define a module with an `around_perform` method:
386
+
387
+ ``` ruby
388
+ module TimeWork
389
+ def around_perform
390
+ start_time = Time.now
391
+
392
+ # Use super at the point the work should occur...
393
+ super
394
+
395
+ end_time = Time.now
396
+ log_time(end_time - start_time)
397
+ end
398
+ end
399
+ ```
400
+
401
+ Then, include the module in your step class:
402
+
403
+ ``` ruby
404
+ module MakeThanksgivingDinner
405
+ class MakeStuffing
406
+ include TimeWork
407
+ end
408
+ end
409
+ ```
410
+
411
+ You can include as many middleware modules as you like.
412
+
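The way several `around_perform` modules stack follows from Ruby's method lookup, which can be shown without Plines. In the sketch below, `BaseStep` is a hypothetical stand-in for whatever a step receives from `Plines::Step` (the real implementation is not shown in this README), and `LogCalls` and the `events` recorder are invented for illustration. Modules included later sit earlier in the ancestor chain, so the last module included is the outermost wrapper.

```ruby
# Hypothetical stand-in for the behavior a step gets from Plines.
module BaseStep
  def run
    around_perform
  end

  # Innermost layer: just do the work.
  def around_perform
    perform
  end
end

module LogCalls
  def around_perform
    events << :log_start
    super
    events << :log_end
  end
end

module TimeWork
  def around_perform
    events << :timer_start
    super
    events << :timer_end
  end
end

class MakeStuffing
  include BaseStep
  include LogCalls
  include TimeWork # included last, so it wraps everything else

  # Records the order in which the layers run.
  def events
    @events ||= []
  end

  def perform
    events << :perform
  end
end

step = MakeStuffing.new
step.run
p step.events
```

Each `super` call hands control to the next module in `MakeStuffing.ancestors`, so the recorded events show `TimeWork` wrapping `LogCalls`, which in turn wraps `perform`.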
413
+ ## Contributing
414
+
415
+ 1. Fork it
416
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
417
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
418
+ 4. Push to the branch (`git push origin my-new-feature`)
419
+ 5. Create new Pull Request
420
+