batch_processor 0.2.6 → 0.3.0
- checksums.yaml +4 -4
- data/README.md +1046 -11
- data/lib/batch_processor/batch/controller.rb +1 -1
- data/lib/batch_processor/batch/core.rb +1 -1
- data/lib/batch_processor/batch/job.rb +1 -1
- data/lib/batch_processor/batch/job_controller.rb +1 -1
- data/lib/batch_processor/batch/predicates.rb +3 -1
- data/lib/batch_processor/batch/processor.rb +16 -3
- data/lib/batch_processor/batch_base.rb +1 -0
- data/lib/batch_processor/batch_details.rb +1 -1
- data/lib/batch_processor/batch_job.rb +3 -1
- data/lib/batch_processor/collection.rb +1 -0
- data/lib/batch_processor/processor/execute.rb +1 -1
- data/lib/batch_processor/processor/process.rb +1 -1
- data/lib/batch_processor/processor_base.rb +1 -0
- data/lib/batch_processor/processors/parallel.rb +1 -0
- data/lib/batch_processor/processors/sequential.rb +1 -0
- data/lib/batch_processor/rspec/custom_matchers/set_processor_option.rb +1 -1
- data/lib/batch_processor/rspec/custom_matchers/use_default_job_class.rb +1 -1
- data/lib/batch_processor/rspec/custom_matchers/use_default_processor.rb +1 -1
- data/lib/batch_processor/rspec/custom_matchers/use_job_class.rb +1 -1
- data/lib/batch_processor/rspec/custom_matchers/use_parallel_processor.rb +1 -1
- data/lib/batch_processor/rspec/custom_matchers/use_sequential_processor.rb +1 -1
- data/lib/batch_processor/version.rb +1 -1
- data/lib/generators/batch_processor/USAGE +9 -0
- data/lib/generators/batch_processor/application_batch/USAGE +9 -0
- data/lib/generators/batch_processor/application_batch/application_batch_generator.rb +15 -0
- data/lib/generators/batch_processor/application_batch/templates/application_batch.rb +3 -0
- data/lib/generators/batch_processor/application_job/USAGE +0 -0
- data/lib/generators/batch_processor/application_job/application_job_generator.rb +15 -0
- data/lib/generators/batch_processor/application_job/templates/application_job.rb +4 -0
- data/lib/generators/batch_processor/batch_processor_generator.rb +15 -0
- data/lib/generators/batch_processor/install/USAGE +9 -0
- data/lib/generators/batch_processor/install/install_generator.rb +12 -0
- data/lib/generators/batch_processor/templates/batch.rb.erb +21 -0
- data/lib/generators/rspec/application_batch/USAGE +9 -0
- data/lib/generators/rspec/application_batch/application_batch_generator.rb +14 -0
- data/lib/generators/rspec/application_batch/templates/application_batch_spec.rb +8 -0
- data/lib/generators/rspec/application_job/USAGE +9 -0
- data/lib/generators/rspec/application_job/application_job_generator.rb +14 -0
- data/lib/generators/rspec/application_job/templates/application_job_spec.rb +8 -0
- data/lib/generators/rspec/batch_processor/USAGE +8 -0
- data/lib/generators/rspec/batch_processor/batch_processor_generator.rb +13 -0
- data/lib/generators/rspec/batch_processor/templates/batch_spec.rb.erb +29 -0
- metadata +22 -2
checksums.yaml CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b654cd0471188921afcfc40b5f2fafd7017dba9fb23fbc809f52e9d629c16943
+  data.tar.gz: c76be590236d18b036f6d108b34b157527e5fe4063466e4c1413d1aad05dd0d3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: db14cc81563997580581a31f1c96c75c85fa08aa3f9c98be060ac41a14c7b1a59bb4cc6cae9fcb065fea181af9c96c17ad9b546a92b533240ed7c07cfa2d3c51
+  data.tar.gz: 1f7f8df80928fde8adf155a707be309fc468a48f7abd0502d039bcd6f15c1b9fd94be7fee4d09126bb26f926186ec2c67bc37c0e249bd6e76113012e37336490
```
data/README.md CHANGED

@@ -7,13 +7,44 @@ Define your collection, job, and callbacks all in one clear and concise object

[![Maintainability](https://api.codeclimate.com/v1/badges/fbdaeaf118a16a55ab7d/maintainability)](https://codeclimate.com/github/Freshly/batch_processor/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/fbdaeaf118a16a55ab7d/test_coverage)](https://codeclimate.com/github/Freshly/batch_processor/test_coverage)

* [Installation](#installation)
* [Getting Started](#getting-started)
* [What is BatchProcessor?](#what-is-batchprocessor)
* [How It Works](#how-it-works)
* [Batches](#batches)
* [Collection](#collection)
* [Input](#input)
* [Validations](#validations)
* [ActiveJob](#activejob)
* [Retries](#retries)
* [Details](#details)
* [Detail Methods](#detail-methods)
* [Status](#status)
* [Status Methods](#status-methods)
* [Callbacks](#callbacks)
* [Callback Methods](#callback-methods)
* [Processors](#processors)
* [Parallel Processor](#parallel-processor)
* [Sequential Processor](#sequential-processor)
* [Processor Options](#processor-options)
* [Jobs](#jobs)
* [Handling Errors](#handling-errors)
* [Troubleshooting](#troubleshooting)
* [Best Practice](#best-practice)
* [Aborting](#aborting)
* [Clearing](#clearing)
* [Monitor Job](#monitor-job)
* [Monitor Cron](#monitor-cron)
* [Testing](#testing)
* [Testing Setup](#testing-setup)
* [Testing Batches](#testing-batches)
* [Testing Jobs](#testing-jobs)
* [Integration Testing](#integration-testing)
* [Custom Processors](#custom-processors)
* [Testing Processors](#testing-processors)
* [Contributing](#contributing)
* [Development](#development)
* [License](#license)

## Installation

@@ -31,20 +62,1024 @@ Or install it yourself as:

$ gem install batch_processor

## Getting Started

BatchProcessor comes with some nice Rails generators. You are encouraged to use them!

```bash
$ rails g batch_processor foo
  invoke rspec
  create spec/batches/foo_batch_spec.rb
  create app/batches/foo_batch.rb
```

## What is BatchProcessor?

BatchProcessor is a framework for the sequential or parallel processing of jobs in Ruby on Rails.

BatchProcessor helps monitor, control, and orchestrate the work done by `ActiveJob`.

💁 This requires [Redis](https://github.com/redis/redis-rb) and a properly configured `ActiveJob` queue adapter (like [Sidekiq](https://github.com/mperham/sidekiq)).

## How It Works

![BatchProcessor](docs/images/batch_processor.png)

There are three key concepts to distinguish here: [Batches](#batches), [Processors](#processors), and [Jobs](#jobs).

### Batches

A **Batch** defines, controls, and monitors the processing of a collection of items with an `ActiveJob`.

All Batches should be named with the `Batch` suffix (ex: `FooBatch`).

```ruby
class PodSprintCalculationBatch < ApplicationBatch
  set_callback(:batch_started, :before) { raise CalculationsNotRunning unless Calculator.busy? }

  on_batch_finished { Calculator.done! }

  class Collection < BatchCollection
    argument :sprint, allow_nil: false
    option :recalculate, default: false

    def items
      recalculate ? items_for_recalculation : items_for_calculation
    end

    def items_for_calculation
      items_for_recalculation.without_performance_metrics
    end

    def items_for_recalculation
      sprint.pod_sprints.with_performance_plans
    end
  end
end
```

A batch is a synthesis of five concepts: a [Collection](#collection), an [ActiveJob](#activejob), granular [Details](#details), a summary [Status](#status), and some [Callbacks](#callbacks).

#### Collection

A `Collection` takes input to validate and build a (possibly ordered) list of items to process with the Batch's job.

Batches accept a unique identifier and input representing the arguments and options which define its collection.

```ruby
batch_id = SecureRandom.hex
PodSprintCalculationBatch.process(batch_id: batch_id, sprint: Sprint.last)
```

You can supply any unique value you want for a `batch_id`:

```ruby
attempt_number = 1
current_date = Date.today
batch_id = "daily-charge-batch:#{current_date}:#{attempt_number}"

ChargeBatch.process(batch_id: batch_id, date: current_date)
```

Which you can then pass to `ApplicationBatch.find` to load:

```ruby
batch = ApplicationBatch.find("daily-charge-batch:#{Date.today}:1")
batch.class.name # => ChargeBatch
batch.batch_id # => "daily-charge-batch:2019-07-25:1"
```

If you do not specify a `batch_id`, one will be randomly generated.

```ruby
batch = ChargeBatch.process(date: Date.today)
batch.batch_id # => XP-f-G23bNFwww
```

##### Input

A collection accepts input represented by arguments and options which initialize it.

Arguments describe input required to define the initial state.

If any arguments are missing, an `ArgumentError` is raised.

```ruby
class ExampleJob < BatchProcessor::BatchJob
  def perform(arg)
    "OK #{arg}"
  end
end

class ExampleBatch < ApplicationBatch
  class Collection < BatchCollection
    argument :foo
    argument :bar

    def items
      [ foo, bar ]
    end
  end
end

ExampleBatch.process # => ArgumentError (Missing arguments: foo, bar)
ExampleBatch.process(foo: "foo") # => ArgumentError (Missing argument: bar)
ExampleBatch.process(foo: "foo", bar: "bar") # => #<ExampleBatch batch_id="XPf--GzdbRLyww">
```

By default, `nil` is a valid argument:

```ruby
ExampleBatch.process(foo: nil, bar: nil) # => #<ExampleBatch batch_id="f-GzXP-dbn3yxw">
```

If you want to require a non-nil value for your argument, set the `allow_nil` option (`true` by default):

```ruby
class ExampleBatch < ApplicationBatch
  class Collection < BatchCollection
    argument :foo
    argument :bar, allow_nil: false

    def items
      [ foo, bar ]
    end
  end
end

ExampleBatch.process(foo: nil, bar: nil) # => ArgumentError (Missing argument: bar)
```

Options describe input which may be provided to define or override the initial state.

Options can optionally define a default value.

If no default is specified, the value will be `nil`.

If the default value is static, it can be specified in the class definition.

If the default value is dynamic, you may provide a block to compute the default value.

⚠️ Heads Up: The default value blocks DO NOT provide access to the state or its other variables!

```ruby
class ExampleBatch < ApplicationBatch
  class Collection < BatchCollection
    option :attribution_source
    option :favorite_foods, default: %w[pizza ice_cream gluten]
    option(:favorite_color) { SecureRandom.hex(3) }

    def items
      [ attribution_source, favorite_foods, favorite_color ]
    end
  end
end

batch = ExampleBatch.process(favorite_foods: %w[avocado hummus nutritional_yeast])
collection = batch.collection

collection.attribution_source # => nil
collection.favorite_color # => "1a1f1e"
collection.favorite_foods # => ["avocado", "hummus", "nutritional_yeast"]
```

##### Validations

Collections are `ActiveModel` objects, which means they have access to [ActiveModel::Validations](https://api.rubyonrails.org/classes/ActiveModel/Validations.html).

It is considered a best practice to write validations in your collections.

Batches which have an invalid collection will NOT start, and therefore will not process any Jobs, so this is inherently the safest and clearest way to proactively communicate about missed expectations.

💁 Pro Tip: There is a `process!` method on Batches that will raise any errors (which are normally silenced). Invalid collections are one such example!

```ruby
class ExampleBatch < ApplicationBatch
  class Collection < BatchCollection
    argument :first_name

    validates :first_name, length: { minimum: 2 }

    def items
      [ first_name ]
    end
  end
end

ExampleBatch.process!(first_name: "a") # => raises BatchProcessor::CollectionInvalidError

batch = ExampleBatch.process(first_name: "a")
batch.started? # => false
batch.collection_valid? # => false
batch.collection.errors.messages # => {:first_name=>["is too short (minimum is 2 characters)"]}
```

#### ActiveJob

When `.process` is called on a Batch, `.execute` is called on the `Processor` specified in the Batch's definition.

Unless otherwise specified, a **Batch** assumes its Job class shares a common name.

Ex: `FooBarBazBatch` assumes there is a defined `FooBarBazJob`.

If you want to customize this behavior, define the job class explicitly:

```ruby
class ExampleBatch < ApplicationBatch
  process_with_job SomeOtherJob
end
```

##### Retries

BatchProcessor is designed to work with ActiveJob's built-in retries.

Any job with a valid retry strategy will be allowed to exhaust all of its attempts before it is considered failed.

When a job raises with retries remaining, the batch essentially "ignores" that it ever ran, which allows it to be retried.

To keep track of how often these handled failures are happening, the batch keeps a running tally of total retries.

```ruby
batch = ApplicationBatch.find(batch_id)
batch.details.total_retries_count # => 15
```

In this example, the `15` count could mean any number of things:

1. A single job raised, and was retried, 15 times.
2. 3 jobs raised and were each retried their maximum of 5 times before failing.
3. 5 jobs raised and were each retried their maximum of 3 times before failing.
4. 15 different jobs all raised once and were retried, all of which were successful.
5. 13 different jobs all raised once, and one of them failed twice more on top of that, before finishing successfully.
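The readings above differ only in how the per-job retry tallies are distributed. As an illustrative sketch (plain Ruby arithmetic, not the gem's internals), scenarios 2 and 5 produce the same total:

```ruby
# Illustrative only: total_retries_count is just the sum of every job's
# individual retry tally, so very different stories yield the same number.
scenario_two  = [5, 5, 5]               # 3 jobs, each retried 5 times
scenario_five = Array.new(13, 1) + [2]  # 13 single retries, plus one job retried twice more

puts scenario_two.sum  # => 15
puts scenario_five.sum # => 15
```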
Because of the wide variety of cases this covers, the batch cannot (and doesn't try to) make decisions based on this count.

Instead, this information is tracked to provide developers with some introspection into the behavior of the batch.

Ideally, the final state of the batch, combined with the retry information and server logs, should allow you to determine what actually happened.

💡 **Note**: Batch failure is only triggered after **all** retries are exhausted for the job.

#### Details

The **Details** of a batch are the times of critical lifecycle events and the summary counts of processed jobs.

```ruby
batch = ExampleBatch.process
details = batch.details

details.started_at # => 2019-07-25 12:13:44 UTC
details.size # => 1
details.pending_jobs_count # => 1
details.to_h # => {"class_name"=>"ExampleBatch", "started_at"=>"2019-07-25 08:13:44 -0400", "size"=>"1", "pending_jobs_count"=>"1"}
```

The details object is built with [RedisHash](https://github.com/Freshly/spicerack/tree/master/redis_hash), which works just like a plain old Ruby Hash but makes calls to fetch data automatically.

⚠️ **Warning**: This hash is **NOT** cached, so each method call makes a `Redis` call! `#FeatureNotABug`

```ruby
batch = ExampleBatch.process
details = batch.details

details.pending_jobs_count # => 3

# rake resque:work in another window...

details.pending_jobs_count # => 2
details.pending_jobs_count # => 1
```
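Notice that `to_h` above returns string values (`"size"=>"1"`): Redis stores hash fields as strings. A minimal sketch (plain Ruby, not the gem's API) of casting raw values if you work with the hash form directly:

```ruby
# Redis hash fields come back as strings; cast numerics before doing math.
raw = { "size" => "1", "pending_jobs_count" => "1" }

size    = Integer(raw["size"])
pending = Integer(raw["pending_jobs_count"])

puts size - pending # => 0
```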
##### Detail Methods

| Name                  | Type     | Description                                |
| --------------------- | -------- | ------------------------------------------ |
| batch_id              | String   | The unique ID of the batch's instance.     |
| class_name            | String   | The name of the batch's class.             |
| started_at            | DateTime | When processing began on the batch.        |
| enqueued_at           | DateTime | `[Parallel]` When all jobs were enqueued.  |
| aborted_at            | DateTime | When `#abort!` was called on the batch.    |
| cleared_at            | DateTime | When `#clear!` was called on the batch.    |
| finished_at           | DateTime | When processing finished on the batch.     |
| size                  | Number   | Count of items in the batch's collection.  |
| enqueued_jobs_count   | Number   | `[Parallel]` Count of the jobs enqueued.   |
| pending_jobs_count    | Number   | Count of jobs waiting to be performed.     |
| running_jobs_count    | Number   | Count of jobs currently being performed.   |
| successful_jobs_count | Number   | Count of jobs performed successfully.      |
| failed_jobs_count     | Number   | Count of jobs which raised errors.         |
| canceled_jobs_count   | Number   | Count of jobs NOT performed from `abort`.  |
| cleared_jobs_count    | Number   | Count of missing jobs flushed by `clear`.  |
| total_retries_count   | Number   | Total count of retry attempts by all jobs. |
| unfinished_jobs_count | Number   | Current count of jobs pending and running. |
| finished_jobs_count   | Number   | Current count of jobs already performed.   |
| total_jobs_count      | Number   | Count of jobs (which should equal `size`). |

#### Status

The **Status** of a batch is manifested by a collection of predicates which track certain lifecycle events.

```ruby
batch = ExampleBatch.process
batch.started? # => true
batch.enqueued? # => false
batch.aborted? # => false
batch.finished? # => true

batch.enqueued_jobs? # => false
batch.finished_jobs? # => true
```

##### Status Methods

| Name              | Description                                     |
| ----------------- | ----------------------------------------------- |
| started?          | True if `started_at` is defined for the batch.  |
| enqueued?         | True if `enqueued_at` is defined for the batch. |
| aborted?          | True if `aborted_at` is defined for the batch.  |
| cleared?          | True if `cleared_at` is defined for the batch.  |
| finished?         | True if `finished_at` is defined for the batch. |
| enqueued_jobs?    | True if `enqueued_jobs_count > 0`.              |
| pending_jobs?     | True if `pending_jobs_count > 0`.               |
| running_jobs?     | True if `running_jobs_count > 0`.               |
| failed_jobs?      | True if `failed_jobs_count > 0`.                |
| canceled_jobs?    | True if `canceled_jobs_count > 0`.              |
| unfinished_jobs?  | True if `unfinished_jobs_count > 0`.            |
| finished_jobs?    | True if `finished_jobs_count > 0`.              |
| collection_valid? | True if all the Collection's validations pass.  |
| processing?       | True if started, unfinished, and not aborted.   |
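The compound `processing?` predicate from the last table row can be sketched as follows (plain Ruby illustration of the table's logic, not the gem's implementation):

```ruby
# Illustrative only: "started, unfinished, and not aborted" composed out of
# the simpler lifecycle values described in the Details and Status tables.
def processing?(started_at:, aborted_at:, unfinished_jobs_count:)
  !started_at.nil? && unfinished_jobs_count.positive? && aborted_at.nil?
end

processing?(started_at: Time.now, aborted_at: nil, unfinished_jobs_count: 2) # => true
processing?(started_at: nil, aborted_at: nil, unfinished_jobs_count: 2)      # => false (never started)
processing?(started_at: Time.now, aborted_at: nil, unfinished_jobs_count: 0) # => false (already finished)
```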
#### Callbacks

A batch's status is driven by the jobs it is processing. Callbacks are fired in response to status changes.

```ruby
class ExampleBatch < ApplicationBatch
  class Collection < BatchCollection
    def items
      [ SecureRandom.hex ]
    end
  end

  on_batch_started { SlackClient.send_message("Batch started!") }
  on_batch_finished { SlackClient.send_message("Batch finished!") }

  on_batch_aborted :handle_batch_aborted, unless: -> { Business.during_business_hours? }
  on_batch_cleared :handle_batch_cleared, if: :important?

  def important?
    batch_id.include?("vip")
  end

  def handle_batch_aborted
    EmailClient.send_email("management@business.engineering", "Unexpected batch abort!", batch_id)
  end

  def handle_batch_cleared
    EmailClient.send_email("developers@business.engineering", "Crazy stuff happened!", details.to_h)
  end
end
```

##### Callback Methods

| Name              | Triggered when...                                   |
| ----------------- | --------------------------------------------------- |
| on_batch_started  | The batch is started.                               |
| on_batch_enqueued | `[Parallel]` All batch jobs are enqueued.           |
| on_batch_aborted  | The batch is aborted.                               |
| on_batch_cleared  | The batch is cleared.                               |
| on_batch_finished | The batch is finished.                              |
| on_job_enqueued   | A batch job is enqueued.                            |
| on_job_running    | A batch job begins performing.                      |
| on_job_success    | A batch job is successfully performed.              |
| on_job_failure    | A batch job raises an error while being performed.  |
| on_job_retried    | A batch job is retried rather than failing.         |
| on_job_canceled   | A batch job skips perform after a batch is aborted. |

### Processors

A **Processor** is a service object which determines how to perform a Batch's jobs to properly process its collection.

Unless otherwise specified, a **Batch** uses the `default` **Parallel** Processor.

```ruby
class DefaultBatch < ApplicationBatch; end
DefaultBatch.processor_class # => BatchProcessor::Processors::Parallel

class ExampleBatch < ApplicationBatch
  with_sequential_processor
end
ExampleBatch.processor_class # => BatchProcessor::Processors::Sequential

class OtherBatch < ApplicationBatch
  with_parallel_processor
end
OtherBatch.processor_class # => BatchProcessor::Processors::Parallel
```

The default processors can be redefined, and new [custom processors](#custom-processors) can be added as well.

Create a `config/initializers/batch_processor.rb` to define these:

```ruby
# Make the sequential processor the default
ApplicationBatch::PROCESSOR_CLASS_BY_STRATEGY[:default] = BatchProcessor::Processors::Sequential
```

Certain processors have configurable options; this configuration is specified in the Batch's definition.

```ruby
class ExampleBatch < ApplicationBatch
  with_sequential_processor
  processor_option :continue_after_exception, true
end
```

BatchProcessor comes with two standard processors: **Parallel** and **Sequential**.

#### Parallel Processor

![parallel](docs/images/parallel-processor.png)

The Parallel Processor enqueues jobs to be performed later.

#### Sequential Processor

![sequential](docs/images/sequential-processor.png)

The Sequential Processor uses `.perform_now` to procedurally process each job within the current thread.

⚠️ **WARNING**: Using a sequential processor disables job retries in a batch **even if they are defined and valid**!

##### Processor Options

| Name                       | Description                                     |
| -------------------------- | ----------------------------------------------- |
| `continue_after_exception` | If true, the batch continues after a job error. |
| `sorted`                   | If true, `#find_each` will **not** be used.     |

💁 **HEADS UP**: `find_each` is used when possible, which ignores `order`; the `sorted` flag only forces `#each` instead.

### Jobs

BatchProcessor depends on ActiveJob for handling the processing of individual items in a collection.

Only a **BatchJob** can be used to perform work in a batch, but it can be run outside of a batch as well.

Therefore, the recommendation is to make `ApplicationJob` inherit from `BatchJob`.

The `rails g batch_processor:install` generator does this for you:

```ruby
class ApplicationJob < BatchProcessor::BatchJob; end
```

A BatchJob calls into the Batch to report on its lifecycle from start to finish, including on success and failure.

#### Handling Errors

When an error occurs in a BatchJob, it will be tracked as a failure within the batch.

This is true even if a `rescue_from` handler is defined for the batch.

Intentionality is very difficult to ascertain in a topic as nuanced as error handling, so batches make some assumptions:

1. If you define a `rescue_from`, you want to treat that exception as a batch failure BUT NOT a job failure.
1. If you define a `rescue` in the `perform` block, you want to treat the exception as NEITHER a batch NOR a job failure.
1. If you define no rescue of any kind, you want to treat that exception as BOTH a batch AND a job failure.
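Assumption 2 above is the subtle one. As a plain-Ruby sketch (not the gem's machinery), an exception rescued inside `perform` never propagates, so neither the batch nor the job can record a failure:

```ruby
# Illustrative only: a rescue inside perform swallows the error entirely, so
# a batch watching this job would observe a normal, successful completion.
def perform(item)
  raise "boom" if item == :bad
  :performed
rescue StandardError
  :handled # the error never escapes this method
end

perform(:good) # => :performed
perform(:bad)  # => :handled
```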
|
551
|
+
|
552
|
+
Because `BatchProcessor` cannot speculate on it therefore doesn't attempt to control your application's error handling.
|
553
|
+
|
554
|
+
Instead, it only brings this incredibly dire warning:
|
555
|
+
|
556
|
+
⚠️ **WARNING**: You should never **EVER** "manually retry" a batch job! This can mess up the counter!
|
557
|
+
|
558
|
+
Defining a valid retry strategy within the job is the **ONLY** way to handle retries of a batch job!
|
559
|
+
|
560
|
+
If you attempt to manually re-enqueue a batch job from your processors failed queue, you **WILL** have a bad time.
|
561
|
+
|
562
|
+
Instead, you should always follow the [Troubleshooting](#troubleshooting) guide to handle exceptional failures.
|
563
|
+
|
564
|
+
👍 **NOTE**: It is considered a "best practice" to define error handling for all your jobs, batchable or otherwise!
|
565
|
+
|
566
|
+
## Troubleshooting
|
567
|
+
|
568
|
+
Sometimes, `"weird stuff"` (this is a technical term) happens on the internet.
|
569
|
+
|
570
|
+
One example is a vanishing job:
|
571
|
+
|
572
|
+
- A job is picked off the queue and usually takes 18 seconds process.
|
573
|
+
- 5 seconds into performing, the worker received a `SIGTERM`.
|
574
|
+
- The worker, being Resque, decides to dirty exit instead of graceful shutdown.
|
575
|
+
- The job never completes, never is retried, never enters the queue again, and never reports status.
|
576
|
+
- The `running_jobs_count` of your batch and will contain a count that will never go down.
|
577
|
+
- Because one of the jobs has not reported in, the batch will never complete.
|
578
|
+
|
579
|
+
⚠️ **Warning**: This kind of "weird stuff" can always happen, and at scale **WILL** always happen! Be prepared!
|
580
|
+
|
581
|
+
### Best Practice
|
582
|
+
|
583
|
+
Troubleshooting this issue will be very similar to troubleshooting any batch issues, but no two issues are fully alike.
|
584
|
+
|
585
|
+
What follows is therefore the generic "best practice" for handling any class of batch issue.
|
586
|
+
|
587
|
+
1. Abort the Batch. This stops any new batches from processing and allows any enqueued jobs to flush from the workers.
|
588
|
+
2. Damage Report. Figure out what went wrong and what needs to be cleaned up.
|
589
|
+
3. Cleanup Fallout. Perform all the cleanup as determined in step 2.
|
590
|
+
4. Wait. Allow time for the workers to chew through and cancel the pending jobs in your aborted batch.
|
591
|
+
5. Clear the Batch. Manually flush any lost jobs, forcing the batch to run it's completion events.
|
592
|
+
|
593
|
+
**Abort the Batch**
|
594
|
+
|
595
|
+
```ruby
|
596
|
+
batch = ApplicationBatch.find(batch_id)
|
597
|
+
batch.abort!
|
598
|
+
```
|
599
|
+
|
600
|
+
**Damage Report**
|
601
|
+
|
602
|
+
💡 **Note**: By the nature of async processing your jobs can (and likely will, given enough workers) fail at every line:
|
603
|
+
|
604
|
+
```ruby
|
605
|
+
class ExampleJob < ApplicationJob
|
606
|
+
def perform(order)
|
607
|
+
raise NotProcessing unless order.payment_processing?
|
608
|
+
|
609
|
+
order.mark_charge_starting!
|
610
|
+
|
611
|
+
charge_service = ChargeService.new(order)
|
612
|
+
charge_result = charge_service.charge!
|
613
|
+
|
614
|
+
if charge_result.success?
|
615
|
+
order.mark_charge_success!
|
616
|
+
else
|
617
|
+
order.mark_payment_failed!
|
618
|
+
end
|
619
|
+
end
|
620
|
+
end
|
621
|
+
```
|
622
|
+
|
623
|
+
In this example, if you had say, 30 workers processing your batch, you could expect to see the following issues:
|
624
|
+
|
625
|
+
- Orders which were taken off the queue, marked as running, and then never passed the guard clause.
|
626
|
+
- Orders which were marked that the charge was starting, but the service was never instantiated.
|
627
|
+
- 😱 Orders which were submitted and a customer's money was taken, but your application has no record of that!
|
628
|
+
- Orders submitted and a customer did not have funds available, but the application has no record of that EITHER!!
|
629
|
+
- We get the response, but are not capable of reporting success about the charge in the database.
|
630
|
+
- We actually record success in the database but the job cannot report itself as having completed to the batch!
|
631
|
+
|
632
|
+
💁 **The Rule of Law**: For every `N` lines of code in your job, you create `N+2` **at least** unique problems. 😬
|
633
|
+
|
634
|
+
### Aborting

Batches can be **Aborted**.

![aborting](docs/images/aborting.png)

When aborted, processing *will continue* on enqueued jobs but **those jobs will not be performed**.

Abort only prevents new jobs from being performed, as this is less disruptive (and much easier) than queue flushing.

When a job is skipped because of an aborted batch, it reports itself as **canceled**.

```ruby
batch = ApplicationBatch.find(some_batch_id)
details = batch.details

details.performed_jobs_count # => 7
details.performed_jobs_count # => 8
details.canceled_jobs_count # => 0

batch.abort!

details.performed_jobs_count # => 8
details.canceled_jobs_count # => 1
details.canceled_jobs_count # => 2
```

💡 **Note**: Running jobs will complete normally if `#abort!` was called after perform began on them.

#### Clearing

Because clearing is a manual process only to be used in exceptional circumstances, it **requires** the batch be aborted.

In these cases, after a developer intervenes to assess the impact of the failure, the batch can be manually cleared.

```ruby
batch = ApplicationBatch.find(some_batch_id)
details = batch.details

details.size # => 10
details.pending_jobs_count # => 2
details.running_jobs_count # => 2
details.finished_jobs_count # => 6
details.cleared_jobs_count # => 0

batch.clear!

details.running_jobs_count # => 0
details.pending_jobs_count # => 0
details.cleared_jobs_count # => 4
```

💡 **Note**: Calling `#clear!` on a batch will trigger the batch completion events and finish the batch.

There is no use case for calling `#clear!` on an in-flight batch; doing so is incredibly disruptive and corrupts the counts.

### Monitor Job

Because of the nature of `"weird stuff"` that can happen in processing, it's highly encouraged you add monitoring.

The most common kind of monitoring that supports operations like this well is the "dead man's switch".

[Dead Man's Snitch](https://deadmanssnitch.com) is an example of a monitoring service that can help!

You could also construct a batch monitor job that works for any / all batches and can be enqueued alongside:

```ruby
class BatchMonitorJob < ApplicationJob
  queue_as :a_higher_priority_than_any_jobs

  def perform(batch_id)
    batch = ApplicationBatch.find(batch_id)
    batch.abort! unless batch.finished?
  end
end
```

With a job like this, you can add a monitor to any batch:

```ruby
batch_id = SecureRandom.hex
SomeImportantBatch.process(batch_id: batch_id)
BatchMonitorJob.set(wait: 20.minutes).perform_later(batch_id)
```

There are *several* important notes here.

**Delayed Jobs**

You must have configured ActiveJob with a queue processor that supports delayed jobs, and any extra hoops therein.

💁 **Example**: On Resque, a second worker and suite of gems is required to support `.set(wait: 20.minutes)`!

Please make sure to consult your given ActiveJob queue processor's own documentation for supporting delayed jobs.

**Delay**

There is no such thing as "normal" for what to expect for processing time. You can set expectations using the following:

```text
good_timeout = ((number_of_jobs * average_time_per_job) / average_number_of_workers_available) + rand(5).minutes
```

For some batches, you may expect 10 minutes to be more than sufficient to process the whole workload.

For other batches, 2 hours may be a conservative estimate for completion.

Sometimes, the 10 minute batch could take over 2 hours if it is enqueued while other jobs are consuming resources.

Because of the great variability here, you need to assess the differences of your own environment and adapt accordingly.

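The formula above translates directly into plain Ruby; the numbers below are hypothetical placeholders you would replace with measurements from your own environment:

```ruby
# Estimate a batch timeout (in seconds) from measured throughput numbers.
# The random jitter mirrors the `rand(5).minutes` padding in the formula above.
def good_timeout_seconds(number_of_jobs:, average_seconds_per_job:, workers:)
  base = (number_of_jobs * average_seconds_per_job) / workers.to_f
  base + rand(5) * 60
end

# Hypothetical workload: 1,000 jobs at 2 seconds each across 30 workers
# gives a base of roughly 67 seconds, plus up to 4 minutes of jitter.
timeout = good_timeout_seconds(number_of_jobs: 1_000, average_seconds_per_job: 2, workers: 30)
```
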
**Queue Priority**

Delayed jobs get enqueued after their delay at the REAR of the queue.

As such, your monitor **must** have a higher priority than any of the jobs it's meant to monitor!

Otherwise it will be enqueued and processed after all the jobs finish, which defeats the purpose of this!

Because queueing theory is hard and there isn't really a great standard (just several approaches), you're on your own here.

Sorry, and godspeed!

**Notifications**

This automation is only a "best practice" for keeping things more or less clean.

Anytime an abort happens, developers likely need to look into why, as it should be considered an exceptional event.

As such, you are encouraged to set up notifications and alerting to communicate these events to your staff.

There are lots of solutions for how to set up this kind of monitoring, like using an internal mailer:

```ruby
class ApplicationBatch < BatchProcessor::BatchBase
  on_batch_aborted { EngineeringMailer.batch_timeout(batch: self) }
end
```

### Monitor Cron

Another alternative, which comes with different constraints, is setting up a cron task to monitor your batches.

Again, depending on the specifics of your environment and needs, there are several different solutions here as well.

Let's assume you already have invested in a cron solution like the [clockwork gem](https://rubygems.org/gems/clockwork).

If you know, for instance, that your charge batch was certainly supposed to have started by 8PM, write that:

```ruby
every(1.day, 'charge_start.job', at: '20:00') do
  batch_id = "charge-batch-for-#{Date.today}"
  begin
    batch = ApplicationBatch.find(batch_id)
    EngineeringMailer.charge_batch_not_started(batch_id) unless batch.started?
  rescue StandardError => exception
    EngineeringMailer.charge_batch_not_found(exception)
  end
end
```

Likewise, if by 10PM it needs to be finished or you want it aborted, write that:

```ruby
every(1.day, 'charge_start.job', at: '22:00') do
  batch_id = "charge-batch-for-#{Date.today}"
  begin
    batch = ApplicationBatch.find(batch_id)
    # Assuming you have some kind of generic `on_batch_abort` that handles notifications
    batch.abort! unless batch.completed?
  rescue StandardError => exception
    EngineeringMailer.charge_batch_not_found(exception)
  end
end
```

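Both cron tasks depend on the batch id being deterministic. A tiny (hypothetical) helper keeps the enqueuing code and the cron checks from drifting apart:

```ruby
require "date"

# Derive the same id wherever the batch is referenced; `Date#to_s` is
# ISO 8601 (e.g. "2024-01-02"), matching the interpolation used above.
def charge_batch_id(date = Date.today)
  "charge-batch-for-#{date}"
end
```
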
## Testing

If you plan on writing `RSpec` tests, `BatchProcessor` comes packaged with some custom matchers.

### Testing Setup

Add the following to your spec/rails_helper.rb file:

```ruby
require "batch_processor/spec_helper"
```

BatchProcessor works best with [shoulda-matchers](https://github.com/thoughtbot/shoulda-matchers) and [rspice](https://github.com/Freshly/spicerack/tree/master/rspice).

Add them to the development and test group of your Gemfile:

```ruby
group :development, :test do
  gem "shoulda-matchers", git: "https://github.com/thoughtbot/shoulda-matchers.git", branch: "rails-5"
  gem "rspice"
end
```

Then run `bundle install` and add the following into `spec/rails_helper.rb`:

```ruby
require "rspec/rails"
require "rspice"
require "batch_processor/spec_helper"

# Configuration for the shoulda-matchers gem
Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :rspec
    with.library :rails
  end
end
```

This will allow you to use the following custom matchers:

* [set_processor_option](lib/batch_processor/rspec/custom_matchers/set_processor_option.rb) tests usages of batches specifying processor options.
* [use_default_job_class](lib/batch_processor/rspec/custom_matchers/use_default_job_class.rb) tests usages of batches which do not explicitly specify a job.
* [use_default_processor](lib/batch_processor/rspec/custom_matchers/use_default_processor.rb) tests usages of batches which do not explicitly specify a processor.
* [use_job_class](lib/batch_processor/rspec/custom_matchers/use_job_class.rb) tests usages of batches which explicitly specify a job.
* [use_parallel_processor](lib/batch_processor/rspec/custom_matchers/use_parallel_processor.rb) tests usages of `.with_parallel_processor`.
* [use_sequential_processor](lib/batch_processor/rspec/custom_matchers/use_sequential_processor.rb) tests usages of `.with_sequential_processor`.

There are also some internal matchers added:

* [use_batch_processor_strategy](lib/batch_processor/rspec/custom_matchers/use_batch_processor_strategy.rb) is used to DRY out the similarities between the other batch processor matchers.

### Testing Batches

The best way to test a Batch is with an integration test.

The easiest way to test a Batch is with a unit test.

Batches are generated with the following RSpec template:

```ruby
# frozen_string_literal: true

require "rails_helper"

RSpec.describe FooBatch, type: :batch do
  subject { described_class }

  it { is_expected.to inherit_from BatchProcessor::BatchBase }

  # it { is_expected.to use_sequential_processor }
  # it { is_expected.to use_parallel_processor }

  # it { is_expected.to be_allow_empty }

  # it { is_expected.to use_default_job_class }
  # it { is_expected.to use_job_class OtherJob }

  # it { is_expected.to set_processor_option :continue_after_exception, true }
  # it { is_expected.to set_processor_option :sorted, true }
  # it { is_expected.not_to be_allow_empty }

  describe FooBatch::Collection, type: :batch_collection do
    subject { described_class.new }

    it { is_expected.to inherit_from BatchProcessor::BatchBase::BatchCollection }
    # it { is_expected.to define_argument :arg, allow_nil: false }
    # it { is_expected.to define_option :opt, default: 3 }
  end
end
```

#### Testing Collections

If your Collections are complicated enough that you want to put them into a separate file, they are **too** complicated.

Collections are expected to be incredibly straightforward objects with minimal validations and logic.

Highly maintainable collections should essentially be input sanity checks around something like an `ActiveRecord` scope:

```ruby
class Collection < BatchCollection
  argument :charge_date

  validates :charge_date, date: { is_today_or_future: true }

  def items
    Order.to_charge_on(charge_date).payment_pending
  end
end
```

This will allow you to keep this class slim, which is the intent!

### Testing Jobs

BatchProcessor, though heavily reliant on jobs, does not include anything special or specific to test them.

Any job descending from `BatchProcessor::BatchableJob` (which has `ActiveJob::Base` as its parent) is batchable.

You can **and should** test your jobs, but there is nothing "special" about them.

Ideally, if you're starting with a collection of well-built jobs, they should work nearly effortlessly here.

Admittedly "well-built" is subjective, but taken here to mean "clearly defined error handling and one responsibility".

Generally speaking, the same suggestions for testing collections apply to `ActiveJob`.

🤓 **HUMBLE OBSERVATION**: The best jobs are slim wrappers around clearly defined services.

```ruby
class ExampleJob < ApplicationJob
  def perform(id)
    the_thing = Thing.find(id)
    raise Thing::Locked if the_thing.locked?

    the_thing.lock!
    info :locked_thing, the_thing: the_thing

    the_stuff = Stuff.for(the_thing)
    info :got_stuff, the_stuff: the_stuff

    the_thing.make_do(the_stuff)
    info :made_the_thing_do_the_stuff
  end
end

# Executed with...
ExampleJob.perform_later(the_thing.id)
```

You should always move all that stuff into a service object:

```ruby
class ExampleJob < ApplicationJob
  def perform(the_thing)
    DoTheStuffService.for(the_thing)
  end
end

# Let GlobalID take care of this! Don't reinvent wheels!
ExampleJob.perform_later(the_thing)
```

If this feels right to you, check out [flow](https://github.com/Freshly/flow). You'll like what you see, I guarantee it.

### Integration Testing

Effective integration testing for batches requires you to configure `ActiveJob` for testing.

There are lots of solutions to this puzzle, so you're expected to pick your own poison in that regard.

Once you have everything configured to effectively unit test jobs, you can confirm the behavior of your batch.

A comprehensive suite of integration tests for a batch will cover three contexts:

1) When the batch finishes successfully.
2) When the batch finishes with some errors.
3) When you manually intervene with the batch.

These are basically the only circumstances that you will actually encounter in the real world, so you should test them.

Writing general purpose handlers for batch aborts and clears will save you a lot of trouble and excess testing!

Pretty much every manual intervention in a batch will elicit a "stuff is on fire" response from the team anyway.

## Custom Processors

You are able to define your own custom processors and use them with the batch processor.

The following example is incredibly contrived for the purposes of demonstration:

Let's say you wanted a `NoBobProcessor` which enqueued jobs for anyone unless they had `Bob` in their name.

First, create an `app/batch_processors` directory. (Really, it can be in any folder, but why not be explicit?)

Then create your new `NoBobProcessor` class which is a descendant of `BatchProcessor::ProcessorBase`.

Generally speaking, when defining a processor you only need to define one method: `#process_collection_item`.

This method is called with each and every item from a Batch's `Collection` and the processor decides what to do with it.

In our example, we will add a string-matching guard clause to exclude the `Bob`s of the world from processing.

When writing processors, it's always best to assume a generic case.

For now, let's assume each item is either a string representing a name, or an object with a `#name` property to check.

```ruby
class NoBobProcessor < BatchProcessor::ProcessorBase
  # Required for parallel processors to keep accurate and expected reporting
  set_callback(:collection_processed, :after) { batch.enqueued }

  def process_collection_item(item)
    return if for_a_bob?(item)

    job = batch.job_class.new(item)
    job.batch_id = batch.batch_id
    job.enqueue
  end

  private

  def for_a_bob?(item)
    name = item.name if item.respond_to?(:name)
    name ||= item if item.is_a?(String)
    raise ArgumentError, "Unknown item: #{item}" if name.nil?

    name.include?("Bob")
  end
end
```

Then, it needs to be registered as a processing strategy so batches can utilize it.

To define it, create or edit a `config/initializers/batch_processor.rb` file and add the following line:

```ruby
ApplicationBatch::PROCESSOR_CLASS_BY_STRATEGY[:no_bobs] = NoBobProcessor
```

This will enable you to specify this processor within your batch:

```ruby
class ChargeNoBobsBatch < ApplicationBatch
  use_no_bobs_processor

  # ...
end
```

Reference your new processor by the key you used to register it in the `PROCESSOR_CLASS_BY_STRATEGY` hash.

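The registration above is an ordinary strategy-registry lookup; stripped of the gem's machinery, the pattern reduces to something like this (illustrative only, not the gem's internals, and the class names are placeholders):

```ruby
# Illustrative strategy registry: a hash from strategy name to processor
# class, with a fallback to a default entry for unknown strategies.
class DefaultProcessor; end
class NoBobProcessor; end

PROCESSOR_CLASS_BY_STRATEGY = { default: DefaultProcessor }
PROCESSOR_CLASS_BY_STRATEGY[:no_bobs] = NoBobProcessor

def processor_class_for(strategy)
  PROCESSOR_CLASS_BY_STRATEGY.fetch(strategy, PROCESSOR_CLASS_BY_STRATEGY[:default])
end
```
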
You can refer to the existing [processors](lib/batch_processor/processors) for examples.

### Testing Processors

Testing custom processors is best done with unit tests and confirmed by integration tests.

There's a lot that can go wrong when batching lots of jobs, and it really helps to have unit tests on this.

I can't offer more guidance on writing good unit tests for processors other than suggesting [riffing on these](spec/batch_processor/processors).

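As a flavor of what such a unit test can exercise, the `NoBobProcessor`'s name check from earlier is pure Ruby and can be pinned down without any batch machinery (the `Person` struct is a hypothetical stand-in for a record):

```ruby
# Standalone copy of the NoBobProcessor name check, for illustration.
def for_a_bob?(item)
  name = item.name if item.respond_to?(:name)
  name ||= item if item.is_a?(String)
  raise ArgumentError, "Unknown item: #{item}" if name.nil?

  name.include?("Bob")
end

Person = Struct.new(:name)

for_a_bob?("Bobby Tables")      # => true
for_a_bob?(Person.new("Alice")) # => false
```
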
## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/Freshly/batch_processor.

### Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).