busybee 0.1.0 → 0.3.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +71 -7
  3. data/README.md +70 -42
  4. data/docs/client/quick_start.md +279 -0
  5. data/docs/client.md +825 -0
  6. data/docs/configuration.md +550 -0
  7. data/docs/grpc.md +50 -25
  8. data/docs/testing.md +118 -28
  9. data/docs/workers.md +982 -0
  10. data/exe/busybee +6 -0
  11. data/lib/busybee/cli.rb +173 -0
  12. data/lib/busybee/client/error_handling.rb +37 -0
  13. data/lib/busybee/client/job_operations.rb +236 -0
  14. data/lib/busybee/client/message_operations.rb +84 -0
  15. data/lib/busybee/client/process_operations.rb +108 -0
  16. data/lib/busybee/client/variable_operations.rb +64 -0
  17. data/lib/busybee/client.rb +87 -0
  18. data/lib/busybee/configure.rb +290 -0
  19. data/lib/busybee/credentials/camunda_cloud.rb +58 -0
  20. data/lib/busybee/credentials/insecure.rb +24 -0
  21. data/lib/busybee/credentials/oauth.rb +157 -0
  22. data/lib/busybee/credentials/tls.rb +43 -0
  23. data/lib/busybee/credentials.rb +200 -0
  24. data/lib/busybee/defaults.rb +20 -0
  25. data/lib/busybee/error.rb +50 -0
  26. data/lib/busybee/grpc/error.rb +60 -0
  27. data/lib/busybee/grpc.rb +2 -2
  28. data/lib/busybee/job.rb +219 -0
  29. data/lib/busybee/job_stream.rb +85 -0
  30. data/lib/busybee/logging.rb +61 -0
  31. data/lib/busybee/railtie.rb +113 -0
  32. data/lib/busybee/runner/hybrid.rb +64 -0
  33. data/lib/busybee/runner/multi.rb +101 -0
  34. data/lib/busybee/runner/polling.rb +54 -0
  35. data/lib/busybee/runner/streaming.rb +159 -0
  36. data/lib/busybee/runner.rb +97 -0
  37. data/lib/busybee/runtime_config.rb +184 -0
  38. data/lib/busybee/serialization.rb +100 -0
  39. data/lib/busybee/testing/activated_job.rb +33 -8
  40. data/lib/busybee/testing/helpers/execution.rb +139 -0
  41. data/lib/busybee/testing/helpers/support.rb +78 -0
  42. data/lib/busybee/testing/helpers.rb +56 -66
  43. data/lib/busybee/testing/matchers/complete_job.rb +55 -0
  44. data/lib/busybee/testing/matchers/fail_job.rb +75 -0
  45. data/lib/busybee/testing/matchers/have_activated.rb +1 -1
  46. data/lib/busybee/testing/matchers/have_available_jobs.rb +44 -0
  47. data/lib/busybee/testing/matchers/throw_bpmn_error_on.rb +72 -0
  48. data/lib/busybee/testing.rb +5 -33
  49. data/lib/busybee/version.rb +1 -1
  50. data/lib/busybee/worker/configuration.rb +287 -0
  51. data/lib/busybee/worker/dsl.rb +187 -0
  52. data/lib/busybee/worker/shutdown.rb +27 -0
  53. data/lib/busybee/worker.rb +130 -0
  54. data/lib/busybee.rb +134 -2
  55. metadata +80 -3
data/docs/workers.md ADDED
@@ -0,0 +1,982 @@
1
+ # Workers
2
+
3
+ In a distributed system, each application might need to participate in dozens of business processes that span the whole organization. Orchestration lets you meet that need by allowing each app to expose just a handful of domain-specific actions, and then reusing and composing those actions into different workflows which describe those business processes. Busybee's **Worker** abstraction lets you define those actions as simple Ruby classes, and handles everything else for you: connecting your class to the workflow engine, requesting work, reporting results, and managing the process lifecycle.
4
+
5
+ If you've used Sidekiq (or similar frameworks) to build background jobs, this pattern should feel very familiar. You define a class, implement a `perform` method, and let the framework handle the infrastructure. The key conceptual differences are:
6
+ - Sidekiq background jobs run in the same application that defines and enqueues them. [Workers](https://docs.camunda.io/docs/components/concepts/job-workers/) in an orchestrated system also run in the application that defines them, but they are always invoked externally, by an instance of a business process running in the central workflow engine.
7
+ - Background jobs work by side effects only (that is, the return value of `perform` in a Sidekiq job does not matter), but in a Worker both side effects and return values matter. The return values become part of the context of the running business process instance. This allows other downstream workers to consume those values, and also allows the workflow to make flow control decisions based on them.
8
+
9
+ Busybee is built around a workflow engine named [Zeebe](https://docs.camunda.io/docs/components/zeebe/zeebe-overview/), which is available in either self-hosted form or as a hosted/SaaS product from [Camunda](https://camunda.com/). The workflow definition format used by Zeebe and Camunda, and therefore what Busybee supports, is called [BPMN](https://docs.camunda.io/docs/components/modeler/bpmn/bpmn-primer/).
10
+
11
+ > For a working example of workers in a multi-domain system, see the [Dropship Co. demo app](../spec/demo/README.md), which uses busybee workers to orchestrate order fulfillment across isolated warehousing, logistics, and delivery domains.
12
+
13
+ ## Table of Contents
14
+
15
+ - [Defining Workers](#defining-workers)
16
+ - [Your First Worker](#your-first-worker)
17
+ - [The Job Lifecycle](#the-job-lifecycle)
18
+ - [Declaring Inputs](#declaring-inputs)
19
+ - [Declaring Outputs](#declaring-outputs)
20
+ - [Input/Output Types](#inputoutput-types)
21
+ - [Advanced DSL Options](#advanced-dsl-options)
22
+ - [Running Workers](#running-workers)
23
+ - [CLI Quick Start](#cli-quick-start)
24
+ - [CLI Reference](#cli-reference)
25
+ - [Rails Integration](#rails-integration)
26
+ - [Signal Handling](#signal-handling)
27
+ - [Worker Modes](#worker-modes)
28
+ - [Multiple Workers in One Process](#multiple-workers-in-one-process)
29
+ - [YAML Configuration](#yaml-configuration)
30
+ - [Configuration Precedence](#configuration-precedence)
31
+ - [Testing Workers](#testing-workers)
32
+ - [Setup](#setup)
33
+ - [Basic Worker Testing](#basic-worker-testing)
34
+ - [Inspecting Job State](#inspecting-job-state)
35
+ - [Worker Testing Matchers](#worker-testing-matchers)
36
+ - [Testing Best Practices](#testing-best-practices)
37
+
38
+ ---
39
+
40
+ ## Defining Workers
41
+
42
+ ### Your First Worker
43
+
44
+ A worker is a Ruby class that subclasses `Busybee::Worker` and implements `perform`. Each time the workflow engine has a job ready, Busybee creates a new instance of your worker class for that job and calls `perform`:
45
+
46
+ ```ruby
47
+ class ProcessOrderWorker < Busybee::Worker
48
+ job_type "process_order"
49
+
50
+ variable :order_id, type: :uuid
51
+
52
+ output :confirmation_number, type: :string
53
+
54
+ def perform
55
+ order = Order.find(order_id)
56
+ confirmation = order.process!
57
+
58
+ { confirmation_number: confirmation }
59
+ end
60
+ end
61
+ ```
62
+
63
+ A few things are happening here:
64
+
65
+ - **`job_type`** identifies which jobs this worker handles. When the workflow engine reaches a [service task](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/) with this type, it creates a job and sends it to an available worker. If you omit `job_type`, it's derived from the class name: `ProcessOrderWorker` becomes `"process_order"`.
66
+ - **[`variable`](#declaring-inputs)** declares an input your worker expects. Busybee defines an accessor method so you can call `order_id` directly in `perform`.
67
+ - **[`output`](#declaring-outputs)** declares what your worker returns. When `perform` returns a Hash, those values flow back into the workflow as [process variables](https://docs.camunda.io/docs/components/concepts/variables/), so downstream workers can read them as inputs.
68
+ - **[`perform`](#the-job-lifecycle)** contains your business logic. A new worker instance is created for each job, so you can safely use instance variables and private helper methods. (Note that `perform` takes no arguments.)
69
+
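As a rough illustration of the class-name fallback described above, the sketch below derives a job type from a class name. `derive_job_type` is a hypothetical helper written for this example, not Busybee's implementation. It assumes the rule is simply "drop a trailing `Worker`, then snake_case the rest", which matches the one documented case; how namespaced class names are handled is not covered here.

```ruby
# Hypothetical helper for illustration only; not part of Busybee's API.
def derive_job_type(class_name)
  class_name
    .sub(/Worker\z/, "")                  # assumed: a trailing "Worker" is dropped
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')    # insert "_" at each lower-to-upper boundary
    .downcase
end

puts derive_job_type("ProcessOrderWorker")
```

If the derived name does not match the type used in your BPMN service task, declaring `job_type` explicitly avoids relying on the convention.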
70
+ ### The Job Lifecycle
71
+
72
+ While a Worker object knows how to perform units of work, a Job object represents one individual unit of that work to be performed. When a running [process instance](https://docs.camunda.io/docs/components/concepts/processes/#process-instance-creation) (a single execution of a workflow) arrives at the point where work needs to be done (a "[service task](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/)" in BPMN terms), a job is created in the workflow engine with all of the context needed to perform that work. The job is called "created" or "available" when the workflow engine prepares it, and called "activated" or "ready" when it has been picked up by a running worker. At that point, the workflow engine waits for the worker to call back and report one of three possible outcomes:
73
+ - **Completed** - This is the happy path. If the work was performed successfully, the job is marked complete and optional additional data variables are sent back to the process instance, which continues to the next step in the workflow.
74
+ - **Failed** - If the work could not be performed (for example, because a Ruby exception was raised), the job is marked as failed. After a short backoff delay it is retried, meaning it is made available again for a worker to pick up.
75
+   - The maximum number of retries is set by the [process definition](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-definition) (the BPMN document that describes the workflow); if it is exceeded, the entire running process instance is paused and an ["Incident"](https://docs.camunda.io/docs/components/concepts/incidents/) is raised in the workflow engine for an operator to review. The remaining number of retries on the current job can be read and updated as desired.
76
+ - **Errored** - If the work encountered an abnormal **business** condition (for example, insufficient funds), the job may _throw a BPMN error._ This is different from a _Ruby error,_ which causes job failure and retry; BPMN errors should be used for flow control, when there's an anticipated business outcome that the workflow needs to handle by taking a different branch.
77
+
78
+ If none of those three things happen within a configurable window of time (the job timeout), the workflow engine assumes that the worker process must have crashed, and it will make the job available again for other workers to pick up. The deadline for the current job can also be read and updated as desired.
79
+
80
+ When a running busybee process receives a job, it uses your worker class to execute this lifecycle, with some additional checks and conveniences:
81
+
82
+ 1. **Instantiation** - a new instance of your worker is created for that job.
83
+ 2. **Input Validation** - all `required: true` inputs are checked. If any are missing, `MissingInput` is raised.
84
+ 3. **Perform** - your `perform` method runs.
85
+ 4. **On Success** - if `perform` returned successfully and `complete_job_on_success` is `true` (the default), then:
86
+    - **Output Validation** - All `required: true` outputs are checked in the Hash returned from `perform`. If any are missing, `MissingOutput` is raised.
87
+    - **Return Variables** - Busybee reports to the workflow engine that the job is complete, sending back any output values returned from `perform`.
88
+ 5. **On Failure** - if `perform` raises an exception and `fail_job_on_error` is `true` (the default), then:
89
+    - **Error Reporting** - Busybee reports to the workflow engine that the job failed, sending back the error class and message and the configured backoff delay.
90
+
91
+ This means that for most workers, you can just implement `perform`, return a Hash, and let Busybee handle the rest.
92
+
93
+ (For throwing a BPMN error, see the [Manual Lifecycle Control](#manual-lifecycle-control) section below.)
94
+
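The numbered steps above can be sketched in plain Ruby. Everything in this block (`FakeJob`, `run_lifecycle`, and the two error classes) is a stand-in written for illustration; it is not Busybee's code, and it assumes both `complete_job_on_success` and `fail_job_on_error` are left at their defaults. What happens to the job after `MissingInput` or `MissingOutput` is raised is not modeled.

```ruby
# Illustrative stand-ins only; these are not Busybee's real classes.
class MissingInput < StandardError; end
class MissingOutput < StandardError; end

FakeJob = Struct.new(:variables, :status, :result) do
  def ready?
    status == :ready
  end
end

def run_lifecycle(job, required_inputs: [], required_outputs: [])
  # Input validation runs before the perform step.
  missing = required_inputs.reject { |name| job.variables.key?(name) }
  raise MissingInput, "missing: #{missing.join(', ')}" unless missing.empty?

  begin
    returned = yield(job.variables) # the perform step
  rescue StandardError => e
    # Automatic failure, skipped if perform already finished the job itself.
    if job.ready?
      job.status = :failed
      job.result = { error_class: e.class.name, message: e.message }
    end
    return job
  end

  # Anything other than a Hash is treated as "no output variables".
  outputs = returned.is_a?(Hash) ? returned : {}
  absent = required_outputs.reject { |name| outputs.key?(name) }
  raise MissingOutput, "missing: #{absent.join(', ')}" unless absent.empty?

  # Automatic completion, skipped if perform already finished the job itself.
  if job.ready?
    job.status = :completed
    job.result = outputs
  end
  job
end
```

The two `ready?` guards are the part worth noticing: they are what lets a worker call a lifecycle method by hand inside `perform` without the automatic handling reporting a second outcome afterwards.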
95
+ #### Automatic Completion
96
+
97
+ By default, Busybee completes the job when `perform` returns successfully. If `perform` returns a Hash, those key-value pairs become output variables:
98
+
99
+ ```ruby
100
+ def perform
101
+ order = Order.find(order_id)
102
+
103
+ { status: order.status, processed_at: Time.now.iso8601 }
104
+ # Job is completed automatically with these variables
105
+ end
106
+ ```
107
+
108
+ If `perform` returns an empty Hash or anything other than a Hash (including nil), the job is completed with no output variables.
109
+
110
+ **Output Validation:** If one or more output variables were declared with `required: true` (the default) but those keys are not present in the returned Hash, a `MissingOutput` error will be raised.
111
+
112
+ #### Automatic Failure (Error Handling)
113
+
114
+ If `perform` raises an exception, Busybee reports the job as failed to the workflow engine, along with the error message. The job will then be retried after a configurable backoff delay, up to the maximum retry count set in the [BPMN process definition](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-definition) (not shown here):
115
+
116
+ ```ruby
117
+ class ProcessPaymentWorker < Busybee::Worker
118
+ variable :order_id, type: :uuid
119
+
120
+ output :charged, type: :boolean
121
+
122
+ backoff 30_000 # wait 30 seconds before the workflow engine makes this job available again
123
+
124
+ def perform
125
+ order = Order.find(order_id) # may raise ActiveRecord::RecordNotFound
126
+ PaymentGateway.charge(order) # may raise PaymentGateway::Timeout
127
+
128
+ { charged: true }
129
+ end
130
+ # If either line raises, the job is failed and retried after 30s
131
+ end
132
+ ```
133
+
134
+ **Important:** Because failed jobs are retried by default, you should try to make your `perform` method [idempotent](https://en.wikipedia.org/wiki/Idempotence) whenever possible. If a particular worker cannot safely be retried, set retries to `0` in the BPMN definition. Even then, **Zeebe does not guarantee exactly-once execution.** If you need that guarantee, your worker must implement it.
135
+
136
+ #### Manual Lifecycle Control
137
+
138
+ For cases where automatic handling isn't sufficient, you can control the job lifecycle directly. The `complete!`, `fail!`, and `throw_bpmn_error!` methods are delegated from the worker to the job:
139
+
140
+ ```ruby
141
+ class ProcessOrderWorker < Busybee::Worker
142
+ complete_job_on_success false # we'll handle completion ourselves
143
+
144
+ def perform
145
+ order = Order.find(order_id)
146
+
147
+ case order.validate
148
+ when :ok
149
+ order.process!
150
+ complete!(confirmation: order.confirmation_number)
151
+ when :fraud_detected
152
+ # this is a business-level error case -- the workflow will have a branch to handle this:
153
+ throw_bpmn_error!(:fraud_detected, "Fraud detected for order #{order_id}")
154
+ when :invalid_items
155
+ # this is a technical failure -- if it cannot succeed on retry, the workflow needs to stop and alert the operator:
156
+ fail!("Order contains invalid or unavailable items")
157
+ end
158
+ end
159
+ end
160
+ ```
161
+
162
+ **`complete!(vars = {})`** completes the job with optional output variables.
163
+
164
+ **`fail!(error, retries: nil, backoff: nil)`** fails the job. Accepts a String or Exception. Optionally override the retry count or backoff delay.
165
+
166
+ **`throw_bpmn_error!(code, message = "")`** throws a [BPMN error](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/) that can be caught by an [error boundary event](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/#error-boundary-events) in the process definition. The error code can be a String, Symbol (converted to UPPERCASE), or Exception class (it will be converted from `MyApp::OrderNotFound` to the code string `MY_APP_ORDER_NOT_FOUND`). Use BPMN errors when the failure is an anticipated business outcome that the workflow should handle, rather than a technical failure that should be retried.
167
+
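The error-code conversion described above can be approximated as follows. `bpmn_error_code` is a hypothetical helper, not a Busybee method, and `MyApp::OrderNotFound` is defined here only so the example runs. Passing Strings through unchanged is an assumption, since the text only states conversions for Symbols and Exception classes.

```ruby
# Example exception class, defined only so this sketch is self-contained.
module MyApp
  class OrderNotFound < StandardError; end
end

# Hypothetical helper for illustration only; not part of Busybee's API.
def bpmn_error_code(code)
  case code
  when Symbol
    code.to_s.upcase
  when Class
    code.name
        .gsub("::", "_")                      # namespace separators become underscores
        .gsub(/([a-z\d])([A-Z])/, '\1_\2')    # split CamelCase words
        .upcase
  else
    code.to_s                                 # assumed: Strings are used as given
  end
end

puts bpmn_error_code(:fraud_detected)
puts bpmn_error_code(MyApp::OrderNotFound)
```

Whatever form you pass, the resulting string is what the error boundary event in the BPMN definition has to match, so it is worth settling on one convention per application.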
168
+ **`update_retries(count)`** and **`update_timeout(duration)`** modify the job's retry count or lock timeout without completing or failing it. Useful for long-running jobs that need to extend their deadline:
169
+
170
+ ```ruby
171
+ def perform
172
+ update_timeout(5.minutes) # extend deadline before starting long operation
173
+ # ... long operation ...
174
+ end
175
+ ```
176
+
177
+ Note that you can safely mix manual and automatic control, because both automatic completion and automatic failure check whether the job is still `ready?` before they attempt to complete or fail it. Therefore, this is a perfectly valid alternative to the approach above:
178
+
179
+ ```ruby
180
+ class ProcessOrderWorker < Busybee::Worker
181
+ def perform
182
+ order = Order.find(order_id)
183
+
184
+ case order.validate
185
+ when :ok
186
+ order.process!
187
+ return { confirmation: order.confirmation_number } # will trigger auto-complete
188
+ when :fraud_detected
189
+ throw_bpmn_error!(:fraud_detected, "Fraud detected for order #{order_id}") # marks the job non-ready, so auto-complete is skipped
190
+ when :invalid_items
191
+ raise "Order contains invalid or unavailable items" # will trigger auto-fail
192
+ end
193
+ end
194
+ end
195
+ ```
196
+
197
+ #### Shutdown Handling
198
+
199
+ Some exceptions represent conditions that a worker container can't recover from: a lost database connection, a broken Redis pool, a revoked API credential. When one of these occurs, it's better to shut down the worker process so that your container manager (e.g. Kubernetes) can replace it with a fresh one.
200
+
201
+ Use `shutdown_on` to declare which exception classes should trigger a graceful shutdown:
202
+
203
+ ```ruby
204
+ class ProcessOrderWorker < Busybee::Worker
205
+ shutdown_on PG::ConnectionBad
206
+ shutdown_on Redis::ConnectionError
207
+
208
+ def perform
209
+ # If this raises PG::ConnectionBad, the worker shuts down gracefully
210
+ Order.find(order_id).process!
211
+ end
212
+ end
213
+ ```
214
+
215
+ You can also configure shutdown errors globally for all workers in your application via [`Busybee.shutdown_on_errors`](configuration.md).
216
+
217
+ When a shutdown is triggered, the worker process stops requesting new jobs, fails any in-flight jobs (preserving their retry count so they'll be picked up by another worker), and exits.
218
+
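One way to picture the shutdown check is a simple class match against the declared list. The sketch below is an assumption about the matching rule, not Busybee's implementation: it treats subclasses of a declared exception class as matches too (the usual Ruby `rescue` semantics). `ConnectionLost`, `ReplicaConnectionLost`, and `shutdown_error?` are names invented for this example.

```ruby
# Invented exception classes so the sketch is self-contained.
class ConnectionLost < StandardError; end
class ReplicaConnectionLost < ConnectionLost; end

# Hypothetical helper for illustration only; not part of Busybee's API.
# Assumes subclasses of a declared class also trigger shutdown.
def shutdown_error?(error, shutdown_classes)
  shutdown_classes.any? { |klass| error.is_a?(klass) }
end

declared = [ConnectionLost]
puts shutdown_error?(ReplicaConnectionLost.new("replica down"), declared)
puts shutdown_error?(ArgumentError.new("bad input"), declared)
```

Under this assumption, declaring a broad base class shuts the process down for every subclass as well, so narrower declarations are usually the safer choice.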
219
+ #### Direct Job Access
220
+
221
+ Several of the methods you've already seen — `complete!`, `fail!`, `throw_bpmn_error!`, `update_retries`, `update_timeout`, `variables`, and `headers` — are actually delegated from the worker to an underlying `Busybee::Job` object. You can access this object directly via `self.job` in `perform`. The job carries metadata, raw data, and status information that isn't available at the worker level:
222
+
223
+ ```ruby
224
+ def perform
225
+ # Metadata (job-only)
226
+ job.key # unique job identifier (Integer)
227
+ job.type # job type from BPMN (String)
228
+ job.process_instance_key # workflow instance this job belongs to (Integer)
229
+ job.bpmn_process_id # BPMN process ID (String)
230
+ job.retries # remaining retry attempts (Integer)
231
+ job.deadline # lock expiration time (frozen Time, UTC)
232
+
233
+ # Data (delegated, but declared inputs are preferred — see Declaring Inputs)
234
+ job.variables # all process variables, as a frozen hash with indifferent access
235
+ job.headers # custom headers from BPMN definition, same format
236
+
237
+ # Lifecycle (delegated — see Manual Lifecycle Control)
238
+ job.complete!(vars = {}) # mark job complete, with optional output variables
239
+ job.fail!(error, retries: nil, backoff: nil) # mark job failed
240
+ job.throw_bpmn_error!(code, message = "") # throw a BPMN error
241
+ job.update_retries(count) # change remaining retry count
242
+ job.update_timeout(duration) # extend or shorten the job lock deadline
243
+
244
+ # Status predicates (job-only)
245
+ job.ready? # true if not yet completed/failed/errored
246
+ job.complete? # true if completed
247
+ job.failed? # true if failed
248
+ job.error? # true if BPMN error was thrown
249
+ end
250
+ ```
251
+
252
+ Variables and headers support both hash-style and method-style access, including nested values:
253
+
254
+ ```ruby
255
+ job.variables[:order_id] # hash access with symbol key
256
+ job.variables["order_id"] # hash access with string key
257
+ job.variables.order_id # method access
258
+ job.variables.address.zip_code # nested method access
259
+ ```
260
+
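To make the access styles above concrete, here is a minimal sketch of a frozen wrapper that supports symbol keys, string keys, and method calls, including nested values. `VariableBag` is a hypothetical class written for this example; the object Busybee actually returns from `job.variables` may be built quite differently.

```ruby
# Hypothetical wrapper for illustration only; not Busybee's actual variables object.
class VariableBag
  def initialize(hash)
    # Normalize keys to strings and wrap nested hashes so access can be chained.
    @data = hash.each_with_object({}) do |(key, value), out|
      out[key.to_s] = value.is_a?(Hash) ? VariableBag.new(value) : value
    end.freeze
    freeze
  end

  # Hash-style access with either a Symbol or a String key.
  def [](key)
    @data[key.to_s]
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  # Method-style access: vars.order_id
  def method_missing(name, *args)
    return @data[name.to_s] if args.empty? && @data.key?(name.to_s)

    super
  end
end

vars = VariableBag.new({ "order_id" => "o-1", "address" => { "zip_code" => "94107" } })
puts vars.address.zip_code
```

In this sketch an unknown name raises `NoMethodError` with method-style access but returns `nil` with hash-style access; whether the real object behaves the same way is not stated in this document.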
261
+ Most of the time, you won't need to reach for `job` directly — input accessors give you named, validated methods for reading data, and the lifecycle delegations (`complete!`, `fail!`, etc.) read more naturally without the `job.` prefix. But the job object is there when you need metadata, status checks, or raw data access.
262
+
263
+ ### Declaring Inputs
264
+
265
+ Inputs declare the data your worker needs from the running workflow instance. Each input becomes an accessor method on your worker, so you can use it directly in `perform` instead of digging through raw hashes.
266
+
267
+ Inputs come from two sources: **variables** and **headers**. [Variables](https://docs.camunda.io/docs/components/concepts/variables/) are data specific to a running workflow instance: an order ID, a customer email, a calculated total. [Headers](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-headers) are set in the BPMN process definition and are the same for every instance, so they are useful for configuration like which email template to send.
268
+
269
+ #### From Variables
270
+
271
+ ```ruby
272
+ class ShipOrderWorker < Busybee::Worker
273
+ variable :order_id, type: :uuid, description: "Order to ship"
274
+ variable :shipping_method, default: "standard"
275
+
276
+ def perform
277
+ order = Order.find(order_id)
278
+ order.ship!(method: shipping_method) # "standard" if not in variables
279
+
280
+ { tracking_number: order.tracking_number }
281
+ end
282
+ end
283
+ ```
284
+
285
+ #### From Headers
286
+
287
+ ```ruby
288
+ class CalculateDistanceWorker < Busybee::Worker
289
+ variable :from_lat, type: :decimal
290
+ variable :from_lon, type: :decimal
291
+ variable :to_lat, type: :decimal
292
+ variable :to_lon, type: :decimal
293
+
294
+ header :algorithm, type: :string, description: "Distance formula to use"
295
+
296
+ output :distance, type: :decimal
297
+
298
+ def perform
299
+ dist = compute_distance(algorithm)
300
+
301
+ { distance: dist.round(3) }
302
+ end
303
+ end
304
+ ```
305
+
306
+ Because the algorithm is a header, different BPMN tasks can reuse the same worker with different algorithms: one task might set the header to `"haversine"`, another to `"pythagorean"`.
307
+
308
+ #### From Either Source
309
+
310
+ Sometimes a value should come from a variable when available, but fall back to a header as a default (or vice versa). Pass an array of sources -- the first non-nil value wins:
311
+
312
+ ```ruby
313
+ input :priority, source: [:variable, :header], type: :string
314
+ ```
315
+
316
+ This is the general form. The `variable` and `header` DSL methods are shorthands:
317
+
318
+ ```ruby
319
+ variable :template # same as `input :template, source: :variable`
320
+ header :template # same as `input :template, source: :header`
321
+ input :template, source: [:variable, :header] # check variable first, then header
322
+ ```
323
+
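The "first non-nil value wins" rule can be sketched as a small lookup over the listed sources. `resolve_input` is a hypothetical helper for this example and takes plain Hashes for the variables and headers; it is not how Busybee is implemented internally.

```ruby
# Hypothetical helper for illustration only; not part of Busybee's API.
def resolve_input(name, sources:, variables:, headers:)
  pools = { variable: variables, header: headers }

  # Check each source in the order given and return the first non-nil value.
  Array(sources).each do |source|
    value = pools.fetch(source)[name]
    return value unless value.nil?
  end

  nil
end

puts resolve_input(
  :priority,
  sources: [:variable, :header],
  variables: { priority: nil },
  headers: { priority: "high" }
)
```

Reversing the array to `[:header, :variable]` would make the BPMN header the primary value and the process variable the fallback.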
324
+ #### Input Options
325
+
326
+ | Option | Type | Default | Description |
327
+ |--------|------|---------|-------------|
328
+ | `source:` | Symbol or Array | (required for `input`) | `:variable`, `:header`, or `[:variable, :header]` |
329
+ | `required:` | Boolean | `true`\* | Raise `MissingInput` if absent. Cannot combine with `default:` |
330
+ | `type:` | Symbol | `nil` | Documentation hint. See [Input/Output Types](#inputoutput-types) |
331
+ | `description:` | String | `nil` | Human-readable description |
332
+ | `default:` | any | (none) | Default value when input is missing. Makes the input not required |
333
+ | `accessor_name:` | Symbol | (same as name) | Custom method name for the accessor |
334
+ | `define_accessor:` | Boolean | `true` | Set to `false` to skip accessor definition |
335
+
336
+ When an input is `required: true` (the default) and the value is missing from the job, Busybee raises `Busybee::MissingInput` before your `perform` method runs. This alerts you early when a workflow uses the worker incorrectly, before the mistake causes harder-to-catch bugs further downstream.
337
+
338
+ > \* The default value of `required` can be switched for your entire app if desired, allowing you to disable the raise-on-missing behavior. See the [configuration](./configuration.md) document.
339
+
340
+ ### Declaring Outputs
341
+
342
+ Outputs declare the variables your worker returns to the workflow engine. When your `perform` method returns a Hash, Busybee sends those key-value pairs back as new or updated [process variables](https://docs.camunda.io/docs/components/concepts/variables/):
343
+
344
+ ```ruby
345
+ class CreateShipmentWorker < Busybee::Worker
346
+ variable :order_id, type: :uuid
347
+ variable :warehouse_id, type: :uuid
348
+
349
+ output :shipment_id, type: :uuid, description: "Created shipment's ID"
350
+ output :item_count, type: :integer, description: "Total item count"
351
+
352
+ def perform
353
+ shipment = Shipment.create!(order_id: order_id, warehouse_id: warehouse_id)
354
+
355
+ { shipment_id: shipment.id, item_count: shipment.items.count }
356
+ end
357
+ end
358
+ ```
359
+
360
+ If a required output is missing from the returned Hash, Busybee raises `Busybee::MissingOutput`. This can alert you to a worker which isn't fulfilling its entire contract (isn't doing everything a workflow is relying on it to do).
361
+
362
+ Note that if `perform` returns nothing at all (or returns anything other than a Hash), no variables are sent back. This is equivalent to returning an empty Hash.
363
+
364
+ #### Output Options
365
+
366
+ | Option | Type | Default | Description |
367
+ |--------|------|---------|-------------|
368
+ | `required:` | Boolean | `true`\* | Raise `MissingOutput` if absent from return value |
369
+ | `type:` | Symbol | `nil` | Documentation hint |
370
+ | `description:` | String | `nil` | Human-readable description |
371
+
372
+ > \* The default value of `required` can be switched for your entire app if desired, allowing you to disable the raise-on-missing behavior. See the [configuration](./configuration.md) document.
373
+
374
+ ### Input/Output Types
375
+
376
+ The `type:` option is a documentation hint that describes what kind of value to expect. Types are not enforced at runtime (job variables arrive as JSON and are deserialized accordingly) but they serve as a contract between the BPMN process definition and your worker code. The available types are designed to align well with [JSON](https://www.json.org/) and Zeebe's [FEEL expression language](https://docs.camunda.io/docs/components/modeler/feel/what-is-feel/):
377
+
378
+ | Type | JSON Representation | Example |
379
+ |------|--------------------| --------|
380
+ | `string` | String | `"hello"` |
381
+ | `integer` | Number (integer) | `42` |
382
+ | `decimal` | Number (float) | `99.95` |
383
+ | `boolean` | Boolean | `true` |
384
+ | `datetime` | String ([ISO 8601](https://en.wikipedia.org/wiki/ISO_8601)) | `"2026-03-06T14:30:00Z"` |
385
+ | `duration` | String ([ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations)) | `"PT6H"` |
386
+ | `uuid` | String | `"550e8400-e29b-41d4-a716-446655440000"` |
387
+ | `null` | null | `null` |
388
+
389
+ Note that, while JSON and FEEL support array and object types, this version of busybee does not yet provide that support. If you have array- or object-shaped inputs or outputs, either omit the `type:` option, or set it to `null` (which will not be enforced anywhere).
390
+
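As a quick check of how the types in the table map onto JSON, the sketch below serializes a Hash shaped like a worker's return value using only Ruby's standard library. The variable names and values are made up for the example, and nothing here calls Busybee.

```ruby
require "json"
require "time" # provides Time#iso8601

# Made-up output values, one per type in the table above.
outputs = {
  confirmation_number: "C-1001",                          # string
  item_count: 3,                                          # integer
  total: 99.95,                                           # decimal
  charged: true,                                          # boolean
  processed_at: Time.utc(2026, 3, 6, 14, 30, 0).iso8601,  # datetime
  hold_for: "PT6H",                                       # duration
  note: nil                                               # null
}

json = JSON.generate(outputs)
parsed = JSON.parse(json)
puts json
```

Note that `datetime` and `duration` values travel as plain JSON strings, so a Ruby `Time` has to be formatted (for example with `iso8601`) before it is returned.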
391
+ > A future busybee version will provide runtime instrumentation hooks for when workers start up or shut down, which will receive input/output types and descriptions among their metadata. This will allow you to register this metadata in your own tracking / auditing systems.
392
+
393
+ ### Advanced DSL Options
394
+
395
+ #### `complete_job_on_success`
396
+
397
+ Controls whether Busybee automatically completes the job when `perform` returns without raising. Default: `true`.
398
+
399
+ Set to `false` when your worker needs to manage the job lifecycle manually (for example, when completion depends on a conditional branch, or when using async patterns):
400
+
401
+ ```ruby
402
+ class PickAndPackWorker < Busybee::Worker
403
+ complete_job_on_success false
404
+ fail_job_on_error false
405
+
406
+ def perform
407
+ delay = calculate_delay
408
+ current_job = job
409
+
410
+ Concurrent::Promises.future { simulate_packing(current_job, delay) }
411
+ .then { current_job.complete! }
412
+ .rescue { |err| current_job.fail!(err) }
413
+ end
414
+ end
415
+ ```
416
+
417
+ > See the [Dropship Co. demo app's simulation workers](../spec/demo/app/workers/sim/) for a full example of this pattern.
418
+
419
+ #### `fail_job_on_error`
420
+
421
+ Controls whether Busybee automatically fails the job when `perform` raises an exception. Default: `true`.
422
+
423
+ Set to `false` when you want to handle all errors yourself. Note that if the job is neither completed nor failed, it will eventually time out and be retried by the workflow engine.
424
+
425
+ #### `description`
426
+
427
+ A human-readable description of what the worker does. Used for documentation (will be passed to instrumentation hooks in a future version):
428
+
429
+ ```ruby
430
+ description "Calculates distance between two geographic points using a configurable algorithm"
431
+ ```
432
+
433
+ Not to be confused with the `description:` option on input and output declarations, which is similarly used for documentation.
434
+
435
+ #### `job_timeout`
436
+
437
+ How long this worker is allowed to hold a job before the workflow engine assumes that the worker has crashed and will make the job available to another worker. Accepts an Integer (milliseconds) or `ActiveSupport::Duration`:
438
+
439
+ ```ruby
440
+ job_timeout 120_000 # 2 minutes
441
+ job_timeout 2.minutes # same, with ActiveSupport
442
+ ```
443
+
444
+ Default: `60_000` ms (1 minute), configurable via [`Busybee.default_job_lock_timeout`](configuration.md).
445
+
446
+ #### `backoff`
447
+
448
+ How long the workflow engine should wait before making a failed job available for retry. Accepts an Integer (milliseconds) or `ActiveSupport::Duration`:
449
+
450
+ ```ruby
451
+ backoff 30_000 # 30 seconds
452
+ backoff 30.seconds # same, with ActiveSupport
453
+ ```
454
+
455
+ Default: `5_000` ms (5 seconds), configurable via [`Busybee.default_fail_job_backoff`](configuration.md).
456
+
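Both `job_timeout` and `backoff` accept either milliseconds or a duration, which implies some normalization step. A minimal sketch of that idea follows. `to_milliseconds` and `FakeDuration` are hypothetical names for this example; the sketch assumes a duration object reports its length in seconds through `to_f`, and it is not Busybee's actual conversion code.

```ruby
# Stand-in for a duration object; assumed to report its length in seconds via #to_f.
FakeDuration = Struct.new(:seconds) do
  def to_f
    seconds.to_f
  end
end

# Hypothetical helper for illustration only; not part of Busybee's API.
def to_milliseconds(value)
  return value if value.is_a?(Integer) # Integers are already milliseconds

  (value.to_f * 1000).round            # durations are assumed to be in seconds
end

puts to_milliseconds(120_000)
puts to_milliseconds(FakeDuration.new(120))
```

The practical consequence is that a bare Integer is never interpreted as seconds: `backoff 30` would mean 30 milliseconds, not 30 seconds.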
457
+ #### Mode Configuration in the DSL
458
+
459
+ Workers can declare their preferred worker mode and any mode-specific options. These serve as defaults that can be overridden at deploy time via CLI flags or YAML configuration (see [Configuration Precedence](#configuration-precedence)):
460
+
461
+ ```ruby
462
+ class HighThroughputWorker < Busybee::Worker
463
+ worker_mode :streaming
464
+ streaming buffer: true, buffer_throttle: 5 # 5ms delay between accepting jobs
465
+
466
+ def perform
467
+ # ...
468
+ end
469
+ end
470
+
471
+ class BatchWorker < Busybee::Worker
472
+ worker_mode :polling
473
+ polling max_jobs: 50, request_timeout: 30_000
474
+
475
+ def perform
476
+ # ...
477
+ end
478
+ end
479
+ ```
480
+
481
+ See [Worker Modes](#worker-modes) for what these options mean and when to use each mode.
482
+
483
+ #### DSL Quick Reference
484
+
485
+ | DSL Method | Arguments | Default | Description |
486
+ |------------|-----------|---------|-------------|
487
+ | `job_type` | String | Derived from class name | Job type identifier |
488
+ | `description` | String | `nil` | Human-readable description |
489
+ | `variable` | name, opts | | Declare a variable input |
490
+ | `header` | name, opts | | Declare a header input |
491
+ | `input` | name, `source:`, opts | | Declare an input from any source |
492
+ | `output` | name, opts | | Declare an output |
493
+ | `worker_mode` | Symbol | `:hybrid` | `:polling`, `:streaming`, or `:hybrid` |
494
+ | `polling` | `max_jobs:`, `request_timeout:` | `25`, `60_000` | Polling mode options |
495
+ | `streaming` | `buffer:`, `buffer_throttle:` | `true`, `false` | Streaming mode options |
496
+ | `job_timeout` | Integer or Duration | `60_000` | Job lock timeout (ms) |
497
+ | `backoff` | Integer or Duration | `5_000` | Retry backoff delay (ms) |
498
+ | `backpressure_delay` | Integer or Duration | `2_000` | Delay after backpressure error (ms) |
499
+ | `complete_job_on_success` | Boolean | `true` | Auto-complete on success |
500
+ | `fail_job_on_error` | Boolean | `true` | Auto-fail on exception |
501
+ | `shutdown_on` | Exception class(es) | `[]` | Exceptions that trigger shutdown |
502
+
503
+ ---
504
+
505
+ ## Running Workers
506
+
507
+ ### CLI Quick Start
508
+
509
+ Run a worker with `bundle exec busybee`:
510
+
511
+ ```bash
512
+ # Run a single worker
513
+ bundle exec busybee ProcessOrderWorker
514
+
515
+ # Run multiple workers in one process
516
+ bundle exec busybee ProcessOrderWorker ShipOrderWorker NotifyCustomerWorker
517
+
518
+ # Run workers defined in a YAML config file
519
+ bundle exec busybee --config config/busybee.yml
520
+ ```
521
+
522
+ The CLI loads your Rails environment automatically (if present), instantiates the named worker classes, and starts processing jobs. Press Ctrl-C for a graceful shutdown, or Ctrl-C again to force-quit.
523
+
524
+ If you've used [Racecar](https://github.com/zendesk/racecar) to run Kafka consumers, this pattern should be familiar: one executable, one or more handler classes, and a long-running process that connects to the messaging infrastructure and dispatches work.
525
+
526
+ ### CLI Reference
527
+
528
+ ```
529
+ Usage: busybee [options] WorkerClass [WorkerClass ...]
530
+ ```
531
+
532
+ | Flag | Short | Type | Description |
533
+ |------|-------|------|-------------|
534
+ | `--config FILE` | `-c` | String | Path to a [YAML configuration file](#yaml-configuration) |
535
+ | `--worker-mode MODE` | `-m` | String | Worker mode: `polling`, `streaming`, or `hybrid` |
536
+ | `--log-format FORMAT` | `-l` | String | Log format: `text` or `json` |
537
+ | `--worker-name NAME` | `-n` | String | Worker process identifier (default: hostname) |
538
+ | `--cluster-address ADDR` | `-a` | String | Zeebe gateway address as `host:port` |
539
+ | `--version` | `-v` | | Print version and exit |
540
+ | `--help` | `-h` | | Print help and exit |
541
+
542
+ **Mutual Exclusions:**
543
+
544
+ - `--config` and `--worker-mode` cannot be used together. Set `worker_mode` in YAML instead.
545
+ - `--config` and positional worker arguments cannot be used together. List workers in YAML instead.
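The two exclusions above can be sketched with Ruby's standard `optparse` library. This is an illustration only, not Busybee's actual CLI code: the helper name `parse_busybee_args` and the error messages are assumptions, and only two of the flags are modeled.

```ruby
require "optparse"

# Hypothetical helper (not part of Busybee): parses argv and enforces the
# two mutual exclusions described above.
def parse_busybee_args(argv)
  options = {}
  parser = OptionParser.new do |o|
    o.on("-c", "--config FILE")      { |v| options[:config] = v }
    o.on("-m", "--worker-mode MODE") { |v| options[:worker_mode] = v }
  end
  workers = parser.parse(argv) # returns the positional (non-flag) arguments

  if options[:config] && options[:worker_mode]
    raise ArgumentError, "--config and --worker-mode cannot be used together"
  end
  if options[:config] && !workers.empty?
    raise ArgumentError, "--config and positional worker arguments cannot be used together"
  end

  options.merge(workers: workers)
end

parse_busybee_args(["-m", "polling", "ProcessOrderWorker"])
# => { worker_mode: "polling", workers: ["ProcessOrderWorker"] }
```

Validating after parsing, rather than inside the option callbacks, keeps the check independent of the order in which flags appear on the command line.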
546
+
547
+ ### Rails Integration
548
+
549
+ The CLI automatically loads your Rails environment by requiring `./config/environment`. This means your workers have access to your models, application config, and everything else in your Rails app. Most gem configuration settings (credentials, logging, etc.) can be set through Rails app configuration values. See [Configuration: Rails Integration](configuration.md#rails-integration).
550
+
551
+ If you don't have Rails installed, loading the environment will be skipped automatically and transparently. If you _do_ have Rails installed but for some reason you want to skip loading the Rails environment, you can set an env var:
552
+
553
+ ```bash
554
+ BUSYBEE_SKIP_RAILS=1 bundle exec busybee MyWorker
555
+ ```
556
+
557
+ (Using an env var is necessary because the decision to attempt loading the environment must be made before we can load any configuration values from that environment.)
558
+
559
+ ### Signal Handling
560
+
561
+ The worker process responds to standard Unix signals:
562
+
563
+ | Signal | First time | Second time |
564
+ |--------|-----------|------------|
565
+ | `INT` (Ctrl-C) | Graceful shutdown: stop accepting new jobs, finish in-flight work | Force shutdown: exit immediately |
566
+ | `TERM` | Same as INT | Same as INT |
567
+ | `QUIT` | Same as INT | Same as INT |
568
+
569
+ During graceful shutdown, any jobs that were received from the workflow engine but not yet started are failed back to the workflow engine with their retry count preserved, so they'll be picked up by another worker.
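The first-signal / second-signal behavior in the table can be modeled as a tiny state tracker. This is a sketch under assumptions: `ShutdownState` is a hypothetical name, and Busybee's real signal handling may be structured quite differently.

```ruby
# Hypothetical state tracker (not Busybee's implementation): the first
# signal requests a graceful stop, any later signal forces an exit.
class ShutdownState
  def initialize
    @signals_seen = 0
  end

  # Intended to be called from a signal handler; returns the action to take.
  def signal!
    @signals_seen += 1
    @signals_seen == 1 ? :graceful : :force
  end

  def shutting_down?
    @signals_seen.positive?
  end
end

state = ShutdownState.new
# Wiring this to real signals would look roughly like the following
# (left commented out so the sketch has no side effects):
# %w[INT TERM QUIT].each do |sig|
#   Signal.trap(sig) { exit!(1) if state.signal! == :force }
# end
```

A worker loop would check `state.shutting_down?` between jobs, so in-flight work finishes before the process exits.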
570
+
571
+ ### Worker Modes
572
+
573
+ Zeebe supports two different ways of fetching jobs for your worker: long-polling and streaming. Each has advantages and disadvantages. Busybee supports both modes, as well as a third hybrid mode which eliminates the downsides of using either polling or streaming alone.
574
+
575
+ **If you don't know (or don't want to think about) which mode to use, use hybrid mode.** It's the default, it's been specifically designed to give you the best of both worlds, and it will allow you to mostly ignore this section. However, if you want to understand the tradeoffs between the different modes, read on.
576
+
577
+ #### Polling
578
+
579
+ ```ruby
580
+ worker_mode :polling
581
+ ```
582
+
583
+ In polling mode, the busybee process for your worker repeatedly [long-polls](https://docs.camunda.io/docs/apis-tools/zeebe-api/gateway-service/#activatejobs-rpc) the Zeebe gateway: "give me up to N jobs of this type." If no jobs are available, the call blocks until at least one job becomes available or the request timeout elapses. Your worker receives the available jobs, processes them sequentially, then polls again.
584
+
585
+ This is the simplest mode, built on the oldest API. It has two principal downsides compared to streaming mode. First, it requires considerably more network traffic. Second, it adds latency to each job, both within the workflow engine (where the job waits for the next polling request) and in the worker process (where the job waits while the rest of its batch is processed sequentially). However, it avoids the main downside of [streaming mode](#streaming) by guaranteeing that it will eventually retrieve all jobs created prior to the polling request.
586
+
587
+ **Options:**
588
+
589
+ | Option | DSL | YAML/CLI | Default | Description |
590
+ |--------|-----|----------|---------|-------------|
591
+ | Max jobs per request | `polling max_jobs: N` | `max_jobs` | `25` | Limit on how many jobs to fetch per poll |
592
+ | Request timeout | `polling request_timeout: N` | `request_timeout` | `60_000` ms | Limit on how long to wait for jobs before the gateway returns an empty response |
593
+
594
+ **When to Use:** Polling is good for local prototyping, to ensure that backlogs of unprocessed "invisible" jobs cannot form due to race conditions. For deployed or production-like environments, polling should not normally be used, but could be useful during incident response to help clean up a large backlog of available jobs.
595
+
596
+ #### Streaming
597
+
598
+ ```ruby
599
+ worker_mode :streaming
600
+ ```
601
+
602
+ In streaming mode, the busybee process for your worker opens a persistent [gRPC stream](https://docs.camunda.io/docs/apis-tools/zeebe-api/gateway-service/#streamactivatedjobs-rpc) connection to the workflow engine. The engine pushes jobs to your worker as soon as they're created.
603
+
604
+ This is the more modern mode, giving you the lowest possible latency for new jobs, and the lowest amount of network overhead to get them. But it has a major downside: streams only ever deliver jobs *created after the stream opens.* If there were jobs of that type already backlogged in the workflow engine, a worker in streaming mode won't ever see them. For that, you need polling or [hybrid mode](#hybrid).
605
+
606
+ With default settings, a streaming worker accepts jobs from the workflow engine immediately, buffering them in memory until your worker code executes them. This helps ensure the stream stays responsive and enables [buffer throttling](#buffer-throttle) for controllable backpressure if the in-memory buffer grows too large. Jobs are still processed sequentially.
607
+
608
+ **Options:**
609
+
610
+ | Option | DSL | YAML/CLI | Default | Description |
611
+ |--------|-----|----------|---------|-------------|
612
+ | Buffer mode | `streaming buffer: true/false` | `buffer` | `true` | Use the buffer. Set to `false` for inline (unbuffered) processing. |
613
+ | Buffer throttle | `streaming buffer_throttle: N` | `buffer_throttle` | `false` | Delay between accepting jobs, in ms. See [Buffer Throttle](#buffer-throttle). |
614
+
615
+ **When to Use:** Whenever you can guarantee that there will be no pre-existing backlog of available jobs. In practice, that guarantee can be difficult to meet, because it depends on human processes to ensure that workflows are never deployed or started before all of the workers they rely on are already running.
616
+
617
+ #### Hybrid
618
+
619
+ ```ruby
620
+ worker_mode :hybrid
621
+ ```
622
+
623
+ In hybrid mode, busybee combines both approaches to avoid the downsides of either. It opens a stream to capture new jobs immediately, buffering them in memory, then also makes polling requests to drain any backlog. Once the backlog is caught up, it stops polling and continues stream-only processing.
624
+
625
+ This is the default mode, and it should be set-and-forget in most cases.
626
+
627
+ Hybrid mode works in three phases:
628
+
629
+ 1. **Open Stream** - starts receiving new jobs immediately, into the buffer.
630
+ 2. **Drain Backlog** - polls for pre-existing jobs while also processing any stream jobs that arrive. Stream jobs always take priority (the backlog is only drained if the worker is keeping ahead of the new jobs in the stream).
631
+ 3. **Stream Only** - once the backlog is caught up, it stops polling, but continues processing jobs from the stream.
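The three phases above can be illustrated with a toy, single-threaded model. Everything here is an assumption made for illustration: `run_hybrid` is a hypothetical name, plain arrays stand in for the stream buffer and the engine's backlog, and treating an empty poll as "backlog caught up" is a guess at the stopping rule rather than a documented detail.

```ruby
# Toy model of hybrid mode (not Busybee's implementation).
# `stream` stands in for the in-memory buffer fed by the gRPC stream;
# `backlog` stands in for jobs that only a polling request can retrieve.
def run_hybrid(stream:, backlog:, max_jobs: 2)
  processed = []
  polling = true # phases 1-2: stream is open and the backlog is being drained

  loop do
    if !stream.empty?
      processed << stream.shift         # stream jobs always take priority
    elsif polling
      batch = backlog.shift(max_jobs)   # one "poll" returns up to max_jobs jobs
      polling = false if batch.empty?   # assumed rule: empty poll means caught up
      processed.concat(batch)
    else
      break                             # phase 3: stream only (the toy model stops)
    end
  end

  processed
end

run_hybrid(stream: %w[new-1 new-2], backlog: %w[old-1 old-2 old-3])
# => ["new-1", "new-2", "old-1", "old-2", "old-3"]
```

In a real worker the stream keeps delivering jobs while the backlog drains, so the two sources interleave instead of running strictly one after the other.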
632
+
633
+ All calls to your `perform` method happen on the main thread, maintaining the same sequential guarantee as the other modes.
634
+
635
+ **When to Use:** Nearly always. This is the default and the right choice for most workloads. You get low latency for new jobs, low network load, *and* reliable backlog processing after deploys or restarts.
636
+
637
+ #### Buffer Throttle
638
+
639
+ When using hybrid mode, or streaming mode with the default `buffer: true`, jobs are consumed from the gRPC stream as soon as they are available, and are buffered in memory while they wait for your worker to process them. This design avoids applying any [backpressure](https://docs.camunda.io/docs/components/concepts/job-workers/#backpressure) to the gRPC gateway, so that the stream does not get marked as `not-ready` and end up missing future jobs (see that link for details).
640
+
641
+ For most workloads, this arrangement should work smoothly. But if your worker processes jobs slowly while the workflow engine pushes jobs quickly, the buffer (and with it the Ruby heap) can grow without bound.
642
+
643
+ The `buffer_throttle` option lets you address this situation by adding a sleep between accepting each job. This limits the rate at which busybee accepts jobs from the gRPC gateway, which limits how fast the buffer can grow.
644
+
645
+ For most users, the default (`false`, no throttle) is the right choice. Only tune this if you observe concerning memory growth or OOM errors from your workers due to unbounded buffer depth.
646
+
647
+ ```ruby
648
+ streaming buffer: true, buffer_throttle: 5.0 # 5ms delay between accepting each job -- max 200 jobs/s
649
+ ```
650
+
651
+ | `buffer_throttle` value | Behavior | Rate Cap (Approx.) |
652
+ |-------------------------|----------|------------------|
653
+ | `false` (default) | No throttling (buffer can grow without bound) | Not capped |
654
+ | `0` | Minimal possible throttling (see Sleep Granularity, below) | ~200k - ~1M jobs/sec |
655
+ | `0.1` - `10` (ms) | Practical range for stable throttling | Up to 10,000 jobs/sec |
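The rate caps in the table follow from simple arithmetic: one accepted job per throttle interval gives at most `1000 / buffer_throttle` jobs per second. A minimal sketch, where `approx_rate_cap` is a hypothetical helper and not part of Busybee:

```ruby
# Hypothetical helper: approximate upper bound on jobs accepted per second
# for a given `buffer_throttle` value in milliseconds.
def approx_rate_cap(buffer_throttle_ms)
  return Float::INFINITY if buffer_throttle_ms == false # unthrottled: no cap
  return nil if buffer_throttle_ms.zero?                # infrastructure-dependent; measure it
  1000.0 / buffer_throttle_ms                           # one job per throttle interval
end

approx_rate_cap(5)   # => 200.0
approx_rate_cap(10)  # => 100.0
```

Actual throughput will be somewhat lower than this bound, because accepting each job takes time in addition to the sleep.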
656
+
657
+ Note that `buffer_throttle` is not a panacea. If your system is generating jobs at a faster rate than your worker can process them, enabling throttling **alone** will only make the problem worse. If the stream for your worker is [marked `not-ready` by the gRPC gateway due to being too slow](https://docs.camunda.io/docs/components/concepts/job-workers/#backpressure), some future jobs will not be routed to it and will end up "hidden" in the workflow engine's buffer, where they will never be sent to a stream (and must be polled for). The _true_ solution to the problem of having too many jobs is to add additional capacity by scaling your worker either horizontally (adding more replicas) or vertically (adding more CPU or memory). In such a situation, using `buffer_throttle` lets you ensure that any one replica never gets overloaded and runs out of memory.
658
+
659
+ > Instrumentation hooks for monitoring buffer depth, and detecting the need for additional capacity, are planned for v0.4.
660
+
661
+ **Sleep Granularity:** Ruby's `Kernel#sleep` delegates to `nanosleep(2)` on POSIX systems. Values down to 0.1ms (100 microseconds) work reliably on modern Linux and macOS. Below that, OS scheduler and GVL overhead dominate, so sub-0.1ms values are unlikely to behave meaningfully. Therefore, the maximum *stable and reliable* rate cap you can get is close to 10k jobs/sec, which you get from `buffer_throttle: 0.1`.
662
+
663
+ However, there is an option that gives you a rate cap higher than this value without being totally unthrottled. If you set `buffer_throttle` to 0, the thread does not actually sleep, but it does cause a context switch, which costs more than simply doing nothing (on the order of 1-5µs). Setting `buffer_throttle: 0` should give you a rate cap somewhere between roughly 200k - 1M jobs/sec, but the exact value will depend on your infrastructure.
664
+
665
+ #### Backpressure
666
+
667
+ When the Zeebe cluster is under heavy load, it may respond to requests with a `ResourceExhausted` gRPC error. Both the polling and hybrid modes handle this automatically by sleeping for `backpressure_delay` milliseconds (default: 2,000) before retrying.
668
+
669
+ ```ruby
670
+ backpressure_delay 10_000 # wait 10 seconds on backpressure
671
+ ```
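The sleep-then-retry behavior can be sketched as follows. This is not Busybee's code: `FakeResourceExhausted` is a stand-in for the error raised by the grpc gem, `with_backpressure_retry` is a hypothetical name, and the `max_attempts` limit is an assumption added so the sketch always terminates.

```ruby
# Stand-in for the gRPC "resource exhausted" error (assumption for this sketch).
class FakeResourceExhausted < StandardError; end

# Retries the block after sleeping `backpressure_delay_ms` whenever the
# cluster signals backpressure. `sleeper` is injectable so the sketch can
# be exercised without real waiting.
def with_backpressure_retry(backpressure_delay_ms: 2_000, max_attempts: 5,
                            sleeper: Kernel.method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue FakeResourceExhausted
    raise if attempts >= max_attempts            # assumed limit, not documented
    sleeper.call(backpressure_delay_ms / 1000.0) # Kernel#sleep takes seconds
    retry
  end
end

# Usage: the block stands in for a polling request to the gateway.
# with_backpressure_retry(backpressure_delay_ms: 10_000) { poll_for_jobs }
```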
672
+
673
+ > Backpressure delays are slated to be overhauled in v0.5, and this section is expected to be rewritten at that time.
674
+
675
+ ### Multiple Workers in One Process
676
+
677
+ When you pass multiple worker classes (via CLI args or YAML), Busybee runs them in a single process. Each worker runs in a dedicated thread, sharing a single gRPC connection to Zeebe:
678
+
679
+ ```bash
680
+ bundle exec busybee ProcessOrderWorker ShipOrderWorker NotifyCustomerWorker
681
+ ```
682
+
683
+ Each worker's configuration gets resolved independently through the [precedence chain](#configuration-precedence), so one worker can poll while another worker streams.
684
+
685
+ #### Thread Safety
686
+
687
+ Jobs of the *same* type are always processed sequentially. That is, only one instance of a given worker class will ever be executing `perform` at a given moment. But jobs of *different* types in the same process run concurrently on separate threads. If your workers perform operations on shared resources (global state, shared caches, non-thread-safe libraries), you'll need to handle synchronization yourself. Most common Rails operations (ActiveRecord queries, cache reads/writes) are already thread-safe.
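As a minimal sketch of handling synchronization yourself, a shared object can guard its state with a `Mutex`. The `SharedCounter` class is hypothetical, and the two plain threads below stand in for two different worker types running in one process.

```ruby
# Hypothetical shared resource touched by more than one worker type.
class SharedCounter
  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def increment
    @mutex.synchronize { @count += 1 } # read-modify-write happens under the lock
  end

  def value
    @mutex.synchronize { @count }
  end
end

counter = SharedCounter.new
# Two threads stand in for two worker types whose `perform` calls overlap.
threads = 2.times.map { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
counter.value # => 2000
```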
688
+
689
+ > An opt-in feature to run workers concurrently (multi-threaded) will be included in a future version of Busybee, and this section will be updated.
690
+
691
+ #### Database Connections
692
+
693
+ When running multiple workers, ensure your database connection pool is large enough to support one connection for each worker. Busybee logs a warning at startup if the ActiveRecord pool size is smaller than the number of workers.
694
+
695
+ ### YAML Configuration
696
+
697
+ For repeatable deployments, define your worker configuration in a YAML file:
698
+
699
+ ```yaml
700
+ # config/busybee.yml
701
+ worker_mode: hybrid
702
+ job_timeout: 120000
703
+ backoff: 10000
704
+
705
+ workers:
706
+ - ProcessOrderWorker
707
+ - ShipOrderWorker
708
+ - NotifyCustomerWorker
709
+ ```
710
+
711
+ Run it with:
712
+
713
+ ```bash
714
+ bundle exec busybee --config config/busybee.yml
715
+ ```
716
+
717
+ #### Per-Worker Overrides
718
+
719
+ Different workers often have different performance characteristics, so YAML supports per-worker overrides for any per-worker setting:
720
+
721
+ ```yaml
722
+ worker_mode: hybrid
723
+ workers:
724
+   - ProcessOrderWorker:
725
+       worker_mode: polling
726
+       max_jobs: 50
727
+       request_timeout: 10000
728
+   - ShipOrderWorker:
729
+       worker_mode: streaming
730
+       buffer_throttle: 5
731
+   - NotifyCustomerWorker # uses top-level defaults
732
+ ```
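To show how a `workers` list that mixes bare names and per-worker hashes can be read, here is a sketch using Ruby's standard `yaml` library. The `worker_settings` helper is hypothetical and is not how Busybee necessarily loads its config. It only demonstrates the shape of the data and the "override wins over top level" merge.

```ruby
require "yaml"

# Hypothetical normalizer: maps each worker class name to its effective
# settings (top-level keys merged with that worker's overrides).
def worker_settings(yaml_text)
  config = YAML.safe_load(yaml_text)
  top_level = config.reject { |key, _| key == "workers" }

  config.fetch("workers").map do |entry|
    # A bare string has no overrides; a single-key hash maps name => overrides.
    name, overrides = entry.is_a?(Hash) ? entry.first : [entry, {}]
    [name, top_level.merge(overrides || {})]
  end.to_h
end

example = <<~YAML
  worker_mode: hybrid
  workers:
    - ProcessOrderWorker:
        worker_mode: polling
        max_jobs: 50
    - NotifyCustomerWorker
YAML

worker_settings(example)
# => { "ProcessOrderWorker"   => { "worker_mode" => "polling", "max_jobs" => 50 },
#      "NotifyCustomerWorker" => { "worker_mode" => "hybrid" } }
```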
733
+
734
+ #### YAML Reference
735
+
736
+ **Top-level keys** (apply to all workers unless overridden):
737
+
738
+ | Key | Type | Description |
739
+ |-----|------|-------------|
740
+ | `worker_mode` | String | `polling`, `streaming`, or `hybrid` |
741
+ | `max_jobs` | Integer | Max jobs per polling request |
742
+ | `request_timeout` | Integer | Long-poll timeout (ms) |
743
+ | `job_timeout` | Integer | Job lock timeout (ms) |
744
+ | `backoff` | Integer | Retry backoff (ms) |
745
+ | `backpressure_delay` | Integer | Delay after backpressure error (ms) |
746
+ | `buffer` | Boolean | Enable job buffering in streaming mode |
747
+ | `buffer_throttle` | Integer/Boolean | Job buffer delay (ms). `false` to disable |
748
+ | `workers` | Array | Worker class names, with optional per-worker overrides |
749
+
750
+ **Process-wide settings** (`log_format`, `worker_name`, `cluster_address`) cannot be set in YAML. To set them at launch, use the corresponding CLI flags alongside `--config`:
751
+
752
+ ```bash
753
+ bundle exec busybee --config config/busybee.yml --log-format json --worker-name "prod-worker-1"
754
+ ```
755
+
756
+ > For a realistic example, see the [Dropship Co. demo app's busybee config files](../spec/demo/config/busybee/).
757
+
758
+ ### Configuration Precedence
759
+
760
+ **Worker runtime settings** resolve through a 4-level precedence chain. Each level overrides the one below it:
761
+
762
+ ```
763
+ Per-Worker Override in YAML    (highest priority)   `workers: - MyWorker: { max_jobs: 50 }`
764
+              |                         v
765
+ Top-Level YAML / CLI Flag              v            `max_jobs: 50` (at YAML top level)
766
+              |                         v
767
+ Worker DSL Declaration                 v            `polling max_jobs: 32` (in the Worker class)
768
+              |                         v
769
+ Gem Configuration & Defaults   (lowest priority)    `Busybee.default_max_jobs` (25 by default, but can be set in config)
770
+ ```
771
+
772
+ The first non-nil value wins. This means `0` and `false` are valid explicit values -- for example, `buffer_throttle: false` explicitly disables throttling even if a lower level sets it.
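The "first non-nil value wins" rule is easy to get wrong with `||`, which would skip `false` and treat it as unset. A minimal sketch of the rule, where `resolve_setting` is a hypothetical helper and not Busybee's API:

```ruby
# Levels are passed from highest to lowest priority; nil means "not set here".
def resolve_setting(*levels)
  levels.find { |value| !value.nil? }
end

# per-worker YAML, top-level YAML / CLI, worker DSL, gem default
resolve_setting(nil, 50, 32, 25)    # => 50
resolve_setting(nil, false, 5, nil) # => false (explicitly disabled)
resolve_setting(nil, nil, nil, 25)  # => 25
```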
773
+
774
+ The [per-worker settings](#yaml-reference) this applies to are: `worker_mode`, `max_jobs`, `request_timeout`, `job_timeout`, `backoff`, `backpressure_delay`, `buffer`, and `buffer_throttle`.
775
+
776
+ **Process-wide settings** (like `--log-format`, `--worker-name`, and `--cluster-address`) follow a simpler 2-level chain: the CLI flag, then gem config / default. They don't participate in per-worker overrides because they always apply to the entire process. Also, they often take env vars as their inputs, so they are less useful in YAML.
777
+
778
+ For gem-level defaults (the bottom of the chain), see [Configuration](configuration.md).
779
+
780
+ ---
781
+
782
+ ## Testing Workers
783
+
784
+ Busybee includes helpers that let you unit test your workers without a running Zeebe instance, by constructing a simulated job and then running the real worker lifecycle with it.
785
+
786
+ This is a complement to the [workflow tests](testing.md) that you write. Workflow tests verify your process definitions, ensuring that the correct jobs will be available with the correct variables at the right times in the business process. Worker tests, by contrast, verify that your workers perform those jobs correctly under different conditions and with different inputs.
787
+
788
+ See [Testing](testing.md) for information about testing workflow definitions. Read on for more about testing your workers.
789
+
790
+ ### Setup
791
+
792
+ If you've already set up `busybee/testing` for BPMN workflow tests, worker testing helpers are available automatically. If not:
793
+
794
+ ```ruby
795
+ # spec/spec_helper.rb or spec/rails_helper.rb
796
+ require "rspec"
797
+ require "busybee/testing"
798
+ ```
799
+
800
+ This makes `execute_worker`, `build_test_job`, and the worker matchers available in all RSpec examples.
801
+
802
+ ### Basic Worker Testing
803
+
804
+ The simplest way to test a worker is `execute_worker`. It runs the full worker lifecycle (input validation, `perform`, output validation, auto-complete) and returns the result:
805
+
806
+ ```ruby
807
+ RSpec.describe ProcessOrderWorker do
808
+   let(:order) { create(:order) }
809
+
810
+   it "processes the order and returns confirmation number" do
811
+     result = execute_worker(described_class, variables: { order_id: order.id })
812
+     expect(result[:confirmation_number]).to be_present
813
+   end
814
+
815
+   it "marks the order as processed" do
816
+     execute_worker(described_class, variables: { order_id: order.id })
817
+     expect(order.reload).to be_processed
818
+   end
819
+
820
+   it "raises when order is missing" do
821
+     expect {
822
+       execute_worker(described_class, variables: { order_id: "nonexistent" })
823
+     }.to raise_error(ActiveRecord::RecordNotFound)
824
+   end
825
+ end
826
+ ```
827
+
828
+ `execute_worker` accepts the same keyword arguments as `build_test_job` (or a pre-built job via `job:`, as shown in [Inspecting Job State](#inspecting-job-state)):
829
+
830
+ | Argument | Type | Default | Description |
831
+ |----------|------|---------|-------------|
832
+ | `variables:` | Hash | `{}` | Process variables |
833
+ | `headers:` | Hash | `{}` | Custom headers |
834
+ | `bpmn_process_id:` | String | `"test-process"` | BPMN process ID |
835
+ | `retries:` | Integer | `3` | Retry count |
836
+
837
+ Errors are re-raised after the worker's error handling runs, so you can use `expect { }.to raise_error` alongside job status assertions (see below).
838
+
839
+ ### Inspecting Job State
840
+
841
+ When you need to assert on what the worker *did* to the job (completed it? failed it? threw a BPMN error?), or if you need so many variables or headers that passing all options inline becomes unreadable, you can build a test job first with `build_test_job` and then pass it to `execute_worker`:
842
+
843
+ ```ruby
844
+ RSpec.describe ProcessOrderWorker do
845
+   it "completes the job on success" do
846
+     job = build_test_job(variables: { order_id: create(:order).id })
847
+     execute_worker(described_class, job: job)
848
+     expect(job).to be_complete
849
+   end
850
+
851
+   it "fails the job on error" do
852
+     job = build_test_job(variables: { order_id: "nonexistent" })
853
+     expect { execute_worker(described_class, job: job) }
854
+       .to raise_error(ActiveRecord::RecordNotFound)
855
+     expect(job).to be_failed
856
+   end
857
+ end
858
+ ```
859
+
860
+ `build_test_job` returns a real `Busybee::Job` backed by a stub client. All lifecycle operations (`complete!`, `fail!`, `throw_bpmn_error!`) update the job's status but don't make any network calls.
861
+
862
+ ### Worker Testing Matchers
863
+
864
+ For more expressive assertions, Busybee provides three RSpec matchers that combine execution and verification in a single expectation.
865
+
866
+ #### `complete_job`
867
+
868
+ Asserts that a worker completes the job successfully:
869
+
870
+ ```ruby
871
+ job = build_test_job(variables: { order_id: order.id })
872
+
873
+ # Just assert completion
874
+ expect(ProcessOrderWorker).to complete_job(job)
875
+
876
+ # Assert completion with specific output variables
877
+ expect(ProcessOrderWorker).to complete_job(job)
878
+ .with_vars(confirmation_number: "ORD-123")
879
+
880
+ # Assert completion with no output variables
881
+ expect(NotifyCustomerWorker).to complete_job(job).with_no_vars
882
+
883
+ # Works with RSpec composable matchers
884
+ expect(ProcessOrderWorker).to complete_job(job)
885
+ .with_vars(hash_including(confirmation_number: a_string_starting_with("ORD-")))
886
+ ```
887
+
888
+ #### `fail_job`
889
+
890
+ Asserts that a worker fails the job. Optionally match the error class and/or message, using the same argument forms as RSpec's `raise_error`:
891
+
892
+ ```ruby
893
+ job = build_test_job(variables: { order_id: "nonexistent" })
894
+
895
+ # Just assert failure
896
+ expect(ProcessOrderWorker).to fail_job(job)
897
+
898
+ # Match error class
899
+ expect(ProcessOrderWorker).to fail_job(job)
900
+ .with_error(ActiveRecord::RecordNotFound)
901
+
902
+ # Match error class and message pattern
903
+ expect(ProcessOrderWorker).to fail_job(job)
904
+ .with_error(ActiveRecord::RecordNotFound, /Couldn't find Order/)
905
+
906
+ # Match message only
907
+ expect(ProcessOrderWorker).to fail_job(job)
908
+ .with_error(/not found/)
909
+ ```
910
+
911
+ #### `throw_bpmn_error_on`
912
+
913
+ Asserts that a worker throws a [BPMN error](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/). Remember, a BPMN error is a workflow control-flow concept, distinct from a Ruby exception. When your worker throws a BPMN error, it signals to the process instance that a known business condition occurred, and the workflow definition decides what happens next.
914
+
915
+ ```ruby
916
+ job = build_test_job(variables: { order_id: expired_order.id })
917
+
918
+ # Just assert a BPMN error was thrown
919
+ expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
920
+
921
+ # Match error code (symbol form - converted to uppercase)
922
+ expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
923
+ .with_code(:order_expired) # matches code "ORDER_EXPIRED"
924
+
925
+ # Match error code and message
926
+ expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
927
+ .with_code(:order_expired, message: /has expired/)
928
+
929
+ # Match code from exception class (MyApp::OrderExpired -> "MY_APP_ORDER_EXPIRED")
930
+ expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
931
+ .with_code(MyApp::OrderExpired)
932
+ ```
933
+
934
+ ### Testing Best Practices
935
+
936
+ One recommended pattern is to compose `build_test_job` using RSpec's `let` blocks, then reuse the job in different contexts while adjusting / overriding the individual parameters. Provided that the `let` blocks themselves do not become unwieldy, this can be a powerful and elegant pattern:
937
+
938
+ ```ruby
939
+ describe ProcessOrderWorker do
940
+   # There could potentially be many more variables, and/or some headers, but we show just one for clarity:
941
+   let(:job) { build_test_job(variables: variables) }
942
+   let(:variables) { { order_id: order_id } }
943
+
944
+   let(:order) { create :order } # e.g. FactoryBot or similar fixture setup
945
+   let(:order_id) { order.id }
946
+
947
+   context "with a valid order" do
948
+     it "processes the order normally" do
949
+       expect(described_class).to complete_job(job).with_vars(confirmation_number: /[A-Z]{6}/)
950
+     end
951
+   end
952
+
953
+   # Now we can override just the fixture object while reusing the rest of the job setup:
954
+   context "with an order without sufficient funds" do
955
+     let(:order) { create :order, :insufficient_funds } # e.g. a FactoryBot trait or similar
956
+
957
+     it "throws a BPMN error so the workflow can branch" do
958
+       expect(described_class).to throw_bpmn_error_on(job).with_code("INSUFFICIENT_FUNDS")
959
+     end
960
+   end
961
+
962
+   # Or we can override just the variable's value itself, and bypass the fixture entirely:
963
+   context "when the order is not found" do
964
+     let(:order_id) { SecureRandom.uuid }
965
+
966
+     it "fails and reports the error" do
967
+       expect(described_class).to fail_job(job).with_error(ActiveRecord::RecordNotFound)
968
+     end
969
+   end
970
+
971
+   # Or even override the entire set of variables:
972
+   context "when a workflow does not pass the expected set of variables" do
973
+     let(:variables) { {} }
974
+
975
+     it "fails input validation, alerting us to the problem" do
976
+       expect(described_class).to fail_job(job).with_error(Busybee::MissingInput)
977
+     end
978
+   end
979
+ end
980
+ ```
981
+
982
+ For more realistic examples, see the [demo app's worker specs](../spec/demo/spec/workers/).