busybee 0.1.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +71 -7
- data/README.md +70 -42
- data/docs/client/quick_start.md +279 -0
- data/docs/client.md +825 -0
- data/docs/configuration.md +550 -0
- data/docs/grpc.md +50 -25
- data/docs/testing.md +118 -28
- data/docs/workers.md +982 -0
- data/exe/busybee +6 -0
- data/lib/busybee/cli.rb +173 -0
- data/lib/busybee/client/error_handling.rb +37 -0
- data/lib/busybee/client/job_operations.rb +236 -0
- data/lib/busybee/client/message_operations.rb +84 -0
- data/lib/busybee/client/process_operations.rb +108 -0
- data/lib/busybee/client/variable_operations.rb +64 -0
- data/lib/busybee/client.rb +87 -0
- data/lib/busybee/configure.rb +290 -0
- data/lib/busybee/credentials/camunda_cloud.rb +58 -0
- data/lib/busybee/credentials/insecure.rb +24 -0
- data/lib/busybee/credentials/oauth.rb +157 -0
- data/lib/busybee/credentials/tls.rb +43 -0
- data/lib/busybee/credentials.rb +200 -0
- data/lib/busybee/defaults.rb +20 -0
- data/lib/busybee/error.rb +50 -0
- data/lib/busybee/grpc/error.rb +60 -0
- data/lib/busybee/grpc.rb +2 -2
- data/lib/busybee/job.rb +219 -0
- data/lib/busybee/job_stream.rb +85 -0
- data/lib/busybee/logging.rb +61 -0
- data/lib/busybee/railtie.rb +113 -0
- data/lib/busybee/runner/hybrid.rb +64 -0
- data/lib/busybee/runner/multi.rb +101 -0
- data/lib/busybee/runner/polling.rb +54 -0
- data/lib/busybee/runner/streaming.rb +159 -0
- data/lib/busybee/runner.rb +97 -0
- data/lib/busybee/runtime_config.rb +184 -0
- data/lib/busybee/serialization.rb +100 -0
- data/lib/busybee/testing/activated_job.rb +33 -8
- data/lib/busybee/testing/helpers/execution.rb +139 -0
- data/lib/busybee/testing/helpers/support.rb +78 -0
- data/lib/busybee/testing/helpers.rb +56 -66
- data/lib/busybee/testing/matchers/complete_job.rb +55 -0
- data/lib/busybee/testing/matchers/fail_job.rb +75 -0
- data/lib/busybee/testing/matchers/have_activated.rb +1 -1
- data/lib/busybee/testing/matchers/have_available_jobs.rb +44 -0
- data/lib/busybee/testing/matchers/throw_bpmn_error_on.rb +72 -0
- data/lib/busybee/testing.rb +5 -33
- data/lib/busybee/version.rb +1 -1
- data/lib/busybee/worker/configuration.rb +287 -0
- data/lib/busybee/worker/dsl.rb +187 -0
- data/lib/busybee/worker/shutdown.rb +27 -0
- data/lib/busybee/worker.rb +130 -0
- data/lib/busybee.rb +134 -2
- metadata +80 -3
data/docs/workers.md
ADDED
|
@@ -0,0 +1,982 @@
|
|
|
1
|
+
# Workers
|
|
2
|
+
|
|
3
|
+
In a distributed system, each application might need to participate in dozens of business processes that span the whole organization. Orchestration lets you meet that need by allowing each app to expose just a handful of domain-specific actions, and then reusing and composing those actions into different workflows which describe those business processes. Busybee's **Worker** abstraction lets you define those actions as simple Ruby classes, and handles everything else for you: connecting your class to the workflow engine, requesting work, reporting results, and managing the process lifecycle.
|
|
4
|
+
|
|
5
|
+
If you've used Sidekiq (or similar frameworks) to build background jobs, this pattern should feel very familiar. You define a class, implement a `perform` method, and let the framework handle the infrastructure. The key conceptual differences are:
|
|
6
|
+
- Background jobs in Sidekiq always run in the same application that defines and invokes them. [Workers](https://docs.camunda.io/docs/components/concepts/job-workers/) in an orchestrated system also run in the application that defines them, but they are always invoked externally, by an instance of a business process running in the central workflow engine.
|
|
7
|
+
- Background jobs work by side effects only (that is, the return value of `perform` in a Sidekiq job does not matter), but in a Worker both side effects and return values matter. The return values become part of the context of the running business process instance. This allows other downstream workers to consume those values, and also allows the workflow to make flow control decisions based on them.
|
|
8
|
+
|
|
9
|
+
Busybee is built around a workflow engine named [Zeebe](https://docs.camunda.io/docs/components/zeebe/zeebe-overview/), which is available in either self-hosted form or as a hosted/SaaS product from [Camunda](https://camunda.com/). The workflow definition format used by Zeebe and Camunda, and therefore what Busybee supports, is called [BPMN](https://docs.camunda.io/docs/components/modeler/bpmn/bpmn-primer/).
|
|
10
|
+
|
|
11
|
+
> For a working example of workers in a multi-domain system, see the [Dropship Co. demo app](../spec/demo/README.md), which uses busybee workers to orchestrate order fulfillment across isolated warehousing, logistics, and delivery domains.
|
|
12
|
+
|
|
13
|
+
## Table of Contents
|
|
14
|
+
|
|
15
|
+
- [Defining Workers](#defining-workers)
|
|
16
|
+
- [Your First Worker](#your-first-worker)
|
|
17
|
+
- [The Job Lifecycle](#the-job-lifecycle)
|
|
18
|
+
- [Declaring Inputs](#declaring-inputs)
|
|
19
|
+
- [Declaring Outputs](#declaring-outputs)
|
|
20
|
+
- [Input/Output Types](#inputoutput-types)
|
|
21
|
+
- [Advanced DSL Options](#advanced-dsl-options)
|
|
22
|
+
- [Running Workers](#running-workers)
|
|
23
|
+
- [CLI Quick Start](#cli-quick-start)
|
|
24
|
+
- [CLI Reference](#cli-reference)
|
|
25
|
+
- [Rails Integration](#rails-integration)
|
|
26
|
+
- [Signal Handling](#signal-handling)
|
|
27
|
+
- [Worker Modes](#worker-modes)
|
|
28
|
+
- [Multiple Workers in One Process](#multiple-workers-in-one-process)
|
|
29
|
+
- [YAML Configuration](#yaml-configuration)
|
|
30
|
+
- [Configuration Precedence](#configuration-precedence)
|
|
31
|
+
- [Testing Workers](#testing-workers)
|
|
32
|
+
- [Setup](#setup)
|
|
33
|
+
- [Basic Worker Testing](#basic-worker-testing)
|
|
34
|
+
- [Inspecting Job State](#inspecting-job-state)
|
|
35
|
+
- [Worker Testing Matchers](#worker-testing-matchers)
|
|
36
|
+
- [Testing Best Practices](#testing-best-practices)
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
## Defining Workers
|
|
41
|
+
|
|
42
|
+
### Your First Worker
|
|
43
|
+
|
|
44
|
+
A worker is a Ruby class that subclasses `Busybee::Worker` and implements `perform`. Each time the workflow engine has a job ready, Busybee creates a new instance of your worker class for that job and calls `perform`:
|
|
45
|
+
|
|
46
|
+
```ruby
|
|
47
|
+
class ProcessOrderWorker < Busybee::Worker
|
|
48
|
+
job_type "process_order"
|
|
49
|
+
|
|
50
|
+
variable :order_id, type: :uuid
|
|
51
|
+
|
|
52
|
+
output :confirmation_number, type: :string
|
|
53
|
+
|
|
54
|
+
def perform
|
|
55
|
+
order = Order.find(order_id)
|
|
56
|
+
confirmation = order.process!
|
|
57
|
+
|
|
58
|
+
{ confirmation_number: confirmation }
|
|
59
|
+
end
|
|
60
|
+
end
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
A few things are happening here:
|
|
64
|
+
|
|
65
|
+
- **`job_type`** identifies which jobs this worker handles. When the workflow engine reaches a [service task](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/) with this type, it creates a job and sends it to an available worker. If you omit `job_type`, it's derived from the class name: `ProcessOrderWorker` becomes `"process_order"`.
|
|
66
|
+
- **[`variable`](#declaring-inputs)** declares an input your worker expects. Busybee defines an accessor method so you can call `order_id` directly in `perform`.
|
|
67
|
+
- **[`output`](#declaring-outputs)** declares what your worker returns. When `perform` returns a Hash, those values flow back into the workflow as [process variables](https://docs.camunda.io/docs/components/concepts/variables/) so that downstream workers may have access to them among their inputs.
|
|
68
|
+
- **[`perform`](#the-job-lifecycle)** contains your business logic. A new worker instance is created for each job, so you can safely use instance variables and private helper methods. (Note that `perform` takes no arguments.)
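To make the class-name derivation mentioned above concrete, here is a minimal sketch in plain Ruby. The helper name `derive_job_type` is hypothetical and is not part of Busybee. The handling of namespaced classes (keeping only the last segment) is an assumption, so check the gem's behavior before relying on it.

```ruby
# Hypothetical helper, for illustration only. Busybee's own derivation
# may treat namespaces, acronyms, or other edge cases differently.
def derive_job_type(class_name)
  class_name
    .split("::").last                    # assumption: ignore any namespace
    .sub(/Worker\z/, "")                 # drop the trailing "Worker"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')   # CamelCase -> Camel_Case
    .downcase
end

derive_job_type("ProcessOrderWorker") # => "process_order"
```

If the derived name does not match the service task type in your BPMN, declaring `job_type` explicitly avoids any ambiguity.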
|
|
69
|
+
|
|
70
|
+
### The Job Lifecycle
|
|
71
|
+
|
|
72
|
+
While a Worker object knows how to perform units of work, a Job object represents one individual unit of that work to be performed. When a running [process instance](https://docs.camunda.io/docs/components/concepts/processes/#process-instance-creation) (a single execution of a workflow) arrives at the point where work needs to be done (a "[service task](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/)" in BPMN terms), a job is created in the workflow engine with all of the context needed to perform that work. The job is called "created" or "available" when the workflow engine prepares it, and called "activated" or "ready" when it has been picked up by a running worker. At that point, the workflow engine waits for the worker to call back and report one of three possible outcomes:
|
|
73
|
+
- **Completed** - This is the happy path. If the work was performed successfully, the job is marked complete and optional additional data variables are sent back to the process instance, which continues to the next step in the workflow.
|
|
74
|
+
- **Failed** - If the work could not be performed (for example, because a Ruby exception was raised), the job is marked as failed, and after a short backoff delay it is retried (made available again for another worker to pick up).
|
|
75
|
+
- The maximum number of retries is set by the [process definition](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-definition) (the BPMN document that describes the workflow); if it is exceeded, the entire running process instance is paused and an ["Incident"](https://docs.camunda.io/docs/components/concepts/incidents/) is raised in the workflow engine for an operator to review. The remaining number of retries on the current job can be read and updated as desired.
|
|
76
|
+
- **Errored** - If the work encountered an abnormal **business** condition (for example, insufficient funds), the job may do what is called _throwing a BPMN error._ This is different from a _Ruby error,_ which causes job failure and retry; BPMN errors should be used for flow control, when there's an anticipated business outcome that the workflow needs to handle by taking a different branch.
|
|
77
|
+
|
|
78
|
+
If none of those three things happen within a configurable window of time (the job timeout), the workflow engine assumes that the worker process must have crashed, and it will make the job available again for other workers to pick up. The deadline for the current job can also be read and updated as desired.
|
|
79
|
+
|
|
80
|
+
When a running busybee process receives a job, it uses your worker class to execute this lifecycle, with some additional checks and conveniences:
|
|
81
|
+
|
|
82
|
+
1. **Instantiation** - a new instance of your worker is created for that job.
|
|
83
|
+
2. **Input Validation** - all `required: true` inputs are checked. If any are missing, `MissingInput` is raised.
|
|
84
|
+
3. **Perform** - your `perform` method runs.
|
|
85
|
+
4. **On Success** - if `perform` returned successfully and `complete_job_on_success` is `true` (the default), then:
|
|
86
|
+
- **Output Validation** - All `required: true` outputs are checked in the Hash returned from `perform`. If any are missing, `MissingOutput` is raised.
|
|
87
|
+
- **Return Variables** - Busybee reports to the workflow engine that the job is complete, sending back any output values returned from `perform`.
|
|
88
|
+
5. **On Failure** - if `perform` raises an exception and `fail_job_on_error` is `true` (the default), then:
|
|
89
|
+
- **Error Reporting** - Busybee reports to the workflow engine that the job failed, sending back the error class and message and the configured backoff delay.
|
|
90
|
+
|
|
91
|
+
This means that for most workers, you can just implement `perform`, return a Hash, and let Busybee handle the rest.
|
|
92
|
+
|
|
93
|
+
(For throwing a BPMN error, see the [Manual Lifecycle Control](#manual-lifecycle-control) section below.)
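The numbered lifecycle steps above can be pictured with a small, self-contained simulation. Every name here (`SketchJob`, `run_job`, and the local `MissingInput`) is a stand-in written for this example, not Busybee's implementation. The sketch also assumes that a validation error fails the job like any other exception, which the text above does not state.

```ruby
# Stand-ins for illustration only; none of these are Busybee classes.
class MissingInput < StandardError; end

class SketchJob
  attr_reader :variables, :status, :payload

  def initialize(variables)
    @variables = variables
    @status = :ready
  end

  def ready?
    @status == :ready
  end

  def complete!(vars = {})
    @status = :complete
    @payload = vars
  end

  def fail!(error)
    @status = :failed
    @payload = error
  end
end

# The block plays the role of `perform`.
def run_job(job, required_inputs: [])
  # Step 2: input validation
  missing = required_inputs.reject { |name| job.variables.key?(name) }
  raise MissingInput, "missing: #{missing.join(', ')}" unless missing.empty?

  # Step 3: perform
  result = yield(job.variables)

  # Step 4: on success, complete unless the job was already resolved
  job.complete!(result.is_a?(Hash) ? result : {}) if job.ready?
rescue StandardError => e
  # Step 5: on failure, fail unless the job was already resolved
  # (assumption: validation errors are handled the same way)
  job.fail!(e) if job.ready?
end

job = SketchJob.new({ order_id: "o-1" })
run_job(job, required_inputs: [:order_id]) { |vars| { confirmation_number: "c-#{vars[:order_id]}" } }
job.status # => :complete
```

The `ready?` checks mirror the guard described under Manual Lifecycle Control: if the block had already resolved the job, neither automatic step would fire.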
|
|
94
|
+
|
|
95
|
+
#### Automatic Completion
|
|
96
|
+
|
|
97
|
+
By default, Busybee completes the job when `perform` returns successfully. If `perform` returns a Hash, those key-value pairs become output variables:
|
|
98
|
+
|
|
99
|
+
```ruby
|
|
100
|
+
def perform
|
|
101
|
+
order = Order.find(order_id)
|
|
102
|
+
|
|
103
|
+
{ status: order.status, processed_at: Time.now.iso8601 }
|
|
104
|
+
# Job is completed automatically with these variables
|
|
105
|
+
end
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
If `perform` returns an empty Hash or anything other than a Hash (including nil), the job is completed with no output variables.
|
|
109
|
+
|
|
110
|
+
**Output Validation:** If one or more output variables were declared with `required: true` (the default) but those keys are not present in the returned Hash, a `MissingOutput` error will be raised.
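As a rough illustration of the two rules above (non-Hash return values count as "no outputs", and required outputs must be present), consider the following sketch. The `outputs_for` helper and the local `MissingOutput` class are hypothetical stand-ins. Treating a non-Hash return as an empty Hash before validating is an assumption about ordering, not something the gem is confirmed to do.

```ruby
# Stand-in for Busybee::MissingOutput, defined locally so the sketch runs alone.
class MissingOutput < StandardError; end

# Hypothetical helper: normalize the return value of `perform`, then
# check that every required output key is present.
def outputs_for(result, required: [])
  vars = result.is_a?(Hash) ? result : {}   # nil, strings, etc. become {}
  missing = required.reject { |name| vars.key?(name) }
  raise MissingOutput, "missing outputs: #{missing.join(', ')}" unless missing.empty?

  vars
end

outputs_for({ status: "paid" }, required: [:status]) # => { status: "paid" }
outputs_for(nil)                                     # => {}
```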
|
|
111
|
+
|
|
112
|
+
#### Automatic Failure (Error Handling)
|
|
113
|
+
|
|
114
|
+
If `perform` raises an exception, Busybee reports the job as failed to the workflow engine, along with the error message. The job will then be retried after a configurable backoff delay, up to the maximum retry count set in the [BPMN process definition](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-definition) (not shown here):
|
|
115
|
+
|
|
116
|
+
```ruby
|
|
117
|
+
class ProcessPaymentWorker < Busybee::Worker
|
|
118
|
+
variable :order_id, type: :uuid
|
|
119
|
+
|
|
120
|
+
output :charged, type: :boolean
|
|
121
|
+
|
|
122
|
+
backoff 30_000 # wait 30 seconds before the workflow engine makes this job available again
|
|
123
|
+
|
|
124
|
+
def perform
|
|
125
|
+
order = Order.find(order_id) # may raise ActiveRecord::RecordNotFound
|
|
126
|
+
PaymentGateway.charge(order) # may raise PaymentGateway::Timeout
|
|
127
|
+
|
|
128
|
+
{ charged: true }
|
|
129
|
+
end
|
|
130
|
+
# If either line raises, the job is failed and retried after 30s
|
|
131
|
+
end
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
**Important:** Because failed jobs are retried by default, you should try to make your `perform` method [idempotent](https://en.wikipedia.org/wiki/Idempotence) whenever possible. If a particular worker cannot safely be retried, set retries to `0` in the BPMN definition. Even then, **Zeebe does not guarantee exactly-once execution.** If you need that guarantee, your worker must implement it.
|
|
135
|
+
|
|
136
|
+
#### Manual Lifecycle Control
|
|
137
|
+
|
|
138
|
+
For cases where automatic handling isn't sufficient, you can control the job lifecycle directly. The `complete!`, `fail!`, and `throw_bpmn_error!` methods are delegated from the worker to the job:
|
|
139
|
+
|
|
140
|
+
```ruby
|
|
141
|
+
class ProcessOrderWorker < Busybee::Worker
|
|
142
|
+
complete_job_on_success false # we'll handle completion ourselves
|
|
143
|
+
|
|
144
|
+
def perform
|
|
145
|
+
order = Order.find(order_id)
|
|
146
|
+
|
|
147
|
+
case order.validate
|
|
148
|
+
when :ok
|
|
149
|
+
order.process!
|
|
150
|
+
complete!(confirmation: order.confirmation_number)
|
|
151
|
+
when :fraud_detected
|
|
152
|
+
# this is a business-level error case -- the workflow will have a branch to handle this:
|
|
153
|
+
throw_bpmn_error!(:fraud_detected, "Fraud detected for order #{order_id}")
|
|
154
|
+
when :invalid_items
|
|
155
|
+
# this is a technical failure -- if it cannot succeed on retry, the workflow needs to stop and alert the operator:
|
|
156
|
+
fail!("Order contains invalid or unavailable items")
|
|
157
|
+
end
|
|
158
|
+
end
|
|
159
|
+
end
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
**`complete!(vars = {})`** completes the job with optional output variables.
|
|
163
|
+
|
|
164
|
+
**`fail!(error, retries: nil, backoff: nil)`** fails the job. Accepts a String or Exception. Optionally override the retry count or backoff delay.
|
|
165
|
+
|
|
166
|
+
**`throw_bpmn_error!(code, message = "")`** throws a [BPMN error](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/) that can be caught by an [error boundary event](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/#error-boundary-events) in the process definition. The error code can be a String, Symbol (converted to UPPERCASE), or Exception class (it will be converted from `MyApp::OrderNotFound` to the code string `MY_APP_ORDER_NOT_FOUND`). Use BPMN errors when the failure is an anticipated business outcome that the workflow should handle, rather than a technical failure that should be retried.
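The code conversion described above can be sketched as follows. `bpmn_error_code` is a hypothetical helper written for this example; only the input/output pairs stated in the text (Symbol to uppercase, `MyApp::OrderNotFound` to `MY_APP_ORDER_NOT_FOUND`) come from the documentation, and passing Strings through unchanged is an assumption.

```ruby
# Example exception class used to exercise the helper below.
module MyApp
  class OrderNotFound < StandardError; end
end

# Hypothetical helper, for illustration only.
def bpmn_error_code(code)
  case code
  when String then code                  # assumption: used as given
  when Symbol then code.to_s.upcase
  when Class
    code.name
        .gsub("::", "_")                      # MyApp::OrderNotFound -> MyApp_OrderNotFound
        .gsub(/([a-z\d])([A-Z])/, '\1_\2')    # split CamelCase words
        .upcase
  else
    raise ArgumentError, "unsupported error code: #{code.inspect}"
  end
end

bpmn_error_code(:fraud_detected)      # => "FRAUD_DETECTED"
bpmn_error_code(MyApp::OrderNotFound) # => "MY_APP_ORDER_NOT_FOUND"
```

Whatever form you pass, the resulting string must match the error code configured on the error boundary event in the BPMN definition.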
|
|
167
|
+
|
|
168
|
+
**`update_retries(count)`** and **`update_timeout(duration)`** modify the job's retry count or lock timeout without completing or failing it. Useful for long-running jobs that need to extend their deadline:
|
|
169
|
+
|
|
170
|
+
```ruby
|
|
171
|
+
def perform
|
|
172
|
+
update_timeout(5.minutes) # extend deadline before starting long operation
|
|
173
|
+
# ... long operation ...
|
|
174
|
+
end
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
Note that you can safely mix-and-match manual and automatic control, because both automatic completion and automatic failure check whether the job is still `ready?` before they attempt to complete or fail it. Therefore, this is a perfectly valid alternate approach to the above:
|
|
178
|
+
|
|
179
|
+
```ruby
|
|
180
|
+
class ProcessOrderWorker < Busybee::Worker
|
|
181
|
+
def perform
|
|
182
|
+
order = Order.find(order_id)
|
|
183
|
+
|
|
184
|
+
case order.validate
|
|
185
|
+
when :ok
|
|
186
|
+
order.process!
|
|
187
|
+
return { confirmation: order.confirmation_number } # will trigger auto-complete
|
|
188
|
+
when :fraud_detected
|
|
189
|
+
throw_bpmn_error!(:fraud_detected, "Fraud detected for order #{order_id}") # marks the job non-ready, so auto-complete is skipped
|
|
190
|
+
when :invalid_items
|
|
191
|
+
raise "Order contains invalid or unavailable items" # will trigger auto-fail
|
|
192
|
+
end
|
|
193
|
+
end
|
|
194
|
+
end
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
#### Shutdown Handling
|
|
198
|
+
|
|
199
|
+
Some exceptions represent conditions that a worker container can't recover from: a lost database connection, a broken Redis pool, a revoked API credential. When one of these occurs, it's better to shut down the worker process so that your container manager (e.g. Kubernetes) can replace it with a fresh one.
|
|
200
|
+
|
|
201
|
+
Use `shutdown_on` to declare which exception classes should trigger a graceful shutdown:
|
|
202
|
+
|
|
203
|
+
```ruby
|
|
204
|
+
class ProcessOrderWorker < Busybee::Worker
|
|
205
|
+
shutdown_on PG::ConnectionBad
|
|
206
|
+
shutdown_on Redis::ConnectionError
|
|
207
|
+
|
|
208
|
+
def perform
|
|
209
|
+
# If this raises PG::ConnectionBad, the worker shuts down gracefully
|
|
210
|
+
Order.find(order_id).process!
|
|
211
|
+
end
|
|
212
|
+
end
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
You can also configure shutdown errors globally for all workers in your application via [`Busybee.shutdown_on_errors`](configuration.md).
|
|
216
|
+
|
|
217
|
+
When a shutdown is triggered, the worker process stops requesting new jobs, fails any in-flight jobs (preserving their retry count so they'll be picked up by another worker), and exits.
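To show the difference between an ordinary failure and a shutdown-triggering one, here is a deliberately simplified, sequential sketch. `run_until_shutdown` and `ConnectionLost` are hypothetical names. The sketch does not model concurrent in-flight jobs or retry counts, and it assumes that subclasses of a listed exception class also trigger shutdown.

```ruby
# Stand-in for an unrecoverable error such as PG::ConnectionBad.
class ConnectionLost < StandardError; end

# Hypothetical loop: processes jobs in order and stops taking new work
# as soon as one of the shutdown exception classes is raised.
def run_until_shutdown(jobs, shutdown_on:)
  handled = []
  jobs.each do |job|
    begin
      job[:work].call
      handled << [job[:id], :completed]
    rescue *shutdown_on
      handled << [job[:id], :shutdown]   # fail this job, then stop
      break
    rescue StandardError
      handled << [job[:id], :failed]     # ordinary failure: keep going
    end
  end
  handled
end

jobs = [
  { id: 1, work: -> { :ok } },
  { id: 2, work: -> { raise "bad input" } },
  { id: 3, work: -> { raise ConnectionLost, "db gone" } },
  { id: 4, work: -> { :never_runs } }
]
run_until_shutdown(jobs, shutdown_on: [ConnectionLost])
# => [[1, :completed], [2, :failed], [3, :shutdown]]
```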
|
|
218
|
+
|
|
219
|
+
#### Direct Job Access
|
|
220
|
+
|
|
221
|
+
Several of the methods you've already seen — `complete!`, `fail!`, `throw_bpmn_error!`, `update_retries`, `update_timeout`, `variables`, and `headers` — are actually delegated from the worker to an underlying `Busybee::Job` object. You can access this object directly via `self.job` in `perform`. The job carries metadata, raw data, and status information that isn't available at the worker level:
|
|
222
|
+
|
|
223
|
+
```ruby
|
|
224
|
+
def perform
|
|
225
|
+
# Metadata (job-only)
|
|
226
|
+
job.key # unique job identifier (Integer)
|
|
227
|
+
job.type # job type from BPMN (String)
|
|
228
|
+
job.process_instance_key # workflow instance this job belongs to (Integer)
|
|
229
|
+
job.bpmn_process_id # BPMN process ID (String)
|
|
230
|
+
job.retries # remaining retry attempts (Integer)
|
|
231
|
+
job.deadline # lock expiration time (frozen Time, UTC)
|
|
232
|
+
|
|
233
|
+
# Data (delegated, but declared inputs are preferred — see Declaring Inputs)
|
|
234
|
+
job.variables # all process variables, as a frozen hash with indifferent access
|
|
235
|
+
job.headers # custom headers from BPMN definition, same format
|
|
236
|
+
|
|
237
|
+
# Lifecycle (delegated — see Manual Lifecycle Control)
|
|
238
|
+
job.complete!(vars = {}) # mark job complete, with optional output variables
|
|
239
|
+
job.fail!(error, retries: nil, backoff: nil) # mark job failed
|
|
240
|
+
job.throw_bpmn_error!(code, message = "") # throw a BPMN error
|
|
241
|
+
job.update_retries(count) # change remaining retry count
|
|
242
|
+
job.update_timeout(duration) # extend or shorten the job lock deadline
|
|
243
|
+
|
|
244
|
+
# Status predicates (job-only)
|
|
245
|
+
job.ready? # true if not yet completed/failed/errored
|
|
246
|
+
job.complete? # true if completed
|
|
247
|
+
job.failed? # true if failed
|
|
248
|
+
job.error? # true if BPMN error was thrown
|
|
249
|
+
end
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
Variables and headers support both hash-style and method-style access, including nested values:
|
|
253
|
+
|
|
254
|
+
```ruby
|
|
255
|
+
job.variables[:order_id] # hash access with symbol key
|
|
256
|
+
job.variables["order_id"] # hash access with string key
|
|
257
|
+
job.variables.order_id # method access
|
|
258
|
+
job.variables.address.zip_code # nested method access
|
|
259
|
+
```
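For readers curious how one object can support all four access styles shown above, the following is a minimal sketch of the idea. `SketchVars` is a hypothetical class written for this example and says nothing about how Busybee actually implements `job.variables`. It handles nested Hashes only (not Arrays of Hashes).

```ruby
# Hypothetical read-only wrapper, for illustration only.
class SketchVars
  def initialize(hash)
    # Store keys as strings so symbol and string lookups both work.
    @data = hash.each_with_object({}) do |(key, value), out|
      out[key.to_s] = value.is_a?(Hash) ? SketchVars.new(value) : value
    end.freeze
    freeze
  end

  # Hash-style access with either a String or a Symbol key.
  def [](key)
    @data[key.to_s]
  end

  # Method-style access for any key that exists.
  def method_missing(name, *args)
    return @data[name.to_s] if args.empty? && @data.key?(name.to_s)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end
end

vars = SketchVars.new("order_id" => "abc", address: { zip_code: "12345" })
vars[:order_id]       # => "abc"
vars.address.zip_code # => "12345"
```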
|
|
260
|
+
|
|
261
|
+
Most of the time, you won't need to reach for `job` directly — input accessors give you named, validated methods for reading data, and the lifecycle delegations (`complete!`, `fail!`, etc.) read more naturally without the `job.` prefix. But the job object is there when you need metadata, status checks, or raw data access.
|
|
262
|
+
|
|
263
|
+
### Declaring Inputs
|
|
264
|
+
|
|
265
|
+
Inputs declare the data your worker needs from the running workflow instance. Each input becomes an accessor method on your worker, so you can use it directly in `perform` instead of digging through raw hashes.
|
|
266
|
+
|
|
267
|
+
Inputs come from two sources: **variables** and **headers**. [Variables](https://docs.camunda.io/docs/components/concepts/variables/) are data specific to a running workflow instance: an order ID, a customer email, a calculated total. [Headers](https://docs.camunda.io/docs/components/modeler/bpmn/service-tasks/#task-headers) are set in the BPMN process definition and are the same for every instance, so they are useful for configuration like which email template to send.
|
|
268
|
+
|
|
269
|
+
#### From Variables
|
|
270
|
+
|
|
271
|
+
```ruby
|
|
272
|
+
class ShipOrderWorker < Busybee::Worker
|
|
273
|
+
variable :order_id, type: :uuid, description: "Order to ship"
|
|
274
|
+
variable :shipping_method, default: "standard"
|
|
275
|
+
|
|
276
|
+
def perform
|
|
277
|
+
order = Order.find(order_id)
|
|
278
|
+
order.ship!(method: shipping_method) # "standard" if not in variables
|
|
279
|
+
|
|
280
|
+
{ tracking_number: order.tracking_number }
|
|
281
|
+
end
|
|
282
|
+
end
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
#### From Headers
|
|
286
|
+
|
|
287
|
+
```ruby
|
|
288
|
+
class CalculateDistanceWorker < Busybee::Worker
|
|
289
|
+
variable :from_lat, type: :decimal
|
|
290
|
+
variable :from_lon, type: :decimal
|
|
291
|
+
variable :to_lat, type: :decimal
|
|
292
|
+
variable :to_lon, type: :decimal
|
|
293
|
+
|
|
294
|
+
header :algorithm, type: :string, description: "Distance formula to use"
|
|
295
|
+
|
|
296
|
+
output :distance, type: :decimal
|
|
297
|
+
|
|
298
|
+
def perform
|
|
299
|
+
dist = compute_distance(algorithm)
|
|
300
|
+
|
|
301
|
+
{ distance: dist.round(3) }
|
|
302
|
+
end
|
|
303
|
+
end
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
Because the algorithm is a header, different BPMN tasks can reuse the same worker with different algorithms: one task might set the header to `"haversine"`, another to `"pythagorean"`.
|
|
307
|
+
|
|
308
|
+
#### From Either Source
|
|
309
|
+
|
|
310
|
+
Sometimes a value should come from a variable when available, but fall back to a header as a default (or vice versa). Pass an array of sources -- the first non-nil value wins:
|
|
311
|
+
|
|
312
|
+
```ruby
|
|
313
|
+
input :priority, source: [:variable, :header], type: :string
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
This is the general form. The `variable` and `header` DSL methods are shorthands:
|
|
317
|
+
|
|
318
|
+
```ruby
|
|
319
|
+
variable :template # same as `input :template, source: :variable`
|
|
320
|
+
header :template # same as `input :template, source: :header`
|
|
321
|
+
input :template, source: [:variable, :header] # check variable first, then header
|
|
322
|
+
```
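The "first non-nil value wins" rule for multi-source inputs can be sketched in a few lines. `resolve_input` is a hypothetical helper, not a Busybee method, and it assumes variables and headers are available as plain Hashes keyed by symbol.

```ruby
# Hypothetical helper, for illustration only.
def resolve_input(name, sources, variables:, headers:)
  lookup = { variable: variables, header: headers }

  # Array() lets a single source (:variable) and a list of sources
  # ([:variable, :header]) share the same code path.
  Array(sources).each do |source|
    value = lookup.fetch(source)[name]
    return value unless value.nil?
  end

  nil
end

resolve_input(:priority, [:variable, :header],
              variables: {}, headers: { priority: "high" })
# => "high"
```

Listing `:header` first instead would make the BPMN-defined header take precedence over any process variable of the same name.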
|
|
323
|
+
|
|
324
|
+
#### Input Options
|
|
325
|
+
|
|
326
|
+
| Option | Type | Default | Description |
|
|
327
|
+
|--------|------|---------|-------------|
|
|
328
|
+
| `source:` | Symbol or Array | (required for `input`) | `:variable`, `:header`, or `[:variable, :header]` |
|
|
329
|
+
| `required:` | Boolean | `true`\* | Raise `MissingInput` if absent. Cannot combine with `default:` |
|
|
330
|
+
| `type:` | Symbol | `nil` | Documentation hint. See [Input/Output Types](#inputoutput-types) |
|
|
331
|
+
| `description:` | String | `nil` | Human-readable description |
|
|
332
|
+
| `default:` | any | (none) | Default value when input is missing. Makes the input not required |
|
|
333
|
+
| `accessor_name:` | Symbol | (same as name) | Custom method name for the accessor |
|
|
334
|
+
| `define_accessor:` | Boolean | `true` | Set to `false` to skip accessor definition |
|
|
335
|
+
|
|
336
|
+
When an input is `required: true` (the default) and the value is missing from the job, Busybee raises `Busybee::MissingInput` before your `perform` method runs. This alerts you early to a workflow that is using this worker incorrectly, before the problem causes harder-to-catch bugs further downstream.
|
|
337
|
+
|
|
338
|
+
> \* The default value of `required` can be switched for your entire app if desired, allowing you to disable the raise-on-missing behavior. See the [configuration](./configuration.md) document.
|
|
339
|
+
|
|
340
|
+
### Declaring Outputs
|
|
341
|
+
|
|
342
|
+
Outputs declare the variables your worker returns to the workflow engine. When your `perform` method returns a Hash, Busybee sends those key-value pairs back as new or updated [process variables](https://docs.camunda.io/docs/components/concepts/variables/):
|
|
343
|
+
|
|
344
|
+
```ruby
|
|
345
|
+
class CreateShipmentWorker < Busybee::Worker
|
|
346
|
+
variable :order_id, type: :uuid
|
|
347
|
+
variable :warehouse_id, type: :uuid
|
|
348
|
+
|
|
349
|
+
output :shipment_id, type: :uuid, description: "Created shipment's ID"
|
|
350
|
+
output :item_count, type: :integer, description: "Total item count"
|
|
351
|
+
|
|
352
|
+
def perform
|
|
353
|
+
shipment = Shipment.create!(order_id: order_id, warehouse_id: warehouse_id)
|
|
354
|
+
|
|
355
|
+
{ shipment_id: shipment.id, item_count: shipment.items.count }
|
|
356
|
+
end
|
|
357
|
+
end
|
|
358
|
+
```
|
|
359
|
+
|
|
360
|
+
If a required output is missing from the returned Hash, Busybee raises `Busybee::MissingOutput`. This can alert you to a worker which isn't fulfilling its entire contract (isn't doing everything a workflow is relying on it to do).
|
|
361
|
+
|
|
362
|
+
Note that if `perform` returns nothing at all (or returns anything other than a Hash), no variables are sent back. This is equivalent to returning an empty Hash.
|
|
363
|
+
|
|
364
|
+
#### Output Options
|
|
365
|
+
|
|
366
|
+
| Option | Type | Default | Description |
|
|
367
|
+
|--------|------|---------|-------------|
|
|
368
|
+
| `required:` | Boolean | `true`\* | Raise `MissingOutput` if absent from return value |
|
|
369
|
+
| `type:` | Symbol | `nil` | Documentation hint |
|
|
370
|
+
| `description:` | String | `nil` | Human-readable description |
|
|
371
|
+
|
|
372
|
+
> \* The default value of `required` can be switched for your entire app if desired, allowing you to disable the raise-on-missing behavior. See the [configuration](./configuration.md) document.
|
|
373
|
+
|
|
374
|
+
### Input/Output Types
|
|
375
|
+
|
|
376
|
+
The `type:` option is a documentation hint that describes what kind of value to expect. Types are not enforced at runtime (job variables arrive as JSON and are deserialized accordingly) but they serve as a contract between the BPMN process definition and your worker code. The available types are designed to align well with [JSON](https://www.json.org/) and Zeebe's [FEEL expression language](https://docs.camunda.io/docs/components/modeler/feel/what-is-feel/):
|
|
377
|
+
|
|
378
|
+
| Type | JSON Representation | Example |
|
|
379
|
+
|------|--------------------| --------|
|
|
380
|
+
| `string` | String | `"hello"` |
|
|
381
|
+
| `integer` | Number (integer) | `42` |
|
|
382
|
+
| `decimal` | Number (float) | `99.95` |
|
|
383
|
+
| `boolean` | Boolean | `true` |
|
|
384
|
+
| `datetime` | String ([ISO 8601](https://en.wikipedia.org/wiki/ISO_8601)) | `"2026-03-06T14:30:00Z"` |
|
|
385
|
+
| `duration` | String ([ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations)) | `"PT6H"` |
|
|
386
|
+
| `uuid` | String | `"550e8400-e29b-41d4-a716-446655440000"` |
|
|
387
|
+
| `null` | null | `null` |
|
|
388
|
+
|
|
389
|
+
Note that, while JSON and FEEL support array and object types, this version of busybee does not yet provide that support. If you have array- or object-shaped inputs or outputs, either omit the `type:` option, or set it to `null` (which will not be enforced anywhere).
|
|
390
|
+
|
|
391
|
+
> A future busybee version will provide runtime instrumentation hooks for when workers start up or shut down, which will receive input/output types and descriptions among their metadata. This will allow you to register this metadata in your own tracking / auditing systems.
|
|
392
|
+
|
|
393
|
+
### Advanced DSL Options
|
|
394
|
+
|
|
395
|
+
#### `complete_job_on_success`
|
|
396
|
+
|
|
397
|
+
Controls whether Busybee automatically completes the job when `perform` returns without raising. Default: `true`.
|
|
398
|
+
|
|
399
|
+
Set to `false` when your worker needs to manage the job lifecycle manually (for example, when completion depends on a conditional branch, or when using async patterns):
|
|
400
|
+
|
|
401
|
+
```ruby
|
|
402
|
+
class PickAndPackWorker < Busybee::Worker
|
|
403
|
+
complete_job_on_success false
|
|
404
|
+
fail_job_on_error false
|
|
405
|
+
|
|
406
|
+
def perform
|
|
407
|
+
delay = calculate_delay
|
|
408
|
+
current_job = job
|
|
409
|
+
|
|
410
|
+
Concurrent::Promises.future { simulate_packing(current_job, delay) }
|
|
411
|
+
.then { current_job.complete! }
|
|
412
|
+
.rescue { |err| current_job.fail!(err) }
|
|
413
|
+
end
|
|
414
|
+
end
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
> See the [Dropship Co. demo app's simulation workers](../spec/demo/app/workers/sim/) for a full example of this pattern.
|
|
418
|
+
|
|
419
|
+
#### `fail_job_on_error`
|
|
420
|
+
|
|
421
|
+
Controls whether Busybee automatically fails the job when `perform` raises an exception. Default: `true`.
|
|
422
|
+
|
|
423
|
+
Set to `false` when you want to handle all errors yourself. Note that if the job is neither completed nor failed, it will eventually time out and be retried by the workflow engine.
|
|
424
|
+
|
|
425
|
+
#### `description`
|
|
426
|
+
|
|
427
|
+
A human-readable description of what the worker does. Used for documentation (will be passed to instrumentation hooks in a future version):
|
|
428
|
+
|
|
429
|
+
```ruby
|
|
430
|
+
description "Calculates distance between two geographic points using a configurable algorithm"
|
|
431
|
+
```
|
|
432
|
+
|
|
433
|
+
Not to be confused with the `description:` option on input and output declarations, which is similarly used for documentation.
|
|
434
|
+
|
|
435
|
+
#### `job_timeout`
|
|
436
|
+
|
|
437
|
+
How long this worker is allowed to hold a job before the workflow engine assumes that the worker has crashed and makes the job available to another worker. Accepts an Integer (milliseconds) or `ActiveSupport::Duration`:
|
|
438
|
+
|
|
439
|
+
```ruby
|
|
440
|
+
job_timeout 120_000 # 2 minutes
|
|
441
|
+
job_timeout 2.minutes # same, with ActiveSupport
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
Default: `60_000` ms (1 minute), configurable via [`Busybee.default_job_lock_timeout`](configuration.md).
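
As a rough illustration of how a value given in either form could be reduced to milliseconds, here is a minimal sketch. `to_milliseconds` and `FakeDuration` are hypothetical names used only for this example. `FakeDuration` stands in for `ActiveSupport::Duration` (assuming its `in_seconds` method) so the snippet runs without Rails, and Busybee's actual conversion may differ.

```ruby
# Stand-in for ActiveSupport::Duration (assumption: only `in_seconds` is needed here).
FakeDuration = Struct.new(:in_seconds)

# Hypothetical helper: Integers are taken as milliseconds, duration-like objects
# (anything responding to `in_seconds`) are converted from seconds.
def to_milliseconds(value)
  case value
  when Integer then value
  else
    unless value.respond_to?(:in_seconds)
      raise ArgumentError, "expected Integer or duration, got #{value.class}"
    end
    (value.in_seconds * 1000).to_i
  end
end

to_milliseconds(120_000)               # Integer passes through unchanged
to_milliseconds(FakeDuration.new(120)) # 120 seconds expressed as milliseconds
```

The same conversion would apply to `backoff` and `backpressure_delay`, which accept the same two forms.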
|
|
445
|
+
|
|
446
|
+
#### `backoff`
|
|
447
|
+
|
|
448
|
+
How long the workflow engine should wait before making a failed job available for retry. Accepts an Integer (milliseconds) or `ActiveSupport::Duration`:
|
|
449
|
+
|
|
450
|
+
```ruby
|
|
451
|
+
backoff 30_000 # 30 seconds
|
|
452
|
+
backoff 30.seconds # same, with ActiveSupport
|
|
453
|
+
```
|
|
454
|
+
|
|
455
|
+
Default: `5_000` ms (5 seconds), configurable via [`Busybee.default_fail_job_backoff`](configuration.md).
|
|
456
|
+
|
|
457
|
+
#### Mode Configuration in the DSL
|
|
458
|
+
|
|
459
|
+
Workers can declare their preferred worker mode and any mode-specific options. These serve as defaults that can be overridden at deploy time via CLI flags or YAML configuration (see [Configuration Precedence](#configuration-precedence)):
|
|
460
|
+
|
|
461
|
+
```ruby
|
|
462
|
+
class HighThroughputWorker < Busybee::Worker
|
|
463
|
+
worker_mode :streaming
|
|
464
|
+
streaming buffer: true, buffer_throttle: 5 # 5ms delay between accepting jobs
|
|
465
|
+
|
|
466
|
+
def perform
|
|
467
|
+
# ...
|
|
468
|
+
end
|
|
469
|
+
end
|
|
470
|
+
|
|
471
|
+
class BatchWorker < Busybee::Worker
|
|
472
|
+
worker_mode :polling
|
|
473
|
+
polling max_jobs: 50, request_timeout: 30_000
|
|
474
|
+
|
|
475
|
+
def perform
|
|
476
|
+
# ...
|
|
477
|
+
end
|
|
478
|
+
end
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
See [Worker Modes](#worker-modes) for what these options mean and when to use each mode.
|
|
482
|
+
|
|
483
|
+
#### DSL Quick Reference
|
|
484
|
+
|
|
485
|
+
| DSL Method | Arguments | Default | Description |
|
|
486
|
+
|------------|-----------|---------|-------------|
|
|
487
|
+
| `job_type` | String | Derived from class name | Job type identifier |
|
|
488
|
+
| `description` | String | `nil` | Human-readable description |
|
|
489
|
+
| `variable` | name, opts | | Declare a variable input |
|
|
490
|
+
| `header` | name, opts | | Declare a header input |
|
|
491
|
+
| `input` | name, `source:`, opts | | Declare an input from any source |
|
|
492
|
+
| `output` | name, opts | | Declare an output |
|
|
493
|
+
| `worker_mode` | Symbol | `:hybrid` | `:polling`, `:streaming`, or `:hybrid` |
|
|
494
|
+
| `polling` | `max_jobs:`, `request_timeout:` | `25`, `60_000` | Polling mode options |
|
|
495
|
+
| `streaming` | `buffer:`, `buffer_throttle:` | `true`, `false` | Streaming mode options |
|
|
496
|
+
| `job_timeout` | Integer or Duration | `60_000` | Job lock timeout (ms) |
|
|
497
|
+
| `backoff` | Integer or Duration | `5_000` | Retry backoff delay (ms) |
|
|
498
|
+
| `backpressure_delay` | Integer or Duration | `2_000` | Delay after backpressure error (ms) |
|
|
499
|
+
| `complete_job_on_success` | Boolean | `true` | Auto-complete on success |
|
|
500
|
+
| `fail_job_on_error` | Boolean | `true` | Auto-fail on exception |
|
|
501
|
+
| `shutdown_on` | Exception class(es) | `[]` | Exceptions that trigger shutdown |
|
|
502
|
+
|
|
503
|
+
---
|
|
504
|
+
|
|
505
|
+
## Running Workers
|
|
506
|
+
|
|
507
|
+
### CLI Quick Start
|
|
508
|
+
|
|
509
|
+
Run a worker with `bundle exec busybee`:
|
|
510
|
+
|
|
511
|
+
```bash
|
|
512
|
+
# Run a single worker
|
|
513
|
+
bundle exec busybee ProcessOrderWorker
|
|
514
|
+
|
|
515
|
+
# Run multiple workers in one process
|
|
516
|
+
bundle exec busybee ProcessOrderWorker ShipOrderWorker NotifyCustomerWorker
|
|
517
|
+
|
|
518
|
+
# Run workers defined in a YAML config file
|
|
519
|
+
bundle exec busybee --config config/busybee.yml
|
|
520
|
+
```
|
|
521
|
+
|
|
522
|
+
The CLI loads your Rails environment automatically (if present), instantiates the named worker classes, and starts processing jobs. Press Ctrl-C for a graceful shutdown, or Ctrl-C again to force-quit.
|
|
523
|
+
|
|
524
|
+
If you've used [Racecar](https://github.com/zendesk/racecar) to run Kafka consumers, this pattern should be familiar: one executable, one or more handler classes, and a long-running process that connects to the messaging infrastructure and dispatches work.
|
|
525
|
+
|
|
526
|
+
### CLI Reference
|
|
527
|
+
|
|
528
|
+
```
|
|
529
|
+
Usage: busybee [options] WorkerClass [WorkerClass ...]
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
| Flag | Short | Type | Description |
|
|
533
|
+
|------|-------|------|-------------|
|
|
534
|
+
| `--config FILE` | `-c` | String | Path to a [YAML configuration file](#yaml-configuration) |
|
|
535
|
+
| `--worker-mode MODE` | `-m` | String | Worker mode: `polling`, `streaming`, or `hybrid` |
|
|
536
|
+
| `--log-format FORMAT` | `-l` | String | Log format: `text` or `json` |
|
|
537
|
+
| `--worker-name NAME` | `-n` | String | Worker process identifier (default: hostname) |
|
|
538
|
+
| `--cluster-address ADDR` | `-a` | String | Zeebe gateway address as `host:port` |
|
|
539
|
+
| `--version` | `-v` | | Print version and exit |
|
|
540
|
+
| `--help` | `-h` | | Print help and exit |
|
|
541
|
+
|
|
542
|
+
**Mutual Exclusions:**
|
|
543
|
+
|
|
544
|
+
- `--config` and `--worker-mode` cannot be used together. Set `worker_mode` in YAML instead.
|
|
545
|
+
- `--config` and positional worker arguments cannot be used together. List workers in YAML instead.
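
The two rules above amount to a small validation step. The sketch below is only an illustration of that logic, not the gem's CLI code; `validate_cli_options!` and its keyword arguments are hypothetical names.

```ruby
# Hypothetical check of the two mutual-exclusion rules.
# `config` is the --config path, `worker_mode` the --worker-mode value,
# and `workers` the positional worker class names.
def validate_cli_options!(config: nil, worker_mode: nil, workers: [])
  if config && worker_mode
    raise ArgumentError, "--config and --worker-mode cannot be used together"
  end
  if config && !workers.empty?
    raise ArgumentError, "--config and positional worker arguments cannot be used together"
  end
  true
end

validate_cli_options!(config: "config/busybee.yml")
validate_cli_options!(worker_mode: "polling", workers: ["ProcessOrderWorker"])
```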
|
|
546
|
+
|
|
547
|
+
### Rails Integration
|
|
548
|
+
|
|
549
|
+
The CLI automatically loads your Rails environment by requiring `./config/environment`. This means your workers have access to your models, application config, and everything else in your Rails app. Most gem configuration settings (credentials, logging, etc.) can be set through Rails app configuration values. See [Configuration: Rails Integration](configuration.md#rails-integration).
|
|
550
|
+
|
|
551
|
+
If you don't have Rails installed, loading the environment will be skipped automatically and transparently. If you _do_ have Rails installed but for some reason you want to skip loading the Rails environment, you can set an env var:
|
|
552
|
+
|
|
553
|
+
```bash
|
|
554
|
+
BUSYBEE_SKIP_RAILS=1 bundle exec busybee MyWorker
|
|
555
|
+
```
|
|
556
|
+
|
|
557
|
+
(An env var is necessary here because the decision to load the environment must be made before any configuration values can be read from that environment.)
|
|
558
|
+
|
|
559
|
+
### Signal Handling
|
|
560
|
+
|
|
561
|
+
The worker process responds to standard Unix signals:
|
|
562
|
+
|
|
563
|
+
| Signal | First time | Second time |
|
|
564
|
+
|--------|-----------|------------|
|
|
565
|
+
| `INT` (Ctrl-C) | Graceful shutdown: stop accepting new jobs, finish in-flight work | Force shutdown: exit immediately |
|
|
566
|
+
| `TERM` | Same as INT | Same as INT |
|
|
567
|
+
| `QUIT` | Same as INT | Same as INT |
|
|
568
|
+
|
|
569
|
+
During graceful shutdown, any jobs that were received from the workflow engine but not yet started are failed back to the workflow engine with their retry count preserved, so they'll be picked up by another worker.
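
The "first time / second time" behavior in the table can be pictured as a tiny state machine. This is a minimal sketch under that assumption; `ShutdownState` is a hypothetical class for illustration, and the gem's real signal handling may be structured differently.

```ruby
# Hypothetical two-stage shutdown tracker: the first signal requests a graceful
# stop, any later signal escalates to a forced exit.
class ShutdownState
  attr_reader :mode

  def initialize
    @mode = :running
  end

  def signal!
    @mode = (@mode == :running ? :graceful : :force)
  end
end

state = ShutdownState.new

# INT, TERM, and QUIT all share the same behavior.
%w[INT TERM QUIT].each do |sig|
  Signal.trap(sig) { state.signal! }
end
```

A run loop would then check `state.mode` between jobs: stop accepting new work on `:graceful`, and exit immediately on `:force`.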
|
|
570
|
+
|
|
571
|
+
### Worker Modes
|
|
572
|
+
|
|
573
|
+
Zeebe supports two different ways of fetching jobs for your worker: long-polling or streaming. Both of them have advantages and disadvantages. Busybee supports both modes, as well as a third hybrid mode which eliminates the downsides of using either polling or streaming alone.
|
|
574
|
+
|
|
575
|
+
**If you don't know (or don't want to think about) which mode to use, use hybrid mode.** It's the default, it's been specifically designed to give you the best of both worlds, and it will allow you to mostly ignore this section. However, if you want to understand the tradeoffs between the different modes, read on.
|
|
576
|
+
|
|
577
|
+
#### Polling
|
|
578
|
+
|
|
579
|
+
```ruby
|
|
580
|
+
worker_mode :polling
|
|
581
|
+
```
|
|
582
|
+
|
|
583
|
+
In polling mode, the busybee process for your worker repeatedly [long-polls](https://docs.camunda.io/docs/apis-tools/zeebe-api/gateway-service/#activatejobs-rpc) the Zeebe gateway: "give me up to N jobs of this type." If no jobs are available, the call blocks until at least one job becomes available or the request timeout elapses. Your worker receives the available jobs, processes them sequentially, then polls again.
|
|
584
|
+
|
|
585
|
+
This is the simplest mode, built on the oldest API. It has two principal downsides compared to streaming mode. First, it requires considerably more network traffic. Second, it adds latency to each job, both within the workflow engine (while the job waits for the next polling request) and in the worker process (while the batch is being processed sequentially). However, it avoids the main downside of [streaming mode](#streaming) by guaranteeing that it will eventually retrieve all jobs created prior to the polling request.
|
|
586
|
+
|
|
587
|
+
**Options:**
|
|
588
|
+
|
|
589
|
+
| Option | DSL | YAML/CLI | Default | Description |
|
|
590
|
+
|--------|-----|----------|---------|-------------|
|
|
591
|
+
| Max jobs per request | `polling max_jobs: N` | `max_jobs` | `25` | Limit on how many jobs to fetch per poll |
|
|
592
|
+
| Request timeout | `polling request_timeout: N` | `request_timeout` | `60_000` ms | Limit on how long to wait for jobs before the gateway returns an empty response |
|
|
593
|
+
|
|
594
|
+
**When to Use:** Polling is good for local prototyping, to ensure that backlogs of unprocessed "invisible" jobs cannot form due to race conditions. For deployed or production-like environments, polling should not normally be used, but could be useful during incident response to help clean up a large backlog of available jobs.
|
|
595
|
+
|
|
596
|
+
#### Streaming
|
|
597
|
+
|
|
598
|
+
```ruby
|
|
599
|
+
worker_mode :streaming
|
|
600
|
+
```
|
|
601
|
+
|
|
602
|
+
In streaming mode, the busybee process for your worker opens a persistent [gRPC stream](https://docs.camunda.io/docs/apis-tools/zeebe-api/gateway-service/#streamactivatedjobs-rpc) connection to the workflow engine. The engine pushes jobs to your worker as soon as they're created.
|
|
603
|
+
|
|
604
|
+
This is the more modern mode, giving you the lowest possible latency for new jobs, and the lowest amount of network overhead to get them. But it has a major downside: streams only ever deliver jobs *created after the stream opens.* If there were jobs of that type already backlogged in the workflow engine, a worker in streaming mode won't ever see them. For that, you need polling or [hybrid mode](#hybrid).
|
|
605
|
+
|
|
606
|
+
With default settings, a streaming worker accepts jobs from the workflow engine immediately, buffering them in memory in Ruby prior to actual execution by your worker code. This helps ensure the stream stays responsive and enables [buffer throttling](#buffer-throttle) for controllable backpressure if the size of the in-memory buffer becomes too large. Jobs are still processed sequentially.
|
|
607
|
+
|
|
608
|
+
**Options:**
|
|
609
|
+
|
|
610
|
+
| Option | DSL | YAML/CLI | Default | Description |
|
|
611
|
+
|--------|-----|----------|---------|-------------|
|
|
612
|
+
| Buffer mode | `streaming buffer: true/false` | `buffer` | `true` | Use the buffer. Set to `false` for inline (unbuffered) processing. |
|
|
613
|
+
| Buffer throttle | `streaming buffer_throttle: N` | `buffer_throttle` | `false` | Delay between accepting jobs, in ms. See [Buffer Throttle](#buffer-throttle). |
|
|
614
|
+
|
|
615
|
+
**When to Use:** Whenever you can guarantee that there will be no pre-existing backlog of available jobs. In practice, that guarantee can be difficult to meet, because it depends on human processes to ensure that workflows are never deployed or started before all of the workers they rely on are already running.
|
|
616
|
+
|
|
617
|
+
#### Hybrid
|
|
618
|
+
|
|
619
|
+
```ruby
|
|
620
|
+
worker_mode :hybrid
|
|
621
|
+
```
|
|
622
|
+
|
|
623
|
+
In hybrid mode, busybee combines both approaches to avoid the downsides of either. It opens a stream to capture new jobs immediately, buffering them in memory, then also makes polling requests to drain any backlog. Once the backlog is caught up, it stops polling and continues stream-only processing.
|
|
624
|
+
|
|
625
|
+
This is the default mode, and it should be set-and-forget in most cases.
|
|
626
|
+
|
|
627
|
+
Hybrid mode works in three phases:
|
|
628
|
+
|
|
629
|
+
1. **Open Stream** - starts receiving new jobs immediately, into the buffer.
|
|
630
|
+
2. **Drain Backlog** - polls for pre-existing jobs while also processing any stream jobs that arrive. Stream jobs always take priority (the backlog is only drained if the worker is keeping ahead of the new jobs in the stream).
|
|
631
|
+
3. **Stream Only** - once the backlog is caught up, it stops polling, but continues processing jobs from the stream.
|
|
632
|
+
|
|
633
|
+
All calls to your `perform` method happen on the main thread, maintaining the same sequential guarantee as the other modes.
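
To make the ordering in the three phases concrete, here is a toy model. It is a sketch only: `drain_hybrid` is a hypothetical function, and `stream` and `backlog` are plain arrays standing in for the in-memory stream buffer and the engine's pre-existing jobs. In a real worker, stream jobs keep arriving over time instead of being known up front.

```ruby
# Toy model of the hybrid phases. Stream jobs always go first; the backlog is
# polled in batches of `max_jobs` only when the stream buffer is empty.
def drain_hybrid(stream:, backlog:, max_jobs: 2)
  processed = []
  polling = true

  until stream.empty? && !polling
    if !stream.empty?
      processed << stream.shift        # stream jobs take priority
    else
      batch = backlog.shift(max_jobs)  # simulated polling request
      if batch.empty?
        polling = false                # backlog caught up: stream-only from here on
      else
        processed.concat(batch)
      end
    end
  end

  processed
end

drain_hybrid(stream: [:s1, :s2], backlog: [:b1, :b2, :b3])
```

Everything is appended to `processed` from a single loop, which mirrors the sequential guarantee described above.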
|
|
634
|
+
|
|
635
|
+
**When to Use:** Nearly always. This is the default and the right choice for most workloads. You get low latency for new jobs, low network load, *and* reliable backlog processing after deploys or restarts.
|
|
636
|
+
|
|
637
|
+
#### Buffer Throttle
|
|
638
|
+
|
|
639
|
+
When using hybrid mode, or streaming mode with the default `buffer: true`, jobs are consumed from the gRPC stream as soon as they are available, and are buffered in memory while they wait for your worker to process them. This design avoids applying any [backpressure](https://docs.camunda.io/docs/components/concepts/job-workers/#backpressure) to the gRPC gateway, so that the stream does not get marked as `not-ready` and end up missing future jobs (see that link for details).
|
|
640
|
+
|
|
641
|
+
For most workloads, this arrangement should work smoothly. But if your worker processes jobs slowly while the workflow engine is pushing lots of jobs fast, then the buffer (and the Ruby heap) can start to grow without bound.
|
|
642
|
+
|
|
643
|
+
The `buffer_throttle` option lets you address this situation by adding a sleep between accepting each job. This limits the rate at which busybee accepts jobs from the gRPC gateway, which limits how fast the buffer can grow.
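
Conceptually, the throttle is a pause inside the loop that moves jobs from the stream into the buffer. The sketch below illustrates that idea only; `accept_jobs` and its `sleeper:` parameter are hypothetical, and the sleeper is injectable so the pauses can be observed without actually sleeping.

```ruby
# Hypothetical accept loop. `throttle_ms` is either false (no throttle) or a
# number of milliseconds to pause after accepting each job.
def accept_jobs(incoming, buffer, throttle_ms, sleeper: ->(seconds) { sleep(seconds) })
  incoming.each do |job|
    buffer << job
    sleeper.call(throttle_ms / 1000.0) unless throttle_ms == false
  end
  buffer
end

pauses = []
accept_jobs([:a, :b, :c], [], 5, sleeper: ->(seconds) { pauses << seconds })
# `pauses` now holds one 5ms pause (in seconds) per accepted job
```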
|
|
644
|
+
|
|
645
|
+
For most users, the default (`false`, no throttle) is the right setting. Only tune this if you observe concerning memory growth or OOM errors from your workers due to unbounded buffer depth.
|
|
646
|
+
|
|
647
|
+
```ruby
|
|
648
|
+
streaming buffer: true, buffer_throttle: 5.0 # 5ms delay between accepting each job -- max 200 jobs/s
|
|
649
|
+
```
|
|
650
|
+
|
|
651
|
+
| `buffer_throttle` value | Behavior | Rate Cap (Approx.) |
|
|
652
|
+
|-------------------------|----------|------------------|
|
|
653
|
+
| `false` (default) | No throttling (buffer can grow without bound) | Not capped |
|
|
654
|
+
| `0` | Minimal possible throttling (see Sleep Granularity, below) | ~200k - ~1M jobs/sec |
|
|
655
|
+
| `0.1` - `10` (ms) | Practical range for stable throttling | ~100 - ~10,000 jobs/sec |
|
|
656
|
+
|
|
657
|
+
Note that `buffer_throttle` is not a panacea. If your system is generating jobs at a faster rate than your worker can process them, enabling throttling **alone** will only make the problem worse. If the stream for your worker is [marked `not-ready` by the gRPC gateway due to being too slow](https://docs.camunda.io/docs/components/concepts/job-workers/#backpressure), some future jobs will not be routed to it and will end up "hidden" in the workflow engine's buffer, where they will never be sent to a stream (and must be polled for). The _true_ solution to the problem of having too many jobs is to add additional capacity by scaling your worker either horizontally (adding more replicas) or vertically (adding more CPU or memory). In such a situation, using `buffer_throttle` lets you ensure that any one replica never gets overloaded and runs out of memory.
|
|
658
|
+
|
|
659
|
+
> Instrumentation hooks for monitoring buffer depth, and detecting the need for additional capacity, are planned for v0.4.
|
|
660
|
+
|
|
661
|
+
**Sleep Granularity:** Ruby's `Kernel#sleep` delegates to `nanosleep(2)` on POSIX systems. Values down to 0.1ms (100 microseconds) work reliably on modern Linux and macOS. Below that, OS scheduler and GVL overhead dominate, so sub-0.1ms values are unlikely to behave meaningfully. Therefore, the maximum *stable and reliable* rate cap you can get is close to 10k jobs/sec, which you get from `buffer_throttle: 0.1`.
|
|
662
|
+
|
|
663
|
+
However, there is an option that gives you a rate cap higher than this value without being totally unthrottled. If you set `buffer_throttle` to 0, the thread does not actually sleep, but it does trigger a context switch, which slows it down more than simply doing nothing (on the order of 1-5µs). Setting `buffer_throttle: 0` should give you a rate cap somewhere between roughly 200k and 1M jobs/sec, but the exact value will depend on your infrastructure.
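
The rate caps quoted in this section follow from simple arithmetic: one accepted job per throttle interval. A minimal sketch of that calculation follows; `rate_cap_per_second` is a hypothetical helper, not part of Busybee.

```ruby
# Approximate upper bound on accepted jobs per second for a given throttle.
# Returns Float::INFINITY for `false` (no throttle) and nil for `0`, because
# the zero case depends on infrastructure and cannot be derived from the setting.
def rate_cap_per_second(throttle_ms)
  return Float::INFINITY if throttle_ms == false
  return nil if throttle_ms.zero?

  1000.0 / throttle_ms
end

rate_cap_per_second(5.0)  # the 5ms example: about 200 jobs/sec
rate_cap_per_second(0.1)  # the smallest reliable value: about 10,000 jobs/sec
```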
|
|
664
|
+
|
|
665
|
+
#### Backpressure
|
|
666
|
+
|
|
667
|
+
When the Zeebe cluster is under heavy load, it may respond to requests with a `ResourceExhausted` gRPC error. Both the polling and hybrid modes handle this automatically by sleeping for `backpressure_delay` milliseconds (default: 2,000) before retrying.
|
|
668
|
+
|
|
669
|
+
```ruby
|
|
670
|
+
backpressure_delay 10_000 # wait 10 seconds on backpressure
|
|
671
|
+
```
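
The sleep-then-retry behavior can be sketched as a small wrapper. This is an illustration under stated assumptions, not the gem's implementation: `FakeResourceExhausted` is a stand-in error class so the snippet runs without the gRPC library, and `with_backpressure_retry`, `max_attempts:`, and `sleeper:` are hypothetical names.

```ruby
# Stand-in for the gRPC "resource exhausted" error (defined for this sketch only).
class FakeResourceExhausted < StandardError; end

# Hypothetical retry wrapper: on backpressure, wait for the configured delay,
# then try the request again, giving up after `max_attempts`.
def with_backpressure_retry(delay_ms:, max_attempts: 5, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue FakeResourceExhausted
    raise if attempts >= max_attempts

    sleeper.call(delay_ms / 1000.0)
    retry
  end
end

calls = 0
with_backpressure_retry(delay_ms: 2_000, sleeper: ->(_seconds) {}) do
  calls += 1
  raise FakeResourceExhausted if calls < 3 # simulate two rejected requests

  :jobs_received
end
```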
|
|
672
|
+
|
|
673
|
+
> Backpressure delays are slated to be overhauled in v0.5, and this section is expected to be rewritten at that time.
|
|
674
|
+
|
|
675
|
+
### Multiple Workers in One Process
|
|
676
|
+
|
|
677
|
+
When you pass multiple worker classes (via CLI args or YAML), Busybee runs them in a single process. Each worker runs in a dedicated thread, sharing a single gRPC connection to Zeebe:
|
|
678
|
+
|
|
679
|
+
```bash
|
|
680
|
+
bundle exec busybee ProcessOrderWorker ShipOrderWorker NotifyCustomerWorker
|
|
681
|
+
```
|
|
682
|
+
|
|
683
|
+
Each worker's configuration gets resolved independently through the [precedence chain](#configuration-precedence), so one worker can poll while another worker streams.
|
|
684
|
+
|
|
685
|
+
#### Thread Safety
|
|
686
|
+
|
|
687
|
+
Jobs of the *same* type are always processed sequentially. That is, only one instance of a given worker class will ever be executing `perform` at a given moment. But jobs of *different* types in the same process will run concurrently on separate threads. If your workers perform operations on shared resources (global state, shared caches, non-thread-safe libraries), you'll need to handle synchronization yourself. Most common Rails operations (ActiveRecord queries, cache reads/writes) are already thread-safe.
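
If two different workers do touch the same in-process state, a `Mutex` is the usual way to synchronize access. A minimal sketch, with `SharedCounter` as a hypothetical example of shared state and two plain threads standing in for two worker threads:

```ruby
# Shared state guarded by a Mutex so concurrent updates are not lost.
class SharedCounter
  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def increment
    @mutex.synchronize { @count += 1 }
  end

  def value
    @mutex.synchronize { @count }
  end
end

counter = SharedCounter.new

# Two threads, standing in for two different workers in one process.
threads = 2.times.map do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)
```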
|
|
688
|
+
|
|
689
|
+
> An opt-in feature to run workers concurrently (multi-threaded) will be included in a future version of Busybee, and this section will be updated.
|
|
690
|
+
|
|
691
|
+
#### Database Connections
|
|
692
|
+
|
|
693
|
+
When running multiple workers, ensure your database connection pool is large enough to support one connection for each worker. Busybee logs a warning at startup if the ActiveRecord pool size is smaller than the number of workers.
|
|
694
|
+
|
|
695
|
+
### YAML Configuration
|
|
696
|
+
|
|
697
|
+
For repeatable deployments, define your worker configuration in a YAML file:
|
|
698
|
+
|
|
699
|
+
```yaml
|
|
700
|
+
# config/busybee.yml
|
|
701
|
+
worker_mode: hybrid
|
|
702
|
+
job_timeout: 120000
|
|
703
|
+
backoff: 10000
|
|
704
|
+
|
|
705
|
+
workers:
|
|
706
|
+
- ProcessOrderWorker
|
|
707
|
+
- ShipOrderWorker
|
|
708
|
+
- NotifyCustomerWorker
|
|
709
|
+
```
|
|
710
|
+
|
|
711
|
+
Run it with:
|
|
712
|
+
|
|
713
|
+
```bash
|
|
714
|
+
bundle exec busybee --config config/busybee.yml
|
|
715
|
+
```
|
|
716
|
+
|
|
717
|
+
#### Per-Worker Overrides
|
|
718
|
+
|
|
719
|
+
Different workers often have different performance characteristics, so YAML supports per-worker overrides for any per-worker setting:
|
|
720
|
+
|
|
721
|
+
```yaml
|
|
722
|
+
worker_mode: hybrid
|
|
723
|
+
workers:
|
|
724
|
+
- ProcessOrderWorker:
|
|
725
|
+
worker_mode: polling
|
|
726
|
+
max_jobs: 50
|
|
727
|
+
request_timeout: 10000
|
|
728
|
+
- ShipOrderWorker:
|
|
729
|
+
worker_mode: streaming
|
|
730
|
+
buffer_throttle: 5
|
|
731
|
+
- NotifyCustomerWorker # uses top-level defaults
|
|
732
|
+
```
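
In this layout, each entry under `workers` is either a bare class name or a single-key mapping of overrides. The sketch below shows one way such a file could be read with Ruby's standard `yaml` library. `normalize_workers` is a hypothetical helper, and it merges only the two YAML levels (not the DSL or gem defaults), so it is not a model of the full precedence chain.

```ruby
require "yaml"

config = YAML.safe_load(<<~YAML)
  worker_mode: hybrid
  workers:
    - ProcessOrderWorker:
        worker_mode: polling
        max_jobs: 50
    - NotifyCustomerWorker
YAML

# Returns { "WorkerName" => settings }, with per-worker overrides layered on
# top of the top-level keys.
def normalize_workers(config)
  defaults = config.reject { |key, _| key == "workers" }
  config.fetch("workers").to_h do |entry|
    name, overrides = entry.is_a?(Hash) ? entry.first : [entry, {}]
    [name, defaults.merge(overrides || {})]
  end
end

settings = normalize_workers(config)
settings["ProcessOrderWorker"]   # polling, with max_jobs overridden
settings["NotifyCustomerWorker"] # inherits the top-level worker_mode
```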
|
|
733
|
+
|
|
734
|
+
#### YAML Reference
|
|
735
|
+
|
|
736
|
+
**Top-level keys** (apply to all workers unless overridden):
|
|
737
|
+
|
|
738
|
+
| Key | Type | Description |
|
|
739
|
+
|-----|------|-------------|
|
|
740
|
+
| `worker_mode` | String | `polling`, `streaming`, or `hybrid` |
|
|
741
|
+
| `max_jobs` | Integer | Max jobs per polling request |
|
|
742
|
+
| `request_timeout` | Integer | Long-poll timeout (ms) |
|
|
743
|
+
| `job_timeout` | Integer | Job lock timeout (ms) |
|
|
744
|
+
| `backoff` | Integer | Retry backoff (ms) |
|
|
745
|
+
| `backpressure_delay` | Integer | Delay after backpressure error (ms) |
|
|
746
|
+
| `buffer` | Boolean | Enable job buffering in streaming mode |
|
|
747
|
+
| `buffer_throttle` | Numeric/Boolean | Job buffer delay (ms); fractional values such as `0.1` are allowed. `false` to disable |
|
|
748
|
+
| `workers` | Array | Worker class names, with optional per-worker overrides |
|
|
749
|
+
|
|
750
|
+
**Process-wide settings** (`log_format`, `worker_name`, `cluster_address`) are CLI-only and cannot be set in YAML. Use the corresponding CLI flags alongside `--config`:
|
|
751
|
+
|
|
752
|
+
```bash
|
|
753
|
+
bundle exec busybee --config config/busybee.yml --log-format json --worker-name "prod-worker-1"
|
|
754
|
+
```
|
|
755
|
+
|
|
756
|
+
> For a realistic example, see the [Dropship Co. demo app's busybee config files](../spec/demo/config/busybee/).
|
|
757
|
+
|
|
758
|
+
### Configuration Precedence
|
|
759
|
+
|
|
760
|
+
**Worker runtime settings** resolve through a 4-level precedence chain. Each level overrides the one below it:
|
|
761
|
+
|
|
762
|
+
```
|
|
763
|
+
Per-Worker Override in YAML   (highest priority)    `workers: - MyWorker: { max_jobs: 50 }`
|
|
764
|
+
            |                         v
|
|
765
|
+
Top-Level YAML / CLI Flag             v             `max_jobs: 50` (at YAML top level)
|
|
766
|
+
            |                         v
|
|
767
|
+
Worker DSL Declaration                v             `polling max_jobs: 32` (in the Worker class)
|
|
768
|
+
            |                         v
|
|
769
|
+
Gem Configuration & Defaults  (lowest priority)     `Busybee.default_max_jobs` (25 by default, but can be set in config)
|
|
770
|
+
```
|
|
771
|
+
|
|
772
|
+
The first non-nil value wins. This means `0` and `false` are valid explicit values -- for example, `buffer_throttle: false` explicitly disables throttling even if a lower level sets it.
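
The "first non-nil value wins" rule is easy to express directly. In this minimal sketch, `resolve_setting` is a hypothetical helper that takes the candidate values ordered from highest to lowest priority:

```ruby
# Levels are passed highest priority first; nil means "not set at this level".
# `false` and `0` count as explicit values because only nil is skipped.
def resolve_setting(*levels)
  levels.find { |value| !value.nil? }
end

# per-worker YAML, top-level YAML / CLI, worker DSL, gem default
resolve_setting(nil, nil, 32, 25)     # the DSL value (32) beats the gem default
resolve_setting(nil, false, 5, false) # an explicit false beats the DSL's 5
```

Using `||` chaining instead would skip `false`, which is why the check is against `nil` specifically.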
|
|
773
|
+
|
|
774
|
+
The [per-worker settings](#yaml-reference) this applies to are: `worker_mode`, `max_jobs`, `request_timeout`, `job_timeout`, `backoff`, `backpressure_delay`, `buffer`, and `buffer_throttle`.
|
|
775
|
+
|
|
776
|
+
**Process-wide settings** (like `--log-format`, `--worker-name`, and `--cluster-address`) follow a simpler 2-level chain: the CLI flag, then gem config / default. They don't participate in per-worker overrides because they always apply to the entire process. Also, they often take env vars as their inputs, so they are less useful in YAML.
|
|
777
|
+
|
|
778
|
+
For gem-level defaults (the bottom of the chain), see [Configuration](configuration.md).
|
|
779
|
+
|
|
780
|
+
---
|
|
781
|
+
|
|
782
|
+
## Testing Workers
|
|
783
|
+
|
|
784
|
+
Busybee includes helpers that let you unit test your workers without a running Zeebe instance, by constructing a simulated job and then running the real worker lifecycle with it.
|
|
785
|
+
|
|
786
|
+
This is a complement to the [workflow tests](testing.md) that you write. Those verify your process definitions, ensuring that the correct jobs will be available with the correct variables at the right times in the business process. These, by contrast, verify that your workers perform those jobs correctly under different conditions and with different inputs.
|
|
787
|
+
|
|
788
|
+
See the [workflow testing guide](testing.md) for information about testing workflow definitions. Read on for more about testing your workers.
|
|
789
|
+
|
|
790
|
+
### Setup
|
|
791
|
+
|
|
792
|
+
If you've already set up `busybee/testing` for BPMN workflow tests, worker testing helpers are available automatically. If not:
|
|
793
|
+
|
|
794
|
+
```ruby
|
|
795
|
+
# spec/spec_helper.rb or spec/rails_helper.rb
|
|
796
|
+
require "rspec"
|
|
797
|
+
require "busybee/testing"
|
|
798
|
+
```
|
|
799
|
+
|
|
800
|
+
This makes `execute_worker`, `build_test_job`, and the worker matchers available in all RSpec examples.
|
|
801
|
+
|
|
802
|
+
### Basic Worker Testing
|
|
803
|
+
|
|
804
|
+
The simplest way to test a worker is `execute_worker`. It runs the full worker lifecycle (input validation, `perform`, output validation, auto-complete) and returns the result:
|
|
805
|
+
|
|
806
|
+
```ruby
|
|
807
|
+
RSpec.describe ProcessOrderWorker do
|
|
808
|
+
let(:order) { create(:order) }
|
|
809
|
+
|
|
810
|
+
it "processes the order and returns confirmation number" do
|
|
811
|
+
result = execute_worker(described_class, variables: { order_id: order.id })
|
|
812
|
+
expect(result[:confirmation_number]).to be_present
|
|
813
|
+
end
|
|
814
|
+
|
|
815
|
+
it "marks the order as processed" do
|
|
816
|
+
execute_worker(described_class, variables: { order_id: order.id })
|
|
817
|
+
expect(order.reload).to be_processed
|
|
818
|
+
end
|
|
819
|
+
|
|
820
|
+
it "raises when order is missing" do
|
|
821
|
+
expect {
|
|
822
|
+
execute_worker(described_class, variables: { order_id: "nonexistent" })
|
|
823
|
+
}.to raise_error(ActiveRecord::RecordNotFound)
|
|
824
|
+
end
|
|
825
|
+
end
|
|
826
|
+
```
|
|
827
|
+
|
|
828
|
+
`execute_worker` accepts the same keyword arguments as `build_test_job`:
|
|
829
|
+
|
|
830
|
+
| Argument | Type | Default | Description |
|
|
831
|
+
|----------|------|---------|-------------|
|
|
832
|
+
| `variables:` | Hash | `{}` | Process variables |
|
|
833
|
+
| `headers:` | Hash | `{}` | Custom headers |
|
|
834
|
+
| `bpmn_process_id:` | String | `"test-process"` | BPMN process ID |
|
|
835
|
+
| `retries:` | Integer | `3` | Retry count |
|
|
836
|
+
|
|
837
|
+
Errors are re-raised after the worker's error handling runs, so you can use `expect { }.to raise_error` alongside job status assertions (see below).
|
|
838
|
+
|
|
839
|
+
### Inspecting Job State
|
|
840
|
+
|
|
841
|
+
When you need to assert on what the worker *did* to the job (completed it? failed it? threw a BPMN error?), or if you need so many variables or headers that passing all options inline becomes unreadable, you can build a test job first with `build_test_job` and then pass it to `execute_worker`:
|
|
842
|
+
|
|
843
|
+
```ruby
|
|
844
|
+
RSpec.describe ProcessOrderWorker do
|
|
845
|
+
it "completes the job on success" do
|
|
846
|
+
job = build_test_job(variables: { order_id: create(:order).id })
|
|
847
|
+
execute_worker(described_class, job: job)
|
|
848
|
+
expect(job).to be_complete
|
|
849
|
+
end
|
|
850
|
+
|
|
851
|
+
it "fails the job on error" do
|
|
852
|
+
job = build_test_job(variables: { order_id: "nonexistent" })
|
|
853
|
+
expect { execute_worker(described_class, job: job) }
|
|
854
|
+
.to raise_error(ActiveRecord::RecordNotFound)
|
|
855
|
+
expect(job).to be_failed
|
|
856
|
+
end
|
|
857
|
+
end
|
|
858
|
+
```
|
|
859
|
+
|
|
860
|
+
`build_test_job` returns a real `Busybee::Job` backed by a stub client. All lifecycle operations (`complete!`, `fail!`, `throw_bpmn_error!`) update the job's status but don't make any network calls.
|
|
861
|
+
|
|
862
|
+
### Worker Testing Matchers
|
|
863
|
+
|
|
864
|
+
For more expressive assertions, Busybee provides three RSpec matchers that combine execution and verification in a single expectation.
|
|
865
|
+
|
|
866
|
+
#### `complete_job`
|
|
867
|
+
|
|
868
|
+
Asserts that a worker completes the job successfully:
|
|
869
|
+
|
|
870
|
+
```ruby
|
|
871
|
+
job = build_test_job(variables: { order_id: order.id })
|
|
872
|
+
|
|
873
|
+
# Just assert completion
|
|
874
|
+
expect(ProcessOrderWorker).to complete_job(job)
|
|
875
|
+
|
|
876
|
+
# Assert completion with specific output variables
|
|
877
|
+
expect(ProcessOrderWorker).to complete_job(job)
|
|
878
|
+
.with_vars(confirmation_number: "ORD-123")
|
|
879
|
+
|
|
880
|
+
# Assert completion with no output variables
|
|
881
|
+
expect(NotifyCustomerWorker).to complete_job(job).with_no_vars
|
|
882
|
+
|
|
883
|
+
# Works with RSpec composable matchers
|
|
884
|
+
expect(ProcessOrderWorker).to complete_job(job)
|
|
885
|
+
.with_vars(hash_including(confirmation_number: a_string_starting_with("ORD-")))
|
|
886
|
+
```
|
|
887
|
+
|
|
888
|
+
#### `fail_job`
|
|
889
|
+
|
|
890
|
+
Asserts that a worker fails the job. Optionally match the error class and/or message, using the same argument forms as RSpec's `raise_error`:
|
|
891
|
+
|
|
892
|
+
```ruby
|
|
893
|
+
job = build_test_job(variables: { order_id: "nonexistent" })
|
|
894
|
+
|
|
895
|
+
# Just assert failure
|
|
896
|
+
expect(ProcessOrderWorker).to fail_job(job)
|
|
897
|
+
|
|
898
|
+
# Match error class
|
|
899
|
+
expect(ProcessOrderWorker).to fail_job(job)
|
|
900
|
+
.with_error(ActiveRecord::RecordNotFound)
|
|
901
|
+
|
|
902
|
+
# Match error class and message pattern
|
|
903
|
+
expect(ProcessOrderWorker).to fail_job(job)
|
|
904
|
+
.with_error(ActiveRecord::RecordNotFound, /Couldn't find Order/)
|
|
905
|
+
|
|
906
|
+
# Match message only
|
|
907
|
+
expect(ProcessOrderWorker).to fail_job(job)
|
|
908
|
+
.with_error(/not found/)
|
|
909
|
+
```
|
|
910
|
+
|
|
911
|
+
#### `throw_bpmn_error_on`
|
|
912
|
+
|
|
913
|
+
Asserts that a worker throws a [BPMN error](https://docs.camunda.io/docs/components/modeler/bpmn/error-events/). Remember, BPMN errors are a workflow control-flow concept, distinct from a Ruby exception. When your worker throws a BPMN error, it signals to the process instance that a known business condition occurred, and the workflow definition decides what happens next.
|
|
914
|
+
|
|
915
|
+
```ruby
job = build_test_job(variables: { order_id: expired_order.id })

# Just assert a BPMN error was thrown
expect(ProcessOrderWorker).to throw_bpmn_error_on(job)

# Match error code (symbol form - converted to uppercase)
expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
  .with_code(:order_expired) # matches code "ORDER_EXPIRED"

# Match error code and message
expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
  .with_code(:order_expired, message: /has expired/)

# Match code from exception class (MyApp::OrderExpired -> "MY_APP_ORDER_EXPIRED")
expect(ProcessOrderWorker).to throw_bpmn_error_on(job)
  .with_code(MyApp::OrderExpired)
```

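The comments in the examples above describe two conversions: a symbol is uppercased, and an exception class name is flattened into an uppercase, underscore-separated code. The sketch below reproduces those two conversions in plain Ruby so the expected codes are easy to predict. `bpmn_error_code` is a hypothetical helper written for this illustration, and passing strings through unchanged is an assumption; busybee's own conversion logic may differ.

```ruby
# Hypothetical helper, NOT busybee's implementation. It mirrors the
# conversions described in the `with_code` example comments.
def bpmn_error_code(value)
  case value
  when Symbol
    # :order_expired -> "ORDER_EXPIRED"
    value.to_s.upcase
  when Class
    # MyApp::OrderExpired -> "MY_APP_ORDER_EXPIRED"
    value.name
         .gsub("::", "_")
         .gsub(/([a-z\d])([A-Z])/, '\1_\2')
         .upcase
  else
    # Assumption: strings are already in their final form.
    value.to_s
  end
end

module MyApp
  class OrderExpired < StandardError; end
end

from_symbol = bpmn_error_code(:order_expired)
from_class  = bpmn_error_code(MyApp::OrderExpired)
from_string = bpmn_error_code("INSUFFICIENT_FUNDS")
```

Knowing the derived code matters because the BPMN model catches errors by code: the value asserted in the spec should be the same string that the error catch event in the workflow definition expects.
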
### Testing Best Practices

One recommended pattern is to compose `build_test_job` from RSpec `let` blocks, then reuse the job across contexts while overriding individual parameters. Provided that the `let` blocks themselves do not become unwieldy, this can be a powerful and elegant pattern:

```ruby
describe ProcessOrderWorker do
  # There could potentially be many more variables, and/or some headers, but we show just one for clarity:
  let(:job) { build_test_job(variables: variables) }
  let(:variables) { { order_id: order_id } }

  let(:order) { create :order } # e.g. FactoryBot or similar fixture setup
  let(:order_id) { order.id }

  context "with a valid order" do
    it "processes the order normally" do
      expect(described_class).to complete_job(job).with_vars(confirmation_number: /[A-Z]{6}/)
    end
  end

  # Now we can override just the fixture object while reusing the rest of the job setup:
  context "with an order without sufficient funds" do
    let(:order) { create :order, :insufficient_funds } # e.g. a FactoryBot trait or similar

    it "throws a BPMN error so the workflow can branch" do
      expect(described_class).to throw_bpmn_error_on(job).with_code("INSUFFICIENT_FUNDS")
    end
  end

  # Or we can override just the variable's value itself, and bypass the fixture entirely:
  context "when the order is not found" do
    let(:order_id) { SecureRandom.uuid }

    it "fails and reports the error" do
      expect(described_class).to fail_job(job).with_error(ActiveRecord::RecordNotFound)
    end
  end

  # Or even override the entire set of variables:
  context "when a workflow does not pass the expected set of variables" do
    let(:variables) { {} }

    it "fails input validation, alerting us to the problem" do
      expect(described_class).to fail_job(job).with_error(Busybee::MissingInput)
    end
  end
end
```

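The overrides above work because RSpec evaluates `let` blocks lazily and resolves each name through ordinary method lookup, with nested contexts behaving like subclasses of the enclosing group. The sketch below imitates that mechanism in plain Ruby, without RSpec, to show why redefining `order_id` or `variables` in an inner context changes the resulting `job`. `LazyLet` and the three context classes are hypothetical names invented for this illustration, and the hash returned by `job` is only a stand-in for `build_test_job`.

```ruby
# Plain-Ruby imitation of RSpec's `let`: each name becomes a lazily
# evaluated, memoized instance method. This is an illustration only.
module LazyLet
  def let(name, &block)
    define_method(name) do
      @__memo ||= {}
      @__memo.fetch(name) { @__memo[name] = instance_exec(&block) }
    end
  end
end

# Plays the role of the outer `describe` block.
class OuterContext
  extend LazyLet

  let(:job)       { { variables: variables } } # stand-in for build_test_job
  let(:variables) { { order_id: order_id } }
  let(:order_id)  { 42 }
end

# Plays the role of a nested context that overrides one variable's value.
class NotFoundContext < OuterContext
  let(:order_id) { "missing" }
end

# Plays the role of a nested context that overrides the whole variable set.
class EmptyVariablesContext < OuterContext
  let(:variables) { {} }
end

default_job   = OuterContext.new.job
not_found_job = NotFoundContext.new.job
empty_job     = EmptyVariablesContext.new.job
```

Here `not_found_job` carries the overridden `order_id` even though `job` and `variables` were defined only once, in the outer class. The same dispatch is what lets each nested `context` in the spec state only the input that differs.
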
For more realistic examples, see the [demo app's worker specs](../spec/demo/spec/workers/).