pgbus 0.4.0 → 0.5.0
- checksums.yaml +4 -4
- data/README.md +360 -331
- data/app/controllers/pgbus/dead_letter_controller.rb +3 -7
- data/app/frontend/pgbus/style.css +1 -1
- data/app/frontend/pgbus/tailwind.css +28 -1
- data/app/views/layouts/pgbus/application.html.erb +58 -12
- data/app/views/pgbus/dead_letter/_messages_table.html.erb +3 -5
- data/app/views/pgbus/insights/show.html.erb +6 -6
- data/app/views/pgbus/jobs/_enqueued_table.html.erb +2 -3
- data/lib/generators/pgbus/templates/pgbus.yml.erb +5 -3
- data/lib/generators/pgbus/update_generator.rb +75 -0
- data/lib/pgbus/circuit_breaker.rb +17 -3
- data/lib/pgbus/cli.rb +95 -3
- data/lib/pgbus/client.rb +91 -3
- data/lib/pgbus/configuration/capsule_dsl.rb +190 -0
- data/lib/pgbus/configuration.rb +305 -25
- data/lib/pgbus/failed_event_recorder.rb +15 -2
- data/lib/pgbus/generators/config_converter.rb +323 -0
- data/lib/pgbus/process/dispatcher.rb +42 -20
- data/lib/pgbus/process/supervisor.rb +11 -16
- data/lib/pgbus/process/worker.rb +20 -2
- data/lib/pgbus/version.rb +1 -1
- data/lib/pgbus/web/data_source.rb +50 -15
- data/lib/pgbus.rb +13 -1
- metadata +4 -1
data/README.md
CHANGED
|
@@ -12,22 +12,33 @@ PostgreSQL-native job processing and event bus for Rails, built on [PGMQ](https:
|
|
|
12
12
|
- [Requirements](#requirements)
|
|
13
13
|
- [Installation](#installation)
|
|
14
14
|
- [Quick start](#quick-start)
|
|
15
|
-
- [
|
|
16
|
-
- [
|
|
17
|
-
- [
|
|
18
|
-
- [
|
|
19
|
-
- [
|
|
20
|
-
- [
|
|
21
|
-
- [
|
|
22
|
-
- [
|
|
23
|
-
- [
|
|
24
|
-
- [
|
|
25
|
-
- [
|
|
26
|
-
- [
|
|
27
|
-
- [
|
|
28
|
-
- [
|
|
29
|
-
- [
|
|
30
|
-
- [
|
|
15
|
+
- [1. Configure (optional)](#1-configure-optional)
|
|
16
|
+
- [2. Use as ActiveJob backend](#2-use-as-activejob-backend)
|
|
17
|
+
- [3. Event bus (optional)](#3-event-bus-optional)
|
|
18
|
+
- [4. Start workers](#4-start-workers)
|
|
19
|
+
- [5. Mount the dashboard](#5-mount-the-dashboard)
|
|
20
|
+
- [Reliability](#reliability)
|
|
21
|
+
- [Job uniqueness](#job-uniqueness)
|
|
22
|
+
- [Concurrency controls](#concurrency-controls)
|
|
23
|
+
- [Circuit breaker and queue pause/resume](#circuit-breaker-and-queue-pauseresume)
|
|
24
|
+
- [Prefetch flow control](#prefetch-flow-control)
|
|
25
|
+
- [Worker recycling](#worker-recycling)
|
|
26
|
+
- [Routing and ordering](#routing-and-ordering)
|
|
27
|
+
- [Priority queues](#priority-queues)
|
|
28
|
+
- [Consumer priority](#consumer-priority)
|
|
29
|
+
- [Single active consumer](#single-active-consumer)
|
|
30
|
+
- [Persistence and batching](#persistence-and-batching)
|
|
31
|
+
- [Batches](#batches)
|
|
32
|
+
- [Transactional outbox](#transactional-outbox)
|
|
33
|
+
- [Archive compaction](#archive-compaction)
|
|
34
|
+
- [Operations](#operations)
|
|
35
|
+
- [CLI](#cli)
|
|
36
|
+
- [Dashboard](#dashboard)
|
|
37
|
+
- [Database tables](#database-tables)
|
|
38
|
+
- [Switching from another backend](#switching-from-another-backend)
|
|
39
|
+
- [Reference](#reference)
|
|
40
|
+
- [Architecture](#architecture)
|
|
41
|
+
- [Configuration reference](#configuration-reference)
|
|
31
42
|
- [Development](#development)
|
|
32
43
|
- [License](#license)
|
|
33
44
|
|
|
@@ -75,51 +86,33 @@ CREATE EXTENSION IF NOT EXISTS pgmq;
|
|
|
75
86
|
|
|
76
87
|
### 1. Configure (optional)
|
|
77
88
|
|
|
78
|
-
Pgbus works with zero config in Rails -- it uses your existing `ActiveRecord` connection. For custom setups,
|
|
79
|
-
|
|
80
|
-
```yaml
|
|
81
|
-
production:
|
|
82
|
-
queue_prefix: myapp
|
|
83
|
-
default_queue: default
|
|
84
|
-
pool_size: 10
|
|
85
|
-
max_retries: 5
|
|
86
|
-
prefetch_limit: 20
|
|
87
|
-
workers:
|
|
88
|
-
- queues: [default, mailers]
|
|
89
|
-
threads: 10
|
|
90
|
-
consumer_priority: 10
|
|
91
|
-
- queues: [critical]
|
|
92
|
-
threads: 5
|
|
93
|
-
single_active_consumer: true
|
|
94
|
-
- queues: [default, mailers]
|
|
95
|
-
threads: 5
|
|
96
|
-
consumer_priority: 0 # fallback worker
|
|
97
|
-
event_consumers:
|
|
98
|
-
- queues: [orders, payments]
|
|
99
|
-
threads: 5
|
|
100
|
-
max_jobs_per_worker: 10000
|
|
101
|
-
max_memory_mb: 512
|
|
102
|
-
max_worker_lifetime: 3600
|
|
103
|
-
```
|
|
104
|
-
|
|
105
|
-
Or configure in an initializer:
|
|
89
|
+
Pgbus works with zero config in Rails -- it uses your existing `ActiveRecord` connection. For custom setups, drop a Ruby initializer:
|
|
106
90
|
|
|
107
91
|
```ruby
|
|
108
92
|
# config/initializers/pgbus.rb
|
|
109
|
-
Pgbus.configure do |
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
93
|
+
Pgbus.configure do |c|
|
|
94
|
+
c.queue_prefix = "myapp"
|
|
95
|
+
c.max_retries = 5
|
|
96
|
+
c.visibility_timeout = 30.seconds # ActiveSupport::Duration accepted
|
|
97
|
+
c.idempotency_ttl = 7.days
|
|
98
|
+
|
|
99
|
+
# Worker recycling — prevents long-lived processes from leaking memory
|
|
100
|
+
c.max_jobs_per_worker = 10_000
|
|
101
|
+
c.max_memory_mb = 512
|
|
102
|
+
c.max_worker_lifetime = 1.hour
|
|
103
|
+
|
|
104
|
+
# Capsule string DSL — Sidekiq-style "queues: threads; queues: threads"
|
|
105
|
+
c.workers = "default, mailers: 10; critical: 5"
|
|
106
|
+
|
|
107
|
+
# Or use named capsules with advanced options
|
|
108
|
+
c.capsule :ordered, queues: %w[ordered_events], threads: 1, single_active_consumer: true
|
|
120
109
|
end
|
|
121
110
|
```
|
|
122
111
|
|
|
112
|
+
The capsule string DSL is the shortest form for the common case. Use `c.capsule` when you need named capsules with advanced options like `single_active_consumer` or `consumer_priority`. See [Routing and ordering](#routing-and-ordering) for the full set.
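To make the string format concrete, here is a minimal sketch of how a string such as `"default, mailers: 10; critical: 5"` could be broken into capsule definitions. `parse_capsule_string` is a hypothetical helper written for this illustration, not Pgbus's actual parser. The returned hash shape, and the requirement that every group carries an explicit thread count, are assumptions.

```ruby
# Hypothetical illustration only: not Pgbus's real parser.
# Each ";"-separated group has the form "queue[, queue...]: threads".
def parse_capsule_string(spec)
  spec.split(";").map do |group|
    queues_part, threads_part = group.split(":", 2)
    if threads_part.nil?
      raise ArgumentError, "missing thread count in #{group.strip.inspect}"
    end

    {
      queues: queues_part.split(",").map(&:strip).reject(&:empty?),
      threads: Integer(threads_part.strip)
    }
  end
end

parse_capsule_string("default, mailers: 10; critical: 5").each do |capsule|
  puts "#{capsule[:queues].join(', ')} -> #{capsule[:threads]} threads"
end
```

Read this way, the example string describes two capsules: one serving `default` and `mailers` with 10 threads, and one serving `critical` with 5.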
|
|
113
|
+
|
|
114
|
+
> **Migrating from `config/pgbus.yml`?** Run `rails generate pgbus:update` to convert your YAML config to a Ruby initializer using the modern DSL. The original YAML stays in place for review; delete it once the new initializer looks right.
|
|
115
|
+
|
|
123
116
|
### 2. Use as ActiveJob backend
|
|
124
117
|
|
|
125
118
|
```ruby
|
|
@@ -239,7 +232,101 @@ Pgbus.configure do |config|
|
|
|
239
232
|
end
|
|
240
233
|
```
|
|
241
234
|
|
|
242
|
-
##
|
|
235
|
+
## Reliability
|
|
236
|
+
|
|
237
|
+
These features stop bad jobs from cascading into outages: deduplication, concurrency caps, automatic queue pausing on repeated failures, in-flight backpressure, and worker recycling.
|
|
238
|
+
|
|
239
|
+
### Job uniqueness
|
|
240
|
+
|
|
241
|
+
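Prevent duplicate jobs from running. Unlike `limits_concurrency` (which controls *how many* jobs with the same key run), uniqueness allows *at most one* job per key: with `:until_executed`, at most one exists in the system at any time; with `:while_executing`, duplicates may be enqueued but only one executes at a time.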
|
|
242
|
+
|
|
243
|
+
```ruby
|
|
244
|
+
class ImportOrderJob < ApplicationJob
|
|
245
|
+
ensures_uniqueness strategy: :until_executed,
|
|
246
|
+
key: ->(order_id) { "import-order-#{order_id}" },
|
|
247
|
+
on_conflict: :reject
|
|
248
|
+
|
|
249
|
+
def perform(order_id)
|
|
250
|
+
# Only ONE instance per order_id can exist — from enqueue through completion.
|
|
251
|
+
# If another ImportOrderJob for this order_id is already enqueued or running,
|
|
252
|
+
# the duplicate is rejected immediately.
|
|
253
|
+
end
|
|
254
|
+
end
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
#### Strategies
|
|
258
|
+
|
|
259
|
+
| Strategy | Lock acquired | Lock released | Prevents |
|
|
260
|
+
|----------|--------------|---------------|----------|
|
|
261
|
+
| `:until_executed` | At enqueue | On completion or DLQ | Duplicate enqueue AND execution |
|
|
262
|
+
| `:while_executing` | At execution start | On completion or DLQ | Duplicate execution only |
|
|
263
|
+
|
|
264
|
+
#### Conflict policies
|
|
265
|
+
|
|
266
|
+
| Policy | Behavior |
|
|
267
|
+
|--------|----------|
|
|
268
|
+
| `:reject` | Raise `Pgbus::JobNotUnique` (default) |
|
|
269
|
+
| `:discard` | Silently drop the duplicate |
|
|
270
|
+
| `:log` | Log a warning and drop |
|
|
271
|
+
|
|
272
|
+
#### Lock lifecycle
|
|
273
|
+
|
|
274
|
+
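In normal operation the lock is **never released by a timer**; a last-resort TTL applies only when the reaper itself cannot run. Otherwise the lock is held as long as the job exists in the system: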
|
|
275
|
+
|
|
276
|
+
```text
|
|
277
|
+
Enqueue ──→ pgbus_job_locks (state: queued, owner_pid: nil)
|
|
278
|
+
│
|
|
279
|
+
Worker picks up job
|
|
280
|
+
│
|
|
281
|
+
▼
|
|
282
|
+
claim_for_execution! (state: executing, owner_pid: PID)
|
|
283
|
+
│
|
|
284
|
+
┌───────┴───────┐
|
|
285
|
+
▼ ▼
|
|
286
|
+
Success Crash
|
|
287
|
+
release! (lock orphaned)
|
|
288
|
+
(row deleted) │
|
|
289
|
+
▼
|
|
290
|
+
Reaper checks:
|
|
291
|
+
Is owner_pid in pgbus_processes
|
|
292
|
+
with fresh heartbeat?
|
|
293
|
+
│
|
|
294
|
+
┌─────┴─────┐
|
|
295
|
+
No Yes
|
|
296
|
+
▼ ▼
|
|
297
|
+
release! (keep lock,
|
|
298
|
+
(orphaned) job is running)
|
|
299
|
+
```
|
|
300
|
+
|
|
301
|
+
**Crash recovery** works through the reaper (runs every 5 minutes in the dispatcher). It cross-references `owner_pid` in `pgbus_job_locks` against `pgbus_processes` heartbeats. If the owning worker has no fresh heartbeat, the lock is orphaned and released — the PGMQ message's visibility timeout will expire and the job will be retried by another worker.
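The reaper's decision can be sketched with plain Ruby objects. This is an in-memory model under stated assumptions, not the real implementation, which works against the `pgbus_job_locks` and `pgbus_processes` tables. The `Lock` struct, the `orphaned_locks` helper, and the `stale_after` threshold are hypothetical names and values chosen for illustration.

```ruby
# Hypothetical in-memory model of the reaper check.
Lock = Struct.new(:key, :state, :owner_pid)

# heartbeats maps a worker PID to the Time of its last heartbeat.
# A lock is orphaned when it is executing and its owner has no fresh heartbeat.
def orphaned_locks(locks, heartbeats, now:, stale_after:)
  locks.select do |lock|
    next false unless lock.state == :executing # queued locks have no owner yet

    last_beat = heartbeats[lock.owner_pid]
    last_beat.nil? || (now - last_beat) > stale_after
  end
end

now = Time.at(1_000_000)
heartbeats = { 101 => now - 10, 102 => now - 900 }
locks = [
  Lock.new("import-order-1", :executing, 101), # owner alive: keep
  Lock.new("import-order-2", :executing, 102), # heartbeat stale: orphaned
  Lock.new("import-order-3", :executing, 103), # owner unknown: orphaned
  Lock.new("import-order-4", :queued, nil)     # not claimed yet: keep
]

orphaned_locks(locks, heartbeats, now: now, stale_after: 300).each do |lock|
  puts "would release #{lock.key}"
end
```

Only locks in the `executing` state are candidates here; whether the real reaper also inspects queued locks is not stated in the text, so this sketch leaves them alone.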
|
|
302
|
+
|
|
303
|
+
A last-resort TTL (default 24 hours) handles the case where the entire pgbus supervisor is dead and the reaper itself can't run.
|
|
304
|
+
|
|
305
|
+
#### Uniqueness vs concurrency controls
|
|
306
|
+
|
|
307
|
+
| | `ensures_uniqueness` | `limits_concurrency` |
|
|
308
|
+
|---|---|---|
|
|
309
|
+
| **Purpose** | Prevent duplicate jobs | Limit concurrent execution slots |
|
|
310
|
+
| **Lock type** | Binary lock (one or none) | Counting semaphore (up to N) |
|
|
311
|
+
| **At enqueue** | `:until_executed` blocks duplicates | Checks semaphore, blocks/discards/raises |
|
|
312
|
+
| **At execution** | `:while_executing` blocks duplicate runs | Not checked (semaphore acquired at enqueue) |
|
|
313
|
+
| **Duplicate in queue** | `:until_executed`: impossible. `:while_executing`: allowed, only one runs | Allowed up to N, rest blocked |
|
|
314
|
+
| **Crash recovery** | Reaper checks heartbeats | Semaphore `expires_at` + dispatcher cleanup |
|
|
315
|
+
| **Use when** | "This exact job must not run twice" | "At most N of these can run at once" |
|
|
316
|
+
|
|
317
|
+
**When to use which:**
|
|
318
|
+
- Payment processing, order import, unique email sends → `ensures_uniqueness`
|
|
319
|
+
- Rate-limited API calls, resource-constrained tasks → `limits_concurrency`
|
|
320
|
+
- Both at once → combine them (they use separate tables, no conflicts)
|
|
321
|
+
|
|
322
|
+
#### Setup
|
|
323
|
+
|
|
324
|
+
```bash
|
|
325
|
+
rails generate pgbus:add_job_locks # Add the migration
|
|
326
|
+
rails generate pgbus:add_job_locks --database=pgbus # For separate database
|
|
327
|
+
```
|
|
328
|
+
|
|
329
|
+
### Concurrency controls
|
|
243
330
|
|
|
244
331
|
Limit how many jobs with the same key can run concurrently:
|
|
245
332
|
|
|
@@ -256,7 +343,7 @@ class ProcessOrderJob < ApplicationJob
|
|
|
256
343
|
end
|
|
257
344
|
```
|
|
258
345
|
|
|
259
|
-
|
|
346
|
+
#### Options
|
|
260
347
|
|
|
261
348
|
| Option | Default | Description |
|
|
262
349
|
|--------|---------|-------------|
|
|
@@ -265,7 +352,7 @@ end
|
|
|
265
352
|
| `duration:` | `15.minutes` | Safety expiry for the semaphore (crashed worker recovery) |
|
|
266
353
|
| `on_conflict:` | `:block` | What to do when the limit is reached |
|
|
267
354
|
|
|
268
|
-
|
|
355
|
+
#### Conflict strategies
|
|
269
356
|
|
|
270
357
|
| Strategy | Behavior |
|
|
271
358
|
|----------|----------|
|
|
@@ -273,17 +360,17 @@ end
|
|
|
273
360
|
| `:discard` | Silently drop the job. |
|
|
274
361
|
| `:raise` | Raise `Pgbus::ConcurrencyLimitExceeded` so the caller can handle it. |
|
|
275
362
|
|
|
276
|
-
|
|
363
|
+
#### How concurrency works
|
|
277
364
|
|
|
278
365
|
1. **Enqueue**: The adapter checks a semaphore table for the concurrency key. If under the limit, it increments the counter and sends the job to PGMQ. If at the limit, it applies the `on_conflict` strategy.
|
|
279
366
|
2. **Complete**: After a job succeeds or is dead-lettered, the executor signals the concurrency system via an `ensure` block (guaranteeing the signal fires even if the archive step fails). It first tries to promote a blocked job (atomic delete + enqueue in a single transaction). If nothing to promote, it releases the semaphore slot.
|
|
280
367
|
3. **Safety net**: The dispatcher periodically cleans up expired semaphores and orphaned blocked executions to recover from crashed workers.
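The enqueue and complete steps can be modelled as a counting semaphore with a queue of blocked jobs. The sketch below is an in-memory stand-in for the semaphore table, assuming the `:block` conflict strategy; `SemaphoreSketch` and its method names are hypothetical, and a `Mutex` plays the role that a single atomic SQL statement plays in the database.

```ruby
# Hypothetical in-memory stand-in for the semaphore table (:block strategy).
class SemaphoreSketch
  def initialize
    @values = Hash.new(0)                        # concurrency key => slots in use
    @blocked = Hash.new { |h, k| h[k] = [] }     # concurrency key => parked jobs
    @mutex = Mutex.new
  end

  # Enqueue path: take a slot if one is free, otherwise park the job.
  def enqueue(key, job, limit:)
    @mutex.synchronize do
      if @values[key] < limit
        @values[key] += 1
        :enqueued
      else
        @blocked[key] << job
        :blocked
      end
    end
  end

  # Completion path: hand the slot to a blocked job if there is one,
  # otherwise release the slot. Returns the promoted job or nil.
  def complete(key)
    @mutex.synchronize do
      promoted = @blocked[key].shift
      @values[key] -= 1 if promoted.nil?
      promoted
    end
  end

  def in_use(key)
    @mutex.synchronize { @values[key] }
  end
end

sem = SemaphoreSketch.new
puts sem.enqueue("order-1", :job_a, limit: 1)
puts sem.enqueue("order-1", :job_b, limit: 1)
puts sem.complete("order-1").inspect
```

Promoting a blocked job keeps the slot count unchanged, which mirrors the "promote first, release only if nothing to promote" order described in step 2.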
|
|
281
368
|
|
|
282
|
-
|
|
369
|
+
#### Concurrency compared to other backends
|
|
283
370
|
|
|
284
371
|
Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with fundamentally different locking strategies and trade-offs.
|
|
285
372
|
|
|
286
|
-
|
|
373
|
+
##### Architecture comparison
|
|
287
374
|
|
|
288
375
|
| | **Pgbus** | **SolidQueue** | **GoodJob** | **Sidekiq Enterprise** |
|
|
289
376
|
|---|---|---|---|---|
|
|
@@ -296,7 +383,7 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
|
|
|
296
383
|
| **Crash recovery** | Semaphore `expires_at` + dispatcher `expire_stale` cleanup | Semaphore `expires_at` + concurrency maintenance task | Advisory locks auto-release on session disconnect | TTL-based lease expiry (default 5 min) |
|
|
297
384
|
| **Message lifecycle** | PGMQ visibility timeout (`FOR UPDATE SKIP LOCKED`) — message stays in queue until archived | AR-backed `claimed_executions` table | AR-backed `good_jobs` table with advisory lock per row | Redis list + sorted set |
|
|
298
385
|
|
|
299
|
-
|
|
386
|
+
##### Key design differences
|
|
300
387
|
|
|
301
388
|
**Pgbus** uses PGMQ's native `FOR UPDATE SKIP LOCKED` for message claiming and a separate semaphore table for concurrency control. This two-layer approach means the message queue and concurrency system are independent — PGMQ handles exactly-once delivery, the semaphore handles admission control. The semaphore acquire is a single atomic SQL (`INSERT ... ON CONFLICT DO UPDATE WHERE value < max`), avoiding the need for explicit row locks.
|
|
302
389
|
|
|
@@ -306,7 +393,7 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
|
|
|
306
393
|
|
|
307
394
|
**Sidekiq Enterprise** uses Redis sorted sets with TTL-based leases. Each concurrent slot is a sorted set entry with an expiry timestamp. This is fast and simple but has no durability guarantee — Redis failover can lose leases, temporarily allowing over-limit execution. The `sidekiq-unique-jobs` gem (open-source) uses a similar Lua-script approach but with more lock strategies (`:until_executing`, `:while_executing`, `:until_and_while_executing`) and configurable conflict handlers (`:reject`, `:reschedule`, `:replace`, `:raise`).
|
|
308
395
|
|
|
309
|
-
|
|
396
|
+
##### Race condition resilience
|
|
310
397
|
|
|
311
398
|
| Scenario | Pgbus | SolidQueue | GoodJob | Sidekiq |
|
|
312
399
|
|---|---|---|---|---|
|
|
@@ -315,142 +402,65 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
|
|
|
315
402
|
| **Archive succeeds but signal fails** | `ensure` block guarantees signal fires even if archive raises. For SIGKILL: semaphore expires via dispatcher. | Fixed in PR #689 — `unblock_next_job` moved inside same transaction as `finished`. | Advisory lock released by session disconnect. | Lease auto-expires. |
|
|
316
403
|
| **Concurrent enqueue and signal race** | Semaphore acquire is a single atomic SQL — no read-then-write gap. | Fixed in PR #689 — `FOR UPDATE` lock on semaphore row serializes enqueue with signal. | `pg_advisory_xact_lock` serializes the concurrency check. | Redis Lua script is atomic. |
|
|
317
404
|
|
|
318
|
-
|
|
405
|
+
### Circuit breaker and queue pause/resume
|
|
319
406
|
|
|
320
|
-
|
|
407
|
+
Pgbus automatically pauses queues that fail repeatedly, preventing cascading failures.
|
|
321
408
|
|
|
322
409
|
```ruby
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
on_success: BatchSucceededJob,
|
|
326
|
-
on_discard: BatchFailedJob,
|
|
327
|
-
description: "Import users",
|
|
328
|
-
properties: { initiated_by: current_user.id }
|
|
329
|
-
)
|
|
330
|
-
|
|
331
|
-
batch.enqueue do
|
|
332
|
-
users.each { |user| ImportUserJob.perform_later(user.id) }
|
|
410
|
+
Pgbus.configure do |config|
|
|
411
|
+
config.circuit_breaker_enabled = true # default
|
|
333
412
|
end
|
|
334
413
|
```
|
|
335
414
|
|
|
336
|
-
|
|
415
|
+
The trip threshold (`5` consecutive failures), base backoff (`30s`), and
|
|
416
|
+
max backoff (`600s`) are tuned via constants on `Pgbus::CircuitBreaker`.
|
|
417
|
+
Override the constants in an initializer if you need different values —
|
|
418
|
+
they are not exposed as configuration because tweaking them at runtime
|
|
419
|
+
has never proved useful in practice.
|
|
337
420
|
|
|
338
|
-
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
| `on_discard` | At least one job was dead-lettered |
|
|
421
|
+
When a queue hits the failure threshold:
|
|
422
|
+
1. The circuit breaker **auto-pauses** the queue with exponential backoff
|
|
423
|
+
2. After the backoff expires, the queue **auto-resumes** and the trip counter resets
|
|
424
|
+
3. If failures continue, each trip doubles the backoff (capped at `MAX_BACKOFF`)
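The backoff schedule described above is a capped doubling sequence. The following sketch computes it from the documented values (base `30s`, cap `600s`). The constants are defined locally for the example, `backoff_for` is a hypothetical helper, and treating the first trip as trip number 1 is an assumption.

```ruby
# Values taken from the text above; these constants are local to this sketch.
BASE_BACKOFF = 30   # seconds
MAX_BACKOFF  = 600  # seconds

# trip is 1 for the first trip, 2 for the second, and so on (an assumption).
def backoff_for(trip)
  raise ArgumentError, "trip must be >= 1" if trip < 1

  [BASE_BACKOFF * (2**(trip - 1)), MAX_BACKOFF].min
end

(1..6).each { |trip| puts "trip #{trip}: pause #{backoff_for(trip)}s" }
```

Under these assumptions the cap is reached on the sixth consecutive trip, after which every further trip pauses the queue for the full 600 seconds.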
|
|
343
425
|
|
|
344
|
-
|
|
426
|
+
You can also **manually pause/resume** queues from the dashboard. The pause state is stored in the `pgbus_queue_states` table and survives restarts.
|
|
345
427
|
|
|
346
|
-
```
|
|
347
|
-
|
|
348
|
-
|
|
349
|
-
user = User.find(properties["initiated_by"])
|
|
350
|
-
ImportMailer.complete(user).deliver_later
|
|
351
|
-
end
|
|
352
|
-
end
|
|
428
|
+
```bash
|
|
429
|
+
rails generate pgbus:add_queue_states # Add the queue_states migration
|
|
430
|
+
rails generate pgbus:add_queue_states --database=pgbus # For separate database
|
|
353
431
|
```
|
|
354
432
|
|
|
355
|
-
###
|
|
356
|
-
|
|
357
|
-
1. `Batch.new(...)` creates a tracking row in `pgbus_batches` with `status: "pending"`
|
|
358
|
-
2. `batch.enqueue { ... }` tags each enqueued job with the `pgbus_batch_id` in its payload
|
|
359
|
-
3. After each job completes or is dead-lettered, the executor atomically updates the batch counters
|
|
360
|
-
4. When `completed_jobs + discarded_jobs == total_jobs`, the batch status flips to `"finished"` and callback jobs are enqueued
|
|
361
|
-
5. The dispatcher cleans up finished batches older than 7 days
|
|
362
|
-
|
|
363
|
-
## Job uniqueness
|
|
433
|
+
### Prefetch flow control
|
|
364
434
|
|
|
365
|
-
|
|
435
|
+
Cap the number of in-flight (claimed but unfinished) messages per worker:
|
|
366
436
|
|
|
367
437
|
```ruby
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
key: ->(order_id) { "import-order-#{order_id}" },
|
|
371
|
-
on_conflict: :reject
|
|
372
|
-
|
|
373
|
-
def perform(order_id)
|
|
374
|
-
# Only ONE instance per order_id can exist — from enqueue through completion.
|
|
375
|
-
# If another ImportOrderJob for this order_id is already enqueued or running,
|
|
376
|
-
# the duplicate is rejected immediately.
|
|
377
|
-
end
|
|
438
|
+
Pgbus.configure do |config|
|
|
439
|
+
config.prefetch_limit = 20 # nil = unlimited (default)
|
|
378
440
|
end
|
|
379
441
|
```
|
|
380
442
|
|
|
381
|
-
|
|
382
|
-
|
|
383
|
-
| Strategy | Lock acquired | Lock released | Prevents |
|
|
384
|
-
|----------|--------------|---------------|----------|
|
|
385
|
-
| `:until_executed` | At enqueue | On completion or DLQ | Duplicate enqueue AND execution |
|
|
386
|
-
| `:while_executing` | At execution start | On completion or DLQ | Duplicate execution only |
|
|
387
|
-
|
|
388
|
-
### Conflict policies
|
|
389
|
-
|
|
390
|
-
| Policy | Behavior |
|
|
391
|
-
|--------|----------|
|
|
392
|
-
| `:reject` | Raise `Pgbus::JobNotUnique` (default) |
|
|
393
|
-
| `:discard` | Silently drop the duplicate |
|
|
394
|
-
| `:log` | Log a warning and drop |
|
|
443
|
+
The worker tracks in-flight messages with an atomic counter and only fetches `min(idle_threads, prefetch_available)` messages per cycle. The counter is decremented in an `ensure` block so it never gets stuck.
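A minimal sketch of that fetch calculation and of the `ensure`-based decrement, using only the standard library. `InFlightCounter`, `fetch_quantity`, and `process` are hypothetical names for this illustration, and a `Mutex` stands in for whatever atomic primitive the worker actually uses.

```ruby
# Hypothetical thread-safe counter; a Mutex stands in for an atomic integer.
class InFlightCounter
  def initialize
    @value = 0
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize { @value += 1 }
  end

  def decrement
    @mutex.synchronize { @value -= 1 }
  end

  def value
    @mutex.synchronize { @value }
  end
end

# How many messages to fetch this cycle: min(idle_threads, prefetch_available).
# A nil prefetch_limit means unlimited, so only idle threads bound the fetch.
def fetch_quantity(idle_threads:, in_flight:, prefetch_limit:)
  return idle_threads if prefetch_limit.nil?

  available = [prefetch_limit - in_flight, 0].max
  [idle_threads, available].min
end

# The decrement sits in ensure, so a raising job cannot leave the counter stuck.
def process(counter)
  counter.increment
  yield
ensure
  counter.decrement
end

puts fetch_quantity(idle_threads: 10, in_flight: 15, prefetch_limit: 20)
```

With 15 messages already in flight and a limit of 20, a worker with 10 idle threads would fetch only 5 more.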
|
|
395
444
|
|
|
396
|
-
###
|
|
445
|
+
### Worker recycling
|
|
397
446
|
|
|
398
|
-
|
|
447
|
+
Pgbus workers recycle themselves to prevent memory bloat. This is the main reliability difference from SolidQueue, which leaves workers alive forever.
|
|
399
448
|
|
|
400
|
-
```
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
claim_for_execution! (state: executing, owner_pid: PID)
|
|
407
|
-
│
|
|
408
|
-
┌───────┴───────┐
|
|
409
|
-
▼ ▼
|
|
410
|
-
Success Crash
|
|
411
|
-
release! (lock orphaned)
|
|
412
|
-
(row deleted) │
|
|
413
|
-
▼
|
|
414
|
-
Reaper checks:
|
|
415
|
-
Is owner_pid in pgbus_processes
|
|
416
|
-
with fresh heartbeat?
|
|
417
|
-
│
|
|
418
|
-
┌─────┴─────┐
|
|
419
|
-
No Yes
|
|
420
|
-
▼ ▼
|
|
421
|
-
release! (keep lock,
|
|
422
|
-
(orphaned) job is running)
|
|
449
|
+
```ruby
|
|
450
|
+
Pgbus.configure do |config|
|
|
451
|
+
config.max_jobs_per_worker = 10_000 # Restart after 10k jobs
|
|
452
|
+
config.max_memory_mb = 512 # Restart if memory exceeds 512MB
|
|
453
|
+
config.max_worker_lifetime = 1.hour # Restart after 1 hour
|
|
454
|
+
end
|
|
423
455
|
```
|
|
424
456
|
|
|
425
|
-
|
|
426
|
-
|
|
427
|
-
A last-resort TTL (default 24 hours) handles the case where the entire pgbus supervisor is dead and the reaper itself can't run.
|
|
428
|
-
|
|
429
|
-
### Uniqueness vs concurrency controls
|
|
430
|
-
|
|
431
|
-
| | `ensures_uniqueness` | `limits_concurrency` |
|
|
432
|
-
|---|---|---|
|
|
433
|
-
| **Purpose** | Prevent duplicate jobs | Limit concurrent execution slots |
|
|
434
|
-
| **Lock type** | Binary lock (one or none) | Counting semaphore (up to N) |
|
|
435
|
-
| **At enqueue** | `:until_executed` blocks duplicates | Checks semaphore, blocks/discards/raises |
|
|
436
|
-
| **At execution** | `:while_executing` blocks duplicate runs | Not checked (semaphore acquired at enqueue) |
|
|
437
|
-
| **Duplicate in queue** | `:until_executed`: impossible. `:while_executing`: allowed, only one runs | Allowed up to N, rest blocked |
|
|
438
|
-
| **Crash recovery** | Reaper checks heartbeats | Semaphore `expires_at` + dispatcher cleanup |
|
|
439
|
-
| **Use when** | "This exact job must not run twice" | "At most N of these can run at once" |
|
|
440
|
-
|
|
441
|
-
**When to use which:**
|
|
442
|
-
- Payment processing, order import, unique email sends → `ensures_uniqueness`
|
|
443
|
-
- Rate-limited API calls, resource-constrained tasks → `limits_concurrency`
|
|
444
|
-
- Both at once → combine them (they use separate tables, no conflicts)
|
|
457
|
+
When a limit is hit, the worker drains its thread pool, exits, and the supervisor forks a fresh process. RSS memory is sampled from `/proc/self/statm` (Linux) or `ps -o rss` (macOS).
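The limit check itself is a simple comparison against the three settings. The sketch below is an illustration under assumptions: `Limits` and `recycle_reason` are hypothetical names, the order in which limits are checked is arbitrary, and whether the real comparisons are strict or inclusive is not stated in the text. A `nil` limit is treated as disabled.

```ruby
# Hypothetical container for the three recycling settings; nil disables a limit.
Limits = Struct.new(:max_jobs, :max_memory_mb, :max_lifetime, keyword_init: true)

# Returns the first limit that has been hit, or nil if the worker may keep running.
# uptime and max_lifetime are in seconds.
def recycle_reason(limits, jobs_processed:, memory_mb:, uptime:)
  return :max_jobs_per_worker if limits.max_jobs && jobs_processed >= limits.max_jobs
  return :max_memory_mb if limits.max_memory_mb && memory_mb > limits.max_memory_mb
  return :max_worker_lifetime if limits.max_lifetime && uptime >= limits.max_lifetime

  nil
end

limits = Limits.new(max_jobs: 10_000, max_memory_mb: 512, max_lifetime: 3600)
puts recycle_reason(limits, jobs_processed: 120, memory_mb: 640, uptime: 900).inspect
```

A worker loop would call a check like this between jobs and begin draining its thread pool as soon as it returns a non-nil reason.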
|
|
445
458
|
|
|
446
|
-
|
|
459
|
+
## Routing and ordering
|
|
447
460
|
|
|
448
|
-
|
|
449
|
-
rails generate pgbus:add_job_locks # Add the migration
|
|
450
|
-
rails generate pgbus:add_job_locks --database=pgbus # For separate database
|
|
451
|
-
```
|
|
461
|
+
How messages flow between producers and the workers that handle them: priority sub-queues, consumer priority for active/standby workers, and single-active-consumer for strict ordering.
|
|
452
462
|
|
|
453
|
-
|
|
463
|
+
### Priority queues
|
|
454
464
|
|
|
455
465
|
Route jobs to priority sub-queues so high-priority work is processed first:
|
|
456
466
|
|
|
@@ -487,80 +497,82 @@ end
|
|
|
487
497
|
|
|
488
498
|
When `priority_levels` is `nil` (default), priority queues are disabled and all jobs go to a single queue per logical name.
|
|
489
499
|
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
For queues that require strict ordering, enable single active consumer mode. Only one worker process can read from a queue at a time -- others skip it and process other queues.
|
|
493
|
-
|
|
494
|
-
```yaml
|
|
495
|
-
# config/pgbus.yml
|
|
496
|
-
production:
|
|
497
|
-
workers:
|
|
498
|
-
- queues: [ordered_events]
|
|
499
|
-
threads: 1
|
|
500
|
-
single_active_consumer: true
|
|
501
|
-
- queues: [ordered_events]
|
|
502
|
-
threads: 1
|
|
503
|
-
single_active_consumer: true # Standby — takes over if the first worker dies
|
|
504
|
-
```
|
|
505
|
-
|
|
506
|
-
Uses PostgreSQL session-level advisory locks (`pg_try_advisory_lock`). The lock is non-blocking -- workers that can't acquire it simply skip the queue. Locks auto-release on connection close (including crashes), so failover is automatic.
|
|
507
|
-
|
|
508
|
-
## Consumer priority
|
|
500
|
+
### Consumer priority
|
|
509
501
|
|
|
510
502
|
When multiple workers subscribe to the same queues, higher-priority workers process messages first. Lower-priority workers back off (3x polling interval) when a higher-priority worker is active.
|
|
511
503
|
|
|
512
|
-
```
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
threads: 10
|
|
518
|
-
consumer_priority: 10 # Primary — polls at base interval
|
|
519
|
-
- queues: [default]
|
|
520
|
-
threads: 5
|
|
521
|
-
consumer_priority: 0 # Fallback — polls at 3x interval when primary is healthy
|
|
504
|
+
```ruby
|
|
505
|
+
Pgbus.configure do |c|
|
|
506
|
+
c.capsule :primary, queues: %w[default], threads: 10, consumer_priority: 10
|
|
507
|
+
c.capsule :fallback, queues: %w[default], threads: 5, consumer_priority: 0
|
|
508
|
+
end
|
|
522
509
|
```
|
|
523
510
|
|
|
524
511
|
Priority is stored in heartbeat metadata. Workers check the `pgbus_processes` table to discover higher-priority peers. When a high-priority worker goes stale (no heartbeat for 5 minutes), lower-priority workers automatically resume normal polling.
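The back-off rule can be sketched as a pure function over the peers a worker discovers. `Peer` and `effective_polling_interval` are hypothetical names for this illustration; the 3x multiplier and the 5-minute staleness window come from the text above.

```ruby
# Hypothetical view of a row in pgbus_processes: priority plus last heartbeat time.
Peer = Struct.new(:priority, :last_heartbeat)

# Back off to 3x the base interval while any higher-priority peer has a fresh
# heartbeat; otherwise poll at the base interval.
def effective_polling_interval(base, own_priority, peers, now:, stale_after: 300)
  higher_active = peers.any? do |peer|
    peer.priority > own_priority && (now - peer.last_heartbeat) <= stale_after
  end
  higher_active ? base * 3 : base
end

now = Time.at(1_000_000)
healthy_primary = [Peer.new(10, now - 5)]
stale_primary   = [Peer.new(10, now - 600)]

puts effective_polling_interval(1.0, 0, healthy_primary, now: now)
puts effective_polling_interval(1.0, 0, stale_primary, now: now)
```

The fallback worker slows down only while the primary is demonstrably alive, so a dead primary costs at most one staleness window before normal polling resumes.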
|
|
525
512
|
|
|
526
|
-
|
|
513
|
+
### Single active consumer
|
|
527
514
|
|
|
528
|
-
|
|
515
|
+
For queues that require strict ordering, enable single active consumer mode. Only one worker process can read from a queue at a time — others skip it and process other queues.
|
|
529
516
|
|
|
530
517
|
```ruby
|
|
531
|
-
Pgbus.configure do |
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
config.circuit_breaker_base_backoff = 30 # seconds (doubles per trip)
|
|
535
|
-
config.circuit_breaker_max_backoff = 600 # 10 minute cap
|
|
518
|
+
Pgbus.configure do |c|
|
|
519
|
+
c.capsule :ordered_primary, queues: %w[ordered_events], threads: 1, single_active_consumer: true
|
|
520
|
+
c.capsule :ordered_standby, queues: %w[ordered_events], threads: 1, single_active_consumer: true
|
|
536
521
|
end
|
|
537
522
|
```
|
|
538
523
|
|
|
539
|
-
|
|
540
|
-
1. The circuit breaker **auto-pauses** the queue with exponential backoff
|
|
541
|
-
2. After the backoff expires, the queue **auto-resumes** and the trip counter resets
|
|
542
|
-
3. If failures continue, each trip doubles the backoff (capped at `max_backoff`)
|
|
524
|
+
Uses PostgreSQL session-level advisory locks (`pg_try_advisory_lock`). The lock is non-blocking — workers that can't acquire it simply skip the queue. Locks auto-release on connection close (including crashes), so failover is automatic. The standby capsule takes over within one polling tick if the primary dies.
|
|
543
525
|
|
|
544
|
-
|
|
526
|
+
## Persistence and batching
|
|
545
527
|
|
|
546
|
-
|
|
547
|
-
|
|
548
|
-
|
|
528
|
+
How Pgbus integrates with your application's transactions and tracks groups of related work: outbox for atomic publish, batches for fan-out coordination, archive compaction for keeping the queue tables small.
|
|
529
|
+
|
|
530
|
+
### Batches
|
|
531
|
+
|
|
532
|
+
Coordinate groups of jobs with callbacks when all complete:
|
|
533
|
+
|
|
534
|
+
```ruby
|
|
535
|
+
batch = Pgbus::Batch.new(
|
|
536
|
+
on_finish: BatchFinishedJob,
|
|
537
|
+
on_success: BatchSucceededJob,
|
|
538
|
+
on_discard: BatchFailedJob,
|
|
539
|
+
description: "Import users",
|
|
540
|
+
properties: { initiated_by: current_user.id }
|
|
541
|
+
)
|
|
542
|
+
|
|
543
|
+
batch.enqueue do
|
|
544
|
+
users.each { |user| ImportUserJob.perform_later(user.id) }
|
|
545
|
+
end
|
|
549
546
|
```
|
|
550
547
|
|
|
551
|
-
|
|
548
|
+
#### Callbacks
|
|
552
549
|
|
|
553
|
-
|
|
550
|
+
| Callback | Fired when |
|
|
551
|
+
|----------|------------|
|
|
552
|
+
| `on_finish` | All jobs completed (success or discard) |
|
|
553
|
+
| `on_success` | All jobs completed successfully (zero discarded) |
|
|
554
|
+
| `on_discard` | At least one job was dead-lettered |
|
|
555
|
+
|
|
556
|
+
Callback jobs receive the batch `properties` hash as their argument:
|
|
554
557
|
|
|
555
558
|
```ruby
|
|
556
|
-
|
|
557
|
-
|
|
559
|
+
class BatchFinishedJob < ApplicationJob
|
|
560
|
+
def perform(properties)
|
|
561
|
+
user = User.find(properties["initiated_by"])
|
|
562
|
+
ImportMailer.complete(user).deliver_later
|
|
563
|
+
end
|
|
558
564
|
end
|
|
559
565
|
```
|
|
560
566
|
|
|
561
|
-
|
|
567
|
+
#### How batches work
|
|
568
|
+
|
|
569
|
+
1. `Batch.new(...)` creates a tracking row in `pgbus_batches` with `status: "pending"`
|
|
570
|
+
2. `batch.enqueue { ... }` tags each enqueued job with the `pgbus_batch_id` in its payload
|
|
571
|
+
3. After each job completes or is dead-lettered, the executor atomically updates the batch counters
|
|
572
|
+
4. When `completed_jobs + discarded_jobs == total_jobs`, the batch status flips to `"finished"` and callback jobs are enqueued
|
|
573
|
+
5. The dispatcher cleans up finished batches older than 7 days
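Steps 3 and 4, together with the callback table above, can be sketched as a small state update. This is an in-memory illustration, not the executor's code: `BatchState` and `record_result` are hypothetical names, and the real counters are updated atomically in `pgbus_batches` rather than on a Ruby struct.

```ruby
# Hypothetical in-memory mirror of a pgbus_batches row.
BatchState = Struct.new(:total_jobs, :completed_jobs, :discarded_jobs, :status,
                        keyword_init: true)

# Records one job outcome and returns the callbacks that should be enqueued.
# Nothing fires until completed_jobs + discarded_jobs == total_jobs.
def record_result(batch, outcome)
  case outcome
  when :completed then batch.completed_jobs += 1
  when :discarded then batch.discarded_jobs += 1
  else raise ArgumentError, "unknown outcome: #{outcome.inspect}"
  end

  return [] unless batch.completed_jobs + batch.discarded_jobs == batch.total_jobs

  batch.status = "finished"
  callbacks = [:on_finish]
  callbacks << (batch.discarded_jobs.zero? ? :on_success : :on_discard)
  callbacks
end

batch = BatchState.new(total_jobs: 2, completed_jobs: 0, discarded_jobs: 0,
                       status: "pending")
puts record_result(batch, :completed).inspect
puts record_result(batch, :discarded).inspect
```

`on_finish` always fires when the batch closes, while `on_success` and `on_discard` are mutually exclusive in this reading of the callback table.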
|
|
562
574
|
|
|
563
|
-
|
|
575
|
+
### Transactional outbox
|
|
564
576
|
|
|
565
577
|
Publish events atomically inside your database transactions. A background poller moves outbox entries to PGMQ.
|
|
566
578
|
|
|
@@ -574,7 +586,7 @@ Pgbus.configure do |config|
|
|
|
574
586
|
config.outbox_enabled = true
|
|
575
587
|
config.outbox_poll_interval = 1.0 # seconds
|
|
576
588
|
config.outbox_batch_size = 100
|
|
577
|
-
config.outbox_retention =
|
|
589
|
+
config.outbox_retention = 1.day # ActiveSupport::Duration also accepted
|
|
578
590
|
end
|
|
579
591
|
```
|
|
580
592
|
|
|
@@ -595,144 +607,92 @@ end
|
|
|
595
607
|
|
|
596
608
|
The outbox poller uses `FOR UPDATE SKIP LOCKED` inside a transaction to claim entries, publishes them to PGMQ, and marks them as published. Failed entries are skipped and retried next cycle.
|
|
597
609
|
|
|
598
|
-
|
|
610
|
+
### Archive compaction
|
|
599
611
|
|
|
600
612
|
PGMQ archive tables grow unbounded. Pgbus automatically purges old entries:
|
|
601
613
|
|
|
602
614
|
```ruby
|
|
603
615
|
Pgbus.configure do |config|
|
|
604
|
-
config.archive_retention = 7
|
|
605
|
-
config.archive_compaction_interval = 3600 # run every hour (default)
|
|
606
|
-
config.archive_compaction_batch_size = 1000 # delete in batches (default)
|
|
616
|
+
config.archive_retention = 7.days # ActiveSupport::Duration (default 7 days)
|
|
607
617
|
end
|
|
608
618
|
```
|
|
609
619
|
|
|
610
|
-
The
|
|
611
|
-
|
|
612
|
-
|
|
620
|
+
The compaction loop runs every hour and deletes up to 1000 rows per
|
|
621
|
+
queue per cycle. Both knobs live as constants on
|
|
622
|
+
`Pgbus::Process::Dispatcher` (`ARCHIVE_COMPACTION_INTERVAL`,
|
|
623
|
+
`ARCHIVE_COMPACTION_BATCH_SIZE`) — they have never been worth surfacing
|
|
624
|
+
as configuration. The dispatcher runs archive compaction as part of its
|
|
625
|
+
maintenance loop, deleting archived messages older than `archive_retention`
|
|
626
|
+
in batches to avoid long-running transactions.
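The batched deletion can be illustrated with an in-memory array standing in for one archive table. `compact_archive!` is a hypothetical helper, and the `archived_at` field name is an assumption; the real dispatcher issues SQL deletes against the PGMQ archive tables.

```ruby
# Deletes at most batch_size rows older than the retention window and returns
# how many were removed. rows is an array of hashes with :id and :archived_at.
def compact_archive!(rows, now:, retention:, batch_size:)
  cutoff = now - retention
  expired = rows.select { |row| row[:archived_at] < cutoff }.first(batch_size)
  expired.each { |row| rows.delete(row) }
  expired.size
end

day = 86_400
now = Time.at(10_000_000)
archive = [
  { id: 1, archived_at: now - 10 * day },
  { id: 2, archived_at: now - 9 * day },
  { id: 3, archived_at: now - 8 * day },
  { id: 4, archived_at: now - 1 * day }
]

# With a batch size of 2, the three expired rows take two cycles to clear.
puts compact_archive!(archive, now: now, retention: 7 * day, batch_size: 2)
puts compact_archive!(archive, now: now, retention: 7 * day, batch_size: 2)
```

Capping each cycle at a fixed batch size bounds the work done per transaction, at the cost of needing several cycles to clear a large backlog.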
|
|
613
627
|
|
|
614
|
-
|
|
615
|
-
|--------|---------|-------------|
|
|
616
|
-
| `database_url` | `nil` | PostgreSQL connection URL (auto-detected in Rails) |
|
|
617
|
-
| `queue_prefix` | `"pgbus"` | Prefix for all PGMQ queue names |
|
|
618
|
-
| `default_queue` | `"default"` | Default queue for jobs without explicit queue |
|
|
619
|
-
| `pool_size` | `5` | Connection pool size |
|
|
620
|
-
| `workers` | `[{queues: ["default"], threads: 5}]` | Worker process definitions |
|
|
621
|
-
| `event_consumers` | `nil` | Event consumer process definitions (same format as workers) |
|
|
622
|
-
| `polling_interval` | `0.1` | Seconds between polls (LISTEN/NOTIFY is primary) |
|
|
623
|
-
| `visibility_timeout` | `30` | Seconds before unacked message becomes visible again |
|
|
624
|
-
| `max_retries` | `5` | Failed reads before routing to dead letter queue |
|
|
625
|
-
| `max_jobs_per_worker` | `nil` | Recycle worker after N jobs (nil = unlimited) |
|
|
626
|
-
| `max_memory_mb` | `nil` | Recycle worker when memory exceeds N MB |
|
|
627
|
-
| `max_worker_lifetime` | `nil` | Recycle worker after N seconds |
|
|
628
|
-
| `listen_notify` | `true` | Use PGMQ's LISTEN/NOTIFY for instant wake-up |
|
|
629
|
-
| `prefetch_limit` | `nil` | Max in-flight messages per worker (nil = unlimited) |
|
|
630
|
-
| `dispatch_interval` | `1.0` | Seconds between dispatcher maintenance ticks |
|
|
631
|
-
| `circuit_breaker_enabled` | `true` | Enable auto-pause on consecutive failures |
|
|
632
|
-
| `circuit_breaker_threshold` | `5` | Consecutive failures before tripping |
|
|
633
|
-
| `circuit_breaker_base_backoff` | `30` | Base backoff seconds (doubles per trip) |
|
|
634
|
-
| `circuit_breaker_max_backoff` | `600` | Max backoff cap in seconds |
|
|
635
|
-
| `priority_levels` | `nil` | Number of priority sub-queues (nil = disabled, 2-10) |
|
|
636
|
-
| `default_priority` | `1` | Default priority for jobs without explicit priority |
|
|
637
|
-
| `archive_retention` | `604800` | Seconds to keep archived messages (7 days) |
|
|
638
|
-
| `archive_compaction_interval` | `3600` | Seconds between archive cleanup runs |
|
|
639
|
-
| `archive_compaction_batch_size` | `1000` | Rows deleted per batch during compaction |
|
|
640
|
-
| `outbox_enabled` | `false` | Enable transactional outbox poller process |
|
|
641
|
-
| `outbox_poll_interval` | `1.0` | Seconds between outbox poll cycles |
|
|
642
|
-
| `outbox_batch_size` | `100` | Max entries per outbox poll cycle |
|
|
643
|
-
| `outbox_retention` | `86400` | Seconds to keep published outbox entries (1 day) |
|
|
644
|
-
| `idempotency_ttl` | `604800` | Seconds to keep processed event records (7 days, cleaned hourly) |
|
|
645
|
-
| `base_controller_class` | `"::ActionController::Base"` | Base class for dashboard controllers (string, constantized at load time) |
|
|
646
|
-
| `return_to_app_url` | `nil` | URL for "back to app" button in dashboard nav (nil hides the button) |
|
|
647
|
-
| `web_auth` | `nil` | Lambda for dashboard authentication |
|
|
648
|
-
| `web_refresh_interval` | `5000` | Dashboard auto-refresh interval in milliseconds |
|
|
649
|
-
| `web_live_updates` | `true` | Enable Turbo Frames auto-refresh on dashboard |
|
|
650
|
-
| `stats_enabled` | `true` | Record job execution stats for insights dashboard |
|
|
651
|
-
| `stats_retention` | `604800` | Seconds to keep job stats (7 days) |
|
|
628
|
+
## Operations
|
|
652
629
|
|
|
653
|
-
|
|
630
|
+
Day-to-day running of Pgbus: starting and stopping processes, observing what is happening on the dashboard, the database tables Pgbus relies on, and how to migrate from an existing job backend.
|
|
654
631
|
|
|
655
|
-
|
|
656
|
-
Supervisor (fork manager)
|
|
657
|
-
├── Worker 1 (queues: [default, mailers], threads: 10, priority: 10)
|
|
658
|
-
├── Worker 2 (queues: [critical], threads: 5, single_active_consumer: true)
|
|
659
|
-
├── Dispatcher (maintenance: cleanup, compaction, reaping, circuit breaker)
|
|
660
|
-
├── Scheduler (recurring tasks via cron)
|
|
661
|
-
├── Consumer (event bus topics)
|
|
662
|
-
└── Outbox Poller (transactional outbox → PGMQ, when enabled)
|
|
632
|
+
### CLI
|
|
663
633
|
|
|
664
|
-
|
|
665
|
-
|
|
666
|
-
|
|
667
|
-
|
|
668
|
-
|
|
669
|
-
|
|
670
|
-
└── pgbus_queue_states (pause/resume + circuit breaker state)
|
|
634
|
+
```bash
|
|
635
|
+
pgbus start # Start supervisor with workers + dispatcher + scheduler
|
|
636
|
+
pgbus status # Show running processes
|
|
637
|
+
pgbus queues # List queues with depth/metrics
|
|
638
|
+
pgbus version # Print version
|
|
639
|
+
pgbus help # Show help
|
|
671
640
|
```
|
|
672
641
|
|
|
673
|
-
|
|
674
|
-
|
|
675
|
-
1. **Enqueue**: ActiveJob serializes the job to JSON, Pgbus sends it to the appropriate PGMQ queue
|
|
676
|
-
2. **Read**: Workers poll queues (or wake instantly via LISTEN/NOTIFY) and claim messages with a visibility timeout
|
|
677
|
-
3. **Execute**: The job is deserialized and executed within the Rails executor
|
|
678
|
-
4. **Archive/Retry**: On success, the message is archived. On failure, the visibility timeout expires and the message becomes available again. PGMQ's `read_ct` tracks delivery attempts
|
|
679
|
-
5. **Dead letter**: When `read_ct` exceeds `max_retries`, the message is moved to the `_dlq` queue for manual inspection
|
|
680
|
-
|
|
681
|
-
### Worker recycling
|
|
642
|
+
#### Role flags (split deployments)
|
|
682
643
|
|
|
683
|
-
|
|
644
|
+
By default, `pgbus start` boots every role in one supervisor (workers, dispatcher, scheduler, event consumers, outbox poller). For containerized deployments where each role lives in a separate process, use the role flags:
|
|
684
645
|
|
|
685
|
-
```
|
|
686
|
-
|
|
687
|
-
|
|
688
|
-
|
|
689
|
-
config.max_worker_lifetime = 3600 # Restart after 1 hour
|
|
690
|
-
end
|
|
646
|
+
```bash
|
|
647
|
+
pgbus start --workers-only # Only worker processes
|
|
648
|
+
pgbus start --scheduler-only # Only the recurring-task scheduler
|
|
649
|
+
pgbus start --dispatcher-only # Only the maintenance dispatcher
|
|
691
650
|
```
|
|
692
651
|
|
|
693
|
-
|
|
652
|
+
These flags are mutually exclusive. The auto-tuned `pool_size` adjusts to the role: a `--scheduler-only` deployment with 50 worker threads configured only opens the connections it actually needs (1 for the scheduler), not 51.
|
|
694
653
|
|
|
695
|
-
|
|
654
|
+
#### Capsule selection
|
|
655
|
+
|
|
656
|
+
`--capsule NAME` boots a single named capsule. Combine with `--workers-only` to run one capsule per container:
|
|
696
657
|
|
|
697
658
|
```bash
|
|
698
|
-
pgbus start
|
|
699
|
-
pgbus
|
|
700
|
-
pgbus queues # List queues with depth/metrics
|
|
701
|
-
pgbus version # Print version
|
|
702
|
-
pgbus help # Show help
|
|
659
|
+
pgbus start --workers-only --capsule critical
|
|
660
|
+
pgbus start --workers-only --capsule default
|
|
703
661
|
```
|
|
704
662
|
|
|
705
|
-
|
|
663
|
+
The capsule name is the `:name` you passed to `c.capsule` in your initializer (or the first queue token when using the string DSL).
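
To make the "first queue token" rule concrete, here is a small sketch of how a string such as `"default: 5; critical: 10"` could map to capsule names. `parse_capsules` is a hypothetical helper written for this illustration. It is not Pgbus's parser, and it only covers the simple `queue: threads` form shown in this README.

```ruby
# Hypothetical parser for the simple "queue: threads; queue: threads" form.
# Each capsule takes its name from its (first) queue token.
def parse_capsules(spec)
  spec.split(";").map(&:strip).reject(&:empty?).map do |part|
    queue, threads = part.split(":", 2).map(&:strip)
    { name: queue, queues: [queue], threads: Integer(threads) }
  end
end
```

Under that assumption, the two capsules in `"default: 5; critical: 10"` would be selectable as `--capsule default` and `--capsule critical`.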
|
|
664
|
+
|
|
665
|
+
### Dashboard
|
|
706
666
|
|
|
707
667
|
The dashboard is a mountable Rails engine at `/pgbus` with:
|
|
708
668
|
|
|
709
|
-
- **Overview**
|
|
710
|
-
- **Queues**
|
|
711
|
-
- **Jobs**
|
|
712
|
-
- **Dead letter**
|
|
713
|
-
- **Processes**
|
|
714
|
-
- **Events**
|
|
715
|
-
- **Outbox**
|
|
716
|
-
- **Locks**
|
|
717
|
-
- **Insights**
|
|
669
|
+
- **Overview** — queue depths, enqueued count, active processes, failure count, throughput rate
|
|
670
|
+
- **Queues** — per-queue metrics, purge/pause/resume/delete actions
|
|
671
|
+
- **Jobs** — enqueued and failed jobs, retry/discard actions
|
|
672
|
+
- **Dead letter** — DLQ messages with retry/discard, bulk actions
|
|
673
|
+
- **Processes** — active workers/dispatcher/consumers with heartbeat status
|
|
674
|
+
- **Events** — registered subscribers and processed events
|
|
675
|
+
- **Outbox** — transactional outbox entries pending publication
|
|
676
|
+
- **Locks** — active job uniqueness locks with state (queued/executing), owner PID@hostname, age
|
|
677
|
+
- **Insights** — throughput chart (jobs/min), status distribution donut, slowest job classes table
|
|
718
678
|
|
|
719
679
|
All tables use Turbo Frames for periodic auto-refresh without page reloads. Destructive actions use styled confirmation dialogs (not browser `confirm()`), and flash messages appear as auto-dismissing toast notifications.
|
|
720
680
|
|
|
721
|
-
|
|
681
|
+
#### Queue management
|
|
722
682
|
|
|
723
683
|
The queues page lets you manage PGMQ queues directly:
|
|
724
684
|
|
|
725
|
-
- **Purge**
|
|
726
|
-
- **Delete**
|
|
727
|
-
- **Pause / Resume**
|
|
685
|
+
- **Purge** — removes all messages from the queue (the queue itself remains)
|
|
686
|
+
- **Delete** — permanently drops the queue from PGMQ (removes the queue table and metadata)
|
|
687
|
+
- **Pause / Resume** — pauses or resumes job processing for a queue
|
|
728
688
|
|
|
729
689
|
All destructive actions require confirmation. Pause/resume and delete are available on both the queue index and detail pages.
|
|
730
690
|
|
|
731
|
-
|
|
691
|
+
#### Dark mode
|
|
732
692
|
|
|
733
693
|
The dashboard supports dark mode via Tailwind CSS `dark:` classes. It respects your system preference on first visit and persists your choice via localStorage. Toggle with the sun/moon button in the nav bar.
|
|
734
694
|
|
|
735
|
-
|
|
695
|
+
#### Job stats and insights
|
|
736
696
|
|
|
737
697
|
The executor records every job completion to `pgbus_job_stats` (job class, queue, status, duration). The insights page visualizes this data with ApexCharts (loaded via CDN, zero npm dependencies).
|
|
738
698
|
|
|
@@ -741,9 +701,9 @@ rails generate pgbus:add_job_stats # Add the stats migration
|
|
|
741
701
|
rails generate pgbus:add_job_stats --database=pgbus
|
|
742
702
|
```
|
|
743
703
|
|
|
744
|
-
Stats collection is enabled by default (`config.stats_enabled = true`). Old stats are cleaned up by the dispatcher based on `config.stats_retention` (default:
|
|
704
|
+
Stats collection is enabled by default (`config.stats_enabled = true`). Old stats are cleaned up by the dispatcher based on `config.stats_retention` (default: 30 days). If the migration hasn't been run yet, stat recording is silently skipped.
|
|
745
705
|
|
|
746
|
-
|
|
706
|
+
### Database tables
|
|
747
707
|
|
|
748
708
|
Pgbus uses these tables (created via PGMQ and migrations):
|
|
749
709
|
|
|
@@ -764,16 +724,85 @@ Pgbus uses these tables (created via PGMQ and migrations):
|
|
|
764
724
|
| `pgbus_recurring_tasks` | Recurring job definitions |
|
|
765
725
|
| `pgbus_recurring_executions` | Recurring job execution history |
|
|
766
726
|
|
|
767
|
-
|
|
727
|
+
### Switching from another backend
|
|
768
728
|
|
|
769
729
|
Already using a different job processor? These guides walk you through the migration:
|
|
770
730
|
|
|
771
|
-
- **[Switch from Sidekiq](docs/switch_from_sidekiq.md)**
|
|
772
|
-
- **[Switch from SolidQueue](docs/switch_from_solid_queue.md)**
|
|
773
|
-
- **[Switch from GoodJob](docs/switch_from_good_job.md)**
|
|
731
|
+
- **[Switch from Sidekiq](docs/switch_from_sidekiq.md)** — remove Redis, convert native workers, replace middleware with callbacks
|
|
732
|
+
- **[Switch from SolidQueue](docs/switch_from_solid_queue.md)** — similar architecture, swap config format, gain LISTEN/NOTIFY + worker recycling
|
|
733
|
+
- **[Switch from GoodJob](docs/switch_from_good_job.md)** — both PostgreSQL-native, swap advisory locks for PGMQ visibility timeouts
|
|
774
734
|
|
|
775
735
|
See [docs/README.md](docs/README.md) for a full feature comparison table.
|
|
776
736
|
|
|
737
|
+
## Reference
|
|
738
|
+
|
|
739
|
+
Architectural overview and the full list of configuration settings.
|
|
740
|
+
|
|
741
|
+
### Architecture
|
|
742
|
+
|
|
743
|
+
```text
|
|
744
|
+
Supervisor (fork manager)
|
|
745
|
+
├── Worker 1 (queues: [default, mailers], threads: 10, priority: 10)
|
|
746
|
+
├── Worker 2 (queues: [critical], threads: 5, single_active_consumer: true)
|
|
747
|
+
├── Dispatcher (maintenance: cleanup, compaction, reaping, circuit breaker)
|
|
748
|
+
├── Scheduler (recurring tasks via cron)
|
|
749
|
+
├── Consumer (event bus topics)
|
|
750
|
+
└── Outbox Poller (transactional outbox → PGMQ, when enabled)
|
|
751
|
+
|
|
752
|
+
PostgreSQL + PGMQ
|
|
753
|
+
├── pgbus_default (job queue)
|
|
754
|
+
├── pgbus_default_dlq (dead letter queue)
|
|
755
|
+
├── pgbus_critical (job queue)
|
|
756
|
+
├── pgbus_critical_dlq (dead letter queue)
|
|
757
|
+
├── pgbus_mailers (job queue)
|
|
758
|
+
└── pgbus_queue_states (pause/resume + circuit breaker state)
|
|
759
|
+
```
|
|
760
|
+
|
|
761
|
+
#### How a job flows through the system
|
|
762
|
+
|
|
763
|
+
1. **Enqueue**: ActiveJob serializes the job to JSON, Pgbus sends it to the appropriate PGMQ queue
|
|
764
|
+
2. **Read**: Workers poll queues (or wake instantly via LISTEN/NOTIFY) and claim messages with a visibility timeout
|
|
765
|
+
3. **Execute**: The job is deserialized and executed within the Rails executor
|
|
766
|
+
4. **Archive/Retry**: On success, the message is archived. On failure, the visibility timeout expires and the message becomes available again. PGMQ's `read_ct` tracks delivery attempts.
|
|
767
|
+
5. **Dead letter**: When `read_ct` exceeds `max_retries`, the message is moved to the `_dlq` queue for manual inspection
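
The retry and dead-letter steps above can be sketched as a small decision function. This is a simplified model, not Pgbus internals: `route` and `reads_until_dead_letter` are hypothetical names, and the only rule taken from the text is that a message goes to the dead letter queue once `read_ct` exceeds `max_retries`.

```ruby
# Hypothetical routing decision made when a message is read.
# read_ct is the delivery-attempt counter that PGMQ maintains.
def route(read_ct, max_retries: 5)
  read_ct > max_retries ? :dead_letter : :execute
end

# Simulates a job that fails on every attempt and returns the read count
# at which the message is finally routed to the dead letter queue.
def reads_until_dead_letter(max_retries: 5)
  read_ct = 0
  loop do
    read_ct += 1 # each read increments read_ct
    return read_ct if route(read_ct, max_retries: max_retries) == :dead_letter
    # The job fails here; the visibility timeout later expires and the
    # message becomes readable again.
  end
end
```

With the default `max_retries` of 5, this model executes the job on reads 1 through 5 and routes it to the dead letter queue on read 6.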
|
|
768
|
+
|
|
769
|
+
### Configuration reference
|
|
770
|
+
|
|
771
|
+
| Option | Default | Description |
|
|
772
|
+
|--------|---------|-------------|
|
|
773
|
+
| `database_url` | `nil` | PostgreSQL connection URL (auto-detected in Rails) |
|
|
774
|
+
| `queue_prefix` | `"pgbus"` | Prefix for all PGMQ queue names |
|
|
775
|
+
| `default_queue` | `"default"` | Default queue for jobs without explicit queue |
|
|
776
|
+
| `pool_size` | `nil` (auto) | Connection pool size. Auto-tuned from worker thread counts: `sum(workers.threads) + sum(event_consumers.threads) + 2`. Set explicitly to override. |
|
|
777
|
+
| `workers` | `[{queues: ["default"], threads: 5}]` | Worker capsule definitions. String DSL (`"default: 5; critical: 10"`), Array, or `nil`. |
|
|
778
|
+
| `event_consumers` | `nil` | Event consumer process definitions (same format as workers) |
|
|
779
|
+
| `roles` | `nil` (all) | Supervisor role filter — usually set via CLI flags (`--workers-only` etc.) |
|
|
780
|
+
| `polling_interval` | `0.1` | Seconds between polls (LISTEN/NOTIFY is primary) |
|
|
781
|
+
| `visibility_timeout` | `30` | Time before unacked message becomes visible again. Accepts seconds or `ActiveSupport::Duration` (e.g. `10.minutes`) |
|
|
782
|
+
| `max_retries` | `5` | Failed reads before routing to dead letter queue |
|
|
783
|
+
| `max_jobs_per_worker` | `nil` | Recycle worker after N jobs (nil = unlimited) |
|
|
784
|
+
| `max_memory_mb` | `nil` | Recycle worker when memory exceeds N MB |
|
|
785
|
+
| `max_worker_lifetime` | `nil` | Recycle worker after N seconds. Accepts seconds or Duration. |
|
|
786
|
+
| `listen_notify` | `true` | Use PGMQ's LISTEN/NOTIFY for instant wake-up |
|
|
787
|
+
| `prefetch_limit` | `nil` | Max in-flight messages per worker (nil = unlimited) |
|
|
788
|
+
| `dispatch_interval` | `1.0` | Seconds between dispatcher maintenance ticks |
|
|
789
|
+
| `circuit_breaker_enabled` | `true` | Enable auto-pause on consecutive failures (threshold and backoff are tuned via `Pgbus::CircuitBreaker` constants) |
|
|
790
|
+
| `priority_levels` | `nil` | Number of priority sub-queues (nil = disabled, 2-10) |
|
|
791
|
+
| `default_priority` | `1` | Default priority for jobs without explicit priority |
|
|
792
|
+
| `archive_retention` | `7.days` | How long to keep archived messages. Accepts seconds, Duration, or `nil` to disable cleanup |
|
|
793
|
+
| `outbox_enabled` | `false` | Enable transactional outbox poller process |
|
|
794
|
+
| `outbox_poll_interval` | `1.0` | Seconds between outbox poll cycles |
|
|
795
|
+
| `outbox_batch_size` | `100` | Max entries per outbox poll cycle |
|
|
796
|
+
| `outbox_retention` | `1.day` | How long to keep published outbox entries. Accepts seconds, Duration, or `nil` to disable cleanup |
|
|
797
|
+
| `idempotency_ttl` | `7.days` | How long to keep processed event records. Accepts seconds, Duration, or `nil` to disable cleanup |
|
|
798
|
+
| `base_controller_class` | `"::ActionController::Base"` | Base class for dashboard controllers (string, constantized at load time) |
|
|
799
|
+
| `return_to_app_url` | `nil` | URL for "back to app" button in dashboard nav (nil hides the button) |
|
|
800
|
+
| `web_auth` | `nil` | Lambda for dashboard authentication |
|
|
801
|
+
| `web_refresh_interval` | `5000` | Dashboard auto-refresh interval in milliseconds |
|
|
802
|
+
| `web_live_updates` | `true` | Enable Turbo Frames auto-refresh on dashboard |
|
|
803
|
+
| `stats_enabled` | `true` | Record job execution stats for insights dashboard |
|
|
804
|
+
| `stats_retention` | `30.days` | How long to keep job stats. Accepts seconds, Duration, or `nil` to disable cleanup |
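
The `pool_size` formula quoted in the table, `sum(workers.threads) + sum(event_consumers.threads) + 2`, can be checked with a few lines of Ruby. `auto_pool_size` is a hypothetical helper for illustration only. It assumes both settings are arrays of hashes with a `:threads` key and does not model the role-specific adjustments described under the CLI flags.

```ruby
# Hypothetical helper that applies the documented auto-tuning formula.
# workers and event_consumers are arrays of hashes with a :threads key;
# nil is treated as an empty list.
def auto_pool_size(workers:, event_consumers: nil)
  worker_threads   = (workers || []).sum { |w| w[:threads] }
  consumer_threads = (event_consumers || []).sum { |c| c[:threads] }
  worker_threads + consumer_threads + 2
end
```

For the default `workers` value of `[{queues: ["default"], threads: 5}]` and no event consumers, this gives 5 + 0 + 2 = 7 connections.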
|
|
805
|
+
|
|
777
806
|
## Development
|
|
778
807
|
|
|
779
808
|
```bash
|