pgbus 0.4.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -12,22 +12,33 @@ PostgreSQL-native job processing and event bus for Rails, built on [PGMQ](https:
12
12
  - [Requirements](#requirements)
13
13
  - [Installation](#installation)
14
14
  - [Quick start](#quick-start)
15
- - [Concurrency controls](#concurrency-controls)
16
- - [Batches](#batches)
17
- - [Job uniqueness](#job-uniqueness)
18
- - [Priority queues](#priority-queues)
19
- - [Single active consumer](#single-active-consumer)
20
- - [Consumer priority](#consumer-priority)
21
- - [Circuit breaker and queue pause/resume](#circuit-breaker-and-queue-pauseresume)
22
- - [Prefetch flow control](#prefetch-flow-control)
23
- - [Transactional outbox](#transactional-outbox)
24
- - [Archive compaction](#archive-compaction)
25
- - [Configuration reference](#configuration-reference)
26
- - [Architecture](#architecture)
27
- - [CLI](#cli)
28
- - [Dashboard](#dashboard)
29
- - [Database tables](#database-tables)
30
- - [Switching from another backend](#switching-from-another-backend)
15
+ - [1. Configure (optional)](#1-configure-optional)
16
+ - [2. Use as ActiveJob backend](#2-use-as-activejob-backend)
17
+ - [3. Event bus (optional)](#3-event-bus-optional)
18
+ - [4. Start workers](#4-start-workers)
19
+ - [5. Mount the dashboard](#5-mount-the-dashboard)
20
+ - [Reliability](#reliability)
21
+ - [Job uniqueness](#job-uniqueness)
22
+ - [Concurrency controls](#concurrency-controls)
23
+ - [Circuit breaker and queue pause/resume](#circuit-breaker-and-queue-pauseresume)
24
+ - [Prefetch flow control](#prefetch-flow-control)
25
+ - [Worker recycling](#worker-recycling)
26
+ - [Routing and ordering](#routing-and-ordering)
27
+ - [Priority queues](#priority-queues)
28
+ - [Consumer priority](#consumer-priority)
29
+ - [Single active consumer](#single-active-consumer)
30
+ - [Persistence and batching](#persistence-and-batching)
31
+ - [Batches](#batches)
32
+ - [Transactional outbox](#transactional-outbox)
33
+ - [Archive compaction](#archive-compaction)
34
+ - [Operations](#operations)
35
+ - [CLI](#cli)
36
+ - [Dashboard](#dashboard)
37
+ - [Database tables](#database-tables)
38
+ - [Switching from another backend](#switching-from-another-backend)
39
+ - [Reference](#reference)
40
+ - [Architecture](#architecture)
41
+ - [Configuration reference](#configuration-reference)
31
42
  - [Development](#development)
32
43
  - [License](#license)
33
44
 
@@ -75,51 +86,33 @@ CREATE EXTENSION IF NOT EXISTS pgmq;
75
86
 
76
87
  ### 1. Configure (optional)
77
88
 
78
- Pgbus works with zero config in Rails -- it uses your existing `ActiveRecord` connection. For custom setups, create `config/pgbus.yml`:
79
-
80
- ```yaml
81
- production:
82
- queue_prefix: myapp
83
- default_queue: default
84
- pool_size: 10
85
- max_retries: 5
86
- prefetch_limit: 20
87
- workers:
88
- - queues: [default, mailers]
89
- threads: 10
90
- consumer_priority: 10
91
- - queues: [critical]
92
- threads: 5
93
- single_active_consumer: true
94
- - queues: [default, mailers]
95
- threads: 5
96
- consumer_priority: 0 # fallback worker
97
- event_consumers:
98
- - queues: [orders, payments]
99
- threads: 5
100
- max_jobs_per_worker: 10000
101
- max_memory_mb: 512
102
- max_worker_lifetime: 3600
103
- ```
104
-
105
- Or configure in an initializer:
89
+ Pgbus works with zero config in Rails -- it uses your existing `ActiveRecord` connection. For custom setups, drop a Ruby initializer:
106
90
 
107
91
  ```ruby
108
92
  # config/initializers/pgbus.rb
109
- Pgbus.configure do |config|
110
- config.queue_prefix = "myapp"
111
- config.max_retries = 5
112
- config.max_jobs_per_worker = 10_000
113
- config.max_memory_mb = 512
114
- config.max_worker_lifetime = 3600
115
-
116
- config.workers = [
117
- { queues: %w[default mailers], threads: 10 },
118
- { queues: %w[critical], threads: 5 }
119
- ]
93
+ Pgbus.configure do |c|
94
+ c.queue_prefix = "myapp"
95
+ c.max_retries = 5
96
+ c.visibility_timeout = 30.seconds # ActiveSupport::Duration accepted
97
+ c.idempotency_ttl = 7.days
98
+
99
+ # Worker recycling — prevents long-lived processes from leaking memory
100
+ c.max_jobs_per_worker = 10_000
101
+ c.max_memory_mb = 512
102
+ c.max_worker_lifetime = 1.hour
103
+
104
+ # Capsule string DSL — Sidekiq-style "queues: threads; queues: threads"
105
+ c.workers = "default, mailers: 10; critical: 5"
106
+
107
+ # Or use named capsules with advanced options
108
+ c.capsule :ordered, queues: %w[ordered_events], threads: 1, single_active_consumer: true
120
109
  end
121
110
  ```
122
111
 
112
+ The capsule string DSL is the shortest form for the common case. Use `c.capsule` when you need named capsules with advanced options like `single_active_consumer` or `consumer_priority`. See [Routing and ordering](#routing-and-ordering) for the full set.
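As an illustration, the string form could be parsed along these lines (a toy sketch; the gem's actual parser may differ in edge cases):

```ruby
# Toy parser for the "queues: threads; queues: threads" string form.
# Illustrative only; the gem's real parsing may handle more syntax.
def parse_capsules(dsl)
  dsl.split(";").map do |part|
    queues, threads = part.split(":")
    { queues: queues.split(",").map(&:strip), threads: threads.to_i }
  end
end

parse_capsules("default, mailers: 10; critical: 5")
# => [{ queues: ["default", "mailers"], threads: 10 },
#     { queues: ["critical"], threads: 5 }]
```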
113
+
114
+ > **Migrating from `config/pgbus.yml`?** Run `rails generate pgbus:update` to convert your YAML config to a Ruby initializer using the modern DSL. The original YAML stays in place for review; delete it once the new initializer looks right.
115
+
123
116
  ### 2. Use as ActiveJob backend
124
117
 
125
118
  ```ruby
@@ -239,7 +232,101 @@ Pgbus.configure do |config|
239
232
  end
240
233
  ```
241
234
 
242
- ## Concurrency controls
235
+ ## Reliability
236
+
237
+ These features stop bad jobs from cascading into outages: deduplication, concurrency caps, automatic queue pausing on repeated failures, in-flight backpressure, and worker recycling.
238
+
239
+ ### Job uniqueness
240
+
241
+ Prevent duplicate jobs from running. Unlike `limits_concurrency` (which controls *how many* jobs with the same key run), uniqueness guarantees *at most one* job with a given key exists in the system at any time.
242
+
243
+ ```ruby
244
+ class ImportOrderJob < ApplicationJob
245
+ ensures_uniqueness strategy: :until_executed,
246
+ key: ->(order_id) { "import-order-#{order_id}" },
247
+ on_conflict: :reject
248
+
249
+ def perform(order_id)
250
+ # Only ONE instance per order_id can exist — from enqueue through completion.
251
+ # If another ImportOrderJob for this order_id is already enqueued or running,
252
+ # the duplicate is rejected immediately.
253
+ end
254
+ end
255
+ ```
256
+
257
+ #### Strategies
258
+
259
+ | Strategy | Lock acquired | Lock released | Prevents |
260
+ |----------|--------------|---------------|----------|
261
+ | `:until_executed` | At enqueue | On completion or DLQ | Duplicate enqueue AND execution |
262
+ | `:while_executing` | At execution start | On completion or DLQ | Duplicate execution only |
263
+
264
+ #### Conflict policies
265
+
266
+ | Policy | Behavior |
267
+ |--------|----------|
268
+ | `:reject` | Raise `Pgbus::JobNotUnique` (default) |
269
+ | `:discard` | Silently drop the duplicate |
270
+ | `:log` | Log a warning and drop |
271
+
272
+ #### Lock lifecycle
273
+
274
+ The lock is **never released by a timer**. It is held as long as the job exists in the system:
275
+
276
+ ```text
277
+ Enqueue ──→ pgbus_job_locks (state: queued, owner_pid: nil)
278
+                  │
279
+                  │  Worker picks up job
280
+                  │
281
+                  ▼
282
+ claim_for_execution! (state: executing, owner_pid: PID)
283
+                  │
284
+          ┌───────┴───────┐
285
+          ▼               ▼
286
+       Success          Crash
287
+      release!     (lock orphaned)
288
+    (row deleted)         │
289
+                          ▼
290
+                   Reaper checks:
291
+           Is owner_pid in pgbus_processes
292
+              with fresh heartbeat?
293
+                          │
294
+                    ┌─────┴─────┐
295
+                    ▼           ▼
296
+                   No          Yes
297
+               release!    (keep lock,
298
+              (orphaned)  job is running)
299
+ ```
300
+
301
+ **Crash recovery** works through the reaper (runs every 5 minutes in the dispatcher). It cross-references `owner_pid` in `pgbus_job_locks` against `pgbus_processes` heartbeats. If the owning worker has no fresh heartbeat, the lock is orphaned and released — the PGMQ message's visibility timeout will expire and the job will be retried by another worker.
302
+
303
+ A last-resort TTL (default 24 hours) handles the case where the entire pgbus supervisor is dead and the reaper itself can't run.
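The reaper's decision can be modeled in a few lines (a sketch under assumed names; the real check is a query against the two tables):

```ruby
# Toy model of the reaper decision described above. `processes` maps
# pid => last heartbeat time; names are illustrative, not the gem's API.
HEARTBEAT_TTL = 300 # seconds; 5-minute staleness window

def lock_orphaned?(lock, processes, now)
  heartbeat = processes[lock[:owner_pid]]
  heartbeat.nil? || (now - heartbeat) > HEARTBEAT_TTL
end

now = Time.now
processes = { 1234 => now - 10 }                     # fresh heartbeat
lock_orphaned?({ owner_pid: 1234 }, processes, now)  # => false (job still running)
lock_orphaned?({ owner_pid: 9999 }, processes, now)  # => true  (release the lock)
```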
304
+
305
+ #### Uniqueness vs concurrency controls
306
+
307
+ | | `ensures_uniqueness` | `limits_concurrency` |
308
+ |---|---|---|
309
+ | **Purpose** | Prevent duplicate jobs | Limit concurrent execution slots |
310
+ | **Lock type** | Binary lock (one or none) | Counting semaphore (up to N) |
311
+ | **At enqueue** | `:until_executed` blocks duplicates | Checks semaphore, blocks/discards/raises |
312
+ | **At execution** | `:while_executing` blocks duplicate runs | Not checked (semaphore acquired at enqueue) |
313
+ | **Duplicate in queue** | `:until_executed`: impossible. `:while_executing`: allowed, only one runs | Allowed up to N, rest blocked |
314
+ | **Crash recovery** | Reaper checks heartbeats | Semaphore `expires_at` + dispatcher cleanup |
315
+ | **Use when** | "This exact job must not run twice" | "At most N of these can run at once" |
316
+
317
+ **When to use which:**
318
+ - Payment processing, order import, unique email sends → `ensures_uniqueness`
319
+ - Rate-limited API calls, resource-constrained tasks → `limits_concurrency`
320
+ - Both at once → combine them (they use separate tables, no conflicts)
321
+
322
+ #### Setup
323
+
324
+ ```bash
325
+ rails generate pgbus:add_job_locks # Add the migration
326
+ rails generate pgbus:add_job_locks --database=pgbus # For separate database
327
+ ```
328
+
329
+ ### Concurrency controls
243
330
 
244
331
  Limit how many jobs with the same key can run concurrently:
245
332
 
@@ -256,7 +343,7 @@ class ProcessOrderJob < ApplicationJob
256
343
  end
257
344
  ```
258
345
 
259
- ### Options
346
+ #### Options
260
347
 
261
348
  | Option | Default | Description |
262
349
  |--------|---------|-------------|
@@ -265,7 +352,7 @@ end
265
352
  | `duration:` | `15.minutes` | Safety expiry for the semaphore (crashed worker recovery) |
266
353
  | `on_conflict:` | `:block` | What to do when the limit is reached |
267
354
 
268
- ### Conflict strategies
355
+ #### Conflict strategies
269
356
 
270
357
  | Strategy | Behavior |
271
358
  |----------|----------|
@@ -273,17 +360,17 @@ end
273
360
  | `:discard` | Silently drop the job. |
274
361
  | `:raise` | Raise `Pgbus::ConcurrencyLimitExceeded` so the caller can handle it. |
275
362
 
276
- ### How it works
363
+ #### How concurrency works
277
364
 
278
365
  1. **Enqueue**: The adapter checks a semaphore table for the concurrency key. If under the limit, it increments the counter and sends the job to PGMQ. If at the limit, it applies the `on_conflict` strategy.
279
366
  2. **Complete**: After a job succeeds or is dead-lettered, the executor signals the concurrency system via an `ensure` block (guaranteeing the signal fires even if the archive step fails). It first tries to promote a blocked job (atomic delete + enqueue in a single transaction). If nothing to promote, it releases the semaphore slot.
280
367
  3. **Safety net**: The dispatcher periodically cleans up expired semaphores and orphaned blocked executions to recover from crashed workers.
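The admission behavior in step 1 can be modeled in plain Ruby; in Pgbus itself this is a single atomic SQL statement, so treat this as a sketch of the semantics only:

```ruby
# In-memory model of the counting semaphore from step 1. The real acquire
# is one atomic `INSERT ... ON CONFLICT DO UPDATE WHERE value < max`.
class SemaphoreModel
  def initialize(limit)
    @limit = limit
    @counts = Hash.new(0)
  end

  # true if a slot was acquired; false means apply the on_conflict strategy.
  def try_acquire(key)
    return false if @counts[key] >= @limit
    @counts[key] += 1
    true
  end

  def release(key)
    @counts[key] -= 1 if @counts[key].positive?
  end
end

sem = SemaphoreModel.new(2)
sem.try_acquire("account-7") # => true
sem.try_acquire("account-7") # => true
sem.try_acquire("account-7") # => false (limit reached)
```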
281
368
 
282
- ### Concurrency compared to other backends
369
+ #### Concurrency compared to other backends
283
370
 
284
371
  Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with fundamentally different locking strategies and trade-offs.
285
372
 
286
- #### Architecture comparison
373
+ ##### Architecture comparison
287
374
 
288
375
  | | **Pgbus** | **SolidQueue** | **GoodJob** | **Sidekiq Enterprise** |
289
376
  |---|---|---|---|---|
@@ -296,7 +383,7 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
296
383
  | **Crash recovery** | Semaphore `expires_at` + dispatcher `expire_stale` cleanup | Semaphore `expires_at` + concurrency maintenance task | Advisory locks auto-release on session disconnect | TTL-based lease expiry (default 5 min) |
297
384
  | **Message lifecycle** | PGMQ visibility timeout (`FOR UPDATE SKIP LOCKED`) — message stays in queue until archived | AR-backed `claimed_executions` table | AR-backed `good_jobs` table with advisory lock per row | Redis list + sorted set |
298
385
 
299
- #### Key design differences
386
+ ##### Key design differences
300
387
 
301
388
  **Pgbus** uses PGMQ's native `FOR UPDATE SKIP LOCKED` for message claiming and a separate semaphore table for concurrency control. This two-layer approach means the message queue and concurrency system are independent — PGMQ handles exactly-once delivery, the semaphore handles admission control. The semaphore acquire is a single atomic SQL (`INSERT ... ON CONFLICT DO UPDATE WHERE value < max`), avoiding the need for explicit row locks.
302
389
 
@@ -306,7 +393,7 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
306
393
 
307
394
  **Sidekiq Enterprise** uses Redis sorted sets with TTL-based leases. Each concurrent slot is a sorted set entry with an expiry timestamp. This is fast and simple but has no durability guarantee — Redis failover can lose leases, temporarily allowing over-limit execution. The `sidekiq-unique-jobs` gem (open-source) uses a similar Lua-script approach but with more lock strategies (`:until_executing`, `:while_executing`, `:until_and_while_executing`) and configurable conflict handlers (`:reject`, `:reschedule`, `:replace`, `:raise`).
308
395
 
309
- #### Race condition resilience
396
+ ##### Race condition resilience
310
397
 
311
398
  | Scenario | Pgbus | SolidQueue | GoodJob | Sidekiq |
312
399
  |---|---|---|---|---|
@@ -315,142 +402,65 @@ Pgbus, SolidQueue, GoodJob, and Sidekiq all offer concurrency controls, but with
315
402
  | **Archive succeeds but signal fails** | `ensure` block guarantees signal fires even if archive raises. For SIGKILL: semaphore expires via dispatcher. | Fixed in PR #689 — `unblock_next_job` moved inside same transaction as `finished`. | Advisory lock released by session disconnect. | Lease auto-expires. |
316
403
  | **Concurrent enqueue and signal race** | Semaphore acquire is a single atomic SQL — no read-then-write gap. | Fixed in PR #689 — `FOR UPDATE` lock on semaphore row serializes enqueue with signal. | `pg_advisory_xact_lock` serializes the concurrency check. | Redis Lua script is atomic. |
317
404
 
318
- ## Batches
405
+ ### Circuit breaker and queue pause/resume
319
406
 
320
- Coordinate groups of jobs with callbacks when all complete:
407
+ Pgbus automatically pauses queues that fail repeatedly, preventing cascading failures.
321
408
 
322
409
  ```ruby
323
- batch = Pgbus::Batch.new(
324
- on_finish: BatchFinishedJob,
325
- on_success: BatchSucceededJob,
326
- on_discard: BatchFailedJob,
327
- description: "Import users",
328
- properties: { initiated_by: current_user.id }
329
- )
330
-
331
- batch.enqueue do
332
- users.each { |user| ImportUserJob.perform_later(user.id) }
410
+ Pgbus.configure do |config|
411
+ config.circuit_breaker_enabled = true # default
333
412
  end
334
413
  ```
335
414
 
336
- ### Callbacks
415
+ The trip threshold (`5` consecutive failures), base backoff (`30s`), and
416
+ max backoff (`600s`) are defined as constants on `Pgbus::CircuitBreaker`.
417
+ Override the constants in an initializer if you need different values —
418
+ they are not exposed as configuration because tweaking them at runtime
419
+ has never proved useful in practice.
337
420
 
338
- | Callback | Fired when |
339
- |----------|------------|
340
- | `on_finish` | All jobs completed (success or discard) |
341
- | `on_success` | All jobs completed successfully (zero discarded) |
342
- | `on_discard` | At least one job was dead-lettered |
421
+ When a queue hits the failure threshold:
422
+ 1. The circuit breaker **auto-pauses** the queue with exponential backoff
423
+ 2. After the backoff expires, the queue **auto-resumes** and the trip counter resets
424
+ 3. If failures continue, each trip doubles the backoff (capped at `MAX_BACKOFF`)
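With the default constants above, the backoff schedule of steps 1 and 3 works out to:

```ruby
# Exponential backoff using the defaults above: 30s base, doubling per
# consecutive trip, capped at 600s. Function name is illustrative.
BASE_BACKOFF = 30   # seconds
MAX_BACKOFF  = 600  # seconds

def backoff_seconds(trip_count)
  [BASE_BACKOFF * (2**(trip_count - 1)), MAX_BACKOFF].min
end

(1..6).map { |n| backoff_seconds(n) }
# => [30, 60, 120, 240, 480, 600]
```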
343
425
 
344
- Callback jobs receive the batch `properties` hash as their argument:
426
+ You can also **manually pause/resume** queues from the dashboard. The pause state is stored in the `pgbus_queue_states` table and survives restarts.
345
427
 
346
- ```ruby
347
- class BatchFinishedJob < ApplicationJob
348
- def perform(properties)
349
- user = User.find(properties["initiated_by"])
350
- ImportMailer.complete(user).deliver_later
351
- end
352
- end
428
+ ```bash
429
+ rails generate pgbus:add_queue_states # Add the queue_states migration
430
+ rails generate pgbus:add_queue_states --database=pgbus # For separate database
353
431
  ```
354
432
 
355
- ### How it works
356
-
357
- 1. `Batch.new(...)` creates a tracking row in `pgbus_batches` with `status: "pending"`
358
- 2. `batch.enqueue { ... }` tags each enqueued job with the `pgbus_batch_id` in its payload
359
- 3. After each job completes or is dead-lettered, the executor atomically updates the batch counters
360
- 4. When `completed_jobs + discarded_jobs == total_jobs`, the batch status flips to `"finished"` and callback jobs are enqueued
361
- 5. The dispatcher cleans up finished batches older than 7 days
362
-
363
- ## Job uniqueness
433
+ ### Prefetch flow control
364
434
 
365
- Prevent duplicate jobs from running. Unlike `limits_concurrency` (which controls *how many* jobs with the same key run), uniqueness guarantees *at most one* job with a given key exists in the system at any time.
435
+ Cap the number of in-flight (claimed but unfinished) messages per worker:
366
436
 
367
437
  ```ruby
368
- class ImportOrderJob < ApplicationJob
369
- ensures_uniqueness strategy: :until_executed,
370
- key: ->(order_id) { "import-order-#{order_id}" },
371
- on_conflict: :reject
372
-
373
- def perform(order_id)
374
- # Only ONE instance per order_id can exist — from enqueue through completion.
375
- # If another ImportOrderJob for this order_id is already enqueued or running,
376
- # the duplicate is rejected immediately.
377
- end
438
+ Pgbus.configure do |config|
439
+ config.prefetch_limit = 20 # nil = unlimited (default)
378
440
  end
379
441
  ```
380
442
 
381
- ### Strategies
382
-
383
- | Strategy | Lock acquired | Lock released | Prevents |
384
- |----------|--------------|---------------|----------|
385
- | `:until_executed` | At enqueue | On completion or DLQ | Duplicate enqueue AND execution |
386
- | `:while_executing` | At execution start | On completion or DLQ | Duplicate execution only |
387
-
388
- ### Conflict policies
389
-
390
- | Policy | Behavior |
391
- |--------|----------|
392
- | `:reject` | Raise `Pgbus::JobNotUnique` (default) |
393
- | `:discard` | Silently drop the duplicate |
394
- | `:log` | Log a warning and drop |
443
+ The worker tracks in-flight messages with an atomic counter and only fetches `min(idle_threads, prefetch_available)` messages per cycle. The counter is decremented in an `ensure` block so it never gets stuck.
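The per-cycle fetch computation can be sketched as follows (names assumed, not the gem's internals):

```ruby
# How many messages to fetch this cycle, given the prefetch cap.
# Names are illustrative, not the gem's internals.
def fetch_count(idle_threads, in_flight, prefetch_limit)
  return idle_threads if prefetch_limit.nil?        # nil = unlimited
  available = [prefetch_limit - in_flight, 0].max   # prefetch_available
  [idle_threads, available].min
end

fetch_count(5, 18, 20)  # => 2 (only 2 prefetch slots left)
fetch_count(5, 20, 20)  # => 0 (fully prefetched; skip this cycle)
fetch_count(5, 3, nil)  # => 5 (no cap configured)
```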
395
444
 
396
- ### Lock lifecycle
445
+ ### Worker recycling
397
446
 
398
- The lock is **never released by a timer**. It is held as long as the job exists in the system:
447
+ Pgbus workers recycle themselves to prevent memory bloat. This is a key operational difference from SolidQueue, which keeps worker processes alive indefinitely.
399
448
 
400
- ```text
401
- Enqueue ──→ pgbus_job_locks (state: queued, owner_pid: nil)
402
-
403
- Worker picks up job
404
-
405
-
406
- claim_for_execution! (state: executing, owner_pid: PID)
407
-
408
- ┌───────┴───────┐
409
- ▼ ▼
410
- Success Crash
411
- release! (lock orphaned)
412
- (row deleted) │
413
-
414
- Reaper checks:
415
- Is owner_pid in pgbus_processes
416
- with fresh heartbeat?
417
-
418
- ┌─────┴─────┐
419
- No Yes
420
- ▼ ▼
421
- release! (keep lock,
422
- (orphaned) job is running)
449
+ ```ruby
450
+ Pgbus.configure do |config|
451
+ config.max_jobs_per_worker = 10_000 # Restart after 10k jobs
452
+ config.max_memory_mb = 512 # Restart if memory exceeds 512MB
453
+ config.max_worker_lifetime = 1.hour # Restart after 1 hour
454
+ end
423
455
  ```
424
456
 
425
- **Crash recovery** works through the reaper (runs every 5 minutes in the dispatcher). It cross-references `owner_pid` in `pgbus_job_locks` against `pgbus_processes` heartbeats. If the owning worker has no fresh heartbeat, the lock is orphaned and released the PGMQ message's visibility timeout will expire and the job will be retried by another worker.
426
-
427
- A last-resort TTL (default 24 hours) handles the case where the entire pgbus supervisor is dead and the reaper itself can't run.
428
-
429
- ### Uniqueness vs concurrency controls
430
-
431
- | | `ensures_uniqueness` | `limits_concurrency` |
432
- |---|---|---|
433
- | **Purpose** | Prevent duplicate jobs | Limit concurrent execution slots |
434
- | **Lock type** | Binary lock (one or none) | Counting semaphore (up to N) |
435
- | **At enqueue** | `:until_executed` blocks duplicates | Checks semaphore, blocks/discards/raises |
436
- | **At execution** | `:while_executing` blocks duplicate runs | Not checked (semaphore acquired at enqueue) |
437
- | **Duplicate in queue** | `:until_executed`: impossible. `:while_executing`: allowed, only one runs | Allowed up to N, rest blocked |
438
- | **Crash recovery** | Reaper checks heartbeats | Semaphore `expires_at` + dispatcher cleanup |
439
- | **Use when** | "This exact job must not run twice" | "At most N of these can run at once" |
440
-
441
- **When to use which:**
442
- - Payment processing, order import, unique email sends → `ensures_uniqueness`
443
- - Rate-limited API calls, resource-constrained tasks → `limits_concurrency`
444
- - Both at once → combine them (they use separate tables, no conflicts)
457
+ When a limit is hit, the worker drains its thread pool, exits, and the supervisor forks a fresh process. RSS memory is sampled from `/proc/self/statm` (Linux) or `ps -o rss` (macOS).
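A minimal sketch of that RSS sampling, assuming a 4 KiB page size on Linux (the gem's actual implementation may differ):

```ruby
# Resident set size in MB, per the sampling strategy described above.
# Assumes 4 KiB pages on Linux; not the gem's exact code.
def current_rss_mb
  if File.readable?("/proc/self/statm")
    resident_pages = File.read("/proc/self/statm").split[1].to_i
    resident_pages * 4096 / (1024 * 1024)
  else
    `ps -o rss= -p #{Process.pid}`.to_i / 1024  # ps reports KiB
  end
end
```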
445
458
 
446
- ### Setup
459
+ ## Routing and ordering
447
460
 
448
- ```bash
449
- rails generate pgbus:add_job_locks # Add the migration
450
- rails generate pgbus:add_job_locks --database=pgbus # For separate database
451
- ```
461
+ How messages flow between producers and the workers that handle them: priority sub-queues, consumer priority for active/standby workers, and single-active-consumer for strict ordering.
452
462
 
453
- ## Priority queues
463
+ ### Priority queues
454
464
 
455
465
  Route jobs to priority sub-queues so high-priority work is processed first:
456
466
 
@@ -487,80 +497,82 @@ end
487
497
 
488
498
  When `priority_levels` is `nil` (default), priority queues are disabled and all jobs go to a single queue per logical name.
489
499
 
490
- ## Single active consumer
491
-
492
- For queues that require strict ordering, enable single active consumer mode. Only one worker process can read from a queue at a time -- others skip it and process other queues.
493
-
494
- ```yaml
495
- # config/pgbus.yml
496
- production:
497
- workers:
498
- - queues: [ordered_events]
499
- threads: 1
500
- single_active_consumer: true
501
- - queues: [ordered_events]
502
- threads: 1
503
- single_active_consumer: true # Standby — takes over if the first worker dies
504
- ```
505
-
506
- Uses PostgreSQL session-level advisory locks (`pg_try_advisory_lock`). The lock is non-blocking -- workers that can't acquire it simply skip the queue. Locks auto-release on connection close (including crashes), so failover is automatic.
507
-
508
- ## Consumer priority
500
+ ### Consumer priority
509
501
 
510
502
  When multiple workers subscribe to the same queues, higher-priority workers process messages first. Lower-priority workers back off (3x polling interval) when a higher-priority worker is active.
511
503
 
512
- ```yaml
513
- # config/pgbus.yml
514
- production:
515
- workers:
516
- - queues: [default]
517
- threads: 10
518
- consumer_priority: 10 # Primary — polls at base interval
519
- - queues: [default]
520
- threads: 5
521
- consumer_priority: 0 # Fallback — polls at 3x interval when primary is healthy
504
+ ```ruby
505
+ Pgbus.configure do |c|
506
+ c.capsule :primary, queues: %w[default], threads: 10, consumer_priority: 10
507
+ c.capsule :fallback, queues: %w[default], threads: 5, consumer_priority: 0
508
+ end
522
509
  ```
523
510
 
524
511
  Priority is stored in heartbeat metadata. Workers check the `pgbus_processes` table to discover higher-priority peers. When a high-priority worker goes stale (no heartbeat for 5 minutes), lower-priority workers automatically resume normal polling.
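The back-off rule amounts to the following (intervals in milliseconds; names assumed):

```ruby
# Polling interval selection from the rule above: back off 3x while a
# higher-priority peer has a fresh heartbeat. Names are illustrative.
BASE_INTERVAL_MS = 100

def polling_interval_ms(my_priority, fresh_peer_priorities)
  higher = fresh_peer_priorities.any? { |p| p > my_priority }
  higher ? BASE_INTERVAL_MS * 3 : BASE_INTERVAL_MS
end

polling_interval_ms(0, [10]) # => 300 (primary healthy: fallback backs off)
polling_interval_ms(0, [])   # => 100 (primary stale: resume normal polling)
```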
525
512
 
526
- ## Circuit breaker and queue pause/resume
513
+ ### Single active consumer
527
514
 
528
- Pgbus automatically pauses queues that fail repeatedly, preventing cascading failures.
515
+ For queues that require strict ordering, enable single active consumer mode. Only one worker process can read from a queue at a time — others skip it and process other queues.
529
516
 
530
517
  ```ruby
531
- Pgbus.configure do |config|
532
- config.circuit_breaker_enabled = true # default
533
- config.circuit_breaker_threshold = 5 # consecutive failures before tripping
534
- config.circuit_breaker_base_backoff = 30 # seconds (doubles per trip)
535
- config.circuit_breaker_max_backoff = 600 # 10 minute cap
518
+ Pgbus.configure do |c|
519
+ c.capsule :ordered_primary, queues: %w[ordered_events], threads: 1, single_active_consumer: true
520
+ c.capsule :ordered_standby, queues: %w[ordered_events], threads: 1, single_active_consumer: true
536
521
  end
537
522
  ```
538
523
 
539
- When a queue hits the failure threshold:
540
- 1. The circuit breaker **auto-pauses** the queue with exponential backoff
541
- 2. After the backoff expires, the queue **auto-resumes** and the trip counter resets
542
- 3. If failures continue, each trip doubles the backoff (capped at `max_backoff`)
524
+ Uses PostgreSQL session-level advisory locks (`pg_try_advisory_lock`). The lock is non-blocking — workers that can't acquire it simply skip the queue. Locks auto-release on connection close (including crashes), so failover is automatic. The standby capsule takes over within one polling tick if the primary dies.
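The non-blocking claim semantics can be modeled in-memory (a toy model; in Pgbus the lock lives in PostgreSQL and is released when the session's connection closes):

```ruby
# Toy model of the non-blocking claim: the first session to take the lock
# wins; everyone else gets false and skips the queue this poll cycle.
locks = {}

try_claim = lambda do |queue, session_id|
  if locks[queue].nil? || locks[queue] == session_id
    locks[queue] = session_id
    true
  else
    false
  end
end

try_claim.call("ordered_events", :primary)  # => true
try_claim.call("ordered_events", :standby)  # => false (skips the queue)
locks.delete("ordered_events")              # primary's connection dies
try_claim.call("ordered_events", :standby)  # => true (failover)
```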
543
525
 
544
- You can also **manually pause/resume** queues from the dashboard. The pause state is stored in the `pgbus_queue_states` table and survives restarts.
526
+ ## Persistence and batching
545
527
 
546
- ```bash
547
- rails generate pgbus:add_queue_states # Add the queue_states migration
548
- rails generate pgbus:add_queue_states --database=pgbus # For separate database
528
+ How Pgbus integrates with your application's transactions and tracks groups of related work: outbox for atomic publish, batches for fan-out coordination, archive compaction for keeping the queue tables small.
529
+
530
+ ### Batches
531
+
532
+ Coordinate groups of jobs with callbacks when all complete:
533
+
534
+ ```ruby
535
+ batch = Pgbus::Batch.new(
536
+ on_finish: BatchFinishedJob,
537
+ on_success: BatchSucceededJob,
538
+ on_discard: BatchFailedJob,
539
+ description: "Import users",
540
+ properties: { initiated_by: current_user.id }
541
+ )
542
+
543
+ batch.enqueue do
544
+ users.each { |user| ImportUserJob.perform_later(user.id) }
545
+ end
549
546
  ```
550
547
 
551
- ## Prefetch flow control
548
+ #### Callbacks
552
549
 
553
- Cap the number of in-flight (claimed but unfinished) messages per worker:
550
+ | Callback | Fired when |
551
+ |----------|------------|
552
+ | `on_finish` | All jobs completed (success or discard) |
553
+ | `on_success` | All jobs completed successfully (zero discarded) |
554
+ | `on_discard` | At least one job was dead-lettered |
555
+
556
+ Callback jobs receive the batch `properties` hash as their argument:
554
557
 
555
558
  ```ruby
556
- Pgbus.configure do |config|
557
- config.prefetch_limit = 20 # nil = unlimited (default)
559
+ class BatchFinishedJob < ApplicationJob
560
+ def perform(properties)
561
+ user = User.find(properties["initiated_by"])
562
+ ImportMailer.complete(user).deliver_later
563
+ end
558
564
  end
559
565
  ```
560
566
 
561
- The worker tracks in-flight messages with an atomic counter and only fetches `min(idle_threads, prefetch_available)` messages per cycle. The counter is decremented in an `ensure` block so it never gets stuck.
567
+ #### How batches work
568
+
569
+ 1. `Batch.new(...)` creates a tracking row in `pgbus_batches` with `status: "pending"`
570
+ 2. `batch.enqueue { ... }` tags each enqueued job with the `pgbus_batch_id` in its payload
571
+ 3. After each job completes or is dead-lettered, the executor atomically updates the batch counters
572
+ 4. When `completed_jobs + discarded_jobs == total_jobs`, the batch status flips to `"finished"` and callback jobs are enqueued
573
+ 5. The dispatcher cleans up finished batches older than 7 days
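Steps 3 and 4 amount to this bookkeeping (a toy model using the field names above; the real update is an atomic SQL statement):

```ruby
# Toy model of the counter flip in steps 3-4.
def record_result!(batch, success:)
  if success
    batch[:completed_jobs] += 1
  else
    batch[:discarded_jobs] += 1
  end
  finished = batch[:completed_jobs] + batch[:discarded_jobs] == batch[:total_jobs]
  batch[:status] = "finished" if finished  # callback jobs are enqueued here
  batch
end

batch = { total_jobs: 3, completed_jobs: 0, discarded_jobs: 0, status: "pending" }
record_result!(batch, success: true)
record_result!(batch, success: true)
record_result!(batch, success: false)
batch[:status] # => "finished" (on_finish and on_discard both fire)
```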
562
574
 
563
- ## Transactional outbox
575
+ ### Transactional outbox
564
576
 
565
577
  Publish events atomically inside your database transactions. A background poller moves outbox entries to PGMQ.
566
578
 
@@ -574,7 +586,7 @@ Pgbus.configure do |config|
574
586
  config.outbox_enabled = true
575
587
  config.outbox_poll_interval = 1.0 # seconds
576
588
  config.outbox_batch_size = 100
577
- config.outbox_retention = 24 * 3600 # keep published entries for 24h
589
+ config.outbox_retention = 1.day # ActiveSupport::Duration also accepted
578
590
  end
579
591
  ```
580
592
 
@@ -595,144 +607,92 @@ end
595
607
 
596
608
  The outbox poller uses `FOR UPDATE SKIP LOCKED` inside a transaction to claim entries, publishes them to PGMQ, and marks them as published. Failed entries are skipped and retried next cycle.
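One poll cycle can be modeled like this (a toy model; the real claim is `FOR UPDATE SKIP LOCKED` SQL, and `publish` stands in for the PGMQ send):

```ruby
# Toy model of one outbox poll cycle: claim up to batch_size unpublished
# entries, publish each, mark it. A publish failure leaves the entry
# unmarked so the next cycle retries it.
def poll_once(outbox, batch_size: 100, &publish)
  claimed = outbox.reject { |e| e[:published] }.first(batch_size)
  claimed.each do |entry|
    publish.call(entry)
    entry[:published] = true
  end
  claimed.size
end

outbox = [{ event: "order.created", published: false },
          { event: "order.paid",    published: false }]
poll_once(outbox) { |e| } # => 2; both entries now published: true
```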
597
609
 
598
- ## Archive compaction
610
+ ### Archive compaction
599
611
 
600
612
  PGMQ archive tables grow unbounded. Pgbus automatically purges old entries:
601
613
 
602
614
  ```ruby
603
615
  Pgbus.configure do |config|
604
- config.archive_retention = 7 * 24 * 3600 # 7 days (default)
605
- config.archive_compaction_interval = 3600 # run every hour (default)
606
- config.archive_compaction_batch_size = 1000 # delete in batches (default)
616
+ config.archive_retention = 7.days # ActiveSupport::Duration (default 7 days)
607
617
  end
608
618
  ```
609
619
 
610
- The dispatcher runs archive compaction as part of its maintenance loop, deleting archived messages older than `archive_retention` in batches to avoid long-running transactions.
611
-
612
- ## Configuration reference
620
+ The compaction loop runs every hour and deletes up to 1000 rows per
621
+ queue per cycle. Both knobs live as constants on
622
+ `Pgbus::Process::Dispatcher` (`ARCHIVE_COMPACTION_INTERVAL`,
623
+ `ARCHIVE_COMPACTION_BATCH_SIZE`) — they have never been worth surfacing
624
+ as configuration. The dispatcher runs archive compaction as part of its
625
+ maintenance loop, deleting archived messages older than `archive_retention`
626
+ in batches to avoid long-running transactions.
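The batched purge can be modeled like this (a toy model over an in-memory array; the real loop issues SQL against the PGMQ archive tables):

```ruby
# Toy model of batched compaction: delete expired rows at most BATCH at a
# time so no single transaction grows large. Names are illustrative.
BATCH = 1000

def compact!(rows, cutoff)
  deleted = 0
  loop do
    batch = rows.select { |r| r[:archived_at] < cutoff }.first(BATCH)
    batch.each { |r| rows.delete(r) }
    deleted += batch.size
    break if batch.size < BATCH # fewer than a full batch means nothing left
  end
  deleted
end
```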
613
627
 
614
- | Option | Default | Description |
615
- |--------|---------|-------------|
616
- | `database_url` | `nil` | PostgreSQL connection URL (auto-detected in Rails) |
617
- | `queue_prefix` | `"pgbus"` | Prefix for all PGMQ queue names |
618
- | `default_queue` | `"default"` | Default queue for jobs without explicit queue |
619
- | `pool_size` | `5` | Connection pool size |
620
- | `workers` | `[{queues: ["default"], threads: 5}]` | Worker process definitions |
621
- | `event_consumers` | `nil` | Event consumer process definitions (same format as workers) |
622
- | `polling_interval` | `0.1` | Seconds between polls (LISTEN/NOTIFY is primary) |
623
- | `visibility_timeout` | `30` | Seconds before unacked message becomes visible again |
624
- | `max_retries` | `5` | Failed reads before routing to dead letter queue |
625
- | `max_jobs_per_worker` | `nil` | Recycle worker after N jobs (nil = unlimited) |
626
- | `max_memory_mb` | `nil` | Recycle worker when memory exceeds N MB |
627
- | `max_worker_lifetime` | `nil` | Recycle worker after N seconds |
628
- | `listen_notify` | `true` | Use PGMQ's LISTEN/NOTIFY for instant wake-up |
629
- | `prefetch_limit` | `nil` | Max in-flight messages per worker (nil = unlimited) |
630
- | `dispatch_interval` | `1.0` | Seconds between dispatcher maintenance ticks |
631
- | `circuit_breaker_enabled` | `true` | Enable auto-pause on consecutive failures |
632
- | `circuit_breaker_threshold` | `5` | Consecutive failures before tripping |
633
- | `circuit_breaker_base_backoff` | `30` | Base backoff seconds (doubles per trip) |
634
- | `circuit_breaker_max_backoff` | `600` | Max backoff cap in seconds |
635
- | `priority_levels` | `nil` | Number of priority sub-queues (nil = disabled, 2-10) |
636
- | `default_priority` | `1` | Default priority for jobs without explicit priority |
637
- | `archive_retention` | `604800` | Seconds to keep archived messages (7 days) |
638
- | `archive_compaction_interval` | `3600` | Seconds between archive cleanup runs |
639
- | `archive_compaction_batch_size` | `1000` | Rows deleted per batch during compaction |
640
- | `outbox_enabled` | `false` | Enable transactional outbox poller process |
641
- | `outbox_poll_interval` | `1.0` | Seconds between outbox poll cycles |
642
- | `outbox_batch_size` | `100` | Max entries per outbox poll cycle |
643
- | `outbox_retention` | `86400` | Seconds to keep published outbox entries (1 day) |
644
- | `idempotency_ttl` | `604800` | Seconds to keep processed event records (7 days, cleaned hourly) |
645
- | `base_controller_class` | `"::ActionController::Base"` | Base class for dashboard controllers (string, constantized at load time) |
646
- | `return_to_app_url` | `nil` | URL for "back to app" button in dashboard nav (nil hides the button) |
647
- | `web_auth` | `nil` | Lambda for dashboard authentication |
648
- | `web_refresh_interval` | `5000` | Dashboard auto-refresh interval in milliseconds |
649
- | `web_live_updates` | `true` | Enable Turbo Frames auto-refresh on dashboard |
650
- | `stats_enabled` | `true` | Record job execution stats for insights dashboard |
651
- | `stats_retention` | `604800` | Seconds to keep job stats (7 days) |
628
+ ## Operations
652
629
 
653
- ## Architecture
630
+ Day-to-day running of Pgbus: starting and stopping processes, observing activity on the dashboard, the database tables Pgbus relies on, and migrating from an existing job backend.
654
631
 
655
- ```text
656
- Supervisor (fork manager)
657
- ├── Worker 1 (queues: [default, mailers], threads: 10, priority: 10)
658
- ├── Worker 2 (queues: [critical], threads: 5, single_active_consumer: true)
659
- ├── Dispatcher (maintenance: cleanup, compaction, reaping, circuit breaker)
660
- ├── Scheduler (recurring tasks via cron)
661
- ├── Consumer (event bus topics)
662
- └── Outbox Poller (transactional outbox → PGMQ, when enabled)
632
+ ### CLI
663
633
 
664
- PostgreSQL + PGMQ
665
- ├── pgbus_default (job queue)
666
- ├── pgbus_default_dlq (dead letter queue)
667
- ├── pgbus_critical (job queue)
668
- ├── pgbus_critical_dlq (dead letter queue)
669
- ├── pgbus_mailers (job queue)
670
- └── pgbus_queue_states (pause/resume + circuit breaker state)
634
+ ```bash
635
+ pgbus start # Start supervisor with workers + dispatcher + scheduler
636
+ pgbus status # Show running processes
637
+ pgbus queues # List queues with depth/metrics
638
+ pgbus version # Print version
639
+ pgbus help # Show help
671
640
  ```
672
641
 
673
- ### How it works
674
-
675
- 1. **Enqueue**: ActiveJob serializes the job to JSON, Pgbus sends it to the appropriate PGMQ queue
676
- 2. **Read**: Workers poll queues (or wake instantly via LISTEN/NOTIFY) and claim messages with a visibility timeout
677
- 3. **Execute**: The job is deserialized and executed within the Rails executor
678
- 4. **Archive/Retry**: On success, the message is archived. On failure, the visibility timeout expires and the message becomes available again. PGMQ's `read_ct` tracks delivery attempts
679
- 5. **Dead letter**: When `read_ct` exceeds `max_retries`, the message is moved to the `_dlq` queue for manual inspection
680
-
681
- ### Worker recycling
642
+ #### Role flags (split deployments)
682
643
 
683
- Unlike solid_queue, Pgbus workers recycle themselves to prevent memory bloat:
644
+ By default, `pgbus start` boots every role in one supervisor (workers, dispatcher, scheduler, event consumers, outbox poller). For containerized deployments where each role lives in a separate process, use the role flags:
684
645
 
685
- ```ruby
686
- Pgbus.configure do |config|
687
- config.max_jobs_per_worker = 10_000 # Restart after 10k jobs
688
- config.max_memory_mb = 512 # Restart if memory exceeds 512MB
689
- config.max_worker_lifetime = 3600 # Restart after 1 hour
690
- end
646
+ ```bash
647
+ pgbus start --workers-only # Only worker processes
648
+ pgbus start --scheduler-only # Only the recurring-task scheduler
649
+ pgbus start --dispatcher-only # Only the maintenance dispatcher
691
650
  ```
692
651
 
693
- When a limit is hit, the worker drains its thread pool, exits, and the supervisor forks a fresh process.
652
+ These flags are mutually exclusive. The auto-tuned `pool_size` adjusts to the role: with 50 worker threads configured, a `--scheduler-only` process opens only the connections it actually needs (1 for the scheduler), not 51.
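The auto-tuning rule from the configuration reference (`sum(workers.threads) + sum(event_consumers.threads) + 2`) can be sketched as follows; the `auto_pool_size` helper is hypothetical, shown only to make the arithmetic concrete:

```ruby
# Sketch of pool_size auto-tuning: total worker threads plus total
# event-consumer threads, plus 2 for the supervisor/maintenance processes.
def auto_pool_size(workers:, event_consumers: nil)
  (Array(workers) + Array(event_consumers)).sum { |w| w[:threads] } + 2
end
```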
694
653
 
695
- ## CLI
654
+ #### Capsule selection
655
+
656
+ `--capsule NAME` boots a single named capsule. Combine with `--workers-only` to run one capsule per container:
696
657
 
697
658
  ```bash
698
- pgbus start # Start supervisor with workers + dispatcher
699
- pgbus status # Show running processes
700
- pgbus queues # List queues with depth/metrics
701
- pgbus version # Print version
702
- pgbus help # Show help
659
+ pgbus start --workers-only --capsule critical
660
+ pgbus start --workers-only --capsule default
703
661
  ```
704
662
 
705
- ## Dashboard
663
+ The capsule name is the `:name` you passed to `c.capsule` in your initializer (or the first queue token when using the string DSL).
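For illustration, a hypothetical initializer matching the two commands above. The capsule names (`:critical`, `:default`) are what `--capsule` selects; the keyword options are assumptions based on the worker format in the configuration reference, not a documented signature:

```ruby
Pgbus.configure do |c|
  c.capsule :critical, queues: ["critical"], threads: 10  # options assumed
  c.capsule :default,  queues: ["default"],  threads: 5
end
```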
664
+
665
+ ### Dashboard
706
666
 
707
667
  The dashboard is a mountable Rails engine at `/pgbus` with:
708
668
 
709
- - **Overview** -- queue depths, enqueued count, active processes, failure count, throughput rate
710
- - **Queues** -- per-queue metrics, purge/pause/resume/delete actions
711
- - **Jobs** -- enqueued and failed jobs, retry/discard actions
712
- - **Dead letter** -- DLQ messages with retry/discard, bulk actions
713
- - **Processes** -- active workers/dispatcher/consumers with heartbeat status
714
- - **Events** -- registered subscribers and processed events
715
- - **Outbox** -- transactional outbox entries pending publication
716
- - **Locks** -- active job uniqueness locks with state (queued/executing), owner PID@hostname, age
717
- - **Insights** -- throughput chart (jobs/min), status distribution donut, slowest job classes table
669
+ - **Overview** -- queue depths, enqueued count, active processes, failure count, throughput rate
670
+ - **Queues** -- per-queue metrics, purge/pause/resume/delete actions
671
+ - **Jobs** -- enqueued and failed jobs, retry/discard actions
672
+ - **Dead letter** -- DLQ messages with retry/discard, bulk actions
673
+ - **Processes** -- active workers/dispatcher/consumers with heartbeat status
674
+ - **Events** -- registered subscribers and processed events
675
+ - **Outbox** -- transactional outbox entries pending publication
676
+ - **Locks** -- active job uniqueness locks with state (queued/executing), owner PID@hostname, age
677
+ - **Insights** -- throughput chart (jobs/min), status distribution donut, slowest job classes table
718
678
 
719
679
  All tables use Turbo Frames for periodic auto-refresh without page reloads. Destructive actions use styled confirmation dialogs (not browser `confirm()`), and flash messages appear as auto-dismissing toast notifications.
720
680
 
721
- ### Queue management
681
+ #### Queue management
722
682
 
723
683
  The queues page lets you manage PGMQ queues directly:
724
684
 
725
+ - **Purge** -- removes all messages from the queue (the queue itself remains)
686
+ - **Delete** -- permanently drops the queue from PGMQ (removes the queue table and metadata)
687
+ - **Pause / Resume** -- pauses or resumes job processing for a queue
685
+ - **Purge** removes all messages from the queue (the queue itself remains)
686
+ - **Delete** permanently drops the queue from PGMQ (removes the queue table and metadata)
687
+ - **Pause / Resume** pauses or resumes job processing for a queue
728
688
 
729
689
  All destructive actions require confirmation. Pause/resume and delete are available on both the queue index and detail pages.
730
690
 
731
- ### Dark mode
691
+ #### Dark mode
732
692
 
733
693
  The dashboard supports dark mode via Tailwind CSS `dark:` classes. It respects your system preference on first visit and persists your choice via localStorage. Toggle with the sun/moon button in the nav bar.
734
694
 
735
- ### Job stats and insights
695
+ #### Job stats and insights
736
696
 
737
697
  The executor records every job completion to `pgbus_job_stats` (job class, queue, status, duration). The insights page visualizes this data with ApexCharts (loaded via CDN, zero npm dependencies).
738
698
 
@@ -741,9 +701,9 @@ rails generate pgbus:add_job_stats # Add the stats migration
741
701
  rails generate pgbus:add_job_stats --database=pgbus
742
702
  ```
743
703
 
744
- Stats collection is enabled by default (`config.stats_enabled = true`). Old stats are cleaned up by the dispatcher based on `config.stats_retention` (default: 7 days). If the migration hasn't been run yet, stat recording is silently skipped.
704
+ Stats collection is enabled by default (`config.stats_enabled = true`). Old stats are cleaned up by the dispatcher based on `config.stats_retention` (default: 30 days). If the migration hasn't been run yet, stat recording is silently skipped.
745
705
 
746
- ## Database tables
706
+ ### Database tables
747
707
 
748
708
  Pgbus uses these tables (created via PGMQ and migrations):
749
709
 
@@ -764,16 +724,85 @@ Pgbus uses these tables (created via PGMQ and migrations):
764
724
  | `pgbus_recurring_tasks` | Recurring job definitions |
765
725
  | `pgbus_recurring_executions` | Recurring job execution history |
766
726
 
767
- ## Switching from another backend
727
+ ### Switching from another backend
768
728
 
769
729
  Already using a different job processor? These guides walk you through the migration:
770
730
 
771
- - **[Switch from Sidekiq](docs/switch_from_sidekiq.md)** -- remove Redis, convert native workers, replace middleware with callbacks
772
- - **[Switch from SolidQueue](docs/switch_from_solid_queue.md)** -- similar architecture, swap config format, gain LISTEN/NOTIFY + worker recycling
773
- - **[Switch from GoodJob](docs/switch_from_good_job.md)** -- both PostgreSQL-native, swap advisory locks for PGMQ visibility timeouts
731
+ - **[Switch from Sidekiq](docs/switch_from_sidekiq.md)** -- remove Redis, convert native workers, replace middleware with callbacks
732
+ - **[Switch from SolidQueue](docs/switch_from_solid_queue.md)** -- similar architecture, swap config format, gain LISTEN/NOTIFY + worker recycling
733
+ - **[Switch from GoodJob](docs/switch_from_good_job.md)** -- both PostgreSQL-native, swap advisory locks for PGMQ visibility timeouts
774
734
 
775
735
  See [docs/README.md](docs/README.md) for a full feature comparison table.
776
736
 
737
+ ## Reference
738
+
739
+ Architectural overview and the full list of configuration settings.
740
+
741
+ ### Architecture
742
+
743
+ ```text
744
+ Supervisor (fork manager)
745
+ ├── Worker 1 (queues: [default, mailers], threads: 10, priority: 10)
746
+ ├── Worker 2 (queues: [critical], threads: 5, single_active_consumer: true)
747
+ ├── Dispatcher (maintenance: cleanup, compaction, reaping, circuit breaker)
748
+ ├── Scheduler (recurring tasks via cron)
749
+ ├── Consumer (event bus topics)
750
+ └── Outbox Poller (transactional outbox → PGMQ, when enabled)
751
+
752
+ PostgreSQL + PGMQ
753
+ ├── pgbus_default (job queue)
754
+ ├── pgbus_default_dlq (dead letter queue)
755
+ ├── pgbus_critical (job queue)
756
+ ├── pgbus_critical_dlq (dead letter queue)
757
+ ├── pgbus_mailers (job queue)
758
+ └── pgbus_queue_states (pause/resume + circuit breaker state)
759
+ ```
760
+
761
+ #### How a job flows through the system
762
+
763
+ 1. **Enqueue**: ActiveJob serializes the job to JSON, Pgbus sends it to the appropriate PGMQ queue
764
+ 2. **Read**: Workers poll queues (or wake instantly via LISTEN/NOTIFY) and claim messages with a visibility timeout
765
+ 3. **Execute**: The job is deserialized and executed within the Rails executor
766
+ 4. **Archive/Retry**: On success, the message is archived. On failure, the visibility timeout expires and the message becomes available again. PGMQ's `read_ct` tracks delivery attempts
767
+ 5. **Dead letter**: When `read_ct` exceeds `max_retries`, the message is moved to the `_dlq` queue for manual inspection
768
+
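Steps 4-5 condense into a small control-flow sketch. This is illustrative; `next_step` is a hypothetical helper, not pgbus code:

```ruby
# Success archives the message; failure leaves it to reappear after the
# visibility timeout; a read_ct beyond max_retries routes it to the DLQ.
def next_step(message, max_retries: 5)
  return :archive if message[:succeeded]
  message[:read_ct] > max_retries ? :dead_letter : :retry_after_visibility_timeout
end
```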
769
+ ### Configuration reference
770
+
771
+ | Option | Default | Description |
772
+ |--------|---------|-------------|
773
+ | `database_url` | `nil` | PostgreSQL connection URL (auto-detected in Rails) |
774
+ | `queue_prefix` | `"pgbus"` | Prefix for all PGMQ queue names |
775
+ | `default_queue` | `"default"` | Default queue for jobs without explicit queue |
776
+ | `pool_size` | `nil` (auto) | Connection pool size. Auto-tuned from worker thread counts: `sum(workers.threads) + sum(event_consumers.threads) + 2`. Set explicitly to override. |
777
+ | `workers` | `[{queues: ["default"], threads: 5}]` | Worker capsule definitions. String DSL (`"default: 5; critical: 10"`), Array, or `nil`. |
778
+ | `event_consumers` | `nil` | Event consumer process definitions (same format as workers) |
779
+ | `roles` | `nil` (all) | Supervisor role filter — usually set via CLI flags (`--workers-only` etc.) |
780
+ | `polling_interval` | `0.1` | Seconds between polls (LISTEN/NOTIFY is primary) |
781
+ | `visibility_timeout` | `30` | Time before unacked message becomes visible again. Accepts seconds or `ActiveSupport::Duration` (e.g. `10.minutes`) |
782
+ | `max_retries` | `5` | Failed reads before routing to dead letter queue |
783
+ | `max_jobs_per_worker` | `nil` | Recycle worker after N jobs (nil = unlimited) |
784
+ | `max_memory_mb` | `nil` | Recycle worker when memory exceeds N MB |
785
+ | `max_worker_lifetime` | `nil` | Recycle worker after N seconds. Accepts seconds or Duration. |
786
+ | `listen_notify` | `true` | Use PGMQ's LISTEN/NOTIFY for instant wake-up |
787
+ | `prefetch_limit` | `nil` | Max in-flight messages per worker (nil = unlimited) |
788
+ | `dispatch_interval` | `1.0` | Seconds between dispatcher maintenance ticks |
789
+ | `circuit_breaker_enabled` | `true` | Enable auto-pause on consecutive failures (threshold and backoff are tuned via `Pgbus::CircuitBreaker` constants) |
790
+ | `priority_levels` | `nil` | Number of priority sub-queues (nil = disabled, 2-10) |
791
+ | `default_priority` | `1` | Default priority for jobs without explicit priority |
792
+ | `archive_retention` | `7.days` | How long to keep archived messages. Accepts seconds, Duration, or `nil` to disable cleanup |
793
+ | `outbox_enabled` | `false` | Enable transactional outbox poller process |
794
+ | `outbox_poll_interval` | `1.0` | Seconds between outbox poll cycles |
795
+ | `outbox_batch_size` | `100` | Max entries per outbox poll cycle |
796
+ | `outbox_retention` | `1.day` | How long to keep published outbox entries. Accepts seconds, Duration, or `nil` to disable cleanup |
797
+ | `idempotency_ttl` | `7.days` | How long to keep processed event records. Accepts seconds, Duration, or `nil` to disable cleanup |
798
+ | `base_controller_class` | `"::ActionController::Base"` | Base class for dashboard controllers (string, constantized at load time) |
799
+ | `return_to_app_url` | `nil` | URL for "back to app" button in dashboard nav (nil hides the button) |
800
+ | `web_auth` | `nil` | Lambda for dashboard authentication |
801
+ | `web_refresh_interval` | `5000` | Dashboard auto-refresh interval in milliseconds |
802
+ | `web_live_updates` | `true` | Enable Turbo Frames auto-refresh on dashboard |
803
+ | `stats_enabled` | `true` | Record job execution stats for insights dashboard |
804
+ | `stats_retention` | `30.days` | How long to keep job stats. Accepts seconds, Duration, or `nil` to disable cleanup |
805
+
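As a worked example, an initializer exercising a few of the settings above. The option names come from the table; the values are illustrative choices, not recommendations:

```ruby
Pgbus.configure do |config|
  config.default_queue      = "default"
  config.visibility_timeout = 10.minutes              # seconds or Duration
  config.max_retries        = 5
  config.workers            = "default: 5; critical: 10"  # string DSL
  config.archive_retention  = 7.days
  config.stats_retention    = 30.days
end
```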
777
806
  ## Development
778
807
 
779
808
  ```bash