pgbus 0.6.9 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 84c8f674c6aa6d83d0120f586b93f0fc4126ccb5d72378ec04a9b8cb3d464677
4
- data.tar.gz: 2544b1be7c34ccf50493ce0af717c3069e979d95ded26197340d8133d637378f
3
+ metadata.gz: c45dd364f341b6819b7901f583e058dbe761b375210e4162f19faf75917e3043
4
+ data.tar.gz: ed5ee189a3ff3d7fe0610deada3b2daba3726a2647d7762c58735e0771c9e0eb
5
5
  SHA512:
6
- metadata.gz: bb329bb54c6eb7da4341eb8050657007d4fe057e862db7b8ef3c6cb0612abff33eaa383d7ea0ce00844dab1a532bd4b45ad3c52618c9ef45b7f9cf9d693ba199
7
- data.tar.gz: 8ed028896561f51cb8a904f8e692125e6e61335501fceae366f5cda3f22daf7379b1b83d472df940c4965e2f0b2e00aed30ab90e3227d185e797ee17fb9654b6
6
+ metadata.gz: e28c032dc7b4f2cba37bd709c4a45030bcedd90274b86a1499d55dd4f4e255769e580983023602a42267b8728068a7eb11f7fdcbc8d12bd3efd83f35a50f241b
7
+ data.tar.gz: dc6d8d5d2e4feebbf7d940c173f53700549b5364c0e15df3f6afac1d72ecc6b1222127d85649016e9c6590e3c184eb9365da95c424a21a98be683862f4a24e67
data/README.md CHANGED
@@ -23,6 +23,7 @@ PostgreSQL-native job processing and event bus for Rails, built on [PGMQ](https:
23
23
  - [Circuit breaker and queue pause/resume](#circuit-breaker-and-queue-pauseresume)
24
24
  - [Prefetch flow control](#prefetch-flow-control)
25
25
  - [Worker recycling](#worker-recycling)
26
+ - [Retry backoff](#retry-backoff)
26
27
  - [Routing and ordering](#routing-and-ordering)
27
28
  - [Priority queues](#priority-queues)
28
29
  - [Consumer priority](#consumer-priority)
@@ -31,6 +32,10 @@ PostgreSQL-native job processing and event bus for Rails, built on [PGMQ](https:
31
32
  - [Batches](#batches)
32
33
  - [Transactional outbox](#transactional-outbox)
33
34
  - [Archive compaction](#archive-compaction)
35
+ - [Observability](#observability)
36
+ - [Error reporting](#error-reporting)
37
+ - [Structured logging](#structured-logging)
38
+ - [Queue health monitoring](#queue-health-monitoring)
34
39
  - [Real-time broadcasts](#real-time-broadcasts-turbo-streams-replacement)
35
40
  - [Operations](#operations)
36
41
  - [CLI](#cli)
@@ -63,6 +68,10 @@ PostgreSQL-native job processing and event bus for Rails, built on [PGMQ](https:
63
68
  - **Single active consumer** -- advisory-lock-based exclusive queue processing for strict ordering
64
69
  - **Consumer priority** -- higher-priority workers get first dibs, lower-priority workers back off
65
70
  - **Job uniqueness** -- prevent duplicate jobs with reaper-based crash recovery, no TTL-driven expiry
71
+ - **Retry backoff** -- exponential backoff with jitter for VT-based retries, per-job overrides
72
+ - **Error reporting** -- pluggable error reporters for APM integration (Appsignal, Sentry, etc.)
73
+ - **Structured logging** -- JSON log formatter with component extraction and thread-local context
74
+ - **Queue health** -- dead tuple monitoring, autovacuum tuning, Prometheus metrics
66
75
 
67
76
  ## Requirements
68
77
 
@@ -463,6 +472,34 @@ end
463
472
 
464
473
  When a limit is hit, the worker drains its thread pool, exits, and the supervisor forks a fresh process. RSS memory is sampled from `/proc/self/statm` (Linux) or `ps -o rss` (macOS).
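The Linux sampling path can be sketched in a few lines. This is a hedged standalone version, not the gem's actual sampler: field 2 of `/proc/self/statm` is the resident set in pages, and the page size is read via `Etc.sysconf`.

```ruby
require "etc"

# Hedged sketch of RSS sampling from /proc/self/statm on Linux.
# The gem's internal sampler may differ in detail.
def rss_mb
  pages = File.read("/proc/self/statm").split[1].to_i  # field 2 = resident pages
  page_size = Etc.sysconf(Etc::SC_PAGESIZE)            # usually 4096 bytes
  pages * page_size / (1024 * 1024)
end
```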
465
474
 
475
+ ### Retry backoff
476
+
477
+ When a job fails, Pgbus extends the PGMQ visibility timeout with exponential backoff so retries are spread out instead of bunched at fixed intervals:
478
+
479
+ ```ruby
480
+ Pgbus.configure do |config|
481
+ config.retry_backoff = 5 # base delay (seconds)
482
+ config.retry_backoff_max = 300 # cap at 5 minutes
483
+ config.retry_backoff_jitter = 0.15 # ±15% randomization
484
+ end
485
+ ```
486
+
487
+ The delay formula is `base * 2^(attempt-1) * (1 + random_jitter)`. For a job that fails 4 times with defaults: ~5s, ~10s, ~20s, ~40s before hitting DLQ on the 5th read.
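The formula can be checked with a standalone sketch (illustrative only, not the gem's internal code), using the defaults above:

```ruby
# Illustrative computation of the documented backoff formula:
#   base * 2^(attempt-1) * (1 + random_jitter), capped at the max.
BASE   = 5      # retry_backoff
MAX    = 300    # retry_backoff_max
JITTER = 0.15   # retry_backoff_jitter

def backoff_delay(attempt, rng: Random.new)
  delay = [BASE * (2**(attempt - 1)), MAX].min
  delay * (1 + rng.rand(-JITTER..JITTER))
end

delays = (1..4).map { |n| backoff_delay(n) }
# each value lands within ±15% of 5, 10, 20, 40 seconds
```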
488
+
489
+ Jobs can override the global settings per-class:
490
+
491
+ ```ruby
492
+ class FragileApiJob < ApplicationJob
493
+ include Pgbus::RetryBackoff::JobMixin
494
+
495
+ pgbus_retry_backoff base: 10, max: 600, jitter: 0.2
496
+
497
+ def perform(...)
498
+ # ...
499
+ end
500
+ end
501
+ ```
502
+
466
503
  ### Async execution mode (fibers)
467
504
 
468
505
  Workers can optionally execute jobs as fibers instead of threads. This is ideal for I/O-bound workloads (HTTP calls, email delivery, LLM API calls) where jobs spend most of their time waiting on network I/O.
@@ -662,6 +699,96 @@ as configuration. The dispatcher runs archive compaction as part of its
662
699
  maintenance loop, deleting archived messages older than `archive_retention`
663
700
  in batches to avoid long-running transactions.
664
701
 
702
+ ## Observability
703
+
704
+ Error reporting, structured logging, and queue health monitoring.
705
+
706
+ ### Error reporting
707
+
708
+ By default, Pgbus logs caught exceptions and continues. To route them to your APM service (Appsignal, Sentry, Honeybadger, etc.), push callable reporters onto `config.error_reporters`:
709
+
710
+ ```ruby
711
+ Pgbus.configure do |c|
712
+ c.error_reporters << ->(ex, ctx) {
713
+ Appsignal.set_error(ex) { |t| t.set_tags(ctx) }
714
+ }
715
+ end
716
+ ```
717
+
718
+ Each reporter receives `(exception, context_hash)`. The context hash includes keys like `action`, `queue`, `job_class`, and `msg_id` depending on the call site. Reporters that accept a third argument also receive the Pgbus configuration object.
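The two- versus three-argument dispatch can be sketched in plain Ruby. The handler bodies and the `config` hash below are hypothetical stand-ins, and the sketch uses a simple `arity` check (the shipped `ErrorReporter` also handles variadic handlers):

```ruby
# Plain-Ruby sketch of the reporter dispatch rule described above.
def dispatch(handler, exception, context, config)
  if handler.arity == 3
    handler.call(exception, context, config)
  else
    handler.call(exception, context)
  end
end

two_arg   = ->(ex, ctx) { "#{ex.class} in #{ctx[:action]}" }
three_arg = ->(ex, ctx, config) { "#{ex.class} (env=#{config[:env]})" }

err = RuntimeError.new("boom")
dispatch(two_arg,   err, { action: "execute_job" }, { env: "test" })
# => "RuntimeError in execute_job"
dispatch(three_arg, err, { action: "execute_job" }, { env: "test" })
# => "RuntimeError (env=test)"
```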
719
+
720
+ Reporters are wired into all critical rescue paths: job execution failures, worker fetch/process errors, dispatcher maintenance, supervisor fork failures, circuit breaker trips, outbox publish errors, and failed event recording. Non-critical paths (dashboard queries, stat recording) remain log-only.
721
+
722
+ `ErrorReporter.report` is guaranteed to never raise — if a reporter or the logger itself throws, the error is swallowed silently. This preserves fault-tolerance invariants at every rescue site.
723
+
724
+ ### Structured logging
725
+
726
+ Pgbus ships two log formatters inspired by Sidekiq's `Logger::Formatters`:
727
+
728
+ ```ruby
729
+ Pgbus.configure do |c|
730
+ c.log_format = :json # or :text (default)
731
+ end
732
+ ```
733
+
734
+ **Text format** (default):
735
+
736
+ ```text
737
+ INFO 2025-01-15T10:30:00.000Z pid=1234 tid=abc queue=default: Starting job
738
+ ```
739
+
740
+ **JSON format**:
741
+
742
+ ```json
743
+ {"ts":"2025-01-15T10:30:00.000Z","pid":1234,"tid":"abc","lvl":"INFO","component":"Pgbus","msg":"Starting job","ctx":{"queue":"default"}}
744
+ ```
745
+
746
+ The JSON formatter extracts `[Pgbus]` and `[Pgbus::Web]` prefixes from log messages into a separate `component` field so the `msg` field stays clean for log aggregators. Thread-local context can be added via `Pgbus::LogFormatter.with_context(queue: "default") { ... }` and appears under the `ctx` key.
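The prefix extraction can be illustrated in isolation with the same regex idea (a standalone sketch, not the shipped formatter class):

```ruby
# Standalone sketch of the component-prefix extraction described above.
PREFIX = /\A\[([^\]]+)\]\s*/

msg       = "[Pgbus::Web] Starting job"
component = msg[PREFIX, 1]       # capture group: "Pgbus::Web"
clean_msg = msg.sub(PREFIX, "")  # "Starting job" (msg stays clean for aggregators)
```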
747
+
748
+ You can also set a formatter directly on the logger:
749
+
750
+ ```ruby
751
+ Pgbus.configure do |c|
752
+ c.logger.formatter = Pgbus::LogFormatter::JSON.new
753
+ end
754
+ ```
755
+
756
+ ### Queue health monitoring
757
+
758
+ The dashboard includes a **Queue Health** panel showing PostgreSQL vacuum stats per PGMQ table: dead tuple counts, live tuple counts, bloat ratio (dead / total), last vacuum age, and MVCC horizon age. The same stats appear on individual queue detail pages.
759
+
760
+ #### Autovacuum tuning
761
+
762
+ PGMQ queue tables have high insert/delete churn that overwhelms PostgreSQL's default autovacuum settings. Pgbus applies aggressive per-table tuning automatically:
763
+
764
+ - **New queues at runtime**: `Client#ensure_single_queue` applies tuning after `pgmq.create()`
765
+ - **Existing installations**: `rails generate pgbus:update` detects untuned tables
766
+ - **Fresh installs**: The install migration includes tuning for the default queue
767
+
768
+ To apply tuning manually or after `db:schema:load` (which loses `ALTER TABLE` settings):
769
+
770
+ ```bash
771
+ rails generate pgbus:tune_autovacuum # Generate migration
772
+ rails generate pgbus:tune_autovacuum --database=pgbus # For separate database
773
+ ```
774
+
775
+ The `pgbus:tune_autovacuum` rake task also hooks into `db:schema:load` automatically.
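For reference, per-table tuning like this boils down to standard PostgreSQL `ALTER TABLE ... SET` storage parameters. The table name and values below are assumptions for illustration (PGMQ names queue tables `pgmq.q_<name>`); the exact settings Pgbus applies may differ:

```ruby
# Hypothetical migration sketch; the storage parameters are standard
# PostgreSQL, but the specific values are illustrative assumptions.
class TunePgbusAutovacuum < ActiveRecord::Migration[7.1]
  def up
    execute <<~SQL
      ALTER TABLE pgmq.q_pgbus_default SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold    = 1000,
        autovacuum_vacuum_cost_delay   = 0
      );
    SQL
  end
end
```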
776
+
777
+ #### Prometheus metrics
778
+
779
+ When `config.metrics_enabled = true` (default), the dashboard exposes Prometheus-compatible gauges:
780
+
781
+ | Metric | Description |
782
+ |--------|-------------|
783
+ | `pgbus_table_dead_tuples` | Dead tuple count per PGMQ table |
784
+ | `pgbus_table_live_tuples` | Live tuple count per PGMQ table |
785
+ | `pgbus_table_bloat_ratio` | Dead / (dead + live) per table |
786
+ | `pgbus_table_last_vacuum_age_seconds` | Seconds since last vacuum per table |
787
+ | `pgbus_oldest_transaction_age_seconds` | MVCC horizon pin risk |
788
+ | `pgbus_worker_pool_capacity` | Total worker thread slots |
789
+ | `pgbus_worker_pool_busy` | Currently busy worker threads |
790
+ | `pgbus_worker_pool_utilization` | Busy / capacity ratio |
791
+
665
792
  ## Real-time broadcasts (turbo-streams replacement)
666
793
 
667
794
  Pgbus ships a drop-in replacement for turbo-rails' `turbo_stream_from` helper that fixes several well-known ActionCable correctness bugs by using PGMQ message IDs as a replay cursor. Same API as turbo-rails. No Redis. No ActionCable. No lost messages on reconnect.
@@ -751,6 +878,34 @@ One Puma worker (or Falcon reactor) hosts one `Pgbus::Web::Streamer::Instance` s
751
878
 
752
879
  Per-stream retention is handled by the main pgbus dispatcher process on the same interval as the dispatcher's `ARCHIVE_COMPACTION_INTERVAL` constant. Streams default to a 5-minute retention because SSE clients reconnect within seconds; chat-style applications override the retention to days via `streams_retention`.
753
880
 
881
+ ### Stream name helpers
882
+
883
+ Apps using UUID primary keys with turbo-rails-style dom IDs can hit PGMQ's 47-character queue-name ceiling (`"gid://app/Ai::Chat/9c14e8b2-...:messages"` exceeds the limit before the `pgbus_` prefix is even added). Pgbus provides helpers to generate short, collision-safe stream names:
884
+
885
+ ```ruby
886
+ # In your ApplicationRecord
887
+ class ApplicationRecord < ActiveRecord::Base
888
+ primary_abstract_class
889
+ include Pgbus::Streams::Streamable
890
+ end
891
+ ```
892
+
893
+ This gives every model `short_id` (16-hex SHA-256 prefix of the GlobalID) and `to_stream_key`:
894
+
895
+ ```ruby
896
+ chat = Ai::Chat.find("9c14e8b2-...")
897
+ chat.short_id # => "ai_chat_a3f8c1e9d2b47610"
898
+ chat.to_stream_key # => "ai_chat_a3f8c1e9d2b47610"
899
+
900
+ # Compose multi-part stream names
901
+ Pgbus.stream_key(chat, :messages) # => "ai_chat_a3f8c1e9d2b47610_messages"
902
+
903
+ # Use in views
904
+ <%= pgbus_stream_from Pgbus.stream_key(@chat, :messages) %>
905
+ ```
906
+
907
+ The budget is computed from `config.queue_prefix` at call time so prefix overrides adjust automatically. If a stream name exceeds the budget, `Pgbus::Streams::StreamNameTooLong` is raised immediately with the offending name, computed budget, and a pointer to `Pgbus.stream_key` — before PGMQ is ever touched.
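Assuming `short_id` is the 16-hex SHA-256 prefix described above, joined to an underscored model name, the derivation can be sketched as follows (the GlobalID value here is made up, and Pgbus's exact scheme may differ):

```ruby
require "digest"

# Hedged sketch of the short_id derivation: 16-hex SHA-256 prefix of the
# record's GlobalID, prefixed with the underscored model name.
gid      = "gid://app/Ai::Chat/9c14e8b2-1f00-4c55-9e1b-0a9f1d2e3c4d"  # hypothetical
digest   = Digest::SHA256.hexdigest(gid)[0, 16]
short_id = "ai_chat_#{digest}"

short_id.length  # 24 characters, comfortably under the 47-character ceiling
```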
908
+
754
909
  ### Transactional broadcasts
755
910
 
756
911
  **This is the feature no other Rails real-time stack can offer.** A broadcast issued inside an open ActiveRecord transaction is deferred until the transaction commits. If it rolls back, the broadcast silently drops — clients never see the change that the database never persisted.
@@ -1001,6 +1156,8 @@ Pgbus uses these tables (created via PGMQ and migrations):
1001
1156
  | `pgbus_outbox_entries` | Transactional outbox entries pending publication |
1002
1157
  | `pgbus_recurring_tasks` | Recurring job definitions |
1003
1158
  | `pgbus_recurring_executions` | Recurring job execution history |
1159
+ | `pgbus_presence_members` | Stream presence tracking (who is subscribed) |
1160
+ | `pgbus_stream_stats` | Stream broadcast/connect/disconnect metrics (opt-in) |
1004
1161
 
1005
1162
  ### Switching from another backend
1006
1163
 
@@ -1058,6 +1215,9 @@ PostgreSQL + PGMQ
1058
1215
  | `polling_interval` | `0.1` | Seconds between polls (LISTEN/NOTIFY is primary) |
1059
1216
  | `visibility_timeout` | `30` | Time before unacked message becomes visible again. Accepts seconds or `ActiveSupport::Duration` (e.g. `10.minutes`) |
1060
1217
  | `max_retries` | `5` | Failed reads before routing to dead letter queue |
1218
+ | `retry_backoff` | `5` | Base delay in seconds for VT-based retry backoff (exponential: `base * 2^(attempt-1)`) |
1219
+ | `retry_backoff_max` | `300` | Maximum retry delay in seconds (caps the exponential curve) |
1220
+ | `retry_backoff_jitter` | `0.15` | Jitter factor (0-1) added to retry delays to spread retries |
1061
1221
  | `max_jobs_per_worker` | `nil` | Recycle worker after N jobs (nil = unlimited) |
1062
1222
  | `max_memory_mb` | `nil` | Recycle worker when memory exceeds N MB |
1063
1223
  | `max_worker_lifetime` | `nil` | Recycle worker after N seconds. Accepts seconds or Duration. |
@@ -1080,6 +1240,12 @@ PostgreSQL + PGMQ
1080
1240
  | `web_live_updates` | `true` | Enable Turbo Frames auto-refresh on dashboard |
1081
1241
  | `stats_enabled` | `true` | Record job execution stats for insights dashboard |
1082
1242
  | `stats_retention` | `30.days` | How long to keep job stats. Accepts seconds, Duration, or `nil` to disable cleanup |
1243
+ | `streams_stats_enabled` | `false` | Record stream broadcast/connect/disconnect stats (opt-in, can be high volume) |
1244
+ | `streams_path` | `nil` | Custom URL path for the SSE endpoint (nil = auto-detected from engine mount) |
1245
+ | `execution_mode` | `:threads` | Global execution mode (`:threads` or `:async`). Per-worker override via capsule config. |
1246
+ | `error_reporters` | `[]` | Array of callables invoked on caught exceptions. Each receives `(exception, context_hash)`. |
1247
+ | `log_format` | `:text` | Log formatter (`:text` or `:json`). Sets `logger.formatter` automatically. |
1248
+ | `metrics_enabled` | `true` | Enable Prometheus-compatible metrics on the dashboard |
1083
1249
 
1084
1250
  ## Development
1085
1251
 
@@ -13,6 +13,10 @@ module Pgbus
13
13
  @stat_buffer = stat_buffer
14
14
  end
15
15
 
16
+ # Exceptions we never want to swallow — let the process die/signal propagate.
17
+ FATAL_EXCEPTIONS = [SystemExit, Interrupt, SignalException, NoMemoryError, SystemStackError].freeze
18
+ private_constant :FATAL_EXCEPTIONS
19
+
16
20
  def execute(message, queue_name, source_queue: nil)
17
21
  execution_start = monotonic_now
18
22
  payload = JSON.parse(message.message)
@@ -51,18 +55,36 @@ module Pgbus
51
55
 
52
56
  job_succeeded = false
53
57
 
58
+ # Debug-level phase markers. Silent at INFO+, but invaluable when a
59
+ # fiber interrupt or connection issue loses control flow between phases
60
+ # (issue #126). Each line identifies msg_id + phase so the gap is
61
+ # visible in logs: "deserialized" without "archived" means the job
62
+ # ran but its message was never archived.
63
+ msg_id = message.msg_id.to_i
54
64
  Instrumentation.instrument("pgbus.executor.execute", queue: queue_name, job_class: job_class) do
65
+ Pgbus.logger.debug { "[Pgbus] Executor phase=deserialize msg_id=#{msg_id} job=#{job_class}" }
55
66
  job = ::ActiveJob::Base.deserialize(payload)
67
+ Pgbus.logger.debug { "[Pgbus] Executor phase=perform msg_id=#{msg_id} job=#{job_class}" }
56
68
  execute_job(job)
57
- archive_from(queue_name, message.msg_id.to_i, source_queue: source_queue)
58
- FailedEventRecorder.clear!(queue_name: queue_name, msg_id: message.msg_id.to_i)
69
+ Pgbus.logger.debug { "[Pgbus] Executor phase=archive msg_id=#{msg_id} job=#{job_class}" }
70
+ archive_from(queue_name, msg_id, source_queue: source_queue)
71
+ FailedEventRecorder.clear!(queue_name: queue_name, msg_id: msg_id)
59
72
  job_succeeded = true
73
+ Pgbus.logger.debug { "[Pgbus] Executor phase=succeeded msg_id=#{msg_id} job=#{job_class}" }
60
74
  end
61
75
 
62
76
  instrument("pgbus.job_completed", queue: queue_name, job_class: job_class)
63
77
  record_stat(payload, queue_name, "success", execution_start, message: message)
64
78
  :success
65
- rescue StandardError => e
79
+ rescue *FATAL_EXCEPTIONS
80
+ # Process-fatal: propagate so the supervisor/OS can react.
81
+ raise
82
+ rescue Exception => e # rubocop:disable Lint/RescueException
83
+ # Widened from StandardError to catch Async::Stop / Async::Cancel
84
+ # (both inherit from Exception, not StandardError) under execution_mode: :async.
85
+ # Before this, a fiber interruption between perform_now and archive_from
86
+ # silently lost control flow — no failed event row, no job_failed
87
+ # notification, uniqueness lock held until VT expired. See issue #126.
66
88
  handle_failure(message, queue_name, e, payload: payload)
67
89
  instrument("pgbus.job_failed", queue: queue_name, job_class: payload&.dig("job_class"), error: e.class.name)
68
90
  record_stat(payload, queue_name, "failed", execution_start, message: message)
@@ -146,7 +168,9 @@ module Pgbus
146
168
  end
147
169
 
148
170
  def handle_failure(message, queue_name, error, payload: nil)
149
- Pgbus.logger.error { "[Pgbus] Job failed: #{error.class}: #{error.message}" }
171
+ ctx = { action: "execute_job", queue: queue_name, job_class: payload&.dig("job_class"),
172
+ msg_id: message.msg_id.to_i, read_ct: message.read_ct.to_i }
173
+ ErrorReporter.report(error, ctx)
150
174
  Pgbus.logger.debug { error.backtrace&.join("\n") }
151
175
 
152
176
  # Record failure for dashboard visibility.
@@ -92,7 +92,7 @@ module Pgbus
92
92
  @failure_counts.delete(queue_name)
93
93
  invalidate_cache(queue_name)
94
94
  rescue StandardError => e
95
- Pgbus.logger.error { "[Pgbus] Circuit breaker trip failed for #{queue_name}: #{e.message}" }
95
+ ErrorReporter.report(e, { action: "circuit_breaker_trip", queue: queue_name })
96
96
  end
97
97
 
98
98
  def check_paused(queue_name)
@@ -27,7 +27,9 @@ module Pgbus
27
27
  # sensitive and need every broadcast to fire a NOTIFY, even
28
28
  # when several are batched within a single millisecond.
29
29
  # Override the throttle to 0 specifically for stream queues.
30
- synchronized { @pgmq.enable_notify_insert(full_name, throttle_interval_ms: 0) } if config.listen_notify
30
+ # Use the idempotent path to avoid deadlocks when multiple
31
+ # processes race to set up the same stream queue.
32
+ synchronized { enable_notify_if_needed(full_name, 0) }
31
33
 
32
34
  # CREATE INDEX IF NOT EXISTS is idempotent in Postgres but still
33
35
  # requires a roundtrip and a brief ACCESS SHARE lock on the archive
data/lib/pgbus/client.rb CHANGED
@@ -156,10 +156,17 @@ module Pgbus
156
156
  # Read from multiple queues in a single SQL query (UNION ALL).
157
157
  # Each returned message includes a queue_name field identifying its source.
158
158
  # queue_names should be logical names (prefix is added automatically).
159
- def read_multi(queue_names, qty:, vt: nil)
159
+ #
160
+ # `qty` is the per-queue cap (pgmq-ruby semantics), so without `limit:` the
161
+ # caller receives up to `queue_count * qty` messages. Pass `limit:` to cap
162
+ # the total across all queues — required when feeding a fixed-size pool,
163
+ # otherwise the pool can overflow on multi-queue reads (issue #123).
164
+ def read_multi(queue_names, qty:, vt: nil, limit: nil)
160
165
  full_names = queue_names.map { |q| config.queue_name(q) }
161
- Instrumentation.instrument("pgbus.client.read_multi", queues: full_names, qty: qty) do
162
- synchronized { @pgmq.read_multi(full_names, vt: vt || config.visibility_timeout, qty: qty) }
166
+ Instrumentation.instrument("pgbus.client.read_multi", queues: full_names, qty: qty, limit: limit) do
167
+ synchronized do
168
+ @pgmq.read_multi(full_names, vt: vt || config.visibility_timeout, qty: qty, limit: limit)
169
+ end
163
170
  end
164
171
  end
165
172
 
@@ -450,12 +457,48 @@ module Pgbus
450
457
  synchronized do
451
458
  @pgmq.create(full_name)
452
459
  tune_autovacuum(full_name)
453
- @pgmq.enable_notify_insert(full_name, throttle_interval_ms: NOTIFY_THROTTLE_MS) if config.listen_notify
460
+ enable_notify_if_needed(full_name, NOTIFY_THROTTLE_MS)
454
461
  end
455
462
  true
456
463
  end
457
464
  end
458
465
 
466
+ def enable_notify_if_needed(full_name, throttle_ms)
467
+ return unless config.listen_notify
468
+ return if notify_trigger_current?(full_name, throttle_ms)
469
+
470
+ @pgmq.enable_notify_insert(full_name, throttle_interval_ms: throttle_ms)
471
+ end
472
+
473
+ # Check whether the NOTIFY trigger already exists on this queue with the
474
+ # expected throttle interval. When it does, we can skip the destructive
475
+ # DROP TRIGGER + CREATE TRIGGER cycle that causes deadlocks when multiple
476
+ # forked processes race during bootstrap.
477
+ def notify_trigger_current?(full_name, throttle_ms)
478
+ with_raw_connection do |conn|
479
+ result = conn.exec_params(<<~SQL, [full_name, throttle_ms])
480
+ SELECT 1
481
+ FROM pg_trigger t
482
+ JOIN pg_class c ON t.tgrelid = c.oid
483
+ JOIN pg_namespace n ON c.relnamespace = n.oid
484
+ WHERE n.nspname = 'pgmq'
485
+ AND c.relname = pgmq.format_table_name($1, 'q')
486
+ AND t.tgname = 'trigger_notify_queue_insert_listeners'
487
+ AND EXISTS (
488
+ SELECT 1 FROM pgmq.notify_insert_throttle
489
+ WHERE queue_name = $1
490
+ AND throttle_interval_ms = $2
491
+ )
492
+ LIMIT 1
493
+ SQL
494
+ result.ntuples.positive?
495
+ end
496
+ rescue StandardError
497
+ # If we can't check (e.g. pgmq schema not fully ready), fall back to
498
+ # the unconditional path — same behavior as before this fix.
499
+ false
500
+ end
501
+
459
502
  def tune_autovacuum(queue_name)
460
503
  with_raw_connection do |conn|
461
504
  conn.exec(AutovacuumTuning.sql_for_queue(queue_name))
@@ -59,6 +59,11 @@ module Pgbus
59
59
 
60
60
  # Logging
61
61
  attr_accessor :logger
62
+ attr_reader :log_format # rubocop:disable Style/AccessorGrouping
63
+
64
+ # Error reporting — array of callable objects invoked on caught exceptions.
65
+ # Each receives (exception, context_hash) or (exception, context_hash, config).
66
+ attr_accessor :error_reporters
62
67
 
63
68
  # LISTEN/NOTIFY. Only the on/off switch is user-facing — the throttle
64
69
  # interval is a Postgres-side tuning knob that lives as a constant on
@@ -140,6 +145,8 @@ module Pgbus
140
145
  @allowed_global_id_models = nil # nil = allow all (for backwards compat)
141
146
 
142
147
  @logger = (defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger) || Logger.new($stdout)
148
+ @log_format = :text
149
+ @error_reporters = []
143
150
 
144
151
  @listen_notify = true
145
152
 
@@ -224,6 +231,21 @@ module Pgbus
224
231
  ExecutionPools.normalize_mode(mode)
225
232
  end
226
233
 
234
+ VALID_LOG_FORMATS = %i[text json].freeze
235
+
236
+ def log_format=(format)
237
+ format = format.to_sym
238
+ unless VALID_LOG_FORMATS.include?(format)
239
+ raise ArgumentError, "Invalid log_format: #{format}. Must be one of: #{VALID_LOG_FORMATS.join(", ")}"
240
+ end
241
+
242
+ @log_format = format
243
+ @logger.formatter = case format
244
+ when :json then LogFormatter::JSON.new
245
+ when :text then LogFormatter::Text.new
246
+ end
247
+ end
248
+
227
249
  VALID_PGMQ_SCHEMA_MODES = %i[auto extension embedded].freeze
228
250
 
229
251
  def pgmq_schema_mode=(mode)
@@ -0,0 +1,48 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgbus
4
+ # Central error reporting module. Iterates all configured error reporters
5
+ # and logs the error. Inspired by Sidekiq's error_handlers pattern.
6
+ #
7
+ # Usage:
8
+ # Pgbus::ErrorReporter.report(exception, { queue: "default" })
9
+ #
10
+ # Configuration:
11
+ # Pgbus.configure do |c|
12
+ # c.error_reporters << ->(ex, ctx) { Appsignal.set_error(ex) { |t| t.set_tags(ctx) } }
13
+ # end
14
+ module ErrorReporter
15
+ module_function
16
+
17
+ def report(exception, context = {}, config: Pgbus.configuration)
18
+ log_error(exception, context, config: config)
19
+
20
+ config.error_reporters.each do |handler|
21
+ call_handler(handler, exception, context, config)
22
+ rescue Exception => e # rubocop:disable Lint/RescueException
23
+ config.logger.error { "[Pgbus] Error reporter raised: #{e.class}: #{e.message}" }
24
+ end
25
+ rescue Exception # rubocop:disable Lint/RescueException
26
+ # ErrorReporter must never raise — callers sit inside rescue blocks
27
+ # where an unexpected raise would break fault-tolerance invariants.
28
+ nil
29
+ end
30
+
31
+ def call_handler(handler, exception, context, config)
32
+ target = handler.is_a?(Proc) ? handler : handler.method(:call)
33
+ if target.arity == 3 || (target.arity.negative? && target.parameters.size >= 3)
34
+ handler.call(exception, context, config)
35
+ else
36
+ handler.call(exception, context)
37
+ end
38
+ end
39
+
40
+ def log_error(exception, context, config:)
41
+ config.logger.error do
42
+ msg = "[Pgbus] #{exception.class}: #{exception.message}"
43
+ msg += " (#{context.inspect})" unless context.empty?
44
+ msg
45
+ end
46
+ end
47
+ end
48
+ end
@@ -128,9 +128,20 @@ module Pgbus
128
128
  nil
129
129
  end
130
130
 
131
+ # Supervisor-level rescue: catch any Exception raised from the user
132
+ # block so capacity is always restored and the failure is logged.
133
+ # The `async` gem uses Async::Stop / Async::Cancel (Exception subclasses,
134
+ # NOT StandardError) to cancel tasks, and prior to issue #126 those
135
+ # would leak past `rescue StandardError` and silently vanish.
136
+ # Process-fatal signals still propagate so the supervisor can react.
137
+ FATAL_EXCEPTIONS = [SystemExit, Interrupt, SignalException, NoMemoryError, SystemStackError].freeze
138
+ private_constant :FATAL_EXCEPTIONS
139
+
131
140
  def perform(block)
132
141
  block.call
133
- rescue StandardError => e
142
+ rescue *FATAL_EXCEPTIONS
143
+ raise
144
+ rescue Exception => e # rubocop:disable Lint/RescueException
134
145
  Pgbus.logger.error { "[Pgbus] Async pool fiber error: #{e.class}: #{e.message}" }
135
146
  ensure
136
147
  restore_capacity
@@ -32,14 +32,7 @@ module Pgbus
32
32
  ]
33
33
  )
34
34
  rescue StandardError => e
35
- # ERROR-level: silent loss of failure-tracking data defeats the
36
- # purpose of the dashboard's "Failed Jobs" section. If recording
37
- # fails, surface it loudly so the broken state can be diagnosed
38
- # rather than silently masked.
39
- Pgbus.logger.error do
40
- "[Pgbus] Failed to record failed event for queue=#{queue_name} msg_id=#{msg_id}: " \
41
- "#{e.class}: #{e.message}"
42
- end
35
+ ErrorReporter.report(e, { action: "record_failed_event", queue: queue_name, msg_id: msg_id })
43
36
  end
44
37
 
45
38
  def clear!(queue_name:, msg_id:)
@@ -0,0 +1,96 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require "logger"
5
+ require "time"
6
+
7
+ module Pgbus
8
+ # Log formatters for Pgbus, inspired by Sidekiq::Logger::Formatters.
9
+ #
10
+ # Usage:
11
+ # Pgbus.configure do |c|
12
+ # c.logger.formatter = Pgbus::LogFormatter::JSON.new
13
+ # end
14
+ #
15
+ # Or via the convenience config option:
16
+ # Pgbus.configure do |c|
17
+ # c.log_format = :json
18
+ # end
19
+ module LogFormatter
20
+ module_function
21
+
22
+ def tid
23
+ Thread.current[:pgbus_tid] ||= (Thread.current.object_id ^ ::Process.pid).to_s(36)
24
+ end
25
+
26
+ # Thread-local context for structured logging. Works like
27
+ # Sidekiq::Context — any key/value pairs set via with_context
28
+ # appear in the JSON output under the "ctx" key.
29
+ def with_context(hash)
30
+ orig = current_context.dup
31
+ current_context.merge!(hash)
32
+ yield
33
+ ensure
34
+ Thread.current[:pgbus_log_context] = orig
35
+ end
36
+
37
+ def current_context
38
+ Thread.current[:pgbus_log_context] ||= {}
39
+ end
40
+
41
+ # Human-readable text formatter with Pgbus context.
42
+ # Output: "INFO 2024-01-15T10:30:00.000Z pid=1234 tid=abc queue=default: message\n"
43
+ class Text < ::Logger::Formatter
44
+ def call(severity, time, _progname, message)
45
+ "#{severity} #{time.utc.iso8601(3)} pid=#{::Process.pid} tid=#{LogFormatter.tid}#{format_context}: #{message}\n"
46
+ end
47
+
48
+ private
49
+
50
+ def format_context
51
+ ctx = LogFormatter.current_context
52
+ return "" if ctx.empty?
53
+
54
+ " #{ctx.map { |k, v| "#{k}=#{v}" }.join(" ")}"
55
+ end
56
+ end
57
+
58
+ # JSON formatter for structured logging. Each log line is a single
59
+ # JSON object followed by a newline. Extracts the [Pgbus::Component]
60
+ # prefix from messages into a separate "component" field.
61
+ #
62
+ # Output fields:
63
+ # ts — ISO 8601 timestamp with milliseconds
64
+ # pid — process ID
65
+ # tid — thread ID (short hex)
66
+ # lvl — severity (DEBUG/INFO/WARN/ERROR/FATAL)
67
+ # msg — the log message (with component prefix stripped)
68
+ # component — extracted from [Pgbus] or [Pgbus::Foo] prefix (optional)
69
+ # ctx — thread-local context hash (optional, only when non-empty)
70
+ class JSON < ::Logger::Formatter
71
+ COMPONENT_PREFIX = /\A\[([^\]]+)\]\s*/
72
+
73
+ def call(severity, time, _progname, message)
74
+ msg = message.to_s
75
+ hash = {
76
+ ts: time.utc.iso8601(3),
77
+ pid: ::Process.pid,
78
+ tid: LogFormatter.tid,
79
+ lvl: severity
80
+ }
81
+
82
+ if (match = msg.match(COMPONENT_PREFIX))
83
+ hash[:component] = match[1]
84
+ msg = msg.sub(COMPONENT_PREFIX, "")
85
+ end
86
+
87
+ hash[:msg] = msg
88
+
89
+ ctx = LogFormatter.current_context
90
+ hash[:ctx] = ctx unless ctx.empty?
91
+
92
+ "#{::JSON.generate(hash)}\n"
93
+ end
94
+ end
95
+ end
96
+ end
@@ -65,7 +65,7 @@ module Pgbus
65
65
  Pgbus.logger.debug { "[Pgbus] Outbox published #{published} entries" } if published.positive?
66
66
  published
67
67
  rescue StandardError => e
68
- Pgbus.logger.error { "[Pgbus] Outbox poll error: #{e.message}" }
68
+ ErrorReporter.report(e, { action: "outbox_poll" })
69
69
  0
70
70
  end
71
71
 
@@ -92,7 +92,7 @@ module Pgbus
92
92
  entry.update!(published_at: Time.current)
93
93
  true
94
94
  rescue StandardError => e
95
- Pgbus.logger.error { "[Pgbus] Failed to publish outbox entry #{entry.id}: #{e.message}" }
95
+ ErrorReporter.report(e, { action: "outbox_publish_topic", entry_id: entry.id })
96
96
  false
97
97
  end
98
98
 
@@ -112,7 +112,7 @@ module Pgbus
112
112
  group.each { |e| e.update!(published_at: now) }
113
113
  succeeded += group.size
114
114
  rescue StandardError => e
115
- Pgbus.logger.error { "[Pgbus] Failed to batch-publish #{group.size} outbox entries: #{e.message}" }
115
+ ErrorReporter.report(e, { action: "outbox_batch_publish", queue: queue, batch_size: group.size })
116
116
  # Fall back to individual publishing for this group
117
117
  group.each { |entry| succeeded += 1 if publish_single_queue(entry) }
118
118
  end
@@ -133,7 +133,7 @@ module Pgbus
133
133
  entry.update!(published_at: Time.current)
134
134
  true
135
135
  rescue StandardError => e
136
- Pgbus.logger.error { "[Pgbus] Failed to publish outbox entry #{entry.id}: #{e.message}" }
136
+ ErrorReporter.report(e, { action: "outbox_publish_queue", entry_id: entry.id })
137
137
  false
138
138
  end
139
139
 
@@ -98,8 +98,12 @@ module Pgbus
  # the next read will route to DLQ above.
  end

+ # `qty` is the total pool capacity. pgmq-ruby treats `qty:` as per-queue,
+ # so we also pass `limit: qty` to cap the total across all queues —
+ # otherwise we get `queue_count * qty` messages and overflow the
+ # execution pool, crashing the consumer fork (issue #123).
  def fetch_multi_consumer(qty)
- messages = Pgbus.client.read_multi(@queue_names, qty: qty) || []
+ messages = Pgbus.client.read_multi(@queue_names, qty: qty, limit: qty) || []
  prefix = "#{config.queue_prefix}_"

  messages.map do |m|
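The over-fetch arithmetic described in the comment above can be sketched with a toy model. `read_multi_sim` below is a hypothetical stand-in, not the pgmq-ruby API: it only mimics the per-queue `qty` semantics and the total `limit` cap that this hunk adds.

```ruby
# Toy model of the read_multi flow-control fix: `qty` applies per queue,
# so without a total cap a reader can receive queue_count * qty messages.
def read_multi_sim(queue_names, qty:, limit: nil)
  fetched = queue_names.flat_map do |queue|
    Array.new(qty) { |i| "#{queue}/msg#{i}" }
  end
  # `limit` caps the total across all queues, matching pool capacity.
  limit ? fetched.first(limit) : fetched
end

queues = %w[mailers billing webhooks]

read_multi_sim(queues, qty: 10).size             # => 30, 3x the pool capacity
read_multi_sim(queues, qty: 10, limit: 10).size  # => 10, capped to capacity
```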
@@ -94,7 +94,7 @@ module Pgbus
  yield
  instance_variable_set(ivar, now)
  rescue StandardError => e
- Pgbus.logger.error { "[Pgbus] Dispatcher maintenance error: #{e.message}" }
+ ErrorReporter.report(e, { action: "dispatcher_maintenance", task: ivar.to_s.delete_prefix("@last_").delete_suffix("_at") })
  end

  def cleanup_processed_events
@@ -21,6 +21,14 @@ module Pgbus

  Pgbus.logger.info { "[Pgbus] Supervisor starting pid=#{::Process.pid}" }

+ # Bootstrap queues once in the parent process before forking children.
+ # This avoids the deadlock that occurs when multiple forked children
+ # race to call enable_notify_insert (DROP TRIGGER + CREATE TRIGGER)
+ # concurrently on the same queue tables. Children still call
+ # bootstrap_queues post-fork but the idempotent check in
+ # notify_trigger_current? makes those calls cheap no-ops.
+ bootstrap_queues
+
  boot_processes
  monitor_loop
  ensure
@@ -83,7 +91,7 @@ module Pgbus
  @forks[pid] = { type: :worker, config: worker_config }
  Pgbus.logger.info { "[Pgbus] Forked worker pid=#{pid} queues=#{queues.join(",")} mode=#{exec_mode}" }
  rescue Errno::EAGAIN, Errno::ENOMEM => e
- Pgbus.logger.error { "[Pgbus] Fork failed for worker: #{e.message}" }
+ ErrorReporter.report(e, { action: "fork_worker", queues: queues })
  end

  def fork_dispatcher
@@ -103,7 +111,7 @@ module Pgbus
  @forks[pid] = { type: :dispatcher }
  Pgbus.logger.info { "[Pgbus] Forked dispatcher pid=#{pid}" }
  rescue Errno::EAGAIN, Errno::ENOMEM => e
- Pgbus.logger.error { "[Pgbus] Fork failed for dispatcher: #{e.message}" }
+ ErrorReporter.report(e, { action: "fork_dispatcher" })
  end

  def boot_scheduler
@@ -132,7 +140,7 @@ module Pgbus
  @forks[pid] = { type: :scheduler }
  Pgbus.logger.info { "[Pgbus] Forked scheduler pid=#{pid}" }
  rescue Errno::EAGAIN, Errno::ENOMEM => e
- Pgbus.logger.error { "[Pgbus] Fork failed for scheduler: #{e.message}" }
+ ErrorReporter.report(e, { action: "fork_scheduler" })
  end

  def recurring_tasks_configured?
@@ -186,7 +194,7 @@ module Pgbus
  @forks[pid] = { type: :consumer, config: consumer_config }
  Pgbus.logger.info { "[Pgbus] Forked consumer pid=#{pid} topics=#{topics.join(",")}" }
  rescue Errno::EAGAIN, Errno::ENOMEM => e
- Pgbus.logger.error { "[Pgbus] Fork failed for consumer: #{e.message}" }
+ ErrorReporter.report(e, { action: "fork_consumer", topics: topics })
  end

  def boot_outbox_poller
@@ -212,7 +220,7 @@ module Pgbus
  @forks[pid] = { type: :outbox_poller }
  Pgbus.logger.info { "[Pgbus] Forked outbox poller pid=#{pid}" }
  rescue Errno::EAGAIN, Errno::ENOMEM => e
- Pgbus.logger.error { "[Pgbus] Fork failed for outbox poller: #{e.message}" }
+ ErrorReporter.report(e, { action: "fork_outbox_poller" })
  end

  def monitor_loop
@@ -282,7 +290,7 @@ module Pgbus
  def bootstrap_queues
  Pgbus.client.ensure_all_queues
  rescue StandardError => e
- Pgbus.logger.error { "[Pgbus] Failed to bootstrap queues: #{e.message}" }
+ ErrorReporter.report(e, { action: "bootstrap_queues" })
  end

  def load_rails_app
@@ -151,7 +151,7 @@ module Pgbus
  if undefined_queue_table_error?(e)
  evict_missing_queues(e)
  else
- Pgbus.logger.error { "[Pgbus] Error fetching messages: #{e.message}" }
+ ErrorReporter.report(e, { action: "fetch_messages", queues: active_queues })
  end
  []
  end
@@ -194,8 +194,13 @@ module Pgbus
  # Use pgmq-ruby's read_multi to read from all queues in a single
  # SQL query (UNION ALL). Each returned message carries a queue_name
  # field so we can map it back to the logical queue.
+ #
+ # `qty` is the total pool capacity. pgmq-ruby treats `qty:` as per-queue,
+ # so we also pass `limit: qty` to cap the total across all queues —
+ # otherwise we get `queue_count * qty` messages and overflow the
+ # execution pool, crashing the worker fork (issue #123).
  def fetch_multi(active_queues, qty)
- messages = Pgbus.client.read_multi(active_queues, qty: qty) || []
+ messages = Pgbus.client.read_multi(active_queues, qty: qty, limit: qty) || []
  prefix = "#{config.queue_prefix}_"

  messages.map do |m|
@@ -223,7 +228,7 @@ module Pgbus
  @jobs_failed.increment
  @rate_counter.increment(:failed)
  @circuit_breaker.record_failure(queue_name)
- Pgbus.logger.error { "[Pgbus] Unhandled error processing message: #{e.message}" }
+ ErrorReporter.report(e, { action: "process_message", queue: queue_name })
  ensure
  @in_flight.decrement
  end
@@ -7,9 +7,21 @@ module Pgbus
  # (e.g., pgmq.q_<name>, pgmq.a_<name>). This module enforces strict
  # validation to prevent SQL injection via crafted queue names.
  module QueueNameValidator
- # PostgreSQL identifier limit is 63 bytes (NAMEDATALEN - 1).
- # PGMQ prefixes with "q_" or "a_" (2 chars), so limit the name itself.
- MAX_QUEUE_NAME_LENGTH = 61
+ # PostgreSQL's NAMEDATALEN caps identifiers at 63 bytes, but the
+ # effective limit is tighter: pgmq-ruby (our transport gem) rejects
+ # any queue name with `length >= 48` in `PGMQ::Client#validate_queue_name!`.
+ # That leaves an actual usable ceiling of 47 characters for the
+ # fully-prefixed name (`<queue_prefix>_<logical_name>`), which is what
+ # this constant expresses. pgmq-ruby picked 48 to leave headroom for
+ # PGMQ's internal tables (`pgmq.q_`, `pgmq.a_`, sequences, indexes)
+ # which all get suffixed beyond the base name.
+ #
+ # Historically this was 61 (the raw PostgreSQL ceiling minus PGMQ's
+ # `q_`/`a_` prefix). That was wrong in practice: names in the 48-61
+ # range passed pgbus's validator but blew up deep inside pgmq-ruby
+ # with an InvalidQueueNameError — exactly the kind of opaque failure
+ # the validator was meant to catch up front.
+ MAX_QUEUE_NAME_LENGTH = 47

  # Only alphanumeric characters and underscores are allowed.
  VALID_QUEUE_NAME_PATTERN = /\A[a-zA-Z0-9_]+\z/
@@ -34,16 +46,23 @@ module Pgbus
  name
  end

- # Normalizes a queue name by replacing common separators (hyphens, dots)
- # with underscores, stripping remaining invalid characters, and collapsing
- # consecutive underscores. Use this for names from external sources
- # (e.g., Turbo stream names like "hotwire-livereload") where the intent
- # is to derive a valid PGMQ queue name that preserves readability.
+ # Normalizes a queue name by replacing common separators (hyphens,
+ # dots, colons) with underscores, stripping remaining invalid
+ # characters, and collapsing consecutive underscores. Use this for
+ # names from external sources (e.g., Turbo stream names like
+ # "hotwire-livereload" or "gid://app/Foo/1") where the intent is
+ # to derive a valid PGMQ queue name that preserves as much of the
+ # original identifier as possible.
+ #
+ # Colons in particular are the turbo-rails stream-name separator
+ # (`Pgbus.stream([user, :notifications])` → `"user_gid:notifications"`),
+ # so they must map to a safe character rather than be stripped —
+ # otherwise `"a:b"` and `"ab"` would collide on the same queue.
  def normalize(name)
  name = name.to_s
  return validate!(name) if VALID_QUEUE_NAME_PATTERN.match?(name)

- normalized = name.gsub(/[-.]/, "_") # hyphens/dots → underscores
+ normalized = name.gsub(/[-.:]/, "_") # hyphens/dots/colons → underscores
  .gsub(/[^a-zA-Z0-9_]/, "") # strip remaining invalid chars
  .gsub(/_+/, "_") # collapse consecutive underscores
  .gsub(/\A_|_\z/, "") # strip leading/trailing underscores
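The normalization pipeline in this hunk can be exercised standalone. The sketch below copies the four `gsub` steps verbatim (the early `validate!` fast path for already-valid names is omitted for brevity) to show what external names map to:

```ruby
# Standalone copy of the gsub chain from QueueNameValidator.normalize.
def normalize_queue_name(name)
  name.to_s
      .gsub(/[-.:]/, "_")           # hyphens/dots/colons -> underscores
      .gsub(/[^a-zA-Z0-9_]/, "")    # strip remaining invalid chars
      .gsub(/_+/, "_")              # collapse consecutive underscores
      .gsub(/\A_|_\z/, "")          # strip leading/trailing underscores
end

normalize_queue_name("hotwire-livereload")      # => "hotwire_livereload"
normalize_queue_name("user_gid:notifications")  # => "user_gid_notifications"
normalize_queue_name("gid://app/Foo/1")         # => "gid_appFoo1"
```

Note how the colon mapping keeps `"a:b"` (`"a_b"`) distinct from `"ab"`, which is the collision the comment above calls out.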
@@ -0,0 +1,173 @@
+ # frozen_string_literal: true
+
+ require "digest"
+
+ module Pgbus
+ module Streams
+ # Short, pgbus-safe stream identifiers.
+ #
+ # PGMQ queue names are bounded by two ceilings: PostgreSQL's
+ # NAMEDATALEN (63 chars for `pgmq.q_<name>`) and pgmq-ruby's own
+ # stricter runtime check (`length >= 48` in
+ # `PGMQ::Client#validate_queue_name!`). The effective budget is the
+ # lower of the two, exposed as `QueueNameValidator::MAX_QUEUE_NAME_LENGTH`
+ # (currently 47). Any stream name composed from UUID primary keys
+ # and turbo-rails-style dom ids blows past that budget almost
+ # immediately:
+ #
+ # "gid://app/Ai::Chat/9c14e8b2-94c3-4c6f-8ca1-f50d2f5e22ca:messages"
+ # # => 64 chars, already too long before the "pgbus_" prefix is added.
+ #
+ # `stream_key` produces a deterministic short form suitable as a
+ # pgbus stream identifier. It normalizes each part, joins with ":",
+ # and enforces the queue-name budget (derived from the configured
+ # `queue_prefix`) at the call site — raising ArgumentError rather
+ # than letting the failure surface as an opaque QueueNameValidator
+ # error deep inside `Pgbus.stream(...).broadcast(...)`.
+ #
+ # Silent truncation is intentionally NOT supported: trimming a
+ # too-long key to fit would reintroduce the collision risk that the
+ # 64-bit digest is chosen to eliminate. Callers who overflow should
+ # shorten their own identifiers or adopt `Pgbus::Streams::Streamable`
+ # on their ActiveRecord models.
+ #
+ # Usage:
+ #
+ # Pgbus.stream_key(chat, :messages)
+ # # => "ai_chat_3a4f9c21b7d20e18:messages"
+ #
+ # Pgbus.stream_key([user, :notifications])
+ # # => "user_5fa83c91d44a2701:notifications"
+ #
+ # Pgbus.stream(Pgbus.stream_key(chat, :messages)).broadcast("<turbo-stream/>")
+ #
+ # Collision horizon: the 64-bit SHA-256 prefix gives a birthday bound
+ # of roughly 5 billion records per model class before a 50% chance of
+ # collision. For multi-tenant apps where a collision would mean two
+ # records share a stream (and receive each other's broadcasts), this
+ # is wide enough in practice. Callers with higher sensitivity can
+ # pass `digest_bits: 128`.
+ module Key
+ DEFAULT_DIGEST_BITS = 64
+
+ module_function
+
+ # Compose a short pgbus-safe stream name from any mix of records,
+ # strings, symbols, and arrays. Returns the joined key when it fits
+ # the pgbus queue-name budget; raises ArgumentError otherwise.
+ #
+ # Fragments must not contain `:` — it's the join separator, so
+ # `stream_key("a:b", :c)` and `stream_key("a", "b:c")` would both
+ # produce `"a:b:c"` and collapse two logically distinct streams
+ # onto one queue. Colons inside fragments (typically from a
+ # `to_stream_key`/`to_gid_param` implementation that forgot to
+ # sanitize) raise an ArgumentError at the call site.
+ def stream_key(*parts, digest_bits: DEFAULT_DIGEST_BITS)
+ fragments = Array(parts).flatten.map { |part| normalize(part, digest_bits: digest_bits) }
+ fragments.each { |fragment| reject_colons!(fragment) }
+ key = fragments.join(":")
+ budget = queue_name_budget
+ return key if key.length <= budget
+
+ raise ArgumentError,
+ "stream_key #{key.inspect} is #{key.length} chars, " \
+ "exceeds pgbus budget of #{budget} " \
+ "(queue_prefix=#{Pgbus.configuration.queue_prefix.inspect}, " \
+ "pgbus_max_queue_name_length=#{QueueNameValidator::MAX_QUEUE_NAME_LENGTH}). " \
+ "Shorten the streamables or use Pgbus::Streams::Streamable on the model."
+ end
+
+ # Full output size of the backing digest, in bits. Capping
+ # digest_bits here matters because `SHA256.hexdigest` only
+ # produces 64 hex chars (256 bits) no matter what — slicing
+ # `[0, 128]` just returns all 64 chars — so a caller asking for
+ # `digest_bits: 512` would silently get the same output as
+ # `digest_bits: 256` and walk away believing they'd widened the
+ # collision horizon. Raise instead.
+ MAX_DIGEST_BITS = ::Digest::SHA256.new.digest_length * 8 # => 256
+
+ # 64-bit (default) SHA-256 prefix of the record's primary key. Stdlib
+ # only, deterministic, and fixed-length. CRC32's 32-bit output is
+ # intentionally not used here: its ~77k-row birthday bound is too
+ # tight for a multi-tenant stream identifier where a collision would
+ # route two records' broadcasts to the same queue.
+ def short_id(record, digest_bits: DEFAULT_DIGEST_BITS)
+ unless digest_bits.is_a?(Integer) && digest_bits.positive? &&
+ (digest_bits % 4).zero? && digest_bits <= MAX_DIGEST_BITS
+ raise ArgumentError,
+ "digest_bits must be a positive multiple of 4 and <= #{MAX_DIGEST_BITS} " \
+ "(SHA-256 produces #{MAX_DIGEST_BITS} bits; asking for more would silently truncate)"
+ end
+
+ # Unpersisted records all share id=nil, which hashes to a single
+ # constant digest and would collapse every new instance of the
+ # same class into one stream. Fail loud at the first unsaved
+ # call site — the whole point of the 64-bit digest is to
+ # eliminate collisions, so silently producing a shared key here
+ # would reintroduce exactly what it was chosen to prevent.
+ if record.id.nil?
+ raise ArgumentError,
+ "#{record.class.name} must be persisted before generating a stream key " \
+ "(record.id is nil — all unsaved records would collide on one stream)"
+ end
+
+ hex_chars = digest_bits / 4
+ ::Digest::SHA256.hexdigest(record.id.to_s)[0, hex_chars]
+ end
+
+ # Normalize a single streamable fragment to a pgbus-safe string.
+ # Mirrors the shape accepted by Turbo::Streams::StreamName and
+ # Pgbus::Streams::Stream.name_from so the two code paths agree
+ # on the wire format.
+ #
+ # - Strings and symbols pass through verbatim.
+ # - ActiveRecord models become "<param_key>_<short_id>".
+ # - Anything else responding to `to_gid_param` / `to_param` falls
+ # back to that; a UUID primary key would still overflow, which
+ # is why AR models are hashed above.
+ def normalize(part, digest_bits: DEFAULT_DIGEST_BITS)
+ case part
+ when String, Symbol
+ part.to_s
+ else
+ if defined?(::ActiveRecord::Base) && part.is_a?(::ActiveRecord::Base)
+ "#{part.class.model_name.param_key}_#{short_id(part, digest_bits: digest_bits)}"
+ elsif part.respond_to?(:to_stream_key)
+ part.to_stream_key
+ elsif part.respond_to?(:to_gid_param)
+ part.to_gid_param
+ elsif part.respond_to?(:to_param)
+ part.to_param
+ else
+ part.to_s
+ end
+ end
+ end
+
+ # Budget = effective pgbus queue-name limit - "<queue_prefix>_"
+ # length. Computed at call time (not a constant) so apps that
+ # override `config.queue_prefix` get the correct budget
+ # automatically.
+ def queue_name_budget
+ QueueNameValidator::MAX_QUEUE_NAME_LENGTH -
+ Pgbus.configuration.queue_prefix.length - 1
+ end
+
+ # Raises if a normalized fragment contains the `:` separator.
+ # Kept private-ish via module_function so the guard is shared
+ # between stream_key and any future composer without becoming
+ # part of the public surface.
+ def reject_colons!(fragment)
+ return unless fragment.include?(":")
+
+ raise ArgumentError,
+ "stream_key fragment #{fragment.inspect} contains ':' which is the " \
+ "join separator — two calls with different colon placements would " \
+ "collapse to the same key (e.g. stream_key('a:b', :c) vs " \
+ "stream_key('a', 'b:c') both produce 'a:b:c'). Strip or replace " \
+ "colons in the offending streamable before calling stream_key."
+ end
+ private_class_method :reject_colons!
+ end
+ end
+ end
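The digest policy in `Key.short_id` above reduces to a fixed-width hex prefix of SHA-256 over the primary key. A minimal standalone sketch (a re-implementation for illustration, not the gem's method, and without the persistence guard):

```ruby
require "digest"

# Standalone sketch of the short_id digest policy: deterministic across
# processes, so every node derives the same stream name for a record.
def short_id_for(id, digest_bits: 64)
  unless digest_bits.is_a?(Integer) && digest_bits.positive? && (digest_bits % 4).zero?
    raise ArgumentError, "digest_bits must be a positive multiple of 4"
  end

  # Each hex char carries 4 bits, so 64 bits -> a 16-char prefix.
  Digest::SHA256.hexdigest(id.to_s)[0, digest_bits / 4]
end

short_id_for("abc")                   # => "ba7816bf8f01cfea" (16 hex chars = 64 bits)
short_id_for("abc", digest_bits: 128) # => 32 hex chars
```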
@@ -0,0 +1,57 @@
+ # frozen_string_literal: true
+
+ module Pgbus
+ module Streams
+ # ActiveRecord concern that adds `short_id` and `to_stream_key`
+ # instance methods for producing pgbus-safe stream identifiers from
+ # records whose primary key is a UUID (or any other long string).
+ #
+ # Intended for inclusion in `ApplicationRecord`:
+ #
+ # class ApplicationRecord < ActiveRecord::Base
+ # primary_abstract_class
+ # include Pgbus::Streams::Streamable
+ # end
+ #
+ # Every subclass then gets:
+ #
+ # chat.short_id
+ # # => "3a4f9c21b7d20e18"
+ #
+ # chat.to_stream_key
+ # # => "ai_chat_3a4f9c21b7d20e18"
+ #
+ # Pgbus.stream_key(chat, :messages)
+ # # => "ai_chat_3a4f9c21b7d20e18:messages"
+ #
+ # The mixin is intentionally thin: it delegates to `Pgbus::Streams::Key`
+ # so the digest policy lives in one place.
+ #
+ # `#to_stream_key` calls `Key.short_id(self)` directly rather than
+ # dispatching through `#short_id`. Ruby does NOT warn when a class
+ # defines an instance method and a later `include` adds a module
+ # method with the same name — the class's own definition silently
+ # wins. A host app that already defines its own `#short_id`
+ # (returning, say, a display-friendly abbreviation) would therefore
+ # hijack `to_stream_key` without any indication, producing stream
+ # keys the wire format never promised. Calling `Key.short_id(self)`
+ # explicitly bypasses instance-method lookup and guarantees the
+ # advertised digest regardless of what the host class does with
+ # the unqualified name.
+ module Streamable
+ # Returns a short SHA-256 prefix (64 bits / 16 hex chars by default)
+ # of this record's primary key. See `Pgbus::Streams::Key.short_id`
+ # for the digest policy and collision horizon.
+ def short_id(digest_bits: Key::DEFAULT_DIGEST_BITS)
+ Key.short_id(self, digest_bits: digest_bits)
+ end
+
+ # Returns a stable, pgbus-safe identifier of the form
+ # `<model_key>_<short_id>` suitable for passing directly to
+ # `Pgbus.stream(...)` or composing with `Pgbus.stream_key`.
+ def to_stream_key
+ "#{self.class.model_name.param_key}_#{Key.short_id(self)}"
+ end
+ end
+ end
+ end
data/lib/pgbus/streams.rb CHANGED
@@ -6,6 +6,15 @@ module Pgbus
  # which returns a `Pgbus::Streams::Stream` providing `#broadcast`,
  # `#current_msg_id`, and `#read_after`.
  module Streams
+ # Raised when a composed stream name would overflow PGMQ's queue-name
+ # budget (derived from PostgreSQL's NAMEDATALEN=64, minus PGMQ's `q_`
+ # table prefix, minus the configured queue_prefix + separator).
+ #
+ # Inherits from ArgumentError so existing rescues of the underlying
+ # QueueNameValidator error keep working; callers that want to handle
+ # this specifically can rescue Pgbus::Streams::StreamNameTooLong.
+ class StreamNameTooLong < ArgumentError; end
+
  # Process-wide registry of server-side audience filter predicates.
  # Register filters at boot time via:
  # Pgbus::Streams.filters.register(:admin_only) { |user| user.admin? }
@@ -30,6 +39,7 @@ module Pgbus

  def initialize(streamables, client: Pgbus.client)
  @name = self.class.name_from(streamables)
+ self.class.validate_name_length!(@name, streamables)
  @client = client
  @ensured = false
  @ensure_mutex = Mutex.new
@@ -108,6 +118,33 @@ module Pgbus
  end
  end

+ # Enforces the pgbus queue-name budget at the Stream-construction
+ # boundary so a forgotten call site fails with an actionable error
+ # (pointing at the offending streamables and suggesting
+ # `Pgbus.stream_key`) instead of an opaque QueueNameValidator
+ # failure three frames deep in Client#ensure_stream_queue.
+ #
+ # The budget is computed from `config.queue_prefix` at call time
+ # so apps that override the prefix get the correct limit. Does not
+ # mutate the name — silent truncation is a footgun for
+ # multi-tenant apps where collisions would mix broadcasts across
+ # records. Callers who need a short, safe identifier should use
+ # `Pgbus.stream_key(...)` or include `Pgbus::Streams::Streamable`
+ # on their ActiveRecord models.
+ def self.validate_name_length!(name, streamables)
+ budget = Key.queue_name_budget
+ return if name.length <= budget
+
+ raise StreamNameTooLong,
+ "Stream name #{name.inspect} is #{name.length} chars, " \
+ "exceeds pgbus budget of #{budget} " \
+ "(queue_prefix=#{Pgbus.configuration.queue_prefix.inspect}, " \
+ "pgbus_max_queue_name_length=#{QueueNameValidator::MAX_QUEUE_NAME_LENGTH}). " \
+ "Streamables: #{streamables.inspect}. " \
+ "Use Pgbus.stream_key(*streamables) to produce a safe short name, " \
+ "or include Pgbus::Streams::Streamable on the model."
+ end
+
  private

  def ensure_queue!
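The budget arithmetic shared by `validate_name_length!` and `Key.queue_name_budget` works out as follows. The sketch below mirrors the constant from this diff; `"pgbus"` is an assumed example prefix for illustration, not necessarily the gem's default:

```ruby
# Worked example of the queue-name budget computation described above.
MAX_QUEUE_NAME_LENGTH = 47 # effective ceiling enforced by pgmq-ruby

def queue_name_budget(queue_prefix)
  # Reserve the prefix plus one "_" separator out of the 47-char ceiling.
  MAX_QUEUE_NAME_LENGTH - queue_prefix.length - 1
end

queue_name_budget("pgbus")                  # => 41
"ai_chat_3a4f9c21b7d20e18:messages".length  # => 33, fits the budget
"gid://app/Ai::Chat/9c14e8b2-94c3-4c6f-8ca1-f50d2f5e22ca:messages".length
# => 64, rejected with StreamNameTooLong
```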
data/lib/pgbus/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Pgbus
- VERSION = "0.6.9"
+ VERSION = "0.7.1"
  end
data/lib/pgbus.rb CHANGED
@@ -105,6 +105,20 @@ module Pgbus
  @stream_cache.compute_if_absent(name) { Streams::Stream.new(streamables) }
  end

+ # Compose a short, pgbus-safe stream identifier from any mix of
+ # records, strings, symbols, and arrays. Delegates to
+ # `Pgbus::Streams::Key.stream_key`; raises `ArgumentError` if the
+ # resulting key would overflow the pgbus queue-name budget. See
+ # `lib/pgbus/streams/key.rb` for the digest policy and rationale.
+ #
+ # Pgbus.stream_key(chat, :messages)
+ # # => "ai_chat_3a4f9c21b7d20e18:messages"
+ #
+ # Pgbus.stream(Pgbus.stream_key(chat, :messages)).broadcast(html)
+ def stream_key(*parts, **)
+ Streams::Key.stream_key(*parts, **)
+ end
+
  def reset!
  @client&.close
  @client = nil
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: pgbus
  version: !ruby/object:Gem::Version
- version: 0.6.9
+ version: 0.7.1
  platform: ruby
  authors:
  - Mikael Henriksson
@@ -256,6 +256,7 @@ files:
  - lib/pgbus/configuration/capsule_dsl.rb
  - lib/pgbus/dedup_cache.rb
  - lib/pgbus/engine.rb
+ - lib/pgbus/error_reporter.rb
  - lib/pgbus/event.rb
  - lib/pgbus/event_bus/handler.rb
  - lib/pgbus/event_bus/publisher.rb
@@ -269,6 +270,7 @@ files:
  - lib/pgbus/generators/database_target_detector.rb
  - lib/pgbus/generators/migration_detector.rb
  - lib/pgbus/instrumentation.rb
+ - lib/pgbus/log_formatter.rb
  - lib/pgbus/outbox.rb
  - lib/pgbus/outbox/poller.rb
  - lib/pgbus/pgmq_schema.rb
@@ -299,8 +301,10 @@ files:
  - lib/pgbus/streams/cursor.rb
  - lib/pgbus/streams/envelope.rb
  - lib/pgbus/streams/filters.rb
+ - lib/pgbus/streams/key.rb
  - lib/pgbus/streams/presence.rb
  - lib/pgbus/streams/signed_name.rb
+ - lib/pgbus/streams/streamable.rb
  - lib/pgbus/streams/turbo_broadcastable.rb
  - lib/pgbus/streams/turbo_stream_override.rb
  - lib/pgbus/streams/watermark_cache_middleware.rb