pgbus 0.7.8 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 48e02fc16de64a9ccec20b8b0d917e154e17dc2082c1e3933b80da1bdc2f0c2a
4
- data.tar.gz: 811db2dfcb4c4271296d9125754af3865b117ebc8b45d3194bc4314cc43ffaba
3
+ metadata.gz: 60cc7178f84e5d28919085f5c5a9d824aca958180d433daf13cb04a369ed25c0
4
+ data.tar.gz: 5a7fdb569f90cf3e60ef5951e86d6ed0c3c31c7703b0f88be91c2df3054b8817
5
5
  SHA512:
6
- metadata.gz: 2b41cd760d469af41c4e0ef0d9bfaffd3cb17ff81b38dbeb250605d78c737708f54ee43519fe25afeda83a5e15177a14e89e48d27471ae39ad5c79420d791e21
7
- data.tar.gz: 50f209ed7b74e18c1bf927ca73fe80286c9a4333642af4f32e579bd6f9e40158007c07a39dfa6a3fc2082fd9ad1663f3e9b109ebd7021485f931a615ad9baabb
6
+ metadata.gz: 8cc8aa7893bdb605379f4cea9542766062b5a759a89d6149e49a39729484f56790bc131b1063b16a855357a0ab4979aeb52f4bd9f572568e793482fcabaf9939
7
+ data.tar.gz: '052525538d45540220e66da0d43f007727848543d8f1184f7598f481e3abe3fe6be961eb6c634f414c158cd884b856305a35ded77e39d2f0f4d7c24bc10cf1a4'
data/README.md CHANGED
@@ -728,6 +728,48 @@ Reporters are wired into all critical rescue paths: job execution failures, work
728
728
 
729
729
  `ErrorReporter.report` is guaranteed to never raise — if a reporter or the logger itself throws, the error is swallowed silently. This preserves fault-tolerance invariants at every rescue site.
730
730
 
731
+ ### AppSignal integration
732
+
733
+ When the `appsignal` gem is loaded in your app, Pgbus auto-installs a subscriber and a minutely probe that report into AppSignal:
734
+
735
+ - **Background-job transactions** for every ActiveJob run and every event-bus handler invocation. Action names follow the AppSignal convention: `MyJob#perform`, `MyHandler#handle`. Tags include `queue`, `job_class`/`handler`, `routing_key`, `attempts`, and the `active_job_id` / `provider_job_id`. `enqueued_at` becomes the AppSignal `queue_start` timestamp so "time on queue" shows up correctly in the timeline.
736
+ - **Custom counters and distributions** for sends, reads, broadcasts, outbox publishes, recurring scheduling, and worker recycles. All metric names are prefixed `pgbus_`.
737
+ - **A minutely probe** that gauges queue depth (visible vs total), oldest message age per queue, DLQ depth, failed events count, dead-tuple totals, MVCC horizon age, active processes, and stream connection estimates.
738
+
739
+ There is nothing to wire up — load the appsignal gem and the integration installs itself in a Rails initializer. To opt out:
740
+
741
+ ```ruby
742
+ Pgbus.configure do |c|
743
+ c.appsignal_enabled = false # disable subscriber + probe entirely
744
+ c.appsignal_probe_enabled = false # keep transactions, drop the gauge probe
745
+ end
746
+ ```
747
+
748
+ #### Dashboards
749
+
750
+ Three importable AppSignal dashboards ship with the gem:
751
+
752
+ | File | Purpose |
753
+ |------|---------|
754
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_throughput.json` | Jobs/sec, perform-duration percentiles, send/read counts |
755
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_health.json` | Queue depth, oldest message age, DLQ, dead tuples, MVCC horizon, worker recycles |
756
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_streams.json` | Broadcasts, fanout, active SSE connections, outbox, recurring tasks |
757
+
758
+ Import via the AppSignal dashboard UI ("New dashboard" → "Import JSON") or the AppSignal API.
759
+
760
+ #### Custom subscriptions
761
+
762
+ The integration is built on `ActiveSupport::Notifications`. If you want to push pgbus telemetry into a different APM (Datadog, New Relic, OpenTelemetry), subscribe directly:
763
+
764
+ ```ruby
765
+ ActiveSupport::Notifications.subscribe(/^pgbus\./) do |name, start, finish, _id, payload|
766
+ duration_ms = (finish - start) * 1_000
767
+ YourApm.record(name, duration_ms, payload)
768
+ end
769
+ ```
770
+
771
+ Events emitted: `pgbus.executor.execute`, `pgbus.job_completed`, `pgbus.job_failed`, `pgbus.job_dead_lettered`, `pgbus.event_processed`, `pgbus.event_failed`, `pgbus.client.send_message`, `pgbus.client.send_batch`, `pgbus.client.read_batch`, `pgbus.stream.broadcast`, `pgbus.outbox.publish`, `pgbus.recurring.enqueue`, `pgbus.worker.recycle`. Payload keys are documented in `lib/pgbus/instrumentation.rb`.
772
+
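A minimal sketch of what a subscriber callback might do with the five block arguments, assuming the payload keys shown in the executor changes of this diff (`:queue`, `:job_class`, `:error`). `TelemetrySink` is a hypothetical stand-in for an APM client, and the callback is invoked by hand here so the snippet runs without Rails; in an app the same logic would sit inside the `ActiveSupport::Notifications.subscribe` block.

```ruby
# Hypothetical in-memory sink standing in for a real APM client.
class TelemetrySink
  attr_reader :timings, :failures

  def initialize
    @timings = Hash.new { |hash, key| hash[key] = [] }
    @failures = Hash.new(0)
  end

  # Takes the same five arguments that a notifications subscriber block receives.
  def call(name, start, finish, _id, payload)
    return unless name.match?(/\Apgbus\./)

    @timings[name] << (finish - start) * 1_000 # seconds -> milliseconds
    @failures[payload[:job_class]] += 1 if name == "pgbus.job_failed"
  end
end

sink = TelemetrySink.new
t0 = Time.at(1_700_000_000)
sink.call("pgbus.executor.execute", t0, t0 + 0.25, "abc", { queue: "default", job_class: "MyJob" })
sink.call("pgbus.job_failed", t0, t0, "abc", { queue: "default", job_class: "MyJob", error: "RuntimeError" })
sink.call("sql.active_record", t0, t0 + 1, "abc", {}) # ignored: not a pgbus event

puts sink.timings["pgbus.executor.execute"].first.round # duration in milliseconds
puts sink.failures["MyJob"]                             # failures recorded for MyJob
```

Keeping the sink as a plain callable object makes it easy to unit-test without booting the notification system.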
731
773
  ### Structured logging
732
774
 
733
775
  Pgbus ships two log formatters inspired by Sidekiq's `Logger::Formatters`:
@@ -131,7 +131,9 @@ module Pgbus
131
131
  return nil if cache[:script_emitted]
132
132
 
133
133
  cache[:script_emitted] = true
134
- script = '<script type="module">import "pgbus/stream_source_element"</script>'
134
+ nonce = content_security_policy_nonce if respond_to?(:content_security_policy_nonce)
135
+ nonce_attr = nonce ? %( nonce="#{CGI.escape_html(nonce)}") : ""
136
+ script = %(<script type="module"#{nonce_attr}>import "pgbus/stream_source_element"</script>)
135
137
  script.respond_to?(:html_safe) ? script.html_safe : script
136
138
  end
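The nonce handling in this hunk can be sketched standalone as follows. `stream_script_tag` is a hypothetical helper name, and the sketch calls `CGI.escapeHTML` (the method that `CGI.escape_html` aliases) so that it runs on plain Ruby without Rails.

```ruby
require "cgi/escape"

# Hypothetical helper: builds the module script tag, adding a nonce
# attribute only when a CSP nonce is available.
def stream_script_tag(nonce = nil)
  nonce_attr = nonce ? %( nonce="#{CGI.escapeHTML(nonce)}") : ""
  %(<script type="module"#{nonce_attr}>import "pgbus/stream_source_element"</script>)
end

puts stream_script_tag                # no nonce attribute at all
puts stream_script_tag("r4nd0m")      # nonce attribute added
puts stream_script_tag(%(x" onload=)) # quotes in the nonce are HTML-escaped
```

Escaping the nonce keeps a malformed value from breaking out of the attribute, which is the reason the diff does not interpolate it raw.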
137
139
 
@@ -33,6 +33,15 @@ module Pgbus
33
33
  signal_batch_discarded(payload)
34
34
  Uniqueness.release_lock(Uniqueness.extract_key(payload))
35
35
  record_stat(payload, queue_name, "dead_lettered", execution_start, message: message)
36
+ instrument(
37
+ "pgbus.job_dead_lettered",
38
+ queue: queue_name,
39
+ job_class: job_class,
40
+ job_id: payload["job_id"],
41
+ provider_job_id: payload["provider_job_id"],
42
+ read_ct: read_count,
43
+ msg_id: message.msg_id.to_i
44
+ )
36
45
  Pgbus.logger.debug { "[Pgbus::Executor] dead_lettered #{tag} job_class=#{job_class}" }
37
46
  return :dead_lettered
38
47
  end
@@ -60,7 +69,17 @@ module Pgbus
60
69
  job_succeeded = false
61
70
 
62
71
  msg_id = message.msg_id.to_i
63
- Instrumentation.instrument("pgbus.executor.execute", queue: queue_name, job_class: job_class) do
72
+ instrument_payload = {
73
+ queue: queue_name,
74
+ job_class: job_class,
75
+ job_id: payload["job_id"],
76
+ provider_job_id: payload["provider_job_id"],
77
+ arguments: payload["arguments"],
78
+ enqueued_at: payload["enqueued_at"],
79
+ read_ct: read_count,
80
+ msg_id: msg_id
81
+ }
82
+ Instrumentation.instrument("pgbus.executor.execute", instrument_payload) do
64
83
  job = ::ActiveJob::Base.deserialize(payload)
65
84
  Pgbus.logger.debug { "[Pgbus::Executor] running #{tag} job_class=#{job_class}" }
66
85
  execute_job(job)
@@ -85,7 +104,17 @@ module Pgbus
85
104
  # silently lost control flow — no failed event row, no job_failed
86
105
  # notification, uniqueness lock held until VT expired. See issue #126.
87
106
  handle_failure(message, queue_name, e, payload: payload)
88
- instrument("pgbus.job_failed", queue: queue_name, job_class: payload&.dig("job_class"), error: e.class.name)
107
+ instrument(
108
+ "pgbus.job_failed",
109
+ queue: queue_name,
110
+ job_class: payload&.dig("job_class"),
111
+ job_id: payload&.dig("job_id"),
112
+ provider_job_id: payload&.dig("provider_job_id"),
113
+ read_ct: message.read_ct.to_i,
114
+ msg_id: message.msg_id.to_i,
115
+ error: e.class.name,
116
+ exception_object: e
117
+ )
89
118
  record_stat(payload, queue_name, "failed", execution_start, message: message)
90
119
  Pgbus.logger.debug { "[Pgbus::Executor] failed #{tag} job_class=#{payload&.dig("job_class")} error=#{e.class}" }
91
120
  # Don't signal concurrency on transient failure — the job will be retried.
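As an illustration of what the added `enqueued_at` payload key makes possible, a subscriber could derive time-on-queue from it. This sketch assumes `enqueued_at` is an ISO 8601 string, as ActiveJob writes into its serialized payload; `queue_latency_ms` is a hypothetical helper and not part of pgbus.

```ruby
require "time"

# Hypothetical helper: milliseconds between enqueue and the start of execution.
# Returns nil when the payload carries no usable timestamp.
def queue_latency_ms(enqueued_at, started_at)
  return nil if enqueued_at.nil?

  enqueued = enqueued_at.is_a?(Time) ? enqueued_at : Time.iso8601(enqueued_at.to_s)
  ((started_at - enqueued) * 1_000).round
rescue ArgumentError
  nil # unparseable timestamp
end

started = Time.utc(2024, 1, 1, 12, 0, 2)
puts queue_latency_ms("2024-01-01T12:00:00.500000000Z", started)
puts queue_latency_ms(nil, started).inspect
```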
@@ -0,0 +1,37 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgbus
4
+ class Client
5
+ # Fire-and-forget PG NOTIFY for ephemeral stream broadcasts. No PGMQ
6
+ # queue is created — the payload travels via the Postgres NOTIFY channel
7
+ # only, matching the channel naming convention that PGMQ's trigger uses:
8
+ # pgmq.q_<full_queue_name>.INSERT
9
+ #
10
+ # Subscribers already LISTEN on this channel via the Streamer's Listener.
11
+ # When a subscriber is connected, the StreamEventDispatcher receives the
12
+ # NOTIFY and fans out the payload. When no subscriber is connected,
13
+ # the NOTIFY is silently discarded by Postgres — no queue, no storage,
14
+ # no orphan tables.
15
+ #
16
+ # The payload is JSON-serialized into the NOTIFY's optional payload
17
+ # parameter (must be under 8000 bytes by default). Larger broadcasts
18
+ # are rejected by Postgres ("payload string too long"), so callers needing
19
+ # large payloads should use durable mode (which inserts into PGMQ).
20
+ module NotifyStream
21
+ def notify_stream(stream_name, payload)
22
+ full_name = config.queue_name(stream_name)
23
+ sanitized = QueueNameValidator.sanitize!(full_name)
24
+ channel = "pgmq.q_#{sanitized}.INSERT"
25
+ json = payload.is_a?(String) ? payload : JSON.generate(payload)
26
+
27
+ Instrumentation.instrument("pgbus.stream.notify", stream: stream_name, bytes: json.bytesize) do
28
+ synchronized do
29
+ @pgmq.with_connection do |conn|
30
+ conn.exec_params("SELECT pg_notify($1, $2)", [channel, json])
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
36
+ end
37
+ end
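The size constraint described in the comment above suggests a simple pre-flight check. The following sketch is an assumption about how a caller might pick a mode: `choose_broadcast_mode`, `notify_channel`, and the `NOTIFY_PAYLOAD_LIMIT` constant are hypothetical names, the queue name is made up, and only the channel format and the 8000-byte figure come from the source.

```ruby
require "json"

# Bytes. By default Postgres requires NOTIFY payloads to be shorter than this.
NOTIFY_PAYLOAD_LIMIT = 8000

# Hypothetical helper: serialize the payload the way notify_stream does and
# report which broadcast mode could carry it.
def choose_broadcast_mode(payload)
  json = payload.is_a?(String) ? payload : JSON.generate(payload)
  mode = json.bytesize < NOTIFY_PAYLOAD_LIMIT ? :ephemeral : :durable
  [mode, json.bytesize]
end

# Channel naming convention quoted in the source comment.
def notify_channel(sanitized_queue_name)
  "pgmq.q_#{sanitized_queue_name}.INSERT"
end

puts notify_channel("pgbus_chat_room_1")
p choose_broadcast_mode({ "html" => "<turbo-stream></turbo-stream>" })
p choose_broadcast_mode("x" * 9_000)
```

The check uses `bytesize` rather than `length` because the Postgres limit is in bytes, and multi-byte characters would otherwise be undercounted.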
data/lib/pgbus/client.rb CHANGED
@@ -3,11 +3,13 @@
3
3
  require "json"
4
4
  require_relative "client/read_after"
5
5
  require_relative "client/ensure_stream_queue"
6
+ require_relative "client/notify_stream"
6
7
 
7
8
  module Pgbus
8
9
  class Client
9
10
  include ReadAfter
10
11
  include EnsureStreamQueue
12
+ include NotifyStream
11
13
 
12
14
  attr_reader :pgmq, :config
13
15
 
@@ -104,7 +104,13 @@ module Pgbus
104
104
  :streams_default_retention, :streams_retention, :streams_heartbeat_interval,
105
105
  :streams_max_connections, :streams_idle_timeout, :streams_listen_health_check_ms,
106
106
  :streams_write_deadline_ms, :streams_falcon_streaming_body,
107
- :streams_stats_enabled, :streams_test_mode
107
+ :streams_stats_enabled, :streams_test_mode,
108
+ :streams_orphan_sweep_interval, :streams_orphan_threshold
109
+ attr_reader :streams_default_broadcast_mode # rubocop:disable Style/AccessorGrouping
110
+
111
+ # AppSignal integration (auto-loaded when ::Appsignal is defined and this is true).
112
+ # Set to false to opt out without uninstalling the appsignal gem.
113
+ attr_accessor :appsignal_enabled, :appsignal_probe_enabled
108
114
 
109
115
  def initialize
110
116
  @database_url = nil
@@ -212,6 +218,14 @@ module Pgbus
212
218
  # usually want job stats on and stream stats off, or vice versa.
213
219
  @streams_stats_enabled = false
214
220
  @streams_test_mode = false
221
+ @streams_default_broadcast_mode = :ephemeral
222
+ @streams_orphan_sweep_interval = 3600 # 1 hour
223
+ @streams_orphan_threshold = 86_400 # 24 hours
224
+
225
+ # AppSignal: auto-on when the appsignal gem is loaded; probe runs in
226
+ # the same process, so the operator can disable it independently.
227
+ @appsignal_enabled = true
228
+ @appsignal_probe_enabled = true
215
229
  end
216
230
 
217
231
  def queue_name(name)
@@ -255,6 +269,18 @@ module Pgbus
255
269
  end
256
270
  end
257
271
 
272
+ VALID_BROADCAST_MODES = %i[ephemeral durable].freeze
273
+
274
+ def streams_default_broadcast_mode=(mode)
275
+ mode = mode.to_sym
276
+ unless VALID_BROADCAST_MODES.include?(mode)
277
+ raise ArgumentError,
278
+ "Invalid streams_default_broadcast_mode: #{mode}. Must be one of: #{VALID_BROADCAST_MODES.join(", ")}"
279
+ end
280
+
281
+ @streams_default_broadcast_mode = mode
282
+ end
283
+
258
284
  VALID_PGMQ_SCHEMA_MODES = %i[auto extension embedded].freeze
259
285
 
260
286
  def pgmq_schema_mode=(mode)
@@ -334,6 +360,15 @@ module Pgbus
334
360
  end
335
361
 
336
362
  raise ArgumentError, "streams_retention must be a Hash" unless streams_retention.is_a?(Hash)
363
+
364
+ if streams_orphan_sweep_interval && !(streams_orphan_sweep_interval.is_a?(Numeric) && streams_orphan_sweep_interval.positive?)
365
+ raise ArgumentError, "streams_orphan_sweep_interval must be a positive number or nil to disable"
366
+ end
367
+
368
+ return if streams_orphan_threshold.nil?
369
+ return if streams_orphan_threshold.is_a?(Numeric) && streams_orphan_threshold.positive?
370
+
371
+ raise ArgumentError, "streams_orphan_threshold must be a positive number or nil to disable"
337
372
  end
338
373
 
339
374
  # Set the worker capsule list. Accepts:
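The validation rules added to the configuration in these hunks can be exercised in isolation. `StreamSettings` below is a hypothetical stand-in class, not the gem's configuration object; it mirrors only the broadcast-mode setter and the "positive number or nil" checks shown above.

```ruby
# Hypothetical stand-in mirroring the validation shown in the diff.
class StreamSettings
  VALID_BROADCAST_MODES = %i[ephemeral durable].freeze

  attr_reader :default_broadcast_mode
  attr_accessor :orphan_sweep_interval, :orphan_threshold

  def initialize
    @default_broadcast_mode = :ephemeral
    @orphan_sweep_interval = 3600 # 1 hour
    @orphan_threshold = 86_400    # 24 hours
  end

  # Accepts symbols or strings; rejects anything outside the allowed list.
  def default_broadcast_mode=(mode)
    mode = mode.to_sym
    unless VALID_BROADCAST_MODES.include?(mode)
      raise ArgumentError, "Invalid broadcast mode: #{mode}. Must be one of: #{VALID_BROADCAST_MODES.join(", ")}"
    end

    @default_broadcast_mode = mode
  end

  # nil disables a setting; otherwise it must be a positive number.
  def validate!
    { orphan_sweep_interval: orphan_sweep_interval, orphan_threshold: orphan_threshold }.each do |name, value|
      next if value.nil? || (value.is_a?(Numeric) && value.positive?)

      raise ArgumentError, "#{name} must be a positive number or nil to disable"
    end
    self
  end
end

settings = StreamSettings.new
settings.default_broadcast_mode = "durable" # strings are coerced to symbols
settings.orphan_threshold = nil             # nil switches the check off
puts settings.validate!.default_broadcast_mode
```

Validating in the setter surfaces a bad mode at configuration time, while the numeric checks run once in `validate!` after all settings are assigned.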
data/lib/pgbus/engine.rb CHANGED
@@ -71,6 +71,21 @@ module Pgbus
71
71
  require "pgbus/web/data_source"
72
72
  end
73
73
 
74
+ # AppSignal is third-party and entirely optional. We require the
75
+ # integration only when the host app has the appsignal gem loaded
76
+ # AND hasn't disabled it via config.appsignal_enabled. AppSignal
77
+ # itself loads early (it's typically required from config/environment.rb
78
+ # before Rails finishes booting), so by the time `after_initialize`
79
+ # fires the constant check is reliable.
80
+ initializer "pgbus.integrations.appsignal", after: :load_config_initializers do
81
+ ActiveSupport.on_load(:after_initialize) do
82
+ next unless defined?(::Appsignal) && Pgbus.configuration.appsignal_enabled
83
+
84
+ require "pgbus/integrations/appsignal"
85
+ Pgbus::Integrations::Appsignal.install!
86
+ end
87
+ end
88
+
74
89
  # Install the watermark cache middleware ahead of the app's own
75
90
  # middleware so the thread-local cache is cleared between every
76
91
  # Rack request. Without this, repeated page renders served by the
@@ -30,12 +30,32 @@ module Pgbus
30
30
  def process!(message)
31
31
  raw = JSON.parse(message.message)
32
32
  event = build_event(raw)
33
+ routing_key = raw.dig("headers", "routing_key") || raw["routing_key"]
33
34
 
34
35
  return :skipped if self.class.idempotent? && !claim_idempotency?(event.event_id)
35
36
 
36
- handle(event)
37
- instrument("pgbus.event_processed", event_id: event.event_id, handler: self.class.name)
37
+ instrument_payload = {
38
+ event_id: event.event_id,
39
+ handler: self.class.name,
40
+ routing_key: routing_key,
41
+ published_at: event.published_at,
42
+ read_ct: message.read_ct.to_i,
43
+ msg_id: message.msg_id.to_i
44
+ }
45
+ Instrumentation.instrument("pgbus.event_processed", instrument_payload) do
46
+ handle(event)
47
+ end
38
48
  :handled
49
+ rescue StandardError => e
50
+ instrument(
51
+ "pgbus.event_failed",
52
+ event_id: event&.event_id,
53
+ handler: self.class.name,
54
+ routing_key: routing_key,
55
+ error: e.class.name,
56
+ exception_object: e
57
+ )
58
+ raise
39
59
  end
40
60
 
41
61
  # Mirrors Pgbus::ActiveJob::Executor#execute_job: wrap the handler
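The `routing_key` lookup added to `process!` prefers the value under `headers` and falls back to a top-level key. A minimal sketch of that lookup on its own follows; `extract_routing_key` is a hypothetical helper, and the sample message shapes are assumptions for illustration.

```ruby
require "json"

# Mirrors the lookup in the diff: headers.routing_key wins, the top-level key is the fallback.
def extract_routing_key(raw_json)
  raw = JSON.parse(raw_json)
  raw.dig("headers", "routing_key") || raw["routing_key"]
end

with_headers = JSON.generate({ "headers" => { "routing_key" => "orders.created" }, "routing_key" => "legacy" })
top_level    = JSON.generate({ "routing_key" => "orders.shipped" })
neither      = JSON.generate({ "payload" => {} })

p extract_routing_key(with_headers) # value from headers
p extract_routing_key(top_level)    # top-level fallback
p extract_routing_key(neither)      # nil when neither key is present
```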
@@ -7,12 +7,21 @@ module Pgbus
7
7
  # automatically when used with the block form of AS::Notifications.instrument.
8
8
  #
9
9
  # Events emitted:
10
- # pgbus.client.send_message — single message enqueue
11
- # pgbus.client.send_batch — batch enqueue
12
- # pgbus.client.read_batch — batch dequeue
13
- # pgbus.client.read_message — single message dequeue
14
- # pgbus.executor.execute — full job execution (deserialize + perform + archive)
15
- # pgbus.serializer.serialize — job/event serialization
10
+ # pgbus.client.send_message — single message enqueue
11
+ # pgbus.client.send_batch — batch enqueue
12
+ # pgbus.client.read_batch — batch dequeue
13
+ # pgbus.client.read_message — single message dequeue
14
+ # pgbus.executor.execute — full job execution (deserialize + perform + archive)
15
+ # pgbus.job_completed — job archived successfully
16
+ # pgbus.job_failed — job raised; carries :exception_object
17
+ # pgbus.job_dead_lettered — job exceeded max_retries and was DLQ-routed
18
+ # pgbus.event_processed — event handler succeeded
19
+ # pgbus.event_failed — event handler raised; carries :exception_object
20
+ # pgbus.stream.broadcast — stream broadcast (sync or deferred)
21
+ # pgbus.outbox.publish — outbox row created
22
+ # pgbus.recurring.enqueue — scheduler enqueued a due recurring task
23
+ # pgbus.worker.recycle — worker hit a recycle threshold
24
+ # pgbus.serializer.serialize — job/event serialization
16
25
  # pgbus.serializer.deserialize — job/event deserialization
17
26
  #
18
27
  module Instrumentation
@@ -0,0 +1,87 @@
1
+ {
2
+ "title": "Pgbus — Health",
3
+ "description": "Backlog, dead-letter activity, dead-tuple growth, and MVCC horizon. The 'should I page someone?' dashboard.",
4
+ "graphs": [
5
+ {
6
+ "title": "Queue depth (visible vs total)",
7
+ "description": "Visible depth excludes messages whose VT hasn't expired. A divergence between the two means workers are slow but the queue isn't growing.",
8
+ "line_label": "%queue",
9
+ "format": "number",
10
+ "kind": "timeseries",
11
+ "metrics": [
12
+ { "name": "pgbus_queue_depth", "fields": ["GAUGE"], "tags": [] },
13
+ { "name": "pgbus_queue_visible_depth", "fields": ["GAUGE"], "tags": [] }
14
+ ]
15
+ },
16
+ {
17
+ "title": "Oldest message age (seconds)",
18
+ "description": "Per-queue head-of-line waiting time. If this climbs while queue depth stays flat, a single poison message is stuck in the VT loop.",
19
+ "line_label": "%queue",
20
+ "format": "duration",
21
+ "format_input": "second",
22
+ "kind": "timeseries",
23
+ "metrics": [
24
+ { "name": "pgbus_queue_oldest_message_age_seconds", "fields": ["GAUGE"], "tags": [] }
25
+ ]
26
+ },
27
+ {
28
+ "title": "DLQ depth + failed events",
29
+ "description": "Messages that exceeded max_retries plus the failed-events table. Spikes after a deploy point at a regression.",
30
+ "format": "number",
31
+ "kind": "timeseries",
32
+ "metrics": [
33
+ { "name": "pgbus_dlq_depth", "fields": ["GAUGE"], "tags": [] },
34
+ { "name": "pgbus_failed_events_total", "fields": ["GAUGE"], "tags": [] }
35
+ ]
36
+ },
37
+ {
38
+ "title": "Dead-lettered jobs per minute",
39
+ "line_label": "%queue %job_class",
40
+ "format": "number",
41
+ "kind": "timeseries",
42
+ "draw_null_as_zero": true,
43
+ "metrics": [
44
+ { "name": "pgbus_queue_job_count", "fields": ["COUNTER"], "tags": [{ "key": "status", "value": "dead_lettered" }] }
45
+ ]
46
+ },
47
+ {
48
+ "title": "Active processes",
49
+ "description": "Workers + dispatcher + scheduler currently heartbeating into pgbus_processes.",
50
+ "format": "number",
51
+ "kind": "timeseries",
52
+ "metrics": [
53
+ { "name": "pgbus_active_processes", "fields": ["GAUGE"], "tags": [] }
54
+ ]
55
+ },
56
+ {
57
+ "title": "Dead tuples in queue/archive tables",
58
+ "description": "If autovacuum can't keep up the index gets bloated and lock acquisition slows. Tune autovacuum_vacuum_scale_factor on the offending tables when this climbs.",
59
+ "format": "number",
60
+ "kind": "timeseries",
61
+ "metrics": [
62
+ { "name": "pgbus_total_dead_tuples", "fields": ["GAUGE"], "tags": [] }
63
+ ]
64
+ },
65
+ {
66
+ "title": "Oldest open transaction (seconds)",
67
+ "description": "MVCC horizon pin. Long-running transactions prevent VACUUM from cleaning the dead tuples above. Anything over 60s is a smell.",
68
+ "format": "duration",
69
+ "format_input": "second",
70
+ "kind": "timeseries",
71
+ "metrics": [
72
+ { "name": "pgbus_oldest_transaction_age_seconds", "fields": ["GAUGE"], "tags": [] }
73
+ ]
74
+ },
75
+ {
76
+ "title": "Worker recycles per minute",
77
+ "description": "Recycles by reason. Steady max_jobs is healthy; spiking max_memory means you have a leak.",
78
+ "line_label": "%reason",
79
+ "format": "number",
80
+ "kind": "timeseries",
81
+ "draw_null_as_zero": true,
82
+ "metrics": [
83
+ { "name": "pgbus_worker_recycled", "fields": ["COUNTER"], "tags": [] }
84
+ ]
85
+ }
86
+ ]
87
+ }
@@ -0,0 +1,65 @@
1
+ {
2
+ "title": "Pgbus — Streams",
3
+ "description": "Real-time SSE pub/sub. Broadcasts, fanout, active connections, and the outbox/recurring scheduler.",
4
+ "graphs": [
5
+ {
6
+ "title": "Stream broadcasts per minute",
7
+ "line_label": "%stream %deferred",
8
+ "format": "number",
9
+ "kind": "timeseries",
10
+ "draw_null_as_zero": true,
11
+ "metrics": [
12
+ { "name": "pgbus_stream_broadcast_count", "fields": ["COUNTER"], "tags": [] }
13
+ ]
14
+ },
15
+ {
16
+ "title": "Active SSE connections",
17
+ "description": "Estimated from connect/disconnect events in the last 60 minutes. Use as a rough capacity gauge — exact count requires the SSE process telemetry.",
18
+ "format": "number",
19
+ "kind": "timeseries",
20
+ "metrics": [
21
+ { "name": "pgbus_stream_active_connections", "fields": ["GAUGE"], "tags": [] }
22
+ ]
23
+ },
24
+ {
25
+ "title": "Average fanout per broadcast",
26
+ "description": "Mean number of connections that received each broadcast over the last hour.",
27
+ "format": "number",
28
+ "kind": "timeseries",
29
+ "metrics": [
30
+ { "name": "pgbus_stream_avg_fanout", "fields": ["GAUGE"], "tags": [] }
31
+ ]
32
+ },
33
+ {
34
+ "title": "Broadcast payload size (bytes)",
35
+ "description": "Distribution of payload bytes. Use to spot accidentally-streaming-an-entire-page bugs.",
36
+ "line_label": "%stream",
37
+ "format": "size",
38
+ "format_input": "byte",
39
+ "kind": "timeseries",
40
+ "metrics": [
41
+ { "name": "pgbus_stream_broadcast_bytes", "fields": ["MEAN", "P95"], "tags": [] }
42
+ ]
43
+ },
44
+ {
45
+ "title": "Outbox publishes per minute",
46
+ "line_label": "%kind",
47
+ "format": "number",
48
+ "kind": "timeseries",
49
+ "draw_null_as_zero": true,
50
+ "metrics": [
51
+ { "name": "pgbus_outbox_published", "fields": ["COUNTER"], "tags": [] }
52
+ ]
53
+ },
54
+ {
55
+ "title": "Recurring tasks enqueued per minute",
56
+ "line_label": "%task",
57
+ "format": "number",
58
+ "kind": "timeseries",
59
+ "draw_null_as_zero": true,
60
+ "metrics": [
61
+ { "name": "pgbus_recurring_enqueued", "fields": ["COUNTER"], "tags": [] }
62
+ ]
63
+ }
64
+ ]
65
+ }
@@ -0,0 +1,81 @@
1
+ {
2
+ "title": "Pgbus — Throughput & Latency",
3
+ "description": "Job and event throughput, perform-duration percentiles, and PGMQ send/read counts. Drives the most common 'is the worker keeping up?' question.",
4
+ "graphs": [
5
+ {
6
+ "title": "Jobs processed per minute",
7
+ "description": "Successful, failed, and dead-lettered jobs. A spike in failed without a spike in processed usually means a deploy regression.",
8
+ "line_label": "%queue %job_class %status",
9
+ "format": "number",
10
+ "format_input": null,
11
+ "kind": "timeseries",
12
+ "draw_null_as_zero": true,
13
+ "metrics": [
14
+ { "name": "pgbus_queue_job_count", "fields": ["COUNTER"], "tags": [] }
15
+ ]
16
+ },
17
+ {
18
+ "title": "Job perform duration (ms)",
19
+ "description": "Distribution of how long perform_now takes per job class. P95 and P99 are the lines to watch.",
20
+ "line_label": "%job_class",
21
+ "format": "duration",
22
+ "format_input": "millisecond",
23
+ "kind": "timeseries",
24
+ "metrics": [
25
+ { "name": "pgbus_job_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
26
+ ]
27
+ },
28
+ {
29
+ "title": "Events processed per minute",
30
+ "description": "Event-bus handler invocations grouped by routing key.",
31
+ "line_label": "%routing_key %handler %status",
32
+ "format": "number",
33
+ "kind": "timeseries",
34
+ "draw_null_as_zero": true,
35
+ "metrics": [
36
+ { "name": "pgbus_event_count", "fields": ["COUNTER"], "tags": [] }
37
+ ]
38
+ },
39
+ {
40
+ "title": "Event handler duration (ms)",
41
+ "line_label": "%handler",
42
+ "format": "duration",
43
+ "format_input": "millisecond",
44
+ "kind": "timeseries",
45
+ "metrics": [
46
+ { "name": "pgbus_event_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
47
+ ]
48
+ },
49
+ {
50
+ "title": "PGMQ messages sent (per minute)",
51
+ "line_label": "%queue",
52
+ "format": "number",
53
+ "kind": "timeseries",
54
+ "draw_null_as_zero": true,
55
+ "metrics": [
56
+ { "name": "pgbus_messages_sent", "fields": ["COUNTER"], "tags": [] }
57
+ ]
58
+ },
59
+ {
60
+ "title": "Send duration (ms)",
61
+ "line_label": "%queue",
62
+ "format": "duration",
63
+ "format_input": "millisecond",
64
+ "kind": "timeseries",
65
+ "metrics": [
66
+ { "name": "pgbus_send_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
67
+ ]
68
+ },
69
+ {
70
+ "title": "PGMQ messages read (per minute)",
71
+ "description": "Messages fetched from queues by workers. Compare against 'sent' to spot backlog growth.",
72
+ "line_label": "%queue",
73
+ "format": "number",
74
+ "kind": "timeseries",
75
+ "draw_null_as_zero": true,
76
+ "metrics": [
77
+ { "name": "pgbus_messages_read", "fields": ["COUNTER"], "tags": [] }
78
+ ]
79
+ }
80
+ ]
81
+ }
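Since the dashboards reference metrics by name, a quick way to audit one is to list the metric names and fields per graph. This sketch parses a trimmed excerpt of the throughput dashboard shown above; `dashboard_metrics` is a hypothetical helper, and in practice the JSON would be read from the file shipped in the gem.

```ruby
require "json"

# Trimmed excerpt of a dashboard definition from the diff.
DASHBOARD_JSON = <<~JSON
  {
    "graphs": [
      { "title": "Jobs processed per minute",
        "metrics": [{ "name": "pgbus_queue_job_count", "fields": ["COUNTER"], "tags": [] }] },
      { "title": "Job perform duration (ms)",
        "metrics": [{ "name": "pgbus_job_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }] }
    ]
  }
JSON

# Hypothetical helper: maps each metric name to the fields the dashboard reads.
def dashboard_metrics(json)
  JSON.parse(json).fetch("graphs").each_with_object({}) do |graph, acc|
    graph.fetch("metrics").each do |metric|
      acc[metric.fetch("name")] = metric.fetch("fields")
    end
  end
end

dashboard_metrics(DASHBOARD_JSON).each do |name, fields|
  puts "#{name}: #{fields.join(", ")}"
end
```

Using `fetch` instead of `[]` makes a malformed dashboard fail loudly with a `KeyError` rather than silently producing an empty listing.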