pgbus 0.7.8 → 0.7.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 48e02fc16de64a9ccec20b8b0d917e154e17dc2082c1e3933b80da1bdc2f0c2a
4
- data.tar.gz: 811db2dfcb4c4271296d9125754af3865b117ebc8b45d3194bc4314cc43ffaba
3
+ metadata.gz: d7661d7d684ac911e36b15267b4b6135081fa3bd9ca0a795cfceae7e9977304a
4
+ data.tar.gz: 7cdb802918724dafa634925a99c48c9a0ee154356ec807e69ad6ec86cb48f9be
5
5
  SHA512:
6
- metadata.gz: 2b41cd760d469af41c4e0ef0d9bfaffd3cb17ff81b38dbeb250605d78c737708f54ee43519fe25afeda83a5e15177a14e89e48d27471ae39ad5c79420d791e21
7
- data.tar.gz: 50f209ed7b74e18c1bf927ca73fe80286c9a4333642af4f32e579bd6f9e40158007c07a39dfa6a3fc2082fd9ad1663f3e9b109ebd7021485f931a615ad9baabb
6
+ metadata.gz: 917889d47343e7c8775f6be1c9bc9d72b7ca8a26de356a9434ba1470f0f4f345215e9f016f9bba74b78007c72668254b92715944574fc056b20ceb1af92ebaac
7
+ data.tar.gz: 011533444eec2c09e29de3938d84e6fc06efd87c82621efcccf92008d88b3a3ac76464619c959617d022e076da4bed5ca03b50f3b5997e7d81ec9b52ccce90c3
data/README.md CHANGED
@@ -728,6 +728,48 @@ Reporters are wired into all critical rescue paths: job execution failures, work
728
728
 
729
729
  `ErrorReporter.report` is guaranteed to never raise — if a reporter or the logger itself throws, the error is swallowed silently. This preserves fault-tolerance invariants at every rescue site.
730
730
 
731
+ ### AppSignal integration
732
+
733
+ When the `appsignal` gem is loaded in your app, Pgbus auto-installs a subscriber and a minutely probe that report into AppSignal:
734
+
735
+ - **Background-job transactions** for every ActiveJob run and every event-bus handler invocation. Action names follow the AppSignal convention: `MyJob#perform`, `MyHandler#handle`. Tags include `queue`, `job_class`/`handler`, `routing_key`, `attempts`, and the `active_job_id` / `provider_job_id`. `enqueued_at` becomes the AppSignal `queue_start` timestamp so "time on queue" shows up correctly in the timeline.
736
+ - **Custom counters and distributions** for sends, reads, broadcasts, outbox publishes, recurring scheduling, and worker recycles. All metric names are prefixed `pgbus_`.
737
+ - **A minutely probe** that gauges queue depth (visible vs total), oldest message age per queue, DLQ depth, failed events count, dead-tuple totals, MVCC horizon age, active processes, and stream connection estimates.
738
+
739
+ There is nothing to wire up — when the `appsignal` gem is loaded, the integration installs itself from a Rails initializer. To opt out:
740
+
741
+ ```ruby
742
+ Pgbus.configure do |c|
743
+ c.appsignal_enabled = false # disable subscriber + probe entirely
744
+ c.appsignal_probe_enabled = false # keep transactions, drop the gauge probe
745
+ end
746
+ ```
747
+
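
The engine only performs this install step during a Rails boot. For a standalone (non-Rails) process you can reproduce the same guard the engine uses by hand — a sketch built only from entry points visible in this diff (`Pgbus.configuration.appsignal_enabled`, `Pgbus::Integrations::Appsignal.install!`); the `install_pgbus_appsignal!` wrapper name is illustrative, not pgbus API:

```ruby
# Sketch: install the AppSignal integration from a non-Rails entry point
# (e.g. a standalone worker script), mirroring the engine's own guard.
def install_pgbus_appsignal!
  # Same checks the engine initializer performs before requiring the integration.
  return false unless defined?(::Appsignal)
  return false unless Pgbus.configuration.appsignal_enabled

  require "pgbus/integrations/appsignal"
  Pgbus::Integrations::Appsignal.install!
  true
end
```

Call it once after the process boots; `install!` itself is idempotent and returns early when already installed.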
748
+ #### Dashboards
749
+
750
+ Three importable AppSignal dashboards ship with the gem:
751
+
752
+ | File | Purpose |
753
+ |------|---------|
754
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_throughput.json` | Jobs/sec, perform-duration percentiles, send/read counts |
755
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_health.json` | Queue depth, oldest message age, DLQ, dead tuples, MVCC horizon, worker recycles |
756
+ | `lib/pgbus/integrations/appsignal/dashboards/pgbus_streams.json` | Broadcasts, fanout, active SSE connections, outbox, recurring tasks |
757
+
758
+ Import via the AppSignal dashboard UI ("New dashboard" → "Import JSON") or the AppSignal API.
759
+
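
If you would rather script the import, the shipped files can be located through RubyGems — a sketch assuming the gem is activated under the name `pgbus`; the `dashboard_titles` helper is illustrative:

```ruby
# List the titles of the dashboard JSON files shipped inside the gem.
require "json"

def dashboard_titles(dir)
  Dir.glob(File.join(dir, "*.json")).sort.map do |path|
    JSON.parse(File.read(path)).fetch("title")
  end
end

# Only works when the pgbus gem has been activated in this process.
if (spec = Gem.loaded_specs["pgbus"])
  dir = File.join(spec.gem_dir, "lib/pgbus/integrations/appsignal/dashboards")
  puts dashboard_titles(dir)
end
```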
760
+ #### Custom subscriptions
761
+
762
+ The integration is built on `ActiveSupport::Notifications`. If you want to push pgbus telemetry into a different APM (Datadog, New Relic, OpenTelemetry), subscribe directly:
763
+
764
+ ```ruby
765
+ ActiveSupport::Notifications.subscribe(/^pgbus\./) do |name, start, finish, _id, payload|
766
+ duration_ms = (finish - start) * 1_000
767
+ YourApm.record(name, duration_ms, payload)
768
+ end
769
+ ```
770
+
771
+ Events emitted: `pgbus.executor.execute`, `pgbus.job_completed`, `pgbus.job_failed`, `pgbus.job_dead_lettered`, `pgbus.event_processed`, `pgbus.event_failed`, `pgbus.client.send_message`, `pgbus.client.send_batch`, `pgbus.client.read_batch`, `pgbus.stream.broadcast`, `pgbus.outbox.publish`, `pgbus.recurring.enqueue`, `pgbus.worker.recycle`. Payload keys are documented in `lib/pgbus/instrumentation.rb`.
772
+
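
For single-event subscriptions the failure events are the most useful: `pgbus.job_failed` and `pgbus.event_failed` carry the raised exception under `:exception_object`. A sketch of forwarding them with low-cardinality tags only (the `failure_tags` helper and `YourApm` are illustrative, not pgbus API):

```ruby
# Keep only low-cardinality, non-nil payload entries as metric tags.
DROPPED_TAG_KEYS = %i[msg_id job_id provider_job_id arguments exception_object].freeze

def failure_tags(payload)
  payload.reject { |key, value| DROPPED_TAG_KEYS.include?(key) || value.nil? }
end

# ActiveSupport::Notifications.subscribe("pgbus.job_failed") do |*args|
#   event = ActiveSupport::Notifications::Event.new(*args)
#   YourApm.notify(event.payload[:exception_object], tags: failure_tags(event.payload))
# end

failure_tags(queue: "default", job_class: "SyncJob", msg_id: 17, error: "Timeout::Error")
# => { queue: "default", job_class: "SyncJob", error: "Timeout::Error" }
```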
731
773
  ### Structured logging
732
774
 
733
775
  Pgbus ships two log formatters inspired by Sidekiq's `Logger::Formatters`:
@@ -33,6 +33,15 @@ module Pgbus
33
33
  signal_batch_discarded(payload)
34
34
  Uniqueness.release_lock(Uniqueness.extract_key(payload))
35
35
  record_stat(payload, queue_name, "dead_lettered", execution_start, message: message)
36
+ instrument(
37
+ "pgbus.job_dead_lettered",
38
+ queue: queue_name,
39
+ job_class: job_class,
40
+ job_id: payload["job_id"],
41
+ provider_job_id: payload["provider_job_id"],
42
+ read_ct: read_count,
43
+ msg_id: message.msg_id.to_i
44
+ )
36
45
  Pgbus.logger.debug { "[Pgbus::Executor] dead_lettered #{tag} job_class=#{job_class}" }
37
46
  return :dead_lettered
38
47
  end
@@ -60,7 +69,17 @@ module Pgbus
60
69
  job_succeeded = false
61
70
 
62
71
  msg_id = message.msg_id.to_i
63
- Instrumentation.instrument("pgbus.executor.execute", queue: queue_name, job_class: job_class) do
72
+ instrument_payload = {
73
+ queue: queue_name,
74
+ job_class: job_class,
75
+ job_id: payload["job_id"],
76
+ provider_job_id: payload["provider_job_id"],
77
+ arguments: payload["arguments"],
78
+ enqueued_at: payload["enqueued_at"],
79
+ read_ct: read_count,
80
+ msg_id: msg_id
81
+ }
82
+ Instrumentation.instrument("pgbus.executor.execute", instrument_payload) do
64
83
  job = ::ActiveJob::Base.deserialize(payload)
65
84
  Pgbus.logger.debug { "[Pgbus::Executor] running #{tag} job_class=#{job_class}" }
66
85
  execute_job(job)
@@ -85,7 +104,17 @@ module Pgbus
85
104
  # silently lost control flow — no failed event row, no job_failed
86
105
  # notification, uniqueness lock held until VT expired. See issue #126.
87
106
  handle_failure(message, queue_name, e, payload: payload)
88
- instrument("pgbus.job_failed", queue: queue_name, job_class: payload&.dig("job_class"), error: e.class.name)
107
+ instrument(
108
+ "pgbus.job_failed",
109
+ queue: queue_name,
110
+ job_class: payload&.dig("job_class"),
111
+ job_id: payload&.dig("job_id"),
112
+ provider_job_id: payload&.dig("provider_job_id"),
113
+ read_ct: message.read_ct.to_i,
114
+ msg_id: message.msg_id.to_i,
115
+ error: e.class.name,
116
+ exception_object: e
117
+ )
89
118
  record_stat(payload, queue_name, "failed", execution_start, message: message)
90
119
  Pgbus.logger.debug { "[Pgbus::Executor] failed #{tag} job_class=#{payload&.dig("job_class")} error=#{e.class}" }
91
120
  # Don't signal concurrency on transient failure — the job will be retried.
@@ -106,6 +106,10 @@ module Pgbus
106
106
  :streams_write_deadline_ms, :streams_falcon_streaming_body,
107
107
  :streams_stats_enabled, :streams_test_mode
108
108
 
109
+ # AppSignal integration (auto-loaded when ::Appsignal is defined and this is true).
110
+ # Set to false to opt out without uninstalling the appsignal gem.
111
+ attr_accessor :appsignal_enabled, :appsignal_probe_enabled
112
+
109
113
  def initialize
110
114
  @database_url = nil
111
115
  @connection_params = nil
@@ -212,6 +216,11 @@ module Pgbus
212
216
  # usually want job stats on and stream stats off, or vice versa.
213
217
  @streams_stats_enabled = false
214
218
  @streams_test_mode = false
219
+
220
+ # AppSignal: auto-on when the appsignal gem is loaded; probe runs in
221
+ # the same process, so the operator can disable it independently.
222
+ @appsignal_enabled = true
223
+ @appsignal_probe_enabled = true
215
224
  end
216
225
 
217
226
  def queue_name(name)
data/lib/pgbus/engine.rb CHANGED
@@ -71,6 +71,21 @@ module Pgbus
71
71
  require "pgbus/web/data_source"
72
72
  end
73
73
 
74
+ # AppSignal is third-party and entirely optional. We require the
75
+ # integration only when the host app has the appsignal gem loaded
76
+ # AND hasn't disabled it via config.appsignal_enabled. AppSignal
77
+ # itself loads early (it's typically required from config/environment.rb
78
+ # before Rails finishes booting), so by the time `after_initialize`
79
+ # fires the constant check is reliable.
80
+ initializer "pgbus.integrations.appsignal", after: :load_config_initializers do
81
+ ActiveSupport.on_load(:after_initialize) do
82
+ next unless defined?(::Appsignal) && Pgbus.configuration.appsignal_enabled
83
+
84
+ require "pgbus/integrations/appsignal"
85
+ Pgbus::Integrations::Appsignal.install!
86
+ end
87
+ end
88
+
74
89
  # Install the watermark cache middleware ahead of the app's own
75
90
  # middleware so the thread-local cache is cleared between every
76
91
  # Rack request. Without this, repeated page renders served by the
@@ -30,12 +30,32 @@ module Pgbus
30
30
  def process!(message)
31
31
  raw = JSON.parse(message.message)
32
32
  event = build_event(raw)
33
+ routing_key = raw.dig("headers", "routing_key") || raw["routing_key"]
33
34
 
34
35
  return :skipped if self.class.idempotent? && !claim_idempotency?(event.event_id)
35
36
 
36
- handle(event)
37
- instrument("pgbus.event_processed", event_id: event.event_id, handler: self.class.name)
37
+ instrument_payload = {
38
+ event_id: event.event_id,
39
+ handler: self.class.name,
40
+ routing_key: routing_key,
41
+ published_at: event.published_at,
42
+ read_ct: message.read_ct.to_i,
43
+ msg_id: message.msg_id.to_i
44
+ }
45
+ Instrumentation.instrument("pgbus.event_processed", instrument_payload) do
46
+ handle(event)
47
+ end
38
48
  :handled
49
+ rescue StandardError => e
50
+ instrument(
51
+ "pgbus.event_failed",
52
+ event_id: event&.event_id,
53
+ handler: self.class.name,
54
+ routing_key: routing_key,
55
+ error: e.class.name,
56
+ exception_object: e
57
+ )
58
+ raise
39
59
  end
40
60
 
41
61
  # Mirrors Pgbus::ActiveJob::Executor#execute_job: wrap the handler
@@ -7,12 +7,21 @@ module Pgbus
7
7
  # automatically when used with the block form of AS::Notifications.instrument.
8
8
  #
9
9
  # Events emitted:
10
- # pgbus.client.send_message — single message enqueue
11
- # pgbus.client.send_batch — batch enqueue
12
- # pgbus.client.read_batch — batch dequeue
13
- # pgbus.client.read_message — single message dequeue
14
- # pgbus.executor.execute — full job execution (deserialize + perform + archive)
15
- # pgbus.serializer.serialize — job/event serialization
10
+ # pgbus.client.send_message — single message enqueue
11
+ # pgbus.client.send_batch — batch enqueue
12
+ # pgbus.client.read_batch — batch dequeue
13
+ # pgbus.client.read_message — single message dequeue
14
+ # pgbus.executor.execute — full job execution (deserialize + perform + archive)
15
+ # pgbus.job_completed — job archived successfully
16
+ # pgbus.job_failed — job raised; carries :exception_object
17
+ # pgbus.job_dead_lettered — job exceeded max_retries and was DLQ-routed
18
+ # pgbus.event_processed — event handler succeeded
19
+ # pgbus.event_failed — event handler raised; carries :exception_object
20
+ # pgbus.stream.broadcast — stream broadcast (sync or deferred)
21
+ # pgbus.outbox.publish — outbox row created
22
+ # pgbus.recurring.enqueue — scheduler enqueued a due recurring task
23
+ # pgbus.worker.recycle — worker hit a recycle threshold
24
+ # pgbus.serializer.serialize — job/event serialization
16
25
  # pgbus.serializer.deserialize — job/event deserialization
17
26
  #
18
27
  module Instrumentation
@@ -0,0 +1,87 @@
1
+ {
2
+ "title": "Pgbus — Health",
3
+ "description": "Backlog, dead-letter activity, dead-tuple growth, and MVCC horizon. The 'should I page someone?' dashboard.",
4
+ "graphs": [
5
+ {
6
+ "title": "Queue depth (visible vs total)",
7
+ "description": "Visible depth excludes messages whose VT hasn't expired. A divergence between the two means workers are slow but the queue isn't growing.",
8
+ "line_label": "%queue",
9
+ "format": "number",
10
+ "kind": "timeseries",
11
+ "metrics": [
12
+ { "name": "pgbus_queue_depth", "fields": ["GAUGE"], "tags": [] },
13
+ { "name": "pgbus_queue_visible_depth", "fields": ["GAUGE"], "tags": [] }
14
+ ]
15
+ },
16
+ {
17
+ "title": "Oldest message age (seconds)",
18
+ "description": "Per-queue head-of-line waiting time. If this climbs while queue depth stays flat, a single poison message is stuck in the VT loop.",
19
+ "line_label": "%queue",
20
+ "format": "duration",
21
+ "format_input": "second",
22
+ "kind": "timeseries",
23
+ "metrics": [
24
+ { "name": "pgbus_queue_oldest_message_age_seconds", "fields": ["GAUGE"], "tags": [] }
25
+ ]
26
+ },
27
+ {
28
+ "title": "DLQ depth + failed events",
29
+ "description": "Messages that exceeded max_retries plus the failed-events table. Spikes after a deploy point at a regression.",
30
+ "format": "number",
31
+ "kind": "timeseries",
32
+ "metrics": [
33
+ { "name": "pgbus_dlq_depth", "fields": ["GAUGE"], "tags": [] },
34
+ { "name": "pgbus_failed_events_total", "fields": ["GAUGE"], "tags": [] }
35
+ ]
36
+ },
37
+ {
38
+ "title": "Dead-lettered jobs per minute",
39
+ "line_label": "%queue %job_class",
40
+ "format": "number",
41
+ "kind": "timeseries",
42
+ "draw_null_as_zero": true,
43
+ "metrics": [
44
+ { "name": "pgbus_queue_job_count", "fields": ["COUNTER"], "tags": [{ "key": "status", "value": "dead_lettered" }] }
45
+ ]
46
+ },
47
+ {
48
+ "title": "Active processes",
49
+ "description": "Workers + dispatcher + scheduler currently heartbeating into pgbus_processes.",
50
+ "format": "number",
51
+ "kind": "timeseries",
52
+ "metrics": [
53
+ { "name": "pgbus_active_processes", "fields": ["GAUGE"], "tags": [] }
54
+ ]
55
+ },
56
+ {
57
+ "title": "Dead tuples in queue/archive tables",
58
+ "description": "If autovacuum can't keep up the index gets bloated and lock acquisition slows. Tune autovacuum_vacuum_scale_factor on the offending tables when this climbs.",
59
+ "format": "number",
60
+ "kind": "timeseries",
61
+ "metrics": [
62
+ { "name": "pgbus_total_dead_tuples", "fields": ["GAUGE"], "tags": [] }
63
+ ]
64
+ },
65
+ {
66
+ "title": "Oldest open transaction (seconds)",
67
+ "description": "MVCC horizon pin. Long-running transactions prevent VACUUM from cleaning the dead tuples above. Anything over 60s is a smell.",
68
+ "format": "duration",
69
+ "format_input": "second",
70
+ "kind": "timeseries",
71
+ "metrics": [
72
+ { "name": "pgbus_oldest_transaction_age_seconds", "fields": ["GAUGE"], "tags": [] }
73
+ ]
74
+ },
75
+ {
76
+ "title": "Worker recycles per minute",
77
+ "description": "Recycles by reason. Steady max_jobs is healthy; spiking max_memory means you have a leak.",
78
+ "line_label": "%reason",
79
+ "format": "number",
80
+ "kind": "timeseries",
81
+ "draw_null_as_zero": true,
82
+ "metrics": [
83
+ { "name": "pgbus_worker_recycled", "fields": ["COUNTER"], "tags": [] }
84
+ ]
85
+ }
86
+ ]
87
+ }
@@ -0,0 +1,65 @@
1
+ {
2
+ "title": "Pgbus — Streams",
3
+ "description": "Real-time SSE pub/sub. Broadcasts, fanout, active connections, and the outbox/recurring scheduler.",
4
+ "graphs": [
5
+ {
6
+ "title": "Stream broadcasts per minute",
7
+ "line_label": "%stream %deferred",
8
+ "format": "number",
9
+ "kind": "timeseries",
10
+ "draw_null_as_zero": true,
11
+ "metrics": [
12
+ { "name": "pgbus_stream_broadcast_count", "fields": ["COUNTER"], "tags": [] }
13
+ ]
14
+ },
15
+ {
16
+ "title": "Active SSE connections",
17
+ "description": "Estimated from connect/disconnect events in the last 60 minutes. Use as a rough capacity gauge — exact count requires the SSE process telemetry.",
18
+ "format": "number",
19
+ "kind": "timeseries",
20
+ "metrics": [
21
+ { "name": "pgbus_stream_active_connections", "fields": ["GAUGE"], "tags": [] }
22
+ ]
23
+ },
24
+ {
25
+ "title": "Average fanout per broadcast",
26
+ "description": "Mean number of connections that received each broadcast over the last hour.",
27
+ "format": "number",
28
+ "kind": "timeseries",
29
+ "metrics": [
30
+ { "name": "pgbus_stream_avg_fanout", "fields": ["GAUGE"], "tags": [] }
31
+ ]
32
+ },
33
+ {
34
+ "title": "Broadcast payload size (bytes)",
35
+ "description": "Distribution of payload bytes. Use to spot accidentally-streaming-an-entire-page bugs.",
36
+ "line_label": "%stream",
37
+ "format": "size",
38
+ "format_input": "byte",
39
+ "kind": "timeseries",
40
+ "metrics": [
41
+ { "name": "pgbus_stream_broadcast_bytes", "fields": ["MEAN", "P95"], "tags": [] }
42
+ ]
43
+ },
44
+ {
45
+ "title": "Outbox publishes per minute",
46
+ "line_label": "%kind",
47
+ "format": "number",
48
+ "kind": "timeseries",
49
+ "draw_null_as_zero": true,
50
+ "metrics": [
51
+ { "name": "pgbus_outbox_published", "fields": ["COUNTER"], "tags": [] }
52
+ ]
53
+ },
54
+ {
55
+ "title": "Recurring tasks enqueued per minute",
56
+ "line_label": "%task",
57
+ "format": "number",
58
+ "kind": "timeseries",
59
+ "draw_null_as_zero": true,
60
+ "metrics": [
61
+ { "name": "pgbus_recurring_enqueued", "fields": ["COUNTER"], "tags": [] }
62
+ ]
63
+ }
64
+ ]
65
+ }
@@ -0,0 +1,81 @@
1
+ {
2
+ "title": "Pgbus — Throughput & Latency",
3
+ "description": "Job and event throughput, perform-duration percentiles, and PGMQ send/read counts. Drives the most common 'is the worker keeping up?' question.",
4
+ "graphs": [
5
+ {
6
+ "title": "Jobs processed per minute",
7
+ "description": "Successful, failed, and dead-lettered jobs. A spike in failed without a spike in processed usually means a deploy regression.",
8
+ "line_label": "%queue %job_class %status",
9
+ "format": "number",
10
+ "format_input": null,
11
+ "kind": "timeseries",
12
+ "draw_null_as_zero": true,
13
+ "metrics": [
14
+ { "name": "pgbus_queue_job_count", "fields": ["COUNTER"], "tags": [] }
15
+ ]
16
+ },
17
+ {
18
+ "title": "Job perform duration (ms)",
19
+ "description": "Distribution of how long perform_now takes per job class. P95 and P99 are the lines to watch.",
20
+ "line_label": "%job_class",
21
+ "format": "duration",
22
+ "format_input": "millisecond",
23
+ "kind": "timeseries",
24
+ "metrics": [
25
+ { "name": "pgbus_job_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
26
+ ]
27
+ },
28
+ {
29
+ "title": "Events processed per minute",
30
+ "description": "Event-bus handler invocations grouped by routing key.",
31
+ "line_label": "%routing_key %handler %status",
32
+ "format": "number",
33
+ "kind": "timeseries",
34
+ "draw_null_as_zero": true,
35
+ "metrics": [
36
+ { "name": "pgbus_event_count", "fields": ["COUNTER"], "tags": [] }
37
+ ]
38
+ },
39
+ {
40
+ "title": "Event handler duration (ms)",
41
+ "line_label": "%handler",
42
+ "format": "duration",
43
+ "format_input": "millisecond",
44
+ "kind": "timeseries",
45
+ "metrics": [
46
+ { "name": "pgbus_event_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
47
+ ]
48
+ },
49
+ {
50
+ "title": "PGMQ messages sent (per minute)",
51
+ "line_label": "%queue",
52
+ "format": "number",
53
+ "kind": "timeseries",
54
+ "draw_null_as_zero": true,
55
+ "metrics": [
56
+ { "name": "pgbus_messages_sent", "fields": ["COUNTER"], "tags": [] }
57
+ ]
58
+ },
59
+ {
60
+ "title": "Send duration (ms)",
61
+ "line_label": "%queue",
62
+ "format": "duration",
63
+ "format_input": "millisecond",
64
+ "kind": "timeseries",
65
+ "metrics": [
66
+ { "name": "pgbus_send_duration_ms", "fields": ["MEAN", "P95", "P99"], "tags": [] }
67
+ ]
68
+ },
69
+ {
70
+ "title": "PGMQ messages read (per minute)",
71
+ "description": "Messages fetched from queues by workers. Compare against 'sent' to spot backlog growth.",
72
+ "line_label": "%queue",
73
+ "format": "number",
74
+ "kind": "timeseries",
75
+ "draw_null_as_zero": true,
76
+ "metrics": [
77
+ { "name": "pgbus_messages_read", "fields": ["COUNTER"], "tags": [] }
78
+ ]
79
+ }
80
+ ]
81
+ }
@@ -0,0 +1,128 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgbus
4
+ module Integrations
5
+ module Appsignal
6
+ # Minutely probe that pushes pgbus-wide gauges into AppSignal.
7
+ #
8
+ # All readings come from Pgbus::Web::DataSource so the probe doesn't
9
+ # duplicate query logic. DataSource is built to be resilient — every
10
+ # method rescues StandardError and returns a safe default — but we
11
+ # still wrap each section in our own rescue so a probe iteration
12
+ # never raises out into the AppSignal probe runner.
13
+ module Probe
14
+ METRIC_PREFIX = "pgbus_"
15
+ private_constant :METRIC_PREFIX
16
+
17
+ class << self
18
+ def install! # rubocop:disable Naming/PredicateMethod
19
+ return false if @installed
20
+
21
+ ::Appsignal::Probes.register :pgbus, new_probe_instance
22
+ @installed = true
23
+ true
24
+ end
25
+
26
+ def installed?
27
+ @installed == true
28
+ end
29
+
30
+ def reset!
31
+ ::Appsignal::Probes.unregister(:pgbus) if defined?(::Appsignal::Probes) &&
32
+ ::Appsignal::Probes.respond_to?(:unregister)
33
+ @installed = false
34
+ end
35
+
36
+ # Visible for testing — returns a fresh runnable probe.
37
+ def new_probe_instance
38
+ Runner.new
39
+ end
40
+ end
41
+
42
+ # The actual probe object; AppSignal calls #call once per minute.
43
+ class Runner
44
+ def initialize(data_source: nil)
45
+ @data_source = data_source
46
+ end
47
+
48
+ def call
49
+ return unless data_source
50
+
51
+ track_queues
52
+ track_processes
53
+ track_summary
54
+ track_streams
55
+ end
56
+
57
+ private
58
+
59
+ def data_source
60
+ @data_source ||=
61
+ (::Pgbus::Web::DataSource.new if defined?(::Pgbus::Web::DataSource))
62
+ end
63
+
64
+ def track_queues
65
+ data_source.queues_with_metrics.each do |q|
66
+ tags = { queue: q[:name] }
67
+ gauge "queue_depth", q[:queue_length], tags
68
+ gauge "queue_visible_depth", q[:queue_visible_length], tags
69
+ gauge "queue_paused", q[:paused] ? 1 : 0, tags
70
+ age = q[:oldest_msg_age_sec]
71
+ gauge "queue_oldest_message_age_seconds", age, tags if age
72
+ end
73
+ rescue StandardError => e
74
+ log_failure("queue metrics", e)
75
+ end
76
+
77
+ def track_processes
78
+ gauge "active_processes", data_source.processes.count
79
+ rescue StandardError => e
80
+ log_failure("process metrics", e)
81
+ end
82
+
83
+ def track_summary
84
+ stats = data_source.summary_stats
85
+ gauge "total_queues", stats[:total_queues]
86
+ gauge "total_depth", stats[:total_depth]
87
+ gauge "total_visible", stats[:total_visible]
88
+ gauge "dlq_depth", stats[:dlq_depth]
89
+ gauge "failed_events_total", stats[:failed_count]
90
+ gauge "throughput_rate", stats[:throughput_rate]
91
+ gauge "total_dead_tuples", stats[:total_dead_tuples]
92
+ gauge "tables_needing_vacuum", stats[:tables_needing_vacuum]
93
+ gauge "oldest_transaction_age_seconds", stats[:oldest_transaction_age_sec]
94
+ rescue StandardError => e
95
+ log_failure("summary metrics", e)
96
+ end
97
+
98
+ def track_streams
99
+ return unless data_source.respond_to?(:stream_stats_available?) &&
100
+ data_source.stream_stats_available?
101
+
102
+ summary = data_source.stream_stats_summary
103
+ gauge "stream_broadcasts_60m", summary[:broadcasts]
104
+ gauge "stream_connects_60m", summary[:connects]
105
+ gauge "stream_disconnects_60m", summary[:disconnects]
106
+ gauge "stream_active_connections", summary[:active_estimate]
107
+ gauge "stream_avg_fanout", summary[:avg_fanout]
108
+ gauge "stream_avg_broadcast_ms", summary[:avg_broadcast_ms]
109
+ rescue StandardError => e
110
+ log_failure("stream metrics", e)
111
+ end
112
+
113
+ def gauge(key, value, tags = {})
114
+ return if value.nil?
115
+
116
+ ::Appsignal.set_gauge("#{METRIC_PREFIX}#{key}", value, tags)
117
+ end
118
+
119
+ def log_failure(label, error)
120
+ Pgbus.logger.debug do
121
+ "[Pgbus::AppSignal::Probe] #{label} failed: #{error.class}: #{error.message}"
122
+ end
123
+ end
124
+ end
125
+ end
126
+ end
127
+ end
128
+ end
@@ -0,0 +1,303 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "time"
4
+
5
+ module Pgbus
6
+ module Integrations
7
+ module Appsignal
8
+ # Translates Pgbus::Instrumentation events into AppSignal transactions
9
+ # and custom metrics.
10
+ #
11
+ # Job and event-handler events open a BACKGROUND_JOB transaction so they
12
+ # appear under AppSignal's "Performance > Background jobs" view, with
13
+ # action `<JobClass>#perform` or `<HandlerClass>#handle`. All other
14
+ # events are reported as counters or distributions only.
15
+ #
16
+ # All metric names are prefixed `pgbus_`. Tag keys avoid high-cardinality
17
+ # values (no msg_id, no event_id) so AppSignal's metric storage stays
18
+ # efficient.
19
+ module Subscriber
20
+ BACKGROUND_JOB = "background_job"
21
+ METRIC_PREFIX = "pgbus_"
22
+ private_constant :BACKGROUND_JOB, :METRIC_PREFIX
23
+
24
+ # Tracked so we can detach in reset! (used by specs).
25
+ @subscriptions = []
26
+
27
+ class << self
28
+ def install!
29
+ return false if @installed
30
+
31
+ @subscriptions = [
32
+ subscribe("pgbus.executor.execute") { |event| on_executor_execute(event) },
33
+ subscribe("pgbus.job_completed") { |event| on_job_completed(event) },
34
+ subscribe("pgbus.job_failed") { |event| on_job_failed(event) },
35
+ subscribe("pgbus.job_dead_lettered") { |event| on_job_dead_lettered(event) },
36
+ subscribe("pgbus.event_processed") { |event| on_event_processed(event) },
37
+ subscribe("pgbus.event_failed") { |event| on_event_failed(event) },
38
+ subscribe("pgbus.client.send_message") { |event| on_send_message(event) },
39
+ subscribe("pgbus.client.send_batch") { |event| on_send_batch(event) },
40
+ subscribe("pgbus.client.read_batch") { |event| on_read_batch(event) },
41
+ subscribe("pgbus.stream.broadcast") { |event| on_stream_broadcast(event) },
42
+ subscribe("pgbus.outbox.publish") { |event| on_outbox_publish(event) },
43
+ subscribe("pgbus.recurring.enqueue") { |event| on_recurring_enqueue(event) },
44
+ subscribe("pgbus.worker.recycle") { |event| on_worker_recycle(event) }
45
+ ]
46
+ @installed = true
47
+ end
48
+
49
+ def installed?
50
+ @installed == true
51
+ end
52
+
53
+ def reset!
54
+ @subscriptions&.each { |s| ActiveSupport::Notifications.unsubscribe(s) }
55
+ @subscriptions = []
56
+ @installed = false
57
+ end
58
+
59
+ private
60
+
61
+ def subscribe(name, &block)
62
63
+ ActiveSupport::Notifications.subscribe(name) do |*args|
64
+ event = ActiveSupport::Notifications::Event.new(*args)
65
+ safely { block.call(event) }
66
+ end
67
+ end
68
+
69
+ # Errors in the subscriber must never affect the producer thread.
70
+ # AppSignal can be misconfigured, the agent can be down, etc. — log
71
+ # and move on.
72
+ def safely
73
+ yield
74
+ rescue StandardError => e
75
+ Pgbus.logger.warn do
76
+ "[Pgbus::AppSignal] subscriber error: #{e.class}: #{e.message}"
77
+ end
78
+ end
79
+
80
+ # ── Job execution ───────────────────────────────────────────────
81
+
82
+ def on_executor_execute(event)
83
+ payload = event.payload
84
+ transaction = ::Appsignal::Transaction.create(BACKGROUND_JOB)
85
+ transaction.set_action_if_nil("#{payload[:job_class] || "UnknownJob"}#perform")
86
+ apply_queue_start(transaction, payload[:enqueued_at])
87
+ transaction.add_tags(job_tags(payload))
88
+ transaction.add_params_if_nil { { arguments: payload[:arguments] } }
89
+ ::Appsignal.add_distribution_value(
90
+ "#{METRIC_PREFIX}job_duration_ms",
91
+ event.duration,
92
+ { queue: payload[:queue], job_class: payload[:job_class] }
93
+ )
94
+ ensure
95
+ ::Appsignal::Transaction.complete_current!
96
+ end
97
+
98
+ def on_job_completed(event)
99
+ payload = event.payload
100
+ ::Appsignal.increment_counter(
101
+ "#{METRIC_PREFIX}queue_job_count",
102
+ 1,
103
+ { queue: payload[:queue], job_class: payload[:job_class], status: "processed" }
104
+ )
105
+ end
106
+
107
+ def on_job_failed(event)
108
+ payload = event.payload
109
+ ::Appsignal.increment_counter(
110
+ "#{METRIC_PREFIX}queue_job_count",
111
+ 1,
112
+ { queue: payload[:queue], job_class: payload[:job_class], status: "failed" }
113
+ )
114
+ err = payload[:exception_object]
115
+ ::Appsignal.set_error(err) if err && ::Appsignal.respond_to?(:set_error)
116
+ end
117
+
118
+ def on_job_dead_lettered(event)
119
+ payload = event.payload
120
+ ::Appsignal.increment_counter(
121
+ "#{METRIC_PREFIX}queue_job_count",
122
+ 1,
123
+ { queue: payload[:queue], job_class: payload[:job_class], status: "dead_lettered" }
124
+ )
125
+ end
126
+
127
+ # ── Event handler ───────────────────────────────────────────────
128
+
129
+ def on_event_processed(event)
130
+ payload = event.payload
131
+ transaction = ::Appsignal::Transaction.create(BACKGROUND_JOB)
132
+ transaction.set_action_if_nil("#{payload[:handler] || "UnknownHandler"}#handle")
133
+ apply_queue_start(transaction, payload[:published_at])
134
+ transaction.add_tags(handler_tags(payload))
135
+ ::Appsignal.add_distribution_value(
136
+ "#{METRIC_PREFIX}event_duration_ms",
137
+ event.duration,
138
+ { handler: payload[:handler], routing_key: payload[:routing_key] }
139
+ )
140
+ ::Appsignal.increment_counter(
141
+ "#{METRIC_PREFIX}event_count",
142
+ 1,
143
+ { handler: payload[:handler], routing_key: payload[:routing_key], status: "processed" }
144
+ )
145
+ ensure
146
+ ::Appsignal::Transaction.complete_current!
147
+ end
148
+
149
+ def on_event_failed(event)
150
+ payload = event.payload
151
+ ::Appsignal.increment_counter(
152
+ "#{METRIC_PREFIX}event_count",
153
+ 1,
154
+ { handler: payload[:handler], routing_key: payload[:routing_key], status: "failed" }
155
+ )
156
+ err = payload[:exception_object]
157
+ ::Appsignal.set_error(err) if err && ::Appsignal.respond_to?(:set_error)
158
+ end
159
+
160
+ # ── Client (PGMQ wrapper) ───────────────────────────────────────
161
+
162
+ def on_send_message(event)
163
+ payload = event.payload
164
+ ::Appsignal.increment_counter(
165
+              "#{METRIC_PREFIX}messages_sent",
+              1,
+              { queue: payload[:queue] }
+            )
+            ::Appsignal.add_distribution_value(
+              "#{METRIC_PREFIX}send_duration_ms",
+              event.duration,
+              { queue: payload[:queue] }
+            )
+          end
+
+          def on_send_batch(event)
+            payload = event.payload
+            count = payload[:count] || payload[:batch_size] || 1
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}messages_sent",
+              count,
+              { queue: payload[:queue] }
+            )
+            ::Appsignal.add_distribution_value(
+              "#{METRIC_PREFIX}send_batch_duration_ms",
+              event.duration,
+              { queue: payload[:queue] }
+            )
+          end
+
+          def on_read_batch(event)
+            payload = event.payload
+            count = payload[:count] || payload[:fetched] || 0
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}messages_read",
+              count,
+              { queue: payload[:queue] }
+            )
+          end
+
+          # ── Streams ─────────────────────────────────────────────────────
+
+          def on_stream_broadcast(event)
+            payload = event.payload
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}stream_broadcast_count",
+              1,
+              { stream: payload[:stream], deferred: payload[:deferred] ? "true" : "false" }
+            )
+            return unless payload[:bytes]
+
+            ::Appsignal.add_distribution_value(
+              "#{METRIC_PREFIX}stream_broadcast_bytes",
+              payload[:bytes],
+              { stream: payload[:stream] }
+            )
+          end
+
+          # ── Outbox ──────────────────────────────────────────────────────
+
+          def on_outbox_publish(event)
+            payload = event.payload
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}outbox_published",
+              1,
+              { kind: payload[:kind] || "job" }
+            )
+            ::Appsignal.add_distribution_value(
+              "#{METRIC_PREFIX}outbox_publish_duration_ms",
+              event.duration,
+              { kind: payload[:kind] || "job" }
+            )
+          end
+
+          # ── Recurring scheduler ─────────────────────────────────────────
+
+          def on_recurring_enqueue(event)
+            payload = event.payload
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}recurring_enqueued",
+              1,
+              { task: payload[:task], class_name: payload[:class_name] }
+            )
+          end
+
+          # ── Worker lifecycle ────────────────────────────────────────────
+
+          def on_worker_recycle(event)
+            payload = event.payload
+            ::Appsignal.increment_counter(
+              "#{METRIC_PREFIX}worker_recycled",
+              1,
+              { reason: payload[:reason] }
+            )
+          end
+
+          # ── Helpers ─────────────────────────────────────────────────────
+
+          # AppSignal expects queue-start as Unix epoch milliseconds. Pgbus
+          # carries it as either an ISO-8601 String or a Time — both happen
+          # in practice (executor passes the JSON string, handler passes a
+          # parsed Time).
+          def apply_queue_start(transaction, value)
+            return unless value
+
+            millis =
+              case value
+              when Time
+                (value.to_f * 1_000).to_i
+              when String
+                (Time.parse(value).to_f * 1_000).to_i
+              when Numeric
+                value.to_i
+              end
+            transaction.set_queue_start(millis) if millis
+          rescue ArgumentError
+            # Unparseable timestamp — skip rather than blow up.
+          end
+
+          def job_tags(payload)
+            tags = {
+              "queue" => payload[:queue],
+              "job_class" => payload[:job_class],
+              "attempts" => payload[:read_ct]
+            }
+            tags["active_job_id"] = payload[:job_id] if payload[:job_id]
+            tags["provider_job_id"] = payload[:provider_job_id] if payload[:provider_job_id]
+            tags["request_id"] = payload[:provider_job_id] || payload[:job_id]
+            tags.compact
+          end
+
+          def handler_tags(payload)
+            {
+              "handler" => payload[:handler],
+              "routing_key" => payload[:routing_key],
+              "attempts" => payload[:read_ct]
+            }.compact
+          end
+        end
+      end
+    end
+  end
+end
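The `apply_queue_start` helper above normalizes three shapes of `enqueued_at` into the epoch-millisecond value AppSignal expects. A dependency-free sketch of that conversion (standalone; `queue_start_millis` is a name invented here, not pgbus API):

```ruby
require "time"

# Standalone sketch of the queue-start normalization in apply_queue_start
# above. AppSignal wants queue_start as Unix epoch milliseconds; pgbus may
# carry enqueued_at as a Time, an ISO-8601 String, or an already-converted
# Numeric.
def queue_start_millis(value)
  case value
  when Time    then (value.to_f * 1_000).to_i
  when String  then (Time.parse(value).to_f * 1_000).to_i
  when Numeric then value.to_i
  end
rescue ArgumentError
  nil # unparseable String timestamp: skip rather than blow up
end

queue_start_millis(Time.at(1_700_000_000)) # => 1700000000000
queue_start_millis("1970-01-01T00:00:01Z") # => 1000
queue_start_millis("not a timestamp")      # => nil
```

Keeping the `ArgumentError` rescue at the method level mirrors the subscriber's behavior: a bad timestamp degrades to "no queue time recorded" instead of failing the transaction.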
@@ -0,0 +1,52 @@
+# frozen_string_literal: true
+
+require "pgbus/integrations/appsignal/subscriber"
+require "pgbus/integrations/appsignal/probe"
+
+module Pgbus
+  module Integrations
+    # AppSignal integration for pgbus.
+    #
+    # Loaded automatically by Pgbus::Engine when the appsignal gem is present
+    # and config.appsignal_enabled is true (default). To opt out:
+    #
+    #   Pgbus.configure do |c|
+    #     c.appsignal_enabled = false
+    #   end
+    #
+    # The integration:
+    # * Subscribes to pgbus.* ActiveSupport::Notifications and translates
+    #   them into AppSignal background-job transactions and metrics.
+    # * Registers a minutely probe that reports queue depth, DLQ size,
+    #   dead-tuple counts, MVCC horizon age, and stream stats from
+    #   Pgbus::Web::DataSource.
+    #
+    # All metric names are prefixed `pgbus_` so they group cleanly in
+    # AppSignal's custom-metrics view.
+    module Appsignal
+      module_function
+
+      def install! # rubocop:disable Naming/PredicateMethod
+        return false unless defined?(::Appsignal)
+        return false if @installed
+
+        Subscriber.install!
+        Probe.install! if Pgbus.configuration.appsignal_probe_enabled
+        @installed = true
+        Pgbus.logger.info { "[Pgbus] AppSignal integration installed" }
+        true
+      end
+
+      def installed?
+        @installed == true
+      end
+
+      # Test hook: tear everything down so a fresh install! can run.
+      def reset!
+        Subscriber.reset! if defined?(Subscriber)
+        Probe.reset! if defined?(Probe)
+        @installed = false
+      end
+    end
+  end
+end
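The module above is installed automatically, so most apps only touch its configuration knobs. A minimal initializer sketch (the file path is illustrative; under Rails the engine calls `install!` for you):

```ruby
# config/initializers/pgbus_appsignal.rb (illustrative path)
Pgbus.configure do |c|
  c.appsignal_enabled = true         # default; set to false to opt out entirely
  c.appsignal_probe_enabled = false  # keep transactions, skip the minutely probe
end

# Outside Rails (e.g. a bare worker script) install by hand. install! returns
# true on the first successful install, false if AppSignal is absent or the
# integration is already installed.
Pgbus::Integrations::Appsignal.install!
```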
data/lib/pgbus/outbox.rb CHANGED
@@ -5,22 +5,26 @@ module Pgbus
     module_function

     def publish(queue_name, payload, headers: nil, priority: nil, delay: 0)
-      OutboxEntry.create!(
-        queue_name: queue_name,
-        payload: payload,
-        headers: headers,
-        priority: priority || Pgbus.configuration.default_priority,
-        delay: delay
-      )
+      Instrumentation.instrument("pgbus.outbox.publish", queue: queue_name, kind: :job) do
+        OutboxEntry.create!(
+          queue_name: queue_name,
+          payload: payload,
+          headers: headers,
+          priority: priority || Pgbus.configuration.default_priority,
+          delay: delay
+        )
+      end
     end

     def publish_event(routing_key, payload, headers: nil)
-      event_data = EventBus::Publisher.build_event_data(payload)
-      OutboxEntry.create!(
-        routing_key: routing_key,
-        payload: event_data,
-        headers: headers
-      )
+      Instrumentation.instrument("pgbus.outbox.publish", routing_key: routing_key, kind: :event) do
+        event_data = EventBus::Publisher.build_event_data(payload)
+        OutboxEntry.create!(
+          routing_key: routing_key,
+          payload: event_data,
+          headers: headers
+        )
+      end
     end

     def flush!
@@ -302,15 +302,33 @@
     end

     def check_recycle
-      return unless @lifecycle.running? && recycle_needed?
+      return unless @lifecycle.running?
+
+      reason = recycle_reason
+      return unless reason

       Pgbus.stopping = true
       @lifecycle.transition_to(:draining)
+      Pgbus::Instrumentation.instrument(
+        "pgbus.worker.recycle",
+        reason: reason,
+        jobs_processed: @jobs_processed.value,
+        memory_mb: current_memory_mb,
+        lifetime_seconds: monotonic_now - @started_at_monotonic
+      )
       @wake_signal.notify!
     end

+    def recycle_reason
+      return :max_jobs if exceeded_max_jobs?
+      return :max_memory if exceeded_max_memory?
+      return :max_lifetime if exceeded_max_lifetime?
+
+      nil
+    end
+
     def recycle_needed?
-      exceeded_max_jobs? || exceeded_max_memory? || exceeded_max_lifetime?
+      !recycle_reason.nil?
     end

     def exceeded_max_jobs?
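The refactor above turns the boolean `recycle_needed?` into a `recycle_reason` so the instrumentation can report which limit tripped. A dependency-free sketch of the precedence (the keyword signature and thresholds here are illustrative, not the worker's internals):

```ruby
# Illustrative sketch of the recycle_reason precedence above: the checks run
# in a fixed order, so :max_jobs masks :max_memory, which masks :max_lifetime.
def recycle_reason(jobs:, memory_mb:, lifetime_s:, limits:)
  return :max_jobs     if jobs >= limits.fetch(:max_jobs)
  return :max_memory   if memory_mb >= limits.fetch(:max_memory)
  return :max_lifetime if lifetime_s >= limits.fetch(:max_lifetime)

  nil
end

limits = { max_jobs: 1_000, max_memory: 512, max_lifetime: 3_600 }
recycle_reason(jobs: 1_000, memory_mb: 600, lifetime_s: 0, limits: limits) # => :max_jobs
recycle_reason(jobs: 0, memory_mb: 0, lifetime_s: 0, limits: limits)       # => nil
```

Returning a reason instead of a boolean costs nothing extra (`recycle_needed?` is just `!recycle_reason.nil?`) but makes the `pgbus.worker.recycle` event and its AppSignal tag far more useful.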
@@ -41,8 +41,16 @@

     def tick(now)
       schedule.due_tasks(now).each do |task, run_at|
-        schedule.enqueue_task(task, run_at: run_at)
-        @last_runs[task.key] = now
+        Pgbus::Instrumentation.instrument(
+          "pgbus.recurring.enqueue",
+          task: task.key,
+          class_name: task.class_name,
+          queue: task.queue_name,
+          run_at: run_at
+        ) do
+          schedule.enqueue_task(task, run_at: run_at)
+          @last_runs[task.key] = now
+        end
       rescue StandardError => e
         Pgbus.logger.error do
           "[Pgbus] Error scheduling recurring task #{task.key}: #{e.class}: #{e.message}"
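Note the block-level `rescue` above: it sits inside the `each` block, so a failure while scheduling one task is logged and the loop moves on to the remaining due tasks. A dependency-free sketch of that isolation (all names invented here):

```ruby
# Sketch of per-task error isolation: the rescue lives inside the each
# block (valid since Ruby 2.6), so one broken task doesn't abort the tick.
def tick(tasks, enqueued, errors)
  tasks.each do |task|
    raise "boom" if task[:broken]

    enqueued << task[:key]
  rescue StandardError => e
    errors << "#{task[:key]}: #{e.message}"
  end
end

enqueued = []
errors = []
tick([{ key: "a" }, { key: "b", broken: true }, { key: "c" }], enqueued, errors)
enqueued # => ["a", "c"]
errors   # => ["b: boom"]
```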
data/lib/pgbus/streams.rb CHANGED
@@ -73,11 +73,19 @@
       wrapped = { "html" => payload.to_s }
       wrapped["visible_to"] = visible_to.to_s if visible_to
       transaction = current_open_transaction
-      if transaction
-        transaction.after_commit { @client.send_message(@name, wrapped) }
-        nil
-      else
-        @client.send_message(@name, wrapped)
+      instrument_payload = {
+        stream: @name,
+        visible_to: visible_to,
+        deferred: !transaction.nil?,
+        bytes: wrapped["html"].bytesize
+      }
+      Instrumentation.instrument("pgbus.stream.broadcast", instrument_payload) do
+        if transaction
+          transaction.after_commit { @client.send_message(@name, wrapped) }
+          nil
+        else
+          @client.send_message(@name, wrapped)
+        end
       end
     end

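The instrumented broadcast above keeps the original deferral semantics: inside an open transaction the message waits for `after_commit` and `nil` is returned; otherwise it goes out immediately. A dependency-free sketch of that decision (the callback queue and `sender` lambda stand in for ActiveRecord's hook and the pgbus client):

```ruby
# Sketch of the deferral decision above (illustrative names): inside an open
# transaction the send is queued for after_commit, otherwise it fires now.
def broadcast_html(html, in_transaction:, after_commit:, sender:)
  wrapped = { "html" => html.to_s }
  if in_transaction
    after_commit << -> { sender.call(wrapped) } # deferred until commit
    nil
  else
    sender.call(wrapped) # immediate
  end
end

sent = []
pending = []
broadcast_html("<p>hi</p>", in_transaction: true, after_commit: pending, sender: ->(m) { sent << m })
sent.empty?          # => true (nothing goes out before commit)
pending.each(&:call) # simulate the commit firing the queued callbacks
sent                 # => [{"html"=>"<p>hi</p>"}]
```

Deferring to `after_commit` means a rolled-back transaction never broadcasts HTML for rows that were never persisted.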
data/lib/pgbus/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module Pgbus
-  VERSION = "0.7.8"
+  VERSION = "0.7.9"
 end
data/lib/pgbus.rb CHANGED
@@ -40,6 +40,10 @@
   loader.ignore("#{__dir__}/generators")
   loader.ignore("#{__dir__}/active_job")
   loader.ignore("#{__dir__}/pgbus/testing")
+  # Vendor integrations are loaded conditionally (when the vendor gem
+  # is present) by lib/pgbus/engine.rb. Keeping them out of Zeitwerk
+  # means we don't reference vendor constants at autoload time.
+  loader.ignore("#{__dir__}/pgbus/integrations")
   # lib/puma/plugin/pgbus_streams.rb is a Puma plugin — it's required
   # explicitly by the user from config/puma.rb via `plugin :pgbus_streams`.
   # Without this ignore, Zeitwerk scans lib/puma/ under the pgbus loader
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pgbus
 version: !ruby/object:Gem::Version
-  version: 0.7.8
+  version: 0.7.9
 platform: ruby
 authors:
 - Mikael Henriksson
@@ -273,6 +273,12 @@ files:
 - lib/pgbus/generators/database_target_detector.rb
 - lib/pgbus/generators/migration_detector.rb
 - lib/pgbus/instrumentation.rb
+- lib/pgbus/integrations/appsignal.rb
+- lib/pgbus/integrations/appsignal/dashboards/pgbus_health.json
+- lib/pgbus/integrations/appsignal/dashboards/pgbus_streams.json
+- lib/pgbus/integrations/appsignal/dashboards/pgbus_throughput.json
+- lib/pgbus/integrations/appsignal/probe.rb
+- lib/pgbus/integrations/appsignal/subscriber.rb
 - lib/pgbus/log_formatter.rb
 - lib/pgbus/outbox.rb
 - lib/pgbus/outbox/poller.rb