pgbus 0.7.4 → 0.7.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 3c1610b9a423eb2d8cc6797176ec23ce9ddadab1b313f8c1ee985327a6af0c42
- data.tar.gz: b7369612f3ac54bc3bef2e9fbec4afcc8c31a632edd667ab91d03d750f1c3e4c
+ metadata.gz: a28025e2b7af463cb6ff57d12e516d461b9b968ed7bb4eb4568be18d77ff2678
+ data.tar.gz: fd39f5774df158b0cf06e85a6ef52f0114d649008f56b4736a782ce895baae0b
  SHA512:
- metadata.gz: 906567f5015468dad875c7d4948979fb78b095e1433838cbb3761eff02d54b3186a251a44e8d52508a3fd6b8cb540648d79eded23301561346f4d97878c63b11
- data.tar.gz: 24cdb2845692e0172681fa899f5b8dac743ef00d214b0e09f4f9a0836f042026f0cceda1fab76851a52028b8b50101df9158907d487752dd301f4b28aed9466c
+ metadata.gz: 19ced25de78f93ccbf307e2bdd72a5dbc46050657c6e394bc9e6a16d8665fa19bc2a29cebf2fcbda8c31eb07e07d94545322eec80c5d61915ddd6bfc924de105
+ data.tar.gz: 9d8e2a9255ce011edf6bc2647fbd4d604db65b8fb2470c17735149599072f2824e510db4e97c4d241cdd4d6a9fef2ff7c23638738cd40de581ee56a07b783f6b
data/CHANGELOG.md CHANGED
@@ -1,18 +1,3 @@
- ## [0.5.1] - 2026-04-08
-
- ### Fixed
-
- - **Capsule DSL: anonymous duplicate capsules are now allowed.** Configurations like `c.workers = "*: 3; *: 3; *: 3; *: 3; *: 3"` (the legacy YAML pattern of 5 forks × 3 threads, all reading every queue) were rejected at boot in 0.5.0 with `Pgbus::Configuration::CapsuleDSL::ParseError: wildcard '*' appears in two capsules`. PGMQ tolerates multiple processes reading the same queue natively (`FOR UPDATE SKIP LOCKED`), and this is the canonical way to scale CPU parallelism across forks, so the rejection was wrong.
-
- The fix introduces a "named vs anonymous" distinction:
-
- - The string DSL parser is now purely syntactic — it no longer enforces overlap rules.
- - `Pgbus::Configuration#workers=` auto-assigns `:name` only to capsules whose first queue would yield a *unique* name AND is not the bare wildcard. Wildcards stay anonymous; collision-prone first-queues stay anonymous.
- - `Pgbus::Configuration#validate_no_queue_overlap!` (called by `c.capsule :name, ...`) now only checks against existing **named** capsules. Anonymous capsules can overlap freely with each other and with named capsules.
- - Net result: `"*: 3; *: 3; *: 3"` produces 3 anonymous capsules (3 forks), `"critical: 5; default: 10"` produces 2 named capsules (CLI `--capsule critical` still works), and named-vs-named overlap is still rejected as before.
-
- No changes required to user configuration — legacy YAML patterns and the modern DSL both work as documented.
-
  ## [Unreleased]

  ### Breaking Changes
@@ -28,6 +13,25 @@
  - Warn when dashboard `web_auth` is unconfigured
  - Add `globalid` as an explicit runtime dependency (was used but only transitively available via activejob)

+ ### Fixed
+
+ - **Defensive retry on stale pooled pgmq connections in the enqueue path.** `Pgbus::Client#send_message`, `#send_batch`, and `#publish_to_topic` now retry once when `@pgmq.produce*` raises `PGMQ::Errors::ConnectionError` with a message indicating the pooled `PG::Connection` was killed beneath pgmq-ruby — typically by PgBouncer hitting `server_idle_timeout` / `client_idle_timeout`, an admin disconnect, or a TCP RST. Observed in production as `PQsocket() can't get socket descriptor` on the first produce following an idle window. pgmq-ruby's `auto_reconnect` recovers on the *next* pool checkout, so a single retry is sufficient; non-stale errors (pool timeout, misconfiguration, unreachable database) still propagate unchanged. Upstream pgmq-ruby fix for the underlying misclassification is in-flight at mensfeld/pgmq-ruby#94.
+
+ ## [0.5.1] - 2026-04-08
+
+ ### Fixed
+
+ - **Capsule DSL: anonymous duplicate capsules are now allowed.** Configurations like `c.workers = "*: 3; *: 3; *: 3; *: 3; *: 3"` (the legacy YAML pattern of 5 forks × 3 threads, all reading every queue) were rejected at boot in 0.5.0 with `Pgbus::Configuration::CapsuleDSL::ParseError: wildcard '*' appears in two capsules`. PGMQ tolerates multiple processes reading the same queue natively (`FOR UPDATE SKIP LOCKED`), and this is the canonical way to scale CPU parallelism across forks, so the rejection was wrong.
+
+ The fix introduces a "named vs anonymous" distinction:
+
+ - The string DSL parser is now purely syntactic — it no longer enforces overlap rules.
+ - `Pgbus::Configuration#workers=` auto-assigns `:name` only to capsules whose first queue would yield a *unique* name AND is not the bare wildcard. Wildcards stay anonymous; collision-prone first-queues stay anonymous.
+ - `Pgbus::Configuration#validate_no_queue_overlap!` (called by `c.capsule :name, ...`) now only checks against existing **named** capsules. Anonymous capsules can overlap freely with each other and with named capsules.
+ - Net result: `"*: 3; *: 3; *: 3"` produces 3 anonymous capsules (3 forks), `"critical: 5; default: 10"` produces 2 named capsules (CLI `--capsule critical` still works), and named-vs-named overlap is still rejected as before.
+
+ No changes required to user configuration — legacy YAML patterns and the modern DSL both work as documented.
+
  ## [0.1.0] - 2026-03-30

  - Initial release
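The single-retry behavior described in the 0.7.6 `### Fixed` entry can be modeled in isolation. This is a minimal standalone sketch, not the gem's code: `StaleError`, `with_stale_retry`, and the two-entry pattern list are illustrative stand-ins for `PGMQ::Errors::ConnectionError` and the gem's `STALE_CONNECTION_PATTERNS`.

```ruby
# Standalone sketch of a single-retry wrapper for pre-flight
# stale-socket errors. StaleError and the fake producer below are
# illustrative stand-ins, not pgbus/pgmq-ruby API.
STALE_PATTERNS = [
  "pqsocket() can't get socket descriptor",
  "connection is closed"
].freeze

class StaleError < StandardError; end

def stale?(error)
  msg = error.message.downcase
  STALE_PATTERNS.any? { |p| msg.include?(p) }
end

def with_stale_retry
  attempts = 0
  begin
    yield
  rescue StaleError => e
    attempts += 1
    # Retry exactly once, and only for recognized stale-socket
    # messages; anything else (pool timeout, bad config) propagates.
    raise unless attempts == 1 && stale?(e)
    retry
  end
end

# Simulate a producer whose first call hits a killed pooled connection.
calls = 0
result = with_stale_retry do
  calls += 1
  raise StaleError, "PQsocket() can't get socket descriptor" if calls == 1
  :sent
end
```

A non-matching message takes the `raise` branch on the first attempt, which is why a pool timeout still surfaces immediately.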
@@ -156,6 +156,11 @@ class CreatePgbusTables < ActiveRecord::Migration<%= migration_version %>
  # queue processing and concurrency lock management.
  execute Pgbus::AutovacuumTuning.sql_for_all_queues
  execute Pgbus::AutovacuumTuning.sql_for_high_churn_tables
+
+ # Set fillfactor on queue tables to reduce bloat from PGMQ's read
+ # UPDATE operations (vt, read_ct, last_read_at). Lower fillfactor
+ # reserves page space, reducing page density during heavy update churn.
+ execute Pgbus::TableMaintenance.fillfactor_sql_for_all_queues
  end

  def down
@@ -0,0 +1,30 @@
+ class TunePgbusFillfactor < ActiveRecord::Migration<%= migration_version %>
+ def up
+ # Set fillfactor on queue tables to reduce bloat from PGMQ's read
+ # UPDATE operations. PGMQ updates vt, read_ct, and last_read_at on
+ # every read — with fillfactor=100 (default), pages fill completely
+ # between vacuum passes. Lowering fillfactor reserves page space,
+ # reducing page density during heavy update churn. Note: because vt
+ # is indexed, these updates are not HOT-eligible.
+ #
+ # Archive tables are append-only (INSERT from queue, DELETE on
+ # retention) and don't benefit from fillfactor tuning.
+ #
+ # New queues created after this migration automatically receive
+ # this setting via Pgbus::Client at queue creation time.
+ execute Pgbus::TableMaintenance.fillfactor_sql_for_all_queues
+ end
+
+ def down
+ execute <<~SQL
+ DO $$
+ DECLARE
+ q RECORD;
+ BEGIN
+ FOR q IN SELECT queue_name FROM pgmq.meta LOOP
+ EXECUTE format('ALTER TABLE pgmq.q_%I RESET (fillfactor)', q.queue_name);
+ END LOOP;
+ END $$;
+ SQL
+ end
+ end
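For a single queue, the statement this migration's DO block emits per table can be sketched as below. The string mirrors `Pgbus::TableMaintenance.fillfactor_sql_for_queue` as shown later in this diff; the `"orders"` queue name is hypothetical, and the `reserved_pct` arithmetic restates the generator's own post-install message (100 − 70 = 30% of each page left free).

```ruby
# Sketch of the per-queue ALTER statement. FILLFACTOR mirrors
# Pgbus::TableMaintenance::FILLFACTOR as it appears in this diff.
FILLFACTOR = 70

def fillfactor_sql(queue_name)
  # PGMQ stores each queue in the pgmq schema as q_<queue_name>.
  "ALTER TABLE pgmq.q_#{queue_name} SET (fillfactor = #{FILLFACTOR});"
end

sql = fillfactor_sql("orders")    # hypothetical queue name
reserved_pct = 100 - FILLFACTOR   # page space left free for update churn
```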
@@ -0,0 +1,51 @@
+ # frozen_string_literal: true
+
+ require "rails/generators"
+ require "rails/generators/active_record"
+ require_relative "migration_path"
+
+ module Pgbus
+ module Generators
+ class TuneFillfactorGenerator < Rails::Generators::Base
+ include ActiveRecord::Generators::Migration
+ include MigrationPath
+
+ source_root File.expand_path("templates", __dir__)
+
+ desc "Set fillfactor on PGMQ queue tables to reduce page density and bloat"
+
+ class_option :database,
+ type: :string,
+ default: nil,
+ desc: "Use a separate database for pgbus tables (e.g. --database=pgbus)"
+
+ def create_migration_file
+ migration_template "tune_fillfactor.rb.erb",
+ File.join(pgbus_migrate_path, "tune_pgbus_fillfactor.rb")
+ end
+
+ def display_post_install
+ say ""
+ say "Pgbus fillfactor tuning migration created!", :green
+ say ""
+ say "This migration sets fillfactor=#{Pgbus::TableMaintenance::FILLFACTOR} on all existing"
+ say "PGMQ queue tables. This reserves #{100 - Pgbus::TableMaintenance::FILLFACTOR}% of each page to"
+ say "reduce page density during PGMQ's heavy read UPDATE churn."
+ say ""
+ say "New queues created at runtime will automatically receive"
+ say "this setting."
+ say ""
+ say "Next steps:"
+ say "  1. Run: rails db:migrate#{":#{options[:database]}" if separate_database?}"
+ say "  2. Restart pgbus: bin/pgbus start"
+ say ""
+ end
+
+ private
+
+ def migration_version
+ "[#{ActiveRecord::Migration.current_version}]"
+ end
+ end
+ end
+ end
@@ -19,17 +19,20 @@ module Pgbus
  # stream and from the streamer on first subscription per stream.
  module EnsureStreamQueue
  def ensure_stream_queue(stream_name)
- ensure_queue(stream_name)
  full_name = config.queue_name(stream_name)

- # PGMQ's default NOTIFY throttle is 250ms — meant to coalesce
- # high-frequency worker queue inserts. Streams are latency-
- # sensitive and need every broadcast to fire a NOTIFY, even
- # when several are batched within a single millisecond.
- # Override the throttle to 0 specifically for stream queues.
- # Use the idempotent path to avoid deadlocks when multiple
- # processes race to set up the same stream queue.
- synchronized { enable_notify_if_needed(full_name, 0) }
+ with_stale_connection_retry do
+ ensure_queue(stream_name)
+
+ # PGMQ's default NOTIFY throttle is 250ms meant to coalesce
+ # high-frequency worker queue inserts. Streams are latency-
+ # sensitive and need every broadcast to fire a NOTIFY, even
+ # when several are batched within a single millisecond.
+ # Override the throttle to 0 specifically for stream queues.
+ # Use the idempotent path to avoid deadlocks when multiple
+ # processes race to set up the same stream queue.
+ synchronized { enable_notify_if_needed(full_name, 0) }
+ end

  # CREATE INDEX IF NOT EXISTS is idempotent in Postgres but still
  # requires a roundtrip and a brief ACCESS SHARE lock on the archive
data/lib/pgbus/client.rb CHANGED
@@ -85,32 +85,40 @@ module Pgbus

  def send_message(queue_name, payload, headers: nil, delay: 0, priority: nil)
  target = @queue_strategy.target_queue(queue_name, priority)
- ensure_queue(queue_name)
  Instrumentation.instrument("pgbus.client.send_message", queue: target) do
- synchronized { @pgmq.produce(target, serialize(payload), headers: headers && serialize(headers), delay: delay) }
+ with_stale_connection_retry do
+ ensure_queue(queue_name)
+ synchronized { @pgmq.produce(target, serialize(payload), headers: headers && serialize(headers), delay: delay) }
+ end
  end
  end

  def send_batch(queue_name, payloads, headers: nil, delay: 0)
  full_name = config.queue_name(queue_name)
- ensure_queue(queue_name)
  serialized, serialized_headers = serialize_batch(payloads, headers)
  Instrumentation.instrument("pgbus.client.send_batch", queue: full_name, size: payloads.size) do
- synchronized { @pgmq.produce_batch(full_name, serialized, headers: serialized_headers, delay: delay) }
+ with_stale_connection_retry do
+ ensure_queue(queue_name)
+ synchronized { @pgmq.produce_batch(full_name, serialized, headers: serialized_headers, delay: delay) }
+ end
  end
  end

  def read_message(queue_name, vt: nil)
  full_name = config.queue_name(queue_name)
  Instrumentation.instrument("pgbus.client.read_message", queue: full_name) do
- synchronized { @pgmq.read(full_name, vt: vt || config.visibility_timeout) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.read(full_name, vt: vt || config.visibility_timeout) }
+ end
  end
  end

  def read_batch(queue_name, qty:, vt: nil)
  full_name = config.queue_name(queue_name)
  Instrumentation.instrument("pgbus.client.read_batch", queue: full_name, qty: qty) do
- synchronized { @pgmq.read_batch(full_name, vt: vt || config.visibility_timeout, qty: qty) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.read_batch(full_name, vt: vt || config.visibility_timeout, qty: qty) }
+ end
  end
  end

@@ -130,7 +138,9 @@ module Pgbus
  break if remaining <= 0

  msgs = Instrumentation.instrument("pgbus.client.read_batch", queue: pq_name, qty: remaining) do
- synchronized { @pgmq.read_batch(pq_name, vt: vt || config.visibility_timeout, qty: remaining) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.read_batch(pq_name, vt: vt || config.visibility_timeout, qty: remaining) }
+ end
  end || []

  msgs.each { |m| results << [pq_name, m] }
@@ -142,14 +152,16 @@ module Pgbus

  def read_with_poll(queue_name, qty:, vt: nil, max_poll_seconds: 5, poll_interval_ms: 100)
  full_name = config.queue_name(queue_name)
- synchronized do
- @pgmq.read_with_poll(
- full_name,
- vt: vt || config.visibility_timeout,
- qty: qty,
- max_poll_seconds: max_poll_seconds,
- poll_interval_ms: poll_interval_ms
- )
+ with_stale_connection_retry do
+ synchronized do
+ @pgmq.read_with_poll(
+ full_name,
+ vt: vt || config.visibility_timeout,
+ qty: qty,
+ max_poll_seconds: max_poll_seconds,
+ poll_interval_ms: poll_interval_ms
+ )
+ end
  end
  end

@@ -164,8 +176,10 @@ module Pgbus
  def read_multi(queue_names, qty:, vt: nil, limit: nil)
  full_names = queue_names.map { |q| config.queue_name(q) }
  Instrumentation.instrument("pgbus.client.read_multi", queues: full_names, qty: qty, limit: limit) do
- synchronized do
- @pgmq.read_multi(full_names, vt: vt || config.visibility_timeout, qty: qty, limit: limit)
+ with_stale_connection_retry do
+ synchronized do
+ @pgmq.read_multi(full_names, vt: vt || config.visibility_timeout, qty: qty, limit: limit)
+ end
  end
  end
  end
@@ -174,74 +188,99 @@ module Pgbus
  # the full PGMQ queue name (e.g. from priority sub-queues or dashboard).
  def delete_message(queue_name, msg_id, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.delete(name, msg_id) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.delete(name, msg_id) }
+ end
  end

  # Archive a message. Pass prefixed: false when queue_name is already
  # the full PGMQ queue name.
  def archive_message(queue_name, msg_id, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.archive(name, msg_id) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.archive(name, msg_id) }
+ end
  end

  # Batch archive — moves multiple messages to the archive table in one call.
  def archive_batch(queue_name, msg_ids, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.archive_batch(name, msg_ids) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.archive_batch(name, msg_ids) }
+ end
  end

  # Batch delete — permanently removes multiple messages in one call.
  def delete_batch(queue_name, msg_ids, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.delete_batch(name, msg_ids) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.delete_batch(name, msg_ids) }
+ end
  end

  # Set visibility timeout. Pass prefixed: false when queue_name is already
  # the full PGMQ queue name.
  def set_visibility_timeout(queue_name, msg_id, vt:, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.set_vt(name, msg_id, vt: vt) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.set_vt(name, msg_id, vt: vt) }
+ end
  end

+ # Open a PGMQ transaction. The caller block may run twice if the first
+ # attempt hits a pre-flight stale-connection error — safe because no SQL
+ # was sent on the first attempt (the connection was dead before the BEGIN).
  def transaction(&block)
- synchronized { @pgmq.transaction(&block) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.transaction(&block) }
+ end
  end

  def move_to_dead_letter(queue_name, message)
- ensure_dead_letter_queue(queue_name)
  dlq_name = config.dead_letter_queue_name(queue_name)
  full_queue = config.queue_name(queue_name)

- synchronized do
- @pgmq.transaction do |txn|
- txn.produce(dlq_name, message.message, headers: message.headers)
- txn.delete(full_queue, message.msg_id.to_i)
+ with_stale_connection_retry do
+ ensure_dead_letter_queue(queue_name)
+ synchronized do
+ @pgmq.transaction do |txn|
+ txn.produce(dlq_name, message.message, headers: message.headers)
+ txn.delete(full_queue, message.msg_id.to_i)
+ end
  end
  end
  end

  def metrics(queue_name = nil)
- synchronized do
- if queue_name
- @pgmq.metrics(config.queue_name(queue_name))
- else
- @pgmq.metrics_all
+ with_stale_connection_retry do
+ synchronized do
+ if queue_name
+ @pgmq.metrics(config.queue_name(queue_name))
+ else
+ @pgmq.metrics_all
+ end
  end
  end
  end

  def list_queues
- synchronized { @pgmq.list_queues }
+ with_stale_connection_retry do
+ synchronized { @pgmq.list_queues }
+ end
  end

  def purge_queue(queue_name, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- synchronized { @pgmq.purge_queue(name) }
+ with_stale_connection_retry do
+ synchronized { @pgmq.purge_queue(name) }
+ end
  end

  def drop_queue(queue_name, prefixed: true)
  name = prefixed ? config.queue_name(queue_name) : queue_name
- result = synchronized { @pgmq.drop_queue(name) }
+ result = with_stale_connection_retry do
+ synchronized { @pgmq.drop_queue(name) }
+ end
  @queues_created.delete(name)
  result
  end
@@ -313,18 +352,22 @@ module Pgbus
  # Topic routing
  def bind_topic(pattern, queue_name)
  full_name = config.queue_name(queue_name)
- ensure_queue(queue_name)
- synchronized { @pgmq.bind_topic(pattern, full_name) }
+ with_stale_connection_retry do
+ ensure_queue(queue_name)
+ synchronized { @pgmq.bind_topic(pattern, full_name) }
+ end
  end

  def publish_to_topic(routing_key, payload, headers: nil, delay: 0)
- synchronized do
- @pgmq.produce_topic(
- routing_key,
- serialize(payload),
- headers: headers && serialize(headers),
- delay: delay
- )
+ with_stale_connection_retry do
+ synchronized do
+ @pgmq.produce_topic(
+ routing_key,
+ serialize(payload),
+ headers: headers && serialize(headers),
+ delay: delay
+ )
+ end
  end
  end

@@ -502,6 +545,7 @@ module Pgbus
  def tune_autovacuum(queue_name)
  with_raw_connection do |conn|
  conn.exec(AutovacuumTuning.sql_for_queue(queue_name))
+ conn.exec(TableMaintenance.fillfactor_sql_for_queue(queue_name))
  end
  rescue StandardError => e
  Pgbus.logger.debug { "[Pgbus::Client] Autovacuum tuning failed for #{queue_name}: #{e.message}" }
@@ -518,6 +562,60 @@ module Pgbus
  end
  end

+ # Substrings that indicate the pooled PG::Connection was already dead
+ # *before* pgmq-ruby tried to use it — typically killed by a connection
+ # pooler (PgBouncer server_idle_timeout / client_idle_timeout), an admin
+ # disconnect, or a TCP RST while the slot was idle.
+ #
+ # Only pre-checkout / pre-flight errors belong here. Mid-flight errors
+ # like "server closed the connection" or "connection to server was lost"
+ # are excluded because PG may have already committed the INSERT before
+ # the socket died, and retrying would duplicate the message.
+ #
+ # See mensfeld/pgmq-ruby#94.
+ STALE_CONNECTION_PATTERNS = [
+ "pqsocket() can't get socket descriptor",
+ "connection is closed",
+ "connection has been closed",
+ "connection not open",
+ "no connection to the server",
+ "ssl error: unexpected eof",
+ "ssl syscall error"
+ ].freeze
+ private_constant :STALE_CONNECTION_PATTERNS
+
+ # Rescue PGMQ::Errors::ConnectionError once if its message matches a
+ # known stale-socket pattern. pgmq-ruby's auto_reconnect + verify_connection!
+ # already recovers on the *next* checkout, so a single retry is sufficient.
+ # Other connection errors (pool timeout, misconfiguration, truly unreachable
+ # DB) propagate.
+ #
+ # Wraps every @pgmq.* call site. Pattern matching is intentionally narrow
+ # (pre-flight / idle-socket signals only), so retry is safe even for
+ # non-idempotent ops like delete/archive — a matched error means the
+ # connection was dead *before* pgmq-ruby tried to use it, so no SQL was
+ # ever sent. Mid-flight errors like "server closed the connection" are
+ # excluded from the pattern list for this reason.
+ def with_stale_connection_retry
+ attempts = 0
+ begin
+ yield
+ rescue PGMQ::Errors::ConnectionError => e
+ attempts += 1
+ raise unless attempts == 1 && stale_connection_error?(e)
+
+ Pgbus.logger.warn do
+ "[Pgbus::Client] Retrying after stale pgmq connection: #{e.message}"
+ end
+ retry
+ end
+ end
+
+ def stale_connection_error?(error)
+ message = error.message.to_s.downcase
+ STALE_CONNECTION_PATTERNS.any? { |pattern| message.include?(pattern) }
+ end
+
  def serialize(data)
  case data
  when String
@@ -78,6 +78,7 @@ module Pgbus

  # Recurring jobs
  attr_accessor :recurring_tasks, :recurring_schedule_interval, :recurring_tasks_file, :skip_recurring
+ attr_writer :recurring_tasks_files
  attr_reader :recurring_execution_retention # rubocop:disable Style/AccessorGrouping

  # Multi-database support (optional separate database for pgbus tables)
@@ -161,6 +162,7 @@ module Pgbus
  @recurring_tasks = nil
  @recurring_schedule_interval = 1.0
  @recurring_tasks_file = nil
+ @recurring_tasks_files = nil
  @skip_recurring = false
  @recurring_execution_retention = 7 * 24 * 3600 # 7 days

@@ -492,6 +494,12 @@ module Pgbus
  @recurring_execution_retention = coerce_duration!(value, :recurring_execution_retention)
  end

+ def recurring_tasks_files
+ return @recurring_tasks_files if @recurring_tasks_files
+
+ recurring_tasks_file ? [recurring_tasks_file] : nil
+ end
+
  # Returns the connection pool size to use for the PGMQ client.
  #
  # If +pool_size+ was explicitly set, returns that value unchanged. Otherwise
data/lib/pgbus/engine.rb CHANGED
@@ -18,10 +18,22 @@ module Pgbus
  end

  initializer "pgbus.recurring" do |app|
- recurring_path = app.root.join("config", "recurring.yml")
- if recurring_path.exist? && !Pgbus.configuration.recurring_tasks
- Pgbus.configuration.recurring_tasks = Pgbus::Recurring::ConfigLoader.load(recurring_path)
- Pgbus.configuration.recurring_tasks_file ||= recurring_path.to_s
+ next if Pgbus.configuration.recurring_tasks
+
+ config = Pgbus.configuration
+ files = config.recurring_tasks_files
+ default_path = app.root.join("config", "recurring.yml")
+
+ if files
+ tasks = Pgbus::Recurring::ConfigLoader.load_all(files)
+ if tasks.empty? && default_path.exist? && files.none? { |f| File.expand_path(f.to_s) == File.expand_path(default_path.to_s) }
+ tasks = Pgbus::Recurring::ConfigLoader.load(default_path)
+ config.recurring_tasks_file ||= default_path.to_s
+ end
+ config.recurring_tasks = tasks unless tasks.empty?
+ elsif default_path.exist?
+ config.recurring_tasks = Pgbus::Recurring::ConfigLoader.load(default_path)
+ config.recurring_tasks_file ||= default_path.to_s
  end
  end

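The initializer's precedence can be reduced to a small pure function. A hedged sketch: `resolve_recurring` and its lambda loaders are illustrative names, and the real initializer additionally skips the default-path fallback when the configured file list already names `config/recurring.yml`.

```ruby
# Resolution-order sketch: explicit tasks > configured file list >
# default config/recurring.yml. Loaders are stubbed as lambdas.
def resolve_recurring(explicit_tasks:, files:, default_exists:, load_all:, load_default:)
  return explicit_tasks if explicit_tasks        # explicit config wins outright
  if files
    tasks = load_all.call(files)
    return tasks unless tasks.empty?
    return load_default.call if default_exists   # empty list falls back to default
    nil
  elsif default_exists
    load_default.call                            # legacy single-file behavior
  end
end

# Illustrative loaders: only "jobs.yml" yields tasks.
load_all     = ->(files) { files.include?("jobs.yml") ? { "report" => {} } : {} }
load_default = ->        { { "daily_cleanup" => {} } }
```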
@@ -74,7 +74,8 @@ module Pgbus
  add_outbox: "pgbus:add_outbox",
  add_recurring: "pgbus:add_recurring",
  add_failed_events_index: "pgbus:add_failed_events_index",
- tune_autovacuum: "pgbus:tune_autovacuum"
+ tune_autovacuum: "pgbus:tune_autovacuum",
+ tune_fillfactor: "pgbus:tune_fillfactor"
  }.freeze

  # Human-friendly description of each migration for the generator
@@ -90,7 +91,8 @@ module Pgbus
  add_outbox: "outbox entries table (transactional outbox)",
  add_recurring: "recurring tasks + executions tables",
  add_failed_events_index: "unique index on pgbus_failed_events (queue_name, msg_id)",
- tune_autovacuum: "autovacuum tuning for PGMQ queue and archive tables"
+ tune_autovacuum: "autovacuum tuning for PGMQ queue and archive tables",
+ tune_fillfactor: "fillfactor=70 on PGMQ queue tables (reduces page density during update churn)"
  }.freeze

  def initialize(connection)
@@ -113,7 +115,8 @@ module Pgbus
  *outbox_migrations,
  *recurring_migrations,
  *failed_events_index_migrations,
- *autovacuum_migrations
+ *autovacuum_migrations,
+ *fillfactor_migrations
  ]
  end

@@ -205,6 +208,15 @@ module Pgbus
  [:tune_autovacuum]
  end

+ # Fillfactor tuning: check if any PGMQ queue table already has
+ # fillfactor applied. If not, queue the migration.
+ def fillfactor_migrations
+ return [] unless pgmq_schema_exists?
+ return [] if fillfactor_already_tuned?
+
+ [:tune_fillfactor]
+ end
+
  # --- schema probes -------------------------------------------------

  def table_exists?(name)
@@ -247,6 +259,22 @@ module Pgbus
  rescue StandardError
  true # if we can't tell, assume already tuned (safe default)
  end
+
+ def fillfactor_already_tuned?
+ queue_name = connection.select_value("SELECT queue_name FROM pgmq.meta ORDER BY queue_name LIMIT 1")
+ return true unless queue_name # no queues = nothing to tune, skip
+
+ result = connection.select_value(<<~SQL)
+ SELECT reloptions::text LIKE '%fillfactor%'
+ FROM pg_class
+ WHERE relname = 'q_#{queue_name}'
+ AND relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pgmq')
+ SQL
+
+ [true, "t"].include?(result)
+ rescue StandardError
+ true # if we can't tell, assume already tuned (safe default)
+ end
  end
  end
  end
@@ -15,6 +15,7 @@ module Pgbus
  OUTBOX_CLEANUP_INTERVAL = 3600 # Run outbox cleanup every hour
  JOB_LOCK_CLEANUP_INTERVAL = 300 # Run job lock cleanup every 5 minutes
  STATS_CLEANUP_INTERVAL = 3600 # Run stats cleanup every hour
+ TABLE_MAINTENANCE_INTERVAL = Pgbus::TableMaintenance::MAINTENANCE_INTERVAL

  # Page size for archive compaction. Each cycle deletes up to this
  # many archived rows per queue. Tuned via constant rather than
@@ -37,6 +38,7 @@ module Pgbus
  @last_outbox_cleanup_at = monotonic_now
  @last_job_lock_cleanup_at = monotonic_now
  @last_stats_cleanup_at = monotonic_now
+ @last_table_maintenance_at = monotonic_now
  end

  def run
@@ -84,6 +86,7 @@ module Pgbus
  run_if_due(now, :@last_outbox_cleanup_at, OUTBOX_CLEANUP_INTERVAL) { cleanup_outbox }
  run_if_due(now, :@last_job_lock_cleanup_at, JOB_LOCK_CLEANUP_INTERVAL) { cleanup_job_locks }
  run_if_due(now, :@last_stats_cleanup_at, STATS_CLEANUP_INTERVAL) { cleanup_stats }
+ run_if_due(now, :@last_table_maintenance_at, TABLE_MAINTENANCE_INTERVAL) { run_table_maintenance }
  end

  # Only update the timestamp when the block succeeds.
@@ -158,6 +161,19 @@ module Pgbus
  Pgbus.logger.debug { "[Pgbus] Cleaned up #{deleted} old stream stats" } if deleted.positive?
  end

+ def run_table_maintenance
+ conn = config.connects_to ? Pgbus::BusRecord.connection : ActiveRecord::Base.connection
+ raw_conn = conn.raw_connection
+ maintained = TableMaintenance.run_maintenance(
+ raw_conn,
+ threshold: TableMaintenance::BLOAT_THRESHOLD,
+ reindex: true
+ )
+ Pgbus.logger.info { "[Pgbus] Table maintenance completed: #{maintained} table(s) vacuumed" } if maintained.positive?
+ rescue StandardError => e
+ Pgbus.logger.warn { "[Pgbus] Table maintenance failed: #{e.message}" }
+ end
+
  def cleanup_job_locks
  # Clean up truly orphaned uniqueness keys: rows whose referenced
  # message no longer exists in the PGMQ queue. This handles crashes
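The `run_if_due` contract the dispatcher comment describes ("Only update the timestamp when the block succeeds") can be sketched standalone. `IntervalGate` is an illustrative name, not the gem's API; the point is that a failed maintenance pass stays due on the next tick instead of waiting out a full interval.

```ruby
# Interval-gating sketch: the timestamp advances only on success, so a
# raising maintenance block is retried on the next dispatcher tick.
class IntervalGate
  def initialize(interval, now:)
    @interval = interval
    @last_run_at = now
  end

  def run_if_due(now)
    return :not_due if now - @last_run_at < @interval
    yield
    @last_run_at = now   # advanced only when the block succeeds
    :ran
  rescue StandardError
    :failed              # timestamp untouched; due again on next tick
  end
end

# A 10-second gate, driven with explicit monotonic-style timestamps.
gate = IntervalGate.new(10, now: 0)
```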
@@ -145,9 +145,12 @@ module Pgbus

  def recurring_tasks_configured?
  return true if config.recurring_tasks&.any?
+
+ files = config.recurring_tasks_files
+ return true if files&.any? { |f| File.exist?(f.to_s) }
+
  return true if config.recurring_tasks_file && File.exist?(config.recurring_tasks_file.to_s)

- # Check default location
  if defined?(Rails) && Rails.respond_to?(:root) && Rails.root
  default_path = Rails.root.join("config", "recurring.yml")
  return File.exist?(default_path.to_s)
@@ -159,6 +162,13 @@ module Pgbus
  def load_recurring_config
  return if config.recurring_tasks&.any?

+ files = config.recurring_tasks_files
+ if files
+ tasks = Recurring::ConfigLoader.load_all(files)
+ config.recurring_tasks = tasks unless tasks.empty?
+ return if tasks.any?
+ end
+
  path = config.recurring_tasks_file
  path ||= defined?(Rails) && Rails.respond_to?(:root) && Rails.root ? Rails.root.join("config", "recurring.yml") : nil
  return unless path && File.exist?(path.to_s)
@@ -23,6 +23,30 @@ module Pgbus
       {}
     end
 
+    def load_all(paths, env: nil)
+      normalized = Array(paths).compact.map { |p| p.respond_to?(:to_path) ? p.to_path : p.to_s }.reject(&:empty?)
+      return {} if normalized.empty?
+
+      env ||= detect_env
+
+      normalized.each_with_object({}) do |path, acc|
+        unless File.exist?(path.to_s)
+          Pgbus.logger.warn { "[Pgbus] Recurring file not found, skipping: #{path}" }
+          next
+        end
+
+        parsed = load(path, env: env)
+        unless parsed.is_a?(Hash)
+          Pgbus.logger.error { "[Pgbus] Invalid recurring config in #{path}: expected Hash, got #{parsed.class}" }
+          next
+        end
+        parsed.each_key do |key|
+          Pgbus.logger.debug { "[Pgbus] Recurring task '#{key}' overridden by #{path}" } if acc.key?(key)
+        end
+        acc.merge!(parsed)
+      end
+    end
+
     def detect_env
       if defined?(Rails) && Rails.respond_to?(:env) && Rails.env
         Rails.env.to_s
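The new `load_all` merges several recurring-task files into one hash, skipping missing or malformed files and letting later files override earlier keys. A minimal self-contained sketch of that merge semantics, using plain hashes in place of parsed YAML (the `merge_recurring_configs` helper and file names here are illustrative, not part of the pgbus API):

```ruby
# Sketch of load_all's merge order: later files win on duplicate task keys,
# non-Hash (invalid) entries are skipped. Helper name is hypothetical.
def merge_recurring_configs(parsed_files)
  parsed_files.each_with_object({}) do |(path, parsed), acc|
    next unless parsed.is_a?(Hash) # invalid files are skipped, as in load_all

    parsed.each_key do |key|
      warn "task '#{key}' overridden by #{path}" if acc.key?(key)
    end
    acc.merge!(parsed) # later files override earlier ones
  end
end

base  = { "cleanup" => { "cron" => "0 * * * *" } }
extra = { "cleanup" => { "cron" => "*/5 * * * *" },
          "report"  => { "cron" => "0 0 * * *" } }

merged = merge_recurring_configs([["config/recurring.yml", base],
                                  ["config/recurring/override.yml", extra]])
# merged["cleanup"]["cron"] => "*/5 * * * *" (later file wins); "report" is added
```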
data/lib/pgbus/table_maintenance.rb ADDED
@@ -0,0 +1,110 @@
+# frozen_string_literal: true
+
+module Pgbus
+  # Proactive table maintenance to reduce bloat on PGMQ queue tables.
+  #
+  # PGMQ's read operation UPDATEs three columns (vt, read_ct, last_read_at)
+  # on every message read. With the default fillfactor of 100, every UPDATE
+  # creates a new heap tuple AND a new index entry — the dead tuple and its
+  # old index pointer remain until VACUUM. Under sustained load, autovacuum
+  # can't keep up and B-tree indexes bloat.
+  #
+  # Setting fillfactor=70 on queue tables reserves 30% of each page for
+  # update churn. Because `vt` is indexed and changes on every read, these
+  # writes are not HOT updates, but leaving headroom on heap pages still
+  # reduces page density for a table that is updated heavily between vacuum
+  # passes.
+  #
+  # More importantly, this module provides targeted VACUUM: instead of
+  # relying solely on autovacuum's global heuristics, the dispatcher
+  # periodically checks pg_stat_user_tables for tables with high dead tuple
+  # ratios and vacuums them explicitly. This is inspired by pgque's
+  # philosophy of measuring bloat before acting.
+  module TableMaintenance
+    FILLFACTOR = 70
+    BLOAT_THRESHOLD = 0.1
+    MAINTENANCE_INTERVAL = 6 * 3600 # 6 hours
+
+    class << self
+      def fillfactor_sql_for_queue(queue_name)
+        "ALTER TABLE pgmq.q_#{queue_name} SET (fillfactor = #{FILLFACTOR});"
+      end
+
+      def fillfactor_sql_for_all_queues
+        <<~SQL
+          DO $$
+          DECLARE
+            q RECORD;
+          BEGIN
+            FOR q IN SELECT queue_name FROM pgmq.meta LOOP
+              EXECUTE format('ALTER TABLE pgmq.q_%I SET (fillfactor = #{FILLFACTOR})', q.queue_name);
+            END LOOP;
+          END $$;
+        SQL
+      end
+
+      def vacuum_candidates(conn, threshold: BLOAT_THRESHOLD)
+        rows = conn.exec(<<~SQL)
+          SELECT schemaname, relname, n_dead_tup, n_live_tup
+          FROM pg_stat_user_tables
+          WHERE schemaname = 'pgmq'
+            AND relname LIKE 'q_%'
+          ORDER BY n_dead_tup DESC
+        SQL
+
+        rows.each_with_object([]) do |row, candidates|
+          dead = row["n_dead_tup"].to_i
+          live = row["n_live_tup"].to_i
+          total = dead + live
+          next if total.zero?
+
+          ratio = dead.to_f / total
+          next unless ratio > threshold
+
+          candidates << {
+            table: "#{row["schemaname"]}.#{row["relname"]}",
+            dead_tuples: dead,
+            live_tuples: live,
+            dead_ratio: ratio.round(4)
+          }
+        end
+      end
+
+      def vacuum_sql(table)
+        schema, relname = table.split(".", 2)
+        "VACUUM \"#{schema}\".\"#{relname}\""
+      end
+
+      def reindex_sql(table)
+        schema, relname = table.split(".", 2)
+        "REINDEX TABLE CONCURRENTLY \"#{schema}\".\"#{relname}\""
+      end
+
+      def run_maintenance(conn, threshold: BLOAT_THRESHOLD, reindex: true)
+        candidates = vacuum_candidates(conn, threshold: threshold)
+        return 0 if candidates.empty?
+
+        maintained = 0
+        candidates.each do |candidate|
+          table = candidate[:table]
+          Pgbus.logger.info do
+            "[Pgbus::TableMaintenance] Vacuuming #{table} " \
+              "(dead_ratio=#{candidate[:dead_ratio]}, dead=#{candidate[:dead_tuples]})"
+          end
+          conn.exec(vacuum_sql(table))
+
+          if reindex
+            Pgbus.logger.info { "[Pgbus::TableMaintenance] Reindexing #{table}" }
+            conn.exec(reindex_sql(table))
+          end
+
+          maintained += 1
+        rescue StandardError => e
+          Pgbus.logger.error { "[Pgbus::TableMaintenance] Failed to maintain #{table}: #{e.message}" }
+        end
+
+        maintained
+      end
+    end
+  end
+end
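The candidate selection in `vacuum_candidates` reduces to a dead-tuple ratio test against the 0.1 default threshold. A self-contained sketch of that filter, with plain integers standing in for a `pg_stat_user_tables` row (the `needs_vacuum?` helper is illustrative, not part of the gem's API):

```ruby
# Same default as TableMaintenance::BLOAT_THRESHOLD in the diff above.
BLOAT_THRESHOLD = 0.1

# True when dead / (dead + live) exceeds the threshold, mirroring the
# ratio check in vacuum_candidates. Empty stats rows are never candidates.
def needs_vacuum?(dead, live, threshold: BLOAT_THRESHOLD)
  total = dead + live
  return false if total.zero?

  dead.to_f / total > threshold
end

needs_vacuum?(500, 4000) # 500/4500 ≈ 0.111 => true
needs_vacuum?(100, 4000) # 100/4100 ≈ 0.024 => false
needs_vacuum?(0, 0)      # no tuples at all  => false
```

Note the strict `>` comparison: a table sitting exactly at the threshold is left to autovacuum.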
data/lib/pgbus/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Pgbus
-  VERSION = "0.7.4"
+  VERSION = "0.7.6"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pgbus
 version: !ruby/object:Gem::Version
-  version: 0.7.4
+  version: 0.7.6
 platform: ruby
 authors:
 - Mikael Henriksson
@@ -234,8 +234,10 @@ files:
 - lib/generators/pgbus/templates/pgbus_binstub.erb
 - lib/generators/pgbus/templates/recurring.yml.erb
 - lib/generators/pgbus/templates/tune_autovacuum.rb.erb
+- lib/generators/pgbus/templates/tune_fillfactor.rb.erb
 - lib/generators/pgbus/templates/upgrade_pgmq.rb.erb
 - lib/generators/pgbus/tune_autovacuum_generator.rb
+- lib/generators/pgbus/tune_fillfactor_generator.rb
 - lib/generators/pgbus/update_generator.rb
 - lib/generators/pgbus/upgrade_pgmq_generator.rb
 - lib/pgbus.rb
@@ -309,6 +311,7 @@ files:
 - lib/pgbus/streams/turbo_broadcastable.rb
 - lib/pgbus/streams/turbo_stream_override.rb
 - lib/pgbus/streams/watermark_cache_middleware.rb
+- lib/pgbus/table_maintenance.rb
 - lib/pgbus/testing.rb
 - lib/pgbus/testing/assertions.rb
 - lib/pgbus/testing/minitest.rb