pgbus 0.7.4 → 0.7.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +19 -15
- data/lib/generators/pgbus/templates/migration.rb.erb +5 -0
- data/lib/generators/pgbus/templates/tune_fillfactor.rb.erb +30 -0
- data/lib/generators/pgbus/tune_fillfactor_generator.rb +51 -0
- data/lib/pgbus/client.rb +61 -9
- data/lib/pgbus/configuration.rb +8 -0
- data/lib/pgbus/engine.rb +16 -4
- data/lib/pgbus/generators/migration_detector.rb +31 -3
- data/lib/pgbus/process/dispatcher.rb +16 -0
- data/lib/pgbus/process/supervisor.rb +11 -1
- data/lib/pgbus/recurring/config_loader.rb +24 -0
- data/lib/pgbus/table_maintenance.rb +110 -0
- data/lib/pgbus/version.rb +1 -1
- metadata +4 -1
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8d2034c37f18f53d3a55df8381e7d37581905b622487f9ae4beb0c1e0f35c964
+  data.tar.gz: 419c3c7c6b62276903579ca53e39baf06ed1c0e818817e6139ceb004ad6026b3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 40bf9610792cef84a8792035b513c660312ff91d0642432a088f544b84846996856d7bd318b7c06d530b8cd573a2dbb78f20ee132f53540a937a075625a54afc
+  data.tar.gz: 50ddee092a20af26d348c13ae61f48a8566ec0e46687a958012dbdb508e35e2fde217ee277ddaf47333c9c5b20ad51197c0b7950c358b519c8590573433206aa
data/CHANGELOG.md
CHANGED

@@ -1,18 +1,3 @@
-## [0.5.1] - 2026-04-08
-
-### Fixed
-
-- **Capsule DSL: anonymous duplicate capsules are now allowed.** Configurations like `c.workers = "*: 3; *: 3; *: 3; *: 3; *: 3"` (the legacy YAML pattern of 5 forks × 3 threads, all reading every queue) were rejected at boot in 0.5.0 with `Pgbus::Configuration::CapsuleDSL::ParseError: wildcard '*' appears in two capsules`. PGMQ tolerates multiple processes reading the same queue natively (`FOR UPDATE SKIP LOCKED`), and this is the canonical way to scale CPU parallelism across forks, so the rejection was wrong.
-
-  The fix introduces a "named vs anonymous" distinction:
-
-  - The string DSL parser is now purely syntactic — it no longer enforces overlap rules.
-  - `Pgbus::Configuration#workers=` auto-assigns `:name` only to capsules whose first queue would yield a *unique* name AND is not the bare wildcard. Wildcards stay anonymous; collision-prone first-queues stay anonymous.
-  - `Pgbus::Configuration#validate_no_queue_overlap!` (called by `c.capsule :name, ...`) now only checks against existing **named** capsules. Anonymous capsules can overlap freely with each other and with named capsules.
-  - Net result: `"*: 3; *: 3; *: 3"` produces 3 anonymous capsules (3 forks), `"critical: 5; default: 10"` produces 2 named capsules (CLI `--capsule critical` still works), and named-vs-named overlap is still rejected as before.
-
-  No changes required to user configuration — legacy YAML patterns and the modern DSL both work as documented.
-
 ## [Unreleased]
 
 ### Breaking Changes

@@ -28,6 +13,25 @@
 - Warn when dashboard `web_auth` is unconfigured
 - Add `globalid` as an explicit runtime dependency (was used but only transitively available via activejob)
 
+### Fixed
+
+- **Defensive retry on stale pooled pgmq connections in the enqueue path.** `Pgbus::Client#send_message`, `#send_batch`, and `#publish_to_topic` now retry once when `@pgmq.produce*` raises `PGMQ::Errors::ConnectionError` with a message indicating the pooled `PG::Connection` was killed beneath pgmq-ruby — typically by PgBouncer hitting `server_idle_timeout` / `client_idle_timeout`, an admin disconnect, or a TCP RST. Observed in production as `PQsocket() can't get socket descriptor` on the first produce following an idle window. pgmq-ruby's `auto_reconnect` recovers on the *next* pool checkout, so a single retry is sufficient; non-stale errors (pool timeout, misconfiguration, unreachable database) still propagate unchanged. Upstream pgmq-ruby fix for the underlying misclassification is in-flight at mensfeld/pgmq-ruby#94.
+
+## [0.5.1] - 2026-04-08
+
+### Fixed
+
+- **Capsule DSL: anonymous duplicate capsules are now allowed.** Configurations like `c.workers = "*: 3; *: 3; *: 3; *: 3; *: 3"` (the legacy YAML pattern of 5 forks × 3 threads, all reading every queue) were rejected at boot in 0.5.0 with `Pgbus::Configuration::CapsuleDSL::ParseError: wildcard '*' appears in two capsules`. PGMQ tolerates multiple processes reading the same queue natively (`FOR UPDATE SKIP LOCKED`), and this is the canonical way to scale CPU parallelism across forks, so the rejection was wrong.
+
+  The fix introduces a "named vs anonymous" distinction:
+
+  - The string DSL parser is now purely syntactic — it no longer enforces overlap rules.
+  - `Pgbus::Configuration#workers=` auto-assigns `:name` only to capsules whose first queue would yield a *unique* name AND is not the bare wildcard. Wildcards stay anonymous; collision-prone first-queues stay anonymous.
+  - `Pgbus::Configuration#validate_no_queue_overlap!` (called by `c.capsule :name, ...`) now only checks against existing **named** capsules. Anonymous capsules can overlap freely with each other and with named capsules.
+  - Net result: `"*: 3; *: 3; *: 3"` produces 3 anonymous capsules (3 forks), `"critical: 5; default: 10"` produces 2 named capsules (CLI `--capsule critical` still works), and named-vs-named overlap is still rejected as before.
+
+  No changes required to user configuration — legacy YAML patterns and the modern DSL both work as documented.
+
 ## [0.1.0] - 2026-03-30
 
 - Initial release
data/lib/generators/pgbus/templates/migration.rb.erb
CHANGED

@@ -156,6 +156,11 @@ class CreatePgbusTables < ActiveRecord::Migration<%= migration_version %>
     # queue processing and concurrency lock management.
     execute Pgbus::AutovacuumTuning.sql_for_all_queues
     execute Pgbus::AutovacuumTuning.sql_for_high_churn_tables
+
+    # Set fillfactor on queue tables to reduce bloat from PGMQ's read
+    # UPDATE operations (vt, read_ct, last_read_at). Lower fillfactor
+    # reserves page space, reducing page density during heavy update churn.
+    execute Pgbus::TableMaintenance.fillfactor_sql_for_all_queues
   end
 
   def down
data/lib/generators/pgbus/templates/tune_fillfactor.rb.erb
ADDED

@@ -0,0 +1,30 @@
+class TunePgbusFillfactor < ActiveRecord::Migration<%= migration_version %>
+  def up
+    # Set fillfactor on queue tables to reduce bloat from PGMQ's read
+    # UPDATE operations. PGMQ updates vt, read_ct, and last_read_at on
+    # every read — with fillfactor=100 (default), pages fill completely
+    # between vacuum passes. Lowering fillfactor reserves page space,
+    # reducing page density during heavy update churn. Note: because vt
+    # is indexed, these updates are not HOT-eligible.
+    #
+    # Archive tables are append-only (INSERT from queue, DELETE on
+    # retention) and don't benefit from fillfactor tuning.
+    #
+    # New queues created after this migration automatically receive
+    # this setting via Pgbus::Client at queue creation time.
+    execute Pgbus::TableMaintenance.fillfactor_sql_for_all_queues
+  end
+
+  def down
+    execute <<~SQL
+      DO $$
+      DECLARE
+        q RECORD;
+      BEGIN
+        FOR q IN SELECT queue_name FROM pgmq.meta LOOP
+          EXECUTE format('ALTER TABLE pgmq.q_%I RESET (fillfactor)', q.queue_name);
+        END LOOP;
+      END $$;
+    SQL
+  end
+end
data/lib/generators/pgbus/tune_fillfactor_generator.rb
ADDED

@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+
+require "rails/generators"
+require "rails/generators/active_record"
+require_relative "migration_path"
+
+module Pgbus
+  module Generators
+    class TuneFillfactorGenerator < Rails::Generators::Base
+      include ActiveRecord::Generators::Migration
+      include MigrationPath
+
+      source_root File.expand_path("templates", __dir__)
+
+      desc "Set fillfactor on PGMQ queue tables to reduce page density and bloat"
+
+      class_option :database,
+                   type: :string,
+                   default: nil,
+                   desc: "Use a separate database for pgbus tables (e.g. --database=pgbus)"
+
+      def create_migration_file
+        migration_template "tune_fillfactor.rb.erb",
+                           File.join(pgbus_migrate_path, "tune_pgbus_fillfactor.rb")
+      end
+
+      def display_post_install
+        say ""
+        say "Pgbus fillfactor tuning migration created!", :green
+        say ""
+        say "This migration sets fillfactor=#{Pgbus::TableMaintenance::FILLFACTOR} on all existing"
+        say "PGMQ queue tables. This reserves #{100 - Pgbus::TableMaintenance::FILLFACTOR}% of each page to"
+        say "reduce page density during PGMQ's heavy read UPDATE churn."
+        say ""
+        say "New queues created at runtime will automatically receive"
+        say "this setting."
+        say ""
+        say "Next steps:"
+        say "  1. Run: rails db:migrate#{":#{options[:database]}" if separate_database?}"
+        say "  2. Restart pgbus: bin/pgbus start"
+        say ""
+      end
+
+      private
+
+      def migration_version
+        "[#{ActiveRecord::Migration.current_version}]"
+      end
+    end
+  end
+end
data/lib/pgbus/client.rb
CHANGED

@@ -87,7 +87,9 @@ module Pgbus
       target = @queue_strategy.target_queue(queue_name, priority)
       ensure_queue(queue_name)
       Instrumentation.instrument("pgbus.client.send_message", queue: target) do
-
+        with_stale_connection_retry do
+          synchronized { @pgmq.produce(target, serialize(payload), headers: headers && serialize(headers), delay: delay) }
+        end
       end
     end
 
@@ -96,7 +98,9 @@ module Pgbus
       ensure_queue(queue_name)
       serialized, serialized_headers = serialize_batch(payloads, headers)
       Instrumentation.instrument("pgbus.client.send_batch", queue: full_name, size: payloads.size) do
-
+        with_stale_connection_retry do
+          synchronized { @pgmq.produce_batch(full_name, serialized, headers: serialized_headers, delay: delay) }
+        end
       end
     end
 
@@ -318,13 +322,15 @@ module Pgbus
     end
 
     def publish_to_topic(routing_key, payload, headers: nil, delay: 0)
-
-
-
-
-
-
-
+      with_stale_connection_retry do
+        synchronized do
+          @pgmq.produce_topic(
+            routing_key,
+            serialize(payload),
+            headers: headers && serialize(headers),
+            delay: delay
+          )
+        end
       end
     end
 
@@ -502,6 +508,7 @@ module Pgbus
     def tune_autovacuum(queue_name)
       with_raw_connection do |conn|
         conn.exec(AutovacuumTuning.sql_for_queue(queue_name))
+        conn.exec(TableMaintenance.fillfactor_sql_for_queue(queue_name))
       end
     rescue StandardError => e
       Pgbus.logger.debug { "[Pgbus::Client] Autovacuum tuning failed for #{queue_name}: #{e.message}" }

@@ -518,6 +525,51 @@ module Pgbus
     end
   end
 
+    # Substrings that indicate the pooled PG::Connection was already dead
+    # *before* pgmq-ruby tried to use it — typically killed by a connection
+    # pooler (PgBouncer server_idle_timeout / client_idle_timeout), an admin
+    # disconnect, or a TCP RST while the slot was idle.
+    #
+    # Only pre-checkout / pre-flight errors belong here. Mid-flight errors
+    # like "server closed the connection" or "connection to server was lost"
+    # are excluded because PG may have already committed the INSERT before
+    # the socket died, and retrying would duplicate the message.
+    #
+    # See mensfeld/pgmq-ruby#94.
+    STALE_CONNECTION_PATTERNS = [
+      "pqsocket() can't get socket descriptor",
+      "connection is closed",
+      "connection has been closed",
+      "connection not open",
+      "no connection to the server"
+    ].freeze
+    private_constant :STALE_CONNECTION_PATTERNS
+
+    # Enqueue path guard: rescue PGMQ::Errors::ConnectionError once if its
+    # message matches a known stale-socket pattern. pgmq-ruby's
+    # auto_reconnect + verify_connection! already recovers on the *next*
+    # checkout, so a single retry is sufficient. Other connection errors
+    # (pool timeout, misconfiguration, truly unreachable DB) propagate.
+    def with_stale_connection_retry
+      attempts = 0
+      begin
+        yield
+      rescue PGMQ::Errors::ConnectionError => e
+        attempts += 1
+        raise unless attempts == 1 && stale_connection_error?(e)
+
+        Pgbus.logger.warn do
+          "[Pgbus::Client] Retrying produce after stale pgmq connection: #{e.message}"
+        end
+        retry
+      end
+    end
+
+    def stale_connection_error?(error)
+      message = error.message.to_s.downcase
+      STALE_CONNECTION_PATTERNS.any? { |pattern| message.include?(pattern) }
+    end
+
     def serialize(data)
       case data
       when String
data/lib/pgbus/configuration.rb
CHANGED

@@ -78,6 +78,7 @@ module Pgbus
 
     # Recurring jobs
     attr_accessor :recurring_tasks, :recurring_schedule_interval, :recurring_tasks_file, :skip_recurring
+    attr_writer :recurring_tasks_files
     attr_reader :recurring_execution_retention # rubocop:disable Style/AccessorGrouping
 
     # Multi-database support (optional separate database for pgbus tables)

@@ -161,6 +162,7 @@ module Pgbus
       @recurring_tasks = nil
       @recurring_schedule_interval = 1.0
       @recurring_tasks_file = nil
+      @recurring_tasks_files = nil
       @skip_recurring = false
       @recurring_execution_retention = 7 * 24 * 3600 # 7 days
 
@@ -492,6 +494,12 @@ module Pgbus
       @recurring_execution_retention = coerce_duration!(value, :recurring_execution_retention)
     end
 
+    def recurring_tasks_files
+      return @recurring_tasks_files if @recurring_tasks_files
+
+      recurring_tasks_file ? [recurring_tasks_file] : nil
+    end
+
     # Returns the connection pool size to use for the PGMQ client.
     #
     # If +pool_size+ was explicitly set, returns that value unchanged. Otherwise
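The `recurring_tasks_files` reader added in the configuration diff above reduces to a small fallback rule: an explicit list wins, otherwise a single `recurring_tasks_file` is wrapped in an array, and with neither set the reader returns `nil`. A sketch with a hypothetical `RecurringConfig` class (not the real `Pgbus::Configuration`):

```ruby
# Minimal sketch of the recurring_tasks_files fallback behavior.
# RecurringConfig is a hypothetical stand-in for Pgbus::Configuration.
class RecurringConfig
  attr_accessor :recurring_tasks_file
  attr_writer :recurring_tasks_files

  def initialize
    @recurring_tasks_file = nil
    @recurring_tasks_files = nil
  end

  # Explicit list takes precedence; a lone file is wrapped; nil means
  # "fall back to the default config/recurring.yml lookup".
  def recurring_tasks_files
    return @recurring_tasks_files if @recurring_tasks_files

    recurring_tasks_file ? [recurring_tasks_file] : nil
  end
end

config = RecurringConfig.new
config.recurring_tasks_file = "config/recurring.yml"
wrapped = config.recurring_tasks_files
```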
data/lib/pgbus/engine.rb
CHANGED

@@ -18,10 +18,22 @@ module Pgbus
     end
 
     initializer "pgbus.recurring" do |app|
-
-
-
-
+      next if Pgbus.configuration.recurring_tasks
+
+      config = Pgbus.configuration
+      files = config.recurring_tasks_files
+      default_path = app.root.join("config", "recurring.yml")
+
+      if files
+        tasks = Pgbus::Recurring::ConfigLoader.load_all(files)
+        if tasks.empty? && default_path.exist? && files.none? { |f| File.expand_path(f.to_s) == File.expand_path(default_path.to_s) }
+          tasks = Pgbus::Recurring::ConfigLoader.load(default_path)
+          config.recurring_tasks_file ||= default_path.to_s
+        end
+        config.recurring_tasks = tasks unless tasks.empty?
+      elsif default_path.exist?
+        config.recurring_tasks = Pgbus::Recurring::ConfigLoader.load(default_path)
+        config.recurring_tasks_file ||= default_path.to_s
       end
     end
data/lib/pgbus/generators/migration_detector.rb
CHANGED

@@ -74,7 +74,8 @@ module Pgbus
         add_outbox: "pgbus:add_outbox",
         add_recurring: "pgbus:add_recurring",
         add_failed_events_index: "pgbus:add_failed_events_index",
-        tune_autovacuum: "pgbus:tune_autovacuum"
+        tune_autovacuum: "pgbus:tune_autovacuum",
+        tune_fillfactor: "pgbus:tune_fillfactor"
       }.freeze
 
       # Human-friendly description of each migration for the generator

@@ -90,7 +91,8 @@ module Pgbus
         add_outbox: "outbox entries table (transactional outbox)",
         add_recurring: "recurring tasks + executions tables",
         add_failed_events_index: "unique index on pgbus_failed_events (queue_name, msg_id)",
-        tune_autovacuum: "autovacuum tuning for PGMQ queue and archive tables"
+        tune_autovacuum: "autovacuum tuning for PGMQ queue and archive tables",
+        tune_fillfactor: "fillfactor=70 on PGMQ queue tables (reduces page density during update churn)"
       }.freeze
 
       def initialize(connection)

@@ -113,7 +115,8 @@ module Pgbus
           *outbox_migrations,
           *recurring_migrations,
           *failed_events_index_migrations,
-          *autovacuum_migrations
+          *autovacuum_migrations,
+          *fillfactor_migrations
         ]
       end

@@ -205,6 +208,15 @@ module Pgbus
         [:tune_autovacuum]
       end
 
+      # Fillfactor tuning: check if any PGMQ queue table already has
+      # fillfactor applied. If not, queue the migration.
+      def fillfactor_migrations
+        return [] unless pgmq_schema_exists?
+        return [] if fillfactor_already_tuned?
+
+        [:tune_fillfactor]
+      end
+
       # --- schema probes -------------------------------------------------
 
       def table_exists?(name)

@@ -247,6 +259,22 @@ module Pgbus
       rescue StandardError
         true # if we can't tell, assume already tuned (safe default)
       end
+
+      def fillfactor_already_tuned?
+        queue_name = connection.select_value("SELECT queue_name FROM pgmq.meta ORDER BY queue_name LIMIT 1")
+        return true unless queue_name # no queues = nothing to tune, skip
+
+        result = connection.select_value(<<~SQL)
+          SELECT reloptions::text LIKE '%fillfactor%'
+          FROM pg_class
+          WHERE relname = 'q_#{queue_name}'
+            AND relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pgmq')
+        SQL
+
+        [true, "t"].include?(result)
+      rescue StandardError
+        true # if we can't tell, assume already tuned (safe default)
+      end
     end
   end
 end
data/lib/pgbus/process/dispatcher.rb
CHANGED

@@ -15,6 +15,7 @@ module Pgbus
     OUTBOX_CLEANUP_INTERVAL = 3600 # Run outbox cleanup every hour
     JOB_LOCK_CLEANUP_INTERVAL = 300 # Run job lock cleanup every 5 minutes
     STATS_CLEANUP_INTERVAL = 3600 # Run stats cleanup every hour
+    TABLE_MAINTENANCE_INTERVAL = Pgbus::TableMaintenance::MAINTENANCE_INTERVAL
 
     # Page size for archive compaction. Each cycle deletes up to this
     # many archived rows per queue. Tuned via constant rather than

@@ -37,6 +38,7 @@ module Pgbus
       @last_outbox_cleanup_at = monotonic_now
       @last_job_lock_cleanup_at = monotonic_now
       @last_stats_cleanup_at = monotonic_now
+      @last_table_maintenance_at = monotonic_now
     end
 
     def run

@@ -84,6 +86,7 @@ module Pgbus
       run_if_due(now, :@last_outbox_cleanup_at, OUTBOX_CLEANUP_INTERVAL) { cleanup_outbox }
       run_if_due(now, :@last_job_lock_cleanup_at, JOB_LOCK_CLEANUP_INTERVAL) { cleanup_job_locks }
       run_if_due(now, :@last_stats_cleanup_at, STATS_CLEANUP_INTERVAL) { cleanup_stats }
+      run_if_due(now, :@last_table_maintenance_at, TABLE_MAINTENANCE_INTERVAL) { run_table_maintenance }
     end
 
     # Only update the timestamp when the block succeeds.

@@ -158,6 +161,19 @@ module Pgbus
       Pgbus.logger.debug { "[Pgbus] Cleaned up #{deleted} old stream stats" } if deleted.positive?
     end
 
+    def run_table_maintenance
+      conn = config.connects_to ? Pgbus::BusRecord.connection : ActiveRecord::Base.connection
+      raw_conn = conn.raw_connection
+      maintained = TableMaintenance.run_maintenance(
+        raw_conn,
+        threshold: TableMaintenance::BLOAT_THRESHOLD,
+        reindex: true
+      )
+      Pgbus.logger.info { "[Pgbus] Table maintenance completed: #{maintained} table(s) vacuumed" } if maintained.positive?
+    rescue StandardError => e
+      Pgbus.logger.warn { "[Pgbus] Table maintenance failed: #{e.message}" }
+    end
+
     def cleanup_job_locks
       # Clean up truly orphaned uniqueness keys: rows whose referenced
       # message no longer exists in the PGMQ queue. This handles crashes
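The dispatcher diff above gates each cleanup task behind `run_if_due`, which compares a monotonic timestamp against a per-task interval and, per the inline comment, advances the timestamp only when the block succeeds. A standalone sketch (`IntervalGate` is a hypothetical name; the clock value is injected so the gating is easy to exercise):

```ruby
# Sketch of per-task interval gating as used by the dispatcher loop.
class IntervalGate
  def initialize(interval, now)
    @interval = interval
    @last_run_at = now
  end

  # Runs the block only when the interval has elapsed. The timestamp
  # advances only after the block returns, so a raising task is retried
  # on the next tick rather than silently skipped for a full interval.
  def run_if_due(now)
    return false if now - @last_run_at < @interval

    yield
    @last_run_at = now
    true
  end
end

gate = IntervalGate.new(300, 0)          # e.g. a 5-minute cleanup task
early = gate.run_if_due(10) { :cleanup } # not due yet
due   = gate.run_if_due(301) { :cleanup } # 301s elapsed: runs
again = gate.run_if_due(400) { :cleanup } # only 99s since last run
```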
data/lib/pgbus/process/supervisor.rb
CHANGED

@@ -145,9 +145,12 @@ module Pgbus
 
     def recurring_tasks_configured?
       return true if config.recurring_tasks&.any?
+
+      files = config.recurring_tasks_files
+      return true if files&.any? { |f| File.exist?(f.to_s) }
+
       return true if config.recurring_tasks_file && File.exist?(config.recurring_tasks_file.to_s)
 
-      # Check default location
       if defined?(Rails) && Rails.respond_to?(:root) && Rails.root
         default_path = Rails.root.join("config", "recurring.yml")
         return File.exist?(default_path.to_s)

@@ -159,6 +162,13 @@ module Pgbus
     def load_recurring_config
       return if config.recurring_tasks&.any?
 
+      files = config.recurring_tasks_files
+      if files
+        tasks = Recurring::ConfigLoader.load_all(files)
+        config.recurring_tasks = tasks unless tasks.empty?
+        return if tasks.any?
+      end
+
       path = config.recurring_tasks_file
       path ||= defined?(Rails) && Rails.respond_to?(:root) && Rails.root ? Rails.root.join("config", "recurring.yml") : nil
       return unless path && File.exist?(path.to_s)
data/lib/pgbus/recurring/config_loader.rb
CHANGED

@@ -23,6 +23,30 @@ module Pgbus
         {}
       end
 
+      def load_all(paths, env: nil)
+        normalized = Array(paths).compact.map { |p| p.respond_to?(:to_path) ? p.to_path : p.to_s }.reject(&:empty?)
+        return {} if normalized.empty?
+
+        env ||= detect_env
+
+        normalized.each_with_object({}) do |path, acc|
+          unless File.exist?(path.to_s)
+            Pgbus.logger.warn { "[Pgbus] Recurring file not found, skipping: #{path}" }
+            next
+          end
+
+          parsed = load(path, env: env)
+          unless parsed.is_a?(Hash)
+            Pgbus.logger.error { "[Pgbus] Invalid recurring config in #{path}: expected Hash, got #{parsed.class}" }
+            next
+          end
+          parsed.each_key do |key|
+            Pgbus.logger.debug { "[Pgbus] Recurring task '#{key}' overridden by #{path}" } if acc.key?(key)
+          end
+          acc.merge!(parsed)
+        end
+      end
+
       def detect_env
         if defined?(Rails) && Rails.respond_to?(:env) && Rails.env
           Rails.env.to_s
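The core of `load_all` above is its merge rule: later files win on duplicate task keys, and anything that is not a Hash is skipped. That rule can be sketched without YAML or file I/O; `merge_recurring` is a hypothetical helper operating on preloaded hashes:

```ruby
# Sketch of ConfigLoader.load_all's merge semantics on preloaded
# hashes: non-Hash entries (failed loads) are skipped, and later
# entries override earlier ones via Hash#merge!.
def merge_recurring(configs)
  configs.each_with_object({}) do |parsed, acc|
    next unless parsed.is_a?(Hash)

    acc.merge!(parsed) # last file wins on duplicate task keys
  end
end

merged = merge_recurring([
  { "cleanup" => { "schedule" => "daily" }, "report" => { "schedule" => "weekly" } },
  nil, # e.g. a file that failed to parse
  { "report" => { "schedule" => "monthly" } }
])
```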
data/lib/pgbus/table_maintenance.rb
ADDED

@@ -0,0 +1,110 @@
+# frozen_string_literal: true
+
+module Pgbus
+  # Proactive table maintenance to reduce bloat on PGMQ queue tables.
+  #
+  # PGMQ's read operation UPDATEs three columns (vt, read_ct, last_read_at)
+  # on every message read. With the default fillfactor of 100, every UPDATE
+  # creates a new heap tuple AND a new index entry — the dead tuple and its
+  # old index pointer remain until VACUUM. Under sustained load, autovacuum
+  # can't keep up and B-tree indexes bloat.
+  #
+  # Setting fillfactor=70 on queue tables reserves 30% of each page for
+  # update churn. Because `vt` is indexed and changes on every read, these
+  # writes are not HOT updates, but leaving headroom on heap pages still
+  # reduces page density for a table that is updated heavily between vacuum
+  # passes.
+  #
+  # More importantly, this module provides targeted VACUUM: instead of
+  # relying solely on autovacuum's global heuristics, the dispatcher
+  # periodically checks pg_stat_user_tables for tables with high dead tuple
+  # ratios and vacuums them explicitly. This is inspired by pgque's
+  # philosophy of measuring bloat before acting.
+  module TableMaintenance
+    FILLFACTOR = 70
+    BLOAT_THRESHOLD = 0.1
+    MAINTENANCE_INTERVAL = 6 * 3600 # 6 hours
+
+    class << self
+      def fillfactor_sql_for_queue(queue_name)
+        "ALTER TABLE pgmq.q_#{queue_name} SET (fillfactor = #{FILLFACTOR});"
+      end
+
+      def fillfactor_sql_for_all_queues
+        <<~SQL
+          DO $$
+          DECLARE
+            q RECORD;
+          BEGIN
+            FOR q IN SELECT queue_name FROM pgmq.meta LOOP
+              EXECUTE format('ALTER TABLE pgmq.q_%I SET (fillfactor = #{FILLFACTOR})', q.queue_name);
+            END LOOP;
+          END $$;
+        SQL
+      end
+
+      def vacuum_candidates(conn, threshold: BLOAT_THRESHOLD)
+        rows = conn.exec(<<~SQL)
+          SELECT schemaname, relname, n_dead_tup, n_live_tup
+          FROM pg_stat_user_tables
+          WHERE schemaname = 'pgmq'
+            AND relname LIKE 'q_%'
+          ORDER BY n_dead_tup DESC
+        SQL
+
+        rows.each_with_object([]) do |row, candidates|
+          dead = row["n_dead_tup"].to_i
+          live = row["n_live_tup"].to_i
+          total = dead + live
+          next if total.zero?
+
+          ratio = dead.to_f / total
+          next unless ratio > threshold
+
+          candidates << {
+            table: "#{row["schemaname"]}.#{row["relname"]}",
+            dead_tuples: dead,
+            live_tuples: live,
+            dead_ratio: ratio.round(4)
+          }
+        end
+      end
+
+      def vacuum_sql(table)
+        schema, relname = table.split(".", 2)
+        "VACUUM \"#{schema}\".\"#{relname}\""
+      end
+
+      def reindex_sql(table)
+        schema, relname = table.split(".", 2)
+        "REINDEX TABLE CONCURRENTLY \"#{schema}\".\"#{relname}\""
+      end
+
+      def run_maintenance(conn, threshold: BLOAT_THRESHOLD, reindex: true)
+        candidates = vacuum_candidates(conn, threshold: threshold)
+        return 0 if candidates.empty?
+
+        maintained = 0
+        candidates.each do |candidate|
+          table = candidate[:table]
+          Pgbus.logger.info do
+            "[Pgbus::TableMaintenance] Vacuuming #{table} " \
+              "(dead_ratio=#{candidate[:dead_ratio]}, dead=#{candidate[:dead_tuples]})"
+          end
+          conn.exec(vacuum_sql(table))
+
+          if reindex
+            Pgbus.logger.info { "[Pgbus::TableMaintenance] Reindexing #{table}" }
+            conn.exec(reindex_sql(table))
+          end
+
+          maintained += 1
+        rescue StandardError => e
+          Pgbus.logger.error { "[Pgbus::TableMaintenance] Failed to maintain #{table}: #{e.message}" }
+        end
+
+        maintained
+      end
+    end
+  end
+end
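The candidate-selection arithmetic in `vacuum_candidates` above can be exercised without a database; this sketch applies the same dead-ratio rule, dead / (dead + live) > threshold, to rows shaped like `pg_stat_user_tables` output:

```ruby
# Sketch of TableMaintenance.vacuum_candidates' selection rule on
# in-memory rows instead of a live PG connection. Table names here
# are made up for illustration.
def vacuum_candidates(rows, threshold: 0.1)
  rows.filter_map do |row|
    dead = row["n_dead_tup"].to_i
    live = row["n_live_tup"].to_i
    total = dead + live
    next if total.zero? # freshly truncated/empty tables: no signal

    ratio = dead.to_f / total
    next unless ratio > threshold

    { table: "pgmq.#{row["relname"]}", dead_ratio: ratio.round(4) }
  end
end

rows = [
  { "relname" => "q_orders", "n_dead_tup" => "50", "n_live_tup" => "50" }, # ratio 0.5
  { "relname" => "q_mail",   "n_dead_tup" => "1",  "n_live_tup" => "99" }, # ratio 0.01
  { "relname" => "q_empty",  "n_dead_tup" => "0",  "n_live_tup" => "0" }   # skipped
]
candidates = vacuum_candidates(rows)
```

Only `q_orders` crosses the 0.1 default threshold, so it alone would be vacuumed (and, with `reindex: true`, reindexed) by `run_maintenance`.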
data/lib/pgbus/version.rb
CHANGED
metadata
CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pgbus
 version: !ruby/object:Gem::Version
-  version: 0.7.4
+  version: 0.7.5
 platform: ruby
 authors:
 - Mikael Henriksson

@@ -234,8 +234,10 @@ files:
 - lib/generators/pgbus/templates/pgbus_binstub.erb
 - lib/generators/pgbus/templates/recurring.yml.erb
 - lib/generators/pgbus/templates/tune_autovacuum.rb.erb
+- lib/generators/pgbus/templates/tune_fillfactor.rb.erb
 - lib/generators/pgbus/templates/upgrade_pgmq.rb.erb
 - lib/generators/pgbus/tune_autovacuum_generator.rb
+- lib/generators/pgbus/tune_fillfactor_generator.rb
 - lib/generators/pgbus/update_generator.rb
 - lib/generators/pgbus/upgrade_pgmq_generator.rb
 - lib/pgbus.rb

@@ -309,6 +311,7 @@ files:
 - lib/pgbus/streams/turbo_broadcastable.rb
 - lib/pgbus/streams/turbo_stream_override.rb
 - lib/pgbus/streams/watermark_cache_middleware.rb
+- lib/pgbus/table_maintenance.rb
 - lib/pgbus/testing.rb
 - lib/pgbus/testing/assertions.rb
 - lib/pgbus/testing/minitest.rb