dispatch_policy 0.4.2 → 0.4.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +53 -0
- data/README.md +1 -0
- data/app/controllers/dispatch_policy/dashboard_controller.rb +21 -13
- data/app/controllers/dispatch_policy/policies_controller.rb +7 -4
- data/app/models/dispatch_policy/partition.rb +0 -4
- data/lib/dispatch_policy/config.rb +11 -0
- data/lib/dispatch_policy/cursor_pagination.rb +16 -0
- data/lib/dispatch_policy/gates/concurrency.rb +7 -0
- data/lib/dispatch_policy/gates/throttle.rb +23 -3
- data/lib/dispatch_policy/inflight_tracker.rb +9 -1
- data/lib/dispatch_policy/pipeline.rb +21 -2
- data/lib/dispatch_policy/policy_dsl.rb +2 -2
- data/lib/dispatch_policy/repository.rb +147 -6
- data/lib/dispatch_policy/tick.rb +15 -8
- data/lib/dispatch_policy/tick_loop.rb +4 -1
- data/lib/dispatch_policy/version.rb +1 -1
- data/lib/generators/dispatch_policy/install/templates/initializer.rb.tt +1 -0
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 23433a64c963b0e0908c185ad8dc8e6f97edbd8d476ee712d15023f74ba0e338
|
|
4
|
+
data.tar.gz: 64ff19e04a6d02b0f1eedb4fb6d74b0e073e3773efb9a3afc92ae1a3e9002aeb
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: e168e049dbb0d399dddc6e84427b7b557474d9ce10cb1983d3f4f24c6fde43ffda9c03c179b4590d9f09d225a8adeb5d7295a10788898ebc4ab0bc47a765163c
|
|
7
|
+
data.tar.gz: d8ef9debaebdf89de7cce28e5fa669484acafd27e4b5e65ff959f6f177c1aaa124cb1215da191c3a6759094aac467980489241062f907574860b7226dd9dbc9a
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,58 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.4.3
|
|
4
|
+
|
|
5
|
+
### Fixed
|
|
6
|
+
- The `throttle` gate now charges its token bucket for the number of jobs
|
|
7
|
+
**actually admitted**, not for the optimistic `allowed` it computes at
|
|
8
|
+
evaluate time. The deduction moved from `#evaluate` to the `#consume`
|
|
9
|
+
hook (run after the staging DELETE, via `Pipeline.settle`), so the
|
|
10
|
+
bucket is no longer over-charged — and the effective rate no longer
|
|
11
|
+
drifts below the configured one — when fewer jobs are admitted than
|
|
12
|
+
allowed: future-scheduled rows skipped by the `scheduled_at <= now()`
|
|
13
|
+
filter, a downstream `concurrency` gate capping `admit_count`, or rows a
|
|
14
|
+
concurrent tick claimed under `SKIP LOCKED`.
|
|
15
|
+
- Inflight rows for jobs that were admitted but have **not started
|
|
16
|
+
performing yet** (still waiting in the adapter's queue) are no longer
|
|
17
|
+
reaped at `inflight_stale_after`. Their heartbeat thread only starts in
|
|
18
|
+
`around_perform`, so under a deep adapter backlog the sweeper used to
|
|
19
|
+
delete still-valid admissions, making the concurrency gate under-count
|
|
20
|
+
and over-admit. `sweep_stale_inflight!` is now two-tier: rows
|
|
21
|
+
heartbeated past admission reap at `inflight_stale_after`; never-started
|
|
22
|
+
rows reap only past the new, generous `config.inflight_queued_stale_after`
|
|
23
|
+
(1 hour default).
|
|
24
|
+
- `InflightTracker` now applies the same `job.queue_name || policy.queue_name`
|
|
25
|
+
fallback at perform time that the staging path uses, so a policy whose
|
|
26
|
+
`partition_by`/`shard_by` reads `queue_name` derives the same
|
|
27
|
+
`partition_key` at admission and at perform (otherwise the inflight row
|
|
28
|
+
and adaptive observations landed under the wrong scope).
|
|
29
|
+
- `CursorPagination` rejects cursors whose value isn't a scalar or whose
|
|
30
|
+
id isn't an integer (the cursor is an attacker-controllable query
|
|
31
|
+
param), and ignores a value whose type can't compare against the sort
|
|
32
|
+
column instead of raising a `PG` error (a forged numeric value on a
|
|
33
|
+
timestamp sort). Falls back to the first page.
|
|
34
|
+
- `PolicyDSL#tick_admission_budget(nil)` / `#admission_batch_size(nil)` are
|
|
35
|
+
no-ops that defer to config instead of raising in `Integer(nil)`,
|
|
36
|
+
matching how `fairness(half_life:)` already guards nil.
|
|
37
|
+
|
|
38
|
+
### Changed
|
|
39
|
+
- The admin UI's dashboard and policies index collapse their per-policy
|
|
40
|
+
`N+1` query loops into grouped `Repository` methods
|
|
41
|
+
(`tick_summaries_by_policy`, `top_denied_reason_by_policy`,
|
|
42
|
+
`partition_round_trip_stats_by_policy`, `partition_counts_by_policy`),
|
|
43
|
+
one query each instead of several per policy.
|
|
44
|
+
|
|
45
|
+
### Added
|
|
46
|
+
- `config.inflight_queued_stale_after` (default 1 hour) — the sweep cutoff
|
|
47
|
+
for inflight rows admitted but never started. Raise it if your adapter
|
|
48
|
+
backlog can exceed an hour.
|
|
49
|
+
|
|
50
|
+
### Removed
|
|
51
|
+
- The broken, unused `Partition.stale_inactive` scope — it filtered on an
|
|
52
|
+
`in_flight_count` column dropped back in 0.3.0, so any call raised
|
|
53
|
+
`PG::UndefinedColumn`. The real partition GC is
|
|
54
|
+
`Repository.sweep_inactive_partitions!`.
|
|
55
|
+
|
|
3
56
|
## 0.4.2
|
|
4
57
|
|
|
5
58
|
### Fixed
|
data/README.md
CHANGED
|
@@ -534,6 +534,7 @@ DispatchPolicy.configure do |c|
|
|
|
534
534
|
c.idle_pause = 0.5 # seconds slept when a tick admits nothing
|
|
535
535
|
c.partition_inactive_after = 86_400 # GC partitions idle this long
|
|
536
536
|
c.inflight_stale_after = 300 # GC inflight rows whose worker stopped heartbeating
|
|
537
|
+
c.inflight_queued_stale_after = 3_600 # GC inflight rows admitted but never started (queued)
|
|
537
538
|
c.inflight_heartbeat_interval = 30 # how often the worker bumps heartbeat_at
|
|
538
539
|
c.sweep_every_ticks = 50 # sweeper cadence (in tick iterations)
|
|
539
540
|
c.metrics_retention = 86_400 # tick_samples kept this long
|
data/app/controllers/dispatch_policy/dashboard_controller.rb
CHANGED
|
@@ -61,29 +61,37 @@ module DispatchPolicy
|
|
|
61
61
|
one_min_ago = now - 60
|
|
62
62
|
five_min_ago = now - 300
|
|
63
63
|
|
|
64
|
+
# Aggregate everything the per-policy rows need in 4 grouped queries
|
|
65
|
+
# instead of ~4 per policy. With dozens of policies this was the bulk
|
|
66
|
+
# of the dashboard's query count.
|
|
67
|
+
m1_by = Repository.tick_summaries_by_policy(since: one_min_ago)
|
|
68
|
+
m5_by = Repository.tick_summaries_by_policy(since: five_min_ago)
|
|
69
|
+
denied_by = Repository.top_denied_reason_by_policy(since: one_min_ago)
|
|
70
|
+
rt_by = Repository.partition_round_trip_stats_by_policy
|
|
71
|
+
|
|
64
72
|
names = (pending_by_policy.keys + in_flight_by_policy.keys).uniq.sort
|
|
65
73
|
@policies = names.map do |name|
|
|
66
|
-
info
|
|
67
|
-
m1
|
|
68
|
-
m5
|
|
69
|
-
|
|
70
|
-
|
|
74
|
+
info = pending_by_policy[name] || {}
|
|
75
|
+
m1 = m1_by[name] || {}
|
|
76
|
+
m5 = m5_by[name] || {}
|
|
77
|
+
rt = rt_by[name] || {}
|
|
78
|
+
top = denied_by[name] # [reason, count] or nil
|
|
71
79
|
|
|
72
80
|
{
|
|
73
81
|
name: name,
|
|
74
82
|
pending: info[:pending] || 0,
|
|
75
83
|
in_flight: in_flight_by_policy[name] || 0,
|
|
76
84
|
last_admit_at: info[:last_admit_at],
|
|
77
|
-
admitted_1m: m1[:jobs_admitted],
|
|
78
|
-
admitted_5m: m5[:jobs_admitted],
|
|
79
|
-
ticks_1m: m1[:ticks],
|
|
80
|
-
avg_tick_ms_1m: m1[:avg_duration_ms],
|
|
81
|
-
forward_failures_1m: m1[:forward_failures],
|
|
85
|
+
admitted_1m: m1[:jobs_admitted] || 0,
|
|
86
|
+
admitted_5m: m5[:jobs_admitted] || 0,
|
|
87
|
+
ticks_1m: m1[:ticks] || 0,
|
|
88
|
+
avg_tick_ms_1m: m1[:avg_duration_ms] || 0,
|
|
89
|
+
forward_failures_1m: m1[:forward_failures] || 0,
|
|
82
90
|
oldest_age_seconds: rt[:oldest_age_seconds],
|
|
83
91
|
p95_age_seconds: rt[:p95_age_seconds],
|
|
84
|
-
in_backoff: rt[:in_backoff],
|
|
85
|
-
top_denial_reason:
|
|
86
|
-
top_denial_count:
|
|
92
|
+
in_backoff: rt[:in_backoff] || 0,
|
|
93
|
+
top_denial_reason: top&.first,
|
|
94
|
+
top_denial_count: top&.last
|
|
87
95
|
}
|
|
88
96
|
end
|
|
89
97
|
end
|
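The dashboard change above replaces per-policy queries with hash lookups against results that were grouped once up front. A minimal sketch of that lookup pattern in plain Ruby, assuming literal hashes in place of the grouped `Repository` results (the `build_rows` helper and the sample data are hypothetical and not part of the gem):

```ruby
# Build one dashboard row per policy using only hash lookups.
# `m1_by` and `denied_by` stand in for the grouped Repository results,
# which the gem fetches with one SQL query each, keyed by policy name.
def build_rows(names, m1_by, denied_by)
  names.map do |name|
    m1  = m1_by[name] || {}  # a policy with no samples in the window
    top = denied_by[name]    # [reason, count] or nil
    {
      name: name,
      admitted_1m: m1[:jobs_admitted] || 0,
      ticks_1m: m1[:ticks] || 0,
      top_denial_reason: top&.first,
      top_denial_count: top&.last
    }
  end
end

# Hypothetical sample data: "reports" has no samples at all.
sample_m1     = { "emails" => { jobs_admitted: 12, ticks: 4 } }
sample_denied = { "emails" => ["throttle", 3] }
build_rows(%w[emails reports], sample_m1, sample_denied)
```

The `|| {}` and `|| 0` defaults matter because a grouped query returns no row for a policy with no matching samples, where the old per-policy call presumably always returned a summary.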
data/app/controllers/dispatch_policy/policies_controller.rb
CHANGED
|
@@ -12,16 +12,19 @@ module DispatchPolicy
|
|
|
12
12
|
names = (registry_names + db_names).uniq.sort
|
|
13
13
|
|
|
14
14
|
in_flight_by_policy = InflightJob.where(policy_name: names).group(:policy_name).count
|
|
15
|
+
# One grouped query for pending / partition count / paused count
|
|
16
|
+
# across every policy instead of three per policy.
|
|
17
|
+
counts_by_policy = Repository.partition_counts_by_policy
|
|
15
18
|
|
|
16
19
|
@rows = names.map do |name|
|
|
17
|
-
|
|
20
|
+
counts = counts_by_policy[name] || {}
|
|
18
21
|
{
|
|
19
22
|
name: name,
|
|
20
23
|
registered: registry_names.include?(name),
|
|
21
|
-
pending:
|
|
24
|
+
pending: counts[:pending] || 0,
|
|
22
25
|
in_flight: in_flight_by_policy[name] || 0,
|
|
23
|
-
partitions: partitions
|
|
24
|
-
paused_count:
|
|
26
|
+
partitions: counts[:partitions] || 0,
|
|
27
|
+
paused_count: counts[:paused] || 0
|
|
25
28
|
}
|
|
26
29
|
end
|
|
27
30
|
end
|
data/app/models/dispatch_policy/partition.rb
CHANGED
|
@@ -9,10 +9,6 @@ module DispatchPolicy
|
|
|
9
9
|
scope :active, -> { where(status: "active") }
|
|
10
10
|
scope :paused, -> { where(status: "paused") }
|
|
11
11
|
scope :pending, -> { where("pending_count > 0") }
|
|
12
|
-
scope :stale_inactive, ->(cutoff) {
|
|
13
|
-
where("pending_count = 0 AND in_flight_count = 0")
|
|
14
|
-
.where("last_admit_at < ? OR (last_admit_at IS NULL AND created_at < ?)", cutoff, cutoff)
|
|
15
|
-
}
|
|
16
12
|
|
|
17
13
|
def paused?
|
|
18
14
|
status == "paused"
|
data/lib/dispatch_policy/config.rb
CHANGED
|
@@ -10,6 +10,7 @@ module DispatchPolicy
|
|
|
10
10
|
:busy_pause,
|
|
11
11
|
:partition_inactive_after,
|
|
12
12
|
:inflight_stale_after,
|
|
13
|
+
:inflight_queued_stale_after,
|
|
13
14
|
:inflight_heartbeat_interval,
|
|
14
15
|
:real_adapter,
|
|
15
16
|
:logger,
|
|
@@ -40,6 +41,16 @@ module DispatchPolicy
|
|
|
40
41
|
@busy_pause = 0.0
|
|
41
42
|
@partition_inactive_after = 24 * 60 * 60
|
|
42
43
|
@inflight_stale_after = 5 * 60
|
|
44
|
+
# Cutoff for inflight rows that were admitted (pre-inserted by the
|
|
45
|
+
# Tick) but never started performing — so the heartbeat thread, which
|
|
46
|
+
# only starts in around_perform, never advanced their heartbeat_at.
|
|
47
|
+
# These sit in the adapter's queue waiting for a worker; reaping them
|
|
48
|
+
# at `inflight_stale_after` (5 min) would make the concurrency gate
|
|
49
|
+
# under-count and over-admit whenever queue latency exceeds that. We
|
|
50
|
+
# give never-started rows a far more generous cutoff (1h) before
|
|
51
|
+
# assuming the admission was lost. Raise it if your adapter backlog
|
|
52
|
+
# can exceed an hour.
|
|
53
|
+
@inflight_queued_stale_after = 60 * 60
|
|
43
54
|
@inflight_heartbeat_interval = 30
|
|
44
55
|
@real_adapter = nil
|
|
45
56
|
@logger = nil
|
data/lib/dispatch_policy/cursor_pagination.rb
CHANGED
|
@@ -66,6 +66,13 @@ module DispatchPolicy
|
|
|
66
66
|
decoded = JSON.parse(Base64.urlsafe_decode64(cursor))
|
|
67
67
|
return nil unless decoded.is_a?(Array) && decoded.size == 2
|
|
68
68
|
|
|
69
|
+
# The cursor is attacker-controllable (a query param). Reject anything
|
|
70
|
+
# that isn't a (scalar value, integer id) tuple so a hostile payload
|
|
71
|
+
# like [[1,2], {}] can't reach the WHERE clause and raise a 500 (or
|
|
72
|
+
# worse). Per-column type compatibility is enforced in #apply.
|
|
73
|
+
value, id = decoded
|
|
74
|
+
return nil unless (value.is_a?(String) || value.is_a?(Numeric)) && id.is_a?(Integer)
|
|
75
|
+
|
|
69
76
|
decoded
|
|
70
77
|
rescue StandardError
|
|
71
78
|
nil
|
|
@@ -78,6 +85,15 @@ module DispatchPolicy
|
|
|
78
85
|
return scope if cursor.nil?
|
|
79
86
|
|
|
80
87
|
value, last_id = cursor
|
|
88
|
+
# Ignore a cursor whose value type can't be compared against this
|
|
89
|
+
# sort's column. The numeric columns (pending_count, total_admitted)
|
|
90
|
+
# need a Numeric; everything else compares as text (partition_key, or
|
|
91
|
+
# the ISO8601 timestamps emitted by #extract). A mismatch — e.g. a
|
|
92
|
+
# numeric value forged for a timestamp sort — would raise PG error;
|
|
93
|
+
# instead we fall back to the first page.
|
|
94
|
+
numeric_column = %w[pending_count total_admitted].include?(sort[:cursor_sql])
|
|
95
|
+
return scope unless numeric_column ? value.is_a?(Numeric) : value.is_a?(String)
|
|
96
|
+
|
|
81
97
|
case sort[:direction]
|
|
82
98
|
when :desc
|
|
83
99
|
scope.where(
|
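The two cursor checks above (shape validation at decode time, type compatibility at apply time) can be exercised in isolation. A minimal sketch, assuming standalone functions rather than the gem's module methods; `encode_cursor` and `usable_for?` are hypothetical names introduced only for illustration:

```ruby
require "base64"
require "json"

# Hypothetical helper: encode a (value, id) tuple the way a cursor might be built.
def encode_cursor(value, id)
  Base64.urlsafe_encode64(JSON.generate([value, id]))
end

# Decode an untrusted cursor string. Returns [value, id] or nil.
# Anything that is not a (scalar value, integer id) pair is rejected.
def decode_cursor(cursor)
  decoded = JSON.parse(Base64.urlsafe_decode64(cursor))
  return nil unless decoded.is_a?(Array) && decoded.size == 2

  value, id = decoded
  return nil unless (value.is_a?(String) || value.is_a?(Numeric)) && id.is_a?(Integer)

  decoded
rescue StandardError
  nil # malformed Base64 or JSON is treated as "no cursor"
end

# Second check: the value type must match the sort column's type.
# Numeric columns need a Numeric; all other columns compare as text.
def usable_for?(cursor, numeric_column:)
  value, _id = cursor
  numeric_column ? value.is_a?(Numeric) : value.is_a?(String)
end

forged = Base64.urlsafe_encode64("[[1,2],{}]")
decode_cursor(forged)                               # rejected: value is an Array
usable_for?([42, 7], numeric_column: false)         # rejected: number on a text sort
```

In both rejection cases the caller simply serves the first page, so a forged cursor degrades to a harmless request instead of an error.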
data/lib/dispatch_policy/gates/concurrency.rb
CHANGED
|
@@ -27,6 +27,13 @@ module DispatchPolicy
|
|
|
27
27
|
cap = capacity_for(ctx)
|
|
28
28
|
return Decision.deny(retry_after: @full_backoff, reason: "max=0") if cap <= 0
|
|
29
29
|
|
|
30
|
+
# This COUNT(*) runs in `evaluate`, BEFORE the admission TX opens, so
|
|
31
|
+
# the cap holds only when a single tick loop owns a given
|
|
32
|
+
# (policy, shard): within one tick, pass-2 re-reads the count after
|
|
33
|
+
# pass-1's inflight pre-insert has committed. Running two tick loops
|
|
34
|
+
# over the SAME shard would let both read the same pre-admission
|
|
35
|
+
# count and over-admit — shard the policy instead of duplicating
|
|
36
|
+
# loops on one shard (see shard_by in the README).
|
|
30
37
|
in_flight = Repository.count_inflight(
|
|
31
38
|
policy_name: partition["policy_name"],
|
|
32
39
|
partition_key: inflight_partition_key(partition["policy_name"], ctx)
|
data/lib/dispatch_policy/gates/throttle.rb
CHANGED
|
@@ -39,11 +39,19 @@ module DispatchPolicy
|
|
|
39
39
|
elapsed = [now - refilled_at, 0.0].max
|
|
40
40
|
tokens = [tokens + (elapsed * refill_rate), capacity.to_f].min
|
|
41
41
|
|
|
42
|
-
|
|
42
|
+
# The patch records the post-refill bucket WITHOUT deducting yet.
|
|
43
|
+
# The actual deduction is deferred to #consume, which runs once
|
|
44
|
+
# the admission TX knows how many staged rows were really claimed.
|
|
45
|
+
# Deducting `allowed` here over-charges the bucket whenever fewer
|
|
46
|
+
# jobs are admitted than allowed — a later gate capping admit_count,
|
|
47
|
+
# future-scheduled rows skipped by the `scheduled_at <= now()`
|
|
48
|
+
# filter, or rows another tick grabbed under SKIP LOCKED.
|
|
49
|
+
patch = { "tokens" => tokens, "refilled_at" => now }
|
|
50
|
+
|
|
51
|
+
whole = tokens.floor
|
|
43
52
|
if whole.zero?
|
|
44
53
|
missing = 1.0 - tokens
|
|
45
54
|
retry_after = missing / refill_rate
|
|
46
|
-
patch = { "tokens" => tokens, "refilled_at" => now }
|
|
47
55
|
return Decision.new(allowed: 0,
|
|
48
56
|
retry_after: retry_after,
|
|
49
57
|
gate_state_patch: { "throttle" => patch },
|
|
@@ -51,10 +59,22 @@ module DispatchPolicy
|
|
|
51
59
|
end
|
|
52
60
|
|
|
53
61
|
allowed = [whole, admit_budget].min
|
|
54
|
-
patch = { "tokens" => tokens - allowed, "refilled_at" => now }
|
|
55
62
|
Decision.new(allowed: allowed, gate_state_patch: { "throttle" => patch })
|
|
56
63
|
end
|
|
57
64
|
|
|
65
|
+
# Settles the bucket against the number of jobs actually admitted.
|
|
66
|
+
# `evaluate` recorded the post-refill token count in the decision's
|
|
67
|
+
# patch; here we subtract exactly `admitted_count` (≤ allowed), so
|
|
68
|
+
# the bucket is charged for jobs that really left, never for unspent
|
|
69
|
+
# budget. Called by Pipeline.settle after the claim.
|
|
70
|
+
def consume(decision, admitted_count)
|
|
71
|
+
st = decision.gate_state_patch && decision.gate_state_patch["throttle"]
|
|
72
|
+
return nil unless st
|
|
73
|
+
|
|
74
|
+
{ "throttle" => { "tokens" => st["tokens"].to_f - admitted_count,
|
|
75
|
+
"refilled_at" => st["refilled_at"] } }
|
|
76
|
+
end
|
|
77
|
+
|
|
58
78
|
private
|
|
59
79
|
|
|
60
80
|
def capacity_for(ctx)
|
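The split between `#evaluate` (refill only) and `#consume` (deduct what was really admitted) can be seen with a stripped-down token bucket. This is a simplified sketch under stated assumptions: `SketchThrottle` and `SketchDecision` are hypothetical names, state is a plain hash, and the zero-token `retry_after` branch of the real gate is omitted:

```ruby
# Hypothetical value object; the gem's Decision carries more fields.
SketchDecision = Struct.new(:allowed, :patch, keyword_init: true)

class SketchThrottle
  def initialize(capacity:, refill_rate:)
    @capacity = capacity
    @refill_rate = refill_rate # tokens per second
  end

  # Refill the bucket and report how many jobs MAY be admitted.
  # The patch records the post-refill level without deducting anything.
  def evaluate(state, now:, admit_budget:)
    elapsed = [now - state["refilled_at"], 0.0].max
    tokens  = [state["tokens"] + (elapsed * @refill_rate), @capacity.to_f].min
    patch   = { "tokens" => tokens, "refilled_at" => now }
    SketchDecision.new(allowed: [tokens.floor, admit_budget].min, patch: patch)
  end

  # Charge the bucket for the jobs that were actually admitted.
  def consume(decision, admitted_count)
    { "tokens" => decision.patch["tokens"].to_f - admitted_count,
      "refilled_at" => decision.patch["refilled_at"] }
  end
end

gate     = SketchThrottle.new(capacity: 10, refill_rate: 1.0)
decision = gate.evaluate({ "tokens" => 10.0, "refilled_at" => 0.0 }, now: 0.0, admit_budget: 5)
gate.consume(decision, 2) # only 2 of the 5 allowed jobs were really claimed
```

With a full bucket of 10 and 5 jobs allowed but only 2 claimed, deducting at evaluate time would leave 5 tokens, while settling in `consume` leaves 8. That 3-token gap is the over-charge the changelog describes.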
data/lib/dispatch_policy/inflight_tracker.rb
CHANGED
|
@@ -28,7 +28,15 @@ module DispatchPolicy
|
|
|
28
28
|
policy = DispatchPolicy.registry.fetch(policy_name)
|
|
29
29
|
return yield unless policy
|
|
30
30
|
|
|
31
|
-
|
|
31
|
+
# Mirror the stage-time fallback in JobExtension.around_enqueue_for:
|
|
32
|
+
# when the job carries no explicit queue, use the policy's default.
|
|
33
|
+
# Without this, a policy whose partition_by/shard_by reads queue_name
|
|
34
|
+
# would compute a DIFFERENT partition_key here than at admission, so
|
|
35
|
+
# the around_perform inflight row (and adaptive observations) would
|
|
36
|
+
# land under the wrong scope and the concurrency gate's COUNT(*) would
|
|
37
|
+
# miss them.
|
|
38
|
+
queue_name = job.queue_name&.to_s || policy.queue_name
|
|
39
|
+
ctx = policy.build_context(job.arguments, queue_name: queue_name)
|
|
32
40
|
partition_key = policy.partition_key_for(ctx)
|
|
33
41
|
|
|
34
42
|
Repository.insert_inflight!([{
|
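The queue-name fallback above matters only when the partition key depends on `queue_name`. A small sketch of why both code paths must agree, assuming hypothetical `SketchJob` and `SketchPolicy` structs and a made-up `partition_key_for` that reads the queue name (the gem's real context building is more involved):

```ruby
SketchJob = Struct.new(:queue_name)

SketchPolicy = Struct.new(:queue_name) do
  # Hypothetical partition_by that scopes partitions by queue.
  def partition_key_for(queue_name:)
    "queue:#{queue_name}"
  end
end

# Same expression as the fix: prefer the job's explicit queue,
# fall back to the policy default when the job carries none.
def effective_queue_name(job, policy)
  job.queue_name&.to_s || policy.queue_name
end

policy = SketchPolicy.new("mailers")
job    = SketchJob.new(nil) # no explicit queue on the job

stage_key   = policy.partition_key_for(queue_name: effective_queue_name(job, policy))
perform_key = policy.partition_key_for(queue_name: effective_queue_name(job, policy))
```

When both paths use the fallback, `stage_key` and `perform_key` are the same string. If the perform path passed the raw `nil` instead, the key would become `"queue:"` and the inflight row would land in a different partition than the one the concurrency gate counts.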
data/lib/dispatch_policy/pipeline.rb
CHANGED
|
@@ -5,12 +5,30 @@ module DispatchPolicy
|
|
|
5
5
|
# partition. Returns a value object describing how many jobs may be
|
|
6
6
|
# admitted right now and which gate-state patches to persist.
|
|
7
7
|
class Pipeline
|
|
8
|
-
Result = Struct.new(:admit_count, :retry_after, :gate_state_patch, :reasons, keyword_init: true)
|
|
8
|
+
Result = Struct.new(:admit_count, :retry_after, :gate_state_patch, :reasons, :decisions, keyword_init: true)
|
|
9
9
|
|
|
10
10
|
def initialize(policy)
|
|
11
11
|
@policy = policy
|
|
12
12
|
end
|
|
13
13
|
|
|
14
|
+
# Computes the gate_state patch to persist once the REAL admitted count
|
|
15
|
+
# is known (after the staging DELETE). Each gate's #consume settles its
|
|
16
|
+
# state against the actual number of jobs claimed — the throttle
|
|
17
|
+
# deducts that many tokens rather than the optimistic `allowed` it
|
|
18
|
+
# returned at evaluate time. Gates that keep no gate_state (concurrency,
|
|
19
|
+
# adaptive_concurrency — their state lives in their own tables) return
|
|
20
|
+
# nil from #consume and contribute nothing here.
|
|
21
|
+
#
|
|
22
|
+
# `decisions` is the [gate, decision] list carried on the Result.
|
|
23
|
+
def self.settle(decisions, admitted_count)
|
|
24
|
+
patch = {}
|
|
25
|
+
decisions.each do |gate, decision|
|
|
26
|
+
sub = gate.consume(decision, admitted_count)
|
|
27
|
+
patch.merge!(sub) if sub
|
|
28
|
+
end
|
|
29
|
+
patch
|
|
30
|
+
end
|
|
31
|
+
|
|
14
32
|
def call(ctx, partition, max_budget)
|
|
15
33
|
budget = max_budget
|
|
16
34
|
retry_after = nil
|
|
@@ -41,7 +59,8 @@ module DispatchPolicy
|
|
|
41
59
|
admit_count: admit_count,
|
|
42
60
|
retry_after: retry_after,
|
|
43
61
|
gate_state_patch: patch,
|
|
44
|
-
reasons: reasons
|
|
62
|
+
reasons: reasons,
|
|
63
|
+
decisions: decisions
|
|
45
64
|
)
|
|
46
65
|
end
|
|
47
66
|
end
|
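`Pipeline.settle` merges whatever each gate's `#consume` returns and skips gates that return `nil`. A self-contained sketch of that merge, assuming two hypothetical gate classes and plain hashes in place of the gem's decision objects:

```ruby
# Hypothetical gate that keeps state in gate_state (like the throttle).
class SketchThrottleGate
  def consume(decision, admitted_count)
    st = decision["throttle"]
    return nil unless st

    { "throttle" => { "tokens" => st["tokens"].to_f - admitted_count,
                      "refilled_at" => st["refilled_at"] } }
  end
end

# Hypothetical gate whose state lives elsewhere (like concurrency).
class SketchConcurrencyGate
  def consume(_decision, _admitted_count)
    nil
  end
end

# Merge every non-nil sub-patch into one gate_state patch.
def settle(decisions, admitted_count)
  decisions.each_with_object({}) do |(gate, decision), patch|
    sub = gate.consume(decision, admitted_count)
    patch.merge!(sub) if sub
  end
end

decisions = [
  [SketchThrottleGate.new, { "throttle" => { "tokens" => 10.0, "refilled_at" => 0.0 } }],
  [SketchConcurrencyGate.new, {}]
]
settle(decisions, 3)
```

Because the result is keyed by gate name, a shallow merge is enough as long as each gate writes only under its own key.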
data/lib/dispatch_policy/policy_dsl.rb
CHANGED
|
@@ -45,7 +45,7 @@ module DispatchPolicy
|
|
|
45
45
|
end
|
|
46
46
|
|
|
47
47
|
def admission_batch_size(size)
|
|
48
|
-
@admission_batch_size = Integer(size)
|
|
48
|
+
@admission_batch_size = Integer(size) if size
|
|
49
49
|
end
|
|
50
50
|
|
|
51
51
|
# Per-policy override for the EWMA half-life used to weigh recent
|
|
@@ -62,7 +62,7 @@ module DispatchPolicy
|
|
|
62
62
|
# nil, no global cap is enforced and per-partition admission_batch_size
|
|
63
63
|
# is the only ceiling.
|
|
64
64
|
def tick_admission_budget(value)
|
|
65
|
-
@tick_admission_budget = Integer(value)
|
|
65
|
+
@tick_admission_budget = Integer(value) if value
|
|
66
66
|
end
|
|
67
67
|
|
|
68
68
|
# Defines the partition scope. Required — every policy declares
|
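The `if size` / `if value` guards turn a `nil` argument into a no-op, which lets a policy pass through an optional setting unchanged. A minimal sketch, assuming a hypothetical `SketchDSL` class and a made-up config default of 100:

```ruby
# Hypothetical stand-in for the global config value.
CONFIG_DEFAULT_BATCH_SIZE = 100

class SketchDSL
  attr_reader :batch_size

  # nil leaves the override unset; anything else must coerce to Integer.
  def admission_batch_size(size)
    @batch_size = Integer(size) if size
  end
end

# The override wins when present, otherwise the config default applies.
def effective_batch_size(dsl)
  dsl.batch_size || CONFIG_DEFAULT_BATCH_SIZE
end

dsl = SketchDSL.new
dsl.admission_batch_size(nil)  # no-op, defers to config
effective_batch_size(dsl)
dsl.admission_batch_size("25") # Integer("25") coerces the string
effective_batch_size(dsl)
```

Without the guard, `Integer(nil)` raises `TypeError`, so a call such as `admission_batch_size(ENV_VALUE_OR_NIL)` would fail while the policy is being defined. Non-numeric strings still raise `ArgumentError`, since the guard only covers `nil` and `false`.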
data/lib/dispatch_policy/repository.rb
CHANGED
|
@@ -192,8 +192,8 @@ module DispatchPolicy
|
|
|
192
192
|
# through `bulk_record_partition_denies!` instead, which collapses
|
|
193
193
|
# many partitions into a single UPDATE…FROM(VALUES…) at the end of
|
|
194
194
|
# the tick.
|
|
195
|
-
def claim_staged_jobs!(policy_name:, partition_key:, limit:,
|
|
196
|
-
half_life_seconds: nil)
|
|
195
|
+
def claim_staged_jobs!(policy_name:, partition_key:, limit:, retry_after:,
|
|
196
|
+
gate_state_patch: nil, half_life_seconds: nil)
|
|
197
197
|
raise ArgumentError, "claim_staged_jobs! requires limit > 0" unless limit.positive?
|
|
198
198
|
|
|
199
199
|
sql_select = <<~SQL.squish
|
|
@@ -212,11 +212,18 @@ module DispatchPolicy
|
|
|
212
212
|
SQL
|
|
213
213
|
rows = connection.exec_query(sql_select, "claim_staged_jobs", [policy_name, partition_key, limit]).to_a
|
|
214
214
|
|
|
215
|
+
# The gate_state patch may depend on how many rows we actually
|
|
216
|
+
# claimed (e.g. the throttle charges its bucket for jobs admitted,
|
|
217
|
+
# not for the optimistic `allowed`). When the caller passes a block
|
|
218
|
+
# it receives that real count and returns the patch to persist;
|
|
219
|
+
# gate-less callers pass a fixed `gate_state_patch:` instead.
|
|
220
|
+
patch = block_given? ? yield(rows.size) : (gate_state_patch || {})
|
|
221
|
+
|
|
215
222
|
record_partition_admit!(
|
|
216
223
|
policy_name: policy_name,
|
|
217
224
|
partition_key: partition_key,
|
|
218
225
|
admitted: rows.size,
|
|
219
|
-
gate_state_patch:
|
|
226
|
+
gate_state_patch: patch,
|
|
220
227
|
retry_after: retry_after,
|
|
221
228
|
half_life_seconds: half_life_seconds
|
|
222
229
|
)
|
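The `block_given? ? yield(rows.size) : (gate_state_patch || {})` line lets the caller compute the patch from the real claim size. A reduced sketch of that calling convention, assuming an in-memory array in place of the staging table; `claim_sketch` is a hypothetical name and performs no SQL:

```ruby
# Claim up to `limit` rows, then obtain the gate_state patch either from
# the caller's block (which receives the real claimed count) or from the
# fixed `gate_state_patch:` keyword.
def claim_sketch(staged_rows, limit:, gate_state_patch: nil)
  raise ArgumentError, "claim_sketch requires limit > 0" unless limit.positive?

  rows  = staged_rows.first(limit)
  patch = block_given? ? yield(rows.size) : (gate_state_patch || {})
  { rows: rows, patch: patch }
end

# Block form: the limit was 5 but only 2 rows were available to claim.
claim_sketch(%i[a b], limit: 5) { |admitted| { "admitted" => admitted } }

# Keyword form, for callers that have no gates to settle.
claim_sketch(%i[a b], limit: 1, gate_state_patch: { "fixed" => true })
```

The block runs after the rows are selected, so the caller never needs to predict how many of the `limit` rows were actually available.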
|
@@ -396,14 +403,37 @@ module DispatchPolicy
|
|
|
396
403
|
Integer(result.rows.first.first)
|
|
397
404
|
end
|
|
398
405
|
|
|
399
|
-
|
|
406
|
+
# Reap inflight rows whose owner is gone. Two tiers, distinguished by
|
|
407
|
+
# whether the row was ever heartbeated past its admission:
|
|
408
|
+
#
|
|
409
|
+
# heartbeat_at > admitted_at → the worker started performing and the
|
|
410
|
+
# heartbeat thread advanced heartbeat_at at least once. If it then
|
|
411
|
+
# went silent for `cutoff_seconds`, the worker died mid-run: reap.
|
|
412
|
+
#
|
|
413
|
+
# heartbeat_at <= admitted_at → never heartbeated past admission. The
|
|
414
|
+
# row was pre-inserted by the Tick and the job is still waiting in
|
|
415
|
+
# the adapter's queue (or only just started — the first heartbeat
|
|
416
|
+
# fires after inflight_heartbeat_interval). Reaping these at the
|
|
417
|
+
# short cutoff would under-count the concurrency gate and over-admit
|
|
418
|
+
# whenever queue latency exceeds it. Only reap once they're older
|
|
419
|
+
# than the far more generous `queued_cutoff_seconds`, by which point
|
|
420
|
+
# the admission is presumed lost.
|
|
421
|
+
#
|
|
422
|
+
# The Tick pre-insert writes admitted_at and heartbeat_at from the same
|
|
423
|
+
# now() (a single statement), so a never-started row has them exactly
|
|
424
|
+
# equal; one heartbeat makes heartbeat_at strictly greater.
|
|
425
|
+
def sweep_stale_inflight!(cutoff_seconds:, queued_cutoff_seconds: nil)
|
|
426
|
+
queued_cutoff_seconds ||= cutoff_seconds
|
|
400
427
|
connection.exec_query(
|
|
401
428
|
<<~SQL.squish,
|
|
402
429
|
DELETE FROM #{INFLIGHT_TABLE}
|
|
403
|
-
WHERE heartbeat_at
|
|
430
|
+
WHERE (heartbeat_at > admitted_at
|
|
431
|
+
AND heartbeat_at < now() - ($1 || ' seconds')::interval)
|
|
432
|
+
OR (heartbeat_at <= admitted_at
|
|
433
|
+
AND admitted_at < now() - ($2 || ' seconds')::interval)
|
|
404
434
|
SQL
|
|
405
435
|
"sweep_stale_inflight",
|
|
406
|
-
[cutoff_seconds.to_i]
|
|
436
|
+
[cutoff_seconds.to_i, queued_cutoff_seconds.to_i]
|
|
407
437
|
)
|
|
408
438
|
end
|
|
409
439
|
|
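The two-tier `WHERE` clause above can be restated as a predicate over a single row. A sketch in plain Ruby, assuming numeric timestamps in seconds and a hypothetical `SketchInflightRow` struct; the real sweep runs as one SQL `DELETE`:

```ruby
SketchInflightRow = Struct.new(:admitted_at, :heartbeat_at, keyword_init: true)

# true when the sweeper would delete the row.
def reapable?(row, now:, cutoff_seconds:, queued_cutoff_seconds: nil)
  queued_cutoff_seconds ||= cutoff_seconds # same fallback as the method above

  if row.heartbeat_at > row.admitted_at
    # Started performing, then went silent: short cutoff on the heartbeat.
    row.heartbeat_at < now - cutoff_seconds
  else
    # Never heartbeated past admission: long cutoff on the admission time.
    row.admitted_at < now - queued_cutoff_seconds
  end
end

now    = 10_000
queued = SketchInflightRow.new(admitted_at: 9_000, heartbeat_at: 9_000) # waiting 1000 s
dead   = SketchInflightRow.new(admitted_at: 9_000, heartbeat_at: 9_100) # silent for 900 s

reapable?(queued, now: now, cutoff_seconds: 300, queued_cutoff_seconds: 3_600) # kept
reapable?(dead,   now: now, cutoff_seconds: 300, queued_cutoff_seconds: 3_600) # reaped
```

Omitting `queued_cutoff_seconds` makes both tiers use the same cutoff, in which case the queued row in the example would be reaped after 300 seconds.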
|
@@ -471,6 +501,37 @@ module DispatchPolicy
|
|
|
471
501
|
}
|
|
472
502
|
end
|
|
473
503
|
|
|
504
|
+
# One grouped query returning per-policy tick aggregates, keyed by
|
|
505
|
+
# policy_name. Replaces calling tick_summary once per policy on the
|
|
506
|
+
# dashboard (N queries → 1). Only the fields the overview renders.
|
|
507
|
+
# { "policy_a" => { jobs_admitted:, forward_failures:, ticks:,
|
|
508
|
+
# avg_duration_ms: }, ... }
|
|
509
|
+
def tick_summaries_by_policy(since:)
|
|
510
|
+
result = connection.exec_query(
|
|
511
|
+
<<~SQL.squish,
|
|
512
|
+
SELECT
|
|
513
|
+
policy_name,
|
|
514
|
+
COALESCE(SUM(jobs_admitted), 0)::int AS jobs_admitted,
|
|
515
|
+
COALESCE(SUM(forward_failures), 0)::int AS forward_failures,
|
|
516
|
+
COUNT(*)::int AS ticks,
|
|
517
|
+
COALESCE(AVG(duration_ms), 0)::int AS avg_duration_ms
|
|
518
|
+
FROM #{SAMPLES_TABLE}
|
|
519
|
+
WHERE sampled_at >= $1
|
|
520
|
+
GROUP BY policy_name
|
|
521
|
+
SQL
|
|
522
|
+
"tick_summaries_by_policy",
|
|
523
|
+
[since]
|
|
524
|
+
)
|
|
525
|
+
result.to_a.each_with_object({}) do |r, h|
|
|
526
|
+
h[r["policy_name"]] = {
|
|
527
|
+
jobs_admitted: r["jobs_admitted"].to_i,
|
|
528
|
+
forward_failures: r["forward_failures"].to_i,
|
|
529
|
+
ticks: r["ticks"].to_i,
|
|
530
|
+
avg_duration_ms: r["avg_duration_ms"].to_i
|
|
531
|
+
}
|
|
532
|
+
end
|
|
533
|
+
end
|
|
534
|
+
|
|
474
535
|
# Aggregate denied_reasons jsonb across samples in window: returns
|
|
475
536
|
# { "throttle" => 12, "concurrency_full" => 3, ... }
|
|
476
537
|
def denied_reasons_summary(policy_name: nil, since:)
|
|
@@ -490,6 +551,30 @@ module DispatchPolicy
|
|
|
490
551
|
result.to_a.each_with_object({}) { |r, h| h[r["key"]] = r["total"].to_i }
|
|
491
552
|
end
|
|
492
553
|
|
|
554
|
+
# The single most-denied reason per policy in one query, keyed by
|
|
555
|
+
# policy_name → [reason, count]. Replaces calling denied_reasons_summary
|
|
556
|
+
# per policy on the dashboard just to read its top entry.
|
|
557
|
+
def top_denied_reason_by_policy(since:)
|
|
558
|
+
result = connection.exec_query(
|
|
559
|
+
<<~SQL.squish,
|
|
560
|
+
SELECT DISTINCT ON (policy_name) policy_name, key, total
|
|
561
|
+
FROM (
|
|
562
|
+
SELECT policy_name, key, SUM(value::int)::int AS total
|
|
563
|
+
FROM #{SAMPLES_TABLE},
|
|
564
|
+
LATERAL jsonb_each_text(denied_reasons)
|
|
565
|
+
WHERE sampled_at >= $1
|
|
566
|
+
GROUP BY policy_name, key
|
|
567
|
+
) t
|
|
568
|
+
ORDER BY policy_name, total DESC
|
|
569
|
+
SQL
|
|
570
|
+
"top_denied_reason_by_policy",
|
|
571
|
+
[since]
|
|
572
|
+
)
|
|
573
|
+
result.to_a.each_with_object({}) do |r, h|
|
|
574
|
+
h[r["policy_name"]] = [r["key"], r["total"].to_i]
|
|
575
|
+
end
|
|
576
|
+
end
|
|
577
|
+
|
|
493
578
|
# Returns time-bucketed series for sparklines. `bucket_seconds` is the
|
|
494
579
|
# bucket width. Each row: { bucket_at:, jobs_admitted:, forward_failures:,
|
|
495
580
|
# pending_total:, ticks: }.
|
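The `DISTINCT ON (policy_name) ... ORDER BY policy_name, total DESC` query above keeps the highest-total reason per policy. An in-memory Ruby equivalent, offered only as a sketch to show the intended result shape; `top_denied_in_memory` and the sample hashes are hypothetical:

```ruby
# samples: array of { policy_name:, denied_reasons: { reason => count } }
# Returns { policy_name => [reason, total] }, like the SQL version.
def top_denied_in_memory(samples)
  totals = Hash.new { |h, k| h[k] = Hash.new(0) }

  samples.each do |sample|
    sample[:denied_reasons].each do |reason, count|
      totals[sample[:policy_name]][reason] += count.to_i
    end
  end

  totals.each_with_object({}) do |(policy, reasons), out|
    out[policy] = reasons.max_by { |_reason, total| total }
  end
end

samples = [
  { policy_name: "emails", denied_reasons: { "throttle" => 2, "concurrency_full" => 1 } },
  { policy_name: "emails", denied_reasons: { "concurrency_full" => 4 } },
  { policy_name: "reports", denied_reasons: {} }
]
top_denied_in_memory(samples)
```

A policy whose samples contain no denials gets no entry at all, which is why the controller reads the result with `top&.first`. On ties, `max_by` returns the first maximum it encounters, while the SQL leaves the tie order unspecified because `ORDER BY` stops at `total DESC`.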
|
@@ -595,6 +680,62 @@ module DispatchPolicy
|
|
|
595
680
|
}
|
|
596
681
|
end
|
|
597
682
|
|
|
683
|
+
# Per-policy partition counts in one grouped query, keyed by
|
|
684
|
+
# policy_name → { pending, partitions, paused }. Replaces calling
|
|
685
|
+
# Partition.for_policy(name).sum/.count/.paused.count once per policy on
|
|
686
|
+
# the policies index (3N queries → 1).
|
|
687
|
+
def partition_counts_by_policy
|
|
688
|
+
result = connection.exec_query(
|
|
689
|
+
<<~SQL.squish,
|
|
690
|
+
SELECT
|
|
691
|
+
policy_name,
|
|
692
|
+
COALESCE(SUM(pending_count), 0)::int AS pending,
|
|
693
|
+
COUNT(*)::int AS partitions,
|
|
694
|
+
COUNT(*) FILTER (WHERE status = 'paused')::int AS paused
|
|
695
|
+
FROM #{PARTITIONS_TABLE}
|
|
696
|
+
GROUP BY policy_name
|
|
697
|
+
SQL
|
|
698
|
+
"partition_counts_by_policy",
|
|
699
|
+
[]
|
|
700
|
+
)
|
|
701
|
+
result.to_a.each_with_object({}) do |r, h|
|
|
702
|
+
h[r["policy_name"]] = {
|
|
703
|
+
pending: r["pending"].to_i,
|
|
704
|
+
partitions: r["partitions"].to_i,
|
|
705
|
+
paused: r["paused"].to_i
|
|
706
|
+
}
|
|
707
|
+
end
|
|
708
|
+
end
|
|
709
|
+
|
|
710
|
+
# Per-policy round-trip stats in one grouped query, keyed by
|
|
711
|
+
# policy_name. Only the fields the dashboard overview renders
|
|
712
|
+
# (in_backoff, oldest/p95 age); use partition_round_trip_stats for the
|
|
713
|
+
# full single-policy breakdown. Replaces N per-policy calls on the
|
|
714
|
+
# dashboard. Same percentile-inversion note as partition_round_trip_stats.
|
|
715
|
+
def partition_round_trip_stats_by_policy
|
|
716
|
+
result = connection.exec_query(
|
|
717
|
+
<<~SQL.squish,
|
|
718
|
+
SELECT
|
|
719
|
+
p.policy_name,
|
|
720
|
+
COUNT(*) FILTER (WHERE p.next_eligible_at IS NOT NULL AND p.next_eligible_at > now())::int AS in_backoff,
|
|
721
|
+
EXTRACT(EPOCH FROM (now() - MIN(p.last_checked_at)))::float AS oldest_age_seconds,
|
|
722
|
+
EXTRACT(EPOCH FROM (now() - PERCENTILE_DISC(0.05) WITHIN GROUP (ORDER BY p.last_checked_at)))::float AS p95_age_seconds
|
|
723
|
+
FROM #{PARTITIONS_TABLE} p
|
|
724
|
+
WHERE p.status = 'active' AND p.pending_count > 0
|
|
725
|
+
GROUP BY p.policy_name
|
|
726
|
+
SQL
|
|
727
|
+
"partition_round_trip_stats_by_policy",
|
|
728
|
+
[]
|
|
729
|
+
)
|
|
730
|
+
result.to_a.each_with_object({}) do |r, h|
|
|
731
|
+
h[r["policy_name"]] = {
|
|
732
|
+
in_backoff: r["in_backoff"].to_i,
|
|
733
|
+
oldest_age_seconds: r["oldest_age_seconds"]&.to_f,
|
|
734
|
+
p95_age_seconds: r["p95_age_seconds"]&.to_f
|
|
735
|
+
}
|
|
736
|
+
end
|
|
737
|
+
end
|
|
738
|
+
|
|
598
739
|
# ----- adaptive_concurrency stats -----------------------------------------
|
|
599
740
|
|
|
600
741
|
# Insert a fresh stats row for the given partition if none exists.
|
data/lib/dispatch_policy/tick.rb
CHANGED
|
@@ -201,19 +201,24 @@ module DispatchPolicy
|
|
|
201
201
|
return { admitted: 0, failures: 0, reasons: deduce_reasons(result) }
|
|
202
202
|
end
|
|
203
203
|
|
|
204
|
-
admitted
|
|
204
|
+
admitted = 0
|
|
205
|
+
settled_patch = nil
|
|
205
206
|
half_life = @policy.fairness_half_life_seconds || @config.fairness_half_life_seconds
|
|
206
207
|
|
|
207
208
|
Repository.with_connection do
|
|
208
209
|
ActiveRecord::Base.transaction(requires_new: true) do
|
|
210
|
+
# The gate_state we persist depends on how many rows actually
|
|
211
|
+
# got claimed: each gate settles its state against the real
|
|
212
|
+
# admitted count via Pipeline.settle (the throttle deducts that
|
|
213
|
+
# many tokens, not the optimistic `allowed`). The block runs
|
|
214
|
+
# inside claim_staged_jobs! right after the DELETE.
|
|
209
215
|
rows = Repository.claim_staged_jobs!(
|
|
210
216
|
policy_name: @policy_name,
|
|
211
217
|
partition_key: partition["partition_key"],
|
|
212
218
|
limit: result.admit_count,
|
|
213
|
-
gate_state_patch: result.gate_state_patch,
|
|
214
219
|
retry_after: result.retry_after,
|
|
215
220
|
half_life_seconds: half_life
|
|
216
|
-
)
|
|
221
|
+
) { |admitted_count| settled_patch = Pipeline.settle(result.decisions, admitted_count) }
|
|
217
222
|
|
|
218
223
|
# `claim_staged_jobs!` always runs `record_partition_admit!` so
|
|
219
224
|
# the partition's counters and gate_state commit even when the
|
|
@@ -293,11 +298,13 @@ module DispatchPolicy
|
|
|
293
298
|
# the STALE pre-pass-1 snapshot. For the throttle that means reading
|
|
294
299
|
# the token bucket at its original level and double-spending —
|
|
295
300
|
# admitting above the configured rate and overwriting pass-1's
|
|
296
|
-
# consumption.
|
|
297
|
-
#
|
|
298
|
-
#
|
|
299
|
-
if
|
|
300
|
-
|
|
301
|
+
# consumption. We mirror the SETTLED patch (post-consume, charged for
|
|
302
|
+
# the real admitted count), not evaluate's pre-consume snapshot. The
|
|
303
|
+
# shallow merge matches Postgres jsonb `||`. Only runs on a committed
|
|
304
|
+
# admit: if the TX raised we fall through to the rescue below and
|
|
305
|
+
# never touch the in-memory state.
|
|
306
|
+
if settled_patch&.any?
|
|
307
|
+
partition["gate_state"] = (partition["gate_state"] || {}).merge(settled_patch)
|
|
301
308
|
end
|
|
302
309
|
|
|
303
310
|
if admitted.zero?
|
data/lib/dispatch_policy/tick_loop.rb
CHANGED
|
@@ -68,7 +68,10 @@ module DispatchPolicy
|
|
|
68
68
|
|
|
69
69
|
def sweep!
|
|
70
70
|
cfg = DispatchPolicy.config
|
|
71
|
-
Repository.sweep_stale_inflight!(
|
|
71
|
+
Repository.sweep_stale_inflight!(
|
|
72
|
+
cutoff_seconds: cfg.inflight_stale_after,
|
|
73
|
+
queued_cutoff_seconds: cfg.inflight_queued_stale_after
|
|
74
|
+
)
|
|
72
75
|
Repository.sweep_inactive_partitions!(cutoff_seconds: cfg.partition_inactive_after)
|
|
73
76
|
Repository.sweep_old_tick_samples!(cutoff_seconds: cfg.metrics_retention)
|
|
74
77
|
rescue StandardError => e
|
data/lib/generators/dispatch_policy/install/templates/initializer.rb.tt
CHANGED
|
@@ -7,5 +7,6 @@ DispatchPolicy.configure do |c|
|
|
|
7
7
|
c.idle_pause = 0.5 # seconds slept when no admissions happened
|
|
8
8
|
c.partition_inactive_after = 24 * 60 * 60 # GC partitions idle this long
|
|
9
9
|
c.inflight_stale_after = 5 * 60 # GC inflight rows whose worker stopped heartbeating
|
|
10
|
+
c.inflight_queued_stale_after = 60 * 60 # GC inflight rows admitted but never started (still queued)
|
|
10
11
|
c.sweep_every_ticks = 50 # how often to run the sweepers
|
|
11
12
|
end
|