dispatch_policy 0.4.2 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 0f7f08701682539b873609e4f45432661e9d9e070e5b27593695a90b6266595e
-   data.tar.gz: c32b4f91f257b1e69e0417edc44fb3779c3f0567f4dd152d893fa781ca7d3142
+   metadata.gz: 23433a64c963b0e0908c185ad8dc8e6f97edbd8d476ee712d15023f74ba0e338
+   data.tar.gz: 64ff19e04a6d02b0f1eedb4fb6d74b0e073e3773efb9a3afc92ae1a3e9002aeb
  SHA512:
-   metadata.gz: 982defdd7fda9aae96d31b83666bd291bf4c36dde33e673f43322ee35cbb830c547a9ada0bbcbe142b47a00f10112d6aae24f1631a06f94e97e2e9ef71365239
-   data.tar.gz: e928eb7605905fb7de6867d4933332789ebd36e932be2f6784ed672fdcd485bf3228b659a5947b5965e7e47ee3e2c9089dd6cdd5b6069764008f682eacbacaa2
+   metadata.gz: e168e049dbb0d399dddc6e84427b7b557474d9ce10cb1983d3f4f24c6fde43ffda9c03c179b4590d9f09d225a8adeb5d7295a10788898ebc4ab0bc47a765163c
+   data.tar.gz: d8ef9debaebdf89de7cce28e5fa669484acafd27e4b5e65ff959f6f177c1aaa124cb1215da191c3a6759094aac467980489241062f907574860b7226dd9dbc9a
data/CHANGELOG.md CHANGED
@@ -1,5 +1,58 @@
  # Changelog
 
+ ## 0.4.3
+
+ ### Fixed
+ - The `throttle` gate now charges its token bucket for the number of jobs
+   **actually admitted**, not for the optimistic `allowed` it computes at
+   evaluate time. The deduction moved from `#evaluate` to the `#consume`
+   hook (run after the staging DELETE, via `Pipeline.settle`), so the
+   bucket is no longer over-charged, and the effective rate no longer
+   drifts below the configured one, when fewer jobs are admitted than
+   allowed: future-scheduled rows skipped by the `scheduled_at <= now()`
+   filter, a downstream `concurrency` gate capping `admit_count`, or rows a
+   concurrent tick claimed under `SKIP LOCKED`.
+ - Inflight rows for jobs that were admitted but have **not started
+   performing yet** (still waiting in the adapter's queue) are no longer
+   reaped at `inflight_stale_after`. Their heartbeat thread only starts in
+   `around_perform`, so under a deep adapter backlog the sweeper used to
+   delete still-valid admissions, making the concurrency gate under-count
+   and over-admit. `sweep_stale_inflight!` is now two-tier: rows
+   heartbeated past admission reap at `inflight_stale_after`; never-started
+   rows reap only past the new, generous `config.inflight_queued_stale_after`
+   (1 hour default).
+ - `InflightTracker` now applies the same `job.queue_name || policy.queue_name`
+   fallback at perform time that the staging path uses, so a policy whose
+   `partition_by`/`shard_by` reads `queue_name` derives the same
+   `partition_key` at admission and at perform (otherwise the inflight row
+   and adaptive observations landed under the wrong scope).
+ - `CursorPagination` rejects cursors whose value isn't a scalar or whose
+   id isn't an integer (the cursor is an attacker-controllable query
+   param), and ignores a value whose type can't compare against the sort
+   column instead of raising a `PG` error (for example, a forged numeric
+   value on a timestamp sort). In both cases it falls back to the first page.
+ - `PolicyDSL#tick_admission_budget(nil)` / `#admission_batch_size(nil)` are
+   no-ops that defer to config instead of raising in `Integer(nil)`,
+   matching how `fairness(half_life:)` already guards nil.
+
+ ### Changed
+ - The admin UI's dashboard and policies index collapse their per-policy
+   `N+1` query loops into grouped `Repository` methods
+   (`tick_summaries_by_policy`, `top_denied_reason_by_policy`,
+   `partition_round_trip_stats_by_policy`, `partition_counts_by_policy`),
+   one query each instead of several per policy.
+
+ ### Added
+ - `config.inflight_queued_stale_after` (default 1 hour): the sweep cutoff
+   for inflight rows admitted but never started. Raise it if your adapter
+   backlog can exceed an hour.
+
+ ### Removed
+ - The broken, unused `Partition.stale_inactive` scope. It filtered on an
+   `in_flight_count` column dropped back in 0.3.0, so any call raised
+   `PG::UndefinedColumn`. The real partition GC is
+   `Repository.sweep_inactive_partitions!`.
+
  ## 0.4.2
 
  ### Fixed
data/README.md CHANGED
@@ -534,6 +534,7 @@ DispatchPolicy.configure do |c|
    c.idle_pause = 0.5 # seconds slept when a tick admits nothing
    c.partition_inactive_after = 86_400 # GC partitions idle this long
    c.inflight_stale_after = 300 # GC inflight rows whose worker stopped heartbeating
+   c.inflight_queued_stale_after = 3_600 # GC inflight rows admitted but never started (queued)
    c.inflight_heartbeat_interval = 30 # how often the worker bumps heartbeat_at
    c.sweep_every_ticks = 50 # sweeper cadence (in tick iterations)
    c.metrics_retention = 86_400 # tick_samples kept this long
@@ -61,29 +61,37 @@ module DispatchPolicy
        one_min_ago = now - 60
        five_min_ago = now - 300
 
+       # Aggregate everything the per-policy rows need in 4 grouped queries
+       # instead of ~4 per policy. With dozens of policies this was the bulk
+       # of the dashboard's query count.
+       m1_by = Repository.tick_summaries_by_policy(since: one_min_ago)
+       m5_by = Repository.tick_summaries_by_policy(since: five_min_ago)
+       denied_by = Repository.top_denied_reason_by_policy(since: one_min_ago)
+       rt_by = Repository.partition_round_trip_stats_by_policy
+
        names = (pending_by_policy.keys + in_flight_by_policy.keys).uniq.sort
        @policies = names.map do |name|
-         info = pending_by_policy[name] || {}
-         m1 = Repository.tick_summary(policy_name: name, since: one_min_ago)
-         m5 = Repository.tick_summary(policy_name: name, since: five_min_ago)
-         rs = Repository.denied_reasons_summary(policy_name: name, since: one_min_ago)
-         rt = Repository.partition_round_trip_stats(policy_name: name)
+         info = pending_by_policy[name] || {}
+         m1 = m1_by[name] || {}
+         m5 = m5_by[name] || {}
+         rt = rt_by[name] || {}
+         top = denied_by[name] # [reason, count] or nil
 
          {
            name: name,
            pending: info[:pending] || 0,
            in_flight: in_flight_by_policy[name] || 0,
            last_admit_at: info[:last_admit_at],
-           admitted_1m: m1[:jobs_admitted],
-           admitted_5m: m5[:jobs_admitted],
-           ticks_1m: m1[:ticks],
-           avg_tick_ms_1m: m1[:avg_duration_ms],
-           forward_failures_1m: m1[:forward_failures],
+           admitted_1m: m1[:jobs_admitted] || 0,
+           admitted_5m: m5[:jobs_admitted] || 0,
+           ticks_1m: m1[:ticks] || 0,
+           avg_tick_ms_1m: m1[:avg_duration_ms] || 0,
+           forward_failures_1m: m1[:forward_failures] || 0,
            oldest_age_seconds: rt[:oldest_age_seconds],
            p95_age_seconds: rt[:p95_age_seconds],
-           in_backoff: rt[:in_backoff],
-           top_denial_reason: rs.first&.first,
-           top_denial_count: rs.first&.last
+           in_backoff: rt[:in_backoff] || 0,
+           top_denial_reason: top&.first,
+           top_denial_count: top&.last
          }
        end
      end
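The hunk above swaps per-policy queries for lookups into pre-grouped hashes. A minimal in-memory sketch of that shape follows; the sample rows and the helper names `summaries_by_policy` and `overview_rows` are hypothetical stand-ins, not part of the gem. It also shows why the new code adds `|| 0` defaults: a policy with no samples in the window is simply absent from the grouped result.

```ruby
# Hypothetical in-memory stand-in for one grouped query: aggregate all
# samples once, keyed by policy name.
def summaries_by_policy(samples)
  samples.group_by { |s| s[:policy_name] }.transform_values do |rows|
    { jobs_admitted: rows.sum { |r| r[:jobs_admitted] }, ticks: rows.size }
  end
end

# Build one overview row per policy from the grouped hash. Policies that
# have no samples fall back to an empty hash and zero counters.
def overview_rows(names, summaries)
  names.map do |name|
    m1 = summaries[name] || {}
    { name: name, admitted_1m: m1[:jobs_admitted] || 0, ticks_1m: m1[:ticks] || 0 }
  end
end
```

With this shape the number of aggregate passes stays constant as the list of policy names grows.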
@@ -12,16 +12,19 @@ module DispatchPolicy
        names = (registry_names + db_names).uniq.sort
 
        in_flight_by_policy = InflightJob.where(policy_name: names).group(:policy_name).count
+       # One grouped query for pending / partition count / paused count
+       # across every policy instead of three per policy.
+       counts_by_policy = Repository.partition_counts_by_policy
 
        @rows = names.map do |name|
-         partitions = Partition.for_policy(name)
+         counts = counts_by_policy[name] || {}
          {
            name: name,
            registered: registry_names.include?(name),
-           pending: partitions.sum(:pending_count),
+           pending: counts[:pending] || 0,
            in_flight: in_flight_by_policy[name] || 0,
-           partitions: partitions.count,
-           paused_count: partitions.paused.count
+           partitions: counts[:partitions] || 0,
+           paused_count: counts[:paused] || 0
          }
        end
      end
@@ -9,10 +9,6 @@ module DispatchPolicy
      scope :active, -> { where(status: "active") }
      scope :paused, -> { where(status: "paused") }
      scope :pending, -> { where("pending_count > 0") }
-     scope :stale_inactive, ->(cutoff) {
-       where("pending_count = 0 AND in_flight_count = 0")
-         .where("last_admit_at < ? OR (last_admit_at IS NULL AND created_at < ?)", cutoff, cutoff)
-     }
 
      def paused?
        status == "paused"
@@ -10,6 +10,7 @@ module DispatchPolicy
        :busy_pause,
        :partition_inactive_after,
        :inflight_stale_after,
+       :inflight_queued_stale_after,
        :inflight_heartbeat_interval,
        :real_adapter,
        :logger,
@@ -40,6 +41,16 @@ module DispatchPolicy
        @busy_pause = 0.0
        @partition_inactive_after = 24 * 60 * 60
        @inflight_stale_after = 5 * 60
+       # Cutoff for inflight rows that were admitted (pre-inserted by the
+       # Tick) but never started performing, so the heartbeat thread, which
+       # only starts in around_perform, never advanced their heartbeat_at.
+       # These sit in the adapter's queue waiting for a worker; reaping them
+       # at `inflight_stale_after` (5 min) would make the concurrency gate
+       # under-count and over-admit whenever queue latency exceeds that. We
+       # give never-started rows a far more generous cutoff (1h) before
+       # assuming the admission was lost. Raise it if your adapter backlog
+       # can exceed an hour.
+       @inflight_queued_stale_after = 60 * 60
        @inflight_heartbeat_interval = 30
        @real_adapter = nil
        @logger = nil
@@ -66,6 +66,13 @@ module DispatchPolicy
        decoded = JSON.parse(Base64.urlsafe_decode64(cursor))
        return nil unless decoded.is_a?(Array) && decoded.size == 2
 
+       # The cursor is attacker-controllable (a query param). Reject anything
+       # that isn't a (scalar value, integer id) tuple so a hostile payload
+       # like [[1,2], {}] can't reach the WHERE clause and raise a 500 (or
+       # worse). Per-column type compatibility is enforced in #apply.
+       value, id = decoded
+       return nil unless (value.is_a?(String) || value.is_a?(Numeric)) && id.is_a?(Integer)
+
        decoded
      rescue StandardError
        nil
@@ -78,6 +85,15 @@ module DispatchPolicy
        return scope if cursor.nil?
 
        value, last_id = cursor
+       # Ignore a cursor whose value type can't be compared against this
+       # sort's column. The numeric columns (pending_count, total_admitted)
+       # need a Numeric; everything else compares as text (partition_key, or
+       # the ISO8601 timestamps emitted by #extract). A mismatch (e.g. a
+       # numeric value forged for a timestamp sort) would raise a PG error;
+       # instead we fall back to the first page.
+       numeric_column = %w[pending_count total_admitted].include?(sort[:cursor_sql])
+       return scope unless numeric_column ? value.is_a?(Numeric) : value.is_a?(String)
+
        case sort[:direction]
        when :desc
          scope.where(
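The two cursor hunks above validate in two stages: shape first, then type compatibility with the sort column. The sketch below restates both as standalone functions so they can be exercised without a database. `decode_cursor` follows the hunk closely; `cursor_usable?`, `encode_cursor`, and the constant name are hypothetical helpers added for illustration.

```ruby
require "base64"
require "json"

# Numeric sort columns named in the diff; every other sort compares as text.
NUMERIC_CURSOR_COLUMNS = %w[pending_count total_admitted].freeze

# Returns [value, id] for a well-formed cursor, nil for anything else
# (bad Base64, bad JSON, wrong arity, non-scalar value, non-integer id).
def decode_cursor(cursor)
  decoded = JSON.parse(Base64.urlsafe_decode64(cursor))
  return nil unless decoded.is_a?(Array) && decoded.size == 2

  value, id = decoded
  return nil unless (value.is_a?(String) || value.is_a?(Numeric)) && id.is_a?(Integer)

  decoded
rescue StandardError
  nil
end

# True when the cursor value can be compared against the sort column.
# A false result means "ignore the cursor and serve the first page".
def cursor_usable?(cursor, cursor_sql)
  return false if cursor.nil?

  value, = cursor
  NUMERIC_CURSOR_COLUMNS.include?(cursor_sql) ? value.is_a?(Numeric) : value.is_a?(String)
end

# Hypothetical helper used only to build test cursors.
def encode_cursor(value, id)
  Base64.urlsafe_encode64(JSON.generate([value, id]))
end
```

A cursor that decodes cleanly can still be unusable for a given sort, which is why the type check lives next to the column choice rather than in the decoder.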
@@ -27,6 +27,13 @@ module DispatchPolicy
        cap = capacity_for(ctx)
        return Decision.deny(retry_after: @full_backoff, reason: "max=0") if cap <= 0
 
+       # This COUNT(*) runs in `evaluate`, BEFORE the admission TX opens, so
+       # the cap holds only when a single tick loop owns a given
+       # (policy, shard): within one tick, pass-2 re-reads the count after
+       # pass-1's inflight pre-insert has committed. Running two tick loops
+       # over the SAME shard would let both read the same pre-admission
+       # count and over-admit, so shard the policy instead of duplicating
+       # loops on one shard (see shard_by in the README).
        in_flight = Repository.count_inflight(
          policy_name: partition["policy_name"],
          partition_key: inflight_partition_key(partition["policy_name"], ctx)
@@ -39,11 +39,19 @@ module DispatchPolicy
        elapsed = [now - refilled_at, 0.0].max
        tokens = [tokens + (elapsed * refill_rate), capacity.to_f].min
 
-       whole = tokens.floor
+       # The patch records the post-refill bucket WITHOUT deducting yet.
+       # The actual deduction is deferred to #consume, which runs once
+       # the admission TX knows how many staged rows were really claimed.
+       # Deducting `allowed` here over-charges the bucket whenever fewer
+       # jobs are admitted than allowed: a later gate capping admit_count,
+       # future-scheduled rows skipped by the `scheduled_at <= now()`
+       # filter, or rows another tick grabbed under SKIP LOCKED.
+       patch = { "tokens" => tokens, "refilled_at" => now }
+
+       whole = tokens.floor
        if whole.zero?
          missing = 1.0 - tokens
          retry_after = missing / refill_rate
-         patch = { "tokens" => tokens, "refilled_at" => now }
          return Decision.new(allowed: 0,
                              retry_after: retry_after,
                              gate_state_patch: { "throttle" => patch },
@@ -51,10 +59,22 @@ module DispatchPolicy
        end
 
        allowed = [whole, admit_budget].min
-       patch = { "tokens" => tokens - allowed, "refilled_at" => now }
        Decision.new(allowed: allowed, gate_state_patch: { "throttle" => patch })
      end
 
+     # Settles the bucket against the number of jobs actually admitted.
+     # `evaluate` recorded the post-refill token count in the decision's
+     # patch; here we subtract exactly `admitted_count` (≤ allowed), so
+     # the bucket is charged for jobs that really left, never for unspent
+     # budget. Called by Pipeline.settle after the claim.
+     def consume(decision, admitted_count)
+       st = decision.gate_state_patch && decision.gate_state_patch["throttle"]
+       return nil unless st
+
+       { "throttle" => { "tokens" => st["tokens"].to_f - admitted_count,
+                         "refilled_at" => st["refilled_at"] } }
+     end
+
      private
 
      def capacity_for(ctx)
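The two throttle hunks above split the bucket arithmetic into a refill step that deducts nothing and a settle step that charges only for admitted jobs. A self-contained sketch of that split, under the assumption that state is a plain hash with `"tokens"` and `"refilled_at"` keys as in the diff; the class name `SketchBucket` and its keyword arguments are hypothetical.

```ruby
# Hypothetical stand-in for the throttle gate's bucket arithmetic.
class SketchBucket
  def initialize(capacity:, refill_rate:)
    @capacity = capacity.to_f
    @refill_rate = refill_rate.to_f
  end

  # Refill, then report how many jobs may go. The returned patch holds the
  # post-refill token count with nothing deducted yet.
  def evaluate(state, now:, admit_budget:)
    elapsed = [now - state["refilled_at"], 0.0].max
    tokens  = [state["tokens"] + (elapsed * @refill_rate), @capacity].min
    allowed = [tokens.floor, admit_budget].min
    [allowed, { "tokens" => tokens, "refilled_at" => now }]
  end

  # Charge exactly admitted_count against the evaluate-time patch.
  def consume(patch, admitted_count)
    { "tokens" => patch["tokens"].to_f - admitted_count,
      "refilled_at" => patch["refilled_at"] }
  end
end
```

For example, if `evaluate` allows 5 jobs but only 2 are claimed, `consume` leaves 3 tokens in the bucket, where deducting at evaluate time would have left 0.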
@@ -28,7 +28,15 @@ module DispatchPolicy
        policy = DispatchPolicy.registry.fetch(policy_name)
        return yield unless policy
 
-       ctx = policy.build_context(job.arguments, queue_name: job.queue_name&.to_s)
+       # Mirror the stage-time fallback in JobExtension.around_enqueue_for:
+       # when the job carries no explicit queue, use the policy's default.
+       # Without this, a policy whose partition_by/shard_by reads queue_name
+       # would compute a DIFFERENT partition_key here than at admission, so
+       # the around_perform inflight row (and adaptive observations) would
+       # land under the wrong scope and the concurrency gate's COUNT(*) would
+       # miss them.
+       queue_name = job.queue_name&.to_s || policy.queue_name
+       ctx = policy.build_context(job.arguments, queue_name: queue_name)
        partition_key = policy.partition_key_for(ctx)
 
        Repository.insert_inflight!([{
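The fallback expression in the hunk above is small enough to check in isolation. The structs below are hypothetical stand-ins for a job and a policy; only the expression itself is taken from the diff.

```ruby
# Hypothetical minimal job and policy objects, each exposing queue_name.
SketchJob = Struct.new(:queue_name)
SketchPolicyDefaults = Struct.new(:queue_name)

# Same expression as the hunk: prefer the job's queue, else the policy's.
def effective_queue_name(job, policy)
  job.queue_name&.to_s || policy.queue_name
end
```

Note that the fallback triggers only on `nil`: an empty string is truthy in Ruby, so a job whose `queue_name` is `""` would not fall back to the policy default.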
@@ -5,12 +5,30 @@ module DispatchPolicy
    # partition. Returns a value object describing how many jobs may be
    # admitted right now and which gate-state patches to persist.
    class Pipeline
-     Result = Struct.new(:admit_count, :retry_after, :gate_state_patch, :reasons, keyword_init: true)
+     Result = Struct.new(:admit_count, :retry_after, :gate_state_patch, :reasons, :decisions, keyword_init: true)
 
      def initialize(policy)
        @policy = policy
      end
 
+     # Computes the gate_state patch to persist once the REAL admitted count
+     # is known (after the staging DELETE). Each gate's #consume settles its
+     # state against the actual number of jobs claimed: the throttle
+     # deducts that many tokens rather than the optimistic `allowed` it
+     # returned at evaluate time. Gates that keep no gate_state (concurrency
+     # and adaptive_concurrency, whose state lives in their own tables)
+     # return nil from #consume and contribute nothing here.
+     #
+     # `decisions` is the [gate, decision] list carried on the Result.
+     def self.settle(decisions, admitted_count)
+       patch = {}
+       decisions.each do |gate, decision|
+         sub = gate.consume(decision, admitted_count)
+         patch.merge!(sub) if sub
+       end
+       patch
+     end
+
      def call(ctx, partition, max_budget)
        budget = max_budget
        retry_after = nil
@@ -41,7 +59,8 @@ module DispatchPolicy
          admit_count: admit_count,
          retry_after: retry_after,
          gate_state_patch: patch,
-         reasons: reasons
+         reasons: reasons,
+         decisions: decisions
        )
      end
    end
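`Pipeline.settle`, added in the hunks above, folds each gate's `#consume` result into one patch and skips gates that return `nil`. A runnable sketch with two hypothetical gates; the gate classes and the hash-shaped decision are illustrative, and only the fold mirrors the diff.

```ruby
# Hypothetical gate that keeps state in gate_state.
class StatefulGate
  def consume(decision, admitted_count)
    { "throttle" => { "tokens" => decision[:tokens] - admitted_count } }
  end
end

# Hypothetical gate whose state lives elsewhere, so it contributes nothing.
class StatelessGate
  def consume(_decision, _admitted_count)
    nil
  end
end

# Same fold as Pipeline.settle: merge every non-nil sub-patch.
def settle(decisions, admitted_count)
  decisions.each_with_object({}) do |(gate, decision), patch|
    sub = gate.consume(decision, admitted_count)
    patch.merge!(sub) if sub
  end
end
```

If every gate returns `nil`, the result is an empty hash, which the caller can persist as a no-op patch.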
@@ -45,7 +45,7 @@ module DispatchPolicy
      end
 
      def admission_batch_size(size)
-       @admission_batch_size = Integer(size)
+       @admission_batch_size = Integer(size) if size
      end
 
      # Per-policy override for the EWMA half-life used to weigh recent
@@ -62,7 +62,7 @@ module DispatchPolicy
      # nil, no global cap is enforced and per-partition admission_batch_size
      # is the only ceiling.
      def tick_admission_budget(value)
-       @tick_admission_budget = Integer(value)
+       @tick_admission_budget = Integer(value) if value
      end
 
      # Defines the partition scope. Required: every policy declares
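The nil guard in the two DSL hunks above exists because `Integer(nil)` raises `TypeError`. A tiny sketch of the guarded setter; the class `SketchPolicy` and its reader `batch` are hypothetical.

```ruby
# Hypothetical miniature of the DSL setter with the nil guard.
class SketchPolicy
  attr_reader :batch

  # nil is a no-op; any other value must convert cleanly via Integer().
  def admission_batch_size(size)
    @batch = Integer(size) if size
  end
end
```

One consequence of this form is that passing `nil` leaves a previously set value untouched rather than clearing it.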
@@ -192,8 +192,8 @@ module DispatchPolicy
      # through `bulk_record_partition_denies!` instead, which collapses
      # many partitions into a single UPDATE…FROM(VALUES…) at the end of
      # the tick.
-     def claim_staged_jobs!(policy_name:, partition_key:, limit:, gate_state_patch:, retry_after:,
-                            half_life_seconds: nil)
+     def claim_staged_jobs!(policy_name:, partition_key:, limit:, retry_after:,
+                            gate_state_patch: nil, half_life_seconds: nil)
        raise ArgumentError, "claim_staged_jobs! requires limit > 0" unless limit.positive?
 
        sql_select = <<~SQL.squish
@@ -212,11 +212,18 @@ module DispatchPolicy
        SQL
        rows = connection.exec_query(sql_select, "claim_staged_jobs", [policy_name, partition_key, limit]).to_a
 
+       # The gate_state patch may depend on how many rows we actually
+       # claimed (e.g. the throttle charges its bucket for jobs admitted,
+       # not for the optimistic `allowed`). When the caller passes a block
+       # it receives that real count and returns the patch to persist;
+       # gate-less callers pass a fixed `gate_state_patch:` instead.
+       patch = block_given? ? yield(rows.size) : (gate_state_patch || {})
+
        record_partition_admit!(
          policy_name: policy_name,
          partition_key: partition_key,
          admitted: rows.size,
-         gate_state_patch: gate_state_patch,
+         gate_state_patch: patch,
          retry_after: retry_after,
          half_life_seconds: half_life_seconds
        )
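The `block_given?` line in the hunk above lets one method serve both callers that compute the patch from the claimed row count and callers that pass a fixed patch. A stripped-down sketch; the method `claim` and its return hash are hypothetical, and only the patch-selection line is taken from the diff.

```ruby
# Hypothetical stand-in for a claim that reports how many rows it took.
def claim(rows, gate_state_patch: nil)
  # Block form: the caller derives the patch from the real count.
  # Keyword form: a fixed patch, defaulting to an empty hash.
  patch = block_given? ? yield(rows.size) : (gate_state_patch || {})
  { admitted: rows.size, gate_state_patch: patch }
end
```

When both a block and `gate_state_patch:` are supplied, the block wins and the keyword is ignored.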
@@ -396,14 +403,37 @@ module DispatchPolicy
        Integer(result.rows.first.first)
      end
 
-     def sweep_stale_inflight!(cutoff_seconds:)
+     # Reap inflight rows whose owner is gone. Two tiers, distinguished by
+     # whether the row was ever heartbeated past its admission:
+     #
+     # heartbeat_at > admitted_at → the worker started performing and the
+     #   heartbeat thread advanced heartbeat_at at least once. If it then
+     #   went silent for `cutoff_seconds`, the worker died mid-run: reap.
+     #
+     # heartbeat_at <= admitted_at → never heartbeated past admission. The
+     #   row was pre-inserted by the Tick and the job is still waiting in
+     #   the adapter's queue (or only just started, since the first heartbeat
+     #   fires after inflight_heartbeat_interval). Reaping these at the
+     #   short cutoff would under-count the concurrency gate and over-admit
+     #   whenever queue latency exceeds it. Only reap once they're older
+     #   than the far more generous `queued_cutoff_seconds`, by which point
+     #   the admission is presumed lost.
+     #
+     # The Tick pre-insert writes admitted_at and heartbeat_at from the same
+     # now() (a single statement), so a never-started row has them exactly
+     # equal; one heartbeat makes heartbeat_at strictly greater.
+     def sweep_stale_inflight!(cutoff_seconds:, queued_cutoff_seconds: nil)
+       queued_cutoff_seconds ||= cutoff_seconds
        connection.exec_query(
          <<~SQL.squish,
            DELETE FROM #{INFLIGHT_TABLE}
-           WHERE heartbeat_at < now() - ($1 || ' seconds')::interval
+           WHERE (heartbeat_at > admitted_at
+                  AND heartbeat_at < now() - ($1 || ' seconds')::interval)
+              OR (heartbeat_at <= admitted_at
+                  AND admitted_at < now() - ($2 || ' seconds')::interval)
          SQL
          "sweep_stale_inflight",
-         [cutoff_seconds.to_i]
+         [cutoff_seconds.to_i, queued_cutoff_seconds.to_i]
        )
      end
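The two-tier WHERE clause above can be restated as a plain predicate, which makes the boundary cases easy to check. This is a sketch only: times are numeric seconds rather than timestamps, the row is a hash with hypothetical symbol keys, and the function name `reap?` is not part of the gem.

```ruby
# Mirrors the WHERE clause of sweep_stale_inflight! as a Ruby predicate.
def reap?(row, now:, cutoff_seconds:, queued_cutoff_seconds: nil)
  # Same default as the method: without a queued cutoff, both tiers
  # use the short one.
  queued_cutoff_seconds ||= cutoff_seconds

  if row[:heartbeat_at] > row[:admitted_at]
    # Started performing, then went silent for longer than the short cutoff.
    row[:heartbeat_at] < now - cutoff_seconds
  else
    # Never heartbeated past admission: judged by admission age against
    # the more generous cutoff.
    row[:admitted_at] < now - queued_cutoff_seconds
  end
end
```

With a 300 second short cutoff and a 3600 second queued cutoff, a row admitted 1000 seconds ago that never started is kept, while the same row would be reaped if the queued cutoff were omitted.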
 
@@ -471,6 +501,37 @@ module DispatchPolicy
        }
      end
 
+     # One grouped query returning per-policy tick aggregates, keyed by
+     # policy_name. Replaces calling tick_summary once per policy on the
+     # dashboard (N queries → 1). Only the fields the overview renders.
+     #   { "policy_a" => { jobs_admitted:, forward_failures:, ticks:,
+     #                     avg_duration_ms: }, ... }
+     def tick_summaries_by_policy(since:)
+       result = connection.exec_query(
+         <<~SQL.squish,
+           SELECT
+             policy_name,
+             COALESCE(SUM(jobs_admitted), 0)::int AS jobs_admitted,
+             COALESCE(SUM(forward_failures), 0)::int AS forward_failures,
+             COUNT(*)::int AS ticks,
+             COALESCE(AVG(duration_ms), 0)::int AS avg_duration_ms
+           FROM #{SAMPLES_TABLE}
+           WHERE sampled_at >= $1
+           GROUP BY policy_name
+         SQL
+         "tick_summaries_by_policy",
+         [since]
+       )
+       result.to_a.each_with_object({}) do |r, h|
+         h[r["policy_name"]] = {
+           jobs_admitted: r["jobs_admitted"].to_i,
+           forward_failures: r["forward_failures"].to_i,
+           ticks: r["ticks"].to_i,
+           avg_duration_ms: r["avg_duration_ms"].to_i
+         }
+       end
+     end
+
      # Aggregate denied_reasons jsonb across samples in window: returns
      # { "throttle" => 12, "concurrency_full" => 3, ... }
      def denied_reasons_summary(policy_name: nil, since:)
@@ -490,6 +551,30 @@ module DispatchPolicy
        result.to_a.each_with_object({}) { |r, h| h[r["key"]] = r["total"].to_i }
      end
 
+     # The single most-denied reason per policy in one query, keyed by
+     # policy_name → [reason, count]. Replaces calling denied_reasons_summary
+     # per policy on the dashboard just to read its top entry.
+     def top_denied_reason_by_policy(since:)
+       result = connection.exec_query(
+         <<~SQL.squish,
+           SELECT DISTINCT ON (policy_name) policy_name, key, total
+           FROM (
+             SELECT policy_name, key, SUM(value::int)::int AS total
+             FROM #{SAMPLES_TABLE},
+                  LATERAL jsonb_each_text(denied_reasons)
+             WHERE sampled_at >= $1
+             GROUP BY policy_name, key
+           ) t
+           ORDER BY policy_name, total DESC
+         SQL
+         "top_denied_reason_by_policy",
+         [since]
+       )
+       result.to_a.each_with_object({}) do |r, h|
+         h[r["policy_name"]] = [r["key"], r["total"].to_i]
+       end
+     end
+
      # Returns time-bucketed series for sparklines. `bucket_seconds` is the
      # bucket width. Each row: { bucket_at:, jobs_admitted:, forward_failures:,
      # pending_total:, ticks: }.
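The `DISTINCT ON (policy_name) ... ORDER BY policy_name, total DESC` query above keeps one row per policy, the one with the largest summed count. The same selection over in-memory rows is sketched below; the sample shape (a `:denied_reasons` hash with string counts, as `jsonb_each_text` would yield) and the function name are assumptions for illustration.

```ruby
# In-memory equivalent of "sum per (policy, reason), keep the top reason".
def sketch_top_denied_reason(samples)
  totals = Hash.new { |h, k| h[k] = Hash.new(0) }
  samples.each do |s|
    s[:denied_reasons].each do |reason, count|
      totals[s[:policy_name]][reason] += count.to_i
    end
  end
  # max_by over a hash yields [reason, total] pairs and returns the winner.
  totals.transform_values { |by_reason| by_reason.max_by { |_reason, total| total } }
end
```

A policy whose samples carry no denied reasons does not appear in the result, matching the lateral join, which produces no rows for an empty object. On ties the SQL gives no ordering guarantee beyond `total DESC`, while `max_by` returns the first maximum it meets, so the two may pick different reasons when counts are equal.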
@@ -595,6 +680,62 @@ module DispatchPolicy
        }
      end
 
+     # Per-policy partition counts in one grouped query, keyed by
+     # policy_name → { pending, partitions, paused }. Replaces calling
+     # Partition.for_policy(name).sum/.count/.paused.count once per policy on
+     # the policies index (3N queries → 1).
+     def partition_counts_by_policy
+       result = connection.exec_query(
+         <<~SQL.squish,
+           SELECT
+             policy_name,
+             COALESCE(SUM(pending_count), 0)::int AS pending,
+             COUNT(*)::int AS partitions,
+             COUNT(*) FILTER (WHERE status = 'paused')::int AS paused
+           FROM #{PARTITIONS_TABLE}
+           GROUP BY policy_name
+         SQL
+         "partition_counts_by_policy",
+         []
+       )
+       result.to_a.each_with_object({}) do |r, h|
+         h[r["policy_name"]] = {
+           pending: r["pending"].to_i,
+           partitions: r["partitions"].to_i,
+           paused: r["paused"].to_i
+         }
+       end
+     end
+
+     # Per-policy round-trip stats in one grouped query, keyed by
+     # policy_name. Only the fields the dashboard overview renders
+     # (in_backoff, oldest/p95 age); use partition_round_trip_stats for the
+     # full single-policy breakdown. Replaces N per-policy calls on the
+     # dashboard. Same percentile-inversion note as partition_round_trip_stats.
+     def partition_round_trip_stats_by_policy
+       result = connection.exec_query(
+         <<~SQL.squish,
+           SELECT
+             p.policy_name,
+             COUNT(*) FILTER (WHERE p.next_eligible_at IS NOT NULL AND p.next_eligible_at > now())::int AS in_backoff,
+             EXTRACT(EPOCH FROM (now() - MIN(p.last_checked_at)))::float AS oldest_age_seconds,
+             EXTRACT(EPOCH FROM (now() - PERCENTILE_DISC(0.05) WITHIN GROUP (ORDER BY p.last_checked_at)))::float AS p95_age_seconds
+           FROM #{PARTITIONS_TABLE} p
+           WHERE p.status = 'active' AND p.pending_count > 0
+           GROUP BY p.policy_name
+         SQL
+         "partition_round_trip_stats_by_policy",
+         []
+       )
+       result.to_a.each_with_object({}) do |r, h|
+         h[r["policy_name"]] = {
+           in_backoff: r["in_backoff"].to_i,
+           oldest_age_seconds: r["oldest_age_seconds"]&.to_f,
+           p95_age_seconds: r["p95_age_seconds"]&.to_f
+         }
+       end
+     end
+
      # ----- adaptive_concurrency stats -----------------------------------------
 
      # Insert a fresh stats row for the given partition if none exists.
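The query above computes `p95_age_seconds` from `PERCENTILE_DISC(0.05)` over `last_checked_at`. The note it refers to is not part of this diff, so the following is only a reading of it: age decreases as the timestamp increases, so a low percentile of timestamps corresponds to a high percentile of ages. The sketch below implements the discrete percentile rule (first value whose cumulative position reaches the fraction) and compares both routes; all function names are hypothetical.

```ruby
# Discrete percentile: the first value, in ascending order, whose
# cumulative position (rank / count) equals or exceeds the fraction.
def percentile_disc(values, fraction)
  sorted = values.sort
  index = [(fraction * sorted.size).ceil - 1, 0].max
  sorted[index]
end

# Route used by the query: 5th percentile of timestamps, turned into an age.
def p95_age_via_timestamps(last_checked_at, now)
  now - percentile_disc(last_checked_at, 0.05)
end

# Direct route: 95th percentile of the ages themselves.
def p95_age_direct(last_checked_at, now)
  percentile_disc(last_checked_at.map { |t| now - t }, 0.95)
end
```

On small or oddly sized groups the two routes can land one rank apart because the percentile is discrete, so treat the equivalence as approximate rather than exact.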
@@ -201,19 +201,24 @@ module DispatchPolicy
          return { admitted: 0, failures: 0, reasons: deduce_reasons(result) }
        end
 
-       admitted = 0
+       admitted = 0
+       settled_patch = nil
        half_life = @policy.fairness_half_life_seconds || @config.fairness_half_life_seconds
 
        Repository.with_connection do
          ActiveRecord::Base.transaction(requires_new: true) do
+           # The gate_state we persist depends on how many rows actually
+           # got claimed: each gate settles its state against the real
+           # admitted count via Pipeline.settle (the throttle deducts that
+           # many tokens, not the optimistic `allowed`). The block runs
+           # inside claim_staged_jobs! right after the DELETE.
            rows = Repository.claim_staged_jobs!(
              policy_name: @policy_name,
              partition_key: partition["partition_key"],
              limit: result.admit_count,
-             gate_state_patch: result.gate_state_patch,
              retry_after: result.retry_after,
              half_life_seconds: half_life
-           )
+           ) { |admitted_count| settled_patch = Pipeline.settle(result.decisions, admitted_count) }
 
            # `claim_staged_jobs!` always runs `record_partition_admit!` so
            # the partition's counters and gate_state commit even when the
@@ -293,11 +298,13 @@ module DispatchPolicy
        # the STALE pre-pass-1 snapshot. For the throttle that means reading
        # the token bucket at its original level and double-spending:
        # admitting above the configured rate and overwriting pass-1's
-       # consumption. The shallow merge matches Postgres jsonb `||`.
-       # Only runs on a committed admit: if the TX raised we fall through to
-       # the rescue below and never touch the in-memory state.
-       if result.gate_state_patch&.any?
-         partition["gate_state"] = (partition["gate_state"] || {}).merge(result.gate_state_patch)
+       # consumption. We mirror the SETTLED patch (post-consume, charged for
+       # the real admitted count), not evaluate's pre-consume snapshot. The
+       # shallow merge matches Postgres jsonb `||`. Only runs on a committed
+       # admit: if the TX raised we fall through to the rescue below and
+       # never touch the in-memory state.
+       if settled_patch&.any?
+         partition["gate_state"] = (partition["gate_state"] || {}).merge(settled_patch)
        end
 
        if admitted.zero?
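The comment above relies on Ruby's `Hash#merge` being shallow, like jsonb `||`: each top-level key in the patch replaces the whole value stored under that key. A small sketch of the in-memory mirror step; the function name `mirror_gate_state` is hypothetical, and the body follows the two added lines in the hunk.

```ruby
# Mirror a settled patch into the in-memory partition hash.
# Shallow merge: a patched top-level key replaces its entire sub-hash,
# and keys absent from the patch are left alone.
def mirror_gate_state(partition, settled_patch)
  return partition unless settled_patch&.any?

  partition["gate_state"] = (partition["gate_state"] || {}).merge(settled_patch)
  partition
end
```

Because the merge is shallow, a patch for `"throttle"` must carry every field the gate needs (both `"tokens"` and `"refilled_at"`), which is what `#consume` returns.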
@@ -68,7 +68,10 @@ module DispatchPolicy
 
      def sweep!
        cfg = DispatchPolicy.config
-       Repository.sweep_stale_inflight!(cutoff_seconds: cfg.inflight_stale_after)
+       Repository.sweep_stale_inflight!(
+         cutoff_seconds: cfg.inflight_stale_after,
+         queued_cutoff_seconds: cfg.inflight_queued_stale_after
+       )
        Repository.sweep_inactive_partitions!(cutoff_seconds: cfg.partition_inactive_after)
        Repository.sweep_old_tick_samples!(cutoff_seconds: cfg.metrics_retention)
      rescue StandardError => e
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module DispatchPolicy
-   VERSION = "0.4.2"
+   VERSION = "0.4.3"
  end
@@ -7,5 +7,6 @@ DispatchPolicy.configure do |c|
    c.idle_pause = 0.5 # seconds slept when no admissions happened
    c.partition_inactive_after = 24 * 60 * 60 # GC partitions idle this long
    c.inflight_stale_after = 5 * 60 # GC inflight rows whose worker stopped heartbeating
+   c.inflight_queued_stale_after = 60 * 60 # GC inflight rows admitted but never started (still queued)
    c.sweep_every_ticks = 50 # how often to run the sweepers
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: dispatch_policy
  version: !ruby/object:Gem::Version
-   version: 0.4.2
+   version: 0.4.3
  platform: ruby
  authors:
  - José Galisteo