dispatch_policy 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +31 -0
- data/README.md +117 -1
- data/app/controllers/dispatch_policy/policies_controller.rb +39 -13
- data/app/models/dispatch_policy/partition_observation.rb +34 -7
- data/app/views/dispatch_policy/policies/show.html.erb +8 -0
- data/db/migrate/20260425000001_add_duration_to_partition_observations.rb +8 -0
- data/lib/dispatch_policy/dispatchable.rb +7 -4
- data/lib/dispatch_policy/gates/throttle.rb +5 -1
- data/lib/dispatch_policy/policy.rb +14 -1
- data/lib/dispatch_policy/tick.rb +89 -2
- data/lib/dispatch_policy/version.rb +1 -1
- data/lib/dispatch_policy.rb +14 -8
- metadata +2 -2
- data/lib/dispatch_policy/install_generator.rb +0 -23
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 6eb153991642d0669fffd7cd3c8c2133c837978faf934bf6b14bf44ae8628907
+ data.tar.gz: 6ce8ff07f09fbd7763cb191b6305f316106b21a5f11b20a25ecdac3e3e2e1f2c
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: ea97e0959378bedd5a024888a8beab4880b30025216102b420b03deb5dff0c0b2ffd07beec2eb8e37d713ea3ae763378c8e50b143850563c01bf17bfb1ff47d9
+ data.tar.gz: 5ca3f5205c79ab6c8cbe76cd60820602b5ed69bb57c80109dd0ac240988a7a021a3473c3d0c67ae14181913516bae3cdf46e00ec47cf7310a22c8bf49da087ae
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,36 @@
  # Changelog

+ ## 0.2.0
+
+ ### Added
+ - `round_robin_by` supports `weight: :time` to balance per-tick quanta
+   by recent perform compute time instead of by request count (#3).
+ - GitHub Actions CI matrix covering Ruby 3.4 and Rails 7.2 / 8.1 (#5).
+ - Integration tests for gate combinations and throttle bucket
+   boundaries (#12).
+ - Resilience tests covering failure paths and dedupe state transitions
+   (#13).
+ - `bin/release` wrapper around `rake release` (#2).
+
+ ### Changed
+ - Admin partition breakdown caps its aggregations to keep the page
+   responsive on policies with many partitions (#9).
+ - Admin pending list no longer loads the `arguments` jsonb column
+   (#6).
+
+ ### Fixed
+ - Admission is reverted when the underlying adapter silently declines
+   to enqueue, so the staged row doesn't stay marked as admitted (#14).
+ - `consumed_ms_by_partition` window is padded to survive
+   minute-boundary races in the time-weighted round-robin fetch (#11).
+ - ThrottleBucket row locks are taken in a deterministic key order to
+   remove a deadlock window when multiple ticks contend on the same
+   set of partitions (#8).
+
+ ### Removed
+ - Stale custom `InstallGenerator` — the engine's migration generator
+   is the supported install path (#7).
+
  ## 0.1.0

  Initial release.
data/README.md
CHANGED

@@ -29,7 +29,17 @@ Use it when you need:
    and grows back when workers keep up, without manual tuning.
  - **Dedupe** against a partial unique index, not an in-memory key.
  - **Round-robin fairness across tenants** (LATERAL batch fetch) so one
-   tenant's burst can't starve the others
+   tenant's burst can't starve the others — including a **time-weighted
+   variant** that balances total compute time per tenant when their
+   performs have very different durations.
+
+ ## Demo
+
+ A runnable playground that exercises every gate and the admin UI lives
+ at [ceritium/dispatch_policy-demo](https://github.com/ceritium/dispatch_policy-demo).
+ Clone it, `bundle && rails db:setup`, and use the in-browser forms to
+ fire jobs through throttle / concurrency / adaptive / round-robin
+ policies while the admin UI updates in real time.

  ## Install


@@ -117,6 +127,11 @@ end

  `perform_later` stages the job; the tick admits it when its gates pass.

+ For the common multi-tenant webhook case (mixed-latency tenants behind
+ a shared pool) skip ahead to [Recipes](#multi-tenant-webhook-delivery)
+ — `round_robin_by weight: :time` plus `:adaptive_concurrency` covers
+ it without an explicit throttle.
+
  ## Gates

  Gates run in declared order, each narrowing the survivor set. Any option

@@ -351,6 +366,81 @@ fetch:
  Cost per tick is O(`quantum × active_keys`), not O(backlog) — so the
  admin stays snappy even with thousands of distinct tenants.

+ ### Time-weighted variant
+
+ Equal-quanta round-robin gives every active tenant the same number of
+ admissions per tick — fair by *count*. If your tenants have very
+ different per-job durations (slow webhooks, varied report sizes) and
+ you want to balance the *total compute time* each consumes, pass
+ `weight: :time`:
+
+ ```ruby
+ round_robin_by ->(args) { args.first[:account_id] }, weight: :time
+ ```
+
+ Solo tenants are unaffected — the fetch falls through to the trailing
+ top-up and they consume up to `batch_size` per tick. When multiple
+ tenants are active, each one's quantum is sized inversely to how much
+ compute time it has used in the last `window` seconds (default 60),
+ sourced from `dispatch_policy_partition_observations`. So if `slow`
+ has burned 20 s of perform time recently and `fast` has burned 200 ms,
+ this tick `fast` claims ~99% of `batch_size` while `slow` gets the
+ floor — total compute per minute stays balanced and you don't need a
+ throttle on top.
+
+ ## Recipes
+
+ ### Multi-tenant webhook delivery
+
+ Mixed-latency tenants behind a shared worker pool — exactly the case
+ that motivated `weight: :time` and adaptive concurrency. Pair them:
+
+ ```ruby
+ class WebhookDeliveryJob < ApplicationJob
+   include DispatchPolicy::Dispatchable
+
+   dispatch_policy do
+     context ->(args) { { account_id: args.first[:account_id] } }
+
+     # Fetch-level fairness by *compute time* (not request count). When
+     # several accounts compete, per-tick quanta are sized inverse to
+     # their recent perform duration; solo accounts top up to batch_size.
+     round_robin_by ->(args) { args.first[:account_id] },
+                    weight: :time, window: 60
+
+     # Drip-feed admission per account based on adapter queue lag.
+     # Without this, a single account with thousands of pending could
+     # dump batch_size jobs into the adapter queue in one tick and lose
+     # the ability to react to performance changes mid-burst.
+     gate :adaptive_concurrency,
+          partition_by: ->(ctx) { ctx[:account_id] },
+          initial_max: 3,
+          target_lag_ms: 500
+   end
+
+   def perform(account_id:, **) = WebhookClient.deliver!(account_id)
+ end
+ ```
+
+ What you get with no throttle, no manual tuning:
+
+ - A solo account runs at whatever throughput its downstream allows;
+   `:adaptive_concurrency` grows `current_max` while queue lag stays
+   under `target_lag_ms`.
+ - A slow account (1 s/perform) and a fast account (100 ms/perform)
+   competing → `weight: :time` gives the fast one most of each tick's
+   budget; the slow one's adaptive cap shrinks toward `min`. Total
+   compute time per minute stays balanced and the adapter queue
+   doesn't pile up behind whichever tenant happened to enqueue first.
+ - A misbehaving downstream that suddenly goes from 100 ms to 5 s →
+   that tenant's `current_max` drops within a few completions and its
+   fetch quantum shrinks; the other tenants are unaffected.
+
+ Tune `target_lag_ms` for the latency budget you can tolerate (see
+ [Choosing target_lag_ms](#choosing-target_lag_ms)) and `window` for
+ how reactive the time-balancing should be (smaller = noisier, larger
+ = more stable).
+
  ## Running the tick

  The gem exposes `DispatchPolicy::TickLoop.run(policy_name:, stop_when:)`
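A rough worked example of the quantum arithmetic described in the time-weighted variant section above. It is illustrative only, not part of the package: the 500 batch size and the 100 ms floor are the defaults that appear elsewhere in this diff, while the consumed figures are invented.

```ruby
# Illustrative only: mirrors the inverse-of-consumed weighting used by
# fetch_time_weighted_batch (see tick.rb later in this diff).
batch_size = 500                                  # config default
floor_ms   = 100                                  # DEFAULT_TIME_SHARE_DURATION_MS
consumed   = { "slow" => 20_000, "fast" => 200 }  # perform ms in the last window (invented)

weights = consumed.transform_values { |ms| 1.0 / [ms, floor_ms].max }
total   = weights.values.sum
quanta  = weights.transform_values { |w| [(batch_size * w / total).floor, 1].max }

quanta # => {"slow"=>4, "fast"=>495}, so fast claims ~99% of the tick's budget
```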
@@ -430,6 +520,32 @@ indexes, `FOR UPDATE SKIP LOCKED`, `jsonb`). `PGUSER` / `PGHOST` /
  `PGPASSWORD` env vars override the defaults in
  `test/dummy/config/database.yml`.

+ ## Releasing
+
+ The gem uses the standard `bundler/gem_tasks` flow — there is no
+ release automation in CI. To cut a new version:
+
+ 1. Bump `DispatchPolicy::VERSION` in `lib/dispatch_policy/version.rb`
+    following SemVer. While the API is marked experimental, breaking
+    changes go in a minor bump and should be called out in the changelog.
+ 2. Add a section to `CHANGELOG.md` above the previous one, grouping
+    entries (Added / Changed / Fixed / Removed). Link any relevant PRs.
+ 3. Make sure the working tree is on `master`, clean, and CI is green
+    (`bundle exec rake test` locally for a sanity check).
+ 4. Commit: `git commit -am "Release vX.Y.Z"`.
+ 5. `bundle exec rake release` — Bundler will build the `.gem` into
+    `pkg/`, tag `vX.Y.Z`, push the commit and tag, and `gem push` to
+    RubyGems. The gemspec sets `rubygems_mfa_required`, so have your
+    OTP ready (`gem signin` first if you aren't authenticated).
+ 6. Optional: publish a GitHub release from the tag, e.g.
+    `gh release create vX.Y.Z --notes-from-tag`, or paste the
+    changelog section into the release notes.
+
+ If `rake release` fails partway through (e.g. RubyGems push rejects
+ the version), do not retry blindly — inspect what already happened
+ (tag created? commit pushed?) and clean up before re-running, since
+ Bundler won't re-tag an existing version.
+
  ## License

  MIT.
data/app/controllers/dispatch_policy/policies_controller.rb
CHANGED

@@ -59,7 +59,12 @@ module DispatchPolicy
  load_adaptive_chart_data
  @throttle_buckets = ThrottleBucket
    .where(policy_name: @policy_name).order(:gate_name, :partition_key).limit(50)
-
+ # Explicit select: don't load the `arguments` jsonb (job payload —
+ # may contain PII / tokens) into memory just to render six fields.
+ @pending_jobs = scope.pending
+   .select(:id, :dedupe_key, :round_robin_key, :priority, :staged_at, :not_before_at)
+   .order(:priority, :staged_at)
+   .limit(50)
  end

  private

@@ -85,8 +90,12 @@ module DispatchPolicy
  now = Time.current
  now_iso = now.iso8601
  since_24h = 24.hours.ago.iso8601
+ limit = DispatchPolicy.config.admin_partition_limit
+ @partition_breakdown_truncated = false

  adaptive_stats = AdaptiveConcurrencyStats.where(policy_name: @policy_name)
+   .order(updated_at: :desc)
+   .limit(limit)
    .pluck(:gate_name, :partition_key, :current_max, :ewma_latency_ms)
    .each_with_object({}) { |(g, k, c, l), h|
      h[[ g, k ]] = { current_max: c, ewma_latency_ms: l.to_f.round(1) }

@@ -107,25 +116,38 @@ module DispatchPolicy
    }
  }

+ # Each aggregation below is order-by-count + limited so that a
+ # policy with tens of thousands of distinct (context, round_robin_key)
+ # tuples can't pull megabytes of rows into memory per request. We
+ # show the top-N most-active partitions per axis and flip the
+ # truncation flag for the view banner.
+
  # Activity timestamps bounded to the last 24h so the scan stays on
  # an index-friendly slice of staged_jobs.
  activity_rows = scope
    .where("staged_at > ?", since_24h)
    .group(:context, :round_robin_key)
+   .order(Arel.sql("MAX(staged_at) DESC"))
+   .limit(limit)
    .pluck(
      :context,
      :round_robin_key,
      Arel.sql("MAX(staged_at)"),
      Arel.sql("MAX(admitted_at)")
    )
+ @partition_breakdown_truncated = true if activity_rows.size >= limit

  sources.each do |name, extract|
-   pending_counts = scope.pending.group(:context, :round_robin_key)
-
-
-
-
-
+   pending_counts = scope.pending.group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(
+       :context,
+       :round_robin_key,
+       Arel.sql("count(*) filter (where not_before_at is null or not_before_at <= '#{now_iso}')"),
+       Arel.sql("count(*) filter (where not_before_at > '#{now_iso}')")
+     )
+   @partition_breakdown_truncated = true if pending_counts.size >= limit
    pending_counts.each do |ctx, rr_key, eligible, scheduled|
      partition = extract.call(ctx, rr_key)
      row = rows[[ name, partition ]]

@@ -133,18 +155,22 @@ module DispatchPolicy
      row[:scheduled] += scheduled
    end

-   admitted_counts = scope.admitted.group(:context, :round_robin_key)
-
-
+   admitted_counts = scope.admitted.group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(:context, :round_robin_key, Arel.sql("count(*)"))
+   @partition_breakdown_truncated = true if admitted_counts.size >= limit
    admitted_counts.each do |ctx, rr_key, in_flight|
      partition = extract.call(ctx, rr_key)
      rows[[ name, partition ]][:in_flight] += in_flight
    end

    completed_counts = scope.completed.where("completed_at > ?", since_24h)
-     .group(:context, :round_robin_key)
-
-     )
+     .group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(:context, :round_robin_key, Arel.sql("count(*)"))
+   @partition_breakdown_truncated = true if completed_counts.size >= limit
    completed_counts.each do |ctx, rr_key, completed|
      partition = extract.call(ctx, rr_key)
      rows[[ name, partition ]][:completed_24h] += completed
data/app/models/dispatch_policy/partition_observation.rb
CHANGED

@@ -7,41 +7,68 @@ module DispatchPolicy
  # for all partitioned policies, not just the adaptive ones.
  #
  # One row per (policy, partition, minute): total_lag_ms accumulates the
- # sum of queue_lag_ms observations in that minute,
- #
- #
+ # sum of queue_lag_ms observations in that minute, total_duration_ms
+ # accumulates perform durations (used by :time_budget and :fair_time_share),
+ # observation_count increments, max_lag_ms / max_duration_ms track worst
+ # spikes. Averages are derived on read as total / count.
  class PartitionObservation < ApplicationRecord
    self.table_name = "dispatch_policy_partition_observations"

    OBSERVATION_TTL = 2 * 60 * 60 # 2 hours

-   def self.observe!(policy_name:, partition_key:, queue_lag_ms:, current_max: nil)
+   def self.observe!(policy_name:, partition_key:, queue_lag_ms:, duration_ms: 0, current_max: nil)
      return if partition_key.nil? || partition_key.to_s.empty?

      now = Time.current
      lag = queue_lag_ms.to_i
+     dur = duration_ms.to_i
      sql = <<~SQL.squish
        INSERT INTO #{quoted_table_name}
          (policy_name, partition_key, minute_bucket,
-          total_lag_ms,
+          total_lag_ms, total_duration_ms, observation_count,
+          max_lag_ms, max_duration_ms, current_max,
           created_at, updated_at)
-       VALUES (?, ?, date_trunc('minute', ?::timestamp), ?, 1, ?, ?, ?, ?)
+       VALUES (?, ?, date_trunc('minute', ?::timestamp), ?, ?, 1, ?, ?, ?, ?, ?)
        ON CONFLICT (policy_name, partition_key, minute_bucket)
        DO UPDATE SET
          total_lag_ms = #{quoted_table_name}.total_lag_ms + EXCLUDED.total_lag_ms,
+         total_duration_ms = #{quoted_table_name}.total_duration_ms + EXCLUDED.total_duration_ms,
          observation_count = #{quoted_table_name}.observation_count + 1,
          max_lag_ms = GREATEST(#{quoted_table_name}.max_lag_ms, EXCLUDED.max_lag_ms),
+         max_duration_ms = GREATEST(#{quoted_table_name}.max_duration_ms, EXCLUDED.max_duration_ms),
          current_max = COALESCE(EXCLUDED.current_max, #{quoted_table_name}.current_max),
          updated_at = EXCLUDED.updated_at
      SQL
      connection.exec_update(
        sanitize_sql_array([
          sql, policy_name, partition_key.to_s, now,
-         lag, lag, current_max, now, now
+         lag, dur, lag, dur, current_max, now, now
        ])
      )
    end

+   # Sum of perform durations per partition over the last `window` seconds.
+   # Used by :fair_time_share to bias admission ordering toward partitions
+   # that have consumed less compute time recently.
+   def self.consumed_ms_by_partition(policy_name:, partition_keys:, window:)
+     return {} if partition_keys.empty?
+
+     # minute_bucket is floored on insert (date_trunc('minute', now)).
+     # An observation written T seconds ago lives in a bucket up to 60s
+     # earlier than T. Add a one-bucket pad to the lower bound so the
+     # most recent bucket is always inside the window — without it, the
+     # previous-minute bucket is silently excluded as soon as the wall
+     # clock crosses a minute boundary.
+     since = Time.current - window - 60
+     rows = where(policy_name: policy_name, partition_key: partition_keys.map(&:to_s))
+       .where("minute_bucket >= ?", since)
+       .group(:partition_key)
+       .pluck(Arel.sql("partition_key, SUM(total_duration_ms), SUM(observation_count)"))
+     rows.each_with_object({}) do |(key, total, count), acc|
+       acc[key] = { consumed_ms: total.to_i, count: count.to_i }
+     end
+   end
+
    def self.prune!
      where("minute_bucket < ?", Time.current - OBSERVATION_TTL).delete_all
    end
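A hypothetical usage sketch showing how the write and read sides of this model fit together. Only the two method signatures come from the hunk above; the policy name, partition keys, and numbers are invented.

```ruby
# Hypothetical values; only the method signatures are taken from this diff.
# Write side: recorded after a perform completes (see dispatchable.rb below).
DispatchPolicy::PartitionObservation.observe!(
  policy_name:   "reports",
  partition_key: "account_42",
  queue_lag_ms:  180,
  duration_ms:   950,   # new in 0.2.0; feeds the time-weighted fetch
  current_max:   5
)

# Read side: the time-weighted fetch sums recent compute time per partition.
DispatchPolicy::PartitionObservation.consumed_ms_by_partition(
  policy_name:    "reports",
  partition_keys: %w[account_42 account_7],
  window:         60
)
# => { "account_42" => { consumed_ms: 950, count: 1 } } (account_7 absent: no rows yet)
```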
data/app/views/dispatch_policy/policies/show.html.erb
CHANGED

@@ -98,6 +98,14 @@

  <h2>All partitions <small class="muted">(<%= @partition_total_list %>)</small></h2>

+ <% if @partition_breakdown_truncated %>
+   <p class="muted" style="background: #fff3cd; border: 1px solid #f0d97f; padding: 0.5em 0.75em; border-radius: 4px;">
+     Showing the most-active partitions only. The full set exceeds the
+     admin's per-request cap (<code>DispatchPolicy.config.admin_partition_limit</code>
+     = <%= DispatchPolicy.config.admin_partition_limit %>); raise it if you need the long tail.
+   </p>
+ <% end %>
+
  <turbo-frame id="partitions_list" data-turbo-action="advance">
    <% if @partition_total_list.to_i.zero? && @partition_search.blank? %>
      <p class="muted">This policy declares no partitioning (no gate with <code>partition_by</code> and no <code>round_robin_by</code>).</p>
data/db/migrate/20260425000001_add_duration_to_partition_observations.rb
ADDED

@@ -0,0 +1,8 @@
+ # frozen_string_literal: true
+
+ class AddDurationToPartitionObservations < ActiveRecord::Migration[7.1]
+   def change
+     add_column :dispatch_policy_partition_observations, :total_duration_ms, :bigint, null: false, default: 0
+     add_column :dispatch_policy_partition_observations, :max_duration_ms, :integer, null: false, default: 0
+   end
+ end
data/lib/dispatch_policy/dispatchable.rb
CHANGED

@@ -41,6 +41,7 @@ module DispatchPolicy
  block.call
  succeeded = true
  ensure
+ duration_ms = ((Time.current - perform_start) * 1000).to_i
  policy_name = job.class.resolved_dispatch_policy&.name

  if job._dispatch_partitions.present?

@@ -49,9 +50,9 @@ module DispatchPolicy
    partitions: job._dispatch_partitions
  )

- # Let adaptive gates update their AIMD state first;
- #
- #
+ # Let adaptive gates update their AIMD state first; the
+ # generic observation below then captures the resulting
+ # current_max alongside lag + duration for the chart.
  policy = job.class.resolved_dispatch_policy
  job._dispatch_partitions.each do |gate_name, partition_key|
    gate = policy&.gates&.find { |g| g.name == gate_name.to_sym }

@@ -64,7 +65,8 @@ module DispatchPolicy
  end

  # Generic observation per unique partition. Every gate with
- # partition_by (adaptive or not) gets a sparkline this way
+ # partition_by (adaptive or not) gets a sparkline this way,
+ # plus :fair_time_share reads consumed_ms from here.
  job._dispatch_partitions.values.uniq.each do |partition_key|
    current_max = DispatchPolicy::AdaptiveConcurrencyStats.current_max_for(
      policy_name: policy_name,

@@ -74,6 +76,7 @@ module DispatchPolicy
    policy_name: policy_name,
    partition_key: partition_key,
    queue_lag_ms: queue_lag_ms,
+   duration_ms: duration_ms,
    current_max: current_max
  )
  end
data/lib/dispatch_policy/gates/throttle.rb
CHANGED

@@ -18,7 +18,11 @@ module DispatchPolicy
  by_partition = batch.group_by { |staged| partition_key_for(context.for(staged)) }

  admitted = []
-
+ # Sort keys before acquiring per-partition row locks: two ticks
+ # processing overlapping partitions in different group_by orders
+ # would otherwise deadlock on each other's FOR UPDATE rows.
+ by_partition.keys.sort.each do |partition_key|
+   jobs = by_partition[partition_key]
    sample_ctx = context.for(jobs.first)
    rate = resolve(@rate, sample_ctx).to_f
    per = @per.to_f
data/lib/dispatch_policy/policy.rb
CHANGED

@@ -12,6 +12,8 @@ module DispatchPolicy
  @snapshots = {}
  @dedupe_key_builder = nil
  @round_robin_builder = nil
+ @round_robin_weight = :equal
+ @round_robin_window = 60
  instance_eval(&block) if block
  DispatchPolicy.registry[@name] = job_class
  end

@@ -45,14 +47,25 @@ module DispatchPolicy
    key&.to_s
  end

- def round_robin_by(builder)
+ def round_robin_by(builder, weight: :equal, window: 60)
+   raise ArgumentError, "weight must be :equal or :time" unless %i[equal time].include?(weight)
    @round_robin_builder = builder
+   @round_robin_weight = weight
+   @round_robin_window = window
  end

  def round_robin?
    !@round_robin_builder.nil?
  end

+ def round_robin_weight
+   @round_robin_weight
+ end
+
+ def round_robin_window
+   @round_robin_window
+ end
+
  def build_round_robin_key(arguments)
    return nil unless @round_robin_builder
    key = @round_robin_builder.call(arguments)
data/lib/dispatch_policy/tick.rb
CHANGED

@@ -28,7 +28,20 @@ module DispatchPolicy
  pending_enqueue.each do |staged, job|
    begin
      job.enqueue(_bypass_staging: true)
-
+     # ActiveJob adapters report a polite failure by setting
+     # enqueue_error and leaving successfully_enqueued? false
+     # instead of raising. Without this check the staged row
+     # would stay marked admitted while the adapter never queued
+     # the job — losing it silently.
+     if job.successfully_enqueued?
+       admitted_count += 1
+     else
+       Rails.logger&.warn(
+         "[DispatchPolicy] adapter did not enqueue staged=#{staged.id}: " \
+         "#{job.enqueue_error&.class}: #{job.enqueue_error&.message}"
+       )
+       revert_admission(staged)
+     end
    rescue StandardError => e
      Rails.logger&.error("[DispatchPolicy] enqueue failed staged=#{staged.id}: #{e.class}: #{e.message}")
      revert_admission(staged)

@@ -100,7 +113,11 @@ module DispatchPolicy

  def self.fetch_batch(policy)
    if policy.round_robin?
-
+     if policy.round_robin_weight == :time
+       fetch_time_weighted_batch(policy)
+     else
+       fetch_round_robin_batch(policy)
+     end
    else
      fetch_plain_batch(policy)
    end

@@ -162,6 +179,76 @@ module DispatchPolicy
    batch + top_up
  end

+ # Time-weighted variant of round-robin: instead of an equal quantum
+ # per active partition, allocate quanta proportional to the inverse
+ # of recently-consumed compute time. Solo partitions get the full
+ # batch_size; competing partitions get slices that bias admission
+ # toward whoever has consumed less, so total compute time stays
+ # balanced even when one tenant's backlog is much bigger than
+ # another's. Falls back to the same trailing top-up as the equal
+ # round-robin so we never under-fill the batch when only a few
+ # partitions are active.
+ DEFAULT_TIME_SHARE_DURATION_MS = 100
+
+ def self.fetch_time_weighted_batch(policy)
+   batch_size = DispatchPolicy.config.batch_size
+   now = Time.current
+
+   partitions = StagedJob.pending
+     .where(policy_name: policy.name)
+     .where("not_before_at IS NULL OR not_before_at <= ?", now)
+     .where.not(round_robin_key: nil)
+     .distinct
+     .pluck(:round_robin_key)
+
+   return fetch_plain_batch(policy) if partitions.empty?
+
+   consumed = PartitionObservation.consumed_ms_by_partition(
+     policy_name: policy.name,
+     partition_keys: partitions,
+     window: policy.round_robin_window
+   )
+
+   # Inverse-of-consumed weights, with a floor so a brand-new partition
+   # (no observations) doesn't dominate to infinity.
+   weights = partitions.each_with_object({}) do |key, acc|
+     consumed_ms = consumed.dig(key, :consumed_ms) || 0
+     denom = [ consumed_ms, DEFAULT_TIME_SHARE_DURATION_MS ].max
+     acc[key] = 1.0 / denom
+   end
+   total_weight = weights.values.sum
+   quanta = weights.transform_values do |w|
+     [ (batch_size * w / total_weight).floor, 1 ].max
+   end
+
+   batch = []
+   partitions.each do |key|
+     rows = StagedJob.pending
+       .where(policy_name: policy.name, round_robin_key: key)
+       .where("not_before_at IS NULL OR not_before_at <= ?", now)
+       .order(:priority, :staged_at)
+       .limit(quanta[key])
+       .lock("FOR UPDATE SKIP LOCKED")
+       .to_a
+     batch.concat(rows)
+     break if batch.size >= batch_size
+   end
+
+   remaining = batch_size - batch.size
+   return batch if remaining <= 0 || batch.empty?
+
+   top_up = StagedJob.pending
+     .where(policy_name: policy.name)
+     .where("not_before_at IS NULL OR not_before_at <= ?", now)
+     .where.not(id: batch.map(&:id))
+     .order(:priority, :staged_at)
+     .limit(remaining)
+     .lock("FOR UPDATE SKIP LOCKED")
+     .to_a
+
+   batch + top_up
+ end
+
  def self.lookup_policy(policy_name)
    job_class = DispatchPolicy.registry[policy_name] || autoload_job_for(policy_name)
    return nil unless job_class
data/lib/dispatch_policy.rb
CHANGED

@@ -16,19 +16,25 @@ module DispatchPolicy
  :tick_sleep,
  :tick_sleep_busy,
  :partition_idle_ttl,
+ :admin_partition_limit,
  keyword_init: true
  )

  def self.config
    @config ||= Config.new(
-     enabled:
-     lease_duration:
-     batch_size:
-     round_robin_quantum:
-     tick_max_duration:
-     tick_sleep:
-     tick_sleep_busy:
-     partition_idle_ttl:
+     enabled: true,
+     lease_duration: 15 * 60, # 15.minutes
+     batch_size: 500,
+     round_robin_quantum: 50,
+     tick_max_duration: 60, # 1.minute
+     tick_sleep: 1, # idle sleep
+     tick_sleep_busy: 0.05, # busy sleep
+     partition_idle_ttl: 30 * 60, # 30.minutes
+     # Hard cap on rows the admin's partition breakdown will pull per
+     # aggregation. Protects the host DB and process when a policy has
+     # tens of thousands of partitions: the admin shows the top-N most
+     # active and a truncation banner instead of dragging in everything.
+     admin_partition_limit: 5_000
    )
  end

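A hypothetical host-app initializer for raising the cap that the admin banner mentions. It assumes only what the hunk above shows, that `DispatchPolicy.config` is a memoized keyword_init Struct (whose members have writers); the file path and the 20_000 value are invented.

```ruby
# config/initializers/dispatch_policy.rb (hypothetical)
# Struct members come with writers, so the memoized config can be adjusted
# after boot; 20_000 is an arbitrary example, not a recommendation.
DispatchPolicy.config.admin_partition_limit = 20_000
```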
metadata
CHANGED

@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: dispatch_policy
  version: !ruby/object:Gem::Version
-   version: 0.
+   version: 0.2.0
  platform: ruby
  authors:
  - José Galisteo

@@ -136,6 +136,7 @@ files:
  - db/migrate/20260424000002_create_adaptive_concurrency_stats.rb
  - db/migrate/20260424000003_create_adaptive_concurrency_samples.rb
  - db/migrate/20260424000004_rename_samples_to_partition_observations.rb
+ - db/migrate/20260425000001_add_duration_to_partition_observations.rb
  - lib/dispatch_policy.rb
  - lib/dispatch_policy/active_job_perform_all_later_patch.rb
  - lib/dispatch_policy/dispatch_context.rb

@@ -147,7 +148,6 @@ files:
  - lib/dispatch_policy/gates/fair_interleave.rb
  - lib/dispatch_policy/gates/global_cap.rb
  - lib/dispatch_policy/gates/throttle.rb
- - lib/dispatch_policy/install_generator.rb
  - lib/dispatch_policy/policy.rb
  - lib/dispatch_policy/tick.rb
  - lib/dispatch_policy/tick_loop.rb
data/lib/dispatch_policy/install_generator.rb
DELETED

@@ -1,23 +0,0 @@
- # frozen_string_literal: true
-
- require "rails/generators"
- require "rails/generators/active_record"
-
- module DispatchPolicy
-   module Generators
-     class InstallGenerator < Rails::Generators::Base
-       include Rails::Generators::Migration
-
-       source_root File.expand_path("../../db/migrate", __dir__)
-
-       def self.next_migration_number(dirname)
-         ActiveRecord::Generators::Base.next_migration_number(dirname)
-       end
-
-       def copy_migration
-         migration_template "20260424000001_create_dispatch_policy_tables.rb",
-           "db/migrate/create_dispatch_policy_tables.rb"
-       end
-     end
-   end
- end