dispatch_policy 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. checksums.yaml +4 -4
  2. data/MIT-LICENSE +16 -17
  3. data/README.md +449 -288
  4. data/app/assets/stylesheets/dispatch_policy/application.css +157 -0
  5. data/app/controllers/dispatch_policy/application_controller.rb +45 -1
  6. data/app/controllers/dispatch_policy/dashboard_controller.rb +91 -0
  7. data/app/controllers/dispatch_policy/partitions_controller.rb +122 -0
  8. data/app/controllers/dispatch_policy/policies_controller.rb +94 -241
  9. data/app/controllers/dispatch_policy/staged_jobs_controller.rb +9 -0
  10. data/app/models/dispatch_policy/adaptive_concurrency_stats.rb +11 -81
  11. data/app/models/dispatch_policy/inflight_job.rb +12 -0
  12. data/app/models/dispatch_policy/partition.rb +21 -0
  13. data/app/models/dispatch_policy/staged_job.rb +4 -97
  14. data/app/models/dispatch_policy/tick_sample.rb +11 -0
  15. data/app/views/dispatch_policy/dashboard/index.html.erb +109 -0
  16. data/app/views/dispatch_policy/partitions/index.html.erb +63 -0
  17. data/app/views/dispatch_policy/partitions/show.html.erb +106 -0
  18. data/app/views/dispatch_policy/policies/index.html.erb +15 -37
  19. data/app/views/dispatch_policy/policies/show.html.erb +140 -216
  20. data/app/views/dispatch_policy/shared/_capacity.html.erb +67 -0
  21. data/app/views/dispatch_policy/shared/_hints.html.erb +13 -0
  22. data/app/views/dispatch_policy/shared/_partition_row.html.erb +12 -0
  23. data/app/views/dispatch_policy/staged_jobs/show.html.erb +31 -0
  24. data/app/views/layouts/dispatch_policy/application.html.erb +95 -238
  25. data/config/routes.rb +18 -2
  26. data/db/migrate/20260501000001_create_dispatch_policy_tables.rb +103 -0
  27. data/lib/dispatch_policy/bypass.rb +23 -0
  28. data/lib/dispatch_policy/config.rb +85 -0
  29. data/lib/dispatch_policy/context.rb +50 -0
  30. data/lib/dispatch_policy/cursor_pagination.rb +121 -0
  31. data/lib/dispatch_policy/decision.rb +22 -0
  32. data/lib/dispatch_policy/engine.rb +4 -27
  33. data/lib/dispatch_policy/forwarder.rb +63 -0
  34. data/lib/dispatch_policy/gate.rb +10 -38
  35. data/lib/dispatch_policy/gates/adaptive_concurrency.rb +99 -97
  36. data/lib/dispatch_policy/gates/concurrency.rb +45 -26
  37. data/lib/dispatch_policy/gates/throttle.rb +65 -37
  38. data/lib/dispatch_policy/inflight_tracker.rb +174 -0
  39. data/lib/dispatch_policy/job_extension.rb +155 -0
  40. data/lib/dispatch_policy/operator_hints.rb +126 -0
  41. data/lib/dispatch_policy/pipeline.rb +48 -0
  42. data/lib/dispatch_policy/policy.rb +62 -47
  43. data/lib/dispatch_policy/policy_dsl.rb +120 -0
  44. data/lib/dispatch_policy/railtie.rb +35 -0
  45. data/lib/dispatch_policy/registry.rb +46 -0
  46. data/lib/dispatch_policy/repository.rb +723 -0
  47. data/lib/dispatch_policy/serializer.rb +36 -0
  48. data/lib/dispatch_policy/tick.rb +263 -172
  49. data/lib/dispatch_policy/tick_loop.rb +59 -26
  50. data/lib/dispatch_policy/version.rb +1 -1
  51. data/lib/dispatch_policy.rb +71 -46
  52. data/lib/generators/dispatch_policy/install/install_generator.rb +70 -0
  53. data/lib/generators/dispatch_policy/install/templates/create_dispatch_policy_tables.rb.tt +95 -0
  54. data/lib/generators/dispatch_policy/install/templates/dispatch_tick_loop_job.rb.tt +53 -0
  55. data/lib/generators/dispatch_policy/install/templates/initializer.rb.tt +11 -0
  56. metadata +101 -43
  57. data/CHANGELOG.md +0 -12
  58. data/app/models/dispatch_policy/partition_inflight_count.rb +0 -42
  59. data/app/models/dispatch_policy/partition_observation.rb +0 -49
  60. data/app/models/dispatch_policy/throttle_bucket.rb +0 -41
  61. data/db/migrate/20260424000001_create_dispatch_policy_tables.rb +0 -80
  62. data/db/migrate/20260424000002_create_adaptive_concurrency_stats.rb +0 -22
  63. data/db/migrate/20260424000003_create_adaptive_concurrency_samples.rb +0 -25
  64. data/db/migrate/20260424000004_rename_samples_to_partition_observations.rb +0 -32
  65. data/lib/dispatch_policy/active_job_perform_all_later_patch.rb +0 -32
  66. data/lib/dispatch_policy/dispatch_context.rb +0 -53
  67. data/lib/dispatch_policy/dispatchable.rb +0 -120
  68. data/lib/dispatch_policy/gates/fair_interleave.rb +0 -32
  69. data/lib/dispatch_policy/gates/global_cap.rb +0 -26
  70. data/lib/dispatch_policy/install_generator.rb +0 -23
data/README.md CHANGED
@@ -1,434 +1,595 @@
1
1
  # DispatchPolicy
2
2
 
3
- > **⚠️ Experimental.** The API, schema, and defaults can change between
4
- > minor releases without notice. DispatchPolicy is currently running in
5
- > production on [pulso.run](https://pulso.run); that's how we learn
6
- > what breaks. If you pick it up for your own project, pin the exact
7
- > version and expect to follow the changelog.
3
+ > **⚠️ Experimental v2 branch.** This is the `v2` branch of
4
+ > [ceritium/dispatch_policy](https://github.com/ceritium/dispatch_policy),
5
+ > an alternative cut: TX-atomic admission, in-tick fairness as a
6
+ > layer (not a gate), and a single canonical partition scope per
7
+ > policy. API, schema, and defaults can change between any two
8
+ > commits. The `master` branch of the same repo is the original
9
+ > design and is what the published gem (when one ships) tracks.
8
10
  >
9
- > **PostgreSQL only (11+).** The staging, admission, and fairness
10
- > machinery lean on `jsonb`, partial indexes, `FOR UPDATE SKIP LOCKED`,
11
- > `ON CONFLICT`, and `CROSS JOIN LATERAL`. MySQL/SQLite support isn't
12
- > closed off as a goal; being drop-in across every ActiveJob backend
13
- > is the long-term direction — but it would take meaningful rework
14
- > (shadow columns for `jsonb`, full indexes instead of partial, a
15
- > different batch-fetch strategy for fairness). Contributions welcome.
11
+ > **PostgreSQL only.** Staging, admission, and adaptive stats lean on
12
+ > `jsonb`, partial indexes, `FOR UPDATE SKIP LOCKED`, `ON CONFLICT`,
13
+ > and the adapter sharing `ActiveRecord::Base.connection` so the
14
+ > admit + adapter INSERT can join one transaction. Tested against
15
+ > good_job and solid_queue.
16
16
 
17
17
  Per-partition admission control for ActiveJob. Stages `perform_later`
18
18
  into a dedicated table, runs a tick loop that admits jobs through
19
- declared gates (throttle, concurrency, global_cap, fair_interleave,
20
- adaptive_concurrency), then forwards survivors to the real adapter.
19
+ declared gates (`throttle`, `concurrency`, `adaptive_concurrency`),
20
+ then forwards survivors to the real adapter. The admission and the
21
+ adapter INSERT happen inside one Postgres transaction, so a worker
22
+ crash mid-tick can't lose a job.
21
23
 
22
24
  Use it when you need:
23
25
 
24
- - **Per-tenant / per-endpoint throttle** that's exact (token bucket)
25
- instead of best-effort enqueue-side.
26
- - **Per-partition concurrency** with a proper release hook on job
27
- completion (and lease-expiry recovery if the worker dies mid-perform).
26
+ - **Per-tenant / per-endpoint throttle**: a token bucket per partition,
27
+ refreshed lazily on read.
28
+ - **Per-partition concurrency**: a fixed cap on in-flight jobs with a
29
+ release hook on completion and a heartbeat-based reaper for crashes.
28
30
  - **Adaptive concurrency** — a cap that shrinks under queue pressure
29
- and grows back when workers keep up, without manual tuning.
30
- - **Dedupe** against a partial unique index, not an in-memory key.
31
- - **Round-robin fairness across tenants** (LATERAL batch fetch) so one
32
- tenant's burst can't starve the others.
31
+ and grows back when workers keep up, no manual tuning per tenant.
32
+ - **In-tick fairness**: within a single tick, partitions are reordered
33
+ by recent activity (EWMA) and an optional global cap is shared
34
+ fairly across them, so one tenant's burst can't starve the others.
35
+ - **Sharding** — split a policy across N queues so independent tick
36
+ workers admit in parallel.
37
+
38
+ ## Demo
39
+
40
+ The demo lives in `test/dummy/` — a tiny Rails app inside this repo.
41
+ Run it locally to play with every gate and the admin UI:
42
+
43
+ ```bash
44
+ bin/dummy setup good_job # creates the DB and migrates
45
+ DUMMY_ADAPTER=good_job bundle exec foreman start
46
+ ```
47
+
48
+ Then open:
49
+
50
+ - `http://localhost:3000/` — playground with one card per job and a
51
+ storm form that exercises the adaptive cap and fairness reorder
52
+ across many tenants.
53
+ - `http://localhost:3000/dispatch_policy` — admin UI: live throughput,
54
+ partition state, denial reasons, capacity hints.
55
+
56
+ The dummy ships ten purpose-built jobs covering throttle, concurrency,
57
+ mixed gates, scheduling, retries, stress tests, sharding, fairness, and
58
+ adaptive concurrency. See `test/dummy/app/jobs/`.
33
59
 
34
60
  ## Install
35
61
 
36
62
  Add to your `Gemfile`:
37
63
 
38
64
  ```ruby
39
- gem "dispatch_policy"
65
+ gem "dispatch_policy",
66
+ git: "https://github.com/ceritium/dispatch_policy",
67
+ branch: "v2"
40
68
  ```
41
69
 
42
- Copy the migration and run it:
70
+ Generate the install bundle (migration + initializer + tick loop job):
43
71
 
44
- ```
45
- bundle exec rails dispatch_policy:install:migrations
46
- bundle exec rails db:migrate
72
+ ```bash
73
+ bin/rails generate dispatch_policy:install
74
+ bin/rails db:migrate
47
75
  ```
48
76
 
49
- Mount the admin UI in `config/routes.rb` (optional):
77
+ Mount the admin UI (optional but recommended):
50
78
 
51
79
  ```ruby
52
- mount DispatchPolicy::Engine => "/admin/dispatch_policy"
80
+ mount DispatchPolicy::Engine, at: "/dispatch_policy"
53
81
  ```
54
82
 
55
- Configure in `config/initializers/dispatch_policy.rb`:
83
+ Then schedule the tick loop. The generator wrote a
84
+ `DispatchTickLoopJob` in `app/jobs/`; kick it off once and it
85
+ re-enqueues itself:
56
86
 
57
87
  ```ruby
58
- DispatchPolicy.configure do |c|
59
- c.enabled = ENV.fetch("DISPATCH_POLICY_ENABLED", "true") != "false"
60
- c.lease_duration = 15.minutes
61
- c.batch_size = 500
62
- c.round_robin_quantum = 50
63
- c.tick_sleep = 1 # idle
64
- c.tick_sleep_busy = 0.05 # after productive ticks
65
- end
88
+ DispatchTickLoopJob.perform_later
66
89
  ```
67
90
 
68
91
  ## Flow
69
92
 
70
93
  ```
71
94
  ActiveJob#perform_later
72
- Dispatchable#enqueue
73
- StagedJob.stage! (insert into dispatch_policy_staged_jobs, pending)
95
+ JobExtension.around_enqueue_for
96
+ Repository.stage! (INSERT staged + UPSERT partition; ctx refreshed)
74
97
 
75
98
  (tick loop, periodically)
76
- SELECT pending FOR UPDATE SKIP LOCKED
77
- Run gates in declared order; survivors are the admitted set
78
- StagedJob#mark_admitted! (increment counters, set admitted_at)
79
- job.enqueue(_bypass_staging: true) (hand off to the real adapter)
99
+ claim_partitions (FOR UPDATE SKIP LOCKED, ordered by last_checked_at)
100
+ reorder by decayed_admits ASC (in-tick fairness)
101
+ for each: pipeline.call(ctx, partition, fair_share)
102
+ gates evaluate; admit_count = min(allowed)
103
+ → ONE TX: claim_staged_jobs! + insert_inflight! + Forwarder.dispatch
104
+ (the adapter INSERT shares the TX; rollback if anything raises)
105
+ → bulk-flush deny-state in one UPDATE ... FROM (VALUES ...)
80
106
 
81
107
  (worker runs perform)
82
- Dispatchable#around_perform
108
+ InflightTracker.track (around_perform)
109
+ → INSERT inflight_jobs ON CONFLICT DO NOTHING
110
+ → spawn heartbeat thread
83
111
  → block.call
84
- release counters, mark StagedJob completed_at, record observation
112
+ record_observation on adaptive gates (queue_lag AIMD update)
113
+ → DELETE inflight_jobs
85
114
  ```
86
115
 
87
116
  ## Declaring a policy
88
117
 
89
118
  ```ruby
90
- class SendWebhookJob < ApplicationJob
91
- include DispatchPolicy::Dispatchable
119
+ class FetchEndpointJob < ApplicationJob
120
+ dispatch_policy_inflight_tracking # only required if a concurrency gate is used
92
121
 
93
- dispatch_policy do
94
- # Persisted in the staged row so gates can read it without touching AR.
122
+ dispatch_policy :endpoints do
95
123
  context ->(args) {
96
124
  event = args.first
97
- { endpoint_id: event.endpoint_id, rate_limit: event.endpoint.rate_limit }
125
+ {
126
+ endpoint_id: event.endpoint_id,
127
+ rate_limit: event.endpoint.rate_limit,
128
+ max_per_account: event.account.dispatch_concurrency
129
+ }
98
130
  }
99
131
 
100
- # Partial unique index dedupes identical keys while the previous is pending.
101
- dedupe_key ->(args) { "event:#{args.first.id}" }
102
-
103
- # Tenant fairness — see the "Round-robin" section below.
104
- round_robin_by ->(args) { args.first.account_id }
132
+ # Required: every gate in the policy enforces against this scope.
133
+ partition_by ->(ctx) { ctx[:endpoint_id] }
105
134
 
106
135
  gate :throttle,
107
- rate: ->(ctx) { ctx[:rate_limit] },
108
- per: 1.minute,
109
- partition_by: ->(ctx) { ctx[:endpoint_id] }
136
+ rate: ->(ctx) { ctx[:rate_limit] },
137
+ per: 1.minute
110
138
 
111
- gate :fair_interleave
139
+ gate :concurrency,
140
+ max: ->(ctx) { ctx[:max_per_account] || 5 }
141
+
142
+ retry_strategy :restage # default; alternative: :bypass
112
143
  end
113
144
 
114
- def perform(event) = event.deliver!
145
+ def perform(event)
146
+ # ... call the rate-limited HTTP endpoint
147
+ end
115
148
  end
116
149
  ```
117
150
 
118
151
  `perform_later` stages the job; the tick admits it when its gates pass.
152
+ With multiple gates the actual `admit_count` per tick comes out as
153
+ `min(allowed)` across all of them.
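+ For example, if the throttle would allow 7 more jobs this tick but the
+ concurrency gate only has 3 free slots, 3 jobs are admitted.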
119
154
 
120
- ## Gates
155
+ ## Choosing the partition scope
121
156
 
122
- Gates run in declared order, each narrowing the survivor set. Any option
123
- that takes a value can alternatively take a lambda that receives the
124
- `ctx` hash, so parameters can depend on per-job data.
157
+ `partition_by` is the most consequential decision in a policy and the
158
+ only required field. It tells the gem **what counts as one logical
159
+ partition**: what scope each gate enforces against, and what the
160
+ in-tick fairness reorder operates over.
125
161
 
126
- ### `:concurrency` in-flight cap per partition
162
+ A policy with `partition_by` and **no gates** is also valid: the
163
+ pipeline passes the full budget through, and the Tick caps it via
164
+ `admission_batch_size` (or `tick_admission_budget` if set). Useful
165
+ for "balance N tenants evenly" without rate-limiting any of them.
127
166
 
128
- Caps the number of admitted-but-not-yet-completed jobs in each
129
- partition. Tracks in-flight counts in
130
- `dispatch_policy_partition_counts`; decremented by the `around_perform`
131
- hook when the job finishes, or by the reaper when a lease expires
132
- (worker crashed).
167
+ If you need genuinely different scopes per gate (throttle by endpoint
168
+ AND concurrency by account, each enforced at its own scope), **split
169
+ into two policies** and chain them: the staging policy admits, its
170
+ worker enqueues into the second.
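+
+ A rough sketch of that chaining, with hypothetical job and policy names
+ (only DSL calls shown elsewhere in this README are used):
+
+ ```ruby
+ class StageEndpointFetchJob < ApplicationJob
+   dispatch_policy :endpoints do
+     context ->(args) { { endpoint_id: args.first.endpoint_id } }
+     partition_by ->(ctx) { ctx[:endpoint_id] }
+     gate :throttle, rate: 100, per: 60
+   end
+
+   # Once this job is admitted and runs, it enqueues into the second policy.
+   def perform(event) = AccountDeliverJob.perform_later(event)
+ end
+
+ class AccountDeliverJob < ApplicationJob
+   dispatch_policy_inflight_tracking
+
+   dispatch_policy :accounts do
+     context ->(args) { { account_id: args.first.account_id } }
+     partition_by ->(ctx) { ctx[:account_id] }
+     gate :concurrency, max: 5
+   end
+
+   def perform(event) = event.deliver!
+ end
+ ```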
133
171
 
134
- ```ruby
135
- gate :concurrency,
136
- max: ->(ctx) { ctx[:max_per_account] || 5 },
137
- partition_by: ->(ctx) { "acct:#{ctx[:account_id]}" }
138
- ```
172
+ ## Gates
139
173
 
140
- When to reach for it: external APIs with per-tenant concurrency limits,
141
- database-heavy jobs you don't want to pile up per customer, anything
142
- where "at most N running at once for this key" matters.
174
+ Gates run in declared order; each narrows the survivor count. Every
175
+ option that takes a value can alternatively take a lambda receiving
176
+ the `ctx` hash, so parameters can depend on per-job data.
143
177
 
144
178
  ### `:throttle` — token-bucket rate limit per partition
145
179
 
146
- Refills `rate` tokens every `per` seconds, capped at `burst` (defaults
147
- to `rate`). Admits jobs while tokens are available; leaves the rest
148
- pending for the next tick.
180
+ Refills `rate` tokens every `per` seconds, capped at `rate` (no
181
+ separate burst). Admits jobs while tokens are available; leaves the
182
+ rest pending for the next tick. State is persisted in
183
+ `partitions.gate_state.throttle`.
149
184
 
150
185
  ```ruby
151
186
  gate :throttle,
152
- rate: 100, # tokens
153
- per: 1.minute, # refill window
154
- burst: 100, # bucket cap (optional, defaults to rate)
155
- partition_by: ->(ctx) { "host:#{ctx[:host]}" }
187
+ rate: ->(ctx) { ctx[:rate_limit] },
188
+ per: 1.minute
156
189
  ```
157
190
 
158
- `rate` and `burst` accept lambdas, so the limit can come from
159
- configuration stored alongside the thing being rate-limited:
191
+ Throttle does **not** release tokens on completion; tokens refill
192
+ only with elapsed time.
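+
+ As a sketch, the lazy refill at read time amounts to the following
+ (names illustrative only; the real gate lives in
+ `lib/dispatch_policy/gates/throttle.rb`):
+
+ ```ruby
+ def throttle_allowance(state, rate:, per:)
+   elapsed = Time.current - state[:refilled_at]
+   tokens  = [state[:tokens] + elapsed * (rate / per.to_f), rate].min  # capped at rate
+   tokens.floor  # this gate's contribution to min(allowed)
+ end
+ ```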
193
+
194
+ ### `:concurrency` — in-flight cap per partition
195
+
196
+ Caps the number of admitted-but-not-yet-completed jobs per partition.
197
+ Counts rows in `dispatch_policy_inflight_jobs` keyed by the policy's
198
+ canonical partition. Decremented by `InflightTracker.track`'s
199
+ `around_perform`; reaped by a periodic sweeper if a worker crashes.
160
200
 
161
201
  ```ruby
162
- gate :throttle,
163
- rate: ->(ctx) { ctx[:rate_limit] },
164
- per: 1.minute,
165
- partition_by: ->(ctx) { ctx[:endpoint_id] }
202
+ gate :concurrency,
203
+ max: ->(ctx) { ctx[:max_per_account] || 5 }
166
204
  ```
167
205
 
168
- Unlike `:concurrency`, throttle does **not** release tokens on job
169
- completion; tokens refill only with elapsed time.
206
+ When the cap is full, the gate returns `retry_after = full_backoff`
207
+ (default 1s) so the partition skips the next ticks instead of
208
+ hammering `count(*)` every iteration.
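+
+ Conceptually the gate's check is just (an illustrative helper, not the
+ gem's API):
+
+ ```ruby
+ def concurrency_allowance(policy_name, partition_key, max:, full_backoff: 1.0)
+   in_flight = DispatchPolicy::InflightJob.where(policy_name: policy_name,
+                                                 partition_key: partition_key).count
+   remaining = max - in_flight
+   remaining.positive? ? { allow: remaining } : { allow: 0, retry_after: full_backoff }
+ end
+ ```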
170
209
 
171
- ### `:global_cap` — single cap across all partitions
210
+ ### `:adaptive_concurrency` — per-partition cap that self-tunes
172
211
 
173
- A global version of `:concurrency`: at most `max` jobs admitted
174
- simultaneously across the whole policy, regardless of partition.
175
- Useful as a safety ceiling on top of per-partition limits.
212
+ Like `:concurrency` but the cap (`current_max`) shrinks when the
213
+ adapter queue backs up and grows when workers drain it quickly.
214
+ AIMD loop on a per-partition stats row in
215
+ `dispatch_policy_adaptive_concurrency_stats`.
176
216
 
177
217
  ```ruby
178
- gate :concurrency, max: 10, partition_by: ->(ctx) { ctx[:tenant] }
179
- gate :global_cap, max: 200
218
+ gate :adaptive_concurrency,
219
+ initial_max: 3,
220
+ target_lag_ms: 1000, # acceptable queue wait before backoff
221
+ min: 1 # floor; a partition can't be locked out
180
222
  ```
181
223
 
182
- Reads: "up to 10 in flight per tenant, but never more than 200 total".
224
+ - **Feedback signal**: `admitted_at → perform_start` (queue wait in
225
+ the real adapter). Pure saturation signal — slow performs in the
226
+ downstream service don't punish admissions if workers still drain
227
+ the queue quickly.
228
+ - **Growth**: `current_max += 1` per fast success.
229
+ - **Slow shrink**: `current_max *= 0.95` when EWMA lag > target.
230
+ - **Failure shrink**: `current_max *= 0.5` when `perform` raises.
231
+ - **Safety valve**: when `in_flight == 0` the gate floors `remaining`
232
+ at `initial_max` so a partition that AIMD shrunk to `min` during
233
+ a past burst can re-grow when it idles.
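+
+ Put together, one observation updates the stats row roughly like this
+ (a sketch: the field names, smoothing factor, and method shape are
+ assumptions; see `lib/dispatch_policy/gates/adaptive_concurrency.rb`):
+
+ ```ruby
+ def record_observation(stats, lag_ms:, failed:, target_lag_ms:, min: 1)
+   stats.ewma_lag_ms = 0.8 * stats.ewma_lag_ms + 0.2 * lag_ms  # smoothing factor assumed
+   stats.current_max =
+     if failed
+       [stats.current_max * 0.5, min].max       # failure shrink
+     elsif stats.ewma_lag_ms > target_lag_ms
+       [stats.current_max * 0.95, min].max      # slow shrink
+     else
+       stats.current_max + 1                    # additive growth
+     end
+ end
+ ```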
183
234
 
184
- ### `:fair_interleave` — round-robin ordering across partitions
235
+ #### Choosing `target_lag_ms`
185
236
 
186
- Not a filter but a reordering step. Groups the batch by its primary
187
- partition and interleaves, so no single partition can starve others
188
- even if it has many pending jobs.
237
+ It's the knob that trades latency for throughput. Rough guide:
189
238
 
190
- ```ruby
191
- gate :concurrency, max: 10, partition_by: ->(ctx) { "acct:#{ctx[:account_id]}" }
192
- gate :fair_interleave
193
- ```
239
+ - **Too low** (10–50 ms): the gate reacts to every tiny bump in
240
+ queue wait and shrinks aggressively. Workers idle while jobs sit
241
+ pending — overshoot.
242
+ - **Too high** (30 s+): the gate barely pushes back; throughput is
243
+ near-max but new admissions wait seconds before a worker picks
244
+ them up.
245
+ - **Reasonable starting point**: `≈ worker_threads × avg_perform_ms`.
246
+ E.g. 5 workers × 200 ms perform = 1000 ms means "queue depth up
247
+ to ~1 s is fine".
194
248
 
195
- Place it after a gate that assigned partitions; interleaving is keyed
196
- off the first partition a row picked up.
249
+ ## Fairness within a tick
197
250
 
198
- ### `:adaptive_concurrency` per-partition cap that self-tunes
251
+ When several partitions compete for admission inside the same tick,
252
+ the gem reorders them by **least-recently-active first** so a hot
253
+ partition with thousands of pending jobs cannot starve a cold one
254
+ that just woke up.
199
255
 
200
- The cap per partition (`current_max`) shrinks when the adapter queue
201
- backs up (EWMA of queue lag > `target_lag_ms`) or when performs raise;
202
- grows back by +1 when lag stays under target. AIMD loop on a
203
- per-partition stats row (`dispatch_policy_adaptive_concurrency_stats`).
256
+ The mechanism has two knobs: an EWMA half-life (controls *how* the
257
+ order is decided) and an optional global tick cap (controls *how
258
+ much* each partition is allowed in one tick).
204
259
 
205
- ```ruby
206
- gate :adaptive_concurrency,
207
- partition_by: ->(ctx) { ctx[:account_id] },
208
- initial_max: 3,
209
- target_lag_ms: 1000, # acceptable queue wait before admission
210
- min: 1 # floor so a partition can't lock out
211
- end
260
+ ### `fairness half_life:`
261
+
262
+ Each partition keeps `decayed_admits` and `decayed_admits_at`,
263
+ updated atomically inside the admit transaction:
264
+
265
+ ```
266
+ decayed_admits := decayed_admits * exp(-Δt / τ) + admitted
267
+ where τ = half_life / ln(2)
212
268
  ```
213
269
 
214
- - **Feedback signal**: `admitted_at → perform_start` (queue wait in the
215
- real adapter). Pure saturation signal; slow performs in the
216
- downstream service don't punish admissions if workers still drain
217
- the queue quickly.
218
- - **Growth**: +1 per fast success. No hard ceiling; the algorithm
219
- self-limits via `target_lag_ms`. If the queue builds up, the cap
220
- shrinks multiplicatively.
221
- - **Failure**: `current_max *= 0.5` (halve) when `perform` raises.
222
- - **Slow**: `current_max *= 0.95` when EWMA lag > target.
270
+ After `half_life` seconds without admitting, the value halves. The
271
+ Tick sorts the claimed batch by current `decayed_admits` ASC, so the
272
+ under-admitted go first.
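+
+ In Ruby terms, the in-TX update is roughly (column names as above;
+ `half_life` and `admitted` come from the surrounding tick, the rest is
+ assumed):
+
+ ```ruby
+ tau = half_life / Math.log(2)
+ dt  = Time.current - partition.decayed_admits_at
+ partition.update!(
+   decayed_admits:    partition.decayed_admits * Math.exp(-dt / tau) + admitted,
+   decayed_admits_at: Time.current
+ )
+ ```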
223
273
 
224
- ### Choosing `target_lag_ms`
274
+ | Value | Behaviour |
275
+ |-----------|------------------------------------------------------------------------------|
276
+ | 5–10 s | Reacts to brief pauses. Bursty workloads where short stalls deserve a head start. |
277
+ | **60 s** (default) | Stable steady-state. Hot partitions stay "hot" through normal latency variation. |
278
+ | 5–15 min | Long memory. Burst on partition A penalises A for many minutes. |
225
279
 
226
- It's the knob that trades latency for throughput. Rough guide:
280
+ Set `c.fairness_half_life_seconds = nil` to disable the reorder
281
+ entirely — partitions are processed in `claim_partitions` order
282
+ (last-checked-first).
227
283
 
228
- - **Too low** (e.g. 10-50 ms). The gate reacts to every tiny bump in
229
- queue wait and shrinks the cap aggressively. Workers can end up
230
- idle with jobs still pending admission because the cap is
231
- overcorrecting: classic contention / overshoot.
232
- - **Too high** (e.g. 30 s). The gate barely ever pushes back, so
233
- you get near-maximum throughput at the cost of real queue buildup;
234
- newly admitted jobs may wait seconds before a worker picks them
235
- up.
236
- - **Reasonable starting point**: `≈ worker_max_threads × avg_perform_ms`.
237
- If you run 5 workers at ~200 ms/perform, `target_lag_ms: 1000`
238
- means "it's OK if the adapter queue stays at most ~1 second
239
- deep". You'll want to tune from there based on what your
240
- downstream tolerates and how fast you want bursts to drain.
241
-
242
- Pair it with `round_robin_by` for multi-tenant systems that want
243
- automatic backpressure without hand-tuned caps per tenant:
284
+ ### `tick_admission_budget`
285
+
286
+ Without this, each partition admits up to `admission_batch_size`.
287
+ With it set, the per-partition ceiling becomes `fair_share = ceil(cap
288
+ / claimed_partitions)`. Pass-1 walks the (decay-sorted) partitions
289
+ giving each up to `fair_share`; pass-2 redistributes any leftover to
290
+ those that filled their share.
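+
+ For example, with `tick_admission_budget 200` and 5 claimed partitions,
+ `fair_share = ceil(200 / 5) = 40`; if two partitions only have 10
+ pending each, the 60 unused slots are re-offered in pass-2 to the
+ partitions that used their full 40.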
244
291
 
245
292
  ```ruby
246
- round_robin_by ->(args) { args.first[:account_id] }
247
- gate :adaptive_concurrency,
248
- partition_by: ->(ctx) { ctx[:account_id] },
249
- initial_max: 3,
250
- target_lag_ms: 1000
293
+ DispatchPolicy.configure do |c|
294
+ c.fairness_half_life_seconds = 60
295
+ c.tick_admission_budget = nil # default — no global cap
296
+ end
297
+
298
+ # Per-policy override:
299
+ dispatch_policy :endpoints do
300
+ partition_by ->(c) { c[:endpoint_id] }
301
+ fairness half_life: 30.seconds
302
+ tick_admission_budget 200
303
+ gate :throttle, rate: 100, per: 60
304
+ end
251
305
  ```
252
306
 
253
- ## Queues and partitioning
307
+ When the cap is hit before all partitions admit, the rest are denied
308
+ with reason `tick_cap_exhausted`. They were still observed
309
+ (`last_checked_at` bumped), so they're at the front of the next
310
+ tick's order.
311
+
312
+ ### Anti-stagnation
254
313
 
255
- DispatchPolicy operates at the **policy** (class) level. A job's
256
- ActiveJob `queue` and `priority` travel through staging into admission
257
- and on to the real adapter (workers of each queue pick up their jobs
- normally), but neither affects which staged rows the gates see. All
259
- enqueues of the same job class share one policy, one throttle bucket,
260
- one concurrency cap.
314
+ The decay-based reorder only applies to partitions already claimed.
315
+ Selection (`Repository.claim_partitions`) still orders by
316
+ `last_checked_at NULLS FIRST, id`. Every active partition with
317
+ pending jobs is visited in at most ⌈N / partition_batch_size⌉ ticks
318
+ regardless of how hot or cold it is.
261
319
 
262
- Two consequences to be aware of:
320
+ ### Mixing `:adaptive_concurrency` with fairness
263
321
 
264
- - Enqueuing the same job to different queues does **not** give one
265
- queue priority at admission; they share the policy's gates. If
266
- urgent work should jump ahead, set a lower ActiveJob `priority`
267
- (the admission SELECT is `ORDER BY priority, staged_at`) — or split
268
- into a subclass with its own policy.
269
- - `dedupe_key` is queue-agnostic: the same key enqueued to
270
- `:urgent` and `:low` dedupes to one row.
322
+ Adaptive and fairness operate at different layers and compose
323
+ without sharing state:
271
324
 
272
- ### Using queue as a partition
325
+ - **Fairness** writes `partitions.decayed_admits` inside the
326
+ per-partition admit TX.
327
+ - **Adaptive** writes `dispatch_policy_adaptive_concurrency_stats`
328
+ from the worker's `around_perform` via `record_observation`.
273
329
 
274
- The context hash has `queue_name` and `priority` injected automatically
275
- at stage time (user-supplied keys win). Use them in any `partition_by`:
330
+ Different tables, different locks. Each tick the actual admit_count
331
+ becomes `min(fair_share, current_max - in_flight)` (with the
332
+ adaptive safety valve when `in_flight == 0`). Fairness picks order +
333
+ budget per tick; adaptive shapes how aggressively each partition
334
+ consumes its share.
276
335
 
277
336
  ```ruby
278
- class SendEmailJob < ApplicationJob
279
- include DispatchPolicy::Dispatchable
337
+ dispatch_policy :tenants do
338
+ partition_by ->(c) { c[:tenant] }
280
339
 
281
- dispatch_policy do
282
- context ->(args) { { account_id: args.first.account_id } }
340
+ gate :adaptive_concurrency,
341
+ initial_max: 5,
342
+ target_lag_ms: 1000,
343
+ min: 1
283
344
 
284
- # Separate throttle bucket per (queue, account) — urgent and default
285
- # don't share rate tokens.
286
- gate :throttle,
287
- rate: 100,
288
- per: 1.minute,
289
- partition_by: ->(ctx) { "#{ctx[:queue_name]}:#{ctx[:account_id]}" }
290
- end
345
+ fairness half_life: 30.seconds
346
+ tick_admission_budget 60
291
347
  end
292
-
293
- SendEmailJob.set(queue: :urgent).perform_later(user)
294
- SendEmailJob.set(queue: :default).perform_later(user)
295
- # → two partitions, each with its own bucket.
296
348
  ```
297
349
 
298
- If you'd rather keep the two streams fully isolated (separate policies,
299
- admin rows, and dedupe scopes), subclass:
350
+ The dummy `AdaptiveDemoJob` declares both; the storm form drives it
351
+ across many tenants with a triangular weight distribution so you can
352
+ watch the EWMA reorder hot tenants AND the AIMD shrink their cap.
353
+ Integration test: `test/integration/adaptive_with_fairness_test.rb`.
354
+
355
+ ## Sharding a policy across worker pools
356
+
357
+ Shards split the tick work horizontally: each tick worker sees only
358
+ the partitions on its own shard, so multiple workers can admit in
359
+ parallel for the same policy. Declare a `shard_by`:
300
360
 
301
361
  ```ruby
302
- class UrgentEmailJob < SendEmailJob
303
- queue_as :urgent
304
- dispatch_policy do
305
- context ->(args) { { account_id: args.first.account_id } }
306
- gate :throttle, rate: 500, per: 1.minute, partition_by: ->(ctx) { ctx[:account_id] }
307
- end
362
+ dispatch_policy :events do
363
+ context ->(args) { { account_id: args.first[:account_id] } }
364
+ partition_by ->(c) { "acct:#{c[:account_id]}" }
365
+ shard_by ->(c) { "events-shard-#{c[:account_id].hash.abs % 4}" }
366
+
367
+ gate :concurrency, max: 50
308
368
  end
309
369
  ```
310
370
 
311
- ## Dedupe
371
+ Run one `DispatchTickLoopJob` per shard:
372
+
373
+ ```ruby
374
+ 4.times { |i| DispatchTickLoopJob.perform_later("events", "events-shard-#{i}") }
375
+ ```
376
+
377
+ The generated `DispatchTickLoopJob` template uses
378
+ `queue_as { arguments[1] }` so each tick is enqueued on the same
379
+ queue it monitors. Workers listening on `events-shard-*` queues run
380
+ both the tick loops and the admitted jobs from one pool per shard.
312
381
 
313
- `dedupe_key` is enforced by a partial unique index on
314
- `(policy_name, dedupe_key) WHERE completed_at IS NULL`. Semantics:
382
+ The gem's automatic context enrichment puts `:queue_name` into the
383
+ ctx hash so `shard_by` can use it directly without your `context`
384
+ proc having to know about it.
315
385
 
316
- - Re-enqueuing while a previous staged row is pending or admitted →
317
- silently dropped.
318
- - Re-enqueuing after the previous completes → fresh staged row.
319
- - Returning `nil` from the lambda → no dedup for that enqueue.
386
+ **`shard_by` must be as coarse as the most restrictive throttle's
387
+ scope.** If not, the bucket duplicates across shards and the
388
+ effective rate becomes `rate × N_shards`.
320
389
 
321
- Typical pattern: `"<domain>:<entity>:<id>"` (`"monitor:42"`,
322
- `"event:abc123"`). Keep it stable for the duration of a logical unit
323
- of work.
390
+ ## Atomic admission
324
391
 
325
- ## Round-robin batching (tenant fairness)
392
+ `Forwarder.dispatch` runs inside the per-partition admission
393
+ transaction. The adapter (good_job, solid_queue) uses
394
+ `ActiveRecord::Base.connection`, so its `INSERT INTO good_jobs`
395
+ joins the same TX as the `DELETE FROM staged_jobs` and the `INSERT
396
+ INTO inflight_jobs`. Any exception (deserialize, adapter error,
397
+ network) rolls everything back atomically — no window where staged
398
+ is gone but the adapter never received the job.
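+
+ The shape of that transaction, as a sketch (method names from the flow
+ diagram above; argument shapes are assumptions):
+
+ ```ruby
+ ActiveRecord::Base.transaction do
+   jobs = Repository.claim_staged_jobs!(partition, limit: admit_count)
+   Repository.insert_inflight!(jobs)
+   Forwarder.dispatch(jobs)  # the adapter INSERT runs on the same connection, same TX
+ end                         # any raise above rolls all three writes back
+ ```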
326
399
 
327
- For policies where every tenant should keep making progress even
328
- when one suddenly enqueues 100× its normal volume, neither throttle
329
- nor concurrency is a good fit; you want max throughput, just
330
- fairness. `round_robin_by` solves it at the batch SELECT layer:
400
+ The trade-off: the gem requires a PG-backed adapter for
401
+ at-least-once. The railtie warns at boot if the adapter doesn't
402
+ look PG-shared (Sidekiq, Resque, async, …) but doesn't hard-fail;
403
+ a custom PG-backed adapter we don't recognise can still work.
404
+
405
+ For Rails multi-DB (e.g. solid_queue on a separate `:queue` role):
331
406
 
332
407
  ```ruby
333
- dispatch_policy do
334
- context ->(args) { { account_id: args.first.account_id } }
335
- round_robin_by ->(args) { args.first.account_id }
408
+ DispatchPolicy.configure do |c|
409
+ c.database_role = :queue
336
410
  end
337
411
  ```
338
412
 
339
- At stage time the lambda's result is written into the dedicated
340
- `round_robin_key` column (indexed). `Tick.run` then uses a two-phase
341
- fetch:
413
+ `Repository.with_connection` wraps the admission TX in
414
+ `connected_to(role:)` when set. Staging tables and the adapter's
415
+ table must live in the same DB for atomicity to hold.
342
416
 
343
- 1. **LATERAL join** — distinct keys × per-key `LIMIT round_robin_quantum`.
344
- Guarantees each active tenant gets at least `quantum` rows per
345
- tick, so a tenant with 10 pending is served in the same tick as
346
- a tenant with 50k pending.
347
- 2. **Top-up** — if the fairness floor doesn't fill `batch_size`, the
348
- remaining slots go to the oldest pending (excluding the ids
349
- already locked). Keeps single-tenant throughput at full capacity.
417
+ ## Running the tick
350
418
 
351
- Cost per tick is O(`quantum × active_keys`), not O(backlog) — so the
352
- admin stays snappy even with thousands of distinct tenants.
419
+ `DispatchPolicy::TickLoop.run(policy_name:, shard:, stop_when:)` is
420
+ the entry point. It claims partitions under `FOR UPDATE SKIP
421
+ LOCKED`, evaluates gates, atomically admits, and updates partition
422
+ state. The install generator scaffolds a `DispatchTickLoopJob` you
423
+ schedule like any other ActiveJob:
353
424
 
354
- ## Running the tick
425
+ ```ruby
426
+ DispatchTickLoopJob.perform_later # all policies
427
+ DispatchTickLoopJob.perform_later("endpoints") # one policy
428
+ DispatchTickLoopJob.perform_later("endpoints", "shard-2")
429
+ ```
430
+
431
+ Each job uses `good_job_control_concurrency_with` (or solid_queue's
432
+ `limits_concurrency`) so only one tick is active per
433
+ (policy, shard) combination at a time. The job re-enqueues itself
434
+ with a 1-second tail wait, so the loop survives normal restarts.
435
+
436
+ ## Admin UI
437
+
438
+ Mount the engine and visit `/dispatch_policy`:
439
+
440
+ - **Dashboard** — totals, throughput windows, round-trip stats,
441
+ capacity gauges (admit rate vs adapter ceiling, avg tick vs
442
+ `tick_max_duration`), pending trend with up/down arrow, auto-hints
443
+ ("avg tick at 88% of tick_max_duration — shard or lower
444
+ admission_batch_size").
445
+ - **Policies** — per-policy throughput, denial reasons breakdown,
446
+ top partitions by lifetime/pending, pause/resume/drain.
447
+ - **Partitions** — searchable list, detail view with gate state,
448
+ decayed_admits + admits/min estimate, recent staged jobs,
449
+ force-admit, drain.
355
450
 
356
- The gem exposes `DispatchPolicy::TickLoop.run(policy_name:, stop_when:)`
357
- but **does not ship a tick job**: concurrency semantics are
358
- queue-adapter specific (GoodJob's `total_limit`, Sidekiq Enterprise
359
- uniqueness, etc.), so you write a small job in your app that wraps
360
- the loop with whatever dedup your adapter provides. Example for
361
- GoodJob:
451
+ The UI auto-refreshes via Turbo morph + a controllable picker
452
+ (off / 2s / 5s / 10s) stored in sessionStorage; preserves scroll
453
+ position; and skips a refresh while a previous Turbo visit is in
454
+ flight so a slow page doesn't stack visits.
455
+
456
+ CSRF and forgery protection use the host app's settings. The UI
457
+ ships unauthenticated; wrap the `mount` with a constraint or
458
+ `before_action` for auth in production.
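+
+ One way to gate it, assuming a Warden/Devise-backed session (adapt to
+ your own auth):
+
+ ```ruby
+ # config/routes.rb
+ constraints ->(request) { request.env["warden"]&.user&.admin? } do
+   mount DispatchPolicy::Engine, at: "/dispatch_policy"
+ end
+ ```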
459
+
460
+ ## Configuration
362
461
 
363
462
  ```ruby
364
- # app/jobs/dispatch_tick_loop_job.rb
365
- class DispatchTickLoopJob < ApplicationJob
366
- include GoodJob::ActiveJobExtensions::Concurrency
367
- good_job_control_concurrency_with(
368
- total_limit: 1,
369
- key: -> { "dispatch_tick_loop:#{arguments.first || 'all'}" }
370
- )
371
-
372
- def perform(policy_name = nil)
373
- deadline = Time.current + DispatchPolicy.config.tick_max_duration
374
- DispatchPolicy::TickLoop.run(
375
- policy_name: policy_name,
376
- stop_when: -> {
377
- GoodJob.current_thread_shutting_down? || Time.current >= deadline
378
- }
379
- )
380
- # Self-chain so the next run starts immediately; cron below is a safety net.
381
- DispatchTickLoopJob.set(wait: 1.second).perform_later(policy_name)
382
- end
463
+ # config/initializers/dispatch_policy.rb
464
+ DispatchPolicy.configure do |c|
465
+ c.tick_max_duration = 25 # seconds the tick job stays admitting
466
+ c.partition_batch_size = 50 # partitions claimed per tick iteration
467
+ c.admission_batch_size = 100 # max jobs admitted per partition per iteration
468
+ c.idle_pause = 0.5 # seconds slept when a tick admits nothing
469
+ c.partition_inactive_after = 86_400 # GC partitions idle this long
470
+ c.inflight_stale_after = 300 # GC inflight rows whose worker stopped heartbeating
471
+ c.inflight_heartbeat_interval = 30 # how often the worker bumps heartbeat_at
472
+ c.sweep_every_ticks = 50 # sweeper cadence (in tick iterations)
473
+ c.metrics_retention = 86_400 # tick_samples kept this long
474
+ c.fairness_half_life_seconds = 60 # EWMA half-life for in-tick reorder; nil disables
475
+ c.tick_admission_budget = nil # global cap on admissions per tick; nil = none
476
+ c.adapter_throughput_target = nil # jobs/sec; UI shows admit rate as % of this
477
+ c.database_role = nil # AR role for the admission TX (multi-DB)
383
478
  end
384
479
  ```
385
480
 
386
- Schedule it (every 10s as a safety net — the self-chain keeps one
387
- alive under normal operation):
481
+ You can override `admission_batch_size`, `fairness_half_life_seconds`,
482
+ and `tick_admission_budget` per policy via the DSL.
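+
+ For instance (the DSL call forms mirror the ones above; values are
+ hypothetical):
+
+ ```ruby
+ dispatch_policy :endpoints do
+   partition_by ->(c) { c[:endpoint_id] }
+   admission_batch_size 25       # this policy admits at most 25 per partition per tick
+   fairness half_life: 2.minutes
+ end
+ ```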
388
483
 
389
- ```ruby
390
- # config/application.rb
391
- config.good_job.cron = {
392
- dispatch_tick_loop: {
393
- cron: "*/10 * * * * *",
394
- class: "DispatchTickLoopJob"
395
- }
396
- }
484
+ ## `partitions.context` is refreshed on every enqueue
485
+
486
+ When you call `perform_later`, the gem evaluates your `context` proc
487
+ and upserts the partition row with the resulting hash:
488
+
489
+ ```sql
490
+ INSERT INTO dispatch_policy_partitions (..., context, context_updated_at, ...) VALUES (...)
491
+ ON CONFLICT (policy_name, partition_key) DO UPDATE
492
+ SET context = EXCLUDED.context,
493
+ context_updated_at = EXCLUDED.context_updated_at,
494
+ pending_count = dispatch_policy_partitions.pending_count + 1,
495
+ ...
397
496
  ```
398
497
 
399
- For adapters without a first-class dedup mechanism, implement it
400
- yourself (e.g. `pg_try_advisory_lock` inside `perform`) before calling
401
- `DispatchPolicy::TickLoop.run`.
498
+ Gates evaluate against `partition.context`, **not** the per-job
499
+ snapshot in `staged_jobs.context`. So if a tenant bumps their
500
+ `dispatch_concurrency` from 5 to 20 and a new job arrives, the next
501
+ admission uses the new value — no need to drain the partition
502
+ first. If a partition has no new traffic, the context stays at the
503
+ value seen by the last enqueue.
402
504
 
403
- ## Admin UI
505
+ ## Retry strategies
404
506
 
405
- `DispatchPolicy::Engine` ships a read-only admin mounted wherever
406
- you like. Features:
407
-
408
- - Policy index with pending / admitted / completed-24h totals.
409
- - Per-policy page with a **partition breakdown** (watched + searchable
410
- list) showing pending-eligible / pending-scheduled / in-flight /
411
- completed / adaptive cap / EWMA latency / last enqueue / last
412
- dispatch per partition.
413
- - Line chart of avg EWMA queue lag (last hour, per minute) with
414
- completions-per-minute bars behind it.
415
- - Per-partition sparkline with the same overlay; click to watch /
416
- unwatch. Watched set is persisted in `localStorage` and synced into
417
- the URL so reloading keeps your view.
418
- - Opt-in auto-refresh (off / 2s / 5s / 15s) stored in `localStorage`.
419
- Page updates via Turbo morph — scroll position and tooltips survive.
507
+ By default a retry produced by `retry_on` re-enters the policy and
508
+ is staged again, so throttle/concurrency apply equally to first
509
+ attempts and retries. Use `retry_strategy :bypass` if you want
510
+ retries to skip the gem and go straight to the adapter:
511
+
512
+ ```ruby
513
+ dispatch_policy :foo do
514
+ partition_by ->(_c) { "k" }
515
+ gate :throttle, rate: 5, per: 60
516
+ retry_strategy :bypass
517
+ end
518
+ ```
519
+
520
+ ## Compatibility
521
+
522
+ - Rails 7.1+ (developed against 8.1).
523
+ - PostgreSQL 12+ (uses `FOR UPDATE SKIP LOCKED`, `JSONB`, `ON CONFLICT`).
524
+ - `good_job` ≥ 4.0 or `solid_queue` ≥ 1.0.
525
+ - Sidekiq / Resque are NOT supported — the at-least-once guarantee
526
+ needs the adapter to share Postgres with the gem.
420
527
 
421
528
  ## Testing
422
529
 
423
- ```
424
- bundle install
425
- bundle exec rake test
530
+ ```bash
531
+ bundle exec rake test # 124 runs / 284 assertions
532
+ bundle exec rake bench # manual benchmark suite (creates dispatch_policy_bench DB)
533
+ bundle exec rake bench:real # end-to-end against good_job on the dummy DB
534
+ bundle exec rake bench:limits # stretches every path to its breaking point
426
535
  ```
427
536
 
428
- Tests require a PostgreSQL instance (uses `ON CONFLICT`, partial
429
- indexes, `FOR UPDATE SKIP LOCKED`, `jsonb`). `PGUSER` / `PGHOST` /
430
- `PGPASSWORD` env vars override the defaults in
431
- `test/dummy/config/database.yml`.
537
+ Integration tests skip when no Postgres is reachable (default DB
538
+ `dispatch_policy_test`; override via `DB_NAME`, `DB_HOST`,
539
+ `DB_USER`, `DB_PASS`).
540
+
541
+ ## Releasing
542
+
543
+ Cutting a new version is driven by `bin/release`. Steps:
544
+
545
+ 1. Bump `DispatchPolicy::VERSION` in
546
+ `lib/dispatch_policy/version.rb`.
547
+ 2. Add a `## <VERSION>` section in `CHANGELOG.md` describing the
548
+ release. The script extracts that section verbatim as the
549
+ GitHub release notes, so anything missing here will be missing
550
+ on GitHub.
551
+ 3. Commit both on `master` and push so `origin/master` matches
552
+ local.
553
+ 4. Run the script from the repo root:
554
+
555
+ ```bash
556
+ bin/release
557
+ ```
558
+
559
+ The script:
560
+
561
+ - Refuses to run unless you are on `master`, the working tree is
562
+ clean, the local branch matches `origin/master`, and the tag
563
+ `v<VERSION>` does not yet exist.
564
+ - Asks for a `y` confirmation before doing anything.
565
+ - Hands off to `bundle exec rake release` (builds the gem, creates
566
+ the `v<VERSION>` tag, pushes the tag to GitHub, pushes the gem to
567
+ RubyGems.org).
568
+ - Creates a GitHub release for `v<VERSION>` using the matching
569
+ CHANGELOG section as the body. Requires the `gh` CLI; if it is
570
+ missing, the gem ships but you'll need to create the GitHub
571
+ release manually with `gh release create v<VERSION> --notes-file
572
+ CHANGELOG.md`.
573
+
574
+ Prerequisites: a configured `~/.gem/credentials` for RubyGems push
575
+ and `gh auth login` for the GitHub release.
576
+
577
+ ## Status
578
+
579
+ Published on RubyGems. API may still shift between minors until
580
+ 1.0. The set of features that ship today:
581
+
582
+ - Gates: `:throttle`, `:concurrency`, `:adaptive_concurrency`.
583
+ - Fairness: in-tick EWMA reorder + optional `tick_admission_budget`.
584
+ - Sharding: `shard_by` + per-shard tick loops.
585
+ - Bulk handoff: `ActiveJob.perform_all_later` collapses to one
586
+ adapter `INSERT` per tick when admissible.
587
+ - Admin UI with capacity hints, pending trend, denial reasons.
588
+ - Manual benchmark suite.
589
+
590
+ Deferred ideas (with rationale) live in [`IDEAS.md`](IDEAS.md):
591
+ `gate :global_cap`, smarter sweeper defaults, `sweep_every_seconds`
592
+ instead of `sweep_every_ticks`.
432
593
 
433
594
  ## License
434
595