dispatch_policy 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +31 -0
- data/README.md +117 -1
- data/app/controllers/dispatch_policy/policies_controller.rb +39 -13
- data/app/models/dispatch_policy/partition_observation.rb +34 -7
- data/app/views/dispatch_policy/policies/show.html.erb +8 -0
- data/db/migrate/20260425000001_add_duration_to_partition_observations.rb +8 -0
- data/lib/dispatch_policy/dispatchable.rb +7 -4
- data/lib/dispatch_policy/gates/throttle.rb +5 -1
- data/lib/dispatch_policy/policy.rb +14 -1
- data/lib/dispatch_policy/tick.rb +89 -2
- data/lib/dispatch_policy/version.rb +1 -1
- data/lib/dispatch_policy.rb +14 -8
- metadata +2 -2
- data/lib/dispatch_policy/install_generator.rb +0 -23
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 6eb153991642d0669fffd7cd3c8c2133c837978faf934bf6b14bf44ae8628907
+ data.tar.gz: 6ce8ff07f09fbd7763cb191b6305f316106b21a5f11b20a25ecdac3e3e2e1f2c
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: ea97e0959378bedd5a024888a8beab4880b30025216102b420b03deb5dff0c0b2ffd07beec2eb8e37d713ea3ae763378c8e50b143850563c01bf17bfb1ff47d9
+ data.tar.gz: 5ca3f5205c79ab6c8cbe76cd60820602b5ed69bb57c80109dd0ac240988a7a021a3473c3d0c67ae14181913516bae3cdf46e00ec47cf7310a22c8bf49da087ae
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,36 @@
  # Changelog

+ ## 0.2.0
+
+ ### Added
+ - `round_robin_by` supports `weight: :time` to balance per-tick quanta
+   by recent perform compute time instead of by request count (#3).
+ - GitHub Actions CI matrix covering Ruby 3.4 and Rails 7.2 / 8.1 (#5).
+ - Integration tests for gate combinations and throttle bucket
+   boundaries (#12).
+ - Resilience tests covering failure paths and dedupe state transitions
+   (#13).
+ - `bin/release` wrapper around `rake release` (#2).
+
+ ### Changed
+ - Admin partition breakdown caps its aggregations to keep the page
+   responsive on policies with many partitions (#9).
+ - Admin pending list no longer loads the `arguments` jsonb column
+   (#6).
+
+ ### Fixed
+ - Admission is reverted when the underlying adapter silently declines
+   to enqueue, so the staged row doesn't stay marked as admitted (#14).
+ - `consumed_ms_by_partition` window is padded to survive
+   minute-boundary races in the time-weighted round-robin fetch (#11).
+ - ThrottleBucket row locks are taken in a deterministic key order to
+   remove a deadlock window when multiple ticks contend on the same
+   set of partitions (#8).
+
+ ### Removed
+ - Stale custom `InstallGenerator` — the engine's migration generator
+   is the supported install path (#7).
+
  ## 0.1.0

  Initial release.
data/README.md
CHANGED

@@ -29,7 +29,17 @@ Use it when you need:
    and grows back when workers keep up, without manual tuning.
  - **Dedupe** against a partial unique index, not an in-memory key.
  - **Round-robin fairness across tenants** (LATERAL batch fetch) so one
-   tenant's burst can't starve the others
+   tenant's burst can't starve the others — including a **time-weighted
+   variant** that balances total compute time per tenant when their
+   performs have very different durations.
+
+ ## Demo
+
+ A runnable playground that exercises every gate and the admin UI lives
+ at [ceritium/dispatch_policy-demo](https://github.com/ceritium/dispatch_policy-demo).
+ Clone it, `bundle && rails db:setup`, and use the in-browser forms to
+ fire jobs through throttle / concurrency / adaptive / round-robin
+ policies while the admin UI updates in real time.

  ## Install


@@ -117,6 +127,11 @@ end

  `perform_later` stages the job; the tick admits it when its gates pass.

+ For the common multi-tenant webhook case (mixed-latency tenants behind
+ a shared pool) skip ahead to [Recipes](#multi-tenant-webhook-delivery)
+ — `round_robin_by weight: :time` plus `:adaptive_concurrency` covers
+ it without an explicit throttle.
+
  ## Gates

  Gates run in declared order, each narrowing the survivor set. Any option

@@ -351,6 +366,81 @@ fetch:
  Cost per tick is O(`quantum × active_keys`), not O(backlog) — so the
  admin stays snappy even with thousands of distinct tenants.

+ ### Time-weighted variant
+
+ Equal-quanta round-robin gives every active tenant the same number of
+ admissions per tick — fair by *count*. If your tenants have very
+ different per-job durations (slow webhooks, varied report sizes) and
+ you want to balance the *total compute time* each consumes, pass
+ `weight: :time`:
+
+ ```ruby
+ round_robin_by ->(args) { args.first[:account_id] }, weight: :time
+ ```
+
+ Solo tenants are unaffected — the fetch falls through to the trailing
+ top-up and they consume up to `batch_size` per tick. When multiple
+ tenants are active, each one's quantum is sized inversely to how much
+ compute time it has used in the last `window` seconds (default 60),
+ sourced from `dispatch_policy_partition_observations`. So if `slow`
+ has burned 20 s of perform time recently and `fast` has burned 200 ms,
+ this tick `fast` claims ~99% of `batch_size` while `slow` gets the
+ floor — total compute per minute stays balanced and you don't need a
+ throttle on top.
+
+ ## Recipes
+
+ ### Multi-tenant webhook delivery
+
+ Mixed-latency tenants behind a shared worker pool — exactly the case
+ that motivated `weight: :time` and adaptive concurrency. Pair them:
+
+ ```ruby
+ class WebhookDeliveryJob < ApplicationJob
+   include DispatchPolicy::Dispatchable
+
+   dispatch_policy do
+     context ->(args) { { account_id: args.first[:account_id] } }
+
+     # Fetch-level fairness by *compute time* (not request count). When
+     # several accounts compete, per-tick quanta are sized inverse to
+     # their recent perform duration; solo accounts top up to batch_size.
+     round_robin_by ->(args) { args.first[:account_id] },
+                    weight: :time, window: 60
+
+     # Drip-feed admission per account based on adapter queue lag.
+     # Without this, a single account with thousands of pending could
+     # dump batch_size jobs into the adapter queue in one tick and lose
+     # the ability to react to performance changes mid-burst.
+     gate :adaptive_concurrency,
+          partition_by: ->(ctx) { ctx[:account_id] },
+          initial_max: 3,
+          target_lag_ms: 500
+   end
+
+   def perform(account_id:, **) = WebhookClient.deliver!(account_id)
+ end
+ ```
+
+ What you get with no throttle, no manual tuning:
+
+ - A solo account runs at whatever throughput its downstream allows;
+   `:adaptive_concurrency` grows `current_max` while queue lag stays
+   under `target_lag_ms`.
+ - A slow account (1 s/perform) and a fast account (100 ms/perform)
+   competing → `weight: :time` gives the fast one most of each tick's
+   budget; the slow one's adaptive cap shrinks toward `min`. Total
+   compute time per minute stays balanced and the adapter queue
+   doesn't pile up behind whichever tenant happened to enqueue first.
+ - A misbehaving downstream that suddenly goes from 100 ms to 5 s →
+   that tenant's `current_max` drops within a few completions and its
+   fetch quantum shrinks; the other tenants are unaffected.
+
+ Tune `target_lag_ms` for the latency budget you can tolerate (see
+ [Choosing target_lag_ms](#choosing-target_lag_ms)) and `window` for
+ how reactive the time-balancing should be (smaller = noisier, larger
+ = more stable).
+
  ## Running the tick

  The gem exposes `DispatchPolicy::TickLoop.run(policy_name:, stop_when:)`
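A rough worked example of the quantum arithmetic described in the time-weighted variant section above. It is illustrative only, not part of the package: the 500 batch size and the 100 ms floor are the defaults that appear elsewhere in this diff, while the consumed figures are invented.

```ruby
# Illustrative only: mirrors the inverse-of-consumed weighting used by
# fetch_time_weighted_batch (see tick.rb later in this diff).
batch_size = 500                                  # config default
floor_ms   = 100                                  # DEFAULT_TIME_SHARE_DURATION_MS
consumed   = { "slow" => 20_000, "fast" => 200 }  # perform ms in the last window (invented)

weights = consumed.transform_values { |ms| 1.0 / [ms, floor_ms].max }
total   = weights.values.sum
quanta  = weights.transform_values { |w| [(batch_size * w / total).floor, 1].max }

quanta # => {"slow"=>4, "fast"=>495}, so fast claims ~99% of the tick's budget
```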
@@ -430,6 +520,32 @@ indexes, `FOR UPDATE SKIP LOCKED`, `jsonb`). `PGUSER` / `PGHOST` /
  `PGPASSWORD` env vars override the defaults in
  `test/dummy/config/database.yml`.

+ ## Releasing
+
+ The gem uses the standard `bundler/gem_tasks` flow — there is no
+ release automation in CI. To cut a new version:
+
+ 1. Bump `DispatchPolicy::VERSION` in `lib/dispatch_policy/version.rb`
+    following SemVer. While the API is marked experimental, breaking
+    changes go in a minor bump and should be called out in the changelog.
+ 2. Add a section to `CHANGELOG.md` above the previous one, grouping
+    entries (Added / Changed / Fixed / Removed). Link any relevant PRs.
+ 3. Make sure the working tree is on `master`, clean, and CI is green
+    (`bundle exec rake test` locally for a sanity check).
+ 4. Commit: `git commit -am "Release vX.Y.Z"`.
+ 5. `bundle exec rake release` — Bundler will build the `.gem` into
+    `pkg/`, tag `vX.Y.Z`, push the commit and tag, and `gem push` to
+    RubyGems. The gemspec sets `rubygems_mfa_required`, so have your
+    OTP ready (`gem signin` first if you aren't authenticated).
+ 6. Optional: publish a GitHub release from the tag, e.g.
+    `gh release create vX.Y.Z --notes-from-tag`, or paste the
+    changelog section into the release notes.
+
+ If `rake release` fails partway through (e.g. RubyGems push rejects
+ the version), do not retry blindly — inspect what already happened
+ (tag created? commit pushed?) and clean up before re-running, since
+ Bundler won't re-tag an existing version.
+
  ## License

  MIT.
data/app/controllers/dispatch_policy/policies_controller.rb
CHANGED

@@ -59,7 +59,12 @@ module DispatchPolicy
  load_adaptive_chart_data
  @throttle_buckets = ThrottleBucket
    .where(policy_name: @policy_name).order(:gate_name, :partition_key).limit(50)
-
+ # Explicit select: don't load the `arguments` jsonb (job payload —
+ # may contain PII / tokens) into memory just to render six fields.
+ @pending_jobs = scope.pending
+   .select(:id, :dedupe_key, :round_robin_key, :priority, :staged_at, :not_before_at)
+   .order(:priority, :staged_at)
+   .limit(50)
  end

  private

@@ -85,8 +90,12 @@ module DispatchPolicy
  now = Time.current
  now_iso = now.iso8601
  since_24h = 24.hours.ago.iso8601
+ limit = DispatchPolicy.config.admin_partition_limit
+ @partition_breakdown_truncated = false

  adaptive_stats = AdaptiveConcurrencyStats.where(policy_name: @policy_name)
+   .order(updated_at: :desc)
+   .limit(limit)
    .pluck(:gate_name, :partition_key, :current_max, :ewma_latency_ms)
    .each_with_object({}) { |(g, k, c, l), h|
      h[[ g, k ]] = { current_max: c, ewma_latency_ms: l.to_f.round(1) }

@@ -107,25 +116,38 @@ module DispatchPolicy
    }
  }

+ # Each aggregation below is order-by-count + limited so that a
+ # policy with tens of thousands of distinct (context, round_robin_key)
+ # tuples can't pull megabytes of rows into memory per request. We
+ # show the top-N most-active partitions per axis and flip the
+ # truncation flag for the view banner.
+
  # Activity timestamps bounded to the last 24h so the scan stays on
  # an index-friendly slice of staged_jobs.
  activity_rows = scope
    .where("staged_at > ?", since_24h)
    .group(:context, :round_robin_key)
+   .order(Arel.sql("MAX(staged_at) DESC"))
+   .limit(limit)
    .pluck(
      :context,
      :round_robin_key,
      Arel.sql("MAX(staged_at)"),
      Arel.sql("MAX(admitted_at)")
    )
+ @partition_breakdown_truncated = true if activity_rows.size >= limit

  sources.each do |name, extract|
-   pending_counts = scope.pending.group(:context, :round_robin_key)
-
-
-
-
-
+   pending_counts = scope.pending.group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(
+       :context,
+       :round_robin_key,
+       Arel.sql("count(*) filter (where not_before_at is null or not_before_at <= '#{now_iso}')"),
+       Arel.sql("count(*) filter (where not_before_at > '#{now_iso}')")
+     )
+   @partition_breakdown_truncated = true if pending_counts.size >= limit
    pending_counts.each do |ctx, rr_key, eligible, scheduled|
      partition = extract.call(ctx, rr_key)
      row = rows[[ name, partition ]]

@@ -133,18 +155,22 @@ module DispatchPolicy
      row[:scheduled] += scheduled
    end

-   admitted_counts = scope.admitted.group(:context, :round_robin_key)
-
-
+   admitted_counts = scope.admitted.group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(:context, :round_robin_key, Arel.sql("count(*)"))
+   @partition_breakdown_truncated = true if admitted_counts.size >= limit
    admitted_counts.each do |ctx, rr_key, in_flight|
      partition = extract.call(ctx, rr_key)
      rows[[ name, partition ]][:in_flight] += in_flight
    end

    completed_counts = scope.completed.where("completed_at > ?", since_24h)
-     .group(:context, :round_robin_key)
-
-     )
+     .group(:context, :round_robin_key)
+     .order(Arel.sql("count(*) DESC"))
+     .limit(limit)
+     .pluck(:context, :round_robin_key, Arel.sql("count(*)"))
+   @partition_breakdown_truncated = true if completed_counts.size >= limit
    completed_counts.each do |ctx, rr_key, completed|
      partition = extract.call(ctx, rr_key)
      rows[[ name, partition ]][:completed_24h] += completed
data/app/models/dispatch_policy/partition_observation.rb
CHANGED

@@ -7,41 +7,68 @@ module DispatchPolicy
  # for all partitioned policies, not just the adaptive ones.
  #
  # One row per (policy, partition, minute): total_lag_ms accumulates the
- # sum of queue_lag_ms observations in that minute,
- #
- #
+ # sum of queue_lag_ms observations in that minute, total_duration_ms
+ # accumulates perform durations (used by :time_budget and :fair_time_share),
+ # observation_count increments, max_lag_ms / max_duration_ms track worst
+ # spikes. Averages are derived on read as total / count.
  class PartitionObservation < ApplicationRecord
    self.table_name = "dispatch_policy_partition_observations"

    OBSERVATION_TTL = 2 * 60 * 60 # 2 hours

-   def self.observe!(policy_name:, partition_key:, queue_lag_ms:, current_max: nil)
+   def self.observe!(policy_name:, partition_key:, queue_lag_ms:, duration_ms: 0, current_max: nil)
      return if partition_key.nil? || partition_key.to_s.empty?

      now = Time.current
      lag = queue_lag_ms.to_i
+     dur = duration_ms.to_i
      sql = <<~SQL.squish
        INSERT INTO #{quoted_table_name}
          (policy_name, partition_key, minute_bucket,
-          total_lag_ms,
+          total_lag_ms, total_duration_ms, observation_count,
+          max_lag_ms, max_duration_ms, current_max,
           created_at, updated_at)
-       VALUES (?, ?, date_trunc('minute', ?::timestamp), ?, 1, ?, ?, ?, ?)
+       VALUES (?, ?, date_trunc('minute', ?::timestamp), ?, ?, 1, ?, ?, ?, ?, ?)
        ON CONFLICT (policy_name, partition_key, minute_bucket)
        DO UPDATE SET
          total_lag_ms = #{quoted_table_name}.total_lag_ms + EXCLUDED.total_lag_ms,
+         total_duration_ms = #{quoted_table_name}.total_duration_ms + EXCLUDED.total_duration_ms,
          observation_count = #{quoted_table_name}.observation_count + 1,
          max_lag_ms = GREATEST(#{quoted_table_name}.max_lag_ms, EXCLUDED.max_lag_ms),
+         max_duration_ms = GREATEST(#{quoted_table_name}.max_duration_ms, EXCLUDED.max_duration_ms),
          current_max = COALESCE(EXCLUDED.current_max, #{quoted_table_name}.current_max),
          updated_at = EXCLUDED.updated_at
      SQL
      connection.exec_update(
        sanitize_sql_array([
          sql, policy_name, partition_key.to_s, now,
-         lag, lag, current_max, now, now
+         lag, dur, lag, dur, current_max, now, now
        ])
      )
    end

+   # Sum of perform durations per partition over the last `window` seconds.
+   # Used by :fair_time_share to bias admission ordering toward partitions
+   # that have consumed less compute time recently.
+   def self.consumed_ms_by_partition(policy_name:, partition_keys:, window:)
+     return {} if partition_keys.empty?
+
+     # minute_bucket is floored on insert (date_trunc('minute', now)).
+     # An observation written T seconds ago lives in a bucket up to 60s
+     # earlier than T. Add a one-bucket pad to the lower bound so the
+     # most recent bucket is always inside the window — without it, the
+     # previous-minute bucket is silently excluded as soon as the wall
+     # clock crosses a minute boundary.
+     since = Time.current - window - 60
+     rows = where(policy_name: policy_name, partition_key: partition_keys.map(&:to_s))
+       .where("minute_bucket >= ?", since)
+       .group(:partition_key)
+       .pluck(Arel.sql("partition_key, SUM(total_duration_ms), SUM(observation_count)"))
+     rows.each_with_object({}) do |(key, total, count), acc|
+       acc[key] = { consumed_ms: total.to_i, count: count.to_i }
+     end
+   end
+
    def self.prune!
      where("minute_bucket < ?", Time.current - OBSERVATION_TTL).delete_all
    end
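A hypothetical usage sketch showing how the write and read sides of this model fit together. Only the two method signatures come from the hunk above; the policy name, partition keys, and numbers are invented.

```ruby
# Hypothetical values; only the method signatures are taken from this diff.
# Write side: recorded after a perform completes (see dispatchable.rb below).
DispatchPolicy::PartitionObservation.observe!(
  policy_name:   "reports",
  partition_key: "account_42",
  queue_lag_ms:  180,
  duration_ms:   950,   # new in 0.2.0; feeds the time-weighted fetch
  current_max:   5
)

# Read side: the time-weighted fetch sums recent compute time per partition.
DispatchPolicy::PartitionObservation.consumed_ms_by_partition(
  policy_name:    "reports",
  partition_keys: %w[account_42 account_7],
  window:         60
)
# => { "account_42" => { consumed_ms: 950, count: 1 } } (account_7 absent: no rows yet)
```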
data/app/views/dispatch_policy/policies/show.html.erb
CHANGED

@@ -98,6 +98,14 @@

  <h2>All partitions <small class="muted">(<%= @partition_total_list %>)</small></h2>

+ <% if @partition_breakdown_truncated %>
+   <p class="muted" style="background: #fff3cd; border: 1px solid #f0d97f; padding: 0.5em 0.75em; border-radius: 4px;">
+     Showing the most-active partitions only. The full set exceeds the
+     admin's per-request cap (<code>DispatchPolicy.config.admin_partition_limit</code>
+     = <%= DispatchPolicy.config.admin_partition_limit %>); raise it if you need the long tail.
+   </p>
+ <% end %>
+
  <turbo-frame id="partitions_list" data-turbo-action="advance">
    <% if @partition_total_list.to_i.zero? && @partition_search.blank? %>
      <p class="muted">This policy declares no partitioning (no gate with <code>partition_by</code> and no <code>round_robin_by</code>).</p>
data/db/migrate/20260425000001_add_duration_to_partition_observations.rb
ADDED

@@ -0,0 +1,8 @@
+ # frozen_string_literal: true
+
+ class AddDurationToPartitionObservations < ActiveRecord::Migration[7.1]
+   def change
+     add_column :dispatch_policy_partition_observations, :total_duration_ms, :bigint, null: false, default: 0
+     add_column :dispatch_policy_partition_observations, :max_duration_ms, :integer, null: false, default: 0
+   end
+ end
data/lib/dispatch_policy/dispatchable.rb
CHANGED

@@ -41,6 +41,7 @@ module DispatchPolicy
  block.call
  succeeded = true
  ensure
+ duration_ms = ((Time.current - perform_start) * 1000).to_i
  policy_name = job.class.resolved_dispatch_policy&.name

  if job._dispatch_partitions.present?

@@ -49,9 +50,9 @@ module DispatchPolicy
    partitions: job._dispatch_partitions
  )

- # Let adaptive gates update their AIMD state first;
- #
- #
+ # Let adaptive gates update their AIMD state first; the
+ # generic observation below then captures the resulting
+ # current_max alongside lag + duration for the chart.
  policy = job.class.resolved_dispatch_policy
  job._dispatch_partitions.each do |gate_name, partition_key|
    gate = policy&.gates&.find { |g| g.name == gate_name.to_sym }

@@ -64,7 +65,8 @@ module DispatchPolicy
  end

  # Generic observation per unique partition. Every gate with
- # partition_by (adaptive or not) gets a sparkline this way
+ # partition_by (adaptive or not) gets a sparkline this way,
+ # plus :fair_time_share reads consumed_ms from here.
  job._dispatch_partitions.values.uniq.each do |partition_key|
    current_max = DispatchPolicy::AdaptiveConcurrencyStats.current_max_for(
      policy_name: policy_name,

@@ -74,6 +76,7 @@ module DispatchPolicy
    policy_name: policy_name,
    partition_key: partition_key,
    queue_lag_ms: queue_lag_ms,
+   duration_ms: duration_ms,
    current_max: current_max
  )
  end
data/lib/dispatch_policy/gates/throttle.rb
CHANGED

@@ -18,7 +18,11 @@ module DispatchPolicy
  by_partition = batch.group_by { |staged| partition_key_for(context.for(staged)) }

  admitted = []
-
+ # Sort keys before acquiring per-partition row locks: two ticks
+ # processing overlapping partitions in different group_by orders
+ # would otherwise deadlock on each other's FOR UPDATE rows.
+ by_partition.keys.sort.each do |partition_key|
+   jobs = by_partition[partition_key]
    sample_ctx = context.for(jobs.first)
    rate = resolve(@rate, sample_ctx).to_f
    per = @per.to_f
data/lib/dispatch_policy/policy.rb
CHANGED

@@ -12,6 +12,8 @@ module DispatchPolicy
  @snapshots = {}
  @dedupe_key_builder = nil
  @round_robin_builder = nil
+ @round_robin_weight = :equal
+ @round_robin_window = 60
  instance_eval(&block) if block
  DispatchPolicy.registry[@name] = job_class
  end

@@ -45,14 +47,25 @@ module DispatchPolicy
    key&.to_s
  end

- def round_robin_by(builder)
+ def round_robin_by(builder, weight: :equal, window: 60)
+   raise ArgumentError, "weight must be :equal or :time" unless %i[equal time].include?(weight)
    @round_robin_builder = builder
+   @round_robin_weight = weight
+   @round_robin_window = window
  end

  def round_robin?
    !@round_robin_builder.nil?
  end

+ def round_robin_weight
+   @round_robin_weight
+ end
+
+ def round_robin_window
+   @round_robin_window
+ end
+
  def build_round_robin_key(arguments)
    return nil unless @round_robin_builder
    key = @round_robin_builder.call(arguments)
data/lib/dispatch_policy/tick.rb
CHANGED

@@ -28,7 +28,20 @@ module DispatchPolicy
  pending_enqueue.each do |staged, job|
    begin
      job.enqueue(_bypass_staging: true)
-
+     # ActiveJob adapters report a polite failure by setting
+     # enqueue_error and leaving successfully_enqueued? false
+     # instead of raising. Without this check the staged row
+     # would stay marked admitted while the adapter never queued
+     # the job — losing it silently.
+     if job.successfully_enqueued?
+       admitted_count += 1
+     else
+       Rails.logger&.warn(
+         "[DispatchPolicy] adapter did not enqueue staged=#{staged.id}: " \
+         "#{job.enqueue_error&.class}: #{job.enqueue_error&.message}"
+       )
+       revert_admission(staged)
+     end
    rescue StandardError => e
      Rails.logger&.error("[DispatchPolicy] enqueue failed staged=#{staged.id}: #{e.class}: #{e.message}")
      revert_admission(staged)

@@ -100,7 +113,11 @@ module DispatchPolicy

  def self.fetch_batch(policy)
    if policy.round_robin?
-
+     if policy.round_robin_weight == :time
+       fetch_time_weighted_batch(policy)
+     else
+       fetch_round_robin_batch(policy)
+     end
    else
      fetch_plain_batch(policy)
    end

@@ -162,6 +179,76 @@ module DispatchPolicy
    batch + top_up
  end

+ # Time-weighted variant of round-robin: instead of an equal quantum
+ # per active partition, allocate quanta proportional to the inverse
+ # of recently-consumed compute time. Solo partitions get the full
+ # batch_size; competing partitions get slices that bias admission
+ # toward whoever has consumed less, so total compute time stays
+ # balanced even when one tenant's backlog is much bigger than
+ # another's. Falls back to the same trailing top-up as the equal
+ # round-robin so we never under-fill the batch when only a few
+ # partitions are active.
+ DEFAULT_TIME_SHARE_DURATION_MS = 100
+
+ def self.fetch_time_weighted_batch(policy)
+   batch_size = DispatchPolicy.config.batch_size
+   now = Time.current
+
+   partitions = StagedJob.pending
+     .where(policy_name: policy.name)
+     .where("not_before_at IS NULL OR not_before_at <= ?", now)
+     .where.not(round_robin_key: nil)
+     .distinct
+     .pluck(:round_robin_key)
+
+   return fetch_plain_batch(policy) if partitions.empty?
+
+   consumed = PartitionObservation.consumed_ms_by_partition(
+     policy_name: policy.name,
+     partition_keys: partitions,
+     window: policy.round_robin_window
+   )
+
+   # Inverse-of-consumed weights, with a floor so a brand-new partition
+   # (no observations) doesn't dominate to infinity.
+   weights = partitions.each_with_object({}) do |key, acc|
+     consumed_ms = consumed.dig(key, :consumed_ms) || 0
+     denom = [ consumed_ms, DEFAULT_TIME_SHARE_DURATION_MS ].max
+     acc[key] = 1.0 / denom
+   end
+   total_weight = weights.values.sum
+   quanta = weights.transform_values do |w|
+     [ (batch_size * w / total_weight).floor, 1 ].max
+   end
+
+   batch = []
+   partitions.each do |key|
+     rows = StagedJob.pending
+       .where(policy_name: policy.name, round_robin_key: key)
+       .where("not_before_at IS NULL OR not_before_at <= ?", now)
+       .order(:priority, :staged_at)
+       .limit(quanta[key])
+       .lock("FOR UPDATE SKIP LOCKED")
+       .to_a
+     batch.concat(rows)
+     break if batch.size >= batch_size
+   end
+
+   remaining = batch_size - batch.size
+   return batch if remaining <= 0 || batch.empty?
+
+   top_up = StagedJob.pending
+     .where(policy_name: policy.name)
+     .where("not_before_at IS NULL OR not_before_at <= ?", now)
+     .where.not(id: batch.map(&:id))
+     .order(:priority, :staged_at)
+     .limit(remaining)
+     .lock("FOR UPDATE SKIP LOCKED")
+     .to_a
+
+   batch + top_up
+ end
+
  def self.lookup_policy(policy_name)
    job_class = DispatchPolicy.registry[policy_name] || autoload_job_for(policy_name)
    return nil unless job_class
data/lib/dispatch_policy.rb
CHANGED

@@ -16,19 +16,25 @@ module DispatchPolicy
  :tick_sleep,
  :tick_sleep_busy,
  :partition_idle_ttl,
+ :admin_partition_limit,
  keyword_init: true
  )

  def self.config
    @config ||= Config.new(
-     enabled:
-     lease_duration:
-     batch_size:
-     round_robin_quantum:
-     tick_max_duration:
-     tick_sleep:
-     tick_sleep_busy:
-     partition_idle_ttl:
+     enabled: true,
+     lease_duration: 15 * 60, # 15.minutes
+     batch_size: 500,
+     round_robin_quantum: 50,
+     tick_max_duration: 60, # 1.minute
+     tick_sleep: 1, # idle sleep
+     tick_sleep_busy: 0.05, # busy sleep
+     partition_idle_ttl: 30 * 60, # 30.minutes
+     # Hard cap on rows the admin's partition breakdown will pull per
+     # aggregation. Protects the host DB and process when a policy has
+     # tens of thousands of partitions: the admin shows the top-N most
+     # active and a truncation banner instead of dragging in everything.
+     admin_partition_limit: 5_000
    )
  end

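A hypothetical host-app initializer for raising the cap that the admin banner mentions. It assumes only what the hunk above shows, that `DispatchPolicy.config` is a memoized keyword_init Struct (whose members have writers); the file path and the 20_000 value are invented.

```ruby
# config/initializers/dispatch_policy.rb (hypothetical)
# Struct members come with writers, so the memoized config can be adjusted
# after boot; 20_000 is an arbitrary example, not a recommendation.
DispatchPolicy.config.admin_partition_limit = 20_000
```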
metadata
CHANGED

@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: dispatch_policy
  version: !ruby/object:Gem::Version
-   version: 0.
+   version: 0.2.0
  platform: ruby
  authors:
  - José Galisteo

@@ -136,6 +136,7 @@ files:
  - db/migrate/20260424000002_create_adaptive_concurrency_stats.rb
  - db/migrate/20260424000003_create_adaptive_concurrency_samples.rb
  - db/migrate/20260424000004_rename_samples_to_partition_observations.rb
+ - db/migrate/20260425000001_add_duration_to_partition_observations.rb
  - lib/dispatch_policy.rb
  - lib/dispatch_policy/active_job_perform_all_later_patch.rb
  - lib/dispatch_policy/dispatch_context.rb

@@ -147,7 +148,6 @@ files:
  - lib/dispatch_policy/gates/fair_interleave.rb
  - lib/dispatch_policy/gates/global_cap.rb
  - lib/dispatch_policy/gates/throttle.rb
- - lib/dispatch_policy/install_generator.rb
  - lib/dispatch_policy/policy.rb
  - lib/dispatch_policy/tick.rb
  - lib/dispatch_policy/tick_loop.rb
data/lib/dispatch_policy/install_generator.rb
DELETED

@@ -1,23 +0,0 @@
- # frozen_string_literal: true
-
- require "rails/generators"
- require "rails/generators/active_record"
-
- module DispatchPolicy
-   module Generators
-     class InstallGenerator < Rails::Generators::Base
-       include Rails::Generators::Migration
-
-       source_root File.expand_path("../../db/migrate", __dir__)
-
-       def self.next_migration_number(dirname)
-         ActiveRecord::Generators::Base.next_migration_number(dirname)
-       end
-
-       def copy_migration
-         migration_template "20260424000001_create_dispatch_policy_tables.rb",
-           "db/migrate/create_dispatch_policy_tables.rb"
-       end
-     end
-   end
- end