ruby_reactor 0.3.1 → 0.3.2

@@ -0,0 +1,459 @@
# Locks, Semaphores & Periods

RubyReactor ships with three Redis-backed coordination primitives — each tackling a different problem:

| Primitive | Question it answers |
| ----------------- | ---------------------------------------------------------------------------------------------------- |
| `with_lock` | "Is anyone else **currently** running with this key?" — concurrency control. |
| `with_semaphore` | "Are too many runs **currently** in flight for this key?" — capacity control. |
| `with_rate_limit` | "Have we already made N calls in this time window?" — fixed-window rate limiting (e.g. 3/sec). |
| `with_period` | "Has a successful run **already happened in this calendar bucket**?" — dedup / once-per-period. |

They are orthogonal and composable: a reactor can declare any combination.

Typical use cases:

- Only one `RefundOrderReactor` should run per order at a time → exclusive lock keyed by order id.
- Calls to an external service should never exceed 5 concurrent requests → semaphore with `limit: 5`.
- Calls to a rate-limited API must respect "3 per second AND 100 per minute" → multi-window `with_rate_limit`.
- A monthly billing reactor should run exactly once per org per month, even if a buggy scheduler enqueues it daily → period gate keyed by org id with `every: :month`.

The lock/semaphore primitives:

- Are acquired before any step runs and released in an `ensure` block (so a crash, failure, or interrupt does not leak a holder).
- Snooze (re-enqueue) instead of failing when contention is encountered inside a Sidekiq worker.
- Carry a TTL so a crashed Ruby process cannot block the resource forever.

The period primitive is different: it is **dedup**, not concurrency. It records a marker after a successful run and skips subsequent runs in the same calendar bucket.

## Table of Contents

- [Exclusive Locks](#exclusive-locks)
  - [Re-entrancy](#re-entrancy)
  - [Auto-extend (TTL keepalive)](#auto-extend-ttl-keepalive)
  - [Inline vs async behavior on contention](#inline-vs-async-behavior-on-contention)
  - [Owner identity](#owner-identity)
- [Semaphores](#semaphores)
  - [Token model](#token-model)
  - [Release safety](#release-safety)
- [Rate Limits](#rate-limits)
  - [Single window](#single-window)
  - [Multi-window quotas](#multi-window-quotas)
  - [Algorithm & atomicity](#algorithm--atomicity)
  - [Smart snooze on async](#smart-snooze-on-async)
- [Periods (once-per-bucket dedup)](#periods-once-per-bucket-dedup)
  - [Bucket model](#bucket-model)
  - [When the marker is written](#when-the-marker-is-written)
  - [Composing with `with_lock`](#composing-with-with_lock)
  - [The `Skipped` result](#the-skipped-result)
  - [Skipping mid-reactor from a step](#skipping-mid-reactor-from-a-step)
- [Snooze configuration](#snooze-configuration)
- [Inheritance](#inheritance)
- [Observability](#observability)
- [Limitations](#limitations)

## Exclusive Locks

Declare an exclusive lock on a reactor with the `with_lock` DSL. The block receives the reactor inputs and must return the **lock key** as a string.

```ruby
class RefundOrderReactor < RubyReactor::Reactor
  input :order_id

  with_lock(ttl: 60) { |inputs| "order:#{inputs[:order_id]}" }

  step :refund do
    argument :order_id, input(:order_id)
    run { |args| PaymentGateway.refund(args[:order_id]) }
  end
end
```

While the reactor is running, every other caller trying to acquire `lock:order:<id>` either snoozes (async) or raises `RubyReactor::Lock::AcquisitionError` (inline).

### Re-entrancy

Composed reactors share the same lock owner, so they can re-acquire a lock that an outer reactor already holds without blocking themselves:

```ruby
class InventoryReactor < RubyReactor::Reactor
  with_lock { |inputs| "warehouse:#{inputs[:warehouse_id]}" }

  compose :stock_check, StockCheckReactor # also locks "warehouse:<id>"
end
```

Re-entrancy is owner-based — a sibling process trying to grab `warehouse:<id>` while `InventoryReactor` runs will still be blocked. See [Owner identity](#owner-identity) for what counts as "the same owner."

### Auto-extend (TTL keepalive)

Long-running steps can outlive the `ttl` you pick. To prevent the lock from expiring mid-execution, RubyReactor **auto-extends** locks by default: a background thread refreshes the TTL every `ttl / 3` seconds (minimum 1s) while the reactor runs, and stops on release.

```ruby
# Default: keepalive enabled
with_lock(ttl: 60) { |i| "k:#{i[:id]}" }

# Disable if you trust ttl to outlast every step
with_lock(ttl: 60, auto_extend: false) { |i| "k:#{i[:id]}" }
```

If the Ruby process dies, the extender dies with it, so the TTL still kicks in and the lock becomes acquirable again.
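The keepalive cadence can be sketched like this (an in-memory stand-in; the real extender refreshes `lock:<key>` in Redis, and the class name here is illustrative, not the gem's internals):

```ruby
# Illustrative keepalive: refresh every ttl / 3 seconds, floored at 1s,
# until the lock is released. The real extender refreshes the TTL in Redis.
class KeepaliveSketch
  attr_reader :interval, :extensions

  def initialize(ttl)
    @interval = [ttl / 3.0, 1.0].max # seconds between refreshes
    @extensions = 0
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        sleep(@interval)
        @extensions += 1 if @running # real impl: refresh the lock's TTL here
      end
    end
  end

  def stop
    @running = false
    @thread&.kill # illustrative; a production extender would signal and join
  end
end
```

With `ttl: 60` the refresh interval is 20s; with a very small `ttl` it is floored at 1s, matching the "minimum 1s" rule above.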

### Inline vs async behavior on contention

The behavior on a "lock already held" condition depends on **where** the reactor is running:

| Caller | Behavior on contention |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------- |
| Inline (`Reactor.run`) | Raises `RubyReactor::Lock::AcquisitionError`. The caller decides whether to retry, switch to async, or give up. |
| Sidekiq worker | Snoozes the job via `perform_in(delay, ...)`. **Does not** consume the Sidekiq retry budget. |

The async path also force-disables `wait:` (no `sleep`/BLPOP inside a worker thread) — better to snooze the job than to tie up a worker.

After `lock_snooze_max_attempts` snoozes, the worker stops re-enqueuing and marks the context as failed. See [Snooze configuration](#snooze-configuration).

```ruby
# Inline error handling
begin
  RefundOrderReactor.run(order_id: 42)
rescue RubyReactor::Lock::AcquisitionError
  # Someone else is refunding this order; surface a 409, retry later, or hand
  # off to async:
  RubyReactor::SidekiqWorkers::Worker.perform_async(...)
end
```

### Owner identity

The lock owner is the **root context id** of the currently-executing reactor — meaning every reactor *invocation* is its own owner, but every composed/nested reactor inside that invocation shares the owner.

Two implications:

- A user-triggered retry that creates a new top-level run has a **new** owner. If the previous run's lock has not expired yet (e.g. process crashed without auto-extend), the retry will see contention.
- Across the async pause/resume boundary, the lock is released on pause and re-acquired on resume — a separate runner can sneak in between. Lean on `ttl` and idempotency to make this safe.

## Semaphores

A semaphore caps **concurrent executions** of a reactor across processes. Declare one with `with_semaphore`:

```ruby
class GeocodeReactor < RubyReactor::Reactor
  input :address

  with_semaphore(limit: 5) { |inputs| "geocode_api" }

  step :geocode do
    argument :address, input(:address)
    run { |args| Geocoder.lookup(args[:address]) }
  end
end
```

At any time, at most five `GeocodeReactor` invocations run concurrently across your fleet. The 6th call snoozes (async) or raises `RubyReactor::Semaphore::AcquisitionError` (inline).

### Token model

Internally a semaphore is a Redis `LIST` of unique UUID tokens plus a `SET` tracking which tokens are currently held:

- `semaphore:<key>` — LIST of available token UUIDs.
- `semaphore:<key>:held` — SET of UUIDs currently checked out.
- `semaphore:<key>:init` — initialization sentinel (value = `limit`).

`acquire` does an atomic `LPOP + SADD` (Lua). `release` does a guarded `SREM + RPUSH` (Lua) so a token is only returned to the pool if the caller actually held it.

### Release safety

The release script enforces two invariants:

1. The token must be in `:held` (no spurious releases for tokens that were never acquired).
2. After release, the list size cannot exceed `limit` (no over-cap RPUSH).

This means a buggy double-release, a stale token from a crashed process, or a forged release attempt cannot inflate the pool beyond its configured capacity.
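A minimal in-memory sketch of the token pool and both invariants (illustrative only; the real implementation is a pair of Lua scripts against Redis, and `SemaphoreSketch` is not a gem class):

```ruby
require "securerandom"
require "set"

# In-memory model of the semaphore's token pool. The real acquire/release
# are Lua scripts, which is what makes them atomic across clients.
class SemaphoreSketch
  def initialize(limit)
    @limit = limit
    @available = Array.new(limit) { SecureRandom.uuid } # the LIST
    @held = Set.new                                     # the :held SET
  end

  # Atomic LPOP + SADD in the real script.
  def acquire
    token = @available.shift or return nil
    @held.add(token)
    token
  end

  # Guarded SREM + RPUSH: both invariants checked before returning the token.
  def release(token)
    return false unless @held.delete?(token)  # invariant 1: must be held
    return false if @available.size >= @limit # invariant 2: never over-cap
    @available.push(token)
    true
  end
end
```

A double-release or a forged token simply returns `false` and leaves the pool size unchanged.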

## Rate Limits

`with_rate_limit` caps **how many runs are allowed within a time window**, regardless of whether they overlap in time. This is what you want for "no more than 3 calls per second to the Stripe API."

It is not the same as `with_semaphore`:

- Semaphore: "no more than N **concurrent** runs at any instant."
- Rate limit: "no more than N runs **starting** within any X-second window."

A reactor making three back-to-back API calls in 100ms hits a `3/sec` rate limit on the fourth — even though only one is ever in flight at a time.

### Single window

```ruby
class ChargeReactor < RubyReactor::Reactor
  input :account_id

  with_rate_limit(limit: 3, period: :second) { |inputs| "stripe:#{inputs[:account_id]}" }

  step :charge do
    argument :account_id, input(:account_id)
    run { |args| Stripe.charge(args[:account_id]) }
  end
end
```

`period:` accepts the same units as `with_period`: `:second`, `:minute`, `:hour`, `:day`, `:week`, `:month`, `:year`, or integer seconds.

The block returns the **key base**; each window stores its counter under `rate:<base>:<period_name>:<bucket_id>` so different periods don't collide.

### Multi-window quotas

Real upstream APIs typically expose layered limits ("3/sec AND 100/min AND 5000/hr"). Pass them all in one call with `limits:`:

```ruby
with_rate_limit(
  limits: { second: 3, minute: 100, hour: 5000 }
) { |inputs| "stripe:#{inputs[:account_id]}" }
```

All windows are checked atomically in one Lua call. **If any window fails, none of the others get incremented** — so a burst that blows the per-second cap doesn't also burn a per-minute slot.

The error reports the tightest (failing) window:

```ruby
begin
  ChargeReactor.run(account_id: 42)
rescue RubyReactor::RateLimit::ExceededError => e
  e.period_name         # => "second"
  e.limit               # => 3
  e.period_seconds      # => 1
  e.retry_after_seconds # => seconds until the bucket rolls (1..period)
  e.key_base            # => "stripe:42"
end
```

### Algorithm & atomicity

Fixed-window counter (same family as the [kpumuk/throttling](https://github.com/kpumuk/throttling) gem):

- Bucket id = `floor(now / period_seconds)`. It changes the instant the period rolls, so old buckets become irrelevant the moment they expire — no cleanup needed.
- One Redis `INCR` per window, with a single `EXPIRE` on the first increment of a new bucket. TTL = `2 * period_seconds` for safety.
- Multi-window: two passes inside a single Lua script — check all, then increment all. No interleaving with other clients.

Trade-off vs token bucket: fixed-window can allow up to 2× the limit across the very boundary (3 at `:59.99` + 3 at `:00.01` = 6 in 20ms). For typical upstream API limits this is fine; if you need strict pacing, layer a second `with_rate_limit(limit: 1, period: <interval>)`.
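The check-all-then-increment-all pattern can be sketched in memory (illustrative; the real version is one Lua script, which is what makes the two passes atomic with respect to other clients, and `RateLimitSketch` is not a gem class):

```ruby
# Fixed-window, all-or-nothing multi-window sketch. Keys follow the
# rate:<base>:<period>:<bucket> shape described above.
class RateLimitSketch
  def initialize(limits) # { period_seconds => limit }, e.g. { 1 => 3, 60 => 100 }
    @limits = limits
    @counters = Hash.new(0)
  end

  def allow?(key_base, now = Time.now.to_f)
    windows = @limits.map do |period, limit|
      ["rate:#{key_base}:#{period}:#{(now / period).floor}", limit]
    end
    # Pass 1: check every window. If any is full, increment nothing.
    return false if windows.any? { |key, limit| @counters[key] >= limit }
    # Pass 2: increment every window.
    windows.each { |key, _limit| @counters[key] += 1 }
    true
  end
end
```

Note how a denied attempt leaves every counter untouched: blowing the per-second cap does not burn a per-minute slot.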

### Smart snooze on async

When a Sidekiq worker hits a rate limit, it reads `retry_after_seconds` off the error and snoozes for **exactly** that long (plus jitter, floored at 0.1s). The next attempt fires the moment the bucket rolls — no busy waiting, no fixed cadence.

This shares the existing snooze cap (`lock_snooze_max_attempts`). After the cap is reached, the context is marked `:failed`, same as for lock/semaphore contention.

| Caller | Behavior on rate-limit hit |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| Inline | Raises `RubyReactor::RateLimit::ExceededError`. Caller can `sleep(error.retry_after_seconds); retry` or surface 429 to its user. |
| Sidekiq async | Snoozes `perform_in(retry_after + jitter, ...)`. Does not burn Sidekiq retry budget. Counted against `lock_snooze_max_attempts`. |

The rate-limit check happens **before** lock/semaphore acquisition: a job that would be rate-limited never grabs a mutex.

## Periods (once-per-bucket dedup)

The period gate solves a different problem from locks and semaphores: it ensures a reactor runs **at most once per calendar bucket**, regardless of how many times its caller enqueues it.

A typical scenario:

> "Send the monthly billing report once a month. A scheduling bug now enqueues this reactor daily — we don't want 30 duplicate reports."

```ruby
class MonthlyBillingReactor < RubyReactor::Reactor
  input :org_id

  with_period(every: :month) { |inputs| "monthly_billing:#{inputs[:org_id]}" }

  step :build do
    argument :org_id, input(:org_id)
    run { |args| Billing.generate(args[:org_id]) }
  end
end
```

After the first successful run in May 2026, every other `MonthlyBillingReactor.run(org_id: 42)` call until June 1 (UTC) returns a `RubyReactor::Skipped` result. **No steps execute.**

### Bucket model

`every:` accepts:

- Symbols: `:minute`, `:hour`, `:day`, `:week`, `:month`, `:year` — calendar-aligned UTC buckets. Two calls at `2026-05-31 23:59 UTC` and `2026-06-01 00:01 UTC` fall into different `:month` buckets, even though they're two minutes apart.
- Integer seconds: e.g. `every: 3600` — sliding bucket computed as `time.to_i / every`.

The block returns the **base key**. The final Redis marker is `period:<base>:<bucket_id>`, e.g. `period:monthly_billing:42:2026-05`.

| Symbol | Bucket format example | TTL stored on marker |
| --------- | --------------------- | -------------------- |
| `:minute` | `2026-05-15T14-30` | 120 s |
| `:hour` | `2026-05-15T14` | 7 200 s |
| `:day` | `2026-05-15` | 172 800 s |
| `:week` | `2026-W20` (ISO week) | 1 209 600 s |
| `:month` | `2026-05` | ~62 days |
| `:year` | `2026` | ~2 years |

TTL is always **twice the period length** so the marker reliably dedups the next attempt, even with clock skew across the boundary.
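The bucket formats in the table can be reproduced with plain `strftime` (a sketch; `bucket_id` is a hypothetical helper, not the gem's internals):

```ruby
# Illustrative bucket-id computation matching the formats above.
# Calendar symbols are UTC-aligned; an Integer gives a sliding bucket.
def bucket_id(every, time = Time.now.utc)
  case every
  when :minute then time.strftime("%Y-%m-%dT%H-%M")
  when :hour   then time.strftime("%Y-%m-%dT%H")
  when :day    then time.strftime("%Y-%m-%d")
  when :week   then time.strftime("%G-W%V")   # ISO week-numbering year + week
  when :month  then time.strftime("%Y-%m")
  when :year   then time.strftime("%Y")
  when Integer then (time.to_i / every).to_s  # time.to_i / every
  end
end

bucket_id(:month, Time.utc(2026, 5, 15)) # => "2026-05"
```

Note `%G` (ISO week-numbering year) rather than `%Y` for `:week`: around New Year, the ISO week year can differ from the calendar year.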

### When the marker is written

The marker is written **only after a terminal `Success`** (the executor calls the reactor's `mark_period_on_success` hook automatically). This means:

- A failed run does **not** consume the bucket — the next attempt can succeed.
- A paused run (interrupted, async-handed-off) does **not** consume the bucket until the eventual resume completes successfully.
- A `Skipped` result does **not** re-mark the bucket (no-op).

Resume paths skip the period check entirely — a paused reactor must never skip *itself* when its eventual marker appears.

### Composing with `with_lock`

`with_period` alone is dedup, not concurrency. Two callers that fire at exactly the same time may both see "no marker yet" and both run. That's usually fine if the work is idempotent, but if you need strict at-most-one-per-bucket, pair it with `with_lock`:

```ruby
class MonthlyBillingReactor < RubyReactor::Reactor
  # Mutex: only one runner at a time per org.
  with_lock(ttl: 600) { |inputs| "monthly_billing:#{inputs[:org_id]}" }
  # Dedup: each (org, month) tuple runs only once.
  with_period(every: :month) { |inputs| "monthly_billing:#{inputs[:org_id]}" }
end
```

Order of evaluation per call:

1. **Period check.** If marker exists, return `Skipped` immediately. No lock acquired, no steps run.
2. **Lock acquire.** Standard concurrency control kicks in.
3. **Run steps.**
4. **On terminal Success: mark the period bucket.**
5. **Release lock.**

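That ordering can be sketched as a plain method (all lambdas and the method name are illustrative, not the executor's real API):

```ruby
# Order-of-operations sketch for a with_period + with_lock reactor call.
def run_gated(marker_exists, acquire, release, steps, mark)
  return :skipped if marker_exists.call # 1. period check: no lock taken
  acquire.call                          # 2. lock acquire
  begin
    result = steps.call                 # 3. run steps
    mark.call if result == :success     # 4. mark bucket only on success
    result
  ensure
    release.call                        # 5. lock always released
  end
end
```

The `ensure` mirrors the real release behavior: the lock is freed even when the steps raise, while the period marker is written only on success.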
### The `Skipped` result

`RubyReactor::Skipped` is a Success-subclass result returned in two situations:

1. **Implicit period gate**, as shown above — a `with_period` reactor reruns in an already-claimed bucket.
2. **Explicit step return** — a step's `run` block returns `RubyReactor.Skipped(...)` to halt the reactor cleanly without compensation. See [Skipping mid-reactor from a step](#skipping-mid-reactor-from-a-step) below.

Both shapes share the same API:

```ruby
result = MonthlyBillingReactor.run(org_id: 42)

result.success?   # => true (Skipped is a Success subclass)
result.skipped?   # => true
result.reason     # => :period (or whatever the step passed)
result.period_key # => "period:monthly_billing:42:2026-05" (period gate only)
result.step_name  # => :build_report (step return only)
```

`Skipped` deliberately satisfies `success?` so existing `if result.success? ... else ...` branches still take the right path. Code that wants to log or count skips explicitly checks `result.skipped?`.

The reactor's context status becomes `:skipped` (rather than `:completed`), so dashboards can render skip events distinctly.

### Skipping mid-reactor from a step

You can also produce a `Skipped` result from inside a step's `run` block. This is useful when a step discovers that the rest of the workflow is unnecessary **and the partial progress so far is fine to keep**.

```ruby
class SyncSubscriberReactor < RubyReactor::Reactor
  input :user_id

  step :fetch_user do
    argument :user_id, input(:user_id)
    run { |args| User.find(args[:user_id]) }
  end

  step :ensure_active do
    argument :user, result(:fetch_user)
    run do |args|
      # Nothing to do — bail out, but keep the user-fetch we already did.
      next RubyReactor.Skipped(reason: "user_opted_out") if args[:user].opted_out?

      RubyReactor.Success(args[:user])
    end
  end

  step :push_to_mailing_list do
    argument :user, result(:ensure_active)
    run { |args| Mailchimp.subscribe(args[:user]) }
  end
end

result = SyncSubscriberReactor.run(user_id: 42)

if result.skipped?
  Rails.logger.info("Sync skipped (#{result.reason}) at step #{result.step_name}")
end
```

What happens when a step returns `Skipped`:

| Aspect | Behavior |
| --------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| Remaining steps | Not executed. The reactor halts at the skipping step. |
| Previously completed steps | **Left intact — no compensation** runs. This is the critical difference vs `Failure`. |
| Step's value | Not stored in `intermediate_results` (it produced no usable output); downstream steps never run, so nothing can reference it. |
| Execution trace | A `{ type: :skipped, step: <name>, reason: <reason> }` entry is appended. |
| Returned `Skipped` | Carries `step_name` (the halting step) and `reason` (whatever the user passed). |
| `Reactor.run` / `result.success?` | Returns the `Skipped`. `success?` is `true`, `skipped?` is `true`, status `:skipped`. |

**`Skipped` vs `Failure` decision matrix:**

| Situation | Return |
| --------------------------------------------- | ------------------------------------------------- |
| Step did its job; subsequent steps not needed | `RubyReactor.Skipped(reason: "...")` |
| Step couldn't proceed because of an error | `RubyReactor.Failure(error)` — triggers undo path |
| Step succeeded normally | `RubyReactor.Success(value)` |

A common smell to avoid: returning `Skipped` from a step that has just done **partial** work that needs cleanup. If you'd want compensation to run, use `Failure` instead — `Skipped` explicitly says "the partial progress is correct, stop here."

## Snooze configuration

When a Sidekiq worker hits contention it re-enqueues itself after a small delay. Three knobs on `RubyReactor.configuration` control this:

```ruby
RubyReactor.configure do |config|
  # Base seconds before the worker re-checks contention.
  config.lock_snooze_base_delay = 5

  # Extra random seconds added on top to avoid thundering herd
  # (delay = base + rand(0..jitter)).
  config.lock_snooze_jitter = 5

  # Maximum snoozes per job. After this, the context is marked :failed
  # and no more reschedules happen. Set to :infinity to never give up.
  config.lock_snooze_max_attempts = 20
end
```

The current snooze count is tracked as a positional arg on the Sidekiq job, so it survives reschedules but stays per-job (parallel jobs don't share a counter).

## Inheritance

Lock, semaphore, rate-limit, and period config defined on a reactor are propagated to subclasses:

```ruby
class BaseRefund < RubyReactor::Reactor
  with_lock { |i| "order:#{i[:order_id]}" }
  # ...
end

class FullRefund < BaseRefund # also locks "order:<id>"
end
```

A subclass can call `with_lock` / `with_semaphore` / `with_rate_limit` / `with_period` again to override the inherited configuration.

## Observability

- Snooze escalation, release failures, and "release on something we did not actually hold" conditions are logged via `RubyReactor.configuration.logger.warn`.
- The current owner of a lock is in the Redis hash `lock:<key>` under field `owner`.
- The held-tokens set for a semaphore is `semaphore:<key>:held`. Its cardinality plus `LLEN semaphore:<key>` should always equal `limit` at rest.
- The period marker is the plain key `period:<base>:<bucket_id>`. `TTL` on that key tells you when the bucket frees up.
- A `Skipped` result sets context status to `:skipped` (separate from `:completed`/`:failed`).
- Rate-limit counters are at `rate:<base>:<period_name>:<bucket_id>`. `GET` gives the current count for the window; `TTL` gives time until the bucket rolls.

## Limitations

- **Step-level locking** is not yet supported — locks apply to the whole reactor run. Same for `with_period`.
- **Inline retries** do not increment the snooze counter (they are not Sidekiq-scheduled). If you retry inline in a loop, add your own backoff.
- **Multi-Redis** failover is not addressed. The lock is as durable as your Redis deployment; for cross-region critical sections, consider an external locking service.
- **Wait inside a Sidekiq worker** is intentionally disabled. If you want to keep a worker thread parked on `BLPOP`, run that reactor inline instead.
- **`with_period` alone is not a mutex.** Concurrent racers can both run before either has written the marker. Pair with `with_lock` if you need true at-most-one-per-bucket. The period is calendar-aligned, not "N hours since last run"; if you need sliding semantics, pass an integer `every:`.
- **`with_rate_limit` is fixed-window.** Up to 2× the limit can run across a single window boundary. For strict pacing, use a token-bucket-style external rate limiter or stack a tighter `with_rate_limit(limit: 1, period: <interval>)` for serialized requests.
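For the inline-retry case, a hand-rolled backoff loop might look like this (illustrative; `AcquisitionError` is defined locally as a stand-in for `RubyReactor::Lock::AcquisitionError` so the sketch runs without the gem):

```ruby
# Stand-in for RubyReactor::Lock::AcquisitionError, so this runs standalone.
class AcquisitionError < StandardError; end

# Retry with exponential backoff: base_delay, 2x, 4x, ... up to max_attempts.
def run_with_backoff(max_attempts: 5, base_delay: 0.1)
  attempts = 0
  begin
    attempts += 1
    yield(attempts)
  rescue AcquisitionError
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end
```

Unlike the Sidekiq path, nothing here counts against `lock_snooze_max_attempts`; the loop is entirely the caller's responsibility.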
@@ -85,11 +85,12 @@ The backoff strategy for calculating delays between retry attempts.
  - `:fixed`: Same delay for each attempt
 
  ### base_delay
+
  The base delay for retry calculations. Can be a number (seconds) or ActiveSupport duration.
 
  ```ruby
- retry base_delay: 5.seconds
- retry base_delay: 300 # 5 minutes in seconds
+ retries base_delay: 5.seconds
+ retries base_delay: 300 # 5 minutes in seconds
  ```
 
  ## Backoff Strategies
@@ -222,24 +223,27 @@ class CustomRetryReactor < RubyReactor::Reactor
 
  step :call_external_api do
  retries max_attempts: 5, backoff: :exponential, base_delay: 1.second
- run do
- result = ExternalAPI.call
- # Raise specific errors based on response
- case result.status
+ run do |_args, _ctx|
+ response = ExternalAPI.call
+ # Build a Failure with the right retryable flag so the retry manager
+ # can short-circuit non-transient errors.
+ case response.status
  when 429 # Rate limited
- Failure(RateLimitError.new(result) retryable: true)
+ Failure(RateLimitError.new(response), retryable: true)
  when 500 # Server error
- Failure(ServerError.new(result) retryable: true)
- when 400 # Bad request
- Failure(ValidationError.new(result) retryable: false)
+ Failure(ServerError.new(response), retryable: true)
+ when 400 # Bad request - don't retry
+ Failure(ValidationError.new(response), retryable: false)
  else
- result
+ Success(response)
  end
  end
  end
  end
  ```
 
+ When a `Failure` is returned with `retryable: false`, the retry manager stops immediately and falls through to compensation. Custom error classes can also implement `retryable?` to control this from the exception side.
+
  ## Monitoring and Observability
 
  ### Retry Metrics
@@ -326,10 +330,11 @@ RSpec.describe PaymentReactor do
 
  expect(PaymentService).to receive(:charge).exactly(3).times
 
- result = PaymentReactor.run(card_token: "tok_123", amount: 100)
+ subject = test_reactor(PaymentReactor, card_token: "tok_123", amount: 100)
 
- expect(result).to be_success
- expect(result.step_results[:charge_card][:payment_id]).to eq("pay_123")
+ expect(subject).to be_success
+ expect(subject).to have_retried_step(:charge_card).times(2)
+ expect(subject.step_result(:charge_card)[:payment_id]).to eq("pay_123")
  end
 end
 ```