chrono_forge 0.9.1 → 0.10.0

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +22 -0
  3. data/README.md +305 -44
  4. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md +1748 -0
  5. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md.tasks.json +17 -0
  6. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md +930 -0
  7. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md.tasks.json +54 -0
  8. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md +241 -0
  9. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md.tasks.json +12 -0
  10. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md +1378 -0
  11. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md.tasks.json +67 -0
  12. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md +709 -0
  13. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md.tasks.json +19 -0
  14. data/docs/superpowers/specs/2026-06-03-unified-retry-policy-design.md +226 -0
  15. data/docs/superpowers/specs/2026-06-25-chrono_forge-dashboard-design.md +190 -0
  16. data/docs/superpowers/specs/2026-06-25-composite-retry-policies-design.md +228 -0
  17. data/docs/superpowers/specs/2026-06-25-reserved-kwarg-guard-design.md +169 -0
  18. data/docs/superpowers/specs/2026-06-25-spawn-merge-branches-design.md +468 -0
  19. data/docs/superpowers/specs/2026-06-26-dashboard-branch-view-design.md +142 -0
  20. data/docs/superpowers/specs/2026-06-26-deferral-continuation-race-and-catchup-design.md +265 -0
  21. data/lib/chrono_forge/branch_merge_job.rb +138 -0
  22. data/lib/chrono_forge/branch_probe.rb +26 -0
  23. data/lib/chrono_forge/cleanup.rb +6 -0
  24. data/lib/chrono_forge/execution_log.rb +6 -0
  25. data/lib/chrono_forge/executor/composite_retry_policy.rb +47 -0
  26. data/lib/chrono_forge/executor/methods/branch.rb +185 -0
  27. data/lib/chrono_forge/executor/methods/durably_execute.rb +21 -19
  28. data/lib/chrono_forge/executor/methods/durably_repeat.rb +118 -25
  29. data/lib/chrono_forge/executor/methods/merge_branches.rb +83 -0
  30. data/lib/chrono_forge/executor/methods/wait.rb +2 -4
  31. data/lib/chrono_forge/executor/methods/wait_until.rb +25 -25
  32. data/lib/chrono_forge/executor/methods/workflow_states.rb +16 -0
  33. data/lib/chrono_forge/executor/methods.rb +2 -0
  34. data/lib/chrono_forge/executor/retry_policy.rb +111 -0
  35. data/lib/chrono_forge/executor.rb +216 -28
  36. data/lib/chrono_forge/version.rb +1 -1
  37. data/lib/chrono_forge/workflow.rb +10 -1
  38. data/lib/generators/chrono_forge/migration_actions.rb +1 -0
  39. data/lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb +38 -0
  40. metadata +42 -5
  41. data/lib/chrono_forge/executor/retry_strategy.rb +0 -29
@@ -0,0 +1,930 @@
1
+ # Composite Retry Policies Implementation Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development (recommended) or superpowers-extended-cc:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
+
5
+ **Goal:** Let any retry site be configured with an ordered list of `RetryPolicy` objects so each error type gets its own independent attempt budget and backoff.
6
+
7
+ **Architecture:** A new pure `CompositeRetryPolicy` holds an ordered list of `RetryPolicy` objects and routes a failure to the first whose `retry_on` matches the live error (`is_a?`). The matched policy's index keys a `retry_counts` map — stored in execution/repetition log `metadata` for step sites, and threaded through the job args for the workflow-level site (mirroring the existing `attempt:` split). The single-policy path is unchanged and writes no `retry_counts`.
8
+
9
+ **Tech Stack:** Ruby, Rails (ActiveRecord/ActiveJob), Zeitwerk autoloading, Minitest + ChaoticJob.
10
+
11
+ **User Verification:** NO — no user verification required (internal API; correctness is covered by unit + integration tests).
12
+
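The routing and counting described in the Architecture paragraph above can be sketched in plain Ruby. This is a minimal illustration, not the gem's code: `SketchPolicy`, `POLICIES`, `record_failure`, and the two error classes are hypothetical names, and the retry rule `count < max_attempts` is an assumption inferred from the test expectations later in this plan.

```ruby
# Hypothetical stand-in for RetryPolicy: only the fields this sketch needs.
SketchPolicy = Struct.new(:retry_on, :max_attempts, keyword_init: true) do
  # nil retry_on matches any StandardError; a list matches by is_a?.
  def matches?(error)
    return error.is_a?(StandardError) if retry_on.nil?

    retry_on.any? { |klass| error.is_a?(klass) }
  end
end

class TimeoutFailure < StandardError; end
class QuotaFailure < StandardError; end

POLICIES = [
  SketchPolicy.new(retry_on: [TimeoutFailure], max_attempts: 3),
  SketchPolicy.new(retry_on: [QuotaFailure], max_attempts: 5)
].freeze

# Returns true when the failure should be retried. The counts hash is keyed
# by the matched policy's index (as a string) and is updated in place.
def record_failure(error, counts)
  index = POLICIES.index { |policy| policy.matches?(error) }
  return false if index.nil? # unmatched errors are not retried

  key = index.to_s
  counts[key] = counts[key].to_i + 1
  counts[key] < POLICIES[index].max_attempts # assumed retry rule
end

counts = {}
record_failure(TimeoutFailure.new, counts)
record_failure(QuotaFailure.new, counts)
record_failure(TimeoutFailure.new, counts)
```

Each error type advances only its own slot, so two timeouts and one quota failure leave the timeout budget nearly spent while the quota budget is barely touched.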
13
+ ---
14
+
15
+ ### Task 1: `RetryPolicy` — `matches?`, `retry_backoff`, `compose`
16
+
17
+ **Goal:** Add the three pure additions to `RetryPolicy` without changing existing behavior.
18
+
19
+ **Files:**
20
+ - Modify: `lib/chrono_forge/executor/retry_policy.rb`
21
+ - Test: `test/retry_policy_test.rb`
22
+
23
+ **Acceptance Criteria:**
24
+ - [ ] `matches?(error)` is the routing predicate: it returns whether `retry_on` covers the error (`nil` → any StandardError, `[]` → none, list → the listed classes and their subclasses)
25
+ - [ ] `retry_backoff(error, attempts:)` returns a `Duration` when retryable, `nil` otherwise, and ignores any block
26
+ - [ ] `RetryPolicy.compose(*policies)` returns a `CompositeRetryPolicy`
27
+ - [ ] All existing `RetryPolicyTest` tests still pass
28
+
29
+ **Verify:** `bin/rails test test/retry_policy_test.rb` → all pass (0 failures)
30
+
31
+ **Steps:**
32
+
33
+ - [ ] **Step 1: Write the failing tests**
34
+
35
+ Add to `test/retry_policy_test.rb`, before the final `end`:
36
+
37
+ ```ruby
38
+ # --- matches?: routing predicate ---
39
+
40
+ def test_matches_nil_retry_on_matches_any_standard_error
41
+ policy = RetryPolicy.new(retry_on: nil)
42
+ assert policy.matches?(CustomError.new)
43
+ assert policy.matches?(UnrelatedError.new)
44
+ end
45
+
46
+ def test_matches_empty_retry_on_matches_nothing
47
+ policy = RetryPolicy.new(retry_on: [])
48
+ refute policy.matches?(CustomError.new)
49
+ refute policy.matches?(StandardError.new)
50
+ end
51
+
52
+ def test_matches_list_matches_class_and_subclass
53
+ policy = RetryPolicy.new(retry_on: [CustomError])
54
+ assert policy.matches?(CustomError.new)
55
+ assert policy.matches?(SubError.new), "subclass matches"
56
+ refute policy.matches?(UnrelatedError.new)
57
+ end
58
+
59
+ # --- retry_backoff: plain policy ignores the block ---
60
+
61
+ def test_retry_backoff_returns_duration_when_retryable
62
+ policy = RetryPolicy.new(max_attempts: 3, base: 1, cap: 1000, jitter: false)
63
+ assert_in_delta 1.0, policy.retry_backoff(StandardError.new, attempts: 1).to_f, 0.001
64
+ end
65
+
66
+ def test_retry_backoff_returns_nil_past_cap
67
+ policy = RetryPolicy.new(max_attempts: 2)
68
+ assert_nil policy.retry_backoff(StandardError.new, attempts: 2)
69
+ end
70
+
71
+ def test_retry_backoff_ignores_block
72
+ policy = RetryPolicy.new(max_attempts: 3, base: 1, cap: 1000, jitter: false)
73
+ called = false
74
+ result = policy.retry_backoff(StandardError.new, attempts: 1) { |_idx| called = true; 99 }
75
+ refute called, "plain policy must not invoke the count block"
76
+ assert_in_delta 1.0, result.to_f, 0.001
77
+ end
78
+
79
+ # --- compose factory ---
80
+
81
+ def test_compose_builds_composite
82
+ composite = RetryPolicy.compose(RetryPolicy.new, RetryPolicy.new)
83
+ assert_instance_of ChronoForge::Executor::CompositeRetryPolicy, composite
84
+ assert_equal 2, composite.policies.size
85
+ end
86
+ ```
87
+
88
+ - [ ] **Step 2: Run the tests, confirm they fail**
89
+
90
+ Run: `bin/rails test test/retry_policy_test.rb`
91
+ Expected: failures — `NoMethodError: undefined method 'matches?'` / `retry_backoff` / `compose`.
92
+
93
+ - [ ] **Step 3: Implement the additions**
94
+
95
+ In `lib/chrono_forge/executor/retry_policy.rb`, add a public `matches?` and `retry_backoff` after `backoff_for` (before `def self.step_default`), and the `compose` factory among the class methods:
96
+
97
+ ```ruby
98
+ # Public routing predicate: would this policy handle this error at all?
99
+ # (independent of the attempt cap). nil retry_on = any StandardError;
100
+ # [] = nothing; a list = those classes and their subclasses.
101
+ def matches?(error)
102
+ retryable_error?(error)
103
+ end
104
+
105
+ # Single-call decision used by every retry site: the backoff Duration to
106
+ # retry, or nil to stop. A plain policy uses `attempts` and ignores any
107
+ # block (the block exists only so a CompositeRetryPolicy can supply a
108
+ # per-error count — see CompositeRetryPolicy#retry_backoff).
109
+ def retry_backoff(error, attempts:)
110
+ retryable?(error, attempts) ? backoff_for(attempts) : nil
111
+ end
112
+ ```
113
+
114
+ And add the factory next to the other `self.` methods:
115
+
116
+ ```ruby
117
+ # Build a composite policy from an ordered list of RetryPolicy objects.
118
+ def self.compose(*policies)
119
+ CompositeRetryPolicy.new(policies)
120
+ end
121
+ ```
122
+
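For readers checking the expected values in the tests, the decision made by `retry_backoff` can be approximated as follows. This is only a sketch under assumptions: the formula `base * 2**(attempts - 1)` clamped to `cap` is inferred from the test expectations in this plan (the existing `backoff_for` is not shown here), jitter is left out, and `sketch_retry_backoff` is a hypothetical helper that returns seconds as a Float rather than a `Duration`.

```ruby
# Hypothetical helper; the real logic lives in RetryPolicy#retryable? and
# RetryPolicy#backoff_for, which this plan does not reproduce.
def sketch_retry_backoff(attempts:, max_attempts:, base:, cap:)
  # Stop once the attempt budget is used up (nil max_attempts = unbounded).
  return nil if max_attempts && attempts >= max_attempts

  # Assumed curve: exponential growth from `base`, clamped to `cap`.
  [base * (2**(attempts - 1)), cap].min.to_f
end

sketch_retry_backoff(attempts: 1, max_attempts: 3, base: 1, cap: 1000)
sketch_retry_backoff(attempts: 3, max_attempts: 10, base: 2, cap: 1000)
sketch_retry_backoff(attempts: 2, max_attempts: 2, base: 1, cap: 1000)
```

Under these assumptions the first call gives a one-second delay, the second gives eight seconds, and the third returns `nil` because the second failure already exhausts a budget of two.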
123
+ - [ ] **Step 4: Run the tests, confirm green**
124
+
125
+ Run: `bin/rails test test/retry_policy_test.rb`
126
+ Expected: all pass, with one caveat: `test_compose_builds_composite` depends on Task 2's `CompositeRetryPolicy` class. If this task runs in isolation before Task 2, that one test errors; it passes once Task 2 lands. Keep both tasks in the same review batch, or temporarily skip `test_compose_builds_composite` until Task 2.
127
+
128
+ - [ ] **Step 5: Commit**
129
+
130
+ ```bash
131
+ git add lib/chrono_forge/executor/retry_policy.rb test/retry_policy_test.rb
132
+ git commit -m "feat(retry): add RetryPolicy#matches?, #retry_backoff, and .compose"
133
+ ```
134
+
135
+ ```json:metadata
136
+ {"files": ["lib/chrono_forge/executor/retry_policy.rb", "test/retry_policy_test.rb"], "verifyCommand": "bin/rails test test/retry_policy_test.rb", "acceptanceCriteria": ["matches? routing predicate", "retry_backoff returns Duration/nil and ignores block", "compose returns CompositeRetryPolicy", "existing tests pass"], "requiresUserVerification": false}
137
+ ```
138
+
139
+ ---
140
+
141
+ ### Task 2: `CompositeRetryPolicy` class
142
+
143
+ **Goal:** Add the pure `CompositeRetryPolicy` value object with routing, block-driven counting, and a coarse `max_attempts`.
144
+
145
+ **Files:**
146
+ - Create: `lib/chrono_forge/executor/composite_retry_policy.rb`
147
+ - Test: `test/composite_retry_policy_test.rb`
148
+
149
+ **Acceptance Criteria:**
150
+ - [ ] `policy_for(error)` returns the first matching sub-policy or `nil`
151
+ - [ ] `retry_backoff` routes on the live error, yields the matched policy's index, and uses the yielded count for the decision and backoff
152
+ - [ ] without a block, `retry_backoff` falls back to the passed `attempts`
153
+ - [ ] no match → `retry_backoff` returns `nil`
154
+ - [ ] `max_attempts` returns the coarsest bound, `nil` if any sub-policy is unbounded
155
+ - [ ] empty list raises `ArgumentError`
156
+
157
+ **Verify:** `bin/rails test test/composite_retry_policy_test.rb` → all pass
158
+
159
+ **Steps:**
160
+
161
+ - [ ] **Step 1: Write the failing test**
162
+
163
+ Create `test/composite_retry_policy_test.rb`:
164
+
165
+ ```ruby
166
+ require "test_helper"
167
+
168
+ class CompositeRetryPolicyTest < ActiveSupport::TestCase
169
+ RetryPolicy = ChronoForge::Executor::RetryPolicy
170
+ CompositeRetryPolicy = ChronoForge::Executor::CompositeRetryPolicy
171
+
172
+ class NetworkError < StandardError; end
173
+ class FlakyNetworkError < NetworkError; end
174
+ class RateLimitError < StandardError; end
175
+ class DeclinedError < StandardError; end
176
+
177
+ def composite
178
+ CompositeRetryPolicy.new([
179
+ RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5, base: 1, cap: 1000, jitter: false),
180
+ RetryPolicy.new(retry_on: [RateLimitError], max_attempts: 10, base: 2, cap: 1000, jitter: false),
181
+ RetryPolicy.new(retry_on: [DeclinedError], max_attempts: 1)
182
+ ])
183
+ end
184
+
185
+ def test_empty_policy_list_raises
186
+ assert_raises(ArgumentError) { CompositeRetryPolicy.new([]) }
187
+ end
188
+
189
+ def test_policy_for_first_match_wins
190
+ catch_all = RetryPolicy.new(retry_on: nil)
191
+ c = CompositeRetryPolicy.new([RetryPolicy.new(retry_on: [NetworkError]), catch_all])
192
+ assert_equal NetworkError, c.policy_for(NetworkError.new).retry_on.first
193
+ assert_same catch_all, c.policy_for(RateLimitError.new), "falls through to catch-all"
194
+ end
195
+
196
+ def test_policy_for_subclass_routes_to_parent_policy
197
+ assert_equal [NetworkError], composite.policy_for(FlakyNetworkError.new).retry_on
198
+ end
199
+
200
+ def test_policy_for_no_match_returns_nil
201
+ assert_nil composite.policy_for(ArgumentError.new)
202
+ end
203
+
204
+ def test_retry_backoff_yields_matched_index_and_uses_count
205
+ yielded = nil
206
+ backoff = composite.retry_backoff(RateLimitError.new, attempts: 99) do |idx|
207
+ yielded = idx
208
+ 3 # pretend this is the 3rd rate-limit failure
209
+ end
210
+ assert_equal 1, yielded, "RateLimitError is the 2nd policy (index 1)"
211
+ # base 2, exponent (3-1)=2 -> 2 * 2**2 = 8
212
+ assert_in_delta 8.0, backoff.to_f, 0.001, "backoff uses the yielded count, not attempts:"
213
+ end
214
+
215
+ def test_retry_backoff_without_block_uses_attempts
216
+ backoff = composite.retry_backoff(NetworkError.new, attempts: 1)
217
+ assert_in_delta 1.0, backoff.to_f, 0.001
218
+ end
219
+
220
+ def test_retry_backoff_stops_at_matched_policy_cap
221
+ # DeclinedError policy max_attempts: 1 -> first failure (count 1) does not retry
222
+ assert_nil composite.retry_backoff(DeclinedError.new, attempts: 1) { |_idx| 1 }
223
+ end
224
+
225
+ def test_retry_backoff_no_match_returns_nil
226
+ assert_nil composite.retry_backoff(ArgumentError.new, attempts: 1) { |_idx| 1 }
227
+ end
228
+
229
+ def test_max_attempts_is_coarsest_bound
230
+ assert_equal 10, composite.max_attempts
231
+ end
232
+
233
+ def test_max_attempts_nil_when_any_unbounded
234
+ c = CompositeRetryPolicy.new([
235
+ RetryPolicy.new(max_attempts: 3),
236
+ RetryPolicy.new(max_attempts: nil)
237
+ ])
238
+ assert_nil c.max_attempts
239
+ end
240
+ end
241
+ ```
242
+
243
+ - [ ] **Step 2: Run the test, confirm it fails**
244
+
245
+ Run: `bin/rails test test/composite_retry_policy_test.rb`
246
+ Expected: `NameError: uninitialized constant ChronoForge::Executor::CompositeRetryPolicy`.
247
+
248
+ - [ ] **Step 3: Implement the class**
249
+
250
+ Create `lib/chrono_forge/executor/composite_retry_policy.rb`:
251
+
252
+ ```ruby
253
+ module ChronoForge
254
+ module Executor
255
+ # An ordered list of RetryPolicy objects, each scoped to an error type via
256
+ # its `retry_on`. On failure the first policy whose `retry_on` matches the
257
+ # raised error (by `is_a?`) is applied, giving each error type its own
258
+ # independent attempt budget and backoff curve. Put specific policies first
259
+ # and a catch-all (`retry_on: nil`) last; an unmatched error is not retried.
260
+ #
261
+ # Pure: it never reads storage. The per-error count is supplied by the
262
+ # caller through the block passed to #retry_backoff, keyed by the matched
263
+ # policy's index.
264
+ class CompositeRetryPolicy
265
+ attr_reader :policies
266
+
267
+ def initialize(policies)
268
+ @policies = Array(policies)
269
+ if @policies.empty?
270
+ raise ArgumentError, "composite retry policy needs at least one policy"
271
+ end
272
+ end
273
+
274
+ # First sub-policy whose retry_on matches the error, or nil.
275
+ def policy_for(error)
276
+ @policies.find { |p| p.matches?(error) }
277
+ end
278
+
279
+ # Routes on the live error and delegates the decision to the matched
280
+ # sub-policy. When a block is given it is called with the matched policy's
281
+ # index and must return that policy's running attempt count (1-based,
282
+ # including the current failure); otherwise `attempts` is used.
283
+ def retry_backoff(error, attempts:)
284
+ index = @policies.index { |p| p.matches?(error) }
285
+ return nil if index.nil?
286
+
287
+ sub = @policies[index]
288
+ count = block_given? ? yield(index) : attempts
289
+ sub.retryable?(error, count) ? sub.backoff_for(count) : nil
290
+ end
291
+
292
+ # Coarsest attempt bound across sub-policies, for the workflow-level
293
+ # safety-net guard. nil (unbounded) if any sub-policy is unbounded.
294
+ def max_attempts
295
+ caps = @policies.map(&:max_attempts)
296
+ caps.include?(nil) ? nil : caps.max
297
+ end
298
+ end
299
+ end
300
+ end
301
+ ```
302
+
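Because routing is first-match, the order of the list matters. The sketch below illustrates why the class comment recommends putting the catch-all last. It works on bare `retry_on` values instead of policy objects; `retry_on_matches?`, `first_match_index`, and `NetworkFailure` are hypothetical names used only here.

```ruby
class NetworkFailure < StandardError; end

# Mirrors the documented retry_on semantics: nil matches any StandardError,
# a list matches the listed classes and their subclasses.
def retry_on_matches?(retry_on, error)
  return error.is_a?(StandardError) if retry_on.nil?

  retry_on.any? { |klass| error.is_a?(klass) }
end

# Index of the first entry that matches, or nil when nothing matches.
def first_match_index(retry_on_list, error)
  retry_on_list.index { |retry_on| retry_on_matches?(retry_on, error) }
end

SPECIFIC_FIRST = [[NetworkFailure], nil].freeze
CATCH_ALL_FIRST = [nil, [NetworkFailure]].freeze

# With the specific entry first, NetworkFailure routes to index 0 and other
# errors fall through to the catch-all at index 1.
first_match_index(SPECIFIC_FIRST, NetworkFailure.new)
first_match_index(SPECIFIC_FIRST, RuntimeError.new)

# With the catch-all first, it shadows the specific entry: NetworkFailure
# also routes to index 0, so the entry at index 1 is never reached.
first_match_index(CATCH_ALL_FIRST, NetworkFailure.new)
```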
303
+ - [ ] **Step 4: Run the test, confirm green**
304
+
305
+ Run: `bin/rails test test/composite_retry_policy_test.rb test/retry_policy_test.rb`
306
+ Expected: all pass (this also makes Task 1's `test_compose_builds_composite` green).
307
+
308
+ - [ ] **Step 5: Commit**
309
+
310
+ ```bash
311
+ git add lib/chrono_forge/executor/composite_retry_policy.rb test/composite_retry_policy_test.rb
312
+ git commit -m "feat(retry): add CompositeRetryPolicy value object"
313
+ ```
314
+
315
+ ```json:metadata
316
+ {"files": ["lib/chrono_forge/executor/composite_retry_policy.rb", "test/composite_retry_policy_test.rb"], "verifyCommand": "bin/rails test test/composite_retry_policy_test.rb", "acceptanceCriteria": ["policy_for first-match/subclass/nil", "retry_backoff yields index and uses count", "no-block falls back to attempts", "max_attempts coarse bound", "empty list raises"], "requiresUserVerification": false}
317
+ ```
318
+
319
+ ---
320
+
321
+ ### Task 3: Executor coercion, class DSL overload, and `bump_retry_count!`
322
+
323
+ **Goal:** Accept arrays as composite policies everywhere, extend the class DSL to accept positional policies, and add the metadata counter helper used by step sites.
324
+
325
+ **Files:**
326
+ - Modify: `lib/chrono_forge/executor.rb`
327
+ - Test: `test/composite_retry_policy_executor_test.rb`
328
+
329
+ **Acceptance Criteria:**
330
+ - [ ] `coerce_policy` wraps an `Array` into a composite, passes a `RetryPolicy`/`CompositeRetryPolicy` through, and maps `nil` → `nil`
331
+ - [ ] `step_retry_policy` and `wait_retry_policy` coerce their override; `step_retry_policy` also coerces the class default
332
+ - [ ] `retry_policy(*policies)` with positional args stores a composite default; `retry_policy(**opts)` stays single; mixing raises `ArgumentError`
333
+ - [ ] `bump_retry_count!(log, idx)` increments the right slot, persists `metadata`, and returns the new count
334
+
335
+ **Verify:** `bin/rails test test/composite_retry_policy_executor_test.rb` → all pass
336
+
337
+ **Steps:**
338
+
339
+ - [ ] **Step 1: Write the failing test**
340
+
341
+ Create `test/composite_retry_policy_executor_test.rb`:
342
+
343
+ ```ruby
344
+ require "test_helper"
345
+
346
+ # White-box tests for the executor's composite plumbing: policy coercion, the
347
+ # class-level DSL overload, and the metadata-backed per-error counter.
348
+ class CompositeRetryPolicyExecutorTest < ActiveSupport::TestCase
349
+ RetryPolicy = ChronoForge::Executor::RetryPolicy
350
+ CompositeRetryPolicy = ChronoForge::Executor::CompositeRetryPolicy
351
+
352
+ # A bare object mixing in the executor so we can call its private helpers.
353
+ def executor
354
+ Class.new do
355
+ prepend ChronoForge::Executor
356
+ end.allocate
357
+ end
358
+
359
+ def test_coerce_policy_wraps_array
360
+ coerced = executor.send(:coerce_policy, [RetryPolicy.new, RetryPolicy.new])
361
+ assert_instance_of CompositeRetryPolicy, coerced
362
+ assert_equal 2, coerced.policies.size
363
+ end
364
+
365
+ def test_coerce_policy_passes_through_single_and_composite
366
+ single = RetryPolicy.new
367
+ assert_same single, executor.send(:coerce_policy, single)
368
+ composite = CompositeRetryPolicy.new([RetryPolicy.new])
369
+ assert_same composite, executor.send(:coerce_policy, composite)
370
+ end
371
+
372
+ def test_coerce_policy_nil
373
+ assert_nil executor.send(:coerce_policy, nil)
374
+ end
375
+
376
+ def test_class_dsl_positional_sets_composite_default
377
+ klass = Class.new do
378
+ prepend ChronoForge::Executor
379
+ retry_policy RetryPolicy.new(retry_on: [ArgumentError]), RetryPolicy.new(retry_on: nil)
380
+ end
381
+ assert_instance_of CompositeRetryPolicy, klass.default_retry_policy
382
+ end
383
+
384
+ def test_class_dsl_kwargs_sets_single_default
385
+ klass = Class.new do
386
+ prepend ChronoForge::Executor
387
+ retry_policy max_attempts: 7
388
+ end
389
+ assert_instance_of RetryPolicy, klass.default_retry_policy
390
+ assert_equal 7, klass.default_retry_policy.max_attempts
391
+ end
392
+
393
+ def test_class_dsl_mixing_positional_and_kwargs_raises
394
+ assert_raises(ArgumentError) do
395
+ Class.new do
396
+ prepend ChronoForge::Executor
397
+ retry_policy RetryPolicy.new, max_attempts: 3
398
+ end
399
+ end
400
+ end
401
+
402
+ def test_bump_retry_count_increments_and_persists
403
+ workflow = ChronoForge::Workflow.create!(job_class: "X", key: "bump-#{Time.now.to_i}-#{rand(10000)}")
404
+ log = ChronoForge::ExecutionLog.create!(workflow: workflow, step_name: "s", metadata: {})
405
+
406
+ assert_equal 1, executor.send(:bump_retry_count!, log, 0)
407
+ assert_equal 2, executor.send(:bump_retry_count!, log, 0)
408
+ assert_equal 1, executor.send(:bump_retry_count!, log, 1), "index 1 is independent"
409
+
410
+ log.reload
411
+ assert_equal({"0" => 2, "1" => 1}, log.metadata["retry_counts"])
412
+ end
413
+
414
+ def test_bump_retry_count_handles_nil_metadata
415
+ workflow = ChronoForge::Workflow.create!(job_class: "X", key: "bumpnil-#{Time.now.to_i}-#{rand(10000)}")
416
+ log = ChronoForge::ExecutionLog.create!(workflow: workflow, step_name: "s", metadata: nil)
417
+ assert_equal 1, executor.send(:bump_retry_count!, log, 0)
418
+ end
419
+ end
420
+ ```
421
+
422
+ - [ ] **Step 2: Run the test, confirm it fails**
423
+
424
+ Run: `bin/rails test test/composite_retry_policy_executor_test.rb`
425
+ Expected: failures — `coerce_policy` / `bump_retry_count!` undefined, DSL doesn't accept positional args.
426
+
427
+ - [ ] **Step 3: Implement the executor changes**
428
+
429
+ In `lib/chrono_forge/executor.rb`, replace the class DSL `retry_policy` method (currently inside the `class << base` block):
430
+
431
+ ```ruby
432
+ # Class-level DSL to set this workflow's default retry policy. Applies to
433
+ # workflow-level retries and to steps without a per-call override.
434
+ # Positional RetryPolicy objects build a composite (per-error budgets);
435
+ # keyword options build a single RetryPolicy. The two forms are mutually
436
+ # exclusive.
437
+ def retry_policy(*policies, **opts)
438
+ if policies.any? && opts.any?
439
+ raise ArgumentError, "retry_policy takes either positional policies or keyword options, not both"
440
+ end
441
+
442
+ self.default_retry_policy =
443
+ policies.any? ? RetryPolicy.compose(*policies) : RetryPolicy.new(**opts)
444
+ end
445
+ ```
446
+
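The three call forms of the DSL can be tried out with a stand-in base class. This is a hedged sketch: `SketchWorkflowBase` and its subclasses are hypothetical, and plain hashes and symbols stand in for the `RetryPolicy` and `CompositeRetryPolicy` objects the real DSL would build.

```ruby
class SketchWorkflowBase
  class << self
    attr_reader :default_retry_policy

    # Same shape as the DSL above: positional policies or keyword options,
    # never both. The stored value is a tagged hash instead of a policy object.
    def retry_policy(*policies, **opts)
      if policies.any? && opts.any?
        raise ArgumentError, "retry_policy takes either positional policies or keyword options, not both"
      end

      @default_retry_policy = policies.any? ? { composite: policies } : { single: opts }
    end
  end
end

class KeywordForm < SketchWorkflowBase
  retry_policy max_attempts: 7
end

class PositionalForm < SketchWorkflowBase
  # Symbols stand in for RetryPolicy instances.
  retry_policy :network_policy, :catch_all_policy
end

begin
  Class.new(SketchWorkflowBase) { retry_policy :network_policy, max_attempts: 3 }
rescue ArgumentError => e
  mixed_form_error = e.message
end
```

The splat plus double-splat signature keeps both forms on one method name, at the cost of a runtime check for the mixed case.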
447
+ Update the resolver methods to coerce, and add `coerce_policy` + `bump_retry_count!` among the private methods:
448
+
449
+ ```ruby
450
+ def step_retry_policy(override)
451
+ coerce_policy(override) || coerce_policy(self.class.default_retry_policy) || RetryPolicy.step_default
452
+ end
453
+
454
+ def wait_retry_policy(override)
455
+ coerce_policy(override) || RetryPolicy.wait_default
456
+ end
457
+ ```
458
+
459
+ ```ruby
460
+ # Normalize a retry-policy value: an Array becomes a composite; a RetryPolicy
461
+ # or CompositeRetryPolicy passes through; nil stays nil.
462
+ def coerce_policy(value)
463
+ value.is_a?(Array) ? RetryPolicy.compose(*value) : value
464
+ end
465
+
466
+ # JSON metadata key holding the per-error attempt counts of a composite
467
+ # policy, keyed by the matched policy's index (as a string).
468
+ RETRY_COUNTS_KEY = "retry_counts"
469
+
470
+ # Increment the matched policy's slot in the log's retry-count map and return
471
+ # the new count. Reassigns `metadata` so the JSON column is marked dirty.
472
+ def bump_retry_count!(log, policy_index)
473
+ meta = log.metadata || {}
474
+ counts = meta[RETRY_COUNTS_KEY] || {}
475
+ key = policy_index.to_s
476
+ counts[key] = counts[key].to_i + 1
477
+ meta[RETRY_COUNTS_KEY] = counts
478
+ log.update!(metadata: meta)
479
+ counts[key]
480
+ end
481
+ ```
482
+
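The string key in `bump_retry_count!` matters because JSON object keys are always strings. The sketch below imitates the counter on a plain hash and uses a `JSON.generate`/`JSON.parse` round trip as a stand-in for saving and reloading a JSON column; `bump_sketch` is a hypothetical helper, not the executor method.

```ruby
require "json"

# Plain-hash version of the counter: returns the updated metadata and the
# new count for the given policy index.
def bump_sketch(meta, policy_index)
  meta ||= {}
  counts = meta["retry_counts"] || {}
  key = policy_index.to_s
  counts[key] = counts[key].to_i + 1
  meta["retry_counts"] = counts
  [meta, counts[key]]
end

meta, first_count = bump_sketch(nil, 0)

# Simulated persistence: serialize and parse again, as a reload would.
meta = JSON.parse(JSON.generate(meta))
meta, second_count = bump_sketch(meta, 0)

# An Integer key would not survive the same round trip.
round_tripped = JSON.parse(JSON.generate({ 0 => 1 }))
round_tripped.key?(0)   # false
round_tripped.key?("0") # true
```

Converting the index with `to_s` before both the read and the write keeps the lookup consistent whether the hash was just built in memory or loaded from storage.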
483
+ Note: `workflow_retry_policy` does not need coercion — the class default is already coerced where it is set (the DSL stores a composite directly).
484
+
485
+ - [ ] **Step 4: Run the test, confirm green**
486
+
487
+ Run: `bin/rails test test/composite_retry_policy_executor_test.rb`
488
+ Expected: all pass.
489
+
490
+ - [ ] **Step 5: Commit**
491
+
492
+ ```bash
493
+ git add lib/chrono_forge/executor.rb test/composite_retry_policy_executor_test.rb
494
+ git commit -m "feat(retry): coerce array policies, composite class DSL, metadata counter"
495
+ ```
496
+
497
+ ```json:metadata
498
+ {"files": ["lib/chrono_forge/executor.rb", "test/composite_retry_policy_executor_test.rb"], "verifyCommand": "bin/rails test test/composite_retry_policy_executor_test.rb", "acceptanceCriteria": ["coerce_policy wraps array/passes through/nil", "resolvers coerce", "class DSL positional vs kwargs vs mixed", "bump_retry_count! increments+persists+nil-safe"], "requiresUserVerification": false}
499
+ ```
500
+
501
+ ---
502
+
503
+ ### Task 4: Wire the three step sites to `retry_backoff`
504
+
505
+ **Goal:** Switch `durably_execute`, `wait_until`, and `durably_repeat` from the `retryable? … backoff_for` pair to a single `retry_backoff` call that supplies the per-error count via `bump_retry_count!`.
506
+
507
+ **Files:**
508
+ - Modify: `lib/chrono_forge/executor/methods/durably_execute.rb:104-110`
509
+ - Modify: `lib/chrono_forge/executor/methods/wait_until.rb:129-135`
510
+ - Modify: `lib/chrono_forge/executor/methods/durably_repeat.rb:229-233`
511
+
512
+ **Acceptance Criteria:**
513
+ - [ ] Each site computes `backoff` via `policy.retry_backoff(e, attempts: <log.attempts>) { |idx| bump_retry_count!(<log>, idx) }`
514
+ - [ ] Retry branch uses the returned `backoff`; the else branch is the unchanged terminal action
515
+ - [ ] Single-policy behavior is unchanged (block never runs) — existing integration tests still pass
516
+
517
+ **Verify:** `bin/rails test test/retry_policy_integration_test.rb test/workflow_retry_api_test.rb` → all pass
518
+
519
+ **Steps:**
520
+
521
+ - [ ] **Step 1: Edit `durably_execute`**
522
+
523
+ In `lib/chrono_forge/executor/methods/durably_execute.rb`, replace the retry block (the `if policy.retryable?(e, execution_log.attempts)` … `else` head):
524
+
525
+ ```ruby
526
+ # Optional retry logic
527
+ backoff = policy.retry_backoff(e, attempts: execution_log.attempts) do |idx|
528
+ bump_retry_count!(execution_log, idx)
529
+ end
530
+ if backoff
531
+ # Reschedule with the policy's backoff. The workflow replays on
532
+ # resume and skips completed steps, so the rescheduled run picks
533
+ # this step up again by its persisted execution log.
534
+ self.class
535
+ .set(wait: backoff)
536
+ .perform_later(@workflow.key)
537
+
538
+ # Halt current execution
539
+ halt_execution!
540
+ else
541
+ ```
542
+
543
+ (The `else` body — marking the log failed and raising `ExecutionFailedError` — is unchanged.)
544
+
545
+ - [ ] **Step 2: Edit `wait_until`**
546
+
547
+ In `lib/chrono_forge/executor/methods/wait_until.rb`, replace the `if policy.retryable?(e, execution_log.attempts)` head:
548
+
549
+ ```ruby
550
+ backoff = policy.retry_backoff(e, attempts: execution_log.attempts) do |idx|
551
+ bump_retry_count!(execution_log, idx)
552
+ end
553
+ if backoff
554
+ # Reschedule with the policy's backoff
555
+ self.class
556
+ .set(wait: backoff)
557
+ .perform_later(
558
+ @workflow.key
559
+ )
560
+
561
+ # Halt current execution
562
+ halt_execution!
563
+ else
564
+ ```
565
+
566
+ (The `else` body is unchanged.)
567
+
568
+ - [ ] **Step 3: Edit `durably_repeat`**
569
+
570
+ In `lib/chrono_forge/executor/methods/durably_repeat.rb`, inside `execute_repetition_now`, replace the `if policy.retryable?(e, repetition_log.attempts)` head:
571
+
572
+ ```ruby
573
+ # Handle retry logic for this specific repetition
574
+ backoff = policy.retry_backoff(e, attempts: repetition_log.attempts) do |idx|
575
+ bump_retry_count!(repetition_log, idx)
576
+ end
577
+ if backoff
578
+ # Reschedule this same repetition with the policy's backoff
579
+ self.class
580
+ .set(wait: backoff)
581
+ .perform_later(@workflow.key)
582
+
583
+ # Halt current execution
584
+ halt_execution!
585
+ else
586
+ ```
587
+
588
+ (The `else` body — marking failed and applying `on_error` — is unchanged.)
589
+
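The reason the same call works for both policy kinds is that only the composite yields to the block. A minimal sketch of that contract follows, with hypothetical stand-ins (`PlainPolicySketch`, `CompositePolicySketch`, `step_site_decision`), a fixed one-second backoff, and a plain Hash in place of an execution log. The `count < max_attempts` rule is an assumption.

```ruby
class PlainPolicySketch
  def initialize(max_attempts:)
    @max_attempts = max_attempts
  end

  # Never yields, so a caller's counting block has no side effects.
  def retry_backoff(_error, attempts:)
    attempts < @max_attempts ? 1.0 : nil
  end
end

class CompositePolicySketch
  def initialize(max_attempts:)
    @max_attempts = max_attempts
  end

  # Sketch with a single sub-policy at index 0 that matches every error.
  def retry_backoff(_error, attempts:)
    count = block_given? ? yield(0) : attempts
    count < @max_attempts ? 1.0 : nil
  end
end

# Mimics a step site: one call, with the counter supplied through the block.
def step_site_decision(policy, log)
  backoff = policy.retry_backoff(StandardError.new, attempts: log[:attempts]) do |idx|
    counts = (log[:metadata]["retry_counts"] ||= {})
    counts[idx.to_s] = counts[idx.to_s].to_i + 1
  end
  backoff ? :rescheduled : :failed
end

plain_log = { attempts: 1, metadata: {} }
composite_log = { attempts: 1, metadata: {} }
step_site_decision(PlainPolicySketch.new(max_attempts: 3), plain_log)
step_site_decision(CompositePolicySketch.new(max_attempts: 3), composite_log)
```

After both calls, `plain_log[:metadata]` is still empty while `composite_log[:metadata]` carries a `retry_counts` entry, which is the property the single-policy regression relies on.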
590
+ - [ ] **Step 4: Run the existing integration + unit suites, confirm green**
591
+
592
+ Run: `bin/rails test test/retry_policy_integration_test.rb test/workflow_retry_api_test.rb`
593
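+ Expected: all pass. Single-policy behavior is unchanged: a plain `RetryPolicy#retry_backoff` ignores the block, so `bump_retry_count!` never runs and no `retry_counts` are written.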
594
+
595
+ - [ ] **Step 5: Commit**
596
+
597
+ ```bash
598
+ git add lib/chrono_forge/executor/methods/durably_execute.rb lib/chrono_forge/executor/methods/wait_until.rb lib/chrono_forge/executor/methods/durably_repeat.rb
599
+ git commit -m "feat(retry): route step sites through retry_backoff with per-error counts"
600
+ ```
601
+
602
+ ```json:metadata
603
+ {"files": ["lib/chrono_forge/executor/methods/durably_execute.rb", "lib/chrono_forge/executor/methods/wait_until.rb", "lib/chrono_forge/executor/methods/durably_repeat.rb"], "verifyCommand": "bin/rails test test/retry_policy_integration_test.rb test/workflow_retry_api_test.rb", "acceptanceCriteria": ["each step site uses retry_backoff + bump_retry_count!", "terminal branches unchanged", "single-policy integration tests pass"], "requiresUserVerification": false}
604
+ ```
605
+
606
+ ---
607
+
608
+ ### Task 5: Wire the workflow-level `perform` site
609
+
610
+ **Goal:** Give workflow-level (uncaught) retries per-error budgets by threading a `retry_counts` map through the job args, and keep the early safety-net guard correct for composites.
611
+
612
+ **Files:**
613
+ - Modify: `lib/chrono_forge/executor.rb` (`perform` signature + rescue block, ~lines 64-126)
614
+
615
+ **Acceptance Criteria:**
616
+ - [ ] `perform` accepts `retry_counts: {}` and threads it through the retry reschedule alongside `attempt:`
617
+ - [ ] The rescue uses `policy.retry_backoff(e, attempts: attempts_made) { |idx| <increment retry_counts[idx]> }`
618
+ - [ ] The early guard `attempt >= policy.max_attempts` still works (composite returns a coarse `max_attempts`)
619
+ - [ ] Existing workflow-level retry tests pass
620
+
621
+ **Verify:** `bin/rails test test/workflow_retry_api_test.rb test/retry_policy_integration_test.rb` → all pass
622
+
623
+ **Steps:**
624
+
625
+ - [ ] **Step 1: Add `retry_counts` to the signature**
626
+
627
+ In `lib/chrono_forge/executor.rb`, change:
628
+
629
+ ```ruby
630
+ def perform(key, attempt: 0, retry_workflow: false, options: {}, **kwargs)
631
+ ```
632
+
633
+ to:
634
+
635
+ ```ruby
636
+ def perform(key, attempt: 0, retry_counts: {}, retry_workflow: false, options: {}, **kwargs)
637
+ ```
638
+
639
+ - [ ] **Step 2: Use `retry_backoff` in the rescue**
640
+
641
+ Replace the workflow-level retry decision (currently `if policy.retryable?(e, attempts_made)` … through the `perform_later(... attempt: attempts_made)`):
642
+
643
+ ```ruby
644
+ # Retry if applicable. `attempt` is a 0-based index, so the count of
645
+ # attempts made so far (including this one) is attempt + 1.
646
+ attempts_made = attempt + 1
647
+ backoff = policy.retry_backoff(e, attempts: attempts_made) do |idx|
648
+ key_s = idx.to_s
649
+ retry_counts[key_s] = retry_counts[key_s].to_i + 1
650
+ retry_counts[key_s]
651
+ end
652
+ if backoff
653
+ self.class
654
+ .set(wait: backoff)
655
+ .perform_later(workflow.key, attempt: attempts_made, retry_counts: retry_counts)
656
+ else
657
+ fail_workflow! error_log
658
+ end
659
+ ```
660
+
661
+ The early safety-net guard at the top of `perform` is unchanged:
662
+
663
+ ```ruby
664
+ policy = workflow_retry_policy
665
+ if policy.max_attempts && attempt >= policy.max_attempts
666
+ Rails.logger.error { "ChronoForge:#{self.class} max attempts reached for job workflow(#{key})" }
667
+ return
668
+ end
669
+ ```
670
+
671
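+ `CompositeRetryPolicy#max_attempts` (Task 2) returns the largest sub-policy bound, or `nil` if any sub-policy is unbounded, so for a single error type this guard never fires before that error's own budget is spent. Note that `attempt` counts every workflow-level failure regardless of error type, so a run that mixes error types can reach the guard before each per-error budget is exhausted; the per-error decision itself is made by `retry_backoff` in the rescue.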
+
+ - [ ] **Step 3: Run the workflow-level suites, confirm green**
+
+ Run: `bin/rails test test/workflow_retry_api_test.rb test/retry_policy_integration_test.rb`
+ Expected: all pass — single-policy workflow retries thread an empty `retry_counts` and behave exactly as before.
+
+ - [ ] **Step 4: Commit**
+
+ ```bash
+ git add lib/chrono_forge/executor.rb
+ git commit -m "feat(retry): per-error budgets for workflow-level retries via job args"
+ ```
+
+ ```json:metadata
+ {"files": ["lib/chrono_forge/executor.rb"], "verifyCommand": "bin/rails test test/workflow_retry_api_test.rb test/retry_policy_integration_test.rb", "acceptanceCriteria": ["perform threads retry_counts", "rescue uses retry_backoff", "safety-net guard honors coarse max_attempts", "single-policy workflow tests pass"], "requiresUserVerification": false}
+ ```
+
+ ---
+
+ ### Task 6: Integration tests for composite behavior
+
+ **Goal:** Prove per-error budgets, per-error backoff, fail-fast, subclass routing, array coercion, and the single-policy regression — end to end through the executor.
+
+ **Files:**
+ - Create: `test/composite_retry_policy_integration_test.rb`
+
+ **Acceptance Criteria:**
+ - [ ] Different error types accumulate independent budgets at one step
+ - [ ] A `max_attempts: 1` sub-policy fails fast
+ - [ ] A subclass of a `retry_on` class draws from the parent policy's budget
+ - [ ] An array passed to `retry_policy:` is honored (coerced to composite)
+ - [ ] A single policy still writes no `retry_counts`
+
+ **Verify:** `bin/rails test test/composite_retry_policy_integration_test.rb` → all pass
+
+ **Steps:**
+
+ - [ ] **Step 1: Write the integration tests**
+
+ Create `test/composite_retry_policy_integration_test.rb`:
+
+ ```ruby
+ require "test_helper"
+
+ # End-to-end: composite retry_policy arrays wired through the executor.
+ class CompositeRetryPolicyIntegrationTest < ActiveJob::TestCase
+ include ChaoticJob::Helpers
+
+ RetryPolicy = ChronoForge::Executor::RetryPolicy
+
+ class NetworkError < StandardError; end
+ class FlakyNetworkError < NetworkError; end
+ class DeclinedError < StandardError; end
+
+ def define_workflow(name, &block)
+ test_class_name = "#{name}#{Time.now.to_i}_#{rand(100000)}"
+ Object.const_set(test_class_name, Class.new(WorkflowJob) do
+ prepend ChronoForge::Executor
+ class_eval(&block)
+ end)
+ Object.const_get(test_class_name)
+ end
+
+ def test_each_error_type_has_an_independent_budget
+ key = "composite_budgets_#{Time.now.to_i}_#{rand(10000)}"
+ klass = define_workflow("CompositeBudgets") do
+ define_method(:perform) do
+ durably_execute :flaky, retry_policy: [
+ RetryPolicy.new(retry_on: [NetworkError], max_attempts: 3, base: 0, cap: 0, jitter: false),
+ RetryPolicy.new(retry_on: [DeclinedError], max_attempts: 1)
+ ]
+ end
+       # Raises NetworkError on the first two attempts (within its budget of
+       # 3), then raises DeclinedError, which fails fast at max_attempts: 1.
+ define_method(:flaky) do
+ n = (context[:n] = (context[:n] || 0) + 1)
+ raise NetworkError, "net #{n}" if n < 3
+ raise DeclinedError, "declined"
+ end
+ end
+
+ klass.perform_later(key)
+ perform_all_jobs
+
+ workflow = ChronoForge::Workflow.find_by(key: key)
+ log = workflow.execution_logs.find_by(step_name: "durably_execute$flaky")
+     # `flaky` raises NetworkError for n = 1, 2 (budget 3, not exhausted), then
+     # DeclinedError at n = 3 (budget 1, fails fast): 2 + 1 = 3 attempts.
+     assert_equal 3, log.attempts
+     assert_equal "failed", log.state
+     assert_equal({"0" => 2, "1" => 1}, log.metadata["retry_counts"])
+ end
+
+ def test_subclass_draws_from_parent_policy_budget
+ key = "composite_subclass_#{Time.now.to_i}_#{rand(10000)}"
+ klass = define_workflow("CompositeSubclass") do
+ define_method(:perform) do
+ durably_execute :always_flaky, retry_policy: [
+ RetryPolicy.new(retry_on: [NetworkError], max_attempts: 2, base: 0, cap: 0, jitter: false)
+ ]
+ end
+ define_method(:always_flaky) { raise FlakyNetworkError, "boom" }
+ end
+
+ klass.perform_later(key)
+ perform_all_jobs
+
+ workflow = ChronoForge::Workflow.find_by(key: key)
+ log = workflow.execution_logs.find_by(step_name: "durably_execute$always_flaky")
+ assert_equal 2, log.attempts, "subclass routes to NetworkError policy (budget 2)"
+ assert_equal({"0" => 2}, log.metadata["retry_counts"])
+ end
+
+ def test_unmatched_error_fails_fast
+ key = "composite_unmatched_#{Time.now.to_i}_#{rand(10000)}"
+ klass = define_workflow("CompositeUnmatched") do
+ define_method(:perform) do
+ durably_execute :raises_arg, retry_policy: [
+ RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5)
+ ]
+ end
+ define_method(:raises_arg) { raise ArgumentError, "nope" }
+ end
+
+ klass.perform_later(key)
+ perform_all_jobs
+
+ workflow = ChronoForge::Workflow.find_by(key: key)
+ log = workflow.execution_logs.find_by(step_name: "durably_execute$raises_arg")
+ assert_equal 1, log.attempts, "no matching policy -> fail fast"
+ assert_equal "failed", log.state
+ end
+
+ def test_single_policy_writes_no_retry_counts
+ key = "single_no_counts_#{Time.now.to_i}_#{rand(10000)}"
+ klass = define_workflow("SingleNoCounts") do
+ define_method(:perform) do
+ durably_execute :always_fails,
+ retry_policy: RetryPolicy.new(max_attempts: 2, base: 0, cap: 0, jitter: false)
+ end
+ define_method(:always_fails) { raise "boom" }
+ end
+
+ klass.perform_later(key)
+ perform_all_jobs
+
+ workflow = ChronoForge::Workflow.find_by(key: key)
+ log = workflow.execution_logs.find_by(step_name: "durably_execute$always_fails")
+ assert_equal 2, log.attempts
+ assert_nil log.metadata["retry_counts"], "single policy path writes no retry_counts"
+ end
+ end
+ ```
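The budget semantics these tests exercise can be modeled without Rails. The sketch below is a plain-Ruby approximation under stated assumptions: `SketchPolicy`, `simulate`, and the error classes are hypothetical, every failure bumps the matched policy's counter, and a step fails once that counter reaches the policy's `max_attempts` (the behavior the subclass and single-policy tests assert).

```ruby
# Hypothetical error classes and policy struct, for illustration only.
class SketchNetworkError < StandardError; end
class SketchRateLimitError < StandardError; end

SketchPolicy = Struct.new(:retry_on, :max_attempts)

# Replays a scripted sequence of error classes against the policies and
# returns [total_attempts, retry_counts] at the point the step fails.
def simulate(policies, raised_errors)
  counts = {}
  attempts = 0
  raised_errors.each do |error_class|
    attempts += 1
    # First policy whose retry_on lists the class or one of its ancestors.
    idx = policies.index { |p| p.retry_on.any? { |k| error_class <= k } }
    return [attempts, counts] if idx.nil? # unmatched: fail fast
    key = idx.to_s
    counts[key] = counts[key].to_i + 1
    return [attempts, counts] if counts[key] >= policies[idx].max_attempts
  end
  [attempts, counts]
end

policies = [
  SketchPolicy.new([SketchNetworkError], 3),
  SketchPolicy.new([SketchRateLimitError], 2)
]

# A rate-limit failure in the middle does not consume the network budget.
attempts, counts = simulate(
  policies,
  [SketchNetworkError, SketchRateLimitError, SketchNetworkError, SketchNetworkError]
)
```

In this model the step fails on the fourth attempt, when the third network failure exhausts index `"0"`; the single rate-limit failure is tracked separately under `"1"`.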
+
+ - [ ] **Step 2: Run the test, confirm green**
+
+ Run: `bin/rails test test/composite_retry_policy_integration_test.rb`
+ Expected: all pass.
+
+ - [ ] **Step 3: Run the full suite**
+
+ Run: `bin/rails test`
+ Expected: all pass — no regressions across the whole suite.
+
+ - [ ] **Step 4: Commit**
+
+ ```bash
+ git add test/composite_retry_policy_integration_test.rb
+ git commit -m "test(retry): integration coverage for composite retry policies"
+ ```
+
+ ```json:metadata
+ {"files": ["test/composite_retry_policy_integration_test.rb"], "verifyCommand": "bin/rails test", "acceptanceCriteria": ["independent per-error budgets", "fail-fast max_attempts:1", "subclass routes to parent budget", "array coerced", "single policy writes no retry_counts", "full suite green"], "requiresUserVerification": false}
+ ```
+
+ ---
+
+ ### Task 7: Document composite policies in the README
+
+ **Goal:** Add a "Composite retry policies" subsection to the existing Retry Policies docs, including the ordering footgun and the per-error-budget semantics.
+
+ **Files:**
+ - Modify: `README.md` (Retry Policies section, ~line 214-252)
+
+ **Acceptance Criteria:**
+ - [ ] A worked composite example (`retry_policy: [ ... ]`) with `retry_on` per policy
+ - [ ] States: first match wins; catch-all (`retry_on: nil`) last; unmatched → fail fast
+ - [ ] States: each error type has an independent budget and its own backoff
+ - [ ] Notes the class-level DSL accepts positional policies for a composite default
+
+ **Verify:** `grep -n "Composite" README.md` → shows the new subsection heading
+
+ **Steps:**
+
+ - [ ] **Step 1: Add the subsection**
+
+ In `README.md`, after the existing per-call/class-default retry examples (just before the section that documents `wait_until`'s opt-in, around line 252), insert:
+
+ ````markdown
+ #### Composite policies (per-error budgets)
+
+ Pass an **array** of policies to handle different error types differently. On a
+ failure, the **first** policy whose `retry_on` matches the raised error applies —
+ each error type gets its **own independent attempt budget and backoff**:
+
+ ```ruby
+ durably_execute :charge_card, retry_policy: [
+ RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5), # transient: retry hard
+ RetryPolicy.new(retry_on: [RateLimitError], max_attempts: 10, base: 5), # back off longer
+ RetryPolicy.new(retry_on: [PaymentDeclinedError], max_attempts: 1), # fail fast, never retry
+ RetryPolicy.new(retry_on: nil) # catch-all (optional), keep last
+ ]
+ ```
+
+ - **Order matters** — the first matching policy wins, so list specific errors
+ first and a catch-all (`retry_on: nil`) last. An error matched by no policy is
+ **not retried** (fails fast).
+ - A subclass of a listed error routes to that policy and draws from its budget.
+ - The class-level DSL accepts the same form as positional arguments:
+
+ ```ruby
+ retry_policy RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5),
+ RetryPolicy.new(retry_on: nil, max_attempts: 2)
+ ```
+ ````
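The routing rule stated in the README text above (first match wins, a subclass matches its parent's policy, `retry_on: nil` catches everything, and no match means no retry) can be sketched as a small lookup. `sketch_policy_index_for` and the error classes are hypothetical names for illustration; the real lookup belongs to `CompositeRetryPolicy`.

```ruby
# Hypothetical error hierarchy for illustration.
class SketchTransportError < StandardError; end
class SketchTimeoutError < SketchTransportError; end
class SketchDeclinedError < StandardError; end

# Each entry is a retry_on list; nil models a catch-all policy.
# Returns the index of the first matching entry, or nil when none match.
def sketch_policy_index_for(retry_on_lists, error)
  retry_on_lists.index do |retry_on|
    retry_on.nil? || retry_on.any? { |klass| error.is_a?(klass) }
  end
end

ordered    = [[SketchTransportError], [SketchDeclinedError], nil]
misordered = [nil, [SketchTransportError], [SketchDeclinedError]]
no_default = [[SketchTransportError]]

# A subclass routes to its parent's policy.
subclass_hit = sketch_policy_index_for(ordered, SketchTimeoutError.new)
# An unlisted error falls through to the trailing catch-all.
catch_all_hit = sketch_policy_index_for(ordered, ArgumentError.new)
# A catch-all listed first shadows every later, more specific policy.
shadowed = sketch_policy_index_for(misordered, SketchDeclinedError.new)
# With no catch-all, an unlisted error matches nothing and is not retried.
unmatched = sketch_policy_index_for(no_default, ArgumentError.new)
```

The `misordered` case is the ordering footgun: the declined error lands on the catch-all at index 0 and never reaches its own fail-fast policy.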
+
+ - [ ] **Step 2: Verify the heading exists**
+
+ Run: `grep -n "Composite policies" README.md`
+ Expected: one match.
+
+ - [ ] **Step 3: Commit**
+
+ ```bash
+ git add README.md
+ git commit -m "docs(retry): document composite per-error retry policies"
+ ```
+
+ ```json:metadata
+ {"files": ["README.md"], "verifyCommand": "grep -n 'Composite policies' README.md", "acceptanceCriteria": ["worked array example", "ordering + catch-all + fail-fast stated", "independent budget/backoff stated", "class DSL positional form shown"], "requiresUserVerification": false}
+ ```
+
+ ---
+
+ ## Self-Review
+
+ **Spec coverage:**
+ - `matches?`, `retry_backoff`, `compose` → Task 1 ✓
+ - `CompositeRetryPolicy` (routing, block-count, `max_attempts`, empty guard) → Task 2 ✓
+ - `coerce_policy`, class DSL overload, `bump_retry_count!` → Task 3 ✓
+ - Step-site wiring (metadata counter) → Task 4 ✓
+ - Workflow-level wiring (job-arg counter, safety net) → Task 5 ✓
+ - Integration: per-error budgets, fail-fast, subclass, array coercion, single-policy regression → Task 6 ✓
+ - Ordering/reorder docs → Task 7 ✓ (the reorder caveat is a documented edge case in the spec; the README covers ordering)
+
+ **Placeholder scan:** none — every code/step is concrete.
+
+ **Type consistency:** `retry_backoff(error, attempts:)`, `matches?(error)`, `policy_for(error)`, `RetryPolicy.compose`, `CompositeRetryPolicy.new(policies)`, `coerce_policy`, `bump_retry_count!(log, idx)`, `RETRY_COUNTS_KEY = "retry_counts"`, `retry_counts:` job arg — consistent across all tasks.
+
+ **Verification requirement scan:** The spec/prompt requires NO user verification (internal API, test-covered). No verification task needed.