chrono_forge 0.9.1 → 0.10.0

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +22 -0
  3. data/README.md +305 -44
  4. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md +1748 -0
  5. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md.tasks.json +17 -0
  6. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md +930 -0
  7. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md.tasks.json +54 -0
  8. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md +241 -0
  9. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md.tasks.json +12 -0
  10. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md +1378 -0
  11. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md.tasks.json +67 -0
  12. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md +709 -0
  13. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md.tasks.json +19 -0
  14. data/docs/superpowers/specs/2026-06-03-unified-retry-policy-design.md +226 -0
  15. data/docs/superpowers/specs/2026-06-25-chrono_forge-dashboard-design.md +190 -0
  16. data/docs/superpowers/specs/2026-06-25-composite-retry-policies-design.md +228 -0
  17. data/docs/superpowers/specs/2026-06-25-reserved-kwarg-guard-design.md +169 -0
  18. data/docs/superpowers/specs/2026-06-25-spawn-merge-branches-design.md +468 -0
  19. data/docs/superpowers/specs/2026-06-26-dashboard-branch-view-design.md +142 -0
  20. data/docs/superpowers/specs/2026-06-26-deferral-continuation-race-and-catchup-design.md +265 -0
  21. data/lib/chrono_forge/branch_merge_job.rb +138 -0
  22. data/lib/chrono_forge/branch_probe.rb +26 -0
  23. data/lib/chrono_forge/cleanup.rb +6 -0
  24. data/lib/chrono_forge/execution_log.rb +6 -0
  25. data/lib/chrono_forge/executor/composite_retry_policy.rb +47 -0
  26. data/lib/chrono_forge/executor/methods/branch.rb +185 -0
  27. data/lib/chrono_forge/executor/methods/durably_execute.rb +21 -19
  28. data/lib/chrono_forge/executor/methods/durably_repeat.rb +118 -25
  29. data/lib/chrono_forge/executor/methods/merge_branches.rb +83 -0
  30. data/lib/chrono_forge/executor/methods/wait.rb +2 -4
  31. data/lib/chrono_forge/executor/methods/wait_until.rb +25 -25
  32. data/lib/chrono_forge/executor/methods/workflow_states.rb +16 -0
  33. data/lib/chrono_forge/executor/methods.rb +2 -0
  34. data/lib/chrono_forge/executor/retry_policy.rb +111 -0
  35. data/lib/chrono_forge/executor.rb +216 -28
  36. data/lib/chrono_forge/version.rb +1 -1
  37. data/lib/chrono_forge/workflow.rb +10 -1
  38. data/lib/generators/chrono_forge/migration_actions.rb +1 -0
  39. data/lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb +38 -0
  40. metadata +42 -5
  41. data/lib/chrono_forge/executor/retry_strategy.rb +0 -29
@@ -0,0 +1,709 @@
1
+ # Deferral Continuation Race & Catch-up Surge — Implementation Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development (recommended) or superpowers-extended-cc:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
+
5
+ **Goal:** Close the continuation/lock-release race (Issue 1) by publishing every continuation only after the lock is released, and collapse `durably_repeat` catch-up from O(missed intervals) to O(1) with a closed-form fast-forward of the expired prefix (Issue 2).
6
+
7
+ **Architecture:** (1) Deferral primitives stop calling `perform_later` inline; they record an intended continuation on the instance, and the executor flushes it in `ensure` *after* `release_lock`. (2) `durably_repeat` computes the first non-expired grid tick in closed form, advances the coordination log's `last_execution_at`, and writes a single summary `ExecutionLog` for the skipped prefix instead of one timed-out row per tick.
8
+
9
+ **Tech Stack:** Ruby 3.2, Rails (ActiveJob/ActiveRecord), Minitest + `chaotic_job`, SolidQueue (prod). Gem: `chrono_forge` 0.9.1.
10
+
11
+ **Spec:** `docs/superpowers/specs/2026-06-26-deferral-continuation-race-and-catchup-design.md`
12
+
13
+ **User Verification:** NO — no user verification required (automated tests are the acceptance gate).
14
+
15
+ **Test command (single file):** `bundle exec ruby -Itest test/<file>_test.rb`
16
+ **Full suite:** `bundle exec rake test`
17
+
18
+ ---
19
+
20
+ ## File Structure
21
+
22
+ | File | Responsibility | Change |
23
+ |---|---|---|
24
+ | `lib/chrono_forge/executor.rb` | Continuation recording + post-release flush | Add `enqueue_continuation` / `flush_continuation!`; flush in `ensure`; convert workflow-retry enqueue |
25
+ | `lib/chrono_forge/executor/methods/wait.rb` | `wait` reschedule | Convert inline enqueue → `enqueue_continuation` |
26
+ | `lib/chrono_forge/executor/methods/wait_until.rb` | poll + cond-error retry | Convert 2 inline enqueues |
27
+ | `lib/chrono_forge/executor/methods/durably_execute.rb` | retry backoff | Convert 1 inline enqueue |
28
+ | `lib/chrono_forge/executor/methods/durably_repeat.rb` | schedule-later, repetition-retry, schedule-next, **fast-forward** | Convert 3 inline enqueues (Task 1); add `fast_forward_expired_prefix` (Task 2) |
29
+ | `test/continuation_flush_test.rb` | Issue 1 tests | Create (Task 1) |
30
+ | `test/durably_repeat_test.rb` | Issue 2 tests + updates | Add fast-forward tests; update 2 timeout tests (Task 2) |
31
+
32
+ ---
33
+
34
+ ### Task 1: Defer all continuation enqueues until after lock release
35
+
36
+ **Goal:** No continuation job is published while the enqueuing job still holds the workflow lock; all 8 enqueue sites route through one recorded slot flushed in `ensure` after `release_lock`.
37
+
38
+ **Files:**
39
+ - Modify: `lib/chrono_forge/executor.rb` (add helpers near `halt_execution!` ~`:305`; flush in `ensure` `:168-173`; convert workflow-retry enqueue `:162-164`)
40
+ - Modify: `lib/chrono_forge/executor/methods/wait.rb:106-108`
41
+ - Modify: `lib/chrono_forge/executor/methods/wait_until.rb:134-138` and `:180-185`
42
+ - Modify: `lib/chrono_forge/executor/methods/durably_execute.rb:111-113`
43
+ - Modify: `lib/chrono_forge/executor/methods/durably_repeat.rb:192-194`, `:234-236`, `:287-289`
44
+ - Test: `test/continuation_flush_test.rb` (create)
45
+
46
+ **Acceptance Criteria:**
47
+ - [ ] Every continuation observes the workflow lock already released (`locked_by == nil`) at enqueue time.
48
+ - [ ] Per-site kwargs are preserved (`wait_condition:` for the `wait_until` poll; `attempt:`/`retry_counts:` for the workflow retry).
49
+ - [ ] `flush_continuation!` is a no-op when no continuation was recorded, and is skipped when `release_lock` raises (the job overran and has already lost the lock).
50
+ - [ ] Full suite still green (regression guard for retry/attempt threading).
51
+
52
+ **Verify:** `bundle exec ruby -Itest test/continuation_flush_test.rb` → all pass; then `bundle exec rake test` → green.
53
+
54
+ **Steps:**
55
+
56
+ - [ ] **Step 1: Write the failing tests**
57
+
58
+ Create `test/continuation_flush_test.rb`:
59
+
60
+ ```ruby
61
+ require "test_helper"
62
+
63
+ class ContinuationFlushTest < ActiveJob::TestCase
64
+ include ChaoticJob::Helpers
65
+
66
+ def setup
67
+ ChronoForge::Workflow.destroy_all
68
+ end
69
+
70
+ # The core ordering guarantee: a continuation must only become claimable after
71
+ # the enqueuing job has released the lock. We observe the workflow's lock owner
72
+ # in the DB at the instant each same-key continuation is enqueued; it must be nil.
73
+ def test_continuation_is_enqueued_only_after_lock_released
74
+ key = "flush_order_#{Time.now.to_i}_#{rand(10_000)}"
75
+
76
+ locked_owners = []
77
+ subscriber = ActiveSupport::Notifications.subscribe(/\Aenqueue(_at)?\.active_job\z/) do |*args| # delayed enqueues (`set(wait:)`) emit enqueue_at.active_job
78
+ event = ActiveSupport::Notifications::Event.new(*args)
79
+ job = event.payload[:job]
80
+ next unless job.arguments.first == key
81
+ wf = ChronoForge::Workflow.find_by(key: key)
82
+ locked_owners << (wf && wf.locked_by)
83
+ end
84
+
85
+ begin
86
+ WaitContinuationJob.perform_later(key)
87
+ perform_all_jobs_before(1.second)
88
+ ensure
89
+ ActiveSupport::Notifications.unsubscribe(subscriber)
90
+ end
91
+
92
+ # At least one continuation enqueue must have been observed from inside the job.
93
+ refute locked_owners.empty?, "expected to observe a continuation enqueue"
94
+ assert locked_owners.all?(&:nil?),
95
+ "continuation must be enqueued only after lock release; observed owners: #{locked_owners.inspect}"
96
+ end
97
+
98
+ # flush_continuation! must round-trip arbitrary kwargs into the continuation.
99
+ def test_flush_continuation_preserves_kwargs
100
+ key = "flush_kwargs_#{Time.now.to_i}_#{rand(10_000)}"
101
+ workflow = ChronoForge::Workflow.create!(
102
+ key: key, job_class: "KitchenSink", kwargs: {}, options: {}, context: {}, state: :idle
103
+ )
104
+
105
+ job = KitchenSink.new
106
+ job.instance_variable_set(:@workflow, workflow)
107
+ job.send(:enqueue_continuation, wait: 0.seconds, wait_condition: "my_cond")
108
+
109
+ assert_difference -> { enqueued_jobs.size }, 1 do
110
+ job.send(:flush_continuation!)
111
+ end
112
+
113
+ last = enqueued_jobs.last
114
+ assert_includes last.to_s, key, "continuation should target the workflow key"
115
+ assert_includes last.to_s, "my_cond", "continuation must carry the wait_condition kwarg"
116
+ end
117
+
118
+ # No recorded continuation => flush does nothing.
119
+ def test_flush_continuation_is_noop_without_recorded_continuation
120
+ job = KitchenSink.new
121
+ assert_no_difference -> { enqueued_jobs.size } do
122
+ job.send(:flush_continuation!)
123
+ end
124
+ end
125
+ end
126
+
127
+ class WaitContinuationJob < WorkflowJob
128
+ prepend ChronoForge::Executor
129
+
130
+ def perform
131
+ # First pass: wait period not elapsed -> records a continuation and halts.
132
+ wait 1.hour, "long_wait"
133
+ end
134
+ end
135
+ ```
136
+
137
+ - [ ] **Step 2: Run tests to verify they fail**
138
+
139
+ Run: `bundle exec ruby -Itest test/continuation_flush_test.rb`
140
+ Expected: FAIL —
141
+ - `test_continuation_is_enqueued_only_after_lock_released`: observed owner is the job id (non-nil), because `wait` enqueues before the `ensure` release.
142
+ - `test_flush_continuation_preserves_kwargs` / `..._noop_...`: `NoMethodError: undefined method 'enqueue_continuation'/'flush_continuation!'`.
143
+
144
+ - [ ] **Step 3: Add the recording + flush helpers in the executor**
145
+
146
+ In `lib/chrono_forge/executor.rb`, add near `halt_execution!` (private section, ~`:305`):
147
+
148
+ ```ruby
149
+ # Record the continuation this job intends to enqueue. It is NOT published
150
+ # here: publishing while the lock is still held lets another worker claim it
151
+ # and lose the lock-acquisition race. The executor flushes it in `ensure`,
152
+ # after release_lock (see #flush_continuation!). At most one continuation is
153
+ # recorded per job run (every primitive records one then halts, or falls
154
+ # through the workflow-retry rescue).
155
+ def enqueue_continuation(wait:, **kwargs)
156
+ @continuation = {wait: wait, kwargs: kwargs}
157
+ end
158
+
159
+ # Publish the recorded continuation, if any. Called from `ensure` only after
160
+ # the lock row has been updated to released, so even a zero-delay continuation
161
+ # finds the lock free.
162
+ def flush_continuation!
163
+ return unless @continuation
164
+
165
+ self.class
166
+ .set(wait: @continuation[:wait])
167
+ .perform_later(@workflow.key, **@continuation[:kwargs])
168
+ end
169
+ ```
170
+
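The record-then-flush ordering can be tried outside Rails. The following is a minimal sketch, assuming a toy in-memory lock; `FakeLock`, `SketchJob`, and the `lock_owner_at_publish` key are hypothetical names used only for illustration and are not part of chrono_forge.

```ruby
# Illustration only: FakeLock and SketchJob are hypothetical stand-ins.
class FakeLock
  attr_reader :owner

  def acquire(job_id)
    @owner = job_id
  end

  def release(job_id)
    raise "lock lost" unless @owner == job_id
    @owner = nil
  end
end

class SketchJob
  attr_reader :published

  def initialize(lock, job_id)
    @lock = lock
    @job_id = job_id
    @published = []
  end

  def run
    @lock.acquire(@job_id)
    # A primitive records its intent instead of publishing inline.
    enqueue_continuation(wait: 0, wait_condition: "my_cond")
  ensure
    @lock.release(@job_id)
    flush_continuation! # reached only if release did not raise
  end

  private

  def enqueue_continuation(wait:, **kwargs)
    @continuation = {wait: wait, kwargs: kwargs}
  end

  def flush_continuation!
    return unless @continuation
    # Snapshot the lock owner at publish time; nil means "already released".
    @published << @continuation.merge(lock_owner_at_publish: @lock.owner)
  end
end

job = SketchJob.new(FakeLock.new, "job-1")
job.run
p job.published
```

The sketch keeps the two properties the tests above check: the continuation is published with the lock already free, and arbitrary kwargs survive the round trip through the recorded slot.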
171
+ - [ ] **Step 4: Flush in `ensure`, after release_lock**
172
+
173
+ In `lib/chrono_forge/executor.rb`, change the `ensure` block (`:168-173`) from:
174
+
175
+ ```ruby
176
+ ensure
177
+ if lock_acquired # Only release lock if we acquired it
178
+ context.save!
179
+ self.class::LockStrategy.release_lock(job_id, workflow)
180
+ end
181
+ end
182
+ ```
183
+
184
+ to:
185
+
186
+ ```ruby
187
+ ensure
188
+ if lock_acquired # Only release lock if we acquired it
189
+ context.save!
190
+ self.class::LockStrategy.release_lock(job_id, workflow)
191
+ # Publish the continuation only now — after the lock is released — so a
192
+ # zero-delay, same-key continuation can't lose the acquire race against
193
+ # this still-locked job. If release_lock raised (this job overran and
194
+ # lost the lock), we never reach here and another job owns continuation.
195
+ flush_continuation!
196
+ end
197
+ end
198
+ ```
199
+
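The "never reach here" claim in the comment relies on plain Ruby semantics: a statement inside `ensure` does not run if an earlier statement in the same `ensure` raises. A small sketch, assuming injectable callables in place of the real lock strategy (`finish_job` and `LockLostError` are hypothetical names, not chrono_forge API):

```ruby
# Illustration only: hypothetical error class and helper.
class LockLostError < StandardError; end

# Mirrors the ensure ordering from the step above: release first, publish second.
def finish_job(release:, publish:)
  yield
ensure
  release.call
  publish.call # skipped when release.call raises
end

published = []
finish_job(release: -> {}, publish: -> { published << :continuation }) { :work }

skipped = []
begin
  finish_job(release: -> { raise LockLostError }, publish: -> { skipped << :continuation }) { :work }
rescue LockLostError
  # The job that lost the lock publishes nothing; the current owner continues.
end
```

After this runs, `published` holds one entry and `skipped` stays empty, which is the behavior the acceptance criterion describes for an overrun job.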
200
+ - [ ] **Step 5: Convert the workflow-level retry enqueue**
201
+
202
+ In `lib/chrono_forge/executor.rb`, change (`:161-164`):
203
+
204
+ ```ruby
205
+ if backoff
206
+ self.class
207
+ .set(wait: backoff)
208
+ .perform_later(workflow.key, attempt: attempts_made, retry_counts: retry_counts)
209
+ else
210
+ ```
211
+
212
+ to:
213
+
214
+ ```ruby
215
+ if backoff
216
+ enqueue_continuation(wait: backoff, attempt: attempts_made, retry_counts: retry_counts)
217
+ else
218
+ ```
219
+
220
+ - [ ] **Step 6: Convert the `wait` enqueue**
221
+
222
+ In `lib/chrono_forge/executor/methods/wait.rb`, change (`:105-111`):
223
+
224
+ ```ruby
225
+ # Reschedule the job
226
+ self.class
227
+ .set(wait: duration)
228
+ .perform_later(@workflow.key)
229
+
230
+ # Halt current execution
231
+ halt_execution!
232
+ ```
233
+
234
+ to:
235
+
236
+ ```ruby
237
+ # Record the reschedule; the executor publishes it after lock release.
238
+ enqueue_continuation(wait: duration)
239
+
240
+ # Halt current execution
241
+ halt_execution!
242
+ ```
243
+
244
+ - [ ] **Step 7: Convert both `wait_until` enqueues**
245
+
246
+ In `lib/chrono_forge/executor/methods/wait_until.rb`, change the cond-error retry (`:132-141`):
247
+
248
+ ```ruby
249
+ if backoff
250
+ # Reschedule with the policy's backoff
251
+ self.class
252
+ .set(wait: backoff)
253
+ .perform_later(
254
+ @workflow.key
255
+ )
256
+
257
+ # Halt current execution
258
+ halt_execution!
259
+ ```
260
+
261
+ to:
262
+
263
+ ```ruby
264
+ if backoff
265
+ # Reschedule with the policy's backoff (published after lock release).
266
+ enqueue_continuation(wait: backoff)
267
+
268
+ # Halt current execution
269
+ halt_execution!
270
+ ```
271
+
272
+ Then change the poll reschedule (`:179-188`):
273
+
274
+ ```ruby
275
+ # Reschedule with delay
276
+ self.class
277
+ .set(wait: check_interval)
278
+ .perform_later(
279
+ @workflow.key,
280
+ wait_condition: condition
281
+ )
282
+
283
+ # Halt current execution
284
+ halt_execution!
285
+ ```
286
+
287
+ to:
288
+
289
+ ```ruby
290
+ # Reschedule the poll (published after lock release).
291
+ enqueue_continuation(wait: check_interval, wait_condition: condition)
292
+
293
+ # Halt current execution
294
+ halt_execution!
295
+ ```
296
+
297
+ - [ ] **Step 8: Convert the `durably_execute` retry enqueue**
298
+
299
+ In `lib/chrono_forge/executor/methods/durably_execute.rb`, change (`:107-116`):
300
+
301
+ ```ruby
302
+ if backoff
303
+ # Reschedule with the policy's backoff. The workflow replays on
304
+ # resume and skips completed steps, so the rescheduled run picks
305
+ # this step up again by its persisted execution log.
306
+ self.class
307
+ .set(wait: backoff)
308
+ .perform_later(@workflow.key)
309
+
310
+ # Halt current execution
311
+ halt_execution!
312
+ ```
313
+
314
+ to:
315
+
316
+ ```ruby
317
+ if backoff
318
+ # Reschedule with the policy's backoff (published after lock release).
319
+ # The workflow replays on resume and skips completed steps, so the
320
+ # rescheduled run picks this step up again by its execution log.
321
+ enqueue_continuation(wait: backoff)
322
+
323
+ # Halt current execution
324
+ halt_execution!
325
+ ```
326
+
327
+ - [ ] **Step 9: Convert all three `durably_repeat` enqueues**
328
+
329
+ In `lib/chrono_forge/executor/methods/durably_repeat.rb`, `schedule_repetition_for_later` (`:191-197`):
330
+
331
+ ```ruby
332
+ # Schedule the workflow to run at the specified time
333
+ self.class
334
+ .set(wait: delay)
335
+ .perform_later(@workflow.key)
336
+
337
+ # Halt current execution until scheduled time
338
+ halt_execution!
339
+ ```
340
+
341
+ to:
342
+
343
+ ```ruby
344
+ # Schedule the workflow to run at the specified time (published after release).
345
+ enqueue_continuation(wait: delay)
346
+
347
+ # Halt current execution until scheduled time
348
+ halt_execution!
349
+ ```
350
+
351
+ The repetition retry (`:232-239`):
352
+
353
+ ```ruby
354
+ if backoff
355
+ # Reschedule this same repetition with the policy's backoff
356
+ self.class
357
+ .set(wait: backoff)
358
+ .perform_later(@workflow.key)
359
+
360
+ # Halt current execution
361
+ halt_execution!
362
+ ```
363
+
364
+ to:
365
+
366
+ ```ruby
367
+ if backoff
368
+ # Reschedule this same repetition with the policy's backoff (after release).
369
+ enqueue_continuation(wait: backoff)
370
+
371
+ # Halt current execution
372
+ halt_execution!
373
+ ```
374
+
375
+ And `schedule_next_execution_after_completion` (`:286-292`):
376
+
377
+ ```ruby
378
+ # Schedule the workflow to run for the next periodic execution
379
+ self.class
380
+ .set(wait: delay)
381
+ .perform_later(@workflow.key)
382
+
383
+ # Halt current execution
384
+ halt_execution!
385
+ ```
386
+
387
+ to:
388
+
389
+ ```ruby
390
+ # Schedule the next periodic execution (published after lock release).
391
+ enqueue_continuation(wait: delay)
392
+
393
+ # Halt current execution
394
+ halt_execution!
395
+ ```
396
+
397
+ - [ ] **Step 10: Run the new tests — expect PASS**
398
+
399
+ Run: `bundle exec ruby -Itest test/continuation_flush_test.rb`
400
+ Expected: PASS (3 tests).
401
+
402
+ - [ ] **Step 11: Run the full suite — expect green**
403
+
404
+ Run: `bundle exec rake test`
405
+ Expected: all tests pass (retry/attempt threading preserved through the flush).
406
+
407
+ - [ ] **Step 12: Commit**
408
+
409
+ ```bash
410
+ git add lib/chrono_forge/executor.rb \
411
+ lib/chrono_forge/executor/methods/wait.rb \
412
+ lib/chrono_forge/executor/methods/wait_until.rb \
413
+ lib/chrono_forge/executor/methods/durably_execute.rb \
414
+ lib/chrono_forge/executor/methods/durably_repeat.rb \
415
+ test/continuation_flush_test.rb
416
+ git commit -m "fix(executor): publish continuations after lock release to close acquire race"
417
+ ```
418
+
419
+ ```json:metadata
420
+ {"files": ["lib/chrono_forge/executor.rb", "lib/chrono_forge/executor/methods/wait.rb", "lib/chrono_forge/executor/methods/wait_until.rb", "lib/chrono_forge/executor/methods/durably_execute.rb", "lib/chrono_forge/executor/methods/durably_repeat.rb", "test/continuation_flush_test.rb"], "verifyCommand": "bundle exec ruby -Itest test/continuation_flush_test.rb && bundle exec rake test", "acceptanceCriteria": ["continuations enqueued only after lock release", "per-site kwargs preserved", "flush no-ops without recorded continuation and is skipped when release_lock raises", "full suite green"], "requiresUserVerification": false}
421
+ ```
422
+
423
+ ---
424
+
425
+ ### Task 2: Closed-form fast-forward of the expired prefix in `durably_repeat`
426
+
427
+ **Goal:** When `durably_repeat` resumes behind schedule, jump past the expired prefix in O(1), advance the coordination log's `last_execution_at`, and write one summary `ExecutionLog` for the skip — instead of one timed-out row + one zero-delay job per missed tick.
428
+
429
+ **Files:**
430
+ - Modify: `lib/chrono_forge/executor/methods/durably_repeat.rb` (call fast-forward in `durably_repeat` after `:149`; add private `fast_forward_expired_prefix`)
431
+ - Test: `test/durably_repeat_test.rb` (add new tests; update `test_durably_repeat_with_timeout` `:116` and `test_durably_repeat_coordination_log_updated_on_timeout` `:345`)
432
+
433
+ **Acceptance Criteria:**
434
+ - [ ] `fast_forward_expired_prefix` returns the input unchanged when nothing is expired (`next >= now − timeout`).
435
+ - [ ] When ticks are expired, it returns the first grid tick `>= now − timeout` (exact grid landing, `n = ceil((cutoff − next)/every)` intervals).
436
+ - [ ] The expired prefix produces **zero** `Execution timed out` rows and exactly **one** summary row (`error_class: "TimeoutError"`, `metadata["fast_forwarded"] == n`, step on the last skipped grid tick) that does not collide with the first-valid repetition row.
437
+ - [ ] Coordination `last_execution_at` is advanced to `(first_valid − every).iso8601`, so a replay is stable (recomputes the same `first_valid`).
438
+ - [ ] The first in-window tick still executes its work (boundary preserved).
439
+ - [ ] Full suite green.
440
+
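The replay-stability criterion can be checked with plain arithmetic. This is a sketch under simplifying assumptions: times are plain numbers of seconds rather than `Time` objects, and `fast_forward` is a hypothetical pure function that mirrors the formula in the criteria above without touching any log rows.

```ruby
# Illustration only: a pure-arithmetic model of the fast-forward criteria.
def fast_forward(next_at, every, timeout, now)
  cutoff = now - timeout
  return {first_valid: next_at, skipped: 0, last_execution_at: nil} if next_at >= cutoff

  n = ((cutoff - next_at) / every.to_f).ceil
  first_valid = next_at + n * every
  {first_valid: first_valid, skipped: n, last_execution_at: first_valid - every}
end

# First pass: the next tick is far behind the cutoff.
first = fast_forward(1_000, 10, 5, 1_127)

# Replay: the naive next tick is last_execution_at + every, which is already
# the first valid tick, so nothing more is skipped.
replay = fast_forward(first[:last_execution_at] + 10, 10, 5, 1_127)

p first
p replay
```

Storing `first_valid - every` rather than `first_valid` is what makes the replay land on the same tick: the existing "last + every" computation then reproduces `first_valid` without a special case.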
441
+ **Verify:** `bundle exec ruby -Itest test/durably_repeat_test.rb` → all pass; then `bundle exec rake test` → green.
442
+
443
+ **Steps:**
444
+
445
+ - [ ] **Step 1: Write the failing unit test for the closed form**
446
+
447
+ Add to `test/durably_repeat_test.rb` (inside `class DurablyRepeatTest`):
448
+
449
+ ```ruby
450
+ def test_fast_forward_returns_input_when_nothing_expired
451
+ workflow = ChronoForge::Workflow.create!(
452
+ key: "ff_noop_#{rand(10_000)}", job_class: "KitchenSink",
453
+ kwargs: {}, options: {}, context: {}, state: :idle
454
+ )
455
+ coordination = workflow.execution_logs.create!(
456
+ step_name: "durably_repeat$x", state: :pending, metadata: {}
457
+ )
458
+ job = KitchenSink.new
459
+ job.instance_variable_set(:@workflow, workflow)
460
+
461
+ next_at = Time.current + 5.seconds # future tick, not expired
462
+ result = job.send(:fast_forward_expired_prefix, coordination, next_at, 2.seconds, 1.hour)
463
+
464
+ assert_in_delta next_at.to_f, result.to_f, 0.001, "future tick must be returned unchanged"
465
+ end
466
+
467
+ def test_fast_forward_lands_on_first_non_expired_grid_tick
468
+ workflow = ChronoForge::Workflow.create!(
469
+ key: "ff_jump_#{rand(10_000)}", job_class: "KitchenSink",
470
+ kwargs: {}, options: {}, context: {}, state: :idle
471
+ )
472
+ coordination = workflow.execution_logs.create!(
473
+ step_name: "durably_repeat$x", state: :pending, metadata: {}
474
+ )
475
+ job = KitchenSink.new
476
+ job.instance_variable_set(:@workflow, workflow)
477
+
478
+ every = 1.second
479
+ timeout = 1.second
480
+ # 60 ticks back, 1s grid, 1s timeout => cutoff = now-1s; first non-expired
481
+ # tick is the smallest grid tick >= now-1s.
482
+ next_at = Time.current - 60.seconds
483
+ cutoff = Time.current - timeout
484
+
485
+ result = job.send(:fast_forward_expired_prefix, coordination, next_at, every, timeout)
486
+
487
+ # On-grid: result == next_at + n*every for integer n.
488
+ n = ((result - next_at) / every.to_f).round
489
+ assert_in_delta next_at.to_f + n * every.to_f, result.to_f, 0.001, "result must stay on the grid"
490
+ assert_operator result, :>=, cutoff, "result must be the first non-expired tick"
491
+ assert_operator result - every, :<, cutoff, "the tick before result must still be expired"
492
+
493
+ # Coordination advanced so replay recomputes the same first_valid.
494
+ coordination.reload
495
+ assert coordination.metadata["last_execution_at"], "last_execution_at must be set"
496
+ assert_in_delta (result - every).to_f,
497
+ Time.parse(coordination.metadata["last_execution_at"]).to_f, 0.001
498
+
499
+ # Exactly one summary row written, on the last skipped grid tick, with the count.
500
+ summary = workflow.execution_logs.where("step_name LIKE ?", "durably_repeat$x$%").to_a
501
+ assert_equal 1, summary.size, "exactly one summary row for the skipped prefix"
502
+ assert_equal "TimeoutError", summary.first.error_class
503
+ assert_operator summary.first.metadata["fast_forwarded"].to_i, :>=, 1
504
+ end
505
+ ```
506
+
507
+ - [ ] **Step 2: Run unit tests to verify they fail**
508
+
509
+ Run: `bundle exec ruby -Itest test/durably_repeat_test.rb -n "/fast_forward/"`
510
+ Expected: FAIL — `NoMethodError: undefined method 'fast_forward_expired_prefix'`.
511
+
512
+ - [ ] **Step 3: Implement `fast_forward_expired_prefix` and wire it in**
513
+
514
+ In `lib/chrono_forge/executor/methods/durably_repeat.rb`, in `durably_repeat`, insert the call right after `next_execution_at` is computed (after `:149`, before `execute_or_schedule_repetition` at `:151`):
515
+
516
+ ```ruby
517
+ next_execution_at = fast_forward_expired_prefix(coordination_log, next_execution_at, every, timeout)
518
+
519
+ execute_or_schedule_repetition(method, coordination_log, next_execution_at, every, policy, timeout, on_error)
520
+ ```
521
+
522
+ Add the private method (alongside the other privates):
523
+
524
+ ```ruby
525
+ # Catch-up fast-forward. A tick `t` is expired (its work is skipped) iff
526
+ # `Time.current > t + timeout`, i.e. `t < now - timeout`. Rather than
527
+ # walking one zero-delay job per expired tick, jump straight to the first
528
+ # non-expired tick on the same grid in closed form.
529
+ #
530
+ # Anchoring the arithmetic on `next_execution_at` (already on the canonical
531
+ # grid: start_at / created_at+every / last_execution_at+every all land on
532
+ # it, because last_execution_at stores the *scheduled* time, not wall-clock)
533
+ # keeps the result exactly on the grid — no drift.
534
+ #
535
+ # Returns `next_execution_at` unchanged when nothing is expired. Otherwise
536
+ # advances the coordination log's last_execution_at so a replay recomputes
537
+ # the same first tick, and writes ONE summary ExecutionLog for the whole
538
+ # skipped prefix (no per-tick timeout rows).
539
+ def fast_forward_expired_prefix(coordination_log, next_execution_at, every, timeout)
540
+ cutoff = Time.current - timeout
541
+ return next_execution_at if next_execution_at >= cutoff
542
+
543
+ n = ((cutoff - next_execution_at) / every.to_f).ceil
544
+ first_valid = next_execution_at + (n * every)
545
+ last_skipped = first_valid - every
546
+
547
+ Rails.logger.info {
548
+ "ChronoForge:#{self.class}(#{@workflow.key}) durably_repeat fast-forwarded " \
549
+ "#{n} expired tick(s) to #{first_valid.iso8601}"
550
+ }
551
+
552
+ # Single summary row for the skipped prefix, on the last skipped grid
553
+ # tick (unique; never collides with the first_valid repetition row).
554
+ summary_step = "#{coordination_log.step_name}$#{last_skipped.to_i}"
555
+ find_or_create_execution_log!(summary_step) do |log|
556
+ log.started_at = Time.current
557
+ log.metadata = {
558
+ fast_forwarded: n,
559
+ from: next_execution_at.iso8601,
560
+ to: last_skipped.iso8601,
561
+ scheduled_for: last_skipped,
562
+ timeout_at: last_skipped + timeout,
563
+ parent_id: coordination_log.id
564
+ }
565
+ end.update!(
566
+ state: :failed,
567
+ error_class: "TimeoutError",
568
+ error_message: "Fast-forwarded #{n} expired tick(s)",
569
+ completed_at: Time.current
570
+ )
571
+
572
+ # Record progress: a replay recomputes naive_next = last + every = first_valid.
573
+ coordination_log.update!(
574
+ metadata: coordination_log.metadata.merge("last_execution_at" => last_skipped.iso8601)
575
+ )
576
+
577
+ first_valid
578
+ end
579
+ ```
580
+
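To see why the closed form is a safe replacement for walking the grid, it can be compared against a tick-by-tick loop. The sketch below assumes integer seconds and two hypothetical helpers, `walk_expired_prefix` and `jump_expired_prefix`; it models only the arithmetic, not the summary row or the coordination log update.

```ruby
# Illustration only: both helpers are hypothetical and use integer seconds.

# O(missed intervals): advance one tick at a time until the tick is not expired.
def walk_expired_prefix(next_at, every, cutoff)
  steps = 0
  while next_at < cutoff
    next_at += every
    steps += 1
  end
  [next_at, steps]
end

# O(1): same landing tick and skip count, computed in closed form.
def jump_expired_prefix(next_at, every, cutoff)
  return [next_at, 0] if next_at >= cutoff

  n = ((cutoff - next_at) / every.to_f).ceil
  [next_at + n * every, n]
end

p walk_expired_prefix(0, 7, 100)
p jump_expired_prefix(0, 7, 100)
```

Both helpers treat a tick exactly at the cutoff as not expired, matching the `>=` guard in the method above, so the boundary tick still runs its work.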
581
+ - [ ] **Step 4: Run unit tests — expect PASS**
582
+
583
+ Run: `bundle exec ruby -Itest test/durably_repeat_test.rb -n "/fast_forward/"`
584
+ Expected: PASS (2 tests).
585
+
586
+ - [ ] **Step 5: Add an integration test for catch-up (passes immediately, since the implementation from Step 3 already exists)**
587
+
588
+ Add to `test/durably_repeat_test.rb` (class body + a job class at the bottom with the others):
589
+
590
+ ```ruby
591
+ def test_durably_repeat_catch_up_fast_forwards_expired_prefix
592
+ unique_key = "catchup_#{Time.now.to_i}_#{rand(10_000)}"
593
+
594
+ # start_at far in the past with a short timeout => a long expired prefix.
595
+ CatchUpJob.perform_later(unique_key, start_time: Time.current - 60.seconds)
596
+
597
+ perform_all_jobs_before(5.seconds)
598
+
599
+ workflow = ChronoForge::Workflow.find_by(key: unique_key)
600
+
601
+ # No per-tick timeout tombstones for the expired prefix.
602
+ timed_out = workflow.execution_logs.select { |l| l.error_message == "Execution timed out" }
603
+ assert_empty timed_out, "expired prefix must not create per-tick timeout rows"
604
+
605
+ # Exactly one fast-forward summary row.
606
+ summaries = workflow.execution_logs.select { |l| l.metadata && l.metadata["fast_forwarded"] }
607
+ assert_equal 1, summaries.size, "expired prefix collapses to one summary row"
608
+ assert_operator summaries.first.metadata["fast_forwarded"].to_i, :>=, 1
609
+ end
610
+ ```
611
+
612
+ ```ruby
613
+ class CatchUpJob < WorkflowJob
614
+ prepend ChronoForge::Executor
615
+
616
+ def perform(start_time:)
617
+ context.set_once(:execution_count, 0)
618
+ start_obj = start_time.is_a?(String) ? Time.parse(start_time) : start_time
619
+ durably_repeat :catch_up_task, every: 1.second, till: :done?,
620
+ start_at: start_obj, timeout: 1.second
621
+ end
622
+
623
+ private
624
+
625
+ def catch_up_task(_scheduled = nil)
626
+ context[:execution_count] = context.fetch(:execution_count, 0) + 1
627
+ end
628
+
629
+ def done?
630
+ context.fetch(:execution_count, 0) >= 1
631
+ end
632
+ end
633
+ ```
634
+
635
+ - [ ] **Step 6: Update the two existing timeout tests to the new behavior**
636
+
637
+ In `test/durably_repeat_test.rb`, `test_durably_repeat_with_timeout` (`:116-131`) — replace the timeout-tombstone assertion. Change:
638
+
639
+ ```ruby
640
+ # Should have timeout failures
641
+ timeout_logs = workflow.execution_logs.select { |log|
642
+ log.failed? && log.error_message == "Execution timed out"
643
+ }
644
+ assert_operator timeout_logs.size, :>, 0, "should have timeout failures"
645
+ ```
646
+
647
+ to:
648
+
649
+ ```ruby
650
+ # Expired ticks are now fast-forwarded: no per-tick "Execution timed out"
651
+ # rows; the skipped prefix collapses to a single fast_forwarded summary row.
652
+ timeout_logs = workflow.execution_logs.select { |log|
653
+ log.error_message == "Execution timed out"
654
+ }
655
+ assert_empty timeout_logs, "expired ticks should be fast-forwarded, not tombstoned per tick"
656
+
657
+ summaries = workflow.execution_logs.select { |log| log.metadata && log.metadata["fast_forwarded"] }
658
+ assert_operator summaries.size, :>=, 1, "should record a fast-forward summary row"
659
+ ```
660
+
661
+ Then `test_durably_repeat_coordination_log_updated_on_timeout` (`:345-384`) — change the same tombstone block (`:369-373`):
662
+
663
+ ```ruby
664
+ # Find timeout logs
665
+ timeout_logs = workflow.execution_logs.select { |log|
666
+ log.failed? && log.error_message == "Execution timed out"
667
+ }
668
+ assert_operator timeout_logs.size, :>, 0, "should have timeout failures"
669
+ ```
670
+
671
+ to:
672
+
673
+ ```ruby
674
+ # Expired ticks are fast-forwarded into a single summary row, not per-tick rows.
675
+ summaries = workflow.execution_logs.select { |log| log.metadata && log.metadata["fast_forwarded"] }
676
+ assert_operator summaries.size, :>=, 1, "should record a fast-forward summary row"
677
+ ```
678
+
679
+ (The remaining assertions in that test — `last_execution_at` is present and advanced — still hold and stay unchanged.)
680
+
681
+ - [ ] **Step 7: Run the durably_repeat suite — expect PASS**
682
+
683
+ Run: `bundle exec ruby -Itest test/durably_repeat_test.rb`
684
+ Expected: PASS (all, including updated timeout tests).
685
+
686
+ - [ ] **Step 8: Run the full suite — expect green**
687
+
688
+ Run: `bundle exec rake test`
689
+ Expected: all tests pass.
690
+
691
+ - [ ] **Step 9: Commit**
692
+
693
+ ```bash
694
+ git add lib/chrono_forge/executor/methods/durably_repeat.rb test/durably_repeat_test.rb
695
+ git commit -m "fix(durably_repeat): fast-forward expired catch-up prefix in closed form"
696
+ ```
697
+
698
+ ```json:metadata
699
+ {"files": ["lib/chrono_forge/executor/methods/durably_repeat.rb", "test/durably_repeat_test.rb"], "verifyCommand": "bundle exec ruby -Itest test/durably_repeat_test.rb && bundle exec rake test", "acceptanceCriteria": ["fast_forward returns input when nothing expired", "lands on first non-expired grid tick (no drift)", "zero per-tick timeout rows + exactly one summary row", "coordination last_execution_at advanced for stable replay", "first in-window tick still executes", "full suite green"], "requiresUserVerification": false}
700
+ ```
701
+
702
+ ---
703
+
704
+ ## Self-Review
705
+
706
+ - **Spec coverage:** Section 1 (deferred flush, all 8 sites) → Task 1. Section 2 (closed-form fast-forward, summary row, coordination advance, test updates) → Task 2. Both covered.
707
+ - **Placeholder scan:** none — every step has concrete code/commands.
708
+ - **Type/name consistency:** `enqueue_continuation`/`flush_continuation!`/`@continuation`/`fast_forward_expired_prefix` used identically across plan and tests. Summary metadata key `fast_forwarded` consistent in impl and all assertions.
709
+ - **Verification scan:** spec requires no human-in-the-loop verification → User Verification = NO; no verification task needed.