chrono_forge 0.9.1 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +22 -0
  3. data/README.md +305 -44
  4. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md +1748 -0
  5. data/docs/superpowers/plans/2026-06-25-chrono_forge-dashboard.md.tasks.json +17 -0
  6. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md +930 -0
  7. data/docs/superpowers/plans/2026-06-25-composite-retry-policies.md.tasks.json +54 -0
  8. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md +241 -0
  9. data/docs/superpowers/plans/2026-06-25-reserved-kwarg-guard.md.tasks.json +12 -0
  10. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md +1378 -0
  11. data/docs/superpowers/plans/2026-06-26-branches-spawn-merge.md.tasks.json +67 -0
  12. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md +709 -0
  13. data/docs/superpowers/plans/2026-06-26-deferral-continuation-race-and-catchup.md.tasks.json +19 -0
  14. data/docs/superpowers/specs/2026-06-03-unified-retry-policy-design.md +226 -0
  15. data/docs/superpowers/specs/2026-06-25-chrono_forge-dashboard-design.md +190 -0
  16. data/docs/superpowers/specs/2026-06-25-composite-retry-policies-design.md +228 -0
  17. data/docs/superpowers/specs/2026-06-25-reserved-kwarg-guard-design.md +169 -0
  18. data/docs/superpowers/specs/2026-06-25-spawn-merge-branches-design.md +468 -0
  19. data/docs/superpowers/specs/2026-06-26-dashboard-branch-view-design.md +142 -0
  20. data/docs/superpowers/specs/2026-06-26-deferral-continuation-race-and-catchup-design.md +265 -0
  21. data/lib/chrono_forge/branch_merge_job.rb +138 -0
  22. data/lib/chrono_forge/branch_probe.rb +26 -0
  23. data/lib/chrono_forge/cleanup.rb +6 -0
  24. data/lib/chrono_forge/execution_log.rb +6 -0
  25. data/lib/chrono_forge/executor/composite_retry_policy.rb +47 -0
  26. data/lib/chrono_forge/executor/methods/branch.rb +185 -0
  27. data/lib/chrono_forge/executor/methods/durably_execute.rb +21 -19
  28. data/lib/chrono_forge/executor/methods/durably_repeat.rb +118 -25
  29. data/lib/chrono_forge/executor/methods/merge_branches.rb +83 -0
  30. data/lib/chrono_forge/executor/methods/wait.rb +2 -4
  31. data/lib/chrono_forge/executor/methods/wait_until.rb +25 -25
  32. data/lib/chrono_forge/executor/methods/workflow_states.rb +16 -0
  33. data/lib/chrono_forge/executor/methods.rb +2 -0
  34. data/lib/chrono_forge/executor/retry_policy.rb +111 -0
  35. data/lib/chrono_forge/executor.rb +216 -28
  36. data/lib/chrono_forge/version.rb +1 -1
  37. data/lib/chrono_forge/workflow.rb +10 -1
  38. data/lib/generators/chrono_forge/migration_actions.rb +1 -0
  39. data/lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb +38 -0
  40. metadata +42 -5
  41. data/lib/chrono_forge/executor/retry_strategy.rb +0 -29
@@ -0,0 +1,1378 @@
1
+ # Branches (`branch` / `spawn` / `spawn_each` / `merge_branches`) Implementation Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development (recommended) or superpowers-extended-cc:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
+
5
+ **Goal:** Add a durable, large-scale fan-out/fan-in primitive to ChronoForge — `branch` blocks that `spawn`/`spawn_each` child sub-workflows and are joined by `merge_branches` (or `automerge`), built to dispatch hundreds of thousands of children per branch.
6
+
7
+ **Architecture:** A `branch :name do … end` block is a durable coordination step (`branch$<name>` execution log). Inside it, `spawn`/`spawn_each` eagerly create + bulk-enqueue child workflows (one `chrono_forge_workflows` row each, linked by a new generic `parent_execution_log_id` FK to the branch log). The block seals when it closes; `spawn_each` streams its source with a per-spawn cursor so dispatch resumes after a crash without re-streaming. Joining is poll-based via a lightweight `BranchMergeJob` (no parent replay per poll); branch/merge state is tracked in an in-memory registry (`@open_branches`) rebuilt each replay pass, so the completion gate can raise on a forgotten join.
8
+
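The registry-plus-gate idea can be modeled in plain Ruby. The sketch below is hypothetical: `BranchRegistrySketch` and its methods are illustrative names, not ChronoForge's API. It tracks only the bookkeeping (which branches were opened and which were joined), not the polling of child workflows that a real merge performs.

```ruby
# Hypothetical model of the @open_branches registry and the completion gate.
# Names are illustrative; real merges also wait for child workflows to finish.
class UnmergedBranchError < StandardError; end

class BranchRegistrySketch
  def initialize
    # Rebuilt from scratch on every replay pass, like @open_branches.
    @open_branches = {}
  end

  # `branch` registers itself whether or not its block runs (sealed or not).
  def open_branch(name, automerge: false)
    @open_branches[name.to_s] = {automerge: automerge, merged: false}
  end

  # `merge_branches` marks the named branches as explicitly joined.
  def merge_branches(*names)
    names.each do |name|
      entry = @open_branches.fetch(name.to_s) { raise ArgumentError, "unknown branch #{name}" }
      entry[:merged] = true
    end
  end

  # Completion gate: every branch must be merged or declared automerge: true.
  def complete!
    forgotten = @open_branches.reject { |_name, e| e[:merged] || e[:automerge] }.keys
    raise UnmergedBranchError, "unmerged branches: #{forgotten.join(", ")}" unless forgotten.empty?
    :completed
  end
end

registry = BranchRegistrySketch.new
registry.open_branch(:emails)
registry.open_branch(:audit, automerge: true)
begin
  registry.complete!
rescue UnmergedBranchError => e
  puts e.message # prints "unmerged branches: emails"
end
```

Because the registry is rebuilt on each replay, a sealed branch still has to register itself on every pass (as `branch` does before it checks `log.completed?`); otherwise a later replay would forget the branch and the gate could not catch a missing join.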
9
+ **Tech Stack:** Ruby, ActiveJob (>= 7.1, for `perform_all_later`), ActiveRecord, Zeitwerk, Minitest + Combustion + ChaoticJob.
10
+
11
+ **User Verification:** NO — no user verification required (library feature; verified by the test suite).
12
+
13
+ **Reference spec:** `docs/superpowers/specs/2026-06-25-spawn-merge-branches-design.md`
14
+
15
+ ---
16
+
17
+ ## File Structure
18
+
19
+ **New library files**
20
+ - `lib/chrono_forge/executor/methods/branch.rb` — `branch`, `spawn`, `spawn_each`, and shared dispatch/cursor/registry helpers.
21
+ - `lib/chrono_forge/executor/methods/merge_branches.rb` — `merge_branches`/`merge_branch`, plus `branches_done?` / `enqueue_branch_merge_job` / `open_branch!` (used by the completion gate too).
22
+ - `lib/chrono_forge/branch_merge_job.rb` — `ChronoForge::BranchMergeJob`, the lightweight poller.
23
+ - `lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb` — additive migration.
24
+
25
+ **Modified library files**
26
+ - `lib/chrono_forge/executor.rb` — new error classes; `include Methods::Branch` / `Methods::MergeBranches` (via methods.rb); poll-cadence constants.
27
+ - `lib/chrono_forge/executor/methods.rb` — include the two new modules.
28
+ - `lib/chrono_forge/executor/methods/workflow_states.rb` — completion gate in `complete_workflow!`.
29
+ - `lib/chrono_forge/workflow.rb` — `belongs_to :parent_execution_log`.
30
+ - `lib/chrono_forge/execution_log.rb` — `has_many :spawned_workflows`.
31
+ - `lib/generators/chrono_forge/migration_actions.rb` — add migration to `MIGRATIONS`.
32
+ - `chrono_forge.gemspec` — `activejob >= 7.1` floor.
33
+ - `README.md` — branch/merge section + caveats.
34
+
35
+ **New/modified test files**
36
+ - `test/internal/db/migrate/20260626000001_add_chrono_forge_parent_execution_log.rb` — apply the column to the test DB.
37
+ - `test/internal/app/jobs/` — branch test workflow jobs + a trivial child workflow.
38
+ - `test/branch_associations_test.rb`, `test/branch_test.rb`, `test/spawn_each_test.rb`, `test/branch_merge_job_test.rb`, `test/merge_branches_test.rb`, `test/automerge_test.rb`, `test/branch_recovery_test.rb`, `test/branch_scale_test.rb`.
39
+ - `test/schema_test.rb`, `test/generators_test.rb`, `test/upgrade_migration_test.rb` — extend for the new column/index.
40
+
41
+ ---
42
+
43
+ ### Task 1: Schema — `parent_execution_log_id` column + `(parent_execution_log_id, state)` index
44
+
45
+ **Goal:** Add a generic `parent_execution_log_id` FK column to `chrono_forge_workflows` with a composite index on `(parent_execution_log_id, state)`, shipped as an additive migration wired into the install/upgrade generators and applied to the test DB.
46
+
47
+ **Files:**
48
+ - Create: `lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb`
49
+ - Modify: `lib/generators/chrono_forge/migration_actions.rb`
50
+ - Create: `test/internal/db/migrate/20260626000001_add_chrono_forge_parent_execution_log.rb`
51
+ - Modify: `test/schema_test.rb`, `test/generators_test.rb`
52
+
53
+ **Acceptance Criteria:**
54
+ - [ ] `chrono_forge_workflows` has a nullable `parent_execution_log_id` column whose type matches the primary-key type of `chrono_forge_execution_logs`, the table it references (bigint or uuid).
55
+ - [ ] A composite index `(parent_execution_log_id, state)` exists.
56
+ - [ ] The migration is idempotent (`if_not_exists`) and listed in `MigrationActions::MIGRATIONS`.
57
+ - [ ] `generators_test` expects the new migration in the copied set.
58
+
59
+ **Verify:** `cd .worktrees/branches && bundle exec ruby -I test test/schema_test.rb` → all pass
60
+
61
+ **Steps:**
62
+
63
+ - [ ] **Step 1: Write the failing schema test**
64
+
65
+ Add to `test/schema_test.rb` (inside the existing `SchemaTest`):
66
+
67
+ ```ruby
68
+ def test_workflows_have_parent_execution_log_id_column
69
+ assert connection.column_exists?(:chrono_forge_workflows, :parent_execution_log_id),
70
+ "expected chrono_forge_workflows.parent_execution_log_id for branch children"
71
+ end
72
+
73
+ def test_workflows_have_parent_execution_log_state_index
74
+ assert connection.index_exists?(:chrono_forge_workflows, %i[parent_execution_log_id state]),
75
+ "expected composite index on [parent_execution_log_id, state] for the merge probe"
76
+ end
77
+ ```
78
+
79
+ - [ ] **Step 2: Run the test to verify it fails**
80
+
81
+ Run: `bundle exec ruby -I test test/schema_test.rb -n test_workflows_have_parent_execution_log_id_column`
82
+ Expected: FAIL — column does not exist.
83
+
84
+ - [ ] **Step 3: Write the migration template**
85
+
86
+ Create `lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb`:
87
+
88
+ ```ruby
89
+ # frozen_string_literal: true
90
+
91
+ # Adds chrono_forge_workflows.parent_execution_log_id: the execution log that
92
+ # spawned a workflow (for branches, the branch$<name> log). Deliberately generic
93
+ # so any future step that spawns sub-workflows can reuse it. The composite
94
+ # [parent_execution_log_id, state] index makes the merge completion probe and the
95
+ # dropped-job re-kick index-only at hundreds of thousands of children.
96
+ #
97
+ # Shipped standalone (matching add_chrono_forge_workflow_state_index) so existing
98
+ # installs pick it up via `rails generate chrono_forge:upgrade`.
99
+ class AddChronoForgeParentExecutionLog < ActiveRecord::Migration[7.1]
100
+ disable_ddl_transaction!
101
+
102
+ def change
103
+ add_column :chrono_forge_workflows, :parent_execution_log_id, parent_log_fk_type,
104
+ null: true, if_not_exists: true
105
+
106
+ add_index :chrono_forge_workflows, %i[parent_execution_log_id state],
107
+ if_not_exists: true, **chrono_forge_index_algorithm
108
+ end
109
+
110
+ private
111
+
112
+ # Match the type of chrono_forge_execution_logs.id (the column this FK points
113
+ # at) so it lines up on both bigint and uuid installs.
114
+ def parent_log_fk_type
115
+ id_col = connection.columns(:chrono_forge_execution_logs).find { |c| c.name == "id" }
116
+ id_col && id_col.sql_type.to_s.downcase.include?("uuid") ? :uuid : :bigint
117
+ end
118
+
119
+ def chrono_forge_index_algorithm
120
+ if connection.adapter_name.to_s.downcase.include?("postgresql")
121
+ {algorithm: :concurrently}
122
+ else
123
+ {}
124
+ end
125
+ end
126
+ end
127
+ ```
128
+
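The `uuid`/`bigint` decision in `parent_log_fk_type` can be checked without a database. A minimal sketch, assuming a hypothetical `Column` struct in place of ActiveRecord's column objects (only `name` and `sql_type` are read):

```ruby
# Hypothetical stand-in for an ActiveRecord column; only these two fields are read.
Column = Struct.new(:name, :sql_type)

# Same rule as parent_log_fk_type: a uuid primary key yields :uuid, anything
# else (including a missing id column) yields :bigint. `&&` binds tighter than
# `?:`, so the ternary tests the whole `id_col && ...` expression.
def fk_type_for(columns)
  id_col = columns.find { |c| c.name == "id" }
  id_col && id_col.sql_type.to_s.downcase.include?("uuid") ? :uuid : :bigint
end

p fk_type_for([Column.new("id", "uuid")])   # prints :uuid
p fk_type_for([Column.new("id", "bigint")]) # prints :bigint
p fk_type_for([])                           # prints :bigint
```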
129
+ - [ ] **Step 4: Wire it into the generators**
130
+
131
+ In `lib/generators/chrono_forge/migration_actions.rb`, append to `MIGRATIONS`:
132
+
133
+ ```ruby
134
+ MIGRATIONS = %w[
135
+ install_chrono_forge
136
+ add_chrono_forge_workflow_state_index
137
+ add_chrono_forge_error_log_step_context
138
+ add_chrono_forge_parent_execution_log
139
+ ].freeze
140
+ ```
141
+
142
+ In `test/generators_test.rb`, add `"add_chrono_forge_parent_execution_log.rb"` to the expected list in `test_install_copies_all_migrations` (keep the list alphabetically sorted, because the test sorts before comparing), and bump the expected count in `test_install_is_idempotent` from `3` to `4`.
143
+
144
+ - [ ] **Step 5: Apply to the test DB**
145
+
146
+ Create `test/internal/db/migrate/20260626000001_add_chrono_forge_parent_execution_log.rb` so that Combustion, which runs these migrations to build the test schema, loads the **same class** as the template. The simplest version just requires the template:
147
+
148
+ ```ruby
149
+ # frozen_string_literal: true
150
+
151
+ require_relative "../../../../lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log"
152
+ ```
153
+
154
+ (If the `require_relative` shortcut causes Combustion load-order issues, instead paste the full class body from Step 3 into this file verbatim.)
155
+
156
+ - [ ] **Step 6: Run tests to verify they pass**
157
+
158
+ Run: `bundle exec ruby -I test test/schema_test.rb && bundle exec ruby -I test test/generators_test.rb`
159
+ Expected: PASS.
160
+
161
+ - [ ] **Step 7: Commit**
162
+
163
+ ```bash
164
+ git add lib/generators test/internal/db/migrate test/schema_test.rb test/generators_test.rb
165
+ git commit -m "feat(branches): add parent_execution_log_id column + index"
166
+ ```
167
+
168
+ ```json:metadata
169
+ {"files": ["lib/generators/chrono_forge/templates/add_chrono_forge_parent_execution_log.rb", "lib/generators/chrono_forge/migration_actions.rb", "test/internal/db/migrate/20260626000001_add_chrono_forge_parent_execution_log.rb", "test/schema_test.rb", "test/generators_test.rb"], "verifyCommand": "bundle exec ruby -I test test/schema_test.rb", "acceptanceCriteria": ["parent_execution_log_id column exists", "composite (parent_execution_log_id, state) index exists", "migration listed in MIGRATIONS and generators_test"], "requiresUserVerification": false}
170
+ ```
171
+
172
+ ---
173
+
174
+ ### Task 2: Model associations
175
+
176
+ **Goal:** Link children to their spawning branch log via ActiveRecord associations.
177
+
178
+ **Files:**
179
+ - Modify: `lib/chrono_forge/workflow.rb`
180
+ - Modify: `lib/chrono_forge/execution_log.rb`
181
+ - Test: `test/branch_associations_test.rb`
182
+
183
+ **Acceptance Criteria:**
184
+ - [ ] `Workflow#parent_execution_log` returns the spawning `ExecutionLog` (optional).
185
+ - [ ] `ExecutionLog#spawned_workflows` returns the workflows it spawned.
186
+
187
+ **Verify:** `bundle exec ruby -I test test/branch_associations_test.rb` → PASS
188
+
189
+ **Steps:**
190
+
191
+ - [ ] **Step 1: Write the failing test**
192
+
193
+ Create `test/branch_associations_test.rb`:
194
+
195
+ ```ruby
196
+ require "test_helper"
197
+
198
+ class BranchAssociationsTest < ActiveJob::TestCase
199
+ def test_parent_execution_log_and_spawned_workflows_round_trip
200
+ parent = ChronoForge::Workflow.create!(key: "p-#{SecureRandom.hex}", job_class: "X")
201
+ log = parent.execution_logs.create!(step_name: "branch$grp")
202
+ child = ChronoForge::Workflow.create!(
203
+ key: "c-#{SecureRandom.hex}", job_class: "Y", parent_execution_log_id: log.id
204
+ )
205
+
206
+ assert_equal log, child.parent_execution_log
207
+ assert_includes log.spawned_workflows, child
208
+ end
209
+ end
210
+ ```
211
+
212
+ - [ ] **Step 2: Run to verify it fails**
213
+
214
+ Run: `bundle exec ruby -I test test/branch_associations_test.rb`
215
+ Expected: FAIL — `NoMethodError: undefined method 'parent_execution_log'`.
216
+
217
+ - [ ] **Step 3: Add the associations**
218
+
219
+ In `lib/chrono_forge/workflow.rb`, after `has_many :error_logs, dependent: :destroy`:
220
+
221
+ ```ruby
222
+ belongs_to :parent_execution_log,
223
+ class_name: "ChronoForge::ExecutionLog", optional: true
224
+ ```
225
+
226
+ In `lib/chrono_forge/execution_log.rb`, after `belongs_to :workflow`:
227
+
228
+ ```ruby
229
+ has_many :spawned_workflows,
230
+ class_name: "ChronoForge::Workflow",
231
+ foreign_key: :parent_execution_log_id,
232
+ inverse_of: :parent_execution_log,
233
+ dependent: :nullify
234
+ ```
235
+
236
+ - [ ] **Step 4: Run to verify it passes**
237
+
238
+ Run: `bundle exec ruby -I test test/branch_associations_test.rb`
239
+ Expected: PASS.
240
+
241
+ - [ ] **Step 5: Commit**
242
+
243
+ ```bash
244
+ git add lib/chrono_forge/workflow.rb lib/chrono_forge/execution_log.rb test/branch_associations_test.rb
245
+ git commit -m "feat(branches): parent_execution_log / spawned_workflows associations"
246
+ ```
247
+
248
+ ```json:metadata
249
+ {"files": ["lib/chrono_forge/workflow.rb", "lib/chrono_forge/execution_log.rb", "test/branch_associations_test.rb"], "verifyCommand": "bundle exec ruby -I test test/branch_associations_test.rb", "acceptanceCriteria": ["parent_execution_log association", "spawned_workflows association"], "requiresUserVerification": false}
250
+ ```
251
+
252
+ ---
253
+
254
+ ### Task 3: `branch` + `spawn` (block, registry, eager single dispatch, seal, skip-on-replay)
255
+
256
+ **Goal:** Implement the `branch` block (durable step, in-memory registry, seal-on-close, **skip-the-block-when-sealed**) and `spawn` (single eager child dispatch). `spawn` outside a branch raises.
257
+
258
+ **Files:**
259
+ - Create: `lib/chrono_forge/executor/methods/branch.rb`
260
+ - Modify: `lib/chrono_forge/executor.rb` (error classes)
261
+ - Modify: `lib/chrono_forge/executor/methods.rb` (include)
262
+ - Create: `test/internal/app/jobs/single_spawn_workflow.rb`, `test/internal/app/jobs/noop_child.rb`
263
+ - Create: `test/branch_test.rb`
264
+
265
+ **Acceptance Criteria:**
266
+ - [ ] `branch :g do spawn :c, NoopChild end` creates a child with key `"<parent.key>$g$c"`, `job_class: "NoopChild"`, `parent_execution_log_id` = the `branch$g` log id, `state: idle`.
267
+ - [ ] The `branch$g` log is `completed` (sealed) after the block closes.
268
+ - [ ] On replay (sealed), the block body is **not** re-executed (no duplicate child rows, no re-dispatch).
269
+ - [ ] `spawn` called outside a `branch` block raises `NotInBranchError`.
270
+
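Child keys are pure functions of the parent key, the branch name, and the spawn name (plus the item index for `spawn_each`), which is what lets re-dispatch land on the same rows. A minimal sketch of the scheme: `spawn_key`, `spawn_each_key`, and `validate_segment!` are hypothetical helpers, and the rule that a segment may not contain `$` is an assumption, since the body of `validate_step_name_segment!` is not shown in this plan.

```ruby
# Hypothetical helpers for the "<parent.key>$<branch>$<spawn>" key scheme.
SEPARATOR = "$"

# Assumption: segments may not be empty or contain the separator; otherwise
# branch "a$b" + spawn "c" and branch "a" + spawn "b$c" would share a key.
def validate_segment!(name)
  str = name.to_s
  raise ArgumentError, "invalid segment #{str.inspect}" if str.empty? || str.include?(SEPARATOR)
  str
end

def spawn_key(parent_key, branch, name)
  "#{parent_key}#{SEPARATOR}#{validate_segment!(branch)}#{SEPARATOR}#{validate_segment!(name)}"
end

# spawn_each appends the item's sequential position.
def spawn_each_key(parent_key, branch, name, index)
  "#{spawn_key(parent_key, branch, name)}_#{index}"
end

puts spawn_key("ss-1", :grp, :child)         # prints ss-1$grp$child
puts spawn_each_key("se-1", :grp, :items, 0) # prints se-1$grp$items_0
```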
271
+ **Verify:** `bundle exec ruby -I test test/branch_test.rb` → PASS
272
+
273
+ **Steps:**
274
+
275
+ - [ ] **Step 1: Write failing tests + fixtures**
276
+
277
+ Create `test/internal/app/jobs/noop_child.rb`:
278
+
279
+ ```ruby
280
+ class NoopChild < WorkflowJob
281
+ prepend ChronoForge::Executor
282
+
283
+ def perform(**)
284
+ durably_execute :noop
285
+ end
286
+
287
+ private
288
+
289
+ def noop = nil
290
+ end
291
+ ```
292
+
293
+ Create `test/internal/app/jobs/single_spawn_workflow.rb`:
294
+
295
+ ```ruby
296
+ class SingleSpawnWorkflow < WorkflowJob
297
+ prepend ChronoForge::Executor
298
+
299
+ def perform
300
+ branch :grp, automerge: true do
301
+ spawn :child, NoopChild, foo: "bar"
302
+ end
303
+ end
304
+ end
305
+ ```
306
+
307
+ Create `test/branch_test.rb`:
308
+
309
+ ```ruby
310
+ require "test_helper"
311
+
312
+ class BranchTest < ActiveJob::TestCase
313
+ def test_spawn_creates_linked_child_and_seals_branch
314
+ SingleSpawnWorkflow.perform_later("ss-1")
315
+ perform_all_jobs
316
+
317
+ parent = ChronoForge::Workflow.find_by(key: "ss-1")
318
+ branch_log = parent.execution_logs.find_by(step_name: "branch$grp")
319
+ assert branch_log.completed?, "branch should seal when the block closes"
320
+
321
+ child = ChronoForge::Workflow.find_by(key: "ss-1$grp$child")
322
+ assert child, "child should be created with deterministic key"
323
+ assert_equal "NoopChild", child.job_class
324
+ assert_equal branch_log.id, child.parent_execution_log_id
325
+ assert_equal({"foo" => "bar"}, child.kwargs)
326
+ end
327
+
328
+ def test_spawn_outside_branch_raises
329
+ workflow = Class.new(WorkflowJob) do
330
+ prepend ChronoForge::Executor
331
+ def perform = spawn(:x, NoopChild)
332
+ end
333
+ Object.const_set(:BareSpawnWorkflow, workflow)
334
+ BareSpawnWorkflow.perform_later("bare-1")
335
+ assert_raises(ChronoForge::Executor::NotInBranchError) { perform_all_jobs }
336
+ ensure
337
+ Object.send(:remove_const, :BareSpawnWorkflow) if defined?(BareSpawnWorkflow)
338
+ end
339
+
340
+ def test_sealed_branch_block_is_not_re_executed_on_replay
341
+ # First run dispatches + seals.
342
+ SingleSpawnWorkflow.perform_later("ss-2")
343
+ perform_all_jobs
344
+ branch_log = ChronoForge::Workflow.find_by(key: "ss-2").execution_logs.find_by(step_name: "branch$grp")
345
+
346
+ # Re-run the same workflow key: the sealed branch must skip its block.
347
+ inserts = 0
348
+ sub = ActiveSupport::Notifications.subscribe("sql.active_record") do |*a|
349
+ inserts += 1 if /INSERT INTO ["`]?chrono_forge_workflows/i.match?(a.last[:sql].to_s)
350
+ end
351
+ SingleSpawnWorkflow.perform_later("ss-2")
352
+ perform_all_jobs
353
+ ActiveSupport::Notifications.unsubscribe(sub)
354
+
355
+ assert_equal 0, inserts, "sealed branch must not re-dispatch children on replay"
356
+ assert_equal 1, ChronoForge::Workflow.where(parent_execution_log_id: branch_log.id).count
357
+ end
358
+ end
359
+ ```
360
+
361
+ - [ ] **Step 2: Run to verify they fail**
362
+
363
+ Run: `bundle exec ruby -I test test/branch_test.rb`
364
+ Expected: FAIL — `NameError: uninitialized constant ... NotInBranchError` / `NoMethodError: branch`.
365
+
366
+ - [ ] **Step 3: Add the error classes**
367
+
368
+ In `lib/chrono_forge/executor.rb`, after `class InvalidStepName < NotExecutableError; end`:
369
+
370
+ ```ruby
371
+ # spawn/spawn_each called outside a branch block. NotExecutableError so it
372
+ # propagates (fail-fast on a programming error) rather than being retried.
373
+ class NotInBranchError < NotExecutableError; end
374
+
375
+ # A branch was opened but neither merged via merge_branches nor declared
376
+ # automerge: true. Raised at the completion gate. Fail-fast (not retried).
377
+ class UnmergedBranchError < NotExecutableError; end
378
+ ```
379
+
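The fail-fast choice relies on `NotExecutableError` subclasses bypassing retries. ChronoForge's real retry machinery is not shown in this plan, so the following is only a hypothetical model of that distinction: `run_with_retries` is an illustrative name, and a plain `retry` loop stands in for whatever retry policy the executor actually applies.

```ruby
# Hypothetical model: errors deriving from NotExecutableError escape at once,
# while other StandardErrors are retried up to max_attempts.
class NotExecutableError < StandardError; end
class NotInBranchError < NotExecutableError; end

def run_with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue NotExecutableError
    raise # a programming error: retrying cannot fix it
  rescue StandardError
    retry if attempts < max_attempts
    raise
  end
end
```

In the sketch, the `NotExecutableError` clause must come before `StandardError`, because Ruby uses the first `rescue` clause that matches.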
380
+ - [ ] **Step 4: Implement `branch` + `spawn` + shared helpers**
381
+
382
+ Create `lib/chrono_forge/executor/methods/branch.rb`:
383
+
384
+ ```ruby
385
+ module ChronoForge
386
+ module Executor
387
+ module Methods
388
+ module Branch
389
+ # Opens a named branch — a durable fan-out step. Spawns inside the block
390
+ # eagerly create + enqueue child workflows; the branch SEALS when the
391
+ # block closes. Returns without waiting (branches are concurrent; the
392
+ # join is a separate merge_branches / automerge).
393
+ def branch(name, automerge: false)
394
+ raise ArgumentError, "branch requires a block" unless block_given?
395
+ raise ArgumentError, "branch blocks cannot be nested" if @current_branch
396
+ validate_step_name_segment!(name)
397
+
398
+ step_name = "branch$#{name}"
399
+ log = find_or_create_execution_log!(step_name) { |l| l.started_at = Time.current }
400
+
401
+ # The sealed branch log may be a readonly, id-less cache stand-in; fetch
402
+ # the real id so the registry/merge can scope children to it.
403
+ log_id = log.id || ExecutionLog.where(workflow: @workflow, step_name: step_name).pick(:id)
404
+ (@open_branches ||= {})[name.to_s] = {automerge: automerge, log_id: log_id}
405
+
406
+ # ---- THE single most important correctness/performance property ----
407
+ # A SEALED branch skips its block ENTIRELY. The expensive source
408
+ # enumeration in spawn_each never re-runs after sealing. Do not move
409
+ # dispatch out from behind this guard.
410
+ unless log.completed?
411
+ @current_branch = {name: name.to_s, log: log, seq: 0}
412
+ begin
413
+ yield
414
+ ensure
415
+ @current_branch = nil
416
+ end
417
+ log.update!(state: :completed, completed_at: Time.current)
418
+ end
419
+
420
+ name
421
+ end
422
+
423
+ # Dispatch a single child into the current branch.
424
+ def spawn(name, workflow_class, **kwargs)
425
+ cb = current_branch!
426
+ validate_step_name_segment!(name)
427
+ child_key = "#{@workflow.key}$#{cb[:name]}$#{name}"
428
+ dispatch_children(cb, [[child_key, workflow_class, kwargs]])
429
+ name
430
+ end
431
+
432
+ private
433
+
434
+ def current_branch!
435
+ @current_branch || raise(NotInBranchError, "spawn/spawn_each may only be called inside a branch block")
436
+ end
437
+
438
+ # Bulk-create child workflow rows then bulk-enqueue their jobs.
439
+ # perform_all_later bypasses the class-level perform_later guard, so we
440
+ # validate the args ourselves before enqueuing.
441
+ def dispatch_children(cb, entries)
442
+ return if entries.empty?
443
+ now = Time.current
444
+ rows = entries.map do |child_key, klass, kwargs|
445
+ validate_child_enqueue!(child_key, kwargs)
446
+ {
447
+ key: child_key, job_class: klass.to_s,
448
+ kwargs: kwargs, options: {}, context: {},
449
+ state: Workflow.states[:idle],
450
+ parent_execution_log_id: cb[:log].id,
451
+ created_at: now, updated_at: now
452
+ }
453
+ end
454
+ # insert_all skips rows that collide with the unique key index, so re-dispatch (crash recovery) is idempotent. unique_by: is omitted because MySQL adapters reject it.
455
+ Workflow.insert_all(rows)
456
+ jobs = entries.map { |child_key, klass, kwargs| klass.new(child_key, **kwargs) }
457
+ ActiveJob.perform_all_later(jobs)
458
+ end
459
+
460
+ def validate_child_enqueue!(child_key, kwargs)
461
+ unless child_key.is_a?(String)
462
+ raise ArgumentError, "child key must be a String (got #{child_key.inspect})"
463
+ end
464
+ reserved = kwargs.keys.map(&:to_sym) & RESERVED_KWARGS
465
+ if reserved.any?
466
+ raise ArgumentError, "#{reserved.join(", ")} are reserved ChronoForge keywords"
467
+ end
468
+ end
469
+
470
+ # Advance (and persist) a spawn_each cursor on the branch log.
471
+ # `n` is the running item index; `pk` is the AR keyset position (nil for
472
+ # plain enumerables).
473
+ def advance_cursor!(cb, spawn_name, n:, pk: nil)
474
+ meta = cb[:log].metadata || {}
475
+ cursors = meta["cursors"] || {}
476
+ entry = cursors[spawn_name.to_s] || {}
477
+ entry["n"] = n
478
+ entry["pk"] = pk unless pk.nil?
479
+ cursors[spawn_name.to_s] = entry
480
+ meta["cursors"] = cursors
481
+ cb[:log].update!(metadata: meta)
482
+ end
483
+ end
484
+ end
485
+ end
486
+ end
487
+ ```
488
+
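The seal-then-skip rule and the idempotent insert can be simulated with two hashes standing in for the execution-log and workflow tables. This is a hypothetical in-memory model (`BranchReplaySketch` is not part of ChronoForge). It shows why a crash before sealing is safe: the block runs again, but re-dispatching an existing key is ignored.

```ruby
# Hypothetical in-memory model of branch sealing and idempotent dispatch.
# Hashes stand in for the execution-log and workflow tables.
class BranchReplaySketch
  attr_reader :logs, :children, :block_runs

  def initialize
    @logs = {}     # step_name => {state:}
    @children = {} # child key => attributes (unique on key)
    @block_runs = 0
  end

  # Skip the block entirely once the branch log is sealed.
  def branch(name)
    log = (@logs["branch$#{name}"] ||= {state: :pending})
    return name if log[:state] == :completed
    @block_runs += 1
    yield
    log[:state] = :completed # seal only after the block finishes
    name
  end

  # Insert-or-ignore on the unique key, like insert_all with a unique index.
  def dispatch(key, job_class)
    @children[key] ||= {job_class: job_class}
  end
end

sketch = BranchReplaySketch.new
2.times { sketch.branch(:grp) { sketch.dispatch("ss-1$grp$child", "NoopChild") } }
puts sketch.block_runs    # prints 1
puts sketch.children.size # prints 1
```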
489
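+ In `lib/chrono_forge/executor/methods.rb`, include the two new modules. Include order does not affect visibility (every included method, private or public, is callable from the completion gate in `WorkflowStates`); it only decides which definition wins when two modules define the same method name: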
490
+
491
+ ```ruby
492
+ module ChronoForge
493
+ module Executor
494
+ module Methods
495
+ include Methods::Wait
496
+ include Methods::WaitUntil
497
+ include Methods::ContinueIf
498
+ include Methods::DurablyExecute
499
+ include Methods::DurablyRepeat
500
+ include Methods::Branch
501
+ include Methods::MergeBranches
502
+ include Methods::WorkflowStates
503
+ end
504
+ end
505
+ end
506
+ ```
507
+
508
+ > Note: `Methods::MergeBranches` is referenced here but implemented in Task 6. So that the suite keeps loading in the meantime, create `lib/chrono_forge/executor/methods/merge_branches.rb` now with an empty stub, `module ChronoForge; module Executor; module Methods; module MergeBranches; end; end; end; end`, and flesh it out in Task 6 (or implement Task 6 immediately after this task).
509
+
510
+ - [ ] **Step 5: Run tests to verify they pass**
511
+
512
+ Run: `bundle exec ruby -I test test/branch_test.rb`
513
+ Expected: PASS (3 tests).
514
+
515
+ - [ ] **Step 6: Commit**
516
+
517
+ ```bash
518
+ git add lib/chrono_forge test/internal/app/jobs/noop_child.rb test/internal/app/jobs/single_spawn_workflow.rb test/branch_test.rb
519
+ git commit -m "feat(branches): branch block + spawn single child"
520
+ ```
521
+
522
+ ```json:metadata
523
+ {"files": ["lib/chrono_forge/executor/methods/branch.rb", "lib/chrono_forge/executor.rb", "lib/chrono_forge/executor/methods.rb", "test/branch_test.rb", "test/internal/app/jobs/noop_child.rb", "test/internal/app/jobs/single_spawn_workflow.rb"], "verifyCommand": "bundle exec ruby -I test test/branch_test.rb", "acceptanceCriteria": ["spawn creates linked child + seals branch", "spawn outside branch raises NotInBranchError", "sealed branch skips block on replay"], "requiresUserVerification": false}
524
+ ```
525
+
526
+ ---
527
+
528
+ ### Task 4: `spawn_each` — streaming bulk dispatch with cursor
529
+
530
+ **Goal:** Implement `spawn_each(name, source, of:)`: stream an AR relation (keyset) or any enumerable, dispatching one child per item keyed `name_{index}`, with the child class returned from the block and a resumable per-spawn cursor. Raise on any explicit AR `.order`, since batch iteration forces primary-key order.
531
+
532
+ **Files:**
533
+ - Modify: `lib/chrono_forge/executor/methods/branch.rb`
534
+ - Create: `test/internal/app/jobs/spawn_each_workflow.rb`
535
+ - Create: `test/spawn_each_test.rb`
536
+
537
+ **Acceptance Criteria:**
538
+ - [ ] `spawn_each :items, User.all` over N users creates N children keyed `<parent.key>$grp$items_0 … items_{N-1}`, each `parent_execution_log_id` = the branch log.
539
+ - [ ] The block's returned class is honored per item (mixed classes supported).
540
+ - [ ] An AR relation with any explicit `.order(...)` (even `.order(:id)`) raises `ArgumentError` (via `error_on_ignore: true`), because batch iteration forces primary-key order.
541
+ - [ ] A plain enumerable source works (offset cursor).
542
+ - [ ] A cursor `{"n" => <items dispatched so far>, "pk" => <last dispatched primary key, AR sources only>}` is persisted under `metadata["cursors"][name]`.
543
+
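The persisted cursor is a small nested hash on the branch log's metadata, with one entry per spawn name. A pure-Ruby sketch of the merge that `advance_cursor!` performs, using a hypothetical `advance_cursor` function over a plain Hash instead of an ActiveRecord JSON column. Unlike the plan's version, it copies rather than mutates the incoming hash.

```ruby
# Hypothetical mirror of advance_cursor!: records progress for one spawn
# without disturbing other spawns' cursors. "pk" is only set for AR sources.
def advance_cursor(metadata, spawn_name, n:, pk: nil)
  meta = (metadata || {}).dup
  cursors = (meta["cursors"] || {}).dup
  entry = (cursors[spawn_name.to_s] || {}).dup
  entry["n"] = n
  entry["pk"] = pk unless pk.nil?
  cursors[spawn_name.to_s] = entry
  meta.merge("cursors" => cursors)
end

meta = advance_cursor(nil, :items, n: 1000, pk: 1042)
meta = advance_cursor(meta, :items, n: 2000, pk: 2107)
meta = advance_cursor(meta, :rows, n: 500)
p meta
```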
544
+ **Verify:** `bundle exec ruby -I test test/spawn_each_test.rb` → PASS
545
+
546
+ **Steps:**
547
+
548
+ - [ ] **Step 1: Write failing tests + fixtures**
549
+
550
+ Create `test/internal/app/jobs/spawn_each_workflow.rb`:
551
+
552
+ ```ruby
553
+ class SpawnEachWorkflow < WorkflowJob
554
+ prepend ChronoForge::Executor
555
+
556
+ def perform(of: 1000)
557
+ branch :grp, automerge: true do
558
+ spawn_each :items, User.all, of: of do |user|
559
+ [NoopChild, {user_id: user.id}]
560
+ end
561
+ end
562
+ end
563
+ end
564
+ ```
565
+
566
+ Create `test/spawn_each_test.rb`:
567
+
568
+ ```ruby
569
+ require "test_helper"
570
+
571
+ class SpawnEachTest < ActiveJob::TestCase
572
+ def setup
573
+ User.delete_all
574
+ @users = 5.times.map { |i| User.create!(name: "u#{i}", email: "u#{i}@e.com") }
575
+ end
576
+
577
+ def test_spawn_each_creates_one_indexed_child_per_item
578
+ SpawnEachWorkflow.perform_later("se-1")
579
+ perform_all_jobs
580
+
581
+ parent = ChronoForge::Workflow.find_by(key: "se-1")
582
+ branch_log = parent.execution_logs.find_by(step_name: "branch$grp")
583
+ children = ChronoForge::Workflow.where(parent_execution_log_id: branch_log.id).order(:key)
584
+
585
+ assert_equal 5, children.count
586
+ assert_equal (0..4).map { |i| "se-1$grp$items_#{i}" }, children.pluck(:key)
587
+ assert_equal @users.first.id, children.first.kwargs["user_id"]
588
+ cursor = branch_log.reload.metadata["cursors"]["items"]
589
+ assert_equal 5, cursor["n"]
590
+ end
591
+
592
+ def test_spawn_each_honors_class_from_block
593
+ klass = Class.new(WorkflowJob) { prepend ChronoForge::Executor; def perform(**) = nil }
594
+ Object.const_set(:AltChild, klass)
595
+ job = Class.new(WorkflowJob) do
596
+ prepend ChronoForge::Executor
597
+ def perform
598
+ branch(:g, automerge: true) do
599
+ spawn_each(:i, User.all) { |u| u.id.even? ? [AltChild, {id: u.id}] : [NoopChild, {id: u.id}] }
600
+ end
601
+ end
602
+ end
603
+ Object.const_set(:MixedClassWorkflow, job)
604
+
605
+ MixedClassWorkflow.perform_later("mc-1")
606
+ perform_all_jobs
607
+
608
+ classes = ChronoForge::Workflow.where("key LIKE ?", "mc-1$g$i_%").pluck(:job_class).uniq.sort
609
+ assert_equal %w[AltChild NoopChild], classes
610
+ ensure
611
+ Object.send(:remove_const, :AltChild) if defined?(AltChild)
612
+ Object.send(:remove_const, :MixedClassWorkflow) if defined?(MixedClassWorkflow)
613
+ end
614
+
615
+ def test_spawn_each_raises_on_conflicting_order
616
+ job = Class.new(WorkflowJob) do
617
+ prepend ChronoForge::Executor
618
+ def perform
619
+ branch(:g, automerge: true) do
620
+ spawn_each(:i, User.order(:email)) { |u| [NoopChild, {id: u.id}] }
621
+ end
622
+ end
623
+ end
624
+ Object.const_set(:BadOrderWorkflow, job)
625
+ BadOrderWorkflow.perform_later("bo-1")
626
+ assert_raises(ArgumentError) { perform_all_jobs }
627
+ ensure
628
+ Object.send(:remove_const, :BadOrderWorkflow) if defined?(BadOrderWorkflow)
629
+ end
630
+ end
631
+ ```
632
+
633
+ - [ ] **Step 2: Run to verify they fail**
634
+
635
+ Run: `bundle exec ruby -I test test/spawn_each_test.rb`
636
+ Expected: FAIL — `NoMethodError: spawn_each`.
637
+
638
+ - [ ] **Step 3: Implement `spawn_each`**
639
+
640
+ Add to `lib/chrono_forge/executor/methods/branch.rb` (in module `Branch`, public, next to `spawn`):
641
+
642
+ ```ruby
643
+ # Dispatch one child per item of `source`, streamed. AR relations use
644
+ # keyset iteration (find_in_batches, resuming after the last pk) for constant memory; any other
645
+ # enumerable uses an offset cursor. Items are keyed `name_{index}` by
646
+ # their sequential position, so the source must re-enumerate identically
647
+ # across replays. The block returns [WorkflowClass, kwargs] (or a class).
648
+ def spawn_each(name, source, of: 1000)
649
+ cb = current_branch!
650
+ validate_step_name_segment!(name)
651
+ cursor = (cb[:log].metadata&.dig("cursors", name.to_s)) || {}
652
+ n = (cursor["n"] || 0)
653
+
654
+ if source.is_a?(ActiveRecord::Relation)
655
+ # find_in_batches' start: is inclusive, so resume with pk > cursor instead.
+ scope = cursor["pk"] ? source.where(source.klass.arel_table[source.klass.primary_key].gt(cursor["pk"])) : source
+ scope.find_in_batches(batch_size: of, error_on_ignore: true) do |records|
656
+ entries = records.map do |record|
657
+ klass, kw = normalize_spawn(yield(record))
658
+ ck = "#{@workflow.key}$#{cb[:name]}$#{name}_#{n}"
659
+ n += 1
660
+ [ck, klass, kw]
661
+ end
662
+ dispatch_children(cb, entries)
663
+ advance_cursor!(cb, name, pk: records.last.id, n: n)
664
+ end
665
+ else
666
+ source.drop(n).each_slice(of) do |slice|
667
+ entries = slice.map do |item|
668
+ klass, kw = normalize_spawn(yield(item))
669
+ ck = "#{@workflow.key}$#{cb[:name]}$#{name}_#{n}"
670
+ n += 1
671
+ [ck, klass, kw]
672
+ end
673
+ dispatch_children(cb, entries)
674
+ advance_cursor!(cb, name, n: n)
675
+ end
676
+ end
677
+ name
678
+ end
679
+ ```
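A Rails-free model of the non-ActiveRecord path may help when reasoning about the offset cursor. The sketch below is hypothetical: `plan_dispatch` is not part of ChronoForge, and strings stand in for workflow classes. It skips the `n` items already dispatched, keys each new item `name_{n}` by position, and returns the batches that `dispatch_children` would receive, together with the advanced cursor.

```ruby
# Hypothetical, Rails-free model of spawn_each's non-AR path (not ChronoForge API).
# Returns [batches, next_n]; each batch is what one dispatch_children call would get.
def plan_dispatch(workflow_key, branch, name, source, cursor_n, of:)
  n = cursor_n
  batches = []
  source.drop(n).each_slice(of) do |slice|
    batch = slice.map do |item|
      klass, kwargs = yield(item) # block returns [klass, kwargs]
      key = "#{workflow_key}$#{branch}$#{name}_#{n}"
      n += 1
      [key, klass, kwargs]
    end
    batches << batch
  end
  [batches, n]
end

items = %w[a b c d e]
batches, n = plan_dispatch("wf-1", "g", "i", items, 0, of: 2) { |x| ["NoopChild", {item: x}] }
batches.map(&:size) # => [2, 2, 1]
n                   # => 5

# Resume as if the run crashed after the first batch (cursor n = 2).
resumed, _ = plan_dispatch("wf-1", "g", "i", items, 2, of: 2) { |x| ["NoopChild", {item: x}] }
resumed.flatten(1).map(&:first) # => ["wf-1$g$i_2", "wf-1$g$i_3", "wf-1$g$i_4"]
```

Because a key depends only on position, resuming with `n = 2` is correct only if the source yields the same items in the same order as it did on the first run.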
680
+
681
+ And the private helper (add near the other privates in `Branch`):
682
+
683
+ ```ruby
684
+ # Normalize the block return: [Klass, kwargs] or a bare Klass.
685
+ def normalize_spawn(result)
686
+ klass, kwargs = Array(result)
687
+ [klass, kwargs || {}]
688
+ end
689
+ ```
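The following shows what `normalize_spawn` accepts. It uses the helper body unchanged in plain Ruby, with an empty stand-in class (this `NoopChild` is not the test fixture). Note the third case: a block that returns only a kwargs Hash is not rejected. `Array(hash)` converts the Hash into key/value pairs, so the "class" becomes `[:id, 7]`.

```ruby
# Same body as the plan's normalize_spawn helper.
def normalize_spawn(result)
  klass, kwargs = Array(result)
  [klass, kwargs || {}]
end

class NoopChild; end # hypothetical stand-in for a workflow class

normalize_spawn([NoopChild, {id: 7}]) # => [NoopChild, {id: 7}]
normalize_spawn(NoopChild)            # => [NoopChild, {}]
normalize_spawn({id: 7})              # => [[:id, 7], {}]  (a bare Hash is a block bug)
```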
690
+
691
+ - [ ] **Step 4: Run to verify they pass**
692
+
693
+ Run: `bundle exec ruby -I test test/spawn_each_test.rb`
694
+ Expected: PASS.
695
+
696
+ - [ ] **Step 5: Commit**
697
+
698
+ ```bash
699
+ git add lib/chrono_forge/executor/methods/branch.rb test/spawn_each_test.rb test/internal/app/jobs/spawn_each_workflow.rb
700
+ git commit -m "feat(branches): spawn_each streaming bulk dispatch with cursor"
701
+ ```
702
+
703
+ ```json:metadata
704
+ {"files": ["lib/chrono_forge/executor/methods/branch.rb", "test/spawn_each_test.rb", "test/internal/app/jobs/spawn_each_workflow.rb"], "verifyCommand": "bundle exec ruby -I test test/spawn_each_test.rb", "acceptanceCriteria": ["one indexed child per item", "class from block honored (mixed)", "raises on conflicting AR order", "cursor persisted"], "requiresUserVerification": false}
705
+ ```
706
+
707
+ ---
708
+
709
+ ### Task 5: `BranchMergeJob` — the lightweight poller
710
+
711
+ **Goal:** Implement the dedicated poller: capped-count probe per branch, wake the parent when all branches are sealed + drained, otherwise re-kick dropped jobs and reschedule with an adaptive (capped-count) interval.
712
+
713
+ **Files:**
714
+ - Create: `lib/chrono_forge/branch_merge_job.rb`
715
+ - Modify: `lib/chrono_forge/executor.rb` (poll-cadence constants — optional, can live on the job)
716
+ - Create: `test/branch_merge_job_test.rb`
717
+
718
+ **Acceptance Criteria:**
719
+ - [ ] When every branch log is `completed` (sealed) and has zero incomplete children, the job enqueues the parent workflow (`parent_job_class.perform_later(parent_key)`) and does not reschedule.
720
+ - [ ] Otherwise it reschedules itself with delay `clamp(pending * FACTOR, min, max)` and does not wake the parent.
721
+ - [ ] The pending count is capped at `CAP` (never counts beyond it).
722
+ - [ ] A never-started child (`started_at` nil) older than the re-kick threshold is re-enqueued.
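The clamp in the second criterion is easy to sanity-check in isolation. This sketch assumes the constants proposed in Step 3 (`CAP = 5_000`, `FACTOR = 0.06`). It uses Ruby's `Comparable#clamp`, which matches the nested `max`/`min` form used in the implementation whenever `min <= max`.

```ruby
# Plain-Ruby version of the poller's delay formula; constants mirror Step 3.
CAP = 5_000
FACTOR = 0.06 # seconds of delay per pending child

def poll_delay(pending, min_interval, max_interval)
  (pending * FACTOR).clamp(min_interval, max_interval)
end

poll_delay(0, 5, 300)     # small backlog: floored at min_interval (5)
poll_delay(1_000, 5, 300) # about 60 seconds
poll_delay(CAP, 5, 300)   # about 300: CAP * FACTOR lands on the default max_interval
```

With these constants, `CAP * FACTOR` is 300 seconds, which equals the default `max_interval` of 5 minutes. Capping the count therefore does not change the delay at the default settings.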
723
+
724
+ **Verify:** `bundle exec ruby -I test test/branch_merge_job_test.rb` → PASS
725
+
726
+ **Steps:**
727
+
728
+ - [ ] **Step 1: Write failing tests**
729
+
730
+ Create `test/branch_merge_job_test.rb`:
731
+
732
+ ```ruby
733
+ require "test_helper"
734
+
735
+ class BranchMergeJobTest < ActiveJob::TestCase
736
+ def setup
737
+ @parent = ChronoForge::Workflow.create!(key: "bmj-parent", job_class: "SingleSpawnWorkflow")
738
+ @log = @parent.execution_logs.create!(step_name: "branch$g", state: :completed)
739
+ end
740
+
741
+ def child!(state:, started_at: Time.current)
742
+ ChronoForge::Workflow.create!(
743
+ key: "c-#{SecureRandom.hex}", job_class: "NoopChild",
744
+ parent_execution_log_id: @log.id, state: state, started_at: started_at
745
+ )
746
+ end
747
+
748
+ def test_wakes_parent_when_all_complete
749
+ child!(state: :completed)
750
+ assert_enqueued_with(job: SingleSpawnWorkflow, args: ["bmj-parent"]) do
751
+ ChronoForge::BranchMergeJob.perform_now("bmj-parent", "SingleSpawnWorkflow", [@log.id], 5, 300)
752
+ end
753
+ end
754
+
755
+ def test_reschedules_when_incomplete
756
+ child!(state: :running)
757
+ assert_enqueued_with(job: ChronoForge::BranchMergeJob) do
758
+ ChronoForge::BranchMergeJob.perform_now("bmj-parent", "SingleSpawnWorkflow", [@log.id], 5, 300)
759
+ end
760
+ refute_enqueued_with(job: SingleSpawnWorkflow)
761
+ end
762
+
763
+ def test_rekicks_never_started_child
764
+ stuck = child!(state: :idle, started_at: nil)
765
+ stuck.update_column(:updated_at, 10.minutes.ago)
766
+ assert_enqueued_with(job: NoopChild, args: [stuck.key]) do
767
+ ChronoForge::BranchMergeJob.perform_now("bmj-parent", "SingleSpawnWorkflow", [@log.id], 5, 300)
768
+ end
769
+ end
770
+ end
771
+ ```
772
+
773
+ - [ ] **Step 2: Run to verify they fail**
774
+
775
+ Run: `bundle exec ruby -I test test/branch_merge_job_test.rb`
776
+ Expected: FAIL — `NameError: uninitialized constant ChronoForge::BranchMergeJob`.
777
+
778
+ - [ ] **Step 3: Implement the poller**
779
+
780
+ Create `lib/chrono_forge/branch_merge_job.rb`:
781
+
782
+ ```ruby
783
+ module ChronoForge
784
+ # Lightweight poller that joins one or more branches. NOT a workflow — it holds
785
+ # no lock, does no replay, and carries no context. It exists so the heavy parent
786
+ # workflow is replayed only twice per merge (kick off + completion wake).
787
+ class BranchMergeJob < ActiveJob::Base
788
+ CAP = 5_000 # cap the pending count; beyond it we just pick max_interval
789
+ FACTOR = 0.06 # seconds of delay per pending child
790
+ REKICK_AFTER = 5.minutes
791
+
792
+ def perform(parent_key, parent_job_class, branch_log_ids, min_interval, max_interval)
793
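+ # Read the seal flags BEFORE counting. Assuming a branch is sealed only after
+ # its dispatch finishes, a sealed branch already has all its child rows, so a
+ # zero count cannot be a not-yet-inserted false negative.
+ sealed = branch_log_ids.all? { |id| branch_sealed?(id) }
794
+ pending = [branch_log_ids.sum { |id| incomplete_scope(id).limit(CAP).count }, CAP].min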
795
+
796
+ if sealed && pending.zero?
797
+ parent_job_class.constantize.perform_later(parent_key)
798
+ return
799
+ end
800
+
801
+ rekick_dropped_jobs(branch_log_ids)
802
+
803
+ delay = [[pending * FACTOR, min_interval].max, max_interval].min
804
+ self.class.set(wait: delay.seconds)
805
+ .perform_later(parent_key, parent_job_class, branch_log_ids, min_interval, max_interval)
806
+ end
807
+
808
+ private
809
+
810
+ def incomplete_scope(branch_log_id)
811
+ Workflow.where(parent_execution_log_id: branch_log_id)
812
+ .where.not(state: Workflow.states[:completed])
813
+ end
814
+
815
+ def branch_sealed?(branch_log_id)
816
+ ExecutionLog.where(id: branch_log_id, state: ExecutionLog.states[:completed]).exists?
817
+ end
818
+
819
+ # A child dispatched but never run (its job was dropped by the backend) is
820
+ # re-enqueued. started_at IS NULL can't distinguish "never enqueued" from
821
+ # "queued but not yet picked up", so we only re-kick children that have been
822
+ # idle past REKICK_AFTER. Re-enqueue is idempotent: a completed/running child
823
+ # no-ops via the executable?/lock guard.
824
+ def rekick_dropped_jobs(branch_log_ids)
825
+ branch_log_ids.each do |id|
826
+ Workflow.where(parent_execution_log_id: id, started_at: nil)
827
+ .where("updated_at < ?", REKICK_AFTER.ago)
828
+ .find_each do |child|
829
+ child.job_klass.perform_later(child.key, **(child.kwargs || {}).symbolize_keys)
830
+ end
831
+ end
832
+ end
833
+ end
834
+ end
835
+ ```
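The re-kick filter can be modelled with plain structs. `ChildRow` and `rekick_candidates` below are hypothetical names. The predicate mirrors the `started_at: nil` plus `updated_at < REKICK_AFTER.ago` scope above, with times expressed in seconds.

```ruby
# Hypothetical in-memory model of rekick_dropped_jobs' filter.
ChildRow = Struct.new(:key, :started_at, :updated_at, keyword_init: true)
REKICK_AFTER = 5 * 60 # seconds, mirrors 5.minutes

def rekick_candidates(children, now)
  children.select { |c| c.started_at.nil? && c.updated_at < now - REKICK_AFTER }
end

now = Time.at(1_000_000)
rows = [
  ChildRow.new(key: "fresh", started_at: nil, updated_at: now - 30),          # may still be queued
  ChildRow.new(key: "stuck", started_at: nil, updated_at: now - 600),         # likely dropped
  ChildRow.new(key: "running", started_at: now - 900, updated_at: now - 900)  # has started
]
rekick_candidates(rows, now).map(&:key) # => ["stuck"]
```

As written, re-enqueueing does not update `updated_at`. A child that still has not started is therefore selected again on every later poll. Per the comment above, the executable?/lock guard is what keeps those repeats harmless.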
836
+
837
+ - [ ] **Step 4: Run to verify they pass**
838
+
839
+ Run: `bundle exec ruby -I test test/branch_merge_job_test.rb`
840
+ Expected: PASS.
841
+
842
+ - [ ] **Step 5: Commit**
843
+
844
+ ```bash
845
+ git add lib/chrono_forge/branch_merge_job.rb test/branch_merge_job_test.rb
846
+ git commit -m "feat(branches): BranchMergeJob lightweight poller"
847
+ ```
848
+
849
+ ```json:metadata
850
+ {"files": ["lib/chrono_forge/branch_merge_job.rb", "test/branch_merge_job_test.rb"], "verifyCommand": "bundle exec ruby -I test test/branch_merge_job_test.rb", "acceptanceCriteria": ["wakes parent when all complete", "reschedules when incomplete", "capped count", "re-kicks never-started child"], "requiresUserVerification": false}
851
+ ```
852
+
853
+ ---
854
+
855
+ ### Task 6: `merge_branches` / `merge_branch` — the join
856
+
857
+ **Goal:** Implement `merge_branches(*names)` (alias `merge_branch`): immediate done-check, else enqueue `BranchMergeJob` and halt; remove joined branches from `@open_branches` on completion; raise on an unopened name. Provide the shared helpers (`branches_done?`, `enqueue_branch_merge_job`, `open_branch!`) used by the completion gate in Task 7.
858
+
859
+ **Files:**
860
+ - Modify: `lib/chrono_forge/executor/methods/merge_branches.rb`
861
+ - Create: `test/merge_branches_test.rb`
862
+ - Create: `test/internal/app/jobs/two_branch_workflow.rb`
+ - Create: `test/internal/app/jobs/stalled_child_branch_workflow.rb`
863
+
864
+ **Acceptance Criteria:**
865
+ - [ ] After all children of the named branches complete, the parent resumes and the `merge$<names>` log is `completed`.
866
+ - [ ] While children are incomplete, the parent halts (idle) and a `BranchMergeJob` is enqueued.
867
+ - [ ] A failed/stalled child keeps the parent parked (Option A); recovering it lets the merge resolve.
868
+ - [ ] `merge_branches :never_opened` raises `ArgumentError`.
869
+
870
+ **Verify:** `bundle exec ruby -I test test/merge_branches_test.rb` → PASS
871
+
872
+ **Steps:**
873
+
874
+ - [ ] **Step 1: Write failing tests + fixture**
875
+
876
+ Create `test/internal/app/jobs/two_branch_workflow.rb`:
877
+
878
+ ```ruby
879
+ class TwoBranchWorkflow < WorkflowJob
880
+ prepend ChronoForge::Executor
881
+
882
+ def perform
883
+ branch :a do
884
+ spawn :one, NoopChild
885
+ end
886
+ branch :b do
887
+ spawn :two, NoopChild
888
+ end
889
+ merge_branches :a, :b
890
+ durably_execute :finalize
891
+ end
892
+
893
+ private
894
+
895
+ def finalize
896
+ context["finalized"] = true
897
+ end
898
+ end
899
+ ```
900
+
901
+ Create `test/merge_branches_test.rb`:
902
+
903
+ ```ruby
904
+ require "test_helper"
905
+
906
+ class MergeBranchesTest < ActiveJob::TestCase
907
+ def test_parent_resumes_after_branches_complete
908
+ TwoBranchWorkflow.perform_later("mb-1")
909
+ perform_all_jobs
910
+
911
+ parent = ChronoForge::Workflow.find_by(key: "mb-1")
912
+ assert parent.completed?, "parent should complete once both branches merge"
913
+ assert_equal true, parent.context["finalized"]
914
+ merge_log = parent.execution_logs.find { |l| l.step_name.start_with?("merge$") }
915
+ assert merge_log.completed?
916
+ end
917
+
918
+ def test_unopened_branch_name_raises
919
+ job = Class.new(WorkflowJob) do
920
+ prepend ChronoForge::Executor
921
+ def perform = merge_branches(:nope)
922
+ end
923
+ Object.const_set(:NoBranchMergeWorkflow, job)
924
+ NoBranchMergeWorkflow.perform_later("nb-1")
925
+ assert_raises(ArgumentError) { perform_all_jobs }
926
+ ensure
927
+ Object.send(:remove_const, :NoBranchMergeWorkflow) if defined?(NoBranchMergeWorkflow)
928
+ end
929
+
930
+ # Option A: a non-completed (stalled) child keeps the parent parked; recovering
931
+ # the child lets the merge resolve.
932
+ def test_failed_child_parks_parent_until_recovered
933
+ StalledChildBranchWorkflow.perform_later("oa-1")
934
+ perform_all_jobs
935
+
936
+ parent = ChronoForge::Workflow.find_by(key: "oa-1")
937
+ child = ChronoForge::Workflow.find_by(key: "oa-1$grp$c")
938
+ refute parent.completed?, "parent must stay parked while child is not completed"
939
+ assert child.stalled?, "child should be stalled (permanent failure)"
940
+
941
+ # Recover the child; drive jobs again — the merge poll should now resolve.
943
+ child.update!(state: :idle) # simulate fix + allow re-run
944
+ StalledChildBranchWorkflow::ALLOW_COMPLETE[:ok] = true
945
+ child.retry_later rescue child.job_klass.perform_later(child.key)
946
+ perform_all_jobs
947
+
948
+ assert ChronoForge::Workflow.find_by(key: "oa-1").completed?,
949
+ "parent should complete once the recovered child completes"
950
+ end
951
+ end
952
+ ```
953
+
954
+ And add the stalling fixture `test/internal/app/jobs/stalled_child_branch_workflow.rb`:
955
+
956
+ ```ruby
957
+ class StalledChildBranchWorkflow < WorkflowJob
958
+ prepend ChronoForge::Executor
959
+
960
+ # Toggled by the test to let the child succeed on recovery.
961
+ ALLOW_COMPLETE = {ok: false}
962
+
963
+ def perform
964
+ branch :grp do
965
+ spawn :c, StalledChild
966
+ end
967
+ merge_branches :grp
968
+ end
969
+ end
970
+
971
+ class StalledChild < WorkflowJob
972
+ prepend ChronoForge::Executor
973
+
974
+ def perform(**)
975
+ durably_execute :maybe_fail, retry_policy: ChronoForge::Executor::RetryPolicy.new(retry_on: [])
976
+ end
977
+
978
+ private
979
+
980
+ def maybe_fail
981
+ raise "not yet" unless StalledChildBranchWorkflow::ALLOW_COMPLETE[:ok]
982
+ end
983
+ end
984
+ ```
985
+
986
+ > The exact recovery mechanics (`retry_later` vs re-enqueue, the `ALLOW_COMPLETE` toggle) may need adjusting against the real stall/retry behaviour observed in `test/chrono_forge_test.rb`'s permanent-failure tests — the assertion that matters is **parent parked while child not completed, parent completes after child completes**. Mirror the permanent-failure pattern already used in `chrono_forge_test.rb` for the stall setup.
987
+
988
+ - [ ] **Step 2: Run to verify they fail**
989
+
990
+ Run: `bundle exec ruby -I test test/merge_branches_test.rb`
991
+ Expected: FAIL — `NoMethodError: merge_branches`.
992
+
993
+ - [ ] **Step 3: Implement `merge_branches` + helpers**
994
+
995
+ Replace the placeholder `lib/chrono_forge/executor/methods/merge_branches.rb` with:
996
+
997
+ ```ruby
998
+ module ChronoForge
999
+ module Executor
1000
+ module Methods
1001
+ module MergeBranches
1002
+ # Join one or more named branches. Separate from dispatch so branches run
1003
+ # concurrently. Does one immediate check; if not done, hands off to the
1004
+ # lightweight BranchMergeJob and halts (the heavy parent is not replayed
1005
+ # per poll). Default cadence clamps between min/max, scaled by pending.
1006
+ def merge_branches(*names, min_interval: 5.seconds, max_interval: 5.minutes)
1007
+ step_name = "merge$#{names.map(&:to_s).sort.join(",")}"
1008
+ log = find_or_create_execution_log!(step_name) { |l| l.started_at = Time.current }
1009
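+ if log.completed?
+ # Replay after a finished join: `branch` re-registered these names on this
+ # pass, so drop them again or the completion gate would see them as unmerged.
+ names.each { |nm| @open_branches&.delete(nm.to_s) }
+ return
+ end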
1010
+
1011
+ branch_log_ids = names.map { |nm| open_branch!(nm)[:log_id] }
1012
+
1013
+ if branches_done?(branch_log_ids)
1014
+ names.each { |nm| @open_branches.delete(nm.to_s) }
1015
+ log.update!(state: :completed, completed_at: Time.current)
1016
+ return
1017
+ end
1018
+
1019
+ enqueue_branch_merge_job(branch_log_ids, min_interval, max_interval)
1020
+ halt_execution!
1021
+ end
1022
+ alias_method :merge_branch, :merge_branches
1023
+
1024
+ private
1025
+
1026
+ def open_branch!(name)
1027
+ (@open_branches || {}).fetch(name.to_s) do
1028
+ raise ArgumentError, "no open branch named #{name.inspect} (open it with `branch #{name.inspect} do … end` first)"
1029
+ end
1030
+ end
1031
+
1032
+ # A branch is done when its log is sealed (completed) and it has no
1033
+ # incomplete children. exists? short-circuits at the first incomplete row.
1034
+ def branches_done?(branch_log_ids)
1035
+ branch_log_ids.all? do |id|
1036
+ next false unless ExecutionLog.where(id: id, state: ExecutionLog.states[:completed]).exists?
1037
+ !Workflow.where(parent_execution_log_id: id)
1038
+ .where.not(state: Workflow.states[:completed]).exists?
1039
+ end
1040
+ end
1041
+
1042
+ def enqueue_branch_merge_job(branch_log_ids, min_interval, max_interval)
1043
+ BranchMergeJob.perform_later(
1044
+ @workflow.key, self.class.to_s, branch_log_ids,
1045
+ min_interval.to_i, max_interval.to_i
1046
+ )
1047
+ end
1048
+ end
1049
+ end
1050
+ end
1051
+ end
1052
+ ```
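Two details of `merge_branches` can be checked without Rails. First, the step name sorts the branch names. Second, `open_branch!` raises for a name the current pass never opened. The sketch copies that logic into standalone methods. The registry shape (`{ "name" => { log_id:, automerge: } }`) is an assumption, inferred from how `open_branch!(nm)[:log_id]` and `b[:automerge]` read it.

```ruby
# Standalone copies of merge_branches' naming and lookup logic (illustrative only).
def merge_step_name(*names)
  "merge$#{names.map(&:to_s).sort.join(",")}"
end

def open_branch!(open_branches, name)
  (open_branches || {}).fetch(name.to_s) do
    raise ArgumentError, "no open branch named #{name.inspect}"
  end
end

registry = { "a" => { log_id: 11, automerge: false }, "b" => { log_id: 12, automerge: true } }

merge_step_name(:b, :a)                                   # => "merge$a,b"
[:a, :b].map { |nm| open_branch!(registry, nm)[:log_id] } # => [11, 12]
```

Because of the sort, `merge_branches :b, :a` and `merge_branches :a, :b` resolve to the same `merge$a,b` log on replay.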
1053
+
1054
+ - [ ] **Step 4: Run to verify they pass**
1055
+
1056
+ Run: `bundle exec ruby -I test test/merge_branches_test.rb`
1057
+ Expected: PASS.
1058
+
1059
+ - [ ] **Step 5: Run the full suite to catch regressions**
1060
+
1061
+ Run: `bundle exec rake test`
1062
+ Expected: all green.
1063
+
1064
+ - [ ] **Step 6: Commit**
1065
+
1066
+ ```bash
1067
+ git add lib/chrono_forge/executor/methods/merge_branches.rb test/merge_branches_test.rb test/internal/app/jobs/two_branch_workflow.rb test/internal/app/jobs/stalled_child_branch_workflow.rb
1068
+ git commit -m "feat(branches): merge_branches poll-join"
1069
+ ```
1070
+
1071
+ ```json:metadata
1072
+ {"files": ["lib/chrono_forge/executor/methods/merge_branches.rb", "test/merge_branches_test.rb", "test/internal/app/jobs/two_branch_workflow.rb", "test/internal/app/jobs/stalled_child_branch_workflow.rb"], "verifyCommand": "bundle exec ruby -I test test/merge_branches_test.rb", "acceptanceCriteria": ["parent resumes after branches complete", "halts + enqueues poller while incomplete", "stalled child parks parent until recovered", "unopened name raises"], "requiresUserVerification": false}
1073
+ ```
1074
+
1075
+ ---
1076
+
1077
+ ### Task 7: Completion gate — automerge + raise on unmerged
1078
+
1079
+ **Goal:** In `complete_workflow!`, before sealing, inspect `@open_branches`: raise `UnmergedBranchError` for any leftover non-automerge branch; for leftover automerge branches, join them (poll/halt) before completing.
1080
+
1081
+ **Files:**
1082
+ - Modify: `lib/chrono_forge/executor/methods/workflow_states.rb`
1083
+ - Create: `test/automerge_test.rb`
+ - Create: `test/internal/app/jobs/unmerged_branch_workflow.rb`
1084
+
1085
+ **Acceptance Criteria:**
1086
+ - [ ] An `automerge: true` branch with no `merge_branches` blocks workflow completion until its children finish, then completes.
1087
+ - [ ] A branch opened with neither `merge_branches` nor `automerge: true` raises `UnmergedBranchError` at completion (even if its children already finished).
1088
+ - [ ] A branch already joined via `merge_branches` does not re-trigger at the gate.
1089
+
1090
+ **Verify:** `bundle exec ruby -I test test/automerge_test.rb` → PASS
1091
+
1092
+ **Steps:**
1093
+
1094
+ - [ ] **Step 1: Write failing tests + fixture**
1095
+
1096
+ Create `test/internal/app/jobs/unmerged_branch_workflow.rb`:
1097
+
1098
+ ```ruby
1099
+ class UnmergedBranchWorkflow < WorkflowJob
1100
+ prepend ChronoForge::Executor
1101
+
1102
+ def perform
1103
+ branch :forgotten do # no automerge, never merged
1104
+ spawn :c, NoopChild
1105
+ end
1106
+ end
1107
+ end
1108
+ ```
1109
+
1110
+ Create `test/automerge_test.rb`:
1111
+
1112
+ ```ruby
1113
+ require "test_helper"
1114
+
1115
+ class AutomergeTest < ActiveJob::TestCase
1116
+ # SingleSpawnWorkflow opens branch :grp with automerge: true and no merge.
1117
+ def test_automerge_blocks_completion_until_children_done
1118
+ SingleSpawnWorkflow.perform_later("am-1")
1119
+ perform_all_jobs
1120
+
1121
+ parent = ChronoForge::Workflow.find_by(key: "am-1")
1122
+ assert parent.completed?, "automerge branch should be joined before completion"
1123
+ child = ChronoForge::Workflow.find_by(key: "am-1$grp$child")
1124
+ assert child.completed?
1125
+ end
1126
+
1127
+ def test_unmerged_branch_raises
1128
+ UnmergedBranchWorkflow.perform_later("um-1")
1129
+ error = assert_raises(ChronoForge::Executor::UnmergedBranchError) { perform_all_jobs }
1130
+ assert_match(/forgotten/, error.message)
1131
+ end
1132
+ end
1133
+ ```
1134
+
1135
+ - [ ] **Step 2: Run to verify they fail**
1136
+
1137
+ Run: `bundle exec ruby -I test test/automerge_test.rb`
1138
+ Expected: `test_unmerged_branch_raises` FAILS (no error raised — branch silently detached).
1139
+
1140
+ - [ ] **Step 3: Add the completion gate**
1141
+
1142
+ In `lib/chrono_forge/executor/methods/workflow_states.rb`, change the start of `complete_workflow!` to call the gate first:
1143
+
1144
+ ```ruby
1145
+ def complete_workflow!
1146
+ enforce_branch_joins!
1147
+
1148
+ # Create an execution log for workflow completion
1149
+ execution_log = find_or_create_execution_log!("$workflow_completion$") do |log|
1150
+ log.started_at = Time.current
1151
+ end
1152
+ # ... unchanged body ...
1153
+ ```
1154
+
1155
+ Add the private gate method to the `WorkflowStates` module (it uses `branches_done?` / `enqueue_branch_merge_job` from `MergeBranches`, available on the same instance):
1156
+
1157
+ ```ruby
1158
+ # Every branch must be joined — explicitly (merge_branches) or implicitly
1159
+ # (automerge: true). @open_branches is the in-memory registry rebuilt each
1160
+ # replay pass: branch adds, merge_branches removes on completion. Anything
1161
+ # left here is either an automerge branch to join, or a forgotten join.
1162
+ def enforce_branch_joins!
1163
+ open = @open_branches || {}
1164
+ return if open.empty?
1165
+
1166
+ unmerged = open.reject { |_, b| b[:automerge] }
1167
+ if unmerged.any?
1168
+ names = unmerged.keys
1169
+ raise UnmergedBranchError,
1170
+ "branch(es) #{names.join(", ")} were opened but never merged. " \
1171
+ "Add `merge_branches #{names.map { |n| ":#{n}" }.join(", ")}` " \
1172
+ "or open with `branch(..., automerge: true)`."
1173
+ end
1174
+
1175
+ auto_ids = open.values.map { |b| b[:log_id] }
1176
+ unless branches_done?(auto_ids)
1177
+ enqueue_branch_merge_job(auto_ids, 5.seconds, 5.minutes)
1178
+ halt_execution! # poller wakes the parent; the gate re-runs on replay
1179
+ end
1180
+ end
1181
+ ```
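The gate has three outcomes: no open branches, a forgotten join, or automerge branches still draining. These can be expressed as a pure function for testing. `gate_decision` is a hypothetical helper that returns a tag instead of raising or calling `halt_execution!`; `done:` stands in for `branches_done?`.

```ruby
# Hypothetical pure version of enforce_branch_joins!'s decision: returns a tag
# instead of raising UnmergedBranchError or calling halt_execution!.
def gate_decision(open_branches, done:)
  return [:complete] if open_branches.empty?

  unmerged = open_branches.reject { |_, b| b[:automerge] }
  return [:raise_unmerged, unmerged.keys] if unmerged.any?

  ids = open_branches.values.map { |b| b[:log_id] }
  done.call(ids) ? [:complete] : [:wait, ids]
end

auto = { "grp" => { log_id: 2, automerge: true } }
gate_decision({}, done: ->(_ids) { true })                               # => [:complete]
gate_decision({ "forgotten" => { log_id: 1 } }, done: ->(_ids) { true }) # => [:raise_unmerged, ["forgotten"]]
gate_decision(auto, done: ->(_ids) { false })                            # => [:wait, [2]]
```

A branch that `merge_branches` has already removed from the registry never reaches this function. That is how the third acceptance criterion is meant to hold.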
1182
+
1183
+ - [ ] **Step 4: Run to verify they pass**
1184
+
1185
+ Run: `bundle exec ruby -I test test/automerge_test.rb`
1186
+ Expected: PASS.
1187
+
1188
+ - [ ] **Step 5: Run the full suite**
1189
+
1190
+ Run: `bundle exec rake test`
1191
+ Expected: all green (confirm KitchenSink etc. — which open no branches — are unaffected; `enforce_branch_joins!` returns early when `@open_branches` is empty).
1192
+
1193
+ - [ ] **Step 6: Commit**
1194
+
1195
+ ```bash
1196
+ git add lib/chrono_forge/executor/methods/workflow_states.rb test/automerge_test.rb test/internal/app/jobs/unmerged_branch_workflow.rb
1197
+ git commit -m "feat(branches): completion gate — automerge + raise on unmerged"
1198
+ ```
1199
+
1200
+ ```json:metadata
1201
+ {"files": ["lib/chrono_forge/executor/methods/workflow_states.rb", "test/automerge_test.rb", "test/internal/app/jobs/unmerged_branch_workflow.rb"], "verifyCommand": "bundle exec ruby -I test test/automerge_test.rb", "acceptanceCriteria": ["automerge blocks completion until children done", "unmerged branch raises", "already-merged branch is a no-op at gate"], "requiresUserVerification": false}
1202
+ ```
1203
+
1204
+ ---
1205
+
1206
+ ### Task 8: Crash-recovery (cursor resume) + scale/perf regression tests
1207
+
1208
+ **Goal:** Prove dispatch resumes from the cursor after a mid-dispatch crash (no duplicate children, bounded rework), and that dispatch is `⌈N/of⌉` inserts (not N) with constant per-item work.
1209
+
1210
+ **Files:**
1211
+ - Create: `test/branch_recovery_test.rb`
1212
+ - Create: `test/branch_scale_test.rb`
1213
+
1214
+ **Acceptance Criteria:**
1215
+ - [ ] A crash after chunk *k* leaves `metadata.cursors[name]` at that point; the resumed run continues from it, ends with exactly N children, no duplicate keys.
1216
+ - [ ] Dispatching N children with batch size `of` issues `⌈N/of⌉` bulk `INSERT INTO chrono_forge_workflows` statements for the children (not N).
1217
+
1218
+ **Verify:** `bundle exec ruby -I test test/branch_recovery_test.rb test/branch_scale_test.rb` → PASS
1219
+
1220
+ **Steps:**
1221
+
1222
+ - [ ] **Step 1: Write the scale test**
1223
+
1224
+ Create `test/branch_scale_test.rb`:
1225
+
1226
+ ```ruby
1227
+ require "test_helper"
1228
+
1229
+ class BranchScaleTest < ActiveJob::TestCase
1230
+ def setup
1231
+ User.delete_all
1232
+ 25.times { |i| User.create!(name: "u#{i}", email: "u#{i}@e.com") }
1233
+ end
1234
+
1235
+ def test_dispatch_uses_bulk_inserts_not_one_per_child
1236
+ inserts = 0
1237
+ pattern = /INSERT INTO ["`]?chrono_forge_workflows/i
1238
+ sub = ActiveSupport::Notifications.subscribe("sql.active_record") do |*a|
1239
+ inserts += 1 if pattern.match?(a.last[:sql].to_s)
1240
+ end
1241
+ # of: 10 over 25 users => ceil(25/10) = 3 insert_all statements for the children,
+ # plus one INSERT for the parent's own workflow row, created when it first runs.
1242
+ SpawnEachWorkflow.perform_later("scale-1", of: 10)
1243
+ perform_all_jobs_before(1.second) # dispatch happens on the first pass
1244
+ ActiveSupport::Notifications.unsubscribe(sub)
1245
+
1246
+ branch_log = ChronoForge::Workflow.find_by(key: "scale-1").execution_logs.find_by(step_name: "branch$grp")
1247
+ assert_equal 25, ChronoForge::Workflow.where(parent_execution_log_id: branch_log.id).count
1248
+ assert_operator inserts, :<=, 1 + 3, "expected <= 1 parent insert + ceil(25/10) bulk inserts, got #{inserts}"
1249
+ end
1250
+ end
1251
+ ```
1252
+
1253
+ > Note: this asserts on **`insert_all`** (DB rows), which is always bulk. Do NOT assert bulk job *enqueue* — under the test adapter `perform_all_later` falls back to per-job enqueue.
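The expected count is integer ceiling division. The helper below is hypothetical and exists only to check the arithmetic; it gives 3 for the test's 25 items in chunks of 10.

```ruby
# Expected number of bulk child inserts for n items dispatched in chunks of `of`.
def bulk_insert_count(n, of)
  (n + of - 1) / of # integer ceiling division for n >= 0, of > 0
end

bulk_insert_count(25, 10) # => 3
bulk_insert_count(20, 10) # => 2
```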
1254
+
1255
+ - [ ] **Step 2: Write the recovery test**
1256
+
1257
+ Create `test/branch_recovery_test.rb`:
1258
+
1259
+ ```ruby
1260
+ require "test_helper"
1261
+
1262
+ class BranchRecoveryTest < ActiveJob::TestCase
1263
+ include ChaoticJob::Helpers
1264
+
1265
+ def setup
1266
+ User.delete_all
1267
+ 25.times { |i| User.create!(name: "u#{i}", email: "u#{i}@e.com") }
1268
+ end
1269
+
1270
+ def test_resumes_dispatch_from_cursor_after_glitch
1271
+ # Glitch once during dispatch; ChaoticJob re-runs the workflow, which must
1272
+ # resume spawn_each from the persisted cursor rather than restarting at 0.
1273
+ workflow = SpawnEachWorkflow.new("rec-1", of: 10)
1274
+ run_scenario(workflow, glitch: ["before", "#{ChronoForge::Executor::Methods::Branch.instance_method(:dispatch_children).source_location[0]}:#{dispatch_children_line}"])
1275
+
1276
+ branch_log = ChronoForge::Workflow.find_by(key: "rec-1").execution_logs.find_by(step_name: "branch$grp")
1277
+ children = ChronoForge::Workflow.where(parent_execution_log_id: branch_log.id)
1278
+ assert_equal 25, children.count, "exactly N children, no duplicates after resume"
1279
+ assert_equal 25, children.distinct.count(:key)
1280
+ end
1281
+
1282
+ private
1283
+
1284
+ # Resolve the line of the perform_all_later call inside dispatch_children so the
1285
+ # glitch lands mid-dispatch. Adjust if the method changes.
1286
+ def dispatch_children_line
1287
+ src = File.read(ChronoForge::Executor::Methods::Branch.instance_method(:dispatch_children).source_location[0])
1288
+ idx = src.lines.index { |l| l.include?("perform_all_later") }
+ raise "perform_all_later not found; update the glitch target" unless idx
+ idx + 1
1289
+ end
1290
+ end
1291
+ ```
1292
+
1293
+ > If targeting an exact glitch line proves brittle, an acceptable alternative is to call `SpawnEachWorkflow.perform_later` twice in a row with the same key (simulating a re-run) after manually truncating `metadata.cursors` to a mid-point, and assert the final child set is exactly N with no duplicates. Either approach proves cursor-resume idempotency.
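The alternative yields exactly N children only if re-dispatching an already-inserted key is a no-op. The sketch models that with a `Set` of keys. It assumes, without confirmation here, that the child `key` column is unique and that the bulk insert skips conflicting rows. It replays dispatch from a cursor truncated to 10 and still ends with 25 distinct keys. `dispatch_from` and the `wf$g$i` key prefix are illustrative.

```ruby
require "set"

# Hypothetical model: children have a unique key, and inserting an existing key
# is skipped (ON CONFLICT DO NOTHING style). An assumption, not ChronoForge's schema.
def dispatch_from(children, items, cursor_n, of:, prefix:)
  n = cursor_n
  items.drop(n).each_slice(of) do |slice|
    slice.each do
      children << "#{prefix}_#{n}" # Set#<< ignores a key already present
      n += 1
    end
  end
  n
end

items = (1..25).to_a
children = Set.new
dispatch_from(children, items, 0, of: 10, prefix: "wf$g$i")  # first, complete run
dispatch_from(children, items, 10, of: 10, prefix: "wf$g$i") # re-run from a cursor truncated to 10
children.size # => 25
```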
1294
+
1295
+ - [ ] **Step 3: Run both to verify they pass**
1296
+
1297
+ Run: `bundle exec ruby -I test test/branch_scale_test.rb test/branch_recovery_test.rb`
1298
+ Expected: PASS (these exercise Task 3/4 code; if they fail, fix the dispatch/cursor logic, not the tests).
1299
+
1300
+ - [ ] **Step 4: Commit**
1301
+
1302
+ ```bash
1303
+ git add test/branch_recovery_test.rb test/branch_scale_test.rb
1304
+ git commit -m "test(branches): cursor-resume recovery + bulk-dispatch scale guards"
1305
+ ```
1306
+
1307
+ ```json:metadata
1308
+ {"files": ["test/branch_recovery_test.rb", "test/branch_scale_test.rb"], "verifyCommand": "bundle exec ruby -I test test/branch_scale_test.rb test/branch_recovery_test.rb", "acceptanceCriteria": ["cursor resume: exactly N children no dupes", "dispatch uses ceil(N/of) bulk inserts"], "requiresUserVerification": false}
1309
+ ```
1310
+
1311
+ ---
1312
+
1313
+ ### Task 9: Dependency floor + README
1314
+
1315
+ **Goal:** Pin `activejob >= 7.1` (required for `perform_all_later`) and document the feature with the load-bearing caveats.
1316
+
1317
+ **Files:**
1318
+ - Modify: `chrono_forge.gemspec`
1319
+ - Modify: `README.md`
1320
+
1321
+ **Acceptance Criteria:**
1322
+ - [ ] `chrono_forge.gemspec` requires `activejob >= 7.1`.
1323
+ - [ ] README has a "Branches" section documenting `branch`/`spawn`/`spawn_each`/`merge_branches`/`automerge` and the three caveats (every branch must be joined; parent not replayed per poll; source must be stable during dispatch).
1324
+
1325
+ **Verify:** `bundle exec ruby -e "require 'rubygems'; spec = Gem::Specification.load('chrono_forge.gemspec'); dep = spec.dependencies.find { |d| d.name == 'activejob' }; abort('no floor') unless dep.requirement.satisfied_by?(Gem::Version.new('7.1')) && !dep.requirement.satisfied_by?(Gem::Version.new('7.0')); puts 'ok'"` → `ok`
1326
+
1327
+ **Steps:**
1328
+
1329
+ - [ ] **Step 1: Pin the dependency**
1330
+
1331
+ In `chrono_forge.gemspec`, change:
1332
+
1333
+ ```ruby
1334
+ spec.add_dependency "activejob"
1335
+ ```
1336
+
1337
+ to:
1338
+
1339
+ ```ruby
1340
+ spec.add_dependency "activejob", ">= 7.1"
1341
+ ```
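RubyGems can evaluate the new floor directly. The Verify command below performs the same check through `Gem::Specification.load`; here it is reduced to the requirement object.

```ruby
require "rubygems"

# The floor the gemspec declares, checked with RubyGems' own requirement matcher.
floor = Gem::Requirement.new(">= 7.1")

floor.satisfied_by?(Gem::Version.new("7.1.0")) # => true
floor.satisfied_by?(Gem::Version.new("7.0.8")) # => false
floor.satisfied_by?(Gem::Version.new("8.0.2")) # => true
```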
1342
+
1343
+ - [ ] **Step 2: Document in README**
1344
+
1345
+ Add a "Branches: parallel sub-workflows" section to `README.md` with a worked example (the `branch :fulfillment, automerge: true do … end` + `merge_branches` example from the spec's Goal section) and a "Caveats" callout covering, in substance:
1346
+ - Every branch must be merged or `automerge: true`, else `UnmergedBranchError`.
1347
+ - The heavy parent is not replayed per poll — a lightweight `BranchMergeJob` does the waiting.
1348
+ - The source must be stable during a branch's dispatch window (items keyed `name_{index}` by position; `error_on_ignore: true` catches ordering, not insertion).
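The third caveat is clearest on the offset-cursor (non-ActiveRecord) path, where keys come from position alone. In this hypothetical model, an item inserted ahead of the cursor between chunks shifts every later index. As a result, `bob` is dispatched twice and `aaron` is never dispatched. `keyed` and the `grp$i_` prefix are illustrative.

```ruby
# Hypothetical model of positional keying: key = "grp$i_<index in source>".
def keyed(items, from)
  items.each_with_index.drop(from).map { |item, i| ["grp$i_#{i}", item] }
end

before = %w[ann bob cat dan]
first_chunk = keyed(before, 0).first(2) # dispatched; cursor is now n = 2

after = %w[aaron ann bob cat dan]       # "aaron" appears ahead of the cursor mid-dispatch
second_chunk = keyed(after, 2)          # resume at n = 2

first_chunk  # => [["grp$i_0", "ann"], ["grp$i_1", "bob"]]
second_chunk # => [["grp$i_2", "bob"], ["grp$i_3", "cat"], ["grp$i_4", "dan"]]
```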
1349
+
1350
+ - [ ] **Step 3: Verify the gemspec floor**
1351
+
1352
+ Run the Verify command above. Expected: `ok`.
1353
+
1354
+ - [ ] **Step 4: Run the full suite one last time**
1355
+
1356
+ Run: `bundle exec rake test`
1357
+ Expected: all green.
1358
+
1359
+ - [ ] **Step 5: Commit**
1360
+
1361
+ ```bash
1362
+ git add chrono_forge.gemspec README.md
1363
+ git commit -m "feat(branches): require activejob >= 7.1; document branches"
1364
+ ```
1365
+
1366
+ ```json:metadata
1367
+ {"files": ["chrono_forge.gemspec", "README.md"], "verifyCommand": "bundle exec rake test", "acceptanceCriteria": ["activejob >= 7.1 floor", "README branches section + caveats"], "requiresUserVerification": false}
1368
+ ```
1369
+
1370
+ ---
1371
+
1372
+ ## Notes for the implementer
1373
+
1374
+ - **Zeitwerk loading:** new files under `lib/chrono_forge/` autoload by namespace — `branch_merge_job.rb` → `ChronoForge::BranchMergeJob`; `executor/methods/branch.rb` → `ChronoForge::Executor::Methods::Branch`. No manual `require`. The only wiring is the `include`s in `executor/methods.rb`.
1375
+ - **Helper visibility:** `branch.rb`, `merge_branches.rb`, and `workflow_states.rb` are all mixed into the same `Executor` instance, so their private helpers (`dispatch_children`, `branches_done?`, `enqueue_branch_merge_job`, `current_branch!`) call each other freely.
1376
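+ - **`insert_all` + JSON:** pass `kwargs`/`options`/`context` as Ruby hashes; Rails casts them to the json/jsonb columns. On Rails 7.0+, `insert_all` fills `created_at`/`updated_at` itself (`record_timestamps:` defaults to true), but only for keys the row hash leaves out. Setting them explicitly in `dispatch_children` is therefore harmless and keeps the rows independent of that option.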
1377
+ - **Child execution on first run:** `dispatch_children` pre-inserts the child row (with `kwargs`), then enqueues `klass.new(child_key, **kwargs)`. When the child job runs, the executor's `setup_workflow!` finds the pre-inserted row and uses its stored `kwargs` — the job-arg kwargs are redundant but harmless.
1378
+ - **Run order:** Tasks 1→2→3→4 are strictly sequential; 5 depends on 2; 6 depends on 5 and 3; 7 depends on 6 and 3/4; 8 depends on 7; 9 is independent. The `MergeBranches` module must exist (even empty) once Task 3 wires the include — create the file in Task 3, flesh it out in Task 6.
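The run-order bullet can be written as a dependency map and checked with a simple topological sort. The map reads "7 depends on 6 and 3/4" as depending on both 3 and 4, which is an interpretation. `DEPS` and `task_order` are illustrative names.

```ruby
# Run-order note as a dependency map ("3/4" read as both 3 and 4).
DEPS = {
  1 => [], 2 => [1], 3 => [2], 4 => [3], 5 => [2],
  6 => [5, 3], 7 => [6, 3, 4], 8 => [7], 9 => []
}.freeze

# Topological sort: repeatedly take the lowest-numbered task whose deps are all done.
def task_order(deps)
  done = []
  remaining = deps.keys.sort
  until remaining.empty?
    ready = remaining.find { |t| deps[t].all? { |d| done.include?(d) } }
    raise ArgumentError, "dependency cycle among #{remaining.inspect}" unless ready
    done << ready
    remaining.delete(ready)
  end
  done
end

order = task_order(DEPS) # => [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Task 9 has no dependencies, so the sort places it last only because it picks the lowest-numbered ready task; it could run at any point.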