@exaudeus/workrail 3.38.0 → 3.39.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,745 +1,162 @@
1
- # Coordinator Script Architecture Discovery
1
+ # PR Review Coordinator Script: Design Candidates
2
2
 
3
- **Date:** 2026-04-18
4
- **Discovery path:** design_first (goal was a solution statement; risk is solving the wrong abstraction)
5
- **Status:** In progress
3
+ *Discovery run: 2026-04-18. Three runs completed. Design settled.*
6
4
 
7
5
  ---
8
6
 
9
- ## Artifact Strategy
7
+ ## Problem Understanding
10
8
 
11
- This document is a **human-readable discovery report** for Etienne and future pipeline authors. It is NOT workflow execution memory -- that lives in WorkRail step notes and context variables. If this file and the WorkRail session disagree, the session notes are authoritative.
9
+ ### Core Tensions
12
10
 
13
- **What this doc is for:**
14
- - Architecture recommendation with rationale
15
- - Key design decisions and tradeoffs
16
- - Open questions and risks
17
- - Reference for implementation
11
+ 1. **Parseability vs. output format:** The `mr-review-workflow` final step (`phase-6-final-handoff`) produces free-form markdown designed for human readers. The coordinator needs machine-parseable output. Adding `## COORDINATOR_OUTPUT` to the workflow prompt would make parsing reliable, but changes a workflow other users may run standalone. Decision: two-tier parser in coordinator only; update the workflow prompt as a separate follow-up.
18
12
 
19
- **What this doc is not for:**
20
- - Tracking workflow execution state
21
- - Storing session continue tokens or checkpoint data
13
+ 2. **HTTP API vs. CLI subprocess for dispatch:** Using CLI subprocess (`execFile('worktrain', ['spawn', ...])`) is simpler (no port discovery logic), but loses context passing and adds subprocess overhead. Using HTTP directly (`POST /api/v2/auto/dispatch`) requires port discovery (same logic as `worktrain-spawn.ts`), but enables the `context` field. Decision: HTTP direct, copy port discovery pattern.
22
14
 
23
- ---
24
-
25
- ## Context / Ask
15
+ 3. **Coordinator as CLI script vs. daemon workflow:** Could run the coordinator as a WorkRail workflow, getting durability. But that adds circular dependency (WorkRail spawning WorkRail sessions from inside a WorkRail session). The backlog explicitly says "scripts-first coordinator" -- deterministic TypeScript logic, not LLM orchestration. Decision: standalone CLI script.
26
16
 
27
- **Stated goal:** Design the first coordinator script template -- the script that drives a multi-phase WorkRail pipeline using worktrain spawn/await.
17
+ 4. **Fix-agent loop termination:** Max 3 passes is the rule, but what if pass 2 comes back minor again? Need another review pass. The tension: another review = another 15-minute wait. Solution: enforce max 3 passes strictly via counter, track in coordinator state per PR.
28
18
 
29
- **Reframed problem (solution-bias stripped):** How should multi-phase WorkRail pipelines be orchestrated -- what is the right abstraction layer, locus of control, and failure model for driving sequential and parallel child sessions?
19
+ ### What Makes This Hard
30
20
 
31
- **Goal was a solution statement.** The original framing assumed: (a) a script is the right locus, (b) worktrain spawn/await are the right primitives, (c) a reusable template is achievable. These assumptions are challenged below.
21
+ 1. **Notes extraction requires 2 sequential HTTP calls:** `GET /api/v2/sessions/:id` must succeed and return a valid `preferredTipNodeId` before the node detail call. If either fails, coordinator must treat as `unknown` severity (conservative, escalate).
32
22
 
33
- ---
23
+ 2. **Fix agent loop state management:** Need to track per-PR: pass count, current handle, previous findings. This is mutable state, which conflicts with the immutability preference. Resolution: keep loop counter local to the per-PR processing function, not exposed as shared state.
34
24
 
35
- ## Path Recommendation
25
+ 3. **`worktrain await` does NOT return session notes:** `await` returns only `{ results: [...SessionResult], allSucceeded }`. To get what the agent actually found, a separate 2-call HTTP sequence is needed after `await` returns.
36
26
 
37
- **design_first** -- The dominant risk is shaping the wrong abstraction. Two viable architectures exist (coordinator script vs. native WorkRail workflow with spawn_agent). Without clarifying the real problem first, we risk building the wrong thing at the wrong layer.
27
+ 4. **Keyword scan ambiguity:** The mr-review markdown may use ambiguous language (e.g., "minor architectural blocking concern"). Conservative default: `unknown` -> blocking always wins over minor.
38
28
 
39
- Rationale against `landscape_first`: the landscape is already well-understood from code reading. The risk is not ignorance of options, it is premature commitment to the script model without examining the native workflow alternative.
29
+ ### Likely Seam
40
30
 
41
- Rationale against `full_spectrum`: the reframing is already complete from Step 1. The uncertainty is architectural (which layer owns coordination), not conceptual (what is coordination).
31
+ The real seam is `CoordinatorDeps` -- all HTTP calls, CLI calls (`gh`, `git`), and stderr output sit behind this interface. The coordinator core is pure TypeScript with no side effects except through deps.
42
32
 
43
33
  ---
44
34
 
45
- ## Constraints / Anti-goals
35
+ ## Philosophy Constraints
46
36
 
47
- **Core constraints:**
48
- - Zero LLM cost for coordination routing decisions (scripts, not LLM reasoning, for deterministic logic)
49
- - Coordinator must be observable: console DAG must show parent-child session tree
50
- - Coordinator must be testable without a live daemon (mockable spawn/await primitives)
51
- - Must not require engine changes to add a new pipeline
52
- - Must handle fan-out parallelism (spawn N child sessions, collect all N results)
37
+ Source: `/Users/etienneb/CLAUDE.md`
53
38
 
54
- **Anti-goals:**
55
- - Do NOT build a coordinator that uses an LLM to route between phases
56
- - Do NOT require modifications to trigger-router.ts or workflow-runner.ts for each new pipeline
57
- - Do NOT produce a template so generic it cannot express mr-review's loop-with-retry topology
58
- - Do NOT couple the coordinator's failure model to the daemon's Semaphore (no deadlock risk)
39
+ - **Immutability by default** -- coordinator state as read-only data structures; mutation only in explicit loop counters
40
+ - **Errors as data** -- `Result<T, E>` return types from `parseFindingsFromNotes()`, `getAgentResult()` -- no throws
41
+ - **Validate at boundaries** -- validate port, workspace path at CoordinatorDeps wiring time (CLI entry point), trust internally
42
+ - **DI for I/O** -- all fetch, execFile, stderr injected via CoordinatorDeps; no direct imports in coordinator core
43
+ - **Explicit domain types** -- `ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown'` not plain string
44
+ - **Exhaustiveness everywhere** -- switch on `ReviewSeverity` must be exhaustive
45
+ - **YAGNI with discipline** -- build this coordinator, not a coordinator framework
59
46
 
60
- ---
61
-
62
- ## Key Facts From Code Reading
63
-
64
- ### worktrain spawn (worktrain-spawn.ts)
65
- - Flags: `--workflow <id>`, `--goal <text>`, `--workspace <path>`, `[--port <n>]`
66
- - HTTP POST to `/api/v2/auto/dispatch` with `{ workflowId, goal, workspacePath }`
67
- - Output to stdout: session handle (string, e.g. `sess_abc123`)
68
- - Output to stderr: progress/errors
69
- - Return: CliResult success | failure | misuse
70
- - **Does NOT pass context variables** -- only workflowId, goal, workspacePath
71
-
72
- ### worktrain await (worktrain-await.ts)
73
- - Flags: `--sessions <h1,h2,...>`, `[--mode all|any]`, `[--timeout 30m]`
74
- - Polls GET `/api/v2/sessions/:sessionId` every 3 seconds
75
- - Terminal statuses: `complete`, `complete_with_gaps`, `blocked`, `dormant`
76
- - Output to stdout: JSON `{ results: [{ handle, outcome, status, durationMs }], allSucceeded }`
77
- - **CRITICAL GAP: No step notes, no findings, no structured artifacts returned**
78
- - Exit code 0 if all succeeded, 1 if any failed/timed out
79
-
80
- ### spawn_agent tool (workflow-runner.ts L1415)
81
- - Available inside workflow steps (not as a CLI command)
82
- - Blocking: parent AgentLoop pauses inside execute() until child completes
83
- - Returns: `{ childSessionId, outcome: 'success'|'error'|'timeout', notes: string }`
84
- - Depth-limited: default max depth 3
85
- - **Returns last step notes from child** -- actionable content available immediately
86
- - **Serial only**: cannot fan out N children in parallel (each call blocks)
87
- - Parent session's maxSessionMinutes keeps ticking while child runs
88
-
89
- ### trigger-router.ts dispatch()
90
- - Fire-and-forget via KeyedAsyncQueue
91
- - Returns immediately (202 pattern)
92
- - Uses global Semaphore (max 3 concurrent by default)
93
- - **Why spawn_agent cannot use dispatch()**: dispatch is fire-and-forget; calling it inside a running session would lose the result. Direct runWorkflow() call is used instead.
94
-
95
- ### classify-task-workflow.json
96
- - Single-step, fast, no tools, no subagents
97
- - Outputs: taskComplexity, riskLevel, hasUI, touchesArchitecture, taskType, affectedDomains, recommendedPipeline
98
- - `recommendedPipeline` is an ordered array of workflow IDs
99
- - Notes are the output channel -- no structured context variables emitted
47
+ **No conflicts** between stated philosophy and repo patterns.
100
48
 
101
49
  ---
102
50
 
103
- ## Critical Gap: worktrain await Does Not Return Session Content
104
-
105
- The backlog pseudocode (backlog.md L1793-1795) shows:
106
- ```
107
- 3. Calls `await_sessions(handles)` → structured findings (script waits)
108
- 4. Parses the findings JSON block from each session's output (script)
109
- 5. Routes: clean → merge queue, minor → spawn fix agent, blocking → escalate
110
- ```
111
-
112
- But the real `worktrain await` output is `{ handle, outcome, status, durationMs }` -- no findings, no notes.
51
+ ## Impact Surface
113
52
 
114
- **To route on content, the coordinator must:**
115
- 1. `worktrain await --sessions h1,h2` -- wait for completion
116
- 2. For each completed session handle, call GET `/api/v2/sessions/:sessionId` -- retrieve step notes
117
- 3. Parse the structured block from notes
118
- 4. Route on parsed content
119
-
120
- This is a missing primitive: **worktrain notes <session-handle>** (or a --include-notes flag on worktrain await).
53
+ - `src/cli-worktrain.ts` -- adds `run pr-review` subcommand (minimal change, follows existing pattern)
54
+ - `src/cli/commands/index.ts` -- exports new command types
55
+ - `workflows/mr-review-workflow.agentic.v2.json` -- NOT changed in this PR; the coordinator's two-tier parser handles current output
56
+ - `POST /api/v2/auto/dispatch` -- used by coordinator via HTTP (no change to route)
57
+ - `GET /api/v2/sessions/:id` + `GET /api/v2/sessions/:id/nodes/:nodeId` -- read-only; no changes
121
58
 
122
59
  ---
123
60
 
124
- ## Two Candidate Architectures
125
-
126
- ### Candidate A: Coordinator Script (TypeScript/Shell)
127
-
128
- ```
129
- coordinator-mr-review.ts
130
- 1. spawn handles[] = worktrain spawn --workflow mr-review for each PR (parallel fire-and-forget)
131
- 2. await results = worktrain await --sessions h1,h2,h3
132
- 3. for each handle: GET /api/v2/sessions/:id -> parse step notes -> extract findings
133
- 4. route: clean -> merge queue, minor -> spawn fix agent, blocking -> escalate
134
- 5. await fix agents -> re-review loop (circuit breaker at 3)
135
- 6. merge clean PRs
136
- ```
137
-
138
- **Pros:**
139
- - True parallel fan-out (fire-and-forget spawn, batch await)
140
- - Deterministic routing (zero LLM cost for coordination)
141
- - Testable: mock fetch, mock worktrain CLI
142
- - Reusable: script is a standalone artifact others can copy
143
-
144
- **Cons:**
145
- - Invisible to WorkRail: coordinator script is not a session in the DAG
146
- - No session DAG for the coordinator itself (only child sessions are visible)
147
- - Requires a running process outside the daemon (shell script lifetime)
148
- - Must handle port discovery, daemon connectivity separately
149
- - **Missing primitive:** must query session notes separately after await
150
-
151
- ### Candidate B: WorkRail Workflow with spawn_agent Steps
152
-
153
- ```
154
- coordinator-mr-review-workflow.json
155
- Step 1: Gather PRs
156
- Step 2: For each PR, call spawn_agent(mr-review-workflow) -- SERIAL (one at a time)
157
- Step 3: Route based on outcome + notes from spawn_agent
158
- Step 4: spawn_agent(fix-workflow) if needed, re-review
159
- Step 5: Merge
160
- ```
161
-
162
- **Pros:**
163
- - Full WorkRail observability: coordinator IS a session in the DAG
164
- - Child sessions linked via parentSessionId
165
- - Console DAG shows the full tree
166
- - spawn_agent returns notes directly (no separate query needed)
167
- - Session state is durable (daemon crash recovery)
168
-
169
- **Cons:**
170
- - Serial only: cannot spawn N review sessions in parallel
171
- - Parent session time limit accumulates across all child runs
172
- - At depth limit 3 (default), nested spawn_agent chains are constrained
173
- - Coordinator logic is in workflow JSON prompt blocks, not testable TypeScript
174
-
175
- ### Candidate C: Hybrid -- Script Coordinator with Session Registration
176
-
177
- A TypeScript coordinator script that:
178
- - Calls worktrain spawn (parallel) + worktrain await (batch)
179
- - Registers itself as a coordinator session with a workflowId (so it appears in the DAG)
180
- - After await, queries session notes via HTTP, routes on content
181
- - Reports phase transitions back to the daemon as structured events
182
-
183
- **Pros:** Parallel fan-out + DAG visibility + testable TypeScript
184
- **Cons:** Requires new engine primitive (coordinator session registration) -- not yet built
185
-
186
- ---
187
-
188
- ## Landscape Packet
189
-
190
- ### Current State Summary
191
-
192
- Two orchestration primitives exist today (both shipped, neither battle-tested):
193
-
194
- | Primitive | Layer | Parallelism | Returns Content | Observable in DAG |
195
- |-----------|-------|-------------|-----------------|-------------------|
196
- | `worktrain spawn` + `worktrain await` | CLI / HTTP | Yes (fire N, await all) | No (outcome + status only) | No (script is invisible) |
197
- | `spawn_agent` tool | Engine / workflow step | No (blocking, serial) | Yes (notes returned inline) | Yes (parentSessionId in store) |
198
-
199
- Neither primitive is complete for the target use case:
200
- - Script model: parallel but content-blind
201
- - Native model: content-aware but serial
202
-
203
- ### Existing Workflows Available as Targets
204
- - `coding-task-workflow-agentic` (lean v2)
205
- - `mr-review-workflow.agentic.v2`
206
- - `routine-context-gathering`, `routine-hypothesis-challenge`, `routine-philosophy-alignment`
207
- - `ui-ux-design-workflow`, `production-readiness-audit`, `architecture-scalability-audit`
208
- - `bug-investigation.agentic.v2`, `wr.discovery`
209
- - `classify-task-workflow` (new, single-step, outputs recommendedPipeline array)
210
-
211
- ### Engineering State (git log context)
212
- - `spawn_agent` shipped in commit `4254feb7` (feat: in-process child session delegation)
213
- - `worktrain spawn` / `worktrain await` -- Tier 3 in Apr 18 grooming: "already merged, needs real-world test"
214
- - `classify-task-workflow` exists but not yet wired into any coordinator
215
- - `parentSessionId` is in session store; console tree view is the next planned feature
216
-
217
- ### Hard Constraints From Code
218
- 1. `dispatch()` in TriggerRouter is fire-and-forget + Semaphore-gated. Calling from inside a running session deadlocks. This is why `spawn_agent` uses direct `runWorkflow()` call, not `dispatch()`.
219
- 2. `worktrain await` stdout schema is `AwaitResult = { results: [{ handle, outcome, status, durationMs }], allSucceeded }`. No notes.
220
- 3. `worktrain spawn` CLI: `{ workflowId, goal, workspacePath }` only. No context variable passing.
221
- 4. `spawn_agent` depth limit: default max 3. Root (0) → child (1) → grandchild (2) → blocked at 3.
222
- 5. `spawn_agent` blocks the parent AgentLoop's execute() method. The parent cannot do other work while child runs.
223
-
224
- ### Obvious Contradictions
225
-
226
- **C1: Backlog assumes findings from await, but CLI returns none.**
227
- Backlog pseudocode (backlog.md L1793): `await_sessions(handles) → structured findings`. Real CLI returns `{ outcome, status, durationMs }` only. Every coordinator that routes on content has an undocumented extra HTTP step.
228
-
229
- **C2: Backlog envisions parallel fan-out, but spawn_agent is serial.**
230
- Backlog (backlog.md L2190): `await_sessions({ handles: [...], mode: 'all' })` implies a non-blocking spawn primitive. But `spawn_agent` (the native tool) blocks. The CLI (`worktrain spawn` / `worktrain await`) can do parallel, but returns no content.
231
-
232
- **C3: classify-task output is in notes, not in context variables.**
233
- The classify-task workflow outputs via step notes (a markdown block), not via WorkRail context variables. A coordinator that reads classify output must parse the notes string -- there is no structured `context.taskComplexity` to read directly.
234
-
235
- ### Evidence Gaps
236
- - **Gap 1:** Whether GET /api/v2/sessions/:id currently returns full step notes in the runs array (the await code reads `runs[0].status` but not notes -- unclear if notes are included in the response body)
237
- - **Gap 2:** Whether a `--context` flag for `worktrain spawn` is planned (needed to pass classify output to coding-task session)
238
- - **Gap 3:** Whether non-blocking spawn_agent (fire + await_all) is on the roadmap (would resolve C2)
239
-
240
- ---
241
-
242
- ## Problem Frame Packet (Deep)
243
-
244
- ### Stakeholders
245
-
246
- **Primary user: Etienne (WorkTrain builder / first pipeline author)**
247
- - Job: Wire up an autonomous pipeline that runs the full develop-review-fix-merge cycle without manual coordination
248
- - Outcome: Spend Monday morning reviewing Slack, not manually driving 8 agent sessions in sequence
249
- - Pain: Today every handoff (review complete -> spawn fix agent -> re-review) requires Etienne to be online, read the findings, and manually kick the next step
250
- - Constraint: Must be able to reason about what went wrong when a pipeline fails at 2am
61
+ ## Candidates
251
62
 
252
- **Secondary user: Future WorkTrain pipeline authors (teams adopting WorkTrain)**
253
- - Job: Add their own pipelines (onboarding pipeline, data migration pipeline, etc.)
254
- - Outcome: Write a new pipeline by copying a template and changing workflow IDs and routing rules
255
- - Pain: If the coordinator pattern is hard to understand or extend, each team rewrites it from scratch
256
- - Constraint: Cannot be expected to understand the WorkRail engine internals
63
+ ### Candidate A: Minimal CLI Script (subprocess model)
257
64
 
258
- ### Jobs / Desired Outcomes
65
+ **Summary:** `worktrain run pr-review` as a thin TypeScript wrapper shelling out to `worktrain spawn` and `worktrain await` CLIs via `execFile`, parsing stdout manually, calling `gh` directly.
259
66
 
260
- 1. **Full autonomy:** A trigger fires, the pipeline runs, a Slack message arrives with outcome. No human in the loop.
261
- 2. **Debuggability:** When something goes wrong, Etienne can trace exactly what each phase did and why the coordinator made each routing decision.
262
- 3. **Composability:** The pipeline is assembled from existing workflow primitives, not a monolithic LLM session.
263
- 4. **Parallelism:** Review N PRs simultaneously, not one at a time.
67
+ - **Tensions resolved:** Simplest possible change; reuses existing CLI contracts
68
+ - **Tensions accepted:** No context passing; subprocess overhead; harder to test
69
+ - **Boundary:** CoordinatorDeps wraps `execFile` for all subprocess calls
70
+ - **Failure mode:** `worktrain spawn` output format change breaks coordinator silently
71
+ - **Repo-pattern relationship:** Departs -- `delivery-action.ts` uses direct function calls, not subprocesses
72
+ - **Gain:** Minimal new code
73
+ - **Loss:** No context passing, no type safety on spawn/await results, poor testability
74
+ - **Scope:** Too narrow
75
+ - **Philosophy:** Violates 'prefer fakes over mocks' (exec calls hard to fake cleanly); violates 'errors as data'
264
76
 
265
- ### Tensions
77
+ ### Candidate B: HTTP-first with CoordinatorDeps Interface (RECOMMENDED)
266
78
 
267
- **T1: Observability vs. Parallelism**
268
- - Native `spawn_agent` (observable in DAG, serial) vs. script + worktrain CLI (parallel, invisible to DAG)
269
- - You cannot have both today. Console DAG tree = serial. Parallel fan-out = invisible coordinator.
79
+ **Summary:** `src/coordinators/pr-review.ts` with a `CoordinatorDeps` readonly interface. Core logic is pure functions. CLI wiring in `src/cli-worktrain.ts` provides real HTTP/CLI deps. Tests inject fakes.
270
80
 
271
- **T2: Content access vs. Simplicity**
272
- - Routing on findings requires 2-call HTTP sequence (session detail + node detail) after worktrain await
273
- - Simpler coordinator ignores findings and routes only on exit code (succeeded / failed) -- but this loses the clean/minor/blocking distinction
274
- - The backlog explicitly wants content-based routing; simplicity would sacrifice the main value proposition
275
-
276
- **T3: Template reuse vs. Pipeline specificity**
277
- - A generic pipeline runner (execute recommendedPipeline array from classify-task) is maximally reusable but cannot express mr-review's loop-with-retry
278
- - A specialized mr-review coordinator can express the full topology but is not reusable
279
- - The first coordinator template will set the pattern -- wrong abstraction level here propagates to all future pipelines
280
-
281
- **T4: Build now vs. Build right**
282
- - worktrain spawn/await are merged but untested (Tier 3 Apr 18 grooming)
283
- - spawn_agent just shipped (commit 4254feb7) and needs real-world validation
284
- - Building the coordinator NOW uses primitives that are still in "needs testing" state
285
- - Waiting for primitives to stabilize reduces rework risk but delays the autonomous pipeline
286
-
287
- **T5: Script model vs. Workflow model (the central architectural tension)**
288
- - Script: parallel, testable, zero LLM cost, but invisible to DAG and no native failure recovery
289
- - Workflow: observable, content-aware via spawn_agent, but serial and time-budget constrained
290
- - The backlog explicitly names coordinator scripts as the intended model -- but the code reality shows spawn_agent is the more capable primitive for content-based routing
291
-
292
- ### Success Criteria (observable)
293
-
294
- 1. A coordinator triggers from a cron or webhook, runs the mr-review pipeline for all open PRs, and posts a Slack summary -- without Etienne touching anything
295
- 2. When findings are "blocking", the coordinator spawns a fix agent and re-reviews (not just logs and exits)
296
- 3. When a child session fails, the coordinator's Slack summary names WHICH PR failed and WHY (the finding text)
297
- 4. The console DAG shows all child sessions linked to the coordinator as a tree (even if the coordinator itself is invisible as a script)
298
- 5. A second pipeline (e.g. implement-feature) can be added by writing a new 50-line TypeScript file and changing 3 workflow IDs
299
-
300
- ### Primary Framing Risk
301
-
302
- **If spawn_agent becomes non-blocking (fire + await_all) in the near term, the entire script-vs-workflow calculus inverts.**
303
-
304
- Currently the script model wins on parallelism (the only dimension where it beats native workflows). If spawn_agent gets a non-blocking mode with batch-await, native workflows become strictly better: same parallelism, plus DAG observability, plus notes available inline, plus no separate HTTP calls. In that scenario, building a coordinator script template today would be building the wrong abstraction -- the right abstraction would be a coordinator workflow JSON file.
305
-
306
- This is not generic. It is a specific condition (spawn_agent async mode shipping) that would make the current framing wrong. The decision hinges on whether to build the script now or wait for/build the async spawn_agent first.
307
-
308
- ## Open Questions
309
-
310
- 1. **Parallel fan-out with spawn_agent:** Is there a plan to make spawn_agent non-blocking (fire-and-forget + await_all)? If yes, Candidate B becomes viable for parallel pipelines.
311
- 2. **worktrain await --include-notes flag:** Is this planned? Without it, every coordinator routing on content needs a separate HTTP call.
312
- 3. **worktrain spawn --context flag:** The current CLI does not pass context variables to the spawned session. How does the coordinator pass classify-task output (taskComplexity, recommendedPipeline) to the coding-task session?
313
- 4. **Coordinator session registration:** Is there a plan for scripts to register as coordinator sessions so they appear in the DAG?
314
- 5. **Session notes API:** Is GET /api/v2/sessions/:id currently returning full step notes in the runs array? The await code reads `runs[0].status` but not notes.
315
-
316
- ---
317
-
318
- ## Problem Frame Packet
319
-
320
- **Primary uncertainty:** Whether the script model or the native workflow model is the right long-term abstraction -- given that spawn_agent is blocking+serial today, but the backlog explicitly describes async spawn + batch await as the desired primitive.
321
-
322
- **Known approaches:** Coordinator script (Candidate A), native workflow (Candidate B), hybrid (Candidate C).
323
-
324
- **Key stakeholders:** Anyone building a coordinator pipeline (today: Etienne; soon: teams using WorkTrain autonomously).
325
-
326
- ---
327
-
328
- ## Candidate Directions
329
-
330
- ### Candidate Generation Expectations
331
-
332
- This is a **design_first** pass with **THOROUGH** rigor. Requirements for the candidate set:
333
- 1. At least one candidate must meaningfully reframe the problem (not just package an obvious solution)
334
- 2. All candidates must address the central tension: observability vs. parallelism
335
- 3. All candidates must specify how they handle content-based routing (the 2-call HTTP gap or native notes)
336
- 4. One candidate must represent the "build now with current primitives" position (pragmatic)
337
- 5. One candidate must represent the "build the right primitive first" position (strategic)
338
- 6. The spread must not cluster -- candidates genuinely differ in abstraction level
339
-
340
- ---
341
-
342
- ### Candidate 1: TypeScript Coordinator Script (Minimal, Build Now)
343
-
344
- **One-sentence summary:** A standalone TypeScript file with DI-injected spawn/await/HTTP effects that drives the mr-review pipeline using today's worktrain spawn/await CLI plus a 2-call HTTP sequence to retrieve step notes for routing.
345
-
346
- **Concrete shape:**
81
+ **CoordinatorDeps interface:**
347
82
  ```typescript
348
- // coordinator-mr-review.ts
349
83
  interface CoordinatorDeps {
350
- readonly spawnSession: (workflowId: string, goal: string, workspace: string) => Promise<string>; // returns sessionHandle
351
- readonly awaitSessions: (handles: string[], mode: 'all'|'any', timeoutMs: number) => Promise<AwaitResult>;
352
- readonly getSessionNotes: (handle: string) => Promise<string | null>; // GET /api/v2/sessions/:id/nodes/:tipNodeId -> recapMarkdown
353
- readonly listOpenPRs: () => Promise<PullRequest[]>; // gh pr list --json
354
- readonly mergePR: (number: number) => Promise<void>;
355
- readonly postSlack: (message: string) => Promise<void>;
84
+ readonly spawnSession: (workflowId: string, goal: string, workspace: string) => Promise<string>; // sessionHandle
85
+ readonly awaitSessions: (handles: string[], timeoutMs?: number) => Promise<AwaitResult>;
86
+ readonly getAgentResult: (sessionHandle: string) => Promise<string | null>; // recapMarkdown
87
+ readonly listOpenPRs: (workspace: string) => Promise<PrSummary[]>;
88
+ readonly mergePR: (prNumber: number, workspace: string) => Promise<void>;
89
+ readonly postResult: (notes: string) => Promise<void>;
356
90
  readonly stderr: (line: string) => void;
357
- }
358
-
359
- type FindingsSeverity = 'clean' | 'minor' | 'blocking';
360
-
361
- async function runMrReviewPipeline(deps: CoordinatorDeps, workspace: string): Promise<CoordinatorResult>
362
- ```
363
-
364
- Notes are retrieved via: GET /api/v2/sessions/:id (get tip nodeId from runs[0].nodes[preferredTipNodeId]) then GET /api/v2/sessions/:id/nodes/:nodeId (get recapMarkdown). Parsed by a `parseFindings(recapMarkdown: string): FindingsSeverity` function that scans for known severity markers.
365
-
366
- **Tensions resolved:** T3 (specific topology: loop-with-retry), T2 (content access via explicit 2-call HTTP)
367
- **Tensions accepted:** T1 (no parallelism), T4 (build now, not right)
368
- **Wait -- parallelism:** This candidate CAN achieve parallel fan-out by calling `spawnSession` N times before calling `awaitSessions([h1, h2, h3, ...])`. The `awaitSessions` dep wraps `worktrain await` which polls all sessions concurrently. **This resolves T1.**
369
-
370
- **Boundary solved at:** TypeScript module boundary. All I/O injected via `CoordinatorDeps`. Testable with fake deps (no live daemon).
371
-
372
- **Failure mode to watch:** `parseFindings` is a string parser on LLM-generated markdown. If the mr-review workflow changes its notes format, the parser silently misclassifies. Must have a `'unknown'` severity fallback that defaults to 'blocking' (conservative).
373
-
374
- **Relation to existing patterns:** Directly follows `WorktrainSpawnCommandDeps` / `WorktrainAwaitCommandDeps` DI pattern. Same injectable interface shape.
375
-
376
- **Gains:** Ships today. Uses stable primitives. Fully testable. Parallel fan-out. Content-based routing.
377
- **Gives up:** Invisible to console DAG (coordinator is not a WorkRail session). No durable state (script crash = lost progress). Notes parsing is brittle.
378
-
379
- **Impact surface:** None -- coordinator is a standalone file. Does not require engine changes.
380
-
381
- **Scope:** Best-fit for "first coordinator template."
382
-
383
- **Philosophy honored:** Errors as data (CliResult pattern), Dependency injection, Exhaustiveness (FindingsSeverity union), Validate at boundaries (parseFindings validates at HTTP response boundary)
384
- **Philosophy tension:** Not fully deterministic if notes format varies (string parsing on LLM output)
385
-
386
- ---
387
-
388
- ### Candidate 2: Serial Coordinator Workflow (Native, Observable)
389
-
390
- **One-sentence summary:** A WorkRail workflow JSON file where each pipeline phase is a step that calls `spawn_agent` once, receives notes inline, and uses those notes to set context variables that the next step reads.
391
-
392
- **Concrete shape:**
393
- ```json
394
- {
395
- "id": "coordinator-mr-review",
396
- "steps": [
397
- {
398
- "id": "gather-prs",
399
- "procedure": ["Run gh pr list --json, set context.openPRs"]
400
- },
401
- {
402
- "id": "review-loop",
403
- "loopCondition": "context.openPRs.length > 0",
404
- "procedure": [
405
- "Pop one PR from context.openPRs",
406
- "Call spawn_agent(mr-review-workflow-agentic, goal: 'Review PR #N')",
407
- "Read spawn_agent result.notes",
408
- "If notes contain CLEAN: add to context.mergeQueue",
409
- "If notes contain MINOR: call spawn_agent(coding-task-workflow-agentic, 'Fix: <finding>')",
410
- "If notes contain BLOCKING: add to context.escalationList"
411
- ]
412
- },
413
- {
414
- "id": "merge-queue",
415
- "procedure": ["For each PR in mergeQueue: run git merge sequence"]
416
- }
417
- ]
418
- }
419
- ```
420
-
421
- Each `spawn_agent` call blocks until child completes, then returns `{ outcome, notes }`. Notes are available immediately -- no HTTP polling needed.
422
-
423
- **Tensions resolved:** T1 (fully observable in console DAG), T2 (notes available inline)
424
- **Tensions accepted:** T1 partial (serial review -- one PR at a time, not parallel), T4 (cannot build right now, needs workflow JSON authoring)
425
-
426
- **Boundary solved at:** WorkRail workflow step boundary. All coordination logic in workflow JSON prompt instructions + agent reasoning.
427
-
428
- **Failure mode to watch:** Parent session's maxSessionMinutes accumulates across all spawn_agent calls. Reviewing 10 PRs with a 30-minute child budget each requires the parent to have 300+ minutes. Time budget explosion is silent -- parent times out while children are running.
429
-
430
- **Relation to existing patterns:** Directly uses spawn_agent as designed. Follows the workflow-runner.ts spawn_agent pattern (blocking, errors as data, parentSessionId).
431
-
432
- **Gains:** Full DAG observability. Session state is durable (daemon restart recovers). Notes available inline, no separate HTTP call.
433
- **Gives up:** Serial reviews (10 PRs = 10x review time). Time budget scales linearly. Routing logic is in LLM-readable prompt, not testable TypeScript.
434
-
435
- **Impact surface:** None. A new workflow JSON file.
436
-
437
- **Scope:** Best-fit IF serial review is acceptable.
438
-
439
- **Philosophy honored:** Errors as data (spawn_agent returns outcome), Exhaustiveness (outcome enum), Observable state
440
- **Philosophy tension:** Routing logic is LLM prompt text, not typed domain logic. Coordinator philosophy says scripts, not LLM reasoning -- this violates the scripts-first principle.
441
-
442
- ---
443
-
444
- ### Candidate 3: Build Async spawn_agent First, Then Native Coordinator Workflow
445
-
446
- **One-sentence summary:** Extend spawn_agent with a non-blocking mode (`blocking: false`) that returns a `pendingHandle` immediately, add an `await_agents` tool that takes an array of pendingHandles and blocks until all complete, then build the coordinator as a WorkRail workflow that uses these new tools for parallel + observable + content-aware orchestration.
447
-
448
- **Concrete shape (new engine API):**
449
- ```typescript
450
- // New tool: spawn_agent with blocking: false
451
- spawn_agent({ workflowId, goal, workspacePath, blocking: false })
452
- → { pendingHandle: string } // returns immediately
453
-
454
- // New tool: await_agents
455
- await_agents({ handles: ['ph_abc', 'ph_def'], mode: 'all' })
456
- → [{ handle, childSessionId, outcome, notes }] // blocks until all complete
457
- ```
458
-
459
- The coordinator workflow then becomes:
460
- ```
461
- Step 1: Gather PRs (script/bash)
462
- Step 2: Spawn all review sessions in parallel (call spawn_agent with blocking:false for each PR)
463
- Step 3: Await all (call await_agents)
464
- Step 4: Route on notes (typed context variables set from parsed notes)
465
- Step 5: Spawn fix agents (blocking spawn_agent, one at a time)
466
- Step 6: Merge
467
- ```
468
-
469
- **Tensions resolved:** T1 (parallel + observable), T2 (notes inline from await_agents), T4 (builds the right primitive)
470
- **Tensions accepted:** T4 partial (does not ship today -- requires engine work)
471
-
472
- **Boundary solved at:** WorkRail engine boundary. New tools in workflow-runner.ts.
473
-
474
- **Failure mode to watch:** The non-blocking spawn pattern must not use `dispatch()` (deadlock risk via Semaphore + queue slot). The implementation must use the same `runWorkflow()` pattern as the blocking spawn_agent, but launch it as a concurrent Promise that is tracked in a coordinator-owned pending map.
475
-
476
- **Relation to existing patterns:** Extends spawn_agent in workflow-runner.ts (L1415). The blocking version already exists -- non-blocking is an additive extension. `pendingHandle` concept mirrors the CLI's sessionHandle.
477
-
478
- **Gains:** Resolves all 5 decision criteria. Parallel fan-out + observability + content routing + DAG tree + extensible. The coordinator workflow is the long-term correct abstraction.
479
- **Gives up:** Does not exist today. Requires engine PR before coordinator can be built. 2-4 week delay (estimate).
480
-
481
- **Impact surface:** workflow-runner.ts (new makeAwaitAgentsTool), workflow-runner.ts (extend makeSpawnAgentTool), possibly workflow-runner.ts executeWorkflowLoop for pending handle tracking.
482
-
483
- **Scope:** Too broad for "first coordinator template" alone, but correctly scoped for "right long-term architecture."
484
-
485
- **Philosophy honored:** All 5 decision criteria. Architectural fixes over patches. Make illegal states unrepresentable (pending handle as a typed domain type).
486
- **Philosophy tension:** YAGNI -- this builds a speculative primitive before any coordinator has validated the need in production.
487
-
488
- ---
489
-
490
- ### Candidate 4: Minimal Script + Structured Notes Contract (Pragmatic + Future-Proof)
491
-
492
- **One-sentence summary:** Build Candidate 1 (TypeScript coordinator script) but add a structured `## COORDINATOR_OUTPUT` JSON block to the mr-review workflow's final step notes as a first-class contract, making the coordinator's dependency on notes format explicit and versioned rather than fragile string parsing.
493
-
494
- **Concrete shape:**
495
-
496
- In `mr-review-workflow.agentic.v2.json` final step, add to `outputRequired`:
497
- ```json
498
- {
499
- "coordinatorOutput": "JSON block in exact format:\n```json\n{\"findings\": [{\"severity\": \"clean|minor|blocking\", \"summary\": \"...\", \"prNumber\": N}]}\n```"
500
- }
501
- ```
502
-
503
- In the coordinator script:
504
- ```typescript
505
- function parseCoordinatorOutput(notes: string): Result<CoordinatorOutput, ParseError> {
506
- const match = /```json\n([\s\S]+?)\n```/.exec(notes);
507
- if (!match) return err({ kind: 'missing_block' });
508
- return parseJson(match[1]).andThen(validateCoordinatorOutput);
91
+ readonly now: () => number;
92
+ readonly port: number;
509
93
  }
510
94
  ```
511
95
 
512
- The coordinator is now a typed consumer of a versioned contract, not a fragile markdown parser.
513
-
514
- **Tensions resolved:** T2 (typed content-based routing), T3 (mr-review topology with loop-with-retry), T1 partial (parallel via worktrain spawn + await)
515
- **Tensions accepted:** T1 (invisible to DAG), T4 (builds now at script layer)
96
+ **Pure functions:**
97
+ - `parseFindingsFromNotes(markdown: string | null): Result<ReviewFindings, string>` -- two-tier (JSON block first, keyword scan fallback)
98
+ - `classifySeverity(findings: ReviewFindings): ReviewSeverity`
99
+ - `buildFixGoal(prNumber: number, findings: ReviewFindings): string`
516
100
 
517
- **Boundary solved at:** Notes contract boundary (between workflow and coordinator). The contract is the seam.
101
+ - **Tensions resolved:** Context passing possible (HTTP direct), type safety, testability, all 5 robustness rules
102
+ - **Tensions accepted:** Slightly more code vs A; port discovery logic duplicated from spawn.ts (intentional)
103
+ - **Boundary:** `CoordinatorDeps` -- exactly the same pattern as `WorktrainSpawnCommandDeps`
104
+ - **Failure mode:** `recapMarkdown` is null -> treated as `unknown` -> escalate (conservative, correct)
105
+ - **Repo-pattern relationship:** Follows `WorktrainSpawnCommandDeps` pattern exactly; adapts `parseHandoffArtifact` two-tier parser
106
+ - **Gain:** Full type safety, testable pure core, context passing, matches existing architecture
107
+ - **Loss:** More code (but correct code)
108
+ - **Scope:** Best-fit -- 3 new files
109
+ - **Philosophy:** Honors all -- immutability, DI for I/O, errors as data, explicit domain types, validate at boundaries, prefer fakes over mocks
518
110
 
519
- **Failure mode to watch:** The coordinator output contract must be maintained in sync with the workflow JSON. If the workflow changes the JSON block format without updating the coordinator, parsing fails. Needs schema versioning or a shared type definition.
111
+ ### Candidate C: Generic Coordinator Framework + pr-review Instance
520
112
 
521
- **Relation to existing patterns:** The handoff artifact (delivery-action.ts `parseHandoffArtifact`) already does exactly this -- parses a structured JSON block from step notes. The coordinator contract is the same pattern at the coordination layer.
113
+ **Summary:** Build `src/coordinators/base.ts` with `CoordinatorDeps<TInput, TOutput>` generic and pipeline pattern, then implement pr-review as an instance.
522
114
 
523
- **Gains:** Parallel fan-out + typed routing + works today + extensible to other workflows. The notes contract is explicit rather than implicit.
524
- **Gives up:** Invisible to DAG. Notes contract couples coordinator to specific workflow version.
525
-
526
- **Impact surface:** mr-review workflow JSON (add coordinatorOutput to outputRequired), coordinator script (use typed parser), possibly a shared `coordinator-contract-types.ts` file.
527
-
528
- **Scope:** Best-fit. Pragmatic + the most defensible long-term.
529
-
530
- **Philosophy honored:** Validate at boundaries, Errors as data, Prefer explicit domain types (CoordinatorOutput > string parse), Exhaustiveness (FindingsSeverity discriminated union)
531
- **Philosophy conflict:** None significant.
115
+ - **Tensions resolved:** Extensibility for future coordinators
116
+ - **Tensions accepted:** Higher upfront complexity; forced generic mold may not fit next coordinator
117
+ - **Boundary:** Generic abstraction layer above CoordinatorDeps
118
+ - **Failure mode:** Framework abstraction doesn't fit next coordinator's shape
119
+ - **Repo-pattern relationship:** No existing coordinator framework to adapt; departs significantly
120
+ - **Gain:** Future reuse
121
+ - **Loss:** YAGNI violation -- second coordinator doesn't exist yet
122
+ - **Scope:** Too broad
123
+ - **Philosophy:** Violates YAGNI with discipline
532
124
 
533
125
  ---
534
126
 
535
127
  ## Comparison and Recommendation
536
128
 
537
- ### Tensions x Candidates Matrix
538
-
539
- | Criterion | C1 (Script, minimal) | C2 (Workflow, serial) | C3 (Async spawn_agent) | C4 (Script + contract) |
540
- |-----------|----------------------|----------------------|------------------------|------------------------|
541
- | Parallel fan-out | YES | NO | YES | YES |
542
- | Content-based routing | YES (fragile parse) | YES (inline) | YES (inline) | YES (typed contract) |
543
- | Structured failure data | YES | YES | YES | YES |
544
- | Console DAG tree | NO | YES | YES | NO |
545
- | New pipeline = new file | YES | YES | YES | YES |
546
- | Ships today | YES | YES | NO | YES |
547
- | Testable without daemon | YES | NO | NO | YES |
548
-
549
- ### Recommended Direction: Candidate 4 (Script + Structured Notes Contract)
129
+ **Recommendation: Candidate B**
550
130
 
551
- **Build a TypeScript coordinator script with DI-injected effects, parallel fan-out via worktrain spawn + await, and content-based routing against an explicit `## COORDINATOR_OUTPUT` JSON block added to the mr-review workflow's final step.**
131
+ Candidate B is the only option that:
132
+ 1. Follows the established DI interface pattern (`WorktrainSpawnCommandDeps`) exactly
133
+ 2. Enables the `getAgentResult` 2-call HTTP sequence in a testable way
134
+ 3. Produces pure functions for finding parsing and severity classification
135
+ 4. Enforces all 5 robustness rules with explicit typed state
136
+ 5. Honors all CLAUDE.md philosophy principles
552
137
 
553
- Rationale:
554
- 1. Only C1 and C4 achieve parallel fan-out + ship today + testable without daemon
555
- 2. C4 > C1 because typed contract (explicit domain type) vs. fragile regex (philosophy violation)
556
- 3. C2 loses on serial reviews (N*reviewTime) and routing logic in LLM prompts (scripts-first violation)
557
- 4. C3 is the correct long-term direction but does not exist (YAGNI until C4 validates the topology)
558
-
559
- **The coordinator output contract pattern is already proven in the codebase:** `parseHandoffArtifact` in `src/trigger/delivery-action.ts` does exactly this for the coding-task workflow. The coordinator contract is the same pattern at the coordination layer.
560
-
561
- ### What C4 Gives Up
562
-
563
- Console DAG visibility. The coordinator script is invisible -- not a WorkRail session. Mitigation: child sessions appear in the console as a flat list with parentSessionId links once that UI is built. Phase transitions logged to stderr with session handles.
564
-
565
- **C4 is a stepping stone, not a permanent decision.** Once async spawn_agent ships (C3 direction), the coordinator workflow will have DAG visibility + all current advantages. C4 validates the topology in production so that C3 is built on confirmed requirements.
566
-
567
- ### Self-Critique
568
-
569
- **Strongest counter-argument:** Notes contract coupling creates a maintenance dependency between the coordinator script and a specific version of the mr-review workflow. If the workflow format changes, the coordinator silently fails or throws a parse error. Candidate 2 avoids this entirely (the LLM reads whatever notes exist).
570
-
571
- **Pivot conditions:**
572
- 1. If notes contract adds too much friction across multiple coordinator scripts -> use structured context variables (workflow final step sets context.coordinatorOutput, coordinator reads it via GET session context) instead of notes block
573
- 2. If async spawn_agent is scheduled within 2 weeks -> consider waiting and going directly to C3
574
- 3. If DAG observability is critical for the first real pipeline debug -> accept C2's serial reviews to get the tree view
575
-
576
- ## Challenge Notes
577
-
578
- ### Adversarial Challenge of Candidate 4 (Leading Direction)
579
-
580
- **Challenge 1: The notes contract is a false solution to the wrong problem.**
581
-
582
- Candidate 4 adds a `## COORDINATOR_OUTPUT` JSON block to the mr-review workflow. But what happens when there are 10 different coordinator pipelines, each with a different output contract? You now have N contracts to maintain, each coupling a coordinator to a specific workflow version. The `parseHandoffArtifact` precedent is actually a warning, not a green light -- the handoff artifact has already caused fragility when workflow output formats drifted. Coordinator contracts multiply this surface area.
583
-
584
- **Resolution:** Valid concern, but the alternative (LLM prose routing in C2) has the same coupling problem in a less visible form -- the coordinator LLM must still understand what 'BLOCKING' means in free-form text. Typed contracts are preferable to implicit text conventions even if they require maintenance. Mitigation: a shared `coordinator-contract-types.ts` with a versioned schema, and a `validate_coordinator_output` step in the workflow's verify block that rejects outputs that don't match the schema.
585
-
586
- **Challenge 2: worktrain spawn does not pass context variables -- classify-task output cannot reach the coding-task session.**
587
-
588
- The mr-review pipeline does not need classify-task output. But the full implementation pipeline (implement-feature coordinator) requires passing `{ taskComplexity, recommendedPipeline }` from classify-task to the coding-task session. The current worktrain spawn CLI has no `--context` flag. A coordinator that needs to pass context must use HTTP directly: POST /api/v2/auto/dispatch with a `context` body field.
589
-
590
- **Resolution:** Confirmed gap. Add `passContext: (handle: string, context: Record<string, unknown>) => Promise<void>` to CoordinatorDeps, implemented via HTTP. This must be documented in the coordinator template as a required dep. It is not a blocker for the mr-review coordinator specifically (which does not need classify output), but IS a blocker for the implement-feature coordinator.
591
-
592
- **Challenge 3: The coordinator script is not durable. If the script process dies mid-pipeline, all coordination state is lost.**
593
-
594
- If the coordinator crashes after spawning 5 review sessions but before collecting their results, those sessions are orphaned in 'in_progress' state. The coordinator has no way to recover -- it cannot re-acquire the session handles it created. The script's state is entirely in-memory.
595
-
596
- **Resolution:** This is a real limitation with no clean fix in Candidate 4. Mitigation: write session handles to a state file (`~/.workrail/coordinator-state/{run-id}.json`) at each phase transition. On coordinator restart, read the state file and resume from the last checkpoint. This adds complexity but is not insurmountable. Alternatively: accept the limitation for the first coordinator and note it as a reason to invest in C3 (native workflow + daemon durability).
597
-
598
- **Challenge 4: 'Parallel fan-out' requires calling spawnSession N times before awaitSessions. But worktrain spawn is sequential at the CLI level -- each CLI invocation is a separate process.**
599
-
600
- Actually this is a non-issue. The coordinator script calls the `spawnSession` dep N times (N async HTTP calls in parallel via Promise.all), then calls `awaitSessions` once with all handles. The HTTP calls are non-blocking. True parallelism IS achievable in the TypeScript coordinator. This challenge fails.
601
-
602
- **Challenge 5: What if the first mr-review pipeline reveals that the topology is wrong?**
603
-
604
- If the first real pipeline run shows that the mr-review workflow needs to return richer data than the notes contract provides, the coordinator needs a contract change + workflow change + coordinator change. In C2 (native workflow), a format change is handled by changing the LLM prompt -- no parse layer to update. This makes C2 more flexible for early iteration.
605
-
606
- **Resolution:** Valid tradeoff. C4 is less flexible to format changes than C2. Mitigation: keep the coordinator output block minimal for v1 (severity + summary + prNumber only) and add fields incrementally. Schema versioning via a `version: 1` field in the JSON block lets the coordinator detect stale contract versions.
607
-
608
- ### Challenge Verdict
609
-
610
- **C4 holds up under challenge.** The three real concerns (contract maintenance, no context passing, no durability) are documented limitations with known mitigations, not blockers. The challenge confirms C4 as the right first coordinator design, with C3 (async spawn_agent + native workflow) as the explicit next investment after production validation.
611
-
612
- ---
613
-
614
- ## Resolution Notes
615
-
616
- **Selected Direction: Candidate 4** -- TypeScript coordinator script with DI-injected effects, parallel fan-out via worktrain spawn + await, and an explicit `## COORDINATOR_OUTPUT` JSON block in the mr-review workflow as the typed findings contract.
617
-
618
- **Runner-up: Candidate 3** -- async spawn_agent + native coordinator workflow. This is the correct long-term direction. It should be built after C4 validates the topology in production.
619
-
620
- **Why C4 over C2:** C2 is serial (N*reviewTime). C2 routing logic lives in LLM prompts (violates scripts-first principle). C2 is not testable without a live daemon.
621
-
622
- **Why C4 over C1:** C1 uses fragile regex on free-form notes. C4 uses a typed contract (`## COORDINATOR_OUTPUT` JSON block). The difference is the same as parseHandoffArtifact vs. ad-hoc string parsing.
623
-
624
- **Why C4 before C3:** YAGNI. Async spawn_agent does not exist. Build C4 first, run it in production, then use the validated topology to spec async spawn_agent properly.
138
+ The scope is correct: 3 new files, clear boundaries, no speculative abstractions.
625
139
 
626
140
  ---
627
141
 
628
- ## Decision Log
142
+ ## Self-Critique
629
143
 
630
- **Decision 1:** Path = design_first. Rationale: goal was a solution statement; two viable architectures existed; risk was premature commitment to wrong abstraction level.
144
+ **Strongest counter-argument against B:** Port discovery logic is duplicated from `worktrain-spawn.ts`. A clean-design purist would extract it to a shared util. However, the coordinator is a standalone script in `src/coordinators/`, not part of the daemon machinery. Coupling it to internal daemon utils would be the wrong dependency direction. The duplication is intentional and bounded.
631
145
 
632
- **Decision 2:** Selected Candidate 4 (TypeScript script + coordinator output contract). Rationale: parallel fan-out, typed routing, testable, ships today, extensible. C3 is the long-term direction but YAGNI until production validates the topology.
146
+ **What would tip toward A:** If context passing is never needed and the coordinator will always be a thin orchestrator. But the robustness rules (zombie detection, traceability JSON block) require typed interfaces, which Candidate A can't easily provide.
633
147
 
634
- **Decision 3:** Notes contract pattern confirmed by precedent (parseHandoffArtifact in delivery-action.ts). The coordinator output block is the same pattern at the coordination layer.
148
+ **What evidence would justify C:** A second coordinator (e.g., `groom-prs`, `security-audit`) that shares 50%+ of the shape. Currently speculative.
635
149
 
636
- **Decision 4:** Three documented limitations accepted as known tradeoffs: (a) coordinator invisible to console DAG, (b) no context passing from worktrain spawn CLI (use HTTP directly), (c) no durability on coordinator crash (state file mitigation optional for v1).
637
-
638
- **Decision 5:** C3 (async spawn_agent) named as next engine investment. Trigger: after first coordinator (C4) has run in production with real PRs.
150
+ **Invalidating assumption:** If `GET /api/v2/sessions/:id/nodes/:nodeId` consistently returns `recapMarkdown: null` for the final step (e.g., because `requireConfirmation: true` nodes store notes differently). Mitigation already built in: null -> unknown -> escalate. The coordinator won't crash, it escalates conservatively.
639
151
 
640
152
  ---
641
153
 
642
- ## Final Summary
643
-
644
- ### Recommended Architecture
645
-
646
- **Candidate 4: TypeScript Coordinator Script with Structured Notes Contract**
647
-
648
- Build a standalone TypeScript file (`coordinator-mr-review.ts`) with:
649
-
650
- 1. **CoordinatorDeps DI interface** -- all effects injectable (spawn, await, notes retrieval, PR list, merge, Slack):
651
- ```typescript
652
- interface CoordinatorDeps {
653
- readonly spawnSession: (workflowId: string, goal: string, workspace: string, context?: Record<string, unknown>) => Promise<string>;
654
- readonly awaitSessions: (handles: string[], timeoutMs: number) => Promise<AwaitResult>;
655
- readonly getAgentResult: (handle: string) => Promise<AgentResult>; // 2-call HTTP internally
656
- readonly listOpenPRs: () => Promise<PullRequest[]>;
657
- readonly mergePR: (number: number) => Promise<void>;
658
- readonly postSlack: (message: string) => Promise<void>;
659
- readonly stderr: (line: string) => void;
660
- }
661
- ```
662
-
663
- 2. **AgentResult bridge type** (mirrors future async spawn_agent result):
664
- ```typescript
665
- type AgentResult = { handle: string; childSessionId: string | null; outcome: SessionOutcome; notes: string | null };
666
- ```
667
-
668
- 3. **Two-tier findings parsing**:
669
- - Preferred: parse `## COORDINATOR_OUTPUT` JSON block from notes (typed contract)
670
- - Fallback: scan for BLOCKING/MINOR/CLEAN keywords in notes (graceful degradation)
671
- - Unknown severity defaults to 'blocking' (conservative)
672
-
673
- 4. **Parallel fan-out via Promise.all + worktrain await**:
674
- ```typescript
675
- const handles = await Promise.all(prs.map(pr => deps.spawnSession('mr-review-workflow-agentic', `Review PR #${pr.number}`, workspace)));
676
- const awaitResult = await deps.awaitSessions(handles, 30 * 60 * 1000);
677
- const agentResults = await Promise.all(handles.map(h => deps.getAgentResult(h)));
678
- ```
679
-
680
- 5. **Typed routing** over FindingsSeverity discriminated union:
681
- ```typescript
682
- type FindingsSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';
683
- ```
684
-
685
- 6. **Required parallel workflow change: add verify step to mr-review workflow**
686
- The mr-review workflow's final step must include in `outputRequired`:
687
- ```
688
- coordinatorOutput: "JSON block: ## COORDINATOR_OUTPUT\n```json\n{\"findings\": [{\"severity\": \"clean|minor|blocking\", \"summary\": \"...\", \"prNumber\": N}]}\n```"
689
- ```
690
- And in `verify`: "## COORDINATOR_OUTPUT block is present with valid JSON matching the coordinator schema."
691
-
692
- ### Notes Retrieval: 2-Call HTTP Sequence
693
-
694
- The `getAgentResult` dep implementation does:
695
- 1. GET `/api/v2/sessions/:sessionId` -- find `runs[0].nodes` preferred tip node ID
696
- 2. GET `/api/v2/sessions/:sessionId/nodes/:nodeId` -- get `recapMarkdown` (step notes)
697
- 3. Parse coordinator output block from recapMarkdown
698
-
699
- ### Context Passing
700
-
701
- The `spawnSession` dep uses HTTP dispatch directly (not worktrain spawn CLI):
702
- ```
703
- POST /api/v2/auto/dispatch { workflowId, goal, workspacePath, context }
704
- ```
705
- The `context` body field is supported (verified in console-routes.ts L519).
706
- The worktrain spawn CLI does not have a `--context` flag -- use HTTP for context-passing pipelines.
707
-
708
- ### Pipeline Sequence (mr-review coordinator)
709
-
710
- ```
711
- 1. listOpenPRs() -> [PR]
712
- 2. Parallel: spawnSession('mr-review-workflow-agentic', goal, workspace) for each PR
713
- 3. awaitSessions(all handles, 30m)
714
- 4. For each handle: getAgentResult -> parse findings
715
- 5. Route:
716
- - clean -> mergeQueue
717
- - minor -> spawnSession('coding-task-workflow-agentic', 'Fix: <finding>'), await, re-review (max 3 passes)
718
- - blocking -> escalation list (Slack + GitLab comment)
719
- - unknown -> escalation list (conservative default)
720
- 6. mergePR() for each clean PR (serial, pull before each to avoid conflicts)
721
- 7. postSlack(summary)
722
- ```
723
-
724
- ### What This Gives Up
725
-
726
- - Console DAG tree view: coordinator is not a WorkRail session. Child sessions appear as a flat list. **Mitigation:** parentSessionId is already in the session store; console tree view is the next planned feature and will retroactively make child sessions visible.
727
- - Coordinator crash durability: script state is in-memory. **Mitigation (v2):** state file at `~/.workrail/coordinator-state/{run-id}.json` written at each phase transition.
728
-
729
- ### Long-Term Direction (Candidate 3)
730
-
731
- After the first coordinator script runs in production and validates the topology:
732
- - Build `spawn_agent(blocking: false)` + `await_agents([handles])` native engine tools
733
- - Build a coordinator as a WorkRail workflow JSON file using these tools
734
- - This gives: parallel fan-out + console DAG observability + daemon durability + inline notes (no 2-call HTTP)
735
- - The `AgentResult` bridge type in C4 makes this migration trivial (same result shape)
154
+ ## Open Questions for the Main Agent
736
155
 
737
- ### Open Questions (non-blocking)
156
+ 1. Should `worktrain run pr-review` use a nested commander subcommand (`program.command('run').command('pr-review')`) or a flat command (`program.command('run pr-review')`)? Commander supports both; the existing commands all use flat style -- recommend flat.
738
157
 
739
- 1. Is non-blocking spawn_agent on the near-term roadmap? If yes, skip C4 and build C3 directly.
740
- 2. Should `worktrain spawn` get a `--context` flag? Would simplify coordinator deps for context-passing pipelines.
741
- 3. Should `worktrain await` get a `--include-notes` flag? Would consolidate the 2-call HTTP into the CLI.
158
+ 2. For the serial merge sequence: should the coordinator do `git pull` before each `gh pr merge --squash`? Yes -- ensures clean base. But this is a coordinator behavior detail, not an architectural question.
742
159
 
743
- ### Confidence Band: HIGH
160
+ 3. Should the coordinator write a full report file (`coordinator-pr-review-YYYY-MM-DD.md`)? Yes, per UX spec. This is a simple file write via CoordinatorDeps.
744
161
 
745
- All findings are grounded in direct code reading of: `worktrain-spawn.ts`, `worktrain-await.ts`, `workflow-runner.ts` (spawn_agent section), `trigger-router.ts`, `classify-task-workflow.json`, `console-types.ts`, `console-routes.ts`, and 4 backlog sections. No speculative assumptions.
162
+ 4. Is `coding-task-workflow-agentic` the correct fix-agent workflow? Yes -- it handles "implement/fix" tasks. The goal string `Fix review findings in PR #N: [finding summaries]` is the goal format.