@exaudeus/workrail 3.38.0 → 3.39.0
- package/dist/cli-worktrain.js +207 -0
- package/dist/console-ui/assets/{index-BtOJj6Xy.js → index-3oXZ_A9m.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/pr-review.d.ts +57 -0
- package/dist/coordinators/pr-review.js +520 -0
- package/dist/manifest.json +15 -7
- package/docs/discovery/coordinator-design-review.md +73 -0
- package/docs/discovery/coordinator-script-design.md +96 -679
- package/docs/discovery/hypothesis-challenge-report.md +44 -0
- package/docs/discovery/simulation-report.md +85 -0
- package/package.json +1 -1
|
@@ -1,745 +1,162 @@
|
|
|
1
|
-
# Coordinator Script
|
|
1
|
+
# PR Review Coordinator Script: Design Candidates
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
**Discovery path:** design_first (goal was a solution statement; risk is solving the wrong abstraction)
|
|
5
|
-
**Status:** In progress
|
|
3
|
+
*Discovery run: 2026-04-18. Three runs completed. Design settled.*
|
|
6
4
|
|
|
7
5
|
---
|
|
8
6
|
|
|
9
|
-
##
|
|
7
|
+
## Problem Understanding
|
|
10
8
|
|
|
11
|
-
|
|
9
|
+
### Core Tensions
|
|
12
10
|
|
|
13
|
-
**
|
|
14
|
-
- Architecture recommendation with rationale
|
|
15
|
-
- Key design decisions and tradeoffs
|
|
16
|
-
- Open questions and risks
|
|
17
|
-
- Reference for implementation
|
|
11
|
+
1. **Parseability vs. output format:** The `mr-review-workflow` final step (`phase-6-final-handoff`) produces free-form markdown designed for human readers. The coordinator needs machine-parseable output. Adding `## COORDINATOR_OUTPUT` to the workflow prompt would make parsing reliable, but changes a workflow other users may run standalone. Decision: two-tier parser in coordinator only; update the workflow prompt as a separate follow-up.
|
|
18
12
|
|
|
19
|
-
**
|
|
20
|
-
- Tracking workflow execution state
|
|
21
|
-
- Storing session continue tokens or checkpoint data
|
|
13
|
+
2. **HTTP API vs. CLI subprocess for dispatch:** A CLI subprocess (`execFile('worktrain', ['spawn', ...])`) is simpler (no port discovery logic) but loses context passing and adds subprocess overhead. Calling HTTP directly (`POST /api/v2/auto/dispatch`) requires port discovery (same logic as `worktrain-spawn.ts`) but enables the `context` field. Decision: HTTP direct, copying the port discovery pattern.
|
|
22
14
|
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
## Context / Ask
|
|
15
|
+
3. **Coordinator as CLI script vs. daemon workflow:** The coordinator could run as a WorkRail workflow and gain durability, but that adds a circular dependency (WorkRail spawning WorkRail sessions from inside a WorkRail session). The backlog explicitly says "scripts-first coordinator" -- deterministic TypeScript logic, not LLM orchestration. Decision: standalone CLI script.
|
|
26
16
|
|
|
27
|
-
**
|
|
17
|
+
4. **Fix-agent loop termination:** The rule is a maximum of 3 passes, but if pass 2 comes back minor again, another review pass is needed, and each extra review means another 15-minute wait. Solution: enforce the 3-pass maximum strictly with a counter tracked per PR in coordinator state.
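The termination rule in tension 4 can be sketched as a bounded loop. This is a minimal illustration, not the shipped coordinator: `processPr`, `PrOutcome`, and the `review` / `fix` callbacks are hypothetical names, and it assumes the limit of 3 counts fix-agent passes (the text does not say whether review passes are counted instead).

```typescript
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';

// Hypothetical outcome type for one PR.
type PrOutcome =
  | { readonly kind: 'merge'; readonly passes: number }
  | { readonly kind: 'escalate'; readonly passes: number; readonly reason: string };

const MAX_FIX_PASSES = 3; // assumption: the limit counts fix-agent passes

// `review` and `fix` stand in for "spawn a session and wait for it".
async function processPr(
  review: () => Promise<ReviewSeverity>,
  fix: () => Promise<void>,
): Promise<PrOutcome> {
  let passes = 0; // the only mutable state, local to this PR
  for (;;) {
    const severity = await review();
    if (severity === 'clean') return { kind: 'merge', passes };
    if (severity !== 'minor') {
      // blocking and unknown are never auto-fixed
      return { kind: 'escalate', passes, reason: `review returned ${severity}` };
    }
    if (passes >= MAX_FIX_PASSES) {
      return { kind: 'escalate', passes, reason: 'fix pass limit reached' };
    }
    await fix();
    passes += 1;
  }
}
```

Because the counter never leaves the function, a PR that keeps coming back `minor` costs at most three fix passes and four reviews before it is escalated.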
|
|
28
18
|
|
|
29
|
-
|
|
19
|
+
### What Makes This Hard
|
|
30
20
|
|
|
31
|
-
**
|
|
21
|
+
1. **Notes extraction requires 2 sequential HTTP calls:** `GET /api/v2/sessions/:id` must succeed and return a valid `preferredTipNodeId` before the node detail call can be made. If either call fails, the coordinator must treat the result as `unknown` severity (conservative: escalate).
|
|
32
22
|
|
|
33
|
-
|
|
23
|
+
2. **Fix agent loop state management:** Need to track per-PR: pass count, current handle, previous findings. This is mutable state, which conflicts with the immutability preference. Resolution: keep loop counter local to the per-PR processing function, not exposed as shared state.
|
|
34
24
|
|
|
35
|
-
|
|
25
|
+
3. **`worktrain await` does NOT return session notes:** `await` returns only `{ results: [...SessionResult], allSucceeded }`. To get what the agent actually found, a separate 2-call HTTP sequence is needed after `await` returns.
|
|
36
26
|
|
|
37
|
-
**
|
|
27
|
+
4. **Keyword scan ambiguity:** The mr-review markdown may use ambiguous language (e.g., "minor architectural blocking concern"). Conservative default: blocking always wins over minor, and notes that cannot be classified fall back to `unknown`.
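One possible reading of the two-tier parser, sketched under assumptions: tier 1 looks for a structured block under the `## COORDINATOR_OUTPUT` heading, and tier 2 falls back to a conservative keyword scan. The `severity: <value>` line format, the keyword list, and the name `classifyNotes` are all hypothetical. The real `parseFindingsFromNotes()` is described as returning a `Result`, which this sketch leaves out for brevity.

```typescript
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';

// Tier 1: structured block. The `severity:` line format is an assumption.
function parseStructuredBlock(notes: string): ReviewSeverity | null {
  const start = notes.indexOf('## COORDINATOR_OUTPUT');
  if (start === -1) return null;
  const match = /^severity:\s*(clean|minor|blocking)\s*$/m.exec(notes.slice(start));
  if (match === null) return null;
  return match[1] as ReviewSeverity;
}

// Tier 2: keyword scan over free-form markdown. Order matters:
// blocking is checked first so it always wins over minor.
function scanKeywords(notes: string): ReviewSeverity {
  const text = notes.toLowerCase();
  if (/\bblocking\b/.test(text)) return 'blocking';
  if (/\bminor\b/.test(text)) return 'minor';
  if (/\bclean\b/.test(text)) return 'clean';
  return 'unknown'; // nothing recognizable: let the caller escalate
}

function classifyNotes(notes: string | null): ReviewSeverity {
  if (notes === null || notes.trim() === '') return 'unknown';
  return parseStructuredBlock(notes) ?? scanKeywords(notes);
}
```

With this ordering, the ambiguous phrase "minor architectural blocking concern" classifies as `blocking`. A phrase such as "no blocking issues" would also classify as `blocking`, which is over-cautious but errs on the safe side.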
|
|
38
28
|
|
|
39
|
-
|
|
29
|
+
### Likely Seam
|
|
40
30
|
|
|
41
|
-
|
|
31
|
+
The real seam is `CoordinatorDeps` -- all HTTP calls, CLI calls (`gh`, `git`), and stderr output sit behind this interface. The coordinator core is pure TypeScript with no side effects except through deps.
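To illustrate why this seam matters for testing, here is a small sketch with a hand-written fake. `ListingDeps` is a hypothetical two-member subset of the deps interface, the `PrSummary` fields are assumed, and `announceOpenPRs` is an invented helper, not part of the package.

```typescript
// Assumed shape; the real PrSummary may carry different fields.
interface PrSummary {
  readonly number: number;
  readonly title: string;
}

// Hypothetical subset of the deps interface, enough for this sketch.
interface ListingDeps {
  readonly listOpenPRs: (workspace: string) => Promise<PrSummary[]>;
  readonly stderr: (line: string) => void;
}

// Core logic touches the outside world only through `deps`.
async function announceOpenPRs(deps: ListingDeps, workspace: string): Promise<number> {
  const prs = await deps.listOpenPRs(workspace);
  for (const pr of prs) {
    deps.stderr(`queued PR #${pr.number}: ${pr.title}`);
  }
  return prs.length;
}

// A fake that records stderr lines instead of writing them.
function makeFakeDeps(prs: PrSummary[]): { deps: ListingDeps; lines: string[] } {
  const lines: string[] = [];
  return {
    deps: {
      listOpenPRs: async () => prs,
      stderr: (line) => { lines.push(line); },
    },
    lines,
  };
}
```

A test can then call `announceOpenPRs(makeFakeDeps([...]).deps, workspace)` and inspect the recorded lines without a daemon, `gh`, or a real stderr stream.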
|
|
42
32
|
|
|
43
33
|
---
|
|
44
34
|
|
|
45
|
-
## Constraints
|
|
35
|
+
## Philosophy Constraints
|
|
46
36
|
|
|
47
|
-
|
|
48
|
-
- Zero LLM cost for coordination routing decisions (scripts, not LLM reasoning, for deterministic logic)
|
|
49
|
-
- Coordinator must be observable: console DAG must show parent-child session tree
|
|
50
|
-
- Coordinator must be testable without a live daemon (mockable spawn/await primitives)
|
|
51
|
-
- Must not require engine changes to add a new pipeline
|
|
52
|
-
- Must handle fan-out parallelism (spawn N child sessions, collect all N results)
|
|
37
|
+
Source: `/Users/etienneb/CLAUDE.md`
|
|
53
38
|
|
|
54
|
-
**
|
|
55
|
-
-
|
|
56
|
-
-
|
|
57
|
-
-
|
|
58
|
-
-
|
|
39
|
+
- **Immutability by default** -- coordinator state as read-only data structures; mutation only in explicit loop counters
|
|
40
|
+
- **Errors as data** -- `Result<T, E>` return types from `parseFindingsFromNotes()`, `getAgentResult()` -- no throws
|
|
41
|
+
- **Validate at boundaries** -- validate port, workspace path at CoordinatorDeps wiring time (CLI entry point), trust internally
|
|
42
|
+
- **DI for I/O** -- all fetch, execFile, stderr injected via CoordinatorDeps; no direct imports in coordinator core
|
|
43
|
+
- **Explicit domain types** -- `ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown'` not plain string
|
|
44
|
+
- **Exhaustiveness everywhere** -- switch on `ReviewSeverity` must be exhaustive
|
|
45
|
+
- **YAGNI with discipline** -- build this coordinator, not a coordinator framework
|
|
59
46
|
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
## Key Facts From Code Reading
|
|
63
|
-
|
|
64
|
-
### worktrain spawn (worktrain-spawn.ts)
|
|
65
|
-
- Flags: `--workflow <id>`, `--goal <text>`, `--workspace <path>`, `[--port <n>]`
|
|
66
|
-
- HTTP POST to `/api/v2/auto/dispatch` with `{ workflowId, goal, workspacePath }`
|
|
67
|
-
- Output to stdout: session handle (string, e.g. `sess_abc123`)
|
|
68
|
-
- Output to stderr: progress/errors
|
|
69
|
-
- Return: CliResult success | failure | misuse
|
|
70
|
-
- **Does NOT pass context variables** -- only workflowId, goal, workspacePath
|
|
71
|
-
|
|
72
|
-
### worktrain await (worktrain-await.ts)
|
|
73
|
-
- Flags: `--sessions <h1,h2,...>`, `[--mode all|any]`, `[--timeout 30m]`
|
|
74
|
-
- Polls GET `/api/v2/sessions/:sessionId` every 3 seconds
|
|
75
|
-
- Terminal statuses: `complete`, `complete_with_gaps`, `blocked`, `dormant`
|
|
76
|
-
- Output to stdout: JSON `{ results: [{ handle, outcome, status, durationMs }], allSucceeded }`
|
|
77
|
-
- **CRITICAL GAP: No step notes, no findings, no structured artifacts returned**
|
|
78
|
-
- Exit code 0 if all succeeded, 1 if any failed/timed out
|
|
79
|
-
|
|
80
|
-
### spawn_agent tool (workflow-runner.ts L1415)
|
|
81
|
-
- Available inside workflow steps (not as a CLI command)
|
|
82
|
-
- Blocking: parent AgentLoop pauses inside execute() until child completes
|
|
83
|
-
- Returns: `{ childSessionId, outcome: 'success'|'error'|'timeout', notes: string }`
|
|
84
|
-
- Depth-limited: default max depth 3
|
|
85
|
-
- **Returns last step notes from child** -- actionable content available immediately
|
|
86
|
-
- **Serial only**: cannot fan out N children in parallel (each call blocks)
|
|
87
|
-
- Parent session's maxSessionMinutes keeps ticking while child runs
|
|
88
|
-
|
|
89
|
-
### trigger-router.ts dispatch()
|
|
90
|
-
- Fire-and-forget via KeyedAsyncQueue
|
|
91
|
-
- Returns immediately (202 pattern)
|
|
92
|
-
- Uses global Semaphore (max 3 concurrent by default)
|
|
93
|
-
- **Why spawn_agent cannot use dispatch()**: dispatch is fire-and-forget; calling it inside a running session would lose the result. Direct runWorkflow() call is used instead.
|
|
94
|
-
|
|
95
|
-
### classify-task-workflow.json
|
|
96
|
-
- Single-step, fast, no tools, no subagents
|
|
97
|
-
- Outputs: taskComplexity, riskLevel, hasUI, touchesArchitecture, taskType, affectedDomains, recommendedPipeline
|
|
98
|
-
- `recommendedPipeline` is an ordered array of workflow IDs
|
|
99
|
-
- Notes are the output channel -- no structured context variables emitted
|
|
47
|
+
**No conflicts** between stated philosophy and repo patterns.
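As a sketch of how "errors as data" and exhaustive switching could combine, the snippet below routes a parsed severity to an action. The routing table (clean to merge, minor to fix agent, blocking and unknown to escalate) follows the rules stated in this document; the `Result` shape, the `Route` type, and the function names are assumptions made for illustration.

```typescript
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';

// Assumed Result shape; the repo's own Result type may differ.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

type Route = 'merge' | 'spawn-fix-agent' | 'escalate';

// Compile-time guard: adding a new severity without a case makes
// the `default` branch fail to type-check. Unreachable at runtime.
function assertNever(value: never): never {
  throw new Error(`Unhandled severity: ${String(value)}`);
}

function routeFor(severity: ReviewSeverity): Route {
  switch (severity) {
    case 'clean':
      return 'merge';
    case 'minor':
      return 'spawn-fix-agent';
    case 'blocking':
      return 'escalate';
    case 'unknown':
      return 'escalate'; // conservative default
    default:
      return assertNever(severity);
  }
}

// A failed parse is data, not an exception, and is escalated.
function routeResult(parsed: Result<ReviewSeverity, string>): Route {
  return parsed.ok ? routeFor(parsed.value) : 'escalate';
}
```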
|
|
100
48
|
|
|
101
49
|
---
|
|
102
50
|
|
|
103
|
-
##
|
|
104
|
-
|
|
105
|
-
The backlog pseudocode (backlog.md L1793-1795) shows:
|
|
106
|
-
```
|
|
107
|
-
3. Calls `await_sessions(handles)` → structured findings (script waits)
|
|
108
|
-
4. Parses the findings JSON block from each session's output (script)
|
|
109
|
-
5. Routes: clean → merge queue, minor → spawn fix agent, blocking → escalate
|
|
110
|
-
```
|
|
111
|
-
|
|
112
|
-
But the real `worktrain await` output is `{ handle, outcome, status, durationMs }` -- no findings, no notes.
|
|
51
|
+
## Impact Surface
|
|
113
52
|
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
This is a missing primitive: **worktrain notes <session-handle>** (or a --include-notes flag on worktrain await).
|
|
53
|
+
- `src/cli-worktrain.ts` -- adds `run pr-review` subcommand (minimal change, follows existing pattern)
|
|
54
|
+
- `src/cli/commands/index.ts` -- exports new command types
|
|
55
|
+
- `workflows/mr-review-workflow.agentic.v2.json` -- NOT changed in this PR; the coordinator's two-tier parser handles current output
|
|
56
|
+
- `POST /api/v2/auto/dispatch` -- used by coordinator via HTTP (no change to route)
|
|
57
|
+
- `GET /api/v2/sessions/:id` + `GET /api/v2/sessions/:id/nodes/:nodeId` -- read-only; no changes
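The two read-only endpoints above are what the notes-extraction sequence relies on. A minimal sketch follows, assuming the tip node ID sits at `runs[0].preferredTipNodeId` in the session response and that the node response carries a top-level `recapMarkdown` string. Both locations are guesses about the JSON shape, and `GetJson` and `fetchRecapMarkdown` are hypothetical names.

```typescript
// Injected fetch-like dependency so the sketch runs without a daemon.
type GetJson = (path: string) => Promise<unknown>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// ASSUMPTION: preferredTipNodeId lives on the first run of the session.
function tipNodeId(session: unknown): string | null {
  if (!isRecord(session)) return null;
  const runs = session['runs'];
  if (!Array.isArray(runs)) return null;
  const firstRun: unknown = runs[0];
  if (!isRecord(firstRun)) return null;
  const tip = firstRun['preferredTipNodeId'];
  return typeof tip === 'string' && tip !== '' ? tip : null;
}

// Returns the recap markdown, or null if either call fails or the
// payload is not shaped as assumed. Callers map null to `unknown`.
async function fetchRecapMarkdown(getJson: GetJson, sessionId: string): Promise<string | null> {
  try {
    const base = `/api/v2/sessions/${encodeURIComponent(sessionId)}`;
    const tip = tipNodeId(await getJson(base));
    if (tip === null) return null;
    const node = await getJson(`${base}/nodes/${encodeURIComponent(tip)}`);
    const recap = isRecord(node) ? node['recapMarkdown'] : undefined;
    return typeof recap === 'string' ? recap : null;
  } catch {
    return null; // network or HTTP failure is treated as "no notes"
  }
}
```

Returning `null` for every failure keeps the caller's rule simple: no notes means `unknown` severity, which escalates.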
|
|
121
58
|
|
|
122
59
|
---
|
|
123
60
|
|
|
124
|
-
##
|
|
125
|
-
|
|
126
|
-
### Candidate A: Coordinator Script (TypeScript/Shell)
|
|
127
|
-
|
|
128
|
-
```
|
|
129
|
-
coordinator-mr-review.ts
|
|
130
|
-
1. spawn handles[] = worktrain spawn --workflow mr-review for each PR (parallel fire-and-forget)
|
|
131
|
-
2. await results = worktrain await --sessions h1,h2,h3
|
|
132
|
-
3. for each handle: GET /api/v2/sessions/:id -> parse step notes -> extract findings
|
|
133
|
-
4. route: clean -> merge queue, minor -> spawn fix agent, blocking -> escalate
|
|
134
|
-
5. await fix agents -> re-review loop (circuit breaker at 3)
|
|
135
|
-
6. merge clean PRs
|
|
136
|
-
```
|
|
137
|
-
|
|
138
|
-
**Pros:**
|
|
139
|
-
- True parallel fan-out (fire-and-forget spawn, batch await)
|
|
140
|
-
- Deterministic routing (zero LLM cost for coordination)
|
|
141
|
-
- Testable: mock fetch, mock worktrain CLI
|
|
142
|
-
- Reusable: script is a standalone artifact others can copy
|
|
143
|
-
|
|
144
|
-
**Cons:**
|
|
145
|
-
- Invisible to WorkRail: coordinator script is not a session in the DAG
|
|
146
|
-
- No session DAG for the coordinator itself (only child sessions are visible)
|
|
147
|
-
- Requires a running process outside the daemon (shell script lifetime)
|
|
148
|
-
- Must handle port discovery, daemon connectivity separately
|
|
149
|
-
- **Missing primitive:** must query session notes separately after await
|
|
150
|
-
|
|
151
|
-
### Candidate B: WorkRail Workflow with spawn_agent Steps
|
|
152
|
-
|
|
153
|
-
```
|
|
154
|
-
coordinator-mr-review-workflow.json
|
|
155
|
-
Step 1: Gather PRs
|
|
156
|
-
Step 2: For each PR, call spawn_agent(mr-review-workflow) -- SERIAL (one at a time)
|
|
157
|
-
Step 3: Route based on outcome + notes from spawn_agent
|
|
158
|
-
Step 4: spawn_agent(fix-workflow) if needed, re-review
|
|
159
|
-
Step 5: Merge
|
|
160
|
-
```
|
|
161
|
-
|
|
162
|
-
**Pros:**
|
|
163
|
-
- Full WorkRail observability: coordinator IS a session in the DAG
|
|
164
|
-
- Child sessions linked via parentSessionId
|
|
165
|
-
- Console DAG shows the full tree
|
|
166
|
-
- spawn_agent returns notes directly (no separate query needed)
|
|
167
|
-
- Session state is durable (daemon crash recovery)
|
|
168
|
-
|
|
169
|
-
**Cons:**
|
|
170
|
-
- Serial only: cannot spawn N review sessions in parallel
|
|
171
|
-
- Parent session time limit accumulates across all child runs
|
|
172
|
-
- At depth limit 3 (default), nested spawn_agent chains are constrained
|
|
173
|
-
- Coordinator logic is in workflow JSON prompt blocks, not testable TypeScript
|
|
174
|
-
|
|
175
|
-
### Candidate C: Hybrid -- Script Coordinator with Session Registration
|
|
176
|
-
|
|
177
|
-
A TypeScript coordinator script that:
|
|
178
|
-
- Calls worktrain spawn (parallel) + worktrain await (batch)
|
|
179
|
-
- Registers itself as a coordinator session with a workflowId (so it appears in the DAG)
|
|
180
|
-
- After await, queries session notes via HTTP, routes on content
|
|
181
|
-
- Reports phase transitions back to the daemon as structured events
|
|
182
|
-
|
|
183
|
-
**Pros:** Parallel fan-out + DAG visibility + testable TypeScript
|
|
184
|
-
**Cons:** Requires new engine primitive (coordinator session registration) -- not yet built
|
|
185
|
-
|
|
186
|
-
---
|
|
187
|
-
|
|
188
|
-
## Landscape Packet
|
|
189
|
-
|
|
190
|
-
### Current State Summary
|
|
191
|
-
|
|
192
|
-
Two orchestration primitives exist today (both shipped, neither battle-tested):
|
|
193
|
-
|
|
194
|
-
| Primitive | Layer | Parallelism | Returns Content | Observable in DAG |
|
|
195
|
-
|-----------|-------|-------------|-----------------|-------------------|
|
|
196
|
-
| `worktrain spawn` + `worktrain await` | CLI / HTTP | Yes (fire N, await all) | No (outcome + status only) | No (script is invisible) |
|
|
197
|
-
| `spawn_agent` tool | Engine / workflow step | No (blocking, serial) | Yes (notes returned inline) | Yes (parentSessionId in store) |
|
|
198
|
-
|
|
199
|
-
Neither primitive is complete for the target use case:
|
|
200
|
-
- Script model: parallel but content-blind
|
|
201
|
-
- Native model: content-aware but serial
|
|
202
|
-
|
|
203
|
-
### Existing Workflows Available as Targets
|
|
204
|
-
- `coding-task-workflow-agentic` (lean v2)
|
|
205
|
-
- `mr-review-workflow.agentic.v2`
|
|
206
|
-
- `routine-context-gathering`, `routine-hypothesis-challenge`, `routine-philosophy-alignment`
|
|
207
|
-
- `ui-ux-design-workflow`, `production-readiness-audit`, `architecture-scalability-audit`
|
|
208
|
-
- `bug-investigation.agentic.v2`, `wr.discovery`
|
|
209
|
-
- `classify-task-workflow` (new, single-step, outputs recommendedPipeline array)
|
|
210
|
-
|
|
211
|
-
### Engineering State (git log context)
|
|
212
|
-
- `spawn_agent` shipped in commit `4254feb7` (feat: in-process child session delegation)
|
|
213
|
-
- `worktrain spawn` / `worktrain await` -- Tier 3 in Apr 18 grooming: "already merged, needs real-world test"
|
|
214
|
-
- `classify-task-workflow` exists but not yet wired into any coordinator
|
|
215
|
-
- `parentSessionId` is in session store; console tree view is the next planned feature
|
|
216
|
-
|
|
217
|
-
### Hard Constraints From Code
|
|
218
|
-
1. `dispatch()` in TriggerRouter is fire-and-forget + Semaphore-gated. Calling from inside a running session deadlocks. This is why `spawn_agent` uses direct `runWorkflow()` call, not `dispatch()`.
|
|
219
|
-
2. `worktrain await` stdout schema is `AwaitResult = { results: [{ handle, outcome, status, durationMs }], allSucceeded }`. No notes.
|
|
220
|
-
3. `worktrain spawn` CLI: `{ workflowId, goal, workspacePath }` only. No context variable passing.
|
|
221
|
-
4. `spawn_agent` depth limit: default max 3. Root (0) → child (1) → grandchild (2) → blocked at 3.
|
|
222
|
-
5. `spawn_agent` blocks the parent AgentLoop's execute() method. The parent cannot do other work while child runs.
|
|
223
|
-
|
|
224
|
-
### Obvious Contradictions
|
|
225
|
-
|
|
226
|
-
**C1: Backlog assumes findings from await, but CLI returns none.**
|
|
227
|
-
Backlog pseudocode (backlog.md L1793): `await_sessions(handles) → structured findings`. Real CLI returns `{ outcome, status, durationMs }` only. Every coordinator that routes on content has an undocumented extra HTTP step.
|
|
228
|
-
|
|
229
|
-
**C2: Backlog envisions parallel fan-out, but spawn_agent is serial.**
|
|
230
|
-
Backlog (backlog.md L2190): `await_sessions({ handles: [...], mode: 'all' })` implies a non-blocking spawn primitive. But `spawn_agent` (the native tool) blocks. The CLI (`worktrain spawn` / `worktrain await`) can do parallel, but returns no content.
|
|
231
|
-
|
|
232
|
-
**C3: classify-task output is in notes, not in context variables.**
|
|
233
|
-
The classify-task workflow outputs via step notes (a markdown block), not via WorkRail context variables. A coordinator that reads classify output must parse the notes string -- there is no structured `context.taskComplexity` to read directly.
|
|
234
|
-
|
|
235
|
-
### Evidence Gaps
|
|
236
|
-
- **Gap 1:** Whether GET /api/v2/sessions/:id currently returns full step notes in the runs array (the await code reads `runs[0].status` but not notes -- unclear if notes are included in the response body)
|
|
237
|
-
- **Gap 2:** Whether a `--context` flag for `worktrain spawn` is planned (needed to pass classify output to coding-task session)
|
|
238
|
-
- **Gap 3:** Whether non-blocking spawn_agent (fire + await_all) is on the roadmap (would resolve C2)
|
|
239
|
-
|
|
240
|
-
---
|
|
241
|
-
|
|
242
|
-
## Problem Frame Packet (Deep)
|
|
243
|
-
|
|
244
|
-
### Stakeholders
|
|
245
|
-
|
|
246
|
-
**Primary user: Etienne (WorkTrain builder / first pipeline author)**
|
|
247
|
-
- Job: Wire up an autonomous pipeline that runs the full develop-review-fix-merge cycle without manual coordination
|
|
248
|
-
- Outcome: Spend Monday morning reviewing Slack, not manually driving 8 agent sessions in sequence
|
|
249
|
-
- Pain: Today every handoff (review complete -> spawn fix agent -> re-review) requires Etienne to be online, read the findings, and manually kick the next step
|
|
250
|
-
- Constraint: Must be able to reason about what went wrong when a pipeline fails at 2am
|
|
61
|
+
## Candidates
|
|
251
62
|
|
|
252
|
-
|
|
253
|
-
- Job: Add their own pipelines (onboarding pipeline, data migration pipeline, etc.)
|
|
254
|
-
- Outcome: Write a new pipeline by copying a template and changing workflow IDs and routing rules
|
|
255
|
-
- Pain: If the coordinator pattern is hard to understand or extend, each team rewrites it from scratch
|
|
256
|
-
- Constraint: Cannot be expected to understand the WorkRail engine internals
|
|
63
|
+
### Candidate A: Minimal CLI Script (subprocess model)
|
|
257
64
|
|
|
258
|
-
|
|
65
|
+
**Summary:** `worktrain run pr-review` as a thin TypeScript wrapper shelling out to `worktrain spawn` and `worktrain await` CLIs via `execFile`, parsing stdout manually, calling `gh` directly.
|
|
259
66
|
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
67
|
+
- **Tensions resolved:** Simplest possible change; reuses existing CLI contracts
|
|
68
|
+
- **Tensions accepted:** No context passing; subprocess overhead; harder to test
|
|
69
|
+
- **Boundary:** CoordinatorDeps wraps `execFile` for all subprocess calls
|
|
70
|
+
- **Failure mode:** `worktrain spawn` output format change breaks coordinator silently
|
|
71
|
+
- **Repo-pattern relationship:** Departs -- `delivery-action.ts` uses direct function calls, not subprocesses
|
|
72
|
+
- **Gain:** Minimal new code
|
|
73
|
+
- **Loss:** No context passing, no type safety on spawn/await results, poor testability
|
|
74
|
+
- **Scope:** Too narrow
|
|
75
|
+
- **Philosophy:** Violates 'prefer fakes over mocks' (exec calls hard to fake cleanly); violates 'errors as data'
|
|
264
76
|
|
|
265
|
-
###
|
|
77
|
+
### Candidate B: HTTP-first with CoordinatorDeps Interface (RECOMMENDED)
|
|
266
78
|
|
|
267
|
-
**
|
|
268
|
-
- Native `spawn_agent` (observable in DAG, serial) vs. script + worktrain CLI (parallel, invisible to DAG)
|
|
269
|
-
- You cannot have both today. Console DAG tree = serial. Parallel fan-out = invisible coordinator.
|
|
79
|
+
**Summary:** `src/coordinators/pr-review.ts` with a `CoordinatorDeps` readonly interface. Core logic is pure functions. CLI wiring in `src/cli-worktrain.ts` provides real HTTP/CLI deps. Tests inject fakes.
|
|
270
80
|
|
|
271
|
-
**
|
|
272
|
-
- Routing on findings requires 2-call HTTP sequence (session detail + node detail) after worktrain await
|
|
273
|
-
- Simpler coordinator ignores findings and routes only on exit code (succeeded / failed) -- but this loses the clean/minor/blocking distinction
|
|
274
|
-
- The backlog explicitly wants content-based routing; simplicity would sacrifice the main value proposition
|
|
275
|
-
|
|
276
|
-
**T3: Template reuse vs. Pipeline specificity**
|
|
277
|
-
- A generic pipeline runner (execute recommendedPipeline array from classify-task) is maximally reusable but cannot express mr-review's loop-with-retry
|
|
278
|
-
- A specialized mr-review coordinator can express the full topology but is not reusable
|
|
279
|
-
- The first coordinator template will set the pattern -- wrong abstraction level here propagates to all future pipelines
|
|
280
|
-
|
|
281
|
-
**T4: Build now vs. Build right**
|
|
282
|
-
- worktrain spawn/await are merged but untested (Tier 3 Apr 18 grooming)
|
|
283
|
-
- spawn_agent just shipped (commit 4254feb7) and needs real-world validation
|
|
284
|
-
- Building the coordinator NOW uses primitives that are still in "needs testing" state
|
|
285
|
-
- Waiting for primitives to stabilize reduces rework risk but delays the autonomous pipeline
|
|
286
|
-
|
|
287
|
-
**T5: Script model vs. Workflow model (the central architectural tension)**
|
|
288
|
-
- Script: parallel, testable, zero LLM cost, but invisible to DAG and no native failure recovery
|
|
289
|
-
- Workflow: observable, content-aware via spawn_agent, but serial and time-budget constrained
|
|
290
|
-
- The backlog explicitly names coordinator scripts as the intended model -- but the code reality shows spawn_agent is the more capable primitive for content-based routing
|
|
291
|
-
|
|
292
|
-
### Success Criteria (observable)
|
|
293
|
-
|
|
294
|
-
1. A coordinator triggers from a cron or webhook, runs the mr-review pipeline for all open PRs, and posts a Slack summary -- without Etienne touching anything
|
|
295
|
-
2. When findings are "blocking", the coordinator spawns a fix agent and re-reviews (not just logs and exits)
|
|
296
|
-
3. When a child session fails, the coordinator's Slack summary names WHICH PR failed and WHY (the finding text)
|
|
297
|
-
4. The console DAG shows all child sessions linked to the coordinator as a tree (even if the coordinator itself is invisible as a script)
|
|
298
|
-
5. A second pipeline (e.g. implement-feature) can be added by writing a new 50-line TypeScript file and changing 3 workflow IDs
|
|
299
|
-
|
|
300
|
-
### Primary Framing Risk
|
|
301
|
-
|
|
302
|
-
**If spawn_agent becomes non-blocking (fire + await_all) in the near term, the entire script-vs-workflow calculus inverts.**
|
|
303
|
-
|
|
304
|
-
Currently the script model wins on parallelism (the only dimension where it beats native workflows). If spawn_agent gets a non-blocking mode with batch-await, native workflows become strictly better: same parallelism, plus DAG observability, plus notes available inline, plus no separate HTTP calls. In that scenario, building a coordinator script template today would be building the wrong abstraction -- the right abstraction would be a coordinator workflow JSON file.
|
|
305
|
-
|
|
306
|
-
This is not generic. It is a specific condition (spawn_agent async mode shipping) that would make the current framing wrong. The decision hinges on whether to build the script now or wait for/build the async spawn_agent first.
|
|
307
|
-
|
|
308
|
-
## Open Questions
|
|
309
|
-
|
|
310
|
-
1. **Parallel fan-out with spawn_agent:** Is there a plan to make spawn_agent non-blocking (fire-and-forget + await_all)? If yes, Candidate B becomes viable for parallel pipelines.
|
|
311
|
-
2. **worktrain await --include-notes flag:** Is this planned? Without it, every coordinator routing on content needs a separate HTTP call.
|
|
312
|
-
3. **worktrain spawn --context flag:** The current CLI does not pass context variables to the spawned session. How does the coordinator pass classify-task output (taskComplexity, recommendedPipeline) to the coding-task session?
|
|
313
|
-
4. **Coordinator session registration:** Is there a plan for scripts to register as coordinator sessions so they appear in the DAG?
|
|
314
|
-
5. **Session notes API:** Is GET /api/v2/sessions/:id currently returning full step notes in the runs array? The await code reads `runs[0].status` but not notes.
|
|
315
|
-
|
|
316
|
-
---
|
|
317
|
-
|
|
318
|
-
## Problem Frame Packet
|
|
319
|
-
|
|
320
|
-
**Primary uncertainty:** Whether the script model or the native workflow model is the right long-term abstraction -- given that spawn_agent is blocking+serial today, but the backlog explicitly describes async spawn + batch await as the desired primitive.
|
|
321
|
-
|
|
322
|
-
**Known approaches:** Coordinator script (Candidate A), native workflow (Candidate B), hybrid (Candidate C).
|
|
323
|
-
|
|
324
|
-
**Key stakeholders:** Anyone building a coordinator pipeline (today: Etienne; soon: teams using WorkTrain autonomously).
|
|
325
|
-
|
|
326
|
-
---
|
|
327
|
-
|
|
328
|
-
## Candidate Directions
|
|
329
|
-
|
|
330
|
-
### Candidate Generation Expectations
|
|
331
|
-
|
|
332
|
-
This is a **design_first** pass with **THOROUGH** rigor. Requirements for the candidate set:
|
|
333
|
-
1. At least one candidate must meaningfully reframe the problem (not just package an obvious solution)
|
|
334
|
-
2. All candidates must address the central tension: observability vs. parallelism
|
|
335
|
-
3. All candidates must specify how they handle content-based routing (the 2-call HTTP gap or native notes)
|
|
336
|
-
4. One candidate must represent the "build now with current primitives" position (pragmatic)
|
|
337
|
-
5. One candidate must represent the "build the right primitive first" position (strategic)
|
|
338
|
-
6. The spread must not cluster -- candidates genuinely differ in abstraction level
|
|
339
|
-
|
|
340
|
-
---
|
|
341
|
-
|
|
342
|
-
### Candidate 1: TypeScript Coordinator Script (Minimal, Build Now)
|
|
343
|
-
|
|
344
|
-
**One-sentence summary:** A standalone TypeScript file with DI-injected spawn/await/HTTP effects that drives the mr-review pipeline using today's worktrain spawn/await CLI plus a 2-call HTTP sequence to retrieve step notes for routing.
|
|
345
|
-
|
|
346
|
-
**Concrete shape:**
|
|
81
|
+
**CoordinatorDeps interface:**
|
|
347
82
|
```typescript
|
|
348
|
-
// coordinator-mr-review.ts
|
|
349
83
|
interface CoordinatorDeps {
|
|
350
|
-
readonly spawnSession: (workflowId: string, goal: string, workspace: string) => Promise<string>; //
|
|
351
|
-
readonly awaitSessions: (handles: string[],
|
|
352
|
-
readonly
|
|
353
|
-
readonly listOpenPRs: () => Promise<
|
|
354
|
-
readonly mergePR: (number:
|
|
355
|
-
readonly
|
|
84
|
+
readonly spawnSession: (workflowId: string, goal: string, workspace: string) => Promise<string>; // sessionHandle
|
|
85
|
+
readonly awaitSessions: (handles: string[], timeoutMs?: number) => Promise<AwaitResult>;
|
|
86
|
+
readonly getAgentResult: (sessionHandle: string) => Promise<string | null>; // recapMarkdown
|
|
87
|
+
readonly listOpenPRs: (workspace: string) => Promise<PrSummary[]>;
|
|
88
|
+
readonly mergePR: (prNumber: number, workspace: string) => Promise<void>;
|
|
89
|
+
readonly postResult: (notes: string) => Promise<void>;
|
|
356
90
|
readonly stderr: (line: string) => void;
|
|
357
|
-
|
|
358
|
-
|
|
359
|
-
type FindingsSeverity = 'clean' | 'minor' | 'blocking';
|
|
360
|
-
|
|
361
|
-
async function runMrReviewPipeline(deps: CoordinatorDeps, workspace: string): Promise<CoordinatorResult>
|
|
362
|
-
```
|
|
363
|
-
|
|
364
|
-
Notes are retrieved via: GET /api/v2/sessions/:id (get tip nodeId from runs[0].nodes[preferredTipNodeId]) then GET /api/v2/sessions/:id/nodes/:nodeId (get recapMarkdown). Parsed by a `parseFindings(recapMarkdown: string): FindingsSeverity` function that scans for known severity markers.
|
|
365
|
-
|
|
366
|
-
**Tensions resolved:** T3 (specific topology: loop-with-retry), T2 (content access via explicit 2-call HTTP)
|
|
367
|
-
**Tensions accepted:** T1 (no parallelism), T4 (build now, not right)
|
|
368
|
-
**Wait -- parallelism:** This candidate CAN achieve parallel fan-out by calling `spawnSession` N times before calling `awaitSessions([h1, h2, h3, ...])`. The `awaitSessions` dep wraps `worktrain await` which polls all sessions concurrently. **This resolves T1.**
|
|
369
|
-
|
|
370
|
-
**Boundary solved at:** TypeScript module boundary. All I/O injected via `CoordinatorDeps`. Testable with fake deps (no live daemon).
|
|
371
|
-
|
|
372
|
-
**Failure mode to watch:** `parseFindings` is a string parser on LLM-generated markdown. If the mr-review workflow changes its notes format, the parser silently misclassifies. Must have a `'unknown'` severity fallback that defaults to 'blocking' (conservative).
|
|
373
|
-
|
|
374
|
-
**Relation to existing patterns:** Directly follows `WorktrainSpawnCommandDeps` / `WorktrainAwaitCommandDeps` DI pattern. Same injectable interface shape.
|
|
375
|
-
|
|
376
|
-
**Gains:** Ships today. Uses stable primitives. Fully testable. Parallel fan-out. Content-based routing.
|
|
377
|
-
**Gives up:** Invisible to console DAG (coordinator is not a WorkRail session). No durable state (script crash = lost progress). Notes parsing is brittle.
|
|
378
|
-
|
|
379
|
-
**Impact surface:** None -- coordinator is a standalone file. Does not require engine changes.
|
|
380
|
-
|
|
381
|
-
**Scope:** Best-fit for "first coordinator template."
|
|
382
|
-
|
|
383
|
-
**Philosophy honored:** Errors as data (CliResult pattern), Dependency injection, Exhaustiveness (FindingsSeverity union), Validate at boundaries (parseFindings validates at HTTP response boundary)
|
|
384
|
-
**Philosophy tension:** Not fully deterministic if notes format varies (string parsing on LLM output)
|
|
385
|
-
|
|
386
|
-
---
|
|
387
|
-
|
|
388
|
-
### Candidate 2: Serial Coordinator Workflow (Native, Observable)
|
|
389
|
-
|
|
390
|
-
**One-sentence summary:** A WorkRail workflow JSON file where each pipeline phase is a step that calls `spawn_agent` once, receives notes inline, and uses those notes to set context variables that the next step reads.
|
|
391
|
-
|
|
392
|
-
**Concrete shape:**
|
|
393
|
-
```json
|
|
394
|
-
{
|
|
395
|
-
"id": "coordinator-mr-review",
|
|
396
|
-
"steps": [
|
|
397
|
-
{
|
|
398
|
-
"id": "gather-prs",
|
|
399
|
-
"procedure": ["Run gh pr list --json, set context.openPRs"]
|
|
400
|
-
},
|
|
401
|
-
{
|
|
402
|
-
"id": "review-loop",
|
|
403
|
-
"loopCondition": "context.openPRs.length > 0",
|
|
404
|
-
"procedure": [
|
|
405
|
-
"Pop one PR from context.openPRs",
|
|
406
|
-
"Call spawn_agent(mr-review-workflow-agentic, goal: 'Review PR #N')",
|
|
407
|
-
"Read spawn_agent result.notes",
|
|
408
|
-
"If notes contain CLEAN: add to context.mergeQueue",
|
|
409
|
-
"If notes contain MINOR: call spawn_agent(coding-task-workflow-agentic, 'Fix: <finding>')",
|
|
410
|
-
"If notes contain BLOCKING: add to context.escalationList"
|
|
411
|
-
]
|
|
412
|
-
},
|
|
413
|
-
{
|
|
414
|
-
"id": "merge-queue",
|
|
415
|
-
"procedure": ["For each PR in mergeQueue: run git merge sequence"]
|
|
416
|
-
}
|
|
417
|
-
]
|
|
418
|
-
}
|
|
419
|
-
```
|
|
420
|
-
|
|
421
|
-
Each `spawn_agent` call blocks until child completes, then returns `{ outcome, notes }`. Notes are available immediately -- no HTTP polling needed.
|
|
422
|
-
|
|
423
|
-
**Tensions resolved:** T1 (fully observable in console DAG), T2 (notes available inline)
|
|
424
|
-
**Tensions accepted:** T1 partial (serial review -- one PR at a time, not parallel), T4 (cannot build right now, needs workflow JSON authoring)
|
|
425
|
-
|
|
426
|
-
**Boundary solved at:** WorkRail workflow step boundary. All coordination logic in workflow JSON prompt instructions + agent reasoning.
|
|
427
|
-
|
|
428
|
-
**Failure mode to watch:** Parent session's maxSessionMinutes accumulates across all spawn_agent calls. Reviewing 10 PRs with a 30-minute child budget each requires the parent to have 300+ minutes. Time budget explosion is silent -- parent times out while children are running.
|
|
429
|
-
|
|
430
|
-
**Relation to existing patterns:** Directly uses spawn_agent as designed. Follows the workflow-runner.ts spawn_agent pattern (blocking, errors as data, parentSessionId).
|
|
431
|
-
|
|
432
|
-
**Gains:** Full DAG observability. Session state is durable (daemon restart recovers). Notes available inline, no separate HTTP call.
|
|
433
|
-
**Gives up:** Serial reviews (10 PRs = 10x review time). Time budget scales linearly. Routing logic is in LLM-readable prompt, not testable TypeScript.
|
|
434
|
-
|
|
435
|
-
**Impact surface:** None. A new workflow JSON file.
|
|
436
|
-
|
|
437
|
-
**Scope:** Best-fit IF serial review is acceptable.
|
|
438
|
-
|
|
439
|
-
**Philosophy honored:** Errors as data (spawn_agent returns outcome), Exhaustiveness (outcome enum), Observable state
|
|
440
|
-
**Philosophy tension:** Routing logic is LLM prompt text, not typed domain logic. Coordinator philosophy says scripts, not LLM reasoning -- this violates the scripts-first principle.
|
|
441
|
-
|
|
442
|
-
---
|
|
443
|
-
|
|
444
|
-
### Candidate 3: Build Async spawn_agent First, Then Native Coordinator Workflow
|
|
445
|
-
|
|
446
|
-
**One-sentence summary:** Extend spawn_agent with a non-blocking mode (`blocking: false`) that returns a `pendingHandle` immediately, add an `await_agents` tool that takes an array of pendingHandles and blocks until all complete, then build the coordinator as a WorkRail workflow that uses these new tools for parallel + observable + content-aware orchestration.
|
|
447
|
-
|
|
448
|
-
**Concrete shape (new engine API):**
|
|
449
|
-
```typescript
|
|
450
|
-
// New tool: spawn_agent with blocking: false
|
|
451
|
-
spawn_agent({ workflowId, goal, workspacePath, blocking: false })
|
|
452
|
-
→ { pendingHandle: string } // returns immediately
|
|
453
|
-
|
|
454
|
-
// New tool: await_agents
|
|
455
|
-
await_agents({ handles: ['ph_abc', 'ph_def'], mode: 'all' })
|
|
456
|
-
→ [{ handle, childSessionId, outcome, notes }] // blocks until all complete
|
|
457
|
-
```
|
|
458
|
-
|
|
459
|
-
The coordinator workflow then becomes:
|
|
460
|
-
```
|
|
461
|
-
Step 1: Gather PRs (script/bash)
|
|
462
|
-
Step 2: Spawn all review sessions in parallel (call spawn_agent with blocking:false for each PR)
|
|
463
|
-
Step 3: Await all (call await_agents)
|
|
464
|
-
Step 4: Route on notes (typed context variables set from parsed notes)
|
|
465
|
-
Step 5: Spawn fix agents (blocking spawn_agent, one at a time)
|
|
466
|
-
Step 6: Merge
|
|
467
|
-
```
|
|
468
|
-
|
|
469
|
-
**Tensions resolved:** T1 (parallel + observable), T2 (notes inline from await_agents), T4 (builds the right primitive)
|
|
470
|
-
**Tensions accepted:** T4 partial (does not ship today -- requires engine work)
|
|
471
|
-
|
|
472
|
-
**Boundary solved at:** WorkRail engine boundary. New tools in workflow-runner.ts.
|
|
473
|
-
|
|
474
|
-
**Failure mode to watch:** The non-blocking spawn pattern must not use `dispatch()` (deadlock risk via Semaphore + queue slot). The implementation must use the same `runWorkflow()` pattern as the blocking spawn_agent, but launch it as a concurrent Promise that is tracked in a coordinator-owned pending map.
|
|
475
|
-
|
|
476
|
-
**Relation to existing patterns:** Extends spawn_agent in workflow-runner.ts (L1415). The blocking version already exists -- non-blocking is an additive extension. `pendingHandle` concept mirrors the CLI's sessionHandle.
|
|
477
|
-
|
|
478
|
-
**Gains:** Resolves all 5 decision criteria. Parallel fan-out + observability + content routing + DAG tree + extensible. The coordinator workflow is the long-term correct abstraction.
|
|
479
|
-
**Gives up:** Does not exist today. Requires engine PR before coordinator can be built. 2-4 week delay (estimate).
|
|
480
|
-
|
|
481
|
-
**Impact surface:** workflow-runner.ts only: a new makeAwaitAgentsTool, an extended makeSpawnAgentTool, and possibly executeWorkflowLoop for pending handle tracking.
|
|
482
|
-
|
|
483
|
-
**Scope:** Too broad for "first coordinator template" alone, but correctly scoped for "right long-term architecture."
|
|
484
|
-
|
|
485
|
-
**Philosophy honored:** All 5 decision criteria. Architectural fixes over patches. Make illegal states unrepresentable (pending handle as a typed domain type).
|
|
486
|
-
**Philosophy tension:** YAGNI -- this builds a speculative primitive before any coordinator has validated the need in production.
|
|
487
|
-
|
|
488
|
-
---
|
|
489
|
-
|
|
490
|
-
### Candidate 4: Minimal Script + Structured Notes Contract (Pragmatic + Future-Proof)
|
|
491
|
-
|
|
492
|
-
**One-sentence summary:** Build Candidate 1 (TypeScript coordinator script) but add a structured `## COORDINATOR_OUTPUT` JSON block to the mr-review workflow's final step notes as a first-class contract, making the coordinator's dependency on notes format explicit and versioned rather than fragile string parsing.
|
|
493
|
-
|
|
494
|
-
**Concrete shape:**
|
|
495
|
-
|
|
496
|
-
In `mr-review-workflow.agentic.v2.json` final step, add to `outputRequired`:
|
|
497
|
-
```json
|
|
498
|
-
{
|
|
499
|
-
"coordinatorOutput": "JSON block in exact format:\n```json\n{\"findings\": [{\"severity\": \"clean|minor|blocking\", \"summary\": \"...\", \"prNumber\": N}]}\n```"
|
|
500
|
-
}
|
|
501
|
-
```
|
|
502
|
-
|
|
503
|
-
In the coordinator script:
|
|
504
|
-
```typescript
|
|
505
|
-
function parseCoordinatorOutput(notes: string): Result<CoordinatorOutput, ParseError> {
|
|
506
|
-
const match = /```json\n([\s\S]+?)\n```/.exec(notes);
|
|
507
|
-
if (!match) return err({ kind: 'missing_block' });
|
|
508
|
-
return parseJson(match[1]).andThen(validateCoordinatorOutput);
|
|
91
|
+
readonly now: () => number;
|
|
92
|
+
readonly port: number;
|
|
509
93
|
}
|
|
510
94
|
```
|
|
511
95
|
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
96
|
+
**Pure functions:**
|
|
97
|
+
- `parseFindingsFromNotes(markdown: string | null): Result<ReviewFindings, string>` -- two-tier (JSON block first, keyword scan fallback)
|
|
98
|
+
- `classifySeverity(findings: ReviewFindings): ReviewSeverity`
|
|
99
|
+
- `buildFixGoal(prNumber: number, findings: ReviewFindings): string`
|
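A minimal sketch of the two-tier parse described in the pure-functions list, under stated assumptions: the `ReviewFindings`, `ReviewSeverity`, and `Result` shapes below are hypothetical (the diff names them but does not define them), the JSON block shape follows the `findings` array from the earlier draft of this document, and the keyword fallback matches upper-case BLOCKING/MINOR/CLEAN only. It is an illustration, not the shipped `pr-review.js` implementation.

```typescript
// Hypothetical shapes: named in the design doc, but not defined in this diff.
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';
interface ReviewFindings {
  readonly severity: ReviewSeverity;
  readonly summaries: readonly string[];
}
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Most severe first, so the worst severity wins when several appear.
const RANKED: readonly ReviewSeverity[] = ['blocking', 'minor', 'clean'];

// Built from parts so this example can itself live inside a fenced block.
const FENCE = '`'.repeat(3);
const JSON_BLOCK = new RegExp(FENCE + 'json\\n([\\s\\S]+?)\\n' + FENCE);

function worst(severities: readonly string[]): ReviewSeverity | undefined {
  return RANKED.find((s) => severities.includes(s));
}

function parseFindingsFromNotes(markdown: string | null): Result<ReviewFindings, string> {
  if (markdown === null || markdown.trim() === '') {
    return { ok: false, error: 'notes are empty' };
  }

  // Tier 1: a fenced JSON block with a `findings` array (assumed shape).
  const body = JSON_BLOCK.exec(markdown)?.[1];
  if (body !== undefined) {
    try {
      const parsed = JSON.parse(body) as { findings?: unknown };
      if (Array.isArray(parsed.findings)) {
        const items = parsed.findings as Array<{ severity?: unknown; summary?: unknown }>;
        const severity = worst(items.map((f) => String(f.severity)));
        if (severity !== undefined) {
          const summaries = items.map((f) => String(f.summary ?? ''));
          return { ok: true, value: { severity, summaries } };
        }
      }
    } catch {
      // Malformed or unexpected JSON: fall through to the keyword scan.
    }
  }

  // Tier 2: keyword scan over the free-form notes.
  const hits: string[] = markdown.match(/\b(BLOCKING|MINOR|CLEAN)\b/g) ?? [];
  const severity = worst(hits.map((k) => k.toLowerCase()));
  return severity === undefined
    ? { ok: false, error: 'no recognizable severity in notes' }
    : { ok: true, value: { severity, summaries: [] } };
}
```

In this sketch an error result is left for the caller to map to `unknown`, which keeps the parser free of routing policy.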
|
516
100
|
|
|
517
|
-
**
|
|
101
|
+
- **Tensions resolved:** Context passing possible (HTTP direct), type safety, testability, all 5 robustness rules
|
|
102
|
+
- **Tensions accepted:** Slightly more code vs A; port discovery logic duplicated from spawn.ts (intentional)
|
|
103
|
+
- **Boundary:** `CoordinatorDeps` -- exactly the same pattern as `WorktrainSpawnCommandDeps`
|
|
104
|
+
- **Failure mode:** `recapMarkdown` is null -> treated as `unknown` -> escalate (conservative, correct)
|
|
105
|
+
- **Repo-pattern relationship:** Follows `WorktrainSpawnCommandDeps` pattern exactly; adapts `parseHandoffArtifact` two-tier parser
|
|
106
|
+
- **Gain:** Full type safety, testable pure core, context passing, matches existing architecture
|
|
107
|
+
- **Loss:** More code (but correct code)
|
|
108
|
+
- **Scope:** Best-fit -- 3 new files
|
|
109
|
+
- **Philosophy:** Honors all -- immutability, DI for I/O, errors as data, explicit domain types, validate at boundaries, prefer fakes over mocks
|
|
518
110
|
|
|
519
|
-
|
|
111
|
+
### Candidate C: Generic Coordinator Framework + pr-review Instance
|
|
520
112
|
|
|
521
|
-
**
|
|
113
|
+
**Summary:** Build `src/coordinators/base.ts` with `CoordinatorDeps<TInput, TOutput>` generic and pipeline pattern, then implement pr-review as an instance.
|
|
522
114
|
|
|
523
|
-
|
|
524
|
-
**
|
|
525
|
-
|
|
526
|
-
**
|
|
527
|
-
|
|
528
|
-
**
|
|
529
|
-
|
|
530
|
-
**
|
|
531
|
-
**Philosophy
|
|
115
|
+
- **Tensions resolved:** Extensibility for future coordinators
|
|
116
|
+
- **Tensions accepted:** Higher upfront complexity; forced generic mold may not fit next coordinator
|
|
117
|
+
- **Boundary:** Generic abstraction layer above CoordinatorDeps
|
|
118
|
+
- **Failure mode:** Framework abstraction doesn't fit next coordinator's shape
|
|
119
|
+
- **Repo-pattern relationship:** No existing coordinator framework to adapt; departs significantly
|
|
120
|
+
- **Gain:** Future reuse
|
|
121
|
+
- **Loss:** YAGNI violation -- second coordinator doesn't exist yet
|
|
122
|
+
- **Scope:** Too broad
|
|
123
|
+
- **Philosophy:** Violates the "YAGNI with discipline" principle
|
|
532
124
|
|
|
533
125
|
---
|
|
534
126
|
|
|
535
127
|
## Comparison and Recommendation
|
|
536
128
|
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
| Criterion | C1 (Script, minimal) | C2 (Workflow, serial) | C3 (Async spawn_agent) | C4 (Script + contract) |
|
|
540
|
-
|-----------|----------------------|----------------------|------------------------|------------------------|
|
|
541
|
-
| Parallel fan-out | YES | NO | YES | YES |
|
|
542
|
-
| Content-based routing | YES (fragile parse) | YES (inline) | YES (inline) | YES (typed contract) |
|
|
543
|
-
| Structured failure data | YES | YES | YES | YES |
|
|
544
|
-
| Console DAG tree | NO | YES | YES | NO |
|
|
545
|
-
| New pipeline = new file | YES | YES | YES | YES |
|
|
546
|
-
| Ships today | YES | YES | NO | YES |
|
|
547
|
-
| Testable without daemon | YES | NO | NO | YES |
|
|
548
|
-
|
|
549
|
-
### Recommended Direction: Candidate 4 (Script + Structured Notes Contract)
|
|
129
|
+
**Recommendation: Candidate B**
|
|
550
130
|
|
|
551
|
-
|
|
131
|
+
Candidate B is the only option that:
|
|
132
|
+
1. Follows the established DI interface pattern (`WorktrainSpawnCommandDeps`) exactly
|
|
133
|
+
2. Enables the `getAgentResult` 2-call HTTP sequence in a testable way
|
|
134
|
+
3. Produces pure functions for finding parsing and severity classification
|
|
135
|
+
4. Enforces all 5 robustness rules with explicit typed state
|
|
136
|
+
5. Honors all CLAUDE.md philosophy principles
|
|
552
137
|
|
|
553
|
-
|
|
554
|
-
1. Only C1 and C4 achieve parallel fan-out + ship today + testable without daemon
|
|
555
|
-
2. C4 > C1 because typed contract (explicit domain type) vs. fragile regex (philosophy violation)
|
|
556
|
-
3. C2 loses on serial reviews (N*reviewTime) and routing logic in LLM prompts (scripts-first violation)
|
|
557
|
-
4. C3 is the correct long-term direction but does not exist (YAGNI until C4 validates the topology)
|
|
558
|
-
|
|
559
|
-
**The coordinator output contract pattern is already proven in the codebase:** `parseHandoffArtifact` in `src/trigger/delivery-action.ts` does exactly this for the coding-task workflow. The coordinator contract is the same pattern at the coordination layer.
|
|
560
|
-
|
|
561
|
-
### What C4 Gives Up
|
|
562
|
-
|
|
563
|
-
Console DAG visibility. The coordinator script is invisible -- not a WorkRail session. Mitigation: child sessions appear in the console as a flat list with parentSessionId links once that UI is built. Phase transitions logged to stderr with session handles.
|
|
564
|
-
|
|
565
|
-
**C4 is a stepping stone, not a permanent decision.** Once async spawn_agent ships (C3 direction), the coordinator workflow will have DAG visibility + all current advantages. C4 validates the topology in production so that C3 is built on confirmed requirements.
|
|
566
|
-
|
|
567
|
-
### Self-Critique
|
|
568
|
-
|
|
569
|
-
**Strongest counter-argument:** Notes contract coupling creates a maintenance dependency between the coordinator script and a specific version of the mr-review workflow. If the workflow format changes, the coordinator silently fails or throws a parse error. Candidate 2 avoids this entirely (the LLM reads whatever notes exist).
|
|
570
|
-
|
|
571
|
-
**Pivot conditions:**
|
|
572
|
-
1. If notes contract adds too much friction across multiple coordinator scripts -> use structured context variables (workflow final step sets context.coordinatorOutput, coordinator reads it via GET session context) instead of notes block
|
|
573
|
-
2. If async spawn_agent is scheduled within 2 weeks -> consider waiting and going directly to C3
|
|
574
|
-
3. If DAG observability is critical for the first real pipeline debug -> accept C2's serial reviews to get the tree view
|
|
575
|
-
|
|
576
|
-
## Challenge Notes
|
|
577
|
-
|
|
578
|
-
### Adversarial Challenge of Candidate 4 (Leading Direction)
|
|
579
|
-
|
|
580
|
-
**Challenge 1: The notes contract is a false solution to the wrong problem.**
|
|
581
|
-
|
|
582
|
-
Candidate 4 adds a `## COORDINATOR_OUTPUT` JSON block to the mr-review workflow. But what happens when there are 10 different coordinator pipelines, each with a different output contract? You now have N contracts to maintain, each coupling a coordinator to a specific workflow version. The `parseHandoffArtifact` precedent is actually a warning, not a green light -- the handoff artifact has already caused fragility when workflow output formats drifted. Coordinator contracts multiply this surface area.
|
|
583
|
-
|
|
584
|
-
**Resolution:** Valid concern, but the alternative (LLM prose routing in C2) has the same coupling problem in a less visible form -- the coordinator LLM must still understand what 'BLOCKING' means in free-form text. Typed contracts are preferable to implicit text conventions even if they require maintenance. Mitigation: a shared `coordinator-contract-types.ts` with a versioned schema, and a `validate_coordinator_output` step in the workflow's verify block that rejects outputs that don't match the schema.
|
|
585
|
-
|
|
586
|
-
**Challenge 2: worktrain spawn does not pass context variables -- classify-task output cannot reach the coding-task session.**
|
|
587
|
-
|
|
588
|
-
The mr-review pipeline does not need classify-task output. But the full implementation pipeline (implement-feature coordinator) requires passing `{ taskComplexity, recommendedPipeline }` from classify-task to the coding-task session. The current worktrain spawn CLI has no `--context` flag. A coordinator that needs to pass context must use HTTP directly: POST /api/v2/auto/dispatch with a `context` body field.
|
|
589
|
-
|
|
590
|
-
**Resolution:** Confirmed gap. Add `passContext: (handle: string, context: Record<string, unknown>) => Promise<void>` to CoordinatorDeps, implemented via HTTP. This must be documented in the coordinator template as a required dep. It is not a blocker for the mr-review coordinator specifically (which does not need classify output), but IS a blocker for the implement-feature coordinator.
|
|
591
|
-
|
|
592
|
-
**Challenge 3: The coordinator script is not durable. If the script process dies mid-pipeline, all coordination state is lost.**
|
|
593
|
-
|
|
594
|
-
If the coordinator crashes after spawning 5 review sessions but before collecting their results, those sessions are orphaned in 'in_progress' state. The coordinator has no way to recover -- it cannot re-acquire the session handles it created. The script's state is entirely in-memory.
|
|
595
|
-
|
|
596
|
-
**Resolution:** This is a real limitation with no clean fix in Candidate 4. Mitigation: write session handles to a state file (`~/.workrail/coordinator-state/{run-id}.json`) at each phase transition. On coordinator restart, read the state file and resume from the last checkpoint. This adds complexity but is not insurmountable. Alternatively: accept the limitation for the first coordinator and note it as a reason to invest in C3 (native workflow + daemon durability).
|
|
597
|
-
|
|
598
|
-
**Challenge 4: 'Parallel fan-out' requires calling spawnSession N times before awaitSessions. But worktrain spawn is sequential at the CLI level -- each CLI invocation is a separate process.**
|
|
599
|
-
|
|
600
|
-
Actually this is a non-issue. The coordinator script calls the `spawnSession` dep N times (N async HTTP calls in parallel via Promise.all), then calls `awaitSessions` once with all handles. The HTTP calls are non-blocking. True parallelism IS achievable in the TypeScript coordinator. This challenge fails.
|
|
601
|
-
|
|
602
|
-
**Challenge 5: What if the first mr-review pipeline reveals that the topology is wrong?**
|
|
603
|
-
|
|
604
|
-
If the first real pipeline run shows that the mr-review workflow needs to return richer data than the notes contract provides, the coordinator needs a contract change + workflow change + coordinator change. In C2 (native workflow), a format change is handled by changing the LLM prompt -- no parse layer to update. This makes C2 more flexible for early iteration.
|
|
605
|
-
|
|
606
|
-
**Resolution:** Valid tradeoff. C4 is less flexible to format changes than C2. Mitigation: keep the coordinator output block minimal for v1 (severity + summary + prNumber only) and add fields incrementally. Schema versioning via a `version: 1` field in the JSON block lets the coordinator detect stale contract versions.
|
|
607
|
-
|
|
608
|
-
### Challenge Verdict
|
|
609
|
-
|
|
610
|
-
**C4 holds up under challenge.** The three real concerns (contract maintenance, no context passing, no durability) are documented limitations with known mitigations, not blockers. The challenge confirms C4 as the right first coordinator design, with C3 (async spawn_agent + native workflow) as the explicit next investment after production validation.
|
|
611
|
-
|
|
612
|
-
---
|
|
613
|
-
|
|
614
|
-
## Resolution Notes
|
|
615
|
-
|
|
616
|
-
**Selected Direction: Candidate 4** -- TypeScript coordinator script with DI-injected effects, parallel fan-out via worktrain spawn + await, and an explicit `## COORDINATOR_OUTPUT` JSON block in the mr-review workflow as the typed findings contract.
|
|
617
|
-
|
|
618
|
-
**Runner-up: Candidate 3** -- async spawn_agent + native coordinator workflow. This is the correct long-term direction. It should be built after C4 validates the topology in production.
|
|
619
|
-
|
|
620
|
-
**Why C4 over C2:** C2 is serial (N*reviewTime). C2 routing logic lives in LLM prompts (violates scripts-first principle). C2 is not testable without a live daemon.
|
|
621
|
-
|
|
622
|
-
**Why C4 over C1:** C1 uses fragile regex on free-form notes. C4 uses a typed contract (`## COORDINATOR_OUTPUT` JSON block). The difference is the same as parseHandoffArtifact vs. ad-hoc string parsing.
|
|
623
|
-
|
|
624
|
-
**Why C4 before C3:** YAGNI. Async spawn_agent does not exist. Build C4 first, run it in production, then use the validated topology to spec async spawn_agent properly.
|
|
138
|
+
The scope is correct: 3 new files, clear boundaries, no speculative abstractions.
|
|
625
139
|
|
|
626
140
|
---
|
|
627
141
|
|
|
628
|
-
##
|
|
142
|
+
## Self-Critique
|
|
629
143
|
|
|
630
|
-
**
|
|
144
|
+
**Strongest counter-argument against B:** Port discovery logic is duplicated from `worktrain-spawn.ts`. A clean-design purist would extract it to a shared util. However, the coordinator is a standalone script in `src/coordinators/`, not part of the daemon machinery. Coupling it to internal daemon utils would be the wrong dependency direction. The duplication is intentional and bounded.
|
|
631
145
|
|
|
632
|
-
**
|
|
146
|
+
**What would tip toward A:** If context passing is never needed and the coordinator will always be a thin orchestrator. But the robustness rules (zombie detection, traceability JSON block) require typed interfaces, which Candidate A can't easily provide.
|
|
633
147
|
|
|
634
|
-
**
|
|
148
|
+
**What evidence would justify C:** A second coordinator (e.g., `groom-prs`, `security-audit`) that shares 50%+ of the shape. Currently speculative.
|
|
635
149
|
|
|
636
|
-
**
|
|
637
|
-
|
|
638
|
-
**Decision 5:** C3 (async spawn_agent) named as next engine investment. Trigger: after first coordinator (C4) has run in production with real PRs.
|
|
150
|
+
**Invalidating assumption:** The design assumes `GET /api/v2/sessions/:id/nodes/:nodeId` returns a non-null `recapMarkdown` for the final step. If it consistently returns `recapMarkdown: null` (e.g., because `requireConfirmation: true` nodes store notes differently), every review would escalate. The mitigation is already built in: null -> unknown -> escalate. The coordinator does not crash; it escalates conservatively.
|
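The null -> unknown -> escalate rule can be sketched as a small exhaustive router. This is an illustration under assumptions: the `ReviewSeverity` members and the `Route` names are hypothetical, and the clean/minor/blocking mapping is carried over from the earlier draft's pipeline sequence rather than confirmed against the shipped coordinator.

```typescript
// Hypothetical union; the document names ReviewSeverity but does not list its members.
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';
// Hypothetical route names for illustration only.
type Route = 'merge' | 'fix' | 'escalate';

// Missing notes never reach the classifier: they become 'unknown' directly.
function severityOrUnknown(
  recapMarkdown: string | null,
  classify: (markdown: string) => ReviewSeverity,
): ReviewSeverity {
  return recapMarkdown === null ? 'unknown' : classify(recapMarkdown);
}

function routeFor(severity: ReviewSeverity): Route {
  switch (severity) {
    case 'clean':
      return 'merge';
    case 'minor':
      return 'fix';
    case 'blocking':
    case 'unknown':
      // Conservative default: anything not understood goes to a human.
      return 'escalate';
    default: {
      // Compile-time exhaustiveness check: adding a new severity breaks the build here.
      const unreachable: never = severity;
      return unreachable;
    }
  }
}
```

Keeping `unknown` as an explicit member, rather than folding it into `blocking`, lets a report distinguish "the reviewer found a blocker" from "the coordinator could not read the review".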
|
639
151
|
|
|
640
152
|
---
|
|
641
153
|
|
|
642
|
-
##
|
|
643
|
-
|
|
644
|
-
### Recommended Architecture
|
|
645
|
-
|
|
646
|
-
**Candidate 4: TypeScript Coordinator Script with Structured Notes Contract**
|
|
647
|
-
|
|
648
|
-
Build a standalone TypeScript file (`coordinator-mr-review.ts`) with:
|
|
649
|
-
|
|
650
|
-
1. **CoordinatorDeps DI interface** -- all effects injectable (spawn, await, notes retrieval, PR list, merge, Slack):
|
|
651
|
-
```typescript
|
|
652
|
-
interface CoordinatorDeps {
|
|
653
|
-
readonly spawnSession: (workflowId: string, goal: string, workspace: string, context?: Record<string, unknown>) => Promise<string>;
|
|
654
|
-
readonly awaitSessions: (handles: string[], timeoutMs: number) => Promise<AwaitResult>;
|
|
655
|
-
readonly getAgentResult: (handle: string) => Promise<AgentResult>; // 2-call HTTP internally
|
|
656
|
-
readonly listOpenPRs: () => Promise<PullRequest[]>;
|
|
657
|
-
readonly mergePR: (number: number) => Promise<void>;
|
|
658
|
-
readonly postSlack: (message: string) => Promise<void>;
|
|
659
|
-
readonly stderr: (line: string) => void;
|
|
660
|
-
}
|
|
661
|
-
```
|
|
662
|
-
|
|
663
|
-
2. **AgentResult bridge type** (mirrors future async spawn_agent result):
|
|
664
|
-
```typescript
|
|
665
|
-
type AgentResult = { handle: string; childSessionId: string | null; outcome: SessionOutcome; notes: string | null };
|
|
666
|
-
```
|
|
667
|
-
|
|
668
|
-
3. **Two-tier findings parsing**:
|
|
669
|
-
- Preferred: parse `## COORDINATOR_OUTPUT` JSON block from notes (typed contract)
|
|
670
|
-
- Fallback: scan for BLOCKING/MINOR/CLEAN keywords in notes (graceful degradation)
|
|
671
|
-
- Unknown severity defaults to 'blocking' (conservative)
|
|
672
|
-
|
|
673
|
-
4. **Parallel fan-out via Promise.all + worktrain await**:
|
|
674
|
-
```typescript
|
|
675
|
-
const handles = await Promise.all(prs.map(pr => deps.spawnSession('mr-review-workflow-agentic', `Review PR #${pr.number}`, workspace)));
|
|
676
|
-
const awaitResult = await deps.awaitSessions(handles, 30 * 60 * 1000);
|
|
677
|
-
const agentResults = await Promise.all(handles.map(h => deps.getAgentResult(h)));
|
|
678
|
-
```
|
|
679
|
-
|
|
680
|
-
5. **Typed routing** over FindingsSeverity discriminated union:
|
|
681
|
-
```typescript
|
|
682
|
-
type FindingsSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';
|
|
683
|
-
```
|
|
684
|
-
|
|
685
|
-
6. **Required parallel workflow change: add verify step to mr-review workflow**
|
|
686
|
-
The mr-review workflow's final step must include in `outputRequired`:
|
|
687
|
-
```
|
|
688
|
-
coordinatorOutput: "JSON block: ## COORDINATOR_OUTPUT\n```json\n{\"findings\": [{\"severity\": \"clean|minor|blocking\", \"summary\": \"...\", \"prNumber\": N}]}\n```"
|
|
689
|
-
```
|
|
690
|
-
And in `verify`: "## COORDINATOR_OUTPUT block is present with valid JSON matching the coordinator schema."
|
|
691
|
-
|
|
692
|
-
### Notes Retrieval: 2-Call HTTP Sequence
|
|
693
|
-
|
|
694
|
-
The `getAgentResult` dep implementation does:
|
|
695
|
-
1. GET `/api/v2/sessions/:sessionId` -- find `runs[0].nodes` preferred tip node ID
|
|
696
|
-
2. GET `/api/v2/sessions/:sessionId/nodes/:nodeId` -- get `recapMarkdown` (step notes)
|
|
697
|
-
3. Parse coordinator output block from recapMarkdown
|
|
698
|
-
|
|
699
|
-
### Context Passing
|
|
700
|
-
|
|
701
|
-
The `spawnSession` dep uses HTTP dispatch directly (not worktrain spawn CLI):
|
|
702
|
-
```
|
|
703
|
-
POST /api/v2/auto/dispatch { workflowId, goal, workspacePath, context }
|
|
704
|
-
```
|
|
705
|
-
The `context` body field is supported (verified in console-routes.ts L519).
|
|
706
|
-
The worktrain spawn CLI does not have a `--context` flag -- use HTTP for context-passing pipelines.
|
|
707
|
-
|
|
708
|
-
### Pipeline Sequence (mr-review coordinator)
|
|
709
|
-
|
|
710
|
-
```
|
|
711
|
-
1. listOpenPRs() -> [PR]
|
|
712
|
-
2. Parallel: spawnSession('mr-review-workflow-agentic', goal, workspace) for each PR
|
|
713
|
-
3. awaitSessions(all handles, 30m)
|
|
714
|
-
4. For each handle: getAgentResult -> parse findings
|
|
715
|
-
5. Route:
|
|
716
|
-
- clean -> mergeQueue
|
|
717
|
-
- minor -> spawnSession('coding-task-workflow-agentic', 'Fix: <finding>'), await, re-review (max 3 passes)
|
|
718
|
-
- blocking -> escalation list (Slack + GitLab comment)
|
|
719
|
-
- unknown -> escalation list (conservative default)
|
|
720
|
-
6. mergePR() for each clean PR (serial, pull before each to avoid conflicts)
|
|
721
|
-
7. postSlack(summary)
|
|
722
|
-
```
|
|
723
|
-
|
|
724
|
-
### What This Gives Up
|
|
725
|
-
|
|
726
|
-
- Console DAG tree view: coordinator is not a WorkRail session. Child sessions appear as a flat list. **Mitigation:** parentSessionId is already in the session store; console tree view is the next planned feature and will retroactively make child sessions visible.
|
|
727
|
-
- Coordinator crash durability: script state is in-memory. **Mitigation (v2):** state file at `~/.workrail/coordinator-state/{run-id}.json` written at each phase transition.
|
|
728
|
-
|
|
729
|
-
### Long-Term Direction (Candidate 3)
|
|
730
|
-
|
|
731
|
-
After the first coordinator script runs in production and validates the topology:
|
|
732
|
-
- Build `spawn_agent(blocking: false)` + `await_agents([handles])` native engine tools
|
|
733
|
-
- Build a coordinator as a WorkRail workflow JSON file using these tools
|
|
734
|
-
- This gives: parallel fan-out + console DAG observability + daemon durability + inline notes (no 2-call HTTP)
|
|
735
|
-
- The `AgentResult` bridge type in C4 makes this migration trivial (same result shape)
|
|
154
|
+
## Open Questions for the Main Agent
|
|
736
155
|
|
|
737
|
-
|
|
156
|
+
1. Should `worktrain run pr-review` use a nested commander subcommand (`program.command('run').command('pr-review')`) or a flat command (`program.command('run pr-review')`)? Commander supports both; the existing commands all use flat style -- recommend flat.
|
|
738
157
|
|
|
739
|
-
|
|
740
|
-
2. Should `worktrain spawn` get a `--context` flag? Would simplify coordinator deps for context-passing pipelines.
|
|
741
|
-
3. Should `worktrain await` get a `--include-notes` flag? Would consolidate the 2-call HTTP into the CLI.
|
|
158
|
+
2. For the serial merge sequence: should the coordinator do `git pull` before each `gh pr merge --squash`? Yes -- ensures clean base. But this is a coordinator behavior detail, not an architectural question.
|
|
742
159
|
|
|
743
|
-
|
|
160
|
+
3. Should the coordinator write a full report file (`coordinator-pr-review-YYYY-MM-DD.md`)? Yes, per UX spec. This is a simple file write via CoordinatorDeps.
|
|
744
161
|
|
|
745
|
-
|
|
162
|
+
4. Is `coding-task-workflow-agentic` the correct fix-agent workflow? Yes -- it handles "implement/fix" tasks. Goals use the format `Fix review findings in PR #N: [finding summaries]`.
|
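As a rough illustration of the goal format in question 4, `buildFixGoal` could look like the sketch below. The `ReviewFindings` shape (a list of summary strings), the `; ` separator, and the fallback text for an empty list are assumptions made for this example, not details taken from the package.

```typescript
// Hypothetical shape: only the summaries matter for building the goal string.
interface ReviewFindings {
  readonly summaries: readonly string[];
}

function buildFixGoal(prNumber: number, findings: ReviewFindings): string {
  // Drop blank summaries so the goal does not contain empty segments.
  const summaries = findings.summaries
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // Assumed fallback when the review produced no usable summaries.
  const detail = summaries.length > 0 ? summaries.join('; ') : 'see review notes';
  return `Fix review findings in PR #${prNumber}: ${detail}`;
}
```

Because the function is pure, it can be unit-tested without a daemon or any injected dependency.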