@exaudeus/workrail 3.39.0 → 3.40.0
- package/dist/cli-worktrain.js +50 -26
- package/dist/console-ui/assets/{index-3oXZ_A9m.js → index-CXWCAonr.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/coordinators/pr-review.d.ts +6 -1
- package/dist/coordinators/pr-review.js +60 -5
- package/dist/daemon/workflow-runner.d.ts +3 -2
- package/dist/daemon/workflow-runner.js +6 -3
- package/dist/manifest.json +56 -40
- package/dist/mcp/output-schemas.d.ts +10 -10
- package/dist/mcp/tools.d.ts +12 -12
- package/dist/trigger/trigger-router.js +9 -2
- package/dist/types/workflow-source.d.ts +0 -1
- package/dist/types/workflow-source.js +3 -6
- package/dist/types/workflow.d.ts +1 -1
- package/dist/types/workflow.js +1 -2
- package/dist/v2/durable-core/domain/artifact-contract-validator.js +66 -0
- package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.d.ts +25 -0
- package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.js +31 -0
- package/dist/v2/durable-core/schemas/artifacts/index.d.ts +3 -1
- package/dist/v2/durable-core/schemas/artifacts/index.js +14 -1
- package/dist/v2/durable-core/schemas/artifacts/review-verdict.d.ts +41 -0
- package/dist/v2/durable-core/schemas/artifacts/review-verdict.js +30 -0
- package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +236 -236
- package/dist/v2/durable-core/schemas/session/events.d.ts +50 -50
- package/dist/v2/durable-core/schemas/session/gaps.d.ts +2 -2
- package/dist/v2/durable-core/schemas/session/manifest.d.ts +4 -4
- package/dist/v2/durable-core/schemas/session/outputs.d.ts +8 -8
- package/dist/v2/usecases/console-routes.js +178 -0
- package/docs/design/coordinator-artifact-protocol-design-candidates.md +155 -0
- package/docs/design/coordinator-artifact-protocol-design-review.md +103 -0
- package/docs/design/coordinator-artifact-protocol-implementation-plan.md +259 -0
- package/docs/ideas/backlog.md +158 -100
- package/package.json +1 -1
- package/workflows/mr-review-workflow.agentic.v2.json +5 -1
|
@@ -0,0 +1,259 @@
|
|
|
1
|
+
# Implementation Plan: Coordinator Artifact Protocol
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-04-18
|
|
4
|
+
**Branch:** `feat/coordinator-artifact-protocol`
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Problem Statement
|
|
9
|
+
|
|
10
|
+
The PR review coordinator (`src/coordinators/pr-review.ts`) extracts review severity from completed review sessions by running a keyword scan on free-form step notes. The coordinator ignores the `artifacts[]` field that `GET /api/v2/sessions/:id/nodes/:nodeId` already returns. This makes severity extraction brittle and unmeasurable.
|
|
11
|
+
|
|
12
|
+
The fix: define a `wr.review_verdict` artifact schema, update the final handoff step to emit it, update `getAgentResult()` to return artifacts alongside notes, and update the coordinator to try the artifact path before the keyword scan.
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
## Acceptance Criteria
|
|
17
|
+
|
|
18
|
+
1. `npm run build` completes with 0 TypeScript errors
|
|
19
|
+
2. `tests/unit/coordinator-pr-review.test.ts` passes (all existing tests + new `readVerdictArtifact` tests)
|
|
20
|
+
3. `readVerdictArtifact([{ kind: 'wr.review_verdict', verdict: 'clean', ... }])` returns `{ severity: 'clean', source: 'artifact', ... }`
|
|
21
|
+
4. `readVerdictArtifact([])` returns `null`
|
|
22
|
+
5. `readVerdictArtifact([{ kind: 'wr.review_verdict', verdict: 'INVALID' }])` returns `null` and logs WARN
|
|
23
|
+
6. `CoordinatorDeps.getAgentResult` return type is `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`
|
|
24
|
+
7. `WorkflowRunSuccess` has optional field `lastStepArtifacts?: readonly unknown[]`
|
|
25
|
+
8. `mr-review-workflow.agentic.v2.json` phase-6-final-handoff has `outputContract: { contractRef: 'wr.contracts.review_verdict', required: false }`
|
|
26
|
+
9. `isValidContractRef('wr.contracts.review_verdict')` returns `true`
|
|
27
|
+
10. `validateArtifactContract([{ kind: 'wr.review_verdict', verdict: 'clean', ... }], { contractRef: 'wr.contracts.review_verdict' })` returns `{ valid: true, artifact: ... }`
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Non-Goals
|
|
32
|
+
|
|
33
|
+
- Do NOT add a `/api/v2/sessions/:id/artifacts` server-side aggregation endpoint
|
|
34
|
+
- Do NOT change `required: false` to `required: true` (post-graduation decision)
|
|
35
|
+
- Do NOT remove the keyword-scan fallback from `parseFindingsFromNotes`
|
|
36
|
+
- Do NOT add a `coordinatorProtocol` field to the workflow JSON (deferred)
|
|
37
|
+
- Do NOT add artifacts to `spawn_agent` return value (post-MVP)
|
|
38
|
+
- Do NOT make `source` required on `ReviewFindings` (breaking change deferred)
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
## Philosophy Constraints
|
|
43
|
+
|
|
44
|
+
- **Make illegal states unrepresentable:** `verdict`, `source`, `confidence` use closed enums
|
|
45
|
+
- **Validate at boundaries:** Zod `safeParse` in `readVerdictArtifact()`; engine validation via `validateArtifactContract()`
|
|
46
|
+
- **Errors are data:** `readVerdictArtifact()` returns `ReviewFindings | null` rather than throwing
|
|
47
|
+
- **Functional/declarative:** `readVerdictArtifact()` is a pure function
|
|
48
|
+
- **Prefer fakes over mocks:** New tests use `makeFakeDeps()` pattern
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Invariants
|
|
53
|
+
|
|
54
|
+
1. `required: false` in outputContract -- never block sessions during transition
|
|
55
|
+
2. Schema registration (`ARTIFACT_CONTRACT_REFS`) MUST be done before workflow JSON update (compiler validates at load time via `isValidContractRef()`)
|
|
56
|
+
3. Keyword-scan fallback MUST remain live in `parseFindingsFromNotes`
|
|
57
|
+
4. All call sites of `CoordinatorDeps.getAgentResult` MUST handle `{ recapMarkdown, artifacts }` shape
|
|
58
|
+
5. `readVerdictArtifact()` MUST log `[WARN coord:reason=artifact_parse_failed]` when kind matches but safeParse fails
|
|
59
|
+
6. Per-node HTTP fetch failures MUST be caught individually (not by outer try/catch)
|
|
60
|
+
7. `makeContinueWorkflowTool` AND `makeCompleteStepTool` MUST both pass artifacts to `onComplete`
|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Selected Approach
|
|
65
|
+
|
|
66
|
+
**Candidate A:** Three ordered changes, all additive, following existing repo patterns exactly.
|
|
67
|
+
|
|
68
|
+
**Rationale:** Zero new infrastructure; follows `loop-control.ts` schema pattern; follows `WorkflowRunSuccess.lastStepNotes` conditional spread pattern; follows `makeFakeDeps()` testing pattern; backward compatible via `required: false` + keyword-scan fallback.
|
|
69
|
+
|
|
70
|
+
**Runner-up:** Tip-node-only artifact read. Disqualified by task spec 'CRITICAL: must aggregate artifacts across ALL session nodes'.
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
## Slices
|
|
75
|
+
|
|
76
|
+
### Slice 1: Schema registration (prerequisite for all other changes)
|
|
77
|
+
|
|
78
|
+
**Files:**
|
|
79
|
+
- `src/v2/durable-core/schemas/artifacts/review-verdict.ts` (NEW)
|
|
80
|
+
- `src/v2/durable-core/schemas/artifacts/index.ts` (update)
|
|
81
|
+
- `src/v2/durable-core/domain/artifact-contract-validator.ts` (update)
|
|
82
|
+
|
|
83
|
+
**Work:**
|
|
84
|
+
1. Create `review-verdict.ts` following `loop-control.ts` pattern:
|
|
85
|
+
- `REVIEW_VERDICT_CONTRACT_REF = 'wr.contracts.review_verdict' as const`
|
|
86
|
+
- `ReviewVerdictArtifactV1Schema = z.object({ kind: z.literal('wr.review_verdict'), verdict: z.enum(['clean', 'minor', 'blocking']), confidence: z.enum(['high', 'medium', 'low']), findings: z.array(z.object({ severity: z.enum(['critical', 'major', 'minor', 'nit']), summary: z.string().min(1) }).strict()), summary: z.string().min(1) }).strict()`
|
|
87
|
+
- `isReviewVerdictArtifact()` type guard
|
|
88
|
+
- `parseReviewVerdictArtifact()` convenience function
|
|
89
|
+
2. Update `index.ts`: export all new symbols, add `'wr.contracts.review_verdict'` to `ARTIFACT_CONTRACT_REFS`
|
|
90
|
+
3. Update `artifact-contract-validator.ts`: import new symbols, add `case REVIEW_VERDICT_CONTRACT_REF:` to switch with `validateReviewVerdictContract()` helper
|
|
91
|
+
|
|
92
|
+
**Done when:** `isValidContractRef('wr.contracts.review_verdict')` returns `true`; `validateArtifactContract([{ kind: 'wr.review_verdict', ... }], { contractRef: 'wr.contracts.review_verdict' })` returns `{ valid: true, artifact: ... }`.
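
The accept/reject rules of the schema in Work item 1 can be illustrated without Zod. The sketch below is a hand-rolled stand-in, not the actual `ReviewVerdictArtifactV1Schema`; `isStrictReviewVerdict`, `hasOnlyKeys`, and `oneOf` are hypothetical names used only to show what closed enums plus `.strict()` are expected to allow.

```typescript
// Assumption: the real implementation uses z.object(...).strict(); this only mirrors its rules.
const VERDICTS = ['clean', 'minor', 'blocking'] as const;
const CONFIDENCES = ['high', 'medium', 'low'] as const;
const SEVERITIES = ['critical', 'major', 'minor', 'nit'] as const;

type Finding = { severity: (typeof SEVERITIES)[number]; summary: string };
type ReviewVerdict = {
  kind: 'wr.review_verdict';
  verdict: (typeof VERDICTS)[number];
  confidence: (typeof CONFIDENCES)[number];
  findings: Finding[];
  summary: string;
};

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Mirrors `.strict()`: any key outside the allowed list causes rejection.
function hasOnlyKeys(obj: Record<string, unknown>, allowed: readonly string[]): boolean {
  return Object.keys(obj).every((k) => allowed.indexOf(k) !== -1);
}

// Closed-enum check: the value must be one of the listed strings.
function oneOf(allowed: readonly string[], value: unknown): boolean {
  return typeof value === 'string' && allowed.indexOf(value) !== -1;
}

// Mirrors `z.string().min(1)`.
function isNonEmptyString(v: unknown): v is string {
  return typeof v === 'string' && v.length >= 1;
}

function isFinding(v: unknown): v is Finding {
  if (!isRecord(v)) return false;
  return (
    hasOnlyKeys(v, ['severity', 'summary']) &&
    oneOf(SEVERITIES, v['severity']) &&
    isNonEmptyString(v['summary'])
  );
}

function isStrictReviewVerdict(v: unknown): v is ReviewVerdict {
  if (!isRecord(v)) return false;
  if (!hasOnlyKeys(v, ['kind', 'verdict', 'confidence', 'findings', 'summary'])) return false;
  const findings = v['findings'];
  return (
    v['kind'] === 'wr.review_verdict' &&
    oneOf(VERDICTS, v['verdict']) &&
    oneOf(CONFIDENCES, v['confidence']) &&
    Array.isArray(findings) &&
    findings.every(isFinding) &&
    isNonEmptyString(v['summary'])
  );
}
```

An artifact with an extra key or an out-of-enum `verdict` is rejected here, which matches the behavior the Risk Register expects from the strict schema.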
|
|
93
|
+
|
|
94
|
+
---
|
|
95
|
+
|
|
96
|
+
### Slice 2: Fix onComplete callback signature
|
|
97
|
+
|
|
98
|
+
**Files:**
|
|
99
|
+
- `src/daemon/workflow-runner.ts`
|
|
100
|
+
|
|
101
|
+
**Work:**
|
|
102
|
+
1. Change `onComplete` closure definition (line 2096) from `(notes: string | undefined): void` to `(notes: string | undefined, artifacts?: readonly unknown[]): void`
|
|
103
|
+
2. Add `let lastStepArtifacts: readonly unknown[] | undefined;` near `let lastStepNotes`
|
|
104
|
+
3. Update `onComplete` body to set `lastStepArtifacts = artifacts`
|
|
105
|
+
4. Add `lastStepArtifacts?: readonly unknown[]` to `WorkflowRunSuccess` interface
|
|
106
|
+
5. Update the `onComplete(notes)` call in `makeCompleteStepTool` (line 1249) to `onComplete(notes, params.artifacts as readonly unknown[] | undefined)`
|
|
107
|
+
6. Update the `onComplete(params.notesMarkdown)` call in `makeContinueWorkflowTool` (line 1046) to `onComplete(params.notesMarkdown, params.artifacts as readonly unknown[] | undefined)`
|
|
108
|
+
7. Update the final `return` in `runWorkflow()` (line 2622) to spread `lastStepArtifacts` conditionally
|
|
109
|
+
|
|
110
|
+
**Done when:** `WorkflowRunSuccess` has `lastStepArtifacts` field; both tool factory call sites pass artifacts; `npm run build` passes.
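
A minimal sketch of the closure capture and conditional spread described in this slice. `runWorkflowSketch` and `StepDriver` are hypothetical stand-ins for the internals of `runWorkflow()`, which this plan does not show; only `onComplete`, `lastStepNotes`, `lastStepArtifacts`, and `WorkflowRunSuccess` come from the text above, and the `kind` field is an assumption.

```typescript
interface WorkflowRunSuccess {
  readonly kind: 'success'; // assumed discriminant, for illustration only
  readonly lastStepNotes?: string;
  readonly lastStepArtifacts?: readonly unknown[];
}

type OnComplete = (notes: string | undefined, artifacts?: readonly unknown[]) => void;

// Hypothetical: stands in for the tool factories that eventually call onComplete.
type StepDriver = (onComplete: OnComplete) => void;

function runWorkflowSketch(drive: StepDriver): WorkflowRunSuccess {
  let lastStepNotes: string | undefined;
  let lastStepArtifacts: readonly unknown[] | undefined;

  // The closure records whatever the final step reported.
  const onComplete: OnComplete = (notes, artifacts) => {
    lastStepNotes = notes;
    lastStepArtifacts = artifacts;
  };

  drive(onComplete);

  // Conditional spread: optional keys are omitted entirely when nothing was captured.
  return {
    kind: 'success',
    ...(lastStepNotes !== undefined ? { lastStepNotes } : {}),
    ...(lastStepArtifacts !== undefined ? { lastStepArtifacts } : {}),
  };
}
```

Because the second parameter is optional, a call site that still passes only notes keeps compiling, which is why a missed call site fails silently rather than at build time.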
|
|
111
|
+
|
|
112
|
+
---
|
|
113
|
+
|
|
114
|
+
### Slice 3: Update getAgentResult to return artifacts
|
|
115
|
+
|
|
116
|
+
**Files:**
|
|
117
|
+
- `src/cli-worktrain.ts`
|
|
118
|
+
|
|
119
|
+
**Work:**
|
|
120
|
+
1. Change the return type of `getAgentResult: async (sessionHandle: string)` from `Promise<string | null>` to `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`
|
|
121
|
+
2. In the implementation body:
|
|
122
|
+
- After reading `runs[0]`, read `runs[0].nodes` as `Array<{ nodeId: string; [key: string]: unknown }>` (with null check)
|
|
123
|
+
- Walk all nodes, fetch each node detail with individual `try/catch`:
|
|
124
|
+
```
|
|
125
|
+
for (const node of nodes) {
|
|
126
|
+
try {
|
|
127
|
+
const nodeRes = await fetch(nodeUrl + '/' + node.nodeId)
|
|
128
|
+
// collect artifacts from nodeData['artifacts']
|
|
129
|
+
} catch { /* log WARN, continue */ }
|
|
130
|
+
}
|
|
131
|
+
```
|
|
132
|
+
- Return `{ recapMarkdown: recap, artifacts: collectedArtifacts }` (or `{ recapMarkdown: null, artifacts: [] }` on failure)
|
|
133
|
+
3. Early-return failures must also return `{ recapMarkdown: null, artifacts: [] }` instead of `null`
|
|
134
|
+
|
|
135
|
+
**Done when:** Return type is `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`; TypeScript compile-time errors at call sites force updates.
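
The per-node aggregation in Work item 2 could look roughly like the sketch below. `collectArtifacts`, `FetchNodeDetail`, and the `warn` callback are hypothetical; the fetcher is injected in place of the real HTTP call to `GET /api/v2/sessions/:id/nodes/:nodeId` so the error-isolation behavior can be shown on its own.

```typescript
type NodeRef = { nodeId: string };

// Hypothetical injected dependency standing in for the per-node HTTP fetch.
type FetchNodeDetail = (nodeId: string) => Promise<{ artifacts?: unknown }>;

async function collectArtifacts(
  nodes: readonly NodeRef[] | undefined,
  fetchNodeDetail: FetchNodeDetail,
  warn: (message: string) => void,
): Promise<readonly unknown[]> {
  const collected: unknown[] = [];
  // `nodes ?? []` is the null check: a missing node list yields an empty result.
  for (const node of nodes ?? []) {
    try {
      const detail = await fetchNodeDetail(node.nodeId);
      if (Array.isArray(detail.artifacts)) {
        collected.push(...detail.artifacts);
      }
    } catch (err) {
      // One failing node must not abort the walk; log and continue.
      warn(`node ${node.nodeId} fetch failed: ${String(err)}`);
    }
  }
  return collected;
}
```

With the `try/catch` inside the loop, a failure on one node loses only that node's artifacts; an outer `try/catch` would discard everything collected so far.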
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
### Slice 4: Update coordinator to use artifact path
|
|
140
|
+
|
|
141
|
+
**Files:**
|
|
142
|
+
- `src/coordinators/pr-review.ts`
|
|
143
|
+
|
|
144
|
+
**Work:**
|
|
145
|
+
1. Import `ReviewVerdictArtifactV1Schema` from artifacts schema
|
|
146
|
+
2. Update `CoordinatorDeps.getAgentResult` return type to match new shape
|
|
147
|
+
3. Add `source?: 'artifact' | 'keyword_scan'` to `ReviewFindings` interface
|
|
148
|
+
4. Add `readVerdictArtifact(artifacts: readonly unknown[]): ReviewFindings | null` pure function:
|
|
149
|
+
- Walk artifacts array
|
|
150
|
+
- For each, check `(raw as any).kind === 'wr.review_verdict'`
|
|
151
|
+
- If kind matches, call `ReviewVerdictArtifactV1Schema.safeParse(raw)`
|
|
152
|
+
- On success: return `{ severity: v.verdict, findingSummaries: v.findings.map(f => f.summary), raw: JSON.stringify(v), source: 'artifact' }`
|
|
153
|
+
- On failure: log `[WARN coord:reason=artifact_parse_failed]`, continue to next artifact
|
|
154
|
+
- If no valid artifact found and artifacts.length > 0: log `[INFO coord:source=keyword_scan reason=no_valid_artifact artifactCount=N]`
|
|
155
|
+
- Return `null`
|
|
156
|
+
5. Update both call sites in `runPrReviewCoordinator()`:
|
|
157
|
+
- `const { recapMarkdown: notes, artifacts } = await deps.getAgentResult(handle);`
|
|
158
|
+
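- `const artifactFindings = readVerdictArtifact(artifacts); const findingsResult = artifactFindings ? ok(artifactFindings) : parseFindingsFromNotes(notes);` (call `readVerdictArtifact` once so WARN/INFO lines are not logged twice)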
|
|
159
|
+
- Log `[INFO coord:source=artifact]` or `[INFO coord:source=keyword_scan]`
|
|
160
|
+
6. Add divergence check (O2): if artifact verdict and keyword-scan severity disagree, log WARN
|
|
161
|
+
7. Update traceability JSON block to include `source` field
|
|
162
|
+
|
|
163
|
+
**Done when:** Coordinator tries artifact path first; keyword-scan fallback works; logging emits; `npm run build` passes.
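
A sketch of the control flow in Work item 4, under stated assumptions: `parseVerdict` is a simplified stand-in for `ReviewVerdictArtifactV1Schema.safeParse` (it checks only `verdict` and `findings`), and the injected `log` parameter is hypothetical, added here so the logging can be observed in a test.

```typescript
type Severity = 'clean' | 'minor' | 'blocking';

interface ReviewFindings {
  severity: Severity;
  findingSummaries: string[];
  raw: string;
  source?: 'artifact' | 'keyword_scan';
}

interface Verdict {
  kind: 'wr.review_verdict';
  verdict: Severity;
  findings: { summary: string }[];
}

function hasKind(raw: unknown): raw is { kind: unknown } {
  return typeof raw === 'object' && raw !== null && 'kind' in raw;
}

// Simplified stand-in for the Zod safeParse call; returns null on failure.
function parseVerdict(raw: unknown): Verdict | null {
  const r = raw as Partial<Verdict>;
  const verdictOk = r.verdict === 'clean' || r.verdict === 'minor' || r.verdict === 'blocking';
  return verdictOk && Array.isArray(r.findings) ? (r as Verdict) : null;
}

function readVerdictArtifact(
  artifacts: readonly unknown[],
  log: (line: string) => void = console.warn, // hypothetical injection point
): ReviewFindings | null {
  for (const raw of artifacts) {
    if (!hasKind(raw) || raw.kind !== 'wr.review_verdict') continue;
    const v = parseVerdict(raw);
    if (v === null) {
      log('[WARN coord:reason=artifact_parse_failed]');
      continue; // a later artifact may still be valid
    }
    return {
      severity: v.verdict,
      findingSummaries: v.findings.map((f) => f.summary),
      raw: JSON.stringify(v),
      source: 'artifact',
    };
  }
  if (artifacts.length > 0) {
    log(`[INFO coord:source=keyword_scan reason=no_valid_artifact artifactCount=${artifacts.length}]`);
  }
  return null;
}
```

A `null` result is the signal for the caller to fall back to `parseFindingsFromNotes(notes)`; the first valid artifact wins even when an invalid one precedes it.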
|
|
164
|
+
|
|
165
|
+
---
|
|
166
|
+
|
|
167
|
+
### Slice 5: Update mr-review workflow
|
|
168
|
+
|
|
169
|
+
**Files:**
|
|
170
|
+
- `workflows/mr-review-workflow.agentic.v2.json`
|
|
171
|
+
|
|
172
|
+
**Work:**
|
|
173
|
+
1. In `phase-6-final-handoff` step, add `outputContract: { "contractRef": "wr.contracts.review_verdict", "required": false }`
|
|
174
|
+
2. Append to the step `prompt` field the artifact emission instruction:
|
|
175
|
+
```
|
|
176
|
+
\n\nAfter completing your notes, emit a structured verdict via complete_step artifacts[] parameter. Use exactly this schema:\n{ "kind": "wr.review_verdict", "verdict": "clean|minor|blocking", "confidence": "high|medium|low", "findings": [{ "severity": "critical|major|minor|nit", "summary": "one-line description" }], "summary": "one-line overall summary" }\nFor a clean review with no findings, use findings: [].
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
**Done when:** Workflow JSON validates via `npm run build`; `isValidContractRef('wr.contracts.review_verdict')` returns `true` (prerequisite: Slice 1 must be done first).
|
|
180
|
+
|
|
181
|
+
---
|
|
182
|
+
|
|
183
|
+
### Slice 6: Tests
|
|
184
|
+
|
|
185
|
+
**Files:**
|
|
186
|
+
- `tests/unit/coordinator-pr-review.test.ts`
|
|
187
|
+
|
|
188
|
+
**Work:**
|
|
189
|
+
1. Update `makeFakeDeps()` to return `{ recapMarkdown: string | null; artifacts: readonly unknown[] }` from `getAgentResult` (change return type from `string | null`)
|
|
190
|
+
2. Optionally add `source: 'artifact'` or `source: 'keyword_scan'` to the `ReviewFindings` literal objects in the `buildFixGoal` tests; because `source?` is optional, no update is strictly required
|
|
191
|
+
3. Add new `describe('readVerdictArtifact')` block:
|
|
192
|
+
- `it('returns ReviewFindings with source artifact for valid artifact')`
|
|
193
|
+
- `it('returns null for invalid schema (wrong verdict enum)')`
|
|
194
|
+
- `it('returns null for empty artifacts array')`
|
|
195
|
+
- `it('returns null for artifact with different kind')`
|
|
196
|
+
- `it('returns first valid artifact when multiple present')`
|
|
197
|
+
4. Import `readVerdictArtifact` from `pr-review.js`
|
|
198
|
+
|
|
199
|
+
**Done when:** All existing tests pass; 5 new `readVerdictArtifact` tests pass.
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## Test Design
|
|
204
|
+
|
|
205
|
+
**Unit tests (pure function):**
|
|
206
|
+
- `readVerdictArtifact` with valid `wr.review_verdict` artifact -> returns `ReviewFindings` with `severity` mapped from `verdict`, `source: 'artifact'`
|
|
207
|
+
- `readVerdictArtifact` with invalid schema (wrong enum) -> returns `null`
|
|
208
|
+
- `readVerdictArtifact` with empty array -> returns `null`
|
|
209
|
+
- `readVerdictArtifact` with artifact of different `kind` -> returns `null` (no false positives)
|
|
210
|
+
- `readVerdictArtifact` with valid + invalid artifacts -> returns valid one (first match wins)
|
|
211
|
+
|
|
212
|
+
**Integration tests (fake deps):**
|
|
213
|
+
- Existing `runPrReviewCoordinator` tests must pass with updated `getAgentResult` return type
|
|
214
|
+
- The fake `getAgentResult` returns `{ recapMarkdown: 'APPROVE ...', artifacts: [] }` by default
|
|
215
|
+
|
|
216
|
+
---
|
|
217
|
+
|
|
218
|
+
## Risk Register
|
|
219
|
+
|
|
220
|
+
| Risk | Likelihood | Impact | Mitigation |
|
|
221
|
+
|------|-----------|--------|------------|
|
|
222
|
+
| Missing `makeContinueWorkflowTool` onComplete update | Low | Silent -- artifacts not forwarded from continue_workflow path | Manual verification; code comment at both call sites |
|
|
223
|
+
| Per-node HTTP fetch error aborting aggregation | Low | Graceful fallback to keyword scan | Per-node try/catch (Slice 3 R2) |
|
|
224
|
+
| LLM emits extra fields in artifact (rejected by `.strict()`) | Medium | Zod parse fails -> WARN log -> keyword-scan fallback | Acceptable during `required: false` transition |
|
|
225
|
+
| `runs[0].nodes` undefined or empty | Low | Empty artifact array -> keyword scan fallback | Null check in Slice 3 |
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## PR Packaging Strategy
|
|
230
|
+
|
|
231
|
+
Single PR: `feat/coordinator-artifact-protocol`
|
|
232
|
+
|
|
233
|
+
All 6 slices in one PR. Changes are tightly coupled (schema + validator + coordinator must be consistent). Splitting the work into multiple PRs would require interface stubs that add noise.
|
|
234
|
+
|
|
235
|
+
**PR description structure:**
|
|
236
|
+
1. Summary: what was done and why
|
|
237
|
+
2. Change 1 (schema), Change 2 (onComplete), Change 3 (coordinator + workflow)
|
|
238
|
+
3. Test plan: `npm run build`, `npx vitest run tests/unit/coordinator-pr-review.test.ts`
|
|
239
|
+
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
## Philosophy Alignment
|
|
243
|
+
|
|
244
|
+
| Slice | Principle | Status |
|
|
245
|
+
|-------|-----------|--------|
|
|
246
|
+
| 1 (schema) | Make illegal states unrepresentable | Satisfied -- closed enums, kind literal |
|
|
247
|
+
| 1 (schema) | Validate at boundaries | Satisfied -- Zod strict schema |
|
|
248
|
+
| 2 (onComplete) | Immutability by default | Satisfied -- `readonly unknown[]` |
|
|
249
|
+
| 3 (getAgentResult) | Errors are data | Satisfied -- returns `{ recapMarkdown: null, artifacts: [] }` not null |
|
|
250
|
+
| 4 (coordinator) | Functional/declarative | Satisfied -- `readVerdictArtifact()` is pure |
|
|
251
|
+
| 4 (coordinator) | Make illegal states unrepresentable | Tension -- `source?` optional; accepted tradeoff |
|
|
252
|
+
| 6 (tests) | Prefer fakes over mocks | Satisfied -- `makeFakeDeps()` pattern |
|
|
253
|
+
|
|
254
|
+
---
|
|
255
|
+
|
|
256
|
+
## planConfidenceBand: High
|
|
257
|
+
|
|
258
|
+
- unresolvedUnknownCount: 0
|
|
259
|
+
- followUpTickets: Y1 (make source required post-graduation), Y2 (remove keyword scan post-graduation), spawn_agent artifacts gap (post-MVP)
|
package/docs/ideas/backlog.md
CHANGED
|
@@ -5700,136 +5700,194 @@ Tested empirically today. This is what actually works, not what's specced.
|
|
|
5700
5700
|
|
|
5701
5701
|
---
|
|
5702
5702
|
|
|
5703
|
-
###
|
|
5703
|
+
### Autonomous feature development: scope → breakdown → parallel execution → merge (Apr 18, 2026)
|
|
5704
5704
|
|
|
5705
|
-
**The
|
|
5705
|
+
**The vision:** give WorkTrain a feature scope -- from a vague idea to a fully groomed ticket -- and it figures out the rest. Discovery if needed, design if needed, breakdown into parallel slices, execution across worktrees, context management across agents, bringing it all back together.
|
|
5706
5706
|
|
|
5707
|
-
**
|
|
5707
|
+
**The four pillars the user cares about:**
|
|
5708
|
+
1. **Autonomy** -- WorkTrain takes a scope and figures out the work breakdown without hand-holding
|
|
5709
|
+
2. **Quality** -- comes FROM autonomy + workflow enforcement + coordination. Each slice goes through the right phases.
|
|
5710
|
+
3. **Throughput** -- parallel slices across worktrees simultaneously. N agents working while you focus elsewhere.
|
|
5711
|
+
4. **Visibility** -- one coherent work unit you can track at a glance, not N unrelated sessions in a flat list.
|
|
5708
5712
|
|
|
5709
|
-
**
|
|
5713
|
+
**The pipeline for a scope:**
|
|
5710
5714
|
|
|
5711
|
-
|
|
5712
|
-
|
|
5713
|
-
|
|
5714
|
-
|
|
5715
|
-
-
|
|
5715
|
+
```
|
|
5716
|
+
Input: "add GitHub polling support" (any level of definition -- idea to full spec)
|
|
5717
|
+
│
|
|
5718
|
+
├── [if vague] ideation + spec authoring → output: BRD / acceptance criteria
|
|
5719
|
+
├── classify-task → taskComplexity, hasUI, touchesArchitecture, taskMaturity
|
|
5720
|
+
├── [if Medium/Large] discovery → context bundle, invariants, candidate files
|
|
5721
|
+
├── [if touchesArchitecture] design → candidates, review, selected approach
|
|
5722
|
+
├── breakdown → parallel slices with dependency graph
|
|
5723
|
+
│ ├── Slice 1: types + schema (worktree A)
|
|
5724
|
+
│ ├── Slice 2: polling adapter (worktree B, depends: 1)
|
|
5725
|
+
│ ├── Slice 3: scheduler integration (worktree C, depends: 2)
|
|
5726
|
+
│ └── Slice 4: tests (worktree D, depends: 1-3)
|
|
5727
|
+
├── [parallel execution] each slice: implement → review → (fix if needed) → approved
|
|
5728
|
+
├── [serial integration] merge slices in dependency order, verify after each
|
|
5729
|
+
└── [final] integration test → PR created → notification to user
|
|
5730
|
+
```
|
|
5716
5731
|
|
|
5717
|
-
**
|
|
5718
|
-
-
|
|
5719
|
-
-
|
|
5720
|
-
-
|
|
5721
|
-
-
|
|
5732
|
+
**Context management across agents:**
|
|
5733
|
+
- Coordinator maintains a "work unit manifest": current phase, slice status, shared invariants, decisions made in design phase
|
|
5734
|
+
- Each spawned agent receives a context bundle: relevant portion of the manifest + files it needs + decisions from upstream phases
|
|
5735
|
+
- Agents don't rediscover what the coordinator already knows
|
|
5736
|
+
- After each agent completes, its findings update the manifest (new invariants found, scope changes, follow-up tickets)
|
|
5722
5737
|
|
|
5723
|
-
**
|
|
5724
|
-
-
|
|
5725
|
-
-
|
|
5726
|
-
-
|
|
5738
|
+
**Worktree coordination:**
|
|
5739
|
+
- Each slice gets its own worktree (already done via `--isolation worktree`)
|
|
5740
|
+
- Coordinator tracks which files each slice touches -- detects conflicts before they happen
|
|
5741
|
+
- Independent slices run in parallel; dependent slices queue automatically
|
|
5742
|
+
- Merge order follows the dependency graph, not wall-clock completion time
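
One way to derive a merge order from the dependency graph is a simple wave-by-wave topological sort. This is a sketch under assumptions: the backlog does not define a data model, so `Slice` and `mergeOrder` are hypothetical names.

```typescript
interface Slice {
  id: string;
  dependsOn: readonly string[];
}

// Each round emits every pending slice whose dependencies are already merged.
// Slices emitted in the same round are independent of each other.
function mergeOrder(slices: readonly Slice[]): string[] {
  const merged: string[] = [];
  let pending = slices.slice();
  while (pending.length > 0) {
    const ready = pending.filter((s) => s.dependsOn.every((d) => merged.indexOf(d) !== -1));
    if (ready.length === 0) {
      // Nothing can progress: a cycle, or a dependency that is not in the input.
      throw new Error('dependency cycle or unknown dependency');
    }
    for (const s of ready) merged.push(s.id);
    pending = pending.filter((s) => ready.indexOf(s) === -1);
  }
  return merged;
}
```

The order depends only on declared dependencies, so a slice that finishes early still waits for its upstream slices to merge first.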
|
|
5727
5743
|
|
|
5728
|
-
**
|
|
5729
|
-
-
|
|
5730
|
-
-
|
|
5731
|
-
-
|
|
5732
|
-
- All of this happens automatically, without user intervention
|
|
5744
|
+
**Knowing when to spawn a new main agent:**
|
|
5745
|
+
- When a slice is too large or discovers unexpected scope, it requests a breakdown from the coordinator
|
|
5746
|
+
- When a review finds a Critical finding, the coordinator spawns a dedicated fix agent with the finding + relevant context
|
|
5747
|
+
- When integration reveals a regression, coordinator spawns an investigation agent before retrying the merge
|
|
5733
5748
|
|
|
5734
|
-
**
|
|
5735
|
-
-
|
|
5736
|
-
-
|
|
5737
|
-
-
|
|
5738
|
-
-
|
|
5749
|
+
**The coordinator's job (what stays in scripts, not LLM):**
|
|
5750
|
+
- Maintain the manifest (JSON file, append-only)
|
|
5751
|
+
- Compute the dependency graph
|
|
5752
|
+
- Decide parallelism vs serialization
|
|
5753
|
+
- Route: clean → merge, minor findings → fix agent, critical → escalate
|
|
5754
|
+
- Track worktrees, detect conflicts
|
|
5755
|
+
- Sequence the merge order
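
The routing rule above (clean → merge, minor findings → fix agent, critical → escalate) is deterministic and fits in a small script-side function. A hedged sketch; `routeByVerdict`, `ReviewOutcome`, and the action names are hypothetical and not taken from the codebase.

```typescript
type ReviewOutcome = 'clean' | 'minor' | 'critical';

type Action =
  | { kind: 'merge' }
  | { kind: 'spawn_fix_agent'; findings: readonly string[] }
  | { kind: 'escalate'; reason: string };

function routeByVerdict(outcome: ReviewOutcome, findings: readonly string[]): Action {
  switch (outcome) {
    case 'clean':
      return { kind: 'merge' };
    case 'minor':
      return { kind: 'spawn_fix_agent', findings };
    case 'critical':
      return { kind: 'escalate', reason: findings.join('; ') || 'critical finding' };
    default: {
      // Exhaustiveness guard: adding a new outcome without a case fails to compile.
      const unreachable: never = outcome;
      throw new Error(`unhandled outcome: ${String(unreachable)}`);
    }
  }
}
```

Keeping this in a script rather than in a prompt means the same review outcome always produces the same action.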
|
|
5739
5756
|
|
|
5740
|
-
**
|
|
5741
|
-
-
|
|
5742
|
-
-
|
|
5743
|
-
-
|
|
5757
|
+
**What requires LLM cognition:**
|
|
5758
|
+
- Discovery (what are the invariants, which files matter)
|
|
5759
|
+
- Design (which approach, what tradeoffs)
|
|
5760
|
+
- Implementation (write the code)
|
|
5761
|
+
- Review (is this correct and complete)
|
|
5762
|
+
- Breakdown (what are the right slice boundaries)
|
|
5744
5763
|
|
|
5745
|
-
**
|
|
5746
|
-
-
|
|
5747
|
-
- Phase 2: kill on detection -- stop agents immediately when tooling failure detected. No more unverified output reaching main.
|
|
5748
|
-
- Phase 3: auto-resume -- restart and resume for recoverable failures.
|
|
5749
|
-
- Phase 4: full self-heal loop -- diagnose, fix, reboot, resume automatically.
|
|
5764
|
+
**The minimum viable version:**
|
|
5765
|
+
A coordinator that handles a Medium/Small scoped task (already classified, no need for ideation or design). Takes 2-4 parallel slices, runs them, reviews each, merges when clean. No escalation handling in v1 -- if anything fails, notify the user.
|
|
5750
5766
|
|
|
5751
|
-
|
|
5767
|
+
This is the thing that makes WorkTrain feel like a senior engineer taking ownership of a task, not a tool you have to supervise step by step.
|
|
5752
5768
|
|
|
5753
5769
|
---
|
|
5754
5770
|
|
|
5755
|
-
###
|
|
5771
|
+
### Coordinator design decision: MVP-first, generalize after (Apr 18, 2026)
|
|
5756
5772
|
|
|
5757
|
-
|
|
5773
|
+
**Decision:** Build the first coordinator as a PR review-specific script. Generalize to a reusable coordinator framework after proving it works end-to-end.
|
|
5758
5774
|
|
|
5759
|
-
**
|
|
5775
|
+
**Rationale:** Three discovery runs all converged on the architecture (TypeScript script, `CoordinatorDeps` interface, 2-call HTTP for notes). The risk is over-engineering for hypothetical pipelines before validating the real one. PR review is the highest-value first use case with a clear success criterion.
|
|
5760
5776
|
|
|
5761
|
-
|
|
5762
|
-
|
|
5763
|
-
|
|
5764
|
-
|
|
5765
|
-
|
|
5766
|
-
|
|
5767
|
-
|
|
5768
|
-
|
|
5769
|
-
|
|
5770
|
-
|
|
5771
|
-
|
|
5772
|
-
|
|
5773
|
-
|
|
5774
|
-
|
|
5775
|
-
|
|
5776
|
-
|
|
5777
|
-
|
|
5778
|
-
|
|
5779
|
-
|
|
5780
|
-
|
|
5781
|
-
|
|
5777
|
+
**The generic coordinator architecture is already designed** (see `docs/discovery/coordinator-script-design.md`). The `CoordinatorDeps` interface and `AgentResult` bridge type make migration to a generic coordinator trivial -- the PR review script uses these types, so generalizing is additive, not a rewrite.
|
|
5778
|
+
|
|
5779
|
+
**Migration path:** once PR review coordinator is proven in production, extract the routing logic (`parseFindings`, `routeByFindings`) and `CoordinatorDeps` interface into `src/coordinators/base.ts`. The PR review coordinator becomes one implementation of the base pattern.
|
|
5780
|
+
|
|
5781
|
+
---
|
|
5782
|
+
|
|
5783
|
+
### Architecture decisions from Apr 17-18 sessions (to record before files are cleaned up)
|
|
5784
|
+
|
|
5785
|
+
**Decision 1: Structured output + tool calls can coexist (Apr 18)**
|
|
5786
|
+
Validated empirically via integration test. The beta API (`client.beta.messages.create()`) supports both JSON schema enforcement AND tool calls in the same request. Schema enforcement applies at `end_turn` only. Bedrock is more consistent than direct Anthropic API for system-prompt fallback behavior. This opens a future path for replacing `complete_step` with structured output, but `complete_step` remains the chosen primitive for now.
|
|
5787
|
+
|
|
5788
|
+
**Decision 2: `complete_step` is the preferred daemon workflow-control primitive (Apr 18)**
|
|
5789
|
+
PR #569 merged. The daemon holds the continueToken in a closure; LLM calls `complete_step(notes)` and never handles the token directly. Structured output (`beta.messages.create` with JSON schema) was evaluated as an alternative and deferred -- it's a viable migration path for a future version but adds API complexity today. Follow-up: track a structured output migration as a future improvement, not a current priority.
|
|
5790
|
+
|
|
5791
|
+
**Decision 3: AgentLoop error handling contract -- FatalToolError (Apr 16)**
|
|
5792
|
+
`FatalToolError` subclass selected for distinguishing recoverable from non-recoverable tool failures in the AgentLoop. The contract: user-facing tools (Bash, Read, Write) catch failures and return `isError: true` in the tool_result (loop continues, LLM can retry). Coordination tools with unrecoverable failures (session store corruption, token decode failure) throw `FatalToolError` -- `_executeTools` instanceof-checks this and kills the session rather than surfacing a confusing error to the LLM. This contract is part of the AgentLoop architecture and must be followed by any new tool implementations.
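
A simplified sketch of this contract. Only the `FatalToolError` name comes from the decision above; `Tool`, `ToolResult`, and `executeTool` are hypothetical, and the sketch collapses the per-tool catch into the executor for brevity, which may differ from how `_executeTools` is actually structured.

```typescript
class FatalToolError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'FatalToolError';
  }
}

type ToolResult = { content: string; isError: boolean };
type Tool = (input: string) => Promise<string>;

async function executeTool(tool: Tool, input: string): Promise<ToolResult> {
  try {
    return { content: await tool(input), isError: false };
  } catch (err) {
    // Unrecoverable: rethrow so the caller can kill the session.
    if (err instanceof FatalToolError) throw err;
    // Recoverable: surface the failure as data so the LLM can retry.
    return { content: String(err), isError: true };
  }
}
```

The distinction is carried by the error's type rather than by message text, so new tools opt into session termination only by throwing `FatalToolError` explicitly.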
|
|
5793
|
+
|
|
5794
|
+
**Decision 4: Use `wr.discovery` for discovery-only tasks, not `coding-task-workflow-agentic` (Apr 17)**
|
|
5795
|
+
Discovered from a broken session: `coding-task-workflow-agentic` dispatched with "do discovery only, no code" ran 11 step advances then stopped without `run_completed`. The workflow's implementation phases fired even with explicit instructions not to code. Lesson: when a trigger or coordinator wants pure discovery/research, use `wr.discovery` as the workflowId. `coding-task-workflow-agentic` should only be dispatched when implementation is the actual goal.
|
|
5796
|
+
|
|
5797
|
+
**Decision 5: Bug -- MCP server EPIPE crash (Apr 18)**
|
|
5798
|
+
Root cause confirmed with 15 production crash log entries: `process.stderr` is missing an `'error'` event handler in `registerFatalHandlers()`. When an MCP client disconnects, Node.js emits `EPIPE` on stderr which crashes the process with an unhandled error. `process.stdout` already has equivalent protection via `wireStdoutShutdown()`. Fix: mirror the stdout protection for stderr. One-line fix being implemented in PR `fix/mcp-stderr-epipe-crash`.
|
|
5799
|
+
|
|
5800
|
+
---
|
|
5801
|
+
|
|
5802
|
+
### worktrain status → console integration (Apr 18, 2026)
|
|
5803
|
+
|
|
5804
|
+
The `worktrain status` CLI command is Phase 1. Phase 2: the same data and rendering lives inside the console as the default landing view when you open it -- not the sessions list, the overview. Same `StatusDataPacket` type, two surfaces. The console overview replaces the need to run a CLI command; it auto-refreshes and stays live.
|
|
5782
5805
|
|
|
5783
|
-
|
|
5806
|
+
---
|
|
5807
|
+
|
|
5808
|
+
### WorkTrain as a native macOS app (Apr 18, 2026)
|
|
5809
|
+
|
|
5810
|
+
Long-term vision: WorkTrain becomes a full native Mac app -- not just a CLI + web console, but a proper macOS application with a menubar icon, system notifications, windows, and native UX.
|
|
5784
5811
|
|
|
5785
|
-
**
|
|
5812
|
+
**What this unlocks:**
|
|
5813
|
+
- Always-on menubar presence showing daemon status at a glance
|
|
5814
|
+
- Native macOS notifications (already built via osascript -- the app version uses UserNotifications framework directly)
|
|
5815
|
+
- The `worktrain status` overview as a native window, not a browser tab
|
|
5816
|
+
- Message queue and inbox as a native interface (type a message from anywhere on your Mac, not just the terminal)
|
|
5817
|
+
- Background daemon management -- start/stop/restart from the menubar without terminal
|
|
5818
|
+
- Deep system integration: file system events, calendar, Contacts, native share sheet
|
|
5786
5819
|
|
|
5787
|
-
**
|
|
5820
|
+
**Tech stack options:**
|
|
5821
|
+
- Swift/SwiftUI: full native, best macOS integration, steeper learning curve from TypeScript
|
|
5822
|
+
- Electron + existing console UI: fastest path, same TypeScript codebase, but heavy
|
|
5823
|
+
- Tauri: Rust core + existing web frontend, lighter than Electron, good macOS support
|
|
5824
|
+
- React Native macOS: reuses React knowledge, not quite native feel
|
|
5788
5825
|
|
|
5789
|
-
**
|
|
5790
|
-
|
|
5791
|
-
-
|
|
5792
|
-
- Console code reads the session store directly -- no IPC with MCP or daemon needed
|
|
5793
|
-
- These are separate processes. A crash in one does not affect the others.
|
|
5826
|
+
**Recommended path:** Tauri wrapping the existing console UI. The console is already a React/Vite app. Tauri gives native menubar, notifications, and system APIs without rewriting the frontend. The WorkTrain daemon stays as a separate process managed by the app.
|
|
5827
|
+
|
|
5828
|
+
**This is a post-v1 platform decision** -- not a near-term priority, but worth designing toward. Don't make architectural decisions that would make the Tauri wrapper hard to add later.
|
|
5794
5829
|
|
|
5795
5830
|
---
|
|
5796
5831
|
|
|
5797
|
-
###
|
|
5832
|
+
### Long-running sessions: stay open across agent handoffs (Apr 18, 2026)
|
|
5833
|
+
|
|
5834
|
+
**The problem:** today when an MR review session completes, it writes its findings and exits. If the findings require fixes, a new fix agent starts from scratch with no shared context. When the fix is done, a new re-review agent also starts from scratch. Three sessions that are logically one unit of work are isolated from each other.
|
|
5798
5835
|
|
|
5799
|
-
|
|
5836
|
+
**The vision:** a session can stay open and wait -- dormant but alive -- while another agent does work. When that work completes, the waiting session resumes with full context continuity.
|
|
5800
5837
|
|
|
5801
|
-
**
|
|
5838
|
+
**The MR review example:**
|
|
5802
5839
|
|
|
5803
5840
|
```
|
|
5804
|
-
|
|
5805
|
-
|
|
5806
|
-
|
|
5807
|
-
|
|
5808
|
-
|
|
5809
|
-
|
|
5810
|
-
|
|
5811
|
-
|
|
5812
|
-
|
|
5813
|
-
│ WorkRail MCP │ │ WorkTrain │ │ WorkRail │
|
|
5814
|
-
│ Server │ │ Daemon │ │ Console │
|
|
5815
|
-
│ workrail start │ │ worktrain │ │ worktrain │
|
|
5816
|
-
│ src/mcp/ │ │ daemon │ │ console │
|
|
5817
|
-
│ │ │ src/daemon/│ │ src/console/ │
|
|
5818
|
-
│ Claude Code │ │ src/trigger│ │ │
|
|
5819
|
-
│ connects here │ │ │ │ Shows BOTH │
|
|
5820
|
-
│ via stdio │ │ autonomous │ │ MCP + daemon │
|
|
5821
|
-
│ │ │ agent loop │ │ sessions │
|
|
5822
|
-
└────────────────┘ └────────────┘ └──────────────┘
|
|
5841
|
+
[MR review session] finds: 2 critical, 3 minor
|
|
5842
|
+
→ stays open, waiting for fixes
|
|
5843
|
+
|
|
5844
|
+
[Fix agent session] addresses all 5 findings
|
|
5845
|
+
→ completes, signals "fixes ready"
|
|
5846
|
+
|
|
5847
|
+
[MR review session resumes] re-reads the diff, re-evaluates
|
|
5848
|
+
→ all 5 verified fixed, 0 new findings
|
|
5849
|
+
→ completes with APPROVE verdict
|
|
5823
5850
|
```
|
|
5824
5851
|
|
|
5825
|
-
|
|
5852
|
+
The same session that found the issues verifies the fixes. No context reconstruction. No risk of re-review missing something the original reviewer knew.
|
|
5853
|
+
|
|
5854
|
+
**Other use cases for waiting sessions:**
|
|
5855
|
+
|
|
5856
|
+
- **Architecture review waiting for approval:** architect session identifies a design gap, waits for the human to decide on direction, resumes when the decision is recorded
|
|
5857
|
+
- **Discovery session waiting for data:** a research session identifies that it needs a specific file or API response, signals "blocked on: fetch X", waits for a retrieval agent to deliver it, resumes with the data injected
|
|
5858
|
+
- **Coordinator waiting on child completion:** instead of a coordinator script polling `worktrain await`, the coordinator session can yield and be resumed by the daemon when child sessions complete -- same session, same context, no polling overhead
|
|
5859
|
+
- **Spec authoring waiting for stakeholder input:** a spec session writes a draft, flags "needs: human review of acceptance criteria", waits, resumes when the human adds a comment
|
|
5860
|
+
- **Integration test waiting for deployment:** a test coordination session waits for a deploy to complete before running integration tests
|
|
5861
|
+
|
|
5862
|
+
**The key insight: the LLM doesn't experience waiting.**
|
|
5826
5863
|
|
|
5827
|
-
|
|
5864
|
+
LLMs have no concept of time. Between one turn and the next, zero time passes from the agent's perspective. This means "waiting" is not a thing that happens to the agent -- it just doesn't receive its next turn until the coordinator has something to give it.
|
|
5828
5865
|
|
|
5829
|
-
|
|
5866
|
+
The session is paused at the engine level (DAG holds at a node, no new turns issued). The agent submitted its output and simply hasn't received a response yet. When the coordinator is ready -- fix agent completed, human reviewed, deployment finished -- it advances the session with a turn that contains the new context. From the agent's perspective: it submitted findings and immediately received "here are the fixes, verify them."
|
|
5867
|
+
|
|
5868
|
+
**No `wait_for` primitive needed at the workflow level.** The coordinator is the timing mechanism. This is the coordinator's job: know when each session is ready for its next input, and deliver that input at the right time.
|
|
5869
|
+
|
|
5870
|
+
```
|
|
5871
|
+
Coordinator logic:
|
|
5872
|
+
|
|
5873
|
+
1. Advance review session to "findings complete" node
|
|
5874
|
+
2. Read findings from session output
|
|
5875
|
+
3. Spawn fix agent with those findings
|
|
5876
|
+
4. Wait for fix agent to complete (worktrain await)
|
|
5877
|
+
5. Inject fix summary into review session's next turn
|
|
5878
|
+
6. Advance review session: "Here are the fixes. Verify them."
|
|
5879
|
+
→ LLM receives this as the natural next step, no time gap perceived
|
|
5880
|
+
```
|
|
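The six coordinator steps above can be sketched in TypeScript. This is a minimal illustration, not WorkTrain's implementation: `SessionEngine`, `advance`, `spawn`, `awaitCompletion`, the `"fix-workflow"` id, and `FakeEngine` are all hypothetical names introduced here, with `awaitCompletion` standing in for `worktrain await`.

```typescript
// Hypothetical engine surface; none of these names are WorkTrain APIs.
interface SessionEngine {
  // Advance a session by one turn, optionally injecting context into its prompt.
  advance(sessionId: string, injectedContext?: string): Promise<string>;
  // Start a new session and return its id.
  spawn(workflowId: string, input: string): Promise<string>;
  // Resolve with the session's output once it completes (stand-in for `worktrain await`).
  awaitCompletion(sessionId: string): Promise<string>;
}

async function reviewFixReview(engine: SessionEngine, reviewSessionId: string): Promise<string> {
  // Steps 1-2: advance the review session and read its findings.
  const findings = await engine.advance(reviewSessionId);
  // Step 3: spawn a fix agent with those findings.
  const fixSessionId = await engine.spawn("fix-workflow", findings);
  // Step 4: the review session simply receives no turns while this await is pending.
  const fixSummary = await engine.awaitCompletion(fixSessionId);
  // Steps 5-6: the fix summary arrives as the review session's next turn.
  return engine.advance(reviewSessionId, `Here are the fixes. Verify them.\n${fixSummary}`);
}

// In-memory stand-in used only to exercise the sketch.
class FakeEngine implements SessionEngine {
  readonly turns: string[] = [];

  async advance(sessionId: string, injectedContext?: string): Promise<string> {
    this.turns.push(`${sessionId}:${injectedContext ?? "(no injected context)"}`);
    // First turn yields findings; a turn with injected context yields the verdict.
    return injectedContext === undefined ? "2 critical, 3 minor" : "APPROVE";
  }

  async spawn(workflowId: string, input: string): Promise<string> {
    this.turns.push(`spawn ${workflowId}:${input}`);
    return "fix-1";
  }

  async awaitCompletion(sessionId: string): Promise<string> {
    return `${sessionId} addressed all 5 findings`;
  }
}
```

Note that the review session never sees a "wait" step: the only timing logic is the `await` inside the coordinator function, which matches the claim that no `wait_for` primitive is needed at the workflow level.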
5881
|
+
|
|
5882
|
+
**Why this is more powerful than re-running a fresh session:**
|
|
5883
|
+
|
|
5884
|
+
- **Context continuity:** the reviewer remembers what it found, why it flagged it, what invariants it was checking. A fresh session has to re-discover all of that.
|
|
5885
|
+
- **Relational memory:** "does this fix address the root cause I identified, or just the symptom?" -- only the original session knows the root cause reasoning.
|
|
5886
|
+
- **Efficiency:** no redundant context gathering. The resumed session picks up exactly where it left off.
|
|
5887
|
+
- **The agent doesn't know it's coordinating:** from the agent's view, it's a continuous workflow. The coordinator manages the timing externally.
|
|
5888
|
+
|
|
5889
|
+
**Implementation path:**
|
|
5830
5890
|
|
|
5831
|
-
|
|
5832
|
-
-
|
|
5833
|
-
-
|
|
5834
|
-
- Console reads the session store directly -- no IPC with either needed
|
|
5835
|
-
- These are separate processes. A crash in one does not affect the others.
|
|
5891
|
+
- Phase 1: coordinator scripts withhold `complete_step` advancement until the condition is met. This already works today -- the coordinator just doesn't advance the session until the fix agent is done.
|
|
5892
|
+
- Phase 2: the coordinator passes structured context when advancing: `complete_step(session, { injectedContext: fixSummary })`. The session receives it as part of the next step's prompt.
|
|
5893
|
+
- Phase 3: declarative pipelines -- workflow JSON declares that step N waits for an external condition before proceeding. The coordinator reads this and manages the timing automatically. No hand-coded coordinator script needed for common patterns.
|
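As a rough illustration of Phase 3, a step could declare the external condition it waits on, and the coordinator could check that declaration before advancing. The `waitFor` field, the condition kinds, and `canAdvance` are assumptions made for this sketch; the workflow schema does not define them.

```typescript
// Hypothetical declaration shape; `waitFor` is not an existing workflow field.
interface StepDeclaration {
  id: string;
  waitFor?: { kind: "session_complete" | "human_input"; ref: string };
}

// True when the coordinator may advance the step, given the
// external conditions that have already been satisfied.
function canAdvance(step: StepDeclaration, satisfied: ReadonlyArray<string>): boolean {
  if (step.waitFor === undefined) return true; // no declared condition
  const key = `${step.waitFor.kind}:${step.waitFor.ref}`;
  return satisfied.indexOf(key) !== -1;
}

// The MR review example expressed declaratively.
const pipeline: StepDeclaration[] = [
  { id: "review" },
  { id: "verify-fixes", waitFor: { kind: "session_complete", ref: "fix-agent" } },
];
```

With a declaration like this, a generic coordinator loop would hold `verify-fixes` until `session_complete:fix-agent` is recorded, which is the behavior the hand-coded script provides in Phases 1 and 2.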
package/package.json
CHANGED
|
@@ -312,7 +312,11 @@
|
|
|
312
312
|
{
|
|
313
313
|
"id": "phase-6-final-handoff",
|
|
314
314
|
"title": "Phase 6: Final Handoff",
|
|
315
|
-
"prompt": "Provide the final MR review handoff.\n\nInclude:\n- MR title and purpose\n- review mode used\n- final recommendation and confidence band\n- confidence assessment summary, including the most important reason confidence was capped if it was not High\n- counts of Critical / Major / Minor / Nit findings\n- top findings with rationale\n- strongest remaining areas of uncertainty, if any\n- summary of the coverage ledger, especially any still-uncertain domains\n- ready-to-post MR comments summary\n- any validation outcomes a human reviewer should see\n- review environment status:\n - what review target/context sources were successfully used\n - what important sources were missing or ambiguous\n - boundary confidence and context confidence\n - how those limits affected the review\n- path to the full human-facing review artifact (`reviewDocPath`) only if one was created\n\nRules:\n- the final recommendation assists a human reviewer; it does not replace them\n- if `reviewDocPath` exists, treat it as a human-facing companion artifact only\n- be explicit when missing PR/ticket/doc/boundary context limited confidence\n- do not post comments, approve, reject, or merge unless the user explicitly asks",
|
|
315
|
+
"prompt": "Provide the final MR review handoff.\n\nInclude:\n- MR title and purpose\n- review mode used\n- final recommendation and confidence band\n- confidence assessment summary, including the most important reason confidence was capped if it was not High\n- counts of Critical / Major / Minor / Nit findings\n- top findings with rationale\n- strongest remaining areas of uncertainty, if any\n- summary of the coverage ledger, especially any still-uncertain domains\n- ready-to-post MR comments summary\n- any validation outcomes a human reviewer should see\n- review environment status:\n - what review target/context sources were successfully used\n - what important sources were missing or ambiguous\n - boundary confidence and context confidence\n - how those limits affected the review\n- path to the full human-facing review artifact (`reviewDocPath`) only if one was created\n\nRules:\n- the final recommendation assists a human reviewer; it does not replace them\n- if `reviewDocPath` exists, treat it as a human-facing companion artifact only\n- be explicit when missing PR/ticket/doc/boundary context limited confidence\n- do not post comments, approve, reject, or merge unless the user explicitly asks\n\nIMPORTANT: After writing your notes, emit a structured verdict via complete_step's artifacts[] parameter using EXACTLY this schema (no extra fields):\n{\n \"kind\": \"wr.review_verdict\",\n \"verdict\": \"clean\" | \"minor\" | \"blocking\",\n \"confidence\": \"high\" | \"medium\" | \"low\",\n \"findings\": [ { \"severity\": \"critical\" | \"major\" | \"minor\" | \"nit\", \"summary\": \"one-line description\" } ],\n \"summary\": \"one-line overall verdict summary\"\n}\nFor a clean review with no findings, use findings: []. The verdict field maps to severity: clean = no blocking issues, minor = small issues only, blocking = critical or major issues found.",
|
|
316
|
+
"outputContract": {
|
|
317
|
+
"contractRef": "wr.contracts.review_verdict",
|
|
318
|
+
"required": false
|
|
319
|
+
},
|
|
316
320
|
"requireConfirmation": true
|
|
317
321
|
}
|
|
318
322
|
]
|