npm - @exaudeus/workrail - Versions diffs - 3.39.0 → 3.41.0 - Mend

@exaudeus/workrail 3.39.0 → 3.41.0

Files changed (97)

package/dist/cli/commands/init.js +0 -3
package/dist/cli-worktrain.js +58 -26
package/dist/cli.js +0 -18
package/dist/config/app-config.d.ts +0 -16
package/dist/config/app-config.js +0 -14
package/dist/config/config-file.js +0 -3
package/dist/console-ui/assets/index-CQt4UhPB.js +28 -0
package/dist/console-ui/assets/index-DGj8EsFR.css +1 -0
package/dist/console-ui/index.html +2 -2
package/dist/coordinators/pr-review.d.ts +23 -1
package/dist/coordinators/pr-review.js +224 -5
package/dist/daemon/daemon-events.d.ts +9 -1
package/dist/daemon/soul-template.d.ts +2 -2
package/dist/daemon/soul-template.js +11 -1
package/dist/daemon/workflow-runner.d.ts +17 -3
package/dist/daemon/workflow-runner.js +401 -28
package/dist/di/container.js +1 -25
package/dist/di/tokens.d.ts +0 -3
package/dist/di/tokens.js +0 -3
package/dist/engine/engine-factory.js +0 -1
package/dist/infrastructure/console-defaults.d.ts +1 -0
package/dist/infrastructure/console-defaults.js +4 -0
package/dist/infrastructure/session/index.d.ts +0 -1
package/dist/infrastructure/session/index.js +1 -3
package/dist/manifest.json +124 -124
package/dist/mcp/handlers/session.d.ts +1 -0
package/dist/mcp/handlers/session.js +61 -13
package/dist/mcp/output-schemas.d.ts +10 -10
package/dist/mcp/server.js +1 -18
package/dist/mcp/tools.d.ts +12 -12
package/dist/mcp/transports/http-entry.js +0 -2
package/dist/mcp/transports/stdio-entry.js +1 -2
package/dist/mcp/types.d.ts +0 -2
package/dist/trigger/daemon-console.d.ts +2 -0
package/dist/trigger/daemon-console.js +1 -1
package/dist/trigger/trigger-listener.d.ts +2 -0
package/dist/trigger/trigger-listener.js +3 -1
package/dist/trigger/trigger-router.d.ts +4 -3
package/dist/trigger/trigger-router.js +13 -5
package/dist/trigger/trigger-store.js +17 -4
package/dist/types/workflow-source.d.ts +0 -1
package/dist/types/workflow-source.js +3 -6
package/dist/types/workflow.d.ts +1 -1
package/dist/types/workflow.js +1 -2
package/dist/v2/durable-core/domain/artifact-contract-validator.js +66 -0
package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.d.ts +25 -0
package/dist/v2/durable-core/schemas/artifacts/coordinator-signal.js +31 -0
package/dist/v2/durable-core/schemas/artifacts/index.d.ts +3 -1
package/dist/v2/durable-core/schemas/artifacts/index.js +14 -1
package/dist/v2/durable-core/schemas/artifacts/review-verdict.d.ts +41 -0
package/dist/v2/durable-core/schemas/artifacts/review-verdict.js +30 -0
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +236 -236
package/dist/v2/durable-core/schemas/session/events.d.ts +50 -50
package/dist/v2/durable-core/schemas/session/gaps.d.ts +2 -2
package/dist/v2/durable-core/schemas/session/manifest.d.ts +4 -4
package/dist/v2/durable-core/schemas/session/outputs.d.ts +8 -8
package/dist/v2/usecases/console-routes.d.ts +2 -1
package/dist/v2/usecases/console-routes.js +207 -5
package/dist/v2/usecases/console-service.js +14 -0
package/dist/v2/usecases/console-types.d.ts +1 -0
package/docs/authoring.md +16 -16
package/docs/design/coordinator-artifact-protocol-design-candidates.md +155 -0
package/docs/design/coordinator-artifact-protocol-design-review.md +103 -0
package/docs/design/coordinator-artifact-protocol-implementation-plan.md +259 -0
package/docs/design/coordinator-message-queue-drain-plan.md +241 -0
package/docs/design/coordinator-message-queue-drain-review.md +120 -0
package/docs/design/coordinator-message-queue-drain.md +289 -0
package/docs/design/shaping-workflow-external-research.md +119 -0
package/docs/discovery/late-bound-goals-impl-plan.md +147 -0
package/docs/discovery/late-bound-goals-review.md +82 -0
package/docs/discovery/late-bound-goals.md +118 -0
package/docs/discovery/steer-endpoint-design-candidates.md +288 -0
package/docs/discovery/steer-endpoint-design-review-findings.md +104 -0
package/docs/discovery/steer-endpoint-implementation-plan.md +284 -0
package/docs/ideas/backlog.md +447 -97
package/docs/ideas/design-candidates-console-session-tree-impl.md +64 -0
package/docs/ideas/design-candidates-session-tree-view.md +196 -0
package/docs/ideas/design-review-findings-console-session-tree-impl.md +75 -0
package/docs/ideas/design-review-findings-session-tree-view.md +88 -0
package/docs/ideas/implementation_plan_session_tree_view.md +238 -0
package/package.json +2 -1
package/spec/authoring-spec.json +16 -16
package/spec/shape.schema.json +178 -0
package/spec/workflow-tags.json +232 -47
package/workflows/coding-task-workflow-agentic.json +491 -480
package/workflows/mr-review-workflow.agentic.v2.json +5 -1
package/workflows/wr.shaping.json +182 -0
package/dist/console-ui/assets/index-3oXZ_A9m.js +0 -28
package/dist/console-ui/assets/index-8dh0Psu-.css +0 -1
package/dist/infrastructure/session/DashboardHeartbeat.d.ts +0 -8
package/dist/infrastructure/session/DashboardHeartbeat.js +0 -39
package/dist/infrastructure/session/DashboardLockRelease.d.ts +0 -2
package/dist/infrastructure/session/DashboardLockRelease.js +0 -29
package/dist/infrastructure/session/HttpServer.d.ts +0 -60
package/dist/infrastructure/session/HttpServer.js +0 -912
package/workflows/coding-task-workflow-agentic.lean.v2.json +0 -648
package/workflows/coding-task-workflow-agentic.v2.json +0 -324

package/docs/design/coordinator-artifact-protocol-design-review.md ADDED

@@ -0,0 +1,103 @@
+# Design Review Findings: Coordinator Artifact Protocol
+**Status:** Review complete
+**Date:** 2026-04-18
+**Design reviewed:** Candidate A from coordinator-artifact-protocol-design-candidates.md
+---
+## Tradeoff Review
+| Tradeoff | Acceptable? | When it stops being acceptable |
+|----------|-------------|-------------------------------|
+| N+1 HTTP calls for all-node aggregation | Yes (localhost, ~50-100ms) | If coordinator is called for sessions with 50+ nodes |
+| `source?` optional on `ReviewFindings` | Yes (observability only, not routing) | If future code switches exhaustively on `source` |
+| `.strict()` schema | Yes (follows existing precedent) | If LLM consistently emits extra fields causing Zod failures |
+| `required: false` in outputContract | Yes (transition strategy) | Once 10+ consecutive sessions confirm 100% artifact emission |
+---
+## Failure Mode Review
+| Failure Mode | Severity | Handling | Missing Mitigation |
+|-------------|----------|----------|--------------------|
+| Missing `makeContinueWorkflowTool` onComplete update | LOW | TypeScript won't catch (optional param) -- manual verification required | Code comment at both call sites |
+| Per-node HTTP fetch failure during aggregation | LOW | Graceful fallback to keyword scan | Per-node try/catch + WARN logging |
+| Agent emits malformed artifact (wrong enum, missing field) | MEDIUM | `safeParse` fails silently without logging | `[WARN coord:reason=artifact_parse_failed]` logging REQUIRED |
+| `runs[0].nodes` undefined for empty sessions | NONE | Null check + empty-array fallback | None |
+| `required: false` default behavior | NONE | Engine correctly reads `required: false` and skips validation | None |
+---
+## Runner-Up / Simpler Alternative Review
+**Runner-up (tip-node only):** Disqualified by task spec 'CRITICAL: must aggregate artifacts across ALL session nodes'. No elements worth incorporating.
+**Simpler variant (skip `lastStepArtifacts`):** The pr-review coordinator reads via HTTP, not via `WorkflowRunSuccess`. Skipping would satisfy the coordinator use case. Rejected because the task spec explicitly requires it, and it's the foundation for `spawn_agent` artifact surfacing (post-MVP).
+**Simpler variant (skip `onComplete` change):** Would leave `WorkflowRunSuccess.lastStepArtifacts` always undefined. Rejected -- inconsistent state.
+---
+## Philosophy Alignment
+**Satisfied:** validate-at-boundaries, errors-as-data, functional/declarative, prefer-fakes, exhaustiveness (closed enums), immutability.
+**Under tension (accepted):**
+- `source?` optional vs. type-safety-first: minor, observability-only field
+- `required: false` vs. make-illegal-states-unrepresentable: time-boxed transition strategy
+---
+## Findings
+### RED (must fix before shipping)
+**R1: `readVerdictArtifact()` must log on malformed artifact**
+If the agent emits an artifact with `kind: 'wr.review_verdict'` but the wrong schema, `safeParse` fails silently. Without logging, FM3 (malformed artifact) is invisible, which makes it impossible to monitor the artifact emission rate.
+Required: `process.stderr.write('[WARN coord:reason=artifact_parse_failed ...]')` when `safeParse` fails AND the artifact has `kind === 'wr.review_verdict'`.
+**R2: Per-node fetch errors must be caught individually**
+The current outer `try/catch` in `getAgentResult` covers the entire function. The new implementation walks multiple nodes -- if one node fetch throws, the outer catch aborts the entire aggregation. Each per-node fetch must be wrapped individually so one failure doesn't discard all other nodes' artifacts.
+---
+### ORANGE (fix before C1 -> C2 graduation)
+**O1: Log when keyword scan fires on a session that had artifacts**
+The coordinator cannot distinguish 'artifact never emitted' from 'artifact emitted but invalid' without checking. Add a log entry when `readVerdictArtifact` returns null but `artifacts.length > 0`. This enables the graduation metric (10+ sessions with 0 fallback warnings).
+Required log: `[INFO coord:source=keyword_scan reason=no_valid_artifact artifactCount=N]`
+**O2: Divergence detection warning**
+If both artifact severity (from `readVerdictArtifact`) and keyword-scan severity (from `parseFindingsFromNotes`) are available and disagree, log at WARN. Design doc recommends this (ORANGE finding). Protects against semantic inconsistency between notes and artifact.
+---
+### YELLOW (future consideration)
+**Y1: `source?` optional on `ReviewFindings`**
+Making `source` required would improve type safety. Currently deferred to avoid breaking 4 existing test literals. When those tests are updated for other reasons, upgrade `source` to required.
+**Y2: Post-graduation: remove keyword scan fallback**
+Once the graduation criterion is met, `parseFindingsFromNotes` callers can be removed from the coordinator routing logic. The `unknown` severity variant can also be removed from `ReviewSeverity`.
+---
+## Recommended Revisions
+1. **R1:** In `readVerdictArtifact()`, check if `raw` object has `kind === 'wr.review_verdict'` before `safeParse`. If kind matches but safeParse fails, log WARN.
+2. **R2:** In `getAgentResult()` implementation, wrap each per-node HTTP fetch in its own try/catch. Failed nodes are skipped with a WARN log; successful nodes contribute their artifacts.
+3. **O1:** After the artifact/keyword-scan decision in the coordinator, log `source` with the artifact count context.
+4. **O2:** Add divergence check: run keyword scan on `recapMarkdown` when an artifact is found; if severities disagree, log WARN.
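The O2 divergence check can be sketched as a small pure helper. This is a minimal illustration, not the code shipped in the package: `divergenceWarning` is a hypothetical name, the `reason=severity_divergence` log text is invented for the example, and the `ReviewSeverity` union is inferred from the verdict enum plus the `unknown` variant mentioned in Y2.

```typescript
// Assumed union: the three verdict values plus the 'unknown' keyword-scan variant.
type ReviewSeverity = 'clean' | 'minor' | 'blocking' | 'unknown';

// Returns the WARN line to emit, or null when there is nothing to report.
// Hypothetical helper; the log format is illustrative only.
function divergenceWarning(
  artifactSeverity: ReviewSeverity,
  keywordSeverity: ReviewSeverity,
): string | null {
  // Assumption: an 'unknown' keyword-scan result carries no signal,
  // so it is not treated as a disagreement.
  if (keywordSeverity === 'unknown') return null;
  if (artifactSeverity === keywordSeverity) return null;
  return `[WARN coord:reason=severity_divergence artifact=${artifactSeverity} keyword_scan=${keywordSeverity}]`;
}
```

Returning the line instead of writing it keeps the comparison testable without capturing stderr; the caller decides where the warning goes.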
+---
+## Residual Concerns
+1. **`continue_workflow` onComplete call site:** `makeContinueWorkflowTool` is marked DEPRECATED for daemon sessions, but it still calls `onComplete`. The new `artifacts?` parameter must be passed from `params.artifacts` at line 1046. Must be verified manually -- TypeScript won't catch a missing optional parameter.
+2. **`.strict()` vs. LLM reliability:** If the LLM adds extra fields (e.g., `rationale`, `notes`) to the artifact, `.strict()` causes Zod failure. With `required: false`, this just triggers the keyword-scan fallback. Acceptable during transition. If the failure rate is high in production, consider switching to `.strip()`.
+3. **Convention only:** `V1` suffix on `ReviewVerdictArtifactV1Schema` is a convention, not enforced. No migration path exists for schema changes. Future schema evolution must use a new type (`ReviewVerdictArtifactV2Schema`) in parallel until old sessions are retired.

package/docs/design/coordinator-artifact-protocol-implementation-plan.md ADDED

@@ -0,0 +1,259 @@
+# Implementation Plan: Coordinator Artifact Protocol
+**Date:** 2026-04-18
+**Branch:** `feat/coordinator-artifact-protocol`
+---
+## Problem Statement
+The PR review coordinator (`src/coordinators/pr-review.ts`) extracts review severity from completed review sessions by running a keyword scan on free-form step notes. The coordinator ignores the `artifacts[]` field that `GET /api/v2/sessions/:id/nodes/:nodeId` already returns. This makes severity extraction brittle and unmeasurable.
+The fix: define a `wr.review_verdict` artifact schema, update the final handoff step to emit it, update `getAgentResult()` to return artifacts alongside notes, and update the coordinator to try the artifact path before the keyword scan.
+---
+## Acceptance Criteria
+1. `npm run build` completes with 0 TypeScript errors
+2. `tests/unit/coordinator-pr-review.test.ts` passes (all existing tests + new `readVerdictArtifact` tests)
+3. `readVerdictArtifact([{ kind: 'wr.review_verdict', verdict: 'clean', ... }])` returns `{ severity: 'clean', source: 'artifact', ... }`
+4. `readVerdictArtifact([])` returns `null`
+5. `readVerdictArtifact([{ kind: 'wr.review_verdict', verdict: 'INVALID' }])` returns `null` and logs WARN
+6. `CoordinatorDeps.getAgentResult` return type is `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`
+7. `WorkflowRunSuccess` has optional field `lastStepArtifacts?: readonly unknown[]`
+8. `mr-review-workflow.agentic.v2.json` phase-6-final-handoff has `outputContract: { contractRef: 'wr.contracts.review_verdict', required: false }`
+9. `isValidContractRef('wr.contracts.review_verdict')` returns `true`
+10. `validateArtifactContract([{ kind: 'wr.review_verdict', verdict: 'clean', ... }], { contractRef: 'wr.contracts.review_verdict' })` returns `{ valid: true, artifact: ... }`
+---
+## Non-Goals
+- Do NOT add a `/api/v2/sessions/:id/artifacts` server-side aggregation endpoint
+- Do NOT change `required: false` to `required: true` (post-graduation decision)
+- Do NOT remove the keyword-scan fallback from `parseFindingsFromNotes`
+- Do NOT add a `coordinatorProtocol` field to the workflow JSON (deferred)
+- Do NOT add artifacts to `spawn_agent` return value (post-MVP)
+- Do NOT make `source` required on `ReviewFindings` (breaking change deferred)
+---
+## Philosophy Constraints
+- **Make illegal states unrepresentable:** `verdict`, `source`, `confidence` use closed enums
+- **Validate at boundaries:** Zod `safeParse` in `readVerdictArtifact()`; engine validation via `validateArtifactContract()`
+- **Errors are data:** `readVerdictArtifact()` returns `ReviewFindings | null`, not throws
+- **Functional/declarative:** `readVerdictArtifact()` is a pure function apart from its diagnostic logging
+- **Prefer fakes over mocks:** New tests use `makeFakeDeps()` pattern
+---
+## Invariants
+1. `required: false` in outputContract -- never block sessions during transition
+2. Schema registration (`ARTIFACT_CONTRACT_REFS`) MUST be done before workflow JSON update (compiler validates at load time via `isValidContractRef()`)
+3. Keyword-scan fallback MUST remain live in `parseFindingsFromNotes`
+4. All call sites of `CoordinatorDeps.getAgentResult` MUST handle `{ recapMarkdown, artifacts }` shape
+5. `readVerdictArtifact()` MUST log `[WARN coord:reason=artifact_parse_failed]` when kind matches but safeParse fails
+6. Per-node HTTP fetch failures MUST be caught individually (not by outer try/catch)
+7. `makeContinueWorkflowTool` AND `makeCompleteStepTool` MUST both pass artifacts to `onComplete`
+---
+## Selected Approach
+**Candidate A:** Three ordered changes, all additive, following existing repo patterns exactly.
+**Rationale:** Zero new infrastructure; follows `loop-control.ts` schema pattern; follows `WorkflowRunSuccess.lastStepNotes` conditional spread pattern; follows `makeFakeDeps()` testing pattern; backward compatible via `required: false` + keyword-scan fallback.
+**Runner-up:** Tip-node only artifact read. Disqualified by task spec 'CRITICAL: must aggregate artifacts across ALL session nodes'.
+---
+## Slices
+### Slice 1: Schema registration (prerequisite for all other changes)
+**Files:**
+- `src/v2/durable-core/schemas/artifacts/review-verdict.ts` (NEW)
+- `src/v2/durable-core/schemas/artifacts/index.ts` (update)
+- `src/v2/durable-core/domain/artifact-contract-validator.ts` (update)
+**Work:**
+1. Create `review-verdict.ts` following `loop-control.ts` pattern:
+   - `REVIEW_VERDICT_CONTRACT_REF = 'wr.contracts.review_verdict' as const`
+   - `ReviewVerdictArtifactV1Schema = z.object({ kind: z.literal('wr.review_verdict'), verdict: z.enum(['clean', 'minor', 'blocking']), confidence: z.enum(['high', 'medium', 'low']), findings: z.array(z.object({ severity: z.enum(['critical', 'major', 'minor', 'nit']), summary: z.string().min(1) }).strict()), summary: z.string().min(1) }).strict()`
+   - `isReviewVerdictArtifact()` type guard
+   - `parseReviewVerdictArtifact()` convenience function
+2. Update `index.ts`: export all new symbols, add `'wr.contracts.review_verdict'` to `ARTIFACT_CONTRACT_REFS`
+3. Update `artifact-contract-validator.ts`: import new symbols, add `case REVIEW_VERDICT_CONTRACT_REF:` to switch with `validateReviewVerdictContract()` helper
+**Done when:** `isValidContractRef('wr.contracts.review_verdict')` returns `true`; `validateArtifactContract([{ kind: 'wr.review_verdict', ... }], { contractRef: 'wr.contracts.review_verdict' })` returns `{ valid: true, artifact: ... }`.
+---
+### Slice 2: Fix onComplete callback signature
+**Files:**
+- `src/daemon/workflow-runner.ts`
+**Work:**
+1. Change `onComplete` closure definition (line 2096) from `(notes: string | undefined): void` to `(notes: string | undefined, artifacts?: readonly unknown[]): void`
+2. Add `let lastStepArtifacts: readonly unknown[] | undefined;` near `let lastStepNotes`
+3. Update `onComplete` body to set `lastStepArtifacts = artifacts`
+4. Add `lastStepArtifacts?: readonly unknown[]` to `WorkflowRunSuccess` interface
+5. Update the `makeCompleteStepTool` call (line 1249) from `onComplete(notes)` to `onComplete(notes, params.artifacts as readonly unknown[] | undefined)`
+6. Update the `makeContinueWorkflowTool` call (line 1046) from `onComplete(params.notesMarkdown)` to `onComplete(params.notesMarkdown, params.artifacts as readonly unknown[] | undefined)`
+7. Update the final `return` in `runWorkflow()` (line 2622) to spread `lastStepArtifacts` conditionally
+**Done when:** `WorkflowRunSuccess` has `lastStepArtifacts` field; both tool factory call sites pass artifacts; `npm run build` passes.
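The widened `onComplete` signature and the conditional spread from Slice 2 might look roughly like the sketch below. The `WorkflowRunSuccess` shape here is a trimmed, hypothetical stand-in (the real interface has other fields), and `makeRunState` is an illustrative wrapper that does not exist in `workflow-runner.ts`.

```typescript
// Hypothetical, trimmed-down result shape; only the two optional fields matter here.
interface WorkflowRunSuccess {
  readonly kind: 'success';
  readonly lastStepNotes?: string;
  readonly lastStepArtifacts?: readonly unknown[];
}

// Illustrative wrapper around the closure state that runWorkflow() keeps.
function makeRunState() {
  let lastStepNotes: string | undefined;
  let lastStepArtifacts: readonly unknown[] | undefined;

  // The second parameter is optional, so existing one-argument callers still compile.
  const onComplete = (notes: string | undefined, artifacts?: readonly unknown[]): void => {
    lastStepNotes = notes;
    lastStepArtifacts = artifacts;
  };

  // Conditional spread: a field is present only when a value was recorded.
  const toResult = (): WorkflowRunSuccess => ({
    kind: 'success',
    ...(lastStepNotes !== undefined ? { lastStepNotes } : {}),
    ...(lastStepArtifacts !== undefined ? { lastStepArtifacts } : {}),
  });

  return { onComplete, toResult };
}
```

Because the parameter is optional, a call site that forgets to pass artifacts compiles cleanly, which is why the plan asks for manual verification of both tool factories.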
+---
+### Slice 3: Update getAgentResult to return artifacts
+**Files:**
+- `src/cli-worktrain.ts`
+**Work:**
+1. Change `getAgentResult: async (sessionHandle: string): Promise<string | null>` -> `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`
+2. In the implementation body:
+   - After reading `runs[0]`, read `runs[0].nodes` as `Array<{ nodeId: string; [key: string]: unknown }>` (with null check)
+   - Walk all nodes, fetch each node detail with individual `try/catch`:
+     ```
+     for (const node of nodes) {
+       try {
+         const nodeRes = await fetch(nodeUrl + '/' + node.nodeId)
+         // collect artifacts from nodeData['artifacts']
+       } catch { /* log WARN, continue */ }
+     }
+     ```
+   - Return `{ recapMarkdown: recap, artifacts: collectedArtifacts }` (or `{ recapMarkdown: null, artifacts: [] }` on failure)
+3. Early-return failures must also return `{ recapMarkdown: null, artifacts: [] }` instead of `null`
+**Done when:** Return type is `Promise<{ recapMarkdown: string | null; artifacts: readonly unknown[] }>`; TypeScript compile-time errors at call sites force updates.
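A fuller version of the per-node loop above, written as a sketch under stated assumptions: the HTTP call is abstracted behind an injected `fetchNode` function (hypothetical), `NodeDetail` models only the `artifacts` field, and the `node_fetch_failed` log text is invented for illustration.

```typescript
// Only the field this sketch needs; the real node detail payload has more.
interface NodeDetail {
  readonly artifacts?: readonly unknown[];
}

// Hypothetical abstraction over the per-node HTTP GET.
type FetchNode = (nodeId: string) => Promise<NodeDetail>;

async function collectArtifacts(
  nodes: ReadonlyArray<{ nodeId: string }> | undefined,
  fetchNode: FetchNode,
  warn: (line: string) => void = (line) => console.error(line),
): Promise<readonly unknown[]> {
  const collected: unknown[] = [];
  // Null check: an empty or missing node list yields an empty artifact array.
  for (const node of nodes ?? []) {
    try {
      const detail = await fetchNode(node.nodeId);
      collected.push(...(detail.artifacts ?? []));
    } catch {
      // One failing node is skipped; artifacts from other nodes are kept.
      warn(`[WARN coord:reason=node_fetch_failed nodeId=${node.nodeId}]`);
    }
  }
  return collected;
}
```

The `try/catch` sits inside the loop body, so a thrown fetch only costs that node's artifacts rather than aborting the whole aggregation.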
+---
+### Slice 4: Update coordinator to use artifact path
+**Files:**
+- `src/coordinators/pr-review.ts`
+**Work:**
+1. Import `ReviewVerdictArtifactV1Schema` from artifacts schema
+2. Update `CoordinatorDeps.getAgentResult` return type to match new shape
+3. Add `source?: 'artifact' | 'keyword_scan'` to `ReviewFindings` interface
+4. Add `readVerdictArtifact(artifacts: readonly unknown[]): ReviewFindings | null` pure function:
+   - Walk artifacts array
+   - For each, check `(raw as any).kind === 'wr.review_verdict'`
+   - If kind matches, call `ReviewVerdictArtifactV1Schema.safeParse(raw)`
+   - On success: return `{ severity: v.verdict, findingSummaries: v.findings.map(f => f.summary), raw: JSON.stringify(v), source: 'artifact' }`
+   - On failure: log `[WARN coord:reason=artifact_parse_failed]`, continue to next artifact
+   - If no valid artifact found and artifacts.length > 0: log `[INFO coord:source=keyword_scan reason=no_valid_artifact artifactCount=N]`
+   - Return `null`
+5. Update both call sites in `runPrReviewCoordinator()`:
+   - `const { recapMarkdown: notes, artifacts } = await deps.getAgentResult(handle);`
+   - `const verdict = readVerdictArtifact(artifacts); const findingsResult = verdict ? ok(verdict) : parseFindingsFromNotes(notes);` (call `readVerdictArtifact` once so its WARN/INFO lines are not logged twice)
+   - Log `[INFO coord:source=artifact]` or `[INFO coord:source=keyword_scan]`
+6. Add divergence check (O2): if artifact verdict and keyword-scan severity disagree, log WARN
+7. Update traceability JSON block to include `source` field
+**Done when:** Coordinator tries artifact path first; keyword-scan fallback works; logging emits; `npm run build` passes.
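The control flow of `readVerdictArtifact()` described in step 4 can be sketched without dependencies. Note the assumptions: `parseVerdict` is a hand-rolled stand-in for `ReviewVerdictArtifactV1Schema.safeParse` and checks fewer fields than the real strict Zod schema, the `log` parameter is added here so the example can be tested, and `raw` is serialized from the input object rather than from a parsed value.

```typescript
type Verdict = 'clean' | 'minor' | 'blocking';

interface ReviewFindings {
  readonly severity: Verdict;
  readonly findingSummaries: readonly string[];
  readonly raw: string;
  readonly source?: 'artifact' | 'keyword_scan';
}

const VERDICTS: readonly string[] = ['clean', 'minor', 'blocking'];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Stand-in for the Zod safeParse call (assumption: simplified validation).
function parseVerdict(raw: Record<string, unknown>): { verdict: Verdict; summaries: string[] } | null {
  const verdict = raw.verdict;
  if (typeof verdict !== 'string' || VERDICTS.indexOf(verdict) === -1) return null;
  if (!Array.isArray(raw.findings)) return null;
  const summaries: string[] = [];
  for (const f of raw.findings) {
    if (!isRecord(f) || typeof f.summary !== 'string' || f.summary.length === 0) return null;
    summaries.push(f.summary);
  }
  return { verdict: verdict as Verdict, summaries };
}

function readVerdictArtifact(
  artifacts: readonly unknown[],
  log: (line: string) => void = (line) => console.error(line),
): ReviewFindings | null {
  for (const raw of artifacts) {
    // Artifacts of another kind are ignored without logging.
    if (!isRecord(raw) || raw.kind !== 'wr.review_verdict') continue;
    const parsed = parseVerdict(raw);
    if (parsed === null) {
      log('[WARN coord:reason=artifact_parse_failed]');
      continue;
    }
    // First valid artifact wins.
    return {
      severity: parsed.verdict,
      findingSummaries: parsed.summaries,
      raw: JSON.stringify(raw),
      source: 'artifact',
    };
  }
  if (artifacts.length > 0) {
    log(`[INFO coord:source=keyword_scan reason=no_valid_artifact artifactCount=${artifacts.length}]`);
  }
  return null;
}
```

Checking `kind` before validating is what lets the function stay silent for unrelated artifacts while still warning when a verdict artifact is present but malformed.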
+---
+### Slice 5: Update mr-review workflow
+**Files:**
+- `workflows/mr-review-workflow.agentic.v2.json`
+**Work:**
+1. In `phase-6-final-handoff` step, add `outputContract: { "contractRef": "wr.contracts.review_verdict", "required": false }`
+2. Append to the step `prompt` field the artifact emission instruction:
+   ```
+   \n\nAfter completing your notes, emit a structured verdict via complete_step artifacts[] parameter. Use exactly this schema:\n{ "kind": "wr.review_verdict", "verdict": "clean|minor|blocking", "confidence": "high|medium|low", "findings": [{ "severity": "critical|major|minor|nit", "summary": "one-line description" }], "summary": "one-line overall summary" }\nFor a clean review with no findings, use findings: [].
+   ```
+**Done when:** Workflow JSON validates via `npm run build`; `isValidContractRef('wr.contracts.review_verdict')` returns `true` (prerequisite: Slice 1 must be done first).
+---
+### Slice 6: Tests
+**Files:**
+- `tests/unit/coordinator-pr-review.test.ts`
+**Work:**
+1. Update `makeFakeDeps()` to return `{ recapMarkdown: string | null; artifacts: readonly unknown[] }` from `getAgentResult` (change return type from `string | null`)
+2. Update `ReviewFindings` literal objects in `buildFixGoal` tests to add `source: 'artifact'` or `source: 'keyword_scan'` (or leave as optional -- `source?` means no update needed)
+3. Add new `describe('readVerdictArtifact')` block:
+   - `it('returns ReviewFindings with source artifact for valid artifact')`
+   - `it('returns null for invalid schema (wrong verdict enum)')`
+   - `it('returns null for empty artifacts array')`
+   - `it('returns null for artifact with different kind')`
+   - `it('returns first valid artifact when multiple present')`
+4. Import `readVerdictArtifact` from `pr-review.js`
+**Done when:** All existing tests pass; 5 new `readVerdictArtifact` tests pass.
+---
+## Test Design
+**Unit tests (pure function):**
+- `readVerdictArtifact` with valid `wr.review_verdict` artifact -> returns `ReviewFindings` with `severity` mapped from `verdict`, `source: 'artifact'`
+- `readVerdictArtifact` with invalid schema (wrong enum) -> returns `null`
+- `readVerdictArtifact` with empty array -> returns `null`
+- `readVerdictArtifact` with artifact of different `kind` -> returns `null` (no false positives)
+- `readVerdictArtifact` with valid + invalid artifacts -> returns valid one (first match wins)
+**Integration tests (fake deps):**
+- Existing `runPrReviewCoordinator` tests must pass with updated `getAgentResult` return type
+- The fake `getAgentResult` returns `{ recapMarkdown: 'APPROVE ...', artifacts: [] }` by default
+---
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| Missing `makeContinueWorkflowTool` onComplete update | Low | Silent -- artifacts not forwarded from continue_workflow path | Manual verification; code comment at both call sites |
+| Per-node HTTP fetch error aborting aggregation | Low | Graceful fallback to keyword scan | Per-node try/catch (Slice 3 R2) |
+| LLM emits extra fields in artifact (`.strict()` reject) | Medium | Zod fail -> WARN log -> keyword scan fallback | Acceptable during `required: false` transition |
+| `runs[0].nodes` undefined or empty | Low | Empty artifact array -> keyword scan fallback | Null check in Slice 3 |
+---
+## PR Packaging Strategy
+Single PR: `feat/coordinator-artifact-protocol`
+All 6 slices ship in one PR. The changes are tightly coupled (schema + validator + coordinator must be consistent), and splitting them across multiple PRs would require interface stubs that add noise.
+**PR description structure:**
+1. Summary: what was done and why
+2. Change 1 (schema), Change 2 (onComplete), Change 3 (coordinator + workflow)
+3. Test plan: `npm run build`, `npx vitest run tests/unit/coordinator-pr-review.test.ts`
+---
+## Philosophy Alignment
+| Slice | Principle | Status |
+|-------|-----------|--------|
+| 1 (schema) | Make illegal states unrepresentable | Satisfied -- closed enums, kind literal |
+| 1 (schema) | Validate at boundaries | Satisfied -- Zod strict schema |
+| 2 (onComplete) | Immutability by default | Satisfied -- `readonly unknown[]` |
+| 3 (getAgentResult) | Errors are data | Satisfied -- returns `{ recapMarkdown: null, artifacts: [] }` not null |
+| 4 (coordinator) | Functional/declarative | Satisfied -- `readVerdictArtifact()` is pure |
+| 4 (coordinator) | Make illegal states unrepresentable | Tension -- `source?` optional; accepted tradeoff |
+| 6 (tests) | Prefer fakes over mocks | Satisfied -- `makeFakeDeps()` pattern |
+---
+## planConfidenceBand: High
+- unresolvedUnknownCount: 0
+- followUpTickets: Y1 (make source required post-graduation), Y2 (remove keyword scan post-graduation), spawn_agent artifacts gap (post-MVP)

package/docs/design/coordinator-message-queue-drain-plan.md ADDED

@@ -0,0 +1,241 @@
+# Implementation Plan: Coordinator Message Queue Drain
+## 1. Problem Statement
+`worktrain tell "<message>"` appends to `~/.workrail/message-queue.jsonl` but the PR review
+coordinator (`runPrReviewCoordinator`) never reads this file. Messages sent from a phone,
+terminal, or automation (e.g., "stop", "skip-pr 42") are silently ignored. The coordinator
+must drain this queue at the start of each cycle and act on actionable messages before spawning
+any agent.
+## 2. Acceptance Criteria
+AC1. When `stop` appears as the first meaningful word in a queued message (matched by
+     `/^\s*stop\b/i`), the coordinator exits cleanly without reviewing any PR, and appends an
+     outbox notification that includes the full triggering message text and timestamp.
+AC2. When `skip-pr N` appears in a queued message (matched by `/\bskip[- ]pr[\s#]+(\d+)/i`),
+     PR #N is removed from the list before Stage 1 review dispatch. An outbox notification is
+     appended confirming the skip.
+AC3. When `add-pr N` appears in a queued message (matched by `/\badd[- ]pr[\s#]+(\d+)/i`),
+     PR #N is added to the list (with Set dedup to prevent duplicates). An outbox notification
+     is appended confirming the addition.
+AC4. Messages that match no recognized pattern are skipped silently (treated as notes).
+AC5. After draining, the cursor in `~/.workrail/message-queue-cursor.json` is updated so
+     processed messages are not re-processed on the next coordinator invocation.
+AC6. If `~/.workrail/message-queue.jsonl` does not exist (ENOENT), the drain returns a no-op
+     result and the coordinator proceeds normally.
+AC7. Malformed JSONL lines (unparseable JSON) are skipped without crashing the coordinator.
+     A stderr warning is emitted for each skipped malformed line.
+AC8. All drain I/O (readFile, appendFile, homedir, joinPath, now, generateId) is injected via
+     `CoordinatorDeps`. No direct `fs` imports are added to `pr-review.ts`.
+AC9. Unit tests for `drainMessageQueue()` use fake deps (in-memory file map). No real filesystem
+     access in tests.
+## 3. Non-Goals
+- No `reprioritize` message kind in this PR
+- No workspace routing (workspaceHint matching) -- all messages are consumed regardless of hint
+- No structured `kind` field on `QueuedMessage` (Candidate C) -- that is a follow-up issue
+- No truncation or compaction of consumed messages (queue remains append-only)
+- No real-time / `--watch` mode
+- No multi-coordinator fan-out (single coordinator consumes the queue)
+- No integration test (unit tests with fakes are sufficient)
+## 4. Philosophy-Driven Constraints
+- Errors as data: `drainMessageQueue` returns `DrainResult`, never throws
+- All I/O injected: `CoordinatorDeps` gains `readFile`, `appendFile`, and `mkdir` (see Slice 1); zero direct fs imports
+- Immutability: `DrainResult` and all new interfaces are fully readonly
+- Prefer fakes over mocks: tests use in-memory fake deps
+- Validate at boundaries: JSONL parsing, ENOENT, cursor desync handled at the read boundary
+- Document WHY: function header explains the cursor pattern and text-matching tradeoff
+## 5. Invariants
+I1. `message-queue.jsonl` is never written to or truncated by the coordinator; the coordinator only reads it, and the queue stays append-only
+I2. The coordinator drains the queue BEFORE Stage 1 (PR discovery) -- never mid-agent-run
+I3. `stop: true` in `DrainResult` takes absolute precedence; coordinator must check stop before
+    acting on `skipPrNumbers` or `addPrNumbers`
+I4. The cursor advances only AFTER successful outbox writes (best-effort; cursor write failure
+    does not block drain -- same pattern as worktrain-inbox.ts)
+I5. ENOENT on message-queue.jsonl = no messages = coordinator proceeds normally (not an error)
+I6. Cursor desync guard: if `cursor > totalLines`, reset to 0 (queue was wiped)
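Invariant I6 and the "missing/corrupt cursor -> 0" rule can be illustrated with two small pure helpers. This is a sketch, not the shipped code: `parseCursor` and `unreadLines` are hypothetical names, the `InboxCursor` shape `{ lastReadCount: number }` is taken from the rationale in this plan, and ignoring blank lines when counting is an assumption.

```typescript
interface InboxCursor {
  readonly lastReadCount: number;
}

// Parse the cursor file content; missing (null) or corrupt content falls back to 0.
function parseCursor(content: string | null): number {
  if (content === null) return 0;
  try {
    const parsed: unknown = JSON.parse(content);
    const count = (parsed as Partial<InboxCursor> | null)?.lastReadCount;
    return typeof count === 'number' && Number.isInteger(count) && count >= 0 ? count : 0;
  } catch {
    return 0;
  }
}

// Return only the lines not yet processed, applying the I6 desync guard.
function unreadLines(
  queueContent: string,
  cursor: number,
): { lines: string[]; start: number; total: number } {
  // Assumption: blank lines (such as the trailing newline) do not count as messages.
  const all = queueContent.split('\n').filter((l) => l.trim().length > 0);
  // Cursor beyond the end means the queue was wiped: start over from 0.
  const start = cursor > all.length ? 0 : cursor;
  return { lines: all.slice(start), start, total: all.length };
}
```

After a drain, the new cursor value would be `total`, so the next invocation sees only messages appended since.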
+## 6. Selected Approach & Rationale
+**Selected: Candidate B** -- `drainMessageQueue()` pure function with cursor + text parsing.
+**Rationale:** Direct adaptation of the `worktrain-inbox.ts` cursor pattern (already tested, same
+`InboxCursor` shape `{ lastReadCount: number }`). Additive to `CoordinatorDeps`. Text parsing is
+narrow (`/^\s*stop\b/i`) and consistent with how `parseFindingsFromNotes()` works in the same file.
+**Runner-up: Candidate C** (structured `kind` field on `QueuedMessage`). Loses because it
+requires a schema change to the public CLI interface (`worktrain tell`), which is out of scope.
+Filed as a follow-up.
+## 7. Vertical Slices
+### Slice 1: Extend `CoordinatorDeps` and add `DrainResult` type
+**Files:** `src/coordinators/pr-review.ts`
+**Work:**
+- Add `readFile: (path: string) => Promise<string>` to `CoordinatorDeps`
+- Add `appendFile: (path: string, content: string) => Promise<void>` to `CoordinatorDeps`
+- Add `mkdir: (path: string, options: { recursive: boolean }) => Promise<string | undefined>` to `CoordinatorDeps`
+- Define `DrainResult` interface (readonly: stop, stopReason, skipPrNumbers, addPrNumbers, messagesProcessed)
+**Done when:** TypeScript compiles with new interface fields. No runtime behavior change yet.
+**Note:** Updating fake deps in `coordinator-pr-review.test.ts` is part of this slice (compile-
+time requirement).
+---
+### Slice 2: Implement `drainMessageQueue()`
+**Files:** `src/coordinators/pr-review.ts`
+**Work:**
+- New exported function `drainMessageQueue(deps, workrailDir)` -- deps is the coordinator deps
+  subset; workrailDir defaults to `deps.joinPath(deps.homedir(), '.workrail')`
+- Reads `message-queue.jsonl` (ENOENT -> return empty result)
+- Reads cursor from `message-queue-cursor.json` (missing/corrupt -> 0)
+- Applies cursor desync guard (cursor > totalLines -> reset to 0)
+- Parses new lines (slice from cursor), skips malformed with stderr warning
+- For each parsed `QueuedMessage`:
+  - `/^\\s*stop\\b/i` match -> set stop=true, record stopReason=message.message
+  - `/\\bskip[- ]pr[\\s#]+([0-9]+)/i` match -> add to skipSet
+  - `/\\badd[- ]pr[\\s#]+([0-9]+)/i` match -> add to addSet
+  - Otherwise: skip (informational note)
+- After processing all new messages:
+  - For each actionable message: appendFile to outbox.jsonl with confirmation text
+  - Write a stderr line `[INFO coord:drain kind=... message="..." ts=...]` per actionable message
+  - Update cursor file (non-fatal on failure)
+- Return `DrainResult`
+**Done when:** Function exists, TypeScript compiles, unit tests pass.
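The parsing and classification steps of Slice 2 can be sketched as two pure helpers. This is an illustration under assumptions, not the actual implementation: `classifyMessage`, `parseQueueLines`, and the `QueueAction` type are hypothetical names, the first-match-wins ordering (stop, then skip, then add) is inferred from the bullet order, and the warning text is made up. The three regular expressions are the ones listed above, and the `message` field is taken from `stopReason=message.message`.

```ts
type QueueAction =
  | { readonly kind: 'stop'; readonly reason: string }
  | { readonly kind: 'skip-pr'; readonly prNumber: number }
  | { readonly kind: 'add-pr'; readonly prNumber: number }
  | { readonly kind: 'note' };

const STOP_RE = /^\s*stop\b/i;
const SKIP_PR_RE = /\bskip[- ]pr[\s#]+([0-9]+)/i;
const ADD_PR_RE = /\badd[- ]pr[\s#]+([0-9]+)/i;

// Classify one message text; anything unrecognized is an informational note.
function classifyMessage(text: string): QueueAction {
  if (STOP_RE.test(text)) return { kind: 'stop', reason: text };
  const skip = SKIP_PR_RE.exec(text);
  if (skip) return { kind: 'skip-pr', prNumber: Number(skip[1]) };
  const add = ADD_PR_RE.exec(text);
  if (add) return { kind: 'add-pr', prNumber: Number(add[1]) };
  return { kind: 'note' };
}

// Extract message texts from JSONL lines, skipping malformed lines with a warning.
function parseQueueLines(lines: readonly string[], warn: (msg: string) => void): string[] {
  const texts: string[] = [];
  for (const line of lines) {
    if (line.trim() === '') continue; // ignore blank lines (e.g. trailing newline)
    try {
      const parsed: unknown = JSON.parse(line);
      const text = (parsed as { message?: unknown } | null)?.message;
      if (typeof text !== 'string') throw new Error('missing message field');
      texts.push(text);
    } catch {
      warn('skipping malformed message-queue line');
    }
  }
  return texts;
}
```

Because the stop pattern is anchored at the start of the text, `classifyMessage('please stop overthinking')` is classified as a note, while `classifyMessage('stop: wrong branch')` is a stop.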
+---
+### Slice 3: Integrate drain into `runPrReviewCoordinator()`
+**Files:** `src/coordinators/pr-review.ts`
+**Work:**
+- Call `drainMessageQueue(deps)` at the top of `runPrReviewCoordinator()` (before Stage 1 log)
+- Check `drainResult.stop` immediately:
+  - If true: log stop reason, write report (empty/aborted), return early with all zeros
+- Apply `drainResult.skipPrNumbers` to remove PRs from the discovered list (after Stage 1)
+- Apply `drainResult.addPrNumbers` to add PRs to the list (with Set dedup, before Stage 1)
+- Log drain activity: `[drain] processed N messages, skip=[...], add=[...]` if messagesProcessed > 0
+**Done when:** Integration passes existing coordinator unit tests + new drain integration test.
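How the skip and add lists reshape the PR list can be sketched as one pure function. This is a simplification for illustration: the plan applies additions before Stage 1 and removals after it, whereas the hypothetical `applyDrainToPrList` below does both in a single step and assumes that a skip wins when the same PR appears in both lists.

```ts
// Only the two list fields of DrainResult are needed here (types assumed).
interface DrainPrLists {
  readonly skipPrNumbers: readonly number[];
  readonly addPrNumbers: readonly number[];
}

// Merge added PRs into the discovered list (deduplicated), then drop skipped PRs.
function applyDrainToPrList(
  discovered: readonly number[],
  drain: DrainPrLists,
): number[] {
  const merged = new Set<number>([...discovered, ...drain.addPrNumbers]);
  for (const prNumber of drain.skipPrNumbers) {
    merged.delete(prNumber);
  }
  // Set preserves insertion order: discovered PRs first, then newly added ones.
  return Array.from(merged);
}
```

For a discovered list of `[1, 2, 3]` with `skipPrNumbers = [2]` and `addPrNumbers = [3, 4]`, the result is `[1, 3, 4]`.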
+---
+### Slice 4: Wire new deps in `cli-worktrain.ts`
+**Files:** `src/cli-worktrain.ts`
+**Work:**
+- Add `readFile: (p: string) => fs.promises.readFile(p, 'utf-8')` to CoordinatorDeps wiring
+- Add `appendFile: (p: string, content: string) => fs.promises.appendFile(p, content, 'utf-8')`
+  to CoordinatorDeps wiring
+- Add `mkdir: (p: string, opts: { recursive: boolean }) => fs.promises.mkdir(p, opts)` to
+  CoordinatorDeps wiring
+**Done when:** `worktrain run pr-review --dry-run` compiles and runs without error.
+---
+### Slice 5: Unit tests for `drainMessageQueue()`
+**Files:** `tests/unit/coordinator-pr-review.test.ts`
+**Work:**
+- Add `readFile`, `appendFile`, and `mkdir` to the existing fake CoordinatorDeps helper
+- New `describe('drainMessageQueue')` block covering:
+  - ENOENT -> returns empty DrainResult (messagesProcessed=0, stop=false)
+  - Stop message at start of message text -> stop=true, stopReason set
+  - Stop NOT triggered when 'stop' appears mid-sentence ("please stop overthinking" -- note: this
+    does not fire with `^\\s*stop` since 'stop' doesn't start the message; test confirms this is
+    the designed behavior)
+  - skip-pr with PR number -> skipPrNumbers contains the number
+  - add-pr with PR number -> addPrNumbers contains the number
+  - Malformed JSONL lines skipped, messagesProcessed counts only valid lines
+  - Cursor advances after drain
+  - Cursor desync guard resets to 0 when cursor > totalLines
+  - Multiple messages: stop takes precedence regardless of order in queue
+  - Note-only messages: no action, cursor advances, messagesProcessed = N
+**Done when:** All new tests pass; no existing tests broken.
+## 8. Test Design
+**Strategy:** Fake deps only (in-memory Map for files, Set for dirs). No real filesystem.
+**Key test helpers:**
+```ts
+interface FakeDrainFs {
+  files: Map<string, string>;
+}
+function makeDrainDeps(fs: FakeDrainFs): Pick<CoordinatorDeps, 'readFile' | 'appendFile' | 'mkdir' | 'homedir' | 'joinPath' | 'now' | 'generateId' | 'stderr'>
+```
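One possible body for the file-related part of such a helper is sketched below. It is an assumption-laden illustration: `DrainFileDeps` and `makeDrainFileDeps` are hypothetical names covering only `readFile`, `appendFile`, and `mkdir`, and the remaining deps (`homedir`, `joinPath`, `now`, `generateId`, `stderr`) are left out. The missing-file error mimics Node's by carrying `code: 'ENOENT'`.

```ts
interface FakeDrainFs {
  files: Map<string, string>;
}

// Hypothetical subset of CoordinatorDeps: just the three file operations.
interface DrainFileDeps {
  readFile: (path: string) => Promise<string>;
  appendFile: (path: string, content: string) => Promise<void>;
  mkdir: (path: string, options: { recursive: boolean }) => Promise<string | undefined>;
}

function makeDrainFileDeps(fs: FakeDrainFs): DrainFileDeps {
  return {
    // Reject with an ENOENT-coded error when the path is not in the map.
    readFile: async (path) => {
      const content = fs.files.get(path);
      if (content === undefined) {
        throw Object.assign(
          new Error(`ENOENT: no such file or directory, open '${path}'`),
          { code: 'ENOENT' },
        );
      }
      return content;
    },
    // Append to the existing content, creating the entry if needed.
    appendFile: async (path, content) => {
      fs.files.set(path, (fs.files.get(path) ?? '') + content);
    },
    // Directories are not tracked in this sketch; mkdir is a no-op.
    mkdir: async () => undefined,
  };
}
```

A test can then seed `fs.files` with a queue file, run the drain, and assert on the outbox and cursor entries in the same map.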
+**Critical test cases:**
+- `stop` as sole message: stop=true, outbox has triggering text
+- `skip-pr 42` after a note: skipPrNumbers=[42], messagesProcessed=2
+- Two `skip-pr` for same PR: deduplicated in Set (skipPrNumbers=[42] not [42, 42])
+- Cursor = 5, file has 5 lines: messagesProcessed=0 (all previously read)
+- Cursor = 10, file has 5 lines: cursor reset to 0, all 5 processed
+## 9. Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| `stop` false positive on note message | Low | Medium | `^\\s*stop\\b` anchor; outbox shows triggering text |
+| Cursor file write failure | Very Low | Low | Non-fatal; next run re-reads from the last persisted cursor (0 if none), so some messages may be reprocessed |
+| Outbox write failure during stop | Very Low | Low | Non-fatal; stderr log is backup |
+| `readFile`/`appendFile`/`mkdir` not wired in cli-worktrain.ts | Low | High | Slice 4 is explicit; TypeScript will catch missing fields at compile time |
+## 10. PR Packaging Strategy
+Single PR on branch `feat/coordinator-message-queue`. All 5 slices in one PR -- they are
+tightly coupled (type change -> function -> integration -> wiring -> tests). Separating them
+would create a non-compiling intermediate state.
+## 11. Philosophy Alignment Per Slice
+| Slice | Principle | Status |
+|---|---|---|
+| 1 | Immutability by default | Satisfied -- all new fields are readonly |
+| 1 | Explicit domain types | Tension -- DrainResult uses boolean stop not a discriminated union; documented |
+| 2 | Errors are data | Satisfied -- DrainResult is a value; ENOENT returns empty result |
+| 2 | Dependency injection | Satisfied -- all I/O via injected deps |
+| 2 | Validate at boundaries | Satisfied -- malformed JSONL skipped at parse boundary |
+| 3 | Determinism over cleverness | Satisfied -- same queue + cursor = same result |
+| 4 | Compose with small pure functions | Satisfied -- drainMessageQueue is pure at logic level |
+| 5 | Prefer fakes over mocks | Satisfied -- fake deps, no vi.mock() |
+## 12. Follow-Up Tickets
+1. **Add `kind` field to `QueuedMessage` for structured dispatch** (Candidate C) -- unblocks
+   automated tooling writing to the message queue without text fragility.
+2. **`worktrain tell --help` should list recognized coordinator command patterns** -- discovery
+   for users who don't know what command words the coordinator recognizes.
+## Summary
+- `estimatedPRCount`: 1
+- `unresolvedUnknownCount`: 0
+- `planConfidenceBand`: High