npm - @exaudeus/workrail - Versions diffs - 3.66.0 → 3.68.0 - Mend

@exaudeus/workrail 3.66.0 → 3.68.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (150)

package/dist/application/services/compiler/template-registry.js +10 -1
package/dist/application/validation.js +1 -1
package/dist/cli/commands/worktrain-init.js +1 -1
package/dist/console/standalone-console.js +4 -1
package/dist/console-ui/assets/{index-BynU38Vu.js → index-CyzltI6D.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/coordinators/modes/full-pipeline.js +4 -4
package/dist/coordinators/modes/implement-shared.js +5 -5
package/dist/coordinators/modes/implement.js +4 -4
package/dist/coordinators/pr-review.js +4 -4
package/dist/daemon/workflow-runner.d.ts +1 -0
package/dist/daemon/workflow-runner.js +1 -0
package/dist/infrastructure/storage/schema-validating-workflow-storage.d.ts +21 -2
package/dist/infrastructure/storage/schema-validating-workflow-storage.js +48 -0
package/dist/manifest.json +41 -41
package/dist/mcp/handlers/v2-workflow.js +24 -7
package/dist/mcp/output-schemas.d.ts +36 -0
package/dist/mcp/output-schemas.js +11 -1
package/dist/mcp/workflow-protocol-contracts.js +2 -2
package/dist/v2/projections/session-metrics.d.ts +1 -1
package/dist/v2/projections/session-metrics.js +16 -35
package/dist/v2/usecases/console-routes.d.ts +2 -2
package/docs/authoring-v2.md +4 -4
package/docs/changelog-recent.md +3 -3
package/docs/configuration.md +1 -1
package/docs/design/adaptive-coordinator-context-candidates.md +1 -1
package/docs/design/adaptive-coordinator-context.md +1 -1
package/docs/design/adaptive-coordinator-routing-candidates.md +18 -18
package/docs/design/adaptive-coordinator-routing-review.md +1 -1
package/docs/design/adaptive-coordinator-routing.md +34 -34
package/docs/design/agent-cascade-protocol.md +2 -2
package/docs/design/console-daemon-separation-discovery.md +323 -0
package/docs/design/context-assembly-design-candidates.md +1 -1
package/docs/design/context-assembly-implementation-plan.md +1 -1
package/docs/design/context-assembly-layer.md +2 -2
package/docs/design/context-assembly-review-findings.md +1 -1
package/docs/design/coordinator-access-audit.md +293 -0
package/docs/design/coordinator-architecture-audit.md +62 -0
package/docs/design/coordinator-error-handling-audit.md +240 -0
package/docs/design/coordinator-testability-audit.md +426 -0
package/docs/design/daemon-architecture-discovery.md +1 -1
package/docs/design/daemon-console-separation-discovery.md +242 -0
package/docs/design/daemon-memory-audit.md +203 -0
package/docs/design/design-candidates-console-daemon-separation.md +256 -0
package/docs/design/design-candidates-discovery-loop-fix.md +141 -0
package/docs/design/design-review-findings-console-daemon-separation.md +106 -0
package/docs/design/design-review-findings-discovery-loop-fix.md +81 -0
package/docs/design/discovery-loop-fix-candidates.md +161 -0
package/docs/design/discovery-loop-fix-design-review.md +106 -0
package/docs/design/discovery-loop-fix-validation.md +258 -0
package/docs/design/discovery-loop-investigation-A.md +188 -0
package/docs/design/discovery-loop-investigation-B.md +287 -0
package/docs/design/exploration-workflow-candidates.md +205 -0
package/docs/design/exploration-workflow-design-review.md +166 -0
package/docs/design/exploration-workflow-discovery.md +443 -0
package/docs/design/ide-context-files-candidates.md +231 -0
package/docs/design/ide-context-files-design-review.md +85 -0
package/docs/design/ide-context-files.md +615 -0
package/docs/design/implementation-plan-discovery-loop-fix.md +199 -0
package/docs/design/implementation-plan-queue-poll-rotation.md +102 -0
package/docs/design/in-process-http-audit.md +190 -0
package/docs/design/layer3b-ghost-nodes-design-candidates.md +2 -2
package/docs/design/loadSessionNotes-candidates.md +108 -0
package/docs/design/loadSessionNotes-test-coverage-discovery.md +297 -0
package/docs/design/loadSessionNotes-test-coverage-session4.md +209 -0
package/docs/design/loadSessionNotes-test-coverage-v3.md +321 -0
package/docs/design/probe-session-design-candidates.md +261 -0
package/docs/design/probe-session-phase0.md +490 -0
package/docs/design/routines-guide.md +7 -7
package/docs/design/session-metrics-attribution-candidates.md +250 -0
package/docs/design/session-metrics-attribution-design-review.md +115 -0
package/docs/design/session-metrics-attribution-discovery.md +319 -0
package/docs/design/session-metrics-candidates.md +227 -0
package/docs/design/session-metrics-design-review.md +104 -0
package/docs/design/session-metrics-discovery.md +454 -0
package/docs/design/spawn-session-debug.md +202 -0
package/docs/design/trigger-validator-candidates.md +214 -0
package/docs/design/trigger-validator-review.md +109 -0
package/docs/design/trigger-validator-shaping-phase0.md +239 -0
package/docs/design/trigger-validator.md +454 -0
package/docs/design/v2-core-design-locks.md +2 -2
package/docs/design/workflow-extension-points.md +15 -15
package/docs/design/workflow-id-validation-at-startup.md +1 -1
package/docs/design/workflow-id-validation-implementation-plan.md +2 -2
package/docs/design/workflow-trigger-lifecycle-audit.md +175 -0
package/docs/design/worktrain-task-queue-candidates.md +5 -5
package/docs/design/worktrain-task-queue.md +4 -4
package/docs/discovery/coordinator-script-design.md +1 -1
package/docs/discovery/coordinator-ux-discovery.md +3 -3
package/docs/discovery/simulation-report.md +1 -1
package/docs/discovery/workflow-modernization-discovery.md +326 -0
package/docs/discovery/workflow-selection-for-discovery-tasks.md +33 -33
package/docs/discovery/worktrain-status-briefing.md +1 -1
package/docs/discovery/wr-discovery-goal-reframing.md +1 -1
package/docs/docker.md +1 -1
package/docs/ideas/backlog.md +227 -0
package/docs/ideas/third-party-workflow-setup-design-thinking.md +1 -1
package/docs/integrations/claude-code.md +5 -5
package/docs/integrations/firebender.md +1 -1
package/docs/plans/agentic-orchestration-roadmap.md +2 -2
package/docs/plans/mr-review-workflow-redesign.md +9 -9
package/docs/plans/ui-ux-workflow-design-candidates.md +4 -4
package/docs/plans/ui-ux-workflow-discovery.md +2 -2
package/docs/plans/workflow-categories-candidates.md +8 -8
package/docs/plans/workflow-categories-discovery.md +4 -4
package/docs/plans/workflow-modernization-design.md +430 -0
package/docs/plans/workflow-staleness-detection-candidates.md +11 -11
package/docs/plans/workflow-staleness-detection-review.md +4 -4
package/docs/plans/workflow-staleness-detection.md +9 -9
package/docs/plans/workrail-platform-vision.md +3 -3
package/docs/reference/agent-context-cleaner-snippet.md +1 -1
package/docs/reference/agent-context-guidance.md +4 -4
package/docs/reference/context-optimization.md +2 -2
package/docs/roadmap/now-next-later.md +2 -2
package/docs/roadmap/open-work-inventory.md +16 -16
package/docs/workflows.md +31 -31
package/package.json +1 -1
package/spec/workflow-tags.json +47 -47
package/workflows/adaptive-ticket-creation.json +16 -16
package/workflows/architecture-scalability-audit.json +22 -22
package/workflows/bug-investigation.agentic.v2.json +3 -3
package/workflows/classify-task-workflow.json +1 -1
package/workflows/coding-task-workflow-agentic.json +6 -6
package/workflows/cross-platform-code-conversion.v2.json +8 -8
package/workflows/document-creation-workflow.json +8 -8
package/workflows/documentation-update-workflow.json +8 -8
package/workflows/intelligent-test-case-generation.json +2 -2
package/workflows/learner-centered-course-workflow.json +2 -2
package/workflows/mr-review-workflow.agentic.v2.json +4 -4
package/workflows/personal-learning-materials-creation-branched.json +8 -8
package/workflows/presentation-creation.json +5 -5
package/workflows/production-readiness-audit.json +1 -1
package/workflows/relocation-workflow-us.json +31 -31
package/workflows/routines/context-gathering.json +1 -1
package/workflows/routines/design-review.json +1 -1
package/workflows/routines/execution-simulation.json +1 -1
package/workflows/routines/feature-implementation.json +3 -3
package/workflows/routines/final-verification.json +1 -1
package/workflows/routines/hypothesis-challenge.json +1 -1
package/workflows/routines/ideation.json +1 -1
package/workflows/routines/parallel-work-partitioning.json +3 -3
package/workflows/routines/philosophy-alignment.json +2 -2
package/workflows/routines/plan-analysis.json +1 -1
package/workflows/routines/plan-generation.json +1 -1
package/workflows/routines/tension-driven-design.json +6 -6
package/workflows/scoped-documentation-workflow.json +26 -26
package/workflows/ui-ux-design-workflow.json +14 -14
package/workflows/workflow-diagnose-environment.json +1 -1
package/workflows/workflow-for-workflows.json +32 -77
package/workflows/workflow-for-workflows.v2.json +0 -788

package/docs/design/coordinator-testability-audit.md ADDED

@@ -0,0 +1,426 @@
+# Coordinator Testability Audit
+Generated: 2026-04-19
+## Context
+**Motivating bug:** The `awaitSessions` HTTP bug shipped to production. The real implementation
+polled an HTTP console endpoint instead of reading from the session store. The unit-test fake
+always returned a success `AwaitResult`, so the wrong implementation path was invisible to tests.
+**Audit question:** For each dependency function in `AdaptiveCoordinatorDeps`, does the test fake
+exist? Does it simulate realistic failure modes? Could the real implementation fail in a way the
+fake would NOT catch?
+**Scope:**
+- `tests/unit/adaptive-implement.test.ts`
+- `tests/unit/adaptive-full-pipeline.test.ts`
+- `tests/unit/route-task.test.ts`
+- `src/coordinators/adaptive-pipeline.ts` -- `AdaptiveCoordinatorDeps` full interface
+- `src/coordinators/pr-review.ts` -- `CoordinatorDeps` (parent interface)
+- `src/trigger/trigger-listener.ts` -- real production wiring (lines 640-752)
+- `src/cli-worktrain.ts` -- real `awaitSessions` wiring (lines 1336-1372)
+**Principle under audit:** "Prefer fakes over mocks -- tests should validate behavior with
+realistic substitutes." (CLAUDE.md)
+---
+## Interface Summary
+`AdaptiveCoordinatorDeps` extends `CoordinatorDeps` with 5 additional deps.
+Total: 21 named deps (16 inherited, 5 added) plus one optional (`contextAssembler`).
+**Inherited from `CoordinatorDeps`:**
+| Dep | Type |
+|-----|------|
+| `spawnSession` | async, returns `Result<string, string>` |
+| `awaitSessions` | async, returns `AwaitResult` |
+| `getAgentResult` | async, returns `{ recapMarkdown, artifacts }` |
+| `listOpenPRs` | async, returns `PrSummary[]` |
+| `mergePR` | async, returns `Result<void, string>` |
+| `writeFile` | async, returns `void` |
+| `readFile` | async, returns `string` (throws on ENOENT) |
+| `appendFile` | async, returns `void` |
+| `mkdir` | async, returns `string \| undefined` |
+| `stderr` | sync, void |
+| `now` | sync, returns `number` |
+| `port` | value |
+| `homedir` | sync, returns `string` |
+| `joinPath` | sync, returns `string` |
+| `nowIso` | sync, returns `string` |
+| `generateId` | sync, returns `string` |
+| `contextAssembler` | optional, `ContextAssembler` |
+**Added by `AdaptiveCoordinatorDeps`:**
+| Dep | Type |
+|-----|------|
+| `fileExists` | sync, returns `boolean` |
+| `archiveFile` | async, returns `void` |
+| `pollForPR` | async, returns `string \| null` |
+| `postToOutbox` | async, returns `void` |
+| `pollOutboxAck` | async, returns `'acked' \| 'timeout'` |
+---
+## Per-Dep Analysis
+### 1. `awaitSessions`
+**Fake default:** Always returns `makeSuccessAwait(handles[0])` -- outcome is always `'success'`.
+**Fake failure coverage:** `makeFailedAwait()` and `makeTimeoutAwait()` helpers exist and are used
+in explicit test overrides (e.g., "escalates when coding session times out"). These cover the
+`outcome: 'failed'` and `outcome: 'timeout'` contract branches.
+**Real implementation (cli-worktrain.ts:1336):** Calls `executeWorktrainAwaitCommand` which makes
+HTTP requests to the daemon console port. If the HTTP call fails or the result is unparseable,
+`resolvedResult` stays null and the function returns all sessions as `outcome: 'failed'`.
+**Gap:** The fake does not simulate the scenario where `awaitSessions` returns all-failed because
+the daemon is unreachable (port unavailable). This is different from a session failing: it means
+the coordinator cannot determine any session status. While `makeFailedAwait` covers the contract
+shape, it does not represent the CAUSE: the coordinator trusts `awaitSessions` to correctly
+reflect session state, but the real impl can return all-failed even when sessions succeeded.
+**The awaitSessions bug:** The bug was that the real impl polled HTTP instead of reading the
+session store. A fake that validates contract shape cannot catch this. However, a fake that
+simulates port-unavailable failure (returning all-failed unconditionally when `port` is set to 0
+or -1) would have surfaced this as a test gap: the coordinator would escalate, and the test author
+would ask why.
+**Missing scenarios:**
+- `awaitSessions` returns all-failed because daemon is unreachable (simulated via `port = 0`)
+- `awaitSessions` called with an empty handles array
+**Severity: HIGH**
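A minimal TypeScript sketch of the missing daemon-unreachable fake follows. The `AwaitResult` field names, `makeAwaitSessionsFake`, and `decideAfterAwait` are hypothetical. The audit fixes only the outcome values and the idea that an unusable `port` should yield all-failed. The decision step also treats an empty result set as an escalation, which covers the empty-handles scenario. That choice is an assumption about the desired behavior, not a description of the current coordinator.

```typescript
// Hypothetical shapes: the audit fixes the outcome values, not these field names.
type SessionOutcome = 'success' | 'failed' | 'timeout';

interface SessionResult {
  handle: string;
  outcome: SessionOutcome;
}

interface AwaitResult {
  results: SessionResult[];
}

// Fake awaitSessions that models the transport, not just the contract:
// a port of 0 or below stands for "daemon not running", and every session
// is then reported as 'failed', like the real impl's unreachable fallback.
function makeAwaitSessionsFake(port: number) {
  return async (handles: readonly string[]): Promise<AwaitResult> => ({
    results: handles.map(
      (handle): SessionResult => ({
        handle,
        outcome: port > 0 ? 'success' : 'failed',
      }),
    ),
  });
}

// Hypothetical coordinator decision: continue only when at least one
// session was awaited and all of them succeeded. The length check stops a
// vacuous every() on an empty handles array from reading as success.
function decideAfterAwait(result: AwaitResult): 'continue' | 'escalated' {
  const allSucceeded =
    result.results.length > 0 &&
    result.results.every((r) => r.outcome === 'success');
  return allSucceeded ? 'continue' : 'escalated';
}
```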
+---
+### 2. `getAgentResult`
+**Fake default:** Always returns `{ recapMarkdown: 'APPROVE -- no findings. LGTM.', artifacts: [] }`.
+**Fake failure coverage:** None in the default fake. Tests that need a specific result override
+`getAgentResult` inline, but no test simulates a network failure.
+**Real implementation (cli-worktrain.ts:1374):** Makes two raw `globalThis.fetch()` calls (session
+detail, then node detail). A rejected `fetch()` is not caught inside the function, and the
+`getAgentResult` call sites in `implement-shared.ts` and `full-pipeline.ts` have no try/catch,
+so a network error propagates as an unhandled exception and crashes the coordinator.
+**Gap 1 (success bias):** Default fake never returns `{ recapMarkdown: null, artifacts: [] }`,
+which is the real impl's fallback on HTTP failure. The `null` recap path IS tested in
+`adaptive-full-pipeline.test.ts` but only via explicit override, not as the default.
+**Gap 2 (throw injection):** The real impl throws on network error (fetch rejects). The callers
+in `implement-shared.ts` do not have try/catch around `getAgentResult`. If this throws, the
+coordinator crashes rather than returning `{ kind: 'escalated' }`. No test exercises this.
+**Missing scenarios:**
+- `getAgentResult` returns `{ recapMarkdown: null, artifacts: [] }` as the default (should escalate gracefully)
+- `getAgentResult` throws a network error (coordinator should escalate, not crash)
+**Severity: HIGH**
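Below is a sketch of the try/catch the audit asks for around `getAgentResult`, together with a throw-injecting fake. `readRecapOrEscalate`, `StepOutcome`, and the `reason` strings are hypothetical. Only these details come from the analysis above:

- the `recapMarkdown` and `artifacts` fields
- the `null` recap fallback
- the `TypeError: fetch failed` rejection

```typescript
// Hypothetical declarations modelled on the fields named in the audit.
interface AgentResult {
  recapMarkdown: string | null;
  artifacts: unknown[];
}

type StepOutcome =
  | { kind: 'ok'; recap: string }
  | { kind: 'escalated'; reason: string };

// Converts both failure modes of a getAgentResult-style dep into an
// escalation: a rejected promise (network error) and a null recap
// (the real impl's fallback on HTTP failure).
async function readRecapOrEscalate(
  getAgentResult: (handle: string) => Promise<AgentResult>,
  handle: string,
): Promise<StepOutcome> {
  try {
    const result = await getAgentResult(handle);
    if (result.recapMarkdown === null) {
      return { kind: 'escalated', reason: 'agent produced no recap' };
    }
    return { kind: 'ok', recap: result.recapMarkdown };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { kind: 'escalated', reason: `getAgentResult failed: ${message}` };
  }
}

// Throw-injecting fake: rejects the way Node's fetch does on a network error.
const throwingGetAgentResult = async (_handle: string): Promise<AgentResult> => {
  throw new TypeError('fetch failed');
};
```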
+---
+### 3. `spawnSession`
+**Fake default:** Returns `ok(nextHandle())` -- always succeeds.
+**Fake failure coverage:** GOOD. Multiple tests use `vi.fn().mockResolvedValue(err('...'))` and
+workflow-specific failure injection. The `err()` path is well-tested for all spawn points
+(coding, UX gate, review, fix loop).
+**Real implementation:** Makes an HTTP POST to the daemon console. Returns `err()` on HTTP failure.
+Does not throw -- uses Result type consistently.
+**Gap:** None significant. Zombie handle detection (empty-string handle) is NOT explicitly tested
+in the adaptive tests (it is tested in the `pr-review.ts` context), but the structural coverage is good.
+**Missing scenarios:**
+- `spawnSession` returns `ok('')` (empty handle / zombie detection) for adaptive modes
+**Severity: LOW**
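Here is a sketch of zombie-handle detection for the adaptive modes. The `Result`, `ok`, and `err` definitions are stand-ins modelled on the `ok(...)`/`err(...)` calls the tests use; they are not the project's actual types. `validateSpawn` is a hypothetical name.

```typescript
// Stand-in Result type in the ok()/err() style referenced by the tests.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// An empty handle is treated as a failed spawn ("zombie"): the daemon
// reported success but returned nothing the coordinator could await.
function validateSpawn(result: Result<string, string>): Result<string, string> {
  if (!result.ok) return result;
  if (result.value === '') return err('spawnSession returned an empty handle');
  return result;
}
```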
+---
+### 4. `pollForPR`
+**Fake default:** Returns `'https://github.com/org/repo/pull/42'` -- always finds a PR.
+**Fake failure coverage:** `null` return IS tested via explicit override (`pollForPR: vi.fn().mockResolvedValue(null)`
+in "escalates when no PR is found after coding session"). So the null path has a test.
+**Real implementation (trigger-listener.ts:665):** Shells out to `gh pr list` every 30 seconds until
+timeout. Can fail silently if `gh` is not installed, not authenticated, or the branch pattern
+has no match. Returns null after timeout.
+**Gap:** The default fake always returns a PR URL, so any regression that makes the real impl
+always return null would be masked in all other tests. The `gh` CLI failure mode (throws, then
+continues polling) is completely untested.
+**Missing scenarios:**
+- Default fake returning null (to catch regressions earlier -- currently only one explicit test)
+- `gh` CLI throws on every poll but eventually times out
+**Severity: MEDIUM**
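The next sketch shows a `pollForPR`-style loop whose clock and command runner are injected. This lets the "`gh` throws on every poll" scenario run instantly in a test. The 30-second interval and the null-on-timeout result come from the description above. `PollDeps`, `runGh`, `pollForPrUrl`, and `makeFakeClock` are hypothetical, and `runGh` is assumed to resolve to a PR URL or an empty string.

```typescript
// Hypothetical seams; the real impl shells out to gh directly.
interface PollDeps {
  runGh: () => Promise<string>; // assumed: a PR URL, or '' when none matches
  now: () => number; // milliseconds
  sleep: (ms: number) => Promise<void>;
  log: (msg: string) => void;
}

async function pollForPrUrl(
  deps: PollDeps,
  timeoutMs: number,
  intervalMs = 30_000,
): Promise<string | null> {
  const deadline = deps.now() + timeoutMs;
  while (deps.now() < deadline) {
    try {
      const url = (await deps.runGh()).trim();
      if (url !== '') return url;
    } catch (e) {
      // gh missing or unauthenticated: log and keep polling until timeout.
      deps.log(`gh failed: ${e instanceof Error ? e.message : String(e)}`);
    }
    await deps.sleep(intervalMs);
  }
  return null; // timed out without finding a PR
}

// Fake clock: sleep advances time instantly, so tests do not wait.
function makeFakeClock(start = 0) {
  let t = start;
  return {
    now: () => t,
    sleep: async (ms: number) => {
      t += ms;
    },
  };
}
```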
+---
+### 5. `postToOutbox`
+**Fake default:** Returns `undefined` (`vi.fn().mockResolvedValue(undefined)`) -- always succeeds.
+**Fake failure coverage:** Tests verify that `postToOutbox` WAS CALLED with the right arguments,
+but no test verifies coordinator behavior when `postToOutbox` throws.
+**Real implementation (trigger-listener.ts:694):** Writes a JSON line to `~/.workrail/outbox.jsonl`
+using `fs.promises.appendFile`. Can fail on disk full, missing directory, or permission error.
+**Gap:** `postToOutbox` is called at critical escalation decision points (fix loop exhausted,
+human review required, do-not-merge). If it throws, the coordinator crashes at that point and
+never returns a `PipelineOutcome`. Callers do not wrap calls in try/catch.
+**Missing scenarios:**
+- `postToOutbox` throws an error (coordinator should continue and return escalated outcome)
+**Severity: MEDIUM**
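This sketch shows the non-fatal outbox write that the missing scenario calls for. `escalate`, `EscalatedOutcome`, and the single-string message argument are hypothetical. The point is that a rejected `postToOutbox` is logged and the escalated outcome is still returned to the caller.

```typescript
// Hypothetical outcome shape; the audit only requires an escalated result.
type EscalatedOutcome = { kind: 'escalated'; reason: string };

// Best-effort notification: a failed outbox write (disk full, missing
// directory, permission error) is logged, and the decision is still returned.
async function escalate(
  reason: string,
  postToOutbox: (message: string) => Promise<void>,
  stderr: (line: string) => void,
): Promise<EscalatedOutcome> {
  try {
    await postToOutbox(reason);
  } catch (e) {
    stderr(`outbox write failed: ${e instanceof Error ? e.message : String(e)}`);
  }
  return { kind: 'escalated', reason };
}
```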
+---
+### 6. `pollOutboxAck`
+**Fake default:** Returns `'acked'` -- always acknowledged immediately.
+**Fake failure coverage:** NONE. No test in either file exercises the `'timeout'` return value.
+**Real implementation (trigger-listener.ts:707):** Polls `inbox-cursor.json` every 5 minutes
+for up to 24 hours. The `'timeout'` path is the most likely real-world outcome because users
+do not always ack notifications promptly.
+**Gap:** The UX gate escalation path on `'timeout'` is completely untested. The `'timeout'`
+branch exists in `full-pipeline.ts` but no test triggers it. Any regression in that branch
+(e.g., forgetting to escalate, forgetting to post another outbox message) would be invisible.
+The UX gate is triggered by goals containing: 'ui', 'screen', 'component', 'design', 'ux',
+'frontend' -- a common set of keywords, not an edge case.
+**Missing scenarios:**
+- `pollOutboxAck` returns `'timeout'` -- coordinator should escalate gracefully
+**Severity: HIGH**
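The sketch below covers the UX gate's `'timeout'` branch. The keyword list and the `'acked' | 'timeout'` contract come from the analysis above. Whole-word keyword matching, `runUxGate`, and `GateOutcome` are assumptions, and the real `full-pipeline.ts` may match keywords differently.

```typescript
type AckResult = 'acked' | 'timeout';
type GateOutcome = { kind: 'proceed' } | { kind: 'escalated'; reason: string };

const UX_KEYWORDS = ['ui', 'screen', 'component', 'design', 'ux', 'frontend'];

// Assumed whole-word match on the lowercased goal.
function needsUxGate(goal: string): boolean {
  const words = goal.toLowerCase().split(/[^a-z0-9]+/);
  return UX_KEYWORDS.some((k) => words.includes(k));
}

async function runUxGate(
  goal: string,
  pollOutboxAck: () => Promise<AckResult>,
): Promise<GateOutcome> {
  if (!needsUxGate(goal)) return { kind: 'proceed' };
  const ack = await pollOutboxAck();
  // 'timeout' is the likely real-world result, so it must map to an
  // explicit escalation rather than falling through.
  return ack === 'acked'
    ? { kind: 'proceed' }
    : { kind: 'escalated', reason: 'UX gate not acknowledged before timeout' };
}
```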
+---
+### 7. `archiveFile`
+**Fake default:** Returns `undefined` -- always succeeds.
+**Fake failure coverage:** GOOD. One test explicitly tests `archiveFile` throwing:
+"logs a warning if archiveFile throws but does not change the outcome". The coordinator
+uses a try/catch wrapper around `archiveFile` in a finally block. This is the only dep
+with proper throw-handling coverage.
+**Real implementation:** `fs.promises.rename(src, dest)` -- can fail on cross-device rename,
+missing dest directory, or permission error.
+**Gap:** None. Well-tested.
+**Severity: LOW**
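The pattern credited above can be sketched as follows, with hypothetical names. Archiving runs in a `finally` block. Its failure (for example `EXDEV` from a cross-device rename) is logged without replacing the outcome of the main work.

```typescript
async function runThenArchive<T>(
  work: () => Promise<T>,
  archiveFile: (src: string, dest: string) => Promise<void>,
  src: string,
  dest: string,
  warn: (msg: string) => void,
): Promise<T> {
  try {
    return await work();
  } finally {
    // The inner try/catch keeps an archive failure from masking the result
    // (or the error) produced by work().
    try {
      await archiveFile(src, dest);
    } catch (e) {
      warn(`archiveFile failed: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
}
```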
+---
+### 8. `fileExists`
+**Fake default:** Returns `false` (in `makeFakeDeps()`). `route-task.test.ts` uses explicit
+`noPitch` and `hasPitch` fakes.
+**Fake failure coverage:** Good for routing tests. The `fileExists` dep is sync and local;
+it has no network failure modes.
+**Real implementation:** `fs.existsSync(p)` -- cannot throw in normal operation.
+**Gap:** None significant.
+**Severity: LOW**
+---
+### 9. `mergePR`
+**Fake default:** Returns `ok(undefined)` -- always succeeds.
+**Fake failure coverage:** None for IMPLEMENT/FULL mode tests. `mergePR` is called by
+`runPrReviewCoordinator` (pr-review.ts), not by IMPLEMENT/FULL mode logic. The IMPLEMENT
+and FULL pipeline tests include `mergePR` in their fakes, but it is never called by the
+code under test.
+**Real implementation:** Shells out to `gh pr merge --squash`. Can fail on merge conflict,
+required CI not passing, or branch protection rule violation.
+**Gap:** Low severity because `mergePR` is not called in IMPLEMENT/FULL mode. The fake
+is structurally correct but its presence in `makeFakeDeps()` is misleading -- it implies
+IMPLEMENT/FULL mode merges PRs, which it does not (merging is delegated to the review
+coordinator). The misleading presence could cause future confusion.
+**Missing scenarios:** N/A for IMPLEMENT/FULL tests. Reviewed separately in pr-review tests.
+**Severity: LOW (structural confusion only)**
+---
+### 10. `listOpenPRs`
+Same analysis as `mergePR`: present in fakes but not called by IMPLEMENT/FULL mode logic.
+**Severity: LOW (structural confusion only)**
+---
+### 11. `writeFile`
+**Fake default:** Returns `undefined`. Called by `runAdaptivePipeline` for routing log writes
+and by `writeReport` in pr-review mode. Both callers wrap in try/catch -- routing log failure
+is explicitly non-fatal.
+**Gap:** None significant for IMPLEMENT/FULL. The try/catch wrapper means failures are already
+safe.
+**Severity: LOW**
+---
+### 12. `readFile`
+**Fake default:** Throws `Object.assign(new Error('ENOENT'), { code: 'ENOENT' })` -- simulates
+missing file. This is the most realistic default of any fake in the suite: it forces callers
+to handle ENOENT, which is the most common real-world readFile failure.
+**Gap:** None. Well-designed default.
+**Severity: LOW**
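Here is the default `readFile` fake described above, paired with a hypothetical caller (`readOptional`) that maps ENOENT to `null` and rethrows any other error. Only the fake's throw shape comes from the audit.

```typescript
// Default fake: every read fails as if the file were missing.
const readFileFake = async (_path: string): Promise<string> => {
  throw Object.assign(new Error('ENOENT'), { code: 'ENOENT' });
};

function isEnoent(e: unknown): boolean {
  return typeof e === 'object' && e !== null && (e as { code?: unknown }).code === 'ENOENT';
}

// Hypothetical caller: a missing file means "nothing yet"; anything else
// (EACCES, EISDIR, ...) is a real error and propagates.
async function readOptional(
  readFile: (path: string) => Promise<string>,
  path: string,
): Promise<string | null> {
  try {
    return await readFile(path);
  } catch (e) {
    if (isEnoent(e)) return null;
    throw e;
  }
}
```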
+---
+### 13. `contextAssembler` (optional)
+**Fake:** Absent from all `makeFakeDeps()` in adaptive tests. `contextAssembler` is an optional
+field and its absence means context assembly never runs in unit tests.
+**Real implementation:** Assembles git diff and prior session notes. Can fail if the git command
+fails or the session store is unreachable.
+**Gap:** Context assembly failures are completely invisible to unit tests. If a regression in
+context assembly caused `spawnSession` to receive malformed context and crash, unit tests would
+not catch it. However, this is intentional -- the optional nature of the dep means the coordinator
+is designed to work without it.
+**Severity: LOW (by design, but worth documenting)**
+---
+### 14. `stderr`, `now`, `port`, `homedir`, `joinPath`, `nowIso`, `generateId`
+These are synchronous utility functions with trivial or no failure modes.
+- `stderr`: `vi.fn()` -- never throws in real impl
+- `now`: `vi.fn().mockReturnValue(Date.now())` -- realistic
+- `port`: hardcoded `3456` -- does not simulate "port 0 = daemon not running"
+- `homedir`: returns `'/home/test'` -- realistic enough for path construction
+- `joinPath`: uses string concatenation -- realistic
+- `nowIso`: returns ISO string -- realistic
+- `generateId`: returns random string -- realistic
+**Gap for `port`:** The `port` value in fakes is always 3456 (a valid port). A fake with `port = 0`
+or `port = -1` combined with a failure-simulating `awaitSessions` would represent "daemon not
+running" more realistically. This is the test scenario that would have caught the awaitSessions
+HTTP bug.
+**Severity: LOW (port gap is only meaningful when combined with awaitSessions)**
+---
+## vi.mock() Audit
+**Finding:** Zero `vi.mock()` calls in any of the three test files. All dependencies are
+injected via `makeFakeDeps()` or explicit object literals. This correctly follows the
+"prefer fakes over mocks" principle.
+The `vi.fn()` calls within `makeFakeDeps()` create Vitest spy functions (Jest-compatible) that
+serve as the fake's own methods; they are not module-level mocks. This is the correct pattern.
+---
+## Missing Test Scenarios Summary
+| Gap | Dep | Severity | File |
+|-----|-----|----------|------|
+| Fake never simulates daemon-unreachable (all-failed when port unavailable) | `awaitSessions` | HIGH | both |
+| Fake never throws network error; callers lack try/catch | `getAgentResult` | HIGH | both |
+| `pollOutboxAck` `'timeout'` path never exercised | `pollOutboxAck` | HIGH | `adaptive-full-pipeline.test.ts` |
+| `postToOutbox` throw not tested; callers lack try/catch | `postToOutbox` | MEDIUM | both |
+| Default always returns PR URL; only one explicit null test | `pollForPR` | MEDIUM | both |
+| Empty-handle zombie detection not tested in adaptive modes | `spawnSession` | LOW | both |
+| `port = 3456` always valid; never simulates "daemon not running" | `port` | LOW | both |
+| `contextAssembler` absent from all fakes | `contextAssembler` | LOW | both |
+| `mergePR` / `listOpenPRs` in fakes but never called by IMPLEMENT/FULL | `mergePR`, `listOpenPRs` | LOW (misleading) | both |
+---
+## Severity Rankings
+### HIGH (must fix -- same class of gap as the awaitSessions bug)
+1. **`awaitSessions` daemon-unreachable scenario** -- The exact class of bug that motivated this
+   audit. Add a test that injects `awaitSessions` returning all-failed to simulate an unreachable
+   daemon port. Verify the coordinator escalates gracefully instead of crashing.
+2. **`getAgentResult` throw injection** -- The real impl uses raw `fetch()` without try/catch in
+   callers. Add a test where `getAgentResult` throws a `TypeError: fetch failed`. The coordinator
+   should catch this and return `{ kind: 'escalated' }`. Currently it would crash.
+   The fix also requires adding try/catch around the `getAgentResult` calls in `implement-shared.ts`.
+3. **`pollOutboxAck` timeout path** -- Add a test for the UX gate in `full-pipeline.ts` where
+   `pollOutboxAck` returns `'timeout'`. Verify the coordinator escalates and does not hang.
+   This is the most common real-world outcome of the UX gate.
+### MEDIUM (should fix)
+4. **`postToOutbox` throw injection** -- Add a test where `postToOutbox` throws. The coordinator
+   calls it at critical decision points without try/catch; a throw currently crashes the pipeline.
+5. **`pollForPR` null as default** -- The null path is covered by one explicit test, but the
+   default fake always returns a URL. Consider making null the default in a second
+   `makeFakeDeps` variant used for failure-path tests, to catch regressions earlier.
+### LOW (address in cleanup)
+6. **`spawnSession` zombie handle** -- Add one test where `spawnSession` returns `ok('')` (empty
+   handle) for IMPLEMENT/FULL modes and verify the coordinator escalates.
+7. **`mergePR` / `listOpenPRs` in `makeFakeDeps`** -- These deps are not called by IMPLEMENT/FULL
+   mode. Remove them from `makeFakeDeps` to reduce noise, or add a comment explaining they are
+   inherited from `CoordinatorDeps` for REVIEW_ONLY/QUICK_REVIEW modes.
+8. **`contextAssembler` smoke test** -- Add at least one test that injects a minimal
+   `contextAssembler` fake to verify context threading in IMPLEMENT/FULL modes.
+---
+## Architectural Note
+The awaitSessions HTTP bug was an implementation-path bug, not an interface-contract bug. No
+unit-test fake can catch "the real implementation chose the wrong data source." What a better
+fake CAN do is simulate the *outcome* of that wrong choice (daemon unreachable = all-failed)
+so the coordinator's escalation path for that outcome is exercised. The gap was not "bad fake"
+but "untested escalation branch for a failure mode that occurs in production."
+The correct fix operates at two levels:
+1. **Fake level:** Simulate transport failures (port unavailable, network error) to exercise
+   escalation paths.
+2. **Implementation level:** Wrap `getAgentResult`, `pollForPR`, and `postToOutbox` calls in
+   try/catch in the mode files, so transport errors return `{ kind: 'escalated' }` rather than
+   crashing the coordinator.

package/docs/design/daemon-architecture-discovery.md CHANGED

@@ -96,7 +96,7 @@ etc.) to call its MCP tools over a transport (stdio or HTTP). The process entry
 | **Team lead** | Get consistent, enforced process on every MR without training reviewers | Reviews are ad-hoc; agents drift and skip steps |
 | **Platform/infra engineer** | Deploy WorkRail as a service on cloud infrastructure | WorkRail is a local tool that exits when the terminal closes |
 | **Workflow author** | Write a workflow once, have it run identically in both manual and autonomous mode | Today: manual mode only; would need to rewrite for autonomous mode if it existed separately |
-| **WorkRail itself (self-improvement)** | Run `workflow-for-workflows` to author new workflows autonomously | Cannot initiate its own workflows; must be driven by a human |
+| **WorkRail itself (self-improvement)** | Run `wr.workflow-for-workflows` to author new workflows autonomously | Cannot initiate its own workflows; must be driven by a human |
 ### Core tension

package/docs/design/daemon-console-separation-discovery.md ADDED

@@ -0,0 +1,242 @@
+# Daemon Console Separation -- Architectural Discovery
+## About This Document
+This is a **human-readable artifact** capturing the discovery process and findings. It is NOT execution memory -- the workflow's durable state lives in WorkRail session notes and context variables. This doc is for the owner to read when reviewing the recommendation or handing work to a coding agent.
+---
+## Context / Ask
+The owner wants STRICT separation between the three WorkTrain/WorkRail systems: the MCP server, the daemon, and the console. Currently the daemon starts an embedded console server (`src/trigger/daemon-console.ts`) that holds live references to daemon internals. This creates a port conflict with the standalone console and prevents the standalone console from being the single independently-runnable console process.
+**Stated goal (solution-statement):** Split by port -- standalone console on 3456 reads filesystem, daemon control endpoints move to port 3200.
+**Reframed problem:** The daemon and standalone console fight over port 3456 because the daemon embeds a console that duplicates the standalone console's role while adding live daemon object wiring -- yet the browser UI only needs dispatch access, not the full daemon wiring.
+---
+## Path Recommendation
+**Path: `landscape_first`**
+The reframed problem is well understood from source reading. The dominant need now is comparing the viable architectural approaches (eliminate daemon-console entirely, split-by-port, proxy approach) against the actual constraints in the codebase. We already have enough problem grounding from Step 1 -- the landscape of options is the open question.
+*Why not `design_first`*: The problem is already well-scoped. We know what's wrong. We need to know which fix is least disruptive and most maintainable.
+*Why not `full_spectrum`*: The goal was a solution-statement, but we already reframed it in Step 1. We have clear success criteria. No further reframing is needed.
+---
+## Constraints / Anti-goals
+**Constraints:**
+- The MCP server (`src/mcp/`) must be touched as little as possible -- it is used in production by people other than the owner
+- The standalone console must remain independently runnable (no daemon dependency)
+- The daemon's control operations (dispatch, steer, poll) must remain HTTP-accessible
+- The browser UI dispatch button must continue to work
+**Anti-goals:**
+- Do not add auth or multi-user features in this change
+- Do not move the webhook receiver (port 3200) off its current role
+- Do not break existing `worktrain console` command behavior
+- Do not require changes to the frontend's Vite dev proxy config unless absolutely necessary
+---
+## Landscape Packet
+### Current State Summary
+Three separate server processes use port 3456. The first two also write the same lock file (`~/.workrail/daemon-console.lock`); the legacy MCP console writes `dashboard.lock` instead:
+1. **`src/console/standalone-console.ts`** (`worktrain console`) -- already filesystem-only, no daemon coupling. Calls `mountConsoleRoutes()` with no daemon objects (all optional params omitted). This is the CORRECT target state.
+2. **`src/trigger/daemon-console.ts`** (started by `worktrain daemon`) -- starts another Express server on port 3456 that calls `mountConsoleRoutes()` with live daemon objects. Competes directly with the standalone console. Is the source of the coupling problem.
+3. **`src/mcp/server.ts` / HttpServer** -- legacy MCP server console, writes `dashboard.lock`. Mostly retired but still present in some code paths.
+### Actual Cross-Boundary Imports (the violations)
+**Violation 1: `src/v2/usecases/console-routes.ts` imports from `src/daemon/`**
+```
+import type { SteerRegistry } from '../../daemon/workflow-runner.js';
+import { runWorkflow } from '../../daemon/workflow-runner.js';
+```
+This puts daemon types into the shared console route layer. `v2/usecases/` is supposed to be the shared middle layer used by both the standalone console and the daemon. The daemon bleeding into it is a layering violation.
+**Violation 2: `src/trigger/daemon-console.ts` imports from `src/mcp/types.js`**
+```
+import type { V2ToolContext } from '../mcp/types.js';
+```
+The trigger system importing from the MCP system is a soft violation (type-only import, not a hard runtime dependency), but it creates invisible coupling between `src/trigger/` and `src/mcp/`.
+**Violation 3: `src/v2/usecases/console-routes.ts` imports from `src/trigger/`**
+```
+import type { TriggerRouter } from '../../trigger/trigger-router.js';
+import type { PollingScheduler } from '../../trigger/polling-scheduler.js';
+```
+The shared v2/usecases layer imports from the trigger system. The layering rule should be: v2/usecases knows nothing about trigger internals.
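A minimal sketch of one way to remove Violations 1 and 3, assuming TypeScript's structural typing is used for dependency inversion: `src/v2/usecases/` declares the narrow capabilities it needs, and the daemon passes in objects that satisfy them. A runtime import such as `runWorkflow` could be replaced the same way, by injecting a function. Every name here (`SteerPort`, `PollPort`, `TriggerListPort`, `ConsoleControlDeps`, `FakeSteerRegistry`, `toSteerPort`) is hypothetical; the real `SteerRegistry`, `TriggerRouter`, and `PollingScheduler` APIs are not described in this document.

```typescript
// Hypothetical sketch: the shared layer owns these interfaces ("ports").
// No name from src/daemon/ or src/trigger/ appears here.

interface SteerPort {
  // Returns true if a running session accepted the injected text.
  steer(sessionId: string, text: string): boolean;
}

interface PollPort {
  forcePoll(triggerId: string): Promise<void>;
}

interface TriggerListPort {
  listTriggerIds(): string[];
}

// Everything is optional, so the standalone console can pass `{}`.
interface ConsoleControlDeps {
  steer?: SteerPort;
  poll?: PollPort;
  triggers?: TriggerListPort;
}

// --- Daemon side (would live in src/daemon/; shown here for completeness) ---

// Stand-in for the daemon's SteerRegistry. Its methods are invented.
class FakeSteerRegistry {
  private readonly running = new Set<string>();

  start(sessionId: string): void {
    this.running.add(sessionId);
  }

  inject(sessionId: string, text: string): boolean {
    return this.running.has(sessionId) && text.length > 0;
  }
}

// The adapter is the only code that knows both shapes, and it lives on the
// daemon side, so the import arrow points from daemon into v2/usecases.
function toSteerPort(registry: FakeSteerRegistry): SteerPort {
  return { steer: (sessionId, text) => registry.inject(sessionId, text) };
}
```

Because the interfaces live in `v2/usecases`, the daemon depends on the shared layer rather than the reverse, which is the direction the layering rule asks for.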
+### How the Port/Lock System Works
+- `daemon-console.lock` is the single file read by all CLI commands (`worktrain spawn`, `worktrain trigger poll`, `worktrain await`) to discover the running console port
+- The standalone console writes this file when it starts
+- The daemon-console ALSO writes this file when it starts (they compete)
+- `worktrain spawn` calls `POST /api/v2/auto/dispatch` against the discovered port
+- `worktrain trigger poll` calls `POST /api/v2/triggers/:id/poll` against the discovered port
+- `src/mcp/handlers/session.ts` `handleOpenDashboard()` reads the lock file to construct the dashboard URL
+### Vite Dev Proxy
+`console/vite.config.ts` proxies `/api` to `http://localhost:3456`. The built frontend uses relative URLs. This means:
+- In development: all frontend API calls go to port 3456 (via Vite proxy)
+- In production: all frontend API calls go to the same origin (port 3456) via relative paths
+- **There is no mechanism for the frontend to reach port 3200 today**
+### What Endpoints Live Where
+**Port 3456 (daemon-console / standalone-console) via `mountConsoleRoutes()`:**
+- `GET /api/v2/sessions` -- session list (filesystem read)
+- `GET /api/v2/sessions/:id` -- session detail (filesystem read)
+- `GET /api/v2/sessions/:id/nodes/:nodeId` -- node detail (filesystem read)
+- `GET /api/v2/sessions/:id/events` -- per-session SSE (reads daemon event log file)
+- `GET /api/v2/workspace/events` -- workspace SSE (watches sessions dir)
+- `GET /api/v2/worktrees` -- worktree list (git commands + filesystem)
+- `GET /api/v2/workflows` -- workflow catalog (filesystem)
+- `GET /api/v2/triggers` -- trigger list (requires `TriggerRouter` injection; returns [] without it)
+- `POST /api/v2/auto/dispatch` -- dispatch workflow (requires `V2ToolContext` + optional `TriggerRouter`)
+- `POST /api/v2/triggers/:id/poll` -- force poll (requires `PollingScheduler` injection; returns 503 without it)
+- `POST /api/v2/sessions/:id/steer` -- inject text into running session (requires `SteerRegistry`; returns 503 without it)
+**Port 3200 (trigger-listener) via `createTriggerApp()`:**
+- `GET /health` -- health check
+- `POST /webhook/:triggerId` -- webhook receiver
+### Browser Frontend API Usage
+The browser calls only two endpoints that depend on daemon objects, both from `DispatchPane.tsx`:
+- `POST /api/v2/auto/dispatch` -- the dispatch action
+- `GET /api/v2/triggers` -- the trigger list display
+- All other browser calls are read-only data display
+**Steer and poll have ZERO frontend callers.** They are programmatic coordinator APIs only.
+### CLI Commands and Their Console Port Usage
+| Command / caller | What it calls | Port used |
+|------------------|---------------|-----------|
+| `worktrain spawn` | `POST /api/v2/auto/dispatch` | From `daemon-console.lock`; default 3456 |
+| `worktrain trigger poll <id>` | `POST /api/v2/triggers/:id/poll` | From `daemon-console.lock`; default 3200 |
+| `worktrain await` | Reads lock file for URL construction | From `daemon-console.lock` |
+| `src/mcp/handlers/session.ts` (`handleOpenDashboard()`) | Reads lock file for URL construction | From `daemon-console.lock` |
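The discovery flow above can be sketched as a pure function, assuming the lock file stores JSON with a numeric `port` field (the actual `daemon-console.lock` format is not described in this document). Reading the file from disk is left to the caller. The two defaults mirror the table: when no usable lock file exists, `worktrain spawn` and `worktrain trigger poll` fall back to different ports, which is the 3200-vs-3456 divergence noted under Contradictions. `parseLockFilePort`, `discoverConsolePort`, and the constant names are hypothetical.

```typescript
// Per-command fallbacks, as listed in the table.
const SPAWN_DEFAULT_PORT = 3456;
const POLL_DEFAULT_PORT = 3200; // DEFAULT_POLL_PORT in worktrain-trigger-poll.ts

// ASSUMPTION: the lock file is JSON like {"port": 3456}.
// Returns null for anything that does not yield a valid TCP port.
function parseLockFilePort(raw: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // malformed JSON: treat as "no usable lock file"
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const port = (parsed as Record<string, unknown>).port;
  if (typeof port !== "number" || !Number.isInteger(port) || port < 1 || port > 65535) {
    return null;
  }
  return port;
}

// `raw` is the lock file's contents, or null when the file does not exist.
// A valid lock file always wins; otherwise the command's own default applies.
function discoverConsolePort(raw: string | null, defaultPort: number): number {
  const fromLock = raw === null ? null : parseLockFilePort(raw);
  return fromLock ?? defaultPort;
}
```

With a valid lock file both commands agree on its port; only the missing-file path exposes the two different defaults.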
+### Existing Approaches / Precedents
+The codebase already has a clean precedent: the standalone console (`standalone-console.ts`) calls `mountConsoleRoutes()` with all optional daemon params omitted, and the endpoints that need daemon objects degrade gracefully when those objects are not injected (`GET /api/v2/triggers` returns `[]`; poll and steer return `503 Service Unavailable`). This "degradation without error" pattern is already designed in.
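A framework-free sketch of the degradation pattern, assuming handlers receive their optional dependencies as plain functions. `OptionalDeps`, `getTriggers`, `postPoll`, the error text, and the 200 success status are hypothetical placeholders; only the `[]` and `503` fallbacks come from the endpoint list.

```typescript
interface HttpResult {
  status: number;
  body: unknown;
}

// Both dependencies are optional: the standalone console passes neither.
interface OptionalDeps {
  listTriggers?: () => string[];
  forcePoll?: (triggerId: string) => Promise<void>;
}

// Mirrors GET /api/v2/triggers: no trigger source means an empty list, not an error.
function getTriggers(deps: OptionalDeps): HttpResult {
  return { status: 200, body: deps.listTriggers ? deps.listTriggers() : [] };
}

// Mirrors POST /api/v2/triggers/:id/poll: no scheduler means 503, not a crash.
async function postPoll(deps: OptionalDeps, triggerId: string): Promise<HttpResult> {
  if (!deps.forcePoll) {
    return { status: 503, body: { error: "polling is not available in this process" } };
  }
  await deps.forcePoll(triggerId);
  return { status: 200, body: { triggerId } }; // success status is a placeholder
}
```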
+### Option Categories
+Three architectural approaches are viable (see Candidate Directions section).
+### Contradictions
+- The comment in `daemon-console.ts` says it is "designed to be called from the daemon startup path so the console stays up as long as the daemon process runs" -- but if the standalone console is running, the daemon's console conflicts with it.
+- `worktrain-trigger-poll.ts` has `DEFAULT_POLL_PORT = 3200` as a "spec requirement" but then says "in practice, the daemon console writes daemon-console.lock (port 3456)" -- the spec says 3200 but reality is 3456. This inconsistency suggests the spec and implementation diverged.
+- `v2/usecases/console-routes.ts` is supposed to be a shared middle layer but it imports from `src/daemon/workflow-runner.js`. This is an upward dependency from the shared layer to the application layer.
+### Evidence Gaps
+- It is unknown whether the owner intends to ever add steer/poll UI controls in the browser
+- It is unknown whether the daemon is expected to run simultaneously with `worktrain console` (they compete for port 3456 today -- does the owner expect one to take priority?)
+---
+## Problem Frame Packet
+### Users / Stakeholders
+- **Primary user:** Project owner -- a single developer who runs daemon + console locally
+- **Secondary users:** None for daemon/console. External users only interact with the MCP server.
+- **Affected indirectly:** External MCP users if MCP server is destabilized by changes
+### Jobs, Goals, and Outcomes
+- Run `worktrain console` as a standalone process that works whether or not the daemon is running
+- Run the daemon without it conflicting with the standalone console
+- Browser dispatch button (`DispatchPane`) works when the daemon is running
+- Coordinator scripts can call steer and poll APIs when the daemon is running
+- `worktrain spawn`, `worktrain trigger poll`, `worktrain await` CLI commands continue to work
+### Pains / Tensions / Constraints
+1. **Port conflict pain:** The daemon starts its console on port 3456, which prevents `worktrain console` from binding. The owner must choose between running the daemon and running the console; both cannot run simultaneously today.
+2. **Layering violation pain:** `src/v2/usecases/console-routes.ts` -- intended as a shared layer -- imports from `src/daemon/` and `src/trigger/`. This means any change to daemon or trigger internals could break the console routes, and the console routes cannot be reasoned about independently.
+3. **Ownership ambiguity:** Both `daemon-console.ts` and `standalone-console.ts` write the same lock file and bind the same port. Neither "owns" the console definitively. CLI tools read whichever wrote last.
+4. **Dispatch coupling tension:** The dispatch button requires a live `V2ToolContext` (for `executeStartWorkflow`) and optionally a `TriggerRouter` (for queue serialization). These objects only exist inside the daemon process. The standalone console cannot dispatch autonomously without them.
+### Success Criteria
+1. `worktrain console` binds port 3456, serves sessions, and never errors out just because the daemon is or isn't running
+2. When the daemon is running AND the standalone console is running, both processes work without port conflict
+3. `POST /api/v2/auto/dispatch` from the browser UI succeeds when the daemon is running
+4. `POST /api/v2/sessions/:id/steer` and `POST /api/v2/triggers/:id/poll` remain functional for coordinator scripts
+5. `worktrain spawn` and `worktrain trigger poll` continue to work
+6. No imports cross from `src/v2/usecases/` into `src/daemon/` or `src/trigger/`
+7. `npx vitest run` passes
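Success criterion 6 could be checked mechanically. This sketch scans a file's source text for relative static imports that reach into `daemon/` or `trigger/`. It is a regex heuristic, not a module resolver: dynamic `import()` calls and path aliases are not caught, and `findLayeringViolations` is a hypothetical helper. ESLint's built-in `no-restricted-imports` rule (with `patterns`) could enforce the same rule in CI.

```typescript
// Relative import specifiers containing these segments cross the layering rule.
const FORBIDDEN_SEGMENTS = ["/daemon/", "/trigger/"];

// Matches static `import ... from '...'` and `export ... from '...'`
// (including `import type`), possibly spanning lines. Side-effect imports
// and dynamic import() calls are not matched.
const IMPORT_FROM_RE = /^\s*(?:import|export)\b[^'"]*?\bfrom\s+['"]([^'"]+)['"]/gm;

// Returns the offending specifiers found in one src/v2/usecases/ file.
function findLayeringViolations(source: string): string[] {
  const violations: string[] = [];
  for (const match of source.matchAll(IMPORT_FROM_RE)) {
    const specifier = match[1];
    const isRelative = specifier.startsWith(".");
    if (isRelative && FORBIDDEN_SEGMENTS.some((segment) => specifier.includes(segment))) {
      violations.push(specifier);
    }
  }
  return violations;
}
```

Run against the imports quoted under Violations 1 and 3, it would report all three; package imports and sibling files inside `v2/usecases` pass.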
+### Assumptions
+- The owner wants daemon and standalone console to coexist simultaneously (not be mutually exclusive)
+- Steer and poll will remain coordinator-script-only APIs (no browser UI for them)
+- The browser frontend will NOT be rewritten to support dual-port API calls
+### Reframes / HMW Questions
+1. **HMW eliminate the daemon's embedded console entirely?** Instead of the daemon starting its own console, the standalone console could be the only console. The daemon adds its control surface (dispatch, steer, poll) to port 3200 or a new dedicated port (3201). The standalone console proxies dispatch requests to the daemon port, or the dispatch button in the browser is disabled when the daemon port is unreachable.
+2. **HMW make dispatch work without the console holding live daemon objects?** Instead of `mountConsoleRoutes()` holding a live `TriggerRouter` reference, the dispatch endpoint could make an HTTP POST to the trigger-listener on port 3200 (`/webhook/:triggerId`) or a new `/dispatch` endpoint on 3200. This decouples the console from the daemon's object graph.
+3. **HMW minimize the change surface?** The cleanest path might be: (a) remove the daemon's console startup from `cli-worktrain.ts`, (b) move the three control endpoints (`dispatch`, `steer`, `poll`) OUT of `mountConsoleRoutes()` and into a separate `mountDaemonControlRoutes()`, (c) the daemon mounts both on port 3200 (alongside `/webhook`), (d) the standalone console mounts only `mountConsoleRoutes()`, (e) CLI tools discover which port has control endpoints via separate lock files.
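Reframe 3's route split (steps b through d) can be sketched with a minimal in-memory route table standing in for an Express app. The function names follow the reframe, but their signatures, the `sessionsDir` parameter, `RouteTable`, `DaemonControlDeps`, and the status codes are assumptions, and the handler bodies are placeholders.

```typescript
type HttpResult = { status: number; body: unknown };
type Handler = (params: Record<string, string>, body: unknown) => Promise<HttpResult>;
type Method = "GET" | "POST";

// Stand-in for an Express app: just records which routes were registered.
class RouteTable {
  private readonly routes = new Map<string, Handler>();
  add(method: Method, path: string, handler: Handler): void {
    this.routes.set(`${method} ${path}`, handler);
  }
  has(method: Method, path: string): boolean {
    return this.routes.has(`${method} ${path}`);
  }
  paths(): string[] {
    return [...this.routes.keys()];
  }
}

// Read-only, filesystem-backed routes: no daemon objects required.
function mountConsoleRoutes(app: RouteTable, sessionsDir: string): void {
  app.add("GET", "/api/v2/sessions", async () => ({ status: 200, body: { sessionsDir } }));
  app.add("GET", "/api/v2/workflows", async () => ({ status: 200, body: [] }));
}

// Required, not optional: only a process that owns these objects mounts the routes.
interface DaemonControlDeps {
  dispatch: (workflowId: string) => Promise<string>; // resolves to a session id (assumed)
  steer: (sessionId: string, text: string) => boolean;
  forcePoll: (triggerId: string) => Promise<void>;
}

function mountDaemonControlRoutes(app: RouteTable, deps: DaemonControlDeps): void {
  app.add("POST", "/api/v2/auto/dispatch", async (_params, body) => {
    const { workflowId = "" } = (body ?? {}) as { workflowId?: string };
    return { status: 200, body: { sessionId: await deps.dispatch(workflowId) } };
  });
  app.add("POST", "/api/v2/sessions/:id/steer", async (params, body) => {
    const { text = "" } = (body ?? {}) as { text?: string };
    return { status: deps.steer(params.id, text) ? 200 : 409, body: null };
  });
  app.add("POST", "/api/v2/triggers/:id/poll", async (params) => {
    await deps.forcePoll(params.id);
    return { status: 200, body: null };
  });
}
```

Splitting at registration time means the standalone console never exposes a control endpoint it has no objects to back, while the daemon composes both sets in one process.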
+### What Would Make This Framing Wrong
+- If the owner DOES want to run daemon and standalone console as mutually exclusive alternatives (not simultaneously), then port conflict is not a bug but a feature -- and the real problem is just the import layering violations
+- If the owner wants to add steer/poll browser UI, the "steer and poll are API-only" assumption is wrong and the design needs to account for cross-port browser calls
+- If the owner wants NO changes to `worktrain spawn` and `worktrain trigger poll` behavior, any solution that moves dispatch/poll to a different port must preserve the lock file discovery mechanism exactly
+---
+## Candidate Directions
+*(To be populated in Phase 1)*
+---
+## Challenge Notes
+*(To be populated in Phase 2)*
+---
+## Resolution Notes
+*(To be populated in Phase 2)*
+---
+## Decision Log
+| Date | Decision | Rationale |
+|------|----------|-----------|
+| 2026-04-21 | Goal reclassified as solution_statement | "Split by port" names a specific approach, not the problem |
+| 2026-04-21 | Path = landscape_first | Problem is well-understood; option landscape is the open question |
+---
+## Final Summary
+*(To be populated at end of workflow)*