@exaudeus/workrail 3.36.0 → 3.37.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/dist/config/config-file.js +2 -0
  2. package/dist/console-ui/assets/{index-n8cJrS4v.js → index-o-p__sHJ.js} +1 -1
  3. package/dist/console-ui/index.html +1 -1
  4. package/dist/daemon/workflow-runner.d.ts +1 -0
  5. package/dist/daemon/workflow-runner.js +3 -6
  6. package/dist/manifest.json +23 -15
  7. package/dist/trigger/notification-service.d.ts +42 -0
  8. package/dist/trigger/notification-service.js +164 -0
  9. package/dist/trigger/trigger-listener.js +7 -1
  10. package/dist/trigger/trigger-router.d.ts +3 -1
  11. package/dist/trigger/trigger-router.js +4 -1
  12. package/docs/design/agent-behavior-patterns-discovery.md +312 -0
  13. package/docs/design/agent-engine-communication-discovery.md +390 -0
  14. package/docs/design/agent-loop-architecture-alternatives-discovery.md +531 -0
  15. package/docs/design/agent-loop-error-handling-contract.md +238 -0
  16. package/docs/design/complete-step-approach-validation-discovery.md +344 -0
  17. package/docs/design/daemon-stuck-detection-discovery.md +174 -0
  18. package/docs/design/mcp-server-disconnect-discovery.md +245 -0
  19. package/docs/design/mcp-server-epipe-crash.md +198 -0
  20. package/docs/design/notification-design-candidates.md +131 -0
  21. package/docs/design/notification-design-review.md +84 -0
  22. package/docs/design/notification-implementation-plan.md +181 -0
  23. package/docs/design/spawn-agent-failure-modes.md +161 -0
  24. package/docs/design/spawn-agent-result-handling-implementation-plan.md +186 -0
  25. package/docs/design/stdio-simplification-design-candidates.md +341 -0
  26. package/docs/design/stdio-simplification-design-review.md +93 -0
  27. package/docs/design/stdio-simplification-implementation-plan.md +317 -0
  28. package/docs/design/structured-output-tools-coexist-findings.md +288 -0
  29. package/docs/discovery/coordinator-script-design.md +745 -0
  30. package/docs/discovery/coordinator-ux-discovery.md +471 -0
  31. package/docs/discovery/spawn-agent-failure-modes.md +309 -0
  32. package/docs/discovery/workflow-selection-for-discovery-tasks.md +336 -0
  33. package/docs/discovery/worktrain-status-briefing.md +325 -0
  34. package/docs/discovery/worktrain-status-design-candidates.md +202 -0
  35. package/docs/discovery/worktrain-status-design-review-findings.md +86 -0
  36. package/docs/ideas/backlog.md +608 -0
  37. package/docs/ideas/daemon-structured-output-vs-tool-calls.md +344 -0
  38. package/docs/ideas/design-candidates-backlog-consolidation.md +85 -0
  39. package/docs/ideas/design-review-findings-backlog-consolidation.md +39 -0
  40. package/docs/ideas/implementation_plan_backlog_consolidation.md +117 -0
  41. package/docs/plans/authoring-doc-staleness-enforcement-candidates.md +251 -0
  42. package/docs/plans/authoring-doc-staleness-enforcement-review.md +99 -0
  43. package/docs/plans/authoring-doc-staleness-enforcement.md +463 -0
  44. package/package.json +1 -1
@@ -0,0 +1,186 @@
+ # Implementation Plan: Fix spawn_agent delivery_failed result handling
+
+ **Branch:** `fix/spawn-agent-result-handling`
+ **Status:** Ready for implementation.
+
+ ---
+
+ ## Problem Statement
+
+ `makeSpawnAgentTool` in `src/daemon/workflow-runner.ts` maps the `delivery_failed` result variant to `outcome: 'success'` when constructing the structured result returned to the parent LLM. This is wrong: a parent LLM that receives `outcome: 'success'` will proceed as if the child session completed normally, even though the run reached a state that should be impossible.
+
+ The bug is in the `else` branch of the result-mapping block (lines 1572-1579). The branch is architecturally unreachable -- `runWorkflow()` never produces `delivery_failed`; only `TriggerRouter` does, after the HTTP callback. But if the branch is ever reached, the `else` fallthrough silently maps it to success rather than surfacing it as an error.
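A minimal sketch of the faulty shape, assuming the result variants are tagged by a `kind` field. The real discriminant name and type definitions are not shown in this plan, so every type below is an illustrative stand-in:

```typescript
// Illustrative stand-ins for the real result types; the `kind` tag is an assumption.
type WorkflowRunSuccess = { kind: 'success'; lastStepNotes: string };
type WorkflowRunError = { kind: 'error'; message: string };
type WorkflowRunTimeout = { kind: 'timeout'; message: string };
type WorkflowRunDeliveryFailed = { kind: 'delivery_failed'; message: string };
type WorkflowRunResult =
  | WorkflowRunSuccess
  | WorkflowRunError
  | WorkflowRunTimeout
  | WorkflowRunDeliveryFailed;

type SpawnAgentOutcome = 'success' | 'error' | 'timeout';

// Buggy mapping: the final `else` catches whatever is left over.
function mapResultBuggy(result: WorkflowRunResult): { outcome: SpawnAgentOutcome; notes: string } {
  if (result.kind === 'success') {
    return { outcome: 'success', notes: result.lastStepNotes };
  } else if (result.kind === 'error') {
    return { outcome: 'error', notes: result.message };
  } else if (result.kind === 'timeout') {
    return { outcome: 'timeout', notes: result.message };
  } else {
    // Only delivery_failed reaches here, and it is silently reported as success.
    return { outcome: 'success', notes: '' };
  }
}

const mapped = mapResultBuggy({ kind: 'delivery_failed', message: 'callback POST failed' });
console.log(mapped.outcome); // success
```

The parent LLM only sees `outcome`, so nothing downstream can tell this case apart from a genuine success.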
+
+ ---
+
+ ## Acceptance Criteria
+
+ 1. `delivery_failed` does NOT map to `outcome: 'success'` in `makeSpawnAgentTool`.
+ 2. A `ChildWorkflowRunResult` type alias is exported from `workflow-runner.ts` representing the 3 variants `runWorkflow()` actually returns: `WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout`.
+ 3. The result-mapping block uses explicit if-else branches over `ChildWorkflowRunResult` variants only, with `assertNever` in the else position.
+ 4. The WHY comment on the result-mapping block accurately describes the architectural invariant (`runWorkflow()` never returns `delivery_failed`; only `TriggerRouter` does).
+ 5. The existing `delivery_failed not expected here` comments in `console-routes.ts` and `trigger-router.ts` are updated to explain why they use soft handling (unlike spawn_agent, they have no user-visible consequence).
+ 6. New test file `tests/unit/workflow-runner-spawn-agent.test.ts` exists and covers:
+    - success -> `{ outcome: 'success', notes: <lastStepNotes> }`
+    - error -> `{ outcome: 'error', notes: <message> }`
+    - timeout -> `{ outcome: 'timeout', notes: <message> }`
+    - depth limit exceeded -> `{ outcome: 'error', childSessionId: null }`
+    - startResult failure -> `{ outcome: 'error', childSessionId: null }`
+ 7. All existing tests pass.
+
+ ---
+
+ ## Non-Goals
+
+ - Do NOT change `TriggerRouter`'s delivery logic.
+ - Do NOT remove `delivery_failed` from `WorkflowRunResult` globally.
+ - Do NOT change `runWorkflow()`'s declared return type (Candidate 3 -- out of scope for this fix).
+ - Do NOT add retry logic for HTTP callback delivery.
+ - Do NOT modify the tool's description string (it already lists `'success'|'error'|'timeout'` correctly).
+
+ ---
+
+ ## Philosophy-Driven Constraints
+
+ - **Make illegal states unrepresentable:** `delivery_failed` must be excluded from the type at the spawn_agent call site. Use the `ChildWorkflowRunResult` alias for this.
+ - **Exhaustiveness everywhere:** Use `assertNever(childResult)` in the else branch, not an implicit fallthrough.
+ - **Errors are data:** Impossible or unexpected states must surface as errors, not be silently mapped to success.
+ - **Document "why", not "what":** WHY comments must explain the architectural invariant, not just the mechanics.
+ - **Type safety as the first line of defense:** Compile-time exhaustiveness over the 3 real variants is the primary guard.
+
+ ---
+
+ ## Invariants
+
+ 1. `runWorkflow()` never returns `delivery_failed` -- only `TriggerRouter` does, after the HTTP callback.
+ 2. Child sessions spawned by `spawn_agent` bypass `TriggerRouter` and have no `callbackUrl`.
+ 3. The parent LLM must never receive `outcome: 'success'` for an impossible or unexpected state.
+ 4. `ChildWorkflowRunResult` must be a strict subset of `WorkflowRunResult` (no new result types).
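Invariant 4 can be checked by the compiler rather than by review. A sketch using the same kind of illustrative `kind`-tagged stand-ins (assumed, not the real definitions): each annotated constant below stops type-checking if `ChildWorkflowRunResult` names a type outside `WorkflowRunResult`, or if it stops excluding `delivery_failed`:

```typescript
// Illustrative stand-ins; the `kind` tag is an assumption.
type WorkflowRunSuccess = { kind: 'success'; lastStepNotes: string };
type WorkflowRunError = { kind: 'error'; message: string };
type WorkflowRunTimeout = { kind: 'timeout'; message: string };
type WorkflowRunDeliveryFailed = { kind: 'delivery_failed'; message: string };
type WorkflowRunResult =
  | WorkflowRunSuccess
  | WorkflowRunError
  | WorkflowRunTimeout
  | WorkflowRunDeliveryFailed;

type ChildWorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout;

// `true` only when every member of A is assignable to B.
// Wrapping in a tuple stops the conditional type from distributing over the union.
type IsSubset<A, B> = [A] extends [B] ? true : false;

// Subset: fails to compile if the alias gains a type outside WorkflowRunResult.
const childIsSubset: IsSubset<ChildWorkflowRunResult, WorkflowRunResult> = true;
// Strictness: fails to compile if delivery_failed is ever added to the alias.
const excludesDeliveryFailed: IsSubset<WorkflowRunDeliveryFailed, ChildWorkflowRunResult> = false;
```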
+
+ ---
+
+ ## Selected Approach
+
+ **Candidate 2: ChildWorkflowRunResult alias + cast + assertNever**
+
+ 1. Export `type ChildWorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout` from `workflow-runner.ts`, placed near `WorkflowRunResult`. Include a WHY comment documenting the architectural invariant.
+ 2. In `makeSpawnAgentTool.execute()`, cast `childResult` to `ChildWorkflowRunResult` immediately after the `runWorkflowFn(...)` call. Include a WHY comment on the cast.
+ 3. Replace the implicit `else` fallthrough with `assertNever(childResult)`.
+ 4. Import `assertNever` from `'../runtime/assert-never.js'` in `workflow-runner.ts`.
+ 5. Update comments at the `delivery_failed not expected here` branches in `console-routes.ts` and `trigger-router.ts`.
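The steps above can be sketched as a self-contained approximation. The `kind` discriminant, the payload fields, the `mapChildRun` wrapper, the zero-argument `runWorkflowFn`, and the `assertNever` body are all assumptions standing in for the real `workflow-runner.ts` code and `../runtime/assert-never.js` helper:

```typescript
// Illustrative stand-ins; the `kind` tag is an assumption.
type WorkflowRunSuccess = { kind: 'success'; lastStepNotes: string };
type WorkflowRunError = { kind: 'error'; message: string };
type WorkflowRunTimeout = { kind: 'timeout'; message: string };
type WorkflowRunDeliveryFailed = { kind: 'delivery_failed'; message: string };
type WorkflowRunResult =
  | WorkflowRunSuccess
  | WorkflowRunError
  | WorkflowRunTimeout
  | WorkflowRunDeliveryFailed;

// WHY: runWorkflow() never produces delivery_failed; only TriggerRouter does,
// after the HTTP callback. spawn_agent children have no callbackUrl.
type ChildWorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout;

// Assumed shape of the repo's assertNever helper.
function assertNever(value: never): never {
  throw new Error(`Unexpected variant: ${JSON.stringify(value)}`);
}

type SpawnAgentResult = {
  childSessionId: string | null;
  outcome: 'success' | 'error' | 'timeout';
  notes: string;
};

// Hypothetical condensed version of the mapping inside makeSpawnAgentTool.execute().
async function mapChildRun(
  childSessionId: string,
  runWorkflowFn: () => Promise<WorkflowRunResult>,
): Promise<SpawnAgentResult> {
  // WHY the cast: see ChildWorkflowRunResult. If the invariant ever breaks,
  // assertNever throws instead of reporting success.
  const childResult = (await runWorkflowFn()) as ChildWorkflowRunResult;
  if (childResult.kind === 'success') {
    return { childSessionId, outcome: 'success', notes: childResult.lastStepNotes };
  } else if (childResult.kind === 'error') {
    return { childSessionId, outcome: 'error', notes: childResult.message };
  } else if (childResult.kind === 'timeout') {
    return { childSessionId, outcome: 'timeout', notes: childResult.message };
  } else {
    // childResult is `never` here; adding an unhandled variant to the alias is a compile error.
    return assertNever(childResult);
  }
}
```

With this shape, a `delivery_failed` that slips past the cast rejects the promise instead of resolving to `outcome: 'success'`.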
+
+ **Runner-up:** Candidate 1 (minimal patch -- change `outcome: 'success'` to `'error'`). Rejected because the type lie persists and the else fallthrough has no compile-time guard.
+
+ **Pivot condition:** If `runWorkflow()` gains a `callbackUrl` parameter and starts producing `delivery_failed` directly, switch to Candidate 3 (narrow `runWorkflow()`'s return type).
+
+ ---
+
+ ## Vertical Slices
+
+ ### Slice 1: ChildWorkflowRunResult type + assertNever fix in workflow-runner.ts
+
+ **File:** `src/daemon/workflow-runner.ts`
+ **Changes:**
+ - Add `export type ChildWorkflowRunResult = WorkflowRunSuccess | WorkflowRunError | WorkflowRunTimeout` near `WorkflowRunResult` (line ~341), with a WHY comment.
+ - Add `import { assertNever } from '../runtime/assert-never.js'` to the imports.
+ - In `makeSpawnAgentTool.execute()`, cast the result of the `runWorkflowFn(...)` call: `const childResult = (await runWorkflowFn(...)) as ChildWorkflowRunResult`.
+ - Replace the implicit `else` branch body with `assertNever(childResult)`.
+ - Update the comment on the result-mapping block (lines 1546-1551) to accurately state the invariant.
+ - Keep the explicit declaration `let resultObj: { childSessionId: string | null; outcome: 'success' | 'error' | 'timeout'; notes: string }` unchanged; it already excludes `delivery_failed` from the outcome type.
+
+ **Done when:** TypeScript compiles without errors, and no branch of the result-mapping block maps `delivery_failed` to success.
+
+ ### Slice 2: Comment updates in console-routes.ts and trigger-router.ts
+
+ **Files:** `src/v2/usecases/console-routes.ts`, `src/trigger/trigger-router.ts`
+ **Changes (comment-only):**
+ - `console-routes.ts` lines 638-641: add a note explaining that soft handling is intentional (log-only path, no user-visible outcome, unlike spawn_agent).
+ - `trigger-router.ts` lines 679-681: same.
+
+ **Done when:** Both files are updated with explanatory comments.
+
+ ### Slice 3: New test file for makeSpawnAgentTool result mapping
+
+ **File:** `tests/unit/workflow-runner-spawn-agent.test.ts` (new)
+ **Coverage:**
+ - `success` result -> `{ outcome: 'success', notes: lastStepNotes }`
+ - `error` result -> `{ outcome: 'error', notes: message }`
+ - `timeout` result -> `{ outcome: 'timeout', notes: message }`
+ - Depth limit exceeded (before the runWorkflow call) -> `{ outcome: 'error', childSessionId: null }`
+ - `executeStartWorkflow` failure (`startResult.isErr()`) -> `{ outcome: 'error', childSessionId: null }`
+
+ **Done when:** Tests pass; no existing tests are broken.
+
+ ---
+
+ ## Test Design
+
+ **Framework:** Existing test suite (vitest, following the `workflow-runner-*.test.ts` patterns).
+ **Pattern:** One describe block for `makeSpawnAgentTool`, one `it` per behavior.
+ **Stubs:** Inject a `runWorkflowFn` stub that returns the desired variant. Inject a minimal `ctx` with the required ports. No real LLM calls.
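A minimal sketch of such a stub, assuming `runWorkflowFn` is an async function whose arguments the tests do not need to inspect (its real parameter list is not shown here). It is a hand-written fake that records calls, not a mock-framework spy; `stubRunWorkflow` is a hypothetical name:

```typescript
// Illustrative stand-in for the variants a stub needs to return; the `kind` tag is an assumption.
type StubbedRunResult =
  | { kind: 'success'; lastStepNotes: string }
  | { kind: 'error'; message: string }
  | { kind: 'timeout'; message: string };

type RunWorkflowFn = (...args: unknown[]) => Promise<StubbedRunResult>;

// Returns a fake runWorkflowFn that always resolves to `result`
// and counts how many times it was invoked.
function stubRunWorkflow(result: StubbedRunResult): { fn: RunWorkflowFn; calls: () => number } {
  let callCount = 0;
  const fn: RunWorkflowFn = async () => {
    callCount += 1;
    return result;
  };
  return { fn, calls: () => callCount };
}
```

For the depth-limit case, which fails before the runWorkflow call, the same stub can also confirm it was never invoked (`calls() === 0`).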
+
+ **Key test cases:**
+ ```typescript
+ import { describe, it } from 'vitest';
+
+ // Skeleton: replace each it.todo with a full test that injects a runWorkflowFn stub.
+ describe('makeSpawnAgentTool result mapping', () => {
+   it.todo('maps success to outcome: success with lastStepNotes');
+   it.todo('maps error to outcome: error with message');
+   it.todo('maps timeout to outcome: timeout with message');
+   it.todo('returns error when depth limit exceeded');
+   it.todo('returns error when executeStartWorkflow fails');
+ });
+ ```
+
+ Note: The `assertNever` branch for `delivery_failed` can only be exercised by casting in the test, since `ChildWorkflowRunResult` excludes it. The compile-time guard is the primary verification for that branch.
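If a runtime check of that branch is still wanted, a test can smuggle the excluded variant past the type with a double cast. A minimal sketch, where `mapChildResult` is a hypothetical condensed stand-in for the mapping block and `kind` is an assumed tag:

```typescript
// Illustrative stand-in; the `kind` tag is an assumption.
type ChildWorkflowRunResult =
  | { kind: 'success'; lastStepNotes: string }
  | { kind: 'error'; message: string }
  | { kind: 'timeout'; message: string };

function assertNever(value: never): never {
  throw new Error(`Unexpected variant: ${JSON.stringify(value)}`);
}

// Hypothetical stand-in for the result-mapping block.
function mapChildResult(r: ChildWorkflowRunResult): 'success' | 'error' | 'timeout' {
  switch (r.kind) {
    case 'success':
    case 'error':
    case 'timeout':
      return r.kind;
    default:
      return assertNever(r);
  }
}

// `as unknown as` bypasses the type so the impossible variant reaches assertNever.
const impossible = { kind: 'delivery_failed', message: 'x' } as unknown as ChildWorkflowRunResult;

let threw = false;
try {
  mapChildResult(impossible);
} catch {
  threw = true;
}
console.log(threw); // true
```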
+
+ ---
+
+ ## Risk Register
+
+ | Risk | Likelihood | Impact | Mitigation |
+ |---|---|---|---|
+ | Cast becomes stale (runWorkflow gains delivery_failed) | Low | Medium | WHY comment makes assumption visible; assertNever throws loudly |
+ | New result variant breaks assertNever | Low | Low | Compile error once the variant is added to `ChildWorkflowRunResult`; a variant added only to `WorkflowRunResult` is hidden by the cast and makes `assertNever` throw at runtime |
+ | Test scaffolding for makeSpawnAgentTool is complex | Low | Low | Other workflow-runner tests show the pattern; reuse it |
+
+ ---
+
+ ## PR Packaging Strategy
+
+ **Single PR** on branch `fix/spawn-agent-result-handling`.
+
+ Contents:
+ - `src/daemon/workflow-runner.ts` -- type alias + import + cast + assertNever + updated comment
+ - `src/v2/usecases/console-routes.ts` -- comment update only
+ - `src/trigger/trigger-router.ts` -- comment update only
+ - `tests/unit/workflow-runner-spawn-agent.test.ts` -- new test file
+ - `docs/design/spawn-agent-failure-modes.md` -- discovery doc
+ - `docs/design/spawn-agent-failure-modes-design-review.md` -- design review doc
+ - `docs/design/spawn-agent-result-handling-implementation-plan.md` -- this file
+
+ ---
+
+ ## Philosophy Alignment Per Slice
+
+ ### Slice 1 (ChildWorkflowRunResult + assertNever)
+ - Make illegal states unrepresentable -> **satisfied**: delivery_failed excluded from the type at the call site
+ - Exhaustiveness everywhere -> **satisfied**: assertNever guards all future variants
+ - Errors are data -> **satisfied**: an impossible state throws instead of silently succeeding
+ - Type safety as first line of defense -> **satisfied**: compile-time exhaustiveness over the 3 real variants
+ - Document "why" not "what" -> **satisfied**: WHY comments on alias and cast
+ - YAGNI with discipline -> **tension**: one additional type alias; acceptable (documents an existing invariant, not speculative)
+
+ ### Slice 2 (Comment updates)
+ - Document "why" not "what" -> **satisfied**: explains the intentional inconsistency between callsites
+
+ ### Slice 3 (Tests)
+ - Prefer fakes over mocks -> **satisfied**: stub runWorkflowFn function, not a mock framework
+ - Determinism over cleverness -> **satisfied**: each test exercises one result variant deterministically
+
+ ---
+
+ ## Plan Confidence: High
+
+ - `unresolvedUnknownCount`: 0
+ - `planConfidenceBand`: High
+ - `estimatedPRCount`: 1
+ - `followUpTickets`: Candidate 3 (narrow runWorkflow return type) if runWorkflow ever gains callbackUrl support -- not filed yet, documented as pivot condition.
@@ -0,0 +1,341 @@
+ # WorkRail MCP Server Stdio Simplification -- Design Candidates
+
+ **Status:** Discovery only. No code changes. For review by the main agent before implementation begins.
+ **Date:** 2026-04-19
+ **Scope:** Remove primary election (DashboardLock, tryBecomePrimary, bindWithPortFallback), bridge mechanism (bridge-entry.ts, reconnect cycles, spawn storm), and HTTP dashboard serving from the MCP server. The standalone worktrain console (PR #512, merged) now owns the UI.
+
+ ---
+
+ ## Problem Understanding
+
+ ### Background
+
+ The bridge/primary-election system was built to solve one problem: "only one process should serve the console UI on port 3456." When multiple Claude Code windows open against the same repo, each spawns a `workrail` stdio process. Without coordination, all would try to bind port 3456 for the dashboard. The solution: the first instance becomes the HTTP primary; all subsequent instances become bridges that forward JSON-RPC over the primary's HTTP endpoint.
+
+ PR #512 merged `worktrain console` as a standalone process that reads session files directly and has zero coupling to the MCP server. That change removed the reason the bridge/election system existed. The coordination problem is now solved at the infrastructure layer (the console is independent), so the MCP server can be pure stdio.
+
+ ### Core tensions
+
+ **T1: Clean architectural removal vs behavioral backward compatibility**
+
+ `sessionTools` is enabled by default (`WORKRAIL_ENABLE_SESSION_TOOLS=true` in the generated config template), so most users have it on. The `create_session` tool returns `dashboardUrl`, built from `httpServer.getBaseUrl()`. The `open_dashboard` tool calls `httpServer.openDashboard()`, which uses the `open` npm package to launch a browser. Removing `HttpServer` from the MCP server makes `dashboardUrl` return `null` and kills browser-open. This is a visible API contract change -- not dangerous, but it requires a migration note and potentially a `feat:` or `fix:` commit tag.
+
+ **T2: Self-referential risk**
+
+ The MCP server running in this repo is the tool executing this workflow. Any code change that breaks `src/mcp-server.ts` or transport initialization will kill the active session. This demands smallest-first slices with green tests after each one. The bridge, tombstone, and HttpServer are interdependent in subtle ways (the tombstone is only written when `ctx.httpServer?.getPort()` is non-null; the bridge reads the tombstone). Removing just one leaves the others in an inconsistent state.
+
+ **T3: Two HTTP servers with similar names, different purposes**
+
+ There are two completely different HTTP servers in this codebase:
+ - `src/mcp/transports/http-entry.ts` + `src/mcp/transports/http-listener.ts`: the **MCP protocol over HTTP** transport for bot services and daemon (`WORKRAIL_TRANSPORT=http`). This is correct infrastructure and must stay.
+ - `src/infrastructure/session/HttpServer.ts`: the **Express dashboard server** with primary election on port 3456. This is what needs to go.
+
+ A naive "remove HttpServer" that conflates these would break the bot-service HTTP transport and `tests/integration/mcp-http-transport.test.ts`.
+
+ **T4: `sessionTools` feature coherence after losing its HTTP half**
+
+ `SessionManager` (filesystem-based session CRUD) and `HttpServer` (dashboard serving) are co-gated by a single `requireSessionTools()` guard that checks `ctx.sessionManager !== null && ctx.httpServer !== null`. After removal, `SessionManager` still works. Only `open_dashboard` and the `dashboardUrl` in `create_session` depend on `HttpServer`. The guard bundles capabilities that should be separate.
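A minimal sketch of the coupling and its fix, with simplified stand-in types. The real `ToolContext`, `SessionManager`, and `HttpServer` carry much more (the `root` and `port` fields here are invented placeholders), and the real guard's return type is not shown, so a boolean is assumed:

```typescript
// Illustrative stand-ins; only the fields the guards read are modeled.
interface SessionManager { readonly root: string }
interface HttpServer { readonly port: number }

interface ToolContextBefore {
  readonly sessionManager: SessionManager | null;
  readonly httpServer: HttpServer | null;
}

interface ToolContextAfter {
  readonly sessionManager: SessionManager | null;
}

// Before: session tools are refused whenever the dashboard server is absent,
// even though session CRUD only needs the SessionManager.
function requireSessionToolsBefore(ctx: ToolContextBefore): boolean {
  return ctx.sessionManager !== null && ctx.httpServer !== null;
}

// After: the guard checks only the capability the session tools actually use.
function requireSessionToolsAfter(ctx: ToolContextAfter): boolean {
  return ctx.sessionManager !== null;
}

const manager: SessionManager = { root: '/tmp/sessions' };
console.log(requireSessionToolsBefore({ sessionManager: manager, httpServer: null })); // false
console.log(requireSessionToolsAfter({ sessionManager: manager })); // true
```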
+
+ ### What makes this hard
+
+ 1. **Tombstone write is conditional on HttpServer port.** `stdio-entry.ts` does `if (ctx.httpServer?.getPort() != null) writeTombstone(port, pid)`. If HttpServer is removed before the tombstone code is cleaned up, the tombstone never gets written -- which is fine since bridges are also gone, but it leaves dead code.
+
+ 2. **`cli.ts` cleanup command resolves `DI.Infra.HttpServer`.** The `workrail cleanup` CLI command calls `httpServer.fullCleanup()` (lsof/netstat to kill processes on ports 3456-3499). Once the token is removed, that DI resolution fails. The command must be simplified or removed in the same PR that removes the DI token.
+
+ 3. **DI smoke test resolves every registered token.** `tests/smoke/di-container.smoke.test.ts` iterates all `DI.*` symbols and resolves them. Removing `DI.Infra.HttpServer`, `DI.Config.DashboardMode`, `DI.Config.BrowserBehavior` from `tokens.ts` means those tokens simply no longer appear in the loop. The test automatically shrinks -- no manual update needed.
+
+ 4. **`worktrain-spawn.ts` reads `dashboard.lock` as a fallback.** After HttpServer removal, `dashboard.lock` is never written, so the fallback always returns null. The code already handles this gracefully (ENOENT caught, falls through to default port 3456). No behavioral change, but the fallback is now permanently dead code.
+
+ 5. **`open_dashboard` is not called from any bundled workflow** (verified: `grep -rn "open_dashboard" workflows/` returns nothing). It's a UX tool, not a workflow primitive. The behavioral degradation is real but low-impact.
+
+ 6. **The `open` npm package is used only by `HttpServer.ts`.** Removing `HttpServer.ts` also removes the only usage of the `open` package. This is a welcome dependency reduction.
+
+ ### Likely real seam
+
+ The primary seam for the bridge removal is `mcp-server.ts` `main()` -- the 28-line auto-bridge block that probes port 3100 and conditionally starts bridge mode.
+
+ The primary seam for HttpServer removal is `ToolContext.httpServer: HttpServer | null` in `src/mcp/types.ts`. Removing this field from the type produces TypeScript errors at every callsite (handler, server, transport entry points), making the full blast radius immediately visible and compiler-verified. This is the correct seam: it uses type-system enforcement rather than grep-and-hope.
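The same type-system enforcement can also keep the field from coming back after removal. A sketch with a trimmed stand-in `ToolContext` (the real type has more fields): under `tsc`, a `// @ts-expect-error` directive is itself reported as an error when the following line stops producing one, so this check breaks if `httpServer` is ever reintroduced:

```typescript
// Trimmed stand-in for ToolContext after the removal: no httpServer field.
interface ToolContext {
  readonly sessionManager: { readonly root: string } | null;
}

const ctx: ToolContext = { sessionManager: null };

// Satisfied only while `httpServer` is absent from ToolContext; if the field
// returns, tsc reports the directive as unused.
// @ts-expect-error httpServer was removed from ToolContext
const leftover = ctx.httpServer;

console.log(leftover); // undefined
```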
+
+ ---
+
+ ## Philosophy Constraints
+
+ Principles from AGENTS.md and daemon-soul.md that directly constrain the design:
+
+ | Principle | Constraint |
+ |---|---|
+ | Architectural fixes over patches | Remove the root cause (HttpServer from MCP). Don't add a feature flag to paper over it. |
+ | Make illegal states unrepresentable | `ToolContext.httpServer: HttpServer \| null` is an illegal state post-removal. The field should not exist. |
+ | YAGNI with discipline | `DashboardHeartbeat`, `DashboardLockRelease`, `DashboardLock` interface, `BrowserBehavior`, tombstone, bridge reconnect state machine -- all YAGNI once console is standalone. |
+ | Keep interfaces small and focused | `ToolContext` has a capability it no longer needs. `requireSessionTools()` gates two unrelated capabilities together. |
+ | Determinism over cleverness | The 150ms zombie-bridge stdin probe, the spawn coordinator lock, the jitter in `spawnPrimary()`, the tombstone fast-path -- all eliminated. Startup becomes deterministic. |
+ | Validate at boundaries, trust inside | Post-simplification: transport mode is resolved once at startup from env vars. No runtime probing mid-boot. |
+ | Errors are data | `bridge-entry.ts` uses a callback-based `performShutdown`. The retained code uses DI-injected `ShutdownEvents` with discriminated-union events. The bridge's pattern is inferior and goes away. |
+ | Never push to main directly | Implementation must be on feature branches with PRs. Design-only here. |
+
+ **Conflicts found:**
+
+ 1. `src/v2/` is listed as a protected file tree (AGENTS.md). The console routes (`src/v2/usecases/console-routes.ts`) are called from `src/mcp/server.ts` via `ctx.httpServer.mountRoutes()`. Removing that call is in `src/mcp/server.ts`, which is NOT protected. The `console-routes.ts` function itself does not change. No conflict once scoped correctly.
+
+ 2. "Dependency injection for boundaries" suggests the `http://localhost:3456` URL used in the degraded `open_dashboard` and `dashboardUrl` should be injected rather than hardcoded. Counter-argument: 3456 is the established documented default; injecting a rarely-changed constant adds complexity. The constant can be defined once in a shared location (`DEFAULT_CONSOLE_PORT = 3456`) rather than injected via DI.
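A sketch of that shared constant and the two static values the degraded tools would return. The helper names (`consoleBaseUrl`, `dashboardUrlForSession`, `openDashboardResult`), the optional `port` parameter, and the `note` field are hypothetical; the `?session=` query format and the note text follow the PR-B handler rewrite described under Candidate 2. Unlike the plain string concatenation proposed there, this sketch percent-encodes the session id:

```typescript
// Hypothetical shared constants module, imported by the session handlers.
const DEFAULT_CONSOLE_PORT = 3456;

function consoleBaseUrl(port: number = DEFAULT_CONSOLE_PORT): string {
  return `http://localhost:${port}`;
}

// Static replacement for the httpServer-derived dashboardUrl in create_session.
function dashboardUrlForSession(sessionId: string, port: number = DEFAULT_CONSOLE_PORT): string {
  return `${consoleBaseUrl(port)}?session=${encodeURIComponent(sessionId)}`;
}

// Degraded open_dashboard payload: no browser launch, just the URL and a hint.
function openDashboardResult(): { url: string; note: string } {
  return { url: consoleBaseUrl(), note: "Run 'worktrain console' to start the dashboard UI" };
}

console.log(dashboardUrlForSession('sess-42')); // http://localhost:3456?session=sess-42
```

The optional `port` parameter leaves room for a non-default console port without routing the value through DI.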
+
+ ---
+
+ ## Impact Surface
+
+ Paths, consumers, and contracts that must stay consistent after the change:
+
+ | Surface | Impact | Required action |
+ |---|---|---|
+ | `src/mcp/transports/http-entry.ts` | MCP protocol over HTTP (bot services). Must NOT be removed. | Keep unchanged |
+ | `src/mcp/transports/http-listener.ts` | Used by `http-entry.ts`. Must NOT be removed. | Keep unchanged |
+ | `src/mcp/index.ts` | Exports `startHttpServer`. Must stay (public library export). | Keep unchanged |
+ | `mcp-server.ts` public API | Exports `startBridgeServer`, `detectHealthyPrimary`. Must be removed along with bridge-entry.ts. | Remove re-exports |
+ | `worktrain-spawn.ts` | Reads `dashboard.lock` as fallback. After removal: fallback misses but doesn't crash. | Dead code; can remove in follow-up |
+ | `worktrain-await.ts` | Same pattern as spawn. | Same |
+ | `cli.ts` cleanup command | Resolves `DI.Infra.HttpServer`. Must be updated when token is removed. | Simplify or remove command |
+ | `NodeProcessSignals` comment | Says "HttpServer.setupPrimaryCleanup() and wireShutdownHooks() both call on()". After removal, only wireShutdownHooks() calls on(). | Update comment |
+ | `tests/integration/mcp-http-transport.test.ts` | Tests MCP-over-HTTP (not dashboard). Keep. | Keep unchanged |
+ | `tests/unit/mcp/http-listener.test.ts` | Tests `createHttpListener` and `bindWithPortFallback`. Keep (needed for bot-service transport). | Keep unchanged |
+ | DI smoke test | Iterates all DI tokens. Removing tokens makes them disappear from iteration automatically. | No change needed |
+ | `WORKRAIL_DASHBOARD_PORT`, `WORKRAIL_DISABLE_UNIFIED_DASHBOARD` env vars | Written into user config by `workrail init`. After HttpServer removal, these become ignored. | Document deprecation in release notes |
+ | Schema consistency test | Imports `openDashboardTool`. The tool definition stays, just its behavior changes. | No change needed |
+ | `sessionTools` feature flag description | Currently says "and HTTP dashboard server". After removal, this is inaccurate. | Update description string |
+
+ ---
+
+ ## Candidates
+
+ ### Candidate 1: Bridge out, HttpServer simplified (no election)
+
+ **Summary:** Remove the bridge, tombstone, and auto-primary detection; strip election machinery from inside `HttpServer` but keep HttpServer starting on port 3456 for sessionTools users.
+
+ **Structural changes:**
+ - Delete `bridge-entry.ts`, `primary-tombstone.ts`, `bridge-events.ts`
+ - Remove 28-line auto-bridge block from `mcp-server.ts` main()
+ - Remove bridge re-exports from `mcp-server.ts`
+ - Remove tombstone calls from `stdio-entry.ts`
+ - In `HttpServer.ts`: delete `tryBecomePrimary()`, `reclaimStaleLock()`, `shouldReclaimLock()`, `setupPrimaryCleanup()`, `startLegacyMode()`, `DashboardLock` interface, heartbeat, lock file logic, `fullCleanup()`. Replace `start()` with direct `startAsPrimary()`.
+ - Delete `DashboardHeartbeat.ts`, `DashboardLockRelease.ts`
+ - Remove `DI.Config.DashboardMode`, `DI.Config.BrowserBehavior` from container
+ - Update/delete relevant tests
+
+ **Tensions resolved:** T2 (safe first PR), T3 partially (bridge confusion removed)
+ **Tensions accepted:** T1 (port contention between windows persists), T4 (sessionTools still bundled with HttpServer)
+
+ **Boundary:** `mcp-server.ts` main() entry point and HttpServer internals
+
+ **Why that boundary:** Safest single-PR boundary; no API surface changes; HttpServer still starts and serves `dashboardUrl`
+
+ **Failure mode:** Multiple Claude windows each start an un-elected HttpServer on port 3456. With no lock, the first wins; others fall through to legacy ports (3457+). Each window binds a port unnecessarily.
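The fallback in this failure mode amounts to a first-free-port scan. A pure model of it, assuming the 3456-3499 range that `workrail cleanup` sweeps; the real code binds sockets rather than consulting a set, and `firstFreePort` is a hypothetical name:

```typescript
// Pure model of legacy port fallback: each window takes the first port not already in use.
function firstFreePort(inUse: ReadonlySet<number>, start = 3456, end = 3499): number | null {
  for (let port = start; port <= end; port++) {
    if (!inUse.has(port)) return port;
  }
  return null; // range exhausted
}

// Three Claude windows starting one after another each claim a port.
const claimed = new Set<number>();
for (let i = 0; i < 3; i++) {
  const port = firstFreePort(claimed);
  if (port !== null) claimed.add(port);
}
console.log(Array.from(claimed).join(', ')); // 3456, 3457, 3458
```

Every window holds a port even though only one dashboard is useful, which is the waste Candidate 2 removes.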
+
+ **Repo pattern relationship:** Follows "direct removal without flags" pattern for the bridge. Adapts existing HttpServer structure by stripping internals.
+
+ **Gains:** Bridge complexity entirely gone (~1400 lines). MCP server starts deterministically. No more 150ms probe delay.
+ **Gives up:** Port contention between Claude windows is still an ongoing concern. `dashboard.lock` still written by the simplified HttpServer (no election, but still a lock file that worktrain-spawn could use as fallback).
+
+ **Scope judgment: too narrow.** Removes bridge (correct) but leaves the architectural remnant (HttpServer bound to a port for every Claude window). Does not satisfy the backlog requirement: "remove HttpServer starting as part of the MCP server." C1 is best understood as Slice A of C2 without committing to Slice B.
+
+ **Philosophy:**
+ - Honors: "Architectural fixes over patches" (removes bridge root cause), "Determinism" (no probe delay), "YAGNI" (election machinery deleted)
+ - Conflicts with: "Make illegal states unrepresentable" (`ToolContext.httpServer` field remains), backlog explicit requirement
+
+ ---
+
+ ### Candidate 2: Two sequential PRs -- bridge removal then HttpServer removal (recommended)
+
+ **Summary:** PR-A removes the bridge and tombstone. PR-B removes `HttpServer` from MCP server startup entirely, degrades `open_dashboard` to return a static `http://localhost:3456` URL, and removes `httpServer` from `ToolContext`.
+
+ **Structural changes (PR-A: bridge removal):**
+ - Delete `src/mcp/transports/bridge-entry.ts`
+ - Delete `src/mcp/transports/primary-tombstone.ts`
+ - Delete `src/mcp/transports/bridge-events.ts`
+ - `src/mcp-server.ts`: Remove 28-line auto-bridge block (lines 90-117). Remove imports of `startBridgeServer`, `detectHealthyPrimary`, `waitForStdinReadable`, `STDIO_CLIENT_PROBE_MS`. Remove re-exports of those 3 names.
+ - `src/mcp/transports/stdio-entry.ts`: Remove `writeTombstone`/`clearTombstone` imports and their 3 call sites. Keep `ctx.httpServer?.stop()` in shutdown hook (HttpServer still starts in PR-A state).
+ - `src/mcp/transports/http-entry.ts`: Remove `writeTombstone`/`clearTombstone` imports and their 2 call sites.
+ - Delete `tests/unit/mcp/transports/bridge-entry.test.ts` (638 lines)
+ - Delete `tests/unit/mcp/transports/primary-tombstone.test.ts` (85 lines)
+ - Delete `tests/unit/mcp/stdin-probe.test.ts` (covers `waitForStdinReadable`)
+ - Update `tests/unit/mcp-server.test.ts`: remove assertions about bridge-entry.ts existence and startBridgeServer import
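With the auto-bridge block gone, startup no longer probes for a healthy primary; the transport is chosen once from the environment. A sketch of that decision, assuming `WORKRAIL_TRANSPORT` accepts `stdio` or `http` and that an unset value means stdio; the real default, any other accepted values, and `resolveTransport` itself are assumptions:

```typescript
type Transport = 'stdio' | 'http';

// Resolved once at startup: no port probing, no bridge fallback, no retries.
function resolveTransport(env: Record<string, string | undefined>): Transport {
  const raw = env['WORKRAIL_TRANSPORT'];
  if (raw === undefined || raw === '' || raw === 'stdio') return 'stdio';
  if (raw === 'http') return 'http';
  // Validate at the boundary: reject unknown values instead of guessing.
  throw new Error(`Unsupported WORKRAIL_TRANSPORT: ${raw}`);
}

console.log(resolveTransport({})); // stdio
console.log(resolveTransport({ WORKRAIL_TRANSPORT: 'http' })); // http
```

In the server entry point this would be called once with `process.env`, and the result would pick between the stdio and HTTP entry modules.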
160
+
161
+ **Structural changes (PR-B: HttpServer removal from MCP server):**
162
+ - `src/mcp/types.ts`: Remove `readonly httpServer: HttpServer | null` field from `ToolContext`. Remove `import type { HttpServer }` from the file.
163
+ - `src/mcp/server.ts` `createToolContext()`: Remove `httpServer = container.resolve(DI.Infra.HttpServer)` block and the entire `if (featureFlags.isEnabled('sessionTools'))` block that gates it. Remove `sessionManager` from the function too -- it's resolved via the same flag. Simplify to: always resolve `sessionManager` when `sessionTools` is enabled, never resolve `httpServer`. Remove console routes mount block (`if (ctx.v2 && ctx.httpServer ...)`). Remove `ctx.httpServer?.finalize()`.
164
+ - `src/mcp/handlers/session.ts`: Rewrite `requireSessionTools()` to check only `ctx.sessionManager !== null`. Rewrite `handleCreateSession`: `const dashboardUrl = 'http://localhost:3456' + '?session=' + input.sessionId` (static, no httpServer call). Rewrite `handleOpenDashboard`: return `{ url: 'http://localhost:3456' }` with a note "Run 'worktrain console' to start the dashboard UI". Remove `httpServer` usage.
165
+ - `src/di/container.ts` `registerServices()`: Remove `HttpServer` import and registration. Remove `DI.Infra.HttpServer` from `registerServices()`.
166
+ - `src/di/container.ts` `registerConfig()`: Remove `DI.Config.DashboardMode` and `DI.Config.BrowserBehavior` registrations.
167
+ - `src/di/container.ts` `startAsyncServices()`: Remove entire `if (flags.isEnabled('sessionTools'))` block. The function becomes a shell that sets `asyncInitialized = true`.
168
+ - `src/di/tokens.ts`: Remove `HttpServer: Symbol(...)`, `DashboardMode: Symbol(...)`, `BrowserBehavior: Symbol(...)`.
169
+ - `src/config/app-config.ts`: Remove `ValidatedConfig.dashboard` subtree (`mode`, `browserBehavior`, `port`). Remove `DashboardMode`, `BrowserBehavior`, `DashboardPort` type exports. Remove `WORKRAIL_DISABLE_UNIFIED_DASHBOARD` and `WORKRAIL_DASHBOARD_PORT` from `EnvVarsSchema`.
170
+ - `src/config/feature-flags.ts`: Update `sessionTools` description to "(session store only; use 'worktrain console' for the dashboard UI)". Remove "HTTP dashboard server" from description.
171
+ - `src/cli.ts`: Remove the `cleanup` command block entirely (it resolved `DI.Infra.HttpServer`). Or: print a deprecation notice. Remove `import type { HttpServer }` and the `DI.Infra.HttpServer` resolve call.
172
+ - `src/mcp/transports/stdio-entry.ts`: Simplify `onBeforeTerminate` to `async () => {}` (no httpServer to stop). Remove `ctx.httpServer?.stop()`.
173
+ - `src/mcp/transports/http-entry.ts`: Remove `ctx.httpServer?.stop()` from both shutdown hook usages.
174
+ - `src/infrastructure/session/HttpServer.ts`: **Deleted** (~1000 lines)
175
+ - `src/infrastructure/session/DashboardHeartbeat.ts`: **Deleted**
176
+ - `src/infrastructure/session/DashboardLockRelease.ts`: **Deleted**
177
+ - `src/infrastructure/session/index.ts`: Remove those three exports
178
+ - `src/runtime/adapters/node-process-signals.ts`: Update comment removing reference to `HttpServer.setupPrimaryCleanup()`
179
+ - Delete `tests/integration/unified-dashboard.test.ts` (206 lines)
180
+ - Remove the HttpServer-dependent portions of `tests/integration/process-cleanup.test.ts`
181
+ - Delete `tests/unit/http-server-stop-idempotency.test.ts` (147 lines)
182
+ - Update `tests/unit/mcp-server.test.ts`: remove HttpServer-related assertions, add assertion that `ctx.httpServer` field does not exist in `ToolContext`
183
+ - `tests/smoke/di-container.smoke.test.ts`: no change needed (its coverage shrinks automatically when the tokens are removed)
184
+
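A minimal sketch of the rewritten `requireSessionTools()` guard and the static `dashboardUrl` in `handleCreateSession`, assuming a trimmed-down `ToolContext` whose only relevant field is a nullable `sessionManager`. `SessionManagerLike`, `ToolResult`, `buildDashboardUrl`, and the `createSession` method are hypothetical names for illustration; the real handlers in `src/mcp/handlers/session.ts` may throw or shape their results differently. `encodeURIComponent` is an addition beyond the plain string concatenation in the plan.

```typescript
// Hypothetical, trimmed-down shapes; the real types live in src/mcp/types.ts.
interface SessionManagerLike {
  createSession(sessionId: string): void;
}

interface ToolContext {
  sessionManager: SessionManagerLike | null;
}

type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Well-known default port for `worktrain console`; no httpServer lookup involved.
const DEFAULT_CONSOLE_PORT = 3456;

// After PR-B the guard checks exactly one capability: the session store.
function requireSessionTools(ctx: ToolContext): ToolResult<SessionManagerLike> {
  if (ctx.sessionManager === null) {
    return { ok: false, error: "sessionTools is disabled" };
  }
  return { ok: true, value: ctx.sessionManager };
}

// Static hint URL in the same form as 'http://localhost:3456' + '?session=' + id.
function buildDashboardUrl(sessionId: string, port: number = DEFAULT_CONSOLE_PORT): string {
  return `http://localhost:${port}?session=${encodeURIComponent(sessionId)}`;
}

function handleCreateSession(
  ctx: ToolContext,
  input: { sessionId: string },
): ToolResult<{ sessionId: string; dashboardUrl: string }> {
  const guard = requireSessionTools(ctx);
  if (!guard.ok) return guard; // error is returned as data, not thrown
  guard.value.createSession(input.sessionId);
  return {
    ok: true,
    value: { sessionId: input.sessionId, dashboardUrl: buildDashboardUrl(input.sessionId) },
  };
}
```

Returning a result value instead of throwing is a choice made here to match the "Errors are data" principle cited below; it is not confirmed to be how the existing guard behaves.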
185
+ **Total across both PRs:** approximately 2000 lines deleted, ~150 lines changed.
186
+
187
+ **Tensions resolved:** All four. T1: `dashboardUrl` stays functional (static URL instead of null). T2: PR-A is safe; PR-B doesn't touch transport logic. T3: `infrastructure/session/HttpServer.ts` gone, no naming confusion. T4: `sessionTools` gates `SessionManager` only; `requireSessionTools()` checks one thing.
188
+
189
+ **Tensions accepted:** The static `http://localhost:3456` in `handleCreateSession` and `handleOpenDashboard` is wrong if the user runs `worktrain console` on a different port. This is low probability (a custom port requires an explicit `--port` flag) and is a documented assumption.
190
+
191
+ **Boundary:** `ToolContext.httpServer` field in `src/mcp/types.ts`. Removing this field is the correct seam: TypeScript propagates errors to all callsites, making the blast radius explicit and compiler-enforced.
192
+
193
+ **Why that boundary is best fit:** The field represents a capability that no longer exists. Removing it from the capability record (`ToolContext`) makes it impossible to represent that capability as present. Every other change follows as a consequence of this type change.
194
+
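The `tests/unit/mcp-server.test.ts` assertion that `ToolContext` no longer has an `httpServer` field can be expressed at the type level. This is a sketch assuming a trimmed-down `ToolContext`; `HasKey`, `toolContextHasHttpServer`, and `contextKeys` are hypothetical helpers, not existing repo code.

```typescript
// Hypothetical, trimmed-down ToolContext; the real interface in src/mcp/types.ts has more fields.
interface SessionManagerLike {
  getSession(id: string): unknown;
}

interface ToolContext {
  sessionManager: SessionManagerLike | null;
  // No httpServer field: the removed capability cannot be represented.
}

// Evaluates to `true` only when K is a declared key of T.
type HasKey<T, K extends PropertyKey> = K extends keyof T ? true : false;

// Compile-time assertion: if `httpServer` is ever re-added, HasKey becomes `true`
// and assigning `false` here stops type-checking.
const toolContextHasHttpServer: HasKey<ToolContext, "httpServer"> = false;

// Runtime counterpart for a unit test: the keys a constructed context actually carries.
function contextKeys(ctx: ToolContext): string[] {
  return Object.keys(ctx).sort();
}
```

The compile-time line is the stronger check: it fails in CI at type-check time rather than depending on a test constructing a context.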
195
+ **Failure mode:** The hardcoded `http://localhost:3456` is wrong if the user runs `worktrain console --port 4000`; the URL is then a dead link. Mitigation path: read `~/.workrail/daemon-console.lock` in the handler to get the actual port. This is a one-shot async file read, doable in a follow-up PR if the issue is reported.
196
+
197
+ **Repo pattern relationship:** Follows the established "just remove it" pattern for refactors (no intermediate flag). Adapts the existing `requireSessionTools()` guard to check one capability instead of two. The static URL constant follows the existing `DEFAULT_MCP_PORT = 3100` pattern in `mcp-server.ts`.
198
+
199
+ **Gains:** ~2000 lines of coordination machinery deleted. No port contention between Claude windows. No `dashboard.lock`, no `spawn-coordinator-*.lock`, no `primary.tombstone`. MCP server starts in ~50ms instead of 150ms+. `sessionTools` remains functional. `open` npm package dependency removed. DI container loses three tokens.
200
+ **Gives up:** `open_dashboard` no longer auto-launches a browser (the `open` npm package call is the only usage). `dashboardUrl` is a static hint rather than a guaranteed-live URL. `workrail cleanup` command is removed.
201
+
202
+ **Impact beyond immediate task:** `worktrain-spawn.ts` and `worktrain-await.ts` lose the `dashboard.lock` fallback permanently -- they gracefully fall through to default port 3456 without it. This is the correct behavior. The fallback lines can be removed as a chore PR later.
203
+
204
+ **Scope judgment: best-fit.** Exactly matches the backlog requirements. Two PRs are the right granularity: PR-A is the high-value, low-risk change; PR-B is the deeper structural change that benefits from PR-A being validated first.
205
+
206
+ **Philosophy:**
207
+ - Honors: "Architectural fixes over patches", "Make illegal states unrepresentable" (httpServer field removed), "YAGNI with discipline" (all dead code deleted), "Determinism over cleverness" (startup has one path), "Keep interfaces small and focused" (ToolContext shrinks), "Errors are data" (bridge's callback-based shutdown is gone)
208
+ - Conflicts with: "Dependency injection for boundaries" for the hardcoded port constant. Counter: this is a well-known default value, not a behavioral policy. A `DEFAULT_CONSOLE_PORT` constant in a shared location is sufficient.
209
+
210
+ ---
211
+
212
+ ### Candidate 3: Rip-all-at-once behind a `WORKRAIL_SIMPLE_STDIO` feature flag
213
+
214
+ **Summary:** A single PR adds a `WORKRAIL_SIMPLE_STDIO` feature flag. When it is set, `mcp-server.ts` short-circuits to `startStdioServer()` and `startAsyncServices()` skips HttpServer. All bridge/HttpServer code remains for one release cycle while the flag graduates to default-on.
215
+
216
+ **Structural changes:**
217
+ - Add `simplestdio` flag to `feature-flags.ts`
218
+ - Add 3-line conditional at top of `mcp-server.ts` main()
219
+ - Add 3-line conditional in `startAsyncServices()`
220
+ - No deletions yet
221
+
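To make the "two codepaths" cost concrete, here is a sketch of the gate folded into one function. Only the `WORKRAIL_SIMPLE_STDIO` name comes from the candidate; `planStartup`, `StartupPlan`, the entry labels, and the accepted values `"1"`/`"true"` are illustrative assumptions.

```typescript
// Hypothetical sketch of Candidate 3's gate; everything except WORKRAIL_SIMPLE_STDIO is illustrative.
type EnvLike = Record<string, string | undefined>;

type StartupPlan = {
  entry: "simple-stdio" | "legacy-auto-bridge";
  startDashboardHttpServer: boolean;
};

// Assumed truthy values for the flag.
function isSimpleStdioEnabled(env: EnvLike): boolean {
  const raw = env.WORKRAIL_SIMPLE_STDIO?.trim().toLowerCase();
  return raw === "1" || raw === "true";
}

// The conditionals in main() and startAsyncServices(), folded together:
// both branches stay live and must be tested until the flag is removed.
function planStartup(env: EnvLike): StartupPlan {
  if (isSimpleStdioEnabled(env)) {
    return { entry: "simple-stdio", startDashboardHttpServer: false };
  }
  return { entry: "legacy-auto-bridge", startDashboardHttpServer: true };
}
```

Every test of startup behavior now needs both an env-unset and an env-set case, which is the extra surface area the scope judgment below objects to.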
222
+ **Tensions resolved:** T2 only (the migration window reduces self-referential risk during rollout)
223
+ **Tensions accepted:** T1 (two API codepaths), T3 (both HTTP servers still exist), T4 (feature still bundled)
224
+
225
+ **Boundary:** Feature flag
226
+
227
+ **Failure mode:** The flag is never defaulted to `true` and never cleaned up. Becomes permanent dead-with-flag code. The codebase already has flags with `since: '0.6.0'` that show this pattern is a real risk.
228
+
229
+ **Repo pattern relationship:** Departs. There is NO existing pattern of using a feature flag to gate the removal of code. Every refactor in the git log (including PR #512) removes code directly. This candidate invents a new anti-pattern.
230
+
231
+ **Gains:** Theoretically reversible within one release cycle.
232
+ **Gives up:** Two codepaths to test and maintain. Adds complexity before reducing it. Flag cleanup requires a follow-up PR. Goes against the established direct-removal pattern.
233
+
234
+ **Scope judgment: too broad in the wrong dimension.** Adds complexity (flag + two codepaths) while doing less structural work. Not broader in the sense of "more value" -- broader in the sense of "more surface area for the same outcome."
235
+
236
+ **Philosophy:**
237
+ - Honors: "Graceful degradation ladders" (migration window) -- but this principle applies to user-facing capability degradation, not internal refactor sequencing
238
+ - Conflicts with: "YAGNI with discipline", "Architectural fixes over patches", repo direct-removal pattern
239
+
240
+ ---
241
+
242
+ ## Comparison and Recommendation
243
+
244
+ ### Tension resolution matrix
245
+
246
+ | | C1 (bridge out, HttpServer simplified) | C2 (two PRs: bridge + HttpServer) | C3 (flag-gated) |
247
+ |---|---|---|---|
248
+ | T1: removal vs compat | Accepts (port contention persists) | Resolves (static URL degrades gracefully) | Half-resolves (flag window) |
249
+ | T2: self-referential risk | Resolves (small first PR) | Resolves (PR-A safe; PR-B doesn't touch transport) | Accepts (one large PR) |
250
+ | T3: HTTP naming confusion | Partial (bridge gone; HttpServer still starts) | Resolves (only MCP-transport HTTP remains) | Accepts (both servers persist) |
251
+ | T4: sessionTools coherence | Accepts (still bundled) | Resolves (guard checks one thing) | Accepts (both codepaths) |
252
+
253
+ ### Recommendation: Candidate 2
254
+
255
+ C2 is the only candidate that satisfies the explicit backlog requirement in full and resolves all four tensions.
256
+
257
+ The two-PR sequencing is not a compromise. PR-A (bridge removal) is the high-value, low-risk change: it eliminates the 28-line non-deterministic entry point and ~800 lines of reconnect state machine. PR-B (HttpServer removal) is the structural completion: it makes the absence of dashboard HTTP serving unrepresentable in the type system and deletes ~1200 more lines. Both PRs are independently reviewable, independently deployable, and leave the system in a consistent state.
258
+
259
+ C1 is rejected because it leaves the port-contention problem and contradicts the explicit backlog design. It is best understood as PR-A of C2, not a complete design.
260
+
261
+ C3 is rejected because it invents an anti-pattern (removal-gate flags) with no precedent in this repo and adds complexity in exchange for a migration window that is not needed (the behavioral changes in C2 are mild and documented).
262
+
263
+ ---
264
+
265
+ ## Self-Critique
266
+
267
+ ### Strongest argument against C2
268
+
269
+ The static `http://localhost:3456` URL in `handleCreateSession` and `handleOpenDashboard` is an assumption, not a guarantee. A user who runs `worktrain console --port 4000` gets a stale URL. The `open_dashboard` tool used to work (it opened a browser to the live server). After C2, it returns a URL to a server that may not be running.
270
+
271
+ Counter: the existing behavior already has this problem -- if `HttpServer` failed to start (port exhaustion in legacy mode), `dashboardUrl` was already `null`. The static URL is strictly better than null. The correct fix (read `daemon-console.lock` to discover the actual port) can be added in a focused follow-up PR without blocking the simplification.
272
+
273
+ ### Why C1 loses
274
+
275
+ C1 removes the most disruptive code (the bridge state machine) without touching the sessionTools API surface. But it is not a complete design: it leaves HttpServer running in each Claude window, still binding port 3456, still writing `dashboard.lock`, still being resolved from DI. The backlog says "remove HttpServer starting as part of the MCP server." C1 does not do that. It is PR-A of C2, not a standalone design.
276
+
277
+ ### What broader scope would require
278
+
279
+ A fully correct `handleOpenDashboard` that reads `daemon-console.lock` to discover the actual port and returns a live URL would require:
280
+ - One async `fs.readFile` call in the handler
281
+ - A fallback to port 3456 if the lock is absent
282
+ - A test for the lock-read path
283
+
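The three items above can be sketched as one injectable function. The lock-file format is an assumption: the design does not specify it, so this sketch assumes JSON such as `{"port": 4000}`. `resolveConsolePort`, `defaultConsoleLockPath`, and `consoleDashboardUrl` are hypothetical names. Injecting the reader lets the lock-read test run without touching the filesystem.

```typescript
import { readFile } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

// Fallback when no usable lock exists (the design proposes a DEFAULT_CONSOLE_PORT constant).
const DEFAULT_CONSOLE_PORT = 3456;

// Assumed location; assumed content is JSON with a numeric `port` field.
function defaultConsoleLockPath(): string {
  return join(homedir(), ".workrail", "daemon-console.lock");
}

type ReadText = (path: string) => Promise<string>;

// One async read; any failure (missing file, bad JSON, invalid port) falls back to the default.
async function resolveConsolePort(
  lockPath: string = defaultConsoleLockPath(),
  read: ReadText = (p) => readFile(p, "utf8"),
): Promise<number> {
  try {
    const parsed: unknown = JSON.parse(await read(lockPath));
    const port = (parsed as { port?: unknown } | null)?.port;
    if (typeof port === "number" && Number.isInteger(port) && port > 0 && port <= 65535) {
      return port;
    }
  } catch {
    // Absent, unreadable, or malformed lock: use the well-known default below.
  }
  return DEFAULT_CONSOLE_PORT;
}

// What handleOpenDashboard could return as its `url`.
async function consoleDashboardUrl(read?: ReadText): Promise<string> {
  const port = await resolveConsolePort(defaultConsoleLockPath(), read);
  return `http://localhost:${port}`;
}
```

A live lock still does not prove the console process is running; this only removes the wrong-port case, not the stale-lock case.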
284
+ This is approximately 20 lines of code added to PR-B. It is the right long-term behavior. Whether it belongs in PR-B or a follow-up PR is a question of scope appetite. Including it in PR-B makes PR-B slightly larger but produces a more correct `handleOpenDashboard`.
285
+
286
+ ### Assumption that would invalidate C2
287
+
288
+ If operators call `/api/v2/sessions` directly on the MCP server's port (rather than on the worktrain console port) in production workflows or automation, removing console route mounting from the MCP server breaks them. Evidence check: `worktrain-spawn.ts` already prefers `daemon-console.lock` (worktrain console) over `dashboard.lock` (MCP server) -- the standalone console is already the canonical API source. No bundled workflow calls `/api/v2/sessions`. No documentation describes calling the MCP server's dashboard port directly. This assumption holds unless an operator has built private tooling against the MCP server's dashboard endpoint, which would be undocumented and fragile already.
289
+
290
+ ---
291
+
292
+ ## Open Questions for the Main Agent
293
+
294
+ 1. **`handleOpenDashboard` lock-file read in PR-B or follow-up?** Reading `daemon-console.lock` to discover the actual console port makes `handleOpenDashboard` return a live URL instead of a best-effort static constant. ~20 lines. Does this belong in PR-B or a separate chore PR?
295
+
296
+ 2. **`workrail cleanup` command: remove or degrade?** The `cleanup` command's implementation (lsof/netstat to kill processes on ports 3456-3499) becomes meaningless after HttpServer removal. Two options: (A) remove the command entirely, (B) print a deprecation notice saying "Use 'worktrain console' to manage the console UI." Option A is cleaner; Option B is kinder to users who had `workrail cleanup` in scripts.
297
+
298
+ 3. **`WORKRAIL_DASHBOARD_PORT` and `WORKRAIL_DISABLE_UNIFIED_DASHBOARD` env vars:** These appear in the generated config template (`workrail init` writes them to `~/.workrail/config.json`). After removal, they are silently ignored. Should a startup warning be emitted when these env vars are set but HttpServer no longer uses them? Or just silently ignore them and document in the release notes?
299
+
300
+ 4. **`http-listener.ts` tests:** `tests/unit/mcp/http-listener.test.ts` tests `createHttpListener` and `bindWithPortFallback`. These are still needed (for bot-service HTTP transport). The test file should be reviewed to confirm that its tests cover MCP transport behavior, not dashboard election behavior. A quick scan should confirm they test the `createHttpListener` lifecycle only.
301
+
302
+ 5. **PR-B commit type:** Removing `open_dashboard` auto-open behavior and changing `dashboardUrl` from a live URL to a static constant are MCP tool contract changes. Under the release policy, this counts as a breaking change defaulting to `minor`. The PR title should be `feat(mcp): ...` or `fix(mcp): ...` rather than `chore(mcp): ...` to ensure semantic-release creates a release entry with the change documented.
303
+
304
+ ---
305
+
306
+ ## Final Summary
307
+
308
+ **Review date:** 2026-04-17
309
+ **Review doc:** `docs/design/stdio-simplification-design-review.md`
310
+
311
+ ### Selected Direction: Candidate 2 (two sequential PRs)
312
+
313
+ C2 is the only candidate that satisfies the explicit backlog requirement and resolves all four tensions. PR-A (bridge removal) is the high-value, low-risk change: ~800 lines deleted, deterministic startup, no 150ms probe delay. PR-B (HttpServer removal) is the structural completion: `ToolContext.httpServer` field deleted, ~1200 more lines removed, `sessionTools` remains functional with degraded (but acceptable) `open_dashboard` behavior.
314
+
315
+ ### Why Alternatives Lost
316
+
317
+ - C1 is PR-A of C2, not a complete design. It leaves HttpServer running in every Claude window and contradicts the backlog requirement.
318
+ - C3 invents a removal-gate feature flag anti-pattern with no repo precedent.
319
+
320
+ ### Confidence Band: HIGH
321
+
322
+ All four tensions resolved. No RED or ORANGE findings in review. Two YELLOW items, both with clear mitigations.
323
+
324
+ ### Additive Revisions to C2
325
+
326
+ 1. Include `daemon-console.lock` read in PR-B's `handleOpenDashboard` (~20 lines) -- resolves Open Question 1.
327
+ 2. Use `DEFAULT_CONSOLE_PORT = 3456` named constant, not a bare literal.
328
+ 3. PR-B commit type: `feat(mcp)` (MCP tool contract change -- resolves Open Question 5).
329
+
330
+ ### Resolved Open Questions
331
+
332
+ - OQ1: Lock-file read belongs in PR-B, not a follow-up.
333
+ - OQ2: Remove the `cleanup` command entirely with a clear release note.
334
+ - OQ3: Emit a startup warning when `WORKRAIL_DASHBOARD_PORT` or `WORKRAIL_DISABLE_UNIFIED_DASHBOARD` are set.
335
+ - OQ4: Before making deletion decisions in PR-B, scan `tests/unit/mcp/http-listener.test.ts` to confirm it contains no dashboard-election tests.
336
+ - OQ5: PR-B is `feat(mcp)`.
337
+
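The OQ3 startup warning could be a small pure function, so the check stays testable and the caller owns the logging. `removedDashboardEnvWarnings` and the message wording are hypothetical; only the two variable names come from the design.

```typescript
// Env vars the removed HttpServer used to read (dropped from EnvVarsSchema in PR-B).
const REMOVED_DASHBOARD_ENV_VARS = [
  "WORKRAIL_DASHBOARD_PORT",
  "WORKRAIL_DISABLE_UNIFIED_DASHBOARD",
] as const;

type EnvLike = Record<string, string | undefined>;

// One warning per removed var that is still set; logging is left to the caller.
function removedDashboardEnvWarnings(env: EnvLike): string[] {
  return REMOVED_DASHBOARD_ENV_VARS.filter((name) => env[name] !== undefined).map(
    (name) => `${name} is set but no longer has any effect; run 'worktrain console' for the dashboard UI.`,
  );
}
```

At startup the caller would pass `process.env` and forward each message to its existing logger or `console.warn`.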
338
+ ### Next Actions
339
+
340
+ 1. Start PR-A: delete `bridge-entry.ts`, `primary-tombstone.ts`, `bridge-events.ts`; remove 28-line auto-bridge block from `mcp-server.ts`; remove tombstone call sites.
341
+ 2. After PR-A merges and CI is green, start PR-B: remove `ToolContext.httpServer` field, delete `HttpServer.ts`, update DI container, update `handleCreateSession`/`handleOpenDashboard` with static URL + lock-file read.
@@ -0,0 +1,93 @@
1
+ # WorkRail MCP Server Stdio Simplification -- Design Review Findings
2
+
3
+ **Status:** Review complete. Ready for implementation.
4
+ **Date:** 2026-04-17
5
+ **Reviewing:** `docs/design/stdio-simplification-design-candidates.md`
6
+ **Selected Direction:** Candidate 2 -- two sequential PRs (bridge removal + HttpServer removal)
7
+
8
+ ---
9
+
10
+ ## Tradeoff Review
11
+
12
+ | Tradeoff | Verdict | Condition for Reversal |
13
+ |---|---|---|
14
+ | Static `http://localhost:3456` in `open_dashboard` | Acceptable | If a significant number of users run `worktrain console --port N`; mitigate by reading `daemon-console.lock` |
15
+ | `workrail cleanup` command removed | Acceptable | Document in release notes; no acceptance criterion requires it |
16
+ | `dashboardUrl` is a hint not a guaranteed-live URL | Acceptable | No caller checks URL liveness programmatically |
17
+ | `open_dashboard` no longer auto-launches browser | Acceptable | UX tool, not a workflow primitive; no bundled workflow calls it |
18
+
19
+ All tradeoffs hold under realistic conditions. No tradeoff violates an acceptance criterion.
20
+
21
+ ---
22
+
23
+ ## Failure Mode Review
24
+
25
+ ### FM1: Hardcoded port assumption (YELLOW)
26
+ - **Description:** `handleOpenDashboard` returns `http://localhost:3456` -- wrong if user runs `worktrain console --port 4000`
27
+ - **Design handling:** Acknowledged in design doc. Mitigation: read `~/.workrail/daemon-console.lock` in handler (~20 lines)
28
+ - **Severity:** Yellow -- affects non-default-port users only; degraded behavior is a dead link, not a crash
29
+ - **Recommended action:** Include the lock-file read in PR-B (not deferred to follow-up). ~20 lines, well-scoped.
30
+
31
+ ### FM2: Private operator tooling calling /api/v2/sessions on MCP dashboard port (YELLOW)
32
+ - **Description:** Removing console route mounting from the MCP server could break undocumented operator tooling
33
+ - **Design handling:** Evidence strongly argues against this: `worktrain-spawn.ts` already prefers `daemon-console.lock`; no bundled workflow calls the endpoint; not documented
34
+ - **Severity:** Yellow -- very low probability; no evidence of such tooling exists
35
+ - **Recommended action:** No action required. Include a note in PR-B description for awareness.
36
+
37
+ ---
38
+
39
+ ## Runner-Up / Simpler Alternative Review
40
+
41
+ **C1 (runner-up):** Bridge removal only, HttpServer simplified. Contains no elements worth borrowing into C2 -- C1 is a strict subset of C2's PR-A. C1 fails to satisfy the backlog requirement.
42
+
43
+ **Simpler C2 variant:** Inject `null` for `httpServer` rather than removing the field from `ToolContext`. Rejected -- violates "Make illegal states unrepresentable." A nullable field for a removed capability perpetuates dead code.
44
+
45
+ **Hybrid opportunity:** Include `daemon-console.lock` read in PR-B (Open Question 1 from design doc). This resolves FM1 at the point of change rather than deferring. Recommended.
46
+
47
+ ---
48
+
49
+ ## Philosophy Alignment
50
+
51
+ | Principle | Status |
52
+ |---|---|
53
+ | Architectural fixes over patches | Satisfied -- root cause removed |
54
+ | Make illegal states unrepresentable | Satisfied -- `ToolContext.httpServer` field deleted |
55
+ | YAGNI with discipline | Satisfied -- all dead coordination machinery deleted |
56
+ | Determinism over cleverness | Satisfied -- startup has one deterministic path |
57
+ | Keep interfaces small and focused | Satisfied -- ToolContext shrinks; `requireSessionTools()` checks one thing |
58
+ | Errors are data | Satisfied -- callback-based bridge shutdown gone |
59
+ | Dependency injection for boundaries | Minor tension -- hardcoded `DEFAULT_CONSOLE_PORT = 3456` constant. Acceptable; injecting a well-known default adds DI complexity for no behavioral benefit. |
60
+
61
+ ---
62
+
63
+ ## Findings
64
+
65
+ ### RED
66
+ None.
67
+
68
+ ### ORANGE
69
+ None.
70
+
71
+ ### YELLOW
72
+ - **FM1:** `handleOpenDashboard` static port assumption. Resolve by including `daemon-console.lock` read in PR-B.
73
+ - **FM2:** Console route removal may break undocumented operator tooling. Low probability. Include in PR-B description.
74
+
75
+ ---
76
+
77
+ ## Recommended Revisions to C2
78
+
79
+ 1. **Include lock-file read in PR-B** (not follow-up): Add `daemon-console.lock` read to `handleOpenDashboard` with fallback to port 3456. ~20 lines. Makes the tool return a live URL rather than a best-effort hint.
80
+
81
+ 2. **Use named constant, not bare literal:** Define `DEFAULT_CONSOLE_PORT = 3456` in a shared location (e.g., `src/infrastructure/console-defaults.ts`). Use it in `handleOpenDashboard`, `handleCreateSession`, and `worktrain-spawn.ts` fallback.
82
+
83
+ 3. **PR-B commit type:** This is a `feat(mcp)` commit (MCP tool contract change: `dashboardUrl` behavior, `open_dashboard` behavior, `workrail cleanup` removal). Not `chore`. Semantic-release must create a release entry.
84
+
85
+ ---
86
+
87
+ ## Residual Concerns
88
+
89
+ 1. **Open Question 2 (cleanup command):** Remove entirely (Option A) or print deprecation notice (Option B). Recommendation: Option A with a clear release note. The command's implementation (lsof/netstat kills on ports 3456-3499) becomes semantically wrong after HttpServer removal.
90
+
91
+ 2. **Open Question 3 (deprecated env vars):** `WORKRAIL_DASHBOARD_PORT` and `WORKRAIL_DISABLE_UNIFIED_DASHBOARD` will be silently ignored after removal. Recommendation: emit a startup warning when either var is set, since nothing reads them once HttpServer is gone. One-line check in app startup.
92
+
93
+ 3. **Open Question 4 (http-listener.ts tests):** Review `tests/unit/mcp/http-listener.test.ts` before PR-B to confirm all tests cover MCP-transport lifecycle only, not dashboard election behavior. Low risk but worth a quick scan before deletion decisions.