@exaudeus/workrail 3.35.1 → 3.37.0

Files changed (55)
  1. package/dist/config/config-file.js +2 -0
  2. package/dist/console-ui/assets/{index-D7jQyCSD.js → index-o-p__sHJ.js} +1 -1
  3. package/dist/console-ui/index.html +1 -1
  4. package/dist/daemon/workflow-runner.d.ts +5 -0
  5. package/dist/daemon/workflow-runner.js +131 -1
  6. package/dist/manifest.json +39 -31
  7. package/dist/mcp/handlers/v2-advance-events.js +1 -1
  8. package/dist/mcp/handlers/v2-execution/start.d.ts +1 -0
  9. package/dist/mcp/handlers/v2-execution/start.js +3 -2
  10. package/dist/trigger/notification-service.d.ts +42 -0
  11. package/dist/trigger/notification-service.js +164 -0
  12. package/dist/trigger/trigger-listener.js +7 -1
  13. package/dist/trigger/trigger-router.d.ts +3 -1
  14. package/dist/trigger/trigger-router.js +4 -1
  15. package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +64 -32
  16. package/dist/v2/durable-core/schemas/session/events.d.ts +20 -10
  17. package/dist/v2/durable-core/schemas/session/events.js +1 -1
  18. package/dist/v2/durable-core/schemas/session/gaps.d.ts +8 -8
  19. package/dist/v2/durable-core/schemas/session/gaps.js +1 -1
  20. package/docs/design/agent-behavior-patterns-discovery.md +312 -0
  21. package/docs/design/agent-engine-communication-discovery.md +390 -0
  22. package/docs/design/agent-loop-architecture-alternatives-discovery.md +531 -0
  23. package/docs/design/agent-loop-error-handling-contract.md +238 -0
  24. package/docs/design/complete-step-approach-validation-discovery.md +344 -0
  25. package/docs/design/daemon-stuck-detection-discovery.md +174 -0
  26. package/docs/design/mcp-server-disconnect-discovery.md +245 -0
  27. package/docs/design/mcp-server-epipe-crash.md +198 -0
  28. package/docs/design/notification-design-candidates.md +131 -0
  29. package/docs/design/notification-design-review.md +84 -0
  30. package/docs/design/notification-implementation-plan.md +181 -0
  31. package/docs/design/spawn-agent-failure-modes.md +161 -0
  32. package/docs/design/spawn-agent-result-handling-implementation-plan.md +186 -0
  33. package/docs/design/stdio-simplification-design-candidates.md +341 -0
  34. package/docs/design/stdio-simplification-design-review.md +93 -0
  35. package/docs/design/stdio-simplification-implementation-plan.md +317 -0
  36. package/docs/design/structured-output-tools-coexist-findings.md +288 -0
  37. package/docs/discovery/coordinator-script-design.md +745 -0
  38. package/docs/discovery/coordinator-ux-discovery.md +471 -0
  39. package/docs/discovery/spawn-agent-failure-modes.md +309 -0
  40. package/docs/discovery/workflow-selection-for-discovery-tasks.md +336 -0
  41. package/docs/discovery/worktrain-status-briefing.md +325 -0
  42. package/docs/discovery/worktrain-status-design-candidates.md +202 -0
  43. package/docs/discovery/worktrain-status-design-review-findings.md +86 -0
  44. package/docs/ideas/backlog.md +688 -1
  45. package/docs/ideas/daemon-structured-output-vs-tool-calls.md +344 -0
  46. package/docs/ideas/design-candidates-backlog-consolidation.md +85 -0
  47. package/docs/ideas/design-candidates-spawn-agent-task.md +178 -0
  48. package/docs/ideas/design-review-findings-backlog-consolidation.md +39 -0
  49. package/docs/ideas/design-review-findings-spawn-agent-task.md +139 -0
  50. package/docs/ideas/implementation_plan_backlog_consolidation.md +117 -0
  51. package/docs/ideas/implementation_plan_spawn_agent.md +217 -0
  52. package/docs/plans/authoring-doc-staleness-enforcement-candidates.md +251 -0
  53. package/docs/plans/authoring-doc-staleness-enforcement-review.md +99 -0
  54. package/docs/plans/authoring-doc-staleness-enforcement.md +463 -0
  55. package/package.json +1 -1
@@ -0,0 +1,317 @@
1
+ # WorkRail MCP Server Stdio Simplification -- Implementation Plan
2
+
3
+ **Status:** Design complete. PR-A changes already implemented in working tree. PR-B ready for planning.
4
+ **Date:** 2026-04-19
5
+ **Scope:** Remove primary election (DashboardLock, tryBecomePrimary, bindWithPortFallback), bridge mechanism (bridge-entry.ts), and HTTP dashboard serving from the MCP server. The standalone worktrain console (PR #512, merged) now owns the UI.
6
+
7
+ **Design docs:**
8
+ - Candidates: `docs/design/stdio-simplification-design-candidates.md`
9
+ - Review findings: `docs/design/stdio-simplification-design-review.md`
10
+ - PR-A candidates: `docs/design/bridge-removal-pr-a-candidates.md`
11
+ - PR-A implementation plan: `docs/design/bridge-removal-pr-a-implementation-plan.md`
12
+ - This document: overall implementation plan with PR slices
13
+
14
+ ---
15
+
16
+ ## About This Document
17
+
18
+ This file is a **human-readable reference artifact** for the implementation plan. It is not execution memory.
19
+
20
+ - **Execution truth** lives in the WorkRail session notes and context variables (durable across chat rewinds)
21
+ - **This document** is for a developer reading the plan before or during implementation
22
+ - If a chat rewind occurs, the session notes and context survive; this file may be regenerated from them
23
+ - Do not treat the presence or absence of this file as a gate on execution state
24
+
25
+ ---
26
+
27
+ ## Capability Assessment (2026-04-19)
28
+
29
+ This session was a WorkRail Auto daemon session. Available tools: `continue_workflow`, `Bash`, `Read`, `Write`, `report_issue`.
30
+
31
+ - **Delegation (WorkRail Executor subagents):** unavailable -- no delegation/spawn tool present in session tool list
32
+ - **Web browsing:** unavailable -- no web fetch tool present in session tool list
33
+ - **Fallback path:** all design work performed directly by the main agent using filesystem tools and codebase analysis. Sufficient for a design-only session with pre-existing candidate and review docs.
34
+
35
+ No capability-dependent path was taken. The delegation gap does not affect output quality.
36
+
37
+ ---
38
+
39
+ ## Landscape Packet (verified 2026-04-19)
40
+
41
+ ### Current state
42
+
43
+ **Branch:** `feat/mcp-simplify-remove-bridge` (checked out, ahead of origin/main)
44
+
45
+ **PR-A changes already live in working tree (unstaged):**
46
+ - `src/mcp/transports/bridge-entry.ts` -- deleted
47
+ - `src/mcp/transports/bridge-events.ts` -- deleted
48
+ - `src/mcp/transports/primary-tombstone.ts` -- deleted
49
+ - `src/mcp-server.ts` -- simplified to 37 lines (was 132); auto-bridge block and all bridge imports removed
50
+ - `src/mcp/transports/stdio-entry.ts` -- tombstone call sites removed
51
+ - `src/mcp/transports/http-entry.ts` -- tombstone call site at line 103 needs verification (see Contradiction C1)
52
+ - Bridge test files deleted: `stdin-probe.test.ts`, `bridge-entry.test.ts`, `primary-tombstone.test.ts`
53
+ - `triggers.yml` -- modified (protected file; must NOT be committed with PR-A)
54
+
55
+ **Build status:** `npm run build` passes. One perf test is flaky (pre-existing on main, not caused by PR-A changes).
56
+
57
+ ### What remains for PR-B
58
+
59
+ - `src/infrastructure/session/HttpServer.ts` (1211 lines) -- still present
60
+ - `ToolContext.httpServer: HttpServer | null` field -- still in `src/mcp/types.ts`
61
+ - `ctx.httpServer` usages in `src/mcp/server.ts` (lines 91, 95, 200, 299, 309, 325)
62
+ - `requireSessionTools()` gates on `ctx.httpServer` in `session.ts`
63
+ - `DI.Infra.HttpServer`, `DI.Config.DashboardMode`, `DI.Config.BrowserBehavior` tokens
64
+ - `ValidatedConfig.dashboard` subtree and env vars in `app-config.ts`
65
+ - `cleanup` command in `cli.ts` (lines 210-224)
66
+
67
+ ### Contradictions
68
+
69
+ **C1 (MEDIUM):** `http-entry.ts:103` has `writeTombstone(boundPort, process.pid)` -- verify whether this was removed in the working tree or remains. Must be resolved before PR-A commit.
70
+
71
+ **C2 (MINOR):** `mcp-server-disconnect` design track exists but is orthogonal; no scope conflict.
72
+
73
+ ### Evidence gaps
74
+
75
+ 1. `daemon-console.lock` format -- confirmed: JSON `{ pid: number, port: number }` (read from `src/trigger/daemon-console.ts:13` and `src/console/standalone-console.ts:13`). Gap is now closed.
76
+ 2. `process-cleanup.test.ts` scope -- needs scan before PR-B.
77
+ 3. `http-entry.ts` tombstone verification -- needs check before PR-A commit.
78
+
79
+ ---
80
+
81
+ ## Problem Frame Packet
82
+
83
+ ### Stakeholders
84
+
85
+ **Primary:**
86
+ - **Etienne (project owner):** wants a simpler, deterministic MCP server startup. The coordination machinery adds ~2300 lines of complexity that now has no purpose. Primary goal: clean deletion without regression.
87
+ - **Workflow authors using `sessionTools`:** currently rely on `create_session` returning a `dashboardUrl` and `open_dashboard` opening a browser tab. After PR-B, both behaviors change: URL becomes static (still useful), auto-open disappears (UX loss). These authors are the only people exposed to a visible behavior change.
88
+
89
+ **Secondary:**
90
+ - **Bot-service operators using MCP-over-HTTP transport:** must not be affected. `http-entry.ts` and `http-listener.ts` are explicitly preserved.
91
+ - **Future daemon sessions (including this one):** the MCP server runs the workflow engine that's being modified. A broken build during PR implementation kills the active session.
92
+
93
+ ### Jobs and Outcomes
94
+
95
+ | Stakeholder | Job | Desired outcome |
96
+ |---|---|---|
97
+ | Project owner | Delete dead coordination code | ~2300 lines gone, startup deterministic, no port contention |
98
+ | Workflow authors | Track session progress via dashboard | `dashboardUrl` still returned (static URL is sufficient for copy-paste); `open_dashboard` still responds (returns URL, no auto-open) |
99
+ | Bot-service operators | Connect AI agents via HTTP transport | MCP-over-HTTP transport unchanged |
100
+ | Users who set `WORKRAIL_DASHBOARD_PORT` | Configure dashboard port | Get a clear deprecation warning rather than silent no-op |
101
+
102
+ ### Pains and Tensions
103
+
104
+ **T1 (resolved by design): Port contention between multiple Claude windows**
105
+ The bridge was the solution. With standalone console as the UI owner, contention is no longer the MCP server's problem. Removing the bridge removes the non-deterministic 150ms startup probe. Resolved.
106
+
107
+ **T2 (active): `sessionTools` behavioral contract change**
108
+ `create_session` returns `dashboardUrl: null` today when HttpServer fails to start (already a known degraded state). PR-B changes it to always return a static URL -- strictly better than null. `open_dashboard` loses auto-browser-open (the `open` npm package call). This is the only real UX regression. Severity: LOW. `open_dashboard` is not called by any bundled workflow (verified: `grep -rn "open_dashboard" workflows/` returns nothing).
109
+
110
+ **T3 (active): `dashboard-template-workflow.json` hardcodes `http://localhost:3456`**
111
+ Already hardcodes the URL in its prompt text. PR-B's static URL behavior matches exactly what this workflow already tells users. No regression here -- this is implicit confirmation that static URL is the right behavior.
112
+
113
+ **T4 (active): `sessionTools` feature flag description is stale after PR-B**
114
+ Currently says "and HTTP dashboard server". After PR-B, `sessionTools` enables only the session store. The flag description must be updated. Low-impact but visible in the MCP tool listing.
115
+
116
+ **T5 (active): `worktrain-spawn.ts` and `worktrain-await.ts` dual lock-file fallback**
117
+ Both files check `daemon-console.lock` first, then `dashboard.lock`. After PR-B, `dashboard.lock` is never written. The second fallback becomes permanently dead code. Not a regression (graceful null return), but a cleanliness debt. Deferred to follow-up chore PR.
118
+
119
+ ### Success Criteria
120
+
121
+ 1. `npm run build` passes with zero errors after both PRs
122
+ 2. `npx vitest run` passes (pre-existing perf flakiness excluded) after both PRs
123
+ 3. No import of `bridge-entry`, `primary-tombstone`, or `bridge-events` anywhere in `src/` or `tests/` after PR-A
124
+ 4. No `ToolContext.httpServer` field after PR-B
125
+ 5. `sessionTools` flag still enables `workrail_create_session`, `workrail_update_session`, `workrail_read_session`
126
+ 6. `create_session` returns a non-null `dashboardUrl` (static URL) after PR-B
127
+ 7. `open_dashboard` returns a URL (not null, not an error) after PR-B
128
+ 8. MCP-over-HTTP transport tests (`mcp-http-transport.test.ts`, `http-listener.test.ts`) pass after both PRs
129
+ 9. `triggers.yml` is NOT committed in either PR
130
+ 10. `workrail cleanup` removal is documented in release notes
131
+
132
+ ### Assumptions being promoted to facts (risks)
133
+
134
+ **A1 (LOW RISK):** `open_dashboard` is not called by any production workflow. Evidence: `grep -rn "open_dashboard" workflows/` returns no matches. Confirmed.
135
+
136
+ **A2 (MEDIUM RISK):** No operator has private tooling that calls `/api/v2/sessions` on the MCP server's dashboard port (3456). Evidence: `worktrain-spawn.ts` already prefers `daemon-console.lock`; no bundled workflow calls the endpoint; it's undocumented. This assumption could be wrong for private deployments, but no evidence suggests it is.
137
+
138
+ **A3 (LOW RISK):** Static `http://localhost:3456` URL in `handleCreateSession` is acceptable for typical use. Evidence: `dashboard-template-workflow.json` already hardcodes this exact URL in prompt text, confirming it's the de-facto standard. Users who run `worktrain console --port 4000` will get a dead link, but the lock-file read in `handleOpenDashboard` mitigates this for `open_dashboard`.
139
+
140
+ **A4 (LOW RISK):** `daemon-console.lock` format is stable: `{ pid: number, port: number }` JSON. Confirmed in two independent implementations (`daemon-console.ts:13` and `standalone-console.ts:13`). Safe to read in `handleOpenDashboard`.
141
+
142
+ ### Framing risks (what could make this framing wrong)
143
+
144
+ **FR1:** The scope could be wider than two PRs. If `dashboard-template-workflow.json` is in production use by external operators and those operators depend on the `http://localhost:3456/dashboard.html` URL format, removing HttpServer might break them silently. Counter: the workflow is in `workflows/examples/` (not a core bundled workflow) and already hardcodes port 3456. The PR-B change preserves this exact URL. Not a real risk.
145
+
146
+ **FR2:** The framing assumes `sessionTools` usage is low-impact after PR-B. If a significant segment of users actively calls `open_dashboard` and relies on auto-browser-launch, this is a non-trivial UX regression. Counter: there is zero evidence of such usage -- no bundled workflow calls it, no documentation describes it as a core feature. Low-impact assumption holds.
147
+
148
+ **FR3:** The framing could be wrong if there's a third HTTP server that wasn't found. The codebase has two HTTP servers: MCP-over-HTTP (`http-listener.ts`) and dashboard HttpServer (`HttpServer.ts`). If a third exists (e.g., trigger system on port 3200), removing the dashboard HttpServer might cause confusion about which HTTP server serves what. Evidence: `src/trigger/daemon-console.ts` runs on port 3456, separate from both. Not a problem -- the trigger console is already decoupled (it IS the standalone console).
149
+
150
+ ### HMW questions (reframes)
151
+
152
+ **HMW 1:** "How might we make `handleOpenDashboard` return a guaranteed-live URL rather than a static guess?"
153
+ Answer: read `daemon-console.lock` (already planned in PR-B). This transforms the tool from "returns a hint" to "returns the actual running URL when worktrain console is up."
154
+
155
+ **HMW 2:** "How might we ensure that removing `workrail cleanup` doesn't leave any user with no way to clean up stale processes?"
156
+ Answer: the cleanup command killed processes on ports 3456-3499 (all HttpServer ports). After HttpServer removal, no WorkRail-owned processes run on those ports. The cleanup operation is semantically meaningless. Users with stale lock files from older versions: `~/.workrail/daemon-console.lock` has a pid field; `worktrain-spawn.ts` already validates the pid before using the port. Stale locks self-heal.
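The pid validation that lets stale locks self-heal can be sketched as follows. This is a minimal illustration, not the code in `worktrain-spawn.ts`: `isPidAlive` and `isLockStale` are hypothetical helper names, and the only detail taken from the plan is the `{ pid, port }` lock shape.

```typescript
// Sketch only: `isPidAlive` and `isLockStale` are hypothetical helpers, not WorkRail APIs.

/** Shape of daemon-console.lock as described in this plan. */
interface ConsoleLock {
  pid: number;
  port: number;
}

/** Returns true if a process with this pid currently exists. */
function isPidAlive(pid: number): boolean {
  // Reject 0 and negatives: process.kill would target a process group instead.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    // Signal 0 only performs the existence/permission check; nothing is delivered.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

/** A lock is stale when the process that wrote it is gone. */
function isLockStale(lock: ConsoleLock): boolean {
  return !isPidAlive(lock.pid);
}
```

A caller would ignore the `port` of a stale lock and fall back to its default, which is why no explicit cleanup step is needed.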
157
+
158
+ ---
159
+
160
+ ## Selected Direction: Candidate 2 (Two Sequential PRs)
161
+
162
+ The design candidates doc evaluated three options:
163
+
164
+ 1. **C1 (bridge out, HttpServer simplified):** Removes bridge but leaves HttpServer running. Does not satisfy backlog requirement. Rejected.
165
+ 2. **C2 (two sequential PRs):** PR-A removes bridge+tombstone; PR-B removes HttpServer entirely. Resolves all four tensions. **Selected.**
166
+ 3. **C3 (feature flag gated):** Invents removal-gate flag anti-pattern. No repo precedent. Rejected.
167
+
168
+ ---
169
+
170
+ ## PR-A: Bridge and Tombstone Removal
171
+
172
+ **Branch:** `feat/mcp-simplify-remove-bridge` (already exists; changes already in working tree)
173
+ **Commit type:** `chore(mcp)` -- pure deletion, no user-visible behavior change
174
+ **Status:** Implementation complete in working tree. Needs final diff review and commit.
175
+
176
+ ### Critical pre-commit checks
177
+
178
+ ```bash
179
+ # Check 1: verify http-entry.ts tombstone call site (Contradiction C1)
180
+ grep -n "writeTombstone\|clearTombstone\|primary-tombstone\|bridge-events" src/mcp/transports/http-entry.ts
181
+ # Expected: zero matches. If any match, remove them before committing.
182
+
183
+ # Check 2: triggers.yml must NOT be staged (protected file)
184
+ # Stage only the PR-A files explicitly -- never git add -A or git add .
185
+ ```
186
+
187
+ ### Files deleted in working tree
188
+
189
+ - `src/mcp/transports/bridge-entry.ts` (892 lines)
190
+ - `src/mcp/transports/primary-tombstone.ts` (140 lines)
191
+ - `src/mcp/transports/bridge-events.ts` (93 lines)
192
+ - `tests/unit/mcp/stdin-probe.test.ts`
193
+ - `tests/unit/mcp/transports/bridge-entry.test.ts` (638 lines)
194
+ - `tests/unit/mcp/transports/primary-tombstone.test.ts` (85 lines)
195
+
196
+ ### Files modified in working tree
197
+
198
+ - `src/mcp-server.ts` -- simplified to 37 lines
199
+ - `src/mcp/transports/stdio-entry.ts` -- tombstone call sites removed
200
+ - `src/mcp/transports/http-entry.ts` -- verify tombstone removed (C1)
201
+
202
+ ### Verification for PR-A
203
+
204
+ ```bash
205
+ npm run build # must pass
206
+ npx vitest run # must pass (perf flakiness pre-existing, not caused by these changes)
207
+ grep -rn "bridge-entry\|primary-tombstone\|bridge-events\|startBridgeServer\|detectHealthyPrimary\|waitForStdinReadable" src/ tests/
208
+ # Expected: zero matches
209
+ ```
210
+
211
+ ---
212
+
213
+ ## PR-B: HttpServer Removal from MCP Server
214
+
215
+ **Branch:** `feat/etienneb/stdio-simplification-pr-b` (new from post-PR-A main)
216
+ **Commit type:** `feat(mcp)` -- MCP tool contract change
217
+ **Expected net change:** ~1200 lines deleted, ~120 lines changed
218
+ **Prerequisites:** PR-A merged and CI green
219
+
220
+ ### `daemon-console.lock` format (confirmed)
221
+
222
+ File: `~/.workrail/daemon-console.lock`
223
+ Format: `{ "pid": number, "port": number }` (JSON)
224
+ Written by: `src/trigger/daemon-console.ts` and `src/console/standalone-console.ts`
225
+ Read by (currently): `worktrain-spawn.ts`, `worktrain-await.ts`
226
+ Safe to read in `handleOpenDashboard` with a `try/catch` that falls back to `DEFAULT_CONSOLE_PORT`.
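A minimal sketch of that read-with-fallback, assuming the lock path and JSON shape stated above. `readConsolePort` and `dashboardUrl` are hypothetical names chosen for illustration; the real `handleOpenDashboard` signature is not shown in this plan, and the port-range check is an added assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Mirrors the constant planned for src/infrastructure/console-defaults.ts.
const DEFAULT_CONSOLE_PORT = 3456;

// Lock location as documented in this plan.
const DEFAULT_LOCK_PATH = path.join(os.homedir(), ".workrail", "daemon-console.lock");

/**
 * Hypothetical helper: returns the port recorded in the lock file, or
 * DEFAULT_CONSOLE_PORT when the file is missing, unreadable, or malformed.
 */
function readConsolePort(lockPath: string = DEFAULT_LOCK_PATH): number {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(lockPath, "utf8"));
    const port = (parsed as { port?: unknown } | null)?.port;
    if (typeof port === "number" && Number.isInteger(port) && port > 0 && port <= 65535) {
      return port;
    }
  } catch {
    // Missing file or invalid JSON: fall through to the default.
  }
  return DEFAULT_CONSOLE_PORT;
}

/** Hypothetical helper: the URL a tool like open_dashboard could return. */
function dashboardUrl(lockPath?: string): string {
  return `http://localhost:${readConsolePort(lockPath)}`;
}
```

Treating every failure as "use the default" keeps the tool from ever returning an error, which matches success criterion 7.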
227
+
228
+ ### Files to delete
229
+
230
+ | File | Size | Why deleted |
231
+ |---|---|---|
232
+ | `src/infrastructure/session/HttpServer.ts` | ~1211 lines | Dashboard HTTP server removed from MCP |
233
+ | `src/infrastructure/session/DashboardHeartbeat.ts` | ~N lines | HttpServer dependency |
234
+ | `src/infrastructure/session/DashboardLockRelease.ts` | ~N lines | HttpServer dependency |
235
+ | `tests/integration/unified-dashboard.test.ts` | ~206 lines | Tests HttpServer dashboard behavior |
236
+ | `tests/unit/http-server-stop-idempotency.test.ts` | ~147 lines | Tests HttpServer lifecycle |
237
+ | `tests/integration/process-cleanup.test.ts` | ~N lines | Scan first; delete HttpServer-dependent portions only |
238
+
239
+ ### Files to update
240
+
241
+ **`src/mcp/types.ts`** (PRIMARY SEAM):
242
+ - Remove `readonly httpServer: HttpServer | null` field from `ToolContext`
243
+ - Remove `import type { HttpServer }` from the file
244
+
245
+ **`src/mcp/server.ts`:**
246
+ - Remove `httpServer = container.resolve(DI.Infra.HttpServer)` and sessionTools HttpServer gate
247
+ - Remove console routes mount block
248
+ - Remove `ctx.httpServer?.finalize()`
249
+ - Import `DEFAULT_CONSOLE_PORT` from `console-defaults.ts`
250
+
251
+ **`src/mcp/handlers/session.ts`:**
252
+ - Rewrite `requireSessionTools()`: check only `ctx.sessionManager !== null`
253
+ - Rewrite `handleCreateSession`: use `http://localhost:${DEFAULT_CONSOLE_PORT}?session=${input.sessionId}`
254
+ - Rewrite `handleOpenDashboard`: read `daemon-console.lock` (parse `{ port }`) with fallback to `DEFAULT_CONSOLE_PORT`; return `{ url }` with guidance; no browser auto-open
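The static URL for `handleCreateSession` could be built as sketched below. `buildSessionDashboardUrl` is a hypothetical name, and percent-encoding the session id is an assumption added here; the plan itself only gives the template string.

```typescript
// Mirrors the constant planned for src/infrastructure/console-defaults.ts.
const DEFAULT_CONSOLE_PORT = 3456;

/**
 * Hypothetical helper: builds the static dashboard URL for a session.
 * The session id is percent-encoded so that ids containing reserved
 * characters cannot break the query string (an assumption, not in the plan).
 */
function buildSessionDashboardUrl(
  sessionId: string,
  port: number = DEFAULT_CONSOLE_PORT,
): string {
  return `http://localhost:${port}?session=${encodeURIComponent(sessionId)}`;
}
```

Because the URL no longer depends on a running HttpServer, the function is pure and `dashboardUrl` can never be null.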
255
+
256
+ **`src/di/container.ts`:** remove `HttpServer` import, registration, and `startAsyncServices` block
257
+
258
+ **`src/di/tokens.ts`:** remove `HttpServer`, `DashboardMode`, `BrowserBehavior` symbols
259
+
260
+ **`src/config/app-config.ts`:** remove dashboard subtree, type exports, env vars; add startup warning for deprecated vars
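The startup warning might look like the sketch below. The two variable names come from the PR-B verification grep in this plan; whether that list is complete, the message wording, and the helper names `collectDeprecationWarnings` and `warnDeprecatedEnv` are all assumptions.

```typescript
// Names taken from the PR-B verification grep; completeness is an assumption.
const DEPRECATED_DASHBOARD_ENV_VARS = [
  "WORKRAIL_DASHBOARD_PORT",
  "WORKRAIL_DISABLE_UNIFIED_DASHBOARD",
] as const;

/** Hypothetical helper: one warning per deprecated variable that is still set. */
function collectDeprecationWarnings(
  env: Record<string, string | undefined>,
): string[] {
  return DEPRECATED_DASHBOARD_ENV_VARS
    .filter((name) => env[name] !== undefined)
    .map(
      (name) =>
        `[workrail] ${name} is deprecated and ignored: the MCP server no longer serves the dashboard.`,
    );
}

/** Hypothetical helper: emits the warnings once at startup. */
function warnDeprecatedEnv(
  env: Record<string, string | undefined> = process.env,
  log: (message: string) => void = (message) => console.error(message),
): void {
  // stderr by default: on the stdio transport, stdout carries protocol messages.
  for (const warning of collectDeprecationWarnings(env)) log(warning);
}
```

Keeping the collection step pure makes the warning easy to unit-test without touching the real environment.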
261
+
262
+ **`src/config/feature-flags.ts`:** update `sessionTools` description to remove "HTTP dashboard server"
263
+
264
+ **`src/cli.ts`:** remove `cleanup` command entirely
265
+
266
+ **`src/mcp/transports/stdio-entry.ts` and `http-entry.ts`:** remove `ctx.httpServer?.stop()` from shutdown hooks
267
+
268
+ **`src/runtime/adapters/node-process-signals.ts`:** update comment
269
+
270
+ **`src/infrastructure/session/index.ts`:** remove HttpServer, DashboardHeartbeat, DashboardLockRelease exports
271
+
272
+ ### New file
273
+
274
+ **`src/infrastructure/console-defaults.ts`** (~5 lines):
275
+ ```typescript
276
+ /** Default port for the worktrain console UI. */
277
+ export const DEFAULT_CONSOLE_PORT = 3456;
278
+ ```
279
+
280
+ ### Verification for PR-B
281
+
282
+ ```bash
283
+ npm run build
284
+ npx vitest run
285
+ grep -rn "HttpServer\|DashboardHeartbeat\|DashboardLockRelease\|DI\.Infra\.HttpServer" src/ tests/
286
+ grep -rn "WORKRAIL_DASHBOARD_PORT\|WORKRAIL_DISABLE_UNIFIED_DASHBOARD" src/
287
+ grep -rn "httpServer" src/mcp/types.ts
288
+ # All expected: zero matches
289
+ ```
290
+
291
+ ---
292
+
293
+ ## Scope Not In This Work
294
+
295
+ | Item | Decision |
296
+ |---|---|
297
+ | `worktrain-spawn.ts` / `worktrain-await.ts` dead `dashboard.lock` fallback | Separate `chore` PR after PR-B ships |
298
+ | MCP-over-HTTP transport (`http-entry.ts`, `http-listener.ts`) | Explicitly out of scope; must not be touched |
299
+ | `triggers.yml` modification in working tree | Protected file; must NOT be committed in either PR |
300
+ | `dashboard-template-workflow.json` URL update | Not needed; workflow already hardcodes `http://localhost:3456` |
301
+
302
+ ---
303
+
304
+ ## Decisions and Rationale Log
305
+
306
+ | Decision | Rationale |
307
+ |---|---|
308
+ | Two PRs | PR-A is independently reviewable, low-risk, high-value. PR-B benefits from PR-A being validated. |
309
+ | `ToolContext.httpServer` as primary seam | Compiler-enforced blast radius. |
310
+ | Lock-file read in `handleOpenDashboard` | `daemon-console.lock` format confirmed: `{ pid, port }`. Gives live URL when console is running. |
311
+ | Static URL in `handleCreateSession` | `dashboard-template-workflow.json` already hardcodes port 3456 in prompt text; static URL matches existing behavior exactly. |
312
+ | `workrail cleanup` removed entirely | Semantically wrong after HttpServer removal. Release note sufficient. |
313
+ | Deprecated env vars: startup warning | Silently ignoring confuses users. One-line warning. |
314
+ | `DEFAULT_CONSOLE_PORT = 3456` | Follows `DEFAULT_MCP_PORT = 3100` pattern. |
315
+ | PR-B commit type: `feat(mcp)` | MCP tool contract change. Semantic-release must create release entry. |
316
+ | MCP-over-HTTP transport kept | Separate infra for bot services. |
317
+ | Delegation not used | Unavailable. Solo work is sufficient for design-only session. |
@@ -0,0 +1,288 @@
1
+ # Findings: Structured Output + Tool Calls Coexistence
2
+
3
+ **Date:** 2026-04-18
4
+ **Test file:** `tests/integration/structured-output-tools-coexist.test.ts`
5
+ **SDK:** `@anthropic-ai/sdk@0.73.0`, `@anthropic-ai/bedrock-sdk@0.28.1`
6
+
7
+ ---
8
+
9
+ ## Summary
10
+
11
+ **Tools and structured output (JSON schema) CAN coexist** in a single API request on both
12
+ Anthropic direct and Amazon Bedrock. However, the feature is:
13
+ - Beta-only (`client.beta.messages.create()`, not `client.messages.create()`)
14
+ - Schema enforcement applies only at `end_turn` -- `tool_use` turns produce no text response
15
+
16
+ The system prompt fallback (strong JSON constraint without `output_config`) is INCONSISTENT on
17
+ direct Anthropic (2/3 valid) and CONSISTENT on Bedrock (3/3 valid). With only three calls per
18
+ provider this is weak evidence, but it suggests Bedrock's `claude-sonnet-4-6` follows the
19
+ instruction more reliably than the direct API's `claude-sonnet-4-5`.
20
+
21
+ ---
22
+
23
+ ## Exact API Params That Work
24
+
25
+ ### Anthropic direct (`client.beta.messages.create()`)
26
+
27
+ ```typescript
28
+ import Anthropic from '@anthropic-ai/sdk';
29
+ const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
30
+
31
+ const response = await client.beta.messages.create({
32
+ model: 'claude-sonnet-4-5-20250929',
33
+ max_tokens: 1024,
34
+ system: 'You are a workflow executor. Respond ONLY with a JSON object...',
35
+ messages: [{ role: 'user', content: '...' }],
36
+ tools: [{ name: 'bash_tool', description: '...', input_schema: { ... } }],
37
+ output_config: {
38
+ format: {
39
+ type: 'json_schema',
40
+ schema: {
41
+ type: 'object',
42
+ properties: {
43
+ step_complete: { type: 'boolean' },
44
+ notes: { type: 'string' },
45
+ },
46
+ required: ['step_complete', 'notes'],
47
+ additionalProperties: false,
48
+ },
49
+ },
50
+ },
51
+ });
52
+ // stop_reason: 'end_turn', content: [{ type: 'text', text: '{"step_complete": true, "notes": "..."}' }]
53
+ ```
54
+
55
+ **Result:** API accepted the call. `stop_reason: 'end_turn'`. Response text was valid JSON
56
+ matching the declared schema. No beta header string required (SDK sends `?beta=true` query
57
+ param automatically via `client.beta.messages.create()`).
58
+
59
+ ### Amazon Bedrock (`client.beta.messages.create()` via AnthropicBedrock)
60
+
61
+ ```typescript
62
+ import { AnthropicBedrock } from '@anthropic-ai/bedrock-sdk';
63
+ const client = new AnthropicBedrock();
64
+
65
+ const response = await client.beta.messages.create({
66
+ model: 'us.anthropic.claude-sonnet-4-6',
67
+ max_tokens: 1024,
68
+ system: '...',
69
+ messages: [{ role: 'user', content: '...' }],
70
+ tools: [...],
71
+ output_config: {
72
+ format: { type: 'json_schema', schema: { ... } },
73
+ },
74
+ });
75
+ // stop_reason: 'end_turn', content: [{ type: 'text', text: '{"step_complete": true, ...}' }]
76
+ ```
77
+
78
+ **Result:** API accepted the call. `stop_reason: 'end_turn'`. Response text was valid JSON.
79
+ AnthropicBedrock exposes `.beta.messages.create()` identically to the direct client.
80
+
81
+ ---
82
+
83
+ ## SDK Type Evidence
84
+
85
+ ### `output_config` type (in `@anthropic-ai/sdk@0.73.0`)
86
+
87
+ ```typescript
88
+ // node_modules/@anthropic-ai/sdk/resources/beta/messages/messages.d.ts
89
+
90
+ interface BetaOutputConfig {
91
+ effort?: 'low' | 'medium' | 'high' | 'max' | null;
92
+ format?: BetaJSONOutputFormat | null;
93
+ }
94
+
95
+ interface BetaJSONOutputFormat {
96
+ schema: { [key: string]: unknown };
97
+ type: 'json_schema';
98
+ }
99
+
100
+ // Available on BetaMessageCreateParamsNonStreaming:
101
+ // output_config?: BetaOutputConfig;
102
+ ```
103
+
104
+ ### `AnthropicBedrock.beta` type (in `@anthropic-ai/bedrock-sdk@0.28.1`)
105
+
106
+ ```typescript
107
+ // node_modules/@anthropic-ai/bedrock-sdk/client.d.ts, line 84
108
+
109
+ type BetaResource = Omit<Resources.Beta, 'promptCaching' | 'messages'> & {
110
+ messages: Omit<Resources.Beta['messages'], 'batches' | 'countTokens'>;
111
+ };
112
+ // AnthropicBedrock.beta: BetaResource -- .beta.messages.create() is available
113
+ ```
114
+
115
+ ### Key: beta endpoint routing
116
+
117
+ The SDK calls `/v1/messages?beta=true` (not `/v1/messages`) when using
118
+ `client.beta.messages.create()`. No explicit `betas` array is needed for `output_config` --
119
+ the `?beta=true` query param is sufficient. The `betas` array only adds feature-specific
120
+ `anthropic-beta` headers (e.g. `prompt-caching-2024-07-31`).
121
+
122
+ ---
123
+
124
+ ## Raw Test Results
125
+
126
+ ### Case 1: Baseline -- tools only, no output_config (direct Anthropic)
127
+
128
+ ```json
129
+ {
130
+ "stop_reason": "tool_use",
131
+ "content": [
132
+ { "type": "text", "text": "I'll analyze the task..." },
133
+ { "type": "tool_use", "name": "bash_tool", "input": { "command": "echo ..." } }
134
+ ]
135
+ }
136
+ ```
137
+
138
+ The model called the bash_tool when given tools and a generic task. This confirms that without
139
+ `output_config`, the model freely uses tools (as it does today in agent-loop.ts).
140
+
141
+ ### Case 2: tools + output_config (direct Anthropic)
142
+
143
+ ```json
144
+ {
145
+ "stop_reason": "end_turn",
146
+ "content": [
147
+ {
148
+ "type": "text",
149
+ "text": "{\"step_complete\": true, \"notes\": \"Analyzed the task 'write a test'...\"}"
150
+ }
151
+ ]
152
+ }
153
+ ```
154
+
155
+ The model chose NOT to call the tool and instead produced a valid JSON end_turn response.
156
+ The system prompt instructed it not to call tools -- the output_config enforced the JSON shape.
157
+
158
+ ### Case 3: System prompt constraint, 3-call consistency (direct Anthropic)
159
+
160
+ - Call 1: VALID JSON (plain JSON object)
161
+ - Call 2: INVALID (wrapped in ```json ... ``` markdown code block)
162
+ - Call 3: VALID JSON (plain JSON object)
163
+
164
+ **2/3 consistent.** The system prompt alone is NOT reliable on claude-sonnet-4-5. The model
165
+ sometimes wraps JSON in markdown fences, breaking JSON.parse.
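To make the Call 2 failure concrete, the sketch below shows a parser that tolerates a single surrounding markdown fence. It is an illustration of the failure mode, not something the test file does, and it is no substitute for `output_config`; `parseStepResult` is a hypothetical name.

```typescript
// Three backticks, built at runtime so this example can itself sit in a fenced block.
const FENCE = "`".repeat(3);

// Matches one optional fence (with or without a "json" tag) around the whole text.
const FENCED_BODY = new RegExp(`^${FENCE}(?:json)?\\s*([\\s\\S]*?)\\s*${FENCE}$`);

/**
 * Hypothetical helper: parses model text as JSON, unwrapping a single
 * surrounding markdown fence if present. Throws if the result is not JSON.
 */
function parseStepResult(text: string): unknown {
  const trimmed = text.trim();
  const match = FENCED_BODY.exec(trimmed);
  return JSON.parse(match?.[1] ?? trimmed);
}
```

A plain `JSON.parse` on the fenced text throws a `SyntaxError`, which is the breakage recorded for Call 2.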
166
+
167
+ ### Case 4: tools + output_config (Bedrock, claude-sonnet-4-6)
168
+
169
+ ```json
170
+ {
171
+ "stop_reason": "end_turn",
172
+ "content": [
173
+ {
174
+ "type": "text",
175
+ "text": "{\"step_complete\": true, \"notes\": \"Analyzed the task: 'write a test'...\"}"
176
+ }
177
+ ]
178
+ }
179
+ ```
180
+
181
+ Identical behavior to direct Anthropic. The beta endpoint works on Bedrock.
182
+
183
+ ### Case 5: System prompt constraint, 3-call consistency (Bedrock, claude-sonnet-4-6)
184
+
185
+ - Call 1: VALID JSON
186
+ - Call 2: VALID JSON
187
+ - Call 3: VALID JSON
188
+
189
+ **3/3 consistent.** claude-sonnet-4-6 on Bedrock produced clean JSON in all three calls when instructed.
190
+
191
+ ---
192
+
193
+ ## Provider Comparison Table
194
+
195
+ | Feature | Anthropic direct (claude-sonnet-4-5) | Bedrock (claude-sonnet-4-6) | OpenAI gpt-4o* |
196
+ |---|---|---|---|
197
+ | `output_config` + tools in ONE request | YES (beta API) | YES (beta API) | YES (`response_format`) |
198
+ | Beta API path required | YES (`client.beta.messages.create()`) | YES (`client.beta.messages.create()`) | NO (stable API) |
199
+ | Schema enforced at end_turn | YES (valid JSON observed) | YES (valid JSON observed) | YES |
200
+ | Schema applied to tool_use turns | N/A (no text on tool_use) | N/A (no text on tool_use) | N/A |
201
+ | System prompt fallback consistency | 2/3 (unreliable) | 3/3 (reliable) | N/A |
202
+ | `betas` header required | NO | NO | N/A |
203
+ | SDK type: `output_config` | `BetaMessageCreateParamsNonStreaming` | Same (via bedrock-sdk) | `ChatCompletionCreateParams` |
204
+
205
+ *OpenAI column: taken from official documentation, not a live test. The OpenAI SDK is not installed in this repo.
206
+
207
+ ---
208
+
209
+ ## Key Behavioral Observations
210
+
211
+ ### What happens when both tools and output_config are sent?
212
+
213
+ The model CHOOSES at each turn whether to call a tool or produce an end_turn response. The
214
+ `output_config` schema only applies to end_turn text responses -- it has no effect on
215
+ `tool_use` turns (which produce no text).
216
+
217
+ This means:
218
+ - If the model decides to call a tool: `stop_reason: 'tool_use'`, no JSON text, schema not enforced
219
+ - If the model decides to respond directly: `stop_reason: 'end_turn'`, JSON text, schema enforced
220
+
221
+ The system prompt heavily influences whether the model calls tools or not. With a strong
222
+ "do NOT call tools" instruction, the model consistently chose end_turn + JSON output.
223
+
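The two branches above can be sketched as a small classifier. This assumes a simplified response shape that models only the fields discussed here (real responses carry more fields and other `stop_reason` values); `classifyTurn` and the `Turn` type are hypothetical names for illustration.

```typescript
// Simplified response shape: only the fields discussed above are modelled.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ModelResponse {
  stop_reason: "end_turn" | "tool_use";
  content: ContentBlock[];
}

interface ToolCall { id: string; name: string; input: unknown }

// Hypothetical result type: either tools to run, or the final JSON payload.
type Turn =
  | { kind: "tool_calls"; calls: ToolCall[] }
  | { kind: "final"; json: unknown };

function classifyTurn(res: ModelResponse): Turn {
  if (res.stop_reason === "tool_use") {
    // Tool turn: collect the requested calls; the schema plays no role here.
    const calls: ToolCall[] = [];
    for (const block of res.content) {
      if (block.type === "tool_use") {
        calls.push({ id: block.id, name: block.name, input: block.input });
      }
    }
    return { kind: "tool_calls", calls };
  }
  // end_turn: concatenate the text blocks and parse them as JSON.
  let text = "";
  for (const block of res.content) {
    if (block.type === "text") text += block.text;
  }
  return { kind: "final", json: JSON.parse(text) };
}
```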
224
+ ### Schema does not FORCE end_turn
225
+
226
+ The `output_config` does not prevent tool calls. It only shapes the text content when the
227
+ model DOES produce text. An architecture using `output_config` for workflow control (`complete_step`)
228
+ would still need to account for the model potentially calling external tools before end_turn.
229
+
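A loop that tolerates any number of tool turns before the final JSON might look like the sketch below. It is synchronous and uses stand-in callbacks so it can run without an SDK; `runStep`, `Send`, and `RunTool` are hypothetical names, and the turn cap is an assumption added to keep a model that never reaches end_turn from looping forever.

```typescript
// Reduced reply shape for illustration: one tool per tool_use turn.
type Reply =
  | { stop_reason: "tool_use"; tool: string }
  | { stop_reason: "end_turn"; text: string };

// Hypothetical stand-ins for the model client and the tool executor.
type Send = (history: string[]) => Reply;
type RunTool = (tool: string) => string;

function runStep(send: Send, runTool: RunTool, maxTurns = 8): unknown {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = send(history);
    if (reply.stop_reason === "end_turn") {
      // Workflow control happens only here, on the schema-shaped text.
      return JSON.parse(reply.text);
    }
    // Tool turn: execute, record the result, and ask the model again.
    history.push(`${reply.tool} -> ${runTool(reply.tool)}`);
  }
  throw new Error(`no end_turn after ${maxTurns} turns`);
}
```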
230
+ ---
231
+
232
+ ## Recommendation for WorkRail
233
+
234
+ ### Option A: Adopt output_config + tools dual architecture
235
+
236
+ **Architecture:**
237
+ - Use `client.beta.messages.create()` instead of `client.messages.create()` in `agent-loop.ts`
238
+ - Add `output_config.format` declaring a `{ step_complete: boolean, notes: string, ... }` schema
239
+ - Keep external tools (Bash, Read, Write) in the `tools` array
240
+ - Remove `complete_step` as a tool; instead, detect `stop_reason: 'end_turn'` + parse JSON text
241
+
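Even with schema enforcement, a cheap runtime check on the parsed end_turn payload keeps the workflow runner from trusting an unexpected shape. The sketch below covers only the two fields named above; the remaining fields are elided in the source and are left out here. `StepResult` and `isStepResult` are hypothetical names.

```typescript
// Only the two fields named in the schema sketch above; others are elided there.
interface StepResult {
  step_complete: boolean;
  notes: string;
}

// Hypothetical type guard: narrows parsed JSON to StepResult before use.
function isStepResult(value: unknown): value is StepResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["step_complete"] === "boolean" && typeof v["notes"] === "string";
}
```

A caller would typically run `JSON.parse` on the end_turn text, pass the result through `isStepResult`, and treat a failed check as a retry or an error rather than advancing the step.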
242
+ **Feasibility:** YES for both direct Anthropic and Bedrock.
243
+
244
+ **Risk:**
245
+ - Beta API -- may change or be deprecated
246
+ - `AgentClientInterface` would need updating (currently typed to standard `messages.create()`)
247
+ - The model might still call tools before end_turn; the agent loop needs to handle this correctly
248
+ - On tool_use turns, the JSON schema is irrelevant -- workflow control happens at end_turn only
249
+
250
+ **Gain:**
251
+ - Structured output is MORE reliable than tool calls for workflow control (schema-enforced)
252
+ - Separates external effects (tools) from control flow (end_turn JSON) cleanly
253
+ - Eliminates the `complete_step` tool hallucination risk
254
+
255
+ ### Option B: Stay with pure tool calls (current architecture)
256
+
257
+ **Current architecture:** `complete_step` is a tool. The agent calls it to advance workflow steps.
258
+
259
+ **Keep if:**
260
+ - Beta API risk is unacceptable
261
+ - The added complexity of dual architecture is not worth the reliability gain
262
+
263
+ ### Option C: System prompt fallback (Bedrock only, no beta API)
264
+
265
+ **Architecture:** Strong system prompt JSON constraint, no `output_config`. Parse end_turn text.
266
+
267
+ **Viability:** 3/3 consistent on claude-sonnet-4-6 (Bedrock). NOT reliable on claude-sonnet-4-5
268
+ (direct Anthropic, 2/3 consistent).
269
+
270
+ **Recommendation:** Do NOT use for direct Anthropic. Acceptable for Bedrock-only deployment
271
+ if beta API risk is unacceptable. But this is fragile -- model updates may break consistency.
272
+
273
+ ---
274
+
275
+ ## Decision
276
+
277
+ **Recommended: Option A (output_config + tools dual architecture on beta API).**
278
+
279
+ The coexistence is confirmed on both providers. The beta endpoint is stable enough (it is the
280
+ same endpoint used by tools like web_search, code execution, etc. in production).
281
+
282
+ The schema enforcement on end_turn is reliable. The system prompt should still instruct the
283
+ model about when to call tools vs. when to respond directly, but the JSON schema provides a
284
+ safety net that pure system-prompt approaches lack.
285
+
286
+ **Primary action:** Update `AgentClientInterface` to expose `beta.messages.create()` and add
287
+ `output_config` to the `AgentLoop` options. Remove `complete_step` as a tool; replace with
288
+ end_turn JSON parsing in `workflow-runner.ts`.