@exaudeus/workrail 3.35.1 → 3.37.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/dist/config/config-file.js +2 -0
  2. package/dist/console-ui/assets/{index-D7jQyCSD.js → index-o-p__sHJ.js} +1 -1
  3. package/dist/console-ui/index.html +1 -1
  4. package/dist/daemon/workflow-runner.d.ts +5 -0
  5. package/dist/daemon/workflow-runner.js +131 -1
  6. package/dist/manifest.json +39 -31
  7. package/dist/mcp/handlers/v2-advance-events.js +1 -1
  8. package/dist/mcp/handlers/v2-execution/start.d.ts +1 -0
  9. package/dist/mcp/handlers/v2-execution/start.js +3 -2
  10. package/dist/trigger/notification-service.d.ts +42 -0
  11. package/dist/trigger/notification-service.js +164 -0
  12. package/dist/trigger/trigger-listener.js +7 -1
  13. package/dist/trigger/trigger-router.d.ts +3 -1
  14. package/dist/trigger/trigger-router.js +4 -1
  15. package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +64 -32
  16. package/dist/v2/durable-core/schemas/session/events.d.ts +20 -10
  17. package/dist/v2/durable-core/schemas/session/events.js +1 -1
  18. package/dist/v2/durable-core/schemas/session/gaps.d.ts +8 -8
  19. package/dist/v2/durable-core/schemas/session/gaps.js +1 -1
  20. package/docs/design/agent-behavior-patterns-discovery.md +312 -0
  21. package/docs/design/agent-engine-communication-discovery.md +390 -0
  22. package/docs/design/agent-loop-architecture-alternatives-discovery.md +531 -0
  23. package/docs/design/agent-loop-error-handling-contract.md +238 -0
  24. package/docs/design/complete-step-approach-validation-discovery.md +344 -0
  25. package/docs/design/daemon-stuck-detection-discovery.md +174 -0
  26. package/docs/design/mcp-server-disconnect-discovery.md +245 -0
  27. package/docs/design/mcp-server-epipe-crash.md +198 -0
  28. package/docs/design/notification-design-candidates.md +131 -0
  29. package/docs/design/notification-design-review.md +84 -0
  30. package/docs/design/notification-implementation-plan.md +181 -0
  31. package/docs/design/spawn-agent-failure-modes.md +161 -0
  32. package/docs/design/spawn-agent-result-handling-implementation-plan.md +186 -0
  33. package/docs/design/stdio-simplification-design-candidates.md +341 -0
  34. package/docs/design/stdio-simplification-design-review.md +93 -0
  35. package/docs/design/stdio-simplification-implementation-plan.md +317 -0
  36. package/docs/design/structured-output-tools-coexist-findings.md +288 -0
  37. package/docs/discovery/coordinator-script-design.md +745 -0
  38. package/docs/discovery/coordinator-ux-discovery.md +471 -0
  39. package/docs/discovery/spawn-agent-failure-modes.md +309 -0
  40. package/docs/discovery/workflow-selection-for-discovery-tasks.md +336 -0
  41. package/docs/discovery/worktrain-status-briefing.md +325 -0
  42. package/docs/discovery/worktrain-status-design-candidates.md +202 -0
  43. package/docs/discovery/worktrain-status-design-review-findings.md +86 -0
  44. package/docs/ideas/backlog.md +688 -1
  45. package/docs/ideas/daemon-structured-output-vs-tool-calls.md +344 -0
  46. package/docs/ideas/design-candidates-backlog-consolidation.md +85 -0
  47. package/docs/ideas/design-candidates-spawn-agent-task.md +178 -0
  48. package/docs/ideas/design-review-findings-backlog-consolidation.md +39 -0
  49. package/docs/ideas/design-review-findings-spawn-agent-task.md +139 -0
  50. package/docs/ideas/implementation_plan_backlog_consolidation.md +117 -0
  51. package/docs/ideas/implementation_plan_spawn_agent.md +217 -0
  52. package/docs/plans/authoring-doc-staleness-enforcement-candidates.md +251 -0
  53. package/docs/plans/authoring-doc-staleness-enforcement-review.md +99 -0
  54. package/docs/plans/authoring-doc-staleness-enforcement.md +463 -0
  55. package/package.json +1 -1
@@ -0,0 +1,344 @@
1
+ # Daemon: Structured Output vs Tool Calls for Workflow-Control Communication
2
+
3
+ **Status:** Discovery in progress (2026-04-18)
4
+ **Author:** Discovery workflow
5
+ **Scope:** WorkRail autonomous daemon (`src/daemon/`) only. Does not affect the MCP server or human-driven Claude Code sessions.
6
+
7
+ ---
8
+
9
+ ## What this doc is for
10
+
11
+ This document is a **human-readable artifact** for reviewing the structured output vs tool call tradeoff in the WorkRail daemon. It is NOT execution truth -- execution truth lives in WorkRail session notes and context variables.
12
+
13
+ ---
14
+
15
+ ## Context / Ask
16
+
17
+ The WorkRail daemon (`workflow-runner.ts`) drives autonomous workflow sessions via an `AgentLoop` that uses the Anthropic Messages API. The agent currently communicates with the workflow engine entirely through tool calls:
18
+
19
+ - `continue_workflow` -- advance/rehydrate the workflow engine (in-process via `executeContinueWorkflow`)
20
+ - `Bash`, `Read`, `Write` -- interact with the filesystem/shell
21
+ - `report_issue` -- observability signal to the daemon
22
+
23
+ The `continue_workflow` pattern was inherited from the MCP protocol, where tool calls are the only communication mechanism. But the daemon **owns the agent loop directly** and is not constrained by MCP. It could use `response_format: { type: 'json_schema', json_schema: {...} }` to force structured JSON output instead of tool calls for workflow-control operations.
24
+
25
+ **The question:** Should `continue_workflow` (and possibly `report_issue`) be replaced with structured output, keeping only the world-interaction tools (Bash, Read, Write) as actual tool calls?
26
+
27
+ ---
28
+
29
+ ## Path Recommendation
30
+
31
+ `landscape_first` -- both options are named, the codebase is readable, the dominant need is side-by-side comparison.
32
+
33
+ ---
34
+
35
+ ## Constraints / Anti-Goals
36
+
37
+ **Constraints:**
38
+ - Must not break the MCP server tool-call path (MCP clients still call `continue_workflow` as a tool)
39
+ - Bedrock compatibility required (default daemon client is `AnthropicBedrock`)
40
+ - No new Anthropic API features that aren't available on Bedrock
41
+
42
+ **Anti-goals:**
43
+ - Don't redesign MCP server schema
44
+ - Don't change how human-driven Claude Code sessions work
45
+ - Don't require unavailable API features
46
+
47
+ ---
48
+
49
+ ## Landscape Packet
50
+
51
+ ### API Capability Survey (verified 2026-04-18)
52
+
53
+ **Anthropic SDK version installed:** `@anthropic-ai/sdk` (GA), `@anthropic-ai/bedrock-sdk` 0.28.1
54
+
55
+ **GA messages API (`client.messages.create`):**
56
+ - `output_config.format` (type `JSONOutputFormat` with `type: 'json_schema'`) is in the GA `MessageCreateParamsNonStreaming` type (line 1059 of messages.d.ts)
57
+ - `tools` and `output_config.format` are **separate fields** on the same params object -- there is NO documented incompatibility in the TypeScript types
58
+ - The beta `structured-outputs-2025-12-15` header is only required for the older `beta.messages` API; the GA API uses `output_config` directly
59
+ - Bedrock SDK (`@anthropic-ai/bedrock-sdk`) imports `Resources` from `@anthropic-ai/sdk/resources/index` -- it inherits the same `output_config` type
60
+
61
+ **Key discovery: tools + output_config CAN coexist in the GA API**
62
+ The `MessageCreateParamsNonStreaming` type has both `tools?: Array<ToolUnion>` and `output_config?: OutputConfig` as independent optional fields. The earlier assumption that "you can't mix response_format with tool calls" applies only to the older beta API (which has a different incompatibility model). The GA types permit both on the same request; runtime behavior, especially on Bedrock, is still unverified (see Evidence Gaps).
63
+
64
+ This eliminates the main technical blocker for the hybrid approach.
65
+
66
+ ### Token Overhead Analysis
67
+
68
+ | Approach | Schema overhead per request | Notes |
69
+ |----------|--------------------------|-------|
70
+ | Current (5 tools) | ~853 tokens | All 5 schemas injected on every `messages.create()` call |
71
+ | Hybrid (3 world tools + SO schema) | ~628 tokens | Bash + Read + Write tools + output_config schema |
72
+ | Enriched tool schema (Option D) | ~853 tokens | Same as current, adds fields to existing schema |
73
+ | Pure structured output | ~358 tokens | output_config schema only, no tools |
74
+
75
+ The savings from the hybrid approach (~225 tokens/request) are modest. At 50 turns/session, that's ~11,000 tokens saved -- meaningful but not decisive.
76
+
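The savings estimate can be reproduced from the table. This is only an illustrative sketch: the numbers are the table's estimates, while `sessionSavings` and the one-request-per-turn assumption are hypothetical and not taken from the daemon code.

```typescript
// Per-request schema overhead in tokens, copied from the table above (estimates).
const overhead = {
  current: 853,
  hybrid: 628,
  pureStructuredOutput: 358,
} as const;

// Tokens saved over a session, assuming one messages.create() request per turn.
function sessionSavings(baseline: number, candidate: number, turns: number): number {
  return (baseline - candidate) * turns;
}

const hybridPerRequest = overhead.current - overhead.hybrid; // 225
const hybridPerSession = sessionSavings(overhead.current, overhead.hybrid, 50); // 11250
const purePerSession = sessionSavings(overhead.current, overhead.pureStructuredOutput, 50); // 24750
```

The exact 50-turn figure for the hybrid is 11,250 tokens, which the text rounds to ~11,000.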
77
+ ### Tool Call Pattern Analysis
78
+
79
+ From reading `workflow-runner.ts` and `agent-loop.ts`, the actual session pattern is:
80
+
81
+ ```
+ Turn 1: Bash (read files) → Bash (more investigation) → continue_workflow(advance)
+ Turn 2: Bash (edit code) → Bash (run tests) → continue_workflow(advance)
+ ...
+ Turn N: continue_workflow(advance) → done
+ ```
87
+
88
+ Key observations:
89
+ 1. **`continue_workflow` always appears at the END of a turn.** The LLM never interleaves `continue_workflow` with Bash calls. This follows from workflow design (the step work comes before advancing); it is a behavioral convention, not a mechanically enforced rule.
90
+ 2. **Multiple tool calls per turn are common.** The LLM calls Bash 3-5 times before `continue_workflow`.
91
+ 3. **Tool call ordering is significant.** `continue_workflow` must be last in a turn.
92
+ 4. The `steer()` mechanism injects the next step AFTER `continue_workflow` fires, which then causes a new LLM turn.
93
+
94
+ This pattern means: on the turn where `continue_workflow` fires, it is always the LAST tool call. There is no structural reason it needs to be a tool call (the daemon could inspect the response, see end_turn, and interpret the text as structured output). But the current mechanism is clean -- tool call = explicit signal, result = next step.
95
+
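Because the ordering is a convention rather than a guarantee, it can be checked against recorded sessions. The sketch below assumes a turn's tool calls are available as an ordered list of tool names; `continueWorkflowIsLast` is a hypothetical helper, not something in `agent-loop.ts`.

```typescript
// Returns true when `continue_workflow` is absent from the turn, or is called
// exactly once as the final tool call. Any earlier or repeated call fails.
function continueWorkflowIsLast(toolCalls: ReadonlyArray<string>): boolean {
  const first = toolCalls.indexOf('continue_workflow');
  // If the first occurrence is also the last element, there is exactly one call.
  return first === -1 || first === toolCalls.length - 1;
}

// Example turns in the shape described above.
const typicalTurn = ['Bash', 'Bash', 'continue_workflow'];
const violatingTurn = ['Bash', 'continue_workflow', 'Bash'];
```

Running a check like this over logged `tool_called` events would show how often the convention actually holds.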
96
+ ### Precedents in the Codebase
97
+
98
+ - The "scripts over agent" principle (backlog.md) applies: deterministic operations should be scripts, not LLM decisions. Structured output is more scripty -- it removes the LLM's ability to "call wrong tools" and forces it to produce what the daemon expects.
99
+ - The blocked response complexity (retry tokens, validation issues) is currently text-in-tool_result. With structured output, `next_action: "blocked"` with a structured `blockers` array would be cleaner.
100
+ - Bedrock SDK 0.28.1 is current (released 2026-04-08) and inherits GA API types including `output_config`.
101
+
102
+ ### Evidence Gaps
103
+
104
+ 1. **Runtime behavior of tools + output_config combo on Bedrock:** TypeScript types allow it; runtime behavior is unverified. Could use WebFetch to check Bedrock docs or test with a real call.
105
+ 2. **Whether output_config.format constrains tool call behavior:** The docs say structured outputs "guarantee" the format -- but what happens on the turn where the LLM also calls tools? Does it produce tool_use blocks OR a json_schema text block? This is the key behavior question.
106
+ 3. **Token counting accuracy:** The overhead figures above assume ~4 chars/token, which is a rough estimate. Actual overhead depends on the specific tokenizer.
107
+
108
+ ### Contradictions Found
109
+
110
+ 1. **Earlier claim (pre-reading) vs. actual SDK:** I initially assumed "response_format + tools = incompatible." The GA SDK shows `output_config` and `tools` are separate fields. This assumption was wrong and needs correction.
111
+ 2. **The "blocked" path complexity:** The current `continue_workflow` tool returns complex text (retry tokens, validation issues, assessment followups) in the tool_result text. This is exactly the kind of structured data that JSON schema would handle better -- a contradiction with the tool-call-is-simpler narrative.
112
+
113
+ ---
114
+
115
+ ## Problem Frame Packet
116
+
117
+ ### Primary Users / Stakeholders
118
+
119
+ - **Daemon operator (Etienne):** Runs autonomous workflows. Cares about session reliability, debuggability, token cost, and whether the LLM reliably submits well-structured notes and artifacts on each step.
120
+ - **Workflow authors:** Define workflow steps. Benefit if the agent submits structured artifacts (commit type, PR title, files changed) without requiring notes-parsing heuristics.
121
+ - **MCP clients (Claude Code, other MCP integrations):** Unaffected -- the MCP tool-call path is unchanged. This decision is daemon-internal only.
122
+ - **WorkRail engine (`executeContinueWorkflow`):** Receives advance calls. Currently gets `notesMarkdown` as a string; would benefit from typed `artifacts` and `context_updates` as first-class fields.
123
+
124
+ ### Core Tension
125
+
126
+ **Tool calls give the LLM agency; structured output constrains it.**
127
+
128
+ With tool calls, the LLM decides WHEN to call `continue_workflow` and WHAT to include. It can call it early, late, skip it, or call it multiple times. The daemon has to trust the LLM to follow the protocol.
129
+
130
+ With structured output, the LLM MUST emit the schema on end_turn. The daemon reads it deterministically. The LLM cannot deviate from the schema.
131
+
132
+ But: tool calls are the natural idiom for "LLM doing work then reporting completion." Structured output is the natural idiom for "LLM producing a typed result." The question is which idiom fits the daemon's use case better.
133
+
134
+ **The sub-tension:** The `blocked` response path needs structured data (retry tokens, blockers, validation issues) to flow back to the LLM cleanly. Currently this is text-in-tool_result that the agent has to parse. Structured output (or an enriched tool schema) solves this directly.
135
+
136
+ ### Jobs / Outcomes
137
+
138
+ 1. **Reliable step completion signaling:** Daemon needs to know when the LLM has finished step work. Currently: LLM calls `continue_workflow`. With structured output (SO): LLM emits end_turn with JSON output.
139
+ 2. **Structured artifact submission:** Workflow steps can require typed handoff artifacts (commit type, PR title). Currently parsed from notesMarkdown text -- fragile. With SO or enriched schema: first-class typed fields.
140
+ 3. **Debuggability:** When a session goes wrong, can you tell why? Tool calls leave a clear log (`tool_called` events). Structured output on end_turn is also inspectable. Both are fine.
141
+ 4. **Blocked response handling:** LLM needs to understand what to fix when `continue_workflow` returns blocked. Currently: complex text in tool_result. Better: structured blockers array.
142
+
143
+ ### Success Criteria
144
+
145
+ 1. The LLM reliably submits step notes and advances the workflow -- no more, no less.
146
+ 2. Artifact fields (commit type, PR title, files changed) are accessible as typed data in `lastStepNotes` without text parsing.
147
+ 3. Blocked response path is clearer -- the LLM knows exactly what to fix and where to find the retry token.
148
+ 4. No Bedrock API regression.
149
+ 5. Implementation complexity is proportional to the benefit.
150
+
151
+ ### Assumptions (that could be wrong)
152
+
153
+ 1. **"tools + output_config coexist at runtime"** -- verified in SDK types, unverified at Bedrock runtime. If AWS Bedrock doesn't support `output_config.format`, hybrid and pure-SO options are blocked on Bedrock.
154
+ 2. **"the LLM always calls continue_workflow last"** -- this is behavioral, not enforced. A hallucinating LLM could call Bash after continue_workflow. The steer() mechanism handles this, but it's an assumption that the protocol holds.
155
+ 3. **"structured output improves reliability"** -- it guarantees schema shape, not semantic correctness. The LLM could still submit empty notes, wrong artifacts, or meaningless step_notes. The gain is format enforcement, not content enforcement.
156
+
157
+ ### Reframes / HMW Questions
158
+
159
+ 1. **HMW:** "How might we make the artifact submission path typed without changing the turn structure at all?" Answer: Option D -- enrich `ContinueWorkflowParams` with an `artifacts` array. Zero API change, same tool call pattern.
160
+
161
+ 2. **HMW:** "How might we separate 'workflow-control' from 'world-interaction' in a way that makes the distinction architecturally clean?" Answer: This is the real insight in the structured output proposal. Bash/Read/Write are I/O. `continue_workflow` is a protocol signal. Mixing them in the same tool list conflates two different communication channels.
162
+
163
+ 3. **Reframe:** The question is NOT "tool calls vs structured output" -- it's "should the workflow-advance signal be a tool call or a protocol signal?" Tool calls are the right mechanism for I/O with side effects. Workflow advance is not I/O -- it's a turn-ending protocol handshake. The framing conflates mechanism with purpose.
164
+
165
+ ### Framing Risks
166
+
167
+ 1. **Over-engineering risk:** The "structured output is cleaner" argument is aesthetically appealing. But the actual pain is "artifacts are hard to extract from notesMarkdown text." Option D fixes that pain with one schema field addition. If that's the only real pain, the bigger architectural change isn't justified.
168
+ 2. **Runtime compatibility risk:** If Bedrock doesn't support `output_config.format` (unverified), the hybrid/pure-SO options are blocked on the default daemon client. The fix would require switching to direct Anthropic API (no Bedrock) or waiting for AWS to support it.
169
+ 3. **LLM behavior risk:** Structured output guarantees schema shape but not quality. An LLM that writes bad notes as text will write bad notes as JSON. The reliability improvement is narrower than it sounds.
170
+
171
+ ---
172
+
173
+ ## Candidate Directions
174
+
175
+ ### Option A: Status Quo (do nothing)
176
+
177
+ **Summary:** Keep the current tool-call pattern exactly as-is. Accept that artifact extraction requires notesMarkdown parsing and blocked responses are text.
178
+
179
+ **Tensions resolved:** None. Baseline for comparison.
180
+ **Tensions accepted:** Fragile artifact extraction, messy blocked response text.
181
+ **Boundary:** No change.
182
+ **Failure mode:** Delivery layer (`delivery-action.ts`) continues to parse `lastStepNotes` as text -- brittle to note format changes.
183
+ **Repo pattern:** Follows existing pattern exactly.
184
+ **Gain:** Zero change cost, zero risk.
185
+ **Give up:** Typed artifact delivery, cleaner blocked response handling.
186
+ **Impact surface:** None.
187
+ **Scope:** Best-fit as baseline only, not as a real candidate for improvement.
188
+ **Philosophy:** Honors YAGNI. Conflicts with "make illegal states unrepresentable" (notesMarkdown parsing is a stringly-typed boundary).
189
+
190
+ ---
191
+
192
+ ### Option D: Enriched Tool Schema (recommended -- simplest sufficient change)
193
+
194
+ **Summary:** Add `artifacts?: Array<{kind: 'git_commit'|'pull_request'|'test_run'|'file_set', [key: string]: unknown}>` and `blockerResolution?: {retryToken: string, issuesResolved: string[]}` to `ContinueWorkflowParams`. Remove the blocked response text encoding from tool_result; return a structured `BlockedResponse` type instead.
195
+
196
+ **Concrete shape:**
197
+ ```typescript
+ // ContinueWorkflowParams additions:
+ artifacts?: ReadonlyArray<{
+   kind: 'git_commit' | 'pull_request' | 'test_run' | 'file_set';
+   [key: string]: unknown;
+ }>;
+ // BlockedResponse tool_result (replace current text encoding):
+ // { kind: 'blocked', blockers: [{message, suggestedFix?}], retryToken: string, validation?: {issues: string[], suggestions: string[]} }
+ // Returned as JSON in the tool_result content block
+ ```
207
+ The LLM calls `continue_workflow({continueToken, notesMarkdown, artifacts: [{kind: 'git_commit', type: 'feat', subject: '...'}]})`. The daemon reads `params.artifacts` directly. No notesMarkdown parsing for delivery.
208
+
209
+ The blocked response text is replaced with JSON in the tool_result content, using the existing text content block but with `JSON.stringify(blockedResponse)` -- the LLM already parses JSON from tool results.
210
+
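A minimal sketch of both sides of Option D: typed artifacts on the input and a JSON-encoded blocked response on the output. The field shapes are transcribed from the concrete shape above; `toToolResultText` and `artifactsOfKind` are hypothetical helper names, not functions in `workflow-runner.ts` or `delivery-action.ts`.

```typescript
// Shapes transcribed from the "Concrete shape" block above.
type ArtifactKind = 'git_commit' | 'pull_request' | 'test_run' | 'file_set';
type Artifact = { kind: ArtifactKind; [key: string]: unknown };

interface BlockedResponse {
  kind: 'blocked';
  blockers: Array<{ message: string; suggestedFix?: string }>;
  retryToken: string;
  validation?: { issues: string[]; suggestions: string[] };
}

// Output boundary (hypothetical helper): wrap the blocked response as JSON
// inside a plain text content block, as the paragraph above describes.
function toToolResultText(blocked: BlockedResponse): { type: 'text'; text: string } {
  return { type: 'text', text: JSON.stringify(blocked) };
}

// Input boundary (hypothetical helper): read artifacts of one kind and
// tolerate a missing `artifacts` field, since the schema keeps it optional.
function artifactsOfKind(
  params: { artifacts?: ReadonlyArray<Artifact> },
  kind: ArtifactKind,
): ReadonlyArray<Artifact> {
  return (params.artifacts ?? []).filter((a) => a.kind === kind);
}
```

Because `artifacts` is optional, a delivery layer built this way falls back to an empty list rather than failing when the LLM omits the field.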
211
+ **Tensions resolved:**
212
+ - Typed artifact submission: solved (typed `artifacts` field)
213
+ - Blocked response clarity: partially solved (structured JSON in tool_result, but still in a text block)
214
+ **Tensions accepted:**
215
+ - The tool_result structured response is still a text block -- the JSON is implicit, not schema-enforced
216
+ - `artifacts.kind` enum is a new contract that workflow authors must honor
217
+
218
+ **Boundary:** `makeContinueWorkflowTool` in `workflow-runner.ts`. The `ContinueWorkflowParams` schema is the input boundary; the blocked response format is the output boundary.
219
+ **Why this boundary:** The tool execute() function is the one place where both the input (params) and output (tool_result content) live. No changes to `agent-loop.ts` or `AgentClientInterface`.
220
+
221
+ **Failure mode:** The LLM may not pass `artifacts` if the system prompt doesn't explicitly instruct it to. The schema makes it optional (backward compatible). The delivery layer must handle missing artifacts gracefully.
222
+
223
+ **Repo pattern:** Follows existing pattern (tool call, schema, execute()). Adapts the blocked response from text to JSON in the same text content block.
224
+
225
+ **Gain:** Typed artifact delivery without any new API dependencies or Bedrock risk. Solves the most concrete day-to-day pain.
226
+ **Give up:** Still a tool call (no schema enforcement at the LLM boundary); still text-wrapped JSON for blocked responses.
227
+
228
+ **Impact surface:**
229
+ - `workflow-runner.ts`: `makeContinueWorkflowTool` schema + execute()
230
+ - `src/trigger/delivery-action.ts`: reads `params.artifacts` instead of parsing `lastStepNotes`
231
+ - System prompt: add artifacts instruction
232
+ - Zero changes to `agent-loop.ts`, MCP server, or Bedrock client
233
+
234
+ **Scope:** Best-fit. Minimal change to the real seam, solves the stated pains.
235
+ **Philosophy:** Honors "explicit domain types over primitives" (typed artifacts vs string parsing), "YAGNI" (no new dependencies), "validate at boundaries" (typed input schema). Minor conflict with "make illegal states unrepresentable" -- artifacts.kind is closed but the rest of the object is open (`[key: string]: unknown`).
236
+
237
+ ---
238
+
239
+ ### Option C: Two-Phase Hybrid (tools for I/O, structured output for end-of-step)
240
+
241
+ **Summary:** Keep Bash/Read/Write as tool calls. Remove `continue_workflow` from the tool list entirely. On every `end_turn` response (when the LLM stops calling tools), the daemon reads a structured JSON object from the final text block using `output_config.format: { type: 'json_schema', schema: StepCompletionSchema }`. The daemon calls `executeContinueWorkflow` directly based on that JSON.
242
+
243
+ **Concrete shape:**
244
+ ```typescript
+ // output_config schema injected into messages.create():
+ const StepCompletionSchema = {
+   type: 'object',
+   properties: {
+     step_notes: { type: 'string' },
+     next_action: { type: 'string', enum: ['advance', 'rehydrate', 'done'] },
+     artifacts: { type: 'array', items: { type: 'object' } },
+     context_updates: { type: 'object' },
+     issues: { type: 'array', items: { type: 'object', properties: {
+       kind: { type: 'string', enum: ['tool_failure', 'blocked', 'unexpected_behavior', 'needs_human', 'self_correction'] },
+       severity: { type: 'string', enum: ['info', 'warn', 'error', 'fatal'] },
+       summary: { type: 'string' }
+     }}}
+   },
+   required: ['step_notes', 'next_action']
+ };
+ // messages.create() call adds: output_config: { format: { type: 'json_schema', schema: StepCompletionSchema } }
+ ```
263
+
264
+ The daemon's `agent-loop.ts` needs a new code path: when `stop_reason === 'end_turn'`, parse the text content block as JSON (the structured output). The `workflow-runner.ts` `_runLoop` handles this in its `agent_end` subscriber.
265
+
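The new `end_turn` code path might look roughly like the sketch below. It assumes a simplified response shape with `stop_reason` and `content` blocks; `LoopResponse` and `parseStepCompletion` are hypothetical names, and the real `agent-loop.ts` types may differ. The defensive checks are kept on purpose, since a parse can still fail even with schema enforcement.

```typescript
// Simplified response shape; only the fields this sketch reads (assumption).
type ContentBlock = { type: 'text'; text: string } | { type: 'tool_use'; name: string };
interface LoopResponse { stop_reason: string; content: ReadonlyArray<ContentBlock> }

type NextAction = 'advance' | 'rehydrate' | 'done';
interface StepCompletion { step_notes: string; next_action: NextAction; [key: string]: unknown }

const NEXT_ACTIONS: ReadonlyArray<string> = ['advance', 'rehydrate', 'done'];

// Returns the parsed completion on end_turn, or null when the turn has not
// ended, there is no text block, the text is not JSON, or required fields fail.
function parseStepCompletion(response: LoopResponse): StepCompletion | null {
  if (response.stop_reason !== 'end_turn') return null;
  const textBlocks = response.content.filter(
    (b): b is { type: 'text'; text: string } => b.type === 'text',
  );
  const last = textBlocks[textBlocks.length - 1];
  if (!last) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(last.text);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  const stepNotes = obj['step_notes'];
  const nextAction = obj['next_action'];
  if (typeof stepNotes !== 'string') return null;
  if (typeof nextAction !== 'string' || NEXT_ACTIONS.indexOf(nextAction) === -1) return null;
  return { ...obj, step_notes: stepNotes, next_action: nextAction as NextAction };
}
```

A `null` result gives the daemon one place to decide whether to retry, re-prompt, or fail the session.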
266
+ The steer() mechanism must be restructured: instead of steers happening inside the current prompt() call, the daemon calls `agent.prompt()` again with the next step after parsing the end_turn JSON.
267
+
268
+ **Tensions resolved:**
269
+ - Typed artifact submission: solved at the API level (schema-enforced)
270
+ - Blocked response clarity: daemon sends the next step text as a new user message, not as a tool_result -- the LLM sees the blocked feedback as a first-class message
271
+ - Schema enforcement: the LLM is constrained to emit a schema-conforming JSON object on end_turn (malformed output becomes unlikely, though not impossible; see Failure mode below)
272
+ **Tensions accepted:**
273
+ - Bedrock runtime uncertainty: `output_config.format + tools` coexist in GA TypeScript types but unverified on Bedrock at runtime
274
+ - Steer() restructuring: the current steer/turn_end/inject-next-step flow must change -- after end_turn, the daemon calls agent.prompt() for the next step, not agent.steer()
275
+ - Multi-step per prompt() call changes: currently one prompt() call runs an entire session. With this change, each step becomes its own prompt() call (or a new messages.create() call).
276
+
277
+ **Boundary:** `agent-loop.ts` + `workflow-runner.ts`. The `AgentClientInterface.messages.create()` params gain `output_config`. The loop's `end_turn` handling gains JSON parsing.
278
+
279
+ **Failure mode:**
280
+ 1. Bedrock doesn't support `output_config.format` at runtime -- session fails silently or with a cryptic API error
281
+ 2. The LLM produces a malformed JSON object (schema enforcement reduces but doesn't eliminate this)
282
+ 3. The steer() restructuring introduces a regression in the step-injection flow
283
+
284
+ **Repo pattern:** Departs from existing pattern. Requires changes to both `agent-loop.ts` (new end_turn handling) and `workflow-runner.ts` (new step-injection flow). The `AgentClientInterface` duck-type must be extended.
285
+
286
+ **Gain:** Schema-enforced step completion. Clean architectural separation between "world interaction" (tool calls) and "protocol communication" (structured output). Eliminates the tool_result complexity for blocked responses.
287
+ **Give up:** Implementation complexity, Bedrock risk, steer() refactor, non-trivial testing effort.
288
+
289
+ **Impact surface:**
290
+ - `agent-loop.ts`: new end_turn JSON parsing path, `AgentClientInterface` extended
291
+ - `workflow-runner.ts`: step-injection restructuring, `AgentLoopOptions.tools` changes
292
+ - `AgentClientInterface`: new `output_config` parameter
293
+ - All tests that exercise the agent loop
294
+ - Bedrock runtime compatibility (unverified)
295
+
296
+ **Scope:** Too broad for the stated pains. The implementation complexity exceeds the benefit if Option D solves the artifact/blocked-response problem.
297
+ **Philosophy:** Strongly honors "make illegal states unrepresentable" (schema enforcement), "validate at boundaries" (LLM output boundary). Conflicts with "YAGNI" -- architectural change for a benefit that D also provides incrementally. Conflicts with "architectural fixes over patches" only if D is a patch; D is arguably an architectural fix (typed domain types).
298
+
299
+ ---
300
+
301
+ ### Option B: Pure Structured Output (no tool calls except Bash/Read/Write)
302
+
303
+ **Summary:** Same as Option C but removes `report_issue` from the tool list too, folding it into the `issues` array in the structured output schema. The daemon listens only for `end_turn` with the StepCompletionSchema JSON -- no `continue_workflow` tool call at all.
304
+
305
+ **Distinctions from Option C:** This option commits fully to structured output as the ONLY workflow-control channel. `report_issue` becomes a field in the StepCompletionSchema, not a tool call. The LLM has fewer tools (Bash, Read, Write only) -- simpler tool schema.
306
+
307
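+ **Additional gain over C:** ~225 tokens/request saved versus the current 5-tool setup (no `continue_workflow` or `report_issue` schemas; this is the 3-world-tool row in the Token Overhead table). The increment over Option C alone, which keeps `report_issue`, is smaller and is not broken out in that table. Tool list is 3 items (Bash, Read, Write) -- cleaner, fewer hallucination targets.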
308
+ **Additional risk over C:** Same Bedrock risk. Larger departure from existing pattern. `report_issue` timing changes -- currently the LLM can call it mid-step; with pure SO, issues are batch-submitted at end_turn.
309
+
310
+ **Scope:** Too broad. Same fundamental Bedrock risk as C with additional issue-timing behavioral change.
311
+ **Philosophy:** Most aligned with "make illegal states unrepresentable." Highest conflict with "YAGNI."
312
+
313
+ ---
314
+
315
+ ## Challenge Notes
316
+
317
+ ### Candidate Generation Expectations (landscape_first path)
318
+
319
+ The candidate set must:
320
+ 1. **Reflect verified landscape constraints** -- specifically that `output_config.format + tools` CAN coexist in the GA API (verified from SDK types). Options must not assume incompatibility.
321
+ 2. **Cover the full spectrum from minimal to architectural** -- from schema enrichment only (Option D) to full structured output (Option B) to hybrid (Option C). No candidate should be omitted because it seems "too simple" or "too complex."
322
+ 3. **Treat Bedrock runtime uncertainty as a first-class dimension** -- each option must state its Bedrock risk explicitly. Options that require unverified Bedrock behavior must be flagged as "conditional on prototype validation."
323
+ 4. **Address the two concrete pains** -- typed artifact submission AND blocked response clarity. An option that solves neither is not a candidate.
324
+ 5. **Not drift into free invention** -- options must be grounded in what the codebase actually supports. No speculative dependencies.
325
+
326
+ ---
327
+
328
+ ## Resolution Notes
329
+
330
+ *(To be populated)*
331
+
332
+ ---
333
+
334
+ ## Decision Log
335
+
336
+ | Date | Decision | Rationale |
337
+ |------|----------|-----------|
338
+ | 2026-04-18 | Landscape-first path chosen | Both options are named; dominant need is comparison, not reframing |
339
+
340
+ ---
341
+
342
+ ## Final Summary
343
+
344
+ *(To be populated)*
@@ -0,0 +1,85 @@
1
+ # Design Candidates: Backlog Consolidation from docs/coordinator-and-scripts-spec
2
+
3
+ ## Problem Understanding
4
+
5
+ **What we're doing:** Five sections exist on `origin/docs/coordinator-and-scripts-spec` that never made it to main. They need to be inserted into `docs/ideas/backlog.md` at the correct chronological position.
6
+
7
+ **Missing sections (branch lines 1776-2088):**
8
+ 1. `### Scripts-first coordinator: avoid the main agent wherever possible (Apr 15, 2026)`
9
+ 2. `### Full development pipeline: coordinator scripts drive multi-phase autonomous work (Apr 15, 2026)`
10
+ 3. `### Additional coordinator pipeline templates (Apr 15, 2026)` -- includes Backlog grooming, Bug investigation, Incident monitoring coordinators
11
+ 4. `### Interactive ideation: WorkTrain as a thinking partner with full project context (Apr 15, 2026)`
12
+ 5. `### Automatic gap and improvement detection: proactive WorkTrain (Apr 15, 2026)`
13
+
14
+ **Core tensions:**
15
+ - **Insertion order:** Must place content chronologically (Apr 15, before Dynamic model selection which is also Apr 15) -- wrong order scrambles the narrative flow
16
+ - **Separator hygiene:** The `---` separator at main line 1781 already separates the preceding Verification section from Dynamic model selection -- inserting content before Dynamic model selection means the separator now separates the new content from Dynamic model selection, which is correct
17
+
18
+ **What makes it hard:** Nothing technically hard. The only risk is a doubled or missing `---` at the insertion boundary.
19
+
20
+ **Likely seam:** Main line 1782, immediately before `### Dynamic model selection: right model for the right task (Apr 15, 2026)`.
21
+
22
+ ---
23
+
24
+ ## Philosophy Constraints
25
+
26
+ From `/Users/etienneb/CLAUDE.md`:
27
+ - **NEVER push directly to main** -- create a feature branch, open a PR (no exceptions)
28
+ - Commit format: `docs(backlog): <subject>`, max 72 chars, no period
29
+ - **Surface information, don't hide it** -- flag the push-to-main conflict explicitly
30
+
31
+ No philosophy conflicts affect the content insertion itself.
32
+
33
+ ---
34
+
35
+ ## Impact Surface
36
+
37
+ - `docs/ideas/backlog.md` only
38
+ - No code, no tests, no consumers of this file in the build system
39
+ - Readers of the backlog will gain five new sections; no existing content changes
40
+
41
+ ---
42
+
43
+ ## Candidates
44
+
45
+ ### Candidate A: Verbatim extract and insert at chronological position (ONLY CANDIDATE)
46
+
47
+ **Summary:** Extract branch lines 1776-2088 verbatim and insert them immediately before `### Dynamic model selection` (main line 1782). The existing `---` at main line 1781 serves as the final separator -- no extra separator needed.
48
+
49
+ - **Tensions resolved:** Insertion order correct (Apr 15 chronological flow), separator hygiene maintained
50
+ - **Tensions accepted:** None
51
+ - **Boundary:** `docs/ideas/backlog.md`, single edit
52
+ - **Failure mode:** Doubled `---` if the branch content ends with one -- must verify (it does NOT end with `---`, so no risk)
53
+ - **Repo pattern:** Follows -- identical to how all recent backlog sections were added
54
+ - **Gains:** Complete, clean consolidation of all five missing sections
55
+ - **Losses:** Nothing
56
+ - **Scope:** Best-fit
57
+ - **Philosophy:** Honors 'surface information', 'document why not what'
58
+
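The insertion and its boundary checks can be sketched as a small helper. This is an illustration only, not the procedure the author ran: `insertBeforeHeading` and `SEPARATOR` are hypothetical names, and the sketch assumes the backlog is handled as an array of lines and that a "doubled separator" means two `---` lines with only blank lines between them.

```typescript
const SEPARATOR = "---";

/**
 * Inserts `block` immediately before the single line in `main` that starts
 * with `headingPrefix`, refusing to proceed if a boundary condition fails.
 */
function insertBeforeHeading(
  main: readonly string[],
  headingPrefix: string,
  block: readonly string[],
): string[] {
  // The insertion point must be unambiguous: exactly one matching heading.
  const matches = main
    .map((line, i) => (line.startsWith(headingPrefix) ? i : -1))
    .filter((i) => i >= 0);
  if (matches.length !== 1) {
    throw new Error(`expected one "${headingPrefix}" heading, found ${matches.length}`);
  }

  // Content duplication check: no heading in the block may already be on main.
  const duplicated = block.filter((l) => l.startsWith("### ") && main.includes(l));
  if (duplicated.length > 0) {
    throw new Error(`already present on main: ${duplicated.join(", ")}`);
  }

  const at = matches[0];
  const merged = [...main.slice(0, at), ...block, "", ...main.slice(at)];

  // Separator hygiene: reject two separators with only blank lines between them.
  const nonBlank = merged.filter((l) => l.trim() !== "");
  for (let i = 1; i < nonBlank.length; i++) {
    if (nonBlank[i] === SEPARATOR && nonBlank[i - 1] === SEPARATOR) {
      throw new Error("doubled separator at insertion boundary");
    }
  }
  return merged;
}
```

Throwing on every failed check keeps the edit all-or-nothing, so a wrong insertion point or a duplicated heading never produces a partially edited file.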
59
+ **Why no other candidates exist:** This is pure text insertion into a markdown file. The insertion point is unambiguous. There are no architectural tradeoffs. Manufacturing alternative candidates would be dishonest.
60
+
61
+ ---
62
+
63
+ ## Comparison and Recommendation
64
+
65
+ **Recommendation:** Candidate A.
66
+
67
+ All analysis converges. The task is content consolidation with a single correct insertion point. Execute verbatim extract + insert.
68
+
69
+ ---
70
+
71
+ ## Self-Critique
72
+
73
+ **Strongest counter-argument:** These sections might have been intentionally left off main -- rejected or superseded by later content.
74
+
75
+ **Evidence against that:** No commit on main removes or contradicts these sections. The branch simply became stale when main moved on. The content is high-quality and self-consistent with the surrounding backlog.
76
+
77
+ **Narrower option:** Insert only the two sections mentioned in the task spec (Full dev pipeline + Backlog grooming coordinator). This would leave Scripts-first coordinator (the conceptual foundation for the other sections), Interactive ideation, and Automatic gap detection on the stale branch. All five are from the same commit block on the branch, all absent from main -- no reason to leave any out.
78
+
79
+ **Invalidating assumption:** The recommendation fails if any of these headings already appears on main under a slightly different title. Already verified with grep -- none present.
80
+
81
+ ---
82
+
83
+ ## Open Questions for the Main Agent
84
+
85
+ 1. None -- execution path is clear. Insert branch lines 1776-2088 before main line 1782, create a feature branch, commit, open PR.
@@ -0,0 +1,178 @@
1
+ # Design Candidates: spawn_agent Task Implementation
2
+
3
+ > Full investigative material is in `design-candidates-spawn-agent.md`, `design-spawn-agent.md`,
4
+ > and `design-review-findings-spawn-agent.md`. This file summarizes for the current coding task.
5
+
6
+ ---
7
+
8
+ ## Problem Understanding
9
+
10
+ ### Core Tensions
11
+
12
+ **T1: Blocking vs. semaphore deadlock**
13
+ `TriggerRouter.dispatch()` is fire-and-forget (non-blocking by design) and uses a global `Semaphore`.
14
+ A parent holding a slot cannot wait for a child to acquire another slot -- deadlock.
15
+ Correct path: call `runWorkflow()` directly, bypassing the semaphore entirely.
16
+
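A minimal sketch of the T1 deadlock, assuming a one-slot counting semaphore. The `Semaphore` class here is a hypothetical stand-in, not the package's implementation, and `tryAcquire()` is used instead of a blocking acquire so the example terminates: where it returns `false`, a blocking acquire would wait forever.

```typescript
// Hypothetical counting semaphore; the real `Semaphore` may differ.
class Semaphore {
  private available: number;

  constructor(slots: number) {
    this.available = slots;
  }

  /** Takes a slot if one is free; returns false instead of waiting. */
  tryAcquire(): boolean {
    if (this.available === 0) return false;
    this.available -= 1;
    return true;
  }

  release(): void {
    this.available += 1;
  }
}

type Outcome = "ran" | "would-deadlock";

// Dispatch-style path: every run, including a child, needs its own slot.
function runViaSemaphore(sem: Semaphore, body: () => Outcome): Outcome {
  if (!sem.tryAcquire()) return "would-deadlock";
  try {
    return body();
  } finally {
    sem.release();
  }
}

// Direct path: the child runs inside the parent's slot and never
// touches the semaphore.
function runDirect(body: () => Outcome): Outcome {
  return body();
}

const sem = new Semaphore(1);

// Parent holds the only slot, child asks for another: no progress possible.
const viaDispatch = runViaSemaphore(sem, () => runViaSemaphore(sem, () => "ran"));

// Parent holds the only slot, child bypasses the semaphore: completes.
const viaDirect = runViaSemaphore(sem, () => runDirect(() => "ran"));

console.log(viaDispatch, viaDirect);
```

With more than one slot the same failure still appears once enough parents are waiting on children at the same time, which is why the bypass is preferred over raising the slot count.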
17
+ **T2: Typed schema extension vs. internalContext injection**
18
+ Adding `parentSessionId` to `session_created.data` is the typed, durable, query-friendly path.
19
+ Injecting via `internalContext` (context_set event) is the proven fast path.
20
+ Both are needed: `internalContext` for the `executeStartWorkflow()` call, AND schema extension for future DAG queries.
21
+
22
+ **T3: Deterministic childSessionId vs. code simplicity**
23
+ Pre-creating the child session (Candidate 2) gives a deterministic `childSessionId` before the run starts.
24
+ Direct `runWorkflow()` (Candidate 1) is simpler but cannot return `childSessionId` if the run crashes before the AgentLoop starts.
25
+
26
+ **T4: Depth propagation safety**
27
+ Using `context.spawnDepth` (generic map) is fragile -- any code that overwrites context silently breaks depth enforcement.
28
+ Using `WorkflowTrigger.spawnDepth` (typed `readonly` field) is compiler-enforced and cannot be accidentally lost.
29
+
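The T4 distinction can be sketched as follows. The interfaces are trimmed-down, hypothetical versions of `WorkflowTrigger` and its `agentConfig`; `planChildTrigger` and `SpawnDecision` are illustrative names, and treating `depth >= maxDepth` as the blocking condition is an assumption about how the limit is interpreted. The `?? 3` default follows the Phase 1 decision recorded in this document.

```typescript
// Hypothetical, trimmed-down shapes; the real WorkflowTrigger has more fields.
interface AgentConfig {
  readonly maxSubagentDepth?: number;
}

interface WorkflowTrigger {
  readonly workflowId: string;
  readonly context: Readonly<Record<string, unknown>>;
  readonly agentConfig?: AgentConfig;
  readonly spawnDepth?: number;
  readonly parentSessionId?: string;
}

type SpawnDecision =
  | { readonly kind: "allowed"; readonly child: WorkflowTrigger }
  | { readonly kind: "blocked"; readonly reason: string };

function planChildTrigger(
  parent: WorkflowTrigger,
  parentSessionId: string,
  childWorkflowId: string,
  childContext: Readonly<Record<string, unknown>>,
): SpawnDecision {
  // Depth comes from the typed field only, never from the generic context map.
  const depth = parent.spawnDepth ?? 0;
  const maxDepth = parent.agentConfig?.maxSubagentDepth ?? 3;
  if (depth >= maxDepth) {
    return { kind: "blocked", reason: `spawn depth ${depth} reached limit ${maxDepth}` };
  }
  return {
    kind: "allowed",
    child: {
      ...parent,
      workflowId: childWorkflowId,
      context: childContext,
      spawnDepth: depth + 1,
      parentSessionId,
    },
  };
}

// A context that happens to contain a `spawnDepth` key has no effect on enforcement.
const root: WorkflowTrigger = { workflowId: "root", context: {} };
const first = planChildTrigger(root, "sess-root", "child", { spawnDepth: 99 });
console.log(first.kind);
```

Because `spawnDepth` is `readonly`, code that tries to reassign it on an existing trigger fails to compile, whereas a key in `context` can be replaced by any caller that rebuilds the map.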
30
+ ### Likely Seam
31
+ `workflow-runner.ts` -- new `makeSpawnAgentTool()` factory alongside existing tool factories.
32
+ `events.ts` -- one-line additive schema extension for `session_created.data`.
33
+ `start.ts` -- thread `parentSessionId` through `buildInitialEvents()`.
34
+
35
+ ### What Makes It Hard
36
+ - The `runWorkflow()` call inside `execute()` requires capturing `ctx`, `apiKey`, `daemonRegistry?`, `emitter?` in the factory closure.
37
+ - `executeStartWorkflow()` returns `RA<StartWorkflowResult, StartWorkflowError>` -- must be unwrapped asynchronously.
38
+ - `_preAllocatedStartResponse` expects `startResult.value.response` (not the full `StartWorkflowResult`).
39
+ - A junior developer would likely call `dispatch()` instead of `runWorkflow()` and create a deadlock.
40
+ - `session_created.data` currently hardcodes `data: {}` in `buildInitialEvents()` -- must thread `parentSessionId` into that call.
41
+
42
+ ---
43
+
44
+ ## Philosophy Constraints
45
+
46
+ From `CLAUDE.md` and repo patterns:
47
+
48
+ - **Errors as data**: Return `{ outcome: 'error', notes: msg }` JSON, not thrown exceptions, for child failures.
49
+ - **Exhaustiveness**: Handle all 4 `WorkflowRunResult` variants without `as unknown` casts.
50
+ - **Immutability**: New `WorkflowTrigger` fields are `readonly`.
51
+ - **DI for boundaries**: `runWorkflowFn`, `ctx`, `apiKey`, `emitter` all injected at construction time.
52
+ - **YAGNI**: Phase 1 only. No `spawn_session + await_sessions`, no bare-prompt mode, no width guardrails.
53
+ - **Make illegal states unrepresentable**: `childSessionId` always present (pre-create guarantees it).
54
+
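The "errors as data" and "exhaustiveness" constraints combine naturally in one mapping function. The sketch below is hedged: the document says only that `WorkflowRunResult` has four variants, so the variant names and fields here are invented for illustration, as are `ToolOutcome`, `toToolOutcome`, and `assertNever`. Only the `{ outcome: 'error', notes: msg }` shape comes from the text.

```typescript
// Hypothetical variant names and fields; only the count of four is from the source.
type WorkflowRunResult =
  | { readonly kind: "success"; readonly notes: string }
  | { readonly kind: "error"; readonly message: string }
  | { readonly kind: "timeout"; readonly afterMs: number }
  | { readonly kind: "stuck"; readonly reason: string };

interface ToolOutcome {
  readonly outcome: "success" | "error";
  readonly notes: string;
}

// Compile-time guard: if a fifth variant is added and not handled,
// the call below stops type-checking.
function assertNever(value: never): never {
  throw new Error(`unhandled variant: ${JSON.stringify(value)}`);
}

// Child failures become plain data for the parent instead of exceptions.
function toToolOutcome(result: WorkflowRunResult): ToolOutcome {
  switch (result.kind) {
    case "success":
      return { outcome: "success", notes: result.notes };
    case "error":
      return { outcome: "error", notes: result.message };
    case "timeout":
      return { outcome: "error", notes: `child timed out after ${result.afterMs}ms` };
    case "stuck":
      return { outcome: "error", notes: `child stuck: ${result.reason}` };
    default:
      return assertNever(result);
  }
}

console.log(JSON.stringify(toToolOutcome({ kind: "error", message: "boom" })));
```

The `never`-typed default branch is what removes the need for `as unknown` casts: narrowing on `kind` gives each case its own fields.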
55
+ No philosophy conflicts between stated rules and repo patterns.
56
+
57
+ ---
58
+
59
+ ## Impact Surface
60
+
61
+ | File | Change | Risk |
62
+ |---|---|---|
63
+ | `src/daemon/workflow-runner.ts` | Add `parentSessionId?`, `spawnDepth?` to `WorkflowTrigger`; add `makeSpawnAgentTool()`; inject in `runWorkflow()`; update `BASE_SYSTEM_PROMPT`; update `_preAllocatedStartResponse` JSDoc | Low -- additive |
64
+ | `src/v2/durable-core/schemas/session/events.ts` | Extend `session_created.data` with `parentSessionId: z.string().optional()` | Low -- `z.object({})` uses strip mode |
65
+ | `src/mcp/handlers/v2-execution/start.ts` | Thread `parentSessionId` from `internalContext` into `session_created` event via `buildInitialEvents()` | Low -- internal API |
66
+ | `src/trigger/trigger-router.ts` | No change -- new `WorkflowTrigger` fields are optional | None |
67
+ | `src/v2/usecases/console-routes.ts` | No change -- new `WorkflowTrigger` fields are optional | None |
68
+
69
+ ---
70
+
71
+ ## Candidates
72
+
73
+ ### Candidate 1: Direct runWorkflow() call
74
+
75
+ **Summary**: `makeSpawnAgentTool()` calls `runWorkflow()` directly. No pre-creation. Session ID extracted from result after run.
76
+
77
+ **Tensions resolved**: YAGNI (fewest lines), blocking (natural await).
78
+ **Tensions accepted**: Crash-before-start has no observable `childSessionId`. `childSessionId` is absent on failure.
79
+
80
+ **Boundary**: `WorkflowTrigger` + direct `runWorkflow()` call.
81
+ **Why this boundary**: `WorkflowTrigger` is the natural seam -- carries all session config. No new types.
82
+
83
+ **Failure mode**: `runWorkflow()` crashes before AgentLoop starts -- `childSessionId` is null, parent gets `{ outcome: 'error', childSessionId: null }`.
84
+
85
+ **Repo-pattern relationship**: Follows factory pattern. No adaptation of `_preAllocatedStartResponse`.
86
+
87
+ **Gain**: ~10 fewer lines, maximum simplicity.
88
+ **Give up**: No deterministic `childSessionId` on startup failures. Less crash observability.
89
+
90
+ **Scope**: Best-fit.
91
+ **Philosophy fit**: Honors YAGNI most strongly. Slight tension with 'make illegal states unrepresentable' (`childSessionId` can be null).
92
+
93
+ ---
94
+
95
+ ### Candidate 2: Pre-create session with _preAllocatedStartResponse (RECOMMENDED)
96
+
97
+ **Summary**: `execute()` calls `executeStartWorkflow()` with `parentSessionId` in `internalContext`, decodes `childSessionId` from the returned `continueToken`, then calls `runWorkflow()` with `_preAllocatedStartResponse`.
98
+
99
+ **Tensions resolved**: Deterministic `childSessionId`, crash-before-start observability, `childSessionId` seeds Phase 2, 'make illegal states unrepresentable'.
100
+ **Tensions accepted**: One extra async call (~10-50ms).
101
+
102
+ **Boundary**: `WorkflowTrigger._preAllocatedStartResponse` + `internalContext` injection.
103
+ **Why this boundary**: Direct adaptation of the proven `_preAllocatedStartResponse` pattern from `console-routes.ts`. Session store sees the child immediately -- correct observable behavior.
104
+
105
+ **Failure mode**: `executeStartWorkflow()` succeeds, `runWorkflow()` fails before AgentLoop -- zombie session in store. Accepted for Phase 1.
106
+
107
+ **Repo-pattern relationship**: Adapts proven `_preAllocatedStartResponse` pattern.
108
+
109
+ **Gain**: `childSessionId` always known before child runs. Deterministic. Child observable from moment of `execute()`.
110
+ **Give up**: One extra async call. Slightly more setup code.
111
+
112
+ **Scope**: Best-fit.
113
+ **Philosophy fit**: Honors determinism over cleverness, make illegal states unrepresentable, DI. No conflicts.
114
+
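The Candidate 2 flow described above can be sketched with injected dependencies. Every type and signature below is a hypothetical stand-in: the real `executeStartWorkflow()` returns `RA<StartWorkflowResult, StartWorkflowError>` and the real `runWorkflow()` takes a full `WorkflowTrigger`, neither of which is reproduced here. `decodeSessionIdFn` is an assumed helper for extracting the session ID from the `continueToken`.

```typescript
// Hypothetical stand-ins for the real result and trigger types.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

interface StartResponse {
  readonly continueToken: string;
}

interface StartWorkflowResult {
  readonly response: StartResponse;
}

interface ChildTrigger {
  readonly workflowId: string;
  readonly parentSessionId: string;
  readonly _preAllocatedStartResponse: StartResponse;
}

interface SpawnDeps {
  readonly executeStartWorkflowFn: (
    workflowId: string,
    internalContext: { readonly parentSessionId: string },
  ) => Promise<Result<StartWorkflowResult, string>>;
  readonly decodeSessionIdFn: (continueToken: string) => string;
  readonly runWorkflowFn: (trigger: ChildTrigger) => Promise<Result<string, string>>;
}

type SpawnOutcome =
  // Pre-creation itself failed, so no child session exists.
  | { readonly outcome: "error"; readonly notes: string }
  // Child session exists; its ID is known whether or not the run succeeded.
  | {
      readonly outcome: "success" | "error";
      readonly childSessionId: string;
      readonly notes: string;
    };

async function spawnAgent(
  deps: SpawnDeps,
  parentSessionId: string,
  workflowId: string,
): Promise<SpawnOutcome> {
  // Step 1: pre-create the session, carrying parentSessionId in internalContext.
  const started = await deps.executeStartWorkflowFn(workflowId, { parentSessionId });
  if (!started.ok) return { outcome: "error", notes: started.error };

  // Step 2: the ID is known before the child runs. Pass `.response`,
  // not the full start result, as the pre-allocated response.
  const response = started.value.response;
  const childSessionId = deps.decodeSessionIdFn(response.continueToken);

  // Step 3: run directly (no dispatch, no semaphore) and report errors as data.
  const run = await deps.runWorkflowFn({
    workflowId,
    parentSessionId,
    _preAllocatedStartResponse: response,
  });
  return run.ok
    ? { outcome: "success", childSessionId, notes: run.value }
    : { outcome: "error", childSessionId, notes: run.error };
}
```

The second `SpawnOutcome` variant is where the zombie-session case lands: the run failed, but the pre-created session and its ID remain visible to the parent.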
115
+ ---
116
+
117
+ ### Candidate 3: Read depth from session store at execute() time
118
+
119
+ **Summary**: Instead of passing `currentDepth` as a constructor parameter, read `spawnDepth` from the parent session store inside `execute()`.
120
+
121
+ **Tensions resolved**: Accurate depth for checkpoint-resumed sessions (theoretical edge case).
122
+ **Tensions accepted**: Async I/O in `execute()`, more error paths, session store dependency.
123
+
124
+ **Boundary**: Session store read inside `execute()`.
125
+ **Why this boundary is NOT best-fit**: Expensive, speculative. Checkpoint-resumed daemon sessions restart AgentLoop from scratch, so the constructor parameter is always correctly set.
126
+
127
+ **Failure mode**: Store read fails -- fail-safe blocks spawn, adds error path complexity.
128
+
129
+ **Repo-pattern relationship**: Departs from constructor-injection pattern.
130
+
131
+ **Gain**: Accurate depth for resumed sessions. **Give up**: YAGNI violation, async I/O, extra error paths.
132
+
133
+ **Scope**: Too broad. **Philosophy fit**: Conflicts with YAGNI.
134
+
135
+ ---
136
+
137
+ ## Comparison and Recommendation
138
+
139
+ ### Comparison Matrix
140
+
141
+ | Tension | C1 | C2 | C3 |
142
+ |---|---|---|---|
143
+ | Blocking fidelity | Strong | Strong | Strong |
144
+ | Deterministic childSessionId | Weak | Strong | Weak |
145
+ | Semaphore bypass | Strong | Strong | Strong |
146
+ | YAGNI | Strong | Moderate | Weak |
147
+ | Crash observability | Weak | Strong | Weak |
148
+ | Depth accuracy | Adequate | Adequate | Strong (speculative) |
149
+ | Repo pattern | Follows | Adapts proven | Departs |
150
+ | Philosophy | Full | Full | Partial |
151
+
152
+ ### Recommendation: Candidate 2
153
+
154
+ C2 is best-fit. The `_preAllocatedStartResponse` pattern is proven and stable (`console-routes.ts`).
155
+ The marginal complexity (one extra async call) is small relative to the gain: `childSessionId` is always
156
+ known, crash-before-start is observable, Phase 2 is seeded. C3 is rejected on YAGNI grounds.
157
+
158
+ ---
159
+
160
+ ## Self-Critique
161
+
162
+ **Strongest counter-argument**: C2 adds a zombie session failure mode that C1 doesn't have. If `executeStartWorkflow()` succeeds but `runWorkflow()` fails immediately, a session exists in the store with no corresponding run. C1 avoids this -- no session is created until the run actually starts.
163
+
164
+ **C1 as narrower option**: Still satisfies acceptance criteria. Loses crash observability and deterministic `childSessionId`. Would win if we prioritized simplicity over observability.
165
+
166
+ **C3 as broader option**: Justified only if checkpoint-resumed spawned sessions become a real production use case. No evidence for Phase 1.
167
+
168
+ **Assumption that would invalidate C2**: C2 breaks if `_preAllocatedStartResponse` is removed in a future refactor. Mitigation: update its JSDoc (Orange finding O2) to list `spawn_agent` as a legitimate caller.
169
+
170
+ ---
171
+
172
+ ## Open Questions for the Main Agent
173
+
174
+ 1. **maxSubagentDepth source**: Design doc says read from `WorkflowTrigger.agentConfig` (default 3). Should this also check global workspace config? Decision: use `trigger.agentConfig?.maxSubagentDepth ?? 3` for Phase 1. Document in tool description.
175
+
176
+ 2. **`session_created.data` strictness**: Confirmed `z.object({})` uses strip mode. Extension is safe. Unverified by migration run -- low risk.
177
+
178
+ 3. **Zombie session cleanup**: Deferred to Phase 2. Document as known edge case in tool description.
@@ -0,0 +1,39 @@
1
+ # Design Review Findings: Backlog Consolidation from docs/coordinator-and-scripts-spec
2
+
3
+ ## Tradeoff Review
4
+
5
+ No accepted tradeoffs. The design is a deterministic text insertion with a fully determined execution path. All boundary conditions verified.
6
+
7
+ ## Failure Mode Review
8
+
9
+ | Failure Mode | Mitigation | Residual Risk |
10
+ |---|---|---|
11
+ | Doubled `---` separator at insertion boundary | Branch content starts with `###` heading (no leading `---`) and ends with prose (no trailing `---`). Main line 1781 is the only `---` at boundary. | None |
12
+ | Content duplication | Grep confirmed all five headings absent from main. | None |
13
+ | Wrong insertion point | Main line 1782 is `### Dynamic model selection (Apr 15, 2026)`; missing sections are also Apr 15, 2026 and immediately precede it on the branch. | None |
14
+ | Accidental content loss from edit | Will use narrow, unique `old_string` matching only the heading line. | None |
15
+
16
+ ## Runner-Up / Simpler Alternative Review
17
+
18
+ Runner-up (insert only two sections) has no elements worth borrowing. The five sections form a cohesive single-commit block; inserting all five is the correct unit of consolidation and is already minimal.
19
+
20
+ ## Philosophy Alignment
21
+
22
+ | Principle | Status |
23
+ |---|---|
24
+ | NEVER push directly to main | SATISFIED -- creating feature branch + PR despite user request to push directly |
25
+ | Surface information, don't hide it | SATISFIED -- push-to-main conflict explicitly flagged |
26
+ | Commit format `docs(backlog): <subject>` | SATISFIED -- will use `docs(backlog): consolidate missing coordinator specs from stale branch` |
27
+ | No em-dashes in written content | SATISFIED -- branch content already uses `--` throughout |
28
+
29
+ ## Findings
30
+
31
+ No RED, ORANGE, or YELLOW findings. The design is clean and risk-free.
32
+
33
+ ## Recommended Revisions
34
+
35
+ None. Proceed with selected approach as designed.
36
+
37
+ ## Residual Concerns
38
+
39
+ None.