@exaudeus/workrail 3.34.2 → 3.35.0

@@ -0,0 +1,166 @@
1
+ # Implementation Plan: daemon complete_step tool
2
+
3
+ ## Problem Statement
4
+
5
+ The daemon's `continue_workflow` tool requires the LLM to round-trip a `continueToken` (an HMAC-signed opaque token). The LLM frequently mangles this token, causing TOKEN_BAD_SIGNATURE errors that kill sessions. The fix: add a `complete_step` tool where the daemon injects the `continueToken` internally -- the LLM never sees it.
6
+
7
+ ## Acceptance Criteria
8
+
9
+ 1. `makeCompleteStepTool()` function exists in `src/daemon/workflow-runner.ts`, exported alongside `makeContinueWorkflowTool()`
10
+ 2. `complete_step` accepts: `notes: string` (required, min 50 chars), `artifacts?: unknown[]`, `context?: Record<string, unknown>`
11
+ 3. The tool injects the current session's `continueToken` internally before calling `executeContinueWorkflow` -- LLM never provides it
12
+ 4. `continueToken` is managed via a closure variable `currentContinueToken` in `runWorkflow()`, updated on advance and blocked-retry
13
+ 5. On successful advance: returns `{ status: 'advanced', nextStep: '<step title>' }` text
14
+ 6. On workflow complete: returns `{ status: 'complete' }` text
15
+ 7. On blocked: returns human-readable feedback saying 'call complete_step again' (not continue_workflow)
16
+ 8. Runtime validation: throws if `notes.length < 50`
17
+ 9. `complete_step` is in the daemon tools list in `runWorkflow()` alongside `continue_workflow`
18
+ 10. `continue_workflow` description is marked `[DEPRECATED in daemon sessions -- use complete_step]`
19
+ 11. `BASE_SYSTEM_PROMPT` lists `complete_step` as the primary advancement tool
20
+ 12. Initial prompt removes `continueToken: ${startContinueToken}` and says 'Call complete_step with your notes when done'
21
+ 13. Unit tests cover: happy-path advance, workflow complete, blocked response (retryable + non-retryable), notes too short, artifacts pass-through
22
+ 14. All existing tests pass
23
+
24
+ ## Non-Goals
25
+
26
+ - No schema changes to `V2ContinueWorkflowInputShape` or public MCP tools
27
+ - No new HTTP routes or public API changes
28
+ - `complete_step` is daemon-only -- NOT added to the MCP server's public tools list
29
+ - No migration of `makeContinueWorkflowTool` to use `TokenRef` pattern
30
+ - No removal of `continue_workflow` from the daemon tools list (backward compat)
31
+ - No changes to `rehydrate` flow
32
+
33
+ ## Philosophy-Driven Constraints
34
+
35
+ - **Immutability by default**: `currentContinueToken` is the only mutable shared state; confined to `runWorkflow()` closure
36
+ - **Validate at boundaries**: runtime `notes.length` check in `execute()` (JSON Schema is informational only)
37
+ - **Prefer fakes over mocks**: tests use fake `executeContinueWorkflow` injection
38
+ - **YAGNI**: zero new abstraction types
39
+ - **Document "why" not "what"**: WHY comments on all non-obvious decisions
40
+
41
+ ## Invariants
42
+
43
+ 1. `persistTokens(sessionId, newToken, ...)` is always called BEFORE `onAdvance()` fires (crash safety)
44
+ 2. `currentContinueToken` is updated to the advance token on `kind: 'ok'` responses
45
+ 3. `currentContinueToken` is updated to the retry token on `kind: 'blocked'` responses
46
+ 4. `continueToken` never appears in `complete_step` response text
47
+ 5. `intent: 'advance'` is hardcoded -- LLM cannot pass `rehydrate` via `complete_step`
48
+ 6. `complete_step` is NOT exposed in the MCP server's public tool registration
49
+
50
+ ## Selected Approach
51
+
52
+ **Candidate 1: Inline closure variable + two callback paths**
53
+
54
+ `runWorkflow()` holds `let currentContinueToken = startContinueToken`. The variable is updated:
55
+ - In `onAdvance(stepText, continueToken)` -- uses the `continueToken` param (already available, currently ignored)
56
+ - Via `onTokenUpdate: (t: string) => void` callback passed to `makeCompleteStepTool()` -- called on blocked retry
57
+
58
+ `makeCompleteStepTool()` signature:
59
+ ```typescript
60
+ export function makeCompleteStepTool(
61
+ sessionId: string,
62
+ ctx: V2ToolContext,
63
+ getCurrentToken: () => string,
64
+ onAdvance: (nextStepText: string, continueToken: string) => void,
65
+ onComplete: (notes: string | undefined) => void,
66
+ onTokenUpdate: (t: string) => void,
67
+ schemas: Record<string, any>,
68
+ _executeContinueWorkflowFn?: typeof executeContinueWorkflow,
69
+ emitter?: DaemonEventEmitter,
70
+ workrailSessionId?: string | null,
71
+ ): AgentTool
72
+ ```
73
+
74
+ **Runner-up:** Candidate 2 (`TokenRef` object) -- rejected: requires `makeContinueWorkflowTool` signature change (out of scope), violates YAGNI.
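
**Sketch (illustrative only):** the closure-plus-callbacks wiring described above, reduced to its token handling. `StepResult`, `Execute`, `makeCompleteStepSketch`, and `runWorkflowSketch` are hypothetical stand-ins; the real `executeContinueWorkflow` result shape and `AgentTool` interface are not shown in this plan, so the shapes below are assumptions.

```typescript
// Assumed result shape -- the real executeContinueWorkflow return type is not shown in this plan.
type StepResult =
  | { kind: 'ok'; continueToken: string; nextStepTitle: string }
  | { kind: 'blocked'; retryToken: string; feedback: string }
  | { kind: 'complete' };

// Stand-in for the injectable executeContinueWorkflow function (a fake in tests).
type Execute = (input: { continueToken: string; intent: 'advance'; notes: string }) => StepResult;

function makeCompleteStepSketch(
  getCurrentToken: () => string,
  onAdvance: (nextStepText: string, continueToken: string) => void,
  onTokenUpdate: (t: string) => void,
  execute: Execute,
): (notes: string) => string {
  return (notes) => {
    // WHY: the token is read at call time from the closure, never supplied by the LLM.
    // WHY: intent is hardcoded so the LLM cannot request a rehydrate through this tool.
    const result = execute({ continueToken: getCurrentToken(), intent: 'advance', notes });
    switch (result.kind) {
      case 'ok':
        onAdvance(result.nextStepTitle, result.continueToken);
        return JSON.stringify({ status: 'advanced', nextStep: result.nextStepTitle });
      case 'blocked':
        onTokenUpdate(result.retryToken);
        return `Blocked: ${result.feedback} Fix the issue, then call complete_step again.`;
      case 'complete':
        return JSON.stringify({ status: 'complete' });
    }
  };
}

// Mirrors the runWorkflow() side: one mutable variable, confined to this closure.
function runWorkflowSketch(startContinueToken: string, execute: Execute) {
  let currentContinueToken = startContinueToken;
  const completeStep = makeCompleteStepSketch(
    () => currentContinueToken,
    (_stepText, continueToken) => { currentContinueToken = continueToken; },
    (t) => { currentContinueToken = t; },
    execute,
  );
  return { completeStep, peekToken: () => currentContinueToken };
}
```

Neither response string contains a token, which is what TC9 is meant to guard. The sketch omits `persistTokens`, events, and artifacts.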
75
+
76
+ ## Vertical Slices
77
+
78
+ ### Slice 1: Schema and factory function
79
+ **Files:** `src/daemon/workflow-runner.ts`
80
+ **Work:**
81
+ - Add `CompleteStepParams` JSON Schema to `getSchemas()`:
82
+ ```json
83
+ {
84
+ "type": "object",
85
+ "properties": {
86
+ "notes": { "type": "string", "minLength": 50, "description": "..." },
87
+ "artifacts": { "type": "array", "items": {}, "description": "..." },
88
+ "context": { "type": "object", "additionalProperties": true }
89
+ },
90
+ "required": ["notes"],
91
+ "additionalProperties": false
92
+ }
93
+ ```
94
+ - Add `makeCompleteStepTool()` factory function (described above)
95
+ - Runtime notes validation inside `execute()`
96
+ - Full blocked/advance/complete handling (mirror `makeContinueWorkflowTool` but without token in response text)
97
+
98
+ **AC:** Function exists, exported, executes correctly with fake injection
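
**Sketch (illustrative only):** one way the runtime boundary check inside `execute()` could look, given that the JSON Schema is informational. `validateCompleteStepParams` and `CompleteStepParamsSketch` are hypothetical names, and the error messages are assumptions. The sketch does not enforce `additionalProperties: false`.

```typescript
const MIN_NOTES_LENGTH = 50;

// Hypothetical TypeScript mirror of the CompleteStepParams JSON Schema.
interface CompleteStepParamsSketch {
  notes: string;
  artifacts?: unknown[];
  context?: Record<string, unknown>;
}

// Validates untrusted tool input; throws on the first violation.
function validateCompleteStepParams(raw: unknown): CompleteStepParamsSketch {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('complete_step: params must be an object');
  }
  const { notes, artifacts, context } = raw as Record<string, unknown>;

  // Covers both "notes absent" (TC6) and "notes too short" (TC5).
  if (typeof notes !== 'string' || notes.length < MIN_NOTES_LENGTH) {
    throw new Error(`complete_step: notes is required and must be at least ${MIN_NOTES_LENGTH} characters`);
  }

  const result: CompleteStepParamsSketch = { notes };
  if (artifacts !== undefined) {
    if (!Array.isArray(artifacts)) throw new Error('complete_step: artifacts must be an array');
    result.artifacts = artifacts;
  }
  if (context !== undefined) {
    if (typeof context !== 'object' || context === null || Array.isArray(context)) {
      throw new Error('complete_step: context must be an object');
    }
    result.context = context as Record<string, unknown>;
  }
  return result;
}
```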
99
+
100
+ ### Slice 2: runWorkflow() integration
101
+ **Files:** `src/daemon/workflow-runner.ts`
102
+ **Work:**
103
+ - Add `let currentContinueToken = startContinueToken` after `startContinueToken` is assigned
104
+ - Update `onAdvance` to set `currentContinueToken = continueToken` (second param was ignored before)
105
+ - Add `complete_step` to the tools list with `makeCompleteStepTool(..., () => currentContinueToken, onAdvance, onComplete, (t) => { currentContinueToken = t; }, ...)`
106
+ - Mark `continue_workflow` description as deprecated
107
+
108
+ **AC:** `runWorkflow()` compiles; `currentContinueToken` is updated on advance
109
+
110
+ ### Slice 3: System prompt + initial prompt updates
111
+ **Files:** `src/daemon/workflow-runner.ts`
112
+ **Work:**
113
+ - Update `BASE_SYSTEM_PROMPT` tools section: add `complete_step` as primary tool, mark `continue_workflow` as deprecated
114
+ - Update `initialPrompt` in `runWorkflow()`: remove `continueToken: ${startContinueToken}` from the text; change 'call continue_workflow' to 'call complete_step'
115
+
116
+ **AC:** Prompt does not contain the continueToken string; 'complete_step' appears as primary tool
117
+
118
+ ### Slice 4: Tests
119
+ **Files:** `tests/unit/workflow-runner-complete-step.test.ts`
120
+ **Test cases:**
121
+ - TC1: notes present, advance returns `{ status: 'advanced', nextStep: '...' }`
122
+ - TC2: workflow complete returns `{ status: 'complete' }`
123
+ - TC3: blocked retryable -- feedback says 'call complete_step again', `onTokenUpdate` called with retry token
124
+ - TC4: blocked non-retryable -- feedback says cannot proceed without resolving
125
+ - TC5: notes too short (< 50 chars) -- tool throws
126
+ - TC6: notes absent -- tool throws
127
+ - TC7: artifacts pass-through -- artifacts forwarded to executeContinueWorkflow
128
+ - TC8: no artifacts -- no output artifact object is constructed (empty-array guard)
129
+ - TC9: `continueToken` NOT in response text (regression guard)
130
+ - TC10: `getCurrentToken()` is called (not a hardcoded token) -- verify via fake that captures input
131
+
132
+ **AC:** All 10 tests pass; `npm run test:unit` green
133
+
134
+ ## Risk Register
135
+
136
+ | Risk | Likelihood | Impact | Mitigation |
137
+ |---|---|---|---|
138
+ | LLM uses deprecated continue_workflow with hallucinated token | Low | Medium | System prompt deprecation notice; transition risk accepted |
139
+ | Token not updated on blocked retry | Low | High | Test TC3 catches this |
140
+ | `continueToken` in response text | Low | Medium | Test TC9 catches this |
141
+ | Initial prompt still contains token string | Low | Medium | Test prompt content in system-prompt test |
142
+
143
+ ## PR Packaging Strategy
144
+
145
+ Single PR: `feat/daemon-complete-step-tool`
146
+
147
+ All 4 slices ship in this one PR, either as one commit per slice or as a single clean commit. No multi-PR split needed -- the change is additive, with no breaking changes.
148
+
149
+ ## Philosophy Alignment
150
+
151
+ | Slice | Principle | Status |
152
+ |---|---|---|
153
+ | Slice 1: Schema | Make illegal states unrepresentable | Satisfied -- notes required, intent hardcoded |
154
+ | Slice 1: Factory | Validate at boundaries | Satisfied -- runtime check in execute() |
155
+ | Slice 1: Factory | Immutability by default | Satisfied -- mutation via callbacks only |
156
+ | Slice 2: runWorkflow | Determinism | Acceptable tension -- sequential execution makes mutable state deterministic |
157
+ | Slice 3: Prompts | Document "why" | Satisfied -- prompts explain the tool's purpose |
158
+ | Slice 4: Tests | Prefer fakes over mocks | Satisfied -- fake injection pattern |
159
+ | All slices | YAGNI | Satisfied -- zero new abstractions |
160
+
161
+ ## Unresolved Unknowns
162
+
163
+ None material. All design decisions are confirmed.
164
+
165
+ - `unresolvedUnknownCount`: 0
166
+ - `planConfidenceBand`: High
@@ -4751,3 +4751,227 @@ The daemon assembles a pre-packaged context bundle from these sources before the
4751
4751
  - How do you handle a trigger that spans multiple systems (e.g. a Jira ticket about a GitHub PR)?
4752
4752
 
4753
4753
  **This is a design-first item** -- the ideas are promising but the right shape isn't obvious. Needs a discovery pass before any implementation.
4754
+
4755
+ ---
4756
+
4757
+ ### Rethinking the subagent loop from first principles (Apr 18, 2026)
4758
+
4759
+ **Step back from all assumptions.** The current design assumes subagent spawning works like Claude Code's `mcp__nested-subagent__Task` -- the LLM decides when to spawn, what to give it, and handles the result. That's not the only model, and it might not be the best one for WorkTrain.
4760
+
4761
+ ---
4762
+
4763
+ #### The current assumption (inherited from Claude Code)
4764
+
4765
+ ```
4766
+ Agent decides → calls spawn_agent tool → subagent runs → agent gets result → agent continues
4767
+ ```
4768
+
4769
+ The LLM is the orchestrator. It decides when parallelism is needed, what context to pass, how to handle results.
4770
+
4771
+ **Problems with this:**
4772
+ - LLMs are bad at orchestration decisions -- they sometimes delegate when they shouldn't, sometimes don't when they should
4773
+ - Context passing is lossy -- the LLM decides what to include, which is usually insufficient
4774
+ - Subagent output competes with everything else in the parent's context window
4775
+ - The LLM has to reason about the subagent's output before continuing -- burns context and turns
4776
+ - No enforcement -- the LLM can skip delegation entirely and just do the work itself (often wrong)
4777
+
4778
+ ---
4779
+
4780
+ #### Alternative model: workflow-declared parallelism, daemon-enforced
4781
+
4782
+ **The workflow spec is the orchestration. The daemon is the orchestrator. The LLM is the executor.**
4783
+
4784
+ ```yaml
4785
+ # Workflow step definition
4786
+ - id: parallel-review
4787
+ type: parallel
4788
+ agents:
4789
+ - workflow: routine-correctness-review
4790
+ contextFrom: [phase-3-output, candidateFiles]
4791
+ - workflow: routine-philosophy-alignment
4792
+ contextFrom: [phase-0-output, philosophySources]
4793
+ - workflow: routine-hypothesis-challenge
4794
+ contextFrom: [phase-2-output, selectedApproach]
4795
+ synthesisStep: synthesize-parallel-review
4796
+ ```
4797
+
4798
+ The daemon sees this step definition and:
4799
+ 1. Automatically spawns 3 child sessions with specified workflows
4800
+ 2. Injects the declared context bundles (from prior step outputs) into each child
4801
+ 3. Waits for all 3 to complete
4802
+ 4. Passes all 3 results to a synthesis step
4803
+ 5. Injects the synthesis into the parent agent's next turn
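
**Sketch (illustrative only):** the five daemon steps above as a fan-out/fan-in function. Every name here (`ParallelStep`, `RunChild`, `Synthesize`, `runParallelStep`) is hypothetical, since the step schema is still undesigned. `Promise.all` makes this sketch fail-fast, which is only one of the failure-handling options still open.

```typescript
// Assumed shapes, loosely following the YAML above.
interface ParallelAgentSpec { workflow: string; contextFrom: string[] }
interface ParallelStep { id: string; agents: ParallelAgentSpec[]; synthesisStep: string }
type StepOutputs = Record<string, unknown>;

// Injected so the sketch stays runtime-agnostic (and fakeable in tests).
type RunChild = (workflow: string, context: StepOutputs) => Promise<string>;
type Synthesize = (synthesisStep: string, results: Record<string, string>) => Promise<string>;

async function runParallelStep(
  step: ParallelStep,
  priorOutputs: StepOutputs,
  runChild: RunChild,
  synthesize: Synthesize,
): Promise<string> {
  // Steps 1-3: spawn one child per declared agent, each with only its declared context, then wait for all.
  const childResults = await Promise.all(
    step.agents.map(async (agent) => {
      const context: StepOutputs = {};
      for (const key of agent.contextFrom) {
        if (!Object.prototype.hasOwnProperty.call(priorOutputs, key)) {
          throw new Error(`Missing context key "${key}" for ${agent.workflow}`);
        }
        context[key] = priorOutputs[key];
      }
      return { workflow: agent.workflow, output: await runChild(agent.workflow, context) };
    }),
  );

  // Step 4: hand all child results to the synthesis step.
  const byWorkflow: Record<string, string> = {};
  for (const r of childResults) byWorkflow[r.workflow] = r.output;

  // Step 5: the returned synthesis is what the daemon would inject into the parent's next turn.
  return synthesize(step.synthesisStep, byWorkflow);
}
```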
4804
+
4805
+ **The parent LLM never decides to spawn anything.** It just does its part. The workflow declares the orchestration pattern. The daemon enforces it.
4806
+
4807
+ ---
4808
+
4809
+ #### What this changes about the agent's job
4810
+
4811
+ Today: "Do this work, and decide when to delegate parts of it to subagents."
4812
+
4813
+ New model: "Do this bounded cognitive task. The daemon handles everything else."
4814
+
4815
+ The agent's job becomes strictly about the cognitive work -- reasoning, writing, deciding within a defined scope. Orchestration, parallelism, context packaging, result synthesis -- all daemon responsibilities defined by the workflow spec.
4816
+
4817
+ ---
4818
+
4819
+ #### The agent gives context to the daemon, not to subagents directly
4820
+
4821
+ Instead of the LLM calling `spawn_agent({ goal: "...", context: {...} })`, the workflow step has:
4822
+
4823
+ ```yaml
4824
+ - id: context-gathering
4825
+ output:
4826
+ contextFor:
4827
+ - step: parallel-review
4828
+ keys: [candidateFiles, invariants, philosophySources]
4829
+ ```
4830
+
4831
+ The agent writes outputs as structured artifacts. The daemon routes those artifacts to the right child agents at the right time. The LLM never packages context for a subagent -- it just produces outputs, and the workflow spec declares where those outputs go.
4832
+
4833
+ **This is the shift:** from "agent as orchestrator" to "workflow as orchestrator, daemon as executor, agent as cognitive unit."
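
**Sketch (illustrative only):** what daemon-side routing of a step's output could look like under a `contextFor` declaration. The `contextFor` schema is an open design question, so `ContextForRule`, `RoutingResult`, and `routeStepOutput` are assumptions. Reporting missing keys instead of throwing is also just one possible choice.

```typescript
// Assumed shape of one contextFor entry, following the YAML above.
interface ContextForRule { step: string; keys: string[] }

interface RoutingResult {
  // Target step id -> context bundle for that step.
  bundles: Record<string, Record<string, unknown>>;
  // "step.key" entries the agent was expected to produce but did not.
  missing: string[];
}

// Routes the keys a step produced to the steps that declared a need for them.
function routeStepOutput(output: Record<string, unknown>, contextFor: ContextForRule[]): RoutingResult {
  const bundles: Record<string, Record<string, unknown>> = {};
  const missing: string[] = [];
  for (const rule of contextFor) {
    const bundle = bundles[rule.step] || (bundles[rule.step] = {});
    for (const key of rule.keys) {
      if (Object.prototype.hasOwnProperty.call(output, key)) {
        bundle[key] = output[key];
      } else {
        missing.push(`${rule.step}.${key}`);
      }
    }
  }
  return { bundles, missing };
}
```

Undeclared keys in the output are not forwarded, so a child only sees what the workflow spec routed to it.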
4834
+
4835
+ ---
4836
+
4837
+ #### What the subagent loop might look like
4838
+
4839
+ ```
4840
+ Parent workflow step completes
4841
+ ↓ Daemon reads step output artifacts
4842
+ ↓ Daemon checks workflow spec for parallel/sequential children
4843
+ ↓ Daemon spawns child sessions with structured context bundles
4844
+ ↓ Children run their bounded tasks
4845
+ ↓ Daemon collects child outputs
4846
+ ↓ Daemon passes synthesized context to parent's next step
4847
+ ↓ Parent continues with full context
4848
+ ```
4849
+
4850
+ No LLM orchestration. No token-burning context packaging decisions. No "did I remember to delegate this?" uncertainty.
4851
+
4852
+ ---
4853
+
4854
+ #### What needs to be designed (don't implement yet)
4855
+
4856
+ 1. **Workflow step schema for parallelism** -- how does the workflow spec declare parallel agents, sequential chains, fan-out/fan-in patterns?
4857
+ 2. **Context routing spec** -- how does a step's output get routed to specific child agents? What's the schema for `contextFor`?
4858
+ 3. **Synthesis patterns** -- how do multiple child outputs get combined? (concatenate? LLM synthesis step? structured merge?)
4859
+ 4. **Failure handling** -- if one child fails, what happens? (fail-fast? continue with partial results? retry?)
4860
+ 5. **Depth limits** -- same constraints as native agent spawning, but enforced at the workflow level rather than the tool level
4861
+ 6. **Backward compatibility** -- how can workflows that currently use `mcp__nested-subagent__Task` be migrated incrementally?
4862
+
4863
+ **This is a design-first item.** Run a discovery session to explore the design space before any implementation. The current assumptions about subagent loops may be entirely wrong.
4864
+
4865
+ ---
4866
+
4867
+ ### Workflow runtime adapter: one spec, two runtimes (Apr 18, 2026)
4868
+
4869
+ **The core insight:** as workflows evolve (potentially morphing significantly once the subagent loop is rethought), the workflow JSON becomes the canonical spec for *what work needs to happen*. How that spec gets executed depends on the runtime. A single adapter layer translates the canonical spec to runtime-specific execution plans.
4870
+
4871
+ **Two runtimes, one spec:**
4872
+
4873
+ ```
4874
+ workflows/mr-review-workflow-agentic.json ← canonical spec (unchanged)
4875
+
4876
+ WorkflowAdapter.forRuntime('mcp') ← MCP runtime interpretation
4877
+ WorkflowAdapter.forRuntime('daemon') ← Daemon runtime interpretation
4878
+ ```
4879
+
4880
+ **What each adapter does:**
4881
+
4882
+ MCP adapter (human-in-the-loop):
4883
+ - Preserves `requireConfirmation` gates
4884
+ - Presents `continue_workflow` tool call interface
4885
+ - LLM drives subagent spawning manually via `mcp__nested-subagent__Task`
4886
+ - Maintains backward compat with all existing Claude Code usage
4887
+
4888
+ Daemon adapter (fully autonomous):
4889
+ - Removes or auto-bypasses `requireConfirmation` gates
4890
+ - Replaces `continue_workflow` with `complete_step` (daemon manages tokens)
4891
+ - Converts workflow-declared parallelism into automatic child session spawning
4892
+ - Routes step outputs to child agents per workflow spec
4893
+ - Enforces output contracts at step boundaries
4894
+
4895
+ **Why this matters as workflows evolve:**
4896
+
4897
+ Once the subagent loop is rethought (workflow-as-orchestrator model), workflow steps will likely declare parallelism, context routing, and synthesis patterns explicitly. These declarations make no sense to the MCP runtime (a human is already deciding this in real-time). The adapter translates them:
4898
+
4899
+ ```yaml
4900
+ # Workflow spec (future shape)
4901
+ - id: parallel-review
4902
+ type: parallel
4903
+ agents: [correctness, philosophy, hypothesis-challenge]
4904
+ contextFrom: [phase-3-output]
4905
+ ```
4906
+
4907
+ MCP adapter sees this → renders as: "You should spawn 3 reviewer subagents now. Here's a template..."
4908
+ Daemon adapter sees this → actually spawns 3 child sessions automatically
4909
+
4910
+ The workflow spec describes the intent. The adapter knows how each runtime fulfills it.
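
**Sketch (illustrative only):** one spec, two interpretations. `WorkflowAdapterSketch`, `ExecutionPlan`, and `ParallelStepSpec` are hypothetical; the real adapter surface depends on the spec design that has not happened yet.

```typescript
type Runtime = 'mcp' | 'daemon';

// Assumed shape of the "future shape" parallel step shown above.
interface ParallelStepSpec { id: string; type: 'parallel'; agents: string[]; contextFrom: string[] }

// What each runtime does with the same declaration.
type ExecutionPlan =
  | { kind: 'prompt'; text: string }                                   // MCP: tell the LLM what to do
  | { kind: 'spawn'; childWorkflows: string[]; contextKeys: string[] }; // daemon: do it automatically

interface RuntimeAdapter { planFor(step: ParallelStepSpec): ExecutionPlan }

const adapters: Record<Runtime, RuntimeAdapter> = {
  mcp: {
    planFor: (step) => ({
      kind: 'prompt',
      text: `You should spawn ${step.agents.length} reviewer subagents now: ${step.agents.join(', ')}.`,
    }),
  },
  daemon: {
    planFor: (step) => ({
      kind: 'spawn',
      childWorkflows: step.agents.slice(),
      contextKeys: step.contextFrom.slice(),
    }),
  },
};

const WorkflowAdapterSketch = {
  forRuntime: (runtime: Runtime): RuntimeAdapter => adapters[runtime],
};
```

Usage: `WorkflowAdapterSketch.forRuntime('daemon').planFor(step)` yields a spawn plan, while `'mcp'` yields prompt text for the same `step` object.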
4911
+
4912
+ **Key guarantee:** workflow improvements automatically benefit both runtimes. An improvement to `mr-review-workflow-agentic`'s philosophy alignment step takes effect whether a human runs it through Claude Code or WorkTrain runs it autonomously. No dual maintenance.
4913
+
4914
+ **Also eliminates "autonomous workflow variants":** the backlog had a separate item for autonomous variants of workflows. With the adapter, the canonical workflow spec is the only version -- the daemon adapter handles what "autonomy: full" means in practice. No parallel workflow files.
4915
+
4916
+ **Build order:**
4917
+ 1. Define the canonical workflow spec surface (what can be declared)
4918
+ 2. MCP adapter (largely a no-op -- existing behavior, but formally defined)
4919
+ 3. Daemon adapter (the interesting one -- translates declarations to daemon execution)
4920
+ 4. Converter for upgrading existing workflow JSONs to the new canonical spec if the schema evolves
4921
+
4922
+ **Dependencies:** requires the subagent loop rethinking to be resolved first -- the adapter can't be designed until we know what the workflow spec will declare.
4923
+
4924
+ ---
4925
+
4926
+ ### User notifications when daemon starts and finishes work (Apr 18, 2026)
4927
+
4928
+ **The problem:** the daemon silently starts and finishes sessions. Unless you're watching the console or tailing the log, you have no idea work happened or completed. For autonomous sessions that run over minutes or hours, this is a significant UX gap.
4929
+
4930
+ **What users need to know:**
4931
+ - Session started: "WorkTrain started reviewing PR #566" (with a link)
4932
+ - Session completed: "WorkTrain finished reviewing PR #566 -- APPROVED, no findings" (with session link)
4933
+ - Session failed/stuck: "WorkTrain got stuck on PR #566 after 15 turns -- needs attention" (with details)
4934
+
4935
+ **Notification channels -- anything the user wants:**
4936
+
4937
+ The notification system should be open-ended. Any channel that accepts a webhook or has an API should be configurable. The architecture is: `DaemonEventEmitter` → `NotificationRouter` → one or more configured channels.
4938
+
4939
+ Short-term (easiest to ship):
4940
+ - **Outbox.jsonl** -- already spec'd. `worktrain inbox` reads it, mobile client polls it. Works everywhere, zero config.
4941
+ - **Generic webhook** -- HTTP POST to any URL. Covers Slack, Discord, Teams, PagerDuty, Zapier, IFTTT, and anything else that accepts webhooks. One implementation, infinite integrations.
4942
+ - **macOS notification** -- `osascript` on Mac. Useful for local dev awareness.
4943
+ - **Linux/Windows notification** -- `notify-send` on Linux, Windows Toast via PowerShell.
4944
+
4945
+ Medium-term (first-class integrations):
4946
+ - **Slack** (direct API, not just webhook -- enables threading, reactions, rich formatting)
4947
+ - **Discord** (webhook, then bot for richer interactions)
4948
+ - **Microsoft Teams** (Adaptive Cards)
4949
+ - **Telegram** (popular for personal automation)
4950
+ - **Email** (SMTP for async, digest mode)
4951
+
4952
+ Long-term (when mobile exists):
4953
+ - **Mobile push notifications** -- the mobile app (spec'd in backlog) receives push notifications directly. When the app exists, this becomes the primary channel -- native push is better than any polling-based alternative.
4954
+ - **Desktop app** -- if WorkTrain ever has a desktop app, native notifications from there.
4955
+
4956
+ **The outbox is the universal foundation.** Every notification goes through `~/.workrail/outbox.jsonl` first. Channel-specific delivery (webhook, Slack, push) is a fan-out from the outbox. This means: a mobile app polling the outbox gets ALL notifications regardless of which other channels are configured.
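
**Sketch (illustrative only):** outbox-first delivery with fan-out. `NotificationRouterSketch`, `DaemonEventSketch`, and `Channel` are hypothetical names and the event kinds are assumptions. The outbox write is injected as a function rather than touching `~/.workrail/outbox.jsonl`, and delivery is synchronous for brevity; real webhook delivery would be async.

```typescript
// Assumed event shape; the real DaemonEventEmitter payloads are not shown here.
interface DaemonEventSketch {
  kind: 'session_started' | 'session_completed' | 'session_failed' | 'stuck';
  message: string;
}

interface Channel {
  type: string;
  deliver(event: DaemonEventSketch): void;
}

class NotificationRouterSketch {
  private appendToOutbox: (jsonLine: string) => void;
  private channels: Channel[];

  constructor(appendToOutbox: (jsonLine: string) => void, channels: Channel[]) {
    this.appendToOutbox = appendToOutbox;
    this.channels = channels;
  }

  // Returns the types of channels whose delivery failed.
  notify(event: DaemonEventSketch): string[] {
    // WHY: the outbox write happens first, so a poller sees every event even if all channels fail.
    this.appendToOutbox(JSON.stringify(event));

    const failed: string[] = [];
    for (const channel of this.channels) {
      try {
        channel.deliver(event);
      } catch (_err) {
        // One broken channel must not block the others.
        failed.push(channel.type);
      }
    }
    return failed;
  }
}
```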
4957
+
4958
+ **Config:**
4959
+ ```json
4960
+ // ~/.workrail/config.json
4961
+ {
4962
+ "notifications": {
4963
+ "onSessionComplete": true,
4964
+ "onSessionFailed": true,
4965
+ "onStuck": true,
4966
+ "onSessionStart": false,
4967
+ "channels": [
4968
+ { "type": "webhook", "url": "$SLACK_WEBHOOK_URL" },
4969
+ { "type": "webhook", "url": "$DISCORD_WEBHOOK_URL" },
4970
+ { "type": "macos" },
4971
+ { "type": "outbox" }
4972
+ ]
4973
+ }
4974
+ }
4975
+ ```
4976
+
4977
+ **Build order:** outbox.jsonl integration (foundation, works everywhere) → generic webhook (covers Slack/Discord/Teams/anything) → platform notifications (macOS/Linux/Windows) → mobile app push (when mobile exists).
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@exaudeus/workrail",
3
- "version": "3.34.2",
3
+ "version": "3.35.0",
4
4
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
5
5
  "license": "MIT",
6
6
  "repository": {