@exaudeus/workrail 3.34.2 → 3.35.1

@@ -0,0 +1,166 @@
1
+ # Implementation Plan: daemon complete_step tool
2
+
3
+ ## Problem Statement
4
+
5
+ The daemon's `continue_workflow` tool requires the LLM to round-trip a `continueToken` (an HMAC-signed opaque token). The LLM frequently mangles this token, causing TOKEN_BAD_SIGNATURE errors that kill sessions. The fix: add a `complete_step` tool where the daemon injects the `continueToken` internally -- the LLM never sees it.
6
+
7
+ ## Acceptance Criteria
8
+
9
+ 1. `makeCompleteStepTool()` function exists in `src/daemon/workflow-runner.ts`, exported alongside `makeContinueWorkflowTool()`
10
+ 2. `complete_step` accepts: `notes: string` (required, min 50 chars), `artifacts?: unknown[]`, `context?: Record<string, unknown>`
11
+ 3. The tool injects the current session's `continueToken` internally before calling `executeContinueWorkflow` -- LLM never provides it
12
+ 4. `continueToken` is managed via a closure variable `currentContinueToken` in `runWorkflow()`, updated on advance and blocked-retry
13
+ 5. On successful advance: returns `{ status: 'advanced', nextStep: '<step title>' }` text
14
+ 6. On workflow complete: returns `{ status: 'complete' }` text
15
+ 7. On blocked: returns human-readable feedback saying 'call complete_step again' (not continue_workflow)
16
+ 8. Runtime validation: throws if `notes.length < 50`
17
+ 9. `complete_step` is in the daemon tools list in `runWorkflow()` alongside `continue_workflow`
18
+ 10. `continue_workflow` description is marked `[DEPRECATED in daemon sessions -- use complete_step]`
19
+ 11. `BASE_SYSTEM_PROMPT` lists `complete_step` as the primary advancement tool
20
+ 12. Initial prompt removes `continueToken: ${startContinueToken}` and says 'Call complete_step with your notes when done'
21
+ 13. Unit tests cover: happy-path advance, workflow complete, blocked response (retryable + non-retryable), notes too short, artifacts pass-through
22
+ 14. All existing tests pass
23
+
24
+ ## Non-Goals
25
+
26
+ - No schema changes to `V2ContinueWorkflowInputShape` or public MCP tools
27
+ - No new HTTP routes or public API changes
28
+ - `complete_step` is daemon-only -- NOT added to the MCP server's public tools list
29
+ - No migration of `makeContinueWorkflowTool` to use `TokenRef` pattern
30
+ - No removal of `continue_workflow` from the daemon tools list (backward compat)
31
+ - No changes to `rehydrate` flow
32
+
33
+ ## Philosophy-Driven Constraints
34
+
35
+ - **Immutability by default**: `currentContinueToken` is the only mutable shared state; confined to `runWorkflow()` closure
36
+ - **Validate at boundaries**: runtime `notes.length` check in `execute()` (JSON Schema is informational only)
37
+ - **Prefer fakes over mocks**: tests use fake `executeContinueWorkflow` injection
38
+ - **YAGNI**: zero new abstraction types
39
+ - **Document "why" not "what"**: WHY comments on all non-obvious decisions
40
+
41
+ ## Invariants
42
+
43
+ 1. `persistTokens(sessionId, newToken, ...)` is always called BEFORE `onAdvance()` fires (crash safety)
44
+ 2. `currentContinueToken` is updated to the advance token on `kind: 'ok'` responses
45
+ 3. `currentContinueToken` is updated to the retry token on `kind: 'blocked'` responses
46
+ 4. `continueToken` never appears in `complete_step` response text
47
+ 5. `intent: 'advance'` is hardcoded -- LLM cannot pass `rehydrate` via `complete_step`
48
+ 6. `complete_step` is NOT exposed in the MCP server's public tool registration
49
+
50
+ ## Selected Approach
51
+
52
+ **Candidate 1: Inline closure variable + two callback paths**
53
+
54
+ `runWorkflow()` holds `let currentContinueToken = startContinueToken`. The variable is updated:
55
+ - In `onAdvance(stepText, continueToken)` -- uses the `continueToken` param (already available, currently ignored)
56
+ - Via `onTokenUpdate: (t: string) => void` callback passed to `makeCompleteStepTool()` -- called on blocked retry
57
+
58
+ `makeCompleteStepTool()` signature:
59
+ ```typescript
+ export function makeCompleteStepTool(
+   sessionId: string,
+   ctx: V2ToolContext,
+   getCurrentToken: () => string,
+   onAdvance: (nextStepText: string, continueToken: string) => void,
+   onComplete: (notes: string | undefined) => void,
+   onTokenUpdate: (t: string) => void,
+   schemas: Record<string, any>,
+   _executeContinueWorkflowFn?: typeof executeContinueWorkflow,
+   emitter?: DaemonEventEmitter,
+   workrailSessionId?: string | null,
+ ): AgentTool
+ ```
73
+
74
+ **Runner-up:** Candidate 2 (`TokenRef` object) -- rejected: requires `makeContinueWorkflowTool` signature change (out of scope), violates YAGNI.
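The closure pattern selected above can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: `ContinueResult`, `ExecuteContinue`, and `makeCompleteStepSketch` are hypothetical stand-ins for the real `V2ToolContext`/`AgentTool` machinery, the response strings are illustrative, and notes validation is omitted.

```typescript
// Hypothetical, simplified result shape; the real executeContinueWorkflow response is richer.
type ContinueResult =
  | { kind: "ok"; nextStepTitle: string | null; continueToken: string }
  | { kind: "blocked"; retryable: boolean; retryToken: string; reason: string };

type ExecuteContinue = (input: {
  continueToken: string;
  intent: "advance";
  notes: string;
}) => ContinueResult;

function makeCompleteStepSketch(
  getCurrentToken: () => string,
  onAdvance: (nextStepText: string, continueToken: string) => void,
  onComplete: (notes: string | undefined) => void,
  onTokenUpdate: (t: string) => void,
  executeContinue: ExecuteContinue,
): (notes: string) => string {
  return (notes) => {
    // The token is read from the closure at call time; the caller never supplies it.
    const result = executeContinue({
      continueToken: getCurrentToken(),
      intent: "advance", // hardcoded: rehydrate cannot be requested through this tool
      notes,
    });
    if (result.kind === "blocked") {
      onTokenUpdate(result.retryToken);
      return result.retryable
        ? `Blocked: ${result.reason}. Fix the issue and call complete_step again.`
        : `Blocked: ${result.reason}. Cannot proceed without resolving this.`;
    }
    if (result.nextStepTitle === null) {
      onComplete(notes);
      return JSON.stringify({ status: "complete" });
    }
    onAdvance(result.nextStepTitle, result.continueToken);
    // The response text deliberately carries no token.
    return JSON.stringify({ status: "advanced", nextStep: result.nextStepTitle });
  };
}

// Usage mirroring runWorkflow(): the token lives only in this closure.
let currentContinueToken = "tok-start";
const seenTokens: string[] = [];
const fakeExecute: ExecuteContinue = (input) => {
  seenTokens.push(input.continueToken);
  return seenTokens.length === 1
    ? { kind: "blocked", retryable: true, retryToken: "tok-retry", reason: "missing artifact" }
    : { kind: "ok", nextStepTitle: "Phase 2", continueToken: "tok-next" };
};
const completeStep = makeCompleteStepSketch(
  () => currentContinueToken,
  (_step, token) => { currentContinueToken = token; },
  () => {},
  (t) => { currentContinueToken = t; },
  fakeExecute,
);
const firstReply = completeStep("notes for the blocked attempt");
const secondReply = completeStep("notes for the retried attempt");
```

Passing a getter rather than the token value is what lets the second call pick up the retry token set by the first.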
75
+
76
+ ## Vertical Slices
77
+
78
+ ### Slice 1: Schema and factory function
79
+ **Files:** `src/daemon/workflow-runner.ts`
80
+ **Work:**
81
+ - Add `CompleteStepParams` JSON Schema to `getSchemas()`:
82
+ ```json
+ {
+   "type": "object",
+   "properties": {
+     "notes": { "type": "string", "minLength": 50, "description": "..." },
+     "artifacts": { "type": "array", "items": {}, "description": "..." },
+     "context": { "type": "object", "additionalProperties": true }
+   },
+   "required": ["notes"],
+   "additionalProperties": false
+ }
+ ```
94
+ - Add `makeCompleteStepTool()` factory function (described above)
95
+ - Runtime notes validation inside `execute()`
96
+ - Full blocked/advance/complete handling (mirror `makeContinueWorkflowTool` but without token in response text)
97
+
98
+ **AC:** Function exists, exported, executes correctly with fake injection
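Because the JSON Schema is described as informational only, the runtime check in `execute()` carries the real enforcement. A minimal sketch of such a boundary check, assuming a hypothetical helper named `parseCompleteStepParams` (the plan does not name one) and illustrative error messages:

```typescript
// Hypothetical parameter type mirroring the CompleteStepParams schema above.
interface CompleteStepParams {
  notes: string;
  artifacts?: unknown[];
  context?: Record<string, unknown>;
}

const MIN_NOTES_LENGTH = 50;
const ALLOWED_KEYS = ["notes", "artifacts", "context"];

// Validates untrusted tool input and returns a typed value, or throws.
function parseCompleteStepParams(raw: unknown): CompleteStepParams {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("complete_step: params must be an object");
  }
  const input = raw as Record<string, unknown>;
  for (const key of Object.keys(input)) {
    // Mirrors "additionalProperties": false in the schema.
    if (ALLOWED_KEYS.indexOf(key) === -1) {
      throw new Error(`complete_step: unknown property '${key}'`);
    }
  }
  const notes = input["notes"];
  if (typeof notes !== "string" || notes.length < MIN_NOTES_LENGTH) {
    throw new Error(`complete_step: notes must be a string of at least ${MIN_NOTES_LENGTH} characters`);
  }
  const result: CompleteStepParams = { notes };
  const artifacts = input["artifacts"];
  if (artifacts !== undefined) {
    if (!Array.isArray(artifacts)) throw new Error("complete_step: artifacts must be an array");
    result.artifacts = artifacts;
  }
  const context = input["context"];
  if (context !== undefined) {
    if (typeof context !== "object" || context === null || Array.isArray(context)) {
      throw new Error("complete_step: context must be an object");
    }
    result.context = context as Record<string, unknown>;
  }
  return result;
}

// Usage: a valid call, with notes comfortably over the 50-character minimum.
const validNotes = "Implemented the factory function and verified all three response paths.";
const parsed = parseCompleteStepParams({ notes: validNotes, artifacts: [{ kind: "report" }] });
```

Covering absent notes and too-short notes in one `typeof`/length check handles both TC5 and TC6 with a single code path.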
99
+
100
+ ### Slice 2: runWorkflow() integration
101
+ **Files:** `src/daemon/workflow-runner.ts`
102
+ **Work:**
103
+ - Add `let currentContinueToken = startContinueToken` after `startContinueToken` is assigned
104
+ - Update `onAdvance` to set `currentContinueToken = continueToken` (second param was ignored before)
105
+ - Add `complete_step` to the tools list with `makeCompleteStepTool(..., () => currentContinueToken, onAdvance, onComplete, (t) => { currentContinueToken = t; }, ...)`
106
+ - Mark `continue_workflow` description as deprecated
107
+
108
+ **AC:** `runWorkflow()` compiles; `currentContinueToken` is updated on advance
109
+
110
+ ### Slice 3: System prompt + initial prompt updates
111
+ **Files:** `src/daemon/workflow-runner.ts`
112
+ **Work:**
113
+ - Update `BASE_SYSTEM_PROMPT` tools section: add `complete_step` as primary tool, mark `continue_workflow` as deprecated
114
+ - Update `initialPrompt` in `runWorkflow()`: remove `continueToken: ${startContinueToken}` from the text; change 'call continue_workflow' to 'call complete_step'
115
+
116
+ **AC:** Prompt does not contain the continueToken string; 'complete_step' appears as primary tool
117
+
118
+ ### Slice 4: Tests
119
+ **Files:** `tests/unit/workflow-runner-complete-step.test.ts`
120
+ **Test cases:**
121
+ - TC1: notes present, advance returns `{ status: 'advanced', nextStep: '...' }`
122
+ - TC2: workflow complete returns `{ status: 'complete' }`
123
+ - TC3: blocked retryable -- feedback says 'call complete_step again', `onTokenUpdate` called with retry token
124
+ - TC4: blocked non-retryable -- feedback says cannot proceed without resolving
125
+ - TC5: notes too short (< 50 chars) -- tool throws
126
+ - TC6: notes absent -- tool throws
127
+ - TC7: artifacts pass-through -- artifacts forwarded to executeContinueWorkflow
128
+ - TC8: no artifacts supplied -- no output artifact object is constructed (empty array guard)
129
+ - TC9: `continueToken` NOT in response text (regression guard)
130
+ - TC10: `getCurrentToken()` is called (not a hardcoded token) -- verify via fake that captures input
131
+
132
+ **AC:** All 10 tests pass; `npm run test:unit` green
133
+
134
+ ## Risk Register
135
+
136
+ | Risk | Likelihood | Impact | Mitigation |
137
+ |---|---|---|---|
138
+ | LLM uses deprecated continue_workflow with hallucinated token | Low | Medium | System prompt deprecation notice; transition risk accepted |
139
+ | Token not updated on blocked retry | Low | High | Test TC3 catches this |
140
+ | `continueToken` in response text | Low | Medium | Test TC9 catches this |
141
+ | Initial prompt still contains token string | Low | Medium | Test prompt content in system-prompt test |
142
+
143
+ ## PR Packaging Strategy
144
+
145
+ Single PR: `feat/daemon-complete-step-tool`
146
+
147
+ Each of the 4 slices gets its own commit (or all go in a single clean commit). No multi-PR split is needed -- the change is additive, with no breaking changes.
148
+
149
+ ## Philosophy Alignment
150
+
151
+ | Slice | Principle | Status |
152
+ |---|---|---|
153
+ | Slice 1: Schema | Make illegal states unrepresentable | Satisfied -- notes required, intent hardcoded |
154
+ | Slice 1: Factory | Validate at boundaries | Satisfied -- runtime check in execute() |
155
+ | Slice 1: Factory | Immutability by default | Satisfied -- mutation via callbacks only |
156
+ | Slice 2: runWorkflow | Determinism | Acceptable tension -- sequential execution makes mutable state deterministic |
157
+ | Slice 3: Prompts | Document "why" | Satisfied -- prompts explain the tool's purpose |
158
+ | Slice 4: Tests | Prefer fakes over mocks | Satisfied -- fake injection pattern |
159
+ | All slices | YAGNI | Satisfied -- zero new abstractions |
160
+
161
+ ## Unresolved Unknowns
162
+
163
+ None material. All design decisions are confirmed.
164
+
165
+ - `unresolvedUnknownCount`: 0
166
+ - `planConfidenceBand`: High
@@ -4751,3 +4751,398 @@ The daemon assembles a pre-packaged context bundle from these sources before the
4751
4751
  - How do you handle a trigger that spans multiple systems (e.g. a Jira ticket about a GitHub PR)?
4752
4752
 
4753
4753
  **This is a design-first item** -- the ideas are promising but the right shape isn't obvious. Needs a discovery pass before any implementation.
4754
+
4755
+ ---
4756
+
4757
+ ### Rethinking the subagent loop from first principles (Apr 18, 2026)
4758
+
4759
+ **Step back from all assumptions.** The current design assumes subagent spawning works like Claude Code's `mcp__nested-subagent__Task` -- the LLM decides when to spawn, what to give it, and handles the result. That's not the only model, and it might not be the best one for WorkTrain.
4760
+
4761
+ ---
4762
+
4763
+ #### The current assumption (inherited from Claude Code)
4764
+
4765
+ ```
4766
+ Agent decides → calls spawn_agent tool → subagent runs → agent gets result → agent continues
4767
+ ```
4768
+
4769
+ The LLM is the orchestrator. It decides when parallelism is needed, what context to pass, how to handle results.
4770
+
4771
+ **Problems with this:**
4772
+ - LLMs are bad at orchestration decisions -- they sometimes delegate when they shouldn't, sometimes don't when they should
4773
+ - Context passing is lossy -- the LLM decides what to include, which is usually insufficient
4774
+ - Subagent output competes with everything else in the parent's context window
4775
+ - The LLM has to reason about the subagent's output before continuing -- burns context and turns
4776
+ - No enforcement -- the LLM can skip delegation entirely and just do the work itself (often wrong)
4777
+
4778
+ ---
4779
+
4780
+ #### Alternative model: workflow-declared parallelism, daemon-enforced
4781
+
4782
+ **The workflow spec is the orchestration. The daemon is the orchestrator. The LLM is the executor.**
4783
+
4784
+ ```yaml
+ # Workflow step definition
+ - id: parallel-review
+   type: parallel
+   agents:
+     - workflow: routine-correctness-review
+       contextFrom: [phase-3-output, candidateFiles]
+     - workflow: routine-philosophy-alignment
+       contextFrom: [phase-0-output, philosophySources]
+     - workflow: routine-hypothesis-challenge
+       contextFrom: [phase-2-output, selectedApproach]
+   synthesisStep: synthesize-parallel-review
+ ```
4797
+
4798
+ The daemon sees this step definition and:
4799
+ 1. Automatically spawns 3 child sessions with specified workflows
4800
+ 2. Injects the declared context bundles (from prior step outputs) into each child
4801
+ 3. Waits for all 3 to complete
4802
+ 4. Passes all 3 results to a synthesis step
4803
+ 5. Injects the synthesis into the parent agent's next turn
4804
+
4805
+ **The parent LLM never decides to spawn anything.** It just does its part. The workflow declares the orchestration pattern. The daemon enforces it.
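Steps 1 and 2 of the list above (deciding which children to spawn and which context each receives) could be sketched as a pure planning function. Everything here is hypothetical: the interfaces are modelled on the YAML example rather than on an existing schema, and actual spawning, waiting, and synthesis are left out.

```typescript
// Hypothetical shapes modelled on the YAML step definition; not an existing WorkRail schema.
interface ParallelAgentSpec {
  workflow: string;
  contextFrom: string[];
}

interface ParallelStep {
  id: string;
  type: "parallel";
  agents: ParallelAgentSpec[];
  synthesisStep: string;
}

interface ChildSessionPlan {
  workflow: string;
  parentStepId: string;
  context: Record<string, unknown>;
}

// Builds one child-session plan per declared agent, copying only the declared context keys.
function planChildSessions(
  step: ParallelStep,
  priorOutputs: Record<string, unknown>,
): ChildSessionPlan[] {
  return step.agents.map((agent) => {
    const context: Record<string, unknown> = {};
    for (const key of agent.contextFrom) {
      if (!Object.prototype.hasOwnProperty.call(priorOutputs, key)) {
        // Failing loudly here is an assumption; the failure policy is still an open design question.
        throw new Error(`step '${step.id}': missing context '${key}' for ${agent.workflow}`);
      }
      context[key] = priorOutputs[key];
    }
    return { workflow: agent.workflow, parentStepId: step.id, context };
  });
}

// Usage with the step from the YAML example and made-up prior outputs.
const reviewStep: ParallelStep = {
  id: "parallel-review",
  type: "parallel",
  agents: [
    { workflow: "routine-correctness-review", contextFrom: ["phase-3-output", "candidateFiles"] },
    { workflow: "routine-philosophy-alignment", contextFrom: ["phase-0-output", "philosophySources"] },
    { workflow: "routine-hypothesis-challenge", contextFrom: ["phase-2-output", "selectedApproach"] },
  ],
  synthesisStep: "synthesize-parallel-review",
};
const childPlans = planChildSessions(reviewStep, {
  "phase-0-output": "reframed goal",
  "phase-2-output": "hypotheses",
  "phase-3-output": "draft review",
  candidateFiles: ["src/daemon/workflow-runner.ts"],
  philosophySources: ["AGENTS.md"],
  selectedApproach: "Candidate 1",
});
```

Keeping the planning step free of side effects means the context-routing rules can be tested without starting any sessions.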
4806
+
4807
+ ---
4808
+
4809
+ #### What this changes about the agent's job
4810
+
4811
+ Today: "Do this work, and decide when to delegate parts of it to subagents."
4812
+
4813
+ New model: "Do this bounded cognitive task. The daemon handles everything else."
4814
+
4815
+ The agent's job becomes strictly about the cognitive work -- reasoning, writing, deciding within a defined scope. Orchestration, parallelism, context packaging, result synthesis -- all daemon responsibilities defined by the workflow spec.
4816
+
4817
+ ---
4818
+
4819
+ #### The agent gives context to the daemon, not to subagents directly
4820
+
4821
+ Instead of the LLM calling `spawn_agent({ goal: "...", context: {...} })`, the workflow step has:
4822
+
4823
+ ```yaml
+ - id: context-gathering
+   output:
+     contextFor:
+       - step: parallel-review
+         keys: [candidateFiles, invariants, philosophySources]
+ ```
4830
+
4831
+ The agent writes outputs as structured artifacts. The daemon routes those artifacts to the right child agents at the right time. The LLM never packages context for a subagent -- it just produces outputs, and the workflow spec declares where those outputs go.
4832
+
4833
+ **This is the shift:** from "agent as orchestrator" to "workflow as orchestrator, daemon as executor, agent as cognitive unit."
4834
+
4835
+ ---
4836
+
4837
+ #### What the subagent loop might look like
4838
+
4839
+ ```
4840
+ Parent workflow step completes
4841
+ ↓ Daemon reads step output artifacts
4842
+ ↓ Daemon checks workflow spec for parallel/sequential children
4843
+ ↓ Daemon spawns child sessions with structured context bundles
4844
+ ↓ Children run their bounded tasks
4845
+ ↓ Daemon collects child outputs
4846
+ ↓ Daemon passes synthesized context to parent's next step
4847
+ ↓ Parent continues with full context
4848
+ ```
4849
+
4850
+ No LLM orchestration. No token-burning context packaging decisions. No "did I remember to delegate this?" uncertainty.
4851
+
4852
+ ---
4853
+
4854
+ #### What needs to be designed (don't implement yet)
4855
+
4856
+ 1. **Workflow step schema for parallelism** -- how does the workflow spec declare parallel agents, sequential chains, fan-out/fan-in patterns?
4857
+ 2. **Context routing spec** -- how does a step's output get routed to specific child agents? What's the schema for `contextFor`?
4858
+ 3. **Synthesis patterns** -- how do multiple child outputs get combined? (concatenate? LLM synthesis step? structured merge?)
4859
+ 4. **Failure handling** -- if one child fails, what happens? (fail-fast? continue with partial results? retry?)
4860
+ 5. **Depth limits** -- same constraints as native agent spawning, but enforced at the workflow level not tool level
4861
+ 6. **Backward compatibility** -- workflows that currently use `mcp__nested-subagent__Task` can be migrated incrementally
4862
+
4863
+ **This is a design-first item.** Run a discovery session to explore the design space before any implementation. The current assumptions about subagent loops may be entirely wrong.
4864
+
4865
+ ---
4866
+
4867
+ ### Workflow runtime adapter: one spec, two runtimes (Apr 18, 2026)
4868
+
4869
+ **The core insight:** as workflows evolve (potentially morphing significantly once the subagent loop is rethought), the workflow JSON becomes the canonical spec for *what work needs to happen*. How that spec gets executed depends on the runtime. A single adapter layer translates the canonical spec to runtime-specific execution plans.
4870
+
4871
+ **Two runtimes, one spec:**
4872
+
4873
+ ```
4874
+ workflows/mr-review-workflow-agentic.json ← canonical spec (unchanged)
4875
+
4876
+ WorkflowAdapter.forRuntime('mcp') ← MCP runtime interpretation
4877
+ WorkflowAdapter.forRuntime('daemon') ← Daemon runtime interpretation
4878
+ ```
4879
+
4880
+ **What each adapter does:**
4881
+
4882
+ MCP adapter (human-in-the-loop):
4883
+ - Preserves `requireConfirmation` gates
4884
+ - Presents `continue_workflow` tool call interface
4885
+ - LLM drives subagent spawning manually via `mcp__nested-subagent__Task`
4886
+ - Maintains backward compat with all existing Claude Code usage
4887
+
4888
+ Daemon adapter (fully autonomous):
4889
+ - Removes or auto-bypasses `requireConfirmation` gates
4890
+ - Replaces `continue_workflow` with `complete_step` (daemon manages tokens)
4891
+ - Converts workflow-declared parallelism into automatic child session spawning
4892
+ - Routes step outputs to child agents per workflow spec
4893
+ - Enforces output contracts at step boundaries
4894
+
4895
+ **Why this matters as workflows evolve:**
4896
+
4897
+ Once the subagent loop is rethought (workflow-as-orchestrator model), workflow steps will likely declare parallelism, context routing, and synthesis patterns explicitly. These declarations make no sense to the MCP runtime (a human is already deciding this in real-time). The adapter translates them:
4898
+
4899
+ ```yaml
+ # Workflow spec (future shape)
+ - id: parallel-review
+   type: parallel
+   agents: [correctness, philosophy, hypothesis-challenge]
+   contextFrom: [phase-3-output]
+ ```
4906
+
4907
+ MCP adapter sees this → renders as: "You should spawn 3 reviewer subagents now. Here's a template..."
4908
+ Daemon adapter sees this → actually spawns 3 child sessions automatically
4909
+
4910
+ The workflow spec describes the intent. The adapter knows how each runtime fulfills it.
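One way to picture the adapter split is a single function that maps the same declaration to two different outputs. This is only a sketch under assumptions: `adaptParallelStep`, `ParallelStepSpec`, and `AdaptedStep` are hypothetical names, and the instruction wording is invented for illustration.

```typescript
type Runtime = "mcp" | "daemon";

// Hypothetical shape of the "future" parallel step shown in the YAML above.
interface ParallelStepSpec {
  id: string;
  type: "parallel";
  agents: string[];
  contextFrom: string[];
}

// MCP gets guidance text for the human-driven LLM; the daemon gets a spawn plan.
type AdaptedStep =
  | { runtime: "mcp"; instruction: string }
  | { runtime: "daemon"; spawn: { agent: string; contextFrom: string[] }[] };

function adaptParallelStep(step: ParallelStepSpec, runtime: Runtime): AdaptedStep {
  if (runtime === "mcp") {
    return {
      runtime: "mcp",
      instruction:
        `Spawn ${step.agents.length} reviewer subagents (${step.agents.join(", ")}) ` +
        `using context from: ${step.contextFrom.join(", ")}.`,
    };
  }
  return {
    runtime: "daemon",
    // Each child receives its own copy of the declared context keys.
    spawn: step.agents.map((agent) => ({ agent, contextFrom: step.contextFrom.slice() })),
  };
}

// Usage: one spec, two interpretations.
const futureStep: ParallelStepSpec = {
  id: "parallel-review",
  type: "parallel",
  agents: ["correctness", "philosophy", "hypothesis-challenge"],
  contextFrom: ["phase-3-output"],
};
const forMcp = adaptParallelStep(futureStep, "mcp");
const forDaemon = adaptParallelStep(futureStep, "daemon");
```

The discriminated union keeps callers from treating an MCP instruction as a spawn plan, or the reverse.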
4911
+
4912
+ **Key guarantee:** workflow improvements automatically benefit both runtimes. Improving `mr-review-workflow-agentic`'s philosophy alignment step shows up whether a human runs it through Claude Code or WorkTrain runs it autonomously. No dual maintenance.
4913
+
4914
+ **Also eliminates "autonomous workflow variants":** the backlog had a separate item for autonomous variants of workflows. With the adapter, the canonical workflow spec is the only version -- the daemon adapter handles what "autonomy: full" means in practice. No parallel workflow files.
4915
+
4916
+ **Build order:**
4917
+ 1. Define the canonical workflow spec surface (what can be declared)
4918
+ 2. MCP adapter (largely a no-op -- existing behavior, but formally defined)
4919
+ 3. Daemon adapter (the interesting one -- translates declarations to daemon execution)
4920
+ 4. Converter for upgrading existing workflow JSONs to the new canonical spec if the schema evolves
4921
+
4922
+ **Dependencies:** requires the subagent loop rethinking to be resolved first -- the adapter can't be designed until we know what the workflow spec will declare.
4923
+
4924
+ ---
4925
+
4926
+ ### User notifications when daemon starts and finishes work (Apr 18, 2026)
4927
+
4928
+ **The problem:** the daemon silently starts and finishes sessions. Unless you're watching the console or tailing the log, you have no idea work happened or completed. For autonomous sessions that run over minutes or hours, this is a significant UX gap.
4929
+
4930
+ **What users need to know:**
4931
+ - Session started: "WorkTrain started reviewing PR #566" (with a link)
4932
+ - Session completed: "WorkTrain finished reviewing PR #566 -- APPROVED, no findings" (with session link)
4933
+ - Session failed/stuck: "WorkTrain got stuck on PR #566 after 15 turns -- needs attention" (with details)
4934
+
4935
+ **Notification channels -- anything the user wants:**
4936
+
4937
+ The notification system should be open-ended. Any channel that accepts a webhook or has an API should be configurable. The architecture is: `DaemonEventEmitter` → `NotificationRouter` → one or more configured channels.
4938
+
4939
+ Short-term (easiest to ship):
4940
+ - **Outbox.jsonl** -- already spec'd. `worktrain inbox` reads it, mobile client polls it. Works everywhere, zero config.
4941
+ - **Generic webhook** -- HTTP POST to any URL. Covers Slack, Discord, Teams, PagerDuty, Zapier, IFTTT, and anything else that accepts webhooks. One implementation, infinite integrations.
4942
+ - **macOS notification** -- `osascript` on Mac. Useful for local dev awareness.
4943
+ - **Linux/Windows notification** -- `notify-send` on Linux, Windows Toast via PowerShell.
4944
+
4945
+ Medium-term (first-class integrations):
4946
+ - **Slack** (direct API, not just webhook -- enables threading, reactions, rich formatting)
4947
+ - **Discord** (webhook, then bot for richer interactions)
4948
+ - **Microsoft Teams** (Adaptive Cards)
4949
+ - **Telegram** (popular for personal automation)
4950
+ - **Email** (SMTP for async, digest mode)
4951
+
4952
+ Long-term (when mobile exists):
4953
+ - **Mobile push notifications** -- the mobile app (spec'd in backlog) receives push notifications directly. When the app exists, this becomes the primary channel -- native push is better than any polling-based alternative.
4954
+ - **Desktop app** -- if WorkTrain ever has a desktop app, native notifications from there.
4955
+
4956
+ **The outbox is the universal foundation.** Every notification goes through `~/.workrail/outbox.jsonl` first. Channel-specific delivery (webhook, Slack, push) is a fan-out from the outbox. This means: a mobile app polling the outbox gets ALL notifications regardless of which other channels are configured.
4957
+
4958
+ **Config:**
4959
+ ```json
+ // ~/.workrail/config.json
+ {
+   "notifications": {
+     "onSessionComplete": true,
+     "onSessionFailed": true,
+     "onStuck": true,
+     "onSessionStart": false,
+     "channels": [
+       { "type": "webhook", "url": "$SLACK_WEBHOOK_URL" },
+       { "type": "webhook", "url": "$DISCORD_WEBHOOK_URL" },
+       { "type": "macos" },
+       { "type": "outbox" }
+     ]
+   }
+ }
+ ```
4976
+
4977
+ **Build order:** outbox.jsonl integration (foundation, works everywhere) → generic webhook (covers Slack/Discord/Teams/anything) → platform notifications (macOS/Linux/Windows) → mobile app push (when mobile exists).
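The `DaemonEventEmitter` → `NotificationRouter` → channels flow might look something like the sketch below. The event names, the `makeNotificationRouter` function, and the choice to swallow per-channel errors are all assumptions made for illustration; the outbox is modelled as an in-memory array rather than `outbox.jsonl`, and the config flags are assumed to gate the outbox as well as the other channels.

```typescript
// Hypothetical event and config shapes; the flag names follow the config example above.
type DaemonEventKind = "session_started" | "session_completed" | "session_failed" | "session_stuck";

interface DaemonEvent {
  kind: DaemonEventKind;
  sessionId: string;
  message: string;
}

interface NotificationFlags {
  onSessionStart: boolean;
  onSessionComplete: boolean;
  onSessionFailed: boolean;
  onStuck: boolean;
}

type Channel = (event: DaemonEvent) => void;

function isEnabled(kind: DaemonEventKind, flags: NotificationFlags): boolean {
  switch (kind) {
    case "session_started": return flags.onSessionStart;
    case "session_completed": return flags.onSessionComplete;
    case "session_failed": return flags.onSessionFailed;
    case "session_stuck": return flags.onStuck;
  }
}

// Returns a router that writes to the outbox first, then fans out to every channel.
function makeNotificationRouter(
  flags: NotificationFlags,
  appendToOutbox: Channel,
  channels: Channel[],
): (event: DaemonEvent) => boolean {
  return (event) => {
    if (!isEnabled(event.kind, flags)) return false;
    appendToOutbox(event);
    for (const deliver of channels) {
      try {
        deliver(event);
      } catch (_err) {
        // Assumption: one failing channel should not block the others.
      }
    }
    return true;
  };
}

// Usage with in-memory fakes: a webhook that fails and a desktop channel that works.
const outbox: DaemonEvent[] = [];
const desktopMessages: string[] = [];
const route = makeNotificationRouter(
  { onSessionStart: false, onSessionComplete: true, onSessionFailed: true, onStuck: true },
  (e) => { outbox.push(e); },
  [
    () => { throw new Error("webhook unreachable"); },
    (e) => { desktopMessages.push(e.message); },
  ],
);
const completedRouted = route({ kind: "session_completed", sessionId: "sess_1", message: "finished review" });
const startedRouted = route({ kind: "session_started", sessionId: "sess_2", message: "started review" });
```

Writing to the outbox before fan-out is what gives a polling client the full notification history even when a webhook is down.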
4978
+
4979
+ ---
4980
+
4981
+ ## 🎉 WorkTrain first confirmed end-to-end autonomous session (Apr 18, 2026)
4982
+
4983
+ **Timestamp:** 2026-04-18T15:09:49Z
4984
+ **Commit:** `473f4bd0` (main)
4985
+ **npm version:** v3.34.1 (published, installable by anyone)
4986
+ **What happened:** A real MR review workflow (`mr-review-workflow-agentic`) ran completely autonomously via webhook trigger, advanced through all phases (context gathering, review, synthesis, validation, handoff), self-validated, and produced a structured finding set. 8 step advances, `outcome: success`.
4987
+
4988
+ **Trigger:** `POST /webhook/mr-review {"goal": "Review PR #566: fix two minor bugs..."}`
4989
+ **Session:** `sess_3bmjuzf7l2vrqynjtleg5iskm4`
4990
+ **Result:** APPROVE with High confidence. 3 Minor findings, 1 Informational. Correctly decided not to delegate since no Critical/Major issues.
4991
+
4992
+ ---
4993
+
4994
+ ### What works at this commit
4995
+
4996
+ - ✅ Daemon accepts webhooks, starts sessions, runs workflows end-to-end
4997
+ - ✅ Sessions advance through all workflow phases autonomously
4998
+ - ✅ `mr-review-workflow-agentic` v2.6 runs fully -- context gathering, review phases, synthesis loop, validation, handoff
4999
+ - ✅ `wr.discovery` v3.2.0 runs fully -- with new phase-0-reframe (goal reframing before research)
5000
+ - ✅ Console shows live sessions via event log (no daemon connection required)
5001
+ - ✅ MCP server is stable (bridge removed, EPIPE fixed, v3.34.1 published)
5002
+ - ✅ GitHub + GitLab polling triggers (no webhooks needed)
5003
+ - ✅ `worktrain init`, `tell`, `inbox`, `spawn`, `await` CLI commands
5004
+ - ✅ Stuck detection + visibility (`worktrain status`, `worktrain logs --follow`)
5005
+ - ✅ `complete_step` tool -- daemon manages continueToken, LLM never handles it
5006
+ - ✅ Assessment gate circuit breaker (stops at 3 blocked attempts, shows artifact format)
5007
+ - ✅ `worktrain daemon --install` creates launchd service (daemon survives MCP reconnects)
5008
+ - ✅ Self-configuration (`triggers.yml`, `daemon-soul.md`, `AGENTS.md` for workrail repo)
5009
+
5010
+ ### Current limitations at this commit
5011
+
5012
+ **Blocking reliable complex workflows:**
5013
+ 1. **`complete_step` not yet tested in production** -- just merged; the daemon is still using `continue_workflow` in running sessions and needs a restart for the change to take effect.
5014
+ 2. **Assessment gates still unreliable** -- `complete_step` fixes the token issue; the `artifacts` field (#557) fixes the submission issue. But `coding-task-workflow-agentic` phases with quality gates haven't been tested end-to-end yet.
5015
+ 3. **Native `spawn_agent` not yet merged** -- implementation in progress. Until it lands, all subagent delegation is via `mcp__nested-subagent__Task` (invisible black box).
5016
+ 4. **No session identity (parentSessionId)** -- multi-phase work appears as unrelated flat sessions in the console.
5017
+
5018
+ **Architecture not yet realized:**
5019
+ 5. **Coordinator scripts don't exist** -- `worktrain spawn/await` is there but no templates.
5020
+ 6. **Subagent loop not rethought** -- LLM still decides when to delegate; workflow-as-orchestrator model is spec'd but not built.
5021
+ 7. **Workflow runtime adapter not built** -- workflows run in daemon mode as-is; no MCP vs daemon adaptation layer.
5022
+ 8. **Knowledge graph not built** -- context gathering still sweeps files on every session.
5023
+ 9. **MCP simplification PR-B not done** -- HttpServer still starts with MCP server.
5024
+
5025
+ **Missing for production autonomy:**
5026
+ 10. **No notifications** -- daemon completes work silently. Users have no awareness unless watching console/logs.
5027
+ 11. **Auto-commit from handoff artifact unverified** -- merged but untested end-to-end.
5028
+ 12. **Late-bound goals not implemented** -- triggers require static goals; dynamic goals (like PR reviews) need `goalTemplate: "{{$.goal}}"` as default.
5029
+ 13. **No coordinator script template** -- the multi-phase autonomous pipeline exists as primitives but not as a usable script.
5030
+
5031
+ ---
5032
+
5033
+ ### Artifacts as first-class citizens: explorable, accessible, out of the repo (Apr 18, 2026)
5034
+
5035
+ **The current mess:** every autonomous session dumps `design-candidates.md`, `implementation_plan.md`, `design-review-findings.md`, `mr-review.md` etc. as files in the repo root or worktrees. They are:
5036
+ - Not indexed or searchable
5037
+ - Not visible in the console
5038
+ - Not accessible to other sessions (agent B can't read agent A's handoff without knowing the exact file path)
5039
+ - Polluting the repo with ephemeral working documents
5040
+ - Lost when worktrees are cleaned up
5041
+ - Scattered across the filesystem with no structure
5042
+
5043
+ **The right model:** artifacts are WorkTrain data, not filesystem files.
5044
+
5045
+ ---
5046
+
5047
+ #### What an artifact is
5048
+
5049
+ Any structured output from a session that has value beyond the session itself:
5050
+ - **Handoff docs** -- what one session produces for the next to consume
5051
+ - **Design candidates** -- research output with tradeoffs and recommendation
5052
+ - **Implementation plans** -- what to build, how, in what order
5053
+ - **Review findings** -- MR review output with findings, severity, recommendation
5054
+ - **Spec files** -- behavioral specs, acceptance criteria, API contracts
5055
+ - **Investigation summaries** -- bug investigation root cause and reproduction
5056
+ - **Context bundles** -- pre-packaged knowledge for subagent consumption
5057
+
5058
+ **NOT artifacts:** step notes (stay in WorkRail session store), event logs (stay in daemon events), source code (stays in repo).
5059
+
5060
+ ---
5061
+
5062
+ #### Where artifacts live
5063
+
5064
+ `~/.workrail/artifacts/<sessionId>/<artifact-type>-<timestamp>.json`
5065
+
5066
+ Structured JSON, not markdown. The display layer (console, `worktrain artifacts`) renders them as human-readable. Other agents query them as structured data.
5067
+
5068
+ **Why JSON not markdown:**
5069
+ - Queryable by other agents (what are the findings with severity=critical?)
5070
+ - Renderable by the console with proper formatting, filtering, search
5071
+ - Versionable and diffable in the artifact store
5072
+ - Accessible via the knowledge graph (artifacts become nodes with typed edges)
5073
+
5074
+ ---
5075
+
5076
+ #### Console integration
5077
+
5078
+ The console session detail view gets an "Artifacts" tab alongside "Steps" and "Notes":
5079
+
5080
+ ```
+ Session: sess_3bmj... [MR Review: PR #566]
+ ├── Steps (8)
+ ├── Notes
+ └── Artifacts (3)
+     ├── 📋 review-findings.json "APPROVE -- 3 Minor, 1 Info"
+     ├── 📄 context-bundle.json "12 files read, 4 patterns identified"
+     └── 🔍 investigation-notes.json "Signal 3 dead code in max_turns path"
+ ```
5089
+
5090
+ Click an artifact → full rendered view in the console.
5091
+
5092
+ ---
5093
+
5094
+ #### Accessibility to other agents
5095
+
5096
+ Agents can query artifacts from prior sessions via a new tool:
5097
+
5098
+ ```
5099
+ read_artifact({ sessionId: 'sess_3bmj...', type: 'review-findings' })
5100
+ → { verdict: 'APPROVE', findings: [...], recommendation: '...' }
5101
+
5102
+ search_artifacts({ type: 'implementation-plan', workflowId: 'coding-task-workflow-agentic', since: '7d' })
5103
+ → [{ sessionId, summary, createdAt }, ...]
5104
+ ```
5105
+
5106
+ This replaces the current pattern where agents `cat design-candidates.md` from a known path -- fragile, path-dependent, breaks across worktrees.
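To make the query semantics concrete, here is a rough in-memory sketch of the two proposed tools. Nothing here exists yet: `ArtifactRecord`, `readArtifact`, `searchArtifacts`, and the `since` grammar (`<n>d` or `<n>h`) are assumptions, and a real store would read from `~/.workrail/artifacts/` rather than an array.

```typescript
// Hypothetical record shape for a stored artifact.
interface ArtifactRecord {
  sessionId: string;
  type: string;
  workflowId: string;
  createdAt: number; // epoch milliseconds
  summary: string;
  body: unknown;
}

const HOUR_MS = 60 * 60 * 1000;

// Parses relative windows such as "7d" or "12h" into milliseconds.
function parseSinceMs(since: string): number {
  const match = /^(\d+)([dh])$/.exec(since);
  if (match === null) throw new Error(`unsupported 'since' value: ${since}`);
  const amount = Number(match[1]);
  return amount * (match[2] === "d" ? 24 * HOUR_MS : HOUR_MS);
}

// Returns the newest artifact of the given type for a session, if any.
function readArtifact(
  store: ArtifactRecord[],
  query: { sessionId: string; type: string },
): ArtifactRecord | undefined {
  return store
    .filter((a) => a.sessionId === query.sessionId && a.type === query.type)
    .sort((a, b) => b.createdAt - a.createdAt)[0];
}

// Returns lightweight summaries matching type, optional workflow, and optional time window.
function searchArtifacts(
  store: ArtifactRecord[],
  query: { type: string; workflowId?: string; since?: string },
  now: number,
): { sessionId: string; summary: string; createdAt: number }[] {
  const cutoff = query.since === undefined ? -Infinity : now - parseSinceMs(query.since);
  return store
    .filter((a) => a.type === query.type)
    .filter((a) => query.workflowId === undefined || a.workflowId === query.workflowId)
    .filter((a) => a.createdAt >= cutoff)
    .map((a) => ({ sessionId: a.sessionId, summary: a.summary, createdAt: a.createdAt }));
}

// Usage with made-up records and a fixed clock.
const DAY_MS = 24 * HOUR_MS;
const nowMs = 100 * DAY_MS;
const artifactStore: ArtifactRecord[] = [
  { sessionId: "sess_a", type: "review-findings", workflowId: "mr-review-workflow-agentic", createdAt: nowMs - DAY_MS, summary: "APPROVE", body: { verdict: "APPROVE" } },
  { sessionId: "sess_b", type: "implementation-plan", workflowId: "coding-task-workflow-agentic", createdAt: nowMs - 2 * DAY_MS, summary: "recent plan", body: {} },
  { sessionId: "sess_c", type: "implementation-plan", workflowId: "coding-task-workflow-agentic", createdAt: nowMs - 9 * DAY_MS, summary: "old plan", body: {} },
];
const findings = readArtifact(artifactStore, { sessionId: "sess_a", type: "review-findings" });
const recentPlans = searchArtifacts(
  artifactStore,
  { type: "implementation-plan", workflowId: "coding-task-workflow-agentic", since: "7d" },
  nowMs,
);
```

Passing the clock in as `now` keeps the time-window filter deterministic in tests.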
5107
+
5108
+ ---
5109
+
5110
+ #### Workflow integration
5111
+
5112
+ Workflow steps declare their artifact output type:
5113
+
5114
+ ```json
+ {
+   "id": "phase-1c-challenge-and-select",
+   "output": {
+     "artifact": "design-candidates",
+     "schema": "wr.artifacts.design-candidates.v1"
+   }
+ }
+ ```
5123
+
5124
+ The daemon automatically stores the step's notes as a typed artifact. Other steps and other sessions can query it by type rather than by file path.
5125
+
5126
+ ---
5127
+
5128
+ #### What stays in the repo
5129
+
5130
+ Almost nothing from WorkTrain sessions. The only things that belong in the repo:
5131
+ - Source code changes (committed via auto-commit or human review)
5132
+ - Long-lived spec files that are part of the product (e.g. `docs/ideas/backlog.md`)
5133
+ - Workflow definitions (`workflows/*.json`)
5134
+
5135
+ Everything else -- design docs, review findings, investigation notes, implementation plans -- lives in `~/.workrail/artifacts/`. If you want a design doc in the repo, you explicitly commit it. The default is: it lives in WorkTrain's data layer.
5136
+
5137
+ ---
5138
+
5139
+ #### Build order
5140
+
5141
+ 1. **Artifact store** -- `~/.workrail/artifacts/<sessionId>/` directory structure, JSON schema for common types
5142
+ 2. **Daemon writes artifacts** -- workflow steps with `output.artifact` declaration write to the artifact store automatically
5143
+ 3. **`worktrain artifacts` CLI** -- list, read, search artifacts by session, type, date
5144
+ 4. **Console artifacts tab** -- render artifacts in session detail view
5145
+ 5. **`read_artifact` / `search_artifacts` tools** -- agents can query the artifact store
5146
+ 6. **Knowledge graph integration** -- artifacts become nodes, sessions link to their artifacts
5147
+
5148
+ **The `NEVER COMMIT MARKDOWN FILES` rule in metaGuidance is a symptom of this missing feature.** The rule exists because agents keep dumping files in the wrong place. With a proper artifact store, the rule becomes unnecessary -- artifacts have nowhere to go except the artifact store.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@exaudeus/workrail",
3
- "version": "3.34.2",
3
+ "version": "3.35.1",
4
4
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
5
5
  "license": "MIT",
6
6
  "repository": {