npm - @exaudeus/workrail - Versions diffs - 3.34.2 → 3.35.1 - Mend

@exaudeus/workrail 3.34.2 → 3.35.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/dist/console-ui/assets/{index-DSRkHTz1.js → index-D7jQyCSD.js} +2 -2
package/dist/console-ui/index.html +1 -1
package/dist/daemon/workflow-runner.d.ts +1 -0
package/dist/daemon/workflow-runner.js +148 -10
package/dist/manifest.json +8 -8
package/docs/design/daemon-complete-step-tool-candidates.md +160 -0
package/docs/design/daemon-complete-step-tool-design-review.md +82 -0
package/docs/design/daemon-complete-step-tool-implementation-plan.md +166 -0
package/docs/ideas/backlog.md +395 -0
package/package.json +1 -1

package/docs/design/daemon-complete-step-tool-implementation-plan.md ADDED

@@ -0,0 +1,166 @@
+# Implementation Plan: daemon complete_step tool
+## Problem Statement
+The daemon's `continue_workflow` tool requires the LLM to round-trip a `continueToken` (an HMAC-signed opaque token). The LLM frequently mangles this token, causing TOKEN_BAD_SIGNATURE errors that kill sessions. The fix: add a `complete_step` tool where the daemon injects the `continueToken` internally -- the LLM never sees it.
+## Acceptance Criteria
+1. `makeCompleteStepTool()` function exists in `src/daemon/workflow-runner.ts`, exported alongside `makeContinueWorkflowTool()`
+2. `complete_step` accepts: `notes: string` (required, min 50 chars), `artifacts?: unknown[]`, `context?: Record<string, unknown>`
+3. The tool injects the current session's `continueToken` internally before calling `executeContinueWorkflow` -- LLM never provides it
+4. `continueToken` is managed via a closure variable `currentContinueToken` in `runWorkflow()`, updated on advance and blocked-retry
+5. On successful advance: returns `{ status: 'advanced', nextStep: '<step title>' }` text
+6. On workflow complete: returns `{ status: 'complete' }` text
+7. On blocked: returns human-readable feedback saying 'call complete_step again' (not continue_workflow)
+8. Runtime validation: throws if `notes.length < 50`
+9. `complete_step` is in the daemon tools list in `runWorkflow()` alongside `continue_workflow`
+10. `continue_workflow` description is marked `[DEPRECATED in daemon sessions -- use complete_step]`
+11. `BASE_SYSTEM_PROMPT` lists `complete_step` as the primary advancement tool
+12. Initial prompt removes `continueToken: ${startContinueToken}` and says 'Call complete_step with your notes when done'
+13. Unit tests cover: happy-path advance, workflow complete, blocked response (retryable + non-retryable), notes too short, artifacts pass-through
+14. All existing tests pass
+## Non-Goals
+- No schema changes to `V2ContinueWorkflowInputShape` or public MCP tools
+- No new HTTP routes or public API changes
+- `complete_step` is daemon-only -- NOT added to the MCP server's public tools list
+- No migration of `makeContinueWorkflowTool` to use `TokenRef` pattern
+- No removal of `continue_workflow` from the daemon tools list (backward compat)
+- No changes to `rehydrate` flow
+## Philosophy-Driven Constraints
+- **Immutability by default**: `currentContinueToken` is the only mutable shared state; confined to `runWorkflow()` closure
+- **Validate at boundaries**: runtime `notes.length` check in `execute()` (JSON Schema is informational only)
+- **Prefer fakes over mocks**: tests use fake `executeContinueWorkflow` injection
+- **YAGNI**: zero new abstraction types
+- **Document "why" not "what"**: WHY comments on all non-obvious decisions
+## Invariants
+1. `persistTokens(sessionId, newToken, ...)` is always called BEFORE `onAdvance()` fires (crash safety)
+2. `currentContinueToken` is updated to the advance token on `kind: 'ok'` responses
+3. `currentContinueToken` is updated to the retry token on `kind: 'blocked'` responses
+4. `continueToken` never appears in `complete_step` response text
+5. `intent: 'advance'` is hardcoded -- LLM cannot pass `rehydrate` via `complete_step`
+6. `complete_step` is NOT exposed in the MCP server's public tool registration
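A minimal sketch of invariant 1 (persist before notifying), not taken from the package. `advanceWithCrashSafety` and the one-argument callbacks are hypothetical simplifications; the real `persistTokens` and `onAdvance` take more parameters.

```typescript
// Hypothetical stand-ins for persistTokens() and onAdvance(); the real
// signatures take more arguments than shown here.
function advanceWithCrashSafety(
  newToken: string,
  persistTokens: (token: string) => void,
  onAdvance: (token: string) => void,
): void {
  // Persist first: if the process dies after this call, a restart can still
  // resume from newToken even though onAdvance never ran.
  persistTokens(newToken);
  onAdvance(newToken);
}

// Record the order in which the two callbacks fire.
const callOrder: string[] = [];
advanceWithCrashSafety(
  "token-1",
  () => { callOrder.push("persist"); },
  () => { callOrder.push("advance"); },
);
```

If `persistTokens` throws, `onAdvance` is never reached, so in-memory state cannot get ahead of what is on disk.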
+## Selected Approach
+**Candidate 1: Inline closure variable + two callback paths**
+`runWorkflow()` holds `let currentContinueToken = startContinueToken`. The variable is updated:
+- In `onAdvance(stepText, continueToken)` -- uses the `continueToken` param (already available, currently ignored)
+- Via `onTokenUpdate: (t: string) => void` callback passed to `makeCompleteStepTool()` -- called on blocked retry
+`makeCompleteStepTool()` signature:
+```typescript
+export function makeCompleteStepTool(
+  sessionId: string,
+  ctx: V2ToolContext,
+  getCurrentToken: () => string,
+  onAdvance: (nextStepText: string, continueToken: string) => void,
+  onComplete: (notes: string | undefined) => void,
+  onTokenUpdate: (t: string) => void,
+  schemas: Record<string, any>,
+  _executeContinueWorkflowFn?: typeof executeContinueWorkflow,
+  emitter?: DaemonEventEmitter,
+  workrailSessionId?: string | null,
+): AgentTool
+```
+**Runner-up:** Candidate 2 (`TokenRef` object) -- rejected: requires `makeContinueWorkflowTool` signature change (out of scope), violates YAGNI.
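A simplified, synchronous sketch of the closure-variable approach, not the package's implementation. `ContinueResult`, `ContinueFn`, and `makeCompleteStepSketch` are hypothetical, and the sketch collapses the plan's two update paths (`onAdvance` and `onTokenUpdate`) into a single `onTokenUpdate` callback to stay short.

```typescript
// Hypothetical, simplified shapes; the real V2ToolContext, AgentTool and
// executeContinueWorkflow signatures are not shown in this diff.
type ContinueResult =
  | { kind: "ok"; continueToken: string; nextStepTitle: string | null }
  | { kind: "blocked"; retryToken: string; feedback: string };

type ContinueFn = (input: {
  continueToken: string;
  intent: "advance";
  notes: string;
}) => ContinueResult;

function makeCompleteStepSketch(
  getCurrentToken: () => string,
  onTokenUpdate: (t: string) => void,
  executeContinue: ContinueFn,
): (notes: string) => string {
  return (notes) => {
    // The token is read at call time, so the tool always sends the latest one.
    const result = executeContinue({
      continueToken: getCurrentToken(),
      intent: "advance", // hardcoded: the caller cannot request a rehydrate
      notes,
    });
    if (result.kind === "blocked") {
      onTokenUpdate(result.retryToken);
      return "Blocked: " + result.feedback + " Fix this, then call complete_step again.";
    }
    onTokenUpdate(result.continueToken);
    // The response text never includes the token itself.
    return result.nextStepTitle === null
      ? JSON.stringify({ status: "complete" })
      : JSON.stringify({ status: "advanced", nextStep: result.nextStepTitle });
  };
}

// Mirrors `let currentContinueToken = startContinueToken` in runWorkflow().
let currentContinueToken = "token-0";
const seenTokens: string[] = [];
// Fake executor: blocks the first call, advances the second.
const fakeExecute: ContinueFn = (input) => {
  seenTokens.push(input.continueToken);
  return seenTokens.length === 1
    ? { kind: "blocked", retryToken: "token-retry", feedback: "Notes lack evidence." }
    : { kind: "ok", continueToken: "token-1", nextStepTitle: "Phase 2" };
};
const completeStep = makeCompleteStepSketch(
  () => currentContinueToken,
  (t) => { currentContinueToken = t; },
  fakeExecute,
);
const firstReply = completeStep("first attempt");
const secondReply = completeStep("second attempt");
```

The second call sends the retry token from the blocked response without the caller ever seeing it, which is the behavior the plan's TC3 and TC10 are meant to guard.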
+## Vertical Slices
+### Slice 1: Schema and factory function
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Add `CompleteStepParams` JSON Schema to `getSchemas()`:
+  ```json
+  {
+    "type": "object",
+    "properties": {
+      "notes": { "type": "string", "minLength": 50, "description": "..." },
+      "artifacts": { "type": "array", "items": {}, "description": "..." },
+      "context": { "type": "object", "additionalProperties": true }
+    },
+    "required": ["notes"],
+    "additionalProperties": false
+  }
+  ```
+- Add `makeCompleteStepTool()` factory function (described above)
+- Runtime notes validation inside `execute()`
+- Full blocked/advance/complete handling (mirror `makeContinueWorkflowTool` but without token in response text)
+**AC:** Function exists, exported, executes correctly with fake injection
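Because the plan treats the JSON Schema as informational only, the runtime check has to be written by hand. The following is one possible shape for it, under the assumption that `execute()` receives untyped input; `parseCompleteStepParams` and `MIN_NOTES_LENGTH` are hypothetical names, not identifiers from the package.

```typescript
// Hypothetical parameter shape mirroring the CompleteStepParams schema above.
interface CompleteStepParams {
  notes: string;
  artifacts?: unknown[];
  context?: Record<string, unknown>;
}

const MIN_NOTES_LENGTH = 50;

// Validates untrusted tool input at the boundary and returns a typed value.
function parseCompleteStepParams(raw: unknown): CompleteStepParams {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("complete_step: params must be an object");
  }
  const input = raw as Record<string, unknown>;

  const notes = input["notes"];
  if (typeof notes !== "string" || notes.length < MIN_NOTES_LENGTH) {
    throw new Error("complete_step: notes must be a string of at least 50 characters");
  }

  const artifacts = input["artifacts"];
  if (artifacts !== undefined && !Array.isArray(artifacts)) {
    throw new Error("complete_step: artifacts must be an array when provided");
  }

  const context = input["context"];
  if (
    context !== undefined &&
    (typeof context !== "object" || context === null || Array.isArray(context))
  ) {
    throw new Error("complete_step: context must be an object when provided");
  }

  // Only copy optional fields that were actually supplied.
  const parsed: CompleteStepParams = { notes };
  if (artifacts !== undefined) parsed.artifacts = artifacts as unknown[];
  if (context !== undefined) parsed.context = context as Record<string, unknown>;
  return parsed;
}
```

Absent notes and too-short notes take the same error path, which matches TC5 and TC6 both expecting the tool to throw.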
+### Slice 2: runWorkflow() integration
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Add `let currentContinueToken = startContinueToken` after `startContinueToken` is assigned
+- Update `onAdvance` to set `currentContinueToken = continueToken` (second param was ignored before)
+- Add `complete_step` to the tools list with `makeCompleteStepTool(..., () => currentContinueToken, onAdvance, onComplete, (t) => { currentContinueToken = t; }, ...)`
+- Mark `continue_workflow` description as deprecated
+**AC:** `runWorkflow()` compiles; `currentContinueToken` is updated on advance
+### Slice 3: System prompt + initial prompt updates
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Update `BASE_SYSTEM_PROMPT` tools section: add `complete_step` as primary tool, mark `continue_workflow` as deprecated
+- Update `initialPrompt` in `runWorkflow()`: remove `continueToken: ${startContinueToken}` from the text; change 'call continue_workflow' to 'call complete_step'
+**AC:** Prompt does not contain the continueToken string; 'complete_step' appears as primary tool
+### Slice 4: Tests
+**Files:** `tests/unit/workflow-runner-complete-step.test.ts`
+**Test cases:**
+- TC1: notes present, advance returns `{ status: 'advanced', nextStep: '...' }`
+- TC2: workflow complete returns `{ status: 'complete' }`
+- TC3: blocked retryable -- feedback says 'call complete_step again', `onTokenUpdate` called with retry token
+- TC4: blocked non-retryable -- feedback says cannot proceed without resolving
+- TC5: notes too short (< 50 chars) -- tool throws
+- TC6: notes absent -- tool throws
+- TC7: artifacts pass-through -- artifacts forwarded to executeContinueWorkflow
+- TC8: no artifacts, no output artifact object constructed (empty array guard)
+- TC9: `continueToken` NOT in response text (regression guard)
+- TC10: `getCurrentToken()` is called (not a hardcoded token) -- verify via fake that captures input
+**AC:** All 10 tests pass; `npm run test:unit` green
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| LLM uses deprecated continue_workflow with hallucinated token | Low | Medium | System prompt deprecation notice; transition risk accepted |
+| Token not updated on blocked retry | Low | High | Test TC3 catches this |
+| `continueToken` in response text | Low | Medium | Test TC9 catches this |
+| Initial prompt still contains token string | Low | Medium | Test prompt content in system-prompt test |
+## PR Packaging Strategy
+Single PR: `feat/daemon-complete-step-tool`
+Each of the 4 slices lands as its own commit (or all four as a single clean commit). No multi-PR split is needed -- the change is additive, with no breaking changes.
+## Philosophy Alignment
+| Slice | Principle | Status |
+|---|---|---|
+| Slice 1: Schema | Make illegal states unrepresentable | Satisfied -- notes required, intent hardcoded |
+| Slice 1: Factory | Validate at boundaries | Satisfied -- runtime check in execute() |
+| Slice 1: Factory | Immutability by default | Satisfied -- mutation via callbacks only |
+| Slice 2: runWorkflow | Determinism | Acceptable tension -- sequential execution makes mutable state deterministic |
+| Slice 3: Prompts | Document "why" | Satisfied -- prompts explain the tool's purpose |
+| Slice 4: Tests | Prefer fakes over mocks | Satisfied -- fake injection pattern |
+| All slices | YAGNI | Satisfied -- zero new abstractions |
+## Unresolved Unknowns
+None material. All design decisions are confirmed.
+- `unresolvedUnknownCount`: 0
+- `planConfidenceBand`: High

package/docs/ideas/backlog.md CHANGED

@@ -4751,3 +4751,398 @@ The daemon assembles a pre-packaged context bundle from these sources before the
 - How do you handle a trigger that spans multiple systems (e.g. a Jira ticket about a GitHub PR)?
 **This is a design-first item** -- the ideas are promising but the right shape isn't obvious. Needs a discovery pass before any implementation.
+---
+### Rethinking the subagent loop from first principles (Apr 18, 2026)
+**Step back from all assumptions.** The current design assumes subagent spawning works like Claude Code's `mcp__nested-subagent__Task` -- the LLM decides when to spawn, what to give it, and handles the result. That's not the only model, and it might not be the best one for WorkTrain.
+---
+#### The current assumption (inherited from Claude Code)
+```
+Agent decides → calls spawn_agent tool → subagent runs → agent gets result → agent continues
+```
+The LLM is the orchestrator. It decides when parallelism is needed, what context to pass, how to handle results.
+**Problems with this:**
+- LLMs are bad at orchestration decisions -- they sometimes delegate when they shouldn't, sometimes don't when they should
+- Context passing is lossy -- the LLM decides what to include, which is usually insufficient
+- Subagent output competes with everything else in the parent's context window
+- The LLM has to reason about the subagent's output before continuing -- burns context and turns
+- No enforcement -- the LLM can skip delegation entirely and just do the work itself (often wrong)
+---
+#### Alternative model: workflow-declared parallelism, daemon-enforced
+**The workflow spec is the orchestration. The daemon is the orchestrator. The LLM is the executor.**
+```yaml
+# Workflow step definition
+- id: parallel-review
+  type: parallel
+  agents:
+    - workflow: routine-correctness-review
+      contextFrom: [phase-3-output, candidateFiles]
+    - workflow: routine-philosophy-alignment
+      contextFrom: [phase-0-output, philosophySources]
+    - workflow: routine-hypothesis-challenge
+      contextFrom: [phase-2-output, selectedApproach]
+  synthesisStep: synthesize-parallel-review
+```
+The daemon sees this step definition and:
+1. Automatically spawns 3 child sessions with specified workflows
+2. Injects the declared context bundles (from prior step outputs) into each child
+3. Waits for all 3 to complete
+4. Passes all 3 results to a synthesis step
+5. Injects the synthesis into the parent agent's next turn
+**The parent LLM never decides to spawn anything.** It just does its part. The workflow declares the orchestration pattern. The daemon enforces it.
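The five daemon steps above could be sketched roughly as follows. This is an illustration only: the backlog marks the step schema as undesigned, so `ParallelStep`, `RunChild`, and `runParallelStep` are all hypothetical, and failure handling is deliberately left out.

```typescript
// Hypothetical types: the workflow step schema is explicitly still undesigned.
interface ParallelAgentSpec { workflow: string; contextFrom: string[] }
interface ParallelStep { id: string; agents: ParallelAgentSpec[]; synthesisStep: string }
type StepOutputs = Record<string, unknown>;
type RunChild = (workflow: string, context: StepOutputs) => Promise<string>;

async function runParallelStep(
  step: ParallelStep,
  priorOutputs: StepOutputs,
  runChild: RunChild,
): Promise<{ synthesisStep: string; results: string[] }> {
  const pending = step.agents.map((agent) => {
    // Build each child's context bundle from the declared keys only.
    const context: StepOutputs = {};
    for (const key of agent.contextFrom) {
      if (key in priorOutputs) context[key] = priorOutputs[key];
    }
    return runChild(agent.workflow, context);
  });
  // Fan-in: wait for every child; results keep the declared agent order.
  const results = await Promise.all(pending);
  return { synthesisStep: step.synthesisStep, results };
}

// Fake child runner that reports which context keys it received.
const fakeRunChild: RunChild = (workflow, context) =>
  Promise.resolve(workflow + ":" + Object.keys(context).join(","));

const demoStep: ParallelStep = {
  id: "parallel-review",
  agents: [
    { workflow: "routine-correctness-review", contextFrom: ["phase-3-output", "candidateFiles"] },
    { workflow: "routine-philosophy-alignment", contextFrom: ["phase-0-output", "philosophySources"] },
  ],
  synthesisStep: "synthesize-parallel-review",
};
const demoOutputs: StepOutputs = {
  "phase-0-output": "goal",
  "phase-3-output": "diff summary",
  candidateFiles: ["a.ts"],
  philosophySources: ["AGENTS.md"],
};
const demoRun = runParallelStep(demoStep, demoOutputs, fakeRunChild);
```

`Promise.all` rejects as soon as any child rejects, which amounts to fail-fast. Whether that is the right policy is one of the open design questions in the backlog.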
+---
+#### What this changes about the agent's job
+Today: "Do this work, and decide when to delegate parts of it to subagents."
+New model: "Do this bounded cognitive task. The daemon handles everything else."
+The agent's job becomes strictly about the cognitive work -- reasoning, writing, deciding within a defined scope. Orchestration, parallelism, context packaging, result synthesis -- all daemon responsibilities defined by the workflow spec.
+---
+#### The agent gives context to the daemon, not to subagents directly
+Instead of the LLM calling `spawn_agent({ goal: "...", context: {...} })`, the workflow step has:
+```yaml
+- id: context-gathering
+  output:
+    contextFor:
+      - step: parallel-review
+        keys: [candidateFiles, invariants, philosophySources]
+```
+The agent writes outputs as structured artifacts. The daemon routes those artifacts to the right child agents at the right time. The LLM never packages context for a subagent -- it just produces outputs, and the workflow spec declares where those outputs go.
+**This is the shift:** from "agent as orchestrator" to "workflow as orchestrator, daemon as executor, agent as cognitive unit."
+---
+#### What the subagent loop might look like
+```
+Parent workflow step completes
+  ↓ Daemon reads step output artifacts
+  ↓ Daemon checks workflow spec for parallel/sequential children
+  ↓ Daemon spawns child sessions with structured context bundles
+  ↓ Children run their bounded tasks
+  ↓ Daemon collects child outputs
+  ↓ Daemon passes synthesized context to parent's next step
+  ↓ Parent continues with full context
+```
+No LLM orchestration. No token-burning context packaging decisions. No "did I remember to delegate this?" uncertainty.
+---
+#### What needs to be designed (don't implement yet)
+1. **Workflow step schema for parallelism** -- how does the workflow spec declare parallel agents, sequential chains, fan-out/fan-in patterns?
+2. **Context routing spec** -- how does a step's output get routed to specific child agents? What's the schema for `contextFor`?
+3. **Synthesis patterns** -- how do multiple child outputs get combined? (concatenate? LLM synthesis step? structured merge?)
+4. **Failure handling** -- if one child fails, what happens? (fail-fast? continue with partial results? retry?)
+5. **Depth limits** -- same constraints as native agent spawning, but enforced at the workflow level not tool level
+6. **Backward compatibility** -- workflows that currently use `mcp__nested-subagent__Task` can be migrated incrementally
+**This is a design-first item.** Run a discovery session to explore the design space before any implementation. The current assumptions about subagent loops may be entirely wrong.
+---
+### Workflow runtime adapter: one spec, two runtimes (Apr 18, 2026)
+**The core insight:** as workflows evolve (potentially morphing significantly once the subagent loop is rethought), the workflow JSON becomes the canonical spec for *what work needs to happen*. How that spec gets executed depends on the runtime. A single adapter layer translates the canonical spec to runtime-specific execution plans.
+**Two runtimes, one spec:**
+```
+workflows/mr-review-workflow-agentic.json  ← canonical spec (unchanged)
+         ↓
+WorkflowAdapter.forRuntime('mcp')          ← MCP runtime interpretation
+WorkflowAdapter.forRuntime('daemon')       ← Daemon runtime interpretation
+```
+**What each adapter does:**
+MCP adapter (human-in-the-loop):
+- Preserves `requireConfirmation` gates
+- Presents `continue_workflow` tool call interface
+- LLM drives subagent spawning manually via `mcp__nested-subagent__Task`
+- Maintains backward compat with all existing Claude Code usage
+Daemon adapter (fully autonomous):
+- Removes or auto-bypasses `requireConfirmation` gates
+- Replaces `continue_workflow` with `complete_step` (daemon manages tokens)
+- Converts workflow-declared parallelism into automatic child session spawning
+- Routes step outputs to child agents per workflow spec
+- Enforces output contracts at step boundaries
+**Why this matters as workflows evolve:**
+Once the subagent loop is rethought (workflow-as-orchestrator model), workflow steps will likely declare parallelism, context routing, and synthesis patterns explicitly. These declarations make no sense to the MCP runtime (a human is already deciding this in real time). The adapter translates them:
+```yaml
+# Workflow spec (future shape)
+- id: parallel-review
+  type: parallel
+  agents: [correctness, philosophy, hypothesis-challenge]
+  contextFrom: [phase-3-output]
+```
+MCP adapter sees this → renders as: "You should spawn 3 reviewer subagents now. Here's a template..."
+Daemon adapter sees this → actually spawns 3 child sessions automatically
+The workflow spec describes the intent. The adapter knows how each runtime fulfills it.
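One way to picture the two interpretations of the same step, as a sketch only. `RuntimeAdapter`, `ExecutionPlan`, and the body of `WorkflowAdapter.forRuntime` are hypothetical; the backlog names `WorkflowAdapter.forRuntime` but does not define its return type.

```typescript
type Runtime = "mcp" | "daemon";

// Hypothetical shape of the future parallel step declaration.
interface ParallelStepSpec {
  id: string;
  type: "parallel";
  agents: string[];
  contextFrom: string[];
}

// Hypothetical plan types: the MCP runtime gets guidance text,
// the daemon runtime gets something it can execute directly.
type ExecutionPlan =
  | { kind: "prompt"; text: string }
  | { kind: "spawn"; childWorkflows: string[]; contextFrom: string[] };

interface RuntimeAdapter {
  plan(step: ParallelStepSpec): ExecutionPlan;
}

const WorkflowAdapter = {
  forRuntime(runtime: Runtime): RuntimeAdapter {
    if (runtime === "mcp") {
      return {
        plan: (step) => ({
          kind: "prompt",
          text:
            "You should spawn " + step.agents.length +
            " reviewer subagents now: " + step.agents.join(", "),
        }),
      };
    }
    return {
      plan: (step) => ({
        kind: "spawn",
        childWorkflows: step.agents.slice(),
        contextFrom: step.contextFrom.slice(),
      }),
    };
  },
};

const reviewStep: ParallelStepSpec = {
  id: "parallel-review",
  type: "parallel",
  agents: ["correctness", "philosophy", "hypothesis-challenge"],
  contextFrom: ["phase-3-output"],
};
const mcpPlan = WorkflowAdapter.forRuntime("mcp").plan(reviewStep);
const daemonPlan = WorkflowAdapter.forRuntime("daemon").plan(reviewStep);
```

The spec object is identical in both calls; only the plan differs, which is the "no dual maintenance" property described above.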
+**Key guarantee:** workflow improvements automatically benefit both runtimes. Improving `mr-review-workflow-agentic`'s philosophy alignment step shows up whether a human runs it through Claude Code or WorkTrain runs it autonomously. No dual maintenance.
+**Also eliminates "autonomous workflow variants":** the backlog had a separate item for autonomous variants of workflows. With the adapter, the canonical workflow spec is the only version -- the daemon adapter handles what "autonomy: full" means in practice. No parallel workflow files.
+**Build order:**
+1. Define the canonical workflow spec surface (what can be declared)
+2. MCP adapter (largely a no-op -- existing behavior, but formally defined)
+3. Daemon adapter (the interesting one -- translates declarations to daemon execution)
+4. Converter for upgrading existing workflow JSONs to the new canonical spec if the schema evolves
+**Dependencies:** requires the subagent loop rethinking to be resolved first -- the adapter can't be designed until we know what the workflow spec will declare.
+---
+### User notifications when daemon starts and finishes work (Apr 18, 2026)
+**The problem:** the daemon silently starts and finishes sessions. Unless you're watching the console or tailing the log, you have no idea work happened or completed. For autonomous sessions that run over minutes or hours, this is a significant UX gap.
+**What users need to know:**
+- Session started: "WorkTrain started reviewing PR #566" (with a link)
+- Session completed: "WorkTrain finished reviewing PR #566 -- APPROVED, no findings" (with session link)
+- Session failed/stuck: "WorkTrain got stuck on PR #566 after 15 turns -- needs attention" (with details)
+**Notification channels -- anything the user wants:**
+The notification system should be open-ended. Any channel that accepts a webhook or has an API should be configurable. The architecture is: `DaemonEventEmitter` → `NotificationRouter` → one or more configured channels.
+Short-term (easiest to ship):
+- **Outbox.jsonl** -- already spec'd. `worktrain inbox` reads it, mobile client polls it. Works everywhere, zero config.
+- **Generic webhook** -- HTTP POST to any URL. Covers Slack, Discord, Teams, PagerDuty, Zapier, IFTTT, and anything else that accepts webhooks. One implementation, infinite integrations.
+- **macOS notification** -- `osascript` on Mac. Useful for local dev awareness.
+- **Linux/Windows notification** -- `notify-send` on Linux, Windows Toast via PowerShell.
+Medium-term (first-class integrations):
+- **Slack** (direct API, not just webhook -- enables threading, reactions, rich formatting)
+- **Discord** (webhook, then bot for richer interactions)
+- **Microsoft Teams** (Adaptive Cards)
+- **Telegram** (popular for personal automation)
+- **Email** (SMTP for async, digest mode)
+Long-term (when mobile exists):
+- **Mobile push notifications** -- the mobile app (spec'd in backlog) receives push notifications directly. When the app exists, this becomes the primary channel -- native push is better than any polling-based alternative.
+- **Desktop app** -- if WorkTrain ever has a desktop app, native notifications from there.
+**The outbox is the universal foundation.** Every notification goes through `~/.workrail/outbox.jsonl` first. Channel-specific delivery (webhook, Slack, push) is a fan-out from the outbox. This means: a mobile app polling the outbox gets ALL notifications regardless of which other channels are configured.
+**Config:**
+```jsonc
+// ~/.workrail/config.json
+{
+  "notifications": {
+    "onSessionComplete": true,
+    "onSessionFailed": true,
+    "onStuck": true,
+    "onSessionStart": false,
+    "channels": [
+      { "type": "webhook", "url": "$SLACK_WEBHOOK_URL" },
+      { "type": "webhook", "url": "$DISCORD_WEBHOOK_URL" },
+      { "type": "macos" },
+      { "type": "outbox" }
+    ]
+  }
+}
+```
+**Build order:** outbox.jsonl integration (foundation, works everywhere) → generic webhook (covers Slack/Discord/Teams/anything) → platform notifications (macOS/Linux/Windows) → mobile app push (when mobile exists).
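A rough sketch of the outbox-first fan-out, using an in-memory array in place of `~/.workrail/outbox.jsonl`. `routeNotification`, `isEnabled`, and the `Channel` type are hypothetical. Two further assumptions are not stated in the backlog: the `on*` flags gate the outbox write as well as channel delivery, and a failing channel should not block the others.

```typescript
type EventKind = "sessionStart" | "sessionComplete" | "sessionFailed" | "stuck";
interface NotificationEvent { kind: EventKind; message: string }
interface NotificationFlags {
  onSessionStart: boolean;
  onSessionComplete: boolean;
  onSessionFailed: boolean;
  onStuck: boolean;
}
// Hypothetical channel abstraction: a function that delivers one event.
type Channel = (event: NotificationEvent) => void;

function isEnabled(flags: NotificationFlags, kind: EventKind): boolean {
  switch (kind) {
    case "sessionStart": return flags.onSessionStart;
    case "sessionComplete": return flags.onSessionComplete;
    case "sessionFailed": return flags.onSessionFailed;
    case "stuck": return flags.onStuck;
  }
}

// Returns true when the event was recorded, false when config filtered it out.
function routeNotification(
  event: NotificationEvent,
  flags: NotificationFlags,
  outbox: string[],
  channels: Channel[],
): boolean {
  if (!isEnabled(flags, event.kind)) return false;
  // Outbox first: one JSON object per entry, like a line in a .jsonl file.
  outbox.push(JSON.stringify(event));
  for (const deliver of channels) {
    try {
      deliver(event);
    } catch (_err) {
      // Assumption: the event is already in the outbox, so a broken channel
      // is skipped and the remaining channels still get the event.
    }
  }
  return true;
}
```

Writing to the outbox before any channel runs means a poller reading the outbox sees every event that passed the config filter, regardless of which channels succeeded.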
+---
+## 🎉 WorkTrain first confirmed end-to-end autonomous session (Apr 18, 2026)
+**Timestamp:** 2026-04-18T15:09:49Z
+**Commit:** `473f4bd0` (main)
+**npm version:** v3.34.1 (published, installable by anyone)
+**What happened:** A real MR review workflow (`mr-review-workflow-agentic`) ran completely autonomously via webhook trigger, advanced through all phases (context gathering, review, synthesis, validation, handoff), self-validated, and produced a structured finding set. 8 step advances, `outcome: success`.
+**Trigger:** `POST /webhook/mr-review {"goal": "Review PR #566: fix two minor bugs..."}`
+**Session:** `sess_3bmjuzf7l2vrqynjtleg5iskm4`
+**Result:** APPROVE with High confidence. 3 Minor findings, 1 Informational. Correctly decided not to delegate since no Critical/Major issues.
+---
+### What works at this commit
+- ✅ Daemon accepts webhooks, starts sessions, runs workflows end-to-end
+- ✅ Sessions advance through all workflow phases autonomously
+- ✅ `mr-review-workflow-agentic` v2.6 runs fully -- context gathering, review phases, synthesis loop, validation, handoff
+- ✅ `wr.discovery` v3.2.0 runs fully -- with new phase-0-reframe (goal reframing before research)
+- ✅ Console shows live sessions via event log (no daemon connection required)
+- ✅ MCP server is stable (bridge removed, EPIPE fixed, v3.34.1 published)
+- ✅ GitHub + GitLab polling triggers (no webhooks needed)
+- ✅ `worktrain init`, `tell`, `inbox`, `spawn`, `await` CLI commands
+- ✅ Stuck detection + visibility (`worktrain status`, `worktrain logs --follow`)
+- ✅ `complete_step` tool -- daemon manages continueToken, LLM never handles it
+- ✅ Assessment gate circuit breaker (stops at 3 blocked attempts, shows artifact format)
+- ✅ `worktrain daemon --install` creates launchd service (daemon survives MCP reconnects)
+- ✅ Self-configuration (`triggers.yml`, `daemon-soul.md`, `AGENTS.md` for workrail repo)
+### Current limitations at this commit
+**Blocking reliable complex workflows:**
+1. **`complete_step` not yet tested in production** -- just merged, daemon still using `continue_workflow` in running sessions. Needs daemon restart to take effect.
+2. **Assessment gates still unreliable** -- `complete_step` fixes the token issue; the `artifacts` field (#557) fixes the submission issue. But `coding-task-workflow-agentic` phases with quality gates haven't been tested end-to-end yet.
+3. **Native `spawn_agent` not yet merged** -- implementation in progress. Until it lands, all subagent delegation is via `mcp__nested-subagent__Task` (invisible black box).
+4. **No session identity (parentSessionId)** -- multi-phase work appears as unrelated flat sessions in the console.
+**Architecture not yet realized:**
+5. **Coordinator scripts don't exist** -- `worktrain spawn/await` is there but no templates.
+6. **Subagent loop not rethought** -- LLM still decides when to delegate; workflow-as-orchestrator model is spec'd but not built.
+7. **Workflow runtime adapter not built** -- workflows run in daemon mode as-is; no MCP vs daemon adaptation layer.
+8. **Knowledge graph not built** -- context gathering still sweeps files on every session.
+9. **MCP simplification PR-B not done** -- HttpServer still starts with MCP server.
+**Missing for production autonomy:**
+10. **No notifications** -- daemon completes work silently. Users have no awareness unless watching console/logs.
+11. **No auto-commit from handoff artifact** -- merged but untested end-to-end.
+12. **Late-bound goals not implemented** -- triggers require static goals; dynamic goals (like PR reviews) need `goalTemplate: "{{$.goal}}"` as default.
+13. **No coordinator script template** -- the multi-phase autonomous pipeline exists as primitives but not as a usable script.
+---
+### Artifacts as first-class citizens: explorable, accessible, out of the repo (Apr 18, 2026)
+**The current mess:** every autonomous session dumps `design-candidates.md`, `implementation_plan.md`, `design-review-findings.md`, `mr-review.md` etc. as files in the repo root or worktrees. They are:
+- Not indexed or searchable
+- Not visible in the console
+- Not accessible to other sessions (agent B can't read agent A's handoff without knowing the exact file path)
+- Polluting the repo with ephemeral working documents
+- Lost when worktrees are cleaned up
+- Scattered across the filesystem with no structure
+**The right model:** artifacts are WorkTrain data, not filesystem files.
+---
+#### What an artifact is
+Any structured output from a session that has value beyond the session itself:
+- **Handoff docs** -- what one session produces for the next to consume
+- **Design candidates** -- research output with tradeoffs and recommendation
+- **Implementation plans** -- what to build, how, in what order
+- **Review findings** -- MR review output with findings, severity, recommendation
+- **Spec files** -- behavioral specs, acceptance criteria, API contracts
+- **Investigation summaries** -- bug investigation root cause and reproduction
+- **Context bundles** -- pre-packaged knowledge for subagent consumption
+**NOT artifacts:** step notes (stay in WorkRail session store), event logs (stay in daemon events), source code (stays in repo).
+---
+#### Where artifacts live
+`~/.workrail/artifacts/<sessionId>/<artifact-type>-<timestamp>.json`
+Structured JSON, not markdown. The display layer (console, `worktrain artifacts`) renders them as human-readable. Other agents query them as structured data.
+**Why JSON not markdown:**
+- Queryable by other agents (what are the findings with severity=critical?)
+- Renderable by the console with proper formatting, filtering, search
+- Versionable and diffable in the artifact store
+- Accessible via the knowledge graph (artifacts become nodes with typed edges)
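For illustration, the path layout above could be produced by a helper like the one below. `artifactPath` is a hypothetical name, and the timestamp format is an assumption (ISO 8601 with `:` and `.` replaced so the name is safe on all filesystems); the backlog does not specify one.

```typescript
// Builds <rootDir>/<sessionId>/<artifact-type>-<timestamp>.json.
// Assumption: timestamp is ISO 8601 UTC with ':' and '.' replaced by '-'.
function artifactPath(
  rootDir: string,
  sessionId: string,
  artifactType: string,
  createdAt: Date,
): string {
  const stamp = createdAt.toISOString().replace(/[:.]/g, "-");
  return [rootDir, sessionId, artifactType + "-" + stamp + ".json"].join("/");
}

const examplePath = artifactPath(
  "~/.workrail/artifacts",
  "sess_demo",
  "review-findings",
  new Date(Date.UTC(2026, 3, 18, 15, 9, 49)), // month is zero-based: 3 = April
);
// examplePath is "~/.workrail/artifacts/sess_demo/review-findings-2026-04-18T15-09-49-000Z.json"
```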
+---
+#### Console integration
+The console session detail view gets an "Artifacts" tab alongside "Steps" and "Notes":
+```
+Session: sess_3bmj...  [MR Review: PR #566]
+├── Steps (8)
+├── Notes
+└── Artifacts (3)
+    ├── 📋 review-findings.json    "APPROVE -- 3 Minor, 1 Info"
+    ├── 📄 context-bundle.json     "12 files read, 4 patterns identified"
+    └── 🔍 investigation-notes.json "Signal 3 dead code in max_turns path"
+```
+Click an artifact → full rendered view in the console.
+---
+#### Accessibility to other agents
+Agents can query artifacts from prior sessions via a new tool:
+```
+read_artifact({ sessionId: 'sess_3bmj...', type: 'review-findings' })
+→ { verdict: 'APPROVE', findings: [...], recommendation: '...' }
+search_artifacts({ type: 'implementation-plan', workflowId: 'coding-task-workflow-agentic', since: '7d' })
+→ [{ sessionId, summary, createdAt }, ...]
+```
+This replaces the current pattern where agents `cat design-candidates.md` from a known path -- fragile, path-dependent, breaks across worktrees.
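A possible in-memory shape for the `search_artifacts` filter, as a sketch under stated assumptions: `ArtifactSummary`, `sinceToMillis`, and `searchArtifacts` are hypothetical, `createdAt` is assumed to be an ISO 8601 string, and `since` is assumed to accept only `<n>d` or `<n>h`.

```typescript
interface ArtifactSummary {
  sessionId: string;
  type: string;
  workflowId: string;
  summary: string;
  createdAt: string; // assumed ISO 8601
}

interface ArtifactQuery {
  type?: string;
  workflowId?: string;
  since?: string; // assumed format: "<n>d" or "<n>h"
}

// Converts a relative window such as "7d" or "12h" to milliseconds.
function sinceToMillis(since: string): number {
  const match = /^(\d+)([dh])$/.exec(since);
  if (match === null) throw new Error("unsupported since value: " + since);
  const amount = parseInt(match[1] as string, 10);
  const hours = match[2] === "d" ? amount * 24 : amount;
  return hours * 60 * 60 * 1000;
}

// Returns artifacts matching every field present in the query.
function searchArtifacts(
  store: ArtifactSummary[],
  query: ArtifactQuery,
  now: Date,
): ArtifactSummary[] {
  const cutoff =
    query.since === undefined ? -Infinity : now.getTime() - sinceToMillis(query.since);
  return store.filter(
    (a) =>
      (query.type === undefined || a.type === query.type) &&
      (query.workflowId === undefined || a.workflowId === query.workflowId) &&
      Date.parse(a.createdAt) >= cutoff,
  );
}
```

Passing `now` explicitly keeps the filter deterministic, so it can be exercised with a fixed clock in tests.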
+---
+#### Workflow integration
+Workflow steps declare their artifact output type:
+```json
+{
+  "id": "phase-1c-challenge-and-select",
+  "output": {
+    "artifact": "design-candidates",
+    "schema": "wr.artifacts.design-candidates.v1"
+  }
+}
+```
+The daemon automatically stores the step's notes as a typed artifact. Other steps and other sessions can query it by type rather than by file path.
+---
+#### What stays in the repo
+Almost nothing from WorkTrain sessions. The only things that belong in the repo:
+- Source code changes (committed via auto-commit or human review)
+- Long-lived spec files that are part of the product (e.g. `docs/ideas/backlog.md`)
+- Workflow definitions (`workflows/*.json`)
+Everything else -- design docs, review findings, investigation notes, implementation plans -- lives in `~/.workrail/artifacts/`. If you want a design doc in the repo, you explicitly commit it. The default is: it lives in WorkTrain's data layer.
+---
+#### Build order
+1. **Artifact store** -- `~/.workrail/artifacts/<sessionId>/` directory structure, JSON schema for common types
+2. **Daemon writes artifacts** -- workflow steps with `output.artifact` declaration write to the artifact store automatically
+3. **`worktrain artifacts` CLI** -- list, read, search artifacts by session, type, date
+4. **Console artifacts tab** -- render artifacts in session detail view
+5. **`read_artifact` / `search_artifacts` tools** -- agents can query the artifact store
+6. **Knowledge graph integration** -- artifacts become nodes, sessions link to their artifacts
+**The `NEVER COMMIT MARKDOWN FILES` rule in metaGuidance is a symptom of this missing feature.** The rule exists because agents keep dumping files in the wrong place. With a proper artifact store, the rule becomes unnecessary -- artifacts have nowhere to go except the artifact store.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.34.2",
+  "version": "3.35.1",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {