npm - @exaudeus/workrail - Versions diffs - 3.32.0 → 3.34.0 - Mend

@exaudeus/workrail 3.32.0 → 3.34.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (98)

package/dist/cli/commands/index.d.ts +1 -0
package/dist/cli/commands/index.js +3 -1
package/dist/cli/commands/worktrain-await.js +11 -9
package/dist/cli/commands/worktrain-daemon-install.d.ts +35 -0
package/dist/cli/commands/worktrain-daemon-install.js +291 -0
package/dist/cli/commands/worktrain-daemon.d.ts +31 -0
package/dist/cli/commands/worktrain-daemon.js +272 -0
package/dist/cli/commands/worktrain-spawn.js +11 -9
package/dist/cli-worktrain.js +488 -0
package/dist/cli.js +1 -22
package/dist/console/standalone-console.d.ts +28 -0
package/dist/console/standalone-console.js +142 -0
package/dist/{console/assets/index-Cb_LO718.js → console-ui/assets/index-C1JXnwZS.js} +1 -1
package/dist/{console → console-ui}/index.html +1 -1
package/dist/daemon/agent-loop.d.ts +27 -0
package/dist/daemon/agent-loop.js +39 -1
package/dist/daemon/daemon-events.d.ts +63 -1
package/dist/daemon/workflow-runner.d.ts +3 -2
package/dist/daemon/workflow-runner.js +285 -46
package/dist/infrastructure/session/HttpServer.js +133 -34
package/dist/manifest.json +136 -104
package/dist/mcp/handlers/v2-error-mapping.d.ts +3 -0
package/dist/mcp/handlers/v2-error-mapping.js +2 -0
package/dist/mcp/handlers/v2-execution/advance.js +25 -0
package/dist/mcp/handlers/v2-execution/continue-advance.js +7 -0
package/dist/mcp/output-schemas.d.ts +30 -30
package/dist/mcp/transports/fatal-exit.js +4 -0
package/dist/mcp/transports/http-entry.js +0 -5
package/dist/mcp/transports/stdio-entry.js +24 -12
package/dist/mcp/v2/tools.d.ts +4 -4
package/dist/mcp-server.d.ts +0 -2
package/dist/mcp-server.js +1 -42
package/dist/trigger/adapters/github-poller.d.ts +44 -0
package/dist/trigger/adapters/github-poller.js +190 -0
package/dist/trigger/adapters/gitlab-poller.d.ts +27 -0
package/dist/trigger/adapters/gitlab-poller.js +81 -0
package/dist/trigger/index.d.ts +4 -1
package/dist/trigger/index.js +5 -1
package/dist/trigger/polled-event-store.d.ts +22 -0
package/dist/trigger/polled-event-store.js +173 -0
package/dist/trigger/polling-scheduler.d.ts +20 -0
package/dist/trigger/polling-scheduler.js +249 -0
package/dist/trigger/trigger-listener.d.ts +3 -0
package/dist/trigger/trigger-listener.js +47 -3
package/dist/trigger/trigger-store.js +114 -33
package/dist/trigger/types.d.ts +17 -1
package/dist/v2/durable-core/domain/observation-builder.d.ts +3 -0
package/dist/v2/durable-core/domain/observation-builder.js +2 -2
package/dist/v2/durable-core/domain/prompt-renderer.d.ts +2 -1
package/dist/v2/durable-core/domain/prompt-renderer.js +10 -0
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +224 -224
package/dist/v2/durable-core/schemas/session/events.d.ts +42 -42
package/dist/v2/durable-core/schemas/session/manifest.d.ts +6 -6
package/dist/v2/durable-core/schemas/session/validation-event.d.ts +2 -2
package/dist/v2/durable-core/tokens/payloads.d.ts +52 -52
package/dist/v2/usecases/console-routes.js +3 -3
package/dist/v2/usecases/console-service.js +185 -10
package/dist/v2/usecases/console-types.d.ts +8 -0
package/docs/design/bridge-removal-pr-a-candidates.md +115 -0
package/docs/design/bridge-removal-pr-a-design-review.md +79 -0
package/docs/design/bridge-removal-pr-a-implementation-plan.md +203 -0
package/docs/design/daemon-conversation-logging-plan.md +98 -0
package/docs/design/daemon-conversation-logging-review.md +55 -0
package/docs/design/daemon-conversation-logging.md +129 -0
package/docs/design/github-polling-adapter-design-candidates.md +226 -0
package/docs/design/github-polling-adapter-design-review-findings.md +131 -0
package/docs/design/github-polling-adapter-implementation-plan.md +284 -0
package/docs/design/implementation_plan.md +192 -0
package/docs/design/workflow-id-validation-at-startup.md +146 -0
package/docs/design/workflow-id-validation-design-review.md +87 -0
package/docs/design/workflow-id-validation-implementation-plan.md +185 -0
package/docs/design/worktrain-system-prompt-report-issue-candidates.md +135 -0
package/docs/design/worktrain-system-prompt-report-issue-design-review.md +73 -0
package/docs/discovery/design-candidates.md +180 -0
package/docs/discovery/design-review-findings.md +110 -0
package/docs/discovery/wr-discovery-goal-reframing.md +303 -0
package/docs/ideas/backlog.md +627 -0
package/package.json +1 -1
package/workflows/architecture-scalability-audit.json +1 -1
package/workflows/bug-investigation.agentic.v2.json +3 -3
package/workflows/coding-task-workflow-agentic.json +32 -32
package/workflows/coding-task-workflow-agentic.lean.v2.json +1 -1
package/workflows/coding-task-workflow-agentic.v2.json +7 -7
package/workflows/mr-review-workflow.agentic.v2.json +21 -12
package/workflows/personal-learning-materials-creation-branched.json +2 -2
package/workflows/production-readiness-audit.json +1 -1
package/workflows/relocation-workflow-us.json +2 -2
package/workflows/ui-ux-design-workflow.json +14 -14
package/workflows/workflow-for-workflows.json +3 -3
package/workflows/workflow-for-workflows.v2.json +2 -2
package/workflows/wr.discovery.json +59 -8
package/dist/mcp/transports/bridge-entry.d.ts +0 -102
package/dist/mcp/transports/bridge-entry.js +0 -454
package/dist/mcp/transports/bridge-events.d.ts +0 -51
package/dist/mcp/transports/bridge-events.js +0 -24
package/dist/mcp/transports/primary-tombstone.d.ts +0 -21
package/dist/mcp/transports/primary-tombstone.js +0 -51
package/dist/{console → console-ui}/assets/index-8dh0Psu-.css +0 -0

package/docs/ideas/backlog.md CHANGED

@@ -4029,3 +4029,630 @@ worktrain logs --format json            # machine-readable for scripts
 3. `worktrain logs` CLI commands (reads files, correlates by sessionId)
 4. SSE extension in DaemonConsole for live event streaming
 5. Coordinator script subscription to event streams (replaces polling session store)
+---
+### Subagent context packaging: the main agent assumes too much (Apr 17, 2026)
+**The problem:** When a main agent spawns a subagent, the work package it creates is usually too thin. The main agent has rich context from the full conversation -- why this task matters, what was already tried, what constraints were discovered -- but it packages the subagent task as if that context is shared. The subagent gets a one-liner and has to rediscover everything from scratch.
+This is the same problem as a developer handing a junior a vague JIRA ticket instead of a proper brief. The subagent wastes tokens re-deriving what the main agent already knows, or worse, makes wrong assumptions.
+**Where this manifests:**
+- Coding task subagents that don't know why a specific approach was chosen
+- MR review subagents that don't know what invariants matter for this codebase
+- Discovery subagents that re-read files the main agent just read
+- Fix subagents that don't know what was already tried and failed
+**Three solution directions:**
+**Option A: Better instructions to the main agent (prompt engineering)**
+Add explicit guidance to the WorkTrain system prompt: "When spawning a subagent, include: (1) what you already know that the subagent won't, (2) what was already tried, (3) why this specific approach was chosen, (4) what constraints or invariants matter, (5) what 'done' looks like." This is the cheapest fix but depends on the main agent reliably following it.
+**Option B: Platform-assisted package creation (structured)**
+The `worktrain spawn` command (or the `spawn_session` tool) would take a structured work package:
+```typescript
+spawnSession({
+  workflowId: 'coding-task-workflow-agentic',
+  goal: '...',
+  context: {
+    whyThisApproach: '...',        // what the main agent knows about the decision
+    alreadyTried: [...],           // what failed
+    knownConstraints: [...],       // invariants the subagent must respect
+    relevantFiles: [...],          // files the main agent already read
+    completionCriteria: '...'      // what done actually looks like
+  }
+})
+```
+The platform validates that the package is complete before spawning -- missing fields emit a warning or block the spawn. The subagent's system prompt is enriched with this context automatically, without the main agent having to think about how to format it.
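A minimal sketch of that completeness check, assuming the `context` shape from the `spawnSession` example above. `validateWorkPackage`, `MissingFieldPolicy`, and the `'warn' | 'block'` switch are hypothetical names chosen for illustration, not an existing WorkRail API.

```typescript
// Hypothetical types: field names mirror the proposed spawnSession({ context }) package.
interface WorkPackageContext {
  whyThisApproach?: string;
  alreadyTried?: string[];
  knownConstraints?: string[];
  relevantFiles?: string[];
  completionCriteria?: string;
}

// 'warn' spawns anyway and reports gaps; 'block' refuses to spawn an incomplete package.
type MissingFieldPolicy = 'warn' | 'block';

interface ValidationResult {
  allowed: boolean;
  missing: string[];
  warnings: string[];
}

const REQUIRED_FIELDS: Array<keyof WorkPackageContext> = [
  'whyThisApproach',
  'alreadyTried',
  'knownConstraints',
  'relevantFiles',
  'completionCriteria',
];

// Treats undefined, whitespace-only strings, and empty arrays as missing.
function isMissing(value: string | string[] | undefined): boolean {
  if (value === undefined) return true;
  if (Array.isArray(value)) return value.length === 0;
  return value.trim().length === 0;
}

function validateWorkPackage(
  context: WorkPackageContext,
  policy: MissingFieldPolicy,
): ValidationResult {
  const missing: string[] = REQUIRED_FIELDS.filter((field) => isMissing(context[field]));
  const warnings = missing.map((field) => `work package is missing '${field}'`);
  return { allowed: policy === 'warn' || missing.length === 0, missing, warnings };
}
```

Treating an empty `alreadyTried` list as missing is a judgment call; a real validator might accept an explicit empty list to mean "nothing tried yet".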
+**Option C: Platform-mediated context transfer (autonomous)**
+The platform automatically packages context from the spawning session into the child session. When the main agent calls `spawn_session`, the platform reads the current session's step notes and recent advances, synthesizes a context bundle, and injects it into the child's system prompt. No explicit packaging required from the main agent.
+This is the most powerful but also the most complex -- requires the platform to understand what's relevant, not just what's recent.
+**Recommended approach: B + A**
+Option B (structured work package with validation) as the primary mechanism. Option A (better main agent instructions) as a fallback. Option C as a long-term goal once the knowledge graph and session event stream are queryable enough to synthesize context automatically.
+**The `context` field in the structured package is the key addition.** Today `worktrain spawn` takes `goal`, `workflowId`, `workspacePath`. Adding a structured `context` object that the platform validates and injects gives subagents the brief they need without depending on the main agent to remember to include it.
+**Connection to knowledge graph:** Once the structural knowledge graph is built, `relevantFiles` can be auto-populated from a graph query rather than requiring the main agent to list them. The platform asks "what files are relevant to this goal?" and includes them automatically. This is how the context packaging problem gets solved at scale -- the platform knows what the subagent needs without the main agent having to enumerate it.
+**Session knowledge log (extends Option B):**
+As the main agent progresses, it continuously appends to a structured `session-knowledge.jsonl` for the session. The log is not a copy of the step notes (those are workflow artifacts) -- it is a running record of things that would matter to any agent picking up this work:
+```jsonl
+{"kind":"decision","summary":"Using execFile not exec for all subprocess calls","reason":"Shell injection risk with user-controlled content","ts":1234567890}
+{"kind":"user_pushback","summary":"User rejected the polling approach","detail":"Wants webhook-based solution instead","ts":...}
+{"kind":"relevant_file","path":"src/trigger/trigger-router.ts","why":"Core routing logic, all trigger changes flow through here","ts":...}
+{"kind":"constraint","summary":"Never modify triggers.yml autonomously","source":"daemon-soul.md","ts":...}
+{"kind":"tried_and_failed","summary":"Tried npx approach, got version mismatch","detail":"Local build is different from installed package","ts":...}
+{"kind":"external_ref","url":"https://github.com/...","why":"Design doc for the delivery pattern","ts":...}
+{"kind":"plan","path":"implementation_plan.md","summary":"3-slice plan for the feature","ts":...}
+```
+When spawning a subagent, the platform automatically includes the session knowledge log in the work package. The subagent gets the full brief without the main agent having to reconstruct it.
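One way the platform could turn the log into a brief, as a minimal sketch: the entry kinds and fields come from the example above, while `parseKnowledgeLog`, `renderBrief`, and the one-line output format are assumptions for illustration. Malformed lines (such as the elided `"ts":...` placeholders in the example) are skipped rather than failing the spawn.

```typescript
// Entry kinds and fields copied from the example log; the union itself is a hypothetical typing.
type KnowledgeEntry =
  | { kind: 'decision'; summary: string; reason?: string; ts: number }
  | { kind: 'user_pushback'; summary: string; detail?: string; ts: number }
  | { kind: 'relevant_file'; path: string; why: string; ts: number }
  | { kind: 'constraint'; summary: string; source?: string; ts: number }
  | { kind: 'tried_and_failed'; summary: string; detail?: string; ts: number }
  | { kind: 'external_ref'; url: string; why: string; ts: number }
  | { kind: 'plan'; path: string; summary: string; ts: number };

// Parses JSONL, skipping blank lines and lines that are not valid JSON
// (for example a half-written last line). No runtime schema validation is done here.
function parseKnowledgeLog(jsonl: string): KnowledgeEntry[] {
  const entries: KnowledgeEntry[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    try {
      entries.push(JSON.parse(line) as KnowledgeEntry);
    } catch {
      // Malformed line: skip it rather than failing the whole spawn.
    }
  }
  return entries;
}

const suffix = (text: string | undefined, label = ''): string => (text ? ` (${label}${text})` : '');

// Renders one entry as a single line for the subagent's brief.
function describe(entry: KnowledgeEntry): string {
  switch (entry.kind) {
    case 'decision': return `Decision: ${entry.summary}${suffix(entry.reason, 'reason: ')}`;
    case 'user_pushback': return `User pushback: ${entry.summary}${suffix(entry.detail)}`;
    case 'relevant_file': return `Relevant file: ${entry.path}${suffix(entry.why)}`;
    case 'constraint': return `Constraint: ${entry.summary}${suffix(entry.source, 'source: ')}`;
    case 'tried_and_failed': return `Tried and failed: ${entry.summary}${suffix(entry.detail)}`;
    case 'external_ref': return `Reference: ${entry.url}${suffix(entry.why)}`;
    case 'plan': return `Plan: ${entry.path}${suffix(entry.summary)}`;
    default: return 'Unrecognized knowledge entry';
  }
}

// Oldest first, so the subagent reads the session's history in order.
function renderBrief(entries: KnowledgeEntry[]): string {
  return entries
    .slice()
    .sort((a, b) => a.ts - b.ts)
    .map((entry) => `- ${describe(entry)}`)
    .join('\n');
}
```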
+**Blank subagents (intentionally uncontextualized):**
+Sometimes you explicitly DON'T want context from the main session -- fresh eyes are the point. A hypothesis challenge subagent should challenge the leading hypothesis, not be anchored to it. An adversarial reviewer should find problems without knowing the main agent thinks the approach is sound.
+The `spawn_session` call should have an explicit `context: 'inherit' | 'blank' | 'custom'` field:
+- `inherit` -- auto-package from session knowledge log (default for most tasks)
+- `blank` -- no session context injected, subagent starts fresh (for adversarial roles)
+- `custom` -- explicit structured package (for precise control)
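A minimal sketch of how a spawn call might resolve the three modes. The mode names come from the proposal; the `customContext` field is an assumption, since the proposal uses `context` both for this mode and for the structured object in the earlier `spawnSession` example.

```typescript
// Mode names come from the proposal above; the request shape is a hypothetical illustration.
type ContextMode = 'inherit' | 'blank' | 'custom';

interface SpawnRequest {
  goal: string;
  context?: ContextMode;   // omitted means 'inherit', the proposed default
  customContext?: string;  // the explicit package, required for 'custom'
}

// Returns the text to inject into the child session, or null to inject nothing.
function resolveInjectedContext(req: SpawnRequest, sessionKnowledgeBrief: string): string | null {
  const mode: ContextMode = req.context ?? 'inherit';
  if (mode === 'blank') {
    // Adversarial roles: deliberately start without the parent's framing.
    return null;
  }
  if (mode === 'custom') {
    if (req.customContext === undefined) {
      throw new Error("context 'custom' requires customContext");
    }
    return req.customContext;
  }
  return sessionKnowledgeBrief; // 'inherit'
}
```

Returning `null` for `blank`, rather than an empty string, keeps "inject nothing" distinguishable from "inherited an empty log".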
+**Subagent types with specialized system prompts and tools:**
+Different tasks need different cognitive profiles. A subagent type bundles: system prompt, available tools, and context mode:
+| Type | System prompt focus | Tools | Context |
+|------|---------------------|-------|---------|
+| `researcher` | Thorough, neutral, evidence-first | Read, Bash (read-only), Glob, Grep | inherit |
+| `challenger` | Adversarial, finds holes, challenges assumptions | Read, Bash | blank (intentionally unanchored) |
+| `implementer` | Precise, follows plans, no improvisation | Read, Write, Bash, continue_workflow | inherit |
+| `reviewer` | Finds bugs, security issues, philosophy violations | Read, Bash | blank |
+| `verifier` | Confirms claims with evidence, runs commands | Read, Bash | inherit |
+| `coordinator` | Routes work, reads event streams, dispatches | worktrain_spawn, worktrain_await | inherit |
+The type determines the system prompt variant, not just the tools. A `challenger` gets a system prompt that explicitly says "your job is to find problems, not solve them -- do not offer solutions." A `verifier` gets "do not trust claims without running the commands yourself."
+This is the WorkTrain equivalent of cognitive specialization -- different agents for different modes of thought, not just different tasks. The workflow step can specify which subagent type to spawn: `spawn_session({ type: 'challenger', goal: '...' })`.
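A sketch of the table as a typed registry, so that `spawn_session({ type, goal })` resolves the whole cognitive profile from the type alone. `SUBAGENT_PROFILES` and `planSpawn` are hypothetical names, the prompts are abbreviated, and the researcher's read-only Bash restriction is not modeled.

```typescript
// Hypothetical registry built from the subagent-type table.
type SubagentType = 'researcher' | 'challenger' | 'implementer' | 'reviewer' | 'verifier' | 'coordinator';

interface SubagentProfile {
  systemPrompt: string;
  tools: readonly string[];
  context: 'inherit' | 'blank';
}

const SUBAGENT_PROFILES: Record<SubagentType, SubagentProfile> = {
  researcher: {
    systemPrompt: 'Be thorough and neutral. Lead with evidence.',
    tools: ['Read', 'Bash', 'Glob', 'Grep'],
    context: 'inherit',
  },
  challenger: {
    systemPrompt: 'Your job is to find problems, not solve them -- do not offer solutions.',
    tools: ['Read', 'Bash'],
    context: 'blank',
  },
  implementer: {
    systemPrompt: 'Follow the plan precisely. Do not improvise.',
    tools: ['Read', 'Write', 'Bash', 'continue_workflow'],
    context: 'inherit',
  },
  reviewer: {
    systemPrompt: 'Find bugs, security issues, and philosophy violations.',
    tools: ['Read', 'Bash'],
    context: 'blank',
  },
  verifier: {
    systemPrompt: 'Do not trust claims without running the commands yourself.',
    tools: ['Read', 'Bash'],
    context: 'inherit',
  },
  coordinator: {
    systemPrompt: 'Route work, read event streams, and dispatch sessions.',
    tools: ['worktrain_spawn', 'worktrain_await'],
    context: 'inherit',
  },
};

interface SpawnPlan extends SubagentProfile {
  type: SubagentType;
  goal: string;
}

// The type picks the prompt, the tool set, and the context mode together.
function planSpawn(type: SubagentType, goal: string): SpawnPlan {
  return { type, goal, ...SUBAGENT_PROFILES[type] };
}
```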
+---
+### Workflow-scoped system prompts for subagents (Apr 17, 2026)
+**The idea:** Workflows (and individual steps within them) can declare a `systemPrompt` field that gets injected into subagent sessions spawned by that workflow step. The workflow author encodes the cognitive mode directly rather than describing it in step prose that the agent has to interpret.
+**Why this is the right layer:**
+The workflow already controls: what steps run, what tools are available, what the output contract is, what assessments are required. The cognitive mode -- how the agent should think -- is a natural extension of that. A workflow that says "run as adversarial challenger" should be able to enforce that at the platform level, not just suggest it in a prompt.
+**Two levels:**
+**1. Workflow-level `systemPrompt`** -- applies to all subagents spawned by this workflow:
+```json
+{
+  "id": "mr-review-workflow.agentic.v2",
+  "systemPrompt": "You are an adversarial code reviewer. Your job is to find problems, not validate the approach. Do not offer solutions -- only surface issues with evidence. Treat every claim as unproven until you verify it yourself.",
+  "steps": [...]
+}
+```
+**2. Step-level `systemPrompt`** -- overrides the workflow-level prompt for a specific step:
+```json
+{
+  "id": "phase-hypothesis-challenge",
+  "systemPrompt": "You are a devil's advocate. For every assumption in the hypothesis, find the strongest counterargument. Do not be balanced -- be adversarial.",
+  "prompt": "Challenge the leading hypothesis..."
+}
+```
+**How it composes with the base system prompt:**
+The final subagent system prompt is assembled in layers:
+1. WorkTrain base prompt (execution contract, oracle priority, tools)
+2. Workflow-level `systemPrompt` (cognitive mode for this workflow)
+3. Step-level `systemPrompt` (cognitive override for this step)
+4. Soul file (operator behavioral rules)
+5. AGENTS.md / workspace context
+6. Session knowledge log (inherited context, if `context: 'inherit'`)
+7. Step prompt (the actual work instruction)
+The workflow author controls layers 2-3. The operator controls layer 4. The platform assembles 1 and 5-7 automatically. Clear separation of concerns.
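The layering can be sketched as a single assembly function. This assumes that an absent or blank layer is simply skipped and that layers are joined with blank lines; both are illustrative choices, and `PromptLayers` / `assembleSystemPrompt` are hypothetical names.

```typescript
// Hypothetical input shape; one field per layer in the list above.
interface PromptLayers {
  basePrompt: string;             // 1. WorkTrain base prompt
  workflowSystemPrompt?: string;  // 2. workflow-level systemPrompt
  stepSystemPrompt?: string;      // 3. step-level systemPrompt (overrides 2)
  soul?: string;                  // 4. soul file (operator rules)
  workspaceContext?: string;      // 5. AGENTS.md / workspace context
  sessionKnowledge?: string;      // 6. only set when context is 'inherit'
  stepPrompt: string;             // 7. the step's work instruction
}

function assembleSystemPrompt(layers: PromptLayers): string {
  // "Overrides": a step-level prompt replaces the workflow-level one instead of stacking on it.
  const cognitiveMode = layers.stepSystemPrompt ?? layers.workflowSystemPrompt;
  const ordered: Array<string | undefined> = [
    layers.basePrompt,
    cognitiveMode,
    layers.soul,
    layers.workspaceContext,
    layers.sessionKnowledge,
    layers.stepPrompt,
  ];
  // Skip absent or blank layers and separate the rest with blank lines.
  return ordered
    .filter((part): part is string => part !== undefined && part.trim() !== '')
    .join('\n\n');
}
```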
+**This also enables the subagent type system** (from the previous backlog entry) to be workflow-driven rather than call-site-driven. Instead of `spawn_session({ type: 'challenger' })`, the workflow step that spawns a challenger simply declares `systemPrompt: "you are adversarial..."` -- the cognitive mode travels with the workflow definition, not the spawn call.
+**Schema addition:**
+```typescript
+interface WorkflowDefinition {
+  systemPrompt?: string;  // workflow-level, injected into all subagent sessions
+  steps: WorkflowStep[];
+}
+interface WorkflowStep {
+  systemPrompt?: string;  // step-level, overrides workflow-level for this step
+  prompt: string;
+  // ...existing fields
+}
+```
+**Authoring implication:** The `workflow-for-workflows` meta-workflow should guide authors to write cognitive mode as `systemPrompt` rather than embedding it in `prompt` prose. "What mode should the agent be in?" is a structural question, not a content question.
+---
+### Console as the unified WorkRail dashboard -- standalone, file-reading, zero coupling (Apr 18, 2026)
+**The insight:** The console is the unified view of all WorkRail activity -- whether sessions were started by the autonomous daemon or by a human working interactively through the MCP server. It doesn't care how a session was created. It reads the same session store either way.
+The console doesn't need a live connection to either the daemon or the MCP server; it reads files. The current architecture, where the console is owned by whichever process wins a port election, is wrong -- it's a legacy of when the MCP server was the only long-running process.
+**Target architecture -- zero coupling:**
+```
+Daemon          → writes ~/.workrail/data/sessions/
+                → writes ~/.workrail/events/daemon/
+                → serves :3200 (webhooks only)
+MCP server      → reads/writes session store (same files as daemon)
+                → serves :3100 (Claude Code bridge only)
+Console         → reads ~/.workrail/data/sessions/ (file watch, not HTTP)
+                → reads ~/.workrail/events/daemon/ (file watch)
+                → reads git for PR/commit context
+                → serves :3456 (browser UI only)
+                → `worktrain console` -- fully standalone binary
+```
+**No startup coordination. No lock files. No port election. No coupling.**
+The console works whether the daemon is running or not, whether the MCP server is running or not. Start it once, leave it running permanently. It shows whatever is in the files.
+**How it gets live updates without HTTP:** FSEvents (macOS) / inotify (Linux) file watching on the session store and daemon event stream. When a new event is appended, the console picks it up within milliseconds and pushes to the browser via SSE -- same latency as today, no polling, no HTTP connection to the daemon required.
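The file-watch trigger itself would come from a platform API such as Node's `fs.watch` (backed by FSEvents or inotify). The sketch below covers only the part that is easy to get wrong: turning appended text into complete JSONL events and Server-Sent Events frames. `JsonlTail` and `toSseFrame` are hypothetical names, and the sketch assumes chunks arrive as already-decoded strings; a byte-level reader would need a streaming decoder such as `TextDecoder` with `{ stream: true }`.

```typescript
// Buffers partial writes: an append may end mid-line, so only complete lines are emitted.
class JsonlTail {
  private pending = '';

  // Feed newly appended text (e.g. what was read from the last known file offset).
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    this.pending = lines.pop() ?? ''; // last element is an incomplete line, or ''
    const events: unknown[] = [];
    for (const line of lines) {
      if (line.trim() === '') continue;
      try {
        events.push(JSON.parse(line));
      } catch {
        // Skip a corrupt line instead of stalling the stream.
      }
    }
    return events;
  }
}

// SSE framing: optional "event:" line, one "data:" line (JSON.stringify emits no newlines),
// and a blank line terminating the event.
function toSseFrame(event: unknown, eventName?: string): string {
  const data = JSON.stringify(event);
  return (eventName ? `event: ${eventName}\n` : '') + `data: ${data}\n\n`;
}
```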
+**The `worktrain console` command:**
+```bash
+worktrain console              # start on default port 3456
+worktrain console --port 4000  # custom port
+worktrain console --workspace ~/git/myproject  # workspace-scoped view
+```
+**Migration:** Remove console startup from both the daemon command and the MCP server startup. The primary election logic (`DashboardLock`, `bindWithPortFallback`) becomes unnecessary. The `DaemonConsole` module in `src/trigger/daemon-console.ts` becomes `src/console/standalone-console.ts` with a simpler interface.
+**Why this matters:** Today the console goes down whenever the MCP server crashes. With this architecture, the console is as stable as the filesystem. The daemon crashing doesn't affect the console. The MCP server crashing doesn't affect the console. The only thing that can take down the console is killing the `worktrain console` process itself.
+---
+## WorkTrain sprint: Apr 17-18, 2026 -- shipped and current state
+### What shipped (Apr 17-18)
+**Daemon stabilization:**
+- ✅ `report_issue` tool -- agents call this instead of dying silently; structured JSON written to `~/.workrail/issues/<sessionId>.jsonl`, event emitted to daemon stream, WORKTRAIN_STUCK marker in `WorkflowRunResult`
+- ✅ Richer `BASE_SYSTEM_PROMPT` -- baked-in behavioral principles (oracle hierarchy, self-directed reasoning, workflow-as-contract, silent failure policy) rather than relying on soul file alone
+- ✅ `/bin/bash` for Bash tool -- process substitution `<(...)` and other bash-specific syntax now works
+- ✅ `DaemonEventEmitter` -- structured event stream at `~/.workrail/events/daemon/YYYY-MM-DD.jsonl`
+- ✅ Self-configuration -- `triggers.yml`, upgraded `daemon-soul.md` (WorkRail-specific rules + coding philosophy), `AGENTS.md` WorkTrain section
+**Workflow library:**
+- ✅ mr-review v2.6 -- `philosophy_alignment` reviewer family; scoped philosophy extraction in fact packet; 7th coverage domain; "is this the right design?" framing
+- ✅ wfw v2.5 -- phases 2 and 3 split into dedicated prep-step design steps (2a/2b, 3a/3b); principle: assessments need dedicated prep steps, not on-the-fly evidence gathering
+- ✅ Clean workflow display names across library (removed `v2 •`, `Lean •`, etc.)
+- ✅ `philosophy.mdc` created at `~/.firebender/commands/philosophy.mdc` -- MR review subagents now evaluate findings against coding philosophy
+**Integrations and infrastructure:**
+- ✅ GitLab polling triggers fully merged (#404) -- zero-webhook MR polling
+- ✅ TS6 forward-compat tsconfig fixes (#401) -- unblocks TypeScript 6 dep bumps
+- ✅ Standalone console spec -- `worktrain console` as independent file-reading binary, zero coupling to daemon or MCP server
+---
+### Current state (Apr 18, 2026)
+**What works:**
+- Daemon runs autonomously on webhook triggers
+- Sessions advance through full workflow steps
+- Console at `:3456` when daemon starts before MCP server
+- Daemon event stream logging every tool call
+- GitLab + GitHub polling (no webhooks needed)
+- Philosophy-aligned MR reviews
+- `report_issue` tool available to agents
+**Known issues / active bugs:**
+1. **Daemon killed by MCP server reconnects** (CRITICAL) -- the daemon and MCP server share process infrastructure via the bridge mechanism. When Claude Code reconnects and a new MCP server process starts, it displaces the running daemon. The daemon must be run from a separate terminal or as a `launchd` service to survive MCP reconnects. Root fix: decouple daemon from the MCP server process tree entirely.
+2. **Console unstable** -- the console port (3456) is contested between daemon and MCP server. Whoever starts first wins. When the MCP server reconnects, it takes the port and the daemon console goes down. Root fix: standalone `worktrain console` binary (spec in backlog).
+3. **`workflow_not_found` on first test** -- the trigger used `coding-task-workflow-agentic.lean.v2` (filename) instead of `coding-task-workflow-agentic` (workflow ID). Fixed in triggers.yml. This is a symptom of workflow ID vs. filename confusion -- worth a validator that catches it at `worktrain daemon` startup.
+4. **Session left with 0 advances when the daemon crashes** -- if the daemon dies mid-Phase-0 (before any `continue_workflow` call), the session is orphaned at `observation_recorded(8)` with 0 advances and no output, and there is no automatic recovery: crash recovery reads the daemon-session token file but can't resume a session that never advanced. No fix yet.
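A minimal sketch of the startup validator suggested in item 3, assuming the daemon can list each loaded workflow's ID and source filename. `validateTriggerWorkflowIds`, `TriggerDef`, and `WorkflowInfo` are hypothetical names. The useful detail is the hint: if a trigger's `workflowId` matches a filename stem, the error can name the likely intended ID instead of a bare "not found".

```typescript
// Hypothetical inputs: triggers from triggers.yml and the loaded workflow library.
interface TriggerDef { name: string; workflowId: string; }
interface WorkflowInfo { id: string; fileName: string; }
interface WorkflowIdProblem { trigger: string; workflowId: string; hint?: string; }

function validateTriggerWorkflowIds(triggers: TriggerDef[], workflows: WorkflowInfo[]): WorkflowIdProblem[] {
  const ids = new Set(workflows.map((w) => w.id));
  // Filename stem (without .json) -> workflow ID, to detect a filename used as an ID.
  const idByStem = new Map(workflows.map((w) => [w.fileName.replace(/\.json$/, ''), w.id] as const));
  const problems: WorkflowIdProblem[] = [];
  for (const t of triggers) {
    if (ids.has(t.workflowId)) continue;
    const suggestion = idByStem.get(t.workflowId);
    problems.push({
      trigger: t.name,
      workflowId: t.workflowId,
      hint: suggestion ? `'${t.workflowId}' looks like a filename; did you mean workflow ID '${suggestion}'?` : undefined,
    });
  }
  return problems;
}
```

A daemon using this would refuse to start when the returned list is non-empty, turning a silent dispatch-time `workflow_not_found` into a startup error.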
+---
+### Next priorities (groomed Apr 18)
+**Tier 1 -- Must fix for reliable autonomous operation:**
+1. **Daemon as a launchd service** -- run daemon outside Claude Code's process tree so MCP reconnects can't kill it. `worktrain daemon --install` creates a launchd plist and starts it.
+2. **Standalone `worktrain console`** -- file-watching binary independent of daemon/MCP. Zero coupling. Spec in backlog.
+3. **Workflow ID validation at startup** -- `workrail daemon` should validate that all `workflowId` values in triggers.yml resolve to real workflows before starting, not fail silently at dispatch time.
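For Tier 1 item 1, a sketch of the plist that `worktrain daemon --install` might write (typically under `~/Library/LaunchAgents/`) and then load with `launchctl`. `Label`, `ProgramArguments`, `RunAtLoad`, `KeepAlive`, `StandardOutPath`, and `StandardErrorPath` are standard launchd keys; the function name, label value, and paths are assumptions left to the caller. launchd generally expects an absolute program path, since it does not use the interactive shell's `PATH`.

```typescript
// Hypothetical generator; values are supplied by the installer.
interface LaunchdOptions {
  label: string;               // e.g. a reverse-DNS name chosen by the installer
  programArguments: string[];  // e.g. [absolute path to worktrain, 'daemon']
  stdoutPath: string;
  stderrPath: string;
}

function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function renderLaunchdPlist(opts: LaunchdOptions): string {
  const str = (v: string): string => `<string>${escapeXml(v)}</string>`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    '<dict>',
    `  <key>Label</key>${str(opts.label)}`,
    '  <key>ProgramArguments</key>',
    '  <array>',
    ...opts.programArguments.map((arg) => `    ${str(arg)}`),
    '  </array>',
    '  <key>RunAtLoad</key><true/>',
    // KeepAlive asks launchd to restart the daemon if it exits, outside any Claude Code process tree.
    '  <key>KeepAlive</key><true/>',
    `  <key>StandardOutPath</key>${str(opts.stdoutPath)}`,
    `  <key>StandardErrorPath</key>${str(opts.stderrPath)}`,
    '</dict>',
    '</plist>',
    '',
  ].join('\n');
}
```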
+**Tier 2 -- Workflow quality:**
+4. **mr-review prep steps** -- the audit identified missing dedicated prep steps for philosophy extraction, pattern baseline, and design decision reconstruction. These are described in the backlog but not yet in the workflow JSON. wfw v2.5 guides new workflows to add them; the mr-review workflow itself still needs a v2.7 pass to implement them.
+5. **Autonomous workflow variants** -- audit `requireConfirmation` gates across all workflows; confirm daemon's `autonomy: full` setting correctly bypasses the right ones.
+**Tier 3 -- Features:**
+6. **`worktrain spawn` / `worktrain await`** -- already merged, needs real-world test
+7. **Auto-commit from handoff artifact** -- merged but untested end-to-end
+8. **Session knowledge log** -- continuous context accumulation for subagent packaging
+9. **TypeScript 6 dep bump** -- tsconfig fixes are in (#401), unblocks #244 and #231
+**Open PRs (only dep bumps remain):**
+- #330, #287, #288 -- vitest 4 + vite 8 (major version, needs testing)
+- #244, #231 -- TypeScript 6.0.2 (now unblocked by #401)
+---
+### Duplicate task detection: prevent agents from doing the same work twice (Apr 18, 2026)
+**The problem:** with multiple agents running concurrently and a persistent work queue, it's easy to accidentally start two agents on the same task -- especially when the queue drains items from external sources (GitHub issues, Jira) that may be added again after a sync. Today, two agents can independently pick up the same issue, do the same investigation, and open duplicate PRs.
+**Detection sources:**
+1. **Open PRs**: before starting any coding task, check `gh pr list --state open` -- if a PR already exists that addresses the same issue/goal, skip it
+2. **Active sessions**: the session store knows which workflows are currently running and what their goals are; a new dispatch can check for semantic overlap before starting
+3. **Queue deduplication**: the work queue should deduplicate by external item ID (GitHub issue number, Jira ticket key) so the same item can't be enqueued twice
+4. **Session history**: before starting an investigation, check recent session notes for the same workflowId + goal combination -- if it was completed in the last 24 hours with a successful result, skip or ask the user
+**Implementation approach:**
+- Queue-level dedup is the simplest and most reliable: each queue item from an external source carries its `sourceId` (e.g. `github:EtienneBBeaulac/workrail:issues:123`). On enqueue, check if `sourceId` already exists in the queue (pending or active) -- if so, skip with a log.
+- PR-level dedup: before `worktrain spawn` dispatches a coding task, run `gh pr list --search "<issue title keywords>"` and check for matches. If found, add to outbox ("task already in progress as PR #X") and skip.
+- Session-level dedup: the coordinator script checks active session goals before spawning a new one with the same goal text.
+**The classify-task-workflow role:** when a task is classified, it can also output a `deduplicationKey` (e.g. `fix:trigger-store:error-kind-consistency`) that is stored with the queue item. Queue items with the same key are considered duplicates.
+**What makes this hard:** semantic dedup (two tasks described differently but solving the same problem) requires embedding-based similarity, not exact match. For MVP, exact `sourceId` match + approximate PR title search is sufficient. Semantic dedup is a post-knowledge-graph feature.
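The MVP queue-level dedup can be sketched as below, assuming queue items carry the proposed `sourceId` and optional `deduplicationKey`. `WorkQueue` and `EnqueueResult` are hypothetical names. Only pending or active items block a new enqueue, matching the "(pending or active)" rule above, so finished work can legitimately recur.

```typescript
// Hypothetical queue item; sourceId follows the format above, e.g. 'github:owner/repo:issues:123'.
interface QueueItem {
  sourceId?: string;          // identity of the external item (GitHub issue, Jira ticket, ...)
  deduplicationKey?: string;  // optional key emitted by classify-task-workflow
  goal: string;
  status: 'pending' | 'active' | 'done';
}

type EnqueueResult = { enqueued: true } | { enqueued: false; reason: string };

class WorkQueue {
  private items: QueueItem[] = [];

  enqueue(item: QueueItem): EnqueueResult {
    const live = this.items.filter((i) => i.status !== 'done');
    if (item.sourceId && live.some((i) => i.sourceId === item.sourceId)) {
      return { enqueued: false, reason: `duplicate sourceId ${item.sourceId}` };
    }
    if (item.deduplicationKey && live.some((i) => i.deduplicationKey === item.deduplicationKey)) {
      return { enqueued: false, reason: `duplicate deduplicationKey ${item.deduplicationKey}` };
    }
    this.items.push(item);
    return { enqueued: true };
  }

  size(): number {
    return this.items.length;
  }
}
```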
+---
+### Agent actions as first-class events in the session event log (Apr 18, 2026)
+**The vision:** the console should be able to reconstruct exactly what an agent did in a session -- every tool call, every argument, every result, every decision -- by reading the event log alone. No log files, no stdout parsing, no separate monitoring infrastructure. The session event store IS the audit trail.
+**What's already in the event log:**
+- `session_created`, `run_started`, `run_completed`
+- `node_created`, `edge_created`, `advance_recorded`
+- `node_output_appended` (step notes)
+- `preferences_changed`, `context_set`, `observation_recorded`
+**What's missing -- agent-level actions:**
+- `tool_call_started` -- which tool was called, with what arguments, at what timestamp
+- `tool_call_completed` -- result (truncated), duration, success/error
+- `llm_turn_started` -- model, token count estimate, step context
+- `llm_turn_completed` -- stop reason, output tokens, whether steer() was injected
+- `steer_injected` -- what context was injected and why (session recap, workspace context)
+- `report_issue_recorded` -- the structured issue from the `report_issue` tool
+- `worktrain_stuck` -- when WORKTRAIN_STUCK marker is emitted
+**Why this matters:**
+Today the `DaemonEventEmitter` writes to `~/.workrail/events/daemon/YYYY-MM-DD.jsonl` separately from the session store. That's two places to look -- and they're not correlated to specific sessions. Putting agent actions into the session event log means:
+- Console can show a session timeline: "Phase 0: called `bash` 3 times (12ms, 8ms, 45ms) → called `read` 2 times → advanced to Phase 1"
+- The proof record (verification chain spec) can link specific tool calls to assessment gate evidence
+- Crash recovery knows exactly where in the agent's execution it died
+- The knowledge graph can be updated from session events without re-reading step notes
+**The event schema (additions to the existing event store format):**
+```typescript
+// Tool call lifecycle
+{ kind: 'tool_call_started', tool: 'bash', args: { command: 'git status' }, nodeId, ts }
+{ kind: 'tool_call_completed', tool: 'bash', durationMs: 45, exitCode: 0, resultSummary: '...', nodeId, ts }
+{ kind: 'tool_call_failed', tool: 'bash', durationMs: 45, error: 'ENOENT', nodeId, ts }
+// LLM turn lifecycle
+{ kind: 'llm_turn_started', model: 'claude-sonnet-4-6', inputTokens: 12000, nodeId, ts }
+{ kind: 'llm_turn_completed', stopReason: 'tool_use', outputTokens: 450, toolsRequested: ['bash'], nodeId, ts }
+// Steer injection
+{ kind: 'steer_injected', reason: 'session_recap', contentLength: 800, nodeId, ts }
+// Agent self-reporting
+{ kind: 'report_issue_recorded', severity: 'warning', summary: '...', sessionId, ts }
+```
+**Where to emit them:**
+- In `src/daemon/agent-loop.ts` -- before and after each `tool.execute()` call, before and after each LLM call
+- In `src/daemon/workflow-runner.ts` -- for steer injection and report_issue recording
+- Use the existing `V2ToolContext` session store to append events (same mechanism as `continue_workflow` and `start_workflow`)
+**Console rendering:**
+Each session detail view gets a "Timeline" tab alongside "Steps" and "Notes":
+```
+Phase 0: Understand & Classify         [2m 14s]
+  ├── llm_turn              450 tokens → 3 tool calls
+  ├── bash: git status                    45ms ✓
+  ├── bash: gh pr list                   180ms ✓
+  ├── read: AGENTS.md                      8ms ✓
+  └── llm_turn              280 tokens → advance
+Phase 1a: State Hypothesis              [0m 38s]
+  ├── llm_turn              310 tokens → advance
+  ...
+```
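A sketch of how the Timeline tab could derive its rows from the proposed events. The types are a subset of the schema example (only the fields needed for rendering), and `renderNodeTimeline` is a hypothetical name. Because the proposed schema has no call ID, this pairs each completion with the oldest unmatched start of the same tool; that is only exact if same-tool calls finish in start order, which suggests a call ID would be worth adding to the schema.

```typescript
// Typed subset of the proposed event kinds.
type AgentEvent =
  | { kind: 'tool_call_started'; tool: string; args: Record<string, unknown>; nodeId: string; ts: number }
  | { kind: 'tool_call_completed'; tool: string; durationMs: number; nodeId: string; ts: number }
  | { kind: 'tool_call_failed'; tool: string; durationMs: number; error: string; nodeId: string; ts: number }
  | { kind: 'llm_turn_completed'; outputTokens: number; toolsRequested: string[]; nodeId: string; ts: number };

type StartEvent = Extract<AgentEvent, { kind: 'tool_call_started' }>;

// Shows the first string argument, e.g. { command: 'git status' } -> 'git status'.
function argSummary(args: Record<string, unknown>): string {
  const keys = Object.keys(args);
  const first = keys.length > 0 ? args[keys[0]] : undefined;
  return typeof first === 'string' ? first : '';
}

// Renders one step node's events as timeline rows, oldest first.
function renderNodeTimeline(events: AgentEvent[], nodeId: string): string[] {
  const openStarts: StartEvent[] = [];
  const rows: string[] = [];
  const nodeEvents = events.filter((e) => e.nodeId === nodeId).sort((a, b) => a.ts - b.ts);
  for (const e of nodeEvents) {
    if (e.kind === 'tool_call_started') {
      openStarts.push(e);
    } else if (e.kind === 'tool_call_completed' || e.kind === 'tool_call_failed') {
      const tool = e.tool;
      const idx = openStarts.findIndex((s) => s.tool === tool); // FIFO pairing
      const label = idx >= 0 ? argSummary(openStarts.splice(idx, 1)[0].args) : '';
      const mark = e.kind === 'tool_call_completed' ? '✓' : `✗ ${e.error}`;
      rows.push(`${tool}: ${label} ${e.durationMs}ms ${mark}`);
    } else {
      const n = e.toolsRequested.length;
      rows.push(`llm_turn ${e.outputTokens} tokens → ${n} tool call${n === 1 ? '' : 's'}`);
    }
  }
  return rows;
}
```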
+**Relationship to DaemonEventEmitter:**
+The existing `DaemonEventEmitter` (written in #498) writes to a separate daily log file. Once agent actions are first-class session events, the daemon event emitter can be simplified or removed -- the session event log is the canonical record. The console reads session events, not daemon event files.
+**Build order:**
+1. Add `tool_call_started`/`tool_call_completed` events to `agent-loop.ts` -- smallest change, highest value
+2. Add `llm_turn_started`/`llm_turn_completed` events
+3. Console Timeline tab reads and renders the new event kinds
+4. Wire `report_issue_recorded` and `steer_injected` events
+5. Deprecate `DaemonEventEmitter` once console reads from session events
+---
+### FatalToolError: distinguish recoverable from non-recoverable tool failures (follow-up from PR #523)
+The blanket try/catch in `AgentLoop._executeTools()` converts ALL tool throws to `isError` tool_results. This is correct for Bash/Read/Write, where the LLM can see the error and retry. It is potentially wrong for `continue_workflow` failures, because an LLM retrying with a broken token just loops. The discovery agent proposed a `FatalToolError` subclass: tools throw `FatalToolError` for non-recoverable errors (session corruption, bad tokens) and plain `Error` for recoverable failures. `_executeTools` catches plain `Error` and returns an `isError` result; `FatalToolError` propagates and kills the session. Combined with the `DEFAULT_MAX_TURNS` cap (PR followup), this provides defense-in-depth.
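Here is a minimal sketch of the proposed split. It assumes a simplified tool-call shape: `ToolCall`, `ToolResult`, and `executeTools` are illustrative stand-ins, not the real `AgentLoop` types. The tools run sequentially here for clarity.

```typescript
// Thrown by tools for non-recoverable failures (session corruption, bad tokens).
class FatalToolError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FatalToolError";
    // Keep `instanceof` working when compiling to ES5 targets.
    Object.setPrototypeOf(this, FatalToolError.prototype);
  }
}

// Hypothetical, simplified shapes for illustration only.
interface ToolCall { id: string; run: () => Promise<string> }
interface ToolResult { toolUseId: string; content: string; isError: boolean }

async function executeTools(calls: ToolCall[]): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) {
    try {
      results.push({ toolUseId: call.id, content: await call.run(), isError: false });
    } catch (err) {
      // Non-recoverable: propagate so the session ends instead of looping on retries.
      if (err instanceof FatalToolError) throw err;
      // Recoverable: hand the error back to the LLM as an isError tool_result.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ toolUseId: call.id, content: message, isError: true });
    }
  }
  return results;
}
```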
+---
+### Worktree lifecycle management: automatic cleanup and inventory (Apr 18, 2026)
+**The problem:** every WorkTrain agent that uses `--isolation worktree` leaves a worktree on disk after completion. With 10 concurrent agents running all day, this accumulated to 69 worktrees in `.claude/worktrees/`, triggering hundreds of simultaneous `git status` processes that saturated the CPU.
+**What's needed:**
+1. **Automatic cleanup on session end** -- when a WorkTrain session completes (success or failure), the daemon automatically runs `git worktree remove <path> --force` for the session's worktree. If the branch is already merged to main, also delete the local branch ref.
+2. **Startup pruning** -- `workrail daemon` startup runs `git worktree prune` in each configured workspace before starting the trigger listener.
+3. **`worktrain worktree list`** -- shows all WorkTrain-managed worktrees: path, branch, session ID, age, whether the branch is merged.
+4. **`worktrain worktree clean`** -- removes all worktrees whose branches are merged to main, or older than N days. Dry-run mode by default.
+5. **`worktrain worktree status`** -- summary: how many worktrees, total disk usage, any stale ones.
+6. **Never use main as a worktree** (already in backlog) -- enforced at worktree creation time, not just as a rule.
+**Root cause of the CPU spike:** 69 worktrees × repeated `git status --short` from tools/IDE plugins = hundreds of concurrent git processes. Each `git status` on a large repo with many untracked files is CPU-intensive.
+**Mitigation already in place:** `--isolation worktree` creates branches named `worktree-agent-<id>` -- these are identifiable and bulk-deletable. The daemon's `runStartupRecovery()` could also prune them.
+**Build order:** startup pruning (trivial, high value) → automatic cleanup on session end → `worktrain worktree` CLI commands.
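The sketch below covers the inventory and clean steps. It assumes cleanup is driven by `git worktree list --porcelain` output and the `worktree-agent-<id>` branch prefix. `planClean` and the set of merged branches are hypothetical inputs, and age-based cleanup is omitted. The function returns git argv arrays instead of executing them, so dry-run is the default.

```typescript
interface WorktreeEntry { path: string; branch: string | null }

// Parse `git worktree list --porcelain`: one record per worktree, separated by blank lines.
function parseWorktrees(porcelain: string): WorktreeEntry[] {
  return porcelain
    .split(/\n\s*\n/)
    .map((block) => block.split("\n").filter((line) => line.length > 0))
    .filter((lines) => lines.length > 0 && lines[0].startsWith("worktree "))
    .map((lines) => {
      const branchLine = lines.find((line) => line.startsWith("branch "));
      return {
        path: lines[0].slice("worktree ".length),
        // Detached or bare worktrees have no `branch` line.
        branch: branchLine ? branchLine.replace(/^branch (refs\/heads\/)?/, "") : null,
      };
    });
}

// WorkTrain agent worktrees are identifiable by their branch-name prefix.
const isAgentWorktree = (e: WorktreeEntry): boolean =>
  e.branch !== null && e.branch.startsWith("worktree-agent-");

// Plan (do not run) the git commands for a clean pass over merged agent branches.
// Each entry could be run with child_process.execFileSync("git", argv, { cwd: repo }).
function planClean(entries: WorktreeEntry[], mergedBranches: Set<string>): string[][] {
  const plan: string[][] = [["worktree", "prune"]];
  for (const e of entries.filter(isAgentWorktree)) {
    if (e.branch === null || !mergedBranches.has(e.branch)) continue;
    plan.push(["worktree", "remove", "--force", e.path]);
    // Delete the branch only after the worktree that has it checked out is gone.
    plan.push(["branch", "-d", e.branch]);
  }
  return plan;
}
```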
+---
+### Simplify MCP server: remove primary election, bridge, and HTTP serving (architectural cleanup)
+**The core insight:** the bridge/primary-election system exists solely to solve "only one process should serve the console UI on port 3456." Now that `worktrain console` is a standalone file-watching binary (PR #512), that problem is already solved. The entire bridge/election system can be removed.
+**What "allow multiple MCP processes" means in practice:**
+- Each Claude Code window gets its own MCP server -- no port contention, no primary election, no bridge reconnect cycles
+- MCP server becomes pure stdio: starts, handles tools, exits. Nothing async needs to write after the pipe closes -- EPIPE is irrelevant.
+- Session store is append-only JSONL per-session -- multiple processes writing different sessions cannot corrupt each other
+- `worktrain console` aggregates all sessions from the file store regardless of how many MCP servers ran
+**What to remove:**
+- `DashboardLock` / `tryBecomePrimary()` / `bindWithPortFallback()` -- the entire primary election system
+- `bridge-entry.ts` -- removing the bridge also removes its spawn storm and reconnect drama
+- `HttpServer` starting as part of the MCP server -- console owns HTTP, not MCP
+**What remains for the MCP server:** pure stdio MCP protocol + session engine. No HTTP, no port binding, no lock files. Starts instantly, exits cleanly.
+**Why this is safe:**
+- Tokens are session-scoped UUIDs -- two servers cannot share a session
+- Append-only JSONL has no exclusive file locks
+- ~50MB per process × 3 Claude Code windows = 150MB -- acceptable
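The next sketch shows why per-session append-only JSONL tolerates several MCP processes at once. It assumes one file per session and a simple event shape; `SessionEvent` is hypothetical, not the real WorkRail schema. Each writer appends only whole lines to its own session's file. A reader skips an incomplete trailing line left by a writer that is mid-append.

```typescript
// Hypothetical event shape; the real session event schema has more fields.
interface SessionEvent { sessionId: string; kind: string; ts: number }

// Writers append exactly one newline-terminated JSON object per event.
function encodeEvent(e: SessionEvent): string {
  return JSON.stringify(e) + "\n";
}

// Readers keep only newline-terminated records; the final split element is
// either "" (file ends in "\n") or a partial record still being written.
function decodeEvents(fileText: string): SessionEvent[] {
  return fileText
    .split("\n")
    .slice(0, -1)
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}
```

In Node, the writer side could be a single `fs.appendFileSync(sessionPath, encodeEvent(e))` per event.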
+**The bridge complexity was always a band-aid.** It was the right solution when the MCP server also owned the console UI. With the standalone console, the band-aid can come off and the system becomes dramatically simpler and more reliable.
+**Build order:** extract `worktrain console` fully (done) → remove HttpServer from MCP startup → remove bridge → remove DashboardLock/primary election → MCP server is pure stdio.
+---
+### Agent-engine communication: first principles design (Apr 18, 2026)
+**The setup for this conversation:**
+Three discovery agents investigated whether the daemon should continue using MCP-style tool calls for workflow control (`continue_workflow`). Their findings:
+- **Discovery 1**: Tool calls are fine; enrich `continue_workflow` with `artifacts` now, explore structured output hybrid later pending Bedrock verification. ~225 tokens/request saved with hybrid.
+- **Discovery 2**: `complete_step` tool -- daemon owns transitions, continueToken hidden from LLM, notes required at type level. Cleaner DX without paradigm shift.
+- **Discovery 3**: The field has converged on tool calls. OpenAI Agents SDK, LangGraph, Temporal, Vercel AI SDK all use tool calls for workflow control. WorkRail's `continue_workflow` with HMAC tokens is already field-standard or better.
+**User's response to "the field has converged on tool calls":**
+> "Right, but do we want industry standards? Aren't we trying to build something special? What if there is better?"
+This is the right question. "Field convergence" is a description of where everyone ended up starting from the MCP/function-calling paradigm -- not proof that it's optimal. Every system surveyed treats the workflow engine as external infrastructure the agent calls into. WorkRail is different: **the daemon IS the workflow engine**. The agent loop and the step sequencer run in the same process, sharing the same DI container. Tool calls are a network-origin concept -- they exist because there's an LLM over there and an executor over here. WorkRail doesn't have that constraint.
+---
+#### First-principles alternatives (unexplored territory)
+These were not in any of the discovery agents' outputs -- they emerge from the insight that WorkRail owns both sides of the conversation:
+**1. Structured response parsing (no tool call for workflow control)**
+The agent outputs a structured response at the end of each turn. The daemon parses it. The LLM never "calls a tool" to advance -- it produces a well-structured output and the daemon acts on it. The continueToken and workflow machinery are completely invisible to the LLM. Example: agent outputs `{"step_complete": true, "notes": "...", "artifacts": [...]}` as its final text, daemon detects this and advances.
+**2. Implicit advancement (criteria-based)**
+The daemon watches what the agent produces (file writes, bash outcomes, notes) and decides when to advance -- the agent never explicitly signals "I'm done." The workflow step has completion criteria, and the daemon evaluates them against the agent's cumulative output. More like a CI pipeline (tests pass = done) than an API call. The agent just works; the daemon decides when the step is complete.
+**3. Declarative intent + daemon execution**
+The agent outputs what it *wants* to happen: "I want to commit these files with this message and advance to the next step." The daemon executes. Same as the scripts-over-agent principle applied to the agent's own workflow control -- the agent declares intent, scripts execute. No tool call for the mechanical parts.
+**4. Streaming judgment**
+The daemon reads the agent's streaming response in real-time, extracts notes and artifacts as they appear, and makes the advance decision before the agent "finishes." No explicit signal from the agent. The daemon monitors and decides.
+**5. Separation of concerns: tools for world, declaration for workflow**
+Keep tool calls for external actions (Bash, Read, Write) -- these genuinely need interleaved execution and result reasoning. But workflow control (advance, submit artifacts, set context) uses a different mechanism entirely: structured response, implicit detection, or a single lightweight declaration. The protocol distinction: tools are for I/O, declarations are for state.
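The daemon-side parse for alternative 1 (and the declaration half of alternative 5) can be small. This sketch assumes the agent ends its final text with a JSON object shaped like the `{"step_complete": ...}` example. `parseStepDeclaration` is hypothetical, and the last complete JSON object in the text wins.

```typescript
interface StepDeclaration { step_complete: boolean; notes: string; artifacts: unknown[] }

function isStepDeclaration(v: unknown): v is StepDeclaration {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return typeof o.step_complete === "boolean" && typeof o.notes === "string" && Array.isArray(o.artifacts);
}

// JSON.parse never yields undefined, so undefined signals "not valid JSON".
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Scan "{" positions from the end until a suffix parses as JSON, then check its shape.
function parseStepDeclaration(finalText: string): StepDeclaration | null {
  let i = finalText.lastIndexOf("{");
  while (i >= 0) {
    const parsed = tryParse(finalText.slice(i).trim());
    if (parsed !== undefined) return isStepDeclaration(parsed) ? parsed : null;
    i = i === 0 ? -1 : finalText.lastIndexOf("{", i - 1);
  }
  return null;
}
```

A `null` result lets the daemon re-prompt the agent instead of advancing.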
+---
+#### What makes this hard
+These alternatives trade off in important ways:
+- **Structured response parsing**: requires reliable structured output from the LLM, which can fail without explicit enforcement
+- **Implicit advancement**: requires the daemon to correctly evaluate completion criteria -- complex for open-ended steps
+- **Declarative intent**: still needs some kind of output format; essentially moves the "tool call" into the response text
+- **Streaming judgment**: hardest to implement correctly; requires the daemon to parse partial responses reliably
+The current tool-call approach works precisely because it's explicit: the agent signals intent exactly once, the daemon acts on it. The alternatives are more elegant but less reliable.
+---
+#### What to actually investigate
+Before committing to any alternative, these questions need answers:
+1. **Does Bedrock support `response_format + tools` simultaneously?** A 10-line test call resolves this. If yes, hybrid structured output is immediately viable for workflow control.
+2. **What does implicit advancement actually look like for a coding task?** Write out the completion criteria for `coding-task-workflow-agentic` phase-0 (classify). Can a daemon reliably detect "Phase 0 is done" without an explicit signal?
+3. **What is the actual failure mode of structured response parsing?** How often does Claude 4.6 Sonnet fail to produce valid JSON when asked to end its turn with a structured summary? Under what conditions?
+4. **What did nexus-core do?** The backlog notes nexus-core as a more advanced system -- how does it handle agent-step transitions?
+These are prototype questions, not design questions. Build the smallest possible test for each before committing to any direction.
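For question 2, the smallest prototype is a pure function over what the daemon already observes. This sketch assumes the daemon records notes, file writes, and bash exit codes per step. The `classifyCriteria` below are invented for illustration and are not taken from `coding-task-workflow-agentic`.

```typescript
// Hypothetical record of what the agent produced during the current step.
interface StepObservations {
  notes: string[];
  filesWritten: string[];
  bashExits: { command: string; exitCode: number }[];
}

interface Criterion { description: string; met: (obs: StepObservations) => boolean }

// The daemon advances only when this returns an empty list.
function unmetCriteria(criteria: Criterion[], obs: StepObservations): string[] {
  return criteria.filter((c) => !c.met(obs)).map((c) => c.description);
}

// Illustrative criteria for a "classify" step.
const classifyCriteria: Criterion[] = [
  { description: "recorded a classification note", met: (o) => o.notes.some((n) => /classification:/i.test(n)) },
  {
    description: "inspected repo state",
    met: (o) => o.bashExits.some((b) => b.command.startsWith("git status") && b.exitCode === 0),
  },
];
```

Closed steps like this are easy to check. Open-ended steps are harder, because their criteria are hard to state as predicates at all.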
+---
+### Bundled trigger templates: zero-config workflow automation via worktrain init (Apr 18, 2026)
+**Problem:** Every user has to write their own triggers.yml manually. Wrong workflow IDs, missing required fields, wrong workspace paths -- all common mistakes (we hit all three today). There's no "just works" path to workflow automation.
+**Solution:** Ship common trigger templates bundled with WorkTrain. `worktrain init` presents a menu and generates a pre-filled triggers.yml.
+**Bundled templates to ship:**
+```yaml
+# Template: mr-review
+- id: mr-review
+  workflowId: mr-review-workflow-agentic
+  goal: "Review the PR specified in the webhook payload goal field"
+  concurrencyMode: parallel
+  autoCommit: false
+  agentConfig: { maxSessionMinutes: 30 }
+# Template: coding-task
+- id: coding-task
+  workflowId: coding-task-workflow-agentic
+  concurrencyMode: parallel
+  autoCommit: false
+  agentConfig: { maxSessionMinutes: 60 }
+# Template: discovery-task
+- id: discovery-task
+  workflowId: wr.discovery
+  concurrencyMode: parallel
+  autoCommit: false
+  agentConfig: { maxSessionMinutes: 60 }
+# Template: bug-investigation
+- id: bug-investigation
+  workflowId: bug-investigation.agentic.v2
+  agentConfig: { maxSessionMinutes: 45 }
+# Template: weekly-health-scan (cron, when native cron trigger ships)
+# - id: weekly-health-scan
+#   type: cron
+#   schedule: "0 9 * * 0"
+#   workflowId: architecture-scalability-audit
+```
+**`worktrain init` flow:**
+1. "Which workflows do you want to run automatically?" (checkbox menu)
+2. For each selected: set `workspacePath` to current directory (overridable)
+3. Generate `triggers.yml` in the workspace root
+4. Validate workflow IDs exist before writing (use the startup validator)
+5. Tell the user how to fire each trigger: `curl -X POST http://localhost:3200/webhook/<id> ...`
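Steps 2 to 4 of this flow can be sketched as a pure function from the user's selections to `triggers.yml` text. It assumes the set of known workflow IDs comes from the startup validator. `buildTriggersYaml` and `TEMPLATES` are hypothetical; the template values are copied from the bundled templates above. Every selection is validated before any YAML is produced.

```typescript
interface TriggerTemplate {
  id: string;
  workflowId: string;
  goal?: string;
  concurrencyMode?: "parallel";
  autoCommit?: boolean;
  maxSessionMinutes: number;
}

const TEMPLATES: TriggerTemplate[] = [
  { id: "mr-review", workflowId: "mr-review-workflow-agentic", goal: "Review the PR specified in the webhook payload goal field", concurrencyMode: "parallel", autoCommit: false, maxSessionMinutes: 30 },
  { id: "coding-task", workflowId: "coding-task-workflow-agentic", concurrencyMode: "parallel", autoCommit: false, maxSessionMinutes: 60 },
  { id: "discovery-task", workflowId: "wr.discovery", concurrencyMode: "parallel", autoCommit: false, maxSessionMinutes: 60 },
  { id: "bug-investigation", workflowId: "bug-investigation.agentic.v2", maxSessionMinutes: 45 },
];

function buildTriggersYaml(selected: string[], workspacePath: string, knownWorkflowIds: Set<string>): string {
  // Validate every selection first so a bad ID aborts before anything is rendered.
  const chosen = selected.map((id) => {
    const t = TEMPLATES.find((x) => x.id === id);
    if (t === undefined) throw new Error(`unknown template: ${id}`);
    if (!knownWorkflowIds.has(t.workflowId)) throw new Error(`workflow not found: ${t.workflowId}`);
    return t;
  });
  const lines = ["triggers:"];
  for (const t of chosen) {
    // JSON strings are valid YAML double-quoted scalars, so they quote paths and goals safely.
    lines.push(`  - id: ${t.id}`, `    workflowId: ${t.workflowId}`, `    workspacePath: ${JSON.stringify(workspacePath)}`);
    if (t.goal !== undefined) lines.push(`    goal: ${JSON.stringify(t.goal)}`);
    if (t.concurrencyMode !== undefined) lines.push(`    concurrencyMode: ${t.concurrencyMode}`);
    if (t.autoCommit !== undefined) lines.push(`    autoCommit: ${t.autoCommit}`);
    lines.push(`    agentConfig: { maxSessionMinutes: ${t.maxSessionMinutes} }`);
  }
  return lines.join("\n") + "\n";
}
```

`worktrain trigger add` could reuse the same validation and render a single entry.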
+**Why this matters:** The difference between WorkTrain being usable by anyone vs only by engineers who read the source code. A new user should be able to go from `worktrain init` to their first automated workflow in under 5 minutes.
+**Also needed:** `worktrain trigger add <template-name>` to add a single trigger to an existing triggers.yml without re-running init.
+---
+### Coordinator context injection standard: agents start informed, not discovering (Apr 18, 2026)
+**The problem:** subagents spawned by a coordinator are completely blind. They know nothing of prior conversations, existing docs, the pipeline, or what's already been tried. The workflows compensate by spending 3-5 turns on "Phase 0: context gathering" every session -- expensive in tokens, time, and LLM turns -- just to get oriented before work starts.
+**The root cause:** the coordinator spawns agents with task descriptions but not context. "Fix the Windows CI failures" is a task. "The Windows CI failures are in `workflow-runner-bash-tool.test.ts` because `node -e` isn't in PATH on Windows -- the fix is to use `process.execPath` instead of `node`, which is the established pattern in this codebase" is context. The difference is 0 discovery turns vs 5.
+**The standard to establish:**
+Every coordinator-spawned agent gets a pre-packaged context bundle. The coordinator assembles it before calling `worktrain spawn`. The bundle includes:
+1. **Prior session findings** -- what relevant sessions discovered (from session store query)
+2. **Established patterns** -- the specific invariants and patterns the agent needs (from knowledge graph or AGENTS.md)
+3. **What NOT to discover** -- explicit list of things already known so the agent doesn't waste turns
+4. **Failure history** -- what's been tried and didn't work (prevents re-exploring dead ends)
+**Format:** ~2000 tokens max, injected as a `<context>` block before the task description. Structured so the agent can skip Phase 0 context gathering entirely when the bundle is complete.
+**Build order:**
+1. Write the standard as a prompt template for coordinator scripts (`worktrain spawn` calls)
+2. The knowledge graph provides the infrastructure for querying relevant context automatically
+3. Eventually: `worktrain spawn` reads the context bundle from the graph + session store automatically, coordinator doesn't have to assemble it manually
+**Why this is high priority:** every agent spawned today without proper context is burning tokens on discovery that should have been provided upfront. At 10 concurrent agents, that's 10x the waste. With proper context injection, Phase 0 becomes 1 turn instead of 5, and output quality improves because the agent starts with the right mental model.
+---
+### Context budget per spawned agent: capped, structured, queryable (Apr 18, 2026)
+**The companion spec to context injection:**
+Rather than hoping agents discover the right context, the coordinator guarantees every agent a fixed context budget: a pre-packaged bundle of at most ~2000 tokens that the agent starts with. The knowledge graph is what makes this scalable -- without it, the coordinator has to manually assemble context from files, which is itself expensive.
+**Bundle contents (structured):**
+- `<relevant_files>` -- paths + key excerpts from files the agent will likely touch (from KG query)
+- `<prior_sessions>` -- summaries of the last 3 sessions that touched related code (from session store)
+- `<established_patterns>` -- specific patterns the agent must follow (e.g. "use `tmpPath()` not `/tmp/`")
+- `<known_facts>` -- things already proven true (e.g. "semantic-release runs automatically after CI, not before")
+- `<do_not_explore>` -- explicit list of dead ends and already-tried approaches
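This sketch renders the bundle into a capped `<context>` block placed before the task description. Several parts are assumptions: the section order (which doubles as priority), the rule that a section which would overflow is skipped whole, and the ~4 characters-per-token estimate. A real implementation would count tokens with the model's tokenizer.

```typescript
// Rough heuristic: ~4 characters per token for English text. Not a real tokenizer.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

interface ContextBundle {
  relevant_files: string[];
  prior_sessions: string[];
  established_patterns: string[];
  known_facts: string[];
  do_not_explore: string[];
}

// Assumed priority order: earlier sections claim budget first.
const ORDER: (keyof ContextBundle)[] = [
  "established_patterns", "known_facts", "do_not_explore", "prior_sessions", "relevant_files",
];

function renderPrompt(bundle: ContextBundle, task: string, budgetTokens = 2000): string {
  const parts: string[] = [];
  let used = 0;
  for (const key of ORDER) {
    const items = bundle[key];
    if (items.length === 0) continue;
    const section = `<${key}>\n${items.map((item) => `- ${item}`).join("\n")}\n</${key}>`;
    const cost = approxTokens(section);
    // Skip a section whole if it would push the bundle past the budget.
    if (used + cost > budgetTokens) continue;
    parts.push(section);
    used += cost;
  }
  return `<context>\n${parts.join("\n")}\n</context>\n\n${task}`;
}
```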
+**How the knowledge graph enables this:**
+- `relevant_files`: KG query "what files are related to the goal?" returns the structural subgraph
+- `prior_sessions`: session store query "what sessions touched these files in the last 7 days?"
+- `established_patterns`: AGENTS.md + KG pattern nodes
+- `known_facts` and `do_not_explore`: built by the coordinator from prior session outputs
+**Without the KG (today):** the coordinator manually includes key context in the prompt. Better than nothing, but requires the coordinator to know what's relevant.
+**With the KG (future):** `worktrain spawn --workflow X --goal "..."` automatically queries the KG and assembles the context bundle. Coordinator just provides the goal.
+---
+### Decouple goal from trigger definition -- late-bound goals for daemon sessions (Apr 18, 2026)
+**The problem:** `goal` is currently required at trigger-definition time (in triggers.yml). For triggers like `mr-review`, the goal is inherently dynamic -- it's the PR title and description, known only when the webhook fires, not when the trigger is configured.
+The current workaround: `goalTemplate: "{{$.goal}}"` with the caller passing `{"goal": "Review PR #123..."}` in the webhook payload. This works but is awkward -- the caller must know the payload field convention, and it's not obvious from the trigger definition.
+**The right model:** separate "which workflow" (trigger definition) from "what to do" (dispatch-time goal).
+```yaml
+# Trigger definition -- no goal required
+triggers:
+  - id: mr-review
+    workflowId: mr-review-workflow-agentic
+    workspacePath: ~/git/myproject
+    # No goal here -- goal comes from dispatch context
+```
+```bash
+# Dispatch with goal at call time
+curl -X POST http://localhost:3200/webhook/mr-review \
+  -H 'Content-Type: application/json' \
+  -d '{"goal": "Review PR #123: fix authentication bug"}'
+# Or via worktrain spawn
+worktrain spawn --trigger mr-review --goal "Review PR #123: fix authentication bug"
+```
+**Implementation options:**
+1. **goalTemplate with `$.goal` as the default** -- if no `goal` is set in the trigger and no `goalTemplate` is set, default to `goalTemplate: "{{$.goal}}"`. The webhook payload's `goal` field becomes the canonical way to pass a dynamic goal. Zero breaking changes.
+2. **Late-bound goal field on WorkflowTrigger** -- `executeStartWorkflow` accepts `goal` as a separate parameter. The trigger provides everything except the goal; the dispatcher (TriggerRouter) resolves the goal from the webhook payload or a default. This makes the separation explicit at the type level.
+3. **Prompt injection** -- the workflow's first step can read `context.goal` which is injected from the webhook payload. The trigger has a static placeholder; the real goal comes through as a context variable. This is how it currently half-works but without the clean API.
+**Preferred: Option 1 (default goalTemplate)** -- minimal change, backward compatible, works immediately. If `goal` is absent from the trigger and the webhook payload contains `{"goal": "..."}`, use it. Document this as the standard pattern for dynamic-goal triggers.
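Option 1 reduces to a few lines of dispatch-time logic. This sketch assumes `{{$.path}}` placeholders resolve dot-separated keys in the webhook payload. `renderTemplate` is a minimal stand-in, not WorkRail's actual template renderer. Throwing when no goal can be resolved is also an assumption.

```typescript
interface TriggerDef { id: string; workflowId: string; goal?: string; goalTemplate?: string }

// Minimal {{$.a.b}} interpolation over the webhook payload; missing values render as "".
function renderTemplate(template: string, payload: unknown): string {
  return template.replace(/\{\{\s*\$\.([\w.]+)\s*\}\}/g, (_match: string, path: string) => {
    let cur: unknown = payload;
    for (const key of path.split(".")) {
      cur = typeof cur === "object" && cur !== null ? (cur as Record<string, unknown>)[key] : undefined;
    }
    return cur === undefined || cur === null ? "" : String(cur);
  });
}

// Static goal wins; otherwise fall back to goalTemplate, defaulting to "{{$.goal}}".
function resolveGoal(trigger: TriggerDef, payload: unknown): string {
  if (trigger.goal !== undefined) return trigger.goal;
  const goal = renderTemplate(trigger.goalTemplate ?? "{{$.goal}}", payload).trim();
  if (goal === "") throw new Error(`trigger ${trigger.id}: no goal in trigger or payload`);
  return goal;
}
```

Existing triggers that set `goal` or `goalTemplate` behave exactly as before. Only triggers with neither pick up the new default.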
+**Also needed:** the `worktrain spawn` CLI command should accept `--goal` as a first-class flag (already partially implemented) so coordinator scripts can pass goals without knowing the webhook payload format.
+**Why this matters for WorkTrain being production-ready:** most real-world triggers (PR review, issue investigation, incident response) have dynamic goals that depend on what just happened. Static goals in triggers.yml only work for scheduled/cron tasks. Late-bound goals make the whole trigger system composable with external events.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.32.0",
+  "version": "3.34.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {