@exaudeus/workrail 3.28.0 → 3.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. package/dist/console/assets/{index-C146q2kN.js → index-Bl5-Ghuu.js} +1 -1
  2. package/dist/console/index.html +1 -1
  3. package/dist/manifest.json +3 -3
  4. package/docs/README.md +57 -0
  5. package/docs/adrs/001-hybrid-storage-backend.md +38 -0
  6. package/docs/adrs/002-four-layer-context-classification.md +38 -0
  7. package/docs/adrs/003-checkpoint-trigger-strategy.md +35 -0
  8. package/docs/adrs/004-opt-in-encryption-strategy.md +36 -0
  9. package/docs/adrs/005-agent-first-workflow-execution-tokens.md +105 -0
  10. package/docs/adrs/006-append-only-session-run-event-log.md +76 -0
  11. package/docs/adrs/007-resume-and-checkpoint-only-sessions.md +51 -0
  12. package/docs/adrs/008-blocked-nodes-architectural-upgrade.md +178 -0
  13. package/docs/adrs/009-bridge-mode-single-instance-mcp.md +195 -0
  14. package/docs/adrs/010-release-pipeline.md +89 -0
  15. package/docs/architecture/README.md +7 -0
  16. package/docs/architecture/refactor-audit.md +364 -0
  17. package/docs/authoring-v2.md +527 -0
  18. package/docs/authoring.md +873 -0
  19. package/docs/changelog-recent.md +201 -0
  20. package/docs/configuration.md +505 -0
  21. package/docs/ctc-mcp-proposal.md +518 -0
  22. package/docs/design/README.md +22 -0
  23. package/docs/design/agent-cascade-protocol.md +96 -0
  24. package/docs/design/autonomous-console-design-candidates.md +253 -0
  25. package/docs/design/autonomous-console-design-review.md +111 -0
  26. package/docs/design/autonomous-platform-mvp-discovery.md +525 -0
  27. package/docs/design/claude-code-source-deep-dive.md +713 -0
  28. package/docs/design/console-cyberpunk-ui-discovery.md +504 -0
  29. package/docs/design/console-execution-trace-candidates-final.md +160 -0
  30. package/docs/design/console-execution-trace-candidates.md +211 -0
  31. package/docs/design/console-execution-trace-design-candidates-v2.md +113 -0
  32. package/docs/design/console-execution-trace-design-review.md +74 -0
  33. package/docs/design/console-execution-trace-discovery.md +394 -0
  34. package/docs/design/console-execution-trace-final-review.md +77 -0
  35. package/docs/design/console-execution-trace-review.md +92 -0
  36. package/docs/design/console-performance-discovery.md +415 -0
  37. package/docs/design/console-ui-backlog.md +280 -0
  38. package/docs/design/daemon-architecture-discovery.md +853 -0
  39. package/docs/design/daemon-design-candidates.md +318 -0
  40. package/docs/design/daemon-design-review-findings.md +119 -0
  41. package/docs/design/daemon-engine-design-candidates.md +210 -0
  42. package/docs/design/daemon-engine-design-review.md +131 -0
  43. package/docs/design/daemon-execution-engine-discovery.md +280 -0
  44. package/docs/design/daemon-gap-analysis.md +554 -0
  45. package/docs/design/daemon-owns-console-plan.md +168 -0
  46. package/docs/design/daemon-owns-console-review.md +91 -0
  47. package/docs/design/daemon-owns-console.md +195 -0
  48. package/docs/design/data-model-erd.md +11 -0
  49. package/docs/design/design-candidates-consolidate-dev-staleness.md +98 -0
  50. package/docs/design/design-candidates-walk-cache-depth-limit.md +80 -0
  51. package/docs/design/design-review-consolidate-dev-staleness.md +54 -0
  52. package/docs/design/design-review-walk-cache-depth-limit.md +48 -0
  53. package/docs/design/implementation-plan-consolidate-dev-staleness.md +142 -0
  54. package/docs/design/implementation-plan-walk-cache-depth-limit.md +141 -0
  55. package/docs/design/layer3b-ghost-nodes-design-candidates.md +229 -0
  56. package/docs/design/layer3b-ghost-nodes-design-review.md +93 -0
  57. package/docs/design/layer3b-ghost-nodes-implementation-plan.md +219 -0
  58. package/docs/design/list-workflows-latency-fix-plan.md +128 -0
  59. package/docs/design/list-workflows-latency-fix-review.md +55 -0
  60. package/docs/design/list-workflows-latency-fix.md +109 -0
  61. package/docs/design/native-context-management-api.md +11 -0
  62. package/docs/design/performance-sweep-2026-04.md +96 -0
  63. package/docs/design/routines-guide.md +219 -0
  64. package/docs/design/sequence-diagrams.md +11 -0
  65. package/docs/design/subagent-design-principles.md +220 -0
  66. package/docs/design/temporal-patterns-design-candidates.md +312 -0
  67. package/docs/design/temporal-patterns-design-review-findings.md +163 -0
  68. package/docs/design/test-isolation-from-config-file.md +335 -0
  69. package/docs/design/v2-core-design-locks.md +2746 -0
  70. package/docs/design/v2-lock-registry.json +734 -0
  71. package/docs/design/workflow-authoring-v2.md +1044 -0
  72. package/docs/design/workflow-docs-spec.md +218 -0
  73. package/docs/design/workflow-extension-points.md +687 -0
  74. package/docs/design/workrail-auto-trigger-system.md +359 -0
  75. package/docs/design/workrail-config-file-discovery.md +513 -0
  76. package/docs/docker.md +110 -0
  77. package/docs/generated/v2-lock-closure-plan.md +26 -0
  78. package/docs/generated/v2-lock-coverage.json +797 -0
  79. package/docs/generated/v2-lock-coverage.md +177 -0
  80. package/docs/ideas/backlog.md +3927 -0
  81. package/docs/ideas/design-candidates-mcp-resilience.md +208 -0
  82. package/docs/ideas/design-review-findings-mcp-resilience.md +119 -0
  83. package/docs/ideas/implementation_plan.md +249 -0
  84. package/docs/ideas/third-party-workflow-setup-design-thinking.md +1948 -0
  85. package/docs/implementation/02-architecture.md +316 -0
  86. package/docs/implementation/04-testing-strategy.md +124 -0
  87. package/docs/implementation/09-simple-workflow-guide.md +835 -0
  88. package/docs/implementation/13-advanced-validation-guide.md +874 -0
  89. package/docs/implementation/README.md +21 -0
  90. package/docs/integrations/claude-code.md +300 -0
  91. package/docs/integrations/firebender.md +315 -0
  92. package/docs/migration/v0.1.0.md +147 -0
  93. package/docs/naming-conventions.md +45 -0
  94. package/docs/planning/README.md +104 -0
  95. package/docs/planning/github-ticketing-playbook.md +195 -0
  96. package/docs/plans/README.md +24 -0
  97. package/docs/plans/agent-managed-ticketing-design.md +605 -0
  98. package/docs/plans/agentic-orchestration-roadmap.md +112 -0
  99. package/docs/plans/assessment-gates-engine-handoff.md +536 -0
  100. package/docs/plans/content-coherence-and-references.md +151 -0
  101. package/docs/plans/library-extraction-plan.md +340 -0
  102. package/docs/plans/mr-review-workflow-redesign.md +1451 -0
  103. package/docs/plans/native-context-management-epic.md +11 -0
  104. package/docs/plans/perf-fixes-design-candidates.md +225 -0
  105. package/docs/plans/perf-fixes-design-review-findings.md +61 -0
  106. package/docs/plans/perf-fixes-new-issues-candidates.md +264 -0
  107. package/docs/plans/perf-fixes-new-issues-review.md +110 -0
  108. package/docs/plans/prompt-fragments.md +53 -0
  109. package/docs/plans/ui-ux-workflow-design-candidates.md +120 -0
  110. package/docs/plans/ui-ux-workflow-discovery.md +100 -0
  111. package/docs/plans/ui-ux-workflow-review.md +48 -0
  112. package/docs/plans/v2-followup-enhancements.md +587 -0
  113. package/docs/plans/workflow-categories-candidates.md +105 -0
  114. package/docs/plans/workflow-categories-discovery.md +110 -0
  115. package/docs/plans/workflow-categories-review.md +51 -0
  116. package/docs/plans/workflow-discovery-model-candidates.md +94 -0
  117. package/docs/plans/workflow-discovery-model-discovery.md +74 -0
  118. package/docs/plans/workflow-discovery-model-review.md +48 -0
  119. package/docs/plans/workflow-source-setup-phase-1.md +245 -0
  120. package/docs/plans/workflow-source-setup-phase-2.md +361 -0
  121. package/docs/plans/workflow-staleness-detection-candidates.md +104 -0
  122. package/docs/plans/workflow-staleness-detection-review.md +58 -0
  123. package/docs/plans/workflow-staleness-detection.md +80 -0
  124. package/docs/plans/workflow-v2-design.md +69 -0
  125. package/docs/plans/workflow-v2-roadmap.md +74 -0
  126. package/docs/plans/workflow-validation-design.md +98 -0
  127. package/docs/plans/workflow-validation-roadmap.md +108 -0
  128. package/docs/plans/workrail-platform-vision.md +420 -0
  129. package/docs/reference/agent-context-cleaner-snippet.md +94 -0
  130. package/docs/reference/agent-context-guidance.md +140 -0
  131. package/docs/reference/context-optimization.md +284 -0
  132. package/docs/reference/example-workflow-repository-template/.github/workflows/validate.yml +125 -0
  133. package/docs/reference/example-workflow-repository-template/README.md +268 -0
  134. package/docs/reference/example-workflow-repository-template/workflows/example-workflow.json +80 -0
  135. package/docs/reference/external-workflow-repositories.md +916 -0
  136. package/docs/reference/feature-flags-architecture.md +472 -0
  137. package/docs/reference/feature-flags.md +349 -0
  138. package/docs/reference/god-tier-workflow-validation.md +272 -0
  139. package/docs/reference/loop-optimization.md +209 -0
  140. package/docs/reference/loop-validation.md +176 -0
  141. package/docs/reference/loops.md +465 -0
  142. package/docs/reference/mcp-platform-constraints.md +59 -0
  143. package/docs/reference/recovery.md +88 -0
  144. package/docs/reference/releases.md +177 -0
  145. package/docs/reference/troubleshooting.md +105 -0
  146. package/docs/reference/workflow-execution-contract.md +998 -0
  147. package/docs/roadmap/README.md +22 -0
  148. package/docs/roadmap/legacy-planning-status.md +103 -0
  149. package/docs/roadmap/now-next-later.md +70 -0
  150. package/docs/roadmap/open-work-inventory.md +389 -0
  151. package/docs/tickets/README.md +39 -0
  152. package/docs/tickets/next-up.md +76 -0
  153. package/docs/workflow-management.md +317 -0
  154. package/docs/workflow-templates.md +423 -0
  155. package/docs/workflow-validation.md +184 -0
  156. package/docs/workflows.md +254 -0
  157. package/package.json +4 -1
  158. package/spec/authoring-spec.json +61 -16
  159. package/workflows/workflow-for-workflows.json +3 -3
  160. package/workflows/workflow-for-workflows.v2.json +3 -3
package/docs/design/daemon-design-candidates.md
@@ -0,0 +1,318 @@
+ # WorkRail Daemon Architecture: Design Candidates
+
+ > Raw investigative material for main-agent synthesis. Not a final decision.
+ > Generated: 2026-04-14.
+
+ ---
+
+ ## Problem Understanding
+
+ ### Core tensions
+
+ 1. **Single-process simplicity vs. concurrent-session correctness.**
+ The `engineActive` guard exists because the DI container is a global singleton.
+ Sequential sessions (one at a time) are safe but not scalable. Concurrent sessions
+ require either exposing a single shared engine instance (same process) or using
+ separate processes (isolated engines). These have different deployment implications.
+
+ 2. **Direct handler calls vs. MCP tool protocol portability.**
+ `engine-factory.ts` calls `executeStartWorkflow` / `executeContinueWorkflow` directly,
+ bypassing JSON-RPC. This is faster and already built, but it couples the daemon to
+ the internal handler API. Changes to handler signatures require daemon changes.
+ The MCP protocol layer is what makes callers swappable.
+
+ 3. **Freestanding vs. dependency-rich agent loop.**
+ `npx -y @exaudeus/workrail` portability is a core feature. Adding pi-mono as a
+ dependency doubles the surface area. But building a custom agent loop from scratch
+ converges on the same patterns.
+
+ 4. **Self-enforced trust model.**
+ In autonomous mode, the daemon is both driver and enforced entity. The HMAC token
+ protocol prevents token forgery, but the daemon can choose not to call `continueWorkflow`
+ at all. Enforcement integrity relies on the daemon being well-behaved.
+
+ ### Likely seam
+
+ The seam between the engine and the agent loop is clean and already designed:
+ - Engine produces: `{ pending: { prompt: string, stepId, title } }`
+ - Agent loop consumes: the prompt, calls LLM, executes tool calls, returns
+ `{ notesMarkdown: string, context: Record<string, unknown> }`
+ - Engine accepts: `continueWorkflow(stateToken, ackToken, output, context)`
+
+ The daemon's job is to close this loop. The seam is at `pending.prompt` going in and
+ `{ notesMarkdown, context }` coming out.
+
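+ A minimal TypeScript sketch of this seam, assuming illustrative names for
+ everything the bullets do not pin down (`PendingStep`, `StepOutput`, `runAgentLoop`):
+
+ ```typescript
+ // Shapes taken from the seam description above.
+ interface PendingStep {
+   prompt: string;
+   stepId: string;
+   title: string;
+ }
+
+ interface StepOutput {
+   notesMarkdown: string;
+   context: Record<string, unknown>;
+ }
+
+ // The loop the daemon must close. `runAgentLoop` (LLM + tool calls) and the
+ // `continueWorkflow` parameter shape are sketches, not the shipped engine API.
+ async function driveStep(
+   pending: PendingStep,
+   runAgentLoop: (prompt: string) => Promise<StepOutput>,
+   continueWorkflow: (
+     stateToken: string,
+     ackToken: string,
+     output: string,
+     context: Record<string, unknown>,
+   ) => Promise<unknown>,
+   stateToken: string,
+   ackToken: string,
+ ): Promise<unknown> {
+   const result = await runAgentLoop(pending.prompt); // prompt goes in
+   return continueWorkflow(stateToken, ackToken, result.notesMarkdown, result.context); // output comes out
+ }
+ ```
+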
+ ### What makes this hard (junior developer blind spots)
+
+ 1. **Token durability across the agent loop.** The `continueToken` must survive process
+ crashes between LLM calls. If the process crashes after the LLM responds but before
+ `continueWorkflow` is called, the step is re-attempted on next start. The dedup key
+ system (`advance_recorded:sessionId:nodeId:attemptId`) handles this correctly, but
+ the daemon must persist the token durably.
+
+ 2. **Tool call routing within the agent loop.** LLM responses contain tool calls that
+ must be executed and results returned to the LLM BEFORE the LLM produces the final
+ `notesMarkdown` output to feed to `continueWorkflow`. This is the `agentLoop` pattern
+ from pi-mono -- it is a multi-turn LLM loop, not a single call.
+
+ 3. **Session lifecycle vs. agent context lifecycle.** A WorkRail session is durable (event
+ log, tokens, steps). An LLM context window is ephemeral (messages, tool results). The
+ daemon must coordinate these two lifecycles: the WorkRail session survives context
+ compaction; the LLM context window does not.
+
+ ---
+
+ ## Philosophy Constraints
+
+ From CLAUDE.md (system instructions), confirmed by code patterns:
+
+ - **Errors are data (neverthrow ResultAsync):** All handler code uses `RA` (ResultAsync)
+ chains. Daemon code MUST follow the same pattern. No try/catch in the agent loop.
+ - **Branded types:** Daemon config must use branded types for credentials
+ (`AnthropicApiKey`, `GitLabToken`, etc.), not primitive strings.
+ - **DI for boundaries:** The agent loop (LLM caller) MUST be injected as an
+ `AgentLoopPort`, not hardcoded to the Anthropic SDK. This makes the LLM provider swappable.
+ - **Exhaustive discriminated unions:** The `DaemonConfig` process-boundary choice should be
+ a discriminated union, not a boolean flag.
+ - **YAGNI with discipline:** The REST control plane is not speculative -- it is required
+ for human oversight in the changed trust model. The `engineActive` guard change is not
+ speculative -- it is required for concurrent sessions.
+
+ **Philosophy conflicts: none.** The codebase exactly embodies the CLAUDE.md principles.
+ No conflict between stated philosophy and repo patterns.
+
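+ As a hedged illustration of the branded-types and discriminated-union constraints
+ together (the member names here are assumptions, not the shipped types):
+
+ ```typescript
+ // Branded credentials: a plain string cannot be passed where a key is expected.
+ type AnthropicApiKey = string & { readonly __brand: 'AnthropicApiKey' };
+ type GitLabToken = string & { readonly __brand: 'GitLabToken' };
+
+ // Process-boundary choice as a discriminated union, not a boolean flag.
+ type DaemonConfig =
+   | { mode: 'same-process'; maxConcurrentSessions: number }
+   | { mode: 'separate-process'; controlPort: number; sharedDataDir: string };
+
+ function describeBoundary(config: DaemonConfig): string {
+   // Switching on the discriminant lets the compiler check exhaustiveness.
+   switch (config.mode) {
+     case 'same-process':
+       return `shared engine, up to ${config.maxConcurrentSessions} concurrent sessions`;
+     case 'separate-process':
+       return `own engine, control plane on port ${config.controlPort}`;
+   }
+ }
+ ```
+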
+ ---
+
+ ## Impact Surface
+
+ ### What must stay consistent if the daemon is added
+
+ - `engine-factory.ts`: the `engineActive` guard and `createWorkRailEngine()` API. The
+ daemon must not require breaking changes to this API.
+ - `src/mcp-server.ts`: the MCP server entry point must be unchanged. Existing Claude
+ Code / Cursor users must see no difference.
+ - The session store append-only invariant: daemon sessions write to the same store as
+ MCP sessions. The lock protocol must hold under concurrent access.
+ - The token protocol: daemon-initiated sessions produce HMAC tokens identical to
+ MCP-initiated sessions. The console must not distinguish them.
+ - Existing workflows: zero changes required. The daemon reads `pending.prompt` from the
+ workflow step; the workflow does not know who is driving.
+
+ ### Nearby callers and consumers
+
+ - Console (`console/`) -- reads session history from the same store. If daemon sessions
+ are added, they appear in the session list immediately (no console changes needed for
+ basic visibility; REST control plane needed for live view).
+ - `src/engine/index.ts` -- the library export surface. If `createWorkRailEngine()` is
+ changed, consumers of the engine library must be updated.
+ - `src/di/container.ts` -- the DI container. Any change to the `engineActive` guard
+ touches the container's initialization path.
+
+ ---
+
+ ## Candidates
+
+ ### Candidate 1: Minimal Sequential Daemon
+
+ **Summary:** `src/daemon/entry.ts` calls `createWorkRailEngine()`, runs one session at a
+ time via FIFO queue, drives the agent loop with direct Anthropic SDK calls, exits after
+ each session.
+
+ **Tensions resolved:** Simplicity; single-process; `engineActive` guard avoided via queue.
+ **Tensions accepted:** No concurrent sessions; no live view; no human override.
+ **Boundary:** `src/daemon/` only. No changes to engine, MCP server, or console.
+ **Why this boundary:** The minimum viable boundary that proves autonomous execution works.
+ **Failure mode:** Session throughput bottleneck. If sessions are 30-60 min each, the
+ queue grows unbounded. Acceptable for MVP; unacceptable for production.
+ **Repo pattern:** Directly follows `engine-factory.ts`. The engine was built for this.
+ **Gain:** Ships fast, proves the agent loop concept, zero architectural risk.
+ **Give up:** No concurrency, no live view, no human override.
+ **Scope:** Too narrow for 12-month platform. Best-fit for 3-month proof-of-concept.
+ **Philosophy:** Perfect fit. YAGNI applied correctly for a proof-of-concept target.
+
+ ---
+
+ ### Candidate 2: Pure MCP Client Daemon
+
+ **Summary:** A separate `workrail-daemon` process connects to the running WorkRail MCP
+ server over HTTP, calls `start_workflow` / `continue_workflow` via JSON-RPC, with no
+ direct engine access.
+
+ **Concrete shape:**
+ - `packages/daemon/src/mcp-client.ts` -- `call(toolName, input)` over HTTP JSON-RPC
+ - `packages/daemon/src/trigger/` -- trigger listeners
+ - `packages/daemon/src/agent-loop/` -- same structure as C1 but calls MCP client
+ - Deployment: two Docker services (MCP server + daemon)
+
+ **Tensions resolved:** Clean process boundary; no `engineActive` concern; maximally
+ decoupled; crash isolation; deployable anywhere.
+ **Tensions accepted:** JSON-RPC overhead per step; two-process local dev; MCP server is
+ single point of failure for both human and autonomous sessions.
+ **Boundary:** Separate package/process. MCP HTTP transport is the interface.
+ **Why this boundary:** The MCP protocol is the stable public interface. Calling it from
+ the daemon ensures the daemon is never coupled to handler internals.
+ **Failure mode:** MCP server crash stops both Claude Code users and autonomous sessions.
+ **Repo pattern:** Departs from `engine-factory.ts`. Treats the MCP server as a black box.
+ **Gain:** Maximum decoupling; daemon can be any language; natural cloud model.
+ **Give up:** Two-process deployment; HTTP overhead; MCP server as prerequisite.
+ **Scope:** Best-fit for 18-24 month distributed cloud. Too broad for MVP.
+ **Philosophy:** Honors DI at the process level (ultimate boundary). Mild YAGNI conflict.
+
+ ---
+
+ ### Candidate 3: Composite Same-Process (recommended)
+
+ **Summary:** `src/daemon/` calls the engine via a shared instance (not two separate
+ `createWorkRailEngine()` calls), with concurrent sessions managed by `DaemonSessionManager`,
+ and a thin REST/SSE control plane added to the existing HTTP server.
+
+ **Concrete shape:**
+ - `src/engine/engine-factory.ts` change: instead of a boolean `engineActive` guard, expose
+ a `getSharedEngine(config): WorkRailEngine` that creates the engine once and returns the
+ same instance to all callers (MCP server entry + daemon entry). The guard becomes:
+ "container initialized: yes/no" rather than "engine in use: yes/no."
+ - `src/daemon/session-manager.ts` -- `DaemonSessionManager`: `Map<SessionId, DaemonSession>`,
+ each session running as an independent `Promise` chain with its own `continueToken`.
+ `DaemonSession = { continueToken: string; status: 'running' | 'paused' | 'complete' | 'failed'; abortController: AbortController }`
+ (see the sketch after this list).
+ - `src/daemon/agent-loop/step-runner.ts` -- `runStep(pending: PendingStep, toolExecutor, llmPort): Promise<StepOutput>`.
+ Multi-turn loop: call LLM -> execute tool calls -> return to LLM -> repeat until
+ LLM produces `continueWorkflow` output. `StepOutput = { notesMarkdown: string; context: Record<string, unknown> }`
+ - `src/daemon/trigger/gitlab-webhook.ts` -- HTTP listener, parses MR opened events.
+ - `src/daemon/tool-executor/local.ts` -- `Bash(cmd: string): RA<string, ToolError>`,
+ `Read(path: string): RA<string, ToolError>`, `Write(path, content): RA<void, ToolError>`.
+ Also `BashInRepo(repo: string, cmd: string)` for cross-repo routing.
+ - REST additions to the existing HTTP server:
+   - `GET /api/v2/sessions/:id/daemon-status` -- `{ status, currentStepTitle, startedAt }`
+   - `POST /api/v2/sessions/:id/pause`
+   - `POST /api/v2/sessions/:id/resume`
+   - `DELETE /api/v2/sessions/:id` (cancel + abort LLM call)
+
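+ A hedged sketch of the `DaemonSessionManager` shape described above (method names
+ and the `SessionId` alias are assumptions for illustration):
+
+ ```typescript
+ type SessionId = string; // assumed alias
+
+ interface DaemonSession {
+   continueToken: string;
+   status: 'running' | 'paused' | 'complete' | 'failed';
+   abortController: AbortController;
+ }
+
+ class DaemonSessionManager {
+   private readonly sessions = new Map<SessionId, DaemonSession>();
+
+   start(id: SessionId, continueToken: string): DaemonSession {
+     const session: DaemonSession = {
+       continueToken,
+       status: 'running',
+       abortController: new AbortController(),
+     };
+     this.sessions.set(id, session);
+     return session;
+   }
+
+   // Backs DELETE /api/v2/sessions/:id -- aborts the in-flight LLM call.
+   cancel(id: SessionId): void {
+     const session = this.sessions.get(id);
+     if (!session) return;
+     session.abortController.abort();
+     session.status = 'failed';
+   }
+ }
+ ```
+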
+ **Tensions resolved:** Single deployment; concurrent sessions; live view; human override;
+ `engineActive` solved by shared instance (not relaxed); correct enforcement (HMAC identical
+ for daemon and manual sessions).
+ **Tensions accepted:** Shared process means shared failure domain (daemon crash affects MCP
+ server -- use process supervisor to restart).
+ **Boundary:** Same process; new `src/daemon/` module; minor HTTP server additions.
+ **Why this boundary:** The 12-month success criteria require concurrent sessions AND live
+ view AND single deployment. This is the only candidate that satisfies all three.
+ **Failure mode:** If the daemon's agent loop has a bug that crashes the process, the MCP
+ server also goes down. Mitigation: crash isolation within the daemon (try/catch at the
+ session manager boundary, not inside handlers) and a process supervisor.
+ **Repo pattern:** Directly adapts `engine-factory.ts`. Requires one targeted change to the
+ engine factory (shared instance pattern). All other code follows existing patterns.
+ **Gain:** Single process, concurrent sessions, live view, human control, upgrade path to C4.
+ **Give up:** Shared failure domain (vs. C4's isolated processes).
+ **Scope:** Best-fit for the 12-month vision.
+ **Philosophy:** Honors DI (engine injected, `AgentLoopPort` injected), errors-as-data
+ (ResultAsync throughout), immutability (session store append-only), exhaustiveness
+ (`DaemonSession.status` is a discriminated union). No conflicts.
+
+ ---
+
+ ### Candidate 4: Composite Separate-Process
+
+ **Summary:** The daemon runs as a separate process with its own `createWorkRailEngine()`
+ instance, sharing durable session state with the MCP server through a shared `dataDir`.
+
+ **Concrete shape:**
+ - `packages/daemon/src/entry.ts` -- new process, calls `createWorkRailEngine({ dataDir: sharedPath })`
+ - Same `DaemonSessionManager`, `step-runner`, `trigger/`, `tool-executor/` as C3
+ - Separate HTTP port (default: 3101) for the daemon control plane
+ - `withHealthySessionLock` file locking ensures safe cross-process session store writes
+ - Console proxies to both port 3100 (MCP server) and port 3101 (daemon) for live view
+
+ **Tensions resolved:** No `engineActive` guard change needed (each process has one engine);
+ crash isolation (daemon crash does not affect MCP server); natural cloud upgrade path.
+ **Tensions accepted:** Two-process deployment for local dev; filesystem coordination limits
+ deployment to a single machine without a shared volume; separate HTTP port for daemon control.
+ **Boundary:** Separate process with shared filesystem state.
+ **Why this boundary:** Cleanest architectural expression. No guard changes. Natural path
+ to cloud (swap `LocalDataDirV2` for a remote-backed store port).
+ **Failure mode:** Lock contention on the shared session store under high concurrent load.
+ `withHealthySessionLock` handles this, but cross-process file locking has not been tested
+ with two WorkRail processes.
+ **Repo pattern:** Adapts `engine-factory.ts` correctly (one engine per process). Departs
+ from the single-process assumption in `mcp-server.ts`.
+ **Gain:** Clean process boundary; no guard change; independent scaling; cloud-natural.
+ **Give up:** Two-process local dev; lock contention risk; more complex setup.
+ **Scope:** Best-fit for the 18-month cloud target. Slightly broad for 12-month local-first.
+ **Philosophy:** Architecturally the purest expression of all principles. One engine per
+ process, no guard relaxation, cleanest DI. No conflicts.
+
+ ---
+
+ ## Comparison and Recommendation
+
+ | Criterion | C1 | C2 | C3 | C4 |
+ |---|---|---|---|---|
+ | Single deployment | Yes | No | Yes | No |
+ | Concurrent sessions | No | Yes | Yes | Yes |
+ | Human override (live view) | No | Partial | Yes | Yes |
+ | engineActive change | None | None | Shared instance | None |
+ | Cloud upgrade path | Hard | Native | Port swap | Natural |
+ | Repo pattern fit | Perfect | Departs | Perfect + extend | Perfect + extend |
+ | Philosophy fit | Perfect | Good | Perfect | Best |
+ | Ship complexity | Low | High | Medium | High |
+
+ **Recommendation: Candidate 3.**
+
+ The 12-month success criteria require concurrent sessions AND live view AND single
+ deployment. Only C3 satisfies all three. The safety concern (concurrent handler calls)
+ is resolved by code analysis -- `V2Dependencies` is stateless and the session store
+ serializes per-session writes. The `engineActive` guard change (boolean -> shared
+ instance) is a targeted, well-understood change.
+
+ ---
+
+ ## Self-Critique
+
+ ### Strongest counter-argument
+
+ C1 (sequential) ships faster and proves the actual unknowns: does the agent loop work?
+ Does the trigger system work? Does the daemon produce correct `notesMarkdown`? The REST
+ control plane is a developer experience feature, not a correctness feature. If the
+ primary goal is "demonstrate autonomous execution," C1 is the better choice. C3 adds
+ scope before the core concept is proven.
+
+ ### Narrower option that could work
+
+ C1 with a note: "expand to C3 in the next iteration." C1 is a strict subset of C3 --
+ the FIFO queue is a degenerate case of C3's `DaemonSessionManager` (concurrency = 1).
+ Starting with C1 and expanding to C3 is a valid staged approach.
+
+ ### Broader option and what would justify it
+
+ C4 (separate process) is justified if cloud deployment becomes a committed 12-month
+ goal. The migration from C3 to C4 is: extract `src/daemon/` into `packages/daemon/`,
+ add a separate process entry point, verify cross-process lock safety. The daemon code
+ itself does not change -- only the process boundary changes.
+
+ ### Assumption that would invalidate this recommendation
+
+ If `withHealthySessionLock` does NOT safely handle concurrent callers within the same
+ process (i.e., if the lock is not reentrant-safe for async calls), then concurrent
+ sessions in C3 would corrupt the session store. This is unlikely (the lock is designed
+ for concurrent writes) but must be verified before shipping C3 with concurrency enabled.
+
+ ---
+
+ ## Open Questions for the Main Agent
+
+ 1. Should the first daemon version use C1 (sequential, ship fast) or C3 (concurrent,
+ full scope) as the initial implementation target?
+
+ 2. The `AgentLoopPort` interface -- should it abstract the full LLM conversation turn
+ (multi-turn tool call loop) or just the single LLM API call? A full-turn abstraction
+ is cleaner but harder to design. A single-call abstraction leaks the tool call loop
+ into the daemon.
+
+ 3. Is pi-mono's `agentLoop` the right reference for the agent loop implementation, or
+ should WorkRail build a minimal implementation against the Anthropic SDK directly?
+
+ 4. Cross-repo execution: is it a 12-month must-have or a post-12-month feature? If it
+ is a must-have, `BashInRepo` / `ReadRepo` must be designed now. If it is post-12-month,
+ the tool executor can be simpler (single-workspace Bash/Read/Write).
+
+ 5. The `engineActive` guard change: should it be a shared singleton instance pattern, or
+ a ref-counted guard, or something else? The choice affects how tests isolate engine
+ instances.
package/docs/design/daemon-design-review-findings.md
@@ -0,0 +1,119 @@
+ # WorkRail Daemon Architecture: Design Review Findings
+
+ > Review output for the selected direction: Candidate 3 (Composite Same-Process) with
+ > Candidate 1 safety defaults (maxConcurrentSessions: 1 for v1).
+ > Generated: 2026-04-14.
+
+ ---
+
+ ## Tradeoff Review
+
+ | Tradeoff | Acceptable? | Failure Condition | Hidden Assumption |
+ |----------|-------------|------------------|-------------------|
+ | Shared process failure domain | Yes (local-first 12-month scope) | WorkRail deployed as shared multi-user server | Daemon agent loop is well-behaved (AbortController timeouts required) |
+ | Process-level init change (`initializeWorkRailProcess`) | Yes (internal, invisible to users) | `runtimeMode` discriminant insufficient for combined mode -- may need third mode or flags object | DI container initialization has no entry-point-specific services that conflict |
+ | Cross-repo deferred to post-MVP | Yes (backlog explicitly says post-MVP) | First real use case (MR review) requires cross-repo | MVP MR review workflow is single-repo -- must be confirmed with actual first workflow target |
+
+ ---
+
+ ## Failure Mode Review
+
+ | Failure Mode | Design Handling | Missing Mitigation | Risk Level |
+ |---|---|---|---|
+ | Hanging agent loops | `AbortController` in `DaemonSession`; REST cancel calls `abort()` | `runStep` must accept an `AbortSignal` parameter -- currently not in spec | **ORANGE** -- manageable but must be explicit in design |
+ | Two `initializeContainer()` calls corrupting DI state | Process-level `initializeWorkRailProcess()` called once | Exact interface (`SharedEngineContext`) not yet specified; `mcp-server.ts` startup path needs refactor | **ORANGE** -- must be designed and tested first; highest-risk change |
+ | Lock contention under high concurrent load | `withHealthySessionLock` per session; v1 uses queue (concurrency = 1) | Session concurrency limit in `DaemonSessionManager` (max N for v1.5) | **YELLOW** -- performance concern, not correctness |
+
+ ---
+
+ ## Runner-Up / Simpler Alternative Review
+
+ **Runner-up (Candidate 1: Sequential):**
+ - C1's FIFO queue ensures the `engineActive` guard is never violated without requiring the guard to change
+ - This strength is worth borrowing: v1 runs with `maxConcurrentSessions: 1`
+ - C1 loses because it provides no live view and no human override path -- unacceptable for the trust model change that autonomous execution represents
+
+ **Simpler alternative (C3 without REST control plane):**
+ - Saves ~150 lines of code in v1
+ - Loses: operators cannot pause a runaway autonomous session
+ - For local dev (one developer, their own machine), acceptable
+ - For team deployment, an unacceptable safety gap
+ - Decision: include the REST control plane in v1; keep it simple (3-4 routes)
+
+ **Hybrid adopted: C3 with `maxConcurrentSessions: 1` default**
+ - C1 safety (queue) + C3 architecture (DaemonSessionManager, REST control plane)
+ - No `engineActive` guard change needed in v1 (queue ensures one engine call at a time)
+ - Path to full concurrency: design `SharedEngineContext`, enable in v1.5
+
+ ---
+
+ ## Philosophy Alignment
+
+ **Satisfied clearly:**
+ - Errors as data (ResultAsync throughout)
+ - Immutability (append-only events, typed status transitions)
+ - Make illegal states unrepresentable (`DaemonSession.status` discriminated union)
+ - Explicit domain types (`AnthropicApiKey`, `GitLabToken` branded types)
+ - Validate at boundaries (Zod for trigger payloads)
+ - DI for boundaries (`AgentLoopPort`, `ToolExecutorPort` injected)
+ - YAGNI with discipline (`maxConcurrentSessions: 1`, cross-repo deferred)
+
+ **Under tension (all acceptable):**
+ - Determinism: LLM outputs are non-deterministic by nature; WorkRail's value is structural enforcement, not content determinism
+ - Pure functions: the multi-turn LLM loop is inherently stateful; the `runStep` API is as pure as possible
+ - Architectural fixes over patches: the queue is a deliberate v1 design, not a hidden workaround; `SharedEngineContext` is designed and documented
+
+ ---
+
+ ## Findings
+
+ ### RED (blocking -- must be resolved before implementation begins)
+
+ None.
+
+ ### ORANGE (must address before shipping)
+
+ **[ORANGE-1] `runStep` missing `AbortSignal` parameter**
+ - Finding: The `step-runner` spec does not include an `AbortSignal` parameter. Without it, `DaemonSession.abortController.abort()` does not propagate to LLM calls or Bash subprocesses.
+ - Required fix: `runStep(pending: PendingStep, toolExecutor: ToolExecutorPort, llmPort: AgentLoopPort, signal: AbortSignal): RA<StepOutput, StepError>`
+ - Impact: REST `DELETE /api/v2/sessions/:id` (cancel) and `POST pause` do not work without this
+
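+ A hedged sketch of the required fix (the port shapes and error type are
+ illustrative assumptions; only the `runStep` signature comes from the finding above):
+
+ ```typescript
+ import { errAsync, ResultAsync } from 'neverthrow';
+
+ interface PendingStep { prompt: string; stepId: string; title: string }
+ interface StepOutput { notesMarkdown: string; context: Record<string, unknown> }
+ type StepError = { kind: 'aborted' } | { kind: 'llm_failed'; cause: unknown };
+
+ // Assumed port shapes: both take the signal, so that
+ // DaemonSession.abortController.abort() reaches LLM calls and subprocesses.
+ interface ToolExecutorPort {
+   Bash(cmd: string, signal: AbortSignal): Promise<string>;
+ }
+ interface AgentLoopPort {
+   complete(prompt: string, tools: ToolExecutorPort, signal: AbortSignal): Promise<StepOutput>;
+ }
+
+ function runStep(
+   pending: PendingStep,
+   toolExecutor: ToolExecutorPort,
+   llmPort: AgentLoopPort,
+   signal: AbortSignal,
+ ): ResultAsync<StepOutput, StepError> {
+   if (signal.aborted) return errAsync<StepOutput, StepError>({ kind: 'aborted' });
+   return ResultAsync.fromPromise(
+     llmPort.complete(pending.prompt, toolExecutor, signal),
+     (cause): StepError =>
+       signal.aborted ? { kind: 'aborted' } : { kind: 'llm_failed', cause },
+   );
+ }
+ ```
+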
+ **[ORANGE-2] `SharedEngineContext` interface not specified**
+ - Finding: The process-level `initializeWorkRailProcess()` function is identified as needed but its return type and contract are not designed.
+ - Required: `initializeWorkRailProcess(config: ProcessConfig): Promise<SharedEngineContext>` where `SharedEngineContext` exposes the engine instance + DI-resolved ports that both MCP server and daemon entry points need.
+ - Impact: If both entry points call `initializeContainer()` independently, DI container state is indeterminate. This is the highest-risk change; must be designed and tested first.
+ - Note: For v1 with `maxConcurrentSessions: 1`, the queue ensures the `engineActive` boolean guard is never violated -- so this is not needed for v1. But it must be designed in v1 and tested before enabling concurrency in v1.5.
+
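+ A hedged sketch of one possible shape, purely to make the ask concrete (everything
+ except the two names from the finding is an assumption):
+
+ ```typescript
+ // Placeholder types for illustration.
+ interface WorkRailEngine { /* startWorkflow, continueWorkflow, ... */ }
+ interface AgentLoopPort { /* one LLM conversation turn */ }
+ interface ProcessConfig { dataDir: string; runtimeMode: 'library' | 'server' }
+
+ // What both entry points (MCP server, daemon) would receive.
+ interface SharedEngineContext {
+   engine: WorkRailEngine;
+   agentLoop: AgentLoopPort;
+ }
+
+ let shared: SharedEngineContext | undefined;
+ declare function buildContainerOnce(config: ProcessConfig): Promise<SharedEngineContext>;
+
+ // Idempotent: the second entry point gets the same instance, so the
+ // underlying initializeContainer() runs exactly once per process.
+ async function initializeWorkRailProcess(config: ProcessConfig): Promise<SharedEngineContext> {
+   if (!shared) shared = await buildContainerOnce(config);
+   return shared;
+ }
+ ```
+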
+ ### YELLOW (should address, not blocking)
+
+ **[YELLOW-1] Session concurrency limit not specified**
+ - Finding: `DaemonSessionManager` has no upper bound on concurrent sessions even in the v1.5 full-concurrency mode.
+ - Recommendation: Add `maxConcurrentSessions: number` to `DaemonConfig` with a safe default (e.g., 10). Sessions beyond the limit are queued, not rejected.
+
+ **[YELLOW-2] `runtimeMode` may be insufficient**
+ - Finding: The current `runtimeMode` discriminant (`library` | `server`) does not express "server + daemon combined" mode. A third mode or flags object may be needed.
+ - Recommendation: Evaluate whether `initializeContainer({ runtimeMode: 'server', daemon: true })` is sufficient or whether a new mode value is needed when designing `SharedEngineContext`.
+
+ **[YELLOW-3] Cross-repo tool executor interface not extensible**
+ - Finding: The v1 tool executor spec (`Bash`, `Read`, `Write`) is single-workspace. If the first real use case requires cross-repo, the interface must be redesigned.
+ - Recommendation: Design `ToolExecutorPort` to support an optional `repo` parameter from day one: `Bash(cmd: string, opts?: { repo?: string }): RA<string, ToolError>`. Single-workspace behavior when `repo` is absent; cross-repo routing when present. Costs nothing to include; prevents a breaking interface change later.
+
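+ A hedged sketch of that port, assuming `RA` is the codebase's `ResultAsync` alias
+ and inventing a minimal `ToolError` for illustration:
+
+ ```typescript
+ import { ResultAsync } from 'neverthrow';
+
+ type RA<T, E> = ResultAsync<T, E>; // assumed alias, as used in the spec above
+ type ToolError = { kind: 'exec_failed'; message: string };
+
+ interface ToolExecutorPort {
+   // Single-workspace behavior when `opts.repo` is absent; cross-repo routing
+   // when present. Optional from day one, so adding it later breaks nothing.
+   Bash(cmd: string, opts?: { repo?: string }): RA<string, ToolError>;
+   Read(path: string, opts?: { repo?: string }): RA<string, ToolError>;
+   Write(path: string, content: string, opts?: { repo?: string }): RA<void, ToolError>;
+ }
+ ```
+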
+ ---
+
+ ## Recommended Revisions
+
+ 1. **[Required for v1]** Add `signal: AbortSignal` to the `runStep` signature.
+ 2. **[Required for v1]** Design the `SharedEngineContext` interface and `initializeWorkRailProcess()` signature (even if only the queue-mode path is enabled in v1).
+ 3. **[Strongly recommended for v1]** Design `ToolExecutorPort` with an optional `repo` parameter to avoid a future breaking change.
+ 4. **[v1.5]** Evaluate `runtimeMode` extension before enabling full concurrency.
+ 5. **[v1.5]** Add a session concurrency limit with a safe default.
+
+ ---
+
+ ## Residual Concerns
+
+ 1. **Agent loop correctness is the riskiest unknown.** The step-runner must correctly handle multi-turn LLM conversations with tool calls (not just single LLM calls). This is the piece that has never been built before in WorkRail. The pi-mono `agentLoop` reference is the best existing implementation to study. Whether to use pi-mono directly or implement from scratch against the Anthropic SDK is an unresolved dependency decision.
+
+ 2. **`mcp-server.ts` refactor scope is uncertain.** Introducing `initializeWorkRailProcess()` requires refactoring how `startStdioServer` and `startHttpServer` initialize the container. The scope of this refactor depends on how deeply initialization is entangled in each transport entry point. This should be the first code spike when C3 implementation begins.
+
+ 3. **Console integration is not specified.** The REST control plane additions are specified (`daemon-status`, `pause`, `resume`, `cancel`). How these appear in the console UI is not -- that is a separate design question for the console team / next iteration.
package/docs/design/daemon-engine-design-candidates.md
@@ -0,0 +1,210 @@
+ # Daemon Execution Engine -- Design Candidates
+
+ **Status:** Raw investigative material -- for main agent review
+ **Date:** 2026-04-14
+ **Context:** Architecture decision for WorkRail's autonomous execution daemon
+
+ ---
+
+ ## Problem Understanding
+
+ ### Core tensions
+
+ 1. **Build speed vs. structural correctness**: `engine-factory.ts` is 477 lines and already wraps the exact same v2 handlers the MCP tools call. Using it from a daemon looks like a 50-line win. But it has two hard correctness bugs when used alongside a running MCP server.
+
+ 2. **Colocation vs. isolation**: The DI container is a module-level tsyringe global singleton. Its design invariant is one container per process. Forcing two concurrent execution paths through it violates the invariant by design, not by accident.
+
+ 3. **API surface stability vs. speed of access**: Option A gives the daemon access to internal handler functions (`executeStartWorkflow`, `executeContinueWorkflow`) -- no versioning, no contract boundary. Option B uses the MCP HTTP API -- versioned, stable, Zod-validated.
+
+ 4. **Testing simplicity vs. deployment correctness**: Option A requires no running HTTP server in tests. Option B does (or a mock MCP server). This is a real cost, not a theoretical one.
+
+ ### Likely seam
+
+ **OS process boundary.** WorkRail already has two modes: library (same process, no signals) and server (own process, HTTP transport). The daemon belongs in a third role: a separate process that is a consumer of the server's MCP HTTP API. The seam is the `/mcp` endpoint.
+
+ ### What makes this hard / what a junior developer would miss
+
+ - `engineActive = false` in `engine-factory.ts` is a hard block, not advisory
+ - `process.kill(pid, 0)` in `LocalSessionLockV2` cannot distinguish two call paths that share a PID -- it will treat a daemon-held lock as valid even after the daemon crashes, because the process (MCP server) is still alive
+ - The DI global singleton means both paths share keyring material -- a compromise or divergence in one affects the other
+ - `ThrowingProcessTerminator` in library mode throws instead of calling `process.exit()` -- fine for embedding, but if the daemon has an invariant violation in this mode, the event loop continues in a possibly corrupt state
+
+ ---
+
+ ## Philosophy Constraints
+
+ From `/Users/etienneb/CLAUDE.md` and codebase patterns:
+
+ **Principles under most pressure:**
+ - **Make illegal states unrepresentable**: Option A makes a lock-held-by-same-process state possible and undetectable. Option B makes it structurally impossible.
+ - **Dependency injection for boundaries**: Option A collapses the boundary between daemon and server infrastructure. Option B preserves it -- the MCP API is the injected boundary.
+
+ **Principles that both options satisfy:**
+ - **Errors are data** (`neverthrow` / `ResultAsync`): Both options can surface typed errors.
+ - **YAGNI with discipline**: Neither option over-engineers.
+
+ **No stated-vs-practiced conflicts** observed in the codebase.
+
+ ---
+
+ ## Impact Surface
+
+ If Option B is chosen:
+ - `http-entry.ts` / `http-listener.ts`: no changes needed -- the HTTP server is already multi-client
+ - `StreamableHTTPServerTransport` with `sessionIdGenerator: crypto.randomUUID`: already handles concurrent MCP clients
+ - Daemon uses `@modelcontextprotocol/sdk/client` (the MCP SDK is already a dependency)
+ - `continueToken` / `checkpointToken` are the only state the daemon carries between steps
+
+ If Option A were chosen (rejected):
+ - The `engineActive` guard would need to be bypassed or disabled -- violates explicit design intent
+ - `LocalSessionLockV2.acquire()` would give false "lock is valid" results for daemon-held locks after daemon crashes
+ - Both paths would share the same `DI.V2.Keyring` instance -- any key rotation in one path invalidates tokens in the other
+
+ ---
+
+ ## Candidates
+
+ ### Candidate 1: Daemon as MCP HTTP client (RECOMMENDED)
+
+ **Summary:** Separate OS process. Uses `@modelcontextprotocol/sdk/client` pointed at `localhost:3100/mcp`. Drives sessions via `start_workflow` / `continue_workflow` HTTP calls. No shared in-process state with the MCP server.
+
+ **Tensions resolved:**
+ - Colocation vs. isolation: resolved -- different PIDs, separate DI containers
+ - API stability: resolved -- MCP HTTP contract is versioned and Zod-validated
+ - Cloud/Docker portability: resolved -- HTTP over localhost = HTTP over private network
+
+ **Tension accepted:** Testing requires a running HTTP server or a mock MCP server.
+
+ **Boundary solved at:** OS process boundary via HTTP. This is the boundary WorkRail already establishes for its MCP transport.
+
+ **Why this boundary is the best fit:** The `engineActive` guard, the DI global singleton, and the PID-based lock check are all designed around the process boundary as the fundamental isolation unit. Respecting this boundary means working with the codebase's invariants, not against them.
+
+ **Failure mode:** MCP session token propagation -- `StreamableHTTPServerTransport` requires `Mcp-Session-Id` headers after session establishment. The MCP SDK client handles this automatically; using raw `fetch` would require manual header management. Mitigation: use the SDK client.
+
+ **Repo-pattern relationship:** Follows `http-entry.ts` (HTTP transport already established). Adapts pi-mono's `agentLoop` (stateless loop calling external tools). No departure from existing patterns.
+
+ **Gains:**
+ - Zero session lock contention
+ - Zero DI collision
+ - Cloud/Docker portable without code changes
+ - Independently testable with a mock server
+ - Daemon crash only affects its own sessions (no shared state to corrupt)
+
+ **Gives up:**
+ - Direct function call latency (~1-5ms per step vs. microseconds)
+ - Requires the MCP HTTP server to be running at startup
+
+ **Scope judgment:** Best-fit. Directly addresses the problem without over-engineering.
+
+ **Philosophy fit:**
+ - Honors: make-illegal-states-unrepresentable, DI-for-boundaries, validate-at-boundaries, errors-are-data
+ - Conflicts: none
+
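+ A hedged sketch of the daemon-side client using the MCP TypeScript SDK (the tool
+ argument shapes are assumptions; the import paths follow the SDK's documented client API):
+
+ ```typescript
+ import { Client } from '@modelcontextprotocol/sdk/client/index.js';
+ import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
+
+ async function main(): Promise<void> {
+   // The SDK transport manages the Mcp-Session-Id header automatically,
+   // which is the failure mode called out above for raw fetch.
+   const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3100/mcp'));
+   const client = new Client({ name: 'workrail-daemon', version: '0.1.0' });
+   await client.connect(transport);
+
+   // Tool names come from the candidate summary; argument shapes are illustrative.
+   const started = await client.callTool({
+     name: 'start_workflow',
+     arguments: { workflowId: 'mr-review' },
+   });
+   console.log(started);
+
+   // ...run the agent loop on the pending prompt, then:
+   // await client.callTool({ name: 'continue_workflow', arguments: { /* tokens, output, context */ } });
+
+   await client.close();
+ }
+
+ main().catch((err) => {
+   console.error(err);
+   process.exit(1);
+ });
+ ```
+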
+ ---
+
+ ### Candidate 2: engine-factory via child_process.fork + IPC
+
+ **Summary:** Daemon forks a child process that exclusively runs `createWorkRailEngine()`. Parent sends JSON-serialized `{ kind: 'start' | 'continue' | 'checkpoint', ... }` messages over `process.send()` IPC. Child responds with serialized `EngineResult`. Different PIDs -- lock check works.
+
+ **Tensions resolved:**
+ - Colocation vs. isolation: resolved (different PIDs via fork)
+ - Test ergonomics: resolved (no HTTP server needed)
+
+ **Tensions accepted:**
+ - API surface stability: weak -- IPC message format is ad-hoc, not versioned
+ - New keyring divergence risk: if two processes load the same keyring file independently and either rotates keys, token signatures from the other process become invalid
+ - Cloud/Docker: forks don't cross container boundaries; must be replaced with a network transport for cloud
+
+ **Boundary solved at:** OS process boundary via `child_process.fork()` IPC.
+
+ **Why this boundary is NOT the best fit:** It introduces a novel IPC protocol not present in the codebase and creates a keyring divergence risk that does not exist in Option B. The fork model works locally but fails in Docker multi-container or any distributed deployment.
+
+ **Failure mode (critical):** Keyring divergence. Two processes loading the same `~/.workrail/keyring.json` get the same initial HMAC keys. But if either process rotates keys (key expiry, re-keying), the other process's in-memory keyring diverges. Tokens signed by process A may fail validation in process B. This is a non-obvious correctness risk with no mitigation short of external coordination.
+
+ **Repo-pattern relationship:** Departs from existing patterns. No `child_process.fork()` or IPC in the codebase.
+
+ **Gains:**
+ - PID isolation (lock check works)
+ - No HTTP server dependency
+ - Reuses the typed `WorkRailEngine` API
+
+ **Gives up:**
+ - Introduces an ad-hoc IPC protocol
+ - Keyring divergence risk
+ - No cloud portability
+ - More maintenance burden than Option B
+
+ **Scope judgment:** Too broad. Solves the PID problem but introduces a new correctness risk. More complex than Option B without the deployment benefits.
+
+ **Philosophy fit:**
+ - Honors: make-illegal-states-unrepresentable (PIDs now different), errors-are-data
+ - Conflicts: YAGNI (novel IPC layer), validate-at-boundaries (IPC serialization is unvalidated), architectural-fixes-over-patches (this is a patch, not a fix)
+
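+ For concreteness, a hedged sketch of the ad-hoc IPC shape this candidate implies
+ (illustrative only -- this is the protocol the critique above argues against):
+
+ ```typescript
+ import { fork } from 'node:child_process';
+
+ // The message union from the summary, serialized as plain JSON over
+ // process.send() -- no versioning, no Zod validation at the boundary.
+ type EngineRequest =
+   | { kind: 'start'; workflowId: string }
+   | { kind: 'continue'; continueToken: string; output: string }
+   | { kind: 'checkpoint'; continueToken: string };
+
+ const child = fork('./engine-child.js'); // child would run createWorkRailEngine()
+
+ child.on('message', (result) => {
+   // The serialized EngineResult arrives untyped on this side.
+   console.log('engine result', result);
+ });
+
+ const request: EngineRequest = { kind: 'start', workflowId: 'mr-review' };
+ child.send(request);
+ ```
+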
+ ---
+
+ ### Candidate 3: Hybrid -- MCP HTTP in production, engine-factory in test/local
+
+ **Summary:** At startup, the daemon checks: if `WORKRAIL_TRANSPORT=http` and `localhost:{port}/mcp` is reachable, use Candidate 1 (MCP HTTP client). Otherwise, use `createWorkRailEngine()` directly (Option A). Internal `DaemonTransport` union type: `{ kind: 'mcp_http'; client: McpClient } | { kind: 'direct'; engine: WorkRailEngine }`.
+
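+ A hedged sketch of that startup check (the probe helper and placeholder types are
+ assumptions; only the union shape comes from the summary above):
+
+ ```typescript
+ interface McpClient {}       // placeholder for the SDK client
+ interface WorkRailEngine {}  // placeholder for the engine-factory result
+ declare function connectMcpClient(url: URL): Promise<McpClient>;
+ declare function createWorkRailEngine(): WorkRailEngine;
+
+ type DaemonTransport =
+   | { kind: 'mcp_http'; client: McpClient }
+   | { kind: 'direct'; engine: WorkRailEngine };
+
+ async function resolveDaemonTransport(port: number): Promise<DaemonTransport> {
+   if (process.env.WORKRAIL_TRANSPORT === 'http') {
+     try {
+       // Reachability probe; fall through to the direct path on failure.
+       const client = await connectMcpClient(new URL(`http://localhost:${port}/mcp`));
+       return { kind: 'mcp_http', client };
+     } catch {
+       // unreachable server: fall back below
+     }
+   }
+   // Advisory only: nothing structural stops this path from co-existing
+   // with a running MCP server -- the aliasing risk the review calls out.
+   return { kind: 'direct', engine: createWorkRailEngine() };
+ }
+ ```
+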
+ **Tensions resolved:**
+ - Test ergonomics: resolved (direct path needs no HTTP server)
+ - Build speed: partially resolved (reuse engine-factory for local scenarios)
+
+ **Tension accepted:** Two code paths must be maintained. Any new engine feature must be reflected in both, or the direct path diverges from the MCP path over time.
+
+ **Boundary solved at:** Startup-time capability detection. Adapts the `resolveTransportMode()` pattern from `mcp-server.ts`.
+
+ **Why this boundary is not the best fit:** The `direct` path still has the `engineActive` singleton constraint and the PID aliasing risk if an MCP server is accidentally co-located. The runtime check protecting against this is advisory (a thrown Error), not structural (a type system constraint).
+
+ **Failure mode:** If `WORKRAIL_DAEMON_TRANSPORT=direct` is set in a production environment where an MCP server is also running, the PID aliasing bug returns silently. There is no compile-time protection.
+
+ **Repo-pattern relationship:** Adapts `resolveTransportMode()` from `mcp-server.ts`. Reasonable adaptation.
+
+ **Gains:**
+ - Test ergonomics (no HTTP server needed in direct mode)
+ - Migration path for library embedding use cases
+
+ **Gives up:**
+ - Two-path maintenance burden
+ - Direct path safety is advisory, not structural
+ - Adds conditional logic to every daemon session operation
+
+ **Scope judgment:** Slightly too broad for the production case. Best-fit only if test ergonomics is a primary blocking concern.
+
+ **Philosophy fit:**
+ - Honors: YAGNI (reuses existing API), errors-are-data
+ - Conflicts: make-illegal-states-unrepresentable (direct path allows aliasing), architectural-fixes-over-patches (hybrid is a patch)
+
+ ---
+
+ ## Comparison and Recommendation
+
+ | Criterion | C1 (MCP HTTP) | C2 (fork+IPC) | C3 (hybrid) |
+ |-----------|---------------|---------------|-------------|
+ | Lock contention | Resolved structurally | Resolved via fork | Resolved in prod path |
+ | DI isolation | Resolved | Resolved | Resolved in prod path |
+ | Keyring safety | N/A (server owns it) | New risk | N/A in MCP path |
+ | Cloud/Docker | Excellent | Poor (no cross-container fork) | Good only in MCP path |
+ | API stability | Strong (MCP contract) | Weak (ad-hoc IPC) | Mixed |
+ | Test ergonomics | Needs mock server | No server needed | No server needed (direct) |
+ | Maintenance burden | Low (single path) | High (IPC layer) | Medium (two paths) |
+ | Repo pattern fit | Excellent | Poor | Acceptable |
+ | Philosophy alignment | Strong | Weak | Mixed |
+
+ **Recommendation: Candidate 1 (MCP HTTP client)**
+
+ All five decision criteria are satisfied structurally, not by advisory guards or operational conventions. Cloud portability is zero-cost. The implementation is the shortest path to correctness (~100 lines for the daemon's MCP client wrapper + agent loop), not the shortest path to a running prototype.
+
+ ---
+
+ ## Self-Critique
+
+ **Strongest counter-argument against C1:**
+ The MCP HTTP server must be running before the daemon operates. In a single-binary deployment, this requires orchestrating two processes. In practice: use a process supervisor (PM2, systemd, Docker Compose `depends_on`), or have the daemon implement a startup retry loop. This is operational boilerplate, not a correctness problem.
+
+ **What narrower option might still work:**
+ Candidate 3 (hybrid) satisfies all criteria in its MCP path. It loses because the two-path maintenance burden and the advisory-only guard on the direct path make it structurally weaker than C1, with no material benefit that C1 cannot achieve with a test-only mock server.
+
+ **What broader option might be justified:**
+ A full job queue (Redis/BullMQ backing the daemon) for multi-tenant SaaS scale. Evidence required: concurrent sessions, multiple daemon instances, distributed scheduling. Not in scope.
+
+ **Assumption that would invalidate this design:**
+ The daemon must operate in an air-gapped/offline environment with no localhost HTTP available. In that case, Candidate 2 (fork+IPC) is the right shape -- but it requires resolving the keyring divergence risk first (e.g., move token signing to a shared file-based signing service, or use the same process for both keyring and daemon).
+
+ ---
+
+ ## Open Questions for the Main Agent
+
+ 1. Is the MCP HTTP startup dependency acceptable, or is there a single-binary deployment requirement that makes Candidate 1 impractical?
+ 2. Should the daemon use a test-mode mock MCP server (simulated in-memory) or a real HTTP server in unit tests?
+ 3. Should the `DaemonTransport` abstraction from Candidate 3 be built as a future extension point even if only the MCP path is implemented initially?
+ 4. Is the 1-5ms HTTP overhead per step a real concern for the planned step intervals (seconds to minutes), or is it safe to ignore?
+ 5. What is the expected concurrency model for the daemon -- one session at a time, or multiple concurrent sessions driving different workflows?