npm - @exaudeus/workrail - Versions diffs - 3.34.2 → 3.35.0 - Mend

@exaudeus/workrail 3.34.2 → 3.35.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/dist/console-ui/assets/{index-DSRkHTz1.js → index-B10Bn8qC.js} +1 -1
package/dist/console-ui/index.html +1 -1
package/dist/daemon/workflow-runner.d.ts +1 -0
package/dist/daemon/workflow-runner.js +148 -10
package/dist/manifest.json +7 -7
package/docs/design/daemon-complete-step-tool-candidates.md +160 -0
package/docs/design/daemon-complete-step-tool-design-review.md +82 -0
package/docs/design/daemon-complete-step-tool-implementation-plan.md +166 -0
package/docs/ideas/backlog.md +224 -0
package/package.json +1 -1

package/docs/design/daemon-complete-step-tool-implementation-plan.md ADDED Viewed

@@ -0,0 +1,166 @@
+# Implementation Plan: daemon complete_step tool
+## Problem Statement
+The daemon's `continue_workflow` tool requires the LLM to round-trip a `continueToken` (an HMAC-signed opaque token). The LLM frequently mangles this token, causing TOKEN_BAD_SIGNATURE errors that kill sessions. The fix: add a `complete_step` tool where the daemon injects the `continueToken` internally -- the LLM never sees it.
+## Acceptance Criteria
+1. `makeCompleteStepTool()` function exists in `src/daemon/workflow-runner.ts`, exported alongside `makeContinueWorkflowTool()`
+2. `complete_step` accepts: `notes: string` (required, min 50 chars), `artifacts?: unknown[]`, `context?: Record<string, unknown>`
+3. The tool injects the current session's `continueToken` internally before calling `executeContinueWorkflow` -- LLM never provides it
+4. `continueToken` is managed via a closure variable `currentContinueToken` in `runWorkflow()`, updated on advance and blocked-retry
+5. On successful advance: returns `{ status: 'advanced', nextStep: '<step title>' }` text
+6. On workflow complete: returns `{ status: 'complete' }` text
+7. On blocked: returns human-readable feedback saying 'call complete_step again' (not continue_workflow)
+8. Runtime validation: throws if `notes.length < 50`
+9. `complete_step` is in the daemon tools list in `runWorkflow()` alongside `continue_workflow`
+10. `continue_workflow` description is marked `[DEPRECATED in daemon sessions -- use complete_step]`
+11. `BASE_SYSTEM_PROMPT` lists `complete_step` as the primary advancement tool
+12. Initial prompt removes `continueToken: ${startContinueToken}` and says 'Call complete_step with your notes when done'
+13. Unit tests cover: happy-path advance, workflow complete, blocked response (retryable + non-retryable), notes too short, artifacts pass-through
+14. All existing tests pass
+## Non-Goals
+- No schema changes to `V2ContinueWorkflowInputShape` or public MCP tools
+- No new HTTP routes or public API changes
+- `complete_step` is daemon-only -- NOT added to the MCP server's public tools list
+- No migration of `makeContinueWorkflowTool` to use `TokenRef` pattern
+- No removal of `continue_workflow` from the daemon tools list (backward compat)
+- No changes to `rehydrate` flow
+## Philosophy-Driven Constraints
+- **Immutability by default**: `currentContinueToken` is the only mutable shared state; confined to `runWorkflow()` closure
+- **Validate at boundaries**: runtime `notes.length` check in `execute()` (JSON Schema is informational only)
+- **Prefer fakes over mocks**: tests use fake `executeContinueWorkflow` injection
+- **YAGNI**: zero new abstraction types
+- **Document "why" not "what"**: WHY comments on all non-obvious decisions
+## Invariants
+1. `persistTokens(sessionId, newToken, ...)` is always called BEFORE `onAdvance()` fires (crash safety)
+2. `currentContinueToken` is updated to the advance token on `kind: 'ok'` responses
+3. `currentContinueToken` is updated to the retry token on `kind: 'blocked'` responses
+4. `continueToken` never appears in `complete_step` response text
+5. `intent: 'advance'` is hardcoded -- LLM cannot pass `rehydrate` via `complete_step`
+6. `complete_step` is NOT exposed in the MCP server's public tool registration
+## Selected Approach
+**Candidate 1: Inline closure variable + two callback paths**
+`runWorkflow()` holds `let currentContinueToken = startContinueToken`. The variable is updated:
+- In `onAdvance(stepText, continueToken)` -- uses the `continueToken` param (already available, currently ignored)
+- Via `onTokenUpdate: (t: string) => void` callback passed to `makeCompleteStepTool()` -- called on blocked retry
+`makeCompleteStepTool()` signature:
+```typescript
+export function makeCompleteStepTool(
+  sessionId: string,
+  ctx: V2ToolContext,
+  getCurrentToken: () => string,
+  onAdvance: (nextStepText: string, continueToken: string) => void,
+  onComplete: (notes: string | undefined) => void,
+  onTokenUpdate: (t: string) => void,
+  schemas: Record<string, any>,
+  _executeContinueWorkflowFn?: typeof executeContinueWorkflow,
+  emitter?: DaemonEventEmitter,
+  workrailSessionId?: string | null,
+): AgentTool
+```
+**Runner-up:** Candidate 2 (`TokenRef` object) -- rejected: requires `makeContinueWorkflowTool` signature change (out of scope), violates YAGNI.
+## Vertical Slices
+### Slice 1: Schema and factory function
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Add `CompleteStepParams` JSON Schema to `getSchemas()`:
+  ```json
+  {
+    "type": "object",
+    "properties": {
+      "notes": { "type": "string", "minLength": 50, "description": "..." },
+      "artifacts": { "type": "array", "items": {}, "description": "..." },
+      "context": { "type": "object", "additionalProperties": true }
+    },
+    "required": ["notes"],
+    "additionalProperties": false
+  }
+  ```
+- Add `makeCompleteStepTool()` factory function (described above)
+- Runtime notes validation inside `execute()`
+- Full blocked/advance/complete handling (mirror `makeContinueWorkflowTool` but without token in response text)
+**AC:** Function exists, exported, executes correctly with fake injection
+### Slice 2: runWorkflow() integration
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Add `let currentContinueToken = startContinueToken` after `startContinueToken` is assigned
+- Update `onAdvance` to set `currentContinueToken = continueToken` (second param was ignored before)
+- Add `complete_step` to the tools list with `makeCompleteStepTool(..., () => currentContinueToken, onAdvance, onComplete, (t) => { currentContinueToken = t; }, ...)`
+- Mark `continue_workflow` description as deprecated
+**AC:** `runWorkflow()` compiles; `currentContinueToken` is updated on advance
+### Slice 3: System prompt + initial prompt updates
+**Files:** `src/daemon/workflow-runner.ts`
+**Work:**
+- Update `BASE_SYSTEM_PROMPT` tools section: add `complete_step` as primary tool, mark `continue_workflow` as deprecated
+- Update `initialPrompt` in `runWorkflow()`: remove `continueToken: ${startContinueToken}` from the text; change 'call continue_workflow' to 'call complete_step'
+**AC:** Prompt does not contain the continueToken string; 'complete_step' appears as primary tool
+### Slice 4: Tests
+**Files:** `tests/unit/workflow-runner-complete-step.test.ts`
+**Test cases:**
+- TC1: notes present, advance returns `{ status: 'advanced', nextStep: '...' }`
+- TC2: workflow complete returns `{ status: 'complete' }`
+- TC3: blocked retryable -- feedback says 'call complete_step again', `onTokenUpdate` called with retry token
+- TC4: blocked non-retryable -- feedback says cannot proceed without resolving
+- TC5: notes too short (< 50 chars) -- tool throws
+- TC6: notes absent -- tool throws
+- TC7: artifacts pass-through -- artifacts forwarded to executeContinueWorkflow
+- TC8: no artifacts, no output artifact object constructed (empty array guard)
+- TC9: `continueToken` NOT in response text (regression guard)
+- TC10: `getCurrentToken()` is called (not a hardcoded token) -- verify via fake that captures input
+**AC:** All 10 tests pass; `npm run test:unit` green
+## Risk Register
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| LLM uses deprecated continue_workflow with hallucinated token | Low | Medium | System prompt deprecation notice; transition risk accepted |
+| Token not updated on blocked retry | Low | High | Test TC3 catches this |
+| `continueToken` in response text | Low | Medium | Test TC9 catches this |
+| Initial prompt still contains token string | Low | Medium | Test prompt content in system-prompt test |
+## PR Packaging Strategy
+Single PR: `feat/daemon-complete-step-tool`
+All 4 slices go in one commit per slice (or a single clean commit). No multi-PR needed -- change is additive, no breaking changes.
+## Philosophy Alignment
+| Slice | Principle | Status |
+|---|---|---|
+| Slice 1: Schema | Make illegal states unrepresentable | Satisfied -- notes required, intent hardcoded |
+| Slice 1: Factory | Validate at boundaries | Satisfied -- runtime check in execute() |
+| Slice 1: Factory | Immutability by default | Satisfied -- mutation via callbacks only |
+| Slice 2: runWorkflow | Determinism | Acceptable tension -- sequential execution makes mutable state deterministic |
+| Slice 3: Prompts | Document "why" | Satisfied -- prompts explain the tool's purpose |
+| Slice 4: Tests | Prefer fakes over mocks | Satisfied -- fake injection pattern |
+| All slices | YAGNI | Satisfied -- zero new abstractions |
+## Unresolved Unknowns
+None material. All design decisions are confirmed.
+- `unresolvedUnknownCount`: 0
+- `planConfidenceBand`: High

package/docs/ideas/backlog.md CHANGED Viewed

@@ -4751,3 +4751,227 @@ The daemon assembles a pre-packaged context bundle from these sources before the
 - How do you handle a trigger that spans multiple systems (e.g. a Jira ticket about a GitHub PR)?
 **This is a design-first item** -- the ideas are promising but the right shape isn't obvious. Needs a discovery pass before any implementation.
+---
+### Rethinking the subagent loop from first principles (Apr 18, 2026)
+**Step back from all assumptions.** The current design assumes subagent spawning works like Claude Code's `mcp__nested-subagent__Task` -- the LLM decides when to spawn, what to give it, and handles the result. That's not the only model, and it might not be the best one for WorkTrain.
+---
+#### The current assumption (inherited from Claude Code)
+```
+Agent decides → calls spawn_agent tool → subagent runs → agent gets result → agent continues
+```
+The LLM is the orchestrator. It decides when parallelism is needed, what context to pass, how to handle results.
+**Problems with this:**
+- LLMs are bad at orchestration decisions -- they sometimes delegate when they shouldn't, sometimes don't when they should
+- Context passing is lossy -- the LLM decides what to include, which is usually insufficient
+- Subagent output competes with everything else in the parent's context window
+- The LLM has to reason about the subagent's output before continuing -- burns context and turns
+- No enforcement -- the LLM can skip delegation entirely and just do the work itself (often wrong)
+---
+#### Alternative model: workflow-declared parallelism, daemon-enforced
+**The workflow spec is the orchestration. The daemon is the orchestrator. The LLM is the executor.**
+```yaml
+# Workflow step definition
+- id: parallel-review
+  type: parallel
+  agents:
+    - workflow: routine-correctness-review
+      contextFrom: [phase-3-output, candidateFiles]
+    - workflow: routine-philosophy-alignment
+      contextFrom: [phase-0-output, philosophySources]
+    - workflow: routine-hypothesis-challenge
+      contextFrom: [phase-2-output, selectedApproach]
+  synthesisStep: synthesize-parallel-review
+```
+The daemon sees this step definition and:
+1. Automatically spawns 3 child sessions with specified workflows
+2. Injects the declared context bundles (from prior step outputs) into each child
+3. Waits for all 3 to complete
+4. Passes all 3 results to a synthesis step
+5. Injects the synthesis into the parent agent's next turn
+**The parent LLM never decides to spawn anything.** It just does its part. The workflow declares the orchestration pattern. The daemon enforces it.
+---
+#### What this changes about the agent's job
+Today: "Do this work, and decide when to delegate parts of it to subagents."
+New model: "Do this bounded cognitive task. The daemon handles everything else."
+The agent's job becomes strictly about the cognitive work -- reasoning, writing, deciding within a defined scope. Orchestration, parallelism, context packaging, result synthesis -- all daemon responsibilities defined by the workflow spec.
+---
+#### The agent gives context to the daemon, not to subagents directly
+Instead of the LLM calling `spawn_agent({ goal: "...", context: {...} })`, the workflow step has:
+```yaml
+- id: context-gathering
+  output:
+    contextFor:
+      - step: parallel-review
+        keys: [candidateFiles, invariants, philosophySources]
+```
+The agent writes outputs as structured artifacts. The daemon routes those artifacts to the right child agents at the right time. The LLM never packages context for a subagent -- it just produces outputs, and the workflow spec declares where those outputs go.
+**This is the shift:** from "agent as orchestrator" to "workflow as orchestrator, daemon as executor, agent as cognitive unit."
+---
+#### What the subagent loop might look like
+```
+Parent workflow step completes
+  ↓ Daemon reads step output artifacts
+  ↓ Daemon checks workflow spec for parallel/sequential children
+  ↓ Daemon spawns child sessions with structured context bundles
+  ↓ Children run their bounded tasks
+  ↓ Daemon collects child outputs
+  ↓ Daemon passes synthesized context to parent's next step
+  ↓ Parent continues with full context
+```
+No LLM orchestration. No token-burning context packaging decisions. No "did I remember to delegate this?" uncertainty.
+---
+#### What needs to be designed (don't implement yet)
+1. **Workflow step schema for parallelism** -- how does the workflow spec declare parallel agents, sequential chains, fan-out/fan-in patterns?
+2. **Context routing spec** -- how does a step's output get routed to specific child agents? What's the schema for `contextFor`?
+3. **Synthesis patterns** -- how do multiple child outputs get combined? (concatenate? LLM synthesis step? structured merge?)
+4. **Failure handling** -- if one child fails, what happens? (fail-fast? continue with partial results? retry?)
+5. **Depth limits** -- same constraints as native agent spawning, but enforced at the workflow level not tool level
+6. **Backward compatibility** -- workflows that currently use `mcp__nested-subagent__Task` can be migrated incrementally
+**This is a design-first item.** Run a discovery session to explore the design space before any implementation. The current assumptions about subagent loops may be entirely wrong.
+---
+### Workflow runtime adapter: one spec, two runtimes (Apr 18, 2026)
+**The core insight:** as workflows evolve (potentially morphing significantly once the subagent loop is rethought), the workflow JSON becomes the canonical spec for *what work needs to happen*. How that spec gets executed depends on the runtime. A single adapter layer translates the canonical spec to runtime-specific execution plans.
+**Two runtimes, one spec:**
+```
+workflows/mr-review-workflow-agentic.json  ← canonical spec (unchanged)
+         ↓
+WorkflowAdapter.forRuntime('mcp')          ← MCP runtime interpretation
+WorkflowAdapter.forRuntime('daemon')       ← Daemon runtime interpretation
+```
+**What each adapter does:**
+MCP adapter (human-in-the-loop):
+- Preserves `requireConfirmation` gates
+- Presents `continue_workflow` tool call interface
+- LLM drives subagent spawning manually via `mcp__nested-subagent__Task`
+- Maintains backward compat with all existing Claude Code usage
+Daemon adapter (fully autonomous):
+- Removes or auto-bypasses `requireConfirmation` gates
+- Replaces `continue_workflow` with `complete_step` (daemon manages tokens)
+- Converts workflow-declared parallelism into automatic child session spawning
+- Routes step outputs to child agents per workflow spec
+- Enforces output contracts at step boundaries
+**Why this matters as workflows evolve:**
+Once the subagent loop is rethought (workflow-as-orchestrator model), workflow steps will likely declare parallelism, context routing, and synthesis patterns explicitly. These declarations make no sense to the MCP runtime (a human is already deciding this in real-time). The adapter translates them:
+```yaml
+# Workflow spec (future shape)
+- id: parallel-review
+  type: parallel
+  agents: [correctness, philosophy, hypothesis-challenge]
+  contextFrom: [phase-3-output]
+```
+MCP adapter sees this → renders as: "You should spawn 3 reviewer subagents now. Here's a template..."
+Daemon adapter sees this → actually spawns 3 child sessions automatically
+The workflow spec describes the intent. The adapter knows how each runtime fulfills it.
+**Key guarantee:** workflow improvements automatically benefit both runtimes. Improving `mr-review-workflow-agentic`'s philosophy alignment step shows up whether a human runs it through Claude Code or WorkTrain runs it autonomously. No dual maintenance.
+**Also eliminates "autonomous workflow variants":** the backlog had a separate item for autonomous variants of workflows. With the adapter, the canonical workflow spec is the only version -- the daemon adapter handles what "autonomy: full" means in practice. No parallel workflow files.
+**Build order:**
+1. Define the canonical workflow spec surface (what can be declared)
+2. MCP adapter (largely a no-op -- existing behavior, but formally defined)
+3. Daemon adapter (the interesting one -- translates declarations to daemon execution)
+4. Converter for upgrading existing workflow JSONs to the new canonical spec if the schema evolves
+**Dependencies:** requires the subagent loop rethinking to be resolved first -- the adapter can't be designed until we know what the workflow spec will declare.
+---
+### User notifications when daemon starts and finishes work (Apr 18, 2026)
+**The problem:** the daemon silently starts and finishes sessions. Unless you're watching the console or tailing the log, you have no idea work happened or completed. For autonomous sessions that run over minutes or hours, this is a significant UX gap.
+**What users need to know:**
+- Session started: "WorkTrain started reviewing PR #566" (with a link)
+- Session completed: "WorkTrain finished reviewing PR #566 -- APPROVED, no findings" (with session link)
+- Session failed/stuck: "WorkTrain got stuck on PR #566 after 15 turns -- needs attention" (with details)
+**Notification channels -- anything the user wants:**
+The notification system should be open-ended. Any channel that accepts a webhook or has an API should be configurable. The architecture is: `DaemonEventEmitter` → `NotificationRouter` → one or more configured channels.
+Short-term (easiest to ship):
+- **Outbox.jsonl** -- already spec'd. `worktrain inbox` reads it, mobile client polls it. Works everywhere, zero config.
+- **Generic webhook** -- HTTP POST to any URL. Covers Slack, Discord, Teams, PagerDuty, Zapier, IFTTT, and anything else that accepts webhooks. One implementation, infinite integrations.
+- **macOS notification** -- `osascript` on Mac. Useful for local dev awareness.
+- **Linux/Windows notification** -- `notify-send` on Linux, Windows Toast via PowerShell.
+Medium-term (first-class integrations):
+- **Slack** (direct API, not just webhook -- enables threading, reactions, rich formatting)
+- **Discord** (webhook, then bot for richer interactions)
+- **Microsoft Teams** (Adaptive Cards)
+- **Telegram** (popular for personal automation)
+- **Email** (SMTP for async, digest mode)
+Long-term (when mobile exists):
+- **Mobile push notifications** -- the mobile app (spec'd in backlog) receives push notifications directly. When the app exists, this becomes the primary channel -- native push is better than any polling-based alternative.
+- **Desktop app** -- if WorkTrain ever has a desktop app, native notifications from there.
+**The outbox is the universal foundation.** Every notification goes through `~/.workrail/outbox.jsonl` first. Channel-specific delivery (webhook, Slack, push) is a fan-out from the outbox. This means: a mobile app polling the outbox gets ALL notifications regardless of which other channels are configured.
+**Config:**
+```json
+// ~/.workrail/config.json
+{
+  "notifications": {
+    "onSessionComplete": true,
+    "onSessionFailed": true,
+    "onStuck": true,
+    "onSessionStart": false,
+    "channels": [
+      { "type": "webhook", "url": "$SLACK_WEBHOOK_URL" },
+      { "type": "webhook", "url": "$DISCORD_WEBHOOK_URL" },
+      { "type": "macos" },
+      { "type": "outbox" }
+    ]
+  }
+}
+```
+**Build order:** outbox.jsonl integration (foundation, works everywhere) → generic webhook (covers Slack/Discord/Teams/anything) → platform notifications (macOS/Linux/Windows) → mobile app push (when mobile exists).

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.34.2",
+  "version": "3.35.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {