npm - @exaudeus/workrail - Versions diffs - 3.35.0 → 3.36.0 - Mend

@exaudeus/workrail 3.35.0 → 3.36.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/dist/console-ui/assets/{index-B10Bn8qC.js → index-n8cJrS4v.js} +2 -2
package/dist/console-ui/index.html +1 -1
package/dist/daemon/workflow-runner.d.ts +4 -0
package/dist/daemon/workflow-runner.js +133 -0
package/dist/manifest.json +24 -24
package/dist/mcp/handlers/v2-advance-events.js +1 -1
package/dist/mcp/handlers/v2-execution/start.d.ts +1 -0
package/dist/mcp/handlers/v2-execution/start.js +3 -2
package/dist/v2/durable-core/schemas/export-bundle/index.d.ts +64 -32
package/dist/v2/durable-core/schemas/session/events.d.ts +20 -10
package/dist/v2/durable-core/schemas/session/events.js +1 -1
package/dist/v2/durable-core/schemas/session/gaps.d.ts +8 -8
package/dist/v2/durable-core/schemas/session/gaps.js +1 -1
package/docs/ideas/backlog.md +250 -0
package/docs/ideas/design-candidates-spawn-agent-task.md +178 -0
package/docs/ideas/design-review-findings-spawn-agent-task.md +139 -0
package/docs/ideas/implementation_plan_spawn_agent.md +217 -0
package/package.json +1 -1

package/docs/ideas/design-review-findings-spawn-agent-task.md ADDED Viewed

@@ -0,0 +1,139 @@
+# Design Review Findings: spawn_agent Tool Implementation
+_Concise, actionable findings for main-agent synthesis. Design: Candidate 2 (pre-create session with _preAllocatedStartResponse, then blocking runWorkflow())._
+> Note: Full discovery-phase review is in `design-review-findings-spawn-agent.md`. This file is for the current coding task review pass.
+---
+## Tradeoff Review
+### T1: Parent clock keeps ticking while child runs
+- Confirmed acceptable. Success criterion 2 ('parent does not advance until child completes') is satisfied even on timeout (parent aborts, not advances).
+- When parent times out, child continues as orphaned session. Work is preserved in session store. Session tree preserves the parent-child link.
+- Mitigation needed: document in tool description.
+- **Status: ACCEPTED.**
+### T2: _preAllocatedStartResponse comment needs update
+- Current comment: 'set only by the dispatch HTTP handler.' spawn_agent will be another legitimate internal caller.
+- If not updated, future developer may remove spawn_agent support as accidental usage.
+- **Status: REQUIRED FIX (low effort, Step 1.1).**
+### T3: One extra async call in execute()
+- executeStartWorkflow() is ~10-50ms (no LLM call). Negligible for a tool that blocks 1-30 minutes.
+- **Status: ACCEPTED.**
+### T4: session_created.data extension
+- Confirmed `z.object({})` uses strip mode (not `.strict()`). Extension with `parentSessionId?: z.string().optional()` is backward-compatible.
+- `buildInitialEvents()` currently hardcodes `data: {}` -- requires threading `parentSessionId` parameter.
+- **Status: REQUIRED, LOW RISK.**
+---
+## Failure Mode Review
+### FM1: Parent timeout while child is running
+- Severity: LOW. Child completes normally, work preserved, session tree intact.
+- Design coverage: adequate. Orphaned child traceable via parentSessionId.
+- **No revision required.**
+### FM2: executeStartWorkflow() succeeds, runWorkflow() fails before AgentLoop starts
+- Severity: MEDIUM. Zombie session in store (shows as 'running' indefinitely).
+- Design coverage: partial. Parent gets `{ childSessionId, outcome: 'error', notes: errorMessage }` -- child session is observable. But zombie cleanup is deferred.
+- Mitigation for Phase 1: document as known edge case. Phase 2: session timeout/zombie cleanup.
+- **Status: ACCEPTED for Phase 1.**
+### FM3: spawnDepth propagation failure
+- Severity: HIGH if unmitigated. FULLY MITIGATED by using typed `readonly spawnDepth?: number` field on `WorkflowTrigger`.
+- After fix: severity drops to LOW (depth is typed, cannot be accidentally lost).
+- **Status: MITIGATED.**
+### FM4: Depth bypass via width (sequential spawning)
+- Severity: LOW for Phase 1. `maxSessionMinutes` on parent is the practical limit.
+- **Status: ACCEPTED, deferred to Phase 2.**
+---
+## Runner-Up / Simpler Alternative Review
+**Candidate 1 (direct runWorkflow, no pre-create):** Close alternative. Simpler execute() -- no executeStartWorkflow() call. Loses: `childSessionId` is unknown until after runWorkflow() starts; crash-before-start has no childSessionId to return.
+**No elements worth borrowing from Candidate 1.** C2 already does everything C1 does plus the session-ID-upfront guarantee.
+**Could skip session_created.data extension?** Technically yes -- `parentSessionId` in `context_set` events is still durable and queryable. But the extension is ~8 lines total and future-proofs DAG queries. Keep it.
+---
+## Philosophy Alignment
+### Clearly satisfied
+- Errors as data: discriminated union return, no throws
+- DI for boundaries: ctx, apiKey, emitter all injected at construction time
+- Immutability: WorkflowTrigger fully readonly, new fields also readonly
+- Exhaustiveness: WorkflowRunResult match handles all 4 variants
+- Validate at boundaries: depth check at start of execute()
+- YAGNI: Phase 1 only; non-blocking spawn deferred
+- Make illegal states unrepresentable: childSessionId always present (pre-create guarantees it)
+### Under tension (acceptable)
+- Architectural fixes over patches: parentSessionId via internalContext is somewhat patch-like. Acceptable because internalContext is an established pattern for daemon-internal injection (is_autonomous, workspacePath). Tension is low.
+- Compose with small pure functions: execute() has two async operations (~50 lines). Complexity is necessary and bounded.
+---
+## Findings
+### Red (blocking)
+_None._
+### Orange (required before implementation)
+**O1: Use `readonly spawnDepth?: number` on WorkflowTrigger (not `context.spawnDepth`)**
+Rationale: generic context map can be silently overwritten by trigger system or other callers, breaking depth enforcement. Typed field makes the invariant explicit and compiler-checked.
+Files: `src/daemon/workflow-runner.ts` (WorkflowTrigger type definition)
+**Status: Design already incorporates this fix.**
+**O2: Update `_preAllocatedStartResponse` comment to list spawn_agent as legitimate caller**
+Rationale: current comment misleads future developers. Without this update, spawn_agent support could be removed as accidental.
+Files: `src/daemon/workflow-runner.ts` (WorkflowTrigger._preAllocatedStartResponse JSDoc)
+**Status: Must be applied during implementation.**
+**O3: Thread parentSessionId into buildInitialEvents() for session_created.data**
+Rationale: the `internalContext` injection only reaches `context_set` events, not `session_created.data`. For the typed schema extension to work, `buildInitialEvents()` needs a new optional parameter.
+Files: `src/mcp/handlers/v2-execution/start.ts` (`buildInitialEvents()` signature and call site)
+**Status: Required -- not in original design review, discovered during implementation analysis.**
+### Yellow (should-fix, not blocking)
+**Y1: Document parent-clock behavior in tool description**
+The spawn_agent tool description should note that the parent session's maxSessionMinutes clock runs while the child executes. Workflow authors must configure the parent's timeout to be longer than the expected child duration.
+**Y2: Document zombie session edge case**
+The spawn_agent tool description should note that if runWorkflow() fails before the AgentLoop starts, a zombie session may exist in the store. Phase 2 will add cleanup.
+**Y3: maxSubagentDepth source**
+For Phase 1, default to 3 if `WorkflowTrigger.agentConfig?.maxSubagentDepth` is not set. Document in tool description.
+---
+## Recommended Revisions
+1. Use `readonly spawnDepth?: number` on WorkflowTrigger (O1 -- design already incorporates)
+2. Update `_preAllocatedStartResponse` JSDoc to list spawn_agent as a legitimate internal caller (O2)
+3. Add `parentSessionId?: string` parameter to `buildInitialEvents()` and thread it into `session_created.data` (O3)
+4. Add parent-clock behavior documentation to spawn_agent tool description (Y1)
+5. Add zombie session documentation to spawn_agent tool description (Y2)
+6. Use `trigger.agentConfig?.maxSubagentDepth ?? 3` as maxDepth (Y3)
+---
+## Residual Concerns
+**RC1: Session tree query infrastructure deferred to Phase 2**
+`parentSessionId` is written to the session store but no query path exists to read 'all children of session X'. The console DAG view cannot render the tree until Phase 2. This is by design -- Phase 1 writes the data; Phase 2 builds the reader.
+**RC2: Zod strictness on session_created.data**
+Confirmed that `z.object({})` uses strip mode (not strict). Extension with `parentSessionId?: z.string().optional()` is backward-compatible. Unverified by an actual migration run -- low risk but unvalidated.
+**RC3: maxTotalAgentsPerTask guardrail deferred**
+Phase 1 enforces depth only. Wide spawning is not caught by depth limits. Phase 2 adds the concurrency registry. For Phase 1, `maxSessionMinutes` on the parent session is the practical limit on total work done.

package/docs/ideas/implementation_plan_spawn_agent.md ADDED Viewed

@@ -0,0 +1,217 @@
+# Implementation Plan: spawn_agent Tool
+## 1. Problem Statement
+Agents running inside WorkRail daemon sessions currently delegate sub-tasks using
+`mcp__nested-subagent__Task` (external Claude Code MCP tool). This produces no WorkRail
+session, no audit trail, and no structured output. The delegated work is completely invisible
+to WorkRail -- it cannot be traced, replayed, or composed into larger orchestrations.
+**Three root causes to fix:**
+1. No native WorkRail tool for spawning child sessions from within a daemon session
+2. No `parentSessionId` link in the session event log (no parent-child graph)
+3. No depth enforcement to prevent runaway delegation chains
+---
+## 2. Acceptance Criteria
+- [ ] A daemon agent can call `spawn_agent({ workflowId, goal, workspacePath, context? })` from a workflow step
+- [ ] The parent agent blocks until the child session completes (no polling, no fire-and-forget)
+- [ ] The child session is created with `parentSessionId` in the session event log (durable, survives crashes)
+- [ ] The tool returns `{ childSessionId, outcome: 'success'|'error'|'timeout', notes: string }` as JSON
+- [ ] Spawning a child at depth >= `maxSubagentDepth` (default 3) returns a typed error without spawning
+- [ ] `spawnDepth` propagates correctly through chains: root=0, child=1, grandchild=2
+- [ ] Child sessions are created in-process (no HTTP dispatch, no semaphore involvement)
+- [ ] `npm run build` passes with no new TypeScript errors
+- [ ] The `spawn_agent` tool is listed in `BASE_SYSTEM_PROMPT` with usage guidance
+---
+## 3. Non-Goals
+- `spawn_session` + `await_sessions` non-blocking parallel spawn (Phase 2)
+- `maxTotalAgentsPerTask` width guardrail (Phase 2)
+- Bare-prompt child sessions without a `workflowId` (Phase 2)
+- Session tree query API (console DAG view reads `parentSessionId`, but no query endpoint in Phase 1)
+- Zombie session cleanup (Phase 2)
+- Changes to `TriggerRouter`, `dispatch()`, or the HTTP dispatch route
+- Changes to the public `V2StartWorkflowInput` MCP schema (external callers are unaffected)
+---
+## 4. Philosophy-Driven Constraints
+- **Errors are data**: All 4 `WorkflowRunResult` variants (`success`, `error`, `timeout`, `delivery_failed`) must be handled exhaustively and mapped to structured JSON return values. `execute()` must NOT throw for child failures.
+- **Immutability by default**: New `WorkflowTrigger` fields must be `readonly`.
+- **DI for boundaries**: `ctx`, `apiKey`, `runWorkflowFn`, `emitter` injected at factory construction time. No singletons, no global state.
+- **Validate at boundaries**: Depth check at the START of `execute()`, before any async operations.
+- **Make illegal states unrepresentable**: `childSessionId` must always be present in the result (pre-create guarantees it). `spawnDepth` must be a typed `WorkflowTrigger` field (not in context map).
+- **YAGNI**: Phase 1 only. No bare-prompt mode, no `await_sessions`, no width guardrails.
+- **Document 'why'**: Add WHY comments consistent with existing code style in `workflow-runner.ts`.
+---
+## 5. Invariants
+1. **Semaphore bypass invariant**: `spawn_agent` must call `runWorkflow()` directly. Calling `TriggerRouter.dispatch()` from within a running session would cause semaphore deadlock.
+2. **_preAllocatedStartResponse invariant**: When `trigger._preAllocatedStartResponse` is set, `runWorkflow()` MUST NOT call `executeStartWorkflow()` again. This invariant is already documented in the `WorkflowTrigger` JSDoc.
+3. **Depth propagation invariant**: A child session at depth N must always construct its `spawn_agent` tool with `currentDepth = N`. This is enforced by the typed `readonly spawnDepth?: number` field on `WorkflowTrigger`.
+4. **Schema strip invariant**: `session_created.data` uses `z.object({})` strip mode. Adding `parentSessionId?: string` is a backward-compatible additive change.
+5. **V2StartWorkflowInput unchanged**: The public MCP input schema for `start_workflow` must not be modified. `parentSessionId` flows via `internalContext` only.
+---
+## 6. Selected Approach + Rationale + Runner-Up
+**Selected: Candidate 2 (pre-create session with _preAllocatedStartResponse)**
+`execute()` calls `executeStartWorkflow(input, ctx, { parentSessionId })`, decodes `childSessionId`
+from the returned `continueToken` via `parseContinueTokenOrFail()`, then calls `runWorkflow()` with
+`_preAllocatedStartResponse: startResult.value.response`. This blocks naturally -- `await runWorkflow()`
+inside `execute()` pauses the parent's tool execution until the child completes.
+**Rationale**: The `_preAllocatedStartResponse` pattern is already proven in `console-routes.ts`
+(lines 573-624). This is a direct adaptation of stable existing machinery, not invention.
+`childSessionId` is always deterministic (known before child runs). Crash-before-start is
+observable (zombie session in store with `parentSessionId` intact).
+**Runner-up: Candidate 1 (direct runWorkflow())**
+Simpler (~10 fewer lines). Loses: `childSessionId` is unavailable if the run crashes before
+AgentLoop starts. Pivot to this if `_preAllocatedStartResponse` is ever removed.
+---
+## 7. Vertical Slices
+### Slice 1: WorkflowTrigger schema extension
+**File**: `src/daemon/workflow-runner.ts`
+**Change**: Add `readonly parentSessionId?: string` and `readonly spawnDepth?: number` to the `WorkflowTrigger` interface. Update `_preAllocatedStartResponse` JSDoc to list `spawn_agent` as a legitimate internal caller (O2 fix).
+**Acceptance**: TypeScript compiles. No existing callers broken (new fields are optional). `_preAllocatedStartResponse` comment updated.
+**Risk**: None (additive change).
+### Slice 2: session_created.data schema extension
+**File**: `src/v2/durable-core/schemas/session/events.ts`
+**Change**: Extend `session_created.data` from `z.object({})` to `z.object({ parentSessionId: z.string().optional() })`.
+**File**: `src/mcp/handlers/v2-execution/start.ts`
+**Change**: Add `parentSessionId?: string` optional parameter to `buildInitialEvents()`. When provided, include it in the `session_created` event's `data` field. Thread `parentSessionId` from `executeStartWorkflow()` (via `internalContext?.['parentSessionId']`) into `buildInitialEvents()`.
+**Acceptance**: TypeScript compiles. Existing session creation (without `parentSessionId`) produces `data: {}` (unchanged). Session created with `parentSessionId` produces `data: { parentSessionId: 'sess_...' }`.
+**Risk**: Low (Zod strip mode, backward-compatible).
+### Slice 3: makeSpawnAgentTool() factory
+**File**: `src/daemon/workflow-runner.ts`
+**Change**: Add `SpawnAgentParams` JSON Schema to `getSchemas()`. Add `makeSpawnAgentTool()` factory function alongside `makeCompleteStepTool()`, `makeContinueWorkflowTool()`, etc.
+Factory signature:
+```typescript
+export function makeSpawnAgentTool(
+  sessionId: string,             // process-local UUID (for logging)
+  ctx: V2ToolContext,
+  apiKey: string,
+  thisWorkrailSessionId: string, // WorkRail sess_xxx ID (becomes parentSessionId)
+  currentDepth: number,          // spawn depth of the parent session
+  maxDepth: number,              // max depth before blocking spawn
+  runWorkflowFn: typeof runWorkflow,
+  emitter?: DaemonEventEmitter,
+): AgentTool
+```
+`execute()` logic:
+1. Depth check: if `currentDepth >= maxDepth`, return `{ childSessionId: null, outcome: 'error', notes: 'Max spawn depth exceeded ...' }` as JSON
+2. Call `executeStartWorkflow({ workflowId, goal, workspacePath, context }, ctx, { parentSessionId: thisWorkrailSessionId })` and match result
+3. On start error: return `{ childSessionId: null, outcome: 'error', notes: errorMessage }` as JSON
+4. On start success: decode `childSessionId` from `startResult.response.continueToken` via `parseContinueTokenOrFail()`
+5. Call `runWorkflowFn({ workflowId, goal, workspacePath, context, spawnDepth: currentDepth + 1, _preAllocatedStartResponse: startResult.response }, ctx, apiKey, undefined, emitter)`
+6. Map `WorkflowRunResult` to `{ childSessionId, outcome, notes }`:
+   - `success`: `{ outcome: 'success', notes: result.lastStepNotes ?? '' }`
+   - `error`: `{ outcome: 'error', notes: result.message }`
+   - `timeout`: `{ outcome: 'timeout', notes: result.message }`
+   - `delivery_failed`: `{ outcome: 'error', notes: result.deliveryError }`
+7. Return `{ content: [{ type: 'text', text: JSON.stringify(resultObj) }] }`
+**Acceptance**: TypeScript compiles. Depth check fires at limit. All 4 result variants handled. `childSessionId` present in all returns (null only on depth error and start error).
+**Risk**: Low (new function, no existing code changed).
+### Slice 4: Inject spawn_agent in runWorkflow()
+**File**: `src/daemon/workflow-runner.ts`
+**Change**: Read `trigger.spawnDepth ?? 0` and `trigger.agentConfig?.maxSubagentDepth ?? 3` in `runWorkflow()`. Add `makeSpawnAgentTool(...)` to the `tools` array, using the decoded `workrailSessionId` as `thisWorkrailSessionId`.
+Note: `workrailSessionId` is decoded from `startContinueToken` AFTER `executeStartWorkflow()`. The tool must be constructed after this decode, but the existing tool list construction already happens after this point (line ~1914).
+**Acceptance**: TypeScript compiles. `runWorkflow()` signature unchanged. `spawn_agent` tool is in the tools list passed to `AgentLoop`.
+**Risk**: Low. One new tool added to existing list.
+### Slice 5: BASE_SYSTEM_PROMPT update
+**File**: `src/daemon/workflow-runner.ts`
+**Change**: Add `spawn_agent` to the tools section of `BASE_SYSTEM_PROMPT`. Document:
+- When to use (delegate sub-tasks to a child WorkRail session)
+- What it returns (`{ childSessionId, outcome, notes }`)
+- Parent clock warning (maxSessionMinutes keeps ticking)
+- Zombie session note (best-effort; cleanup in Phase 2)
+- Depth limit note (default max depth 3)
+**Acceptance**: Tool listed in system prompt with accurate description.
+**Risk**: None.
+---
+## 8. Test Design
+**Note**: No unit tests exist for other tool factories in `workflow-runner.ts` (all integration-style). Adding unit tests for `makeSpawnAgentTool()` is aspirational; the acceptance criterion for Phase 1 is `npm run build` passing.
+**What to verify manually**:
+1. TypeScript compiles cleanly (`npm run build`)
+2. Depth check: create a trigger with `spawnDepth: 3`, verify tool returns error immediately
+3. `_preAllocatedStartResponse` path: verify child session is created in store before `runWorkflow()` starts
+**If existing tests exist**, run them with `npm test` after each slice.
+---
+## 9. Risk Register
+| Risk | Severity | Mitigation |
+|---|---|---|
+| Zombie session (executeStartWorkflow succeeds, runWorkflow fails before AgentLoop) | MEDIUM | Document in tool description, Phase 2 cleanup |
+| Parent timeout while child runs | LOW | Document in tool description, user configures maxSessionMinutes |
+| session_created.data Zod strictness | LOW | Confirmed strip mode; unverified by migration test |
+| _preAllocatedStartResponse removed in future refactor | LOW | Update JSDoc (O2 fix) to protect against this |
+| workrailSessionId null at tool construction time | LOW | Tool construction happens AFTER decode; if decode fails, skip spawn_agent tool or construct with empty string |
+---
+## 10. PR Packaging Strategy
+**SinglePR**: `feat/daemon-spawn-agent-tool`
+All 5 slices go into one PR. The slices are interdependent (Slice 4 requires Slice 3, Slice 3 uses Slice 2, etc.). Splitting across multiple PRs would leave the codebase in a broken intermediate state.
+PR title: `feat(workflows): add spawn_agent tool for in-process child session delegation`
+---
+## 11. Philosophy Alignment Per Slice
+| Slice | Principle | Status |
+|---|---|---|
+| Slice 1 (WorkflowTrigger extension) | Immutability: new fields are readonly | Satisfied |
+| Slice 1 | Make illegal states unrepresentable: spawnDepth typed, not in context map | Satisfied |
+| Slice 2 (schema extension) | Errors are data: Zod strip mode means extension is non-breaking | Satisfied |
+| Slice 3 (makeSpawnAgentTool) | Errors are data: all variants return JSON, no throws | Satisfied |
+| Slice 3 | Exhaustiveness: all 4 WorkflowRunResult variants handled | Satisfied |
+| Slice 3 | DI for boundaries: ctx, apiKey, emitter injected | Satisfied |
+| Slice 3 | Validate at boundaries: depth check first | Satisfied |
+| Slice 4 (inject in runWorkflow) | YAGNI: no bare-prompt, no width guardrails | Satisfied |
+| Slice 4 | Semaphore bypass invariant: direct runWorkflow(), not dispatch() | Satisfied |
+| Slice 5 (system prompt) | Document 'why': parent clock and zombie warnings in tool description | Satisfied |
+---
+## 12. Plan Metrics
+- `implementationPlan`: 5 slices across 3 files
+- `slices`: Slice 1 (WorkflowTrigger), Slice 2 (schema + buildInitialEvents), Slice 3 (makeSpawnAgentTool), Slice 4 (inject in runWorkflow), Slice 5 (BASE_SYSTEM_PROMPT)
+- `testDesign`: npm run build + manual depth check + manual session creation verification
+- `estimatedPRCount`: 1
+- `followUpTickets`: Phase 2 (spawn_session + await_sessions), zombie cleanup, session tree query API, maxTotalAgentsPerTask guardrail
+- `unresolvedUnknownCount`: 0 (all open questions from design phase resolved)
+- `planConfidenceBand`: High

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@exaudeus/workrail",
-  "version": "3.35.0",
+  "version": "3.36.0",
   "description": "Step-by-step workflow enforcement for AI agents via MCP",
   "license": "MIT",
   "repository": {