npm - alvin-bot - Versions diffs - 4.12.3 → 4.13.0 - Mend

alvin-bot 4.12.3 → 4.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/CHANGELOG.md +126 -0
package/dist/handlers/message.js +9 -0
package/dist/paths.js +8 -0
package/dist/providers/claude-sdk-provider.js +25 -5
package/dist/services/alvin-dispatch.js +125 -0
package/dist/services/alvin-mcp-tools.js +103 -0
package/dist/services/async-agent-parser.js +126 -1
package/dist/services/personality.js +36 -10
package/package.json +1 -1
package/test/alvin-dispatch.test.ts +220 -0
package/test/async-agent-parser-staleness.test.ts +412 -0
package/test/async-agent-parser-streamjson.test.ts +273 -0

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,132 @@
 All notable changes to Alvin Bot are documented here.
+## [4.13.0] — 2026-04-16
+### ✨ Major: truly detached sub-agent dispatch via `alvin_dispatch_agent` MCP tool
+**Background.** v4.12.1 → v4.12.3 tried three progressively more complex fixes for the "bot freezes while sub-agent runs" problem, all of which depended on Claude Agent SDK's built-in `Task(run_in_background: true)` tool. All three iterations missed the same architectural reality: the SDK's background task stays tied to the parent SDK subprocess lifecycle. When v4.12.3's bypass path aborted the parent to unblock the user, the abort cascaded into killing the in-flight sub-agent mid-work. v4.12.4 worked around this at the delivery layer (recovering partial output after a 5-min staleness window), but the fundamental architecture was still wrong.
+v4.13 fixes the architecture. Instead of using the SDK's built-in Task tool for background work, we register our own MCP tool — `mcp__alvin__dispatch_agent` — which spawns a **completely independent** `claude -p` subprocess (its own PID, its own process group, unreferenced from the parent's event loop). Aborting the parent has zero effect on the dispatched subprocess. It continues to write its stream-json output to its own file and runs to completion. The async-agent-watcher polls the output file and delivers the result as a separate message when ready.
+Empirically verified with a standalone survival test (`scripts/smoke-test-abort-survival.mjs`): dispatch an agent that needs 20+ seconds of work, kill the parent Node process 100ms later, watch the subprocess keep writing to its output file and complete cleanly with the expected result.
+### What changed for the user
+- **Before v4.13** (with Task tool): the bot shows "typing…" for the entire duration of the sub-agent's work (5, 20, 60 minutes). New messages sit in a queue and don't get processed. If the user interrupts via v4.12.3's bypass, the sub-agent dies mid-work and hours later the user gets a `720m timeout · (empty output)` message.
+- **After v4.13** (with `alvin_dispatch_agent`): the bot's turn completes within seconds of dispatch. The user sees "🤖 Dispatched 2 background agents — I'll send the results when ready." and can immediately chat about anything else. The background subprocesses finish cleanly and deliver their full results as separate messages.
+This matches the OpenClaw experience the user was asking about — except it's built natively into Claude Agent SDK's MCP-tool mechanism, not a wholesale replacement.
+### Technical details
+**New module** `src/services/alvin-dispatch.ts`
+- `dispatchDetachedAgent(input)` — spawns `claude -p <prompt> --output-format stream-json` via `child_process.spawn({ detached: true, stdio: ["ignore", outFd, errFd] })` + `.unref()`
+- Synchronous return: `{ agentId, outputFile, spawned: true }`
+- Side effects: registers with `async-agent-watcher`, increments `session.pendingBackgroundCount`
+- Unique agent IDs via `crypto.randomBytes(12).toString("hex")` (collision-safe for parallel dispatch)
+- Cleans `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` from env to prevent nested-session errors
+**New module** `src/services/alvin-mcp-tools.ts`
+- `buildAlvinMcpServer(ctx)` — creates an SDK MCP server bound to this turn's `{ chatId, userId, sessionKey }` context via closure
+- Exposes `dispatch_agent` tool (zod-validated input: `{ prompt: string, description: string }`)
+- Tool handler calls `dispatchDetachedAgent` and returns `agentId + outputFile` to Claude
+- Uses SDK's `createSdkMcpServer` + `tool` builders (the SDK's native inline-tool API — no separate MCP server process needed)
+**Provider integration** (`src/providers/claude-sdk-provider.ts`)
+- New `QueryOptions.alvinDispatchContext` field — when set, provider registers `mcpServers: { alvin: buildAlvinMcpServer(ctx) }` + appends `mcp__alvin__dispatch_agent` to the default `allowedTools` list
+- When unset, the MCP server is not registered and Claude falls back to the built-in Task tool only
+- Non-SDK providers ignore the new field entirely
+**Handler integration** (`src/handlers/message.ts`)
+- Passes `alvinDispatchContext: { chatId, userId, sessionKey }` on every SDK turn
+- No other handler changes — the bypass path, the staleness parser, and the pending-count decrement are all reused from v4.12.3/v4.12.4
+**Parser extension** (`src/services/async-agent-parser.ts`)
+- New first-pass scan for `{"type":"result"}` events — the completion marker used by `claude -p --output-format stream-json` (different from the SDK-internal sub-agent format that uses `message.stop_reason: "end_turn"`)
+- When found, uses the `result.result` field as authoritative output when present, falls back to aggregating all assistant text blocks
+- Preserves backward compat with the existing `end_turn`-based path (tested by the old test suite)
+**System prompt update** (`src/services/personality.ts`)
+- `BACKGROUND_SUBAGENT_HINT` rewritten to strongly prefer `mcp__alvin__dispatch_agent` over `Task(run_in_background: true)` on Telegram/WhatsApp/Slack/Discord
+- Explicit decision tree, concrete example prompts, parallel-dispatch guidance
+- Built-in Task tool remains available but deprecated for long-running work; reserved for the rare case where Claude needs a result in the same turn
+### Known limitations
+- **First-turn only for now**: the MCP server is bound to `{ chatId, userId, sessionKey }` at query construction time. If the session's underlying SDK session ID changes mid-conversation (rare), the tool context goes stale. Defensive: a new MCP server is built on each handler invocation, so any next turn picks up the correct context.
+- **Non-Telegram platforms**: `src/handlers/platform-message.ts` (Slack/Discord/WhatsApp) doesn't pass `alvinDispatchContext` yet. Deferred to follow-up — the Telegram path is the primary use case and the one the user explicitly requested.
+- **Parallel dispatch not smoke-tested**: the system prompt guides Claude to call `dispatch_agent` multiple times in one turn for parallel work, but I only end-to-end tested single dispatch. Should work (no shared state in the handler), but YMMV until battle-tested.
+### Testing
+- **Baseline**: 447 tests (v4.12.4)
+- **New**:
+  - `test/alvin-dispatch.test.ts` — 6 tests (spawn flags, unique IDs, watcher registration, session counter, stdio redirect, env cleanup)
+  - `test/async-agent-parser-streamjson.test.ts` — 7 tests (result-event detection, token extraction, error state, running state, multi-text aggregation, `result.result` precedence, minimal fields)
+- **Total**: 460 tests, all green, TSC clean
+- **Real-world smoke tests** (NOT in CI — run via `node scripts/smoke-test-dispatch.mjs` and `node scripts/smoke-test-abort-survival.mjs`):
+  - `smoke-test-dispatch`: dispatches a real `claude -p` subprocess, polls to completion (~10s), verifies exact output `"SMOKE_TEST_OK_v4.13"`. **PASS**.
+  - `smoke-test-abort-survival`: dispatches a subprocess that needs ~25s of work, kills the parent Node process ~100ms later, polls the output file. Subprocess survives and completes cleanly. **PASS**.
+### Files changed
+- **NEW**: `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `scripts/smoke-test-dispatch.mjs`, `scripts/smoke-test-abort-survival.mjs`
+- **NEW tests**: `test/alvin-dispatch.test.ts`, `test/async-agent-parser-streamjson.test.ts`
+- **Modified**: `src/paths.ts` (SUBAGENTS_DIR), `src/services/async-agent-parser.ts` (stream-json detection), `src/providers/claude-sdk-provider.ts` (MCP server registration + allowedTools), `src/providers/types.ts` (QueryOptions.alvinDispatchContext), `src/handlers/message.ts` (pass dispatch context), `src/services/personality.ts` (BACKGROUND_SUBAGENT_HINT rewrite)
+- **Version**: `package.json` 4.12.4 → 4.13.0 (minor bump — new public surface: MCP tool)
+---
+## [4.12.4] — 2026-04-16
+### 🐛 Patch: recover partial output from interrupted background sub-agents
+**The bug Ali saw:** Two Telegram messages appeared hours apart: `⏱️ Background agent a5bf8c74 timeout · 720m 3s · 0 in / 0 out` and `... ab9372d4 timeout · 720m 1s · 0 in / 0 out`, both with `(empty output)`. Three more agents were still pending, all interrupted mid-execution with hundreds of KB of real work sitting on disk.
+**Root cause:** v4.12.3's bypass-abort calls `session.abortController.abort()`, which propagates through `claude-sdk-provider.ts`'s `internalAbortController` into the SDK's CLI subprocess, which in turn propagates into any in-flight `Agent(run_in_background: true)` tool executions. Evidence from the disk:
+- `agent-a03ce829...jsonl`: 116 lines, last event = literally `"[Request interrupted by user for tool use]"` mid-Bash-tool-use
+- `agent-af61fa6e...jsonl`: 81 lines, last assistant text = `"Ich habe jetzt genug Daten für den vollständigen Audit. Hier ist der Report:"` — interrupted while streaming the final report
+- `agent-ac47c4a2...jsonl`: 131 lines, last assistant text = `"## Perseus Audit — Ergebnis\n### Kritische Bugs"` — interrupted a few words into the payoff
+None of them reached `stop_reason: "end_turn"`. The pre-v4.12.4 `parseOutputFileStatus` only recognized `end_turn` as a completion signal, so these agents sat in the pending list for 12h until `giveUpAt` elapsed, then got delivered as `(empty output)` while their real work was still on disk.
+**The fix:** `parseOutputFileStatus` now has a staleness fallback. When no `end_turn` is present BUT the outputFile hasn't been written to in `stalenessMs` (default 5 min, configurable via `ALVIN_SUBAGENT_STALENESS_MS`) AND there is usable assistant text content in the tail, the parser:
+1. Aggregates ALL text blocks across all assistant turns in the tail (not just the last one — bias toward delivering more context)
+2. Prepends a clear banner: `⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_`
+3. Returns `state: "completed"` so the watcher delivers it instead of continuing to poll
+Result: on the next `pollOnce()` after v4.12.4 ships, the three stuck agents get delivered with their real partial output (combined ~1.2MB of text across the three). Future interrupts recover within 5 minutes instead of hanging 12 hours.
+### Behavioral notes
+- **Clean `end_turn` sub-agents are unchanged** — the staleness fallback is a *fallback only*. The existing strict path runs first and takes precedence.
+- **`stalenessMs: 0` disables the fallback entirely** — strict end_turn-only mode for callers that prefer it.
+- **Thinking blocks are still filtered out** of the partial delivery — same as with clean completion.
+- **Files with no assistant text at all** (only tool_use) stay in `running` state — nothing useful to deliver.
+- **Tokens are surfaced when available** — the last assistant event's `usage.input_tokens`/`output_tokens` flow through to the delivery banner.
+### Known limitations (carried over from v4.12.3, deferred to v4.13)
+- The bypass-abort mechanism in `message.ts` still propagates to the SDK subprocess and kills in-flight sub-agents. v4.12.4 works around this at the delivery layer (recovering partial output); a true fix requires either architectural replacement of the SDK's `Task` tool with our own detached-subprocess dispatch, or SDK support for per-task-branch abort signals. Tracked for v4.13.
+- Users may still experience the bot's "typing…" indicator when Claude is thinking in the main turn (before dispatching any background agent). Bypass only fires once `pendingBackgroundCount > 0`. For interrupt before dispatch, use `/cancel`.
+### Testing
+- **Baseline**: 436 tests (v4.12.3)
+- **New**: `test/async-agent-parser-staleness.test.ts` — 11 tests covering: clean `end_turn` still wins over staleness, fresh-interrupted file stays running, stale-interrupted file delivers partial with banner, no-text file stays running, `stalenessMs: 0` disables, aggregation across multiple turns, thinking-block filtering, token extraction, interrupt-only file with no useful content, and ordering preservation.
+- **Total**: 447 tests, all green, TSC clean.
+### Files changed
+- **Modified**: `src/services/async-agent-parser.ts` — staleness fallback in `parseOutputFileStatus`, `DEFAULT_STALENESS_MS` constant, `INTERRUPTED_BANNER` prefix.
+- **NEW tests**: `test/async-agent-parser-staleness.test.ts`.
+- **Version**: `package.json` 4.12.3 → 4.12.4.
+---
 ## [4.12.3] — 2026-04-15
 ### 🐛 Patch: Background sub-agent no longer blocks the main Telegram session

package/dist/handlers/message.js CHANGED Viewed

@@ -400,6 +400,15 @@ export async function handleMessage(ctx) {
                 messageCount: session.messageCount,
                 toolUseCount: session.toolUseCount,
             } : undefined,
+            // v4.13 — Expose alvin_dispatch_agent MCP tool so Claude can spawn
+            // truly detached background sub-agents (independent of this SDK
+            // subprocess's lifecycle). Only for SDK provider + Telegram here —
+            // non-SDK providers ignore this field.
+            alvinDispatchContext: isSDK ? {
+                chatId: ctx.chat.id,
+                userId,
+                sessionKey,
+            } : undefined,
         };
         // Stream response from provider (with fallback)
         let lastBroadcastLen = 0;

package/dist/paths.js CHANGED Viewed

@@ -118,3 +118,11 @@ export const ASSETS_DIR = resolve(DATA_DIR, "assets");
 export const ASSETS_INDEX_JSON = resolve(DATA_DIR, "assets", "INDEX.json");
 /** assets/INDEX.md — Human-readable asset summary (injected into prompts) */
 export const ASSETS_INDEX_MD = resolve(DATA_DIR, "assets", "INDEX.md");
+/** subagents/ — Detached `claude -p` subprocess output files (v4.13).
+ *  Each dispatched agent writes its full stream-json output to
+ *  subagents/<agentId>.jsonl. The async-agent-watcher polls these files
+ *  and delivers the final result as a separate message when ready.
+ *  These live outside BOT_ROOT/DATA_DIR's state/ so that the watcher's
+ *  giveUpAt-survive-restart logic doesn't leak into the subprocess
+ *  lifecycle. */
+export const SUBAGENTS_DIR = resolve(DATA_DIR, "subagents");

package/dist/providers/claude-sdk-provider.js CHANGED Viewed

@@ -13,6 +13,7 @@ import { fileURLToPath } from "url";
 import { execFile } from "child_process";
 import { promisify } from "util";
 import { findClaudeBinary } from "../find-claude-binary.js";
+import { buildAlvinMcpServer } from "../services/alvin-mcp-tools.js";
 const execFileAsync = promisify(execFile);
 /**
  * Detects the Claude CLI "Not logged in" error message. The CLI emits this
@@ -103,6 +104,25 @@ export class ClaudeSDKProvider {
         }
         try {
             const claudePath = findClaudeBinary();
+            // v4.13 — Register Alvin's custom MCP server if the caller provided
+            // dispatch context. The server exposes `alvin_dispatch_agent` which
+            // spawns truly detached `claude -p` subprocesses (independent of the
+            // main SDK subprocess's lifecycle). When Claude calls it, the bot
+            // can abort this query without killing the dispatched sub-agent.
+            const mcpServers = {};
+            if (options.alvinDispatchContext) {
+                mcpServers.alvin = buildAlvinMcpServer(options.alvinDispatchContext);
+            }
+            // v4.13 — MCP tool names must be explicitly whitelisted via allowedTools
+            // in the form `mcp__<server>__<tool>`. Without this, Claude can see the
+            // tool in the catalog but cannot actually invoke it.
+            const defaultAllowed = [
+                "Read", "Write", "Edit", "Bash", "Glob", "Grep",
+                "WebSearch", "WebFetch", "Task",
+            ];
+            if (options.alvinDispatchContext) {
+                defaultAllowed.push("mcp__alvin__dispatch_agent");
+            }
             const q = query({
                 prompt,
                 options: {
@@ -116,11 +136,11 @@ export class ClaudeSDKProvider {
                     settingSources: ["user", "project"],
                     // v4.12.2 — options.allowedTools can override the default full set.
                     // Used by sub-agents with toolset="readonly"/"research" to restrict
-                    // what Claude can do. Default = full access.
-                    allowedTools: options.allowedTools ?? [
-                        "Read", "Write", "Edit", "Bash", "Glob", "Grep",
-                        "WebSearch", "WebFetch", "Task",
-                    ],
+                    // what Claude can do. Default = full access + alvin MCP tools.
+                    allowedTools: options.allowedTools ?? defaultAllowed,
+                    // v4.13 — Conditionally pass the MCP server config so the inline
+                    // dispatch tool is visible. Empty object = no custom tools.
+                    mcpServers: Object.keys(mcpServers).length > 0 ? mcpServers : undefined,
                     systemPrompt,
                     effort: (options.effort || "medium"),
                     maxTurns: 50,

package/dist/services/alvin-dispatch.js ADDED Viewed

@@ -0,0 +1,125 @@
+/**
+ * v4.13 — alvin_dispatch custom-tool service.
+ *
+ * Architectural replacement for Claude Agent SDK's built-in
+ * `Task(run_in_background: true)` tool. The SDK's built-in version
+ * ties the background sub-agent's execution to the parent SDK
+ * subprocess lifecycle — killing the parent (e.g. via v4.12.3's
+ * bypass-abort) cascades into killing any in-flight background tasks.
+ *
+ * This module instead spawns a truly independent `claude -p` subprocess
+ * via Node's `child_process.spawn({ detached: true, stdio: [...] })`.
+ * The subprocess:
+ *   - Has its own PID, own process group (by detached: true)
+ *   - Is unreffed so the parent Node process doesn't wait for it
+ *   - Writes its stream-json output to its own file
+ *   - Survives any abort/crash/restart of the parent Alvin bot
+ *
+ * The async-agent-watcher polls the output file and delivers the
+ * final result via subagent-delivery.ts when the sub-agent completes.
+ *
+ * See Phase A of docs/superpowers/plans/2026-04-16-v4.13-truly-async-subagents.md
+ * for the empirical verification that detached `claude -p` subprocesses
+ * behave as expected (they do).
+ */
+import { spawn } from "node:child_process";
+import fs from "node:fs";
+import crypto from "node:crypto";
+import { resolve } from "node:path";
+import { findClaudeBinary } from "../find-claude-binary.js";
+import { registerPendingAgent } from "./async-agent-watcher.js";
+import { getAllSessions } from "./session.js";
+import { SUBAGENTS_DIR } from "../paths.js";
+/** Generate a 32-char hex agent id. Avoids collisions across parallel
+ *  dispatches even at sub-millisecond intervals. */
+function generateAgentId() {
+    return "alvin-" + crypto.randomBytes(12).toString("hex");
+}
+/**
+ * Dispatch a detached sub-agent. Returns synchronously — the subprocess
+ * runs in the background. Throws if spawn fails. On success:
+ *
+ *   1. Subprocess is running, writing stream-json to outputFile
+ *   2. The agent is registered with async-agent-watcher (pending list)
+ *   3. session.pendingBackgroundCount is incremented
+ *   4. When the subprocess completes, watcher delivers the result
+ */
+export function dispatchDetachedAgent(input) {
+    // Ensure subagents dir exists. Idempotent.
+    try {
+        fs.mkdirSync(SUBAGENTS_DIR, { recursive: true });
+    }
+    catch {
+        /* race-safe — next open() will surface the real error */
+    }
+    const agentId = generateAgentId();
+    const outputFile = resolve(SUBAGENTS_DIR, `${agentId}.jsonl`);
+    // Open the output file for write. We pass the FD to child's stdout
+    // so the subprocess writes directly without going through us.
+    // stderr → separate .err file for diagnostics.
+    const errFile = resolve(SUBAGENTS_DIR, `${agentId}.err`);
+    const outFd = fs.openSync(outputFile, "w");
+    const errFd = fs.openSync(errFile, "w");
+    const cleanEnv = { ...process.env };
+    // v4.13 — Prevent nested-session errors. The SDK refuses to run if
+    // these are already set in env (they leak from parent Alvin/SDK).
+    delete cleanEnv.CLAUDECODE;
+    delete cleanEnv.CLAUDE_CODE_ENTRYPOINT;
+    const claudePath = findClaudeBinary();
+    if (!claudePath) {
+        fs.closeSync(outFd);
+        fs.closeSync(errFd);
+        throw new Error("alvin_dispatch: claude CLI not found. Install claude-code to enable background dispatch.");
+    }
+    const child = spawn(claudePath, [
+        "-p",
+        input.prompt,
+        "--output-format",
+        "stream-json",
+        "--verbose",
+    ], {
+        cwd: input.cwd,
+        detached: true,
+        stdio: ["ignore", outFd, errFd],
+        env: cleanEnv,
+    });
+    // Close our copies of the FDs — the child has its own descriptors now.
+    try {
+        fs.closeSync(outFd);
+    }
+    catch {
+        /* ignore */
+    }
+    try {
+        fs.closeSync(errFd);
+    }
+    catch {
+        /* ignore */
+    }
+    // Detach from parent Node's event loop so parent exit doesn't wait.
+    child.unref();
+    // Register with watcher so it polls the output file and delivers.
+    registerPendingAgent({
+        agentId,
+        outputFile,
+        description: input.description,
+        prompt: input.prompt,
+        chatId: input.chatId,
+        userId: input.userId,
+        toolUseId: null,
+        sessionKey: input.sessionKey,
+    });
+    // Increment the session's pendingBackgroundCount so the main handler
+    // knows a background task is in flight (same signal path as SDK's
+    // built-in Task tool).
+    try {
+        const s = getAllSessions().get(input.sessionKey);
+        if (s) {
+            s.pendingBackgroundCount = (s.pendingBackgroundCount ?? 0) + 1;
+        }
+    }
+    catch {
+        /* never let counter updates break dispatch */
+    }
+    return { agentId, outputFile, spawned: true };
+}

package/dist/services/alvin-mcp-tools.js ADDED Viewed

@@ -0,0 +1,103 @@
+/**
+ * v4.13 — Alvin's custom MCP tools, registered with the Claude Agent SDK
+ * via `createSdkMcpServer()`.
+ *
+ * Currently exposes a single tool:
+ *   `alvin_dispatch_agent(prompt, description)` — spawns a truly
+ *   detached `claude -p` subprocess that's independent of the parent
+ *   SDK lifecycle. Claude should prefer this over built-in
+ *   `Task(run_in_background: true)` for any long-running work on
+ *   Telegram so the main Telegram session isn't blocked by the SDK's
+ *   task-notification injection mechanism.
+ *
+ * The MCP server is created lazily per-query so each query gets fresh
+ * handler context (chatId/userId/sessionKey) via a closure.
+ */
+import { createSdkMcpServer, tool, } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+import { dispatchDetachedAgent } from "./alvin-dispatch.js";
+/**
+ * Build an MCP server bound to a specific turn's context. Pass the
+ * returned instance under `mcpServers: { alvin: <instance> }` in the
+ * query options.
+ */
+export function buildAlvinMcpServer(ctx) {
+    return createSdkMcpServer({
+        name: "alvin",
+        version: "4.13.0",
+        tools: [
+            tool("dispatch_agent", [
+                "Dispatch a TRULY DETACHED background sub-agent that runs",
+                "independently of this session. Use this for ANY long-running",
+                "work on Telegram/Slack/Discord/WhatsApp — research tasks,",
+                "audits, multi-page scraping, deep analysis — so the main",
+                "user session stays responsive and the user can keep chatting",
+                "with you while the sub-agent works.",
+                "",
+                "HOW IT DIFFERS FROM Task(run_in_background: true):",
+                "- The built-in Task tool's subprocess is tied to this session,",
+                "  so aborting the session also kills the sub-agent mid-work.",
+                "- `alvin_dispatch.dispatch_agent` spawns a completely",
+                "  independent `claude -p` subprocess that survives any abort,",
+                "  crash, or restart of the main bot.",
+                "",
+                "WHEN TO USE:",
+                "- Any audit/research visiting >2 URLs or reading >5 files",
+                "- Full-repo scans, code reviews, SEO/security/perf audits",
+                "- Anything you'd describe as 'thorough' or 'takes a few min'",
+                "",
+                "HOW THE RESULT GETS BACK TO THE USER:",
+                "- The tool returns { agentId, outputFile } immediately.",
+                "- The bot's async-agent watcher polls the outputFile and",
+                "  delivers the final result as a separate chat message when",
+                "  the sub-agent completes (success, failure, or 5-min",
+                "  staleness).",
+                "- Your job after calling this tool: tell the user ONE short",
+                "  sentence about what you dispatched, then END your turn.",
+                "  Do NOT wait. Do NOT poll the outputFile yourself.",
+            ].join("\n"), {
+                prompt: z
+                    .string()
+                    .describe("The full prompt for the sub-agent. Be specific and self-contained — the sub-agent has no access to this conversation's context and will see only this prompt."),
+                description: z
+                    .string()
+                    .describe("Short human-readable title (e.g. 'SEO audit alev-b.com', 'Research Higgsfield Seedance 2.0'). Shown to the user when the result arrives."),
+            }, async (args) => {
+                try {
+                    const result = dispatchDetachedAgent({
+                        prompt: args.prompt,
+                        description: args.description,
+                        chatId: ctx.chatId,
+                        userId: ctx.userId,
+                        sessionKey: ctx.sessionKey,
+                        cwd: ctx.cwd,
+                    });
+                    return {
+                        content: [
+                            {
+                                type: "text",
+                                text: `✅ Background sub-agent dispatched.\n` +
+                                    `agentId: ${result.agentId}\n` +
+                                    `output_file: ${result.outputFile}\n` +
+                                    `The user will receive the result as a separate message when the sub-agent completes.\n` +
+                                    `End your turn now. Do not wait for the result — it arrives asynchronously.`,
+                            },
+                        ],
+                    };
+                }
+                catch (err) {
+                    const msg = err instanceof Error ? err.message : String(err);
+                    return {
+                        content: [
+                            {
+                                type: "text",
+                                text: `⚠️ Failed to dispatch background agent: ${msg}`,
+                            },
+                        ],
+                        isError: true,
+                    };
+                }
+            }),
+        ],
+    });
+}

package/dist/services/async-agent-parser.js CHANGED Viewed

@@ -68,14 +68,45 @@ export function parseAsyncLaunchedToolResult(raw) {
     return { agentId, outputFile };
 }
 const DEFAULT_TAIL_BYTES = 64 * 1024;
+/**
+ * v4.12.4 — Default staleness window for partial-output delivery.
+ *
+ * If an outputFile has not been written to for at least this long AND
+ * there is usable assistant text content in it, treat it as "completed
+ * with partial output" rather than leaving it to time out at 12h with
+ * an empty banner. 5 minutes is a balance between:
+ *   - Fast enough to unblock interrupted agents (most useful work is
+ *     done within a few minutes)
+ *   - Slow enough to avoid false-positives on slow-but-alive agents
+ *     (typical tool_use gaps are under 30s)
+ *
+ * Override per call via opts.stalenessMs, or globally via the
+ * ALVIN_SUBAGENT_STALENESS_MS env var. `0` disables the fallback
+ * entirely (strict end_turn-only completion detection).
+ */
+const DEFAULT_STALENESS_MS = Number(process.env.ALVIN_SUBAGENT_STALENESS_MS) || 5 * 60 * 1000;
+/**
+ * Banner prepended to partial-output deliveries so the user knows the
+ * sub-agent was interrupted and this isn't a clean completion.
+ */
+const INTERRUPTED_BANNER = "⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_\n\n";
 /**
  * Read the tail of an SDK background-agent outputFile and decide what
  * state the sub-agent is in. See spec doc for the JSONL format. We only
  * read the last `maxTailBytes` of the file because long-running agents
  * (SEO audits etc.) can produce hundreds of KB of intermediate JSONL.
+ *
+ * v4.12.4 adds staleness-based partial-output delivery. When no
+ * `end_turn` marker is present, the parser checks file mtime: if the
+ * file hasn't grown in `stalenessMs` AND there is text content in the
+ * assistant turns, aggregate the text across all turns (not just the
+ * last), prepend an "interrupted" banner, and return "completed". This
+ * recovers real work from agents killed mid-execution (e.g. by the
+ * v4.12.3 bypass abort propagating through the SDK subprocess).
  */
 export async function parseOutputFileStatus(path, opts = {}) {
     const maxTailBytes = opts.maxTailBytes ?? DEFAULT_TAIL_BYTES;
+    const stalenessMs = opts.stalenessMs ?? DEFAULT_STALENESS_MS;
     let stat;
     try {
         stat = await fs.stat(path);
@@ -117,6 +148,56 @@ export async function parseOutputFileStatus(path, opts = {}) {
     const usable = lines
         .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
         .filter((l) => l.length > 0);
+    // v4.13 — FIRST PASS: look for a `{"type":"result"}` event anywhere in
+    // the tail. This is the completion signal for `claude -p
+    // --output-format stream-json` output (used by the v4.13 dispatch
+    // mechanism). When present, the `result` field holds the authoritative
+    // final text. If `result.result` is missing, aggregate from all
+    // assistant text blocks in the tail.
+    for (let i = usable.length - 1; i >= 0; i--) {
+        let parsed;
+        try {
+            parsed = JSON.parse(usable[i]);
+        }
+        catch {
+            continue;
+        }
+        if (parsed.type === "result") {
+            // Prefer the authoritative `result` field when present.
+            let output = typeof parsed.result === "string" ? parsed.result : "";
+            // Fallback: aggregate text from all assistant messages in the tail.
+            if (!output) {
+                const fragments = [];
+                for (const line of usable) {
+                    let p;
+                    try {
+                        p = JSON.parse(line);
+                    }
+                    catch {
+                        continue;
+                    }
+                    if (p.type === "assistant" &&
+                        Array.isArray(p.message?.content)) {
+                        for (const c of p.message.content) {
+                            if (c?.type === "text" && typeof c.text === "string") {
+                                fragments.push(c.text);
+                            }
+                        }
+                    }
+                }
+                output = fragments.join("\n\n").trim();
+            }
+            // Token usage from the result event itself.
+            const usage = parsed.usage;
+            const tokensUsed = usage
+                ? {
+                    input: usage.input_tokens ?? 0,
+                    output: usage.output_tokens ?? 0,
+                }
+                : undefined;
+            return { state: "completed", output, tokensUsed };
+        }
+    }
     // Walk backwards to find the most-recent assistant message with end_turn
     for (let i = usable.length - 1; i >= 0; i--) {
         let parsed;
@@ -147,6 +228,50 @@ export async function parseOutputFileStatus(path, opts = {}) {
             };
         }
     }
-    // No completion marker found — still running.
+    // v4.12.4 — No clean end_turn. Check for staleness + partial text.
+    if (stalenessMs > 0) {
+        const ageMs = Date.now() - stat.mtimeMs;
+        if (ageMs >= stalenessMs) {
+            // Aggregate ALL assistant text blocks across the tail, in order.
+            // We parse forward now (not backward like the end_turn scan) so
+            // the delivered text preserves the natural reading order.
+            const textFragments = [];
+            let lastUsage;
+            for (const line of usable) {
+                let parsed;
+                try {
+                    parsed = JSON.parse(line);
+                }
+                catch {
+                    continue;
+                }
+                if (parsed.type === "assistant" &&
+                    Array.isArray(parsed.message?.content)) {
+                    for (const c of parsed.message.content) {
+                        if (c?.type === "text" && typeof c.text === "string") {
+                            textFragments.push(c.text);
+                        }
+                    }
+                    if (parsed.message?.usage) {
+                        lastUsage = {
+                            input: parsed.message.usage.input_tokens ?? 0,
+                            output: parsed.message.usage.output_tokens ?? 0,
+                        };
+                    }
+                }
+            }
+            if (textFragments.length > 0) {
+                const aggregated = textFragments.join("\n\n").trim();
+                if (aggregated.length > 0) {
+                    return {
+                        state: "completed",
+                        output: INTERRUPTED_BANNER + aggregated,
+                        tokensUsed: lastUsage,
+                    };
+                }
+            }
+        }
+    }
+    // No completion marker found and not stale (or no text) — still running.
     return { state: "running", size: stat.size };
 }