npm - alvin-bot - Versions diffs - 4.12.4 → 4.13.0 - Mend

alvin-bot 4.12.4 → 4.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/CHANGELOG.md +77 -0
package/dist/handlers/message.js +9 -0
package/dist/paths.js +8 -0
package/dist/providers/claude-sdk-provider.js +25 -5
package/dist/services/alvin-dispatch.js +125 -0
package/dist/services/alvin-mcp-tools.js +103 -0
package/dist/services/async-agent-parser.js +50 -0
package/dist/services/personality.js +36 -10
package/package.json +1 -1
package/test/alvin-dispatch.test.ts +220 -0
package/test/async-agent-parser-streamjson.test.ts +273 -0

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,83 @@
 All notable changes to Alvin Bot are documented here.
+## [4.13.0] — 2026-04-16
+### ✨ Major: truly detached sub-agent dispatch via `alvin_dispatch_agent` MCP tool
+**Background.** v4.12.1 → v4.12.3 tried three progressively more complex fixes for the "bot freezes while sub-agent runs" problem, all of which depended on Claude Agent SDK's built-in `Task(run_in_background: true)` tool. All three iterations missed the same architectural reality: the SDK's background task stays tied to the parent SDK subprocess lifecycle. When v4.12.3's bypass path aborted the parent to unblock the user, the abort cascaded into killing the in-flight sub-agent mid-work. v4.12.4 worked around this at the delivery layer (recovering partial output after a 5-min staleness window), but the fundamental architecture was still wrong.
+v4.13 fixes the architecture. Instead of using the SDK's built-in Task tool for background work, we register our own MCP tool — `mcp__alvin__dispatch_agent` — which spawns a **completely independent** `claude -p` subprocess (its own PID, its own process group, unreferenced from the parent's event loop). Aborting the parent has zero effect on the dispatched subprocess. It continues to write its stream-json output to its own file and runs to completion. The async-agent-watcher polls the output file and delivers the result as a separate message when ready.
+Empirically verified with a standalone survival test (`scripts/smoke-test-abort-survival.mjs`): dispatch an agent that needs 20+ seconds of work, kill the parent Node process 100ms later, watch the subprocess keep writing to its output file and complete cleanly with the expected result.
+### What changed for the user
+- **Before v4.13** (with Task tool): the bot shows "typing…" for the entire duration of the sub-agent's work (5, 20, 60 minutes). New messages sit in a queue and don't get processed. If the user interrupts via v4.12.3's bypass, the sub-agent dies mid-work and hours later the user gets a `720m timeout · (empty output)` message.
+- **After v4.13** (with `alvin_dispatch_agent`): the bot's turn completes within seconds of dispatch. The user sees "🤖 Dispatched 2 background agents — I'll send the results when ready." and can immediately chat about anything else. The background subprocesses finish cleanly and deliver their full results as separate messages.
+This matches the OpenClaw experience the user was asking about — except it's built natively into Claude Agent SDK's MCP-tool mechanism, not a wholesale replacement.
+### Technical details
+**New module** `src/services/alvin-dispatch.ts`
+- `dispatchDetachedAgent(input)` — spawns `claude -p <prompt> --output-format stream-json` via `child_process.spawn({ detached: true, stdio: ["ignore", outFd, errFd] })` + `.unref()`
+- Synchronous return: `{ agentId, outputFile, spawned: true }`
+- Side effects: registers with `async-agent-watcher`, increments `session.pendingBackgroundCount`
+- Unique agent IDs via `crypto.randomBytes(12).toString("hex")` (collision-safe for parallel dispatch)
+- Cleans `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` from env to prevent nested-session errors
+**New module** `src/services/alvin-mcp-tools.ts`
+- `buildAlvinMcpServer(ctx)` — creates an SDK MCP server bound to this turn's `{ chatId, userId, sessionKey }` context via closure
+- Exposes `dispatch_agent` tool (zod-validated input: `{ prompt: string, description: string }`)
+- Tool handler calls `dispatchDetachedAgent` and returns `agentId + outputFile` to Claude
+- Uses SDK's `createSdkMcpServer` + `tool` builders (the SDK's native inline-tool API — no separate MCP server process needed)
+**Provider integration** (`src/providers/claude-sdk-provider.ts`)
+- New `QueryOptions.alvinDispatchContext` field — when set, provider registers `mcpServers: { alvin: buildAlvinMcpServer(ctx) }` + appends `mcp__alvin__dispatch_agent` to the default `allowedTools` list
+- When unset, the MCP server is not registered and Claude falls back to the built-in Task tool only
+- Non-SDK providers ignore the new field entirely
+**Handler integration** (`src/handlers/message.ts`)
+- Passes `alvinDispatchContext: { chatId, userId, sessionKey }` on every SDK turn
+- No other handler changes — the bypass path, the staleness parser, and the pending-count decrement are all reused from v4.12.3/v4.12.4
+**Parser extension** (`src/services/async-agent-parser.ts`)
+- New first-pass scan for `{"type":"result"}` events — the completion marker used by `claude -p --output-format stream-json` (different from the SDK-internal sub-agent format that uses `message.stop_reason: "end_turn"`)
+- When found, uses the `result.result` field as authoritative output when present, falls back to aggregating all assistant text blocks
+- Preserves backward compat with the existing `end_turn`-based path (tested by the old test suite)
+**System prompt update** (`src/services/personality.ts`)
+- `BACKGROUND_SUBAGENT_HINT` rewritten to strongly prefer `mcp__alvin__dispatch_agent` over `Task(run_in_background: true)` on Telegram/WhatsApp/Slack/Discord
+- Explicit decision tree, concrete example prompts, parallel-dispatch guidance
+- Built-in Task tool remains available but deprecated for long-running work; reserved for the rare case where Claude needs a result in the same turn
+### Known limitations
+- **First-turn only for now**: the MCP server is bound to `{ chatId, userId, sessionKey }` at query construction time. If the session's underlying SDK session ID changes mid-conversation (rare), the tool context goes stale. Defensive: a new MCP server is built on each handler invocation, so any next turn picks up the correct context.
+- **Non-Telegram platforms**: `src/handlers/platform-message.ts` (Slack/Discord/WhatsApp) doesn't pass `alvinDispatchContext` yet. Deferred to follow-up — the Telegram path is the primary use case and the one the user explicitly requested.
+- **Parallel dispatch not smoke-tested**: the system prompt guides Claude to call `dispatch_agent` multiple times in one turn for parallel work, but I only end-to-end tested single dispatch. Should work (no shared state in the handler), but YMMV until battle-tested.
+### Testing
+- **Baseline**: 447 tests (v4.12.4)
+- **New**:
+  - `test/alvin-dispatch.test.ts` — 6 tests (spawn flags, unique IDs, watcher registration, session counter, stdio redirect, env cleanup)
+  - `test/async-agent-parser-streamjson.test.ts` — 7 tests (result-event detection, token extraction, error state, running state, multi-text aggregation, `result.result` precedence, minimal fields)
+- **Total**: 460 tests, all green, TSC clean
+- **Real-world smoke tests** (NOT in CI — run via `node scripts/smoke-test-dispatch.mjs` and `node scripts/smoke-test-abort-survival.mjs`):
+  - `smoke-test-dispatch`: dispatches a real `claude -p` subprocess, polls to completion (~10s), verifies exact output `"SMOKE_TEST_OK_v4.13"`. **PASS**.
+  - `smoke-test-abort-survival`: dispatches a subprocess that needs ~25s of work, kills the parent Node process ~100ms later, polls the output file. Subprocess survives and completes cleanly. **PASS**.
+### Files changed
+- **NEW**: `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `scripts/smoke-test-dispatch.mjs`, `scripts/smoke-test-abort-survival.mjs`
+- **NEW tests**: `test/alvin-dispatch.test.ts`, `test/async-agent-parser-streamjson.test.ts`
+- **Modified**: `src/paths.ts` (SUBAGENTS_DIR), `src/services/async-agent-parser.ts` (stream-json detection), `src/providers/claude-sdk-provider.ts` (MCP server registration + allowedTools), `src/providers/types.ts` (QueryOptions.alvinDispatchContext), `src/handlers/message.ts` (pass dispatch context), `src/services/personality.ts` (BACKGROUND_SUBAGENT_HINT rewrite)
+- **Version**: `package.json` 4.12.4 → 4.13.0 (minor bump — new public surface: MCP tool)
+---
 ## [4.12.4] — 2026-04-16
 ### 🐛 Patch: recover partial output from interrupted background sub-agents

package/dist/handlers/message.js CHANGED Viewed

@@ -400,6 +400,15 @@ export async function handleMessage(ctx) {
                 messageCount: session.messageCount,
                 toolUseCount: session.toolUseCount,
             } : undefined,
+            // v4.13 — Expose alvin_dispatch_agent MCP tool so Claude can spawn
+            // truly detached background sub-agents (independent of this SDK
+            // subprocess's lifecycle). Only for SDK provider + Telegram here —
+            // non-SDK providers ignore this field.
+            alvinDispatchContext: isSDK ? {
+                chatId: ctx.chat.id,
+                userId,
+                sessionKey,
+            } : undefined,
         };
         // Stream response from provider (with fallback)
         let lastBroadcastLen = 0;

package/dist/paths.js CHANGED Viewed

@@ -118,3 +118,11 @@ export const ASSETS_DIR = resolve(DATA_DIR, "assets");
 export const ASSETS_INDEX_JSON = resolve(DATA_DIR, "assets", "INDEX.json");
 /** assets/INDEX.md — Human-readable asset summary (injected into prompts) */
 export const ASSETS_INDEX_MD = resolve(DATA_DIR, "assets", "INDEX.md");
+/** subagents/ — Detached `claude -p` subprocess output files (v4.13).
+ *  Each dispatched agent writes its full stream-json output to
+ *  subagents/<agentId>.jsonl. The async-agent-watcher polls these files
+ *  and delivers the final result as a separate message when ready.
+ *  These live outside BOT_ROOT/DATA_DIR's state/ so that the watcher's
+ *  giveUpAt-survive-restart logic doesn't leak into the subprocess
+ *  lifecycle. */
+export const SUBAGENTS_DIR = resolve(DATA_DIR, "subagents");

package/dist/providers/claude-sdk-provider.js CHANGED Viewed

@@ -13,6 +13,7 @@ import { fileURLToPath } from "url";
 import { execFile } from "child_process";
 import { promisify } from "util";
 import { findClaudeBinary } from "../find-claude-binary.js";
+import { buildAlvinMcpServer } from "../services/alvin-mcp-tools.js";
 const execFileAsync = promisify(execFile);
 /**
  * Detects the Claude CLI "Not logged in" error message. The CLI emits this
@@ -103,6 +104,25 @@ export class ClaudeSDKProvider {
         }
         try {
             const claudePath = findClaudeBinary();
+            // v4.13 — Register Alvin's custom MCP server if the caller provided
+            // dispatch context. The server exposes `alvin_dispatch_agent` which
+            // spawns truly detached `claude -p` subprocesses (independent of the
+            // main SDK subprocess's lifecycle). When Claude calls it, the bot
+            // can abort this query without killing the dispatched sub-agent.
+            const mcpServers = {};
+            if (options.alvinDispatchContext) {
+                mcpServers.alvin = buildAlvinMcpServer(options.alvinDispatchContext);
+            }
+            // v4.13 — MCP tool names must be explicitly whitelisted via allowedTools
+            // in the form `mcp__<server>__<tool>`. Without this, Claude can see the
+            // tool in the catalog but cannot actually invoke it.
+            const defaultAllowed = [
+                "Read", "Write", "Edit", "Bash", "Glob", "Grep",
+                "WebSearch", "WebFetch", "Task",
+            ];
+            if (options.alvinDispatchContext) {
+                defaultAllowed.push("mcp__alvin__dispatch_agent");
+            }
             const q = query({
                 prompt,
                 options: {
@@ -116,11 +136,11 @@ export class ClaudeSDKProvider {
                     settingSources: ["user", "project"],
                     // v4.12.2 — options.allowedTools can override the default full set.
                     // Used by sub-agents with toolset="readonly"/"research" to restrict
-                    // what Claude can do. Default = full access.
-                    allowedTools: options.allowedTools ?? [
-                        "Read", "Write", "Edit", "Bash", "Glob", "Grep",
-                        "WebSearch", "WebFetch", "Task",
-                    ],
+                    // what Claude can do. Default = full access + alvin MCP tools.
+                    allowedTools: options.allowedTools ?? defaultAllowed,
+                    // v4.13 — Conditionally pass the MCP server config so the inline
+                    // dispatch tool is visible. Empty object = no custom tools.
+                    mcpServers: Object.keys(mcpServers).length > 0 ? mcpServers : undefined,
                     systemPrompt,
                     effort: (options.effort || "medium"),
                     maxTurns: 50,

package/dist/services/alvin-dispatch.js ADDED Viewed

@@ -0,0 +1,125 @@
+/**
+ * v4.13 — alvin_dispatch custom-tool service.
+ *
+ * Architectural replacement for Claude Agent SDK's built-in
+ * `Task(run_in_background: true)` tool. The SDK's built-in version
+ * ties the background sub-agent's execution to the parent SDK
+ * subprocess lifecycle — killing the parent (e.g. via v4.12.3's
+ * bypass-abort) cascades into killing any in-flight background tasks.
+ *
+ * This module instead spawns a truly independent `claude -p` subprocess
+ * via Node's `child_process.spawn({ detached: true, stdio: [...] })`.
+ * The subprocess:
+ *   - Has its own PID, own process group (by detached: true)
+ *   - Is unreffed so the parent Node process doesn't wait for it
+ *   - Writes its stream-json output to its own file
+ *   - Survives any abort/crash/restart of the parent Alvin bot
+ *
+ * The async-agent-watcher polls the output file and delivers the
+ * final result via subagent-delivery.ts when the sub-agent completes.
+ *
+ * See Phase A of docs/superpowers/plans/2026-04-16-v4.13-truly-async-subagents.md
+ * for the empirical verification that detached `claude -p` subprocesses
+ * behave as expected (they do).
+ */
+import { spawn } from "node:child_process";
+import fs from "node:fs";
+import crypto from "node:crypto";
+import { resolve } from "node:path";
+import { findClaudeBinary } from "../find-claude-binary.js";
+import { registerPendingAgent } from "./async-agent-watcher.js";
+import { getAllSessions } from "./session.js";
+import { SUBAGENTS_DIR } from "../paths.js";
+/** Generate a 32-char hex agent id. Avoids collisions across parallel
+ *  dispatches even at sub-millisecond intervals. */
+function generateAgentId() {
+    return "alvin-" + crypto.randomBytes(12).toString("hex");
+}
+/**
+ * Dispatch a detached sub-agent. Returns synchronously — the subprocess
+ * runs in the background. Throws if spawn fails. On success:
+ *
+ *   1. Subprocess is running, writing stream-json to outputFile
+ *   2. The agent is registered with async-agent-watcher (pending list)
+ *   3. session.pendingBackgroundCount is incremented
+ *   4. When the subprocess completes, watcher delivers the result
+ */
+export function dispatchDetachedAgent(input) {
+    // Ensure subagents dir exists. Idempotent.
+    try {
+        fs.mkdirSync(SUBAGENTS_DIR, { recursive: true });
+    }
+    catch {
+        /* race-safe — next open() will surface the real error */
+    }
+    const agentId = generateAgentId();
+    const outputFile = resolve(SUBAGENTS_DIR, `${agentId}.jsonl`);
+    // Open the output file for write. We pass the FD to child's stdout
+    // so the subprocess writes directly without going through us.
+    // stderr → separate .err file for diagnostics.
+    const errFile = resolve(SUBAGENTS_DIR, `${agentId}.err`);
+    const outFd = fs.openSync(outputFile, "w");
+    const errFd = fs.openSync(errFile, "w");
+    const cleanEnv = { ...process.env };
+    // v4.13 — Prevent nested-session errors. The SDK refuses to run if
+    // these are already set in env (they leak from parent Alvin/SDK).
+    delete cleanEnv.CLAUDECODE;
+    delete cleanEnv.CLAUDE_CODE_ENTRYPOINT;
+    const claudePath = findClaudeBinary();
+    if (!claudePath) {
+        fs.closeSync(outFd);
+        fs.closeSync(errFd);
+        throw new Error("alvin_dispatch: claude CLI not found. Install claude-code to enable background dispatch.");
+    }
+    const child = spawn(claudePath, [
+        "-p",
+        input.prompt,
+        "--output-format",
+        "stream-json",
+        "--verbose",
+    ], {
+        cwd: input.cwd,
+        detached: true,
+        stdio: ["ignore", outFd, errFd],
+        env: cleanEnv,
+    });
+    // Close our copies of the FDs — the child has its own descriptors now.
+    try {
+        fs.closeSync(outFd);
+    }
+    catch {
+        /* ignore */
+    }
+    try {
+        fs.closeSync(errFd);
+    }
+    catch {
+        /* ignore */
+    }
+    // Detach from parent Node's event loop so parent exit doesn't wait.
+    child.unref();
+    // Register with watcher so it polls the output file and delivers.
+    registerPendingAgent({
+        agentId,
+        outputFile,
+        description: input.description,
+        prompt: input.prompt,
+        chatId: input.chatId,
+        userId: input.userId,
+        toolUseId: null,
+        sessionKey: input.sessionKey,
+    });
+    // Increment the session's pendingBackgroundCount so the main handler
+    // knows a background task is in flight (same signal path as SDK's
+    // built-in Task tool).
+    try {
+        const s = getAllSessions().get(input.sessionKey);
+        if (s) {
+            s.pendingBackgroundCount = (s.pendingBackgroundCount ?? 0) + 1;
+        }
+    }
+    catch {
+        /* never let counter updates break dispatch */
+    }
+    return { agentId, outputFile, spawned: true };
+}

package/dist/services/alvin-mcp-tools.js ADDED Viewed

@@ -0,0 +1,103 @@
+/**
+ * v4.13 — Alvin's custom MCP tools, registered with the Claude Agent SDK
+ * via `createSdkMcpServer()`.
+ *
+ * Currently exposes a single tool:
+ *   `alvin_dispatch_agent(prompt, description)` — spawns a truly
+ *   detached `claude -p` subprocess that's independent of the parent
+ *   SDK lifecycle. Claude should prefer this over built-in
+ *   `Task(run_in_background: true)` for any long-running work on
+ *   Telegram so the main Telegram session isn't blocked by the SDK's
+ *   task-notification injection mechanism.
+ *
+ * The MCP server is created lazily per-query so each query gets fresh
+ * handler context (chatId/userId/sessionKey) via a closure.
+ */
+import { createSdkMcpServer, tool, } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+import { dispatchDetachedAgent } from "./alvin-dispatch.js";
+/**
+ * Build an MCP server bound to a specific turn's context. Pass the
+ * returned instance under `mcpServers: { alvin: <instance> }` in the
+ * query options.
+ */
+export function buildAlvinMcpServer(ctx) {
+    return createSdkMcpServer({
+        name: "alvin",
+        version: "4.13.0",
+        tools: [
+            tool("dispatch_agent", [
+                "Dispatch a TRULY DETACHED background sub-agent that runs",
+                "independently of this session. Use this for ANY long-running",
+                "work on Telegram/Slack/Discord/WhatsApp — research tasks,",
+                "audits, multi-page scraping, deep analysis — so the main",
+                "user session stays responsive and the user can keep chatting",
+                "with you while the sub-agent works.",
+                "",
+                "HOW IT DIFFERS FROM Task(run_in_background: true):",
+                "- The built-in Task tool's subprocess is tied to this session,",
+                "  so aborting the session also kills the sub-agent mid-work.",
+                "- `alvin_dispatch.dispatch_agent` spawns a completely",
+                "  independent `claude -p` subprocess that survives any abort,",
+                "  crash, or restart of the main bot.",
+                "",
+                "WHEN TO USE:",
+                "- Any audit/research visiting >2 URLs or reading >5 files",
+                "- Full-repo scans, code reviews, SEO/security/perf audits",
+                "- Anything you'd describe as 'thorough' or 'takes a few min'",
+                "",
+                "HOW THE RESULT GETS BACK TO THE USER:",
+                "- The tool returns { agentId, outputFile } immediately.",
+                "- The bot's async-agent watcher polls the outputFile and",
+                "  delivers the final result as a separate chat message when",
+                "  the sub-agent completes (success, failure, or 5-min",
+                "  staleness).",
+                "- Your job after calling this tool: tell the user ONE short",
+                "  sentence about what you dispatched, then END your turn.",
+                "  Do NOT wait. Do NOT poll the outputFile yourself.",
+            ].join("\n"), {
+                prompt: z
+                    .string()
+                    .describe("The full prompt for the sub-agent. Be specific and self-contained — the sub-agent has no access to this conversation's context and will see only this prompt."),
+                description: z
+                    .string()
+                    .describe("Short human-readable title (e.g. 'SEO audit alev-b.com', 'Research Higgsfield Seedance 2.0'). Shown to the user when the result arrives."),
+            }, async (args) => {
+                try {
+                    const result = dispatchDetachedAgent({
+                        prompt: args.prompt,
+                        description: args.description,
+                        chatId: ctx.chatId,
+                        userId: ctx.userId,
+                        sessionKey: ctx.sessionKey,
+                        cwd: ctx.cwd,
+                    });
+                    return {
+                        content: [
+                            {
+                                type: "text",
+                                text: `✅ Background sub-agent dispatched.\n` +
+                                    `agentId: ${result.agentId}\n` +
+                                    `output_file: ${result.outputFile}\n` +
+                                    `The user will receive the result as a separate message when the sub-agent completes.\n` +
+                                    `End your turn now. Do not wait for the result — it arrives asynchronously.`,
+                            },
+                        ],
+                    };
+                }
+                catch (err) {
+                    const msg = err instanceof Error ? err.message : String(err);
+                    return {
+                        content: [
+                            {
+                                type: "text",
+                                text: `⚠️ Failed to dispatch background agent: ${msg}`,
+                            },
+                        ],
+                        isError: true,
+                    };
+                }
+            }),
+        ],
+    });
+}

package/dist/services/async-agent-parser.js CHANGED Viewed

@@ -148,6 +148,56 @@ export async function parseOutputFileStatus(path, opts = {}) {
     const usable = lines
         .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
         .filter((l) => l.length > 0);
+    // v4.13 — FIRST PASS: look for a `{"type":"result"}` event anywhere in
+    // the tail. This is the completion signal for `claude -p
+    // --output-format stream-json` output (used by the v4.13 dispatch
+    // mechanism). When present, the `result` field holds the authoritative
+    // final text. If `result.result` is missing, aggregate from all
+    // assistant text blocks in the tail.
+    for (let i = usable.length - 1; i >= 0; i--) {
+        let parsed;
+        try {
+            parsed = JSON.parse(usable[i]);
+        }
+        catch {
+            continue;
+        }
+        if (parsed.type === "result") {
+            // Prefer the authoritative `result` field when present.
+            let output = typeof parsed.result === "string" ? parsed.result : "";
+            // Fallback: aggregate text from all assistant messages in the tail.
+            if (!output) {
+                const fragments = [];
+                for (const line of usable) {
+                    let p;
+                    try {
+                        p = JSON.parse(line);
+                    }
+                    catch {
+                        continue;
+                    }
+                    if (p.type === "assistant" &&
+                        Array.isArray(p.message?.content)) {
+                        for (const c of p.message.content) {
+                            if (c?.type === "text" && typeof c.text === "string") {
+                                fragments.push(c.text);
+                            }
+                        }
+                    }
+                }
+                output = fragments.join("\n\n").trim();
+            }
+            // Token usage from the result event itself.
+            const usage = parsed.usage;
+            const tokensUsed = usage
+                ? {
+                    input: usage.input_tokens ?? 0,
+                    output: usage.output_tokens ?? 0,
+                }
+                : undefined;
+            return { state: "completed", output, tokensUsed };
+        }
+    }
     // Walk backwards to find the most-recent assistant message with end_turn
     for (let i = usable.length - 1; i >= 0; i--) {
         let parsed;

package/dist/services/personality.js CHANGED Viewed

@@ -61,13 +61,27 @@ const SDK_ADDON = `When you run commands or edit files, briefly explain what you
  * See test/system-prompt-background-hint.test.ts and
  * docs/superpowers/plans/2026-04-13-async-subagents.md
  */
-const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents (Telegram blocking rule)
+const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents on Telegram/WhatsApp/Slack
-**The rule**: For any Agent/Task tool call that will take more than
-~30 seconds, you MUST set \`run_in_background: true\`. Otherwise the
-Telegram user is COMPLETELY BLOCKED from messaging me until the task
-finishes — terrible UX for anything that ends up taking minutes or
-hours. This is the single most important rule for using the Agent tool.
+**THE RULE (v4.13 update)**: For ANY long-running work (research,
+audits, multi-step analysis, >30 seconds), you MUST prefer the
+\`mcp__alvin__dispatch_agent\` tool over the built-in \`Task\` tool.
+**Why alvin_dispatch over Task(run_in_background: true)?**
+The built-in Task tool (even with \`run_in_background: true\`) keeps
+the sub-agent tied to THIS session's CLI subprocess. If the user sends
+me a new message mid-work, aborting this session would kill the
+sub-agent half-way through and leave garbage partial output. Alvin
+learned this the hard way (v4.12.1–v4.12.3 all got this wrong).
+\`mcp__alvin__dispatch_agent\` spawns a COMPLETELY INDEPENDENT
+\`claude -p\` subprocess with its own PID, own process group. It
+survives ANY abort of this session. The user can interrupt me freely
+and keep chatting while the sub-agent finishes cleanly. The bot
+delivers the result as a separate message when ready.
+**THIS IS THE SINGLE MOST IMPORTANT RULE FOR USER RESPONSIVENESS.**
 **Why it matters**: During a synchronous Agent tool call the parent
 session has no way to know the sub-agent is still working. It appears
@@ -77,7 +91,7 @@ an \`agentId\` + \`outputFile\` path IMMEDIATELY, your turn ends in
 seconds, the user can keep chatting with me, and the bot automatically
 delivers the sub-agent's final result as a separate message when ready.
-**Decision tree** (apply every time you consider the Agent/Task tool):
+**Decision tree** (apply every time you consider any sub-agent tool):
   Does the task involve ANY of the following?
     • Visiting more than 2 URLs
@@ -89,8 +103,14 @@ delivers the sub-agent's final result as a separate message when ready.
     • Crawling, scraping, or fetching multiple resources
     • Research across multiple sources or domains
-  YES → \`run_in_background: true\` (no exceptions)
-  NO  → foreground is fine (single quick sub-query under 30s)
+  YES → use \`mcp__alvin__dispatch_agent\` (truly detached, preferred)
+  NO  → foreground is fine (single quick sub-query under 30s, answer
+        yourself if possible)
+NOTE: The built-in Task tool with run_in_background: true still works
+but is now deprecated on Telegram/Slack/Discord/WhatsApp because it
+ties sub-agent lifetime to this session. Only use Task directly when
+you explicitly need the sub-agent's result IN THIS SAME TURN (rare).
 **Examples where you MUST use \`run_in_background: true\`:**
 - ANY audit (SEO, security, code quality, performance, accessibility, GEO)
@@ -107,7 +127,7 @@ delivers the sub-agent's final result as a separate message when ready.
 - "What's 2+2?" (no sub-agent needed — answer yourself)
 - "Check if package.json has foo" (one quick tool call)
-**After launching a background agent, you MUST:**
+**After launching a background agent (either tool), you MUST:**
 1. Tell the user in ONE short sentence what you kicked off.
    Example: "Starting SEO audit for gethomes.io in the background —
    I'll send the report when it's done."
@@ -115,6 +135,12 @@ delivers the sub-agent's final result as a separate message when ready.
 3. The bot will deliver the result as a separate message when ready.
    You don't need to poll the outputFile proactively.
+**For PARALLEL dispatch** (e.g. user says "research X and Y in parallel"):
+Call \`mcp__alvin__dispatch_agent\` multiple times in the SAME assistant
+turn, once per sub-task. Each returns its own agentId immediately. Your
+turn ends as soon as all dispatches have returned — no sequential
+waiting. The bot delivers each sub-agent's result separately when ready.
 If the user asks "is it done yet?" before the bot delivers the result,
 you MAY read the agent's \`outputFile\` (from the original tool result)
 using the Read tool to peek at progress — but don't block on it.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "alvin-bot",
-  "version": "4.12.4",
+  "version": "4.13.0",
   "description": "Alvin Bot \u2014 Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
   "type": "module",
   "main": "dist/index.js",

package/test/alvin-dispatch.test.ts ADDED Viewed

@@ -0,0 +1,220 @@
+/**
+ * v4.13 — alvin_dispatch custom-tool service.
+ *
+ * `dispatchDetachedAgent(input)` spawns a truly independent `claude -p`
+ * subprocess that survives the parent handler's abort. This is the
+ * architectural replacement for SDK's built-in Task(run_in_background)
+ * tool, which was tied to the parent SDK subprocess lifecycle.
+ *
+ * Contract:
+ *   - Input: { prompt, description, chatId, userId, sessionKey }
+ *   - Output (synchronous): { agentId, outputFile, spawned: true }
+ *   - Side effect: spawns detached subprocess writing stream-json
+ *     output to outputFile, registers with async-agent-watcher.
+ *
+ * These tests stub child_process.spawn so they run fast and deterministic.
+ * The "real subprocess survives parent" property was verified empirically
+ * in Phase A (see plan doc).
+ */
+import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
+import os from "os";
+import fs from "fs";
+import { resolve } from "path";
+const TEST_DATA_DIR = resolve(
+  os.tmpdir(),
+  `alvin-dispatch-${process.pid}-${Date.now()}`,
+);
+interface SpawnRecord {
+  cmd: string;
+  args: string[];
+  opts: {
+    detached?: boolean;
+    stdio?: unknown;
+    cwd?: string;
+    env?: Record<string, string | undefined>;
+  };
+  unreffed: boolean;
+}
+let spawned: SpawnRecord[] = [];
+beforeEach(async () => {
+  if (fs.existsSync(TEST_DATA_DIR))
+    fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+  fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
+  process.env.ALVIN_DATA_DIR = TEST_DATA_DIR;
+  spawned = [];
+  vi.resetModules();
+  vi.doMock("node:child_process", async () => {
+    const actual = await vi.importActual<typeof import("node:child_process")>(
+      "node:child_process",
+    );
+    return {
+      ...actual,
+      spawn: (cmd: string, args: string[], opts: SpawnRecord["opts"]) => {
+        const record: SpawnRecord = {
+          cmd,
+          args,
+          opts,
+          unreffed: false,
+        };
+        spawned.push(record);
+        return {
+          pid: 12345,
+          unref() {
+            record.unreffed = true;
+          },
+          on() {},
+          kill() {},
+        };
+      },
+    };
+  });
+  vi.doMock("../src/services/subagent-delivery.js", () => ({
+    deliverSubAgentResult: async () => {},
+    attachBotApi: () => {},
+    __setBotApiForTest: () => {},
+  }));
+});
+afterEach(async () => {
+  try {
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.stopWatcher();
+    mod.__resetForTest();
+  } catch {
+    /* ignore */
+  }
+});
+describe("dispatchDetachedAgent (v4.13)", () => {
+  it("spawns claude -p with detached: true and unrefs", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const result = mod.dispatchDetachedAgent({
+      prompt: "research X",
+      description: "X research",
+      chatId: 42,
+      userId: 42,
+      sessionKey: "s1",
+    });
+    expect(result.agentId).toMatch(/^alvin-[a-f0-9]{16,}$/);
+    expect(result.outputFile).toContain(TEST_DATA_DIR);
+    expect(result.spawned).toBe(true);
+    expect(spawned).toHaveLength(1);
+    const [s] = spawned;
+    expect(s.cmd).toMatch(/claude/);
+    expect(s.args).toContain("-p");
+    expect(s.args).toContain("research X");
+    expect(s.args).toContain("--output-format");
+    expect(s.args).toContain("stream-json");
+    expect(s.opts.detached).toBe(true);
+    expect(s.unreffed).toBe(true);
+  });
+  it("returns unique agentIds for concurrent dispatches", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const r1 = mod.dispatchDetachedAgent({
+      prompt: "a",
+      description: "a",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    const r2 = mod.dispatchDetachedAgent({
+      prompt: "b",
+      description: "b",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    expect(r1.agentId).not.toBe(r2.agentId);
+    expect(r1.outputFile).not.toBe(r2.outputFile);
+  });
+  it("registers the pending agent with the watcher", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const watcher = await import("../src/services/async-agent-watcher.js");
+    mod.dispatchDetachedAgent({
+      prompt: "x",
+      description: "X audit",
+      chatId: 42,
+      userId: 42,
+      sessionKey: "s1",
+    });
+    const pending = watcher.listPendingAgents();
+    expect(pending).toHaveLength(1);
+    expect(pending[0].description).toBe("X audit");
+    expect(pending[0].sessionKey).toBe("s1");
+  });
+  it("increments session.pendingBackgroundCount on dispatch", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const { getSession } = await import("../src/services/session.js");
+    const session = getSession("s-count");
+    session.pendingBackgroundCount = 0;
+    mod.dispatchDetachedAgent({
+      prompt: "p",
+      description: "d",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s-count",
+    });
+    expect(session.pendingBackgroundCount).toBe(1);
+    mod.dispatchDetachedAgent({
+      prompt: "p2",
+      description: "d2",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s-count",
+    });
+    expect(session.pendingBackgroundCount).toBe(2);
+  });
+  it("uses stdio redirect so child's stdout goes to outputFile", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    mod.dispatchDetachedAgent({
+      prompt: "p",
+      description: "d",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    const [s] = spawned;
+    // stdio should be an array with FD redirects (ignore, pipe-to-file, ignore)
+    // or similar. We verify it's NOT "inherit" (which would attach to parent).
+    expect(s.opts.stdio).not.toBe("inherit");
+    expect(s.opts.stdio).not.toBe(undefined);
+  });
+  it("cleans env of CLAUDECODE/CLAUDE_CODE_ENTRYPOINT to prevent nested session errors", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    process.env.CLAUDECODE = "1";
+    process.env.CLAUDE_CODE_ENTRYPOINT = "cli";
+    try {
+      mod.dispatchDetachedAgent({
+        prompt: "p",
+        description: "d",
+        chatId: 1,
+        userId: 1,
+        sessionKey: "s1",
+      });
+      const [s] = spawned;
+      expect(s.opts.env).toBeDefined();
+      expect(s.opts.env?.CLAUDECODE).toBeUndefined();
+      expect(s.opts.env?.CLAUDE_CODE_ENTRYPOINT).toBeUndefined();
+    } finally {
+      delete process.env.CLAUDECODE;
+      delete process.env.CLAUDE_CODE_ENTRYPOINT;
+    }
+  });
+});

package/test/async-agent-parser-streamjson.test.ts ADDED Viewed

@@ -0,0 +1,273 @@
+/**
+ * v4.13 — parseOutputFileStatus support for `claude -p --output-format stream-json`.
+ *
+ * The SDK's built-in Task tool writes its sub-agent output in one JSONL
+ * format (events with `message.stop_reason: "end_turn"`). The new v4.13
+ * dispatch mechanism spawns `claude -p --output-format stream-json`
+ * which writes a DIFFERENT format:
+ *
+ *   - Assistant messages have `message.stop_reason: null` (streaming shape)
+ *   - A final `{"type":"result","subtype":"success","stop_reason":"end_turn",...}`
+ *     event marks completion explicitly
+ *   - `result.duration_ms`, `total_cost_usd`, `num_turns`, `usage`
+ *     are the authoritative completion signals
+ *
+ * The parser must recognize BOTH formats. v4.13 adds detection for the
+ * result-event format while preserving backward compat with the existing
+ * SDK-internal format (tested in the sibling test files).
+ */
+import { describe, it, expect, beforeEach, afterEach } from "vitest";
+import fs from "fs";
+import os from "os";
+import { resolve } from "path";
+import { parseOutputFileStatus } from "../src/services/async-agent-parser.js";
+const TMP_BASE = resolve(
+  os.tmpdir(),
+  `alvin-parser-streamjson-${process.pid}`,
+);
+beforeEach(() => {
+  fs.mkdirSync(TMP_BASE, { recursive: true });
+});
+afterEach(() => {
+  try {
+    fs.rmSync(TMP_BASE, { recursive: true, force: true });
+  } catch {
+    /* ignore */
+  }
+});
+describe("parseOutputFileStatus — stream-json format (v4.13)", () => {
+  it("returns 'completed' when final event is type:result + subtype:success", async () => {
+    const path = resolve(TMP_BASE, "stream-success.jsonl");
+    const lines = [
+      { type: "system", subtype: "init", session_id: "s1" },
+      {
+        type: "assistant",
+        message: {
+          role: "assistant",
+          content: [{ type: "text", text: "The answer is 42." }],
+          stop_reason: null, // streaming shape — NOT end_turn yet
+        },
+        session_id: "s1",
+      },
+      {
+        type: "result",
+        subtype: "success",
+        stop_reason: "end_turn",
+        session_id: "s1",
+        total_cost_usd: 0.01,
+        duration_ms: 500,
+        usage: { input_tokens: 10, output_tokens: 5 },
+        result: "The answer is 42.",
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("The answer is 42.");
+      expect(status.output).not.toMatch(/interrupted|partial/i);
+    }
+  });
+  it("extracts tokens from result.usage when using stream-json format", async () => {
+    const path = resolve(TMP_BASE, "stream-tokens.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "x" }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "result",
+        subtype: "success",
+        stop_reason: "end_turn",
+        usage: { input_tokens: 1234, output_tokens: 567 },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.tokensUsed).toEqual({ input: 1234, output: 567 });
+    }
+  });
+  it("recognises 'failed' state when result.is_error is true", async () => {
+    const path = resolve(TMP_BASE, "stream-failed.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "I tried..." }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "result",
+        subtype: "error_max_turns",
+        is_error: true,
+        stop_reason: "max_turns",
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    // With an is_error result + text content, we still deliver the text
+    // as completed (better to give the user SOMETHING than nothing).
+    // The delivery layer can annotate differently if it chooses.
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("I tried...");
+    }
+  });
+  it("returns 'running' when stream-json events are present but no result yet", async () => {
+    const path = resolve(TMP_BASE, "stream-running.jsonl");
+    const lines = [
+      { type: "system", subtype: "init", session_id: "s1" },
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "Thinking..." }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "tool_use", name: "Bash", input: {} }],
+          stop_reason: null,
+        },
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("running");
+  });
+  it("aggregates text from ALL assistant messages when result arrives", async () => {
+    const path = resolve(TMP_BASE, "stream-multi-text.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "First thought." }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "user",
+        message: { content: [{ type: "tool_result", content: "ok" }] },
+      },
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "Continuing..." }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "user",
+        message: { content: [{ type: "tool_result", content: "ok" }] },
+      },
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "Final answer." }],
+          stop_reason: null,
+        },
+      },
+      { type: "result", subtype: "success", stop_reason: "end_turn" },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      // All three text blocks must be present
+      expect(status.output).toContain("First thought");
+      expect(status.output).toContain("Continuing");
+      expect(status.output).toContain("Final answer");
+    }
+  });
+  it("prefers result.result field as authoritative output when available", async () => {
+    // The stream-json's result event has a `result` field with the
+    // already-concatenated final answer. Use it directly when present
+    // (more accurate than re-aggregating from streaming chunks).
+    const path = resolve(TMP_BASE, "stream-result-field.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "Intermediate chunk" }],
+          stop_reason: null,
+        },
+      },
+      {
+        type: "result",
+        subtype: "success",
+        stop_reason: "end_turn",
+        result: "FINAL AUTHORITATIVE ANSWER",
+      },
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("FINAL AUTHORITATIVE ANSWER");
+    }
+  });
+  it("handles result event with only partial fields (defensive)", async () => {
+    const path = resolve(TMP_BASE, "stream-result-minimal.jsonl");
+    const lines = [
+      {
+        type: "assistant",
+        message: {
+          content: [{ type: "text", text: "Some output" }],
+          stop_reason: null,
+        },
+      },
+      { type: "result" }, // no subtype, no result field, no usage
+    ];
+    fs.writeFileSync(
+      path,
+      lines.map((l) => JSON.stringify(l)).join("\n") + "\n",
+      "utf-8",
+    );
+    const status = await parseOutputFileStatus(path);
+    expect(status.state).toBe("completed");
+    if (status.state === "completed") {
+      expect(status.output).toContain("Some output");
+    }
+  });
+});