alvin-bot 4.12.4 → 4.13.1

package/CHANGELOG.md CHANGED
@@ -2,6 +2,127 @@
 
  All notable changes to Alvin Bot are documented here.
 
+ ## [4.13.1] - 2026-04-16
+
+ ### 🐛 Patch: Slack Test Connection + PM2 → launchd migration for Maintenance UI
+
+ Two latent UI bugs surfaced during live Slack setup:
+
+ **Bug 1: `/api/platforms/test-connection` returned "Unknown platform" for Slack.** The handler in `setup-api.ts` only knew about telegram/discord/signal/whatsapp. Users who entered a valid Bot Token (`xoxb-…`) + App Token (`xapp-…`) and clicked Test Connection got a confusing "Unknown platform" error, and couldn't tell if their tokens were wrong or the feature was broken.
+
+ **Fix:** New `slack` case in the handler. Validates Bot Token via `https://slack.com/api/auth.test` (cheap, ~100ms). For App Token, checks the `xapp-` prefix as the quickest sanity check (Socket Mode can't actually be "pinged" without opening a persistent WebSocket). Returns the authenticated bot user + team name on success, or Slack's own `auth.test` error (e.g. `invalid_auth`, `token_expired`) on failure. Warns if App Token is missing or has the wrong prefix even when Bot Token is valid, which helps users notice they only configured half the pair.
+
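The Slack check described in the fix can be sketched as a pure function. This is a minimal illustration, not the code in `setup-api.ts`: the name `summarizeSlackCheck`, its arguments, and the returned shape are hypothetical. It assumes the `auth.test` response has already been fetched and parsed (Slack answers with `ok`, `user`, and `team` on success, or `ok: false` plus an `error` code on failure).

```javascript
// Hypothetical helper: turns an already-parsed auth.test response plus the
// App Token string into a result object for the Test Connection button.
function summarizeSlackCheck(authTest, appToken) {
  if (!authTest || authTest.ok !== true) {
    // Pass Slack's own error code through (e.g. "invalid_auth").
    const error = (authTest && authTest.error) || "unknown_error";
    return { ok: false, error, warnings: [] };
  }
  const warnings = [];
  if (!appToken) {
    warnings.push("App Token is missing; Socket Mode needs it.");
  } else if (!appToken.startsWith("xapp-")) {
    warnings.push("App Token does not start with xapp-.");
  }
  // A valid Bot Token still counts as success; warnings flag the half-configured pair.
  return { ok: true, user: authTest.user, team: authTest.team, warnings };
}
```

Keeping the network call outside the function makes the five cases listed under Testing easy to cover with plain objects.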
+ **Bug 2: Maintenance section's buttons were broken on macOS launchd installs.** Since v4.8 the macOS install runs under `launchd` (`com.alvinbot.app.plist`), not PM2. But `doctor-api.ts` kept calling `pm2 jlist`/`pm2 restart`/`pm2 stop`/`pm2 logs`. Results: status endpoint returned stale data from ghost PM2 entries (uptime/memory/cpu/restarts all wrong), Stop/Start buttons silently failed, log viewer was empty. The Restart button accidentally worked because it used `scheduleGracefulRestart` (launchd's `KeepAlive` auto-brings-back on exit).
+
+ **Fix:** New `src/services/process-manager.ts` abstraction that auto-detects the active supervisor per request:
+ - **launchd** (macOS) if `launchctl print gui/$UID/com.alvinbot.app` succeeds
+ - **pm2** (VPS / legacy installs) if `pm2 jlist` lists our process
+ - **standalone** if neither (fallback: only Restart works, since there's no supervisor to bring the process back)
+
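The detection order in the list above could look roughly like this. It is a sketch under assumptions: `detectSupervisor` and the injected `probes` object are hypothetical names, the probes are modelled as synchronous functions for brevity (real ones would shell out to `launchctl` or `pm2` and are probably asynchronous), and the list order is assumed to be the order in which the probes run.

```javascript
// Hypothetical sketch of the detection order. Each probe is a function that
// returns true when its supervisor manages the bot; a throwing probe (for
// example, a missing binary) counts as "not detected".
function detectSupervisor(probes) {
  const succeeds = (probe) => {
    try {
      return probe() === true;
    } catch {
      return false;
    }
  };
  if (succeeds(probes.launchd)) return "launchd"; // launchctl print gui/$UID/com.alvinbot.app
  if (succeeds(probes.pm2)) return "pm2";         // pm2 jlist lists our process
  return "standalone";                            // neither supervisor found
}
```

Injecting the probes keeps the ordering logic testable without a real `launchctl` or `pm2` on the machine.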
+ Each manager implements `getStatus()`, `stop()`, `start()`, `getLogs()` with the right tooling:
+ - launchd: `launchctl print` + `ps -p <pid> -o %cpu=,%mem=,rss=,etime=` for resource stats, `launchctl bootout` / `bootstrap` for stop/start, `tail` on the known log paths for logs
+ - pm2: unchanged (`pm2 jlist` / `pm2 stop` / `pm2 start` / `pm2 logs`)
+ - standalone: `process.uptime()` / `process.memoryUsage()` / manual log tailing
+
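For the launchd case, the `ps` output has to be turned into numbers. The following is one possible parser, not the code in `process-manager.ts`: `parseEtime` and `parsePsLine` are hypothetical names, and the sketch assumes a single whitespace-separated line with a dot as the decimal separator. `ps` prints `etime` as `[[dd-]hh:]mm:ss` and `rss` in 1024-byte units.

```javascript
// Convert ps "etime" ([[dd-]hh:]mm:ss) into seconds.
function parseEtime(etime) {
  // "1-02:03:04" -> rest "02:03:04", days "1"; "10:55" -> rest "10:55", days "0".
  const [rest, days = "0"] = etime.trim().split("-").reverse();
  const parts = rest.split(":").map(Number); // [hh,] mm, ss
  while (parts.length < 3) parts.unshift(0);
  const [hh, mm, ss] = parts;
  return Number(days) * 86400 + hh * 3600 + mm * 60 + ss;
}

// Parse one line of `ps -p <pid> -o %cpu=,%mem=,rss=,etime=` output.
function parsePsLine(line) {
  const [cpu, mem, rssKb, etime] = line.trim().split(/\s+/);
  return {
    cpuPercent: Number(cpu),
    memPercent: Number(mem),
    rssBytes: Number(rssKb) * 1024, // ps reports rss in 1024-byte units
    uptimeSeconds: parseEtime(etime),
  };
}
```

For example, `parsePsLine("  0.3  0.5  77824    10:55")` yields an uptime of 655 seconds and 79691776 bytes of resident memory.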
+ The WebUI routes (`/api/pm2/status`, `/api/pm2/action`, `/api/pm2/logs`) keep their names for compat but now dispatch via `detectProcessManager()`. Real-world verified against the running bot: detection returned `launchd`, PID/uptime/memory all correct from the actual launchd-managed process (not a stale PM2 ghost).
+
+ ### Testing
+
+ - **Baseline**: 460 tests (v4.13.0)
+ - **New**:
+   - `test/slack-test-connection.test.ts`: 5 tests (no tokens set, auth.test accepts, auth.test rejects, App Token format warning, unknown platform regression)
+   - `test/process-manager.test.ts`: 10 tests (detection order, each manager's status parsing, stop/start command dispatch)
+ - **Total**: 475 tests, all green, TSC clean
+ - **Live verification**: ran `detectProcessManager().getStatus()` against the actual running bot → returned `launchd`, PID 4767 (matches `launchctl print pid = 4767`), uptime 655s, memory 76MB. All real data, not stale PM2 cache
+
+ ### Files changed
+
+ - **NEW**: `src/services/process-manager.ts`, `test/slack-test-connection.test.ts`, `test/process-manager.test.ts`
+ - **Modified**: `src/web/setup-api.ts` (+slack case in test-connection), `src/web/doctor-api.ts` (routes use process-manager abstraction), `package.json` (4.13.0 → 4.13.1)
+
+ ### Known limitations (deferred to v4.14)
+
+ - **Slack subagent support**: v4.13.0's `mcp__alvin__dispatch_agent` tool only activates on the Telegram handler (passes `alvinDispatchContext`). Slack users can receive normal replies but can't trigger background sub-agents yet. Requires extending `PendingAsyncAgent.chatId` to `number | string`, adding `platform` to the watcher's pending record, and making `subagent-delivery.ts` platform-aware. Tracked for v4.14.
+
+ ---
+
+ ## [4.13.0] - 2026-04-16
+
+ ### ✨ Major: truly detached sub-agent dispatch via `alvin_dispatch_agent` MCP tool
+
+ **Background.** v4.12.1 → v4.12.3 tried three progressively more complex fixes for the "bot freezes while sub-agent runs" problem, all of which depended on Claude Agent SDK's built-in `Task(run_in_background: true)` tool. All three iterations missed the same architectural reality: the SDK's background task stays tied to the parent SDK subprocess lifecycle. When v4.12.3's bypass path aborted the parent to unblock the user, the abort cascaded into killing the in-flight sub-agent mid-work. v4.12.4 worked around this at the delivery layer (recovering partial output after a 5-min staleness window), but the fundamental architecture was still wrong.
+
+ v4.13 fixes the architecture. Instead of using the SDK's built-in Task tool for background work, we register our own MCP tool, `mcp__alvin__dispatch_agent`, which spawns a **completely independent** `claude -p` subprocess (its own PID, its own process group, unreferenced from the parent's event loop). Aborting the parent has zero effect on the dispatched subprocess. It continues to write its stream-json output to its own file and runs to completion. The async-agent-watcher polls the output file and delivers the result as a separate message when ready.
+
+ Empirically verified with a standalone survival test (`scripts/smoke-test-abort-survival.mjs`): dispatch an agent that needs 20+ seconds of work, kill the parent Node process 100ms later, watch the subprocess keep writing to its output file and complete cleanly with the expected result.
+
+ ### What changed for the user
+
+ - **Before v4.13** (with Task tool): the bot shows "typing…" for the entire duration of the sub-agent's work (5, 20, 60 minutes). New messages sit in a queue and don't get processed. If the user interrupts via v4.12.3's bypass, the sub-agent dies mid-work and hours later the user gets a `720m timeout · (empty output)` message.
+ - **After v4.13** (with `alvin_dispatch_agent`): the bot's turn completes within seconds of dispatch. The user sees "🤖 Dispatched 2 background agents - I'll send the results when ready." and can immediately chat about anything else. The background subprocesses finish cleanly and deliver their full results as separate messages.
+
+ This matches the OpenClaw experience the user was asking about, except it's built natively into Claude Agent SDK's MCP-tool mechanism, not a wholesale replacement.
+
+ ### Technical details
+
+ **New module** `src/services/alvin-dispatch.ts`
+ - `dispatchDetachedAgent(input)`: spawns `claude -p <prompt> --output-format stream-json` via `child_process.spawn({ detached: true, stdio: ["ignore", outFd, errFd] })` + `.unref()`
+ - Synchronous return: `{ agentId, outputFile, spawned: true }`
+ - Side effects: registers with `async-agent-watcher`, increments `session.pendingBackgroundCount`
+ - Unique agent IDs via `crypto.randomBytes(12).toString("hex")` (collision-safe for parallel dispatch)
+ - Cleans `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` from env to prevent nested-session errors
+
+ **New module** `src/services/alvin-mcp-tools.ts`
+ - `buildAlvinMcpServer(ctx)`: creates an SDK MCP server bound to this turn's `{ chatId, userId, sessionKey }` context via closure
+ - Exposes `dispatch_agent` tool (zod-validated input: `{ prompt: string, description: string }`)
+ - Tool handler calls `dispatchDetachedAgent` and returns `agentId + outputFile` to Claude
+ - Uses SDK's `createSdkMcpServer` + `tool` builders (the SDK's native inline-tool API; no separate MCP server process needed)
+
+ **Provider integration** (`src/providers/claude-sdk-provider.ts`)
+ - New `QueryOptions.alvinDispatchContext` field: when set, provider registers `mcpServers: { alvin: buildAlvinMcpServer(ctx) }` + appends `mcp__alvin__dispatch_agent` to the default `allowedTools` list
+ - When unset, the MCP server is not registered and Claude falls back to the built-in Task tool only
+ - Non-SDK providers ignore the new field entirely
+
+ **Handler integration** (`src/handlers/message.ts`)
+ - Passes `alvinDispatchContext: { chatId, userId, sessionKey }` on every SDK turn
+ - No other handler changes: the bypass path, the staleness parser, and the pending-count decrement are all reused from v4.12.3/v4.12.4
+
+ **Parser extension** (`src/services/async-agent-parser.ts`)
+ - New first-pass scan for `{"type":"result"}` events, the completion marker used by `claude -p --output-format stream-json` (different from the SDK-internal sub-agent format that uses `message.stop_reason: "end_turn"`)
+ - When found, uses the `result.result` field as authoritative output when present, falls back to aggregating all assistant text blocks
+ - Preserves backward compat with the existing `end_turn`-based path (tested by the old test suite)
+
+ **System prompt update** (`src/services/personality.ts`)
+ - `BACKGROUND_SUBAGENT_HINT` rewritten to strongly prefer `mcp__alvin__dispatch_agent` over `Task(run_in_background: true)` on Telegram/WhatsApp/Slack/Discord
+ - Explicit decision tree, concrete example prompts, parallel-dispatch guidance
+ - Built-in Task tool remains available but deprecated for long-running work; reserved for the rare case where Claude needs a result in the same turn
+
+ ### Known limitations
+
+ - **First-turn only for now**: the MCP server is bound to `{ chatId, userId, sessionKey }` at query construction time. If the session's underlying SDK session ID changes mid-conversation (rare), the tool context goes stale. Defensive: a new MCP server is built on each handler invocation, so any next turn picks up the correct context.
+ - **Non-Telegram platforms**: `src/handlers/platform-message.ts` (Slack/Discord/WhatsApp) doesn't pass `alvinDispatchContext` yet. Deferred to follow-up; the Telegram path is the primary use case and the one the user explicitly requested.
+ - **Parallel dispatch not smoke-tested**: the system prompt guides Claude to call `dispatch_agent` multiple times in one turn for parallel work, but I only end-to-end tested single dispatch. Should work (no shared state in the handler), but YMMV until battle-tested.
+
+ ### Testing
+
+ - **Baseline**: 447 tests (v4.12.4)
+ - **New**:
+   - `test/alvin-dispatch.test.ts`: 6 tests (spawn flags, unique IDs, watcher registration, session counter, stdio redirect, env cleanup)
+   - `test/async-agent-parser-streamjson.test.ts`: 7 tests (result-event detection, token extraction, error state, running state, multi-text aggregation, `result.result` precedence, minimal fields)
+ - **Total**: 460 tests, all green, TSC clean
+ - **Real-world smoke tests** (NOT in CI; run via `node scripts/smoke-test-dispatch.mjs` and `node scripts/smoke-test-abort-survival.mjs`):
+   - `smoke-test-dispatch`: dispatches a real `claude -p` subprocess, polls to completion (~10s), verifies exact output `"SMOKE_TEST_OK_v4.13"`. **PASS**.
+   - `smoke-test-abort-survival`: dispatches a subprocess that needs ~25s of work, kills the parent Node process ~100ms later, polls the output file. Subprocess survives and completes cleanly. **PASS**.
+
+ ### Files changed
+
+ - **NEW**: `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `scripts/smoke-test-dispatch.mjs`, `scripts/smoke-test-abort-survival.mjs`
+ - **NEW tests**: `test/alvin-dispatch.test.ts`, `test/async-agent-parser-streamjson.test.ts`
+ - **Modified**: `src/paths.ts` (SUBAGENTS_DIR), `src/services/async-agent-parser.ts` (stream-json detection), `src/providers/claude-sdk-provider.ts` (MCP server registration + allowedTools), `src/providers/types.ts` (QueryOptions.alvinDispatchContext), `src/handlers/message.ts` (pass dispatch context), `src/services/personality.ts` (BACKGROUND_SUBAGENT_HINT rewrite)
+ - **Version**: `package.json` 4.12.4 → 4.13.0 (minor bump; new public surface: MCP tool)
+
+ ---
+
  ## [4.12.4] - 2026-04-16
 
  ### 🐛 Patch: recover partial output from interrupted background sub-agents
@@ -400,6 +400,15 @@ export async function handleMessage(ctx) {
              messageCount: session.messageCount,
              toolUseCount: session.toolUseCount,
          } : undefined,
+         // v4.13: Expose alvin_dispatch_agent MCP tool so Claude can spawn
+         // truly detached background sub-agents (independent of this SDK
+         // subprocess's lifecycle). Only for SDK provider + Telegram here;
+         // non-SDK providers ignore this field.
+         alvinDispatchContext: isSDK ? {
+             chatId: ctx.chat.id,
+             userId,
+             sessionKey,
+         } : undefined,
      };
      // Stream response from provider (with fallback)
      let lastBroadcastLen = 0;
package/dist/paths.js CHANGED
@@ -118,3 +118,11 @@ export const ASSETS_DIR = resolve(DATA_DIR, "assets");
  export const ASSETS_INDEX_JSON = resolve(DATA_DIR, "assets", "INDEX.json");
  /** assets/INDEX.md - Human-readable asset summary (injected into prompts) */
  export const ASSETS_INDEX_MD = resolve(DATA_DIR, "assets", "INDEX.md");
+ /** subagents/ - Detached `claude -p` subprocess output files (v4.13).
+  * Each dispatched agent writes its full stream-json output to
+  * subagents/<agentId>.jsonl. The async-agent-watcher polls these files
+  * and delivers the final result as a separate message when ready.
+  * These live outside BOT_ROOT/DATA_DIR's state/ so that the watcher's
+  * giveUpAt-survive-restart logic doesn't leak into the subprocess
+  * lifecycle. */
+ export const SUBAGENTS_DIR = resolve(DATA_DIR, "subagents");
@@ -13,6 +13,7 @@ import { fileURLToPath } from "url";
  import { execFile } from "child_process";
  import { promisify } from "util";
  import { findClaudeBinary } from "../find-claude-binary.js";
+ import { buildAlvinMcpServer } from "../services/alvin-mcp-tools.js";
  const execFileAsync = promisify(execFile);
  /**
   * Detects the Claude CLI "Not logged in" error message. The CLI emits this
@@ -103,6 +104,25 @@ export class ClaudeSDKProvider {
          }
          try {
              const claudePath = findClaudeBinary();
+             // v4.13: Register Alvin's custom MCP server if the caller provided
+             // dispatch context. The server exposes `alvin_dispatch_agent` which
+             // spawns truly detached `claude -p` subprocesses (independent of the
+             // main SDK subprocess's lifecycle). When Claude calls it, the bot
+             // can abort this query without killing the dispatched sub-agent.
+             const mcpServers = {};
+             if (options.alvinDispatchContext) {
+                 mcpServers.alvin = buildAlvinMcpServer(options.alvinDispatchContext);
+             }
+             // v4.13: MCP tool names must be explicitly whitelisted via allowedTools
+             // in the form `mcp__<server>__<tool>`. Without this, Claude can see the
+             // tool in the catalog but cannot actually invoke it.
+             const defaultAllowed = [
+                 "Read", "Write", "Edit", "Bash", "Glob", "Grep",
+                 "WebSearch", "WebFetch", "Task",
+             ];
+             if (options.alvinDispatchContext) {
+                 defaultAllowed.push("mcp__alvin__dispatch_agent");
+             }
              const q = query({
                  prompt,
                  options: {
@@ -116,11 +136,11 @@ export class ClaudeSDKProvider {
                      settingSources: ["user", "project"],
                      // v4.12.2: options.allowedTools can override the default full set.
                      // Used by sub-agents with toolset="readonly"/"research" to restrict
-                     // what Claude can do. Default = full access.
-                     allowedTools: options.allowedTools ?? [
-                         "Read", "Write", "Edit", "Bash", "Glob", "Grep",
-                         "WebSearch", "WebFetch", "Task",
-                     ],
+                     // what Claude can do. Default = full access + alvin MCP tools.
+                     allowedTools: options.allowedTools ?? defaultAllowed,
+                     // v4.13: Conditionally pass the MCP server config so the inline
+                     // dispatch tool is visible. Empty object = no custom tools.
+                     mcpServers: Object.keys(mcpServers).length > 0 ? mcpServers : undefined,
                      systemPrompt,
                      effort: (options.effort || "medium"),
                      maxTurns: 50,
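The allow-list logic in the two hunks above can be restated as a small standalone function. This is only a sketch: `mcpToolName` and `resolveAllowedTools` are hypothetical helpers that do not appear in the package, written to mirror what the provider does inline.

```javascript
// Build the fully qualified name the allow-list expects: mcp__<server>__<tool>.
function mcpToolName(server, tool) {
  return `mcp__${server}__${tool}`;
}

// Hypothetical helper mirroring the provider hunk: an explicit allowedTools
// list from the caller wins; otherwise the defaults apply, plus the dispatch
// tool when a dispatch context is present.
function resolveAllowedTools(options) {
  const defaults = [
    "Read", "Write", "Edit", "Bash", "Glob", "Grep",
    "WebSearch", "WebFetch", "Task",
  ];
  if (options.alvinDispatchContext) {
    defaults.push(mcpToolName("alvin", "dispatch_agent"));
  }
  return options.allowedTools ?? defaults;
}
```

One consequence of the `??` fallback is that a caller passing its own `allowedTools` (for example a read-only sub-agent toolset) does not get `mcp__alvin__dispatch_agent` added, even when a dispatch context is set.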
@@ -0,0 +1,125 @@
+ /**
+  * v4.13: alvin_dispatch custom-tool service.
+  *
+  * Architectural replacement for Claude Agent SDK's built-in
+  * `Task(run_in_background: true)` tool. The SDK's built-in version
+  * ties the background sub-agent's execution to the parent SDK
+  * subprocess lifecycle: killing the parent (e.g. via v4.12.3's
+  * bypass-abort) cascades into killing any in-flight background tasks.
+  *
+  * This module instead spawns a truly independent `claude -p` subprocess
+  * via Node's `child_process.spawn({ detached: true, stdio: [...] })`.
+  * The subprocess:
+  *   - Has its own PID, own process group (by detached: true)
+  *   - Is unreffed so the parent Node process doesn't wait for it
+  *   - Writes its stream-json output to its own file
+  *   - Survives any abort/crash/restart of the parent Alvin bot
+  *
+  * The async-agent-watcher polls the output file and delivers the
+  * final result via subagent-delivery.ts when the sub-agent completes.
+  *
+  * See Phase A of docs/superpowers/plans/2026-04-16-v4.13-truly-async-subagents.md
+  * for the empirical verification that detached `claude -p` subprocesses
+  * behave as expected (they do).
+  */
+ import { spawn } from "node:child_process";
+ import fs from "node:fs";
+ import crypto from "node:crypto";
+ import { resolve } from "node:path";
+ import { findClaudeBinary } from "../find-claude-binary.js";
+ import { registerPendingAgent } from "./async-agent-watcher.js";
+ import { getAllSessions } from "./session.js";
+ import { SUBAGENTS_DIR } from "../paths.js";
+ /** Generate an agent id: "alvin-" plus 24 hex chars (12 random bytes).
+  * Avoids collisions across parallel dispatches even at sub-millisecond
+  * intervals. */
+ function generateAgentId() {
+     return "alvin-" + crypto.randomBytes(12).toString("hex");
+ }
+ /**
+  * Dispatch a detached sub-agent. Returns synchronously; the subprocess
+  * runs in the background. Throws if spawn fails. On success:
+  *
+  *   1. Subprocess is running, writing stream-json to outputFile
+  *   2. The agent is registered with async-agent-watcher (pending list)
+  *   3. session.pendingBackgroundCount is incremented
+  *   4. When the subprocess completes, watcher delivers the result
+  */
+ export function dispatchDetachedAgent(input) {
+     // Ensure subagents dir exists. Idempotent.
+     try {
+         fs.mkdirSync(SUBAGENTS_DIR, { recursive: true });
+     }
+     catch {
+         /* race-safe; next open() will surface the real error */
+     }
+     const agentId = generateAgentId();
+     const outputFile = resolve(SUBAGENTS_DIR, `${agentId}.jsonl`);
+     // Open the output file for write. We pass the FD to child's stdout
+     // so the subprocess writes directly without going through us.
+     // stderr → separate .err file for diagnostics.
+     const errFile = resolve(SUBAGENTS_DIR, `${agentId}.err`);
+     const outFd = fs.openSync(outputFile, "w");
+     const errFd = fs.openSync(errFile, "w");
+     const cleanEnv = { ...process.env };
+     // v4.13: Prevent nested-session errors. The SDK refuses to run if
+     // these are already set in env (they leak from parent Alvin/SDK).
+     delete cleanEnv.CLAUDECODE;
+     delete cleanEnv.CLAUDE_CODE_ENTRYPOINT;
+     const claudePath = findClaudeBinary();
+     if (!claudePath) {
+         fs.closeSync(outFd);
+         fs.closeSync(errFd);
+         throw new Error("alvin_dispatch: claude CLI not found. Install claude-code to enable background dispatch.");
+     }
+     const child = spawn(claudePath, [
+         "-p",
+         input.prompt,
+         "--output-format",
+         "stream-json",
+         "--verbose",
+     ], {
+         cwd: input.cwd,
+         detached: true,
+         stdio: ["ignore", outFd, errFd],
+         env: cleanEnv,
+     });
+     // Close our copies of the FDs; the child has its own descriptors now.
+     try {
+         fs.closeSync(outFd);
+     }
+     catch {
+         /* ignore */
+     }
+     try {
+         fs.closeSync(errFd);
+     }
+     catch {
+         /* ignore */
+     }
+     // Detach from parent Node's event loop so parent exit doesn't wait.
+     child.unref();
+     // Register with watcher so it polls the output file and delivers.
+     registerPendingAgent({
+         agentId,
+         outputFile,
+         description: input.description,
+         prompt: input.prompt,
+         chatId: input.chatId,
+         userId: input.userId,
+         toolUseId: null,
+         sessionKey: input.sessionKey,
+     });
+     // Increment the session's pendingBackgroundCount so the main handler
+     // knows a background task is in flight (same signal path as SDK's
+     // built-in Task tool).
+     try {
+         const s = getAllSessions().get(input.sessionKey);
+         if (s) {
+             s.pendingBackgroundCount = (s.pendingBackgroundCount ?? 0) + 1;
+         }
+     }
+     catch {
+         /* never let counter updates break dispatch */
+     }
+     return { agentId, outputFile, spawned: true };
+ }
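The spawn configuration used by `dispatchDetachedAgent` can be isolated into a pure function, which is convenient for checking the flags and the environment cleanup without starting a process. The sketch below is illustrative only: `buildDetachedSpawnOptions` is a hypothetical helper, not part of the package, and `outFd`/`errFd` stand for file descriptors the caller has already opened.

```javascript
// Hypothetical helper: assemble the options object handed to
// child_process.spawn for a detached sub-agent.
function buildDetachedSpawnOptions(parentEnv, cwd, outFd, errFd) {
  const env = { ...parentEnv }; // copy, so the parent's environment is untouched
  delete env.CLAUDECODE;
  delete env.CLAUDE_CODE_ENTRYPOINT;
  return {
    cwd,
    detached: true,                  // on POSIX the child leads its own process group
    stdio: ["ignore", outFd, errFd], // no stdin; stdout/stderr go straight to files
    env,
  };
}
```

The caller still has to close its own copies of the descriptors and call `child.unref()` after `spawn`, as the module above does; `detached: true` alone does not stop the parent's event loop from waiting on the child.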
@@ -0,0 +1,103 @@
+ /**
+  * v4.13: Alvin's custom MCP tools, registered with the Claude Agent SDK
+  * via `createSdkMcpServer()`.
+  *
+  * Currently exposes a single tool:
+  *   `alvin_dispatch_agent(prompt, description)`: spawns a truly
+  *   detached `claude -p` subprocess that's independent of the parent
+  *   SDK lifecycle. Claude should prefer this over built-in
+  *   `Task(run_in_background: true)` for any long-running work on
+  *   Telegram so the main Telegram session isn't blocked by the SDK's
+  *   task-notification injection mechanism.
+  *
+  * The MCP server is created lazily per-query so each query gets fresh
+  * handler context (chatId/userId/sessionKey) via a closure.
+  */
+ import { createSdkMcpServer, tool, } from "@anthropic-ai/claude-agent-sdk";
+ import { z } from "zod";
+ import { dispatchDetachedAgent } from "./alvin-dispatch.js";
+ /**
+  * Build an MCP server bound to a specific turn's context. Pass the
+  * returned instance under `mcpServers: { alvin: <instance> }` in the
+  * query options.
+  */
+ export function buildAlvinMcpServer(ctx) {
+     return createSdkMcpServer({
+         name: "alvin",
+         version: "4.13.0",
+         tools: [
+             tool("dispatch_agent", [
+                 "Dispatch a TRULY DETACHED background sub-agent that runs",
+                 "independently of this session. Use this for ANY long-running",
+                 "work on Telegram/Slack/Discord/WhatsApp - research tasks,",
+                 "audits, multi-page scraping, deep analysis - so the main",
+                 "user session stays responsive and the user can keep chatting",
+                 "with you while the sub-agent works.",
+                 "",
+                 "HOW IT DIFFERS FROM Task(run_in_background: true):",
+                 "- The built-in Task tool's subprocess is tied to this session,",
+                 " so aborting the session also kills the sub-agent mid-work.",
+                 "- `alvin_dispatch.dispatch_agent` spawns a completely",
+                 " independent `claude -p` subprocess that survives any abort,",
+                 " crash, or restart of the main bot.",
+                 "",
+                 "WHEN TO USE:",
+                 "- Any audit/research visiting >2 URLs or reading >5 files",
+                 "- Full-repo scans, code reviews, SEO/security/perf audits",
+                 "- Anything you'd describe as 'thorough' or 'takes a few min'",
+                 "",
+                 "HOW THE RESULT GETS BACK TO THE USER:",
+                 "- The tool returns { agentId, outputFile } immediately.",
+                 "- The bot's async-agent watcher polls the outputFile and",
+                 " delivers the final result as a separate chat message when",
+                 " the sub-agent completes (success, failure, or 5-min",
+                 " staleness).",
+                 "- Your job after calling this tool: tell the user ONE short",
+                 " sentence about what you dispatched, then END your turn.",
+                 " Do NOT wait. Do NOT poll the outputFile yourself.",
+             ].join("\n"), {
+                 prompt: z
+                     .string()
+                     .describe("The full prompt for the sub-agent. Be specific and self-contained - the sub-agent has no access to this conversation's context and will see only this prompt."),
+                 description: z
+                     .string()
+                     .describe("Short human-readable title (e.g. 'SEO audit alev-b.com', 'Research Higgsfield Seedance 2.0'). Shown to the user when the result arrives."),
+             }, async (args) => {
+                 try {
+                     const result = dispatchDetachedAgent({
+                         prompt: args.prompt,
+                         description: args.description,
+                         chatId: ctx.chatId,
+                         userId: ctx.userId,
+                         sessionKey: ctx.sessionKey,
+                         cwd: ctx.cwd,
+                     });
+                     return {
+                         content: [
+                             {
+                                 type: "text",
+                                 text: `✅ Background sub-agent dispatched.\n` +
+                                     `agentId: ${result.agentId}\n` +
+                                     `output_file: ${result.outputFile}\n` +
+                                     `The user will receive the result as a separate message when the sub-agent completes.\n` +
+                                     `End your turn now. Do not wait for the result - it arrives asynchronously.`,
+                             },
+                         ],
+                     };
+                 }
+                 catch (err) {
+                     const msg = err instanceof Error ? err.message : String(err);
+                     return {
+                         content: [
+                             {
+                                 type: "text",
+                                 text: `⚠️ Failed to dispatch background agent: ${msg}`,
+                             },
+                         ],
+                         isError: true,
+                     };
+                 }
+             }),
+         ],
+     });
+ }
@@ -148,6 +148,56 @@ export async function parseOutputFileStatus(path, opts = {}) {
      const usable = lines
          .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
          .filter((l) => l.length > 0);
+     // v4.13 - FIRST PASS: look for a `{"type":"result"}` event anywhere in
+     // the tail. This is the completion signal for `claude -p
+     // --output-format stream-json` output (used by the v4.13 dispatch
+     // mechanism). When present, the `result` field holds the authoritative
+     // final text. If `result.result` is missing, aggregate from all
+     // assistant text blocks in the tail.
+     for (let i = usable.length - 1; i >= 0; i--) {
+         let parsed;
+         try {
+             parsed = JSON.parse(usable[i]);
+         }
+         catch {
+             continue;
+         }
+         if (parsed.type === "result") {
+             // Prefer the authoritative `result` field when present.
+             let output = typeof parsed.result === "string" ? parsed.result : "";
+             // Fallback: aggregate text from all assistant messages in the tail.
+             if (!output) {
+                 const fragments = [];
+                 for (const line of usable) {
+                     let p;
+                     try {
+                         p = JSON.parse(line);
+                     }
+                     catch {
+                         continue;
+                     }
+                     if (p.type === "assistant" &&
+                         Array.isArray(p.message?.content)) {
+                         for (const c of p.message.content) {
+                             if (c?.type === "text" && typeof c.text === "string") {
+                                 fragments.push(c.text);
+                             }
+                         }
+                     }
+                 }
+                 output = fragments.join("\n\n").trim();
+             }
+             // Token usage from the result event itself.
+             const usage = parsed.usage;
+             const tokensUsed = usage
+                 ? {
+                     input: usage.input_tokens ?? 0,
+                     output: usage.output_tokens ?? 0,
+                 }
+                 : undefined;
+             return { state: "completed", output, tokensUsed };
+         }
+     }
      // Walk backwards to find the most-recent assistant message with end_turn
      for (let i = usable.length - 1; i >= 0; i--) {
          let parsed;
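To see how the first pass behaves on sample input, here is a simplified, standalone version. It is a sketch rather than the shipped parser: `extractFinalOutput` is a hypothetical name, token usage is omitted, and reporting `"running"` when no result event exists is a simplification (the real function falls through to the older `end_turn` path instead).

```javascript
// Hypothetical, simplified first-pass scan. `lines` are the JSONL lines of a
// `claude -p --output-format stream-json` output file.
function extractFinalOutput(lines) {
  const events = [];
  for (const line of lines) {
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip partial or corrupt lines, as the real parser does.
    }
  }
  // The last result event, if any, marks completion.
  const result = events.filter((e) => e && e.type === "result").pop();
  if (!result) return { state: "running" };
  if (typeof result.result === "string" && result.result) {
    return { state: "completed", output: result.result };
  }
  // Fallback: join the text blocks of every assistant message.
  const fragments = [];
  for (const e of events) {
    if (e && e.type === "assistant" && Array.isArray(e.message?.content)) {
      for (const block of e.message.content) {
        if (block?.type === "text" && typeof block.text === "string") {
          fragments.push(block.text);
        }
      }
    }
  }
  return { state: "completed", output: fragments.join("\n\n").trim() };
}
```

Given two assistant lines with the texts `part one` and `part two` followed by a bare `{"type":"result"}` line, the function returns `part one\n\npart two`; if the result line carries a non-empty `result` string, that string is returned instead.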
@@ -61,13 +61,27 @@ const SDK_ADDON = `When you run commands or edit files, briefly explain what you
   * See test/system-prompt-background-hint.test.ts and
   * docs/superpowers/plans/2026-04-13-async-subagents.md
   */
- const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents (Telegram blocking rule)
+ const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents on Telegram/WhatsApp/Slack
 
- **The rule**: For any Agent/Task tool call that will take more than
- ~30 seconds, you MUST set \`run_in_background: true\`. Otherwise the
- Telegram user is COMPLETELY BLOCKED from messaging me until the task
- finishes - terrible UX for anything that ends up taking minutes or
- hours. This is the single most important rule for using the Agent tool.
+ **THE RULE (v4.13 update)**: For ANY long-running work (research,
+ audits, multi-step analysis, >30 seconds), you MUST prefer the
+ \`mcp__alvin__dispatch_agent\` tool over the built-in \`Task\` tool.
+
+ **Why alvin_dispatch over Task(run_in_background: true)?**
+
+ The built-in Task tool (even with \`run_in_background: true\`) keeps
+ the sub-agent tied to THIS session's CLI subprocess. If the user sends
+ me a new message mid-work, aborting this session would kill the
+ sub-agent half-way through and leave garbage partial output. Alvin
+ learned this the hard way (v4.12.1–v4.12.3 all got this wrong).
+
+ \`mcp__alvin__dispatch_agent\` spawns a COMPLETELY INDEPENDENT
+ \`claude -p\` subprocess with its own PID, own process group. It
+ survives ANY abort of this session. The user can interrupt me freely
+ and keep chatting while the sub-agent finishes cleanly. The bot
+ delivers the result as a separate message when ready.
+
+ **THIS IS THE SINGLE MOST IMPORTANT RULE FOR USER RESPONSIVENESS.**
 
  **Why it matters**: During a synchronous Agent tool call the parent
  session has no way to know the sub-agent is still working. It appears
@@ -77,7 +91,7 @@ an \`agentId\` + \`outputFile\` path IMMEDIATELY, your turn ends in
  seconds, the user can keep chatting with me, and the bot automatically
  delivers the sub-agent's final result as a separate message when ready.
 
- **Decision tree** (apply every time you consider the Agent/Task tool):
+ **Decision tree** (apply every time you consider any sub-agent tool):
 
  Does the task involve ANY of the following?
  • Visiting more than 2 URLs
@@ -89,8 +103,14 @@ delivers the sub-agent's final result as a separate message when ready.
  • Crawling, scraping, or fetching multiple resources
  • Research across multiple sources or domains
 
- YES → \`run_in_background: true\` (no exceptions)
- NO → foreground is fine (single quick sub-query under 30s)
+ YES → use \`mcp__alvin__dispatch_agent\` (truly detached, preferred)
+ NO → foreground is fine (single quick sub-query under 30s, answer
+ yourself if possible)
+
+ NOTE: The built-in Task tool with run_in_background: true still works
+ but is now deprecated on Telegram/Slack/Discord/WhatsApp because it
+ ties sub-agent lifetime to this session. Only use Task directly when
+ you explicitly need the sub-agent's result IN THIS SAME TURN (rare).
 
  **Examples where you MUST use \`run_in_background: true\`:**
  - ANY audit (SEO, security, code quality, performance, accessibility, GEO)
@@ -107,7 +127,7 @@ delivers the sub-agent's final result as a separate message when ready.
  - "What's 2+2?" (no sub-agent needed; answer yourself)
  - "Check if package.json has foo" (one quick tool call)
 
- **After launching a background agent, you MUST:**
+ **After launching a background agent (either tool), you MUST:**
  1. Tell the user in ONE short sentence what you kicked off.
  Example: "Starting SEO audit for gethomes.io in the background -
  I'll send the report when it's done."
@@ -115,6 +135,12 @@ delivers the sub-agent's final result as a separate message when ready.
  3. The bot will deliver the result as a separate message when ready.
  You don't need to poll the outputFile proactively.
 
+ **For PARALLEL dispatch** (e.g. user says "research X and Y in parallel"):
+ Call \`mcp__alvin__dispatch_agent\` multiple times in the SAME assistant
+ turn, once per sub-task. Each returns its own agentId immediately. Your
+ turn ends as soon as all dispatches have returned; no sequential
+ waiting. The bot delivers each sub-agent's result separately when ready.
+
  If the user asks "is it done yet?" before the bot delivers the result,
  you MAY read the agent's \`outputFile\` (from the original tool result)
  using the Read tool to peek at progress, but don't block on it.