alvin-bot 4.12.3 → 4.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,132 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.13.0] — 2026-04-16
6
+
7
+ ### ✨ Major: truly detached sub-agent dispatch via `alvin_dispatch_agent` MCP tool
8
+
9
+ **Background.** v4.12.1 → v4.12.3 tried three progressively more complex fixes for the "bot freezes while sub-agent runs" problem, all of which depended on Claude Agent SDK's built-in `Task(run_in_background: true)` tool. All three iterations missed the same architectural reality: the SDK's background task stays tied to the parent SDK subprocess lifecycle. When v4.12.3's bypass path aborted the parent to unblock the user, the abort cascaded into killing the in-flight sub-agent mid-work. v4.12.4 worked around this at the delivery layer (recovering partial output after a 5-min staleness window), but the fundamental architecture was still wrong.
10
+
11
+ v4.13 fixes the architecture. Instead of using the SDK's built-in Task tool for background work, we register our own MCP tool — `mcp__alvin__dispatch_agent` — which spawns a **completely independent** `claude -p` subprocess (its own PID, its own process group, unreferenced from the parent's event loop). Aborting the parent has zero effect on the dispatched subprocess. It continues to write its stream-json output to its own file and runs to completion. The async-agent-watcher polls the output file and delivers the result as a separate message when ready.
12
+
13
+ Empirically verified with a standalone survival test (`scripts/smoke-test-abort-survival.mjs`): dispatch an agent that needs 20+ seconds of work, kill the parent Node process 100ms later, watch the subprocess keep writing to its output file and complete cleanly with the expected result.
14
+
15
+ ### What changed for the user
16
+
17
+ - **Before v4.13** (with Task tool): the bot shows "typing…" for the entire duration of the sub-agent's work (5, 20, 60 minutes). New messages sit in a queue and don't get processed. If the user interrupts via v4.12.3's bypass, the sub-agent dies mid-work and hours later the user gets a `720m timeout · (empty output)` message.
18
+ - **After v4.13** (with `alvin_dispatch_agent`): the bot's turn completes within seconds of dispatch. The user sees "🤖 Dispatched 2 background agents — I'll send the results when ready." and can immediately chat about anything else. The background subprocesses finish cleanly and deliver their full results as separate messages.
19
+
20
+ This matches the OpenClaw experience the user was asking about — except it's built natively into Claude Agent SDK's MCP-tool mechanism, not a wholesale replacement.
21
+
22
+ ### Technical details
23
+
24
+ **New module** `src/services/alvin-dispatch.ts`
25
+ - `dispatchDetachedAgent(input)` — spawns `claude -p <prompt> --output-format stream-json` via `child_process.spawn({ detached: true, stdio: ["ignore", outFd, errFd] })` + `.unref()`
26
+ - Synchronous return: `{ agentId, outputFile, spawned: true }`
27
+ - Side effects: registers with `async-agent-watcher`, increments `session.pendingBackgroundCount`
28
+ - Unique agent IDs via `crypto.randomBytes(12).toString("hex")` (collision-safe for parallel dispatch)
29
+ - Cleans `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` from env to prevent nested-session errors
30
+
31
+ **New module** `src/services/alvin-mcp-tools.ts`
32
+ - `buildAlvinMcpServer(ctx)` — creates an SDK MCP server bound to this turn's `{ chatId, userId, sessionKey }` context via closure
33
+ - Exposes `dispatch_agent` tool (zod-validated input: `{ prompt: string, description: string }`)
34
+ - Tool handler calls `dispatchDetachedAgent` and returns `agentId + outputFile` to Claude
35
+ - Uses SDK's `createSdkMcpServer` + `tool` builders (the SDK's native inline-tool API — no separate MCP server process needed)
36
+
37
+ **Provider integration** (`src/providers/claude-sdk-provider.ts`)
38
+ - New `QueryOptions.alvinDispatchContext` field — when set, provider registers `mcpServers: { alvin: buildAlvinMcpServer(ctx) }` + appends `mcp__alvin__dispatch_agent` to the default `allowedTools` list
39
+ - When unset, the MCP server is not registered and Claude falls back to the built-in Task tool only
40
+ - Non-SDK providers ignore the new field entirely
41
+
42
+ **Handler integration** (`src/handlers/message.ts`)
43
+ - Passes `alvinDispatchContext: { chatId, userId, sessionKey }` on every SDK turn
44
+ - No other handler changes — the bypass path, the staleness parser, and the pending-count decrement are all reused from v4.12.3/v4.12.4
45
+
46
+ **Parser extension** (`src/services/async-agent-parser.ts`)
47
+ - New first-pass scan for `{"type":"result"}` events — the completion marker used by `claude -p --output-format stream-json` (different from the SDK-internal sub-agent format that uses `message.stop_reason: "end_turn"`)
48
+ - When found, uses the `result.result` field as authoritative output when present, falls back to aggregating all assistant text blocks
49
+ - Preserves backward compat with the existing `end_turn`-based path (tested by the old test suite)
50
+
51
+ **System prompt update** (`src/services/personality.ts`)
52
+ - `BACKGROUND_SUBAGENT_HINT` rewritten to strongly prefer `mcp__alvin__dispatch_agent` over `Task(run_in_background: true)` on Telegram/WhatsApp/Slack/Discord
53
+ - Explicit decision tree, concrete example prompts, parallel-dispatch guidance
54
+ - Built-in Task tool remains available but deprecated for long-running work; reserved for the rare case where Claude needs a result in the same turn
55
+
56
+ ### Known limitations
57
+
58
+ - **First-turn only for now**: the MCP server is bound to `{ chatId, userId, sessionKey }` at query construction time. If the session's underlying SDK session ID changes mid-conversation (rare), the tool context goes stale. Defensive: a new MCP server is built on each handler invocation, so any next turn picks up the correct context.
59
+ - **Non-Telegram platforms**: `src/handlers/platform-message.ts` (Slack/Discord/WhatsApp) doesn't pass `alvinDispatchContext` yet. Deferred to follow-up — the Telegram path is the primary use case and the one the user explicitly requested.
60
+ - **Parallel dispatch not smoke-tested**: the system prompt guides Claude to call `dispatch_agent` multiple times in one turn for parallel work, but I only end-to-end tested single dispatch. Should work (no shared state in the handler), but YMMV until battle-tested.
61
+
62
+ ### Testing
63
+
64
+ - **Baseline**: 447 tests (v4.12.4)
65
+ - **New**:
66
+ - `test/alvin-dispatch.test.ts` — 6 tests (spawn flags, unique IDs, watcher registration, session counter, stdio redirect, env cleanup)
67
+ - `test/async-agent-parser-streamjson.test.ts` — 7 tests (result-event detection, token extraction, error state, running state, multi-text aggregation, `result.result` precedence, minimal fields)
68
+ - **Total**: 460 tests, all green, TSC clean
69
+ - **Real-world smoke tests** (NOT in CI — run via `node scripts/smoke-test-dispatch.mjs` and `node scripts/smoke-test-abort-survival.mjs`):
70
+ - `smoke-test-dispatch`: dispatches a real `claude -p` subprocess, polls to completion (~10s), verifies exact output `"SMOKE_TEST_OK_v4.13"`. **PASS**.
71
+ - `smoke-test-abort-survival`: dispatches a subprocess that needs ~25s of work, kills the parent Node process ~100ms later, polls the output file. Subprocess survives and completes cleanly. **PASS**.
72
+
73
+ ### Files changed
74
+
75
+ - **NEW**: `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `scripts/smoke-test-dispatch.mjs`, `scripts/smoke-test-abort-survival.mjs`
76
+ - **NEW tests**: `test/alvin-dispatch.test.ts`, `test/async-agent-parser-streamjson.test.ts`
77
+ - **Modified**: `src/paths.ts` (SUBAGENTS_DIR), `src/services/async-agent-parser.ts` (stream-json detection), `src/providers/claude-sdk-provider.ts` (MCP server registration + allowedTools), `src/providers/types.ts` (QueryOptions.alvinDispatchContext), `src/handlers/message.ts` (pass dispatch context), `src/services/personality.ts` (BACKGROUND_SUBAGENT_HINT rewrite)
78
+ - **Version**: `package.json` 4.12.4 → 4.13.0 (minor bump — new public surface: MCP tool)
79
+
80
+ ---
81
+
82
+ ## [4.12.4] — 2026-04-16
83
+
84
+ ### 🐛 Patch: recover partial output from interrupted background sub-agents
85
+
86
+ **The bug Ali saw:** Two Telegram messages appeared hours apart: `⏱️ Background agent a5bf8c74 timeout · 720m 3s · 0 in / 0 out` and `... ab9372d4 timeout · 720m 1s · 0 in / 0 out`, both with `(empty output)`. Three more agents were still pending, all interrupted mid-execution with hundreds of KB of real work sitting on disk.
87
+
88
+ **Root cause:** v4.12.3's bypass-abort calls `session.abortController.abort()`, which propagates through `claude-sdk-provider.ts`'s `internalAbortController` into the SDK's CLI subprocess, which in turn propagates into any in-flight `Agent(run_in_background: true)` tool executions. Evidence from the disk:
89
+
90
+ - `agent-a03ce829...jsonl`: 116 lines, last event = literally `"[Request interrupted by user for tool use]"` mid-Bash-tool-use
91
+ - `agent-af61fa6e...jsonl`: 81 lines, last assistant text = `"Ich habe jetzt genug Daten für den vollständigen Audit. Hier ist der Report:"` — interrupted while streaming the final report
92
+ - `agent-ac47c4a2...jsonl`: 131 lines, last assistant text = `"## Perseus Audit — Ergebnis\n### Kritische Bugs"` — interrupted a few words into the payoff
93
+
94
+ None of them reached `stop_reason: "end_turn"`. The pre-v4.12.4 `parseOutputFileStatus` only recognized `end_turn` as a completion signal, so these agents sat in the pending list for 12h until `giveUpAt` elapsed, then got delivered as `(empty output)` while their real work was still on disk.
95
+
96
+ **The fix:** `parseOutputFileStatus` now has a staleness fallback. When no `end_turn` is present BUT the outputFile hasn't been written to in `stalenessMs` (default 5 min, configurable via `ALVIN_SUBAGENT_STALENESS_MS`) AND there is usable assistant text content in the tail, the parser:
97
+
98
+ 1. Aggregates ALL text blocks across all assistant turns in the tail (not just the last one — bias toward delivering more context)
99
+ 2. Prepends a clear banner: `⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_`
100
+ 3. Returns `state: "completed"` so the watcher delivers it instead of continuing to poll
101
+
102
+ Result: on the next `pollOnce()` after v4.12.4 ships, the three stuck agents get delivered with their real partial output (combined ~1.2MB of text across the three). Future interrupts recover within 5 minutes instead of hanging 12 hours.
103
+
104
+ ### Behavioral notes
105
+
106
+ - **Clean `end_turn` sub-agents are unchanged** — the staleness fallback is a *fallback only*. The existing strict path runs first and takes precedence.
107
+ - **`stalenessMs: 0` disables the fallback entirely** — strict end_turn-only mode for callers that prefer it.
108
+ - **Thinking blocks are still filtered out** of the partial delivery — same as with clean completion.
109
+ - **Files with no assistant text at all** (only tool_use) stay in `running` state — nothing useful to deliver.
110
+ - **Tokens are surfaced when available** — the last assistant event's `usage.input_tokens`/`output_tokens` flow through to the delivery banner.
111
+
112
+ ### Known limitations (carried over from v4.12.3, deferred to v4.13)
113
+
114
+ - The bypass-abort mechanism in `message.ts` still propagates to the SDK subprocess and kills in-flight sub-agents. v4.12.4 works around this at the delivery layer (recovering partial output); a true fix requires either architectural replacement of the SDK's `Task` tool with our own detached-subprocess dispatch, or SDK support for per-task-branch abort signals. Tracked for v4.13.
115
+ - Users may still experience the bot's "typing…" indicator when Claude is thinking in the main turn (before dispatching any background agent). Bypass only fires once `pendingBackgroundCount > 0`. For interrupt before dispatch, use `/cancel`.
116
+
117
+ ### Testing
118
+
119
+ - **Baseline**: 436 tests (v4.12.3)
120
+ - **New**: `test/async-agent-parser-staleness.test.ts` — 11 tests covering: clean `end_turn` still wins over staleness, fresh-interrupted file stays running, stale-interrupted file delivers partial with banner, no-text file stays running, `stalenessMs: 0` disables, aggregation across multiple turns, thinking-block filtering, token extraction, interrupt-only file with no useful content, and ordering preservation.
121
+ - **Total**: 447 tests, all green, TSC clean.
122
+
123
+ ### Files changed
124
+
125
+ - **Modified**: `src/services/async-agent-parser.ts` — staleness fallback in `parseOutputFileStatus`, `DEFAULT_STALENESS_MS` constant, `INTERRUPTED_BANNER` prefix.
126
+ - **NEW tests**: `test/async-agent-parser-staleness.test.ts`.
127
+ - **Version**: `package.json` 4.12.3 → 4.12.4.
128
+
129
+ ---
130
+
5
131
  ## [4.12.3] — 2026-04-15
6
132
 
7
133
  ### 🐛 Patch: Background sub-agent no longer blocks the main Telegram session
@@ -400,6 +400,15 @@ export async function handleMessage(ctx) {
400
400
  messageCount: session.messageCount,
401
401
  toolUseCount: session.toolUseCount,
402
402
  } : undefined,
403
+ // v4.13 — Expose alvin_dispatch_agent MCP tool so Claude can spawn
404
+ // truly detached background sub-agents (independent of this SDK
405
+ // subprocess's lifecycle). Only for SDK provider + Telegram here —
406
+ // non-SDK providers ignore this field.
407
+ alvinDispatchContext: isSDK ? {
408
+ chatId: ctx.chat.id,
409
+ userId,
410
+ sessionKey,
411
+ } : undefined,
403
412
  };
404
413
  // Stream response from provider (with fallback)
405
414
  let lastBroadcastLen = 0;
package/dist/paths.js CHANGED
@@ -118,3 +118,11 @@ export const ASSETS_DIR = resolve(DATA_DIR, "assets");
118
118
  export const ASSETS_INDEX_JSON = resolve(DATA_DIR, "assets", "INDEX.json");
119
119
  /** assets/INDEX.md — Human-readable asset summary (injected into prompts) */
120
120
  export const ASSETS_INDEX_MD = resolve(DATA_DIR, "assets", "INDEX.md");
121
+ /** subagents/ — Detached `claude -p` subprocess output files (v4.13).
122
+ * Each dispatched agent writes its full stream-json output to
123
+ * subagents/<agentId>.jsonl. The async-agent-watcher polls these files
124
+ * and delivers the final result as a separate message when ready.
125
+ * These live outside BOT_ROOT/DATA_DIR's state/ so that the watcher's
126
+ * giveUpAt-survive-restart logic doesn't leak into the subprocess
127
+ * lifecycle. */
128
+ export const SUBAGENTS_DIR = resolve(DATA_DIR, "subagents");
@@ -13,6 +13,7 @@ import { fileURLToPath } from "url";
13
13
  import { execFile } from "child_process";
14
14
  import { promisify } from "util";
15
15
  import { findClaudeBinary } from "../find-claude-binary.js";
16
+ import { buildAlvinMcpServer } from "../services/alvin-mcp-tools.js";
16
17
  const execFileAsync = promisify(execFile);
17
18
  /**
18
19
  * Detects the Claude CLI "Not logged in" error message. The CLI emits this
@@ -103,6 +104,25 @@ export class ClaudeSDKProvider {
103
104
  }
104
105
  try {
105
106
  const claudePath = findClaudeBinary();
107
+ // v4.13 — Register Alvin's custom MCP server if the caller provided
108
+ // dispatch context. The server exposes `alvin_dispatch_agent` which
109
+ // spawns truly detached `claude -p` subprocesses (independent of the
110
+ // main SDK subprocess's lifecycle). When Claude calls it, the bot
111
+ // can abort this query without killing the dispatched sub-agent.
112
+ const mcpServers = {};
113
+ if (options.alvinDispatchContext) {
114
+ mcpServers.alvin = buildAlvinMcpServer(options.alvinDispatchContext);
115
+ }
116
+ // v4.13 — MCP tool names must be explicitly whitelisted via allowedTools
117
+ // in the form `mcp__<server>__<tool>`. Without this, Claude can see the
118
+ // tool in the catalog but cannot actually invoke it.
119
+ const defaultAllowed = [
120
+ "Read", "Write", "Edit", "Bash", "Glob", "Grep",
121
+ "WebSearch", "WebFetch", "Task",
122
+ ];
123
+ if (options.alvinDispatchContext) {
124
+ defaultAllowed.push("mcp__alvin__dispatch_agent");
125
+ }
106
126
  const q = query({
107
127
  prompt,
108
128
  options: {
@@ -116,11 +136,11 @@ export class ClaudeSDKProvider {
116
136
  settingSources: ["user", "project"],
117
137
  // v4.12.2 — options.allowedTools can override the default full set.
118
138
  // Used by sub-agents with toolset="readonly"/"research" to restrict
119
- // what Claude can do. Default = full access.
120
- allowedTools: options.allowedTools ?? [
121
- "Read", "Write", "Edit", "Bash", "Glob", "Grep",
122
- "WebSearch", "WebFetch", "Task",
123
- ],
139
+ // what Claude can do. Default = full access + alvin MCP tools.
140
+ allowedTools: options.allowedTools ?? defaultAllowed,
141
+ // v4.13 Conditionally pass the MCP server config so the inline
142
+ // dispatch tool is visible. Empty object = no custom tools.
143
+ mcpServers: Object.keys(mcpServers).length > 0 ? mcpServers : undefined,
124
144
  systemPrompt,
125
145
  effort: (options.effort || "medium"),
126
146
  maxTurns: 50,
@@ -0,0 +1,125 @@
1
+ /**
2
+ * v4.13 — alvin_dispatch custom-tool service.
3
+ *
4
+ * Architectural replacement for Claude Agent SDK's built-in
5
+ * `Task(run_in_background: true)` tool. The SDK's built-in version
6
+ * ties the background sub-agent's execution to the parent SDK
7
+ * subprocess lifecycle — killing the parent (e.g. via v4.12.3's
8
+ * bypass-abort) cascades into killing any in-flight background tasks.
9
+ *
10
+ * This module instead spawns a truly independent `claude -p` subprocess
11
+ * via Node's `child_process.spawn({ detached: true, stdio: [...] })`.
12
+ * The subprocess:
13
+ * - Has its own PID, own process group (by detached: true)
14
+ * - Is unreffed so the parent Node process doesn't wait for it
15
+ * - Writes its stream-json output to its own file
16
+ * - Survives any abort/crash/restart of the parent Alvin bot
17
+ *
18
+ * The async-agent-watcher polls the output file and delivers the
19
+ * final result via subagent-delivery.ts when the sub-agent completes.
20
+ *
21
+ * See Phase A of docs/superpowers/plans/2026-04-16-v4.13-truly-async-subagents.md
22
+ * for the empirical verification that detached `claude -p` subprocesses
23
+ * behave as expected (they do).
24
+ */
25
+ import { spawn } from "node:child_process";
26
+ import fs from "node:fs";
27
+ import crypto from "node:crypto";
28
+ import { resolve } from "node:path";
29
+ import { findClaudeBinary } from "../find-claude-binary.js";
30
+ import { registerPendingAgent } from "./async-agent-watcher.js";
31
+ import { getAllSessions } from "./session.js";
32
+ import { SUBAGENTS_DIR } from "../paths.js";
33
+ /** Generate a 32-char hex agent id. Avoids collisions across parallel
34
+ * dispatches even at sub-millisecond intervals. */
35
+ function generateAgentId() {
36
+ return "alvin-" + crypto.randomBytes(12).toString("hex");
37
+ }
38
+ /**
39
+ * Dispatch a detached sub-agent. Returns synchronously — the subprocess
40
+ * runs in the background. Throws if spawn fails. On success:
41
+ *
42
+ * 1. Subprocess is running, writing stream-json to outputFile
43
+ * 2. The agent is registered with async-agent-watcher (pending list)
44
+ * 3. session.pendingBackgroundCount is incremented
45
+ * 4. When the subprocess completes, watcher delivers the result
46
+ */
47
+ export function dispatchDetachedAgent(input) {
48
+ // Ensure subagents dir exists. Idempotent.
49
+ try {
50
+ fs.mkdirSync(SUBAGENTS_DIR, { recursive: true });
51
+ }
52
+ catch {
53
+ /* race-safe — next open() will surface the real error */
54
+ }
55
+ const agentId = generateAgentId();
56
+ const outputFile = resolve(SUBAGENTS_DIR, `${agentId}.jsonl`);
57
+ // Open the output file for write. We pass the FD to child's stdout
58
+ // so the subprocess writes directly without going through us.
59
+ // stderr → separate .err file for diagnostics.
60
+ const errFile = resolve(SUBAGENTS_DIR, `${agentId}.err`);
61
+ const outFd = fs.openSync(outputFile, "w");
62
+ const errFd = fs.openSync(errFile, "w");
63
+ const cleanEnv = { ...process.env };
64
+ // v4.13 — Prevent nested-session errors. The SDK refuses to run if
65
+ // these are already set in env (they leak from parent Alvin/SDK).
66
+ delete cleanEnv.CLAUDECODE;
67
+ delete cleanEnv.CLAUDE_CODE_ENTRYPOINT;
68
+ const claudePath = findClaudeBinary();
69
+ if (!claudePath) {
70
+ fs.closeSync(outFd);
71
+ fs.closeSync(errFd);
72
+ throw new Error("alvin_dispatch: claude CLI not found. Install claude-code to enable background dispatch.");
73
+ }
74
+ const child = spawn(claudePath, [
75
+ "-p",
76
+ input.prompt,
77
+ "--output-format",
78
+ "stream-json",
79
+ "--verbose",
80
+ ], {
81
+ cwd: input.cwd,
82
+ detached: true,
83
+ stdio: ["ignore", outFd, errFd],
84
+ env: cleanEnv,
85
+ });
86
+ // Close our copies of the FDs — the child has its own descriptors now.
87
+ try {
88
+ fs.closeSync(outFd);
89
+ }
90
+ catch {
91
+ /* ignore */
92
+ }
93
+ try {
94
+ fs.closeSync(errFd);
95
+ }
96
+ catch {
97
+ /* ignore */
98
+ }
99
+ // Detach from parent Node's event loop so parent exit doesn't wait.
100
+ child.unref();
101
+ // Register with watcher so it polls the output file and delivers.
102
+ registerPendingAgent({
103
+ agentId,
104
+ outputFile,
105
+ description: input.description,
106
+ prompt: input.prompt,
107
+ chatId: input.chatId,
108
+ userId: input.userId,
109
+ toolUseId: null,
110
+ sessionKey: input.sessionKey,
111
+ });
112
+ // Increment the session's pendingBackgroundCount so the main handler
113
+ // knows a background task is in flight (same signal path as SDK's
114
+ // built-in Task tool).
115
+ try {
116
+ const s = getAllSessions().get(input.sessionKey);
117
+ if (s) {
118
+ s.pendingBackgroundCount = (s.pendingBackgroundCount ?? 0) + 1;
119
+ }
120
+ }
121
+ catch {
122
+ /* never let counter updates break dispatch */
123
+ }
124
+ return { agentId, outputFile, spawned: true };
125
+ }
@@ -0,0 +1,103 @@
1
+ /**
2
+ * v4.13 — Alvin's custom MCP tools, registered with the Claude Agent SDK
3
+ * via `createSdkMcpServer()`.
4
+ *
5
+ * Currently exposes a single tool:
6
+ * `alvin_dispatch_agent(prompt, description)` — spawns a truly
7
+ * detached `claude -p` subprocess that's independent of the parent
8
+ * SDK lifecycle. Claude should prefer this over built-in
9
+ * `Task(run_in_background: true)` for any long-running work on
10
+ * Telegram so the main Telegram session isn't blocked by the SDK's
11
+ * task-notification injection mechanism.
12
+ *
13
+ * The MCP server is created lazily per-query so each query gets fresh
14
+ * handler context (chatId/userId/sessionKey) via a closure.
15
+ */
16
+ import { createSdkMcpServer, tool, } from "@anthropic-ai/claude-agent-sdk";
17
+ import { z } from "zod";
18
+ import { dispatchDetachedAgent } from "./alvin-dispatch.js";
19
+ /**
20
+ * Build an MCP server bound to a specific turn's context. Pass the
21
+ * returned instance under `mcpServers: { alvin: <instance> }` in the
22
+ * query options.
23
+ */
24
+ export function buildAlvinMcpServer(ctx) {
25
+ return createSdkMcpServer({
26
+ name: "alvin",
27
+ version: "4.13.0",
28
+ tools: [
29
+ tool("dispatch_agent", [
30
+ "Dispatch a TRULY DETACHED background sub-agent that runs",
31
+ "independently of this session. Use this for ANY long-running",
32
+ "work on Telegram/Slack/Discord/WhatsApp — research tasks,",
33
+ "audits, multi-page scraping, deep analysis — so the main",
34
+ "user session stays responsive and the user can keep chatting",
35
+ "with you while the sub-agent works.",
36
+ "",
37
+ "HOW IT DIFFERS FROM Task(run_in_background: true):",
38
+ "- The built-in Task tool's subprocess is tied to this session,",
39
+ " so aborting the session also kills the sub-agent mid-work.",
40
+ "- `alvin_dispatch.dispatch_agent` spawns a completely",
41
+ " independent `claude -p` subprocess that survives any abort,",
42
+ " crash, or restart of the main bot.",
43
+ "",
44
+ "WHEN TO USE:",
45
+ "- Any audit/research visiting >2 URLs or reading >5 files",
46
+ "- Full-repo scans, code reviews, SEO/security/perf audits",
47
+ "- Anything you'd describe as 'thorough' or 'takes a few min'",
48
+ "",
49
+ "HOW THE RESULT GETS BACK TO THE USER:",
50
+ "- The tool returns { agentId, outputFile } immediately.",
51
+ "- The bot's async-agent watcher polls the outputFile and",
52
+ " delivers the final result as a separate chat message when",
53
+ " the sub-agent completes (success, failure, or 5-min",
54
+ " staleness).",
55
+ "- Your job after calling this tool: tell the user ONE short",
56
+ " sentence about what you dispatched, then END your turn.",
57
+ " Do NOT wait. Do NOT poll the outputFile yourself.",
58
+ ].join("\n"), {
59
+ prompt: z
60
+ .string()
61
+ .describe("The full prompt for the sub-agent. Be specific and self-contained — the sub-agent has no access to this conversation's context and will see only this prompt."),
62
+ description: z
63
+ .string()
64
+ .describe("Short human-readable title (e.g. 'SEO audit alev-b.com', 'Research Higgsfield Seedance 2.0'). Shown to the user when the result arrives."),
65
+ }, async (args) => {
66
+ try {
67
+ const result = dispatchDetachedAgent({
68
+ prompt: args.prompt,
69
+ description: args.description,
70
+ chatId: ctx.chatId,
71
+ userId: ctx.userId,
72
+ sessionKey: ctx.sessionKey,
73
+ cwd: ctx.cwd,
74
+ });
75
+ return {
76
+ content: [
77
+ {
78
+ type: "text",
79
+ text: `✅ Background sub-agent dispatched.\n` +
80
+ `agentId: ${result.agentId}\n` +
81
+ `output_file: ${result.outputFile}\n` +
82
+ `The user will receive the result as a separate message when the sub-agent completes.\n` +
83
+ `End your turn now. Do not wait for the result — it arrives asynchronously.`,
84
+ },
85
+ ],
86
+ };
87
+ }
88
+ catch (err) {
89
+ const msg = err instanceof Error ? err.message : String(err);
90
+ return {
91
+ content: [
92
+ {
93
+ type: "text",
94
+ text: `⚠️ Failed to dispatch background agent: ${msg}`,
95
+ },
96
+ ],
97
+ isError: true,
98
+ };
99
+ }
100
+ }),
101
+ ],
102
+ });
103
+ }
@@ -68,14 +68,45 @@ export function parseAsyncLaunchedToolResult(raw) {
68
68
  return { agentId, outputFile };
69
69
  }
70
70
  const DEFAULT_TAIL_BYTES = 64 * 1024;
71
+ /**
72
+ * v4.12.4 — Default staleness window for partial-output delivery.
73
+ *
74
+ * If an outputFile has not been written to for at least this long AND
75
+ * there is usable assistant text content in it, treat it as "completed
76
+ * with partial output" rather than leaving it to time out at 12h with
77
+ * an empty banner. 5 minutes is a balance between:
78
+ * - Fast enough to unblock interrupted agents (most useful work is
79
+ * done within a few minutes)
80
+ * - Slow enough to avoid false-positives on slow-but-alive agents
81
+ * (typical tool_use gaps are under 30s)
82
+ *
83
+ * Override per call via opts.stalenessMs, or globally via the
84
+ * ALVIN_SUBAGENT_STALENESS_MS env var. `0` disables the fallback
85
+ * entirely (strict end_turn-only completion detection).
86
+ */
87
+ const DEFAULT_STALENESS_MS = Number(process.env.ALVIN_SUBAGENT_STALENESS_MS) || 5 * 60 * 1000;
88
+ /**
89
+ * Banner prepended to partial-output deliveries so the user knows the
90
+ * sub-agent was interrupted and this isn't a clean completion.
91
+ */
92
+ const INTERRUPTED_BANNER = "⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_\n\n";
71
93
  /**
72
94
  * Read the tail of an SDK background-agent outputFile and decide what
73
95
  * state the sub-agent is in. See spec doc for the JSONL format. We only
74
96
  * read the last `maxTailBytes` of the file because long-running agents
75
97
  * (SEO audits etc.) can produce hundreds of KB of intermediate JSONL.
98
+ *
99
+ * v4.12.4 adds staleness-based partial-output delivery. When no
100
+ * `end_turn` marker is present, the parser checks file mtime: if the
101
+ * file hasn't grown in `stalenessMs` AND there is text content in the
102
+ * assistant turns, aggregate the text across all turns (not just the
103
+ * last), prepend an "interrupted" banner, and return "completed". This
104
+ * recovers real work from agents killed mid-execution (e.g. by the
105
+ * v4.12.3 bypass abort propagating through the SDK subprocess).
76
106
  */
77
107
  export async function parseOutputFileStatus(path, opts = {}) {
78
108
  const maxTailBytes = opts.maxTailBytes ?? DEFAULT_TAIL_BYTES;
109
+ const stalenessMs = opts.stalenessMs ?? DEFAULT_STALENESS_MS;
79
110
  let stat;
80
111
  try {
81
112
  stat = await fs.stat(path);
@@ -117,6 +148,56 @@ export async function parseOutputFileStatus(path, opts = {}) {
117
148
  const usable = lines
118
149
  .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
119
150
  .filter((l) => l.length > 0);
151
+ // v4.13 — FIRST PASS: look for a `{"type":"result"}` event anywhere in
152
+ // the tail. This is the completion signal for `claude -p
153
+ // --output-format stream-json` output (used by the v4.13 dispatch
154
+ // mechanism). When present, the `result` field holds the authoritative
155
+ // final text. If `result.result` is missing, aggregate from all
156
+ // assistant text blocks in the tail.
157
+ for (let i = usable.length - 1; i >= 0; i--) {
158
+ let parsed;
159
+ try {
160
+ parsed = JSON.parse(usable[i]);
161
+ }
162
+ catch {
163
+ continue;
164
+ }
165
+ if (parsed.type === "result") {
166
+ // Prefer the authoritative `result` field when present.
167
+ let output = typeof parsed.result === "string" ? parsed.result : "";
168
+ // Fallback: aggregate text from all assistant messages in the tail.
169
+ if (!output) {
170
+ const fragments = [];
171
+ for (const line of usable) {
172
+ let p;
173
+ try {
174
+ p = JSON.parse(line);
175
+ }
176
+ catch {
177
+ continue;
178
+ }
179
+ if (p.type === "assistant" &&
180
+ Array.isArray(p.message?.content)) {
181
+ for (const c of p.message.content) {
182
+ if (c?.type === "text" && typeof c.text === "string") {
183
+ fragments.push(c.text);
184
+ }
185
+ }
186
+ }
187
+ }
188
+ output = fragments.join("\n\n").trim();
189
+ }
190
+ // Token usage from the result event itself.
191
+ const usage = parsed.usage;
192
+ const tokensUsed = usage
193
+ ? {
194
+ input: usage.input_tokens ?? 0,
195
+ output: usage.output_tokens ?? 0,
196
+ }
197
+ : undefined;
198
+ return { state: "completed", output, tokensUsed };
199
+ }
200
+ }
120
201
  // Walk backwards to find the most-recent assistant message with end_turn
121
202
  for (let i = usable.length - 1; i >= 0; i--) {
122
203
  let parsed;
@@ -147,6 +228,50 @@ export async function parseOutputFileStatus(path, opts = {}) {
147
228
  };
148
229
  }
149
230
  }
150
- // No completion marker found still running.
231
+ // v4.12.4 — No clean end_turn. Check for staleness + partial text.
232
+ if (stalenessMs > 0) {
233
+ const ageMs = Date.now() - stat.mtimeMs;
234
+ if (ageMs >= stalenessMs) {
235
+ // Aggregate ALL assistant text blocks across the tail, in order.
236
+ // We parse forward now (not backward like the end_turn scan) so
237
+ // the delivered text preserves the natural reading order.
238
+ const textFragments = [];
239
+ let lastUsage;
240
+ for (const line of usable) {
241
+ let parsed;
242
+ try {
243
+ parsed = JSON.parse(line);
244
+ }
245
+ catch {
246
+ continue;
247
+ }
248
+ if (parsed.type === "assistant" &&
249
+ Array.isArray(parsed.message?.content)) {
250
+ for (const c of parsed.message.content) {
251
+ if (c?.type === "text" && typeof c.text === "string") {
252
+ textFragments.push(c.text);
253
+ }
254
+ }
255
+ if (parsed.message?.usage) {
256
+ lastUsage = {
257
+ input: parsed.message.usage.input_tokens ?? 0,
258
+ output: parsed.message.usage.output_tokens ?? 0,
259
+ };
260
+ }
261
+ }
262
+ }
263
+ if (textFragments.length > 0) {
264
+ const aggregated = textFragments.join("\n\n").trim();
265
+ if (aggregated.length > 0) {
266
+ return {
267
+ state: "completed",
268
+ output: INTERRUPTED_BANNER + aggregated,
269
+ tokensUsed: lastUsage,
270
+ };
271
+ }
272
+ }
273
+ }
274
+ }
275
+ // No completion marker found and not stale (or no text) — still running.
151
276
  return { state: "running", size: stat.size };
152
277
  }