alvin-bot 4.9.3 → 4.10.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,114 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.10.0] - 2026-04-13
6
+
7
+ ### 🚀 Async sub-agents: main session no longer blocks during long tasks
8
+
9
+ The big architecture upgrade: Claude can now delegate long-running work (SEO audits, multi-page research, full-repo analyses) to **background** sub-agents. The main Telegram session ends quickly, the user can keep chatting, and the sub-agent's final report arrives as a separate message when ready.
10
+
11
+ A colleague flagged the underlying problem on 2026-04-13 via WhatsApp voice note: *"It's weird that the main routine crashes when the sub-agents are still running. It should just run in the background, and that should have zero impact on the main routine."* He was right. OpenClaw had this years ago because back then the SDK didn't support async; today's `@anthropic-ai/claude-agent-sdk@0.2.97` already ships `run_in_background: true` on the Agent tool. Alvin just wasn't using it.
12
+
13
+ This release closes that gap in two complementary stages, both bundled into the same v4.10.0:
14
+
15
+ #### Stage 1: System prompt teaches Claude when to use `run_in_background`
16
+
17
+ - New `BACKGROUND_SUBAGENT_HINT` constant in `src/services/personality.ts`, injected only into SDK sessions (non-SDK providers don't have an Agent tool).
18
+ - The hint tells Claude: for audits / multi-page research / >2 min tasks → ALWAYS set `run_in_background: true`. After launching, end the turn promptly. The bot delivers the result automatically when done.
19
+ - Net effect: Claude's main turn ends in ~5 s instead of 10+ minutes. `session.isProcessing` flips to `false` quickly so the user can keep chatting.
20
+
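The gating described in Stage 1 could look roughly like the sketch below. It is only an illustration: `buildSystemPrompt` and the `isSdkSession` flag are hypothetical names, and the hint text is a placeholder, since neither the real wording of `BACKGROUND_SUBAGENT_HINT` nor its injection site in `src/services/personality.ts` appears in this diff.

```javascript
// Placeholder wording: the real BACKGROUND_SUBAGENT_HINT text is not shown in this diff.
const BACKGROUND_SUBAGENT_HINT = [
  "For audits, multi-page research, or tasks likely to exceed 2 minutes,",
  "set run_in_background: true on the Agent tool, then end the turn promptly.",
].join(" ");

// Hypothetical helper: append the hint only for SDK sessions, because
// non-SDK providers have no Agent tool the hint could apply to.
function buildSystemPrompt(basePrompt, { isSdkSession }) {
  return isSdkSession
    ? `${basePrompt}\n\n${BACKGROUND_SUBAGENT_HINT}`
    : basePrompt;
}
```

Keeping the hint out of non-SDK prompts avoids telling a model about a tool parameter it cannot use.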
21
+ #### Stage 2: Async-agent watcher polls and delivers
22
+
23
+ The hard part. Three new modules (a pure parser, the watcher service, and a chunk-handler bridge) plus wiring changes in existing files:
24
+
25
+ - **`src/services/async-agent-parser.ts`** (NEW, pure), with two helpers:
26
+   - `parseAsyncLaunchedToolResult(text)` extracts `agentId` + `output_file` from the SDK's plain-text `Async agent launched successfully…` tool-result. **Important**: the `.d.ts` type in the SDK package claims this is a JSON object with `outputFile: string`. The runtime actually emits plain text with `output_file` (snake_case). Captured live via probe; see the parser test fixtures.
27
+   - `parseOutputFileStatus(path)` tail-reads (64 KB) the JSONL `output_file` and detects completion by finding the most-recent `assistant` message with `stop_reason: "end_turn"`. Concatenates `content[].text` blocks for the final answer. Token usage is extracted from the `usage` field. Survives partial last lines, garbage lines, and tail-cuts on huge files. **19 unit tests** including a 200 KB tail-test.
28
+ - **`src/services/async-agent-watcher.ts`** (NEW): the polling service. `Map<agentId, PendingAsyncAgent>` in memory, persisted to `~/.alvin-bot/state/async-agents.json` for restart catch-up (same pattern as v4.9.0 cron scheduler). Public API: `startWatcher` / `stopWatcher` / `registerPendingAgent` / `pollOnce` / `listPendingAgents`. Polls every 15 s, gives up after 12 h per-agent (timeout banner). On completion → builds a `SubAgentInfo + SubAgentResult` and hands off to the existing `subagent-delivery.ts` from v4.9.x. **7 integration tests** including bot-restart catch-up.
29
+ - **`src/handlers/async-agent-chunk-handler.ts`** (NEW): bridge between provider stream chunks and the watcher. Inspects `tool_result` chunks for the async_launched payload, extracts the `description` from the immediately preceding `tool_use` chunk, registers with the watcher. **4 unit tests**.
30
+ - **`src/providers/claude-sdk-provider.ts`**: extended to surface `tool_result` blocks from SDK `user` messages as a new `tool_result` chunk type. Previously the provider only emitted `text` and `tool_use` chunks.
31
+ - **`src/providers/types.ts`**: `StreamChunk` gets two new optional fields: `toolUseId` and `toolResultContent`.
32
+ - **`src/handlers/message.ts`**: captures `lastAgentToolUseInput` from each Task/Agent `tool_use` chunk and consumes it on the immediately-following `tool_result` chunk. Tool-name match also extended from `"Task"` → `"Task" | "Agent"` (the tool was renamed in Claude Code v2.1.63).
33
+ - **`src/index.ts`**: `startAsyncAgentWatcher()` after the cron scheduler, `stopAsyncAgentWatcher()` in the shutdown handler.
34
+ - **`src/paths.ts`**: new `ASYNC_AGENTS_STATE_FILE` constant under `~/.alvin-bot/state/`.
35
+
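The watcher's per-agent decision (deliver, give up, or keep polling) can be sketched as a pure function. This is an assumption-laden illustration, not the watcher's actual code: `decidePollAction`, the `registeredAt` field, and the action shapes are hypothetical; only the 15 s cadence, the 12 h limit, and the `missing` / `running` / `completed` states come from this release.

```javascript
const POLL_INTERVAL_MS = 15 * 1000;            // poll cadence stated in the changelog
const MAX_AGENT_AGE_MS = 12 * 60 * 60 * 1000;  // per-agent give-up threshold (12 h)

// Hypothetical pure decision step for one pending agent.
// `agent.registeredAt` (epoch ms) is an assumed field name.
// `status` mirrors the shape returned by parseOutputFileStatus.
function decidePollAction(agent, status, now) {
  if (status.state === "completed") {
    // A finished report is delivered even if it arrives past the deadline.
    return { type: "deliver", output: status.output, tokensUsed: status.tokensUsed };
  }
  if (now - agent.registeredAt >= MAX_AGENT_AGE_MS) {
    return { type: "timeout" };
  }
  // "missing" and "running" both mean: nothing final yet, check again later.
  return { type: "keep-polling", nextPollInMs: POLL_INTERVAL_MS };
}
```

Keeping this step free of I/O makes the timeout and delivery branches easy to test without a real output file.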
36
+ #### Investigation artifacts (gitignored, maintainer-local)
37
+
38
+ - `docs/superpowers/plans/2026-04-13-async-subagents.md`: full TDD plan
39
+ - `docs/superpowers/specs/sdk-async-agent-outputfile-format.md`: live-captured SDK format spec; documents the `.d.ts` mismatch that ate ~30 minutes of debugging time
40
+
41
+ #### Testing
42
+
43
+ **237 tests total** (201 baseline + 36 new). All green. TSC clean.
44
+
45
+ - 6 system-prompt-hint tests (Stage 1)
46
+ - 19 parser tests (8 plain-text format + 11 JSONL format including 200 KB tail-test)
47
+ - 7 watcher integration tests (register, deliver, persistence, restart catch-up, timeout, concurrent agents)
48
+ - 4 chunk-handler unit tests
49
+
50
+ Live-verified via isolated SDK probe (`node sdk-probe.mjs` inside the repo) which confirmed the real `output_file` path and JSONL format match the parser's expectations.
51
+
52
+ #### What you'll see as a user
53
+
54
+ Send: *"Make a SEO audit of gethomes.io and alev-b.com in parallel"*
55
+
56
+ - **0 s**: Claude responds: *"Starting both audits in the background - I'll send the reports when done."* Main session **unlocks**.
57
+ - **1–10 min later**: You can chat about anything else. The bot answers immediately.
58
+ - **~13 min** (when each agent finishes): Two separate banner messages arrive: *"✅ SEO audit gethomes.io completed · 13m 17s · 2.6M in / 28k out"* + the full report body, delivered via the v4.9.3 Markdown→plain-text fallback path.
59
+
60
+ #### Non-goals
61
+
62
+ - No session-mutex refactor (Stage 3 from the analysis, out of scope here)
63
+ - No replacement for Alvin's existing cron `spawnSubAgent` system (different use case)
64
+ - No SDK upgrade beyond `0.2.97`
65
+
66
+ #### Compatibility
67
+
68
+ - `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in `.env` disables background mode at the SDK level → Stage 1 hint becomes inert, watcher idles; foreground behavior is restored
69
+
70
+ ## [4.9.4] - 2026-04-13
71
+
72
+ ### 🔌 Web UI fully decoupled from main bot: port conflicts no longer crash anything
73
+
74
+ Colleague feedback (WhatsApp voice note, 2026-04-13):
75
+ > *"The gateway binds to port 3100 like OpenClaw. When the bot restarts,
76
+ > the port is often still held → catastrophic crash. I ended up
77
+ > decoupling the gateway process completely, because the actual bot
78
+ > runs independently of the gateway - it can still answer Telegram
79
+ > even if the web endpoint isn't reachable yet. It's weird that the
80
+ > main routine crashes when the port is busy. It should just run in
81
+ > the background, watch for the port to become free, and connect
82
+ > then. Zero impact on the main routine."*
83
+
84
+ He was right. My v4.9.0 `stopWebServer()` fix was *prevention*: it stopped the bot itself from holding 3100 across restarts. But it didn't cover the *resilience* side: a foreign process holding 3100 (another dev server, an OpenClaw-style orphan, a TIME_WAIT race after SIGKILL) still crashed the boot, because `startWebServer()` was synchronous and the `uncaught exception` from `server.listen()` escaped to the main event loop.
85
+
86
+ **Complete rewrite of the bind loop:**
87
+
88
+ - **`src/web/bind-strategy.ts` (new): pure decision helper.** `decideNextBindAction(err, attempt, opts)` returns either `{type: "retry-port", port, attempt}` (climb the ladder) or `{type: "retry-background", delayMs, port}` (back off, retry the original port in 30 s). EADDRINUSE with attempts remaining → ladder. EADDRINUSE exhausted → background. Any other error → background. 8 unit tests covering every branch + purity.
89
+
90
+ - **`src/web/server.ts` startWebServer: non-blocking, fresh-server-per-attempt.** Returns `void` synchronously, NEVER throws, NEVER blocks on bind. Each attempt creates a new `http.Server` (no state-recycling bugs) and attaches its own error handler. On failure, cleans up and calls `decideNextBindAction` to decide the next move. If the ladder is exhausted, schedules a 30 s background retry at the original port; the Telegram bot keeps running the whole time, the web UI just isn't reachable yet.
91
+
92
+ - **`src/web/server.ts` WebSocketServer attached POST-bind.** The `ws` library's `WebSocketServer` constructor installs its own event plumbing on the underlying `http.Server` and, crucially, causes EADDRINUSE errors to escape as uncaught exceptions when attached pre-listen. Debugging this chewed an hour on 2026-04-13. Fix: only `new WebSocketServer({ server })` AFTER `listen()` has fired its callback. The integration test `test/web-server-integration.test.ts "when the primary port is taken"` pins this behaviour.
93
+
94
+ - **`src/web/server.ts` error handler: `on` not `once`.** The previous version used `.once("error", handler)`; in a Node edge case where a single bind failure emits TWO error events, the second one was left uncaught. The handler is now `on` with a `handled` guard (idempotent), and a post-bind quiet logger replaces it on success.
95
+
96
+ - **`src/web/server.ts` defensive try/catch around `server.listen()`.** In the wild, Node sometimes throws synchronously for edge-case binds (already-listening, invalid backlog, kernel race). The catch funnels sync throws through the same `handleBindFailure` path as async error events.
97
+
98
+ - **`src/web/server.ts` `closeHttpServerGracefully(server)` + `stopWebServer()`.** The old `stopWebServer(server)` took an explicit server arg; it's been split into a low-level helper (`closeHttpServerGracefully(server)`, exported for tests) and a stateful top-level (`stopWebServer()`, no args, cleans up `currentServer` + `wsServerRef` + `bindRetryTimer`). Safe to call before start, safe to call twice, cancels pending background retries.
99
+
100
+ - **`src/index.ts` call sites adjusted.** `const webServer = startWebServer()` → `startWebServer()`. `stopWebServer(webServer)` → `stopWebServer()`. The comment above the call explains the decoupling so nobody accidentally re-couples it in a future "clean up" refactor.
101
+
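The bind-strategy rules listed above can be sketched as a small pure function. The return shapes, the 30 s delay, and the 3100 to 3119 ladder come from this release; the option names (`basePort`, `maxLadderAttempts`, `backgroundDelayMs`) and the 0-based `attempt` convention are assumptions made for this illustration, not the actual signature in `src/web/bind-strategy.ts`.

```javascript
// Assumed option names and defaults; only the return shapes are from the changelog.
const DEFAULTS = { basePort: 3100, maxLadderAttempts: 20, backgroundDelayMs: 30_000 };

// `attempt` is assumed to be 0-based: attempt 0 tried basePort, attempt 1 tried basePort + 1, ...
function decideNextBindAction(err, attempt, opts = {}) {
  const { basePort, maxLadderAttempts, backgroundDelayMs } = { ...DEFAULTS, ...opts };
  const nextAttempt = attempt + 1;
  // Port busy and ladder not exhausted: try the next port up.
  if (err && err.code === "EADDRINUSE" && nextAttempt < maxLadderAttempts) {
    return { type: "retry-port", port: basePort + nextAttempt, attempt: nextAttempt };
  }
  // Ladder exhausted, or any other error: back off and retry the original port later.
  return { type: "retry-background", delayMs: backgroundDelayMs, port: basePort };
}
```

For example, a first failure with `EADDRINUSE` yields `{ type: "retry-port", port: 3101, attempt: 1 }`, while a failure on the last rung (3119) or an unrelated error such as `EACCES` falls through to the background retry at 3100.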
102
+ **Testing: 186 → 201 (+15 new).**
103
+
104
+ - `test/web-server-resilience.test.ts`: 8 unit tests for `decideNextBindAction`
105
+ - `test/web-server-integration.test.ts`: 7 real-server integration tests (startWebServer returns void, binds, stops, is idempotent, survives primary-port conflict by climbing the ladder, closes servers with hanging sockets).
106
+ - **Live-verified on the maintainer's machine**: `launchctl unload` + dual-stack Node hog on port 3100 + `launchctl load` → bot booted cleanly → out.log contained `[web] port 3100 busy (EADDRINUSE) - trying 3101` → `🌐 Web UI: http://localhost:3101 (Port 3100 was busy, using 3101 instead)` → Telegram responsive throughout. Exactly what the colleague described.
107
+
108
+ **Non-goals / intentionally unchanged:**
109
+ - Timeouts stay unlimited (v4.8.8 behaviour preserved).
110
+ - The primary port is still `WEB_PORT || 3100`; no config schema change.
111
+ - When the bot binds on a non-primary port (e.g. 3101), the README permalink still points at 3100. Users hitting a ladder-climbed bot should check the startup log; this is rare and temporary.
112
+
5
113
  ## [4.9.3] - 2026-04-11
6
114
 
7
115
  ### 🛠 Two UX bugs found in production after v4.9.2, now closed
package/README.md CHANGED
@@ -114,7 +114,18 @@ That's it. The setup wizard validates everything:
114
114
 
115
115
  **Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org)) · Telegram bot token ([@BotFather](https://t.me/BotFather)) · Your Telegram user ID ([@userinfobot](https://t.me/userinfobot))
116
116
 
117
- Free AI providers available; no credit card needed.
117
+ Free AI providers available; no credit card needed. **Privacy-first?** Pick the 🔒 **Offline - Gemma 4 E4B** option in setup for a fully local LLM via Ollama (macOS/Linux: automated install; Windows: manual).
118
+
119
+ ### 📘 First-time setup walkthroughs
120
+
121
+ Step-by-step guides with screenshots and screen-for-screen instructions:
122
+
123
+ | Platform | PDF (printable) |
124
+ |---|---|
125
+ | 🍎 **macOS** (with `launchd` background service) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-macOS-Setup-Guide.pdf) |
126
+ | 🪟 **Windows** (with Task Scheduler / Startup folder) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-Windows-Setup-Guide.pdf) |
127
+
128
+ Both guides cover: Node.js install · Telegram bot creation · first-time `setup` · foreground test · background service · offline Gemma 4 mode · troubleshooting. ~15 min end-to-end for a first-time user.
118
129
 
119
130
  ### macOS: use `launchd` instead of pm2 (recommended)
120
131
 
@@ -0,0 +1,33 @@
1
+ import { parseAsyncLaunchedToolResult } from "../services/async-agent-parser.js";
2
+ import { registerPendingAgent } from "../services/async-agent-watcher.js";
3
+ /**
4
+ * Inspect a stream chunk; if it's an Agent async_launched tool_result,
5
+ * register the pending agent with the watcher.
6
+ *
7
+ * Safe to call on any chunk type; non-tool_result chunks are ignored.
8
+ */
9
+ export function handleToolResultChunk(chunk, ctx) {
10
+ if (chunk.type !== "tool_result")
11
+ return;
12
+ if (!chunk.toolResultContent)
13
+ return;
14
+ const info = parseAsyncLaunchedToolResult(chunk.toolResultContent);
15
+ if (!info)
16
+ return;
17
+ // The description and prompt come from the original tool_use input,
18
+ // not the tool_result text. If we don't have them (e.g. test setup
19
+ // forgot to pass lastToolUseInput), fall back to a generic label so
20
+ // the user still sees something meaningful in the delivery banner.
21
+ const description = ctx.lastToolUseInput?.description?.trim() ||
22
+ `Background agent ${info.agentId.slice(0, 8)}`;
23
+ const prompt = ctx.lastToolUseInput?.prompt?.trim() || "";
24
+ registerPendingAgent({
25
+ agentId: info.agentId,
26
+ outputFile: info.outputFile,
27
+ description,
28
+ prompt,
29
+ chatId: ctx.chatId,
30
+ userId: ctx.userId,
31
+ toolUseId: chunk.toolUseId ?? null,
32
+ });
33
+ }
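As a worked example of what the handler above receives from the parser, the sketch below applies the same two regular expressions used by `parseAsyncLaunchedToolResult` to a sample tool-result string. The sample text is hypothetical (only the marker phrase and the `agentId` / `output_file` field names are taken from the parser source), and `extractLaunchInfo` is an illustrative stand-in rather than the exported function.

```javascript
// Hypothetical sample; the real SDK text may contain more lines.
const sample = [
  "Async agent launched successfully.",
  "agentId: a1b2c3d4e5f6 (use this id to check on progress)",
  "output_file: /tmp/example tasks/a1b2c3d4e5f6.output",
].join("\n");

// Illustrative stand-in using the same regexes as the parser.
function extractLaunchInfo(text) {
  if (!text.includes("Async agent launched successfully")) return null;
  const agentMatch = text.match(/agentId:\s*(\S+)/);                 // stops at first whitespace
  const fileMatch = text.match(/output_file:\s*(.+?)\s*(?:\n|$)/);   // rest of the line, spaces allowed
  if (!agentMatch || !fileMatch) return null;
  return { agentId: agentMatch[1], outputFile: fileMatch[1].trim() };
}

const info = extractLaunchInfo(sample);
// Fallback label the handler builds when no description was captured.
const fallbackLabel = `Background agent ${info.agentId.slice(0, 8)}`;
```

The path regex deliberately captures to the end of the line, so an output path containing spaces survives intact.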
@@ -15,6 +15,7 @@ import { trackUsage } from "../services/usage-tracker.js";
15
15
  import { emitUserMessage as broadcastUserMessage, emitResponseStart as broadcastResponseStart, emitResponseDelta as broadcastResponseDelta, emitResponseDone as broadcastResponseDone, } from "../services/broadcast.js";
16
16
  import { t } from "../i18n.js";
17
17
  import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
18
+ import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
18
19
  /**
19
20
  * Stuck-only timeout: NO absolute cap.
20
21
  *
@@ -279,6 +280,11 @@ export async function handleMessage(ctx) {
279
280
  };
280
281
  // Stream response from provider (with fallback)
281
282
  let lastBroadcastLen = 0;
283
+ // Captured during tool_use chunks; consumed by tool_result chunks so
284
+ // the async-agent watcher can label pending agents with their human-
285
+ // readable description (which only appears in the tool_use input,
286
+ // not in the tool_result text). See Fix #17 Stage 2.
287
+ let lastAgentToolUseInput;
282
288
  for await (const chunk of registry.queryWithFallback(queryOpts)) {
283
289
  // Any chunk is progress, so reset the stuck timer.
284
290
  resetStuckTimer();
@@ -309,13 +315,14 @@ export async function handleMessage(ctx) {
309
315
  if (chunk.toolName) {
310
316
  session.toolUseCount++;
311
317
  const icon = TOOL_ICONS[chunk.toolName] || "🔧";
312
- // Special treatment for Claude's SDK-internal Task tool:
318
+ // Special treatment for Claude's SDK-internal Task/Agent tool:
313
319
  // track how many sub-tasks Claude delegated and surface the
314
320
  // task description in the status line so the user sees WHAT
315
- // is being delegated, not just "Task…".
316
- if (chunk.toolName === "Task") {
321
+ // is being delegated, not just "Task…". The tool was renamed
322
+ // from "Task" to "Agent" in Claude Code v2.1.63; match both.
323
+ if (chunk.toolName === "Task" || chunk.toolName === "Agent") {
317
324
  session.sdkSubTaskCount++;
318
- let label = "Task";
325
+ let label = chunk.toolName;
319
326
  if (chunk.toolInput) {
320
327
  try {
321
328
  const parsed = JSON.parse(chunk.toolInput);
@@ -324,11 +331,18 @@ export async function handleMessage(ctx) {
324
331
  const desc = parsed.description.length > 80
325
332
  ? parsed.description.slice(0, 80) + "…"
326
333
  : parsed.description;
327
- label = `Task: ${desc}`;
334
+ label = `${chunk.toolName}: ${desc}`;
328
335
  }
329
336
  else if (parsed.subagent_type) {
330
- label = `Task (${parsed.subagent_type})`;
337
+ label = `${chunk.toolName} (${parsed.subagent_type})`;
331
338
  }
339
+ // Capture the description+prompt for the upcoming
340
+ // tool_result. Used by Fix #17 Stage 2 to label
341
+ // background agents in the watcher's delivery banner.
342
+ lastAgentToolUseInput = {
343
+ description: parsed.description,
344
+ prompt: parsed.prompt,
345
+ };
332
346
  }
333
347
  catch {
334
348
  // not JSON; keep generic label
@@ -341,6 +355,20 @@ export async function handleMessage(ctx) {
341
355
  }
342
356
  }
343
357
  break;
358
+ case "tool_result":
359
+ // Fix #17 Stage 2: detect Agent async_launched payloads and
360
+ // hand them off to the async-agent watcher. The watcher will
361
+ // poll the outputFile and deliver the result as a separate
362
+ // Telegram message when the background agent finishes.
363
+ handleToolResultChunk(chunk, {
364
+ chatId: ctx.chat.id,
365
+ userId,
366
+ lastToolUseInput: lastAgentToolUseInput,
367
+ });
368
+ // Reset the captured input; only the immediately following
369
+ // tool_result should consume it.
370
+ lastAgentToolUseInput = undefined;
371
+ break;
344
372
  case "done":
345
373
  if (chunk.sessionId)
346
374
  session.sessionId = chunk.sessionId;
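The capture-then-consume pattern in the hunk above can be illustrated in isolation. `pairToolResults` is a hypothetical helper written for this sketch (the real handler does this inline inside its streaming loop); it replays a chunk sequence and records which captured input each `tool_result` would see.

```javascript
// Hypothetical replay of the capture/consume pattern over a list of chunks.
function pairToolResults(chunks) {
  const pairs = [];
  let lastAgentToolUseInput;
  for (const chunk of chunks) {
    const isAgentCall =
      chunk.type === "tool_use" &&
      (chunk.toolName === "Task" || chunk.toolName === "Agent");
    if (isAgentCall && chunk.toolInput) {
      try {
        const parsed = JSON.parse(chunk.toolInput);
        lastAgentToolUseInput = { description: parsed.description, prompt: parsed.prompt };
      } catch {
        // not JSON: nothing captured for this call
      }
    } else if (chunk.type === "tool_result") {
      // Consume the captured input, then reset it so that only the
      // immediately following tool_result can see it.
      pairs.push({ toolUseId: chunk.toolUseId, input: lastAgentToolUseInput });
      lastAgentToolUseInput = undefined;
    }
  }
  return pairs;
}
```

One property of this single-slot design, at least in the sketch: if two Agent calls are emitted back-to-back before either result arrives, the second capture overwrites the first and the second result sees no captured input, so it would fall back to the generic `Background agent …` label.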
package/dist/index.js CHANGED
@@ -78,6 +78,7 @@ import { loadPlugins, registerPluginCommands, unloadPlugins } from "./services/p
78
78
  import { initMCP, disconnectMCP, hasMCPConfig } from "./services/mcp.js";
79
79
  import { startWebServer, stopWebServer } from "./web/server.js";
80
80
  import { startScheduler, stopScheduler, setNotifyCallback } from "./services/cron.js";
81
+ import { startWatcher as startAsyncAgentWatcher, stopWatcher as stopAsyncAgentWatcher } from "./services/async-agent-watcher.js";
81
82
  import { startSessionCleanup, stopSessionCleanup } from "./services/session.js";
82
83
  import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
83
84
  import { discoverTools } from "./services/tool-discovery.js";
@@ -254,6 +255,7 @@ const shutdown = async () => {
254
255
  await cancelAllSubAgents(true);
255
256
  stopWatchdog();
256
257
  stopScheduler();
258
+ stopAsyncAgentWatcher();
257
259
  stopSessionCleanup();
258
260
  if (queueInterval)
259
261
  clearInterval(queueInterval);
@@ -267,7 +269,7 @@ const shutdown = async () => {
267
269
  }
268
270
  // Release :3100 so the next launchd boot doesn't hit EADDRINUSE.
269
271
  // Must happen before exit; see src/web/server.ts stopWebServer() comment.
270
- await stopWebServer(webServer).catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
272
+ await stopWebServer().catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
271
273
  await unloadPlugins().catch(() => { });
272
274
  await disconnectMCP().catch(() => { });
273
275
  // Tear down any bot-managed local runners (Ollama, LM Studio, …) so VRAM
@@ -404,8 +406,13 @@ async function startOptionalPlatforms() {
404
406
  }
405
407
  }
406
408
  startOptionalPlatforms().catch(err => console.error("Platform startup error:", err));
407
- // Start Web UI (ALWAYS, regardless of Telegram/AI config)
408
- const webServer = startWebServer();
409
+ // Start Web UI (ALWAYS, regardless of Telegram/AI config).
410
+ // startWebServer is now non-blocking and will never throw: if port 3100
411
+ // is busy (foreign process, TIME_WAIT, another bot instance), it climbs
412
+ // the port ladder up to 3119 and then enters a background retry loop
413
+ // at 3100 every 30s. The Telegram bot runs independently; Web UI is a
414
+ // feature, not core. See src/web/bind-strategy.ts for the retry rules.
415
+ startWebServer();
409
416
  // Start Cron Scheduler: route notifications through delivery queue for reliability
410
417
  setNotifyCallback(async (target, text) => {
411
418
  if (target.platform === "web") {
@@ -415,6 +422,11 @@ setNotifyCallback(async (target, text) => {
415
422
  enqueue(target.platform, String(target.chatId), text);
416
423
  });
417
424
  startScheduler();
425
+ // Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
426
+ // of background sub-agents Claude launched with run_in_background and
427
+ // delivers their completed reports as separate Telegram messages.
428
+ // Loads any persisted pending agents from disk on boot.
429
+ startAsyncAgentWatcher();
418
430
  // Session memory hygiene: purge sessions idle > 7 days (configurable via
419
431
  // ALVIN_SESSION_TTL_DAYS). Never touches active sessions; see session.ts.
420
432
  startSessionCleanup();
package/dist/paths.js CHANGED
@@ -62,6 +62,10 @@ export const SUDO_ENC_FILE = resolve(DATA_DIR, "data", ".sudo-enc");
62
62
  export const SUDO_KEY_FILE = resolve(DATA_DIR, "data", ".sudo-key");
63
63
  /** backups/: Config snapshots */
64
64
  export const BACKUP_DIR = resolve(DATA_DIR, "backups");
65
+ /** state/async-agents.json: Pending background SDK agents (Fix #17 Stage 2).
66
+ * See src/services/async-agent-watcher.ts for the watcher that polls and
67
+ * delivers these. Survives bot restarts. */
68
+ export const ASYNC_AGENTS_STATE_FILE = resolve(DATA_DIR, "state", "async-agents.json");
65
69
  /** soul.md: Bot personality */
66
70
  export const SOUL_FILE = resolve(DATA_DIR, "soul.md");
67
71
  /** tools.md: Custom tool definitions (Markdown) */
@@ -186,6 +186,49 @@ export class ClaudeSDKProvider {
186
186
  }
187
187
  }
188
188
  }
189
+ // User message: tool_results from the Claude API arrive as user
190
+ // messages in the SDK protocol. We surface tool_result blocks as
191
+ // chunks so the message handler can detect Agent async_launched
192
+ // payloads and register them with the watcher (Fix #17 Stage 2).
193
+ if (message.type === "user") {
194
+ // eslint-disable-next-line @typescript-eslint/no-explicit-any
195
+ const userMsg = message;
196
+ const content = userMsg.message?.content;
197
+ if (Array.isArray(content)) {
198
+ for (const block of content) {
199
+ if (block &&
200
+ typeof block === "object" &&
201
+ block.type === "tool_result" &&
202
+ typeof block.tool_use_id === "string") {
203
+ // The `content` field on a tool_result block can be a
204
+ // plain string OR an array of content blocks. Normalize
205
+ // to a single string so the chunk consumer doesn't need
206
+ // to know about the SDK shape.
207
+ let contentText = "";
208
+ if (typeof block.content === "string") {
209
+ contentText = block.content;
210
+ }
211
+ else if (Array.isArray(block.content)) {
212
+ contentText = block.content
213
+ .map((c) => {
214
+ if (c && typeof c === "object" && "text" in c) {
215
+ const t = c.text;
216
+ return typeof t === "string" ? t : "";
217
+ }
218
+ return "";
219
+ })
220
+ .join("");
221
+ }
222
+ yield {
223
+ type: "tool_result",
224
+ toolUseId: block.tool_use_id,
225
+ toolResultContent: contentText,
226
+ sessionId: capturedSessionId,
227
+ };
228
+ }
229
+ }
230
+ }
231
+ }
189
232
  // Result: done (extract full usage including cache tokens)
190
233
  if (message.type === "result") {
191
234
  const resultMsg = message;
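The content normalization added to the provider in the hunk above can be read as a standalone function. `normalizeToolResultContent` is a name chosen for this sketch; the provider performs the same steps inline before yielding the `tool_result` chunk.

```javascript
// Flatten a tool_result block's `content` (string, or array of content
// blocks) into one string. Non-text blocks contribute nothing.
function normalizeToolResultContent(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .map((c) => (c && typeof c === "object" && typeof c.text === "string" ? c.text : ""))
    .join("");
}
```

Normalizing at the provider boundary means downstream consumers such as the chunk handler only ever deal with a plain string, whatever shape the SDK delivered.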
@@ -0,0 +1,152 @@
1
+ /**
2
+ * Pure helpers for the async-agent watcher (Fix #17 Stage 2).
3
+ *
4
+ * Two responsibilities, both pure (the file read in parseOutputFileStatus
5
+ * is pure-by-input: same path returns the same shape at that moment in
6
+ * time, no mutation, no side effects):
7
+ *
8
+ * 1. Parse the SDK's plain-text "Async agent launched successfully" tool
9
+ * result into a structured AsyncLaunchedInfo.
10
+ * 2. Read the tail of an outputFile JSONL stream and decide whether the
11
+ * sub-agent is still running, completed, or failed.
12
+ *
13
+ * Format details captured live from @anthropic-ai/claude-agent-sdk@0.2.97
14
+ * on 2026-04-13. See docs/superpowers/specs/sdk-async-agent-outputfile-format.md
15
+ * for the full investigation notes; the SDK's .d.ts shape DOES NOT match
16
+ * what the runtime actually emits, which is why the contract is pinned by
17
+ * tests against real fixtures.
18
+ */
19
+ import { promises as fs } from "fs";
20
+ // ── Tool-result text parser ──────────────────────────────────────────
21
+ /**
22
+ * Parse the plain-text SDK tool-result content for an `Agent` call with
23
+ * `run_in_background: true`. The format is documented in the spec doc
24
+ * (note: it's NOT JSON, and the field is `output_file`, snake_case).
25
+ *
26
+ * Accepts:
27
+ * - the raw text string
28
+ * - an Anthropic SDK content array `[{type: "text", text: "..."}]`
29
+ * - null/undefined/non-string → returns null
30
+ */
31
+ export function parseAsyncLaunchedToolResult(raw) {
32
+ // Normalize to a string
33
+ let text;
34
+ if (raw == null)
35
+ return null;
36
+ if (typeof raw === "string") {
37
+ text = raw;
38
+ }
39
+ else if (Array.isArray(raw)) {
40
+ // SDK content blocks shape
41
+ text = raw
42
+ .map((b) => (b && typeof b === "object" && "text" in b ? String(b.text) : ""))
43
+ .join("");
44
+ }
45
+ else {
46
+ return null;
47
+ }
48
+ if (!text || text.length === 0)
49
+ return null;
50
+ // Quick gate: avoid expensive matching on non-async tool results
51
+ if (!text.includes("Async agent launched successfully"))
52
+ return null;
53
+ // agentId line: "agentId: <id> (...)"; capture everything up to the first whitespace
54
+ const agentMatch = text.match(/agentId:\s*(\S+)/);
55
+ if (!agentMatch)
56
+ return null;
57
+ const agentId = agentMatch[1].trim();
58
+ if (!agentId)
59
+ return null;
60
+ // output_file line: "output_file: <path>"; path may contain spaces, capture
61
+ // until end of line (the path is always on its own line in real output).
62
+ const outFileMatch = text.match(/output_file:\s*(.+?)\s*(?:\n|$)/);
63
+ if (!outFileMatch)
64
+ return null;
65
+ const outputFile = outFileMatch[1].trim();
66
+ if (!outputFile)
67
+ return null;
68
+ return { agentId, outputFile };
69
+ }
70
+ const DEFAULT_TAIL_BYTES = 64 * 1024;
71
+ /**
72
+ * Read the tail of an SDK background-agent outputFile and decide what
73
+ * state the sub-agent is in. See spec doc for the JSONL format. We only
74
+ * read the last `maxTailBytes` of the file because long-running agents
75
+ * (SEO audits etc.) can produce hundreds of KB of intermediate JSONL.
76
+ */
77
+ export async function parseOutputFileStatus(path, opts = {}) {
78
+ const maxTailBytes = opts.maxTailBytes ?? DEFAULT_TAIL_BYTES;
79
+ let stat;
80
+ try {
81
+ stat = await fs.stat(path);
82
+ }
83
+ catch {
84
+ return { state: "missing" };
85
+ }
86
+ if (stat.size === 0) {
87
+ // Empty file is functionally the same as missing; we keep polling.
88
+ return { state: "missing" };
89
+ }
90
+ // Tail-read the last maxTailBytes
91
+ let buf;
92
+ let fh;
93
+ try {
94
+ fh = await fs.open(path, "r");
95
+ const readSize = Math.min(stat.size, maxTailBytes);
96
+ buf = Buffer.alloc(readSize);
97
+ await fh.read(buf, 0, readSize, stat.size - readSize);
98
+ }
99
+ catch {
100
+ return { state: "missing" };
101
+ }
102
+ finally {
103
+ try {
104
+ await fh?.close();
105
+ }
106
+ catch { /* ignore */ }
107
+ }
108
+ const text = buf.toString("utf-8");
109
+ // Split into lines. If we tail-read into the middle of a line (size >
110
+ // maxTailBytes), drop the first line because it's almost certainly
111
+ // truncated. The trailing line is dropped if there's no newline, since it's
112
+ // the line being written right now.
113
+ const lines = text.split("\n");
114
+ const tailIsMidLine = stat.size > maxTailBytes;
115
+ const headIncomplete = tailIsMidLine ? 1 : 0;
116
+ const trailIncomplete = text.endsWith("\n") ? 0 : 1;
117
+ const usable = lines
118
+ .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
119
+ .filter((l) => l.length > 0);
120
+ // Walk backwards to find the most-recent assistant message with end_turn
121
+ for (let i = usable.length - 1; i >= 0; i--) {
122
+ let parsed;
123
+ try {
124
+ parsed = JSON.parse(usable[i]);
125
+ }
126
+ catch {
127
+ // Garbage line: skip
128
+ continue;
129
+ }
130
+ if (parsed?.type === "assistant" &&
131
+ parsed.message?.stop_reason === "end_turn" &&
132
+ Array.isArray(parsed.message.content)) {
133
+ const finalText = parsed.message.content
134
+ .filter((c) => c?.type === "text" && typeof c.text === "string")
135
+ .map((c) => c.text)
136
+ .join("\n\n");
137
+ const usage = parsed.message.usage;
138
+ return {
139
+ state: "completed",
140
+ output: finalText,
141
+ tokensUsed: usage
142
+ ? {
143
+ input: usage.input_tokens ?? 0,
144
+ output: usage.output_tokens ?? 0,
145
+ }
146
+ : undefined,
147
+ };
148
+ }
149
+ }
150
+ // No completion marker found, so still running.
151
+ return { state: "running", size: stat.size };
152
+ }
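To see how the tail-selection logic above behaves on awkward input, the sketch below runs the same steps on an in-memory string instead of a file. `findFinalAnswer` and the sample JSONL lines are made up for illustration; the selection rules (drop a truncated first line, drop an unterminated last line, skip unparseable lines, walk backwards to the latest `end_turn` assistant message) follow `parseOutputFileStatus`.

```javascript
// In-memory version of the line-selection step; returns the final text or null.
function findFinalAnswer(tailText, headTruncated) {
  const lines = tailText.split("\n");
  const start = headTruncated ? 1 : 0;                                    // first line may be cut mid-way
  const end = tailText.endsWith("\n") ? lines.length : lines.length - 1;  // last line may still be written
  const usable = lines.slice(start, end).filter((l) => l.length > 0);
  for (let i = usable.length - 1; i >= 0; i--) {
    let parsed;
    try {
      parsed = JSON.parse(usable[i]);
    } catch {
      continue; // garbage line
    }
    if (parsed?.type === "assistant" &&
        parsed.message?.stop_reason === "end_turn" &&
        Array.isArray(parsed.message.content)) {
      return parsed.message.content
        .filter((c) => c?.type === "text" && typeof c.text === "string")
        .map((c) => c.text)
        .join("\n\n");
    }
  }
  return null; // no completion marker: still running
}

// Hypothetical tail: truncated head, an intermediate turn, garbage,
// the final answer, and a line that is still being written.
const tail = [
  'on":"x"}',
  JSON.stringify({ type: "assistant", message: { stop_reason: "tool_use", content: [] } }),
  "not json at all",
  JSON.stringify({
    type: "assistant",
    message: {
      stop_reason: "end_turn",
      content: [{ type: "text", text: "Report part 1" }, { type: "text", text: "Report part 2" }],
    },
  }),
  '{"type":"assist',
].join("\n");

const answer = findFinalAnswer(tail, true);
```

The `parsed?.type` guard matters because a line that is valid JSON but not an object (for example a bare `null`) would otherwise throw when its `type` property is read.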