npm - alvin-bot - Versions diffs - 4.9.3 → 4.10.0 - Mend

alvin-bot 4.9.3 → 4.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

package/CHANGELOG.md +108 -0
package/README.md +12 -1
package/dist/handlers/async-agent-chunk-handler.js +33 -0
package/dist/handlers/message.js +34 -6
package/dist/index.js +15 -3
package/dist/paths.js +4 -0
package/dist/providers/claude-sdk-provider.js +43 -0
package/dist/services/async-agent-parser.js +152 -0
package/dist/services/async-agent-watcher.js +206 -0
package/dist/services/personality.js +55 -0
package/dist/web/bind-strategy.js +42 -0
package/dist/web/server.js +231 -101
package/package.json +1 -1
package/test/async-agent-chunk-flow.test.ts +131 -0
package/test/async-agent-parser.test.ts +322 -0
package/test/async-agent-watcher.test.ts +229 -0
package/test/stress-scenarios.test.ts +1 -1
package/test/system-prompt-background-hint.test.ts +48 -0
package/test/web-server-integration.test.ts +189 -0
package/test/web-server-resilience.test.ts +118 -0
package/test/web-server-shutdown.test.ts +7 -1

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,114 @@
 All notable changes to Alvin Bot are documented here.
+## [4.10.0] — 2026-04-13
+### 🚀 Async sub-agents — main session no longer blocks during long tasks
+The big architecture upgrade: Claude can now delegate long-running work (SEO audits, multi-page research, full-repo analyses) to **background** sub-agents. The main Telegram session ends quickly, the user can keep chatting, and the sub-agent's final report arrives as a separate message when ready.
+A colleague flagged the underlying problem on 2026-04-13 via WhatsApp voice note: *"It's weird that the main routine crashes when the sub-agents are still running. It should just run in the background, and that should have zero impact on the main routine."* He was right. OpenClaw had this years ago because back then the SDK didn't support async; today's `@anthropic-ai/claude-agent-sdk@0.2.97` already ships `run_in_background: true` on the Agent tool — Alvin just wasn't using it.
+This release closes that gap in two complementary stages, both bundled into the same v4.10.0:
+#### Stage 1 — System prompt teaches Claude when to use `run_in_background`
+- New `BACKGROUND_SUBAGENT_HINT` constant in `src/services/personality.ts`, injected only into SDK sessions (non-SDK providers don't have an Agent tool).
+- The hint tells Claude: for audits / multi-page research / >2 min tasks → ALWAYS set `run_in_background: true`. After launching, end the turn promptly. The bot delivers the result automatically when done.
+- Net effect: Claude's main turn ends in ~5 s instead of 10+ minutes. `session.isProcessing` flips to `false` quickly so the user can keep chatting.
+#### Stage 2 — Async-agent watcher polls and delivers
+The hard part. Three new pure modules + one new wired-up service:
+- **`src/services/async-agent-parser.ts`** (NEW, pure) — two helpers:
+  - `parseAsyncLaunchedToolResult(text)` extracts `agentId` + `output_file` from the SDK's plain-text `Async agent launched successfully…` tool-result. **Important**: the `.d.ts` type in the SDK package claims this is a JSON object with `outputFile: string`. The runtime actually emits plain text with `output_file` (snake_case). Captured live via probe — see the parser test fixtures.
+  - `parseOutputFileStatus(path)` tail-reads (64 KB) the JSONL `output_file` and detects completion by finding the most-recent `assistant` message with `stop_reason: "end_turn"`. Concatenates `content[].text` blocks for the final answer. Token usage extracted from the `usage` field. Survives partial last lines, garbage lines, and tail-cuts on huge files. **19 unit tests** including a 200 KB tail-test.
+- **`src/services/async-agent-watcher.ts`** (NEW) — the polling service. `Map<agentId, PendingAsyncAgent>` in memory, persisted to `~/.alvin-bot/state/async-agents.json` for restart catch-up (same pattern as v4.9.0 cron scheduler). Public API: `startWatcher` / `stopWatcher` / `registerPendingAgent` / `pollOnce` / `listPendingAgents`. Polls every 15 s, gives up after 12 h per-agent (timeout banner). On completion → builds a `SubAgentInfo + SubAgentResult` and hands off to the existing `subagent-delivery.ts` from v4.9.x. **7 integration tests** including bot-restart catch-up.
+- **`src/handlers/async-agent-chunk-handler.ts`** (NEW) — bridge between provider stream chunks and the watcher. Inspects `tool_result` chunks for the async_launched payload, extracts the `description` from the immediately preceding `tool_use` chunk, registers with the watcher. **4 unit tests**.
+- **`src/providers/claude-sdk-provider.ts`** — extended to surface `tool_result` blocks from SDK `user` messages as a new `tool_result` chunk type. Previously the provider only emitted `text` and `tool_use` chunks.
+- **`src/providers/types.ts`** — `StreamChunk` gets two new optional fields: `toolUseId` and `toolResultContent`.
+- **`src/handlers/message.ts`** — captures `lastAgentToolUseInput` from each `tool_use` chunk and consumes it on the immediately-following `tool_result` chunk. Tool-name match also extended from `"Task"` → `"Task" | "Agent"` (the SDK renamed it in v2.1.63).
+- **`src/index.ts`** — `startAsyncAgentWatcher()` after the cron scheduler, `stopAsyncAgentWatcher()` in the shutdown handler.
+- **`src/paths.ts`** — new `ASYNC_AGENTS_STATE_FILE` constant under `~/.alvin-bot/state/`.
+#### Investigation artifacts (gitignored, maintainer-local)
+- `docs/superpowers/plans/2026-04-13-async-subagents.md` — full TDD plan
+- `docs/superpowers/specs/sdk-async-agent-outputfile-format.md` — live-captured SDK format spec; documents the `.d.ts` mismatch that ate ~30 minutes of debugging time
+#### Testing
+**237 tests total** (201 baseline + 36 new). All green. TSC clean.
+- 6 system-prompt-hint tests (Stage 1)
+- 19 parser tests (8 plain-text format + 11 JSONL format including 200 KB tail-test)
+- 7 watcher integration tests (register, deliver, persistence, restart catch-up, timeout, concurrent agents)
+- 4 chunk-handler unit tests
+Live-verified via isolated SDK probe (`node sdk-probe.mjs` inside the repo) which confirmed the real `output_file` path and JSONL format match the parser's expectations.
+#### What you'll see as a user
+Send: *"Make a SEO audit of gethomes.io and alev-b.com in parallel"*
+- **0 s** — Claude responds: *"Starting both audits in the background — I'll send the reports when done."* Main session **unlocks**.
+- **1–10 min later** — You can chat about anything else. The bot answers immediately.
+- **~13 min** (when each agent finishes) — Two separate banner messages arrive: *"✅ SEO audit gethomes.io completed · 13m 17s · 2.6M in / 28k out"* + the full report body, delivered via the v4.9.3 Markdown→plain-text fallback path.
+#### Non-goals
+- No session-mutex refactor (Stage 3 from the analysis, out of scope here)
+- No replacement for Alvin's existing cron `spawnSubAgent` system (different use case)
+- No SDK upgrade beyond `0.2.97`
+#### Compatibility
+- `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in `.env` disables background mode at the SDK level → Stage 1 hint becomes inert, watcher idles; foreground behavior is restored
+## [4.9.4] — 2026-04-13
+### 🔌 Web UI fully decoupled from main bot — port conflicts no longer crash anything
+Colleague feedback (WhatsApp voice note, 2026-04-13):
+> *"The gateway binds to port 3100 like OpenClaw. When the bot restarts,
+> the port is often still held → catastrophic crash. I ended up
+> decoupling the gateway process completely, because the actual bot
+> runs independently of the gateway — it can still answer Telegram
+> even if the web endpoint isn't reachable yet. It's weird that the
+> main routine crashes when the port is busy. It should just run in
+> the background, watch for the port to become free, and connect
+> then. Zero impact on the main routine."*
+He was right. My v4.9.0 `stopWebServer()` fix was *prevention* — it stopped the bot itself from holding 3100 across restarts. But it didn't cover the *resilience* side: a foreign process holding 3100 (another dev server, an OpenClaw-style orphan, a TIME_WAIT race after SIGKILL) still crashed the boot, because `startWebServer()` was synchronous and the `uncaught exception` from `server.listen()` escaped to the main event loop.
+**Complete rewrite of the bind loop:**
+- **`src/web/bind-strategy.ts` (new) — pure decision helper.** `decideNextBindAction(err, attempt, opts)` returns either `{type: "retry-port", port, attempt}` (climb the ladder) or `{type: "retry-background", delayMs, port}` (back off, retry the original port in 30 s). EADDRINUSE with attempts remaining → ladder. EADDRINUSE exhausted → background. Any other error → background. 8 unit tests covering every branch + purity.
+- **`src/web/server.ts` startWebServer — non-blocking, fresh-server-per-attempt.** Returns `void` synchronously, NEVER throws, NEVER blocks on bind. Each attempt creates a new `http.Server` (no state-recycling bugs) and attaches its own error handler. On failure, cleans up and calls `decideNextBindAction` to decide the next move. If the ladder is exhausted, schedules a 30 s background retry at the original port — the Telegram bot keeps running the whole time, the web UI just isn't reachable yet.
+- **`src/web/server.ts` WebSocketServer attached POST-bind.** The `ws` library's `WebSocketServer` constructor installs its own event plumbing on the underlying `http.Server` and — crucially — causes EADDRINUSE errors to escape as uncaught exceptions when attached pre-listen. Debugging this chewed an hour on 2026-04-13. Fix: only `new WebSocketServer({ server })` AFTER `listen()` has fired its callback. The unit-test `test/web-server-integration.test.ts "when the primary port is taken"` pins this behaviour.
+- **`src/web/server.ts` error handler: `on` not `once`.** Previous version used `.once("error", handler)` and a node edge case where a single bind failure emits TWO error events left the second one uncaught. Handler is now `on` with a `handled` guard — idempotent, and a post-bind quiet logger replaces it on success.
+- **`src/web/server.ts` defensive try/catch around `server.listen()`.** In the wild Node sometimes throws synchronously for edge-case binds (already-listening, invalid backlog, kernel race). The catch funnels sync throws through the same `handleBindFailure` path as async error events.
+- **`src/web/server.ts` `closeHttpServerGracefully(server)` + `stopWebServer()`.** The old `stopWebServer(server)` took an explicit server arg; it's been split into a low-level helper (`closeHttpServerGracefully(server)`, exported for tests) and a stateful top-level (`stopWebServer()`, no args, cleans up `currentServer` + `wsServerRef` + `bindRetryTimer`). Safe to call before start, safe to call twice, cancels pending background retries.
+- **`src/index.ts` call sites adjusted.** `const webServer = startWebServer()` → `startWebServer()`. `stopWebServer(webServer)` → `stopWebServer()`. The comment above the call explains the decoupling so nobody accidentally re-couples it in a future "clean up" refactor.
+**Testing: 186 → 201 (+15 new).**
+- `test/web-server-resilience.test.ts` — 8 unit tests for `decideNextBindAction`
+- `test/web-server-integration.test.ts` — 7 real-server integration tests: startWebServer returns void, binds, stops, is idempotent, survives primary-port conflict by climbing the ladder, closes servers with hanging sockets.
+- **Live-verified on the maintainer's machine**: `launchctl unload` + dual-stack Node hog on port 3100 + `launchctl load` → bot booted cleanly → out.log contained `[web] port 3100 busy (EADDRINUSE) — trying 3101` → `🌐 Web UI: http://localhost:3101   (Port 3100 was busy, using 3101 instead)` → Telegram responsive throughout. Exactly what the colleague described.
+**Non-goals / intentionally unchanged:**
+- Timeouts stay unlimited (v4.8.8 behaviour preserved).
+- The primary port is still `WEB_PORT || 3100` — no config schema change.
+- When the bot binds on a non-primary port (e.g. 3101), the README permalink still points at 3100. Users hitting a ladder-climbed bot should check the startup log; this is rare and temporary.
 ## [4.9.3] — 2026-04-11
 ### 🛠 Two UX bugs found in production after v4.9.2 — now closed

package/README.md CHANGED Viewed

@@ -114,7 +114,18 @@ That's it. The setup wizard validates everything:
 **Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org)) · Telegram bot token ([@BotFather](https://t.me/BotFather)) · Your Telegram user ID ([@userinfobot](https://t.me/userinfobot))
-Free AI providers available — no credit card needed.
+Free AI providers available — no credit card needed. **Privacy-first?** Pick the 🔒 **Offline — Gemma 4 E4B** option in setup for a fully local LLM via Ollama (macOS/Linux: automated install; Windows: manual).
+### 📘 First-time setup walkthroughs
+Step-by-step guides with screenshots and screen-for-screen instructions:
+| Platform | PDF (printable) |
+|---|---|
+| 🍎 **macOS** (with `launchd` background service) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-macOS-Setup-Guide.pdf) |
+| 🪟 **Windows** (with Task Scheduler / Startup folder) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-Windows-Setup-Guide.pdf) |
+Both guides cover: Node.js install · Telegram bot creation · first-time `setup` · foreground test · background service · offline Gemma 4 mode · troubleshooting. ~15 min end-to-end for a first-time user.
 ### macOS: use `launchd` instead of pm2 (recommended)

package/dist/handlers/async-agent-chunk-handler.js ADDED Viewed

@@ -0,0 +1,33 @@
+import { parseAsyncLaunchedToolResult } from "../services/async-agent-parser.js";
+import { registerPendingAgent } from "../services/async-agent-watcher.js";
+/**
+ * Inspect a stream chunk; if it's an Agent async_launched tool_result,
+ * register the pending agent with the watcher.
+ *
+ * Safe to call on any chunk type — non-tool_result chunks are ignored.
+ */
+export function handleToolResultChunk(chunk, ctx) {
+    if (chunk.type !== "tool_result")
+        return;
+    if (!chunk.toolResultContent)
+        return;
+    const info = parseAsyncLaunchedToolResult(chunk.toolResultContent);
+    if (!info)
+        return;
+    // The description and prompt come from the original tool_use input,
+    // not the tool_result text. If we don't have them (e.g. test setup
+    // forgot to pass lastToolUseInput), fall back to a generic label so
+    // the user still sees something meaningful in the delivery banner.
+    const description = ctx.lastToolUseInput?.description?.trim() ||
+        `Background agent ${info.agentId.slice(0, 8)}`;
+    const prompt = ctx.lastToolUseInput?.prompt?.trim() || "";
+    registerPendingAgent({
+        agentId: info.agentId,
+        outputFile: info.outputFile,
+        description,
+        prompt,
+        chatId: ctx.chatId,
+        userId: ctx.userId,
+        toolUseId: chunk.toolUseId ?? null,
+    });
+}

package/dist/handlers/message.js CHANGED Viewed

@@ -15,6 +15,7 @@ import { trackUsage } from "../services/usage-tracker.js";
 import { emitUserMessage as broadcastUserMessage, emitResponseStart as broadcastResponseStart, emitResponseDelta as broadcastResponseDelta, emitResponseDone as broadcastResponseDone, } from "../services/broadcast.js";
 import { t } from "../i18n.js";
 import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
+import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
 /**
  * Stuck-only timeout — NO absolute cap.
  *
@@ -279,6 +280,11 @@ export async function handleMessage(ctx) {
         };
         // Stream response from provider (with fallback)
         let lastBroadcastLen = 0;
+        // Captured during tool_use chunks; consumed by tool_result chunks so
+        // the async-agent watcher can label pending agents with their human-
+        // readable description (which only appears in the tool_use input,
+        // not in the tool_result text). See Fix #17 Stage 2.
+        let lastAgentToolUseInput;
         for await (const chunk of registry.queryWithFallback(queryOpts)) {
             // Any chunk is progress — reset the stuck timer.
             resetStuckTimer();
@@ -309,13 +315,14 @@ export async function handleMessage(ctx) {
                     if (chunk.toolName) {
                         session.toolUseCount++;
                         const icon = TOOL_ICONS[chunk.toolName] || "🔧";
-                        // Special treatment for Claude's SDK-internal Task tool:
+                        // Special treatment for Claude's SDK-internal Task/Agent tool:
                         // track how many sub-tasks Claude delegated and surface the
                         // task description in the status line so the user sees WHAT
-                        // is being delegated, not just "Task…".
-                        if (chunk.toolName === "Task") {
+                        // is being delegated, not just "Task…". The tool was renamed
+                        // from "Task" to "Agent" in Claude Code v2.1.63 — match both.
+                        if (chunk.toolName === "Task" || chunk.toolName === "Agent") {
                             session.sdkSubTaskCount++;
-                            let label = "Task";
+                            let label = chunk.toolName;
                             if (chunk.toolInput) {
                                 try {
                                     const parsed = JSON.parse(chunk.toolInput);
@@ -324,11 +331,18 @@ export async function handleMessage(ctx) {
                                         const desc = parsed.description.length > 80
                                             ? parsed.description.slice(0, 80) + "…"
                                             : parsed.description;
-                                        label = `Task: ${desc}`;
+                                        label = `${chunk.toolName}: ${desc}`;
                                     }
                                     else if (parsed.subagent_type) {
-                                        label = `Task (${parsed.subagent_type})`;
+                                        label = `${chunk.toolName} (${parsed.subagent_type})`;
                                     }
+                                    // Capture the description+prompt for the upcoming
+                                    // tool_result. Used by Fix #17 Stage 2 to label
+                                    // background agents in the watcher's delivery banner.
+                                    lastAgentToolUseInput = {
+                                        description: parsed.description,
+                                        prompt: parsed.prompt,
+                                    };
                                 }
                                 catch {
                                     // not JSON — keep generic label
@@ -341,6 +355,20 @@ export async function handleMessage(ctx) {
                         }
                     }
                     break;
+                case "tool_result":
+                    // Fix #17 Stage 2: detect Agent async_launched payloads and
+                    // hand them off to the async-agent watcher. The watcher will
+                    // poll the outputFile and deliver the result as a separate
+                    // Telegram message when the background agent finishes.
+                    handleToolResultChunk(chunk, {
+                        chatId: ctx.chat.id,
+                        userId,
+                        lastToolUseInput: lastAgentToolUseInput,
+                    });
+                    // Reset the captured input — only the immediately following
+                    // tool_result should consume it.
+                    lastAgentToolUseInput = undefined;
+                    break;
                 case "done":
                     if (chunk.sessionId)
                         session.sessionId = chunk.sessionId;

package/dist/index.js CHANGED Viewed

@@ -78,6 +78,7 @@ import { loadPlugins, registerPluginCommands, unloadPlugins } from "./services/p
 import { initMCP, disconnectMCP, hasMCPConfig } from "./services/mcp.js";
 import { startWebServer, stopWebServer } from "./web/server.js";
 import { startScheduler, stopScheduler, setNotifyCallback } from "./services/cron.js";
+import { startWatcher as startAsyncAgentWatcher, stopWatcher as stopAsyncAgentWatcher } from "./services/async-agent-watcher.js";
 import { startSessionCleanup, stopSessionCleanup } from "./services/session.js";
 import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
 import { discoverTools } from "./services/tool-discovery.js";
@@ -254,6 +255,7 @@ const shutdown = async () => {
     await cancelAllSubAgents(true);
     stopWatchdog();
     stopScheduler();
+    stopAsyncAgentWatcher();
     stopSessionCleanup();
     if (queueInterval)
         clearInterval(queueInterval);
@@ -267,7 +269,7 @@ const shutdown = async () => {
     }
     // Release :3100 so the next launchd boot doesn't hit EADDRINUSE.
     // Must happen before exit — see src/web/server.ts stopWebServer() comment.
-    await stopWebServer(webServer).catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
+    await stopWebServer().catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
     await unloadPlugins().catch(() => { });
     await disconnectMCP().catch(() => { });
     // Tear down any bot-managed local runners (Ollama, LM Studio, …) so VRAM
@@ -404,8 +406,13 @@ async function startOptionalPlatforms() {
     }
 }
 startOptionalPlatforms().catch(err => console.error("Platform startup error:", err));
-// Start Web UI (ALWAYS — regardless of Telegram/AI config)
-const webServer = startWebServer();
+// Start Web UI (ALWAYS — regardless of Telegram/AI config).
+// startWebServer is now non-blocking and will never throw: if port 3100
+// is busy (foreign process, TIME_WAIT, another bot instance), it climbs
+// the port ladder up to 3119 and then enters a background retry loop
+// at 3100 every 30s. The Telegram bot runs independently — Web UI is a
+// feature, not core. See src/web/bind-strategy.ts for the retry rules.
+startWebServer();
 // Start Cron Scheduler — route notifications through delivery queue for reliability
 setNotifyCallback(async (target, text) => {
     if (target.platform === "web") {
@@ -415,6 +422,11 @@ setNotifyCallback(async (target, text) => {
     enqueue(target.platform, String(target.chatId), text);
 });
 startScheduler();
+// Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
+// of background sub-agents Claude launched with run_in_background and
+// delivers their completed reports as separate Telegram messages.
+// Loads any persisted pending agents from disk on boot.
+startAsyncAgentWatcher();
 // Session memory hygiene: purge sessions idle > 7 days (configurable via
 // ALVIN_SESSION_TTL_DAYS). Never touches active sessions — see session.ts.
 startSessionCleanup();

package/dist/paths.js CHANGED Viewed

@@ -62,6 +62,10 @@ export const SUDO_ENC_FILE = resolve(DATA_DIR, "data", ".sudo-enc");
 export const SUDO_KEY_FILE = resolve(DATA_DIR, "data", ".sudo-key");
 /** backups/ — Config snapshots */
 export const BACKUP_DIR = resolve(DATA_DIR, "backups");
+/** state/async-agents.json — Pending background SDK agents (Fix #17 Stage 2).
+ *  See src/services/async-agent-watcher.ts for the watcher that polls and
+ *  delivers these. Survives bot restarts. */
+export const ASYNC_AGENTS_STATE_FILE = resolve(DATA_DIR, "state", "async-agents.json");
 /** soul.md — Bot personality */
 export const SOUL_FILE = resolve(DATA_DIR, "soul.md");
 /** tools.md — Custom tool definitions (Markdown) */

package/dist/providers/claude-sdk-provider.js CHANGED Viewed

@@ -186,6 +186,49 @@ export class ClaudeSDKProvider {
                         }
                     }
                 }
+                // User message — tool_results from the Claude API arrive as user
+                // messages in the SDK protocol. We surface tool_result blocks as
+                // chunks so the message handler can detect Agent async_launched
+                // payloads and register them with the watcher (Fix #17 Stage 2).
+                if (message.type === "user") {
+                    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+                    const userMsg = message;
+                    const content = userMsg.message?.content;
+                    if (Array.isArray(content)) {
+                        for (const block of content) {
+                            if (block &&
+                                typeof block === "object" &&
+                                block.type === "tool_result" &&
+                                typeof block.tool_use_id === "string") {
+                                // The `content` field on a tool_result block can be a
+                                // plain string OR an array of content blocks. Normalize
+                                // to a single string so the chunk consumer doesn't need
+                                // to know about the SDK shape.
+                                let contentText = "";
+                                if (typeof block.content === "string") {
+                                    contentText = block.content;
+                                }
+                                else if (Array.isArray(block.content)) {
+                                    contentText = block.content
+                                        .map((c) => {
+                                        if (c && typeof c === "object" && "text" in c) {
+                                            const t = c.text;
+                                            return typeof t === "string" ? t : "";
+                                        }
+                                        return "";
+                                    })
+                                        .join("");
+                                }
+                                yield {
+                                    type: "tool_result",
+                                    toolUseId: block.tool_use_id,
+                                    toolResultContent: contentText,
+                                    sessionId: capturedSessionId,
+                                };
+                            }
+                        }
+                    }
+                }
                 // Result — done (extract full usage including cache tokens)
                 if (message.type === "result") {
                     const resultMsg = message;

package/dist/services/async-agent-parser.js ADDED Viewed

@@ -0,0 +1,152 @@
+/**
+ * Pure helpers for the async-agent watcher (Fix #17 Stage 2).
+ *
+ * Two responsibilities, both pure (the file read in parseOutputFileStatus
+ * is pure-by-input — same path returns the same shape at that moment in
+ * time, no mutation, no side effects):
+ *
+ *  1. Parse the SDK's plain-text "Async agent launched successfully" tool
+ *     result into a structured AsyncLaunchedInfo.
+ *  2. Read the tail of an outputFile JSONL stream and decide whether the
+ *     sub-agent is still running, completed, or failed.
+ *
+ * Format details captured live from @anthropic-ai/claude-agent-sdk@0.2.97
+ * on 2026-04-13. See docs/superpowers/specs/sdk-async-agent-outputfile-format.md
+ * for the full investigation notes — the SDK's .d.ts shape DOES NOT match
+ * what the runtime actually emits, which is why the contract is pinned by
+ * tests against real fixtures.
+ */
+import { promises as fs } from "fs";
+// ── Tool-result text parser ──────────────────────────────────────────
+/**
+ * Parse the plain-text SDK tool-result content for an `Agent` call with
+ * `run_in_background: true`. The format is documented in the spec doc
+ * — it's NOT JSON, and the field is `output_file` (snake_case).
+ *
+ * Accepts:
+ *   - the raw text string
+ *   - an Anthropic SDK content array `[{type: "text", text: "..."}]`
+ *   - null/undefined/non-string → returns null
+ */
+export function parseAsyncLaunchedToolResult(raw) {
+    // Normalize to a string
+    let text;
+    if (raw == null)
+        return null;
+    if (typeof raw === "string") {
+        text = raw;
+    }
+    else if (Array.isArray(raw)) {
+        // SDK content blocks shape
+        text = raw
+            .map((b) => (b && typeof b === "object" && "text" in b ? String(b.text) : ""))
+            .join("");
+    }
+    else {
+        return null;
+    }
+    if (!text || text.length === 0)
+        return null;
+    // Quick gate: avoid expensive matching on non-async tool results
+    if (!text.includes("Async agent launched successfully"))
+        return null;
+    // agentId line: "agentId: <id> (...)" — capture everything up to first space/paren
+    const agentMatch = text.match(/agentId:\s*(\S+)/);
+    if (!agentMatch)
+        return null;
+    const agentId = agentMatch[1].trim();
+    if (!agentId)
+        return null;
+    // output_file line: "output_file: <path>" — path may contain spaces, capture
+    // until end of line (the path is always on its own line in real output).
+    const outFileMatch = text.match(/output_file:\s*(.+?)\s*(?:\n|$)/);
+    if (!outFileMatch)
+        return null;
+    const outputFile = outFileMatch[1].trim();
+    if (!outputFile)
+        return null;
+    return { agentId, outputFile };
+}
+const DEFAULT_TAIL_BYTES = 64 * 1024;
+/**
+ * Read the tail of an SDK background-agent outputFile and decide what
+ * state the sub-agent is in. See spec doc for the JSONL format. We only
+ * read the last `maxTailBytes` of the file because long-running agents
+ * (SEO audits etc.) can produce hundreds of KB of intermediate JSONL.
+ */
+export async function parseOutputFileStatus(path, opts = {}) {
+    const maxTailBytes = opts.maxTailBytes ?? DEFAULT_TAIL_BYTES;
+    let stat;
+    try {
+        stat = await fs.stat(path);
+    }
+    catch {
+        return { state: "missing" };
+    }
+    if (stat.size === 0) {
+        // Empty file is functionally the same as missing — we keep polling.
+        return { state: "missing" };
+    }
+    // Tail-read the last maxTailBytes
+    let buf;
+    let fh;
+    try {
+        fh = await fs.open(path, "r");
+        const readSize = Math.min(stat.size, maxTailBytes);
+        buf = Buffer.alloc(readSize);
+        await fh.read(buf, 0, readSize, stat.size - readSize);
+    }
+    catch {
+        return { state: "missing" };
+    }
+    finally {
+        try {
+            await fh?.close();
+        }
+        catch { /* ignore */ }
+    }
+    const text = buf.toString("utf-8");
+    // Split into lines. If we tail-read into the middle of a line (size >
+    // maxTailBytes), drop the first line because it's almost certainly
+    // truncated. The trailing line is dropped if there's no newline — it's
+    // the line being written right now.
+    const lines = text.split("\n");
+    const tailIsMidLine = stat.size > maxTailBytes;
+    const headIncomplete = tailIsMidLine ? 1 : 0;
+    const trailIncomplete = text.endsWith("\n") ? 0 : 1;
+    const usable = lines
+        .slice(headIncomplete, lines.length - (trailIncomplete > 0 ? trailIncomplete : 0))
+        .filter((l) => l.length > 0);
+    // Walk backwards to find the most-recent assistant message with end_turn
+    for (let i = usable.length - 1; i >= 0; i--) {
+        let parsed;
+        try {
+            parsed = JSON.parse(usable[i]);
+        }
+        catch {
+            // Garbage line — skip
+            continue;
+        }
+        if (parsed.type === "assistant" &&
+            parsed.message?.stop_reason === "end_turn" &&
+            Array.isArray(parsed.message.content)) {
+            const finalText = parsed.message.content
+                .filter((c) => c?.type === "text" && typeof c.text === "string")
+                .map((c) => c.text)
+                .join("\n\n");
+            const usage = parsed.message.usage;
+            return {
+                state: "completed",
+                output: finalText,
+                tokensUsed: usage
+                    ? {
+                        input: usage.input_tokens ?? 0,
+                        output: usage.output_tokens ?? 0,
+                    }
+                    : undefined,
+            };
+        }
+    }
+    // No completion marker found — still running.
+    return { state: "running", size: stat.size };
+}