npm - alvin-bot - Versions diffs - 4.12.3 → 4.13.0 - Mend

alvin-bot 4.12.3 → 4.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/CHANGELOG.md +126 -0
package/dist/handlers/message.js +9 -0
package/dist/paths.js +8 -0
package/dist/providers/claude-sdk-provider.js +25 -5
package/dist/services/alvin-dispatch.js +125 -0
package/dist/services/alvin-mcp-tools.js +103 -0
package/dist/services/async-agent-parser.js +126 -1
package/dist/services/personality.js +36 -10
package/package.json +1 -1
package/test/alvin-dispatch.test.ts +220 -0
package/test/async-agent-parser-staleness.test.ts +412 -0
package/test/async-agent-parser-streamjson.test.ts +273 -0

package/dist/services/personality.js CHANGED Viewed

@@ -61,13 +61,27 @@ const SDK_ADDON = `When you run commands or edit files, briefly explain what you
  * See test/system-prompt-background-hint.test.ts and
  * docs/superpowers/plans/2026-04-13-async-subagents.md
  */
-const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents (Telegram blocking rule)
+const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents on Telegram/WhatsApp/Slack
-**The rule**: For any Agent/Task tool call that will take more than
-~30 seconds, you MUST set \`run_in_background: true\`. Otherwise the
-Telegram user is COMPLETELY BLOCKED from messaging me until the task
-finishes — terrible UX for anything that ends up taking minutes or
-hours. This is the single most important rule for using the Agent tool.
+**THE RULE (v4.13 update)**: For ANY long-running work (research,
+audits, multi-step analysis, >30 seconds), you MUST prefer the
+\`mcp__alvin__dispatch_agent\` tool over the built-in \`Task\` tool.
+**Why alvin_dispatch over Task(run_in_background: true)?**
+The built-in Task tool (even with \`run_in_background: true\`) keeps
+the sub-agent tied to THIS session's CLI subprocess. If the user sends
+me a new message mid-work, aborting this session would kill the
+sub-agent half-way through and leave garbage partial output. Alvin
+learned this the hard way (v4.12.1–v4.12.3 all got this wrong).
+\`mcp__alvin__dispatch_agent\` spawns a COMPLETELY INDEPENDENT
+\`claude -p\` subprocess with its own PID, own process group. It
+survives ANY abort of this session. The user can interrupt me freely
+and keep chatting while the sub-agent finishes cleanly. The bot
+delivers the result as a separate message when ready.
+**THIS IS THE SINGLE MOST IMPORTANT RULE FOR USER RESPONSIVENESS.**
 **Why it matters**: During a synchronous Agent tool call the parent
 session has no way to know the sub-agent is still working. It appears
@@ -77,7 +91,7 @@ an \`agentId\` + \`outputFile\` path IMMEDIATELY, your turn ends in
 seconds, the user can keep chatting with me, and the bot automatically
 delivers the sub-agent's final result as a separate message when ready.
-**Decision tree** (apply every time you consider the Agent/Task tool):
+**Decision tree** (apply every time you consider any sub-agent tool):
   Does the task involve ANY of the following?
     • Visiting more than 2 URLs
@@ -89,8 +103,14 @@ delivers the sub-agent's final result as a separate message when ready.
     • Crawling, scraping, or fetching multiple resources
     • Research across multiple sources or domains
-  YES → \`run_in_background: true\` (no exceptions)
-  NO  → foreground is fine (single quick sub-query under 30s)
+  YES → use \`mcp__alvin__dispatch_agent\` (truly detached, preferred)
+  NO  → foreground is fine (single quick sub-query under 30s, answer
+        yourself if possible)
+NOTE: The built-in Task tool with run_in_background: true still works
+but is now deprecated on Telegram/Slack/Discord/WhatsApp because it
+ties sub-agent lifetime to this session. Only use Task directly when
+you explicitly need the sub-agent's result IN THIS SAME TURN (rare).
 **Examples where you MUST use \`run_in_background: true\`:**
 - ANY audit (SEO, security, code quality, performance, accessibility, GEO)
@@ -107,7 +127,7 @@ delivers the sub-agent's final result as a separate message when ready.
 - "What's 2+2?" (no sub-agent needed — answer yourself)
 - "Check if package.json has foo" (one quick tool call)
-**After launching a background agent, you MUST:**
+**After launching a background agent (either tool), you MUST:**
 1. Tell the user in ONE short sentence what you kicked off.
    Example: "Starting SEO audit for gethomes.io in the background —
    I'll send the report when it's done."
@@ -115,6 +135,12 @@ delivers the sub-agent's final result as a separate message when ready.
 3. The bot will deliver the result as a separate message when ready.
    You don't need to poll the outputFile proactively.
+**For PARALLEL dispatch** (e.g. user says "research X and Y in parallel"):
+Call \`mcp__alvin__dispatch_agent\` multiple times in the SAME assistant
+turn, once per sub-task. Each returns its own agentId immediately. Your
+turn ends as soon as all dispatches have returned — no sequential
+waiting. The bot delivers each sub-agent's result separately when ready.
 If the user asks "is it done yet?" before the bot delivers the result,
 you MAY read the agent's \`outputFile\` (from the original tool result)
 using the Read tool to peek at progress — but don't block on it.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "alvin-bot",
-  "version": "4.12.3",
+  "version": "4.13.0",
   "description": "Alvin Bot \u2014 Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
   "type": "module",
   "main": "dist/index.js",

package/test/alvin-dispatch.test.ts ADDED Viewed

@@ -0,0 +1,220 @@
+/**
+ * v4.13 — alvin_dispatch custom-tool service.
+ *
+ * `dispatchDetachedAgent(input)` spawns a truly independent `claude -p`
+ * subprocess that survives the parent handler's abort. This is the
+ * architectural replacement for SDK's built-in Task(run_in_background)
+ * tool, which was tied to the parent SDK subprocess lifecycle.
+ *
+ * Contract:
+ *   - Input: { prompt, description, chatId, userId, sessionKey }
+ *   - Output (synchronous): { agentId, outputFile, spawned: true }
+ *   - Side effect: spawns detached subprocess writing stream-json
+ *     output to outputFile, registers with async-agent-watcher.
+ *
+ * These tests stub child_process.spawn so they run fast and deterministic.
+ * The "real subprocess survives parent" property was verified empirically
+ * in Phase A (see plan doc).
+ */
+import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
+import os from "os";
+import fs from "fs";
+import { resolve } from "path";
+const TEST_DATA_DIR = resolve(
+  os.tmpdir(),
+  `alvin-dispatch-${process.pid}-${Date.now()}`,
+);
+interface SpawnRecord {
+  cmd: string;
+  args: string[];
+  opts: {
+    detached?: boolean;
+    stdio?: unknown;
+    cwd?: string;
+    env?: Record<string, string | undefined>;
+  };
+  unreffed: boolean;
+}
+let spawned: SpawnRecord[] = [];
+beforeEach(async () => {
+  if (fs.existsSync(TEST_DATA_DIR))
+    fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+  fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
+  process.env.ALVIN_DATA_DIR = TEST_DATA_DIR;
+  spawned = [];
+  vi.resetModules();
+  vi.doMock("node:child_process", async () => {
+    const actual = await vi.importActual<typeof import("node:child_process")>(
+      "node:child_process",
+    );
+    return {
+      ...actual,
+      spawn: (cmd: string, args: string[], opts: SpawnRecord["opts"]) => {
+        const record: SpawnRecord = {
+          cmd,
+          args,
+          opts,
+          unreffed: false,
+        };
+        spawned.push(record);
+        return {
+          pid: 12345,
+          unref() {
+            record.unreffed = true;
+          },
+          on() {},
+          kill() {},
+        };
+      },
+    };
+  });
+  vi.doMock("../src/services/subagent-delivery.js", () => ({
+    deliverSubAgentResult: async () => {},
+    attachBotApi: () => {},
+    __setBotApiForTest: () => {},
+  }));
+});
+afterEach(async () => {
+  try {
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.stopWatcher();
+    mod.__resetForTest();
+  } catch {
+    /* ignore */
+  }
+});
+describe("dispatchDetachedAgent (v4.13)", () => {
+  it("spawns claude -p with detached: true and unrefs", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const result = mod.dispatchDetachedAgent({
+      prompt: "research X",
+      description: "X research",
+      chatId: 42,
+      userId: 42,
+      sessionKey: "s1",
+    });
+    expect(result.agentId).toMatch(/^alvin-[a-f0-9]{16,}$/);
+    expect(result.outputFile).toContain(TEST_DATA_DIR);
+    expect(result.spawned).toBe(true);
+    expect(spawned).toHaveLength(1);
+    const [s] = spawned;
+    expect(s.cmd).toMatch(/claude/);
+    expect(s.args).toContain("-p");
+    expect(s.args).toContain("research X");
+    expect(s.args).toContain("--output-format");
+    expect(s.args).toContain("stream-json");
+    expect(s.opts.detached).toBe(true);
+    expect(s.unreffed).toBe(true);
+  });
+  it("returns unique agentIds for concurrent dispatches", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const r1 = mod.dispatchDetachedAgent({
+      prompt: "a",
+      description: "a",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    const r2 = mod.dispatchDetachedAgent({
+      prompt: "b",
+      description: "b",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    expect(r1.agentId).not.toBe(r2.agentId);
+    expect(r1.outputFile).not.toBe(r2.outputFile);
+  });
+  it("registers the pending agent with the watcher", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const watcher = await import("../src/services/async-agent-watcher.js");
+    mod.dispatchDetachedAgent({
+      prompt: "x",
+      description: "X audit",
+      chatId: 42,
+      userId: 42,
+      sessionKey: "s1",
+    });
+    const pending = watcher.listPendingAgents();
+    expect(pending).toHaveLength(1);
+    expect(pending[0].description).toBe("X audit");
+    expect(pending[0].sessionKey).toBe("s1");
+  });
+  it("increments session.pendingBackgroundCount on dispatch", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    const { getSession } = await import("../src/services/session.js");
+    const session = getSession("s-count");
+    session.pendingBackgroundCount = 0;
+    mod.dispatchDetachedAgent({
+      prompt: "p",
+      description: "d",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s-count",
+    });
+    expect(session.pendingBackgroundCount).toBe(1);
+    mod.dispatchDetachedAgent({
+      prompt: "p2",
+      description: "d2",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s-count",
+    });
+    expect(session.pendingBackgroundCount).toBe(2);
+  });
+  it("uses stdio redirect so child's stdout goes to outputFile", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    mod.dispatchDetachedAgent({
+      prompt: "p",
+      description: "d",
+      chatId: 1,
+      userId: 1,
+      sessionKey: "s1",
+    });
+    const [s] = spawned;
+    // stdio should be an array with FD redirects (ignore, pipe-to-file, ignore)
+    // or similar. We verify it's NOT "inherit" (which would attach to parent).
+    expect(s.opts.stdio).not.toBe("inherit");
+    expect(s.opts.stdio).not.toBe(undefined);
+  });
+  it("cleans env of CLAUDECODE/CLAUDE_CODE_ENTRYPOINT to prevent nested session errors", async () => {
+    const mod = await import("../src/services/alvin-dispatch.js");
+    process.env.CLAUDECODE = "1";
+    process.env.CLAUDE_CODE_ENTRYPOINT = "cli";
+    try {
+      mod.dispatchDetachedAgent({
+        prompt: "p",
+        description: "d",
+        chatId: 1,
+        userId: 1,
+        sessionKey: "s1",
+      });
+      const [s] = spawned;
+      expect(s.opts.env).toBeDefined();
+      expect(s.opts.env?.CLAUDECODE).toBeUndefined();
+      expect(s.opts.env?.CLAUDE_CODE_ENTRYPOINT).toBeUndefined();
+    } finally {
+      delete process.env.CLAUDECODE;
+      delete process.env.CLAUDE_CODE_ENTRYPOINT;
+    }
+  });
+});