npm - @kky42/pi-subagents - Versions diffs - 1.0.1 → 1.0.2 - Mend

@kky42/pi-subagents 1.0.1 → 1.0.2

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (6)

package/README.md +27 -2
package/package.json +4 -1
package/scripts/e2e/main-agent-comparison.mjs +663 -0
package/src/pi-subagent.ts +47 -105
package/src/prompts.ts +11 -3
package/src/types.ts +0 -4

package/README.md CHANGED

@@ -17,7 +17,7 @@ Fresh subagents start with their own conversation and the same working directory
 Recent comparison runs:
-| Case | Claude haiku | pi deepseek-v4-flash |
+| Case | Claude Code | pi deepseek-v4-flash |
 | --- | --- | --- |
 | explore this repo | 1 Agent(Explore) | 1 Agent(explorer) |
 | auth multi-repo comparison | 1 Agent | 3 Agent calls |
@@ -58,7 +58,32 @@ The explorer returns a concise repo map, and the main agent relays the useful pa
 ## Notes
-- Nested delegation is supported and bounded by the extension.
+- Subagents cannot launch other subagents; the main agent coordinates follow-up delegation after each result returns.
+- Root-level parallel delegation is supported and bounded by the extension.
 - Subagents inherit the caller's current model and thinking level.
 - Subagents do not inherit parent conversation messages or tool results, so prompts should be self-contained.
 - `explorer` is prompted as read-only; pi permissions are still controlled by the active pi runtime.
+## E2E
+Run the main-agent behavior e2e matrix:
+```bash
+npm run e2e
+```
+This downloads fresh GitHub fixtures (`sindresorhus/ky` and `sindresorhus/got` by default), runs fresh `pi -p` sessions with ambient skills, extensions, prompt templates, themes, and context files disabled, then records Claude Code-style routing scenarios. The default pi settings are `deepseek/deepseek-v4-flash` with `--thinking high`.
+- codebase exploration
+- review
+- simple codebase QA
+- small feature implementation
+- two-codebase comparison
+To compare the same scenarios against Claude Code:
+```bash
+npm run e2e:compare-claude
+```
+The Claude comparison uses `--model haiku --effort high` by default. If `DEEPSEEK_API_KEY` is exported or present in `.env`, the runner configures Claude Code with DeepSeek's Anthropic-compatible endpoint and maps `haiku` to `deepseek-v4-flash[1m]`; it also creates a temporary pi auth file for the same key. It writes a report under `/tmp`, shows `MATCH` or `DIFF` for each scenario, and treats timeouts or budget caps in observational scenarios as inconclusive by default. Add `-- --repeat 3` to repeat each task, `-- --strict-observed` when incomplete observational scenarios should fail the command, or `-- --strict-claude` when Claude-side failures should fail the command.
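
The `MATCH`/`DIFF` labels described in the README text above can be illustrated with a minimal sketch. It assumes the rule used by the runner added in this release: a run counts as `delegate` when the root agent made at least one `Agent` call, otherwise `direct`. The helper names `classifyBehavior` and `compareBehaviors` are hypothetical and do not exist in the package. The real Claude-side analysis also counts `task_started` records, which this sketch leaves out.

```javascript
// Hypothetical helpers; only the delegate/direct rule is taken from the runner.
function classifyBehavior(agentCalls) {
  // Any root-level Agent call counts as delegation.
  return agentCalls > 0 ? "delegate" : "direct";
}

function compareBehaviors(piAgentCalls, claudeAgentCalls) {
  const pi = classifyBehavior(piAgentCalls);
  const claude = classifyBehavior(claudeAgentCalls);
  // The comparison looks at behavior only, not at the number of calls.
  return { pi, claude, label: pi === claude ? "MATCH" : "DIFF" };
}

// 3 Agent calls on one side and 1 on the other still match, since both delegated.
console.log(compareBehaviors(3, 1).label); // MATCH
console.log(compareBehaviors(0, 2).label); // DIFF
```

Because only the behavior is compared, differing call counts such as those in the README table do not by themselves produce a `DIFF`.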

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@kky42/pi-subagents",
-  "version": "1.0.1",
+  "version": "1.0.2",
   "description": "Claude Code-style subagents for pi.",
   "type": "module",
   "main": "./index.ts",
@@ -8,6 +8,7 @@
   "files": [
     "index.ts",
     "src",
+    "scripts",
     "README.md",
     "LICENSE",
     "assets"
@@ -21,6 +22,8 @@
   "scripts": {
     "build": "tsc",
     "check": "tsc --noEmit && vitest run",
+    "e2e": "node scripts/e2e/main-agent-comparison.mjs",
+    "e2e:compare-claude": "node scripts/e2e/main-agent-comparison.mjs --with-claude",
     "test": "vitest run"
   },
   "peerDependencies": {

package/scripts/e2e/main-agent-comparison.mjs ADDED

@@ -0,0 +1,663 @@
+#!/usr/bin/env node
+import { spawn, spawnSync } from "node:child_process";
+import {
+  cpSync,
+  createWriteStream,
+  existsSync,
+  mkdirSync,
+  readdirSync,
+  readFileSync,
+  rmSync,
+  writeFileSync,
+} from "node:fs";
+import { tmpdir } from "node:os";
+import path from "node:path";
+import process from "node:process";
+import { fileURLToPath } from "node:url";
+const repoRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..", "..");
+function loadDotEnv(filePath) {
+  if (!existsSync(filePath)) return;
+  const lines = readFileSync(filePath, "utf8").split("\n");
+  for (const line of lines) {
+    const trimmed = line.trim();
+    if (!trimmed || trimmed.startsWith("#")) continue;
+    const separator = trimmed.indexOf("=");
+    if (separator === -1) continue;
+    const key = trimmed.slice(0, separator).trim();
+    let value = trimmed.slice(separator + 1).trim();
+    if (!key || process.env[key] !== undefined) continue;
+    if (
+      (value.startsWith('"') && value.endsWith('"')) ||
+      (value.startsWith("'") && value.endsWith("'"))
+    ) {
+      value = value.slice(1, -1);
+    }
+    process.env[key] = value;
+  }
+}
+loadDotEnv(path.join(repoRoot, ".env"));
+const scenarios = [
+  {
+    id: "codebase-exploration",
+    fixture: "primary",
+    prompt:
+      "I just opened this project and need a quick orientation. What is it for, where is the important code, and what should I run to check it? Please just report back; don't change files.",
+  },
+  {
+    id: "review",
+    fixture: "primary",
+    prompt:
+      "Can you review this codebase for a few concrete maintainability or testing risks? Please cite the files that led you there, and don't change anything.",
+  },
+  {
+    id: "qa-about-codebase",
+    expectedBehavior: "direct",
+    fixture: "primary",
+    prompt:
+      "What package name and license does this repo declare? Please answer from the repo files.",
+  },
+  {
+    id: "implement-feature",
+    fixture: "primary",
+    prompt:
+      "Please add a short README section called \"Local checks\" that tells contributors the main command to run before opening a pull request.",
+  },
+  {
+    id: "compare-codebases",
+    fixture: "both",
+    prompt:
+      "I'm choosing between ./ky and ./got for a small project. Can you compare their purpose, rough architecture, and test setup?",
+  },
+];
+function parseArgs(argv) {
+  const options = {
+    cwd: repoRoot,
+    extension: path.join(repoRoot, "index.ts"),
+    model: "deepseek/deepseek-v4-flash",
+    thinking: "high",
+    sessionRoot: path.join(tmpdir(), `pi-subagent-main-agent-e2e-${Date.now()}`),
+    timeoutMs: 120_000,
+    repeat: 1,
+    primaryRepo: "https://github.com/sindresorhus/ky.git",
+    primaryName: "ky",
+    secondaryRepo: "https://github.com/sindresorhus/got.git",
+    secondaryName: "got",
+    deepseekApiKeyEnv: "DEEPSEEK_API_KEY",
+    withClaude: false,
+    strictClaude: false,
+    strictObserved: false,
+    claudeModel: "haiku",
+    claudeEffort: "high",
+    claudeTimeoutMs: 120_000,
+    claudeMaxBudgetUsd: "0.80",
+  };
+  for (let index = 0; index < argv.length; index += 1) {
+    const arg = argv[index];
+    if (arg === "--help" || arg === "-h") {
+      options.help = true;
+      continue;
+    }
+    if (arg === "--with-claude") {
+      options.withClaude = true;
+      continue;
+    }
+    if (arg === "--strict-claude") {
+      options.strictClaude = true;
+      continue;
+    }
+    if (arg === "--strict-observed") {
+      options.strictObserved = true;
+      continue;
+    }
+    const readValue = () => {
+      const value = argv[index + 1];
+      if (!value) throw new Error(`${arg} requires a value`);
+      index += 1;
+      return value;
+    };
+    if (arg === "--cwd") options.cwd = path.resolve(readValue());
+    else if (arg === "--extension") options.extension = path.resolve(readValue());
+    else if (arg === "--model") options.model = readValue();
+    else if (arg === "--thinking") options.thinking = readValue();
+    else if (arg === "--session-root") options.sessionRoot = path.resolve(readValue());
+    else if (arg === "--timeout-ms") options.timeoutMs = Number(readValue());
+    else if (arg === "--repeat") options.repeat = Number(readValue());
+    else if (arg === "--primary-repo") options.primaryRepo = readValue();
+    else if (arg === "--primary-name") options.primaryName = readValue();
+    else if (arg === "--secondary-repo") options.secondaryRepo = readValue();
+    else if (arg === "--secondary-name") options.secondaryName = readValue();
+    else if (arg === "--deepseek-api-key-env") options.deepseekApiKeyEnv = readValue();
+    else if (arg === "--claude-model") options.claudeModel = readValue();
+    else if (arg === "--claude-effort") options.claudeEffort = readValue();
+    else if (arg === "--claude-timeout-ms") options.claudeTimeoutMs = Number(readValue());
+    else if (arg === "--claude-max-budget-usd") options.claudeMaxBudgetUsd = readValue();
+    else throw new Error(`Unknown option: ${arg}`);
+  }
+  return options;
+}
+function printHelp() {
+  console.log(`Usage: node scripts/e2e/main-agent-comparison.mjs [options]
+Runs main-agent behavior scenarios against pi. With --with-claude, runs the same
+scenarios through Claude Code and compares whether each main agent delegated or
+handled the task directly. Behavior differences are reported but do not fail the
+run unless a scenario has an explicit expectedBehavior.
+Options:
+  --model <id>                   pi model (default: deepseek/deepseek-v4-flash)
+  --thinking <level>             pi thinking level (default: high)
+  --session-root <dir>           artifact root (default: OS temp dir)
+  --timeout-ms <ms>              per-pi-scenario timeout (default: 120000)
+  --repeat <n>                   repetitions per scenario (default: 1)
+  --primary-repo <url>           GitHub repo for single-codebase scenarios
+  --secondary-repo <url>         GitHub repo for two-codebase comparison
+  --deepseek-api-key-env <name>  env var used for pi and Claude DeepSeek auth
+  --with-claude                  also run Claude Code comparison
+  --strict-claude                fail if a Claude Code scenario is incomplete or unexpected
+  --strict-observed              fail incomplete observed scenarios too
+  --claude-model <id>            Claude Code model alias/id (default: haiku)
+  --claude-effort <level>        Claude Code effort (default: high)
+  --claude-timeout-ms <ms>       per-Claude-scenario timeout (default: 120000)
+  --claude-max-budget-usd <usd>  Claude Code budget cap (default: 0.80)
+`);
+}
+function ensureDirectory(dir) {
+  mkdirSync(dir, { recursive: true });
+}
+function runText(command, args, cwd) {
+  const result = spawnSync(command, args, {
+    cwd,
+    encoding: "utf8",
+    env: process.env,
+  });
+  if (result.status !== 0) {
+    return undefined;
+  }
+  return result.stdout.trim();
+}
+function getDeepseekApiKey(options) {
+  return process.env[options.deepseekApiKeyEnv] || process.env.ANTHROPIC_AUTH_TOKEN || process.env.DEEPSEEK_API_KEY;
+}
+function buildPiEnv(options, sessionDir) {
+  const env = { ...process.env };
+  const key = getDeepseekApiKey(options);
+  if (key) {
+    const agentDir = path.join(sessionDir, "agent");
+    ensureDirectory(agentDir);
+    env.PI_CODING_AGENT_DIR = agentDir;
+    env.DEEPSEEK_API_KEY = key;
+    writeFileSync(
+      path.join(agentDir, "auth.json"),
+      `${JSON.stringify({ deepseek: { type: "api_key", key } }, null, 2)}\n`,
+    );
+  }
+  return env;
+}
+function buildClaudeEnv(options) {
+  const env = { ...process.env };
+  const key = getDeepseekApiKey(options);
+  if (key) {
+    env.ANTHROPIC_BASE_URL = "https://api.deepseek.com/anthropic";
+    env.ANTHROPIC_AUTH_TOKEN = key;
+    env.ANTHROPIC_MODEL = "deepseek-v4-pro[1m]";
+    env.ANTHROPIC_DEFAULT_OPUS_MODEL = "deepseek-v4-pro[1m]";
+    env.ANTHROPIC_DEFAULT_SONNET_MODEL = "deepseek-v4-pro[1m]";
+    env.ANTHROPIC_DEFAULT_HAIKU_MODEL = "deepseek-v4-flash[1m]";
+  }
+  env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC = "1";
+  env.CLAUDE_CODE_ATTRIBUTION_HEADER = "0";
+  env.CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS = "1";
+  return env;
+}
+async function cloneRepo({ url, name, baseDir, timeoutMs }) {
+  const repoDir = path.join(baseDir, name);
+  if (existsSync(repoDir)) {
+    return {
+      name,
+      url,
+      path: repoDir,
+      commit: runText("git", ["rev-parse", "HEAD"], repoDir),
+    };
+  }
+  ensureDirectory(baseDir);
+  const logDir = path.join(baseDir, "_clone-logs");
+  ensureDirectory(logDir);
+  const command = await runProcess({
+    command: "git",
+    args: ["clone", "--depth", "1", url, repoDir],
+    cwd: baseDir,
+    stdoutPath: path.join(logDir, `${name}.stdout.txt`),
+    stderrPath: path.join(logDir, `${name}.stderr.txt`),
+    timeoutMs,
+  });
+  if (command.exitCode !== 0 || command.timedOut) {
+    throw new Error(`Failed to clone ${url}. See ${logDir}`);
+  }
+  return {
+    name,
+    url,
+    path: repoDir,
+    commit: runText("git", ["rev-parse", "HEAD"], repoDir),
+  };
+}
+async function prepareFixtures(options) {
+  const baseDir = path.join(options.sessionRoot, "fixtures", "base");
+  const primary = await cloneRepo({
+    url: options.primaryRepo,
+    name: options.primaryName,
+    baseDir,
+    timeoutMs: options.timeoutMs,
+  });
+  const secondary = await cloneRepo({
+    url: options.secondaryRepo,
+    name: options.secondaryName,
+    baseDir,
+    timeoutMs: options.timeoutMs,
+  });
+  return { baseDir, primary, secondary };
+}
+function prepareScenarioWorkdir(options, fixtures, sessionDir, scenario) {
+  const workRoot = path.join(sessionDir, "work");
+  rmSync(workRoot, { recursive: true, force: true });
+  ensureDirectory(workRoot);
+  const primaryTarget = path.join(workRoot, fixtures.primary.name);
+  cpSync(fixtures.primary.path, primaryTarget, { recursive: true });
+  if (scenario.fixture === "both") {
+    const secondaryTarget = path.join(workRoot, fixtures.secondary.name);
+    cpSync(fixtures.secondary.path, secondaryTarget, { recursive: true });
+    return workRoot;
+  }
+  return primaryTarget;
+}
+function writePromptFile(sessionDir, scenario, kind) {
+  const promptPath = path.join(sessionDir, "prompt.md");
+  const expectedLine = scenario.expectedBehavior
+    ? `Expected behavior: ${scenario.expectedBehavior}`
+    : "Expected behavior: observe and report";
+  writeFileSync(
+    promptPath,
+    `# ${kind} Main-Agent Behavior E2E
+Scenario: ${scenario.id}
+${expectedLine}
+${scenario.prompt}
+`,
+  );
+  return promptPath;
+}
+function runProcess({ command, args, cwd, stdoutPath, stderrPath, timeoutMs, env = process.env }) {
+  return new Promise((resolve) => {
+    const startedAt = new Date().toISOString();
+    const stdout = createWriteStream(stdoutPath, { flags: "a" });
+    const stderr = createWriteStream(stderrPath, { flags: "a" });
+    const child = spawn(command, args, {
+      cwd,
+      stdio: ["ignore", "pipe", "pipe"],
+      env,
+    });
+    let timedOut = false;
+    let settled = false;
+    let errorMessage;
+    let killTimer;
+    const timeout = setTimeout(() => {
+      timedOut = true;
+      child.kill("SIGTERM");
+      killTimer = setTimeout(() => child.kill("SIGKILL"), 5_000);
+      killTimer.unref();
+    }, timeoutMs);
+    timeout.unref();
+    child.stdout.pipe(stdout);
+    child.stderr.pipe(stderr);
+    child.on("error", (error) => {
+      errorMessage = error.message;
+    });
+    child.on("close", (exitCode, signal) => {
+      if (settled) return;
+      settled = true;
+      clearTimeout(timeout);
+      if (killTimer) clearTimeout(killTimer);
+      stdout.end();
+      stderr.end();
+      resolve({
+        command,
+        args,
+        cwd,
+        startedAt,
+        endedAt: new Date().toISOString(),
+        exitCode,
+        signal,
+        timedOut,
+        errorMessage,
+      });
+    });
+  });
+}
+function readJsonlRecords(filePath) {
+  if (!filePath || !existsSync(filePath)) return [];
+  const text = readFileSync(filePath, "utf8");
+  return text
+    .split("\n")
+    .filter(Boolean)
+    .map((line) => {
+      try {
+        return JSON.parse(line);
+      } catch {
+        return undefined;
+      }
+    })
+    .filter(Boolean);
+}
+function findNewestJsonl(dir) {
+  const files = readdirSync(dir)
+    .filter((file) => file.endsWith(".jsonl"))
+    .map((file) => path.join(dir, file))
+    .sort();
+  return files.at(-1);
+}
+function countTool(map, name) {
+  map[name] = (map[name] ?? 0) + 1;
+}
+function analyzePiTrace(filePath) {
+  const toolCalls = {};
+  const toolResults = {};
+  const finalTexts = [];
+  for (const record of readJsonlRecords(filePath)) {
+    const message = record.message;
+    const content = Array.isArray(message?.content) ? message.content : [];
+    for (const item of content) {
+      if (item?.type === "toolCall" && typeof item.name === "string") {
+        countTool(toolCalls, item.name);
+      }
+      if (item?.type === "text" && typeof item.text === "string" && message?.role === "assistant") {
+        finalTexts.push(item.text);
+      }
+    }
+    if (message?.role === "toolResult" && typeof message.toolName === "string") {
+      countTool(toolResults, message.toolName);
+    }
+  }
+  const agentCalls = toolCalls.Agent ?? 0;
+  return {
+    filePath,
+    toolCalls,
+    toolResults,
+    agentCalls,
+    readCalls: toolCalls.read ?? 0,
+    behavior: agentCalls > 0 ? "delegate" : "direct",
+    finalText: finalTexts.at(-1) ?? "",
+  };
+}
+function analyzeClaudeTrace(filePath) {
+  const toolCalls = {};
+  const taskStarts = [];
+  const resultErrors = [];
+  const finalTexts = [];
+  for (const record of readJsonlRecords(filePath)) {
+    if (record.type === "system" && record.subtype === "task_started") {
+      taskStarts.push(record);
+    }
+    if (record.type === "result" && record.is_error) {
+      resultErrors.push(record.subtype ?? record.stop_reason ?? "error");
+    }
+    const message = record.message;
+    const content = Array.isArray(message?.content) ? message.content : [];
+    const isRootMessage = !record.parent_tool_use_id;
+    for (const item of content) {
+      if (isRootMessage && item?.type === "tool_use" && typeof item.name === "string") {
+        countTool(toolCalls, item.name);
+      }
+      if (isRootMessage && item?.type === "text" && typeof item.text === "string" && message?.role === "assistant") {
+        finalTexts.push(item.text);
+      }
+    }
+  }
+  const agentCalls = toolCalls.Agent ?? 0;
+  return {
+    filePath,
+    toolCalls,
+    taskStarts: taskStarts.length,
+    resultErrors,
+    agentCalls,
+    readCalls: toolCalls.Read ?? 0,
+    behavior: agentCalls > 0 || taskStarts.length > 0 ? "delegate" : "direct",
+    finalText: finalTexts.at(-1) ?? "",
+  };
+}
+async function runPiScenario(options, fixtures, scenario, repeatIndex) {
+  const sessionDir = path.join(options.sessionRoot, "pi", scenario.id, `r${repeatIndex}`);
+  ensureDirectory(sessionDir);
+  const workCwd = prepareScenarioWorkdir(options, fixtures, sessionDir, scenario);
+  const promptPath = writePromptFile(sessionDir, scenario, "pi");
+  const stdoutPath = path.join(sessionDir, "stdout.txt");
+  const stderrPath = path.join(sessionDir, "stderr.txt");
+  const args = [
+    "-p",
+    "--model",
+    options.model,
+    "--thinking",
+    options.thinking,
+    "--session-dir",
+    sessionDir,
+    "--no-prompt-templates",
+    "--no-themes",
+    "--no-context-files",
+    "--no-skills",
+    "--no-extensions",
+    "--extension",
+    options.extension,
+    `@${promptPath}`,
+  ];
+  const command = await runProcess({
+    command: "pi",
+    args,
+    cwd: workCwd,
+    stdoutPath,
+    stderrPath,
+    timeoutMs: options.timeoutMs,
+    env: buildPiEnv(options, sessionDir),
+  });
+  const trace = analyzePiTrace(findNewestJsonl(sessionDir));
+  const pass =
+    command.exitCode === 0 &&
+    !command.timedOut &&
+    (!scenario.expectedBehavior || trace.behavior === scenario.expectedBehavior) &&
+    (!scenario.requirePiRead || trace.readCalls > 0);
+  const result = {
+    kind: "pi",
+    scenario: scenario.id,
+    repeat: repeatIndex,
+    expectedBehavior: scenario.expectedBehavior,
+    required: Boolean(scenario.expectedBehavior),
+    pass,
+    command,
+    sessionDir,
+    workCwd,
+    stdoutPath,
+    stderrPath,
+    trace,
+  };
+  writeFileSync(path.join(sessionDir, "result.json"), `${JSON.stringify(result, null, 2)}\n`);
+  return result;
+}
+async function runClaudeScenario(options, fixtures, scenario, repeatIndex) {
+  const sessionDir = path.join(options.sessionRoot, "claude", scenario.id, `r${repeatIndex}`);
+  ensureDirectory(sessionDir);
+  const workCwd = prepareScenarioWorkdir(options, fixtures, sessionDir, scenario);
+  const promptPath = writePromptFile(sessionDir, scenario, "Claude Code");
+  const stdoutPath = path.join(sessionDir, "stream.jsonl");
+  const stderrPath = path.join(sessionDir, "stderr.txt");
+  const args = [
+    "-p",
+    "--output-format",
+    "stream-json",
+    "--max-budget-usd",
+    options.claudeMaxBudgetUsd,
+    "--effort",
+    options.claudeEffort,
+    "--permission-mode",
+    "bypassPermissions",
+    "--disable-slash-commands",
+    "--exclude-dynamic-system-prompt-sections",
+    "--no-session-persistence",
+  ];
+  if (options.claudeModel) args.push("--model", options.claudeModel);
+  args.push(readFileSync(promptPath, "utf8"));
+  const command = await runProcess({
+    command: "claude",
+    args,
+    cwd: workCwd,
+    stdoutPath,
+    stderrPath,
+    timeoutMs: options.claudeTimeoutMs,
+    env: buildClaudeEnv(options),
+  });
+  const trace = analyzeClaudeTrace(stdoutPath);
+  const completed = command.exitCode === 0 && !command.timedOut && trace.resultErrors.length === 0;
+  const pass = completed && (!scenario.expectedBehavior || trace.behavior === scenario.expectedBehavior);
+  const result = {
+    kind: "claude",
+    scenario: scenario.id,
+    repeat: repeatIndex,
+    expectedBehavior: scenario.expectedBehavior,
+    required: Boolean(scenario.expectedBehavior),
+    pass,
+    completed,
+    command,
+    sessionDir,
+    workCwd,
+    stdoutPath,
+    stderrPath,
+    trace,
+  };
+  writeFileSync(path.join(sessionDir, "result.json"), `${JSON.stringify(result, null, 2)}\n`);
+  return result;
+}
+function formatResult(result, options = {}) {
+  let status = result.pass ? "PASS" : "FAIL";
+  if ((result.kind === "pi" || result.kind === "claude") && !result.required && !result.pass) {
+    status = "INCONCLUSIVE";
+  }
+  if (result.kind === "claude" && !options.strictClaude && !result.required && !result.pass) {
+    status = "INCONCLUSIVE";
+  }
+  const expected = result.expectedBehavior ?? "observe";
+  const parts = [
+    status,
+    result.kind,
+    `${result.scenario}#${result.repeat ?? 1}`,
+    `expected=${expected}`,
+    `observed=${result.trace.behavior}`,
+    `agentCalls=${result.trace.agentCalls}`,
+  ];
+  if (result.kind === "pi") parts.push(`readCalls=${result.trace.readCalls}`);
+  if (result.kind === "claude") {
+    parts.push(`completed=${result.completed}`);
+    if (result.command.timedOut) parts.push("timeout=true");
+    if (result.trace.resultErrors.length) parts.push(`errors=${result.trace.resultErrors.join(",")}`);
+  }
+  return parts.join(" ");
+}
+async function main() {
+  const options = parseArgs(process.argv.slice(2));
+  if (options.help) {
+    printHelp();
+    return;
+  }
+  ensureDirectory(options.sessionRoot);
+  const fixtures = await prepareFixtures(options);
+  const results = [];
+  for (let repeatIndex = 1; repeatIndex <= options.repeat; repeatIndex += 1) {
+    for (const scenario of scenarios) {
+      const piResult = await runPiScenario(options, fixtures, scenario, repeatIndex);
+      results.push(piResult);
+      console.log(formatResult(piResult, options));
+      if (options.withClaude) {
+        const claudeResult = await runClaudeScenario(options, fixtures, scenario, repeatIndex);
+        results.push(claudeResult);
+        console.log(formatResult(claudeResult, options));
+        const comparisonMatch = piResult.trace.behavior === claudeResult.trace.behavior;
+        const comparisonRequired = Boolean(scenario.expectedBehavior);
+        const comparisonPass = !comparisonRequired || comparisonMatch;
+        results.push({
+          kind: "comparison",
+          scenario: scenario.id,
+          repeat: repeatIndex,
+          pass: comparisonPass,
+          required: comparisonRequired,
+          match: comparisonMatch,
+          piBehavior: piResult.trace.behavior,
+          claudeBehavior: claudeResult.trace.behavior,
+        });
+        console.log(
+          `${comparisonMatch ? "MATCH" : "DIFF"} comparison ${scenario.id}#${repeatIndex} pi=${piResult.trace.behavior} claude=${claudeResult.trace.behavior}`,
+        );
+      }
+    }
+  }
+  const reportPath = path.join(options.sessionRoot, "report.json");
+  writeFileSync(reportPath, `${JSON.stringify({ options, fixtures, scenarios, results }, null, 2)}\n`);
+  console.log(`report=${reportPath}`);
+  const failed = results.filter((result) => {
+    if (result.kind === "pi") return !result.pass && (result.required || options.strictObserved);
+    if (result.kind === "claude") {
+      return options.strictClaude && !result.pass && (result.required || options.strictObserved);
+    }
+    return false;
+  });
+  if (failed.length > 0) {
+    process.exitCode = 1;
+  }
+}
+main().catch((error) => {
+  console.error(error?.stack || error?.message || String(error));
+  process.exitCode = 1;
+});
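
The exit-code policy at the end of `main()` above depends on how `--strict-claude` and `--strict-observed` combine, which is easy to misread. The following is a minimal sketch that restates the `failed` filter as a standalone function. The name `shouldFailRun` and the sample result objects are hypothetical; the conditions are copied from the filter in the added script.

```javascript
// Hypothetical restatement of the `failed` filter from main().
function shouldFailRun(results, { strictClaude = false, strictObserved = false } = {}) {
  return results.some((result) => {
    // A failed result only counts if the scenario has an expectedBehavior
    // (required) or incomplete observational runs were made strict.
    const counts = result.required || strictObserved;
    if (result.kind === "pi") return !result.pass && counts;
    if (result.kind === "claude") return strictClaude && !result.pass && counts;
    return false; // comparison rows never fail the run
  });
}

// An observational Claude scenario that timed out (no expectedBehavior).
const observedClaudeTimeout = { kind: "claude", pass: false, required: false };

console.log(shouldFailRun([observedClaudeTimeout], { strictClaude: true })); // false
console.log(shouldFailRun([observedClaudeTimeout], { strictClaude: true, strictObserved: true })); // true
```

Under this reading, `--strict-claude` on its own fails the command only for Claude scenarios that declare an `expectedBehavior`; observational Claude scenarios need `--strict-observed` as well.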

package/src/pi-subagent.ts CHANGED

@@ -12,7 +12,7 @@ import {
   type Theme,
   type ToolDefinition,
 } from "@earendil-works/pi-coding-agent";
-import { Container, Text } from "@earendil-works/pi-tui";
+import { Container, Text, TruncatedText } from "@earendil-works/pi-tui";
 import { Type, type Static } from "typebox";
 import {
   AGENT_PROMPT_GUIDELINES,
@@ -22,7 +22,6 @@ import {
 } from "./prompts.ts";
 import type { SubagentExtensionOptions, SubagentProgressNode, SubagentToolDetails, SubagentType } from "./types.ts";
-const DEFAULT_MAX_DEPTH = 2;
 const DEFAULT_MAX_WIDTH = 4;
 const ALLOWED_SUBAGENTS: SubagentType[] = ["general-purpose", "explorer"];
@@ -43,8 +42,6 @@ const agentToolParameters = Type.Object({
 type AgentToolParams = Static<typeof agentToolParameters>;
 interface DelegationState {
-  depth: number;
-  maxDepth: number;
   maxWidth: number;
   childCount: number;
   progressEnabled: boolean;
@@ -57,25 +54,21 @@ interface CreateAgentToolOptions {
 type AgentToolResult = ReturnType<typeof textResult>;
 const MAX_ACTIVITY_LINES = 3;
+const ACTIVITY_DISPLAY_PREVIEW_CHARS = 120;
 const PROGRESS_UPDATE_INTERVAL_MS = 250;
 const PROGRESS_STATUSES: SubagentProgressNode["status"][] = ["running", "completed", "rejected", "error"];
-function getCliMode(argv = process.argv): string | undefined {
-  for (let i = 0; i < argv.length; i++) {
-    const arg = argv[i];
-    if (arg === "--mode") {
-      return argv[i + 1];
-    }
-    if (arg.startsWith("--mode=")) {
-      return arg.slice("--mode=".length);
-    }
-  }
-  return undefined;
-}
 function shouldEnableProgress(ctx: ExtensionContext): boolean {
-  const mode = getCliMode();
-  return ctx.hasUI && mode !== "json" && mode !== "rpc";
+  if (!ctx.hasUI) {
+    return false;
+  }
+  try {
+    // RPC exposes ExtensionUIContext but has no TUI theme surface. Keep compact
+    // progress updates limited to the interactive TUI renderer.
+    return ctx.ui.getAllThemes().length > 0;
+  } catch {
+    return false;
+  }
 }
 function normalizeLimit(value: number | undefined, fallback: number, label: string): number {
@@ -119,40 +112,28 @@ function isSubagentProgressNode(value: unknown): value is SubagentProgressNode {
     typeof value.id === "string" &&
     typeof value.description === "string" &&
     (subagentType === "unknown" || ALLOWED_SUBAGENTS.includes(subagentType as SubagentType)) &&
-    Number.isFinite(value.depth) &&
     isProgressStatus(value.status) &&
     Number.isFinite(value.startedAt) &&
     Array.isArray(value.activity) &&
     value.activity.every((line) => typeof line === "string") &&
-    Number.isFinite(value.activityCount) &&
-    Array.isArray(value.children)
+    Number.isFinite(value.activityCount)
   );
 }
-function getProgressFromToolResult(result: unknown): SubagentProgressNode | undefined {
-  if (!isRecord(result) || !isRecord(result.details) || !("subagentType" in result.details)) {
-    return undefined;
-  }
-  return isSubagentProgressNode(result.details.progress) ? result.details.progress : undefined;
-}
 function createProgressNode(
   id: string,
   params: AgentToolParams,
   subagentType: SubagentType,
-  depth: number,
   status: SubagentProgressNode["status"] = "running",
 ): SubagentProgressNode {
   return {
     id,
     description: params.description,
     subagentType,
-    depth,
     status,
     startedAt: Date.now(),
     activity: [],
     activityCount: 0,
-    children: [],
   };
 }
@@ -233,18 +214,10 @@ function updateProgressFromEvent(progress: SubagentProgressNode, event: AgentSes
   }
   if (event.type === "tool_execution_update") {
-    const childProgress = event.toolName === "Agent" ? getProgressFromToolResult(event.partialResult) : undefined;
-    if (childProgress) {
-      progress.children = mergeChildProgress(progress.children, childProgress);
-    }
     return;
   }
   if (event.type === "tool_execution_end") {
-    const childProgress = event.toolName === "Agent" ? getProgressFromToolResult(event.result) : undefined;
-    if (childProgress) {
-      progress.children = mergeChildProgress(progress.children, childProgress);
-    }
     return;
   }
@@ -270,19 +243,6 @@ function updateProgressFromEvent(progress: SubagentProgressNode, event: AgentSes
   }
 }
-function mergeChildProgress(
-  children: SubagentProgressNode[],
-  child: SubagentProgressNode,
-): SubagentProgressNode[] {
-  const index = children.findIndex((candidate) => candidate.id === child.id);
-  if (index === -1) {
-    return [...children, child];
-  }
-  const next = children.slice();
-  next[index] = child;
-  return next;
-}
 function getDisplayLabel(subagentType: SubagentType | "unknown"): string {
   return subagentType;
 }
@@ -308,18 +268,21 @@ function formatDuration(ms: number): string {
   return `${seconds}s`;
 }
-function renderProgressNode(
-  node: SubagentProgressNode,
-  theme: Theme,
-  depth = 0,
-): Container {
+function formatActivityLineForDisplay(line: string): string {
+  if (line.length <= ACTIVITY_DISPLAY_PREVIEW_CHARS) {
+    return line;
+  }
+  const hiddenChars = line.length - ACTIVITY_DISPLAY_PREVIEW_CHARS;
+  return `${line.slice(0, ACTIVITY_DISPLAY_PREVIEW_CHARS).trimEnd()} ... (+${hiddenChars} chars)`;
+}
+function renderProgressNode(node: SubagentProgressNode, theme: Theme): Container {
   const container = new Container();
-  const indent = "  ".repeat(depth);
   const status = node.status === "completed" ? "done" : node.status;
   const elapsed = formatDuration((node.endedAt ?? Date.now()) - node.startedAt);
   container.addChild(
     new Text(
-      `${indent}${theme.bold(formatProgressTitle(node))} ${theme.fg("dim", `${status} ${elapsed}`)}`,
+      `${theme.bold(formatProgressTitle(node))} ${theme.fg("dim", `${status} ${elapsed}`)}`,
       0,
       0,
     ),
@@ -327,17 +290,14 @@ function renderProgressNode(
   const skipped = node.activityCount - node.activity.length;
   if (skipped > 0) {
-    container.addChild(new Text(`${indent}  ${theme.fg("muted", `... +${skipped} earlier events`)}`, 0, 0));
+    container.addChild(new Text(`  ${theme.fg("muted", `... +${skipped} earlier events`)}`, 0, 0));
   }
   for (const line of node.activity) {
-    container.addChild(new Text(`${indent}  ${theme.fg("muted", line)}`, 0, 0));
+    container.addChild(new TruncatedText(`  ${theme.fg("muted", formatActivityLineForDisplay(line))}`, 0, 0));
   }
-  for (const child of node.children) {
-    container.addChild(renderProgressNode(child, theme, depth + 1));
-  }
   if (node.error) {
-    container.addChild(new Text(`${indent}  ${theme.fg("error", node.error)}`, 0, 0));
+    container.addChild(new Text(`  ${theme.fg("error", node.error)}`, 0, 0));
   }
   return container;
@@ -372,15 +332,13 @@ async function runSubagent(
   ctx: ExtensionContext,
   onProgress: ((result: AgentToolResult) => void) | undefined,
 ): Promise<ReturnType<typeof textResult>> {
-  const progressDepth = state.depth + 1;
   const progress =
-    state.progressEnabled ? createProgressNode(toolCallId, params, subagentType, progressDepth) : undefined;
+    state.progressEnabled ? createProgressNode(toolCallId, params, subagentType) : undefined;
   if (!ctx.model) {
     return textResult("Cannot launch subagent: no model is selected.", {
       description: params.description,
       subagentType,
-      depth: progressDepth,
       status: "rejected",
       error: "No model is selected",
       ...(progress ? { progress: { ...progress, status: "rejected", error: "No model is selected" } } : {}),
@@ -392,25 +350,19 @@ async function runSubagent(
   const settingsManager = SettingsManager.create(cwd, agentDir);
   const appendPrompts = [
     getPresetAppendPrompt(subagentType),
-    buildCoordinatorPrompt(state.maxDepth, state.maxWidth),
   ].filter((prompt): prompt is string => Boolean(prompt));
   const resourceLoader = new DefaultResourceLoader({
     cwd,
     agentDir,
     settingsManager,
-    appendSystemPrompt: appendPrompts,
+    extensionsOverride: (base) => ({
+      ...base,
+      extensions: base.extensions.filter((extension) => !extension.tools.has("Agent")),
+    }),
+    appendSystemPromptOverride: (base) => [...base, ...appendPrompts],
   });
   await resourceLoader.reload();
-  const childState: DelegationState = {
-    depth: progressDepth,
-    maxDepth: state.maxDepth,
-    maxWidth: state.maxWidth,
-    childCount: 0,
-    progressEnabled: state.progressEnabled,
-  };
-  const nestedAgentTool = createAgentTool(childState, options);
   const { session } = await createAgentSession({
     cwd,
     agentDir,
@@ -420,7 +372,7 @@ async function runSubagent(
     settingsManager,
     sessionManager: SessionManager.inMemory(cwd),
     resourceLoader,
-    customTools: [nestedAgentTool as ToolDefinition],
+    excludeTools: ["Agent"],
   });
   let abortHandler: (() => void) | undefined;
@@ -428,7 +380,9 @@ async function runSubagent(
     abortHandler = () => {
       void session.abort();
     };
-    signal.addEventListener("abort", abortHandler, { once: true });
+    if (!signal.aborted) {
+      signal.addEventListener("abort", abortHandler, { once: true });
+    }
   }
   let lastProgressEmit = 0;
@@ -445,7 +399,6 @@ async function runSubagent(
     onProgress(textResult(`Subagent "${params.description}" (${subagentType}) is running.`, {
       description: params.description,
       subagentType,
-      depth: progressDepth,
       status: progress.status,
       result: progress.result,
       error: progress.error,
@@ -474,7 +427,13 @@ async function runSubagent(
     : undefined;
   try {
+    if (signal?.aborted) {
+      throw new Error("Subagent aborted before prompt start");
+    }
     await session.bindExtensions({});
+    if (signal?.aborted) {
+      throw new Error("Subagent aborted before prompt start");
+    }
     emitProgress();
     await session.prompt(params.prompt, { source: "extension" });
     const result = extractFinalAssistantText(session.messages) || "(no final text output)";
@@ -486,7 +445,6 @@ async function runSubagent(
     return textResult(`Subagent "${params.description}" (${subagentType}) completed:\n\n${result}`, {
       description: params.description,
       subagentType,
-      depth: childState.depth,
       status: "completed",
       result,
       ...(progress ? { progress } : {}),
@@ -501,7 +459,6 @@ async function runSubagent(
     return textResult(`Subagent "${params.description}" (${subagentType}) failed: ${message}`, {
       description: params.description,
       subagentType,
-      depth: childState.depth,
       status: "error",
       error: message,
       ...(progress ? { progress } : {}),
@@ -542,33 +499,18 @@ function createAgentTool(
           {
             description: params.description,
             subagentType: "unknown",
-            depth: effectiveState.depth + 1,
             status: "rejected",
             error: "Unknown subagent_type",
           },
         );
       }
-      if (effectiveState.depth >= effectiveState.maxDepth) {
-        return textResult(
-          `Maximum subagent depth reached. Current depth: ${effectiveState.depth}; maxDepth: ${effectiveState.maxDepth}.`,
-          {
-            description: params.description,
-            subagentType,
-            depth: effectiveState.depth + 1,
-            status: "rejected",
-            error: "Maximum subagent depth reached",
-          },
-        );
-      }
       if (state.childCount >= effectiveState.maxWidth) {
         return textResult(
           `Maximum subagent width reached for this agent run. maxWidth: ${effectiveState.maxWidth}.`,
           {
             description: params.description,
             subagentType,
-            depth: effectiveState.depth + 1,
             status: "rejected",
             error: "Maximum subagent width reached",
           },
@@ -600,7 +542,7 @@ function createAgentTool(
       );
     },
     renderResult(result, _options, theme) {
-      const details = result.details;
+      const details = result.details as SubagentToolDetails;
       if (details.progress) {
         return renderProgressNode(details.progress, theme);
       }
@@ -614,13 +556,10 @@ function createAgentTool(
 }
 export function createSubagentExtension(options: SubagentExtensionOptions = {}): ExtensionFactory {
-  const maxDepth = normalizeLimit(options.maxDepth, DEFAULT_MAX_DEPTH, "maxDepth");
   const maxWidth = normalizeLimit(options.maxWidth, DEFAULT_MAX_WIDTH, "maxWidth");
   return function subagentExtension(pi: ExtensionAPI) {
     const rootState: DelegationState = {
-      depth: 0,
-      maxDepth,
       maxWidth,
       childCount: 0,
       progressEnabled: false,
@@ -632,9 +571,12 @@ export function createSubagentExtension(options: SubagentExtensionOptions = {}):
     pi.registerTool(createAgentTool(rootState, toolOptions));
     pi.on("before_agent_start", (event) => {
+      if (!pi.getAllTools().some((tool) => tool.name === "Agent")) {
+        return;
+      }
       rootState.childCount = 0;
       return {
-        systemPrompt: `${event.systemPrompt}\n\n${buildCoordinatorPrompt(maxDepth, maxWidth)}`,
+        systemPrompt: `${event.systemPrompt}\n\n${buildCoordinatorPrompt()}`,
       };
     });
   };
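
The `formatActivityLineForDisplay` helper added in this file caps each activity line at a fixed preview length and appends a count of the hidden characters. A minimal standalone sketch of that behavior follows. The value `20` for `ACTIVITY_DISPLAY_PREVIEW_CHARS` is an assumption chosen for illustration, since the real constant is not shown in this diff, and the sample strings are hypothetical.

```typescript
// Assumed value for illustration only; the package's real constant is not visible in this diff.
const ACTIVITY_DISPLAY_PREVIEW_CHARS = 20;

// Same logic as the helper added in 1.0.2: short lines pass through unchanged,
// long lines are cut at the preview length and annotated with the hidden character count.
function formatActivityLineForDisplay(line: string): string {
  if (line.length <= ACTIVITY_DISPLAY_PREVIEW_CHARS) {
    return line;
  }
  const hiddenChars = line.length - ACTIVITY_DISPLAY_PREVIEW_CHARS;
  return `${line.slice(0, ACTIVITY_DISPLAY_PREVIEW_CHARS).trimEnd()} ... (+${hiddenChars} chars)`;
}

// Hypothetical activity lines.
const shortLine = formatActivityLineForDisplay("read package.json");
const longLine = formatActivityLineForDisplay("grep -rn createAgentTool src/ scripts/");

console.log(shortLine); // "read package.json"
console.log(longLine); // "grep -rn createAgent ... (+18 chars)"
```

Note that `hiddenChars` is computed before `trimEnd()` runs, so trailing whitespace removed from the preview is not added to the reported count.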

package/src/prompts.ts CHANGED

@@ -19,6 +19,7 @@ export const AGENT_PROMPT_GUIDELINES = [
   "If the user asks to explore or survey a repo, use explorer to produce a concise map before doing detailed follow-up yourself.",
   "If the user asks for parallel work, launch multiple Agent calls in the same assistant response.",
   "Write self-contained subagent prompts: fresh subagents do not inherit parent conversation, tool results, or reasoning.",
+  "Subagents cannot launch Agent themselves; coordinate any follow-up delegation from the main conversation after a result returns.",
   "Clearly tell the subagent whether you expect read-only research or code changes.",
   "The Agent final message is returned to you as the tool result and is not shown to the user; relay what matters.",
 ];
@@ -47,17 +48,24 @@ Adapt your search breadth to the caller's prompt. For targeted lookups, be fast
 Return a concise final report with the relevant files, symbols, and caveats. Do not create documentation files.`;
 }
-export function buildCoordinatorPrompt(_maxDepth: number, _maxWidth: number): string {
+export function buildCoordinatorPrompt(): string {
   return `# Subagent Delegation
 Available agents:
 ${formatAvailableAgents(PRESET_DESCRIPTIONS)}
-Use Agent with specialized agents when the task at hand matches the agent's description. Subagents are valuable for parallelizing independent queries or protecting the main context window from excessive results, but they should not be used excessively when not needed.
+Use Agent when a specialized agent matches the task, the work can run independently, or delegating would keep large search/read output out of the main context.
+Guidelines:
+- Do not use subagents excessively; direct lookup is better when the target file, symbol, or value is already known.
+- If the user asks for parallel work, launch independent Agent calls in the same assistant response.
+- Subagents start fresh and do not inherit parent messages, tool results, or reasoning. Brief them with all needed context.
+- Subagents cannot launch other subagents. Coordinate follow-up delegation from the main conversation after each result returns.
+- The Agent final message is returned to you as the tool result. Relay what matters to the user.
 Example usage:
 - User asks "explore this repo": use Agent with subagent_type "explorer" and ask it to map the project purpose, key directories, important files, scripts, tests, and caveats without editing files.
 - User asks for a second opinion on a risky change: use Agent with subagent_type "general-purpose" and give it enough context to review independently.
-Nested subagents are allowed, but delegation depth and width are bounded by the extension. If a limit is reached, the Agent tool will reject the call.`;
+Root-level parallel delegation is bounded by the extension. If the limit is reached, the Agent tool will reject the call.`;
 }
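
The updated prompt says root-level delegation is bounded and that the Agent tool rejects calls once the limit is reached. The sketch below shows one way such a width check could behave, based on the `childCount >= maxWidth` comparison and the `childCount = 0` reset visible in `pi-subagent.ts`. The `tryLaunch` function and `LaunchDecision` type are hypothetical names, and incrementing `childCount` on each accepted launch is an assumption, because the diff does not show where the counter is incremented.

```typescript
// Reduced form of the delegation state after 1.0.2 (depth fields removed).
interface DelegationState {
  maxWidth: number;
  childCount: number;
}

// Hypothetical result type for this sketch.
type LaunchDecision =
  | { status: "accepted" }
  | { status: "rejected"; error: string };

// Hypothetical helper: rejects once the per-run width limit is reached.
function tryLaunch(state: DelegationState): LaunchDecision {
  if (state.childCount >= state.maxWidth) {
    return { status: "rejected", error: "Maximum subagent width reached" };
  }
  state.childCount += 1; // Assumption: the counter grows by one per accepted launch.
  return { status: "accepted" };
}

const state: DelegationState = { maxWidth: 2, childCount: 0 };
const decisions = [tryLaunch(state), tryLaunch(state), tryLaunch(state)].map((d) => d.status);

// Mirrors the reset performed in the "before_agent_start" handler.
state.childCount = 0;
const afterReset = tryLaunch(state).status;

console.log(decisions); // ["accepted", "accepted", "rejected"]
console.log(afterReset); // "accepted"
```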

package/src/types.ts CHANGED

@@ -1,7 +1,6 @@
 export type SubagentType = "general-purpose" | "explorer";
 export interface SubagentExtensionOptions {
-  maxDepth?: number;
   maxWidth?: number;
 }
@@ -9,13 +8,11 @@ export interface SubagentProgressNode {
   id: string;
   description: string;
   subagentType: SubagentType | "unknown";
-  depth: number;
   status: "running" | "completed" | "rejected" | "error";
   startedAt: number;
   endedAt?: number;
   activity: string[];
   activityCount: number;
-  children: SubagentProgressNode[];
   result?: string;
   error?: string;
 }
@@ -23,7 +20,6 @@ export interface SubagentProgressNode {
 export interface SubagentToolDetails {
   description: string;
   subagentType: SubagentType | "unknown";
-  depth: number;
   status: "running" | "completed" | "rejected" | "error";
   result?: string;
   error?: string;