npm - bare-agent - Versions diffs - 0.15.0 → 0.16.0 - Mend

bare-agent 0.15.0 → 0.16.0

Files changed (9)

package/README.md CHANGED

@@ -68,6 +68,7 @@ Every piece works alone — take what you need, ignore the rest.
 |---|---|
 | **Loop** | Think → act → observe → repeat. Calls any LLM, executes your tools, loops until done. Returns estimated USD cost per run. Governance via `Loop({ policy })` — wire bareguard's `Gate` through `wireGate(gate)` and every tool call (native, MCP, browsing, mobile) traverses one chokepoint with per-caller `ctx` routing. Bareguard owns the audit log, budget caps, and halt decisions; Loop respects the verdict. Context engineering via `Loop({ assemble })` — a per-round `assemble(msgs, ctx)` chokepoint to recall/compress/trim the window sent to the model (the seam litectx plugs into); returns a view, the canonical transcript stays intact, fail-open. The exported `unitAssembler`/`toUnits`/`fromUnits` adapter lets a consumer work over a neutral unit `{id, role, content, kind, pinned, atomic, tokensApprox}` — bareagent owns the grammar (atomic tool-pair bundling, pinned system/task, a pairing seatbelt), the consumer owns content + relevance. The CE function reads its inputs from the per-run `ctx` — litectx's budget-fitter uses `ctx.budget` (and `ctx.task`), so you **must** populate it via `run(msgs, tools, { ctx })`: an unset `ctx.budget` means the fitter has no budget, keeps everything, and returns the window unchanged — a silent no-op, not a bug (see `examples/litectx-assemble.mjs`). For summary-window compaction the Loop also lends a provider-bound `ctx.summarize(excerpt) => Promise<string>` (R-C6): the consumer owns when/what to summarize and the splice, bareagent makes the one model call (counted against the budget via `onLlmResult`, tagged `kind:'summarize'`). For an unbounded long-running agent there's the **destructive** counterpart `Loop({ trim })` (RT-2) — a per-round bound on the canonical transcript that evicts old turns *after* harvesting them; wire it with the exported `unitTrimmer({ trim, onHarvest, policy })` over litectx's `trim` verb (harvest-before-evict, fail-open; `harvestKey` gives the stable upsert id), opt-in (requires a consumer on litectx ≥ 0.16.0). `onError` + `loop:error` surface every silent-ish failure (callback throw, Checkpoint timeout) |
 | **Planner** | Break a goal into a step DAG via LLM. Built-in caching (`cacheTTL`) |
+| **assessComplexity** | Pure-code pre-planner (no LLM): rates a goal `simple`/`medium`/`complex`/`critical` from its text via keyword scoring + a critical safety override. `needsPlanning` gates whether to spend a Planner pass; `critical` flags security/production/compliance work for extra scrutiny. Free, instant, debuggable via `signals` |
 | **runPlan** | Execute steps in parallel waves. Dependency-aware, failure propagation, per-step retry |
 | **Retry** | Exponential/linear backoff with jitter. Respects `err.retryable` |
 | **CircuitBreaker** | Fail fast after N errors. Auto-recovers after cooldown. Per-key isolation |
@@ -83,7 +84,7 @@ Every piece works alone — take what you need, ignore the rest.
 | **Mobile** | Android + iOS device control via `baremobile`. Same two modes: library tools (`createMobileTools` — action tools auto-return snapshots) or CLI session (`baremobile` CLI — disk-based snapshots) |
 | **Shell** | Cross-platform `shell_read`, `shell_grep`, `shell_run` (argv, no shell), `shell_exec` (raw shell). Pure Node — no `grep`/`rg`/`findstr` dependency. Injection-proof `shell_run` for policy-gated use |
 | **MCP Bridge** | Auto-discover MCP servers from IDE configs (Claude Code, Cursor, etc.), expose as bareagent tools. Static allow/deny via `.mcp-bridge.json`, `systemContext` for LLM awareness. Runtime policy lives in `Loop({ policy })` — one hook for MCP + native tools alike. Returns both bulk `tools` (one per MCP tool) and `metaTools` (`mcp_discover` + `mcp_invoke` for token-thrifty access to large catalogs). Connecting runs a server's `command` (which may come from a cwd `.mcp.json`): pass `confirmServer` to vet each before it spawns — otherwise the bridge warns naming every command it runs. Every RPC is time-bounded (`timeout` for the handshake, `callTimeout` for `tools/call`), and a server that breaks its stdin pipe fails the connection instead of crashing the host. Zero deps |
-| **Spawn** | Fork a child bareagent process as a specialist agent. LLM-callable form blocks until child exits; library form returns a handle (`wait`, `onLine`, `kill`). One JSONL channel per child — child stderr captured and re-emitted as `child:stderr` events on the parent stream. Threads `BAREGUARD_AUDIT_PATH` / `BAREGUARD_PARENT_RUN_ID` / `BAREGUARD_BUDGET_FILE` / `BAREGUARD_SPAWN_DEPTH` so the family stitches into one audit + budget. `bareguard ^0.2.0` adds `spawn.ratePerMinute` + `limits.maxDepth` per-family caps |
+| **Spawn** | Fork a child bareagent process as a specialist agent. LLM-callable form blocks until child exits; library form returns a handle (`wait`, `onLine`, `kill`). One JSONL channel per child — child stderr captured and re-emitted as `child:stderr` events on the parent stream. Threads `BAREGUARD_AUDIT_PATH` / `BAREGUARD_PARENT_RUN_ID` / `BAREGUARD_BUDGET_FILE` / `BAREGUARD_SPAWN_DEPTH` so the family stitches into one audit + budget. `bareguard ^0.2.0` adds `spawn.ratePerMinute` + `limits.maxDepth` per-family caps. `timeoutMs` is the wall-clock ceiling; opt-in `idleTimeoutMs` is a heartbeat watchdog that kills a child gone silent on both stdio streams (resets on each line, so slow-but-working children survive; result carries `idleKilled`) |
 | **Defer** | Append a `{action, when}` record to a JSONL queue for a separate waker (cron / systemd timer / `examples/wake.sh`) to fire later. Two-phase governance: emit-time `gate.check` on the `defer` action; fire-time `gate.check` on the inner action when the waker re-invokes. `bareguard ^0.2.0` adds `defer.ratePerMinute` family-wide cap |
 **Providers:** OpenAI-compatible (OpenAI, OpenRouter, Groq, vLLM, LM Studio), Anthropic, Ollama, CLIPipe (any CLI tool via stdin/stdout with real-time streaming), Fallback, or bring your own (one method: `generate`). All return the same shape — swap freely. The OpenAI provider warns if it would send your key over plaintext `http://` to a non-loopback host (use `https`, or drop `apiKey` for keyless local endpoints).

package/bareagent.context.md CHANGED

@@ -1,7 +1,7 @@
 # bareagent — Integration Guide
 > For AI assistants and developers wiring bareagent into a project.
-> v0.15.0 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
+> v0.16.0 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
 >
 > Full human guide with composition examples, design philosophy, and recipes: [Usage Guide](docs/02-features/usage-guide.md)
@@ -14,7 +14,7 @@ npm install bare-agent
 ```
 Eight entry points:
-- `require('bare-agent')` — Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), **unitTrimmer, harvestKey** (the destructive `trim` seam adapter — RT-2 harvest-before-evict, needs a consumer on litectx ≥ 0.16.0), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
+- `require('bare-agent')` — Loop, Planner, **assessComplexity** (pure-code no-LLM pre-planner → `{level, score, needsPlanning, signals}`), StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), **unitTrimmer, harvestKey** (the destructive `trim` seam adapter — RT-2 harvest-before-evict, needs a consumer on litectx ≥ 0.16.0), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
 - `require('bare-agent/errors')` — same error classes via a stable subpath (v0.10.1+) for adopters who want to import only the error surface
 - `require('bare-agent/providers')` — OpenAI, Anthropic, Ollama, CLIPipe, Fallback (the canonical short names; `*Provider` aliases — `OpenAIProvider`, `AnthropicProvider`, etc. — are also exported and match the class names, so either destructure works, v0.12.1+)
 - `require('bare-agent/stores')` — SQLite (FTS5), JsonFile
@@ -31,6 +31,8 @@ Eight entry points:
 |---|---|
 | Call an LLM with tools and get a result | Loop + a Provider |
 | Break a goal into steps | Planner + a Provider |
+| Size a goal before planning (no LLM) | assessComplexity — `needsPlanning` gates a Planner pass |
+| Kill a spawned child that hangs silently | createSpawnTool / spawnChild `{ idleTimeoutMs }` |
 | Execute a step DAG with parallelism | runPlan + executeFn |
 | Track task state (pending/running/done/failed) | StateMachine |
 | Run agent turns on a schedule (cron, timers) | Scheduler |

package/index.d.ts CHANGED

@@ -1,5 +1,6 @@
 import { Loop } from "./src/loop";
 import { Planner } from "./src/planner";
+import { assessComplexity } from "./src/complexity";
 import { StateMachine } from "./src/state";
 import { Scheduler } from "./src/scheduler";
 import { Checkpoint } from "./src/checkpoint";
@@ -22,4 +23,4 @@ import { TimeoutError } from "./src/errors";
 import { ValidationError } from "./src/errors";
 import { CircuitOpenError } from "./src/errors";
 import { HaltError } from "./src/errors";
-export { Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };
+export { Loop, Planner, assessComplexity, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };

package/index.js CHANGED

@@ -2,6 +2,7 @@
 const { Loop } = require('./src/loop');
 const { Planner } = require('./src/planner');
+const { assessComplexity } = require('./src/complexity');
 const { StateMachine } = require('./src/state');
 const { Scheduler } = require('./src/scheduler');
 const { Checkpoint } = require('./src/checkpoint');
@@ -25,6 +26,7 @@ const {
 module.exports = {
   Loop,
   Planner,
+  assessComplexity,
   StateMachine,
   Scheduler,
   Checkpoint,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "bare-agent",
-  "version": "0.15.0",
+  "version": "0.16.0",
   "files": [
     "index.js",
     "index.d.ts",

package/src/complexity.d.ts ADDED

@@ -0,0 +1,31 @@
+export type ComplexityResult = {
+    /**
+     * - Assessed complexity tier.
+     */
+    level: "simple" | "medium" | "complex" | "critical";
+    /**
+     * - Raw heuristic score (100 for a critical override).
+     */
+    score: number;
+    /**
+     * - false for `simple`, true otherwise — the routing hint.
+     */
+    needsPlanning: boolean;
+    /**
+     * - Which signals fired, for transparency/debugging.
+     */
+    signals: string[];
+};
+/**
+ * @typedef {object} ComplexityResult
+ * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
+ * @property {number} score - Raw heuristic score (100 for a critical override).
+ * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
+ * @property {string[]} signals - Which signals fired, for transparency/debugging.
+ */
+/**
+ * Assess the complexity of a goal/prompt from its text alone (no LLM).
+ * @param {string} prompt - The goal to classify.
+ * @returns {ComplexityResult}
+ */
+export function assessComplexity(prompt: string): ComplexityResult;
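
The `level` and `needsPlanning` fields declared above are routing hints. A minimal sketch of how a caller might act on a `ComplexityResult`, assuming nothing beyond the documented shape; the `routeFor` helper and its route names (`single-shot`, `plan`, `plan-with-review`) are hypothetical and not part of bare-agent:

```javascript
'use strict';

// Hypothetical helper: maps a ComplexityResult (shape from complexity.d.ts)
// to a routing decision. The route names are illustrative only.
function routeFor(result) {
  if (result.level === 'critical') return 'plan-with-review'; // extra scrutiny before acting
  return result.needsPlanning ? 'plan' : 'single-shot';
}

// Hand-written results in the documented shape (not real assessor output).
const examples = [
  { level: 'simple', score: 0, needsPlanning: false, signals: ['empty'] },
  { level: 'medium', score: 12, needsPlanning: true, signals: ['medium_verbs'] },
  { level: 'critical', score: 100, needsPlanning: true, signals: ['critical_override'] },
];

for (const r of examples) {
  console.log(`${r.level} -> ${routeFor(r)}`);
}
```

Keeping the decision in one small function means the `signals` array can be logged next to the chosen route when a classification looks wrong.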

package/src/complexity.js ADDED

@@ -0,0 +1,149 @@
+'use strict';
+/**
+ * Keyword complexity assessor — a fast, pure-code "pre-planner" that classifies a goal as
+ * simple / medium / complex / critical from its text alone, with NO LLM call. Ported (concept,
+ * not line-for-line) from Aurora's SOAR keyword assessor. It exists to drive a routing decision:
+ * a `simple` goal can run single-shot; `medium`+ warrants a Planner pass; `critical` (security,
+ * production, compliance, financial) flags work that deserves extra scrutiny (e.g. a checkpoint /
+ * adversarial verification) before acting.
+ *
+ *   const { level, needsPlanning } = assessComplexity(goal);
+ *   const steps = needsPlanning ? await planner.plan(goal) : [{ id: 's1', action: goal }];
+ *
+ * Concept, deliberately lightweight: a critical-keyword override, tiered action-verb scoring
+ * (simple verbs subtract, complex verbs add the most), feature nouns + scope + structure signals,
+ * and two calibrated thresholds. It is a heuristic — transparent and debuggable via `signals`, not
+ * a model. On the upstream validation corpus it lands ~89% (the fuller LLM-free original ~95%);
+ * the gap is long-tail ambiguity ("add a button" is genuinely context-dependent).
+ */
+const has = (/** @type {Set<string>} */ words, /** @type {Set<string>} */ set) =>
+  [...set].filter(w => words.has(w));
+const wordSet = (/** @type {string} */ s) => new Set(s.match(/\b\w+\b/g) || []);
+// Escape regex metacharacters so a keyword can't break (or alter) the word-boundary match — the
+// lists below are plain words today, but a future entry like "c++" or ".net" must stay literal.
+const esc = (/** @type {string} */ k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+const hasAny = (/** @type {string} */ s, /** @type {string[]} */ list) =>
+  list.some(k => new RegExp(`\\b${esc(k)}\\b`).test(s));
+// --- critical safety override: high-stakes work jumps straight to the top tier ---
+const CRIT_INCIDENT = ['emergency', 'outage', 'breach', 'vulnerability', 'exploit', 'corruption', 'data loss', 'incident', 'penetration'];
+const CRIT_COMPLIANCE = ['gdpr', 'hipaa', 'pci', 'compliance', 'regulation'];
+const SEC_CONTEXT = ['security', 'production', 'authentication', 'authorization'];
+const CRIT_ACTIONS = ['fix', 'patch', 'investigate', 'secure', 'protect', 'mitigate', 'prevent', 'respond', 'handle'];
+const FINANCIAL = ['payment', 'transaction', 'billing', 'financial'];
+const SECURE_ACTS = ['encrypt', 'secure', 'protect', 'audit'];
+/** @param {string} s lowercased prompt */
+function isCritical(s) {
+  if (hasAny(s, CRIT_INCIDENT) || hasAny(s, CRIT_COMPLIANCE)) return true;
+  if (hasAny(s, SEC_CONTEXT) && hasAny(s, CRIT_ACTIONS)) return true;     // e.g. "fix security ..."
+  if (hasAny(s, FINANCIAL) && hasAny(s, SECURE_ACTS)) return true;        // e.g. "encrypt payment ..."
+  return false;
+}
+// --- tiered keyword scoring: verb "weight" reflects how much work the ask implies ---
+const COMPLEX_VERBS = new Set(['implement', 'design', 'architect', 'refactor', 'integrate', 'migrate', 'build', 'create', 'develop', 'construct', 'engineer', 'establish', 'transform', 'overhaul', 'rewrite', 'restructure', 'optimize']);
+const ANALYSIS_VERBS = new Set(['explain', 'compare', 'analyze', 'debug', 'understand', 'investigate', 'describe', 'evaluate', 'review', 'examine', 'diagnose', 'trace', 'why', 'difference']);
+const MEDIUM_VERBS = new Set(['add', 'update', 'fix', 'write', 'change', 'modify', 'remove', 'delete', 'improve', 'enhance', 'extend', 'convert', 'rename', 'move', 'test', 'configure', 'setup', 'set', 'enable', 'disable']);
+const SIMPLE_VERBS = new Set(['what', 'show', 'list', 'get', 'find', 'print', 'check', 'read', 'open', 'run', 'where', 'which', 'display', 'view', 'see', 'tell', 'give', 'name', 'count', 'who', 'when', 'is']);
+const SCOPE = new Set(['all', 'every', 'entire', 'across', 'comprehensive', 'complete', 'codebase', 'project', 'system', 'application', 'full', 'whole', 'everything', 'throughout']);
+const DOMAINS = new Set(['security', 'performance', 'scalability', 'reliability', 'testing', 'authentication', 'authorization', 'caching', 'logging', 'monitoring', 'database', 'api', 'frontend', 'backend', 'infrastructure', 'deployment', 'docker', 'kubernetes', 'microservices', 'distributed']);
+// Feature/system nouns: paired with an action verb they signal a real feature, not a one-liner.
+const COMPLEX_NOUNS = new Set(['authentication', 'authorization', 'oauth', 'jwt', 'session', 'sessions', 'pipeline', 'workflow', 'notification', 'notifications', 'dashboard', 'crud', 'plugin', 'framework', 'websocket', 'websockets', 'realtime', 'pagination', 'search', 'validation', 'migration', 'schema', 'registration']);
+const SEQUENCE = ['first', 'then', 'after that', 'finally', 'next', 'afterwards', 'subsequently', 'step by step', 'and then', 'as well as', 'additionally', 'along with'];
+const CONSTRAINTS = ['without breaking', 'without changing', 'maintaining', 'ensuring', 'backward compatible', 'backwards compatible', 'must not', 'should not', 'preserve', 'without affecting'];
+const SIMPLE_THRESHOLD = 11;
+const MEDIUM_THRESHOLD = 28;
+// Bound the text the assessor scans. Several signal patterns contain `.*`, which can backtrack
+// quadratically on adversarial input (e.g. "integrate "×N with no "with" — O(n²)). Complexity is
+// fully determined by the opening of a goal, so capping the working string makes every scan
+// linear-bounded and removes the DoS surface for callers that pass untrusted end-user text.
+const MAX_ASSESS_LEN = 4000;
+/**
+ * @typedef {object} ComplexityResult
+ * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
+ * @property {number} score - Raw heuristic score (100 for a critical override).
+ * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
+ * @property {string[]} signals - Which signals fired, for transparency/debugging.
+ */
+/**
+ * Assess the complexity of a goal/prompt from its text alone (no LLM).
+ * @param {string} prompt - The goal to classify.
+ * @returns {ComplexityResult}
+ */
+function assessComplexity(prompt) {
+  if (typeof prompt !== 'string' || !prompt.trim()) {
+    return { level: 'simple', score: 0, needsPlanning: false, signals: ['empty'] };
+  }
+  const text = prompt.trim().slice(0, MAX_ASSESS_LEN);
+  const lower = text.toLowerCase();
+  if (isCritical(lower)) {
+    return { level: 'critical', score: 100, needsPlanning: true, signals: ['critical_override'] };
+  }
+  const words = wordSet(lower);
+  const wc = text.split(/\s+/).length;
+  /** @type {string[]} */
+  const signals = [];
+  let score = 0;
+  /** @param {number} n @param {string} [sig] */
+  const add = (n, sig) => { score += n; if (sig) signals.push(sig); };
+  const complex = has(words, COMPLEX_VERBS);
+  const analysis = has(words, ANALYSIS_VERBS);
+  const medium = has(words, MEDIUM_VERBS);
+  const simple = has(words, SIMPLE_VERBS);
+  const scope = has(words, SCOPE);
+  const domains = has(words, DOMAINS);
+  if (complex.length) add(complex.length * 25, 'complex_verbs');
+  if (analysis.length) add(Math.min(analysis.length * 15, 20), 'analysis_verbs');
+  if (medium.length) add(medium.length * 12, 'medium_verbs');
+  if (simple.length) add(-Math.min(simple.length * 3, 10), 'simple_verbs');
+  if (scope.length) add(scope.length * 12, 'scope');
+  if (domains.length > 1) add(domains.length * 8, 'multi_domain');
+  else if (domains.length) add(5, 'domain');
+  // feature noun + an action verb => a real feature (pushes single-verb asks up a tier)
+  const nouns = has(words, COMPLEX_NOUNS);
+  if (nouns.length && (medium.length || complex.length)) add(nouns.length * 10, 'feature_nouns');
+  if (/\b(?:dark\s*mode|feature\s*flags?|real-?time|end-?to-?end|full-?stack)\b/.test(lower)) add(12, 'feature_pattern');
+  if (/\bintegrate\b.*\bwith\b/.test(lower)) add(15, 'integration');
+  if (/\b(?:improve|optimize)\s+(?:performance|speed|efficiency)\b/.test(lower)
+    && !/\b(?:this|the)\s+(?:function|method|query|loop)\b/.test(lower)) add(15, 'open_ended');
+  // structure / sequencing — multi-step asks are heavier
+  const seq = SEQUENCE.filter(m => lower.includes(m)).length;
+  if (seq) add(seq * 8, 'sequence');
+  const constraints = CONSTRAINTS.filter(m => lower.includes(m)).length;
+  if (constraints) add(constraints * 12, 'constraints');
+  const listItems = (text.match(/(?:^|\n)\s*(?:\d+[.)]|[-*])\s/g) || []).length;
+  if (listItems) add(listItems * 9, 'list');
+  // length: longer prompts trend more complex
+  if (wc > 40) add(15, 'long'); else if (wc > 20) add(10); else if (wc > 10) add(5);
+  // architectural / open-ended questions read simple lexically but imply design work
+  if (/\bbest\s+(?:way|approach|practice|architecture)\b|\barchitecture\s+for\b|\bhow (?:should|can|do) (?:we|i)\b.*\b(?:handle|design|implement|build)\b/.test(lower)) add(15, 'design_question');
+  if (/^(?:what is|where is|which|who|is there)\b/.test(lower)) add(-8, 'simple_question');
+  // a trivial edit (typo, comment, log line, version bump) stays simple even though its verb is
+  // "medium" weight — gated to trivial OBJECTS so real medium work isn't wrongly demoted.
+  const trivial = /\b(?:fix|add|remove|delete|rename|update|change)\b.*\b(?:typo|comment|console\.?log|variable|version|line|import)\b/.test(lower)
+    || /\bwrite\s+(?:a|the)\s+function\s+(?:that|to|which)\b/.test(lower);
+  if (trivial && !complex.length && !scope.length && wc <= 10) {
+    score = Math.min(score, SIMPLE_THRESHOLD);
+    signals.push('trivial_edit');
+  }
+  const level = score <= SIMPLE_THRESHOLD ? 'simple' : score <= MEDIUM_THRESHOLD ? 'medium' : 'complex';
+  return { level, score, needsPlanning: level !== 'simple', signals };
+}
+module.exports = { assessComplexity };
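
The comment on `esc` above says a keyword such as "c++" or ".net" must stay literal inside the word-boundary match. A small sketch of what the escaping changes; `esc` and `hasAny` are reproduced from the file so the demo is self-contained, while `hasAnyRaw` is a hypothetical unescaped variant added only for comparison:

```javascript
'use strict';

// Same two helpers as in complexity.js, reproduced so the demo runs on its own.
const esc = (k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const hasAny = (s, list) => list.some(k => new RegExp(`\\b${esc(k)}\\b`).test(s));

// Hypothetical unescaped variant, for comparison only.
const hasAnyRaw = (s, list) => list.some(k => new RegExp(`\\b${k}\\b`).test(s));

console.log(esc('.net'));                                // \.net
console.log(hasAny('built on asp.net core', ['.net']));  // true
console.log(hasAny('a net gain', ['.net']));             // false: the dot stays literal
console.log(hasAnyRaw('a net gain', ['.net']));          // true: "." matched the space

// An unescaped "c++" is not even a valid pattern in JavaScript.
let threw = false;
try { hasAnyRaw('use c++', ['c++']); } catch (e) { threw = e instanceof SyntaxError; }
console.log(threw);                                      // true
```

Escaping keeps the keyword literal, but `\b` still needs a word character on one side of each edge. A keyword that ends in punctuation, such as `c++`, would therefore match in `c++11` but not in `use c++ here`, which may matter if such entries are ever added to the lists.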

package/tools/spawn.d.ts CHANGED

@@ -42,9 +42,16 @@ export type SpawnChildOptions = {
      */
     cliPath?: string | undefined;
     /**
-     * - Force-kill child after this many ms.
+     * - Force-kill child after this many ms (wall-clock hard ceiling).
      */
     timeoutMs?: number | undefined;
+    /**
+     * - Force-kill child after this many ms with NO output on either
+     * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+     * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+     * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
+     */
+    idleTimeoutMs?: number | undefined;
     /**
      * - bareagent Stream — child:stderr events get re-emitted here.
      */
@@ -56,13 +63,16 @@ export type Stream = import("../src/stream").Stream;
  *
  * @param {object} [options]
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
+ *   (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
  */
 export function createSpawnTool(options?: {
     cliPath?: string | undefined;
     timeoutMs?: number | undefined;
+    idleTimeoutMs?: number | undefined;
     stream?: import("../src/stream").Stream | undefined;
 }): {
     tool: import("../types").ToolDef;
@@ -86,12 +96,16 @@ export function createSpawnTool(options?: {
  * @property {string} [config] - Path to a bareagent config JSON file.
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
  * @property {string} [cliPath] - Override the bareagent CLI path.
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
+ *   stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+ *   line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+ *   (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
  *
  * @param {SpawnChildOptions} [opts]
  */
-export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: SpawnChildOptions): {
+export function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream }?: SpawnChildOptions): {
     wait: () => Promise<{
         text: any;
         usage: any;
@@ -100,6 +114,7 @@ export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: Spawn
         events: ChildEvent[];
         exitCode: any;
         signal: any;
+        idleKilled: boolean;
     }>;
     onLine: (fn: (event: ChildEvent) => void) => () => void;
     kill: (sig?: NodeJS.Signals) => void;
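
The `idleKilled` flag added to the `wait()` result lets a caller tell a watchdog kill apart from other abnormal exits. A minimal sketch, assuming only result fields that appear in this diff (`text`, `error`, `exitCode`, `signal`, `idleKilled`); the `describeOutcome` helper and its labels are hypothetical:

```javascript
'use strict';

// Hypothetical helper: labels a spawnChild().wait() result.
function describeOutcome(result) {
  if (result.idleKilled) return 'idle-timeout'; // watchdog fired: no output for idleTimeoutMs
  if (result.error) return 'failed';            // spawn error, or exit without loop:done
  return 'ok';
}

// Hand-built results shaped like the ones in this diff (not real child output).
const idle = {
  text: '',
  error: '[spawn] child killed after idle timeout (no output; signal=SIGTERM)',
  exitCode: null, signal: 'SIGTERM', idleKilled: true,
};
const crashed = {
  text: '',
  error: '[spawn] child exited (code=1, signal=null) without loop:done',
  exitCode: 1, signal: null, idleKilled: false,
};
const done = { text: 'hello', exitCode: 0, signal: null, idleKilled: false };

console.log(describeOutcome(idle), describeOutcome(crashed), describeOutcome(done));
```

Checking `idleKilled` first is a design choice. Per the spawn.js changes the flag is also carried on a result that did reach `loop:done`, so a caller could instead prefer a non-empty `text` when both are present.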

package/tools/spawn.js CHANGED

@@ -66,12 +66,16 @@ function resolveCliPath() {
  * @property {string} [config] - Path to a bareagent config JSON file.
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
  * @property {string} [cliPath] - Override the bareagent CLI path.
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
+ *   stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+ *   line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+ *   (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
  *
  * @param {SpawnChildOptions} [opts]
  */
-function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
+function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream } = {}) {
   if (typeof config !== 'string' || !config) {
     throw new Error('[spawn] requires { config: <path> }');
   }
@@ -104,10 +108,29 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
     if (i >= 0) lineSubscribers.splice(i, 1);
   }; };
+  // Idle watchdog: kill the child after `idleTimeoutMs` of silence on BOTH stdio streams.
+  // Distinct from `timeoutMs` (wall-clock ceiling): this catches a child that is alive but stuck
+  // producing nothing — the "no activity in stderr" hang — without punishing one doing slow work,
+  // since `armIdle()` resets on every line. Armed at spawn so a child that never emits is caught too.
+  let idleTimer = null;
+  let idleKilled = false;
+  const armIdle = () => {
+    if (!idleTimeoutMs || idleTimeoutMs <= 0) return;
+    if (idleTimer) clearTimeout(idleTimer);
+    idleTimer = setTimeout(() => {
+      idleKilled = true;
+      try { child.kill('SIGTERM'); } catch { /* already dead */ }
+      setTimeout(() => { try { child.kill('SIGKILL'); } catch { /* already dead */ } }, 5000).unref();
+    }, idleTimeoutMs);
+    idleTimer.unref();
+  };
+  armIdle();
   // stdout — JSONL events from the child loop
   const outRl = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
   outRl.on('line', (line) => {
     if (!line) return;
+    armIdle();
     let event;
     try { event = JSON.parse(line); }
     catch {
@@ -130,6 +153,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
   const errRl = readline.createInterface({ input: child.stderr, crlfDelay: Infinity });
   errRl.on('line', (line) => {
     if (!line) return;
+    armIdle();
     const event = { type: 'child:stderr', text: line, ts: new Date().toISOString() };
     events.push(event);
     if (stream) {
@@ -157,18 +181,20 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
   const exitPromise = new Promise((resolve) => {
     child.on('exit', async (code, signal) => {
       if (killTimer) clearTimeout(killTimer);
+      if (idleTimer) clearTimeout(idleTimer);
       // Drain stdio readlines before resolving — last line may still be in buffer.
       await Promise.all([outClosePromise, errClosePromise]);
-      resolve({ code, signal });
+      resolve({ code, signal, idleKilled });
     });
     child.on('error', (err) => {
       if (killTimer) clearTimeout(killTimer);
+      if (idleTimer) clearTimeout(idleTimer);
       resolve({ code: null, signal: null, spawnError: err });
     });
   });
   async function wait() {
-    const { code, signal, spawnError } = await exitPromise;
+    const { code, signal, spawnError, idleKilled: idle } = await exitPromise;
     if (spawnError) {
       return {
         text: '',
@@ -178,6 +204,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
         events,
         exitCode: null,
         signal: null,
+        idleKilled: false,
       };
     }
     // Pluck the final loop:done event — that's the canonical child result.
@@ -194,18 +221,21 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
         events,
         exitCode: code,
         signal,
+        idleKilled: !!idle,
       };
     }
     // No loop:done — child exited abnormally or never reached the LLM.
     const errEvent = events.find(e => e.type === 'loop:error' || e.type === 'error');
+    const idleNote = idle ? `[spawn] child killed after idle timeout (no output; signal=${signal})` : null;
     return {
       text: '',
       usage: { inputTokens: 0, outputTokens: 0 },
       cost: 0,
-      error: errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
+      error: idleNote || errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
       events,
       exitCode: code,
       signal,
+      idleKilled: !!idle,
     };
   }
@@ -222,7 +252,9 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  *
  * @param {object} [options]
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
+ *   (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
  */
@@ -250,6 +282,7 @@ function createSpawnTool(options = {}) {
         input,
         cliPath: options.cliPath,
         timeoutMs: options.timeoutMs ?? DEFAULT_TIMEOUT_MS,
+        idleTimeoutMs: options.idleTimeoutMs,
         stream: options.stream,
       });
       return await handle.wait();
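
The re-armable idle timer added in this file can be looked at in isolation. A minimal sketch of the same pattern outside `spawnChild`, assuming plain `setTimeout` semantics; `createIdleWatchdog`, its `timers` parameter (present so the timer can be faked), and the returned `touch`/`stop`/`wasFired` names are hypothetical:

```javascript
'use strict';

const realTimers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (t) => clearTimeout(t),
};

// Calls onIdle once idleMs pass without a touch(); 0/undefined disables it.
function createIdleWatchdog(idleMs, onIdle, timers = realTimers) {
  let timer = null;
  let fired = false;
  const arm = () => {
    if (!idleMs || idleMs <= 0) return;   // opt-in, mirroring idleTimeoutMs
    if (timer) timers.clear(timer);       // restart the countdown
    timer = timers.set(() => { fired = true; onIdle(); }, idleMs);
    // Do not let the watchdog alone keep the process alive.
    if (timer && typeof timer.unref === 'function') timer.unref();
  };
  const stop = () => { if (timer) timers.clear(timer); timer = null; };
  arm();                                  // armed at creation, so total silence is caught too
  return { touch: arm, stop, wasFired: () => fired };
}

// Usage with a fake timer so the demo is deterministic.
let pending = null;
const fakeTimers = {
  set: (fn) => { pending = fn; return {}; },
  clear: () => { pending = null; },
};
let kills = 0;
const dog = createIdleWatchdog(1000, () => { kills += 1; }, fakeTimers);
dog.touch();   // a line arrived: countdown restarts
pending();     // simulate the idle period elapsing
console.log(kills, dog.wasFired()); // 1 true
```

In the real code the `onIdle` step is the SIGTERM followed by a delayed SIGKILL, and `stop` corresponds to clearing `idleTimer` in the `exit` and `error` handlers.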