bare-agent 0.15.0 → 0.16.1

package/README.md CHANGED
@@ -66,8 +66,9 @@ Every piece works alone — take what you need, ignore the rest.
66
66
 
67
67
  | Component | What it does |
68
68
  |---|---|
69
- | **Loop** | Think → act → observe → repeat. Calls any LLM, executes your tools, loops until done. Returns estimated USD cost per run. Governance via `Loop({ policy })`: wire bareguard's `Gate` through `wireGate(gate)` and every tool call (native, MCP, browsing, mobile) traverses one chokepoint with per-caller `ctx` routing. Bareguard owns the audit log, budget caps, and halt decisions; Loop respects the verdict. Context engineering via `Loop({ assemble })`: a per-round `assemble(msgs, ctx)` chokepoint to recall/compress/trim the window sent to the model (the seam litectx plugs into); returns a view, the canonical transcript stays intact, fail-open. The exported `unitAssembler`/`toUnits`/`fromUnits` adapter lets a consumer work over a neutral unit `{id, role, content, kind, pinned, atomic, tokensApprox}` — bareagent owns the grammar (atomic tool-pair bundling, pinned system/task, a pairing seatbelt), the consumer owns content + relevance. The CE function reads its inputs from the per-run `ctx` — litectx's budget-fitter uses `ctx.budget` (and `ctx.task`), so you **must** populate it via `run(msgs, tools, { ctx })`: an unset `ctx.budget` means the fitter has no budget, keeps everything, and returns the window unchanged — a silent no-op, not a bug (see `examples/litectx-assemble.mjs`). For summary-window compaction the Loop also lends a provider-bound `ctx.summarize(excerpt) => Promise<string>` (R-C6): the consumer owns when/what to summarize and the splice, bareagent makes the one model call (counted against the budget via `onLlmResult`, tagged `kind:'summarize'`). For an unbounded long-running agent there's the **destructive** counterpart `Loop({ trim })` (RT-2) — a per-round bound on the canonical transcript that evicts old turns *after* harvesting them; wire it with the exported `unitTrimmer({ trim, onHarvest, policy })` over litectx's `trim` verb (harvest-before-evict, fail-open; `harvestKey` gives the stable upsert id), opt-in (requires a consumer on litectx ≥ 0.16.0). `onError` + `loop:error` surface every silent-ish failure (callback throw, Checkpoint timeout) |
69
+ | **Loop** | Think → act → observe → repeat. Calls any LLM, runs your tools, loops until done, returns estimated USD cost per run. Three opt-in seams hook external libraries in without touching your code: **`policy`** (governance: wire bareguard for one gated chokepoint over every tool call), **`assemble`** (context engineering — recall/compress/trim the window per round; the seam [litectx](https://npmjs.com/package/litectx) plugs into, transcript untouched), and **`trim`** (destructively bound the transcript for unbounded runs, harvesting turns before eviction). Each is a single chokepoint, fail-open, off by default. `onError` + `loop:error` surface every silent failure |
70
70
  | **Planner** | Break a goal into a step DAG via LLM. Built-in caching (`cacheTTL`) |
71
+ | **assessComplexity** | Pure-code pre-planner (no LLM): rates a goal `simple`/`medium`/`complex`/`critical` from its text via keyword scoring + a critical safety override. `needsPlanning` gates whether to spend a Planner pass; `critical` flags security/production/compliance work for extra scrutiny. Free, instant, debuggable via `signals` |
71
72
  | **runPlan** | Execute steps in parallel waves. Dependency-aware, failure propagation, per-step retry |
72
73
  | **Retry** | Exponential/linear backoff with jitter. Respects `err.retryable` |
73
74
  | **CircuitBreaker** | Fail fast after N errors. Auto-recovers after cooldown. Per-key isolation |
@@ -78,13 +79,13 @@ Every piece works alone — take what you need, ignore the rest.
78
79
  | **Scheduler** | Cron (`0 9 * * 1-5`) or relative (`2h`, `30m`). Persisted jobs survive restarts |
79
80
  | **Stream** | Structured event emitter. Pipe as JSONL, subscribe in-process, or custom transport |
80
81
  | **Errors** | Typed hierarchy — `ProviderError`, `ToolError`, `TimeoutError`, `CircuitOpenError`, `ValidationError`. Halt decisions (turn cap, budget cap, content rules) come from bareguard, not Loop |
81
- | **bareguard adapter** | `wireGate(gate)` returns `{ policy, onLlmResult, onToolResult, filterTools, formatDeny }`: one-line wiring to bareguard's `Gate`. `policy` maps gate decisions to Loop's policy contract; `onLlmResult` + `onToolResult` forward every LLM and tool result to `gate.record` (so `budget.maxCostUsd` covers token-only workloads); `filterTools` drops denied tools from the catalog before the LLM ever sees them. Halt-severity decisions throw a typed `HaltError` and Loop exits cleanly, never leaking `[HALT: ...]` to the LLM. `require('bare-agent/bareguard')` |
82
+ | **bareguard adapter** | `wireGate(gate)` returns `{ policy, onLlmResult, onToolResult, filterTools, formatDeny }`: one-line wiring to bareguard's `Gate`. Routes every LLM + tool result through the gate so budget caps cover token-heavy workloads, drops denied tools before the LLM ever sees them, and turns halts into a clean exit. `require('bare-agent/bareguard')` |
82
83
  | **Browsing** | Web navigation, clicking, typing, reading via `barebrowse` (17 tools). Two modes: library tools (inline snapshots, pass to Loop) or CLI session (disk-based snapshots, token-efficient for multi-step flows). Optional `assess` tool (privacy scan) when `wearehere` is installed |
83
84
  | **Mobile** | Android + iOS device control via `baremobile`. Same two modes: library tools (`createMobileTools` — action tools auto-return snapshots) or CLI session (`baremobile` CLI — disk-based snapshots) |
84
85
  | **Shell** | Cross-platform `shell_read`, `shell_grep`, `shell_run` (argv, no shell), `shell_exec` (raw shell). Pure Node — no `grep`/`rg`/`findstr` dependency. Injection-proof `shell_run` for policy-gated use |
85
- | **MCP Bridge** | Auto-discover MCP servers from IDE configs (Claude Code, Cursor, etc.), expose as bareagent tools. Static allow/deny via `.mcp-bridge.json`, `systemContext` for LLM awareness. Runtime policy lives in `Loop({ policy })` — one hook for MCP + native tools alike. Returns both bulk `tools` (one per MCP tool) and `metaTools` (`mcp_discover` + `mcp_invoke` for token-thrifty access to large catalogs). Connecting runs a server's `command` (which may come from a cwd `.mcp.json`): pass `confirmServer` to vet each before it spawns — otherwise the bridge warns naming every command it runs. Every RPC is time-bounded (`timeout` for the handshake, `callTimeout` for `tools/call`), and a server that breaks its stdin pipe fails the connection instead of crashing the host. Zero deps |
86
- | **Spawn** | Fork a child bareagent process as a specialist agent. LLM-callable form blocks until child exits; library form returns a handle (`wait`, `onLine`, `kill`). One JSONL channel per child — child stderr captured and re-emitted as `child:stderr` events on the parent stream. Threads `BAREGUARD_AUDIT_PATH` / `BAREGUARD_PARENT_RUN_ID` / `BAREGUARD_BUDGET_FILE` / `BAREGUARD_SPAWN_DEPTH` so the family stitches into one audit + budget. `bareguard ^0.2.0` adds `spawn.ratePerMinute` + `limits.maxDepth` per-family caps |
87
- | **Defer** | Append a `{action, when}` record to a JSONL queue for a separate waker (cron / systemd timer / `examples/wake.sh`) to fire later. Two-phase governance: emit-time `gate.check` on the `defer` action; fire-time `gate.check` on the inner action when the waker re-invokes. `bareguard ^0.2.0` adds `defer.ratePerMinute` family-wide cap |
86
+ | **MCP Bridge** | Auto-discover MCP servers from your $HOME/IDE configs (Claude Code, Cursor, etc.) and expose them as bareagent tools: bulk (`tools`) or token-thrifty meta-tools (`mcp_discover` + `mcp_invoke`) for large catalogs. Same `Loop({ policy })` hook governs MCP and native tools alike. The project-cwd `.mcp.json` is **opt-in** (untrusted-repo safety); vet every server spawn with `confirmServer`; every RPC is time-bounded. Zero deps |
87
+ | **Spawn** | Fork a child bareagent as a specialist agent: LLM-callable (blocks until exit) or a library handle (`wait`, `onLine`, `kill`). The whole family stitches into one audit log + budget; `bareguard ^0.2.0` adds per-family rate + depth caps. `timeoutMs` caps wall-clock, opt-in `idleTimeoutMs` kills a child gone silent (slow-but-working children survive) |
88
+ | **Defer** | Queue a `{action, when}` record for a separate waker (cron / `examples/wake.sh`) to fire later. Governed twice: once when emitted, again when it fires. `bareguard ^0.2.0` adds a family-wide rate cap |
88
89
 
89
90
  **Providers:** OpenAI-compatible (OpenAI, OpenRouter, Groq, vLLM, LM Studio), Anthropic, Ollama, CLIPipe (any CLI tool via stdin/stdout with real-time streaming), Fallback, or bring your own (one method: `generate`). All return the same shape — swap freely. The OpenAI provider warns if it would send your key over plaintext `http://` to a non-loopback host (use `https`, or drop `apiKey` for keyless local endpoints).
90
91
 
@@ -94,6 +95,8 @@ Every piece works alone — take what you need, ignore the rest.
94
95
 
95
96
  **Deps:** 1 required (`bareguard ^0.2.0` for governance — single-gate policy + audit + budget + per-family rate caps). Optional: `cron-parser` (cron expressions), `better-sqlite3` (SQLite store), `barebrowse` (web browsing), `baremobile` (Android + iOS device control), `wearehere` (privacy assessment via barebrowse).
96
97
 
98
+ This table is the map, not the manual — per-component wiring and API detail live in the [Integration Guide](bareagent.context.md) and [Usage Guide](docs/02-features/usage-guide.md).
99
+
97
100
  ---
98
101
 
99
102
  ## Recipes
@@ -1,7 +1,7 @@
1
1
  # bareagent — Integration Guide
2
2
 
3
3
  > For AI assistants and developers wiring bareagent into a project.
4
- > v0.15.0 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
4
+ > v0.16.1 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
5
5
  >
6
6
  > Full human guide with composition examples, design philosophy, and recipes: [Usage Guide](docs/02-features/usage-guide.md)
7
7
 
@@ -14,7 +14,7 @@ npm install bare-agent
14
14
  ```
15
15
 
16
16
  Eight entry points:
17
- - `require('bare-agent')` — Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), **unitTrimmer, harvestKey** (the destructive `trim` seam adapter — RT-2 harvest-before-evict, needs a consumer on litectx ≥ 0.16.0), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
17
+ - `require('bare-agent')` — Loop, Planner, **assessComplexity** (pure-code no-LLM pre-planner → `{level, score, needsPlanning, signals}`), StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), **unitTrimmer, harvestKey** (the destructive `trim` seam adapter — RT-2 harvest-before-evict, needs a consumer on litectx ≥ 0.16.0), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
18
18
  - `require('bare-agent/errors')` — same error classes via a stable subpath (v0.10.1+) for adopters who want to import only the error surface
19
19
  - `require('bare-agent/providers')` — OpenAI, Anthropic, Ollama, CLIPipe, Fallback (the canonical short names; `*Provider` aliases — `OpenAIProvider`, `AnthropicProvider`, etc. — are also exported and match the class names, so either destructure works, v0.12.1+)
20
20
  - `require('bare-agent/stores')` — SQLite (FTS5), JsonFile
@@ -31,6 +31,8 @@ Eight entry points:
31
31
  |---|---|
32
32
  | Call an LLM with tools and get a result | Loop + a Provider |
33
33
  | Break a goal into steps | Planner + a Provider |
34
+ | Size a goal before planning (no LLM) | assessComplexity — `needsPlanning` gates a Planner pass |
35
+ | Kill a spawned child that hangs silently | createSpawnTool / spawnChild `{ idleTimeoutMs }` |
34
36
  | Execute a step DAG with parallelism | runPlan + executeFn |
35
37
  | Track task state (pending/running/done/failed) | StateMachine |
36
38
  | Run agent turns on a schedule (cron, timers) | Scheduler |
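Reviewer note on the `idleTimeoutMs` row: the difference between a wall-clock cap (`timeoutMs`) and an idle cap (`idleTimeoutMs`) can be sketched as two separate deadlines, one measured from start and one reset by every line of child output. The helper below is a hypothetical illustration, not bareagent's implementation; `createChildDeadlines`, its `now` clock parameter, and the `'timeout'`/`'idle'` return values are all assumptions made for the example.

```javascript
'use strict';

// Hypothetical helper (not bareagent's code): tracks the two deadlines separately.
function createChildDeadlines({ timeoutMs, idleTimeoutMs, now = Date.now }) {
  const startedAt = now();
  let lastOutputAt = startedAt;
  return {
    // Call whenever the child emits a line of output: only the idle clock resets.
    onOutput() { lastOutputAt = now(); },
    // Returns 'timeout', 'idle', or null when the child may keep running.
    check() {
      const t = now();
      if (timeoutMs != null && t - startedAt >= timeoutMs) return 'timeout';
      if (idleTimeoutMs != null && t - lastOutputAt >= idleTimeoutMs) return 'idle';
      return null;
    },
  };
}
```

Under these assumptions, a child that prints something every few seconds never trips the idle deadline, which is the "slow-but-working children survive" behaviour the README describes, while a child that goes silent is flagged well before the wall-clock cap.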
@@ -249,16 +251,17 @@ invoked tool name lives in `args.name`. To deny specific MCP tools when
249
251
  using metaTools, use `tools.denyArgPatterns: { mcp_invoke: [/"name":"linear_admin_/] }`
250
252
  or `content.denyPatterns` over the serialized action.
251
253
 
252
- **Vetting server commands (v0.11.0).** Connecting to a server runs its
253
- `command`, and discovery reads `.mcp.json` from the cwd (an untrusted
254
- repo) as well as your home/IDE configs. Pass `confirmServer(name, def)
255
- => boolean` to `createMCPBridge` to approve each server **before its
256
- command is spawned** (return `false` to skip it; a throw fails closed).
257
- Default trusts all discovered servers (unchanged behavior). **When no
258
- `confirmServer` is set, the bridge prints a one-time warning naming every
259
- command it is about to spawn** (before the first spawn, discovery included),
260
- so a cwd `.mcp.json` can't run a command unannounced; `confirmServer` is
261
- still how you actually *gate* it.
254
+ **Vetting server commands (v0.11.0; cwd default tightened v0.16.1).** Connecting
255
+ to a server runs its `command`. **Default discovery now scans only your
256
+ $HOME/IDE configs, NOT the project-cwd `./.mcp.json`** (v0.16.1): a checked-in
257
+ config in an untrusted repo would otherwise auto-spawn arbitrary commands. To
258
+ include the project config, pass `createMCPBridge({ includeProjectConfig: true })`,
259
+ or a `confirmServer` hook (which implies it, since the hook vets every command).
260
+ Explicitly-passed `configPaths` are honored verbatim. Pass `confirmServer(name,
261
+ def) => boolean` to approve each server **before its command is spawned** (return
262
+ `false` to skip it; a throw fails closed). When no `confirmServer` is set, the
263
+ bridge still trusts all *discovered* servers and prints a one-time warning naming
264
+ every command it is about to spawn — `confirmServer` is how you actually *gate* it.
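Reviewer note: the `confirmServer(name, def) => boolean` hook described above could be as small as a command allowlist. This is a minimal sketch under stated assumptions: `makeConfirmServer` is a hypothetical helper, and `def.command` is assumed to hold the executable the bridge would spawn (the guide only says a server definition has a `command`).

```javascript
'use strict';

// Hypothetical allowlist factory; `def.command` is an assumed field name.
function makeConfirmServer(allowedCommands) {
  const allowed = new Set(allowedCommands);
  return function confirmServer(name, def) {
    const ok = Boolean(def) && allowed.has(def.command);
    if (!ok) console.warn(`skipping MCP server "${name}": command not allowlisted`);
    return ok; // false skips the server before its command is spawned
  };
}

const confirmServer = makeConfirmServer(['npx', 'uvx']);
// Passed to the bridge as: createMCPBridge({ confirmServer })
```

Per the paragraph above, supplying the hook also brings the project-cwd config back into discovery, so every server from `./.mcp.json` would pass through this check before anything runs.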
262
265
 
263
266
  **RPC timeouts (Unreleased).** Every JSON-RPC round-trip is now bounded, so a
264
267
  server that never answers can't hang the bridge or the loop. `opts.timeout`
@@ -552,13 +555,13 @@ new CLIPipe({ command: 'claude', args: ['--print'], systemPromptFlag: '--system-
552
555
  new CLIPipe({ command: 'ollama', args: ['run', 'llama3.2'] })
553
556
  ```
554
557
 
555
- All return `{ text, toolCalls, usage: { inputTokens, outputTokens } }`. CLIPipe always returns `toolCalls: []` and zero usage (CLI tools don't report tokens).
558
+ All return `{ text, toolCalls, usage: { inputTokens, outputTokens }, model? }`. The optional `model` (v0.16.1+) is the id the response was produced by — Loop prefers it over `provider.model` for cost accounting. CLIPipe always returns `toolCalls: []` and zero usage (CLI tools don't report tokens), and omits `model`.
556
559
 
557
560
  **Error body (v0.11.0):** on an HTTP error the OpenAI/Anthropic/Ollama providers throw a `ProviderError` whose `message` carries the upstream error string. The full parsed response is **not** attached to `err.body` by default (so an unexpected field can't leak through logs that dump the error object). Pass `{ exposeErrorBody: true }` to attach it for debugging.
558
561
 
559
562
  **Plaintext-key warning (Unreleased):** the OpenAI provider's `baseUrl` accepts `http://` (for local/OpenAI-compatible endpoints), but a `Bearer` key sent over plaintext http to a **non-loopback** host is exposed on the wire. The provider now warns once when that happens. Loopback hosts (`localhost`/`127.0.0.0/8`/`::1` — local proxies, Ollama-style endpoints) stay silent, since that's the legitimate keyless-local case. The header is **not** stripped (some local proxies want a key), so use `https` for any remote endpoint, or drop `apiKey` when the local endpoint needs none.
560
563
 
561
- **Cost estimation:** Loop automatically estimates USD cost per run based on model and token usage. The `cost` field appears in every `loop.run()` result and in `loop:done` stream events. Pricing covers OpenAI and Anthropic models; unknown models use a default average. To adjust rates, edit `COST_PER_1K` at the top of `src/loop.js`.
564
+ **Cost estimation:** Loop automatically estimates USD cost per run based on model and token usage. The `cost` field appears in every `loop.run()` result and in `loop:done` stream events. Pricing covers OpenAI and Anthropic models; unknown models use a default average. To adjust rates, edit `COST_PER_1K` at the top of `src/loop.js`. The model is resolved as `result.model || provider.model` (v0.16.1+) — providers now echo the model in their `generate()` result, so cost accounting holds even when `provider.model` is absent or varies per response, e.g. behind `FallbackProvider` or `CircuitBreaker.wrapProvider` (the wrapper also preserves `model`/`name` passthrough props). Wire `onLlmResult` (via `wireGate`) and a `budget.maxCostUsd` cap then halts on token-heavy workloads too.
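Reviewer note: the resolution rule `result.model || provider.model` is easiest to see with a wrapper that has no fixed model of its own. The sketch below is illustrative only; `resolveModel` and the sample model ids are hypothetical names, not identifiers from `src/loop.js`.

```javascript
'use strict';

// Mirrors the documented rule: prefer the model echoed in the generate() result,
// fall back to the provider's static `model` property.
function resolveModel(result, provider) {
  return (result && result.model) || (provider && provider.model) || undefined;
}

// A fallback-style wrapper has no single `model`; each response names its own.
const fallbackLike = {}; // no `model` property
const fromPrimary = { text: 'a', toolCalls: [], usage: { inputTokens: 1, outputTokens: 1 }, model: 'primary-model' };
const fromBackup = { text: 'b', toolCalls: [], usage: { inputTokens: 1, outputTokens: 1 }, model: 'backup-model' };

console.log(resolveModel(fromPrimary, fallbackLike)); // primary-model
console.log(resolveModel(fromBackup, fallbackLike));  // backup-model
```

With only `provider.model` to go on, both responses above would resolve to nothing and fall into the default average rate the paragraph mentions.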
562
565
 
563
566
  ## Store options
564
567
 
@@ -640,7 +643,7 @@ All error classes extend `Error` — `instanceof Error` always works. The `retry
640
643
  ## Key contracts
641
644
 
642
645
  - Loop builds messages in OpenAI format internally. Each provider normalizes to its native format.
643
- - `provider.generate(messages, tools, options)` must return `{ text, toolCalls, usage }`.
646
+ - `provider.generate(messages, tools, options)` must return `{ text, toolCalls, usage }` (and may include `model` for accurate cost accounting).
644
647
  - Store must implement `store(content, metadata) → id`, `search(query, options) → [{id, content, metadata, score}]`, `get(id)`, `delete(id)`.
645
648
  - Components are independent: Memory doesn't know Loop, Scheduler doesn't know Planner. You compose them.
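Reviewer note: the `generate` contract above is small enough to show whole. The following is a minimal bring-your-own provider, assuming only the documented return shape; `EchoProvider` and the `'echo-1'` model id are hypothetical and exist purely for illustration.

```javascript
'use strict';

// Hypothetical provider: echoes the last user message back as the reply.
class EchoProvider {
  constructor({ model = 'echo-1' } = {}) {
    this.model = model;
  }

  // Messages arrive in OpenAI format; `tools` and `options` are accepted but unused here.
  async generate(messages, tools = [], options = {}) {
    const lastUser = [...messages].reverse().find(m => m.role === 'user');
    return {
      text: lastUser ? String(lastUser.content) : '',
      toolCalls: [],                                // no tool use in this sketch
      usage: { inputTokens: 0, outputTokens: 0 },   // nothing to report, like CLIPipe
      model: this.model,                            // optional, helps cost accounting
    };
  }
}
```

Because every provider returns the same shape, an object like this should be swappable wherever a provider is expected, though that has not been exercised against the package here.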
646
649
 
package/index.d.ts CHANGED
@@ -1,5 +1,6 @@
1
1
  import { Loop } from "./src/loop";
2
2
  import { Planner } from "./src/planner";
3
+ import { assessComplexity } from "./src/complexity";
3
4
  import { StateMachine } from "./src/state";
4
5
  import { Scheduler } from "./src/scheduler";
5
6
  import { Checkpoint } from "./src/checkpoint";
@@ -22,4 +23,4 @@ import { TimeoutError } from "./src/errors";
22
23
  import { ValidationError } from "./src/errors";
23
24
  import { CircuitOpenError } from "./src/errors";
24
25
  import { HaltError } from "./src/errors";
25
- export { Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };
26
+ export { Loop, Planner, assessComplexity, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };
package/index.js CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  const { Loop } = require('./src/loop');
4
4
  const { Planner } = require('./src/planner');
5
+ const { assessComplexity } = require('./src/complexity');
5
6
  const { StateMachine } = require('./src/state');
6
7
  const { Scheduler } = require('./src/scheduler');
7
8
  const { Checkpoint } = require('./src/checkpoint');
@@ -25,6 +26,7 @@ const {
25
26
  module.exports = {
26
27
  Loop,
27
28
  Planner,
29
+ assessComplexity,
28
30
  StateMachine,
29
31
  Scheduler,
30
32
  Checkpoint,
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "bare-agent",
3
- "version": "0.15.0",
3
+ "version": "0.16.1",
4
4
  "files": [
5
5
  "index.js",
6
6
  "index.d.ts",
@@ -57,14 +57,18 @@ export class CircuitBreaker {
57
57
  */
58
58
  reset(key?: string): void;
59
59
  /**
60
- * Wrap a provider so generate() goes through the circuit breaker.
61
- * @param {{ generate: (...args: any[]) => Promise<any> }} provider - Provider with generate().
60
+ * Wrap a provider so generate() goes through the circuit breaker. Passthrough props (e.g.
61
+ * `model`, `name`) are preserved so Loop cost accounting, which reads `provider.model`,
62
+ * keeps working through the wrapper.
63
+ * @param {{ generate: (...args: any[]) => Promise<any>, [k: string]: any }} provider - Provider with generate().
62
64
  * @param {string} [key] - Circuit key.
63
- * @returns {{ generate: (...args: any[]) => Promise<any> }} Wrapped provider with generate().
65
+ * @returns {{ generate: (...args: any[]) => Promise<any>, [k: string]: any }} Wrapped provider.
64
66
  */
65
67
  wrapProvider(provider: {
66
68
  generate: (...args: any[]) => Promise<any>;
69
+ [k: string]: any;
67
70
  }, key?: string): {
68
71
  generate: (...args: any[]) => Promise<any>;
72
+ [k: string]: any;
69
73
  };
70
74
  }
@@ -112,13 +112,16 @@ class CircuitBreaker {
112
112
  }
113
113
 
114
114
  /**
115
- * Wrap a provider so generate() goes through the circuit breaker.
116
- * @param {{ generate: (...args: any[]) => Promise<any> }} provider - Provider with generate().
115
+ * Wrap a provider so generate() goes through the circuit breaker. Passthrough props (e.g.
116
+ * `model`, `name`) are preserved so Loop cost accounting, which reads `provider.model`,
117
+ * keeps working through the wrapper.
118
+ * @param {{ generate: (...args: any[]) => Promise<any>, [k: string]: any }} provider - Provider with generate().
117
119
  * @param {string} [key] - Circuit key.
118
- * @returns {{ generate: (...args: any[]) => Promise<any> }} Wrapped provider with generate().
120
+ * @returns {{ generate: (...args: any[]) => Promise<any>, [k: string]: any }} Wrapped provider.
119
121
  */
120
122
  wrapProvider(provider, key) {
121
123
  return {
124
+ ...provider,
122
125
  /** @param {...any} args */
123
126
  generate: (...args) => this.call(() => provider.generate(...args), key),
124
127
  };
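Reviewer note: the one-line `...provider` change is what carries `model` and `name` through the wrapper. The standalone sketch below reproduces the pattern outside the class so it can be run in isolation; the free function `wrapProvider(provider, call)` and the `passthrough` stand-in for `this.call` are assumptions of the example, not the package's API.

```javascript
'use strict';

// Stand-in for CircuitBreaker#wrapProvider; `call` plays the role of `this.call`.
function wrapProvider(provider, call) {
  return {
    ...provider, // copies own enumerable props such as `model` and `name`
    generate: (...args) => call(() => provider.generate(...args)),
  };
}

const calls = [];
const passthrough = fn => { calls.push('call'); return fn(); };

const plain = { model: 'm-1', name: 'demo', generate: async () => ({ text: 'ok' }) };
const wrapped = wrapProvider(plain, passthrough);

// Caveat: spread skips prototype members, so a getter defined on a class is not copied.
class GetterProvider {
  get model() { return 'from-getter'; }
  async generate() { return { text: 'ok' }; }
}
const wrappedGetter = wrapProvider(new GetterProvider(), passthrough);
```

One design consequence worth noting: object spread copies only own enumerable properties, so a provider that exposes `model` through a prototype getter would still lose it in the wrapper. Echoing `model` in the `generate()` result, as the guide describes for v0.16.1, covers that case.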
@@ -0,0 +1,31 @@
1
+ export type ComplexityResult = {
2
+ /**
3
+ * - Assessed complexity tier.
4
+ */
5
+ level: "simple" | "medium" | "complex" | "critical";
6
+ /**
7
+ * - Raw heuristic score (100 for a critical override).
8
+ */
9
+ score: number;
10
+ /**
11
+ * - false for `simple`, true otherwise — the routing hint.
12
+ */
13
+ needsPlanning: boolean;
14
+ /**
15
+ * - Which signals fired, for transparency/debugging.
16
+ */
17
+ signals: string[];
18
+ };
19
+ /**
20
+ * @typedef {object} ComplexityResult
21
+ * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
22
+ * @property {number} score - Raw heuristic score (100 for a critical override).
23
+ * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
24
+ * @property {string[]} signals - Which signals fired, for transparency/debugging.
25
+ */
26
+ /**
27
+ * Assess the complexity of a goal/prompt from its text alone (no LLM).
28
+ * @param {string} prompt - The goal to classify.
29
+ * @returns {ComplexityResult}
30
+ */
31
+ export function assessComplexity(prompt: string): ComplexityResult;
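Reviewer note: the `ComplexityResult` shape is meant to drive a routing decision. A minimal sketch of that routing follows, with the assessor and planner injected so the block runs without the package installed. `routeGoal` and its `needsReview` flag are hypothetical; in real use `assess` would presumably be `assessComplexity` and `plan` would be `planner.plan`.

```javascript
'use strict';

// `assess` must return { level, needsPlanning, signals }; `plan` returns a step list.
async function routeGoal(goal, { assess, plan }) {
  const { level, needsPlanning, signals } = assess(goal);
  // Simple goals run single-shot; anything heavier earns a Planner pass.
  const steps = needsPlanning ? await plan(goal) : [{ id: 's1', action: goal }];
  // `critical` is only flagged here; what extra scrutiny means is up to the caller.
  return { level, signals, steps, needsReview: level === 'critical' };
}
```

Keeping the assessor injectable also makes the routing easy to unit-test with canned results, independent of how the keyword lists evolve.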
@@ -0,0 +1,149 @@
1
+ 'use strict';
2
+
3
+ /**
4
+ * Keyword complexity assessor — a fast, pure-code "pre-planner" that classifies a goal as
5
+ * simple / medium / complex / critical from its text alone, with NO LLM call. Ported (concept,
6
+ * not line-for-line) from Aurora's SOAR keyword assessor. It exists to drive a routing decision:
7
+ * a `simple` goal can run single-shot; `medium`+ warrants a Planner pass; `critical` (security,
8
+ * production, compliance, financial) flags work that deserves extra scrutiny (e.g. a checkpoint /
9
+ * adversarial verification) before acting.
10
+ *
11
+ * const { level, needsPlanning } = assessComplexity(goal);
12
+ * const steps = needsPlanning ? await planner.plan(goal) : [{ id: 's1', action: goal }];
13
+ *
14
+ * Concept, deliberately lightweight: a critical-keyword override, tiered action-verb scoring
15
+ * (simple verbs subtract, complex verbs add the most), feature nouns + scope + structure signals,
16
+ * and two calibrated thresholds. It is a heuristic — transparent and debuggable via `signals`, not
17
+ * a model. On the upstream validation corpus it lands ~89% (the fuller LLM-free original ~95%);
18
+ * the gap is long-tail ambiguity ("add a button" is genuinely context-dependent).
19
+ */
20
+
21
+ const has = (/** @type {Set<string>} */ words, /** @type {Set<string>} */ set) =>
22
+ [...set].filter(w => words.has(w));
23
+ const wordSet = (/** @type {string} */ s) => new Set(s.match(/\b\w+\b/g) || []);
24
+ // Escape regex metacharacters so a keyword can't break (or alter) the word-boundary match — the
25
+ // lists below are plain words today, but a future entry like "c++" or ".net" must stay literal.
26
+ const esc = (/** @type {string} */ k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
27
+ const hasAny = (/** @type {string} */ s, /** @type {string[]} */ list) =>
28
+ list.some(k => new RegExp(`\\b${esc(k)}\\b`).test(s));
29
+
30
+ // --- critical safety override: high-stakes work jumps straight to the top tier ---
31
+ const CRIT_INCIDENT = ['emergency', 'outage', 'breach', 'vulnerability', 'exploit', 'corruption', 'data loss', 'incident', 'penetration'];
32
+ const CRIT_COMPLIANCE = ['gdpr', 'hipaa', 'pci', 'compliance', 'regulation'];
33
+ const SEC_CONTEXT = ['security', 'production', 'authentication', 'authorization'];
34
+ const CRIT_ACTIONS = ['fix', 'patch', 'investigate', 'secure', 'protect', 'mitigate', 'prevent', 'respond', 'handle'];
35
+ const FINANCIAL = ['payment', 'transaction', 'billing', 'financial'];
36
+ const SECURE_ACTS = ['encrypt', 'secure', 'protect', 'audit'];
37
+
38
+ /** @param {string} s lowercased prompt */
39
+ function isCritical(s) {
40
+ if (hasAny(s, CRIT_INCIDENT) || hasAny(s, CRIT_COMPLIANCE)) return true;
41
+ if (hasAny(s, SEC_CONTEXT) && hasAny(s, CRIT_ACTIONS)) return true; // e.g. "fix security ..."
42
+ if (hasAny(s, FINANCIAL) && hasAny(s, SECURE_ACTS)) return true; // e.g. "encrypt payment ..."
43
+ return false;
44
+ }
45
+
46
+ // --- tiered keyword scoring: verb "weight" reflects how much work the ask implies ---
47
+ const COMPLEX_VERBS = new Set(['implement', 'design', 'architect', 'refactor', 'integrate', 'migrate', 'build', 'create', 'develop', 'construct', 'engineer', 'establish', 'transform', 'overhaul', 'rewrite', 'restructure', 'optimize']);
48
+ const ANALYSIS_VERBS = new Set(['explain', 'compare', 'analyze', 'debug', 'understand', 'investigate', 'describe', 'evaluate', 'review', 'examine', 'diagnose', 'trace', 'why', 'difference']);
49
+ const MEDIUM_VERBS = new Set(['add', 'update', 'fix', 'write', 'change', 'modify', 'remove', 'delete', 'improve', 'enhance', 'extend', 'convert', 'rename', 'move', 'test', 'configure', 'setup', 'set', 'enable', 'disable']);
50
+ const SIMPLE_VERBS = new Set(['what', 'show', 'list', 'get', 'find', 'print', 'check', 'read', 'open', 'run', 'where', 'which', 'display', 'view', 'see', 'tell', 'give', 'name', 'count', 'who', 'when', 'is']);
51
+ const SCOPE = new Set(['all', 'every', 'entire', 'across', 'comprehensive', 'complete', 'codebase', 'project', 'system', 'application', 'full', 'whole', 'everything', 'throughout']);
52
+ const DOMAINS = new Set(['security', 'performance', 'scalability', 'reliability', 'testing', 'authentication', 'authorization', 'caching', 'logging', 'monitoring', 'database', 'api', 'frontend', 'backend', 'infrastructure', 'deployment', 'docker', 'kubernetes', 'microservices', 'distributed']);
53
+ // Feature/system nouns: paired with an action verb they signal a real feature, not a one-liner.
54
+ const COMPLEX_NOUNS = new Set(['authentication', 'authorization', 'oauth', 'jwt', 'session', 'sessions', 'pipeline', 'workflow', 'notification', 'notifications', 'dashboard', 'crud', 'plugin', 'framework', 'websocket', 'websockets', 'realtime', 'pagination', 'search', 'validation', 'migration', 'schema', 'registration']);
55
+ const SEQUENCE = ['first', 'then', 'after that', 'finally', 'next', 'afterwards', 'subsequently', 'step by step', 'and then', 'as well as', 'additionally', 'along with'];
56
+ const CONSTRAINTS = ['without breaking', 'without changing', 'maintaining', 'ensuring', 'backward compatible', 'backwards compatible', 'must not', 'should not', 'preserve', 'without affecting'];
57
+
58
+ const SIMPLE_THRESHOLD = 11;
59
+ const MEDIUM_THRESHOLD = 28;
60
+
61
+ // Bound the text the assessor scans. Several signal patterns contain `.*`, which can backtrack
62
+ // quadratically on adversarial input (e.g. "integrate "×N with no "with" — O(n²)). Complexity is
63
+ // fully determined by the opening of a goal, so capping the working string makes every scan
64
+ // linear-bounded and removes the DoS surface for callers that pass untrusted end-user text.
65
+ const MAX_ASSESS_LEN = 4000;
66
+
67
+ /**
68
+ * @typedef {object} ComplexityResult
69
+ * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
70
+ * @property {number} score - Raw heuristic score (100 for a critical override).
71
+ * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
72
+ * @property {string[]} signals - Which signals fired, for transparency/debugging.
73
+ */
74
+
75
+ /**
76
+ * Assess the complexity of a goal/prompt from its text alone (no LLM).
77
+ * @param {string} prompt - The goal to classify.
78
+ * @returns {ComplexityResult}
79
+ */
80
+ function assessComplexity(prompt) {
81
+ if (typeof prompt !== 'string' || !prompt.trim()) {
82
+ return { level: 'simple', score: 0, needsPlanning: false, signals: ['empty'] };
83
+ }
84
+ const text = prompt.trim().slice(0, MAX_ASSESS_LEN);
85
+ const lower = text.toLowerCase();
86
+ if (isCritical(lower)) {
87
+ return { level: 'critical', score: 100, needsPlanning: true, signals: ['critical_override'] };
88
+ }
89
+
90
+ const words = wordSet(lower);
91
+ const wc = text.split(/\s+/).length;
92
+ /** @type {string[]} */
93
+ const signals = [];
94
+ let score = 0;
95
+ /** @param {number} n @param {string} [sig] */
96
+ const add = (n, sig) => { score += n; if (sig) signals.push(sig); };
97
+
98
+ const complex = has(words, COMPLEX_VERBS);
99
+ const analysis = has(words, ANALYSIS_VERBS);
100
+ const medium = has(words, MEDIUM_VERBS);
101
+ const simple = has(words, SIMPLE_VERBS);
102
+ const scope = has(words, SCOPE);
103
+ const domains = has(words, DOMAINS);
104
+
105
+ if (complex.length) add(complex.length * 25, 'complex_verbs');
106
+ if (analysis.length) add(Math.min(analysis.length * 15, 20), 'analysis_verbs');
107
+ if (medium.length) add(medium.length * 12, 'medium_verbs');
108
+ if (simple.length) add(-Math.min(simple.length * 3, 10), 'simple_verbs');
109
+ if (scope.length) add(scope.length * 12, 'scope');
110
+ if (domains.length > 1) add(domains.length * 8, 'multi_domain');
111
+ else if (domains.length) add(5, 'domain');
112
+
113
+ // feature noun + an action verb => a real feature (pushes single-verb asks up a tier)
114
+ const nouns = has(words, COMPLEX_NOUNS);
115
+ if (nouns.length && (medium.length || complex.length)) add(nouns.length * 10, 'feature_nouns');
116
+ if (/\b(?:dark\s*mode|feature\s*flags?|real-?time|end-?to-?end|full-?stack)\b/.test(lower)) add(12, 'feature_pattern');
117
+ if (/\bintegrate\b.*\bwith\b/.test(lower)) add(15, 'integration');
118
+ if (/\b(?:improve|optimize)\s+(?:performance|speed|efficiency)\b/.test(lower)
119
+ && !/\b(?:this|the)\s+(?:function|method|query|loop)\b/.test(lower)) add(15, 'open_ended');
120
+
121
+ // structure / sequencing — multi-step asks are heavier
122
+ const seq = SEQUENCE.filter(m => lower.includes(m)).length;
123
+ if (seq) add(seq * 8, 'sequence');
124
+ const constraints = CONSTRAINTS.filter(m => lower.includes(m)).length;
125
+ if (constraints) add(constraints * 12, 'constraints');
126
+ const listItems = (text.match(/(?:^|\n)\s*(?:\d+[.)]|[-*])\s/g) || []).length;
127
+ if (listItems) add(listItems * 9, 'list');
128
+
129
+ // length: longer prompts trend more complex
130
+ if (wc > 40) add(15, 'long'); else if (wc > 20) add(10); else if (wc > 10) add(5);
131
+
132
+ // architectural / open-ended questions read simple lexically but imply design work
133
+ if (/\bbest\s+(?:way|approach|practice|architecture)\b|\barchitecture\s+for\b|\bhow (?:should|can|do) (?:we|i)\b.*\b(?:handle|design|implement|build)\b/.test(lower)) add(15, 'design_question');
134
+ if (/^(?:what is|where is|which|who|is there)\b/.test(lower)) add(-8, 'simple_question');
135
+
136
+ // a trivial edit (typo, comment, log line, version bump) stays simple even though its verb is
137
+ // "medium" weight — gated to trivial OBJECTS so real medium work isn't wrongly demoted.
138
+ const trivial = /\b(?:fix|add|remove|delete|rename|update|change)\b.*\b(?:typo|comment|console\.?log|variable|version|line|import)\b/.test(lower)
139
+ || /\bwrite\s+(?:a|the)\s+function\s+(?:that|to|which)\b/.test(lower);
140
+ if (trivial && !complex.length && !scope.length && wc <= 10) {
141
+ score = Math.min(score, SIMPLE_THRESHOLD);
142
+ signals.push('trivial_edit');
143
+ }
144
+
145
+ const level = score <= SIMPLE_THRESHOLD ? 'simple' : score <= MEDIUM_THRESHOLD ? 'medium' : 'complex';
146
+ return { level, score, needsPlanning: level !== 'simple', signals };
147
+ }
148
+
149
+ module.exports = { assessComplexity };
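A minimal sketch of the last two stages of `assessComplexity`: counting list items and mapping a score to a level. The regex and the 9-points-per-item weight are copied from the function above. The two threshold values are assumptions for illustration only, since the real `SIMPLE_THRESHOLD` and `MEDIUM_THRESHOLD` are defined outside this excerpt, and `toLevel` / `countListItems` are hypothetical helper names.

```javascript
// ASSUMED values for illustration; the package defines its own thresholds elsewhere.
const SIMPLE_THRESHOLD = 20;
const MEDIUM_THRESHOLD = 50;

// Same tiering expression as the final lines of assessComplexity.
function toLevel(score) {
  return score <= SIMPLE_THRESHOLD ? 'simple' : score <= MEDIUM_THRESHOLD ? 'medium' : 'complex';
}

// Same list-item regex as above: numbered ("1." or "1)") or bulleted ("-" or "*") lines.
function countListItems(text) {
  return (text.match(/(?:^|\n)\s*(?:\d+[.)]|[-*])\s/g) || []).length;
}

const prompt = 'Ship the release:\n1. run tests\n2. tag the build\n- notify the team';
const items = countListItems(prompt); // three list lines
const score = items * 9;              // list weight from the diff: 9 points per item
console.log(items, score, toLevel(score)); // 3 27 medium (under the assumed thresholds)
```

Because `needsPlanning` is simply `level !== 'simple'`, a prompt with three list items would cross into planning under these assumed thresholds even with no matching verbs.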
package/src/loop.js CHANGED
@@ -331,7 +331,7 @@ class Loop {
331
331
  const startedAt = Date.now();
332
332
  const result = await loop.provider.generate(prompt, [], { temperature: 0, ...genOpts });
333
333
  const usage = (result && result.usage) || null;
334
- const model = loop.provider.model || null;
334
+ const model = (result && result.model) || loop.provider.model || null;
335
335
  const cost = estimateCost(model, usage);
336
336
  if (cost !== null) totalCost += cost;
337
337
  loop._safeEmit({ type: 'loop:summarize', data: { usage, costUsd: cost, durationMs: Date.now() - startedAt } });
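The model-resolution change in this hunk (and in the main round loop in the next one) can be sketched as a small precedence rule: prefer the model the response reports, fall back to the provider's configured model, else `null`. `resolveModel` is a hypothetical helper name, since the Loop inlines the expression, and the `wrapped` object is a made-up stand-in for a wrapper that exposes no `.model` of its own.

```javascript
// Hypothetical helper; the Loop inlines this expression rather than calling a function.
function resolveModel(result, provider) {
  return (result && result.model) || (provider && provider.model) || null;
}

// Made-up stand-in for a wrapper that forwards generate() but has no .model property.
const wrapped = {
  async generate() {
    return { text: 'ok', toolCalls: [], model: 'model-b', usage: { inputTokens: 10, outputTokens: 2 } };
  },
};

wrapped.generate().then((result) => {
  console.log(resolveModel(result, wrapped));                      // "model-b" (reported by the response)
  console.log(resolveModel({ text: 'ok' }, { model: 'model-a' })); // "model-a" (provider fallback)
  console.log(resolveModel({ text: 'ok' }, wrapped));              // null (neither side reports a model)
});
```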
@@ -421,7 +421,9 @@ class Loop {
421
421
  }
422
422
 
423
423
  lastUsage = result.usage || lastUsage;
424
- const model = this.provider.model || null;
424
+ // Prefer the model the response reports (robust when provider.model is absent or varies per
425
+ // response — e.g. FallbackProvider, or a CircuitBreaker-wrapped provider that drops .model).
426
+ const model = result.model || this.provider.model || null;
425
427
  const roundCost = estimateCost(model, lastUsage);
426
428
  if (roundCost !== null) totalCost += roundCost;
427
429
 
@@ -74,7 +74,11 @@ export type RpcClient = {
74
74
  *
75
75
  * @param {object} [opts]
76
76
  * @param {string} [opts.bridgePath] - Path to .mcp-bridge.json. Default: .mcp-bridge.json in cwd.
77
- * @param {string[]} [opts.configPaths] - IDE config paths for discovery.
77
+ * @param {string[]} [opts.configPaths] - Explicit config paths for discovery. When given, honored
78
+ * verbatim. When omitted, only the trusted $HOME/IDE defaults are scanned (NOT `./.mcp.json`).
79
+ * @param {boolean} [opts.includeProjectConfig=false] - Also scan the project-cwd `./.mcp.json`
80
+ * during default discovery. Off by default: a project config in an untrusted repo can auto-spawn
81
+ * arbitrary commands. Implied true when a `confirmServer` hook is present (it vets each command).
78
82
  * @param {string[]} [opts.servers] - Limit to these server names.
79
83
  * @param {number} [opts.timeout=15000] - Per-server handshake timeout in ms (initialize + tools/list).
80
84
  * @param {number} [opts.callTimeout=120000] - Per-invocation timeout in ms for tools/call. Bounds a
@@ -82,16 +86,17 @@ export type RpcClient = {
82
86
  * @param {boolean} [opts.refresh=false] - Force re-discovery regardless of TTL.
83
87
  * @param {(name: string, def: ServerDef) => boolean | Promise<boolean>} [opts.confirmServer]
84
88
  * Vet each discovered server BEFORE its `command` is spawned. Connecting to an
85
- * MCP server runs its command, and discovery reads configs from the cwd (a
86
- * `.mcp.json` in an untrusted repo) as well as the user's home/IDE configs.
87
- * Return false to skip a server (its command is never executed). A throw is
88
- * treated as a deny (fail-closed). Default: every discovered server is trusted
89
- * (unchanged behavior); pass this to gate command execution.
89
+ * MCP server runs its command. Return false to skip a server (its command is
90
+ * never executed). A throw is treated as a deny (fail-closed). Default: every
91
+ * discovered server is trusted; pass this to gate command execution. Presence
92
+ * of this hook also opts default discovery into the project-cwd `./.mcp.json`,
93
+ * since each command is then vetted regardless of source.
90
94
  * @returns {Promise<{tools: ToolDef[], metaTools?: ToolDef[], servers: string[], systemContext: string, denied: DeniedTool[], errors?: Array<{server: string, error: string}>, close: Function}>}
91
95
  */
92
96
  export function createMCPBridge(opts?: {
93
97
  bridgePath?: string | undefined;
94
98
  configPaths?: string[] | undefined;
99
+ includeProjectConfig?: boolean | undefined;
95
100
  servers?: string[] | undefined;
96
101
  timeout?: number | undefined;
97
102
  callTimeout?: number | undefined;
@@ -110,10 +115,15 @@ export function createMCPBridge(opts?: {
110
115
  close: Function;
111
116
  }>;
112
117
  /**
113
- * @param {string[]} [configPaths]
118
+ * @param {string[]} [configPaths] - Explicit config paths. When given, honored verbatim (the
119
+ * caller owns the choice). When omitted, the trusted $HOME/IDE defaults are scanned.
120
+ * @param {{ includeProjectConfig?: boolean }} [opts] - When no explicit `configPaths` are given,
121
+ * set `includeProjectConfig: true` to also scan `./.mcp.json`. Default false — see PROJECT_CONFIG_PATH.
114
122
  * @returns {Map<string, ServerDef>}
115
123
  */
116
- export function discoverServers(configPaths?: string[]): Map<string, ServerDef>;
124
+ export function discoverServers(configPaths?: string[], { includeProjectConfig }?: {
125
+ includeProjectConfig?: boolean;
126
+ }): Map<string, ServerDef>;
117
127
  /**
118
128
  * Build the LLM-callable meta-tool surface from a fully-connected bridge.
119
129
  * Shares the underlying tool array and RPC clients with the bulk surface —
package/src/mcp-bridge.js CHANGED
@@ -62,8 +62,12 @@ const { ToolError } = require('./errors');
62
62
 
63
63
  // --- Config discovery (from IDE configs) ---
64
64
 
65
- const DEFAULT_CONFIG_PATHS = [
66
- () => join(process.cwd(), '.mcp.json'), // project
65
+ // The project-cwd `.mcp.json` is the untrusted-repo vector: discovering it auto-spawns its
66
+ // `command`, so cloning a hostile repo and running an agent inside it would be arbitrary code
67
+ // execution. It is therefore NOT in the trusted defaults — these are user/IDE-authored configs
68
+ // under $HOME, which the user owns. The project config is opt-in (see `includeProjectConfig`).
69
+ const PROJECT_CONFIG_PATH = () => join(process.cwd(), '.mcp.json');
70
+ const TRUSTED_CONFIG_PATHS = [
67
71
  () => join(homedir(), '.mcp.json'), // home
68
72
  () => join(homedir(), '.claude', 'mcp_servers.json'), // Claude Code
69
73
  () => join(homedir(), '.config', 'Claude', 'claude_desktop_config.json'), // Claude Desktop
@@ -71,11 +75,22 @@ const DEFAULT_CONFIG_PATHS = [
71
75
  ];
72
76
 
73
77
  /**
74
- * @param {string[]} [configPaths]
78
+ * @param {string[]} [configPaths] - Explicit config paths. When given, honored verbatim (the
79
+ * caller owns the choice). When omitted, the trusted $HOME/IDE defaults are scanned.
80
+ * @param {{ includeProjectConfig?: boolean }} [opts] - When no explicit `configPaths` are given,
81
+ * set `includeProjectConfig: true` to also scan `./.mcp.json`. Default false — see PROJECT_CONFIG_PATH.
75
82
  * @returns {Map<string, ServerDef>}
76
83
  */
77
- function discoverServers(configPaths) {
78
- const paths = configPaths || DEFAULT_CONFIG_PATHS.map(fn => fn());
84
+ function discoverServers(configPaths, { includeProjectConfig = false } = {}) {
85
+ let paths;
86
+ if (configPaths) {
87
+ paths = configPaths;
88
+ } else {
89
+ paths = TRUSTED_CONFIG_PATHS.map(fn => fn());
90
+ // Project config kept at highest precedence (front) when explicitly opted in — preserves the
91
+ // historical "project overrides home" ordering for callers that want it.
92
+ if (includeProjectConfig) paths.unshift(PROJECT_CONFIG_PATH());
93
+ }
79
94
  /** @type {Map<string, ServerDef>} */
80
95
  const servers = new Map();
81
96
 
@@ -594,7 +609,11 @@ function buildMetaTools(tools, discoveredAt) {
594
609
  *
595
610
  * @param {object} [opts]
596
611
  * @param {string} [opts.bridgePath] - Path to .mcp-bridge.json. Default: .mcp-bridge.json in cwd.
597
- * @param {string[]} [opts.configPaths] - IDE config paths for discovery.
612
+ * @param {string[]} [opts.configPaths] - Explicit config paths for discovery. When given, honored
613
+ * verbatim. When omitted, only the trusted $HOME/IDE defaults are scanned (NOT `./.mcp.json`).
614
+ * @param {boolean} [opts.includeProjectConfig=false] - Also scan the project-cwd `./.mcp.json`
615
+ * during default discovery. Off by default: a project config in an untrusted repo can auto-spawn
616
+ * arbitrary commands. Implied true when a `confirmServer` hook is present (it vets each command).
598
617
  * @param {string[]} [opts.servers] - Limit to these server names.
599
618
  * @param {number} [opts.timeout=15000] - Per-server handshake timeout in ms (initialize + tools/list).
600
619
  * @param {number} [opts.callTimeout=120000] - Per-invocation timeout in ms for tools/call. Bounds a
@@ -602,11 +621,11 @@ function buildMetaTools(tools, discoveredAt) {
602
621
  * @param {boolean} [opts.refresh=false] - Force re-discovery regardless of TTL.
603
622
  * @param {(name: string, def: ServerDef) => boolean | Promise<boolean>} [opts.confirmServer]
604
623
  * Vet each discovered server BEFORE its `command` is spawned. Connecting to an
605
- * MCP server runs its command, and discovery reads configs from the cwd (a
606
- * `.mcp.json` in an untrusted repo) as well as the user's home/IDE configs.
607
- * Return false to skip a server (its command is never executed). A throw is
608
- * treated as a deny (fail-closed). Default: every discovered server is trusted
609
- * (unchanged behavior); pass this to gate command execution.
624
+ * MCP server runs its command. Return false to skip a server (its command is
625
+ * never executed). A throw is treated as a deny (fail-closed). Default: every
626
+ * discovered server is trusted; pass this to gate command execution. Presence
627
+ * of this hook also opts default discovery into the project-cwd `./.mcp.json`,
628
+ * since each command is then vetted regardless of source.
610
629
  * @returns {Promise<{tools: ToolDef[], metaTools?: ToolDef[], servers: string[], systemContext: string, denied: DeniedTool[], errors?: Array<{server: string, error: string}>, close: Function}>}
611
630
  */
612
631
  async function createMCPBridge(opts = {}) {
@@ -632,9 +651,10 @@ async function createMCPBridge(opts = {}) {
632
651
  catch { return false; }
633
652
  };
634
653
 
635
- // Connecting to a server EXECUTES its `command`, which can originate from a
636
- // cwd-relative .mcp.json in an untrusted repo (discoverServers reads project
637
- // configs). With no confirmServer hook, every discovered command runs unvetted.
654
+ // Connecting to a server EXECUTES its `command`. The project-cwd `.mcp.json` is excluded from
655
+ // default discovery (see TRUSTED_CONFIG_PATHS / includeProjectConfig), so the untrusted-repo
656
+ // path is closed by default; this warning covers the residual case where home/IDE configs (or an
657
+ // explicit opt-in) contribute commands and no confirmServer hook is present to vet them.
638
658
  // Warn ONCE per call, BEFORE the first spawn — and the first spawn is the
639
659
  // discovery phase on a cold/refresh run, not the main-connect phase below.
640
660
  let warnedUnvetted = false;
@@ -654,8 +674,11 @@ async function createMCPBridge(opts = {}) {
654
674
  const needsRefresh = opts.refresh || !config || isExpired(config);
655
675
 
656
676
  if (needsRefresh) {
657
- // Discover from IDE configs
658
- const discovered = discoverServers(opts.configPaths);
677
+ // Discover from IDE configs. The project-cwd `.mcp.json` is excluded by default (untrusted-repo
678
+ // RCE vector); it is scanned only on explicit opt-in, or when a `confirmServer` hook is present
679
+ // (which vets every command before it spawns, so cwd discovery is safe under it).
680
+ const includeProjectConfig = opts.includeProjectConfig === true || !!confirmServer;
681
+ const discovered = discoverServers(opts.configPaths, { includeProjectConfig });
659
682
 
660
683
  if (discovered.size === 0 && !config) {
661
684
  return { tools: [], servers: [], systemContext: '', denied: [], close: async () => {} };
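The path-selection rule introduced in `discoverServers` and `createMCPBridge` can be sketched in isolation. This is a simplified illustration, not the package's code: `selectPaths` and `effectiveInclude` are hypothetical names, and the trusted list is cut down to a single entry.

```javascript
const { join } = require('node:path');
const { homedir } = require('node:os');

const PROJECT_CONFIG_PATH = () => join(process.cwd(), '.mcp.json');
// Cut down to one entry for the sketch; the real list has several $HOME/IDE paths.
const TRUSTED_CONFIG_PATHS = [() => join(homedir(), '.mcp.json')];

// Hypothetical helper mirroring the branch at the top of discoverServers.
function selectPaths(configPaths, { includeProjectConfig = false } = {}) {
  if (configPaths) return configPaths;                 // explicit paths are honored verbatim
  const paths = TRUSTED_CONFIG_PATHS.map((fn) => fn());
  if (includeProjectConfig) paths.unshift(PROJECT_CONFIG_PATH()); // project config goes first
  return paths;
}

// Hypothetical helper mirroring createMCPBridge: a confirmServer hook implies the opt-in.
function effectiveInclude(opts = {}) {
  return opts.includeProjectConfig === true || !!opts.confirmServer;
}

console.log(selectPaths());                                  // trusted defaults only
console.log(selectPaths(undefined, { includeProjectConfig: true })); // ./.mcp.json first
console.log(effectiveInclude({ confirmServer: () => true })); // true
```

Note that the opt-in only affects default discovery: a caller who passes explicit `configPaths` gets exactly those paths, whatever `includeProjectConfig` says.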
@@ -83,6 +83,7 @@ class AnthropicProvider {
83
83
  return {
84
84
  text,
85
85
  toolCalls,
86
+ model: data.model || this.model,
86
87
  usage: {
87
88
  inputTokens: data.usage?.input_tokens || 0,
88
89
  outputTokens: data.usage?.output_tokens || 0,
@@ -60,6 +60,7 @@ class OllamaProvider {
60
60
  ? JSON.parse(tc.function.arguments)
61
61
  : tc.function.arguments,
62
62
  })),
63
+ model: data.model || this.model,
63
64
  usage: {
64
65
  inputTokens: data.prompt_eval_count || 0,
65
66
  outputTokens: data.eval_count || 0,
@@ -72,6 +72,7 @@ class OpenAIProvider {
72
72
  name: tc.function.name,
73
73
  arguments: JSON.parse(tc.function.arguments),
74
74
  })),
75
+ model: data.model || this.model,
75
76
  usage: {
76
77
  inputTokens: data.usage?.prompt_tokens || 0,
77
78
  outputTokens: data.usage?.completion_tokens || 0,
package/tools/defer.js CHANGED
@@ -133,16 +133,19 @@ async function readQueue(queuePath) {
133
133
  const path = resolveQueuePath(queuePath);
134
134
  try {
135
135
  const text = await fsp.readFile(path, 'utf8');
136
- /** @type {Record<string, Record<string, any>>} */
137
- const records = {};
136
+ // Fold append-only status lines by id (latest wins). A Map — not a plain object — so an
137
+ // attacker-influenced id from a tampered queue file (e.g. "__proto__", "constructor") is just
138
+ // an ordinary key and cannot reach the prototype-setter path. Also require a string id.
139
+ /** @type {Map<string, Record<string, any>>} */
140
+ const records = new Map();
138
141
  for (const line of text.split('\n')) {
139
142
  if (!line.trim()) continue;
140
143
  let r;
141
144
  try { r = JSON.parse(line); } catch { continue; }
142
- if (!r.id) continue;
143
- records[r.id] = { ...records[r.id], ...r };
145
+ if (typeof r.id !== 'string' || !r.id) continue;
146
+ records.set(r.id, { ...records.get(r.id), ...r });
144
147
  }
145
- return Object.values(records);
148
+ return [...records.values()];
146
149
  } catch (/** @type {any} */ err) {
147
150
  if (err.code === 'ENOENT') return [];
148
151
  throw err;
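A self-contained sketch of the fold that `readQueue` now performs, showing why the `Map` matters. `foldById` is a hypothetical name, and the extra `!r` guard (for a line such as `null` that parses but is not an object) is an addition of this sketch, not something the diff shows.

```javascript
// Fold append-only JSONL status lines by id; the latest line for an id wins.
function foldById(text) {
  const records = new Map();
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    let r;
    try { r = JSON.parse(line); } catch { continue; }
    // The `!r` check is this sketch's own guard against lines like `null`.
    if (!r || typeof r.id !== 'string' || !r.id) continue;
    records.set(r.id, { ...records.get(r.id), ...r });
  }
  return [...records.values()];
}

const queue = [
  '{"id":"a","status":"pending"}',
  'not json',
  '{"id":"__proto__","status":"queued"}',
  '{"id":5,"status":"ignored"}',
  '{"id":"a","status":"done"}',
].join('\n');

console.log(foldById(queue).map((r) => `${r.id}:${r.status}`)); // [ 'a:done', '__proto__:queued' ]

// With a plain object, the same key hits the prototype setter instead of creating an entry.
const plain = {};
plain['__proto__'] = { id: '__proto__', status: 'queued' };
console.log(Object.values(plain).length); // 0
```

In the plain-object version the record is not stored as an own property at all, so it silently vanishes from `Object.values` while the accumulator's prototype is replaced; the `Map` treats `"__proto__"` as an ordinary key.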
@@ -0,0 +1 @@
1
+ export {};
@@ -0,0 +1,20 @@
1
+ 'use strict';
2
+
3
+ /**
4
+ * Worker thread for shell_grep's matching phase. Runs the (potentially expensive) regex search
5
+ * off the main thread so the parent can enforce a hard timeout via `worker.terminate()` — JS
6
+ * regex backtracking is uninterruptible on its own thread, so isolation is the only sound bound
7
+ * against catastrophic patterns that slip past the static guard. See tools/shell.js `grepPath`.
8
+ */
9
+
10
+ const { workerData, parentPort } = require('node:worker_threads');
11
+ const { _grepCore } = require('./shell.js');
12
+
13
+ // parentPort is non-null inside a worker, but the type is nullable for the main-thread case.
14
+ if (!parentPort) throw new Error('grep-worker.js must be run as a worker thread');
15
+ const port = parentPort;
16
+
17
+ _grepCore(workerData).then(
18
+ (result) => port.postMessage({ ok: true, result }),
19
+ (err) => port.postMessage({ ok: false, error: err && err.message ? err.message : String(err) }),
20
+ );
package/tools/shell.d.ts CHANGED
@@ -4,6 +4,12 @@ export type GrepArgs = {
4
4
  recursive?: boolean | undefined;
5
5
  maxMatches?: number | undefined;
6
6
  flags?: string | undefined;
7
+ /**
8
+ * - Hard wall-clock ceiling in ms (default 5000). The match runs in a
9
+ * worker thread; on overrun the worker is terminated and the call rejects, so a pattern that slips
10
+ * past `looksCatastrophic` can no longer hang the host event loop.
11
+ */
12
+ timeout?: number | undefined;
7
13
  };
8
14
  export type RunArgvArgs = {
9
15
  argv: string[];
@@ -29,3 +35,31 @@ export type ToolDef = import("../types").ToolDef;
29
35
  export function createShellTools(): {
30
36
  tools: ToolDef[];
31
37
  };
38
+ /**
39
+ * @typedef {object} GrepArgs
40
+ * @property {string} pattern
41
+ * @property {string} path
42
+ * @property {boolean} [recursive]
43
+ * @property {number} [maxMatches]
44
+ * @property {string} [flags]
45
+ * @property {number} [timeout] - Hard wall-clock ceiling in ms (default 5000). The match runs in a
46
+ * worker thread; on overrun the worker is terminated and the call rejects, so a pattern that slips
47
+ * past `looksCatastrophic` can no longer hang the host event loop.
48
+ */
49
+ /**
50
+ * The actual search: walk, skip binaries, regex-test each line. Runs in a worker thread (see
51
+ * grep-worker.js) so a runaway regex is killable via `worker.terminate()`. JS RegExp has no
52
+ * execution timeout and backtracking is uninterruptible on its own thread — isolation is the
53
+ * only sound bound (the static `looksCatastrophic` guard is a best-effort fast-reject, not a
54
+ * guarantee; a grounded bypass like `(a|a|a)*` passes it yet backtracks exponentially).
55
+ * @param {GrepArgs} args
56
+ */
57
+ export function _grepCore({ pattern, path: rawPath, recursive, maxMatches, flags }: GrepArgs): Promise<{
58
+ hits: {
59
+ file: string;
60
+ line: number;
61
+ text: string;
62
+ }[];
63
+ truncated: boolean;
64
+ fileCount: number;
65
+ }>;
package/tools/shell.js CHANGED
@@ -17,9 +17,11 @@
17
17
  const fs = require('node:fs/promises');
18
18
  const path = require('node:path');
19
19
  const { exec, execFile } = require('node:child_process');
20
+ const { Worker } = require('node:worker_threads');
20
21
 
21
22
  const DEFAULT_READ_MAX_BYTES = 256 * 1024; // 256 KB
22
23
  const DEFAULT_GREP_MAX_MATCHES = 200;
24
+ const DEFAULT_GREP_TIMEOUT_MS = 5_000; // hard ceiling on a single grep — bounds ReDoS
23
25
  const DEFAULT_EXEC_TIMEOUT_MS = 30_000;
24
26
  const DEFAULT_EXEC_MAX_BUFFER = 1024 * 1024; // 1 MB
25
27
 
@@ -137,18 +139,22 @@ function looksCatastrophic(pattern) {
137
139
  * @property {boolean} [recursive]
138
140
  * @property {number} [maxMatches]
139
141
  * @property {string} [flags]
142
+ * @property {number} [timeout] - Hard wall-clock ceiling in ms (default 5000). The match runs in a
143
+ * worker thread; on overrun the worker is terminated and the call rejects, so a pattern that slips
144
+ * past `looksCatastrophic` can no longer hang the host event loop.
140
145
  */
141
146
 
142
- /** @param {GrepArgs} args */
143
- async function grepPath({ pattern, path: rawPath, recursive = true, maxMatches, flags = 'i' }) {
147
+ /**
148
+ * The actual search: walk, skip binaries, regex-test each line. Runs in a worker thread (see
149
+ * grep-worker.js) so a runaway regex is killable via `worker.terminate()`. JS RegExp has no
150
+ * execution timeout and backtracking is uninterruptible on its own thread — isolation is the
151
+ * only sound bound (the static `looksCatastrophic` guard is a best-effort fast-reject, not a
152
+ * guarantee; a grounded bypass like `(a|a|a)*` passes it yet backtracks exponentially).
153
+ * @param {GrepArgs} args
154
+ */
155
+ async function _grepCore({ pattern, path: rawPath, recursive = true, maxMatches, flags = 'i' }) {
144
156
  const resolved = path.resolve(expandHome(rawPath));
145
157
  const cap = maxMatches || DEFAULT_GREP_MAX_MATCHES;
146
- if (looksCatastrophic(pattern)) {
147
- throw new Error(
148
- `shell_grep: pattern rejected — nested unbounded quantifier (e.g. "(a+)+") risks catastrophic ` +
149
- `backtracking that would block the process. Simplify the regex.`,
150
- );
151
- }
152
158
  let re;
153
159
  try {
154
160
  re = new RegExp(pattern, flags);
@@ -190,6 +196,53 @@ async function grepPath({ pattern, path: rawPath, recursive = true, maxMatches,
190
196
  return { hits, truncated, fileCount: files.length };
191
197
  }
192
198
 
199
+ /**
200
+ * Public grep entry. Fast-rejects obviously catastrophic patterns without paying for a worker,
201
+ * then runs the search in a worker thread bounded by a hard timeout — so even a pattern that
202
+ * defeats the static guard degrades to a bounded rejection instead of an event-loop hang.
203
+ * @param {GrepArgs} args
204
+ */
205
+ function grepPath(args) {
206
+ const { pattern, flags = 'i', timeout } = args;
207
+ if (looksCatastrophic(pattern)) {
208
+ return Promise.reject(new Error(
209
+ `shell_grep: pattern rejected — nested unbounded quantifier (e.g. "(a+)+") risks catastrophic ` +
210
+ `backtracking that would block the process. Simplify the regex.`,
211
+ ));
212
+ }
213
+ // Cheap up-front validation so a syntactically invalid regex fails clearly without a worker spin-up.
214
+ try {
215
+ new RegExp(pattern, flags);
216
+ } catch (/** @type {any} */ err) {
217
+ return Promise.reject(new Error(`shell_grep: invalid regex — ${err.message}`));
218
+ }
219
+
220
+ const budgetMs = timeout && timeout > 0 ? timeout : DEFAULT_GREP_TIMEOUT_MS;
221
+ return new Promise((resolve, reject) => {
222
+ const worker = new Worker(path.join(__dirname, 'grep-worker.js'), { workerData: args });
223
+ let settled = false;
224
+ const done = (fn, val) => {
225
+ if (settled) return;
226
+ settled = true;
227
+ clearTimeout(timer);
228
+ worker.terminate();
229
+ fn(val);
230
+ };
231
+ const timer = setTimeout(() => {
232
+ done(reject, new Error(
233
+ `shell_grep: pattern exceeded ${budgetMs}ms time budget — likely catastrophic backtracking. ` +
234
+ `Simplify the regex.`,
235
+ ));
236
+ }, budgetMs);
237
+ timer.unref?.();
238
+ worker.once('message', (msg) => {
239
+ if (msg && msg.ok) done(resolve, msg.result);
240
+ else done(reject, new Error((msg && msg.error) || 'shell_grep: worker failed'));
241
+ });
242
+ worker.once('error', (err) => done(reject, err));
243
+ });
244
+ }
245
+
193
246
  /**
194
247
  * @typedef {object} RunArgvArgs
195
248
  * @property {string[]} argv
@@ -360,4 +413,4 @@ function createShellTools() {
360
413
  return { tools };
361
414
  }
362
415
 
363
- module.exports = { createShellTools };
416
+ module.exports = { createShellTools, _grepCore };
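The worker-plus-timeout pattern used by `grepPath` can be tried in a single file. This is a sketch under assumptions: `runWithTimeout` is a hypothetical name, and the worker body is passed as a string with `eval: true` only to keep the example self-contained, whereas the package loads a separate `grep-worker.js`.

```javascript
const { Worker } = require('node:worker_threads');

// Run `source` in a worker; reject and terminate the worker if it exceeds budgetMs.
function runWithTimeout(source, workerData, budgetMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData });
    let settled = false;
    const done = (fn, val) => {
      if (settled) return;      // first outcome wins: message, error, or timeout
      settled = true;
      clearTimeout(timer);
      worker.terminate();       // kills a regex that is still backtracking
      fn(val);
    };
    const timer = setTimeout(
      () => done(reject, new Error(`exceeded ${budgetMs}ms time budget`)),
      budgetMs,
    );
    worker.once('message', (msg) => done(resolve, msg));
    worker.once('error', (err) => done(reject, err));
  });
}

// Worker body: test one pattern against one input and report the boolean result.
const regexJob = `
  const { workerData, parentPort } = require('node:worker_threads');
  parentPort.postMessage(new RegExp(workerData.pattern).test(workerData.input));
`;

// The bypass pattern named in the diff, against an input that cannot match.
runWithTimeout(regexJob, { pattern: '^(a|a|a)*$', input: 'a'.repeat(40) + 'b' }, 200)
  .then((matched) => console.log('matched:', matched), (err) => console.log('rejected:', err.message));
```

The `settled` flag plays the same role as in `grepPath`: whichever of the message, error, or timer fires first settles the promise, and later events are ignored.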
package/tools/spawn.d.ts CHANGED
@@ -42,9 +42,16 @@ export type SpawnChildOptions = {
42
42
  */
43
43
  cliPath?: string | undefined;
44
44
  /**
45
- * - Force-kill child after this many ms.
45
+ * - Force-kill child after this many ms (wall-clock hard ceiling).
46
46
  */
47
47
  timeoutMs?: number | undefined;
48
+ /**
49
+ * - Force-kill child after this many ms with NO output on either
50
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
51
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
52
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
53
+ */
54
+ idleTimeoutMs?: number | undefined;
48
55
  /**
49
56
  * - bareagent Stream — child:stderr events get re-emitted here.
50
57
  */
@@ -56,13 +63,16 @@ export type Stream = import("../src/stream").Stream;
56
63
  *
57
64
  * @param {object} [options]
58
65
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
59
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
66
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
67
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
68
+ * (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
60
69
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
61
70
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
62
71
  */
63
72
  export function createSpawnTool(options?: {
64
73
  cliPath?: string | undefined;
65
74
  timeoutMs?: number | undefined;
75
+ idleTimeoutMs?: number | undefined;
66
76
  stream?: import("../src/stream").Stream | undefined;
67
77
  }): {
68
78
  tool: import("../types").ToolDef;
@@ -86,12 +96,16 @@ export function createSpawnTool(options?: {
86
96
  * @property {string} [config] - Path to a bareagent config JSON file.
87
97
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
88
98
  * @property {string} [cliPath] - Override the bareagent CLI path.
89
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
99
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
100
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
101
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
102
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
103
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
90
104
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
91
105
  *
92
106
  * @param {SpawnChildOptions} [opts]
93
107
  */
94
- export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: SpawnChildOptions): {
108
+ export function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream }?: SpawnChildOptions): {
95
109
  wait: () => Promise<{
96
110
  text: any;
97
111
  usage: any;
@@ -100,6 +114,7 @@ export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: Spawn
100
114
  events: ChildEvent[];
101
115
  exitCode: any;
102
116
  signal: any;
117
+ idleKilled: boolean;
103
118
  }>;
104
119
  onLine: (fn: (event: ChildEvent) => void) => () => void;
105
120
  kill: (sig?: NodeJS.Signals) => void;
package/tools/spawn.js CHANGED
@@ -66,12 +66,16 @@ function resolveCliPath() {
66
66
  * @property {string} [config] - Path to a bareagent config JSON file.
67
67
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
68
68
  * @property {string} [cliPath] - Override the bareagent CLI path.
69
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
69
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
70
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
71
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
72
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
73
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
70
74
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
71
75
  *
72
76
  * @param {SpawnChildOptions} [opts]
73
77
  */
74
- function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
78
+ function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream } = {}) {
75
79
  if (typeof config !== 'string' || !config) {
76
80
  throw new Error('[spawn] requires { config: <path> }');
77
81
  }
@@ -104,10 +108,29 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
104
108
  if (i >= 0) lineSubscribers.splice(i, 1);
105
109
  }; };
106
110
 
111
+ // Idle watchdog: kill the child after `idleTimeoutMs` of silence on BOTH stdio streams.
112
+ // Distinct from `timeoutMs` (wall-clock ceiling): this catches a child that is alive but stuck
113
+ // producing nothing — the "no activity in stderr" hang — without punishing one doing slow work,
114
+ // since `armIdle()` resets on every line. Armed at spawn so a child that never emits is caught too.
115
+ let idleTimer = null;
116
+ let idleKilled = false;
117
+ const armIdle = () => {
118
+ if (!idleTimeoutMs || idleTimeoutMs <= 0) return;
119
+ if (idleTimer) clearTimeout(idleTimer);
120
+ idleTimer = setTimeout(() => {
121
+ idleKilled = true;
122
+ try { child.kill('SIGTERM'); } catch { /* already dead */ }
123
+ setTimeout(() => { try { child.kill('SIGKILL'); } catch { /* already dead */ } }, 5000).unref();
124
+ }, idleTimeoutMs);
125
+ idleTimer.unref();
126
+ };
127
+ armIdle();
128
+
107
129
  // stdout — JSONL events from the child loop
108
130
  const outRl = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
109
131
  outRl.on('line', (line) => {
110
132
  if (!line) return;
133
+ armIdle();
111
134
  let event;
112
135
  try { event = JSON.parse(line); }
113
136
  catch {
@@ -130,6 +153,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
130
153
  const errRl = readline.createInterface({ input: child.stderr, crlfDelay: Infinity });
131
154
  errRl.on('line', (line) => {
132
155
  if (!line) return;
156
+ armIdle();
133
157
  const event = { type: 'child:stderr', text: line, ts: new Date().toISOString() };
134
158
  events.push(event);
135
159
  if (stream) {
@@ -157,18 +181,20 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
157
181
  const exitPromise = new Promise((resolve) => {
158
182
  child.on('exit', async (code, signal) => {
159
183
  if (killTimer) clearTimeout(killTimer);
184
+ if (idleTimer) clearTimeout(idleTimer);
160
185
  // Drain stdio readlines before resolving — last line may still be in buffer.
161
186
  await Promise.all([outClosePromise, errClosePromise]);
162
- resolve({ code, signal });
187
+ resolve({ code, signal, idleKilled });
163
188
  });
164
189
  child.on('error', (err) => {
165
190
  if (killTimer) clearTimeout(killTimer);
191
+ if (idleTimer) clearTimeout(idleTimer);
166
192
  resolve({ code: null, signal: null, spawnError: err });
167
193
  });
168
194
  });
169
195
 
170
196
  async function wait() {
171
- const { code, signal, spawnError } = await exitPromise;
197
+ const { code, signal, spawnError, idleKilled: idle } = await exitPromise;
172
198
  if (spawnError) {
173
199
  return {
174
200
  text: '',
@@ -178,6 +204,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
178
204
  events,
179
205
  exitCode: null,
180
206
  signal: null,
207
+ idleKilled: false,
181
208
  };
182
209
  }
183
210
  // Pluck the final loop:done event — that's the canonical child result.
@@ -194,18 +221,21 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
194
221
  events,
195
222
  exitCode: code,
196
223
  signal,
224
+ idleKilled: !!idle,
197
225
  };
198
226
  }
199
227
  // No loop:done — child exited abnormally or never reached the LLM.
200
228
  const errEvent = events.find(e => e.type === 'loop:error' || e.type === 'error');
229
+ const idleNote = idle ? `[spawn] child killed after idle timeout (no output; signal=${signal})` : null;
201
230
  return {
202
231
  text: '',
203
232
  usage: { inputTokens: 0, outputTokens: 0 },
204
233
  cost: 0,
205
- error: errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
234
+ error: idleNote || errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
206
235
  events,
207
236
  exitCode: code,
208
237
  signal,
238
+ idleKilled: !!idle,
209
239
  };
210
240
  }
211
241
 
@@ -222,7 +252,9 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
222
252
  *
223
253
  * @param {object} [options]
224
254
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
225
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
255
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
256
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
257
+ * (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
226
258
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
227
259
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
228
260
  */
@@ -250,6 +282,7 @@ function createSpawnTool(options = {}) {
250
282
  input,
251
283
  cliPath: options.cliPath,
252
284
  timeoutMs: options.timeoutMs ?? DEFAULT_TIMEOUT_MS,
285
+ idleTimeoutMs: options.idleTimeoutMs,
253
286
  stream: options.stream,
254
287
  });
255
288
  return await handle.wait();
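The `idleTimeoutMs` option documented above is a heartbeat watchdog: a timer that is re-armed on every line of child output and cleared on `exit`/`error`. The following is a minimal standalone sketch of that pattern, not the package's actual implementation. The names `createIdleWatchdog` and `watchChild`, the injectable `setTimer`/`clearTimer` parameters, the `SIGKILL` choice, and arming the timer before the first line arrives are all assumptions made for illustration.

```javascript
const readline = require('node:readline');

// Hypothetical helper (not part of bareagent): a resettable idle timer.
// setTimer/clearTimer are injectable so the logic can be exercised without real delays.
function createIdleWatchdog({ idleTimeoutMs, onIdle, setTimer = setTimeout, clearTimer = clearTimeout }) {
  let timer = null;
  let fired = false;
  // Treat a missing or non-positive timeout as "watchdog off" (mirrors "default off").
  const enabled = Number.isFinite(idleTimeoutMs) && idleTimeoutMs > 0;

  function arm() {
    if (!enabled || fired) return;
    if (timer !== null) clearTimer(timer);
    timer = setTimer(() => {
      fired = true;
      timer = null;
      onIdle();
    }, idleTimeoutMs);
  }

  function clear() {
    if (timer !== null) clearTimer(timer);
    timer = null;
  }

  return { arm, clear, get fired() { return fired; } };
}

// Hypothetical wiring: re-arm on each non-empty stdout/stderr line, clear when the child ends.
function watchChild(child, idleTimeoutMs) {
  const watchdog = createIdleWatchdog({
    idleTimeoutMs,
    onIdle: () => child.kill('SIGKILL'), // assumption: force-kill on idle
  });
  for (const input of [child.stdout, child.stderr]) {
    readline
      .createInterface({ input, crlfDelay: Infinity })
      .on('line', (line) => { if (line) watchdog.arm(); });
  }
  child.on('exit', () => watchdog.clear());
  child.on('error', () => watchdog.clear());
  watchdog.arm(); // assumption: start the clock before the first line arrives
  return watchdog;
}
```

Because every line resets the timer, a child that is slow but still producing output is never killed by the watchdog; only the separate wall-clock `timeoutMs` ceiling bounds its total runtime. A caller could read `watchdog.fired` after exit to distinguish an idle kill from an ordinary exit, in the same spirit as the `idleKilled` flag in the diff.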
package/types/index.d.ts CHANGED
@@ -22,6 +22,8 @@ export interface GenerateResult {
22
22
  text: string;
23
23
  toolCalls: ToolCall[];
24
24
  usage: Usage;
25
+ /** Model id the response was produced by; preferred over Provider.model for cost accounting. */
26
+ model?: string | null;
25
27
  }
26
28
 
27
29
  /** A conversation message in OpenAI chat format. */