bare-agent 0.14.0 → 0.16.0

package/README.md CHANGED
@@ -66,8 +66,9 @@ Every piece works alone — take what you need, ignore the rest.
 
  | Component | What it does |
  |---|---|
- | **Loop** | Think → act → observe → repeat. Calls any LLM, executes your tools, loops until done. Returns estimated USD cost per run. Governance via `Loop({ policy })` — wire bareguard's `Gate` through `wireGate(gate)` and every tool call (native, MCP, browsing, mobile) traverses one chokepoint with per-caller `ctx` routing. Bareguard owns the audit log, budget caps, and halt decisions; Loop respects the verdict. Context engineering via `Loop({ assemble })` — a per-round `assemble(msgs, ctx)` chokepoint to recall/compress/trim the window sent to the model (the seam litectx plugs into); returns a view, the canonical transcript stays intact, fail-open. The exported `unitAssembler`/`toUnits`/`fromUnits` adapter lets a consumer work over a neutral unit `{id, role, content, kind, pinned, atomic, tokensApprox}` — bareagent owns the grammar (atomic tool-pair bundling, pinned system/task, a pairing seatbelt), the consumer owns content + relevance. The CE function reads its inputs from the per-run `ctx` — litectx's budget-fitter uses `ctx.budget` (and `ctx.task`), so you **must** populate it via `run(msgs, tools, { ctx })`: an unset `ctx.budget` means the fitter has no budget, keeps everything, and returns the window unchanged — a silent no-op, not a bug (see `examples/litectx-assemble.mjs`). For summary-window compaction the Loop also lends a provider-bound `ctx.summarize(excerpt) => Promise<string>` (R-C6): the consumer owns when/what to summarize and the splice, bareagent makes the one model call (counted against the budget via `onLlmResult`, tagged `kind:'summarize'`). `onError` + `loop:error` surface every silent-ish failure (callback throw, Checkpoint timeout) |
+ | **Loop** | Think → act → observe → repeat. Calls any LLM, executes your tools, loops until done. Returns estimated USD cost per run. Governance via `Loop({ policy })` — wire bareguard's `Gate` through `wireGate(gate)` and every tool call (native, MCP, browsing, mobile) traverses one chokepoint with per-caller `ctx` routing. Bareguard owns the audit log, budget caps, and halt decisions; Loop respects the verdict. Context engineering via `Loop({ assemble })` — a per-round `assemble(msgs, ctx)` chokepoint to recall/compress/trim the window sent to the model (the seam litectx plugs into); returns a view, the canonical transcript stays intact, fail-open. The exported `unitAssembler`/`toUnits`/`fromUnits` adapter lets a consumer work over a neutral unit `{id, role, content, kind, pinned, atomic, tokensApprox}` — bareagent owns the grammar (atomic tool-pair bundling, pinned system/task, a pairing seatbelt), the consumer owns content + relevance. The CE function reads its inputs from the per-run `ctx` — litectx's budget-fitter uses `ctx.budget` (and `ctx.task`), so you **must** populate it via `run(msgs, tools, { ctx })`: an unset `ctx.budget` means the fitter has no budget, keeps everything, and returns the window unchanged — a silent no-op, not a bug (see `examples/litectx-assemble.mjs`). For summary-window compaction the Loop also lends a provider-bound `ctx.summarize(excerpt) => Promise<string>` (R-C6): the consumer owns when/what to summarize and the splice, bareagent makes the one model call (counted against the budget via `onLlmResult`, tagged `kind:'summarize'`). For an unbounded long-running agent there's the **destructive** counterpart `Loop({ trim })` (RT-2) — a per-round bound on the canonical transcript that evicts old turns *after* harvesting them; wire it with the exported `unitTrimmer({ trim, onHarvest, policy })` over litectx's `trim` verb (harvest-before-evict, fail-open; `harvestKey` gives the stable upsert id), opt-in (requires a consumer on litectx ≥ 0.16.0). `onError` + `loop:error` surface every silent-ish failure (callback throw, Checkpoint timeout) |
  | **Planner** | Break a goal into a step DAG via LLM. Built-in caching (`cacheTTL`) |
+ | **assessComplexity** | Pure-code pre-planner (no LLM): rates a goal `simple`/`medium`/`complex`/`critical` from its text via keyword scoring + a critical safety override. `needsPlanning` gates whether to spend a Planner pass; `critical` flags security/production/compliance work for extra scrutiny. Free, instant, debuggable via `signals` |
  | **runPlan** | Execute steps in parallel waves. Dependency-aware, failure propagation, per-step retry |
  | **Retry** | Exponential/linear backoff with jitter. Respects `err.retryable` |
  | **CircuitBreaker** | Fail fast after N errors. Auto-recovers after cooldown. Per-key isolation |
@@ -83,7 +84,7 @@ Every piece works alone — take what you need, ignore the rest.
  | **Mobile** | Android + iOS device control via `baremobile`. Same two modes: library tools (`createMobileTools` — action tools auto-return snapshots) or CLI session (`baremobile` CLI — disk-based snapshots) |
  | **Shell** | Cross-platform `shell_read`, `shell_grep`, `shell_run` (argv, no shell), `shell_exec` (raw shell). Pure Node — no `grep`/`rg`/`findstr` dependency. Injection-proof `shell_run` for policy-gated use |
  | **MCP Bridge** | Auto-discover MCP servers from IDE configs (Claude Code, Cursor, etc.), expose as bareagent tools. Static allow/deny via `.mcp-bridge.json`, `systemContext` for LLM awareness. Runtime policy lives in `Loop({ policy })` — one hook for MCP + native tools alike. Returns both bulk `tools` (one per MCP tool) and `metaTools` (`mcp_discover` + `mcp_invoke` for token-thrifty access to large catalogs). Connecting runs a server's `command` (which may come from a cwd `.mcp.json`): pass `confirmServer` to vet each before it spawns — otherwise the bridge warns naming every command it runs. Every RPC is time-bounded (`timeout` for the handshake, `callTimeout` for `tools/call`), and a server that breaks its stdin pipe fails the connection instead of crashing the host. Zero deps |
- | **Spawn** | Fork a child bareagent process as a specialist agent. LLM-callable form blocks until child exits; library form returns a handle (`wait`, `onLine`, `kill`). One JSONL channel per child — child stderr captured and re-emitted as `child:stderr` events on the parent stream. Threads `BAREGUARD_AUDIT_PATH` / `BAREGUARD_PARENT_RUN_ID` / `BAREGUARD_BUDGET_FILE` / `BAREGUARD_SPAWN_DEPTH` so the family stitches into one audit + budget. `bareguard ^0.2.0` adds `spawn.ratePerMinute` + `limits.maxDepth` per-family caps |
+ | **Spawn** | Fork a child bareagent process as a specialist agent. LLM-callable form blocks until child exits; library form returns a handle (`wait`, `onLine`, `kill`). One JSONL channel per child — child stderr captured and re-emitted as `child:stderr` events on the parent stream. Threads `BAREGUARD_AUDIT_PATH` / `BAREGUARD_PARENT_RUN_ID` / `BAREGUARD_BUDGET_FILE` / `BAREGUARD_SPAWN_DEPTH` so the family stitches into one audit + budget. `bareguard ^0.2.0` adds `spawn.ratePerMinute` + `limits.maxDepth` per-family caps. `timeoutMs` is the wall-clock ceiling; opt-in `idleTimeoutMs` is a heartbeat watchdog that kills a child gone silent on both stdio streams (resets on each line, so slow-but-working children survive; result carries `idleKilled`) |
  | **Defer** | Append a `{action, when}` record to a JSONL queue for a separate waker (cron / systemd timer / `examples/wake.sh`) to fire later. Two-phase governance: emit-time `gate.check` on the `defer` action; fire-time `gate.check` on the inner action when the waker re-invokes. `bareguard ^0.2.0` adds `defer.ratePerMinute` family-wide cap |
 
  **Providers:** OpenAI-compatible (OpenAI, OpenRouter, Groq, vLLM, LM Studio), Anthropic, Ollama, CLIPipe (any CLI tool via stdin/stdout with real-time streaming), Fallback, or bring your own (one method: `generate`). All return the same shape — swap freely. The OpenAI provider warns if it would send your key over plaintext `http://` to a non-loopback host (use `https`, or drop `apiKey` for keyless local endpoints).
@@ -240,17 +241,20 @@ For wiring recipes and API details, see the **[Integration Guide](bareagent.cont
 
  ## The bare ecosystem
 
- Four vanilla JS modules. Zero deps where possible (bareguard has one). Same API patterns.
+ Local-first, composable agent infrastructure. Same API patterns throughout —
+ mix and match, each module works standalone.
 
- | | [**bareagent**](https://npmjs.com/package/bare-agent) | [**barebrowse**](https://npmjs.com/package/barebrowse) | [**baremobile**](https://npmjs.com/package/baremobile) | [**bareguard**](https://npmjs.com/package/bareguard) |
- |---|---|---|---|---|
- | **Does** | Gives agents a think→act loop | Gives agents a real browser | Gives agents Android + iOS devices | Gates everything an agent does |
- | **How** | Goal in → coordinated actions out | URL in → pruned snapshot out | Screen in → pruned snapshot out | Action in → allow / deny / human-asked out |
- | **Replaces** | LangChain, CrewAI, AutoGen | Playwright, Selenium, Puppeteer | Appium, Espresso, XCUITest | Hand-rolled allowlists, scattered policy code |
- | **Interfaces** | Library · CLI · subprocess | Library · CLI · MCP | Library · CLI · MCP | Library |
- | **Solo or together** | Orchestrates the others as tools | Works standalone | Works standalone | Embedded in bareagent's loop; usable by any runner |
+ **Core** the brain, the gate, the memory.
 
- > **Reach 50+ messengers with one Docker container via [beeperbox](https://github.com/hamr0/beeperbox)** — a headless Beeper Desktop that exposes WhatsApp, iMessage, Signal, Telegram, Slack, Discord, RCS, SMS and more as a single MCP server. Wire it through bareagent's MCP bridge; bareguard policies the invocations like any other tool (per-chat allowlists, ask patterns on destructive sends, all the usual layered defense).
+ - **[bareagent](https://npmjs.com/package/bare-agent)** — the think→act→observe loop. *Goal in → coordinated actions out.* Replaces LangChain, CrewAI, AutoGen.
+ - **[bareguard](https://npmjs.com/package/bareguard)** — the single gate every action passes through. *Action in → allow / deny / ask-a-human out.* Replaces hand-rolled allowlists and scattered policy code.
+ - **[litectx](https://npmjs.com/package/litectx)** — tree-sitter code + memory graph with activation decay, plus lightweight context engineering (write · select · compress · isolate). *Query in → ranked context out.*
+
+ **Optional reach** — give the agent hands.
+
+ - **[barebrowse](https://npmjs.com/package/barebrowse)** — a real browser for agents. *URL in → pruned snapshot out.* Replaces Playwright, Selenium, Puppeteer.
+ - **[baremobile](https://npmjs.com/package/baremobile)** — Android + iOS device control. *Screen in → pruned snapshot out.* Replaces Appium, Espresso, XCUITest.
+ - **[beeperbox](https://github.com/hamr0/beeperbox)** — 50+ messaging networks via one MCP server (headless Beeper Desktop in Docker). *Chat in → unified message stream out.* Replaces Twilio, per-platform bot APIs.
 
  **What you can build:**
 
@@ -1,7 +1,7 @@
  # bareagent — Integration Guide
 
  > For AI assistants and developers wiring bareagent into a project.
- > v0.14.0 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
+ > v0.16.0 | Node.js >= 18 | one required dep (`bareguard ^0.4.2`) | Apache 2.0
  >
  > Full human guide with composition examples, design philosophy, and recipes: [Usage Guide](docs/02-features/usage-guide.md)
 
@@ -14,7 +14,7 @@ npm install bare-agent
  ```
 
  Eight entry points:
- - `require('bare-agent')` — Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
+ - `require('bare-agent')` — Loop, Planner, **assessComplexity** (pure-code no-LLM pre-planner → `{level, score, needsPlanning, signals}`), StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, **toUnits, fromUnits, unitAssembler** (the `assemble` context-units adapter, v0.13+), **unitTrimmer, harvestKey** (the destructive `trim` seam adapter — RT-2 harvest-before-evict, needs a consumer on litectx ≥ 0.16.0), BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, **HaltError**
  - `require('bare-agent/errors')` — same error classes via a stable subpath (v0.10.1+) for adopters who want to import only the error surface
  - `require('bare-agent/providers')` — OpenAI, Anthropic, Ollama, CLIPipe, Fallback (the canonical short names; `*Provider` aliases — `OpenAIProvider`, `AnthropicProvider`, etc. — are also exported and match the class names, so either destructure works, v0.12.1+)
  - `require('bare-agent/stores')` — SQLite (FTS5), JsonFile
@@ -31,6 +31,8 @@ Eight entry points:
  |---|---|
  | Call an LLM with tools and get a result | Loop + a Provider |
  | Break a goal into steps | Planner + a Provider |
+ | Size a goal before planning (no LLM) | assessComplexity — `needsPlanning` gates a Planner pass |
+ | Kill a spawned child that hangs silently | createSpawnTool / spawnChild `{ idleTimeoutMs }` |
  | Execute a step DAG with parallelism | runPlan + executeFn |
  | Track task state (pending/running/done/failed) | StateMachine |
  | Run agent turns on a schedule (cron, timers) | Scheduler |
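The `idleTimeoutMs` watchdog mentioned for Spawn resets on every output line, so only a child that goes fully silent is killed. The sketch below illustrates that reset-on-activity logic only. It is not bare-agent's implementation: `IdleWatchdog`, `touch`, and `isIdle` are hypothetical names, and timestamps are passed in explicitly so the behaviour is deterministic.

```javascript
'use strict';

// Hypothetical illustration of a heartbeat watchdog; NOT bare-agent's code.
// `idleTimeoutMs` mirrors the option name in the guide; everything else is assumed.
class IdleWatchdog {
  constructor(idleTimeoutMs, startedAt) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastActivity = startedAt;
  }

  // Call whenever the child writes a line on stdout or stderr.
  touch(now) {
    this.lastActivity = now;
  }

  // True once the child has been silent for longer than the idle budget.
  isIdle(now) {
    return now - this.lastActivity > this.idleTimeoutMs;
  }
}

// A slow-but-working child: one line every 4 s against a 5 s idle budget.
const slow = new IdleWatchdog(5000, 0);
for (const t of [4000, 8000, 12000]) slow.touch(t);
const slowSurvives = !slow.isIdle(15000); // last line at 12 s, 3 s of silence

// A hung child: one line, then nothing.
const hung = new IdleWatchdog(5000, 0);
hung.touch(1000);
const hungKilled = hung.isIdle(7000); // 6 s of silence exceeds the budget

console.log({ slowSurvives, hungKilled });
```

A wall-clock ceiling such as `timeoutMs` would be a separate check against the start time, independent of activity.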
package/index.d.ts CHANGED
@@ -1,5 +1,6 @@
  import { Loop } from "./src/loop";
  import { Planner } from "./src/planner";
+ import { assessComplexity } from "./src/complexity";
  import { StateMachine } from "./src/state";
  import { Scheduler } from "./src/scheduler";
  import { Checkpoint } from "./src/checkpoint";
@@ -13,6 +14,8 @@ import { defaultActionTranslator } from "./src/bareguard-adapter";
  import { toUnits } from "./src/context-units";
  import { fromUnits } from "./src/context-units";
  import { unitAssembler } from "./src/context-units";
+ import { unitTrimmer } from "./src/context-units";
+ import { harvestKey } from "./src/context-units";
  import { BareAgentError } from "./src/errors";
  import { ProviderError } from "./src/errors";
  import { ToolError } from "./src/errors";
@@ -20,4 +23,4 @@ import { TimeoutError } from "./src/errors";
  import { ValidationError } from "./src/errors";
  import { CircuitOpenError } from "./src/errors";
  import { HaltError } from "./src/errors";
- export { Loop, Planner, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };
+ export { Loop, Planner, assessComplexity, StateMachine, Scheduler, Checkpoint, Memory, Stream, Retry, runPlan, CircuitBreaker, wireGate, defaultActionTranslator, toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, BareAgentError, ProviderError, ToolError, TimeoutError, ValidationError, CircuitOpenError, HaltError };
package/index.js CHANGED
@@ -2,6 +2,7 @@
 
  const { Loop } = require('./src/loop');
  const { Planner } = require('./src/planner');
+ const { assessComplexity } = require('./src/complexity');
  const { StateMachine } = require('./src/state');
  const { Scheduler } = require('./src/scheduler');
  const { Checkpoint } = require('./src/checkpoint');
@@ -11,7 +12,7 @@ const { Retry } = require('./src/retry');
  const { runPlan } = require('./src/run-plan');
  const { CircuitBreaker } = require('./src/circuit-breaker');
  const { wireGate, defaultActionTranslator } = require('./src/bareguard-adapter');
- const { toUnits, fromUnits, unitAssembler } = require('./src/context-units');
+ const { toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey } = require('./src/context-units');
  const {
    BareAgentError,
    ProviderError,
@@ -25,6 +26,7 @@ const {
  module.exports = {
    Loop,
    Planner,
+   assessComplexity,
    StateMachine,
    Scheduler,
    Checkpoint,
@@ -38,6 +40,8 @@ module.exports = {
    toUnits,
    fromUnits,
    unitAssembler,
+   unitTrimmer,
+   harvestKey,
    BareAgentError,
    ProviderError,
    ToolError,
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "bare-agent",
-   "version": "0.14.0",
+   "version": "0.16.0",
    "files": [
      "index.js",
      "index.d.ts",
@@ -99,7 +99,7 @@
    },
    "devDependencies": {
      "@types/node": "^22.19.19",
-     "litectx": "^0.13.0",
+     "litectx": "^0.16.0",
      "typescript": "^5.7.0"
    }
  }
@@ -0,0 +1,31 @@
+ export type ComplexityResult = {
+     /**
+      * - Assessed complexity tier.
+      */
+     level: "simple" | "medium" | "complex" | "critical";
+     /**
+      * - Raw heuristic score (100 for a critical override).
+      */
+     score: number;
+     /**
+      * - false for `simple`, true otherwise — the routing hint.
+      */
+     needsPlanning: boolean;
+     /**
+      * - Which signals fired, for transparency/debugging.
+      */
+     signals: string[];
+ };
+ /**
+  * @typedef {object} ComplexityResult
+  * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
+  * @property {number} score - Raw heuristic score (100 for a critical override).
+  * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
+  * @property {string[]} signals - Which signals fired, for transparency/debugging.
+  */
+ /**
+  * Assess the complexity of a goal/prompt from its text alone (no LLM).
+  * @param {string} prompt - The goal to classify.
+  * @returns {ComplexityResult}
+  */
+ export function assessComplexity(prompt: string): ComplexityResult;
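The `ComplexityResult` shape is meant to drive a routing decision. As a rough sketch of how a caller might consume it: the field names (`level`, `needsPlanning`) come from the typings above, while `routeGoal` and its route labels are hypothetical and not part of bare-agent. The inputs are hand-written literals, so the snippet runs without the package installed.

```javascript
'use strict';

// Hypothetical router over a ComplexityResult-shaped object.
// Field names follow the typings; the route labels are illustrative assumptions.
function routeGoal(result) {
  if (result.level === 'critical') return 'plan-with-checkpoint';
  return result.needsPlanning ? 'plan' : 'single-shot';
}

// Hand-written literals in the documented shape (not captured from a real call).
const simple = { level: 'simple', score: 0, needsPlanning: false, signals: ['empty'] };
const medium = { level: 'medium', score: 24, needsPlanning: true, signals: ['medium_verbs'] };
const critical = { level: 'critical', score: 100, needsPlanning: true, signals: ['critical_override'] };

const routes = [simple, medium, critical].map(routeGoal);
console.log(routes);
```

Checking `level === 'critical'` before `needsPlanning` matters because a critical result also sets `needsPlanning: true`.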
@@ -0,0 +1,149 @@
+ 'use strict';
+
+ /**
+  * Keyword complexity assessor — a fast, pure-code "pre-planner" that classifies a goal as
+  * simple / medium / complex / critical from its text alone, with NO LLM call. Ported (concept,
+  * not line-for-line) from Aurora's SOAR keyword assessor. It exists to drive a routing decision:
+  * a `simple` goal can run single-shot; `medium`+ warrants a Planner pass; `critical` (security,
+  * production, compliance, financial) flags work that deserves extra scrutiny (e.g. a checkpoint /
+  * adversarial verification) before acting.
+  *
+  *   const { level, needsPlanning } = assessComplexity(goal);
+  *   const steps = needsPlanning ? await planner.plan(goal) : [{ id: 's1', action: goal }];
+  *
+  * Concept, deliberately lightweight: a critical-keyword override, tiered action-verb scoring
+  * (simple verbs subtract, complex verbs add the most), feature nouns + scope + structure signals,
+  * and two calibrated thresholds. It is a heuristic — transparent and debuggable via `signals`, not
+  * a model. On the upstream validation corpus it lands ~89% (the fuller LLM-free original ~95%);
+  * the gap is long-tail ambiguity ("add a button" is genuinely context-dependent).
+  */
+
+ const has = (/** @type {Set<string>} */ words, /** @type {Set<string>} */ set) =>
+   [...set].filter(w => words.has(w));
+ const wordSet = (/** @type {string} */ s) => new Set(s.match(/\b\w+\b/g) || []);
+ // Escape regex metacharacters so a keyword can't break (or alter) the word-boundary match — the
+ // lists below are plain words today, but a future entry like "c++" or ".net" must stay literal.
+ const esc = (/** @type {string} */ k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+ const hasAny = (/** @type {string} */ s, /** @type {string[]} */ list) =>
+   list.some(k => new RegExp(`\\b${esc(k)}\\b`).test(s));
+
+ // --- critical safety override: high-stakes work jumps straight to the top tier ---
+ const CRIT_INCIDENT = ['emergency', 'outage', 'breach', 'vulnerability', 'exploit', 'corruption', 'data loss', 'incident', 'penetration'];
+ const CRIT_COMPLIANCE = ['gdpr', 'hipaa', 'pci', 'compliance', 'regulation'];
+ const SEC_CONTEXT = ['security', 'production', 'authentication', 'authorization'];
+ const CRIT_ACTIONS = ['fix', 'patch', 'investigate', 'secure', 'protect', 'mitigate', 'prevent', 'respond', 'handle'];
+ const FINANCIAL = ['payment', 'transaction', 'billing', 'financial'];
+ const SECURE_ACTS = ['encrypt', 'secure', 'protect', 'audit'];
+
+ /** @param {string} s lowercased prompt */
+ function isCritical(s) {
+   if (hasAny(s, CRIT_INCIDENT) || hasAny(s, CRIT_COMPLIANCE)) return true;
+   if (hasAny(s, SEC_CONTEXT) && hasAny(s, CRIT_ACTIONS)) return true; // e.g. "fix security ..."
+   if (hasAny(s, FINANCIAL) && hasAny(s, SECURE_ACTS)) return true; // e.g. "encrypt payment ..."
+   return false;
+ }
+
+ // --- tiered keyword scoring: verb "weight" reflects how much work the ask implies ---
+ const COMPLEX_VERBS = new Set(['implement', 'design', 'architect', 'refactor', 'integrate', 'migrate', 'build', 'create', 'develop', 'construct', 'engineer', 'establish', 'transform', 'overhaul', 'rewrite', 'restructure', 'optimize']);
+ const ANALYSIS_VERBS = new Set(['explain', 'compare', 'analyze', 'debug', 'understand', 'investigate', 'describe', 'evaluate', 'review', 'examine', 'diagnose', 'trace', 'why', 'difference']);
+ const MEDIUM_VERBS = new Set(['add', 'update', 'fix', 'write', 'change', 'modify', 'remove', 'delete', 'improve', 'enhance', 'extend', 'convert', 'rename', 'move', 'test', 'configure', 'setup', 'set', 'enable', 'disable']);
+ const SIMPLE_VERBS = new Set(['what', 'show', 'list', 'get', 'find', 'print', 'check', 'read', 'open', 'run', 'where', 'which', 'display', 'view', 'see', 'tell', 'give', 'name', 'count', 'who', 'when', 'is']);
+ const SCOPE = new Set(['all', 'every', 'entire', 'across', 'comprehensive', 'complete', 'codebase', 'project', 'system', 'application', 'full', 'whole', 'everything', 'throughout']);
+ const DOMAINS = new Set(['security', 'performance', 'scalability', 'reliability', 'testing', 'authentication', 'authorization', 'caching', 'logging', 'monitoring', 'database', 'api', 'frontend', 'backend', 'infrastructure', 'deployment', 'docker', 'kubernetes', 'microservices', 'distributed']);
+ // Feature/system nouns: paired with an action verb they signal a real feature, not a one-liner.
+ const COMPLEX_NOUNS = new Set(['authentication', 'authorization', 'oauth', 'jwt', 'session', 'sessions', 'pipeline', 'workflow', 'notification', 'notifications', 'dashboard', 'crud', 'plugin', 'framework', 'websocket', 'websockets', 'realtime', 'pagination', 'search', 'validation', 'migration', 'schema', 'registration']);
+ const SEQUENCE = ['first', 'then', 'after that', 'finally', 'next', 'afterwards', 'subsequently', 'step by step', 'and then', 'as well as', 'additionally', 'along with'];
+ const CONSTRAINTS = ['without breaking', 'without changing', 'maintaining', 'ensuring', 'backward compatible', 'backwards compatible', 'must not', 'should not', 'preserve', 'without affecting'];
+
+ const SIMPLE_THRESHOLD = 11;
+ const MEDIUM_THRESHOLD = 28;
+
+ // Bound the text the assessor scans. Several signal patterns contain `.*`, which can backtrack
+ // quadratically on adversarial input (e.g. "integrate "×N with no "with" — O(n²)). Complexity is
+ // fully determined by the opening of a goal, so capping the working string makes every scan
+ // linear-bounded and removes the DoS surface for callers that pass untrusted end-user text.
+ const MAX_ASSESS_LEN = 4000;
+
+ /**
+  * @typedef {object} ComplexityResult
+  * @property {'simple'|'medium'|'complex'|'critical'} level - Assessed complexity tier.
+  * @property {number} score - Raw heuristic score (100 for a critical override).
+  * @property {boolean} needsPlanning - false for `simple`, true otherwise — the routing hint.
+  * @property {string[]} signals - Which signals fired, for transparency/debugging.
+  */
+
+ /**
+  * Assess the complexity of a goal/prompt from its text alone (no LLM).
+  * @param {string} prompt - The goal to classify.
+  * @returns {ComplexityResult}
+  */
+ function assessComplexity(prompt) {
+   if (typeof prompt !== 'string' || !prompt.trim()) {
+     return { level: 'simple', score: 0, needsPlanning: false, signals: ['empty'] };
+   }
+   const text = prompt.trim().slice(0, MAX_ASSESS_LEN);
+   const lower = text.toLowerCase();
+   if (isCritical(lower)) {
+     return { level: 'critical', score: 100, needsPlanning: true, signals: ['critical_override'] };
+   }
+
+   const words = wordSet(lower);
+   const wc = text.split(/\s+/).length;
+   /** @type {string[]} */
+   const signals = [];
+   let score = 0;
+   /** @param {number} n @param {string} [sig] */
+   const add = (n, sig) => { score += n; if (sig) signals.push(sig); };
+
+   const complex = has(words, COMPLEX_VERBS);
+   const analysis = has(words, ANALYSIS_VERBS);
+   const medium = has(words, MEDIUM_VERBS);
+   const simple = has(words, SIMPLE_VERBS);
+   const scope = has(words, SCOPE);
+   const domains = has(words, DOMAINS);
+
+   if (complex.length) add(complex.length * 25, 'complex_verbs');
+   if (analysis.length) add(Math.min(analysis.length * 15, 20), 'analysis_verbs');
+   if (medium.length) add(medium.length * 12, 'medium_verbs');
+   if (simple.length) add(-Math.min(simple.length * 3, 10), 'simple_verbs');
+   if (scope.length) add(scope.length * 12, 'scope');
+   if (domains.length > 1) add(domains.length * 8, 'multi_domain');
+   else if (domains.length) add(5, 'domain');
+
+   // feature noun + an action verb => a real feature (pushes single-verb asks up a tier)
+   const nouns = has(words, COMPLEX_NOUNS);
+   if (nouns.length && (medium.length || complex.length)) add(nouns.length * 10, 'feature_nouns');
+   if (/\b(?:dark\s*mode|feature\s*flags?|real-?time|end-?to-?end|full-?stack)\b/.test(lower)) add(12, 'feature_pattern');
+   if (/\bintegrate\b.*\bwith\b/.test(lower)) add(15, 'integration');
+   if (/\b(?:improve|optimize)\s+(?:performance|speed|efficiency)\b/.test(lower)
+       && !/\b(?:this|the)\s+(?:function|method|query|loop)\b/.test(lower)) add(15, 'open_ended');
+
+   // structure / sequencing — multi-step asks are heavier
+   const seq = SEQUENCE.filter(m => lower.includes(m)).length;
+   if (seq) add(seq * 8, 'sequence');
+   const constraints = CONSTRAINTS.filter(m => lower.includes(m)).length;
+   if (constraints) add(constraints * 12, 'constraints');
+   const listItems = (text.match(/(?:^|\n)\s*(?:\d+[.)]|[-*])\s/g) || []).length;
+   if (listItems) add(listItems * 9, 'list');
+
+   // length: longer prompts trend more complex
+   if (wc > 40) add(15, 'long'); else if (wc > 20) add(10); else if (wc > 10) add(5);
+
+   // architectural / open-ended questions read simple lexically but imply design work
+   if (/\bbest\s+(?:way|approach|practice|architecture)\b|\barchitecture\s+for\b|\bhow (?:should|can|do) (?:we|i)\b.*\b(?:handle|design|implement|build)\b/.test(lower)) add(15, 'design_question');
+   if (/^(?:what is|where is|which|who|is there)\b/.test(lower)) add(-8, 'simple_question');
+
+   // a trivial edit (typo, comment, log line, version bump) stays simple even though its verb is
+   // "medium" weight — gated to trivial OBJECTS so real medium work isn't wrongly demoted.
+   const trivial = /\b(?:fix|add|remove|delete|rename|update|change)\b.*\b(?:typo|comment|console\.?log|variable|version|line|import)\b/.test(lower)
+     || /\bwrite\s+(?:a|the)\s+function\s+(?:that|to|which)\b/.test(lower);
+   if (trivial && !complex.length && !scope.length && wc <= 10) {
+     score = Math.min(score, SIMPLE_THRESHOLD);
+     signals.push('trivial_edit');
+   }
+
+   const level = score <= SIMPLE_THRESHOLD ? 'simple' : score <= MEDIUM_THRESHOLD ? 'medium' : 'complex';
+   return { level, score, needsPlanning: level !== 'simple', signals };
+ }
+
+ module.exports = { assessComplexity };
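The `esc` and `hasAny` helpers decide every critical-override match, so their boundary behaviour is worth seeing in isolation. The snippet below copies the two helpers from the module (minus the JSDoc casts) and probes them with made-up inputs. The `c++` case is an observation about JavaScript's `\b` semantics, not a documented behaviour of bare-agent.

```javascript
'use strict';

// Copied from the module above so the snippet runs on its own.
const esc = (k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const hasAny = (s, list) => list.some((k) => new RegExp(`\\b${esc(k)}\\b`).test(s));

// Multi-word keywords match as phrases, and word boundaries block substring hits.
const phraseHit = hasAny('recover from data loss in staging', ['data loss']);
const substringMiss = hasAny('improve the fixture loader', ['fix']);

// Escaping keeps "c++" a literal pattern rather than an invalid regex.
const escaped = esc('c++');

// \b needs a word character on exactly one side, so a keyword that ends in "+"
// only matches when a word character follows it directly.
const cppInProse = hasAny('port the c++ parser', ['c++']);
const cppGlued = hasAny('port the c++parser', ['c++']);

console.log({ phraseHit, substringMiss, escaped, cppInProse, cppGlued });
```

If symbol-bearing keywords are ever added to the lists, a lookaround-based boundary may be a better fit than `\b`. That is a suggestion under the assumptions above, not something the source prescribes.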
@@ -34,6 +34,67 @@ export function fromUnits(units: Array<Record<string, any>>): Array<Record<strin
  * @returns {(msgs: Array<Record<string, any>>, ctx: any) => Promise<Array<Record<string, any>>>}
  */
  export function unitAssembler(assembleUnits: (units: Array<Record<string, any>>, ctx: any) => (any | Promise<any>)): (msgs: Array<Record<string, any>>, ctx: any) => Promise<Array<Record<string, any>>>;
37
+ /**
38
+ * Wrap litectx's `trim(units, policy)` verb (R-C5) into the Loop's destructive `trim(msgs, ctx)` seam —
39
+ * the RT-2 harvest-before-evict interlock. Unlike {@link unitAssembler} (a non-destructive per-round VIEW),
40
+ * this is EVICTION: the Loop replaces its canonical transcript with the returned (smaller) msgs, so old
41
+ * turns are permanently dropped — which is only safe because every dropped turn is harvested FIRST.
42
+ *
43
+ * The returned function `(msgs, ctx) => keptMsgs`:
44
+ * 1. `toUnits(msgs)` → `trim(units, policy)` → `{ units (kept), dropped, harvest }`.
45
+ * 2. **Interlock — harvest BEFORE evict:** `await onHarvest({ key, content, unit })` for every harvest
46
+ * unit. If `onHarvest` throws (e.g. a write-gate `deny` → HaltError, or a store fault), this throws
47
+ * BEFORE returning the evicted view → the Loop fail-opens (no eviction that round) → nothing is lost;
48
+ * the next round retries and the idempotent key upserts the already-persisted ones. You cannot drop
49
+ * history you have not persisted.
50
+ * 3. `fromUnits(kept)` is the bounded transcript. Fail-OPEN: an unrecognised `trim` return shape → the
51
+ * original msgs unchanged (no eviction). A throw propagates (HaltError → governance halt).
52
+ *
53
+ * `.flush(msgs, ctx)` — the F2 residual harvest: `trim` only hands back EVICTED turns, so the final
54
+ * keepLastN window is never harvested mid-run and a clean run would diverge from an end-of-task batch.
55
+ * Call `.flush` on completion to harvest the surviving non-pinned turns (no eviction); the idempotent key
56
+ * means it never duplicates what eviction already harvested. The Loop invokes it only on **clean
57
+ * completion** — on a halt (e.g. bareguard `maxTurns`), `stop()`, or error exit the survivors stay intact
58
+ * in `result.msgs` but are NOT auto-flushed (auto-flushing a content-halt could re-trigger the deny that
59
+ * caused it); harvest `result.msgs` yourself on those exits if you need the final window persisted.
60
+ *
61
+ * `onHarvest` is the consumer's harvest POLICY point (litectx CE-PRD §6/R-W*: bareagent owns the trigger
62
+ * + what's worth keeping, litectx owns the mechanism + worklist). It is REQUIRED — the adapter structurally
63
+ * enforces harvest-before-evict; pass `async () => {}` to opt INTO lossy bounding.
64
+ *
65
+ * @param {{ trim?: Function, onHarvest?: Function, policy?: any }} [opts]
66
+ * `trim` — litectx's `(units, policy) => { units, harvest }` verb (REQUIRED). `onHarvest` —
67
+ * `({ key, content, unit }) => void|Promise` (REQUIRED; the harvest policy point). `policy` — litectx
+ * TrimPolicy: `{ keepLastN }` or `{ maxTokens }` (maxTokens wins). Both verbs are runtime-checked.
+ * @returns {((msgs: Array<Record<string, any>>, ctx?: any) => Promise<Array<Record<string, any>>>) & { flush: (msgs: Array<Record<string, any>>, ctx?: any) => Promise<void> }}
+ */
+ export function unitTrimmer(opts?: {
+ trim?: Function;
+ onHarvest?: Function;
+ policy?: any;
+ }): ((msgs: Array<Record<string, any>>, ctx?: any) => Promise<Array<Record<string, any>>>) & {
+ flush: (msgs: Array<Record<string, any>>, ctx?: any) => Promise<void>;
+ };
+ /**
+ * Stable harvest key for a unit — the dedup id for harvest-before-evict (RT-2). MUST come from a durable
+ * property of the TURN, never from `unit.id`: `toUnits` mints ids from a module counter, so the same turn
+ * is `u1` on one call and `u3` on the next (poc/rt2-trim-interlock.mjs F1) — keying on it double-writes.
+ * Derived here from the unit's verbatim `_msgs` backing: the joined `tool_call_id`s for a tool turn (stable
+ * by construction), else a hash of `[role, content]` for a plain turn. The consumer namespaces it (e.g.
+ * `${taskId}:${key}`) and feeds it to `remember(id, …)`, which upserts → a replayed hop overwrites instead
+ * of duplicating. NOT a content search (litectx FTS can't match ids; sealed meta is unsearchable) — a
+ * deterministic key passed to the keyed write. The consumer's `taskId` prefix is the isolation boundary;
+ * treat the returned key as opaque (don't re-parse it).
+ *
+ * Two collision hardenings (grounded in poc/rt2-audit-grounding.mjs): ids are `encodeURIComponent`-escaped
+ * before the `,` join, so `['a','b']` and `['a,b']` can no longer alias the same key; and the plain-turn
+ * hash is **two near-independent streams** (FNV-1a + DJB2) → ~64-bit, so a long-running agent's harvest
+ * can't silently overwrite a turn via a 32-bit birthday collision (a single 32-bit FNV exhausted in ~5e5
+ * distinct turns). Normal provider ids (`call_…`) round-trip unchanged through the escape.
+ * @param {Record<string, any>} unit - a unit from {@link toUnits} (its `_msgs` backing is read).
+ * @returns {string}
+ */
+ export function harvestKey(unit: Record<string, any>): string;
  /** chars/4 token estimate over a list of messages (matches poc2 / the Loop's own heuristic). */
  export function approxTokens(msgs: any): number;
  /**
package/src/context-units.js CHANGED
@@ -222,4 +222,104 @@ function unitAssembler(assembleUnits) {
  };
  }
 
- module.exports = { toUnits, fromUnits, unitAssembler, approxTokens, pairingSeatbelt };
+ /**
+ * Stable harvest key for a unit — the dedup id for harvest-before-evict (RT-2). MUST come from a durable
+ * property of the TURN, never from `unit.id`: `toUnits` mints ids from a module counter, so the same turn
+ * is `u1` on one call and `u3` on the next (poc/rt2-trim-interlock.mjs F1) — keying on it double-writes.
+ * Derived here from the unit's verbatim `_msgs` backing: the joined `tool_call_id`s for a tool turn (stable
+ * by construction), else a hash of `[role, content]` for a plain turn. The consumer namespaces it (e.g.
+ * `${taskId}:${key}`) and feeds it to `remember(id, …)`, which upserts → a replayed hop overwrites instead
+ * of duplicating. NOT a content search (litectx FTS can't match ids; sealed meta is unsearchable) — a
+ * deterministic key passed to the keyed write. The consumer's `taskId` prefix is the isolation boundary;
+ * treat the returned key as opaque (don't re-parse it).
+ *
+ * Two collision hardenings (grounded in poc/rt2-audit-grounding.mjs): ids are `encodeURIComponent`-escaped
+ * before the `,` join, so `['a','b']` and `['a,b']` can no longer alias the same key; and the plain-turn
+ * hash is **two near-independent streams** (FNV-1a + DJB2) → ~64-bit, so a long-running agent's harvest
+ * can't silently overwrite a turn via a 32-bit birthday collision (a single 32-bit FNV exhausted in ~5e5
+ * distinct turns). Normal provider ids (`call_…`) round-trip unchanged through the escape.
+ * @param {Record<string, any>} unit - a unit from {@link toUnits} (its `_msgs` backing is read).
+ * @returns {string}
+ */
+ function harvestKey(unit) {
+ const back = (unit && unit._msgs) || [];
+ /** @type {string[]} */
+ const calls = [];
+ for (const m of back) {
+ if (m.role === 'assistant' && Array.isArray(m.tool_calls)) for (const tc of m.tool_calls) if (tc && tc.id) calls.push(tc.id);
+ }
+ // escape so a literal ',' inside an id can't alias across call shapes ([{id:'a,b'}] vs [{id:'a'},{id:'b'}]).
+ if (calls.length) return `tc:${calls.map((id) => encodeURIComponent(id)).join(',')}`;
+ // plain turn (no tool_calls): two near-independent rolling hashes over role+content (FNV-1a + DJB2),
+ // concatenated to ~64 bits → collision-resistant at agent scale. Stable across toUnits calls.
+ let h1 = 0x811c9dc5; // FNV-1a offset basis
+ let h2 = 5381; // DJB2 seed
+ const s = JSON.stringify(back.map((m) => [m.role, m.content]));
+ for (let i = 0; i < s.length; i++) {
+ const c = s.charCodeAt(i);
+ h1 ^= c; h1 = Math.imul(h1, 0x01000193);
+ h2 = (Math.imul(h2, 33) + c) | 0;
+ }
+ return `h:${(h1 >>> 0).toString(36)}${(h2 >>> 0).toString(36)}`;
+ }
+
+ /**
+ * Wrap litectx's `trim(units, policy)` verb (R-C5) into the Loop's destructive `trim(msgs, ctx)` seam —
+ * the RT-2 harvest-before-evict interlock. Unlike {@link unitAssembler} (a non-destructive per-round VIEW),
+ * this is EVICTION: the Loop replaces its canonical transcript with the returned (smaller) msgs, so old
+ * turns are permanently dropped — which is only safe because every dropped turn is harvested FIRST.
+ *
+ * The returned function `(msgs, ctx) => keptMsgs`:
+ * 1. `toUnits(msgs)` → `trim(units, policy)` → `{ units (kept), dropped, harvest }`.
+ * 2. **Interlock — harvest BEFORE evict:** `await onHarvest({ key, content, unit })` for every harvest
+ * unit. If `onHarvest` throws (e.g. a write-gate `deny` → HaltError, or a store fault), this throws
+ * BEFORE returning the evicted view → the Loop fail-opens (no eviction that round) → nothing is lost;
+ * the next round retries and the idempotent key upserts the already-persisted ones. You cannot drop
+ * history you have not persisted.
+ * 3. `fromUnits(kept)` is the bounded transcript. Fail-OPEN: an unrecognised `trim` return shape → the
+ * original msgs unchanged (no eviction). A throw propagates (HaltError → governance halt).
+ *
+ * `.flush(msgs, ctx)` — the F2 residual harvest: `trim` only hands back EVICTED turns, so the final
+ * keepLastN window is never harvested mid-run and a clean run would diverge from an end-of-task batch.
+ * Call `.flush` on completion to harvest the surviving non-pinned turns (no eviction); the idempotent key
+ * means it never duplicates what eviction already harvested. The Loop invokes it only on **clean
+ * completion** — on a halt (e.g. bareguard `maxTurns`), `stop()`, or error exit the survivors stay intact
+ * in `result.msgs` but are NOT auto-flushed (auto-flushing a content-halt could re-trigger the deny that
+ * caused it); harvest `result.msgs` yourself on those exits if you need the final window persisted.
+ *
+ * `onHarvest` is the consumer's harvest POLICY point (litectx CE-PRD §6/R-W*: bareagent owns the trigger
+ * + what's worth keeping, litectx owns the mechanism + worklist). It is REQUIRED — the adapter structurally
+ * enforces harvest-before-evict; pass `async () => {}` to opt INTO lossy bounding.
+ *
+ * @param {{ trim?: Function, onHarvest?: Function, policy?: any }} [opts]
+ * `trim` — litectx's `(units, policy) => { units, harvest }` verb (REQUIRED). `onHarvest` —
+ * `({ key, content, unit }) => void|Promise` (REQUIRED; the harvest policy point). `policy` — litectx
+ * TrimPolicy: `{ keepLastN }` or `{ maxTokens }` (maxTokens wins). Both verbs are runtime-checked.
+ * @returns {((msgs: Array<Record<string, any>>, ctx?: any) => Promise<Array<Record<string, any>>>) & { flush: (msgs: Array<Record<string, any>>, ctx?: any) => Promise<void> }}
+ */
+ function unitTrimmer(opts) {
+ const { trim, onHarvest, policy = {} } = opts || {};
+ if (typeof trim !== 'function') throw new Error('[context-units] unitTrimmer({ trim }): trim must be litectx\'s (units, policy) => { units, harvest } verb');
+ if (typeof onHarvest !== 'function') throw new Error('[context-units] unitTrimmer({ onHarvest }): onHarvest is required (harvest-before-evict). Pass `async () => {}` to opt into lossy bounding.');
+
+ /** @type {any} */
+ const run = async (/** @type {Array<Record<string, any>>} */ msgs /*, ctx */) => {
+ const units = toUnits(msgs);
+ const r = await trim(units, policy);
+ const kept = r && Array.isArray(r.units) ? r.units : null;
+ if (!kept) return msgs; // fail-open: unrecognised return shape → no eviction
+ const harvest = r && Array.isArray(r.harvest) ? r.harvest : [];
+ for (const u of harvest) await onHarvest({ key: harvestKey(u), content: u.content, unit: u }); // BEFORE evict
+ return fromUnits(kept);
+ };
+
+ // F2 residual: harvest the surviving non-pinned turns (no eviction). pinned (system + first user) are
+ // reconstructable/anchor turns, never evicted by trim, so they're excluded here for symmetry.
+ run.flush = async (/** @type {Array<Record<string, any>>} */ msgs /*, ctx */) => {
+ for (const u of toUnits(msgs)) if (!u.pinned) await onHarvest({ key: harvestKey(u), content: u.content, unit: u });
+ };
+
+ return run;
+ }
+
+ module.exports = { toUnits, fromUnits, unitAssembler, unitTrimmer, harvestKey, approxTokens, pairingSeatbelt };
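The key properties claimed in the `harvestKey` doc comment (the key ignores `unit.id`, and escaped ids cannot alias across call shapes) can be checked with a standalone copy of the function. This is only a sketch: `harvestKey` is reproduced from the hunk above so the snippet runs without the package, and the hand-built unit objects are hypothetical stand-ins for what `toUnits` would return.

```javascript
// Standalone copy of harvestKey from the hunk above.
function harvestKey(unit) {
  const back = (unit && unit._msgs) || [];
  const calls = [];
  for (const m of back) {
    if (m.role === 'assistant' && Array.isArray(m.tool_calls)) {
      for (const tc of m.tool_calls) if (tc && tc.id) calls.push(tc.id);
    }
  }
  // Tool turn: escaped ids joined with ','.
  if (calls.length) return `tc:${calls.map((id) => encodeURIComponent(id)).join(',')}`;
  // Plain turn: two rolling hashes (FNV-1a and DJB2) over role + content.
  let h1 = 0x811c9dc5;
  let h2 = 5381;
  const s = JSON.stringify(back.map((m) => [m.role, m.content]));
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    h1 ^= c; h1 = Math.imul(h1, 0x01000193);
    h2 = (Math.imul(h2, 33) + c) | 0;
  }
  return `h:${(h1 >>> 0).toString(36)}${(h2 >>> 0).toString(36)}`;
}

// Hypothetical units: the same turn seen on two passes, with different counter-minted ids.
const turn = [{ role: 'user', content: 'summarise the report' }];
const sameKey = harvestKey({ id: 'u1', _msgs: turn }) === harvestKey({ id: 'u3', _msgs: turn });

// Two call shapes that would alias if ids were joined without escaping.
const twoCalls = harvestKey({ _msgs: [{ role: 'assistant', tool_calls: [{ id: 'a' }, { id: 'b' }] }] });
const oneCall = harvestKey({ _msgs: [{ role: 'assistant', tool_calls: [{ id: 'a,b' }] }] });

console.log(sameKey);  // true
console.log(twoCalls); // tc:a,b
console.log(oneCall);  // tc:a%2Cb
```

A consumer would then namespace the key itself, for example as `${taskId}:${key}`, before handing it to its keyed write.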
package/src/loop.d.ts CHANGED
@@ -34,6 +34,16 @@ export type LoopOptions = {
  * summary tokens count against the budget.
  */
  assemble?: Function | undefined;
+ /**
+ * - async (msgs, ctx) => msgs. DESTRUCTIVE transcript-trim chokepoint (RT-2),
+ * the opposite of `assemble`: it BOUNDS the canonical transcript — the Loop replaces `msgs` with what
+ * this returns, evicting old turns AFTER they are harvested. Runs once per round before `assemble`.
+ * So eviction never drops un-persisted history, wire it via `unitTrimmer({ trim, onHarvest, policy })`
+ * (src/context-units.js), which performs the harvest-before-evict interlock over litectx's `trim` verb.
+ * An optional `.flush(msgs, ctx)` method is called on clean completion for the residual-window harvest.
+ * Fail-open (a trim fault degrades to no eviction that round); a thrown HaltError propagates.
+ */
+ trim?: Function | undefined;
  /**
  * - async (event) => void after each LLM call; forwards usage to
  * gate.record (via wireGate). `event.kind` discriminates the source: `'turn'` for a main-loop round,
@@ -69,6 +79,7 @@ export class Loop {
  store: import("../types").Store | null;
  policy: Function | null;
  assemble: Function | null;
+ trim: Function | null;
  onLlmResult: Function | null;
  onToolResult: Function | null;
  _stopped: boolean;
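The harvest-before-evict contract documented for the `trim` option can be illustrated without the package. The sketch below is a stand-in under stated assumptions, not bareagent's `unitTrimmer`: `trimKeepLastN`, `makeTrimmer`, and the `key` field on each unit are hypothetical, a `Map` plays the role of the keyed store, and the `try/catch` in `demo` imitates the Loop's fail-open handling.

```javascript
// Hypothetical stand-in for a trim verb: keep the last N units, hand the rest back for harvest.
function trimKeepLastN(units, policy) {
  const cut = Math.max(0, units.length - policy.keepLastN);
  return { units: units.slice(cut), harvest: units.slice(0, cut) };
}

// Minimal interlock: persist every evicted unit first, only then return the smaller set.
function makeTrimmer({ trim, onHarvest, policy }) {
  return async function run(units) {
    const r = await trim(units, policy);
    if (!r || !Array.isArray(r.units)) return units; // unrecognised shape: no eviction
    for (const u of r.harvest || []) {
      await onHarvest({ key: u.key, content: u.content, unit: u }); // a throw here aborts eviction
    }
    return r.units;
  };
}

async function demo() {
  const store = new Map();
  let failOnce = true;
  const onHarvest = async ({ key, content }) => {
    if (key === 'k2' && failOnce) { failOnce = false; throw new Error('store fault'); }
    store.set(key, content); // keyed upsert: re-writing k1 on retry does not duplicate it
  };
  const run = makeTrimmer({ trim: trimKeepLastN, onHarvest, policy: { keepLastN: 1 } });
  let transcript = [
    { key: 'k1', content: 'one' },
    { key: 'k2', content: 'two' },
    { key: 'k3', content: 'three' },
  ];

  // Round 1: the harvest fails partway, so the caller keeps the full transcript.
  try { transcript = await run(transcript); } catch { /* fail-open: no eviction this round */ }
  const afterRound1 = { kept: transcript.length, stored: store.size };

  // Round 2: the retry succeeds; k1 is overwritten, k2 is added, and eviction goes ahead.
  transcript = await run(transcript);
  const afterRound2 = { kept: transcript.length, stored: store.size };
  return { afterRound1, afterRound2 };
}

const result = demo();
result.then((r) => console.log(r));
// { afterRound1: { kept: 3, stored: 1 }, afterRound2: { kept: 1, stored: 2 } }
```

The point of the ordering is visible in round 1: one unit was already persisted when the store failed, yet nothing was evicted, so the retry loses no history.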
package/src/loop.js CHANGED
@@ -37,6 +37,13 @@ const { ToolError, HaltError } = require('./errors');
  * non-enumerable): assemble calls it to roll a summary window — bareagent makes the one model
  * call, the consumer owns the trigger/N/splice. Its usage is forwarded to `onLlmResult` so the
  * summary tokens count against the budget.
+ * @property {Function} [trim] - async (msgs, ctx) => msgs. DESTRUCTIVE transcript-trim chokepoint (RT-2),
+ * the opposite of `assemble`: it BOUNDS the canonical transcript — the Loop replaces `msgs` with what
+ * this returns, evicting old turns AFTER they are harvested. Runs once per round before `assemble`.
+ * So eviction never drops un-persisted history, wire it via `unitTrimmer({ trim, onHarvest, policy })`
+ * (src/context-units.js), which performs the harvest-before-evict interlock over litectx's `trim` verb.
+ * An optional `.flush(msgs, ctx)` method is called on clean completion for the residual-window harvest.
+ * Fail-open (a trim fault degrades to no eviction that round); a thrown HaltError propagates.
  * @property {Function} [onLlmResult] - async (event) => void after each LLM call; forwards usage to
  * gate.record (via wireGate). `event.kind` discriminates the source: `'turn'` for a main-loop round,
  * `'summarize'` for an out-of-band `ctx.summarize` call (R-C6). Both count against the budget.
@@ -181,6 +188,15 @@ class Loop {
  throw new Error('[Loop] options.assemble must be a function (msgs, info) => msgs');
  }
  this.assemble = options.assemble || null;
+ // RT-2: optional DESTRUCTIVE transcript-trim seam. Unlike `assemble` (a non-destructive view), `trim`
+ // bounds the canonical transcript — the Loop replaces `msgs` with what it returns, evicting old turns
+ // AFTER they've been harvested (the harvest-before-evict interlock lives in the trimmer; see
+ // src/context-units.js unitTrimmer). Opt-in: a Loop with no `trim` is unchanged. A `.flush` method on
+ // the function (if present) is called on clean completion for the F2 residual harvest.
+ if (options.trim != null && typeof options.trim !== 'function') {
+ throw new Error('[Loop] options.trim must be a function (msgs, ctx) => msgs (e.g. unitTrimmer({ trim, onHarvest, policy }))');
+ }
+ this.trim = options.trim || null;
  if (options.onLlmResult != null && typeof options.onLlmResult !== 'function') {
  throw new Error('[Loop] options.onLlmResult must be a function');
  }
@@ -351,6 +367,29 @@ class Loop {
  for (let round = 0; round < HARD_ROUND_LIMIT; round++) {
  if (this._stopped) break;
 
+ // RT-2: destructive transcript-trim chokepoint — bound the canonical transcript before assembling
+ // the window. Runs BEFORE assemble (trim shrinks canonical; assemble shapes the per-call view of
+ // what remains). The trimmer harvests every evicted turn BEFORE returning the smaller set, so this
+ // never drops un-persisted history. Mutate `msgs` IN PLACE (it's `const`, and result.msgs returns
+ // this same reference) → result.msgs becomes the bounded transcript; evicted turns live in the
+ // harvest store, restorable by id. Fail-OPEN: a trim fault degrades to no eviction this round (a
+ // context-bounding bug must not halt the agent); a HaltError (e.g. a write-gate deny during harvest)
+ // propagates as a clean governance exit — same contract as assemble/onLlmResult.
+ if (this.trim) {
+ try {
+ const before = msgs.length;
+ const kept = await this.trim(msgs, ctx);
+ if (Array.isArray(kept) && kept !== msgs) {
+ msgs.length = 0;
+ msgs.push(...kept);
+ if (msgs.length !== before) this._safeEmit({ type: 'loop:trim', data: { round, before, after: msgs.length } });
+ }
+ } catch (err) {
+ if (err instanceof HaltError) throw err;
+ this._reportError('trim', err, { round });
+ }
+ }
+
  // RT-1: context-assembly chokepoint. Let a caller (e.g. a context-engineering library) shape
  // the window sent to the provider this round. Returns a VIEW — the canonical `msgs` transcript
  // is never mutated, so result.msgs stays complete and correct. Fail-OPEN: an assembly error
@@ -412,6 +451,15 @@ class Loop {
  this._safeCall('onText', this.onText, result.text);
  this._safeEmit({ type: 'loop:done', data: { text: result.text, usage: lastUsage, cost: totalCost } });
  msgs.push({ role: 'assistant', content: result.text });
+ // RT-2 F2: residual harvest of the surviving window (incl. this final answer) on clean completion.
+ // `trim` only harvests EVICTED turns; without this, the never-evicted tail would diverge from an
+ // end-of-task batch. The trimmer's idempotent key means it never re-writes what eviction harvested.
+ // Fail-open / HaltError per the trim seam contract. No-op unless a `.flush`-capable trim is wired.
+ const flush = this.trim && /** @type {any} */ (this.trim).flush;
+ if (typeof flush === 'function') {
+ try { await flush(msgs, ctx); }
+ catch (err) { if (err instanceof HaltError) throw err; this._reportError('trim-flush', err, { round }); }
+ }
  return { text: result.text, toolCalls: [], usage: lastUsage, cost: totalCost, error: null, msgs };
  }
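The per-round trim step in the hunks above has two behaviours worth seeing in isolation: the transcript array is emptied and refilled in place so that any other holder of the same reference (such as `result.msgs`) sees the bounded transcript, and errors split into "fail open" versus "halt". The sketch below is a simplified model under assumptions: `applyTrim` and `demo` are hypothetical names, the local `HaltError` class only stands in for the one imported from `./errors`, and plain strings stand in for message objects.

```javascript
class HaltError extends Error {} // hypothetical stand-in for the package's HaltError

async function applyTrim(msgs, trim, reportError) {
  try {
    const kept = await trim(msgs);
    if (Array.isArray(kept) && kept !== msgs) {
      msgs.length = 0;    // empty the array in place...
      msgs.push(...kept); // ...then refill it, so the reference itself never changes
    }
  } catch (err) {
    if (err instanceof HaltError) throw err; // governance halt propagates to the caller
    reportError(err);                        // any other fault: keep msgs untouched this round
  }
}

async function demo() {
  const msgs = ['system', 'u1', 'a1', 'u2'];
  const resultRef = msgs; // same reference, the way result.msgs shares the Loop's array
  const errors = [];
  const report = (e) => errors.push(e.message);

  await applyTrim(msgs, async (m) => m.slice(-2), report);
  const bounded = [...resultRef];

  await applyTrim(msgs, async () => { throw new Error('trim bug'); }, report);
  const afterFault = [...resultRef];

  let halted = false;
  try {
    await applyTrim(msgs, async () => { throw new HaltError('deny'); }, report);
  } catch (e) {
    halted = e instanceof HaltError;
  }
  return { bounded, afterFault, errors, halted };
}

const outcome = demo();
outcome.then((r) => console.log(r));
// { bounded: ['a1', 'u2'], afterFault: ['a1', 'u2'], errors: ['trim bug'], halted: true }
```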
 
package/tools/spawn.d.ts CHANGED
@@ -42,9 +42,16 @@ export type SpawnChildOptions = {
  */
  cliPath?: string | undefined;
  /**
- * - Force-kill child after this many ms.
+ * - Force-kill child after this many ms (wall-clock hard ceiling).
  */
  timeoutMs?: number | undefined;
+ /**
+ * - Force-kill child after this many ms with NO output on either
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
+ */
+ idleTimeoutMs?: number | undefined;
  /**
  * - bareagent Stream — child:stderr events get re-emitted here.
  */
@@ -56,13 +63,16 @@ export type Stream = import("../src/stream").Stream;
  *
  * @param {object} [options]
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
+ * (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
  */
  export function createSpawnTool(options?: {
  cliPath?: string | undefined;
  timeoutMs?: number | undefined;
+ idleTimeoutMs?: number | undefined;
  stream?: import("../src/stream").Stream | undefined;
  }): {
  tool: import("../types").ToolDef;
@@ -86,12 +96,16 @@ export function createSpawnTool(options?: {
  * @property {string} [config] - Path to a bareagent config JSON file.
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
  * @property {string} [cliPath] - Override the bareagent CLI path.
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
  *
  * @param {SpawnChildOptions} [opts]
  */
- export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: SpawnChildOptions): {
+ export function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream }?: SpawnChildOptions): {
  wait: () => Promise<{
  text: any;
  usage: any;
@@ -100,6 +114,7 @@ export function spawnChild({ config, input, cliPath, timeoutMs, stream }?: Spawn
  events: ChildEvent[];
  exitCode: any;
  signal: any;
+ idleKilled: boolean;
  }>;
  onLine: (fn: (event: ChildEvent) => void) => () => void;
  kill: (sig?: NodeJS.Signals) => void;
package/tools/spawn.js CHANGED
@@ -66,12 +66,16 @@ function resolveCliPath() {
  * @property {string} [config] - Path to a bareagent config JSON file.
  * @property {*} [input] - Optional JSON input passed to the child on stdin.
  * @property {string} [cliPath] - Override the bareagent CLI path.
- * @property {number} [timeoutMs] - Force-kill child after this many ms.
+ * @property {number} [timeoutMs] - Force-kill child after this many ms (wall-clock hard ceiling).
+ * @property {number} [idleTimeoutMs] - Force-kill child after this many ms with NO output on either
+ * stdout or stderr (heartbeat/liveness watchdog). The clock arms at spawn and resets on every JSONL
+ * line, so a child doing real work is never killed, but one that hangs silently is. Opt-in
+ * (0/undefined disables); independent of `timeoutMs`, which remains the absolute ceiling.
  * @property {Stream} [stream] - bareagent Stream — child:stderr events get re-emitted here.
  *
  * @param {SpawnChildOptions} [opts]
  */
- function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
+ function spawnChild({ config, input, cliPath, timeoutMs, idleTimeoutMs, stream } = {}) {
  if (typeof config !== 'string' || !config) {
  throw new Error('[spawn] requires { config: <path> }');
  }
@@ -104,10 +108,29 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  if (i >= 0) lineSubscribers.splice(i, 1);
  }; };
 
+ // Idle watchdog: kill the child after `idleTimeoutMs` of silence on BOTH stdio streams.
+ // Distinct from `timeoutMs` (wall-clock ceiling): this catches a child that is alive but stuck
+ // producing nothing — the "no activity in stderr" hang — without punishing one doing slow work,
+ // since `armIdle()` resets on every line. Armed at spawn so a child that never emits is caught too.
+ let idleTimer = null;
+ let idleKilled = false;
+ const armIdle = () => {
+ if (!idleTimeoutMs || idleTimeoutMs <= 0) return;
+ if (idleTimer) clearTimeout(idleTimer);
+ idleTimer = setTimeout(() => {
+ idleKilled = true;
+ try { child.kill('SIGTERM'); } catch { /* already dead */ }
+ setTimeout(() => { try { child.kill('SIGKILL'); } catch { /* already dead */ } }, 5000).unref();
+ }, idleTimeoutMs);
+ idleTimer.unref();
+ };
+ armIdle();
+
  // stdout — JSONL events from the child loop
  const outRl = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
  outRl.on('line', (line) => {
  if (!line) return;
+ armIdle();
  let event;
  try { event = JSON.parse(line); }
  catch {
@@ -130,6 +153,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  const errRl = readline.createInterface({ input: child.stderr, crlfDelay: Infinity });
  errRl.on('line', (line) => {
  if (!line) return;
+ armIdle();
  const event = { type: 'child:stderr', text: line, ts: new Date().toISOString() };
  events.push(event);
  if (stream) {
@@ -157,18 +181,20 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  const exitPromise = new Promise((resolve) => {
  child.on('exit', async (code, signal) => {
  if (killTimer) clearTimeout(killTimer);
+ if (idleTimer) clearTimeout(idleTimer);
  // Drain stdio readlines before resolving — last line may still be in buffer.
  await Promise.all([outClosePromise, errClosePromise]);
- resolve({ code, signal });
+ resolve({ code, signal, idleKilled });
  });
  child.on('error', (err) => {
  if (killTimer) clearTimeout(killTimer);
+ if (idleTimer) clearTimeout(idleTimer);
  resolve({ code: null, signal: null, spawnError: err });
  });
  });
 
  async function wait() {
- const { code, signal, spawnError } = await exitPromise;
+ const { code, signal, spawnError, idleKilled: idle } = await exitPromise;
  if (spawnError) {
  return {
  text: '',
@@ -178,6 +204,7 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  events,
  exitCode: null,
  signal: null,
+ idleKilled: false,
  };
  }
  // Pluck the final loop:done event — that's the canonical child result.
@@ -194,18 +221,21 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  events,
  exitCode: code,
  signal,
+ idleKilled: !!idle,
  };
  }
  // No loop:done — child exited abnormally or never reached the LLM.
  const errEvent = events.find(e => e.type === 'loop:error' || e.type === 'error');
+ const idleNote = idle ? `[spawn] child killed after idle timeout (no output; signal=${signal})` : null;
  return {
  text: '',
  usage: { inputTokens: 0, outputTokens: 0 },
  cost: 0,
- error: errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
+ error: idleNote || errEvent?.data?.error || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`,
  events,
  exitCode: code,
  signal,
+ idleKilled: !!idle,
  };
  }
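The abnormal-exit branch above gives the idle-timeout note priority over any error event the child emitted, and falls back to a generic message otherwise. A minimal sketch of that precedence follows, assuming a hypothetical helper name `exitError` and hand-built inputs; only the expression itself mirrors the hunk.

```javascript
// Hypothetical helper isolating the error precedence used when no loop:done was seen.
function exitError({ idleKilled, errEvent, code, signal }) {
  const idleNote = idleKilled
    ? `[spawn] child killed after idle timeout (no output; signal=${signal})`
    : null;
  // Order matters: idle kill first, then the child's own error event, then the generic fallback.
  return idleNote
    || errEvent?.data?.error
    || `[spawn] child exited (code=${code}, signal=${signal}) without loop:done`;
}

const idleCase = exitError({ idleKilled: true, errEvent: { data: { error: 'boom' } }, code: null, signal: 'SIGTERM' });
const eventCase = exitError({ idleKilled: false, errEvent: { data: { error: 'boom' } }, code: 1, signal: null });
const fallbackCase = exitError({ idleKilled: false, errEvent: undefined, code: 1, signal: null });

console.log(idleCase);     // [spawn] child killed after idle timeout (no output; signal=SIGTERM)
console.log(eventCase);    // boom
console.log(fallbackCase); // [spawn] child exited (code=1, signal=null) without loop:done
```

Callers that need to branch on the cause can read the boolean `idleKilled` field of the result rather than parsing the message.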
 
@@ -222,7 +252,9 @@ function spawnChild({ config, input, cliPath, timeoutMs, stream } = {}) {
  *
  * @param {object} [options]
  * @param {string} [options.cliPath] - Override the bareagent CLI path (default: ./bin/cli.js relative to this file).
- * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min).
+ * @param {number} [options.timeoutMs] - Force-kill child after this many ms (default 10 min, wall-clock ceiling).
+ * @param {number} [options.idleTimeoutMs] - Force-kill child after this many ms of no stdout/stderr output
+ * (heartbeat watchdog; default off). Resets on every line, so slow-but-working children survive.
  * @param {Stream} [options.stream] - bareagent Stream instance — child:stderr events get re-emitted here.
  * @returns {{tool: import('../types').ToolDef, spawnChild: typeof spawnChild}}
  */
@@ -250,6 +282,7 @@ function createSpawnTool(options = {}) {
  input,
  cliPath: options.cliPath,
  timeoutMs: options.timeoutMs ?? DEFAULT_TIMEOUT_MS,
+ idleTimeoutMs: options.idleTimeoutMs,
  stream: options.stream,
  });
  return await handle.wait();
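The idle watchdog pattern in this file (arm a timer at spawn, re-arm it on every stdout or stderr line, kill on expiry) can be tried outside the package with Node's standard library. The following is a sketch under assumptions, not the package's `spawnChild`: `runWithIdleWatchdog` is a hypothetical helper, it omits the SIGKILL escalation and the separate wall-clock ceiling, and the 2000 ms idle window is an arbitrary value chosen to be generous about process start-up time.

```javascript
const { spawn } = require('node:child_process');
const readline = require('node:readline');

// Hypothetical helper: run a command and terminate it after `idleMs` without any output line.
function runWithIdleWatchdog(cmd, args, idleMs) {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    const lines = [];
    let idleKilled = false;
    let timer = null;
    const arm = () => {
      if (timer) clearTimeout(timer);
      timer = setTimeout(() => { idleKilled = true; child.kill('SIGTERM'); }, idleMs);
    };
    arm(); // armed at spawn, so a child that never prints is caught too
    for (const src of [child.stdout, child.stderr]) {
      readline.createInterface({ input: src, crlfDelay: Infinity })
        .on('line', (line) => { lines.push(line); arm(); }); // any output resets the clock
    }
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, idleKilled, lines });
    });
    child.on('error', (spawnError) => {
      clearTimeout(timer);
      resolve({ code: null, signal: null, idleKilled: false, lines, spawnError });
    });
  });
}

// A child that prints once and then hangs silently: the watchdog should fire.
const hung = runWithIdleWatchdog(process.execPath,
  ['-e', "console.log('tick'); setInterval(() => {}, 1000);"], 2000);

// A child that keeps printing and then exits on its own: the watchdog should stay quiet.
const working = runWithIdleWatchdog(process.execPath,
  ['-e', 'let i = 0; const t = setInterval(() => { console.log(i++); if (i === 3) clearInterval(t); }, 100);'], 2000);

hung.then((r) => console.log('hung child idleKilled:', r.idleKilled));
working.then((r) => console.log('working child idleKilled:', r.idleKilled, 'exit code:', r.code));
```

Because the timer is re-armed per line, total runtime is unbounded for a chatty child, which is why the diff keeps `timeoutMs` as an independent absolute ceiling.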