zidane 5.5.4 → 5.6.0
- package/README.md +7 -1
- package/dist/{agent-CMAklak7.d.ts → agent-B26FuGew.d.ts} +90 -2
- package/dist/agent-B26FuGew.d.ts.map +1 -0
- package/dist/chat.d.ts +133 -22
- package/dist/chat.d.ts.map +1 -1
- package/dist/chat.js +3 -3
- package/dist/{errors-C5VSakmT.js → errors-DdZXnyXE.js} +38 -2
- package/dist/errors-DdZXnyXE.js.map +1 -0
- package/dist/{index-CF5QwBiz.d.ts → index-CE7z_11T.d.ts} +2 -2
- package/dist/{index-CF5QwBiz.d.ts.map → index-CE7z_11T.d.ts.map} +1 -1
- package/dist/{index-kroGomhj.d.ts → index-CROWxXo9.d.ts} +23 -2
- package/dist/index-CROWxXo9.d.ts.map +1 -0
- package/dist/index.d.ts +4 -4
- package/dist/index.js +10 -10
- package/dist/{interpolate-Cvjy8gpk.js → interpolate-j5V-wcAQ.js} +2 -2
- package/dist/{interpolate-Cvjy8gpk.js.map → interpolate-j5V-wcAQ.js.map} +1 -1
- package/dist/{login-B_kfoGMP.js → login-D5lQWoFx.js} +3 -3
- package/dist/{login-B_kfoGMP.js.map → login-D5lQWoFx.js.map} +1 -1
- package/dist/{mcp-BE43Viwi.js → mcp-ngMS0S6N.js} +2 -2
- package/dist/{mcp-BE43Viwi.js.map → mcp-ngMS0S6N.js.map} +1 -1
- package/dist/mcp.d.ts +1 -1
- package/dist/mcp.js +1 -1
- package/dist/{messages-BBWakTN6.js → messages-B5k4DAXy.js} +2 -2
- package/dist/{messages-BBWakTN6.js.map → messages-B5k4DAXy.js.map} +1 -1
- package/dist/{presets-BDvBZuYI.js → presets-BDCthpyD.js} +2 -2
- package/dist/{presets-BDvBZuYI.js.map → presets-BDCthpyD.js.map} +1 -1
- package/dist/presets.d.ts +2 -2
- package/dist/presets.js +1 -1
- package/dist/{providers-CsUyN_FJ.js → providers-CaJE2ToS.js} +3 -3
- package/dist/{providers-CsUyN_FJ.js.map → providers-CaJE2ToS.js.map} +1 -1
- package/dist/providers.d.ts +1 -1
- package/dist/providers.js +2 -2
- package/dist/restate.d.ts +1 -1
- package/dist/session/sqlite.d.ts +1 -1
- package/dist/session/sqlite.d.ts.map +1 -1
- package/dist/session/sqlite.js +226 -51
- package/dist/session/sqlite.js.map +1 -1
- package/dist/{session-DzfRacU_.js → session-BoEW_wCR.js} +2 -2
- package/dist/{session-DzfRacU_.js.map → session-BoEW_wCR.js.map} +1 -1
- package/dist/session.d.ts +1 -1
- package/dist/session.js +2 -2
- package/dist/skills.d.ts +2 -2
- package/dist/skills.js +1 -1
- package/dist/{tools-Bbd0Ivwn.js → tools-Co3VYhgM.js} +154 -15
- package/dist/tools-Co3VYhgM.js.map +1 -0
- package/dist/tools.d.ts +2 -2
- package/dist/tools.js +1 -1
- package/dist/{transcript-anchors-C79AszkC.d.ts → transcript-anchors-CTTeQJzy.d.ts} +12 -4
- package/dist/{transcript-anchors-C79AszkC.d.ts.map → transcript-anchors-CTTeQJzy.d.ts.map} +1 -1
- package/dist/tui.d.ts +2 -2
- package/dist/tui.d.ts.map +1 -1
- package/dist/tui.js +432 -83
- package/dist/tui.js.map +1 -1
- package/dist/{turn-operations-CGf7wWF0.js → turn-operations-fhinWY4m.js} +134 -18
- package/dist/turn-operations-fhinWY4m.js.map +1 -0
- package/dist/types-oKPBdCmL.js.map +1 -1
- package/dist/types.d.ts +3 -3
- package/dist/types.js +2 -2
- package/docs/ARCHITECTURE.md +5 -2
- package/docs/CHAT.md +10 -3
- package/docs/RESTATE.md +190 -0
- package/docs/SKILL.md +27 -2
- package/docs/TUI.md +3 -3
- package/package.json +1 -1
- package/dist/agent-CMAklak7.d.ts.map +0 -1
- package/dist/errors-C5VSakmT.js.map +0 -1
- package/dist/index-kroGomhj.d.ts.map +0 -1
- package/dist/tools-Bbd0Ivwn.js.map +0 -1
- package/dist/turn-operations-CGf7wWF0.js.map +0 -1
@@ -1 +1 @@
-
{"version":3,"file":"types-oKPBdCmL.js","names":[],"sources":["../src/types.ts"],"sourcesContent":["/**\n * Shared types for the agent system.\n */\n\nimport type { ToolDef } from './tools/types'\nimport { Buffer } from 'node:buffer'\n\n// ---------------------------------------------------------------------------\n// Thinking / Reasoning\n// ---------------------------------------------------------------------------\n\n/**\n * Thinking / extended-reasoning configuration.\n *\n * - `'off'` — no thinking.\n * - `'minimal' | 'low' | 'medium' | 'high'` — explicit token budget. Maps to\n * provider-specific reasoning controls (Anthropic `thinking.type='enabled'`\n * with a budget; OpenAI `reasoning_effort`).\n * - `'adaptive'` — let the model decide per-turn whether and how much to think.\n * Anthropic-only (`thinking.type='adaptive'`). Other providers fall back to\n * no reasoning when this value is supplied.\n */\nexport type ThinkingLevel = 'off' | 'minimal' | 'low' | 'medium' | 'high' | 'adaptive'\n\n// ---------------------------------------------------------------------------\n// Clock / determinism seam\n// ---------------------------------------------------------------------------\n\n/**\n * Time + UUID source. Defaults to `Date.now()` and `crypto.randomUUID()`.\n *\n * Scoped to **journaled metadata** — every callsite that lands in\n * `SessionTurn.id`, `SessionTurn.createdAt`, `runId`, `turnId`, or hook\n * payloads consumers may persist. Live-only measurements (TTFT deltas,\n * `elapsed` counters) keep `Date.now()` directly so they reflect real\n * wall-clock progress.\n *\n * `now()` is allowed to return a `Promise<number>` so durable-execution\n * adapters can journal each timestamp (Restate's `ctx.date.now()` is\n * async because it routes through `ctx.run`). The loop awaits at every\n * callsite. `randomUUID()` stays synchronous because Restate's\n * `ctx.rand.uuidv4()` is deterministic-from-seed and doesn't need\n * journaling. 
The native default returns sync values for both.\n *\n * Durable-execution adapters (Restate, Temporal, …) inject a journaled\n * variant — Restate: `{ now: () => ctx.date.now(), randomUUID: () => ctx.rand.uuidv4() }` —\n * so replay regenerates byte-identical session metadata across attempts.\n *\n * Precedence: `AgentRunOptions.clock` > `AgentOptions.clock` >\n * {@link DEFAULT_AGENT_CLOCK}.\n */\nexport interface AgentClock {\n /**\n * Current wall-clock time in ms (epoch). May be async so journaled\n * implementations (Restate) can plumb through their durable timer.\n */\n now: () => number | Promise<number>\n /**\n * RFC-4122 v4 UUID string. Always sync — Restate's `rand.uuidv4()`\n * is seeded from the invocation ID and replay-stable by construction.\n */\n randomUUID: () => string\n}\n\n/** Native clock backing the default (Date.now + crypto.randomUUID). */\nexport const DEFAULT_AGENT_CLOCK: AgentClock = {\n now: () => Date.now(),\n randomUUID: () => crypto.randomUUID(),\n}\n\n// ---------------------------------------------------------------------------\n// MCP server configuration\n// ---------------------------------------------------------------------------\n\n/**\n * Slim shape of an upstream MCP tool descriptor — what `client.listTools()`\n * returns per entry. 
Exposed publicly so hosts can persist the schemas\n * between runs and feed them back via {@link McpServerConfig.cachedTools}\n * to skip the `tools/list` round-trip on subsequent bootstraps.\n */\nexport interface McpToolSchema {\n name: string\n description?: string | null\n inputSchema?: unknown\n}\n\nexport interface McpServerConfig {\n /** Display name (used for tool namespacing) */\n name: string\n /** Transport type */\n transport: 'stdio' | 'sse' | 'streamable-http'\n /** For stdio: command to run */\n command?: string\n /** For stdio: command arguments */\n args?: string[]\n /**\n * For stdio: environment variables to pass to the server process.\n *\n * Merged on top of the MCP SDK's default inherited environment — a safety\n * whitelist (`PATH`, `HOME`, `LANG`, `SHELL`, `USER` on POSIX; `APPDATA`,\n * `PATH`, ... on Win32). Setting this to `{}` no longer strips `PATH` from\n * the child process. Set {@link McpServerConfig.strictEnv} to `true` to\n * pass `env` verbatim with no inherited defaults.\n */\n env?: Record<string, string>\n /**\n * When true, {@link McpServerConfig.env} is passed verbatim to the spawned\n * process — the MCP SDK's default inherited environment (`PATH`, `HOME`, ...)\n * is NOT merged in. Most consumers should leave this off; the default merge\n * prevents `spawn ENOENT` when a stdio server declares an `env` without\n * restating `PATH`.\n */\n strictEnv?: boolean\n /** For sse/streamable-http: server URL */\n url?: string\n /** Optional headers for HTTP transports */\n headers?: Record<string, string>\n /**\n * OAuth 2.1 authentication (sse / streamable-http only).\n *\n * - `'oauth'` — enables the SDK's OAuth flow with RFC 9728 protected-resource\n * metadata discovery, RFC 8414 / OIDC authorization-server metadata, RFC 7591\n * dynamic client registration, PKCE, and refresh-token rotation. Tokens persist\n * between runs via the host's credential store.\n * - `undefined` (default) — no OAuth. 
The host may still auto-promote a server\n * to OAuth on `UnauthorizedError` IF no static `Authorization` header is set\n * (the headers check stops us from second-guessing user-managed bearer tokens).\n *\n * Recognized aliases at parse time: Cursor's `authMethod: 'mcpOAuth'` maps to\n * `auth: 'oauth'` so `~/.cursor/mcp.json` pastes work unchanged.\n */\n auth?: 'oauth'\n /**\n * Timeout in milliseconds for MCP server bootstrap (connect + tool discovery).\n *\n * Zidane connects MCP servers lazily on the first `run()`. Without a\n * bootstrap timeout, a slow or hung server can delay the first provider call\n * for an arbitrarily long time even when that MCP server is never used.\n *\n * Default: `10000`.\n */\n bootstrapTimeout?: number\n /** Timeout in milliseconds for MCP tool calls (default: 30000) */\n toolTimeout?: number\n /**\n * Allow-list of tool names to expose. Names match the upstream tool name\n * (NOT the namespaced `mcp_{server}_{tool}` form). Tools not in the list are\n * dropped before registration — the model never sees them in its catalog and\n * the wire cost of advertising them is avoided.\n *\n * Mutually exclusive with {@link McpServerConfig.disabledTools} — passing both\n * throws at bootstrap time.\n *\n * Composes with {@link McpServerConfig.toolFilter}: allow-list applies first,\n * then the predicate. Composes with the `mcp:tools:filter` hook: config-side\n * filters apply first, then the hook can further narrow the list.\n */\n enabledTools?: string[]\n /**\n * Deny-list of tool names. Tools matching are dropped before registration.\n * Same matching semantics as {@link McpServerConfig.enabledTools}.\n */\n disabledTools?: string[]\n /**\n * Custom predicate run on each upstream tool. Return `true` to keep, `false`\n * to drop. 
Receives the raw `listTools()` payload — useful for filtering by\n * description, schema shape, or other metadata that an allow/deny list can't\n * express.\n *\n * Runs after the allow/deny filter but before the `mcp:tools:filter` hook.\n */\n toolFilter?: (tool: { name: string, description?: string | null, inputSchema?: unknown }) => boolean\n /**\n * Per-server override for {@link AgentBehavior.toolDisclosure}. When set,\n * this server's tools follow this disclosure mode regardless of the\n * agent-wide default. Useful when one big MCP server (200+ tools) should\n * stay lazy while smaller servers stay eager.\n *\n * Default: inherits from `behavior.toolDisclosure`.\n */\n disclosure?: 'eager' | 'lazy'\n /**\n * Pre-cached tool schemas to advertise without issuing `tools/list` at\n * bootstrap. The connection is still established (the SDK's `connect()`\n * is needed for `tools/call`) — only the discovery round-trip is\n * skipped. Schemas are trusted as-is; the host owns invalidation\n * (typical cache key: `(server identity, server version)`). If the\n * server later returns `MethodNotFound` for a cached tool, the host\n * should drop the entry from its cache so the next bootstrap re-lists.\n *\n * Compatible with every transport, every auth mode, and with\n * {@link McpServerConfig.lazyConnect}. Composes with the existing\n * `enabledTools` / `disabledTools` / `toolFilter` filters — those run\n * over the cached schemas exactly as they would over `listTools()`\n * output.\n */\n cachedTools?: McpToolSchema[]\n /**\n * Defer the `client.connect(transport)` call until the first\n * `tools/call` reaches this server. Bootstrap registers the server's\n * tools using {@link McpServerConfig.cachedTools} without touching\n * the network, taking MCP setup off the critical path of\n * `agent.run()`. 
The first invocation pays the connect cost\n * (~200-500ms typically); every subsequent call reuses the live\n * client.\n *\n * Requires {@link McpServerConfig.cachedTools} — without schemas in\n * hand there is nothing to advertise to the model, so deferring the\n * connection has no purpose. Bootstrap rejects the config otherwise.\n *\n * **Incompatible with `auth: 'oauth'`**: the OAuth handshake (token\n * refresh / RFC 9728 metadata discovery) can fail in ways that today\n * fire `mcp:auth:required` at bootstrap so the host can surface a\n * login affordance *before* the model commits to calling a tool.\n * Deferring that to first call means an auth failure surfaces mid-run\n * as a tool-result error, which the model can't recover from without\n * a fresh prompt. Bootstrap rejects the combination so the error is\n * loud and proximate to the misconfiguration. Use OAuth servers\n * without `lazyConnect` (with `cachedTools` alone, if you want to\n * skip the `tools/list` round-trip).\n *\n * On connect failure (network error, transport refused), the cached\n * promise is dropped so the next `tools/call` retries. The model\n * sees the failure as a normal tool error. Subsequent calls remain\n * eligible to succeed once the upstream is reachable again.\n */\n lazyConnect?: boolean\n}\n\n// ---------------------------------------------------------------------------\n// Tool execution\n// ---------------------------------------------------------------------------\n\nexport interface AgentBehavior {\n /**\n * Maximum number of tools that may be in flight concurrently within a\n * single assistant turn. The scheduler dispatches concurrency-safe tools\n * (`ToolDef.isConcurrencySafe`) in parallel up to this cap; unsafe tools\n * act as barriers (wait for the fleet to drain, then run alone).\n *\n * Default: `10`. Set to `1` to force fully sequential dispatch regardless\n * of per-tool flags — useful for deterministic debugging / eval-grade\n * runs. 
Values `< 1` are clamped to `1`.\n */\n maxConcurrentTools?: number\n /**\n * Max agent loop iterations.\n *\n * Default: unlimited (Infinity). The loop runs until the model signals\n * completion (no tool calls / `end_turn`), the abort signal fires, or this\n * cap is hit. Set a finite value as a safety net for runaway loops.\n */\n maxTurns?: number\n /** Max tokens per LLM response (default: 16384) */\n maxTokens?: number\n /** Thinking token budget — overrides the level-based default when set */\n thinkingBudget?: number\n /** JSON Schema for structured output enforcement */\n schema?: Record<string, unknown>\n /**\n * Enable provider prompt caching. When on (default), the provider marks the\n * system prompt, tools, and the last stable message with cache breakpoints so\n * the shared prefix is served from cache across turns.\n *\n * - Anthropic: `cache_control: { type: 'ephemeral' }` on the last `system`\n * content part, the last tool, and the last message content part.\n * - OpenAI-compatible / OpenRouter: same shape — honored by Anthropic-backed\n * OpenRouter routes and by Gemini; ignored (no-op) by providers that cache\n * automatically (OpenAI, DeepSeek, Grok, Groq, Moonshot).\n *\n * Usage is surfaced via `TurnUsage.cacheRead` / `TurnUsage.cacheCreation`.\n *\n * Default: `true`.\n */\n cache?: boolean\n /**\n * Soft per-turn cap on total tool-output bytes. When the sum of `outputBytes`\n * across a turn's tool results exceeds this value, the loop injects a\n * synthetic user message instructing the model to summarize before calling\n * more tools, and fires the `budget:exceeded` hook.\n *\n * Measured **post-`tool:transform`** so consumer truncation counts toward the\n * budget. Off by default (undefined / `0` disables the check). A reasonable\n * starting value for OSS-model integrations is `32768`.\n */\n toolOutputBudget?: number\n /**\n * Canonical tool names whose output is exempt from\n * {@link AgentBehavior.toolOutputBudget} accounting. 
Their bytes don't\n * count toward the per-turn cap and the \"summarize before calling more\n * tools\" nudge is not triggered by them alone.\n *\n * Intended for tools whose entire purpose is to LOAD context into the\n * conversation — penalising them via the budget creates the exact\n * failure mode the budget is meant to prevent (the model gets steered\n * away from the very tool it just called to make progress):\n *\n * - `tool_search` — returns full `inputSchema` payloads for MCP tools the\n * model needs to call next. A Notion MCP server with 20+ tools easily\n * exceeds 64 KiB of schema JSON in a single discovery call.\n * - `skills_use` / `skills_read` — inject skill content; that text IS\n * the value of the call.\n *\n * `read_file` is intentionally NOT in the default list: a 200 KiB file\n * load is exactly the case the budget should steer against (pagination\n * or summarisation is the right next move).\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n * The matching happens on the canonical (registry-key) name, so aliases\n * are stable.\n *\n * Default: `undefined` — every tool counts. Chat profiles set their own\n * list — see `src/chat/agents.ts`.\n */\n toolOutputBudgetExcludeTools?: readonly string[]\n /**\n * Deduplicate identical re-reads of the same file in `read_file`. When the\n * model re-reads a file with the same slice and the bytes haven't changed\n * since the last read in this session, the tool returns a short stub\n * instead of re-emitting the full content. Pairs with the read-before-edit\n * guard in `edit` / `multi_edit`.\n *\n * Requires a session (set via `createSession()`); without one, the flag is\n * a no-op since per-session state has nowhere to live.\n *\n * Default: `true`.\n */\n dedupReads?: boolean\n /**\n * Taper the thinking budget over the course of a run. Late turns are\n * usually checkpoint / cleanup work where reasoning rarely pays for\n * itself; early turns benefit most. 
Two forms:\n *\n * - **Struct** — geometric decay starting after `afterTurn`, multiplying by\n * `factor` each subsequent turn, clamped to `floor`. Example\n * `{ afterTurn: 5, factor: 0.5, floor: 1024 }` with a base budget of 8192:\n * turns 1-5 = 8192, turn 6 = 4096, turn 7 = 2048, turn 8+ = 1024.\n * - **Function** — `(runTurn, baseBudget) => number`. Arbitrary curves;\n * `runTurn` is 1-indexed, run-relative (resumed sessions reset).\n *\n * No-op when `thinkingBudget` is unset. Honored by every provider that\n * respects `thinkingBudget` (anthropic explicit-budget `enabled` path,\n * adaptive `maxTokensCap`, openai-compat `max_tokens` padding).\n *\n * Default: `undefined` (no decay).\n */\n thinkingDecay?: { afterTurn: number, factor: number, floor: number } | ((runTurn: number, baseBudget: number) => number)\n /**\n * Per-tool soft call budget for this run. Keyed by **canonical** tool name.\n * On the first call after the run-cumulative dispatched count for that tool\n * reaches `max`, the framework fires `onExceed`:\n *\n * - `'steer'` (default) — let the call execute, but emit a synthetic user\n * message after the turn that nudges the model away from re-calling the\n * tool. Reuses the existing post-turn steer pathway used by\n * `toolOutputBudget`. Fires `tool-budget:exceeded` with `mode: 'steer'`.\n * - `'block'` — refuse the call via `tool:gate` `block`. The model sees a\n * `Blocked: <reason>` tool result. Fires `tool-budget:exceeded` with\n * `mode: 'block'`.\n * - **Function** — `(ctx) => { mode, message }`. The consumer supplies the\n * steering / refusal text and chooses the mode dynamically.\n *\n * Counts include both real dispatches and dedup substitutes (Z19 hits).\n * Excludes calls already blocked by an earlier gate (skill allow-list,\n * consumer hook). 
Tool dispatched by spawned subagents has its own per-run\n * counter — child counts never charge the parent.\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n *\n * Atomic in parallel mode: the middleware tracks its own per-tool\n * approval counter, incremented synchronously at gate-time. A\n * 4-call parallel batch against `max: 2` will let the first 2 through\n * and refuse the rest, even though the loop's `runToolCounts` only\n * propagates between calls (not within a single batch's gate fan-out).\n *\n * Default: `undefined` (no budget enforcement).\n */\n toolBudgets?: Record<string, {\n max: number\n onExceed?: 'steer' | 'block' | ((ctx: {\n tool: string\n count: number\n max: number\n }) => { mode: 'steer' | 'block', message: string })\n }>\n /**\n * Generic per-tool argument deduplication. Keyed by the tool's **canonical**\n * name (alias-stable). Each entry is a hasher: `(input) => string | undefined`.\n *\n * **Hasher contract** — three return values, three meanings:\n *\n * | Return | Meaning |\n * |-------------------------|------------------------------------------------------------------------|\n * | a non-empty string | Cache key for this call. Equal keys (most-recent-only, this session) |\n * | | replay the prior recorded result without re-dispatching the tool. |\n * | `undefined` | **Skip dedup for this call.** The tool runs normally; nothing recorded.|\n * | `''` / non-string | Treated identically to `undefined` (defensive: no dedup, no error). |\n *\n * The `undefined` opt-out is the way to say *\"this specific call is not\n * cacheable\"* (timestamps in input, randomness baked in, debug flags). It\n * is **not** the same as `JSON.stringify(input)` — that would dedup against\n * the verbatim input. 
Pick one explicitly:\n *\n * ```ts\n * // Always cache by full input — every identical re-call dedups.\n * dedupTools: { my_pure_tool: input => JSON.stringify(input) }\n *\n * // Cache by a normalized subset; non-cacheable shapes opt out.\n * dedupTools: {\n * execute_sql: (input) => {\n * const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined\n * if (!q || q.includes('now()') || q.includes('random()')) return undefined\n * return q\n * },\n * }\n * ```\n *\n * On a hit, the previously-recorded result is replayed as the tool_result\n * without dispatching the tool. The substitution flows through `tool:gate`\n * `result` (Z20), so `tool:after` and `tool:transform` still fire.\n *\n * Requires a session (`createSession()`); without one, the map is a silent\n * no-op since per-session state has nowhere to live. Tools with side\n * effects or non-deterministic outputs (network, time, randomness) MUST\n * NOT be listed — there is no safety net beyond the consumer's hasher.\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n * Concurrency-safe siblings ({@link ToolDef.isConcurrencySafe}) in the\n * SAME assistant turn race against each other — none can dedup against\n * a sibling that started in the same batch. Unsafe tools act as barriers\n * and honor submission order within a turn, so an unsafe-but-listed tool\n * follows the cache cleanly.\n *\n * **Cache policy**: only the most recent `(hash, result)` per tool is\n * retained. Interleaved patterns (input A, input B, input A) miss on the\n * second A because B overwrote it. 
Sufficient for the common spam-the-\n * same-call loop; consumers needing a richer cache should hook\n * `tool:gate` directly.\n *\n * Default: `undefined` (no per-tool dedup).\n */\n dedupTools?: Record<string, (input: Record<string, unknown>) => string | undefined>\n /**\n * Require `read_file` before `edit` / `multi_edit` on the same path, and\n * reject edits when the file has changed on disk since the last read in\n * this session. Eliminates the silent-corruption failure mode where a\n * model \"remembers\" stale content and applies a substring edit against\n * bytes that have moved.\n *\n * Requires a session. Off by default; turn it on for stricter eval-grade\n * runs where silent edit corruption would invalidate the result.\n *\n * Default: `false`.\n */\n requireReadBeforeEdit?: boolean\n /**\n * Client-side context compaction strategy. Use this for non-Anthropic\n * providers (OSS via cerebras / openai-compat / openrouter) that don't\n * have a server-side equivalent. Anthropic users should prefer the\n * server-side `context-management-2025-06-27` beta — see\n * `AnthropicParams.contextManagement`.\n *\n * - `'off'` (default) — no client-side compaction.\n * - `'tail'` — when total tool-output bytes in the persisted history\n * exceed `compactThreshold`, replace older `tool_result` outputs with a\n * short stub, keeping the newest `compactKeepTurns` turns intact. The\n * compaction is applied to the wire-level message list only; the\n * underlying session turns are not modified.\n *\n * Default: `'off'`.\n */\n compactStrategy?: 'off' | 'tail'\n /**\n * Soft byte threshold that triggers tail compaction when\n * `compactStrategy === 'tail'`. Counts the post-`context:transform` bytes\n * of `tool_result` outputs across all messages. Default: `131_072` (128\n * KiB). Ignored when compaction is off.\n */\n compactThreshold?: number\n /**\n * Number of trailing turns to leave untouched during tail compaction. 
The\n * most-recent `compactKeepTurns` user/assistant messages are not eligible\n * for elision so the model keeps the freshest tool context. Default: `4`.\n */\n compactKeepTurns?: number\n /**\n * Prefix every line of `read_file` output with its 1-indexed line number\n * followed by a tab (`<N>\\t<content>`) — the compact `cat -n`-style\n * format Claude Code emits. The `edit` tool strips the prefix from\n * `old_string` / `new_string` so the model can paste back a numbered\n * chunk verbatim without breaking the match.\n *\n * Set `false` to opt out — useful for callers piping `read_file` into\n * downstream parsers that don't recognize the prefix. Per-call\n * `read_file({ lineNumbers: false })` overrides this default.\n *\n * Default: `true`.\n */\n readLineNumbers?: boolean\n /**\n * Replace older `read_file` `tool_result` blocks with a short stub when\n * a successful `edit` / `multi_edit` / `write_file` later in the same\n * run modified the same path. The replacement is applied to the\n * wire-level message list only — persisted session turns keep the\n * original content.\n *\n * Eliminates the common waste pattern where the model carries the\n * pre-edit file body forward across many turns \"in case it needs it\".\n * Pairs cleanly with `compactStrategy: 'tail'`: stale reads shrink\n * first, then the byte-threshold compaction fires if anything's left.\n *\n * Detection is conservative — only triggers when the corresponding\n * tool_result confirms success (`Edited …`, `Created …`, `Updated …`).\n * Failed edits and `No change needed` write_file calls do NOT\n * invalidate prior reads.\n *\n * Default: `false`.\n */\n elideStaleReads?: boolean\n /**\n * Tool disclosure strategy. 
Controls whether the model sees every tool's\n * full `inputSchema` in its tool list every turn (\"eager\") or whether MCP\n * tools are advertised as a name+description catalog in the system prompt\n * and only get full schemas after being surfaced via the `tool_search`\n * native tool (\"lazy\" / progressive disclosure).\n *\n * Native tools (those passed to `createAgent({ tools })`) and skill tools\n * are always eager — they are core to the agent and cheap. Only MCP tools\n * are eligible for lazy disclosure.\n *\n * When `'lazy'`, the agent:\n * - Appends a `<searchable_tools>` section to the system prompt listing\n * every MCP tool by `name` + `description` only (no `inputSchema`).\n * - Auto-injects a `tool_search` native tool (opt out via\n * {@link AgentBehavior.toolSearch}) the model uses to load schemas on\n * demand. Surfaced tools persist for the rest of the run.\n * - Rebuilds the wire-level tool list each turn, appending newly-unlocked\n * tools at the end so the prefix-cache breakpoint advances cleanly.\n *\n * Trade-off: every `tool_search` invocation expands the tool list and\n * invalidates the tool-list cache breakpoint for one turn. With many\n * MCP servers, the savings on cold turns (fewer schemas in context) are\n * substantial; with one tiny MCP server, the overhead may not pay back.\n *\n * Default: `'eager'`.\n */\n toolDisclosure?: 'eager' | 'lazy'\n /**\n * Fine-grained config for the `tool_search` tool auto-injected when\n * {@link AgentBehavior.toolDisclosure} is `'lazy'`. No-op in eager mode.\n *\n * - `tool: false` — opt out of the auto-injection entirely. Use when the\n * host wants to ship a custom discovery tool. Note that the catalog\n * text drops the call-to-action prose in this case so the model isn't\n * pointed at a non-existent tool.\n * - `limit` — default cap on results returned per `tool_search` call when\n * the model omits the parameter. 
Default: `20`.\n *\n * Note on host-defined `tool_search`: a tool the host registers under the\n * name `tool_search` (or under any alias whose canonical is `tool_search`)\n * will shadow the auto-injected one — the catalog text will point at the\n * host's wire name, but driving the unlock flow requires either using\n * `createToolSearchTool({ catalog, unlocked })` from `tools/tool-search`\n * (which internally mutates the unlock set) or fully opting out via\n * `toolSearch.tool: false` and treating discovery as a host-side concern.\n * A bare host tool that doesn't touch the unlock set will not advance the\n * lazy disclosure state and the hard gate will keep refusing lazy calls.\n *\n * Default: `undefined` (auto-inject with the default limit).\n */\n toolSearch?: {\n tool?: false\n limit?: number\n }\n /**\n * Persist large `tool_result` outputs to disk and replace the in-message\n * content with a `<persisted-output>` stub (preview + filesystem path).\n * When the post-`tool:transform` byte size of a tool's result exceeds\n * this threshold, the framework writes the full payload to\n * `<persistDir>/<callId>.txt` and substitutes a fixed-format stub so the\n * model sees a 2 KiB preview plus the path it can `read_file`.\n *\n * The substitution happens at emit time (just after `tool:transform` runs)\n * and the stub flows into `session.turns` directly — so every subsequent\n * turn re-emits the same bytes, keeping the prompt-cache prefix stable.\n *\n * Set `0` / `undefined` to disable. Built-in chat profiles default to\n * `8192`. Tools listed in {@link AgentBehavior.persistExcludeTools} bypass\n * regardless of size — typically because their output is intentionally\n * short or persisting would be circular (e.g. 
`read_file`).\n *\n * Requires {@link AgentBehavior.persistDir} to be set; without a target\n * directory the framework silently skips persistence (no throw, no\n * substitution) since there's nowhere to write the blob.\n *\n * Default: `undefined` (off).\n */\n persistThreshold?: number\n /**\n * Canonical tool names to exclude from disk persistence regardless of\n * output size. The framework bypasses persistence for any tool whose\n * canonical name appears in this list — useful for tools whose results\n * are intentionally part of the prompt (`skills_use`), short envelopes\n * (`tool_search`, `present_plan`, `ask_user`), or where persistence\n * would be circular (`read_file`, whose pagination already serves the\n * same use case).\n *\n * Default: `undefined` (no exclusions). The chat-layer built-in profiles\n * set their own list — see `src/chat/agents.ts`.\n */\n persistExcludeTools?: readonly string[]\n /**\n * Directory under which persisted tool-result blobs land. Each call's\n * payload is written to `<persistDir>/<callId>.txt` (one file per\n * `tool_use` id, atomic via write-then-rename).\n *\n * The chat layer resolves this to `<userDir>/tool-results/<sessionId>/`\n * at session activation; SDK consumers pass an absolute path. Required\n * when {@link AgentBehavior.persistThreshold} is non-zero — when unset\n * the framework treats persistence as disabled.\n *\n * Default: `undefined`.\n */\n persistDir?: string\n /**\n * Absolute directory where the `shell` tool's background mode (the\n * `run_in_background: true` flag) appends output log files. One file\n * per task: `<tasksDir>/<task-id>.log` (e.g. `bash_1.log`). The model\n * gets the absolute path back in the tool result and reads incremental\n * output via the regular `read_file` tool.\n *\n * The chat layer resolves this to `<userDir>/<sessionId>/tasks/` at\n * session activation; SDK consumers pass an absolute path. 
When unset,\n * `shell({ run_in_background: true })` surfaces a clean error to the\n * model so the framework doesn't silently fall back to a path the user\n * didn't pick.\n *\n * Default: `undefined`.\n */\n tasksDir?: string\n /**\n * Hide the built-in `shell` tool's `run_in_background` field + the\n * background-mode paragraphs in its description, even when\n * {@link AgentBehavior.tasksDir} is set. The model never sees the flag\n * and won't try to use it.\n *\n * Combined with the implicit gate on `tasksDir` (background mode also\n * auto-hides when `tasksDir` is unset, since the host hasn't wired the\n * log dir), this gives two ways to opt out:\n *\n * - **Implicit**: don't set `tasksDir`. The host hasn't opted in.\n * - **Explicit**: set `disableBackgroundTasks: true`. The host has\n * `tasksDir` for some other reason (legacy, fixture) but doesn't\n * want the model spawning background work.\n *\n * Only applies to the framework-provided `shell` tool (identity check\n * against the exported `shell` constant). Hosts who wrap or replace\n * the shell tool own their spec — the auto-disable doesn't touch\n * tool defs the agent doesn't recognize.\n *\n * Default: `false`.\n */\n disableBackgroundTasks?: boolean\n /**\n * Fail-fast instead of repair when the pre-send pairing pass detects\n * corruption (orphan `tool_use` / `tool_result`, duplicate ids,\n * compaction-stranded blocks). Throws {@link AgentToolPairingError} from\n * the next `agent.run()` turn carrying the structured repair list the\n * loop would have performed.\n *\n * Use case: training-data collectors that must reject any transcript\n * containing the synthetic `SYNTHETIC_TOOL_RESULT_PLACEHOLDER` rather\n * than ship poisoned data to the fine-tuning pipeline. 
User-facing chat\n * sessions should leave this off (the repair-on-the-fly behavior is the\n * point of the pass).\n *\n * Telemetry note: `pairing:repair` still fires for every repair before\n * the throw, so observability handlers see exactly what would have\n * happened.\n *\n * Default: `false`.\n */\n strictToolPairing?: boolean\n}\n\n// ---------------------------------------------------------------------------\n// Prompt parts (multimodal input)\n// ---------------------------------------------------------------------------\n\n/**\n * One block of a multimodal user prompt.\n *\n * `agent.run({ prompt })` accepts either a plain string (treated as a single\n * text part) or an array of these parts for multimodal inputs.\n *\n * `document` parts are routed per provider: PDF-style mime types are sent as\n * native document blocks when the provider supports them; text documents are\n * inlined as text with an attachment header. Providers that cannot handle an\n * image or document throw early.\n */\nexport type PromptPart\n = | PromptTextPart\n | PromptImagePart\n | PromptDocumentPart\n\nexport interface PromptTextPart {\n type: 'text'\n text: string\n}\n\nexport interface PromptImagePart {\n type: 'image'\n /** IANA media type (e.g. `image/png`, `image/jpeg`) */\n mediaType: string\n /** Base64-encoded payload */\n data: string\n /** Optional display name */\n name?: string\n}\n\nexport interface PromptDocumentPart {\n type: 'document'\n /** IANA media type (e.g. 
`application/pdf`, `text/plain`) */\n mediaType: string\n /** Either a base64-encoded payload (`encoding: 'base64'`) or raw text (`encoding: 'text'`) */\n data: string\n encoding: 'base64' | 'text'\n /** Optional display name used in attachment headers */\n name?: string\n}\n\n// ---------------------------------------------------------------------------\n// Canonical message format (used throughout the agent system)\n// ---------------------------------------------------------------------------\n\n/**\n * A single block of structured tool-result content.\n *\n * MCP servers can return a mix of text, image, resource, and audio blocks. Tools\n * return `string` for the common text-only case or `ToolResultContent[]` when they\n * need to preserve non-text content (e.g. screenshots from a browser MCP).\n *\n * Providers that support native multi-part tool results (Anthropic, OpenAI Codex via\n * pi-ai) route image blocks into their wire format verbatim; OpenAI-compat providers\n * route them via a companion-user-message fallback when the underlying model/endpoint\n * does not accept images inside tool-role messages.\n */\nexport type ToolResultContent\n = | ToolResultTextContent\n | ToolResultImageContent\n\nexport interface ToolResultTextContent {\n type: 'text'\n text: string\n}\n\nexport interface ToolResultImageContent {\n type: 'image'\n /** IANA media type (e.g. `image/png`, `image/jpeg`) */\n mediaType: string\n /** Base64-encoded payload */\n data: string\n}\n\n/**\n * Lossy flattener — converts `ToolResultContent[]` (or a plain string) to a single\n * string. 
Image blocks are replaced with `[image: <media> — <n> b64 bytes]` markers.\n *\n * Use at UI boundaries where a string is required; providers that understand\n * structured content should route the array through without flattening.\n */\nexport function toolResultToText(content: string | ToolResultContent[]): string {\n if (typeof content === 'string')\n return content\n return content\n .map((block) => {\n if (block.type === 'text')\n return block.text\n return `[image: ${block.mediaType} — ${block.data.length} b64 bytes]`\n })\n .join('\\n')\n}\n\n/**\n * Approximate **wire payload size** of a tool output, in bytes.\n *\n * - Plain text: UTF-8 byte length.\n * - Structured content: text blocks contribute their UTF-8 byte length; image\n * blocks contribute their **base64 character length** — a proxy for the\n * serialized request-body footprint, NOT for tokens. Vision encoders\n * tokenize decoded pixels (geometry-dependent; e.g. Anthropic ≈ `w·h/750`,\n * OpenAI ≈ 85 + 170/tile), which has no meaningful relationship to base64\n * length.\n *\n * Used by the agent loop to populate `outputBytes` on `tool:after`,\n * `tool:transform`, `mcp:tool:after`, and `mcp:tool:transform` hooks so\n * consumers can size-budget tool output without re-counting bytes themselves.\n * Suitable for byte-budget heuristics (`toolOutputBudget`, tail compaction);\n * NOT a substitute for provider-side context-window accounting — defer to\n * server-side context management (e.g. 
Anthropic's `context-management-*`\n * beta) when token accuracy matters.\n */\nexport function toolOutputByteLength(content: string | ToolResultContent[]): number {\n if (typeof content === 'string')\n return Buffer.byteLength(content)\n let total = 0\n for (const block of content) {\n if (block.type === 'text')\n total += Buffer.byteLength(block.text)\n else\n total += block.data.length\n }\n return total\n}\n\nexport type SessionContentBlock\n = | { type: 'text', text: string }\n | { type: 'image', mediaType: string, data: string }\n | { type: 'tool_call', id: string, name: string, input: Record<string, unknown> }\n | {\n type: 'tool_result'\n callId: string\n /**\n * Tool output — either a plain string (text-only, the common case) or a structured\n * array of content blocks (text + image for multimodal tools such as screenshots).\n */\n output: string | ToolResultContent[]\n isError?: boolean\n }\n | {\n type: 'thinking'\n text: string\n signature?: string\n /**\n * Provider that minted `signature`. Signatures are provider-bound (Anthropic\n * HMAC vs. OpenAI `encrypted_content`) and are dropped on cross-provider\n * hops to avoid 400s. Unset means legacy/unknown — forwarded as-is.\n */\n signatureProducer?: 'anthropic' | 'openai'\n }\n | { type: 'redacted_thinking', data: string }\n | {\n /**\n * Opaque round-trip envelope for reasoning state minted by an OpenAI-compat\n * gateway (currently OpenRouter). The gateway expects its own\n * `reasoning_details` array echoed back verbatim on the next turn so the\n * upstream model can resume an extended-reasoning chain across tool calls.\n *\n * Stored opaquely because the items are provider-bound (Anthropic HMAC\n * signatures, OpenAI `encrypted_content`, model-specific summary formats\n * — all flowing through the gateway's normalized envelope).\n */\n type: 'provider_reasoning'\n producer: 'openrouter'\n details: unknown[]\n /**\n * Model id that produced the details. 
Reasoning is bound to a specific\n * upstream route — a model switch on the next turn invalidates the\n * embedded signatures, so the sender drops the block on mismatch.\n */\n model?: string\n }\n | {\n /**\n * Compaction marker. Inserted by `compactConversation()` to replace a\n * prefix of turns with an LLM-generated summary.\n *\n * The marker lives in `session.turns` and renders in the transcript —\n * the user can still scroll back to see the original turns. From the\n * agent loop's wire-level perspective, every turn whose id appears in\n * `replacesTurnIds` is dropped, and this block's `summary` text is\n * sent to the model as a single user message in their place.\n *\n * The marker turn carries `role: 'user'` so it sits naturally at a\n * conversational boundary. Only the latest `compact-summary` block in\n * the session is honored — earlier markers are subsumed by later\n * ones (their `replacesTurnIds` are a strict prefix).\n */\n type: 'compact-summary'\n /** Turn ids the summary replaces, in chronological order. */\n replacesTurnIds: readonly string[]\n /** The summary text sent to the model in place of the elided turns. */\n summary: string\n /** Model id used to produce the summary. */\n model: string\n /** Token usage from the summary call. */\n usage: TurnUsage\n /** Unix-ms when compaction completed. */\n compactedAt: number\n }\n\nexport interface SessionMessage {\n role: 'user' | 'assistant'\n content: SessionContentBlock[]\n}\n\nexport interface SessionTurn {\n /** UUID — generated by the store if it provides generateTurnId, else crypto.randomUUID() */\n id: string\n /** Run that produced this turn (e.g. 
'run_1') */\n runId?: string\n role: 'user' | 'assistant' | 'system'\n content: SessionContentBlock[]\n /** Token usage — only present on assistant turns */\n usage?: TurnUsage\n /** Unix timestamp (Date.now()) when the turn was created */\n createdAt: number\n}\n\n// ---------------------------------------------------------------------------\n// Agent run options\n// ---------------------------------------------------------------------------\n\n/**\n * Per-run hook registrations. Each entry can be a single handler or an array of handlers.\n * Keys are `AgentHooks` event names (loose-typed here to avoid a circular import; agent.ts\n * narrows it to the strongly-typed map).\n */\nexport type RunHookMap = Record<string, ((ctx: any) => unknown) | ((ctx: any) => unknown)[]>\n\nexport interface AgentRunOptions {\n model?: string\n /**\n * User prompt. Optional when resuming a session with existing turns.\n *\n * Accepts either a plain string (single text part) or an array of `PromptPart`s for\n * multimodal inputs (text, images, documents). See {@link PromptPart}.\n */\n prompt?: string | PromptPart[]\n system?: string\n thinking?: ThinkingLevel\n /** Abort signal — when triggered, the agent stops after the current turn */\n signal?: AbortSignal\n /** Behavior overrides for this run (overrides agent defaults) */\n behavior?: AgentBehavior\n /** Tool overrides for this run. Pass {} for no tools. Omit to use agent tools. */\n tools?: Record<string, ToolDef>\n /**\n * Per-run hook registrations. Each hook is attached before the run starts and\n * detached in a finally block so handlers never leak across runs.\n *\n * Accepts either a single handler or an array (all handlers register).\n */\n hooks?: RunHookMap\n /**\n * Parent run id. 
Populated automatically by the `spawn` tool when the child\n * shares the parent's session; recorded on the resulting `SessionRun` so the\n * parent↔child run tree can be reconstructed from a persisted session.\n */\n parentRunId?: string\n /**\n * Zero-based subagent depth. 0 = top-level `agent.run()`, 1 = first-level\n * child spawned by a parent agent, and so on. Used by the spawn tool to\n * enforce `maxDepth` and to stamp `child:*` forwarded hook payloads.\n */\n depth?: number\n /**\n * Opaque trace-context carrier propagated from the parent agent's\n * tracer (typically a W3C `{ traceparent, tracestate }` map). The\n * child's `agent.run()` re-emits it on the `agent:start` hook so the\n * child's tracer can stitch its root span as a continuation of the\n * parent's spawn span. Empty / absent on top-level runs.\n *\n * Set automatically by the `spawn` tool when a parent's tracer wrote\n * into `SpawnHookContext.tracingContext` on `spawn:before`. SDK\n * consumers that drive subagents manually can populate this directly.\n */\n tracingContext?: Readonly<Record<string, string>>\n /**\n * Override the time + UUID source for this run only. Per-run wins over\n * agent-level ({@link AgentOptions.clock}); both fall back to\n * {@link DEFAULT_AGENT_CLOCK}. See {@link AgentClock}.\n */\n clock?: AgentClock\n}\n\n// ---------------------------------------------------------------------------\n// Agent stats\n// ---------------------------------------------------------------------------\n\n/**\n * Reason the provider gave for stopping the turn.\n *\n * - `'stop'` — natural turn end (`end_turn` / `stop_sequence`).\n * - `'tool-calls'` — model emitted tool_use blocks.\n * - `'length'` — `max_tokens` reached, or (Anthropic 4.6+) the response bumped\n * against the model's context window mid-stream\n * (`model_context_window_exceeded`). 
The partial response is preserved; the\n * loop emits this reason so consumers can prune/retry.\n * - `'content-filter'` — model refused.\n * - `'pause'` — Anthropic `pause_turn`: a server-side mid-turn pause for very\n * long thinking. The loop continues with a synthetic \"Please continue.\"\n * user message rather than terminating; consumers see the pause via this\n * finish reason on the prior assistant turn.\n * - `'error'` — provider classified the turn as failed.\n * - `'other'` — unknown / unmapped.\n */\nexport type TurnFinishReason = 'stop' | 'tool-calls' | 'length' | 'content-filter' | 'pause' | 'error' | 'other'\n\nexport interface TurnUsage {\n input: number\n output: number\n /** Tokens written to cache (Anthropic) */\n cacheCreation?: number\n /** Tokens read from cache (Anthropic) */\n cacheRead?: number\n /** Thinking/reasoning tokens used */\n thinking?: number\n /**\n * Cost in USD for this turn. Provider-reported when available\n * (OpenRouter, OpenAI via pi-ai); otherwise estimated from `modelId` ×\n * pi-ai's bundled price registry by `fillEstimatedCost` in\n * `src/providers/cost.ts`. Absent only when neither path could resolve\n * a price (unknown / unbundled model).\n */\n cost?: number\n /**\n * Why the model stopped this turn. Providers normalize native stop reasons to this union.\n * Absent when the provider did not surface a reason (e.g. mock turns).\n */\n finishReason?: TurnFinishReason\n /**\n * The model ID the provider ultimately used. May differ from the requested model when the\n * provider remaps aliases. Absent for providers that do not echo a model ID.\n */\n modelId?: string\n /**\n * Milliseconds from the moment the loop dispatched `provider.stream()`\n * (the `stream:start` hook firing) to the first observable byte of this\n * turn — earliest of `stream:text`, `stream:thinking`, or a tool_use\n * block. 
Captures the per-turn TTFT independently of run-relative TTFT\n * ({@link AgentStats.timeTillFirstTokenMs}, which only marks the first\n * turn).\n *\n * Useful for metrics histograms (`gen_ai.client.time_to_first_token`)\n * across long multi-turn runs where the run-level metric collapses to\n * the cold turn only. Absent for empty-stream turns and turns that\n * errored before any byte landed.\n */\n timeToFirstTokenMs?: number\n}\n\nexport interface AgentStats {\n /**\n * Cumulative input tokens across the parent agent loop **and** every\n * recursively-spawned sub-agent. Use this for billing / token-ledger\n * consumption.\n */\n totalIn: number\n /** Cumulative output tokens. Same semantics as {@link AgentStats.totalIn}. */\n totalOut: number\n /**\n * Cumulative cache-read tokens across the parent agent loop and every\n * recursively-spawned sub-agent. Surfaced at the top level (rather than\n * only per-`TurnUsage`) because Anthropic prices cache reads at a separate\n * line-item rate from regular input — billing-correct cost computation\n * needs this number directly. Always `0` for providers that don't report\n * cache usage.\n */\n totalCacheRead: number\n /**\n * Cumulative cache-creation tokens across the parent agent loop and every\n * recursively-spawned sub-agent. Same rationale as\n * {@link AgentStats.totalCacheRead} — separate Anthropic billing rate.\n * Always `0` for providers that don't report cache usage.\n */\n totalCacheCreation: number\n /**\n * Number of parent agent-loop turns. Children's turn counts live under\n * `children[].stats.turns` and are NOT folded in here — a single \"turns\"\n * number for the whole tree would conflate two different measures\n * (parent-loop iterations vs. 
tree-wide tool-call rounds).\n *\n * Tree-wide turn count: `flattenTurns(stats).length`.\n */\n turns: number\n /**\n * Wall-clock duration of the top-level `agent.run()` call, in milliseconds.\n * Children run during parent tool calls so this naturally subsumes child\n * wall time — sequential children inflate it, parallel children compress\n * into the parent's window.\n */\n elapsed: number\n /**\n * Per-turn usage breakdown for the **parent loop only**. Children's per-turn\n * usages live under `children[].stats.turnUsage`. Use {@link flattenTurns}\n * to walk the full tree.\n */\n turnUsage?: TurnUsage[]\n /**\n * Cumulative cost in USD — parent loop plus every recursively-spawned\n * sub-agent. Sums per-turn `TurnUsage.cost` reported by the provider.\n * Absent when neither parent nor any descendant reported a non-zero cost.\n */\n cost?: number\n /** Stats from child agents spawned during this run, in completion order. Recursive. */\n children?: ChildRunStats[]\n /** Structured output from schema enforcement (only present when behavior.schema is set) */\n output?: Record<string, unknown>\n /**\n * Milliseconds from the start of `agent.run()` to the first observable signal from the\n * provider (first `stream:text`, `stream:thinking`, or `tool:before` event).\n *\n * Absent when the run produced no observable signals (e.g. aborted before any stream event).\n */\n timeTillFirstTokenMs?: number\n}\n\nexport interface ChildRunStats {\n id: string\n task: string\n /**\n * The child agent's full {@link AgentStats}. Cumulative for that child's\n * own subtree (child loop + its grandchildren). Do **not** sum\n * `ctx.stats.totalIn` across `spawn:complete` events to derive top-level\n * totals — `agent.run()`'s return value is the canonical cumulative root.\n */\n stats: AgentStats\n /**\n * Subagent depth when this child ran. 1 = direct child of the top-level\n * agent, 2 = grandchild, etc. 
Useful for telemetry that wants to group\n * runs by depth.\n */\n depth?: number\n /**\n * Terminal state of the child run. `'completed'` is the default. Exposed so\n * a parent reading `stats.children` can distinguish aborted/timed-out\n * children without re-parsing the returned string.\n */\n status?: 'completed' | 'aborted' | 'timeout' | 'error'\n /**\n * Final structured output when the child was run with `behavior.schema`.\n * Mirrors `AgentStats.output` but is surfaced here so the parent can read\n * it without peeking at the nested `stats` bag.\n */\n output?: Record<string, unknown>\n}\n\n// ---------------------------------------------------------------------------\n// Hook context types\n// ---------------------------------------------------------------------------\n\n/**\n * Base context for tool execution hooks.\n *\n * `name` is the canonical tool identity — the spec name registered on the agent (or the\n * `mcp_{server}_{tool}` name for MCP tools). Hooks should policy-match against `name`.\n *\n * `displayName` is the outward-facing name — the alias surfaced to the LLM when\n * `AgentOptions.toolAliases` maps the canonical name; otherwise equal to `name`.\n * UI/telemetry adapters should emit `displayName`.\n *\n * Canonical vs. alias matters on session resume: `session.turns` persists canonical\n * names only, so renaming an alias cannot desync history.\n */\nexport interface ToolHookContext {\n turnId: string\n callId: string\n /** Canonical tool name (spec name). Stable across alias-map changes. */\n name: string\n /** Aliased (wire) name — equal to `name` when no alias is defined. */\n displayName: string\n input: Record<string, unknown>\n /**\n * The run this tool call belongs to (the `SessionRun.id`). 
Lets a single\n * `tool:*` listener disambiguate calls across parallel runs / subagent\n * trees without subscribing to the `child:tool:*` bubble events.\n */\n runId?: string\n /**\n * Parent run id when this tool call's agent is a subagent — i.e. the\n * `SessionRun.parentRunId` of the run that owns the call. Absent on\n * top-level runs. Useful for observability stitching: a UI grouping\n * subagent-scoped state (e.g. todowrite, edit batches) by parent\n * run can read this directly off `tool:before` / `tool:after`\n * without resolving the run row.\n */\n parentRunId?: string\n /**\n * Subagent depth for this tool call. 0 = top-level, 1 = first-level\n * child, etc. Mirrors `ToolContext.depth` so hook consumers don't\n * have to cross-reference the tool context. Omitted on top-level\n * runs (treated as 0).\n */\n depth?: number\n}\n\n/**\n * Base context for MCP tool hooks.\n *\n * `tool` is the native tool name on the MCP server. `server` is the configured server\n * name. The canonical zidane-namespaced identity is `mcp_{server}_{tool}`.\n *\n * `displayName` equals the canonical namespaced name unless the agent has aliased\n * this MCP tool via `AgentOptions.toolAliases`; in which case `displayName` is the\n * alias that the LLM sees.\n */\nexport interface McpToolHookContext {\n turnId: string\n callId: string\n server: string\n tool: string\n /** Aliased wire name for this MCP tool, or the canonical `mcp_{server}_{tool}` name. */\n displayName: string\n input: Record<string, unknown>\n /** Owning run id — same semantics as `ToolHookContext.runId`. */\n runId?: string\n /** Parent run id when this tool call's agent is a subagent — see `ToolHookContext.parentRunId`. */\n parentRunId?: string\n /** Subagent depth — see `ToolHookContext.depth`. 
*/\n depth?: number\n}\n\n/** Base context for session hooks */\nexport interface SessionHookContext {\n sessionId: string\n}\n\n/** Base context for spawn hooks */\nexport interface SpawnHookContext {\n id: string\n task: string\n /**\n * Subagent depth for the spawn. 1 = direct child of the top-level agent.\n * Present on spawn:before/complete/error. Absent for grandchild spawns that\n * bubble through `child:*` events (which carry their own `depth`).\n */\n depth?: number\n /**\n * Mutable trace-context carrier for parent → child span linkage. Empty\n * object by default; a parent tracer mutates it on `spawn:before` (e.g.\n * `ctx.tracingContext.traceparent = '00-…-…-01'`) and the `spawn` tool\n * forwards the populated object to the child via\n * `AgentRunOptions.tracingContext`. The child's tracer re-emits it on\n * `agent:start` so it can be used as parent context when opening the\n * child's root span.\n *\n * Opaque to the harness — keys / values are tracer-defined. Standard\n * choice is W3C Trace Context (`traceparent` + optional `tracestate`),\n * but Datadog / Sentry / B3 carriers work too.\n */\n tracingContext?: Record<string, string>\n}\n\n/** Context for stream hooks */\nexport interface StreamHookContext {\n turnId: string\n}\n\n/** Context for OAuth refresh hooks */\nexport interface OAuthRefreshHookContext {\n provider: string\n providerId: string\n source: 'params' | 'file'\n previousCredentials: Record<string, unknown> & { access: string, refresh: string, expires: number }\n credentials: Record<string, unknown> & { access: string, refresh: string, expires: number }\n}\n\nexport type SessionEndStatus = 'completed' | 'aborted' | 
'error'\n"],"mappings":";;;AAiEA,MAAa,sBAAkC;CAC7C,WAAW,KAAK,KAAK;CACrB,kBAAkB,OAAO,YAAY;CACtC;;;;;;;;AA+rBD,SAAgB,iBAAiB,SAA+C;CAC9E,IAAI,OAAO,YAAY,UACrB,OAAO;CACT,OAAO,QACJ,KAAK,UAAU;EACd,IAAI,MAAM,SAAS,QACjB,OAAO,MAAM;EACf,OAAO,WAAW,MAAM,UAAU,KAAK,MAAM,KAAK,OAAO;GACzD,CACD,KAAK,KAAK;;;;;;;;;;;;;;;;;;;;;AAsBf,SAAgB,qBAAqB,SAA+C;CAClF,IAAI,OAAO,YAAY,UACrB,OAAO,OAAO,WAAW,QAAQ;CACnC,IAAI,QAAQ;CACZ,KAAK,MAAM,SAAS,SAClB,IAAI,MAAM,SAAS,QACjB,SAAS,OAAO,WAAW,MAAM,KAAK;MAEtC,SAAS,MAAM,KAAK;CAExB,OAAO"}
|
|
1
|
+
{"version":3,"file":"types-oKPBdCmL.js","names":[],"sources":["../src/types.ts"],"sourcesContent":["/**\n * Shared types for the agent system.\n */\n\nimport type { ToolDef } from './tools/types'\nimport { Buffer } from 'node:buffer'\n\n// ---------------------------------------------------------------------------\n// Thinking / Reasoning\n// ---------------------------------------------------------------------------\n\n/**\n * Thinking / extended-reasoning configuration.\n *\n * - `'off'` — no thinking.\n * - `'minimal' | 'low' | 'medium' | 'high'` — explicit token budget. Maps to\n * provider-specific reasoning controls (Anthropic `thinking.type='enabled'`\n * with a budget; OpenAI `reasoning_effort`).\n * - `'adaptive'` — let the model decide per-turn whether and how much to think.\n * Anthropic-only (`thinking.type='adaptive'`). Other providers fall back to\n * no reasoning when this value is supplied.\n */\nexport type ThinkingLevel = 'off' | 'minimal' | 'low' | 'medium' | 'high' | 'adaptive'\n\n// ---------------------------------------------------------------------------\n// Clock / determinism seam\n// ---------------------------------------------------------------------------\n\n/**\n * Time + UUID source. Defaults to `Date.now()` and `crypto.randomUUID()`.\n *\n * Scoped to **journaled metadata** — every callsite that lands in\n * `SessionTurn.id`, `SessionTurn.createdAt`, `runId`, `turnId`, or hook\n * payloads consumers may persist. Live-only measurements (TTFT deltas,\n * `elapsed` counters) keep `Date.now()` directly so they reflect real\n * wall-clock progress.\n *\n * `now()` is allowed to return a `Promise<number>` so durable-execution\n * adapters can journal each timestamp (Restate's `ctx.date.now()` is\n * async because it routes through `ctx.run`). The loop awaits at every\n * callsite. `randomUUID()` stays synchronous because Restate's\n * `ctx.rand.uuidv4()` is deterministic-from-seed and doesn't need\n * journaling. 
The native default returns sync values for both.\n *\n * Durable-execution adapters (Restate, Temporal, …) inject a journaled\n * variant — Restate: `{ now: () => ctx.date.now(), randomUUID: () => ctx.rand.uuidv4() }` —\n * so replay regenerates byte-identical session metadata across attempts.\n *\n * Precedence: `AgentRunOptions.clock` > `AgentOptions.clock` >\n * {@link DEFAULT_AGENT_CLOCK}.\n */\nexport interface AgentClock {\n /**\n * Current wall-clock time in ms (epoch). May be async so journaled\n * implementations (Restate) can plumb through their durable timer.\n */\n now: () => number | Promise<number>\n /**\n * RFC-4122 v4 UUID string. Always sync — Restate's `rand.uuidv4()`\n * is seeded from the invocation ID and replay-stable by construction.\n */\n randomUUID: () => string\n}\n\n/** Native clock backing the default (Date.now + crypto.randomUUID). */\nexport const DEFAULT_AGENT_CLOCK: AgentClock = {\n now: () => Date.now(),\n randomUUID: () => crypto.randomUUID(),\n}\n\n// ---------------------------------------------------------------------------\n// MCP server configuration\n// ---------------------------------------------------------------------------\n\n/**\n * Slim shape of an upstream MCP tool descriptor — what `client.listTools()`\n * returns per entry. 
Exposed publicly so hosts can persist the schemas\n * between runs and feed them back via {@link McpServerConfig.cachedTools}\n * to skip the `tools/list` round-trip on subsequent bootstraps.\n */\nexport interface McpToolSchema {\n name: string\n description?: string | null\n inputSchema?: unknown\n}\n\nexport interface McpServerConfig {\n /** Display name (used for tool namespacing) */\n name: string\n /** Transport type */\n transport: 'stdio' | 'sse' | 'streamable-http'\n /** For stdio: command to run */\n command?: string\n /** For stdio: command arguments */\n args?: string[]\n /**\n * For stdio: environment variables to pass to the server process.\n *\n * Merged on top of the MCP SDK's default inherited environment — a safety\n * whitelist (`PATH`, `HOME`, `LANG`, `SHELL`, `USER` on POSIX; `APPDATA`,\n * `PATH`, ... on Win32). Setting this to `{}` no longer strips `PATH` from\n * the child process. Set {@link McpServerConfig.strictEnv} to `true` to\n * pass `env` verbatim with no inherited defaults.\n */\n env?: Record<string, string>\n /**\n * When true, {@link McpServerConfig.env} is passed verbatim to the spawned\n * process — the MCP SDK's default inherited environment (`PATH`, `HOME`, ...)\n * is NOT merged in. Most consumers should leave this off; the default merge\n * prevents `spawn ENOENT` when a stdio server declares an `env` without\n * restating `PATH`.\n */\n strictEnv?: boolean\n /** For sse/streamable-http: server URL */\n url?: string\n /** Optional headers for HTTP transports */\n headers?: Record<string, string>\n /**\n * OAuth 2.1 authentication (sse / streamable-http only).\n *\n * - `'oauth'` — enables the SDK's OAuth flow with RFC 9728 protected-resource\n * metadata discovery, RFC 8414 / OIDC authorization-server metadata, RFC 7591\n * dynamic client registration, PKCE, and refresh-token rotation. Tokens persist\n * between runs via the host's credential store.\n * - `undefined` (default) — no OAuth. 
The host may still auto-promote a server\n * to OAuth on `UnauthorizedError` IF no static `Authorization` header is set\n * (the headers check stops us from second-guessing user-managed bearer tokens).\n *\n * Recognized aliases at parse time: Cursor's `authMethod: 'mcpOAuth'` maps to\n * `auth: 'oauth'` so `~/.cursor/mcp.json` pastes work unchanged.\n */\n auth?: 'oauth'\n /**\n * Timeout in milliseconds for MCP server bootstrap (connect + tool discovery).\n *\n * Zidane connects MCP servers lazily on the first `run()`. Without a\n * bootstrap timeout, a slow or hung server can delay the first provider call\n * for an arbitrarily long time even when that MCP server is never used.\n *\n * Default: `10000`.\n */\n bootstrapTimeout?: number\n /** Timeout in milliseconds for MCP tool calls (default: 30000) */\n toolTimeout?: number\n /**\n * Allow-list of tool names to expose. Names match the upstream tool name\n * (NOT the namespaced `mcp_{server}_{tool}` form). Tools not in the list are\n * dropped before registration — the model never sees them in its catalog and\n * the wire cost of advertising them is avoided.\n *\n * Mutually exclusive with {@link McpServerConfig.disabledTools} — passing both\n * throws at bootstrap time.\n *\n * Composes with {@link McpServerConfig.toolFilter}: allow-list applies first,\n * then the predicate. Composes with the `mcp:tools:filter` hook: config-side\n * filters apply first, then the hook can further narrow the list.\n */\n enabledTools?: string[]\n /**\n * Deny-list of tool names. Tools matching are dropped before registration.\n * Same matching semantics as {@link McpServerConfig.enabledTools}.\n */\n disabledTools?: string[]\n /**\n * Custom predicate run on each upstream tool. Return `true` to keep, `false`\n * to drop. 
Receives the raw `listTools()` payload — useful for filtering by\n * description, schema shape, or other metadata that an allow/deny list can't\n * express.\n *\n * Runs after the allow/deny filter but before the `mcp:tools:filter` hook.\n */\n toolFilter?: (tool: { name: string, description?: string | null, inputSchema?: unknown }) => boolean\n /**\n * Per-server override for {@link AgentBehavior.toolDisclosure}. When set,\n * this server's tools follow this disclosure mode regardless of the\n * agent-wide default. Useful when one big MCP server (200+ tools) should\n * stay lazy while smaller servers stay eager.\n *\n * Default: inherits from `behavior.toolDisclosure`.\n */\n disclosure?: 'eager' | 'lazy'\n /**\n * Pre-cached tool schemas to advertise without issuing `tools/list` at\n * bootstrap. The connection is still established (the SDK's `connect()`\n * is needed for `tools/call`) — only the discovery round-trip is\n * skipped. Schemas are trusted as-is; the host owns invalidation\n * (typical cache key: `(server identity, server version)`). If the\n * server later returns `MethodNotFound` for a cached tool, the host\n * should drop the entry from its cache so the next bootstrap re-lists.\n *\n * Compatible with every transport, every auth mode, and with\n * {@link McpServerConfig.lazyConnect}. Composes with the existing\n * `enabledTools` / `disabledTools` / `toolFilter` filters — those run\n * over the cached schemas exactly as they would over `listTools()`\n * output.\n */\n cachedTools?: McpToolSchema[]\n /**\n * Defer the `client.connect(transport)` call until the first\n * `tools/call` reaches this server. Bootstrap registers the server's\n * tools using {@link McpServerConfig.cachedTools} without touching\n * the network, taking MCP setup off the critical path of\n * `agent.run()`. 
The first invocation pays the connect cost\n * (~200-500ms typically); every subsequent call reuses the live\n * client.\n *\n * Requires {@link McpServerConfig.cachedTools} — without schemas in\n * hand there is nothing to advertise to the model, so deferring the\n * connection has no purpose. Bootstrap rejects the config otherwise.\n *\n * **Incompatible with `auth: 'oauth'`**: the OAuth handshake (token\n * refresh / RFC 9728 metadata discovery) can fail in ways that today\n * fire `mcp:auth:required` at bootstrap so the host can surface a\n * login affordance *before* the model commits to calling a tool.\n * Deferring that to first call means an auth failure surfaces mid-run\n * as a tool-result error, which the model can't recover from without\n * a fresh prompt. Bootstrap rejects the combination so the error is\n * loud and proximate to the misconfiguration. Use OAuth servers\n * without `lazyConnect` (with `cachedTools` alone, if you want to\n * skip the `tools/list` round-trip).\n *\n * On connect failure (network error, transport refused), the cached\n * promise is dropped so the next `tools/call` retries. The model\n * sees the failure as a normal tool error. Subsequent calls remain\n * eligible to succeed once the upstream is reachable again.\n */\n lazyConnect?: boolean\n}\n\n// ---------------------------------------------------------------------------\n// Tool execution\n// ---------------------------------------------------------------------------\n\nexport interface AgentBehavior {\n /**\n * Maximum number of tools that may be in flight concurrently within a\n * single assistant turn. The scheduler dispatches concurrency-safe tools\n * (`ToolDef.isConcurrencySafe`) in parallel up to this cap; unsafe tools\n * act as barriers (wait for the fleet to drain, then run alone).\n *\n * Default: `10`. Set to `1` to force fully sequential dispatch regardless\n * of per-tool flags — useful for deterministic debugging / eval-grade\n * runs. 
Values `< 1` are clamped to `1`.\n */\n maxConcurrentTools?: number\n /**\n * Max agent loop iterations.\n *\n * Default: unlimited (Infinity). The loop runs until the model signals\n * completion (no tool calls / `end_turn`), the abort signal fires, or this\n * cap is hit. Set a finite value as a safety net for runaway loops.\n */\n maxTurns?: number\n /**\n * Run-level cost ceiling, expressed in USD. After each turn, the sum of\n * `TurnUsage.cost` across the run is compared against this value; if the\n * total has crossed the cap, the loop throws `AgentBudgetExceededError`\n * with `limit: 'cost'`. The agent finalizes the session run as\n * `'aborted'` so partial spend is recorded.\n *\n * Complements {@link maxTurns}: a single expensive turn (large context\n * window, deep reasoning) can blow past a turn-count cap but cost is\n * the unit operators actually want to bound for unattended runs.\n *\n * Checked **post-turn**, so the run may exceed by up to one turn's\n * spend before tripping — soft cap, not exact. Default: unbounded.\n * Set `0` to disable explicitly (negative / `NaN` are ignored).\n *\n * Requires the provider to populate `TurnUsage.cost` (every built-in\n * provider does). For providers that don't, the check silently no-ops\n * and {@link maxTotalTokens} is the right knob instead.\n */\n maxCostUsd?: number\n /**\n * Run-level token ceiling — sum of `TurnUsage.input + TurnUsage.output`\n * across the run. Same post-turn semantic + soft-cap behavior as\n * {@link maxCostUsd}; throws `AgentBudgetExceededError` with\n * `limit: 'tokens'` when exceeded.\n *\n * Cache reads / creations are **not** included in the sum — they're\n * billed at a steep discount, so counting them at par would make the\n * ceiling under-estimate the run's true affordability. Operators\n * wanting a stricter accounting should override via the `usage` hook.\n *\n * Default: unbounded. 
Set `0` to disable explicitly.\n */\n maxTotalTokens?: number\n /** Max tokens per LLM response (default: 16384) */\n maxTokens?: number\n /** Thinking token budget — overrides the level-based default when set */\n thinkingBudget?: number\n /** JSON Schema for structured output enforcement */\n schema?: Record<string, unknown>\n /**\n * Enable provider prompt caching. When on (default), the provider marks the\n * system prompt, tools, and the last stable message with cache breakpoints so\n * the shared prefix is served from cache across turns.\n *\n * - Anthropic: `cache_control: { type: 'ephemeral' }` on the last `system`\n * content part, the last tool, and the last message content part.\n * - OpenAI-compatible / OpenRouter: same shape — honored by Anthropic-backed\n * OpenRouter routes and by Gemini; ignored (no-op) by providers that cache\n * automatically (OpenAI, DeepSeek, Grok, Groq, Moonshot).\n *\n * Usage is surfaced via `TurnUsage.cacheRead` / `TurnUsage.cacheCreation`.\n *\n * Default: `true`.\n */\n cache?: boolean\n /**\n * Soft per-turn cap on total tool-output bytes. When the sum of `outputBytes`\n * across a turn's tool results exceeds this value, the loop injects a\n * synthetic user message instructing the model to summarize before calling\n * more tools, and fires the `budget:exceeded` hook.\n *\n * Measured **post-`tool:transform`** so consumer truncation counts toward the\n * budget. Off by default (undefined / `0` disables the check). A reasonable\n * starting value for OSS-model integrations is `32768`.\n */\n toolOutputBudget?: number\n /**\n * Canonical tool names whose output is exempt from\n * {@link AgentBehavior.toolOutputBudget} accounting. 
Their bytes don't\n * count toward the per-turn cap and the \"summarize before calling more\n * tools\" nudge is not triggered by them alone.\n *\n * Intended for tools whose entire purpose is to LOAD context into the\n * conversation — penalising them via the budget creates the exact\n * failure mode the budget is meant to prevent (the model gets steered\n * away from the very tool it just called to make progress):\n *\n * - `tool_search` — returns full `inputSchema` payloads for MCP tools the\n * model needs to call next. A Notion MCP server with 20+ tools easily\n * exceeds 64 KiB of schema JSON in a single discovery call.\n * - `skills_use` / `skills_read` — inject skill content; that text IS\n * the value of the call.\n *\n * `read_file` is intentionally NOT in the default list: a 200 KiB file\n * load is exactly the case the budget should steer against (pagination\n * or summarisation is the right next move).\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n * The matching happens on the canonical (registry-key) name, so aliases\n * are stable.\n *\n * Default: `undefined` — every tool counts. Chat profiles set their own\n * list — see `src/chat/agents.ts`.\n */\n toolOutputBudgetExcludeTools?: readonly string[]\n /**\n * Deduplicate identical re-reads of the same file in `read_file`. When the\n * model re-reads a file with the same slice and the bytes haven't changed\n * since the last read in this session, the tool returns a short stub\n * instead of re-emitting the full content. Pairs with the read-before-edit\n * guard in `edit` / `multi_edit`.\n *\n * Requires a session (set via `createSession()`); without one, the flag is\n * a no-op since per-session state has nowhere to live.\n *\n * Default: `true`.\n */\n dedupReads?: boolean\n /**\n * Taper the thinking budget over the course of a run. Late turns are\n * usually checkpoint / cleanup work where reasoning rarely pays for\n * itself; early turns benefit most. 
Two forms:\n *\n * - **Struct** — geometric decay starting after `afterTurn`, multiplying by\n * `factor` each subsequent turn, clamped to `floor`. Example\n * `{ afterTurn: 5, factor: 0.5, floor: 1024 }` with a base budget of 8192:\n * turns 1-5 = 8192, turn 6 = 4096, turn 7 = 2048, turn 8+ = 1024.\n * - **Function** — `(runTurn, baseBudget) => number`. Arbitrary curves;\n * `runTurn` is 1-indexed, run-relative (resumed sessions reset).\n *\n * No-op when `thinkingBudget` is unset. Honored by every provider that\n * respects `thinkingBudget` (anthropic explicit-budget `enabled` path,\n * adaptive `maxTokensCap`, openai-compat `max_tokens` padding).\n *\n * Default: `undefined` (no decay).\n */\n thinkingDecay?: { afterTurn: number, factor: number, floor: number } | ((runTurn: number, baseBudget: number) => number)\n /**\n * Per-tool soft call budget for this run. Keyed by **canonical** tool name.\n * On the first call after the run-cumulative dispatched count for that tool\n * reaches `max`, the framework fires `onExceed`:\n *\n * - `'steer'` (default) — let the call execute, but emit a synthetic user\n * message after the turn that nudges the model away from re-calling the\n * tool. Reuses the existing post-turn steer pathway used by\n * `toolOutputBudget`. Fires `tool-budget:exceeded` with `mode: 'steer'`.\n * - `'block'` — refuse the call via `tool:gate` `block`. The model sees a\n * `Blocked: <reason>` tool result. Fires `tool-budget:exceeded` with\n * `mode: 'block'`.\n * - **Function** — `(ctx) => { mode, message }`. The consumer supplies the\n * steering / refusal text and chooses the mode dynamically.\n *\n * Counts include both real dispatches and dedup substitutes (Z19 hits).\n * Excludes calls already blocked by an earlier gate (skill allow-list,\n * consumer hook). 
Tool dispatched by spawned subagents has its own per-run\n * counter — child counts never charge the parent.\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n *\n * Atomic in parallel mode: the middleware tracks its own per-tool\n * approval counter, incremented synchronously at gate-time. A\n * 4-call parallel batch against `max: 2` will let the first 2 through\n * and refuse the rest, even though the loop's `runToolCounts` only\n * propagates between calls (not within a single batch's gate fan-out).\n *\n * Default: `undefined` (no budget enforcement).\n */\n toolBudgets?: Record<string, {\n max: number\n onExceed?: 'steer' | 'block' | ((ctx: {\n tool: string\n count: number\n max: number\n }) => { mode: 'steer' | 'block', message: string })\n }>\n /**\n * Generic per-tool argument deduplication. Keyed by the tool's **canonical**\n * name (alias-stable). Each entry is a hasher: `(input) => string | undefined`.\n *\n * **Hasher contract** — three return values, three meanings:\n *\n * | Return | Meaning |\n * |-------------------------|------------------------------------------------------------------------|\n * | a non-empty string | Cache key for this call. Equal keys (most-recent-only, this session) |\n * | | replay the prior recorded result without re-dispatching the tool. |\n * | `undefined` | **Skip dedup for this call.** The tool runs normally; nothing recorded.|\n * | `''` / non-string | Treated identically to `undefined` (defensive: no dedup, no error). |\n *\n * The `undefined` opt-out is the way to say *\"this specific call is not\n * cacheable\"* (timestamps in input, randomness baked in, debug flags). It\n * is **not** the same as `JSON.stringify(input)` — that would dedup against\n * the verbatim input. 
Pick one explicitly:\n *\n * ```ts\n * // Always cache by full input — every identical re-call dedups.\n * dedupTools: { my_pure_tool: input => JSON.stringify(input) }\n *\n * // Cache by a normalized subset; non-cacheable shapes opt out.\n * dedupTools: {\n * execute_sql: (input) => {\n * const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined\n * if (!q || q.includes('now()') || q.includes('random()')) return undefined\n * return q\n * },\n * }\n * ```\n *\n * On a hit, the previously-recorded result is replayed as the tool_result\n * without dispatching the tool. The substitution flows through `tool:gate`\n * `result` (Z20), so `tool:after` and `tool:transform` still fire.\n *\n * Requires a session (`createSession()`); without one, the map is a silent\n * no-op since per-session state has nowhere to live. Tools with side\n * effects or non-deterministic outputs (network, time, randomness) MUST\n * NOT be listed — there is no safety net beyond the consumer's hasher.\n *\n * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).\n * Concurrency-safe siblings ({@link ToolDef.isConcurrencySafe}) in the\n * SAME assistant turn race against each other — none can dedup against\n * a sibling that started in the same batch. Unsafe tools act as barriers\n * and honor submission order within a turn, so an unsafe-but-listed tool\n * follows the cache cleanly.\n *\n * **Cache policy**: only the most recent `(hash, result)` per tool is\n * retained. Interleaved patterns (input A, input B, input A) miss on the\n * second A because B overwrote it. 
Sufficient for the common spam-the-\n * same-call loop; consumers needing a richer cache should hook\n * `tool:gate` directly.\n *\n * Default: `undefined` (no per-tool dedup).\n */\n dedupTools?: Record<string, (input: Record<string, unknown>) => string | undefined>\n /**\n * Require `read_file` before `edit` / `multi_edit` on the same path, and\n * reject edits when the file has changed on disk since the last read in\n * this session. Eliminates the silent-corruption failure mode where a\n * model \"remembers\" stale content and applies a substring edit against\n * bytes that have moved.\n *\n * Requires a session. Off by default; turn it on for stricter eval-grade\n * runs where silent edit corruption would invalidate the result.\n *\n * Default: `false`.\n */\n requireReadBeforeEdit?: boolean\n /**\n * Client-side context compaction strategy. Use this for non-Anthropic\n * providers (OSS via cerebras / openai-compat / openrouter) that don't\n * have a server-side equivalent. Anthropic users should prefer the\n * server-side `context-management-2025-06-27` beta — see\n * `AnthropicParams.contextManagement`.\n *\n * - `'off'` (default) — no client-side compaction.\n * - `'tail'` — when total tool-output bytes in the persisted history\n * exceed `compactThreshold`, replace older `tool_result` outputs with a\n * short stub, keeping the newest `compactKeepTurns` turns intact. The\n * compaction is applied to the wire-level message list only; the\n * underlying session turns are not modified.\n *\n * Default: `'off'`.\n */\n compactStrategy?: 'off' | 'tail'\n /**\n * Soft byte threshold that triggers tail compaction when\n * `compactStrategy === 'tail'`. Counts the post-`context:transform` bytes\n * of `tool_result` outputs across all messages. Default: `131_072` (128\n * KiB). Ignored when compaction is off.\n */\n compactThreshold?: number\n /**\n * Number of trailing turns to leave untouched during tail compaction. 
The\n * most-recent `compactKeepTurns` user/assistant messages are not eligible\n * for elision so the model keeps the freshest tool context. Default: `4`.\n */\n compactKeepTurns?: number\n /**\n * Prefix every line of `read_file` output with its 1-indexed line number\n * followed by a tab (`<N>\\t<content>`) — the compact `cat -n`-style\n * format Claude Code emits. The `edit` tool strips the prefix from\n * `old_string` / `new_string` so the model can paste back a numbered\n * chunk verbatim without breaking the match.\n *\n * Set `false` to opt out — useful for callers piping `read_file` into\n * downstream parsers that don't recognize the prefix. Per-call\n * `read_file({ lineNumbers: false })` overrides this default.\n *\n * Default: `true`.\n */\n readLineNumbers?: boolean\n /**\n * Replace older `read_file` `tool_result` blocks with a short stub when\n * a successful `edit` / `multi_edit` / `write_file` later in the same\n * run modified the same path. The replacement is applied to the\n * wire-level message list only — persisted session turns keep the\n * original content.\n *\n * Eliminates the common waste pattern where the model carries the\n * pre-edit file body forward across many turns \"in case it needs it\".\n * Pairs cleanly with `compactStrategy: 'tail'`: stale reads shrink\n * first, then the byte-threshold compaction fires if anything's left.\n *\n * Detection is conservative — only triggers when the corresponding\n * tool_result confirms success (`Edited …`, `Created …`, `Updated …`).\n * Failed edits and `No change needed` write_file calls do NOT\n * invalidate prior reads.\n *\n * Default: `false`.\n */\n elideStaleReads?: boolean\n /**\n * Tool disclosure strategy. 
Controls whether the model sees every tool's\n * full `inputSchema` in its tool list every turn (\"eager\") or whether MCP\n * tools are advertised as a name+description catalog in the system prompt\n * and only get full schemas after being surfaced via the `tool_search`\n * native tool (\"lazy\" / progressive disclosure).\n *\n * Native tools (those passed to `createAgent({ tools })`) and skill tools\n * are always eager — they are core to the agent and cheap. Only MCP tools\n * are eligible for lazy disclosure.\n *\n * When `'lazy'`, the agent:\n * - Appends a `<searchable_tools>` section to the system prompt listing\n * every MCP tool by `name` + `description` only (no `inputSchema`).\n * - Auto-injects a `tool_search` native tool (opt out via\n * {@link AgentBehavior.toolSearch}) the model uses to load schemas on\n * demand. Surfaced tools persist for the rest of the run.\n * - Rebuilds the wire-level tool list each turn, appending newly-unlocked\n * tools at the end so the prefix-cache breakpoint advances cleanly.\n *\n * Trade-off: every `tool_search` invocation expands the tool list and\n * invalidates the tool-list cache breakpoint for one turn. With many\n * MCP servers, the savings on cold turns (fewer schemas in context) are\n * substantial; with one tiny MCP server, the overhead may not pay back.\n *\n * Default: `'eager'`.\n */\n toolDisclosure?: 'eager' | 'lazy'\n /**\n * Fine-grained config for the `tool_search` tool auto-injected when\n * {@link AgentBehavior.toolDisclosure} is `'lazy'`. No-op in eager mode.\n *\n * - `tool: false` — opt out of the auto-injection entirely. Use when the\n * host wants to ship a custom discovery tool. Note that the catalog\n * text drops the call-to-action prose in this case so the model isn't\n * pointed at a non-existent tool.\n * - `limit` — default cap on results returned per `tool_search` call when\n * the model omits the parameter. 
Default: `20`.\n *\n * Note on host-defined `tool_search`: a tool the host registers under the\n * name `tool_search` (or under any alias whose canonical is `tool_search`)\n * will shadow the auto-injected one — the catalog text will point at the\n * host's wire name, but driving the unlock flow requires either using\n * `createToolSearchTool({ catalog, unlocked })` from `tools/tool-search`\n * (which internally mutates the unlock set) or fully opting out via\n * `toolSearch.tool: false` and treating discovery as a host-side concern.\n * A bare host tool that doesn't touch the unlock set will not advance the\n * lazy disclosure state and the hard gate will keep refusing lazy calls.\n *\n * Default: `undefined` (auto-inject with the default limit).\n */\n toolSearch?: {\n tool?: false\n limit?: number\n }\n /**\n * Persist large `tool_result` outputs to disk and replace the in-message\n * content with a `<persisted-output>` stub (preview + filesystem path).\n * When the post-`tool:transform` byte size of a tool's result exceeds\n * this threshold, the framework writes the full payload to\n * `<persistDir>/<callId>.txt` and substitutes a fixed-format stub so the\n * model sees a 2 KiB preview plus the path it can `read_file`.\n *\n * The substitution happens at emit time (just after `tool:transform` runs)\n * and the stub flows into `session.turns` directly — so every subsequent\n * turn re-emits the same bytes, keeping the prompt-cache prefix stable.\n *\n * Set `0` / `undefined` to disable. Built-in chat profiles default to\n * `8192`. Tools listed in {@link AgentBehavior.persistExcludeTools} bypass\n * regardless of size — typically because their output is intentionally\n * short or persisting would be circular (e.g. 
`read_file`).\n *\n * Requires {@link AgentBehavior.persistDir} to be set; without a target\n * directory the framework silently skips persistence (no throw, no\n * substitution) since there's nowhere to write the blob.\n *\n * Default: `undefined` (off).\n */\n persistThreshold?: number\n /**\n * Canonical tool names to exclude from disk persistence regardless of\n * output size. The framework bypasses persistence for any tool whose\n * canonical name appears in this list — useful for tools whose results\n * are intentionally part of the prompt (`skills_use`), short envelopes\n * (`tool_search`, `present_plan`, `ask_user`), or where persistence\n * would be circular (`read_file`, whose pagination already serves the\n * same use case).\n *\n * Default: `undefined` (no exclusions). The chat-layer built-in profiles\n * set their own list — see `src/chat/agents.ts`.\n */\n persistExcludeTools?: readonly string[]\n /**\n * Directory under which persisted tool-result blobs land. Each call's\n * payload is written to `<persistDir>/<callId>.txt` (one file per\n * `tool_use` id, atomic via write-then-rename).\n *\n * The chat layer resolves this to `<userDir>/tool-results/<sessionId>/`\n * at session activation; SDK consumers pass an absolute path. Required\n * when {@link AgentBehavior.persistThreshold} is non-zero — when unset\n * the framework treats persistence as disabled.\n *\n * Default: `undefined`.\n */\n persistDir?: string\n /**\n * Soft byte-cap on the cumulative size of persisted blobs under\n * {@link persistDir} for THIS session. 
After every successful blob\n * write the framework sums the directory's `*.txt` payloads and, if\n * the total exceeds this value, removes oldest-first by mtime until\n * the remainder is at or below the cap.\n *\n * The just-written blob is never evicted in the same sweep (its mtime\n * is the newest), so a single oversize result still lands and gets\n * pointed at by its `<persisted-output>` stub — the LRU is a steady-\n * state housekeeping mechanism, not a per-call admission gate.\n *\n * Long unattended runs can otherwise grow `<userDir>/tool-results/<sessionId>/`\n * without bound; session-delete cleanup runs on demand, not on\n * schedule.\n *\n * Default: `undefined` (no cap). Set `0` to disable explicitly; the\n * eviction step is a no-op for non-positive / non-finite values.\n */\n persistMaxBytes?: number\n /**\n * Absolute directory where the `shell` tool's background mode (the\n * `run_in_background: true` flag) appends output log files. One file\n * per task: `<tasksDir>/<task-id>.log` (e.g. `bash_1.log`). The model\n * gets the absolute path back in the tool result and reads incremental\n * output via the regular `read_file` tool.\n *\n * The chat layer resolves this to `<userDir>/<sessionId>/tasks/` at\n * session activation; SDK consumers pass an absolute path. When unset,\n * `shell({ run_in_background: true })` surfaces a clean error to the\n * model so the framework doesn't silently fall back to a path the user\n * didn't pick.\n *\n * Default: `undefined`.\n */\n tasksDir?: string\n /**\n * Hide the built-in `shell` tool's `run_in_background` field + the\n * background-mode paragraphs in its description, even when\n * {@link AgentBehavior.tasksDir} is set. 
The model never sees the flag\n * and won't try to use it.\n *\n * Combined with the implicit gate on `tasksDir` (background mode also\n * auto-hides when `tasksDir` is unset, since the host hasn't wired the\n * log dir), this gives two ways to opt out:\n *\n * - **Implicit**: don't set `tasksDir`. The host hasn't opted in.\n * - **Explicit**: set `disableBackgroundTasks: true`. The host has\n * `tasksDir` for some other reason (legacy, fixture) but doesn't\n * want the model spawning background work.\n *\n * Only applies to the framework-provided `shell` tool (identity check\n * against the exported `shell` constant). Hosts who wrap or replace\n * the shell tool own their spec — the auto-disable doesn't touch\n * tool defs the agent doesn't recognize.\n *\n * Default: `false`.\n */\n disableBackgroundTasks?: boolean\n /**\n * Fail-fast instead of repair when the pre-send pairing pass detects\n * corruption (orphan `tool_use` / `tool_result`, duplicate ids,\n * compaction-stranded blocks). Throws {@link AgentToolPairingError} from\n * the next `agent.run()` turn carrying the structured repair list the\n * loop would have performed.\n *\n * Use case: training-data collectors that must reject any transcript\n * containing the synthetic `SYNTHETIC_TOOL_RESULT_PLACEHOLDER` rather\n * than ship poisoned data to the fine-tuning pipeline. 
User-facing chat\n * sessions should leave this off (the repair-on-the-fly behavior is the\n * point of the pass).\n *\n * Telemetry note: `pairing:repair` still fires for every repair before\n * the throw, so observability handlers see exactly what would have\n * happened.\n *\n * Default: `false`.\n */\n strictToolPairing?: boolean\n}\n\n// ---------------------------------------------------------------------------\n// Prompt parts (multimodal input)\n// ---------------------------------------------------------------------------\n\n/**\n * One block of a multimodal user prompt.\n *\n * `agent.run({ prompt })` accepts either a plain string (treated as a single\n * text part) or an array of these parts for multimodal inputs.\n *\n * `document` parts are routed per provider: PDF-style mime types are sent as\n * native document blocks when the provider supports them; text documents are\n * inlined as text with an attachment header. Providers that cannot handle an\n * image or document throw early.\n */\nexport type PromptPart\n = | PromptTextPart\n | PromptImagePart\n | PromptDocumentPart\n\nexport interface PromptTextPart {\n type: 'text'\n text: string\n}\n\nexport interface PromptImagePart {\n type: 'image'\n /** IANA media type (e.g. `image/png`, `image/jpeg`) */\n mediaType: string\n /** Base64-encoded payload */\n data: string\n /** Optional display name */\n name?: string\n}\n\nexport interface PromptDocumentPart {\n type: 'document'\n /** IANA media type (e.g. 
`application/pdf`, `text/plain`) */\n mediaType: string\n /** Either a base64-encoded payload (`encoding: 'base64'`) or raw text (`encoding: 'text'`) */\n data: string\n encoding: 'base64' | 'text'\n /** Optional display name used in attachment headers */\n name?: string\n}\n\n// ---------------------------------------------------------------------------\n// Canonical message format (used throughout the agent system)\n// ---------------------------------------------------------------------------\n\n/**\n * A single block of structured tool-result content.\n *\n * MCP servers can return a mix of text, image, resource, and audio blocks. Tools\n * return `string` for the common text-only case or `ToolResultContent[]` when they\n * need to preserve non-text content (e.g. screenshots from a browser MCP).\n *\n * Providers that support native multi-part tool results (Anthropic, OpenAI Codex via\n * pi-ai) route image blocks into their wire format verbatim; OpenAI-compat providers\n * route them via a companion-user-message fallback when the underlying model/endpoint\n * does not accept images inside tool-role messages.\n */\nexport type ToolResultContent\n = | ToolResultTextContent\n | ToolResultImageContent\n\nexport interface ToolResultTextContent {\n type: 'text'\n text: string\n}\n\nexport interface ToolResultImageContent {\n type: 'image'\n /** IANA media type (e.g. `image/png`, `image/jpeg`) */\n mediaType: string\n /** Base64-encoded payload */\n data: string\n}\n\n/**\n * Lossy flattener — converts `ToolResultContent[]` (or a plain string) to a single\n * string. 
Image blocks are replaced with `[image: <media> — <n> b64 bytes]` markers.\n *\n * Use at UI boundaries where a string is required; providers that understand\n * structured content should route the array through without flattening.\n */\nexport function toolResultToText(content: string | ToolResultContent[]): string {\n if (typeof content === 'string')\n return content\n return content\n .map((block) => {\n if (block.type === 'text')\n return block.text\n return `[image: ${block.mediaType} — ${block.data.length} b64 bytes]`\n })\n .join('\\n')\n}\n\n/**\n * Approximate **wire payload size** of a tool output, in bytes.\n *\n * - Plain text: UTF-8 byte length.\n * - Structured content: text blocks contribute their UTF-8 byte length; image\n * blocks contribute their **base64 character length** — a proxy for the\n * serialized request-body footprint, NOT for tokens. Vision encoders\n * tokenize decoded pixels (geometry-dependent; e.g. Anthropic ≈ `w·h/750`,\n * OpenAI ≈ 85 + 170/tile), which has no meaningful relationship to base64\n * length.\n *\n * Used by the agent loop to populate `outputBytes` on `tool:after`,\n * `tool:transform`, `mcp:tool:after`, and `mcp:tool:transform` hooks so\n * consumers can size-budget tool output without re-counting bytes themselves.\n * Suitable for byte-budget heuristics (`toolOutputBudget`, tail compaction);\n * NOT a substitute for provider-side context-window accounting — defer to\n * server-side context management (e.g. 
Anthropic's `context-management-*`\n * beta) when token accuracy matters.\n */\nexport function toolOutputByteLength(content: string | ToolResultContent[]): number {\n if (typeof content === 'string')\n return Buffer.byteLength(content)\n let total = 0\n for (const block of content) {\n if (block.type === 'text')\n total += Buffer.byteLength(block.text)\n else\n total += block.data.length\n }\n return total\n}\n\nexport type SessionContentBlock\n = | { type: 'text', text: string }\n | { type: 'image', mediaType: string, data: string }\n | { type: 'tool_call', id: string, name: string, input: Record<string, unknown> }\n | {\n type: 'tool_result'\n callId: string\n /**\n * Tool output — either a plain string (text-only, the common case) or a structured\n * array of content blocks (text + image for multimodal tools such as screenshots).\n */\n output: string | ToolResultContent[]\n isError?: boolean\n }\n | {\n type: 'thinking'\n text: string\n signature?: string\n /**\n * Provider that minted `signature`. Signatures are provider-bound (Anthropic\n * HMAC vs. OpenAI `encrypted_content`) and are dropped on cross-provider\n * hops to avoid 400s. Unset means legacy/unknown — forwarded as-is.\n */\n signatureProducer?: 'anthropic' | 'openai'\n }\n | { type: 'redacted_thinking', data: string }\n | {\n /**\n * Opaque round-trip envelope for reasoning state minted by an OpenAI-compat\n * gateway (currently OpenRouter). The gateway expects its own\n * `reasoning_details` array echoed back verbatim on the next turn so the\n * upstream model can resume an extended-reasoning chain across tool calls.\n *\n * Stored opaquely because the items are provider-bound (Anthropic HMAC\n * signatures, OpenAI `encrypted_content`, model-specific summary formats\n * — all flowing through the gateway's normalized envelope).\n */\n type: 'provider_reasoning'\n producer: 'openrouter'\n details: unknown[]\n /**\n * Model id that produced the details. 
Reasoning is bound to a specific\n * upstream route — a model switch on the next turn invalidates the\n * embedded signatures, so the sender drops the block on mismatch.\n */\n model?: string\n }\n | {\n /**\n * Compaction marker. Inserted by `compactConversation()` to replace a\n * prefix of turns with an LLM-generated summary.\n *\n * The marker lives in `session.turns` and renders in the transcript —\n * the user can still scroll back to see the original turns. From the\n * agent loop's wire-level perspective, every turn whose id appears in\n * `replacesTurnIds` is dropped, and this block's `summary` text is\n * sent to the model as a single user message in their place.\n *\n * The marker turn carries `role: 'user'` so it sits naturally at a\n * conversational boundary. Only the latest `compact-summary` block in\n * the session is honored — earlier markers are subsumed by later\n * ones (their `replacesTurnIds` are a strict prefix).\n */\n type: 'compact-summary'\n /** Turn ids the summary replaces, in chronological order. */\n replacesTurnIds: readonly string[]\n /** The summary text sent to the model in place of the elided turns. */\n summary: string\n /** Model id used to produce the summary. */\n model: string\n /** Token usage from the summary call. */\n usage: TurnUsage\n /** Unix-ms when compaction completed. */\n compactedAt: number\n }\n\nexport interface SessionMessage {\n role: 'user' | 'assistant'\n content: SessionContentBlock[]\n}\n\nexport interface SessionTurn {\n /** UUID — generated by the store if it provides generateTurnId, else crypto.randomUUID() */\n id: string\n /** Run that produced this turn (e.g. 
'run_1') */\n runId?: string\n role: 'user' | 'assistant' | 'system'\n content: SessionContentBlock[]\n /** Token usage — only present on assistant turns */\n usage?: TurnUsage\n /** Unix timestamp (Date.now()) when the turn was created */\n createdAt: number\n}\n\n// ---------------------------------------------------------------------------\n// Agent run options\n// ---------------------------------------------------------------------------\n\n/**\n * Per-run hook registrations. Each entry can be a single handler or an array of handlers.\n * Keys are `AgentHooks` event names (loose-typed here to avoid a circular import; agent.ts\n * narrows it to the strongly-typed map).\n */\nexport type RunHookMap = Record<string, ((ctx: any) => unknown) | ((ctx: any) => unknown)[]>\n\nexport interface AgentRunOptions {\n model?: string\n /**\n * User prompt. Optional when resuming a session with existing turns.\n *\n * Accepts either a plain string (single text part) or an array of `PromptPart`s for\n * multimodal inputs (text, images, documents). See {@link PromptPart}.\n */\n prompt?: string | PromptPart[]\n system?: string\n thinking?: ThinkingLevel\n /** Abort signal — when triggered, the agent stops after the current turn */\n signal?: AbortSignal\n /** Behavior overrides for this run (overrides agent defaults) */\n behavior?: AgentBehavior\n /** Tool overrides for this run. Pass {} for no tools. Omit to use agent tools. */\n tools?: Record<string, ToolDef>\n /**\n * Per-run hook registrations. Each hook is attached before the run starts and\n * detached in a finally block so handlers never leak across runs.\n *\n * Accepts either a single handler or an array (all handlers register).\n */\n hooks?: RunHookMap\n /**\n * Parent run id. 
Populated automatically by the `spawn` tool when the child\n * shares the parent's session; recorded on the resulting `SessionRun` so the\n * parent↔child run tree can be reconstructed from a persisted session.\n */\n parentRunId?: string\n /**\n * Zero-based subagent depth. 0 = top-level `agent.run()`, 1 = first-level\n * child spawned by a parent agent, and so on. Used by the spawn tool to\n * enforce `maxDepth` and to stamp `child:*` forwarded hook payloads.\n */\n depth?: number\n /**\n * Opaque trace-context carrier propagated from the parent agent's\n * tracer (typically a W3C `{ traceparent, tracestate }` map). The\n * child's `agent.run()` re-emits it on the `agent:start` hook so the\n * child's tracer can stitch its root span as a continuation of the\n * parent's spawn span. Empty / absent on top-level runs.\n *\n * Set automatically by the `spawn` tool when a parent's tracer wrote\n * into `SpawnHookContext.tracingContext` on `spawn:before`. SDK\n * consumers that drive subagents manually can populate this directly.\n */\n tracingContext?: Readonly<Record<string, string>>\n /**\n * Override the time + UUID source for this run only. Per-run wins over\n * agent-level ({@link AgentOptions.clock}); both fall back to\n * {@link DEFAULT_AGENT_CLOCK}. See {@link AgentClock}.\n */\n clock?: AgentClock\n}\n\n// ---------------------------------------------------------------------------\n// Agent stats\n// ---------------------------------------------------------------------------\n\n/**\n * Reason the provider gave for stopping the turn.\n *\n * - `'stop'` — natural turn end (`end_turn` / `stop_sequence`).\n * - `'tool-calls'` — model emitted tool_use blocks.\n * - `'length'` — `max_tokens` reached, or (Anthropic 4.6+) the response bumped\n * against the model's context window mid-stream\n * (`model_context_window_exceeded`). 
The partial response is preserved; the\n * loop emits this reason so consumers can prune/retry.\n * - `'content-filter'` — model refused.\n * - `'pause'` — Anthropic `pause_turn`: a server-side mid-turn pause for very\n * long thinking. The loop continues with a synthetic \"Please continue.\"\n * user message rather than terminating; consumers see the pause via this\n * finish reason on the prior assistant turn.\n * - `'error'` — provider classified the turn as failed.\n * - `'other'` — unknown / unmapped.\n */\nexport type TurnFinishReason = 'stop' | 'tool-calls' | 'length' | 'content-filter' | 'pause' | 'error' | 'other'\n\nexport interface TurnUsage {\n input: number\n output: number\n /** Tokens written to cache (Anthropic) */\n cacheCreation?: number\n /** Tokens read from cache (Anthropic) */\n cacheRead?: number\n /** Thinking/reasoning tokens used */\n thinking?: number\n /**\n * Cost in USD for this turn. Provider-reported when available\n * (OpenRouter, OpenAI via pi-ai); otherwise estimated from `modelId` ×\n * pi-ai's bundled price registry by `fillEstimatedCost` in\n * `src/providers/cost.ts`. Absent only when neither path could resolve\n * a price (unknown / unbundled model).\n */\n cost?: number\n /**\n * Why the model stopped this turn. Providers normalize native stop reasons to this union.\n * Absent when the provider did not surface a reason (e.g. mock turns).\n */\n finishReason?: TurnFinishReason\n /**\n * The model ID the provider ultimately used. May differ from the requested model when the\n * provider remaps aliases. Absent for providers that do not echo a model ID.\n */\n modelId?: string\n /**\n * Milliseconds from the moment the loop dispatched `provider.stream()`\n * (the `stream:start` hook firing) to the first observable byte of this\n * turn — earliest of `stream:text`, `stream:thinking`, or a tool_use\n * block. 
Captures the per-turn TTFT independently of run-relative TTFT\n * ({@link AgentStats.timeTillFirstTokenMs}, which only marks the first\n * turn).\n *\n * Useful for metrics histograms (`gen_ai.client.time_to_first_token`)\n * across long multi-turn runs where the run-level metric collapses to\n * the cold turn only. Absent for empty-stream turns and turns that\n * errored before any byte landed.\n */\n timeToFirstTokenMs?: number\n}\n\nexport interface AgentStats {\n /**\n * Cumulative input tokens across the parent agent loop **and** every\n * recursively-spawned sub-agent. Use this for billing / token-ledger\n * consumption.\n */\n totalIn: number\n /** Cumulative output tokens. Same semantics as {@link AgentStats.totalIn}. */\n totalOut: number\n /**\n * Cumulative cache-read tokens across the parent agent loop and every\n * recursively-spawned sub-agent. Surfaced at the top level (rather than\n * only per-`TurnUsage`) because Anthropic prices cache reads at a separate\n * line-item rate from regular input — billing-correct cost computation\n * needs this number directly. Always `0` for providers that don't report\n * cache usage.\n */\n totalCacheRead: number\n /**\n * Cumulative cache-creation tokens across the parent agent loop and every\n * recursively-spawned sub-agent. Same rationale as\n * {@link AgentStats.totalCacheRead} — separate Anthropic billing rate.\n * Always `0` for providers that don't report cache usage.\n */\n totalCacheCreation: number\n /**\n * Number of parent agent-loop turns. Children's turn counts live under\n * `children[].stats.turns` and are NOT folded in here — a single \"turns\"\n * number for the whole tree would conflate two different measures\n * (parent-loop iterations vs. 
tree-wide tool-call rounds).\n *\n * Tree-wide turn count: `flattenTurns(stats).length`.\n */\n turns: number\n /**\n * Wall-clock duration of the top-level `agent.run()` call, in milliseconds.\n * Children run during parent tool calls so this naturally subsumes child\n * wall time — sequential children inflate it, parallel children compress\n * into the parent's window.\n */\n elapsed: number\n /**\n * Per-turn usage breakdown for the **parent loop only**. Children's per-turn\n * usages live under `children[].stats.turnUsage`. Use {@link flattenTurns}\n * to walk the full tree.\n */\n turnUsage?: TurnUsage[]\n /**\n * Cumulative cost in USD — parent loop plus every recursively-spawned\n * sub-agent. Sums per-turn `TurnUsage.cost` reported by the provider.\n * Absent when neither parent nor any descendant reported a non-zero cost.\n */\n cost?: number\n /** Stats from child agents spawned during this run, in completion order. Recursive. */\n children?: ChildRunStats[]\n /** Structured output from schema enforcement (only present when behavior.schema is set) */\n output?: Record<string, unknown>\n /**\n * Milliseconds from the start of `agent.run()` to the first observable signal from the\n * provider (first `stream:text`, `stream:thinking`, or `tool:before` event).\n *\n * Absent when the run produced no observable signals (e.g. aborted before any stream event).\n */\n timeTillFirstTokenMs?: number\n}\n\nexport interface ChildRunStats {\n id: string\n task: string\n /**\n * The child agent's full {@link AgentStats}. Cumulative for that child's\n * own subtree (child loop + its grandchildren). Do **not** sum\n * `ctx.stats.totalIn` across `spawn:complete` events to derive top-level\n * totals — `agent.run()`'s return value is the canonical cumulative root.\n */\n stats: AgentStats\n /**\n * Subagent depth when this child ran. 1 = direct child of the top-level\n * agent, 2 = grandchild, etc. 
Useful for telemetry that wants to group\n * runs by depth.\n */\n depth?: number\n /**\n * Terminal state of the child run. `'completed'` is the default. Exposed so\n * a parent reading `stats.children` can distinguish aborted/timed-out\n * children without re-parsing the returned string.\n */\n status?: 'completed' | 'aborted' | 'timeout' | 'error'\n /**\n * Final structured output when the child was run with `behavior.schema`.\n * Mirrors `AgentStats.output` but is surfaced here so the parent can read\n * it without peeking at the nested `stats` bag.\n */\n output?: Record<string, unknown>\n}\n\n// ---------------------------------------------------------------------------\n// Hook context types\n// ---------------------------------------------------------------------------\n\n/**\n * Base context for tool execution hooks.\n *\n * `name` is the canonical tool identity — the spec name registered on the agent (or the\n * `mcp_{server}_{tool}` name for MCP tools). Hooks should policy-match against `name`.\n *\n * `displayName` is the outward-facing name — the alias surfaced to the LLM when\n * `AgentOptions.toolAliases` maps the canonical name; otherwise equal to `name`.\n * UI/telemetry adapters should emit `displayName`.\n *\n * Canonical vs. alias matters on session resume: `session.turns` persists canonical\n * names only, so renaming an alias cannot desync history.\n */\nexport interface ToolHookContext {\n turnId: string\n callId: string\n /** Canonical tool name (spec name). Stable across alias-map changes. */\n name: string\n /** Aliased (wire) name — equal to `name` when no alias is defined. */\n displayName: string\n input: Record<string, unknown>\n /**\n * The run this tool call belongs to (the `SessionRun.id`). 
Lets a single\n * `tool:*` listener disambiguate calls across parallel runs / subagent\n * trees without subscribing to the `child:tool:*` bubble events.\n */\n runId?: string\n /**\n * Parent run id when this tool call's agent is a subagent — i.e. the\n * `SessionRun.parentRunId` of the run that owns the call. Absent on\n * top-level runs. Useful for observability stitching: a UI grouping\n * subagent-scoped state (e.g. todowrite, edit batches) by parent\n * run can read this directly off `tool:before` / `tool:after`\n * without resolving the run row.\n */\n parentRunId?: string\n /**\n * Subagent depth for this tool call. 0 = top-level, 1 = first-level\n * child, etc. Mirrors `ToolContext.depth` so hook consumers don't\n * have to cross-reference the tool context. Omitted on top-level\n * runs (treated as 0).\n */\n depth?: number\n}\n\n/**\n * Base context for MCP tool hooks.\n *\n * `tool` is the native tool name on the MCP server. `server` is the configured server\n * name. The canonical zidane-namespaced identity is `mcp_{server}_{tool}`.\n *\n * `displayName` equals the canonical namespaced name unless the agent has aliased\n * this MCP tool via `AgentOptions.toolAliases`; in which case `displayName` is the\n * alias that the LLM sees.\n */\nexport interface McpToolHookContext {\n turnId: string\n callId: string\n server: string\n tool: string\n /** Aliased wire name for this MCP tool, or the canonical `mcp_{server}_{tool}` name. */\n displayName: string\n input: Record<string, unknown>\n /** Owning run id — same semantics as `ToolHookContext.runId`. */\n runId?: string\n /** Parent run id when this tool call's agent is a subagent — see `ToolHookContext.parentRunId`. */\n parentRunId?: string\n /** Subagent depth — see `ToolHookContext.depth`. 
*/\n depth?: number\n}\n\n/** Base context for session hooks */\nexport interface SessionHookContext {\n sessionId: string\n}\n\n/** Base context for spawn hooks */\nexport interface SpawnHookContext {\n id: string\n task: string\n /**\n * Subagent depth for the spawn. 1 = direct child of the top-level agent.\n * Present on spawn:before/complete/error. Absent for grandchild spawns that\n * bubble through `child:*` events (which carry their own `depth`).\n */\n depth?: number\n /**\n * Mutable trace-context carrier for parent → child span linkage. Empty\n * object by default; a parent tracer mutates it on `spawn:before` (e.g.\n * `ctx.tracingContext.traceparent = '00-…-…-01'`) and the `spawn` tool\n * forwards the populated object to the child via\n * `AgentRunOptions.tracingContext`. The child's tracer re-emits it on\n * `agent:start` so it can be used as parent context when opening the\n * child's root span.\n *\n * Opaque to the harness — keys / values are tracer-defined. Standard\n * choice is W3C Trace Context (`traceparent` + optional `tracestate`),\n * but Datadog / Sentry / B3 carriers work too.\n */\n tracingContext?: Record<string, string>\n}\n\n/** Context for stream hooks */\nexport interface StreamHookContext {\n turnId: string\n}\n\n/** Context for OAuth refresh hooks */\nexport interface OAuthRefreshHookContext {\n provider: string\n providerId: string\n source: 'params' | 'file'\n previousCredentials: Record<string, unknown> & { access: string, refresh: string, expires: number }\n credentials: Record<string, unknown> & { access: string, refresh: string, expires: number }\n}\n\nexport type SessionEndStatus = 'completed' | 'aborted' | 
'error'\n"],"mappings":";;;AAiEA,MAAa,sBAAkC;CAC7C,WAAW,KAAK,KAAK;CACrB,kBAAkB,OAAO,YAAY;CACtC;;;;;;;;AAqvBD,SAAgB,iBAAiB,SAA+C;CAC9E,IAAI,OAAO,YAAY,UACrB,OAAO;CACT,OAAO,QACJ,KAAK,UAAU;EACd,IAAI,MAAM,SAAS,QACjB,OAAO,MAAM;EACf,OAAO,WAAW,MAAM,UAAU,KAAK,MAAM,KAAK,OAAO;GACzD,CACD,KAAK,KAAK;;;;;;;;;;;;;;;;;;;;;AAsBf,SAAgB,qBAAqB,SAA+C;CAClF,IAAI,OAAO,YAAY,UACrB,OAAO,OAAO,WAAW,QAAQ;CACnC,IAAI,QAAQ;CACZ,KAAK,MAAM,SAAS,SAClB,IAAI,MAAM,SAAS,QACjB,SAAS,OAAO,WAAW,MAAM,KAAK;MAEtC,SAAS,MAAM,KAAK;CAExB,OAAO"}
|
package/dist/types.d.ts
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
import { a as ExecutionContext, i as ExecResult, n as ContextCapabilities, o as ExecutionHandle, r as ContextType, s as SpawnConfig } from "./types-KukEp-mi.js";
|
|
2
2
|
import { t as SandboxProvider } from "./index-CbS75MD3.js";
|
|
3
|
-
import { $t as SpawnHookContext, Bt as McpToolHookContext, D as SkillConfig, Dt as OpenAIParams, F as SessionRun, Ft as AgentRunOptions, Gt as PromptPart, Ht as OAuthRefreshHookContext, I as SessionStore, It as AgentStats, Jt as SessionContentBlock, Kt as PromptTextPart, Lt as ChildRunStats, M as CreateSessionOptions, N as Session, Nt as AgentBehavior, P as SessionData, Pt as AgentClock, Qt as SessionTurn, Rt as DEFAULT_AGENT_CLOCK, S as ReadStateMap, Ut as PromptDocumentPart, Wt as PromptImagePart, Xt as SessionHookContext, Yt as SessionEndStatus, Zt as SessionMessage, _n as
|
|
4
|
-
import { H as ModelUsage, L as InteractionToolOptions, S as SpawnToolState, U as flattenTurns, W as statsByModel, b as ChildAgent, h as ValidationResult, t as Preset, x as SpawnToolOptions } from "./index-
|
|
5
|
-
export { type Agent, AgentAbortedError, type AgentBehavior, type AgentClock, AgentContextExceededError, type AgentHooks, type AgentOptions, AgentProviderError, type AgentRunOptions, type AgentStats, AgentToolNotAllowedError, type AnthropicParams, CONTEXT_EXCEEDED_MESSAGE_PATTERNS, type CerebrasParams, type ChildAgent, type ChildRunStats, type ClassifiedError, type ClassifiedErrorKind, type ContextCapabilities, type ContextType, type CreateSessionOptions, DEFAULT_AGENT_CLOCK, type ExecResult, type ExecutionContext, type ExecutionHandle, type InteractionToolOptions, type McpConnection, type McpServerConfig, type McpToolHookContext, type ModelUsage, type OAuthRefreshHookContext, type OpenAIParams, type OpenRouterParams, type Preset, type PromptDocumentPart, type PromptImagePart, type PromptPart, type PromptTextPart, type Provider, type ProviderCapabilities, type ReadStateEntry, type ReadStateMap, type RemoteStoreOptions, type RunHookMap, type SandboxProvider, type Session, type SessionContentBlock, type SessionData, type SessionEndStatus, type SessionHookContext, type SessionMessage, type SessionRun, type SessionStore, type SessionTurn, type SkillConfig, type SkillResource, type SkillsConfig, type SpawnConfig, type SpawnHookContext, type SpawnToolOptions, type SpawnToolState, type StreamCallbacks, type StreamHookContext, type StreamOptions, type ThinkingLevel, type ToolCall, type ToolContext, type ToolDef, type ToolHookContext, type ToolMap, type ToolResult, type ToolResultContent, type ToolResultImageContent, type ToolResultTextContent, type ToolSpec, type TurnFinishReason, type TurnResult, type TurnUsage, type ValidationResult, flattenTurns, matchesContextExceeded, statsByModel, toolOutputByteLength, toolResultToText };
|
|
3
|
+
import { $t as SpawnHookContext, Bt as McpToolHookContext, D as SkillConfig, Dt as OpenAIParams, F as SessionRun, Ft as AgentRunOptions, Gt as PromptPart, Ht as OAuthRefreshHookContext, I as SessionStore, It as AgentStats, Jt as SessionContentBlock, Kt as PromptTextPart, Lt as ChildRunStats, M as CreateSessionOptions, N as Session, Nt as AgentBehavior, P as SessionData, Pt as AgentClock, Qt as SessionTurn, Rt as DEFAULT_AGENT_CLOCK, S as ReadStateMap, Ut as PromptDocumentPart, Wt as PromptImagePart, Xt as SessionHookContext, Yt as SessionEndStatus, Zt as SessionMessage, _n as ClassifiedError, an as ToolResultTextContent, b as ToolMap, bn as matchesContextExceeded, cn as toolOutputByteLength, ct as StreamCallbacks, dn as AgentBudgetExceededError, dt as ToolResult, en as StreamHookContext, fn as AgentContextExceededError, ft as ToolSpec, gn as CONTEXT_EXCEEDED_MESSAGE_PATTERNS, i as AgentOptions, in as ToolResultImageContent, j as SkillsConfig, jt as AnthropicParams, k as SkillResource, kt as CerebrasParams, ln as toolResultToText, lt as StreamOptions, mn as AgentToolNotAllowedError, nn as ToolHookContext, on as TurnFinishReason, ot as Provider, p as McpConnection, pn as AgentProviderError, pt as TurnResult, qt as RunHookMap, r as AgentHooks, rn as ToolResultContent, sn as TurnUsage, st as ProviderCapabilities, t as Agent, tn as ThinkingLevel, un as AgentAbortedError, ut as ToolCall, v as ToolContext, vn as ClassifiedErrorKind, x as ReadStateEntry, y as ToolDef, yt as OpenRouterParams, z as RemoteStoreOptions, zt as McpServerConfig } from "./agent-B26FuGew.js";
|
|
4
|
+
import { H as ModelUsage, L as InteractionToolOptions, S as SpawnToolState, U as flattenTurns, W as statsByModel, b as ChildAgent, h as ValidationResult, t as Preset, x as SpawnToolOptions } from "./index-CROWxXo9.js";
|
|
5
|
+
export { type Agent, AgentAbortedError, type AgentBehavior, AgentBudgetExceededError, type AgentClock, AgentContextExceededError, type AgentHooks, type AgentOptions, AgentProviderError, type AgentRunOptions, type AgentStats, AgentToolNotAllowedError, type AnthropicParams, CONTEXT_EXCEEDED_MESSAGE_PATTERNS, type CerebrasParams, type ChildAgent, type ChildRunStats, type ClassifiedError, type ClassifiedErrorKind, type ContextCapabilities, type ContextType, type CreateSessionOptions, DEFAULT_AGENT_CLOCK, type ExecResult, type ExecutionContext, type ExecutionHandle, type InteractionToolOptions, type McpConnection, type McpServerConfig, type McpToolHookContext, type ModelUsage, type OAuthRefreshHookContext, type OpenAIParams, type OpenRouterParams, type Preset, type PromptDocumentPart, type PromptImagePart, type PromptPart, type PromptTextPart, type Provider, type ProviderCapabilities, type ReadStateEntry, type ReadStateMap, type RemoteStoreOptions, type RunHookMap, type SandboxProvider, type Session, type SessionContentBlock, type SessionData, type SessionEndStatus, type SessionHookContext, type SessionMessage, type SessionRun, type SessionStore, type SessionTurn, type SkillConfig, type SkillResource, type SkillsConfig, type SpawnConfig, type SpawnHookContext, type SpawnToolOptions, type SpawnToolState, type StreamCallbacks, type StreamHookContext, type StreamOptions, type ThinkingLevel, type ToolCall, type ToolContext, type ToolDef, type ToolHookContext, type ToolMap, type ToolResult, type ToolResultContent, type ToolResultImageContent, type ToolResultTextContent, type ToolSpec, type TurnFinishReason, type TurnResult, type TurnUsage, type ValidationResult, flattenTurns, matchesContextExceeded, statsByModel, toolOutputByteLength, toolResultToText };
|
package/dist/types.js
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import {
|
|
1
|
+
import { a as AgentToolNotAllowedError, i as AgentProviderError, l as matchesContextExceeded, n as AgentBudgetExceededError, r as AgentContextExceededError, s as CONTEXT_EXCEEDED_MESSAGE_PATTERNS, t as AgentAbortedError } from "./errors-DdZXnyXE.js";
|
|
2
2
|
import { n as toolOutputByteLength, r as toolResultToText, t as DEFAULT_AGENT_CLOCK } from "./types-oKPBdCmL.js";
|
|
3
3
|
import { r as statsByModel, t as flattenTurns } from "./stats-Lc3zL3RM.js";
|
|
4
|
-
export { AgentAbortedError, AgentContextExceededError, AgentProviderError, AgentToolNotAllowedError, CONTEXT_EXCEEDED_MESSAGE_PATTERNS, DEFAULT_AGENT_CLOCK, flattenTurns, matchesContextExceeded, statsByModel, toolOutputByteLength, toolResultToText };
|
|
4
|
+
export { AgentAbortedError, AgentBudgetExceededError, AgentContextExceededError, AgentProviderError, AgentToolNotAllowedError, CONTEXT_EXCEEDED_MESSAGE_PATTERNS, DEFAULT_AGENT_CLOCK, flattenTurns, matchesContextExceeded, statsByModel, toolOutputByteLength, toolResultToText };
|
package/docs/ARCHITECTURE.md
CHANGED
|
@@ -636,7 +636,7 @@ Merge rules:
|
|
|
636
636
|
|
|
637
637
|
`run.behavior` > `agent.behavior` > defaults (field-by-field merge).
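The precedence above can be sketched as a layered merge. This is an illustrative sketch only: `Behavior` here is a cut-down stand-in for `AgentBehavior`, `mergeBehavior` is a hypothetical helper, and skipping `undefined` fields (so an unset run-level field does not erase an agent-level value) is an assumption about what "field-by-field" means.

```ts
// Hypothetical, trimmed-down stand-in for AgentBehavior (illustration only).
type Behavior = {
  maxTurns?: number
  maxTokens?: number
  cache?: boolean
  maxConcurrentTools?: number
}

// Later layers win, one field at a time. Fields that are `undefined`
// in a layer are skipped (assumed semantics, not confirmed by the docs).
function mergeBehavior(...layers: Array<Behavior | undefined>): Behavior {
  const merged: Record<string, unknown> = {}
  for (const layer of layers) {
    if (!layer)
      continue
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined)
        merged[key] = value
    }
  }
  return merged as Behavior
}

// defaults < agent.behavior < run.behavior
const DEFAULTS: Behavior = { maxTokens: 16384, cache: true, maxConcurrentTools: 10 }
const effective = mergeBehavior(DEFAULTS, { maxTurns: 50 }, { cache: false })
```

With these inputs `effective` keeps `maxTokens` and `maxConcurrentTools` from the defaults, `maxTurns` from the agent layer, and `cache` from the run layer.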
|
|
638
638
|
|
|
639
|
-
Defaults: unlimited turns (set `maxTurns` as a safety net), `maxTokens: 16384`, level-based thinking; `cache: true`, `dedupReads: true`, `requireReadBeforeEdit: false`, `maxConcurrentTools: 10`, `compactStrategy: 'off'`, `compactThreshold: 128 KiB`, `compactKeepTurns: 4`, `toolDisclosure: 'eager'`, `readLineNumbers: true`, `elideStaleReads: false`. All other knobs (`toolOutputBudget`, `dedupTools`, `toolBudgets`, `thinkingDecay`, `toolSearch`, `persistThreshold`, `persistDir`, `persistExcludeTools`) are `undefined`. The chat-layer profiles (`BUILD_AGENT` / `PLAN_AGENT`) override several of these — see CHAT.md → Agents.
|
|
639
|
+
Defaults: unlimited turns (set `maxTurns` as a safety net), `maxTokens: 16384`, level-based thinking; `cache: true`, `dedupReads: true`, `requireReadBeforeEdit: false`, `maxConcurrentTools: 10`, `compactStrategy: 'off'`, `compactThreshold: 128 KiB`, `compactKeepTurns: 4`, `toolDisclosure: 'eager'`, `readLineNumbers: true`, `elideStaleReads: false`. All other knobs (`toolOutputBudget`, `dedupTools`, `toolBudgets`, `thinkingDecay`, `toolSearch`, `persistThreshold`, `persistDir`, `persistExcludeTools`, `persistMaxBytes`, `maxCostUsd`, `maxTotalTokens`) are `undefined`. The chat-layer profiles (`BUILD_AGENT` / `PLAN_AGENT`) override several of these — see CHAT.md → Agents.
|
|
640
640
|
|
|
641
641
|
**`toolOutputBudget`** (off by default). The loop sums every tool result's `outputBytes` after `tool:transform`, so consumer truncation counts. On overshoot, a synthetic user message is appended:
|
|
642
642
|
|
|
@@ -656,7 +656,9 @@ Defaults: unlimited turns (set `maxTurns` as a safety net), `maxTokens: 16384`,
|
|
|
656
656
|
|
|
657
657
|
**Gate precedence**. Consumer hooks register first (agent-lifetime then per-run), framework gates install after — `allowed-tools → tool-budgets → dedup → lazy-disclosure`. Each framework gate short-circuits on existing `ctx.block` or `ctx.result`, so the firing order is consumer → framework with the framework gates acting as an early-exit chain. The net effect: a policy gate always beats a consumer cache substitute, budgets enforce before dedup replays a cached call against an exhausted cap, and lazy-disclosure runs last so skill refusal wins over "load schema first" and dedup substitutes only hit on previously-recorded successful calls (which already passed lazy-disclosure once).
|
|
658
658
|
|
|
659
|
-
**Tool-result persistence** (`behavior.persistThreshold` + `persistDir` + `persistExcludeTools`). When set, the loop replaces oversize `tool_result` outputs with a `<persisted-output tool="…" bytes="…" path="…">` stub carrying a 2 KiB preview. Substitution happens just after `tool:transform`; the stub flows into `session.turns` directly so the prompt-cache prefix stays stable across turns. `persistDir` is required to enable the feature — without a target dir the framework silently skips persistence. `persistExcludeTools` (default empty at the SDK; the chat layer ships `DEFAULT_PERSIST_EXCLUDE_TOOLS` — see CHAT.md) lists canonical tool names that bypass regardless of size.
|
|
659
|
+
**Tool-result persistence** (`behavior.persistThreshold` + `persistDir` + `persistExcludeTools`). When set, the loop replaces oversize `tool_result` outputs with a `<persisted-output tool="…" bytes="…" path="…">` stub carrying a 2 KiB preview. Substitution happens just after `tool:transform`; the stub flows into `session.turns` directly so the prompt-cache prefix stays stable across turns. `persistDir` is required to enable the feature — without a target dir the framework silently skips persistence. `persistExcludeTools` (default empty at the SDK; the chat layer ships `DEFAULT_PERSIST_EXCLUDE_TOOLS` — see CHAT.md) lists canonical tool names that bypass regardless of size. `persistMaxBytes` caps the directory's cumulative `*.txt` size; after each successful write the helper evicts oldest-first by mtime (LRU) until total ≤ cap. The just-written blob is never evicted in the same sweep, so a single oversize result still lands. Off by default.
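The `persistMaxBytes` sweep described above can be sketched as a pure selection step. This is a sketch under stated assumptions, not the package's helper: `BlobEntry` and `selectEvictions` are hypothetical names, and the caller is assumed to have already listed the directory's `*.txt` files with their sizes and mtimes.

```ts
// Hypothetical shape for one persisted blob in persistDir (illustration only).
interface BlobEntry {
  path: string
  bytes: number
  mtimeMs: number
}

// Picks blobs to delete, oldest mtime first, until the directory total
// is at or under the cap. The blob that was just written is never picked.
function selectEvictions(entries: BlobEntry[], maxBytes: number, justWritten: string): string[] {
  let total = entries.reduce((sum, entry) => sum + entry.bytes, 0)
  const evicted: string[] = []
  const oldestFirst = [...entries].sort((a, b) => a.mtimeMs - b.mtimeMs)
  for (const entry of oldestFirst) {
    if (total <= maxBytes)
      break
    if (entry.path === justWritten)
      continue
    evicted.push(entry.path)
    total -= entry.bytes
  }
  return evicted
}
```

Because the just-written blob is skipped, a single result larger than the cap leaves the directory over budget until the next write triggers another sweep, which matches the "still lands" behavior described above.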
|
|
660
|
+
|
|
661
|
+
**Run-level budget circuit breaker** (`behavior.maxCostUsd` + `behavior.maxTotalTokens`). Soft caps the cumulative spend / token consumption of a single `agent.run()`. Checked **after** each turn (post-`turn:after` + `usage` hooks), so the breaching turn's numbers are visible before the throw lands. Trips throw `AgentBudgetExceededError` (`limit: 'cost' | 'tokens'`, `limitValue`, `actualValue`); the agent finalizes the session run as `'aborted'` so the persisted record distinguishes operator-imposed stops from runtime errors. Cost check wins when both axes would trip on the same turn (operators set ceilings against dollar accounting). Token ledger sums `input + output` only — cache reads/creates are billed at a discount and counting them at par would under-estimate affordability. `0` / negative / non-finite values disable the corresponding axis. Off by default; `maxTurns` is the count-based companion.
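The post-turn check described above might look roughly like the following. This is a sketch, not the package's code: `Ledger`, `Caps`, `BudgetTrip`, `activeCap`, and `checkBudget` are hypothetical names, the strict `>` comparison is an assumption, and the sketch returns a plain object because the constructor of `AgentBudgetExceededError` is not shown in this diff.

```ts
// Hypothetical shapes (illustration only).
interface Ledger { costUsd: number, input: number, output: number }
interface Caps { maxCostUsd?: number, maxTotalTokens?: number }
interface BudgetTrip { limit: 'cost' | 'tokens', limitValue: number, actualValue: number }

// A cap counts only when it is a finite number above zero;
// 0, negative, NaN, and Infinity all disable the axis.
function activeCap(value: number | undefined): number | undefined {
  return typeof value === 'number' && Number.isFinite(value) && value > 0 ? value : undefined
}

// Meant to run after each turn, once that turn's usage is in the ledger.
function checkBudget(ledger: Ledger, caps: Caps): BudgetTrip | undefined {
  const costCap = activeCap(caps.maxCostUsd)
  // Cost is checked first, so it wins when both axes trip on the same turn.
  if (costCap !== undefined && ledger.costUsd > costCap)
    return { limit: 'cost', limitValue: costCap, actualValue: ledger.costUsd }
  const tokenCap = activeCap(caps.maxTotalTokens)
  // Token ledger is input + output only; cache reads/creates are left out.
  const tokens = ledger.input + ledger.output
  if (tokenCap !== undefined && tokens > tokenCap)
    return { limit: 'tokens', limitValue: tokenCap, actualValue: tokens }
  return undefined
}
```

A caller would throw on a non-`undefined` result and finalize the run as `'aborted'`, as the paragraph above describes.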
|
|
660
662
|
|
|
661
663
|
## Related
|
|
662
664
|
|
|
@@ -664,3 +666,4 @@ Defaults: unlimited turns (set `maxTurns` as a safety net), `maxTokens: 16384`,
|
|
|
664
666
|
- `docs/CHAT.md` — renderer-agnostic chat engine (`zidane/chat`).
|
|
665
667
|
- `docs/TUI.md` — OpenTUI terminal shell (`zidane/tui`).
|
|
666
668
|
- `docs/INTERACTIONS.md` — interactive tools (`present_plan`, `ask_user`) protocol.
|
|
669
|
+
- `docs/RESTATE.md` — optional durable-execution adapter (`zidane/restate`).
|
package/docs/CHAT.md
CHANGED
|
@@ -140,7 +140,8 @@ The table below indexes every named export; sections further down dive into the
|
|
|
140
140
|
| `mcp-credentials` | File-backed `McpCredentialStore` at `<dataDir>/mcp-credentials.json` (0o600). `createFileMcpCredentialStore`, `patchMcpCredential`, `mcpCredentialsPath`. |
|
|
141
141
|
| `mcps-discovery` | `discoverProjectMcps`, `parseMcpsFile`, `buildMcpServers`, `defaultMcpsConfigPaths`, `DiscoveredMcp`, `DiscoveryError`, `DiscoveryResult`. See **Project MCP servers**. |
|
|
142
142
|
| `model-catalog` | Cross-provider unified model picker assembly — `buildModelCatalog`, `filterModelCatalog`, `indexOfEntry`, `CatalogEntry`. |
|
|
143
|
-
| `oauth` | `runOAuthLogin(descriptor, { onUrl, onProgress, signal })` + `supportsOAuth(descriptor)` — AI-provider OAuth (distinct from MCP OAuth). |
|
|
143
|
+
| `oauth` | `runOAuthLogin(descriptor, { onUrl, onPrompt, onProgress, signal })` + `supportsOAuth(descriptor)` + `oauthUsesManualCodePaste(descriptor)` — AI-provider OAuth (distinct from MCP OAuth). |
|
|
144
|
+
| `oauth-redirect` | `fetchOAuthRedirect(pastedUrl)` — SSH escape hatch. Validates that `pastedUrl` is a loopback URL, then fires a GET so the in-process callback server (pi-ai's or `startOAuthCallback`'s) receives the request through its normal handler. Used by the TUI to accept a redirect-URL paste when the browser couldn't reach the callback server directly. |
|
|
144
145
|
| `path-display` | `formatPathForCwd(projectRel, projectRoot, cwd)` — rewrites a project-root-relative path (as emitted by `listProjectFiles`) into the form the agent's CWD-resolving tools actually find. Wired into `createFilesCompletionProvider({ formatPath })` by the TUI shell. |
|
|
145
146
|
| `project-root` | `findGitRoot(cwd)` — walks upward looking for `.git`, returns absolute path or `null`. Used for session scope tagging + export anchors. |
|
|
146
147
|
| `prompt-segments` | `splitPromptSegments(text, refs)` → `PromptSegment[]`. GUI maps the same segments to inline-block chip pills. |
|
|
@@ -1188,11 +1189,15 @@ Two distinct OAuth flows, both ending in stored credentials and a working sessio
|
|
|
1188
1189
|
Used by Anthropic (Claude Pro/Max) and OpenAI Codex. Wired through `descriptor.oauthProvider` (a pi-ai `OAuthProviderInterface`).
|
|
1189
1190
|
|
|
1190
1191
|
```ts
|
|
1191
|
-
import {
|
|
1192
|
+
import { oauthUsesManualCodePaste, runOAuthLogin, setProviderCredential, supportsOAuth } from 'zidane/chat'
|
|
1192
1193
|
|
|
1193
1194
|
if (supportsOAuth(descriptor)) {
|
|
1194
1195
|
const credentials = await runOAuthLogin(descriptor, {
|
|
1195
1196
|
onUrl: (url, instructions) => { /* render to user */ },
|
|
1197
|
+
// Required for providers with `usesCallbackServer: false` (Anthropic Claude
|
|
1198
|
+
// Pro/Max). pi-ai invokes `onPrompt` with `{ message, placeholder?, allowEmpty? }`
|
|
1199
|
+
// when it needs the user to paste a code from the browser.
|
|
1200
|
+
onPrompt: async prompt => askUser(prompt.message, prompt.placeholder),
|
|
1196
1201
|
onProgress: msg => { /* show progress */ },
|
|
1197
1202
|
signal: abortController.signal,
|
|
1198
1203
|
})
|
|
@@ -1200,7 +1205,9 @@ if (supportsOAuth(descriptor)) {
|
|
|
1200
1205
|
}
|
|
1201
1206
|
```
|
|
1202
1207
|
|
|
1203
|
-
`runOAuthLogin` calls `tryOpenBrowser(url)` automatically; the user can also click the URL surfaced via `onUrl`.
|
|
1208
|
+
`runOAuthLogin` calls `tryOpenBrowser(url)` automatically; the user can also click the URL surfaced via `onUrl`. `oauthUsesManualCodePaste(descriptor)` is a convenience that returns `true` when the underlying pi-ai provider has `usesCallbackServer: false` — UIs can use it to swap their copy from "waiting for browser callback" to "paste the code from your browser below" before the prompt actually arrives.
|
|
1209
|
+
|
|
1210
|
+
**SSH paste-back.** When the user runs zidane over SSH (or behind a firewall that blocks loopback), the browser-side redirect to `http://127.0.0.1:<port>/callback?...` can't reach the remote callback server. `fetchOAuthRedirect(pastedUrl)` is the escape hatch: hand it the URL the browser ended up at and it fires a GET from inside the agent process. The request hits pi-ai's still-running server through the same handler a real browser would have, `waitForCode` resolves, and `runOAuthLogin` continues uninterrupted — no `onPrompt` call needed. The TUI wires this into a paste input next to the auth URL on every OAuth surface (AI providers + MCP).
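The loopback validation step mentioned above could be approximated as below. This is a minimal sketch with a hypothetical helper name (`isLoopbackUrl`); it is not the check `fetchOAuthRedirect` actually performs, and it deliberately accepts only `localhost` and `127.0.0.1` over http(s), which may be narrower or wider than the real rule.

```ts
// Hypothetical helper (illustration only): true when `pasted` parses as an
// http(s) URL whose host is `localhost` or `127.0.0.1`.
function isLoopbackUrl(pasted: string): boolean {
  let url: URL
  try {
    url = new URL(pasted)
  }
  catch {
    // Not parseable as an absolute URL.
    return false
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:')
    return false
  return url.hostname === 'localhost' || url.hostname === '127.0.0.1'
}
```

Rejecting non-loopback hosts before issuing the GET keeps a pasted string from turning the agent process into a general-purpose request proxy.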
|
|
1204
1211
|
|
|
1205
1212
|
### MCP-server OAuth
|
|
1206
1213
|
|
package/docs/RESTATE.md
ADDED
|
@@ -0,0 +1,190 @@
|
|
|
1
|
+
# Zidane × Restate — durable agents
|
|
2
|
+
|
|
3
|
+
`zidane/restate` is an **optional** adapter that runs the agent loop inside a [Restate](https://docs.restate.dev/) handler so every LLM call and tool execution becomes a journaled side effect. Crash mid-run → resume from the journal without re-billing tokens or re-firing tool side effects.
|
|
4
|
+
|
|
5
|
+
It's a thin layer over the existing agent surface — no fork, no separate runtime. The core harness has zero dependency on Restate; you opt in by wiring four wrappers.
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
zidane (core)
|
|
9
|
+
└── zidane/restate ← this adapter (Bun + Node, structurally typed against @restatedev/restate-sdk)
|
|
10
|
+
```
|
|
11
|
+
|
|
12
|
+
A runnable example lives at `examples/restate/agent.ts`.
|
|
13
|
+
|
|
14
|
+
## When to use it (and when not to)
|
|
15
|
+
|
|
16
|
+
**Use Restate when** you need a server-side, long-running agent that must survive process restarts mid-run: an unattended job, a webhook-driven worker, a multi-hour pipeline that calls expensive providers and writes to external systems. The journal turns crash recovery from "replay from a session snapshot" into "replay from the last completed side effect, mid-turn".
|
|
17
|
+
|
|
18
|
+
**Skip Restate when** you're running:
|
|
19
|
+
- A TUI / CLI session (the user owns process lifetime; sessions are the recovery story).
|
|
20
|
+
- A request/response API where the agent finishes inside the request lifetime.
|
|
21
|
+
- A worker where crash → start-over is acceptable (sessions still persist conversation; you just lose the in-flight turn).
|
|
22
|
+
|
|
23
|
+
Restate adds a deployment surface (runtime + SDK + journal storage). Don't pay for it unless mid-run crashes are an actual concern.
|
|
24
|
+
|
|
25
|
+
## Wrappers
|
|
26
|
+
|
|
27
|
+
Four pure wrappers — pick the ones you need. Pass-through for everything else (`Provider.formatTools`, `ToolDef.spec`, etc.) so the rest of Zidane is unchanged.
|
|
28
|
+
|
|
29
|
+
| Wrapper | Wraps | What gets journaled |
|
|
30
|
+
|---|---|---|
|
|
31
|
+
| `restateClock(ctx)` | `AgentClock` | `now()`, `randomUUID()` — turn ids + timestamps stable across replays |
|
|
32
|
+
| `restateProvider(provider, ctx)` | `Provider.stream` | Each LLM turn's `TurnResult` — replay returns cached value, no re-billing |
|
|
33
|
+
| `restateTool(tool, ctx)` / `wrapAgentTools(tools, ctx)` | `ToolDef.execute` | Each tool invocation's result — side effects fire exactly once |
|
|
34
|
+
| `restateSessionStore(ctx)` | `SessionStore` | Session header + turns + runs as virtual-object K/V state |
|
|
35
|
+
|
|
36
|
+
### Minimal wiring
|
|
37
|
+
|
|
38
|
+
```ts
|
|
39
|
+
import * as restate from '@restatedev/restate-sdk'
|
|
40
|
+
import { createAgent } from 'zidane'
|
|
41
|
+
import { anthropic } from 'zidane/providers'
|
|
42
|
+
import { restateClock, restateProvider, restateSessionStore, wrapAgentTools } from 'zidane/restate'
|
|
43
|
+
import { createSession } from 'zidane/session'
|
|
44
|
+
import { shell } from 'zidane/tools'
|
|
45
|
+
|
|
46
|
+
const zidaneAgent = restate.object({
|
|
47
|
+
name: 'zidane-agent',
|
|
48
|
+
handlers: {
|
|
49
|
+
async run(ctx, { prompt }: { prompt: string }) {
|
|
50
|
+
const session = await createSession({
|
|
51
|
+
id: ctx.key,
|
|
52
|
+
store: restateSessionStore(ctx),
|
|
53
|
+
})
|
|
54
|
+
|
|
55
|
+
const agent = createAgent({
|
|
56
|
+
provider: restateProvider(anthropic({ defaultModel: 'claude-sonnet-4-5' }), ctx),
|
|
57
|
+
tools: wrapAgentTools({ shell }, ctx),
|
|
58
|
+
clock: restateClock(ctx),
|
|
59
|
+
session,
|
|
60
|
+
})
|
|
61
|
+
|
|
62
|
+
return await agent.run({ prompt })
|
|
63
|
+
},
|
|
64
|
+
},
|
|
65
|
+
})
|
|
66
|
+
|
|
67
|
+
restate.serve({ services: [zidaneAgent] })
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## How replay works
|
|
71
|
+
|
|
72
|
+
Restate journals each `ctx.run(name, fn)` call. The original execution runs `fn` and stores its return value (or error). On any subsequent attempt of the same handler invocation (typically a crash recovery), `ctx.run(name, fn)` looks the entry up by `name` and returns the cached value **without invoking `fn`**.
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
┌──────────────── live ─────────────────┐
|
|
76
|
+
│ │
|
|
77
|
+
turn N: loop ──▶ restateProvider.stream() ──▶ ctx.run("llm-call-N", () => anthropic.stream(...))
|
|
78
|
+
│
|
|
79
|
+
├─ journal entry: TurnResult
|
|
80
|
+
│
|
|
81
|
+
┌───┴───┐
|
|
82
|
+
│ crash │
|
|
83
|
+
└───┬───┘
|
|
84
|
+
│
|
|
85
|
+
turn N: loop ──▶ restateProvider.stream() ──▶ ctx.run("llm-call-N", () => anthropic.stream(...))
|
|
86
|
+
│
|
|
87
|
+
└─ returns journaled TurnResult (no model call)
|
|
88
|
+
```
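
The replay behaviour described above can be modelled in a few lines. This is only a sketch: `FakeJournal` is a hypothetical stand-in, it is synchronous for brevity (the real `ctx.run` is asynchronous and owned by the Restate SDK), and it keys entries by name exactly as this section describes.

```ts
// Simplified, synchronous model of a journal keyed by entry name.
// Assumption: `FakeJournal` is illustrative only and not part of zidane or Restate.
class FakeJournal {
  private readonly entries = new Map<string, unknown>()

  // First attempt: run `fn` and record its result.
  // Later attempts: return the recorded result without calling `fn`.
  run<T>(name: string, fn: () => T): T {
    if (this.entries.has(name)) {
      return this.entries.get(name) as T
    }
    const value = fn()
    this.entries.set(name, value)
    return value
  }
}

// Counts how often the (pretend) model is actually called.
let modelCalls = 0
const callModel = (): string => {
  modelCalls += 1
  return 'assistant reply'
}

const journal = new FakeJournal()
const first = journal.run('llm-call-1', callModel) // live: invokes callModel
const replayed = journal.run('llm-call-1', callModel) // replay: cached, no model call
```

After both calls `modelCalls` is still 1, which is the "no re-billing" property the wrappers rely on.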
|
|
89
|
+
|
|
90
|
+
Journal-entry names are scoped to the handler invocation:
|
|
91
|
+
|
|
92
|
+
- **LLM calls**: `llm-call-1`, `llm-call-2`, … (monotonic per wrapper instance).
|
|
93
|
+
- **Tools**: `tool-<name>-1`, `tool-<name>-2`, … (per wrapper, per tool name).
|
|
94
|
+
|
|
95
|
+
Both names are configurable via the `entryName` option if multiple agents share one service and you need disambiguation.
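
One way to picture the default naming scheme is a counter per name prefix. The helper below, `createEntryNamer`, is a hypothetical illustration and not the adapter's actual implementation; it only shows why a fresh wrapper instance produces the same name sequence on every attempt.

```ts
// Hypothetical helper: yields `prefix-1`, `prefix-2`, ... independently per prefix.
function createEntryNamer(): (prefix: string) => string {
  const counters = new Map<string, number>()
  return (prefix) => {
    const next = (counters.get(prefix) ?? 0) + 1
    counters.set(prefix, next)
    return `${prefix}-${next}`
  }
}

// A new namer per handler attempt restarts at 1, so the same sequence of
// calls yields the same sequence of journal-entry names.
const nextName = createEntryNamer()
const names = [
  nextName('llm-call'),
  nextName('tool-shell'),
  nextName('llm-call'),
  nextName('tool-shell'),
  nextName('tool-read'),
]
// names: llm-call-1, tool-shell-1, llm-call-2, tool-shell-2, tool-read-1
```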
|
|
96
|
+
|
|
97
|
+
### What re-fires on replay (and why that's fine)
|
|
98
|
+
|
|
99
|
+
Restate replays only what's inside `ctx.run`. Everything else re-executes — including Zidane's hooks:
|
|
100
|
+
|
|
101
|
+
- `tool:gate`, `tool:transform`, `tool:after` re-fire. Built-in handlers (truncation, image-stripping, `<edit-outcomes>` merge) are pure on journaled inputs, so the substitution is replay-stable.
|
|
102
|
+
- `turn:before`, `turn:after`, `usage` re-fire. They read the journaled `TurnResult`, so emitted values match the original.
|
|
103
|
+
- `stream:text`, `stream:thinking` do **not** re-fire — those callbacks are part of the live stream, which doesn't replay. UIs should reconstruct transcripts from `session.turns` (`eventsFromTurns` in `zidane/chat`) rather than relying on hook re-emission. The chat layer already does this.
|
|
104
|
+
- Consumer hooks that hit external services (logging APIs, metrics endpoints) should wrap those side effects in `ctx.run` themselves — otherwise they'll fire twice across the original + replay.
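
The last point can be sketched with a toy journal. Everything here is an assumption made for illustration: `RunLike` stands in for the journaling part of a Restate context (synchronous for brevity), and `onTurnAfter` is a made-up hook body rather than zidane's hook signature.

```ts
// Hypothetical stand-in for `ctx.run`, synchronous to keep the sketch short.
type RunLike = <T>(name: string, fn: () => T) => T

function createJournaledRun(journalStore: Map<string, unknown>): RunLike {
  return <T>(name: string, fn: () => T): T => {
    if (journalStore.has(name)) return journalStore.get(name) as T
    const value = fn()
    journalStore.set(name, value)
    return value
  }
}

// Pretend external metrics endpoint; records every call it receives.
const metricsSent: string[] = []
const sendMetric = (label: string): boolean => {
  metricsSent.push(label)
  return true
}

// A consumer hook body. It re-executes on every attempt of the handler.
function onTurnAfter(run: RunLike, turn: number): void {
  sendMetric(`unwrapped-turn-${turn}`) // fires on every attempt
  run(`metric-turn-${turn}`, () => sendMetric(`wrapped-turn-${turn}`)) // fires once
}

const persisted = new Map<string, unknown>() // survives the simulated crash
onTurnAfter(createJournaledRun(persisted), 1) // original attempt
onTurnAfter(createJournaledRun(persisted), 1) // replay after a crash
```

The unwrapped metric is sent twice and the wrapped one once, which is the difference the bullet above warns about.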
|
|
105
|
+
|
|
106
|
+
### Per-key serialization
|
|
107
|
+
|
|
108
|
+
Restate gives one concurrent invocation per virtual-object key. With sessions bound 1:1 to `ctx.key`, concurrent users hit different keys and never race; the runtime serializes invocations on the same key. No application-level locks, no `BEGIN IMMEDIATE` dance.
|
|
109
|
+
|
|
110
|
+
## Wrapper details
|
|
111
|
+
|
|
112
|
+
### `restateClock`
|
|
113
|
+
|
|
114
|
+
`AgentClock` backed by `ctx.date.now()` + `ctx.rand.uuidv4()`. Pass it via `AgentRunOptions.clock` (or `createAgent({ clock })`) so journaled metadata — `SessionTurn.id`, `SessionTurn.createdAt`, `runId` — stays byte-identical across replays.
|
|
115
|
+
|
|
116
|
+
`ctx.date.now()` is async (the SDK routes it through `ctx.run` internally; replay returns the journaled timestamp). `ctx.rand.uuidv4()` is sync (the SDK seeds an in-memory RNG from the invocation id, itself replay-stable).
|
|
117
|
+
|
|
118
|
+
**Gotcha**: when you pair a `restateSessionStore` with a session, the store's own `generateTurnId` runs ahead of `clock.randomUUID()` (the loop short-circuits via `session?.generateTurnId() ?? clock.randomUUID()`). The store implements `generateTurnId: () => ctx.rand.uuidv4()` precisely so turn ids stay journal-stable even when the clock falls through. If you wire a custom store, replicate that or your turn ids will drift on replay.
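
The gotcha is easier to see in a small model. The types and helpers below (`RandLike`, `ClockLike`, `TurnIdSource`, `fakeRand`, `attempt`) are hypothetical stand-ins; only the `session?.generateTurnId() ?? clock.randomUUID()` expression comes from the text above, and the generated ids are not real UUIDs.

```ts
interface RandLike { uuidv4: () => string }
interface ClockLike { randomUUID: () => string }
interface TurnIdSource { generateTurnId: () => string }

// Stand-in for `ctx.rand`: restarts the same sequence for the same invocation id.
function fakeRand(invocationId: string): RandLike {
  let n = 0
  return { uuidv4: () => `${invocationId}-${++n}` }
}

// The fallthrough quoted above: a session's own generator wins over the clock.
function nextTurnId(session: TurnIdSource | undefined, clock: ClockLike): string {
  return session?.generateTurnId() ?? clock.randomUUID()
}

let unjournaled = 0 // models an id source that is not tied to the invocation

// Simulates one handler attempt and returns the first turn id it produces.
function attempt(stable: boolean): string {
  const rand = fakeRand('inv-42')
  const clock: ClockLike = { randomUUID: () => rand.uuidv4() }
  const store: TurnIdSource = stable
    ? { generateTurnId: () => rand.uuidv4() } // replay-stable, like the built-in store
    : { generateTurnId: () => `local-${++unjournaled}` } // drifts between attempts
  return nextTurnId(store, clock)
}

const stableIds = [attempt(true), attempt(true)] // same id on both attempts
const driftingIds = [attempt(false), attempt(false)] // different id per attempt
```

Note that the replay-stable clock does not help in the drifting case, because the store's generator is consulted first.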
|
|
119
|
+
|
|
120
|
+
### `restateProvider`
|
|
121
|
+
|
|
122
|
+
Wraps `Provider.stream` in `ctx.run`. One journal entry per assistant turn carrying the full `TurnResult`. All other `Provider` methods (`formatTools`, `userMessage`, `toolResultsMessage`, `classifyError`) pass through unchanged — they're pure over already-deterministic inputs.
|
|
123
|
+
|
|
124
|
+
Default `runOptions`: `{ maxRetryAttempts: 3 }`. Provider transients (rate limits, 5xx, connection resets) get retried inside the journal entry. Once the cap is reached the failure becomes terminal and surfaces through Zidane's normal `AgentProviderError` path.
|
|
125
|
+
|
|
126
|
+
### `restateTool` / `wrapAgentTools`
|
|
127
|
+
|
|
128
|
+
Wraps `ToolDef.execute` in `ctx.run`. The tool body runs once on the original execution; on replay `ctx.run` returns the journaled result without re-executing. Tools that hit the network / write to disk / poke external systems fire **exactly once** across crashes + retries — the entire reason to pair Zidane with Restate.
|
|
129
|
+
|
|
130
|
+
Default `runOptions`: `{ maxRetryAttempts: 1 }`. Tool errors are typically deterministic (validation failures, missing files); blind retries waste budget. Override per-tool for network-heavy custom tools that benefit from SDK-level retries.
|
|
131
|
+
|
|
132
|
+
`isConcurrencySafe` is preserved. Restate's per-key serialization doesn't preclude in-handler parallelism: the loop's scheduler still fans out safe tools up to `behavior.maxConcurrentTools`, and each call lands in its own journal entry. Journal-completion order doesn't affect replay correctness (entries are keyed by name).
|
|
133
|
+
|
|
134
|
+
`wrapAgentTools(tools, ctx, { exclude })` is a batch helper. Pass-through for `exclude`-listed names (fully-deterministic local helpers where journaling is overhead).
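
A rough model of what a batch helper with an exclude list does is shown below. It is a sketch under assumptions: `ToolLike`, `JournaledRun`, and `wrapToolsSketch` are hypothetical, the tools are synchronous, and `exclude` is assumed to be a plain list of tool names. The real `wrapAgentTools` signature and option shape may differ.

```ts
interface ToolLike { execute: (input: string) => string }
type JournaledRun = <T>(name: string, fn: () => T) => T

// Wraps every tool's `execute` in a journaled call, except excluded names.
function wrapToolsSketch<K extends string>(
  tools: Record<K, ToolLike>,
  run: JournaledRun,
  exclude: readonly K[] = [],
): Record<K, ToolLike> {
  const wrapped = {} as Record<K, ToolLike>
  for (const name of Object.keys(tools) as K[]) {
    const tool = tools[name]
    if (exclude.includes(name)) {
      wrapped[name] = tool // pass-through: no journal entry
      continue
    }
    let calls = 0 // per-tool counter, giving tool-<name>-1, tool-<name>-2, ...
    wrapped[name] = {
      ...tool,
      execute: input => run(`tool-${name}-${++calls}`, () => tool.execute(input)),
    }
  }
  return wrapped
}

// A recording stand-in for `ctx.run` that notes which entries were created.
const recorded: string[] = []
const recordingRun: JournaledRun = (name, fn) => {
  recorded.push(name)
  return fn()
}

const toolset = {
  shell: { execute: (cmd: string) => `ran ${cmd}` },
  slugify: { execute: (text: string) => text.toLowerCase().replace(/\s+/g, '-') },
}
const wrappedTools = wrapToolsSketch(toolset, recordingRun, ['slugify'])
const shellOut = wrappedTools.shell.execute('ls') // journaled as tool-shell-1
const slugOut = wrappedTools.slugify.execute('Hello World') // not journaled
```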
|
|
135
|
+
|
|
136
|
+
### `restateSessionStore`
|
|
137
|
+
|
|
138
|
+
A `SessionStore` backed by the bound virtual object's K/V state. Three slots per session — `session-data` (header), `session-turns` (history), `session-runs` (run records).
|
|
139
|
+
|
|
140
|
+
**When to use it.** Pick this store when **external** consumers (a TUI subscribed to the runtime, a web GUI, support tooling) need to read a session without re-entering the agent handler. Virtual-object state is exposed via the SDK's read APIs and the Restate UI — same data, different access path.
|
|
141
|
+
|
|
142
|
+
**When to skip it.** If you only care about durable execution within one invocation, an in-memory store is enough: the journaled `ctx.run` entries already replay the loop to byte-identical state after a crash. Turns rematerialize from the journal on every attempt.
|
|
143
|
+
|
|
144
|
+
Limitations baked into the contract:
|
|
145
|
+
- `list()` returns `[]` — virtual objects don't expose cross-key enumeration. Maintain a separate index handler if discovery matters.
|
|
146
|
+
- `delete()` only clears the current object's keys (no cross-key fan-out).
|
|
147
|
+
- Single-tenant per virtual-object key: every method ignores the `sessionId` argument because the object is already keyed. Calling `appendTurns('other-id', …)` writes to the CURRENT object. Bind one virtual-object key per session and the contract holds.
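
The single-tenant limitation can be illustrated with a toy store. This is a sketch only: `FakeObjectState` and `keyedStoreSketch` are hypothetical, the methods are synchronous where the real ones are async, `loadTurns` is an invented name, and turns are plain strings. The slot name `session-turns` and the `appendTurns` / `list` methods are taken from the text above.

```ts
// Stand-in for one virtual object's K/V state.
class FakeObjectState {
  private readonly slots = new Map<string, unknown>()
  get<T>(key: string): T | undefined {
    return this.slots.get(key) as T | undefined
  }
  set<T>(key: string, value: T): void {
    this.slots.set(key, value)
  }
}

// A store bound to a single object: the `sessionId` argument is ignored.
function keyedStoreSketch(state: FakeObjectState) {
  return {
    appendTurns(_sessionId: string, turns: string[]): void {
      const existing = state.get<string[]>('session-turns') ?? []
      state.set('session-turns', [...existing, ...turns])
    },
    loadTurns(_sessionId: string): string[] {
      return state.get<string[]>('session-turns') ?? []
    },
    // No cross-key enumeration is available, so discovery returns nothing.
    list(): string[] {
      return []
    },
  }
}

const objectState = new FakeObjectState() // state of the object keyed "session-a"
const keyedStore = keyedStoreSketch(objectState)
keyedStore.appendTurns('session-a', ['t1'])
keyedStore.appendTurns('other-id', ['t2']) // still lands in the current object
const turnsForA = keyedStore.loadTurns('session-a')
```

Here `turnsForA` contains both turns, which is why one virtual-object key per session is the safe arrangement.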
|
|
148
|
+
|
|
149
|
+
## Recommended `AgentBehavior` for Restate deployments
|
|
150
|
+
|
|
151
|
+
```ts
|
|
152
|
+
createAgent({
|
|
153
|
+
// ... wrappers as above
|
|
154
|
+
behavior: {
|
|
155
|
+
disableBackgroundTasks: true,
|
|
156
|
+
cache: true,
|
|
157
|
+
maxCostUsd: 5,
|
|
158
|
+
maxTotalTokens: 2_000_000,
|
|
159
|
+
},
|
|
160
|
+
})
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
- **`disableBackgroundTasks: true`** — background tasks (`shell` with `run_in_background: true`) aren't journaled and don't survive process death. Leaving them on creates orphaned processes on crash.
|
|
164
|
+
- **`cache: true`** (default) — prompt-cache breakpoints reward stable system/tool prefixes. Cuts cold-start cost on retries when the cache is still warm.
|
|
165
|
+
- **`maxCostUsd` / `maxTotalTokens`** — server-side budget circuit breaker. Throws `AgentBudgetExceededError` and finalizes the session run as `aborted`. Especially useful here because Restate's exactly-once + retry guarantees mean a runaway loop is a runaway *bill*; a soft cap stops it cleanly. See ARCHITECTURE.md → run-level budget circuit breaker.
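
The budget breaker in the last bullet can be pictured as a running total checked after each turn. The names below (`Budget`, `Usage`, `assertWithinBudget`, `runWithBudget`) are hypothetical, and the strict "greater than" comparison is an assumption; zidane's own accounting and its `AgentBudgetExceededError` class are not reproduced here.

```ts
interface Budget { maxCostUsd: number; maxTotalTokens: number }
interface Usage { costUsd: number; totalTokens: number }

// Throws once the accumulated usage passes either cap.
function assertWithinBudget(total: Usage, budget: Budget): void {
  if (total.costUsd > budget.maxCostUsd || total.totalTokens > budget.maxTotalTokens) {
    const err = new Error('budget exceeded')
    err.name = 'BudgetExceededSketch' // stand-in for the real error class
    throw err
  }
}

// Runs turns until they finish or the breaker trips, then reports the outcome.
function runWithBudget(
  turnUsages: Usage[],
  budget: Budget,
): { status: 'completed' | 'aborted'; turnsRun: number } {
  const total: Usage = { costUsd: 0, totalTokens: 0 }
  let turnsRun = 0
  try {
    for (const usage of turnUsages) {
      total.costUsd += usage.costUsd
      total.totalTokens += usage.totalTokens
      turnsRun += 1
      assertWithinBudget(total, budget)
    }
    return { status: 'completed', turnsRun }
  } catch (err) {
    if (err instanceof Error && err.name === 'BudgetExceededSketch') {
      return { status: 'aborted', turnsRun }
    }
    throw err
  }
}

const perTurn: Usage = { costUsd: 2, totalTokens: 100_000 }
const outcome = runWithBudget(
  [perTurn, perTurn, perTurn, perTurn],
  { maxCostUsd: 5, maxTotalTokens: 2_000_000 },
)
// The third turn pushes the total cost to 6 USD, past the 5 USD cap.
```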
|
|
166
|
+
|
|
167
|
+
MCP servers re-bootstrap on every handler invocation (`warmup()` is lazy + idempotent); their tool *results* are journaled by `restateTool` so the model sees the same outputs on replay.
|
|
168
|
+
|
|
169
|
+
## Failure modes
|
|
170
|
+
|
|
171
|
+
| Scenario | What happens |
|
|
172
|
+
|---|---|
|
|
173
|
+
| Crash between turn N and N+1 | Re-enter the handler. Journal replays turns 1..N from cache. Turn N+1 runs live. |
|
|
174
|
+
| Crash mid-tool-call | Re-enter. Journal replays prior turns + already-completed tools. The in-flight tool re-executes (Restate didn't see it complete). Tools that should not be re-run on retry need application-level idempotency (e.g. a `requestId` in the API call). |
|
|
175
|
+
| Provider rate limit / 5xx | `restateProvider`'s default `maxRetryAttempts: 3` retries inside the journal entry. Beyond that, surfaces as `AgentProviderError`. |
|
|
176
|
+
| Tool throws | `restateTool`'s default `maxRetryAttempts: 1` — no blind retry. The error is journaled; replay returns the same error. |
|
|
177
|
+
| Restate runtime down | Handler invocations queue / fail at the SDK transport layer; Zidane never sees them. Recovery is operational, not in-app. |
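
For the "crash mid-tool-call" row, application-level idempotency might look like the sketch below. `FakePaymentsApi`, `chargeTool`, and the way the `requestId` is derived are all assumptions made for illustration; the point is only that a tool body re-executed after a crash sends the same id and the downstream system deduplicates on it.

```ts
// Stand-in for an external API that deduplicates on a caller-supplied request id.
class FakePaymentsApi {
  private readonly seen = new Map<string, string>()
  charges = 0

  charge(requestId: string, amountCents: number): string {
    const existing = this.seen.get(requestId)
    if (existing !== undefined) return existing // duplicate: return the first receipt
    this.charges += 1
    const receipt = `receipt-${this.charges}-${amountCents}`
    this.seen.set(requestId, receipt)
    return receipt
  }
}

// Hypothetical tool body. The request id is built from inputs that are the same
// on every attempt (session key and tool-call id), never from a fresh random value.
function chargeTool(
  api: FakePaymentsApi,
  sessionKey: string,
  toolCallId: string,
  amountCents: number,
): string {
  const requestId = `${sessionKey}:${toolCallId}`
  return api.charge(requestId, amountCents)
}

const paymentsApi = new FakePaymentsApi()
// Attempt 1 reaches the API, then the process crashes before the journal entry completes.
const receiptBeforeCrash = chargeTool(paymentsApi, 'session-a', 'call-7', 1500)
// Attempt 2 re-executes the in-flight tool with the same inputs.
const receiptAfterCrash = chargeTool(paymentsApi, 'session-a', 'call-7', 1500)
```

Both attempts return the same receipt and `paymentsApi.charges` stays at 1.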
|
|
178
|
+
|
|
179
|
+
## Tests
|
|
180
|
+
|
|
181
|
+
`test/restate.test.ts` covers each wrapper individually, plus an end-to-end `agent.run()` with the clock and provider wrappers in place. A fake `RestateContextLike` records `ctx.run` calls and can be pre-seeded with a journal to verify replay semantics (inner fn not invoked, cached value returned verbatim).
|
|
182
|
+
|
|
183
|
+
The structural stand-in types live in `src/restate/types.ts`, so the adapter typechecks **without** `@restatedev/restate-sdk` installed, which is useful in monorepos that vend `zidane` upstream of Restate. Consumers wiring the real SDK pass `restate.Context` / `restate.ObjectContext` directly; they're structurally assignable to the in-house shapes.
|
|
184
|
+
|
|
185
|
+
## Related
|
|
186
|
+
|
|
187
|
+
- `examples/restate/agent.ts` — runnable end-to-end virtual object.
|
|
188
|
+
- `docs/ARCHITECTURE.md` — agent lifecycle, hook ordering, run-level budget breaker.
|
|
189
|
+
- `docs/SKILL.md` — `createAgent`, `Provider`, `ToolDef`, `SessionStore` interfaces this adapter wraps.
|
|
190
|
+
- [Restate docs](https://docs.restate.dev/) — runtime, SDK, deployment model.
|