npm - zidane - Versions diffs - 3.1.0 → 3.1.2 - Mend

zidane 3.1.0 → 3.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/README.md +110 -7
package/dist/{agent-O5SmWixc.d.ts → agent-C9q5VMGa.d.ts} +279 -3
package/dist/{chunk-PYLYK3K7.js → chunk-2AE3VM5O.js} +547 -35
package/dist/{chunk-R74LQKAM.js → chunk-7H34OFDA.js} +26 -0
package/dist/{chunk-FEAOZ5DB.js → chunk-BRMURQA2.js} +12 -3
package/dist/{chunk-SARMZCEL.js → chunk-BXO7CZHJ.js} +1 -1
package/dist/{chunk-2EQT4EHD.js → chunk-IUBBVF53.js} +4 -0
package/dist/{chunk-VF4A7HAC.js → chunk-YQ7LY6CL.js} +3 -0
package/dist/contexts.d.ts +3 -3
package/dist/contexts.js +1 -1
package/dist/index.d.ts +6 -6
package/dist/index.js +6 -6
package/dist/mcp.d.ts +2 -2
package/dist/mcp.js +1 -1
package/dist/presets.d.ts +2 -2
package/dist/presets.js +4 -4
package/dist/providers.d.ts +2 -2
package/dist/providers.js +2 -2
package/dist/{sandbox-CW72eLDP.d.ts → sandbox-CLghrTLi.d.ts} +1 -1
package/dist/session/sqlite.d.ts +2 -2
package/dist/session.d.ts +2 -2
package/dist/session.js +1 -1
package/dist/{skills-use-DJKUctM5.d.ts → skills-use-DU0unNP4.d.ts} +1 -1
package/dist/skills.d.ts +3 -3
package/dist/tools.d.ts +5 -5
package/dist/tools.js +3 -3
package/dist/{types-BpvTmawk.d.ts → types-vA1a_ZX7.d.ts} +11 -0
package/dist/types.d.ts +4 -4
package/dist/{validation-DE4g5Cez.d.ts → validation-BKA33eqb.d.ts} +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -15,10 +15,10 @@ Built to be embedded.
 A small, hookable core with sensible defaults so most consumers don't write a single hook. Built around three principles: **token discipline by default** (cache, dedup, compaction, byte-accounting), **self-healing on the fault paths** (auto-coerce args, hallucinated-tool fallback, error rewriting), and **provider parity** (server-side features on Anthropic, client-side equivalents everywhere else).
 - 🧠 **Multi-provider, multi-auth** — Anthropic, OpenAI Codex, OpenRouter, Cerebras, plus a generic `openaiCompat` factory (Baseten, Fireworks, Groq, local servers). OAuth + API key, auto-refreshing tokens. Anthropic accepts opt-in `extraBetas` and `contextManagement` for first-party features.
-- 🪝 **Streaming, hookable turn loop** — text/thinking deltas, tool calls, MCP, sessions, skills, spawn, OAuth, validation, budgets — all observable (and most mutatable) via typed hook events.
-- 🛠 **Tools first-class** — `shell`, `read_file`, `write_file`, `edit`, `multi_edit`, `glob`, `grep`, `spawn`, human-in-the-loop, plus any [MCP](https://modelcontextprotocol.io) server. Sequential or parallel, per-call gates (`tool:gate`), validation auto-coerce (`"true"` → `true`), hallucinated-tool fallback (`tool:unknown`), error rewriting (`tool:error` → `result`).
-- ✂️ **Token-aware ergonomics** — paginated reads with a "how to page" footer, 8 KB tail-truncated `shell`, idempotent `write_file`; `outputBytes` surfaced on every tool/MCP hook. `behavior.toolOutputBudget` injects a "summarize" nudge when a turn's outputs exceed the cap.
-- 🗜 **Context discipline** — auto-injected `cache_control` breakpoints (Anthropic + OpenRouter); server-side compaction via `context-management-2025-06-27` on Anthropic, `behavior.compactStrategy: 'tail'` on everyone else. Per-session `read_file` dedup + opt-in `requireReadBeforeEdit` guard kill stale-content edits.
+- 🪝 **Streaming, hookable turn loop** — text/thinking deltas, tool calls, MCP, sessions, skills, spawn, OAuth, validation, budgets — all observable (and most mutatable) via typed hook events. Per-request `system:transform` hook for runtime-derived prompt sections.
+- 🛠 **Tools first-class** — `shell`, `read_file`, `write_file`, `edit`, `multi_edit`, `glob`, `grep`, `spawn`, human-in-the-loop, plus any [MCP](https://modelcontextprotocol.io) server. Sequential or parallel, per-call gates (`tool:gate` with writable `block` / `result` / `runToolCounts`), validation auto-coerce (`"true"` → `true`), hallucinated-tool fallback (`tool:unknown`), error rewriting (`tool:error` → `result`).
+- ✂️ **Token-aware ergonomics** — paginated reads with a "how to page" footer, 8 KB tail-truncated `shell`, idempotent `write_file`; `outputBytes` surfaced on every tool/MCP hook. `behavior.toolOutputBudget` injects a "summarize" nudge when a turn's outputs exceed the cap; `behavior.toolBudgets` caps per-tool call counts (`'steer'` or `'block'`); `behavior.thinkingDecay` tapers reasoning budget per turn.
+- 🗜 **Context discipline** — auto-injected `cache_control` breakpoints (Anthropic + OpenRouter); server-side compaction via `context-management-2025-06-27` on Anthropic, `behavior.compactStrategy: 'tail'` on everyone else. Per-session `read_file` dedup + opt-in `requireReadBeforeEdit` guard kill stale-content edits; `behavior.dedupTools` generalizes the same pattern to arbitrary tools (`todowrite`, `execute_sql`, …).
 - 🎯 **Reasoning + structured output** — thinking levels (`off` / `minimal` / `low` / `medium` / `high` / `adaptive`) with optional exact budgets; force the final answer to a JSON Schema (Zod v4 interop), no brittle parsing.
 - 💾 **Sessions, skills, multimodal** — pluggable session stores (memory / SQLite / remote / file-map), incremental persistence; [Agent Skills](https://agentskills.io/specification) spec-aligned with `allowed-tools` enforcement + resume rehydration; images + documents via `PromptPart[]`, tools can return image blocks routed natively on vision providers or via companion messages elsewhere.
 - 🧵 **Sub-agents + execution contexts** — delegate to child agents with inherited or overridden preset (child events bubble to the parent); run tools in-process, Docker, or any `SandboxProvider` (E2B / Rivet / custom). Parallel MCP bootstrap with `agent.warmup()` + `eager: true` to hide cold starts.
@@ -67,10 +67,13 @@ createAgent({
     maxTurns: 50,                    // max loop iterations
     maxTokens: 16384,                // max tokens per LLM response
     thinkingBudget: 10240,           // exact thinking token budget
+    thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 }, // taper budget per run-relative turn
     cache: true,                     // prompt-cache breakpoints on supported providers (default: true)
     toolOutputBudget: 32768,         // soft per-turn cap on tool-output bytes (off by default)
     dedupReads: true,                // dedup identical re-reads of the same file in `read_file` (default: true)
+    dedupTools: { todowrite: i => JSON.stringify(i.todos) }, // generic per-tool argument dedup
     requireReadBeforeEdit: false,    // refuse `edit` / `multi_edit` against unread or stale files (default: false)
+    toolBudgets: { todowrite: { max: 6, onExceed: 'steer' } }, // per-tool soft call caps
     compactStrategy: 'off',          // client-side tail compaction for non-Anthropic providers — 'off' | 'tail' (default: 'off')
     compactThreshold: 131_072,       // bytes threshold that triggers tail compaction (default: 128 KiB)
     compactKeepTurns: 4,             // trailing turns left intact during compaction (default: 4)
@@ -162,6 +165,18 @@ Fallback: `params.apiKey` > `params.access` > `ANTHROPIC_API_KEY` env > `.creden
 `extraBetas` are merged with the OAuth defaults (`claude-code-20250219`, `oauth-2025-04-20`) and de-duped. `contextManagement` is sent on the request body as `context_management`; pair it with the `context-management-2025-06-27` beta. For non-Anthropic providers, see `behavior.compactStrategy: 'tail'` for the client-side fallback.
+`extraBodyParams` is a generic forward-compat pass-through for un-typed Messages API fields. Spread into the request before the typed core, so explicit factory options always win on collision. Use it when Anthropic ships a new beta before zidane has a dedicated knob:
+```ts
+anthropic({
+  apiKey: '...',
+  extraBetas: ['some-future-beta'],
+  extraBodyParams: { future_field: { /* ... */ } },
+})
+```
+`openaiCompat` accepts the same `extraBodyParams` for OpenAI-style endpoints (e.g. `reasoning_effort`, `metadata`, OpenRouter `provider` routing).
 ### OpenRouter
 ```ts
@@ -367,15 +382,19 @@ All tool hooks include `turnId` and `callId` for correlation. Typed via `ToolHoo
 ```ts
 agent.hooks.hook('tool:gate', (ctx) => {
-  // ctx.turnId, ctx.callId, ctx.name, ctx.input
+  // ctx.turnId, ctx.callId, ctx.name, ctx.input, ctx.runToolCounts
   if (ctx.name === 'shell' && String(ctx.input.command).includes('rm -rf')) {
     ctx.block = true
     ctx.reason = 'dangerous command'
   }
+  // Substitute a successful result without running the tool — mirrors
+  // tool:unknown / tool:error. When both are set, `block` wins.
+  if (ctx.name === 'todowrite' && (ctx.runToolCounts.todowrite ?? 0) > 0)
+    ctx.result = 'Already recorded; no-op.'
 })
-agent.hooks.hook('tool:before', (ctx) => { /* ctx.turnId, ctx.callId, ctx.name, ctx.input, ctx.coercions? */ })
-agent.hooks.hook('tool:after', (ctx) => { /* + ctx.result, ctx.outputBytes, ctx.coercions? */ })
+agent.hooks.hook('tool:before', (ctx) => { /* ctx.turnId, ctx.callId, ctx.name, ctx.input, ctx.runToolCounts, ctx.coercions? */ })
+agent.hooks.hook('tool:after', (ctx) => { /* + ctx.result, ctx.outputBytes, ctx.runToolCounts, ctx.coercions? */ })
 agent.hooks.hook('tool:error', (ctx) => {
   // + ctx.error. Mutate ctx.result to substitute the payload sent back to the
   // model in place of the default `Tool error: <msg>` — useful for OSS-model
@@ -434,6 +453,20 @@ agent.hooks.hook('context:transform', (ctx) => {
 })
 ```
+### System transform
+Mutate the system prompt per request — useful for runtime-derived sections (files already read in the session, live tool budgets, skill activation reminders). Fires after `context:transform`, before the request goes out. `messages` is read-only here.
+```ts
+agent.hooks.hook('system:transform', (ctx) => {
+  // ctx.system, ctx.messages (readonly), ctx.turn, ctx.turnId, ctx.session?
+  if (ctx.session && ctx.turn > 1)
+    ctx.system += `\n\n## Reminder: keep responses concise after turn ${ctx.turn}.`
+})
+```
+Cache breakpoints land naturally inside the provider after this hook, so repeated turns with the same derived system text still hit the cache.
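Reviewer sketch (not part of the package): the cache note above only helps if the derived system text is byte-identical across turns. One way to keep it stable is to build the section deterministically. The helper name `withReadFiles` and the "Files already read" heading are hypothetical, chosen only for illustration.

```typescript
// Hypothetical helper: append a deterministic "files already read" section.
// Sorting and de-duplicating means the same set of paths always produces the
// same system text, regardless of the order in which the files were read.
function withReadFiles(system: string, readPaths: Iterable<string>): string {
  const unique = Array.from(new Set(readPaths)).sort()
  if (unique.length === 0)
    return system
  const lines = unique.map(p => `- ${p}`).join('\n')
  return `${system}\n\n## Files already read\n${lines}`
}

const turnA = withReadFiles('You are a coding agent.', ['b.ts', 'a.ts', 'b.ts'])
const turnB = withReadFiles('You are a coding agent.', ['a.ts', 'b.ts'])
```

Inside a `system:transform` handler this could be applied as `ctx.system = withReadFiles(ctx.system, paths)`, where `paths` comes from whatever read tracking the consumer keeps (an assumption; the diff does not show how read files are exposed).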
 ### Hook recipes
 Three patterns that don't have a built-in default. Copy-paste and tune.
@@ -484,6 +517,12 @@ const agent = createAgent({
 agent.hooks.hook('budget:exceeded', (ctx) => {
   console.warn(`turn ${ctx.turn}: ${ctx.bytes} > ${ctx.budget} bytes`)
 })
+agent.hooks.hook('tool-budget:exceeded', (ctx) => {
+  // Per-tool counterpart, fires when `behavior.toolBudgets[ctx.tool]` trips.
+  // ctx.tool, ctx.count, ctx.max, ctx.turnId, ctx.mode ('steer' | 'block')
+  console.warn(`tool ${ctx.tool} hit cap (${ctx.count}/${ctx.max}, mode=${ctx.mode})`)
+})
 ```
 ### Client-side context compaction (non-Anthropic)
@@ -510,6 +549,70 @@ Anthropic users should prefer the server-side `context-management-2025-06-27` be
 `behavior.requireReadBeforeEdit` (off by default) — `edit` and `multi_edit` reject when the file hasn't been read in the session, or when its on-disk content has drifted since the last read. Eliminates the silent-corruption case where a model edits against bytes it "remembers" but no longer reflect reality. Recommended on for stricter eval-grade runs.
+### Generic per-tool dedup
+`behavior.dedupTools` extends the read-file pattern to arbitrary tools. Provide a hasher per tool keyed by canonical name; identical inputs replay the prior result without re-running the tool. Requires a session.
+The hasher contract has **three return values, three meanings** — pick deliberately:
+| Return | Meaning |
+|---|---|
+| non-empty string | Cache key for this call. Equal keys replay the prior result. |
+| `undefined` | **Skip dedup for this call.** Tool runs normally; nothing recorded. |
+| `''` or non-string | Treated as `undefined` (defensive). |
+```ts
+behavior: {
+  dedupTools: {
+    // Always cache by full input — every identical re-call dedups.
+    todowrite: input => JSON.stringify(input),
+    // Cache by a normalized subset; non-cacheable shapes opt out via `undefined`.
+    execute_sql: (input) => {
+      const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined
+      if (!q || q.includes('now()') || q.includes('random()')) return undefined
+      return q
+    },
+  },
+}
+```
+The `undefined` opt-out is **not** the same as `JSON.stringify(input)` — that would dedup against the verbatim input. Use `undefined` to mean "this specific call is not cacheable" (timestamps baked in, randomness, debug flags).
+Tools with side effects or non-deterministic outputs (network, time, randomness) **must not** be listed — there is no safety net beyond the consumer's hasher. For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).
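Reviewer sketch (not part of the package): the hasher contract can be illustrated with a small stand-alone model. This is not zidane's implementation. `dedupCall`, `LastCall`, and the `Map`-based state are hypothetical names, and the "only the most recent key per tool is kept" behaviour follows the cache policy described in the `dedupTools` type comment.

```typescript
type Hasher = (input: Record<string, unknown>) => string | undefined

// Hypothetical per-session state: only the latest (key, result) pair per tool.
interface LastCall { key: string, result: string }

function dedupCall(
  cache: Map<string, LastCall>,
  hashers: Partial<Record<string, Hasher>>,
  tool: string,
  input: Record<string, unknown>,
  run: (input: Record<string, unknown>) => string,
): { result: string, replayed: boolean } {
  const raw: unknown = hashers[tool]?.(input)
  // `undefined`, `''`, and non-strings all mean "skip dedup for this call".
  const key = typeof raw === 'string' && raw.length > 0 ? raw : undefined
  if (key === undefined)
    return { result: run(input), replayed: false }

  const last = cache.get(tool)
  if (last !== undefined && last.key === key)
    return { result: last.result, replayed: true }

  const result = run(input)
  cache.set(tool, { key, result }) // overwrites the previous entry for this tool
  return { result, replayed: false }
}

const cache = new Map<string, LastCall>()
const hashers: Record<string, Hasher> = {
  todowrite: input => JSON.stringify(input),
}
let runs = 0
const run = (_input: Record<string, unknown>): string => `ok #${++runs}`

const first = dedupCall(cache, hashers, 'todowrite', { todos: ['a'] }, run)
const second = dedupCall(cache, hashers, 'todowrite', { todos: ['a'] }, run) // replayed
const third = dedupCall(cache, hashers, 'todowrite', { todos: ['b'] }, run)
const fourth = dedupCall(cache, hashers, 'todowrite', { todos: ['a'] }, run) // miss: 'b' overwrote 'a'
```

The last call shows the interleaving limitation: an A, B, A sequence runs the tool three times because only one entry per tool is retained.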
+### Per-tool call budgets
+`behavior.toolBudgets` caps per-tool calls per run. Two reactions:
+- `'steer'` — let the call run, but emit a synthetic user message after the turn nudging the model to commit and finish. Fires once per tool per run.
+- `'block'` — refuse subsequent calls with `Blocked: <reason>`.
+```ts
+behavior: {
+  toolBudgets: {
+    todowrite: { max: 6, onExceed: 'steer' },
+    execute_sql: { max: 3, onExceed: 'block' },
+  },
+}
+```
+Pass a function for custom messages: `onExceed: ctx => ({ mode: 'steer', message: '...' })`. Subscribe to `tool-budget:exceeded` for telemetry. Counts include dedup hits — by design, since both eat against agent-loop sanity.
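Reviewer sketch (not part of the package): a minimal model of the two reactions, assuming the counting rule from the `toolBudgets` type comment (the budget trips on the first call after the count reaches `max`, and blocked calls are not counted). `checkBudget`, `Verdict`, and both message strings are hypothetical.

```typescript
type Mode = 'steer' | 'block'
interface Budget { max: number, onExceed?: Mode }
type Verdict =
  | { action: 'run', steer?: string }
  | { action: 'refuse', result: string }

// Hypothetical gate. `counts` and `steered` are per-run state.
function checkBudget(
  budgets: Partial<Record<string, Budget>>,
  counts: Map<string, number>,
  steered: Set<string>,
  tool: string,
): Verdict {
  const budget = budgets[tool]
  const count = counts.get(tool) ?? 0
  if (budget === undefined || count < budget.max) {
    counts.set(tool, count + 1)
    return { action: 'run' }
  }
  if ((budget.onExceed ?? 'steer') === 'block') {
    // Refused calls are not dispatched, so the counter stays where it is.
    return { action: 'refuse', result: `Blocked: ${tool} exceeded ${budget.max} calls` }
  }
  counts.set(tool, count + 1) // steer mode still lets the call run
  if (steered.has(tool))
    return { action: 'run' } // nudge only once per tool per run
  steered.add(tool)
  return { action: 'run', steer: `You have called ${tool} ${budget.max} times; commit and finish.` }
}

const budgets: Record<string, Budget> = {
  todowrite: { max: 2, onExceed: 'steer' },
  execute_sql: { max: 3, onExceed: 'block' },
}
const counts = new Map<string, number>()
const steered = new Set<string>()

const sqlActions = [1, 2, 3, 4].map(() => checkBudget(budgets, counts, steered, 'execute_sql').action)
const todoSteers = [1, 2, 3, 4]
  .map(() => checkBudget(budgets, counts, steered, 'todowrite'))
  .map(v => v.action === 'run' && v.steer !== undefined)
```

With these budgets the fourth `execute_sql` call is refused, while `todowrite` keeps running and produces a single nudge on its third call.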
+### Adaptive thinking budget
+`behavior.thinkingDecay` tapers the thinking budget across turns. Late turns are usually checkpoint / cleanup work where reasoning rarely pays for itself.
+```ts
+behavior: {
+  thinkingBudget: 8192,
+  thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 },
+  // turn 1-5 → 8192, turn 6 → 4096, turn 7 → 2048, turn 8+ → 1024
+}
+```
+Pass a function for arbitrary curves: `thinkingDecay: (turn, base) => base / Math.sqrt(turn)`. No-op when `thinkingBudget` is unset. Honored by every provider that respects `thinkingBudget`.
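Reviewer sketch (not part of the package): the struct form can be reproduced with a few lines of arithmetic. `decayedBudget` is a hypothetical name, and the rounding (`Math.floor`) and the cap at the base budget are assumptions; the diff only documents the resulting values for the example configuration.

```typescript
interface Decay { afterTurn: number, factor: number, floor: number }

// Geometric taper: full budget through `afterTurn`, then multiply by `factor`
// once per additional turn, never dropping below `floor`.
function decayedBudget(base: number, turn: number, decay: Decay): number {
  const steps = Math.max(0, turn - decay.afterTurn)
  const tapered = Math.floor(base * Math.pow(decay.factor, steps))
  // Assumption: the floor never raises the budget above the configured base.
  return Math.min(base, Math.max(decay.floor, tapered))
}

const decay: Decay = { afterTurn: 5, factor: 0.5, floor: 1024 }
const budgets = [1, 5, 6, 7, 8, 12].map(turn => decayedBudget(8192, turn, decay))
// budgets → [8192, 8192, 4096, 2048, 1024, 1024]
```

The values match the schedule in the README comment above: turns 1-5 keep 8192, then 4096, 2048, and 1024 from turn 8 onward.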
 ## Steering and Follow-up
 ### Steering

package/dist/{agent-O5SmWixc.d.ts → agent-C9q5VMGa.d.ts} RENAMED

@@ -1,5 +1,5 @@
 import { Hookable } from 'hookable';
-import { b as ExecutionContext, c as ExecutionHandle } from './types-BpvTmawk.js';
+import { b as ExecutionContext, c as ExecutionHandle } from './types-vA1a_ZX7.js';
 import { Client } from '@modelcontextprotocol/sdk/client/index.js';
 /**
@@ -229,6 +229,126 @@ interface AgentBehavior {
      * Default: `true`.
      */
     dedupReads?: boolean;
+    /**
+     * Taper the thinking budget over the course of a run. Late turns are
+     * usually checkpoint / cleanup work where reasoning rarely pays for
+     * itself; early turns benefit most. Two forms:
+     *
+     * - **Struct** — geometric decay starting after `afterTurn`, multiplying by
+     *   `factor` each subsequent turn, clamped to `floor`. Example
+     *   `{ afterTurn: 5, factor: 0.5, floor: 1024 }` with a base budget of 8192:
+     *   turns 1-5 = 8192, turn 6 = 4096, turn 7 = 2048, turn 8+ = 1024.
+     * - **Function** — `(runTurn, baseBudget) => number`. Arbitrary curves;
+     *   `runTurn` is 1-indexed, run-relative (resumed sessions reset).
+     *
+     * No-op when `thinkingBudget` is unset. Honored by every provider that
+     * respects `thinkingBudget` (anthropic legacy enabled+budget path,
+     * adaptive `maxTokensCap`, openai-compat `max_tokens` padding).
+     *
+     * Default: `undefined` (no decay).
+     */
+    thinkingDecay?: {
+        afterTurn: number;
+        factor: number;
+        floor: number;
+    } | ((runTurn: number, baseBudget: number) => number);
+    /**
+     * Per-tool soft call budget for this run. Keyed by **canonical** tool name.
+     * On the first call after the run-cumulative dispatched count for that tool
+     * reaches `max`, the framework fires `onExceed`:
+     *
+     * - `'steer'` (default) — let the call execute, but emit a synthetic user
+     *   message after the turn that nudges the model away from re-calling the
+     *   tool. Reuses the existing post-turn steer pathway used by
+     *   `toolOutputBudget`. Fires `tool-budget:exceeded` with `mode: 'steer'`.
+     * - `'block'` — refuse the call via `tool:gate` `block`. The model sees a
+     *   `Blocked: <reason>` tool result. Fires `tool-budget:exceeded` with
+     *   `mode: 'block'`.
+     * - **Function** — `(ctx) => { mode, message }`. The consumer supplies the
+     *   steering / refusal text and chooses the mode dynamically.
+     *
+     * Counts include both real dispatches and dedup substitutes (Z19 hits).
+     * Excludes calls already blocked by an earlier gate (skill allow-list,
+     * consumer hook). Tool dispatched by spawned subagents has its own per-run
+     * counter — child counts never charge the parent.
+     *
+     * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).
+     *
+     * Atomic in parallel mode: the middleware tracks its own per-tool
+     * approval counter, incremented synchronously at gate-time. A
+     * 4-call parallel batch against `max: 2` will let the first 2 through
+     * and refuse the rest, even though the loop's `runToolCounts` only
+     * propagates between calls (not within a single batch's gate fan-out).
+     *
+     * Default: `undefined` (no budget enforcement).
+     */
+    toolBudgets?: Record<string, {
+        max: number;
+        onExceed?: 'steer' | 'block' | ((ctx: {
+            tool: string;
+            count: number;
+            max: number;
+        }) => {
+            mode: 'steer' | 'block';
+            message: string;
+        });
+    }>;
+    /**
+     * Generic per-tool argument deduplication. Keyed by the tool's **canonical**
+     * name (alias-stable). Each entry is a hasher: `(input) => string | undefined`.
+     *
+     * **Hasher contract** — three return values, three meanings:
+     *
+     * | Return                  | Meaning                                                                |
+     * |-------------------------|------------------------------------------------------------------------|
+     * | a non-empty string      | Cache key for this call. Equal keys (most-recent-only, this session)   |
+     * |                         | replay the prior recorded result without re-dispatching the tool.      |
+     * | `undefined`             | **Skip dedup for this call.** The tool runs normally; nothing recorded.|
+     * | `''` / non-string       | Treated identically to `undefined` (defensive: no dedup, no error).    |
+     *
+     * The `undefined` opt-out is the way to say *"this specific call is not
+     * cacheable"* (timestamps in input, randomness baked in, debug flags). It
+     * is **not** the same as `JSON.stringify(input)` — that would dedup against
+     * the verbatim input. Pick one explicitly:
+     *
+     * ```ts
+     * // Always cache by full input — every identical re-call dedups.
+     * dedupTools: { todowrite: input => JSON.stringify(input) }
+     *
+     * // Cache by a normalized subset; non-cacheable shapes opt out.
+     * dedupTools: {
+     *   execute_sql: (input) => {
+     *     const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined
+     *     if (!q || q.includes('now()') || q.includes('random()')) return undefined
+     *     return q
+     *   },
+     * }
+     * ```
+     *
+     * On a hit, the previously-recorded result is replayed as the tool_result
+     * without dispatching the tool. The substitution flows through `tool:gate`
+     * `result` (Z20), so `tool:after` and `tool:transform` still fire.
+     *
+     * Requires a session (`createSession()`); without one, the map is a silent
+     * no-op since per-session state has nowhere to live. Tools with side
+     * effects or non-deterministic outputs (network, time, randomness) MUST
+     * NOT be listed — there is no safety net beyond the consumer's hasher.
+     *
+     * For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).
+     * Parallel mode (`toolExecution: 'parallel'`, the default) sees calls in
+     * the SAME assistant turn race against each other — none can dedup against
+     * a sibling that started in the same batch. Sequential mode honors order
+     * within a turn.
+     *
+     * **Cache policy**: only the most recent `(hash, result)` per tool is
+     * retained. Interleaved patterns (input A, input B, input A) miss on the
+     * second A because B overwrote it. Sufficient for the common spam-the-
+     * same-call loop; consumers needing a richer cache should hook
+     * `tool:gate` directly.
+     *
+     * Default: `undefined` (no per-tool dedup).
+     */
+    dedupTools?: Record<string, (input: Record<string, unknown>) => string | undefined>;
     /**
      * Require `read_file` before `edit` / `multi_edit` on the same path, and
      * reject edits when the file has changed on disk since the last read in
@@ -451,8 +571,24 @@ interface AgentRunOptions {
      */
     depth?: number;
 }
-/** Reason the provider gave for stopping the turn */
-type TurnFinishReason = 'stop' | 'tool-calls' | 'length' | 'content-filter' | 'error' | 'other';
+/**
+ * Reason the provider gave for stopping the turn.
+ *
+ * - `'stop'` — natural turn end (`end_turn` / `stop_sequence`).
+ * - `'tool-calls'` — model emitted tool_use blocks.
+ * - `'length'` — `max_tokens` reached, or (Anthropic 4.6+) the response bumped
+ *   against the model's context window mid-stream
+ *   (`model_context_window_exceeded`). The partial response is preserved; the
+ *   loop emits this reason so consumers can prune/retry.
+ * - `'content-filter'` — model refused.
+ * - `'pause'` — Anthropic `pause_turn`: a server-side mid-turn pause for very
+ *   long thinking. The loop continues with a synthetic "Please continue."
+ *   user message rather than terminating; consumers see the pause via this
+ *   finish reason on the prior assistant turn.
+ * - `'error'` — provider classified the turn as failed.
+ * - `'other'` — unknown / unmapped.
+ */
+type TurnFinishReason = 'stop' | 'tool-calls' | 'length' | 'content-filter' | 'pause' | 'error' | 'other';
 interface TurnUsage {
     input: number;
     output: number;
@@ -658,6 +794,18 @@ interface AnthropicParams {
      * ```
      */
     contextManagement?: AnthropicContextManagement;
+    /**
+     * Generic pass-through for fields on the Messages API request body that the
+     * SDK does not yet type. Spread into the request before the typed fields,
+     * so explicit options ({@link AnthropicParams.contextManagement} and the
+     * built-in fields like `model` / `tools` / `messages`) win on collision.
+     *
+     * Forward-compat escape hatch for new Anthropic betas — when a future flag
+     * ships before zidane has a dedicated typed knob, set it here without
+     * waiting on a release. Most fields will still need the matching beta in
+     * {@link AnthropicParams.extraBetas}.
+     */
+    extraBodyParams?: Record<string, unknown>;
 }
 declare function anthropic(anthropicParams?: AnthropicParams): Provider;
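Reviewer sketch (not part of the package): "spread into the request before the typed fields" maps onto ordinary object-spread precedence, where later keys overwrite earlier ones. `buildBody` and the field values are hypothetical; the actual request construction is not shown in this diff.

```typescript
// Hypothetical request builder: untyped extras first, typed fields last,
// so a typed field always wins when both define the same key.
function buildBody(
  typed: Record<string, unknown>,
  extraBodyParams: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...extraBodyParams, ...typed }
}

const body = buildBody(
  { model: 'typed-model', max_tokens: 1024 },
  { model: 'ignored-by-collision', future_field: { enabled: true } },
)
```

Here `body['model']` stays `'typed-model'`, while `future_field` passes through untouched.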
@@ -780,6 +928,17 @@ interface OpenAICompatParams {
      * Default: `false`. The `openrouter` wrapper sets this to `true`.
      */
     cacheBreakpoints?: boolean;
+    /**
+     * Generic pass-through for fields on the Chat Completions request body that
+     * zidane does not yet type. Spread into the request before the typed core
+     * (model / messages / tools / max_tokens / stream / tool_choice), so
+     * explicit options always win on collision.
+     *
+     * Forward-compat escape hatch for endpoints that ship one-off fields ahead
+     * of zidane (e.g. OpenAI `reasoning_effort`, OpenRouter `provider` routing,
+     * vendor-specific `safety_level` knobs).
+     */
+    extraBodyParams?: Record<string, unknown>;
 }
 /**
  * Factory for any OpenAI-compatible HTTP endpoint.
@@ -1541,11 +1700,34 @@ interface AgentHooks {
         turnId: string;
         options: StreamOptions;
     }) => void;
+    /**
+     * Fires after each assistant turn (before its tool-result follow-up
+     * dispatches; the loop iterates back to a fresh `turn:before` once the
+     * tool results are produced).
+     *
+     * `toolCounts.turn` — calls **emitted** by the model in this assistant
+     * turn, keyed by canonical tool name. Reflects what the model asked for,
+     * regardless of downstream gate outcome. Most useful for spotting per-turn
+     * spikes ("the model called todowrite 4 times in one turn").
+     *
+     * `toolCounts.run` — cumulative running counter of **dispatched** calls
+     * scoped to this `runId`, captured at fire time. Excludes calls that were
+     * `block`ed by `tool:gate` handlers. Includes calls short-circuited via
+     * `tool:gate` `result` substitution (the model still asked, the framework
+     * just answered without the tool running). Resumed sessions start a fresh
+     * run with empty counts.
+     *
+     * Both fields are frozen snapshots; mutate-safe.
+     */
     'turn:after': (ctx: {
         turn: number;
         turnId: string;
         usage: TurnUsage;
         message: SessionTurn;
+        toolCounts: {
+            turn: Readonly<Record<string, number>>;
+            run: Readonly<Record<string, number>>;
+        };
     }) => void;
     'stream:text': (ctx: StreamHookContext & {
         delta: string;
@@ -1559,17 +1741,53 @@ interface AgentHooks {
         thinking: string;
     }) => void;
     'oauth:refresh': (ctx: OAuthRefreshHookContext) => void;
+    /**
+     * Fires before validation, `tool:before`, and `execute`. Two ways to
+     * intercept:
+     *
+     * - Set `block = true` (with a `reason`) to refuse the call. The model
+     *   sees a `Blocked: <reason>` tool result; `tool:before` / `tool:after`
+     *   do **not** fire.
+     * - Set `result` to substitute a successful tool_result and skip
+     *   execution. The model sees the substitute as a normal tool_result;
+     *   `tool:before` does not fire, but `tool:after` and `tool:transform`
+     *   do — so byte budgets, telemetry, and post-mutation hooks see the
+     *   substitute. Useful for cache hits, dedup, idempotency guards,
+     *   plan-mode synthetic acks.
+     *
+     * If multiple handlers along the chain set both `block` and `result`,
+     * `block` wins — refusal beats substitution, so a policy gate
+     * (skills allow-list, custom security) can always override an upstream
+     * consumer's cache substitute. Mirrors the writable-`result` shape on
+     * `tool:unknown` and `tool:error` so consumers learn one pattern.
+     *
+     * `runToolCounts` — frozen pre-call snapshot of per-tool dispatched
+     * counts in this run. Use it to self-throttle, drive observability, or
+     * implement budget guards. Counts every call that passed gate, including
+     * dedup substitutes (Z19); excludes `block`ed calls.
+     *
+     * **Parallel mode** (`toolExecution: 'parallel'`, the default): the
+     * snapshot is taken before any dispatches in the batch, so consumer
+     * hooks reading `runToolCounts` see the pre-batch view. Built-in
+     * budget / dedup middleware uses internal per-call reservation, so
+     * `behavior.toolBudgets` enforces atomically even within a parallel
+     * batch.
+     */
     'tool:gate': (ctx: ToolHookContext & {
         block: boolean;
         reason: string;
+        result?: string | ToolResultContent[];
+        runToolCounts: Readonly<Record<string, number>>;
     }) => void;
     'tool:before': (ctx: ToolHookContext & {
         coercions?: readonly string[];
+        runToolCounts: Readonly<Record<string, number>>;
     }) => void;
     'tool:after': (ctx: ToolHookContext & {
         result: string | ToolResultContent[];
         outputBytes: number;
         coercions?: readonly string[];
+        runToolCounts: Readonly<Record<string, number>>;
     }) => void;
     /**
      * Fires when a tool throws during execution. Mutate `result` to substitute a
@@ -1630,6 +1848,27 @@ interface AgentHooks {
     'context:transform': (ctx: {
         messages: SessionMessage[];
     }) => void;
+    /**
+     * Fires per request, after `context:transform` and before the request goes
+     * out. Mutating `ctx.system` updates the system prompt the provider sends
+     * for this turn — useful for runtime-derived sections (e.g. listing files
+     * already read in the session, surfacing live tool budgets, injecting
+     * skill activation reminders).
+     *
+     * Cache breakpoints are applied inside the provider after this hook, so
+     * mutations land in the cache key naturally — repeated turns with the
+     * same derived system text still hit the cache.
+     *
+     * `messages` is read-only here; use `context:transform` for message
+     * surgery. `session` is `undefined` when the run is sessionless.
+     */
+    'system:transform': (ctx: {
+        system: string;
+        messages: readonly SessionMessage[];
+        turn: number;
+        turnId: string;
+        session?: Session;
+    }) => void;
     'steer:inject': (ctx: {
         message: string;
     }) => void;
@@ -1657,6 +1896,7 @@ interface AgentHooks {
     }) => void;
     'child:tool:before': (ctx: ToolHookContext & {
         coercions?: readonly string[];
+        runToolCounts: Readonly<Record<string, number>>;
         childId: string;
         depth: number;
     }) => void;
@@ -1664,6 +1904,7 @@ interface AgentHooks {
         result: string | ToolResultContent[];
         outputBytes: number;
         coercions?: readonly string[];
+        runToolCounts: Readonly<Record<string, number>>;
         childId: string;
         depth: number;
     }) => void;
@@ -1677,6 +1918,10 @@ interface AgentHooks {
         turnId: string;
         usage: TurnUsage;
         message: SessionTurn;
+        toolCounts: {
+            turn: Readonly<Record<string, number>>;
+            run: Readonly<Record<string, number>>;
+        };
         childId: string;
         depth: number;
     }) => void;
@@ -1716,9 +1961,22 @@ interface AgentHooks {
         ok: false;
         error: Error;
     })) => void;
+    /**
+     * MCP-side counterpart of `tool:gate`. Same shape: set `block` to refuse,
+     * set `result` to substitute a successful payload and skip the upstream
+     * MCP `callTool`. When both are set across the handler chain, `block` wins.
+     *
+     * Fires INSIDE the MCP wrapper's `execute`, after the loop's `tool:gate`
+     * already ran. Does **not** carry `runToolCounts` — those are loop-level
+     * and already exposed on `tool:gate` for MCP tools (which are registered
+     * as agent tools under their namespaced name `mcp_<server>_<tool>`). Use
+     * `tool:gate` for budget / dedup logic; reserve `mcp:tool:gate` for
+     * MCP-specific concerns (per-server routing, transport-aware refusals).
+     */
     'mcp:tool:gate': (ctx: McpToolHookContext & {
         block: boolean;
         reason: string;
+        result?: string | ToolResultContent[];
     }) => void;
     'mcp:tool:before': (ctx: McpToolHookContext) => void;
     'mcp:tool:after': (ctx: McpToolHookContext & {
@@ -1769,6 +2027,24 @@ interface AgentHooks {
         bytes: number;
         budget: number;
     }) => void;
+    /**
+     * Fires when a per-tool budget configured via `behavior.toolBudgets` is
+     * exceeded for a specific tool. `mode` reflects how the framework reacted:
+     * `'steer'` lets the call run and queues a post-turn nudge; `'block'`
+     * refuses the call outright with `Blocked: <message>`.
+     *
+     * `count` is the run-cumulative dispatched count just before this call.
+     * Use `turnId` to correlate with `turn:after` if you need the integer turn
+     * index. Distinct from `budget:exceeded` (byte-level) so consumers can
+     * subscribe specifically; both can fire in the same turn.
+     */
+    'tool-budget:exceeded': (ctx: {
+        tool: string;
+        count: number;
+        max: number;
+        turnId: string;
+        mode: 'steer' | 'block';
+    }) => void;
     'agent:abort': (ctx: object) => void;
     'agent:done': (ctx: AgentStats) => void;
     'session:start': (ctx: SessionHookContext & {