zidane 4.0.2 → 4.1.3

Files changed (77)
  1. package/README.md +196 -614
  2. package/dist/agent-BoV5Twdl.d.ts +2347 -0
  3. package/dist/agent-BoV5Twdl.d.ts.map +1 -0
  4. package/dist/contexts-3Arvn7yR.js +321 -0
  5. package/dist/contexts-3Arvn7yR.js.map +1 -0
  6. package/dist/contexts.d.ts +2 -25
  7. package/dist/contexts.js +2 -10
  8. package/dist/errors-D1lhd6mX.js +118 -0
  9. package/dist/errors-D1lhd6mX.js.map +1 -0
  10. package/dist/index-28otmfLX.d.ts +400 -0
  11. package/dist/index-28otmfLX.d.ts.map +1 -0
  12. package/dist/index-BfSdALzk.d.ts +113 -0
  13. package/dist/index-BfSdALzk.d.ts.map +1 -0
  14. package/dist/index-DPsd0qwm.d.ts +254 -0
  15. package/dist/index-DPsd0qwm.d.ts.map +1 -0
  16. package/dist/index.d.ts +5 -95
  17. package/dist/index.js +141 -271
  18. package/dist/index.js.map +1 -0
  19. package/dist/interpolate-CukJwP2G.js +887 -0
  20. package/dist/interpolate-CukJwP2G.js.map +1 -0
  21. package/dist/mcp-8wClKY-3.js +771 -0
  22. package/dist/mcp-8wClKY-3.js.map +1 -0
  23. package/dist/mcp.d.ts +2 -4
  24. package/dist/mcp.js +2 -13
  25. package/dist/messages-z5Pq20p7.js +1020 -0
  26. package/dist/messages-z5Pq20p7.js.map +1 -0
  27. package/dist/presets-Cs7_CsMk.js +39 -0
  28. package/dist/presets-Cs7_CsMk.js.map +1 -0
  29. package/dist/presets.d.ts +2 -43
  30. package/dist/presets.js +2 -17
  31. package/dist/providers-CX-R-Oy-.js +969 -0
  32. package/dist/providers-CX-R-Oy-.js.map +1 -0
  33. package/dist/providers.d.ts +2 -4
  34. package/dist/providers.js +3 -23
  35. package/dist/session/sqlite.d.ts +7 -12
  36. package/dist/session/sqlite.d.ts.map +1 -0
  37. package/dist/session/sqlite.js +67 -79
  38. package/dist/session/sqlite.js.map +1 -0
  39. package/dist/session-Cn68UASv.js +440 -0
  40. package/dist/session-Cn68UASv.js.map +1 -0
  41. package/dist/session.d.ts +2 -4
  42. package/dist/session.js +3 -27
  43. package/dist/skills.d.ts +3 -322
  44. package/dist/skills.js +24 -47
  45. package/dist/skills.js.map +1 -0
  46. package/dist/stats-DoKUtF5T.js +58 -0
  47. package/dist/stats-DoKUtF5T.js.map +1 -0
  48. package/dist/tools-DpeWKzP1.js +3941 -0
  49. package/dist/tools-DpeWKzP1.js.map +1 -0
  50. package/dist/tools.d.ts +3 -95
  51. package/dist/tools.js +2 -40
  52. package/dist/tui.d.ts +533 -0
  53. package/dist/tui.d.ts.map +1 -0
  54. package/dist/tui.js +2004 -0
  55. package/dist/tui.js.map +1 -0
  56. package/dist/types-Bx_F8jet.js +39 -0
  57. package/dist/types-Bx_F8jet.js.map +1 -0
  58. package/dist/types.d.ts +4 -55
  59. package/dist/types.js +4 -28
  60. package/package.json +38 -4
  61. package/dist/agent-BAHrGtqu.d.ts +0 -2425
  62. package/dist/chunk-4ILGBQ23.js +0 -803
  63. package/dist/chunk-4LPBN547.js +0 -3540
  64. package/dist/chunk-64LLNY7F.js +0 -28
  65. package/dist/chunk-6STZTA4N.js +0 -830
  66. package/dist/chunk-7GQ7P6DM.js +0 -566
  67. package/dist/chunk-IC7FT4OD.js +0 -37
  68. package/dist/chunk-JCOB6IYO.js +0 -22
  69. package/dist/chunk-JH6IAAFA.js +0 -28
  70. package/dist/chunk-LNN5UTS2.js +0 -97
  71. package/dist/chunk-PMCQOMV4.js +0 -490
  72. package/dist/chunk-UD25QF3H.js +0 -304
  73. package/dist/chunk-W57VY6DJ.js +0 -834
  74. package/dist/sandbox-D7v6Wy62.d.ts +0 -28
  75. package/dist/skills-use-DwZrNmcw.d.ts +0 -80
  76. package/dist/types-Bai5rKpa.d.ts +0 -89
  77. package/dist/validation-Pm--dQEU.d.ts +0 -185
package/README.md CHANGED
@@ -2,38 +2,34 @@
2
2
 
3
3
  # Zidane
4
4
 
5
- An agent that goes straight to the goal.
6
-
7
- Minimal TypeScript agent loop built with [Bun](https://bun.sh).
8
-
9
- Hook into every step using [hookable](https://github.com/unjs/hookable).
10
-
11
- Built to be embedded.
5
+ An agent that goes straight to the goal. Minimal TypeScript agent loop built with [Bun](https://bun.sh), hookable via [hookable](https://github.com/unjs/hookable). Built to be embedded.
12
6
 
13
7
  ## Features
14
8
 
15
- A small, hookable core with sensible defaults so most consumers don't write a single hook. Built around three principles: **token discipline by default** (cache, dedup, compaction, byte-accounting), **self-healing on the fault paths** (auto-coerce args, hallucinated-tool fallback, error rewriting), and **provider parity** (server-side features on Anthropic, client-side equivalents everywhere else).
9
+ Small, hookable core with sensible defaults. Three principles:
10
+
11
+ - **Token discipline** — cache, dedup, compaction, byte-accounting.
12
+ - **Self-healing fault paths** — auto-coerce args, hallucinated-tool fallback, error rewriting.
13
+ - **Provider parity** — server-side features on Anthropic, client-side equivalents elsewhere.
16
14
 
17
- - 🧠 **Multi-provider, multi-auth** — Anthropic, OpenAI Codex, OpenRouter, Cerebras, plus a generic `openaiCompat` factory (Baseten, Fireworks, Groq, local servers). OAuth + API key, auto-refreshing tokens. Anthropic accepts opt-in `extraBetas` and `contextManagement` for first-party features.
18
- - 🪝 **Streaming, hookable turn loop** — text/thinking deltas, tool calls, MCP, sessions, skills, spawn, OAuth, validation, budgets all observable (and most mutatable) via typed hook events. Per-request `system:transform` hook for runtime-derived prompt sections.
19
- - 🛠 **Tools first-class** — `shell`, `read_file`, `write_file`, `edit`, `multi_edit`, `glob`, `grep`, `spawn`, human-in-the-loop, plus any [MCP](https://modelcontextprotocol.io) server. Sequential or parallel, per-call gates (`tool:gate` with writable `block` / `result` / `runToolCounts`), validation auto-coerce (`"true"` → `true`), hallucinated-tool fallback (`tool:unknown`), error rewriting (`tool:error` → `result`). Optional **progressive disclosure** (`behavior.toolDisclosure: 'lazy'`) hides MCP schemas behind a `tool_search` native tool when context budget matters more than upfront discovery — gated server-side and in-loop so a model can't bypass.
20
- - ✂️ **Token-aware ergonomics** — paginated reads with a "how to page" footer, 8 KB tail-truncated `shell`, idempotent `write_file`; `outputBytes` surfaced on every tool/MCP hook. `behavior.toolOutputBudget` injects a "summarize" nudge when a turn's outputs exceed the cap; `behavior.toolBudgets` caps per-tool call counts (`'steer'` or `'block'`); `behavior.thinkingDecay` tapers reasoning budget per turn.
21
- - 🗜 **Context discipline** — auto-injected `cache_control` breakpoints (Anthropic + OpenRouter); server-side compaction via `context-management-2025-06-27` on Anthropic, `behavior.compactStrategy: 'tail'` on everyone else. Per-session `read_file` dedup + opt-in `requireReadBeforeEdit` guard kill stale-content edits; `behavior.dedupTools` generalizes the same pattern to arbitrary tools (`todowrite`, `execute_sql`, …).
22
- - 🎯 **Reasoning + structured output** — thinking levels (`off` / `minimal` / `low` / `medium` / `high` / `adaptive`) with optional exact budgets; force the final answer to a JSON Schema (Zod v4 interop), no brittle parsing.
23
- - 💾 **Sessions, skills, multimodal** — pluggable session stores (memory / SQLite / remote / file-map), incremental persistence; [Agent Skills](https://agentskills.io/specification) spec-aligned with `allowed-tools` enforcement + resume rehydration; images + documents via `PromptPart[]`, tools can return image blocks routed natively on vision providers or via companion messages elsewhere.
24
- - 🧵 **Sub-agents + execution contexts** — delegate to child agents with inherited or overridden preset (child events bubble to the parent); run tools in-process, Docker, or any `SandboxProvider` (E2B / Rivet / custom). Parallel MCP bootstrap with `agent.warmup()` + `eager: true` to hide cold starts.
25
- - 🧭 **Typed errors + 1000+ tests** — `AgentContextExceededError` / `AgentProviderError` / `AgentAbortedError` instead of sniffing strings. Suite runs in under 2s with mock providers + mock execution contexts, zero API keys.
15
+ - 🧠 **Providers** — Anthropic, OpenAI Codex, OpenRouter, Cerebras, plus `openaiCompat` (Baseten, Fireworks, Groq, local). OAuth + API key with auto-refresh.
16
+ - 🪝 **Hookable turn loop** — every text/thinking delta, tool call, MCP, session, skill, spawn, OAuth, validation, and budget event is observable and (mostly) mutable.
17
+ - 🛠 **First-class tools** — `shell`, `read_file`, `write_file`, `edit`, `multi_edit`, `glob`, `grep`, `spawn`, human-in-the-loop, plus any [MCP](https://modelcontextprotocol.io) server. Per-call gates, arg auto-coerce, hallucinated-tool fallback, error rewriting. Lazy MCP disclosure via `tool_search`.
18
+ - ✂️ **Token-aware** — paginated reads, tail-truncated `shell`, idempotent `write_file`; `outputBytes` everywhere. `toolOutputBudget`, `toolBudgets`, `thinkingDecay`.
19
+ - 🗜 **Context discipline** — `cache_control` breakpoints; server-side compaction on Anthropic, client-side `compactStrategy: 'tail'` elsewhere. Per-session read dedup + `requireReadBeforeEdit`; generalized `dedupTools`.
20
+ - 🎯 **Reasoning + structured output** — thinking levels with optional exact budgets; force final response to a JSON Schema (Zod v4 interop).
21
+ - 💾 **Sessions, skills, multimodal** — pluggable stores, incremental persistence; [Agent Skills](https://agentskills.io/specification) spec; images + documents via `PromptPart[]`.
22
+ - 🧵 **Sub-agents + execution contexts** — child events bubble to parent; run tools in-process, Docker, or any `SandboxProvider`.
23
+ - 🧭 **Typed errors + 1000+ tests** — `AgentContextExceededError` / `AgentProviderError` / `AgentAbortedError`. Suite under 2s with mocks.
26
24
 
27
25
  ## Quickstart
28
26
 
29
27
  ```bash
30
28
  bun install
31
- bun run auth # Anthropic + OpenAI Codex OAuth
29
+ bun run auth # Anthropic + OpenAI Codex OAuth (--openai / --anthropic to scope)
32
30
  bun start --prompt "create a hello world app"
33
31
  ```
34
32
 
35
- `auth` runs both OAuth flows by default. Pass `--openai` or `--anthropic` to authenticate only one provider; the npm script form works too, e.g. `npm run auth -- --openai`.
36
-
37
33
  ## Agent Setup
38
34
 
39
35
  ```ts
@@ -50,69 +46,65 @@ const stats = await agent.run({ prompt: 'build a REST API' })
50
46
  console.log(`Done in ${stats.turns} turns`)
51
47
  ```
52
48
 
53
- All options on `createAgent`:
49
+ `createAgent` options:
54
50
 
55
51
  ```ts
56
52
  createAgent({
57
- provider, // required: LLM provider
58
- name: 'basic', // optional display name (shown in traces/logs)
59
- system: 'You are a helpful...', // default system prompt
60
- tools: { shell, readFile }, // tool set (default: no tools)
61
- toolAliases: { shell: 'Bash' }, // map canonical names to LLM-facing names
62
- session, // session for persistence
63
- behavior: { // agent-level defaults
64
- toolExecution: 'parallel', // or 'sequential' (default: parallel)
65
- maxTurns: 50, // max loop iterations
66
- maxTokens: 16384, // max tokens per LLM response
67
- thinkingBudget: 10240, // exact thinking token budget
68
- thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 }, // taper budget per run-relative turn
69
- cache: true, // prompt-cache breakpoints on supported providers (default: true)
70
- toolOutputBudget: 32768, // soft per-turn cap on tool-output bytes (off by default)
71
- dedupReads: true, // dedup identical re-reads of the same file in `read_file` (default: true)
72
- dedupTools: { todowrite: i => JSON.stringify(i.todos) }, // generic per-tool argument dedup
73
- requireReadBeforeEdit: false, // refuse `edit` / `multi_edit` against unread or stale files (default: false)
74
- toolBudgets: { todowrite: { max: 6, onExceed: 'steer' } }, // per-tool soft call caps
75
- compactStrategy: 'off', // client-side tail compaction for non-Anthropic providers — 'off' | 'tail' (default: 'off')
76
- compactThreshold: 131_072, // bytes threshold that triggers tail compaction (default: 128 KiB)
77
- compactKeepTurns: 4, // trailing turns left intact during compaction (default: 4)
78
- toolDisclosure: 'eager', // 'eager' | 'lazy'; 'lazy' hides MCP schemas behind tool_search (default: 'eager')
79
- toolSearch: { tool: true, limit: 20 }, // tune the auto-injected tool_search (no-op in eager mode)
53
+ provider, // required
54
+ name: 'basic', // display name (traces/logs)
55
+ system: 'You are a helpful...',
56
+ tools: { shell, readFile }, // default: {}
57
+ toolAliases: { shell: 'Bash' }, // canonical → LLM-facing names
58
+ session,
59
+ behavior: {
60
+ toolExecution: 'parallel', // 'parallel' | 'sequential' (default: parallel)
61
+ maxTurns: 50,
62
+ maxTokens: 16384,
63
+ thinkingBudget: 10240,
64
+ thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 },
65
+ cache: true, // prompt-cache breakpoints
66
+ toolOutputBudget: 32768, // soft per-turn byte cap (off by default)
67
+ dedupReads: true, // dedup re-reads in `read_file`
68
+ dedupTools: { todowrite: i => JSON.stringify(i.todos) },
69
+ requireReadBeforeEdit: false, // refuse edits against unread/stale files
70
+ toolBudgets: { todowrite: { max: 6, onExceed: 'steer' } },
71
+ compactStrategy: 'off', // 'off' | 'tail' (non-Anthropic compaction)
72
+ compactThreshold: 131_072, // 128 KiB
73
+ compactKeepTurns: 4,
74
+ toolDisclosure: 'eager', // 'eager' | 'lazy' (hide MCP schemas behind tool_search)
75
+ toolSearch: { tool: true, limit: 20 },
80
76
  },
81
- execution: createProcessContext(), // where tools run
82
- mcpServers: [], // MCP tool servers
83
- eager: true, // pre-warm MCP bootstrap in the background (default: false)
84
- skills: {}, // skills configuration
77
+ execution: createProcessContext(),
78
+ mcpServers: [],
79
+ eager: true, // pre-warm MCP in background
80
+ skills: {},
85
81
  })
86
82
  ```
87
83
 
88
- Presets are just `Partial<AgentOptions>` — spread them in, override any field:
84
+ Presets are `Partial<AgentOptions>` — spread, override:
89
85
 
90
86
  ```ts
91
87
  createAgent({ ...basic, provider, system: 'be concise' })
92
88
  ```
93
89
 
94
- All options on `agent.run()`:
90
+ `agent.run()` options:
95
91
 
96
92
  ```ts
97
93
  await agent.run({
98
- prompt: 'your task', // optional when session has existing turns
94
+ prompt: 'your task', // optional when session has turns
99
95
  model: 'claude-opus-4-6',
100
96
  system: 'be concise',
101
97
  thinking: 'medium', // off | minimal | low | medium | high | adaptive
102
- behavior: { // per-run overrides
103
- maxTurns: 10,
104
- maxTokens: 4096,
105
- thinkingBudget: 8192,
106
- },
107
- tools: {}, // override tools for this run ({} = no tools)
108
- images: [], // base64 images
98
+ behavior: { maxTurns: 10, maxTokens: 4096, thinkingBudget: 8192 },
99
+ tools: {}, // {} = no tools for this run
100
+ images: [],
109
101
  signal: abortController.signal,
110
102
  })
111
103
  ```
112
104
 
113
- `prompt` is optional when a session with existing turns is provided — the agent resumes from the last turn. This supports apps where the user message is persisted to the session before the agent runs (e.g. WebSocket → session → queue → agent).
105
+ `prompt` is optional when the session already has turns — the agent resumes. Useful when the user message is persisted before the run (WebSocket → session → queue → agent).
114
106
 
115
- Precedence: `run.behavior` > `agent.behavior` > hardcoded defaults.
107
+ Precedence: `run.behavior` > `agent.behavior` > defaults.
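
The precedence rule can be pictured as a layered object spread. This is a minimal sketch assuming a shallow merge; the `Behavior` shape, the `DEFAULTS` values, and `resolveBehavior` are hypothetical names for illustration, not zidane's internals.

```ts
// Hypothetical subset of the behavior options shown above.
interface Behavior {
  maxTurns?: number
  maxTokens?: number
  thinkingBudget?: number
}

// Illustrative defaults only; the library's real defaults may differ.
const DEFAULTS: Behavior = { maxTurns: 50, maxTokens: 16384 }

// Later spreads win, so run-level values override agent-level values,
// which in turn override the defaults.
function resolveBehavior(agent: Behavior = {}, run: Behavior = {}): Behavior {
  return { ...DEFAULTS, ...agent, ...run }
}

const resolved = resolveBehavior({ maxTurns: 20, thinkingBudget: 8192 }, { maxTurns: 10 })
console.log(resolved) // { maxTurns: 10, maxTokens: 16384, thinkingBudget: 8192 }
```

Note that in this sketch an explicit `undefined` at a higher layer would overwrite a lower layer's value; a real implementation may filter those out.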
116
108
 
117
109
  ## CLI
118
110
 
@@ -130,7 +122,7 @@ bun start \
130
122
 
131
123
  ## Providers
132
124
 
133
- All providers accept runtime credentials via a params object. Env vars are fallbacks.
125
+ All providers accept runtime credentials via params. Env vars are fallbacks.
134
126
 
135
127
  ### Anthropic
136
128
 
@@ -138,17 +130,15 @@ All providers accept runtime credentials via a params object. Env vars are fallb
138
130
  import { anthropic } from 'zidane/providers'
139
131
 
140
132
  anthropic({ apiKey: 'sk-ant-...' })
141
- anthropic({ access: 'sk-ant-oat-...' }) // OAuth
142
- anthropic({ access: 'sk-ant-oat-...', refresh: '...', expires: Date.now() + 3600_000 }) // auto-refresh
143
- anthropic({ apiKey: '...', defaultModel: 'claude-sonnet-4-6' })
133
+ anthropic({ access: 'sk-ant-oat-...', refresh: '...', expires: Date.now() + 3600_000 }) // OAuth + auto-refresh
144
134
 
145
- // Opt into first-party Anthropic betas + server-side context compaction:
135
+ // First-party betas + server-side compaction:
146
136
  anthropic({
147
137
  apiKey: '...',
148
138
  extraBetas: [
149
- 'context-management-2025-06-27', // server-side, token-accurate compaction
150
- 'token-efficient-tools-2026-03-28', // ~4.5% output token reduction
151
- 'interleaved-thinking-2025-05-14', // think between tool calls in one turn
139
+ 'context-management-2025-06-27', // token-accurate compaction
140
+ 'token-efficient-tools-2026-03-28', // ~4.5% output reduction
141
+ 'interleaved-thinking-2025-05-14', // think between tool calls
152
142
  ],
153
143
  contextManagement: {
154
144
  edits: [{
@@ -161,55 +151,22 @@ anthropic({
161
151
  })
162
152
  ```
163
153
 
164
- Fallback: `params.apiKey` > `params.access` > `ANTHROPIC_API_KEY` env > `.credentials.json`
154
+ Fallback: `params.apiKey` > `params.access` > `ANTHROPIC_API_KEY` env > `.credentials.json`. `extraBetas` merge with OAuth defaults and de-dupe. `contextManagement` is sent as `context_management`; pair with the matching beta. Non-Anthropic equivalent: `behavior.compactStrategy: 'tail'`.
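
The merge-and-de-dupe step for `extraBetas` could look like the sketch below. The two OAuth default beta names come from the 4.0.2 README text; `mergeBetas` and the ordering (defaults first) are assumptions made for illustration.

```ts
// OAuth default betas as listed in the 4.0.2 README.
const OAUTH_DEFAULT_BETAS: string[] = ['claude-code-20250219', 'oauth-2025-04-20']

// Hypothetical helper: a Set keeps the first occurrence of each name and
// preserves insertion order, so duplicates collapse.
function mergeBetas(extraBetas: string[] = []): string[] {
  return Array.from(new Set(OAUTH_DEFAULT_BETAS.concat(extraBetas)))
}

const betas = mergeBetas(['context-management-2025-06-27', 'oauth-2025-04-20'])
console.log(betas)
// [ 'claude-code-20250219', 'oauth-2025-04-20', 'context-management-2025-06-27' ]
```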
165
155
 
166
- `extraBetas` are merged with the OAuth defaults (`claude-code-20250219`, `oauth-2025-04-20`) and de-duped. `contextManagement` is sent on the request body as `context_management`; pair it with the `context-management-2025-06-27` beta. For non-Anthropic providers, see `behavior.compactStrategy: 'tail'` for the client-side fallback.
156
+ `extraBodyParams` passes un-typed Messages API fields through (factory options win on collision). Use when Anthropic ships a beta before zidane has a knob. `openaiCompat` accepts the same field (e.g. `reasoning_effort`, `metadata`, OpenRouter `provider` routing).
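
The collision rule follows from spread order: the 4.0.2 README describes the pass-through as spread into the request before the typed core. A rough sketch of that idea, where `buildRequestBody` and the field values in the usage are hypothetical:

```ts
type Body = Record<string, unknown>

// Hypothetical request builder: pass-through fields go first, so any key the
// typed options also set is overwritten by the typed value.
function buildRequestBody(typed: Body, extraBodyParams: Body = {}): Body {
  return { ...extraBodyParams, ...typed }
}

const body = buildRequestBody(
  { model: 'claude-opus-4-6', max_tokens: 4096 },
  { max_tokens: 1, future_field: { enabled: true } },
)
console.log(body.max_tokens) // 4096
console.log(body.future_field) // { enabled: true }
```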
167
157
 
168
- `extraBodyParams` is a generic forward-compat pass-through for un-typed Messages API fields. Spread into the request before the typed core, so explicit factory options always win on collision. Use it when Anthropic ships a new beta before zidane has a dedicated knob:
158
+ ### OpenRouter / OpenAI / Cerebras
169
159
 
170
160
  ```ts
171
- anthropic({
172
- apiKey: '...',
173
- extraBetas: ['some-future-beta'],
174
- extraBodyParams: { future_field: { /* ... */ } },
175
- })
176
- ```
177
-
178
- `openaiCompat` accepts the same `extraBodyParams` for OpenAI-style endpoints (e.g. `reasoning_effort`, `metadata`, OpenRouter `provider` routing).
179
-
180
- ### OpenRouter
181
-
182
- ```ts
183
- import { openrouter } from 'zidane/providers'
161
+ import { openrouter, openai, cerebras } from 'zidane/providers'
184
162
 
185
163
  openrouter({ apiKey: 'sk-or-...', defaultModel: 'google/gemini-pro' })
186
- ```
187
-
188
- Fallback: `params.apiKey` > `OPENROUTER_API_KEY` env
189
-
190
- ### OpenAI
191
-
192
- ```ts
193
- import { openai } from 'zidane/providers'
194
-
195
- openai() // OpenAI Codex OAuth
196
- openai({ access: 'eyJ...', defaultModel: 'gpt-5.4' })
164
+ openai() // OpenAI Codex OAuth
197
165
  openai({ access: 'eyJ...', refresh: '...', expires: Date.now() + 3600_000, accountId: 'acct_123' })
198
- ```
199
-
200
- Fallback: `params.apiKey` > `params.access` > `OPENAI_CODEX_API_KEY` env > `.credentials.json`
201
-
202
- Pass the full OAuth credential fields (`access`, `refresh`, `expires`, plus provider extras like `accountId`) to let the provider auto-refresh tokens without reading `.credentials.json`.
203
-
204
- ### Cerebras
205
-
206
- ```ts
207
- import { cerebras } from 'zidane/providers'
208
-
209
166
  cerebras({ apiKey: 'csk-...', defaultModel: 'zai-glm-4.7' })
210
167
  ```
211
168
 
212
- Fallback: `params.apiKey` > `CEREBRAS_API_KEY` env
169
+ Fallbacks: `params.apiKey` > `params.access` (Codex) > `<PROVIDER>_API_KEY` env > `.credentials.json` (Codex). Pass full OAuth fields on `openai()` to auto-refresh without reading `.credentials.json`.
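
The fallback chain reads as "first non-empty source wins". A minimal sketch under that assumption; `resolveCredential`, its parameters, and the `source` labels are hypothetical, and reading the environment variable or `.credentials.json` is left to the caller:

```ts
interface CredentialParams { apiKey?: string; access?: string }
interface ResolvedCredential {
  source: 'params.apiKey' | 'params.access' | 'env' | 'file'
  token: string
}

// Hypothetical resolver: checks sources in the documented order and
// returns the first one that is set.
function resolveCredential(
  params: CredentialParams,
  envValue: string | undefined,
  fileToken: string | undefined,
): ResolvedCredential {
  if (params.apiKey) return { source: 'params.apiKey', token: params.apiKey }
  if (params.access) return { source: 'params.access', token: params.access }
  if (envValue) return { source: 'env', token: envValue }
  if (fileToken) return { source: 'file', token: fileToken }
  throw new Error('no credentials found')
}

console.log(resolveCredential({ access: 'oauth-token' }, 'env-key', undefined).source) // params.access
console.log(resolveCredential({}, 'env-key', 'file-token').source) // env
```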
213
170
 
214
171
  ### OpenAI-compatible (custom endpoints)
215
172
 
@@ -222,265 +179,127 @@ openaiCompat({
222
179
  name: 'baseten',
223
180
  apiKey: process.env.BASETEN_API_KEY!,
224
181
  baseURL: process.env.BASETEN_PROXY_URL!,
225
- authHeader: { name: 'Authorization', scheme: 'Api-Key' }, // vendor-specific scheme
182
+ authHeader: { name: 'Authorization', scheme: 'Api-Key' }, // vendor-specific
226
183
  capabilities: { vision: false, imageInToolResult: false },
227
- cacheBreakpoints: false, // set true only for endpoints that honor `cache_control`
184
+ cacheBreakpoints: false, // true only when endpoint honors `cache_control`
228
185
  })
229
186
  ```
230
187
 
231
- `openrouter` and `cerebras` are thin wrappers around this factory with vendor defaults pinned. Reach for `openaiCompat` directly when adding a new backend instead of forking a bespoke provider.
188
+ `openrouter` and `cerebras` are thin wrappers with vendor defaults pinned. Use `openaiCompat` directly for new backends.
232
189
 
233
190
  ### Prompt caching
234
191
 
235
- Enabled by default via `behavior.cache`. The provider inserts `cache_control: { type: 'ephemeral' }` markers on the three largest stable prefixes (system prompt, tool definitions, and the last message's final content block) so the shared prefix is served from cache across turns.
192
+ `behavior.cache` (default on). `cache_control: { type: 'ephemeral' }` is inserted on three stable prefixes (system, last tool, last message's final block). Hits + writes surface on `TurnUsage.cacheRead` / `cacheCreation` via the `usage` hook.
236
193
 
237
194
  | Provider | Behavior |
238
195
  |---|---|
239
- | `anthropic` | Breakpoints honored natively. |
240
- | `openrouter` | Breakpoints forwarded; Anthropic + Gemini routes honor them, OpenAI / DeepSeek / Grok / Groq / Moonshot routes cache automatically and ignore the markers. |
241
- | `openaiCompat` | Opt-in via `cacheBreakpoints: true`. Default off so strict-schema endpoints (OpenAI direct, most OSS servers) don't reject unknown fields. |
242
- | `cerebras` | Off (factory doesn't enable breakpoints). |
243
- | `openai` (Codex) | Not affected; separate wire format (pi-ai). |
244
-
245
- Cache hits + writes land on `TurnUsage.cacheRead` / `TurnUsage.cacheCreation` and are surfaced via the `usage` hook.
196
+ | `anthropic` | Honored natively. |
197
+ | `openrouter` | Forwarded; Anthropic + Gemini honor; OpenAI / DeepSeek / Grok / Groq / Moonshot cache automatically and ignore the markers. |
198
+ | `openaiCompat` | Opt-in via `cacheBreakpoints: true`. Off by default. |
199
+ | `cerebras` | Off. |
200
+ | `openai` (Codex) | Not affected (pi-ai wire format). |
246
201
 
247
202
  ## Presets
248
203
 
249
- Reusable slices of `AgentOptions` — spread them into `createAgent()`.
250
-
251
- The `basic` preset bundles:
204
+ `basic` ships:
252
205
 
253
206
  | Tool | Description |
254
207
  |---|---|
255
- | `shell` | Execute shell commands. Combined stdout+stderr tail-truncated at 8 KB by default; `maxOutputBytes: 0` disables |
256
- | `readFile` | Read a file by line range. Default: lines 1..2000, byte cap 64 KB. Truncation footer documents how to page; binary files return a marker instead of mojibake |
257
- | `writeFile` | Write a file. Returns `Created` / `Updated` / `No change needed: …` so the model can detect no-ops without a separate read |
258
- | `edit` | Surgical replace of `old_string` → `new_string`. Fails clearly on non-unique matches (unless `replace_all`) and on not-found (with a nearest-match preview) |
259
- | `multiEdit` | Atomic list of edits to one file. All-or-nothing: any failed edit prevents the write |
260
- | `listFiles` | List directory contents |
261
- | `spawn` | Spawn a sub-agent |
208
+ | `shell` | Combined stdout+stderr tail-truncated at 8 KB. `maxOutputBytes: 0` disables. |
209
+ | `readFile` | Line range, default 1..2000, 64 KB cap. Paging footer; binary marker. |
210
+ | `writeFile` | Returns `Created` / `Updated` / `No change needed: …` for no-op detection. |
211
+ | `edit` | Surgical `old_string` → `new_string`. Clear errors on non-unique / not-found (with nearest-match preview). |
212
+ | `multiEdit` | Atomic edits to one file. All-or-nothing. |
213
+ | `listFiles` | Directory listing. |
214
+ | `spawn` | Sub-agent. |
262
215
 
263
- Opt-in tools available via `import { glob, grep, createInteractionTool } from 'zidane'`:
216
+ Opt-in (via `import { glob, grep, createInteractionTool } from 'zidane'`): `glob` (Bun.Glob; shells out in docker/sandbox), `grep` (ripgrep + Bun.Glob fallback; full Claude Code Grep semantics), `createInteractionTool` (HITL factory).
264
217
 
265
- | Tool | Description |
266
- |---|---|
267
- | `glob` | Bun.Glob-backed pattern matching (in-process); shells out in docker/sandbox |
268
- | `grep` | ripgrep-backed regex search (with a Bun.Glob fallback). `output_mode`, `-i / -n / -A / -B / -C`, `multiline`, `head_limit`, `offset` — Claude Code Grep semantics |
269
- | `createInteractionTool` | Human-in-the-loop factory |
270
-
271
- The three `skills_use` / `skills_read` / `skills_run_script` tools auto-inject when the skills catalog is non-empty.
272
-
273
- Define a custom preset:
218
+ `skills_use` / `skills_read` / `skills_run_script` auto-inject when the skills catalog is non-empty.
274
219
 
275
220
  ```ts
276
221
  import { basicTools, definePreset } from 'zidane/presets'
277
222
 
278
- const researcher = definePreset({
279
- name: 'researcher',
280
- system: 'You are a research assistant.',
281
- tools: { ...basicTools },
282
- })
283
-
284
- createAgent({ ...researcher, provider })
285
- ```
286
-
287
- For pure chat with no tools, omit `tools` or pass `{}` at run time:
288
-
289
- ```ts
290
- createAgent({ provider }) // no tools
291
- await agent.run({ prompt: 'just chat', tools: {} }) // override for one run
223
+ createAgent({ ...definePreset({ name: 'researcher', tools: basicTools }), provider })
224
+ createAgent({ provider }) // no tools
225
+ await agent.run({ prompt: 'just chat', tools: {} }) // no tools for one run
292
226
  ```
293
227
 
294
228
  ## Thinking
295
229
 
296
- Extended reasoning with named levels or exact token budgets.
230
+ Named levels or exact budgets. Traces persist as `{ type: 'thinking', text }` blocks and stream via `stream:thinking`. Supported by Anthropic (native) and OpenRouter/Cerebras (`reasoning_content`/`reasoning` SSE fields).
297
231
 
298
232
  | Level | Default budget |
299
233
  |---|---|
300
234
  | `off` | disabled |
301
- | `minimal` | 1,024 tokens |
302
- | `low` | 4,096 tokens |
303
- | `medium` | 10,240 tokens |
304
- | `high` | 32,768 tokens |
305
- | `adaptive` | model self-budgets per turn |
235
+ | `minimal` | 1,024 |
236
+ | `low` | 4,096 |
237
+ | `medium` | 10,240 |
238
+ | `high` | 32,768 |
239
+ | `adaptive` | model self-budgets |
306
240
 
307
241
  ```ts
308
- // Named level
309
- await agent.run({ prompt: 'solve this', thinking: 'high' })
310
-
311
- // Exact budget (overrides level default)
312
- await agent.run({ prompt: 'solve this', thinking: 'high', behavior: { thinkingBudget: 50000 } })
313
-
314
- // Adaptive — model self-budgets, but `thinkingBudget` caps the response envelope
315
- // (max_tokens) to soft-bound runaway thinking on Anthropic.
316
- await agent.run({ prompt: 'solve this', thinking: 'adaptive', behavior: { thinkingBudget: 32000 } })
317
-
318
- // Agent-level default
319
- const agent = createAgent({ ...basic, provider, behavior: { thinkingBudget: 16384 } })
242
+ await agent.run({ prompt: '…', thinking: 'high' })
243
+ await agent.run({ prompt: '…', thinking: 'high', behavior: { thinkingBudget: 50000 } }) // exact
244
+ await agent.run({ prompt: '…', thinking: 'adaptive', behavior: { thinkingBudget: 32000 } })
320
245
  ```
321
246
 
322
- Thinking traces are stored in session turns as `{ type: 'thinking', text }` content blocks and streamed live via the `stream:thinking` hook. Supported by Anthropic (native) and OpenRouter/Cerebras (`reasoning_content`/`reasoning` SSE fields).
323
-
324
- `adaptive` is Anthropic-specific (`thinking.type='adaptive'`) and avoids the `thinking.type='enabled'` deprecation warning on opus 4.6+. It has no native budget knob — when `thinkingBudget` is paired with `adaptive`, zidane caps `max_tokens = min(maxTokens, thinkingBudget)` so unbounded reasoning can't run away. Other providers fall back to no reasoning when `adaptive` is selected.
247
+ `adaptive` is Anthropic-only (`thinking.type='adaptive'`, avoids the opus 4.6+ deprecation warning). Pairing it with `thinkingBudget` caps `max_tokens = min(maxTokens, thinkingBudget)` to bound runaway reasoning. Other providers fall back to no reasoning on `adaptive`.
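
Combining the level table with the `adaptive` rule, budget resolution might be sketched as follows. `resolveThinking` and its return shape are hypothetical; the level defaults and the `min(maxTokens, thinkingBudget)` cap are taken from the text. The non-Anthropic fallback to no reasoning is not modeled here.

```ts
type Level = 'off' | 'minimal' | 'low' | 'medium' | 'high' | 'adaptive'

// Default budgets from the table above (tokens).
const LEVEL_BUDGETS: Record<'minimal' | 'low' | 'medium' | 'high', number> = {
  minimal: 1024,
  low: 4096,
  medium: 10240,
  high: 32768,
}

interface ResolvedThinking { thinkingBudget?: number; maxTokens: number }

function resolveThinking(level: Level, maxTokens: number, thinkingBudget?: number): ResolvedThinking {
  if (level === 'off')
    return { maxTokens }
  if (level === 'adaptive') {
    // No native budget knob: an explicit budget only caps the response envelope.
    return { maxTokens: thinkingBudget === undefined ? maxTokens : Math.min(maxTokens, thinkingBudget) }
  }
  // Named level: an exact budget overrides the level default.
  return { thinkingBudget: thinkingBudget ?? LEVEL_BUDGETS[level], maxTokens }
}

console.log(resolveThinking('high', 16384)) // { thinkingBudget: 32768, maxTokens: 16384 }
console.log(resolveThinking('adaptive', 16384, 8192)) // { maxTokens: 8192 }
```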
325
248
 
326
249
  ## Hooks
327
250
 
328
- Every hook receives a mutable context object.
329
-
330
- ### Turn lifecycle
331
-
332
- ```ts
333
- agent.hooks.hook('turn:before', (ctx) => {
334
- // ctx.turn, ctx.turnId, ctx.options (StreamOptions)
335
- })
336
-
337
- agent.hooks.hook('turn:after', (ctx) => {
338
- // ctx.turn, ctx.turnId, ctx.usage, ctx.message (full SessionTurn)
339
- // Always fires — even if the provider throws mid-stream
340
- // Turn is guaranteed to be in agent.turns before this fires
341
- })
342
-
343
- agent.hooks.hook('usage', (ctx) => {
344
- // ctx.turn, ctx.turnId, ctx.usage (per-turn)
345
- // ctx.totalIn, ctx.totalOut (running parent-loop totals — children fold in
346
- // post-loop and are visible on `agent:done`)
347
- })
251
+ Hooks fire at every lifecycle point via [hookable](https://github.com/unjs/hookable). Awaited in registration order; ctx is shared per firing (last-writer wins). See `docs/SKILL.md` for the full hook reference table.
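
To illustrate what "awaited in registration order" and "last-writer wins" imply, here is a tiny stand-in runner. It is not hookable's implementation; `runHooks` and `GateCtx` are hypothetical, and only the serial await over a shared context object is modeled.

```ts
interface GateCtx { name: string; block?: boolean; reason?: string }
type Handler<T> = (ctx: T) => void | Promise<void>

// Hypothetical stand-in: await each handler in registration order,
// passing the same mutable context object to all of them.
async function runHooks<T>(handlers: Handler<T>[], ctx: T): Promise<T> {
  for (const handler of handlers)
    await handler(ctx)
  return ctx
}

const handlers: Handler<GateCtx>[] = [
  async (ctx) => { ctx.block = true; ctx.reason = 'first handler' },
  (ctx) => { ctx.reason = 'second handler' }, // registered later, so its write survives
]

runHooks(handlers, { name: 'shell' }).then(ctx => console.log(ctx))
// logs { name: 'shell', block: true, reason: 'second handler' }
```

A practical consequence: when two handlers write the same field, the order in which they were registered decides the outcome, so guard-style hooks are usually registered last.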
348
252
 
349
- agent.hooks.hook('agent:done', (ctx) => {
350
- // ctx.totalIn / ctx.totalOut / ctx.cost — cumulative across parent loop +
351
- // every recursively-spawned sub-agent
352
- // ctx.turns, ctx.elapsed — parent-loop view (use `flattenTurns(ctx).length`
353
- // for tree-wide turn counts, `statsByModel(ctx)` for per-model breakdown)
354
- // ctx.children? — per-child stats in completion order
355
- // ctx.output — structured output (when behavior.schema is set)
356
- // Fires on all exit paths: completion, maxTurns, and abort
357
- })
358
- ```
359
-
360
- ### Streaming
361
-
362
- ```ts
363
- agent.hooks.hook('stream:text', (ctx) => {
364
- // ctx.delta, ctx.text, ctx.turnId
365
- })
366
-
367
- agent.hooks.hook('stream:end', (ctx) => {
368
- // ctx.text (final), ctx.turnId
369
- // Only fires when there is text content (not on tool-only turns)
370
- })
371
-
372
- agent.hooks.hook('stream:thinking', (ctx) => {
373
- // ctx.delta, ctx.thinking (accumulated), ctx.turnId
374
- // Fires when the model streams reasoning traces (Anthropic, OpenRouter)
375
- })
376
-
377
- agent.hooks.hook('oauth:refresh', (ctx) => {
378
- // ctx.provider, ctx.providerId, ctx.source
379
- // ctx.previousCredentials, ctx.credentials
380
- // Fires when an OAuth token is refreshed from passed credentials or .credentials.json
381
- })
382
- ```
383
-
384
- ### Tool execution
385
-
386
- All tool hooks include `turnId` and `callId` for correlation. Typed via `ToolHookContext`.
253
+ ### Practical examples
387
254
 
388
255
  ```ts
256
+ // Refuse or substitute a tool call.
389
257
  agent.hooks.hook('tool:gate', (ctx) => {
390
- // ctx.turnId, ctx.callId, ctx.name, ctx.input, ctx.runToolCounts
391
258
  if (ctx.name === 'shell' && String(ctx.input.command).includes('rm -rf')) {
392
259
  ctx.block = true
393
260
  ctx.reason = 'dangerous command'
394
261
  }
395
- // Substitute a successful result without running the tool — mirrors
396
- // tool:unknown / tool:error. When both are set, `block` wins.
397
262
  if (ctx.name === 'todowrite' && (ctx.runToolCounts.todowrite ?? 0) > 0)
398
- ctx.result = 'Already recorded; no-op.'
263
+ ctx.result = 'Already recorded; no-op.' // `block` wins if both set
399
264
  })
400
265
 
401
- agent.hooks.hook('tool:before', (ctx) => { /* ctx.turnId, ctx.callId, ctx.name, ctx.input, ctx.runToolCounts, ctx.coercions? */ })
402
- agent.hooks.hook('tool:after', (ctx) => { /* + ctx.result, ctx.outputBytes, ctx.runToolCounts, ctx.coercions? */ })
403
- agent.hooks.hook('tool:error', (ctx) => {
404
- // + ctx.error. Mutate ctx.result to substitute the payload sent back to the
405
- // model in place of the default `Tool error: <msg>` — useful for OSS-model
406
- // error rewriting (collapse stack traces, prepend recovery hints).
407
- })
266
+ // Redact secrets before the model sees a tool result.
408
267
  agent.hooks.hook('tool:transform', (ctx) => {
409
- // + ctx.result, ctx.isError, ctx.outputBytes (pre-mutation), ctx.coercions? — mutate result/isError to modify.
410
- // Built-in tools already truncate; use this hook for consumer concerns the framework can't infer,
411
- // e.g. redacting secrets in tool output before they reach the model.
412
268
  if (typeof ctx.result === 'string')
413
269
  ctx.result = ctx.result.replace(/\b(API_KEY|TOKEN|PASSWORD)\s*=\s*\S+/gi, '$1=<redacted>')
414
270
  })
271
+
272
+ // Substitute for hallucinated tool names instead of erroring.
415
273
  agent.hooks.hook('tool:unknown', (ctx) => {
416
- // Fires when the model invents a tool name (or calls one no longer registered).
417
- // Mutate ctx.result to substitute a friendly response, set ctx.suppressError = true
418
- // to skip the companion `tool:error`.
419
274
  if (ctx.name === 'EnterPlanMode') {
420
275
  ctx.result = 'EnterPlanMode is not available — use shell to draft a plan as comments.'
421
276
  ctx.suppressError = true
422
277
  }
423
278
  })
424
- agent.hooks.hook('validation:reject', (ctx) => {
425
- // Fires when arg validation rejects the input even after auto-coercion attempts.
426
- // Observational — the model still receives `Validation error: …` for the retry.
427
- // ctx.reason, ctx.schema
428
- })
429
- agent.hooks.hook('validation:coerce', (ctx) => {
430
- // Fires when validation auto-healed at least one field. Never fires on
431
- // perfectly-typed inputs. ctx.coercions lists the field names that were changed.
432
- // Symmetric counterpart to `validation:reject` — useful for "model wrongness rate".
433
- })
434
- ```
435
-
436
- `ctx.coercions` (when present) is the same `readonly string[]` exposed via `validation:coerce`. The field is **omitted** from `tool:before` / `tool:after` / `tool:transform` ctx when no coercion happened, so it never noises up the happy path. Listeners can `if (ctx.coercions)` guard.
437
-
438
- MCP tool hooks mirror the same pattern with `server` and `tool` fields. Typed via `McpToolHookContext`.
439
-
440
- ```ts
441
- agent.hooks.hook('mcp:tool:gate', (ctx) => { /* ctx.turnId, ctx.callId, ctx.server, ctx.tool, ctx.input, ctx.block, ctx.reason */ })
442
- agent.hooks.hook('mcp:tool:before', (ctx) => { /* ctx.turnId, ctx.callId, ctx.server, ctx.tool, ctx.input */ })
443
- agent.hooks.hook('mcp:tool:after', (ctx) => { /* + ctx.result, ctx.outputBytes */ })
444
- agent.hooks.hook('mcp:tool:transform', (ctx) => { /* + ctx.result, ctx.outputBytes — mutate to modify */ })
445
- agent.hooks.hook('mcp:tool:error', (ctx) => { /* + ctx.error */ })
446
- ```
447
-
448
- `outputBytes` measures the wire size of the tool's result. On `*:transform` it's the **pre-mutation** size (a truncation handler can size-budget); on `*:after` it's the **post-mutation** size that goes to the model. `toolOutputByteLength(content)` exported from `zidane` reproduces the formula.
449
-
450
- ### Context transform
451
279
 
452
- Prune messages before each LLM call:
280
+ // Per-turn observation.
281
+ agent.hooks.hook('turn:after', (ctx) => { /* ctx.turn, ctx.usage, ctx.message — always fires */ })
282
+ agent.hooks.hook('stream:text', (ctx) => { /* ctx.delta, ctx.text */ })
283
+ agent.hooks.hook('agent:done', (ctx) => { /* AgentStats — cumulative incl. children */ })
453
284
 
454
- ```ts
285
+ // Mutate messages / system before the provider call.
455
286
  agent.hooks.hook('context:transform', (ctx) => {
456
- if (ctx.messages.length > 30)
457
- ctx.messages.splice(2, ctx.messages.length - 30)
287
+ if (ctx.messages.length > 30) ctx.messages.splice(2, ctx.messages.length - 30)
458
288
  })
459
- ```
460
-
461
- ### System transform
462
-
463
- Mutate the system prompt per request — useful for runtime-derived sections (files already read in the session, live tool budgets, skill activation reminders). Fires after `context:transform`, before the request goes out. `messages` is read-only here.
464
-
465
- ```ts
466
289
  agent.hooks.hook('system:transform', (ctx) => {
467
- // ctx.system, ctx.messages (readonly), ctx.turn, ctx.turnId, ctx.session?
468
290
  if (ctx.session && ctx.turn > 1)
469
291
  ctx.system += `\n\n## Reminder: keep responses concise after turn ${ctx.turn}.`
470
292
  })
471
293
  ```
472
294
 
473
- Cache breakpoints land naturally inside the provider after this hook, so repeated turns with the same derived system text still hit the cache.
295
+ Mutable hooks: `tool:gate` (`block` / `reason` / `result`), `tool:transform` (`result` / `isError`), `tool:error` + `tool:unknown` (`result`), `context:transform` (`messages`), `system:transform` + `system:before` (`system`), `skills:catalog` (`catalog`), `mcp:tool:gate` (`block` / `reason` / `result`), `mcp:tool:transform` (`result`). All tool hooks include `turnId` + `callId`. `outputBytes` is **pre-mutation** on `*:transform`, **post-mutation** on `*:after` — reproduce via `toolOutputByteLength()`. `ctx.coercions` is **omitted** when no coercion happened — guard with `if (ctx.coercions)`.
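The redaction pattern used in the `tool:transform` handler above can be checked in isolation. The sketch below lifts the same regular expression into a standalone helper; `redactSecrets` and `SECRET_PATTERN` are hypothetical names used only for illustration and are not zidane exports.

```typescript
// Hypothetical helper (not part of zidane): same regex as the tool:transform example.
const SECRET_PATTERN = /\b(API_KEY|TOKEN|PASSWORD)\s*=\s*\S+/gi

function redactSecrets(output: string): string {
  // `$1` keeps the matched key name in its original casing; the value is dropped.
  return output.replace(SECRET_PATTERN, '$1=<redacted>')
}

const sample = 'export API_KEY=sk-123 and password = hunter2'
console.log(redactSecrets(sample))
// export API_KEY=<redacted> and password=<redacted>

// `_` is a word character, so `\b` does not match inside GITHUB_TOKEN.
console.log(redactSecrets('GITHUB_TOKEN=abc'))
// GITHUB_TOKEN=abc
```

Because of the leading `\b`, prefixed names such as `GITHUB_TOKEN=` pass through untouched; widen the alternation if those should be covered as well.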
474
296
 
475
297
  ### Hook recipes
476
298
 
477
- Three patterns that don't have a built-in default. Copy-paste and tune.
299
+ Three patterns the framework can't auto-infer. Copy-paste and tune.
478
300
 
479
301
  ```ts
480
- // 1. Truncate MCP tool results.
481
- // Built-in tools (shell, read_file) already tail-truncate; MCP server outputs
482
- // don't, since their sizes vary wildly and zidane can't pick a sane default
483
- // on their behalf. Apply the same shape to mcp:tool:transform.
302
+ // 1. Truncate MCP tool results — sizes vary too much for a default.
484
303
  agent.hooks.hook('mcp:tool:transform', (ctx) => {
485
304
  if (ctx.outputBytes <= 8192 || typeof ctx.result !== 'string')
486
305
  return
@@ -496,7 +315,7 @@ agent.hooks.hook('tool:unknown', (ctx) => {
496
315
  }
497
316
  })
498
317
 
499
- // 3. Drop old turns once the conversation grows past a soft cap.
318
+ // 3. Drop old turns past a soft cap.
500
319
  agent.hooks.hook('context:transform', (ctx) => {
501
320
  const KEEP_RECENT = 30
502
321
  if (ctx.messages.length > KEEP_RECENT) {
@@ -506,92 +325,49 @@ agent.hooks.hook('context:transform', (ctx) => {
506
325
  })
507
326
  ```
508
327
 
509
- `mcp:tool:transform`, `tool:unknown`, and `context:transform` are the highest-leverage entries on the surface for the cases v3 doesn't auto-handle. Most production agents end up with one of each.
328
+ `mcp:tool:transform`, `tool:unknown`, and `context:transform` are the highest-leverage entries the framework doesn't auto-handle. Most production agents end up with one of each.
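The `splice(2, length - 30)` call in the earlier `context:transform` example keeps the first two messages and the most recent 28. A minimal standalone sketch of that arithmetic, with a plain array standing in for `ctx.messages`; `pruneMiddle` is a hypothetical name, not a zidane export.

```typescript
// Plain-array stand-in for ctx.messages; pruneMiddle is a hypothetical helper.
function pruneMiddle<T>(messages: T[], cap = 30, keepHead = 2): T[] {
  // Remove (length - cap) items in place, starting right after the kept head.
  if (messages.length > cap)
    messages.splice(keepHead, messages.length - cap)
  return messages
}

const history = Array.from({ length: 40 }, (_, i) => `m${i}`)
pruneMiddle(history)
console.log(history.length) // 30
console.log(history.slice(0, 3)) // [ 'm0', 'm1', 'm12' ]
```

The array is mutated in place, which matches how the hook examples modify `ctx.messages` rather than reassigning it.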
510
329
 
511
330
  ### Per-turn output budget
512
331
 
513
- When working with OSS models that return large tool outputs, set `behavior.toolOutputBudget` to inject a "summarize before continuing" message after any turn whose combined post-`tool:transform` tool-output bytes exceed the cap. Off by default.
514
-
515
- ```ts
516
- const agent = createAgent({
517
- ...basic,
518
- provider,
519
- behavior: { toolOutputBudget: 32768 },
520
- })
521
-
522
- agent.hooks.hook('budget:exceeded', (ctx) => {
523
- console.warn(`turn ${ctx.turn}: ${ctx.bytes} > ${ctx.budget} bytes`)
524
- })
525
-
526
- agent.hooks.hook('tool-budget:exceeded', (ctx) => {
527
- // Per-tool counterpart, fires when `behavior.toolBudgets[ctx.tool]` trips.
528
- // ctx.tool, ctx.count, ctx.max, ctx.turnId, ctx.mode ('steer' | 'block')
529
- console.warn(`tool ${ctx.tool} hit cap (${ctx.count}/${ctx.max}, mode=${ctx.mode})`)
530
- })
531
- ```
332
+ `behavior.toolOutputBudget` injects a "summarize before continuing" message when a turn's combined post-`tool:transform` bytes exceed the cap. Off by default. Subscribe via `budget:exceeded` (byte) and `tool-budget:exceeded` (per-tool, fields: `tool, count, max, turnId, mode`).
532
333
 
533
334
  ### Client-side context compaction (non-Anthropic)
534
335
 
535
- For non-Anthropic providers (cerebras / openai-compat / openrouter on OSS models), `behavior.compactStrategy: 'tail'` elides older `tool_result` blocks from the wire-level message list once their combined size exceeds `compactThreshold`. The newest `compactKeepTurns` messages stay intact so the model retains the freshest tool context.
536
-
537
- ```ts
538
- const agent = createAgent({
539
- ...basic,
540
- provider: cerebras({ apiKey: '...' }),
541
- behavior: {
542
- compactStrategy: 'tail',
543
- compactThreshold: 131_072, // 128 KiB; default
544
- compactKeepTurns: 4, // default
545
- },
546
- })
547
- ```
548
-
549
- Anthropic users should prefer the server-side `context-management-2025-06-27` beta (token-accurate, configured via `anthropic({ extraBetas, contextManagement })`) — `'tail'` is a client-side approximation that exists because OSS-model providers have no server-side equivalent.
336
+ `behavior.compactStrategy: 'tail'` elides older `tool_result` blocks once their combined size exceeds `compactThreshold` (default 128 KiB); the newest `compactKeepTurns` (default 4) stay intact. Anthropic users should prefer the server-side `context-management-2025-06-27` beta via `anthropic({ extraBetas, contextManagement })` — token-accurate.
550
337
 
551
338
  ### Read dedup + read-before-edit guard
552
339
 
553
- `behavior.dedupReads` (on by default) — `read_file` returns a short `"unchanged since the previous read"` stub instead of re-emitting bytes when the model re-reads the same file with the same slice. Per-session content-hash; requires a session.
554
-
555
- `behavior.requireReadBeforeEdit` (off by default) — `edit` and `multi_edit` reject when the file hasn't been read in the session, or when its on-disk content has drifted since the last read. Eliminates the silent-corruption case where a model edits against bytes it "remembers" but no longer reflect reality. Recommended on for stricter eval-grade runs.
340
+ - `behavior.dedupReads` (default **on**) — `read_file` returns `"unchanged since the previous read"` on identical re-reads. Per-session content-hash.
341
+ - `behavior.requireReadBeforeEdit` (default **off**) — `edit` / `multi_edit` reject when the file hasn't been read this session or has drifted. Recommended for eval-grade runs.
556
342
 
557
343
  ### Generic per-tool dedup
558
344
 
559
- `behavior.dedupTools` extends the read-file pattern to arbitrary tools. Provide a hasher per tool keyed by canonical name; identical inputs replay the prior result without re-running the tool. Requires a session.
560
-
561
- The hasher contract has **three return values, three meanings** — pick deliberately:
345
+ `behavior.dedupTools` extends the read-dedup pattern to arbitrary tools via a hasher keyed by canonical name. Requires a session. The hasher contract has **three return values, three meanings**:
562
346
 
563
347
  | Return | Meaning |
564
348
  |---|---|
565
- | non-empty string | Cache key for this call. Equal keys replay the prior result. |
566
- | `undefined` | **Skip dedup for this call.** Tool runs normally; nothing recorded. |
349
+ | non-empty string | Cache key. Equal keys replay the prior result. |
350
+ | `undefined` | Skip dedup for this call. Tool runs normally. |
567
351
  | `''` or non-string | Treated as `undefined` (defensive). |
568
352
 
569
353
  ```ts
570
354
  behavior: {
571
355
  dedupTools: {
572
- // Always cache by full input — every identical re-call dedups.
573
356
  todowrite: input => JSON.stringify(input),
574
-
575
- // Cache by a normalized subset; non-cacheable shapes opt out via `undefined`.
576
357
  execute_sql: (input) => {
577
358
  const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined
578
- if (!q || q.includes('now()') || q.includes('random()')) return undefined
359
+ if (!q || q.includes('now()') || q.includes('random()')) return undefined // non-cacheable
579
360
  return q
580
361
  },
581
362
  },
582
363
  }
583
364
  ```
584
365
 
585
- The `undefined` opt-out is **not** the same as `JSON.stringify(input)`, which would dedup against the verbatim input. Use `undefined` to mean "this specific call is not cacheable" (timestamps baked in, randomness, debug flags).
586
-
587
- Tools with side effects or non-deterministic outputs (network, time, randomness) **must not** be listed — there is no safety net beyond the consumer's hasher. For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).
366
+ Tools with side effects or non-determinism (network, time, randomness) **must not** be listed. For MCP tools, key by the namespaced wire name (`mcp_<server>_<tool>`).
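The three-way hasher contract can be modelled in a few lines. This is a sketch of the semantics described in the table, not zidane's implementation: `runWithDedup`, the `Map` cache, and the `exec` stub are all hypothetical. Only the `execute_sql` hasher logic is taken from the example above.

```typescript
type Hasher = (input: Record<string, unknown>) => unknown

// Hypothetical model of the contract; zidane's internals may differ.
function runWithDedup(
  cache: Map<string, string>,
  hasher: Hasher,
  input: Record<string, unknown>,
  run: () => string,
): string {
  const key = hasher(input)
  // `undefined`, '' and non-strings all mean "skip dedup for this call".
  if (typeof key !== 'string' || key === '')
    return run()
  const hit = cache.get(key)
  if (hit !== undefined)
    return hit // equal keys replay the prior result
  const result = run()
  cache.set(key, result)
  return result
}

// Same logic as the execute_sql hasher in the example above.
const sqlHasher: Hasher = (input) => {
  const raw = input.query
  const q = typeof raw === 'string' ? raw.trim().toLowerCase() : undefined
  if (!q || q.includes('now()') || q.includes('random()'))
    return undefined
  return q
}

let calls = 0
const cache = new Map<string, string>()
const exec = () => `rows#${++calls}`

runWithDedup(cache, sqlHasher, { query: 'SELECT 1' }, exec) // runs the tool
runWithDedup(cache, sqlHasher, { query: ' select 1 ' }, exec) // replayed from cache
runWithDedup(cache, sqlHasher, { query: 'SELECT now()' }, exec) // runs, nothing recorded
runWithDedup(cache, sqlHasher, { query: 'SELECT now()' }, exec) // runs again
console.log(calls) // 3
```

The normalised key (`trim().toLowerCase()`) is what makes the second call a hit even though its raw input differs.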
588
367
 
589
368
  ### Per-tool call budgets
590
369
 
591
- `behavior.toolBudgets` caps per-tool calls per run. Two reactions:
592
-
593
- - `'steer'` — let the call run, but emit a synthetic user message after the turn nudging the model to commit and finish. Fires once per tool per run.
594
- - `'block'` — refuse subsequent calls with `Blocked: <reason>`.
370
+ `behavior.toolBudgets` caps per-tool calls per run. `'steer'` lets the call run then nudges the model to commit (once per tool per run); `'block'` refuses with `Blocked: <reason>`.
595
371
 
596
372
  ```ts
597
373
  behavior: {
@@ -602,11 +378,11 @@ behavior: {
602
378
  }
603
379
  ```
604
380
 
605
- Pass a function for custom messages: `onExceed: ctx => ({ mode: 'steer', message: '...' })`. Subscribe to `tool-budget:exceeded` for telemetry. Counts include dedup hits — by design, since both eat against agent-loop sanity.
381
+ Pass a function for custom messages: `onExceed: ctx => ({ mode: 'steer', message: '...' })`. Counts include dedup hits — by design.
606
382
 
607
383
  ### Adaptive thinking budget
608
384
 
609
- `behavior.thinkingDecay` tapers the thinking budget across turns. Late turns are usually checkpoint / cleanup work where reasoning rarely pays for itself.
385
+ `behavior.thinkingDecay` tapers thinking across turns. Late turns are usually checkpoint work where reasoning rarely pays off.
610
386
 
611
387
  ```ts
612
388
  behavior: {
@@ -616,81 +392,51 @@ behavior: {
616
392
  }
617
393
  ```
618
394
 
619
- Pass a function for arbitrary curves: `thinkingDecay: (turn, base) => base / Math.sqrt(turn)`. No-op when `thinkingBudget` is unset. Honored by every provider that respects `thinkingBudget`.
395
+ Pass a function for arbitrary curves: `thinkingDecay: (turn, base) => base / Math.sqrt(turn)`. No-op when `thinkingBudget` is unset.
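To see what the square-root curve does to a budget, the sketch below evaluates it for a few turns. The base of 8000 is an illustrative value, not a zidane default, and the sketch assumes `turn` is 1-based (a 0-based turn would divide by zero); whether the library rounds the result is not stated here, so the rounding is only for display.

```typescript
// Same curve as the inline example; the (turn, base) signature is taken from it.
const sqrtDecay = (turn: number, base: number): number => base / Math.sqrt(turn)

const base = 8000 // illustrative thinkingBudget, not a library default
for (const turn of [1, 4, 16])
  console.log(turn, Math.round(sqrtDecay(turn, base)))
// 1 8000
// 4 4000
// 16 2000
```

The budget halves each time the turn number quadruples, so the taper is steep early and shallow later.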
620
396
 
621
397
  ## Steering and Follow-up
622
398
 
623
- ### Steering
624
-
625
- Inject a message while the agent is working. Delivered between tool calls.
399
+ - `agent.steer(msg)` — inject mid-run, delivered between tool calls.
400
+ - `agent.followUp(msg)` — queue for after the run finishes.
626
401
 
627
402
  ```ts
628
403
  agent.steer('focus only on the tests directory')
629
- ```
630
-
631
- ### Follow-up
632
-
633
- Queue messages that extend the conversation after the agent finishes.
634
-
635
- ```ts
636
404
  agent.followUp('now write tests for what you built')
637
405
  ```
638
406
 
639
407
  ## Sub-agent Spawning
640
408
 
641
- The `spawn` tool delegates tasks to child agents that run independently.
409
+ `spawn` delegates to independent child agents. Children inherit the parent's preset (tools, system, aliases, MCP servers, skills, behavior) by default. Pass `preset` on `createSpawnTool()` to override per child.
642
410
 
643
411
  ```ts
644
- import { basicTools, definePreset } from 'zidane/presets'
645
- import { createSpawnTool } from 'zidane/tools'
412
+ import { basicTools, definePreset, createSpawnTool } from 'zidane'
646
413
 
647
- const orchestrator = definePreset({
414
+ definePreset({
648
415
  name: 'orchestrator',
649
416
  tools: {
650
417
  ...basicTools,
651
- spawn: createSpawnTool({
652
- maxConcurrent: 5,
653
- model: 'claude-haiku-4-5-20251001',
654
- thinking: 'low',
655
- }),
418
+ spawn: createSpawnTool({ maxConcurrent: 5, model: 'claude-haiku-4-5-20251001', thinking: 'low' }),
656
419
  },
657
420
  })
658
421
  ```
659
422
 
660
- Children inherit the parent's preset (tools, system prompt, aliases, MCP servers, skills, behavior) and can spawn their own children. Pass `preset` on `createSpawnTool()` to override the inherited slice per child.
661
-
662
423
  ## Interaction Tool
663
424
 
664
- Let the agent pause and request structured input from the outside world. Not included in any preset by default.
425
+ Pause the agent and request structured input. Not in any preset by default. `onRequest` may be async — the agent waits. Return a string or object.
665
426
 
666
427
  ```ts
667
- import { basicTools, definePreset } from 'zidane/presets'
668
- import { createInteractionTool } from 'zidane/tools'
428
+ import { createInteractionTool } from 'zidane'
669
429
 
670
430
  const askUser = createInteractionTool({
671
431
  name: 'ask_user',
672
- schema: {
673
- type: 'object',
674
- properties: { question: { type: 'string' } },
675
- required: ['question'],
676
- },
677
- onRequest: async (payload) => {
678
- const answer = await promptUser(payload.question)
679
- return { answer }
680
- },
681
- })
682
-
683
- const interactive = definePreset({
684
- name: 'interactive',
685
- tools: { ...basicTools, ask_user: askUser },
432
+ schema: { type: 'object', properties: { question: { type: 'string' } }, required: ['question'] },
433
+ onRequest: async ({ question }) => ({ answer: await promptUser(question) }),
686
434
  })
687
435
  ```
688
436
 
689
- `onRequest` can be async — the agent waits for the response. Return a string or object (objects are JSON-stringified).
690
-
691
437
  ## Sessions
692
438
 
693
- Sessions give an agent persistent turn history and run metadata across calls.
439
+ Persistent turn history + run metadata across calls. Turns persist incrementally — a crash leaves history up to the last completed turn.
694
440
 
695
441
  ```ts
696
442
  import { createAgent, createSession, createSqliteStore } from 'zidane'
@@ -703,45 +449,13 @@ await agent.run({ prompt: 'hello' })
703
449
  await session.save()
704
450
  ```
705
451
 
706
- Turns are persisted incrementally after each turn, not as a full save. If the agent crashes, you have turns up to the last completed turn.
707
-
708
- ### Storage backends
709
-
710
- ```ts
711
- import { createMemoryStore, createRemoteStore, createFileMapStore } from 'zidane/session'
712
- import { createSqliteStore } from 'zidane/session/sqlite' // separate subpath (Bun-only)
713
-
714
- createMemoryStore() // in-memory, no persistence
715
- createSqliteStore({ path: './sessions.db' }) // SQLite via bun:sqlite — WAL mode, per-turn flush
716
- createRemoteStore({ url: 'https://api.example.com' }) // HTTP REST API
717
- createFileMapStore(hostAdapter) // bridge to any { get, save, delete } file-map backend
718
- ```
719
-
720
- `createSqliteStore` lives on its own subpath because it depends on `bun:sqlite`. Non-Bun consumers importing from `zidane` or `zidane/session` never evaluate that module.
721
-
722
- ### Restoring a session
452
+ Storage backends: `createMemoryStore()` (in-memory), `createSqliteStore({ path })` from `zidane/session/sqlite` (Bun-only subpath; WAL, per-turn flush), `createRemoteStore({ url })` (HTTP), `createFileMapStore(adapter)` (any `{ get, save, delete }` backend; `turns.jsonl` + `meta.json`).
723
453
 
724
- ```ts
725
- import { loadSession } from 'zidane/session'
726
-
727
- const session = await loadSession(store, 'my-session')
728
- if (session) {
729
- const agent = createAgent({ ...basic, provider, session })
730
- await agent.run({ prompt: 'continue' })
731
- }
732
- ```
733
-
734
- ### Session hooks
735
-
736
- ```ts
737
- agent.hooks.hook('session:start', (ctx) => { /* ctx.sessionId, ctx.runId, ctx.prompt */ })
738
- agent.hooks.hook('session:end', (ctx) => { /* ctx.sessionId, ctx.runId, ctx.status, ctx.turnRange */ })
739
- agent.hooks.hook('session:turns', (ctx) => { /* ctx.sessionId, ctx.turns (SessionTurn[]), ctx.count */ })
740
- ```
454
+ Restore via `await loadSession(store, id)`. Session hooks: `session:start`, `session:turns`, `session:end` (always fires, carries `turnRange`).
741
455
 
742
456
  ## MCP Servers
743
457
 
744
- Connect any MCP-compatible tool server. Tools are namespaced as `mcp_{server}_{tool}`.
458
+ Connect any MCP server. Tools are namespaced `mcp_{server}_{tool}`. Connections are lazy (first `run()`) and reused; all servers bootstrap in parallel.
745
459
 
746
460
  ```ts
747
461
  const agent = createAgent({
@@ -754,93 +468,55 @@ const agent = createAgent({
754
468
  })
755
469
  ```
756
470
 
757
- MCP servers can live on a preset too (they're just `AgentOptions` fields). Connections are lazy (first `run()`) and reused.
758
- Set `bootstrapTimeout` to cap how long a slow `connect + listTools` phase can delay the first model request. Per-server `disclosure: 'lazy' | 'eager'` overrides the agent-wide `behavior.toolDisclosure` (see [Progressive tool disclosure](#progressive-tool-disclosure)).
471
+ Per-server `disclosure: 'lazy' | 'eager'` overrides `behavior.toolDisclosure` (see [Progressive tool disclosure](#progressive-tool-disclosure)).
759
472
 
760
473
  ### Hiding bootstrap latency
761
474
 
762
- Every server is bootstrapped in parallel, but the first `run()` still waits for the slowest one. Two knobs to hide the cost:
475
+ The first `run()` still waits on the slowest server. Two knobs:
763
476
 
764
477
  ```ts
765
- // Option 1: pre-warm manually behind other setup work.
766
- const agent = createAgent({ provider, mcpServers })
767
- await Promise.all([agent.warmup(), authenticate(), loadConfig()])
768
- await agent.run({ prompt: 'go' }) // no MCP wait here
769
-
770
- // Option 2 — let createAgent kick the warmup off for you.
771
- const agent = createAgent({ provider, mcpServers, eager: true })
772
- // ... unrelated startup work ...
773
- await agent.run({ prompt: 'go' }) // awaits the in-flight warmup
478
+ await Promise.all([agent.warmup(), authenticate(), loadConfig()]) // option 1: pre-warm an existing agent manually
479
+ const agent = createAgent({ provider, mcpServers, eager: true }) // option 2: let createAgent start the warmup
774
480
  ```
775
481
 
776
- `warmup()` is idempotent and safe to call from multiple callers concurrently. Failures are surfaced on the next `warmup()` / `run()` rather than crashing the eager kickoff.
777
-
778
- ### Observability
482
+ `warmup()` is idempotent and concurrency-safe. Failures surface on the next `warmup()` / `run()`, not on the eager kickoff.
779
483
 
780
- Two hooks fire around each per-server bootstrap, regardless of success:
484
+ Two hooks fire per bootstrap regardless of outcome — attribute cold-start latency per server:
781
485
 
782
486
  ```ts
783
- agent.hooks.hook('mcp:bootstrap:start', ({ name, transport }) => { /* ... */ })
784
487
  agent.hooks.hook('mcp:bootstrap:end', (ctx) => {
785
- // ctx.name, ctx.transport, ctx.durationMs
786
- // ctx.ok === true → ctx.toolCount
787
- // ctx.ok === false → ctx.error
488
+ // ctx.name, ctx.transport, ctx.durationMs, ctx.ok
489
+ // ok ? ctx.toolCount : ctx.error
788
490
  })
789
491
  ```
790
492
 
791
- Use these to attribute cold-start latency per server — the only way to know if a specific MCP (e.g. a remote GitHub MCP) is the one stretching your first `run()`.
792
-
793
493
  ## Progressive tool disclosure
794
494
 
795
- When MCP brings hundreds of tools, every turn ships every schema in the tool list. `behavior.toolDisclosure: 'lazy'` flips MCP tools to a name-only catalog in the system prompt and auto-injects a `tool_search` native tool the model uses to load schemas on demand. Native (non-MCP) tools and skill tools are always eager — only MCP tools are eligible for lazy disclosure.
495
+ With hundreds of MCP tools, every turn ships every schema. `behavior.toolDisclosure: 'lazy'` flips MCP tools to a name-only catalog and auto-injects a `tool_search` native tool. Native + skill tools stay eager.
796
496
 
797
497
  ```ts
798
498
  const agent = createAgent({
799
499
  ...basic,
800
500
  provider,
801
501
  mcpServers: [
802
- { name: 'github', transport: 'stdio', command: 'gh-mcp' }, // 200+ tools
803
- { name: 'fs', transport: 'stdio', command: 'fs-mcp', disclosure: 'eager' }, // override per-server
502
+ { name: 'github', transport: 'stdio', command: 'gh-mcp' }, // 200+ tools
503
+ { name: 'fs', transport: 'stdio', command: 'fs-mcp', disclosure: 'eager' }, // per-server override
804
504
  ],
805
- behavior: {
806
- toolDisclosure: 'lazy',
807
- toolSearch: { limit: 20 }, // default cap on results per call
808
- },
505
+ behavior: { toolDisclosure: 'lazy', toolSearch: { limit: 20 } },
809
506
  })
810
507
  ```
811
508
 
812
- The catalog appended to the system prompt looks like:
813
-
814
- ```
815
- <searchable_tools>
816
- <server name="github">
817
- <tool name="mcp_github_search_issues">Search GitHub issues by query.</tool>
818
- <tool name="mcp_github_create_pr">Open a pull request.</tool>
819
-
820
- </server>
821
- </searchable_tools>
822
- ```
823
-
824
- `tool_search` accepts `query` (substring), `names` (explicit), `server` (bulk-unlock one server), and `limit`. Surfaced tools persist for the rest of the run; the loop rebuilds the wire-level tool list each turn so the next provider call advertises them.
825
-
826
- Two hard guarantees:
827
-
828
- - **Hard gate.** A `tool:gate` middleware refuses dispatch on lazy tools the model hasn't surfaced yet — production providers already enforce this server-side, but the in-loop gate covers custom / mock / lenient providers and any path where a model quotes a name straight from the catalog. Refusal text points the model at `tool_search` so it self-corrects.
829
- - **Aliasing-safe.** Catalog and `tool_search` results show the **wire** (`toolAliases`-rewritten) name — the only one the provider accepts. The unlock set is keyed by canonical name so dispatch / `session.turns` / hook contexts stay alias-stable.
830
-
831
- Cost model: each `tool_search` call appends to the wire-level tool list, advancing the provider's tool-list cache breakpoint. That costs one cache miss per discovery wave; subsequent turns with the same unlocked set hit cache normally. With many lazy tools and few discovery waves, this still beats eager (which always sends every schema) — but it's not a free optimisation.
509
+ System prompt gains `<searchable_tools>` with `name + description` per lazy tool. `tool_search` accepts `query` (substring), `names`, `server`, `limit` — matches unlock for the rest of the run. A `tool:gate` middleware refuses dispatch on un-surfaced lazy tools (covers custom/mock providers; production providers also refuse server-side). Catalog + search results show the **wire** name; the unlock set keys on canonical so dispatch and `session.turns` stay alias-stable.
832
510
 
833
- Opt out via `behavior.toolSearch.tool: false` (the catalog still emits, the call-to-action prose drops). Pre-existing host tool named `tool_search` shadows the auto-injection — see the JSDoc on `behavior.toolSearch` for the host-defined-tool semantics.
511
+ Cost: one cache miss per discovery wave (the tool list grows); subsequent turns hit cache. Opt out via `behavior.toolSearch.tool: false` (catalog still emits, call-to-action drops). A pre-existing host tool named `tool_search` shadows the auto-injection.
834
512
 
835
513
  ## Skills
836
514
 
837
- Reusable instruction packages following the [Agent Skills](https://agentskills.io/specification) open standard.
838
-
839
- ### SKILL.md format
515
+ Reusable instruction packages following the [Agent Skills](https://agentskills.io/specification) standard.
840
516
 
841
517
  ```
842
518
  my-skill/
843
- SKILL.md
519
+ SKILL.md # frontmatter + instructions
844
520
  scripts/ # optional
845
521
  references/ # optional
846
522
  assets/ # optional
@@ -856,92 +532,52 @@ allowed-tools: Bash Read Write
856
532
  paths: "src/**/*.ts, test/**/*.ts"
857
533
  ---
858
534
 
859
- Full instructions the agent receives when this skill activates.
535
+ Full instructions the agent receives on activation.
860
536
  ```
861
537
 
862
- ### Discovery
863
-
864
- Scan paths in priority order (first found wins):
865
-
866
- 1. `{cwd}/.agents/skills`
867
- 2. `{cwd}/.zidane/skills`
868
- 3. `~/.agents/skills`
869
- 4. `~/.zidane/skills`
870
-
871
- ### Configuration
538
+ Default scan paths (first found wins): `{cwd}/.agents/skills`, `{cwd}/.zidane/skills`, `~/.agents/skills`, `~/.zidane/skills`. Instructions support `!\`command\`` — runs during resolution; output replaces the placeholder.
872
539
 
873
540
  ```ts
874
541
  import { createAgent, defineSkill } from 'zidane'
875
542
 
876
- const agent = createAgent({
543
+ createAgent({
877
544
  ...basic,
878
545
  provider,
879
546
  skills: {
880
547
  scan: ['./custom-skills'],
881
- write: [
882
- defineSkill({
883
- name: 'review',
884
- description: 'Code review guidelines.',
885
- instructions: 'Review for correctness and test coverage.',
886
- }),
887
- ],
548
+ write: [defineSkill({ name: 'review', description: 'Code review.', instructions: '...' })],
888
549
  exclude: ['deprecated-skill'],
889
550
  enabled: ['review', 'deploy'],
890
551
  },
891
552
  })
892
553
  ```
893
554
 
894
- Instructions support `!\`command\`` for dynamic content — commands run during resolution and output replaces the placeholder.
895
-
896
555
  ## Execution Contexts
897
556
 
898
- An execution context defines **where** tools run. Defaults to in-process.
899
-
900
- ### Docker
557
+ Where tools run. Defaults to in-process. Docker isolates; sandbox runs remotely (E2B, Rivet, custom).
901
558
 
902
559
  ```ts
903
- import { createAgent, createDockerContext } from 'zidane'
560
+ import { createDockerContext, createSandboxContext } from 'zidane'
904
561
 
905
- const agent = createAgent({
906
- ...basic,
907
- provider,
908
- execution: createDockerContext({
909
- image: 'node:22',
910
- cwd: '/workspace',
911
- limits: { memory: 512, cpu: '1.0' },
912
- }),
913
- })
914
- ```
915
-
916
- ### Sandbox (remote)
917
-
918
- Implement `SandboxProvider` for your provider (E2B, Rivet, etc.):
919
-
920
- ```ts
921
- import { createAgent, createSandboxContext } from 'zidane'
922
-
923
- const agent = createAgent({
924
- ...basic,
925
- provider,
926
- execution: createSandboxContext(myProvider),
927
- })
562
+ createDockerContext({ image: 'node:22', cwd: '/workspace', limits: { memory: 512, cpu: '1.0' } })
563
+ createSandboxContext(myProvider) // implement SandboxProvider
928
564
  ```
929
565
 
930
566
  ## State Management
931
567
 
932
568
  ```ts
933
- agent.isRunning // is a run in progress?
934
- agent.turns // conversation history (SessionTurn[])
935
- agent.abort() // cancel the current run
936
- agent.reset() // clear messages and queues
937
- await agent.warmup() // pre-connect MCP (idempotent, safe to call concurrently)
938
- await agent.destroy() // clean up context + MCP connections
939
- await agent.waitForIdle() // wait for current run to complete
569
+ agent.isRunning // run in progress?
570
+ agent.turns // SessionTurn[]
571
+ agent.abort() // cancel current run
572
+ agent.reset() // clear turns + queues
573
+ await agent.warmup() // pre-connect MCP (idempotent)
574
+ await agent.destroy() // clean up context + MCP
575
+ await agent.waitForIdle() // wait for run to complete
940
576
  ```
941
577
 
942
578
  ## Message Format
943
579
 
944
- All messages use a canonical format. Providers convert to/from wire formats internally.
580
+ Canonical format. Providers convert to/from wire formats internally.
945
581
 
946
582
  ```ts
947
583
  type SessionContentBlock =
@@ -954,24 +590,15 @@ type SessionContentBlock =
954
590
  type ToolResultContent =
955
591
  | { type: 'text', text: string }
956
592
  | { type: 'image', mediaType: string, data: string }
957
-
958
- interface SessionMessage {
959
- role: 'user' | 'assistant'
960
- content: SessionContentBlock[]
961
- }
962
593
  ```
963
594
 
964
- Tool results can carry structured content — pure-text tools keep returning a `string`, tools that produce images (MCP browser servers, screenshot tools) return a `ToolResultContent[]` that the loop routes natively on providers with `imageInToolResult: true` and via a companion user message elsewhere. Use `toolResultToText(output)` to flatten when a consumer only handles strings.
965
-
966
- Converters for external interop:
595
+ Image-producing tools (MCP browsers, screenshots) return a `ToolResultContent[]`, which the loop routes natively on providers with `imageInToolResult: true` and via a companion user message elsewhere. Flatten with `toolResultToText(output)` when a consumer only handles strings.
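
As a rough illustration of what flattening structured tool output can look like, here is a minimal sketch. `flattenToolResult` is a hypothetical stand-in, not zidane's `toolResultToText`, and the `[image: ...]` placeholder format is an assumption; only the `ToolResultContent` shape is taken from the README.

```ts
// Shape copied from the README's ToolResultContent union.
type ToolResultContent =
  | { type: 'text', text: string }
  | { type: 'image', mediaType: string, data: string }

// Hypothetical stand-in for toolResultToText.
// Assumption: images are replaced by a short placeholder naming their media type.
function flattenToolResult(output: string | ToolResultContent[]): string {
  if (typeof output === 'string')
    return output // pure-text tools already return a string
  return output
    .map(block => block.type === 'text' ? block.text : `[image: ${block.mediaType}]`)
    .join('\n')
}

const flattened = flattenToolResult([
  { type: 'text', text: 'Screenshot captured.' },
  { type: 'image', mediaType: 'image/png', data: 'iVBORw0KGgo=' },
])
console.log(flattened)
```

The real helper may render images differently; the point is only that a consumer limited to strings can still log or store mixed output.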
967
596
 
968
- ```ts
969
- import { fromAnthropic, toAnthropic, fromOpenAI, toOpenAI, autoDetectAndConvert } from 'zidane'
970
- ```
597
+ External interop converters: `fromAnthropic`, `toAnthropic`, `fromOpenAI`, `toOpenAI`, `autoDetectAndConvert` (re-exported from `zidane`).
971
598
 
972
599
  ## Typed Errors
973
600
 
974
- Provider failures are wrapped into typed error classes before leaving `agent.run()`; match on `instanceof` instead of sniffing strings.
601
+ Provider failures are wrapped before leaving `agent.run()`. Match on `instanceof`, not strings. Every provider ships `classifyError(err)`; unrecognized errors fall through as `AgentProviderError`. Abort paths (`agent.abort()` / `AbortSignal`) always produce `AgentAbortedError`.
975
602
 
976
603
  ```ts
977
604
  import { AgentAbortedError, AgentContextExceededError, AgentProviderError } from 'zidane'
@@ -980,23 +607,17 @@ try {
980
607
  await agent.run({ prompt })
981
608
  }
982
609
  catch (err) {
983
- if (err instanceof AgentContextExceededError) {
984
- // prune history, retry
985
- }
986
- else if (err instanceof AgentAbortedError) {
987
- // user cancelled
988
- }
610
+ if (err instanceof AgentContextExceededError) { /* prune history, retry */ }
611
+ else if (err instanceof AgentAbortedError) { /* user cancelled */ }
989
612
  else if (err instanceof AgentProviderError) {
990
613
  console.error(`${err.provider}: ${err.message} (${err.providerCode})`)
991
614
  }
992
615
  }
993
616
  ```
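
The "prune history, retry" branch could be structured roughly as follows. This is a sketch under stated assumptions: `ContextExceededError` is a local stand-in for `AgentContextExceededError`, and `runWithPruning`, `prune`, and `maxRetries` are hypothetical names, not part of zidane.

```ts
// Local stand-in; real code would import AgentContextExceededError from 'zidane'.
class ContextExceededError extends Error {}

// Run `attempt`; on a context-exceeded failure, call `prune` and try again.
// Any other error, or exhausting `maxRetries`, rethrows to the caller.
async function runWithPruning<T>(
  attempt: () => Promise<T>,
  prune: () => void,
  maxRetries = 2,
): Promise<T> {
  for (let tries = 0; ; tries++) {
    try {
      return await attempt()
    }
    catch (err) {
      if (!(err instanceof ContextExceededError) || tries >= maxRetries)
        throw err
      prune()
    }
  }
}

// Simulated run: fails while the history holds more than two entries.
const history = ['a', 'b', 'c', 'd']
const pruned = runWithPruning(
  async () => {
    if (history.length > 2)
      throw new ContextExceededError('context window exceeded')
    return history.length
  },
  () => { history.shift() }, // drop the oldest entry
)
pruned.then(length => console.log(length)) // 2
```

Bounding the retries keeps a history that can never fit from looping forever; what "prune" means (dropping turns, summarizing) is left to the caller.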
994
617
 
995
- Every provider ships a `classifyError(err)` that maps native errors into a `ClassifiedError` union — unrecognized errors fall through as `AgentProviderError`. Abort paths (`agent.abort()` or a triggered `AbortSignal`) always produce `AgentAbortedError` regardless of classification.
996
-
997
618
  ## Structured Output
998
619
 
999
- Force the agent's final response to match a JSON Schema via provider-level tool forcing.
620
+ Force the final response to match a JSON Schema via provider-level tool forcing. The parsed result lands on `stats.output`, and the `output` hook fires with `ctx.output` and `ctx.schema`.
1000
621
 
1001
622
  ```ts
1002
623
  const stats = await agent.run({
@@ -1009,84 +630,45 @@ const stats = await agent.run({
1009
630
  },
1010
631
  },
1011
632
  })
1012
-
1013
633
  console.log(stats.output) // { name: 'Alice', age: 30 }
1014
634
  ```
1015
635
 
1016
- The `output` hook fires when structured output is extracted:
1017
-
1018
- ```ts
1019
- agent.hooks.hook('output', (ctx) => {
1020
- // ctx.output — the parsed JSON matching the schema
1021
- // ctx.schema — the schema that was enforced
1022
- })
1023
- ```
1024
-
1025
- ### Zod v4 integration
1026
-
1027
- Use `zodToJsonSchema` to normalize `z.toJsonSchema()` output for tool schemas:
1028
-
1029
- ```ts
1030
- import { z } from 'zod'
1031
- import { zodToJsonSchema } from 'zidane'
1032
-
1033
- const schema = zodToJsonSchema(z.toJsonSchema(z.object({ name: z.string() })))
1034
- ```
636
+ For Zod v4, normalize via `zodToJsonSchema(z.toJsonSchema(schema))`, which strips `$schema` (some providers reject it).
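
To make the `$schema` stripping concrete, here is a minimal sketch of that one step. `stripSchemaKey` is a hypothetical helper, not zidane's `zodToJsonSchema` (which may normalize more than this), and the sample schema is illustrative.

```ts
type JsonSchema = Record<string, unknown>

// Hypothetical helper: returns a shallow copy without the top-level `$schema` key.
// The input object is left untouched.
function stripSchemaKey(schema: JsonSchema): JsonSchema {
  const { $schema: _dropped, ...rest } = schema
  return rest
}

// Illustrative schema of the kind a generator might emit.
const generated: JsonSchema = {
  $schema: 'https://json-schema.org/draft/2020-12/schema',
  type: 'object',
  properties: { name: { type: 'string' } },
  required: ['name'],
}

const normalized = stripSchemaKey(generated)
console.log('$schema' in normalized) // false
```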
1035
637
 
1036
638
  ## Usage Tracking
1037
639
 
1038
- `stats.totalIn` / `stats.totalOut` / `stats.cost` are **cumulative**: parent
1039
- loop plus every recursively-spawned sub-agent. `stats.turns` and
1040
- `stats.turnUsage` cover the parent loop only; reach for the helpers below for
1041
- tree-wide breakdowns.
640
+ `stats.totalIn` / `stats.totalOut` / `stats.cost` are **cumulative** (parent + recursive children). `stats.turns` and `stats.turnUsage` cover the parent loop only. Use helpers for tree-wide breakdowns.
1042
641
 
1043
642
  ```ts
1044
643
  import { flattenTurns, statsByModel } from 'zidane'
1045
644
 
1046
- const stats = await agent.run({ prompt: 'hello' })
1047
- stats.totalIn // cumulative input tokens (parent + recursive children)
1048
- stats.totalOut // cumulative output tokens
1049
- stats.cost // cumulative USD cost (when reported by provider)
1050
- stats.turnUsage // TurnUsage[] — parent loop only
1051
- stats.children // ChildRunStats[] — recursive subtree, completion order
1052
- stats.timeTillFirstTokenMs // ms from run() start to the first stream/tool event
1053
-
1054
- flattenTurns(stats) // every TurnUsage in the tree, parent first then DFS children
1055
- statsByModel(stats) // Map<modelId, { input, output, cost, cacheRead, cacheCreation, turns }>
1056
- ```
1057
-
1058
- ## Types
645
+ stats.totalIn / stats.totalOut / stats.cost // cumulative
646
+ stats.turnUsage // parent loop only
647
+ stats.children // ChildRunStats[] in completion order
648
+ stats.timeTillFirstTokenMs // ms to first stream/tool event
1059
649
 
1060
- All types are available from `zidane/types`:
1061
-
1062
- ```ts
1063
- import type { Agent, SessionTurn, TurnUsage, Provider, ToolDef, ValidationResult } from 'zidane/types'
1064
-
1065
- // Hook context types for typed event handlers
1066
- import type { ToolHookContext, McpToolHookContext, SessionHookContext, StreamHookContext } from 'zidane/types'
650
+ flattenTurns(stats) // every TurnUsage in tree (DFS)
651
+ statsByModel(stats) // Map<modelId, { input, output, cost, cacheRead, cacheCreation, turns }>
1067
652
  ```
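
The tree-wide helpers can be pictured as a depth-first walk over the stats tree. The sketch below uses hypothetical minimal shapes (`TurnUsageLike`, `StatsNode`) and hypothetical function names; the real `TurnUsage` and `ChildRunStats` types carry more fields, and zidane's `flattenTurns` and `statsByModel` may differ in detail.

```ts
// Hypothetical minimal shapes, for illustration only.
interface TurnUsageLike { modelId: string, input: number, output: number }
interface StatsNode { turnUsage: TurnUsageLike[], children: StatsNode[] }

// Parent turns first, then each child's subtree depth-first.
function flattenTree(node: StatsNode): TurnUsageLike[] {
  return [...node.turnUsage, ...node.children.flatMap(child => flattenTree(child))]
}

// Sum input/output tokens and count turns per model across the whole tree.
function totalsByModel(node: StatsNode): Map<string, { input: number, output: number, turns: number }> {
  const totals = new Map<string, { input: number, output: number, turns: number }>()
  for (const turn of flattenTree(node)) {
    const entry = totals.get(turn.modelId) ?? { input: 0, output: 0, turns: 0 }
    entry.input += turn.input
    entry.output += turn.output
    entry.turns += 1
    totals.set(turn.modelId, entry)
  }
  return totals
}

// A parent with one child, which itself spawned a grandchild.
const tree: StatsNode = {
  turnUsage: [{ modelId: 'model-a', input: 100, output: 20 }],
  children: [{
    turnUsage: [{ modelId: 'model-b', input: 50, output: 10 }],
    children: [{ turnUsage: [{ modelId: 'model-a', input: 30, output: 5 }], children: [] }],
  }],
}

console.log(flattenTree(tree).map(turn => turn.modelId)) // [ 'model-a', 'model-b', 'model-a' ]
console.log(totalsByModel(tree).get('model-a')) // { input: 130, output: 25, turns: 2 }
```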
1068
653
 
1069
- Helpers (re-exported from the main entry):
654
+ ## Types & Helpers
1070
655
 
1071
- ```ts
1072
- import { toolResultToText, toolOutputByteLength, validateToolArgs } from 'zidane'
1073
- ```
656
+ Types from `zidane/types` (`Agent`, `SessionTurn`, `TurnUsage`, `Provider`, `ToolDef`, `ValidationResult`, hook contexts).
1074
657
 
1075
- - `toolResultToText(content)` — flatten `string | ToolResultContent[]` to a string for logging.
1076
- - `toolOutputByteLength(content)` — same formula the loop uses for `outputBytes`.
1077
- - `validateToolArgs(input, schema)` — the validator the loop runs between `tool:gate` and `tool:before`. Useful for unit tests of consumer tool definitions.
658
+ Helpers re-exported from `zidane`:
659
+ - `toolResultToText(content)` — flatten for logging.
660
+ - `toolOutputByteLength(content)` — same formula as `outputBytes`.
661
+ - `validateToolArgs(input, schema)` — the loop's validator.
1078
662
 
1079
- ## Testing
663
+ ## Testing & Benchmarks
1080
664
 
1081
665
  ```bash
1082
666
  bun test
1083
667
  ```
1084
668
 
1085
- 1000+ tests with mock provider and execution context. No API keys or Docker needed; the suite runs in under 2 seconds.
1086
-
1087
- ## Benchmarks
669
+ 1000+ tests with mock provider + execution context. Under 2 s; no API keys or Docker.
1088
670
 
1089
- Harness integrations for running zidane against public agent benchmarks live in [`benchmarks/`](./benchmarks). First integration: [Terminal-Bench](./benchmarks/terminal-bench).
671
+ Benchmark harnesses live in [`benchmarks/`](./benchmarks). First integration: [Terminal-Bench](./benchmarks/terminal-bench).
1090
672
 
1091
673
  ## License
1092
674