@tangle-network/agent-eval 0.20.10 → 0.20.12

Files changed (52)
  1. package/README.md +129 -126
  2. package/dist/benchmarks/index.d.ts +2 -1
  3. package/dist/{chunk-JAOLXRIA.js → chunk-75MCTH7P.js} +8 -2
  4. package/dist/chunk-75MCTH7P.js.map +1 -0
  5. package/dist/chunk-HKYRWNHV.js +1354 -0
  6. package/dist/chunk-HKYRWNHV.js.map +1 -0
  7. package/dist/{chunk-LSR4IAYN.js → chunk-HNJLMAJ2.js} +2 -2
  8. package/dist/chunk-IKFVX537.js +717 -0
  9. package/dist/chunk-IKFVX537.js.map +1 -0
  10. package/dist/chunk-KWUAAIHR.js +1764 -0
  11. package/dist/chunk-KWUAAIHR.js.map +1 -0
  12. package/dist/chunk-MCMV7DUL.js +1310 -0
  13. package/dist/chunk-MCMV7DUL.js.map +1 -0
  14. package/dist/chunk-ODFINDLQ.js +413 -0
  15. package/dist/chunk-ODFINDLQ.js.map +1 -0
  16. package/dist/chunk-PKCVBYTQ.js +200 -0
  17. package/dist/chunk-PKCVBYTQ.js.map +1 -0
  18. package/dist/chunk-YUFXO3TU.js +148 -0
  19. package/dist/chunk-YUFXO3TU.js.map +1 -0
  20. package/dist/cli.js +2 -2
  21. package/dist/control-C8NKbF3w.d.ts +258 -0
  22. package/dist/control.d.ts +5 -0
  23. package/dist/control.js +30 -0
  24. package/dist/control.js.map +1 -0
  25. package/dist/dataset-B9qvlm_o.d.ts +112 -0
  26. package/dist/emitter-BYO2nSDA.d.ts +387 -0
  27. package/dist/feedback-trajectory-BGQ_ANCN.d.ts +345 -0
  28. package/dist/{index-1PZOtZFr.d.ts → index-c5saLbKD.d.ts} +2 -133
  29. package/dist/index.d.ts +115 -2870
  30. package/dist/index.js +1049 -6156
  31. package/dist/index.js.map +1 -1
  32. package/dist/multi-shot-optimization-Bvtz294B.d.ts +598 -0
  33. package/dist/openapi.json +1 -1
  34. package/dist/optimization.d.ts +145 -0
  35. package/dist/optimization.js +60 -0
  36. package/dist/optimization.js.map +1 -0
  37. package/dist/reporting.d.ts +426 -0
  38. package/dist/reporting.js +32 -0
  39. package/dist/reporting.js.map +1 -0
  40. package/dist/run-record-CX_jcAyr.d.ts +134 -0
  41. package/dist/traces.d.ts +658 -0
  42. package/dist/traces.js +100 -0
  43. package/dist/traces.js.map +1 -0
  44. package/dist/wire/index.js +2 -2
  45. package/docs/concepts.md +16 -11
  46. package/docs/feature-guide.md +10 -17
  47. package/docs/integration-launch-gates.md +77 -0
  48. package/docs/product-eval-adoption.md +221 -0
  49. package/docs/trace-analysis.md +75 -0
  50. package/package.json +21 -1
  51. package/dist/chunk-JAOLXRIA.js.map +0 -1
  52. package/dist/{chunk-LSR4IAYN.js.map → chunk-HNJLMAJ2.js.map} +0 -0
package/README.md CHANGED
@@ -1,63 +1,24 @@
1
1
  # @tangle-network/agent-eval
2
2
 
3
- Trace-first evaluation infrastructure for agent systems.
4
-
5
- `agent-eval` provides the contracts and runtime primitives for measuring agent
6
- behavior: traces, harnesses, verifier pipelines, judges, datasets, holdout
7
- gates, failure classification, optimization loops, and release reports.
8
-
9
- It does not own your product state, credentials, UI, or model routing. Product
10
- teams keep those boundaries; this package standardizes how runs are recorded,
11
- checked, compared, and promoted.
12
-
13
- ## Contents
14
-
15
- - [When To Use It](#when-to-use-it)
16
- - [Architecture](#architecture)
17
- - [Install](#install)
18
- - [Core Primitives](#core-primitives)
19
- - [Examples](#examples)
20
- - [Documentation](#documentation)
21
- - [Development](#development)
22
- - [Related Packages](#related-packages)
23
-
24
- ## When To Use It
25
-
26
- Use `agent-eval` when you need one or more of these:
27
-
28
- - A reproducible eval harness for coding agents, builder agents, or multi-tool
29
- workflows.
30
- - Structured traces for agent runs: spans, artifacts, events, budgets, tool
31
- calls, retrieval, judge output, and sandbox execution.
32
- - Deterministic gates around build/test/deploy checks.
33
- - LLM-as-judge or deterministic judge fleets with calibration and canaries.
34
- - Dataset splits, holdouts, paired statistics, and release confidence gates.
35
- - Failure taxonomy that distinguishes prompt, tool, sandbox, retrieval,
36
- evaluator, and knowledge-readiness failures.
37
- - Optimization loops over prompts, steering, code mutations, or full multi-shot
38
- trajectories.
39
- - Report data for internal launch reviews, CI gates, and research analysis.
40
-
41
- ## Architecture
3
+ Evaluation infrastructure for agent products.
4
+
5
+ Use it to wrap the real workflow your users run, record what happened, verify
6
+ the result, turn feedback into replay data, compare variants, and ship only
7
+ when the evidence improves.
42
8
 
43
9
  ```txt
44
- agent/product run
45
- -> TraceEmitter / TraceStore
46
- -> SandboxHarness / MultiLayerVerifier / JudgeRunner
47
- -> failure taxonomy + metrics
48
- -> paired stats + held-out gates
49
- -> optimization + release confidence + reports
10
+ product task
11
+ -> observe state
12
+ -> validate with deterministic gates first
13
+ -> act through the real product adapter
14
+ -> trace + feedback trajectory
15
+ -> replay / optimize / release gate
50
16
  ```
51
17
 
52
- Package responsibilities:
53
-
54
- - `agent-eval`: run evidence, eval contracts, verification, statistics,
55
- optimization, reporting.
56
- - Product app: domain state, tools, credentials, UI, storage, deployment, model
57
- gateway.
58
- - `@tangle-network/agent-runtime`: production agent-loop/session runtime.
59
- - `@tangle-network/agent-knowledge`: evidence stores, claim/page synthesis,
60
- retrieval, knowledge readiness implementation.
18
+ `agent-eval` does not own product state, credentials, UI, storage, model
19
+ routing, browser drivers, sandbox policy, or deployment. Products own those.
20
+ This package owns eval contracts, loop mechanics, traces, statistics,
21
+ optimization inputs, and release evidence.
61
22
 
62
23
  ## Install
63
24
 
@@ -65,106 +26,148 @@ Package responsibilities:
65
26
  pnpm add @tangle-network/agent-eval
66
27
  ```
67
28
 
68
- Wire protocol / CLI:
69
-
70
- ```sh
71
- npm i -g @tangle-network/agent-eval
72
- agent-eval serve --port 5005
29
+ ## Quick Start
30
+
31
+ ```ts
32
+ import {
33
+ objectiveEval,
34
+ runAgentControlLoop,
35
+ } from '@tangle-network/agent-eval/control'
36
+
37
+ const result = await runAgentControlLoop({
38
+ intent: task.prompt,
39
+ budget: { maxSteps: 8, maxWallMs: 180_000, maxCostUsd: 2 },
40
+
41
+ observe() {
42
+ return product.readState(task.id)
43
+ },
44
+
45
+ validate({ state }) {
46
+ return [
47
+ objectiveEval({
48
+ id: 'build-passes',
49
+ passed: state.build.exitCode === 0,
50
+ severity: 'critical',
51
+ metadata: state.build,
52
+ }),
53
+ objectiveEval({
54
+ id: 'preview-serves',
55
+ passed: state.preview.httpStatus === 200,
56
+ severity: 'critical',
57
+ }),
58
+ ]
59
+ },
60
+
61
+ decide({ evals }) {
62
+ const failed = evals.filter((e) => !e.passed)
63
+ if (failed.length === 0) {
64
+ return { type: 'stop', pass: true, reason: 'all gates passed' }
65
+ }
66
+ return {
67
+ type: 'continue',
68
+ action: { type: 'repair', failed: failed.map((e) => e.id) },
69
+ reason: 'repair failed gates',
70
+ }
71
+ },
72
+
73
+ act(action) {
74
+ return product.runAgentStep(task.id, action)
75
+ },
76
+ })
77
+
78
+ await product.storeEvalResult(task.id, result)
73
79
  ```
74
80
 
75
- Python client source lives in `clients/python`. Until the PyPI package is
76
- published, install it from the repo:
81
+ That loop should be the same shape in production, replay, benchmark, and
82
+ optimization. Swap dependencies behind `observe()` and `act()`, not the eval
83
+ contract itself.
77
84
 
78
- ```sh
79
- cd clients/python
80
- pip install -e .
85
+ ## Import Paths
86
+
87
+ The root export remains available, but new code should prefer focused subpaths:
88
+
89
+ ```ts
90
+ import { runAgentControlLoop } from '@tangle-network/agent-eval/control'
91
+ import { runMultiShotOptimization } from '@tangle-network/agent-eval/optimization'
92
+ import { TraceEmitter } from '@tangle-network/agent-eval/traces'
93
+ import { renderReleaseReport } from '@tangle-network/agent-eval/reporting'
81
94
  ```
82
95
 
83
- ## Core Primitives
84
-
85
- | Primitive | Purpose |
86
- |---|---|
87
- | `TraceEmitter`, `TraceStore` | Append-only run/span/event/artifact/budget records. |
88
- | `SandboxHarness` | Build/test/runtime checks with captured stdout, stderr, exit codes, wall time, and parsed test counts. |
89
- | `MultiLayerVerifier` | Ordered verification stages with dependencies, skip-on-fail, findings, scores, and time caps. |
90
- | `JudgeRunner` | Parallel deterministic or LLM-backed judges over the same artifact/run. |
91
- | `runAgentControlLoop` | Observe/validate/decide/act loop with budgets, stop policies, and structured eval results. |
92
- | `Dataset`, `RunRecord`, `HeldOutGate` | Versioned corpora, reproducible run metadata, and held-out promotion decisions. |
93
- | `pairedBootstrap`, `pairedWilcoxon`, `bhAdjust` | Paired experiment statistics and multiple-comparison correction. |
94
- | `classifyFailure` | Rule-based failure classification for agent, tool, sandbox, retrieval, and knowledge failures. |
95
- | `runMultiShotOptimization` | Optimization over full agent trajectories with actionable side information. |
96
- | `runPromptEvolution` | Prompt/steering/code evolution over scenario scores. |
97
- | `evaluateReleaseConfidence` | Release scorecard across evidence volume, pass rate, score, overfit, cost, latency, and gates. |
98
- | `summaryTable`, `paretoChart`, `gainHistogram` | Report-ready structured outputs. |
99
- | `KnowledgeRequirement`, `KnowledgeBundle` | Shared contracts for knowledge readiness. |
100
-
101
- `NoopResearcher` is a fail-loud sentinel for wiring tests. Production systems
102
- should implement `Researcher` directly or use `CallbackResearcher`.
96
+ | Subpath | Use for |
97
+ | --- | --- |
98
+ | `@tangle-network/agent-eval/control` | `observe -> validate -> decide -> act`, action policy, propose/review loops |
99
+ | `@tangle-network/agent-eval/traces` | trace stores, emitters, TraceAnalyst |
100
+ | `@tangle-network/agent-eval/optimization` | feedback trajectories, multi-shot optimization, prompt evolution |
101
+ | `@tangle-network/agent-eval/reporting` | release confidence, paired stats, report/table/chart specs |
102
+ | `@tangle-network/agent-eval/wire` | HTTP/RPC judge server and schemas |
103
+ | `@tangle-network/agent-eval/benchmarks` | benchmark adapter contracts and reference wrappers |
104
+
105
+ ## Core Pieces
106
+
107
+ | Need | Use |
108
+ | --- | --- |
109
+ | Keep an agent working until objective state passes | `runAgentControlLoop` |
110
+ | Turn user/reviewer feedback into replay data | `FeedbackTrajectory` |
111
+ | Compare prompt/tool/retrieval policies over full trajectories | `runMultiShotOptimization` |
112
+ | Gate releases with paired evidence and holdouts | `evaluateReleaseConfidence`, `HeldOutGate` |
113
+ | Explain regressions across trace corpora | `TraceAnalyst` / `analyzeTraces` |
114
+ | Report a launch decision | `renderReleaseReport`, `summaryTable`, `paretoChart`, `gainHistogram` |
115
+ | Model missing context separately from bad reasoning | `KnowledgeRequirement`, `KnowledgeBundle` |
103
116
 
104
117
  ## Examples
105
118
 
106
- Runnable examples live in the repository's
107
- [`examples/`](https://github.com/tangle-network/agent-eval/tree/main/examples)
108
- directory. They are not part of the published npm package.
119
+ Runnable examples live in
120
+ [`examples/`](https://github.com/tangle-network/agent-eval/tree/main/examples).
109
121
 
110
- - [`examples/same-sandbox-harness`](https://github.com/tangle-network/agent-eval/tree/main/examples/same-sandbox-harness) - run
111
- multiple eval passes against the same workspace.
112
- - [`examples/multi-shot-optimization`](https://github.com/tangle-network/agent-eval/tree/main/examples/multi-shot-optimization) -
113
- optimize full agent trajectories with held-out promotion.
114
- - [`examples/benchmarks`](https://github.com/tangle-network/agent-eval/tree/main/examples/benchmarks) - benchmark adapter shape and
115
- reference benchmark wrappers.
122
+ - [`examples/multi-shot-optimization`](https://github.com/tangle-network/agent-eval/tree/main/examples/multi-shot-optimization):
123
+ optimize full trajectories with held-out promotion.
124
+ - [`examples/same-sandbox-harness`](https://github.com/tangle-network/agent-eval/tree/main/examples/same-sandbox-harness):
125
+ run setup/build/test and evidence checks in one workspace.
126
+ - [`examples/benchmarks`](https://github.com/tangle-network/agent-eval/tree/main/examples/benchmarks):
127
+ benchmark adapter shape and reference wrappers.
116
128
 
117
- The examples are intentionally kept outside the README so they can be expanded,
118
- tested, and copied without turning this page into a tutorial.
129
+ ## Docs
119
130
 
120
- ## Documentation
131
+ Read in this order:
121
132
 
122
- - [Concepts](./docs/concepts.md)
123
- - [Feature Guide](./docs/feature-guide.md)
124
- - [Control Runtime](./docs/control-runtime.md)
125
- - [Knowledge Readiness](./docs/knowledge-readiness.md)
126
- - [Multi-Shot Optimization](./docs/multi-shot-optimization.md)
127
- - [Feedback Trajectories](./docs/feedback-trajectories.md)
128
- - [Wire Protocol](./docs/wire-protocol.md)
133
+ 1. [Product Eval Adoption](./docs/product-eval-adoption.md)
134
+ 2. [Control Runtime](./docs/control-runtime.md)
135
+ 3. [Feedback Trajectories](./docs/feedback-trajectories.md)
136
+ 4. [Multi-Shot Optimization](./docs/multi-shot-optimization.md)
137
+ 5. [Trace Analysis](./docs/trace-analysis.md)
138
+ 6. [Knowledge Readiness](./docs/knowledge-readiness.md)
139
+ 7. [Integration Launch Gates](./docs/integration-launch-gates.md)
140
+ 8. [Wire Protocol](./docs/wire-protocol.md)
129
141
 
130
- ## Development
142
+ ## CLI / Wire Protocol
131
143
 
132
144
  ```sh
133
- pnpm install
134
- pnpm typecheck
135
- pnpm test
136
- pnpm build
137
- pnpm openapi
145
+ npm i -g @tangle-network/agent-eval
146
+ agent-eval serve --port 5005
138
147
  ```
139
148
 
140
- Run the local server:
149
+ The Python client lives in `clients/python`:
141
150
 
142
151
  ```sh
143
- pnpm build
144
- node dist/cli.js serve --port 5005
152
+ cd clients/python
153
+ pip install -e .
145
154
  ```
146
155
 
147
- Python client tests:
156
+ ## Development
148
157
 
149
158
  ```sh
159
+ pnpm install
160
+ pnpm typecheck
161
+ pnpm test
150
162
  pnpm build
151
- cd clients/python
152
- pip install -e ".[dev]"
153
- pytest
163
+ pnpm openapi
154
164
  ```
155
165
 
156
- ## Release
157
-
158
- `@tangle-network/agent-eval` publishes to npm. The Python client lives under
159
- `clients/python` and is versioned from this repository.
160
-
161
166
  ## Related Packages
162
167
 
163
- - [`@tangle-network/agent-runtime`](https://github.com/tangle-network/agent-runtime)
164
- - [`@tangle-network/agent-knowledge`](https://github.com/tangle-network/agent-knowledge)
165
- - [`@tangle-network/agent-integrations`](https://github.com/tangle-network/agent-integrations)
166
- - [`@tangle-network/agent-gateway`](https://github.com/tangle-network/agent-gateway)
167
- - [`@tangle-network/tcloud`](https://github.com/tangle-network/tcloud)
168
+ - `@tangle-network/agent-runtime`: production session/runtime layer.
169
+ - `@tangle-network/agent-knowledge`: source-grounded knowledge bases and readiness.
170
+ - `@tangle-network/agent-integrations`: connection, grant, capability, and integration invocation contracts.
168
171
 
169
172
  ## License
170
173
 
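Note on the README hunk above: the new Quick Start describes an `observe -> validate -> decide -> act` loop bounded by a budget. The following is a minimal, self-contained sketch of that shape so the control flow is easier to follow. It is not the package's `runAgentControlLoop`; the names `sketchControlLoop`, `LoopSpec`, and `LoopResult` are hypothetical, and only a step budget is modeled (the README also mentions wall-time and cost budgets).

```typescript
// Hypothetical sketch of an observe -> validate -> decide -> act loop.
// NOT the implementation of runAgentControlLoop from @tangle-network/agent-eval.
type Eval = { id: string; passed: boolean }

type Decision<A> =
  | { type: 'stop'; pass: boolean; reason: string }
  | { type: 'continue'; action: A; reason: string }

interface LoopSpec<S, A> {
  maxSteps: number
  observe(): Promise<S> | S
  validate(ctx: { state: S }): Eval[]
  decide(ctx: { evals: Eval[] }): Decision<A>
  act(action: A): Promise<void> | void
}

interface LoopResult {
  pass: boolean
  steps: number // number of actions taken before the loop ended
  reason: string
}

async function sketchControlLoop<S, A>(spec: LoopSpec<S, A>): Promise<LoopResult> {
  for (let step = 0; step < spec.maxSteps; step++) {
    const state = await spec.observe() // read current product state
    const evals = spec.validate({ state }) // deterministic gates first
    const decision = spec.decide({ evals })
    if (decision.type === 'stop') {
      return { pass: decision.pass, steps: step, reason: decision.reason }
    }
    await spec.act(decision.action) // apply one repair step, then re-observe
  }
  return { pass: false, steps: spec.maxSteps, reason: 'step budget exhausted' }
}

// Fake "product": the build starts passing after two repair actions.
let repairs = 0
const demo = sketchControlLoop<{ exitCode: number }, { type: 'repair' }>({
  maxSteps: 8,
  observe: () => ({ exitCode: repairs >= 2 ? 0 : 1 }),
  validate: ({ state }) => [{ id: 'build-passes', passed: state.exitCode === 0 }],
  decide: ({ evals }) =>
    evals.every((e) => e.passed)
      ? { type: 'stop', pass: true, reason: 'all gates passed' }
      : { type: 'continue', action: { type: 'repair' }, reason: 'repair failed gates' },
  act: () => {
    repairs += 1
  },
})
```

In this sketch the eval contract (`validate` and `decide`) stays fixed while `observe` and `act` are the swappable dependencies, which mirrors the README's advice about production, replay, and benchmark runs.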
@@ -1 +1,2 @@
1
- export { B as BENCHMARK_SPLIT_SEED, b as BenchmarkAdapter, c as BenchmarkDatasetItem, d as BenchmarkEvaluation, i as deterministicSplit, l as routing } from '../index-1PZOtZFr.js';
1
+ export { B as BENCHMARK_SPLIT_SEED, a as BenchmarkAdapter, b as BenchmarkDatasetItem, c as BenchmarkEvaluation, d as deterministicSplit, e as routing } from '../index-c5saLbKD.js';
2
+ import '../run-record-CX_jcAyr.js';
@@ -57,7 +57,10 @@ function buildBody(req, forceJsonObject) {
57
57
  messages: req.messages,
58
58
  temperature: req.temperature ?? 0
59
59
  };
60
- if (req.maxTokens != null) body.max_tokens = req.maxTokens;
60
+ if (req.maxTokens != null) {
61
+ if (usesMaxCompletionTokens(req.model)) body.max_completion_tokens = req.maxTokens;
62
+ else body.max_tokens = req.maxTokens;
63
+ }
61
64
  if (req.jsonSchema && !forceJsonObject) {
62
65
  body.response_format = {
63
66
  type: "json_schema",
@@ -68,6 +71,9 @@ function buildBody(req, forceJsonObject) {
68
71
  }
69
72
  return body;
70
73
  }
74
+ function usesMaxCompletionTokens(model) {
75
+ return /^gpt-5(?:[.\-]|$)/i.test(model);
76
+ }
71
77
  async function sleep(ms) {
72
78
  return new Promise((resolve) => setTimeout(resolve, ms));
73
79
  }
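Note on the hunk above: `buildBody` now sends `max_completion_tokens` instead of `max_tokens` when the model name matches `/^gpt-5(?:[.\-]|$)/i`. The sketch below copies that pattern and runs it against a few sample model names to show which names match. The sample strings and the `tokenParamFor` helper are illustrative assumptions, not names from the package.

```typescript
// Pattern and function name copied from the hunk; everything else is illustrative.
function usesMaxCompletionTokens(model: string): boolean {
  return /^gpt-5(?:[.\-]|$)/i.test(model)
}

// Hypothetical helper: which token-limit field the request body would carry.
function tokenParamFor(model: string): 'max_completion_tokens' | 'max_tokens' {
  return usesMaxCompletionTokens(model) ? 'max_completion_tokens' : 'max_tokens'
}

// Sample model names (assumed for illustration only).
const samples = ['gpt-5', 'GPT-5', 'gpt-5.1', 'gpt-5-mini', 'gpt-50', 'gpt-4o', 'openai/gpt-5']

const tokenParams: Record<string, string> = {}
for (const model of samples) {
  tokenParams[model] = tokenParamFor(model)
}
console.log(tokenParams)
```

Two properties of the pattern are worth noticing. The trailing `(?:[.\-]|$)` keeps a name such as `gpt-50` from matching, and the leading `^` means a provider-prefixed name such as `openai/gpt-5` does not match either, so such a request would still carry `max_tokens`. Whether that second case matters depends on how model names reach this client, which the diff does not show.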
@@ -262,4 +268,4 @@ export {
262
268
  probeLlm,
263
269
  LlmClient
264
270
  };
265
- //# sourceMappingURL=chunk-JAOLXRIA.js.map
271
+ //# sourceMappingURL=chunk-75MCTH7P.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/llm-client.ts"],"sourcesContent":["/**\n * LLM client with graceful degrade.\n *\n * OpenAI-compatible `/v1/chat/completions` client with:\n * - Exponential-backoff retry on 429 + 5xx gateway errors (502/503/504).\n * - Retry on transient network errors (fetch failed, AbortError, ECONNRESET).\n * - Graceful json_schema → json_object degrade on 400 with schema-reject body.\n * - Fenced-JSON stripping (```json ... ```) for models that wrap structured output.\n * - Configurable base URL + api key / bearer, works with LiteLLM proxies, OpenAI\n * directly, cli-bridge subscriptions, and any router that speaks the spec.\n *\n * Usage:\n * const { value, result } = await callLlmJson<MyType>(\n * { model: 'gpt-4o', messages: [...], jsonSchema: { name: 'x', schema: {...} } },\n * { baseUrl: 'https://router.tangle.tools/v1', apiKey: process.env.KEY },\n * )\n *\n * This is THE llm-calling seam for agent-eval primitives that need structured\n * output (semantic concept judge, reviewer directives, critic scores). Primitives\n * that need free-form text use `callLlm` and parse output themselves.\n */\n\n// ─── Types ──────────────────────────────────────────────────────────────\n\nexport interface LlmMessage {\n role: 'system' | 'user' | 'assistant'\n /**\n * Either a plain text content string OR a multimodal content array\n * (text + image_url parts) for vision-capable models.\n */\n content:\n | string\n | Array<\n | { type: 'text'; text: string }\n | { type: 'image_url'; image_url: { url: string; detail?: 'auto' | 'low' | 'high' } }\n >\n}\n\nexport interface LlmCallRequest {\n model: string\n messages: LlmMessage[]\n /** Optional JSON-mode response format (response_format: json_object). */\n jsonMode?: boolean\n /** Optional structured output via JSON Schema. Falls back to json_object on 400. */\n jsonSchema?: { name: string; schema: Record<string, unknown> }\n temperature?: number\n maxTokens?: number\n /** Per-call timeout, default 60s. 
*/\n timeoutMs?: number\n}\n\nexport interface LlmUsage {\n promptTokens: number\n completionTokens: number\n totalTokens: number\n /** Proxies populate this when prompt caching is on. */\n cachedPromptTokens?: number\n}\n\nexport interface LlmCallResult {\n /** The text content of the first choice. Empty string if none. */\n content: string\n usage: LlmUsage\n /**\n * Cost in USD. Pulled from proxy's `_response_cost` field when present;\n * `null` when neither the proxy nor the caller can derive it.\n */\n costUsd: number | null\n /** Model name actually used (echoed from response). */\n model: string\n /** Wall-clock duration of the HTTP call (last attempt, if retried). */\n durationMs: number\n /** Raw response body. */\n raw: Record<string, unknown>\n}\n\nexport class LlmCallError extends Error {\n constructor(\n message: string,\n public readonly status: number,\n public readonly body: string,\n public readonly model: string,\n ) {\n super(message)\n this.name = 'LlmCallError'\n }\n}\n\nexport interface LlmClientOptions {\n /** Base URL (without trailing slash). Must end at the `/v1` prefix. */\n baseUrl?: string\n /** Bearer token — either `apiKey` or `bearer` populates `Authorization: Bearer ...`. */\n apiKey?: string\n bearer?: string\n /** Override for the `Authorization` header (e.g. `X-Auth: ...`). Takes precedence over apiKey/bearer. */\n authHeader?: { name: string; value: string }\n /** Default timeout in ms. Per-call can override. */\n defaultTimeoutMs?: number\n /** Max retry attempts on retriable errors. Default 3 (1 initial + 2 retries). */\n maxRetries?: number\n /** Fetch implementation — defaults to global `fetch`. Override for custom transport (e.g. tests). 
*/\n fetch?: typeof fetch\n}\n\n// ─── Internals ──────────────────────────────────────────────────────────\n\nconst DEFAULT_BASE_URL = 'https://router.tangle.tools/v1'\nconst DEFAULT_TIMEOUT_MS = 60_000\nconst DEFAULT_MAX_RETRIES = 3\n\nconst RETRYABLE_STATUS = new Set([429, 502, 503, 504])\n\nfunction isRetryableError(err: unknown): boolean {\n if (err instanceof LlmCallError) return RETRYABLE_STATUS.has(err.status)\n if (err instanceof Error) {\n return (\n err.name === 'AbortError' ||\n err.name === 'TimeoutError' ||\n /fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN/i.test(err.message)\n )\n }\n return false\n}\n\nfunction parseRetryAfter(headers: Headers): number | null {\n const h = headers.get('retry-after')\n if (!h) return null\n const asNumber = Number(h)\n if (Number.isFinite(asNumber) && asNumber > 0) return asNumber * 1000\n const asDate = Date.parse(h)\n if (Number.isFinite(asDate)) return Math.max(0, asDate - Date.now())\n return null\n}\n\nfunction backoffMs(attempt: number): number {\n // 500ms, 1s, 2s, 4s, ...\n return Math.min(500 * Math.pow(2, attempt), 16_000)\n}\n\nfunction buildHeaders(opts: LlmClientOptions): Record<string, string> {\n const headers: Record<string, string> = {\n 'Content-Type': 'application/json',\n Accept: 'application/json',\n }\n if (opts.authHeader) {\n headers[opts.authHeader.name] = opts.authHeader.value\n } else if (opts.bearer || opts.apiKey) {\n headers.Authorization = `Bearer ${opts.bearer ?? 
opts.apiKey}`\n }\n return headers\n}\n\nfunction isSchemaRejection(status: number, body: string): boolean {\n if (status !== 400) return false\n const lower = body.toLowerCase()\n return (\n lower.includes('response_format') ||\n lower.includes('json_schema') ||\n lower.includes('is unavailable') ||\n lower.includes('not supported')\n )\n}\n\nfunction buildBody(req: LlmCallRequest, forceJsonObject: boolean): Record<string, unknown> {\n const body: Record<string, unknown> = {\n model: req.model,\n messages: req.messages,\n temperature: req.temperature ?? 0,\n }\n if (req.maxTokens != null) {\n if (usesMaxCompletionTokens(req.model)) body.max_completion_tokens = req.maxTokens\n else body.max_tokens = req.maxTokens\n }\n\n if (req.jsonSchema && !forceJsonObject) {\n body.response_format = {\n type: 'json_schema',\n json_schema: { name: req.jsonSchema.name, schema: req.jsonSchema.schema, strict: true },\n }\n } else if (req.jsonMode || req.jsonSchema) {\n body.response_format = { type: 'json_object' }\n }\n\n return body\n}\n\nfunction usesMaxCompletionTokens(model: string): boolean {\n return /^gpt-5(?:[.\\-]|$)/i.test(model)\n}\n\nasync function sleep(ms: number): Promise<void> {\n return new Promise((resolve) => setTimeout(resolve, ms))\n}\n\n// ─── Public API ─────────────────────────────────────────────────────────\n\n/**\n * Strip a ```json / ``` code fence if the model emitted one.\n * Idempotent for naked JSON. Some models (claude-code via router, certain\n * deepseek models) wrap output even under json_object.\n */\nexport function stripFencedJson(raw: string): string {\n const trimmed = raw.trim()\n const m = trimmed.match(/^```(?:json)?\\s*\\n?([\\s\\S]*?)\\n?```\\s*$/)\n return m ? 
m[1]!.trim() : trimmed\n}\n\nexport function extractJsonPayload(raw: string): string {\n const stripped = stripFencedJson(raw)\n try {\n JSON.parse(stripped)\n return stripped\n } catch {\n // Continue with balanced extraction below.\n }\n\n const starts = [...stripped.matchAll(/[\\[{]/g)].map((match) => match.index).filter((index) => index != null)\n for (const start of starts) {\n const candidate = extractBalancedJson(stripped, start)\n if (!candidate) continue\n try {\n JSON.parse(candidate)\n return candidate\n } catch {\n // Keep scanning; earlier braces may belong to prose.\n }\n }\n\n return stripped\n}\n\nfunction extractBalancedJson(input: string, start: number): string | null {\n const opener = input[start]\n const closer = opener === '{' ? '}' : opener === '[' ? ']' : null\n if (!closer) return null\n\n const stack: string[] = [closer]\n let isInString = false\n let isEscaped = false\n\n for (let i = start + 1; i < input.length; i++) {\n const char = input[i]!\n if (isEscaped) {\n isEscaped = false\n continue\n }\n if (char === '\\\\') {\n isEscaped = isInString\n continue\n }\n if (char === '\"') {\n isInString = !isInString\n continue\n }\n if (isInString) continue\n\n if (char === '{') stack.push('}')\n else if (char === '[') stack.push(']')\n else if (char === stack[stack.length - 1]) {\n stack.pop()\n if (stack.length === 0) return input.slice(start, i + 1)\n }\n }\n\n return null\n}\n\n/**\n * Low-level call. Returns raw content + usage + cost. Retries on transient\n * failures; does NOT degrade schema here — callers that want graceful\n * degrade use `callLlmJson`.\n */\nexport async function callLlm(\n req: LlmCallRequest,\n opts: LlmClientOptions = {},\n): Promise<LlmCallResult> {\n const baseUrl = (opts.baseUrl ?? DEFAULT_BASE_URL).replace(/\\/+$/, '')\n const url = `${baseUrl}/chat/completions`\n const timeoutMs = req.timeoutMs ?? opts.defaultTimeoutMs ?? DEFAULT_TIMEOUT_MS\n const maxRetries = opts.maxRetries ?? 
DEFAULT_MAX_RETRIES\n const fetchFn = opts.fetch ?? globalThis.fetch\n const headers = buildHeaders(opts)\n\n let lastErr: unknown\n for (let attempt = 0; attempt < maxRetries; attempt++) {\n const controller = new AbortController()\n const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs)\n const started = Date.now()\n\n try {\n const res = await fetchFn(url, {\n method: 'POST',\n headers,\n body: JSON.stringify(buildBody(req, false)),\n signal: controller.signal,\n })\n clearTimeout(timeoutHandle)\n\n if (!res.ok) {\n const body = await res.text()\n const err = new LlmCallError(\n `LLM call ${res.status}: ${body.slice(0, 300)}`,\n res.status,\n body,\n req.model,\n )\n if (RETRYABLE_STATUS.has(res.status) && attempt < maxRetries - 1) {\n lastErr = err\n const retryAfter = parseRetryAfter(res.headers)\n await sleep(retryAfter ?? backoffMs(attempt))\n continue\n }\n throw err\n }\n\n const json = (await res.json()) as Record<string, unknown>\n const choice = (json.choices as Array<{ message?: { content?: string } }> | undefined)?.[0]\n const usageRaw = (json.usage as Record<string, unknown> | undefined) ?? {}\n const costFromProxy = (json._response_cost ?? json.cost_usd) as number | undefined\n\n return {\n content: choice?.message?.content ?? '',\n usage: {\n promptTokens: Number(usageRaw.prompt_tokens ?? 0),\n completionTokens: Number(usageRaw.completion_tokens ?? 0),\n totalTokens: Number(usageRaw.total_tokens ?? 0),\n cachedPromptTokens:\n usageRaw.prompt_tokens_details &&\n typeof usageRaw.prompt_tokens_details === 'object'\n ? Number(\n (usageRaw.prompt_tokens_details as Record<string, unknown>).cached_tokens ?? 0,\n )\n : undefined,\n },\n costUsd: typeof costFromProxy === 'number' ? costFromProxy : null,\n model: (json.model as string) ?? 
req.model,\n durationMs: Date.now() - started,\n raw: json,\n }\n } catch (err) {\n clearTimeout(timeoutHandle)\n lastErr = err\n if (attempt < maxRetries - 1 && isRetryableError(err)) {\n await sleep(backoffMs(attempt))\n continue\n }\n throw err\n }\n }\n throw lastErr instanceof Error ? lastErr : new Error(String(lastErr))\n}\n\n/**\n * Structured-output call. Returns parsed JSON plus the raw result envelope.\n * Degrades `jsonSchema` → `jsonMode` on a 400 that names the schema param —\n * critical for deepseek-v3/v4, kimi-k2.6, and other models that don't accept\n * the `response_format.json_schema` shape but DO accept `json_object`.\n */\nexport async function callLlmJson<T = unknown>(\n req: LlmCallRequest,\n opts: LlmClientOptions = {},\n): Promise<{ value: T; result: LlmCallResult }> {\n try {\n const result = await callLlm({ ...req, jsonMode: req.jsonMode ?? !req.jsonSchema }, opts)\n const value = parseJsonSafely<T>(result.content, result.model)\n return { value, result }\n } catch (err) {\n if (err instanceof LlmCallError && isSchemaRejection(err.status, err.body) && req.jsonSchema) {\n // Degrade to json_object + retry.\n const degradedReq: LlmCallRequest = { ...req, jsonMode: true, jsonSchema: undefined }\n const result = await callLlm(degradedReq, opts)\n const value = parseJsonSafely<T>(result.content, result.model)\n return { value, result }\n }\n throw err\n }\n}\n\nfunction parseJsonSafely<T>(content: string, model: string): T {\n const stripped = extractJsonPayload(content)\n try {\n return JSON.parse(stripped) as T\n } catch (err) {\n throw new Error(\n `LLM returned non-JSON content (model=${model}): ${\n err instanceof Error ? err.message : String(err)\n }\\n--- raw content ---\\n${content.slice(0, 800)}`,\n )\n }\n}\n\n/**\n * Probe whether a model is reachable. Returns latency + null error on\n * success; `ok=false` + error message on any failure (HTTP, timeout,\n * network, parse). 
Designed for sweep preflights — fail loud at the\n * boundary before burning a 30-leaf run on a misconfigured router.\n *\n * Sends a tiny `ping` message with `maxTokens=64`. Reasoning models\n * (glm-5.1, deepseek-v4) can burn the entire budget on internal reasoning\n * for short prompts, so don't tighten this further. We don't validate\n * content; HTTP 200 means reachable.\n */\nexport async function probeLlm(\n model: string,\n opts: LlmClientOptions & { timeoutMs?: number } = {},\n): Promise<{ ok: boolean; latencyMs: number; error: string | null }> {\n const start = Date.now()\n try {\n await callLlm(\n {\n model,\n messages: [{ role: 'user', content: 'ping' }],\n maxTokens: 64,\n timeoutMs: opts.timeoutMs ?? 30_000,\n },\n opts,\n )\n return { ok: true, latencyMs: Date.now() - start, error: null }\n } catch (err) {\n return {\n ok: false,\n latencyMs: Date.now() - start,\n error: err instanceof Error ? err.message : String(err),\n }\n }\n}\n\n/**\n * Stateful client — construct once with defaults, call many times.\n * Thin wrapper around the free functions; exists for callers that want\n * to inject a single configured instance into multiple primitives.\n */\nexport class LlmClient {\n constructor(private readonly opts: LlmClientOptions = {}) {}\n\n call(req: LlmCallRequest, per?: LlmClientOptions): Promise<LlmCallResult> {\n return callLlm(req, { ...this.opts, ...per })\n }\n\n callJson<T = unknown>(\n req: LlmCallRequest,\n per?: LlmClientOptions,\n ): Promise<{ value: T; result: LlmCallResult }> {\n return callLlmJson<T>(req, { ...this.opts, ...per })\n 
}\n}\n"],"mappings":";AA4EO,IAAM,eAAN,cAA2B,MAAM;AAAA,EACtC,YACE,SACgB,QACA,MACA,OAChB;AACA,UAAM,OAAO;AAJG;AACA;AACA;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EANkB;AAAA,EACA;AAAA,EACA;AAKpB;AAoBA,IAAM,mBAAmB;AACzB,IAAM,qBAAqB;AAC3B,IAAM,sBAAsB;AAE5B,IAAM,mBAAmB,oBAAI,IAAI,CAAC,KAAK,KAAK,KAAK,GAAG,CAAC;AAErD,SAAS,iBAAiB,KAAuB;AAC/C,MAAI,eAAe,aAAc,QAAO,iBAAiB,IAAI,IAAI,MAAM;AACvE,MAAI,eAAe,OAAO;AACxB,WACE,IAAI,SAAS,gBACb,IAAI,SAAS,kBACb,+CAA+C,KAAK,IAAI,OAAO;AAAA,EAEnE;AACA,SAAO;AACT;AAEA,SAAS,gBAAgB,SAAiC;AACxD,QAAM,IAAI,QAAQ,IAAI,aAAa;AACnC,MAAI,CAAC,EAAG,QAAO;AACf,QAAM,WAAW,OAAO,CAAC;AACzB,MAAI,OAAO,SAAS,QAAQ,KAAK,WAAW,EAAG,QAAO,WAAW;AACjE,QAAM,SAAS,KAAK,MAAM,CAAC;AAC3B,MAAI,OAAO,SAAS,MAAM,EAAG,QAAO,KAAK,IAAI,GAAG,SAAS,KAAK,IAAI,CAAC;AACnE,SAAO;AACT;AAEA,SAAS,UAAU,SAAyB;AAE1C,SAAO,KAAK,IAAI,MAAM,KAAK,IAAI,GAAG,OAAO,GAAG,IAAM;AACpD;AAEA,SAAS,aAAa,MAAgD;AACpE,QAAM,UAAkC;AAAA,IACtC,gBAAgB;AAAA,IAChB,QAAQ;AAAA,EACV;AACA,MAAI,KAAK,YAAY;AACnB,YAAQ,KAAK,WAAW,IAAI,IAAI,KAAK,WAAW;AAAA,EAClD,WAAW,KAAK,UAAU,KAAK,QAAQ;AACrC,YAAQ,gBAAgB,UAAU,KAAK,UAAU,KAAK,MAAM;AAAA,EAC9D;AACA,SAAO;AACT;AAEA,SAAS,kBAAkB,QAAgB,MAAuB;AAChE,MAAI,WAAW,IAAK,QAAO;AAC3B,QAAM,QAAQ,KAAK,YAAY;AAC/B,SACE,MAAM,SAAS,iBAAiB,KAChC,MAAM,SAAS,aAAa,KAC5B,MAAM,SAAS,gBAAgB,KAC/B,MAAM,SAAS,eAAe;AAElC;AAEA,SAAS,UAAU,KAAqB,iBAAmD;AACzF,QAAM,OAAgC;AAAA,IACpC,OAAO,IAAI;AAAA,IACX,UAAU,IAAI;AAAA,IACd,aAAa,IAAI,eAAe;AAAA,EAClC;AACA,MAAI,IAAI,aAAa,MAAM;AACzB,QAAI,wBAAwB,IAAI,KAAK,EAAG,MAAK,wBAAwB,IAAI;AAAA,QACpE,MAAK,aAAa,IAAI;AAAA,EAC7B;AAEA,MAAI,IAAI,cAAc,CAAC,iBAAiB;AACtC,SAAK,kBAAkB;AAAA,MACrB,MAAM;AAAA,MACN,aAAa,EAAE,MAAM,IAAI,WAAW,MAAM,QAAQ,IAAI,WAAW,QAAQ,QAAQ,KAAK;AAAA,IACxF;AAAA,EACF,WAAW,IAAI,YAAY,IAAI,YAAY;AACzC,SAAK,kBAAkB,EAAE,MAAM,cAAc;AAAA,EAC/C;AAEA,SAAO;AACT;AAEA,SAAS,wBAAwB,OAAwB;AACvD,SAAO,qBAAqB,KAAK,KAAK;AACxC;AAEA,eAAe,MAAM,IAA2B;AAC9C,SAAO,IAAI,QAAQ,CAAC,YAAY,WAAW,SAAS,EAAE,CAAC;AACzD;AASO,SAAS,gBAAgB,KAAqB;AACnD,QAAM,UAAU,IAAI,KAAK;AACzB,QAAM,IAAI,QAAQ,MAAM,yCAAyC;AACjE,SAAO,IAAI,EAAE,CAAC,EAAG,KAAK,IAAI;AAC5B
;AAEO,SAAS,mBAAmB,KAAqB;AACtD,QAAM,WAAW,gBAAgB,GAAG;AACpC,MAAI;AACF,SAAK,MAAM,QAAQ;AACnB,WAAO;AAAA,EACT,QAAQ;AAAA,EAER;AAEA,QAAM,SAAS,CAAC,GAAG,SAAS,SAAS,QAAQ,CAAC,EAAE,IAAI,CAAC,UAAU,MAAM,KAAK,EAAE,OAAO,CAAC,UAAU,SAAS,IAAI;AAC3G,aAAW,SAAS,QAAQ;AAC1B,UAAM,YAAY,oBAAoB,UAAU,KAAK;AACrD,QAAI,CAAC,UAAW;AAChB,QAAI;AACF,WAAK,MAAM,SAAS;AACpB,aAAO;AAAA,IACT,QAAQ;AAAA,IAER;AAAA,EACF;AAEA,SAAO;AACT;AAEA,SAAS,oBAAoB,OAAe,OAA8B;AACxE,QAAM,SAAS,MAAM,KAAK;AAC1B,QAAM,SAAS,WAAW,MAAM,MAAM,WAAW,MAAM,MAAM;AAC7D,MAAI,CAAC,OAAQ,QAAO;AAEpB,QAAM,QAAkB,CAAC,MAAM;AAC/B,MAAI,aAAa;AACjB,MAAI,YAAY;AAEhB,WAAS,IAAI,QAAQ,GAAG,IAAI,MAAM,QAAQ,KAAK;AAC7C,UAAM,OAAO,MAAM,CAAC;AACpB,QAAI,WAAW;AACb,kBAAY;AACZ;AAAA,IACF;AACA,QAAI,SAAS,MAAM;AACjB,kBAAY;AACZ;AAAA,IACF;AACA,QAAI,SAAS,KAAK;AAChB,mBAAa,CAAC;AACd;AAAA,IACF;AACA,QAAI,WAAY;AAEhB,QAAI,SAAS,IAAK,OAAM,KAAK,GAAG;AAAA,aACvB,SAAS,IAAK,OAAM,KAAK,GAAG;AAAA,aAC5B,SAAS,MAAM,MAAM,SAAS,CAAC,GAAG;AACzC,YAAM,IAAI;AACV,UAAI,MAAM,WAAW,EAAG,QAAO,MAAM,MAAM,OAAO,IAAI,CAAC;AAAA,IACzD;AAAA,EACF;AAEA,SAAO;AACT;AAOA,eAAsB,QACpB,KACA,OAAyB,CAAC,GACF;AACxB,QAAM,WAAW,KAAK,WAAW,kBAAkB,QAAQ,QAAQ,EAAE;AACrE,QAAM,MAAM,GAAG,OAAO;AACtB,QAAM,YAAY,IAAI,aAAa,KAAK,oBAAoB;AAC5D,QAAM,aAAa,KAAK,cAAc;AACtC,QAAM,UAAU,KAAK,SAAS,WAAW;AACzC,QAAM,UAAU,aAAa,IAAI;AAEjC,MAAI;AACJ,WAAS,UAAU,GAAG,UAAU,YAAY,WAAW;AACrD,UAAM,aAAa,IAAI,gBAAgB;AACvC,UAAM,gBAAgB,WAAW,MAAM,WAAW,MAAM,GAAG,SAAS;AACpE,UAAM,UAAU,KAAK,IAAI;AAEzB,QAAI;AACF,YAAM,MAAM,MAAM,QAAQ,KAAK;AAAA,QAC7B,QAAQ;AAAA,QACR;AAAA,QACA,MAAM,KAAK,UAAU,UAAU,KAAK,KAAK,CAAC;AAAA,QAC1C,QAAQ,WAAW;AAAA,MACrB,CAAC;AACD,mBAAa,aAAa;AAE1B,UAAI,CAAC,IAAI,IAAI;AACX,cAAM,OAAO,MAAM,IAAI,KAAK;AAC5B,cAAM,MAAM,IAAI;AAAA,UACd,YAAY,IAAI,MAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAAA,UAC7C,IAAI;AAAA,UACJ;AAAA,UACA,IAAI;AAAA,QACN;AACA,YAAI,iBAAiB,IAAI,IAAI,MAAM,KAAK,UAAU,aAAa,GAAG;AAChE,oBAAU;AACV,gBAAM,aAAa,gBAAgB,IAAI,OAAO;AAC9C,gBAAM,MAAM,cAAc,UAAU,OAAO,CAAC;AAC5C;AAAA,QACF;AACA,cAAM;AAAA,MACR;AAEA,YAAM,OAAQ,MAAM,IAAI,KAAK;AAC7B,YAAM,SAAU,KAAK,UAAoE,CAAC;AAC1F,YAAM,WAAY
,KAAK,SAAiD,CAAC;AACzE,YAAM,gBAAiB,KAAK,kBAAkB,KAAK;AAEnD,aAAO;AAAA,QACL,SAAS,QAAQ,SAAS,WAAW;AAAA,QACrC,OAAO;AAAA,UACL,cAAc,OAAO,SAAS,iBAAiB,CAAC;AAAA,UAChD,kBAAkB,OAAO,SAAS,qBAAqB,CAAC;AAAA,UACxD,aAAa,OAAO,SAAS,gBAAgB,CAAC;AAAA,UAC9C,oBACE,SAAS,yBACT,OAAO,SAAS,0BAA0B,WACtC;AAAA,YACG,SAAS,sBAAkD,iBAAiB;AAAA,UAC/E,IACA;AAAA,QACR;AAAA,QACA,SAAS,OAAO,kBAAkB,WAAW,gBAAgB;AAAA,QAC7D,OAAQ,KAAK,SAAoB,IAAI;AAAA,QACrC,YAAY,KAAK,IAAI,IAAI;AAAA,QACzB,KAAK;AAAA,MACP;AAAA,IACF,SAAS,KAAK;AACZ,mBAAa,aAAa;AAC1B,gBAAU;AACV,UAAI,UAAU,aAAa,KAAK,iBAAiB,GAAG,GAAG;AACrD,cAAM,MAAM,UAAU,OAAO,CAAC;AAC9B;AAAA,MACF;AACA,YAAM;AAAA,IACR;AAAA,EACF;AACA,QAAM,mBAAmB,QAAQ,UAAU,IAAI,MAAM,OAAO,OAAO,CAAC;AACtE;AAQA,eAAsB,YACpB,KACA,OAAyB,CAAC,GACoB;AAC9C,MAAI;AACF,UAAM,SAAS,MAAM,QAAQ,EAAE,GAAG,KAAK,UAAU,IAAI,YAAY,CAAC,IAAI,WAAW,GAAG,IAAI;AACxF,UAAM,QAAQ,gBAAmB,OAAO,SAAS,OAAO,KAAK;AAC7D,WAAO,EAAE,OAAO,OAAO;AAAA,EACzB,SAAS,KAAK;AACZ,QAAI,eAAe,gBAAgB,kBAAkB,IAAI,QAAQ,IAAI,IAAI,KAAK,IAAI,YAAY;AAE5F,YAAM,cAA8B,EAAE,GAAG,KAAK,UAAU,MAAM,YAAY,OAAU;AACpF,YAAM,SAAS,MAAM,QAAQ,aAAa,IAAI;AAC9C,YAAM,QAAQ,gBAAmB,OAAO,SAAS,OAAO,KAAK;AAC7D,aAAO,EAAE,OAAO,OAAO;AAAA,IACzB;AACA,UAAM;AAAA,EACR;AACF;AAEA,SAAS,gBAAmB,SAAiB,OAAkB;AAC7D,QAAM,WAAW,mBAAmB,OAAO;AAC3C,MAAI;AACF,WAAO,KAAK,MAAM,QAAQ;AAAA,EAC5B,SAAS,KAAK;AACZ,UAAM,IAAI;AAAA,MACR,wCAAwC,KAAK,MAC3C,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG,CACjD;AAAA;AAAA,EAA0B,QAAQ,MAAM,GAAG,GAAG,CAAC;AAAA,IACjD;AAAA,EACF;AACF;AAaA,eAAsB,SACpB,OACA,OAAkD,CAAC,GACgB;AACnE,QAAM,QAAQ,KAAK,IAAI;AACvB,MAAI;AACF,UAAM;AAAA,MACJ;AAAA,QACE;AAAA,QACA,UAAU,CAAC,EAAE,MAAM,QAAQ,SAAS,OAAO,CAAC;AAAA,QAC5C,WAAW;AAAA,QACX,WAAW,KAAK,aAAa;AAAA,MAC/B;AAAA,MACA;AAAA,IACF;AACA,WAAO,EAAE,IAAI,MAAM,WAAW,KAAK,IAAI,IAAI,OAAO,OAAO,KAAK;AAAA,EAChE,SAAS,KAAK;AACZ,WAAO;AAAA,MACL,IAAI;AAAA,MACJ,WAAW,KAAK,IAAI,IAAI;AAAA,MACxB,OAAO,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAAA,IACxD;AAAA,EACF;AACF;AAOO,IAAM,YAAN,MAAgB;AAAA,EACrB,YAA6B,OAAyB,CAAC,GAAG;AAA7B;AAAA,EAA8B;AAAA,EAA9B;AAAA,EAE7B,KAAK,KAAqB,KAAgD;AACxE,WAAO,QAAQ,KAAK,EAAE,GAAG,KAA
K,MAAM,GAAG,IAAI,CAAC;AAAA,EAC9C;AAAA,EAEA,SACE,KACA,KAC8C;AAC9C,WAAO,YAAe,KAAK,EAAE,GAAG,KAAK,MAAM,GAAG,IAAI,CAAC;AAAA,EACrD;AACF;","names":[]}