@tangle-network/agent-eval 0.11.1 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,27 +1,112 @@
  # @tangle-network/agent-eval
 
- Trace-first evaluation framework for Tangle agents. Core (spans, pipelines, sandbox harness, OTLP export), trust (dataset, red-team, calibration, behavior DSL), builder-of-builders (three-layer eval, resumable sessions, meta-runtime correlation), and frontier (meta-eval correlation study, Process Reward Modeling, bisector).
+ **A library for deciding whether an LLM-driven generator did its job.**
 
- ## Install
+ You hand it the thing the generator produced — a code scaffold, a patch, a tweet, a JSON config — and you get back a structured verdict: pass/fail, dimension scores, plain-English rationale. Built to catch the LLM failure modes that LLM-as-judge alone misses.
 
- ```bash
+ ```ts
+ import { BuilderSession, SubprocessSandboxDriver, InMemoryTraceStore } from '@tangle-network/agent-eval'
+
+ const session = new BuilderSession(new InMemoryTraceStore(), { projectId: 'my-app' }, new SubprocessSandboxDriver())
+ await session.startChat()
+ const ship = await session.ship({
+   harness: { setupCommand: 'pnpm install', testCommand: 'pnpm exec tsc --noEmit', cwd: scaffoldDir, timeoutMs: 180_000 },
+ })
+ console.log(ship.result.passed, ship.result.score)
+ ```
+
+ ## Who this is for
+
+ - You ship a code generator (scaffolder, patcher, refactor agent) and need to gate on whether its output actually works.
+ - You ship a content generator and need quality signal beyond "the LLM said it's good".
+ - You want a release gate that fails on regressions you can name, not vibes.
+
+ If that's you, start with [`docs/concepts.md`](./docs/concepts.md) — 5-minute mental model — then come back here.
+
+ ## Quickstart
+
+ ### From any language: HTTP or RPC
+
+ The fastest path. agent-eval ships a CLI that runs as either an HTTP server or a stdio RPC binary. Drive it from Python, Rust, Go, anything.
+
+ ```sh
+ npm i -g @tangle-network/agent-eval
+
+ # HTTP — long-running
+ agent-eval serve --port 5005
+
+ # stdio RPC — one-shot, batch
+ echo '{"rubricName":"anti-slop","content":"…"}' | agent-eval rpc judge
+ ```
+
+ Python:
+ ```sh
+ pip install tangle-agent-eval
+ ```
+ ```python
+ from tangle_agent_eval import Client
+ c = Client()
+ r = c.judge(content="our scaffold ships zero-copy IO", rubric_name="anti-slop")
+ print(r.composite, r.failure_modes)
+ ```
+
+ See [`docs/wire-protocol.md`](./docs/wire-protocol.md) for the full surface.
+
+ ### From TypeScript: import directly
+
+ In-process; no wire round-trip. Use this when your eval lives in the same Node process as your generator.
+
+ ```sh
  pnpm add @tangle-network/agent-eval
  ```
 
- ## Usage
+ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](./.claude/skills/agent-eval/SKILL.md#minimal-working-path-builder-of-builders).
+
+ ## Two ways to read this repo
+
+ - **You're a human onboarding** — read [`docs/concepts.md`](./docs/concepts.md) for the mental model, then [`docs/wire-protocol.md`](./docs/wire-protocol.md) if you'll call from another language, or `SKILL.md` if you'll embed in TS.
+ - **You're an LLM agent writing integration code** — read `SKILL.md`. Every directive there encodes a shipped bug; skipping one reintroduces the bug class.
 
- **→ [`.claude/skills/agent-eval/SKILL.md`](./.claude/skills/agent-eval/SKILL.md)** — single source of truth for every usage pattern. Covers: minimal builder-of-builders path, the seven muffled-gate footguns paid for in shipped bugs, the three-layer eval contract, regression tests worth writing, and "when to use what" for the 100+ exports.
+ ## What's in the box
 
- If you're an LLM or agent reading this, load the skill file before writing integration code — it encodes 10+ incident-driven directives that will save you from rediscovering them.
+ | Module | What it does | Doc |
+ |---|---|---|
+ | `BuilderSession` | Three-layer eval orchestrator (builder → app-build → app-runtime) for code generators. | concepts.md §three-layer eval |
+ | `MultiLayerVerifier` | Pipeline of layers (install → typecheck → build → semantic). Skip-on-fail, weighted aggregate. | concepts.md §verifiers |
+ | `judges`, `createCustomJudge`, `createAntiSlopJudge` | LLM and deterministic judges. | SKILL.md |
+ | Wire protocol (`agent-eval serve` / `rpc`) | HTTP and stdio RPC interface for cross-language clients. | wire-protocol.md |
+ | `clients/python/` | First-party Python client (`tangle-agent-eval` on PyPI). Version-locked to npm. | clients/python/README.md |
+ | `BenchmarkRunner`, `executeScenario`, `ConvergenceTracker` | Multi-turn scenario execution + cross-run tracking. | SKILL.md |
+ | `ExperimentTracker`, `PromptOptimizer`, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
+ | Telemetry (`telemetry/`, `telemetry/file`) | OTLP export, trace replay, file sinks. | inline JSDoc |
 
- ## Dev
+ ## Tech stack
 
- ```bash
- pnpm build # tsup
- pnpm test # vitest
- pnpm typecheck # tsc --noEmit
+ - TypeScript strict, no semicolons, single quotes, 2-space indent
+ - `tsup` for bundling, `vitest` for tests
+ - `@tangle-network/tcloud` for LLM calls (judges, driver)
+ - `hono` + `@asteasolutions/zod-to-openapi` for the wire protocol
+
+ ## Develop
+
+ ```sh
+ pnpm install
+ pnpm typecheck
+ pnpm test
+ pnpm build
+ pnpm openapi # write dist/openapi.json from the wire schemas
+
+ # Run the server locally
+ node dist/cli.js serve --port 5005
+
+ # Python client tests (require pnpm build first)
+ cd clients/python && pip install -e ".[dev]" && pytest
  ```
 
+ ## Release
+
+ `@tangle-network/agent-eval` (npm) and `tangle-agent-eval` (PyPI) ship from the same git tag in the same CI workflow. If either fails to publish, neither does. Versions are locked.
+
  ## Related
 
  - [`@tangle-network/agent-gateway`](https://github.com/tangle-network/agent-gateway)
@@ -0,0 +1,215 @@
+ // src/llm-client.ts
+ var LlmCallError = class extends Error {
+   constructor(message, status, body, model) {
+     super(message);
+     this.status = status;
+     this.body = body;
+     this.model = model;
+     this.name = "LlmCallError";
+   }
+   status;
+   body;
+   model;
+ };
+ var DEFAULT_BASE_URL = "https://router.tangle.tools/v1";
+ var DEFAULT_TIMEOUT_MS = 6e4;
+ var DEFAULT_MAX_RETRIES = 3;
+ var RETRYABLE_STATUS = /* @__PURE__ */ new Set([429, 502, 503, 504]);
+ function isRetryableError(err) {
+   if (err instanceof LlmCallError) return RETRYABLE_STATUS.has(err.status);
+   if (err instanceof Error) {
+     return err.name === "AbortError" || err.name === "TimeoutError" || /fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN/i.test(err.message);
+   }
+   return false;
+ }
+ function parseRetryAfter(headers) {
+   const h = headers.get("retry-after");
+   if (!h) return null;
+   const asNumber = Number(h);
+   if (Number.isFinite(asNumber) && asNumber > 0) return asNumber * 1e3;
+   const asDate = Date.parse(h);
+   if (Number.isFinite(asDate)) return Math.max(0, asDate - Date.now());
+   return null;
+ }
+ function backoffMs(attempt) {
+   return Math.min(500 * Math.pow(2, attempt), 16e3);
+ }
+ function buildHeaders(opts) {
+   const headers = {
+     "Content-Type": "application/json",
+     Accept: "application/json"
+   };
+   if (opts.authHeader) {
+     headers[opts.authHeader.name] = opts.authHeader.value;
+   } else if (opts.bearer || opts.apiKey) {
+     headers.Authorization = `Bearer ${opts.bearer ?? opts.apiKey}`;
+   }
+   return headers;
+ }
+ function isSchemaRejection(status, body) {
+   if (status !== 400) return false;
+   const lower = body.toLowerCase();
+   return lower.includes("response_format") || lower.includes("json_schema") || lower.includes("is unavailable") || lower.includes("not supported");
+ }
+ function buildBody(req, forceJsonObject) {
+   const body = {
+     model: req.model,
+     messages: req.messages,
+     temperature: req.temperature ?? 0
+   };
+   if (req.maxTokens != null) body.max_tokens = req.maxTokens;
+   if (req.jsonSchema && !forceJsonObject) {
+     body.response_format = {
+       type: "json_schema",
+       json_schema: { name: req.jsonSchema.name, schema: req.jsonSchema.schema, strict: true }
+     };
+   } else if (req.jsonMode || req.jsonSchema) {
+     body.response_format = { type: "json_object" };
+   }
+   return body;
+ }
+ async function sleep(ms) {
+   return new Promise((resolve) => setTimeout(resolve, ms));
+ }
+ function stripFencedJson(raw) {
+   const trimmed = raw.trim();
+   const m = trimmed.match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```\s*$/);
+   return m ? m[1].trim() : trimmed;
+ }
+ async function callLlm(req, opts = {}) {
+   const baseUrl = (opts.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
+   const url = `${baseUrl}/chat/completions`;
+   const timeoutMs = req.timeoutMs ?? opts.defaultTimeoutMs ?? DEFAULT_TIMEOUT_MS;
+   const maxRetries = opts.maxRetries ?? DEFAULT_MAX_RETRIES;
+   const fetchFn = opts.fetch ?? globalThis.fetch;
+   const headers = buildHeaders(opts);
+   let lastErr;
+   for (let attempt = 0; attempt < maxRetries; attempt++) {
+     const controller = new AbortController();
+     const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs);
+     const started = Date.now();
+     try {
+       const res = await fetchFn(url, {
+         method: "POST",
+         headers,
+         body: JSON.stringify(buildBody(req, false)),
+         signal: controller.signal
+       });
+       clearTimeout(timeoutHandle);
+       if (!res.ok) {
+         const body = await res.text();
+         const err = new LlmCallError(
+           `LLM call ${res.status}: ${body.slice(0, 300)}`,
+           res.status,
+           body,
+           req.model
+         );
+         if (RETRYABLE_STATUS.has(res.status) && attempt < maxRetries - 1) {
+           lastErr = err;
+           const retryAfter = parseRetryAfter(res.headers);
+           await sleep(retryAfter ?? backoffMs(attempt));
+           continue;
+         }
+         throw err;
+       }
+       const json = await res.json();
+       const choice = json.choices?.[0];
+       const usageRaw = json.usage ?? {};
+       const costFromProxy = json._response_cost ?? json.cost_usd;
+       return {
+         content: choice?.message?.content ?? "",
+         usage: {
+           promptTokens: Number(usageRaw.prompt_tokens ?? 0),
+           completionTokens: Number(usageRaw.completion_tokens ?? 0),
+           totalTokens: Number(usageRaw.total_tokens ?? 0),
+           cachedPromptTokens: usageRaw.prompt_tokens_details && typeof usageRaw.prompt_tokens_details === "object" ? Number(
+             usageRaw.prompt_tokens_details.cached_tokens ?? 0
+           ) : void 0
+         },
+         costUsd: typeof costFromProxy === "number" ? costFromProxy : null,
+         model: json.model ?? req.model,
+         durationMs: Date.now() - started,
+         raw: json
+       };
+     } catch (err) {
+       clearTimeout(timeoutHandle);
+       lastErr = err;
+       if (attempt < maxRetries - 1 && isRetryableError(err)) {
+         await sleep(backoffMs(attempt));
+         continue;
+       }
+       throw err;
+     }
+   }
+   throw lastErr instanceof Error ? lastErr : new Error(String(lastErr));
+ }
+ async function callLlmJson(req, opts = {}) {
+   try {
+     const result = await callLlm({ ...req, jsonMode: req.jsonMode ?? !req.jsonSchema }, opts);
+     const value = parseJsonSafely(result.content, result.model);
+     return { value, result };
+   } catch (err) {
+     if (err instanceof LlmCallError && isSchemaRejection(err.status, err.body) && req.jsonSchema) {
+       const degradedReq = { ...req, jsonMode: true, jsonSchema: void 0 };
+       const result = await callLlm(degradedReq, opts);
+       const value = parseJsonSafely(result.content, result.model);
+       return { value, result };
+     }
+     throw err;
+   }
+ }
+ function parseJsonSafely(content, model) {
+   const stripped = stripFencedJson(content);
+   try {
+     return JSON.parse(stripped);
+   } catch (err) {
+     throw new Error(
+       `LLM returned non-JSON content (model=${model}): ${err instanceof Error ? err.message : String(err)}
+ --- raw content ---
+ ${content.slice(0, 800)}`
+     );
+   }
+ }
+ async function probeLlm(model, opts = {}) {
+   const start = Date.now();
+   try {
+     await callLlm(
+       {
+         model,
+         messages: [{ role: "user", content: "ping" }],
+         maxTokens: 64,
+         timeoutMs: opts.timeoutMs ?? 3e4
+       },
+       opts
+     );
+     return { ok: true, latencyMs: Date.now() - start, error: null };
+   } catch (err) {
+     return {
+       ok: false,
+       latencyMs: Date.now() - start,
+       error: err instanceof Error ? err.message : String(err)
+     };
+   }
+ }
+ var LlmClient = class {
+   constructor(opts = {}) {
+     this.opts = opts;
+   }
+   opts;
+   call(req, per) {
+     return callLlm(req, { ...this.opts, ...per });
+   }
+   callJson(req, per) {
+     return callLlmJson(req, { ...this.opts, ...per });
+   }
+ };
+
+ export {
+   LlmCallError,
+   stripFencedJson,
+   callLlm,
+   callLlmJson,
+   probeLlm,
+   LlmClient
+ };
+ //# sourceMappingURL=chunk-ITN4YOZY.js.map
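The retry loop in `callLlm` above sleeps for the server's `Retry-After` value (delta-seconds or an HTTP-date) when one is sent, and otherwise for `500 * 2^attempt` ms capped at 16 s. Only 429/502/503/504 responses and transient network errors (abort/timeout, `fetch failed`, `ECONNRESET`, `ETIMEDOUT`, `EAI_AGAIN`) are retried. `maxRetries` counts total attempts; the source JSDoc reads "Default 3 (1 initial + 2 retries)". The sketch below restates just the wait schedule as pure functions so it can be checked without a network. `retryDelays` and `AttemptOutcome` are hypothetical helpers for illustration, not exports of the package, and the network-error branch (which always uses plain backoff) is left out.

```typescript
// Pure restatement of the wait logic in callLlm (compiled chunk above).
// `retryDelays` and `AttemptOutcome` are hypothetical, for illustration only;
// they are not exported by @tangle-network/agent-eval.

const RETRYABLE_STATUS = new Set([429, 502, 503, 504])

// Exponential backoff: 500 ms, 1 s, 2 s, 4 s, ... capped at 16 s.
function backoffMs(attempt: number): number {
  return Math.min(500 * Math.pow(2, attempt), 16_000)
}

// Retry-After is either delta-seconds ("2") or an HTTP-date.
// `now` is injectable so the date form can be checked deterministically.
function parseRetryAfter(value: string | null, now: number = Date.now()): number | null {
  if (!value) return null
  const asNumber = Number(value)
  if (Number.isFinite(asNumber) && asNumber > 0) return asNumber * 1000
  const asDate = Date.parse(value)
  if (Number.isFinite(asDate)) return Math.max(0, asDate - now)
  return null
}

interface AttemptOutcome {
  status: number
  retryAfter?: string
}

// Sleep (ms) taken before each retry, given the HTTP status of each attempt.
// `maxRetries` counts total attempts, so the last allowed attempt never sleeps:
// in callLlm its error is thrown instead.
function retryDelays(outcomes: AttemptOutcome[], maxRetries = 3): number[] {
  const delays: number[] = []
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const outcome = outcomes[attempt]
    // Stop on: no more outcomes, a non-retryable status (success or a hard
    // error), or the final attempt.
    if (!outcome || !RETRYABLE_STATUS.has(outcome.status) || attempt >= maxRetries - 1) break
    delays.push(parseRetryAfter(outcome.retryAfter ?? null) ?? backoffMs(attempt))
  }
  return delays
}

// Router answers 429 (Retry-After: 2), then 503, then 503 again.
const delays = retryDelays([{ status: 429, retryAfter: '2' }, { status: 503 }, { status: 503 }])
// delays is [2000, 1000]: Retry-After wins on the 429, backoffMs(1) = 1000 on the
// first 503, and the third attempt is the last one, so its 503 is thrown, not retried.
```

Because the header value takes precedence over the computed backoff, a router that sends a large `Retry-After` can stretch a single call well past the 16 s backoff cap; the per-attempt `timeoutMs` does not bound these sleeps.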
@@ -0,0 +1 @@
+ {"version":3,"sources":["../src/llm-client.ts"],"sourcesContent":["/**\n * LLM client with graceful degrade.\n *\n * OpenAI-compatible `/v1/chat/completions` client with:\n * - Exponential-backoff retry on 429 + 5xx gateway errors (502/503/504).\n * - Retry on transient network errors (fetch failed, AbortError, ECONNRESET).\n * - Graceful json_schema → json_object degrade on 400 with schema-reject body.\n * - Fenced-JSON stripping (```json ... ```) for models that wrap structured output.\n * - Configurable base URL + api key / bearer, works with LiteLLM proxies, OpenAI\n * directly, cli-bridge subscriptions, and any router that speaks the spec.\n *\n * Usage:\n * const { value, result } = await callLlmJson<MyType>(\n * { model: 'gpt-4o', messages: [...], jsonSchema: { name: 'x', schema: {...} } },\n * { baseUrl: 'https://router.tangle.tools/v1', apiKey: process.env.KEY },\n * )\n *\n * This is THE llm-calling seam for agent-eval primitives that need structured\n * output (semantic concept judge, reviewer directives, critic scores). Primitives\n * that need free-form text use `callLlm` and parse output themselves.\n */\n\n// ─── Types ──────────────────────────────────────────────────────────────\n\nexport interface LlmMessage {\n role: 'system' | 'user' | 'assistant'\n /**\n * Either a plain text content string OR a multimodal content array\n * (text + image_url parts) for vision-capable models.\n */\n content:\n | string\n | Array<\n | { type: 'text'; text: string }\n | { type: 'image_url'; image_url: { url: string; detail?: 'auto' | 'low' | 'high' } }\n >\n}\n\nexport interface LlmCallRequest {\n model: string\n messages: LlmMessage[]\n /** Optional JSON-mode response format (response_format: json_object). */\n jsonMode?: boolean\n /** Optional structured output via JSON Schema. Falls back to json_object on 400. */\n jsonSchema?: { name: string; schema: Record<string, unknown> }\n temperature?: number\n maxTokens?: number\n /** Per-call timeout, default 60s. 
*/\n timeoutMs?: number\n}\n\nexport interface LlmUsage {\n promptTokens: number\n completionTokens: number\n totalTokens: number\n /** Proxies populate this when prompt caching is on. */\n cachedPromptTokens?: number\n}\n\nexport interface LlmCallResult {\n /** The text content of the first choice. Empty string if none. */\n content: string\n usage: LlmUsage\n /**\n * Cost in USD. Pulled from proxy's `_response_cost` field when present;\n * `null` when neither the proxy nor the caller can derive it.\n */\n costUsd: number | null\n /** Model name actually used (echoed from response). */\n model: string\n /** Wall-clock duration of the HTTP call (last attempt, if retried). */\n durationMs: number\n /** Raw response body. */\n raw: Record<string, unknown>\n}\n\nexport class LlmCallError extends Error {\n constructor(\n message: string,\n public readonly status: number,\n public readonly body: string,\n public readonly model: string,\n ) {\n super(message)\n this.name = 'LlmCallError'\n }\n}\n\nexport interface LlmClientOptions {\n /** Base URL (without trailing slash). Must end at the `/v1` prefix. */\n baseUrl?: string\n /** Bearer token — either `apiKey` or `bearer` populates `Authorization: Bearer ...`. */\n apiKey?: string\n bearer?: string\n /** Override for the `Authorization` header (e.g. `X-Auth: ...`). Takes precedence over apiKey/bearer. */\n authHeader?: { name: string; value: string }\n /** Default timeout in ms. Per-call can override. */\n defaultTimeoutMs?: number\n /** Max retry attempts on retriable errors. Default 3 (1 initial + 2 retries). */\n maxRetries?: number\n /** Fetch implementation — defaults to global `fetch`. Override for custom transport (e.g. tests). 
*/\n fetch?: typeof fetch\n}\n\n// ─── Internals ──────────────────────────────────────────────────────────\n\nconst DEFAULT_BASE_URL = 'https://router.tangle.tools/v1'\nconst DEFAULT_TIMEOUT_MS = 60_000\nconst DEFAULT_MAX_RETRIES = 3\n\nconst RETRYABLE_STATUS = new Set([429, 502, 503, 504])\n\nfunction isRetryableError(err: unknown): boolean {\n if (err instanceof LlmCallError) return RETRYABLE_STATUS.has(err.status)\n if (err instanceof Error) {\n return (\n err.name === 'AbortError' ||\n err.name === 'TimeoutError' ||\n /fetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN/i.test(err.message)\n )\n }\n return false\n}\n\nfunction parseRetryAfter(headers: Headers): number | null {\n const h = headers.get('retry-after')\n if (!h) return null\n const asNumber = Number(h)\n if (Number.isFinite(asNumber) && asNumber > 0) return asNumber * 1000\n const asDate = Date.parse(h)\n if (Number.isFinite(asDate)) return Math.max(0, asDate - Date.now())\n return null\n}\n\nfunction backoffMs(attempt: number): number {\n // 500ms, 1s, 2s, 4s, ...\n return Math.min(500 * Math.pow(2, attempt), 16_000)\n}\n\nfunction buildHeaders(opts: LlmClientOptions): Record<string, string> {\n const headers: Record<string, string> = {\n 'Content-Type': 'application/json',\n Accept: 'application/json',\n }\n if (opts.authHeader) {\n headers[opts.authHeader.name] = opts.authHeader.value\n } else if (opts.bearer || opts.apiKey) {\n headers.Authorization = `Bearer ${opts.bearer ?? 
opts.apiKey}`\n }\n return headers\n}\n\nfunction isSchemaRejection(status: number, body: string): boolean {\n if (status !== 400) return false\n const lower = body.toLowerCase()\n return (\n lower.includes('response_format') ||\n lower.includes('json_schema') ||\n lower.includes('is unavailable') ||\n lower.includes('not supported')\n )\n}\n\nfunction buildBody(req: LlmCallRequest, forceJsonObject: boolean): Record<string, unknown> {\n const body: Record<string, unknown> = {\n model: req.model,\n messages: req.messages,\n temperature: req.temperature ?? 0,\n }\n if (req.maxTokens != null) body.max_tokens = req.maxTokens\n\n if (req.jsonSchema && !forceJsonObject) {\n body.response_format = {\n type: 'json_schema',\n json_schema: { name: req.jsonSchema.name, schema: req.jsonSchema.schema, strict: true },\n }\n } else if (req.jsonMode || req.jsonSchema) {\n body.response_format = { type: 'json_object' }\n }\n\n return body\n}\n\nasync function sleep(ms: number): Promise<void> {\n return new Promise((resolve) => setTimeout(resolve, ms))\n}\n\n// ─── Public API ─────────────────────────────────────────────────────────\n\n/**\n * Strip a ```json / ``` code fence if the model emitted one.\n * Idempotent for naked JSON. Some models (claude-code via router, certain\n * deepseek models) wrap output even under json_object.\n */\nexport function stripFencedJson(raw: string): string {\n const trimmed = raw.trim()\n const m = trimmed.match(/^```(?:json)?\\s*\\n?([\\s\\S]*?)\\n?```\\s*$/)\n return m ? m[1]!.trim() : trimmed\n}\n\n/**\n * Low-level call. Returns raw content + usage + cost. Retries on transient\n * failures; does NOT degrade schema here — callers that want graceful\n * degrade use `callLlmJson`.\n */\nexport async function callLlm(\n req: LlmCallRequest,\n opts: LlmClientOptions = {},\n): Promise<LlmCallResult> {\n const baseUrl = (opts.baseUrl ?? 
DEFAULT_BASE_URL).replace(/\\/+$/, '')\n const url = `${baseUrl}/chat/completions`\n const timeoutMs = req.timeoutMs ?? opts.defaultTimeoutMs ?? DEFAULT_TIMEOUT_MS\n const maxRetries = opts.maxRetries ?? DEFAULT_MAX_RETRIES\n const fetchFn = opts.fetch ?? globalThis.fetch\n const headers = buildHeaders(opts)\n\n let lastErr: unknown\n for (let attempt = 0; attempt < maxRetries; attempt++) {\n const controller = new AbortController()\n const timeoutHandle = setTimeout(() => controller.abort(), timeoutMs)\n const started = Date.now()\n\n try {\n const res = await fetchFn(url, {\n method: 'POST',\n headers,\n body: JSON.stringify(buildBody(req, false)),\n signal: controller.signal,\n })\n clearTimeout(timeoutHandle)\n\n if (!res.ok) {\n const body = await res.text()\n const err = new LlmCallError(\n `LLM call ${res.status}: ${body.slice(0, 300)}`,\n res.status,\n body,\n req.model,\n )\n if (RETRYABLE_STATUS.has(res.status) && attempt < maxRetries - 1) {\n lastErr = err\n const retryAfter = parseRetryAfter(res.headers)\n await sleep(retryAfter ?? backoffMs(attempt))\n continue\n }\n throw err\n }\n\n const json = (await res.json()) as Record<string, unknown>\n const choice = (json.choices as Array<{ message?: { content?: string } }> | undefined)?.[0]\n const usageRaw = (json.usage as Record<string, unknown> | undefined) ?? {}\n const costFromProxy = (json._response_cost ?? json.cost_usd) as number | undefined\n\n return {\n content: choice?.message?.content ?? '',\n usage: {\n promptTokens: Number(usageRaw.prompt_tokens ?? 0),\n completionTokens: Number(usageRaw.completion_tokens ?? 0),\n totalTokens: Number(usageRaw.total_tokens ?? 0),\n cachedPromptTokens:\n usageRaw.prompt_tokens_details &&\n typeof usageRaw.prompt_tokens_details === 'object'\n ? Number(\n (usageRaw.prompt_tokens_details as Record<string, unknown>).cached_tokens ?? 0,\n )\n : undefined,\n },\n costUsd: typeof costFromProxy === 'number' ? costFromProxy : null,\n model: (json.model as string) ?? 
req.model,\n durationMs: Date.now() - started,\n raw: json,\n }\n } catch (err) {\n clearTimeout(timeoutHandle)\n lastErr = err\n if (attempt < maxRetries - 1 && isRetryableError(err)) {\n await sleep(backoffMs(attempt))\n continue\n }\n throw err\n }\n }\n throw lastErr instanceof Error ? lastErr : new Error(String(lastErr))\n}\n\n/**\n * Structured-output call. Returns parsed JSON plus the raw result envelope.\n * Degrades `jsonSchema` → `jsonMode` on a 400 that names the schema param —\n * critical for deepseek-v3/v4, kimi-k2.6, and other models that don't accept\n * the `response_format.json_schema` shape but DO accept `json_object`.\n */\nexport async function callLlmJson<T = unknown>(\n req: LlmCallRequest,\n opts: LlmClientOptions = {},\n): Promise<{ value: T; result: LlmCallResult }> {\n try {\n const result = await callLlm({ ...req, jsonMode: req.jsonMode ?? !req.jsonSchema }, opts)\n const value = parseJsonSafely<T>(result.content, result.model)\n return { value, result }\n } catch (err) {\n if (err instanceof LlmCallError && isSchemaRejection(err.status, err.body) && req.jsonSchema) {\n // Degrade to json_object + retry.\n const degradedReq: LlmCallRequest = { ...req, jsonMode: true, jsonSchema: undefined }\n const result = await callLlm(degradedReq, opts)\n const value = parseJsonSafely<T>(result.content, result.model)\n return { value, result }\n }\n throw err\n }\n}\n\nfunction parseJsonSafely<T>(content: string, model: string): T {\n const stripped = stripFencedJson(content)\n try {\n return JSON.parse(stripped) as T\n } catch (err) {\n throw new Error(\n `LLM returned non-JSON content (model=${model}): ${\n err instanceof Error ? err.message : String(err)\n }\\n--- raw content ---\\n${content.slice(0, 800)}`,\n )\n }\n}\n\n/**\n * Probe whether a model is reachable. Returns latency + null error on\n * success; `ok=false` + error message on any failure (HTTP, timeout,\n * network, parse). 
Designed for sweep preflights — fail loud at the\n * boundary before burning a 30-leaf run on a misconfigured router.\n *\n * Sends a tiny `ping` message with `maxTokens=64`. Reasoning models\n * (glm-5.1, deepseek-v4) can burn the entire budget on internal reasoning\n * for short prompts, so don't tighten this further. We don't validate\n * content; HTTP 200 means reachable.\n */\nexport async function probeLlm(\n model: string,\n opts: LlmClientOptions & { timeoutMs?: number } = {},\n): Promise<{ ok: boolean; latencyMs: number; error: string | null }> {\n const start = Date.now()\n try {\n await callLlm(\n {\n model,\n messages: [{ role: 'user', content: 'ping' }],\n maxTokens: 64,\n timeoutMs: opts.timeoutMs ?? 30_000,\n },\n opts,\n )\n return { ok: true, latencyMs: Date.now() - start, error: null }\n } catch (err) {\n return {\n ok: false,\n latencyMs: Date.now() - start,\n error: err instanceof Error ? err.message : String(err),\n }\n }\n}\n\n/**\n * Stateful client — construct once with defaults, call many times.\n * Thin wrapper around the free functions; exists for callers that want\n * to inject a single configured instance into multiple primitives.\n */\nexport class LlmClient {\n constructor(private readonly opts: LlmClientOptions = {}) {}\n\n call(req: LlmCallRequest, per?: LlmClientOptions): Promise<LlmCallResult> {\n return callLlm(req, { ...this.opts, ...per })\n }\n\n callJson<T = unknown>(\n req: LlmCallRequest,\n per?: LlmClientOptions,\n ): Promise<{ value: T; result: LlmCallResult }> {\n return callLlmJson<T>(req, { ...this.opts, ...per })\n 
}\n}\n"],"mappings":";AA4EO,IAAM,eAAN,cAA2B,MAAM;AAAA,EACtC,YACE,SACgB,QACA,MACA,OAChB;AACA,UAAM,OAAO;AAJG;AACA;AACA;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EANkB;AAAA,EACA;AAAA,EACA;AAKpB;AAoBA,IAAM,mBAAmB;AACzB,IAAM,qBAAqB;AAC3B,IAAM,sBAAsB;AAE5B,IAAM,mBAAmB,oBAAI,IAAI,CAAC,KAAK,KAAK,KAAK,GAAG,CAAC;AAErD,SAAS,iBAAiB,KAAuB;AAC/C,MAAI,eAAe,aAAc,QAAO,iBAAiB,IAAI,IAAI,MAAM;AACvE,MAAI,eAAe,OAAO;AACxB,WACE,IAAI,SAAS,gBACb,IAAI,SAAS,kBACb,+CAA+C,KAAK,IAAI,OAAO;AAAA,EAEnE;AACA,SAAO;AACT;AAEA,SAAS,gBAAgB,SAAiC;AACxD,QAAM,IAAI,QAAQ,IAAI,aAAa;AACnC,MAAI,CAAC,EAAG,QAAO;AACf,QAAM,WAAW,OAAO,CAAC;AACzB,MAAI,OAAO,SAAS,QAAQ,KAAK,WAAW,EAAG,QAAO,WAAW;AACjE,QAAM,SAAS,KAAK,MAAM,CAAC;AAC3B,MAAI,OAAO,SAAS,MAAM,EAAG,QAAO,KAAK,IAAI,GAAG,SAAS,KAAK,IAAI,CAAC;AACnE,SAAO;AACT;AAEA,SAAS,UAAU,SAAyB;AAE1C,SAAO,KAAK,IAAI,MAAM,KAAK,IAAI,GAAG,OAAO,GAAG,IAAM;AACpD;AAEA,SAAS,aAAa,MAAgD;AACpE,QAAM,UAAkC;AAAA,IACtC,gBAAgB;AAAA,IAChB,QAAQ;AAAA,EACV;AACA,MAAI,KAAK,YAAY;AACnB,YAAQ,KAAK,WAAW,IAAI,IAAI,KAAK,WAAW;AAAA,EAClD,WAAW,KAAK,UAAU,KAAK,QAAQ;AACrC,YAAQ,gBAAgB,UAAU,KAAK,UAAU,KAAK,MAAM;AAAA,EAC9D;AACA,SAAO;AACT;AAEA,SAAS,kBAAkB,QAAgB,MAAuB;AAChE,MAAI,WAAW,IAAK,QAAO;AAC3B,QAAM,QAAQ,KAAK,YAAY;AAC/B,SACE,MAAM,SAAS,iBAAiB,KAChC,MAAM,SAAS,aAAa,KAC5B,MAAM,SAAS,gBAAgB,KAC/B,MAAM,SAAS,eAAe;AAElC;AAEA,SAAS,UAAU,KAAqB,iBAAmD;AACzF,QAAM,OAAgC;AAAA,IACpC,OAAO,IAAI;AAAA,IACX,UAAU,IAAI;AAAA,IACd,aAAa,IAAI,eAAe;AAAA,EAClC;AACA,MAAI,IAAI,aAAa,KAAM,MAAK,aAAa,IAAI;AAEjD,MAAI,IAAI,cAAc,CAAC,iBAAiB;AACtC,SAAK,kBAAkB;AAAA,MACrB,MAAM;AAAA,MACN,aAAa,EAAE,MAAM,IAAI,WAAW,MAAM,QAAQ,IAAI,WAAW,QAAQ,QAAQ,KAAK;AAAA,IACxF;AAAA,EACF,WAAW,IAAI,YAAY,IAAI,YAAY;AACzC,SAAK,kBAAkB,EAAE,MAAM,cAAc;AAAA,EAC/C;AAEA,SAAO;AACT;AAEA,eAAe,MAAM,IAA2B;AAC9C,SAAO,IAAI,QAAQ,CAAC,YAAY,WAAW,SAAS,EAAE,CAAC;AACzD;AASO,SAAS,gBAAgB,KAAqB;AACnD,QAAM,UAAU,IAAI,KAAK;AACzB,QAAM,IAAI,QAAQ,MAAM,yCAAyC;AACjE,SAAO,IAAI,EAAE,CAAC,EAAG,KAAK,IAAI;AAC5B;AAOA,eAAsB,QACpB,KACA,OAAyB,CAAC,GACF;AACxB,QAAM,WAAW,KAAK,WAAW,kBAAkB,QAAQ,QAAQ,EAAE;AACrE,QAAM,MAAM,GAAG,OAAO;AACtB,QAAM,YAAY
,IAAI,aAAa,KAAK,oBAAoB;AAC5D,QAAM,aAAa,KAAK,cAAc;AACtC,QAAM,UAAU,KAAK,SAAS,WAAW;AACzC,QAAM,UAAU,aAAa,IAAI;AAEjC,MAAI;AACJ,WAAS,UAAU,GAAG,UAAU,YAAY,WAAW;AACrD,UAAM,aAAa,IAAI,gBAAgB;AACvC,UAAM,gBAAgB,WAAW,MAAM,WAAW,MAAM,GAAG,SAAS;AACpE,UAAM,UAAU,KAAK,IAAI;AAEzB,QAAI;AACF,YAAM,MAAM,MAAM,QAAQ,KAAK;AAAA,QAC7B,QAAQ;AAAA,QACR;AAAA,QACA,MAAM,KAAK,UAAU,UAAU,KAAK,KAAK,CAAC;AAAA,QAC1C,QAAQ,WAAW;AAAA,MACrB,CAAC;AACD,mBAAa,aAAa;AAE1B,UAAI,CAAC,IAAI,IAAI;AACX,cAAM,OAAO,MAAM,IAAI,KAAK;AAC5B,cAAM,MAAM,IAAI;AAAA,UACd,YAAY,IAAI,MAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAAA,UAC7C,IAAI;AAAA,UACJ;AAAA,UACA,IAAI;AAAA,QACN;AACA,YAAI,iBAAiB,IAAI,IAAI,MAAM,KAAK,UAAU,aAAa,GAAG;AAChE,oBAAU;AACV,gBAAM,aAAa,gBAAgB,IAAI,OAAO;AAC9C,gBAAM,MAAM,cAAc,UAAU,OAAO,CAAC;AAC5C;AAAA,QACF;AACA,cAAM;AAAA,MACR;AAEA,YAAM,OAAQ,MAAM,IAAI,KAAK;AAC7B,YAAM,SAAU,KAAK,UAAoE,CAAC;AAC1F,YAAM,WAAY,KAAK,SAAiD,CAAC;AACzE,YAAM,gBAAiB,KAAK,kBAAkB,KAAK;AAEnD,aAAO;AAAA,QACL,SAAS,QAAQ,SAAS,WAAW;AAAA,QACrC,OAAO;AAAA,UACL,cAAc,OAAO,SAAS,iBAAiB,CAAC;AAAA,UAChD,kBAAkB,OAAO,SAAS,qBAAqB,CAAC;AAAA,UACxD,aAAa,OAAO,SAAS,gBAAgB,CAAC;AAAA,UAC9C,oBACE,SAAS,yBACT,OAAO,SAAS,0BAA0B,WACtC;AAAA,YACG,SAAS,sBAAkD,iBAAiB;AAAA,UAC/E,IACA;AAAA,QACR;AAAA,QACA,SAAS,OAAO,kBAAkB,WAAW,gBAAgB;AAAA,QAC7D,OAAQ,KAAK,SAAoB,IAAI;AAAA,QACrC,YAAY,KAAK,IAAI,IAAI;AAAA,QACzB,KAAK;AAAA,MACP;AAAA,IACF,SAAS,KAAK;AACZ,mBAAa,aAAa;AAC1B,gBAAU;AACV,UAAI,UAAU,aAAa,KAAK,iBAAiB,GAAG,GAAG;AACrD,cAAM,MAAM,UAAU,OAAO,CAAC;AAC9B;AAAA,MACF;AACA,YAAM;AAAA,IACR;AAAA,EACF;AACA,QAAM,mBAAmB,QAAQ,UAAU,IAAI,MAAM,OAAO,OAAO,CAAC;AACtE;AAQA,eAAsB,YACpB,KACA,OAAyB,CAAC,GACoB;AAC9C,MAAI;AACF,UAAM,SAAS,MAAM,QAAQ,EAAE,GAAG,KAAK,UAAU,IAAI,YAAY,CAAC,IAAI,WAAW,GAAG,IAAI;AACxF,UAAM,QAAQ,gBAAmB,OAAO,SAAS,OAAO,KAAK;AAC7D,WAAO,EAAE,OAAO,OAAO;AAAA,EACzB,SAAS,KAAK;AACZ,QAAI,eAAe,gBAAgB,kBAAkB,IAAI,QAAQ,IAAI,IAAI,KAAK,IAAI,YAAY;AAE5F,YAAM,cAA8B,EAAE,GAAG,KAAK,UAAU,MAAM,YAAY,OAAU;AACpF,YAAM,SAAS,MAAM,QAAQ,aAAa,IAAI;AAC9C,YAAM,QAAQ,gBAAmB,OAAO,SAAS,OAAO,KAAK;AAC7D,aAAO,EAAE,OAAO,OAAO;AAAA,IACzB;AACA,UAAM;AAAA,E
ACR;AACF;AAEA,SAAS,gBAAmB,SAAiB,OAAkB;AAC7D,QAAM,WAAW,gBAAgB,OAAO;AACxC,MAAI;AACF,WAAO,KAAK,MAAM,QAAQ;AAAA,EAC5B,SAAS,KAAK;AACZ,UAAM,IAAI;AAAA,MACR,wCAAwC,KAAK,MAC3C,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG,CACjD;AAAA;AAAA,EAA0B,QAAQ,MAAM,GAAG,GAAG,CAAC;AAAA,IACjD;AAAA,EACF;AACF;AAaA,eAAsB,SACpB,OACA,OAAkD,CAAC,GACgB;AACnE,QAAM,QAAQ,KAAK,IAAI;AACvB,MAAI;AACF,UAAM;AAAA,MACJ;AAAA,QACE;AAAA,QACA,UAAU,CAAC,EAAE,MAAM,QAAQ,SAAS,OAAO,CAAC;AAAA,QAC5C,WAAW;AAAA,QACX,WAAW,KAAK,aAAa;AAAA,MAC/B;AAAA,MACA;AAAA,IACF;AACA,WAAO,EAAE,IAAI,MAAM,WAAW,KAAK,IAAI,IAAI,OAAO,OAAO,KAAK;AAAA,EAChE,SAAS,KAAK;AACZ,WAAO;AAAA,MACL,IAAI;AAAA,MACJ,WAAW,KAAK,IAAI,IAAI;AAAA,MACxB,OAAO,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAAA,IACxD;AAAA,EACF;AACF;AAOO,IAAM,YAAN,MAAgB;AAAA,EACrB,YAA6B,OAAyB,CAAC,GAAG;AAA7B;AAAA,EAA8B;AAAA,EAA9B;AAAA,EAE7B,KAAK,KAAqB,KAAgD;AACxE,WAAO,QAAQ,KAAK,EAAE,GAAG,KAAK,MAAM,GAAG,IAAI,CAAC;AAAA,EAC9C;AAAA,EAEA,SACE,KACA,KAC8C;AAC9C,WAAO,YAAe,KAAK,EAAE,GAAG,KAAK,MAAM,GAAG,IAAI,CAAC;AAAA,EACrD;AACF;","names":[]}
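`callLlmJson` in the chunk above layers two recoveries on top of `callLlm`. It strips one surrounding `json` code fence before `JSON.parse`, because the source comments note that some models wrap output even under `json_object`. And when a request that set `jsonSchema` fails with a 400 whose body mentions `response_format`, `json_schema`, `is unavailable`, or `not supported`, it resends once with `response_format: { type: 'json_object' }`. The following is a minimal synchronous sketch of just those two steps against a fake transport. `Transport`, `fakeRouter`, and `callJsonWithDegrade` are hypothetical names. The fake returns message content directly rather than a chat-completions envelope, and retries and timeouts are omitted.

```typescript
// Sketch of the two recovery steps in callLlmJson: fence stripping and the
// one-shot json_schema -> json_object downgrade. `Transport`, `fakeRouter`, and
// `callJsonWithDegrade` are hypothetical names for illustration; the real
// function is async, talks HTTP, and also retries transient failures.

// Same pattern as stripFencedJson, with `{3} in place of three literal backticks.
function stripFencedJson(raw: string): string {
  const trimmed = raw.trim()
  const m = trimmed.match(/^`{3}(?:json)?\s*\n?([\s\S]*?)\n?`{3}\s*$/)
  return m ? m[1]!.trim() : trimmed
}

// A 400 whose body names the structured-output feature counts as "schema unsupported".
function isSchemaRejection(status: number, body: string): boolean {
  if (status !== 400) return false
  const lower = body.toLowerCase()
  return ['response_format', 'json_schema', 'is unavailable', 'not supported'].some((s) =>
    lower.includes(s),
  )
}

type ResponseFormat =
  | { type: 'json_schema'; json_schema: { name: string; schema: Record<string, unknown>; strict: boolean } }
  | { type: 'json_object' }

// The fake transport returns message content directly, not a chat-completions envelope.
type Transport = (format: ResponseFormat) => { status: number; body: string }

function callJsonWithDegrade<T>(send: Transport, schema: Record<string, unknown>): T {
  let res = send({ type: 'json_schema', json_schema: { name: 'out', schema, strict: true } })
  // Downgrade at most once, and only for a schema rejection.
  if (isSchemaRejection(res.status, res.body)) res = send({ type: 'json_object' })
  if (res.status !== 200) throw new Error(`LLM call ${res.status}: ${res.body.slice(0, 300)}`)
  return JSON.parse(stripFencedJson(res.body)) as T
}

// A router that rejects json_schema, accepts json_object, and fences its output.
const FENCE = '`'.repeat(3)
const seen: string[] = []
const fakeRouter: Transport = (format) => {
  seen.push(format.type)
  if (format.type === 'json_schema') {
    return { status: 400, body: '{"error":"response_format json_schema is not supported"}' }
  }
  return { status: 200, body: `${FENCE}json\n{"score": 0.8}\n${FENCE}` }
}

const out = callJsonWithDegrade<{ score: number }>(fakeRouter, { type: 'object' })
// out.score is 0.8; seen is ['json_schema', 'json_object']
```

The match on the 400 body is a plain substring check, so any 400 that mentions, say, "not supported" triggers the one downgrade attempt; if that second call also fails, its error is what the caller sees.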