ai-lcr 0.6.0 → 0.6.2

package/CHANGELOG.md CHANGED
@@ -4,6 +4,68 @@ All notable changes to `ai-lcr` are documented here. The format follows
4
4
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.6.2] — 2026-06-11
8
+
9
+ Circuit breaker for persistently-failing providers. Until now the only recovery
10
+ lever was `resetIntervalMs`, which snaps routing back to the cheapest provider on
11
+ a timer — so a provider that's actually down keeps eating one failed attempt
12
+ every window. The breaker remembers the failure and stops sending it traffic.
13
+
14
+ ### Added
15
+
16
+ - **`createLCR({ cooldown })`.** A provider that fails `maxFailures` times within
17
+ `windowMs` is *skipped* for `cooldownMs` instead of being re-probed every
18
+ request; a single success clears its count. `true` enables defaults (3 / 60s →
19
+ 60s); pass `{ maxFailures, windowMs, cooldownMs }` to tune. New exported type
20
+ `CooldownOptions`.
21
+ - The breaker only **reorders** each request's attempt list (cooling providers go
22
+ last), so when every provider is cooling a request still tries them all rather
23
+ than failing outright — it can never turn a recoverable request into a hard
24
+ failure.
25
+
26
+ ### Changed
27
+
28
+ - The routing engine now snapshots a per-request **attempt order** once (cheapest
29
+ ring with cooling providers moved to the back) and threads it through streaming
30
+ failover, replacing the previous modular index walk. Behavior is identical when
31
+ `cooldown` is unset.
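
The attempt-order snapshot described above can be sketched as a ring walk followed by a stable partition. This is a minimal illustration, not ai-lcr's internals: `snapshotAttemptOrder`, its parameters, and the idea that the ring starts at a current index are all assumptions made for the example.

```typescript
// Hypothetical helper (not part of ai-lcr's public API).
// providers: configured order, cheapest first.
// startIndex: where routing currently points in that list (assumed).
// isCooling: whether the breaker currently considers a provider cooling.
function snapshotAttemptOrder(
  providers: readonly string[],
  startIndex: number,
  isCooling: (provider: string) => boolean,
): string[] {
  const n = providers.length;
  // Normalise the start so negative or oversized indexes still land in the ring.
  const start = n === 0 ? 0 : ((startIndex % n) + n) % n;
  // One full lap of the ring, beginning at the current position.
  const ring = [...providers.slice(start), ...providers.slice(0, start)];
  // Stable partition: healthy providers keep ring order, cooling ones move to the back.
  const healthy = ring.filter((p) => !isCooling(p));
  const cooling = ring.filter((p) => isCooling(p));
  return [...healthy, ...cooling];
}
```

Because cooling providers are moved rather than dropped, the returned list always has the same length as the input, which is what keeps an all-cooling request from failing outright.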
32
+
33
+ ### Compatibility
34
+
35
+ - Fully backward compatible. `cooldown` is **off by default** — with it unset no
36
+ provider is ever skipped and routing behaves exactly as before.
37
+
38
+ ## [0.6.1] — 2026-06-11
39
+
40
+ Zero-config pricing for native-maker routes. Until now every priced provider
41
+ needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
42
+ is just the public list price you could look up. 0.6.1 bundles those.
43
+
44
+ ### Added
45
+
46
+ - **Bundled price table (`MODEL_PRICES`).** Official first-party token prices for
47
+ the native makers ai-lcr documents (openai · anthropic · gemini · deepseek ·
48
+ xai · mistral), keyed by the bare model id you pass to that vendor's AI SDK
49
+ provider — USD per 1M tokens, with `cacheRead` where the maker prices it.
50
+ Generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT)
51
+ via `scripts/gen-text-prices.mjs`; the generated file is committed.
52
+ - **`getModelPrice(modelId)`.** Look up a bundled price directly; resolves a bare
53
+ id or one with a leading `provider/` segment stripped.
54
+ - **`createLCR({ autoPrice: true })`.** Fills any provider entry that has no
55
+ explicit `cost` from the table, by `model.modelId`. A native-vendor route then
56
+ needs zero hand-typed pricing and `autoSort` can order it.
57
+ - **`discount` on a provider entry.** The flat-reseller knob: `{ model:
58
+ kunavo("…"), discount: 0.2 }` prices a −20% aggregator off the bundled list
59
+ price (scaling input/output/cacheRead) with no hand-typed number. Applies only
60
+ when `autoPrice` fills the entry; out-of-range values throw.
61
+
62
+ ### Compatibility
63
+
64
+ - Fully backward compatible. `autoPrice` is **off by default** — unpriced entries
65
+ stay unpriced and an explicit `cost` always wins, so no existing config changes
66
+ behavior. The table covers native makers only; open-weights hosts (DeepInfra)
67
+ and breadth aggregators (OpenRouter) are still priced explicitly.
68
+
7
69
  ## [0.6.0] — 2026-06-10
8
70
 
9
71
  Media billing contract v2: **rank by the reference, bill by actual usage.**
package/README.md CHANGED
@@ -138,12 +138,50 @@ const lcr = createLCR({
138
138
 
139
139
  DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
140
140
 
141
+ ## Zero-config pricing (`autoPrice`)
142
+
143
+ Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, DeepSeek, xAI, Mistral), keyed by the bare model id you already pass to the provider:
144
+
145
+ ```ts
+ const lcr = createLCR({
+   autoPrice: true, // fill missing costs from the bundled table
+   autoSort: true, // then order cheapest-first using those prices
+   models: {
+     "claude-sonnet": [
+       // Native API — price comes from the table, nothing to type.
+       { model: anthropic("claude-sonnet-4-6"), label: "anthropic" },
+       // Flat-discount aggregator — `discount` applies on top of the list price.
+       { model: kunavo("claude-sonnet-4-6"), label: "kunavo", discount: 0.2 }, // 20% off list
+     ],
+   },
+ });
+ ```
159
+
160
+ Three rules keep it predictable:
161
+
162
+ - **Off by default.** With `autoPrice` unset, unpriced entries stay unpriced (the pre-existing behavior), so upgrading never silently re-prices a model. With it on, an **explicit `cost` always wins** over the table.
163
+ - **`discount` is the reseller knob.** A flat-% aggregator (Kunavo −20%) becomes `discount: 0.2` instead of a hand-typed number; it scales input, output, and `cacheRead` alike, and only applies when the table fills the entry. Variable-discount providers (TokenMart) still want explicit per-model `cost`.
164
+ - **Native makers only.** The table carries first-party list prices — the cheapest, most-featureful "go direct" route. Open-weights hosts (DeepInfra) and breadth aggregators (OpenRouter) aren't in it; price those explicitly.
165
+
166
+ Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT) — refresh with `node scripts/gen-text-prices.mjs`.
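
The three rules above can be read as a small resolution function. The sketch below is an assumption-laden illustration, not the library's code: `DEMO_PRICES`, `lookupPrice`, and `resolveCost` are hypothetical names, the prices are placeholders rather than real list prices, and the accepted `discount` range (0 up to but not including 1) is a guess at what "out-of-range" means.

```typescript
// USD per 1M tokens, mirroring the `cost: { input, output }` shape plus `cacheRead`.
interface Price { input: number; output: number; cacheRead?: number }

// Placeholder table for the example only; these are NOT real list prices.
const DEMO_PRICES: Record<string, Price> = {
  "demo-model": { input: 10, output: 50, cacheRead: 1 },
};

// Resolve a bare id, or an id with one leading `provider/` segment stripped.
function lookupPrice(modelId: string): Price | undefined {
  const direct = DEMO_PRICES[modelId];
  if (direct) return direct;
  const slash = modelId.indexOf("/");
  return slash === -1 ? undefined : DEMO_PRICES[modelId.slice(slash + 1)];
}

function resolveCost(
  entry: { modelId: string; cost?: Price; discount?: number },
  autoPrice: boolean,
): Price | undefined {
  if (entry.cost) return entry.cost;   // an explicit cost always wins; discount is ignored
  if (!autoPrice) return undefined;    // off by default: the entry stays unpriced
  const list = lookupPrice(entry.modelId);
  if (!list) return undefined;         // not in the table: price it explicitly
  const discount = entry.discount ?? 0;
  if (!(discount >= 0 && discount < 1)) {
    throw new RangeError(`discount must be in [0, 1), got ${discount}`); // assumed range
  }
  const keep = 1 - discount;
  const scaled: Price = { input: list.input * keep, output: list.output * keep };
  if (list.cacheRead !== undefined) scaled.cacheRead = list.cacheRead * keep;
  return scaled;
}
```

With `discount: 0.2` the placeholder entry resolves to roughly 8 / 40 / 0.8, which is the "scales input, output, and `cacheRead` alike" behavior the second rule describes.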
167
+
141
168
  ## How it routes
142
169
 
143
170
  1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
144
171
  2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
145
172
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
146
173
 
174
+ For a provider that's *persistently* down, the timer alone keeps re-probing it — one failed attempt every window. Turn on the **circuit breaker** to stop that:
175
+
176
+ ```ts
+ const lcr = createLCR({
+   models: { /* … */ },
+   cooldown: true, // skip a provider that keeps failing, instead of re-probing it
+ });
+ ```
182
+
183
+ With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
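
To make the failure-count, window, and cooldown interplay concrete, here is a minimal breaker sketch. It is an assumption about how such a breaker could work, not ai-lcr's implementation: `createBreakerSketch` and `BreakerOptions` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and clearing the cooldown on success (in addition to the count) is a guess.

```typescript
// Same three knobs the text names; the type name itself is hypothetical.
interface BreakerOptions { maxFailures: number; windowMs: number; cooldownMs: number }

// Defaults taken from the text: 3 failures / 60s window → 60s cooldown.
const BREAKER_DEFAULTS: BreakerOptions = { maxFailures: 3, windowMs: 60_000, cooldownMs: 60_000 };

function createBreakerSketch(opts: BreakerOptions = BREAKER_DEFAULTS) {
  const failures = new Map<string, number[]>();   // recent failure timestamps per provider
  const coolingUntil = new Map<string, number>(); // when each cooling provider becomes eligible again

  return {
    recordFailure(provider: string, now: number): void {
      // Keep only failures still inside the sliding window, then add this one.
      const recent = (failures.get(provider) ?? []).filter((t) => now - t < opts.windowMs);
      recent.push(now);
      failures.set(provider, recent);
      if (recent.length >= opts.maxFailures) {
        coolingUntil.set(provider, now + opts.cooldownMs);
      }
    },
    recordSuccess(provider: string): void {
      // A single success clears the provider's state (assumed to include the cooldown).
      failures.delete(provider);
      coolingUntil.delete(provider);
    },
    isCooling(provider: string, now: number): boolean {
      const until = coolingUntil.get(provider);
      return until !== undefined && now < until;
    },
  };
}
```

`isCooling` is only a predicate; per the paragraph above, its result would be used to move a provider to the back of the attempt list, never to remove it.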
184
+
147
185
  ## See what happened (`onCall`)
148
186
 
149
187
  `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
@@ -185,13 +223,28 @@ interface CallRecord {
185
223
  outputTokens: number;
186
224
  cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
187
225
  costUsd: number; // winner cost, cache-discount applied (see `cacheRead`)
188
- baselineUsd?: number; // same usage on the priciest priced leg → savings = baselineUsd − costUsd
226
+ baselineUsd?: number; // what the savings baseline would have charged for the SAME usage → savings = baselineUsd − costUsd
227
+ baselineKind?: "last-leg" | "official" | "priciest-route"; // how that baseline was derived (see below)
228
+ cachedSavingUsd?: number; // the provider's own prompt-cache discount — real money, but NOT a routing saving; never fold it into baselineUsd − costUsd
189
229
  requestId?: string; // your correlation id (see below) — roll multi-step tool loops into one request
190
230
  usageMissing?: boolean; // winner served but reported 0/0 tokens → costUsd is 0 but unknown, not free
231
+ emptyCompletion?: boolean; // clean response that generated NOTHING — prompt billed, zero output
232
+
233
+ // Media calls (createMediaLCR) additionally carry:
234
+ modality?: "image" | "video";
235
+ usage?: { seconds?: number; outputs?: number; megapixels?: number }; // the actual usage the bill was based on
236
+ officialUsd?: number; // the model maker's first-party price for this call's usage
237
+ estCostUsd?: number; // what the configured price table PREDICTED — on provider-reported rows, costUsd − estCostUsd is price-table drift
191
238
  }
192
239
  ```
193
240
 
194
- **Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call, the number a cost dashboard exists to show.
241
+ **Savings, not just spend.** `baselineUsd` is what the same call would have cost without routing, and `baselineKind` says exactly what that means so a dashboard can qualify the number instead of trusting it blindly:
242
+
243
+ - **`"last-leg"`** (text): the **last priced provider** in the chain — your always-on, list-price fallback. Deliberately *not* the most expensive leg: prompt caching can make a sticker-cheaper provider cost more on a cache-heavy call, and a max-of-chain baseline would fabricate "savings" on calls the fallback itself served.
244
+ - **`"official"`** (media): the model maker's **first-party API price** for the same actual usage — an 8-second clip is baselined at 8 seconds of the official rate, not a reference length.
245
+ - **`"priciest-route"`** (media, no official price known): the most expensive route you configured. Honest about cross-provider spread, but self-referential — not a market price.
246
+
247
+ `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
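
A consumer of these records might compute and group savings roughly as follows. This is a sketch under stated assumptions: `routingSavingsUsd` and `savingsByKind` are hypothetical helpers, only the listed `CallRecord` fields are read, and excluding `usageMissing` rows is a choice made here rather than documented behavior.

```typescript
type BaselineKind = "last-leg" | "official" | "priciest-route";

// Only the CallRecord fields this sketch reads.
interface SavingsFields {
  costUsd: number;
  baselineUsd?: number;
  baselineKind?: BaselineKind;
  usageMissing?: boolean;
}

// Routing savings for one call, or undefined when no honest number exists.
function routingSavingsUsd(r: SavingsFields): number | undefined {
  if (r.baselineUsd === undefined) return undefined; // no baseline was derived
  if (r.usageMissing) return undefined;              // costUsd is unknown, not free (assumed exclusion)
  return r.baselineUsd - r.costUsd;
}

// Sum savings per baselineKind so each total can be labelled with how it was derived.
function savingsByKind(records: readonly SavingsFields[]): Partial<Record<BaselineKind, number>> {
  const totals: Partial<Record<BaselineKind, number>> = {};
  for (const r of records) {
    const saved = routingSavingsUsd(r);
    if (saved === undefined || r.baselineKind === undefined) continue;
    totals[r.baselineKind] = (totals[r.baselineKind] ?? 0) + saved;
  }
  return totals;
}
```

`cachedSavingUsd` is deliberately left out of both functions, in line with the field comment that it must never be folded into `baselineUsd − costUsd`.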
195
248
 
196
249
  **Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
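
The throughput formula above translates directly into a small helper. `outputTokensPerSecond` is a hypothetical name; the guards for a missing `ttftMs` and a non-positive generation window are additions for the sketch.

```typescript
// Only the CallRecord fields the formula needs.
interface ThroughputFields { outputTokens: number; latencyMs: number; ttftMs?: number }

// outputTokens / ((latencyMs − ttftMs) / 1000), or undefined when it cannot be computed.
function outputTokensPerSecond(r: ThroughputFields): number | undefined {
  if (r.ttftMs === undefined) return undefined; // non-streaming call, or failed before any content
  const generationMs = r.latencyMs - r.ttftMs;
  if (generationMs <= 0) return undefined;      // avoid dividing by zero or a negative window
  return r.outputTokens / (generationMs / 1000);
}
```

Since `latencyMs` covers the whole request including any failover while `ttftMs` starts at the winning attempt, the result on a failed-over call may understate the serving model's real speed.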
197
250
 
@@ -226,13 +279,28 @@ const lcr = createLCR({
226
279
  });
227
280
  ```
228
281
 
282
+ ### The companion dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))
283
+
284
+ <p align="center">
285
+ <img src="assets/dashboard-demo.png" alt="ai-lcr-dashboard (demo data): saved vs spent over time, a price-drift alert, per-project failover health, and per-provider reliability" width="780">
286
+ </p>
287
+
288
+ A **self-hostable** Next.js + Postgres collector built for exactly these records — point `createHttpSink` at its `/api/ingest` and you get, across every project you tag:
289
+
290
+ - **saved vs. spent** over time, with the savings qualified by `baselineKind` and clamped per call (one mispriced row can't eat the rest);
291
+ - **failover health** per provider — who actually failed, who caught it, what leaked to users;
292
+ - **media economics** — image/video calls split out with per-unit cost ($/second of video, $/image);
293
+ - a **price-drift panel** — when a provider's reported bill disagrees with your configured price table by >±20%, it surfaces the route (a ~100× ratio is the classic USD-vs-cents slip). Cheapest-first routing is only as good as its price table; this is the smoke alarm.
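
One way to read the ">±20%" drift rule is as a ratio check between the reported bill and the table's estimate. The sketch below is an interpretation, not the dashboard's actual query: `priceDrift` is a hypothetical helper, and the caller is assumed to pass only rows whose `costUsd` came from a provider-reported bill.

```typescript
// Only the CallRecord fields the check needs.
interface DriftFields { costUsd: number; estCostUsd?: number }

// Ratio of actual to predicted cost, and whether it is outside ±threshold.
function priceDrift(
  r: DriftFields,
  threshold = 0.2, // the ±20% band from the text
): { ratio: number; drifted: boolean } | undefined {
  if (r.estCostUsd === undefined || r.estCostUsd <= 0) return undefined; // nothing to compare against
  const ratio = r.costUsd / r.estCostUsd;
  return { ratio, drifted: Math.abs(ratio - 1) > threshold };
}
```

A ratio near 100 (or near 0.01) is the USD-vs-cents pattern the panel description mentions, so a consumer may want to surface the ratio itself rather than only the boolean.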
294
+
295
+ One-click Vercel deploy (any Postgres: Neon, Supabase, RDS, local); records carry metadata only — no prompts, no outputs. The ingest contract is just the `CallRecord` JSON, so any other drain works too.
296
+
229
297
  ## Supported providers
230
298
 
231
299
  Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
232
300
 
233
301
  - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
234
302
  - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
235
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue) and Kunavo (async `POST /v1/videos` + poll, sync fallback) — both verified live
303
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue), Kunavo (async `POST /v1/videos` + poll, sync fallback), and Runware (async `videoInference` + `getResponse` poll); all three are on the async `submit`/`poll` path
236
304
 
237
305
  ## Text model pricing
238
306
 
@@ -347,6 +415,37 @@ Design choices worth knowing:
347
415
  - **Telemetry lands once, at the terminal poll** — one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
348
416
  - An adapter advertises async by implementing `submit` + `checkStatus`; image-only adapters omit them and are skipped by the async router. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both).
349
417
 
418
+ ### Writing your own adapter
419
+
420
+ A `MediaAdapter` is small — `run` for sync, optional `submit`/`checkStatus` for async — and the one contract that matters is **how you report what was produced**:
421
+
422
+ ```ts
+ interface MediaAdapter {
+   provider: string;
+   run(req: { externalId: string; input: Record<string, unknown> }): Promise<MediaGenerateResult>;
+   submit?(req: { externalId: string; input; metadata? }): Promise<{ requestId: string }>;
+   checkStatus?(req: { externalId: string; requestId: string }): Promise<MediaStatusResult>;
+ }
+
+ // On a settled result, report:
+ {
+   outputs: [{ url, type: "image" | "video" }],
+   costCents?: number, // the provider's OWN bill, in US cents — convert if the API returns dollars (×100)!
+   usage?: { // typed actual usage — what the bill (or estimate) is based on
+     seconds?: number, // video length actually produced (per-second SKUs bill this)
+     outputs?: number, // output count — images or clips (per-image / per-call SKUs bill this)
+     megapixels?: number // total output MP (per-megapixel SKUs bill this)
+   }
+ }
+ ```
441
+
442
+ Rules that keep billing honest:
443
+
444
+ - **Report dimensions in `usage`, never as a bare count.** Seconds and output count are separate, explicitly-named fields, so a per-call price can never be multiplied by a clip's duration (the classic 8× overcharge).
445
+ - **`costCents` is cents.** A provider that returns dollars must be converted in the adapter (see the Runware adapter). If you slip, the router's cost-outlier guard flags any bill ≥25× off the price table via `onError` — but the reported number still stands.
446
+ - **When you report nothing**, the router estimates: per-second SKUs read `usage.seconds`, then the input's `duration` (numbers or `"8s"`-style strings), then the 5-second reference as a last resort; per-image/per-call SKUs bill the output count.
447
+ - **Throw errors with an HTTP `status` property** (see `FalMediaError`/`KunavoMediaError`) so the router can classify them for failover.
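
Two of the rules above lend themselves to a short sketch: the fallback chain for per-second SKUs and an error that carries an HTTP `status`. Both `billableSeconds` and `DemoMediaError` are hypothetical names written for illustration; the accepted string format (`"8"` or `"8s"`, optionally fractional) is an assumption about what "`"8s"`-style strings" covers.

```typescript
// The 5-second reference length named in the text.
const REFERENCE_SECONDS = 5;

// Fallback chain: usage.seconds → input.duration (number or "8s"-style string) → reference.
function billableSeconds(
  usage: { seconds?: number } | undefined,
  input: Record<string, unknown>,
): number {
  if (usage?.seconds !== undefined) return usage.seconds;
  const duration = input["duration"];
  if (typeof duration === "number" && Number.isFinite(duration) && duration > 0) return duration;
  if (typeof duration === "string") {
    const match = /^(\d+(?:\.\d+)?)s?$/.exec(duration.trim());
    if (match) return Number(match[1]);
  }
  return REFERENCE_SECONDS; // last resort
}

// An adapter error that exposes the HTTP status so a router could classify it.
class DemoMediaError extends Error {
  status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "DemoMediaError";
    this.status = status;
  }
}
```

An adapter would throw something like `new DemoMediaError("rate limited", 429)` from `run` or `checkStatus`; how the router maps specific statuses to failover decisions is not shown here.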
448
+
350
449
  ## Vetting a provider (capability + cost probe)
351
450
 
352
451
  A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
@@ -402,15 +501,18 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
402
501
  ## Roadmap
403
502
 
404
503
  - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
504
+ - [x] Circuit breaker (`cooldown`) — skip a persistently-failing provider instead of re-probing it every window
405
505
  - [x] Real per-call cost accounting (`onCost`)
406
506
  - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
407
507
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
408
508
  - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
409
- - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
509
+ - [x] Bundled price table for zero-config pricing (`autoPrice` + `MODEL_PRICES`) — drop the manual `cost` numbers for native-maker routes
410
510
  - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
411
511
  - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
412
- - [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; **video live via fal and Kunavo** (both verified)
413
- - [ ] Normalized cross-provider video price comparison + verified Runware video adapter
512
+ - [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; video async (`submit`/`poll`) via fal, Kunavo, and Runware
513
+ - [x] Settle-time billing on actual usage (0.6) — typed `usage`, duration-aware savings baseline, `estCostUsd` price-drift signal, cost-outlier guard
514
+ - [x] Self-hosted dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)) — savings, failover health, media $/unit, price-drift panel
515
+ - [ ] Normalized cross-provider video price comparison in the bundled table
414
516
 
415
517
  ## Affiliate disclosure
416
518
 
package/README.zh-CN.md CHANGED
@@ -144,13 +144,56 @@ DeepInfra 只承载开源权重——没有第一方 Claude / GPT / Gemini。那
144
144
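  2. **Fall through on failure.** On any provider failure (rate limit, 5xx, timeout, a **billing cap** such as 402 / out-of-credit / quota, and a client error like a **400**) it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 usually means "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not that the request itself is broken, so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **first** (original) error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). To restore the older "client errors fail fast" behavior, pass `shouldRetry: isRetryableError` to `createLCR`.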
145
145
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it automatically returns to the cheapest provider.
146
146
 
147
+ ## See what happened on every call (`onCall`)
148
+
149
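+ `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to reconstruct after the fact. `onCall` gives you **one record per request** (the full attempt chain, the provider that finally served, the reason for each failed hop, latency, and cost), and `formatCallRecord` turns it into a one-line log you can scan: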
150
+
151
+ ```text
152
+ ✓ text tokenmart 412ms $0.0003
153
+ ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
154
+ ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
155
+ ```
156
+
157
+ `record` is a plain `CallRecord` object. Key fields:
158
+
159
+ ```ts
+ interface CallRecord {
+   id: string; // one correlation id per request
+   model: string; // logical model name
+   attempts: { provider; ok; latencyMs; errorClass? }[];
+   winner?: string; // the provider that finally served; undefined if all failed
+   ok: boolean;
+   failedOver: boolean; // more than one provider was tried
+   latencyMs: number;
+   ttftMs?: number; // streaming only: time to first token
+   inputTokens: number;
+   outputTokens: number;
+   cachedInputTokens?: number; // input tokens that hit the prompt cache
+   costUsd: number; // actual cost (cacheRead discount applied)
+   baselineUsd?: number; // price of the same usage on the "savings baseline" → savings = baselineUsd − costUsd
+   baselineKind?: "last-leg" | "official" | "priciest-route"; // where the baseline comes from (see below)
+   cachedSavingUsd?: number; // the provider's own cache discount: real money, but not routing's doing, so don't fold it into savings
+   usageMissing?: boolean; // served successfully but reported 0/0 tokens → cost is "unknown", not "free"
+
+   // Media calls (createMediaLCR) additionally carry:
+   modality?: "image" | "video";
+   usage?: { seconds?; outputs?; megapixels? }; // the actual usage the bill is based on
+   officialUsd?: number; // official first-party price (for this call's actual usage)
+   estCostUsd?: number; // the price table's estimate; its difference from costUsd = price-table drift
+ }
+ ```
185
+
186
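+ **How to compute savings honestly:** `baselineKind` says which baseline `baselineUsd` is. For text it is the **list price of the fallback provider at the end of the chain** (`"last-leg"`, deliberately not the most expensive leg: prompt caching can make a sticker-cheaper provider cost more on a cache-heavy call, and taking the maximum would fabricate "savings"). For media it is the **model maker's official first-party price** (`"official"`, computed on the actual seconds); when no official price is known it falls back to the most expensive route you configured (`"priciest-route"`, self-referential, and only indicative of cross-provider spread).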
187
+
188
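+ **Ship it to a collector:** `createHttpSink` POSTs every record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built for exactly these records: spend vs. savings trends, per-provider failover health, media $/second and $/image, and a **price-drift panel** that calls out a model@provider route when its reported bill deviates from the price table by more than ±20% (about 100× is almost always a dollars-as-cents slip). It stores metadata only, no prompts or outputs.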
189
+
147
190
  ## Supported providers
148
191
 
149
192
  Any OpenAI-compatible endpoint works, and so does any AI SDK provider package, including a model vendor's own official API.
150
193
 
151
194
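  - **Model vendors' official APIs (native):** connect directly to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their respective AI SDK provider packages: no markup, full native features. See the section "Route to a model vendor's own API (native providers)" above.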
152
195
  - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off every model**) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
153
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai), routed via `createMediaLCR`. Image: Kunavo + Runware + fal. Video: fal (available, via its async queue API); the Kunavo Veo polling path is implemented but unverified
196
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai), routed via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue), Kunavo (async `POST /v1/videos` + poll, plus a sync fallback), and Runware (async `videoInference` + `getResponse` poll); all three are on the async `submit`/`poll` path
154
197
 
155
198
  ## Text model pricing
156
199
 
@@ -209,7 +252,9 @@ Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到
209
252
 
210
253
  ## Image and video routing (`createMediaLCR`)
211
254
 
212
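- Image and video are a separate side of `ai-lcr` (outputs are files, pricing units are mixed, video is an async job); see [`src/media.ts`](src/media.ts). You supply a registry (per-model provider routes + unit prices) and a set of adapters, and it routes cheapest-first, fails over automatically, and reports real/normalized cost through the same `onCall` sink as the text side.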
255
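+ Image and video are a separate side of `ai-lcr` (outputs are files, pricing units are mixed, video is an async job); see [`src/media.ts`](src/media.ts). You supply a registry (per-model provider routes + unit prices) and a set of adapters, and it routes cheapest-first, fails over automatically, and reports real cost through the same `onCall` sink as the text side.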
256
+
257
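+ Two prices, two jobs (0.6+): **ranking** uses a price normalized to a reference output (one 1080p image / one 5-second clip) so mixed pricing units compare fairly, but each call is **billed** on actual usage: for a per-second SKU, an 8-second clip is charged for 8 seconds, and the savings baseline is computed from the official price for the same 8 seconds. Adapters report typed actual usage (`usage: { seconds, outputs, megapixels }`). When the provider reports its own bill, the bill wins; when the bill is far from the price-table estimate (the classic dollars-as-cents slip is exactly 100×), `onError` fires to prompt you to fix the price table.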
213
258
 
214
259
  ```ts
215
260
  import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
@@ -261,6 +306,30 @@ if (r.done) {
261
306
  - **Telemetry lands once, at the terminal poll**: one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
262
307
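  - An adapter advertises async support by implementing `submit` + `checkStatus`; image-only adapters omit them, and the async router skips such routes. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both image and video).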
263
308
 
309
+ ### Writing your own adapter
310
+
311
+ A `MediaAdapter` is small (`run` for sync, optional `submit`/`checkStatus` for async), and the one contract that matters is **how you report what was produced**:
312
+
313
+ ```ts
+ // On a settled result, report:
+ {
+   outputs: [{ url, type: "image" | "video" }],
+   costCents?: number, // the provider's own bill, in US cents; if the API returns dollars, convert (×100)!
+   usage?: { // typed actual usage: what the bill (or estimate) is based on
+     seconds?: number, // video seconds actually produced (per-second SKUs bill this)
+     outputs?: number, // output count, images or clips (per-image / per-call pricing bills this)
+     megapixels?: number // total output megapixels (per-MP pricing bills this)
+   }
+ }
+ ```
325
+
326
+ Rules that keep billing correct:
327
+
328
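+ - **Name dimensions explicitly in `usage`; never report a bare number.** Seconds and output count are two separate fields, so a per-call flat price can never be multiplied by a clip's duration (the classic 8× overcharge).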
329
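+ - **`costCents` is cents.** If the API returns dollars, convert in the adapter (see the Runware adapter). If you slip, the router's outlier-bill guard fires `onError` when the bill is ≥25× off, but the reported number still stands.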
330
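+ - **When you report nothing**, the router estimates: per-second SKUs read `usage.seconds`, then the input's `duration` (a number or an `"8s"`-style string), and only then fall back to the 5-second reference; per-image/per-call SKUs bill the output count.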
331
+ - **Throw errors with an HTTP `status` property** (see `FalMediaError`/`KunavoMediaError`) so the router can classify them correctly and fail over.
332
+
264
333
  ## Vetting a provider (capability + cost probe)
265
334
 
266
335
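  A discount, however large, is worthless if the provider quietly breaks the protocol. `ai-lcr` ships a zero-dependency check script (`scripts/check-provider.sh`, needing only `bash` + `curl` + `python3`) that vets, **per model**, the things that actually cost you money or corrupt output: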
@@ -321,8 +390,10 @@ API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
321
390
  - [ ] Bundled price table for zero-config pricing (drop the hand-typed `cost` numbers)
322
391
  - [ ] Provider-quirk middleware (transparently patch known quirks, e.g. Kunavo's ignored `max_tokens`)
323
392
  - [ ] Feed probe results into routing automatically (auto-exclude a provider×model pair that fails its probe)
324
- - [x] Image & video model routing (`createMediaLCR`): image via Kunavo + Runware + fal; **video is available via fal** (async queue API)
325
- - [ ] Normalized cross-provider video price comparison + verified Kunavo/Runware video adapters
393
+ - [x] Image & video model routing (`createMediaLCR`): image via Kunavo (incl. `*-edit`) + Runware + fal; video async (`submit`/`poll`) via fal, Kunavo, and Runware
394
+ - [x] Settle-time billing on actual usage (0.6): typed `usage`, duration-aware savings baseline, `estCostUsd` price-drift signal, outlier-bill guard
395
+ - [x] Self-hosted dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)): savings, failover health, media per-unit cost, price-drift panel
396
+ - [ ] Normalized cross-provider video price comparison in the bundled price table
326
397
 
327
398
  ## Affiliate disclosure
328
399