ai-lcr 0.5.6 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,81 @@ All notable changes to `ai-lcr` are documented here. The format follows
4
4
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.6.1] — 2026-06-11
8
+
9
+ Zero-config pricing for native-maker routes. Until now every priced provider
10
+ needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
11
+ is just the public list price you could look up. 0.7 bundles those.
12
+
13
+ ### Added
14
+
15
+ - **Bundled price table (`MODEL_PRICES`).** Official first-party token prices for
16
+ the native makers ai-lcr documents (openai · anthropic · gemini · deepseek ·
17
+ xai · mistral), keyed by the bare model id you pass to that vendor's AI SDK
18
+ provider — USD per 1M tokens, with `cacheRead` where the maker prices it.
19
+ Generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT)
20
+ via `scripts/gen-text-prices.mjs`; the generated file is committed.
21
+ - **`getModelPrice(modelId)`.** Look up a bundled price directly; resolves a bare
22
+ id or one with a leading `provider/` segment stripped.
23
+ - **`createLCR({ autoPrice: true })`.** Fills any provider entry that has no
24
+ explicit `cost` from the table, by `model.modelId`. A native-vendor route then
25
+ needs zero hand-typed pricing and `autoSort` can order it.
26
+ - **`discount` on a provider entry.** The flat-reseller knob: `{ model:
27
+ kunavo("…"), discount: 0.2 }` prices a −20% aggregator off the bundled list
28
+ price (scaling input/output/cacheRead) with no hand-typed number. Applies only
29
+ when `autoPrice` fills the entry; out-of-range values throw.
30
+
31
+ ### Compatibility
32
+
33
+ - Fully backward compatible. `autoPrice` is **off by default** — unpriced entries
34
+ stay unpriced and an explicit `cost` always wins, so no existing config changes
35
+ behavior. The table covers native makers only; open-weights hosts (DeepInfra)
36
+ and breadth aggregators (OpenRouter) are still priced explicitly.
37
+
38
+ ## [0.6.0] — 2026-06-10
39
+
40
+ Media billing contract v2: **rank by the reference, bill by actual usage.**
41
+ The 0.5 media router used one number for both jobs — the price normalized to a
42
+ reference output (1080p image / 5-second clip) ranked routes *and* estimated
43
+ costs, multiplied by an untyped `units` count. That mispriced off-reference
44
+ outputs (an 8s clip billed as 5s) and made the baseline duration-blind, and the
45
+ bare `units` invited a seconds-as-count 8× overcharge. 0.6 separates the two.
46
+
47
+ ### Added
48
+
49
+ - **Typed usage (`MediaUsage`).** Adapter results (`MediaGenerateResult`,
50
+ `MediaStatusResult`) carry `usage: { seconds?, outputs?, megapixels? }` —
51
+ explicitly named dimensions that cannot be confused. The bundled adapters
52
+ report it (Kunavo video now safely reports the real `duration_seconds`).
53
+ The legacy bare `units` field is still honored as an output count.
54
+ - **Settle-time billing.** Cost estimates price the route's actual unit on
55
+ actual usage: per-second SKUs bill `usage.seconds` → `input.duration`
56
+ (numbers or `"8s"`-style strings) → the reference (last resort); per-image /
57
+ per-call SKUs bill output count; per-megapixel SKUs bill measured megapixels.
58
+ New public helpers: `billableUnits`, `priceCents`, `durationFromInput`.
59
+ - **Usage-aware savings baseline.** `baselineUsd` is now priced at settle time
60
+ against the same usage as the cost — an 8-second clip is baselined at 8
61
+ seconds of the official rate, not the 5-second reference. Off-reference calls
62
+ can no longer produce negative or understated savings.
63
+ - **`CallRecord` provenance fields** (all optional, backward compatible):
64
+ `modality` ("image" | "video"), `usage`, `baselineKind`
65
+ ("official" | "priciest-route" | "last-leg" — the text router now stamps
66
+ "last-leg"), `officialUsd` (the official price for this call's usage), and
67
+ `estCostUsd` (the price-table prediction; `costUsd − estCostUsd` on
68
+ provider-reported rows is price-table drift).
69
+ - **Cost-outlier guard.** A provider-reported cost ≥25× off the table
70
+ prediction (the classic USD-vs-cents slip is exactly 100×) raises `onError`
71
+ with both numbers; the reported bill still stands.
72
+ - `MediaRunResult` and the terminal `MediaPollResult` expose the `usage` that
73
+ backed the bill.
74
+
75
+ ### Changed
76
+
77
+ - `MediaJobHandle` now carries the serving route's `pricing` and the resolved
78
+ savings `baseline` so settle-time billing works across processes. Handles
79
+ serialized by 0.5.x still poll fine: they settle with the legacy
80
+ reference-price estimate and the submit-time baseline.
81
+
7
82
  ## [0.5.6] — 2026-06-07
8
83
 
9
84
  All additions are optional and backward compatible. The sync `createMediaLCR`
package/README.md CHANGED
@@ -138,6 +138,33 @@ const lcr = createLCR({
138
138
 
139
139
  DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
140
140
 
141
+ ## Zero-config pricing (`autoPrice`)
142
+
143
+ Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, DeepSeek, xAI, Mistral), keyed by the bare model id you already pass to the provider:
144
+
145
+ ```ts
146
+ const lcr = createLCR({
147
+ autoPrice: true, // fill missing costs from the bundled table
148
+ autoSort: true, // then order cheapest-first using those prices
149
+ models: {
150
+ "claude-sonnet": [
151
+ // Native API — price comes from the table, nothing to type.
152
+ { model: anthropic("claude-sonnet-4-6"), label: "anthropic" },
153
+ // Flat-discount aggregator — `discount` applies on top of the list price.
154
+ { model: kunavo("claude-sonnet-4-6"), label: "kunavo", discount: 0.2 }, // 20% off list
155
+ ],
156
+ },
157
+ });
158
+ ```
159
+
160
+ Three rules keep it predictable:
161
+
162
+ - **Off by default.** Unpriced entries stay unpriced (the pre-existing behavior), so turning `autoPrice` on never silently re-prices a model — and an **explicit `cost` always wins** over the table.
163
+ - **`discount` is the reseller knob.** A flat-% aggregator (Kunavo −20%) becomes `discount: 0.2` instead of a hand-typed number; it scales input, output, and `cacheRead` alike, and only applies when the table fills the entry. Variable-discount providers (TokenMart) still want explicit per-model `cost`.
164
+ - **Native makers only.** The table carries first-party list prices — the cheapest, most-featureful "go direct" route. Open-weights hosts (DeepInfra) and breadth aggregators (OpenRouter) aren't in it; price those explicitly.
165
+
166
+ Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT) — refresh with `node scripts/gen-text-prices.mjs`.
167
+
141
168
  ## How it routes
142
169
 
143
170
  1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
@@ -185,13 +212,28 @@ interface CallRecord {
185
212
  outputTokens: number;
186
213
  cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
187
214
  costUsd: number; // winner cost, cache-discount applied (see `cacheRead`)
188
- baselineUsd?: number; // same usage on the priciest priced leg → savings = baselineUsd − costUsd
215
+ baselineUsd?: number; // what the savings baseline would have charged for the SAME usage → savings = baselineUsd − costUsd
216
+ baselineKind?: "last-leg" | "official" | "priciest-route"; // how that baseline was derived (see below)
217
+ cachedSavingUsd?: number; // the provider's own prompt-cache discount — real money, but NOT a routing saving; never fold it into baselineUsd − costUsd
189
218
  requestId?: string; // your correlation id (see below) — roll multi-step tool loops into one request
190
219
  usageMissing?: boolean; // winner served but reported 0/0 tokens → costUsd is 0 but unknown, not free
220
+ emptyCompletion?: boolean; // clean response that generated NOTHING — prompt billed, zero output
221
+
222
+ // Media calls (createMediaLCR) additionally carry:
223
+ modality?: "image" | "video";
224
+ usage?: { seconds?: number; outputs?: number; megapixels?: number }; // the actual usage the bill was based on
225
+ officialUsd?: number; // the model maker's first-party price for this call's usage
226
+ estCostUsd?: number; // what the configured price table PREDICTED — on provider-reported rows, costUsd − estCostUsd is price-table drift
191
227
  }
192
228
  ```
193
229
 
194
- **Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd costUsd` is the money routing saved on that call the number a cost dashboard exists to show.
230
+ **Savings, not just spend.** `baselineUsd` is what the same call would have cost without routing, and `baselineKind` says exactly what that means so a dashboard can qualify the number instead of trusting it blindly:
231
+
232
+ - **`"last-leg"`** (text): the **last priced provider** in the chain — your always-on, list-price fallback. Deliberately *not* the most expensive leg: prompt caching can make a sticker-cheaper provider cost more on a cache-heavy call, and a max-of-chain baseline would fabricate "savings" on calls the fallback itself served.
233
+ - **`"official"`** (media): the model maker's **first-party API price** for the same actual usage — an 8-second clip is baselined at 8 seconds of the official rate, not a reference length.
234
+ - **`"priciest-route"`** (media, no official price known): the most expensive route you configured. Honest about cross-provider spread, but self-referential — not a market price.
235
+
236
+ `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
195
237
 
196
238
  **Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
197
239
 
@@ -226,13 +268,28 @@ const lcr = createLCR({
226
268
  });
227
269
  ```
228
270
 
271
+ ### The companion dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))
272
+
273
+ <p align="center">
274
+ <img src="assets/dashboard-demo.png" alt="ai-lcr-dashboard (demo data): saved vs spent over time, a price-drift alert, per-project failover health, and per-provider reliability" width="780">
275
+ </p>
276
+
277
+ A **self-hostable** Next.js + Postgres collector built for exactly these records — point `createHttpSink` at its `/api/ingest` and you get, across every project you tag:
278
+
279
+ - **saved vs. spent** over time, with the savings qualified by `baselineKind` and clamped per call (one mispriced row can't eat the rest);
280
+ - **failover health** per provider — who actually failed, who caught it, what leaked to users;
281
+ - **media economics** — image/video calls split out with per-unit cost ($/second of video, $/image);
282
+ - a **price-drift panel** — when a provider's reported bill disagrees with your configured price table by >±20%, it surfaces the route (a ~100× ratio is the classic USD-vs-cents slip). Cheapest-first routing is only as good as its price table; this is the smoke alarm.
283
+
284
+ One-click Vercel deploy (any Postgres: Neon, Supabase, RDS, local); records carry metadata only — no prompts, no outputs. The ingest contract is just the `CallRecord` JSON, so any other drain works too.
285
+
229
286
  ## Supported providers
230
287
 
231
288
  Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
232
289
 
233
290
  - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
234
291
  - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
235
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue) and Kunavo (async `POST /v1/videos` + poll, sync fallback) — both verified live
292
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue), Kunavo (async `POST /v1/videos` + poll, sync fallback), and Runware (async `videoInference` + `getResponse` poll) all three on the async `submit`/`poll` path
236
293
 
237
294
  ## Text model pricing
238
295
 
@@ -293,7 +350,9 @@ USD per second, as of 2026-05 — verify current rates. Video billing differs by
293
350
 
294
351
  ## Image & video routing (`createMediaLCR`)
295
352
 
296
- Image and video are a separate, self-contained side of `ai-lcr` (file outputs, mixed pricing units, async jobs) — see [`src/media.ts`](src/media.ts). You give it a registry (each model's provider routes + per-unit price) and a set of adapters; it routes cheapest-first, fails over, and reports real/normalized cost through the same `onCall` sink as text.
353
+ Image and video are a separate, self-contained side of `ai-lcr` (file outputs, mixed pricing units, async jobs) — see [`src/media.ts`](src/media.ts). You give it a registry (each model's provider routes + per-unit price) and a set of adapters; it routes cheapest-first, fails over, and reports real cost through the same `onCall` sink as text.
354
+
355
+ Two prices, two jobs: routes are **ranked** by their price normalized to one reference output (a 1080p image / a 5-second clip) so mixed units are comparable, but each settled call is **billed** on its actual usage — an 8-second clip on a per-second SKU costs 8 × the per-second rate, and its savings baseline is the official price for those same 8 seconds. Adapters report typed usage (`usage: { seconds, outputs, megapixels }`); when a provider returns its own bill, that wins, and a bill wildly off the price table (the classic USD-vs-cents slip is exactly 100×) raises `onError` so the table gets fixed.
297
356
 
298
357
  ```ts
299
358
  import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
@@ -345,6 +404,37 @@ Design choices worth knowing:
345
404
  - **Telemetry lands once, at the terminal poll** — one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
346
405
  - An adapter advertises async by implementing `submit` + `checkStatus`; image-only adapters omit them and are skipped by the async router. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both).
347
406
 
407
+ ### Writing your own adapter
408
+
409
+ A `MediaAdapter` is small — `run` for sync, optional `submit`/`checkStatus` for async — and the one contract that matters is **how you report what was produced**:
410
+
411
+ ```ts
412
+ interface MediaAdapter {
413
+ provider: string;
414
+ run(req: { externalId: string; input: Record<string, unknown> }): Promise<MediaGenerateResult>;
415
+ submit?(req: { externalId: string; input; metadata? }): Promise<{ requestId: string }>;
416
+ checkStatus?(req: { externalId: string; requestId: string }): Promise<MediaStatusResult>;
417
+ }
418
+
419
+ // On a settled result, report:
420
+ {
421
+ outputs: [{ url, type: "image" | "video" }],
422
+ costCents?: number, // the provider's OWN bill, in US cents — convert if the API returns dollars (×100)!
423
+ usage?: { // typed actual usage — what the bill (or estimate) is based on
424
+ seconds?: number, // video length actually produced (per-second SKUs bill this)
425
+ outputs?: number, // output count — images or clips (per-image / per-call SKUs bill this)
426
+ megapixels?: number // total output MP (per-megapixel SKUs bill this)
427
+ }
428
+ }
429
+ ```
430
+
431
+ Rules that keep billing honest:
432
+
433
+ - **Report dimensions in `usage`, never as a bare count.** Seconds and output count are separate, explicitly-named fields, so a per-call price can never be multiplied by a clip's duration (the classic 8× overcharge).
434
+ - **`costCents` is cents.** A provider that returns dollars must be converted in the adapter (see the Runware adapter). If you slip, the router's cost-outlier guard flags any bill ≥25× off the price table via `onError` — but the reported number still stands.
435
+ - **When you report nothing**, the router estimates: per-second SKUs read `usage.seconds`, then the input's `duration` (numbers or `"8s"`-style strings), then the 5-second reference as a last resort; per-image/per-call SKUs bill the output count.
436
+ - **Throw errors with an HTTP `status` property** (see `FalMediaError`/`KunavoMediaError`) so the router can classify them for failover.
437
+
348
438
  ## Vetting a provider (capability + cost probe)
349
439
 
350
440
  A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
@@ -404,11 +494,13 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
404
494
  - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
405
495
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
406
496
  - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
407
- - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
497
+ - [x] Bundled price table for zero-config pricing (`autoPrice` + `MODEL_PRICES`) — drop the manual `cost` numbers for native-maker routes
408
498
  - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
409
499
  - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
410
- - [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; **video live via fal and Kunavo** (both verified)
411
- - [ ] Normalized cross-provider video price comparison + verified Runware video adapter
500
+ - [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; video async (`submit`/`poll`) via fal, Kunavo, and Runware
501
+ - [x] Settle-time billing on actual usage (0.6) — typed `usage`, duration-aware savings baseline, `estCostUsd` price-drift signal, cost-outlier guard
502
+ - [x] Self-hosted dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)) — savings, failover health, media $/unit, price-drift panel
503
+ - [ ] Normalized cross-provider video price comparison in the bundled table
412
504
 
413
505
  ## Affiliate disclosure
414
506
 
package/README.zh-CN.md CHANGED
@@ -144,13 +144,56 @@ DeepInfra 只承载开源权重——没有第一方 Claude / GPT / Gemini。那
144
144
  2. **失败时向下穿透。** 遇到任何 provider 失败——限流、5xx、超时、**额度耗尽**(402 / 欠费 / 余额不足),以及 **400** 这类 client 错误——都会前进到下一个 provider,且对流式安全。400 会 failover 是有意为之:在 OpenAI 兼容聚合层里,400 往往是"*这家* provider 不吃这个请求"(不支持的参数、它没上架这个 model、更严格的 schema),而非请求本身坏了——换一家很可能就能服务。若所有 provider 都拒绝,请求仍会失败,并抛出**第一个**(原始)错误,让真正的调用方 bug 保持可调试。唯一永远不 failover 的是调用方主动取消(`AbortSignal`)。想恢复旧的"client 错误立即失败"行为,给 `createLCR` 传 `shouldRetry: isRetryableError`。
145
145
  3. **恢复。** 在一段空闲窗口(`resetIntervalMs`,默认 60s)之后,自动回到最便宜的 provider。
146
146
 
147
+ ## 看清每次调用发生了什么(`onCall`)
148
+
149
+ `onError`/`onCost` 各自独立触发、互不关联,事后很难还原一次 failover 的全貌。`onCall` 给你**每个请求一条记录**——完整的尝试链、最终服务者、每跳失败的原因、延迟和成本;`formatCallRecord` 把它变成一行可扫读的日志:
150
+
151
+ ```text
152
+ ✓ text tokenmart 412ms $0.0003
153
+ ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
154
+ ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
155
+ ```
156
+
157
+ `record` 是一个纯 `CallRecord` 对象,关键字段:
158
+
159
+ ```ts
160
+ interface CallRecord {
161
+ id: string; // 每个请求一个关联 id
162
+ model: string; // 逻辑模型名
163
+ attempts: { provider; ok; latencyMs; errorClass? }[];
164
+ winner?: string; // 最终服务的 provider;全失败则为 undefined
165
+ ok: boolean;
166
+ failedOver: boolean; // 尝试了不止一家
167
+ latencyMs: number;
168
+ ttftMs?: number; // 仅流式:首 token 时间
169
+ inputTokens: number;
170
+ outputTokens: number;
171
+ cachedInputTokens?: number; // 命中 prompt 缓存的输入 token
172
+ costUsd: number; // 实际成本(已按 cacheRead 折扣)
173
+ baselineUsd?: number; // 同样用量在「节约基线」上的价格 → 节约 = baselineUsd − costUsd
174
+ baselineKind?: "last-leg" | "official" | "priciest-route"; // 基线的来源(见下)
175
+ cachedSavingUsd?: number; // provider 自己的缓存折扣——是真金白银,但不是路由的功劳,别混进节约
176
+ usageMissing?: boolean; // 服务成功但 token 报 0/0 → 成本是「未知」而非「免费」
177
+
178
+ // 媒体调用(createMediaLCR)额外携带:
179
+ modality?: "image" | "video";
180
+ usage?: { seconds?; outputs?; megapixels? }; // 账单依据的实际用量
181
+ officialUsd?: number; // 官方第一方价(按本次实际用量)
182
+ estCostUsd?: number; // 价格表的预估——与 costUsd 的差 = 价格表漂移
183
+ }
184
+ ```
185
+
186
+ **节约怎么算才诚实:** `baselineKind` 说明 `baselineUsd` 是哪种基线——文本是**链尾兜底 provider 的列表价**(`"last-leg"`,故意不取最贵的一条:prompt 缓存可能让标价更便宜的那家在缓存重的调用上反而更贵,取最大值会凭空造出"节约");媒体是**模型厂商官方第一方价**(`"official"`,按实际秒数算),查不到官方价时退化为你配置里最贵的路由(`"priciest-route"`,自我参照,仅说明跨 provider 价差)。
187
+
188
+ **送进收集器:** `createHttpSink` 把每条记录 POST 到任意 endpoint(serverless 上传 Next.js 的 `after` 作 `dispatch` 防止被掐断)。配套的自托管 dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)(Next.js + Postgres,Vercel 一键部署)专为这些记录而建:花费 vs 节约趋势、各 provider failover 健康度、媒体 $/秒 与 $/张、以及**价格漂移面板**——某条 model@provider 路由的实报账单与价格表偏差超过 ±20% 时点名示警(约 100× 基本就是美元当美分的笔误)。只存元数据,不存 prompt 和输出。
189
+
147
190
  ## 支持的 provider
148
191
 
149
192
  任何 OpenAI 兼容的 endpoint 都可用——任何 AI SDK 的 provider 包也都可用,包括模型厂商自己的官方 API。
150
193
 
151
194
  - **模型厂商官方 API(原生):** 通过各自的 AI SDK provider 包直连 [DeepSeek](https://platform.deepseek.com)、[OpenAI](https://openai.com)、[Anthropic](https://anthropic.com)、[Google](https://ai.google.dev)、[xAI](https://x.ai) 等——无加价,原生特性齐全。见上方「直连模型厂商官方 API(原生 provider)」一节。
152
195
  - **文本聚合器:** [OpenRouter](https://openrouter.ai)(覆盖最广,列表定价)· [Kunavo](https://kunavo.com/?ref=victorimf)(**全模型 8 折**)· [TokenMart](https://thetokenmart.ai)(按模型 85 折–35 折不等)
153
- - **图像 / 视频:** [Kunavo](https://kunavo.com/?ref=victorimf)(**8 折**)· [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 通过 `createMediaLCR` 路由。图像:Kunavo + Runware + fal。视频:fal(已可用,走其异步队列 API);Kunavo Veo 轮询路径已实现但未验证
196
+ - **图像 / 视频:** [Kunavo](https://kunavo.com/?ref=victorimf)(**8 折**)· [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 通过 `createMediaLCR` 路由。图像:Kunavo(生成 + `*-edit` 参考图端点)+ Runware + fal。视频:fal(异步队列)、Kunavo(异步 `POST /v1/videos` + 轮询,另有同步兜底)、Runware(异步 `videoInference` + `getResponse` 轮询)——三家都在异步 `submit`/`poll` 路径上
154
197
 
155
198
  ## 文本模型价格
156
199
 
@@ -209,7 +252,9 @@ Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到
209
252
 
210
253
  ## 图像与视频路由(`createMediaLCR`)
211
254
 
212
- 图像和视频是 `ai-lcr` 独立的一侧(输出是文件、计价单位混杂、视频是异步任务)—— 见 [`src/media.ts`](src/media.ts)。你提供一个 registry(每个模型的 provider 路由 + 单位价)和一组 adapter,它就按最便宜优先路由、自动 failover,并通过与文本侧相同的 `onCall` sink 报告真实/归一化成本。
255
+ 图像和视频是 `ai-lcr` 独立的一侧(输出是文件、计价单位混杂、视频是异步任务)—— 见 [`src/media.ts`](src/media.ts)。你提供一个 registry(每个模型的 provider 路由 + 单位价)和一组 adapter,它就按最便宜优先路由、自动 failover,并通过与文本侧相同的 `onCall` sink 报告真实成本。
256
+
257
+ 两个价格、两份职责(0.6+):**排序**用归一化到参考输出(1080p 一张图 / 5 秒一段片)的价格,让混杂的计价单位可以公平比较;但每次调用的**计费**按实际用量——按秒计价的 SKU,一条 8 秒的片就按 8 秒收,节约基线也按同样的 8 秒官方价算。adapter 上报带类型的实际用量(`usage: { seconds, outputs, megapixels }`);provider 自己报了账单时以账单为准,而账单与价格表预估差距悬殊时(经典的"美元当美分"笔误正好是 100×)会触发 `onError`,提醒你修价格表。
213
258
 
214
259
  ```ts
215
260
  import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
@@ -261,6 +306,30 @@ if (r.done) {
261
306
  - **telemetry 只在终态轮询落一条**——一条 `onCall` `CallRecord`,带完整 failover 链,跨两个进程串起来(不是在 `submit` 时落)。
262
307
  - adapter 通过实现 `submit` + `checkStatus` 来声明支持异步;只做图像的 adapter 省略它们,异步路由会跳过这种路由。内置的 Kunavo、fal、Runware adapter 都实现了异步路径(Kunavo/Runware 异步仅视频;fal 图像视频皆可)。
263
308
 
309
+ ### 自己写 adapter
310
+
311
+ `MediaAdapter` 很小——同步用 `run`,异步可选 `submit`/`checkStatus`——唯一要紧的合同是**如何上报产出**:
312
+
313
+ ```ts
314
+ // 落定的结果上报:
315
+ {
316
+ outputs: [{ url, type: "image" | "video" }],
317
+ costCents?: number, // provider 自己的账单,单位是美分——API 返回美元的要 ×100 转换!
318
+ usage?: { // 带类型的实际用量——账单(或估算)以它为准
319
+ seconds?: number, // 实际产出的视频秒数(按秒计价的 SKU 按它计费)
320
+ outputs?: number, // 产出个数——图或片(按张 / 按次计价按它计费)
321
+ megapixels?: number // 产出总百万像素(按 MP 计价按它计费)
322
+ }
323
+ }
324
+ ```
325
+
326
+ 保证计费正确的几条规则:
327
+
328
+ - **维度在 `usage` 里显式命名,绝不报裸数字。** 秒数和产出数是两个不同的字段,按次的平价永远不可能被片长乘爆(经典的 8× 过计)。
329
+ - **`costCents` 是美分。** API 返回美元的,必须在 adapter 里转换(参考 Runware adapter)。万一失手,路由器的异常账单守卫会在偏差 ≥25× 时触发 `onError`——但上报的数字仍然作数。
330
+ - **什么都不报时**,路由器会估算:按秒 SKU 依次读 `usage.seconds` → 输入的 `duration`(数字或 `"8s"` 这类字符串)→ 最后才退到 5 秒参考;按张/按次 SKU 按产出数计。
331
+ - **抛错时带上 HTTP `status` 属性**(见 `FalMediaError`/`KunavoMediaError`),路由器才能正确分类并 failover。
332
+
264
333
  ## 给 provider 做体检(能力 + 成本探测)
265
334
 
266
335
  折扣再大,如果 provider 偷偷破坏了协议就一文不值。`ai-lcr` 自带一个零依赖的检查脚本(`scripts/check-provider.sh`,只需 `bash` + `curl` + `python3`),**逐模型**核查那些真正会让你多花钱或污染输出的点:
@@ -321,8 +390,10 @@ API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
321
390
  - [ ] 内置价格表,实现零配置定价(省去手填 `cost` 数字)
322
391
  - [ ] provider 怪癖中间件(透明地修补已知怪癖,如 Kunavo 被忽略的 `max_tokens`)
323
392
  - [ ] 把 probe 结果自动接入路由(探测失败的 provider×model 自动从列表剔除)
324
- - [x] 图像与视频模型路由(`createMediaLCR`)—— 图像走 Kunavo + Runware + fal;**视频已可用,走 fal**(异步队列 API)
325
- - [ ] 归一化的跨 provider 视频价格对比 + 验证 Kunavo/Runware 视频适配器
393
+ - [x] 图像与视频模型路由(`createMediaLCR`)—— 图像走 Kunavo(含 `*-edit`)+ Runware + fal;视频异步(`submit`/`poll`)走 fal、Kunavo、Runware 三家
394
+ - [x] 按实际用量的结算计费(0.6)—— typed `usage`、时长感知的节约基线、`estCostUsd` 价格漂移信号、异常账单守卫
395
+ - [x] 自托管 dashboard([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))—— 节约、failover 健康度、媒体单位成本、价格漂移面板
396
+ - [ ] 内置价格表中的归一化跨 provider 视频价格对比
326
397
 
327
398
  ## 联盟(Affiliate)披露
328
399