npm - ai-lcr - Versions diffs - 0.6.0 → 0.6.2 - Mend

ai-lcr 0.6.0 → 0.6.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,68 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.6.2] — 2026-06-11
+Circuit breaker for persistently-failing providers. Until now the only recovery
+lever was `resetIntervalMs`, which snaps routing back to the cheapest provider on
+a timer — so a provider that's actually down keeps eating one failed attempt
+every window. The breaker remembers the failure and stops sending it traffic.
+### Added
+- **`createLCR({ cooldown })`.** A provider that fails `maxFailures` times within
+  `windowMs` is *skipped* for `cooldownMs` instead of being re-probed every
+  request; a single success clears its count. `true` enables the defaults (3 failures
+  within 60s → 60s cooldown); pass `{ maxFailures, windowMs, cooldownMs }` to tune. New exported type
+  `CooldownOptions`.
+- The breaker only **reorders** each request's attempt list (cooling providers go
+  last), so when every provider is cooling a request still tries them all rather
+  than failing outright — it can never turn a recoverable request into a hard
+  failure.
+### Changed
+- The routing engine now snapshots a per-request **attempt order** once (cheapest
+  ring with cooling providers moved to the back) and threads it through streaming
+  failover, replacing the previous modular index walk. Behavior is identical when
+  `cooldown` is unset.
+### Compatibility
+- Fully backward compatible. `cooldown` is **off by default** — with it unset no
+  provider is ever skipped and routing behaves exactly as before.
+## [0.6.1] — 2026-06-11
+Zero-config pricing for native-maker routes. Until now every priced provider
+needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
+is just the public list price you could look up. 0.6.1 bundles those.
+### Added
+- **Bundled price table (`MODEL_PRICES`).** Official first-party token prices for
+  the native makers ai-lcr documents (openai · anthropic · gemini · deepseek ·
+  xai · mistral), keyed by the bare model id you pass to that vendor's AI SDK
+  provider — USD per 1M tokens, with `cacheRead` where the maker prices it.
+  Generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT)
+  via `scripts/gen-text-prices.mjs`; the generated file is committed.
+- **`getModelPrice(modelId)`.** Look up a bundled price directly; resolves a bare
+  id or one with a leading `provider/` segment stripped.
+- **`createLCR({ autoPrice: true })`.** Fills any provider entry that has no
+  explicit `cost` from the table, by `model.modelId`. A native-vendor route then
+  needs zero hand-typed pricing and `autoSort` can order it.
+- **`discount` on a provider entry.** The flat-reseller knob: `{ model:
+  kunavo("…"), discount: 0.2 }` prices a −20% aggregator off the bundled list
+  price (scaling input/output/cacheRead) with no hand-typed number. Applies only
+  when `autoPrice` fills the entry; out-of-range values throw.
+### Compatibility
+- Fully backward compatible. `autoPrice` is **off by default** — unpriced entries
+  stay unpriced and an explicit `cost` always wins, so no existing config changes
+  behavior. The table covers native makers only; open-weights hosts (DeepInfra)
+  and breadth aggregators (OpenRouter) are still priced explicitly.
 ## [0.6.0] — 2026-06-10
 Media billing contract v2: **rank by the reference, bill by actual usage.**

package/README.md CHANGED

@@ -138,12 +138,50 @@ const lcr = createLCR({
 DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
+## Zero-config pricing (`autoPrice`)
+Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, DeepSeek, xAI, Mistral), keyed by the bare model id you already pass to the provider:
+```ts
+const lcr = createLCR({
+  autoPrice: true,   // fill missing costs from the bundled table
+  autoSort: true,    // then order cheapest-first using those prices
+  models: {
+    "claude-sonnet": [
+      // Native API — price comes from the table, nothing to type.
+      { model: anthropic("claude-sonnet-4-6"), label: "anthropic" },
+      // Flat-discount aggregator — `discount` applies on top of the list price.
+      { model: kunavo("claude-sonnet-4-6"), label: "kunavo", discount: 0.2 }, // 20% off list
+    ],
+  },
+});
+```
+Three rules keep it predictable:
+- **Off by default.** Unpriced entries stay unpriced (the pre-existing behavior), so turning `autoPrice` on never silently re-prices a model — and an **explicit `cost` always wins** over the table.
+- **`discount` is the reseller knob.** A flat-% aggregator (Kunavo −20%) becomes `discount: 0.2` instead of a hand-typed number; it scales input, output, and `cacheRead` alike, and only applies when the table fills the entry. Variable-discount providers (TokenMart) still want explicit per-model `cost`.
+- **Native makers only.** The table carries first-party list prices — the cheapest, most-featureful "go direct" route. Open-weights hosts (DeepInfra) and breadth aggregators (OpenRouter) aren't in it; price those explicitly.
+Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT) — refresh with `node scripts/gen-text-prices.mjs`.
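The lookup and discount rules described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `DEMO_PRICES`, `lookupPrice`, and `applyDiscount` are hypothetical names, the prices are placeholders rather than real list prices, and the accepted discount range of `[0, 1)` is an assumption (the changelog only says out-of-range values throw).

```typescript
// Sketch only: a stand-in for the bundled table, NOT the real MODEL_PRICES.
interface Price {
  input: number; // USD per 1M input tokens
  output: number; // USD per 1M output tokens
  cacheRead?: number; // USD per 1M cached input tokens, where the maker prices it
}

// Placeholder numbers for illustration; these are not real list prices.
const DEMO_PRICES: Record<string, Price> = {
  "demo-model": { input: 3, output: 15, cacheRead: 0.3 },
};

// Resolve a bare id, or one with a leading "provider/" segment stripped.
function lookupPrice(modelId: string): Price | undefined {
  const direct = DEMO_PRICES[modelId];
  if (direct) return direct;
  const slash = modelId.indexOf("/");
  return slash >= 0 ? DEMO_PRICES[modelId.slice(slash + 1)] : undefined;
}

// Scale every priced dimension by (1 - discount).
// Assumption: valid discounts lie in [0, 1); anything else throws.
function applyDiscount(list: Price, discount: number): Price {
  if (!(discount >= 0 && discount < 1)) {
    throw new RangeError(`discount must be in [0, 1), got ${discount}`);
  }
  const k = 1 - discount;
  const scaled: Price = { input: list.input * k, output: list.output * k };
  if (list.cacheRead !== undefined) scaled.cacheRead = list.cacheRead * k;
  return scaled;
}
```

Under these assumptions, a `discount: 0.2` entry would pay 80% of each listed rate, and an entry with an explicit `cost` would never reach this code path at all.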
 ## How it routes
 1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
 2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
+For a provider that's *persistently* down, the timer alone keeps re-probing it — one failed attempt every window. Turn on the **circuit breaker** to stop that:
+```ts
+const lcr = createLCR({
+  models: { /* … */ },
+  cooldown: true, // skip a provider that keeps failing, instead of re-probing it
+});
+```
+With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
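To make the breaker's behavior concrete, here is a minimal sketch of the idea: count failures inside a sliding window, mark a provider as cooling once the threshold is hit, and only reorder the attempt list. The `Breaker` class and its method names are hypothetical and are not ai-lcr's internals; clearing the cooldown itself on a success is also an assumption, since the text only says a success clears the count.

```typescript
interface CooldownOptions {
  maxFailures: number;
  windowMs: number;
  cooldownMs: number;
}

// Defaults taken from the text above: 3 failures within 60s → 60s cooldown.
const DEFAULTS: CooldownOptions = { maxFailures: 3, windowMs: 60_000, cooldownMs: 60_000 };

class Breaker {
  private readonly opts: CooldownOptions;
  private failures = new Map<string, number[]>(); // provider → recent failure timestamps
  private coolingUntil = new Map<string, number>(); // provider → time the cooldown ends

  constructor(opts: CooldownOptions = DEFAULTS) {
    this.opts = opts;
  }

  recordFailure(provider: string, now: number): void {
    // Keep only failures that are still inside the window, then add this one.
    const recent = (this.failures.get(provider) ?? []).filter((t) => now - t < this.opts.windowMs);
    recent.push(now);
    this.failures.set(provider, recent);
    if (recent.length >= this.opts.maxFailures) {
      this.coolingUntil.set(provider, now + this.opts.cooldownMs);
    }
  }

  // Assumption: one success clears both the failure count and any cooldown.
  recordSuccess(provider: string): void {
    this.failures.delete(provider);
    this.coolingUntil.delete(provider);
  }

  isCooling(provider: string, now: number): boolean {
    return (this.coolingUntil.get(provider) ?? 0) > now;
  }

  // Reorder only: healthy providers keep their cheapest-first order,
  // cooling providers move to the back, and nothing is ever dropped.
  attemptOrder(cheapestFirst: string[], now: number): string[] {
    const healthy = cheapestFirst.filter((p) => !this.isCooling(p, now));
    const cooling = cheapestFirst.filter((p) => this.isCooling(p, now));
    return [...healthy, ...cooling];
  }
}
```

Because `attemptOrder` never removes an entry, a request made while every provider is cooling still walks the whole list, which is the property the paragraph above relies on.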
 ## See what happened (`onCall`)
 `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
@@ -185,13 +223,28 @@ interface CallRecord {
   outputTokens: number;
   cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
   costUsd: number;            // winner cost, cache-discount applied (see `cacheRead`)
-  baselineUsd?: number;       // same usage on the priciest priced leg → savings = baselineUsd − costUsd
+  baselineUsd?: number;       // what the savings baseline would have charged for the SAME usage → savings = baselineUsd − costUsd
+  baselineKind?: "last-leg" | "official" | "priciest-route"; // how that baseline was derived (see below)
+  cachedSavingUsd?: number;   // the provider's own prompt-cache discount — real money, but NOT a routing saving; never fold it into baselineUsd − costUsd
   requestId?: string;         // your correlation id (see below) — roll multi-step tool loops into one request
   usageMissing?: boolean;     // winner served but reported 0/0 tokens → costUsd is 0 but unknown, not free
+  emptyCompletion?: boolean;  // clean response that generated NOTHING — prompt billed, zero output
+  // Media calls (createMediaLCR) additionally carry:
+  modality?: "image" | "video";
+  usage?: { seconds?: number; outputs?: number; megapixels?: number }; // the actual usage the bill was based on
+  officialUsd?: number;       // the model maker's first-party price for this call's usage
+  estCostUsd?: number;        // what the configured price table PREDICTED — on provider-reported rows, costUsd − estCostUsd is price-table drift
 }
 ```
-**Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
+**Savings, not just spend.** `baselineUsd` is what the same call would have cost without routing, and `baselineKind` says exactly what that means so a dashboard can qualify the number instead of trusting it blindly:
+- **`"last-leg"`** (text): the **last priced provider** in the chain — your always-on, list-price fallback. Deliberately *not* the most expensive leg: prompt caching can make a sticker-cheaper provider cost more on a cache-heavy call, and a max-of-chain baseline would fabricate "savings" on calls the fallback itself served.
+- **`"official"`** (media): the model maker's **first-party API price** for the same actual usage — an 8-second clip is baselined at 8 seconds of the official rate, not a reference length.
+- **`"priciest-route"`** (media, no official price known): the most expensive route you configured. Honest about cross-provider spread, but self-referential — not a market price.
+`baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
 **Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
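The two derived numbers described above (per-call savings and output throughput) are simple arithmetic on a record. The sketch below assumes a trimmed-down record shape; `CallRecordLike`, `savingsUsd`, and `outputTokensPerSec` are hypothetical helper names, not exports of the package.

```typescript
// A trimmed-down stand-in for CallRecord with only the fields used here.
interface CallRecordLike {
  latencyMs: number;
  ttftMs?: number;
  outputTokens: number;
  costUsd: number;
  baselineUsd?: number;
}

// Savings = baselineUsd − costUsd; undefined when no baseline was recorded.
function savingsUsd(r: CallRecordLike): number | undefined {
  return r.baselineUsd === undefined ? undefined : r.baselineUsd - r.costUsd;
}

// Throughput = outputTokens / ((latencyMs − ttftMs) / 1000).
// Undefined for non-streaming calls (no ttftMs) or a non-positive generation time.
function outputTokensPerSec(r: CallRecordLike): number | undefined {
  if (r.ttftMs === undefined) return undefined;
  const generationMs = r.latencyMs - r.ttftMs;
  return generationMs > 0 ? r.outputTokens / (generationMs / 1000) : undefined;
}
```

For example, a streamed call with `latencyMs: 2500`, `ttftMs: 500`, and `outputTokens: 400` works out to 400 tokens over 2 seconds, or 200 tokens/sec.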
@@ -226,13 +279,28 @@ const lcr = createLCR({
 });
 ```
+### The companion dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))
+<p align="center">
+  <img src="assets/dashboard-demo.png" alt="ai-lcr-dashboard (demo data): saved vs spent over time, a price-drift alert, per-project failover health, and per-provider reliability" width="780">
+</p>
+A **self-hostable** Next.js + Postgres collector built for exactly these records — point `createHttpSink` at its `/api/ingest` and you get, across every project you tag:
+- **saved vs. spent** over time, with the savings qualified by `baselineKind` and clamped per call (one mispriced row can't eat the rest);
+- **failover health** per provider — who actually failed, who caught it, what leaked to users;
+- **media economics** — image/video calls split out with per-unit cost ($/second of video, $/image);
+- a **price-drift panel** — when a provider's reported bill disagrees with your configured price table by >±20%, it surfaces the route (a ~100× ratio is the classic USD-vs-cents slip). Cheapest-first routing is only as good as its price table; this is the smoke alarm.
+One-click Vercel deploy (any Postgres: Neon, Supabase, RDS, local); records carry metadata only — no prompts, no outputs. The ingest contract is just the `CallRecord` JSON, so any other drain works too.
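The price-drift check described above can be approximated outside the dashboard from the `costUsd` and `estCostUsd` fields. This is a rough sketch, not the dashboard's code: `classifyDrift` is a hypothetical name, and treating any spread of 50× or more as a "likely unit slip" is an assumed threshold for the text's "~100×".

```typescript
type Drift = "ok" | "drift" | "likely-unit-slip";

// Compare the provider-reported bill against the price-table estimate.
// Returns undefined when either number is missing or non-positive.
function classifyDrift(costUsd: number, estCostUsd: number): Drift | undefined {
  if (!(costUsd > 0) || !(estCostUsd > 0)) return undefined;
  const ratio = costUsd / estCostUsd;
  // Symmetric spread: 100× too high and 100× too low both count.
  const spread = Math.max(ratio, 1 / ratio);
  if (spread >= 50) return "likely-unit-slip"; // assumed cutoff for "~100×"
  // The ±20% band from the text above.
  return Math.abs(ratio - 1) > 0.2 ? "drift" : "ok";
}
```

A route whose bill comes back at 1.3× the estimate would be flagged as drift, while one at 100× (or 0.01×) points to a dollars-versus-cents mistake in the adapter or the table.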
 ## Supported providers
 Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
 - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
 - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
-- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue) and Kunavo (async `POST /v1/videos` + poll, sync fallback) — both verified live
+- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue), Kunavo (async `POST /v1/videos` + poll, sync fallback), and Runware (async `videoInference` + `getResponse` poll) — all three on the async `submit`/`poll` path
 ## Text model pricing
@@ -347,6 +415,37 @@ Design choices worth knowing:
 - **Telemetry lands once, at the terminal poll** — one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
 - An adapter advertises async by implementing `submit` + `checkStatus`; image-only adapters omit them and are skipped by the async router. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both).
+### Writing your own adapter
+A `MediaAdapter` is small — `run` for sync, optional `submit`/`checkStatus` for async — and the one contract that matters is **how you report what was produced**:
+```ts
+interface MediaAdapter {
+  provider: string;
+  run(req: { externalId: string; input: Record<string, unknown> }): Promise<MediaGenerateResult>;
+  submit?(req: { externalId: string; input; metadata? }): Promise<{ requestId: string }>;
+  checkStatus?(req: { externalId: string; requestId: string }): Promise<MediaStatusResult>;
+}
+// On a settled result, report:
+{
+  outputs: [{ url, type: "image" | "video" }],
+  costCents?: number,   // the provider's OWN bill, in US cents — convert if the API returns dollars (×100)!
+  usage?: {             // typed actual usage — what the bill (or estimate) is based on
+    seconds?: number,   //   video length actually produced (per-second SKUs bill this)
+    outputs?: number,   //   output count — images or clips (per-image / per-call SKUs bill this)
+    megapixels?: number //   total output MP (per-megapixel SKUs bill this)
+  }
+}
+```
+Rules that keep billing honest:
+- **Report dimensions in `usage`, never as a bare count.** Seconds and output count are separate, explicitly-named fields, so a per-call price can never be multiplied by a clip's duration (the classic 8× overcharge).
+- **`costCents` is cents.** A provider that returns dollars must be converted in the adapter (see the Runware adapter). If you slip, the router's cost-outlier guard flags any bill ≥25× off the price table via `onError` — but the reported number still stands.
+- **When you report nothing**, the router estimates: per-second SKUs read `usage.seconds`, then the input's `duration` (numbers or `"8s"`-style strings), then the 5-second reference as a last resort; per-image/per-call SKUs bill the output count.
+- **Throw errors with an HTTP `status` property** (see `FalMediaError`/`KunavoMediaError`) so the router can classify them for failover.
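The estimation fallback and the cents rule above can be sketched as two small helpers. These are illustrative only: `parseDuration`, `billedSeconds`, and `dollarsToCents` are hypothetical names, and the exact string forms the router accepts beyond `"8s"` are not specified in the text, so the parser below accepts only a number followed by `s`.

```typescript
const REFERENCE_SECONDS = 5; // the 5-second reference clip named in the rules above

// Accept a positive number or an "8s"-style string; anything else is undefined.
function parseDuration(value: unknown): number | undefined {
  if (typeof value === "number") {
    return Number.isFinite(value) && value > 0 ? value : undefined;
  }
  if (typeof value === "string") {
    const match = /^(\d+(?:\.\d+)?)s$/.exec(value.trim());
    if (match) {
      const seconds = Number(match[1]);
      return seconds > 0 ? seconds : undefined;
    }
  }
  return undefined;
}

// Fallback order for per-second SKUs:
// reported usage.seconds → the input's duration → the 5-second reference.
function billedSeconds(
  usage: { seconds?: number } | undefined,
  input: Record<string, unknown>,
): number {
  return usage?.seconds ?? parseDuration(input["duration"]) ?? REFERENCE_SECONDS;
}

// Convert a provider bill in dollars to cents before reporting it as costCents.
// Rounding to 6 decimals avoids float noise such as 7.000000000000001.
function dollarsToCents(usd: number): number {
  return Math.round(usd * 100 * 1e6) / 1e6;
}
```

An adapter whose API returns `0.07` dollars would report `costCents: dollarsToCents(0.07)`, which is 7, rather than passing the dollar figure through unchanged.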
 ## Vetting a provider (capability + cost probe)
 A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
@@ -402,15 +501,18 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
 ## Roadmap
 - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
+- [x] Circuit breaker (`cooldown`) — skip a persistently-failing provider instead of re-probing it every window
 - [x] Real per-call cost accounting (`onCost`)
 - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
 - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
 - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
-- [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
+- [x] Bundled price table for zero-config pricing (`autoPrice` + `MODEL_PRICES`) — drop the manual `cost` numbers for native-maker routes
 - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
 - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
-- [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; **video live via fal and Kunavo** (both verified)
-- [ ] Normalized cross-provider video price comparison + verified Runware video adapter
+- [x] Image & video model routing (`createMediaLCR`) — image via Kunavo (incl. `*-edit`) + Runware + fal; video async (`submit`/`poll`) via fal, Kunavo, and Runware
+- [x] Settle-time billing on actual usage (0.6) — typed `usage`, duration-aware savings baseline, `estCostUsd` price-drift signal, cost-outlier guard
+- [x] Self-hosted dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)) — savings, failover health, media $/unit, price-drift panel
+- [ ] Normalized cross-provider video price comparison in the bundled table
 ## Affiliate disclosure

package/README.zh-CN.md CHANGED

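@@ -144,13 +144,56 @@ DeepInfra carries open weights only, with no first-party Claude / GPT / Gemini. Those
 2. **Fall through on failure.** On any provider failure, whether a rate limit, a 5xx, a timeout, an **exhausted quota** (402 / overdue / insufficient balance), or a client error such as a **400**, it advances to the next provider, and it is streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 usually means "*this* provider won't take this request" (an unsupported parameter, a model it hasn't listed, a stricter schema) rather than a broken request, so another provider may well serve it. If every provider rejects the request it still fails, throwing the **first** (original) error so a genuine caller bug stays debuggable. The only thing that never fails over is a deliberate caller cancellation (`AbortSignal`). To restore the old "client errors fail fast" behavior, pass `shouldRetry: isRetryableError` to `createLCR`.
 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it automatically returns to the cheapest provider.
+## See what happened on every call (`onCall`)
+`onError`/`onCost` fire independently and uncorrelated, so a failover is hard to reconstruct after the fact. `onCall` gives you **one record per request** (the full attempt chain, the provider that finally served, the reason each hop failed, latency, and cost), and `formatCallRecord` turns it into a scannable one-line log: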
+```text
+✓ text  tokenmart                      412ms  $0.0003
+⚠ text  tokenmart→openrouter           910ms  $0.0004  ⤷ tokenmart 502
+✗ text  deepseek→tokenmart→openrouter  1240ms FAILED   ⤷ deepseek 401, tokenmart 502, openrouter 429
+```
+`record` is a plain `CallRecord` object. Key fields:
+```ts
+interface CallRecord {
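+  id: string;                 // one correlation id per request
+  model: string;              // logical model name
+  attempts: { provider; ok; latencyMs; errorClass? }[];
+  winner?: string;            // provider that finally served; undefined if all failed
+  ok: boolean;
+  failedOver: boolean;        // more than one provider was tried
+  latencyMs: number;
+  ttftMs?: number;            // streaming only: time to first token
+  inputTokens: number;
+  outputTokens: number;
+  cachedInputTokens?: number; // input tokens that hit the prompt cache
+  costUsd: number;            // actual cost (cacheRead discount already applied)
+  baselineUsd?: number;       // price of the same usage on the "savings baseline" → savings = baselineUsd − costUsd
+  baselineKind?: "last-leg" | "official" | "priciest-route"; // where the baseline comes from (see below)
+  cachedSavingUsd?: number;   // the provider's own cache discount: real money, but not routing's doing, so don't mix it into savings
+  usageMissing?: boolean;     // served successfully but tokens reported as 0/0 → cost is "unknown", not "free"
+  // Media calls (createMediaLCR) additionally carry:
+  modality?: "image" | "video";
+  usage?: { seconds?; outputs?; megapixels? }; // the actual usage the bill is based on
+  officialUsd?: number;       // official first-party price (for this call's actual usage)
+  estCostUsd?: number;        // the price table's estimate; its difference from costUsd = price-table drift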
+}
+```
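+**How to count savings honestly:** `baselineKind` says which kind of baseline `baselineUsd` is. For text it is the **list price of the fallback provider at the end of the chain** (`"last-leg"`; deliberately not the most expensive leg, because prompt caching can make a sticker-cheaper provider cost more on a cache-heavy call, and taking the maximum would fabricate "savings"). For media it is the **model maker's official first-party price** (`"official"`, computed on the actual seconds), degrading to the most expensive route in your config when no official price is known (`"priciest-route"`, self-referential, showing only the cross-provider spread).
+**Feed a collector:** `createHttpSink` POSTs each record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built for exactly these records: spend vs. savings over time, per-provider failover health, media $/second and $/image, and a **price-drift panel** that calls out a model@provider route when its reported bill deviates from the price table by more than ±20% (a ratio of about 100× is almost always a dollars-entered-as-cents slip). It stores metadata only: no prompts, no outputs.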
 ## Supported providers
 Any OpenAI-compatible endpoint works, and so does any AI SDK provider package, including a model vendor's own official API.
 - **Model vendors' official APIs (native):** connect directly to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages: no markup, full native features. See the section "Route to a model vendor's own API (native providers)" above.
 - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off every model**) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
-- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai), routed via `createMediaLCR`. Image: Kunavo + Runware + fal. Video: fal (available, via its async queue API); Kunavo's Veo polling path is implemented but unverified
+- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai), routed via `createMediaLCR`. Image: Kunavo (generations + `*-edit` reference-image endpoints) + Runware + fal. Video: fal (async queue), Kunavo (async `POST /v1/videos` + polling, plus a sync fallback), and Runware (async `videoInference` + `getResponse` polling); all three are on the async `submit`/`poll` path
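 ## Text model pricing
@@ -209,7 +252,9 @@ Kunavo provides Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to
 ## Image and video routing (`createMediaLCR`)
-Image and video are a separate side of `ai-lcr` (outputs are files, pricing units are mixed, video is an async job); see [`src/media.ts`](src/media.ts). You supply a registry (each model's provider routes + unit prices) and a set of adapters, and it routes cheapest-first, fails over automatically, and reports real/normalized cost through the same `onCall` sink as the text side.
+Image and video are a separate side of `ai-lcr` (outputs are files, pricing units are mixed, video is an async job); see [`src/media.ts`](src/media.ts). You supply a registry (each model's provider routes + unit prices) and a set of adapters, and it routes cheapest-first, fails over automatically, and reports real cost through the same `onCall` sink as the text side.
+Two prices, two jobs (0.6+): **ranking** uses the price normalized to a reference output (one 1080p image / one 5-second clip), so mixed pricing units compare fairly, while **billing** for each call follows actual usage. On a per-second SKU an 8-second clip is billed for 8 seconds, and the savings baseline is computed at the official price for the same 8 seconds. Adapters report typed actual usage (`usage: { seconds, outputs, megapixels }`); when the provider reports its own bill, the bill wins, and when the bill is wildly off the price-table estimate (the classic "dollars as cents" slip is exactly 100×) `onError` fires to remind you to fix the price table.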
 ```ts
 import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
@@ -261,6 +306,30 @@ if (r.done) {
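 - **Telemetry lands once, at the terminal poll**: one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
 - An adapter declares async support by implementing `submit` + `checkStatus`; image-only adapters omit them, and the async router skips such routes. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both image and video).
+### Writing your own adapter
+A `MediaAdapter` is small (`run` for sync, optional `submit`/`checkStatus` for async), and the one contract that matters is **how you report what was produced**: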
+```ts
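+// On a settled result, report:
+{
+  outputs: [{ url, type: "image" | "video" }],
+  costCents?: number,   // the provider's own bill, in cents; if the API returns dollars, convert (×100)!
+  usage?: {             // typed actual usage: what the bill (or estimate) is based on
+    seconds?: number,   //   seconds of video actually produced (per-second SKUs bill on this)
+    outputs?: number,   //   number of outputs, images or clips (per-image / per-call pricing bills on this)
+    megapixels?: number //   total output megapixels (per-MP pricing bills on this)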
+  }
+}
+```
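+Rules that keep billing correct:
+- **Name dimensions explicitly in `usage`; never report a bare number.** Seconds and output count are two separate fields, so a per-call price can never be blown up by multiplying it by clip length (the classic 8× overcharge).
+- **`costCents` is cents.** If the API returns dollars, convert in the adapter (see the Runware adapter). If you slip, the router's bill-outlier guard fires `onError` when the deviation is ≥25×, but the reported number still stands.
+- **When you report nothing**, the router estimates: per-second SKUs read `usage.seconds`, then the input's `duration` (a number or an `"8s"`-style string), and only then fall back to the 5-second reference; per-image/per-call SKUs bill by output count.
+- **Throw errors with an HTTP `status` property** (see `FalMediaError`/`KunavoMediaError`) so the router can classify them correctly and fail over.
 ## Vetting a provider (capability + cost probe)
 However large the discount, it is worthless if the provider quietly breaks the protocol. `ai-lcr` ships a zero-dependency check script (`scripts/check-provider.sh`, needing only `bash` + `curl` + `python3`) that verifies, **per model**, the things that actually cost you money or corrupt output: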
@@ -321,8 +390,10 @@ API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
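 - [ ] Bundled price table for zero-config pricing (drop the hand-typed `cost` numbers)
 - [ ] Provider-quirk middleware (transparently patch known quirks, e.g. Kunavo's ignored `max_tokens`)
 - [ ] Feed probe results into routing automatically (auto-remove a provider×model pair that fails its probe)
-- [x] Image and video model routing (`createMediaLCR`): image via Kunavo + Runware + fal; **video is available, via fal** (async queue API)
-- [ ] Normalized cross-provider video price comparison + verified Kunavo/Runware video adapters
+- [x] Image and video model routing (`createMediaLCR`): image via Kunavo (incl. `*-edit`) + Runware + fal; video async (`submit`/`poll`) via fal, Kunavo, and Runware
+- [x] Settle-time billing on actual usage (0.6): typed `usage`, duration-aware savings baseline, `estCostUsd` price-drift signal, bill-outlier guard
+- [x] Self-hosted dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard)): savings, failover health, media per-unit cost, price-drift panel
+- [ ] Normalized cross-provider video price comparison in the bundled table
 ## Affiliate disclosure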