npm - ai-lcr - Versions diffs - 0.6.4 → 0.7.0 - Mend

ai-lcr 0.6.4 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,69 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.7.0] — 2026-06-20
+The text router now records the **provider-reported actual cost** when a provider
+returns one, instead of always estimating from the price table. The table becomes
+the routing input and the drift baseline (`estCostUsd`); the recorded `costUsd` is
+the real bill wherever the provider gives it.
+### Why
+A static price table can only encode one price per model, but an aggregator
+(OpenRouter) routes a single model across many sub-providers whose prices differ
+several-fold, picking one per call — so `tokens × table` is structurally unable to
+match the bill for multi-provider models (measured: `deepseek-v4-pro` reconciled at
+~57% of the real cost, while single-provider models like Gemini/Claude/GPT matched
+at 100%). The provider's own number already accounts for which sub-provider served,
+every token kind (cache read/write, reasoning), and fees — none of which a flat
+table can track.
+### Added
+- **`costUsd` prefers the provider-reported actual cost** (text path). Read from
+  OpenRouter's `providerMetadata.openrouter.usage` —
+  `costDetails.upstreamInferenceCost` (the real upstream / BYOK model spend) when
+  present, otherwise `cost` (the credit charge) — and from an OpenAI-compatible
+  provider's `estimated_cost` (or `cost`) on the raw usage body. Requires the caller to enable
+  usage accounting on the provider (e.g. OpenRouter `usage: { include: true }`);
+  without it, behavior is unchanged.
+- **`estCostUsd` is now set on text records** (previously media-only) — the
+  price-table prediction for the same usage. `costUsd − estCostUsd` is the
+  price-table drift signal, so a dashboard's drift panel now works for text too.
+### Changed
+- When no provider cost is reported, `costUsd` still equals the price-table
+  estimate (and `estCostUsd` equals it, so no drift is flagged) — a pure fallback,
+  fully backward-compatible. The streaming path reads the reported cost from the
+  `finish` chunk's `providerMetadata`.
+## [0.6.5] — 2026-06-16
+Bundled price table now covers the open-weights labs, not just the Western
+proprietary makers — so `autoPrice` resolves Qwen, Kimi, MiniMax, and GLM routes
+out of the box (previously they needed a hand-typed `cost`).
+### Added
+- **`MODEL_PRICES` now includes the open-weights makers** — Qwen (Alibaba /
+  `dashscope`), Kimi (Moonshot), MiniMax, and GLM (Z.ai), alongside the existing
+  DeepSeek. 55 new first-party list prices (229 → 284 entries), keyed by each
+  maker's own bare model id (`qwen-plus`, `kimi-k2.5`, `MiniMax-M2`, `glm-4.6`,
+  …). The generator's `ALLOW` set gained `dashscope` / `moonshot` / `minimax` /
+  `zai`; no existing price changed.
+### Notes
+- These are **first-party** list rates (the maker's own API). A dedicated
+  inference *host* (DeepInfra, …) is often cheaper and uses HF-style ids
+  (`Qwen/Qwen3-…`) that won't match these bare keys — for an aggregator route,
+  keep passing an explicit `cost` or `discount`. The bundled rate is the
+  `autoPrice` baseline for the maker's own provider and a reference for the rest.
+- Aggregators (deepinfra, together, fireworks, groq, openrouter) remain
+  deliberately excluded from the table — their prices drift per-model.
 ## [0.6.4] — 2026-06-16
 DX improvements that eliminate per-project boilerplate for consumers.
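A minimal standalone sketch of the 0.7.0 cost-resolution order described in the changelog entry above. It follows the same precedence as the `reportedCost` helper in the `dist` diff: a positive upstream cost first, then the OpenRouter credit charge, then a raw `estimated_cost` or `cost` field, then the price-table estimate, then 0. The `*Like` interfaces, `reportedCostSketch`, and `resolveRecordCost` are hypothetical names with simplified shapes. They are not `ai-lcr` or AI SDK exports.

```typescript
// Hypothetical, simplified shapes modeled on the fields named in the changelog;
// these are not the AI SDK's real provider-metadata or usage types.
interface OpenRouterUsageLike {
  cost?: number; // credit charge
  costDetails?: { upstreamInferenceCost?: number }; // real upstream / BYOK spend
}
interface ProviderMetadataLike {
  openrouter?: { usage?: OpenRouterUsageLike };
}
interface UsageLike {
  raw?: Record<string, unknown>; // raw usage body from an OpenAI-compatible provider
}
interface CostFields {
  costUsd: number;
  estCostUsd?: number;
}

/** Provider-reported cost, using the same precedence as `reportedCost` in dist. */
function reportedCostSketch(meta?: ProviderMetadataLike, usage?: UsageLike): number | undefined {
  const orUsage = meta?.openrouter?.usage;
  if (orUsage) {
    const upstream = orUsage.costDetails?.upstreamInferenceCost;
    if (typeof upstream === "number" && upstream > 0) return upstream; // positive upstream cost wins
    if (typeof orUsage.cost === "number") return orUsage.cost; // else the credit charge
  }
  const raw = usage?.raw;
  const est = raw ? (raw["estimated_cost"] ?? raw["cost"]) : undefined;
  return typeof est === "number" ? est : undefined;
}

/** Record fields: the reported bill when known, else the price-table estimate, else 0. */
function resolveRecordCost(reported: number | undefined, estCostUsd: number | undefined): CostFields {
  const costUsd = reported ?? estCostUsd ?? 0;
  return estCostUsd === undefined ? { costUsd } : { costUsd, estCostUsd };
}

// Example: OpenRouter reports both numbers, so the upstream spend becomes costUsd.
const meta: ProviderMetadataLike = {
  openrouter: { usage: { cost: 0.0042, costDetails: { upstreamInferenceCost: 0.0035 } } },
};
console.log(resolveRecordCost(reportedCostSketch(meta), 0.002)); // { costUsd: 0.0035, estCostUsd: 0.002 }
```

When nothing is reported, both fields carry the table estimate, so `costUsd − estCostUsd` is zero and no drift is flagged. This matches the backward-compatible fallback described under "Changed".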

package/README.md CHANGED

@@ -96,7 +96,7 @@ const lcr = createLCR({
 });
 ```
-The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
+The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. `ProviderEntry` accepts `AnyLanguageModel` — a duck-typed interface (`doGenerate` + `doStream` + `provider` + `modelId`) that any AI SDK model satisfies regardless of spec version (V2 or V3), so you never need `as`-casts at the integration boundary. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
 ## Cheapest route for open-weights models (DeepInfra)
@@ -138,9 +138,50 @@ const lcr = createLCR({
 DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
+## Skip the boilerplate (`DEFAULT_PROVIDERS`)
+Every project that routes through OpenRouter, DeepInfra, TokenMart, DeepSeek, etc. redeclares the same `baseURL` + `apiKeyEnv` pair. `DEFAULT_PROVIDERS` is a bundled dictionary — import what you need instead of copy-pasting URLs:
+```ts
+import { DEFAULT_PROVIDERS } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+// Pick the providers you use — type-safe, no hardcoded URLs.
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: DEFAULT_PROVIDERS.deepinfra.baseURL,
+  apiKey: process.env[DEFAULT_PROVIDERS.deepinfra.apiKeyEnv],
+});
+```
+Available providers:
+| Key | Base URL | Env var |
+|---|---|---|
+| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
+| `deepinfra` | `https://api.deepinfra.com/v1/openai` | `DEEPINFRA_API_KEY` |
+| `tokenmart` | `https://model.service-inference.ai/v1` | `INFERENCE_API_KEY` |
+| `deepseek` | `https://api.deepseek.com` | `DEEPSEEK_API_KEY` |
+| `kunavo` | `https://api.kunavo.com/v1` | `KUNAVO_API_KEY` |
+| `runware` | `https://api.runware.ai/v1` | `RUNWARE_API_KEY` |
+| `fal` | `https://queue.fal.run` | `FAL_KEY` |
+A common pattern is to subset `DEFAULT_PROVIDERS` into a project-local type for compile-time safety:
+```ts
+import { DEFAULT_PROVIDERS } from "ai-lcr";
+type ProviderId = "deepinfra" | "openrouter";
+export const PROVIDERS = {
+  deepinfra: DEFAULT_PROVIDERS.deepinfra,
+  openrouter: DEFAULT_PROVIDERS.openrouter,
+} satisfies Record<ProviderId, { baseURL: string; apiKeyEnv: string }>;
+```
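A related pattern is to fail fast at startup when an env var named by `apiKeyEnv` is unset. The sketch below inlines two rows of the table above so that it runs standalone. In a real project you would take these rows from `DEFAULT_PROVIDERS` instead. `missingKeys` is a hypothetical helper, not an `ai-lcr` export.

```typescript
interface ProviderDefaults {
  baseURL: string;
  apiKeyEnv: string;
}

// Two rows copied from the table above; in an app these would come from DEFAULT_PROVIDERS.
const PROVIDERS = {
  deepinfra: { baseURL: "https://api.deepinfra.com/v1/openai", apiKeyEnv: "DEEPINFRA_API_KEY" },
  openrouter: { baseURL: "https://openrouter.ai/api/v1", apiKeyEnv: "OPENROUTER_API_KEY" },
} satisfies Record<string, ProviderDefaults>;

type ProviderId = keyof typeof PROVIDERS;

/** Provider ids whose API-key env var is unset or empty (hypothetical helper). */
function missingKeys(env: Record<string, string | undefined>): ProviderId[] {
  const ids = Object.keys(PROVIDERS) as ProviderId[];
  return ids.filter((id) => !env[PROVIDERS[id].apiKeyEnv]);
}

// Usage at startup (Node): fail before the first routed call rather than during it.
// const missing = missingKeys(process.env);
// if (missing.length > 0) throw new Error(`Missing API keys for: ${missing.join(", ")}`);
```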
 ## Zero-config pricing (`autoPrice`)
-Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, DeepSeek, xAI, Mistral), keyed by the bare model id you already pass to the provider:
+Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, xAI, Mistral, plus the open-weights labs DeepSeek, Qwen, Kimi, MiniMax, GLM), keyed by the bare model id you already pass to the provider:
 ```ts
 const lcr = createLCR({
@@ -161,7 +202,7 @@ Three rules keep it predictable:
 - **Off by default.** Unpriced entries stay unpriced (the pre-existing behavior), so turning `autoPrice` on never silently re-prices a model — and an **explicit `cost` always wins** over the table.
 - **`discount` is the reseller knob.** A flat-% aggregator (Kunavo −20%) becomes `discount: 0.2` instead of a hand-typed number; it scales input, output, and `cacheRead` alike, and only applies when the table fills the entry. Variable-discount providers (TokenMart) still want explicit per-model `cost`.
-- **Native makers only.** The table carries first-party list prices — the cheapest, most-featureful "go direct" route. Open-weights hosts (DeepInfra) and breadth aggregators (OpenRouter) aren't in it; price those explicitly.
+- **Native makers only.** The table carries first-party list prices, keyed by each maker's own bare id (`qwen-plus`, `glm-4.6`, `kimi-k2.5`, `MiniMax-M2`). It's the autoPrice baseline when you route through that maker's own API. Open-weights *hosts* (DeepInfra uses HF-style ids like `Qwen/Qwen3-…`) and breadth aggregators (OpenRouter) aren't keyed here — price those with explicit `cost` or `discount`.
 Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT) — refresh with `node scripts/gen-text-prices.mjs`.
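The three rules above can be sketched as follows. The sketch assumes USD-per-million-token rates, which is the unit implied by the `/ 1e6` in the shipped `cacheSavingForUsage`. `TABLE`, `EntryLike`, `resolveEntryCost`, and `costForUsageSketch` are hypothetical names. The cached-token split is also an assumption, because `costForUsage` itself is not shown in this diff.

```typescript
// Hypothetical sketch of autoPrice + discount; not ai-lcr's implementation.
interface Cost {
  input: number; // USD per 1M input tokens
  output: number; // USD per 1M output tokens
  cacheRead?: number; // USD per 1M cached input tokens
}

// Two rows copied from the MODEL_PRICES additions in this release.
const TABLE: Record<string, Cost> = {
  "kimi-k2.5": { input: 0.6, output: 3, cacheRead: 0.1 },
  "qwen-plus": { input: 0.4, output: 1.2 },
};

interface EntryLike {
  modelId: string;
  cost?: Cost;
  discount?: number; // e.g. 0.2 for a flat 20% reseller discount
}

function resolveEntryCost(entry: EntryLike, autoPrice: boolean): Cost | undefined {
  if (entry.cost) return entry.cost; // explicit cost always wins; discount is not applied
  if (!autoPrice) return undefined; // off by default: unpriced entries stay unpriced
  const base = TABLE[entry.modelId];
  if (!base) return undefined; // not a first-party bare id (e.g. an HF-style host id)
  const k = 1 - (entry.discount ?? 0); // discount scales input, output, and cacheRead alike
  return {
    input: base.input * k,
    output: base.output * k,
    ...(base.cacheRead !== undefined ? { cacheRead: base.cacheRead * k } : {}),
  };
}

// Assumed formula: cached prompt tokens bill at cacheRead, the remainder at input.
function costForUsageSketch(c: Cost, inputTokens: number, outputTokens: number, cacheReadTokens = 0): number {
  const cached = c.cacheRead === undefined ? 0 : Math.min(Math.max(cacheReadTokens, 0), inputTokens);
  const uncached = inputTokens - cached;
  return (uncached * c.input + cached * (c.cacheRead ?? 0) + outputTokens * c.output) / 1e6;
}
```

Consider a `kimi-k2.5` call with 10,000 input tokens, 4,000 of them cache reads, and 2,000 output tokens. Under this assumed formula it costs (6,000 × 0.6 + 4,000 × 0.1 + 2,000 × 3) / 1e6 = $0.01 before any discount. With `discount: 0.2`, every rate is scaled by 0.8.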
@@ -315,6 +356,26 @@ const lcr = createLCR({
 });
 ```
+### Convention-based sink (`createEnvSink`)
+If your app uses the standard env vars (`LCR_INGEST_URL`, `LCR_PROJECT`, `LCR_INGEST_KEY`), you don't need to wire `createHttpSink` at all — `createEnvSink` reads them for you and returns a ready-to-use `onCall` handler (or `undefined` when `LCR_INGEST_URL` is unset, so local dev stays quiet):
+```ts
+import { createEnvSink } from "ai-lcr";
+import { after } from "next/server";
+export const lcrCallSink = createEnvSink(after);
+// → use as `onCall: lcrCallSink` in createLCR
+```
+The only required argument is `dispatch` — a framework-specific fire-and-forget runner (Next.js: `after`, Cloudflare: `ctx.waitUntil`, plain servers: `(fn) => fn()`). Env vars:
+| Var | Required | Description |
+|---|---|---|
+| `LCR_INGEST_URL` | yes (no URL → sink is `undefined`) | Dashboard origin, e.g. `https://lcr.ideamarketfit.com` |
+| `LCR_PROJECT` | no | Project tag merged into each payload; falls back to `SITE_KEY` |
+| `LCR_INGEST_KEY` | no | Bearer token (only if the dashboard sets `INGEST_KEY`) |
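Here is a hypothetical re-creation of the convention above, which shows what each env var controls. Several parts are assumptions: `createEnvSinkSketch`, the injected `env` and `doFetch` parameters, the `project` payload field name, and the `/api/ingest` path (the README documents only the origin). Treat it as an illustration, not as `ai-lcr`'s source.

```typescript
// Hypothetical re-creation of the createEnvSink convention; not ai-lcr's source.
type Dispatch = (task: () => Promise<unknown>) => void; // e.g. Next.js `after`
type Env = Record<string, string | undefined>;
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

// Hypothetical path: the README only documents the dashboard origin.
const INGEST_PATH = "/api/ingest";

function createEnvSinkSketch(
  dispatch: Dispatch,
  env: Env,
  doFetch: FetchLike,
): ((record: object) => void) | undefined {
  const origin = env["LCR_INGEST_URL"];
  if (!origin) return undefined; // no URL: no sink, so local dev stays quiet

  const project = env["LCR_PROJECT"] ?? env["SITE_KEY"]; // project tag with SITE_KEY fallback
  const headers: Record<string, string> = { "content-type": "application/json" };
  const key = env["LCR_INGEST_KEY"];
  if (key) headers["authorization"] = `Bearer ${key}`; // only if the dashboard sets INGEST_KEY
  const endpoint = origin.replace(/\/+$/, "") + INGEST_PATH;

  return (record) => {
    const payload = project ? { ...record, project } : record;
    // Fire-and-forget: dispatch keeps the POST alive; failures are swallowed here.
    dispatch(() =>
      doFetch(endpoint, { method: "POST", headers, body: JSON.stringify(payload) }).catch(() => undefined),
    );
  };
}
```

In a Next.js app this would be wired roughly as `createEnvSinkSketch(after, process.env, fetch)`. Injecting `env` and `fetch` keeps the sketch testable without a network.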
 ### The companion dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))
 <p align="center">

package/README.zh-CN.md CHANGED

@@ -96,7 +96,7 @@ const lcr = createLCR({
 });
 ```
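-The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a vendor's native API with aggregators in one model's list. Native APIs have narrow coverage (only that vendor's own models) but are full-featured; aggregators have broad coverage. **Official-first + aggregator-fallback** is exactly the most natural shape for LCR.
+The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. `ProviderEntry` accepts `AnyLanguageModel`, a duck-typed interface (`doGenerate` + `doStream` + `provider` + `modelId`) that any AI SDK model satisfies whether it implements the V2 or V3 spec, so no `as`-casts are needed at the integration boundary. Native APIs have narrow coverage (only that vendor's own models) but are full-featured; aggregators have broad coverage. **Official-first + aggregator-fallback** is exactly the most natural shape for LCR.
 ## Cheapest route for open-weights models (DeepInfra)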
@@ -138,6 +138,47 @@ const lcr = createLCR({
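 DeepInfra carries open weights only; there is no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount relay.
+## Skip the boilerplate (`DEFAULT_PROVIDERS`)
+Every project that routes through OpenRouter, DeepInfra, TokenMart, DeepSeek, etc. has to redeclare the same `baseURL` + `apiKeyEnv`. `DEFAULT_PROVIDERS` is a built-in dictionary: just import the ones you need instead of copy-pasting URLs: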
+```ts
+import { DEFAULT_PROVIDERS } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+// Pick what you need: type-safe, no hardcoded URLs.
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: DEFAULT_PROVIDERS.deepinfra.baseURL,
+  apiKey: process.env[DEFAULT_PROVIDERS.deepinfra.apiKeyEnv],
+});
+```
+Available providers:
+| Key | Base URL | Env var |
+|---|---|---|
+| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
+| `deepinfra` | `https://api.deepinfra.com/v1/openai` | `DEEPINFRA_API_KEY` |
+| `tokenmart` | `https://model.service-inference.ai/v1` | `INFERENCE_API_KEY` |
+| `deepseek` | `https://api.deepseek.com` | `DEEPSEEK_API_KEY` |
+| `kunavo` | `https://api.kunavo.com/v1` | `KUNAVO_API_KEY` |
+| `runware` | `https://api.runware.ai/v1` | `RUNWARE_API_KEY` |
+| `fal` | `https://queue.fal.run` | `FAL_KEY` |
+A common pattern is to take a subset of `DEFAULT_PROVIDERS` and declare a project-level type for compile-time safety:
+```ts
+import { DEFAULT_PROVIDERS } from "ai-lcr";
+type ProviderId = "deepinfra" | "openrouter";
+export const PROVIDERS = {
+  deepinfra: DEFAULT_PROVIDERS.deepinfra,
+  openrouter: DEFAULT_PROVIDERS.openrouter,
+} satisfies Record<ProviderId, { baseURL: string; apiKeyEnv: string }>;
+```
 ## How it routes
 1. **Cheapest first.** Providers are tried in order: list them cheapest-first, or set `autoSort: true` to have them sorted automatically by `cost`.
@@ -221,7 +262,17 @@ interface CallRecord {
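 **Counting savings honestly:** `baselineKind` states which baseline `baselineUsd` is. For text it is the **list price of the last fallback provider in the chain** (`"last-leg"`; deliberately not the priciest route, because prompt caching can make a provider with a cheaper list price actually cost more on cache-heavy calls, and taking the maximum would conjure "savings" out of nothing). For media it is the **model maker's official first-party price** (`"official"`, computed from the actual seconds); when no official price is found, it falls back to the priciest route in your config (`"priciest-route"`, which is self-referential and only shows the cross-provider price spread).
-**Send to a collector:** `createHttpSink` POSTs each record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built specifically for these records: spend vs. savings trends, per-provider failover health, media $/second and $/image, and a **price drift panel** that calls out any model@provider route whose reported bill deviates from the price table by more than ±20% (a gap of about 100× is almost always a dollars-vs-cents typo). It stores metadata only, never prompts or outputs.
+**Send to a collector:** `createHttpSink` POSTs each record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). If you use the standard env vars (`LCR_INGEST_URL`, `LCR_PROJECT`, `LCR_INGEST_KEY`), `createEnvSink` reads them all for you, and three lines are enough: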
+```ts
+import { createEnvSink } from "ai-lcr";
+import { after } from "next/server";
+export const lcrCallSink = createEnvSink(after);
+```
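+If `LCR_INGEST_URL` is unset, the sink is `undefined`, so local dev stays quiet automatically. The only required argument is `dispatch`, a framework-specific fire-and-forget runner (Next.js: `after`; Cloudflare: `ctx.waitUntil`; long-running servers: `(fn) => fn()`).
+The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built specifically for these records: spend vs. savings trends, per-provider failover health, media $/second and $/image, and a **price drift panel** that calls out any model@provider route whose reported bill deviates from the price table by more than ±20% (a gap of about 100× is almost always a dollars-vs-cents typo). It stores metadata only, never prompts or outputs.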
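The drift panel compares what a route was billed (`costUsd`) with what the price table predicted (`estCostUsd`). Below is a rough sketch of that comparison. It assumes records are grouped under a hypothetical `route` key (`model@provider`) and measures deviation as the billed/predicted ratio. The ±20% tolerance comes from the text above. The rule that a ratio of 50× or more (in either direction) means a probable unit error is an assumption standing in for the "about 100×" rule of thumb. `classifyDrift` is not the dashboard's actual code.

```typescript
// Hypothetical record shape: only the fields this check needs.
interface DriftRecord {
  route: string; // hypothetical "model@provider" key
  costUsd: number; // what was billed (provider-reported when available)
  estCostUsd?: number; // what the price table predicted
}

type DriftKind = "ok" | "drift" | "unit-typo";
interface DriftRow {
  route: string;
  ratio: number; // billed / predicted, summed per route
  kind: DriftKind;
}

function classifyDrift(records: DriftRecord[], tolerance = 0.2, typoFactor = 50): DriftRow[] {
  const sums = new Map<string, { billed: number; predicted: number }>();
  for (const r of records) {
    if (r.estCostUsd === undefined || r.estCostUsd <= 0) continue; // no table price: nothing to compare
    const s = sums.get(r.route) ?? { billed: 0, predicted: 0 };
    s.billed += r.costUsd;
    s.predicted += r.estCostUsd;
    sums.set(r.route, s);
  }
  const rows: DriftRow[] = [];
  sums.forEach(({ billed, predicted }, route) => {
    const ratio = billed / predicted;
    const kind: DriftKind =
      ratio >= typoFactor || ratio <= 1 / typoFactor
        ? "unit-typo" // orders of magnitude apart: likely dollars vs cents
        : Math.abs(ratio - 1) > tolerance
          ? "drift"
          : "ok";
    rows.push({ route, ratio, kind });
  });
  return rows;
}
```

Take the changelog's own measurement: the table predicted about 57% of the real bill (`estCostUsd: 0.57`, `costUsd: 1`). That gives a ratio of about 1.75, so the route is flagged as `"drift"`.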
 ## Supported providers

package/dist/index.cjs CHANGED

@@ -341,6 +341,20 @@ function cacheSavingForUsage(cost, inputTokens, cacheReadTokens) {
   const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
   return cached / 1e6 * (cost.input - cost.cacheRead);
 }
+function reportedCost(providerMetadata, usage) {
+  const orUsage = providerMetadata?.openrouter?.usage;
+  if (orUsage) {
+    const upstream = orUsage.costDetails?.upstreamInferenceCost;
+    if (typeof upstream === "number" && upstream > 0) return upstream;
+    if (typeof orUsage.cost === "number") return orUsage.cost;
+  }
+  const raw = usage?.raw;
+  if (raw) {
+    const est = raw["estimated_cost"] ?? raw["cost"];
+    if (typeof est === "number") return est;
+  }
+  return void 0;
+}
 function requestIdFrom(options) {
   const raw = options.providerOptions?.lcr?.requestId;
   return typeof raw === "string" && raw.length > 0 ? raw : void 0;
@@ -539,12 +553,13 @@ var LcrFallbackModel = class {
     return baseline;
   }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs, providerMetadata) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
     const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
-    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const estCostUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : void 0;
+    const costUsd = reportedCost(providerMetadata, usage) ?? estCostUsd ?? 0;
     const cachedSavingUsd = provider.cost ? cacheSavingForUsage(provider.cost, inputTokens, cacheReadTokens) : 0;
     const usageMissing = inputTokens === 0 && outputTokens === 0;
     const emptyCompletion = inputTokens > 0 && outputTokens === 0;
@@ -579,6 +594,7 @@ var LcrFallbackModel = class {
       outputTokens,
       ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
       costUsd,
+      ...estCostUsd !== void 0 ? { estCostUsd } : {},
       ...baselineUsd !== void 0 ? { baselineUsd, baselineKind: "last-leg" } : {},
       ...cachedSavingUsd > 0 ? { cachedSavingUsd } : {},
       ...ctx.requestId ? { requestId: ctx.requestId } : {},
@@ -635,7 +651,7 @@ var LcrFallbackModel = class {
         }
         this.recordProviderSuccess(idx);
         this.settleSticky(idx);
-        this.finalizeOk(ctx, provider, attemptStart, result.usage);
+        this.finalizeOk(ctx, provider, attemptStart, result.usage, void 0, result.providerMetadata);
         if (cache && cacheKey !== void 0 && ctx.settled?.cacheable) {
           this.storeCache(cacheKey, { kind: "generate", result, meta: ctx.settled.meta });
         }
@@ -767,6 +783,7 @@ var LcrFallbackModel = class {
     const servingIdx = idx;
     const servingPos = p;
     let usage;
+    let finishProviderMetadata;
     let contentStreamed = false;
     let ttftMs;
     const stream = new ReadableStream({
@@ -783,6 +800,7 @@ var LcrFallbackModel = class {
             if (done) break;
             if (value.type === "finish") {
               usage = value.usage;
+              finishProviderMetadata = value.providerMetadata;
               const out = value.usage?.outputTokens?.total ?? 0;
               const inp = value.usage?.inputTokens?.total ?? 0;
               if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
@@ -797,7 +815,7 @@ var LcrFallbackModel = class {
           }
           self.recordProviderSuccess(servingIdx);
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs, finishProviderMetadata);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -1003,6 +1021,16 @@ var MODEL_PRICES = {
   "gemini-gemma-2-9b-it": { input: 0.35, output: 1.05 },
   "gemini-pro-latest": { input: 1.25, output: 10, cacheRead: 0.125 },
   "gemini-robotics-er-1.5-preview": { input: 0.3, output: 2.5 },
+  "glm-4-32b-0414-128k": { input: 0.1, output: 0.1 },
+  "glm-4.5": { input: 0.6, output: 2.2 },
+  "glm-4.5-air": { input: 0.2, output: 1.1 },
+  "glm-4.5-airx": { input: 1.1, output: 4.5 },
+  "glm-4.5-x": { input: 2.2, output: 8.9 },
+  "glm-4.5v": { input: 0.6, output: 1.8 },
+  "glm-4.6": { input: 0.6, output: 2.2, cacheRead: 0.11 },
+  "glm-4.7": { input: 0.6, output: 2.2, cacheRead: 0.11 },
+  "glm-5": { input: 1, output: 3.2, cacheRead: 0.2 },
+  "glm-5-code": { input: 1.2, output: 5, cacheRead: 0.3 },
   "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
   "gpt-3.5-turbo-0125": { input: 0.5, output: 1.5 },
   "gpt-3.5-turbo-1106": { input: 1, output: 2 },
@@ -1117,6 +1145,18 @@ var MODEL_PRICES = {
   "grok-code-fast-1": { input: 0.2, output: 1.5, cacheRead: 0.02 },
   "grok-code-fast-1-0825": { input: 0.2, output: 1.5, cacheRead: 0.02 },
   "grok-vision-beta": { input: 5, output: 15 },
+  "kimi-k2-0711-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-0905-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-thinking": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-thinking-turbo": { input: 1.15, output: 8, cacheRead: 0.15 },
+  "kimi-k2-turbo-preview": { input: 1.15, output: 8, cacheRead: 0.15 },
+  "kimi-k2.5": { input: 0.6, output: 3, cacheRead: 0.1 },
+  "kimi-k2.6": { input: 0.95, output: 4, cacheRead: 0.16 },
+  "kimi-latest": { input: 2, output: 5, cacheRead: 0.15 },
+  "kimi-latest-128k": { input: 2, output: 5, cacheRead: 0.15 },
+  "kimi-latest-32k": { input: 1, output: 3, cacheRead: 0.15 },
+  "kimi-latest-8k": { input: 0.2, output: 2, cacheRead: 0.15 },
+  "kimi-thinking-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
   "labs-devstral-small-2512": { input: 0.1, output: 0.3 },
   "magistral-medium-1-2-2509": { input: 2, output: 5 },
   "magistral-medium-2506": { input: 2, output: 5 },
@@ -1125,6 +1165,12 @@ var MODEL_PRICES = {
   "magistral-small-1-2-2509": { input: 0.5, output: 1.5 },
   "magistral-small-2506": { input: 0.5, output: 1.5 },
   "magistral-small-latest": { input: 0.5, output: 1.5 },
+  "MiniMax-M2": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.1": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.1-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
+  "MiniMax-M2.5": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.5-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
+  "MiniMax-M3": { input: 0.6, output: 2.4, cacheRead: 0.12 },
   "ministral-3-14b-2512": { input: 0.2, output: 0.2 },
   "ministral-3-3b-2512": { input: 0.1, output: 0.1 },
   "ministral-3-8b-2512": { input: 0.15, output: 0.15 },
@@ -1145,6 +1191,16 @@ var MODEL_PRICES = {
   "mistral-small-3-2-2506": { input: 0.06, output: 0.18 },
   "mistral-small-latest": { input: 0.06, output: 0.18 },
   "mistral-tiny": { input: 0.25, output: 0.25 },
+  "moonshot-v1-128k": { input: 2, output: 5 },
+  "moonshot-v1-128k-0430": { input: 2, output: 5 },
+  "moonshot-v1-128k-vision-preview": { input: 2, output: 5 },
+  "moonshot-v1-32k": { input: 1, output: 3 },
+  "moonshot-v1-32k-0430": { input: 1, output: 3 },
+  "moonshot-v1-32k-vision-preview": { input: 1, output: 3 },
+  "moonshot-v1-8k": { input: 0.2, output: 2 },
+  "moonshot-v1-8k-0430": { input: 0.2, output: 2 },
+  "moonshot-v1-8k-vision-preview": { input: 0.2, output: 2 },
+  "moonshot-v1-auto": { input: 2, output: 5 },
   "o1": { input: 15, output: 60, cacheRead: 7.5 },
   "o1-2024-12-17": { input: 15, output: 60, cacheRead: 7.5 },
   "o3": { input: 2, output: 8, cacheRead: 0.5 },
@@ -1161,7 +1217,24 @@ var MODEL_PRICES = {
   "open-mixtral-8x7b": { input: 0.7, output: 0.7 },
   "pixtral-12b-2409": { input: 0.15, output: 0.15 },
   "pixtral-large-2411": { input: 2, output: 6 },
-  "pixtral-large-latest": { input: 2, output: 6 }
+  "pixtral-large-latest": { input: 2, output: 6 },
+  "qwen-coder": { input: 0.3, output: 1.5 },
+  "qwen-max": { input: 1.6, output: 6.4 },
+  "qwen-plus": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-01-25": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-04-28": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-07-14": { input: 0.4, output: 1.2 },
+  "qwen-turbo": { input: 0.05, output: 0.2 },
+  "qwen-turbo-2024-11-01": { input: 0.05, output: 0.2 },
+  "qwen-turbo-2025-04-28": { input: 0.05, output: 0.2 },
+  "qwen-turbo-latest": { input: 0.05, output: 0.2 },
+  "qwen3-next-80b-a3b-instruct": { input: 0.15, output: 1.2 },
+  "qwen3-next-80b-a3b-thinking": { input: 0.15, output: 1.2 },
+  "qwen3-vl-235b-a22b-instruct": { input: 0.4, output: 1.6 },
+  "qwen3-vl-235b-a22b-thinking": { input: 0.4, output: 4 },
+  "qwen3-vl-32b-instruct": { input: 0.16, output: 0.64 },
+  "qwen3-vl-32b-thinking": { input: 0.16, output: 2.87 },
+  "qwq-plus": { input: 0.8, output: 2.4 }
 };
 // src/media-official.ts
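The streaming hunk above stores `value.providerMetadata` from the `finish` part and passes it to `finalizeOk` only after the stream has drained. Below is a minimal pass-through sketch of that pattern. It assumes simplified part shapes and omits the shipped code's empty-completion failover check. `StreamPart`, `OnSettled`, and `meterStream` are hypothetical names, not AI SDK or `ai-lcr` exports.

```typescript
// Hypothetical, simplified stream-part shapes; real AI SDK parts carry more fields.
type StreamPart =
  | { type: "text-delta"; delta: string }
  | { type: "finish"; usage?: unknown; providerMetadata?: Record<string, unknown> };

type OnSettled = (usage: unknown, providerMetadata: Record<string, unknown> | undefined) => void;

/**
 * Pass every part through unchanged, remember what the `finish` part carried,
 * and report it once the source is exhausted (where dist calls finalizeOk).
 * If the consumer stops early, onSettled never runs.
 */
async function* meterStream(parts: AsyncIterable<StreamPart>, onSettled: OnSettled): AsyncGenerator<StreamPart> {
  let usage: unknown;
  let providerMetadata: Record<string, unknown> | undefined;
  for await (const part of parts) {
    if (part.type === "finish") {
      usage = part.usage;
      providerMetadata = part.providerMetadata;
    }
    yield part; // the consumer sees the stream unmodified
  }
  onSettled(usage, providerMetadata);
}
```

Settling after the loop, rather than inside the `finish` branch, mirrors the shipped code. In both, cost is recorded only after the whole stream has been forwarded.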

package/dist/index.js CHANGED

@@ -287,6 +287,20 @@ function cacheSavingForUsage(cost, inputTokens, cacheReadTokens) {
   const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
   return cached / 1e6 * (cost.input - cost.cacheRead);
 }
+function reportedCost(providerMetadata, usage) {
+  const orUsage = providerMetadata?.openrouter?.usage;
+  if (orUsage) {
+    const upstream = orUsage.costDetails?.upstreamInferenceCost;
+    if (typeof upstream === "number" && upstream > 0) return upstream;
+    if (typeof orUsage.cost === "number") return orUsage.cost;
+  }
+  const raw = usage?.raw;
+  if (raw) {
+    const est = raw["estimated_cost"] ?? raw["cost"];
+    if (typeof est === "number") return est;
+  }
+  return void 0;
+}
 function requestIdFrom(options) {
   const raw = options.providerOptions?.lcr?.requestId;
   return typeof raw === "string" && raw.length > 0 ? raw : void 0;
@@ -485,12 +499,13 @@ var LcrFallbackModel = class {
     return baseline;
   }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs, providerMetadata) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
     const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
-    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const estCostUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : void 0;
+    const costUsd = reportedCost(providerMetadata, usage) ?? estCostUsd ?? 0;
     const cachedSavingUsd = provider.cost ? cacheSavingForUsage(provider.cost, inputTokens, cacheReadTokens) : 0;
     const usageMissing = inputTokens === 0 && outputTokens === 0;
     const emptyCompletion = inputTokens > 0 && outputTokens === 0;
@@ -525,6 +540,7 @@ var LcrFallbackModel = class {
       outputTokens,
       ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
       costUsd,
+      ...estCostUsd !== void 0 ? { estCostUsd } : {},
       ...baselineUsd !== void 0 ? { baselineUsd, baselineKind: "last-leg" } : {},
       ...cachedSavingUsd > 0 ? { cachedSavingUsd } : {},
       ...ctx.requestId ? { requestId: ctx.requestId } : {},
@@ -581,7 +597,7 @@ var LcrFallbackModel = class {
         }
         this.recordProviderSuccess(idx);
         this.settleSticky(idx);
-        this.finalizeOk(ctx, provider, attemptStart, result.usage);
+        this.finalizeOk(ctx, provider, attemptStart, result.usage, void 0, result.providerMetadata);
         if (cache && cacheKey !== void 0 && ctx.settled?.cacheable) {
           this.storeCache(cacheKey, { kind: "generate", result, meta: ctx.settled.meta });
         }
@@ -713,6 +729,7 @@ var LcrFallbackModel = class {
     const servingIdx = idx;
     const servingPos = p;
     let usage;
+    let finishProviderMetadata;
     let contentStreamed = false;
     let ttftMs;
     const stream = new ReadableStream({
@@ -729,6 +746,7 @@ var LcrFallbackModel = class {
             if (done) break;
             if (value.type === "finish") {
               usage = value.usage;
+              finishProviderMetadata = value.providerMetadata;
               const out = value.usage?.outputTokens?.total ?? 0;
               const inp = value.usage?.inputTokens?.total ?? 0;
               if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
@@ -743,7 +761,7 @@ var LcrFallbackModel = class {
           }
           self.recordProviderSuccess(servingIdx);
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs, finishProviderMetadata);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -949,6 +967,16 @@ var MODEL_PRICES = {
   "gemini-gemma-2-9b-it": { input: 0.35, output: 1.05 },
   "gemini-pro-latest": { input: 1.25, output: 10, cacheRead: 0.125 },
   "gemini-robotics-er-1.5-preview": { input: 0.3, output: 2.5 },
+  "glm-4-32b-0414-128k": { input: 0.1, output: 0.1 },
+  "glm-4.5": { input: 0.6, output: 2.2 },
+  "glm-4.5-air": { input: 0.2, output: 1.1 },
+  "glm-4.5-airx": { input: 1.1, output: 4.5 },
+  "glm-4.5-x": { input: 2.2, output: 8.9 },
+  "glm-4.5v": { input: 0.6, output: 1.8 },
+  "glm-4.6": { input: 0.6, output: 2.2, cacheRead: 0.11 },
+  "glm-4.7": { input: 0.6, output: 2.2, cacheRead: 0.11 },
+  "glm-5": { input: 1, output: 3.2, cacheRead: 0.2 },
+  "glm-5-code": { input: 1.2, output: 5, cacheRead: 0.3 },
   "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
   "gpt-3.5-turbo-0125": { input: 0.5, output: 1.5 },
   "gpt-3.5-turbo-1106": { input: 1, output: 2 },
@@ -1063,6 +1091,18 @@ var MODEL_PRICES = {
   "grok-code-fast-1": { input: 0.2, output: 1.5, cacheRead: 0.02 },
   "grok-code-fast-1-0825": { input: 0.2, output: 1.5, cacheRead: 0.02 },
   "grok-vision-beta": { input: 5, output: 15 },
+  "kimi-k2-0711-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-0905-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-thinking": { input: 0.6, output: 2.5, cacheRead: 0.15 },
+  "kimi-k2-thinking-turbo": { input: 1.15, output: 8, cacheRead: 0.15 },
+  "kimi-k2-turbo-preview": { input: 1.15, output: 8, cacheRead: 0.15 },
+  "kimi-k2.5": { input: 0.6, output: 3, cacheRead: 0.1 },
+  "kimi-k2.6": { input: 0.95, output: 4, cacheRead: 0.16 },
+  "kimi-latest": { input: 2, output: 5, cacheRead: 0.15 },
+  "kimi-latest-128k": { input: 2, output: 5, cacheRead: 0.15 },
+  "kimi-latest-32k": { input: 1, output: 3, cacheRead: 0.15 },
+  "kimi-latest-8k": { input: 0.2, output: 2, cacheRead: 0.15 },
+  "kimi-thinking-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
   "labs-devstral-small-2512": { input: 0.1, output: 0.3 },
   "magistral-medium-1-2-2509": { input: 2, output: 5 },
   "magistral-medium-2506": { input: 2, output: 5 },
@@ -1071,6 +1111,12 @@ var MODEL_PRICES = {
   "magistral-small-1-2-2509": { input: 0.5, output: 1.5 },
   "magistral-small-2506": { input: 0.5, output: 1.5 },
   "magistral-small-latest": { input: 0.5, output: 1.5 },
+  "MiniMax-M2": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.1": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.1-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
+  "MiniMax-M2.5": { input: 0.3, output: 1.2, cacheRead: 0.03 },
+  "MiniMax-M2.5-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
+  "MiniMax-M3": { input: 0.6, output: 2.4, cacheRead: 0.12 },
   "ministral-3-14b-2512": { input: 0.2, output: 0.2 },
   "ministral-3-3b-2512": { input: 0.1, output: 0.1 },
   "ministral-3-8b-2512": { input: 0.15, output: 0.15 },
@@ -1091,6 +1137,16 @@ var MODEL_PRICES = {
   "mistral-small-3-2-2506": { input: 0.06, output: 0.18 },
   "mistral-small-latest": { input: 0.06, output: 0.18 },
   "mistral-tiny": { input: 0.25, output: 0.25 },
+  "moonshot-v1-128k": { input: 2, output: 5 },
+  "moonshot-v1-128k-0430": { input: 2, output: 5 },
+  "moonshot-v1-128k-vision-preview": { input: 2, output: 5 },
+  "moonshot-v1-32k": { input: 1, output: 3 },
+  "moonshot-v1-32k-0430": { input: 1, output: 3 },
+  "moonshot-v1-32k-vision-preview": { input: 1, output: 3 },
+  "moonshot-v1-8k": { input: 0.2, output: 2 },
+  "moonshot-v1-8k-0430": { input: 0.2, output: 2 },
+  "moonshot-v1-8k-vision-preview": { input: 0.2, output: 2 },
+  "moonshot-v1-auto": { input: 2, output: 5 },
   "o1": { input: 15, output: 60, cacheRead: 7.5 },
   "o1-2024-12-17": { input: 15, output: 60, cacheRead: 7.5 },
   "o3": { input: 2, output: 8, cacheRead: 0.5 },
@@ -1107,7 +1163,24 @@ var MODEL_PRICES = {
   "open-mixtral-8x7b": { input: 0.7, output: 0.7 },
   "pixtral-12b-2409": { input: 0.15, output: 0.15 },
   "pixtral-large-2411": { input: 2, output: 6 },
-  "pixtral-large-latest": { input: 2, output: 6 }
+  "pixtral-large-latest": { input: 2, output: 6 },
+  "qwen-coder": { input: 0.3, output: 1.5 },
+  "qwen-max": { input: 1.6, output: 6.4 },
+  "qwen-plus": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-01-25": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-04-28": { input: 0.4, output: 1.2 },
+  "qwen-plus-2025-07-14": { input: 0.4, output: 1.2 },
+  "qwen-turbo": { input: 0.05, output: 0.2 },
+  "qwen-turbo-2024-11-01": { input: 0.05, output: 0.2 },
+  "qwen-turbo-2025-04-28": { input: 0.05, output: 0.2 },
+  "qwen-turbo-latest": { input: 0.05, output: 0.2 },
+  "qwen3-next-80b-a3b-instruct": { input: 0.15, output: 1.2 },
+  "qwen3-next-80b-a3b-thinking": { input: 0.15, output: 1.2 },
+  "qwen3-vl-235b-a22b-instruct": { input: 0.4, output: 1.6 },
+  "qwen3-vl-235b-a22b-thinking": { input: 0.4, output: 4 },
+  "qwen3-vl-32b-instruct": { input: 0.16, output: 0.64 },
+  "qwen3-vl-32b-thinking": { input: 0.16, output: 2.87 },
+  "qwq-plus": { input: 0.8, output: 2.4 }
 };
 // src/media-official.ts

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.6.4",
+  "version": "0.7.0",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",