ai-lcr 0.6.4 → 0.7.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,69 @@ All notable changes to `ai-lcr` are documented here. The format follows
4
4
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.7.0] — 2026-06-20
8
+
9
+ The text router now records the **provider-reported actual cost** when a provider
10
+ returns one, instead of always estimating from the price table. The table becomes
11
+ the routing input and the drift baseline (`estCostUsd`); the recorded `costUsd` is
12
+ the real bill wherever the provider gives it.
13
+
14
+ ### Why
15
+
16
+ A static price table can only encode one price per model, but an aggregator
17
+ (OpenRouter) routes a single model across many sub-providers whose prices differ
18
+ several-fold, picking one per call — so `tokens × table` is structurally unable to
19
+ match the bill for multi-provider models (measured: `deepseek-v4-pro` reconciled at
20
+ ~57% of the real cost, while single-provider models like Gemini/Claude/GPT matched
21
+ at 100%). The provider's own number already accounts for which sub-provider served,
22
+ every token kind (cache read/write, reasoning), and fees — none of which a flat
23
+ table can track.
24
+
25
+ ### Added
26
+
27
+ - **`costUsd` prefers the provider-reported actual cost** (text path). Read from
28
+ OpenRouter's `providerMetadata.openrouter.usage` —
29
+ `costDetails.upstreamInferenceCost` (the real upstream / BYOK model spend) when
30
+ present, otherwise `cost` (the credit charge) — and from an OpenAI-compatible
31
+ provider's `estimated_cost` on the raw usage body. Requires the caller to enable
32
+ usage accounting on the provider (e.g. OpenRouter `usage: { include: true }`);
33
+ without it, behavior is unchanged.
34
+ - **`estCostUsd` is now set on text records** (previously media-only) — the
35
+ price-table prediction for the same usage. `costUsd − estCostUsd` is the
36
+ price-table drift signal, so a dashboard's drift panel now works for text too.
37
+
38
+ ### Changed
39
+
40
+ - When no provider cost is reported, `costUsd` still equals the price-table
41
+ estimate (and `estCostUsd` equals it, so no drift is flagged) — a pure fallback,
42
+ fully backward-compatible. The streaming path reads the reported cost from the
43
+ `finish` chunk's `providerMetadata`.
44
+
45
+ ## [0.6.5] — 2026-06-16
46
+
47
+ Bundled price table now covers the open-weights labs, not just the Western
48
+ proprietary makers — so `autoPrice` resolves Qwen, Kimi, MiniMax, and GLM routes
49
+ out of the box (previously they needed a hand-typed `cost`).
50
+
51
+ ### Added
52
+
53
+ - **`MODEL_PRICES` now includes the open-weights makers** — Qwen (Alibaba /
54
+ `dashscope`), Kimi (Moonshot), MiniMax, and GLM (Z.ai), alongside the existing
55
+ DeepSeek. 55 new first-party list prices (229 → 284 entries), keyed by each
56
+ maker's own bare model id (`qwen-plus`, `kimi-k2.5`, `MiniMax-M2`, `glm-4.6`,
57
+ …). The generator's `ALLOW` set gained `dashscope` / `moonshot` / `minimax` /
58
+ `zai`; no existing price changed.
59
+
60
+ ### Notes
61
+
62
+ - These are **first-party** list rates (the maker's own API). A dedicated
63
+ inference *host* (DeepInfra, …) is often cheaper and uses HF-style ids
64
+ (`Qwen/Qwen3-…`) that won't match these bare keys — for an aggregator route,
65
+ keep passing an explicit `cost` or `discount`. The bundled rate is the
66
+ `autoPrice` baseline for the maker's own provider and a reference for the rest.
67
+ - Aggregators (deepinfra, together, fireworks, groq, openrouter) remain
68
+ deliberately excluded from the table — their prices drift per-model.
69
+
7
70
  ## [0.6.4] — 2026-06-16
8
71
 
9
72
  DX improvements that eliminate per-project boilerplate for consumers.
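As a rough illustration of the 0.7.0 entry above, the drift signal `costUsd - estCostUsd` could be computed from a call record along these lines. This is a minimal sketch: the `TextCostRecord` shape and the `driftOf` helper are hypothetical names introduced for illustration (only the `costUsd` and `estCostUsd` fields come from the changelog), and the 20% threshold is an assumed example value.

```typescript
// Hypothetical record shape: only costUsd / estCostUsd are named in the changelog.
interface TextCostRecord {
  costUsd: number;      // provider-reported cost, or the estimate when none is reported
  estCostUsd?: number;  // price-table prediction; absent when the route has no price
}

interface Drift {
  absoluteUsd: number;  // costUsd - estCostUsd
  relative: number;     // absoluteUsd / estCostUsd
  flagged: boolean;     // true when |relative| exceeds the threshold
}

// Returns undefined when there is no usable estimate to compare against.
function driftOf(record: TextCostRecord, threshold = 0.2): Drift | undefined {
  const est = record.estCostUsd;
  if (est === undefined || est <= 0) return undefined;
  const absoluteUsd = record.costUsd - est;
  const relative = absoluteUsd / est;
  return { absoluteUsd, relative, flagged: Math.abs(relative) > threshold };
}

// Fallback case: no provider cost was reported, so both fields are equal.
const fallback = driftOf({ costUsd: 0.01, estCostUsd: 0.01 });
// Reported bill well above the table estimate.
const drifted = driftOf({ costUsd: 0.02, estCostUsd: 0.01 });
console.log(fallback?.flagged, drifted?.flagged);
```

In the fallback case the two fields are equal, so the drift is zero and nothing is flagged, which matches the "pure fallback" behavior described under Changed.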
package/README.md CHANGED
@@ -96,7 +96,7 @@ const lcr = createLCR({
96
96
  });
97
97
  ```
98
98
 
99
- The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
99
+ The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. `ProviderEntry` accepts `AnyLanguageModel`, a duck-typed interface (`doGenerate` + `doStream` + `provider` + `modelId`) that any AI SDK model satisfies regardless of spec version (V2 or V3), so you never need `as`-casts at the integration boundary. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
100
100
 
101
101
  ## Cheapest route for open-weights models (DeepInfra)
102
102
 
@@ -138,9 +138,50 @@ const lcr = createLCR({
138
138
 
139
139
  DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
140
140
 
141
+ ## Skip the boilerplate (`DEFAULT_PROVIDERS`)
142
+
143
+ Every project that routes through OpenRouter, DeepInfra, TokenMart, DeepSeek, etc. redeclares the same `baseURL` + `apiKeyEnv` pair. `DEFAULT_PROVIDERS` is a bundled dictionary — import what you need instead of copy-pasting URLs:
144
+
145
+ ```ts
146
+ import { DEFAULT_PROVIDERS } from "ai-lcr";
147
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
148
+
149
+ // Pick the providers you use — type-safe, no hardcoded URLs.
150
+ const deepinfra = createOpenAICompatible({
151
+ name: "deepinfra",
152
+ baseURL: DEFAULT_PROVIDERS.deepinfra.baseURL,
153
+ apiKey: process.env[DEFAULT_PROVIDERS.deepinfra.apiKeyEnv],
154
+ });
155
+ ```
156
+
157
+ Available providers:
158
+
159
+ | Key | Base URL | Env var |
160
+ |---|---|---|
161
+ | `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
162
+ | `deepinfra` | `https://api.deepinfra.com/v1/openai` | `DEEPINFRA_API_KEY` |
163
+ | `tokenmart` | `https://model.service-inference.ai/v1` | `INFERENCE_API_KEY` |
164
+ | `deepseek` | `https://api.deepseek.com` | `DEEPSEEK_API_KEY` |
165
+ | `kunavo` | `https://api.kunavo.com/v1` | `KUNAVO_API_KEY` |
166
+ | `runware` | `https://api.runware.ai/v1` | `RUNWARE_API_KEY` |
167
+ | `fal` | `https://queue.fal.run` | `FAL_KEY` |
168
+
169
+ A common pattern is to subset `DEFAULT_PROVIDERS` into a project-local type for compile-time safety:
170
+
171
+ ```ts
172
+ import { DEFAULT_PROVIDERS } from "ai-lcr";
173
+
174
+ type ProviderId = "deepinfra" | "openrouter";
175
+
176
+ export const PROVIDERS = {
177
+ deepinfra: DEFAULT_PROVIDERS.deepinfra,
178
+ openrouter: DEFAULT_PROVIDERS.openrouter,
179
+ } satisfies Record<ProviderId, { baseURL: string; apiKeyEnv: string }>;
180
+ ```
181
+
141
182
  ## Zero-config pricing (`autoPrice`)
142
183
 
143
- Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, DeepSeek, xAI, Mistral), keyed by the bare model id you already pass to the provider:
184
+ Typing `cost: { input, output }` for every provider is the tedious part. `autoPrice: true` fills any entry that has no explicit `cost` from a **bundled price table** (`MODEL_PRICES`) — official first-party rates for the native makers (OpenAI, Anthropic, Google, xAI, Mistral, plus the open-weights labs DeepSeek, Qwen, Kimi, MiniMax, GLM), keyed by the bare model id you already pass to the provider:
144
185
 
145
186
  ```ts
146
187
  const lcr = createLCR({
@@ -161,7 +202,7 @@ Three rules keep it predictable:
161
202
 
162
203
  - **Off by default.** Unpriced entries stay unpriced (the pre-existing behavior), so turning `autoPrice` on never silently re-prices a model — and an **explicit `cost` always wins** over the table.
163
204
  - **`discount` is the reseller knob.** A flat-% aggregator (Kunavo −20%) becomes `discount: 0.2` instead of a hand-typed number; it scales input, output, and `cacheRead` alike, and only applies when the table fills the entry. Variable-discount providers (TokenMart) still want explicit per-model `cost`.
164
- - **Native makers only.** The table carries first-party list prices: the cheapest, most-featureful "go direct" route. Open-weights hosts (DeepInfra) and breadth aggregators (OpenRouter) aren't in it; price those explicitly.
205
+ - **Native makers only.** The table carries first-party list prices, keyed by each maker's own bare id (`qwen-plus`, `glm-4.6`, `kimi-k2.5`, `MiniMax-M2`). It's the `autoPrice` baseline when you route through that maker's own API. Open-weights *hosts* (DeepInfra uses HF-style ids like `Qwen/Qwen3-…`) and breadth aggregators (OpenRouter) aren't keyed here; price those with an explicit `cost` or `discount`.
165
206
 
166
207
  Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is generated from [LiteLLM's price map](https://github.com/BerriAI/litellm) (MIT) — refresh with `node scripts/gen-text-prices.mjs`.
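To make the `discount` rule above concrete, the scaling it describes can be sketched as follows. `applyDiscount` and the `ModelPrice` type are hypothetical names, not the library's API; the sketch only assumes, per the bullet above, that `discount: 0.2` multiplies `input`, `output`, and `cacheRead` by `0.8`. Prices appear to be USD per million tokens, judging by the `/ 1e6` in the bundled cost helpers.

```typescript
// Hypothetical price shape mirroring the MODEL_PRICES entries (assumed USD per 1M tokens).
interface ModelPrice {
  input: number;
  output: number;
  cacheRead?: number;
}

// Scale every price component by (1 - discount); discount is a fraction in [0, 1).
function applyDiscount(price: ModelPrice, discount: number): ModelPrice {
  if (!(discount >= 0 && discount < 1)) {
    throw new RangeError(`discount must be in [0, 1), got ${discount}`);
  }
  const factor = 1 - discount;
  return {
    input: price.input * factor,
    output: price.output * factor,
    // Only carry cacheRead through when the table entry defines it.
    ...(price.cacheRead !== undefined ? { cacheRead: price.cacheRead * factor } : {}),
  };
}

// "glm-4.6" list price from the bundled table, resold at a flat 20% off.
const listPrice: ModelPrice = { input: 0.6, output: 2.2, cacheRead: 0.11 };
const resold = applyDiscount(listPrice, 0.2);
console.log(resold);
```

A variable-discount provider does not fit this shape, which is why the README recommends an explicit per-model `cost` for those routes.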
167
208
 
@@ -315,6 +356,26 @@ const lcr = createLCR({
315
356
  });
316
357
  ```
317
358
 
359
+ ### Convention-based sink (`createEnvSink`)
360
+
361
+ If your app uses the standard env vars (`LCR_INGEST_URL`, `LCR_PROJECT`, `LCR_INGEST_KEY`), you don't need to wire `createHttpSink` at all — `createEnvSink` reads them for you and returns a ready-to-use `onCall` handler (or `undefined` when `LCR_INGEST_URL` is unset, so local dev stays quiet):
362
+
363
+ ```ts
364
+ import { createEnvSink } from "ai-lcr";
365
+ import { after } from "next/server";
366
+
367
+ export const lcrCallSink = createEnvSink(after);
368
+ // → use as `onCall: lcrCallSink` in createLCR
369
+ ```
370
+
371
+ The only required argument is `dispatch` — a framework-specific fire-and-forget runner (Next.js: `after`, Cloudflare: `ctx.waitUntil`, plain servers: `(fn) => fn()`). Env vars:
372
+
373
+ | Var | Required | Description |
374
+ |---|---|---|
375
+ | `LCR_INGEST_URL` | yes (no URL → sink is `undefined`) | Dashboard origin, e.g. `https://lcr.ideamarketfit.com` |
376
+ | `LCR_PROJECT` | no | Project tag merged into each payload; falls back to `SITE_KEY` |
377
+ | `LCR_INGEST_KEY` | no | Bearer token (only if the dashboard sets `INGEST_KEY`) |
378
+
318
379
  ### The companion dashboard ([`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard))
319
380
 
320
381
  <p align="center">
package/README.zh-CN.md CHANGED
@@ -96,7 +96,7 @@ const lcr = createLCR({
96
96
  });
97
97
  ```
98
98
 
99
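- The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.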
99
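+ The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. `ProviderEntry` accepts `AnyLanguageModel`, a duck-typed interface (`doGenerate` + `doStream` + `provider` + `modelId`) that any AI SDK model satisfies whether it targets the V2 or V3 spec, so no `as` casts are needed at the integration boundary. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.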
100
100
 
101
101
  ## Cheapest route for open-weights models (DeepInfra)
102
102
 
@@ -138,6 +138,47 @@ const lcr = createLCR({
138
138
 
139
139
  DeepInfra carries open weights only, with no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
140
140
 
141
+ ## Skip the boilerplate (`DEFAULT_PROVIDERS`)
142
+
143
+ Every project that routes through OpenRouter, DeepInfra, TokenMart, DeepSeek, etc. redeclares the same `baseURL` + `apiKeyEnv` pair. `DEFAULT_PROVIDERS` is a bundled dictionary: import the entries you need instead of copy-pasting URLs:
144
+
145
+ ```ts
146
+ import { DEFAULT_PROVIDERS } from "ai-lcr";
147
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
148
+
149
+ // Pick what you need: type-safe, no hardcoded URLs.
150
+ const deepinfra = createOpenAICompatible({
151
+ name: "deepinfra",
152
+ baseURL: DEFAULT_PROVIDERS.deepinfra.baseURL,
153
+ apiKey: process.env[DEFAULT_PROVIDERS.deepinfra.apiKeyEnv],
154
+ });
155
+ ```
156
+
157
+ Available providers:
158
+
159
+ | Key | Base URL | Env var |
160
+ |---|---|---|
161
+ | `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
162
+ | `deepinfra` | `https://api.deepinfra.com/v1/openai` | `DEEPINFRA_API_KEY` |
163
+ | `tokenmart` | `https://model.service-inference.ai/v1` | `INFERENCE_API_KEY` |
164
+ | `deepseek` | `https://api.deepseek.com` | `DEEPSEEK_API_KEY` |
165
+ | `kunavo` | `https://api.kunavo.com/v1` | `KUNAVO_API_KEY` |
166
+ | `runware` | `https://api.runware.ai/v1` | `RUNWARE_API_KEY` |
167
+ | `fal` | `https://queue.fal.run` | `FAL_KEY` |
168
+
169
+ A common pattern is to take a subset of `DEFAULT_PROVIDERS` and declare a project-level type for compile-time safety:
170
+
171
+ ```ts
172
+ import { DEFAULT_PROVIDERS } from "ai-lcr";
173
+
174
+ type ProviderId = "deepinfra" | "openrouter";
175
+
176
+ export const PROVIDERS = {
177
+ deepinfra: DEFAULT_PROVIDERS.deepinfra,
178
+ openrouter: DEFAULT_PROVIDERS.openrouter,
179
+ } satisfies Record<ProviderId, { baseURL: string; apiKeyEnv: string }>;
180
+ ```
181
+
141
182
  ## How it routes
142
183
 
143
184
  1. **Cheapest first.** Providers are tried in order, so list them cheapest-first, or set `autoSort: true` to have them sorted by `cost` automatically.
@@ -221,7 +262,17 @@ interface CallRecord {
221
262
 
222
263
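  **Counting savings honestly:** `baselineKind` states which kind of baseline `baselineUsd` is. For text it is the **list price of the last fallback provider in the chain** (`"last-leg"`, deliberately not the priciest leg: prompt caching can make a provider with a lower list price cost more on cache-heavy calls, so taking the maximum would conjure "savings" out of nothing). For media it is the **model maker's official first-party price** (`"official"`, computed from the actual seconds); when no official price can be found, it degrades to the priciest route in your config (`"priciest-route"`, which is self-referential and only shows the cross-provider price spread).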
223
264
 
224
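- **Ship it to a collector:** `createHttpSink` POSTs every record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built for these records: spend vs. savings trends, per-provider failover health, media $/second and $/image, and a **price-drift panel** that flags any model@provider route whose reported bill deviates from the price table by more than ±20% (a gap of about 100× is almost always a dollars-entered-as-cents typo). It stores metadata only, never prompts or outputs.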
265
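+ **Ship it to a collector:** `createHttpSink` POSTs every record to any endpoint (on serverless, pass Next.js's `after` as `dispatch` so the request isn't cut off). If you use the standard env vars (`LCR_INGEST_URL`, `LCR_PROJECT`, `LCR_INGEST_KEY`), `createEnvSink` reads them all for you, so three lines are enough: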
266
+
267
+ ```ts
268
+ import { createEnvSink } from "ai-lcr";
269
+ import { after } from "next/server";
270
+ export const lcrCallSink = createEnvSink(after);
271
+ ```
272
+
273
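+ With `LCR_INGEST_URL` unset, the sink is `undefined`, so local dev stays quiet automatically. The only required argument is `dispatch`, a framework-specific fire-and-forget runner (Next.js: `after`; Cloudflare: `ctx.waitUntil`; long-running servers: `(fn) => fn()`).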
274
+
275
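+ The companion self-hosted dashboard [`ai-lcr-dashboard`](https://github.com/ai-lcr/ai-lcr-dashboard) (Next.js + Postgres, one-click Vercel deploy) is built for these records: spend vs. savings trends, per-provider failover health, media $/second and $/image, and a **price-drift panel** that flags any model@provider route whose reported bill deviates from the price table by more than ±20% (a gap of about 100× is almost always a dollars-entered-as-cents typo). It stores metadata only, never prompts or outputs.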
225
276
 
226
277
  ## Supported providers
227
278
 
package/dist/index.cjs CHANGED
@@ -341,6 +341,20 @@ function cacheSavingForUsage(cost, inputTokens, cacheReadTokens) {
341
341
  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
342
342
  return cached / 1e6 * (cost.input - cost.cacheRead);
343
343
  }
344
+ function reportedCost(providerMetadata, usage) {
345
+ const orUsage = providerMetadata?.openrouter?.usage;
346
+ if (orUsage) {
347
+ const upstream = orUsage.costDetails?.upstreamInferenceCost;
348
+ if (typeof upstream === "number" && upstream > 0) return upstream;
349
+ if (typeof orUsage.cost === "number") return orUsage.cost;
350
+ }
351
+ const raw = usage?.raw;
352
+ if (raw) {
353
+ const est = raw["estimated_cost"] ?? raw["cost"];
354
+ if (typeof est === "number") return est;
355
+ }
356
+ return void 0;
357
+ }
344
358
  function requestIdFrom(options) {
345
359
  const raw = options.providerOptions?.lcr?.requestId;
346
360
  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
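Restated with explicit types, the precedence that `reportedCost` applies in the hunk above looks roughly like this. The type names (`OpenRouterUsage`, `ProviderMetadataLike`, `UsageLike`) are assumptions made for this sketch; the field names and the order of the checks are taken from the compiled code.

```typescript
// Assumed shapes: only the fields that the compiled function reads.
interface OpenRouterUsage {
  cost?: unknown;
  costDetails?: { upstreamInferenceCost?: unknown };
}
interface ProviderMetadataLike {
  openrouter?: { usage?: OpenRouterUsage };
}
interface UsageLike {
  raw?: Record<string, unknown>;
}

function reportedCost(
  providerMetadata?: ProviderMetadataLike,
  usage?: UsageLike,
): number | undefined {
  const orUsage = providerMetadata?.openrouter?.usage;
  if (orUsage) {
    // 1. Real upstream / BYOK spend, but only when it is a positive number.
    const upstream = orUsage.costDetails?.upstreamInferenceCost;
    if (typeof upstream === "number" && upstream > 0) return upstream;
    // 2. Otherwise the OpenRouter credit charge (zero is accepted here).
    if (typeof orUsage.cost === "number") return orUsage.cost;
  }
  // 3. OpenAI-compatible providers: a cost field on the raw usage body.
  const raw = usage?.raw;
  if (raw) {
    const est = raw["estimated_cost"] ?? raw["cost"];
    if (typeof est === "number") return est;
  }
  // 4. Nothing reported: the caller falls back to the price-table estimate.
  return undefined;
}

const both = { openrouter: { usage: { cost: 0.004, costDetails: { upstreamInferenceCost: 0.003 } } } };
const zeroUpstream = { openrouter: { usage: { cost: 0.004, costDetails: { upstreamInferenceCost: 0 } } } };
console.log(reportedCost(both));                                           // 0.003
console.log(reportedCost(zeroUpstream));                                   // 0.004
console.log(reportedCost(undefined, { raw: { estimated_cost: 0.002 } }));  // 0.002
console.log(reportedCost(undefined, undefined));                           // undefined
```

Two details are worth noticing in the compiled code: a zero `upstreamInferenceCost` is skipped in favor of `cost`, and the raw-usage branch also accepts a plain `cost` field when `estimated_cost` is absent, which the changelog entry does not mention.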
@@ -539,12 +553,13 @@ var LcrFallbackModel = class {
539
553
  return baseline;
540
554
  }
541
555
  /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
542
- finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
556
+ finalizeOk(ctx, provider, attemptStart, usage, ttftMs, providerMetadata) {
543
557
  ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
544
558
  const inputTokens = usage?.inputTokens?.total ?? 0;
545
559
  const outputTokens = usage?.outputTokens?.total ?? 0;
546
560
  const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
547
- const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
561
+ const estCostUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : void 0;
562
+ const costUsd = reportedCost(providerMetadata, usage) ?? estCostUsd ?? 0;
548
563
  const cachedSavingUsd = provider.cost ? cacheSavingForUsage(provider.cost, inputTokens, cacheReadTokens) : 0;
549
564
  const usageMissing = inputTokens === 0 && outputTokens === 0;
550
565
  const emptyCompletion = inputTokens > 0 && outputTokens === 0;
@@ -579,6 +594,7 @@ var LcrFallbackModel = class {
579
594
  outputTokens,
580
595
  ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
581
596
  costUsd,
597
+ ...estCostUsd !== void 0 ? { estCostUsd } : {},
582
598
  ...baselineUsd !== void 0 ? { baselineUsd, baselineKind: "last-leg" } : {},
583
599
  ...cachedSavingUsd > 0 ? { cachedSavingUsd } : {},
584
600
  ...ctx.requestId ? { requestId: ctx.requestId } : {},
@@ -635,7 +651,7 @@ var LcrFallbackModel = class {
635
651
  }
636
652
  this.recordProviderSuccess(idx);
637
653
  this.settleSticky(idx);
638
- this.finalizeOk(ctx, provider, attemptStart, result.usage);
654
+ this.finalizeOk(ctx, provider, attemptStart, result.usage, void 0, result.providerMetadata);
639
655
  if (cache && cacheKey !== void 0 && ctx.settled?.cacheable) {
640
656
  this.storeCache(cacheKey, { kind: "generate", result, meta: ctx.settled.meta });
641
657
  }
@@ -767,6 +783,7 @@ var LcrFallbackModel = class {
767
783
  const servingIdx = idx;
768
784
  const servingPos = p;
769
785
  let usage;
786
+ let finishProviderMetadata;
770
787
  let contentStreamed = false;
771
788
  let ttftMs;
772
789
  const stream = new ReadableStream({
@@ -783,6 +800,7 @@ var LcrFallbackModel = class {
783
800
  if (done) break;
784
801
  if (value.type === "finish") {
785
802
  usage = value.usage;
803
+ finishProviderMetadata = value.providerMetadata;
786
804
  const out = value.usage?.outputTokens?.total ?? 0;
787
805
  const inp = value.usage?.inputTokens?.total ?? 0;
788
806
  if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
@@ -797,7 +815,7 @@ var LcrFallbackModel = class {
797
815
  }
798
816
  self.recordProviderSuccess(servingIdx);
799
817
  self.settleSticky(servingIdx);
800
- self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
818
+ self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs, finishProviderMetadata);
801
819
  controller.close();
802
820
  } catch (error) {
803
821
  self.emitError(error, servingProvider.label);
@@ -1003,6 +1021,16 @@ var MODEL_PRICES = {
1003
1021
  "gemini-gemma-2-9b-it": { input: 0.35, output: 1.05 },
1004
1022
  "gemini-pro-latest": { input: 1.25, output: 10, cacheRead: 0.125 },
1005
1023
  "gemini-robotics-er-1.5-preview": { input: 0.3, output: 2.5 },
1024
+ "glm-4-32b-0414-128k": { input: 0.1, output: 0.1 },
1025
+ "glm-4.5": { input: 0.6, output: 2.2 },
1026
+ "glm-4.5-air": { input: 0.2, output: 1.1 },
1027
+ "glm-4.5-airx": { input: 1.1, output: 4.5 },
1028
+ "glm-4.5-x": { input: 2.2, output: 8.9 },
1029
+ "glm-4.5v": { input: 0.6, output: 1.8 },
1030
+ "glm-4.6": { input: 0.6, output: 2.2, cacheRead: 0.11 },
1031
+ "glm-4.7": { input: 0.6, output: 2.2, cacheRead: 0.11 },
1032
+ "glm-5": { input: 1, output: 3.2, cacheRead: 0.2 },
1033
+ "glm-5-code": { input: 1.2, output: 5, cacheRead: 0.3 },
1006
1034
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
1007
1035
  "gpt-3.5-turbo-0125": { input: 0.5, output: 1.5 },
1008
1036
  "gpt-3.5-turbo-1106": { input: 1, output: 2 },
@@ -1117,6 +1145,18 @@ var MODEL_PRICES = {
1117
1145
  "grok-code-fast-1": { input: 0.2, output: 1.5, cacheRead: 0.02 },
1118
1146
  "grok-code-fast-1-0825": { input: 0.2, output: 1.5, cacheRead: 0.02 },
1119
1147
  "grok-vision-beta": { input: 5, output: 15 },
1148
+ "kimi-k2-0711-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1149
+ "kimi-k2-0905-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1150
+ "kimi-k2-thinking": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1151
+ "kimi-k2-thinking-turbo": { input: 1.15, output: 8, cacheRead: 0.15 },
1152
+ "kimi-k2-turbo-preview": { input: 1.15, output: 8, cacheRead: 0.15 },
1153
+ "kimi-k2.5": { input: 0.6, output: 3, cacheRead: 0.1 },
1154
+ "kimi-k2.6": { input: 0.95, output: 4, cacheRead: 0.16 },
1155
+ "kimi-latest": { input: 2, output: 5, cacheRead: 0.15 },
1156
+ "kimi-latest-128k": { input: 2, output: 5, cacheRead: 0.15 },
1157
+ "kimi-latest-32k": { input: 1, output: 3, cacheRead: 0.15 },
1158
+ "kimi-latest-8k": { input: 0.2, output: 2, cacheRead: 0.15 },
1159
+ "kimi-thinking-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1120
1160
  "labs-devstral-small-2512": { input: 0.1, output: 0.3 },
1121
1161
  "magistral-medium-1-2-2509": { input: 2, output: 5 },
1122
1162
  "magistral-medium-2506": { input: 2, output: 5 },
@@ -1125,6 +1165,12 @@ var MODEL_PRICES = {
1125
1165
  "magistral-small-1-2-2509": { input: 0.5, output: 1.5 },
1126
1166
  "magistral-small-2506": { input: 0.5, output: 1.5 },
1127
1167
  "magistral-small-latest": { input: 0.5, output: 1.5 },
1168
+ "MiniMax-M2": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1169
+ "MiniMax-M2.1": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1170
+ "MiniMax-M2.1-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
1171
+ "MiniMax-M2.5": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1172
+ "MiniMax-M2.5-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
1173
+ "MiniMax-M3": { input: 0.6, output: 2.4, cacheRead: 0.12 },
1128
1174
  "ministral-3-14b-2512": { input: 0.2, output: 0.2 },
1129
1175
  "ministral-3-3b-2512": { input: 0.1, output: 0.1 },
1130
1176
  "ministral-3-8b-2512": { input: 0.15, output: 0.15 },
@@ -1145,6 +1191,16 @@ var MODEL_PRICES = {
1145
1191
  "mistral-small-3-2-2506": { input: 0.06, output: 0.18 },
1146
1192
  "mistral-small-latest": { input: 0.06, output: 0.18 },
1147
1193
  "mistral-tiny": { input: 0.25, output: 0.25 },
1194
+ "moonshot-v1-128k": { input: 2, output: 5 },
1195
+ "moonshot-v1-128k-0430": { input: 2, output: 5 },
1196
+ "moonshot-v1-128k-vision-preview": { input: 2, output: 5 },
1197
+ "moonshot-v1-32k": { input: 1, output: 3 },
1198
+ "moonshot-v1-32k-0430": { input: 1, output: 3 },
1199
+ "moonshot-v1-32k-vision-preview": { input: 1, output: 3 },
1200
+ "moonshot-v1-8k": { input: 0.2, output: 2 },
1201
+ "moonshot-v1-8k-0430": { input: 0.2, output: 2 },
1202
+ "moonshot-v1-8k-vision-preview": { input: 0.2, output: 2 },
1203
+ "moonshot-v1-auto": { input: 2, output: 5 },
1148
1204
  "o1": { input: 15, output: 60, cacheRead: 7.5 },
1149
1205
  "o1-2024-12-17": { input: 15, output: 60, cacheRead: 7.5 },
1150
1206
  "o3": { input: 2, output: 8, cacheRead: 0.5 },
@@ -1161,7 +1217,24 @@ var MODEL_PRICES = {
1161
1217
  "open-mixtral-8x7b": { input: 0.7, output: 0.7 },
1162
1218
  "pixtral-12b-2409": { input: 0.15, output: 0.15 },
1163
1219
  "pixtral-large-2411": { input: 2, output: 6 },
1164
- "pixtral-large-latest": { input: 2, output: 6 }
1220
+ "pixtral-large-latest": { input: 2, output: 6 },
1221
+ "qwen-coder": { input: 0.3, output: 1.5 },
1222
+ "qwen-max": { input: 1.6, output: 6.4 },
1223
+ "qwen-plus": { input: 0.4, output: 1.2 },
1224
+ "qwen-plus-2025-01-25": { input: 0.4, output: 1.2 },
1225
+ "qwen-plus-2025-04-28": { input: 0.4, output: 1.2 },
1226
+ "qwen-plus-2025-07-14": { input: 0.4, output: 1.2 },
1227
+ "qwen-turbo": { input: 0.05, output: 0.2 },
1228
+ "qwen-turbo-2024-11-01": { input: 0.05, output: 0.2 },
1229
+ "qwen-turbo-2025-04-28": { input: 0.05, output: 0.2 },
1230
+ "qwen-turbo-latest": { input: 0.05, output: 0.2 },
1231
+ "qwen3-next-80b-a3b-instruct": { input: 0.15, output: 1.2 },
1232
+ "qwen3-next-80b-a3b-thinking": { input: 0.15, output: 1.2 },
1233
+ "qwen3-vl-235b-a22b-instruct": { input: 0.4, output: 1.6 },
1234
+ "qwen3-vl-235b-a22b-thinking": { input: 0.4, output: 4 },
1235
+ "qwen3-vl-32b-instruct": { input: 0.16, output: 0.64 },
1236
+ "qwen3-vl-32b-thinking": { input: 0.16, output: 2.87 },
1237
+ "qwq-plus": { input: 0.8, output: 2.4 }
1165
1238
  };
1166
1239
 
1167
1240
  // src/media-official.ts
package/dist/index.js CHANGED
@@ -287,6 +287,20 @@ function cacheSavingForUsage(cost, inputTokens, cacheReadTokens) {
287
287
  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
288
288
  return cached / 1e6 * (cost.input - cost.cacheRead);
289
289
  }
290
+ function reportedCost(providerMetadata, usage) {
291
+ const orUsage = providerMetadata?.openrouter?.usage;
292
+ if (orUsage) {
293
+ const upstream = orUsage.costDetails?.upstreamInferenceCost;
294
+ if (typeof upstream === "number" && upstream > 0) return upstream;
295
+ if (typeof orUsage.cost === "number") return orUsage.cost;
296
+ }
297
+ const raw = usage?.raw;
298
+ if (raw) {
299
+ const est = raw["estimated_cost"] ?? raw["cost"];
300
+ if (typeof est === "number") return est;
301
+ }
302
+ return void 0;
303
+ }
290
304
  function requestIdFrom(options) {
291
305
  const raw = options.providerOptions?.lcr?.requestId;
292
306
  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
@@ -485,12 +499,13 @@ var LcrFallbackModel = class {
485
499
  return baseline;
486
500
  }
487
501
  /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
488
- finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
502
+ finalizeOk(ctx, provider, attemptStart, usage, ttftMs, providerMetadata) {
489
503
  ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
490
504
  const inputTokens = usage?.inputTokens?.total ?? 0;
491
505
  const outputTokens = usage?.outputTokens?.total ?? 0;
492
506
  const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
493
- const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
507
+ const estCostUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : void 0;
508
+ const costUsd = reportedCost(providerMetadata, usage) ?? estCostUsd ?? 0;
494
509
  const cachedSavingUsd = provider.cost ? cacheSavingForUsage(provider.cost, inputTokens, cacheReadTokens) : 0;
495
510
  const usageMissing = inputTokens === 0 && outputTokens === 0;
496
511
  const emptyCompletion = inputTokens > 0 && outputTokens === 0;
@@ -525,6 +540,7 @@ var LcrFallbackModel = class {
525
540
  outputTokens,
526
541
  ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
527
542
  costUsd,
543
+ ...estCostUsd !== void 0 ? { estCostUsd } : {},
528
544
  ...baselineUsd !== void 0 ? { baselineUsd, baselineKind: "last-leg" } : {},
529
545
  ...cachedSavingUsd > 0 ? { cachedSavingUsd } : {},
530
546
  ...ctx.requestId ? { requestId: ctx.requestId } : {},
@@ -581,7 +597,7 @@ var LcrFallbackModel = class {
581
597
  }
582
598
  this.recordProviderSuccess(idx);
583
599
  this.settleSticky(idx);
584
- this.finalizeOk(ctx, provider, attemptStart, result.usage);
600
+ this.finalizeOk(ctx, provider, attemptStart, result.usage, void 0, result.providerMetadata);
585
601
  if (cache && cacheKey !== void 0 && ctx.settled?.cacheable) {
586
602
  this.storeCache(cacheKey, { kind: "generate", result, meta: ctx.settled.meta });
587
603
  }
@@ -713,6 +729,7 @@ var LcrFallbackModel = class {
713
729
  const servingIdx = idx;
714
730
  const servingPos = p;
715
731
  let usage;
732
+ let finishProviderMetadata;
716
733
  let contentStreamed = false;
717
734
  let ttftMs;
718
735
  const stream = new ReadableStream({
@@ -729,6 +746,7 @@ var LcrFallbackModel = class {
729
746
  if (done) break;
730
747
  if (value.type === "finish") {
731
748
  usage = value.usage;
749
+ finishProviderMetadata = value.providerMetadata;
732
750
  const out = value.usage?.outputTokens?.total ?? 0;
733
751
  const inp = value.usage?.inputTokens?.total ?? 0;
734
752
  if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
@@ -743,7 +761,7 @@ var LcrFallbackModel = class {
743
761
  }
744
762
  self.recordProviderSuccess(servingIdx);
745
763
  self.settleSticky(servingIdx);
746
- self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
764
+ self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs, finishProviderMetadata);
747
765
  controller.close();
748
766
  } catch (error) {
749
767
  self.emitError(error, servingProvider.label);
@@ -949,6 +967,16 @@ var MODEL_PRICES = {
949
967
  "gemini-gemma-2-9b-it": { input: 0.35, output: 1.05 },
950
968
  "gemini-pro-latest": { input: 1.25, output: 10, cacheRead: 0.125 },
951
969
  "gemini-robotics-er-1.5-preview": { input: 0.3, output: 2.5 },
970
+ "glm-4-32b-0414-128k": { input: 0.1, output: 0.1 },
971
+ "glm-4.5": { input: 0.6, output: 2.2 },
972
+ "glm-4.5-air": { input: 0.2, output: 1.1 },
973
+ "glm-4.5-airx": { input: 1.1, output: 4.5 },
974
+ "glm-4.5-x": { input: 2.2, output: 8.9 },
975
+ "glm-4.5v": { input: 0.6, output: 1.8 },
976
+ "glm-4.6": { input: 0.6, output: 2.2, cacheRead: 0.11 },
977
+ "glm-4.7": { input: 0.6, output: 2.2, cacheRead: 0.11 },
978
+ "glm-5": { input: 1, output: 3.2, cacheRead: 0.2 },
979
+ "glm-5-code": { input: 1.2, output: 5, cacheRead: 0.3 },
952
980
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
953
981
  "gpt-3.5-turbo-0125": { input: 0.5, output: 1.5 },
954
982
  "gpt-3.5-turbo-1106": { input: 1, output: 2 },
@@ -1063,6 +1091,18 @@ var MODEL_PRICES = {
1063
1091
  "grok-code-fast-1": { input: 0.2, output: 1.5, cacheRead: 0.02 },
1064
1092
  "grok-code-fast-1-0825": { input: 0.2, output: 1.5, cacheRead: 0.02 },
1065
1093
  "grok-vision-beta": { input: 5, output: 15 },
1094
+ "kimi-k2-0711-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1095
+ "kimi-k2-0905-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1096
+ "kimi-k2-thinking": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1097
+ "kimi-k2-thinking-turbo": { input: 1.15, output: 8, cacheRead: 0.15 },
1098
+ "kimi-k2-turbo-preview": { input: 1.15, output: 8, cacheRead: 0.15 },
1099
+ "kimi-k2.5": { input: 0.6, output: 3, cacheRead: 0.1 },
1100
+ "kimi-k2.6": { input: 0.95, output: 4, cacheRead: 0.16 },
1101
+ "kimi-latest": { input: 2, output: 5, cacheRead: 0.15 },
1102
+ "kimi-latest-128k": { input: 2, output: 5, cacheRead: 0.15 },
1103
+ "kimi-latest-32k": { input: 1, output: 3, cacheRead: 0.15 },
1104
+ "kimi-latest-8k": { input: 0.2, output: 2, cacheRead: 0.15 },
1105
+ "kimi-thinking-preview": { input: 0.6, output: 2.5, cacheRead: 0.15 },
1066
1106
  "labs-devstral-small-2512": { input: 0.1, output: 0.3 },
1067
1107
  "magistral-medium-1-2-2509": { input: 2, output: 5 },
1068
1108
  "magistral-medium-2506": { input: 2, output: 5 },
@@ -1071,6 +1111,12 @@ var MODEL_PRICES = {
1071
1111
  "magistral-small-1-2-2509": { input: 0.5, output: 1.5 },
1072
1112
  "magistral-small-2506": { input: 0.5, output: 1.5 },
1073
1113
  "magistral-small-latest": { input: 0.5, output: 1.5 },
1114
+ "MiniMax-M2": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1115
+ "MiniMax-M2.1": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1116
+ "MiniMax-M2.1-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
1117
+ "MiniMax-M2.5": { input: 0.3, output: 1.2, cacheRead: 0.03 },
1118
+ "MiniMax-M2.5-lightning": { input: 0.3, output: 2.4, cacheRead: 0.03 },
1119
+ "MiniMax-M3": { input: 0.6, output: 2.4, cacheRead: 0.12 },
1074
1120
  "ministral-3-14b-2512": { input: 0.2, output: 0.2 },
1075
1121
  "ministral-3-3b-2512": { input: 0.1, output: 0.1 },
1076
1122
  "ministral-3-8b-2512": { input: 0.15, output: 0.15 },
@@ -1091,6 +1137,16 @@ var MODEL_PRICES = {
1091
1137
  "mistral-small-3-2-2506": { input: 0.06, output: 0.18 },
1092
1138
  "mistral-small-latest": { input: 0.06, output: 0.18 },
1093
1139
  "mistral-tiny": { input: 0.25, output: 0.25 },
1140
+ "moonshot-v1-128k": { input: 2, output: 5 },
1141
+ "moonshot-v1-128k-0430": { input: 2, output: 5 },
1142
+ "moonshot-v1-128k-vision-preview": { input: 2, output: 5 },
1143
+ "moonshot-v1-32k": { input: 1, output: 3 },
1144
+ "moonshot-v1-32k-0430": { input: 1, output: 3 },
1145
+ "moonshot-v1-32k-vision-preview": { input: 1, output: 3 },
1146
+ "moonshot-v1-8k": { input: 0.2, output: 2 },
1147
+ "moonshot-v1-8k-0430": { input: 0.2, output: 2 },
1148
+ "moonshot-v1-8k-vision-preview": { input: 0.2, output: 2 },
1149
+ "moonshot-v1-auto": { input: 2, output: 5 },
1094
1150
  "o1": { input: 15, output: 60, cacheRead: 7.5 },
1095
1151
  "o1-2024-12-17": { input: 15, output: 60, cacheRead: 7.5 },
1096
1152
  "o3": { input: 2, output: 8, cacheRead: 0.5 },
@@ -1107,7 +1163,24 @@ var MODEL_PRICES = {
1107
1163
  "open-mixtral-8x7b": { input: 0.7, output: 0.7 },
1108
1164
  "pixtral-12b-2409": { input: 0.15, output: 0.15 },
1109
1165
  "pixtral-large-2411": { input: 2, output: 6 },
1110
- "pixtral-large-latest": { input: 2, output: 6 }
1166
+ "pixtral-large-latest": { input: 2, output: 6 },
1167
+ "qwen-coder": { input: 0.3, output: 1.5 },
1168
+ "qwen-max": { input: 1.6, output: 6.4 },
1169
+ "qwen-plus": { input: 0.4, output: 1.2 },
1170
+ "qwen-plus-2025-01-25": { input: 0.4, output: 1.2 },
1171
+ "qwen-plus-2025-04-28": { input: 0.4, output: 1.2 },
1172
+ "qwen-plus-2025-07-14": { input: 0.4, output: 1.2 },
1173
+ "qwen-turbo": { input: 0.05, output: 0.2 },
1174
+ "qwen-turbo-2024-11-01": { input: 0.05, output: 0.2 },
1175
+ "qwen-turbo-2025-04-28": { input: 0.05, output: 0.2 },
1176
+ "qwen-turbo-latest": { input: 0.05, output: 0.2 },
1177
+ "qwen3-next-80b-a3b-instruct": { input: 0.15, output: 1.2 },
1178
+ "qwen3-next-80b-a3b-thinking": { input: 0.15, output: 1.2 },
1179
+ "qwen3-vl-235b-a22b-instruct": { input: 0.4, output: 1.6 },
1180
+ "qwen3-vl-235b-a22b-thinking": { input: 0.4, output: 4 },
1181
+ "qwen3-vl-32b-instruct": { input: 0.16, output: 0.64 },
1182
+ "qwen3-vl-32b-thinking": { input: 0.16, output: 2.87 },
1183
+ "qwq-plus": { input: 0.8, output: 2.4 }
1111
1184
  };
1112
1185
 
1113
1186
  // src/media-official.ts
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ai-lcr",
3
- "version": "0.6.4",
3
+ "version": "0.7.0",
4
4
  "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
5
5
  "keywords": [
6
6
  "ai",