ai-lcr 0.0.1 → 0.2.0

package/README.md CHANGED
@@ -51,7 +51,7 @@ const lcr = createLCR({
51
51
  models: {
52
52
  // One logical model, served cheapest-first across providers.
53
53
  "gemini-3-flash": [
54
- { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
55
55
  { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
56
  ],
57
57
  },
@@ -67,56 +67,155 @@ const { text } = await generateText({
67
67
 
68
68
  `cost` and `label` are optional — pass bare models (`kunavo("gemini-3-flash")`) if you don't need cost accounting or `autoSort`. `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tools, and agents.
69
69
 
70
+ ## Route to a model vendor's own API (native providers)
71
+
72
+ A "provider" doesn't have to be an aggregator. A model vendor's **own official API** is just another entry in the list — often the cheapest, since there's no aggregator markup, and the least likely to silently break native features (prompt caching, tool calls). Any AI SDK provider package returns a standard model, so a vendor's native API and an OpenAI-compatible aggregator sit side by side in the same list:
73
+
74
+ ```ts
75
+ import { createLCR } from "ai-lcr";
76
+ import { createDeepSeek } from "@ai-sdk/deepseek"; // DeepSeek's own API
77
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
78
+
79
+ const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
80
+ const openrouter = createOpenAICompatible({
81
+ name: "openrouter",
82
+ baseURL: "https://openrouter.ai/api/v1",
83
+ apiKey: process.env.OPENROUTER_API_KEY,
84
+ });
85
+
86
+ const lcr = createLCR({
87
+ autoSort: true,
88
+ models: {
89
+ "deepseek-v4": [
90
+ // Official API first — no markup, full native features (caching, off-peak discounts).
91
+ { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
92
+ // Aggregator as a fallback for uptime + breadth.
93
+ { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
94
+ ],
95
+ },
96
+ });
97
+ ```
98
+
99
+ The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
100
+
70
101
  ## How it routes
71
102
 
72
103
  1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
73
- 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider, streaming-safe. Hard errors (400, 401, 403, 422) pass through immediately.
104
+ 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout, or a **billing cap**: 402 / out-of-credit / quota), it advances to the next provider, streaming-safe. A caller's own bad request (e.g. 400, 422) passes through immediately.
74
105
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
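
The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not `ai-lcr`'s actual implementation: `Provider`, `sortCheapestFirst`, `fallsThrough`, and `route` are hypothetical names, providers are modeled as synchronous functions that return an HTTP status for brevity, and ranking by `input + output` price is an assumption. Idle-window recovery (`resetIntervalMs`) is left out.

```typescript
// Hypothetical sketch of cheapest-first routing with fall-through.
// None of these names are part of ai-lcr's API.
type Cost = { input: number; output: number }; // USD per 1M tokens

interface Provider {
  label: string;
  cost: Cost;
  // Stand-in for a real model call: returns the HTTP status it ended with.
  call: () => number;
}

// Assumption: rank by the sum of input and output price. Array.prototype.sort
// is stable, so equally priced providers keep their listed order.
function sortCheapestFirst(providers: Provider[]): Provider[] {
  return [...providers].sort(
    (a, b) => a.cost.input + a.cost.output - (b.cost.input + b.cost.output),
  );
}

// The caller's own bad request (400, 422) passes through immediately.
// Assumption: every other failure (429, 402, 5xx, ...) advances to the next provider.
function fallsThrough(status: number): boolean {
  return status !== 400 && status !== 422;
}

function route(providers: Provider[]): { winner?: string; tried: string[] } {
  const tried: string[] = [];
  for (const provider of sortCheapestFirst(providers)) {
    tried.push(provider.label);
    const status = provider.call();
    if (status >= 200 && status < 300) return { winner: provider.label, tried };
    if (!fallsThrough(status)) break; // hard error: stop, do not try the rest
  }
  return { tried };
}

// The cheaper provider is rate-limited, so the request lands on the second one.
const result = route([
  { label: "openrouter", cost: { input: 0.5, output: 3.0 }, call: () => 200 },
  { label: "kunavo", cost: { input: 0.4, output: 2.4 }, call: () => 429 },
]);
console.log(result.tried.join(" -> "), "winner:", result.winner);
```

Sorting a copy rather than the caller's array keeps the configured order intact, which matters if `autoSort` is off and the listed order is the intended one.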
75
106
 
76
107
  <p align="center">
77
108
  <img src="assets/ai-lcr-routing.svg" alt="routing diagram: cheapest first, fallback on failure, recover after idle" width="820">
78
109
  </p>
79
110
 
111
+ ## See what happened (`onCall`)
112
+
113
+ `onError` and `onCost` fire separately and are not correlated with each other, so a failover is hard to reconstruct after the fact. `onCall` gives you **one record per request** (the full chain, the winner, the reason for each failed hop, latency, and cost), and `formatCallRecord` turns it into a one-liner you can scan:
114
+
115
+ ```ts
116
+ import { createLCR, formatCallRecord } from "ai-lcr";
117
+
118
+ const lcr = createLCR({
119
+ models: { /* … */ },
120
+ onCall: (record) => console.log(formatCallRecord(record)),
121
+ });
122
+ ```
123
+
124
+ ```text
125
+ ✓ text tokenmart 412ms $0.0003
126
+ ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
127
+ ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
128
+ ```
129
+
130
+ `✓` served on the first try · `⚠` failed over but recovered · `✗` every provider failed. The `⤷` shows which provider died and why.
131
+
132
+ **Persist it anywhere — zero lock-in.** `record` is a plain `CallRecord` object. Log the JSON and point any log drain at it (Axiom, Datadog, your own DB); ai-lcr never decides where it goes:
133
+
134
+ ```ts
135
+ onCall: (record) => console.log(JSON.stringify(record)),
136
+ ```
137
+
138
+ Or ship each record to an HTTP collector with the built-in `createHttpSink` (fire-and-forget, never throws, dashboard-agnostic):
139
+
140
+ ```ts
141
+ import { createLCR, createHttpSink } from "ai-lcr";
142
+ import { after } from "next/server"; // serverless: don't block the response
143
+
144
+ const lcr = createLCR({
145
+ models: { /* … */ },
146
+ onCall: createHttpSink({
147
+ url: `${process.env.LCR_INGEST_URL}/api/ingest`,
148
+ headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
149
+ project: process.env.LCR_PROJECT, // optional tag if one collector serves several apps
150
+ dispatch: after, // run after the response is sent (serverless-safe)
151
+ }),
152
+ });
153
+ ```
154
+
155
+ Point `url` at anything that accepts the `CallRecord` JSON — including the self-hostable companion dashboard, **[ai-lcr-dashboard](https://github.com/victorzhrn/ai-lcr-dashboard)** (Spend / Calls / Failover rate + a live failover feed). You run your own instance, so the data never leaves your infrastructure; a [db9](https://db9.ai) database can be provisioned in seconds if you don't want to stand one up yourself.
156
+
157
+ ```ts
158
+ interface CallRecord {
159
+ id: string; // correlation id, one per request
160
+ model: string; // logical model name
161
+ attempts: { provider: string; ok: boolean; latencyMs: number; errorClass?: string }[];
162
+ winner?: string; // provider that served; undefined if all failed
163
+ ok: boolean;
164
+ failedOver: boolean; // more than one provider was tried
165
+ latencyMs: number;
166
+ inputTokens: number;
167
+ outputTokens: number;
168
+ costUsd: number; // what the winner charged for these tokens
169
+ baselineUsd: number; // what the priciest configured route would cost → savings = baselineUsd - costUsd
170
+ }
171
+ ```
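
Because `record` is plain data, dashboard-style numbers can be derived from it directly. The sketch below is illustrative only: it repeats the `CallRecord` shape so that it is self-contained, and `statusSymbol` and `summarize` are hypothetical helpers, not exports of `ai-lcr`. The symbol mapping follows the `✓` / `⚠` / `✗` legend, and savings follow the `baselineUsd - costUsd` comment in the interface.

```typescript
// Same shape as the CallRecord interface documented in the README.
interface CallRecord {
  id: string;
  model: string;
  attempts: { provider: string; ok: boolean; latencyMs: number; errorClass?: string }[];
  winner?: string;
  ok: boolean;
  failedOver: boolean;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  baselineUsd: number;
}

// Hypothetical helper: served first try, failed over but recovered, or all failed.
function statusSymbol(record: CallRecord): "✓" | "⚠" | "✗" {
  if (!record.ok) return "✗";
  return record.failedOver ? "⚠" : "✓";
}

// Hypothetical helper: aggregate spend, savings, and failover rate over records.
function summarize(records: CallRecord[]) {
  let spendUsd = 0;
  let baselineUsd = 0;
  let failovers = 0;
  for (const r of records) {
    spendUsd += r.costUsd;
    baselineUsd += r.baselineUsd;
    if (r.failedOver) failovers += 1;
  }
  return {
    calls: records.length,
    spendUsd,
    savingsUsd: baselineUsd - spendUsd, // savings = baselineUsd - costUsd, summed
    failoverRate: records.length === 0 ? 0 : failovers / records.length,
  };
}
```

A collector could run `summarize` over whatever window of records it has stored; nothing here depends on where the records were persisted.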
172
+
80
173
  ## Supported providers
81
174
 
82
- Any OpenAI-compatible endpoint works.
175
+ Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
83
176
 
84
- - **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
85
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) routing on the roadmap
177
+ - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
178
+ - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
179
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — image routing available via `createMediaLCR` (Kunavo + Runware adapters); video on the roadmap
86
180
 
87
181
  ## Text model pricing
88
182
 
89
- USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
183
+ USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 20% off the official rate. TokenMart prices vary by model (15–65% off list) — verify current rates at [thetokenmart.ai](https://thetokenmart.ai).
184
+
185
+ | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
186
+ |---|---|---|---|---|---|
187
+ | Gemini 3 Flash | $0.50 / $3.00 | no discount | −20% | — | ⭐ Kunavo |
188
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −20% | −20% → **$2.40 / $9.60** | ⭐ Kunavo |
189
+ | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −20% | — | ⭐ Kunavo |
190
+ | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −20% | — | ⭐ Kunavo |
191
+ | Claude Opus 4.7 | $15.00 / $75.00 | no discount | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
192
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
193
+ | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −20% | — | ⭐ Kunavo |
194
+ | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | — | ⭐ DeepSeek (official) |
90
195
 
91
- | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
92
- |---|---|---|---|---|
93
- | Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
94
- | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
95
- | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
96
- | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
97
- | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
98
- | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
99
- | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
196
+ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to their **own official APIs** (cheapest, full native features) with OpenRouter as a broad fallback — one config can mix native vendors and aggregators.
100
197
 
101
- Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter; one config can mix them all.
198
+ > **Note:** List price ≠ effective price, so always verify with the [probe](#vetting-a-provider-capability--cost-probe). As of 2026-05-28, Kunavo token counts are clean for both Gemini (~1.1–1.4×) and Claude (~1.0×). Remaining caveats: `max_tokens` is still ignored on both models, and hidden-prompt injection appears intermittently for Claude, so re-probe before routing in production. Effective cost is why `ai-lcr` should rank by measured behavior, not the sticker price.
199
+
200
+ > **Note:** TokenMart token counts are also verified clean (same backend as Inference.ai, all checks passed 2026-05-27: tool calls, `max_tokens`, no injection, token ~1.0×, prompt caching) — a reliable second provider for Claude at −15% list. Re-probe before routing in production.
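
To make "list price ≠ effective price" concrete, the sketch below scales a list price by the token-count ratio measured against a trusted baseline. It assumes, as a simplification, that the bill grows linearly with the reported token count; `effectivePrice` and `breakEvenRatio` are hypothetical helpers, not part of `ai-lcr`.

```typescript
// USD per 1M tokens effectively paid, if the provider reports `tokenRatio`
// times as many tokens as a trusted baseline for the same prompt.
// Assumption: billing scales linearly with the reported token count.
function effectivePrice(listPrice: number, tokenRatio: number): number {
  return listPrice * tokenRatio;
}

// Token ratio at which a discounted provider costs the same as the official rate.
function breakEvenRatio(discount: number): number {
  return 1 / (1 - discount);
}

// Gemini 3 Flash input price from the table: official $0.50, 20% off gives $0.40.
const officialInput = 0.5;
const discountedInput = 0.4;
for (const ratio of [1.0, 1.1, 1.4]) {
  const effective = effectivePrice(discountedInput, ratio);
  const verdict = effective < officialInput ? "cheaper" : "not cheaper";
  console.log(`ratio ${ratio}: $${effective.toFixed(2)} per 1M input tokens (${verdict})`);
}
console.log("break-even ratio at 20% off:", breakEvenRatio(0.2).toFixed(2));
```

Under this linear assumption a 20% discount is used up once the token ratio passes 1.25×, which is the reason to compare measured ratios rather than sticker prices.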
102
201
 
103
202
  ## Image model pricing
104
203
 
105
- USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 30% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
106
-
107
- | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
108
- |---|---|---|---|---|
109
- | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
- | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
- | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
- | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
- | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
- | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
- | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
- | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
- | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
- | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
- | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
204
+ USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 20% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
205
+
206
+ | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
207
+ |---|---|---|---|---|---|
208
+ | Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
209
+ | Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
210
+ | GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
211
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
212
+ | Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
213
+ | Seedream 4 | $0.030 | — | — | — | ⭐ fal |
214
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
215
+ | Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
216
+ | Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
217
+ | Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
218
+ | FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
120
219
 
121
220
  ## Video model pricing
122
221
 
@@ -134,19 +233,71 @@ USD per second, as of 2026-05 — verify current rates. Video billing differs by
134
233
  | Seedance Pro | $0.124 |
135
234
  | Veo 3.1 (audio-on) | $0.400 |
136
235
 
236
+ ## Vetting a provider (capability + cost probe)
237
+
238
+ A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
239
+
240
+ - **tool calling** — single call and a multi-step round-trip with `content: null` (the shape every agent loop sends)
241
+ - **`max_tokens` honored** — caps must bound output
242
+ - **hidden-prompt injection** — sends a neutral message; flags the provider if the model starts reacting to a system prompt it was never given
243
+ - **token over-counting** — compares reported `prompt_tokens` against a trusted baseline provider; >1.5× means the bill is inflated and the "discount" may be a loss
244
+ - **prompt caching** — whether `cache_control` actually produces a `cache_read` on repeats
245
+
246
+ ```bash
247
+ # point it at the provider you're vetting; models are generic numbered slots
248
+ # (works for Gemini, Claude, GPT, Llama, …). Add a per-model REF_n on a trusted
249
+ # baseline (e.g. OpenRouter) to enable the token-inflation check. CACHE_MODEL
250
+ # (optional) runs the Anthropic-native /v1/messages prompt-caching test.
251
+ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
252
+ MODEL_1=gemini-3-flash REF_1=google/gemini-3-flash-preview \
253
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
254
+ CACHE_MODEL=claude-sonnet-4-6 \
255
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
256
+ bash scripts/check-provider.sh
257
+
258
+ # TokenMart uses vendor-prefixed model IDs
259
+ API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
260
+ MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
261
+ MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
262
+ CACHE_MODEL=anthropic/claude-sonnet-4-6 \
263
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
264
+ bash scripts/check-provider.sh
265
+ ```
266
+
267
+ A `FAIL` on injection or token over-counting means that provider is **not** a safe least-cost target for that model — keep it off that model's cheapest-first list until it's fixed, then re-probe.
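
Two of the checks reduce to simple comparisons on the `usage` object of an OpenAI-compatible response. The sketch below is not the logic of `scripts/check-provider.sh`; `checkTokenCount` and `checkMaxTokens` are hypothetical names, and only the 1.5× threshold comes from the list above.

```typescript
// Token usage as reported in an OpenAI-compatible chat completion response.
type Usage = { prompt_tokens: number; completion_tokens: number };
type Verdict = "PASS" | "FAIL";

// Token over-counting: the same prompt is sent to the candidate and to a
// trusted baseline; a ratio above the threshold suggests an inflated bill.
function checkTokenCount(candidate: Usage, baseline: Usage, threshold = 1.5): Verdict {
  const ratio = candidate.prompt_tokens / baseline.prompt_tokens;
  return ratio > threshold ? "FAIL" : "PASS";
}

// max_tokens honored: the reported completion length must not exceed the cap.
function checkMaxTokens(usage: Usage, maxTokens: number): Verdict {
  return usage.completion_tokens <= maxTokens ? "PASS" : "FAIL";
}

// Illustrative numbers only, not measured results.
const baselineUsage: Usage = { prompt_tokens: 100, completion_tokens: 16 };
const candidateUsage: Usage = { prompt_tokens: 140, completion_tokens: 64 };
console.log("token count:", checkTokenCount(candidateUsage, baselineUsage));
console.log("max_tokens=16:", checkMaxTokens(candidateUsage, 16));
```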
268
+
269
+ ### Trust matrix (probed 2026-05-27)
270
+
271
+ Two OpenAI-compatible providers, same probe, same day. Cells cover both families (G = Gemini, C = Claude).
272
+
273
+ | Check | Kunavo | [TokenMart](https://thetokenmart.ai) |
274
+ |---|---|---|
275
+ | Tool calls (single + multi-step `content: null`) | G ⚠️ intermittent¹ · C ✅ | ✅ both |
276
+ | Token count vs OpenRouter baseline | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ both ~1.0× |
277
+ | Hidden-prompt injection | G ✅ none · C ❌ intermittent² | ✅ none |
278
+ | `max_tokens` honored | ❌ ignored (both) | ✅ both |
279
+ | Prompt caching (`cache_control`) | C ❌ not applied (endpoint also hung mid-probe) | C ✅ `cache_read` > 0 |
280
+
281
+ ¹ Kunavo Gemini returned a clean tool call on one run and **dropped tools entirely** on the next identical request — not a stable pass.
282
+ ² Kunavo Claude reacted to a phantom "fake system prompt" on one run and stayed clean on another — the injection is intermittent, not removed.
283
+
284
+ **Verdict:** TokenMart passes every check on both Gemini and Claude with stable, repeatable results — route freely. Kunavo: token counts are now clean for Claude (re-probed 2026-05-28); at −20% list, Kunavo is the cheapest option for Claude. Remaining caveats: `max_tokens` is ignored on both models, hidden-prompt injection appears intermittently for Claude, and Gemini drops tool calls intermittently — re-probe before routing a new model in production.
285
+
137
286
  ## Roadmap
138
287
 
139
288
  - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
140
289
  - [x] Real per-call cost accounting (`onCost`)
290
+ - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
141
291
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
292
+ - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
142
293
  - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
143
- - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
144
- - [ ] Offline capability probe (tool-calling / caching / streaming) trust matrix
294
+ - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
295
+ - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
145
296
  - [ ] Image & video model routing (fal.ai / Runware / Kunavo)
146
297
 
147
298
  ## Affiliate disclosure
148
299
 
149
- `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**, which — at 30% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
300
+ `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=victorimf)**, which — at 20% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
150
301
 
151
302
  ## Development
152
303
 
package/README.zh-CN.md CHANGED
@@ -51,7 +51,7 @@ const lcr = createLCR({
51
51
  models: {
52
52
  // One logical model, served cheapest-first across providers.
53
53
  "gemini-3-flash": [
54
- { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
55
55
  { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
56
  ],
57
57
  },
@@ -67,6 +67,37 @@ const { text } = await generateText({
67
67
 
68
68
  `cost` and `label` are optional: pass bare models (`kunavo("gemini-3-flash")`) if you don't need cost accounting or `autoSort`. `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tools, and agents.
69
69
 
70
+ ## Route to a model vendor's own API (native providers)
71
+
72
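+ A "provider" doesn't have to be an aggregator. A model vendor's **own official API** is just another entry in the list, often the cheapest one (no aggregator markup) and the least likely to silently break native features (prompt caching, tool calls). Any AI SDK provider package returns a standard model, so a vendor's native API and an OpenAI-compatible aggregator can sit side by side in the same list: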
73
+
74
+ ```ts
75
+ import { createLCR } from "ai-lcr";
76
+ import { createDeepSeek } from "@ai-sdk/deepseek"; // DeepSeek's official API
77
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
78
+
79
+ const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
80
+ const openrouter = createOpenAICompatible({
81
+ name: "openrouter",
82
+ baseURL: "https://openrouter.ai/api/v1",
83
+ apiKey: process.env.OPENROUTER_API_KEY,
84
+ });
85
+
86
+ const lcr = createLCR({
87
+ autoSort: true,
88
+ models: {
89
+ "deepseek-v4": [
90
+ // Official API first: no markup, full native features (caching, off-peak discounts).
91
+ { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
92
+ // Aggregator as a fallback for uptime + breadth.
93
+ { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
94
+ ],
95
+ },
96
+ });
97
+ ```
98
+
99
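+ The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a vendor's native API with aggregators in one model's list. Native APIs are narrow (only that vendor's own models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the most natural LCR shape.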
100
+
70
101
  ## How it routes
71
102
 
72
103
  1. **Cheapest first.** Providers are tried in order: list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
@@ -79,44 +110,50 @@ const { text } = await generateText({
79
110
 
80
111
  ## Supported providers
81
112
 
82
- Any OpenAI-compatible endpoint works.
113
+ Any OpenAI-compatible endpoint works, and so does any AI SDK provider package, including a model vendor's own official API.
83
114
 
84
- - **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
85
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai); routing is on the roadmap
115
+ - **Model vendors' own APIs (native):** connect directly to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages: no markup, full native features. See the "Route to a model vendor's own API (native providers)" section above.
116
+ - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
117
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai); routing is on the roadmap
86
118
 
87
119
  ## Text model pricing
88
120
 
89
- USD per 1M tokens, input / output. Official rates as of 2026-05; verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
121
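+ USD per 1M tokens, input / output. Official rates as of 2026-05; verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 20% off the official rate. TokenMart discounts vary by model (15–65% off); verify current rates at [thetokenmart.ai](https://thetokenmart.ai).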
122
+
123
+ | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
124
+ |---|---|---|---|---|---|
125
+ | Gemini 3 Flash | $0.50 / $3.00 | no discount | −20% | — | ⭐ Kunavo |
126
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −20% | −20% → **$2.40 / $9.60** | ⭐ Kunavo |
127
+ | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −20% | — | ⭐ Kunavo |
128
+ | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −20% | — | ⭐ Kunavo |
129
+ | Claude Opus 4.7 | $15.00 / $75.00 | no discount | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
130
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
131
+ | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −20% | — | ⭐ Kunavo |
132
+ | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | — | ⭐ DeepSeek (official) |
133
+
134
+ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to their **own official APIs** (cheapest, full native features) with OpenRouter as a broad fallback; one config can mix native vendors and aggregators.
90
135
 
91
- | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
92
- |---|---|---|---|---|
93
- | Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
94
- | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
95
- | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
96
- | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
97
- | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
98
- | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
99
- | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
136
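+ > **Note:** List price ≠ effective price; always verify with the [probe](#给-provider-做体检能力--成本探测). As of 2026-05-28, Kunavo token counts are clean on both Gemini (~1.1–1.4×) and Claude (~1.0×). Remaining issues: both models ignore `max_tokens`, and hidden-prompt injection on Claude is still intermittent. Re-probe before routing in production.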
100
137
 
101
- Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter; one config can mix them all.
138
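+ > **Note:** TokenMart token counts are likewise probe-verified clean (same backend as Inference.ai; all checks passed 2026-05-27: tool calls, `max_tokens`, no injection, token ~1.0×, prompt caching). If you need a second provider for Claude, TokenMart is a reliable option. Re-probe to confirm before routing in production.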
102
139
 
103
140
  ## Image model pricing
104
141
 
105
- USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 30% off official. fal and Runware are compute providers; `ai-lcr` picks the cheapest per model (⭐).
106
-
107
- | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
108
- |---|---|---|---|---|
109
- | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
- | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
- | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
- | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
- | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
- | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
- | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
- | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
- | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
- | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
- | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
142
+ USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 20% off official. fal and Runware are compute providers; `ai-lcr` picks the cheapest per model (⭐).
143
+
144
+ | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
145
+ |---|---|---|---|---|---|
146
+ | Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
147
+ | Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
148
+ | GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
149
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
150
+ | Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
151
+ | Seedream 4 | $0.030 | — | — | — | ⭐ fal |
152
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
153
+ | Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
154
+ | Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
155
+ | Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
156
+ | FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
120
157
 
121
158
  ## Video model pricing
122
159
 
@@ -134,19 +171,69 @@ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to
134
171
  | Seedance Pro | $0.124 |
135
172
  | Veo 3.1 (audio-on) | $0.400 |
136
173
 
174
+ ## Vetting a provider (capability + cost probe)
175
+
176
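+ No discount is worth anything if the provider quietly breaks the protocol. `ai-lcr` ships a zero-dependency check script (`scripts/check-provider.sh`, needing only `bash` + `curl` + `python3`) that vets, **per model**, the things that actually cost you money or corrupt output: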
177
+
178
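+ - **Tool calling**: a single call plus a multi-step round-trip with `content: null` (the shape every agent loop sends)
179
+ - **`max_tokens` honored**: the cap must bound output length
180
+ - **Hidden-prompt injection**: sends a neutral message; if the model starts responding to a system prompt it never received, the provider injected something
181
+ - **Token over-counting**: compares reported `prompt_tokens` against a trusted baseline provider; >1.5× means the bill is inflated and the "discount" may be a loss
182
+ - **Prompt caching**: whether `cache_control` actually produces a `cache_read` on repeated requests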
183
+
184
+ ```bash
185
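+ # Point it at the provider you're vetting; models are generic numbered slots (Gemini / Claude / GPT / Llama all work).
186
+ # Give a model a REF_n (the matching model id on a trusted baseline) to enable the token over-counting check.
187
+ # CACHE_MODEL (optional) runs the Anthropic-native /v1/messages prompt-caching test.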
188
+ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
189
+ MODEL_1=gemini-3-flash REF_1=google/gemini-3-flash-preview \
190
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
191
+ CACHE_MODEL=claude-sonnet-4-6 \
192
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
193
+ bash scripts/check-provider.sh
194
+
195
+ # TokenMart uses vendor-prefixed model IDs
196
+ API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
197
+ MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
198
+ MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
199
+ CACHE_MODEL=anthropic/claude-sonnet-4-6 \
200
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
201
+ bash scripts/check-provider.sh
202
+ ```
203
+
204
+ A `FAIL` on injection or token over-counting means that provider is **not** a safe least-cost target for that model. Keep it off that model's cheapest-first list until it's fixed, then re-probe.
205
+
206
+ ### Trust matrix (probed 2026-05-27)
207
+
208
+ Two OpenAI-compatible providers, same script, same day. Cells cover both families (G = Gemini, C = Claude).
209
+
210
+ | Check | Kunavo | [TokenMart](https://thetokenmart.ai) |
211
+ |---|---|---|
212
+ | Tool calls (single + multi-step `content: null`) | G ⚠️ intermittent¹ · C ✅ | ✅ both |
213
+ | Token count vs OpenRouter baseline | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ both ~1.0× |
214
+ | Hidden-prompt injection | G ✅ none · C ❌ intermittent² | ✅ none |
215
+ | `max_tokens` honored | ❌ ignored (both) | ✅ both |
216
+ | Prompt caching (`cache_control`) | C ❌ not applied (endpoint also hung mid-probe) | C ✅ `cache_read` > 0 |
217
+
218
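+ ¹ Kunavo Gemini returned a clean tool call on one run and **dropped tools entirely** on the next identical request, so this is not a stable pass.
219
+ ² Kunavo Claude reacted to a phantom "fake system prompt" on one run and stayed clean on another; the injection is intermittent, not removed.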
220
+
221
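+ **Verdict:** TokenMart passes every check on both Gemini and Claude with stable, repeatable results, so it can be routed with confidence. Kunavo: Claude token counts are now clean (re-probed 2026-05-28); at 20% off list, Kunavo is now the cheapest option for Claude models. Remaining issues: both models ignore `max_tokens`, hidden-prompt injection on Claude is still intermittent, and Gemini also drops tool calls intermittently. Re-probe before using a new model.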
222
+
137
223
  ## Roadmap
138
224
 
139
225
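  - [x] Own failover engine: cheapest-first routing + streaming-safe fallback, no external routing dependency
140
226
  - [x] Real per-call cost accounting (`onCost`)
141
227
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
228
+ - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
142
229
  - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
143
- - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
144
- - [ ] Offline capability probe (tool-calling / caching / streaming) → trust matrix
230
+ - [ ] Provider-quirk middleware (transparently patch known quirks, e.g. Kunavo's ignored `max_tokens`)
231
+ - [ ] Feed probe results into routing automatically (a provider×model pair that fails its probe is removed from the list)
145
232
  - [ ] Image & video model routing (fal.ai / Runware / Kunavo)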
146
233
 
147
234
  ## Affiliate disclosure
148
235
 
149
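- `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**, which, at 30% off official rates, is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.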
236
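+ `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=victorimf)**, which, at 20% off official rates, is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.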
150
237
 
151
238
  ## Development
152
239