ai-lcr 0.0.1 → 0.1.0

package/README.md CHANGED
@@ -51,7 +51,7 @@ const lcr = createLCR({
51
51
  models: {
52
52
  // One logical model, served cheapest-first across providers.
53
53
  "gemini-3-flash": [
54
- { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
55
55
  { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
56
  ],
57
57
  },
@@ -67,10 +67,41 @@ const { text } = await generateText({
67
67
 
68
68
  `cost` and `label` are optional — pass bare models (`kunavo("gemini-3-flash")`) if you don't need cost accounting or `autoSort`. `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tools, and agents.
69
69
 
70
+ ## Route to a model vendor's own API (native providers)
71
+
72
+ A "provider" doesn't have to be an aggregator. A model vendor's **own official API** is just another entry in the list — often the cheapest, since there's no aggregator markup, and the least likely to silently break native features (prompt caching, tool calls). Any AI SDK provider package returns a standard model, so a vendor's native API and an OpenAI-compatible aggregator sit side by side in the same list:
73
+
74
+ ```ts
75
+ import { createLCR } from "ai-lcr";
76
+ import { createDeepSeek } from "@ai-sdk/deepseek"; // DeepSeek's own API
77
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
78
+
79
+ const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
80
+ const openrouter = createOpenAICompatible({
81
+ name: "openrouter",
82
+ baseURL: "https://openrouter.ai/api/v1",
83
+ apiKey: process.env.OPENROUTER_API_KEY,
84
+ });
85
+
86
+ const lcr = createLCR({
87
+ autoSort: true,
88
+ models: {
89
+ "deepseek-v4": [
90
+ // Official API first — no markup, full native features (caching, off-peak discounts).
91
+ { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
92
+ // Aggregator as a fallback for uptime + breadth.
93
+ { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
94
+ ],
95
+ },
96
+ });
97
+ ```
98
+
99
+ The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
100
+
70
101
  ## How it routes
71
102
 
72
103
  1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
73
- 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider, streaming-safe. Hard errors (400, 401, 403, 422) pass through immediately.
104
+ 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout, or a **billing cap** such as 402 / out-of-credit / quota), it advances to the next provider, streaming-safe. A caller's own bad request (e.g. 400, 422) passes through immediately.
74
105
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
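
Rules 1 and 2 above can be sketched as a small loop. This is an illustrative sketch only, not `ai-lcr`'s actual implementation: `Attempt`, `isRetryable`, and `route` are hypothetical names, and both the use of 408 as the timeout status and the treatment of every other non-2xx status as a hard error are assumptions made for the example. The recovery window (rule 3) is left out.

```ts
// Hypothetical sketch of cheapest-first fall-through; not ai-lcr's real code.
type Attempt = { label: string; status: number };

// Assumption: 429 = rate limit, 408 = timeout, 402 = billing cap, 5xx = server error.
function isRetryable(status: number): boolean {
  return status === 429 || status === 408 || status === 402 || (status >= 500 && status <= 599);
}

// Walk providers in order (cheapest first) and return the label that served
// the request. A non-retryable error (e.g. 400, 422) is thrown immediately
// instead of being masked by a fallback.
function route(attempts: Attempt[]): string {
  for (const { label, status } of attempts) {
    if (status >= 200 && status < 300) return label;
    if (!isRetryable(status)) throw new Error(`${label} failed with ${status}`);
  }
  throw new Error("all providers exhausted");
}

// The cheapest provider is out of credit (402), so the call falls through.
const served = route([
  { label: "kunavo", status: 402 },
  { label: "openrouter", status: 200 },
]);
```

Throwing on a bad request rather than falling through keeps a malformed call from being retried, and billed, on every provider in the list.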
75
106
 
76
107
  <p align="center">
@@ -79,44 +110,50 @@ const { text } = await generateText({
79
110
 
80
111
  ## Supported providers
81
112
 
82
- Any OpenAI-compatible endpoint works.
113
+ Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
83
114
 
84
- - **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
85
- - **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) routing on the roadmap
115
+ - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
116
+ - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
117
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — image routing available via `createMediaLCR` (Kunavo + Runware adapters); video on the roadmap
86
118
 
87
119
  ## Text model pricing
88
120
 
89
- USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
121
+ USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 20% off the official rate. TokenMart prices vary by model (15–65% off list) — verify current rates at [thetokenmart.ai](https://thetokenmart.ai).
122
+
123
+ | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
124
+ |---|---|---|---|---|---|
125
+ | Gemini 3 Flash | $0.50 / $3.00 | no discount | −20% | — | ⭐ Kunavo |
126
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −20% | −20% → **$1.60 / $9.60** | ⭐ Kunavo |
127
+ | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −20% | — | ⭐ Kunavo |
128
+ | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −20% | — | ⭐ Kunavo |
129
+ | Claude Opus 4.7 | $15.00 / $75.00 | no discount | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
130
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
131
+ | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −20% | — | ⭐ Kunavo |
132
+ | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | — | ⭐ DeepSeek (official) |
133
+
134
+ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to their **own official APIs** (cheapest, full native features) with OpenRouter as a broad fallback — one config can mix native vendors and aggregators.
90
135
 
91
- | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
92
- |---|---|---|---|---|
93
- | Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
94
- | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
95
- | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
96
- | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
97
- | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
98
- | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
99
- | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
136
+ > **Note:** List price ≠ effective price; always verify with the [probe](#vetting-a-provider-capability--cost-probe). As of 2026-05-28, Kunavo token counts are clean for both Gemini (~1.1–1.4×) and Claude (~1.0×). Remaining caveats: `max_tokens` is still ignored on both models, and hidden-prompt injection appears intermittently for Claude, so re-probe before routing in production. Effective cost is why `ai-lcr` should rank by measured behavior, not the sticker price.
100
137
 
101
- Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter; one config can mix them all.
138
+ > **Note:** TokenMart token counts are also verified clean (same backend as Inference.ai, all checks passed 2026-05-27: tool calls, `max_tokens`, no injection, token ~1.0×, prompt caching), making it a reliable second provider for Claude at −15% list. Re-probe before routing in production.
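
To make the gap between list price and effective price concrete: if a provider reports more tokens than the baseline for the same prompt, the bill scales by that ratio. The sketch below assumes effective price = official price × (1 − discount) × token ratio. The name `effectivePrice` is hypothetical, and the formula is a simplification that ignores caching and any difference in output-token counting.

```ts
// Hypothetical helper: effective USD per 1M tokens after discount and token inflation.
function effectivePrice(official: number, discount: number, tokenRatio: number): number {
  // Round to 4 decimals to keep floating-point noise out of the result.
  return Math.round(official * (1 - discount) * tokenRatio * 1e4) / 1e4;
}

// Gemini 3 Flash input: official $0.50 per 1M tokens, provider at 20% off.
const clean = effectivePrice(0.5, 0.2, 1.0);    // token counts match the baseline
const inflated = effectivePrice(0.5, 0.2, 1.4); // upper end of the ~1.1–1.4× range

// Token ratio at which a discount stops saving money: 1 / (1 − discount).
const breakEven = 1 / (1 - 0.2);
```

Under this simplified model a 20% discount breaks even at a 1.25× token ratio, so a ratio near 1.4× would cost more than the official rate even though it sits below the probe's 1.5× failure threshold.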
102
139
 
103
140
  ## Image model pricing
104
141
 
105
- USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 30% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
106
-
107
- | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
108
- |---|---|---|---|---|
109
- | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
- | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
- | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
- | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
- | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
- | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
- | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
- | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
- | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
- | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
- | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
142
+ USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 20% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
143
+
144
+ | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
145
+ |---|---|---|---|---|---|
146
+ | Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
147
+ | Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
148
+ | GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
149
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
150
+ | Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
151
+ | Seedream 4 | $0.030 | — | — | — | ⭐ fal |
152
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
153
+ | Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
154
+ | Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
155
+ | Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
156
+ | FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
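
The per-image prices above compare directly, but the type declarations added in this release also allow per-megapixel, per-second, and per-call pricing, all normalized to one reference output. The following is a rough sketch of that normalization, not the package's implementation. The types are trimmed copies of the declared `MediaPricing`, `MediaRoute`, and `ReferenceSpec`; the function bodies and the megapixel formula (width × height / 1,000,000) are assumptions inferred from the doc comments, and the `externalId` values are placeholders.

```ts
type MediaUnit = "image" | "megapixel" | "second" | "call";
interface MediaPricing { unit: MediaUnit; cents: number }
interface MediaRoute { provider: string; externalId: string; pricing: MediaPricing }
interface ReferenceSpec { image: { width: number; height: number }; videoSeconds: number }

// 16:9 1080p image, 5-second clip (the defaults described in the declarations).
const REFERENCE: ReferenceSpec = { image: { width: 1920, height: 1080 }, videoSeconds: 5 };

// Assumed normalization: cost in US cents of producing ONE reference output.
function normalizedCents(p: MediaPricing, ref: ReferenceSpec = REFERENCE): number {
  switch (p.unit) {
    case "image":
    case "call":
      return p.cents; // flat fee per output
    case "megapixel":
      return (p.cents * ref.image.width * ref.image.height) / 1e6;
    case "second":
      return p.cents * ref.videoSeconds;
  }
}

// Pick the route with the lowest normalized cost.
function cheapestRoute(routes: MediaRoute[]): MediaRoute {
  if (routes.length === 0) throw new Error("no routes");
  return routes.reduce((best, r) =>
    normalizedCents(r.pricing) < normalizedCents(best.pricing) ? r : best);
}

// Flux Schnell per-image prices from the table above, converted to cents.
const fluxSchnell: MediaRoute[] = [
  { provider: "fal", externalId: "placeholder-fal-id", pricing: { unit: "image", cents: 0.3 } },
  { provider: "runware", externalId: "placeholder-runware-id", pricing: { unit: "image", cents: 0.13 } },
];
const winner = cheapestRoute(fluxSchnell).provider;
```

Normalizing before comparing is what lets a compute-billed route (per megapixel) be ranked against a flat per-image route for the same model.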
120
157
 
121
158
  ## Video model pricing
122
159
 
@@ -134,19 +171,70 @@ USD per second, as of 2026-05 — verify current rates. Video billing differs by
134
171
  | Seedance Pro | $0.124 |
135
172
  | Veo 3.1 (audio-on) | $0.400 |
136
173
 
174
+ ## Vetting a provider (capability + cost probe)
175
+
176
+ A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
177
+
178
+ - **tool calling** — single call and a multi-step round-trip with `content: null` (the shape every agent loop sends)
179
+ - **`max_tokens` honored** — caps must bound output
180
+ - **hidden-prompt injection** — sends a neutral message; flags the provider if the model starts reacting to a system prompt it was never given
181
+ - **token over-counting** — compares reported `prompt_tokens` against a trusted baseline provider; >1.5× means the bill is inflated and the "discount" may be a loss
182
+ - **prompt caching** — whether `cache_control` actually produces a `cache_read` on repeats
183
+
184
+ ```bash
185
+ # point it at the provider you're vetting; models are generic numbered slots
186
+ # (works for Gemini, Claude, GPT, Llama, …). Add a per-model REF_n on a trusted
187
+ # baseline (e.g. OpenRouter) to enable the token-inflation check. CACHE_MODEL
188
+ # (optional) runs the Anthropic-native /v1/messages prompt-caching test.
189
+ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
190
+ MODEL_1=gemini-3-flash REF_1=google/gemini-3-flash-preview \
191
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
192
+ CACHE_MODEL=claude-sonnet-4-6 \
193
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
194
+ bash scripts/check-provider.sh
195
+
196
+ # TokenMart uses vendor-prefixed model IDs
197
+ API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
198
+ MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
199
+ MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
200
+ CACHE_MODEL=anthropic/claude-sonnet-4-6 \
201
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
202
+ bash scripts/check-provider.sh
203
+ ```
204
+
205
+ A `FAIL` on injection or token over-counting means that provider is **not** a safe least-cost target for that model — keep it off that model's cheapest-first list until it's fixed, then re-probe.
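
The token over-counting check reduces to a ratio against the baseline. Here is a minimal sketch of that comparison, assuming the 1.5× threshold stated above. `tokenInflation` and its return shape are hypothetical and do not mirror the internals of `scripts/check-provider.sh`; the token counts are made up for illustration.

```ts
// Hypothetical sketch of the token over-counting verdict.
interface InflationResult { ratio: number; verdict: "PASS" | "FAIL" }

function tokenInflation(reportedPromptTokens: number, baselinePromptTokens: number): InflationResult {
  if (baselinePromptTokens <= 0) throw new Error("baseline must be positive");
  const ratio = reportedPromptTokens / baselinePromptTokens;
  // Above 1.5× the baseline, the bill is treated as inflated.
  return { ratio, verdict: ratio > 1.5 ? "FAIL" : "PASS" };
}

// Same prompt sent to the provider under test and to the trusted baseline.
const withinLimit = tokenInflation(120, 100);
const overLimit = tokenInflation(180, 100);
```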
206
+
207
+ ### Trust matrix (probed 2026-05-27)
208
+
209
+ Two OpenAI-compatible providers, same probe, same day. Cells cover both families (G = Gemini, C = Claude).
210
+
211
+ | Check | Kunavo | [TokenMart](https://thetokenmart.ai) |
212
+ |---|---|---|
213
+ | Tool calls (single + multi-step `content: null`) | G ⚠️ intermittent¹ · C ✅ | ✅ both |
214
+ | Token count vs OpenRouter baseline | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ both ~1.0× |
215
+ | Hidden-prompt injection | G ✅ none · C ❌ intermittent² | ✅ none |
216
+ | `max_tokens` honored | ❌ ignored (both) | ✅ both |
217
+ | Prompt caching (`cache_control`) | C ❌ not applied (endpoint also hung mid-probe) | C ✅ `cache_read` > 0 |
218
+
219
+ ¹ Kunavo Gemini returned a clean tool call on one run and **dropped tools entirely** on the next identical request — not a stable pass.
220
+ ² Kunavo Claude reacted to a phantom "fake system prompt" on one run and stayed clean on another — the injection is intermittent, not removed.
221
+
222
+ **Verdict:** TokenMart passes every check on both Gemini and Claude with stable, repeatable results, so it can be routed freely. Kunavo: token counts are now clean for Claude (re-probed 2026-05-28), and at −20% list it is the cheapest option for Claude Sonnet 4.6 and Haiku 4.5 in the pricing table above. Remaining caveats: `max_tokens` is ignored on both models, hidden-prompt injection appears intermittently for Claude, and Gemini drops tool calls intermittently, so re-probe before routing a new model in production.
223
+
137
224
  ## Roadmap
138
225
 
139
226
  - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
140
227
  - [x] Real per-call cost accounting (`onCost`)
141
228
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
229
+ - [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
142
230
  - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
143
- - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
144
- - [ ] Offline capability probe (tool-calling / caching / streaming) trust matrix
231
+ - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
232
+ - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
145
233
  - [ ] Image & video model routing (fal.ai / Runware / Kunavo)
146
234
 
147
235
  ## Affiliate disclosure
148
236
 
149
- `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**, which — at 30% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
237
+ `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=victorimf)**, which — at 20% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
150
238
 
151
239
  ## Development
152
240
 
package/README.zh-CN.md CHANGED
@@ -51,7 +51,7 @@ const lcr = createLCR({
51
51
  models: {
52
52
  // 一个逻辑模型,跨多个 provider 最便宜优先地提供服务。
53
53
  "gemini-3-flash": [
54
- { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
55
55
  { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
56
  ],
57
57
  },
@@ -67,6 +67,37 @@ const { text } = await generateText({
67
67
 
68
68
  `cost` 和 `label` 都是可选的——如果你不需要成本核算或 `autoSort`,可以直接传裸模型(`kunavo("gemini-3-flash")`)。`lcr("gemini-3-flash")` 返回一个标准的 AI SDK 模型,因此可与 `generateText`、`streamText`、`generateObject`、工具调用和 agent 一起使用。
69
69
 
70
+ ## 直连模型厂商官方 API(原生 provider)
71
+
72
+ 「provider」不一定是聚合器。模型厂商**自己的官方 API** 就是列表里的又一个 entry——往往是最便宜的那个(没有聚合器加价),也最不容易悄悄破坏原生特性(prompt 缓存、工具调用)。任何 AI SDK 的 provider 包都返回标准模型,所以厂商的原生 API 和 OpenAI 兼容的聚合器可以并排放在同一个列表里:
73
+
74
+ ```ts
75
+ import { createLCR } from "ai-lcr";
76
+ import { createDeepSeek } from "@ai-sdk/deepseek"; // DeepSeek 官方 API
77
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
78
+
79
+ const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
80
+ const openrouter = createOpenAICompatible({
81
+ name: "openrouter",
82
+ baseURL: "https://openrouter.ai/api/v1",
83
+ apiKey: process.env.OPENROUTER_API_KEY,
84
+ });
85
+
86
+ const lcr = createLCR({
87
+ autoSort: true,
88
+ models: {
89
+ "deepseek-v4": [
90
+ // 官方 API 优先——无加价,原生特性齐全(缓存、错峰折扣)。
91
+ { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
92
+ // 聚合器作为兜底,保可用性 + 广覆盖。
93
+ { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
94
+ ],
95
+ },
96
+ });
97
+ ```
98
+
99
+ 同样的模式适用于任何厂商的原生 SDK provider——`@ai-sdk/anthropic`、`@ai-sdk/google`、`@ai-sdk/openai`、`@ai-sdk/xai` 等等。它们都返回 `LanguageModelV3`,所以你可以在一个模型的列表里把厂商原生 API 和聚合器混着用。原生 API 覆盖窄(只有该厂商自己的模型)但特性全;聚合器覆盖广。**官方优先 + 聚合器兜底** 正是 LCR 最自然的形态。
100
+
70
101
  ## 它如何路由
71
102
 
72
103
  1. **最便宜优先。** provider 按顺序依次尝试——把它们排成最便宜优先,或设置 `autoSort: true` 让它按 `cost` 自动排序。
@@ -79,44 +110,50 @@ const { text } = await generateText({
79
110
 
80
111
  ## 支持的 provider
81
112
 
82
- 任何 OpenAI 兼容的 endpoint 都可用。
113
+ 任何 OpenAI 兼容的 endpoint 都可用——任何 AI SDK 的 provider 包也都可用,包括模型厂商自己的官方 API。
83
114
 
84
- - **文本:** [OpenRouter](https://openrouter.ai)(覆盖最广,列表定价)· [Kunavo](https://kunavo.com/?ref=hJ2uT3iW)(**全模型 7 折**)
85
- - **图像 / 视频:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW)(**7 折**)· [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 路由功能在路线图中
115
+ - **模型厂商官方 API(原生):** 通过各自的 AI SDK provider 包直连 [DeepSeek](https://platform.deepseek.com)、[OpenAI](https://openai.com)、[Anthropic](https://anthropic.com)、[Google](https://ai.google.dev)、[xAI](https://x.ai) 等,无加价,原生特性齐全。见上方「直连模型厂商官方 API(原生 provider)」一节。
116
+ - **文本聚合器:** [OpenRouter](https://openrouter.ai)(覆盖最广,列表定价)· [Kunavo](https://kunavo.com/?ref=victorimf)(**全模型 8 折**)· [TokenMart](https://thetokenmart.ai)(按模型 85 折–35 折不等)
117
+ - **图像 / 视频:** [Kunavo](https://kunavo.com/?ref=victorimf)(**8 折**)· [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 路由功能在路线图中
86
118
 
87
119
  ## 文本模型价格
88
120
 
89
- 单位为每 100 万 token 的美元价格,input / output。官方价格截至 2026-05——请向各 provider 核对当前价格。OpenRouter 直接透传列表价;Kunavo 在官方价基础上统一 7 折。
121
+ 单位为每 100 万 token 的美元价格,input / output。官方价格截至 2026-05——请向各 provider 核对当前价格。OpenRouter 直接透传列表价;Kunavo 在官方价基础上统一 8 折。TokenMart 折扣按模型不同(85 折–35 折),请在 [thetokenmart.ai](https://thetokenmart.ai) 核对当前价格。
122
+
123
+ | 模型 | 官方价(in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | 最便宜 |
124
+ |---|---|---|---|---|---|
125
+ | Gemini 3 Flash | $0.50 / $3.00 | 无折扣 | −20% | — | ⭐ Kunavo |
126
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | 无折扣 | −20% | −20% → **$1.60 / $9.60** | ⭐ Kunavo |
127
+ | Gemini 2.5 Pro | $1.25 / $10.00 | 无折扣 | −20% | — | ⭐ Kunavo |
128
+ | Gemini 2.5 Flash | $0.30 / $2.50 | 无折扣 | −20% | — | ⭐ Kunavo |
129
+ | Claude Opus 4.7 | $15.00 / $75.00 | 无折扣 | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
130
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | 无折扣 | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
131
+ | Claude Haiku 4.5 | $1.00 / $5.00 | 无折扣 | −20% | — | ⭐ Kunavo |
132
+ | DeepSeek V4 | $0.43 / $0.87 | 无折扣 | 未提供 | — | ⭐ DeepSeek(官方) |
133
+
134
+ Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到各自的**官方 API**(最便宜,原生特性齐全),以 OpenRouter 作为广覆盖兜底——一份配置即可混用原生厂商与聚合器。
90
135
 
91
- | 模型 | 官方价(in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | 最便宜 |
92
- |---|---|---|---|---|
93
- | Gemini 3 Flash | $0.50 / $3.00 | 无折扣 | −30% | ⭐ Kunavo |
94
- | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | 无折扣 | −30% | ⭐ Kunavo |
95
- | Gemini 2.5 Pro | $1.25 / $10.00 | 无折扣 | −30% | ⭐ Kunavo |
96
- | Gemini 2.5 Flash | $0.30 / $2.50 | 无折扣 | −30% | ⭐ Kunavo |
97
- | Claude Sonnet 4.6 | $3.00 / $15.00 | 无折扣 | −30% | ⭐ Kunavo |
98
- | Claude Haiku 4.5 | $1.00 / $5.00 | 无折扣 | −30% | ⭐ Kunavo |
99
- | DeepSeek V4 | $0.43 / $0.87 | 无折扣 | 未提供 | ⭐ OpenRouter |
136
+ > **注:** list 价 ≠ 有效价,请始终用 [probe](#给-provider-做体检能力--成本探测) 验证。截至 2026-05-28,Kunavo 在 Gemini(~1.1–1.4×)和 Claude(~1.0×)两条路上的 token 计数均已干净。现存问题:两个模型均忽略 `max_tokens`,Claude 隐藏 prompt 注入仍为间歇性,生产路由前请重新 probe。
100
137
 
101
- Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到 OpenRouter——一份配置即可混用全部。
138
+ > **注:** TokenMart token 计数同样经 probe 验证干净(后端与 Inference.ai 相同,2026-05-27 全项通过:工具调用、`max_tokens`、无注入、token ~1.0×、prompt 缓存)——如需 Claude 的第二 provider,TokenMart 是可靠备选。生产路由前请重新 probe 确认。
102
139
 
103
140
  ## 图像模型价格
104
141
 
105
- 单位为每张图的美元价格,截至 2026-05(provider 列表价 / 零售价;请核对当前价格)。Kunavo 为官方价 7 折。fal 与 Runware 是算力 provider——`ai-lcr` 为每个模型挑选最便宜的那个(⭐)。
106
-
107
- | 模型 | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | 最便宜 |
108
- |---|---|---|---|---|
109
- | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
- | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
- | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
- | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
- | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
- | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
- | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
- | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
- | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
- | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
- | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
142
+ 单位为每张图的美元价格,截至 2026-05(provider 列表价 / 零售价;请核对当前价格)。Kunavo 为官方价 8 折。fal 与 Runware 是算力 provider——`ai-lcr` 为每个模型挑选最便宜的那个(⭐)。
143
+
144
+ | 模型 | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | 最便宜 |
145
+ |---|---|---|---|---|---|
146
+ | Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
147
+ | Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
148
+ | GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
149
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
150
+ | Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
151
+ | Seedream 4 | $0.030 | — | — | — | ⭐ fal |
152
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
153
+ | Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
154
+ | Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
155
+ | Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
156
+ | FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
120
157
 
121
158
  ## 视频模型价格
122
159
 
@@ -134,19 +171,69 @@ Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到
134
171
  | Seedance Pro | $0.124 |
135
172
  | Veo 3.1(audio-on) | $0.400 |
136
173
 
174
+ ## 给 provider 做体检(能力 + 成本探测)
175
+
176
+ 折扣再大,如果 provider 偷偷破坏了协议就一文不值。`ai-lcr` 自带一个零依赖的检查脚本(`scripts/check-provider.sh`,只需 `bash` + `curl` + `python3`),**逐模型**核查那些真正会让你多花钱或污染输出的点:
177
+
178
+ - **工具调用** —— 单次调用 + 带 `content: null` 的多步 round-trip(每个 agent 循环都会发的形态)
179
+ - **`max_tokens` 是否生效** —— cap 必须能限制输出长度
180
+ - **隐藏 prompt 注入** —— 发一条中性消息,如果模型开始回应一段它从没收到过的 system prompt,就说明 provider 注入了东西
181
+ - **token 超计** —— 把上报的 `prompt_tokens` 和一个可信基线 provider 对照,>1.5× 说明账单被灌水、"折扣"可能是亏本
182
+ - **prompt 缓存** —— `cache_control` 在重复请求时是否真的产生 `cache_read`
183
+
184
+ ```bash
185
+ # 指向你要体检的 provider;模型用通用编号槽位(Gemini / Claude / GPT / Llama 都行)。
186
+ # 给某个模型配上 REF_n(可信基线上的对应模型 id)即可启用 token 超计检查。
187
+ # CACHE_MODEL(可选)跑 Anthropic 原生 /v1/messages 的 prompt 缓存测试。
188
+ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
189
+ MODEL_1=gemini-3-flash REF_1=google/gemini-3-flash-preview \
190
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
191
+ CACHE_MODEL=claude-sonnet-4-6 \
192
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
193
+ bash scripts/check-provider.sh
194
+
195
+ # TokenMart 使用 vendor 前缀的模型 ID
196
+ API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
197
+ MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
198
+ MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
199
+ CACHE_MODEL=anthropic/claude-sonnet-4-6 \
200
+ REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
201
+ bash scripts/check-provider.sh
202
+ ```
203
+
204
+ 注入或 token 超计这两项 `FAIL`,意味着该 provider 对那个模型来说**不是**安全的最低成本目标——在它修好之前,别把它放进那个模型的「最便宜优先」列表,修好后重新探测。
205
+
206
+ ### 信任矩阵(探测于 2026-05-27)
207
+
208
+ 两个 OpenAI 兼容 provider,同一脚本,同一天。单元格覆盖两个家族(G = Gemini,C = Claude)。
209
+
210
+ | 检查项 | Kunavo | [TokenMart](https://thetokenmart.ai) |
211
+ |---|---|---|
212
+ | 工具调用(单次 + 多步 `content: null`) | G ⚠️ 间歇性¹ · C ✅ | ✅ 两者 |
213
+ | token 计数 vs OpenRouter 基线 | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ 两者 ~1.0× |
214
+ | 隐藏 prompt 注入 | G ✅ 无 · C ❌ 间歇性² | ✅ 无 |
215
+ | `max_tokens` 是否生效 | ❌ 被忽略(两者) | ✅ 两者 |
216
+ | prompt 缓存(`cache_control`) | C ❌ 未生效(探测中途 endpoint 还卡死) | C ✅ `cache_read` > 0 |
217
+
218
+ ¹ Kunavo Gemini 一次返回干净的工具调用,下一次相同请求却**完全丢掉了 tools**——不是稳定通过。
219
+ ² Kunavo Claude 一次对着幻觉中的"fake system prompt"作出反应,另一次又干净——注入是间歇性的,不是被移除了。
220
+
221
+ **结论:** TokenMart 在 Gemini 和 Claude 两条路上每一项都通过,且结果稳定可复现——可以放心路由。Kunavo:Claude token 计数已干净(2026-05-28 重新 probe),按 8 折 list 价,Kunavo 现在是 Claude 模型的最便宜选择。现存问题:两个模型均忽略 `max_tokens`、Claude 隐藏 prompt 注入仍为间歇性、Gemini 也会间歇性丢工具调用——用新模型前先重新探测。
222
+
137
223
  ## 路线图
138
224
 
139
225
  - [x] 自有 failover 引擎——最便宜优先路由 + 流式安全的 fallback,不依赖外部路由库
140
226
  - [x] 真实的逐次调用成本核算(`onCost`)
141
227
  - [x] 基于各 provider `cost` 的自动最便宜优先排序(`autoSort`)
228
+ - [x] 离线能力 + 成本检查(`scripts/check-provider.sh`)→ 逐模型信任矩阵
142
229
  - [ ] 内置价格表,实现零配置定价(省去手填 `cost` 数字)
143
- - [ ] provider 怪癖中间件(透明地修补已知的各 provider 请求怪癖)
144
- - [ ] 离线能力探测(工具调用 / 缓存 / 流式)→ 信任矩阵
230
+ - [ ] provider 怪癖中间件(透明地修补已知怪癖,如 Kunavo 被忽略的 `max_tokens`)
231
+ - [ ] probe 结果自动接入路由(探测失败的 provider×model 自动从列表剔除)
145
232
  - [ ] 图像与视频模型路由(fal.ai / Runware / Kunavo)
146
233
 
147
234
  ## 联盟(Affiliate)披露
148
235
 
149
- `ai-lcr` 是 provider 中立的,可与任何 OpenAI 兼容的 endpoint 配合使用。作者与 **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)** 之间存在联盟(affiliate)关系——在官方价 7 折的情况下,它往往(但并非总是)是最便宜的选项,正如上面的表格所示。通过该链接注册可能会让作者获得一份分成。你完全不必使用它;自带 provider,路由功能照常工作。
236
+ `ai-lcr` 是 provider 中立的,可与任何 OpenAI 兼容的 endpoint 配合使用。作者与 **[Kunavo](https://kunavo.com/?ref=victorimf)** 之间存在联盟(affiliate)关系——在官方价 8 折的情况下,它往往(但并非总是)是最便宜的选项,正如上面的表格所示。通过该链接注册可能会让作者获得一份分成。你完全不必使用它;自带 provider,路由功能照常工作。
150
237
 
151
238
  ## 开发
152
239
 
package/dist/index.d.ts CHANGED
@@ -29,6 +29,224 @@ interface CostEvent {
29
29
  costUsd: number;
30
30
  }
31
31
 
32
+ type MediaModality = "image" | "video";
33
+ /**
34
+ * Pricing unit a provider bills in. `cents` on MediaPricing is the price for
35
+ * one of these units, in US cents.
36
+ * - "image": flat per generated image (most partner models)
37
+ * - "megapixel": compute-billed; scales with output resolution
38
+ * - "second": per second of video
39
+ * - "call": one flat charge per generation (Kunavo video, fixed clip)
40
+ */
41
+ type MediaUnit = "image" | "megapixel" | "second" | "call";
42
+ interface MediaPricing {
43
+ unit: MediaUnit;
44
+ /** Price in US cents for one `unit`. Fractional allowed (0.13 = $0.0013). */
45
+ cents: number;
46
+ }
47
+ /** One provider's route for a model. */
48
+ interface MediaRoute {
49
+ /** Provider key: "fal" | "runware" | "kunavo" | … */
50
+ provider: string;
51
+ /** Provider-native model id (what its API expects). */
52
+ externalId: string;
53
+ pricing: MediaPricing;
54
+ /**
55
+ * Free-text caveat surfaced in the price table — e.g. a resolution tier the
56
+ * price assumes, or a SKU/version difference from sibling routes. Optional.
57
+ */
58
+ note?: string;
59
+ }
60
+ interface MediaModelDef {
61
+ /** Logical, provider-agnostic id, e.g. "google/nano-banana-2". */
62
+ id: string;
63
+ modality: MediaModality;
64
+ /** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
65
+ routes: MediaRoute[];
66
+ }
67
+ type MediaRegistry = Record<string, MediaModelDef>;
68
+ /**
69
+ * Every price is normalized to the cost of producing ONE of these outputs, so
70
+ * a per-image flat fee, a per-megapixel compute charge, a per-second video
71
+ * rate, and a flat per-call video fee all become directly comparable.
72
+ */
73
+ interface ReferenceSpec {
74
+ /** Reference still image. Default 16:9 1080p (1920×1080 ≈ 2.07 MP). */
75
+ image: {
76
+ width: number;
77
+ height: number;
78
+ };
79
+ /** Reference clip length in seconds (assumed 1080p). Default 5s. */
80
+ videoSeconds: number;
81
+ }
82
+ /** 16:9 1080p image, 5-second clip — the house standard. */
83
+ declare const DEFAULT_REFERENCE: ReferenceSpec;
84
+ declare function referenceMegapixels(ref: ReferenceSpec): number;
85
+ /**
86
+ * Cost in US cents to produce ONE reference output on this pricing.
87
+ * This is the single normalization that makes providers comparable.
88
+ */
89
+ declare function normalizedCents(pricing: MediaPricing, ref?: ReferenceSpec): number;
90
+ interface RankedRoute extends MediaRoute {
91
+ /** Normalized cost (cents per reference output) used for ordering. */
92
+ refCents: number;
93
+ }
94
+ /** A model's routes, cheapest reference-cost first. */
95
+ declare function rankRoutes(def: MediaModelDef, ref?: ReferenceSpec): RankedRoute[];
96
+ declare function cheapestRoute(def: MediaModelDef, ref?: ReferenceSpec): RankedRoute;
97
+ /** Per-model cheapest-provider summary — the price-comparison reference list. */
98
+ interface PriceComparisonRow {
99
+ modelId: string;
100
+ modality: MediaModality;
101
+ cheapest: {
102
+ provider: string;
103
+ refCents: number;
104
+ };
105
+ routes: {
106
+ provider: string;
107
+ refCents: number;
108
+ unit: MediaUnit;
109
+ note?: string;
110
+ }[];
111
+ }
112
+ declare function comparePrices(registry: MediaRegistry, ref?: ReferenceSpec): PriceComparisonRow[];
113
+ interface MediaGenerateRequest {
114
+ externalId: string;
115
+ /** Canonical input: { prompt, image_url?, duration?, aspect_ratio?, … }. */
116
+ input: Record<string, unknown>;
117
+ }
118
+ interface MediaOutput {
119
+ url: string;
120
+ type: MediaModality;
121
+ }
122
+ interface MediaGenerateResult {
123
+ outputs: MediaOutput[];
124
+ /** Provider-reported actual cost in cents, when the API returns it. */
125
+ costCents?: number;
126
+ /** Units actually billed (images, or seconds of video) — for cost fallback. */
127
+ units?: number;
128
+ }
129
+ /**
130
+ * A provider adapter. `run` resolves only when the output is ready: image
131
+ * adapters return synchronously; video adapters submit and poll internally.
132
+ */
133
+ interface MediaAdapter {
134
+ provider: string;
135
+ run(req: MediaGenerateRequest): Promise<MediaGenerateResult>;
136
+ }
137
+ interface MediaCostEvent {
138
+ modelId: string;
139
+ provider: string;
140
+ /** Actual cost: provider-reported if available, else normalized estimate. */
141
+ costCents: number;
142
+ estimated: boolean;
143
+ }
144
+ interface MediaLCRConfig {
145
+ registry: MediaRegistry;
146
+ /** Adapters keyed by provider. A route with no adapter is skipped. */
147
+ adapters: Record<string, MediaAdapter>;
148
+ reference?: ReferenceSpec;
149
+ onError?: (error: Error, provider: string) => void;
150
+ onCost?: (event: MediaCostEvent) => void;
151
+ }
152
+ interface MediaRunResult {
153
+ outputs: MediaOutput[];
154
+ provider: string;
155
+ costCents: number;
156
+ estimated: boolean;
157
+ }
158
+ /**
159
+ * Build a media Least Cost Router. Returns `generate(modelId, input)` which
160
+ * tries providers cheapest-first and falls through on a retryable error —
161
+ * exactly the text LCR's contract, for image/video.
162
+ */
163
+ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input: Record<string, unknown>) => Promise<MediaRunResult>;
164
+
165
+ /**
166
+ * Bundled cross-provider price reference for image & video models.
167
+ *
168
+ * This is the "which provider is cheapest" table, as data. Prices are in US
169
+ * cents per native unit; `comparePrices()` (./media) normalizes them to one
170
+ * 16:9 1080p image / 5-second 1080p clip so providers compare directly.
171
+ *
172
+ * PROVENANCE
173
+ * - fal, runware: per-image / per-second rates verified by Victor in ai-art's
174
+ * `@art/models` registry (audited against provider model pages 2026-05).
175
+ * - kunavo: read programmatically from `GET https://api.kunavo.com/v1/models`
176
+ * (`.kunavo.billing.perCallMicroCents` ÷ 1e6 → cents), 2026-05-29.
177
+ * - tokenmart: serves NO image/video models (47 text-only models) — excluded.
178
+ *
179
+ * Only models where 2+ providers compete are listed (that's where routing has a
180
+ * choice). The long tail of fal/runware-only models (Flux, Seedream, Kling, WAN,
181
+ * Sora, Hunyuan, …) lives in ai-art's registry; Kunavo does not carry them, so
182
+ * their "cheapest provider" is the fal-vs-runware call ai-art already encodes.
183
+ */
184
+
185
+ declare const MEDIA_PRICING: MediaRegistry;
186
+
187
+ /**
188
+ * Kunavo media adapter — image (sync) + video (async poll).
189
+ *
190
+ * Kunavo is NOT an AI-SDK chat provider for media: image/video generation uses
191
+ * its own REST endpoints, not `/v1/chat/completions`. So this is a hand-rolled
192
+ * `MediaAdapter`, not a `createOpenAICompatible` wrapper.
193
+ *
194
+ * - Image: POST /v1/images/generations → returns a files.kunavo.com URL.
195
+ * Synchronous (~11s for nano-banana). VERIFIED end-to-end.
196
+ * - Video: POST /v1/video/generations (singular "video"; /videos/ → 405).
197
+ * Long-running. The submit→poll path here is IMPLEMENTED FROM THE
198
+ * DOCS SHAPE BUT NOT YET RUN against a real job (veo-3 generation
199
+ * was skipped to save cost). Treat the poll loop as unverified:
200
+ * the field names (`id`/`status`/`url`) may differ from what the
201
+ * live API returns. Verify before relying on video in production.
202
+ *
203
+ * Kunavo does NOT return a per-call cost in the generation response, so cost is
204
+ * left to the router's normalized estimate (MediaGenerateResult.costCents
205
+ * stays undefined; `units` defaults to 1 — one image / one clip per call).
206
+ */
207
+
208
+ interface KunavoMediaConfig {
209
+ apiKey: string;
210
+ /** Override for testing. Defaults to https://api.kunavo.com. */
211
+ baseUrl?: string;
212
+ /** Video poll cadence (ms). Default 5000. */
213
+ pollIntervalMs?: number;
214
+ /** Max time to wait for a video job before giving up (ms). Default 300000 (5m). */
215
+ pollTimeoutMs?: number;
216
+ /** Injected for testing; defaults to global fetch. */
217
+ fetchImpl?: typeof fetch;
218
+ }
219
+ declare function createKunavoMediaAdapter(config: KunavoMediaConfig): MediaAdapter;
220
+
221
+ /**
222
+ * Runware media adapter — image generation (sync).
223
+ *
224
+ * Runware exposes a single REST endpoint that takes an array of tasks. This
225
+ * adapter wraps the `imageInference` task: it adds the boilerplate every call
226
+ * needs (taskType, a taskUUID, single result, URL output, cost reporting) and
227
+ * passes the caller's `input` straight through, so any Runware image model and
228
+ * any of its parameters (positivePrompt, width/height, steps, CFGScale,
229
+ * seedImage/strength for i2i, referenceImages for edit models, …) work without
230
+ * this adapter knowing about them. That keeps it generic — it is NOT tied to
231
+ * any one model family.
232
+ *
233
+ * Cost: Runware returns `cost` in US DOLLARS (when `includeCost` is on). The
234
+ * media router works in CENTS, so we convert (×100) before reporting it as the
235
+ * provider-reported actual cost.
236
+ *
237
+ * Video: Runware video is a different, async task type and is out of scope here
238
+ * — this adapter handles image inference only.
239
+ */
240
+
241
+ interface RunwareMediaConfig {
242
+ apiKey: string;
243
+ /** Override for testing. Defaults to https://api.runware.ai/v1. */
244
+ baseUrl?: string;
245
+ /** Injected for testing; defaults to global fetch. */
246
+ fetchImpl?: typeof fetch;
247
+ }
248
+ declare function createRunwareMediaAdapter(config: RunwareMediaConfig): MediaAdapter;
249
+
32
250
  /**
33
251
  * ai-lcr — Least Cost Routing for LLMs.
34
252
  *
@@ -76,4 +294,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
76
294
  */
77
295
  declare function createLCR(config: LCRConfig): LCRRouter;
78
296
 
79
- export { type CostEvent, type LCRConfig, type LCRRouter, type ProviderCost, type ProviderEntry, createLCR };
297
+ export { type CostEvent, DEFAULT_REFERENCE, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, cheapestRoute, comparePrices, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, normalizedCents, rankRoutes, referenceMegapixels };
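Reviewer note: the declarations above rest on one normalization step, converting each provider's native unit (per image, per call, per megapixel, per second) into cents for a single reference output. The sketch below re-derives that arithmetic with local stand-in types so it runs without the package installed. The names `Pricing`, `Reference`, `REFERENCE`, `megapixels`, and `toReferenceCents` are illustrative assumptions, not the package's exports; the prices are taken from the bundled table in this diff.

```typescript
// Local stand-ins for the package's types; shapes are inferred from this diff.
type Unit = "image" | "call" | "megapixel" | "second";
interface Pricing { unit: Unit; cents: number }
interface Reference { image: { width: number; height: number }; videoSeconds: number }

// Mirrors the default reference in the diff: a 1920x1080 still and a 5-second clip.
const REFERENCE: Reference = { image: { width: 1920, height: 1080 }, videoSeconds: 5 };

function megapixels(ref: Reference): number {
  return (ref.image.width * ref.image.height) / 1e6;
}

// Cents to produce ONE reference output at this pricing.
function toReferenceCents(pricing: Pricing, ref: Reference = REFERENCE): number {
  switch (pricing.unit) {
    case "image":
    case "call":
      return pricing.cents; // already priced per output
    case "megapixel":
      return pricing.cents * megapixels(ref);
    case "second":
      return pricing.cents * ref.videoSeconds;
  }
}

// Figures from the bundled table: flat per-call vs per-second video, per-megapixel image.
const kunavoVeo = toReferenceCents({ unit: "call", cents: 32 });
const falVeo = toReferenceCents({ unit: "second", cents: 40 });
const falSchnell = toReferenceCents({ unit: "megapixel", cents: 0.3 });
console.log({ kunavoVeo, falVeo, falSchnell });
```

Under these assumptions the flat 32¢ call compares against 40¢ × 5 s = 200¢, which is the "wide margin" the registry comment warns about, and the 0.3¢/MP rate comes to roughly 0.62¢ at 1080p. The comparison only holds if the flat-fee clip really matches the reference duration and resolution.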
package/dist/index.js CHANGED
@@ -1,5 +1,5 @@
1
1
  // src/fallback.ts
2
- var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 403, 408, 409, 413, 429, 498, 500]);
2
+ var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
3
3
  var RETRYABLE_PATTERNS = [
4
4
  "overloaded",
5
5
  "service unavailable",
@@ -16,7 +16,13 @@ var RETRYABLE_PATTERNS = [
16
16
  "502",
17
17
  "503",
18
18
  "504",
19
- "429"
19
+ "429",
20
+ // Billing caps — a capped provider should fall over, not kill the request.
21
+ "insufficient",
22
+ "credit",
23
+ "quota",
24
+ "billing",
25
+ "payment required"
20
26
  ];
21
27
  function isRetryableError(error) {
22
28
  const e = error;
@@ -179,6 +185,358 @@ var LcrFallbackModel = class {
179
185
  }
180
186
  };
181
187
 
188
+ // src/media.ts
189
+ var DEFAULT_REFERENCE = {
190
+ image: { width: 1920, height: 1080 },
191
+ videoSeconds: 5
192
+ };
193
+ function referenceMegapixels(ref) {
194
+ return ref.image.width * ref.image.height / 1e6;
195
+ }
196
+ function normalizedCents(pricing, ref = DEFAULT_REFERENCE) {
197
+ switch (pricing.unit) {
198
+ case "image":
199
+ case "call":
200
+ return pricing.cents;
201
+ // already "one output"
202
+ case "megapixel":
203
+ return pricing.cents * referenceMegapixels(ref);
204
+ case "second":
205
+ return pricing.cents * ref.videoSeconds;
206
+ }
207
+ }
208
+ function rankRoutes(def, ref = DEFAULT_REFERENCE) {
209
+ return def.routes.map((r) => ({ ...r, refCents: normalizedCents(r.pricing, ref) })).sort((a, b) => a.refCents - b.refCents);
210
+ }
211
+ function cheapestRoute(def, ref = DEFAULT_REFERENCE) {
212
+ const ranked = rankRoutes(def, ref);
213
+ if (ranked.length === 0) {
214
+ throw new Error(`ai-lcr: model "${def.id}" has no routes`);
215
+ }
216
+ return ranked[0];
217
+ }
218
+ function comparePrices(registry, ref = DEFAULT_REFERENCE) {
219
+ return Object.values(registry).map((def) => {
220
+ const ranked = rankRoutes(def, ref);
221
+ return {
222
+ modelId: def.id,
223
+ modality: def.modality,
224
+ cheapest: { provider: ranked[0].provider, refCents: ranked[0].refCents },
225
+ routes: ranked.map((r) => ({
226
+ provider: r.provider,
227
+ refCents: r.refCents,
228
+ unit: r.pricing.unit,
229
+ ...r.note ? { note: r.note } : {}
230
+ }))
231
+ };
232
+ });
233
+ }
234
+ function createMediaLCR(config) {
235
+ const { registry, adapters, reference = DEFAULT_REFERENCE, onError, onCost } = config;
236
+ return async function generate(modelId, input) {
237
+ const def = registry[modelId];
238
+ if (!def) {
239
+ throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
240
+ }
241
+ const ranked = rankRoutes(def, reference);
242
+ let lastErr;
243
+ for (const route of ranked) {
244
+ const adapter = adapters[route.provider];
245
+ if (!adapter) continue;
246
+ try {
247
+ const result = await adapter.run({ externalId: route.externalId, input });
248
+ const estimated = result.costCents === void 0;
249
+ const costCents = estimated ? route.refCents * (result.units ?? 1) : result.costCents;
250
+ onCost?.({ modelId, provider: route.provider, costCents, estimated });
251
+ return { outputs: result.outputs, provider: route.provider, costCents, estimated };
252
+ } catch (err) {
253
+ lastErr = err;
254
+ onError?.(err, route.provider);
255
+ if (!isRetryableError(err)) throw err;
256
+ }
257
+ }
258
+ throw lastErr instanceof Error ? lastErr : new Error(`ai-lcr: no provider could serve media model "${modelId}"`);
259
+ };
260
+ }
261
+
262
+ // src/media-registry.ts
263
+ var MEDIA_PRICING = {
264
+ // ── Google image (Gemini "Nano Banana" family) ──────────────
265
+ "google/nano-banana": {
266
+ id: "google/nano-banana",
267
+ modality: "image",
268
+ routes: [
269
+ // Gemini Flash Image v1. Kunavo only — fal/runware carry v2/pro, not v1.
270
+ { provider: "kunavo", externalId: "nano-banana", pricing: { unit: "image", cents: 2.73 } }
271
+ ]
272
+ },
273
+ "google/nano-banana-2": {
274
+ id: "google/nano-banana-2",
275
+ modality: "image",
276
+ routes: [
277
+ { provider: "kunavo", externalId: "nano-banana-2", pricing: { unit: "image", cents: 4.69 } },
278
+ { provider: "runware", externalId: "google:4@3", pricing: { unit: "image", cents: 6.9 }, note: "1K tier (2K 10.3\xA2, 4K 15.3\xA2)" },
279
+ { provider: "fal", externalId: "fal-ai/nano-banana-2", pricing: { unit: "image", cents: 8 } }
280
+ ]
281
+ },
282
+ "google/nano-banana-pro": {
283
+ id: "google/nano-banana-pro",
284
+ modality: "image",
285
+ routes: [
286
+ { provider: "kunavo", externalId: "nano-banana-pro", pricing: { unit: "image", cents: 6.7 } },
287
+ { provider: "fal", externalId: "fal-ai/nano-banana-pro", pricing: { unit: "image", cents: 8 } }
288
+ ]
289
+ },
290
+ // ── OpenAI image ────────────────────────────────────────────
291
+ "openai/gpt-image-2": {
292
+ id: "openai/gpt-image-2",
293
+ modality: "image",
294
+ routes: [
295
+ // Standard quality. (fal carries only the High tier — separate SKU below.)
296
+ { provider: "kunavo", externalId: "gpt-image-2", pricing: { unit: "image", cents: 6.33 } },
297
+ { provider: "runware", externalId: "openai:gpt-image@2", pricing: { unit: "image", cents: 9.375 } }
298
+ ]
299
+ },
300
+ "openai/gpt-image-2-high": {
301
+ id: "openai/gpt-image-2-high",
302
+ modality: "image",
303
+ routes: [
304
+ { provider: "fal", externalId: "fal-ai/gpt-image-2/high", pricing: { unit: "image", cents: 21 } }
305
+ ]
306
+ },
307
+ // ── Black Forest Labs FLUX (image) ──────────────────────────
308
+ // ⚠️ Prices need a provider-page audit before trusting the gap (same bar as
309
+ // the rest of this table). Runware figures are anchored to a sibling repo's
310
+ // measured cost (Schnell ~0.14¢, Kontext ~1.4¢ — Kontext ≈ 10× Schnell); the
311
+ // fal figures are list prices pending verification. Both are per 1MP image.
312
+ "bfl/flux-schnell": {
313
+ id: "bfl/flux-schnell",
314
+ modality: "image",
315
+ routes: [
316
+ { provider: "runware", externalId: "runware:100@1", pricing: { unit: "image", cents: 0.14 } },
317
+ { provider: "fal", externalId: "fal-ai/flux/schnell", pricing: { unit: "megapixel", cents: 0.3 }, note: "list price, verify" }
318
+ ]
319
+ },
320
+ "bfl/flux-kontext-dev": {
321
+ id: "bfl/flux-kontext-dev",
322
+ modality: "image",
323
+ routes: [
324
+ // Instruction-edit (i2i) model; restricted resolution set.
325
+ { provider: "runware", externalId: "runware:106@1", pricing: { unit: "image", cents: 1.4 } },
326
+ { provider: "fal", externalId: "fal-ai/flux-kontext/dev", pricing: { unit: "image", cents: 2.5 }, note: "list price, verify" }
327
+ ]
328
+ },
329
+ // ── Google video (Veo) ──────────────────────────────────────
330
+ // ⚠️ Version/SKU mismatch across providers: Kunavo bills "veo-3" per CALL
331
+ // (flat fee per clip; Veo 3 generates ~8s, audio/res tier unconfirmed); fal
332
+ // bills "veo3.1" per SECOND. Normalized to a 5s clip the per-call price wins
333
+ // by a wide margin — verify the clip's duration/resolution/audio before
334
+ // trusting the gap. See note fields.
335
+ "google/veo-3": {
336
+ id: "google/veo-3",
337
+ modality: "video",
338
+ routes: [
339
+ { provider: "kunavo", externalId: "veo-3", pricing: { unit: "call", cents: 32 }, note: "flat per clip (~8s, SKU unverified)" },
340
+ { provider: "fal", externalId: "fal-ai/veo3.1", pricing: { unit: "second", cents: 40 }, note: "veo3.1, 1080p audio-on (20\xA2/s audio-off)" }
341
+ ]
342
+ },
343
+ "google/veo-3-lite": {
344
+ id: "google/veo-3-lite",
345
+ modality: "video",
346
+ routes: [
347
+ { provider: "kunavo", externalId: "veo-3-lite", pricing: { unit: "call", cents: 16 }, note: "flat per clip (SKU unverified)" },
348
+ { provider: "fal", externalId: "fal-ai/veo3.1/lite", pricing: { unit: "second", cents: 8 }, note: "veo3.1 lite, 1080p audio-on" }
349
+ ]
350
+ },
351
+ "google/veo-3-quality": {
352
+ id: "google/veo-3-quality",
353
+ modality: "video",
354
+ routes: [
355
+ { provider: "kunavo", externalId: "veo-3-quality", pricing: { unit: "call", cents: 192 }, note: "flat per clip (SKU unverified)" }
356
+ ]
357
+ }
358
+ };
359
+
360
+ // src/adapters/kunavo-media.ts
361
+ var DEFAULT_BASE = "https://api.kunavo.com";
362
+ function extractImageUrls(body) {
363
+ const data = body?.data;
364
+ if (!Array.isArray(data)) return [];
365
+ return data.map((d) => d?.url).filter((u) => typeof u === "string" && u.length > 0);
366
+ }
367
+ function createKunavoMediaAdapter(config) {
368
+ const {
369
+ apiKey,
370
+ baseUrl = DEFAULT_BASE,
371
+ pollIntervalMs = 5e3,
372
+ pollTimeoutMs = 3e5,
373
+ fetchImpl = fetch
374
+ } = config;
375
+ const headers = {
376
+ "content-type": "application/json",
377
+ authorization: `Bearer ${apiKey}`
378
+ };
379
+ async function runImage(req) {
380
+ const res = await fetchImpl(`${baseUrl}/v1/images/generations`, {
381
+ method: "POST",
382
+ headers,
383
+ body: JSON.stringify({ model: req.externalId, ...req.input })
384
+ });
385
+ if (!res.ok) {
386
+ throw new KunavoMediaError(res.status, await safeText(res));
387
+ }
388
+ const body = await res.json();
389
+ const urls = extractImageUrls(body);
390
+ if (urls.length === 0) {
391
+ throw new Error(`ai-lcr: Kunavo returned no image URL for "${req.externalId}"`);
392
+ }
393
+ const outputs = urls.map((url) => ({ url, type: "image" }));
394
+ return { outputs };
395
+ }
396
+ async function runVideo(req) {
397
+ const submit = await fetchImpl(`${baseUrl}/v1/video/generations`, {
398
+ method: "POST",
399
+ headers,
400
+ body: JSON.stringify({ model: req.externalId, ...req.input })
401
+ });
402
+ if (!submit.ok) {
403
+ throw new KunavoMediaError(submit.status, await safeText(submit));
404
+ }
405
+ const submitBody = await submit.json();
406
+ const inlineUrls = extractImageUrls(submitBody);
407
+ if (inlineUrls.length > 0) {
408
+ return { outputs: inlineUrls.map((url) => ({ url, type: "video" })) };
409
+ }
410
+ const jobId = submitBody.id ?? submitBody.task_id ?? submitBody.request_id;
411
+ if (!jobId) {
412
+ throw new Error(
413
+ `ai-lcr: Kunavo video submit returned no job id (got keys: ${Object.keys(
414
+ submitBody
415
+ ).join(", ")})`
416
+ );
417
+ }
418
+ const deadline = Date.now() + pollTimeoutMs;
419
+ while (Date.now() < deadline) {
420
+ await sleep(pollIntervalMs);
421
+ const poll = await fetchImpl(`${baseUrl}/v1/video/generations/${jobId}`, {
422
+ headers
423
+ });
424
+ if (!poll.ok) {
425
+ throw new KunavoMediaError(poll.status, await safeText(poll));
426
+ }
427
+ const pollBody = await poll.json();
428
+ const status = String(pollBody.status ?? "").toLowerCase();
429
+ if (status === "succeeded" || status === "completed" || status === "success") {
430
+ const urls = extractImageUrls(pollBody);
431
+ const direct = pollBody.url;
432
+ const all = urls.length > 0 ? urls : direct ? [direct] : [];
433
+ if (all.length === 0) {
434
+ throw new Error(`ai-lcr: Kunavo video job ${jobId} finished with no URL`);
435
+ }
436
+ return { outputs: all.map((url) => ({ url, type: "video" })) };
437
+ }
438
+ if (status === "failed" || status === "error") {
439
+ throw new Error(
440
+ `ai-lcr: Kunavo video job ${jobId} failed: ${JSON.stringify(pollBody)}`
441
+ );
442
+ }
443
+ }
444
+ throw new Error(`ai-lcr: Kunavo video job ${jobId} timed out after ${pollTimeoutMs}ms`);
445
+ }
446
+ return {
447
+ provider: "kunavo",
448
+ async run(req) {
449
+ const isVideo = /(^|\/)veo/i.test(req.externalId);
450
+ return isVideo ? runVideo(req) : runImage(req);
451
+ }
452
+ };
453
+ }
454
+ var KunavoMediaError = class extends Error {
455
+ constructor(status, body) {
456
+ super(`Kunavo media HTTP ${status}: ${body.slice(0, 300)}`);
457
+ this.status = status;
458
+ this.name = "KunavoMediaError";
459
+ }
460
+ status;
461
+ };
462
+ function sleep(ms) {
463
+ return new Promise((r) => setTimeout(r, ms));
464
+ }
465
+ async function safeText(res) {
466
+ try {
467
+ return await res.text();
468
+ } catch {
469
+ return "<no body>";
470
+ }
471
+ }
472
+
473
+ // src/adapters/runware-media.ts
474
+ var DEFAULT_BASE2 = "https://api.runware.ai/v1";
475
+ function imageUrl(r) {
476
+ return r.imageURL || r.imageUrl || r.url;
477
+ }
478
+ function errorMessage(body) {
479
+ const errs = body.errors?.map((e) => e.message || e.code).filter(Boolean);
480
+ return errs?.join("; ") || body.error || body.message || "unknown";
481
+ }
482
+ function createRunwareMediaAdapter(config) {
483
+ const { apiKey, baseUrl = DEFAULT_BASE2, fetchImpl = fetch } = config;
484
+ return {
485
+ provider: "runware",
486
+ async run(req) {
487
+ const task = {
488
+ numberResults: 1,
489
+ outputType: "URL",
490
+ includeCost: true,
491
+ ...req.input,
492
+ taskType: "imageInference",
493
+ taskUUID: crypto.randomUUID(),
494
+ model: req.externalId
495
+ };
496
+ const res = await fetchImpl(baseUrl, {
497
+ method: "POST",
498
+ headers: {
499
+ "content-type": "application/json",
500
+ authorization: `Bearer ${apiKey}`,
501
+ accept: "application/json"
502
+ },
503
+ body: JSON.stringify([task])
504
+ });
505
+ let body;
506
+ try {
507
+ body = await res.json();
508
+ } catch {
509
+ body = {};
510
+ }
511
+ if (!res.ok || body.errors?.length || body.error) {
512
+ throw new RunwareMediaError(res.ok ? 502 : res.status, errorMessage(body));
513
+ }
514
+ const images = (body.data ?? []).filter((r) => imageUrl(r));
515
+ if (images.length === 0) {
516
+ throw new Error(`ai-lcr: Runware returned no image URL for "${req.externalId}"`);
517
+ }
518
+ const outputs = images.map((r) => ({ url: imageUrl(r), type: "image" }));
519
+ const costUsd = images.reduce((sum, r) => {
520
+ if (typeof r.cost !== "number") return sum;
521
+ return (sum ?? 0) + r.cost;
522
+ }, void 0);
523
+ return {
524
+ outputs,
525
+ units: images.length,
526
+ ...costUsd !== void 0 ? { costCents: costUsd * 100 } : {}
527
+ };
528
+ }
529
+ };
530
+ }
531
+ var RunwareMediaError = class extends Error {
532
+ constructor(status, body) {
533
+ super(`Runware media HTTP ${status}: ${body.slice(0, 300)}`);
534
+ this.status = status;
535
+ this.name = "RunwareMediaError";
536
+ }
537
+ status;
538
+ };
539
+
182
540
  // src/index.ts
183
541
  function isLanguageModel(entry) {
184
542
  return typeof entry.doGenerate === "function";
@@ -220,5 +578,15 @@ function createLCR(config) {
220
578
  };
221
579
  }
222
580
  export {
223
- createLCR
581
+ DEFAULT_REFERENCE,
582
+ MEDIA_PRICING,
583
+ cheapestRoute,
584
+ comparePrices,
585
+ createKunavoMediaAdapter,
586
+ createLCR,
587
+ createMediaLCR,
588
+ createRunwareMediaAdapter,
589
+ normalizedCents,
590
+ rankRoutes,
591
+ referenceMegapixels
224
592
  };
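Reviewer note: the media router added in this file tries routes cheapest-first, skips any route with no adapter, falls through on retryable errors, and estimates cost when the adapter does not report one. The sketch below is a simplified, self-contained restatement of that loop with stub adapters, not the package's code. `generateSketch`, `isRetryable`, and the stub adapters are hypothetical names; `isRetryable` is a status-only stand-in for the internal `isRetryableError`, which also matches message patterns.

```typescript
// Local stand-ins for the package's types; shapes are inferred from this diff.
type Route = { provider: string; externalId: string; refCents: number };
type Result = { outputs: { url: string; type: "image" | "video" }[]; costCents?: number; units?: number };
type Adapter = {
  provider: string;
  run(req: { externalId: string; input: Record<string, unknown> }): Promise<Result>;
};

// Hypothetical stand-in for the internal retry check (status codes only).
const RETRYABLE = new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: number } | null)?.status;
  return typeof status === "number" && RETRYABLE.has(status);
}

async function generateSketch(
  ranked: Route[], // assumed already sorted cheapest-first
  adapters: Record<string, Adapter>,
  input: Record<string, unknown>,
) {
  let lastErr: unknown;
  for (const route of ranked) {
    const adapter = adapters[route.provider];
    if (!adapter) continue; // a route with no adapter is skipped
    try {
      const result = await adapter.run({ externalId: route.externalId, input });
      const estimated = result.costCents === undefined;
      const costCents = estimated ? route.refCents * (result.units ?? 1) : (result.costCents as number);
      return { outputs: result.outputs, provider: route.provider, costCents, estimated };
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err)) throw err; // non-retryable errors stop the walk
    }
  }
  throw lastErr instanceof Error ? lastErr : new Error("no provider could serve the model");
}

// Routes for google/nano-banana-2 as listed in the bundled table.
const ranked: Route[] = [
  { provider: "kunavo", externalId: "nano-banana-2", refCents: 4.69 },
  { provider: "runware", externalId: "google:4@3", refCents: 6.9 },
  { provider: "fal", externalId: "fal-ai/nano-banana-2", refCents: 8 },
];

// Stub adapters: the cheapest provider is out of credit, the next one succeeds.
const adapters: Record<string, Adapter> = {
  kunavo: {
    provider: "kunavo",
    run: async () => { throw Object.assign(new Error("payment required"), { status: 402 }); },
  },
  runware: {
    provider: "runware",
    run: async () => ({ outputs: [{ url: "https://example.invalid/out.png", type: "image" }], costCents: 6.9, units: 1 }),
  },
};

const demo = generateSketch(ranked, adapters, { prompt: "a lighthouse at dusk" });
demo.then((r) => console.log(r.provider, r.costCents, r.estimated));
```

In this sketch the 402 from the first stub is treated as retryable, matching the status added to `RETRYABLE_STATUS` in this release, so the request falls over to the second provider and reports that provider's own cost rather than an estimate.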
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ai-lcr",
3
- "version": "0.0.1",
3
+ "version": "0.1.0",
4
4
  "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
5
5
  "keywords": [
6
6
  "ai",