npm - ai-lcr - Versions diffs - 0.0.1 → 0.2.0 - Mend

ai-lcr 0.0.1 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md CHANGED

@@ -51,7 +51,7 @@ const lcr = createLCR({
   models: {
     // One logical model, served cheapest-first across providers.
     "gemini-3-flash": [
-      { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
+      { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
       { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
     ],
   },
@@ -67,56 +67,155 @@ const { text } = await generateText({
 `cost` and `label` are optional — pass bare models (`kunavo("gemini-3-flash")`) if you don't need cost accounting or `autoSort`. `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tools, and agents.
+## Route to a model vendor's own API (native providers)
+A "provider" doesn't have to be an aggregator. A model vendor's **own official API** is just another entry in the list — often the cheapest, since there's no aggregator markup, and the least likely to silently break native features (prompt caching, tool calls). Any AI SDK provider package returns a standard model, so a vendor's native API and an OpenAI-compatible aggregator sit side by side in the same list:
+```ts
+import { createLCR } from "ai-lcr";
+import { createDeepSeek } from "@ai-sdk/deepseek";          // DeepSeek's own API
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    "deepseek-v4": [
+      // Official API first — no markup, full native features (caching, off-peak discounts).
+      { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
+      // Aggregator as a fallback for uptime + breadth.
+      { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
+    ],
+  },
+});
+```
+The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
 ## How it routes
 1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
-2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider, streaming-safe. Hard errors (400, 401, 403, 422) pass through immediately.
+2. **Fall through on failure.** On a retryable error — rate limit, 5xx, timeout, or a **billing cap** (402 / out-of-credit / quota) — it advances to the next provider, streaming-safe. A caller's own bad request (e.g. 400, 422) passes through immediately.
 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
 <p align="center">
   <img src="assets/ai-lcr-routing.svg" alt="routing diagram: cheapest first, fallback on failure, recover after idle" width="820">
 </p>
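The first two routing steps can be sketched in a few lines. This is an illustration only, not the `ai-lcr` implementation: the `Route` type, `HttpError`, `isHardError`, and `routeCall` are hypothetical names, ordering by the sum of input and output price is an assumption about how `autoSort` might rank, and provider calls are kept synchronous for brevity (real calls are async). Recovery after the idle window is left out.

```ts
// Illustrative types; these are not the ai-lcr API.
type Route = {
  label: string;
  cost: { input: number; output: number }; // USD per 1M tokens
  call: () => string; // real provider calls are async; sync here for brevity
};

class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Assumption: only a caller's own bad request (400, 422) is treated as hard.
function isHardError(err: unknown): boolean {
  return err instanceof HttpError && (err.status === 400 || err.status === 422);
}

// Step 1: order routes cheapest-first (assumed here: input + output price).
function sortCheapestFirst(routes: Route[]): Route[] {
  return [...routes].sort(
    (a, b) => a.cost.input + a.cost.output - (b.cost.input + b.cost.output),
  );
}

// Step 2: try each route in order, falling through on retryable errors.
function routeCall(routes: Route[]): { winner: string; text: string; failed: string[] } {
  const failed: string[] = [];
  for (const route of sortCheapestFirst(routes)) {
    try {
      return { winner: route.label, text: route.call(), failed };
    } catch (err) {
      if (isHardError(err)) throw err; // surface bad requests immediately
      failed.push(route.label); // remember the hop, then try the next route
    }
  }
  throw new Error(`all providers failed: ${failed.join(", ")}`);
}

// The cheaper route returns a 502, so the call falls through to the next one.
const result = routeCall([
  { label: "openrouter", cost: { input: 0.5, output: 3.0 }, call: () => "hello" },
  {
    label: "kunavo",
    cost: { input: 0.4, output: 2.4 },
    call: () => {
      throw new HttpError(502);
    },
  },
]);
console.log(result.winner, result.failed); // openrouter [ 'kunavo' ]
```

The list is sorted on a copy, so the caller's configured order is left untouched.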
+## See what happened (`onCall`)
+`onError` and `onCost` fire separately, with nothing correlating them, so a failover is hard to read after the fact. `onCall` gives you **one record per request** (the full chain, the winner, the reason for each failed hop, latency, and cost), and `formatCallRecord` turns it into a one-liner you can scan:
+```ts
+import { createLCR, formatCallRecord } from "ai-lcr";
+const lcr = createLCR({
+  models: { /* … */ },
+  onCall: (record) => console.log(formatCallRecord(record)),
+});
+```
+```text
+✓ text  tokenmart                      412ms  $0.0003
+⚠ text  tokenmart→openrouter           910ms  $0.0004  ⤷ tokenmart 502
+✗ text  deepseek→tokenmart→openrouter  1240ms FAILED   ⤷ deepseek 401, tokenmart 502, openrouter 429
+```
+`✓` served on the first try · `⚠` failed over but recovered · `✗` every provider failed. The `⤷` shows which provider died and why.
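As a rough sketch of how the three markers could be derived from a record, the snippet below works on a subset of the `CallRecord` fields shown in this README. The helper names (`statusGlyph`, `chain`, `failures`) are hypothetical, and treating `errorClass` as a short code such as `"502"` is an assumption; this is not how `formatCallRecord` is necessarily implemented.

```ts
// Subset of the CallRecord fields documented in this README.
type Attempt = { provider: string; ok: boolean; latencyMs: number; errorClass?: string };
type RecordLike = { attempts: Attempt[]; ok: boolean; failedOver: boolean };

// ✗ when every provider failed, ⚠ when a later provider recovered, ✓ otherwise.
function statusGlyph(r: RecordLike): string {
  if (!r.ok) return "✗";
  return r.failedOver ? "⚠" : "✓";
}

// Providers in the order they were tried, e.g. "tokenmart→openrouter".
function chain(r: RecordLike): string {
  return r.attempts.map((a) => a.provider).join("→");
}

// Failed hops with their error class (assumed to be a short code like "502").
function failures(r: RecordLike): string {
  return r.attempts
    .filter((a) => !a.ok)
    .map((a) => `${a.provider} ${a.errorClass ?? "error"}`)
    .join(", ");
}

const sample: RecordLike = {
  ok: true,
  failedOver: true,
  attempts: [
    { provider: "tokenmart", ok: false, latencyMs: 300, errorClass: "502" },
    { provider: "openrouter", ok: true, latencyMs: 610 },
  ],
};
console.log(`${statusGlyph(sample)} ${chain(sample)}  ⤷ ${failures(sample)}`);
```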
+**Persist it anywhere — zero lock-in.** `record` is a plain `CallRecord` object. Log the JSON and point any log drain at it (Axiom, Datadog, your own DB); ai-lcr never decides where it goes:
+```ts
+onCall: (record) => console.log(JSON.stringify(record)),
+```
+Or ship each record to an HTTP collector with the built-in `createHttpSink` (fire-and-forget, never throws, dashboard-agnostic):
+```ts
+import { createLCR, createHttpSink } from "ai-lcr";
+import { after } from "next/server"; // serverless: don't block the response
+const lcr = createLCR({
+  models: { /* … */ },
+  onCall: createHttpSink({
+    url: `${process.env.LCR_INGEST_URL}/api/ingest`,
+    headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+    project: process.env.LCR_PROJECT, // optional tag if one collector serves several apps
+    dispatch: after,                  // run after the response is sent (serverless-safe)
+  }),
+});
+```
+Point `url` at anything that accepts the `CallRecord` JSON — including the self-hostable companion dashboard, **[ai-lcr-dashboard](https://github.com/victorzhrn/ai-lcr-dashboard)** (Spend / Calls / Failover rate + a live failover feed). You run your own instance, so the data never leaves your infrastructure; a [db9](https://db9.ai) database can be provisioned in seconds if you don't want to stand one up yourself.
+```ts
+interface CallRecord {
+  id: string;                // correlation id, one per request
+  model: string;             // logical model name
+  attempts: { provider: string; ok: boolean; latencyMs: number; errorClass?: string }[];
+  winner?: string;           // provider that served; undefined if all failed
+  ok: boolean;
+  failedOver: boolean;       // more than one provider was tried
+  latencyMs: number;
+  inputTokens: number;
+  outputTokens: number;
+  costUsd: number;            // what the winner charged for these tokens
+  baselineUsd: number;        // what the priciest configured route would cost → savings = baselineUsd - costUsd
+}
+```
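The arithmetic behind `costUsd` and `baselineUsd` can be sketched as follows, assuming prices are quoted in USD per 1M tokens as in the `cost` objects above. The function names are illustrative and the token counts are made up for the example; the library may compute these fields differently.

```ts
type Price = { input: number; output: number }; // USD per 1M tokens

// Cost of one call at a given price.
function costUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// What the priciest configured route would have charged for the same tokens.
function baselineUsd(prices: Price[], inputTokens: number, outputTokens: number): number {
  return Math.max(...prices.map((p) => costUsd(p, inputTokens, outputTokens)));
}

const kunavo: Price = { input: 0.4, output: 2.4 };
const openrouter: Price = { input: 0.5, output: 3.0 };

// Example call: 1,000 input tokens and 500 output tokens, served by the cheaper route.
const winnerCost = costUsd(kunavo, 1000, 500); // ≈ 0.0016
const baseline = baselineUsd([kunavo, openrouter], 1000, 500); // ≈ 0.0020
const savings = baseline - winnerCost; // ≈ 0.0004
console.log({ winnerCost, baseline, savings });
```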
 ## Supported providers
-Any OpenAI-compatible endpoint works.
+Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
-- **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
-- **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing on the roadmap
+- **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
+- **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
+- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — image routing available via `createMediaLCR` (Kunavo + Runware adapters); video on the roadmap
 ## Text model pricing
-USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
+USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 20% off the official rate. TokenMart prices vary by model (15–65% off list) — verify current rates at [thetokenmart.ai](https://thetokenmart.ai).
+| Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
+|---|---|---|---|---|---|
+| Gemini 3 Flash | $0.50 / $3.00 | no discount | −20% | — | ⭐ Kunavo |
+| Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −20% | −20% → **$1.60 / $9.60** | ⭐ Kunavo |
+| Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −20% | — | ⭐ Kunavo |
+| Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −20% | — | ⭐ Kunavo |
+| Claude Opus 4.7 | $15.00 / $75.00 | no discount | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
+| Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
+| Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −20% | — | ⭐ Kunavo |
+| DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | — | ⭐ DeepSeek (official) |
-| Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
-|---|---|---|---|---|
-| Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
-| Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
-| Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
-| Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
-| Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
-| Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
-| DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
+Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to their **own official APIs** (cheapest, full native features) with OpenRouter as a broad fallback — one config can mix native vendors and aggregators.
-Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter — one config can mix them all.
+> **Note:** List price ≠ effective price, so always verify with the [probe](#vetting-a-provider-capability--cost-probe). As of 2026-05-28, Kunavo token counts are clean for both Gemini (~1.1–1.4×) and Claude (~1.0×). Remaining caveats: `max_tokens` is still ignored on both model families, and hidden-prompt injection appears intermittently for Claude, so re-probe before routing in production. Effective cost is why `ai-lcr` should rank by measured behavior, not the sticker price.
+> **Note:** TokenMart token counts are also verified clean (same backend as Inference.ai, all checks passed 2026-05-27: tool calls, `max_tokens`, no injection, token ~1.0×, prompt caching) — a reliable second provider for Claude at −15% list. Re-probe before routing in production.
 ## Image model pricing
-USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 30% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
-| Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
-|---|---|---|---|---|
-| Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
-| Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
-| GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
-| Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
-| Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
-| Seedream 4 | $0.030 | — | — | ⭐ fal |
-| Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
-| Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
-| Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
-| Qwen-Image | — | $0.0038 | — | ⭐ Runware |
-| FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
+USD per image, as of 2026-05 (provider list / retail; verify current rates). Kunavo is 20% off official. fal and Runware are compute providers — `ai-lcr` picks the cheapest per model (⭐).
+| Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | Cheapest |
+|---|---|---|---|---|---|
+| Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
+| Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
+| GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
+| Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
+| Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
+| Seedream 4 | $0.030 | — | — | — | ⭐ fal |
+| Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
+| Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
+| Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
+| Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
+| FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
 ## Video model pricing
@@ -134,19 +233,71 @@ USD per second, as of 2026-05 — verify current rates. Video billing differs by
 | Seedance Pro | $0.124 |
 | Veo 3.1 (audio-on) | $0.400 |
+## Vetting a provider (capability + cost probe)
+A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
+- **tool calling** — single call and a multi-step round-trip with `content: null` (the shape every agent loop sends)
+- **`max_tokens` honored** — caps must bound output
+- **hidden-prompt injection** — sends a neutral message; flags the provider if the model starts reacting to a system prompt it was never given
+- **token over-counting** — compares reported `prompt_tokens` against a trusted baseline provider; >1.5× means the bill is inflated and the "discount" may be a loss
+- **prompt caching** — whether `cache_control` actually produces a `cache_read` on repeats
+```bash
+# point it at the provider you're vetting; models are generic numbered slots
+# (works for Gemini, Claude, GPT, Llama, …). Add a per-model REF_n on a trusted
+# baseline (e.g. OpenRouter) to enable the token-inflation check. CACHE_MODEL
+# (optional) runs the Anthropic-native /v1/messages prompt-caching test.
+API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
+  MODEL_1=gemini-3-flash    REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
+  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
+  bash scripts/check-provider.sh
+# TokenMart uses vendor-prefixed model IDs
+API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
+  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
+  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
+  bash scripts/check-provider.sh
+```
+A `FAIL` on injection or token over-counting means that provider is **not** a safe least-cost target for that model — keep it off that model's cheapest-first list until it's fixed, then re-probe.
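Why token over-counting can erase a discount comes down to one multiplication. The sketch below is illustrative only: `effectiveMultiplier` and `breakEvenRatio` are hypothetical helpers, and the model assumes that billed tokens scale with the reported count while the per-token price stays at the discounted rate.

```ts
// discount: fraction off list price (0.20 for "20% off").
// tokenRatio: provider-reported prompt_tokens divided by the baseline's count.
function effectiveMultiplier(discount: number, tokenRatio: number): number {
  return (1 - discount) * tokenRatio; // 1.0 means "same as list price"
}

// Token ratio at which the discount is exactly cancelled out.
function breakEvenRatio(discount: number): number {
  return 1 / (1 - discount);
}

console.log(effectiveMultiplier(0.2, 1.0)); // ≈ 0.8: the full 20% saving
console.log(effectiveMultiplier(0.2, 1.5)); // ≈ 1.2: 20% above list price
console.log(breakEvenRatio(0.2)); // ≈ 1.25
```

Under these assumptions a 20% discount is fully cancelled at a 1.25× token ratio, so a provider sitting at the probe's 1.5× threshold would cost about 20% more than list.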
+### Trust matrix (probed 2026-05-27)
+Two OpenAI-compatible providers, same probe, same day. Cells cover both families (G = Gemini, C = Claude).
+| Check | Kunavo | [TokenMart](https://thetokenmart.ai) |
+|---|---|---|
+| Tool calls (single + multi-step `content: null`) | G ⚠️ intermittent¹ · C ✅ | ✅ both |
+| Token count vs OpenRouter baseline | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ both ~1.0× |
+| Hidden-prompt injection | G ✅ none · C ❌ intermittent² | ✅ none |
+| `max_tokens` honored | ❌ ignored (both) | ✅ both |
+| Prompt caching (`cache_control`) | C ❌ not applied (endpoint also hung mid-probe) | C ✅ `cache_read` > 0 |
+¹ Kunavo Gemini returned a clean tool call on one run and **dropped tools entirely** on the next identical request — not a stable pass.
+² Kunavo Claude reacted to a phantom "fake system prompt" on one run and stayed clean on another — the injection is intermittent, not removed.
+**Verdict:** TokenMart passes every check on both Gemini and Claude with stable, repeatable results — route freely. Kunavo: token counts are now clean for Claude (re-probed 2026-05-28); at −20% list, Kunavo is the cheapest option for Claude. Remaining caveats: `max_tokens` is ignored on both models, hidden-prompt injection appears intermittently for Claude, and Gemini drops tool calls intermittently — re-probe before routing a new model in production.
 ## Roadmap
 - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
 - [x] Real per-call cost accounting (`onCost`)
+- [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
 - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
+- [x] Offline capability + cost check (`scripts/check-provider.sh`) → per-model trust matrix
 - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
-- [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
-- [ ] Offline capability probe (tool-calling / caching / streaming) → trust matrix
+- [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
+- [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
 - [ ] Image & video model routing (fal.ai / Runware / Kunavo)
 ## Affiliate disclosure
-`ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**, which — at 30% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
+`ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=victorimf)**, which — at 20% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
 ## Development

package/README.zh-CN.md CHANGED

@@ -51,7 +51,7 @@ const lcr = createLCR({
   models: {
     // 一个逻辑模型，跨多个 provider 最便宜优先地提供服务。
     "gemini-3-flash": [
-      { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
+      { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.40, output: 2.40 } },
       { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
     ],
   },
@@ -67,6 +67,37 @@ const { text } = await generateText({
 `cost` 和 `label` 都是可选的——如果你不需要成本核算或 `autoSort`，可以直接传裸模型（`kunavo("gemini-3-flash")`）。`lcr("gemini-3-flash")` 返回一个标准的 AI SDK 模型，因此可与 `generateText`、`streamText`、`generateObject`、工具调用和 agent 一起使用。
+## 直连模型厂商官方 API（原生 provider）
+「provider」不一定是聚合器。模型厂商**自己的官方 API** 就是列表里的又一个 entry——往往是最便宜的那个（没有聚合器加价），也最不容易悄悄破坏原生特性（prompt 缓存、工具调用）。任何 AI SDK 的 provider 包都返回标准模型，所以厂商的原生 API 和 OpenAI 兼容的聚合器可以并排放在同一个列表里：
+```ts
+import { createLCR } from "ai-lcr";
+import { createDeepSeek } from "@ai-sdk/deepseek";          // DeepSeek 官方 API
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+const deepseek = createDeepSeek({ apiKey: process.env.DEEPSEEK_API_KEY });
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    "deepseek-v4": [
+      // 官方 API 优先——无加价，原生特性齐全（缓存、错峰折扣）。
+      { model: deepseek("deepseek-chat"), label: "deepseek", cost: { input: 0.43, output: 0.87 } },
+      // 聚合器作为兜底，保可用性 + 广覆盖。
+      { model: openrouter("deepseek/deepseek-v4"), label: "openrouter", cost: { input: 0.43, output: 0.87 } },
+    ],
+  },
+});
+```
+同样的模式适用于任何厂商的原生 SDK provider——`@ai-sdk/anthropic`、`@ai-sdk/google`、`@ai-sdk/openai`、`@ai-sdk/xai` 等等。它们都返回 `LanguageModelV3`，所以你可以在一个模型的列表里把厂商原生 API 和聚合器混着用。原生 API 覆盖窄（只有该厂商自己的模型）但特性全；聚合器覆盖广。**官方优先 + 聚合器兜底** 正是 LCR 最自然的形态。
 ## 它如何路由
 1. **最便宜优先。** provider 按顺序依次尝试——把它们排成最便宜优先，或设置 `autoSort: true` 让它按 `cost` 自动排序。
@@ -79,44 +110,50 @@ const { text } = await generateText({
 ## 支持的 provider
-任何 OpenAI 兼容的 endpoint 都可用。
+任何 OpenAI 兼容的 endpoint 都可用——任何 AI SDK 的 provider 包也都可用，包括模型厂商自己的官方 API。
-- **文本：** [OpenRouter](https://openrouter.ai)（覆盖最广，列表定价）· [Kunavo](https://kunavo.com/?ref=hJ2uT3iW)（**全模型 7 折**）
-- **图像 / 视频：** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW)（**7 折**）· [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 路由功能在路线图中
+- **模型厂商官方 API（原生）：** 通过各自的 AI SDK provider 包直连 [DeepSeek](https://platform.deepseek.com)、[OpenAI](https://openai.com)、[Anthropic](https://anthropic.com)、[Google](https://ai.google.dev)、[xAI](https://x.ai) 等——无加价，原生特性齐全。见上方「直连模型厂商官方 API（原生 provider）」一节。
+- **文本聚合器：** [OpenRouter](https://openrouter.ai)（覆盖最广，列表定价）· [Kunavo](https://kunavo.com/?ref=victorimf)（**全模型 8 折**）· [TokenMart](https://thetokenmart.ai)（按模型 85 折–35 折不等）
+- **图像 / 视频：** [Kunavo](https://kunavo.com/?ref=victorimf)（**8 折**）· [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —— 路由功能在路线图中
 ## 文本模型价格
-单位为每 100 万 token 的美元价格，input / output。官方价格截至 2026-05——请向各 provider 核对当前价格。OpenRouter 直接透传列表价；Kunavo 在官方价基础上统一 7 折。
+单位为每 100 万 token 的美元价格，input / output。官方价格截至 2026-05——请向各 provider 核对当前价格。OpenRouter 直接透传列表价；Kunavo 在官方价基础上统一 8 折。TokenMart 折扣按模型不同（85 折–35 折），请在 [thetokenmart.ai](https://thetokenmart.ai) 核对当前价格。
+| 模型 | 官方价（in / out） | OpenRouter | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | 最便宜 |
+|---|---|---|---|---|---|
+| Gemini 3 Flash | $0.50 / $3.00 | 无折扣 | −20% | — | ⭐ Kunavo |
+| Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | 无折扣 | −20% | −20% → **$1.60 / $9.60** | ⭐ Kunavo |
+| Gemini 2.5 Pro | $1.25 / $10.00 | 无折扣 | −20% | — | ⭐ Kunavo |
+| Gemini 2.5 Flash | $0.30 / $2.50 | 无折扣 | −20% | — | ⭐ Kunavo |
+| Claude Opus 4.7 | $15.00 / $75.00 | 无折扣 | −20% | **$4.25 / $21.25** | ⭐ TokenMart |
+| Claude Sonnet 4.6 | $3.00 / $15.00 | 无折扣 | −20% | −15% → **$2.55 / $12.75** | ⭐ Kunavo |
+| Claude Haiku 4.5 | $1.00 / $5.00 | 无折扣 | −20% | — | ⭐ Kunavo |
+| DeepSeek V4 | $0.43 / $0.87 | 无折扣 | 未提供 | — | ⭐ DeepSeek（官方） |
+Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到各自的**官方 API**（最便宜，原生特性齐全），以 OpenRouter 作为广覆盖兜底——一份配置即可混用原生厂商与聚合器。
-| 模型 | 官方价（in / out） | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | 最便宜 |
-|---|---|---|---|---|
-| Gemini 3 Flash | $0.50 / $3.00 | 无折扣 | −30% | ⭐ Kunavo |
-| Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | 无折扣 | −30% | ⭐ Kunavo |
-| Gemini 2.5 Pro | $1.25 / $10.00 | 无折扣 | −30% | ⭐ Kunavo |
-| Gemini 2.5 Flash | $0.30 / $2.50 | 无折扣 | −30% | ⭐ Kunavo |
-| Claude Sonnet 4.6 | $3.00 / $15.00 | 无折扣 | −30% | ⭐ Kunavo |
-| Claude Haiku 4.5 | $1.00 / $5.00 | 无折扣 | −30% | ⭐ Kunavo |
-| DeepSeek V4 | $0.43 / $0.87 | 无折扣 | 未提供 | ⭐ OpenRouter |
+> **注：** list 价 ≠ 有效价——请始终用 [probe](#给-provider-做体检能力--成本探测) 验证。截至 2026-05-28，Kunavo 在 Gemini（~1.1–1.4×）和 Claude（~1.0×）两条路上的 token 计数均已干净。现存问题：两个模型均忽略 `max_tokens`，Claude 隐藏 prompt 注入仍为间歇性——生产路由前请重新 probe。
-Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到 OpenRouter——一份配置即可混用全部。
+> **注：** TokenMart token 计数同样经 probe 验证干净（后端与 Inference.ai 相同，2026-05-27 全项通过：工具调用、`max_tokens`、无注入、token ~1.0×、prompt 缓存）——如需 Claude 的第二 provider，TokenMart 是可靠备选。生产路由前请重新 probe 确认。
 ## 图像模型价格
-单位为每张图的美元价格，截至 2026-05（provider 列表价 / 零售价；请核对当前价格）。Kunavo 为官方价 7 折。fal 与 Runware 是算力 provider——`ai-lcr` 为每个模型挑选最便宜的那个（⭐）。
-| 模型 | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | 最便宜 |
-|---|---|---|---|---|
-| Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
-| Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
-| GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
-| Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
-| Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
-| Seedream 4 | $0.030 | — | — | ⭐ fal |
-| Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
-| Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
-| Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
-| Qwen-Image | — | $0.0038 | — | ⭐ Runware |
-| FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
+单位为每张图的美元价格，截至 2026-05（provider 列表价 / 零售价；请核对当前价格）。Kunavo 为官方价 8 折。fal 与 Runware 是算力 provider——`ai-lcr` 为每个模型挑选最便宜的那个（⭐）。
+| 模型 | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=victorimf) | [TokenMart](https://thetokenmart.ai) | 最便宜 |
+|---|---|---|---|---|---|
+| Nano Banana 2 | $0.080 | $0.069 | $0.054 | **$0.050** | ⭐ TokenMart |
+| Nano Banana Pro | $0.080 | — | $0.107 | — | ⭐ fal |
+| GPT-Image-2 | $0.210 | $0.094 | $0.102 | — | ⭐ Runware |
+| Imagen 4 Ultra | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
+| Ideogram V3 | $0.060 | $0.060 | — | — | ⭐ fal / Runware |
+| Seedream 4 | $0.030 | — | — | — | ⭐ fal |
+| Flux 1.1 Pro | $0.040 | $0.040 | — | — | ⭐ fal / Runware |
+| Flux Dev | $0.025 | $0.025 | — | — | ⭐ fal / Runware |
+| Flux Schnell | $0.0030 | $0.0013 | — | — | ⭐ Runware |
+| Qwen-Image | — | $0.0038 | — | — | ⭐ Runware |
+| FLUX.2 Klein 4B | — | $0.0006 | — | — | ⭐ Runware |
 ## 视频模型价格
@@ -134,19 +171,69 @@ Kunavo 提供 Anthropic + Google。DeepSeek / OpenAI / Grok / Mistral 路由到
 | Seedance Pro | $0.124 |
 | Veo 3.1（audio-on） | $0.400 |
+## 给 provider 做体检（能力 + 成本探测）
+折扣再大，如果 provider 偷偷破坏了协议就一文不值。`ai-lcr` 自带一个零依赖的检查脚本（`scripts/check-provider.sh`，只需 `bash` + `curl` + `python3`），**逐模型**核查那些真正会让你多花钱或污染输出的点：
+- **工具调用** —— 单次调用 + 带 `content: null` 的多步 round-trip（每个 agent 循环都会发的形态）
+- **`max_tokens` 是否生效** —— cap 必须能限制输出长度
+- **隐藏 prompt 注入** —— 发一条中性消息，如果模型开始回应一段它从没收到过的 system prompt，就说明 provider 注入了东西
+- **token 超计** —— 把上报的 `prompt_tokens` 和一个可信基线 provider 对照，>1.5× 说明账单被灌水、"折扣"可能是亏本
+- **prompt 缓存** —— `cache_control` 在重复请求时是否真的产生 `cache_read`
+```bash
+# 指向你要体检的 provider；模型用通用编号槽位（Gemini / Claude / GPT / Llama 都行）。
+# 给某个模型配上 REF_n（可信基线上的对应模型 id）即可启用 token 超计检查。
+# CACHE_MODEL（可选）跑 Anthropic 原生 /v1/messages 的 prompt 缓存测试。
+API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
+  MODEL_1=gemini-3-flash    REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
+  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
+  bash scripts/check-provider.sh
+# TokenMart 使用 vendor 前缀的模型 ID
+API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
+  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
+  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
+  bash scripts/check-provider.sh
+```
+注入或 token 超计这两项 `FAIL`，意味着该 provider 对那个模型来说**不是**安全的最低成本目标——在它修好之前，别把它放进那个模型的「最便宜优先」列表，修好后重新探测。
+### 信任矩阵（探测于 2026-05-27）
+两个 OpenAI 兼容 provider，同一脚本，同一天。单元格覆盖两个家族（G = Gemini，C = Claude）。
+| 检查项 | Kunavo | [TokenMart](https://thetokenmart.ai) |
+|---|---|---|
+| 工具调用（单次 + 多步 `content: null`） | G ⚠️ 间歇性¹ · C ✅ | ✅ 两者 |
+| token 计数 vs OpenRouter 基线 | G ✅ ~1.1–1.4× · C ✅ ~1.0× | ✅ 两者 ~1.0× |
+| 隐藏 prompt 注入 | G ✅ 无 · C ❌ 间歇性² | ✅ 无 |
+| `max_tokens` 是否生效 | ❌ 被忽略（两者） | ✅ 两者 |
+| prompt 缓存（`cache_control`） | C ❌ 未生效（探测中途 endpoint 还卡死） | C ✅ `cache_read` > 0 |
+¹ Kunavo Gemini 一次返回干净的工具调用，下一次相同请求却**完全丢掉了 tools**——不是稳定通过。
+² Kunavo Claude 一次对着幻觉中的"fake system prompt"作出反应，另一次又干净——注入是间歇性的，不是被移除了。
+**结论：** TokenMart 在 Gemini 和 Claude 两条路上每一项都通过，且结果稳定可复现——可以放心路由。Kunavo：Claude token 计数已干净（2026-05-28 重新 probe），按 8 折 list 价，Kunavo 现在是 Claude 模型的最便宜选择。现存问题：两个模型均忽略 `max_tokens`、Claude 隐藏 prompt 注入仍为间歇性、Gemini 也会间歇性丢工具调用——用新模型前先重新探测。
 ## 路线图
 - [x] 自有 failover 引擎——最便宜优先路由 + 流式安全的 fallback，不依赖外部路由库
 - [x] 真实的逐次调用成本核算（`onCost`）
 - [x] 基于各 provider `cost` 的自动最便宜优先排序（`autoSort`）
+- [x] 离线能力 + 成本检查（`scripts/check-provider.sh`）→ 逐模型信任矩阵
 - [ ] 内置价格表，实现零配置定价（省去手填 `cost` 数字）
-- [ ] provider 怪癖中间件（透明地修补已知的各 provider 请求怪癖）
-- [ ] 离线能力探测（工具调用 / 缓存 / 流式）→ 信任矩阵
+- [ ] provider 怪癖中间件（透明地修补已知怪癖，如 Kunavo 被忽略的 `max_tokens`）
+- [ ] 把 probe 结果自动接入路由（探测失败的 provider×model 自动从列表剔除）
 - [ ] 图像与视频模型路由（fal.ai / Runware / Kunavo）
 ## 联盟（Affiliate）披露
-`ai-lcr` 是 provider 中立的，可与任何 OpenAI 兼容的 endpoint 配合使用。作者与 **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)** 之间存在联盟（affiliate）关系——在官方价 7 折的情况下，它往往（但并非总是）是最便宜的选项，正如上面的表格所示。通过该链接注册可能会让作者获得一份分成。你完全不必使用它；自带 provider，路由功能照常工作。
+`ai-lcr` 是 provider 中立的，可与任何 OpenAI 兼容的 endpoint 配合使用。作者与 **[Kunavo](https://kunavo.com/?ref=victorimf)** 之间存在联盟（affiliate）关系——在官方价 8 折的情况下，它往往（但并非总是）是最便宜的选项，正如上面的表格所示。通过该链接注册可能会让作者获得一份分成。你完全不必使用它；自带 provider，路由功能照常工作。
 ## 开发