ai-lcr 0.0.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Victor
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,167 @@
1
+ # AI-LCR — AI Least Cost Routing
2
+
3
+ <p align="center">
4
+ <b>English</b> · <a href="./README.zh-CN.md">简体中文</a>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <b>Automatic least-cost routing for LLM calls. One line to cut your AI bill.</b>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <a href="https://www.npmjs.com/package/ai-lcr"><img src="https://img.shields.io/npm/v/ai-lcr.svg" alt="npm version"/></a>
13
+ <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT license"/>
14
+ <a href="https://ai-sdk.dev"><img src="https://img.shields.io/badge/built%20for-Vercel%20AI%20SDK-black?logo=vercel&logoColor=white" alt="built for Vercel AI SDK"/></a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img src="assets/ai-lcr-hero.svg" alt="ai-lcr routes each model to its own cheapest provider — Gemini to Kunavo, DeepSeek to OpenRouter, Seedream to fal, Flux Schnell to Runware — and falls back on failure" width="820">
19
+ </p>
20
+
21
+ The same model costs different amounts on different providers — and no single provider is cheapest for everything. `ai-lcr` keeps a cheapest-first list per model, routes to the cheapest healthy one (⭐ below), and falls through on failure — the way phone carriers have done [Least Cost Routing](https://en.wikipedia.org/wiki/Least-cost_routing) for decades.
22
+
23
+ ## Install
24
+
25
+ ```bash
26
+ npm install ai-lcr
27
+ ```
28
+
29
+ `ai` (the Vercel AI SDK) is a peer dependency.
30
+
31
+ ## Quick start
32
+
33
+ ```ts
34
+ import { createLCR } from "ai-lcr";
35
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
36
+ import { generateText } from "ai";
37
+
38
+ const kunavo = createOpenAICompatible({
39
+ name: "kunavo",
40
+ baseURL: "https://api.kunavo.com/v1",
41
+ apiKey: process.env.KUNAVO_API_KEY,
42
+ });
43
+ const openrouter = createOpenAICompatible({
44
+ name: "openrouter",
45
+ baseURL: "https://openrouter.ai/api/v1",
46
+ apiKey: process.env.OPENROUTER_API_KEY,
47
+ });
48
+
49
+ const lcr = createLCR({
50
+ autoSort: true, // sort each model's providers cheapest-first by `cost`
51
+ models: {
52
+ // One logical model, served cheapest-first across providers.
53
+ "gemini-3-flash": [
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
55
+ { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
+ ],
57
+ },
58
+ // See exactly what each call cost, on whichever provider served it.
59
+ onCost: ({ provider, costUsd }) => console.log(`${provider}: $${costUsd.toFixed(6)}`),
60
+ });
61
+
62
+ const { text } = await generateText({
63
+ model: lcr("gemini-3-flash"),
64
+ prompt: "Explain Least Cost Routing in one sentence.",
65
+ });
66
+ ```
67
+
68
+ `cost` and `label` are optional — pass bare models (`kunavo("gemini-3-flash")`) if you don't need cost accounting or `autoSort`. `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tools, and agents.
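The `autoSort` ordering can be sketched as follows. This is a minimal illustration that follows the `priceKey` helper visible in `dist/index.js` (sort key = input price + output price, unpriced entries last); the `Cost` and `Entry` types and the `sortCheapestFirst` name are hypothetical stand-ins, not the package's exported API.

```ts
// Hypothetical types for illustration only.
interface Cost { input: number; output: number } // USD per 1M tokens
interface Entry { label: string; cost?: Cost }

// Sort key: input + output price. Entries without a cost sort last.
function priceKey(entry: Entry): number {
  return entry.cost ? entry.cost.input + entry.cost.output : Number.POSITIVE_INFINITY;
}

// Return a new array ordered cheapest-first; the input array is left untouched.
function sortCheapestFirst(entries: Entry[]): Entry[] {
  return [...entries].sort((a, b) => priceKey(a) - priceKey(b));
}

const ordered = sortCheapestFirst([
  { label: "unpriced" },
  { label: "openrouter", cost: { input: 0.5, output: 3.0 } },
  { label: "kunavo", cost: { input: 0.35, output: 2.1 } },
]);
console.log(ordered.map((e) => e.label)); // [ 'kunavo', 'openrouter', 'unpriced' ]
```

Weighting input and output prices equally is a simplification: a workload dominated by output tokens could rank providers differently.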
69
+
70
+ ## How it routes
71
+
72
+ 1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
73
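+ 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider. Streams fail over only before the first chunk is emitted. Hard request errors (400, 422) pass through immediately. The shipped `RETRYABLE_STATUS` set also lists 401 and 403, so an auth failure on one provider falls through to the next rather than passing through.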
74
+ 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
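The first two steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the shipped engine: `Provider`, `HttpError`, `isRetryable`, and `routeCall` are hypothetical names, the classifier is deliberately simpler than the one in `dist/index.js`, and calls are synchronous for brevity where real provider calls are asynchronous.

```ts
// Hypothetical provider shape for illustration only.
interface Provider<T> {
  label: string;
  call: () => T; // synchronous here for brevity
}

class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Simplified classifier: request timeouts, rate limits, and 5xx fall through.
// Assumption: the shipped classifier covers more statuses and message patterns.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: number } | null)?.status;
  return typeof status === "number" && (status === 408 || status === 429 || status >= 500);
}

// Try providers in order; advance on retryable errors, rethrow hard errors at once.
function routeCall<T>(
  providers: Provider<T>[],
  onError?: (error: unknown, label: string) => void,
): { served: string; value: T } {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return { served: provider.label, value: provider.call() };
    } catch (error) {
      if (!isRetryable(error)) throw error; // e.g. a 400: another provider would not help
      onError?.(error, provider.label);
      lastError = error;
    }
  }
  throw lastError; // every provider failed with a retryable error
}

const result = routeCall<string>([
  { label: "cheap", call: () => { throw new HttpError(429, "rate limit"); } },
  { label: "pricier", call: () => "ok" },
]);
console.log(result.served); // pricier
```

The recovery step is not shown here; it only needs a timestamp of the last request and a comparison against `resetIntervalMs`.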
75
+
76
+ <p align="center">
77
+ <img src="assets/ai-lcr-routing.svg" alt="routing diagram: cheapest first, fallback on failure, recover after idle" width="820">
78
+ </p>
79
+
80
+ ## Supported providers
81
+
82
+ Any OpenAI-compatible endpoint works.
83
+
84
+ - **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
85
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing on the roadmap
86
+
87
+ ## Text model pricing
88
+
89
+ USD per 1M tokens, input / output. Official rates as of 2026-05 — verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
90
+
91
+ | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
92
+ |---|---|---|---|---|
93
+ | Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
94
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
95
+ | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
96
+ | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
97
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
98
+ | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
99
+ | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
100
+
101
+ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter — one config can mix them all.
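As a worked example of the arithmetic behind the table, the sketch below applies the flat 30% discount to the Gemini 3 Flash row and prices one call with the same per-1M-token formula that `emitCost` uses in `dist/index.js`. The function names and the token counts are hypothetical, chosen only to make the numbers concrete.

```ts
interface Cost { input: number; output: number } // USD per 1M tokens

// Apply a flat percentage discount to an official rate.
function discounted(official: Cost, percentOff: number): Cost {
  const factor = 1 - percentOff / 100;
  return { input: official.input * factor, output: official.output * factor };
}

// Per-call cost in USD from token counts and per-1M-token prices.
function callCostUsd(cost: Cost, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * cost.input + (outputTokens / 1e6) * cost.output;
}

const official: Cost = { input: 0.5, output: 3.0 }; // Gemini 3 Flash row
const kunavo = discounted(official, 30); // roughly { input: 0.35, output: 2.1 }

// Hypothetical call: 12,000 input tokens and 800 output tokens.
console.log(callCostUsd(official, 12_000, 800).toFixed(6)); // 0.008400
console.log(callCostUsd(kunavo, 12_000, 800).toFixed(6)); // 0.005880
```

The discounted rates match the `cost` values used for Kunavo in the quick-start configuration.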
102
+
103
+ ## Image model pricing
104
+
105
+ USD per image, as of 2026-05 (provider list / retail prices; verify current rates). Kunavo is 30% off the official rate. fal and Runware are compute providers. ⭐ marks the cheapest provider per model; routing image models through `ai-lcr` is still on the roadmap.
106
+
107
+ | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
108
+ |---|---|---|---|---|
109
+ | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
+ | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
+ | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
+ | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
+ | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
+ | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
+ | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
+ | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
+ | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
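The ⭐ column can be reproduced mechanically from the price columns. The sketch below does so for three rows of the table; the `PriceRow` type and the `cheapest` helper are hypothetical, and because image routing is still on the roadmap this is not how the package selects a provider today.

```ts
// Per-image prices in USD, copied from three rows of the table above.
// A missing key means the provider does not carry the model.
type PriceRow = Partial<Record<"fal" | "runware" | "kunavo", number>>;

const prices: Record<string, PriceRow> = {
  "Nano Banana 2": { fal: 0.08, runware: 0.069, kunavo: 0.047 },
  "Flux Schnell": { fal: 0.003, runware: 0.0013 },
  "Flux Dev": { fal: 0.025, runware: 0.025 },
};

// Return every provider tied for the lowest price (ties happen, as in Flux Dev).
function cheapest(row: PriceRow): string[] {
  const entries = Object.entries(row) as [string, number][];
  const min = Math.min(...entries.map(([, price]) => price));
  return entries.filter(([, price]) => price === min).map(([name]) => name);
}

for (const [model, row] of Object.entries(prices)) {
  console.log(model, "->", cheapest(row).join(" / "));
}
// Nano Banana 2 -> kunavo
// Flux Schnell -> runware
// Flux Dev -> fal / runware
```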
120
+
121
+ ## Video model pricing
122
+
123
+ USD per second, as of 2026-05 (verify current rates). Video billing units differ by provider, so the rates are not directly comparable: fal.ai and Runware charge per second, while Kunavo's Veo is billed per clip (Fast ~$0.28 / Lite ~$0.168 / Quality ~$1.34). The table below lists fal.ai's per-second rates (the video workhorse in testing); a normalized fal / Runware / Kunavo comparison is still a TODO.
124
+
125
+ | Model | fal.ai ($/s) |
126
+ |---|---|
127
+ | Seedance Lite | $0.036 |
128
+ | Hailuo 02 Standard | $0.045 |
129
+ | LTX-2 | $0.060 |
130
+ | Kling 2.6 Pro | $0.070 |
131
+ | WAN 2.2 | $0.080 |
132
+ | Veo 3.1 Lite | $0.080 |
133
+ | Kling V3 Pro | $0.112 |
134
+ | Seedance Pro | $0.124 |
135
+ | Veo 3.1 (audio-on) | $0.400 |
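Until a normalized table exists, one way to compare a per-clip price with a per-second price is to compute the break-even clip length. The sketch below is an assumption-laden illustration: `breakEvenSeconds` is a hypothetical helper, and pairing Kunavo's "Lite" per-clip price with fal.ai's "Veo 3.1 Lite" row assumes the two refer to the same model tier, which the README does not confirm.

```ts
// Clip length (in seconds) at which a per-clip price equals a per-second price.
// Longer clips favor the per-clip provider; shorter clips favor per-second billing.
function breakEvenSeconds(pricePerClipUsd: number, pricePerSecondUsd: number): number {
  if (pricePerSecondUsd <= 0) throw new RangeError("pricePerSecondUsd must be positive");
  return pricePerClipUsd / pricePerSecondUsd;
}

// Assumption: Kunavo Veo Lite (~$0.168 per clip) vs fal.ai Veo 3.1 Lite ($0.080 per second).
console.log(breakEvenSeconds(0.168, 0.08).toFixed(2)); // 2.10
```

Under that assumption, per-clip billing would be cheaper for any clip longer than about 2.1 seconds, provided the per-clip price does not itself vary with duration.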
136
+
137
+ ## Roadmap
138
+
139
+ - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
140
+ - [x] Real per-call cost accounting (`onCost`)
141
+ - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
142
+ - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
143
+ - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
144
+ - [ ] Offline capability probe (tool-calling / caching / streaming) → trust matrix
145
+ - [ ] Image & video model routing (fal.ai / Runware / Kunavo)
146
+
147
+ ## Affiliate disclosure
148
+
149
+ `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author holds an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**, which — at 30% off official rates — is often (not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You're never required to use it; bring your own providers and routing works identically.
150
+
151
+ ## Development
152
+
153
+ ```bash
154
+ npm install
155
+ npm run typecheck
156
+ npm test # mocked routing/failover tests + live Kunavo tests
157
+ ```
158
+
159
+ The suite covers cheapest-first routing, failover on retryable errors (and *not* failing over on a 400), exhausting the whole chain, and a real broken-provider → Kunavo recovery. Live tests run only when `KUNAVO_API_KEY` is set in the environment; otherwise they're skipped.
160
+
161
+ ## Credits
162
+
163
+ The streaming-safe failover approach is adapted from [`ai-fallback`](https://github.com/remorses/ai-fallback) (MIT) — reimplemented in-house so ai-lcr owns its engine and layers cost accounting + routing directly into it. Built on the [Vercel AI SDK](https://ai-sdk.dev).
164
+
165
+ ## License
166
+
167
+ [MIT](./LICENSE) © Victor
package/README.zh-CN.md ADDED
@@ -0,0 +1,167 @@
1
+ # AI-LCR: AI Least Cost Routing
2
+
3
+ <p align="center">
4
+ <a href="./README.md">English</a> · <b>简体中文</b>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <b>Automatic least-cost routing for LLM calls. One line of code to cut your AI bill.</b>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <a href="https://www.npmjs.com/package/ai-lcr"><img src="https://img.shields.io/npm/v/ai-lcr.svg" alt="npm version"/></a>
13
+ <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT license"/>
14
+ <a href="https://ai-sdk.dev"><img src="https://img.shields.io/badge/built%20for-Vercel%20AI%20SDK-black?logo=vercel&logoColor=white" alt="built for Vercel AI SDK"/></a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img src="assets/ai-lcr-hero.svg" alt="ai-lcr routes each model to its own cheapest provider (Gemini to Kunavo, DeepSeek to OpenRouter, Seedream to fal, Flux Schnell to Runware) and falls back automatically on failure" width="820">
19
+ </p>
20
+
21
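+ The same model is priced differently on different providers, and no single provider is cheapest for every model. `ai-lcr` keeps a cheapest-first list for each model, routes to the cheapest healthy provider in it (⭐ in the tables below), and falls through on failure. This is the [Least Cost Routing](https://en.wikipedia.org/wiki/Least-cost_routing) that phone carriers have practiced for decades.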
22
+
23
+ ## Install
24
+
25
+ ```bash
26
+ npm install ai-lcr
27
+ ```
28
+
29
+ `ai` (the Vercel AI SDK) is a peer dependency.
30
+
31
+ ## Quick start
32
+
33
+ ```ts
34
+ import { createLCR } from "ai-lcr";
35
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
36
+ import { generateText } from "ai";
37
+
38
+ const kunavo = createOpenAICompatible({
39
+ name: "kunavo",
40
+ baseURL: "https://api.kunavo.com/v1",
41
+ apiKey: process.env.KUNAVO_API_KEY,
42
+ });
43
+ const openrouter = createOpenAICompatible({
44
+ name: "openrouter",
45
+ baseURL: "https://openrouter.ai/api/v1",
46
+ apiKey: process.env.OPENROUTER_API_KEY,
47
+ });
48
+
49
+ const lcr = createLCR({
50
+ autoSort: true, // sort each model's providers cheapest-first by `cost`
51
+ models: {
52
+ // One logical model, served cheapest-first across multiple providers.
53
+ "gemini-3-flash": [
54
+ { model: kunavo("gemini-3-flash"), label: "kunavo", cost: { input: 0.35, output: 2.1 } },
55
+ { model: openrouter("google/gemini-3-flash-preview"), label: "openrouter", cost: { input: 0.5, output: 3.0 } },
56
+ ],
57
+ },
58
+ // See what each call actually cost and which provider served it.
59
+ onCost: ({ provider, costUsd }) => console.log(`${provider}: $${costUsd.toFixed(6)}`),
60
+ });
61
+
62
+ const { text } = await generateText({
63
+ model: lcr("gemini-3-flash"),
64
+ prompt: "Explain Least Cost Routing in one sentence.",
65
+ });
66
+ ```
67
+
68
+ `cost` and `label` are both optional. If you don't need cost accounting or `autoSort`, pass bare models (`kunavo("gemini-3-flash")`). `lcr("gemini-3-flash")` returns a standard AI SDK model, so it works with `generateText`, `streamText`, `generateObject`, tool calling, and agents.
69
+
70
+ ## How it routes
71
+
72
+ 1. **Cheapest first.** Providers are tried in order. List them cheapest-first, or set `autoSort: true` to sort them by `cost` automatically.
73
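+ 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider. Streams fail over only before the first chunk is emitted. Hard request errors (400, 422) pass straight through without a retry. The shipped `RETRYABLE_STATUS` set also lists 401 and 403, so an auth failure on one provider falls through to the next rather than passing through.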
74
+ 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it returns to the cheapest provider automatically.
75
+
76
+ <p align="center">
77
+ <img src="assets/ai-lcr-routing.svg" alt="routing diagram: cheapest first, fallback on failure, recover after idle" width="820">
78
+ </p>
79
+
80
+ ## Supported providers
81
+
82
+ Any OpenAI-compatible endpoint works.
83
+
84
+ - **Text:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off** every model)
85
+ - **Image / video:** [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) (**30% off**) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai); routing is on the roadmap
86
+
87
+ ## Text model pricing
88
+
89
+ USD per 1M tokens, input / output. Official rates as of 2026-05; verify current rates with each provider. OpenRouter passes list price through; Kunavo is a flat 30% off the official rate.
90
+
91
+ | Model | Official (in / out) | OpenRouter | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
92
+ |---|---|---|---|---|
93
+ | Gemini 3 Flash | $0.50 / $3.00 | no discount | −30% | ⭐ Kunavo |
94
+ | Gemini 3 Pro / 3.1 Pro | $2.00 / $12.00 | no discount | −30% | ⭐ Kunavo |
95
+ | Gemini 2.5 Pro | $1.25 / $10.00 | no discount | −30% | ⭐ Kunavo |
96
+ | Gemini 2.5 Flash | $0.30 / $2.50 | no discount | −30% | ⭐ Kunavo |
97
+ | Claude Sonnet 4.6 | $3.00 / $15.00 | no discount | −30% | ⭐ Kunavo |
98
+ | Claude Haiku 4.5 | $1.00 / $5.00 | no discount | −30% | ⭐ Kunavo |
99
+ | DeepSeek V4 | $0.43 / $0.87 | no discount | not carried | ⭐ OpenRouter |
100
+
101
+ Kunavo carries Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to OpenRouter, and one config can mix them all.
102
+
103
+ ## Image model pricing
104
+
105
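+ USD per image, as of 2026-05 (provider list / retail prices; verify current rates). Kunavo is 30% off the official rate. fal and Runware are compute providers. ⭐ marks the cheapest provider per model; routing image models through `ai-lcr` is still on the roadmap.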
106
+
107
+ | Model | fal.ai | Runware | [Kunavo](https://kunavo.com/?ref=hJ2uT3iW) | Cheapest |
108
+ |---|---|---|---|---|
109
+ | Nano Banana 2 | $0.080 | $0.069 | $0.047 | ⭐ Kunavo |
110
+ | Nano Banana Pro | $0.080 | — | $0.094 | ⭐ fal |
111
+ | GPT-Image-2 | $0.210 | $0.094 | $0.089 | ⭐ Kunavo |
112
+ | Imagen 4 Ultra | $0.060 | $0.060 | — | ⭐ fal / Runware |
113
+ | Ideogram V3 | $0.060 | $0.060 | — | ⭐ fal / Runware |
114
+ | Seedream 4 | $0.030 | — | — | ⭐ fal |
115
+ | Flux 1.1 Pro | $0.040 | $0.040 | — | ⭐ fal / Runware |
116
+ | Flux Dev | $0.025 | $0.025 | — | ⭐ fal / Runware |
117
+ | Flux Schnell | $0.0030 | $0.0013 | — | ⭐ Runware |
118
+ | Qwen-Image | — | $0.0038 | — | ⭐ Runware |
119
+ | FLUX.2 Klein 4B | — | $0.0006 | — | ⭐ Runware |
120
+
121
+ ## Video model pricing
122
+
123
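+ USD per second, as of 2026-05; verify current rates. Video billing differs by provider, so a strictly like-for-like cross-provider table is not possible: fal.ai and Runware charge per second, while Kunavo's Veo is billed per clip (Fast ~$0.28 / Lite ~$0.168 / Quality ~$1.34). The table below lists fal.ai's per-second rates (the video workhorse in testing); a normalized fal / Runware / Kunavo comparison is a TODO.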
124
+
125
+ | Model | fal.ai ($/s) |
126
+ |---|---|
127
+ | Seedance Lite | $0.036 |
128
+ | Hailuo 02 Standard | $0.045 |
129
+ | LTX-2 | $0.060 |
130
+ | Kling 2.6 Pro | $0.070 |
131
+ | WAN 2.2 | $0.080 |
132
+ | Veo 3.1 Lite | $0.080 |
133
+ | Kling V3 Pro | $0.112 |
134
+ | Seedance Pro | $0.124 |
135
+ | Veo 3.1 (audio-on) | $0.400 |
136
+
137
+ ## Roadmap
138
+
139
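+ - [x] Own failover engine: cheapest-first routing + streaming-safe fallback, no external routing dependency
140
+ - [x] Real per-call cost accounting (`onCost`)
141
+ - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
142
+ - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
143
+ - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks)
144
+ - [ ] Offline capability probe (tool calling / caching / streaming) → trust matrix
145
+ - [ ] Image & video model routing (fal.ai / Runware / Kunavo)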
146
+
147
+ ## Affiliate disclosure
148
+
149
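+ `ai-lcr` is provider-neutral and works with any OpenAI-compatible endpoint. The author has an affiliate arrangement with **[Kunavo](https://kunavo.com/?ref=hJ2uT3iW)**. At 30% off official rates it is often (but not always) the cheapest option, as the tables above show. Signing up through that link may earn the author a share. You are never required to use it; bring your own providers and routing works the same.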
150
+
151
+ ## Development
152
+
153
+ ```bash
154
+ npm install
155
+ npm run typecheck
156
+ npm test # mocked routing/failover tests + live Kunavo tests
157
+ ```
158
+
159
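+ The suite covers cheapest-first routing, failover on retryable errors (and *not* failing over on a 400), exhausting the whole chain, and one real broken-provider → Kunavo recovery. Live tests run only when the `KUNAVO_API_KEY` environment variable is set; otherwise they are skipped.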
160
+
161
+ ## Credits
162
+
163
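+ The streaming-safe failover approach is adapted from [`ai-fallback`](https://github.com/remorses/ai-fallback) (MIT) and reimplemented in-house, so ai-lcr owns its engine and builds cost accounting + routing directly into it. Built on the [Vercel AI SDK](https://ai-sdk.dev).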
164
+
165
+ ## License
166
+
167
+ [MIT](./LICENSE) © Victor
package/dist/index.d.ts ADDED
@@ -0,0 +1,79 @@
1
+ import { LanguageModelV3 } from '@ai-sdk/provider';
2
+
3
+ /**
4
+ * Owned failover engine for ai-lcr.
5
+ *
6
+ * A LanguageModelV3 that wraps an ordered, cheapest-first list of providers:
7
+ * it serves from the first healthy one, switches to the next on a retryable
8
+ * error (streaming-safe), and snaps back to the cheapest after an idle window.
9
+ * It also computes per-call cost from each provider's price and fires `onCost`.
10
+ *
11
+ * The switching loop is adapted from `ai-fallback` (MIT, © remorses) — its
12
+ * streaming-safe fallback approach — reimplemented here so ai-lcr owns its core
13
+ * engine and can layer cost accounting + provider quirks directly into it.
14
+ */
15
+
16
+ /** USD per 1M tokens. */
17
+ interface ProviderCost {
18
+ input: number;
19
+ output: number;
20
+ }
21
+ interface CostEvent {
22
+ /** Logical model name (the key in createLCR's `models`). */
23
+ model: string;
24
+ /** Which provider actually served the request. */
25
+ provider: string;
26
+ inputTokens: number;
27
+ outputTokens: number;
28
+ /** Computed from the serving provider's `cost`; 0 if no price was given. */
29
+ costUsd: number;
30
+ }
31
+
32
+ /**
33
+ * ai-lcr — Least Cost Routing for LLMs.
34
+ *
35
+ * Route each model to the cheapest provider that can serve it, fall back
36
+ * automatically on failure, and report real per-call cost. Built on its own
37
+ * failover engine (see ./fallback) — no external routing dependency.
38
+ *
39
+ * Roadmap (see README): provider-quirk middleware, offline capability probe,
40
+ * a bundled price table for zero-config cheapest-first ordering.
41
+ */
42
+
43
+ /**
44
+ * A provider for a model: either a bare AI SDK model (e.g.
45
+ * `createOpenAICompatible(...)("id")`), or that model wrapped with price/label
46
+ * metadata to unlock cost accounting and cheapest-first auto-sorting.
47
+ */
48
+ type ProviderEntry = LanguageModelV3 | {
49
+ model: LanguageModelV3;
50
+ /** USD per 1M tokens. Enables `onCost` and `autoSort`. */
51
+ cost?: ProviderCost;
52
+ /** Label used in cost events / logs. Defaults to the model's provider id. */
53
+ label?: string;
54
+ };
55
+ interface LCRConfig {
56
+ /**
57
+ * Map of logical model name -> providers to try, cheapest-first.
58
+ * Order is priority order unless `autoSort` is set.
59
+ */
60
+ models: Record<string, ProviderEntry[]>;
61
+ /** Sort each model's providers cheapest-first by `cost` before routing. */
62
+ autoSort?: boolean;
63
+ /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
64
+ resetIntervalMs?: number;
65
+ /** Called when a provider errors and routing falls through to the next. */
66
+ onError?: (error: Error, provider: string) => void;
67
+ /** Called after each successful call with the serving provider, tokens, and cost. */
68
+ onCost?: (event: CostEvent) => void;
69
+ }
70
+ /** Resolve a logical model name to a routed model. */
71
+ type LCRRouter = (modelName: string) => LanguageModelV3;
72
+ /**
73
+ * Build a Least Cost Router. Returns a function that resolves a logical model
74
+ * name to a routed model usable anywhere in the Vercel AI SDK (generateText,
75
+ * streamText, generateObject, tools, agents).
76
+ */
77
+ declare function createLCR(config: LCRConfig): LCRRouter;
78
+
79
+ export { type CostEvent, type LCRConfig, type LCRRouter, type ProviderCost, type ProviderEntry, createLCR };
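The idle-window recovery described in the doc comments above (`resetIntervalMs`) can be sketched with an injectable clock, which makes the behavior easy to test. `ProviderCursor` and its method names are hypothetical and not part of the `ai-lcr` API; the logic follows `checkReset` and `switchNext` as they appear in `dist/index.js`.

```ts
// Hypothetical helper for illustration; not exported by ai-lcr.
class ProviderCursor {
  private index = 0;
  private lastCall: number;
  private readonly providerCount: number;
  private readonly resetIntervalMs: number;
  private readonly now: () => number;

  constructor(providerCount: number, resetIntervalMs = 60_000, now: () => number = Date.now) {
    this.providerCount = providerCount;
    this.resetIntervalMs = resetIntervalMs;
    this.now = now;
    this.lastCall = now();
  }

  // Call at the start of every request; returns the provider index to use.
  begin(): number {
    const t = this.now();
    if (this.index !== 0 && t - this.lastCall >= this.resetIntervalMs) {
      this.index = 0; // idle long enough: go back to the cheapest provider
    }
    this.lastCall = t; // every request restarts the idle window
    return this.index;
  }

  // Call after a retryable failure; advances to the next provider, wrapping around.
  fail(): void {
    this.index = (this.index + 1) % this.providerCount;
  }
}

let clock = 0;
const cursor = new ProviderCursor(2, 60_000, () => clock);
cursor.fail(); // the cheapest provider failed
clock = 30_000;
console.log(cursor.begin()); // 1 (only 30s since the last request)
clock = 95_000;
console.log(cursor.begin()); // 0 (65s idle, so routing snaps back)
```

Because each request restarts the window, steady traffic keeps routing on the fallback provider; the snap-back only happens after a quiet period of at least `resetIntervalMs`.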
package/dist/index.js ADDED
@@ -0,0 +1,224 @@
1
+ // src/fallback.ts
2
+ var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 403, 408, 409, 413, 429, 498, 500]); // statuses that trigger failover; isRetryableError also treats any status above 500 as retryable
3
+ var RETRYABLE_PATTERNS = [
4
+ "overloaded",
5
+ "service unavailable",
6
+ "bad gateway",
7
+ "too many requests",
8
+ "internal server error",
9
+ "gateway timeout",
10
+ "rate_limit",
11
+ "ratelimit",
12
+ "rate limit",
13
+ "capacity",
14
+ "timeout",
15
+ "server_error",
16
+ "502",
17
+ "503",
18
+ "504",
19
+ "429"
20
+ ];
21
+ function isRetryableError(error) {
22
+ const e = error;
23
+ const status = e?.statusCode ?? e?.status;
24
+ if (typeof status === "number" && (RETRYABLE_STATUS.has(status) || status > 500)) {
25
+ return true;
26
+ }
27
+ const text = (e?.message ? String(e.message) : safeStringify(error)).toLowerCase();
28
+ return RETRYABLE_PATTERNS.some((p) => text.includes(p));
29
+ }
30
+ function safeStringify(value) {
31
+ try {
32
+ return JSON.stringify(value) ?? "";
33
+ } catch {
34
+ return String(value);
35
+ }
36
+ }
37
+ var LcrFallbackModel = class {
38
+ constructor(opts) {
39
+ this.opts = opts;
40
+ if (opts.providers.length === 0) {
41
+ throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
42
+ }
43
+ this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
44
+ }
45
+ opts;
46
+ specificationVersion = "v3";
47
+ index = 0;
48
+ lastReset = Date.now();
49
+ resetIntervalMs;
50
+ get current() {
51
+ return this.opts.providers[this.index];
52
+ }
53
+ get modelId() {
54
+ return this.current.model.modelId;
55
+ }
56
+ get provider() {
57
+ return this.current.model.provider;
58
+ }
59
+ get supportedUrls() {
60
+ return this.current.model.supportedUrls;
61
+ }
62
+ checkReset() {
63
+ if (this.index !== 0 && Date.now() - this.lastReset >= this.resetIntervalMs) {
64
+ this.index = 0;
65
+ }
66
+ this.lastReset = Date.now(); // every call restarts the idle window
67
+ }
68
+ switchNext() {
69
+ this.index = (this.index + 1) % this.opts.providers.length;
70
+ }
71
+ shouldRetry(error) {
72
+ return (this.opts.shouldRetry ?? isRetryableError)(error);
73
+ }
74
+ emitCost(provider, usage) {
75
+ const onCost = this.opts.onCost;
76
+ if (!onCost) return;
77
+ const inputTokens = usage?.inputTokens?.total ?? 0;
78
+ const outputTokens = usage?.outputTokens?.total ?? 0;
79
+ const costUsd = provider.cost ? inputTokens / 1e6 * provider.cost.input + outputTokens / 1e6 * provider.cost.output : 0;
80
+ onCost({
81
+ model: this.opts.modelName,
82
+ provider: provider.label,
83
+ inputTokens,
84
+ outputTokens,
85
+ costUsd
86
+ });
87
+ }
88
+ async doGenerate(options) {
89
+ this.checkReset();
90
+ const start = this.index;
91
+ let lastError;
92
+ for (; ; ) {
93
+ const provider = this.current;
94
+ try {
95
+ const result = await provider.model.doGenerate(options);
96
+ this.emitCost(provider, result.usage);
97
+ return result;
98
+ } catch (error) {
99
+ lastError = error;
100
+ if (!this.shouldRetry(error)) throw error;
101
+ this.opts.onError?.(error, provider.label);
102
+ this.switchNext();
103
+ if (this.index === start) throw lastError;
104
+ }
105
+ }
106
+ }
107
+ async doStream(options) {
108
+ this.checkReset();
109
+ const self = this;
110
+ const start = this.index;
111
+ let result;
112
+ let serving;
113
+ for (; ; ) {
114
+ serving = this.current;
115
+ try {
116
+ result = await serving.model.doStream(options);
117
+ break;
118
+ } catch (error) {
119
+ if (!this.shouldRetry(error)) throw error;
120
+ this.opts.onError?.(error, serving.label);
121
+ this.switchNext();
122
+ if (this.index === start) throw error;
123
+ }
124
+ }
125
+ const servingProvider = serving;
126
+ let usage;
127
+ let streamedAny = false;
128
+ const stream = new ReadableStream({
129
+ async start(controller) {
130
+ let reader = null;
131
+ try {
132
+ reader = result.stream.getReader();
133
+ for (; ; ) {
134
+ const { done, value } = await reader.read();
135
+ if (!streamedAny && value && typeof value === "object" && "error" in value) {
136
+ const err = value.error;
137
+ if (self.shouldRetry(err)) throw err;
138
+ }
139
+ if (done) break;
140
+ if (value.type === "finish") usage = value.usage;
141
+ controller.enqueue(value);
142
+ if (value.type !== "stream-start") streamedAny = true;
143
+ }
144
+ self.emitCost(servingProvider, usage);
145
+ controller.close();
146
+ } catch (error) {
147
+ self.opts.onError?.(error, servingProvider.label);
148
+ if (!streamedAny) {
149
+ self.switchNext();
150
+ if (self.index === start) {
151
+ controller.error(error);
152
+ return;
153
+ }
154
+ try {
155
+ const next = await self.doStream(options);
156
+ const nextReader = next.stream.getReader();
157
+ try {
158
+ for (; ; ) {
159
+ const { done, value } = await nextReader.read();
160
+ if (done) break;
161
+ controller.enqueue(value);
162
+ }
163
+ controller.close();
164
+ } finally {
165
+ nextReader.releaseLock();
166
+ }
167
+ } catch (nextError) {
168
+ controller.error(nextError);
169
+ }
170
+ return;
171
+ }
172
+ controller.error(error);
173
+ } finally {
174
+ reader?.releaseLock();
175
+ }
176
+ }
177
+ });
178
+ return { ...result, stream };
179
+ }
180
+ };
181
+
182
+ // src/index.ts
183
+ function isLanguageModel(entry) {
184
+ return typeof entry.doGenerate === "function";
185
+ }
186
+ function normalize(entry) {
187
+ if (isLanguageModel(entry)) {
188
+ return { model: entry, label: entry.provider };
189
+ }
190
+ return {
191
+ model: entry.model,
192
+ label: entry.label ?? entry.model.provider,
193
+ cost: entry.cost
194
+ };
195
+ }
196
+ function priceKey(p) {
197
+ return p.cost ? p.cost.input + p.cost.output : Number.POSITIVE_INFINITY; // sort key: input + output price; providers without a cost sort last
198
+ }
199
+ function createLCR(config) {
200
+ const { models, autoSort = false, resetIntervalMs, onError, onCost } = config;
201
+ const routed = /* @__PURE__ */ new Map();
202
+ for (const [name, entries] of Object.entries(models)) {
203
+ let providers = entries.map(normalize);
204
+ if (autoSort) {
205
+ providers = [...providers].sort((a, b) => priceKey(a) - priceKey(b));
206
+ }
207
+ routed.set(
208
+ name,
209
+ new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost })
210
+ );
211
+ }
212
+ return (modelName) => {
213
+ const model = routed.get(modelName);
214
+ if (!model) {
215
+ throw new Error(
216
+ `ai-lcr: unknown model "${modelName}" \u2014 add it to createLCR({ models })`
217
+ );
218
+ }
219
+ return model;
220
+ };
221
+ }
222
+ export {
223
+ createLCR
224
+ };
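The "streaming-safe" rule in `doStream` above (switch providers only while nothing has been emitted) can be illustrated in isolation. This is a simplified sketch, not the shipped code: it uses synchronous generators in place of `ReadableStream`, skips the retryable-error check, and `Source` and `streamWithFallback` are hypothetical names.

```ts
// A chunk source that may throw before or during iteration. Synchronous generators
// stand in for provider streams here; real streams are asynchronous.
type Source = () => Iterable<string>;

// Yield chunks from the first source that produces output. Switching is allowed
// only while nothing has been emitted, so the consumer never sees output from
// two providers spliced together.
function* streamWithFallback(sources: Source[]): Generator<string> {
  let lastError: unknown = new Error("no sources configured");
  for (const source of sources) {
    let emitted = false;
    try {
      for (const chunk of source()) {
        emitted = true;
        yield chunk;
      }
      return; // the source finished cleanly
    } catch (error) {
      if (emitted) throw error; // mid-stream failure: too late to switch
      lastError = error;
    }
  }
  throw lastError; // every source failed before emitting anything
}

function* failsEarly(): Generator<string> {
  throw new Error("503 service unavailable");
}
function* healthy(): Generator<string> {
  yield "Hello";
  yield ", world";
}

console.log([...streamWithFallback([failsEarly, healthy])].join("")); // Hello, world
```

A failure after the first chunk is surfaced to the caller instead of being retried, since replaying the request on another provider would duplicate or contradict text the consumer has already received.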
package/package.json ADDED
@@ -0,0 +1,61 @@
1
+ {
2
+ "name": "ai-lcr",
3
+ "version": "0.0.1",
4
+ "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
5
+ "keywords": [
6
+ "ai",
7
+ "llm",
8
+ "vercel-ai-sdk",
9
+ "ai-sdk",
10
+ "router",
11
+ "least-cost-routing",
12
+ "lcr",
13
+ "fallback",
14
+ "openrouter",
15
+ "cost-optimization",
16
+ "openai-compatible"
17
+ ],
18
+ "license": "MIT",
19
+ "author": "Victor",
20
+ "repository": {
21
+ "type": "git",
22
+ "url": "git+https://github.com/victorzhrn/ai-lcr.git"
23
+ },
24
+ "homepage": "https://github.com/victorzhrn/ai-lcr#readme",
25
+ "bugs": {
26
+ "url": "https://github.com/victorzhrn/ai-lcr/issues"
27
+ },
28
+ "type": "module",
29
+ "main": "./dist/index.js",
30
+ "module": "./dist/index.js",
31
+ "types": "./dist/index.d.ts",
32
+ "exports": {
33
+ ".": {
34
+ "types": "./dist/index.d.ts",
35
+ "import": "./dist/index.js"
36
+ }
37
+ },
38
+ "files": [
39
+ "dist",
40
+ "README.md",
41
+ "LICENSE"
42
+ ],
43
+ "scripts": {
44
+ "build": "tsup src/index.ts --format esm --dts --clean",
45
+ "typecheck": "tsc --noEmit",
46
+ "test": "vitest run",
47
+ "test:watch": "vitest"
48
+ },
49
+ "peerDependencies": {
50
+ "ai": "^6.0.0"
51
+ },
52
+ "devDependencies": {
53
+ "@ai-sdk/openai-compatible": "^2.0.0",
54
+ "@ai-sdk/provider": "^3.0.0",
55
+ "@types/node": "^25.9.1",
56
+ "ai": "^6.0.0",
57
+ "tsup": "^8.0.0",
58
+ "typescript": "^5.5.0",
59
+ "vitest": "^3.0.0"
60
+ }
61
+ }