ai-lcr 0.5.2 → 0.5.4
- package/CHANGELOG.md +51 -0
- package/README.md +2 -2
- package/README.zh-CN.md +2 -2
- package/dist/index.cjs +45 -9
- package/dist/index.d.cts +58 -1
- package/dist/index.d.ts +58 -1
- package/dist/index.js +40 -8
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,57 @@ All notable changes to `ai-lcr` are documented here. The format follows
|
|
|
4
4
|
[Keep a Changelog](https://keepachangelog.com/), and the project adheres to
|
|
5
5
|
[Semantic Versioning](https://semver.org/).
|
|
6
6
|
|
|
7
|
+
## [0.5.4] — 2026-06-03
|
|
8
|
+
|
|
9
|
+
### Changed
|
|
10
|
+
|
|
11
|
+
- **A provider 400 now fails over instead of being passed through.** Previously
|
|
12
|
+
any client error (400/422/…) was treated as the caller's fault and thrown
|
|
13
|
+
immediately, killing the request even when another provider would have served
|
|
14
|
+
it. But across OpenAI-compatible aggregators a 400 is most often
|
|
15
|
+
*provider-specific* — an unsupported parameter, a model the provider hasn't
|
|
16
|
+
listed, a stricter JSON schema — not a universally-broken request. The default
|
|
17
|
+
failover gate (`shouldFailover`) now advances to the next provider on **any**
|
|
18
|
+
failure except a deliberate caller cancellation (`AbortSignal`), which is the
|
|
19
|
+
one thing we must never re-issue elsewhere. When every provider rejects the
|
|
20
|
+
request it still throws — now surfacing the **first** (original) error rather
|
|
21
|
+
than the last fallback's, so a genuine caller bug stays debuggable. Failed
|
|
22
|
+
attempts keep their precise `ErrorKind` (`"client"` for a 400) in the
|
|
23
|
+
`CallRecord`, so a real bug is still visible.
|
|
24
|
+
|
|
25
|
+
To restore the old "client errors fail fast" behavior, pass
|
|
26
|
+
`shouldRetry: isRetryableError` to `createLCR`.
|
|
27
|
+
|
|
28
|
+
### Added
|
|
29
|
+
|
|
30
|
+
- **`createLCR({ shouldRetry })`.** The failover predicate is now configurable
|
|
31
|
+
from the top-level API (it previously existed only on the internal engine), so
|
|
32
|
+
callers can tune or fully override the policy above.
|
|
33
|
+
- **Exported error predicates** `isRetryableError`, `isNetworkError`,
|
|
34
|
+
`isAbortError`, and `shouldFailover` — building blocks for a custom
|
|
35
|
+
`shouldRetry`.
|
|
36
|
+
|
|
37
|
+
## [0.5.3] — 2026-06-03
|
|
38
|
+
|
|
39
|
+
All additions are optional and backward compatible.
|
|
40
|
+
|
|
41
|
+
### Added
|
|
42
|
+
|
|
43
|
+
- **`defaultCacheReadRatio` — chain-wide fallback price for prompt-cache reads.**
|
|
44
|
+
ai-lcr already detects cache hits from the provider's reported usage and emits
|
|
45
|
+
`cachedInputTokens` for any provider that reports them (Anthropic, Gemini's
|
|
46
|
+
implicit cache, DeepSeek, …). But the *saving* (`cachedSavingUsd`) and the
|
|
47
|
+
cache-discounted `costUsd` were only computed when a leg set an explicit
|
|
48
|
+
`cost.cacheRead` — so a route that forgot it (e.g. a Gemini OpenRouter leg)
|
|
49
|
+
silently reported `$0` saved and billed cached tokens at the full input rate.
|
|
50
|
+
|
|
51
|
+
`createLCR({ defaultCacheReadRatio: 0.1 })` now supplies a fallback cache-read
|
|
52
|
+
price as a fraction of each leg's `input`, applied **only** to legs that omit
|
|
53
|
+
an explicit `cacheRead`. Most providers' cache-read price is ~0.1× input, so
|
|
54
|
+
`0.1` makes cache cost + savings "just work" across every model without each
|
|
55
|
+
route hardcoding a rate. Legs with their own `cacheRead` are untouched (set it
|
|
56
|
+
for outliers like OpenAI's ~0.5×). Unset = previous behavior. Must be in [0, 1].
|
|
57
|
+
|
|
7
58
|
## [0.5.0] — 2026-06-02
|
|
8
59
|
|
|
9
60
|
All additions are optional and backward compatible.
|
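The 0.5.4 entry above describes three behaviors: a provider 400 advances to the next provider, a caller cancellation never does, and an exhausted chain throws the first error rather than the last. A minimal sketch of that policy follows, assuming a simplified synchronous model. `Provider`, `makeError`, and `route` are hypothetical names invented for illustration. `isAbortError` and `shouldFailover` mirror the bodies shown in the `dist` hunks, except that the unshown `errorSignals` helper is approximated by lowercasing the error message. Rethrowing a non-retryable error directly, as this sketch does, is also an assumption.

```typescript
// Simplified, synchronous model of the default failover policy.
// `Provider`, `makeError`, and `route` are hypothetical names for this sketch.
type Provider = { label: string; call: () => string };

/** Build an Error with a given `name` (e.g. "AbortError") for the demo. */
function makeError(name: string, message: string): Error {
  const err = new Error(message);
  err.name = name;
  return err;
}

/** Mirrors the diff: match by `name`, then by the canonical abort message. */
function isAbortError(error: unknown): boolean {
  const e = error as { name?: unknown; message?: unknown } | null | undefined;
  if (typeof e?.name === "string" && e.name === "AbortError") return true;
  // Assumption: the real `errorSignals` helper is not shown; lowercase the message.
  const text = typeof e?.message === "string" ? e.message.toLowerCase() : "";
  return /operation was (aborted|canceled)/.test(text);
}

/** Default gate per the changelog: fail over on anything except a cancellation. */
function shouldFailover(error: unknown): boolean {
  return !isAbortError(error);
}

/** Try providers in order; on exhaustion, throw the FIRST error recorded. */
function route(
  providers: Provider[],
  shouldRetry: (error: unknown) => boolean = shouldFailover,
): string {
  let firstError: unknown;
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return provider.call();
    } catch (error) {
      if (firstError === undefined) firstError = error;
      lastError = error;
      if (!shouldRetry(error)) throw error; // e.g. caller cancellation: stop here
    }
  }
  throw firstError ?? lastError ?? new Error("no providers configured");
}

// Usage: the first provider rejects with a 400, the second serves the request.
const rejects400: Provider = {
  label: "provider-a",
  call: () => {
    throw makeError("APICallError", "400 unsupported parameter");
  },
};
const serves: Provider = { label: "provider-b", call: () => "served by provider-b" };
const result = route([rejects400, serves]); // "served by provider-b"
```

Passing a stricter predicate as the second argument of `route` plays the role that `shouldRetry: isRetryableError` plays for `createLCR` in the changelog text.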
package/README.md
CHANGED
|
@@ -141,7 +141,7 @@ DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. Fo
|
|
|
141
141
|
## How it routes
|
|
142
142
|
|
|
143
143
|
1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
|
|
144
|
-
2. **Fall through on failure.** On
|
|
144
|
+
2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
|
|
145
145
|
3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
|
|
146
146
|
|
|
147
147
|
## See what happened (`onCall`)
|
|
@@ -364,7 +364,7 @@ npm run typecheck
|
|
|
364
364
|
npm test # mocked routing/failover tests + live Kunavo tests
|
|
365
365
|
```
|
|
366
366
|
|
|
367
|
-
The suite covers cheapest-first routing, failover on retryable errors
|
|
367
|
+
The suite covers cheapest-first routing, failover on retryable errors *and* on a provider 400 (but *not* on a caller cancellation), surfacing the original error when the whole chain is exhausted, and a real broken-provider → Kunavo recovery. Live tests run only when `KUNAVO_API_KEY` is set in the environment; otherwise they're skipped.
|
|
368
368
|
|
|
369
369
|
## Credits
|
|
370
370
|
|
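Step 2 of "How it routes" notes that `shouldRetry` can replace the default policy. One way a caller might tune it, sketched under assumptions: wrap a base predicate so that chosen HTTP statuses fail fast while everything else defers to the base. `failFastOn` and `readStatus` are hypothetical helpers, and the numeric `statusCode` field is an assumed error shape that this diff does not confirm.

```typescript
type FailoverPredicate = (error: unknown) => boolean;

/** Assumption: provider errors expose a numeric `statusCode` field. */
function readStatus(error: unknown): number | undefined {
  const status = (error as { statusCode?: unknown } | null | undefined)?.statusCode;
  return typeof status === "number" ? status : undefined;
}

/** Wrap `base` so that the listed statuses never fail over. */
function failFastOn(statuses: number[], base: FailoverPredicate): FailoverPredicate {
  return (error) => {
    const status = readStatus(error);
    if (status !== undefined && statuses.some((s) => s === status)) return false;
    return base(error);
  };
}

// Simplified stand-in for a base predicate; in real use this could be the
// exported `shouldFailover`, which additionally refuses caller cancellations.
const failOverOnEverything: FailoverPredicate = () => true;

// Example policy: treat a 422 as a caller bug and fail fast; a 400 still fails over.
const shouldRetry = failFastOn([422], failOverOnEverything);

// Hypothetical wiring, per the README text: createLCR({ models, shouldRetry })
```

Keeping the base predicate as a parameter means the wrapper stays a thin layer over whatever default the library ships, rather than a reimplementation of it.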
package/README.zh-CN.md
CHANGED
|
@@ -141,7 +141,7 @@ DeepInfra 只承载开源权重——没有第一方 Claude / GPT / Gemini。那
|
|
|
141
141
|
## How it routes
|
|
142
142
|
|
|
143
143
|
1. **Cheapest first.** Providers are tried in order: list them cheapest-first, or set `autoSort: true` to have them sorted by `cost` automatically.
|
|
144
|
-
2. **Fall through on failure.**
|
|
144
|
+
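2. **Fall through on failure.** On any provider failure (rate limit, 5xx, timeout, **exhausted credit** (402 / overdue / insufficient balance), as well as a client error such as a **400**) it advances to the next provider, streaming-safe. A 400 fails over on purpose: in the OpenAI-compatible aggregation layer, a 400 usually means "*this* provider won't take this request" (an unsupported parameter, a model it hasn't listed, a stricter schema) rather than a broken request, so another provider may well serve it. If every provider rejects the request, it still fails and throws the **first** (original) error, so a genuine caller bug stays debuggable. The only failure that never fails over is a deliberate caller cancellation (`AbortSignal`). To restore the old "client errors fail fast" behavior, pass `shouldRetry: isRetryableError` to `createLCR`.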
|
|
145
145
|
3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it automatically returns to the cheapest provider.
|
|
146
146
|
|
|
147
147
|
## Supported providers
|
|
@@ -280,7 +280,7 @@ npm run typecheck
|
|
|
280
280
|
npm test # mocked routing / failover tests + live Kunavo tests
|
|
281
281
|
```
|
|
282
282
|
|
|
283
|
-
|
|
283
|
+
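The test suite covers: cheapest-first routing, failover on retryable errors and on a provider 400 (but *not* on a deliberate caller cancellation), throwing the original error when the whole chain is exhausted, and a real "provider failure → Kunavo recovery" run. Live tests run only when the `KUNAVO_API_KEY` environment variable is set; otherwise they are skipped.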
|
|
284
284
|
|
|
285
285
|
## Credits
|
|
286
286
|
|
package/dist/index.cjs
CHANGED
|
@@ -34,9 +34,13 @@ __export(index_exports, {
|
|
|
34
34
|
createMediaLCR: () => createMediaLCR,
|
|
35
35
|
createRunwareMediaAdapter: () => createRunwareMediaAdapter,
|
|
36
36
|
formatCallRecord: () => formatCallRecord,
|
|
37
|
+
isAbortError: () => isAbortError,
|
|
38
|
+
isNetworkError: () => isNetworkError,
|
|
39
|
+
isRetryableError: () => isRetryableError,
|
|
37
40
|
normalizedCents: () => normalizedCents,
|
|
38
41
|
rankRoutes: () => rankRoutes,
|
|
39
|
-
referenceMegapixels: () => referenceMegapixels
|
|
42
|
+
referenceMegapixels: () => referenceMegapixels,
|
|
43
|
+
shouldFailover: () => shouldFailover
|
|
40
44
|
});
|
|
41
45
|
module.exports = __toCommonJS(index_exports);
|
|
42
46
|
|
|
@@ -158,6 +162,15 @@ function isRetryableError(error) {
|
|
|
158
162
|
const { text } = errorSignals(error);
|
|
159
163
|
return RETRYABLE_PATTERNS.some((p) => text.includes(p));
|
|
160
164
|
}
|
|
165
|
+
function isAbortError(error) {
|
|
166
|
+
const e = error;
|
|
167
|
+
if (typeof e?.name === "string" && e.name === "AbortError") return true;
|
|
168
|
+
const { text } = errorSignals(error);
|
|
169
|
+
return text.includes("operation was aborted") || text.includes("operation was canceled");
|
|
170
|
+
}
|
|
171
|
+
function shouldFailover(error) {
|
|
172
|
+
return !isAbortError(error);
|
|
173
|
+
}
|
|
161
174
|
function classifyError(error) {
|
|
162
175
|
if (error instanceof EmptyCompletionError) return "empty_completion";
|
|
163
176
|
const e = error;
|
|
@@ -281,7 +294,7 @@ var LcrFallbackModel = class {
|
|
|
281
294
|
this.lastFailoverAt = Date.now();
|
|
282
295
|
}
|
|
283
296
|
shouldRetry(error) {
|
|
284
|
-
return (this.opts.shouldRetry ??
|
|
297
|
+
return (this.opts.shouldRetry ?? shouldFailover)(error);
|
|
285
298
|
}
|
|
286
299
|
// Observer callbacks are caller-supplied logging hooks: a throw from one of
|
|
287
300
|
// them must NEVER turn a successful (or already-failed) request into a
|
|
@@ -314,6 +327,7 @@ var LcrFallbackModel = class {
|
|
|
314
327
|
}
|
|
315
328
|
/** Record a failed attempt onto the call's chain (no event yet). */
|
|
316
329
|
recordFail(ctx, provider, attemptStart, error) {
|
|
330
|
+
if (ctx.firstError === void 0) ctx.firstError = error;
|
|
317
331
|
ctx.attempts.push({
|
|
318
332
|
provider: provider.label,
|
|
319
333
|
ok: false,
|
|
@@ -429,7 +443,7 @@ var LcrFallbackModel = class {
|
|
|
429
443
|
}
|
|
430
444
|
}
|
|
431
445
|
this.finalizeFail(ctx);
|
|
432
|
-
throw lastError;
|
|
446
|
+
throw ctx.firstError ?? lastError;
|
|
433
447
|
}
|
|
434
448
|
async doStream(options) {
|
|
435
449
|
return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
|
|
@@ -465,7 +479,7 @@ var LcrFallbackModel = class {
|
|
|
465
479
|
tried++;
|
|
466
480
|
if (tried >= n) {
|
|
467
481
|
this.finalizeFail(ctx);
|
|
468
|
-
throw error;
|
|
482
|
+
throw ctx.firstError ?? error;
|
|
469
483
|
}
|
|
470
484
|
idx = (idx + 1) % n;
|
|
471
485
|
}
|
|
@@ -513,7 +527,7 @@ var LcrFallbackModel = class {
|
|
|
513
527
|
const nextTried = triedBeforeServing + 1;
|
|
514
528
|
if (nextTried >= n) {
|
|
515
529
|
self.finalizeFail(ctx);
|
|
516
|
-
controller.error(error);
|
|
530
|
+
controller.error(ctx.firstError ?? error);
|
|
517
531
|
return;
|
|
518
532
|
}
|
|
519
533
|
try {
|
|
@@ -1224,17 +1238,35 @@ function normalize(entry) {
|
|
|
1224
1238
|
function priceKey(p) {
|
|
1225
1239
|
return p.cost ? p.cost.input + p.cost.output : Number.POSITIVE_INFINITY;
|
|
1226
1240
|
}
|
|
1241
|
+
function withDefaultCacheRead(p, ratio) {
|
|
1242
|
+
if (ratio === void 0 || !p.cost || p.cost.cacheRead !== void 0) return p;
|
|
1243
|
+
return { ...p, cost: { ...p.cost, cacheRead: p.cost.input * ratio } };
|
|
1244
|
+
}
|
|
1227
1245
|
function createLCR(config) {
|
|
1228
|
-
const {
|
|
1246
|
+
const {
|
|
1247
|
+
models,
|
|
1248
|
+
autoSort = false,
|
|
1249
|
+
resetIntervalMs,
|
|
1250
|
+
onError,
|
|
1251
|
+
onCost,
|
|
1252
|
+
onCall,
|
|
1253
|
+
shouldRetry,
|
|
1254
|
+
defaultCacheReadRatio
|
|
1255
|
+
} = config;
|
|
1256
|
+
if (defaultCacheReadRatio !== void 0 && (defaultCacheReadRatio < 0 || defaultCacheReadRatio > 1)) {
|
|
1257
|
+
throw new Error(
|
|
1258
|
+
`ai-lcr: defaultCacheReadRatio must be in [0, 1], got ${defaultCacheReadRatio}`
|
|
1259
|
+
);
|
|
1260
|
+
}
|
|
1229
1261
|
const routed = /* @__PURE__ */ new Map();
|
|
1230
1262
|
for (const [name, entries] of Object.entries(models)) {
|
|
1231
|
-
let providers = entries.map(normalize);
|
|
1263
|
+
let providers = entries.map(normalize).map((p) => withDefaultCacheRead(p, defaultCacheReadRatio));
|
|
1232
1264
|
if (autoSort) {
|
|
1233
1265
|
providers = [...providers].sort((a, b) => priceKey(a) - priceKey(b));
|
|
1234
1266
|
}
|
|
1235
1267
|
routed.set(
|
|
1236
1268
|
name,
|
|
1237
|
-
new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall })
|
|
1269
|
+
new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
|
|
1238
1270
|
);
|
|
1239
1271
|
}
|
|
1240
1272
|
return (modelName) => {
|
|
@@ -1263,7 +1295,11 @@ function createLCR(config) {
|
|
|
1263
1295
|
createMediaLCR,
|
|
1264
1296
|
createRunwareMediaAdapter,
|
|
1265
1297
|
formatCallRecord,
|
|
1298
|
+
isAbortError,
|
|
1299
|
+
isNetworkError,
|
|
1300
|
+
isRetryableError,
|
|
1266
1301
|
normalizedCents,
|
|
1267
1302
|
rankRoutes,
|
|
1268
|
-
referenceMegapixels
|
|
1303
|
+
referenceMegapixels,
|
|
1304
|
+
shouldFailover
|
|
1269
1305
|
});
|
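The `withDefaultCacheRead` helper in the hunk above fills in a missing `cacheRead` as `input * ratio`. The sketch below restates it with types and works through the `{ input: 0.5, output: 3 }` example from the type docs. The `Cost` and `Leg` types are simplified stand-ins, and `inputCostUsd` is a hypothetical illustration of how a per-1M-token price might be applied; the library's actual cost formula is not shown in this diff.

```typescript
// Prices are USD per 1M tokens, matching the `0.05`/1M wording in the type docs.
type Cost = { input: number; output: number; cacheRead?: number };
type Leg = { label: string; cost?: Cost };

/** Same logic as the diff: only legs with a cost and no explicit cacheRead change. */
function withDefaultCacheRead(leg: Leg, ratio?: number): Leg {
  if (ratio === undefined || !leg.cost || leg.cost.cacheRead !== undefined) return leg;
  return { ...leg, cost: { ...leg.cost, cacheRead: leg.cost.input * ratio } };
}

/** Hypothetical: input-side cost and saving for a call with some cached tokens. */
function inputCostUsd(cost: Cost, freshTokens: number, cachedTokens: number) {
  const cacheRate = cost.cacheRead ?? cost.input; // no cacheRead: full input rate
  const costUsd = (freshTokens * cost.input + cachedTokens * cacheRate) / 1e6;
  const cachedSavingUsd = (cachedTokens * (cost.input - cacheRate)) / 1e6;
  return { costUsd, cachedSavingUsd };
}

const plain: Leg = { label: "no-cache-price", cost: { input: 0.5, output: 3 } };
const explicit: Leg = { label: "explicit", cost: { input: 2, output: 8, cacheRead: 1 } };

const filled = withDefaultCacheRead(plain, 0.1); // cacheRead becomes 0.5 * 0.1
const untouched = withDefaultCacheRead(explicit, 0.1); // returned as-is
```

Under these assumptions, a leg without `cacheRead` and without a default ratio reports a zero saving, since cached tokens are charged at the full input rate. That matches the "Unset = previous behavior" note in the changelog.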
package/dist/index.d.cts
CHANGED
|
@@ -165,6 +165,40 @@ interface CallRecord {
|
|
|
165
165
|
*/
|
|
166
166
|
emptyCompletion?: boolean;
|
|
167
167
|
}
|
|
168
|
+
/**
|
|
169
|
+
* A transport-level failure (provider unreachable / socket dropped / DNS /
|
|
170
|
+
* connect timeout). These carry no HTTP status, so they must be detected
|
|
171
|
+
* structurally — by Node `code` or message — or they read as non-retryable.
|
|
172
|
+
* Note: a deliberate caller cancellation (AbortError without a network code) is
|
|
173
|
+
* intentionally NOT treated as network here, so we don't "fail over" a request
|
|
174
|
+
* the caller chose to abort.
|
|
175
|
+
*/
|
|
176
|
+
declare function isNetworkError(error: unknown): boolean;
|
|
177
|
+
/** Default switch criterion: provider down / rate-limited / overloaded / unreachable. */
|
|
178
|
+
declare function isRetryableError(error: unknown): boolean;
|
|
179
|
+
/**
|
|
180
|
+
* A deliberate caller cancellation (an `AbortSignal` fired by the app). This is
|
|
181
|
+
* the one failure we must NEVER fail over: re-issuing an aborted request to the
|
|
182
|
+
* next provider is the opposite of what the caller asked for. Detected by name
|
|
183
|
+
* (`fetch`/AI SDK emit an `AbortError`) and by the canonical abort message.
|
|
184
|
+
*/
|
|
185
|
+
declare function isAbortError(error: unknown): boolean;
|
|
186
|
+
/**
|
|
187
|
+
* Default failover criterion — broader than {@link isRetryableError} on purpose.
|
|
188
|
+
* It fails over on *anything* except a deliberate caller cancellation, including
|
|
189
|
+
* a client error such as a 400. In the OpenAI-compatible aggregator world a 400
|
|
190
|
+
* is most often "THIS provider won't take this request" (an unsupported param, a
|
|
191
|
+
* model it hasn't listed, a stricter schema) rather than a universally-broken
|
|
192
|
+
* request — and the next provider may well serve it, which is the whole point of
|
|
193
|
+
* the router. When every provider rejects the request, the engine still throws
|
|
194
|
+
* (surfacing the original error), so a genuinely-bad request stays debuggable.
|
|
195
|
+
* The failed attempts keep their precise {@link ErrorKind} (`"client"` for a
|
|
196
|
+
* 400) so a real caller bug is still visible in the {@link CallRecord}.
|
|
197
|
+
*
|
|
198
|
+
* Pass a custom `shouldRetry` to opt out (e.g. `isRetryableError` to restore the
|
|
199
|
+
* stricter "client errors fail fast" behavior).
|
|
200
|
+
*/
|
|
201
|
+
declare function shouldFailover(error: unknown): boolean;
|
|
168
202
|
/**
|
|
169
203
|
* Normalize an error into a short, log-friendly class for {@link CallRecord}.
|
|
170
204
|
* An HTTP status wins (e.g. "502", "429"); otherwise the first matching
|
|
@@ -589,6 +623,29 @@ interface LCRConfig {
|
|
|
589
623
|
* you. Pair with `formatCallRecord` for a one-line log. See {@link CallRecord}.
|
|
590
624
|
*/
|
|
591
625
|
onCall?: (record: CallRecord) => void;
|
|
626
|
+
/**
|
|
627
|
+
* Decide whether a failed attempt should fail over to the next provider.
|
|
628
|
+
* Defaults to {@link shouldFailover} — fail over on everything except a
|
|
629
|
+
* deliberate caller cancellation, so a provider-specific 400 still survives by
|
|
630
|
+
* trying the next provider. Pass {@link isRetryableError} to restore the
|
|
631
|
+
* stricter behavior where a client error (e.g. 400) fails fast.
|
|
632
|
+
*/
|
|
633
|
+
shouldRetry?: (error: unknown) => boolean;
|
|
634
|
+
/**
|
|
635
|
+
* Fallback prompt-cache read rate, as a fraction of each leg's `input` price,
|
|
636
|
+
* applied ONLY to legs whose `cost` omits an explicit `cacheRead`. So a leg
|
|
637
|
+
* priced `{ input: 0.5, output: 3 }` with `defaultCacheReadRatio: 0.1` bills
|
|
638
|
+
* its cached input tokens at `0.05`/1M and reports the resulting
|
|
639
|
+
* `cachedSavingUsd` — without every route having to hardcode `cacheRead`.
|
|
640
|
+
*
|
|
641
|
+
* Most providers' cache-read price is ~0.1× input (Anthropic, Gemini, DeepSeek);
|
|
642
|
+
* `0.1` is a sane default. Legs with their own `cacheRead` are untouched, so set
|
|
643
|
+
* it explicitly for outliers (e.g. OpenAI's ~0.5×). Unset = pre-existing
|
|
644
|
+
* behavior: cached tokens bill at the full input rate and save nothing.
|
|
645
|
+
* Caching is detected from the provider's reported usage either way; this only
|
|
646
|
+
* controls the *price* applied to it. Must be in [0, 1].
|
|
647
|
+
*/
|
|
648
|
+
defaultCacheReadRatio?: number;
|
|
592
649
|
}
|
|
593
650
|
/** Resolve a logical model name to a routed model. */
|
|
594
651
|
type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
@@ -599,4 +656,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
|
599
656
|
*/
|
|
600
657
|
declare function createLCR(config: LCRConfig): LCRRouter;
|
|
601
658
|
|
|
602
|
-
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
|
659
|
+
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, isAbortError, isNetworkError, isRetryableError, normalizedCents, rankRoutes, referenceMegapixels, shouldFailover };
|
package/dist/index.d.ts
CHANGED
|
@@ -165,6 +165,40 @@ interface CallRecord {
|
|
|
165
165
|
*/
|
|
166
166
|
emptyCompletion?: boolean;
|
|
167
167
|
}
|
|
168
|
+
/**
|
|
169
|
+
* A transport-level failure (provider unreachable / socket dropped / DNS /
|
|
170
|
+
* connect timeout). These carry no HTTP status, so they must be detected
|
|
171
|
+
* structurally — by Node `code` or message — or they read as non-retryable.
|
|
172
|
+
* Note: a deliberate caller cancellation (AbortError without a network code) is
|
|
173
|
+
* intentionally NOT treated as network here, so we don't "fail over" a request
|
|
174
|
+
* the caller chose to abort.
|
|
175
|
+
*/
|
|
176
|
+
declare function isNetworkError(error: unknown): boolean;
|
|
177
|
+
/** Default switch criterion: provider down / rate-limited / overloaded / unreachable. */
|
|
178
|
+
declare function isRetryableError(error: unknown): boolean;
|
|
179
|
+
/**
|
|
180
|
+
* A deliberate caller cancellation (an `AbortSignal` fired by the app). This is
|
|
181
|
+
* the one failure we must NEVER fail over: re-issuing an aborted request to the
|
|
182
|
+
* next provider is the opposite of what the caller asked for. Detected by name
|
|
183
|
+
* (`fetch`/AI SDK emit an `AbortError`) and by the canonical abort message.
|
|
184
|
+
*/
|
|
185
|
+
declare function isAbortError(error: unknown): boolean;
|
|
186
|
+
/**
|
|
187
|
+
* Default failover criterion — broader than {@link isRetryableError} on purpose.
|
|
188
|
+
* It fails over on *anything* except a deliberate caller cancellation, including
|
|
189
|
+
* a client error such as a 400. In the OpenAI-compatible aggregator world a 400
|
|
190
|
+
* is most often "THIS provider won't take this request" (an unsupported param, a
|
|
191
|
+
* model it hasn't listed, a stricter schema) rather than a universally-broken
|
|
192
|
+
* request — and the next provider may well serve it, which is the whole point of
|
|
193
|
+
* the router. When every provider rejects the request, the engine still throws
|
|
194
|
+
* (surfacing the original error), so a genuinely-bad request stays debuggable.
|
|
195
|
+
* The failed attempts keep their precise {@link ErrorKind} (`"client"` for a
|
|
196
|
+
* 400) so a real caller bug is still visible in the {@link CallRecord}.
|
|
197
|
+
*
|
|
198
|
+
* Pass a custom `shouldRetry` to opt out (e.g. `isRetryableError` to restore the
|
|
199
|
+
* stricter "client errors fail fast" behavior).
|
|
200
|
+
*/
|
|
201
|
+
declare function shouldFailover(error: unknown): boolean;
|
|
168
202
|
/**
|
|
169
203
|
* Normalize an error into a short, log-friendly class for {@link CallRecord}.
|
|
170
204
|
* An HTTP status wins (e.g. "502", "429"); otherwise the first matching
|
|
@@ -589,6 +623,29 @@ interface LCRConfig {
|
|
|
589
623
|
* you. Pair with `formatCallRecord` for a one-line log. See {@link CallRecord}.
|
|
590
624
|
*/
|
|
591
625
|
onCall?: (record: CallRecord) => void;
|
|
626
|
+
/**
|
|
627
|
+
* Decide whether a failed attempt should fail over to the next provider.
|
|
628
|
+
* Defaults to {@link shouldFailover} — fail over on everything except a
|
|
629
|
+
* deliberate caller cancellation, so a provider-specific 400 still survives by
|
|
630
|
+
* trying the next provider. Pass {@link isRetryableError} to restore the
|
|
631
|
+
* stricter behavior where a client error (e.g. 400) fails fast.
|
|
632
|
+
*/
|
|
633
|
+
shouldRetry?: (error: unknown) => boolean;
|
|
634
|
+
/**
|
|
635
|
+
* Fallback prompt-cache read rate, as a fraction of each leg's `input` price,
|
|
636
|
+
* applied ONLY to legs whose `cost` omits an explicit `cacheRead`. So a leg
|
|
637
|
+
* priced `{ input: 0.5, output: 3 }` with `defaultCacheReadRatio: 0.1` bills
|
|
638
|
+
* its cached input tokens at `0.05`/1M and reports the resulting
|
|
639
|
+
* `cachedSavingUsd` — without every route having to hardcode `cacheRead`.
|
|
640
|
+
*
|
|
641
|
+
* Most providers' cache-read price is ~0.1× input (Anthropic, Gemini, DeepSeek);
|
|
642
|
+
* `0.1` is a sane default. Legs with their own `cacheRead` are untouched, so set
|
|
643
|
+
* it explicitly for outliers (e.g. OpenAI's ~0.5×). Unset = pre-existing
|
|
644
|
+
* behavior: cached tokens bill at the full input rate and save nothing.
|
|
645
|
+
* Caching is detected from the provider's reported usage either way; this only
|
|
646
|
+
* controls the *price* applied to it. Must be in [0, 1].
|
|
647
|
+
*/
|
|
648
|
+
defaultCacheReadRatio?: number;
|
|
592
649
|
}
|
|
593
650
|
/** Resolve a logical model name to a routed model. */
|
|
594
651
|
type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
@@ -599,4 +656,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
|
599
656
|
*/
|
|
600
657
|
declare function createLCR(config: LCRConfig): LCRRouter;
|
|
601
658
|
|
|
602
|
-
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
|
659
|
+
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, isAbortError, isNetworkError, isRetryableError, normalizedCents, rankRoutes, referenceMegapixels, shouldFailover };
|
package/dist/index.js
CHANGED
|
@@ -116,6 +116,15 @@ function isRetryableError(error) {
|
|
|
116
116
|
const { text } = errorSignals(error);
|
|
117
117
|
return RETRYABLE_PATTERNS.some((p) => text.includes(p));
|
|
118
118
|
}
|
|
119
|
+
function isAbortError(error) {
|
|
120
|
+
const e = error;
|
|
121
|
+
if (typeof e?.name === "string" && e.name === "AbortError") return true;
|
|
122
|
+
const { text } = errorSignals(error);
|
|
123
|
+
return text.includes("operation was aborted") || text.includes("operation was canceled");
|
|
124
|
+
}
|
|
125
|
+
function shouldFailover(error) {
|
|
126
|
+
return !isAbortError(error);
|
|
127
|
+
}
|
|
119
128
|
function classifyError(error) {
|
|
120
129
|
if (error instanceof EmptyCompletionError) return "empty_completion";
|
|
121
130
|
const e = error;
|
|
@@ -239,7 +248,7 @@ var LcrFallbackModel = class {
|
|
|
239
248
|
this.lastFailoverAt = Date.now();
|
|
240
249
|
}
|
|
241
250
|
shouldRetry(error) {
|
|
242
|
-
return (this.opts.shouldRetry ??
|
|
251
|
+
return (this.opts.shouldRetry ?? shouldFailover)(error);
|
|
243
252
|
}
|
|
244
253
|
// Observer callbacks are caller-supplied logging hooks: a throw from one of
|
|
245
254
|
// them must NEVER turn a successful (or already-failed) request into a
|
|
@@ -272,6 +281,7 @@ var LcrFallbackModel = class {
|
|
|
272
281
|
}
|
|
273
282
|
/** Record a failed attempt onto the call's chain (no event yet). */
|
|
274
283
|
recordFail(ctx, provider, attemptStart, error) {
|
|
284
|
+
if (ctx.firstError === void 0) ctx.firstError = error;
|
|
275
285
|
ctx.attempts.push({
|
|
276
286
|
provider: provider.label,
|
|
277
287
|
ok: false,
|
|
@@ -387,7 +397,7 @@ var LcrFallbackModel = class {
|
|
|
387
397
|
}
|
|
388
398
|
}
|
|
389
399
|
this.finalizeFail(ctx);
|
|
390
|
-
throw lastError;
|
|
400
|
+
throw ctx.firstError ?? lastError;
|
|
391
401
|
}
|
|
392
402
|
async doStream(options) {
|
|
393
403
|
return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
|
|
@@ -423,7 +433,7 @@ var LcrFallbackModel = class {
|
|
|
423
433
|
tried++;
|
|
424
434
|
if (tried >= n) {
|
|
425
435
|
this.finalizeFail(ctx);
|
|
426
|
-
throw error;
|
|
436
|
+
throw ctx.firstError ?? error;
|
|
427
437
|
}
|
|
428
438
|
idx = (idx + 1) % n;
|
|
429
439
|
}
|
|
@@ -471,7 +481,7 @@ var LcrFallbackModel = class {
|
|
|
471
481
|
const nextTried = triedBeforeServing + 1;
|
|
472
482
|
if (nextTried >= n) {
|
|
473
483
|
self.finalizeFail(ctx);
|
|
474
|
-
controller.error(error);
|
|
484
|
+
controller.error(ctx.firstError ?? error);
|
|
475
485
|
return;
|
|
476
486
|
}
|
|
477
487
|
try {
|
|
@@ -1182,17 +1192,35 @@ function normalize(entry) {
|
|
|
1182
1192
|
function priceKey(p) {
|
|
1183
1193
|
return p.cost ? p.cost.input + p.cost.output : Number.POSITIVE_INFINITY;
|
|
1184
1194
|
}
|
|
1195
|
+
function withDefaultCacheRead(p, ratio) {
|
|
1196
|
+
if (ratio === void 0 || !p.cost || p.cost.cacheRead !== void 0) return p;
|
|
1197
|
+
return { ...p, cost: { ...p.cost, cacheRead: p.cost.input * ratio } };
|
|
1198
|
+
}
|
|
1185
1199
|
function createLCR(config) {
|
|
1186
|
-
const {
|
|
1200
|
+
const {
|
|
1201
|
+
models,
|
|
1202
|
+
autoSort = false,
|
|
1203
|
+
resetIntervalMs,
|
|
1204
|
+
onError,
|
|
1205
|
+
onCost,
|
|
1206
|
+
onCall,
|
|
1207
|
+
shouldRetry,
|
|
1208
|
+
defaultCacheReadRatio
|
|
1209
|
+
} = config;
|
|
1210
|
+
if (defaultCacheReadRatio !== void 0 && (defaultCacheReadRatio < 0 || defaultCacheReadRatio > 1)) {
|
|
1211
|
+
throw new Error(
|
|
1212
|
+
`ai-lcr: defaultCacheReadRatio must be in [0, 1], got ${defaultCacheReadRatio}`
|
|
1213
|
+
);
|
|
1214
|
+
}
|
|
1187
1215
|
const routed = /* @__PURE__ */ new Map();
|
|
1188
1216
|
for (const [name, entries] of Object.entries(models)) {
|
|
1189
|
-
let providers = entries.map(normalize);
|
|
1217
|
+
let providers = entries.map(normalize).map((p) => withDefaultCacheRead(p, defaultCacheReadRatio));
|
|
1190
1218
|
if (autoSort) {
|
|
1191
1219
|
providers = [...providers].sort((a, b) => priceKey(a) - priceKey(b));
|
|
1192
1220
|
}
|
|
1193
1221
|
routed.set(
|
|
1194
1222
|
name,
|
|
1195
|
-
new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall })
|
|
1223
|
+
new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
|
|
1196
1224
|
);
|
|
1197
1225
|
}
|
|
1198
1226
|
return (modelName) => {
|
|
@@ -1220,7 +1248,11 @@ export {
|
|
|
1220
1248
|
createMediaLCR,
|
|
1221
1249
|
createRunwareMediaAdapter,
|
|
1222
1250
|
formatCallRecord,
|
|
1251
|
+
isAbortError,
|
|
1252
|
+
isNetworkError,
|
|
1253
|
+
isRetryableError,
|
|
1223
1254
|
normalizedCents,
|
|
1224
1255
|
rankRoutes,
|
|
1225
|
-
referenceMegapixels
|
|
1256
|
+
referenceMegapixels,
|
|
1257
|
+
shouldFailover
|
|
1226
1258
|
};
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "ai-lcr",
|
|
3
|
-
"version": "0.5.
|
|
3
|
+
"version": "0.5.4",
|
|
4
4
|
"description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"ai",
|