ai-lcr 0.5.2 → 0.5.4

package/CHANGELOG.md CHANGED
@@ -4,6 +4,57 @@ All notable changes to `ai-lcr` are documented here. The format follows
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
  [Semantic Versioning](https://semver.org/).
 
+ ## [0.5.4] — 2026-06-03
+
+ ### Changed
+
+ - **A provider 400 now fails over instead of being passed through.** Previously
+   any client error (400/422/…) was treated as the caller's fault and thrown
+   immediately, killing the request even when another provider would have served
+   it. But across OpenAI-compatible aggregators a 400 is most often
+   *provider-specific* — an unsupported parameter, a model the provider hasn't
+   listed, a stricter JSON schema — not a universally-broken request. The default
+   failover gate (`shouldFailover`) now advances to the next provider on **any**
+   failure except a deliberate caller cancellation (`AbortSignal`), which is the
+   one thing we must never re-issue elsewhere. When every provider rejects the
+   request it still throws — now surfacing the **first** (original) error rather
+   than the last fallback's, so a genuine caller bug stays debuggable. Failed
+   attempts keep their precise `ErrorKind` (`"client"` for a 400) in the
+   `CallRecord`, so a real bug is still visible.
+
+   To restore the old "client errors fail fast" behavior, pass
+   `shouldRetry: isRetryableError` to `createLCR`.
+
+ ### Added
+
+ - **`createLCR({ shouldRetry })`.** The failover predicate is now configurable
+   from the top-level API (it previously existed only on the internal engine), so
+   callers can tune or fully override the policy above.
+ - **Exported error predicates** `isRetryableError`, `isNetworkError`,
+   `isAbortError`, and `shouldFailover` — building blocks for a custom
+   `shouldRetry`.
+
+ ## [0.5.3] — 2026-06-03
+
+ All additions are optional and backward compatible.
+
+ ### Added
+
+ - **`defaultCacheReadRatio` — chain-wide fallback price for prompt-cache reads.**
+   ai-lcr already detects cache hits from the provider's reported usage and emits
+   `cachedInputTokens` for any provider that reports them (Anthropic, Gemini's
+   implicit cache, DeepSeek, …). But the *saving* (`cachedSavingUsd`) and the
+   cache-discounted `costUsd` were only computed when a leg set an explicit
+   `cost.cacheRead` — so a route that forgot it (e.g. a Gemini OpenRouter leg)
+   silently reported `$0` saved and billed cached tokens at the full input rate.
+
+   `createLCR({ defaultCacheReadRatio: 0.1 })` now supplies a fallback cache-read
+   price as a fraction of each leg's `input`, applied **only** to legs that omit
+   an explicit `cacheRead`. Most providers' cache-read price is ~0.1× input, so
+   `0.1` makes cache cost + savings "just work" across every model without each
+   route hardcoding a rate. Legs with their own `cacheRead` are untouched (set it
+   for outliers like OpenAI's ~0.5×). Unset = previous behavior. Must be in [0, 1].
+
  ## [0.5.0] — 2026-06-02
 
  All additions are optional and backward compatible.
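The `defaultCacheReadRatio` arithmetic described in the 0.5.3 entry can be sketched in a few lines of TypeScript. This is a minimal illustration, not the package's implementation: the `Provider` and `ProviderCost` shapes are simplified stand-ins, and `inputCostUsd` is a hypothetical helper that assumes prices are USD per 1M tokens and that `inputTokens` already includes the cached tokens. Only `withDefaultCacheRead` follows logic that is visible in this diff.

```typescript
// Simplified, hypothetical shapes; the real ai-lcr types carry more fields.
interface ProviderCost {
  input: number;      // assumed: USD per 1M input tokens
  output: number;     // assumed: USD per 1M output tokens
  cacheRead?: number; // assumed: USD per 1M cached input tokens
}

interface Provider {
  label: string;
  cost?: ProviderCost;
}

// Same rule as the helper added in this diff: fill in `cacheRead` only when
// a ratio is configured, the leg has a cost, and the leg did not set its own.
function withDefaultCacheRead(p: Provider, ratio?: number): Provider {
  if (ratio === undefined || !p.cost || p.cost.cacheRead !== undefined) return p;
  return { ...p, cost: { ...p.cost, cacheRead: p.cost.input * ratio } };
}

// Hypothetical cost formula for illustration: uncached tokens bill at `input`,
// cached tokens bill at `cacheRead`, falling back to `input` when it is unset.
function inputCostUsd(cost: ProviderCost, inputTokens: number, cachedTokens: number): number {
  const cachedRate = cost.cacheRead ?? cost.input;
  return ((inputTokens - cachedTokens) * cost.input + cachedTokens * cachedRate) / 1e6;
}

const leg: Provider = { label: "example-leg", cost: { input: 0.5, output: 3 } };
const outlier: Provider = { label: "outlier-leg", cost: { input: 2, output: 8, cacheRead: 1 } };

const pricedLeg = withDefaultCacheRead(leg, 0.1);         // gains cacheRead = input * 0.1
const pricedOutlier = withDefaultCacheRead(outlier, 0.1); // keeps its explicit cacheRead
```

With 1M input tokens of which 800k are cached, the leg without a cache-read price bills the whole prompt at the input rate, while the leg returned by `withDefaultCacheRead` bills the cached share at a tenth of it; the difference is the kind of saving the changelog says was previously reported as `$0`.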
package/README.md CHANGED
@@ -141,7 +141,7 @@ DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. Fo
  ## How it routes
 
  1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
- 2. **Fall through on failure.** On a retryable error — rate limit, 5xx, timeout, or a **billing cap** (402 / out-of-credit / quota) — it advances to the next provider, streaming-safe. A caller's own bad request (e.g. 400, 422) passes through immediately.
+ 2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
 
  ## See what happened (`onCall`)
@@ -364,7 +364,7 @@ npm run typecheck
  npm test # mocked routing/failover tests + live Kunavo tests
  ```
 
- The suite covers cheapest-first routing, failover on retryable errors (and *not* failing over on a 400), exhausting the whole chain, and a real broken-provider → Kunavo recovery. Live tests run only when `KUNAVO_API_KEY` is set in the environment; otherwise they're skipped.
+ The suite covers cheapest-first routing, failover on retryable errors *and* on a provider 400 (but *not* on a caller cancellation), surfacing the original error when the whole chain is exhausted, and a real broken-provider → Kunavo recovery. Live tests run only when `KUNAVO_API_KEY` is set in the environment; otherwise they're skipped.
 
  ## Credits
 
package/README.zh-CN.md CHANGED
@@ -141,7 +141,7 @@ DeepInfra 只承载开源权重——没有第一方 Claude / GPT / Gemini。那
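  ## How it routes
 
  1. **Cheapest first.** Providers are tried in order: list them cheapest-first, or set `autoSort: true` to sort them by `cost` automatically.
- 2. **Fall through on failure.** On a retryable error (rate limit, 5xx, timeout) it advances to the next provider, streaming-safe. Hard errors (400, 401, 403, 422) pass straight through without a retry.
+ 2. **Fall through on failure.** On any provider failure, whether a rate limit, 5xx, timeout, an **exhausted quota** (402 / unpaid / insufficient balance), or a client error such as a **400**, it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 usually means "*this* provider won't take this request" (an unsupported parameter, a model it hasn't listed, a stricter schema) rather than a broken request, so another provider may well serve it. If every provider rejects the request it still fails, throwing the **first** (original) error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). To restore the old "client errors fail fast" behavior, pass `shouldRetry: isRetryableError` to `createLCR`.
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it automatically returns to the cheapest provider.
 
  ## Supported providers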
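@@ -280,7 +280,7 @@ npm run typecheck
  npm test # mocked routing / failover tests + live Kunavo tests
  ```
 
- The test suite covers cheapest-first routing, failover on retryable errors (and *not* failing over on a 400), exhausting the whole chain, and a real "provider failure → Kunavo recovery". Live tests run only when the `KUNAVO_API_KEY` environment variable is set; otherwise they are skipped.
+ The test suite covers cheapest-first routing, failover on retryable errors and on a provider 400 (but *not* on a caller cancellation), throwing the original error when the whole chain is exhausted, and a real "provider failure → Kunavo recovery". Live tests run only when the `KUNAVO_API_KEY` environment variable is set; otherwise they are skipped.
 
  ## Credits
 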
package/dist/index.cjs CHANGED
@@ -34,9 +34,13 @@ __export(index_exports, {
    createMediaLCR: () => createMediaLCR,
    createRunwareMediaAdapter: () => createRunwareMediaAdapter,
    formatCallRecord: () => formatCallRecord,
+   isAbortError: () => isAbortError,
+   isNetworkError: () => isNetworkError,
+   isRetryableError: () => isRetryableError,
    normalizedCents: () => normalizedCents,
    rankRoutes: () => rankRoutes,
-   referenceMegapixels: () => referenceMegapixels
+   referenceMegapixels: () => referenceMegapixels,
+   shouldFailover: () => shouldFailover
  });
  module.exports = __toCommonJS(index_exports);
 
@@ -158,6 +162,15 @@ function isRetryableError(error) {
    const { text } = errorSignals(error);
    return RETRYABLE_PATTERNS.some((p) => text.includes(p));
  }
+ function isAbortError(error) {
+   const e = error;
+   if (typeof e?.name === "string" && e.name === "AbortError") return true;
+   const { text } = errorSignals(error);
+   return text.includes("operation was aborted") || text.includes("operation was canceled");
+ }
+ function shouldFailover(error) {
+   return !isAbortError(error);
+ }
  function classifyError(error) {
    if (error instanceof EmptyCompletionError) return "empty_completion";
    const e = error;
@@ -281,7 +294,7 @@ var LcrFallbackModel = class {
      this.lastFailoverAt = Date.now();
    }
    shouldRetry(error) {
-     return (this.opts.shouldRetry ?? isRetryableError)(error);
+     return (this.opts.shouldRetry ?? shouldFailover)(error);
    }
    // Observer callbacks are caller-supplied logging hooks: a throw from one of
    // them must NEVER turn a successful (or already-failed) request into a
@@ -314,6 +327,7 @@ var LcrFallbackModel = class {
    }
    /** Record a failed attempt onto the call's chain (no event yet). */
    recordFail(ctx, provider, attemptStart, error) {
+     if (ctx.firstError === void 0) ctx.firstError = error;
      ctx.attempts.push({
        provider: provider.label,
        ok: false,
@@ -429,7 +443,7 @@ var LcrFallbackModel = class {
        }
      }
      this.finalizeFail(ctx);
-     throw lastError;
+     throw ctx.firstError ?? lastError;
    }
    async doStream(options) {
      return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
@@ -465,7 +479,7 @@ var LcrFallbackModel = class {
          tried++;
          if (tried >= n) {
            this.finalizeFail(ctx);
-           throw error;
+           throw ctx.firstError ?? error;
          }
          idx = (idx + 1) % n;
        }
@@ -513,7 +527,7 @@ var LcrFallbackModel = class {
              const nextTried = triedBeforeServing + 1;
              if (nextTried >= n) {
                self.finalizeFail(ctx);
-               controller.error(error);
+               controller.error(ctx.firstError ?? error);
                return;
              }
              try {
@@ -1224,17 +1238,35 @@ function normalize(entry) {
  function priceKey(p) {
    return p.cost ? p.cost.input + p.cost.output : Number.POSITIVE_INFINITY;
  }
+ function withDefaultCacheRead(p, ratio) {
+   if (ratio === void 0 || !p.cost || p.cost.cacheRead !== void 0) return p;
+   return { ...p, cost: { ...p.cost, cacheRead: p.cost.input * ratio } };
+ }
  function createLCR(config) {
-   const { models, autoSort = false, resetIntervalMs, onError, onCost, onCall } = config;
+   const {
+     models,
+     autoSort = false,
+     resetIntervalMs,
+     onError,
+     onCost,
+     onCall,
+     shouldRetry,
+     defaultCacheReadRatio
+   } = config;
+   if (defaultCacheReadRatio !== void 0 && (defaultCacheReadRatio < 0 || defaultCacheReadRatio > 1)) {
+     throw new Error(
+       `ai-lcr: defaultCacheReadRatio must be in [0, 1], got ${defaultCacheReadRatio}`
+     );
+   }
    const routed = /* @__PURE__ */ new Map();
    for (const [name, entries] of Object.entries(models)) {
-     let providers = entries.map(normalize);
+     let providers = entries.map(normalize).map((p) => withDefaultCacheRead(p, defaultCacheReadRatio));
      if (autoSort) {
        providers = [...providers].sort((a, b) => priceKey(a) - priceKey(b));
      }
      routed.set(
        name,
-       new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall })
+       new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
      );
    }
    return (modelName) => {
@@ -1263,7 +1295,11 @@ function createLCR(config) {
    createMediaLCR,
    createRunwareMediaAdapter,
    formatCallRecord,
+   isAbortError,
+   isNetworkError,
+   isRetryableError,
    normalizedCents,
    rankRoutes,
-   referenceMegapixels
+   referenceMegapixels,
+   shouldFailover
  });
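The engine changes above (record the first error in `recordFail`, then throw `ctx.firstError ?? lastError` once the chain is exhausted) can be sketched as a small standalone loop. This is a simplified, synchronous illustration under stated assumptions, not the package's code: `Attempt` and `callWithFailover` are hypothetical names, `isAbortError` here checks only the error name, and rethrowing immediately when the predicate returns `false` is an assumption about the engine's behavior.

```typescript
// Hypothetical: each provider attempt is modelled as a plain function.
type Attempt<T> = () => T;

// Simplified stand-in: the real predicate also inspects the error message.
function isAbortError(error: unknown): boolean {
  return error instanceof Error && error.name === "AbortError";
}

// Default gate as shown in the diff: fail over on anything but a cancellation.
function shouldFailover(error: unknown): boolean {
  return !isAbortError(error);
}

function callWithFailover<T>(
  attempts: Attempt<T>[],
  shouldRetry: (error: unknown) => boolean = shouldFailover,
): T {
  let firstError: unknown;
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return attempt(); // first provider that succeeds serves the request
    } catch (error) {
      if (firstError === undefined) firstError = error; // remember the original failure
      lastError = error;
      if (!shouldRetry(error)) throw error; // assumed: non-retryable errors stop the chain
    }
  }
  // Chain exhausted: surface the first (original) error, not the last fallback's.
  throw firstError ?? lastError ?? new Error("no providers configured");
}
```

Passing a stricter predicate as the second argument reproduces the pre-0.5.4 shape of the policy, where a client error ends the chain at the first provider.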
package/dist/index.d.cts CHANGED
@@ -165,6 +165,40 @@ interface CallRecord {
       */
      emptyCompletion?: boolean;
  }
+ /**
+  * A transport-level failure (provider unreachable / socket dropped / DNS /
+  * connect timeout). These carry no HTTP status, so they must be detected
+  * structurally — by Node `code` or message — or they read as non-retryable.
+  * Note: a deliberate caller cancellation (AbortError without a network code) is
+  * intentionally NOT treated as network here, so we don't "fail over" a request
+  * the caller chose to abort.
+  */
+ declare function isNetworkError(error: unknown): boolean;
+ /** Default switch criterion: provider down / rate-limited / overloaded / unreachable. */
+ declare function isRetryableError(error: unknown): boolean;
+ /**
+  * A deliberate caller cancellation (an `AbortSignal` fired by the app). This is
+  * the one failure we must NEVER fail over: re-issuing an aborted request to the
+  * next provider is the opposite of what the caller asked for. Detected by name
+  * (`fetch`/AI SDK emit an `AbortError`) and by the canonical abort message.
+  */
+ declare function isAbortError(error: unknown): boolean;
+ /**
+  * Default failover criterion — broader than {@link isRetryableError} on purpose.
+  * It fails over on *anything* except a deliberate caller cancellation, including
+  * a client error such as a 400. In the OpenAI-compatible aggregator world a 400
+  * is most often "THIS provider won't take this request" (an unsupported param, a
+  * model it hasn't listed, a stricter schema) rather than a universally-broken
+  * request — and the next provider may well serve it, which is the whole point of
+  * the router. When every provider rejects the request, the engine still throws
+  * (surfacing the original error), so a genuinely-bad request stays debuggable.
+  * The failed attempts keep their precise {@link ErrorKind} (`"client"` for a
+  * 400) so a real caller bug is still visible in the {@link CallRecord}.
+  *
+  * Pass a custom `shouldRetry` to opt out (e.g. `isRetryableError` to restore the
+  * stricter "client errors fail fast" behavior).
+  */
+ declare function shouldFailover(error: unknown): boolean;
  /**
   * Normalize an error into a short, log-friendly class for {@link CallRecord}.
   * An HTTP status wins (e.g. "502", "429"); otherwise the first matching
@@ -589,6 +623,29 @@ interface LCRConfig {
       * you. Pair with `formatCallRecord` for a one-line log. See {@link CallRecord}.
       */
      onCall?: (record: CallRecord) => void;
+     /**
+      * Decide whether a failed attempt should fail over to the next provider.
+      * Defaults to {@link shouldFailover} — fail over on everything except a
+      * deliberate caller cancellation, so a provider-specific 400 still survives by
+      * trying the next provider. Pass {@link isRetryableError} to restore the
+      * stricter behavior where a client error (e.g. 400) fails fast.
+      */
+     shouldRetry?: (error: unknown) => boolean;
+     /**
+      * Fallback prompt-cache read rate, as a fraction of each leg's `input` price,
+      * applied ONLY to legs whose `cost` omits an explicit `cacheRead`. So a leg
+      * priced `{ input: 0.5, output: 3 }` with `defaultCacheReadRatio: 0.1` bills
+      * its cached input tokens at `0.05`/1M and reports the resulting
+      * `cachedSavingUsd` — without every route having to hardcode `cacheRead`.
+      *
+      * Most providers' cache-read price is ~0.1× input (Anthropic, Gemini, DeepSeek);
+      * `0.1` is a sane default. Legs with their own `cacheRead` are untouched, so set
+      * it explicitly for outliers (e.g. OpenAI's ~0.5×). Unset = pre-existing
+      * behavior: cached tokens bill at the full input rate and save nothing.
+      * Caching is detected from the provider's reported usage either way; this only
+      * controls the *price* applied to it. Must be in [0, 1].
+      */
+     defaultCacheReadRatio?: number;
  }
  /** Resolve a logical model name to a routed model. */
  type LCRRouter = (modelName: string) => LanguageModelV3;
@@ -599,4 +656,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
   */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+ export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, isAbortError, isNetworkError, isRetryableError, normalizedCents, rankRoutes, referenceMegapixels, shouldFailover };
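The declarations above describe the exported predicates as building blocks for a custom `shouldRetry`. A minimal sketch of one such policy follows, written with local stand-ins so it runs on its own. Everything here is an assumption made for illustration: `statusOf` and `failoverExceptAuth` are hypothetical names, reading the HTTP status from `status` or `statusCode` is a guess about the error shape, and the local `isAbortError` only approximates the exported one (which reads its text through an internal `errorSignals` helper).

```typescript
// Local approximation of the exported isAbortError, following the logic in the diff.
function isAbortError(error: unknown): boolean {
  const e = error as { name?: unknown; message?: unknown } | null | undefined;
  if (typeof e?.name === "string" && e.name === "AbortError") return true;
  const text = typeof e?.message === "string" ? e.message.toLowerCase() : "";
  return text.includes("operation was aborted") || text.includes("operation was canceled");
}

// Hypothetical helper: read an HTTP status off an error object when one is present.
function statusOf(error: unknown): number | undefined {
  const e = error as { status?: unknown; statusCode?: unknown } | null | undefined;
  const s = e?.status ?? e?.statusCode;
  return typeof s === "number" ? s : undefined;
}

// Custom policy: never re-issue a cancelled request, fail fast on auth errors
// (401 / 403), and fail over on everything else, including a provider 400.
function failoverExceptAuth(error: unknown): boolean {
  if (isAbortError(error)) return false;
  const status = statusOf(error);
  return status !== 401 && status !== 403;
}

// Intended wiring, per the LCRConfig declaration above (not executed here):
//   createLCR({ models, shouldRetry: failoverExceptAuth })
```

In a real project the local `isAbortError` would presumably be replaced by the one imported from `ai-lcr`, so the cancellation check stays in step with the library's own detection.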
package/dist/index.d.ts CHANGED
@@ -165,6 +165,40 @@ interface CallRecord {
       */
      emptyCompletion?: boolean;
  }
+ /**
+  * A transport-level failure (provider unreachable / socket dropped / DNS /
+  * connect timeout). These carry no HTTP status, so they must be detected
+  * structurally — by Node `code` or message — or they read as non-retryable.
+  * Note: a deliberate caller cancellation (AbortError without a network code) is
+  * intentionally NOT treated as network here, so we don't "fail over" a request
+  * the caller chose to abort.
+  */
+ declare function isNetworkError(error: unknown): boolean;
+ /** Default switch criterion: provider down / rate-limited / overloaded / unreachable. */
+ declare function isRetryableError(error: unknown): boolean;
+ /**
+  * A deliberate caller cancellation (an `AbortSignal` fired by the app). This is
+  * the one failure we must NEVER fail over: re-issuing an aborted request to the
+  * next provider is the opposite of what the caller asked for. Detected by name
+  * (`fetch`/AI SDK emit an `AbortError`) and by the canonical abort message.
+  */
+ declare function isAbortError(error: unknown): boolean;
+ /**
+  * Default failover criterion — broader than {@link isRetryableError} on purpose.
+  * It fails over on *anything* except a deliberate caller cancellation, including
+  * a client error such as a 400. In the OpenAI-compatible aggregator world a 400
+  * is most often "THIS provider won't take this request" (an unsupported param, a
+  * model it hasn't listed, a stricter schema) rather than a universally-broken
+  * request — and the next provider may well serve it, which is the whole point of
+  * the router. When every provider rejects the request, the engine still throws
+  * (surfacing the original error), so a genuinely-bad request stays debuggable.
+  * The failed attempts keep their precise {@link ErrorKind} (`"client"` for a
+  * 400) so a real caller bug is still visible in the {@link CallRecord}.
+  *
+  * Pass a custom `shouldRetry` to opt out (e.g. `isRetryableError` to restore the
+  * stricter "client errors fail fast" behavior).
+  */
+ declare function shouldFailover(error: unknown): boolean;
  /**
   * Normalize an error into a short, log-friendly class for {@link CallRecord}.
   * An HTTP status wins (e.g. "502", "429"); otherwise the first matching
@@ -589,6 +623,29 @@ interface LCRConfig {
       * you. Pair with `formatCallRecord` for a one-line log. See {@link CallRecord}.
       */
      onCall?: (record: CallRecord) => void;
+     /**
+      * Decide whether a failed attempt should fail over to the next provider.
+      * Defaults to {@link shouldFailover} — fail over on everything except a
+      * deliberate caller cancellation, so a provider-specific 400 still survives by
+      * trying the next provider. Pass {@link isRetryableError} to restore the
+      * stricter behavior where a client error (e.g. 400) fails fast.
+      */
+     shouldRetry?: (error: unknown) => boolean;
+     /**
+      * Fallback prompt-cache read rate, as a fraction of each leg's `input` price,
+      * applied ONLY to legs whose `cost` omits an explicit `cacheRead`. So a leg
+      * priced `{ input: 0.5, output: 3 }` with `defaultCacheReadRatio: 0.1` bills
+      * its cached input tokens at `0.05`/1M and reports the resulting
+      * `cachedSavingUsd` — without every route having to hardcode `cacheRead`.
+      *
+      * Most providers' cache-read price is ~0.1× input (Anthropic, Gemini, DeepSeek);
+      * `0.1` is a sane default. Legs with their own `cacheRead` are untouched, so set
+      * it explicitly for outliers (e.g. OpenAI's ~0.5×). Unset = pre-existing
+      * behavior: cached tokens bill at the full input rate and save nothing.
+      * Caching is detected from the provider's reported usage either way; this only
+      * controls the *price* applied to it. Must be in [0, 1].
+      */
+     defaultCacheReadRatio?: number;
  }
  /** Resolve a logical model name to a routed model. */
  type LCRRouter = (modelName: string) => LanguageModelV3;
@@ -599,4 +656,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
   */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+ export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, isAbortError, isNetworkError, isRetryableError, normalizedCents, rankRoutes, referenceMegapixels, shouldFailover };
package/dist/index.js CHANGED
@@ -116,6 +116,15 @@ function isRetryableError(error) {
    const { text } = errorSignals(error);
    return RETRYABLE_PATTERNS.some((p) => text.includes(p));
  }
+ function isAbortError(error) {
+   const e = error;
+   if (typeof e?.name === "string" && e.name === "AbortError") return true;
+   const { text } = errorSignals(error);
+   return text.includes("operation was aborted") || text.includes("operation was canceled");
+ }
+ function shouldFailover(error) {
+   return !isAbortError(error);
+ }
  function classifyError(error) {
    if (error instanceof EmptyCompletionError) return "empty_completion";
    const e = error;
@@ -239,7 +248,7 @@ var LcrFallbackModel = class {
      this.lastFailoverAt = Date.now();
    }
    shouldRetry(error) {
-     return (this.opts.shouldRetry ?? isRetryableError)(error);
+     return (this.opts.shouldRetry ?? shouldFailover)(error);
    }
    // Observer callbacks are caller-supplied logging hooks: a throw from one of
    // them must NEVER turn a successful (or already-failed) request into a
@@ -272,6 +281,7 @@ var LcrFallbackModel = class {
    }
    /** Record a failed attempt onto the call's chain (no event yet). */
    recordFail(ctx, provider, attemptStart, error) {
+     if (ctx.firstError === void 0) ctx.firstError = error;
      ctx.attempts.push({
        provider: provider.label,
        ok: false,
@@ -387,7 +397,7 @@ var LcrFallbackModel = class {
        }
      }
      this.finalizeFail(ctx);
-     throw lastError;
+     throw ctx.firstError ?? lastError;
    }
    async doStream(options) {
      return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
@@ -423,7 +433,7 @@ var LcrFallbackModel = class {
          tried++;
          if (tried >= n) {
            this.finalizeFail(ctx);
-           throw error;
+           throw ctx.firstError ?? error;
          }
          idx = (idx + 1) % n;
        }
@@ -471,7 +481,7 @@ var LcrFallbackModel = class {
              const nextTried = triedBeforeServing + 1;
              if (nextTried >= n) {
                self.finalizeFail(ctx);
-               controller.error(error);
+               controller.error(ctx.firstError ?? error);
                return;
              }
              try {
@@ -1182,17 +1192,35 @@ function normalize(entry) {
  function priceKey(p) {
    return p.cost ? p.cost.input + p.cost.output : Number.POSITIVE_INFINITY;
  }
+ function withDefaultCacheRead(p, ratio) {
+   if (ratio === void 0 || !p.cost || p.cost.cacheRead !== void 0) return p;
+   return { ...p, cost: { ...p.cost, cacheRead: p.cost.input * ratio } };
+ }
  function createLCR(config) {
-   const { models, autoSort = false, resetIntervalMs, onError, onCost, onCall } = config;
+   const {
+     models,
+     autoSort = false,
+     resetIntervalMs,
+     onError,
+     onCost,
+     onCall,
+     shouldRetry,
+     defaultCacheReadRatio
+   } = config;
+   if (defaultCacheReadRatio !== void 0 && (defaultCacheReadRatio < 0 || defaultCacheReadRatio > 1)) {
+     throw new Error(
+       `ai-lcr: defaultCacheReadRatio must be in [0, 1], got ${defaultCacheReadRatio}`
+     );
+   }
    const routed = /* @__PURE__ */ new Map();
    for (const [name, entries] of Object.entries(models)) {
-     let providers = entries.map(normalize);
+     let providers = entries.map(normalize).map((p) => withDefaultCacheRead(p, defaultCacheReadRatio));
      if (autoSort) {
        providers = [...providers].sort((a, b) => priceKey(a) - priceKey(b));
      }
      routed.set(
        name,
-       new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall })
+       new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
      );
    }
    return (modelName) => {
@@ -1220,7 +1248,11 @@ export {
    createMediaLCR,
    createRunwareMediaAdapter,
    formatCallRecord,
+   isAbortError,
+   isNetworkError,
+   isRetryableError,
    normalizedCents,
    rankRoutes,
-   referenceMegapixels
+   referenceMegapixels,
+   shouldFailover
  };
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "ai-lcr",
-   "version": "0.5.2",
+   "version": "0.5.4",
    "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
    "keywords": [
      "ai",