ai-lcr 0.6.1 → 0.6.2

package/CHANGELOG.md CHANGED
@@ -4,11 +4,42 @@ All notable changes to `ai-lcr` are documented here. The format follows
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
  [Semantic Versioning](https://semver.org/).
 
+ ## [0.6.2] — 2026-06-11
+
+ Circuit breaker for persistently-failing providers. Until now the only recovery
+ lever was `resetIntervalMs`, which snaps routing back to the cheapest provider on
+ a timer — so a provider that's actually down keeps eating one failed attempt
+ every window. The breaker remembers the failure and stops sending it traffic.
+
+ ### Added
+
+ - **`createLCR({ cooldown })`.** A provider that fails `maxFailures` times within
+ `windowMs` is *skipped* for `cooldownMs` instead of being re-probed every
+ request; a single success clears its count. `true` enables defaults (3 / 60s →
+ 60s); pass `{ maxFailures, windowMs, cooldownMs }` to tune. New exported type
+ `CooldownOptions`.
+ - The breaker only **reorders** each request's attempt list (cooling providers go
+ last), so when every provider is cooling a request still tries them all rather
+ than failing outright — it can never turn a recoverable request into a hard
+ failure.
+
+ ### Changed
+
+ - The routing engine now snapshots a per-request **attempt order** once (cheapest
+ ring with cooling providers moved to the back) and threads it through streaming
+ failover, replacing the previous modular index walk. Behavior is identical when
+ `cooldown` is unset.
+
+ ### Compatibility
+
+ - Fully backward compatible. `cooldown` is **off by default** — with it unset no
+ provider is ever skipped and routing behaves exactly as before.
+
  ## [0.6.1] — 2026-06-11
 
  Zero-config pricing for native-maker routes. Until now every priced provider
  needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
- is just the public list price you could look up. 0.7 bundles those.
+ is just the public list price you could look up. 0.6.1 bundles those.
 
  ### Added
 
package/README.md CHANGED
@@ -171,6 +171,17 @@ Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is
  2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
  3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
 
+ For a provider that's *persistently* down, the timer alone keeps re-probing it — one failed attempt every window. Turn on the **circuit breaker** to stop that:
+
+ ```ts
+ const lcr = createLCR({
+ models: { /* … */ },
+ cooldown: true, // skip a provider that keeps failing, instead of re-probing it
+ });
+ ```
+
+ With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
+
  ## See what happened (`onCall`)
 
  `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
@@ -490,6 +501,7 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
  ## Roadmap
 
  - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
+ - [x] Circuit breaker (`cooldown`) — skip a persistently-failing provider instead of re-probing it every window
  - [x] Real per-call cost accounting (`onCost`)
  - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
  - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
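The README paragraph added above summarizes the breaker's rules: enough failures inside a window trip it, a tripped provider is skipped for a cooldown, and one success clears it. What follows is a hedged, self-contained sketch of those rules, modeled on `recordProviderFailure`/`recordProviderSuccess` in the `dist` diff below. It is not the package's source. The `ProviderBreaker` class and its explicit `now` argument are hypothetical, chosen so a timeline can be replayed deterministically.

```typescript
// Hypothetical standalone sketch of the per-provider circuit breaker described
// in the README. Not ai-lcr's implementation; times are plain numbers in ms.
interface BreakerOptions {
  maxFailures: number; // failures within windowMs that trip the breaker
  windowMs: number; // sliding window over which failures are counted
  cooldownMs: number; // how long a tripped provider is skipped
}

class ProviderBreaker {
  private failures: number[] = []; // timestamps of recent failures
  private cooldownUntil = 0; // provider is skipped while now < cooldownUntil

  constructor(private readonly opts: BreakerOptions) {}

  isCooling(now: number): boolean {
    return this.cooldownUntil > now;
  }

  recordFailure(now: number): void {
    // Keep only failures strictly inside the sliding window, then add this one.
    const recent = this.failures.filter((t) => now - t < this.opts.windowMs);
    recent.push(now);
    if (recent.length >= this.opts.maxFailures) {
      // Trip: skip the provider for cooldownMs and start counting afresh.
      this.cooldownUntil = now + this.opts.cooldownMs;
      this.failures = [];
    } else {
      this.failures = recent;
    }
  }

  recordSuccess(): void {
    // A single success clears both the failure history and any active cooldown.
    this.failures = [];
    this.cooldownUntil = 0;
  }
}
```

With the documented defaults (3 failures, 60 s window, 60 s cooldown), failures at t = 0, 10 000 and 20 000 ms trip the breaker until t = 80 000. Failures spaced exactly one window apart never accumulate, because the filter keeps only timestamps strictly newer than `windowMs`.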
package/dist/index.cjs CHANGED
@@ -56,6 +56,20 @@ var EmptyCompletionError = class extends Error {
  this.name = "EmptyCompletionError";
  }
  };
+ var COOLDOWN_DEFAULTS = {
+ maxFailures: 3,
+ windowMs: 6e4,
+ cooldownMs: 6e4
+ };
+ function resolveCooldown(opt) {
+ if (!opt) return void 0;
+ if (opt === true) return { ...COOLDOWN_DEFAULTS };
+ return {
+ maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+ windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+ cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+ };
+ }
  var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
  var RETRYABLE_PATTERNS = [
  "overloaded",
@@ -248,21 +262,82 @@ var LcrFallbackModel = class {
  throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
  }
  this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+ this.cooldown = resolveCooldown(opts.cooldown);
+ this.failures = opts.providers.map(() => []);
+ this.cooldownUntil = opts.providers.map(() => 0);
  }
  opts;
  specificationVersion = "v3";
  // Cross-request *hint* for where the next request starts: after a failover we
  // remember the provider that worked so we don't re-probe a dead cheap one on
- // every call. This is the ONLY shared mutable state and crucially it is read
- // once per request (snapshotted into a local cursor) and written once on
- // settle, never used as a per-request loop bound. The within-request iteration
- // is fully local, so concurrent requests can't corrupt each other's routing.
+ // every call. Shared mutable state, but read once per request (snapshotted into
+ // a local cursor) and written once on settle, never used as a per-request loop
+ // bound. The within-request iteration is fully local, so concurrent requests
+ // can't corrupt each other's routing. The cooldown state below shares the same
+ // discipline: it's a cross-request hint that only ever *reorders* the local
+ // attempt list, never bounds it.
  sticky = 0;
  // When `sticky` was last advanced (a failover). The re-probe timer measures
  // from THIS, not from the last call — so it fires under sustained traffic too,
  // instead of being pushed forward forever by a busy stream of requests.
  lastFailoverAt = Date.now();
  resetIntervalMs;
+ // Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+ // `failures[i]` is the timestamps of recent failures within the window, and
+ // `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+ // cross-request hints — like `sticky`, eventually consistent under concurrency
+ // and never used to bound a request's local iteration.
+ cooldown;
+ failures;
+ cooldownUntil;
+ /** Is provider `idx` currently cooling down (skipped)? Always false when the
+ * breaker is disabled, so callers need no extra guard. */
+ isCooling(idx, now) {
+ return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+ }
+ /** Record a failed attempt on provider `idx`; trip its breaker once failures
+ * within the window reach `maxFailures`. No-op when the breaker is disabled. */
+ recordProviderFailure(idx) {
+ const cd = this.cooldown;
+ if (cd === void 0) return;
+ const now = Date.now();
+ const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+ recent.push(now);
+ if (recent.length >= cd.maxFailures) {
+ this.cooldownUntil[idx] = now + cd.cooldownMs;
+ this.failures[idx] = [];
+ } else {
+ this.failures[idx] = recent;
+ }
+ }
+ /** A success on provider `idx` clears its failure history and any cooldown —
+ * the breaker is about *sustained* failure, so one good call resets it. */
+ recordProviderSuccess(idx) {
+ if (this.cooldown === void 0) return;
+ if (this.failures[idx].length > 0) this.failures[idx] = [];
+ if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+ }
+ /**
+ * The order of provider indices to try this request: the cheapest-first ring
+ * starting at `start`, but with currently-cooling providers moved to the BACK
+ * (last-resort, soonest-to-expire first) so the breaker skips them without ever
+ * dropping a provider — if every provider is cooling we still try them all
+ * rather than fail the request outright. With the breaker disabled this is just
+ * the plain ring, identical to the previous modular iteration. Computed once
+ * per request and threaded through any stream failover, so it's a stable local
+ * snapshot (concurrent requests can't reshuffle a request mid-flight).
+ */
+ routeOrder(start) {
+ const n = this.opts.providers.length;
+ const ring = [];
+ for (let k = 0; k < n; k++) ring.push((start + k) % n);
+ if (this.cooldown === void 0) return ring;
+ const now = Date.now();
+ const live = ring.filter((i) => !this.isCooling(i, now));
+ if (live.length === 0 || live.length === n) return ring;
+ const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
+ return [...live, ...cooling];
+ }
  get current() {
  return this.opts.providers[this.sticky];
  }
@@ -330,8 +405,9 @@ var LcrFallbackModel = class {
  requestId: requestIdFrom(options)
  };
  }
- /** Record a failed attempt onto the call's chain (no event yet). */
- recordFail(ctx, provider, attemptStart, error) {
+ /** Record a failed attempt onto the call's chain (no event yet) and count it
+ * toward provider `idx`'s circuit breaker. */
+ recordFail(ctx, idx, provider, attemptStart, error) {
  if (ctx.firstError === void 0) ctx.firstError = error;
  ctx.attempts.push({
  provider: provider.label,
@@ -340,6 +416,7 @@ var LcrFallbackModel = class {
  errorClass: classifyError(error),
  kind: classifyErrorKind(error)
  });
+ this.recordProviderFailure(idx);
  }
  /**
  * Baseline = what this same usage would have cost on the always-on fallback:
@@ -416,59 +493,61 @@ var LcrFallbackModel = class {
  async doGenerate(options) {
  const ctx = this.startCall(options);
  const providers = this.opts.providers;
- const n = providers.length;
- const start = this.startIndex();
+ const order = this.routeOrder(this.startIndex());
  let lastError;
- for (let tried = 0; tried < n; tried++) {
- const idx = (start + tried) % n;
+ for (let pos = 0; pos < order.length; pos++) {
+ const idx = order[pos];
  const provider = providers[idx];
+ const isLast = pos === order.length - 1;
  const attemptStart = Date.now();
  try {
  const result = await provider.model.doGenerate(options);
  const out = result.usage?.outputTokens?.total ?? 0;
  const inp = result.usage?.inputTokens?.total ?? 0;
- if (inp > 0 && out === 0 && tried < n - 1) {
+ if (inp > 0 && out === 0 && !isLast) {
  const emptyErr = new EmptyCompletionError(provider.label);
  lastError = emptyErr;
  this.emitError(emptyErr, provider.label);
- this.recordFail(ctx, provider, attemptStart, emptyErr);
+ this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
  continue;
  }
+ this.recordProviderSuccess(idx);
  this.settleSticky(idx);
  this.finalizeOk(ctx, provider, attemptStart, result.usage);
  return result;
  } catch (error) {
  lastError = error;
  if (!this.shouldRetry(error)) {
- this.recordFail(ctx, provider, attemptStart, error);
+ this.recordFail(ctx, idx, provider, attemptStart, error);
  this.finalizeFail(ctx);
  throw error;
  }
  this.emitError(error, provider.label);
- this.recordFail(ctx, provider, attemptStart, error);
+ this.recordFail(ctx, idx, provider, attemptStart, error);
  }
  }
  this.finalizeFail(ctx);
  throw ctx.firstError ?? lastError;
  }
  async doStream(options) {
- return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
+ return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
  }
- // The stream's failover recursion re-enters here with the SAME `ctx` and a
- // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
- // appending to one CallRecord and bounds itself on the local `tried` count —
- // never on shared instance state. `finalizeOk`/`finalizeFail` fire exactly
- // once per outer request.
- async doStreamWithCtx(options, ctx, startIdx, alreadyTried) {
+ // The stream's failover recursion re-enters here with the SAME `ctx` and the
+ // SAME `order` snapshot, advancing only the local position `pos`, so a
+ // mid-stream switch keeps appending to one CallRecord and bounds itself on the
+ // local position — never on shared instance state. `finalizeOk`/`finalizeFail`
+ // fire exactly once per outer request.
+ async doStreamWithCtx(options, ctx, order, pos) {
  const self = this;
  const providers = this.opts.providers;
- const n = providers.length;
+ const n = order.length;
  let result;
  let serving;
  let servingStart;
- let idx = startIdx;
- let tried = alreadyTried;
+ let p = pos;
+ let idx = order[p];
  for (; ; ) {
+ idx = order[p];
  serving = providers[idx];
  servingStart = Date.now();
  try {
@@ -476,24 +555,23 @@ var LcrFallbackModel = class {
  break;
  } catch (error) {
  if (!this.shouldRetry(error)) {
- this.recordFail(ctx, serving, servingStart, error);
+ this.recordFail(ctx, idx, serving, servingStart, error);
  this.finalizeFail(ctx);
  throw error;
  }
  this.emitError(error, serving.label);
- this.recordFail(ctx, serving, servingStart, error);
- tried++;
- if (tried >= n) {
+ this.recordFail(ctx, idx, serving, servingStart, error);
+ p++;
+ if (p >= n) {
  this.finalizeFail(ctx);
  throw ctx.firstError ?? error;
  }
- idx = (idx + 1) % n;
  }
  }
  const servingProvider = serving;
  const servingAttemptStart = servingStart;
  const servingIdx = idx;
- const triedBeforeServing = tried;
+ const servingPos = p;
  let usage;
  let contentStreamed = false;
  let ttftMs;
@@ -513,7 +591,7 @@ var LcrFallbackModel = class {
  usage = value.usage;
  const out = value.usage?.outputTokens?.total ?? 0;
  const inp = value.usage?.inputTokens?.total ?? 0;
- if (inp > 0 && out === 0 && !contentStreamed && triedBeforeServing + 1 < n) {
+ if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
  throw new EmptyCompletionError(servingProvider.label);
  }
  }
@@ -523,26 +601,22 @@ var LcrFallbackModel = class {
  controller.enqueue(value);
  if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
  }
+ self.recordProviderSuccess(servingIdx);
  self.settleSticky(servingIdx);
  self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
  controller.close();
  } catch (error) {
  self.emitError(error, servingProvider.label);
- self.recordFail(ctx, servingProvider, servingAttemptStart, error);
+ self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
  if (!contentStreamed) {
- const nextTried = triedBeforeServing + 1;
- if (nextTried >= n) {
+ const nextPos = servingPos + 1;
+ if (nextPos >= n) {
  self.finalizeFail(ctx);
  controller.error(ctx.firstError ?? error);
  return;
  }
  try {
- const next = await self.doStreamWithCtx(
- options,
- ctx,
- (servingIdx + 1) % n,
- nextTried
- );
+ const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
  const nextReader = next.stream.getReader();
  try {
  for (; ; ) {
@@ -1937,6 +2011,7 @@ function createLCR(config) {
  autoSort = false,
  autoPrice = false,
  resetIntervalMs,
+ cooldown,
  onError,
  onCost,
  onCall,
@@ -1962,7 +2037,7 @@ function createLCR(config) {
  }
  routed.set(
  name,
- new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
+ new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
  );
  }
  return (modelName) => {
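The new `routeOrder` method above is the core of the routing change. Below is a standalone TypeScript rendering of the same ordering rule, convenient for reasoning about it in isolation. It is a sketch, not the package's code: `cooldownUntil` is passed in as a plain array instead of living on the model instance, `now` is explicit, and the function name `attemptOrder` is hypothetical.

```typescript
// Standalone sketch of the ordering rule in LcrFallbackModel.routeOrder.
// cooldownUntil[i] is the timestamp (ms) before which provider i is cooling.
function attemptOrder(start: number, cooldownUntil: readonly number[], now: number): number[] {
  const n = cooldownUntil.length;
  // Cheapest-first ring beginning at the sticky start index.
  const ring: number[] = [];
  for (let k = 0; k < n; k++) ring.push((start + k) % n);

  const isCooling = (i: number) => cooldownUntil[i] > now;
  const live = ring.filter((i) => !isCooling(i));
  // Nothing cooling, or everything cooling: keep the plain ring, drop no provider.
  if (live.length === 0 || live.length === n) return ring;

  // Cooling providers go last, the one whose cooldown expires soonest first.
  const cooling = ring
    .filter(isCooling)
    .sort((a, b) => cooldownUntil[a] - cooldownUntil[b]);
  return [...live, ...cooling];
}
```

For example, with `start = 0`, `cooldownUntil = [5000, 3000, 0]` and `now = 1000`, the only live provider is 2, so the order is `[2, 1, 0]` (provider 1 expires sooner than provider 0). One detail visible in the diff: when every provider is cooling, the plain ring is returned as-is rather than sorted by expiry.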
package/dist/index.d.cts CHANGED
@@ -206,6 +206,23 @@ interface CallRecord {
  */
  emptyCompletion?: boolean;
  }
+ /**
+ * Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+ * fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+ * not just stepped past per request. Without it, the only recovery lever is the
+ * `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+ * a timer: a provider that's down keeps eating one failed attempt every window.
+ * The breaker remembers the failure and stops sending traffic to it until it's
+ * had time to recover. A single success clears its failure count.
+ */
+ interface CooldownOptions {
+ /** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+ maxFailures?: number;
+ /** Sliding window over which failures are counted, ms. Default 60_000. */
+ windowMs?: number;
+ /** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+ cooldownMs?: number;
+ }
  /**
  * A transport-level failure (provider unreachable / socket dropped / DNS /
  * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
  autoPrice?: boolean;
  /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
  resetIntervalMs?: number;
+ /**
+ * Circuit breaker: stop sending traffic to a provider that keeps failing,
+ * instead of re-probing it on every request. A provider that fails enough
+ * times in a window is *skipped* for a cooldown period (one success clears it).
+ * This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+ * cheapest provider on a timer — a provider that's down then eats a failed
+ * attempt every window. `true` enables sensible defaults (3 failures / 60s →
+ * 60s cooldown); pass an object to tune; omit to disable (the default —
+ * unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+ */
+ cooldown?: boolean | CooldownOptions;
  /** Called when a provider errors and routing falls through to the next. */
  onError?: (error: Error, provider: string) => void;
  /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+ export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
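The `cooldown?: boolean | CooldownOptions` declaration above accepts three shapes: omitted (off), `true` (defaults), or a partial object. A minimal sketch of normalizing them, mirroring the `resolveCooldown` helper in the `index.cjs` hunk, is shown here. The names `normalizeCooldown`, `ResolvedCooldown` and `DEFAULTS` are illustrative, not exports of the package; the interface is copied from the typings above.

```typescript
// Shape copied from the CooldownOptions declaration in the 0.6.2 typings.
interface CooldownOptions {
  maxFailures?: number;
  windowMs?: number;
  cooldownMs?: number;
}

// Hypothetical name for the fully populated form the router works with.
type ResolvedCooldown = Required<CooldownOptions>;

// Documented defaults: 3 failures within 60s trip a 60s cooldown.
const DEFAULTS: ResolvedCooldown = { maxFailures: 3, windowMs: 60_000, cooldownMs: 60_000 };

// undefined / false -> breaker disabled; true -> defaults; object -> per-field defaults.
function normalizeCooldown(opt: boolean | CooldownOptions | undefined): ResolvedCooldown | undefined {
  if (!opt) return undefined;
  if (opt === true) return { ...DEFAULTS }; // copy so callers can't mutate DEFAULTS
  return {
    maxFailures: opt.maxFailures ?? DEFAULTS.maxFailures,
    windowMs: opt.windowMs ?? DEFAULTS.windowMs,
    cooldownMs: opt.cooldownMs ?? DEFAULTS.cooldownMs,
  };
}
```

Because the disabled check is a plain truthiness test (as in the diff's `if (!opt)`), an empty object `{}` is truthy and enables the breaker with all defaults, exactly like `true`.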
package/dist/index.d.ts CHANGED
@@ -206,6 +206,23 @@ interface CallRecord {
  */
  emptyCompletion?: boolean;
  }
+ /**
+ * Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+ * fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+ * not just stepped past per request. Without it, the only recovery lever is the
+ * `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+ * a timer: a provider that's down keeps eating one failed attempt every window.
+ * The breaker remembers the failure and stops sending traffic to it until it's
+ * had time to recover. A single success clears its failure count.
+ */
+ interface CooldownOptions {
+ /** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+ maxFailures?: number;
+ /** Sliding window over which failures are counted, ms. Default 60_000. */
+ windowMs?: number;
+ /** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+ cooldownMs?: number;
+ }
  /**
  * A transport-level failure (provider unreachable / socket dropped / DNS /
  * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
  autoPrice?: boolean;
  /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
  resetIntervalMs?: number;
+ /**
+ * Circuit breaker: stop sending traffic to a provider that keeps failing,
+ * instead of re-probing it on every request. A provider that fails enough
+ * times in a window is *skipped* for a cooldown period (one success clears it).
+ * This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+ * cheapest provider on a timer — a provider that's down then eats a failed
+ * attempt every window. `true` enables sensible defaults (3 failures / 60s →
+ * 60s cooldown); pass an object to tune; omit to disable (the default —
+ * unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+ */
+ cooldown?: boolean | CooldownOptions;
  /** Called when a provider errors and routing falls through to the next. */
  onError?: (error: Error, provider: string) => void;
  /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+ export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
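At runtime, `doGenerate` in the `index.cjs` hunk walks the per-request order it snapshotted from `routeOrder`. The following simplified, synchronous sketch illustrates the two properties the changelog relies on: every provider in the snapshot is still attempted, and when all of them fail, the *first* error is rethrown (the diff's `ctx.firstError ?? lastError`). It is an illustration under stated assumptions, not the package's loop. The `walkOrder` and `Provider` names are hypothetical, providers are modeled as plain functions rather than async models, and the real code also handles empty completions, `shouldRetry`, and the `onCall` record.

```typescript
// Simplified, synchronous sketch of the doGenerate loop over a per-request order.
// Each provider is modeled as a function that returns a value or throws.
type Provider<T> = () => T;

interface WalkResult<T> {
  value: T;
  servedBy: number; // index of the provider that succeeded
  failed: number[]; // indices that failed before it, in attempt order
}

function walkOrder<T>(
  order: readonly number[],
  providers: readonly Provider<T>[],
  onFailure: (idx: number) => void = () => {}, // e.g. count toward a breaker
  onSuccess: (idx: number) => void = () => {}, // e.g. clear a breaker
): WalkResult<T> {
  if (order.length === 0) throw new Error("no providers to try");
  const failed: number[] = [];
  let firstError: unknown;
  for (const idx of order) {
    let value: T;
    try {
      value = providers[idx]();
    } catch (err) {
      // Keep only the first failure so a genuine caller bug stays debuggable.
      if (failed.length === 0) firstError = err;
      failed.push(idx);
      onFailure(idx);
      continue;
    }
    onSuccess(idx);
    return { value, servedBy: idx, failed };
  }
  // Every provider in the snapshot failed: surface the original error.
  throw firstError;
}
```

In the diff, the corresponding hooks are `recordFail(ctx, idx, …)`, which now also calls `recordProviderFailure(idx)`, and `recordProviderSuccess(idx)` on the serving provider.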
package/dist/index.js CHANGED
@@ -5,6 +5,20 @@ var EmptyCompletionError = class extends Error {
  this.name = "EmptyCompletionError";
  }
  };
+ var COOLDOWN_DEFAULTS = {
+ maxFailures: 3,
+ windowMs: 6e4,
+ cooldownMs: 6e4
+ };
+ function resolveCooldown(opt) {
+ if (!opt) return void 0;
+ if (opt === true) return { ...COOLDOWN_DEFAULTS };
+ return {
+ maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+ windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+ cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+ };
+ }
  var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
  var RETRYABLE_PATTERNS = [
  "overloaded",
@@ -197,21 +211,82 @@ var LcrFallbackModel = class {
  throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
  }
  this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+ this.cooldown = resolveCooldown(opts.cooldown);
+ this.failures = opts.providers.map(() => []);
+ this.cooldownUntil = opts.providers.map(() => 0);
  }
  opts;
  specificationVersion = "v3";
  // Cross-request *hint* for where the next request starts: after a failover we
  // remember the provider that worked so we don't re-probe a dead cheap one on
- // every call. This is the ONLY shared mutable state and crucially it is read
- // once per request (snapshotted into a local cursor) and written once on
- // settle, never used as a per-request loop bound. The within-request iteration
- // is fully local, so concurrent requests can't corrupt each other's routing.
+ // every call. Shared mutable state, but read once per request (snapshotted into
+ // a local cursor) and written once on settle, never used as a per-request loop
+ // bound. The within-request iteration is fully local, so concurrent requests
+ // can't corrupt each other's routing. The cooldown state below shares the same
+ // discipline: it's a cross-request hint that only ever *reorders* the local
+ // attempt list, never bounds it.
  sticky = 0;
  // When `sticky` was last advanced (a failover). The re-probe timer measures
  // from THIS, not from the last call — so it fires under sustained traffic too,
  // instead of being pushed forward forever by a busy stream of requests.
  lastFailoverAt = Date.now();
  resetIntervalMs;
+ // Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+ // `failures[i]` is the timestamps of recent failures within the window, and
+ // `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+ // cross-request hints — like `sticky`, eventually consistent under concurrency
+ // and never used to bound a request's local iteration.
+ cooldown;
+ failures;
+ cooldownUntil;
+ /** Is provider `idx` currently cooling down (skipped)? Always false when the
+ * breaker is disabled, so callers need no extra guard. */
+ isCooling(idx, now) {
+ return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+ }
+ /** Record a failed attempt on provider `idx`; trip its breaker once failures
+ * within the window reach `maxFailures`. No-op when the breaker is disabled. */
+ recordProviderFailure(idx) {
+ const cd = this.cooldown;
+ if (cd === void 0) return;
+ const now = Date.now();
+ const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+ recent.push(now);
+ if (recent.length >= cd.maxFailures) {
+ this.cooldownUntil[idx] = now + cd.cooldownMs;
+ this.failures[idx] = [];
+ } else {
+ this.failures[idx] = recent;
+ }
+ }
+ /** A success on provider `idx` clears its failure history and any cooldown —
+ * the breaker is about *sustained* failure, so one good call resets it. */
+ recordProviderSuccess(idx) {
+ if (this.cooldown === void 0) return;
+ if (this.failures[idx].length > 0) this.failures[idx] = [];
+ if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+ }
+ /**
+ * The order of provider indices to try this request: the cheapest-first ring
+ * starting at `start`, but with currently-cooling providers moved to the BACK
+ * (last-resort, soonest-to-expire first) so the breaker skips them without ever
+ * dropping a provider — if every provider is cooling we still try them all
+ * rather than fail the request outright. With the breaker disabled this is just
+ * the plain ring, identical to the previous modular iteration. Computed once
+ * per request and threaded through any stream failover, so it's a stable local
+ * snapshot (concurrent requests can't reshuffle a request mid-flight).
+ */
+ routeOrder(start) {
+ const n = this.opts.providers.length;
+ const ring = [];
+ for (let k = 0; k < n; k++) ring.push((start + k) % n);
+ if (this.cooldown === void 0) return ring;
+ const now = Date.now();
285
+ const live = ring.filter((i) => !this.isCooling(i, now));
286
+ if (live.length === 0 || live.length === n) return ring;
287
+ const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
288
+ return [...live, ...cooling];
289
+ }
215
290
  get current() {
216
291
  return this.opts.providers[this.sticky];
217
292
  }
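The breaker methods in this hunk form a small per-provider state machine with four parts:

- a sliding window of failure timestamps,
- a trip threshold,
- a cooldown deadline,
- a reorder step that moves cooling providers to the back of the ring.

The sketch below pulls that logic out into a standalone class so it can be exercised in isolation. It is a rough approximation, not the package's code. `CooldownTracker`, the method names `recordFailure`/`recordSuccess`, and the explicit `now` argument are hypothetical; the `now` argument exists only so the behaviour can be checked without a real clock. The breaker is always enabled here, whereas `LcrFallbackModel` turns every one of these calls into a no-op when `cooldown` is undefined. The defaults mirror the 3 / 60s → 60s values from the changelog.

```javascript
// Hypothetical standalone sketch of the per-provider breaker bookkeeping.
// Time is passed in as `now` (an assumption for testability); the real class
// reads Date.now() and skips all of this when `cooldown` is undefined.
class CooldownTracker {
  constructor(n, { maxFailures = 3, windowMs = 60_000, cooldownMs = 60_000 } = {}) {
    this.n = n;
    this.opts = { maxFailures, windowMs, cooldownMs };
    this.failures = Array.from({ length: n }, () => []); // recent failure timestamps, per provider
    this.cooldownUntil = new Array(n).fill(0); // provider i is skipped while now < cooldownUntil[i]
  }

  isCooling(idx, now) {
    return this.cooldownUntil[idx] > now;
  }

  recordFailure(idx, now) {
    const { maxFailures, windowMs, cooldownMs } = this.opts;
    // Drop failures that have aged out of the sliding window, then add this one.
    const recent = this.failures[idx].filter((t) => now - t < windowMs);
    recent.push(now);
    if (recent.length >= maxFailures) {
      this.cooldownUntil[idx] = now + cooldownMs; // trip: skip until the deadline
      this.failures[idx] = [];
    } else {
      this.failures[idx] = recent;
    }
  }

  recordSuccess(idx) {
    // One good call clears both the failure history and any active cooldown.
    this.failures[idx] = [];
    this.cooldownUntil[idx] = 0;
  }

  routeOrder(start, now) {
    const ring = [];
    for (let k = 0; k < this.n; k++) ring.push((start + k) % this.n);
    const live = ring.filter((i) => !this.isCooling(i, now));
    // Nobody cooling, or everybody cooling: return the plain ring so no provider is dropped.
    if (live.length === 0 || live.length === this.n) return ring;
    const cooling = ring
      .filter((i) => this.isCooling(i, now))
      .sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]); // soonest-to-expire first
    return [...live, ...cooling];
  }
}

// Usage: provider 0 fails three times within 60s and moves to the back.
const tracker = new CooldownTracker(3);
tracker.recordFailure(0, 1_000);
tracker.recordFailure(0, 2_000);
console.log(tracker.routeOrder(0, 2_500)); // [ 0, 1, 2 ]  (two failures: not tripped yet)
tracker.recordFailure(0, 3_000);
console.log(tracker.routeOrder(0, 3_500)); // [ 1, 2, 0 ]  (cooling until 63_000)
console.log(tracker.routeOrder(0, 63_000)); // [ 0, 1, 2 ]  (cooldown has expired)
```

Note the `live.length === 0` branch. When every provider is cooling, the plain ring comes back unchanged. This is what keeps the breaker from turning a recoverable request into a hard failure: it only reorders attempts and never removes one.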
@@ -279,8 +354,9 @@ var LcrFallbackModel = class {
279
354
  requestId: requestIdFrom(options)
280
355
  };
281
356
  }
282
- /** Record a failed attempt onto the call's chain (no event yet). */
283
- recordFail(ctx, provider, attemptStart, error) {
357
+ /** Record a failed attempt onto the call's chain (no event yet) and count it
358
+ * toward provider `idx`'s circuit breaker. */
359
+ recordFail(ctx, idx, provider, attemptStart, error) {
284
360
  if (ctx.firstError === void 0) ctx.firstError = error;
285
361
  ctx.attempts.push({
286
362
  provider: provider.label,
@@ -289,6 +365,7 @@ var LcrFallbackModel = class {
289
365
  errorClass: classifyError(error),
290
366
  kind: classifyErrorKind(error)
291
367
  });
368
+ this.recordProviderFailure(idx);
292
369
  }
293
370
  /**
294
371
  * Baseline = what this same usage would have cost on the always-on fallback:
@@ -365,59 +442,61 @@ var LcrFallbackModel = class {
365
442
  async doGenerate(options) {
366
443
  const ctx = this.startCall(options);
367
444
  const providers = this.opts.providers;
368
- const n = providers.length;
369
- const start = this.startIndex();
445
+ const order = this.routeOrder(this.startIndex());
370
446
  let lastError;
371
- for (let tried = 0; tried < n; tried++) {
372
- const idx = (start + tried) % n;
447
+ for (let pos = 0; pos < order.length; pos++) {
448
+ const idx = order[pos];
373
449
  const provider = providers[idx];
450
+ const isLast = pos === order.length - 1;
374
451
  const attemptStart = Date.now();
375
452
  try {
376
453
  const result = await provider.model.doGenerate(options);
377
454
  const out = result.usage?.outputTokens?.total ?? 0;
378
455
  const inp = result.usage?.inputTokens?.total ?? 0;
379
- if (inp > 0 && out === 0 && tried < n - 1) {
456
+ if (inp > 0 && out === 0 && !isLast) {
380
457
  const emptyErr = new EmptyCompletionError(provider.label);
381
458
  lastError = emptyErr;
382
459
  this.emitError(emptyErr, provider.label);
383
- this.recordFail(ctx, provider, attemptStart, emptyErr);
460
+ this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
384
461
  continue;
385
462
  }
463
+ this.recordProviderSuccess(idx);
386
464
  this.settleSticky(idx);
387
465
  this.finalizeOk(ctx, provider, attemptStart, result.usage);
388
466
  return result;
389
467
  } catch (error) {
390
468
  lastError = error;
391
469
  if (!this.shouldRetry(error)) {
392
- this.recordFail(ctx, provider, attemptStart, error);
470
+ this.recordFail(ctx, idx, provider, attemptStart, error);
393
471
  this.finalizeFail(ctx);
394
472
  throw error;
395
473
  }
396
474
  this.emitError(error, provider.label);
397
- this.recordFail(ctx, provider, attemptStart, error);
475
+ this.recordFail(ctx, idx, provider, attemptStart, error);
398
476
  }
399
477
  }
400
478
  this.finalizeFail(ctx);
401
479
  throw ctx.firstError ?? lastError;
402
480
  }
403
481
  async doStream(options) {
404
- return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
482
+ return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
405
483
  }
406
- // The stream's failover recursion re-enters here with the SAME `ctx` and a
407
- // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
408
- // appending to one CallRecord and bounds itself on the local `tried` count —
409
- // never on shared instance state. `finalizeOk`/`finalizeFail` fire exactly
410
- // once per outer request.
411
- async doStreamWithCtx(options, ctx, startIdx, alreadyTried) {
484
+ // The stream's failover recursion re-enters here with the SAME `ctx` and the
485
+ // SAME `order` snapshot, advancing only the local position `pos`, so a
486
+ // mid-stream switch keeps appending to one CallRecord and bounds itself on the
487
+ // local position — never on shared instance state. `finalizeOk`/`finalizeFail`
488
+ // fire exactly once per outer request.
489
+ async doStreamWithCtx(options, ctx, order, pos) {
412
490
  const self = this;
413
491
  const providers = this.opts.providers;
414
- const n = providers.length;
492
+ const n = order.length;
415
493
  let result;
416
494
  let serving;
417
495
  let servingStart;
418
- let idx = startIdx;
419
- let tried = alreadyTried;
496
+ let p = pos;
497
+ let idx = order[p];
420
498
  for (; ; ) {
499
+ idx = order[p];
421
500
  serving = providers[idx];
422
501
  servingStart = Date.now();
423
502
  try {
@@ -425,24 +504,23 @@ var LcrFallbackModel = class {
425
504
  break;
426
505
  } catch (error) {
427
506
  if (!this.shouldRetry(error)) {
428
- this.recordFail(ctx, serving, servingStart, error);
507
+ this.recordFail(ctx, idx, serving, servingStart, error);
429
508
  this.finalizeFail(ctx);
430
509
  throw error;
431
510
  }
432
511
  this.emitError(error, serving.label);
433
- this.recordFail(ctx, serving, servingStart, error);
434
- tried++;
435
- if (tried >= n) {
512
+ this.recordFail(ctx, idx, serving, servingStart, error);
513
+ p++;
514
+ if (p >= n) {
436
515
  this.finalizeFail(ctx);
437
516
  throw ctx.firstError ?? error;
438
517
  }
439
- idx = (idx + 1) % n;
440
518
  }
441
519
  }
442
520
  const servingProvider = serving;
443
521
  const servingAttemptStart = servingStart;
444
522
  const servingIdx = idx;
445
- const triedBeforeServing = tried;
523
+ const servingPos = p;
446
524
  let usage;
447
525
  let contentStreamed = false;
448
526
  let ttftMs;
@@ -462,7 +540,7 @@ var LcrFallbackModel = class {
462
540
  usage = value.usage;
463
541
  const out = value.usage?.outputTokens?.total ?? 0;
464
542
  const inp = value.usage?.inputTokens?.total ?? 0;
465
- if (inp > 0 && out === 0 && !contentStreamed && triedBeforeServing + 1 < n) {
543
+ if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
466
544
  throw new EmptyCompletionError(servingProvider.label);
467
545
  }
468
546
  }
@@ -472,26 +550,22 @@ var LcrFallbackModel = class {
472
550
  controller.enqueue(value);
473
551
  if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
474
552
  }
553
+ self.recordProviderSuccess(servingIdx);
475
554
  self.settleSticky(servingIdx);
476
555
  self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
477
556
  controller.close();
478
557
  } catch (error) {
479
558
  self.emitError(error, servingProvider.label);
480
- self.recordFail(ctx, servingProvider, servingAttemptStart, error);
559
+ self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
481
560
  if (!contentStreamed) {
482
- const nextTried = triedBeforeServing + 1;
483
- if (nextTried >= n) {
561
+ const nextPos = servingPos + 1;
562
+ if (nextPos >= n) {
484
563
  self.finalizeFail(ctx);
485
564
  controller.error(ctx.firstError ?? error);
486
565
  return;
487
566
  }
488
567
  try {
489
- const next = await self.doStreamWithCtx(
490
- options,
491
- ctx,
492
- (servingIdx + 1) % n,
493
- nextTried
494
- );
568
+ const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
495
569
  const nextReader = next.stream.getReader();
496
570
  try {
497
571
  for (; ; ) {
@@ -1886,6 +1960,7 @@ function createLCR(config) {
1886
1960
  autoSort = false,
1887
1961
  autoPrice = false,
1888
1962
  resetIntervalMs,
1963
+ cooldown,
1889
1964
  onError,
1890
1965
  onCost,
1891
1966
  onCall,
@@ -1911,7 +1986,7 @@ function createLCR(config) {
1911
1986
  }
1912
1987
  routed.set(
1913
1988
  name,
1914
- new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
1989
+ new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
1915
1990
  );
1916
1991
  }
1917
1992
  return (modelName) => {
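The generate path in this diff walks a snapshotted `order` array instead of computing `(start + tried) % n`. It uses `isLast`, the final position in that array, to decide whether an empty completion should trigger failover or simply be returned.

The sketch below is a simplified version of that walk, built on several assumptions:

- Providers are plain objects with a `label` and an async `generate()` that resolves to `{ inputTokens, outputTokens }`. This shape is made up for illustration and is not the AI SDK model interface.
- `generateWithFailover`, `onFail`, and `onSuccess` are hypothetical names. The two hooks stand in for `recordFail` and `recordProviderSuccess`.
- `EmptyCompletionError` is redefined locally rather than imported from the package.

```javascript
// Hypothetical sketch of the generate-side failover walk over a fixed `order`.
// Assumed provider shape: { label, generate: async () => ({ inputTokens, outputTokens }) }.
class EmptyCompletionError extends Error {
  constructor(label) {
    super(`${label} returned an empty completion`);
    this.name = "EmptyCompletionError";
  }
}

async function generateWithFailover(providers, order, hooks = {}) {
  const { shouldRetry = () => true, onFail = () => {}, onSuccess = () => {} } = hooks;
  let firstError;
  let lastError;
  for (let pos = 0; pos < order.length; pos++) {
    const idx = order[pos];
    const provider = providers[idx];
    const isLast = pos === order.length - 1;
    try {
      const result = await provider.generate();
      // Input was consumed but nothing came back: fail over, unless this is the
      // last attempt, in which case the empty result is returned as-is.
      if (result.inputTokens > 0 && result.outputTokens === 0 && !isLast) {
        const emptyErr = new EmptyCompletionError(provider.label);
        firstError ??= emptyErr;
        lastError = emptyErr;
        onFail(idx, emptyErr); // counts toward this provider's breaker
        continue;
      }
      onSuccess(idx); // clears this provider's breaker state
      return { idx, result };
    } catch (error) {
      firstError ??= error;
      lastError = error;
      onFail(idx, error); // recorded whether or not the error is retryable
      if (!shouldRetry(error)) throw error; // non-retryable: stop walking immediately
    }
  }
  // Every position in `order` failed: surface the first error, as the diff does.
  throw firstError ?? lastError;
}

// Usage: the cheapest provider errors, the next returns an empty completion,
// so the request is served by the third entry in `order`.
const providers = [
  { label: "cheap", generate: async () => { throw new Error("503"); } },
  { label: "flaky", generate: async () => ({ inputTokens: 10, outputTokens: 0 }) },
  { label: "fallback", generate: async () => ({ inputTokens: 10, outputTokens: 5 }) },
];
const failed = [];
generateWithFailover(providers, [0, 1, 2], { onFail: (i) => failed.push(i) })
  .then(({ idx }) => console.log(idx, failed)); // 2 [ 0, 1 ]
```

`order` is computed once, before the loop starts. A breaker that trips during the request (for example, on the first attempt) therefore does not reshuffle the remaining attempts; the new cooldown only affects the next request's `routeOrder`.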
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ai-lcr",
3
- "version": "0.6.1",
3
+ "version": "0.6.2",
4
4
  "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
5
5
  "keywords": [
6
6
  "ai",