
ai-lcr 0.6.1 → 0.6.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/CHANGELOG.md CHANGED

@@ -4,11 +4,42 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.6.2] — 2026-06-11
+Circuit breaker for persistently-failing providers. Until now the only recovery
+lever was `resetIntervalMs`, which snaps routing back to the cheapest provider on
+a timer — so a provider that's actually down keeps eating one failed attempt
+every window. The breaker remembers the failure and stops sending it traffic.
+### Added
+- **`createLCR({ cooldown })`.** A provider that fails `maxFailures` times within
+  `windowMs` is *skipped* for `cooldownMs` instead of being re-probed every
+  request; a single success clears its count. `true` enables defaults (3 / 60s →
+  60s); pass `{ maxFailures, windowMs, cooldownMs }` to tune. New exported type
+  `CooldownOptions`.
+- The breaker only **reorders** each request's attempt list (cooling providers go
+  last), so when every provider is cooling a request still tries them all rather
+  than failing outright — it can never turn a recoverable request into a hard
+  failure.
+### Changed
+- The routing engine now snapshots a per-request **attempt order** once (cheapest
+  ring with cooling providers moved to the back) and threads it through streaming
+  failover, replacing the previous modular index walk. Behavior is identical when
+  `cooldown` is unset.
+### Compatibility
+- Fully backward compatible. `cooldown` is **off by default** — with it unset no
+  provider is ever skipped and routing behaves exactly as before.
 ## [0.6.1] — 2026-06-11
 Zero-config pricing for native-maker routes. Until now every priced provider
 needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
-is just the public list price you could look up. 0.7 bundles those.
+is just the public list price you could look up. 0.6.1 bundles those.
 ### Added

package/README.md CHANGED

@@ -171,6 +171,17 @@ Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is
 2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
+For a provider that's *persistently* down, the timer alone keeps re-probing it — one failed attempt every window. Turn on the **circuit breaker** to stop that:
+```ts
+const lcr = createLCR({
+  models: { /* … */ },
+  cooldown: true, // skip a provider that keeps failing, instead of re-probing it
+});
+```
+With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
 ## See what happened (`onCall`)
 `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
@@ -490,6 +501,7 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
 ## Roadmap
 - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
+- [x] Circuit breaker (`cooldown`) — skip a persistently-failing provider instead of re-probing it every window
 - [x] Real per-call cost accounting (`onCost`)
 - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
 - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
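
The breaker behavior the README describes (count failures in a sliding window, skip the provider for a cooldown period, clear on one success) can be sketched for a single provider as below. This is an illustrative sketch, not the package's code: `ProviderBreaker`, `BreakerOptions`, and the explicit `now` arguments are hypothetical names introduced here so the timing can be followed without a real clock.

```typescript
// Hypothetical standalone sketch of a per-provider circuit breaker.
// Names here are illustrative and are not exported by ai-lcr.
interface BreakerOptions {
  maxFailures: number; // failures within the window that trip the breaker
  windowMs: number;    // sliding window over which failures are counted
  cooldownMs: number;  // how long a tripped provider is skipped
}

class ProviderBreaker {
  private opts: BreakerOptions;
  private failures: number[] = []; // timestamps of recent failures
  private cooldownUntil = 0;       // time before which the provider is skipped

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  // True while the provider should be pushed to the back of the attempt list.
  isCooling(now: number): boolean {
    return this.cooldownUntil > now;
  }

  // Keep only failures still inside the window, add this one, and trip the
  // breaker once the count reaches maxFailures.
  recordFailure(now: number): void {
    const recent = this.failures.filter((t) => now - t < this.opts.windowMs);
    recent.push(now);
    if (recent.length >= this.opts.maxFailures) {
      this.cooldownUntil = now + this.opts.cooldownMs;
      this.failures = [];
    } else {
      this.failures = recent;
    }
  }

  // One success clears both the failure history and any active cooldown.
  recordSuccess(): void {
    this.failures = [];
    this.cooldownUntil = 0;
  }
}
```

With the documented defaults (3 failures / 60s window, 60s cooldown), failures at t = 0 ms, 1000 ms, and 2000 ms trip the breaker on the third call, and the provider counts as cooling until t = 62000 ms unless a success clears it first.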

package/dist/index.cjs CHANGED

@@ -56,6 +56,20 @@ var EmptyCompletionError = class extends Error {
     this.name = "EmptyCompletionError";
   }
 };
+var COOLDOWN_DEFAULTS = {
+  maxFailures: 3,
+  windowMs: 6e4,
+  cooldownMs: 6e4
+};
+function resolveCooldown(opt) {
+  if (!opt) return void 0;
+  if (opt === true) return { ...COOLDOWN_DEFAULTS };
+  return {
+    maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+    windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+    cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+  };
+}
 var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
 var RETRYABLE_PATTERNS = [
   "overloaded",
@@ -248,21 +262,82 @@ var LcrFallbackModel = class {
       throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
     }
     this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+    this.cooldown = resolveCooldown(opts.cooldown);
+    this.failures = opts.providers.map(() => []);
+    this.cooldownUntil = opts.providers.map(() => 0);
   }
   opts;
   specificationVersion = "v3";
   // Cross-request *hint* for where the next request starts: after a failover we
   // remember the provider that worked so we don't re-probe a dead cheap one on
-  // every call. This is the ONLY shared mutable state — and crucially it is read
-  // once per request (snapshotted into a local cursor) and written once on
-  // settle, never used as a per-request loop bound. The within-request iteration
-  // is fully local, so concurrent requests can't corrupt each other's routing.
+  // every call. Shared mutable state, but read once per request (snapshotted into
+  // a local cursor) and written once on settle, never used as a per-request loop
+  // bound. The within-request iteration is fully local, so concurrent requests
+  // can't corrupt each other's routing. The cooldown state below shares the same
+  // discipline: it's a cross-request hint that only ever *reorders* the local
+  // attempt list, never bounds it.
   sticky = 0;
   // When `sticky` was last advanced (a failover). The re-probe timer measures
   // from THIS, not from the last call — so it fires under sustained traffic too,
   // instead of being pushed forward forever by a busy stream of requests.
   lastFailoverAt = Date.now();
   resetIntervalMs;
+  // Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+  // `failures[i]` is the timestamps of recent failures within the window, and
+  // `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+  // cross-request hints — like `sticky`, eventually consistent under concurrency
+  // and never used to bound a request's local iteration.
+  cooldown;
+  failures;
+  cooldownUntil;
+  /** Is provider `idx` currently cooling down (skipped)? Always false when the
+   *  breaker is disabled, so callers need no extra guard. */
+  isCooling(idx, now) {
+    return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+  }
+  /** Record a failed attempt on provider `idx`; trip its breaker once failures
+   *  within the window reach `maxFailures`. No-op when the breaker is disabled. */
+  recordProviderFailure(idx) {
+    const cd = this.cooldown;
+    if (cd === void 0) return;
+    const now = Date.now();
+    const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+    recent.push(now);
+    if (recent.length >= cd.maxFailures) {
+      this.cooldownUntil[idx] = now + cd.cooldownMs;
+      this.failures[idx] = [];
+    } else {
+      this.failures[idx] = recent;
+    }
+  }
+  /** A success on provider `idx` clears its failure history and any cooldown —
+   *  the breaker is about *sustained* failure, so one good call resets it. */
+  recordProviderSuccess(idx) {
+    if (this.cooldown === void 0) return;
+    if (this.failures[idx].length > 0) this.failures[idx] = [];
+    if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+  }
+  /**
+   * The order of provider indices to try this request: the cheapest-first ring
+   * starting at `start`, but with currently-cooling providers moved to the BACK
+   * (last-resort, soonest-to-expire first) so the breaker skips them without ever
+   * dropping a provider — if every provider is cooling we still try them all
+   * rather than fail the request outright. With the breaker disabled this is just
+   * the plain ring, identical to the previous modular iteration. Computed once
+   * per request and threaded through any stream failover, so it's a stable local
+   * snapshot (concurrent requests can't reshuffle a request mid-flight).
+   */
+  routeOrder(start) {
+    const n = this.opts.providers.length;
+    const ring = [];
+    for (let k = 0; k < n; k++) ring.push((start + k) % n);
+    if (this.cooldown === void 0) return ring;
+    const now = Date.now();
+    const live = ring.filter((i) => !this.isCooling(i, now));
+    if (live.length === 0 || live.length === n) return ring;
+    const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
+    return [...live, ...cooling];
+  }
   get current() {
     return this.opts.providers[this.sticky];
   }
@@ -330,8 +405,9 @@ var LcrFallbackModel = class {
       requestId: requestIdFrom(options)
     };
   }
-  /** Record a failed attempt onto the call's chain (no event yet). */
-  recordFail(ctx, provider, attemptStart, error) {
+  /** Record a failed attempt onto the call's chain (no event yet) and count it
+   *  toward provider `idx`'s circuit breaker. */
+  recordFail(ctx, idx, provider, attemptStart, error) {
     if (ctx.firstError === void 0) ctx.firstError = error;
     ctx.attempts.push({
       provider: provider.label,
@@ -340,6 +416,7 @@ var LcrFallbackModel = class {
       errorClass: classifyError(error),
       kind: classifyErrorKind(error)
     });
+    this.recordProviderFailure(idx);
   }
   /**
    * Baseline = what this same usage would have cost on the always-on fallback:
@@ -416,59 +493,61 @@ var LcrFallbackModel = class {
   async doGenerate(options) {
     const ctx = this.startCall(options);
     const providers = this.opts.providers;
-    const n = providers.length;
-    const start = this.startIndex();
+    const order = this.routeOrder(this.startIndex());
     let lastError;
-    for (let tried = 0; tried < n; tried++) {
-      const idx = (start + tried) % n;
+    for (let pos = 0; pos < order.length; pos++) {
+      const idx = order[pos];
       const provider = providers[idx];
+      const isLast = pos === order.length - 1;
       const attemptStart = Date.now();
       try {
         const result = await provider.model.doGenerate(options);
         const out = result.usage?.outputTokens?.total ?? 0;
         const inp = result.usage?.inputTokens?.total ?? 0;
-        if (inp > 0 && out === 0 && tried < n - 1) {
+        if (inp > 0 && out === 0 && !isLast) {
           const emptyErr = new EmptyCompletionError(provider.label);
           lastError = emptyErr;
           this.emitError(emptyErr, provider.label);
-          this.recordFail(ctx, provider, attemptStart, emptyErr);
+          this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
           continue;
         }
+        this.recordProviderSuccess(idx);
         this.settleSticky(idx);
         this.finalizeOk(ctx, provider, attemptStart, result.usage);
         return result;
       } catch (error) {
         lastError = error;
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, provider, attemptStart, error);
+          this.recordFail(ctx, idx, provider, attemptStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, provider.label);
-        this.recordFail(ctx, provider, attemptStart, error);
+        this.recordFail(ctx, idx, provider, attemptStart, error);
       }
     }
     this.finalizeFail(ctx);
     throw ctx.firstError ?? lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
   }
-  // The stream's failover recursion re-enters here with the SAME `ctx` and a
-  // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
-  // appending to one CallRecord and bounds itself on the local `tried` count —
-  // never on shared instance state. `finalizeOk`/`finalizeFail` fire exactly
-  // once per outer request.
-  async doStreamWithCtx(options, ctx, startIdx, alreadyTried) {
+  // The stream's failover recursion re-enters here with the SAME `ctx` and the
+  // SAME `order` snapshot, advancing only the local position `pos`, so a
+  // mid-stream switch keeps appending to one CallRecord and bounds itself on the
+  // local position — never on shared instance state. `finalizeOk`/`finalizeFail`
+  // fire exactly once per outer request.
+  async doStreamWithCtx(options, ctx, order, pos) {
     const self = this;
     const providers = this.opts.providers;
-    const n = providers.length;
+    const n = order.length;
     let result;
     let serving;
     let servingStart;
-    let idx = startIdx;
-    let tried = alreadyTried;
+    let p = pos;
+    let idx = order[p];
     for (; ; ) {
+      idx = order[p];
       serving = providers[idx];
       servingStart = Date.now();
       try {
@@ -476,24 +555,23 @@ var LcrFallbackModel = class {
         break;
       } catch (error) {
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, serving, servingStart, error);
+          this.recordFail(ctx, idx, serving, servingStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, serving.label);
-        this.recordFail(ctx, serving, servingStart, error);
-        tried++;
-        if (tried >= n) {
+        this.recordFail(ctx, idx, serving, servingStart, error);
+        p++;
+        if (p >= n) {
           this.finalizeFail(ctx);
           throw ctx.firstError ?? error;
         }
-        idx = (idx + 1) % n;
       }
     }
     const servingProvider = serving;
     const servingAttemptStart = servingStart;
     const servingIdx = idx;
-    const triedBeforeServing = tried;
+    const servingPos = p;
     let usage;
     let contentStreamed = false;
     let ttftMs;
@@ -513,7 +591,7 @@ var LcrFallbackModel = class {
               usage = value.usage;
               const out = value.usage?.outputTokens?.total ?? 0;
               const inp = value.usage?.inputTokens?.total ?? 0;
-              if (inp > 0 && out === 0 && !contentStreamed && triedBeforeServing + 1 < n) {
+              if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
                 throw new EmptyCompletionError(servingProvider.label);
               }
             }
@@ -523,26 +601,22 @@ var LcrFallbackModel = class {
             controller.enqueue(value);
             if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
           }
+          self.recordProviderSuccess(servingIdx);
           self.settleSticky(servingIdx);
           self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
-          self.recordFail(ctx, servingProvider, servingAttemptStart, error);
+          self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
           if (!contentStreamed) {
-            const nextTried = triedBeforeServing + 1;
-            if (nextTried >= n) {
+            const nextPos = servingPos + 1;
+            if (nextPos >= n) {
               self.finalizeFail(ctx);
               controller.error(ctx.firstError ?? error);
               return;
             }
             try {
-              const next = await self.doStreamWithCtx(
-                options,
-                ctx,
-                (servingIdx + 1) % n,
-                nextTried
-              );
+              const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
               const nextReader = next.stream.getReader();
               try {
                 for (; ; ) {
@@ -1937,6 +2011,7 @@ function createLCR(config) {
     autoSort = false,
     autoPrice = false,
     resetIntervalMs,
+    cooldown,
     onError,
     onCost,
     onCall,
@@ -1962,7 +2037,7 @@ function createLCR(config) {
     }
     routed.set(
       name,
-      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
+      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
     );
   }
   return (modelName) => {
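
To make the reordering in `routeOrder` easier to follow, here is a minimal standalone sketch of the same idea as a pure function. It is an assumption-laden restatement, not the shipped method: the free function `routeOrder(start, cooldownUntil, now)` and its explicit `now` parameter are hypothetical, chosen so the result depends only on its inputs.

```typescript
// Hypothetical pure-function restatement of the attempt-order logic.
// `cooldownUntil[i]` is the time before which provider i is cooling (0 = never tripped).
function routeOrder(start: number, cooldownUntil: number[], now: number): number[] {
  const n = cooldownUntil.length;

  // Plain ring: start at `start` and wrap around once.
  const ring: number[] = [];
  for (let k = 0; k < n; k++) ring.push((start + k) % n);

  const until = (i: number): number => cooldownUntil[i] ?? 0;
  const isCooling = (i: number): boolean => until(i) > now;

  const live = ring.filter((i) => !isCooling(i));

  // Nothing cooling, or everything cooling: keep the plain ring so that no
  // provider is ever dropped from the attempt list.
  if (live.length === 0 || live.length === n) return ring;

  // Cooling providers go last, soonest-to-expire first.
  const cooling = ring
    .filter((i) => isCooling(i))
    .sort((a, b) => until(a) - until(b));

  return [...live, ...cooling];
}

// Provider 0 is cooling until t=300, provider 2 until t=100; provider 1 is live.
const order = routeOrder(0, [300, 0, 100], 50);
console.log(order); // [1, 2, 0]
```

The returned list always has the same length as the provider list, which is why the breaker can only delay an attempt on a cooling provider and never remove it.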

package/dist/index.d.cts CHANGED

@@ -206,6 +206,23 @@ interface CallRecord {
      */
     emptyCompletion?: boolean;
 }
+/**
+ * Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+ * fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+ * not just stepped past per request. Without it, the only recovery lever is the
+ * `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+ * a timer: a provider that's down keeps eating one failed attempt every window.
+ * The breaker remembers the failure and stops sending traffic to it until it's
+ * had time to recover. A single success clears its failure count.
+ */
+interface CooldownOptions {
+    /** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+    maxFailures?: number;
+    /** Sliding window over which failures are counted, ms. Default 60_000. */
+    windowMs?: number;
+    /** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+    cooldownMs?: number;
+}
 /**
  * A transport-level failure (provider unreachable / socket dropped / DNS /
  * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
     autoPrice?: boolean;
     /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
     resetIntervalMs?: number;
+    /**
+     * Circuit breaker: stop sending traffic to a provider that keeps failing,
+     * instead of re-probing it on every request. A provider that fails enough
+     * times in a window is *skipped* for a cooldown period (one success clears it).
+     * This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+     * cheapest provider on a timer — a provider that's down then eats a failed
+     * attempt every window. `true` enables sensible defaults (3 failures / 60s →
+     * 60s cooldown); pass an object to tune; omit to disable (the default —
+     * unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+     */
+    cooldown?: boolean | CooldownOptions;
     /** Called when a provider errors and routing falls through to the next. */
     onError?: (error: Error, provider: string) => void;
     /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
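
The declarations above type `cooldown` as `boolean | CooldownOptions`, with every field optional. The sketch below shows one way those three input shapes could be normalized into a fully populated options object, mirroring the `resolveCooldown` helper visible in the `dist` hunks. The `CooldownOptions` interface is redeclared locally so the example is self-contained, and `ResolvedCooldown` is a hypothetical name that the package does not export.

```typescript
// Local copy of the option shape so this sketch stands alone.
interface CooldownOptions {
  maxFailures?: number;
  windowMs?: number;
  cooldownMs?: number;
}

// Hypothetical name for the fully populated form.
type ResolvedCooldown = Required<CooldownOptions>;

const COOLDOWN_DEFAULTS: ResolvedCooldown = {
  maxFailures: 3,
  windowMs: 60_000,
  cooldownMs: 60_000,
};

// undefined / false -> breaker disabled (returns undefined)
// true              -> all defaults
// object            -> per-field override, defaults for omitted fields
function resolveCooldown(opt?: boolean | CooldownOptions): ResolvedCooldown | undefined {
  if (!opt) return undefined;
  if (opt === true) return { ...COOLDOWN_DEFAULTS };
  return {
    maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
    windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
    cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs,
  };
}

// Only the cooldown length is tuned; the other two fields fall back to defaults.
const tuned = resolveCooldown({ cooldownMs: 5_000 });
console.log(tuned); // { maxFailures: 3, windowMs: 60000, cooldownMs: 5000 }
```

One consequence of this shape is that an empty object (`cooldown: {}`) is truthy and therefore enables the breaker with all defaults, the same as `cooldown: true`.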

package/dist/index.d.ts CHANGED

@@ -206,6 +206,23 @@ interface CallRecord {
      */
     emptyCompletion?: boolean;
 }
+/**
+ * Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+ * fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+ * not just stepped past per request. Without it, the only recovery lever is the
+ * `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+ * a timer: a provider that's down keeps eating one failed attempt every window.
+ * The breaker remembers the failure and stops sending traffic to it until it's
+ * had time to recover. A single success clears its failure count.
+ */
+interface CooldownOptions {
+    /** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+    maxFailures?: number;
+    /** Sliding window over which failures are counted, ms. Default 60_000. */
+    windowMs?: number;
+    /** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+    cooldownMs?: number;
+}
 /**
  * A transport-level failure (provider unreachable / socket dropped / DNS /
  * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
     autoPrice?: boolean;
     /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
     resetIntervalMs?: number;
+    /**
+     * Circuit breaker: stop sending traffic to a provider that keeps failing,
+     * instead of re-probing it on every request. A provider that fails enough
+     * times in a window is *skipped* for a cooldown period (one success clears it).
+     * This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+     * cheapest provider on a timer — a provider that's down then eats a failed
+     * attempt every window. `true` enables sensible defaults (3 failures / 60s →
+     * 60s cooldown); pass an object to tune; omit to disable (the default —
+     * unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+     */
+    cooldown?: boolean | CooldownOptions;
     /** Called when a provider errors and routing falls through to the next. */
     onError?: (error: Error, provider: string) => void;
     /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };

package/dist/index.js CHANGED

@@ -5,6 +5,20 @@ var EmptyCompletionError = class extends Error {
     this.name = "EmptyCompletionError";
   }
 };
+var COOLDOWN_DEFAULTS = {
+  maxFailures: 3,
+  windowMs: 6e4,
+  cooldownMs: 6e4
+};
+function resolveCooldown(opt) {
+  if (!opt) return void 0;
+  if (opt === true) return { ...COOLDOWN_DEFAULTS };
+  return {
+    maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+    windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+    cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+  };
+}
 var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
 var RETRYABLE_PATTERNS = [
   "overloaded",
@@ -197,21 +211,82 @@ var LcrFallbackModel = class {
       throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
     }
     this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+    this.cooldown = resolveCooldown(opts.cooldown);
+    this.failures = opts.providers.map(() => []);
+    this.cooldownUntil = opts.providers.map(() => 0);
   }
   opts;
   specificationVersion = "v3";
   // Cross-request *hint* for where the next request starts: after a failover we
   // remember the provider that worked so we don't re-probe a dead cheap one on
-  // every call. This is the ONLY shared mutable state — and crucially it is read
-  // once per request (snapshotted into a local cursor) and written once on
-  // settle, never used as a per-request loop bound. The within-request iteration
-  // is fully local, so concurrent requests can't corrupt each other's routing.
+  // every call. Shared mutable state, but read once per request (snapshotted into
+  // a local cursor) and written once on settle, never used as a per-request loop
+  // bound. The within-request iteration is fully local, so concurrent requests
+  // can't corrupt each other's routing. The cooldown state below shares the same
+  // discipline: it's a cross-request hint that only ever *reorders* the local
+  // attempt list, never bounds it.
   sticky = 0;
   // When `sticky` was last advanced (a failover). The re-probe timer measures
   // from THIS, not from the last call — so it fires under sustained traffic too,
   // instead of being pushed forward forever by a busy stream of requests.
   lastFailoverAt = Date.now();
   resetIntervalMs;
+  // Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+  // `failures[i]` is the timestamps of recent failures within the window, and
+  // `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+  // cross-request hints — like `sticky`, eventually consistent under concurrency
+  // and never used to bound a request's local iteration.
+  cooldown;
+  failures;
+  cooldownUntil;
+  /** Is provider `idx` currently cooling down (skipped)? Always false when the
+   *  breaker is disabled, so callers need no extra guard. */
+  isCooling(idx, now) {
+    return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+  }
+  /** Record a failed attempt on provider `idx`; trip its breaker once failures
+   *  within the window reach `maxFailures`. No-op when the breaker is disabled. */
+  recordProviderFailure(idx) {
+    const cd = this.cooldown;
+    if (cd === void 0) return;
+    const now = Date.now();
+    const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+    recent.push(now);
+    if (recent.length >= cd.maxFailures) {
+      this.cooldownUntil[idx] = now + cd.cooldownMs;
+      this.failures[idx] = [];
+    } else {
+      this.failures[idx] = recent;
+    }
+  }
+  /** A success on provider `idx` clears its failure history and any cooldown —
+   *  the breaker is about *sustained* failure, so one good call resets it. */
+  recordProviderSuccess(idx) {
+    if (this.cooldown === void 0) return;
+    if (this.failures[idx].length > 0) this.failures[idx] = [];
+    if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+  }
+  /**
+   * The order of provider indices to try this request: the cheapest-first ring
+   * starting at `start`, but with currently-cooling providers moved to the BACK
+   * (last-resort, soonest-to-expire first) so the breaker skips them without ever
+   * dropping a provider — if every provider is cooling we still try them all
+   * rather than fail the request outright. With the breaker disabled this is just
+   * the plain ring, identical to the previous modular iteration. Computed once
+   * per request and threaded through any stream failover, so it's a stable local
+   * snapshot (concurrent requests can't reshuffle a request mid-flight).
+   */
+  routeOrder(start) {
+    const n = this.opts.providers.length;
+    const ring = [];
+    for (let k = 0; k < n; k++) ring.push((start + k) % n);
+    if (this.cooldown === void 0) return ring;
+    const now = Date.now();
+    const live = ring.filter((i) => !this.isCooling(i, now));
+    if (live.length === 0 || live.length === n) return ring;
+    const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
+    return [...live, ...cooling];
+  }
   get current() {
     return this.opts.providers[this.sticky];
   }
@@ -279,8 +354,9 @@ var LcrFallbackModel = class {
       requestId: requestIdFrom(options)
     };
   }
-  /** Record a failed attempt onto the call's chain (no event yet). */
-  recordFail(ctx, provider, attemptStart, error) {
+  /** Record a failed attempt onto the call's chain (no event yet) and count it
+   *  toward provider `idx`'s circuit breaker. */
+  recordFail(ctx, idx, provider, attemptStart, error) {
     if (ctx.firstError === void 0) ctx.firstError = error;
     ctx.attempts.push({
       provider: provider.label,
@@ -289,6 +365,7 @@ var LcrFallbackModel = class {
       errorClass: classifyError(error),
       kind: classifyErrorKind(error)
     });
+    this.recordProviderFailure(idx);
   }
   /**
    * Baseline = what this same usage would have cost on the always-on fallback:
@@ -365,59 +442,61 @@ var LcrFallbackModel = class {
   async doGenerate(options) {
     const ctx = this.startCall(options);
     const providers = this.opts.providers;
-    const n = providers.length;
-    const start = this.startIndex();
+    const order = this.routeOrder(this.startIndex());
     let lastError;
-    for (let tried = 0; tried < n; tried++) {
-      const idx = (start + tried) % n;
+    for (let pos = 0; pos < order.length; pos++) {
+      const idx = order[pos];
       const provider = providers[idx];
+      const isLast = pos === order.length - 1;
       const attemptStart = Date.now();
       try {
         const result = await provider.model.doGenerate(options);
         const out = result.usage?.outputTokens?.total ?? 0;
         const inp = result.usage?.inputTokens?.total ?? 0;
-        if (inp > 0 && out === 0 && tried < n - 1) {
+        if (inp > 0 && out === 0 && !isLast) {
           const emptyErr = new EmptyCompletionError(provider.label);
           lastError = emptyErr;
           this.emitError(emptyErr, provider.label);
-          this.recordFail(ctx, provider, attemptStart, emptyErr);
+          this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
           continue;
         }
+        this.recordProviderSuccess(idx);
         this.settleSticky(idx);
         this.finalizeOk(ctx, provider, attemptStart, result.usage);
         return result;
       } catch (error) {
         lastError = error;
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, provider, attemptStart, error);
+          this.recordFail(ctx, idx, provider, attemptStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, provider.label);
-        this.recordFail(ctx, provider, attemptStart, error);
+        this.recordFail(ctx, idx, provider, attemptStart, error);
       }
     }
     this.finalizeFail(ctx);
     throw ctx.firstError ?? lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
   }
-  // The stream's failover recursion re-enters here with the SAME `ctx` and a
-  // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
-  // appending to one CallRecord and bounds itself on the local `tried` count —
-  // never on shared instance state. `finalizeOk`/`finalizeFail` fire exactly
-  // once per outer request.
-  async doStreamWithCtx(options, ctx, startIdx, alreadyTried) {
+  // The stream's failover recursion re-enters here with the SAME `ctx` and the
+  // SAME `order` snapshot, advancing only the local position `pos`, so a
+  // mid-stream switch keeps appending to one CallRecord and bounds itself on the
+  // local position — never on shared instance state. `finalizeOk`/`finalizeFail`
+  // fire exactly once per outer request.
+  async doStreamWithCtx(options, ctx, order, pos) {
     const self = this;
     const providers = this.opts.providers;
-    const n = providers.length;
+    const n = order.length;
     let result;
     let serving;
     let servingStart;
-    let idx = startIdx;
-    let tried = alreadyTried;
+    let p = pos;
+    let idx = order[p];
     for (; ; ) {
+      idx = order[p];
       serving = providers[idx];
       servingStart = Date.now();
       try {
@@ -425,24 +504,23 @@ var LcrFallbackModel = class {
         break;
       } catch (error) {
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, serving, servingStart, error);
+          this.recordFail(ctx, idx, serving, servingStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, serving.label);
-        this.recordFail(ctx, serving, servingStart, error);
-        tried++;
-        if (tried >= n) {
+        this.recordFail(ctx, idx, serving, servingStart, error);
+        p++;
+        if (p >= n) {
           this.finalizeFail(ctx);
           throw ctx.firstError ?? error;
         }
-        idx = (idx + 1) % n;
       }
     }
     const servingProvider = serving;
     const servingAttemptStart = servingStart;
     const servingIdx = idx;
-    const triedBeforeServing = tried;
+    const servingPos = p;
     let usage;
     let contentStreamed = false;
     let ttftMs;
@@ -462,7 +540,7 @@ var LcrFallbackModel = class {
               usage = value.usage;
               const out = value.usage?.outputTokens?.total ?? 0;
               const inp = value.usage?.inputTokens?.total ?? 0;
-              if (inp > 0 && out === 0 && !contentStreamed && triedBeforeServing + 1 < n) {
+              if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
                 throw new EmptyCompletionError(servingProvider.label);
               }
             }
@@ -472,26 +550,22 @@ var LcrFallbackModel = class {
             controller.enqueue(value);
             if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
           }
+          self.recordProviderSuccess(servingIdx);
           self.settleSticky(servingIdx);
           self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
-          self.recordFail(ctx, servingProvider, servingAttemptStart, error);
+          self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
           if (!contentStreamed) {
-            const nextTried = triedBeforeServing + 1;
-            if (nextTried >= n) {
+            const nextPos = servingPos + 1;
+            if (nextPos >= n) {
               self.finalizeFail(ctx);
               controller.error(ctx.firstError ?? error);
               return;
             }
             try {
-              const next = await self.doStreamWithCtx(
-                options,
-                ctx,
-                (servingIdx + 1) % n,
-                nextTried
-              );
+              const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
               const nextReader = next.stream.getReader();
               try {
                 for (; ; ) {
@@ -1886,6 +1960,7 @@ function createLCR(config) {
     autoSort = false,
     autoPrice = false,
     resetIntervalMs,
+    cooldown,
     onError,
     onCost,
     onCall,
@@ -1911,7 +1986,7 @@ function createLCR(config) {
     }
     routed.set(
       name,
-      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
+      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
     );
   }
   return (modelName) => {
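
The breaker logic in this file's diff can be exercised in isolation. The following is a minimal sketch, not the package's code: the `Breaker` class and its method names are hypothetical, the defaults (3 failures / 60 s window / 60 s cooldown) are taken from the changelog entry, and `now` is passed in explicitly (an assumption made here so the behavior is deterministic, whereas the diff calls `Date.now()`).

```javascript
// Hypothetical standalone model of the circuit breaker shown in the diff.
// Time is injected as `now` (milliseconds) instead of read from Date.now().
class Breaker {
  constructor(providerCount, { maxFailures = 3, windowMs = 60000, cooldownMs = 60000 } = {}) {
    this.cd = { maxFailures, windowMs, cooldownMs };
    // Per-provider state, parallel to the provider list.
    this.failures = Array.from({ length: providerCount }, () => []);
    this.cooldownUntil = new Array(providerCount).fill(0);
  }

  isCooling(idx, now) {
    return this.cooldownUntil[idx] > now;
  }

  // Keep only failures inside the window; trip the breaker at maxFailures.
  recordFailure(idx, now) {
    const recent = this.failures[idx].filter((t) => now - t < this.cd.windowMs);
    recent.push(now);
    if (recent.length >= this.cd.maxFailures) {
      this.cooldownUntil[idx] = now + this.cd.cooldownMs;
      this.failures[idx] = [];
    } else {
      this.failures[idx] = recent;
    }
  }

  // One success clears both the failure history and any active cooldown.
  recordSuccess(idx) {
    this.failures[idx] = [];
    this.cooldownUntil[idx] = 0;
  }

  // Ring starting at `start`, with cooling providers moved to the back
  // (soonest-to-expire first). Nothing is ever dropped from the order.
  routeOrder(start, now) {
    const n = this.failures.length;
    const ring = Array.from({ length: n }, (_, k) => (start + k) % n);
    const live = ring.filter((i) => !this.isCooling(i, now));
    if (live.length === 0 || live.length === n) return ring;
    const cooling = ring
      .filter((i) => this.isCooling(i, now))
      .sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
    return [...live, ...cooling];
  }
}

const breaker = new Breaker(3);
// Provider 0 fails three times within the window, which trips its breaker
// until t = 3000 + 60000 = 63000.
for (const t of [1000, 2000, 3000]) breaker.recordFailure(0, t);

const duringCooldown = breaker.routeOrder(0, 4000);  // provider 0 goes last
const afterCooldown = breaker.routeOrder(0, 63001);  // plain ring again
console.log(duringCooldown, afterCooldown);
```

With these inputs the cooling provider is tried last while its cooldown is active and returns to the front once it expires. Because the order is only rearranged, a request still reaches every provider even when all of them are cooling.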

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.6.1",
+  "version": "0.6.2",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",