ai-lcr 0.6.1 → 0.6.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +32 -1
- package/README.md +12 -0
- package/dist/index.cjs +116 -41
- package/dist/index.d.cts +29 -1
- package/dist/index.d.ts +29 -1
- package/dist/index.js +116 -41
- package/package.json +1 -1
package/CHANGELOG.md
@@ -4,11 +4,42 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
 
+## [0.6.2] — 2026-06-11
+
+Circuit breaker for persistently-failing providers. Until now the only recovery
+lever was `resetIntervalMs`, which snaps routing back to the cheapest provider on
+a timer — so a provider that's actually down keeps eating one failed attempt
+every window. The breaker remembers the failure and stops sending it traffic.
+
+### Added
+
+- **`createLCR({ cooldown })`.** A provider that fails `maxFailures` times within
+`windowMs` is *skipped* for `cooldownMs` instead of being re-probed every
+request; a single success clears its count. `true` enables defaults (3 / 60s →
+60s); pass `{ maxFailures, windowMs, cooldownMs }` to tune. New exported type
+`CooldownOptions`.
+- The breaker only **reorders** each request's attempt list (cooling providers go
+last), so when every provider is cooling a request still tries them all rather
+than failing outright — it can never turn a recoverable request into a hard
+failure.
+
+### Changed
+
+- The routing engine now snapshots a per-request **attempt order** once (cheapest
+ring with cooling providers moved to the back) and threads it through streaming
+failover, replacing the previous modular index walk. Behavior is identical when
+`cooldown` is unset.
+
+### Compatibility
+
+- Fully backward compatible. `cooldown` is **off by default** — with it unset no
+provider is ever skipped and routing behaves exactly as before.
+
 ## [0.6.1] — 2026-06-11
 
 Zero-config pricing for native-maker routes. Until now every priced provider
 needed a hand-typed `cost: { input, output }`; for a vendor's own API that number
-is just the public list price you could look up. 0.
+is just the public list price you could look up. 0.6.1 bundles those.
 
 ### Added
 
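A minimal sketch of the breaker rule described in this entry, assuming nothing beyond the three tuning knobs it names: count failures per provider inside a sliding `windowMs`, trip after `maxFailures`, skip the provider for `cooldownMs`, and let a single success clear everything. `ProviderBreaker`, `BreakerConfig`, and the injectable `now` clock are hypothetical names used only so the timing can be exercised deterministically; this is not the package's exported API.

```typescript
// Hypothetical standalone sketch of a per-provider sliding-window circuit breaker.
interface BreakerConfig {
  maxFailures: number; // failures inside the window that trip the breaker
  windowMs: number; // sliding window over which failures are counted
  cooldownMs: number; // how long a tripped provider is skipped
}

class ProviderBreaker {
  private failures: number[][]; // recent failure timestamps, one list per provider
  private cooldownUntil: number[]; // provider i is skipped while now < cooldownUntil[i]
  private readonly cfg: BreakerConfig;
  private readonly now: () => number;

  constructor(providerCount: number, cfg: BreakerConfig, now: () => number = Date.now) {
    this.cfg = cfg;
    this.now = now;
    this.failures = Array.from({ length: providerCount }, () => [] as number[]);
    this.cooldownUntil = Array.from({ length: providerCount }, () => 0);
  }

  isCooling(idx: number): boolean {
    return this.cooldownUntil[idx] > this.now();
  }

  recordFailure(idx: number): void {
    const t = this.now();
    // Keep only failures still inside the window, then add this one.
    const recent = this.failures[idx].filter((f) => t - f < this.cfg.windowMs);
    recent.push(t);
    if (recent.length >= this.cfg.maxFailures) {
      this.cooldownUntil[idx] = t + this.cfg.cooldownMs; // trip the breaker
      this.failures[idx] = []; // start counting afresh after the cooldown
    } else {
      this.failures[idx] = recent;
    }
  }

  recordSuccess(idx: number): void {
    // A single success clears both the failure history and any active cooldown.
    this.failures[idx] = [];
    this.cooldownUntil[idx] = 0;
  }
}
```

Because the failure list is re-filtered on every new failure, failures spread across `windowMs` or more never trip the breaker; only a burst inside one window does.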
package/README.md
@@ -171,6 +171,17 @@ Look a price up yourself with `getModelPrice("claude-sonnet-4-6")`. The table is
 2. **Fall through on failure.** On any provider failure — rate limit, 5xx, timeout, a **billing cap** (402 / out-of-credit / quota), *and* a client error like a **400** — it advances to the next provider, streaming-safe. A 400 fails over on purpose: across OpenAI-compatible aggregators a 400 is usually "*this* provider won't take this request" (an unsupported param, a model it hasn't listed, a stricter schema), not a universally-broken request — so the next provider may well serve it. If every provider rejects the request it still fails, surfacing the **original** error so a genuine caller bug stays debuggable. The one failure that never fails over is a deliberate caller cancellation (`AbortSignal`). Pass `shouldRetry: isRetryableError` to `createLCR` to restore the stricter "client errors fail fast" behavior.
 3. **Recover.** After an idle window (`resetIntervalMs`, default 60s) it snaps back to the cheapest provider.
 
+For a provider that's *persistently* down, the timer alone keeps re-probing it — one failed attempt every window. Turn on the **circuit breaker** to stop that:
+
+```ts
+const lcr = createLCR({
+models: { /* … */ },
+cooldown: true, // skip a provider that keeps failing, instead of re-probing it
+});
+```
+
+With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
+
 ## See what happened (`onCall`)
 
 `onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
@@ -490,6 +501,7 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
 ## Roadmap
 
 - [x] Own failover engine — cheapest-first routing + streaming-safe fallback, no external routing dependency
+- [x] Circuit breaker (`cooldown`) — skip a persistently-failing provider instead of re-probing it every window
 - [x] Real per-call cost accounting (`onCost`)
 - [x] One correlated record per request with the full failover chain (`onCall` + `formatCallRecord`)
 - [x] Auto cheapest-first ordering (`autoSort`) from per-provider `cost`
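The "reorders, never drops" guarantee can be checked with a pure function shaped like the `routeOrder` method this release adds to `LcrFallbackModel`: build the cheapest-first ring from `start`, move cooling providers to the back ordered by soonest expiry, and return the plain ring when no provider or every provider is cooling. Passing `cooldownUntil` and `now` in as arguments is an assumption made here so the sketch has no hidden state; the real method reads them from the instance and `Date.now()`.

```typescript
// Hypothetical pure version of the per-request attempt order.
function routeOrder(start: number, cooldownUntil: readonly number[], now: number): number[] {
  const n = cooldownUntil.length;
  const ring: number[] = [];
  for (let k = 0; k < n; k++) ring.push((start + k) % n); // cheapest-first ring from `start`

  const cooling = (i: number): boolean => cooldownUntil[i] > now;
  const live = ring.filter((i) => !cooling(i));

  // Nothing cooling, or everything cooling: keep the plain ring so no provider is dropped.
  if (live.length === 0 || live.length === n) return ring;

  // Cooling providers go last, soonest-to-expire first.
  const back = ring.filter(cooling).sort((a, b) => cooldownUntil[a] - cooldownUntil[b]);
  return [...live, ...back];
}
```

Every branch returns a permutation of all `n` indices, so turning `cooldown` on can push a cooling provider to the end of a request's attempts but can never remove it.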
package/dist/index.cjs
@@ -56,6 +56,20 @@ var EmptyCompletionError = class extends Error {
 this.name = "EmptyCompletionError";
 }
 };
+var COOLDOWN_DEFAULTS = {
+maxFailures: 3,
+windowMs: 6e4,
+cooldownMs: 6e4
+};
+function resolveCooldown(opt) {
+if (!opt) return void 0;
+if (opt === true) return { ...COOLDOWN_DEFAULTS };
+return {
+maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+};
+}
 var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
 var RETRYABLE_PATTERNS = [
 "overloaded",
@@ -248,21 +262,82 @@ var LcrFallbackModel = class {
 throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
 }
 this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+this.cooldown = resolveCooldown(opts.cooldown);
+this.failures = opts.providers.map(() => []);
+this.cooldownUntil = opts.providers.map(() => 0);
 }
 opts;
 specificationVersion = "v3";
 // Cross-request *hint* for where the next request starts: after a failover we
 // remember the provider that worked so we don't re-probe a dead cheap one on
-// every call.
-//
-//
-//
+// every call. Shared mutable state, but read once per request (snapshotted into
+// a local cursor) and written once on settle, never used as a per-request loop
+// bound. The within-request iteration is fully local, so concurrent requests
+// can't corrupt each other's routing. The cooldown state below shares the same
+// discipline: it's a cross-request hint that only ever *reorders* the local
+// attempt list, never bounds it.
 sticky = 0;
 // When `sticky` was last advanced (a failover). The re-probe timer measures
 // from THIS, not from the last call — so it fires under sustained traffic too,
 // instead of being pushed forward forever by a busy stream of requests.
 lastFailoverAt = Date.now();
 resetIntervalMs;
+// Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+// `failures[i]` is the timestamps of recent failures within the window, and
+// `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+// cross-request hints — like `sticky`, eventually consistent under concurrency
+// and never used to bound a request's local iteration.
+cooldown;
+failures;
+cooldownUntil;
+/** Is provider `idx` currently cooling down (skipped)? Always false when the
+* breaker is disabled, so callers need no extra guard. */
+isCooling(idx, now) {
+return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+}
+/** Record a failed attempt on provider `idx`; trip its breaker once failures
+* within the window reach `maxFailures`. No-op when the breaker is disabled. */
+recordProviderFailure(idx) {
+const cd = this.cooldown;
+if (cd === void 0) return;
+const now = Date.now();
+const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+recent.push(now);
+if (recent.length >= cd.maxFailures) {
+this.cooldownUntil[idx] = now + cd.cooldownMs;
+this.failures[idx] = [];
+} else {
+this.failures[idx] = recent;
+}
+}
+/** A success on provider `idx` clears its failure history and any cooldown —
+* the breaker is about *sustained* failure, so one good call resets it. */
+recordProviderSuccess(idx) {
+if (this.cooldown === void 0) return;
+if (this.failures[idx].length > 0) this.failures[idx] = [];
+if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+}
+/**
+* The order of provider indices to try this request: the cheapest-first ring
+* starting at `start`, but with currently-cooling providers moved to the BACK
+* (last-resort, soonest-to-expire first) so the breaker skips them without ever
+* dropping a provider — if every provider is cooling we still try them all
+* rather than fail the request outright. With the breaker disabled this is just
+* the plain ring, identical to the previous modular iteration. Computed once
+* per request and threaded through any stream failover, so it's a stable local
+* snapshot (concurrent requests can't reshuffle a request mid-flight).
+*/
+routeOrder(start) {
+const n = this.opts.providers.length;
+const ring = [];
+for (let k = 0; k < n; k++) ring.push((start + k) % n);
+if (this.cooldown === void 0) return ring;
+const now = Date.now();
+const live = ring.filter((i) => !this.isCooling(i, now));
+if (live.length === 0 || live.length === n) return ring;
+const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
+return [...live, ...cooling];
+}
 get current() {
 return this.opts.providers[this.sticky];
 }
@@ -330,8 +405,9 @@ var LcrFallbackModel = class {
 requestId: requestIdFrom(options)
 };
 }
-/** Record a failed attempt onto the call's chain (no event yet)
-
+/** Record a failed attempt onto the call's chain (no event yet) and count it
+* toward provider `idx`'s circuit breaker. */
+recordFail(ctx, idx, provider, attemptStart, error) {
 if (ctx.firstError === void 0) ctx.firstError = error;
 ctx.attempts.push({
 provider: provider.label,
@@ -340,6 +416,7 @@ var LcrFallbackModel = class {
 errorClass: classifyError(error),
 kind: classifyErrorKind(error)
 });
+this.recordProviderFailure(idx);
 }
 /**
 * Baseline = what this same usage would have cost on the always-on fallback:
@@ -416,59 +493,61 @@ var LcrFallbackModel = class {
 async doGenerate(options) {
 const ctx = this.startCall(options);
 const providers = this.opts.providers;
-const
-const start = this.startIndex();
+const order = this.routeOrder(this.startIndex());
 let lastError;
-for (let
-const idx =
+for (let pos = 0; pos < order.length; pos++) {
+const idx = order[pos];
 const provider = providers[idx];
+const isLast = pos === order.length - 1;
 const attemptStart = Date.now();
 try {
 const result = await provider.model.doGenerate(options);
 const out = result.usage?.outputTokens?.total ?? 0;
 const inp = result.usage?.inputTokens?.total ?? 0;
-if (inp > 0 && out === 0 &&
+if (inp > 0 && out === 0 && !isLast) {
 const emptyErr = new EmptyCompletionError(provider.label);
 lastError = emptyErr;
 this.emitError(emptyErr, provider.label);
-this.recordFail(ctx, provider, attemptStart, emptyErr);
+this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
 continue;
 }
+this.recordProviderSuccess(idx);
 this.settleSticky(idx);
 this.finalizeOk(ctx, provider, attemptStart, result.usage);
 return result;
 } catch (error) {
 lastError = error;
 if (!this.shouldRetry(error)) {
-this.recordFail(ctx, provider, attemptStart, error);
+this.recordFail(ctx, idx, provider, attemptStart, error);
 this.finalizeFail(ctx);
 throw error;
 }
 this.emitError(error, provider.label);
-this.recordFail(ctx, provider, attemptStart, error);
+this.recordFail(ctx, idx, provider, attemptStart, error);
 }
 }
 this.finalizeFail(ctx);
 throw ctx.firstError ?? lastError;
 }
 async doStream(options) {
-return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
+return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
 }
-// The stream's failover recursion re-enters here with the SAME `ctx` and
-//
-// appending to one CallRecord and bounds itself on the
-// never on shared instance state. `finalizeOk`/`finalizeFail`
-// once per outer request.
-async doStreamWithCtx(options, ctx,
+// The stream's failover recursion re-enters here with the SAME `ctx` and the
+// SAME `order` snapshot, advancing only the local position `pos`, so a
+// mid-stream switch keeps appending to one CallRecord and bounds itself on the
+// local position — never on shared instance state. `finalizeOk`/`finalizeFail`
+// fire exactly once per outer request.
+async doStreamWithCtx(options, ctx, order, pos) {
 const self = this;
 const providers = this.opts.providers;
-const n =
+const n = order.length;
 let result;
 let serving;
 let servingStart;
-let
-let
+let p = pos;
+let idx = order[p];
 for (; ; ) {
+idx = order[p];
 serving = providers[idx];
 servingStart = Date.now();
 try {
@@ -476,24 +555,23 @@ var LcrFallbackModel = class {
 break;
 } catch (error) {
 if (!this.shouldRetry(error)) {
-this.recordFail(ctx, serving, servingStart, error);
+this.recordFail(ctx, idx, serving, servingStart, error);
 this.finalizeFail(ctx);
 throw error;
 }
 this.emitError(error, serving.label);
-this.recordFail(ctx, serving, servingStart, error);
-
-if (
+this.recordFail(ctx, idx, serving, servingStart, error);
+p++;
+if (p >= n) {
 this.finalizeFail(ctx);
 throw ctx.firstError ?? error;
 }
-idx = (idx + 1) % n;
 }
 }
 const servingProvider = serving;
 const servingAttemptStart = servingStart;
 const servingIdx = idx;
-const
+const servingPos = p;
 let usage;
 let contentStreamed = false;
 let ttftMs;
@@ -513,7 +591,7 @@ var LcrFallbackModel = class {
 usage = value.usage;
 const out = value.usage?.outputTokens?.total ?? 0;
 const inp = value.usage?.inputTokens?.total ?? 0;
-if (inp > 0 && out === 0 && !contentStreamed &&
+if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
 throw new EmptyCompletionError(servingProvider.label);
 }
 }
@@ -523,26 +601,22 @@ var LcrFallbackModel = class {
 controller.enqueue(value);
 if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
 }
+self.recordProviderSuccess(servingIdx);
 self.settleSticky(servingIdx);
 self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
 controller.close();
 } catch (error) {
 self.emitError(error, servingProvider.label);
-self.recordFail(ctx, servingProvider, servingAttemptStart, error);
+self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
 if (!contentStreamed) {
-const
-if (
+const nextPos = servingPos + 1;
+if (nextPos >= n) {
 self.finalizeFail(ctx);
 controller.error(ctx.firstError ?? error);
 return;
 }
 try {
-const next = await self.doStreamWithCtx(
-options,
-ctx,
-(servingIdx + 1) % n,
-nextTried
-);
+const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
 const nextReader = next.stream.getReader();
 try {
 for (; ; ) {
@@ -1937,6 +2011,7 @@ function createLCR(config) {
 autoSort = false,
 autoPrice = false,
 resetIntervalMs,
+cooldown,
 onError,
 onCost,
 onCall,
@@ -1962,7 +2037,7 @@ function createLCR(config) {
 }
 routed.set(
 name,
-new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
+new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
 );
 }
 return (modelName) => {
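The reworked `doGenerate` loop can be sketched synchronously: walk the snapshotted `order` by position, treat an empty completion (input tokens but zero output) as a failure only while another provider remains, report each outcome to the breaker, and rethrow the first error if every attempt fails. `tryInOrder`, `Attempt`, `EmptyCompletion`, and the `onFail`/`onSuccess` hooks are hypothetical stand-ins; the real method is async and also consults `shouldRetry` before failing over, which this sketch leaves out.

```typescript
// Hypothetical, synchronous sketch of the per-request failover loop.
interface Attempt {
  inputTokens: number;
  outputTokens: number;
  text: string;
}

// Stand-in for the package's EmptyCompletionError.
class EmptyCompletion extends Error {}

function tryInOrder(
  order: readonly number[], // per-request snapshot of provider indices
  call: (idx: number) => Attempt, // one provider attempt
  onFail: (idx: number, err: unknown) => void, // e.g. count toward the breaker
  onSuccess: (idx: number) => void, // e.g. clear the breaker
): Attempt {
  const errors: unknown[] = [];
  for (let pos = 0; pos < order.length; pos++) {
    const idx = order[pos];
    const isLast = pos === order.length - 1;
    let result: Attempt;
    try {
      result = call(idx);
    } catch (err) {
      errors.push(err);
      onFail(idx, err);
      continue; // fall through to the next provider in the snapshot
    }
    // An empty completion fails over, unless no provider is left to fail over to.
    if (result.inputTokens > 0 && result.outputTokens === 0 && !isLast) {
      const err = new EmptyCompletion(`provider ${idx} returned no output`);
      errors.push(err);
      onFail(idx, err);
      continue;
    }
    onSuccess(idx);
    return result;
  }
  // Surface the first error so a genuine caller bug stays debuggable.
  throw errors[0];
}
```

The loop bound is the length of the local `order` array, never shared state, which is the property the new comments in `doGenerate` and `doStreamWithCtx` describe.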
package/dist/index.d.cts
@@ -206,6 +206,23 @@ interface CallRecord {
 */
 emptyCompletion?: boolean;
 }
+/**
+* Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+* fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+* not just stepped past per request. Without it, the only recovery lever is the
+* `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+* a timer: a provider that's down keeps eating one failed attempt every window.
+* The breaker remembers the failure and stops sending traffic to it until it's
+* had time to recover. A single success clears its failure count.
+*/
+interface CooldownOptions {
+/** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+maxFailures?: number;
+/** Sliding window over which failures are counted, ms. Default 60_000. */
+windowMs?: number;
+/** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+cooldownMs?: number;
+}
 /**
 * A transport-level failure (provider unreachable / socket dropped / DNS /
 * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
 autoPrice?: boolean;
 /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
 resetIntervalMs?: number;
+/**
+* Circuit breaker: stop sending traffic to a provider that keeps failing,
+* instead of re-probing it on every request. A provider that fails enough
+* times in a window is *skipped* for a cooldown period (one success clears it).
+* This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+* cheapest provider on a timer — a provider that's down then eats a failed
+* attempt every window. `true` enables sensible defaults (3 failures / 60s →
+* 60s cooldown); pass an object to tune; omit to disable (the default —
+* unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+*/
+cooldown?: boolean | CooldownOptions;
 /** Called when a provider errors and routing falls through to the next. */
 onError?: (error: Error, provider: string) => void;
 /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
 */
 declare function createLCR(config: LCRConfig): LCRRouter;
 
-export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
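These declarations type `cooldown` as `boolean | CooldownOptions`, and the `resolveCooldown` helper added to the runtime normalizes that into either `undefined` (breaker off) or a fully populated config. A typed sketch of the same normalization follows; `ResolvedCooldown` is a hypothetical name for the filled-in shape, and `CooldownOptions` is redeclared locally so the block stands alone.

```typescript
interface CooldownOptions {
  maxFailures?: number;
  windowMs?: number;
  cooldownMs?: number;
}

// Hypothetical name for the fully populated config the engine works with.
type ResolvedCooldown = Required<CooldownOptions>;

const COOLDOWN_DEFAULTS: ResolvedCooldown = {
  maxFailures: 3,
  windowMs: 60_000,
  cooldownMs: 60_000,
};

function resolveCooldown(opt: boolean | CooldownOptions | undefined): ResolvedCooldown | undefined {
  if (!opt) return undefined; // false or omitted: breaker disabled
  if (opt === true) return { ...COOLDOWN_DEFAULTS }; // true: a fresh copy of the defaults
  // Object: per-field fallback; `??` only replaces undefined/null, so an explicit 0 is kept.
  return {
    maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
    windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
    cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs,
  };
}
```

Because the fallback uses `??` rather than `||`, only missing fields take the default. An empty object `{}` therefore enables the breaker with all defaults, the same as `true`.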
package/dist/index.d.ts
@@ -206,6 +206,23 @@ interface CallRecord {
 */
 emptyCompletion?: boolean;
 }
+/**
+* Circuit-breaker tuning for {@link FallbackOptions.cooldown}. A provider that
+* fails `maxFailures` times within `windowMs` is *skipped* for `cooldownMs` —
+* not just stepped past per request. Without it, the only recovery lever is the
+* `resetIntervalMs` snap-back, which blindly re-probes the cheapest provider on
+* a timer: a provider that's down keeps eating one failed attempt every window.
+* The breaker remembers the failure and stops sending traffic to it until it's
+* had time to recover. A single success clears its failure count.
+*/
+interface CooldownOptions {
+/** Failures within `windowMs` that trip the breaker for a provider. Default 3. */
+maxFailures?: number;
+/** Sliding window over which failures are counted, ms. Default 60_000. */
+windowMs?: number;
+/** How long a tripped provider is skipped before it's re-tried, ms. Default 60_000. */
+cooldownMs?: number;
+}
 /**
 * A transport-level failure (provider unreachable / socket dropped / DNS /
 * connect timeout). These carry no HTTP status, so they must be detected
@@ -898,6 +915,17 @@ interface LCRConfig {
 autoPrice?: boolean;
 /** Idle window after which routing snaps back to the cheapest provider. Default 60s. */
 resetIntervalMs?: number;
+/**
+* Circuit breaker: stop sending traffic to a provider that keeps failing,
+* instead of re-probing it on every request. A provider that fails enough
+* times in a window is *skipped* for a cooldown period (one success clears it).
+* This is sharper than `resetIntervalMs` alone, which blindly re-tries the
+* cheapest provider on a timer — a provider that's down then eats a failed
+* attempt every window. `true` enables sensible defaults (3 failures / 60s →
+* 60s cooldown); pass an object to tune; omit to disable (the default —
+* unchanged routing, no provider is ever skipped). See {@link CooldownOptions}.
+*/
+cooldown?: boolean | CooldownOptions;
 /** Called when a provider errors and routing falls through to the next. */
 onError?: (error: Error, provider: string) => void;
 /** Called after each successful call with the serving provider, tokens, and cost. */
@@ -941,4 +969,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
 */
 declare function createLCR(config: LCRConfig): LCRRouter;
 
-export { type BillableContext, type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
+export { type BillableContext, type CallRecord, type CooldownOptions, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, MODEL_PRICES, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaJobHandle, type MediaJobStatus, type MediaLCR, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPollResult, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaStatusRequest, type MediaStatusResult, type MediaSubmitOptions, type MediaSubmitRequest, type MediaSubmitResult, type MediaUnit, type MediaUsage, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, billableUnits, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, durationFromInput, formatCallRecord, getModelPrice, isAbortError, isNetworkError, isRetryableError, normalizedCents, priceCents, rankRoutes, referenceMegapixels, shouldFailover };
package/dist/index.js
@@ -5,6 +5,20 @@ var EmptyCompletionError = class extends Error {
 this.name = "EmptyCompletionError";
 }
 };
+var COOLDOWN_DEFAULTS = {
+maxFailures: 3,
+windowMs: 6e4,
+cooldownMs: 6e4
+};
+function resolveCooldown(opt) {
+if (!opt) return void 0;
+if (opt === true) return { ...COOLDOWN_DEFAULTS };
+return {
+maxFailures: opt.maxFailures ?? COOLDOWN_DEFAULTS.maxFailures,
+windowMs: opt.windowMs ?? COOLDOWN_DEFAULTS.windowMs,
+cooldownMs: opt.cooldownMs ?? COOLDOWN_DEFAULTS.cooldownMs
+};
+}
 var RETRYABLE_STATUS = /* @__PURE__ */ new Set([401, 402, 403, 408, 409, 413, 429, 498, 500]);
 var RETRYABLE_PATTERNS = [
 "overloaded",
@@ -197,21 +211,82 @@ var LcrFallbackModel = class {
       throw new Error(`ai-lcr: model "${opts.modelName}" has no providers`);
     }
     this.resetIntervalMs = opts.resetIntervalMs ?? 6e4;
+    this.cooldown = resolveCooldown(opts.cooldown);
+    this.failures = opts.providers.map(() => []);
+    this.cooldownUntil = opts.providers.map(() => 0);
   }
   opts;
   specificationVersion = "v3";
   // Cross-request *hint* for where the next request starts: after a failover we
   // remember the provider that worked so we don't re-probe a dead cheap one on
-  // every call.
-  //
-  //
-  //
+  // every call. Shared mutable state, but read once per request (snapshotted into
+  // a local cursor) and written once on settle, never used as a per-request loop
+  // bound. The within-request iteration is fully local, so concurrent requests
+  // can't corrupt each other's routing. The cooldown state below shares the same
+  // discipline: it's a cross-request hint that only ever *reorders* the local
+  // attempt list, never bounds it.
   sticky = 0;
   // When `sticky` was last advanced (a failover). The re-probe timer measures
   // from THIS, not from the last call — so it fires under sustained traffic too,
   // instead of being pushed forward forever by a busy stream of requests.
   lastFailoverAt = Date.now();
   resetIntervalMs;
+  // Circuit breaker (undefined = disabled). Per-provider, parallel to `providers`:
+  // `failures[i]` is the timestamps of recent failures within the window, and
+  // `cooldownUntil[i]` is the time before which provider i is skipped. Both are
+  // cross-request hints — like `sticky`, eventually consistent under concurrency
+  // and never used to bound a request's local iteration.
+  cooldown;
+  failures;
+  cooldownUntil;
+  /** Is provider `idx` currently cooling down (skipped)? Always false when the
+   * breaker is disabled, so callers need no extra guard. */
+  isCooling(idx, now) {
+    return this.cooldown !== void 0 && this.cooldownUntil[idx] > now;
+  }
+  /** Record a failed attempt on provider `idx`; trip its breaker once failures
+   * within the window reach `maxFailures`. No-op when the breaker is disabled. */
+  recordProviderFailure(idx) {
+    const cd = this.cooldown;
+    if (cd === void 0) return;
+    const now = Date.now();
+    const recent = this.failures[idx].filter((t) => now - t < cd.windowMs);
+    recent.push(now);
+    if (recent.length >= cd.maxFailures) {
+      this.cooldownUntil[idx] = now + cd.cooldownMs;
+      this.failures[idx] = [];
+    } else {
+      this.failures[idx] = recent;
+    }
+  }
+  /** A success on provider `idx` clears its failure history and any cooldown —
+   * the breaker is about *sustained* failure, so one good call resets it. */
+  recordProviderSuccess(idx) {
+    if (this.cooldown === void 0) return;
+    if (this.failures[idx].length > 0) this.failures[idx] = [];
+    if (this.cooldownUntil[idx] !== 0) this.cooldownUntil[idx] = 0;
+  }
+  /**
+   * The order of provider indices to try this request: the cheapest-first ring
+   * starting at `start`, but with currently-cooling providers moved to the BACK
+   * (last-resort, soonest-to-expire first) so the breaker skips them without ever
+   * dropping a provider — if every provider is cooling we still try them all
+   * rather than fail the request outright. With the breaker disabled this is just
+   * the plain ring, identical to the previous modular iteration. Computed once
+   * per request and threaded through any stream failover, so it's a stable local
+   * snapshot (concurrent requests can't reshuffle a request mid-flight).
+   */
+  routeOrder(start) {
+    const n = this.opts.providers.length;
+    const ring = [];
+    for (let k = 0; k < n; k++) ring.push((start + k) % n);
+    if (this.cooldown === void 0) return ring;
+    const now = Date.now();
+    const live = ring.filter((i) => !this.isCooling(i, now));
+    if (live.length === 0 || live.length === n) return ring;
+    const cooling = ring.filter((i) => this.isCooling(i, now)).sort((a, b) => this.cooldownUntil[a] - this.cooldownUntil[b]);
+    return [...live, ...cooling];
+  }
   get current() {
     return this.opts.providers[this.sticky];
   }
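
`routeOrder` never removes an index. It only splits the ring into live and cooling providers and appends the cooling ones. The standalone version below assumes that `cooldownUntil` and the current time are passed in as arguments, whereas the real method reads them from the instance and `Date.now()`. Passing them in lets you check the reordering with fixed numbers.

```javascript
// Standalone sketch of routeOrder. Assumption: `cooldownUntil` (ms timestamps)
// and `now` are passed explicitly; `enabled = false` mimics a disabled breaker.
function routeOrder(n, start, cooldownUntil, now, enabled = true) {
  // Cheapest-first ring beginning at the sticky start index.
  const ring = [];
  for (let k = 0; k < n; k++) ring.push((start + k) % n);
  if (!enabled) return ring;

  const isCooling = (i) => cooldownUntil[i] > now;
  const live = ring.filter((i) => !isCooling(i));
  // Nothing to reorder when no provider, or every provider, is cooling.
  if (live.length === 0 || live.length === n) return ring;

  // Cooling providers go last, soonest-to-expire first; none are dropped.
  const cooling = ring
    .filter(isCooling)
    .sort((a, b) => cooldownUntil[a] - cooldownUntil[b]);
  return [...live, ...cooling];
}

const now = 1_000;
// Four providers; 0 and 2 are cooling, and 2's cooldown ends first.
const until = [5_000, 0, 3_000, 0];
console.log(routeOrder(4, 0, until, now));          // [ 1, 3, 2, 0 ]
console.log(routeOrder(4, 1, [0, 0, 0, 0], now));   // [ 1, 2, 3, 0 ]
console.log(routeOrder(2, 0, [9_000, 9_000], now)); // [ 0, 1 ]
```

The last call shows the all-cooling case. The plain ring is returned, so the request still tries every provider instead of failing outright. `Array.prototype.sort` is stable, so cooling providers with the same expiry keep their ring order.
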
@@ -279,8 +354,9 @@ var LcrFallbackModel = class {
       requestId: requestIdFrom(options)
     };
   }
-  /** Record a failed attempt onto the call's chain (no event yet)
-
+  /** Record a failed attempt onto the call's chain (no event yet) and count it
+   * toward provider `idx`'s circuit breaker. */
+  recordFail(ctx, idx, provider, attemptStart, error) {
     if (ctx.firstError === void 0) ctx.firstError = error;
     ctx.attempts.push({
       provider: provider.label,
@@ -289,6 +365,7 @@ var LcrFallbackModel = class {
       errorClass: classifyError(error),
       kind: classifyErrorKind(error)
     });
+    this.recordProviderFailure(idx);
   }
   /**
    * Baseline = what this same usage would have cost on the always-on fallback:
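
Every `recordFail` call now also feeds `recordProviderFailure`, which is a per-provider sliding-window counter. The class below is a hypothetical, self-contained model of that counter and of `recordProviderSuccess`. The name `ProviderBreaker` and the injectable `clock` are illustrative and are not part of `ai-lcr`. The example steps the breaker through three failures inside the default 60 s window.

```javascript
// Hypothetical model of the breaker state; `clock` stands in for Date.now()
// so time can be stepped deterministically. Always enabled in this sketch.
class ProviderBreaker {
  constructor(n, { maxFailures, windowMs, cooldownMs }, clock = Date.now) {
    this.cfg = { maxFailures, windowMs, cooldownMs };
    this.clock = clock;
    this.failures = Array.from({ length: n }, () => []); // recent failure timestamps
    this.cooldownUntil = new Array(n).fill(0);          // skip-until time per provider
  }

  isCooling(idx, now = this.clock()) {
    return this.cooldownUntil[idx] > now;
  }

  recordFailure(idx) {
    const now = this.clock();
    // Drop failures that fell out of the window, then count this one.
    const recent = this.failures[idx].filter((t) => now - t < this.cfg.windowMs);
    recent.push(now);
    if (recent.length >= this.cfg.maxFailures) {
      this.cooldownUntil[idx] = now + this.cfg.cooldownMs; // trip the breaker
      this.failures[idx] = [];                             // restart the count
    } else {
      this.failures[idx] = recent;
    }
  }

  recordSuccess(idx) {
    // One good call clears both the history and any active cooldown.
    this.failures[idx] = [];
    this.cooldownUntil[idx] = 0;
  }
}

let t = 0;
const breaker = new ProviderBreaker(2, { maxFailures: 3, windowMs: 60_000, cooldownMs: 60_000 }, () => t);
breaker.recordFailure(0);
t = 10_000;
breaker.recordFailure(0);
t = 20_000;
console.log(breaker.isCooling(0)); // false: only two failures so far
breaker.recordFailure(0);
console.log(breaker.isCooling(0)); // true: third failure within 60 s trips it (until 80_000)
t = 80_001;
console.log(breaker.isCooling(0)); // false: the cooldown has expired
```

Failures spaced more than `windowMs` apart never accumulate, so a provider that fails only occasionally is never skipped.
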
@@ -365,59 +442,61 @@ var LcrFallbackModel = class {
   async doGenerate(options) {
     const ctx = this.startCall(options);
     const providers = this.opts.providers;
-    const
-    const start = this.startIndex();
+    const order = this.routeOrder(this.startIndex());
     let lastError;
-    for (let
-      const idx =
+    for (let pos = 0; pos < order.length; pos++) {
+      const idx = order[pos];
       const provider = providers[idx];
+      const isLast = pos === order.length - 1;
       const attemptStart = Date.now();
       try {
         const result = await provider.model.doGenerate(options);
         const out = result.usage?.outputTokens?.total ?? 0;
         const inp = result.usage?.inputTokens?.total ?? 0;
-        if (inp > 0 && out === 0 &&
+        if (inp > 0 && out === 0 && !isLast) {
           const emptyErr = new EmptyCompletionError(provider.label);
           lastError = emptyErr;
           this.emitError(emptyErr, provider.label);
-          this.recordFail(ctx, provider, attemptStart, emptyErr);
+          this.recordFail(ctx, idx, provider, attemptStart, emptyErr);
           continue;
         }
+        this.recordProviderSuccess(idx);
         this.settleSticky(idx);
         this.finalizeOk(ctx, provider, attemptStart, result.usage);
         return result;
       } catch (error) {
         lastError = error;
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, provider, attemptStart, error);
+          this.recordFail(ctx, idx, provider, attemptStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, provider.label);
-        this.recordFail(ctx, provider, attemptStart, error);
+        this.recordFail(ctx, idx, provider, attemptStart, error);
       }
     }
     this.finalizeFail(ctx);
     throw ctx.firstError ?? lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.routeOrder(this.startIndex()), 0);
   }
-  // The stream's failover recursion re-enters here with the SAME `ctx` and
-  //
-  // appending to one CallRecord and bounds itself on the
-  // never on shared instance state. `finalizeOk`/`finalizeFail`
-  // once per outer request.
-  async doStreamWithCtx(options, ctx,
+  // The stream's failover recursion re-enters here with the SAME `ctx` and the
+  // SAME `order` snapshot, advancing only the local position `pos`, so a
+  // mid-stream switch keeps appending to one CallRecord and bounds itself on the
+  // local position — never on shared instance state. `finalizeOk`/`finalizeFail`
+  // fire exactly once per outer request.
+  async doStreamWithCtx(options, ctx, order, pos) {
     const self = this;
     const providers = this.opts.providers;
-    const n =
+    const n = order.length;
     let result;
     let serving;
     let servingStart;
-    let
-    let
+    let p = pos;
+    let idx = order[p];
     for (; ; ) {
+      idx = order[p];
       serving = providers[idx];
       servingStart = Date.now();
       try {
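
The rewritten `doGenerate` loop walks the precomputed `order` by position. The new `isLast` flag means an empty completion counts as a failure only while another provider remains. The sketch below shows that control flow synchronously, for brevity. The provider functions and the `{ inputTokens, outputTokens, text }` result shape are hypothetical stand-ins for `provider.model.doGenerate`. Every thrown error is treated as retryable here; the real code consults `shouldRetry` instead.

```javascript
// Hypothetical stand-ins: each provider is a function that returns
// { inputTokens, outputTokens, text } or throws. All errors count as retryable.
function generateWithFailover(providers, order) {
  let firstError;
  for (let pos = 0; pos < order.length; pos++) {
    const idx = order[pos];
    const isLast = pos === order.length - 1;
    try {
      const result = providers[idx]();
      // Empty completion: fail over only while a fallback remains; the last
      // provider's empty result is returned to the caller as-is.
      if (result.inputTokens > 0 && result.outputTokens === 0 && !isLast) {
        if (firstError === undefined) firstError = new Error(`empty completion from provider ${idx}`);
        continue;
      }
      return { idx, result };
    } catch (error) {
      if (firstError === undefined) firstError = error;
    }
  }
  // Every attempt failed: surface the first error, mirroring `ctx.firstError ?? lastError`.
  throw firstError;
}

const empty = () => ({ inputTokens: 12, outputTokens: 0, text: "" });
const down = () => { throw new Error("503 Service Unavailable"); };
const ok = () => ({ inputTokens: 12, outputTokens: 4, text: "Hi!" });

console.log(generateWithFailover([empty, ok], [0, 1]).idx);  // 1: the empty result fails over
console.log(generateWithFailover([down, empty], [0, 1]).idx); // 1: the last provider's empty result is returned
```

Before this change, the loop iterated `(start + tried) % n` directly. Iterating over `order` lets the breaker change the attempt sequence without touching the loop bound.
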
@@ -425,24 +504,23 @@ var LcrFallbackModel = class {
         break;
       } catch (error) {
         if (!this.shouldRetry(error)) {
-          this.recordFail(ctx, serving, servingStart, error);
+          this.recordFail(ctx, idx, serving, servingStart, error);
           this.finalizeFail(ctx);
           throw error;
         }
         this.emitError(error, serving.label);
-        this.recordFail(ctx, serving, servingStart, error);
-
-        if (
+        this.recordFail(ctx, idx, serving, servingStart, error);
+        p++;
+        if (p >= n) {
           this.finalizeFail(ctx);
           throw ctx.firstError ?? error;
         }
-        idx = (idx + 1) % n;
       }
     }
     const servingProvider = serving;
     const servingAttemptStart = servingStart;
     const servingIdx = idx;
-    const
+    const servingPos = p;
     let usage;
     let contentStreamed = false;
     let ttftMs;
@@ -462,7 +540,7 @@ var LcrFallbackModel = class {
               usage = value.usage;
               const out = value.usage?.outputTokens?.total ?? 0;
               const inp = value.usage?.inputTokens?.total ?? 0;
-              if (inp > 0 && out === 0 && !contentStreamed &&
+              if (inp > 0 && out === 0 && !contentStreamed && servingPos + 1 < n) {
                 throw new EmptyCompletionError(servingProvider.label);
               }
             }
@@ -472,26 +550,22 @@ var LcrFallbackModel = class {
             controller.enqueue(value);
             if (CONTENT_PART_TYPES.has(value.type)) contentStreamed = true;
           }
+          self.recordProviderSuccess(servingIdx);
           self.settleSticky(servingIdx);
           self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
-          self.recordFail(ctx, servingProvider, servingAttemptStart, error);
+          self.recordFail(ctx, servingIdx, servingProvider, servingAttemptStart, error);
           if (!contentStreamed) {
-            const
-            if (
+            const nextPos = servingPos + 1;
+            if (nextPos >= n) {
               self.finalizeFail(ctx);
               controller.error(ctx.firstError ?? error);
               return;
             }
             try {
-              const next = await self.doStreamWithCtx(
-                options,
-                ctx,
-                (servingIdx + 1) % n,
-                nextTried
-              );
+              const next = await self.doStreamWithCtx(options, ctx, order, nextPos);
               const nextReader = next.stream.getReader();
               try {
                 for (; ; ) {
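
On the streaming path, a failure that happens before any content has been enqueued recurses into `doStreamWithCtx` with the same `order` array and `servingPos + 1`. The sketch below models that recursion with generator functions standing in for provider streams. It is a simplification. The `ReadableStream` plumbing and the pre-stream retry loop are omitted, and every chunk counts as content, whereas the real code checks `CONTENT_PART_TYPES`. The hunk does not show what happens after content has already streamed, so rethrowing in that case is an assumption of this sketch.

```javascript
// Hypothetical stand-ins: each provider is a generator yielding text chunks that
// may throw. `emit` plays the role of controller.enqueue.
function streamWithFailover(providers, order, pos, emit, ctx = {}) {
  const idx = order[pos];
  let contentStreamed = false;
  try {
    for (const chunk of providers[idx]()) {
      emit(chunk);
      contentStreamed = true;
    }
    return idx; // this provider served the whole stream
  } catch (error) {
    if (ctx.firstError === undefined) ctx.firstError = error;
    // Assumption: once the caller has seen content, switching providers would
    // splice two answers together, so the error is surfaced instead.
    if (contentStreamed) throw error;
    const nextPos = pos + 1;
    if (nextPos >= order.length) throw ctx.firstError; // order exhausted
    // Same `order` snapshot, next local position: shared state is never consulted.
    return streamWithFailover(providers, order, nextPos, emit, ctx);
  }
}

const failsEarly = function* () { throw new Error("connect timeout"); };
const failsMidway = function* () { yield "Hel"; throw new Error("socket reset"); };
const healthy = function* () { yield "Hello"; yield " world"; };

const out = [];
const served = streamWithFailover([failsEarly, healthy], [0, 1], 0, (c) => out.push(c));
console.log(served, out.join("")); // 1 Hello world
```

Recursion depth is bounded by `order.length`, because each call either returns, throws, or advances `pos` by one.
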
@@ -1886,6 +1960,7 @@ function createLCR(config) {
     autoSort = false,
     autoPrice = false,
     resetIntervalMs,
+    cooldown,
     onError,
     onCost,
     onCall,
@@ -1911,7 +1986,7 @@ function createLCR(config) {
     }
     routed.set(
       name,
-      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, onError, onCost, onCall, shouldRetry })
+      new LcrFallbackModel({ modelName: name, providers, resetIntervalMs, cooldown, onError, onCost, onCall, shouldRetry })
     );
   }
   return (modelName) => {
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.6.
+  "version": "0.6.2",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",