ai-lcr 0.3.0 → 0.5.0
- package/CHANGELOG.md +39 -0
- package/README.md +8 -5
- package/README.zh-CN.md +5 -5
- package/dist/index.cjs +84 -5
- package/dist/index.d.cts +33 -1
- package/dist/index.d.ts +33 -1
- package/dist/index.js +83 -5
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,45 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
 
+## [0.5.0] — 2026-06-02
+
+All additions are optional and backward compatible.
+
+### Added
+
+- **Official-price savings baseline for media.** A media model's savings baseline
+is now the model-maker's first-party list price — what a user pays going
+*direct*, bypassing the cheaper providers we route to — instead of the priciest
+provider we happen to route between. For the common case of a model served by a
+single aggregator (Runware, fal, …), the old baseline equalled the actual cost,
+so savings showed as `$0`; the official price surfaces the real saving.
+- `MediaModelDef.official?: MediaPricing` — an inline first-party price on a
+model def. When set, it wins.
+- `MediaLCRConfig.officialPrices?: Record<string, MediaPricing>` — a modelId →
+price map so a downstream registry gets correct baselines without inlining
+prices. Defaults to the bundled **`OFFICIAL_PRICES`** (now exported), lifted
+from the cross-provider price table by `scripts/gen-media-official.mjs`.
+- When no official price is known (e.g. open-weight models served only by
+aggregators), the baseline falls back to the priciest configured route — or
+none if there's a single route — exactly as before.
+
+## [0.4.0] — 2026-06-02
+
+All additions are optional and backward compatible.
+
+### Added
+
+- **`CallRecord.ttftMs` — time to first token.** Streaming calls now report TTFT,
+the industry-standard responsiveness metric: ms from the winning provider's
+stream attempt start to its first content token (`text-delta` /
+`reasoning-delta`). Measured against the *winner's* attempt, so failover
+overhead (already in `latencyMs`) doesn't distort it. `undefined` for
+`doGenerate` (no streaming → no "first token") and for calls that failed before
+producing content. `formatCallRecord` shows it inline next to total latency when
+present (`412ms (ttft 88ms)`). With `latencyMs` and `outputTokens` on the same
+record, output throughput is derivable: `outputTokens / ((latencyMs − ttftMs) /
+1000)` tokens/sec.
+
 ## [0.3.0] — 2026-06-02
 
 Integration-feedback pass from wiring ai-lcr into a real agentic product
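The baseline precedence described in the 0.5.0 entry (inline `official`, then the `officialPrices` map, then the priciest route, then nothing) can be sketched as below. This is a minimal illustration and not the package's code: `resolveBaselineUsd`, `BaselineInput`, and `routeCents` are hypothetical names, and the sketch assumes every price is already normalized to the reference spec, which the real code handles through `normalizedCents`.

```typescript
// Only `unit` and `cents` appear in the diff; everything else here is assumed.
type MediaPricing = { unit: "image" | "call"; cents: number };

interface BaselineInput {
  inlineOfficial?: MediaPricing;                 // stands in for MediaModelDef.official
  officialPrices: Record<string, MediaPricing>;  // stands in for MediaLCRConfig.officialPrices
  routeCents: number[];                          // hypothetical: per-route reference cost, in cents
}

// Hypothetical helper: pick the savings baseline in USD.
// Assumes prices are already normalized to the reference spec.
function resolveBaselineUsd(modelId: string, input: BaselineInput): number {
  // An inline price on the model def wins over the modelId -> price map.
  const official = input.inlineOfficial ?? input.officialPrices[modelId];
  if (official !== undefined) return official.cents / 100;
  // No official price known: fall back to the priciest configured route.
  if (input.routeCents.length > 0) return Math.max(...input.routeCents) / 100;
  // No routes at all: no baseline.
  return 0;
}

const samplePrices: Record<string, MediaPricing> = {
  "bfl/flux-2-pro": { unit: "image", cents: 3 },
};

// A single 2-cent aggregator route: the official 3-cent price becomes the baseline.
const baseline = resolveBaselineUsd("bfl/flux-2-pro", {
  officialPrices: samplePrices,
  routeCents: [2],
});
```

With the pre-0.5.0 rule, the same single-route model would have had a baseline equal to its own cost, which is the `$0` savings case the changelog entry mentions.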
package/README.md
CHANGED
@@ -184,6 +184,7 @@ interface CallRecord {
 ok: boolean;
 failedOver: boolean; // more than one provider was tried
 latencyMs: number;
+ttftMs?: number; // streaming only: time to first token (winner's first content delta) — industry-standard responsiveness metric
 inputTokens: number;
 outputTokens: number;
 cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
@@ -196,6 +197,8 @@ interface CallRecord {
 
 **Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
 
+**Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
+
 **Cache-aware cost.** Add `cacheRead` (USD per 1M cached input tokens) to a provider's `cost` and ai-lcr bills prompt-cache hits at that rate when the call reports `usage.inputTokens.cacheRead`. Omit it and cached tokens fall back to the full `input` rate (unchanged from before). For cache-heavy traffic (e.g. Anthropic, where a cache read is ~0.1×) this keeps `costUsd` honest — and `cachedInputTokens` lets a dashboard audit it:
 
 ```ts
@@ -312,11 +315,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
 REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
 bash scripts/check-provider.sh
 
-# TokenMart uses
-API_KEY=$
-MODEL_1=
-MODEL_2=
-CACHE_MODEL=
+# TokenMart (Inference AI) uses bare, un-prefixed model IDs
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+MODEL_1=gemini-3-flash-preview REF_1=google/gemini-3-flash-preview \
+MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+CACHE_MODEL=claude-sonnet-4-6 \
 REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
 bash scripts/check-provider.sh
 ```
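The README's throughput formula, `outputTokens / ((latencyMs − ttftMs) / 1000)`, can be turned into a small helper. The sketch below is illustrative only: `outputTokensPerSecond` and `TimingFields` are hypothetical names, and the guards for a missing `ttftMs` or a non-positive generation window are assumptions about how a dashboard might want to treat those cases.

```typescript
// The three CallRecord fields the formula needs; the interface name is hypothetical.
interface TimingFields {
  latencyMs: number;
  ttftMs?: number;
  outputTokens: number;
}

// Hypothetical helper: output throughput in tokens/sec, or undefined when it
// cannot be derived (non-streaming call, or no time elapsed after the first token).
function outputTokensPerSecond(record: TimingFields): number | undefined {
  if (record.ttftMs === undefined) return undefined;
  const generationMs = record.latencyMs - record.ttftMs;
  if (generationMs <= 0) return undefined;
  return record.outputTokens / (generationMs / 1000);
}

// 162 tokens generated over 412 - 88 = 324 ms: roughly 500 tokens/sec.
const tps = outputTokensPerSecond({ latencyMs: 412, ttftMs: 88, outputTokens: 162 });
```

One caveat follows from the definitions in the diff: `latencyMs` spans all attempts while `ttftMs` starts at the winner's attempt, so on a failed-over call the difference also includes time spent on the failed providers, and the derived throughput will read lower than the serving model's actual rate.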
package/README.zh-CN.md
CHANGED
@@ -232,11 +232,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
 REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
 bash scripts/check-provider.sh
 
-# TokenMart
-API_KEY=$
-MODEL_1=
-MODEL_2=
-CACHE_MODEL=
+# TokenMart (Inference AI) uses bare model IDs with no vendor prefix
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+MODEL_1=gemini-3-flash-preview REF_1=google/gemini-3-flash-preview \
+MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+CACHE_MODEL=claude-sonnet-4-6 \
 REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
 bash scripts/check-provider.sh
 ```
package/dist/index.cjs
CHANGED
@@ -22,6 +22,7 @@ var index_exports = {};
 __export(index_exports, {
 DEFAULT_REFERENCE: () => DEFAULT_REFERENCE,
 MEDIA_PRICING: () => MEDIA_PRICING,
+OFFICIAL_PRICES: () => OFFICIAL_PRICES,
 cheapestRoute: () => cheapestRoute,
 classifyError: () => classifyError,
 classifyErrorKind: () => classifyErrorKind,
@@ -312,7 +313,7 @@ var LcrFallbackModel = class {
 return max;
 }
 /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-finalizeOk(ctx, provider, attemptStart, usage) {
+finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
 ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
 const inputTokens = usage?.inputTokens?.total ?? 0;
 const outputTokens = usage?.outputTokens?.total ?? 0;
@@ -334,6 +335,7 @@ var LcrFallbackModel = class {
 ok: true,
 failedOver: ctx.attempts.length > 1,
 latencyMs: Date.now() - ctx.startedAt,
+...ttftMs !== void 0 ? { ttftMs } : {},
 inputTokens,
 outputTokens,
 ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
@@ -433,6 +435,7 @@ var LcrFallbackModel = class {
 const triedBeforeServing = tried;
 let usage;
 let streamedAny = false;
+let ttftMs;
 const stream = new ReadableStream({
 async start(controller) {
 let reader = null;
@@ -446,11 +449,14 @@ var LcrFallbackModel = class {
 }
 if (done) break;
 if (value.type === "finish") usage = value.usage;
+if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+ttftMs = Date.now() - servingAttemptStart;
+}
 controller.enqueue(value);
 if (value.type !== "stream-start") streamedAny = true;
 }
 self.settleSticky(servingIdx);
-self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
 controller.close();
 } catch (error) {
 self.emitError(error, servingProvider.label);
@@ -512,7 +518,8 @@ function formatCallRecord(record, opts = {}) {
 const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
 const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
 const status = formatCost(record);
-
+const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+let line = `${glyph} ${record.model} ${chain} ${timing} ${status}`;
 if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
 line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
 }
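The two TTFT changes in the hunks above (capture the elapsed time at the first `text-delta` or `reasoning-delta`, then render it next to total latency) can be exercised in isolation. The sketch below is a simplified, synchronous stand-in and not the package's implementation: `measureTtft`, `formatTiming`, and the injected `now` clock are hypothetical, introduced so the behaviour can be checked without a real stream.

```typescript
// Minimal shape of a stream part; only `type` matters for TTFT.
type StreamPart = { type: string };

// Hypothetical helper: elapsed ms from attempt start to the first content delta,
// or undefined when the stream produced no content.
function measureTtft(
  parts: Iterable<StreamPart>,
  attemptStart: number,
  now: () => number,
): number | undefined {
  let ttftMs: number | undefined;
  for (const part of parts) {
    // Same condition as the diff: only the first text or reasoning delta counts.
    if (ttftMs === undefined && (part.type === "text-delta" || part.type === "reasoning-delta")) {
      ttftMs = now() - attemptStart;
    }
  }
  return ttftMs;
}

// Hypothetical helper mirroring the `timing` expression added to formatCallRecord.
function formatTiming(latencyMs: number, ttftMs?: number): string {
  return ttftMs !== undefined ? `${latencyMs}ms (ttft ${ttftMs}ms)` : `${latencyMs}ms`;
}

// Fake clock: the first reading is 88 ms after the attempt started at t = 1000.
let t = 1000;
const fakeNow = () => (t += 88);

const ttft = measureTtft(
  [{ type: "stream-start" }, { type: "text-delta" }, { type: "text-delta" }, { type: "finish" }],
  1000,
  fakeNow,
);
const timing = formatTiming(412, ttft); // "412ms (ttft 88ms)"
```

Injecting the clock is a testing convenience in this sketch; the diff itself reads `Date.now()` directly.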
@@ -563,6 +570,68 @@ function createHttpSink(options) {
|
|
|
563
570
|
};
|
|
564
571
|
}
|
|
565
572
|
|
|
573
|
+
// src/media-official.ts
|
|
574
|
+
var OFFICIAL_PRICES = {
|
|
575
|
+
"alibaba/qwen-image": { unit: "image", cents: 3.5 },
|
|
576
|
+
"alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
|
|
577
|
+
"alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
|
|
578
|
+
"alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
|
|
579
|
+
"alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
|
|
580
|
+
"alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
|
|
581
|
+
"alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
|
|
582
|
+
"alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
|
|
583
|
+
"alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
|
|
584
|
+
"bfl/flux-1.1-pro": { unit: "image", cents: 4 },
|
|
585
|
+
"bfl/flux-2-flex": { unit: "image", cents: 6 },
|
|
586
|
+
"bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
|
|
587
|
+
"bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
|
|
588
|
+
"bfl/flux-2-pro": { unit: "image", cents: 3 },
|
|
589
|
+
"bfl/flux-kontext-max": { unit: "image", cents: 8 },
|
|
590
|
+
"bfl/flux-kontext-pro": { unit: "image", cents: 4 },
|
|
591
|
+
"bria/rmbg-2": { unit: "image", cents: 1.8 },
|
|
592
|
+
"bytedance/seedance-2-0": { unit: "call", cents: 187 },
|
|
593
|
+
"bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
|
|
594
|
+
"bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
|
|
595
|
+
"bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
|
|
596
|
+
"bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
|
|
597
|
+
"bytedance/seedream-4-5": { unit: "image", cents: 3 },
|
|
598
|
+
"bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
|
|
599
|
+
"google/imagen-4-ultra": { unit: "image", cents: 6 },
|
|
600
|
+
"google/nano-banana": { unit: "image", cents: 3.9 },
|
|
601
|
+
"google/nano-banana-2": { unit: "image", cents: 6.7 },
|
|
602
|
+
"google/nano-banana-pro": { unit: "image", cents: 13.4 },
|
|
603
|
+
"google/veo-3-1": { unit: "call", cents: 200 },
|
|
604
|
+
"google/veo-3-1-lite": { unit: "call", cents: 40 },
|
|
605
|
+
"google/veo-3-quality": { unit: "call", cents: 200 },
|
|
606
|
+
"ideogram/v3-balanced": { unit: "image", cents: 6 },
|
|
607
|
+
"kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
|
|
608
|
+
"kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
|
|
609
|
+
"kuaishou/kling-motion-control": { unit: "call", cents: 56 },
|
|
610
|
+
"kuaishou/kling-v21-master": { unit: "call", cents: 56 },
|
|
611
|
+
"kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
|
|
612
|
+
"kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
|
|
613
|
+
"kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
|
|
614
|
+
"kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
|
|
615
|
+
"lightricks/ltx-2": { unit: "call", cents: 30 },
|
|
616
|
+
"lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
|
|
617
|
+
"minimax/hailuo-02-pro": { unit: "call", cents: 53 },
|
|
618
|
+
"minimax/hailuo-02-standard": { unit: "call", cents: 27 },
|
|
619
|
+
"minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
|
|
620
|
+
"openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
|
|
621
|
+
"openai/gpt-image-2": { unit: "image", cents: 5.3 },
|
|
622
|
+
"openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
|
|
623
|
+
"openai/sora-2": { unit: "call", cents: 50 },
|
|
624
|
+
"openai/sora-2-i2v": { unit: "call", cents: 50 },
|
|
625
|
+
"pixverse/v5-5-i2v": { unit: "call", cents: 60 },
|
|
626
|
+
"pixverse/v6": { unit: "call", cents: 45 },
|
|
627
|
+
"recraft/v3": { unit: "image", cents: 4 },
|
|
628
|
+
"recraft/v4-1": { unit: "image", cents: 25 },
|
|
629
|
+
"runway/gen-4-image": { unit: "image", cents: 8 },
|
|
630
|
+
"stability/fast-sdxl": { unit: "image", cents: 0.9 },
|
|
631
|
+
"stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
|
|
632
|
+
"xai/grok-image-quality": { unit: "image", cents: 7 }
|
|
633
|
+
};
|
|
634
|
+
|
|
566
635
|
// src/media.ts
|
|
567
636
|
var DEFAULT_REFERENCE = {
|
|
568
637
|
image: { width: 1920, height: 1080 },
|
|
@@ -614,7 +683,15 @@ function newMediaCallId() {
|
|
|
614
683
|
return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
|
|
615
684
|
}
|
|
616
685
|
function createMediaLCR(config) {
|
|
617
|
-
const {
|
|
686
|
+
const {
|
|
687
|
+
registry,
|
|
688
|
+
adapters,
|
|
689
|
+
reference = DEFAULT_REFERENCE,
|
|
690
|
+
officialPrices = OFFICIAL_PRICES,
|
|
691
|
+
onError,
|
|
692
|
+
onCost,
|
|
693
|
+
onCall
|
|
694
|
+
} = config;
|
|
618
695
|
const safeError = (error, provider) => {
|
|
619
696
|
try {
|
|
620
697
|
onError?.(error, provider);
|
|
@@ -639,7 +716,8 @@ function createMediaLCR(config) {
|
|
|
639
716
|
throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
|
|
640
717
|
}
|
|
641
718
|
const ranked = rankRoutes(def, reference);
|
|
642
|
-
const
|
|
719
|
+
const official = def.official ?? officialPrices[modelId];
|
|
720
|
+
const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
|
|
643
721
|
const startedAt = Date.now();
|
|
644
722
|
const attempts = [];
|
|
645
723
|
let lastErr;
|
|
@@ -1124,6 +1202,7 @@ function createLCR(config) {
|
|
|
1124
1202
|
0 && (module.exports = {
|
|
1125
1203
|
DEFAULT_REFERENCE,
|
|
1126
1204
|
MEDIA_PRICING,
|
|
1205
|
+
OFFICIAL_PRICES,
|
|
1127
1206
|
cheapestRoute,
|
|
1128
1207
|
classifyError,
|
|
1129
1208
|
classifyErrorKind,
|
package/dist/index.d.cts
CHANGED
|
@@ -85,6 +85,18 @@ interface CallRecord {
|
|
|
85
85
|
failedOver: boolean;
|
|
86
86
|
/** Total wall time across all attempts, ms. */
|
|
87
87
|
latencyMs: number;
|
|
88
|
+
/**
|
|
89
|
+
* Time to first token (TTFT), ms — the industry-standard responsiveness
|
|
90
|
+
* metric. Measured from the *winning* provider's stream attempt start to its
|
|
91
|
+
* first content token (`text-delta` / `reasoning-delta`), so it captures how
|
|
92
|
+
* fast the model that actually served started replying, not failover overhead
|
|
93
|
+
* (that's already in `latencyMs`). Streaming only: **undefined** for
|
|
94
|
+
* `doGenerate` (the whole response lands at once, so there's no "first token")
|
|
95
|
+
* and for calls that failed before producing any content. With `latencyMs` and
|
|
96
|
+
* `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
|
|
97
|
+
* ttftMs) / 1000)` tokens/sec.
|
|
98
|
+
*/
|
|
99
|
+
ttftMs?: number;
|
|
88
100
|
inputTokens: number;
|
|
89
101
|
outputTokens: number;
|
|
90
102
|
/**
|
|
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
|
|
|
141
153
|
* latency, cost, and — when anything failed — the reason for each failed hop.
|
|
142
154
|
*
|
|
143
155
|
* ✓ text tokenmart 412ms $0.0003
|
|
156
|
+
* ✓ text tokenmart 412ms (ttft 88ms) $0.0003 ← streaming: TTFT shown when known
|
|
144
157
|
* ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
|
|
145
158
|
* ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
|
|
146
159
|
*
|
|
@@ -251,6 +264,15 @@ interface MediaModelDef {
|
|
|
251
264
|
modality: MediaModality;
|
|
252
265
|
/** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
|
|
253
266
|
routes: MediaRoute[];
|
|
267
|
+
/**
|
|
268
|
+
* The model-maker's first-party list price — what a user pays going DIRECT,
|
|
269
|
+
* bypassing the cheaper providers we route to. When set, it's the savings
|
|
270
|
+
* baseline (savings = official − actual cost). Omit for open-weight models
|
|
271
|
+
* with no first-party API price; those fall back to the priciest configured
|
|
272
|
+
* route, or no baseline if there's only one. Can also be supplied out-of-band
|
|
273
|
+
* via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
|
|
274
|
+
*/
|
|
275
|
+
official?: MediaPricing;
|
|
254
276
|
}
|
|
255
277
|
type MediaRegistry = Record<string, MediaModelDef>;
|
|
256
278
|
/**
|
|
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
|
|
|
334
356
|
/** Adapters keyed by provider. A route with no adapter is skipped. */
|
|
335
357
|
adapters: Record<string, MediaAdapter>;
|
|
336
358
|
reference?: ReferenceSpec;
|
|
359
|
+
/**
|
|
360
|
+
* Model-maker first-party list prices keyed by modelId — the savings baseline
|
|
361
|
+
* for a model whose registry def carries no inline `official` price. Lets a
|
|
362
|
+
* downstream registry (e.g. ai-art's) get correct baselines without inlining
|
|
363
|
+
* prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
|
|
364
|
+
* cross-provider price table). A def's inline `official` wins over this.
|
|
365
|
+
*/
|
|
366
|
+
officialPrices?: Record<string, MediaPricing>;
|
|
337
367
|
onError?: (error: Error, provider: string) => void;
|
|
338
368
|
onCost?: (event: MediaCostEvent) => void;
|
|
339
369
|
/**
|
|
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
|
|
|
374
404
|
|
|
375
405
|
declare const MEDIA_PRICING: MediaRegistry;
|
|
376
406
|
|
|
407
|
+
declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
|
|
408
|
+
|
|
377
409
|
/**
|
|
378
410
|
* Kunavo media adapter — image (sync) + video (async poll).
|
|
379
411
|
*
|
|
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
|
532
564
|
*/
|
|
533
565
|
declare function createLCR(config: LCRConfig): LCRRouter;
|
|
534
566
|
|
|
535
|
-
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
|
567
|
+
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
package/dist/index.d.ts
CHANGED
|
@@ -85,6 +85,18 @@ interface CallRecord {
|
|
|
85
85
|
failedOver: boolean;
|
|
86
86
|
/** Total wall time across all attempts, ms. */
|
|
87
87
|
latencyMs: number;
|
|
88
|
+
/**
|
|
89
|
+
* Time to first token (TTFT), ms — the industry-standard responsiveness
|
|
90
|
+
* metric. Measured from the *winning* provider's stream attempt start to its
|
|
91
|
+
* first content token (`text-delta` / `reasoning-delta`), so it captures how
|
|
92
|
+
* fast the model that actually served started replying, not failover overhead
|
|
93
|
+
* (that's already in `latencyMs`). Streaming only: **undefined** for
|
|
94
|
+
* `doGenerate` (the whole response lands at once, so there's no "first token")
|
|
95
|
+
* and for calls that failed before producing any content. With `latencyMs` and
|
|
96
|
+
* `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
|
|
97
|
+
* ttftMs) / 1000)` tokens/sec.
|
|
98
|
+
*/
|
|
99
|
+
ttftMs?: number;
|
|
88
100
|
inputTokens: number;
|
|
89
101
|
outputTokens: number;
|
|
90
102
|
/**
|
|
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
|
|
|
141
153
|
* latency, cost, and — when anything failed — the reason for each failed hop.
|
|
142
154
|
*
|
|
143
155
|
* ✓ text tokenmart 412ms $0.0003
|
|
156
|
+
* ✓ text tokenmart 412ms (ttft 88ms) $0.0003 ← streaming: TTFT shown when known
|
|
144
157
|
* ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
|
|
145
158
|
* ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
|
|
146
159
|
*
|
|
@@ -251,6 +264,15 @@ interface MediaModelDef {
|
|
|
251
264
|
modality: MediaModality;
|
|
252
265
|
/** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
|
|
253
266
|
routes: MediaRoute[];
|
|
267
|
+
/**
|
|
268
|
+
* The model-maker's first-party list price — what a user pays going DIRECT,
|
|
269
|
+
* bypassing the cheaper providers we route to. When set, it's the savings
|
|
270
|
+
* baseline (savings = official − actual cost). Omit for open-weight models
|
|
271
|
+
* with no first-party API price; those fall back to the priciest configured
|
|
272
|
+
* route, or no baseline if there's only one. Can also be supplied out-of-band
|
|
273
|
+
* via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
|
|
274
|
+
*/
|
|
275
|
+
official?: MediaPricing;
|
|
254
276
|
}
|
|
255
277
|
type MediaRegistry = Record<string, MediaModelDef>;
|
|
256
278
|
/**
|
|
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
|
|
|
334
356
|
/** Adapters keyed by provider. A route with no adapter is skipped. */
|
|
335
357
|
adapters: Record<string, MediaAdapter>;
|
|
336
358
|
reference?: ReferenceSpec;
|
|
359
|
+
/**
|
|
360
|
+
* Model-maker first-party list prices keyed by modelId — the savings baseline
|
|
361
|
+
* for a model whose registry def carries no inline `official` price. Lets a
|
|
362
|
+
* downstream registry (e.g. ai-art's) get correct baselines without inlining
|
|
363
|
+
* prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
|
|
364
|
+
* cross-provider price table). A def's inline `official` wins over this.
|
|
365
|
+
*/
|
|
366
|
+
officialPrices?: Record<string, MediaPricing>;
|
|
337
367
|
onError?: (error: Error, provider: string) => void;
|
|
338
368
|
onCost?: (event: MediaCostEvent) => void;
|
|
339
369
|
/**
|
|
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
|
|
|
374
404
|
|
|
375
405
|
declare const MEDIA_PRICING: MediaRegistry;
|
|
376
406
|
|
|
407
|
+
declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
|
|
408
|
+
|
|
377
409
|
/**
|
|
378
410
|
* Kunavo media adapter — image (sync) + video (async poll).
|
|
379
411
|
*
|
|
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
|
|
|
532
564
|
*/
|
|
533
565
|
declare function createLCR(config: LCRConfig): LCRRouter;
|
|
534
566
|
|
|
535
|
-
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
|
567
|
+
export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
|
package/dist/index.js
CHANGED
|
@@ -271,7 +271,7 @@ var LcrFallbackModel = class {
|
|
|
271
271
|
return max;
|
|
272
272
|
}
|
|
273
273
|
/** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
|
|
274
|
-
finalizeOk(ctx, provider, attemptStart, usage) {
|
|
274
|
+
finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
|
|
275
275
|
ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
|
|
276
276
|
const inputTokens = usage?.inputTokens?.total ?? 0;
|
|
277
277
|
const outputTokens = usage?.outputTokens?.total ?? 0;
|
|
@@ -293,6 +293,7 @@ var LcrFallbackModel = class {
|
|
|
293
293
|
ok: true,
|
|
294
294
|
failedOver: ctx.attempts.length > 1,
|
|
295
295
|
latencyMs: Date.now() - ctx.startedAt,
|
|
296
|
+
...ttftMs !== void 0 ? { ttftMs } : {},
|
|
296
297
|
inputTokens,
|
|
297
298
|
outputTokens,
|
|
298
299
|
...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
|
|
@@ -392,6 +393,7 @@ var LcrFallbackModel = class {
|
|
|
392
393
|
const triedBeforeServing = tried;
|
|
393
394
|
let usage;
|
|
394
395
|
let streamedAny = false;
|
|
396
|
+
let ttftMs;
|
|
395
397
|
const stream = new ReadableStream({
|
|
396
398
|
async start(controller) {
|
|
397
399
|
let reader = null;
|
|
@@ -405,11 +407,14 @@ var LcrFallbackModel = class {
|
|
|
405
407
|
}
|
|
406
408
|
if (done) break;
|
|
407
409
|
if (value.type === "finish") usage = value.usage;
|
|
410
|
+
if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
|
|
411
|
+
ttftMs = Date.now() - servingAttemptStart;
|
|
412
|
+
}
|
|
408
413
|
controller.enqueue(value);
|
|
409
414
|
if (value.type !== "stream-start") streamedAny = true;
|
|
410
415
|
}
|
|
411
416
|
self.settleSticky(servingIdx);
|
|
412
|
-
self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
|
|
417
|
+
self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
|
|
413
418
|
controller.close();
|
|
414
419
|
} catch (error) {
|
|
415
420
|
self.emitError(error, servingProvider.label);
|
|
@@ -471,7 +476,8 @@ function formatCallRecord(record, opts = {}) {
|
|
|
471
476
|
const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
|
|
472
477
|
const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
|
|
473
478
|
const status = formatCost(record);
|
|
474
|
-
|
|
479
|
+
const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
|
|
480
|
+
let line = `${glyph} ${record.model} ${chain} ${timing} ${status}`;
|
|
475
481
|
if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
|
|
476
482
|
line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
|
|
477
483
|
}
|
|
@@ -522,6 +528,68 @@ function createHttpSink(options) {
|
|
|
522
528
|
};
|
|
523
529
|
}
|
|
524
530
|
|
|
531
|
+
// src/media-official.ts
|
|
532
|
+
var OFFICIAL_PRICES = {
|
|
533
|
+
"alibaba/qwen-image": { unit: "image", cents: 3.5 },
|
|
534
|
+
"alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
|
|
535
|
+
"alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
|
|
536
|
+
"alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
|
|
537
|
+
"alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
|
|
538
|
+
"alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
|
|
539
|
+
"alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
|
|
540
|
+
"alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
|
|
541
|
+
"alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
|
|
542
|
+
"bfl/flux-1.1-pro": { unit: "image", cents: 4 },
|
|
543
|
+
"bfl/flux-2-flex": { unit: "image", cents: 6 },
|
|
544
|
+
"bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
|
|
545
|
+
"bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
|
|
546
|
+
"bfl/flux-2-pro": { unit: "image", cents: 3 },
|
|
547
|
+
"bfl/flux-kontext-max": { unit: "image", cents: 8 },
|
|
548
|
+
"bfl/flux-kontext-pro": { unit: "image", cents: 4 },
|
|
549
|
+
"bria/rmbg-2": { unit: "image", cents: 1.8 },
|
|
550
|
+
"bytedance/seedance-2-0": { unit: "call", cents: 187 },
|
|
551
|
+
"bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
|
|
552
|
+
"bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
|
|
553
|
+
"bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
|
|
554
|
+
"bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
|
|
555
|
+
"bytedance/seedream-4-5": { unit: "image", cents: 3 },
|
|
556
|
+
"bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
|
|
557
|
+
"google/imagen-4-ultra": { unit: "image", cents: 6 },
|
|
558
|
+
"google/nano-banana": { unit: "image", cents: 3.9 },
|
|
559
|
+
"google/nano-banana-2": { unit: "image", cents: 6.7 },
|
|
560
|
+
"google/nano-banana-pro": { unit: "image", cents: 13.4 },
|
|
561
|
+
"google/veo-3-1": { unit: "call", cents: 200 },
|
|
562
|
+
"google/veo-3-1-lite": { unit: "call", cents: 40 },
|
|
563
|
+
"google/veo-3-quality": { unit: "call", cents: 200 },
|
|
564
|
+
"ideogram/v3-balanced": { unit: "image", cents: 6 },
|
|
565
|
+
"kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
|
|
566
|
+
"kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
|
|
567
|
+
"kuaishou/kling-motion-control": { unit: "call", cents: 56 },
|
|
568
|
+
"kuaishou/kling-v21-master": { unit: "call", cents: 56 },
|
|
569
|
+
"kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
|
|
570
|
+
"kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
|
|
571
|
+
"kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
|
|
572
|
+
"kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
|
|
573
|
+
"lightricks/ltx-2": { unit: "call", cents: 30 },
|
|
574
|
+
"lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
|
|
575
|
+
"minimax/hailuo-02-pro": { unit: "call", cents: 53 },
|
|
576
|
+
"minimax/hailuo-02-standard": { unit: "call", cents: 27 },
|
|
577
|
+
"minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
|
|
578
|
+
"openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
|
|
579
|
+
"openai/gpt-image-2": { unit: "image", cents: 5.3 },
|
|
580
|
+
"openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
|
|
581
|
+
"openai/sora-2": { unit: "call", cents: 50 },
|
|
582
|
+
"openai/sora-2-i2v": { unit: "call", cents: 50 },
|
|
583
|
+
"pixverse/v5-5-i2v": { unit: "call", cents: 60 },
|
|
584
|
+
"pixverse/v6": { unit: "call", cents: 45 },
|
|
585
|
+
"recraft/v3": { unit: "image", cents: 4 },
|
|
586
|
+
"recraft/v4-1": { unit: "image", cents: 25 },
|
|
587
|
+
"runway/gen-4-image": { unit: "image", cents: 8 },
|
|
588
|
+
"stability/fast-sdxl": { unit: "image", cents: 0.9 },
|
|
589
|
+
"stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
|
|
590
|
+
"xai/grok-image-quality": { unit: "image", cents: 7 }
|
|
591
|
+
};
|
|
592
|
+
|
|
525
593
|
// src/media.ts
|
|
526
594
|
var DEFAULT_REFERENCE = {
|
|
527
595
|
image: { width: 1920, height: 1080 },
|
|
@@ -573,7 +641,15 @@ function newMediaCallId() {
|
|
|
573
641
|
return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
|
|
574
642
|
}
|
|
575
643
|
function createMediaLCR(config) {
|
|
576
|
-
const {
|
|
644
|
+
const {
|
|
645
|
+
registry,
|
|
646
|
+
adapters,
|
|
647
|
+
reference = DEFAULT_REFERENCE,
|
|
648
|
+
officialPrices = OFFICIAL_PRICES,
|
|
649
|
+
onError,
|
|
650
|
+
onCost,
|
|
651
|
+
onCall
|
|
652
|
+
} = config;
|
|
577
653
|
const safeError = (error, provider) => {
|
|
578
654
|
try {
|
|
579
655
|
onError?.(error, provider);
|
|
@@ -598,7 +674,8 @@ function createMediaLCR(config) {
|
|
|
598
674
|
throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
|
|
599
675
|
}
|
|
600
676
|
const ranked = rankRoutes(def, reference);
|
|
601
|
-
const
|
|
677
|
+
const official = def.official ?? officialPrices[modelId];
|
|
678
|
+
const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
|
|
602
679
|
const startedAt = Date.now();
|
|
603
680
|
const attempts = [];
|
|
604
681
|
let lastErr;
|
|
@@ -1082,6 +1159,7 @@ function createLCR(config) {
|
|
|
1082
1159
|
export {
|
|
1083
1160
|
DEFAULT_REFERENCE,
|
|
1084
1161
|
MEDIA_PRICING,
|
|
1162
|
+
OFFICIAL_PRICES,
|
|
1085
1163
|
cheapestRoute,
|
|
1086
1164
|
classifyError,
|
|
1087
1165
|
classifyErrorKind,
|
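Review note on the `createMediaLCR` hunk above: the new baseline selection can be read as a three-step fallback. The sketch below is a minimal restatement, not the package's code. `resolveBaselineUsd`, `savedSuffix` and `toRefCents` are hypothetical names introduced here. `toRefCents` stands in for the package's `normalizedCents(pricing, reference)`, whose body is not part of this diff, and it assumes a flat per-unit price with no reference scaling.

```javascript
// ASSUMPTION: hypothetical stand-in for normalizedCents(pricing, reference).
// It returns the flat per-unit price in cents and ignores the reference size.
const toRefCents = (pricing) => pricing.cents;

// Restates the baseline selection from the createMediaLCR hunk.
function resolveBaselineUsd({ def, modelId, officialPrices, ranked }) {
  // 1. An inline price on the model def wins; 2. then the modelId -> price map.
  const official = def.official ?? officialPrices[modelId];
  if (official !== undefined) return toRefCents(official) / 100;
  // 3. Otherwise fall back to the priciest ranked route (the old baseline).
  if (ranked.length > 0) return Math.max(...ranked.map((r) => r.refCents)) / 100;
  // No official price and no routes: there is no baseline to compare against.
  return 0;
}

// Mirrors the condition guarding the "(saved $...)" suffix in the log-line hunk.
function savedSuffix(record) {
  if (record.ok && record.baselineUsd !== undefined && record.baselineUsd > record.costUsd) {
    return ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
  }
  return "";
}

// Illustrative data: one aggregator route at 2.5 cents (made-up figure) for a
// model whose official price in OFFICIAL_PRICES is 4 cents per image.
const ranked = [{ refCents: 2.5 }];
const officialPrices = { "bfl/flux-1.1-pro": { unit: "image", cents: 4 } };

const baselineUsd = resolveBaselineUsd({
  def: {},
  modelId: "bfl/flux-1.1-pro",
  officialPrices,
  ranked,
});
console.log(savedSuffix({ ok: true, baselineUsd, costUsd: 0.025 })); // " (saved $0.0150)"
```

With a single route and no official price, the fallback baseline equals the route's own cost, so the suffix is empty. That is the `$0` case the CHANGELOG entry describes.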
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "ai-lcr",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.5.0",
|
|
4
4
|
"description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"ai",
|