ai-lcr 0.3.0 → 0.5.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,45 @@ All notable changes to `ai-lcr` are documented here. The format follows
  [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
  [Semantic Versioning](https://semver.org/).
 
+ ## [0.5.0] — 2026-06-02
+
+ All additions are optional and backward compatible.
+
+ ### Added
+
+ - **Official-price savings baseline for media.** A media model's savings baseline
+ is now the model-maker's first-party list price — what a user pays going
+ *direct*, bypassing the cheaper providers we route to — instead of the priciest
+ provider we happen to route between. For the common case of a model served by a
+ single aggregator (Runware, fal, …), the old baseline equalled the actual cost,
+ so savings showed as `$0`; the official price surfaces the real saving.
+ - `MediaModelDef.official?: MediaPricing` — an inline first-party price on a
+ model def. When set, it wins.
+ - `MediaLCRConfig.officialPrices?: Record<string, MediaPricing>` — a modelId →
+ price map so a downstream registry gets correct baselines without inlining
+ prices. Defaults to the bundled **`OFFICIAL_PRICES`** (now exported), lifted
+ from the cross-provider price table by `scripts/gen-media-official.mjs`.
+ - When no official price is known (e.g. open-weight models served only by
+ aggregators), the baseline falls back to the priciest configured route — or
+ none if there's a single route — exactly as before.
+
+ ## [0.4.0] — 2026-06-02
+
+ All additions are optional and backward compatible.
+
+ ### Added
+
+ - **`CallRecord.ttftMs` — time to first token.** Streaming calls now report TTFT,
+ the industry-standard responsiveness metric: ms from the winning provider's
+ stream attempt start to its first content token (`text-delta` /
+ `reasoning-delta`). Measured against the *winner's* attempt, so failover
+ overhead (already in `latencyMs`) doesn't distort it. `undefined` for
+ `doGenerate` (no streaming → no "first token") and for calls that failed before
+ producing content. `formatCallRecord` shows it inline next to total latency when
+ present (`412ms (ttft 88ms)`). With `latencyMs` and `outputTokens` on the same
+ record, output throughput is derivable: `outputTokens / ((latencyMs − ttftMs) /
+ 1000)` tokens/sec.
+
  ## [0.3.0] — 2026-06-02
 
  Integration-feedback pass from wiring ai-lcr into a real agentic product
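
Review note on the 0.4.0 entry: the throughput formula quoted there can be sketched in a few lines of TypeScript. This is a minimal illustration, not code from `ai-lcr`. `StreamTiming` and `outputTokensPerSec` are hypothetical names, and the guards for a missing `ttftMs` or an empty generation window are an assumption about what a dashboard would want.

```typescript
// Hypothetical subset of CallRecord: only the three fields the formula needs.
interface StreamTiming {
  latencyMs: number;
  ttftMs?: number; // undefined for non-streaming calls or calls that failed before content
  outputTokens: number;
}

// Output tokens per second over the window between the first token and the end
// of the call. Returns undefined when that window cannot be measured.
function outputTokensPerSec(r: StreamTiming): number | undefined {
  if (r.ttftMs === undefined) return undefined;
  const windowMs = r.latencyMs - r.ttftMs;
  if (windowMs <= 0) return undefined; // avoid dividing by zero or a negative window
  return r.outputTokens / (windowMs / 1000);
}

// 100 tokens over a (588 - 88) ms = 0.5 s window.
const rate = outputTokensPerSec({ latencyMs: 588, ttftMs: 88, outputTokens: 100 });
console.log(rate); // 200
```

One caveat that follows from the definitions in this diff: `latencyMs` covers all attempts while `ttftMs` is measured from the winner's attempt start, so on a call that failed over the window also contains the time spent on earlier attempts, and the result is likely a lower bound on the serving model's real throughput.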
package/README.md CHANGED
@@ -184,6 +184,7 @@ interface CallRecord {
  ok: boolean;
  failedOver: boolean; // more than one provider was tried
  latencyMs: number;
+ ttftMs?: number; // streaming only: time to first token (winner's first content delta) — industry-standard responsiveness metric
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
@@ -196,6 +197,8 @@ interface CallRecord {
 
  **Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
 
+ **Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
+
  **Cache-aware cost.** Add `cacheRead` (USD per 1M cached input tokens) to a provider's `cost` and ai-lcr bills prompt-cache hits at that rate when the call reports `usage.inputTokens.cacheRead`. Omit it and cached tokens fall back to the full `input` rate (unchanged from before). For cache-heavy traffic (e.g. Anthropic, where a cache read is ~0.1×) this keeps `costUsd` honest — and `cachedInputTokens` lets a dashboard audit it:
 
  ```ts
@@ -312,11 +315,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
  bash scripts/check-provider.sh
 
- # TokenMart uses vendor-prefixed model IDs
- API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
- MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
- MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
- CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+ # TokenMart (Inference AI) uses bare, un-prefixed model IDs
+ API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+ MODEL_1=gemini-3-flash-preview REF_1=google/gemini-3-flash-preview \
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+ CACHE_MODEL=claude-sonnet-4-6 \
  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
  bash scripts/check-provider.sh
  ```
package/README.zh-CN.md CHANGED
@@ -232,11 +232,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
  bash scripts/check-provider.sh
 
- # TokenMart 使用 vendor 前缀的模型 ID
- API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
- MODEL_1=google/gemini-3-flash REF_1=google/gemini-3-flash-preview \
- MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
- CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+ # TokenMart(Inference AI)使用不带 vendor 前缀的裸模型 ID
+ API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+ MODEL_1=gemini-3-flash-preview REF_1=google/gemini-3-flash-preview \
+ MODEL_2=claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
+ CACHE_MODEL=claude-sonnet-4-6 \
  REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
  bash scripts/check-provider.sh
  ```
package/dist/index.cjs CHANGED
@@ -22,6 +22,7 @@ var index_exports = {};
  __export(index_exports, {
  DEFAULT_REFERENCE: () => DEFAULT_REFERENCE,
  MEDIA_PRICING: () => MEDIA_PRICING,
+ OFFICIAL_PRICES: () => OFFICIAL_PRICES,
  cheapestRoute: () => cheapestRoute,
  classifyError: () => classifyError,
  classifyErrorKind: () => classifyErrorKind,
@@ -312,7 +313,7 @@ var LcrFallbackModel = class {
  return max;
  }
  /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
- finalizeOk(ctx, provider, attemptStart, usage) {
+ finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
  ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
  const inputTokens = usage?.inputTokens?.total ?? 0;
  const outputTokens = usage?.outputTokens?.total ?? 0;
@@ -334,6 +335,7 @@ var LcrFallbackModel = class {
  ok: true,
  failedOver: ctx.attempts.length > 1,
  latencyMs: Date.now() - ctx.startedAt,
+ ...ttftMs !== void 0 ? { ttftMs } : {},
  inputTokens,
  outputTokens,
  ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
@@ -433,6 +435,7 @@ var LcrFallbackModel = class {
  const triedBeforeServing = tried;
  let usage;
  let streamedAny = false;
+ let ttftMs;
  const stream = new ReadableStream({
  async start(controller) {
  let reader = null;
@@ -446,11 +449,14 @@ var LcrFallbackModel = class {
  }
  if (done) break;
  if (value.type === "finish") usage = value.usage;
+ if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+ ttftMs = Date.now() - servingAttemptStart;
+ }
  controller.enqueue(value);
  if (value.type !== "stream-start") streamedAny = true;
  }
  self.settleSticky(servingIdx);
- self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+ self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
  controller.close();
  } catch (error) {
  self.emitError(error, servingProvider.label);
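
Review note on the hunk above: the TTFT rule it adds (take the first `text-delta` or `reasoning-delta`, ignore everything else) can be checked in isolation. The sketch below is a simplification under stated assumptions: it replays an already-recorded list of parts instead of reading a live `ReadableStream`, and `TimedPart`, its `atMs` arrival timestamp, and `firstContentDeltaMs` are hypothetical names that do not exist in `ai-lcr`.

```typescript
// Hypothetical recorded stream part: `type` mirrors the discriminant checked in
// the hunk above; `atMs` is an assumed arrival timestamp added for replay.
interface TimedPart {
  type: string;
  atMs: number;
}

// Milliseconds from the serving attempt's start to the first content delta, or
// undefined when the stream produced no content at all.
function firstContentDeltaMs(parts: TimedPart[], attemptStartMs: number): number | undefined {
  for (const part of parts) {
    if (part.type === "text-delta" || part.type === "reasoning-delta") {
      return part.atMs - attemptStartMs;
    }
  }
  return undefined;
}

const recorded: TimedPart[] = [
  { type: "stream-start", atMs: 1010 },    // not content, so it does not count
  { type: "reasoning-delta", atMs: 1088 }, // first content token
  { type: "text-delta", atMs: 1400 },
  { type: "finish", atMs: 1412 },
];

console.log(firstContentDeltaMs(recorded, 1000)); // 88
console.log(firstContentDeltaMs([{ type: "finish", atMs: 1412 }], 1000)); // undefined
```

The second call mirrors the documented behaviour that `ttftMs` stays `undefined` when a call never produces content.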
@@ -512,7 +518,8 @@ function formatCallRecord(record, opts = {}) {
  const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
  const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
  const status = formatCost(record);
- let line = `${glyph} ${record.model} ${chain} ${record.latencyMs}ms ${status}`;
+ const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+ let line = `${glyph} ${record.model} ${chain} ${timing} ${status}`;
  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
  line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
  }
@@ -563,6 +570,68 @@ function createHttpSink(options) {
  };
  }
 
+ // src/media-official.ts
+ var OFFICIAL_PRICES = {
+ "alibaba/qwen-image": { unit: "image", cents: 3.5 },
+ "alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
+ "alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
+ "alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
+ "alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
+ "alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
+ "alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
+ "alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
+ "alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
+ "bfl/flux-1.1-pro": { unit: "image", cents: 4 },
+ "bfl/flux-2-flex": { unit: "image", cents: 6 },
+ "bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
+ "bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
+ "bfl/flux-2-pro": { unit: "image", cents: 3 },
+ "bfl/flux-kontext-max": { unit: "image", cents: 8 },
+ "bfl/flux-kontext-pro": { unit: "image", cents: 4 },
+ "bria/rmbg-2": { unit: "image", cents: 1.8 },
+ "bytedance/seedance-2-0": { unit: "call", cents: 187 },
+ "bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
+ "bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
+ "bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
+ "bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
+ "bytedance/seedream-4-5": { unit: "image", cents: 3 },
+ "bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
+ "google/imagen-4-ultra": { unit: "image", cents: 6 },
+ "google/nano-banana": { unit: "image", cents: 3.9 },
+ "google/nano-banana-2": { unit: "image", cents: 6.7 },
+ "google/nano-banana-pro": { unit: "image", cents: 13.4 },
+ "google/veo-3-1": { unit: "call", cents: 200 },
+ "google/veo-3-1-lite": { unit: "call", cents: 40 },
+ "google/veo-3-quality": { unit: "call", cents: 200 },
+ "ideogram/v3-balanced": { unit: "image", cents: 6 },
+ "kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
+ "kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
+ "kuaishou/kling-motion-control": { unit: "call", cents: 56 },
+ "kuaishou/kling-v21-master": { unit: "call", cents: 56 },
+ "kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
+ "kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
+ "kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
+ "kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
+ "lightricks/ltx-2": { unit: "call", cents: 30 },
+ "lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
+ "minimax/hailuo-02-pro": { unit: "call", cents: 53 },
+ "minimax/hailuo-02-standard": { unit: "call", cents: 27 },
+ "minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
+ "openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
+ "openai/gpt-image-2": { unit: "image", cents: 5.3 },
+ "openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
+ "openai/sora-2": { unit: "call", cents: 50 },
+ "openai/sora-2-i2v": { unit: "call", cents: 50 },
+ "pixverse/v5-5-i2v": { unit: "call", cents: 60 },
+ "pixverse/v6": { unit: "call", cents: 45 },
+ "recraft/v3": { unit: "image", cents: 4 },
+ "recraft/v4-1": { unit: "image", cents: 25 },
+ "runway/gen-4-image": { unit: "image", cents: 8 },
+ "stability/fast-sdxl": { unit: "image", cents: 0.9 },
+ "stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
+ "xai/grok-image-quality": { unit: "image", cents: 7 }
+ };
+
  // src/media.ts
  var DEFAULT_REFERENCE = {
  image: { width: 1920, height: 1080 },
@@ -614,7 +683,15 @@ function newMediaCallId() {
  return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
  }
  function createMediaLCR(config) {
- const { registry, adapters, reference = DEFAULT_REFERENCE, onError, onCost, onCall } = config;
+ const {
+ registry,
+ adapters,
+ reference = DEFAULT_REFERENCE,
+ officialPrices = OFFICIAL_PRICES,
+ onError,
+ onCost,
+ onCall
+ } = config;
  const safeError = (error, provider) => {
  try {
  onError?.(error, provider);
@@ -639,7 +716,8 @@ function createMediaLCR(config) {
  throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
  }
  const ranked = rankRoutes(def, reference);
- const baselineUsd = ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
+ const official = def.official ?? officialPrices[modelId];
+ const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
  const startedAt = Date.now();
  const attempts = [];
  let lastErr;
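
Review note on the baseline change above: the new precedence is inline `official`, then the `officialPrices` map, then the priciest route, then `0`. The sketch below restates that chain as a standalone function so the cases can be exercised one by one. It is an illustration under assumptions, not the library's code: `Pricing`, `ModelDef`, `routeRefCents`, `resolveBaselineUsd` and `toRefCents` are hypothetical names, and the identity `toRefCents` used here stands in for `normalizedCents(official, reference)`, whose real behaviour is not shown in this diff. The 2-cent route price is invented for the example; the 3-cent official price for `bfl/flux-2-pro` comes from the table above.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface Pricing {
  unit: string;
  cents: number;
}
interface ModelDef {
  official?: Pricing;
  routeRefCents: number[]; // assumed: each route's price already normalized to cents
}

// Baseline in USD: inline official price, else the map entry, else the
// priciest route, else 0.
function resolveBaselineUsd(
  modelId: string,
  def: ModelDef,
  officialPrices: Record<string, Pricing>,
  toRefCents: (p: Pricing) => number,
): number {
  const official: Pricing | undefined = def.official ?? officialPrices[modelId];
  if (official !== undefined) return toRefCents(official) / 100;
  if (def.routeRefCents.length > 0) return Math.max(...def.routeRefCents) / 100;
  return 0;
}

// Assumption: the price needs no rescaling to the reference spec.
const identity = (p: Pricing): number => p.cents;
const prices: Record<string, Pricing> = { "bfl/flux-2-pro": { unit: "image", cents: 3 } };

// Single aggregator route at a made-up 2 cents: the map supplies the 3-cent baseline.
console.log(resolveBaselineUsd("bfl/flux-2-pro", { routeRefCents: [2] }, prices, identity)); // 0.03
// An inline official price wins over the map.
console.log(
  resolveBaselineUsd("bfl/flux-2-pro", { official: { unit: "image", cents: 8 }, routeRefCents: [2] }, prices, identity),
); // 0.08
// No official price known: fall back to the priciest route.
console.log(resolveBaselineUsd("example/open-weight", { routeRefCents: [2, 5] }, prices, identity)); // 0.05
```

With the old rule the first case would have produced a baseline of 0.02, equal to the actual cost, which is the `$0` savings problem the changelog describes.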
@@ -1124,6 +1202,7 @@ function createLCR(config) {
  0 && (module.exports = {
  DEFAULT_REFERENCE,
  MEDIA_PRICING,
+ OFFICIAL_PRICES,
  cheapestRoute,
  classifyError,
  classifyErrorKind,
package/dist/index.d.cts CHANGED
@@ -85,6 +85,18 @@ interface CallRecord {
  failedOver: boolean;
  /** Total wall time across all attempts, ms. */
  latencyMs: number;
+ /**
+ * Time to first token (TTFT), ms — the industry-standard responsiveness
+ * metric. Measured from the *winning* provider's stream attempt start to its
+ * first content token (`text-delta` / `reasoning-delta`), so it captures how
+ * fast the model that actually served started replying, not failover overhead
+ * (that's already in `latencyMs`). Streaming only: **undefined** for
+ * `doGenerate` (the whole response lands at once, so there's no "first token")
+ * and for calls that failed before producing any content. With `latencyMs` and
+ * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+ * ttftMs) / 1000)` tokens/sec.
+ */
+ ttftMs?: number;
  inputTokens: number;
  outputTokens: number;
  /**
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  * ✓ text tokenmart 412ms $0.0003
+ * ✓ text tokenmart 412ms (ttft 88ms) $0.0003 ← streaming: TTFT shown when known
  * ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
  * ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -251,6 +264,15 @@ interface MediaModelDef {
  modality: MediaModality;
  /** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
  routes: MediaRoute[];
+ /**
+ * The model-maker's first-party list price — what a user pays going DIRECT,
+ * bypassing the cheaper providers we route to. When set, it's the savings
+ * baseline (savings = official − actual cost). Omit for open-weight models
+ * with no first-party API price; those fall back to the priciest configured
+ * route, or no baseline if there's only one. Can also be supplied out-of-band
+ * via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
+ */
+ official?: MediaPricing;
  }
  type MediaRegistry = Record<string, MediaModelDef>;
  /**
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
  /** Adapters keyed by provider. A route with no adapter is skipped. */
  adapters: Record<string, MediaAdapter>;
  reference?: ReferenceSpec;
+ /**
+ * Model-maker first-party list prices keyed by modelId — the savings baseline
+ * for a model whose registry def carries no inline `official` price. Lets a
+ * downstream registry (e.g. ai-art's) get correct baselines without inlining
+ * prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
+ * cross-provider price table). A def's inline `official` wins over this.
+ */
+ officialPrices?: Record<string, MediaPricing>;
  onError?: (error: Error, provider: string) => void;
  onCost?: (event: MediaCostEvent) => void;
  /**
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
 
  declare const MEDIA_PRICING: MediaRegistry;
 
+ declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
+
  /**
  * Kunavo media adapter — image (sync) + video (async poll).
  *
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+ export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
package/dist/index.d.ts CHANGED
@@ -85,6 +85,18 @@ interface CallRecord {
  failedOver: boolean;
  /** Total wall time across all attempts, ms. */
  latencyMs: number;
+ /**
+ * Time to first token (TTFT), ms — the industry-standard responsiveness
+ * metric. Measured from the *winning* provider's stream attempt start to its
+ * first content token (`text-delta` / `reasoning-delta`), so it captures how
+ * fast the model that actually served started replying, not failover overhead
+ * (that's already in `latencyMs`). Streaming only: **undefined** for
+ * `doGenerate` (the whole response lands at once, so there's no "first token")
+ * and for calls that failed before producing any content. With `latencyMs` and
+ * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+ * ttftMs) / 1000)` tokens/sec.
+ */
+ ttftMs?: number;
  inputTokens: number;
  outputTokens: number;
  /**
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  * ✓ text tokenmart 412ms $0.0003
+ * ✓ text tokenmart 412ms (ttft 88ms) $0.0003 ← streaming: TTFT shown when known
  * ⚠ text tokenmart→openrouter 910ms $0.0004 ⤷ tokenmart 502
  * ✗ text deepseek→tokenmart→openrouter 1240ms FAILED ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -251,6 +264,15 @@ interface MediaModelDef {
  modality: MediaModality;
  /** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
  routes: MediaRoute[];
+ /**
+ * The model-maker's first-party list price — what a user pays going DIRECT,
+ * bypassing the cheaper providers we route to. When set, it's the savings
+ * baseline (savings = official − actual cost). Omit for open-weight models
+ * with no first-party API price; those fall back to the priciest configured
+ * route, or no baseline if there's only one. Can also be supplied out-of-band
+ * via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
+ */
+ official?: MediaPricing;
  }
  type MediaRegistry = Record<string, MediaModelDef>;
  /**
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
  /** Adapters keyed by provider. A route with no adapter is skipped. */
  adapters: Record<string, MediaAdapter>;
  reference?: ReferenceSpec;
+ /**
+ * Model-maker first-party list prices keyed by modelId — the savings baseline
+ * for a model whose registry def carries no inline `official` price. Lets a
+ * downstream registry (e.g. ai-art's) get correct baselines without inlining
+ * prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
+ * cross-provider price table). A def's inline `official` wins over this.
+ */
+ officialPrices?: Record<string, MediaPricing>;
  onError?: (error: Error, provider: string) => void;
  onCost?: (event: MediaCostEvent) => void;
  /**
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
 
  declare const MEDIA_PRICING: MediaRegistry;
 
+ declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
+
  /**
  * Kunavo media adapter — image (sync) + video (async poll).
  *
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
  declare function createLCR(config: LCRConfig): LCRRouter;
 
- export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+ export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
package/dist/index.js CHANGED
@@ -271,7 +271,7 @@ var LcrFallbackModel = class {
  return max;
  }
  /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
- finalizeOk(ctx, provider, attemptStart, usage) {
+ finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
  ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
  const inputTokens = usage?.inputTokens?.total ?? 0;
  const outputTokens = usage?.outputTokens?.total ?? 0;
@@ -293,6 +293,7 @@ var LcrFallbackModel = class {
  ok: true,
  failedOver: ctx.attempts.length > 1,
  latencyMs: Date.now() - ctx.startedAt,
+ ...ttftMs !== void 0 ? { ttftMs } : {},
  inputTokens,
  outputTokens,
  ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
@@ -392,6 +393,7 @@ var LcrFallbackModel = class {
  const triedBeforeServing = tried;
  let usage;
  let streamedAny = false;
+ let ttftMs;
  const stream = new ReadableStream({
  async start(controller) {
  let reader = null;
@@ -405,11 +407,14 @@ var LcrFallbackModel = class {
  }
  if (done) break;
  if (value.type === "finish") usage = value.usage;
+ if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+ ttftMs = Date.now() - servingAttemptStart;
+ }
  controller.enqueue(value);
  if (value.type !== "stream-start") streamedAny = true;
  }
  self.settleSticky(servingIdx);
- self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+ self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
  controller.close();
  } catch (error) {
  self.emitError(error, servingProvider.label);
@@ -471,7 +476,8 @@ function formatCallRecord(record, opts = {}) {
  const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
  const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
  const status = formatCost(record);
- let line = `${glyph} ${record.model} ${chain} ${record.latencyMs}ms ${status}`;
+ const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+ let line = `${glyph} ${record.model} ${chain} ${timing} ${status}`;
  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
  line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
  }
@@ -522,6 +528,68 @@ function createHttpSink(options) {
  };
  }
 
+ // src/media-official.ts
+ var OFFICIAL_PRICES = {
+ "alibaba/qwen-image": { unit: "image", cents: 3.5 },
+ "alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
+ "alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
+ "alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
+ "alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
+ "alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
+ "alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
+ "alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
+ "alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
+ "bfl/flux-1.1-pro": { unit: "image", cents: 4 },
+ "bfl/flux-2-flex": { unit: "image", cents: 6 },
+ "bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
+ "bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
+ "bfl/flux-2-pro": { unit: "image", cents: 3 },
+ "bfl/flux-kontext-max": { unit: "image", cents: 8 },
+ "bfl/flux-kontext-pro": { unit: "image", cents: 4 },
+ "bria/rmbg-2": { unit: "image", cents: 1.8 },
+ "bytedance/seedance-2-0": { unit: "call", cents: 187 },
+ "bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
+ "bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
+ "bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
+ "bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
+ "bytedance/seedream-4-5": { unit: "image", cents: 3 },
+ "bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
+ "google/imagen-4-ultra": { unit: "image", cents: 6 },
+ "google/nano-banana": { unit: "image", cents: 3.9 },
+ "google/nano-banana-2": { unit: "image", cents: 6.7 },
+ "google/nano-banana-pro": { unit: "image", cents: 13.4 },
+ "google/veo-3-1": { unit: "call", cents: 200 },
+ "google/veo-3-1-lite": { unit: "call", cents: 40 },
+ "google/veo-3-quality": { unit: "call", cents: 200 },
+ "ideogram/v3-balanced": { unit: "image", cents: 6 },
+ "kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
+ "kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
+ "kuaishou/kling-motion-control": { unit: "call", cents: 56 },
+ "kuaishou/kling-v21-master": { unit: "call", cents: 56 },
+ "kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
+ "kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
+ "kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
+ "kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
+ "lightricks/ltx-2": { unit: "call", cents: 30 },
+ "lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
+ "minimax/hailuo-02-pro": { unit: "call", cents: 53 },
+ "minimax/hailuo-02-standard": { unit: "call", cents: 27 },
+ "minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
+ "openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
+ "openai/gpt-image-2": { unit: "image", cents: 5.3 },
+ "openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
+ "openai/sora-2": { unit: "call", cents: 50 },
+ "openai/sora-2-i2v": { unit: "call", cents: 50 },
+ "pixverse/v5-5-i2v": { unit: "call", cents: 60 },
+ "pixverse/v6": { unit: "call", cents: 45 },
+ "recraft/v3": { unit: "image", cents: 4 },
+ "recraft/v4-1": { unit: "image", cents: 25 },
+ "runway/gen-4-image": { unit: "image", cents: 8 },
+ "stability/fast-sdxl": { unit: "image", cents: 0.9 },
+ "stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
+ "xai/grok-image-quality": { unit: "image", cents: 7 }
+ };
+
  // src/media.ts
  var DEFAULT_REFERENCE = {
  image: { width: 1920, height: 1080 },
@@ -573,7 +641,15 @@ function newMediaCallId() {
  return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
  }
  function createMediaLCR(config) {
- const { registry, adapters, reference = DEFAULT_REFERENCE, onError, onCost, onCall } = config;
+ const {
+ registry,
+ adapters,
+ reference = DEFAULT_REFERENCE,
+ officialPrices = OFFICIAL_PRICES,
+ onError,
+ onCost,
+ onCall
+ } = config;
  const safeError = (error, provider) => {
  try {
  onError?.(error, provider);
@@ -598,7 +674,8 @@ function createMediaLCR(config) {
  throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
  }
  const ranked = rankRoutes(def, reference);
- const baselineUsd = ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
+ const official = def.official ?? officialPrices[modelId];
+ const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
  const startedAt = Date.now();
  const attempts = [];
  let lastErr;
@@ -1082,6 +1159,7 @@ function createLCR(config) {
  export {
  DEFAULT_REFERENCE,
  MEDIA_PRICING,
+ OFFICIAL_PRICES,
  cheapestRoute,
  classifyError,
  classifyErrorKind,
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ai-lcr",
- "version": "0.3.0",
+ "version": "0.5.0",
  "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
  "keywords": [
  "ai",