npm - ai-lcr - Versions diffs - 0.3.0 → 0.5.0 - Mend

ai-lcr 0.3.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,45 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.5.0] — 2026-06-02
+All additions are optional and backward compatible.
+### Added
+- **Official-price savings baseline for media.** A media model's savings baseline
+  is now the model-maker's first-party list price — what a user pays going
+  *direct*, bypassing the cheaper providers we route to — instead of the priciest
+  provider we happen to route between. For the common case of a model served by a
+  single aggregator (Runware, fal, …), the old baseline equalled the actual cost,
+  so savings showed as `$0`; the official price surfaces the real saving.
+  - `MediaModelDef.official?: MediaPricing` — an inline first-party price on a
+    model def. When set, it wins.
+  - `MediaLCRConfig.officialPrices?: Record<string, MediaPricing>` — a modelId →
+    price map so a downstream registry gets correct baselines without inlining
+    prices. Defaults to the bundled **`OFFICIAL_PRICES`** (now exported), lifted
+    from the cross-provider price table by `scripts/gen-media-official.mjs`.
+  - When no official price is known (e.g. open-weight models served only by
+    aggregators), the baseline falls back to the priciest configured route — or
+    none if there's a single route — exactly as before.
+## [0.4.0] — 2026-06-02
+All additions are optional and backward compatible.
+### Added
+- **`CallRecord.ttftMs` — time to first token.** Streaming calls now report TTFT,
+  the industry-standard responsiveness metric: ms from the winning provider's
+  stream attempt start to its first content token (`text-delta` /
+  `reasoning-delta`). Measured against the *winner's* attempt, so failover
+  overhead (already in `latencyMs`) doesn't distort it. `undefined` for
+  `doGenerate` (no streaming → no "first token") and for calls that failed before
+  producing content. `formatCallRecord` shows it inline next to total latency when
+  present (`412ms (ttft 88ms)`). With `latencyMs` and `outputTokens` on the same
+  record, output throughput is derivable: `outputTokens / ((latencyMs − ttftMs) /
+  1000)` tokens/sec.
 ## [0.3.0] — 2026-06-02
 Integration-feedback pass from wiring ai-lcr into a real agentic product
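
The throughput formula in the 0.4.0 entry above can be sketched as a small helper. This is a minimal illustration, not part of ai-lcr: `TimingFields` is a locally declared subset of `CallRecord`, and `outputTokensPerSec` is a hypothetical name. Because `latencyMs` covers every attempt while `ttftMs` starts at the winner's attempt, the result is likely to understate throughput on calls that failed over.

```typescript
// Local structural subset of CallRecord (assumption: only these three fields are needed).
interface TimingFields {
  latencyMs: number;    // total wall time across all attempts, ms
  ttftMs?: number;      // undefined for non-streaming calls
  outputTokens: number;
}

// Hypothetical helper: outputTokens / ((latencyMs - ttftMs) / 1000), per the changelog.
function outputTokensPerSec(r: TimingFields): number | undefined {
  if (r.ttftMs === undefined) return undefined; // no TTFT, so no generation window
  const generationMs = r.latencyMs - r.ttftMs;
  if (generationMs <= 0) return undefined;      // guard against division by zero
  return r.outputTokens / (generationMs / 1000);
}

// 200 tokens over a 1000 ms generation window
const streamed = outputTokensPerSec({ latencyMs: 1500, ttftMs: 500, outputTokens: 200 });
// a doGenerate-style record carries no ttftMs
const nonStreamed = outputTokensPerSec({ latencyMs: 1500, outputTokens: 200 });
```

Returning `undefined` rather than `0` keeps non-streaming calls out of any average a dashboard computes.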

package/README.md CHANGED

@@ -184,6 +184,7 @@ interface CallRecord {
   ok: boolean;
   failedOver: boolean;       // more than one provider was tried
   latencyMs: number;
+  ttftMs?: number;           // streaming only: time to first token (winner's first content delta) — industry-standard responsiveness metric
   inputTokens: number;
   outputTokens: number;
   cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
@@ -196,6 +197,8 @@ interface CallRecord {
 **Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
+**Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
 **Cache-aware cost.** Add `cacheRead` (USD per 1M cached input tokens) to a provider's `cost` and ai-lcr bills prompt-cache hits at that rate when the call reports `usage.inputTokens.cacheRead`. Omit it and cached tokens fall back to the full `input` rate (unchanged from before). For cache-heavy traffic (e.g. Anthropic, where a cache read is ~0.1×) this keeps `costUsd` honest — and `cachedInputTokens` lets a dashboard audit it:
 ```ts
@@ -312,11 +315,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
-# TokenMart uses vendor-prefixed model IDs
-API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
-  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
-  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
-  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+# TokenMart (Inference AI) uses bare, un-prefixed model IDs
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+  MODEL_1=gemini-3-flash-preview      REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6           REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
 ```

package/README.zh-CN.md CHANGED

@@ -232,11 +232,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
-# TokenMart 使用 vendor 前缀的模型 ID
-API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
-  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
-  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
-  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+# TokenMart（Inference AI）使用不带 vendor 前缀的裸模型 ID
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+  MODEL_1=gemini-3-flash-preview      REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6           REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
 ```

package/dist/index.cjs CHANGED

@@ -22,6 +22,7 @@ var index_exports = {};
 __export(index_exports, {
   DEFAULT_REFERENCE: () => DEFAULT_REFERENCE,
   MEDIA_PRICING: () => MEDIA_PRICING,
+  OFFICIAL_PRICES: () => OFFICIAL_PRICES,
   cheapestRoute: () => cheapestRoute,
   classifyError: () => classifyError,
   classifyErrorKind: () => classifyErrorKind,
@@ -312,7 +313,7 @@ var LcrFallbackModel = class {
     return max;
   }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
@@ -334,6 +335,7 @@ var LcrFallbackModel = class {
       ok: true,
       failedOver: ctx.attempts.length > 1,
       latencyMs: Date.now() - ctx.startedAt,
+      ...ttftMs !== void 0 ? { ttftMs } : {},
       inputTokens,
       outputTokens,
       ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
@@ -433,6 +435,7 @@ var LcrFallbackModel = class {
     const triedBeforeServing = tried;
     let usage;
     let streamedAny = false;
+    let ttftMs;
     const stream = new ReadableStream({
       async start(controller) {
         let reader = null;
@@ -446,11 +449,14 @@ var LcrFallbackModel = class {
             }
             if (done) break;
             if (value.type === "finish") usage = value.usage;
+            if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+              ttftMs = Date.now() - servingAttemptStart;
+            }
             controller.enqueue(value);
             if (value.type !== "stream-start") streamedAny = true;
           }
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -512,7 +518,8 @@ function formatCallRecord(record, opts = {}) {
   const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
-  let line = `${glyph} ${record.model}  ${chain}  ${record.latencyMs}ms  ${status}`;
+  const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+  let line = `${glyph} ${record.model}  ${chain}  ${timing}  ${status}`;
   if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
     line += `  (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
   }
@@ -563,6 +570,68 @@ function createHttpSink(options) {
   };
 }
+// src/media-official.ts
+var OFFICIAL_PRICES = {
+  "alibaba/qwen-image": { unit: "image", cents: 3.5 },
+  "alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
+  "alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
+  "alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
+  "alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
+  "alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
+  "alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
+  "alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
+  "alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
+  "bfl/flux-1.1-pro": { unit: "image", cents: 4 },
+  "bfl/flux-2-flex": { unit: "image", cents: 6 },
+  "bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
+  "bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
+  "bfl/flux-2-pro": { unit: "image", cents: 3 },
+  "bfl/flux-kontext-max": { unit: "image", cents: 8 },
+  "bfl/flux-kontext-pro": { unit: "image", cents: 4 },
+  "bria/rmbg-2": { unit: "image", cents: 1.8 },
+  "bytedance/seedance-2-0": { unit: "call", cents: 187 },
+  "bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
+  "bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
+  "bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
+  "bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
+  "bytedance/seedream-4-5": { unit: "image", cents: 3 },
+  "bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
+  "google/imagen-4-ultra": { unit: "image", cents: 6 },
+  "google/nano-banana": { unit: "image", cents: 3.9 },
+  "google/nano-banana-2": { unit: "image", cents: 6.7 },
+  "google/nano-banana-pro": { unit: "image", cents: 13.4 },
+  "google/veo-3-1": { unit: "call", cents: 200 },
+  "google/veo-3-1-lite": { unit: "call", cents: 40 },
+  "google/veo-3-quality": { unit: "call", cents: 200 },
+  "ideogram/v3-balanced": { unit: "image", cents: 6 },
+  "kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
+  "kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
+  "kuaishou/kling-motion-control": { unit: "call", cents: 56 },
+  "kuaishou/kling-v21-master": { unit: "call", cents: 56 },
+  "kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
+  "kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
+  "kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
+  "kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
+  "lightricks/ltx-2": { unit: "call", cents: 30 },
+  "lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
+  "minimax/hailuo-02-pro": { unit: "call", cents: 53 },
+  "minimax/hailuo-02-standard": { unit: "call", cents: 27 },
+  "minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
+  "openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
+  "openai/gpt-image-2": { unit: "image", cents: 5.3 },
+  "openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
+  "openai/sora-2": { unit: "call", cents: 50 },
+  "openai/sora-2-i2v": { unit: "call", cents: 50 },
+  "pixverse/v5-5-i2v": { unit: "call", cents: 60 },
+  "pixverse/v6": { unit: "call", cents: 45 },
+  "recraft/v3": { unit: "image", cents: 4 },
+  "recraft/v4-1": { unit: "image", cents: 25 },
+  "runway/gen-4-image": { unit: "image", cents: 8 },
+  "stability/fast-sdxl": { unit: "image", cents: 0.9 },
+  "stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
+  "xai/grok-image-quality": { unit: "image", cents: 7 }
+};
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
@@ -614,7 +683,15 @@ function newMediaCallId() {
   return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
 }
 function createMediaLCR(config) {
-  const { registry, adapters, reference = DEFAULT_REFERENCE, onError, onCost, onCall } = config;
+  const {
+    registry,
+    adapters,
+    reference = DEFAULT_REFERENCE,
+    officialPrices = OFFICIAL_PRICES,
+    onError,
+    onCost,
+    onCall
+  } = config;
   const safeError = (error, provider) => {
     try {
       onError?.(error, provider);
@@ -639,7 +716,8 @@ function createMediaLCR(config) {
       throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
     }
     const ranked = rankRoutes(def, reference);
-    const baselineUsd = ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
+    const official = def.official ?? officialPrices[modelId];
+    const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
     const startedAt = Date.now();
     const attempts = [];
     let lastErr;
@@ -1124,6 +1202,7 @@ function createLCR(config) {
 0 && (module.exports = {
   DEFAULT_REFERENCE,
   MEDIA_PRICING,
+  OFFICIAL_PRICES,
   cheapestRoute,
   classifyError,
   classifyErrorKind,
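
The TTFT capture in the stream loop above takes the first `text-delta` or `reasoning-delta` part and ignores everything else. The sketch below replays that rule over pre-timestamped parts so it can run without a live stream. `TimedPart` and `ttftFrom` are hypothetical names, and the `atMs` timestamps stand in for the `Date.now()` calls the real loop makes.

```typescript
// A stream part with the wall-clock time it arrived (assumption: recorded by the caller).
interface TimedPart {
  type: string;
  atMs: number;
}

// Offset from the winning attempt's start to its first content delta, or undefined
// when the stream produced no content at all.
function ttftFrom(parts: TimedPart[], attemptStartMs: number): number | undefined {
  const first = parts.find(
    (p) => p.type === "text-delta" || p.type === "reasoning-delta",
  );
  return first === undefined ? undefined : first.atMs - attemptStartMs;
}

const withContent = ttftFrom(
  [
    { type: "stream-start", atMs: 1000 },
    { type: "reasoning-delta", atMs: 1088 }, // first content part
    { type: "text-delta", atMs: 1200 },
    { type: "finish", atMs: 1412 },
  ],
  1000,
);

const withoutContent = ttftFrom(
  [
    { type: "stream-start", atMs: 1000 },
    { type: "finish", atMs: 1010 },
  ],
  1000,
);
```

Here `withContent` is 88 and `withoutContent` is `undefined`, matching the documented behavior for calls that produce no content.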

package/dist/index.d.cts CHANGED

@@ -85,6 +85,18 @@ interface CallRecord {
     failedOver: boolean;
     /** Total wall time across all attempts, ms. */
     latencyMs: number;
+    /**
+     * Time to first token (TTFT), ms — the industry-standard responsiveness
+     * metric. Measured from the *winning* provider's stream attempt start to its
+     * first content token (`text-delta` / `reasoning-delta`), so it captures how
+     * fast the model that actually served started replying, not failover overhead
+     * (that's already in `latencyMs`). Streaming only: **undefined** for
+     * `doGenerate` (the whole response lands at once, so there's no "first token")
+     * and for calls that failed before producing any content. With `latencyMs` and
+     * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+     * ttftMs) / 1000)` tokens/sec.
+     */
+    ttftMs?: number;
     inputTokens: number;
     outputTokens: number;
     /**
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  *   ✓ text  tokenmart                      412ms  $0.0003
+ *   ✓ text  tokenmart                      412ms (ttft 88ms)  $0.0003   ← streaming: TTFT shown when known
  *   ⚠ text  tokenmart→openrouter           910ms  $0.0004   ⤷ tokenmart 502
  *   ✗ text  deepseek→tokenmart→openrouter  1240ms FAILED    ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -251,6 +264,15 @@ interface MediaModelDef {
     modality: MediaModality;
     /** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
     routes: MediaRoute[];
+    /**
+     * The model-maker's first-party list price — what a user pays going DIRECT,
+     * bypassing the cheaper providers we route to. When set, it's the savings
+     * baseline (savings = official − actual cost). Omit for open-weight models
+     * with no first-party API price; those fall back to the priciest configured
+     * route, or no baseline if there's only one. Can also be supplied out-of-band
+     * via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
+     */
+    official?: MediaPricing;
 }
 type MediaRegistry = Record<string, MediaModelDef>;
 /**
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
     /** Adapters keyed by provider. A route with no adapter is skipped. */
     adapters: Record<string, MediaAdapter>;
     reference?: ReferenceSpec;
+    /**
+     * Model-maker first-party list prices keyed by modelId — the savings baseline
+     * for a model whose registry def carries no inline `official` price. Lets a
+     * downstream registry (e.g. ai-art's) get correct baselines without inlining
+     * prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
+     * cross-provider price table). A def's inline `official` wins over this.
+     */
+    officialPrices?: Record<string, MediaPricing>;
     onError?: (error: Error, provider: string) => void;
     onCost?: (event: MediaCostEvent) => void;
     /**
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
 declare const MEDIA_PRICING: MediaRegistry;
+declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
 /**
  * Kunavo media adapter — image (sync) + video (async poll).
  *
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };

package/dist/index.d.ts CHANGED

@@ -85,6 +85,18 @@ interface CallRecord {
     failedOver: boolean;
     /** Total wall time across all attempts, ms. */
     latencyMs: number;
+    /**
+     * Time to first token (TTFT), ms — the industry-standard responsiveness
+     * metric. Measured from the *winning* provider's stream attempt start to its
+     * first content token (`text-delta` / `reasoning-delta`), so it captures how
+     * fast the model that actually served started replying, not failover overhead
+     * (that's already in `latencyMs`). Streaming only: **undefined** for
+     * `doGenerate` (the whole response lands at once, so there's no "first token")
+     * and for calls that failed before producing any content. With `latencyMs` and
+     * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+     * ttftMs) / 1000)` tokens/sec.
+     */
+    ttftMs?: number;
     inputTokens: number;
     outputTokens: number;
     /**
@@ -141,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  *   ✓ text  tokenmart                      412ms  $0.0003
+ *   ✓ text  tokenmart                      412ms (ttft 88ms)  $0.0003   ← streaming: TTFT shown when known
  *   ⚠ text  tokenmart→openrouter           910ms  $0.0004   ⤷ tokenmart 502
  *   ✗ text  deepseek→tokenmart→openrouter  1240ms FAILED    ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -251,6 +264,15 @@ interface MediaModelDef {
     modality: MediaModality;
     /** Providers that serve this model. Order is irrelevant — routing sorts by cost. */
     routes: MediaRoute[];
+    /**
+     * The model-maker's first-party list price — what a user pays going DIRECT,
+     * bypassing the cheaper providers we route to. When set, it's the savings
+     * baseline (savings = official − actual cost). Omit for open-weight models
+     * with no first-party API price; those fall back to the priciest configured
+     * route, or no baseline if there's only one. Can also be supplied out-of-band
+     * via {@link MediaLCRConfig.officialPrices} so a registry needn't carry it inline.
+     */
+    official?: MediaPricing;
 }
 type MediaRegistry = Record<string, MediaModelDef>;
 /**
@@ -334,6 +356,14 @@ interface MediaLCRConfig {
     /** Adapters keyed by provider. A route with no adapter is skipped. */
     adapters: Record<string, MediaAdapter>;
     reference?: ReferenceSpec;
+    /**
+     * Model-maker first-party list prices keyed by modelId — the savings baseline
+     * for a model whose registry def carries no inline `official` price. Lets a
+     * downstream registry (e.g. ai-art's) get correct baselines without inlining
+     * prices. Defaults to the bundled {@link OFFICIAL_PRICES} (lifted from the
+     * cross-provider price table). A def's inline `official` wins over this.
+     */
+    officialPrices?: Record<string, MediaPricing>;
     onError?: (error: Error, provider: string) => void;
     onCost?: (event: MediaCostEvent) => void;
     /**
@@ -374,6 +404,8 @@ declare function createMediaLCR(config: MediaLCRConfig): (modelId: string, input
 declare const MEDIA_PRICING: MediaRegistry;
+declare const OFFICIAL_PRICES: Record<string, MediaPricing>;
 /**
  * Kunavo media adapter — image (sync) + video (async poll).
  *
@@ -532,4 +564,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, OFFICIAL_PRICES, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };

package/dist/index.js CHANGED

@@ -271,7 +271,7 @@ var LcrFallbackModel = class {
     return max;
   }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
@@ -293,6 +293,7 @@ var LcrFallbackModel = class {
       ok: true,
       failedOver: ctx.attempts.length > 1,
       latencyMs: Date.now() - ctx.startedAt,
+      ...ttftMs !== void 0 ? { ttftMs } : {},
       inputTokens,
       outputTokens,
       ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
@@ -392,6 +393,7 @@ var LcrFallbackModel = class {
     const triedBeforeServing = tried;
     let usage;
     let streamedAny = false;
+    let ttftMs;
     const stream = new ReadableStream({
       async start(controller) {
         let reader = null;
@@ -405,11 +407,14 @@ var LcrFallbackModel = class {
             }
             if (done) break;
             if (value.type === "finish") usage = value.usage;
+            if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+              ttftMs = Date.now() - servingAttemptStart;
+            }
             controller.enqueue(value);
             if (value.type !== "stream-start") streamedAny = true;
           }
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -471,7 +476,8 @@ function formatCallRecord(record, opts = {}) {
   const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
-  let line = `${glyph} ${record.model}  ${chain}  ${record.latencyMs}ms  ${status}`;
+  const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+  let line = `${glyph} ${record.model}  ${chain}  ${timing}  ${status}`;
   if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
     line += `  (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
   }
@@ -522,6 +528,68 @@ function createHttpSink(options) {
   };
 }
+// src/media-official.ts
+var OFFICIAL_PRICES = {
+  "alibaba/qwen-image": { unit: "image", cents: 3.5 },
+  "alibaba/qwen-image-edit": { unit: "image", cents: 4.5 },
+  "alibaba/wan-2-2-i2v": { unit: "call", cents: 50 },
+  "alibaba/wan-2-2-t2v": { unit: "call", cents: 50 },
+  "alibaba/wan-2-5-i2v": { unit: "call", cents: 75 },
+  "alibaba/wan-2-5-t2v": { unit: "call", cents: 75 },
+  "alibaba/wan-2-7-image-pro": { unit: "image", cents: 7.5 },
+  "alibaba/wan-2-7-t2v": { unit: "call", cents: 75 },
+  "alibaba/z-image-turbo": { unit: "image", cents: 1.5 },
+  "bfl/flux-1.1-pro": { unit: "image", cents: 4 },
+  "bfl/flux-2-flex": { unit: "image", cents: 6 },
+  "bfl/flux-2-klein-4b": { unit: "image", cents: 1.4 },
+  "bfl/flux-2-klein-9b": { unit: "image", cents: 1.5 },
+  "bfl/flux-2-pro": { unit: "image", cents: 3 },
+  "bfl/flux-kontext-max": { unit: "image", cents: 8 },
+  "bfl/flux-kontext-pro": { unit: "image", cents: 4 },
+  "bria/rmbg-2": { unit: "image", cents: 1.8 },
+  "bytedance/seedance-2-0": { unit: "call", cents: 187 },
+  "bytedance/seedance-2-0-fast": { unit: "call", cents: 60 },
+  "bytedance/seedance-2-0-fast-i2v": { unit: "call", cents: 60 },
+  "bytedance/seedance-v1-pro": { unit: "call", cents: 61 },
+  "bytedance/seedance-v1-pro-i2v": { unit: "call", cents: 61 },
+  "bytedance/seedream-4-5": { unit: "image", cents: 3 },
+  "bytedance/seedream-5-lite": { unit: "image", cents: 3.5 },
+  "google/imagen-4-ultra": { unit: "image", cents: 6 },
+  "google/nano-banana": { unit: "image", cents: 3.9 },
+  "google/nano-banana-2": { unit: "image", cents: 6.7 },
+  "google/nano-banana-pro": { unit: "image", cents: 13.4 },
+  "google/veo-3-1": { unit: "call", cents: 200 },
+  "google/veo-3-1-lite": { unit: "call", cents: 40 },
+  "google/veo-3-quality": { unit: "call", cents: 200 },
+  "ideogram/v3-balanced": { unit: "image", cents: 6 },
+  "kuaishou/kling-image-3": { unit: "image", cents: 2.8 },
+  "kuaishou/kling-image-o3": { unit: "image", cents: 2.8 },
+  "kuaishou/kling-motion-control": { unit: "call", cents: 56 },
+  "kuaishou/kling-v21-master": { unit: "call", cents: 56 },
+  "kuaishou/kling-v21-master-i2v": { unit: "call", cents: 56 },
+  "kuaishou/kling-v26-pro-i2v": { unit: "call", cents: 56 },
+  "kuaishou/kling-v3-pro": { unit: "call", cents: 56 },
+  "kuaishou/kling-v3-pro-i2v": { unit: "call", cents: 56 },
+  "lightricks/ltx-2": { unit: "call", cents: 30 },
+  "lightricks/ltx-2-i2v": { unit: "call", cents: 30 },
+  "minimax/hailuo-02-pro": { unit: "call", cents: 53 },
+  "minimax/hailuo-02-standard": { unit: "call", cents: 27 },
+  "minimax/hailuo-2-3-pro": { unit: "call", cents: 53 },
+  "openai/gpt-image-1-5": { unit: "image", cents: 3.4 },
+  "openai/gpt-image-2": { unit: "image", cents: 5.3 },
+  "openai/gpt-image-2-high": { unit: "image", cents: 21.1 },
+  "openai/sora-2": { unit: "call", cents: 50 },
+  "openai/sora-2-i2v": { unit: "call", cents: 50 },
+  "pixverse/v5-5-i2v": { unit: "call", cents: 60 },
+  "pixverse/v6": { unit: "call", cents: 45 },
+  "recraft/v3": { unit: "image", cents: 4 },
+  "recraft/v4-1": { unit: "image", cents: 25 },
+  "runway/gen-4-image": { unit: "image", cents: 8 },
+  "stability/fast-sdxl": { unit: "image", cents: 0.9 },
+  "stability/stable-diffusion-3": { unit: "image", cents: 6.5 },
+  "xai/grok-image-quality": { unit: "image", cents: 7 }
+};
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
@@ -573,7 +641,15 @@ function newMediaCallId() {
   return c?.randomUUID ? c.randomUUID() : `lcr_${Date.now().toString(36)}`;
 }
 function createMediaLCR(config) {
-  const { registry, adapters, reference = DEFAULT_REFERENCE, onError, onCost, onCall } = config;
+  const {
+    registry,
+    adapters,
+    reference = DEFAULT_REFERENCE,
+    officialPrices = OFFICIAL_PRICES,
+    onError,
+    onCost,
+    onCall
+  } = config;
   const safeError = (error, provider) => {
     try {
       onError?.(error, provider);
@@ -598,7 +674,8 @@ function createMediaLCR(config) {
       throw new Error(`ai-lcr: unknown media model "${modelId}" \u2014 add it to the registry`);
     }
     const ranked = rankRoutes(def, reference);
-    const baselineUsd = ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
+    const official = def.official ?? officialPrices[modelId];
+    const baselineUsd = official !== void 0 ? normalizedCents(official, reference) / 100 : ranked.length > 0 ? Math.max(...ranked.map((r) => r.refCents)) / 100 : 0;
     const startedAt = Date.now();
     const attempts = [];
     let lastErr;
@@ -1082,6 +1159,7 @@ function createLCR(config) {
 export {
   DEFAULT_REFERENCE,
   MEDIA_PRICING,
+  OFFICIAL_PRICES,
   cheapestRoute,
   classifyError,
   classifyErrorKind,
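
The baseline precedence introduced in `createMediaLCR` above (inline `official`, then the `officialPrices` map, then the priciest route, then `0`) can be restated as a standalone function. This is a sketch under stated assumptions: `Pricing`, `ModelDef`, and `resolveBaselineUsd` are hypothetical local names, and prices are treated as already normalized to the reference spec, whereas the package passes them through `normalizedCents(official, reference)`.

```typescript
type Pricing = { unit: "image" | "call"; cents: number };

// Local stand-in for MediaModelDef (assumption: only `official` matters here).
interface ModelDef {
  official?: Pricing;
}

function resolveBaselineUsd(
  modelId: string,
  def: ModelDef,
  officialPrices: Record<string, Pricing>,
  routeRefCents: number[], // reference-normalized cost of each configured route
): number {
  // An inline price on the def wins over the map lookup.
  const official = def.official ?? officialPrices[modelId];
  if (official !== undefined) return official.cents / 100;
  // No official price known: fall back to the priciest route, or 0 with no routes.
  return routeRefCents.length > 0 ? Math.max(...routeRefCents) / 100 : 0;
}

const prices: Record<string, Pricing> = {
  "openai/sora-2": { unit: "call", cents: 50 },
};

const fromInline = resolveBaselineUsd(
  "openai/sora-2", { official: { unit: "call", cents: 75 } }, prices, [20, 30]);
const fromMap = resolveBaselineUsd("openai/sora-2", {}, prices, [20, 30]);
const fromRoutes = resolveBaselineUsd("some/open-weight-model", {}, prices, [20, 30]);
const noRoutes = resolveBaselineUsd("some/open-weight-model", {}, prices, []);
```

With a single aggregator route at 20 cents, the old rule would give a baseline equal to the actual cost. The map lookup gives 0.50 USD instead, which is the saving the 0.5.0 changelog entry describes.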

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.3.0",
+  "version": "0.5.0",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",