npm - ai-lcr - Versions diffs - 0.2.6 → 0.4.0 - Mend

ai-lcr 0.2.6 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,65 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.4.0] — 2026-06-02
+All additions are optional and backward compatible.
+### Added
+- **`CallRecord.ttftMs` — time to first token.** Streaming calls now report TTFT,
+  the industry-standard responsiveness metric: ms from the winning provider's
+  stream attempt start to its first content token (`text-delta` /
+  `reasoning-delta`). Measured against the *winner's* attempt, so failover
+  overhead (already in `latencyMs`) doesn't distort it. `undefined` for
+  `doGenerate` (no streaming → no "first token") and for calls that failed before
+  producing content. `formatCallRecord` shows it inline next to total latency when
+  present (`412ms (ttft 88ms)`). With `latencyMs` and `outputTokens` on the same
+  record, output throughput is derivable: `outputTokens / ((latencyMs − ttftMs) /
+  1000)` tokens/sec.
+## [0.3.0] — 2026-06-02
+Integration-feedback pass from wiring ai-lcr into a real agentic product
+(multi-step tool loops, Anthropic prompt caching). All additions are optional
+and backward compatible.
+### Fixed
+- **`createHttpSink` is exported again.** It shipped in 0.2.0, then silently
+  dropped out of the package somewhere after — so `import { createHttpSink }`
+  (as the integration playbook documents) failed with TS2305 on 0.2.1+. The
+  source and tests are restored and the symbol is now pinned in the public-API
+  smoke test so it can't regress unnoticed.
+- **Capability probe no longer false-FAILs tool support.** `check-provider.sh`
+  tested tools with `tool_choice:"auto"` and a single roll — reasoning / chatty
+  models often answer in text instead of calling, which looked identical to
+  dropped tools. It now forces `tool_choice:"required"` (testing *can* the
+  provider call a tool, not *will* the model decide to). The token-inflation
+  parser also surfaces a stderr diagnostic on a parse failure instead of
+  silently returning empty (which masqueraded as an inconclusive result).
+### Added
+- **`CallRecord.baselineUsd` on the text side.** The text router now fills the
+  savings baseline — the same token usage priced on the most expensive priced
+  provider in the chain — so `baselineUsd − costUsd` (the headline a cost
+  dashboard shows) is computable for text, not just media.
+- **Prompt-cache-aware cost.** `ProviderCost` gains an optional `cacheRead`
+  (USD per 1M cached input tokens). When a call reports
+  `usage.inputTokens.cacheRead`, those tokens bill at that rate; omit it and
+  they fall back to the full `input` rate (unchanged). `CallRecord` exposes
+  `cachedInputTokens` for auditing. Accounting only — routing weights are
+  unchanged in this release.
+- **`CallRecord.requestId` passthrough.** Read from `providerOptions.lcr.requestId`;
+  stamp the same id on every step of a tool loop to roll a multi-step request
+  up into one cost figure on the dashboard.
+- **`CallRecord.usageMissing` flag.** Set when the winner served OK but reported
+  zero input *and* output tokens — i.e. the provider emitted no usage, so
+  `costUsd` (and any token-based credit metering) silently reads 0. Surfaces the
+  difference between "free" and "cost unknown"; `formatCallRecord` shows it as
+  `⚠no-usage`, and a savings suffix `(saved $X)` when `baselineUsd` beats cost.
 ## [0.2.6] — 2026-06-01
 ### Changed

package/README.md CHANGED Viewed

@@ -98,6 +98,46 @@ const lcr = createLCR({
 The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
+## Cheapest route for open-weights models (DeepInfra)
+For open-weights models — DeepSeek, Kimi, MiniMax, GLM, Qwen — a dedicated inference host is usually the cheapest route, well under aggregator pricing. [DeepInfra](https://deepinfra.com) is OpenAI-compatible, so it slots in as just another entry. **One gotcha:** its OpenAI endpoint lives at `/v1/openai` (the `/v1/` precedes `openai`), not the usual `/v1`:
+```ts
+import { createLCR } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: "https://api.deepinfra.com/v1/openai", // note: /v1/openai, not /v1
+  apiKey: process.env.DEEPINFRA_API_KEY,
+});
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    // DeepInfra is cheapest; OpenRouter is the breadth/uptime fallback.
+    // DeepInfra uses HuggingFace-style ids (org/Name).
+    "deepseek-v4-flash": [
+      { model: deepinfra("deepseek-ai/DeepSeek-V4-Flash"), label: "deepinfra", cost: { input: 0.10, output: 0.20 } },
+      { model: openrouter("deepseek/deepseek-v4-flash"), label: "openrouter", cost: { input: 0.27, output: 1.10 } },
+    ],
+    "minimax-m2.5": [
+      { model: deepinfra("MiniMaxAI/MiniMax-M2.5"), label: "deepinfra", cost: { input: 0.15, output: 1.15 } },
+    ],
+    "kimi-k2.5": [
+      { model: deepinfra("moonshotai/Kimi-K2.5"), label: "deepinfra", cost: { input: 0.45, output: 2.25 } },
+    ],
+  },
+});
+```
+DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
 ## How it routes
 1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
@@ -144,12 +184,52 @@ interface CallRecord {
   ok: boolean;
   failedOver: boolean;       // more than one provider was tried
   latencyMs: number;
+  ttftMs?: number;           // streaming only: time to first token (winner's first content delta) — industry-standard responsiveness metric
   inputTokens: number;
   outputTokens: number;
-  costUsd: number;
+  cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
+  costUsd: number;            // winner cost, cache-discount applied (see `cacheRead`)
+  baselineUsd?: number;       // same usage on the priciest priced leg → savings = baselineUsd − costUsd
+  requestId?: string;         // your correlation id (see below) — roll multi-step tool loops into one request
+  usageMissing?: boolean;     // winner served but reported 0/0 tokens → costUsd is 0 but unknown, not free
 }
 ```
+**Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
+**Responsiveness, not just total time.** On streaming calls (`streamText`, `streamObject`, streaming agents), `ttftMs` is the **time to first token** — measured from the winning provider's attempt start to its first content delta. It's the metric most LLM dashboards lead with, because it's what a user feels as "how fast did it start replying". Total `latencyMs` covers the whole stream including any failover; `ttftMs` isolates the serving model's responsiveness. It's `undefined` for `generateText`/`generateObject` (no streaming → no "first" token) and for calls that failed before any content. Output throughput (tokens/sec) is then `outputTokens / ((latencyMs − ttftMs) / 1000)`.
+**Cache-aware cost.** Add `cacheRead` (USD per 1M cached input tokens) to a provider's `cost` and ai-lcr bills prompt-cache hits at that rate when the call reports `usage.inputTokens.cacheRead`. Omit it and cached tokens fall back to the full `input` rate (unchanged from before). For cache-heavy traffic (e.g. Anthropic, where a cache read is ~0.1×) this keeps `costUsd` honest — and `cachedInputTokens` lets a dashboard audit it:
+```ts
+{ model: claude, label: "anthropic", cost: { input: 3, output: 15, cacheRead: 0.3 } }
+```
+**Group a multi-step request.** An agentic turn does one `onCall` per `doStream`/`doGenerate` step, so a 10-step tool loop emits 10 records. Pass a stable id through `providerOptions.lcr.requestId` and every step's record carries it — group by `requestId` for per-request cost:
+```ts
+await streamText({ model: lcr("chat"), messages, providerOptions: { lcr: { requestId } } });
+```
+### Ship records to a collector (`createHttpSink`)
+`createHttpSink` builds an `onCall` handler that POSTs each `CallRecord` as JSON to an endpoint (e.g. a self-hosted dashboard's `/api/ingest`, or any drain that takes the shape). Fire-and-forget — a failed POST never breaks your app. On serverless, pass a `waitUntil`-style `dispatch` (Next.js: `after`) so the request isn't cut off:
+```ts
+import { createLCR, createHttpSink } from "ai-lcr";
+import { after } from "next/server";
+const lcr = createLCR({
+  models: { /* … */ },
+  onCall: createHttpSink({
+    url: process.env.LCR_INGEST_URL + "/api/ingest",
+    headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+    project: process.env.LCR_PROJECT, // optional tenant tag merged into each payload
+    dispatch: after,
+  }),
+});
+```
 ## Supported providers
 Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
@@ -235,11 +315,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
-# TokenMart uses vendor-prefixed model IDs
-API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
-  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
-  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
-  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+# TokenMart (Inference AI) uses bare, un-prefixed model IDs
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+  MODEL_1=gemini-3-flash-preview      REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6           REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
 ```

package/README.zh-CN.md CHANGED Viewed

@@ -98,6 +98,46 @@ const lcr = createLCR({
 同样的模式适用于任何厂商的原生 SDK provider——`@ai-sdk/anthropic`、`@ai-sdk/google`、`@ai-sdk/openai`、`@ai-sdk/xai` 等等。它们都返回 `LanguageModelV3`，所以你可以在一个模型的列表里把厂商原生 API 和聚合器混着用。原生 API 覆盖窄（只有该厂商自己的模型）但特性全；聚合器覆盖广。**官方优先 + 聚合器兜底** 正是 LCR 最自然的形态。
+## 开源权重模型的最便宜路由（DeepInfra）
+对开源权重模型——DeepSeek、Kimi、MiniMax、GLM、Qwen——专门的推理托管商通常是最便宜的路由，明显低于聚合器价格。[DeepInfra](https://deepinfra.com) 兼容 OpenAI，直接当成列表里的又一个 entry 即可。**有一个坑**：它的 OpenAI endpoint 在 `/v1/openai`（`/v1/` 在 `openai` **前面**），不是常规的 `/v1`：
+```ts
+import { createLCR } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: "https://api.deepinfra.com/v1/openai", // 注意：/v1/openai，不是 /v1
+  apiKey: process.env.DEEPINFRA_API_KEY,
+});
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    // DeepInfra 最便宜；OpenRouter 作广覆盖 / 可用性兜底。
+    // DeepInfra 用 HuggingFace 风格的 id（org/Name）。
+    "deepseek-v4-flash": [
+      { model: deepinfra("deepseek-ai/DeepSeek-V4-Flash"), label: "deepinfra", cost: { input: 0.10, output: 0.20 } },
+      { model: openrouter("deepseek/deepseek-v4-flash"), label: "openrouter", cost: { input: 0.27, output: 1.10 } },
+    ],
+    "minimax-m2.5": [
+      { model: deepinfra("MiniMaxAI/MiniMax-M2.5"), label: "deepinfra", cost: { input: 0.15, output: 1.15 } },
+    ],
+    "kimi-k2.5": [
+      { model: deepinfra("moonshotai/Kimi-K2.5"), label: "deepinfra", cost: { input: 0.45, output: 2.25 } },
+    ],
+  },
+});
+```
+DeepInfra 只承载开源权重——没有第一方 Claude / GPT / Gemini。那些闭源模型请走 OpenRouter 或折扣中转。
 ## 它如何路由
 1. **最便宜优先。** provider 按顺序依次尝试——把它们排成最便宜优先，或设置 `autoSort: true` 让它按 `cost` 自动排序。
@@ -192,11 +232,11 @@ API_KEY=$KUNAVO_API_KEY BASE=https://api.kunavo.com \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
-# TokenMart 使用 vendor 前缀的模型 ID
-API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
-  MODEL_1=google/gemini-3-flash    REF_1=google/gemini-3-flash-preview \
-  MODEL_2=anthropic/claude-sonnet-4-6 REF_2=anthropic/claude-sonnet-4.6 \
-  CACHE_MODEL=anthropic/claude-sonnet-4-6 \
+# TokenMart（Inference AI）使用不带 vendor 前缀的裸模型 ID
+API_KEY=$INFERENCE_API_KEY BASE=https://model.service-inference.ai \
+  MODEL_1=gemini-3-flash-preview      REF_1=google/gemini-3-flash-preview \
+  MODEL_2=claude-sonnet-4-6           REF_2=anthropic/claude-sonnet-4.6 \
+  CACHE_MODEL=claude-sonnet-4-6 \
   REF_API_KEY=$OPENROUTER_API_KEY REF_BASE=https://openrouter.ai/api \
   bash scripts/check-provider.sh
 ```

package/dist/index.cjs CHANGED Viewed

@@ -27,6 +27,7 @@ __export(index_exports, {
   classifyErrorKind: () => classifyErrorKind,
   comparePrices: () => comparePrices,
   createFalMediaAdapter: () => createFalMediaAdapter,
+  createHttpSink: () => createHttpSink,
   createKunavoMediaAdapter: () => createKunavoMediaAdapter,
   createLCR: () => createLCR,
   createMediaLCR: () => createMediaLCR,
@@ -186,6 +187,16 @@ function newCallId() {
   if (c?.randomUUID) return c.randomUUID();
   return `lcr_${Date.now().toString(36)}_${(callSeq++).toString(36)}`;
 }
+function costForUsage(cost, inputTokens, outputTokens, cacheReadTokens) {
+  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
+  const fullInput = inputTokens - cached;
+  const cachedRate = cost.cacheRead ?? cost.input;
+  return fullInput / 1e6 * cost.input + cached / 1e6 * cachedRate + outputTokens / 1e6 * cost.output;
+}
+function requestIdFrom(options) {
+  const raw = options.providerOptions?.lcr?.requestId;
+  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
+}
 var LcrFallbackModel = class {
   constructor(opts) {
     this.opts = opts;
@@ -267,8 +278,13 @@ var LcrFallbackModel = class {
     } catch {
     }
   }
-  startCall() {
-    return { id: newCallId(), attempts: [], startedAt: Date.now() };
+  startCall(options) {
+    return {
+      id: newCallId(),
+      attempts: [],
+      startedAt: Date.now(),
+      requestId: requestIdFrom(options)
+    };
   }
   /** Record a failed attempt onto the call's chain (no event yet). */
   recordFail(ctx, provider, attemptStart, error) {
@@ -280,12 +296,29 @@ var LcrFallbackModel = class {
       kind: classifyErrorKind(error)
     });
   }
+  /**
+   * Baseline = what this same usage would have cost on the most expensive
+   * *priced* provider in the chain (typically the OpenRouter fallback leg). The
+   * winner's savings is `baselineUsd - costUsd`. Undefined when no provider in
+   * the chain carries a price (nothing to compare against).
+   */
+  baselineUsd(inputTokens, outputTokens, cacheReadTokens) {
+    let max;
+    for (const p of this.opts.providers) {
+      if (!p.cost) continue;
+      const c = costForUsage(p.cost, inputTokens, outputTokens, cacheReadTokens);
+      if (max === void 0 || c > max) max = c;
+    }
+    return max;
+  }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
-    const costUsd = provider.cost ? inputTokens / 1e6 * provider.cost.input + outputTokens / 1e6 * provider.cost.output : 0;
+    const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
+    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const usageMissing = inputTokens === 0 && outputTokens === 0;
     this.emitCost({
       model: this.opts.modelName,
       provider: provider.label,
@@ -301,9 +334,14 @@ var LcrFallbackModel = class {
       ok: true,
       failedOver: ctx.attempts.length > 1,
       latencyMs: Date.now() - ctx.startedAt,
+      ...ttftMs !== void 0 ? { ttftMs } : {},
       inputTokens,
       outputTokens,
-      costUsd
+      ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
+      costUsd,
+      baselineUsd: this.baselineUsd(inputTokens, outputTokens, cacheReadTokens),
+      ...ctx.requestId ? { requestId: ctx.requestId } : {},
+      ...usageMissing ? { usageMissing: true } : {}
     });
   }
   /** Every provider failed: fire `onCall` with no winner. */
@@ -318,11 +356,12 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens: 0,
       outputTokens: 0,
-      costUsd: 0
+      costUsd: 0,
+      ...ctx.requestId ? { requestId: ctx.requestId } : {}
     });
   }
   async doGenerate(options) {
-    const ctx = this.startCall();
+    const ctx = this.startCall(options);
     const providers = this.opts.providers;
     const n = providers.length;
     const start = this.startIndex();
@@ -351,7 +390,7 @@ var LcrFallbackModel = class {
     throw lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
   }
   // The stream's failover recursion re-enters here with the SAME `ctx` and a
   // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
@@ -395,6 +434,7 @@ var LcrFallbackModel = class {
     const triedBeforeServing = tried;
     let usage;
     let streamedAny = false;
+    let ttftMs;
     const stream = new ReadableStream({
       async start(controller) {
         let reader = null;
@@ -408,11 +448,14 @@ var LcrFallbackModel = class {
             }
             if (done) break;
             if (value.type === "finish") usage = value.usage;
+            if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+              ttftMs = Date.now() - servingAttemptStart;
+            }
             controller.enqueue(value);
             if (value.type !== "stream-start") streamedAny = true;
           }
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -474,7 +517,12 @@ function formatCallRecord(record, opts = {}) {
   const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
-  let line = `${glyph} ${record.model}  ${chain}  ${record.latencyMs}ms  ${status}`;
+  const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+  let line = `${glyph} ${record.model}  ${chain}  ${timing}  ${status}`;
+  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
+    line += `  (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
+  }
+  if (record.usageMissing) line += `  \u26A0no-usage`;
   const failed = record.attempts.filter((a) => !a.ok);
   if (failed.length > 0) {
     const reasons = failed.map((a) => `${a.provider} ${a.errorClass ?? "error"}`).join(", ");
@@ -487,6 +535,40 @@ function formatCallRecord(record, opts = {}) {
   return line;
 }
+// src/sink.ts
+function createHttpSink(options) {
+  const {
+    url,
+    headers,
+    project,
+    dispatch = (task) => {
+      void task();
+    },
+    fetchImpl,
+    onError
+  } = options;
+  const doFetch = fetchImpl ?? globalThis.fetch;
+  return (record) => {
+    if (!doFetch) {
+      onError?.(new Error("ai-lcr: no fetch available for createHttpSink"));
+      return;
+    }
+    const payload = project ? { project, ...record } : record;
+    dispatch(async () => {
+      try {
+        await doFetch(url, {
+          method: "POST",
+          headers: { "content-type": "application/json", ...headers },
+          body: JSON.stringify(payload),
+          keepalive: true
+        });
+      } catch (err) {
+        onError?.(err);
+      }
+    });
+  };
+}
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
@@ -1053,6 +1135,7 @@ function createLCR(config) {
   classifyErrorKind,
   comparePrices,
   createFalMediaAdapter,
+  createHttpSink,
   createKunavoMediaAdapter,
   createLCR,
   createMediaLCR,

package/dist/index.d.cts CHANGED Viewed

@@ -17,8 +17,18 @@ import { LanguageModelV3 } from '@ai-sdk/provider';
 /** USD per 1M tokens. */
 interface ProviderCost {
+    /** USD per 1M input (prompt) tokens. */
     input: number;
+    /** USD per 1M output (completion) tokens. */
     output: number;
+    /**
+     * USD per 1M *cached* input tokens read (prompt-cache hits). Optional. When a
+     * call reports `usage.inputTokens.cacheRead`, those tokens are billed at this
+     * rate instead of `input` — so the cost stays honest for cache-heavy traffic
+     * (e.g. Anthropic, where a cache read is ~0.1× the input price). Omit it and
+     * cached tokens fall back to the full `input` rate (the pre-0.3 behavior).
+     */
+    cacheRead?: number;
 }
 interface CostEvent {
     /** Logical model name (the key in createLCR's `models`). */
@@ -75,17 +85,50 @@ interface CallRecord {
     failedOver: boolean;
     /** Total wall time across all attempts, ms. */
     latencyMs: number;
+    /**
+     * Time to first token (TTFT), ms — the industry-standard responsiveness
+     * metric. Measured from the *winning* provider's stream attempt start to its
+     * first content token (`text-delta` / `reasoning-delta`), so it captures how
+     * fast the model that actually served started replying, not failover overhead
+     * (that's already in `latencyMs`). Streaming only: **undefined** for
+     * `doGenerate` (the whole response lands at once, so there's no "first token")
+     * and for calls that failed before producing any content. With `latencyMs` and
+     * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+     * ttftMs) / 1000)` tokens/sec.
+     */
+    ttftMs?: number;
     inputTokens: number;
     outputTokens: number;
+    /**
+     * Cached input (prompt-cache) tokens the winner read, when the provider
+     * reported them (`usage.inputTokens.cacheRead`). Present only when > 0. Lets
+     * the dashboard show cache-hit volume and audit why `costUsd` is lower than
+     * sticker × tokens. Undefined when the provider reports no cache info.
+     */
+    cachedInputTokens?: number;
     /** Computed from the winner's `cost`; 0 if no price was given or the call failed. */
     costUsd: number;
     /**
-     * What the same request would have cost on the most expensive configured
-     * provider — the savings baseline (`baselineUsd - costUsd`). Set by the media
-     * router; the text router omits it (left undefined) until a per-call text
-     * baseline lands. Optional so both routers share one {@link CallRecord} shape.
+     * What the same request would have cost on the most expensive *priced*
+     * provider in the chain, on identical token usage — the savings baseline
+     * (`baselineUsd - costUsd`). Set by both routers whenever at least one
+     * provider carries a `cost`; undefined only when no provider was priced.
      */
     baselineUsd?: number;
+    /**
+     * Caller-supplied correlation id, read from `providerOptions.lcr.requestId`
+     * on the call. Multi-step tool loops emit one record per `doStream`/
+     * `doGenerate` step; stamp the same `requestId` on every step to let the
+     * dashboard roll a whole user request up into one cost/`calls` figure.
+     */
+    requestId?: string;
+    /**
+     * True when the winner served OK but reported **zero** input *and* output
+     * tokens — i.e. the provider didn't emit usage. A silent danger: `costUsd`
+     * collapses to 0 and any token-based credit metering under-charges with no
+     * other signal. Treat a flagged record as "cost unknown", not "free".
+     */
+    usageMissing?: boolean;
 }
 /**
  * Normalize an error into a short, log-friendly class for {@link CallRecord}.
@@ -110,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  *   ✓ text  tokenmart                      412ms  $0.0003
+ *   ✓ text  tokenmart                      412ms (ttft 88ms)  $0.0003   ← streaming: TTFT shown when known
  *   ⚠ text  tokenmart→openrouter           910ms  $0.0004   ⤷ tokenmart 502
  *   ✗ text  deepseek→tokenmart→openrouter  1240ms FAILED    ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -122,6 +166,54 @@ interface FormatOptions {
 }
 declare function formatCallRecord(record: CallRecord, opts?: FormatOptions): string;
+/**
+ * Optional HTTP sink for `onCall` — ship each {@link CallRecord} as JSON to a
+ * collector (e.g. a self-hosted ai-lcr-dashboard `/api/ingest`, or any endpoint
+ * that accepts the CallRecord shape).
+ *
+ * Fully optional and dashboard-agnostic: omit it and ai-lcr stores nothing;
+ * point `url` at whatever you run. Logging must never break your app, so a
+ * failed POST is swallowed by default (surface it via `onError` if you want).
+ *
+ *   import { createLCR, createHttpSink } from "ai-lcr";
+ *   import { after } from "next/server"; // serverless: don't block the response
+ *
+ *   const lcr = createLCR({
+ *     models: { ... },
+ *     onCall: createHttpSink({
+ *       url: process.env.LCR_INGEST_URL + "/api/ingest",
+ *       headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+ *       project: process.env.LCR_PROJECT,
+ *       dispatch: after, // run after the response is sent
+ *     }),
+ *   });
+ */
+interface HttpSinkOptions {
+    /** Where to POST each CallRecord (a collector that accepts the JSON shape). */
+    url: string;
+    /** Extra headers, e.g. `{ authorization: ` + "`Bearer ${key}`" + ` }`. */
+    headers?: Record<string, string>;
+    /** Optional tenant/project tag merged into each payload (`{ project, ...record }`). */
+    project?: string;
+    /**
+     * Wrap the dispatch so it survives a serverless function returning. On
+     * Next.js pass `after` from "next/server"; elsewhere pass a `waitUntil`-style
+     * function. Defaults to running immediately — correct for long-lived servers,
+     * but on serverless an un-awaited POST may be cut off, so pass `after`.
+     */
+    dispatch?: (task: () => void | Promise<void>) => void;
+    /** Custom fetch (tests / runtimes without a global `fetch`). */
+    fetchImpl?: typeof fetch;
+    /** Called if the POST fails. Failures are swallowed by default. */
+    onError?: (error: unknown) => void;
+}
+/**
+ * Build an `onCall` handler that POSTs each {@link CallRecord} to `url`.
+ * Returns a plain `(record) => void` — pass it straight to `createLCR`'s `onCall`.
+ */
+declare function createHttpSink(options: HttpSinkOptions): (record: CallRecord) => void;
 /**
  * ai-lcr media routing — Least Cost Routing for image & video models.
  *
@@ -453,4 +545,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };

package/dist/index.d.ts CHANGED Viewed

@@ -17,8 +17,18 @@ import { LanguageModelV3 } from '@ai-sdk/provider';
 /** USD per 1M tokens. */
 interface ProviderCost {
+    /** USD per 1M input (prompt) tokens. */
     input: number;
+    /** USD per 1M output (completion) tokens. */
     output: number;
+    /**
+     * USD per 1M *cached* input tokens read (prompt-cache hits). Optional. When a
+     * call reports `usage.inputTokens.cacheRead`, those tokens are billed at this
+     * rate instead of `input` — so the cost stays honest for cache-heavy traffic
+     * (e.g. Anthropic, where a cache read is ~0.1× the input price). Omit it and
+     * cached tokens fall back to the full `input` rate (the pre-0.3 behavior).
+     */
+    cacheRead?: number;
 }
 interface CostEvent {
     /** Logical model name (the key in createLCR's `models`). */
@@ -75,17 +85,50 @@ interface CallRecord {
     failedOver: boolean;
     /** Total wall time across all attempts, ms. */
     latencyMs: number;
+    /**
+     * Time to first token (TTFT), ms — the industry-standard responsiveness
+     * metric. Measured from the *winning* provider's stream attempt start to its
+     * first content token (`text-delta` / `reasoning-delta`), so it captures how
+     * fast the model that actually served started replying, not failover overhead
+     * (that's already in `latencyMs`). Streaming only: **undefined** for
+     * `doGenerate` (the whole response lands at once, so there's no "first token")
+     * and for calls that failed before producing any content. With `latencyMs` and
+     * `outputTokens`, output throughput is derivable: `outputTokens / ((latencyMs −
+     * ttftMs) / 1000)` tokens/sec.
+     */
+    ttftMs?: number;
     inputTokens: number;
     outputTokens: number;
+    /**
+     * Cached input (prompt-cache) tokens the winner read, when the provider
+     * reported them (`usage.inputTokens.cacheRead`). Present only when > 0. Lets
+     * the dashboard show cache-hit volume and audit why `costUsd` is lower than
+     * sticker × tokens. Undefined when the provider reports no cache info.
+     */
+    cachedInputTokens?: number;
     /** Computed from the winner's `cost`; 0 if no price was given or the call failed. */
     costUsd: number;
     /**
-     * What the same request would have cost on the most expensive configured
-     * provider — the savings baseline (`baselineUsd - costUsd`). Set by the media
-     * router; the text router omits it (left undefined) until a per-call text
-     * baseline lands. Optional so both routers share one {@link CallRecord} shape.
+     * What the same request would have cost on the most expensive *priced*
+     * provider in the chain, on identical token usage — the savings baseline
+     * (`baselineUsd - costUsd`). Set by both routers whenever at least one
+     * provider carries a `cost`; undefined only when no provider was priced.
      */
     baselineUsd?: number;
+    /**
+     * Caller-supplied correlation id, read from `providerOptions.lcr.requestId`
+     * on the call. Multi-step tool loops emit one record per `doStream`/
+     * `doGenerate` step; stamp the same `requestId` on every step to let the
+     * dashboard roll a whole user request up into one cost/`calls` figure.
+     */
+    requestId?: string;
+    /**
+     * True when the winner served OK but reported **zero** input *and* output
+     * tokens — i.e. the provider didn't emit usage. A silent danger: `costUsd`
+     * collapses to 0 and any token-based credit metering under-charges with no
+     * other signal. Treat a flagged record as "cost unknown", not "free".
+     */
+    usageMissing?: boolean;
 }
 /**
  * Normalize an error into a short, log-friendly class for {@link CallRecord}.
@@ -110,6 +153,7 @@ declare function classifyErrorKind(error: unknown): ErrorKind;
  * latency, cost, and — when anything failed — the reason for each failed hop.
  *
  *   ✓ text  tokenmart                      412ms  $0.0003
+ *   ✓ text  tokenmart                      412ms (ttft 88ms)  $0.0003   ← streaming: TTFT shown when known
  *   ⚠ text  tokenmart→openrouter           910ms  $0.0004   ⤷ tokenmart 502
  *   ✗ text  deepseek→tokenmart→openrouter  1240ms FAILED    ⤷ deepseek 401, tokenmart 502, openrouter 429
  *
@@ -122,6 +166,54 @@ interface FormatOptions {
 }
 declare function formatCallRecord(record: CallRecord, opts?: FormatOptions): string;
+/**
+ * Optional HTTP sink for `onCall` — ship each {@link CallRecord} as JSON to a
+ * collector (e.g. a self-hosted ai-lcr-dashboard `/api/ingest`, or any endpoint
+ * that accepts the CallRecord shape).
+ *
+ * Fully optional and dashboard-agnostic: omit it and ai-lcr stores nothing;
+ * point `url` at whatever you run. Logging must never break your app, so a
+ * failed POST is swallowed by default (surface it via `onError` if you want).
+ *
+ *   import { createLCR, createHttpSink } from "ai-lcr";
+ *   import { after } from "next/server"; // serverless: don't block the response
+ *
+ *   const lcr = createLCR({
+ *     models: { ... },
+ *     onCall: createHttpSink({
+ *       url: process.env.LCR_INGEST_URL + "/api/ingest",
+ *       headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+ *       project: process.env.LCR_PROJECT,
+ *       dispatch: after, // run after the response is sent
+ *     }),
+ *   });
+ */
+interface HttpSinkOptions {
+    /** Where to POST each CallRecord (a collector that accepts the JSON shape). */
+    url: string;
+    /** Extra headers, e.g. `{ authorization: ` + "`Bearer ${key}`" + ` }`. */
+    headers?: Record<string, string>;
+    /** Optional tenant/project tag merged into each payload (`{ project, ...record }`). */
+    project?: string;
+    /**
+     * Wrap the dispatch so it survives a serverless function returning. On
+     * Next.js pass `after` from "next/server"; elsewhere pass a `waitUntil`-style
+     * function. Defaults to running immediately — correct for long-lived servers,
+     * but on serverless an un-awaited POST may be cut off, so pass `after`.
+     */
+    dispatch?: (task: () => void | Promise<void>) => void;
+    /** Custom fetch (tests / runtimes without a global `fetch`). */
+    fetchImpl?: typeof fetch;
+    /** Called if the POST fails. Failures are swallowed by default. */
+    onError?: (error: unknown) => void;
+}
+/**
+ * Build an `onCall` handler that POSTs each {@link CallRecord} to `url`.
+ * Returns a plain `(record) => void` — pass it straight to `createLCR`'s `onCall`.
+ */
+declare function createHttpSink(options: HttpSinkOptions): (record: CallRecord) => void;
 /**
  * ai-lcr media routing — Least Cost Routing for image & video models.
  *
@@ -453,4 +545,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;
-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };

package/dist/index.js CHANGED Viewed

@@ -146,6 +146,16 @@ function newCallId() {
   if (c?.randomUUID) return c.randomUUID();
   return `lcr_${Date.now().toString(36)}_${(callSeq++).toString(36)}`;
 }
+function costForUsage(cost, inputTokens, outputTokens, cacheReadTokens) {
+  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
+  const fullInput = inputTokens - cached;
+  const cachedRate = cost.cacheRead ?? cost.input;
+  return fullInput / 1e6 * cost.input + cached / 1e6 * cachedRate + outputTokens / 1e6 * cost.output;
+}
+function requestIdFrom(options) {
+  const raw = options.providerOptions?.lcr?.requestId;
+  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
+}
 var LcrFallbackModel = class {
   constructor(opts) {
     this.opts = opts;
@@ -227,8 +237,13 @@ var LcrFallbackModel = class {
     } catch {
     }
   }
-  startCall() {
-    return { id: newCallId(), attempts: [], startedAt: Date.now() };
+  startCall(options) {
+    return {
+      id: newCallId(),
+      attempts: [],
+      startedAt: Date.now(),
+      requestId: requestIdFrom(options)
+    };
   }
   /** Record a failed attempt onto the call's chain (no event yet). */
   recordFail(ctx, provider, attemptStart, error) {
@@ -240,12 +255,29 @@ var LcrFallbackModel = class {
       kind: classifyErrorKind(error)
     });
   }
+  /**
+   * Baseline = what this same usage would have cost on the most expensive
+   * *priced* provider in the chain (typically the OpenRouter fallback leg). The
+   * winner's savings is `baselineUsd - costUsd`. Undefined when no provider in
+   * the chain carries a price (nothing to compare against).
+   */
+  baselineUsd(inputTokens, outputTokens, cacheReadTokens) {
+    let max;
+    for (const p of this.opts.providers) {
+      if (!p.cost) continue;
+      const c = costForUsage(p.cost, inputTokens, outputTokens, cacheReadTokens);
+      if (max === void 0 || c > max) max = c;
+    }
+    return max;
+  }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
-  finalizeOk(ctx, provider, attemptStart, usage) {
+  finalizeOk(ctx, provider, attemptStart, usage, ttftMs) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
-    const costUsd = provider.cost ? inputTokens / 1e6 * provider.cost.input + outputTokens / 1e6 * provider.cost.output : 0;
+    const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
+    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const usageMissing = inputTokens === 0 && outputTokens === 0;
     this.emitCost({
       model: this.opts.modelName,
       provider: provider.label,
@@ -261,9 +293,14 @@ var LcrFallbackModel = class {
       ok: true,
       failedOver: ctx.attempts.length > 1,
       latencyMs: Date.now() - ctx.startedAt,
+      ...ttftMs !== void 0 ? { ttftMs } : {},
       inputTokens,
       outputTokens,
-      costUsd
+      ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
+      costUsd,
+      baselineUsd: this.baselineUsd(inputTokens, outputTokens, cacheReadTokens),
+      ...ctx.requestId ? { requestId: ctx.requestId } : {},
+      ...usageMissing ? { usageMissing: true } : {}
     });
   }
   /** Every provider failed: fire `onCall` with no winner. */
@@ -278,11 +315,12 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens: 0,
       outputTokens: 0,
-      costUsd: 0
+      costUsd: 0,
+      ...ctx.requestId ? { requestId: ctx.requestId } : {}
     });
   }
   async doGenerate(options) {
-    const ctx = this.startCall();
+    const ctx = this.startCall(options);
     const providers = this.opts.providers;
     const n = providers.length;
     const start = this.startIndex();
@@ -311,7 +349,7 @@ var LcrFallbackModel = class {
     throw lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
   }
   // The stream's failover recursion re-enters here with the SAME `ctx` and a
   // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
@@ -355,6 +393,7 @@ var LcrFallbackModel = class {
     const triedBeforeServing = tried;
     let usage;
     let streamedAny = false;
+    let ttftMs;
     const stream = new ReadableStream({
       async start(controller) {
         let reader = null;
@@ -368,11 +407,14 @@ var LcrFallbackModel = class {
             }
             if (done) break;
             if (value.type === "finish") usage = value.usage;
+            if (ttftMs === void 0 && (value.type === "text-delta" || value.type === "reasoning-delta")) {
+              ttftMs = Date.now() - servingAttemptStart;
+            }
             controller.enqueue(value);
             if (value.type !== "stream-start") streamedAny = true;
           }
           self.settleSticky(servingIdx);
-          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage);
+          self.finalizeOk(ctx, servingProvider, servingAttemptStart, usage, ttftMs);
           controller.close();
         } catch (error) {
           self.emitError(error, servingProvider.label);
@@ -434,7 +476,12 @@ function formatCallRecord(record, opts = {}) {
   const glyph = !record.ok ? "\u2717" : record.failedOver ? "\u26A0" : "\u2713";
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
-  let line = `${glyph} ${record.model}  ${chain}  ${record.latencyMs}ms  ${status}`;
+  const timing = record.ttftMs !== void 0 ? `${record.latencyMs}ms (ttft ${record.ttftMs}ms)` : `${record.latencyMs}ms`;
+  let line = `${glyph} ${record.model}  ${chain}  ${timing}  ${status}`;
+  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
+    line += `  (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
+  }
+  if (record.usageMissing) line += `  \u26A0no-usage`;
   const failed = record.attempts.filter((a) => !a.ok);
   if (failed.length > 0) {
     const reasons = failed.map((a) => `${a.provider} ${a.errorClass ?? "error"}`).join(", ");
@@ -447,6 +494,40 @@ function formatCallRecord(record, opts = {}) {
   return line;
 }
+// src/sink.ts
+function createHttpSink(options) {
+  const {
+    url,
+    headers,
+    project,
+    dispatch = (task) => {
+      void task();
+    },
+    fetchImpl,
+    onError
+  } = options;
+  const doFetch = fetchImpl ?? globalThis.fetch;
+  return (record) => {
+    if (!doFetch) {
+      onError?.(new Error("ai-lcr: no fetch available for createHttpSink"));
+      return;
+    }
+    const payload = project ? { project, ...record } : record;
+    dispatch(async () => {
+      try {
+        await doFetch(url, {
+          method: "POST",
+          headers: { "content-type": "application/json", ...headers },
+          body: JSON.stringify(payload),
+          keepalive: true
+        });
+      } catch (err) {
+        onError?.(err);
+      }
+    });
+  };
+}
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
@@ -1012,6 +1093,7 @@ export {
   classifyErrorKind,
   comparePrices,
   createFalMediaAdapter,
+  createHttpSink,
   createKunavoMediaAdapter,
   createLCR,
   createMediaLCR,

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.2.6",
+  "version": "0.4.0",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",