npm - ai-lcr - Versions diffs - 0.5.5 → 0.6.0 - Mend

ai-lcr 0.5.5 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,138 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
+## [0.6.0] — 2026-06-10
+Media billing contract v2: **rank by the reference, bill by actual usage.**
+The 0.5 media router used one number for two jobs: the price normalized to a
+reference output (1080p image / 5-second clip) both ranked routes *and*
+estimated costs, multiplied by an untyped `units` count. That mispriced
+off-reference outputs (an 8s clip was billed as 5s), left the savings baseline
+blind to duration, and let the bare `units` field cause an 8× overcharge when
+an adapter reported seconds (8) where an output count (1) was expected. 0.6
+separates ranking from billing.
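The difference between the two rules is easiest to see with numbers. The sketch below is illustrative only, not ai-lcr code: it assumes a per-second route at 8¢/second (the fal rate in the README's registry example) and the 5-second reference clip, and `referencePriceCents`, `legacyEstimateCents`, and `usageEstimateCents` are hypothetical names.

```typescript
// Hypothetical sketch contrasting the 0.5 and 0.6 estimate rules for a per-second SKU.
// None of these functions are ai-lcr exports.
const REFERENCE_SECONDS = 5 // the 5-second reference clip used for ranking

// Price of one reference clip on a per-second route (used for ranking in both versions).
function referencePriceCents(centsPerSecond: number): number {
  return centsPerSecond * REFERENCE_SECONDS
}

// 0.5-style estimate: reference price times an untyped `units` count.
function legacyEstimateCents(centsPerSecond: number, units: number): number {
  return referencePriceCents(centsPerSecond) * units
}

// 0.6-style estimate: the route's own unit (seconds) times the seconds actually produced.
function usageEstimateCents(centsPerSecond: number, seconds: number): number {
  return centsPerSecond * seconds
}

const rate = 8 // cents per second
console.log(legacyEstimateCents(rate, 1)) // 40: an 8s clip billed as if it were 5s
console.log(legacyEstimateCents(rate, 8)) // 320: seconds (8) mistaken for an output count
console.log(usageEstimateCents(rate, 8))  // 64: 8 seconds at 8 cents each
```

Ranking keeps using the reference price (40¢ here), so route order does not change; only the settled estimate moves from a multiple of the reference to the seconds actually produced.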
+### Added
+- **Typed usage (`MediaUsage`).** Adapter results (`MediaGenerateResult`,
+  `MediaStatusResult`) carry `usage: { seconds?, outputs?, megapixels? }` —
+  explicitly named dimensions that cannot be confused. The bundled adapters
+  report it (Kunavo video now reports its real `duration_seconds`, which can no
+  longer be mistaken for an output count). The legacy bare `units` field is
+  still honored as an output count.
+- **Settle-time billing.** Cost estimates price the route's actual unit on
+  actual usage. Per-second SKUs bill the first available of `usage.seconds`,
+  then `input.duration` (a number or an `"8s"`-style string), then the
+  reference duration as a last resort; per-image and per-call SKUs bill the
+  output count; per-megapixel SKUs bill measured megapixels.
+  New public helpers: `billableUnits`, `priceCents`, `durationFromInput`.
+- **Usage-aware savings baseline.** `baselineUsd` is now priced at settle time
+  against the same usage as the cost — an 8-second clip is baselined at 8
+  seconds of the official rate, not the 5-second reference. Off-reference calls
+  can no longer produce negative or understated savings.
+- **`CallRecord` provenance fields** (all optional, backward compatible):
+  `modality` ("image" | "video"), `usage`, `baselineKind`
+  ("official" | "priciest-route" | "last-leg" — the text router now stamps
+  "last-leg"), `officialUsd` (the official price for this call's usage), and
+  `estCostUsd` (the price-table prediction; `costUsd − estCostUsd` on
+  provider-reported rows is price-table drift).
+- **Cost-outlier guard.** A provider-reported cost ≥25× off the table
+  prediction (the classic USD-vs-cents slip is exactly 100×) raises `onError`
+  with both numbers; the reported bill still stands.
+- `MediaRunResult` and the terminal `MediaPollResult` expose the `usage` that
+  backed the bill.
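The settle-time rules and the usage-aware baseline can be sketched together. This is a hypothetical illustration under stated assumptions, not ai-lcr's `billableUnits` / `priceCents` / `durationFromInput`: the `Pricing`, `Usage`, and `*Sketch` names are local inventions, the 10¢/second "official" rate is made up for the example, and falling back to a 1080p reference when megapixels are missing is an assumption.

```typescript
// Hypothetical sketch of settle-time billing; NOT ai-lcr's helpers. Types are declared locally.
type Unit = 'second' | 'image' | 'call' | 'megapixel'
interface Pricing { unit: Unit; cents: number }
interface Usage { seconds?: number; outputs?: number; megapixels?: number }

const REFERENCE_SECONDS = 5                      // the 5-second reference clip
const REFERENCE_MEGAPIXELS = (1920 * 1080) / 1e6 // assumed 1080p reference image

// Reads `input.duration` as a number (8) or a string such as "8s" or "8".
function durationFromInputSketch(input: { duration?: unknown }): number | undefined {
  const d = input.duration
  if (typeof d === 'number' && Number.isFinite(d) && d > 0) return d
  if (typeof d === 'string') {
    const m = /^\s*(\d+(?:\.\d+)?)\s*s?\s*$/.exec(d)
    if (m) {
      const n = Number(m[1])
      if (n > 0) return n
    }
  }
  return undefined
}

// How many of the route's own units this call consumed.
function billableUnitsSketch(p: Pricing, usage: Usage, input: { duration?: unknown }): number {
  if (p.unit === 'second') {
    // usage.seconds, then input.duration, then the reference clip as a last resort
    return usage.seconds ?? durationFromInputSketch(input) ?? REFERENCE_SECONDS
  }
  if (p.unit === 'megapixel') return usage.megapixels ?? REFERENCE_MEGAPIXELS // fallback is an assumption
  return usage.outputs ?? 1 // per-image and per-call SKUs bill the output count
}

function priceCentsSketch(p: Pricing, usage: Usage, input: { duration?: unknown }): number {
  return p.cents * billableUnitsSketch(p, usage, input)
}

// Cost and savings baseline priced against the same usage.
function settleSketch(route: Pricing, official: Pricing, usage: Usage, input: { duration?: unknown }) {
  const costCents = priceCentsSketch(route, usage, input)
  const baselineCents = priceCentsSketch(official, usage, input)
  return { costCents, baselineCents, savingsCents: baselineCents - costCents }
}

// An 8-second clip on an 8¢/s route, against a hypothetical 10¢/s official rate.
const s = settleSketch({ unit: 'second', cents: 8 }, { unit: 'second', cents: 10 }, {}, { duration: '8s' })
console.log(s) // cost 64, baseline 80, savings 16
```

Had the baseline stayed at the 5-second reference (50¢) while the provider reported the real 64¢ bill, the same call would have shown a 14¢ loss; pricing both sides on the same usage removes that mismatch.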
+### Changed
+- `MediaJobHandle` now carries the serving route's `pricing` and the resolved
+  savings `baseline` so settle-time billing works across processes. Handles
+  serialized by 0.5.x still poll fine: they settle with the legacy
+  reference-price estimate and the submit-time baseline.
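The `estCostUsd` drift field and the cost-outlier guard from the 0.6.0 entry reduce to a subtraction and a ratio check. A hypothetical sketch: `CallRecordLike` borrows only the documented field names, while treating the 25× threshold as symmetric (too high or too low) and the `onError` callback shape are assumptions rather than ai-lcr's code.

```typescript
// Hypothetical sketch. Field names follow the 0.6.0 CallRecord entry; the symmetric
// 25x check and the onError callback shape are assumptions, not ai-lcr's code.
interface CallRecordLike {
  costUsd: number
  estCostUsd?: number // price-table prediction
}

const OUTLIER_FACTOR = 25

// Positive drift: the provider billed more than the price table predicted.
function priceTableDriftUsd(rec: CallRecordLike): number | undefined {
  return rec.estCostUsd === undefined ? undefined : rec.costUsd - rec.estCostUsd
}

// True when reported and predicted cost differ by at least 25x in either direction.
function isCostOutlier(reportedUsd: number, predictedUsd: number): boolean {
  if (reportedUsd <= 0 || predictedUsd <= 0) return false // no meaningful ratio
  const ratio = reportedUsd / predictedUsd
  return ratio >= OUTLIER_FACTOR || 1 / ratio >= OUTLIER_FACTOR
}

// Flags the outlier but still returns the provider's reported bill.
function settleReportedCost(reportedUsd: number, predictedUsd: number, onError: (err: Error) => void): number {
  if (isCostOutlier(reportedUsd, predictedUsd)) {
    onError(new Error(`reported $${reportedUsd} vs predicted $${predictedUsd}: >= ${OUTLIER_FACTOR}x apart`))
  }
  return reportedUsd
}

// A 64-cent job whose provider wrote cents into a USD field: a 100x slip.
const billed = settleReportedCost(64, 0.64, err => console.error(err.message))
```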
+## [0.5.6] — 2026-06-07
+All additions are optional and backward compatible. The sync `createMediaLCR`
+router (the callable `generate(modelId, input)`) and every adapter's `run()` are
+**unchanged** in signature and behavior.
+### Added
+- **Async media routing — `submit` / `poll` for long-running (video) jobs.**
+  The blocking media path holds a serverless invocation open until the file is
+  ready: fine for an image (seconds), impossible for a minutes-long video job.
+  `createMediaLCR(...)` now returns a callable with two methods attached:
+  ```ts
+  const lcr = createMediaLCR({ registry, adapters })
+  // process A (request handler): route + enqueue, return immediately
+  const handle = await lcr.submit('google/veo-3-lite', { prompt, aspect_ratio: '16:9' })
+  await db.save(JSON.stringify(handle))      // the handle is plain JSON
+  // process B (cron / queue worker): poll until terminal
+  const r = await lcr.poll(handle)
+  if (r.done) use(r.outputs, r.costCents)    // else keep polling r.handle
+  ```
+  - **Routing happens at `submit`** — it picks the cheapest provider whose
+    adapter supports async, and the returned `MediaJobHandle` carries the
+    not-yet-tried fallback routes (cheapest-first), the original input, and the
+    telemetry accumulator. The handle is **serializable on purpose**: submit and
+    poll typically run in different processes, so it must survive a round-trip
+    through a database or queue.
+  - **Failover happens at `poll`, not just submit.** When a provider's job fails
+    mid-poll (a `status:"error"`, a completed-but-empty job, or a thrown
+    retryable transport error such as the video-timeout `504` remap), `poll`
+    **re-submits to the next fallback provider** and hands back a fresh handle to
+    keep polling — it does not give up. A thrown error uses the standard
+    `isRetryableError` gate (so a caller-bug `400` on the poll endpoint doesn't
+    loop); a provider's own job failure always earns a fallback attempt.
+  - **Telemetry lands once, at the terminal poll.** The single correlated
+    `CallRecord` (via `onCall`) and the `onCost` event fire when the job settles
+    (`poll` → done/exhausted), carrying the full failover chain across both
+    processes — not at `submit`. The one exception: a `submit` that *no* provider
+    accepts settles a failed record there (there is no poll to do it).
+- **`MediaAdapter.submit` / `MediaAdapter.checkStatus` (both optional).** The
+  adapter contract gains the async pair, shaped to match ai-art's
+  `ProviderAdapter` so a consumer can delegate its own async runtime to ai-lcr
+  with no glue:
+  ```ts
+  submit({ externalId, input, metadata? }) -> { requestId }
+  checkStatus({ externalId, requestId }) ->
+    { status: 'queued' | 'running' | 'done' | 'error', outputs?, costCents?, units?, error? }
+  ```
+  A sync-only adapter (image-only) omits both; the async router simply skips a
+  route whose adapter can't serve async.
+- **All three bundled adapters now implement the async path:**
+  - **Kunavo** — `submit` → `POST /v1/videos`, `checkStatus` → `GET /v1/videos/{id}`
+    (video only; submitting an image id throws, since Kunavo images are sync).
+    `run()`'s blocking async path now reuses these internally.
+  - **fal** — `submit` → `POST queue.fal.run/{model}`, `checkStatus` reconstructs
+    the queue base from the id (the `fal-ai/flux/schnell` → `fal-ai/flux`
+    sub-path quirk) for cross-process polling.
+  - **Runware** — gains an **async video** path (`videoInference` with
+    `deliveryMethod:"async"`, polled via `getResponse`). Image stays on the
+    synchronous `run()`.
+- **New exported types:** `MediaSubmitRequest`, `MediaSubmitResult`,
+  `MediaStatusRequest`, `MediaStatusResult`, `MediaJobStatus`,
+  `MediaSubmitOptions`, `MediaJobHandle`, `MediaPollResult`, and `MediaLCR` (the
+  callable-with-methods return type of `createMediaLCR`).
+- **Live probe `scripts/check-media-async.mjs`** — exercises the real
+  `submit`/`poll` API across **every async provider** (kunavo · fal · runware)
+  whose key is present: submit → JSON round-trip the handle → poll to done →
+  assert the output URL fetches and cost is reported, per provider.
+  `PROBE_FAILOVER=1` adds a live submit-time failover case.
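The async adapter pair can be exercised without a network. Below is a hypothetical in-memory adapter that follows the `submit` / `checkStatus` shapes listed above; the types are declared locally rather than imported from ai-lcr, and the promise-returning signatures, string output URLs, and `example.invalid` host are assumptions made for the sketch.

```typescript
// Hypothetical in-memory adapter following the submit/checkStatus shapes in the 0.5.6 entry.
// Types mirror, but are not imported from, ai-lcr.
type JobStatus = 'queued' | 'running' | 'done' | 'error'

interface SubmitRequest { externalId: string; input: Record<string, unknown>; metadata?: Record<string, unknown> }
interface StatusRequest { externalId: string; requestId: string }
interface StatusResult {
  status: JobStatus
  outputs?: string[] // assumed: output URLs
  costCents?: number
  units?: number
  error?: string
}

// A fake provider whose job reaches 'done' on the `ticks`-th status check.
function createFakeVideoAdapter(ticks: number, centsPerCall: number) {
  const jobs = new Map<string, { checks: number }>()
  let next = 0
  return {
    async submit(req: SubmitRequest): Promise<{ requestId: string }> {
      const requestId = `${req.externalId}-${next++}`
      jobs.set(requestId, { checks: 0 })
      return { requestId }
    },
    async checkStatus(req: StatusRequest): Promise<StatusResult> {
      const job = jobs.get(req.requestId)
      if (!job) return { status: 'error', error: `unknown request ${req.requestId}` }
      job.checks++
      if (job.checks < ticks) return { status: job.checks === 1 ? 'queued' : 'running' }
      return {
        status: 'done',
        outputs: [`https://example.invalid/${req.requestId}.mp4`],
        costCents: centsPerCall,
        units: 1,
      }
    },
  }
}
```

Because the fake keeps its state in a `Map` keyed by `requestId`, it also shows why `checkStatus` needs only the ids: a worker in another process can poll a job it never submitted.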
+### Migration
+Nothing breaks. To adopt async, give your video adapters `submit`/`checkStatus`
+(the bundled fal/kunavo/runware adapters already have them) and call
+`lcr.submit(...)` / `lcr.poll(...)` instead of the blocking `lcr(...)`. The
+blocking call still works for image and for video where holding the request open
+is acceptable.
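Where a long-lived worker replaces the blocking call, the poll loop is small. This is a hypothetical driver, not part of ai-lcr: `pollUntilDone`, `Pending`, and the option names are invented, `poll` stands in for `lcr.poll` and is assumed to resolve to either a terminal result (`done: true`) or `{ done: false, handle }`, and `sleep` is injected so the loop can be tested without real delays.

```typescript
// Hypothetical worker loop; `poll` stands in for lcr.poll. Not an ai-lcr API.
interface Pending<H> { done: false; handle: H }

async function pollUntilDone<H, D extends { done: true }>(
  handle: H,
  poll: (h: H) => Promise<D | Pending<H>>,
  opts: { maxPolls: number; intervalMs: number; sleep: (ms: number) => Promise<void> },
): Promise<D> {
  let current = handle
  for (let i = 0; i < opts.maxPolls; i++) {
    const r = await poll(current)
    if (r.done) return r as D
    // Always continue with the fresh handle: it changes when poll fails over.
    current = (r as Pending<H>).handle
    await opts.sleep(opts.intervalMs)
  }
  throw new Error(`job still pending after ${opts.maxPolls} polls`)
}
```

In production, `sleep` could be `ms => new Promise(resolve => setTimeout(resolve, ms))`; a cron worker that polls once per tick needs no loop at all.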
 ## [0.5.5] — 2026-06-06
 Kunavo media (image + video) verified live and properly wired. The Kunavo

package/README.md CHANGED

@@ -291,11 +291,67 @@ USD per second, as of 2026-05 — verify current rates. Video billing differs by
 | Seedance Pro | $0.124 |
 | Veo 3.1 (audio-on) | $0.400 |
+## Image & video routing (`createMediaLCR`)
+Image and video are a separate, self-contained side of `ai-lcr` (file outputs, mixed pricing units, async jobs) — see [`src/media.ts`](src/media.ts). You give it a registry (each model's provider routes + per-unit price) and a set of adapters; it routes cheapest-first, fails over, and reports real cost through the same `onCall` sink as text.
+Two prices, two jobs: routes are **ranked** by their price normalized to one reference output (a 1080p image / a 5-second clip) so mixed units are comparable, but each settled call is **billed** on its actual usage — an 8-second clip on a per-second SKU costs 8 × the per-second rate, and its savings baseline is the official price for those same 8 seconds. Adapters report typed usage (`usage: { seconds, outputs, megapixels }`); when a provider returns its own bill, that wins, and a bill wildly off the price table (the classic USD-vs-cents slip is exactly 100×) raises `onError` so the table gets fixed.
+```ts
+import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
+const lcr = createMediaLCR({
+  registry: {
+    'google/veo-3-lite': {
+      id: 'google/veo-3-lite', modality: 'video',
+      routes: [
+        { provider: 'kunavo', externalId: 'veo-3-lite',        pricing: { unit: 'call',   cents: 16 } },
+        { provider: 'fal',    externalId: 'fal-ai/veo3.1/lite', pricing: { unit: 'second', cents: 8  } },
+      ],
+    },
+  },
+  adapters: {
+    kunavo: createKunavoMediaAdapter({ apiKey: process.env.KUNAVO_API_KEY! }),
+    fal:    createFalMediaAdapter({ apiKey: process.env.FAL_KEY! }),
+  },
+  onCall: rec => console.log(rec.winner, rec.costUsd, rec.failedOver),
+})
+// Blocking call: resolves once the file is ready (fine for images; for video it holds the request open).
+const { outputs, provider, costCents } = await lcr('google/veo-3-lite', { prompt: 'a wave' })
+```
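With the two routes in the registry above, the reference normalization decides the order. A hypothetical sketch, not ai-lcr's router: `referenceCents` and `rankCheapestFirst` are invented, the local `Unit` type only mirrors the `'call'` / `'second'` units used in the example (plus assumed `'image'` / `'megapixel'`), and the 1080p megapixel reference is an assumption.

```typescript
// Hypothetical sketch of reference-normalized ranking; not ai-lcr's router.
type Unit = 'call' | 'second' | 'image' | 'megapixel'
interface Route { provider: string; externalId: string; pricing: { unit: Unit; cents: number } }

const REF_SECONDS = 5                      // reference clip length
const REF_MEGAPIXELS = (1920 * 1080) / 1e6 // assumed 1080p reference image

// Price of one reference output on this route, so mixed units become comparable.
function referenceCents(r: Route): number {
  const { unit, cents } = r.pricing
  if (unit === 'second') return cents * REF_SECONDS
  if (unit === 'megapixel') return cents * REF_MEGAPIXELS
  return cents // one call or one image already is one reference output
}

function rankCheapestFirst(routes: Route[]): Route[] {
  return [...routes].sort((a, b) => referenceCents(a) - referenceCents(b))
}

const routes: Route[] = [
  { provider: 'fal',    externalId: 'fal-ai/veo3.1/lite', pricing: { unit: 'second', cents: 8 } },
  { provider: 'kunavo', externalId: 'veo-3-lite',         pricing: { unit: 'call',   cents: 16 } },
]
console.log(rankCheapestFirst(routes).map(r => `${r.provider}: ${referenceCents(r)}¢`).join(', '))
// kunavo: 16¢, fal: 40¢
```

Kunavo is tried first. If it fails over to fal for an 8-second clip, the settled bill follows actual usage (8 × 8¢ = 64¢), not the 40¢ used for ranking.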
+### Async (`submit` / `poll`) — for long-running video
+A minutes-long video job can't hold a serverless request open. `submit` routes + enqueues and returns a **plain-JSON handle**; `poll` checks it. The two run in different processes — the handle survives a database/queue hop.
+```ts
+// Process A (request handler): route + enqueue, return immediately.
+// `db` and `save` stand in for your own storage code.
+async function startJob(jobId: string) {
+  const handle = await lcr.submit('google/veo-3-lite', { prompt: 'a wave', aspect_ratio: '16:9' })
+  await db.jobs.put(jobId, JSON.stringify(handle))      // the handle is plain JSON
+}
+// Process B (cron / queue worker): call once per tick until the job is terminal.
+async function pollJob(jobId: string) {
+  const handle = JSON.parse(await db.jobs.get(jobId))
+  const r = await lcr.poll(handle)
+  if (r.done) {
+    save(r.outputs, r.costCents)                        // settled; telemetry already emitted
+  } else {
+    await db.jobs.put(jobId, JSON.stringify(r.handle))  // store the fresh handle (it changes on failover)
+  }
+}
+```
+Design choices worth knowing:
+- **Routing is at `submit`** (cheapest async-capable provider); the handle carries the not-yet-tried fallbacks, so…
+- **Failover is at `poll`** — a provider whose job fails mid-poll is re-submitted to the next provider automatically (a fresh `r.handle` to keep polling), rather than the request just dying.
+- **Telemetry lands once, at the terminal poll** — one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not at `submit`).
+- An adapter advertises async by implementing `submit` + `checkStatus`; image-only adapters omit them and are skipped by the async router. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal covers both).
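The design choices above (route at `submit`, fail over at `poll`, thread the chain through the handle) can be modeled in a few lines. This is a hypothetical sketch, not ai-lcr's implementation: the `Handle` fields, the simplified `AsyncAdapter` interface, and `pollSketch` are invented for illustration, and the `isRetryableError` gate for thrown transport errors is left out.

```typescript
// Hypothetical model of "failover at poll". The handle shape is invented for illustration.
type Status =
  | { status: 'queued' | 'running' }
  | { status: 'done'; outputs: string[] }
  | { status: 'error'; error: string }

interface AsyncAdapter {
  submit(input: { prompt: string }): Promise<string> // resolves to a request id
  checkStatus(requestId: string): Promise<Status>
}

interface Handle {
  provider: string
  requestId: string
  fallbacks: string[] // not-yet-tried providers, cheapest first
  chain: string[]     // providers tried so far, threaded across processes
  input: { prompt: string }
}

type PollResult =
  | { done: false; handle: Handle }
  | { done: true; ok: boolean; outputs: string[]; chain: string[] }

async function pollSketch(h: Handle, adapters: Record<string, AsyncAdapter>): Promise<PollResult> {
  const s = await adapters[h.provider].checkStatus(h.requestId)
  if (s.status === 'queued' || s.status === 'running') return { done: false, handle: h }
  if (s.status === 'done' && s.outputs.length > 0) {
    return { done: true, ok: true, outputs: s.outputs, chain: h.chain }
  }
  // A failed job or a completed-but-empty one: re-submit to the next fallback provider.
  if (h.fallbacks.length === 0) return { done: true, ok: false, outputs: [], chain: h.chain }
  const [next, ...rest] = h.fallbacks
  const requestId = await adapters[next].submit(h.input)
  return { done: false, handle: { ...h, provider: next, requestId, fallbacks: rest, chain: [...h.chain, next] } }
}
```

Since every field of `Handle` is plain data, the fresh handle returned on failover can be written back to a database and picked up by the next poll in another process.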
 ## Vetting a provider (capability + cost probe)
 A discount is worthless if the provider quietly breaks the wire protocol. `ai-lcr` ships a zero-dependency check (`scripts/check-provider.sh`, just `bash` + `curl` + `python3`) that vets the things that actually cost you money or corrupt output, **per model**:
-> **Media providers** have their own probe: `scripts/check-kunavo-media.sh` (`bash` + `curl` + `jq`) live-tests Kunavo's image generation, `*-edit` reference endpoint, and async + sync video — the same checks used to verify the routes above. Run it before trusting a media route in production.
+> **Media providers** have their own probes: `scripts/check-kunavo-media.sh` (`bash` + `curl` + `jq`) live-tests Kunavo's image generation, `*-edit` reference endpoint, and async + sync video; `scripts/check-media-async.mjs` exercises `ai-lcr`'s own `submit`/`poll` API across **every async provider** (kunavo · fal · runware) whose key is present — submit → JSON round-trip the handle → poll to done → assert the URL fetches and cost is reported, per provider (`PROBE_FAILOVER=1` adds a live submit-time failover case). Run them before trusting a media route in production.
 - **tool calling** — single call and a multi-step round-trip with `content: null` (the shape every agent loop sends)
 - **`max_tokens` honored** — caps must bound output

package/README.zh-CN.md CHANGED

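@@ -207,10 +207,66 @@ Kunavo serves Anthropic + Google. DeepSeek / OpenAI / Grok / Mistral route to
 | Seedance Pro | $0.124 |
 | Veo 3.1 (audio-on) | $0.400 |
+## Image & video routing (`createMediaLCR`)
+Image and video are a separate, self-contained side of `ai-lcr` (outputs are files, pricing units are mixed, video runs as async jobs); see [`src/media.ts`](src/media.ts). You give it a registry (each model's provider routes + per-unit price) and a set of adapters; it routes cheapest-first, fails over automatically, and reports real/normalized cost through the same `onCall` sink as the text side.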
+```ts
+import { createMediaLCR, createKunavoMediaAdapter, createFalMediaAdapter } from 'ai-lcr'
+const lcr = createMediaLCR({
+  registry: {
+    'google/veo-3-lite': {
+      id: 'google/veo-3-lite', modality: 'video',
+      routes: [
+        { provider: 'kunavo', externalId: 'veo-3-lite',        pricing: { unit: 'call',   cents: 16 } },
+        { provider: 'fal',    externalId: 'fal-ai/veo3.1/lite', pricing: { unit: 'second', cents: 8  } },
+      ],
+    },
+  },
+  adapters: {
+    kunavo: createKunavoMediaAdapter({ apiKey: process.env.KUNAVO_API_KEY! }),
+    fal:    createFalMediaAdapter({ apiKey: process.env.FAL_KEY! }),
+  },
+  onCall: rec => console.log(rec.winner, rec.costUsd, rec.failedOver),
+})
+// Sync: resolves only once the file is ready (fine for images).
+const { outputs, provider, costCents } = await lcr('google/veo-3-lite', { prompt: 'a wave' })
+```
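+### Async (`submit` / `poll`): for long-running video
+A video job that takes several minutes can't keep a serverless request hanging open. `submit` handles routing + enqueueing and returns a **plain-JSON handle**; `poll` checks on it. The two run in different processes, and the handle survives a round-trip through a database or queue.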
+```ts
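+// Process A (request handler): route + enqueue, return immediately.
+// `db` and `save` stand in for your own storage code.
+async function startJob(jobId: string) {
+  const handle = await lcr.submit('google/veo-3-lite', { prompt: 'a wave', aspect_ratio: '16:9' })
+  await db.jobs.put(jobId, JSON.stringify(handle))
+}
+// Process B (cron / queue worker): poll until terminal, once per tick.
+async function pollJob(jobId: string) {
+  const handle = JSON.parse(await db.jobs.get(jobId))
+  const r = await lcr.poll(handle)
+  if (r.done) {
+    save(r.outputs, r.costCents)                        // settled; one telemetry record has already been emitted
+  } else {
+    await db.jobs.put(jobId, JSON.stringify(r.handle))  // keep polling r.handle
+  }
+}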
+```
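+Design trade-offs worth knowing:
+- **Routing happens at `submit`** (the cheapest async-capable provider is chosen); the handle carries the list of not-yet-tried fallbacks, so…
+- **Failover happens at `poll`**: when a provider's job fails mid-poll, it is automatically re-submitted to the next provider (returning a fresh `r.handle` to keep polling) instead of letting the request simply die.
+- **Telemetry lands exactly once, at the terminal poll**: one `onCall` `CallRecord` with the full failover chain, threaded across both processes (not emitted at `submit`).
+- An adapter declares async support by implementing `submit` + `checkStatus`; image-only adapters omit them, and the async router skips their routes. The bundled Kunavo, fal, and Runware adapters all implement the async path (Kunavo/Runware async is video-only; fal handles both image and video).
 ## Vetting a provider (capability + cost probe)
 However big the discount, it's worthless if the provider quietly breaks the protocol. `ai-lcr` ships a zero-dependency check script (`scripts/check-provider.sh`, needing only `bash` + `curl` + `python3`) that checks, **per model**, the things that actually cost you extra money or corrupt output:
+> **Media providers have their own probes:** `scripts/check-kunavo-media.sh` (`bash` + `curl` + `jq`) live-tests Kunavo's image generation, `*-edit` reference-image endpoint, and async + sync video; `scripts/check-media-async.mjs` runs `ai-lcr`'s own `submit`/`poll` API **per provider** (kunavo · fal · runware, only those whose key is present): submit → JSON round-trip the handle → poll to done → assert the URL actually fetches and cost is reported (`PROBE_FAILOVER=1` adds a live submit-time failover case). Run them once before going to production.
 - **Tool calling**: a single call + a multi-step round-trip with `content: null` (the shape every agent loop sends)
 - **Whether `max_tokens` is honored**: the cap must bound output length
 - **Hidden prompt injection**: send a neutral message; if the model starts responding to a system prompt it never received, the provider has injected something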