ai-lcr 0.2.5 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +59 -0
- package/README.md +81 -3
- package/README.zh-CN.md +43 -2
- package/dist/index.cjs +160 -38
- package/dist/index.d.cts +110 -24
- package/dist/index.d.ts +110 -24
- package/dist/index.js +159 -38
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,64 @@ All notable changes to `ai-lcr` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project adheres to
 [Semantic Versioning](https://semver.org/).
 
+## [0.3.0] — 2026-06-02
+
+Integration-feedback pass from wiring ai-lcr into a real agentic product
+(multi-step tool loops, Anthropic prompt caching). All additions are optional
+and backward compatible.
+
+### Fixed
+
+- **`createHttpSink` is exported again.** It shipped in 0.2.0, then silently
+dropped out of the package somewhere after — so `import { createHttpSink }`
+(as the integration playbook documents) failed with TS2305 on 0.2.1+. The
+source and tests are restored and the symbol is now pinned in the public-API
+smoke test so it can't regress unnoticed.
+- **Capability probe no longer false-FAILs tool support.** `check-provider.sh`
+tested tools with `tool_choice:"auto"` and a single roll — reasoning / chatty
+models often answer in text instead of calling, which looked identical to
+dropped tools. It now forces `tool_choice:"required"` (testing *can* the
+provider call a tool, not *will* the model decide to). The token-inflation
+parser also surfaces a stderr diagnostic on a parse failure instead of
+silently returning empty (which masqueraded as an inconclusive result).
+
+### Added
+
+- **`CallRecord.baselineUsd` on the text side.** The text router now fills the
+savings baseline — the same token usage priced on the most expensive priced
+provider in the chain — so `baselineUsd − costUsd` (the headline a cost
+dashboard shows) is computable for text, not just media.
+- **Prompt-cache-aware cost.** `ProviderCost` gains an optional `cacheRead`
+(USD per 1M cached input tokens). When a call reports
+`usage.inputTokens.cacheRead`, those tokens bill at that rate; omit it and
+they fall back to the full `input` rate (unchanged). `CallRecord` exposes
+`cachedInputTokens` for auditing. Accounting only — routing weights are
+unchanged in this release.
+- **`CallRecord.requestId` passthrough.** Read from `providerOptions.lcr.requestId`;
+stamp the same id on every step of a tool loop to roll a multi-step request
+up into one cost figure on the dashboard.
+- **`CallRecord.usageMissing` flag.** Set when the winner served OK but reported
+zero input *and* output tokens — i.e. the provider emitted no usage, so
+`costUsd` (and any token-based credit metering) silently reads 0. Surfaces the
+difference between "free" and "cost unknown"; `formatCallRecord` shows it as
+`⚠no-usage`, and a savings suffix `(saved $X)` when `baselineUsd` beats cost.
+
+## [0.2.6] — 2026-06-01
+
+### Changed
+
+- **fal media adapter now covers image *and* video** via fal's async queue API
+(submit → poll `status_url` → fetch `response_url`), replacing the synchronous
+image-only `fal.run` adapter shipped in 0.2.5. This is ai-lcr's first working
+**video** execution path: the registry already priced/routed the Veo family
+but no adapter could run it. Same house style — raw `fetch`, injectable
+`fetchImpl`, no provider SDK; `Authorization: Key` (not Bearer); cost left to
+the router's normalized estimate (the queue result carries no per-call price).
+Following the submit response's `status_url`/`response_url` sidesteps fal's
+sub-path quirk (`fal-ai/flux/schnell` submits to the full path, but status and
+result live under the `fal-ai/flux` base). `createFalMediaAdapter`'s public
+name is unchanged; image callers are unaffected.
+
 ## [0.2.5] — 2026-06-01
 
 Pre-launch failover-robustness + media-provider pass — closing cases where a
@@ -104,5 +162,6 @@ Release-quality and engine-correctness pass.
 - Dual ESM/CJS build. Media (image/video) least-cost routing with the Runware
 and Kunavo adapters; cap-aware failover for the text router.
 
+[0.2.6]: https://github.com/victorzhrn/ai-lcr/releases/tag/v0.2.6
 [0.2.5]: https://github.com/victorzhrn/ai-lcr/releases/tag/v0.2.5
 [0.2.3]: https://github.com/victorzhrn/ai-lcr/releases/tag/v0.2.3
package/README.md
CHANGED
@@ -98,6 +98,46 @@ const lcr = createLCR({
 
 The same pattern works for any vendor's native SDK provider — `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a native vendor API with aggregators in one model's list. Native APIs are narrow (only that vendor's models) but featureful; aggregators are broad. **Official-first + aggregator-fallback** is the natural LCR shape.
 
+## Cheapest route for open-weights models (DeepInfra)
+
+For open-weights models — DeepSeek, Kimi, MiniMax, GLM, Qwen — a dedicated inference host is usually the cheapest route, well under aggregator pricing. [DeepInfra](https://deepinfra.com) is OpenAI-compatible, so it slots in as just another entry. **One gotcha:** its OpenAI endpoint lives at `/v1/openai` (the `/v1/` precedes `openai`), not the usual `/v1`:
+
+```ts
+import { createLCR } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: "https://api.deepinfra.com/v1/openai", // note: /v1/openai, not /v1
+  apiKey: process.env.DEEPINFRA_API_KEY,
+});
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    // DeepInfra is cheapest; OpenRouter is the breadth/uptime fallback.
+    // DeepInfra uses HuggingFace-style ids (org/Name).
+    "deepseek-v4-flash": [
+      { model: deepinfra("deepseek-ai/DeepSeek-V4-Flash"), label: "deepinfra", cost: { input: 0.10, output: 0.20 } },
+      { model: openrouter("deepseek/deepseek-v4-flash"), label: "openrouter", cost: { input: 0.27, output: 1.10 } },
+    ],
+    "minimax-m2.5": [
+      { model: deepinfra("MiniMaxAI/MiniMax-M2.5"), label: "deepinfra", cost: { input: 0.15, output: 1.15 } },
+    ],
+    "kimi-k2.5": [
+      { model: deepinfra("moonshotai/Kimi-K2.5"), label: "deepinfra", cost: { input: 0.45, output: 2.25 } },
+    ],
+  },
+});
+```
+
+DeepInfra carries open weights only — no first-party Claude / GPT / Gemini. For those closed models, route through OpenRouter or a discount gateway instead.
+
 ## How it routes
 
 1. **Cheapest first.** Providers are tried in order — list them cheapest-first, or set `autoSort: true` to order them by `cost` automatically.
@@ -146,17 +186,54 @@ interface CallRecord {
   latencyMs: number;
   inputTokens: number;
   outputTokens: number;
-
+  cachedInputTokens?: number; // prompt-cache hits the winner read (when reported)
+  costUsd: number; // winner cost, cache-discount applied (see `cacheRead`)
+  baselineUsd?: number; // same usage on the priciest priced leg → savings = baselineUsd − costUsd
+  requestId?: string; // your correlation id (see below) — roll multi-step tool loops into one request
+  usageMissing?: boolean; // winner served but reported 0/0 tokens → costUsd is 0 but unknown, not free
 }
 ```
 
+**Savings, not just spend.** Whenever at least one provider in a chain carries a `cost`, `baselineUsd` is what the same call would have cost on the most expensive priced leg (typically your safety-net fallback). `baselineUsd − costUsd` is the money routing saved on that call — the number a cost dashboard exists to show.
+
+**Cache-aware cost.** Add `cacheRead` (USD per 1M cached input tokens) to a provider's `cost` and ai-lcr bills prompt-cache hits at that rate when the call reports `usage.inputTokens.cacheRead`. Omit it and cached tokens fall back to the full `input` rate (unchanged from before). For cache-heavy traffic (e.g. Anthropic, where a cache read is ~0.1×) this keeps `costUsd` honest — and `cachedInputTokens` lets a dashboard audit it:
+
+```ts
+{ model: claude, label: "anthropic", cost: { input: 3, output: 15, cacheRead: 0.3 } }
+```
+
+**Group a multi-step request.** An agentic turn does one `onCall` per `doStream`/`doGenerate` step, so a 10-step tool loop emits 10 records. Pass a stable id through `providerOptions.lcr.requestId` and every step's record carries it — group by `requestId` for per-request cost:
+
+```ts
+await streamText({ model: lcr("chat"), messages, providerOptions: { lcr: { requestId } } });
+```
+
+### Ship records to a collector (`createHttpSink`)
+
+`createHttpSink` builds an `onCall` handler that POSTs each `CallRecord` as JSON to an endpoint (e.g. a self-hosted dashboard's `/api/ingest`, or any drain that takes the shape). Fire-and-forget — a failed POST never breaks your app. On serverless, pass a `waitUntil`-style `dispatch` (Next.js: `after`) so the request isn't cut off:
+
+```ts
+import { createLCR, createHttpSink } from "ai-lcr";
+import { after } from "next/server";
+
+const lcr = createLCR({
+  models: { /* … */ },
+  onCall: createHttpSink({
+    url: process.env.LCR_INGEST_URL + "/api/ingest",
+    headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+    project: process.env.LCR_PROJECT, // optional tenant tag merged into each payload
+    dispatch: after,
+  }),
+});
+```
+
 ## Supported providers
 
 Any OpenAI-compatible endpoint works — and so does any AI SDK provider package, including a model vendor's own official API.
 
 - **Model vendors' own APIs (native):** route straight to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. via their AI SDK provider packages — no markup, full native features. See [Route to a model vendor's own API](#route-to-a-model-vendors-own-api-native-providers).
 - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off** every model) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
-- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) —
+- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai) — routing via `createMediaLCR`. Image: Kunavo + Runware + fal. Video: fal (live, via its async queue API); Kunavo's Veo poll path is implemented but unverified
 
 ## Text model pricing
 
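The README leaves the actual aggregation to whatever receives the records. As a sketch of what "group by `requestId`" could look like on the collector side, the code below sums `costUsd` per request, counts steps, and adds up per-step savings only where a baseline exists. It also carries `usageMissing` forward, so a request with any unmetered step is marked as a lower bound rather than an exact figure. The `rollUp` function and the trimmed `RecordLike` type are hypothetical and not ai-lcr exports; the field names come from the `CallRecord` excerpt above plus `ok`, which `formatCallRecord` reads.

```ts
// Trimmed-down CallRecord: only the fields this roll-up reads (hypothetical shape).
interface RecordLike {
  ok: boolean;
  costUsd: number;
  baselineUsd?: number;
  requestId?: string;
  usageMissing?: boolean;
}

interface RequestTotal {
  steps: number;
  costUsd: number;
  savedUsd: number;
  // True if any step reported no usage, so costUsd is only a lower bound.
  costUnknown: boolean;
}

// Records without a requestId are lumped together under this key.
const NO_ID = "(no requestId)";

function rollUp(records: RecordLike[]): Map<string, RequestTotal> {
  const totals = new Map<string, RequestTotal>();
  for (const r of records) {
    const key = r.requestId ?? NO_ID;
    const t: RequestTotal = totals.get(key) ?? { steps: 0, costUsd: 0, savedUsd: 0, costUnknown: false };
    t.steps += 1;
    t.costUsd += r.costUsd;
    // Savings only count for served calls that carry a baseline.
    if (r.ok && r.baselineUsd !== undefined) {
      t.savedUsd += Math.max(0, r.baselineUsd - r.costUsd);
    }
    if (r.usageMissing) t.costUnknown = true;
    totals.set(key, t);
  }
  return totals;
}

// Illustrative records: a two-step tool loop, one unmetered step, one failed call.
const sample: RecordLike[] = [
  { ok: true, costUsd: 0.002, baselineUsd: 0.005, requestId: "req-1" },
  { ok: true, costUsd: 0.001, baselineUsd: 0.004, requestId: "req-1" },
  { ok: true, costUsd: 0, baselineUsd: 0, requestId: "req-2", usageMissing: true },
  { ok: false, costUsd: 0 },
];

const totals = rollUp(sample);
for (const [id, t] of totals) console.log(id, t);
```

Clamping each step's savings at zero mirrors `formatCallRecord`, which only prints `(saved $X)` when the baseline strictly exceeds the cost.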
@@ -273,7 +350,8 @@ Two OpenAI-compatible providers, same probe, same day. Cells cover both families
 - [ ] Bundled price table for zero-config pricing (drop the manual `cost` numbers)
 - [ ] Provider-quirk middleware (transparently patch known per-provider request quirks, e.g. Kunavo's ignored `max_tokens`)
 - [ ] Feed probe results into routing automatically (auto-exclude a model from a provider that fails its probe)
-- [
+- [x] Image & video model routing (`createMediaLCR`) — image via Kunavo + Runware + fal; **video live via fal** (async queue API)
+- [ ] Normalized cross-provider video price comparison + verified Kunavo/Runware video adapters
 
 ## Affiliate disclosure
 
package/README.zh-CN.md
CHANGED
@@ -98,6 +98,46 @@ const lcr = createLCR({
 
 The same pattern works for any vendor's native SDK provider: `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, `@ai-sdk/xai`, and so on. They all return `LanguageModelV3`, so you can mix a vendor's native API and aggregators in one model's list. Native APIs have narrow coverage (only that vendor's own models) but full features; aggregators have broad coverage. **Official-first + aggregator-fallback** is exactly the most natural shape for LCR.
 
+## Cheapest route for open-weights models (DeepInfra)
+
+For open-weights models (DeepSeek, Kimi, MiniMax, GLM, Qwen), a dedicated inference host is usually the cheapest route, clearly below aggregator pricing. [DeepInfra](https://deepinfra.com) is OpenAI-compatible, so you can simply add it as one more entry in the list. **One gotcha:** its OpenAI endpoint is at `/v1/openai` (`/v1/` comes **before** `openai`), not the usual `/v1`:
+
+```ts
+import { createLCR } from "ai-lcr";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
+
+const deepinfra = createOpenAICompatible({
+  name: "deepinfra",
+  baseURL: "https://api.deepinfra.com/v1/openai", // note: /v1/openai, not /v1
+  apiKey: process.env.DEEPINFRA_API_KEY,
+});
+const openrouter = createOpenAICompatible({
+  name: "openrouter",
+  baseURL: "https://openrouter.ai/api/v1",
+  apiKey: process.env.OPENROUTER_API_KEY,
+});
+
+const lcr = createLCR({
+  autoSort: true,
+  models: {
+    // DeepInfra is cheapest; OpenRouter is the broad-coverage / availability fallback.
+    // DeepInfra uses HuggingFace-style ids (org/Name).
+    "deepseek-v4-flash": [
+      { model: deepinfra("deepseek-ai/DeepSeek-V4-Flash"), label: "deepinfra", cost: { input: 0.10, output: 0.20 } },
+      { model: openrouter("deepseek/deepseek-v4-flash"), label: "openrouter", cost: { input: 0.27, output: 1.10 } },
+    ],
+    "minimax-m2.5": [
+      { model: deepinfra("MiniMaxAI/MiniMax-M2.5"), label: "deepinfra", cost: { input: 0.15, output: 1.15 } },
+    ],
+    "kimi-k2.5": [
+      { model: deepinfra("moonshotai/Kimi-K2.5"), label: "deepinfra", cost: { input: 0.45, output: 2.25 } },
+    ],
+  },
+});
+```
+
+DeepInfra hosts open weights only: there is no first-party Claude / GPT / Gemini. For those closed models, go through OpenRouter or a discount relay.
+
 ## How it routes
 
 1. **Cheapest first.** Providers are tried in order: list them cheapest-first, or set `autoSort: true` to sort them by `cost` automatically.
@@ -114,7 +154,7 @@ const lcr = createLCR({
 
 - **Model vendors' official APIs (native):** connect directly to [DeepSeek](https://platform.deepseek.com), [OpenAI](https://openai.com), [Anthropic](https://anthropic.com), [Google](https://ai.google.dev), [xAI](https://x.ai), etc. through each vendor's AI SDK provider package: no markup, full native features. See the "Route directly to a model vendor's official API (native provider)" section above.
 - **Text aggregators:** [OpenRouter](https://openrouter.ai) (widest coverage, list pricing) · [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off all models**) · [TokenMart](https://thetokenmart.ai) (15–65% off, varies by model)
-- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai):
+- **Image / video:** [Kunavo](https://kunavo.com/?ref=victorimf) (**20% off**) · [TokenMart](https://thetokenmart.ai) · [fal.ai](https://fal.ai) · [Runware](https://runware.ai): routed via `createMediaLCR`. Image: Kunavo + Runware + fal. Video: fal (live, via its async queue API); Kunavo's Veo poll path is implemented but unverified
 
 ## Text model pricing
 
@@ -229,7 +269,8 @@ API_KEY=$TOKENMART_API_KEY BASE=https://api.tokenmart.ai \
 - [ ] Built-in price table for zero-config pricing (no more hand-filled `cost` numbers)
 - [ ] Provider-quirk middleware (transparently patch known quirks, e.g. Kunavo's ignored `max_tokens`)
 - [ ] Feed probe results into routing automatically (provider×model pairs that fail the probe are removed from the list automatically)
-- [
+- [x] Image and video model routing (`createMediaLCR`): image via Kunavo + Runware + fal; **video is live, via fal** (async queue API)
+- [ ] Normalized cross-provider video price comparison + verified Kunavo/Runware video adapters
 
 ## Affiliate disclosure
 
package/dist/index.cjs
CHANGED
@@ -27,6 +27,7 @@ __export(index_exports, {
   classifyErrorKind: () => classifyErrorKind,
   comparePrices: () => comparePrices,
   createFalMediaAdapter: () => createFalMediaAdapter,
+  createHttpSink: () => createHttpSink,
   createKunavoMediaAdapter: () => createKunavoMediaAdapter,
   createLCR: () => createLCR,
   createMediaLCR: () => createMediaLCR,
@@ -186,6 +187,16 @@ function newCallId() {
   if (c?.randomUUID) return c.randomUUID();
   return `lcr_${Date.now().toString(36)}_${(callSeq++).toString(36)}`;
 }
+function costForUsage(cost, inputTokens, outputTokens, cacheReadTokens) {
+  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
+  const fullInput = inputTokens - cached;
+  const cachedRate = cost.cacheRead ?? cost.input;
+  return fullInput / 1e6 * cost.input + cached / 1e6 * cachedRate + outputTokens / 1e6 * cost.output;
+}
+function requestIdFrom(options) {
+  const raw = options.providerOptions?.lcr?.requestId;
+  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
+}
 var LcrFallbackModel = class {
   constructor(opts) {
     this.opts = opts;
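The new `costForUsage` helper contains the whole cache-aware pricing rule. Cached tokens are clamped to the range [0, inputTokens] and billed at `cacheRead`, or at `input` when `cacheRead` is omitted. The remaining input tokens and all output tokens are billed at their sticker rates. Below is a typed restatement of the same arithmetic with a worked example. It uses the README's illustrative Anthropic prices (`input: 3, output: 15, cacheRead: 0.3`); the token counts are made up.

```ts
// USD per 1M tokens, matching the ProviderCost shape in index.d.cts.
interface ProviderCost {
  input: number;
  output: number;
  cacheRead?: number; // optional rate for prompt-cache hits
}

// Typed copy of the dist's costForUsage arithmetic.
function costForUsage(
  cost: ProviderCost,
  inputTokens: number,
  outputTokens: number,
  cacheReadTokens: number,
): number {
  // Clamp so a bogus cache count can never go negative or exceed the input total.
  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
  const fullInput = inputTokens - cached;
  // Without a cacheRead rate, cached tokens are billed like ordinary input.
  const cachedRate = cost.cacheRead ?? cost.input;
  return (
    (fullInput / 1e6) * cost.input +
    (cached / 1e6) * cachedRate +
    (outputTokens / 1e6) * cost.output
  );
}

const withRate: ProviderCost = { input: 3, output: 15, cacheRead: 0.3 };
const withoutRate: ProviderCost = { input: 3, output: 15 };

// A 100k-token prompt, 90k of it served from cache, plus 1k output tokens.
// With cacheRead:    10k*3 + 90k*0.3 + 1k*15 per 1M  -> about $0.072
// Without cacheRead: 100k*3 + 1k*15 per 1M          -> about $0.315
const cheap = costForUsage(withRate, 100_000, 1_000, 90_000);
const full = costForUsage(withoutRate, 100_000, 1_000, 90_000);
console.log(cheap.toFixed(4), full.toFixed(4));
```

The clamp matters because `cacheRead` is reported separately from the input total. If a provider reported a cache count larger than the input, it would otherwise produce a negative `fullInput` and undercount cost.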
@@ -267,8 +278,13 @@ var LcrFallbackModel = class {
     } catch {
     }
   }
-  startCall() {
-    return {
+  startCall(options) {
+    return {
+      id: newCallId(),
+      attempts: [],
+      startedAt: Date.now(),
+      requestId: requestIdFrom(options)
+    };
   }
   /** Record a failed attempt onto the call's chain (no event yet). */
   recordFail(ctx, provider, attemptStart, error) {
@@ -280,12 +296,29 @@ var LcrFallbackModel = class {
       kind: classifyErrorKind(error)
     });
   }
+  /**
+   * Baseline = what this same usage would have cost on the most expensive
+   * *priced* provider in the chain (typically the OpenRouter fallback leg). The
+   * winner's savings is `baselineUsd - costUsd`. Undefined when no provider in
+   * the chain carries a price (nothing to compare against).
+   */
+  baselineUsd(inputTokens, outputTokens, cacheReadTokens) {
+    let max;
+    for (const p of this.opts.providers) {
+      if (!p.cost) continue;
+      const c = costForUsage(p.cost, inputTokens, outputTokens, cacheReadTokens);
+      if (max === void 0 || c > max) max = c;
+    }
+    return max;
+  }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
   finalizeOk(ctx, provider, attemptStart, usage) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
-    const
+    const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
+    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const usageMissing = inputTokens === 0 && outputTokens === 0;
     this.emitCost({
       model: this.opts.modelName,
       provider: provider.label,
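`baselineUsd` reprices the winner's exact token usage on every priced provider in the chain and keeps the maximum. Unpriced providers are skipped, and the result stays `undefined` when nothing in the chain is priced. The standalone sketch below reproduces that selection using the DeepInfra and OpenRouter rates from the README's `deepseek-v4-flash` chain, plus one unpriced leg. The `Leg` type, the `priceOf` helper and the `self-hosted` leg are illustrative assumptions. Cache tokens are left out for brevity; the real method passes them through `costForUsage`.

```ts
interface Cost {
  input: number; // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

interface Leg {
  label: string;
  cost?: Cost; // unpriced legs are allowed and ignored by the baseline
}

function priceOf(cost: Cost, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * cost.input + (outputTokens / 1e6) * cost.output;
}

// Most expensive priced leg for this usage; undefined when no leg has a cost.
function baselineUsd(legs: Leg[], inputTokens: number, outputTokens: number): number | undefined {
  let max: number | undefined;
  for (const leg of legs) {
    if (!leg.cost) continue;
    const c = priceOf(leg.cost, inputTokens, outputTokens);
    if (max === undefined || c > max) max = c;
  }
  return max;
}

const deepinfraCost: Cost = { input: 0.1, output: 0.2 };
const openrouterCost: Cost = { input: 0.27, output: 1.1 };
const chain: Leg[] = [
  { label: "deepinfra", cost: deepinfraCost },
  { label: "openrouter", cost: openrouterCost },
  { label: "self-hosted" }, // hypothetical unpriced leg
];

// One illustrative call: 10k input tokens, 2k output tokens, won by DeepInfra.
const baseline = baselineUsd(chain, 10_000, 2_000); // about 0.0049 (OpenRouter)
const winnerCost = priceOf(deepinfraCost, 10_000, 2_000); // about 0.0014
console.log(baseline, winnerCost);
```

The baseline uses the maximum rather than the next provider in line. That way, the reported savings do not depend on which leg happened to be tried second.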
@@ -303,7 +336,11 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens,
       outputTokens,
-
+      ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
+      costUsd,
+      baselineUsd: this.baselineUsd(inputTokens, outputTokens, cacheReadTokens),
+      ...ctx.requestId ? { requestId: ctx.requestId } : {},
+      ...usageMissing ? { usageMissing: true } : {}
     });
   }
   /** Every provider failed: fire `onCall` with no winner. */
@@ -318,11 +355,12 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens: 0,
       outputTokens: 0,
-      costUsd: 0
+      costUsd: 0,
+      ...ctx.requestId ? { requestId: ctx.requestId } : {}
     });
   }
   async doGenerate(options) {
-    const ctx = this.startCall();
+    const ctx = this.startCall(options);
     const providers = this.opts.providers;
     const n = providers.length;
     const start = this.startIndex();
@@ -351,7 +389,7 @@ var LcrFallbackModel = class {
     throw lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
   }
   // The stream's failover recursion re-enters here with the SAME `ctx` and a
   // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
@@ -475,6 +513,10 @@ function formatCallRecord(record, opts = {}) {
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
   let line = `${glyph} ${record.model} ${chain} ${record.latencyMs}ms ${status}`;
+  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
+    line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
+  }
+  if (record.usageMissing) line += ` \u26A0no-usage`;
   const failed = record.attempts.filter((a) => !a.ok);
   if (failed.length > 0) {
     const reasons = failed.map((a) => `${a.provider} ${a.errorClass ?? "error"}`).join(", ");
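The two new log suffixes follow directly from record fields. `(saved $X)` appears only when the call succeeded and the baseline strictly exceeds the winner's cost. `⚠no-usage` appears whenever `usageMissing` is set, which `finalizeOk` derives from a 0/0 token count. The standalone restatement below reproduces just those two rules. The `suffixFor` and `isUsageMissing` names and the trimmed `FormattedRecord` type are illustrative, not ai-lcr exports.

```ts
// Only the fields the suffix rules read (hypothetical trimmed shape).
interface FormattedRecord {
  ok: boolean;
  costUsd: number;
  baselineUsd?: number;
  usageMissing?: boolean;
}

// Mirrors finalizeOk: a served call with zero input AND zero output tokens is flagged.
function isUsageMissing(inputTokens: number, outputTokens: number): boolean {
  return inputTokens === 0 && outputTokens === 0;
}

// Mirrors the two suffix rules added to formatCallRecord.
function suffixFor(r: FormattedRecord): string {
  let s = "";
  if (r.ok && r.baselineUsd !== undefined && r.baselineUsd > r.costUsd) {
    s += ` (saved $${(r.baselineUsd - r.costUsd).toFixed(4)})`;
  }
  if (r.usageMissing) s += " \u26A0no-usage";
  return s;
}

// A normally metered call that beat its baseline.
const served: FormattedRecord = { ok: true, costUsd: 0.0014, baselineUsd: 0.0049 };
// A served call whose provider emitted no usage: cost reads 0 but is unknown.
const silent: FormattedRecord = {
  ok: true,
  costUsd: 0,
  baselineUsd: 0,
  usageMissing: isUsageMissing(0, 0),
};

console.log(JSON.stringify(suffixFor(served)));
console.log(JSON.stringify(suffixFor(silent)));
```

In the `silent` case no savings are printed, because a baseline of 0 does not exceed a cost of 0. Only the warning remains, which is the "cost unknown, not free" signal the changelog describes.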
@@ -487,6 +529,40 @@ function formatCallRecord(record, opts = {}) {
   return line;
 }
 
+// src/sink.ts
+function createHttpSink(options) {
+  const {
+    url,
+    headers,
+    project,
+    dispatch = (task) => {
+      void task();
+    },
+    fetchImpl,
+    onError
+  } = options;
+  const doFetch = fetchImpl ?? globalThis.fetch;
+  return (record) => {
+    if (!doFetch) {
+      onError?.(new Error("ai-lcr: no fetch available for createHttpSink"));
+      return;
+    }
+    const payload = project ? { project, ...record } : record;
+    dispatch(async () => {
+      try {
+        await doFetch(url, {
+          method: "POST",
+          headers: { "content-type": "application/json", ...headers },
+          body: JSON.stringify(payload),
+          keepalive: true
+        });
+      } catch (err) {
+        onError?.(err);
+      }
+    });
+  };
+}
+
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
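The sink's contract is small: build the payload (merging in `project` when given), hand an async POST task to `dispatch`, and route any thrown error to `onError` instead of the caller. The sketch below is a typed restatement for illustration. The `makeSink` name, the structural `FetchLike` type and the collector URL are hypothetical. It is paired with a `waitUntil`-style dispatcher that keeps each task's promise so pending POSTs can be awaited. It deliberately differs from the shipped code in one way, labeled in the comments: a non-2xx response is also reported to `onError`. The shipped `createHttpSink` only reports errors that `fetch` throws, and `fetch` resolves normally on a 4xx/5xx status.

```ts
// Structural types so the sketch does not depend on DOM typings.
type FetchInit = { method: string; headers: Record<string, string>; body: string; keepalive?: boolean };
type FetchLike = (url: string, init: FetchInit) => Promise<{ ok: boolean; status: number }>;
type Dispatch = (task: () => Promise<void>) => void;

interface SinkOptions {
  url: string;
  headers?: Record<string, string>;
  project?: string;
  dispatch?: Dispatch;
  fetchImpl: FetchLike;
  onError?: (err: unknown) => void;
}

// Typed sketch of createHttpSink's fire-and-forget shape.
function makeSink<R extends object>(opts: SinkOptions): (record: R) => void {
  const dispatch: Dispatch = opts.dispatch ?? ((task: () => Promise<void>) => { void task(); });
  return (record) => {
    const payload = opts.project ? { project: opts.project, ...record } : record;
    dispatch(async () => {
      try {
        const res = await opts.fetchImpl(opts.url, {
          method: "POST",
          headers: { "content-type": "application/json", ...opts.headers },
          body: JSON.stringify(payload),
          keepalive: true,
        });
        // Deviation from ai-lcr (assumption): also surface HTTP error statuses.
        if (!res.ok) opts.onError?.(new Error(`ingest responded ${res.status}`));
      } catch (err) {
        opts.onError?.(err); // never rethrow: logging must not break the caller
      }
    });
  };
}

// waitUntil-style dispatcher: start the task now, but keep its promise so a
// serverless handler (or a test) can await every pending POST before exiting.
const pending: Promise<void>[] = [];
const collect: Dispatch = (task) => { pending.push(task()); };

const sent: string[] = [];
const errors: unknown[] = [];
const sink = makeSink<{ model: string; costUsd: number }>({
  url: "https://collector.example/api/ingest", // hypothetical endpoint
  project: "demo",
  dispatch: collect,
  // Fake collector: accepts the first POST, then answers 503.
  fetchImpl: async (_url, init) => {
    sent.push(init.body);
    return { ok: sent.length === 1, status: sent.length === 1 ? 200 : 503 };
  },
  onError: (err) => errors.push(err),
});

sink({ model: "chat", costUsd: 0.0014 });
sink({ model: "chat", costUsd: 0.002 });

Promise.all(pending).then(() => console.log(`sent=${sent.length} errors=${errors.length}`));
```

Keeping the promise in `dispatch`, rather than awaiting inside the `onCall` handler, is what lets the model call return immediately while the POST finishes in the background.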
@@ -903,49 +979,84 @@ var RunwareMediaError = class extends Error {
 };
 
 // src/adapters/fal-media.ts
-var DEFAULT_BASE3 = "https://fal.run";
-function
-
-
-  const
-
-  }
-
-  if (
-
-
-
+var DEFAULT_BASE3 = "https://queue.fal.run";
+function extractOutputs(raw) {
+  if (!raw || typeof raw !== "object") return [];
+  const data = raw;
+  const out = [];
+  const pushUrl = (url, type) => {
+    if (typeof url === "string" && url.length > 0) out.push({ url, type });
+  };
+  if (Array.isArray(data.images)) {
+    for (const img of data.images) pushUrl(img?.url, "image");
+  }
+  pushUrl(data.image?.url, "image");
+  if (Array.isArray(data.videos)) {
+    for (const v of data.videos) pushUrl(v?.url, "video");
   }
-
+  pushUrl(data.video?.url, "video");
+  return out;
 }
 function createFalMediaAdapter(config) {
-  const {
+  const {
+    apiKey,
+    baseUrl = DEFAULT_BASE3,
+    pollIntervalMs = 3e3,
+    pollTimeoutMs = 3e5,
+    fetchImpl = fetch
+  } = config;
+  const headers = {
+    "content-type": "application/json",
+    authorization: `Key ${apiKey}`
+  };
   return {
     provider: "fal",
     async run(req) {
-      const
+      const submitRes = await fetchImpl(`${baseUrl}/${req.externalId}`, {
         method: "POST",
-        headers
-          "content-type": "application/json",
-          authorization: `Key ${apiKey}`,
-          accept: "application/json"
-        },
+        headers,
         body: JSON.stringify(req.input)
       });
-
-
-        body = await res.json();
-      } catch {
-        body = {};
+      if (!submitRes.ok) {
+        throw new FalMediaError(submitRes.status, await safeText2(submitRes));
       }
-
-
+      const submit = await submitRes.json();
+      const statusUrl = submit.status_url;
+      const responseUrl = submit.response_url;
+      if (!statusUrl || !responseUrl) {
+        throw new Error(
+          `ai-lcr: fal submit for "${req.externalId}" returned no status/response URL (keys: ${Object.keys(
+            submit
+          ).join(", ")})`
+        );
       }
-      const
-
-
+      const deadline = Date.now() + pollTimeoutMs;
+      let completed = false;
+      while (Date.now() < deadline) {
+        const statusRes = await fetchImpl(statusUrl, { headers });
+        if (!statusRes.ok) {
+          throw new FalMediaError(statusRes.status, await safeText2(statusRes));
+        }
+        const status = String((await statusRes.json()).status ?? "");
+        if (status === "COMPLETED") {
+          completed = true;
+          break;
+        }
+        await sleep2(pollIntervalMs);
+      }
+      if (!completed) {
+        throw new Error(
+          `ai-lcr: fal job for "${req.externalId}" timed out after ${pollTimeoutMs}ms`
+        );
+      }
+      const resultRes = await fetchImpl(responseUrl, { headers });
+      if (!resultRes.ok) {
+        throw new FalMediaError(resultRes.status, await safeText2(resultRes));
+      }
+      const outputs = extractOutputs(await resultRes.json());
+      if (outputs.length === 0) {
+        throw new Error(`ai-lcr: fal returned no media URL for "${req.externalId}"`);
       }
-      const outputs = urls.map((url) => ({ url, type: "image" }));
       return { outputs, units: outputs.length };
     }
   };
@@ -958,6 +1069,16 @@ var FalMediaError = class extends Error {
   }
   status;
 };
+function sleep2(ms) {
+  return new Promise((r) => setTimeout(r, ms));
+}
+async function safeText2(res) {
+  try {
+    return await res.text();
+  } catch {
+    return "<no body>";
+  }
+}
 
 // src/index.ts
 function isLanguageModel(entry) {
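Together, the new `run`, `sleep2` and `safeText2` implement a submit → poll → fetch loop against fal's queue. A condensed, dependency-free sketch of that control flow follows. It swaps the network for an injected fake, the same seam the adapter's injectable `fetchImpl` provides. `runQueued`, `FakeQueue` and the `fake://` URLs are hypothetical. The field names (`status_url`, `response_url`, `status === "COMPLETED"`) follow the adapter code above, and error handling is simplified to plain `Error`s.

```ts
type JsonResponse = { ok: boolean; status: number; json(): Promise<any> };
type FetchLike = (url: string, init?: { method?: string; body?: string }) => Promise<JsonResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runQueued(
  fetchImpl: FetchLike,
  submitUrl: string,
  input: unknown,
  pollIntervalMs: number,
  pollTimeoutMs: number,
): Promise<unknown> {
  // 1. Submit: the queue answers right away with URLs to follow.
  const submitRes = await fetchImpl(submitUrl, { method: "POST", body: JSON.stringify(input) });
  if (!submitRes.ok) throw new Error(`submit failed: ${submitRes.status}`);
  const { status_url, response_url } = await submitRes.json();
  if (!status_url || !response_url) throw new Error("no status/response URL");

  // 2. Poll the status URL until COMPLETED or the deadline passes.
  const deadline = Date.now() + pollTimeoutMs;
  let completed = false;
  while (Date.now() < deadline) {
    const statusRes = await fetchImpl(status_url);
    if (!statusRes.ok) throw new Error(`status failed: ${statusRes.status}`);
    if (String((await statusRes.json()).status ?? "") === "COMPLETED") {
      completed = true;
      break;
    }
    await sleep(pollIntervalMs);
  }
  if (!completed) throw new Error(`timed out after ${pollTimeoutMs}ms`);

  // 3. Fetch the finished result from the response URL.
  const resultRes = await fetchImpl(response_url);
  if (!resultRes.ok) throw new Error(`result failed: ${resultRes.status}`);
  return resultRes.json();
}

// Fake queue: reports IN_PROGRESS twice, then COMPLETED, then serves a video URL.
interface FakeQueue {
  fetchFake: FetchLike;
  polls: () => number;
}

function makeFakeQueue(): FakeQueue {
  let polls = 0;
  const reply = (body: unknown): JsonResponse => ({ ok: true, status: 200, json: async () => body });
  const fetchFake: FetchLike = async (url) => {
    if (url === "fake://submit") return reply({ status_url: "fake://status", response_url: "fake://result" });
    if (url === "fake://status") {
      polls += 1;
      return reply({ status: polls < 3 ? "IN_PROGRESS" : "COMPLETED" });
    }
    return reply({ video: { url: "https://example.com/clip.mp4" } });
  };
  return { fetchFake, polls: () => polls };
}

const queue = makeFakeQueue();
runQueued(queue.fetchFake, "fake://submit", { prompt: "a cat" }, 10, 1_000).then((result) => {
  console.log(queue.polls(), JSON.stringify(result));
});
```

Following the returned `status_url`/`response_url` instead of rebuilding paths is what lets the real adapter sidestep fal's sub-path quirk. In the real adapter, the result body is then passed through `extractOutputs` to collect image and video URLs.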
@@ -1008,6 +1129,7 @@ function createLCR(config) {
   classifyErrorKind,
   comparePrices,
   createFalMediaAdapter,
+  createHttpSink,
   createKunavoMediaAdapter,
   createLCR,
   createMediaLCR,
package/dist/index.d.cts
CHANGED
@@ -17,8 +17,18 @@ import { LanguageModelV3 } from '@ai-sdk/provider';
 
 /** USD per 1M tokens. */
 interface ProviderCost {
+    /** USD per 1M input (prompt) tokens. */
     input: number;
+    /** USD per 1M output (completion) tokens. */
     output: number;
+    /**
+     * USD per 1M *cached* input tokens read (prompt-cache hits). Optional. When a
+     * call reports `usage.inputTokens.cacheRead`, those tokens are billed at this
+     * rate instead of `input` — so the cost stays honest for cache-heavy traffic
+     * (e.g. Anthropic, where a cache read is ~0.1× the input price). Omit it and
+     * cached tokens fall back to the full `input` rate (the pre-0.3 behavior).
+     */
+    cacheRead?: number;
 }
 interface CostEvent {
     /** Logical model name (the key in createLCR's `models`). */
@@ -77,15 +87,36 @@ interface CallRecord {
|
|
|
77
87
|
latencyMs: number;
|
|
78
88
|
inputTokens: number;
|
|
79
89
|
outputTokens: number;
|
|
90
|
+
/**
|
|
91
|
+
* Cached input (prompt-cache) tokens the winner read, when the provider
|
|
92
|
+
* reported them (`usage.inputTokens.cacheRead`). Present only when > 0. Lets
|
|
93
|
+
* the dashboard show cache-hit volume and audit why `costUsd` is lower than
|
|
94
|
+
* sticker × tokens. Undefined when the provider reports no cache info.
|
|
95
|
+
*/
|
|
96
|
+
cachedInputTokens?: number;
|
|
80
97
|
/** Computed from the winner's `cost`; 0 if no price was given or the call failed. */
|
|
81
98
|
costUsd: number;
|
|
82
99
|
/**
|
|
83
|
-
* What the same request would have cost on the most expensive
|
|
84
|
-
* provider
|
|
85
|
-
*
|
|
86
|
-
*
|
|
100
|
+
* What the same request would have cost on the most expensive *priced*
|
|
101
|
+
* provider in the chain, on identical token usage — the savings baseline
|
|
102
|
+
* (`baselineUsd - costUsd`). Set by both routers whenever at least one
|
|
103
|
+
* provider carries a `cost`; undefined only when no provider was priced.
|
|
87
104
|
*/
|
|
88
105
|
baselineUsd?: number;
|
|
106
|
+
/**
|
|
107
|
+
* Caller-supplied correlation id, read from `providerOptions.lcr.requestId`
|
|
108
|
+
* on the call. Multi-step tool loops emit one record per `doStream`/
|
|
109
|
+
* `doGenerate` step; stamp the same `requestId` on every step to let the
|
|
110
|
+
* dashboard roll a whole user request up into one cost/`calls` figure.
|
|
111
|
+
*/
|
|
112
|
+
requestId?: string;
|
|
113
|
+
/**
|
|
114
|
+
* True when the winner served OK but reported **zero** input *and* output
|
|
115
|
+
* tokens — i.e. the provider didn't emit usage. A silent danger: `costUsd`
|
|
116
|
+
* collapses to 0 and any token-based credit metering under-charges with no
|
|
117
|
+
* other signal. Treat a flagged record as "cost unknown", not "free".
|
|
118
|
+
*/
|
|
119
|
+
usageMissing?: boolean;
|
|
89
120
|
}
|
|
90
121
|
/**
|
|
91
122
|
* Normalize an error into a short, log-friendly class for {@link CallRecord}.
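`requestId` and `usageMissing` are most useful together when metering multi-step tool loops. The sketch below shows one way an `onCall` consumer might roll per-step records up per user request.

It rests on two assumptions:
- `CallRecordLike` is a hypothetical subset of `CallRecord`.
- The policy that one missing-usage step marks the whole request as cost-unknown is a choice made here, following the field's "cost unknown, not free" guidance.

```typescript
// Hypothetical subset of ai-lcr's CallRecord: only the fields this sketch reads.
interface CallRecordLike {
  ok: boolean;
  costUsd: number;
  baselineUsd?: number;
  requestId?: string;
  usageMissing?: boolean;
}

interface RequestRollup {
  calls: number; // number of doGenerate/doStream steps seen
  costUsd: number; // summed winner cost
  savedUsd: number; // summed (baselineUsd - costUsd) where positive
  costUnknown: boolean; // true if any step reported no usage
}

// Group per-step records by requestId; records without an id are skipped.
function rollUpByRequest(records: CallRecordLike[]): Map<string, RequestRollup> {
  const byRequest = new Map<string, RequestRollup>();
  for (const r of records) {
    if (!r.requestId) continue;
    const agg = byRequest.get(r.requestId) ?? { calls: 0, costUsd: 0, savedUsd: 0, costUnknown: false };
    agg.calls += 1;
    agg.costUsd += r.costUsd;
    if (r.ok && r.baselineUsd !== undefined && r.baselineUsd > r.costUsd) {
      agg.savedUsd += r.baselineUsd - r.costUsd;
    }
    // A zero-usage step means "cost unknown", not "free": flag the whole request.
    if (r.usageMissing) agg.costUnknown = true;
    byRequest.set(r.requestId, agg);
  }
  return byRequest;
}

// Two steps of one tool loop share "req-1"; "req-2" hit a provider that emitted no usage.
const rollup = rollUpByRequest([
  { ok: true, costUsd: 0.01, baselineUsd: 0.04, requestId: "req-1" },
  { ok: true, costUsd: 0.02, baselineUsd: 0.05, requestId: "req-1" },
  { ok: true, costUsd: 0, baselineUsd: 0, requestId: "req-2", usageMissing: true },
  { ok: false, costUsd: 0 }, // no requestId: not grouped
]);
console.log(rollup.get("req-1"), rollup.get("req-2"));
```

The grouping only works if the caller puts the same value in `providerOptions.lcr.requestId` for every step of a request. Records without it stay ungrouped.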
@@ -122,6 +153,54 @@ interface FormatOptions {
 }
 declare function formatCallRecord(record: CallRecord, opts?: FormatOptions): string;

+/**
+ * Optional HTTP sink for `onCall` — ship each {@link CallRecord} as JSON to a
+ * collector (e.g. a self-hosted ai-lcr-dashboard `/api/ingest`, or any endpoint
+ * that accepts the CallRecord shape).
+ *
+ * Fully optional and dashboard-agnostic: omit it and ai-lcr stores nothing;
+ * point `url` at whatever you run. Logging must never break your app, so a
+ * failed POST is swallowed by default (surface it via `onError` if you want).
+ *
+ *   import { createLCR, createHttpSink } from "ai-lcr";
+ *   import { after } from "next/server"; // serverless: don't block the response
+ *
+ *   const lcr = createLCR({
+ *     models: { ... },
+ *     onCall: createHttpSink({
+ *       url: process.env.LCR_INGEST_URL + "/api/ingest",
+ *       headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+ *       project: process.env.LCR_PROJECT,
+ *       dispatch: after, // run after the response is sent
+ *     }),
+ *   });
+ */
+
+interface HttpSinkOptions {
+  /** Where to POST each CallRecord (a collector that accepts the JSON shape). */
+  url: string;
+  /** Extra headers, e.g. `{ authorization: ` + "`Bearer ${key}`" + ` }`. */
+  headers?: Record<string, string>;
+  /** Optional tenant/project tag merged into each payload (`{ project, ...record }`). */
+  project?: string;
+  /**
+   * Wrap the dispatch so it survives a serverless function returning. On
+   * Next.js pass `after` from "next/server"; elsewhere pass a `waitUntil`-style
+   * function. Defaults to running immediately — correct for long-lived servers,
+   * but on serverless an un-awaited POST may be cut off, so pass `after`.
+   */
+  dispatch?: (task: () => void | Promise<void>) => void;
+  /** Custom fetch (tests / runtimes without a global `fetch`). */
+  fetchImpl?: typeof fetch;
+  /** Called if the POST fails. Failures are swallowed by default. */
+  onError?: (error: unknown) => void;
+}
+/**
+ * Build an `onCall` handler that POSTs each {@link CallRecord} to `url`.
+ * Returns a plain `(record) => void` — pass it straight to `createLCR`'s `onCall`.
+ */
+declare function createHttpSink(options: HttpSinkOptions): (record: CallRecord) => void;
+
 /**
  * ai-lcr media routing — Least Cost Routing for image & video models.
  *
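`dispatch` only has to accept a task and keep it alive past the handler's return. On a runtime that exposes a `waitUntil`-style hook instead of Next.js `after`, an adapter might look like the sketch below.

`WaitUntilContext` and `dispatchVia` are hypothetical names. The shape of the runtime context is an assumption; check it against your platform's documentation.

```typescript
// The `dispatch` option's type, as declared on HttpSinkOptions.
type Dispatch = (task: () => void | Promise<void>) => void;

// Hypothetical minimal shape of a runtime context that keeps the invocation
// alive until the promises handed to it settle.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Adapt a waitUntil hook into a `dispatch`: schedule the task on a microtask and
// hand its promise to the runtime. Wrapping in Promise.resolve().then also turns
// a synchronous throw inside `task` into a rejection rather than an exception here.
function dispatchVia(ctx: WaitUntilContext): Dispatch {
  return (task) => {
    ctx.waitUntil(Promise.resolve().then(task));
  };
}

// Usage with a fake context that just collects the registered promises.
const pending: Promise<unknown>[] = [];
const dispatch = dispatchVia({ waitUntil: (p) => { pending.push(p); } });
let posted = 0;
dispatch(async () => { posted += 1; });
```

With the adapter in place, the sink would be built as `createHttpSink({ url, dispatch: dispatchVia(ctx) })`. `ctx` is per request, while `onCall` is fixed when `createLCR` runs. So this assumes the router is created inside the request handler, or that the platform offers a globally importable `waitUntil`.

One more detail when wiring `onError`: the sink implementation in `dist/index.js` only catches errors thrown by `fetch`. A non-2xx response from the collector resolves normally and is not reported.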
@@ -359,35 +438,42 @@ interface RunwareMediaConfig {
 declare function createRunwareMediaAdapter(config: RunwareMediaConfig): MediaAdapter;

 /**
- * fal
+ * fal media adapter — image (queue) + video (queue, async poll).
  *
- * fal
- *
- *
- *
- * image_url for i2i/edit, …) work without this adapter knowing about them — it
- * stays generic, not tied to one model family.
+ * fal serves every model through one async queue API, so a single submit→poll→
+ * fetch-result path covers both image and video. That is the whole reason this
+ * adapter exists: it is ai-lcr's first VIDEO-capable execution path. (The
+ * Runware adapter is image-only; the Kunavo one's video poll loop is unverified.)
  *
- *
+ * Implementation note: ai-art's fal adapter uses the `@fal-ai/client` SDK, but
+ * ai-lcr deliberately keeps zero provider SDKs — every adapter is raw `fetch`
+ * with an injectable `fetchImpl` for testing (see runware-media, kunavo-media).
+ * So this re-implements the three queue calls against fal's REST endpoints:
  *
- *
- *
- *
- * whether to fail over. A 403 "exhausted balance" is retryable (fall over to the
- * next provider); a 422 bad-input is not (don't waste the fallbacks).
+ *   1. submit POST https://queue.fal.run/{model} → { request_id, status_url, response_url }
+ *   2. status GET {status_url} → { status: IN_QUEUE | IN_PROGRESS | COMPLETED }
+ *   3. result GET {response_url} → { images:[…] } | { video:{url} } | …
  *
- *
- *
- *
+ * We follow the `status_url` / `response_url` returned by submit rather than
+ * rebuilding them, which sidesteps fal's sub-path quirk (a model like
+ * `fal-ai/flux/schnell` submits to the full path but its status/result live
+ * under the `fal-ai/flux` base).
  *
- *
- *
+ * Auth: fal uses `Authorization: Key {FAL_KEY}` (NOT Bearer).
+ *
+ * Cost: fal's queue result does not carry a per-call price, so cost is left to
+ * the router's normalized estimate (costCents stays undefined; `units` is the
+ * output count — one image, or one clip).
  */

 interface FalMediaConfig {
   apiKey: string;
-  /** Override for testing. Defaults to https://fal.run. */
+  /** Override for testing. Defaults to https://queue.fal.run. */
   baseUrl?: string;
+  /** Video/job poll cadence (ms). Default 3000. */
+  pollIntervalMs?: number;
+  /** Max time to wait for a job before giving up (ms). Default 300000 (5m). */
+  pollTimeoutMs?: number;
   /** Injected for testing; defaults to global fetch. */
   fetchImpl?: typeof fetch;
 }
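With the defaults above, a job gets roughly `pollTimeoutMs / pollIntervalMs` = 300000 / 3000 = 100 status requests at most before the adapter gives up. Request latency makes the real count somewhat lower.

The loop below is a standalone sketch of that deadline-bounded polling. It assumes a caller-supplied `getStatus` that returns fal's status string. `pollUntilCompleted` is a hypothetical helper, not an ai-lcr export.

```typescript
// Standalone sketch of deadline-bounded polling, mirroring the documented
// pollIntervalMs / pollTimeoutMs semantics. `getStatus` is assumed to return
// fal's status string (IN_QUEUE, IN_PROGRESS or COMPLETED).
// Resolves with the number of status polls it took; rejects on timeout.
async function pollUntilCompleted(
  getStatus: () => Promise<string>,
  pollIntervalMs = 3000,
  pollTimeoutMs = 300_000,
): Promise<number> {
  const deadline = Date.now() + pollTimeoutMs;
  let polls = 0;
  while (Date.now() < deadline) {
    polls += 1;
    if ((await getStatus()) === "COMPLETED") return polls;
    // Still queued or running: wait one interval before asking again.
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error(`job did not complete within ${pollTimeoutMs}ms`);
}
```

In the adapter, `getStatus` corresponds to a GET on the `status_url` returned by submit, sent with the `Authorization: Key {FAL_KEY}` header, reading `.status` from the JSON body.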
@@ -446,4 +532,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;

-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
package/dist/index.d.ts
CHANGED
@@ -17,8 +17,18 @@ import { LanguageModelV3 } from '@ai-sdk/provider';

 /** USD per 1M tokens. */
 interface ProviderCost {
+  /** USD per 1M input (prompt) tokens. */
   input: number;
+  /** USD per 1M output (completion) tokens. */
   output: number;
+  /**
+   * USD per 1M *cached* input tokens read (prompt-cache hits). Optional. When a
+   * call reports `usage.inputTokens.cacheRead`, those tokens are billed at this
+   * rate instead of `input` — so the cost stays honest for cache-heavy traffic
+   * (e.g. Anthropic, where a cache read is ~0.1× the input price). Omit it and
+   * cached tokens fall back to the full `input` rate (the pre-0.3 behavior).
+   */
+  cacheRead?: number;
 }
 interface CostEvent {
   /** Logical model name (the key in createLCR's `models`). */
@@ -77,15 +87,36 @@ interface CallRecord {
   latencyMs: number;
   inputTokens: number;
   outputTokens: number;
+  /**
+   * Cached input (prompt-cache) tokens the winner read, when the provider
+   * reported them (`usage.inputTokens.cacheRead`). Present only when > 0. Lets
+   * the dashboard show cache-hit volume and audit why `costUsd` is lower than
+   * sticker × tokens. Undefined when the provider reports no cache info.
+   */
+  cachedInputTokens?: number;
   /** Computed from the winner's `cost`; 0 if no price was given or the call failed. */
   costUsd: number;
   /**
-   * What the same request would have cost on the most expensive
-   * provider
-   *
-   *
+   * What the same request would have cost on the most expensive *priced*
+   * provider in the chain, on identical token usage — the savings baseline
+   * (`baselineUsd - costUsd`). Set by both routers whenever at least one
+   * provider carries a `cost`; undefined only when no provider was priced.
    */
   baselineUsd?: number;
+  /**
+   * Caller-supplied correlation id, read from `providerOptions.lcr.requestId`
+   * on the call. Multi-step tool loops emit one record per `doStream`/
+   * `doGenerate` step; stamp the same `requestId` on every step to let the
+   * dashboard roll a whole user request up into one cost/`calls` figure.
+   */
+  requestId?: string;
+  /**
+   * True when the winner served OK but reported **zero** input *and* output
+   * tokens — i.e. the provider didn't emit usage. A silent danger: `costUsd`
+   * collapses to 0 and any token-based credit metering under-charges with no
+   * other signal. Treat a flagged record as "cost unknown", not "free".
+   */
+  usageMissing?: boolean;
 }
 /**
  * Normalize an error into a short, log-friendly class for {@link CallRecord}.
@@ -122,6 +153,54 @@ interface FormatOptions {
 }
 declare function formatCallRecord(record: CallRecord, opts?: FormatOptions): string;

+/**
+ * Optional HTTP sink for `onCall` — ship each {@link CallRecord} as JSON to a
+ * collector (e.g. a self-hosted ai-lcr-dashboard `/api/ingest`, or any endpoint
+ * that accepts the CallRecord shape).
+ *
+ * Fully optional and dashboard-agnostic: omit it and ai-lcr stores nothing;
+ * point `url` at whatever you run. Logging must never break your app, so a
+ * failed POST is swallowed by default (surface it via `onError` if you want).
+ *
+ *   import { createLCR, createHttpSink } from "ai-lcr";
+ *   import { after } from "next/server"; // serverless: don't block the response
+ *
+ *   const lcr = createLCR({
+ *     models: { ... },
+ *     onCall: createHttpSink({
+ *       url: process.env.LCR_INGEST_URL + "/api/ingest",
+ *       headers: { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` },
+ *       project: process.env.LCR_PROJECT,
+ *       dispatch: after, // run after the response is sent
+ *     }),
+ *   });
+ */
+
+interface HttpSinkOptions {
+  /** Where to POST each CallRecord (a collector that accepts the JSON shape). */
+  url: string;
+  /** Extra headers, e.g. `{ authorization: ` + "`Bearer ${key}`" + ` }`. */
+  headers?: Record<string, string>;
+  /** Optional tenant/project tag merged into each payload (`{ project, ...record }`). */
+  project?: string;
+  /**
+   * Wrap the dispatch so it survives a serverless function returning. On
+   * Next.js pass `after` from "next/server"; elsewhere pass a `waitUntil`-style
+   * function. Defaults to running immediately — correct for long-lived servers,
+   * but on serverless an un-awaited POST may be cut off, so pass `after`.
+   */
+  dispatch?: (task: () => void | Promise<void>) => void;
+  /** Custom fetch (tests / runtimes without a global `fetch`). */
+  fetchImpl?: typeof fetch;
+  /** Called if the POST fails. Failures are swallowed by default. */
+  onError?: (error: unknown) => void;
+}
+/**
+ * Build an `onCall` handler that POSTs each {@link CallRecord} to `url`.
+ * Returns a plain `(record) => void` — pass it straight to `createLCR`'s `onCall`.
+ */
+declare function createHttpSink(options: HttpSinkOptions): (record: CallRecord) => void;
+
 /**
  * ai-lcr media routing — Least Cost Routing for image & video models.
  *
@@ -359,35 +438,42 @@ interface RunwareMediaConfig {
 declare function createRunwareMediaAdapter(config: RunwareMediaConfig): MediaAdapter;

 /**
- * fal
+ * fal media adapter — image (queue) + video (queue, async poll).
  *
- * fal
- *
- *
- *
- * image_url for i2i/edit, …) work without this adapter knowing about them — it
- * stays generic, not tied to one model family.
+ * fal serves every model through one async queue API, so a single submit→poll→
+ * fetch-result path covers both image and video. That is the whole reason this
+ * adapter exists: it is ai-lcr's first VIDEO-capable execution path. (The
+ * Runware adapter is image-only; the Kunavo one's video poll loop is unverified.)
  *
- *
+ * Implementation note: ai-art's fal adapter uses the `@fal-ai/client` SDK, but
+ * ai-lcr deliberately keeps zero provider SDKs — every adapter is raw `fetch`
+ * with an injectable `fetchImpl` for testing (see runware-media, kunavo-media).
+ * So this re-implements the three queue calls against fal's REST endpoints:
  *
- *
- *
- *
- * whether to fail over. A 403 "exhausted balance" is retryable (fall over to the
- * next provider); a 422 bad-input is not (don't waste the fallbacks).
+ *   1. submit POST https://queue.fal.run/{model} → { request_id, status_url, response_url }
+ *   2. status GET {status_url} → { status: IN_QUEUE | IN_PROGRESS | COMPLETED }
+ *   3. result GET {response_url} → { images:[…] } | { video:{url} } | …
  *
- *
- *
- *
+ * We follow the `status_url` / `response_url` returned by submit rather than
+ * rebuilding them, which sidesteps fal's sub-path quirk (a model like
+ * `fal-ai/flux/schnell` submits to the full path but its status/result live
+ * under the `fal-ai/flux` base).
  *
- *
- *
+ * Auth: fal uses `Authorization: Key {FAL_KEY}` (NOT Bearer).
+ *
+ * Cost: fal's queue result does not carry a per-call price, so cost is left to
+ * the router's normalized estimate (costCents stays undefined; `units` is the
+ * output count — one image, or one clip).
  */

 interface FalMediaConfig {
   apiKey: string;
-  /** Override for testing. Defaults to https://fal.run. */
+  /** Override for testing. Defaults to https://queue.fal.run. */
   baseUrl?: string;
+  /** Video/job poll cadence (ms). Default 3000. */
+  pollIntervalMs?: number;
+  /** Max time to wait for a job before giving up (ms). Default 300000 (5m). */
+  pollTimeoutMs?: number;
   /** Injected for testing; defaults to global fetch. */
   fetchImpl?: typeof fetch;
 }
@@ -446,4 +532,4 @@ type LCRRouter = (modelName: string) => LanguageModelV3;
  */
 declare function createLCR(config: LCRConfig): LCRRouter;

-export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
+export { type CallRecord, type CostEvent, DEFAULT_REFERENCE, type ErrorKind, type FormatOptions, type HttpSinkOptions, type LCRConfig, type LCRRouter, MEDIA_PRICING, type MediaAdapter, type MediaCostEvent, type MediaGenerateRequest, type MediaGenerateResult, type MediaLCRConfig, type MediaModality, type MediaModelDef, type MediaOutput, type MediaPricing, type MediaRegistry, type MediaRoute, type MediaRunResult, type MediaUnit, type PriceComparisonRow, type ProviderCost, type ProviderEntry, type RankedRoute, type ReferenceSpec, type RouteAttempt, cheapestRoute, classifyError, classifyErrorKind, comparePrices, createFalMediaAdapter, createHttpSink, createKunavoMediaAdapter, createLCR, createMediaLCR, createRunwareMediaAdapter, formatCallRecord, normalizedCents, rankRoutes, referenceMegapixels };
package/dist/index.js
CHANGED
@@ -146,6 +146,16 @@ function newCallId() {
   if (c?.randomUUID) return c.randomUUID();
   return `lcr_${Date.now().toString(36)}_${(callSeq++).toString(36)}`;
 }
+function costForUsage(cost, inputTokens, outputTokens, cacheReadTokens) {
+  const cached = Math.min(Math.max(cacheReadTokens, 0), inputTokens);
+  const fullInput = inputTokens - cached;
+  const cachedRate = cost.cacheRead ?? cost.input;
+  return fullInput / 1e6 * cost.input + cached / 1e6 * cachedRate + outputTokens / 1e6 * cost.output;
+}
+function requestIdFrom(options) {
+  const raw = options.providerOptions?.lcr?.requestId;
+  return typeof raw === "string" && raw.length > 0 ? raw : void 0;
+}
 var LcrFallbackModel = class {
   constructor(opts) {
     this.opts = opts;
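`requestIdFrom` reads the id only from `providerOptions.lcr.requestId` and ignores anything that is not a non-empty string. A caller that already passes other provider options must merge the `lcr` key in rather than replace the whole object.

The helper below is a hypothetical sketch of that merge. `withRequestId`, `readRequestId` and the `otherProvider` key are illustrative names only.

```typescript
// Provider options are namespaced by provider key; values are treated as opaque here.
type ProviderOptionsLike = Record<string, Record<string, unknown>>;

// Hypothetical helper: stamp `lcr.requestId` while keeping any other
// namespaces, and any other keys already under `lcr`, intact.
function withRequestId(
  providerOptions: ProviderOptionsLike | undefined,
  requestId: string,
): ProviderOptionsLike {
  return {
    ...providerOptions,
    lcr: { ...(providerOptions?.lcr ?? {}), requestId },
  };
}

// Same acceptance rule as requestIdFrom: only a non-empty string counts.
function readRequestId(options: { providerOptions?: ProviderOptionsLike }): string | undefined {
  const raw = options.providerOptions?.lcr?.requestId;
  return typeof raw === "string" && raw.length > 0 ? raw : undefined;
}

// `otherProvider` stands in for any sibling namespace a caller already uses.
const merged = withRequestId({ otherProvider: { flag: true } }, "req-42");
console.log(readRequestId({ providerOptions: merged }));
```

The stricter-than-truthy check means an empty string or a numeric id is dropped silently. Records then simply carry no `requestId`, so stringify numeric ids before stamping them.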
@@ -227,8 +237,13 @@ var LcrFallbackModel = class {
     } catch {
     }
   }
-  startCall() {
-    return {
+  startCall(options) {
+    return {
+      id: newCallId(),
+      attempts: [],
+      startedAt: Date.now(),
+      requestId: requestIdFrom(options)
+    };
   }
   /** Record a failed attempt onto the call's chain (no event yet). */
   recordFail(ctx, provider, attemptStart, error) {
@@ -240,12 +255,29 @@ var LcrFallbackModel = class {
       kind: classifyErrorKind(error)
     });
   }
+  /**
+   * Baseline = what this same usage would have cost on the most expensive
+   * *priced* provider in the chain (typically the OpenRouter fallback leg). The
+   * winner's savings is `baselineUsd - costUsd`. Undefined when no provider in
+   * the chain carries a price (nothing to compare against).
+   */
+  baselineUsd(inputTokens, outputTokens, cacheReadTokens) {
+    let max;
+    for (const p of this.opts.providers) {
+      if (!p.cost) continue;
+      const c = costForUsage(p.cost, inputTokens, outputTokens, cacheReadTokens);
+      if (max === void 0 || c > max) max = c;
+    }
+    return max;
+  }
   /** Winner settled: record the attempt, fire `onCost` (compat) + `onCall`. */
   finalizeOk(ctx, provider, attemptStart, usage) {
     ctx.attempts.push({ provider: provider.label, ok: true, latencyMs: Date.now() - attemptStart });
     const inputTokens = usage?.inputTokens?.total ?? 0;
     const outputTokens = usage?.outputTokens?.total ?? 0;
-    const
+    const cacheReadTokens = usage?.inputTokens?.cacheRead ?? 0;
+    const costUsd = provider.cost ? costForUsage(provider.cost, inputTokens, outputTokens, cacheReadTokens) : 0;
+    const usageMissing = inputTokens === 0 && outputTokens === 0;
     this.emitCost({
       model: this.opts.modelName,
       provider: provider.label,
@@ -263,7 +295,11 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens,
       outputTokens,
-
+      ...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {},
+      costUsd,
+      baselineUsd: this.baselineUsd(inputTokens, outputTokens, cacheReadTokens),
+      ...ctx.requestId ? { requestId: ctx.requestId } : {},
+      ...usageMissing ? { usageMissing: true } : {}
     });
   }
   /** Every provider failed: fire `onCall` with no winner. */
@@ -278,11 +314,12 @@ var LcrFallbackModel = class {
       latencyMs: Date.now() - ctx.startedAt,
       inputTokens: 0,
       outputTokens: 0,
-      costUsd: 0
+      costUsd: 0,
+      ...ctx.requestId ? { requestId: ctx.requestId } : {}
     });
   }
   async doGenerate(options) {
-    const ctx = this.startCall();
+    const ctx = this.startCall(options);
     const providers = this.opts.providers;
     const n = providers.length;
     const start = this.startIndex();
@@ -311,7 +348,7 @@ var LcrFallbackModel = class {
     throw lastError;
   }
   async doStream(options) {
-    return this.doStreamWithCtx(options, this.startCall(), this.startIndex(), 0);
+    return this.doStreamWithCtx(options, this.startCall(options), this.startIndex(), 0);
   }
   // The stream's failover recursion re-enters here with the SAME `ctx` and a
   // threaded-through local cursor (`idx`/`tried`), so a mid-stream switch keeps
@@ -435,6 +472,10 @@ function formatCallRecord(record, opts = {}) {
   const chain = record.attempts.map((a) => a.provider).join("\u2192") || record.winner || "\u2014";
   const status = formatCost(record);
   let line = `${glyph} ${record.model} ${chain} ${record.latencyMs}ms ${status}`;
+  if (record.ok && record.baselineUsd !== void 0 && record.baselineUsd > record.costUsd) {
+    line += ` (saved $${(record.baselineUsd - record.costUsd).toFixed(4)})`;
+  }
+  if (record.usageMissing) line += ` \u26A0no-usage`;
   const failed = record.attempts.filter((a) => !a.ok);
   if (failed.length > 0) {
     const reasons = failed.map((a) => `${a.provider} ${a.errorClass ?? "error"}`).join(", ");
@@ -447,6 +488,40 @@ function formatCallRecord(record, opts = {}) {
   return line;
 }

+// src/sink.ts
+function createHttpSink(options) {
+  const {
+    url,
+    headers,
+    project,
+    dispatch = (task) => {
+      void task();
+    },
+    fetchImpl,
+    onError
+  } = options;
+  const doFetch = fetchImpl ?? globalThis.fetch;
+  return (record) => {
+    if (!doFetch) {
+      onError?.(new Error("ai-lcr: no fetch available for createHttpSink"));
+      return;
+    }
+    const payload = project ? { project, ...record } : record;
+    dispatch(async () => {
+      try {
+        await doFetch(url, {
+          method: "POST",
+          headers: { "content-type": "application/json", ...headers },
+          body: JSON.stringify(payload),
+          keepalive: true
+        });
+      } catch (err) {
+        onError?.(err);
+      }
+    });
+  };
+}
+
 // src/media.ts
 var DEFAULT_REFERENCE = {
   image: { width: 1920, height: 1080 },
@@ -863,49 +938,84 @@ var RunwareMediaError = class extends Error {
 };

 // src/adapters/fal-media.ts
-var DEFAULT_BASE3 = "https://fal.run";
-function
-
-
-  const
-
-}
-
-if (
-
-
-
+var DEFAULT_BASE3 = "https://queue.fal.run";
+function extractOutputs(raw) {
+  if (!raw || typeof raw !== "object") return [];
+  const data = raw;
+  const out = [];
+  const pushUrl = (url, type) => {
+    if (typeof url === "string" && url.length > 0) out.push({ url, type });
+  };
+  if (Array.isArray(data.images)) {
+    for (const img of data.images) pushUrl(img?.url, "image");
+  }
+  pushUrl(data.image?.url, "image");
+  if (Array.isArray(data.videos)) {
+    for (const v of data.videos) pushUrl(v?.url, "video");
   }
-
+  pushUrl(data.video?.url, "video");
+  return out;
 }
 function createFalMediaAdapter(config) {
-  const {
+  const {
+    apiKey,
+    baseUrl = DEFAULT_BASE3,
+    pollIntervalMs = 3e3,
+    pollTimeoutMs = 3e5,
+    fetchImpl = fetch
+  } = config;
+  const headers = {
+    "content-type": "application/json",
+    authorization: `Key ${apiKey}`
+  };
   return {
     provider: "fal",
     async run(req) {
-      const
+      const submitRes = await fetchImpl(`${baseUrl}/${req.externalId}`, {
         method: "POST",
-        headers
-          "content-type": "application/json",
-          authorization: `Key ${apiKey}`,
-          accept: "application/json"
-        },
+        headers,
         body: JSON.stringify(req.input)
       });
-
-
-        body = await res.json();
-      } catch {
-        body = {};
+      if (!submitRes.ok) {
+        throw new FalMediaError(submitRes.status, await safeText2(submitRes));
       }
-
-
+      const submit = await submitRes.json();
+      const statusUrl = submit.status_url;
+      const responseUrl = submit.response_url;
+      if (!statusUrl || !responseUrl) {
+        throw new Error(
+          `ai-lcr: fal submit for "${req.externalId}" returned no status/response URL (keys: ${Object.keys(
+            submit
+          ).join(", ")})`
+        );
       }
-      const
-
-
+      const deadline = Date.now() + pollTimeoutMs;
+      let completed = false;
+      while (Date.now() < deadline) {
+        const statusRes = await fetchImpl(statusUrl, { headers });
+        if (!statusRes.ok) {
+          throw new FalMediaError(statusRes.status, await safeText2(statusRes));
+        }
+        const status = String((await statusRes.json()).status ?? "");
+        if (status === "COMPLETED") {
+          completed = true;
+          break;
+        }
+        await sleep2(pollIntervalMs);
+      }
+      if (!completed) {
+        throw new Error(
+          `ai-lcr: fal job for "${req.externalId}" timed out after ${pollTimeoutMs}ms`
+        );
+      }
+      const resultRes = await fetchImpl(responseUrl, { headers });
+      if (!resultRes.ok) {
+        throw new FalMediaError(resultRes.status, await safeText2(resultRes));
+      }
+      const outputs = extractOutputs(await resultRes.json());
+      if (outputs.length === 0) {
+        throw new Error(`ai-lcr: fal returned no media URL for "${req.externalId}"`);
       }
-      const outputs = urls.map((url) => ({ url, type: "image" }));
       return { outputs, units: outputs.length };
     }
   };
@@ -918,6 +1028,16 @@ var FalMediaError = class extends Error {
   }
   status;
 };
+function sleep2(ms) {
+  return new Promise((r) => setTimeout(r, ms));
+}
+async function safeText2(res) {
+  try {
+    return await res.text();
+  } catch {
+    return "<no body>";
+  }
+}

 // src/index.ts
 function isLanguageModel(entry) {
@@ -967,6 +1087,7 @@ export {
   classifyErrorKind,
   comparePrices,
   createFalMediaAdapter,
+  createHttpSink,
   createKunavoMediaAdapter,
   createLCR,
   createMediaLCR,
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "ai-lcr",
-  "version": "0.
+  "version": "0.3.0",
   "description": "Least Cost Routing for LLMs — route every model call to the cheapest available provider, fall back automatically, and track real cost. Built for the Vercel AI SDK.",
   "keywords": [
     "ai",
|