ai-lcr 0.6.2 → 0.6.4
- package/CHANGELOG.md +52 -0
- package/README.md +36 -0
- package/README.zh-CN.md +36 -0
- package/dist/index.cjs +239 -3
- package/dist/index.d.cts +234 -4
- package/dist/index.d.ts +234 -4
- package/dist/index.js +236 -3
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,58 @@ All notable changes to `ai-lcr` are documented here. The format follows
|
|
|
4
4
|
[Keep a Changelog](https://keepachangelog.com/), and the project adheres to
|
|
5
5
|
[Semantic Versioning](https://semver.org/).
|
|
6
6
|
|
|
7
|
+
## [0.6.4] — 2026-06-16
|
|
8
|
+
|
|
9
|
+
DX improvements that eliminate per-project boilerplate for consumers.
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- **`DEFAULT_PROVIDERS`** — canonical URL + env-var-name for common providers
|
|
14
|
+
(openrouter, deepinfra, tokenmart, deepseek, kunavo, runware, fal). Import
|
|
15
|
+
instead of redeclaring in every app; a URL change propagates on `npm update`.
|
|
16
|
+
- **`createEnvSink(dispatch)`** — reads `LCR_INGEST_URL` / `LCR_PROJECT` /
|
|
17
|
+
`LCR_INGEST_KEY` from env and returns a ready-to-use `onCall` handler (or
|
|
18
|
+
`undefined` when `LCR_INGEST_URL` is unset). Replaces the identical 30-line `sink.ts` that every
|
|
19
|
+
consumer was copy-pasting. Pass `after` from `next/server` as `dispatch`.
|
|
20
|
+
- **`AnyLanguageModel`** — duck-typed model interface on `ProviderEntry` so
|
|
21
|
+
consumers no longer need `as` casts when their `@ai-sdk/provider` version
|
|
22
|
+
differs from ai-lcr's. Runtime behavior unchanged.
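A minimal sketch of the environment lookup described for `createEnvSink`, written as a standalone function so it runs without the package. The helper name `resolveIngestConfig`, the `IngestConfig` type, and the explicit `env` parameter are illustrative assumptions, not part of ai-lcr's API. The `/api/ingest` path, the Bearer header, and the `SITE_KEY` fallback follow the 0.6.4 bundle shown later in this diff.

```typescript
// Hypothetical helper, not exported by ai-lcr. It mirrors the env handling
// visible in the 0.6.4 bundle so the "undefined when unset" rule is concrete.
type Env = Record<string, string | undefined>;

interface IngestConfig {
  url: string;
  headers?: { authorization: string };
  project?: string;
}

function resolveIngestConfig(env: Env): IngestConfig | undefined {
  // Strip trailing slashes so "https://host/" and "https://host" behave the same.
  const base = env.LCR_INGEST_URL?.replace(/\/+$/, "");
  if (!base) return undefined; // no ingest URL means no sink at all

  const key = env.LCR_INGEST_KEY;
  const project = env.LCR_PROJECT ?? env.SITE_KEY;
  return {
    url: `${base}/api/ingest`,
    ...(key ? { headers: { authorization: `Bearer ${key}` } } : {}),
    ...(project !== undefined ? { project } : {}),
  };
}

// Example inputs (the host name is a placeholder).
const configured = resolveIngestConfig({
  LCR_INGEST_URL: "https://lcr.example.com//",
  LCR_INGEST_KEY: "secret",
  LCR_PROJECT: "demo",
});
const unconfigured = resolveIngestConfig({});
```

Per the changelog entry above, the real function is called with a dispatcher, for example `after` from `next/server`, and its return value is used as the `onCall` handler. Because it returns `undefined` when nothing is configured, the handler can be passed through unconditionally.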
|
|
23
|
+
|
|
24
|
+
## [0.6.3] — 2026-06-11
|
|
25
|
+
|
|
26
|
+
Caching — both kinds, each off by default and each a pure config flag with no
|
|
27
|
+
service to run. The response cache is the layer Vercel AI Gateway notably
|
|
28
|
+
doesn't offer; ai-lcr does it in-process and folds the result into its cost accounting.
|
|
29
|
+
|
|
30
|
+
### Added
|
|
31
|
+
|
|
32
|
+
- **`createLCR({ cache })`** — exact-match **response cache**. An identical
|
|
33
|
+
request replays the stored response and calls **no provider at all**: near-zero
|
|
34
|
+
latency, `costUsd: 0`. Storage is pluggable with **zero added dependencies**:
|
|
35
|
+
`cache: true` uses a bundled in-memory store, `cache: myStore` brings your own
|
|
36
|
+
(Redis / Vercel KV — required for cross-request hits on serverless, where
|
|
37
|
+
memory isn't shared), `cache: { store?, ttlMs? }` sets a TTL. A hit settles a
|
|
38
|
+
`CallRecord` with `cacheHit: true` and the avoided cost on its own
|
|
39
|
+
`cacheHitSavingUsd` line (a caching saving, never folded into routing savings).
|
|
40
|
+
Empty completions and usage-less results are never cached. New exports:
|
|
41
|
+
`createMemoryCacheStore`, types `CacheStore` / `CacheOptions` / `CachedCall` /
|
|
42
|
+
`CachedMeta` / `MemoryCacheOptions`.
|
|
43
|
+
- **`createLCR({ promptCache })`** — automatic provider-side **prompt-cache**
|
|
44
|
+
breakpoint. Inserts an Anthropic `cache_control` marker on the last system
|
|
45
|
+
message so the static prompt head bills at the cache-read rate (~0.1× input)
|
|
46
|
+
on repeats; the model still runs. `true` for the 5-minute window,
|
|
47
|
+
`{ ttl: "1h" }` for the longer one. Only writes the `anthropic` namespace
|
|
48
|
+
(ignored by other providers, safe on a mixed chain) and steps aside if you set
|
|
49
|
+
`cacheControl` yourself. Savings surface via the existing `cachedInputTokens` /
|
|
50
|
+
`cachedSavingUsd`. New exported type `PromptCacheOptions`.
|
|
51
|
+
- `CallRecord` gains **`cacheHit`** and **`cacheHitSavingUsd`** for response-cache
|
|
52
|
+
hits.
|
|
53
|
+
|
|
54
|
+
### Compatibility
|
|
55
|
+
|
|
56
|
+
- Fully backward compatible. Both `cache` and `promptCache` are **off by
|
|
57
|
+
default** — unset, routing behaves exactly as before.
|
|
58
|
+
|
|
7
59
|
## [0.6.2] — 2026-06-11
|
|
8
60
|
|
|
9
61
|
Circuit breaker for persistently-failing providers. Until now the only recovery
|
package/README.md
CHANGED
|
@@ -182,6 +182,42 @@ const lcr = createLCR({
|
|
|
182
182
|
|
|
183
183
|
With `cooldown` on, a provider that fails enough times in a window is *skipped* for a cooldown period rather than tried every request — and a single success clears it. Defaults are 3 failures / 60s → 60s cooldown; tune with `cooldown: { maxFailures, windowMs, cooldownMs }`. It only ever **reorders** the attempt list (cooling providers go last), so if *every* provider is cooling a request still tries them all rather than failing outright. Off by default — routing is unchanged unless you opt in.
|
|
184
184
|
|
|
185
|
+
## Cache
|
|
186
|
+
|
|
187
|
+
There are two completely different "caches" in LLM land, and ai-lcr does both — each off by default, each a pure config flag with no service to run.
|
|
188
|
+
|
|
189
|
+
### Skip the call entirely (`cache`) — response cache
|
|
190
|
+
|
|
191
|
+
When a request is byte-for-byte identical to one already answered, ai-lcr replays the stored response and calls **no provider at all**: near-zero latency (only the store lookup) and `costUsd: 0`. This is the layer Vercel AI Gateway notably *doesn't* offer.
|
|
192
|
+
|
|
193
|
+
```ts
|
|
194
|
+
const lcr = createLCR({
|
|
195
|
+
models: { /* … */ },
|
|
196
|
+
cache: true, // exact-match response cache (in-memory by default)
|
|
197
|
+
});
|
|
198
|
+
```
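To make "identical request" concrete, here is a simplified sketch of exact-match key derivation, modeled on the `cacheKeyOf` function in the bundled code later in this diff. The function name `exactMatchKey` and the trimmed `CallOptions` type are assumptions for illustration; the real key covers more fields (for example `topP`, `seed`, `tools`).

```typescript
// Illustrative only: a reduced version of the key derivation in the bundle.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface CallOptions {
  prompt: unknown;
  temperature?: number;
  maxOutputTokens?: number;
  providerOptions?: ProviderOptions;
}

function exactMatchKey(modelName: string, options: CallOptions): string {
  // The bundle leaves the "lcr" namespace out of the key; this sketch does the same.
  const po: ProviderOptions = {};
  for (const [ns, value] of Object.entries(options.providerOptions ?? {})) {
    if (ns !== "lcr") po[ns] = value;
  }
  // JSON.stringify drops undefined fields, so unset parameters do not affect the key.
  return JSON.stringify({
    m: modelName,
    prompt: options.prompt,
    maxOutputTokens: options.maxOutputTokens,
    temperature: options.temperature,
    po: Object.keys(po).length > 0 ? po : undefined,
  });
}

const prompt = [{ role: "user", content: "hi" }];
const keyA = exactMatchKey("small", { prompt, temperature: 0 });
// Same request plus an "lcr" option: expected to map to the same key.
const keyB = exactMatchKey("small", {
  prompt,
  temperature: 0,
  providerOptions: { lcr: { tag: "x" } },
});
// A different temperature is a different request.
const keyC = exactMatchKey("small", { prompt, temperature: 0.7 });
```

Since the key is a serialized string, two requests that differ only in property order inside the prompt would serialize differently and miss each other. That is the practical meaning of an exact-match cache.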
|
|
199
|
+
|
|
200
|
+
Storage is pluggable and ai-lcr ships **zero dependencies** for it:
|
|
201
|
+
|
|
202
|
+
- **`cache: true`** uses a process-local in-memory store. It is a real cache on a long-running server, and it is useful *within* one serverless invocation (an agent loop repeating a sub-call), but it does **not** survive across serverless requests, because separate function instances don't share memory.
|
|
203
|
+
- **For cross-request hits on serverless**, bring your own store backed by a shared layer (Upstash Redis, Vercel KV): `cache: myStore`. ai-lcr runs no service of its own — any shared store is yours. A custom store is just `{ get, set }` (see `CacheStore`); `createMemoryCacheStore({ maxEntries })` is exported if you want the bundled one with a cap.
|
|
204
|
+
- **`cache: { store?, ttlMs? }`** sets an entry lifetime.
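The `{ get, set }` contract in the list above can be sketched as a small in-memory store with a TTL and a size cap, modeled on the bundled `createMemoryCacheStore` visible later in this diff. The names `SimpleStore` and `createTtlStore`, the generic value type, and the injectable `now` clock are assumptions added for illustration and testing; they are not ai-lcr exports.

```typescript
// Assumed store shape: the README only says a custom store is "{ get, set }".
interface SimpleStore<V> {
  get(key: string): V | undefined;
  set(key: string, value: V, ttlMs?: number): void;
}

function createTtlStore<V>(
  maxEntries = 1000,
  now: () => number = Date.now, // injectable clock, an addition for testability
): SimpleStore<V> {
  const map = new Map<string, { value: V; expiresAt?: number }>();
  return {
    get(key) {
      const entry = map.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt !== undefined && entry.expiresAt <= now()) {
        map.delete(key); // lazily evict expired entries on read
        return undefined;
      }
      return entry.value;
    },
    set(key, value, ttlMs) {
      map.delete(key); // re-inserting moves the key to the newest position
      map.set(key, ttlMs !== undefined ? { value, expiresAt: now() + ttlMs } : { value });
      if (map.size > maxEntries) {
        // A Map iterates in insertion order, so the first key is the oldest.
        const oldest = map.keys().next().value;
        if (oldest !== undefined) map.delete(oldest);
      }
    },
  };
}

let clock = 0;
const store = createTtlStore<string>(2, () => clock);
store.set("a", "first", 100);
store.set("b", "second");
store.set("c", "third"); // exceeds the cap of 2, so "a" (the oldest) is evicted
```

A store backed by Redis or Vercel KV would return promises instead. The bundled router code awaits `get` and tolerates a promise returned from `set`, so an async variant of the same shape should fit, though that is an inference from the bundle rather than a documented guarantee.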
|
|
205
|
+
|
|
206
|
+
A hit settles a `CallRecord` with `cacheHit: true`, `costUsd: 0`, and the money it avoided on its own `cacheHitSavingUsd` line — a *caching* saving, kept separate from routing savings (`baselineUsd − costUsd`), never folded in. Empty completions and usage-less results are never cached. One caveat worth stating: caching makes identical requests return identical responses — exactly right for idempotent / `temperature: 0` calls, a behavior change for sampled ones.
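The accounting described above can be sketched as follows: a hit costs nothing now, and the cost of the original call is reported on a separate saving line. The `StoredMeta` and `HitRecord` types and the function names are illustrative assumptions that reuse the README's field names; they are not the package's exported types.

```typescript
// Illustrative types only; field names follow the README's CallRecord description.
interface StoredMeta {
  winner: string;
  costUsd: number; // what the original, uncached call cost
}

interface HitRecord {
  winner: string;
  costUsd: 0;
  cacheHit: true;
  cacheHitSavingUsd?: number;
}

function recordForCacheHit(meta: StoredMeta): HitRecord {
  return {
    winner: meta.winner,
    costUsd: 0, // nothing was billed for this request
    cacheHit: true,
    // Only report a saving when the original call actually cost something.
    ...(meta.costUsd > 0 ? { cacheHitSavingUsd: meta.costUsd } : {}),
  };
}

// Summing the saving line separately keeps it out of any routing-savings total.
function totalCacheSavingUsd(records: HitRecord[]): number {
  return records.reduce((sum, r) => sum + (r.cacheHitSavingUsd ?? 0), 0);
}

const hits = [
  recordForCacheHit({ winner: "deepinfra", costUsd: 0.002 }),
  recordForCacheHit({ winner: "deepinfra", costUsd: 0.002 }),
  recordForCacheHit({ winner: "openrouter", costUsd: 0 }),
];
const saved = totalCacheSavingUsd(hits);
```

Keeping `cacheHitSavingUsd` on its own line means a dashboard can show how much caching saved without inflating the figure attributed to routing.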
|
|
207
|
+
|
|
208
|
+
### Pay less for the call (`promptCache`) — provider prompt cache
|
|
209
|
+
|
|
210
|
+
Different mechanism: the model still runs, but the **static head of the prompt** (your system prompt) bills at the provider's cache-read rate (~0.1× input) on repeats. Anthropic needs an explicit `cache_control` marker; OpenAI / Gemini / DeepSeek cache the prefix automatically. `promptCache: true` inserts that marker on the last system message for you:
|
|
211
|
+
|
|
212
|
+
```ts
|
|
213
|
+
const lcr = createLCR({
|
|
214
|
+
models: { /* … */ },
|
|
215
|
+
promptCache: true, // 5-minute window; { ttl: "1h" } for the longer one
|
|
216
|
+
});
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
It only writes the `anthropic` provider-options namespace, which every other provider ignores — so it's safe on a mixed chain. And it **steps aside entirely** if you set `cacheControl` yourself anywhere in the prompt. The savings then show up exactly as before: `cachedInputTokens` and `cachedSavingUsd` on the `CallRecord` (see [Cache-aware cost](#see-what-happened-oncall) below).
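The breakpoint rule just described (mark the last system message, and do nothing if a marker already exists) can be sketched as a pure function, modeled on `withPromptCacheBreakpoint` in the bundled code later in this diff. The `Message` type here is a simplified stand-in for the AI SDK prompt type, and the name `withBreakpoint` is an assumption for illustration.

```typescript
// Simplified message shape, an assumption; the real type comes from the AI SDK.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
  providerOptions?: Record<string, Record<string, unknown>>;
}

function hasCacheControl(message: Message): boolean {
  const anthropic = message.providerOptions?.anthropic;
  return anthropic !== undefined && "cacheControl" in anthropic;
}

function withBreakpoint(prompt: Message[], ttl: "5m" | "1h" = "5m"): Message[] {
  // Step aside if the caller already placed a marker anywhere in the prompt.
  if (prompt.some(hasCacheControl)) return prompt;

  // Find the LAST system message; that is where the static head ends.
  let target = -1;
  for (let i = 0; i < prompt.length; i++) {
    if (prompt[i]?.role === "system") target = i;
  }
  if (target === -1) return prompt; // no system message, nothing to mark

  const cacheControl = ttl === "1h" ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
  // Copy rather than mutate, and write only the "anthropic" namespace.
  return prompt.map((message, i): Message =>
    i !== target
      ? message
      : {
          ...message,
          providerOptions: {
            ...message.providerOptions,
            anthropic: { ...message.providerOptions?.anthropic, cacheControl },
          },
        },
  );
}

const example: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "system", content: "House style rules." },
  { role: "user", content: "hi" },
];
const marked = withBreakpoint(example, "1h");
```

Because only the `anthropic` namespace is touched and the input array is left unmodified, the same options object can be handed to any provider in a mixed chain.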
|
|
220
|
+
|
|
185
221
|
## See what happened (`onCall`)
|
|
186
222
|
|
|
187
223
|
`onError`/`onCost` fire separately and uncorrelated, so a failover is hard to read after the fact. `onCall` gives you **one record per request** — the full chain, the winner, the reason for each failed hop, latency, and cost — and `formatCallRecord` turns it into a one-liner you can scan:
|
package/README.zh-CN.md
CHANGED
|
@@ -144,6 +144,42 @@ DeepInfra only hosts open weights: there is no first-party Claude / GPT / Gemini. So
|
|
|
144
144
|
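2. **Fall through on failure.** Any provider failure (rate limiting, 5xx, timeout, **exhausted credit** such as 402 / overdue / insufficient balance, and client errors such as **400**) advances to the next provider, and this is stream-safe. Failing over on 400 is deliberate: in an OpenAI-compatible aggregation layer, a 400 often means "*this* provider won't accept this request" (an unsupported parameter, a model it doesn't list, a stricter schema) rather than a broken request, so another provider can probably serve it. If every provider rejects the request, it still fails and throws the **first** (original) error, which keeps genuine caller bugs debuggable. The only thing that never fails over is a caller-initiated cancellation (`AbortSignal`). To restore the old "fail immediately on client errors" behavior, pass `shouldRetry: isRetryableError` to `createLCR`.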
|
|
145
145
|
3. **Recover.** After an idle window (`resetIntervalMs`, default 60s), routing automatically returns to the cheapest provider.
|
|
146
146
|
|
|
147
|
+
## Cache
|
|
148
|
+
|
|
149
|
+
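There are two completely different "caches" in LLM land, and ai-lcr does both: each is off by default, each is just a config switch, and **neither needs you to run any service**.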
|
|
150
|
+
|
|
151
|
+
### Skip the call entirely (`cache`): response cache
|
|
152
|
+
|
|
153
|
+
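When a request is identical to one already answered, the stored response is replayed and **no provider is called at all**: near-zero latency, `costUsd: 0`. This is exactly the layer Vercel AI Gateway notably doesn't offer.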
|
|
154
|
+
|
|
155
|
+
```ts
|
|
156
|
+
const lcr = createLCR({
|
|
157
|
+
models: { /* … */ },
|
|
158
|
+
cache: true, // exact-match response cache (in-process memory by default)
|
|
159
|
+
});
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
Storage is pluggable, and ai-lcr adds **zero dependencies** for it:
|
|
163
|
+
|
|
164
|
+
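- **`cache: true`** uses process-local memory. On a long-running server it is a real cache, and it is also useful within a single serverless invocation (for example, a repeated sub-call in an agent loop), but it **does not survive across serverless requests**, because separate function instances don't share memory.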
|
|
165
|
+
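- **For cross-request hits on serverless**, bring your own store backed by a shared layer (Upstash Redis, Vercel KV): `cache: myStore`. ai-lcr runs no service of its own; the shared layer is yours. A custom store is just `{ get, set }` (see `CacheStore`); for the bundled implementation with a cap, use the exported `createMemoryCacheStore({ maxEntries })`.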
|
|
166
|
+
- **`cache: { store?, ttlMs? }`** sets an expiry time.
|
|
167
|
+
|
|
168
|
+
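A hit settles a `CallRecord` with `cacheHit: true` and `costUsd: 0`, and records the money saved on its own `cacheHitSavingUsd` line. This is a **caching** saving, kept separate from routing savings (`baselineUsd − costUsd`) and never mixed in. Empty completions and usage-less results are never cached. One key point: caching makes identical requests return identical responses, which is exactly right for idempotent / `temperature: 0` calls and a behavior change for sampled ones.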
|
|
169
|
+
|
|
170
|
+
### Make the call cheaper (`promptCache`): provider prompt cache
|
|
171
|
+
|
|
172
|
+
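A different mechanism: the model still runs, but the **static head of the prompt** (your system prompt) is billed at the provider's cache-read price (about 0.1× input) on repeats. Anthropic needs an explicit `cache_control` marker; OpenAI / Gemini / DeepSeek cache the prefix automatically. `promptCache: true` inserts that marker on the last system message for you: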
|
|
173
|
+
|
|
174
|
+
```ts
|
|
175
|
+
const lcr = createLCR({
|
|
176
|
+
models: { /* … */ },
|
|
177
|
+
promptCache: true, // 5-minute window; use { ttl: "1h" } for the longer one
|
|
178
|
+
});
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
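It only writes the `anthropic` provider-options namespace, which every other provider ignores, so it is safe on a mixed chain. And as soon as you set `cacheControl` yourself anywhere in the prompt, it **steps aside entirely**. The savings show up as before, in `cachedInputTokens` and `cachedSavingUsd` on the `CallRecord`.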
|
|
182
|
+
|
|
147
183
|
## See what happened on each call (`onCall`)
|
|
148
184
|
|
|
149
185
|
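`onError`/`onCost` fire independently and uncorrelated, so it is hard to reconstruct a failover after the fact. `onCall` gives you **one record per request** (the full attempt chain, the provider that finally served it, the reason each hop failed, latency, and cost); `formatCallRecord` turns it into a one-line log you can scan: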
|
package/dist/index.cjs
CHANGED
|
@@ -20,6 +20,7 @@ var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: tru
|
|
|
20
20
|
// src/index.ts
|
|
21
21
|
var index_exports = {};
|
|
22
22
|
__export(index_exports, {
|
|
23
|
+
DEFAULT_PROVIDERS: () => DEFAULT_PROVIDERS,
|
|
23
24
|
DEFAULT_REFERENCE: () => DEFAULT_REFERENCE,
|
|
24
25
|
MEDIA_PRICING: () => MEDIA_PRICING,
|
|
25
26
|
MODEL_PRICES: () => MODEL_PRICES,
|
|
@@ -29,11 +30,13 @@ __export(index_exports, {
|
|
|
29
30
|
classifyError: () => classifyError,
|
|
30
31
|
classifyErrorKind: () => classifyErrorKind,
|
|
31
32
|
comparePrices: () => comparePrices,
|
|
33
|
+
createEnvSink: () => createEnvSink,
|
|
32
34
|
createFalMediaAdapter: () => createFalMediaAdapter,
|
|
33
35
|
createHttpSink: () => createHttpSink,
|
|
34
36
|
createKunavoMediaAdapter: () => createKunavoMediaAdapter,
|
|
35
37
|
createLCR: () => createLCR,
|
|
36
38
|
createMediaLCR: () => createMediaLCR,
|
|
39
|
+
createMemoryCacheStore: () => createMemoryCacheStore,
|
|
37
40
|
createRunwareMediaAdapter: () => createRunwareMediaAdapter,
|
|
38
41
|
durationFromInput: () => durationFromInput,
|
|
39
42
|
formatCallRecord: () => formatCallRecord,
|
|
@@ -49,6 +52,105 @@ __export(index_exports, {
|
|
|
49
52
|
});
|
|
50
53
|
module.exports = __toCommonJS(index_exports);
|
|
51
54
|
|
|
55
|
+
// src/cache.ts
|
|
56
|
+
function isCacheStore(x) {
|
|
57
|
+
return typeof x === "object" && x !== null && typeof x.get === "function" && typeof x.set === "function";
|
|
58
|
+
}
|
|
59
|
+
function resolveCache(opt) {
|
|
60
|
+
if (!opt) return void 0;
|
|
61
|
+
if (opt === true) return { store: createMemoryCacheStore() };
|
|
62
|
+
if (isCacheStore(opt)) return { store: opt };
|
|
63
|
+
return {
|
|
64
|
+
store: opt.store ?? createMemoryCacheStore(),
|
|
65
|
+
...opt.ttlMs !== void 0 ? { ttlMs: opt.ttlMs } : {}
|
|
66
|
+
};
|
|
67
|
+
}
|
|
68
|
+
function cacheKeyOf(modelName, options) {
|
|
69
|
+
const rest = options.providerOptions ? Object.entries(options.providerOptions).filter(([ns]) => ns !== "lcr") : [];
|
|
70
|
+
const po = rest.length > 0 ? Object.fromEntries(rest) : void 0;
|
|
71
|
+
return JSON.stringify({
|
|
72
|
+
m: modelName,
|
|
73
|
+
prompt: options.prompt,
|
|
74
|
+
maxOutputTokens: options.maxOutputTokens,
|
|
75
|
+
temperature: options.temperature,
|
|
76
|
+
topP: options.topP,
|
|
77
|
+
topK: options.topK,
|
|
78
|
+
frequencyPenalty: options.frequencyPenalty,
|
|
79
|
+
presencePenalty: options.presencePenalty,
|
|
80
|
+
stopSequences: options.stopSequences,
|
|
81
|
+
seed: options.seed,
|
|
82
|
+
responseFormat: options.responseFormat,
|
|
83
|
+
tools: options.tools,
|
|
84
|
+
toolChoice: options.toolChoice,
|
|
85
|
+
po
|
|
86
|
+
});
|
|
87
|
+
}
|
|
88
|
+
function streamFromParts(parts) {
|
|
89
|
+
return new ReadableStream({
|
|
90
|
+
start(controller) {
|
|
91
|
+
for (const part of parts) controller.enqueue(part);
|
|
92
|
+
controller.close();
|
|
93
|
+
}
|
|
94
|
+
});
|
|
95
|
+
}
|
|
96
|
+
function createMemoryCacheStore(opts = {}) {
|
|
97
|
+
const maxEntries = opts.maxEntries ?? 1e3;
|
|
98
|
+
const map = /* @__PURE__ */ new Map();
|
|
99
|
+
return {
|
|
100
|
+
get(key) {
|
|
101
|
+
const entry = map.get(key);
|
|
102
|
+
if (!entry) return void 0;
|
|
103
|
+
if (entry.expiresAt !== void 0 && entry.expiresAt <= Date.now()) {
|
|
104
|
+
map.delete(key);
|
|
105
|
+
return void 0;
|
|
106
|
+
}
|
|
107
|
+
return entry.value;
|
|
108
|
+
},
|
|
109
|
+
set(key, value, ttlMs) {
|
|
110
|
+
const entry = ttlMs !== void 0 ? { value, expiresAt: Date.now() + ttlMs } : { value };
|
|
111
|
+
map.delete(key);
|
|
112
|
+
map.set(key, entry);
|
|
113
|
+
if (map.size > maxEntries) {
|
|
114
|
+
const oldest = map.keys().next().value;
|
|
115
|
+
if (oldest !== void 0) map.delete(oldest);
|
|
116
|
+
}
|
|
117
|
+
}
|
|
118
|
+
};
|
|
119
|
+
}
|
|
120
|
+
|
|
121
|
+
// src/prompt-cache.ts
|
|
122
|
+
function resolvePromptCache(opt) {
|
|
123
|
+
if (!opt) return void 0;
|
|
124
|
+
if (opt === true) return { ttl: "5m" };
|
|
125
|
+
return { ttl: opt.ttl ?? "5m" };
|
|
126
|
+
}
|
|
127
|
+
function hasAnthropicCacheControl(message) {
|
|
128
|
+
const anthropic = message.providerOptions?.anthropic;
|
|
129
|
+
return !!anthropic && "cacheControl" in anthropic;
|
|
130
|
+
}
|
|
131
|
+
function withPromptCacheBreakpoint(options, cfg) {
|
|
132
|
+
const prompt = options.prompt;
|
|
133
|
+
if (!Array.isArray(prompt) || prompt.length === 0) return options;
|
|
134
|
+
if (prompt.some(hasAnthropicCacheControl)) return options;
|
|
135
|
+
let target = -1;
|
|
136
|
+
for (let i = 0; i < prompt.length; i++) {
|
|
137
|
+
if (prompt[i].role === "system") target = i;
|
|
138
|
+
}
|
|
139
|
+
if (target === -1) return options;
|
|
140
|
+
const cacheControl = cfg.ttl === "1h" ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
|
|
141
|
+
const newPrompt = prompt.map((message, i) => {
|
|
142
|
+
if (i !== target) return message;
|
|
143
|
+
return {
|
|
144
|
+
...message,
|
|
145
|
+
providerOptions: {
|
|
146
|
+
...message.providerOptions,
|
|
147
|
+
anthropic: { ...message.providerOptions?.anthropic, cacheControl }
|
|
148
|
+
}
|
|
149
|
+
};
|
|
150
|
+
});
|
|
151
|
+
return { ...options, prompt: newPrompt };
|
|
152
|
+
}
|
|
153
|
+
|
|
52
154
|
// src/fallback.ts
|
|
53
155
|
var EmptyCompletionError = class extends Error {
|
|
54
156
|
constructor(provider) {
|
|
@@ -447,6 +549,16 @@ var LcrFallbackModel = class {
|
|
|
447
549
|
const usageMissing = inputTokens === 0 && outputTokens === 0;
|
|
448
550
|
const emptyCompletion = inputTokens > 0 && outputTokens === 0;
|
|
449
551
|
const baselineUsd = this.baselineUsd(inputTokens, outputTokens, cacheReadTokens);
|
|
552
|
+
ctx.settled = {
|
|
553
|
+
meta: {
|
|
554
|
+
winner: provider.label,
|
|
555
|
+
costUsd,
|
|
556
|
+
inputTokens,
|
|
557
|
+
outputTokens,
|
|
558
|
+
...cacheReadTokens > 0 ? { cachedInputTokens: cacheReadTokens } : {}
|
|
559
|
+
},
|
|
560
|
+
cacheable: !emptyCompletion && !usageMissing
|
|
561
|
+
};
|
|
450
562
|
this.emitCost({
|
|
451
563
|
model: this.opts.modelName,
|
|
452
564
|
provider: provider.label,
|
|
@@ -491,6 +603,16 @@ var LcrFallbackModel = class {
|
|
|
491
603
|
});
|
|
492
604
|
}
|
|
493
605
|
async doGenerate(options) {
|
|
606
|
+
const cache = this.opts.cache;
|
|
607
|
+
const cacheKey = cache ? cacheKeyOf(this.opts.modelName, options) : void 0;
|
|
608
|
+
if (cache && cacheKey !== void 0) {
|
|
609
|
+
const hit = await cache.store.get(cacheKey);
|
|
610
|
+
if (hit && hit.kind === "generate") {
|
|
611
|
+
this.finalizeCacheHit(this.startCall(options), hit.meta);
|
|
612
|
+
return hit.result;
|
|
613
|
+
}
|
|
614
|
+
}
|
|
615
|
+
const callOptions = this.opts.promptCache ? withPromptCacheBreakpoint(options, this.opts.promptCache) : options;
|
|
494
616
|
const ctx = this.startCall(options);
|
|
495
617
|
const providers = this.opts.providers;
|
|
496
618
|
const order = this.routeOrder(this.startIndex());
|
|
@@ -501,7 +623,7 @@ var LcrFallbackModel = class {
|
|
|
501
623
|
const isLast = pos === order.length - 1;
|
|
502
624
|
const attemptStart = Date.now();
|
|
503
625
|
try {
|
|
504
|
-
const result = await provider.model.doGenerate(
|
|
626
|
+
const result = await provider.model.doGenerate(callOptions);
|
|
505
627
|
const out = result.usage?.outputTokens?.total ?? 0;
|
|
506
628
|
const inp = result.usage?.inputTokens?.total ?? 0;
|
|
507
629
|
if (inp > 0 && out === 0 && !isLast) {
|
|
@@ -514,6 +636,9 @@ var LcrFallbackModel = class {
|
|
|
514
636
|
this.recordProviderSuccess(idx);
|
|
515
637
|
this.settleSticky(idx);
|
|
516
638
|
this.finalizeOk(ctx, provider, attemptStart, result.usage);
|
|
639
|
+
if (cache && cacheKey !== void 0 && ctx.settled?.cacheable) {
|
|
640
|
+
this.storeCache(cacheKey, { kind: "generate", result, meta: ctx.settled.meta });
|
|
641
|
+
}
|
|
517
642
|
return result;
|
|
518
643
|
} catch (error) {
|
|
519
644
|
lastError = error;
|
|
@@ -530,7 +655,76 @@ var LcrFallbackModel = class {
|
|
|
530
655
|
throw ctx.firstError ?? lastError;
|
|
531
656
|
}
|
|
532
657
|
async doStream(options) {
|
|
533
|
-
|
|
658
|
+
const cache = this.opts.cache;
|
|
659
|
+
const cacheKey = cache ? cacheKeyOf(this.opts.modelName, options) : void 0;
|
|
660
|
+
if (cache && cacheKey !== void 0) {
|
|
661
|
+
const hit = await cache.store.get(cacheKey);
|
|
662
|
+
if (hit && hit.kind === "stream") {
|
|
663
|
+
this.finalizeCacheHit(this.startCall(options), hit.meta);
|
|
664
|
+
return { stream: streamFromParts(hit.parts) };
|
|
665
|
+
}
|
|
666
|
+
}
|
|
667
|
+
const ctx = this.startCall(options);
|
|
668
|
+
const callOptions = this.opts.promptCache ? withPromptCacheBreakpoint(options, this.opts.promptCache) : options;
|
|
669
|
+
const inner = await this.doStreamWithCtx(
|
|
670
|
+
callOptions,
|
|
671
|
+
ctx,
|
|
672
|
+
this.routeOrder(this.startIndex()),
|
|
673
|
+
0
|
|
674
|
+
);
|
|
675
|
+
if (!cache || cacheKey === void 0) return inner;
|
|
676
|
+
const collected = [];
|
|
677
|
+
const self = this;
|
|
678
|
+
const wrapped = inner.stream.pipeThrough(
|
|
679
|
+
new TransformStream({
|
|
680
|
+
transform(part, controller) {
|
|
681
|
+
collected.push(part);
|
|
682
|
+
controller.enqueue(part);
|
|
683
|
+
},
|
|
684
|
+
flush() {
|
|
685
|
+
if (ctx.settled?.cacheable) {
|
|
686
|
+
self.storeCache(cacheKey, { kind: "stream", parts: collected, meta: ctx.settled.meta });
|
|
687
|
+
}
|
|
688
|
+
}
|
|
689
|
+
})
|
|
690
|
+
);
|
|
691
|
+
return { ...inner, stream: wrapped };
|
|
692
|
+
}
|
|
693
|
+
/** A response-cache hit: replay a stored answer with no provider call. Settles
|
|
694
|
+
* one {@link CallRecord} with `cacheHit`, `costUsd: 0`, and the avoided cost
|
|
695
|
+
* on its own `cacheHitSavingUsd` line. */
|
|
696
|
+
finalizeCacheHit(ctx, meta) {
|
|
697
|
+
this.emitCall({
|
|
698
|
+
id: ctx.id,
|
|
699
|
+
model: this.opts.modelName,
|
|
700
|
+
attempts: [{ provider: meta.winner, ok: true, latencyMs: Date.now() - ctx.startedAt }],
|
|
701
|
+
winner: meta.winner,
|
|
702
|
+
ok: true,
|
|
703
|
+
failedOver: false,
|
|
704
|
+
latencyMs: Date.now() - ctx.startedAt,
|
|
705
|
+
inputTokens: meta.inputTokens,
|
|
706
|
+
outputTokens: meta.outputTokens,
|
|
707
|
+
...meta.cachedInputTokens ? { cachedInputTokens: meta.cachedInputTokens } : {},
|
|
708
|
+
costUsd: 0,
|
|
709
|
+
cacheHit: true,
|
|
710
|
+
...meta.costUsd > 0 ? { cacheHitSavingUsd: meta.costUsd } : {},
|
|
711
|
+
...ctx.requestId ? { requestId: ctx.requestId } : {}
|
|
712
|
+
});
|
|
713
|
+
}
|
|
714
|
+
/** Best-effort write to the response cache: a sync throw or a rejected async
|
|
715
|
+
* `set` must never break the request. Caching is an optimization, not a
|
|
716
|
+
* guarantee. */
|
|
717
|
+
storeCache(key, value) {
|
|
718
|
+
const cache = this.opts.cache;
|
|
719
|
+
if (!cache) return;
|
|
720
|
+
try {
|
|
721
|
+
const r = cache.store.set(key, value, cache.ttlMs);
|
|
722
|
+
if (r && typeof r.catch === "function") {
|
|
723
|
+
r.catch(() => {
|
|
724
|
+
});
|
|
725
|
+
}
|
|
726
|
+
} catch {
|
|
727
|
+
}
|
|
534
728
|
}
|
|
535
729
|
// The stream's failover recursion re-enters here with the SAME `ctx` and the
|
|
536
730
|
// SAME `order` snapshot, advancing only the local position `pos`, so a
|
|
@@ -713,6 +907,30 @@ function createHttpSink(options) {
|
|
|
713
907
|
};
|
|
714
908
|
}
|
|
715
909
|
|
|
910
|
+
// src/env-sink.ts
|
|
911
|
+
function createEnvSink(dispatch) {
|
|
912
|
+
const base = process.env.LCR_INGEST_URL?.replace(/\/+$/, "");
|
|
913
|
+
if (!base) return void 0;
|
|
914
|
+
return createHttpSink({
|
|
915
|
+
url: `${base}/api/ingest`,
|
|
916
|
+
headers: process.env.LCR_INGEST_KEY ? { authorization: `Bearer ${process.env.LCR_INGEST_KEY}` } : void 0,
|
|
917
|
+
project: process.env.LCR_PROJECT ?? process.env.SITE_KEY,
|
|
918
|
+
dispatch,
|
|
919
|
+
onError: (err) => console.error("[lcr] ingest POST failed:", err)
|
|
920
|
+
});
|
|
921
|
+
}
|
|
922
|
+
|
|
923
|
+
// src/providers.ts
|
|
924
|
+
var DEFAULT_PROVIDERS = {
|
|
925
|
+
openrouter: { baseURL: "https://openrouter.ai/api/v1", apiKeyEnv: "OPENROUTER_API_KEY" },
|
|
926
|
+
deepinfra: { baseURL: "https://api.deepinfra.com/v1/openai", apiKeyEnv: "DEEPINFRA_API_KEY" },
|
|
927
|
+
tokenmart: { baseURL: "https://model.service-inference.ai/v1", apiKeyEnv: "INFERENCE_API_KEY" },
|
|
928
|
+
deepseek: { baseURL: "https://api.deepseek.com", apiKeyEnv: "DEEPSEEK_API_KEY" },
|
|
929
|
+
kunavo: { baseURL: "https://api.kunavo.com/v1", apiKeyEnv: "KUNAVO_API_KEY" },
|
|
930
|
+
runware: { baseURL: "https://api.runware.ai/v1", apiKeyEnv: "RUNWARE_API_KEY" },
|
|
931
|
+
fal: { baseURL: "https://queue.fal.run", apiKeyEnv: "FAL_KEY" }
|
|
932
|
+
};
|
|
933
|
+
|
|
716
934
|
// src/text-prices.ts
|
|
717
935
|
var MODEL_PRICES = {
|
|
718
936
|
"chatgpt-4o-latest": { input: 5, output: 15 },
|
|
@@ -2012,12 +2230,16 @@ function createLCR(config) {
|
|
|
2012
2230
|
autoPrice = false,
|
|
2013
2231
|
resetIntervalMs,
|
|
2014
2232
|
cooldown,
|
|
2233
|
+
cache,
|
|
2234
|
+
promptCache,
|
|
2015
2235
|
onError,
|
|
2016
2236
|
onCost,
|
|
2017
2237
|
onCall,
|
|
2018
2238
|
shouldRetry,
|
|
2019
2239
|
defaultCacheReadRatio
|
|
2020
2240
|
} = config;
|
|
2241
|
+
const resolvedCache = resolveCache(cache);
|
|
2242
|
+
const resolvedPromptCache = resolvePromptCache(promptCache);
|
|
2021
2243
|
if (defaultCacheReadRatio !== void 0 && (defaultCacheReadRatio < 0 || defaultCacheReadRatio > 1)) {
|
|
2022
2244
|
throw new Error(
|
|
2023
2245
|
`ai-lcr: defaultCacheReadRatio must be in [0, 1], got ${defaultCacheReadRatio}`
|
|
@@ -2037,7 +2259,18 @@ function createLCR(config) {
|
|
|
2037
2259
|
}
|
|
2038
2260
|
routed.set(
|
|
2039
2261
|
name,
|
|
2040
|
-
new LcrFallbackModel({
|
|
2262
|
+
new LcrFallbackModel({
|
|
2263
|
+
modelName: name,
|
|
2264
|
+
providers,
|
|
2265
|
+
resetIntervalMs,
|
|
2266
|
+
cooldown,
|
|
2267
|
+
...resolvedCache ? { cache: resolvedCache } : {},
|
|
2268
|
+
...resolvedPromptCache ? { promptCache: resolvedPromptCache } : {},
|
|
2269
|
+
onError,
|
|
2270
|
+
onCost,
|
|
2271
|
+
onCall,
|
|
2272
|
+
shouldRetry
|
|
2273
|
+
})
|
|
2041
2274
|
);
|
|
2042
2275
|
}
|
|
2043
2276
|
return (modelName) => {
|
|
@@ -2052,6 +2285,7 @@ function createLCR(config) {
|
|
|
2052
2285
|
}
|
|
2053
2286
|
// Annotate the CommonJS export names for ESM import in node:
|
|
2054
2287
|
0 && (module.exports = {
|
|
2288
|
+
DEFAULT_PROVIDERS,
|
|
2055
2289
|
DEFAULT_REFERENCE,
|
|
2056
2290
|
MEDIA_PRICING,
|
|
2057
2291
|
MODEL_PRICES,
|
|
@@ -2061,11 +2295,13 @@ function createLCR(config) {
|
|
|
2061
2295
|
classifyError,
|
|
2062
2296
|
classifyErrorKind,
|
|
2063
2297
|
comparePrices,
|
|
2298
|
+
createEnvSink,
|
|
2064
2299
|
createFalMediaAdapter,
|
|
2065
2300
|
createHttpSink,
|
|
2066
2301
|
createKunavoMediaAdapter,
|
|
2067
2302
|
createLCR,
|
|
2068
2303
|
createMediaLCR,
|
|
2304
|
+
createMemoryCacheStore,
|
|
2069
2305
|
createRunwareMediaAdapter,
|
|
2070
2306
|
durationFromInput,
|
|
2071
2307
|
formatCallRecord,
|