pi-cache-optimizer 2.5.3 → 2.5.5
- package/README.md +2 -3
- package/README.zh-CN.md +2 -3
- package/index.ts +168 -36
- package/package.json +1 -1
package/README.md
CHANGED
@@ -8,8 +8,6 @@
 
 Pi extension for improving provider-side KV / prompt cache hit rates. It keeps stable prompt content near the front, adds a conservative OpenAI-compatible `prompt_cache_key` fallback, warns about common proxy cache-routing gaps, and shows read-only footer cache stats.
 
-**GitHub About:** Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.
-
 > Renamed from `pi-deepseek-cache-optimizer`. Existing footer counters migrate automatically. This package never creates, edits, backs up, or deletes your `~/.pi/agent/models.json`.
 
 ## Contents
@@ -100,6 +98,7 @@ Notes:
 - `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
 - If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
 - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
+- For DeepSeek models, the Pi Mono guidance expects `compat.requiresReasoningContentOnAssistantMessages: true` and `compat.thinkingFormat: "deepseek"` alongside cache/session-affinity flags when the endpoint supports them.
 - This extension only advises; it does not edit `models.json`.
 
 ## Footer stats
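As a rough illustration of the DeepSeek note added above, the sketch below models a provider entry with the `compat` block at the same level as `baseUrl`/`api`/`models`, which is where the extension's warning text says to put it. The provider name `my-proxy`, the base URL, the model id, and the exact shape of `models` are assumptions made for this example, not taken from the package.

```typescript
// Sketch only: provider name, URL, model id, and the `models` shape are hypothetical.
type CompatSketch = {
  sendSessionAffinityHeaders?: boolean;
  supportsLongCacheRetention?: boolean;
  requiresReasoningContentOnAssistantMessages?: boolean;
  thinkingFormat?: string;
};

type ProviderSketch = {
  baseUrl: string;
  api: string;
  compat: CompatSketch; // sits beside baseUrl/api/models, not inside a model entry
  models: { id: string }[];
};

const providers: Record<string, ProviderSketch> = {
  "my-proxy": {
    baseUrl: "https://proxy.example.com/v1",
    api: "openai-completions",
    compat: {
      // Session affinity only helps if the proxy actually honours it.
      sendSessionAffinityHeaders: true,
      // The two flags the 2.5.5 README note asks for on DeepSeek models.
      requiresReasoningContentOnAssistantMessages: true,
      thinkingFormat: "deepseek",
    },
    models: [{ id: "deepseek-chat" }],
  },
};

console.log(JSON.stringify(providers["my-proxy"].compat, null, 2));
```

`supportsLongCacheRetention` is left out on purpose: the notes above say to add it only when the endpoint explicitly supports long prompt cache retention.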
@@ -114,7 +113,7 @@ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 
 Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
 
-Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
+Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Mimo, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
 
 Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
 
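To make the footer format concrete, here is a minimal sketch that builds a line in the documented shape. The function names are hypothetical, and the token scaling (millions, three decimals) is inferred from the sample `0.002M/0.005M` rather than taken from the extension's code, so treat the rounding rules as an assumption.

```typescript
// Hypothetical helper: renders a token count in millions with three decimals,
// matching the look of the sample footer ("0.002M"). Inferred, not confirmed.
function formatMega(tokens: number): string {
  return (tokens / 1e6).toFixed(3) + "M";
}

// Hypothetical helper: assembles
// "<label> <hits>/<total> · <cached>/<input> tok (<rate>%)".
function formatFooter(
  label: string,
  hitRequests: number,
  totalRequests: number,
  cachedInputTokens: number,
  totalInputTokens: number,
): string {
  // Guard against division by zero when no input tokens were recorded.
  const rate = totalInputTokens > 0
    ? Math.round((cachedInputTokens / totalInputTokens) * 100)
    : 0;
  return `${label} ${hitRequests}/${totalRequests} · ` +
    `${formatMega(cachedInputTokens)}/${formatMega(totalInputTokens)} tok (${rate}%)`;
}

console.log(formatFooter("OpenAI cache", 3, 10, 2000, 5000));
```

With 2,000 cached tokens out of 5,000 input tokens across 10 requests (3 hits), this reproduces the sample line from the hunk header, minus the optional `⚠️ compat` suffix.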
package/README.zh-CN.md
CHANGED
@@ -8,8 +8,6 @@
 
 An extension for improving provider-side KV cache / prompt cache hit rates in Pi: it moves stable prompt content to the front, adds a conservative `prompt_cache_key` to OpenAI-compatible requests, flags common cache-routing compatibility problems on proxy channels, and shows read-only cache stats in the footer.
 
-**GitHub About:** Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.
-
 > This package was renamed from `pi-deepseek-cache-optimizer`. Existing footer stats migrate automatically. This extension never creates, modifies, backs up, or deletes your `~/.pi/agent/models.json`.
 
 ## Contents
@@ -100,6 +98,7 @@ LiteLLM / OneAPI / NewAPI / OpenRouter-like channels and other third-party `openai-completion
 - `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
 - If you see `400 Unsupported parameter: prompt_cache_retention`, remove or avoid `supportsLongCacheRetention` for that channel; keep `sendSessionAffinityHeaders` if it is supported.
 - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see specific advice for the current model.
+- For DeepSeek models, the Pi Mono guidance expects both `compat.requiresReasoningContentOnAssistantMessages: true` and `compat.thinkingFormat: "deepseek"` when supported, together with the cache / session-affinity compat flags.
 - This extension only advises; it does not modify `models.json`.
 
 ## Footer stats
@@ -114,7 +113,7 @@ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 
 Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
 
-Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, Aya.
+Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Mimo, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, Aya.
 
 Adapter selection looks only at the model id/name (plus the assistant message's model/name at message_end). Using an OpenAI-shaped API alone is not treated as OpenAI-family unless the model id/name matches a supported family.
 
package/index.ts
CHANGED
@@ -126,6 +126,7 @@ const MIN_STABLE_CANDIDATE_LENGTH = 8;
 const ASSISTANT_MESSAGE_MODEL_TOKEN_KEYS = ["model", "name"];
 const OPENAI_REASONING_MODEL_PATTERN = /(^|[/\s:_-])o[1345]($|[-_.:/\s])/;
 const XAI_MODEL_PATTERN = /(^|[/\s:_-])xai($|[-_.:/\s])/;
+const MIMO_MODEL_PATTERN = /(^|[/\s:_-])mi-?mo($|[-_.:/\s])/i;
 const PPLX_MODEL_PATTERN = /(^|[/\s:_-])pplx($|[-_.:/\s])/i;
 const NOVA_MODEL_PATTERN = /(^|[/\s:_-])nova($|[-_.:/\s])/i;
 const MPT_MODEL_PATTERN = /(^|[/\s:_-])mpt($|[-_.:/\s])/i;
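The new `MIMO_MODEL_PATTERN` only matches `mimo` or `mi-mo` when it is delimited on both sides (start/end of string or one of the listed separators). The short sketch below copies the pattern from the hunk and runs it against a few model ids; the sample ids are made up for illustration.

```typescript
// Pattern copied verbatim from the diff; sample ids below are hypothetical.
const MIMO_MODEL_PATTERN = /(^|[/\s:_-])mi-?mo($|[-_.:/\s])/i;

const samples = ["mimo-v2", "xiaomi/MiMo-7B", "mi-mo", "mimosa-1", "xiaomimimo"];

// Map each sample id to whether the delimited pattern matches it.
const results: Record<string, boolean> = {};
for (const id of samples) {
  results[id] = MIMO_MODEL_PATTERN.test(id);
}

console.log(results);
```

`mimosa-1` is rejected because `mimo` is followed by a letter, and `xiaomimimo` is rejected because `mimo` is not preceded by a separator. That second case is presumably why the matcher functions added in this release also do a plain substring check for `xiaomimimo`.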
@@ -141,6 +142,7 @@ type CacheCompat = {
   sendSessionIdHeader?: boolean;
   supportsLongCacheRetention?: boolean;
   thinkingFormat?: string;
+  requiresReasoningContentOnAssistantMessages?: boolean;
   cacheControlFormat?: string;
 };
 
@@ -831,6 +833,18 @@ function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefi
   return modelOrAssistantMessageHas(message, model, ["minimax"]);
 }
 
+function isMimoLikeModel(model: PiModel | undefined): boolean {
+  const tokens = getModelIdNameTokenValues(model);
+  return hasAnyTokenContaining(tokens, ["xiaomimimo"]) || tokens.some((t) => MIMO_MODEL_PATTERN.test(t));
+}
+function isMimoLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  const allTokens = [
+    ...getModelIdNameTokenValues(model),
+    ...getAssistantMessageModelTokenValues(message),
+  ];
+  return hasAnyTokenContaining(allTokens, ["xiaomimimo"]) || allTokens.some((t) => MIMO_MODEL_PATTERN.test(t));
+}
+
 function isHunyuanLikeModel(model: PiModel | undefined): boolean {
   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
 }
@@ -1492,7 +1506,7 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
 /**
  * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
  * gate. Warns for ANY model using openai-completions through a non-official base
- * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
+ * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Mimo, Hunyuan, and any other
  * OpenAI-compatible proxy.
  */
 function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
@@ -1590,10 +1604,88 @@ function describeMissingDeepSeekCompat(model: PiModel): string[] {
   } else if (compat.sendSessionAffinityHeaders !== true) {
     missing.push("sendSessionAffinityHeaders");
   }
+  if (compat.requiresReasoningContentOnAssistantMessages !== true) {
+    missing.push("requiresReasoningContentOnAssistantMessages");
+  }
+  if (compat.thinkingFormat !== "deepseek") {
+    missing.push("thinkingFormat");
+  }
 
   return missing;
 }
 
+function isDeepSeekCompatCheckApplicable(model: PiModel): boolean {
+  return isDeepSeekLikeModel(model) && isOpenAICompatibleApi(model.api);
+}
+
+function describeMissingCacheCompatForModel(model: PiModel): string[] {
+  if (isDeepSeekCompatCheckApplicable(model)) {
+    return describeMissingDeepSeekCompat(model);
+  }
+  return describeMissingOpenAICompatibleProxyCompat(model);
+}
+
+function buildDeepSeekCompatSuggestion(missing: string[]): Record<string, unknown> {
+  const suggestion: Record<string, unknown> = {};
+
+  if (missing.includes("supportsLongCacheRetention")) {
+    suggestion.supportsLongCacheRetention = true;
+  }
+  if (missing.includes("sendSessionIdHeader")) {
+    suggestion.sendSessionIdHeader = true;
+  }
+  if (missing.includes("sendSessionAffinityHeaders")) {
+    suggestion.sendSessionAffinityHeaders = true;
+  }
+  if (missing.includes("requiresReasoningContentOnAssistantMessages")) {
+    suggestion.requiresReasoningContentOnAssistantMessages = true;
+  }
+  if (missing.includes("thinkingFormat")) {
+    suggestion.thinkingFormat = "deepseek";
+  }
+
+  return suggestion;
+}
+
+function appendDeepSeekCompatAdviceLines(lines: string[], missing: string[]): void {
+  const suggestion = buildDeepSeekCompatSuggestion(missing);
+  if (Object.keys(suggestion).length > 0) {
+    lines.push("Recommended DeepSeek compat snippet:");
+    lines.push(JSON.stringify(suggestion, null, 2));
+  }
+
+  if (missing.includes("requiresReasoningContentOnAssistantMessages")) {
+    lines.push('- requiresReasoningContentOnAssistantMessages: true keeps replayed assistant turns compatible with DeepSeek reasoning_content requirements.');
+  }
+  if (missing.includes("thinkingFormat")) {
+    lines.push('- thinkingFormat: "deepseek" tells Pi to use DeepSeek reasoning/thinking parameter format.');
+  }
+  if (missing.includes("sendSessionAffinityHeaders")) {
+    lines.push("- sendSessionAffinityHeaders: recommended for OpenAI-compatible DeepSeek proxies when supported; it helps keep one Pi session on the same upstream/backend.");
+  }
+  if (missing.includes("sendSessionIdHeader")) {
+    lines.push("- sendSessionIdHeader: recommended for OpenAI Responses-compatible DeepSeek proxies when supported.");
+  }
+  if (missing.includes("supportsLongCacheRetention")) {
+    lines.push("- supportsLongCacheRetention: enable for DeepSeek-compatible endpoints that support long cache retention.");
+  }
+}
+
+function buildDeepSeekCompatWarningText(key: string, missing: string[]): string {
+  const slashIdx = key.indexOf("/");
+  const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
+  const modelsJsonPath = getModelsJsonDisplayPath();
+  const lines: string[] = [
+    `💡 pi-cache-optimizer: ${key} is DeepSeek-like but merged compat lacks ${missing.join(" and ")}.`,
+    `Proxies may reduce or hide cache hits. Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (at the same level as baseUrl/api/apiKey/models).`,
+    "",
+  ];
+
+  appendDeepSeekCompatAdviceLines(lines, missing);
+
+  return lines.join("\n");
+}
+
 const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
   {
     id: "deepseek",
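The hunk above first collects the names of missing compat flags and then turns that list into a JSON snippet the user can paste. The condensed sketch below illustrates that two-step flow for just the two reasoning-related flags added in this release. `missingReasoningCompat` and `suggestionFor` are hypothetical names for this example, and the earlier cache/session checks of `describeMissingDeepSeekCompat` are not reproduced because they are not visible in the diff.

```typescript
// Hypothetical, trimmed-down compat type: only the two fields checked below.
type ReasoningCompatSketch = {
  requiresReasoningContentOnAssistantMessages?: boolean;
  thinkingFormat?: string;
};

// Step 1: list which of the two DeepSeek reasoning flags are absent or wrong.
function missingReasoningCompat(compat: ReasoningCompatSketch): string[] {
  const missing: string[] = [];
  if (compat.requiresReasoningContentOnAssistantMessages !== true) {
    missing.push("requiresReasoningContentOnAssistantMessages");
  }
  if (compat.thinkingFormat !== "deepseek") {
    missing.push("thinkingFormat");
  }
  return missing;
}

// Step 2: turn the missing list into the values a user would add under `compat`.
function suggestionFor(missing: string[]): Record<string, unknown> {
  const suggestion: Record<string, unknown> = {};
  if (missing.includes("requiresReasoningContentOnAssistantMessages")) {
    suggestion.requiresReasoningContentOnAssistantMessages = true;
  }
  if (missing.includes("thinkingFormat")) {
    suggestion.thinkingFormat = "deepseek";
  }
  return suggestion;
}

// A compat block that sets a different thinking format and no reasoning flag.
const missing = missingReasoningCompat({ thinkingFormat: "openai" });
console.log(missing.join(", "));
console.log(JSON.stringify(suggestionFor(missing), null, 2));
```

A value such as `thinkingFormat: "openai"` counts as missing here because the check compares against the exact string `"deepseek"`, mirroring the comparison in the diff.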
@@ -1613,13 +1705,7 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       if (missing.length === 0) return undefined;
 
       const key = modelKey(model);
-
-      const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
-      const modelsJsonPath = getModelsJsonDisplayPath();
-      return (
-        `💡 pi-cache-optimizer: ${key} is DeepSeek-like but merged compat lacks ${missing.join(" and ")}. ` +
-        `Proxies may reduce or hide cache hits. Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (at the same level as baseUrl/api/apiKey/models).`
-      );
+      return buildDeepSeekCompatWarningText(key, missing);
     },
   },
   {
@@ -1742,6 +1828,23 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
     },
   },
+  {
+    id: "openai" as CacheProviderId,
+    label: "Mimo cache",
+    matchesModel: isMimoLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isMimoLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
   {
     id: "openai" as CacheProviderId,
     label: "Hunyuan cache",
@@ -3028,6 +3131,12 @@ function isCompatCheckApplicable(model: PiModel): boolean {
   return lower(model.api) === "openai-completions" && !isOfficialOpenAIBaseUrl(model);
 }
 
+function isPromptCacheRetention400Applicable(model: PiModel): boolean {
+  return isOpenAICompatibleApi(model.api) &&
+    !isOfficialOpenAIBaseUrl(model) &&
+    getCompat(model).supportsLongCacheRetention === true;
+}
+
 /**
  * Detect router / channel profiles from a PiModel and return diagnostic notes.
  *
@@ -3171,7 +3280,7 @@ function describeRouterChannelDiagnostics(model: PiModel): string[] {
 
   // ── 4. Generic third-party OpenAI-compatible proxy ─────────────────
   if (api === "openai-completions" && baseUrl) {
-    const missing =
+    const missing = describeMissingCacheCompatForModel(model);
     notes.push(
       "🔀 Router/channel: Third-party OpenAI-compatible proxy. If cache hit rates are low:",
     );
@@ -3207,7 +3316,8 @@ function buildDoctorDiagnosis(model: PiModel, options: { promptCacheRetention400
   const compat = getCompat(model);
   lines.push(`Compat: ${JSON.stringify(compat)}`);
 
-  const
+  const deepSeekCompatApplicable = isDeepSeekCompatCheckApplicable(model);
+  const missing = describeMissingCacheCompatForModel(model);
   if (missing.length > 0) {
     lines.push(`⚠️ Missing compat flags: ${missing.join(", ")}`);
     const key = modelKey(model);
@@ -3215,14 +3325,18 @@ function buildDoctorDiagnosis(model: PiModel, options: { promptCacheRetention400
     const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
     const modelsJsonPath = getModelsJsonDisplayPath();
     lines.push(`Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (same level as baseUrl/api/apiKey/models).`);
-
-
+    if (deepSeekCompatApplicable) {
+      appendDeepSeekCompatAdviceLines(lines, missing);
+    } else {
+      appendOpenAIProxyCompatAdviceLines(lines, missing);
+    }
+  } else if (deepSeekCompatApplicable || isCompatCheckApplicable(model)) {
     lines.push("✅ Compat fully configured.");
   } else {
     lines.push("ℹ️ Compat check not applicable for this model.");
   }
 
-  if (
+  if (isPromptCacheRetention400Applicable(model)) {
     lines.push("");
     if (options.promptCacheRetention400) {
       lines.push("⚠️ A 400 response was observed while supportsLongCacheRetention is enabled.");
@@ -3274,8 +3388,8 @@ function buildLowHitDiagnosis(
 ): string[] {
   const lines: string[] = [];
 
-  // 1. Missing compat flags (
-  const missingCompat =
+  // 1. Missing compat flags (adapter-aware: DeepSeek has extra reasoning compat)
+  const missingCompat = describeMissingCacheCompatForModel(model);
 
   // 2. Router/channel risk (reuse existing check)
   const routerNotes = describeRouterChannelDiagnostics(model);
@@ -3297,6 +3411,13 @@ function buildLowHitDiagnosis(
   const hasRouterRisk = routerNotes.length > 0;
   const hasUsageMissing = missingUsageSamples > 0;
 
+  // Today's cached-token ratio is used both inside and outside the recent-sample
+  // branch. Keep it block-external so doctor/stats never throw for low-hit
+  // models that have persisted counters but no recent in-memory samples.
+  const todayHitRatio = todayStats.totalInputTokens > 0
+    ? Math.round((todayStats.cachedInputTokens / todayStats.totalInputTokens) * 100)
+    : 0;
+
   // Determine if there are actual issues worth flagging
   const hasActualIssues = hasMissingCompat || hasUsageMissing ||
     // Low hit trend (today total > 3 and hit ratio < 30%)
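The comment in this hunk explains why `todayHitRatio` moved out of the recent-sample branch: a `const` declared inside an `if` block is not in scope for code outside it. The sketch below shows the hoisted form under simplified assumptions. `DayStatsSketch`, `hitRatioPercent`, and `isLowHitTrend` are hypothetical names, and the thresholds (more than 3 requests, under 30%) are taken from the "Low hit trend" comment in the diff.

```typescript
// Hypothetical shape standing in for the extension's per-day counters.
type DayStatsSketch = {
  totalRequests: number;
  totalInputTokens: number;
  cachedInputTokens: number;
};

// Integer percentage of cached input tokens; 0 when nothing was recorded,
// which avoids dividing by zero.
function hitRatioPercent(stats: DayStatsSketch): number {
  return stats.totalInputTokens > 0
    ? Math.round((stats.cachedInputTokens / stats.totalInputTokens) * 100)
    : 0;
}

function isLowHitTrend(stats: DayStatsSketch, recentSampleCount: number): boolean {
  // Computed once, before any branch on recentSampleCount, so the value is
  // available even when there are persisted counters but no recent samples.
  const todayHitRatio = hitRatioPercent(stats);

  if (recentSampleCount > 0) {
    // Branch-local work that also needs todayHitRatio would go here.
  }
  return stats.totalRequests > 3 && todayHitRatio < 30;
}

const persistedOnly: DayStatsSketch = {
  totalRequests: 5,
  totalInputTokens: 1000,
  cachedInputTokens: 250,
};
console.log(hitRatioPercent(persistedOnly), isLowHitTrend(persistedOnly, 0));
```

With zero recent samples the function still evaluates the daily ratio (25% here), which is the situation the diff's comment describes.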
@@ -3337,10 +3458,6 @@ function buildLowHitDiagnosis(
   // Priority 4: recent trend low
   if (recent10Total > 0) {
     const hitRatio = recent10Input > 0 ? Math.round((recent10Cached / recent10Input) * 100) : 0;
-    const todayHitRatio = todayStats.totalInputTokens > 0
-      ? Math.round((todayStats.cachedInputTokens / todayStats.totalInputTokens) * 100)
-      : 0;
-
     if (recent10Hits === 0 && todayStats.totalRequests > 3 && todayHitRatio < 30) {
       lines.push(`📉 Cache hit rate is low: ${todayHitRatio}% today (${recent10Total} recent samples).`);
       lines.push(" Likely causes: proxy routing to different backends per request,");
@@ -3371,7 +3488,8 @@ function buildLowHitDiagnosis(
 }
 
 function buildCompatDiagnosis(model: PiModel): string | undefined {
-  const missing =
+  const missing = describeMissingCacheCompatForModel(model);
+  const deepSeekCompatApplicable = isDeepSeekCompatCheckApplicable(model);
   const routerNotes = describeRouterChannelDiagnostics(model);
 
   if (missing.length === 0 && routerNotes.length === 0) return undefined;
@@ -3388,14 +3506,18 @@ function buildCompatDiagnosis(model: PiModel): string | undefined {
     lines.push("");
     lines.push(`Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat`);
     lines.push(`(at the same level as baseUrl/api/apiKey/models).`);
-
+    if (deepSeekCompatApplicable) {
+      appendDeepSeekCompatAdviceLines(lines, missing);
+    } else {
+      appendOpenAIProxyCompatAdviceLines(lines, missing);
+    }
   }
 
   // When compat is fully configured but router notes exist, prefix the status.
   if (routerNotes.length > 0 && missing.length === 0) {
-    if (isCompatCheckApplicable(model)) {
+    if (deepSeekCompatApplicable || isCompatCheckApplicable(model)) {
       lines.push("✅ Compat fully configured.");
-      if (
+      if (isPromptCacheRetention400Applicable(model)) {
         lines.push(getPromptCacheRetentionUnsupportedHint());
       }
     } else {
@@ -3441,9 +3563,16 @@ export const __internals_for_tests = {
   isOpenAIFamilyToken,
   describeMissingOpenAIFamilyProxyCompat,
   describeMissingOpenAICompatibleProxyCompat,
+  describeMissingDeepSeekCompat,
+  isDeepSeekCompatCheckApplicable,
+  describeMissingCacheCompatForModel,
+  buildDeepSeekCompatSuggestion,
+  buildDeepSeekCompatWarningText,
   buildSafeOpenAIProxyCompatSuggestion,
   getPromptCacheRetentionUnsupportedHint,
   isOfficialOpenAIBaseUrl,
+  isCompatCheckApplicable,
+  isPromptCacheRetention400Applicable,
   // Non-GPT OpenAI-compatible model detection
   isKimiLikeModel,
   isKimiLikeAssistantMessage,
@@ -3453,6 +3582,8 @@ export const __internals_for_tests = {
   isGLMLikeAssistantMessage,
   isMiniMaxLikeModel,
   isMiniMaxLikeAssistantMessage,
+  isMimoLikeModel,
+  isMimoLikeAssistantMessage,
   isHunyuanLikeModel,
   isHunyuanLikeAssistantMessage,
   // Additional OpenAI-compatible model detection
@@ -3551,6 +3682,8 @@ export const __internals_for_tests = {
   isRwkvLikeAssistantMessage,
   isAyaLikeModel,
   isAyaLikeAssistantMessage,
+  selectAdapterForModel,
+  selectAdapterForAssistantMessage,
   buildOpenAIProxyCompatWarningText,
   getModelIdNameTokenValues,
   getAssistantMessageModelTokenValues,
@@ -3899,15 +4032,15 @@ export default function (pi: ExtensionAPI) {
       }
     }
 
-    // ⚠️ compat footer marker: if the active model
-    //
-    //
-    // compat configuration is incomplete.
-    // update so the marker persists through stats
-    // rollovers. Redundant setStatus calls are blocked by the
+    // ⚠️ compat footer marker: if the active model has adapter-specific
+    // missing compat (DeepSeek reasoning/cache compat, or a non-official
+    // openai-completions model missing cache/session-affinity flags), append
+    // the marker to indicate that compat configuration is incomplete.
+    // Re-evaluated on every status update so the marker persists through stats
+    // changes and day rollovers. Redundant setStatus calls are blocked by the
     // `lastStatusText` early return above.
     if (runtimeOptimizerEnabled && statusText !== undefined && model) {
-      const compatMissing =
+      const compatMissing = describeMissingCacheCompatForModel(model);
       if (compatMissing.length > 0) {
         statusText = statusText + " ⚠️ compat";
       }
@@ -4027,8 +4160,7 @@ export default function (pi: ExtensionAPI) {
     const model = ctx.model;
     if (!runtimeOptimizerEnabled || !model) return;
     if (event.status !== 400) return;
-    if (!
-    if (getCompat(model).supportsLongCacheRetention !== true) return;
+    if (!isPromptCacheRetention400Applicable(model)) return;
 
     const key = modelKey(model);
     promptCacheRetention400Models.add(key);
@@ -4140,7 +4272,7 @@ export default function (pi: ExtensionAPI) {
         cmdCtx.ui.notify(compatResult, "warning");
       } else {
         cmdCtx.ui.notify(
-          isCompatCheckApplicable(model)
+          isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)
             ? "✅ Compat fully configured."
             : "ℹ️ Compat check not applicable for this model.",
           "info",
@@ -4238,7 +4370,7 @@ export default function (pi: ExtensionAPI) {
         cmdCtx.ui.notify(compatResult, "warning");
       } else {
         cmdCtx.ui.notify(
-          isCompatCheckApplicable(model)
+          isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)
             ? "✅ Compat fully configured."
             : "ℹ️ Compat check not applicable for this model.",
           "info",
@@ -4285,11 +4417,11 @@ export default function (pi: ExtensionAPI) {
     diagnosis.push("");
     if (model) {
       const displayKey = modelKey(model);
-      const missing =
+      const missing = describeMissingCacheCompatForModel(model);
       if (missing.length > 0) {
         diagnosis.push(`⚠️ Active model "${displayKey}" missing compat: ${missing.join(", ")}`);
         diagnosis.push('Run "/cache-optimizer compat" for edit instructions.');
-      } else if (isCompatCheckApplicable(model)) {
+      } else if (isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)) {
         diagnosis.push(`✅ Active model "${displayKey}": compat fully configured.`);
       } else {
         diagnosis.push(`ℹ️ Active model "${displayKey}": compat check not applicable.`);
package/package.json
CHANGED