pi-cache-optimizer 2.5.3 → 2.5.5

package/README.md CHANGED
@@ -8,8 +8,6 @@
 
 Pi extension for improving provider-side KV / prompt cache hit rates. It keeps stable prompt content near the front, adds a conservative OpenAI-compatible `prompt_cache_key` fallback, warns about common proxy cache-routing gaps, and shows read-only footer cache stats.
 
-**GitHub About:** Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.
-
 > Renamed from `pi-deepseek-cache-optimizer`. Existing footer counters migrate automatically. This package never creates, edits, backs up, or deletes your `~/.pi/agent/models.json`.
 
 ## Contents
@@ -100,6 +98,7 @@ Notes:
 - `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
 - If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
 - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
+- For DeepSeek models, the Pi Mono guidance expects `compat.requiresReasoningContentOnAssistantMessages: true` and `compat.thinkingFormat: "deepseek"` alongside cache/session-affinity flags when the endpoint supports them.
 - This extension only advises; it does not edit `models.json`.
 
 ## Footer stats
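The DeepSeek note added in this hunk can be pictured as a provider entry. The following is a minimal sketch only: the provider name `my-proxy`, the placeholder URL, and the `ProviderSketch` type are hypothetical, and the placement of `compat` next to `baseUrl`/`api` is taken from the warning text elsewhere in this diff rather than from a documented schema.

```typescript
// Hypothetical shape of one provider entry; only the compat flag names come from the diff.
type CompatSketch = {
  supportsLongCacheRetention?: boolean;
  sendSessionAffinityHeaders?: boolean;
  requiresReasoningContentOnAssistantMessages?: boolean;
  thinkingFormat?: string;
};

type ProviderSketch = { baseUrl: string; api: string; compat: CompatSketch };

const providerSketch: ProviderSketch = {
  baseUrl: "https://proxy.example.invalid/v1", // placeholder, not a real endpoint
  api: "openai-completions",
  compat: {
    // The two DeepSeek-specific flags named in the README note.
    requiresReasoningContentOnAssistantMessages: true,
    thinkingFormat: "deepseek",
    // Cache/session-affinity flag: keep only if the endpoint supports it.
    sendSessionAffinityHeaders: true,
  },
};

// Print the fragment as it might appear under providers["my-proxy"].
console.log(JSON.stringify({ providers: { "my-proxy": providerSketch } }, null, 2));
```

Since the extension only advises, a fragment like this would have to be added to `models.json` by hand.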
@@ -114,7 +113,7 @@ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 
 Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
 
-Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
+Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Mimo, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
 
 Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
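The footer format described in this hunk can be sketched as a small formatter. This is an illustration under assumptions, not the package's code: `FooterStatsSketch`, `formatMillions`, and `formatFooter` are hypothetical names, and the three-decimal "M" rendering is inferred from the sample line in the hunk header.

```typescript
// Hypothetical stats shape; the field names are not taken from the package.
type FooterStatsSketch = {
  label: string;
  hitRequests: number;
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
};

// Render a token count in millions with three decimals (assumed from the sample line).
function formatMillions(tokens: number): string {
  return `${(tokens / 1e6).toFixed(3)}M`;
}

// Build "<label> <hits>/<total> · <cached>/<input> tok (<rate>%)".
function formatFooter(s: FooterStatsSketch): string {
  const rate = s.totalInputTokens > 0
    ? Math.round((s.cachedInputTokens / s.totalInputTokens) * 100)
    : 0; // avoid dividing by zero before any request has been counted
  return `${s.label} ${s.hitRequests}/${s.totalRequests} · ` +
    `${formatMillions(s.cachedInputTokens)}/${formatMillions(s.totalInputTokens)} tok (${rate}%)`;
}

console.log(formatFooter({
  label: "OpenAI cache",
  hitRequests: 3,
  totalRequests: 10,
  cachedInputTokens: 2000,
  totalInputTokens: 5000,
}));
```

The optional `· write <tokens> tok` suffix and the `⚠️ compat` / `⚠️ integrity` markers are left out of the sketch.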
 
package/README.zh-CN.md CHANGED
@@ -8,8 +8,6 @@
 
 用于提升 Pi 中 provider 侧 KV Cache / Prompt Cache 命中率的扩展:把稳定 prompt 内容前置,给 OpenAI-compatible 请求补保守的 `prompt_cache_key`,提示代理渠道常见缓存路由兼容问题,并在底部显示只读缓存统计。
 
-**GitHub About:** Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.
-
 > 本包已从 `pi-deepseek-cache-optimizer` 改名。已有底部统计会自动迁移。本扩展绝不会创建、修改、备份或删除你的 `~/.pi/agent/models.json`。
 
 ## 目录
@@ -100,6 +98,7 @@ LiteLLM / OneAPI / NewAPI / 类 OpenRouter 渠道等第三方 `openai-completion
 - `supportsLongCacheRetention: true` 是可选项。只有 endpoint 明确支持 OpenAI long prompt cache retention 时才添加。
 - 如果出现 `400 Unsupported parameter: prompt_cache_retention`,请为该渠道移除 / 避免 `supportsLongCacheRetention`;如支持,可保留 `sendSessionAffinityHeaders`。
 - 使用 `/cache-optimizer compat` 或 `/cache-optimizer doctor` 查看当前模型的具体建议。
+- 对 DeepSeek 模型,Pi Mono 指南期望在支持时同时设置 `compat.requiresReasoningContentOnAssistantMessages: true` 和 `compat.thinkingFormat: "deepseek"`,再配合缓存 / session-affinity 相关 compat。
 - 本扩展只给建议,不会修改 `models.json`。
 
 ## Footer 统计
@@ -114,7 +113,7 @@ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 
 格式:`<label> <命中请求数>/<总请求数> · <cached input tokens>/<total input tokens> tok (<token 命中率>)`。部分 adapter 还可能追加 `· write <tokens> tok`,运行时诊断可能追加 `⚠️ compat` 或 `⚠️ integrity`。
 
-支持的 footer label 包括:DS、Claude、OpenAI、Gemini、Kimi、Qwen、GLM、MiniMax、Hunyuan、Mistral、Grok、Llama、Nemotron、Cohere、Yi、Doubao、ERNIE、Baichuan、StepFun、Spark、InternLM、Gemma、Phi、Jamba、Solar、Sonar、Nova、Reka、Falcon、DBRX、MPT、StableLM、Aquila、EXAONE、HyperCLOVA、Luminous、Hermes、Granite、Arctic、Pangu、SenseNova、Zhinao、MiniCPM、XVERSE、Orion、OpenChat、Vicuna、Wizard、Zephyr、Dolphin、OpenOrca、Starling、BLOOM、RWKV、Aya。
+支持的 footer label 包括:DS、Claude、OpenAI、Gemini、Kimi、Qwen、GLM、MiniMax、Mimo、Hunyuan、Mistral、Grok、Llama、Nemotron、Cohere、Yi、Doubao、ERNIE、Baichuan、StepFun、Spark、InternLM、Gemma、Phi、Jamba、Solar、Sonar、Nova、Reka、Falcon、DBRX、MPT、StableLM、Aquila、EXAONE、HyperCLOVA、Luminous、Hermes、Granite、Arctic、Pangu、SenseNova、Zhinao、MiniCPM、XVERSE、Orion、OpenChat、Vicuna、Wizard、Zephyr、Dolphin、OpenOrca、Starling、BLOOM、RWKV、Aya。
 
 Adapter 选择只看模型 id/name(以及 message_end 时 assistant message 的 model/name)。仅使用 OpenAI-shaped API 不会被当作 OpenAI-family,除非模型 id/name 匹配受支持的家族。
 
package/index.ts CHANGED
@@ -126,6 +126,7 @@ const MIN_STABLE_CANDIDATE_LENGTH = 8;
 const ASSISTANT_MESSAGE_MODEL_TOKEN_KEYS = ["model", "name"];
 const OPENAI_REASONING_MODEL_PATTERN = /(^|[/\s:_-])o[1345]($|[-_.:/\s])/;
 const XAI_MODEL_PATTERN = /(^|[/\s:_-])xai($|[-_.:/\s])/;
+const MIMO_MODEL_PATTERN = /(^|[/\s:_-])mi-?mo($|[-_.:/\s])/i;
 const PPLX_MODEL_PATTERN = /(^|[/\s:_-])pplx($|[-_.:/\s])/i;
 const NOVA_MODEL_PATTERN = /(^|[/\s:_-])nova($|[-_.:/\s])/i;
 const MPT_MODEL_PATTERN = /(^|[/\s:_-])mpt($|[-_.:/\s])/i;
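To see what the new `MIMO_MODEL_PATTERN` accepts, the regex can be run against a few ids. The pattern is copied from the hunk above; the sample ids and the `mimoSamples` name are made up for illustration. The pattern only matches `mimo` / `mi-mo` when it is delimited on both sides, which is presumably why `isMimoLikeModel` also checks for the substring `xiaomimimo` separately.

```typescript
// Pattern copied verbatim from the diff.
const MIMO_MODEL_PATTERN = /(^|[/\s:_-])mi-?mo($|[-_.:/\s])/i;

// Hypothetical model ids, chosen to exercise the delimiter rules.
const mimoSamples: Array<[string, boolean]> = [
  ["mimo-v2-flash", true],   // starts the string, followed by "-"
  ["xiaomi/MiMo-7B", true],  // preceded by "/", case-insensitive
  ["mi-mo", true],           // optional hyphen, ends the string
  ["mimosa", false],         // followed by a letter, so not delimited
  ["optimimo", false],       // preceded by a letter
  ["xiaomimimo", false],     // not delimited; handled by the substring check instead
];

for (const [id, expected] of mimoSamples) {
  console.log(id, MIMO_MODEL_PATTERN.test(id), "expected:", expected);
}
```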
@@ -141,6 +142,7 @@ type CacheCompat = {
   sendSessionIdHeader?: boolean;
   supportsLongCacheRetention?: boolean;
   thinkingFormat?: string;
+  requiresReasoningContentOnAssistantMessages?: boolean;
   cacheControlFormat?: string;
 };
 
@@ -831,6 +833,18 @@ function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefi
   return modelOrAssistantMessageHas(message, model, ["minimax"]);
 }
 
+function isMimoLikeModel(model: PiModel | undefined): boolean {
+  const tokens = getModelIdNameTokenValues(model);
+  return hasAnyTokenContaining(tokens, ["xiaomimimo"]) || tokens.some((t) => MIMO_MODEL_PATTERN.test(t));
+}
+function isMimoLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  const allTokens = [
+    ...getModelIdNameTokenValues(model),
+    ...getAssistantMessageModelTokenValues(message),
+  ];
+  return hasAnyTokenContaining(allTokens, ["xiaomimimo"]) || allTokens.some((t) => MIMO_MODEL_PATTERN.test(t));
+}
+
 function isHunyuanLikeModel(model: PiModel | undefined): boolean {
   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
 }
@@ -1492,7 +1506,7 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
 /**
  * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
  * gate. Warns for ANY model using openai-completions through a non-official base
- * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
+ * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Mimo, Hunyuan, and any other
  * OpenAI-compatible proxy.
  */
 function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
@@ -1590,10 +1604,88 @@ function describeMissingDeepSeekCompat(model: PiModel): string[] {
   } else if (compat.sendSessionAffinityHeaders !== true) {
     missing.push("sendSessionAffinityHeaders");
   }
+  if (compat.requiresReasoningContentOnAssistantMessages !== true) {
+    missing.push("requiresReasoningContentOnAssistantMessages");
+  }
+  if (compat.thinkingFormat !== "deepseek") {
+    missing.push("thinkingFormat");
+  }
 
   return missing;
 }
 
+function isDeepSeekCompatCheckApplicable(model: PiModel): boolean {
+  return isDeepSeekLikeModel(model) && isOpenAICompatibleApi(model.api);
+}
+
+function describeMissingCacheCompatForModel(model: PiModel): string[] {
+  if (isDeepSeekCompatCheckApplicable(model)) {
+    return describeMissingDeepSeekCompat(model);
+  }
+  return describeMissingOpenAICompatibleProxyCompat(model);
+}
+
+function buildDeepSeekCompatSuggestion(missing: string[]): Record<string, unknown> {
+  const suggestion: Record<string, unknown> = {};
+
+  if (missing.includes("supportsLongCacheRetention")) {
+    suggestion.supportsLongCacheRetention = true;
+  }
+  if (missing.includes("sendSessionIdHeader")) {
+    suggestion.sendSessionIdHeader = true;
+  }
+  if (missing.includes("sendSessionAffinityHeaders")) {
+    suggestion.sendSessionAffinityHeaders = true;
+  }
+  if (missing.includes("requiresReasoningContentOnAssistantMessages")) {
+    suggestion.requiresReasoningContentOnAssistantMessages = true;
+  }
+  if (missing.includes("thinkingFormat")) {
+    suggestion.thinkingFormat = "deepseek";
+  }
+
+  return suggestion;
+}
+
+function appendDeepSeekCompatAdviceLines(lines: string[], missing: string[]): void {
+  const suggestion = buildDeepSeekCompatSuggestion(missing);
+  if (Object.keys(suggestion).length > 0) {
+    lines.push("Recommended DeepSeek compat snippet:");
+    lines.push(JSON.stringify(suggestion, null, 2));
+  }
+
+  if (missing.includes("requiresReasoningContentOnAssistantMessages")) {
+    lines.push('- requiresReasoningContentOnAssistantMessages: true keeps replayed assistant turns compatible with DeepSeek reasoning_content requirements.');
+  }
+  if (missing.includes("thinkingFormat")) {
+    lines.push('- thinkingFormat: "deepseek" tells Pi to use DeepSeek reasoning/thinking parameter format.');
+  }
+  if (missing.includes("sendSessionAffinityHeaders")) {
+    lines.push("- sendSessionAffinityHeaders: recommended for OpenAI-compatible DeepSeek proxies when supported; it helps keep one Pi session on the same upstream/backend.");
+  }
+  if (missing.includes("sendSessionIdHeader")) {
+    lines.push("- sendSessionIdHeader: recommended for OpenAI Responses-compatible DeepSeek proxies when supported.");
+  }
+  if (missing.includes("supportsLongCacheRetention")) {
+    lines.push("- supportsLongCacheRetention: enable for DeepSeek-compatible endpoints that support long cache retention.");
+  }
+}
+
+function buildDeepSeekCompatWarningText(key: string, missing: string[]): string {
+  const slashIdx = key.indexOf("/");
+  const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
+  const modelsJsonPath = getModelsJsonDisplayPath();
+  const lines: string[] = [
+    `💡 pi-cache-optimizer: ${key} is DeepSeek-like but merged compat lacks ${missing.join(" and ")}.`,
+    `Proxies may reduce or hide cache hits. Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (at the same level as baseUrl/api/apiKey/models).`,
+    "",
+  ];
+
+  appendDeepSeekCompatAdviceLines(lines, missing);
+
+  return lines.join("\n");
+}
+
 const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
   {
     id: "deepseek",
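The mapping from missing flag names to a suggested compat snippet can be tried in isolation. The sketch below is a condensed stand-in for `buildDeepSeekCompatSuggestion`, assuming the same flag names and the same fixed key order as the added code; `buildSuggestionSketch` and `BOOLEAN_FLAGS_SKETCH` are hypothetical names.

```typescript
// Boolean flags in the order the diff's function checks them.
const BOOLEAN_FLAGS_SKETCH = [
  "supportsLongCacheRetention",
  "sendSessionIdHeader",
  "sendSessionAffinityHeaders",
  "requiresReasoningContentOnAssistantMessages",
];

// Condensed stand-in: known boolean flags become `true`,
// thinkingFormat becomes "deepseek", anything else is ignored.
function buildSuggestionSketch(missing: string[]): Record<string, unknown> {
  const suggestion: Record<string, unknown> = {};
  for (const flag of BOOLEAN_FLAGS_SKETCH) {
    if (missing.includes(flag)) suggestion[flag] = true;
  }
  if (missing.includes("thinkingFormat")) suggestion.thinkingFormat = "deepseek";
  return suggestion;
}

// Key order in the output follows the fixed order above, not the order of `missing`.
console.log(JSON.stringify(
  buildSuggestionSketch(["thinkingFormat", "requiresReasoningContentOnAssistantMessages"]),
  null,
  2,
));
```

Because the result is built in a fixed order, the printed snippet stays stable regardless of how the missing list was assembled.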
@@ -1613,13 +1705,7 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       if (missing.length === 0) return undefined;
 
       const key = modelKey(model);
-      const slashIdx = key.indexOf("/");
-      const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
-      const modelsJsonPath = getModelsJsonDisplayPath();
-      return (
-        `💡 pi-cache-optimizer: ${key} is DeepSeek-like but merged compat lacks ${missing.join(" and ")}. ` +
-        `Proxies may reduce or hide cache hits. Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (at the same level as baseUrl/api/apiKey/models).`
-      );
+      return buildDeepSeekCompatWarningText(key, missing);
     },
   },
   {
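The inline warning builder removed here derived the provider label from the model key, and the extracted helper keeps the same logic. A minimal sketch of that step follows, assuming keys look like `provider/model` (inferred from the slicing, not confirmed by the diff); `providerLabelFromKey` and the sample keys are hypothetical.

```typescript
// Take everything before the first "/" as the provider label.
// `slashIdx > 0` means a key with no slash, or one starting with "/", is returned whole.
function providerLabelFromKey(key: string): string {
  const slashIdx = key.indexOf("/");
  return slashIdx > 0 ? key.slice(0, slashIdx) : key;
}

console.log(providerLabelFromKey("my-proxy/deepseek-chat")); // provider part only
console.log(providerLabelFromKey("deepseek-chat"));          // no slash: whole key
console.log(providerLabelFromKey("/odd-key"));               // leading slash: whole key
```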
@@ -1742,6 +1828,23 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
     },
   },
+  {
+    id: "openai" as CacheProviderId,
+    label: "Mimo cache",
+    matchesModel: isMimoLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isMimoLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
   {
     id: "openai" as CacheProviderId,
     label: "Hunyuan cache",
@@ -3028,6 +3131,12 @@ function isCompatCheckApplicable(model: PiModel): boolean {
   return lower(model.api) === "openai-completions" && !isOfficialOpenAIBaseUrl(model);
 }
 
+function isPromptCacheRetention400Applicable(model: PiModel): boolean {
+  return isOpenAICompatibleApi(model.api) &&
+    !isOfficialOpenAIBaseUrl(model) &&
+    getCompat(model).supportsLongCacheRetention === true;
+}
+
 /**
  * Detect router / channel profiles from a PiModel and return diagnostic notes.
  *
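The new `isPromptCacheRetention400Applicable` gate combines three conditions. The sketch below shows how they interact, under loud assumptions: `ModelSketch`, `isOpenAICompatibleApiSketch`, and `isOfficialBaseUrlSketch` are hypothetical stand-ins, because the real `isOpenAICompatibleApi`, `isOfficialOpenAIBaseUrl`, and `getCompat` helpers are not shown in this diff and may behave differently.

```typescript
// Hypothetical model shape for the sketch.
type ModelSketch = {
  api: string;
  baseUrl?: string;
  compat?: { supportsLongCacheRetention?: boolean };
};

// Stand-in only: treats any "openai-*" api string as OpenAI-compatible.
function isOpenAICompatibleApiSketch(api: string): boolean {
  return api.toLowerCase().startsWith("openai-");
}

// Stand-in only: treats a missing base URL or api.openai.com as official.
function isOfficialBaseUrlSketch(model: ModelSketch): boolean {
  return !model.baseUrl || model.baseUrl.includes("api.openai.com");
}

// Same three-way conjunction as the added function.
function retention400ApplicableSketch(model: ModelSketch): boolean {
  return isOpenAICompatibleApiSketch(model.api) &&
    !isOfficialBaseUrlSketch(model) &&
    model.compat?.supportsLongCacheRetention === true;
}

const proxyWithFlag: ModelSketch = {
  api: "openai-completions",
  baseUrl: "https://proxy.example.invalid/v1", // placeholder URL
  compat: { supportsLongCacheRetention: true },
};
const proxyWithoutFlag: ModelSketch = { ...proxyWithFlag, compat: {} };

console.log(retention400ApplicableSketch(proxyWithFlag));    // flag set on a proxy
console.log(retention400ApplicableSketch(proxyWithoutFlag)); // flag absent
```

Folding the flag check into the gate lets the 400 handler and the doctor output share one predicate instead of repeating two separate conditions.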
@@ -3171,7 +3280,7 @@ function describeRouterChannelDiagnostics(model: PiModel): string[] {
 
   // ── 4. Generic third-party OpenAI-compatible proxy ─────────────────
   if (api === "openai-completions" && baseUrl) {
-    const missing = describeMissingOpenAICompatibleProxyCompat(model);
+    const missing = describeMissingCacheCompatForModel(model);
     notes.push(
       "🔀 Router/channel: Third-party OpenAI-compatible proxy. If cache hit rates are low:",
     );
@@ -3207,7 +3316,8 @@ function buildDoctorDiagnosis(model: PiModel, options: { promptCacheRetention400
   const compat = getCompat(model);
   lines.push(`Compat: ${JSON.stringify(compat)}`);
 
-  const missing = describeMissingOpenAICompatibleProxyCompat(model);
+  const deepSeekCompatApplicable = isDeepSeekCompatCheckApplicable(model);
+  const missing = describeMissingCacheCompatForModel(model);
   if (missing.length > 0) {
     lines.push(`⚠️ Missing compat flags: ${missing.join(", ")}`);
     const key = modelKey(model);
@@ -3215,14 +3325,18 @@ function buildDoctorDiagnosis(model: PiModel, options: { promptCacheRetention400
     const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
     const modelsJsonPath = getModelsJsonDisplayPath();
     lines.push(`Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat (same level as baseUrl/api/apiKey/models).`);
-    appendOpenAIProxyCompatAdviceLines(lines, missing);
-  } else if (isCompatCheckApplicable(model)) {
+    if (deepSeekCompatApplicable) {
+      appendDeepSeekCompatAdviceLines(lines, missing);
+    } else {
+      appendOpenAIProxyCompatAdviceLines(lines, missing);
+    }
+  } else if (deepSeekCompatApplicable || isCompatCheckApplicable(model)) {
     lines.push("✅ Compat fully configured.");
   } else {
     lines.push("ℹ️ Compat check not applicable for this model.");
   }
 
-  if (isCompatCheckApplicable(model) && compat.supportsLongCacheRetention === true) {
+  if (isPromptCacheRetention400Applicable(model)) {
     lines.push("");
     if (options.promptCacheRetention400) {
       lines.push("⚠️ A 400 response was observed while supportsLongCacheRetention is enabled.");
@@ -3274,8 +3388,8 @@ function buildLowHitDiagnosis(
 ): string[] {
   const lines: string[] = [];
 
-  // 1. Missing compat flags (reuse existing check)
-  const missingCompat = describeMissingOpenAICompatibleProxyCompat(model);
+  // 1. Missing compat flags (adapter-aware: DeepSeek has extra reasoning compat)
+  const missingCompat = describeMissingCacheCompatForModel(model);
 
   // 2. Router/channel risk (reuse existing check)
   const routerNotes = describeRouterChannelDiagnostics(model);
@@ -3297,6 +3411,13 @@ function buildLowHitDiagnosis(
   const hasRouterRisk = routerNotes.length > 0;
   const hasUsageMissing = missingUsageSamples > 0;
 
+  // Today's cached-token ratio is used both inside and outside the recent-sample
+  // branch. Keep it block-external so doctor/stats never throw for low-hit
+  // models that have persisted counters but no recent in-memory samples.
+  const todayHitRatio = todayStats.totalInputTokens > 0
+    ? Math.round((todayStats.cachedInputTokens / todayStats.totalInputTokens) * 100)
+    : 0;
+
   // Determine if there are actual issues worth flagging
   const hasActualIssues = hasMissingCompat || hasUsageMissing ||
     // Low hit trend (today total > 3 and hit ratio < 30%)
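The hoisted `todayHitRatio` computation and the low-hit threshold mentioned in the comment can be sketched on their own. This is an illustration under assumptions: `TodayStatsSketch`, `todayHitRatioPercent`, and `isLowHitTrend` are hypothetical names, and only the division guard, the rounding, and the "more than 3 requests, below 30%" rule are taken from the diff.

```typescript
// Hypothetical subset of the persisted daily counters.
type TodayStatsSketch = {
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
};

// Percentage of input tokens served from cache, rounded; 0 when nothing was counted.
function todayHitRatioPercent(stats: TodayStatsSketch): number {
  return stats.totalInputTokens > 0
    ? Math.round((stats.cachedInputTokens / stats.totalInputTokens) * 100)
    : 0;
}

// Low-hit rule from the comment: more than 3 requests today and a ratio below 30%.
function isLowHitTrend(stats: TodayStatsSketch): boolean {
  return stats.totalRequests > 3 && todayHitRatioPercent(stats) < 30;
}

const lowDay: TodayStatsSketch = { totalRequests: 8, cachedInputTokens: 1200, totalInputTokens: 10000 };
const emptyDay: TodayStatsSketch = { totalRequests: 0, cachedInputTokens: 0, totalInputTokens: 0 };

console.log(todayHitRatioPercent(lowDay), isLowHitTrend(lowDay));
console.log(todayHitRatioPercent(emptyDay), isLowHitTrend(emptyDay));
```

Computing the ratio once at function scope means every later check can read it, including code paths that run when there are no recent in-memory samples.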
@@ -3337,10 +3458,6 @@ function buildLowHitDiagnosis(
   // Priority 4: recent trend low
   if (recent10Total > 0) {
     const hitRatio = recent10Input > 0 ? Math.round((recent10Cached / recent10Input) * 100) : 0;
-    const todayHitRatio = todayStats.totalInputTokens > 0
-      ? Math.round((todayStats.cachedInputTokens / todayStats.totalInputTokens) * 100)
-      : 0;
-
     if (recent10Hits === 0 && todayStats.totalRequests > 3 && todayHitRatio < 30) {
       lines.push(`📉 Cache hit rate is low: ${todayHitRatio}% today (${recent10Total} recent samples).`);
       lines.push(" Likely causes: proxy routing to different backends per request,");
@@ -3371,7 +3488,8 @@ function buildLowHitDiagnosis(
 }
 
 function buildCompatDiagnosis(model: PiModel): string | undefined {
-  const missing = describeMissingOpenAICompatibleProxyCompat(model);
+  const missing = describeMissingCacheCompatForModel(model);
+  const deepSeekCompatApplicable = isDeepSeekCompatCheckApplicable(model);
   const routerNotes = describeRouterChannelDiagnostics(model);
 
   if (missing.length === 0 && routerNotes.length === 0) return undefined;
@@ -3388,14 +3506,18 @@ function buildCompatDiagnosis(model: PiModel): string | undefined {
     lines.push("");
     lines.push(`Edit ${modelsJsonPath} -> providers["${providerLabel}"] -> compat`);
     lines.push(`(at the same level as baseUrl/api/apiKey/models).`);
-    appendOpenAIProxyCompatAdviceLines(lines, missing);
+    if (deepSeekCompatApplicable) {
+      appendDeepSeekCompatAdviceLines(lines, missing);
+    } else {
+      appendOpenAIProxyCompatAdviceLines(lines, missing);
+    }
   }
 
   // When compat is fully configured but router notes exist, prefix the status.
   if (routerNotes.length > 0 && missing.length === 0) {
-    if (isCompatCheckApplicable(model)) {
+    if (deepSeekCompatApplicable || isCompatCheckApplicable(model)) {
       lines.push("✅ Compat fully configured.");
-      if (getCompat(model).supportsLongCacheRetention === true) {
+      if (isPromptCacheRetention400Applicable(model)) {
         lines.push(getPromptCacheRetentionUnsupportedHint());
       }
     } else {
@@ -3441,9 +3563,16 @@ export const __internals_for_tests = {
   isOpenAIFamilyToken,
   describeMissingOpenAIFamilyProxyCompat,
   describeMissingOpenAICompatibleProxyCompat,
+  describeMissingDeepSeekCompat,
+  isDeepSeekCompatCheckApplicable,
+  describeMissingCacheCompatForModel,
+  buildDeepSeekCompatSuggestion,
+  buildDeepSeekCompatWarningText,
   buildSafeOpenAIProxyCompatSuggestion,
   getPromptCacheRetentionUnsupportedHint,
   isOfficialOpenAIBaseUrl,
+  isCompatCheckApplicable,
+  isPromptCacheRetention400Applicable,
   // Non-GPT OpenAI-compatible model detection
   isKimiLikeModel,
   isKimiLikeAssistantMessage,
@@ -3453,6 +3582,8 @@ export const __internals_for_tests = {
   isGLMLikeAssistantMessage,
   isMiniMaxLikeModel,
   isMiniMaxLikeAssistantMessage,
+  isMimoLikeModel,
+  isMimoLikeAssistantMessage,
   isHunyuanLikeModel,
   isHunyuanLikeAssistantMessage,
   // Additional OpenAI-compatible model detection
@@ -3551,6 +3682,8 @@ export const __internals_for_tests = {
   isRwkvLikeAssistantMessage,
   isAyaLikeModel,
   isAyaLikeAssistantMessage,
+  selectAdapterForModel,
+  selectAdapterForAssistantMessage,
   buildOpenAIProxyCompatWarningText,
   getModelIdNameTokenValues,
   getAssistantMessageModelTokenValues,
@@ -3899,15 +4032,15 @@ export default function (pi: ExtensionAPI) {
       }
     }
 
-    // ⚠️ compat footer marker: if the active model is a non-official
-    // openai-completions model with missing supportsLongCacheRetention
-    // or sendSessionAffinityHeaders, append the marker to indicate that
-    // compat configuration is incomplete. Re-evaluated on every status
-    // update so the marker persists through stats changes and day
-    // rollovers. Redundant setStatus calls are blocked by the
+    // ⚠️ compat footer marker: if the active model has adapter-specific
+    // missing compat (DeepSeek reasoning/cache compat, or a non-official
+    // openai-completions model missing cache/session-affinity flags), append
+    // the marker to indicate that compat configuration is incomplete.
+    // Re-evaluated on every status update so the marker persists through stats
+    // changes and day rollovers. Redundant setStatus calls are blocked by the
     // `lastStatusText` early return above.
     if (runtimeOptimizerEnabled && statusText !== undefined && model) {
-      const compatMissing = describeMissingOpenAICompatibleProxyCompat(model);
+      const compatMissing = describeMissingCacheCompatForModel(model);
       if (compatMissing.length > 0) {
         statusText = statusText + " ⚠️ compat";
       }
@@ -4027,8 +4160,7 @@ export default function (pi: ExtensionAPI) {
     const model = ctx.model;
     if (!runtimeOptimizerEnabled || !model) return;
     if (event.status !== 400) return;
-    if (!isCompatCheckApplicable(model)) return;
-    if (getCompat(model).supportsLongCacheRetention !== true) return;
+    if (!isPromptCacheRetention400Applicable(model)) return;
 
     const key = modelKey(model);
     promptCacheRetention400Models.add(key);
@@ -4140,7 +4272,7 @@ export default function (pi: ExtensionAPI) {
         cmdCtx.ui.notify(compatResult, "warning");
       } else {
         cmdCtx.ui.notify(
-          isCompatCheckApplicable(model)
+          isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)
             ? "✅ Compat fully configured."
             : "ℹ️ Compat check not applicable for this model.",
           "info",
@@ -4238,7 +4370,7 @@ export default function (pi: ExtensionAPI) {
         cmdCtx.ui.notify(compatResult, "warning");
       } else {
         cmdCtx.ui.notify(
-          isCompatCheckApplicable(model)
+          isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)
             ? "✅ Compat fully configured."
             : "ℹ️ Compat check not applicable for this model.",
           "info",
@@ -4285,11 +4417,11 @@ export default function (pi: ExtensionAPI) {
       diagnosis.push("");
       if (model) {
         const displayKey = modelKey(model);
-        const missing = describeMissingOpenAICompatibleProxyCompat(model);
+        const missing = describeMissingCacheCompatForModel(model);
         if (missing.length > 0) {
           diagnosis.push(`⚠️ Active model "${displayKey}" missing compat: ${missing.join(", ")}`);
           diagnosis.push('Run "/cache-optimizer compat" for edit instructions.');
-        } else if (isCompatCheckApplicable(model)) {
+        } else if (isDeepSeekCompatCheckApplicable(model) || isCompatCheckApplicable(model)) {
           diagnosis.push(`✅ Active model "${displayKey}": compat fully configured.`);
         } else {
           diagnosis.push(`ℹ️ Active model "${displayKey}": compat check not applicable.`);
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.5.3",
+  "version": "2.5.5",
   "description": "Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.",
   "keywords": [
     "pi-package",