pi-cache-optimizer 2.4.2 → 2.4.4

package/README.md CHANGED
@@ -35,6 +35,17 @@ This release keeps the original DeepSeek behavior and adds read-only stats adapt
35
35
  |---|---|---|---|
36
36
  | DeepSeek | Model id/name contains `deepseek` | `DS cache` | Pi `usage.cacheRead`/`usage.input`, or raw `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, `prompt_tokens` when visible |
37
37
  | OpenAI-family | Model id/name contains conservative OpenAI-family tokens such as `gpt-`, `chatgpt`, `o1`, `o3`, `o4`, or `o5` | `OpenAI cache` | Pi-normalized usage, or raw `prompt_tokens_details.cached_tokens` / `input_tokens_details.cached_tokens` with prompt/input totals |
38
+ | Kimi / Moonshot | Model id/name contains `kimi` | `Kimi cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
39
+ | Qwen / Alibaba | Model id/name contains `qwen` | `Qwen cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
40
+ | GLM / Zhipu | Model id/name contains `glm` | `GLM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
41
+ | MiniMax | Model id/name contains `minimax` | `MiniMax cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
42
+ | Hunyuan / Tencent | Model id/name contains `hunyuan` | `Hunyuan cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
43
+ | Mistral | Model id/name contains `mistral`, `mixtral`, or `codestral` | `Mistral cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
44
+ | xAI / Grok | Model id/name contains `grok`, or pattern `xai` with safe boundaries | `Grok cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
45
+ | Meta / Llama | Model id/name contains `llama` | `Llama cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
46
+ | NVIDIA Nemotron | Model id/name contains `nemotron` | `Nemotron cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
47
+ | Cohere / Command | Model id/name contains `cohere` or `command-r` | `Cohere cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
48
+ | Yi / 零一万物 | Model id/name contains `yi-`, `01-ai`, `zero-one`, or pattern `yi` with safe boundaries | `Yi cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
38
49
  | Anthropic / Claude | Model id/name contains `anthropic` or `claude` | `Claude cache` | Pi-normalized usage, or raw `cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` |
39
50
  | Gemini / Vertex | Model id/name contains `gemini` or `vertex` | `Gemini cache` | Pi-normalized usage, or raw Gemini/Vertex cached-content token metadata when visible |
40
51
 
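The "raw OpenAI-shaped fields" in the table above can be turned into a hit rate along these lines. This is a minimal sketch, not the extension's actual normalizer: the field names come from the OpenAI-family row, while `OpenAIShapedUsage`, `CacheStats`, and `readOpenAIShapedCacheStats` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes: only the field names come from the README table.
type OpenAIShapedUsage = {
  prompt_tokens?: number;
  input_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
  input_tokens_details?: { cached_tokens?: number };
};

type CacheStats = { cached: number; total: number; hitRate: number };

// Returns undefined when the provider does not expose both a total and a
// cached-token count, so no statistic is claimed for invisible fields.
function readOpenAIShapedCacheStats(usage: OpenAIShapedUsage): CacheStats | undefined {
  const total = usage.prompt_tokens ?? usage.input_tokens;
  const cached =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens;

  if (typeof total !== "number" || total <= 0) return undefined;
  if (typeof cached !== "number") return undefined;

  // Clamp so a malformed payload cannot report more than 100%.
  const safeCached = Math.min(Math.max(cached, 0), total);
  return { cached: safeCached, total, hitRate: safeCached / total };
}
```

Returning `undefined` rather than a zero hit rate keeps the footer from reporting a cache miss when the upstream simply hides the cached-token field.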
@@ -192,7 +203,7 @@ After: [stable tools + rules | dynamic git status | task context]
192
203
  ↓ stable prefix → higher chance of cache reuse
193
204
  ```
194
205
 
195
- Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
206
+ Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API — only custom transports like `kiro-api` are excluded. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
196
207
 
197
208
  ## Improving cache hit rate
198
209
 
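The `prompt_cache_key` fallback described in this hunk (fill a missing or blank key with the Pi session id, never overwrite a non-empty one) might look roughly like the sketch below. `RequestPayload`, `hasEffectiveCacheKey`, and `fillPromptCacheKey` are hypothetical stand-ins; the real `addOpenAIPromptCacheKey` is not shown in this diff and may behave differently in edge cases.

```typescript
type RequestPayload = Record<string, unknown>;

// True when the payload already carries a usable key under either spelling.
function hasEffectiveCacheKey(payload: RequestPayload): boolean {
  return ["prompt_cache_key", "promptCacheKey"].some((field) => {
    const value = payload[field];
    return typeof value === "string" && value.trim().length > 0;
  });
}

// Hypothetical stand-in for the extension's helper.
// Returns a new payload with the key filled, or undefined to leave the
// request untouched.
function fillPromptCacheKey(
  payload: RequestPayload,
  sessionId: string | undefined,
): RequestPayload | undefined {
  if (!sessionId || sessionId.trim().length === 0) return undefined;
  if (hasEffectiveCacheKey(payload)) return undefined;
  return { ...payload, prompt_cache_key: sessionId };
}
```

Returning a copy instead of mutating the payload is a design choice of this sketch; it keeps the "never overwrite" rule easy to check in isolation.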
@@ -207,7 +218,7 @@ What the extension does automatically:
207
218
  Provider notes:
208
219
 
209
220
  - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
210
- - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
221
+ - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to ALL models using `openai-completions` / `openai-responses` (not just GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models also benefit. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
211
222
  - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
212
223
  - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
213
224
  - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
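The two opt-out variables named in the OpenAI-family note can be read as a single gate. The sketch below assumes that only the exact values `1` and `0` shown in the README disable the fallback; the real `shouldInjectOpenAIPromptCacheKey` is not part of this diff and may accept other spellings. `Env` and `shouldInjectPromptCacheKey` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only the literal values documented in the README are honored.
function shouldInjectPromptCacheKey(env: Env): boolean {
  if (env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"] === "1") return false;
  if (env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"] === "0") return false;
  return true; // the fallback is on by default
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the gate testable without touching the process environment.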
@@ -226,7 +237,7 @@ This package now has provider-family stats adapters, but it still avoids blind g
226
237
 
227
238
  - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
228
239
  - Injecting Anthropic `cache_control` markers.
229
- - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to OpenAI-family id/name plus `openai-completions` / `openai-responses`.
240
+ - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to `openai-completions` / `openai-responses` only (custom transports like `kiro-api` are excluded, but the model name no longer needs to be GPT-family).
230
241
  - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
231
242
  - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
232
243
  - Claiming stats for providers that do not expose reliable cache usage.
package/README.zh-CN.md CHANGED
@@ -38,6 +38,17 @@
38
38
  |---|---|---|---|
39
39
  | DeepSeek | Model id/name contains `deepseek` | `DS cache` | Pi `usage.cacheRead`/`usage.input`, or visible raw fields `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, `prompt_tokens` |
40
40
  | OpenAI-family | Model id/name contains conservative OpenAI-family tokens such as `gpt-`, `chatgpt`, `o1`, `o3`, `o4`, or `o5` | `OpenAI cache` | Pi-normalized usage, or visible raw fields `prompt_tokens_details.cached_tokens` / `input_tokens_details.cached_tokens` with prompt/input totals |
41
+ | Kimi / Moonshot | Model id/name contains `kimi` | `Kimi cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
42
+ | Qwen / Alibaba | Model id/name contains `qwen` | `Qwen cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
43
+ | GLM / Zhipu | Model id/name contains `glm` | `GLM cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
44
+ | MiniMax | Model id/name contains `minimax` | `MiniMax cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
45
+ | Hunyuan / Tencent | Model id/name contains `hunyuan` | `Hunyuan cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
46
+ | Mistral | Model id/name contains `mistral`, `mixtral`, or `codestral` | `Mistral cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
47
+ | xAI / Grok | Model id/name contains `grok`, or the `xai` pattern within safe boundaries | `Grok cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
48
+ | Meta / Llama | Model id/name contains `llama` | `Llama cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
49
+ | NVIDIA Nemotron | Model id/name contains `nemotron` | `Nemotron cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
50
+ | Cohere / Command | Model id/name contains `cohere` or `command-r` | `Cohere cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
51
+ | Yi / 零一万物 | Model id/name contains `yi-`, `01-ai`, `zero-one`, or the `yi` pattern within safe boundaries | `Yi cache` | Pi-normalized usage, or visible OpenAI-shaped fields |
41
52
  | Anthropic / Claude | Model id/name contains `anthropic` or `claude` | `Claude cache` | Pi-normalized usage, or visible raw fields `cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` |
42
53
  | Gemini / Vertex | Model id/name contains `gemini` or `vertex` | `Gemini cache` | Pi-normalized usage, or visible Gemini/Vertex cached-content token metadata |
43
54
 
@@ -195,7 +206,7 @@ Provider caches usually depend on exact or near-exact prefix matching. Pi's system
195
206
  ↓ stable prefix unchanged → cache hits become more likely
196
207
  ```
197
208
 
198
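- Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, when the top-level `prompt_cache_key` is missing or empty, it fills it with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the footer status bar.
209
+ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), when the top-level `prompt_cache_key` is missing or empty, it fills it with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API; only custom transports such as `kiro-api` are not injected. The extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the footer status bar.
199
210
 
200
211
  ## Improving cache hit rate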
201
212
 
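@@ -210,7 +221,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
210
221
  Provider notes:
211
222
 
212
223
  - DeepSeek: the existing behavior remains the reference path. Stable prefix ordering plus long-retention / session-affinity compat gives the best chance of automatic KV prefix reuse.
213
- - OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Keep static instructions, tools, examples, and specs before changing user/task context wherever possible. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (aligned with Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support the field may reject the request; custom APIs are not injected.
224
+ - OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Keep static instructions, tools, examples, and specs before changing user/task context wherever possible. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (aligned with Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to all models using `openai-completions` / `openai-responses` (not only GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models benefit as well. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support the field may reject the request; custom APIs are not injected.
214
225
  - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint explicitly supports it.
215
226
  - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
216
227
  - Proxies/aggregators: pin upstream routing/provider order where possible. If the same model id/name can route to different upstreams, the cache hit rate becomes unstable.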
@@ -230,7 +241,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
230
241
 
231
242
  - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is the OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
232
243
  - Injecting Anthropic `cache_control` markers.
233
- - Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback requires both that the model id/name belongs to OpenAI-family and that the API is `openai-completions` / `openai-responses`.
244
+ - Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback only requires that the API is `openai-completions` / `openai-responses` (custom transports such as `kiro-api` are not injected, but the model name no longer needs to be GPT-family).
234
245
  - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
235
246
  - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
236
247
  - Claiming stats support for providers that do not expose reliable cache usage.
package/index.ts CHANGED
@@ -80,6 +80,7 @@ const MIN_STABLE_CANDIDATE_LENGTH = 8;
80
80
 
81
81
  const ASSISTANT_MESSAGE_MODEL_TOKEN_KEYS = ["model", "name"];
82
82
  const OPENAI_REASONING_MODEL_PATTERN = /(^|[/\s:_-])o[1345]($|[-_.:/\s])/;
83
+ const XAI_MODEL_PATTERN = /(^|[/\s:_-])xai($|[-_.:/\s])/;
83
84
 
84
85
  type CacheCompat = {
85
86
  sendSessionAffinityHeaders?: boolean;
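The "safe boundaries" of the new `XAI_MODEL_PATTERN` can be checked in isolation. The regular expression below is copied from the hunk above; `looksLikeXai` is a hypothetical wrapper, and the sketch assumes tokens are already lower-cased, which the diff does not show.

```typescript
const XAI_MODEL_PATTERN = /(^|[/\s:_-])xai($|[-_.:/\s])/;

// Assumes the caller has already lower-cased the token (not shown in the diff).
function looksLikeXai(token: string): boolean {
  return XAI_MODEL_PATTERN.test(token);
}

// "xai" must be delimited on both sides by a string edge or a separator.
const xaiExamples: Array<[string, boolean]> = [
  ["xai", true],
  ["xai/some-model", true],
  ["provider:xai-beta", true],
  ["maxai-7b", false], // preceded by a letter
  ["xaibot", false], // followed by a letter
];
```

The boundary groups are what keep a bare `xai` substring inside an unrelated id from selecting the Grok adapter.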
@@ -635,6 +636,101 @@ function isGeminiLikeAssistantMessage(message: unknown, model: PiModel | undefin
635
636
  return modelOrAssistantMessageHas(message, model, ["gemini", "vertex"]);
636
637
  }
637
638
 
639
+ // ── Non-GPT OpenAI-compatible model detection ──────────────────────
640
+
641
+ function isKimiLikeModel(model: PiModel | undefined): boolean {
642
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["kimi"]);
643
+ }
644
+ function isKimiLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
645
+ return modelOrAssistantMessageHas(message, model, ["kimi"]);
646
+ }
647
+
648
+ function isQwenLikeModel(model: PiModel | undefined): boolean {
649
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["qwen"]);
650
+ }
651
+ function isQwenLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
652
+ return modelOrAssistantMessageHas(message, model, ["qwen"]);
653
+ }
654
+
655
+ function isGLMLikeModel(model: PiModel | undefined): boolean {
656
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["glm"]);
657
+ }
658
+ function isGLMLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
659
+ return modelOrAssistantMessageHas(message, model, ["glm"]);
660
+ }
661
+
662
+ function isMiniMaxLikeModel(model: PiModel | undefined): boolean {
663
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["minimax"]);
664
+ }
665
+ function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
666
+ return modelOrAssistantMessageHas(message, model, ["minimax"]);
667
+ }
668
+
669
+ function isHunyuanLikeModel(model: PiModel | undefined): boolean {
670
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
671
+ }
672
+ function isHunyuanLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
673
+ return modelOrAssistantMessageHas(message, model, ["hunyuan"]);
674
+ }
675
+
676
+ // ── Additional OpenAI-compatible model detection ──────────────────
677
+
678
+ function isMistralLikeModel(model: PiModel | undefined): boolean {
679
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["mistral", "mixtral", "codestral"]);
680
+ }
681
+ function isMistralLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
682
+ return modelOrAssistantMessageHas(message, model, ["mistral", "mixtral", "codestral"]);
683
+ }
684
+
685
+ function isGrokLikeModel(model: PiModel | undefined): boolean {
686
+ const tokens = getModelIdNameTokenValues(model);
687
+ return hasAnyTokenContaining(tokens, ["grok"]) || tokens.some((t) => XAI_MODEL_PATTERN.test(t));
688
+ }
689
+ function isGrokLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
690
+ const allTokens = [
691
+ ...getModelIdNameTokenValues(model),
692
+ ...getAssistantMessageModelTokenValues(message),
693
+ ];
694
+ return hasAnyTokenContaining(allTokens, ["grok"]) || allTokens.some((t) => XAI_MODEL_PATTERN.test(t));
695
+ }
696
+
697
+ function isLlamaLikeModel(model: PiModel | undefined): boolean {
698
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["llama"]);
699
+ }
700
+ function isLlamaLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
701
+ return modelOrAssistantMessageHas(message, model, ["llama"]);
702
+ }
703
+
704
+ function isNemotronLikeModel(model: PiModel | undefined): boolean {
705
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["nemotron"]);
706
+ }
707
+ function isNemotronLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
708
+ return modelOrAssistantMessageHas(message, model, ["nemotron"]);
709
+ }
710
+
711
+ function isCohereLikeModel(model: PiModel | undefined): boolean {
712
+ return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["cohere", "command-r"]);
713
+ }
714
+ function isCohereLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
715
+ return modelOrAssistantMessageHas(message, model, ["cohere", "command-r"]);
716
+ }
717
+
718
+ const YI_MODEL_PATTERN = /(^|[\/\s:_-])yi($|[\-_.:\/\s])/;
719
+
720
+ function isYiLikeModel(model: PiModel | undefined): boolean {
721
+ const tokens = getModelIdNameTokenValues(model);
722
+ return hasAnyTokenContaining(tokens, ["yi-", "01-ai", "zero-one"]) || tokens.some((t) => YI_MODEL_PATTERN.test(t));
723
+ }
724
+ function isYiLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
725
+ const allTokens = [
726
+ ...getModelIdNameTokenValues(model),
727
+ ...getAssistantMessageModelTokenValues(message),
728
+ ];
729
+ return hasAnyTokenContaining(allTokens, ["yi-", "01-ai", "zero-one"]) || allTokens.some((t) => YI_MODEL_PATTERN.test(t));
730
+ }
731
+
732
+ // ── Model key ──────────────────────────────────────────────────────
733
+
638
734
  function modelKey(model: PiModel): string {
639
735
  return `${model.provider}/${model.id}`;
640
736
  }
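The Yi detector in this hunk combines substring tokens with a boundary pattern. The sketch below reproduces `YI_MODEL_PATTERN` from the hunk and assumes that `hasAnyTokenContaining` is a plain lower-cased substring check; that helper's body is not part of this diff, so `containsAny` and `looksLikeYi` are hypothetical stand-ins.

```typescript
const YI_MODEL_PATTERN = /(^|[\/\s:_-])yi($|[\-_.:\/\s])/;

// Hypothetical stand-in for hasAnyTokenContaining: lower-cased substring match.
function containsAny(tokens: string[], needles: string[]): boolean {
  return tokens.some((t) => needles.some((n) => t.toLowerCase().includes(n)));
}

function looksLikeYi(tokens: string[]): boolean {
  return (
    containsAny(tokens, ["yi-", "01-ai", "zero-one"]) ||
    tokens.some((t) => YI_MODEL_PATTERN.test(t.toLowerCase()))
  );
}
```

Under that substring assumption, the `yi-` needle is looser than the boundary pattern: an id such as `taiyi-7b` would match through the needle even though the pattern alone rejects it, while `yield-7b` matches neither.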
@@ -857,6 +953,29 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
857
953
  return missing;
858
954
  }
859
955
 
956
+ /**
957
+ * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
958
+ * gate. Warns for ANY model using openai-completions through a non-official base
959
+ * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
960
+ * OpenAI-compatible proxy.
961
+ */
962
+ function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
963
+ const compat = getCompat(model);
964
+ const missing: string[] = [];
965
+
966
+ if (lower(model.api) !== "openai-completions") return missing;
967
+ if (isOfficialOpenAIBaseUrl(model)) return missing;
968
+
969
+ if (compat.supportsLongCacheRetention !== true) {
970
+ missing.push("supportsLongCacheRetention");
971
+ }
972
+ if (compat.sendSessionAffinityHeaders !== true) {
973
+ missing.push("sendSessionAffinityHeaders");
974
+ }
975
+
976
+ return missing;
977
+ }
978
+
860
979
  /**
861
980
  * Build the warning text displayed to users when an OpenAI-family third-party
862
981
  * proxy is missing one or more cache/session-affinity compat flags.
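The new `describeMissingOpenAICompatibleProxyCompat` can be exercised with a self-contained approximation. The control flow below follows the hunk above; `ProxyModel`, `isOfficialBaseUrlSketch`, and `missingProxyCompat` are hypothetical, and the official-URL check in particular is a guess, since `isOfficialOpenAIBaseUrl` and `getCompat` are not shown in this diff.

```typescript
type Compat = {
  supportsLongCacheRetention?: boolean;
  sendSessionAffinityHeaders?: boolean;
};
type ProxyModel = { api?: string; baseUrl: string; compat?: Compat };

// Hypothetical stand-in for isOfficialOpenAIBaseUrl.
function isOfficialBaseUrlSketch(model: ProxyModel): boolean {
  return /^https:\/\/api\.openai\.com(\/|$)/.test(model.baseUrl);
}

// Mirrors the hunk: only openai-completions on a non-official base URL is checked.
function missingProxyCompat(model: ProxyModel): string[] {
  const missing: string[] = [];
  if ((model.api ?? "").toLowerCase() !== "openai-completions") return missing;
  if (isOfficialBaseUrlSketch(model)) return missing;

  if (model.compat?.supportsLongCacheRetention !== true) {
    missing.push("supportsLongCacheRetention");
  }
  if (model.compat?.sendSessionAffinityHeaders !== true) {
    missing.push("sendSessionAffinityHeaders");
  }
  return missing;
}
```

As in the hunk, a model on `openai-responses` produces no warning here, even though the `prompt_cache_key` fallback itself covers both APIs.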
@@ -971,7 +1090,7 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
971
1090
  return normalizeWithFallback(message, getOpenAIRawUsage);
972
1091
  },
973
1092
  warningText(model) {
974
- const missing = describeMissingOpenAIFamilyProxyCompat(model);
1093
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
975
1094
  if (missing.length === 0) return undefined;
976
1095
  return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
977
1096
  },
@@ -988,6 +1107,195 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
988
1107
  return normalizeWithFallback(message, getGeminiRawUsage);
989
1108
  },
990
1109
  },
1110
+ // ── Non-GPT OpenAI-compatible adapters ──────────────────────
1111
+ {
1112
+ id: "openai" as CacheProviderId,
1113
+ label: "Kimi cache",
1114
+ matchesModel: isKimiLikeModel,
1115
+ matchesAssistantMessage(message, model) {
1116
+ if (!isAssistantMessage(message)) return false;
1117
+ return isKimiLikeAssistantMessage(message, model);
1118
+ },
1119
+ normalizeUsage(message) {
1120
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1121
+ },
1122
+ warningText(model) {
1123
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1124
+ if (missing.length === 0) return undefined;
1125
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1126
+ },
1127
+ },
1128
+ {
1129
+ id: "openai" as CacheProviderId,
1130
+ label: "Qwen cache",
1131
+ matchesModel: isQwenLikeModel,
1132
+ matchesAssistantMessage(message, model) {
1133
+ if (!isAssistantMessage(message)) return false;
1134
+ return isQwenLikeAssistantMessage(message, model);
1135
+ },
1136
+ normalizeUsage(message) {
1137
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1138
+ },
1139
+ warningText(model) {
1140
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1141
+ if (missing.length === 0) return undefined;
1142
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1143
+ },
1144
+ },
1145
+ {
1146
+ id: "openai" as CacheProviderId,
1147
+ label: "GLM cache",
1148
+ matchesModel: isGLMLikeModel,
1149
+ matchesAssistantMessage(message, model) {
1150
+ if (!isAssistantMessage(message)) return false;
1151
+ return isGLMLikeAssistantMessage(message, model);
1152
+ },
1153
+ normalizeUsage(message) {
1154
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1155
+ },
1156
+ warningText(model) {
1157
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1158
+ if (missing.length === 0) return undefined;
1159
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1160
+ },
1161
+ },
1162
+ {
1163
+ id: "openai" as CacheProviderId,
1164
+ label: "MiniMax cache",
1165
+ matchesModel: isMiniMaxLikeModel,
1166
+ matchesAssistantMessage(message, model) {
1167
+ if (!isAssistantMessage(message)) return false;
1168
+ return isMiniMaxLikeAssistantMessage(message, model);
1169
+ },
1170
+ normalizeUsage(message) {
1171
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1172
+ },
1173
+ warningText(model) {
1174
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1175
+ if (missing.length === 0) return undefined;
1176
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1177
+ },
1178
+ },
1179
+ {
1180
+ id: "openai" as CacheProviderId,
1181
+ label: "Hunyuan cache",
1182
+ matchesModel: isHunyuanLikeModel,
1183
+ matchesAssistantMessage(message, model) {
1184
+ if (!isAssistantMessage(message)) return false;
1185
+ return isHunyuanLikeAssistantMessage(message, model);
1186
+ },
1187
+ normalizeUsage(message) {
1188
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1189
+ },
1190
+ warningText(model) {
1191
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1192
+ if (missing.length === 0) return undefined;
1193
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1194
+ },
1195
+ },
1196
+ // ── More OpenAI-compatible adapters ──────────────────────────
1197
+ {
1198
+ id: "openai" as CacheProviderId,
1199
+ label: "Mistral cache",
1200
+ matchesModel: isMistralLikeModel,
1201
+ matchesAssistantMessage(message, model) {
1202
+ if (!isAssistantMessage(message)) return false;
1203
+ return isMistralLikeAssistantMessage(message, model);
1204
+ },
1205
+ normalizeUsage(message) {
1206
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1207
+ },
1208
+ warningText(model) {
1209
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1210
+ if (missing.length === 0) return undefined;
1211
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1212
+ },
1213
+ },
1214
+ {
1215
+ id: "openai" as CacheProviderId,
1216
+ label: "Grok cache",
1217
+ matchesModel: isGrokLikeModel,
1218
+ matchesAssistantMessage(message, model) {
1219
+ if (!isAssistantMessage(message)) return false;
1220
+ return isGrokLikeAssistantMessage(message, model);
1221
+ },
1222
+ normalizeUsage(message) {
1223
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1224
+ },
1225
+ warningText(model) {
1226
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1227
+ if (missing.length === 0) return undefined;
1228
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1229
+ },
1230
+ },
1231
+ {
1232
+ id: "openai" as CacheProviderId,
1233
+ label: "Llama cache",
1234
+ matchesModel: isLlamaLikeModel,
1235
+ matchesAssistantMessage(message, model) {
1236
+ if (!isAssistantMessage(message)) return false;
1237
+ return isLlamaLikeAssistantMessage(message, model);
1238
+ },
1239
+ normalizeUsage(message) {
1240
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1241
+ },
1242
+ warningText(model) {
1243
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1244
+ if (missing.length === 0) return undefined;
1245
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1246
+ },
1247
+ },
1248
+ {
1249
+ id: "openai" as CacheProviderId,
1250
+ label: "Nemotron cache",
1251
+ matchesModel: isNemotronLikeModel,
1252
+ matchesAssistantMessage(message, model) {
1253
+ if (!isAssistantMessage(message)) return false;
1254
+ return isNemotronLikeAssistantMessage(message, model);
1255
+ },
1256
+ normalizeUsage(message) {
1257
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1258
+ },
1259
+ warningText(model) {
1260
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1261
+ if (missing.length === 0) return undefined;
1262
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1263
+ },
1264
+ },
1265
+ {
1266
+ id: "openai" as CacheProviderId,
1267
+ label: "Cohere cache",
1268
+ matchesModel: isCohereLikeModel,
1269
+ matchesAssistantMessage(message, model) {
1270
+ if (!isAssistantMessage(message)) return false;
1271
+ return isCohereLikeAssistantMessage(message, model);
1272
+ },
1273
+ normalizeUsage(message) {
1274
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1275
+ },
1276
+ warningText(model) {
1277
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1278
+ if (missing.length === 0) return undefined;
1279
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1280
+ },
1281
+ },
1282
+ {
1283
+ id: "openai" as CacheProviderId,
1284
+ label: "Yi cache",
1285
+ matchesModel: isYiLikeModel,
1286
+ matchesAssistantMessage(message, model) {
1287
+ if (!isAssistantMessage(message)) return false;
1288
+ return isYiLikeAssistantMessage(message, model);
1289
+ },
1290
+ normalizeUsage(message) {
1291
+ return normalizeWithFallback(message, getOpenAIRawUsage);
1292
+ },
1293
+ warningText(model) {
1294
+ const missing = describeMissingOpenAICompatibleProxyCompat(model);
1295
+ if (missing.length === 0) return undefined;
1296
+ return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
1297
+ },
1298
+ },
991
1299
  ];
992
1300
 
993
1301
  function selectAdapterForModel(model: PiModel | undefined): CacheProviderAdapter | undefined {
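The eleven adapters added in this hunk differ only in their label and matcher, so the pattern can be summarized with a small factory. This is an illustrative sketch, not the package's code: `makeOpenAICompatibleAdapter`, `SketchAdapter`, and `selectLabel` are hypothetical, and the first-match-wins selection is an assumption because the body of `selectAdapterForModel` is not shown.

```typescript
type SketchModel = { id: string; name?: string };
type SketchAdapter = {
  id: "openai";
  label: string;
  matchesModel(model: SketchModel | undefined): boolean;
};

// Builds an adapter that matches on substrings and, optionally, a boundary pattern.
function makeOpenAICompatibleAdapter(
  label: string,
  needles: string[],
  pattern?: RegExp,
): SketchAdapter {
  return {
    id: "openai",
    label,
    matchesModel(model) {
      if (!model) return false;
      const tokens = [model.id, model.name ?? ""].map((t) => t.toLowerCase());
      return tokens.some(
        (t) => needles.some((n) => t.includes(n)) || (pattern?.test(t) ?? false),
      );
    },
  };
}

const SKETCH_ADAPTERS: SketchAdapter[] = [
  makeOpenAICompatibleAdapter("Kimi cache", ["kimi"]),
  makeOpenAICompatibleAdapter("Qwen cache", ["qwen"]),
  makeOpenAICompatibleAdapter("Grok cache", ["grok"], /(^|[/\s:_-])xai($|[-_.:/\s])/),
];

// Assumption: the first matching adapter wins, so list order matters.
function selectLabel(model: SketchModel): string | undefined {
  return SKETCH_ADAPTERS.find((a) => a.matchesModel(model))?.label;
}
```

If selection really is first-match, an id containing two family names (for example a distilled model naming both) gets the label of whichever adapter appears earlier in the array.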
@@ -1237,7 +1545,32 @@ export const __internals_for_tests = {
1237
1545
  isOpenAIFamilyAssistantMessage,
1238
1546
  isOpenAIFamilyToken,
1239
1547
  describeMissingOpenAIFamilyProxyCompat,
1548
+ describeMissingOpenAICompatibleProxyCompat,
1240
1549
  isOfficialOpenAIBaseUrl,
1550
+ // Non-GPT OpenAI-compatible model detection
1551
+ isKimiLikeModel,
1552
+ isKimiLikeAssistantMessage,
1553
+ isQwenLikeModel,
1554
+ isQwenLikeAssistantMessage,
1555
+ isGLMLikeModel,
1556
+ isGLMLikeAssistantMessage,
1557
+ isMiniMaxLikeModel,
1558
+ isMiniMaxLikeAssistantMessage,
1559
+ isHunyuanLikeModel,
1560
+ isHunyuanLikeAssistantMessage,
1561
+ // Additional OpenAI-compatible model detection
1562
+ isMistralLikeModel,
1563
+ isMistralLikeAssistantMessage,
1564
+ isGrokLikeModel,
1565
+ isGrokLikeAssistantMessage,
1566
+ isLlamaLikeModel,
1567
+ isLlamaLikeAssistantMessage,
1568
+ isNemotronLikeModel,
1569
+ isNemotronLikeAssistantMessage,
1570
+ isCohereLikeModel,
1571
+ isCohereLikeAssistantMessage,
1572
+ isYiLikeModel,
1573
+ isYiLikeAssistantMessage,
1241
1574
  buildOpenAIProxyCompatWarningText,
1242
1575
  getModelIdNameTokenValues,
1243
1576
  getAssistantMessageModelTokenValues,
@@ -1509,7 +1842,6 @@ export default function (pi: ExtensionAPI) {
1509
1842
 
1510
1843
  pi.on("before_provider_request", (event, ctx) => {
1511
1844
  if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
1512
- if (!isOpenAIFamilyModel(ctx.model)) return undefined;
1513
1845
  if (!isOpenAICompatibleApi(ctx.model?.api)) return undefined;
1514
1846
 
1515
1847
  return addOpenAIPromptCacheKey(event.payload, getSessionPromptCacheKey(ctx));
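With the `isOpenAIFamilyModel` check removed, the handler's remaining model gate depends only on the API type. A minimal sketch of that gate follows; the two API names and `kiro-api` come from the README, while `isOpenAICompatibleApiSketch`, `passesCacheKeyGate`, and the lower-casing step are assumptions, since the real `isOpenAICompatibleApi` is not shown in this diff.

```typescript
const OPENAI_COMPATIBLE_APIS = new Set(["openai-completions", "openai-responses"]);

// Hypothetical stand-in for isOpenAICompatibleApi.
function isOpenAICompatibleApiSketch(api: string | undefined): boolean {
  return typeof api === "string" && OPENAI_COMPATIBLE_APIS.has(api.trim().toLowerCase());
}

type GateModel = { id: string; api?: string };

// After 2.4.4 the model id no longer takes part in the decision.
function passesCacheKeyGate(model: GateModel | undefined): boolean {
  return isOpenAICompatibleApiSketch(model?.api);
}
```

Under this reading, a model such as `kimi-k2` on `openai-completions` now passes the gate, while any model on `kiro-api` is still skipped regardless of its name.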
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-cache-optimizer",
3
- "version": "2.4.2",
3
+ "version": "2.4.4",
4
4
  "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
5
5
  "keywords": [
6
6
  "pi-package",