pi-cache-optimizer 2.4.1 → 2.4.3

package/README.md CHANGED
@@ -82,6 +82,7 @@ After installation, `PI_CACHE_RETENTION=long` is applied automatically, the syst
 
  | Env var | Effect |
  |---------|--------|
+ | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Skip all `before_agent_start` prompt mutations (churn strip, skill compression, stable-prefix reorder); footer stats and `prompt_cache_key` fallback remain active |
  | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index) |
  | `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the OpenAI-family `prompt_cache_key` fallback (default is enabled) |
  | `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-family `prompt_cache_key` fallback |
@@ -191,7 +192,7 @@ After: [stable tools + rules | dynamic git status | task context]
  ↓ stable prefix → higher chance of cache reuse
  ```
 
- Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
+ Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API — only custom transports like `kiro-api` are excluded. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
 
  ## Improving cache hit rate
 
@@ -206,7 +207,7 @@ What the extension does automatically:
  Provider notes:
 
  - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
- - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
+ - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to ALL models using `openai-completions` / `openai-responses` (not just GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models also benefit. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
  - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
  - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
  - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
@@ -225,7 +226,7 @@ This package now has provider-family stats adapters, but it still avoids blind g
 
  - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
  - Injecting Anthropic `cache_control` markers.
- - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to OpenAI-family id/name plus `openai-completions` / `openai-responses`.
+ - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to `openai-completions` / `openai-responses` only (custom transports like `kiro-api` are excluded, but the model name no longer needs to be GPT-family).
  - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
  - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
  - Claiming stats for providers that do not expose reliable cache usage.
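The README changes above describe the `prompt_cache_key` fallback in prose. As a rough illustration only, the sketch below shows one way such a gate could behave: it acts only for `openai-completions` / `openai-responses`, leaves any non-empty `prompt_cache_key` / `promptCacheKey` alone, and otherwise fills the key from a session id. The names `isOpenAIShapedApi`, `hasNonBlankString`, and `fillPromptCacheKey` are hypothetical, the truncation to 64 characters is an assumption based on the `OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH` constant, and none of this is the package's actual implementation.

```typescript
// Hypothetical sketch of the documented fallback; not the package's code.
type Payload = Record<string, unknown>;

const OPENAI_SHAPED_APIS = new Set(["openai-completions", "openai-responses"]);
const MAX_KEY_LENGTH = 64; // mirrors OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH in index.ts

// Gate on the transport only; the model name is deliberately not inspected.
function isOpenAIShapedApi(api: string | undefined): boolean {
  return api !== undefined && OPENAI_SHAPED_APIS.has(api.toLowerCase());
}

function hasNonBlankString(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}

function fillPromptCacheKey(payload: Payload, api: string | undefined, sessionId: string): Payload {
  // Custom transports (for example `kiro-api`) are left untouched.
  if (!isOpenAIShapedApi(api)) return payload;
  // Never overwrite an existing non-empty key in either spelling.
  if (hasNonBlankString(payload["prompt_cache_key"]) || hasNonBlankString(payload["promptCacheKey"])) {
    return payload;
  }
  // Assumption: over-long session ids are simply truncated.
  const key = sessionId.trim().slice(0, MAX_KEY_LENGTH);
  if (key.length === 0) return payload;
  return { ...payload, prompt_cache_key: key };
}

// Usage: a non-GPT model on an OpenAI-shaped API now receives the key.
const filled = fillPromptCacheKey({ model: "kimi-k2" }, "openai-completions", "session-123");
console.log(filled["prompt_cache_key"]);
```

Returning a new object instead of mutating the payload keeps the sketch easy to test; whether the real extension mutates in place is not visible in this diff.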
package/README.zh-CN.md CHANGED
@@ -85,6 +85,7 @@ pi install npm:pi-cache-optimizer
 
  | 环境变量 | 作用 |
  |---------|------|
+ | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | 跳过所有 `before_agent_start` prompt 修改(session-overview 字段剥离、skills 压缩、稳定前缀重排);底部统计和 `prompt_cache_key` 兜底仍然生效 |
  | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | 保留 pi 的 verbose `<available_skills>` XML(退出一行索引模式) |
  | `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | 禁用 OpenAI-family `prompt_cache_key` 兜底(默认启用) |
  | `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | 禁用 OpenAI-family `prompt_cache_key` 兜底 |
@@ -194,7 +195,7 @@ Provider 缓存通常依赖精确或近似精确的前缀匹配。Pi 的 system
  ↓ 稳定前缀不变 → 更容易命中缓存
  ```
 
- Pi 本身还会根据模型 compat 和 `PI_CACHE_RETENTION` 决定是否发送缓存相关字段,例如 `prompt_cache_retention`、session affinity headers 或 Anthropic-style `cache_control`。本扩展现在默认只做一个保守的 request-body 兜底:对使用 OpenAI-compatible Pi API OpenAI-family 模型,当顶层 `prompt_cache_key` 缺失或为空时,用 Pi session id 补上,并且不会覆盖已有的非空 key。本扩展不伪造缓存命中,只帮助配置、提高稳定前缀概率,并把已暴露的 usage 汇总到底部状态栏。
+ Pi 本身还会根据模型 compat 和 `PI_CACHE_RETENTION` 决定是否发送缓存相关字段,例如 `prompt_cache_retention`、session affinity headers 或 Anthropic-style `cache_control`。本扩展现在默认只做一个保守的 request-body 兜底:对所有使用 OpenAI-compatible Pi API(`openai-completions` / `openai-responses`)的模型,当顶层 `prompt_cache_key` 缺失或为空时,用 Pi session id 补上,并且不会覆盖已有的非空 key。这覆盖 GPT 命名模型、Kimi/Moonshot、Qwen/Alibaba、GLM/Zhipu、MiniMax、Hunyuan 等任何使用 OpenAI 形状 API 的 provider——只有 `kiro-api` 等 custom transport 不被注入。本扩展不伪造缓存命中,只帮助配置、提高稳定前缀概率,并把已暴露的 usage 汇总到底部状态栏。
 
  ## 提高 cache 命中率
 
@@ -209,7 +210,7 @@ Pi 本身还会根据模型 compat 和 `PI_CACHE_RETENTION` 决定是否发送
  各 provider 注意点:
 
  - DeepSeek:现有行为仍是参考路径。稳定前缀排序,加上 long-retention / session-affinity compat,最有利于自动 KV prefix 复用。
- - OpenAI-family:prompt caching 只会在真实上游支持且 prompt 足够长时自动生效。请尽量把静态 instructions、tools、examples、specs 放在变化的 user/task context 前面。retention 传输默认由 Pi 负责。对 OpenAI-compatible Pi API,本扩展会用 Pi session id 补齐缺失或空白的顶层 `prompt_cache_key`(与 Pi core 官方 OpenAI 行为对齐),并且不会覆盖已有非空的 `prompt_cache_key` / `promptCacheKey`。可用 `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` 或 `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` 禁用该兜底。不支持该字段的 OpenAI-compatible 代理可能拒绝请求;custom API 不会被注入。
+ - OpenAI-family:prompt caching 只会在真实上游支持且 prompt 足够长时自动生效。请尽量把静态 instructions、tools、examples、specs 放在变化的 user/task context 前面。retention 传输默认由 Pi 负责。对 OpenAI-compatible Pi API,本扩展会用 Pi session id 补齐缺失或空白的顶层 `prompt_cache_key`(与 Pi core 官方 OpenAI 行为对齐),并且不会覆盖已有非空的 `prompt_cache_key` / `promptCacheKey`。该兜底现在适用于所有使用 `openai-completions` / `openai-responses` 的模型(不限于 GPT 命名),因此 Kimi、Qwen、GLM、MiniMax、Hunyuan 等 OpenAI-compatible 模型也同样受益。可用 `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` 或 `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` 禁用该兜底。不支持该字段的 OpenAI-compatible 代理可能拒绝请求;custom API 不会被注入。
  - Claude:prompt caching 依赖 Anthropic `cache_control` breakpoints。本扩展不会自行注入 breakpoint;对兼容 endpoint,只在 endpoint 明确支持时配置 Pi compat,例如 `cacheControlFormat: "anthropic"`。
  - Gemini/Vertex:implicit caching 受益于重复的大型稳定前缀。本扩展不会创建 explicit `cachedContents` resources,也不会保存 cache resource names。
  - Proxies/aggregators:尽量固定上游 routing/provider order。如果同一个 model id/name 可能路由到不同上游,cache hit rate 会不稳定。
@@ -229,7 +230,7 @@ Pi 本身还会根据模型 compat 和 `PI_CACHE_RETENTION` 决定是否发送
 
  - 广泛/provider-agnostic 修改请求体,或做 cache-control 注入。唯一默认 request-body 兜底是 OpenAI-family 在 OpenAI-compatible API 上使用 Pi session id 的 `prompt_cache_key`,且已有有效 key 时会跳过。
  - 注入 Anthropic `cache_control` markers。
- - 向 custom / 非 OpenAI-compatible API 发送 OpenAI `prompt_cache_key`;该兜底同时要求 model id/name 属于 OpenAI-family,且 API 是 `openai-completions` / `openai-responses`。
+ - 向 custom / 非 OpenAI-compatible API 发送 OpenAI `prompt_cache_key`;该兜底只要求 API 是 `openai-completions` / `openai-responses`(`kiro-api` 等 custom transport 不被注入,但模型命名不再要求属于 GPT-family)。
  - 在 Pi 自己的 compat 处理之外覆盖 OpenAI `prompt_cache_retention`。
  - 创建 Gemini explicit `cachedContents` resources 或持久化 cache resource names。
  - 对不暴露可靠 cache usage 的 provider 声称统计支持。
package/index.ts CHANGED
@@ -37,6 +37,7 @@ const OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY";
  const NO_OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY";
  const OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH = 64;
  const NO_SKILL_COMPRESSION_ENV = "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION";
+ const NO_PROMPT_REWRITE_ENV = "PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE";
 
  // WORM-flag: if optimizeSystemPrompt ever detects that its blind-replace
  // logic has accidentally truncated a structural marker (any XML tag or
@@ -634,6 +635,45 @@ function isGeminiLikeAssistantMessage(message: unknown, model: PiModel | undefin
    return modelOrAssistantMessageHas(message, model, ["gemini", "vertex"]);
  }
 
+ // ── Non-GPT OpenAI-compatible model detection ──────────────────────
+
+ function isKimiLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["kimi"]);
+ }
+ function isKimiLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["kimi"]);
+ }
+
+ function isQwenLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["qwen"]);
+ }
+ function isQwenLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["qwen"]);
+ }
+
+ function isGLMLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["glm"]);
+ }
+ function isGLMLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["glm"]);
+ }
+
+ function isMiniMaxLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["minimax"]);
+ }
+ function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["minimax"]);
+ }
+
+ function isHunyuanLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
+ }
+ function isHunyuanLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["hunyuan"]);
+ }
+
+ // ── Model key ──────────────────────────────────────────────────────
+
  function modelKey(model: PiModel): string {
    return `${model.provider}/${model.id}`;
  }
@@ -856,6 +896,66 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
    return missing;
  }
 
+ /**
+  * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
+  * gate. Warns for ANY model using openai-completions through a non-official base
+  * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
+  * OpenAI-compatible proxy.
+  */
+ function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
+   const compat = getCompat(model);
+   const missing: string[] = [];
+
+   if (lower(model.api) !== "openai-completions") return missing;
+   if (isOfficialOpenAIBaseUrl(model)) return missing;
+
+   if (compat.supportsLongCacheRetention !== true) {
+     missing.push("supportsLongCacheRetention");
+   }
+   if (compat.sendSessionAffinityHeaders !== true) {
+     missing.push("sendSessionAffinityHeaders");
+   }
+
+   return missing;
+ }
+
+ /**
+  * Build the warning text displayed to users when an OpenAI-family third-party
+  * proxy is missing one or more cache/session-affinity compat flags.
+  *
+  * The returned string contains a parseable JSON object (via JSON.stringify)
+  * listing only the missing flags with recommended value `true`. Inline
+  * explanations for each flag follow the JSON snippet as separate prose lines,
+  * so the JSON remains valid and copyable.
+  *
+  * Expected use: the openai adapter's warningText calls this function; tests
+  * exercise it via __internals_for_tests.
+  */
+ function buildOpenAIProxyCompatWarningText(key: string, missing: string[]): string {
+   const suggestion: Record<string, boolean> = {};
+   for (const flag of missing) {
+     suggestion[flag] = true;
+   }
+
+   const lines: string[] = [
+     `💡 pi-cache-optimizer: ${key} is a third-party GPT/OpenAI-compatible proxy but merged compat lacks ${missing.join(" and ")}.`,
+     `Add under the model's compat in ~/.pi/agent/models.json (only if the endpoint supports them):`,
+     ``,
+     JSON.stringify(suggestion, null, 2),
+     ``,
+   ];
+
+   for (const flag of missing) {
+     if (flag === "supportsLongCacheRetention") {
+       lines.push("- supportsLongCacheRetention: confirm your endpoint or proxy supports long prompt cache retention.");
+     } else if (flag === "sendSessionAffinityHeaders") {
+       lines.push("- sendSessionAffinityHeaders: keeps requests on the same backend for proxy cache locality (session affinity).");
+     }
+   }
+
+   return lines.join("\n");
+ }
+
  function describeMissingDeepSeekCompat(model: PiModel): string[] {
    const compat = getCompat(model);
    const missing: string[] = [];
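The doc comment on `buildOpenAIProxyCompatWarningText` says the warning embeds a parseable JSON object between blank lines. A minimal sketch of what that means in practice: the function body below is copied from the hunk above so the example is self-contained, while `extractSuggestedCompat` and the model key `my-proxy/kimi-k2` are hypothetical and exist only for illustration.

```typescript
// Copied from the index.ts hunk above so this example runs on its own.
function buildOpenAIProxyCompatWarningText(key: string, missing: string[]): string {
  const suggestion: Record<string, boolean> = {};
  for (const flag of missing) {
    suggestion[flag] = true;
  }
  const lines: string[] = [
    `💡 pi-cache-optimizer: ${key} is a third-party GPT/OpenAI-compatible proxy but merged compat lacks ${missing.join(" and ")}.`,
    `Add under the model's compat in ~/.pi/agent/models.json (only if the endpoint supports them):`,
    ``,
    JSON.stringify(suggestion, null, 2),
    ``,
  ];
  for (const flag of missing) {
    if (flag === "supportsLongCacheRetention") {
      lines.push("- supportsLongCacheRetention: confirm your endpoint or proxy supports long prompt cache retention.");
    } else if (flag === "sendSessionAffinityHeaders") {
      lines.push("- sendSessionAffinityHeaders: keeps requests on the same backend for proxy cache locality (session affinity).");
    }
  }
  return lines.join("\n");
}

// Hypothetical helper (not in the package): the JSON snippet is the second
// blank-line-separated block of the warning, so it can be parsed back out.
function extractSuggestedCompat(warning: string): Record<string, boolean> {
  const json = warning.split("\n\n")[1];
  if (json === undefined) throw new Error("warning has no JSON block");
  return JSON.parse(json) as Record<string, boolean>;
}

// "my-proxy/kimi-k2" is a made-up provider/id pair.
const warning = buildOpenAIProxyCompatWarningText("my-proxy/kimi-k2", [
  "supportsLongCacheRetention",
  "sendSessionAffinityHeaders",
]);
console.log(warning);
console.log(extractSuggestedCompat(warning));
```

Because the per-flag explanations are appended after the JSON instead of being inlined as comments, the snippet can be pasted into `models.json` without editing.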
@@ -933,13 +1033,9 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
        return normalizeWithFallback(message, getOpenAIRawUsage);
      },
      warningText(model) {
-       const missing = describeMissingOpenAIFamilyProxyCompat(model);
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
        if (missing.length === 0) return undefined;
-
-       return (
-         `💡 pi-cache-optimizer: ${modelKey(model)} looks like a third-party GPT/OpenAI-compatible proxy but merged compat lacks ${missing.join(" and ")}. ` +
-         `For better cache locality, add compat: { "supportsLongCacheRetention": true, "sendSessionAffinityHeaders": true } in ~/.pi/agent/models.json when the endpoint supports these fields.`
-       );
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
      },
    },
    {
@@ -954,6 +1050,92 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
        return normalizeWithFallback(message, getGeminiRawUsage);
      },
    },
+   // ── Non-GPT OpenAI-compatible adapters ──────────────────────
+   {
+     id: "openai" as CacheProviderId,
+     label: "Kimi cache",
+     matchesModel: isKimiLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isKimiLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "Qwen cache",
+     matchesModel: isQwenLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isQwenLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "GLM cache",
+     matchesModel: isGLMLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isGLMLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "MiniMax cache",
+     matchesModel: isMiniMaxLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isMiniMaxLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "Hunyuan cache",
+     matchesModel: isHunyuanLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isHunyuanLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
  ];
 
  function selectAdapterForModel(model: PiModel | undefined): CacheProviderAdapter | undefined {
@@ -1190,6 +1372,8 @@ export const __internals_for_tests = {
    compressSkillsInSystemPrompt,
    MIN_STABLE_CANDIDATE_LENGTH,
    SKILL_COMPRESSION_MIN_COUNT,
+   NO_PROMPT_REWRITE_ENV,
+   isEnabledEnv,
    // OpenAI-family cache-key helpers
    addOpenAIPromptCacheKey,
    clampPromptCacheKey,
@@ -1201,7 +1385,20 @@ export const __internals_for_tests = {
    isOpenAIFamilyAssistantMessage,
    isOpenAIFamilyToken,
    describeMissingOpenAIFamilyProxyCompat,
+   describeMissingOpenAICompatibleProxyCompat,
    isOfficialOpenAIBaseUrl,
+   // Non-GPT OpenAI-compatible model detection
+   isKimiLikeModel,
+   isKimiLikeAssistantMessage,
+   isQwenLikeModel,
+   isQwenLikeAssistantMessage,
+   isGLMLikeModel,
+   isGLMLikeAssistantMessage,
+   isMiniMaxLikeModel,
+   isMiniMaxLikeAssistantMessage,
+   isHunyuanLikeModel,
+   isHunyuanLikeAssistantMessage,
+   buildOpenAIProxyCompatWarningText,
    getModelIdNameTokenValues,
    getAssistantMessageModelTokenValues,
    getCompat,
@@ -1421,6 +1618,14 @@ export default function (pi: ExtensionAPI) {
        }
      }
 
+     // Global opt-out: PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1 bypasses all
+     // prompt mutations below (session-overview churn strip, skill compression,
+     // and stable-prefix reordering). Footer stats and the OpenAI
+     // prompt_cache_key fallback remain active.
+     if (isEnabledEnv(process.env[NO_PROMPT_REWRITE_ENV])) {
+       return {};
+     }
+
      // Step 1: strip per-turn churn from <session-overview>.
      // Removing RECENT COMMITS, Working directory status, and
      // Journal line count makes more of the session-overview stable
@@ -1464,7 +1669,6 @@ export default function (pi: ExtensionAPI) {
 
    pi.on("before_provider_request", (event, ctx) => {
      if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
-     if (!isOpenAIFamilyModel(ctx.model)) return undefined;
      if (!isOpenAICompatibleApi(ctx.model?.api)) return undefined;
 
      return addOpenAIPromptCacheKey(event.payload, getSessionPromptCacheKey(ctx));
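The last two hunks change which features run under which environment variables. As a hedged sketch of how the documented switches combine, the code below derives two booleans from an environment map. The bodies of `isEnabledEnv` and `shouldInjectOpenAIPromptCacheKey` are not shown in this diff, so `isEnabledEnvSketch`, `planFromEnv`, and the accepted truthy spellings (`1`, `true`, `yes`, `on`) are assumptions made for illustration; only the variable names and their documented effects come from the README table.

```typescript
// Hypothetical sketch; the real isEnabledEnv body is not shown in the diff.
type Env = Record<string, string | undefined>;

// Assumption: "1", "true", "yes", "on" (case-insensitive) count as enabled.
function isEnabledEnvSketch(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

interface OptimizerPlan {
  rewritePrompt: boolean;  // churn strip, skill compression, stable-prefix reorder
  injectCacheKey: boolean; // prompt_cache_key fallback on OpenAI-shaped APIs
}

function planFromEnv(env: Env): OptimizerPlan {
  // NO_PROMPT_REWRITE only disables prompt mutations; it does not touch the key fallback.
  const rewritePrompt = !isEnabledEnvSketch(env["PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE"]);
  // Either documented switch disables the fallback; the default is enabled.
  const keyDisabled =
    isEnabledEnvSketch(env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"]) ||
    env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"]?.trim() === "0";
  return { rewritePrompt, injectCacheKey: !keyDisabled };
}

// Usage: opting out of prompt rewrites leaves the cache-key fallback active.
console.log(planFromEnv({ PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE: "1" }));
```

Passing the environment as a parameter rather than reading `process.env` directly is a choice made here to keep the sketch testable; the extension itself reads `process.env` as shown in the hunk above.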
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "pi-cache-optimizer",
-   "version": "2.4.1",
+   "version": "2.4.3",
    "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
    "keywords": [
      "pi-package",