pi-cache-optimizer 2.4.2 → 2.4.3

package/README.md CHANGED
@@ -192,7 +192,7 @@ After: [stable tools + rules | dynamic git status | task context]
  ↓ stable prefix → higher chance of cache reuse
  ```

- Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
+ Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API — only custom transports like `kiro-api` are excluded. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.

  ## Improving cache hit rate
 
@@ -207,7 +207,7 @@ What the extension does automatically:
  Provider notes:

  - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
- - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
+ - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to ALL models using `openai-completions` / `openai-responses` (not just GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models also benefit. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
  - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
  - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
  - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
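The two opt-out variables named in the OpenAI-family note can be read as a simple on/off switch. The sketch below is a minimal illustration, not the package's code: `cacheKeyFallbackEnabled` and the `Env` type are hypothetical, and it assumes only the exact values `1` and `0` shown in the README disable the fallback (the real extension may accept other spellings).

```typescript
// Hypothetical helper: only the two variable names come from the README.
type Env = Record<string, string | undefined>;

function cacheKeyFallbackEnabled(env: Env): boolean {
  // Either opt-out switch disables the fallback (assumed exact-match values).
  if (env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"] === "1") return false;
  if (env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"] === "0") return false;
  // Default: the fallback stays on.
  return true;
}

const byDefault = cacheKeyFallbackEnabled({});
const optedOut = cacheKeyFallbackEnabled({ PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY: "1" });
const switchedOff = cacheKeyFallbackEnabled({ PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY: "0" });
console.log(byDefault, optedOut, switchedOff);
```

Passing the environment in as a plain object keeps the sketch independent of any runtime; in practice the caller would hand it the process environment.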
@@ -226,7 +226,7 @@ This package now has provider-family stats adapters, but it still avoids blind g

  - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
  - Injecting Anthropic `cache_control` markers.
- - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to OpenAI-family id/name plus `openai-completions` / `openai-responses`.
+ - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to `openai-completions` / `openai-responses` only (custom transports like `kiro-api` are excluded, but the model name no longer needs to be GPT-family).
  - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
  - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
  - Claiming stats for providers that do not expose reliable cache usage.
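The "fill only when missing or blank, never overwrite" rule for `prompt_cache_key` can be sketched as follows. This is an illustration under stated assumptions, not the extension's `addOpenAIPromptCacheKey`: the function name `fillPromptCacheKey`, the `Payload` type, and the sample model ids and session id are all hypothetical. It treats a non-empty `promptCacheKey` as an existing key too, as the README describes.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the package's code.
type Payload = Record<string, unknown>;

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function fillPromptCacheKey(payload: Payload, sessionId: string): Payload {
  // An existing non-empty key, in either spelling, is left untouched.
  if (isNonEmptyString(payload["prompt_cache_key"])) return payload;
  if (isNonEmptyString(payload["promptCacheKey"])) return payload;
  // Without a usable session id there is nothing to fill in.
  if (!isNonEmptyString(sessionId)) return payload;
  // Missing or blank key: return a copy with the session id as the key.
  return { ...payload, prompt_cache_key: sessionId };
}

const filled = fillPromptCacheKey({ model: "kimi-k2", prompt_cache_key: "  " }, "session-123");
const kept = fillPromptCacheKey({ model: "gpt-4.1", prompt_cache_key: "user-key" }, "session-123");
const camel = fillPromptCacheKey({ model: "qwen-max", promptCacheKey: "user-key" }, "session-123");
console.log(filled["prompt_cache_key"], kept["prompt_cache_key"], camel["prompt_cache_key"]);
```

Returning a copy rather than mutating the input is a design choice of this sketch; the diff does not show which approach the extension takes.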
package/README.zh-CN.md CHANGED
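@@ -195,7 +195,7 @@ Provider caches usually depend on exact or near-exact prefix matching. Pi's system
  ↓ stable prefix unchanged → easier to hit the cache
  ```

- Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now applies only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, when the top-level `prompt_cache_key` is missing or empty, it fills it in with the Pi session id and never overwrites an existing non-empty key. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the bottom status bar.
+ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now applies only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), when the top-level `prompt_cache_key` is missing or empty, it fills it in with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API; only custom transports such as `kiro-api` are not injected. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the bottom status bar.

  ## Improving cache hit rate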
 
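@@ -210,7 +210,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
  Provider notes:

  - DeepSeek: current behavior remains the reference path. Stable prefix ordering, plus long-retention / session-affinity compat, is most favorable for automatic KV prefix reuse.
- - OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Try to place static instructions, tools, examples, and specs before changing user/task context. Retention transport is handled by Pi by default. For OpenAI-compatible Pi APIs, this extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (aligned with Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support the field may reject the request; custom APIs are not injected.
+ - OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Try to place static instructions, tools, examples, and specs before changing user/task context. Retention transport is handled by Pi by default. For OpenAI-compatible Pi APIs, this extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (aligned with Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to all models using `openai-completions` / `openai-responses` (not limited to GPT-named ones), so OpenAI-compatible models such as Kimi, Qwen, GLM, MiniMax, and Hunyuan benefit as well. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support the field may reject the request; custom APIs are not injected.
  - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint explicitly supports it.
  - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources and does not store cache resource names.
  - Proxies/aggregators: pin upstream routing/provider order where possible. If the same model id/name can route to different upstreams, the cache hit rate will be unstable.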
@@ -230,7 +230,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send

  - Broad/provider-agnostic request-body modification or cache-control injection. The only default request-body fallback is the OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, set from the Pi session id and skipped when a valid key already exists.
  - Injecting Anthropic `cache_control` markers.
- - Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback requires both that the model id/name belongs to the OpenAI family and that the API is `openai-completions` / `openai-responses`.
+ - Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback only requires that the API is `openai-completions` / `openai-responses` (custom transports such as `kiro-api` are not injected, but the model name no longer has to belong to the GPT family).
  - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
  - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
  - Claiming stats support for providers that do not expose reliable cache usage.
package/index.ts CHANGED
@@ -635,6 +635,45 @@ function isGeminiLikeAssistantMessage(message: unknown, model: PiModel | undefin
    return modelOrAssistantMessageHas(message, model, ["gemini", "vertex"]);
  }

+ // ── Non-GPT OpenAI-compatible model detection ──────────────────────
+
+ function isKimiLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["kimi"]);
+ }
+ function isKimiLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["kimi"]);
+ }
+
+ function isQwenLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["qwen"]);
+ }
+ function isQwenLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["qwen"]);
+ }
+
+ function isGLMLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["glm"]);
+ }
+ function isGLMLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["glm"]);
+ }
+
+ function isMiniMaxLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["minimax"]);
+ }
+ function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["minimax"]);
+ }
+
+ function isHunyuanLikeModel(model: PiModel | undefined): boolean {
+   return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
+ }
+ function isHunyuanLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+   return modelOrAssistantMessageHas(message, model, ["hunyuan"]);
+ }
+
+ // ── Model key ──────────────────────────────────────────────────────
+
  function modelKey(model: PiModel): string {
    return `${model.provider}/${model.id}`;
  }
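The new detectors all follow one pattern: collect tokens from the model's id and name, then check whether any token contains a vendor substring. The bodies of `getModelIdNameTokenValues` and `hasAnyTokenContaining` are not part of this diff, so the sketch below uses hypothetical stand-ins (`modelTokens`, `anyTokenContains`, `ModelLike`) and assumes a lowercased substring match; the real helpers may tokenize differently.

```typescript
// Hypothetical stand-ins; the real helpers are not shown in this diff.
interface ModelLike {
  id?: string;
  name?: string;
}

// Assumed behavior: lowercased id and name, skipping missing values.
function modelTokens(model: ModelLike | undefined): string[] {
  if (!model) return [];
  return [model.id, model.name]
    .filter((v): v is string => typeof v === "string" && v.length > 0)
    .map((v) => v.toLowerCase());
}

// Assumed behavior: true when any token contains any needle as a substring.
function anyTokenContains(tokens: string[], needles: string[]): boolean {
  return tokens.some((t) => needles.some((n) => t.indexOf(n) !== -1));
}

function isKimiLike(model: ModelLike | undefined): boolean {
  return anyTokenContains(modelTokens(model), ["kimi"]);
}

const matchesById = isKimiLike({ id: "moonshot/Kimi-K2" });
const matchesByName = isKimiLike({ id: "m1", name: "Kimi (proxy)" });
const noMatch = isKimiLike({ id: "qwen-max" });
const noModel = isKimiLike(undefined);
console.log(matchesById, matchesByName, noMatch, noModel);
```

Under substring matching, a short needle such as `glm` would also match any unrelated id that happens to contain those letters, which is worth keeping in mind when adding further vendors.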
@@ -857,6 +896,29 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
    return missing;
  }

+ /**
+  * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
+  * gate. Warns for ANY model using openai-completions through a non-official base
+  * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
+  * OpenAI-compatible proxy.
+  */
+ function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
+   const compat = getCompat(model);
+   const missing: string[] = [];
+
+   if (lower(model.api) !== "openai-completions") return missing;
+   if (isOfficialOpenAIBaseUrl(model)) return missing;
+
+   if (compat.supportsLongCacheRetention !== true) {
+     missing.push("supportsLongCacheRetention");
+   }
+   if (compat.sendSessionAffinityHeaders !== true) {
+     missing.push("sendSessionAffinityHeaders");
+   }
+
+   return missing;
+ }
+
  /**
   * Build the warning text displayed to users when an OpenAI-family third-party
   * proxy is missing one or more cache/session-affinity compat flags.
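To see what the new check reports, the sketch below re-creates its control flow with simplified stand-ins. `ModelLike`, `Compat`, the `baseUrl` field, the host-name test in `isOfficialBaseUrl`, and the example URLs are assumptions made for illustration; only the two flag names and the `openai-completions` comparison come from the hunk above.

```typescript
// Simplified stand-ins for PiModel, getCompat and isOfficialOpenAIBaseUrl (assumptions).
interface Compat {
  supportsLongCacheRetention?: boolean;
  sendSessionAffinityHeaders?: boolean;
}
interface ModelLike {
  api?: string;
  baseUrl?: string;
  compat?: Compat;
}

// Assumption: the official endpoint is recognized by its host name.
function isOfficialBaseUrl(model: ModelLike): boolean {
  return (model.baseUrl ?? "").indexOf("api.openai.com") !== -1;
}

function missingCompatFlags(model: ModelLike): string[] {
  const missing: string[] = [];
  // Same early exits as the hunk: other APIs and the official endpoint never warn.
  if ((model.api ?? "").toLowerCase() !== "openai-completions") return missing;
  if (isOfficialBaseUrl(model)) return missing;

  const compat = model.compat ?? {};
  if (compat.supportsLongCacheRetention !== true) missing.push("supportsLongCacheRetention");
  if (compat.sendSessionAffinityHeaders !== true) missing.push("sendSessionAffinityHeaders");
  return missing;
}

const proxy = missingCompatFlags({
  api: "openai-completions",
  baseUrl: "https://proxy.example.com/v1",
  compat: { supportsLongCacheRetention: true },
});
const responsesApi = missingCompatFlags({ api: "openai-responses", baseUrl: "https://proxy.example.com/v1" });
const official = missingCompatFlags({ api: "openai-completions", baseUrl: "https://api.openai.com/v1" });
console.log(proxy, responsesApi, official);
```

Note that, as written in the hunk, the check compares against `openai-completions` only, so a model on `openai-responses` produces no warning even though the cache-key fallback covers both APIs.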
@@ -971,7 +1033,7 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
        return normalizeWithFallback(message, getOpenAIRawUsage);
      },
      warningText(model) {
-       const missing = describeMissingOpenAIFamilyProxyCompat(model);
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
        if (missing.length === 0) return undefined;
        return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
      },
@@ -988,6 +1050,92 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
        return normalizeWithFallback(message, getGeminiRawUsage);
      },
    },
+   // ── Non-GPT OpenAI-compatible adapters ──────────────────────
+   {
+     id: "openai" as CacheProviderId,
+     label: "Kimi cache",
+     matchesModel: isKimiLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isKimiLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "Qwen cache",
+     matchesModel: isQwenLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isQwenLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "GLM cache",
+     matchesModel: isGLMLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isGLMLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "MiniMax cache",
+     matchesModel: isMiniMaxLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isMiniMaxLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
+   {
+     id: "openai" as CacheProviderId,
+     label: "Hunyuan cache",
+     matchesModel: isHunyuanLikeModel,
+     matchesAssistantMessage(message, model) {
+       if (!isAssistantMessage(message)) return false;
+       return isHunyuanLikeAssistantMessage(message, model);
+     },
+     normalizeUsage(message) {
+       return normalizeWithFallback(message, getOpenAIRawUsage);
+     },
+     warningText(model) {
+       const missing = describeMissingOpenAICompatibleProxyCompat(model);
+       if (missing.length === 0) return undefined;
+       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+     },
+   },
  ];

  function selectAdapterForModel(model: PiModel | undefined): CacheProviderAdapter | undefined {
@@ -1237,7 +1385,19 @@ export const __internals_for_tests = {
    isOpenAIFamilyAssistantMessage,
    isOpenAIFamilyToken,
    describeMissingOpenAIFamilyProxyCompat,
+   describeMissingOpenAICompatibleProxyCompat,
    isOfficialOpenAIBaseUrl,
+   // Non-GPT OpenAI-compatible model detection
+   isKimiLikeModel,
+   isKimiLikeAssistantMessage,
+   isQwenLikeModel,
+   isQwenLikeAssistantMessage,
+   isGLMLikeModel,
+   isGLMLikeAssistantMessage,
+   isMiniMaxLikeModel,
+   isMiniMaxLikeAssistantMessage,
+   isHunyuanLikeModel,
+   isHunyuanLikeAssistantMessage,
    buildOpenAIProxyCompatWarningText,
    getModelIdNameTokenValues,
    getAssistantMessageModelTokenValues,
@@ -1509,7 +1669,6 @@ export default function (pi: ExtensionAPI) {

    pi.on("before_provider_request", (event, ctx) => {
      if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
-     if (!isOpenAIFamilyModel(ctx.model)) return undefined;
      if (!isOpenAICompatibleApi(ctx.model?.api)) return undefined;

      return addOpenAIPromptCacheKey(event.payload, getSessionPromptCacheKey(ctx));
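The practical effect of dropping the `isOpenAIFamilyModel` line is easiest to see by comparing the gate before and after. The sketch below is a rough illustration: `looksGptNamed` is a crude, assumed approximation of the removed name check, `isOpenAICompatible` assumes a case-insensitive match on the two API names from the README, and the model ids are made up.

```typescript
// Hypothetical model shape and helpers; only the API names come from the README.
interface ModelLike {
  id: string;
  api?: string;
}

const OPENAI_COMPATIBLE_APIS = ["openai-completions", "openai-responses"];

function isOpenAICompatible(api: string | undefined): boolean {
  return api !== undefined && OPENAI_COMPATIBLE_APIS.indexOf(api.toLowerCase()) !== -1;
}

// Assumption: a crude stand-in for the removed isOpenAIFamilyModel name check.
function looksGptNamed(model: ModelLike): boolean {
  return model.id.toLowerCase().indexOf("gpt") !== -1;
}

// 2.4.2: both the model name and the API had to qualify.
function injectedBefore(model: ModelLike): boolean {
  return looksGptNamed(model) && isOpenAICompatible(model.api);
}

// 2.4.3: only the API decides.
function injectedAfter(model: ModelLike): boolean {
  return isOpenAICompatible(model.api);
}

const kimi: ModelLike = { id: "kimi-k2", api: "openai-completions" };
const gpt: ModelLike = { id: "gpt-4.1", api: "openai-responses" };
const kiro: ModelLike = { id: "some-model", api: "kiro-api" };
for (const m of [kimi, gpt, kiro]) {
  console.log(m.id, injectedBefore(m), injectedAfter(m));
}
```

In this sketch only the Kimi model changes behavior between versions: the GPT-named model was already covered, and the `kiro-api` transport stays excluded.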
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "pi-cache-optimizer",
-   "version": "2.4.2",
+   "version": "2.4.3",
    "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
    "keywords": [
      "pi-package",