npm - pi-cache-optimizer - Versions diffs - 2.4.2 → 2.4.3 - Mend

pi-cache-optimizer 2.4.2 → 2.4.3


Files changed (4)

package/README.md CHANGED

@@ -192,7 +192,7 @@ After:  [stable tools + rules | dynamic git status | task context]
         ↓ stable prefix → higher chance of cache reuse
 ```
-Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
+Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API — only custom transports like `kiro-api` are excluded. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
 ## Improving cache hit rate
@@ -207,7 +207,7 @@ What the extension does automatically:
 Provider notes:
 - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
-- OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
+- OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to ALL models using `openai-completions` / `openai-responses` (not just GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models also benefit. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
 - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
 - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
 - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
@@ -226,7 +226,7 @@ This package now has provider-family stats adapters, but it still avoids blind g
 - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
 - Injecting Anthropic `cache_control` markers.
-- Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to OpenAI-family id/name plus `openai-completions` / `openai-responses`.
+- Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to `openai-completions` / `openai-responses` only (custom transports like `kiro-api` are excluded, but the model name no longer needs to be GPT-family).
 - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
 - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
 - Claiming stats for providers that do not expose reliable cache usage.

package/README.zh-CN.md CHANGED

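@@ -195,7 +195,7 @@ Provider caching usually relies on exact or near-exact prefix matching. Pi's system
          ↓ stable prefix unchanged → easier to hit the cache
 ```
-Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now performs only one conservative request-body fallback by default: for OpenAI-family models using an OpenAI-compatible Pi API, when the top-level `prompt_cache_key` is missing or empty, it fills it in with the Pi session id and does not overwrite an existing non-empty key. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the bottom status bar.
+Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now performs only one conservative request-body fallback by default: for all models using an OpenAI-compatible Pi API (`openai-completions` / `openai-responses`), when the top-level `prompt_cache_key` is missing or empty, it fills it in with the Pi session id and does not overwrite an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API; only custom transports such as `kiro-api` are not injected. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the bottom status bar.
 ## Improving the cache hit rate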
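@@ -210,7 +210,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
 Notes per provider:
 - DeepSeek: the existing behavior remains the reference path. Stable prefix ordering, together with long-retention / session-affinity compat, is most favorable for automatic KV prefix reuse.
-- OpenAI-family: prompt caching only takes effect automatically when the real upstream supports it and the prompt is long enough. Try to place static instructions, tools, examples, and specs before the changing user/task context. Retention transport is handled by Pi by default. For OpenAI-compatible Pi APIs, this extension uses the Pi session id to fill a missing or blank top-level `prompt_cache_key` (aligned with Pi core's official OpenAI behavior) and does not overwrite an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support this field may reject the request; custom APIs are not injected.
+- OpenAI-family: prompt caching only takes effect automatically when the real upstream supports it and the prompt is long enough. Try to place static instructions, tools, examples, and specs before the changing user/task context. Retention transport is handled by Pi by default. For OpenAI-compatible Pi APIs, this extension uses the Pi session id to fill a missing or blank top-level `prompt_cache_key` (aligned with Pi core's official OpenAI behavior) and does not overwrite an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to all models using `openai-completions` / `openai-responses` (not only GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models benefit as well. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support this field may reject the request; custom APIs are not injected.
 - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint explicitly supports it.
 - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources and does not store cache resource names.
 - Proxies/aggregators: pin the upstream routing/provider order where possible. If the same model id/name can route to different upstreams, the cache hit rate will be unstable.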
@@ -230,7 +230,7 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
 - Broad/provider-agnostic modification of the request body, or cache-control injection. The only default request-body fallback is the OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
 - Injecting Anthropic `cache_control` markers.
-- Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback requires both that the model id/name belongs to OpenAI-family and that the API is `openai-completions` / `openai-responses`.
+- Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback only requires that the API is `openai-completions` / `openai-responses` (custom transports such as `kiro-api` are not injected, but the model name no longer needs to belong to GPT-family).
 - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
 - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
 - Claiming stats support for providers that do not expose reliable cache usage.
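
The README changes above describe a fill-only fallback: add `prompt_cache_key` when it is missing or blank, never overwrite an existing key, and allow two environment variables to switch the behavior off. A minimal sketch of that rule follows. The function names (`hasEffectiveKey`, `fallbackEnabled`, `fillPromptCacheKey`) and the exact-string matching of the environment values are assumptions made for illustration and are not taken from the package source.

```typescript
// Hypothetical sketch of the documented fallback; not the package's actual code.
type Payload = Record<string, unknown>;
type Env = Record<string, string | undefined>;

// A key counts as "effective" only when it is a non-blank string.
function hasEffectiveKey(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}

// Assumption: the toggles are compared as exact strings ("1" and "0").
function fallbackEnabled(env: Env): boolean {
  if (env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"] === "1") return false;
  if (env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"] === "0") return false;
  return true;
}

// Returns the payload unchanged unless the key is missing or blank.
function fillPromptCacheKey(payload: Payload, sessionId: string, env: Env = {}): Payload {
  if (!fallbackEnabled(env)) return payload;
  if (!hasEffectiveKey(sessionId)) return payload;
  const existing =
    hasEffectiveKey(payload["prompt_cache_key"]) || hasEffectiveKey(payload["promptCacheKey"]);
  if (existing) return payload;
  return { ...payload, prompt_cache_key: sessionId };
}
```

Returning a copy instead of mutating the payload keeps the caller's object untouched; whether the real extension copies or mutates is not visible in this diff.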

package/index.ts CHANGED

@@ -635,6 +635,45 @@ function isGeminiLikeAssistantMessage(message: unknown, model: PiModel | undefin
   return modelOrAssistantMessageHas(message, model, ["gemini", "vertex"]);
 }
+// ── Non-GPT OpenAI-compatible model detection ──────────────────────
+function isKimiLikeModel(model: PiModel | undefined): boolean {
+  return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["kimi"]);
+}
+function isKimiLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  return modelOrAssistantMessageHas(message, model, ["kimi"]);
+}
+function isQwenLikeModel(model: PiModel | undefined): boolean {
+  return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["qwen"]);
+}
+function isQwenLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  return modelOrAssistantMessageHas(message, model, ["qwen"]);
+}
+function isGLMLikeModel(model: PiModel | undefined): boolean {
+  return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["glm"]);
+}
+function isGLMLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  return modelOrAssistantMessageHas(message, model, ["glm"]);
+}
+function isMiniMaxLikeModel(model: PiModel | undefined): boolean {
+  return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["minimax"]);
+}
+function isMiniMaxLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  return modelOrAssistantMessageHas(message, model, ["minimax"]);
+}
+function isHunyuanLikeModel(model: PiModel | undefined): boolean {
+  return hasAnyTokenContaining(getModelIdNameTokenValues(model), ["hunyuan"]);
+}
+function isHunyuanLikeAssistantMessage(message: unknown, model: PiModel | undefined): boolean {
+  return modelOrAssistantMessageHas(message, model, ["hunyuan"]);
+}
+// ── Model key ──────────────────────────────────────────────────────
 function modelKey(model: PiModel): string {
   return `${model.provider}/${model.id}`;
 }
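
The detectors added in this hunk all delegate to `hasAnyTokenContaining` and `getModelIdNameTokenValues`, whose bodies are not part of the diff. The sketch below shows one plausible reading of that pattern: lowercase the model id and name, split them into tokens, and test each token for a substring. The helper names `modelTokens` and `anyTokenContains`, the `ModelLike` shape, and the tokenization rule are assumptions, not the package's implementation.

```typescript
// Hypothetical stand-ins for the helpers referenced in the diff.
interface ModelLike {
  id?: string;
  name?: string;
}

// Assumed tokenization: lowercase, then split on anything that is not a-z or 0-9.
function modelTokens(model: ModelLike | undefined): string[] {
  const tokens: string[] = [];
  if (!model) return tokens;
  for (const value of [model.id, model.name]) {
    if (typeof value !== "string") continue;
    for (const token of value.toLowerCase().split(/[^a-z0-9]+/)) {
      if (token.length > 0) tokens.push(token);
    }
  }
  return tokens;
}

// True when any token contains any of the needles as a substring.
function anyTokenContains(tokens: string[], needles: string[]): boolean {
  return tokens.some((token) => needles.some((needle) => token.includes(needle)));
}

function isKimiLike(model: ModelLike | undefined): boolean {
  return anyTokenContains(modelTokens(model), ["kimi"]);
}
```

Under this sketch an id such as `moonshot-v1-8k` would not match, because the only needle is `kimi`; the same would hold for any detector whose needle list contains a single brand string.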
@@ -857,6 +896,29 @@ function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
   return missing;
 }
+/**
+ * Like describeMissingOpenAIFamilyProxyCompat but without the isOpenAIFamilyModel
+ * gate. Warns for ANY model using openai-completions through a non-official base
+ * URL — covers GPT, Kimi, Qwen, GLM, MiniMax, Hunyuan, and any other
+ * OpenAI-compatible proxy.
+ */
+function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
+  const compat = getCompat(model);
+  const missing: string[] = [];
+  if (lower(model.api) !== "openai-completions") return missing;
+  if (isOfficialOpenAIBaseUrl(model)) return missing;
+  if (compat.supportsLongCacheRetention !== true) {
+    missing.push("supportsLongCacheRetention");
+  }
+  if (compat.sendSessionAffinityHeaders !== true) {
+    missing.push("sendSessionAffinityHeaders");
+  }
+  return missing;
+}
 /**
  * Build the warning text displayed to users when an OpenAI-family third-party
  * proxy is missing one or more cache/session-affinity compat flags.
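
To see what `describeMissingOpenAICompatibleProxyCompat` reports, the following self-contained sketch mirrors its control flow from the hunk above. `getCompat`, `lower`, and `isOfficialOpenAIBaseUrl` are not shown in the diff, so the `ModelLike` shape, the `baseUrl` field, and the hostname test for `api.openai.com` are assumptions standing in for them.

```typescript
// Hypothetical model shape; the real PiModel type is not shown in the diff.
interface CompatFlags {
  supportsLongCacheRetention?: boolean;
  sendSessionAffinityHeaders?: boolean;
}
interface ModelLike {
  api?: string;
  baseUrl: string;
  compat?: CompatFlags;
}

// Assumption: "official" means the host is api.openai.com.
function isOfficialBaseUrl(baseUrl: string): boolean {
  return /^https?:\/\/api\.openai\.com(\/|$)/i.test(baseUrl);
}

// Same order of checks as the function in the diff: API gate, official-URL
// gate, then one entry per compat flag that is not explicitly true.
function missingCompatFlags(model: ModelLike): string[] {
  const missing: string[] = [];
  if ((model.api ?? "").toLowerCase() !== "openai-completions") return missing;
  if (isOfficialBaseUrl(model.baseUrl)) return missing;
  const compat = model.compat ?? {};
  if (compat.supportsLongCacheRetention !== true) missing.push("supportsLongCacheRetention");
  if (compat.sendSessionAffinityHeaders !== true) missing.push("sendSessionAffinityHeaders");
  return missing;
}
```

As in the diff, only `openai-completions` is checked here, so a model on `openai-responses` yields an empty list and no warning even though the `prompt_cache_key` fallback covers both APIs.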
@@ -971,7 +1033,7 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       return normalizeWithFallback(message, getOpenAIRawUsage);
     },
     warningText(model) {
-      const missing = describeMissingOpenAIFamilyProxyCompat(model);
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
       if (missing.length === 0) return undefined;
       return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
     },
@@ -988,6 +1050,92 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
       return normalizeWithFallback(message, getGeminiRawUsage);
     },
   },
+  // ── Non-GPT OpenAI-compatible adapters ──────────────────────
+  {
+    id: "openai" as CacheProviderId,
+    label: "Kimi cache",
+    matchesModel: isKimiLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isKimiLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
+  {
+    id: "openai" as CacheProviderId,
+    label: "Qwen cache",
+    matchesModel: isQwenLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isQwenLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
+  {
+    id: "openai" as CacheProviderId,
+    label: "GLM cache",
+    matchesModel: isGLMLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isGLMLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
+  {
+    id: "openai" as CacheProviderId,
+    label: "MiniMax cache",
+    matchesModel: isMiniMaxLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isMiniMaxLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
+  {
+    id: "openai" as CacheProviderId,
+    label: "Hunyuan cache",
+    matchesModel: isHunyuanLikeModel,
+    matchesAssistantMessage(message, model) {
+      if (!isAssistantMessage(message)) return false;
+      return isHunyuanLikeAssistantMessage(message, model);
+    },
+    normalizeUsage(message) {
+      return normalizeWithFallback(message, getOpenAIRawUsage);
+    },
+    warningText(model) {
+      const missing = describeMissingOpenAICompatibleProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return buildOpenAIProxyCompatWarningText(modelKey(model), missing);
+    },
+  },
 ];
 function selectAdapterForModel(model: PiModel | undefined): CacheProviderAdapter | undefined {
@@ -1237,7 +1385,19 @@ export const __internals_for_tests = {
   isOpenAIFamilyAssistantMessage,
   isOpenAIFamilyToken,
   describeMissingOpenAIFamilyProxyCompat,
+  describeMissingOpenAICompatibleProxyCompat,
   isOfficialOpenAIBaseUrl,
+  // Non-GPT OpenAI-compatible model detection
+  isKimiLikeModel,
+  isKimiLikeAssistantMessage,
+  isQwenLikeModel,
+  isQwenLikeAssistantMessage,
+  isGLMLikeModel,
+  isGLMLikeAssistantMessage,
+  isMiniMaxLikeModel,
+  isMiniMaxLikeAssistantMessage,
+  isHunyuanLikeModel,
+  isHunyuanLikeAssistantMessage,
   buildOpenAIProxyCompatWarningText,
   getModelIdNameTokenValues,
   getAssistantMessageModelTokenValues,
@@ -1509,7 +1669,6 @@ export default function (pi: ExtensionAPI) {
   pi.on("before_provider_request", (event, ctx) => {
     if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
-    if (!isOpenAIFamilyModel(ctx.model)) return undefined;
     if (!isOpenAICompatibleApi(ctx.model?.api)) return undefined;
     return addOpenAIPromptCacheKey(event.payload, getSessionPromptCacheKey(ctx));
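
The one-line removal in this hunk is the behavioral core of the release: the model-name gate is dropped and only the API gate remains. The sketch below contrasts the two decisions side by side. `isOpenAIFamilyModel` and `isOpenAICompatibleApi` are not shown in the diff, so the stand-ins here (a `gpt` substring test and a two-entry set of API names taken from the README) are assumptions for illustration only.

```typescript
// Hypothetical stand-ins for the two gates named in the diff.
interface ModelLike {
  id: string;
  api?: string;
}

// API names listed in the README as OpenAI-compatible.
const OPENAI_COMPATIBLE_APIS = new Set(["openai-completions", "openai-responses"]);

function isCompatibleApi(api: string | undefined): boolean {
  return api !== undefined && OPENAI_COMPATIBLE_APIS.has(api.toLowerCase());
}

// Assumption: "OpenAI-family" is approximated by a "gpt" substring in the id.
function isGptNamed(model: ModelLike): boolean {
  return model.id.toLowerCase().includes("gpt");
}

// 2.4.2: both the name gate and the API gate had to pass.
function wouldInject242(model: ModelLike): boolean {
  return isGptNamed(model) && isCompatibleApi(model.api);
}

// 2.4.3: only the API gate remains.
function wouldInject243(model: ModelLike): boolean {
  return isCompatibleApi(model.api);
}
```

For a model such as `{ id: "kimi-k2", api: "openai-completions" }` the first function returns false and the second returns true, while a model on `kiro-api` is rejected by both, which matches the exclusion of custom transports described in the README.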

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.4.2",
+  "version": "2.4.3",
   "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
   "keywords": [
     "pi-package",