npm - pi-cache-optimizer - Versions diffs - 2.5.2 → 2.5.3 - Mend

pi-cache-optimizer 2.5.2 → 2.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -69,12 +69,12 @@ Run `/reload` in Pi after install/update/remove so extension hooks refresh.
 |---|---|
 | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Disable prompt mutations only; footer stats and cache-key fallback remain active. |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep Pi's verbose skill XML. |
-| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the OpenAI-compatible `prompt_cache_key` fallback. |
-| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-compatible `prompt_cache_key` fallback. |
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-compatible `prompt_cache_key` fallback. Preferred explicit opt-out. |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the same fallback via the legacy inverse switch. Values `0`, `false`, `no`, or `off` disable it. |
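
Reviewer note: the two rows above describe one feature toggled by two switches, which is easy to misread. The sketch below shows one way the documented values could be interpreted. The function name `cacheKeyFallbackEnabled`, the `Env` type, the default of "enabled", and the trimming and case-insensitive matching are all assumptions for illustration; they are not taken from the package source.

```typescript
// Hypothetical helper; not the package's actual implementation.
type Env = Record<string, string | undefined>;

// Values the README lists as disabling the legacy inverse switch.
const LEGACY_OFF_VALUES = new Set(["0", "false", "no", "off"]);

function cacheKeyFallbackEnabled(env: Env): boolean {
  // Preferred explicit opt-out: PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1.
  if (env.PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY === "1") {
    return false;
  }
  // Legacy inverse switch: PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0|false|no|off.
  // Trimming and lowercasing are assumptions, not documented behavior.
  const legacy = env.PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY;
  if (legacy !== undefined && LEGACY_OFF_VALUES.has(legacy.trim().toLowerCase())) {
    return false;
  }
  // Assumed default: the fallback stays on unless one switch disables it.
  return true;
}
```

Under these assumptions, either switch alone is enough to turn the fallback off, and any other value of the legacy switch leaves it on.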
 ## OpenAI-compatible proxy setup
-For third-party `openai-completions` proxies such as LiteLLM / OneAPI / NewAPI / OpenRouter-like channels, low cache hit rate is often caused by multi-backend routing. The safe default is session affinity:
+Third-party `openai-completions` proxies (LiteLLM / OneAPI / NewAPI / OpenRouter-like channels) often route one session across multiple upstream backends. That splits provider-side prompt caches. Start with session affinity:
 ```json
 {
@@ -94,7 +94,13 @@ For third-party `openai-completions` proxies such as LiteLLM / OneAPI / NewAPI /
 }
 ```
-Only add `supportsLongCacheRetention: true` after the endpoint/proxy explicitly supports OpenAI long prompt cache retention. This extension does not directly write `prompt_cache_retention`; it requests `PI_CACHE_RETENTION=long`, and Pi may send `prompt_cache_retention` when compat says long retention is supported. If a proxy returns `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel, keep `sendSessionAffinityHeaders` if supported, and use `/cache-optimizer compat` / `/cache-optimizer doctor` for diagnosis. When a 400 is observed while long retention compat is enabled, the extension adds a one-time warning and doctor hint. This extension itself only advises; it does not edit `models.json`.
+Notes:
+- `sendSessionAffinityHeaders: true` is the safe default when your proxy supports sticky routing.
+- `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
+- If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
+- Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
+- This extension only advises; it does not edit `models.json`.
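
Reviewer note: the notes above amount to a small decision rule, sketched below as a pure function. The `Compat` shape (only the two flags named in the README), the `adviseCompat` name, and the hint wording are hypothetical; the real extension's logic and the surrounding `models.json` layout are not shown in this diff. The sketch returns a suggestion and never mutates its input, mirroring the "only advises" behavior.

```typescript
// Hypothetical types and names, for illustration only.
interface Compat {
  sendSessionAffinityHeaders?: boolean;
  supportsLongCacheRetention?: boolean;
}

interface Advice {
  suggested: Compat;   // a copy; the caller's object is left untouched
  hint: string | null; // null when there is nothing to advise
}

function adviseCompat(compat: Compat, errorMessage: string): Advice {
  const rejected = errorMessage.includes(
    "Unsupported parameter: prompt_cache_retention",
  );
  const suggested: Compat = { ...compat };
  if (!rejected || !compat.supportsLongCacheRetention) {
    return { suggested, hint: null };
  }
  // Drop only the long-retention flag; keep session affinity if it was set.
  delete suggested.supportsLongCacheRetention;
  return {
    suggested,
    hint: "Remove supportsLongCacheRetention for this channel; run /cache-optimizer doctor.",
  };
}
```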
 ## Footer stats
@@ -103,9 +109,11 @@ Stats are read-only local counters stored at `~/.pi/agent/pi-cache-optimizer-sta
 Example footer:
 ```text
-OpenAI cache 3/10 (30%) · 0.002M/0.005M tok ⚠️ compat
+OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 ```
+Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
 Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
 Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
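
Reviewer note: in the new footer example the percentage is the token hit rate (0.002M / 0.005M = 40%), not the request hit rate (3/10 = 30%) shown in 2.5.2. The sketch below reproduces the example line from raw counters. The `CacheStats` shape, the `formatFooter` name, the three-decimal `M` formatting, and the rounding are assumptions inferred from the example; the literal word `cache` follows the example line rather than the format string.

```typescript
// Hypothetical stats shape; field names are illustrative, not the package's.
interface CacheStats {
  label: string;
  hitRequests: number;
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
}

function formatFooter(s: CacheStats): string {
  // Render token counts in millions with three decimals, e.g. 2000 -> "0.002M".
  const millions = (n: number): string => `${(n / 1e6).toFixed(3)}M`;
  // Token hit rate as a whole percentage; guard against division by zero.
  const rate =
    s.totalInputTokens > 0
      ? Math.round((100 * s.cachedInputTokens) / s.totalInputTokens)
      : 0;
  return (
    `${s.label} cache ${s.hitRequests}/${s.totalRequests} · ` +
    `${millions(s.cachedInputTokens)}/${millions(s.totalInputTokens)} tok (${rate}%)`
  );
}

const exampleFooter = formatFooter({
  label: "OpenAI",
  hitRequests: 3,
  totalRequests: 10,
  cachedInputTokens: 2000,
  totalInputTokens: 5000,
});
console.log(exampleFooter);
```

The optional `· write <tokens> tok` segment and the `⚠️ compat` / `⚠️ integrity` suffixes are left out of this sketch.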

package/README.zh-CN.md CHANGED

@@ -69,12 +69,12 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 |---|---|
 | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | 只关闭 prompt 改写；footer 统计和 cache-key fallback 仍启用。 |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | 保留 Pi 原始 verbose skill XML。 |
-| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | 关闭 OpenAI-compatible `prompt_cache_key` fallback。 |
-| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | 关闭 OpenAI-compatible `prompt_cache_key` fallback。 |
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | 关闭 OpenAI-compatible `prompt_cache_key` fallback。推荐使用这个显式 opt-out。 |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | 通过旧的反向开关关闭同一个 fallback。取值 `0`、`false`、`no`、`off` 时关闭。 |
 ## OpenAI-compatible 代理配置
-对 LiteLLM / OneAPI / NewAPI / 类 OpenRouter 渠道等第三方 `openai-completions` 代理，缓存命中率低通常是因为请求被分散到多个后端。安全默认配置是 session affinity：
+LiteLLM / OneAPI / NewAPI / 类 OpenRouter 渠道等第三方 `openai-completions` 代理，常会把同一个 session 分散到多个上游后端，导致 provider 侧 prompt cache 被拆散。建议先启用 session affinity：
 ```json
 {
@@ -94,7 +94,13 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 }
 ```
-只有在 endpoint / proxy 明确支持 OpenAI long prompt cache retention 时，才添加 `supportsLongCacheRetention: true`。本扩展不会直接写入 `prompt_cache_retention`；它会请求 `PI_CACHE_RETENTION=long`，当 compat 声明支持 long retention 时，Pi 可能发送 `prompt_cache_retention`。如果某代理返回 `400 Unsupported parameter: prompt_cache_retention`，请为该渠道移除 / 避免 `supportsLongCacheRetention`，如支持可保留 `sendSessionAffinityHeaders`，并用 `/cache-optimizer compat` / `/cache-optimizer doctor` 诊断。当启用 long retention compat 时观察到 400，本扩展会给一次性 warning，并在 doctor 中提示。本扩展只给建议，不会修改 `models.json`。
+说明：
+- `sendSessionAffinityHeaders: true` 是安全默认项，前提是你的代理支持 sticky routing。
+- `supportsLongCacheRetention: true` 是可选项。只有 endpoint 明确支持 OpenAI long prompt cache retention 时才添加。
+- 如果出现 `400 Unsupported parameter: prompt_cache_retention`，请为该渠道移除 / 避免 `supportsLongCacheRetention`；如支持，可保留 `sendSessionAffinityHeaders`。
+- 使用 `/cache-optimizer compat` 或 `/cache-optimizer doctor` 查看当前模型的具体建议。
+- 本扩展只给建议，不会修改 `models.json`。
 ## Footer 统计
@@ -103,9 +109,11 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 示例 footer：
 ```text
-OpenAI cache 3/10 (30%) · 0.002M/0.005M tok ⚠️ compat
+OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 ```
+格式：`<label> <命中请求数>/<总请求数> · <cached input tokens>/<total input tokens> tok (<token 命中率>)`。部分 adapter 还可能追加 `· write <tokens> tok`，运行时诊断可能追加 `⚠️ compat` 或 `⚠️ integrity`。
 支持的 footer label 包括：DS、Claude、OpenAI、Gemini、Kimi、Qwen、GLM、MiniMax、Hunyuan、Mistral、Grok、Llama、Nemotron、Cohere、Yi、Doubao、ERNIE、Baichuan、StepFun、Spark、InternLM、Gemma、Phi、Jamba、Solar、Sonar、Nova、Reka、Falcon、DBRX、MPT、StableLM、Aquila、EXAONE、HyperCLOVA、Luminous、Hermes、Granite、Arctic、Pangu、SenseNova、Zhinao、MiniCPM、XVERSE、Orion、OpenChat、Vicuna、Wizard、Zephyr、Dolphin、OpenOrca、Starling、BLOOM、RWKV、Aya。
 Adapter 选择只看模型 id/name（以及 message_end 时 assistant message 的 model/name）。仅使用 OpenAI-shaped API 不会被当作 OpenAI-family，除非模型 id/name 匹配受支持的家族。

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.5.2",
+  "version": "2.5.3",
   "description": "Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.",
   "keywords": [
     "pi-package",