pi-cache-optimizer 2.5.2 → 2.5.3
- package/README.md +13 -5
- package/README.zh-CN.md +13 -5
- package/package.json +1 -1
package/README.md
CHANGED
@@ -69,12 +69,12 @@ Run `/reload` in Pi after install/update/remove so extension hooks refresh.
 |---|---|
 | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Disable prompt mutations only; footer stats and cache-key fallback remain active. |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep Pi's verbose skill XML. |
-| `
-| `
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-compatible `prompt_cache_key` fallback. Preferred explicit opt-out. |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the same fallback via the legacy inverse switch. Values `0`, `false`, `no`, or `off` disable it. |

 ## OpenAI-compatible proxy setup

-
+Third-party `openai-completions` proxies (LiteLLM / OneAPI / NewAPI / OpenRouter-like channels) often route one session across multiple upstream backends. That splits provider-side prompt caches. Start with session affinity:

 ```json
 {
@@ -94,7 +94,13 @@ For third-party `openai-completions` proxies such as LiteLLM / OneAPI / NewAPI /
 }
 ```

-
+Notes:
+
+- `sendSessionAffinityHeaders: true` is the safe default when your proxy supports sticky routing.
+- `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
+- If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
+- Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
+- This extension only advises; it does not edit `models.json`.

 ## Footer stats

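The two opt-out switches added to the environment-variable table in the first hunk can be read as a single decision. The following is a minimal sketch of that reading, not code from the package: the function name `cacheKeyFallbackEnabled`, the `Env` type, the default of "enabled when neither variable is set", and the case-insensitive matching are all assumptions made for illustration.

```typescript
// Hypothetical helper; the package's real implementation is not shown in this diff.
type Env = Record<string, string | undefined>;

// Values the README lists as disabling the legacy inverse switch.
const DISABLING_VALUES = new Set(["0", "false", "no", "off"]);

function cacheKeyFallbackEnabled(env: Env): boolean {
  // Preferred explicit opt-out: PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1.
  if (env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"] === "1") {
    return false;
  }
  // Legacy inverse switch: PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0|false|no|off.
  // Trimming and lowercasing are assumptions, not documented behavior.
  const legacy = env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"];
  if (legacy !== undefined && DISABLING_VALUES.has(legacy.trim().toLowerCase())) {
    return false;
  }
  // Assumed default: the fallback stays on unless one of the switches turns it off.
  return true;
}
```

Under these assumptions, passing `{ PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY: "off" }` yields `false`, while an empty environment yields `true`.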
@@ -103,9 +109,11 @@ Stats are read-only local counters stored at `~/.pi/agent/pi-cache-optimizer-sta
 Example footer:

 ```text
-OpenAI cache 3/10
+OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 ```

+Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
+
 Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.

 Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
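The footer format documented in the hunk above can be illustrated with a small formatter. This is a sketch under stated assumptions rather than the extension's actual code: the `FooterStats` shape, the function names, the literal word `cache` after the label (inferred from the example line), the three-decimal `M` unit, and the rounding of the hit rate are all hypothetical choices made to reproduce the example.

```typescript
// Hypothetical data shape; field names are illustrative, not from the package.
interface FooterStats {
  label: string;             // e.g. "OpenAI"
  hitRequests: number;
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
  diagnostics?: string[];    // e.g. ["compat"] or ["integrity"]
}

// Render a token count in millions with three decimals (assumed from the example).
function formatMillions(tokens: number): string {
  return (tokens / 1_000_000).toFixed(3) + "M";
}

function formatFooter(s: FooterStats): string {
  // Token hit rate as a whole percentage; 0% when there is no input yet.
  const rate =
    s.totalInputTokens > 0
      ? Math.round((s.cachedInputTokens / s.totalInputTokens) * 100)
      : 0;
  let line =
    `${s.label} cache ${s.hitRequests}/${s.totalRequests}` +
    ` · ${formatMillions(s.cachedInputTokens)}/${formatMillions(s.totalInputTokens)}` +
    ` tok (${rate}%)`;
  // Runtime diagnostics are appended as warning markers.
  for (const d of s.diagnostics ?? []) {
    line += ` ⚠️ ${d}`;
  }
  return line;
}
```

With 3 of 10 requests hitting the cache and 2,000 of 5,000 input tokens cached, this sketch yields the same `3/10 · 0.002M/0.005M tok (40%)` body as the README example. The optional `· write <tokens> tok` suffix is left out because the diff does not say which adapters emit it.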
package/README.zh-CN.md
CHANGED
@@ -69,12 +69,12 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 |---|---|
 | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Disable prompt rewriting only; footer stats and the cache-key fallback stay enabled. |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep Pi's original verbose skill XML. |
-| `
-| `
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-compatible `prompt_cache_key` fallback. This explicit opt-out is the recommended one. |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the same fallback through the legacy inverse switch. The values `0`, `false`, `no`, and `off` disable it. |

 ## OpenAI-compatible proxy configuration

-
+Third-party `openai-completions` proxies such as LiteLLM / OneAPI / NewAPI / OpenRouter-like channels often spread a single session across multiple upstream backends, which fragments the provider-side prompt cache. Enabling session affinity first is recommended:

 ```json
 {
@@ -94,7 +94,13 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 }
 ```

-
+Notes:
+
+- `sendSessionAffinityHeaders: true` is the safe default, provided your proxy supports sticky routing.
+- `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
+- If you see `400 Unsupported parameter: prompt_cache_retention`, remove / avoid `supportsLongCacheRetention` for that channel; keep `sendSessionAffinityHeaders` if it is supported.
+- Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see advice specific to the current model.
+- This extension only gives advice; it does not modify `models.json`.

 ## Footer stats

@@ -103,9 +109,11 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 Example footer:

 ```text
-OpenAI cache 3/10
+OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 ```

+Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
+
 Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.

 Adapter selection looks only at the model id/name (plus the assistant message's model/name at message_end). Using an OpenAI-shaped API alone does not count as OpenAI-family unless the model id/name matches a supported family.
package/package.json
CHANGED