pi-cache-optimizer 2.5.1 → 2.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/README.md +80 -493
  2. package/README.zh-CN.md +83 -355
  3. package/index.ts +223 -43
  4. package/package.json +2 -2
package/README.md CHANGED
@@ -6,554 +6,141 @@
6
6
 
7
7
  [中文说明](./README.zh-CN.md)
8
8
 
9
- > **Renamed from `pi-deepseek-cache-optimizer`.** If you previously installed the old name, migrate with:
10
- >
11
- > ```bash
12
- > pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
13
- > ```
14
- >
15
- > Your persisted footer counters and any existing `~/.pi/agent/models.json` are preserved automatically.
9
+ Pi extension for improving provider-side KV / prompt cache hit rates. It keeps stable prompt content near the front, adds a conservative OpenAI-compatible `prompt_cache_key` fallback, warns about common proxy cache-routing gaps, and shows read-only footer cache stats.
16
10
 
17
- A plug-and-play Pi extension that improves provider-side KV Cache / Prompt Cache hit rates, with conservative provider-specific footer stats. Despite the original DeepSeek-only name, this package has supported DeepSeek, OpenAI, Claude, and Gemini stats adapters since 1.x — the new name reflects that scope.
11
+ **GitHub About:** Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.
18
12
 
19
- > Important: prompt/KV caching is provider-side and best-effort. This extension can improve the odds of cache hits by stabilizing prompt prefixes, requesting long retention through Pi when supported, warning about obvious compat gaps, and showing lightweight footer stats for providers that expose reliable cache usage. It cannot guarantee cache hits. Third-party proxies may hide, drop, reroute, or reinterpret cache behavior.
13
+ > Renamed from `pi-deepseek-cache-optimizer`. Existing footer counters migrate automatically. This package never creates, edits, backs up, or deletes your `~/.pi/agent/models.json`.
14
+
15
+ ## Contents
16
+
17
+ - [What it does](#what-it-does)
18
+ - [Install](#install)
19
+ - [Commands](#commands)
20
+ - [Persistent opt-out](#persistent-opt-out)
21
+ - [OpenAI-compatible proxy setup](#openai-compatible-proxy-setup)
22
+ - [Footer stats](#footer-stats)
23
+ - [Uninstall](#uninstall)
24
+ - [Verify effect](#verify-effect)
25
+ - [License](#license)
20
26
 
21
27
  ## What it does
22
28
 
23
- | Feature | How | Manual action required |
24
- |---|---|:---:|
25
- | 🔄 Reorders the system prompt | `before_agent_start` hook: stable prefix first, dynamic context later | ❌ Automatic |
26
- | Requests long cache retention | Sets `PI_CACHE_RETENTION=long` when the extension loads; Pi/provider compat decides what is sent | ❌ Automatic |
27
- | 🔗 Conservative compat reminders | DeepSeek session-affinity reminders, plus obvious Claude cache-control guidance for compatible endpoints | ⚠️ See below |
28
- | 📊 Provider-specific footer stats | Shows read-only cache stats for supported provider families in Pi footer/status | ❌ Automatic |
29
-
30
- ## Supported stats adapters
31
-
32
- This release keeps the original DeepSeek behavior and adds read-only stats adapters for model families that Pi or the provider can expose safely. Adapter selection is intentionally limited to the model id/name (and assistant message `model`/`name` on `message_end`); provider id, API type, base URL, `thinkingFormat`, and compat flags never select a stats adapter.
33
-
34
- | Adapter | Detection | Footer label | Usage fields |
35
- |---|---|---|---|
36
- | DeepSeek | Model id/name contains `deepseek` | `DS cache` | Pi `usage.cacheRead`/`usage.input`, or raw `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, `prompt_tokens` when visible |
37
- | OpenAI-family | Model id/name contains conservative OpenAI-family tokens such as `gpt-`, `chatgpt`, `o1`, `o3`, `o4`, or `o5` | `OpenAI cache` | Pi-normalized usage, or raw `prompt_tokens_details.cached_tokens` / `input_tokens_details.cached_tokens` with prompt/input totals |
38
- | Kimi / Moonshot | Model id/name contains `kimi` | `Kimi cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
39
- | Qwen / Alibaba | Model id/name contains `qwen` | `Qwen cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
40
- | GLM / Zhipu | Model id/name contains `glm` | `GLM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
41
- | MiniMax | Model id/name contains `minimax` | `MiniMax cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
42
- | Hunyuan / Tencent | Model id/name contains `hunyuan` | `Hunyuan cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
43
- | Mistral | Model id/name contains `mistral`, `mixtral`, or `codestral` | `Mistral cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
44
- | xAI / Grok | Model id/name contains `grok`, or pattern `xai` with safe boundaries | `Grok cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
45
- | Meta / Llama | Model id/name contains `llama` | `Llama cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
46
- | NVIDIA Nemotron | Model id/name contains `nemotron` | `Nemotron cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
47
- | Cohere / Command | Model id/name contains `cohere` or `command-r` | `Cohere cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
48
- | Yi / 零一万物 | Model id/name contains `yi-`, `01-ai`, `zero-one`, or pattern `yi` with safe boundaries | `Yi cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
49
- | Doubao / ByteDance / Seed | Model id/name contains `doubao`, `豆包`, `volcengine`, `bytedance`, `byte-dance`, or pattern `seed` with safe boundaries | `Doubao cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
50
- | Baidu ERNIE / Wenxin | Model id/name contains `ernie`, `wenxin`, `文心`, `yiyan`, `一言`, or `baidu` | `ERNIE cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
51
- | Baichuan / 百川 | Model id/name contains `baichuan` or `百川` | `Baichuan cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
52
- | StepFun / 阶跃星辰 | Model id/name contains `stepfun` or `step-` prefix | `StepFun cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
53
- | iFlytek Spark / 讯飞星火 | Model id/name contains `spark`, `xinghuo`, `星火`, `iflytek`, or `讯飞` | `Spark cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
54
- | InternLM / 书生 | Model id/name contains `internlm`, `intern-lm`, or `书生` | `InternLM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
55
- | Google Gemma | Model id/name contains `gemma` | `Gemma cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
56
- | Microsoft Phi | Model id/name contains `phi-` prefix, or pattern `phi` with safe boundaries | `Phi cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
57
- | AI21 Jamba | Model id/name contains `jamba` or `ai21` | `Jamba cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
58
- | Upstage Solar | Model id/name contains `solar` or `upstage` | `Solar cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
59
- | Perplexity / Sonar | Model id/name contains `sonar`, `perplexity`, or pattern `pplx` with safe boundaries | `Sonar cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
60
- | Amazon Nova | Model id/name contains `amazon-nova`, or pattern `nova` with safe boundaries | `Nova cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
61
- | Reka | Model id/name contains `reka` | `Reka cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
62
- | Falcon / TII | Model id/name contains `falcon` or `tiiuae` (not bare `tii`) | `Falcon cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
63
- | Databricks DBRX | Model id/name contains `dbrx` or `databricks` | `DBRX cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
64
- | MosaicML MPT | Model id/name contains `mosaicml`, `mpt-` prefix, or pattern `mpt` with safe boundaries | `MPT cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
65
- | StableLM / Stability AI | Model id/name contains `stablelm`, `stable-lm`, or `stability-ai` | `StableLM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
66
- | BAAI / Aquila | Model id/name contains `aquila` or `baai` | `Aquila cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
67
- | LG EXAONE | Model id/name contains `exaone` | `EXAONE cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
68
- | Naver HyperCLOVA X | Model id/name contains `hyperclova` or `clova-x` (conservative, not bare `clova`/`naver`) | `HyperCLOVA cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
69
- | Aleph Alpha Luminous | Model id/name contains `luminous`, `aleph-alpha`, or pattern `aleph` with safe boundaries | `Luminous cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
70
- | Nous / Hermes / OpenHermes | Model id/name contains `nous`, `hermes`, or `openhermes` | `Hermes cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
71
- | IBM Granite | Model id/name contains `granite` or `ibm-granite` | `Granite cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
72
- | Snowflake Arctic | Model id/name contains `snowflake-arctic`, or safe-boundary pattern `arctic` | `Arctic cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
73
- | Huawei Pangu / 盘古 | Model id/name contains `pangu`, `pan-gu`, `盘古`, or `huawei-pangu` | `Pangu cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
74
- | SenseTime SenseNova / 商汤 | Model id/name contains `sensenova`, `sense-nova`, `sensechat`, or `商汤` | `SenseNova cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
75
- | 360 Zhinao / 智脑 | Model id/name contains `360gpt`, `360-gpt`, `zhinao`, or `智脑` (no bare `360`) | `Zhinao cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
76
- | OpenBMB MiniCPM | Model id/name contains `minicpm`, `mini-cpm`, or `openbmb` | `MiniCPM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
77
- | XVERSE | Model id/name contains `xverse` | `XVERSE cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
78
- | OrionStar Orion | Model id/name contains `orionstar`, `orion-star`, or safe-boundary pattern `orion` | `Orion cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
79
- | OpenChat | Model id/name contains `openchat` | `OpenChat cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
80
- | Vicuna | Model id/name contains `vicuna` | `Vicuna cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
81
- | WizardLM / WizardCoder | Model id/name contains `wizardlm`, `wizard-lm`, `wizardcoder`, or `wizard-coder` | `Wizard cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
82
- | Zephyr | Model id/name contains `zephyr` | `Zephyr cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
83
- | Dolphin | Model id/name contains `dolphin` | `Dolphin cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
84
- | OpenOrca | Model id/name contains `openorca` or `open-orca` | `OpenOrca cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
85
- | Starling | Model id/name contains `starling` | `Starling cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
86
- | BLOOM / BigScience | Model id/name contains `bloom` or `bigscience` | `BLOOM cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
87
- | RWKV | Model id/name contains `rwkv` | `RWKV cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
88
- | Cohere Aya | Model id/name contains `aya-expanse`, or safe-boundary pattern `aya` (avoid `maya`/`payara`) | `Aya cache` | Pi-normalized usage, or raw OpenAI-shaped fields when visible |
89
- | Anthropic / Claude | Model id/name contains `anthropic` or `claude` | `Claude cache` | Pi-normalized usage, or raw `cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` |
90
- | Gemini / Vertex | Model id/name contains `gemini` or `vertex` | `Gemini cache` | Pi-normalized usage, or raw Gemini/Vertex cached-content token metadata when visible |
91
-
92
- Generic OpenAI-compatible proxies are **not** treated as OpenAI-family just because they use an OpenAI-shaped API or provider id. If the active model id/name is ambiguous, the extension hides the footer stats instead of guessing.
93
-
94
- ## Platform support
95
-
96
- This extension is pure Node.js — no shell exec, no native bindings, no platform-specific paths — so it runs on every OS Pi itself supports:
97
-
98
- | OS | Notes |
99
- |---|---|
100
- | Linux | Native. |
101
- | macOS | Native. |
102
- | Windows | Works through the bash shell Pi requires on Windows (Git Bash, Cygwin, MSYS2, or WSL). See Pi's [Windows setup](https://github.com/earendil-works/pi-coding-agent/blob/main/docs/windows.md). |
103
- | Termux / Android | Works inside Pi's Termux setup. |
29
+ - Reorders stable system-prompt content before dynamic context.
30
+ - Compresses Pi skill listings and strips session-overview churn.
31
+ - Requests long cache retention when Pi/provider compat supports it.
32
+ - Adds a session-id `prompt_cache_key` fallback for `openai-completions` / `openai-responses` payloads when no effective key exists.
33
+ - Warns once for third-party OpenAI-compatible proxies missing cache/session-affinity compat flags.
34
+ - Shows session-scoped footer stats for supported model families.
104
35
 
105
- State files under `~/.pi/agent/` are resolved via Node's `os.homedir()`, so on Windows the path automatically expands to `C:\Users\<you>\.pi\agent\...`. The extension's compat warnings, `/cache-optimizer doctor`, and `/cache-optimizer compat` show the platform-appropriate path automatically (`~/.pi/agent/models.json` on Linux/macOS, `%USERPROFILE%\.pi\agent\models.json` on Windows). All shell snippets in this README are bash, matching the shell Pi runs in on every supported platform; no PowerShell or `cmd.exe` translation is needed when commands are executed inside (or for) Pi.
36
+ Caching is provider-side and best-effort. Third-party proxies can still hide cache usage, reject unsupported parameters, or route requests across multiple upstreams.
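
As a rough illustration of the `prompt_cache_key` fallback described in the list above, the sketch below adds a session-derived key only when the payload has no effective key of its own. The payload shape, the function name `withCacheKeyFallback`, and the use of the raw session id as the key are assumptions made for this example; the README does not show how `index.ts` derives the key.

```typescript
// Hypothetical payload shape: only `prompt_cache_key` is named in the README.
interface ChatPayload {
  model: string;
  prompt_cache_key?: string | null;
  [key: string]: unknown;
}

// Returns the payload unchanged when it already carries a usable key,
// otherwise returns a copy with a session-derived key attached.
function withCacheKeyFallback(payload: ChatPayload, sessionId: string): ChatPayload {
  const existing = payload.prompt_cache_key;
  const hasEffectiveKey = typeof existing === "string" && existing.trim().length > 0;
  if (hasEffectiveKey || sessionId.length === 0) {
    return payload; // never override a key that is already set
  }
  // Assumption: the session id is used directly as the cache key.
  return { ...payload, prompt_cache_key: sessionId };
}
```

Keeping the fallback non-destructive matters because a key set by Pi or by the user may already encode a routing decision that the extension cannot see.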
106
37
 
107
38
  ## Install
108
39
 
109
- Install and configure Pi first, then install this extension:
110
-
111
40
  ```bash
112
41
  pi install npm:pi-cache-optimizer
113
42
  ```
114
43
 
115
- After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered and skills are compressed automatically, session-overview churn is stripped automatically, and the footer shows cache stats after supported model-family responses with exposed usage.
116
-
117
- ## Opt-out
118
-
119
- | Env var | Effect |
120
- |---------|--------|
121
- | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Skip all `before_agent_start` prompt mutations (churn strip, skill compression, stable-prefix reorder); footer stats and `prompt_cache_key` fallback remain active |
122
- | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index) |
123
- | `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the OpenAI-family `prompt_cache_key` fallback (default is enabled) |
124
- | `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-family `prompt_cache_key` fallback |
125
-
126
- ## Uninstall
127
-
128
- Remove the same package source you installed. For the npm package:
129
-
130
- ```bash
131
- pi remove npm:pi-cache-optimizer
132
- ```
133
-
134
- If you installed from a local path, remove that same path/source instead, for example:
44
+ If you previously installed the old package:
135
45
 
136
46
  ```bash
137
- pi remove /absolute/path/to/pi-deepseek-cache-optimizer
138
- # or, if that was the exact source you installed:
139
- pi remove ./relative/path/to/pi-deepseek-cache-optimizer
47
+ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
140
48
  ```
141
49
 
142
- If you installed into project settings with `pi install -l ...`, use the matching project-scope remove command, for example `pi remove -l npm:pi-cache-optimizer`.
50
+ Run `/reload` in Pi after install/update/remove so extension hooks refresh.
143
51
 
144
- After removing the package, run `/reload` in Pi or restart Pi so the extension is unloaded. The footer counters are persisted separately; if you also want to delete that local state, remove:
52
+ ## Commands
145
53
 
146
- ```bash
147
- rm ~/.pi/agent/pi-cache-optimizer-stats.json
148
- # Old name (kept once and migrated automatically; safe to delete if it still exists):
149
- rm -f ~/.pi/agent/deepseek-cache-optimizer-stats.json
150
- ```
54
+ | Command | Effect |
55
+ |---|---|
56
+ | `/cache-optimizer` | Interactive menu when UI supports it; otherwise prints help and current state. |
57
+ | `/cache-optimizer enable` | Enables runtime optimizations for the current Pi process, resets current-session stats, and starts a fresh “enabled” measurement. |
58
+ | `/cache-optimizer disable` | Disables optimization for the current Pi process, resets current-session stats, and keeps collecting footer stats in disabled comparison mode. Run `/reload` or restart Pi to return to startup behavior. |
59
+ | `/cache-optimizer doctor` | Shows active model/provider/API/base URL/compat plus low-hit diagnosis. |
60
+ | `/cache-optimizer compat` | Shows copyable compat advice for the active model, if applicable. |
61
+ | `/cache-optimizer stats` | Shows today's session-scoped counters and recent trend for the active model. |
62
+ | `/cache-optimizer reset` | Resets only local stats for the active session + model; upstream provider cache is not modified. |
151
63
 
64
+ `enable` / `disable` are current-process switches. For a persistent opt-out, use environment variables below.
152
65
 
66
+ ## Persistent opt-out
153
67
 
154
- ## Adding an OpenAI-compatible proxy channel
68
+ | Env var | Effect |
69
+ |---|---|
70
+ | `PI_CACHE_OPTIMIZER_NO_PROMPT_REWRITE=1` | Disable prompt mutations only; footer stats and cache-key fallback remain active. |
71
+ | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep Pi's verbose skill XML. |
72
+ | `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-compatible `prompt_cache_key` fallback. Preferred explicit opt-out. |
73
+ | `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the same fallback via the legacy inverse switch. Values `0`, `false`, `no`, or `off` disable it. |
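
The two cache-key switches in the table above can be read as a single boolean decision. The sketch below is one way to express that, not the extension's actual parser: the helper name `isCacheKeyFallbackEnabled`, the trimming, and the case-insensitive comparison are assumptions, while the variable names and the disabling values come from the table.

```typescript
// Environment passed in explicitly so the sketch does not depend on Node typings.
type Env = Record<string, string | undefined>;

// Values the table lists as disabling the legacy switch.
const DISABLING_VALUES = new Set(["0", "false", "no", "off"]);

function isCacheKeyFallbackEnabled(env: Env): boolean {
  // Preferred explicit opt-out.
  if (env.PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY === "1") return false;
  // Legacy inverse switch; trimming and lowercasing are assumptions.
  const legacy = env.PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY;
  if (legacy !== undefined && DISABLING_VALUES.has(legacy.trim().toLowerCase())) return false;
  return true; // the fallback is on by default
}
```

In a Node process this would typically be called with `process.env`.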
155
74
 
156
- When adding a third-party OpenAI-compatible proxy provider (e.g. `otokapi`, `cafecode`,
157
- OpenRouter, etc.) to `~/.pi/agent/models.json`, the `compat` flags for cache optimization
158
- are NOT required for the model to work — but they dramatically improve cache durability.
75
+ ## OpenAI-compatible proxy setup
159
76
 
160
- ### Minimal provider config template
77
+ Third-party `openai-completions` proxies (LiteLLM / OneAPI / NewAPI / OpenRouter-like channels) often route one session across multiple upstream backends. That splits provider-side prompt caches. Start with session affinity:
161
78
 
162
- ```jsonc
79
+ ```json
163
80
  {
164
81
  "providers": {
165
82
  "your-provider-id": {
166
- "api": "openai-completions", // or "openai-responses"
167
- "baseUrl": "https://your-proxy.example.com/v1",
168
- "apiKey": "your-api-key",
169
- "models": {
170
- "gpt-5.5": {
171
- "id": "gpt-5.5",
172
- "name": "GPT 5.5",
173
- "contextWindowTokens": 128000,
174
- "maxOutputTokens": 8192,
175
- "thinking": {
176
- // Use the thinking modes your proxy actually supports.
177
- // Pi maps --thinking <level> to tokens via thinkingLevelMap.
178
- // The template below keeps each level distinct — DO NOT
179
- // map everything to "xhigh". Your proxy may not support
180
- // all levels; remove unsupported ones or test each.
181
- "thinkingLevelMap": {
182
- "off": null,
183
- "minimal": "minimal",
184
- "low": "low",
185
- "medium": "medium",
186
- "high": "high",
187
- "xhigh": "xhigh"
188
- }
189
- },
190
- "compat": {
191
- "supportsLongCacheRetention": true,
192
- "sendSessionAffinityHeaders": true
193
- }
194
- }
195
- }
196
- }
197
- }
198
- }
199
- ```
200
-
201
- Key points:
202
-
203
- - `thinkingLevelMap` keeps distinct levels. If your proxy does not support a particular
204
- level (e.g. `minimal`), remove that entry or set to `null`. Do **not** collapse all
205
- levels to `"xhigh"` — that defeats user control over reasoning effort.
206
- - `compat` flags help Pi request longer cache retention and send session-affinity
207
- headers for proxy-side cache locality. Only enable them if your proxy supports them.
208
- - The extension detects model families by `id`/`name` strings, not by provider id,
209
- base URL, or API type. Use recognizable model ids (e.g. `gpt-5.5`, `kimi-k2.5`) for
210
- correct stats adapter selection.
211
-
212
- ## Footer cache stats
213
-
214
- The Pi footer displays stats for the **active model family** only, for example:
215
-
216
- ```text
217
- DS cache 3/5 · 0.77M/0.80M tok (96%)
218
- OpenAI cache 2/4 · 0.25M/0.70M tok (36%)
219
- Claude cache 1/3 · 0.10M/0.45M tok (22%) · write 0.20M tok
220
- Gemini cache 1/2 · 0.18M/0.50M tok (36%)
221
- ```
222
-
223
- Meaning:
224
-
225
- - `3/5`: 3 of 5 supported assistant responses for that provider family had cache-read tokens.
226
- - `0.77M/0.80M tok`: cumulative cache-read input tokens / cumulative prompt input tokens, shown in millions.
227
- - Percentage: `cacheRead / total prompt input`.
228
- - `write ... tok` appears for Claude when cache-write tokens are nonzero, because Anthropic cache writes have distinct cost/accounting semantics.
229
-
230
- Stats rules:
231
-
232
- - Counters are separate per provider family. DeepSeek, OpenAI, Claude, and Gemini stats are not combined into one global hit rate.
233
- - The footer shows only the active model family's label and counters; it clears/hides for unsupported or ambiguous models.
234
- - Counts only assistant responses where Pi/provider exposes usage. Missing usage means no counter update.
235
- - Adapter matching uses only active model id/name plus assistant message `model`/`name`; broad provider/API/compat metadata is ignored for selection.
236
- - Pi-normalized `usage.input`, `usage.cacheRead`, and `usage.cacheWrite` are preferred. Known raw provider fields are used only defensively when visible on the assistant message.
237
- - Total prompt input is `input + cacheRead + cacheWrite` for Pi-normalized usage. Provider raw normalizers use each provider's documented total/input fields when available.
238
- - Stats update only the footer/status. The extension does not create extra TUI widgets or diagnostic files.
239
- - Stats are persisted in a small local JSON state file at `~/.pi/agent/pi-cache-optimizer-stats.json`. Earlier 1.x releases used `~/.pi/agent/deepseek-cache-optimizer-stats.json`; on first run after upgrade the old file is read once, copied into the new path, and best-effort deleted. The file stores only counters and the local day; it does not store API keys, prompts, messages, headers, or model output.
240
- - Existing v1 state files from DeepSeek-only releases are migrated into the DeepSeek adapter counters automatically.
241
-
242
- Session scope:
243
-
244
- - Stats are now scoped per Pi session + provider/model, not global.
245
- - Each Pi process (session) starts with fresh counters. Different sessions using the
246
- same provider/model do not share footer statistics or reset effects.
247
- - Within the same Pi session, stats accumulate normally for each provider/model.
248
- - Pi restarts start fresh stats for the new session.
249
- - `/reload` does **not** clear accumulated session-scoped stats; it only clears transient
250
- in-memory data (recent samples, integrity notification state).
251
- - Crossing the local natural-day boundary resets counters on the next status update or
252
- supported-provider response.
253
- - Persisted stats are stored under an opaque session hash key (SHA-256 hash of session id)
254
- so that different sessions' data is isolated on disk. Raw session ids are never logged,
255
- displayed, or written to the stats file.
256
-
257
- > **Concurrent-write caveat**: Stats are persisted atomically (write-temp then rename),
258
- > but multiple Pi processes reading and writing simultaneously can still experience
259
- > a lost-update window (the classic read-modify-write race). The implementation
260
- > preserves sequential operation semantics (each write replaces only the current
261
- > session's data and re-appends other sessions from the previous read), but does
262
- > **not** guarantee concurrent-safety across processes. If you run multiple Pi
263
- > instances using the same `models.json` with different provider/model IDs, their
264
- > stats files may occasionally overwrite each other's session data. This affects
265
- > only the on-disk persistence; in-memory counters per process remain correct.
266
-
267
- ## Suggested compat config
268
-
269
- For direct DeepSeek or DeepSeek-like OpenAI-compatible proxies, configure the provider or model `compat` like this:
270
-
271
- The `compat` block goes inside your provider object in `~/.pi/agent/models.json`, at
272
- the same level as `baseUrl`, `api`, `apiKey`, and `models`:
273
-
274
- ```jsonc
275
- {
276
- "providers": {
277
- "deepseek": {
278
83
  "api": "openai-completions",
279
- "baseUrl": "https://api.deepseek.com/v1",
280
- "apiKey": "sk-...",
281
- "models": { /* ... */ },
282
- // 👇 compat goes here, NOT inside models
84
+ "baseUrl": "https://example.com/v1",
85
+ "apiKey": "env:YOUR_API_KEY",
283
86
  "compat": {
284
- "thinkingFormat": "deepseek",
285
- "supportsLongCacheRetention": true,
286
87
  "sendSessionAffinityHeaders": true
287
- }
88
+ },
89
+ "models": [
90
+ { "id": "gpt-5.5", "name": "GPT-5.5" }
91
+ ]
288
92
  }
289
93
  }
290
94
  }
291
95
  ```
292
96
 
293
- If your provider id is not `deepseek` (for example a company proxy or OpenRouter-style proxy), you can put the same fields on that provider or the specific DeepSeek model. The extension detects DeepSeek-like models only by checking whether the model id/name contains `deepseek`; it does not infer this from provider id, base URL, or `thinkingFormat`. The currently recommended verification path covers the official direct `deepseek/deepseek-v4-pro` model.
294
-
295
- The extension warns at most once per provider/model per session when a DeepSeek-like OpenAI-compatible model is missing:
296
-
297
- - `supportsLongCacheRetention: true`, so Pi may not send `prompt_cache_retention: "24h"`.
298
- - `sendSessionAffinityHeaders: true` for OpenAI Completions-compatible APIs, or `sendSessionIdHeader: true` for OpenAI Responses-compatible APIs, so Pi may not send session-affinity headers such as `session_id`, `x-client-request-id`, or `x-session-affinity`.
299
-
300
- For Claude/Anthropic models behind an OpenAI-compatible endpoint, the extension may warn when the model is clearly Claude-like but `cacheControlFormat: "anthropic"` is missing. Only enable that compat flag if your endpoint supports Anthropic-style cache-control markers.
301
-
302
- > Reminder: only enable session-affinity headers or cache-control compat when your endpoint or proxy supports them.
303
-
304
- ## Diagnostic command
305
-
306
- The extension registers a Pi command `/cache-optimizer` for interactive diagnosis.
307
-
308
- ```
309
- /cache-optimizer — interactive menu (or text help when no UI)
310
- /cache-optimizer doctor — show provider, model, API, base URL, compat status
311
- and low-hit cause diagnosis
312
- /cache-optimizer stats — show active model stats bucket and recent trend
313
- /cache-optimizer compat — show compat suggestion with edit instructions
314
- /cache-optimizer reset — reset local session stats for the current model
315
- (does not affect upstream provider prompt cache)
316
- ```
317
-
318
- When run without arguments, `/cache-optimizer` shows an interactive selection menu
319
- (Doctor / Stats / Compat / Reset / Cancel) when the Pi UI supports it (`ctx.ui.select`).
320
- In non-interactive terminals, it falls back to text help with current model compat
321
- status.
97
+ Notes:
322
98
 
323
- ### `/cache-optimizer reset`
99
+ - `sendSessionAffinityHeaders: true` is the safe default when your proxy supports sticky routing.
100
+ - `supportsLongCacheRetention: true` is optional. Add it only when the endpoint explicitly supports OpenAI long prompt cache retention.
101
+ - If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
102
+ - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
103
+ - This extension only advises; it does not edit `models.json`.
324
104
 
325
- Resets only the current Pi session's stats bucket for the active provider/model.
326
- Clears today's request counters (hit/total), cached token counts, and recent trend
327
- samples for that model. Other provider/model buckets within the same session are
328
- unaffected, and other sessions' data is preserved.
105
+ ## Footer stats
329
106
 
330
- ```text
331
- Provider: otokapi
332
- Model: gpt-5.5
107
+ Stats are read-only local counters stored at `~/.pi/agent/pi-cache-optimizer-stats.json` and scoped by Pi session + provider/model. They contain only dates and numeric counters — no API keys, prompts, payloads, headers, responses, or model output.
333
108
 
334
- Reset local session cache stats for "otokapi/gpt-5.5".
335
- Upstream provider prompt cache was not modified.
336
- New requests will start a fresh stats bucket for this Pi session.
337
- ```
338
-
339
- If no active model is selected, a warning is shown. If the active model does not
340
- match a cache adapter (footer stats are not shown for it), a friendly no-op message
341
- is displayed instead.
342
-
343
- ### `/cache-optimizer doctor`
344
-
345
- Displays the active model's provider, model id, name, API type, base URL, current
346
- `compat` flags, and any missing cache/session-affinity flags. If flags are missing,
347
- it also shows a copyable JSON snippet and the exact edit location.
348
-
349
- When all compat flags are present and applicable (third-party `openai-completions`
350
- proxy), the output shows `✅ Compat fully configured.` For models where the
351
- compat check does not apply (official OpenAI, non-`openai-completions` APIs,
352
- custom transports), it shows `ℹ️ Compat check not applicable for this model.`:
109
+ Example footer:
353
110
 
354
111
  ```text
355
- Provider: otokapi
356
- Model: gpt-5.5
357
- API: openai-completions
358
- Base URL: https://otokapi.example.com/v1
359
- Compat: {}
360
- ⚠️ Missing compat flags: supportsLongCacheRetention, sendSessionAffinityHeaders
361
- Edit ~/.pi/agent/models.json -> providers["otokapi"] -> compat (same level as baseUrl/api/apiKey/models):
362
- {
363
- "supportsLongCacheRetention": true,
364
- "sendSessionAffinityHeaders": true
365
- }
112
+ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
366
113
  ```
367
114
 
368
- ### `/cache-optimizer compat`
115
+ Format: `<label> <hit requests>/<total requests> · <cached input tokens>/<total input tokens> tok (<token hit rate>)`. Some adapters may also append `· write <tokens> tok`, and runtime diagnostics may append `⚠️ compat` or `⚠️ integrity`.
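
The format can be sketched as a small formatter. This is a minimal illustration, not the package's actual code: the `FooterCounters` shape, the `toMillions` helper, and rendering write tokens in millions are all assumptions made for the example.

```typescript
// Hypothetical counter shape; the real stats file layout is not documented here.
interface FooterCounters {
  label: string;             // adapter label, e.g. "OpenAI"
  hitRequests: number;       // requests that reported cached tokens
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
  writeTokens?: number;      // only some adapters report cache writes
  flags?: string[];          // runtime diagnostics, e.g. ["compat"]
}

// Render a token count in millions, trimming trailing zeros (2000 -> "0.002M").
function toMillions(tokens: number): string {
  return `${parseFloat((tokens / 1_000_000).toFixed(4))}M`;
}

function formatFooter(c: FooterCounters): string {
  // Token hit rate is cached input tokens over total input tokens.
  const rate = c.totalInputTokens > 0
    ? Math.round((c.cachedInputTokens / c.totalInputTokens) * 100)
    : 0;
  let line =
    `${c.label} cache ${c.hitRequests}/${c.totalRequests} · ` +
    `${toMillions(c.cachedInputTokens)}/${toMillions(c.totalInputTokens)} tok (${rate}%)`;
  if (c.writeTokens !== undefined) {
    line += ` · write ${toMillions(c.writeTokens)} tok`;
  }
  for (const flag of c.flags ?? []) {
    line += ` ⚠️ ${flag}`;
  }
  return line;
}
```

With 3 hit requests out of 10, 2,000 cached tokens out of 5,000, and a `compat` flag, this sketch reproduces the example footer shown above.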
369
116
 
370
- Shows the compat suggestion for the active model, including file path,
371
- provider path, and copyable JSON snippet. When no compat flags are missing,
372
- it shows `✅ Compat fully configured.` if the model is an applicable
373
- third-party proxy, or `ℹ️ Compat check not applicable for this model.`
374
- otherwise.
117
+ Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, MiniMax, Hunyuan, Mistral, Grok, Llama, Nemotron, Cohere, Yi, Doubao, ERNIE, Baichuan, StepFun, Spark, InternLM, Gemma, Phi, Jamba, Solar, Sonar, Nova, Reka, Falcon, DBRX, MPT, StableLM, Aquila, EXAONE, HyperCLOVA, Luminous, Hermes, Granite, Arctic, Pangu, SenseNova, Zhinao, MiniCPM, XVERSE, Orion, OpenChat, Vicuna, Wizard, Zephyr, Dolphin, OpenOrca, Starling, BLOOM, RWKV, and Aya.
375
118
 
376
- ### `/cache-optimizer stats`
119
+ Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
377
120
 
378
- Displays the active model's stats bucket (`provider/modelId`), today's request
379
- count (hit/total), cached input tokens vs total input tokens, and the hit rate
380
- percentage. Also shows recent trend summaries (last 10 and last 30 samples):
381
-
382
- ```text
383
- Model key: otokapi/gpt-5.5
384
- Adapter: OpenAI cache
385
-
386
- ── Today ──
387
- Requests: 3 hit / 10 total · 30%
388
- Cached tokens: 0.0015M / 0.005M input · 30%
389
-
390
- ── Recent trend ──
391
- Recent 10/10: 3/10 hits · 30% tok cached
392
- Recent 10/10: 3/10 hits · 30% tok cached
393
- ```
394
-
395
- If the active model has no adapter match, a friendly message is shown. If
396
- no samples have been recorded yet in this session, trend shows "no samples".
397
-
398
- ### Low-hit cause diagnosis
399
-
400
- The `/cache-optimizer doctor` output includes a "Cache diagnosis" section
401
- with prioritized low-hit cause analysis:
402
-
403
- 1. **Missing compat flags** — flags that enable prompt caching and session-affinity
404
- routing are absent.
405
- 2. **Router/channel risk** — multi-backend routing may split the cache across
406
- different upstream instances.
407
- 3. **Missing usage fields** — the proxy may not return prompt-level usage
408
- fields, causing the footer to under-report hits.
409
- 4. **Recent low trend** — when today's cache hit rate is below 30%,
410
- the diagnosis suggests proxy route instability or prompt prefix churn.
411
-
412
- For fully configured models that still have low cache hit rates, the diagnosis
413
- emphasizes sticky routing and upstream cache usage verification rather than
414
- pointing to compat flags.
415
-
416
- ### Router/channel diagnostics
417
-
418
- For models using OpenAI-compatible APIs (`openai-completions` or
419
- `openai-responses`) through a non-official base URL, the extension detects
420
- common router/channel proxy patterns from `provider`, `baseUrl`, and `compat`
421
- metadata. When a pattern matches (OpenRouter,
422
- Vercel AI Gateway, LiteLLM, OneAPI/NewAPI/VoAPI, or a generic third-party
423
- OpenAI-compatible proxy), both `doctor` and `compat` subcommands append
424
- router/channel diagnostics with targeted recommendations.
425
-
433
- | Profile | Detection | Recommendation |
434
- |---------|-----------|----------------|
435
- | **OpenRouter** | baseUrl or provider contains `openrouter`/`openrouter.ai` | Fix the upstream provider with `openRouterRouting.only` or `.order` in compat |
436
- | **Vercel AI Gateway** | baseUrl contains `ai-gateway.vercel.sh` or provider contains `vercel` | Fix the upstream with `vercelGatewayRouting.only` or `.order` in compat |
437
- | **LiteLLM / OneAPI / NewAPI / VoAPI** | baseUrl or provider contains `litellm`, `oneapi`/`one-api`, `newapi`/`new-api`, `voapi`/`vo-api` | Ensure sticky session routing, forward `prompt_cache_key` + session-affinity headers, return cache usage fields |
438
- | **Generic third-party proxy** | Any `openai-completions` model with non-official base URL not matching above | General guidance: verify single-upstream routing, forward `prompt_cache_key` + session-affinity headers, return cache usage |
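
The detection rules in the table could look roughly like the following. This is a sketch under stated assumptions: the `ModelMeta` shape, the profile names, and the function name are hypothetical, and the substring checks simply mirror the table rather than the extension's real implementation.

```typescript
// Hypothetical profile names for illustration only.
type RouterProfile =
  | "openrouter"
  | "vercel-ai-gateway"
  | "channel-proxy"   // LiteLLM / OneAPI / NewAPI / VoAPI
  | "generic-proxy"
  | null;             // diagnostics do not apply

// Assumed subset of the metadata Pi exposes for the active model.
interface ModelMeta {
  provider: string;
  api: string;
  baseUrl: string;
}

function detectRouterProfile(m: ModelMeta): RouterProfile {
  // Custom transports (kiro-api, anthropic-messages, ...) are excluded.
  if (m.api !== "openai-completions" && m.api !== "openai-responses") return null;

  const provider = m.provider.toLowerCase();
  const baseUrl = m.baseUrl.toLowerCase();

  // Official OpenAI is excluded from router/channel diagnostics.
  if (baseUrl.includes("api.openai.com")) return null;

  const either = (needle: string): boolean =>
    provider.includes(needle) || baseUrl.includes(needle);

  if (either("openrouter")) return "openrouter";
  if (baseUrl.includes("ai-gateway.vercel.sh") || provider.includes("vercel")) {
    return "vercel-ai-gateway";
  }
  const channelNames = ["litellm", "oneapi", "one-api", "newapi", "new-api", "voapi", "vo-api"];
  if (channelNames.some((name) => either(name))) return "channel-proxy";

  // Per the table, the generic fallback covers openai-completions models only.
  return m.api === "openai-completions" ? "generic-proxy" : null;
}
```

Only `provider`, `api`, and `baseUrl` are read, which matches the advisory, metadata-only scope of these diagnostics.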
439
-
440
- These diagnostics are **advisory only**. They do not participate in adapter
441
- selection (still id/name-only), prompt_cache_key injection, footer stats, or
442
- any automated configuration changes. Detection uses only metadata exposed by
443
- Pi (`provider`, `api`, `baseUrl`, `compat`) — no API keys, prompts, payloads,
444
- headers, or model outputs are read or exposed.
445
-
446
- Official OpenAI (`api.openai.com`) and custom transports (`kiro-api`,
447
- `anthropic-messages`, `bedrock-converse-stream`) are excluded from router/
448
- channel diagnostics.
449
-
450
- ### Security
451
-
452
- The command reads only metadata exposed by Pi through `ctx.model`:
453
- provider, id, name, api, baseUrl, compat. It does NOT read or expose:
454
- - API keys or environment secrets
455
- - Request/response payloads
456
- - Prompts or model outputs
457
- - HTTP headers
458
- - Raw `~/.pi/agent/models.json` content
459
-
460
- ## How it works
461
-
462
- Provider caches are usually based on exact or near-exact prefix matching. Pi's system prompt contains stable content that is likely shared across sessions (tools, skills, guidelines) and dynamic content that changes frequently (git status, task context).
463
-
464
- ```text
465
- Before: [dynamic git status | task context | stable tools + rules]
466
- ↓ changing prefix → lower cache reuse
121
+ ## Uninstall
467
122
 
468
- After: [stable tools + rules | dynamic git status | task context]
469
- stable prefix → higher chance of cache reuse
123
+ ```bash
124
+ pi remove npm:pi-cache-optimizer
470
125
  ```
471
126
 
472
- Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for all models using OpenAI-compatible Pi APIs (`openai-completions` / `openai-responses`), it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This covers GPT-named models, Kimi/Moonshot, Qwen/Alibaba, GLM/Zhipu, MiniMax, Hunyuan, and any other provider using an OpenAI-shaped API — only custom transports like `kiro-api` are excluded. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
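
The `prompt_cache_key` fallback described above can be sketched as a pure function. This is an illustration under assumptions, not the extension's code: the function name and the plain-object request body are hypothetical, and the environment-variable opt-outs are omitted.

```typescript
// APIs the fallback is gated to, per the paragraph above.
const OPENAI_COMPATIBLE_APIS = new Set(["openai-completions", "openai-responses"]);

// Returns a body with `prompt_cache_key` filled from the session id when it is
// missing or blank; an existing non-empty key is never overwritten.
function withPromptCacheKey(
  api: string,
  body: Record<string, unknown>,
  sessionId: string,
): Record<string, unknown> {
  // Custom transports such as kiro-api are left untouched.
  if (!OPENAI_COMPATIBLE_APIS.has(api)) return body;

  const existing = body["prompt_cache_key"];
  const isBlank =
    existing === undefined ||
    existing === null ||
    (typeof existing === "string" && existing.trim() === "");
  if (!isBlank) return body;

  // Copy rather than mutate so the caller's object stays unchanged.
  return { ...body, prompt_cache_key: sessionId };
}
```

Keying on the session id keeps the value stable across turns of one Pi session, which is what a provider needs to route similar requests to the same cache.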
127
+ Then run `/reload` or restart Pi. Optional local stats cleanup:
473
128
 
474
- ## Improving cache hit rate
475
-
476
- The cache-hit optimization is intentionally conservative and provider-neutral in code: keep the largest stable prompt prefix first, let Pi/provider compat send supported cache controls, and avoid leaking unsupported request fields to proxies.
477
-
478
- What the extension does automatically:
479
-
480
- - Moves stable prompt material before dynamic task/git/session context. Besides tools, skills, custom prompts, appended prompts, and guideline bullets, this also keeps known-stable project/spec instruction files such as `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `CURSOR.md`, and `.trellis/spec/...` in the early cacheable prefix. Arbitrary large context files are not lifted by size alone, because they may be task/session-specific.
481
- - Sets `PI_CACHE_RETENTION=long` so Pi can request longer retention where the selected model/provider compat supports it.
482
- - Keeps footer counters provider-family-specific so you can verify whether the active model family is actually reporting cache reads.
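
The first bullet amounts to a stable partition of prompt sections. A minimal sketch follows, assuming each section has already been classified as stable or dynamic; the `PromptSection` type and the function name are hypothetical, and the real extension's classification logic is not shown here.

```typescript
// Hypothetical section model: `stable` marks content that rarely changes
// between requests (tools, skills, AGENTS.md), as opposed to git status or task context.
interface PromptSection {
  name: string;
  stable: boolean;
  text: string;
}

// Move stable sections to the front while preserving relative order within
// each group, so the cacheable prefix stays byte-identical across requests.
function stablePrefixFirst(sections: PromptSection[]): PromptSection[] {
  const stable = sections.filter((s) => s.stable);
  const dynamic = sections.filter((s) => !s.stable);
  return [...stable, ...dynamic];
}

// Join the reordered sections into the final system prompt.
function buildPrompt(sections: PromptSection[]): string {
  return stablePrefixFirst(sections).map((s) => s.text).join("\n\n");
}
```

Preserving relative order inside each group matters: reshuffling stable sections between requests would change the prefix and defeat the reordering.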
483
-
484
- Provider notes:
485
-
486
- - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
487
- - OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. The fallback now applies to ALL models using `openai-completions` / `openai-responses` (not just GPT-named ones), so Kimi, Qwen, GLM, MiniMax, Hunyuan, and other OpenAI-compatible models also benefit. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
488
- - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
489
- - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
490
- - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
491
-
492
- ## Provider-specific limitations
493
-
494
- This package now has provider-family stats adapters, but it still avoids blind generalization:
495
-
496
- - DeepSeek cache is automatic and prefix/KV-cache based. Hits are best-effort and proxies can hide DeepSeek usage fields.
497
- - OpenAI-family prompt caching is automatic only where the actual upstream supports it and prompts are long enough. The adapter is model-name based and intentionally conservative; it does not use provider/API/base URL metadata to infer official OpenAI support.
498
- - Claude prompt caching depends on explicit Anthropic cache-control breakpoints. This release only reports stats exposed by Pi/provider; it does not insert breakpoints or mutate request bodies.
499
- - Gemini/Vertex may expose implicit cached-content token counts. This release does not create, store, update, or delete explicit Gemini cached-content resources.
500
- - Proxies/aggregators can route the same model name to different upstream providers. Because detection is id/name-only, use unambiguous model names, upstream routing constraints, and exposed usage verification before trusting cache behavior.
501
-
502
- ## Out of scope for this release
129
+ | Platform | Delete local stats files |
130
+ |---|---|
131
+ | Linux / macOS / WSL | `rm -f ~/.pi/agent/pi-cache-optimizer-stats.json ~/.pi/agent/deepseek-cache-optimizer-stats.json` |
132
+ | Windows PowerShell | `Remove-Item -Force "$env:USERPROFILE\.pi\agent\pi-cache-optimizer-stats.json", "$env:USERPROFILE\.pi\agent\deepseek-cache-optimizer-stats.json" -ErrorAction SilentlyContinue` |
133
+ | Windows Command Prompt | `del /f /q "%USERPROFILE%\.pi\agent\pi-cache-optimizer-stats.json" "%USERPROFILE%\.pi\agent\deepseek-cache-optimizer-stats.json" 2>nul` |
503
134
 
504
- - Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
505
- - Injecting Anthropic `cache_control` markers.
506
- - Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to `openai-completions` / `openai-responses` only (custom transports like `kiro-api` are excluded, but the model name no longer needs to be GPT-family).
507
- - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
508
- - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
509
- - Claiming stats for providers that do not expose reliable cache usage.
135
+ Do not delete `models.json` during cleanup; it contains your Pi model/provider configuration and is not owned by this package.
510
136
 
511
137
  ## Verify effect
512
138
 
513
- ### In Pi
514
-
515
- - Watch the footer label for the active family, such as `DS cache ...`, `OpenAI cache ...`, `Claude cache ...`, or `Gemini cache ...`.
516
- - Use Pi's built-in `/stats` to confirm `cacheRead` tokens grow when Pi normalizes provider usage.
517
- - For DeepSeek, Pi normalizes `usage.input` as uncached/miss prompt tokens and `usage.cacheRead` as `prompt_cache_hit_tokens`, so the footer denominator is reconstructed as `input + cacheRead + cacheWrite` (matching DeepSeek `prompt_tokens` when the provider reports normal usage).
518
- - Footer hit count is request-level: one assistant response increments total requests, and it is a hit when `cacheRead > 0`. DeepSeek dashboards may use different time windows or account-wide/provider-side aggregation, so align the reset/window before comparing.
519
- - For provider raw APIs, compare with documented usage fields such as DeepSeek `prompt_cache_hit_tokens`, OpenAI `cached_tokens`, Anthropic `cache_read_input_tokens`, or Gemini/Vertex cached-content token counts.
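
The accounting rules in the bullets above can be sketched as follows. The `Usage` and `Counters` shapes and the function name are assumptions for illustration; only the arithmetic (denominator `input + cacheRead + cacheWrite`, hit when `cacheRead > 0`) comes from the text.

```typescript
// Normalized usage for one assistant response, using the field names from the text.
interface Usage {
  input: number;      // uncached (miss) prompt tokens
  cacheRead: number;  // prompt tokens served from cache
  cacheWrite: number; // prompt tokens written to cache
}

// Hypothetical running counters behind the footer.
interface Counters {
  hitRequests: number;
  totalRequests: number;
  cachedInputTokens: number;
  totalInputTokens: number;
}

// Fold one response into the counters without mutating the previous state.
function recordResponse(c: Counters, u: Usage): Counters {
  return {
    totalRequests: c.totalRequests + 1,
    // Request-level hit: any cached read counts once.
    hitRequests: c.hitRequests + (u.cacheRead > 0 ? 1 : 0),
    cachedInputTokens: c.cachedInputTokens + u.cacheRead,
    // Denominator is reconstructed as input + cacheRead + cacheWrite.
    totalInputTokens: c.totalInputTokens + u.input + u.cacheRead + u.cacheWrite,
  };
}
```

Because hits are counted per request while the rate is computed per token, the two footer figures can diverge: one large cached response can raise the token rate far more than the hit count.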
520
-
521
- ### Official DeepSeek baseline (recommended)
522
-
523
- Use official direct `deepseek/deepseek-v4-pro` as the baseline. Avoid mixing proxy paths in the same verification run. Do not paste API keys into chats or issues.
524
-
525
- 1. Configure the official key with either:
526
-
527
- ```bash
528
- export DEEPSEEK_API_KEY='...'
529
- ```
530
-
531
- or Pi's login/config mechanism.
532
-
533
- 2. Confirm the model is visible:
534
-
535
- ```bash
536
- pi --list-models deepseek-v4-pro
537
- ```
538
-
539
- 3. Run a minimal request:
540
-
541
- ```bash
542
- pi --model deepseek/deepseek-v4-pro --thinking high
543
- ```
544
-
545
- In Pi, send the same or highly similar short prompt several times, for example:
546
-
547
- ```text
548
- Answer in one sentence: cache baseline ping
549
- ```
550
-
551
- 4. Repeat the same or highly similar request at least three times, then compare footer `DS cache ...` and `/stats` for increasing `cacheRead` / hit rate.
552
-
553
- DeepSeek cache prefixes are server-side and may be grouped by prefix/cache unit. The first repeated request can still be building a shared prefix cache; the third and later matching requests are usually more meaningful. Official docs describe cache cleanup as a best-effort process that may take hours to days, but this is not a hit guarantee. A short-term miss can also be caused by prefix granularity, routing, request differences, or cache not being built yet.
554
-
555
- > Note: the baseline consumes a small number of tokens. Use short prompts and do not paste large files.
139
+ 1. Select a model whose provider exposes cache usage.
140
+ 2. Send several similar turns in the same Pi session.
141
+ 3. Watch the footer or run `/cache-optimizer stats`.
142
+ 4. For third-party proxies, also run `/cache-optimizer doctor` and confirm sticky routing / session affinity on the proxy side.
556
143
 
557
144
  ## License
558
145
 
559
- Released under the [MIT License](./LICENSE).
146
+ MIT