npm - pi-cache-optimizer - Versions diffs - 2.0.0 - Mend

pi-cache-optimizer 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 freescheme
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,258 @@
+# Pi Cache Optimizer
+[中文说明](./README.zh-CN.md)
+> **Renamed from `pi-deepseek-cache-optimizer`.** If you previously installed the old name, migrate with:
+>
+> ```bash
+> pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
+> ```
+>
+> Your persisted footer counters and any existing `~/.pi/agent/models.json` are preserved automatically.
+A plug-and-play Pi extension that improves provider-side KV Cache / Prompt Cache hit rates, with conservative provider-specific footer stats. Despite the original DeepSeek-only name, this package has supported DeepSeek, OpenAI, Claude, and Gemini stats adapters since 1.x — the new name reflects that scope.
+> Important: prompt/KV caching is provider-side and best-effort. This extension can improve the odds of cache hits by stabilizing prompt prefixes, requesting long retention through Pi when supported, warning about obvious compat gaps, and showing lightweight footer stats for providers that expose reliable cache usage. It cannot guarantee cache hits. Third-party proxies may hide, drop, reroute, or reinterpret cache behavior.
+## What it does
+| Feature | How | Manual action required |
+|---|---|:---:|
+| 🔄 Reorders the system prompt | `before_agent_start` hook: stable prefix first, dynamic context later | ❌ Automatic |
+| ⏳ Requests long cache retention | Sets `PI_CACHE_RETENTION=long` when the extension loads; Pi/provider compat decides what is sent | ❌ Automatic |
+| 🔗 Conservative compat reminders | DeepSeek session-affinity reminders, plus obvious Claude cache-control guidance for compatible endpoints | ⚠️ See below |
+| 📊 Provider-specific footer stats | Shows read-only cache stats for supported provider families in Pi footer/status | ❌ Automatic |
+## Supported stats adapters
+This release keeps the original DeepSeek behavior and adds read-only stats adapters for model families whose cache usage Pi or the provider exposes reliably. Adapter selection is intentionally limited to the model id/name (and assistant message `model`/`name` on `message_end`); provider id, API type, base URL, `thinkingFormat`, and compat flags never select a stats adapter.
+| Adapter | Detection | Footer label | Usage fields |
+|---|---|---|---|
+| DeepSeek | Model id/name contains `deepseek` | `DS cache` | Pi `usage.cacheRead`/`usage.input`, or raw `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, `prompt_tokens` when visible |
+| OpenAI-family | Model id/name contains conservative OpenAI-family tokens such as `gpt-`, `chatgpt`, `o1`, `o3`, `o4`, or `o5` | `OpenAI cache` | Pi-normalized usage, or raw `prompt_tokens_details.cached_tokens` / `input_tokens_details.cached_tokens` with prompt/input totals |
+| Anthropic / Claude | Model id/name contains `anthropic` or `claude` | `Claude cache` | Pi-normalized usage, or raw `cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` |
+| Gemini / Vertex | Model id/name contains `gemini` or `vertex` | `Gemini cache` | Pi-normalized usage, or raw Gemini/Vertex cached-content token metadata when visible |
+Generic OpenAI-compatible proxies are **not** treated as OpenAI-family just because they use an OpenAI-shaped API or provider id. If the active model id/name is ambiguous, the extension hides the footer stats instead of guessing.
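The id/name-only matching in the table above can be sketched in TypeScript as follows. This is an illustrative sketch, not the package's actual source: `ADAPTER_RULES`, `selectAdapter`, and the o-series regular expression are assumed names and rules, and treating a model that matches more than one family as ambiguous is likewise an assumption.

```typescript
// Illustrative sketch only; names and exact matching rules are assumptions.
type AdapterId = "deepseek" | "openai" | "claude" | "gemini";

interface AdapterRule {
  id: AdapterId;
  label: string; // footer label from the table above
  matches: (modelText: string) => boolean;
}

// Assumed conservative pattern: `gpt-`, `chatgpt`, or a standalone o1/o3/o4/o5
// token, so that ids such as "proto1" do not count as OpenAI-family.
const OPENAI_FAMILY = /(^|[^a-z0-9])(gpt-|chatgpt|o[1345](?![a-z0-9]))/;

const ADAPTER_RULES: AdapterRule[] = [
  { id: "deepseek", label: "DS cache", matches: (t) => t.includes("deepseek") },
  { id: "openai", label: "OpenAI cache", matches: (t) => OPENAI_FAMILY.test(t) },
  {
    id: "claude",
    label: "Claude cache",
    matches: (t) => t.includes("anthropic") || t.includes("claude"),
  },
  {
    id: "gemini",
    label: "Gemini cache",
    matches: (t) => t.includes("gemini") || t.includes("vertex"),
  },
];

// Only the model id/name is inspected; provider id, base URL, and compat
// flags are deliberately not parameters of this function.
function selectAdapter(modelId?: string, modelName?: string): AdapterRule | null {
  const text = `${modelId ?? ""} ${modelName ?? ""}`.toLowerCase();
  const hits = ADAPTER_RULES.filter((rule) => rule.matches(text));
  // Zero matches (unsupported) or several matches (ambiguous) hide the footer.
  return hits.length === 1 ? (hits[0] ?? null) : null;
}
```

Under these assumptions, `selectAdapter("deepseek-v4-pro")` yields the `DS cache` rule, while a generic id such as `my-proxy-model` yields `null`, so the footer stays hidden.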
+## Quickstart
+1. (Optional but recommended) Read the official Pi + DeepSeek onboarding guide: [`pi_mono.zh-CN.md`](https://github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/pi_mono.zh-CN.md). It covers Pi installation and core configuration.
+2. Install this extension:
+   ```bash
+   pi install npm:pi-cache-optimizer
+   ```
+3. On first activation, if no DeepSeek-like model is already configured, this extension auto-seeds a recommended `deepseek` provider block into `~/.pi/agent/models.json`. The seed goes beyond the official onboarding doc by adding `supportsLongCacheRetention: true` and `sendSessionAffinityHeaders: true`. The official doc omits these cache-related compat flags, and their absence is what this extension's compat warnings check for. Before any change, a timestamped backup is written to `~/.pi/agent/models.json.bak.<unix-millis>`. Existing user entries are never modified.
+4. Export your DeepSeek API key in the same shell where you run `pi`:
+   ```bash
+   export DEEPSEEK_API_KEY='...'
+   ```
+   The seed references `$DEEPSEEK_API_KEY` symbolically; this extension never reads, stores, or prints the key value.
+5. Opt out of auto-seeding by exporting `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` before launching Pi. With opt-out, no write or backup happens, and no provider entry is added or modified.
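The seeding rules in steps 3 to 5 can be sketched as a pure planning function. This is a minimal sketch under assumptions: the `ModelsConfig` shape, the `apiKey` and `models` field names, and the function names are hypothetical; only the compat flags, the opt-out variable, the symbolic `$DEEPSEEK_API_KEY` reference, and the backup suffix come from the text above.

```typescript
// Illustrative sketch only; the config shape and field names other than the
// compat flags are assumptions, not the package's real schema.
interface ModelEntry { id: string; name?: string }
interface ProviderEntry {
  apiKey?: string;
  compat?: Record<string, unknown>;
  models?: ModelEntry[];
}
interface ModelsConfig { providers?: Record<string, ProviderEntry> }
interface SeedPlan { backupPath: string; nextConfig: ModelsConfig }

function hasDeepSeekLikeModel(config: ModelsConfig): boolean {
  return Object.values(config.providers ?? {}).some((provider) =>
    (provider.models ?? []).some((m) =>
      `${m.id} ${m.name ?? ""}`.toLowerCase().includes("deepseek"),
    ),
  );
}

// Returns null when nothing should be written (and therefore no backup either).
function planDeepSeekSeed(
  config: ModelsConfig,
  env: Record<string, string | undefined>,
  configPath: string,
  nowMillis: number,
): SeedPlan | null {
  if (env["PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG"] === "1") return null; // opt-out
  if (hasDeepSeekLikeModel(config)) return null; // already configured
  const providers = config.providers ?? {};
  if ("deepseek" in providers) return null; // never modify an existing entry
  const seed: ProviderEntry = {
    apiKey: "$DEEPSEEK_API_KEY", // symbolic reference, never the key value
    compat: {
      thinkingFormat: "deepseek",
      supportsLongCacheRetention: true,
      sendSessionAffinityHeaders: true,
    },
    models: [{ id: "deepseek-v4-pro" }],
  };
  return {
    backupPath: `${configPath}.bak.${nowMillis}`, // models.json.bak.<unix-millis>
    nextConfig: { ...config, providers: { ...providers, deepseek: seed } },
  };
}
```

Separating the decision from the file write keeps the "no write, no backup" opt-out easy to verify: a caller would copy the file to `backupPath` and write `nextConfig` only when the plan is non-null.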
+## Install
+```bash
+pi install npm:pi-cache-optimizer
+```
+After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
+## Uninstall
+Remove the same package source you installed. For the npm package:
+```bash
+pi remove npm:pi-cache-optimizer
+```
+If you installed from a local path, remove that same path/source instead, for example:
+```bash
+pi remove /absolute/path/to/pi-deepseek-cache-optimizer
+# or, if that was the exact source you installed:
+pi remove ./relative/path/to/pi-deepseek-cache-optimizer
+```
+If you installed into project settings with `pi install -l ...`, use the matching project-scope remove command, for example `pi remove -l npm:pi-cache-optimizer`.
+After removing the package, run `/reload` in Pi or restart Pi so the extension is unloaded. The footer counters are persisted separately; if you also want to delete that local state, remove:
+```bash
+rm ~/.pi/agent/pi-cache-optimizer-stats.json
+# Old name (migrated automatically on first run after upgrade; safe to delete if it still exists):
+rm -f ~/.pi/agent/deepseek-cache-optimizer-stats.json
+```
+The DeepSeek block this extension seeded into `~/.pi/agent/models.json` is left in place on uninstall. Remove it manually if you no longer want it; the timestamped backup at `~/.pi/agent/models.json.bak.<unix-millis>` lets you compare against the previous content.
+## Footer cache stats
+The Pi footer displays stats for the **active model family** only, for example:
+```text
+DS cache 3/5 · 0.77M/0.80M tok (96%)
+OpenAI cache 2/4 · 0.25M/0.70M tok (36%)
+Claude cache 1/3 · 0.10M/0.45M tok (22%) · write 0.20M tok
+Gemini cache 1/2 · 0.18M/0.50M tok (36%)
+```
+Meaning:
+- `3/5`: 3 of 5 supported assistant responses for that provider family had cache-read tokens.
+- `0.77M/0.80M tok`: cumulative cache-read input tokens / cumulative prompt input tokens, shown in millions.
+- Percentage: `cacheRead / total prompt input`.
+- `write ... tok` appears for Claude when cache-write tokens are nonzero, because Anthropic cache writes have distinct cost/accounting semantics.
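The footer format described above can be reproduced with a few lines of TypeScript. This is a sketch for illustration: `FamilyCounters`, `toMillions`, `formatFooterLine`, and the `showWrite` switch are hypothetical names, and rounding the percentage to the nearest integer is an assumption that happens to agree with the four sample lines.

```typescript
// Illustrative sketch only; counter names and rounding are assumptions.
interface FamilyCounters {
  hitResponses: number;      // responses with cacheRead > 0
  totalResponses: number;    // responses with exposed usage
  cacheReadTokens: number;   // cumulative cache-read input tokens
  promptInputTokens: number; // cumulative total prompt input tokens
  cacheWriteTokens: number;  // cumulative cache-write tokens
}

function toMillions(tokens: number): string {
  return `${(tokens / 1e6).toFixed(2)}M`;
}

function formatFooterLine(label: string, c: FamilyCounters, showWrite = false): string {
  const percent =
    c.promptInputTokens > 0
      ? Math.round((c.cacheReadTokens / c.promptInputTokens) * 100)
      : 0;
  let line =
    `${label} ${c.hitResponses}/${c.totalResponses} · ` +
    `${toMillions(c.cacheReadTokens)}/${toMillions(c.promptInputTokens)} tok (${percent}%)`;
  // The write segment is shown only when requested (Claude) and nonzero.
  if (showWrite && c.cacheWriteTokens > 0) {
    line += ` · write ${toMillions(c.cacheWriteTokens)} tok`;
  }
  return line;
}
```

For example, 770,000 cache-read tokens out of 800,000 prompt tokens over 5 responses with 3 hits renders as `DS cache 3/5 · 0.77M/0.80M tok (96%)`.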
+Stats rules:
+- Counters are separate per provider family. DeepSeek, OpenAI, Claude, and Gemini stats are not combined into one global hit rate.
+- The footer shows only the active model family's label and counters; it clears/hides for unsupported or ambiguous models.
+- Counts only assistant responses where Pi/provider exposes usage. Missing usage means no counter update.
+- Adapter matching uses only active model id/name plus assistant message `model`/`name`; broad provider/API/compat metadata is ignored for selection.
+- Pi-normalized `usage.input`, `usage.cacheRead`, and `usage.cacheWrite` are preferred. Known raw provider fields are used only defensively when visible on the assistant message.
+- Total prompt input is `input + cacheRead + cacheWrite` for Pi-normalized usage. Provider raw normalizers use each provider's documented total/input fields when available.
+- Stats update only the footer/status. The extension does not create extra TUI widgets or diagnostic files.
+- Stats are persisted in a small local JSON state file at `~/.pi/agent/pi-cache-optimizer-stats.json`. Earlier 1.x releases used `~/.pi/agent/deepseek-cache-optimizer-stats.json`; on first run after upgrade the old file is read once, copied into the new path, and best-effort deleted. The file stores only counters and the local day; it does not store API keys, prompts, messages, headers, or model output.
+- Existing v1 state files from DeepSeek-only releases are migrated into the DeepSeek adapter counters automatically.
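The usage rules above (prefer Pi-normalized fields, fall back to raw provider fields, skip responses without usage) might look like this for the DeepSeek adapter. The interfaces and function names are hypothetical; only the field names `input`, `cacheRead`, `cacheWrite`, `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, and `prompt_tokens` come from the README.

```typescript
// Illustrative sketch only; types and function names are assumptions.
interface VisibleUsage {
  // Pi-normalized fields
  input?: number;
  cacheRead?: number;
  cacheWrite?: number;
  // Raw DeepSeek fields, used only as a defensive fallback
  prompt_cache_hit_tokens?: number;
  prompt_cache_miss_tokens?: number;
  prompt_tokens?: number;
}

interface PromptUsage { cacheRead: number; cacheWrite: number; promptInput: number }
interface HitCounters { hits: number; total: number; cacheRead: number; promptInput: number }

function normalizeDeepSeekUsage(usage?: VisibleUsage): PromptUsage | null {
  if (!usage) return null;
  if (typeof usage.input === "number" || typeof usage.cacheRead === "number") {
    const input = usage.input ?? 0;
    const cacheRead = usage.cacheRead ?? 0;
    const cacheWrite = usage.cacheWrite ?? 0;
    // Total prompt input for Pi-normalized usage: input + cacheRead + cacheWrite.
    return { cacheRead, cacheWrite, promptInput: input + cacheRead + cacheWrite };
  }
  if (typeof usage.prompt_cache_hit_tokens === "number") {
    const hit = usage.prompt_cache_hit_tokens;
    const miss = usage.prompt_cache_miss_tokens ?? 0;
    // Prefer the provider's documented total when it is present.
    return { cacheRead: hit, cacheWrite: 0, promptInput: usage.prompt_tokens ?? (hit + miss) };
  }
  return null;
}

function recordResponse(c: HitCounters, u: PromptUsage | null): HitCounters {
  if (u === null) return c; // missing usage means no counter update
  return {
    hits: c.hits + (u.cacheRead > 0 ? 1 : 0),
    total: c.total + 1,
    cacheRead: c.cacheRead + u.cacheRead,
    promptInput: c.promptInput + u.promptInput,
  };
}
```

Returning `null` for missing usage, rather than a zero-filled record, is what keeps unreported responses out of the request total.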
+Reset behavior:
+- Pi restarts do **not** clear stats; the persisted counters are restored.
+- `/reload` (an extension reload) resets the persisted counters, because Pi reports `session_start` with reason `reload` and the extension treats that as a reset.
+- Crossing a local calendar-day boundary (local midnight) resets counters on the next status update or supported-provider response.
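The three reset rules can be condensed into one decision function. This is a sketch under assumptions: the `PersistedStats` shape, the `YYYY-MM-DD` day key, and every session-start reason other than `reload` (for example `"startup"`) are hypothetical.

```typescript
// Illustrative sketch only; the state shape and day-key format are assumptions.
interface FamilyHits { hits: number; total: number }
interface PersistedStats { day: string; families: Record<string, FamilyHits> }

// Day key built from local-time getters, so the boundary is local midnight.
function localDayKey(now: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())}`;
}

function restoreOrReset(
  saved: PersistedStats | null,
  now: Date,
  sessionStartReason: string,
): PersistedStats {
  const today = localDayKey(now);
  const empty: PersistedStats = { day: today, families: {} };
  if (saved === null) return empty;                  // nothing persisted yet
  if (sessionStartReason === "reload") return empty; // /reload resets counters
  if (saved.day !== today) return empty;             // local day changed
  return saved;                                      // plain restart: restore
}
```

A plain Pi restart on the same local day falls through every check and returns the saved counters unchanged.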
+## Suggested compat config
+For direct DeepSeek or DeepSeek-like OpenAI-compatible proxies, configure the provider or model `compat` like this:
+```json
+{
+  "providers": {
+    "deepseek": {
+      "compat": {
+        "thinkingFormat": "deepseek",
+        "supportsLongCacheRetention": true,
+        "sendSessionAffinityHeaders": true
+      }
+    }
+  }
+}
+```
+If your provider id is not `deepseek` (for example a company proxy or OpenRouter-style proxy), you can put the same fields on that provider or the specific DeepSeek model. The extension detects DeepSeek-like models only by checking whether the model id/name contains `deepseek`; it does not infer this from provider id, base URL, or `thinkingFormat`. The currently recommended verification path covers the official direct `deepseek/deepseek-v4-pro` model.
+The extension warns at most once per provider/model per session when a DeepSeek-like OpenAI-compatible model is missing:
+- `supportsLongCacheRetention: true`, so Pi may not send `prompt_cache_retention: "24h"`.
+- `sendSessionAffinityHeaders: true` for OpenAI Completions-compatible APIs, or `sendSessionIdHeader: true` for OpenAI Responses-compatible APIs, so Pi may not send session-affinity headers such as `session_id`, `x-client-request-id`, or `x-session-affinity`.
+For Claude/Anthropic models behind an OpenAI-compatible endpoint, the extension may warn when the model is clearly Claude-like but `cacheControlFormat: "anthropic"` is missing. Only enable that compat flag if your endpoint supports Anthropic-style cache-control markers.
+> Reminder: only enable session-affinity headers or cache-control compat when your endpoint or proxy supports them.
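The DeepSeek compat check and the once-per-session limit described in this section could be sketched as below. The `OpenAiApiKind` labels, `missingCacheCompat`, and `SessionWarnings` are hypothetical names for illustration; the three compat flag names are taken from the text.

```typescript
// Illustrative sketch only; type and function names are assumptions.
interface CompatFlags {
  supportsLongCacheRetention?: boolean;
  sendSessionAffinityHeaders?: boolean;
  sendSessionIdHeader?: boolean;
}

type OpenAiApiKind = "completions" | "responses"; // hypothetical labels

// Returns the names of the cache-related compat flags that are missing.
function missingCacheCompat(
  modelText: string,
  api: OpenAiApiKind,
  compat: CompatFlags,
): string[] {
  // DeepSeek-like detection uses only the model id/name.
  if (!modelText.toLowerCase().includes("deepseek")) return [];
  const missing: string[] = [];
  if (compat.supportsLongCacheRetention !== true) {
    missing.push("supportsLongCacheRetention");
  }
  const affinityFlag =
    api === "responses" ? "sendSessionIdHeader" : "sendSessionAffinityHeaders";
  if (compat[affinityFlag] !== true) missing.push(affinityFlag);
  return missing;
}

// Tracks which provider/model pairs were already warned about in this session.
class SessionWarnings {
  private readonly seen = new Set<string>();

  shouldWarn(providerId: string, modelId: string): boolean {
    const key = `${providerId}/${modelId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

A caller would emit a warning only when `missingCacheCompat(...)` is non-empty and `shouldWarn(...)` returns `true` for that provider/model pair.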
+## How it works
+Provider caches are usually based on exact or near-exact prefix matching. Pi's system prompt contains stable content that is likely shared across sessions (tools, skills, guidelines) and dynamic content that changes frequently (git status, task context).
+```text
+Before: [dynamic git status | task context | stable tools + rules]
+        ↓ changing prefix → lower cache reuse
+After:  [stable tools + rules | dynamic git status | task context]
+        ↓ stable prefix → higher chance of cache reuse
+```
+Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. By default this extension does not add request fields; the only opt-in request hint is OpenAI-family `prompt_cache_key` when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
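To make the before/after diagram concrete, the sketch below partitions prompt sections so that stable ones come first and then measures how much prefix two sessions share. `PromptSection`, `reorderStableFirst`, `renderPrompt`, and `sharedPrefixLength` are hypothetical names, and how the real extension decides which sections are stable is not shown here.

```typescript
// Illustrative sketch only; the section model and names are assumptions.
interface PromptSection { title: string; text: string; stable: boolean }

// Stable partition: stable sections first, relative order otherwise preserved.
function reorderStableFirst(sections: PromptSection[]): PromptSection[] {
  const stable = sections.filter((s) => s.stable);
  const dynamic = sections.filter((s) => !s.stable);
  return [...stable, ...dynamic];
}

function renderPrompt(sections: PromptSection[]): string {
  return sections.map((s) => s.text).join("\n");
}

// Length of the common leading substring, a rough proxy for the part of the
// prompt that a prefix-based provider cache could reuse.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}
```

With two sessions that differ only in their git status line, the shared prefix grows from a few characters to at least the full length of the stable tools and rules text once both prompts are reordered.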
+## Improving cache hit rate
+The cache-hit optimization is intentionally conservative and provider-neutral in code: keep the largest stable prompt prefix first, let Pi/provider compat send supported cache controls, and avoid leaking unsupported request fields to proxies.
+What the extension does automatically:
+- Moves stable prompt material before dynamic task/git/session context. Besides tools, skills, custom prompts, appended prompts, and guideline bullets, this also keeps known-stable project/spec instruction files such as `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `CURSOR.md`, and `.trellis/spec/...` in the early cacheable prefix. Arbitrary large context files are not lifted by size alone, because they may be task/session-specific.
+- Sets `PI_CACHE_RETENTION=long` so Pi can request longer retention where the selected model/provider compat supports it.
+- Keeps footer counters provider-family-specific so you can verify whether the active model family is actually reporting cache reads.
+Provider notes:
+- DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
+- OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. If you explicitly opt in with `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1`, the extension adds a top-level `prompt_cache_key` derived from a SHA-256 hash of the stable prompt prefix for OpenAI-family id/name matches only. The stable prompt text is not stored or printed, but unsupported OpenAI-compatible proxies may reject this field.
+- Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
+- Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
+- Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
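The opt-in `prompt_cache_key` behavior from the OpenAI-family note can be sketched with Node's built-in `crypto` module. This is an assumption-laden sketch: the README says only that the key is derived from a SHA-256 hash of the stable prefix, so using the full hex digest as the key, the function names, and the family pattern are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch only; key format and names are assumptions.
const OPENAI_FAMILY_HINT = /(^|[^a-z0-9])(gpt-|chatgpt|o[1345](?![a-z0-9]))/;

// Hex SHA-256 of the stable prefix; the prefix text itself is never returned.
function stablePrefixCacheKey(stablePrefix: string): string {
  return createHash("sha256").update(stablePrefix, "utf8").digest("hex");
}

function withOptInCacheKey(
  payload: Record<string, unknown>,
  modelText: string,
  stablePrefix: string,
  env: Record<string, string | undefined>,
): Record<string, unknown> {
  // 1. Explicit opt-in only.
  if (env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"] !== "1") return payload;
  // 2. OpenAI-family model id/name only.
  if (!OPENAI_FAMILY_HINT.test(modelText.toLowerCase())) return payload;
  // 3. Never override a key the payload already defines.
  if ("prompt_cache_key" in payload) return payload;
  return { ...payload, prompt_cache_key: stablePrefixCacheKey(stablePrefix) };
}
```

Because the key depends only on the stable prefix, two requests with different task context but the same tools and rules would carry the same `prompt_cache_key`.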
+## Provider-specific limitations
+This package now has provider-family stats adapters, but it still avoids blind generalization:
+- DeepSeek cache is automatic and prefix/KV-cache based. Hits are best-effort and proxies can hide DeepSeek usage fields.
+- OpenAI-family prompt caching is automatic only where the actual upstream supports it and prompts are long enough. The adapter is model-name based and intentionally conservative; it does not use provider/API/base URL metadata to infer official OpenAI support.
+- Claude prompt caching depends on explicit Anthropic cache-control breakpoints. This release only reports stats exposed by Pi/provider; it does not insert breakpoints or mutate request bodies.
+- Gemini/Vertex may expose implicit cached-content token counts. This release does not create, store, update, or delete explicit Gemini cached-content resources.
+- Proxies/aggregators can route the same model name to different upstream providers. Because detection is id/name-only, use unambiguous model names, upstream routing constraints, and exposed usage verification before trusting cache behavior.
+## Out of scope for this release
+- Broad/default request-body mutation or provider-agnostic cache-control injection.
+- Injecting Anthropic `cache_control` markers.
+- Sending OpenAI `prompt_cache_key` by default; it is only added when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set, the active model id/name is OpenAI-family, and the payload does not already define one.
+- Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
+- Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
+- Claiming stats for providers that do not expose reliable cache usage.
+## Verify effect
+### In Pi
+- Watch the footer label for the active family, such as `DS cache ...`, `OpenAI cache ...`, `Claude cache ...`, or `Gemini cache ...`.
+- Use Pi's built-in `/stats` to confirm `cacheRead` tokens grow when Pi normalizes provider usage.
+- For DeepSeek, Pi normalizes `usage.input` as uncached/miss prompt tokens and `usage.cacheRead` as `prompt_cache_hit_tokens`, so the footer denominator is reconstructed as `input + cacheRead + cacheWrite` (matching DeepSeek `prompt_tokens` when the provider reports normal usage).
+- Footer hit count is request-level: one assistant response increments total requests, and it is a hit when `cacheRead > 0`. DeepSeek dashboards may use different time windows or account-wide/provider-side aggregation, so align the reset/window before comparing.
+- For provider raw APIs, compare with documented usage fields such as DeepSeek `prompt_cache_hit_tokens`, OpenAI `cached_tokens`, Anthropic `cache_read_input_tokens`, or Gemini/Vertex cached-content token counts.
+### Official DeepSeek baseline (recommended)
+Use official direct `deepseek/deepseek-v4-pro` as the baseline. Avoid mixing proxy paths in the same verification run. Do not paste API keys into chats or issues.
+1. Configure the official key with either:
+   ```bash
+   export DEEPSEEK_API_KEY='...'
+   ```
+   or Pi's login/config mechanism.
+2. Confirm the model is visible:
+   ```bash
+   pi --list-models deepseek-v4-pro
+   ```
+3. Run a minimal request:
+   ```bash
+   pi --model deepseek/deepseek-v4-pro --thinking high
+   ```
+   In Pi, send the same or highly similar short prompt several times, for example:
+   ```text
+   Answer in one sentence: cache baseline ping
+   ```
+4. Repeat the same or highly similar request at least three times, then compare footer `DS cache ...` and `/stats` for increasing `cacheRead` / hit rate.
+DeepSeek caches prefixes on the server side, in prefix/cache units. The first repeated request may still be building the shared prefix cache, so the third and later matching requests are usually more meaningful. Official docs describe cache cleanup as a best-effort process that may take hours to days, but that is not a hit guarantee. A short-term miss does not necessarily mean the cache has expired; it can also come from prefix granularity, routing, request differences, or a cache that has not been built yet.
+> Note: the baseline consumes a small number of tokens. Use short prompts and do not paste large files.
+## License
+Released under the [MIT License](./LICENSE).

package/README.zh-CN.md ADDED

@@ -0,0 +1,260 @@
+# Pi Cache Optimizer
+[English README](./README.md)
+> **Renamed from `pi-deepseek-cache-optimizer`.** If you previously installed the package under the old name, migrate with:
+>
+> ```bash
+> pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
+> ```
+>
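+> Your persisted footer counters and any existing `~/.pi/agent/models.json` are both preserved.
+A plug-and-play Pi extension that uses stable prompt prefixes to raise provider-side KV Cache / Prompt Cache hit probability and shows footer cache stats through conservative provider-specific adapters. Although the original package name mentioned only DeepSeek, it has supported stats adapters for DeepSeek, OpenAI, Claude, and Gemini since 1.x; the new name reflects that.
+> Important: prompt/KV caching is a provider-side, best-effort behavior. This extension can only raise the hit probability by stabilizing prefixes, requesting long retention when Pi supports it, flagging obvious compat gaps, and showing lightweight stats that the provider exposes. It cannot guarantee a hit on every request. Third-party proxies may hide, drop, reroute, or reinterpret cache behavior.
+## What it does
+| Feature | How | Manual action required |
+|------|------|:---:|
+| 🔄 Reorders the system prompt | `before_agent_start` hook: stable prefix first, dynamic context after | ❌ Automatic |
+| ⏳ Long cache retention | Sets `PI_CACHE_RETENTION=long` when the extension loads; Pi/provider compat decides what is actually sent | ❌ Automatic |
+| 🔗 Conservative compat reminders | DeepSeek session-affinity reminders, plus obvious cache-control reminders for Claude-compatible endpoints | ⚠️ See below |
+| 📊 Provider-specific footer stats | Shows read-only cache stats for supported provider families in the Pi footer/status | ❌ Automatic |
+## Supported stats adapters
+This release keeps the original DeepSeek behavior and adds read-only stats adapters for cases where Pi or the provider can safely expose usage. Adapter selection deliberately uses only the model id/name (plus the assistant message's `model`/`name` on `message_end`); provider id, API type, base URL, `thinkingFormat`, and compat flags are never used to select a stats adapter.
+| Adapter | Detection | Footer label | Usage fields |
+|---|---|---|---|
+| DeepSeek | Model id/name contains `deepseek` | `DS cache` | Pi `usage.cacheRead`/`usage.input`, or the raw fields `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`, `prompt_tokens` when visible |
+| OpenAI-family | Model id/name contains a conservative OpenAI-family token such as `gpt-`, `chatgpt`, `o1`, `o3`, `o4`, or `o5` | `OpenAI cache` | Pi-normalized usage, or the raw fields `prompt_tokens_details.cached_tokens` / `input_tokens_details.cached_tokens` plus prompt/input totals when visible |
+| Anthropic / Claude | Model id/name contains `anthropic` or `claude` | `Claude cache` | Pi-normalized usage, or the raw fields `cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens` when visible |
+| Gemini / Vertex | Model id/name contains `gemini` or `vertex` | `Gemini cache` | Pi-normalized usage, or Gemini/Vertex cached-content token metadata when visible |
+Generic OpenAI-compatible proxies are **not** treated as OpenAI-family merely because they use an OpenAI-shaped API or provider id. If the current model id/name is ambiguous, the extension hides the footer stats rather than guessing.
+## Quickstart
+1. (Optional but recommended) Read the official Pi + DeepSeek onboarding guide first: [`pi_mono.zh-CN.md`](https://github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/pi_mono.zh-CN.md). It covers Pi installation and basic configuration.
+2. Install this extension: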
+   ```bash
+   pi install npm:pi-cache-optimizer
+   ```
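+3. On first activation, if `~/.pi/agent/models.json` does not yet contain a DeepSeek-like model, this extension automatically writes a recommended `deepseek` provider block. The seed adds two key flags that the official onboarding doc lacks: `supportsLongCacheRetention: true` and `sendSessionAffinityHeaders: true`. These are exactly the cache-related settings that the official doc omits and that this extension's compat warnings check for. Before writing, a timestamped backup `~/.pi/agent/models.json.bak.<unix-millis>` is created, and no existing provider entry is modified or overwritten.
+4. Export your DeepSeek API key in the same shell where you run `pi`: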
+   ```bash
+   export DEEPSEEK_API_KEY='...'
+   ```
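+   The seed only references the key symbolically as `$DEEPSEEK_API_KEY`; this extension does **not** read, store, or print the key value.
+5. To opt out of automatic writes, set `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` before launching Pi. With the opt-out, no write or backup happens and no provider entry is added.
+## Install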
+```bash
+pi install npm:pi-cache-optimizer
+```
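+After installation, `PI_CACHE_RETENTION=long` **takes effect automatically** and the system prompt is **reordered automatically**; if `~/.pi/agent/models.json` has no DeepSeek-like model yet, a `deepseek` provider block is seeded automatically; and once a response from a supported model family completes with exposed usage, the footer status bar shows cache stats.
+## Uninstall
+Remove the same package source you used at install time. For the npm package: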
+```bash
+pi remove npm:pi-cache-optimizer
+```
+If you installed from a local path, remove that same path/source, for example:
+```bash
+pi remove /absolute/path/to/pi-deepseek-cache-optimizer
+# or, if this relative path is what you used at install time:
+pi remove ./relative/path/to/pi-deepseek-cache-optimizer
+```
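+If you originally installed into project-level settings with `pi install -l ...`, use the matching project-level remove command, for example `pi remove -l npm:pi-cache-optimizer`.
+After removing the package, run `/reload` in Pi or restart Pi so that the extension is unloaded. The footer counters are persisted separately; if you also want to delete that local state file, run: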
+```bash
+rm ~/.pi/agent/pi-cache-optimizer-stats.json
+# Old name (migrated on the first run of the new version and possibly already deleted; safe to delete if it still exists):
+rm -f ~/.pi/agent/deepseek-cache-optimizer-stats.json
+```
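+The DeepSeek block that this extension wrote into `~/.pi/agent/models.json` is not removed automatically on uninstall. Edit the file manually if you want to clear it; the earlier backup `~/.pi/agent/models.json.bak.<unix-millis>` is available for comparison and restore.
+## Footer cache stats
+The Pi footer shows stats for the **currently active model family** only, for example: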
+```text
+DS cache 3/5 · 0.77M/0.80M tok (96%)
+OpenAI cache 2/4 · 0.25M/0.70M tok (36%)
+Claude cache 1/3 · 0.10M/0.45M tok (22%) · write 0.20M tok
+Gemini cache 1/2 · 0.18M/0.50M tok (36%)
+```
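+Meaning:
+- `3/5`: of 5 supported assistant responses for that provider family, 3 had cache-read tokens.
+- `0.77M/0.80M tok`: cumulative cache-read input tokens / cumulative prompt input tokens, always displayed in millions (M).
+- Percentage: `cacheRead / total prompt input`.
+- `write ... tok` appears only when Claude cache-write tokens are nonzero, because Anthropic cache writes have their own cost/accounting semantics.
+Stats rules:
+- Counters are stored separately per provider family. DeepSeek, OpenAI, Claude, and Gemini are not merged into one global hit rate.
+- The footer shows only the label and counters of the currently active model family; the status is hidden/cleared for unsupported or ambiguous models.
+- Only assistant responses for which Pi/provider exposes usage are counted; without usage, the counters are not updated.
+- Adapter matching uses only the current model id/name plus the assistant message's `model`/`name`; broad provider/API/compat metadata is ignored when selecting an adapter.
+- Pi-normalized `usage.input`, `usage.cacheRead`, and `usage.cacheWrite` are preferred. Conservative fallback parsing happens only when known provider raw fields are visible on the assistant message.
+- For Pi-normalized usage, total prompt input is `input + cacheRead + cacheWrite`. Provider raw normalizers prefer the total/input fields from each provider's documentation.
+- Stats update only the footer status bar; the extension creates no extra TUI components and writes no diagnostic files, so it does not cause screen flicker from frequent redraws of debug components.
+- Stats are persisted to a small local JSON file: `~/.pi/agent/pi-cache-optimizer-stats.json`. Earlier 1.x releases used `~/.pi/agent/deepseek-cache-optimizer-stats.json`; on the first run of the new version, the old path is read once, copied to the new path, and then best-effort deleted. The file stores only counters and the local date; it does not store API keys, prompts, message content, headers, or model output.
+- v1 state files from older DeepSeek-only releases are migrated automatically into the DeepSeek adapter counters.
+Reset rules:
+- Pi restarts do **not** zero the stats; the extension restores the persisted counters.
+- `/reload` / extension reload zeroes and overwrites the persisted counters, because Pi exposes `session_start` with `reason: "reload"`.
+- When a long-running session crosses the local calendar-day boundary, counters are zeroed by local date before the next status update or supported-provider response is counted.
+## Suggested compat config
+For direct DeepSeek or DeepSeek-like OpenAI-compatible proxies, configure the following in the `compat` of the corresponding provider or model: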
+```json
+{
+  "providers": {
+    "deepseek": {
+      "compat": {
+        "thinkingFormat": "deepseek",
+        "supportsLongCacheRetention": true,
+        "sendSessionAffinityHeaders": true
+      }
+    }
+  }
+}
+```
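+If your provider id is not `deepseek` (for example a company proxy or an OpenRouter-style proxy), you can put the same fields in the `compat` of that provider or of the specific DeepSeek model. The extension still identifies DeepSeek-like models by whether the model id / model name contains `deepseek`; it does not decide based on provider id, baseUrl, or `thinkingFormat`. The currently recommended DeepSeek verification path covers only the official direct `deepseek/deepseek-v4-pro`.
+The extension reminds **at most once per session** for each provider/model. For DeepSeek-like OpenAI-compatible models, it reminds when the following settings are missing:
+- `supportsLongCacheRetention: true`: Pi may not send `prompt_cache_retention: "24h"`.
+- `sendSessionAffinityHeaders: true` (OpenAI Completions-compatible APIs) or `sendSessionIdHeader: true` (OpenAI Responses-compatible APIs): Pi may not send session-affinity headers (such as `session_id`, `x-client-request-id`, `x-session-affinity`), and cache hits may be worse behind proxies or load balancers.
+For Claude/Anthropic models exposed through an OpenAI-compatible endpoint, the extension may remind you when the model is clearly Claude-like but `cacheControlFormat: "anthropic"` is missing. Enable that compat flag only when the endpoint supports Anthropic-style cache-control markers.
+> Reminder: enabling session-affinity headers or cache-control compat is recommended only when the endpoint or proxy explicitly supports them.
+## How it works
+Provider caches usually rely on exact or near-exact prefix matching. Pi's system prompt contains content that is stable across sessions (tool definitions, skills, guidelines) as well as dynamic content that changes every time (git status, the current task).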
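+```text
+Before: [dynamic git status | task context | stable tools + guidelines]
+        ↓ prefix differs every time → less cache reuse
+After:  [stable tools + guidelines | dynamic git status | task context]
+        ↓ stable prefix unchanged → cache hits more likely
+```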
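+Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control`. By default this extension adds no request fields; the only opt-in request hint is `prompt_cache_key`, added for OpenAI-family models once `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set. The extension does not fake cache hits; it only helps with configuration, raises the odds of a stable prefix, and summarizes exposed usage in the footer status bar.
+## Improving cache hit rate
+The hit-rate optimization in code stays conservative and provider-neutral: put the largest stable prompt prefix first, let Pi/provider compat send the cache-control fields it supports, and avoid leaking unsupported request fields to proxies.
+The extension does the following automatically:
+- Moves stable prompt content ahead of dynamic task/git/session context. Besides tools, skills, custom prompts, append prompts, and guideline bullets, it also keeps known-stable project/spec files (such as `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `CURSOR.md`, `.trellis/spec/...`) in the earlier cacheable prefix. Arbitrary large context files are not moved forward merely because they are large, since they may be task/session-specific.
+- Sets `PI_CACHE_RETENTION=long` so that Pi requests longer cache retention when the current model/provider compat supports it.
+- Keeps footer counts separate per provider family, so you can verify whether the currently active model family really reports cache reads.
+Provider notes:
+- DeepSeek: the existing behavior remains the reference path. Stable prefix ordering, combined with long-retention / session-affinity compat, is the most favorable setup for automatic KV prefix reuse.
+- OpenAI-family: prompt caching takes effect automatically only when the actual upstream supports it and the prompt is long enough. Try to place static instructions, tools, examples, and specs before changing user/task context. Retention transport is handled by Pi by default. If you explicitly set `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1`, the extension adds a top-level `prompt_cache_key` only for models matching OpenAI-family by id/name; its value comes from a SHA-256 hash of the stable prompt prefix. The stable prompt text itself is neither stored nor printed, but OpenAI-compatible proxies that do not support the field may reject the request.
+- Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint explicitly supports it.
+- Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources and does not store cache resource names.
+- Proxies/aggregators: pin upstream routing/provider order where possible. If the same model id/name can route to different upstreams, the cache hit rate becomes unstable.
+## Provider-specific limitations
+This package now has provider-family stats adapters, but it still avoids blind generalization:
+- DeepSeek cache is an automatic prefix/KV cache. Hits are best-effort, and proxies may hide DeepSeek usage fields.
+- OpenAI-family prompt caching takes effect automatically only when the actual upstream supports it and the prompt is long enough. The adapter is model-name based and deliberately conservative; it does not use provider/API/base URL metadata to infer official OpenAI support.
+- Claude prompt caching depends on explicit Anthropic cache-control breakpoints. This release only reports stats exposed by Pi/provider; it does not insert breakpoints or modify request bodies.
+- Gemini/Vertex may expose implicit cached-content token counts. This release does not create, store, update, or delete explicit Gemini cached-content resources.
+- Proxies/aggregators may route the same model name to different upstream providers. Because detection is id/name-only, use unambiguous model names, pin upstream routing, and verify exposed usage before judging cache behavior.
+## Not included in this release
+- Broad/default request-body modification, or provider-agnostic cache-control injection.
+- Injecting Anthropic `cache_control` markers.
+- Sending OpenAI `prompt_cache_key` by default; it is added only when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set, the current model id/name belongs to OpenAI-family, and the payload does not already have the field.
+- Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
+- Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
+- Claiming stats support for providers that do not expose reliable cache usage.
+## Verify effect
+### In Pi
+- Check the footer label of the currently active family, such as `DS cache ...`, `OpenAI cache ...`, `Claude cache ...`, or `Gemini cache ...`.
+- Use Pi's built-in `/stats` to see whether Pi-normalized `cacheRead` tokens grow.
+- For DeepSeek, Pi normalizes `usage.input` to uncached/miss prompt tokens and `usage.cacheRead` to `prompt_cache_hit_tokens`, so the footer denominator is reconstructed as `input + cacheRead + cacheWrite` (which should correspond to DeepSeek `prompt_tokens` when the provider reports usage normally).
+- The footer hit count is request-level: each assistant response counts as one total request, and `cacheRead > 0` counts as a hit. The DeepSeek dashboard may use different time windows or account-level/provider-side aggregation; align the reset/window before comparing.
+- For provider raw APIs, compare against the documented usage fields, such as DeepSeek `prompt_cache_hit_tokens`, OpenAI `cached_tokens`, Anthropic `cache_read_input_tokens`, or Gemini/Vertex cached-content token counts.
+### Official DeepSeek baseline (recommended)
+Use the official direct `deepseek/deepseek-v4-pro` for the DeepSeek baseline; mixing proxy paths into the same verification run is not recommended for now. Do not paste API keys into chat logs or issues.
+1. Configure the official key (either way works):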
+   ```bash
+   export DEEPSEEK_API_KEY='...'
+   ```
+   or save the key through Pi's login/config mechanism.
+2. Confirm the model is visible:
+   ```bash
+   pi --list-models deepseek-v4-pro
+   ```
+   You should see `deepseek/deepseek-v4-pro`.
+3. Run a minimal request:
+   ```bash
+   pi --model deepseek/deepseek-v4-pro --thinking high
+   ```
+   In Pi, enter the same or a highly similar short prompt several times in a row, for example:
+   ```text
+   Answer in one sentence: cache baseline ping
+   ```
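+4. Run the same or a highly similar request at least three times in a row, then use the footer `DS cache ...` and `/stats` to check whether `cacheRead` / hit rate grows.
+DeepSeek cache prefixes are tracked at the granularity of a server-side prefix/cache unit. The first repeated request that diverges at the suffix may still be building the shared-prefix cache; the third and later requests that match that shared prefix are usually more meaningful. The official docs mention that cache cleanup may be a best-effort process taking hours to days, but this is not a "hit guarantee"; a short-term miss does not necessarily mean the TTL has expired either, and may simply come from prefix granularity, routing, request differences, or a cache that has not been built yet.
+> Note: the baseline consumes a small number of tokens; use short prompts and do not paste large files. The currently recommended test command uses only the official `deepseek/deepseek-v4-pro`.
+## License
+Released as open source under the [MIT License](./LICENSE).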