npm - pi-cache-optimizer - Versions diffs - 2.3.0 → 2.4.1 - Mend

pi-cache-optimizer 2.3.0 → 2.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -62,15 +62,13 @@ State files under `~/.pi/agent/` are resolved via Node's `os.homedir()`, so on W
    pi install npm:pi-cache-optimizer
    ```
-3. On first activation, if no DeepSeek-like model is already configured, this extension auto-seeds a recommended `deepseek` provider block into `~/.pi/agent/models.json`. The seed goes BEYOND the official onboarding doc by adding `supportsLongCacheRetention: true` and `sendSessionAffinityHeaders: true` — those flags are exactly the cache-related compat the official doc omits, and they are the reason this extension's compat warnings exist. A timestamped backup `~/.pi/agent/models.json.bak.<unix-millis>` is written before any change. Existing user entries are never modified.
-4. Export your DeepSeek API key in the same shell where you run `pi`:
+3. Export your DeepSeek API key in the same shell where you run `pi` (if you use a DeepSeek model):
    ```bash
    export DEEPSEEK_API_KEY='...'
    ```
-   The seed references `$DEEPSEEK_API_KEY` symbolically; this extension never reads, stores, or prints the key value.
-5. Opt out of auto-seeding by exporting `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` before launching Pi. With opt-out, no write or backup happens, and no provider entry is added or modified.
+   This extension never reads, stores, or prints the key value.
 ## Install
@@ -78,15 +76,15 @@ State files under `~/.pi/agent/` are resolved via Node's `os.homedir()`, so on W
 pi install npm:pi-cache-optimizer
 ```
-After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered and skills are compressed automatically, session-overview churn is stripped automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
+After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered and skills are compressed automatically, session-overview churn is stripped automatically, and the footer shows cache stats after supported model-family responses with exposed usage.
 ## Opt-out
 | Env var | Effect |
 |---------|--------|
-| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip DeepSeek `models.json` auto-seed |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index) |
-| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (opt-in) |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the OpenAI-family `prompt_cache_key` fallback (default is enabled) |
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-family `prompt_cache_key` fallback |
 ## Uninstall
@@ -114,7 +112,7 @@ rm ~/.pi/agent/pi-cache-optimizer-stats.json
 rm -f ~/.pi/agent/deepseek-cache-optimizer-stats.json
 ```
-The DeepSeek block this extension seeded into `~/.pi/agent/models.json` is left in place on uninstall. Remove it manually if you no longer want it; the timestamped backup at `~/.pi/agent/models.json.bak.<unix-millis>` lets you compare against the previous content.
 ## Footer cache stats
@@ -193,7 +191,7 @@ After:  [stable tools + rules | dynamic git status | task context]
         ↓ stable prefix → higher chance of cache reuse
 ```
-Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. By default this extension does not add request fields; the only opt-in request hint is OpenAI-family `prompt_cache_key` when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
+Pi itself decides whether to send cache-related fields such as `prompt_cache_retention`, session-affinity headers, or Anthropic-style `cache_control` based on model compat and `PI_CACHE_RETENTION`. This extension now adds only one conservative request-body fallback by default: for OpenAI-family models using OpenAI-compatible Pi APIs, it fills a missing or blank top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. The extension does not fake cache hits; it helps configuration, improves stable-prefix probability, and summarizes exposed usage in the footer.
 ## Improving cache hit rate
@@ -208,7 +206,7 @@ What the extension does automatically:
 Provider notes:
 - DeepSeek: current behavior remains the reference path. Stable prefix ordering plus long-retention/session-affinity compat gives the best chance of automatic KV prefix reuse.
-- OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. If you explicitly opt in with `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1`, the extension adds a top-level `prompt_cache_key` derived from a SHA-256 hash of the stable prompt prefix for OpenAI-family id/name matches only. The stable prompt text is not stored or printed, but unsupported OpenAI-compatible proxies may reject this field.
+- OpenAI-family: prompt caching is automatic only on supported upstreams and sufficiently long prompts. Keep static instructions, tools, examples, and specs before changing user/task context. Pi owns retention transport by default. For OpenAI-compatible Pi APIs, the extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (matching Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. Unsupported OpenAI-compatible proxies may reject unknown fields; custom APIs are not targeted.
 - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint supports it.
 - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources or store cache resource names.
 - Proxies/aggregators: fix upstream routing/provider order where possible. Cache hit rates are unreliable if the same model id/name can route to different upstreams.
@@ -225,9 +223,9 @@ This package now has provider-family stats adapters, but it still avoids blind g
 ## Out of scope for this release
-- Broad/default request-body mutation or provider-agnostic cache-control injection.
+- Broad/provider-agnostic request-body mutation or cache-control injection. The only default request-body fallback is OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when an effective key already exists.
 - Injecting Anthropic `cache_control` markers.
-- Sending OpenAI `prompt_cache_key` by default; it is only added when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set, the active model id/name is OpenAI-family, and the payload does not already define one.
+- Sending OpenAI `prompt_cache_key` into custom/non-OpenAI-compatible APIs; the fallback is gated to OpenAI-family id/name plus `openai-completions` / `openai-responses`.
 - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
 - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
 - Claiming stats for providers that do not expose reliable cache usage.
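In 2.4.1, the README gates the `prompt_cache_key` fallback on three conditions. The model id/name must be OpenAI-family, the API must be `openai-completions` or `openai-responses`, and the payload must not already carry a non-empty `prompt_cache_key` / `promptCacheKey`. The value is the Pi session id, which `index.ts` clamps to 64 characters.

The TypeScript below is a minimal sketch of that decision, not the extension's code. `ModelLike`, `looksOpenAIFamily`, and `withSessionCacheKey` are hypothetical names. The family check is a crude `gpt`/`openai` substring test standing in for `isOpenAIFamilyModel`, whose body this diff does not show. Lowercasing the API name is also an assumption.

```typescript
// Hypothetical minimal shapes; the extension's real PiModel type is not shown in this diff.
type ModelLike = { id?: string; name?: string; api?: string };
type Payload = Record<string, unknown>;

const MAX_KEY_LENGTH = 64; // mirrors OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH in index.ts
const OPENAI_COMPATIBLE_APIS = new Set(["openai-completions", "openai-responses"]);

// Trim, then keep at most 64 code points (Array.from splits by code point,
// so a surrogate pair is never cut in half).
function clampKey(key: string | undefined): string | undefined {
  const trimmed = key?.trim();
  if (!trimmed) return undefined;
  const chars = Array.from(trimmed);
  return chars.length <= MAX_KEY_LENGTH ? trimmed : chars.slice(0, MAX_KEY_LENGTH).join("");
}

function isNonEmptyString(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}

// Assumption: a crude id/name heuristic standing in for isOpenAIFamilyModel,
// whose body is not part of this excerpt.
function looksOpenAIFamily(model: ModelLike): boolean {
  const text = `${model.id ?? ""} ${model.name ?? ""}`.toLowerCase();
  return text.includes("gpt") || text.includes("openai");
}

// Returns a copy of the payload with prompt_cache_key filled from the session id,
// or undefined when the payload should be sent unchanged.
function withSessionCacheKey(
  model: ModelLike,
  payload: Payload,
  sessionId: string | undefined,
): Payload | undefined {
  if (!looksOpenAIFamily(model)) return undefined;
  if (!OPENAI_COMPATIBLE_APIS.has((model.api ?? "").toLowerCase())) return undefined;
  // Never overwrite an existing non-empty key in either spelling; blank strings do not count.
  if (isNonEmptyString(payload.prompt_cache_key) || isNonEmptyString(payload.promptCacheKey)) {
    return undefined;
  }
  const key = clampKey(sessionId);
  return key ? { ...payload, prompt_cache_key: key } : undefined;
}

// Usage
const gpt: ModelLike = { id: "gpt-4o", api: "openai-completions" };
console.log(withSessionCacheKey(gpt, { model: "gpt-4o" }, "session-123")); // key added
console.log(withSessionCacheKey(gpt, { prompt_cache_key: "keep-me" }, "session-123")); // undefined
console.log(withSessionCacheKey({ id: "gpt-4o", api: "custom" }, {}, "session-123")); // undefined
```

The sketch returns `undefined` for "leave the payload alone" rather than echoing the input. This appears to match `addOpenAIPromptCacheKey` in the diff, and it lets a caller tell an untouched request apart from a rewritten one.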

package/README.zh-CN.md CHANGED

@@ -65,15 +65,13 @@ Generic OpenAI-compatible proxies will **not**, merely because they use an OpenAI-shaped API or
    pi install npm:pi-cache-optimizer
    ```
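-3. On first activation, if `~/.pi/agent/models.json` does not yet contain a DeepSeek-like model, this extension automatically writes a recommended `deepseek` provider block. This seed adds two key flags beyond the official onboarding doc, `supportsLongCacheRetention: true` and `sendSessionAffinityHeaders: true`. These are exactly the cache-related items the official doc omits but that this extension's compat warnings have always checked. A timestamped backup `~/.pi/agent/models.json.bak.<unix-millis>` is created before writing, and no existing provider entry is ever modified or overwritten.
-4. Export the DeepSeek API key in the same shell where you run `pi`:
+3. If you use a DeepSeek model, export the DeepSeek API key in the same shell where you run `pi`: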
    ```bash
    export DEEPSEEK_API_KEY='...'
    ```
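-   The seed only references the key symbolically as `$DEEPSEEK_API_KEY`; this extension does **not** read, store, or print the key value.
-5. To opt out of auto-writing, set `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` before launching Pi. With opt-out, no write or backup happens, and no provider entry is added.
+   This extension does **not** read, store, or print the key value.
 ## Install
@@ -81,15 +79,15 @@ Generic OpenAI-compatible proxies will **not**, merely because they use an OpenAI-shaped API or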
 pi install npm:pi-cache-optimizer
 ```
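-After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, the system prompt is **automatically reorganized**, skills are compressed automatically, and session-overview dynamic trailing fields are stripped automatically. If `~/.pi/agent/models.json` has no DeepSeek-like model yet, a `deepseek` provider block is seeded automatically. Once a supported model family's response completes and exposes usage, the footer status bar shows cache stats.
+After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, the system prompt is **automatically reorganized**, skills are compressed automatically, and session-overview dynamic trailing fields are stripped automatically. Once a supported model family's response completes and exposes usage, the footer status bar shows cache stats.
 ## Opt-out
 | Env var | Effect |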
 |---------|------|
-| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip the DeepSeek auto-write to `models.json` |
 | `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index mode) |
-| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (requires explicit opt-in) |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0` | Disable the OpenAI-family `prompt_cache_key` fallback (enabled by default) |
+| `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` | Disable the OpenAI-family `prompt_cache_key` fallback |
 ## Uninstall
@@ -117,7 +115,7 @@ rm ~/.pi/agent/pi-cache-optimizer-stats.json
 rm -f ~/.pi/agent/deepseek-cache-optimizer-stats.json
 ```
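-The DeepSeek block this extension wrote into `~/.pi/agent/models.json` is not removed automatically on uninstall. Edit the file manually to remove it; the earlier backup `~/.pi/agent/models.json.bak.<unix-millis>` is available for comparison and restoration.
 ## Footer cache stats
@@ -196,7 +194,7 @@ Provider caches usually depend on exact or near-exact prefix matching. Pi's system
          ↓ stable prefix unchanged → cache hits more likely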
 ```
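-Pi itself also uses model compat and `PI_CACHE_RETENTION` to decide whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. By default, this extension adds no request fields. The only opt-in request hint is adding `prompt_cache_key` for OpenAI-family models once `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the footer status bar.
+Pi itself also uses model compat and `PI_CACHE_RETENTION` to decide whether to send cache-related fields such as `prompt_cache_retention`, session affinity headers, or Anthropic-style `cache_control`. This extension now performs only one conservative request-body fallback by default. For OpenAI-family models using an OpenAI-compatible Pi API, it fills a missing or empty top-level `prompt_cache_key` with the Pi session id and never overwrites an existing non-empty key. This extension does not fake cache hits; it only helps with configuration, raises the probability of a stable prefix, and summarizes exposed usage in the footer status bar.
 ## Improving cache hit rate
@@ -211,7 +209,7 @@ Pi itself also uses model compat and `PI_CACHE_RETENTION` to decide whether to send
 Provider notes:
 - DeepSeek: the existing behavior remains the reference path. Stable prefix ordering, plus long-retention / session-affinity compat, gives the best chance of automatic KV prefix reuse.
-- OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Place static instructions, tools, examples, and specs before the changing user/task context where possible. Pi handles retention transport by default. If you explicitly set `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1`, the extension adds a top-level `prompt_cache_key` only for models whose id/name matches the OpenAI family; its value comes from a SHA-256 hash of the stable prompt prefix. The stable prompt text is not saved or printed, but OpenAI-compatible proxies that do not support this field may reject the request.
+- OpenAI-family: prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. Place static instructions, tools, examples, and specs before the changing user/task context where possible. Pi handles retention transport by default. For OpenAI-compatible Pi APIs, this extension fills a missing or blank top-level `prompt_cache_key` with the Pi session id (aligned with Pi core's official OpenAI behavior) and never overwrites an existing non-empty `prompt_cache_key` / `promptCacheKey`. Disable this fallback with `PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY=1` or `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=0`. OpenAI-compatible proxies that do not support this field may reject the request; custom APIs are not injected.
 - Claude: prompt caching depends on Anthropic `cache_control` breakpoints. This extension does not inject breakpoints itself; for compatible endpoints, configure Pi compat such as `cacheControlFormat: "anthropic"` only when the endpoint explicitly supports it.
 - Gemini/Vertex: implicit caching benefits from repeated large stable prefixes. This extension does not create explicit `cachedContents` resources, nor does it store cache resource names.
 - Proxies/aggregators: pin upstream routing/provider order where possible. If the same model id/name can route to different upstreams, the cache hit rate will be unstable.
@@ -229,9 +227,9 @@ Pi itself also uses model compat and `PI_CACHE_RETENTION` to decide whether to send
 ## Out of scope for this release
-- Broad/default request-body modification, or provider-agnostic cache-control injection.
+- Broad/provider-agnostic request-body modification, or cache-control injection. The only default request-body fallback is the OpenAI-family `prompt_cache_key` on OpenAI-compatible APIs, sourced from the Pi session id and skipped when a valid key already exists.
 - Injecting Anthropic `cache_control` markers.
-- Sending OpenAI `prompt_cache_key` by default; it is added only when `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is set, the current model id/name belongs to the OpenAI family, and the payload does not already have the field.
+- Sending OpenAI `prompt_cache_key` to custom / non-OpenAI-compatible APIs; the fallback requires both that the model id/name belongs to the OpenAI family and that the API is `openai-completions` / `openai-responses`.
 - Overriding OpenAI `prompt_cache_retention` outside Pi's own compat handling.
 - Creating Gemini explicit `cachedContents` resources or persisting cache resource names.
 - Claiming stats support for providers that do not expose reliable cache usage.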
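The Opt-out table lists two variables that disable the `prompt_cache_key` fallback. `isDisabledEnv` and `shouldInjectOpenAIPromptCacheKey` in `index.ts` determine how the two interact, and the precedence can be sketched as below. This assumes `isEnabledEnv`, of which only the final line is visible, trims and lowercases the value the same way `isDisabledEnv` does. The function names are illustrative. The environment is passed in as a plain object rather than read from `process.env` only to keep the logic easy to exercise.

```typescript
// Accepted spellings, mirroring isEnabledEnv / isDisabledEnv in index.ts
// (assumed: both trim and lowercase the raw value before comparing).
const ENABLED_VALUES = new Set(["1", "true", "yes", "on"]);
const DISABLED_VALUES = new Set(["0", "false", "no", "off"]);

type Env = Record<string, string | undefined>;

function isEnabledValue(value: string | undefined): boolean {
  return value !== undefined && ENABLED_VALUES.has(value.trim().toLowerCase());
}

function isDisabledValue(value: string | undefined): boolean {
  return value !== undefined && DISABLED_VALUES.has(value.trim().toLowerCase());
}

// Default-on: only an explicit opt-out on either variable turns the fallback off.
function shouldInjectCacheKey(env: Env): boolean {
  if (isEnabledValue(env["PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY"])) return false;
  if (isDisabledValue(env["PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"])) return false;
  return true;
}

console.log(shouldInjectCacheKey({})); // true (default)
console.log(shouldInjectCacheKey({ PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY: "0" })); // false
console.log(shouldInjectCacheKey({ PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY: " On " })); // false
console.log(shouldInjectCacheKey({ PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY: "1" })); // true
```

The 2.3.0 opt-in `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` is now a no-op, because the fallback is already on. An unrecognized value such as `maybe` on that variable also leaves it enabled.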

package/index.ts CHANGED

@@ -1,10 +1,3 @@
-import { createHash } from "node:crypto";
-import {
-  mkdirSync,
-  readFileSync,
-  renameSync,
-  writeFileSync,
-} from "node:fs";
 import { mkdir, readFile, rename, unlink, writeFile } from "node:fs/promises";
 import { homedir } from "node:os";
 import { dirname, join } from "node:path";
@@ -16,10 +9,8 @@ import type { BuildSystemPromptOptions, ExtensionAPI, ExtensionContext } from "@
  * What it does:
  * 1. Reorders Pi's system prompt so stable content is sent before dynamic context.
  * 2. Sets PI_CACHE_RETENTION=long at extension load time.
- * 3. Auto-seeds a recommended DeepSeek entry into ~/.pi/agent/models.json on first run
- *    (only when no DeepSeek-like model is already configured; never overwrites).
- * 4. Warns once for provider/model cache compat gaps where the signal is conservative.
- * 5. Shows lightweight persisted provider-specific cache stats in Pi's footer.
+ * 3. Warns once for provider/model cache compat gaps where the signal is conservative.
+ * 4. Shows lightweight persisted provider-specific cache stats in Pi's footer.
  *
  * Provider prompt/KV caches are provider-side and best-effort. This extension improves
  * the odds of cache hits; it cannot guarantee hits, especially through proxies.
@@ -41,14 +32,11 @@ const STATUS_KEY = "pi-cache-stats";
 const STATE_DIR = join(homedir(), ".pi", "agent");
 const STATE_FILE_PATH = join(STATE_DIR, "pi-cache-optimizer-stats.json");
 const LEGACY_STATE_FILE_PATH = join(STATE_DIR, "deepseek-cache-optimizer-stats.json");
-const MODELS_JSON_PATH = join(STATE_DIR, "models.json");
 const CACHE_PROVIDER_IDS: CacheProviderId[] = ["deepseek", "openai", "claude", "gemini"];
 const OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY";
-const OPENAI_PROMPT_CACHE_KEY_PREFIX = "pi-dsco-";
-const NO_AUTO_CONFIG_ENV = "PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG";
+const NO_OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_NO_OPENAI_CACHE_KEY";
+const OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH = 64;
 const NO_SKILL_COMPRESSION_ENV = "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION";
-const DEEPSEEK_API_KEY_ENV = "DEEPSEEK_API_KEY";
 // WORM-flag: if optimizeSystemPrompt ever detects that its blind-replace
 // logic has accidentally truncated a structural marker (any XML tag or
@@ -114,6 +102,18 @@ type PersistedCacheStatsV2 = {
   statsByProvider: Partial<Record<CacheProviderId, CacheStats>>;
 };
+/** Per-model-key scoped state. Used in memory and for v3 persistence. */
+type CacheStatsState = {
+  statsByModel: Record<string, CacheStats>;
+  legacyFamily: Partial<Record<CacheProviderId, CacheStats>>;
+};
+type PersistedCacheStatsV3 = {
+  version: 3;
+  statsByModel: Record<string, CacheStats>;
+  legacyFamily: Partial<Record<CacheProviderId, CacheStats>>;
+};
 type UsageSnapshot = {
   cacheRead: number;
   cacheWrite: number;
@@ -511,12 +511,17 @@ function optimizeSystemPrompt(
   };
 }
-function buildPromptCacheKey(stablePrefix: string): string | undefined {
-  const normalized = stablePrefix.trim();
+function clampPromptCacheKey(key: string | undefined): string | undefined {
+  const normalized = key?.trim();
   if (!normalized) return undefined;
-  const digest = createHash("sha256").update(normalized).digest("hex").slice(0, 24);
-  return `${OPENAI_PROMPT_CACHE_KEY_PREFIX}${digest}`;
+  const chars = Array.from(normalized);
+  if (chars.length <= OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH) return normalized;
+  return chars.slice(0, OPENAI_PROMPT_CACHE_KEY_MAX_LENGTH).join("");
+}
+function getSessionPromptCacheKey(ctx: ExtensionContext): string | undefined {
+  return clampPromptCacheKey(ctx.sessionManager.getSessionId());
 }
 function asRecord(value: unknown): UnknownRecord | undefined {
@@ -547,8 +552,16 @@ function isEnabledEnv(value: string | undefined): boolean {
   return normalized === "1" || normalized === "true" || normalized === "yes" || normalized === "on";
 }
-function hasOwn(record: UnknownRecord, key: string): boolean {
-  return Object.prototype.hasOwnProperty.call(record, key);
+function isDisabledEnv(value: string | undefined): boolean {
+  if (!value) return false;
+  const normalized = value.trim().toLowerCase();
+  return normalized === "0" || normalized === "false" || normalized === "no" || normalized === "off";
+}
+function shouldInjectOpenAIPromptCacheKey(): boolean {
+  if (isEnabledEnv(process.env[NO_OPENAI_CACHE_KEY_ENV])) return false;
+  if (isDisabledEnv(process.env[OPENAI_CACHE_KEY_ENV])) return false;
+  return true;
 }
 function isAssistantMessage(message: unknown): boolean {
@@ -796,13 +809,51 @@ function normalizeWithFallback(
 function addOpenAIPromptCacheKey(payload: unknown, cacheKey: string | undefined): unknown | undefined {
   const record = asRecord(payload);
-  if (!record || !cacheKey) return undefined;
+  const normalizedCacheKey = clampPromptCacheKey(cacheKey);
+  if (!record || !normalizedCacheKey) return undefined;
-  if (hasOwn(record, "prompt_cache_key") || hasOwn(record, "promptCacheKey")) {
+  if (hasEffectivePromptCacheKey(record)) {
     return undefined;
   }
-  return { ...record, prompt_cache_key: cacheKey };
+  return { ...record, prompt_cache_key: normalizedCacheKey };
+}
+function hasEffectivePromptCacheKey(record: UnknownRecord): boolean {
+  return isNonEmptyString(record.prompt_cache_key) || isNonEmptyString(record.promptCacheKey);
+}
+function isNonEmptyString(value: unknown): boolean {
+  return typeof value === "string" && value.trim().length > 0;
+}
+function isOfficialOpenAIBaseUrl(model: PiModel): boolean {
+  const value = lower(model.baseUrl).trim();
+  if (!value) return false;
+  try {
+    return new URL(value).hostname === "api.openai.com";
+  } catch {
+    return value === "api.openai.com" || value.startsWith("api.openai.com/");
+  }
+}
+function describeMissingOpenAIFamilyProxyCompat(model: PiModel): string[] {
+  const compat = getCompat(model);
+  const missing: string[] = [];
+  if (!isOpenAIFamilyModel(model)) return missing;
+  if (lower(model.api) !== "openai-completions") return missing;
+  if (isOfficialOpenAIBaseUrl(model)) return missing;
+  if (compat.supportsLongCacheRetention !== true) {
+    missing.push("supportsLongCacheRetention");
+  }
+  if (compat.sendSessionAffinityHeaders !== true) {
+    missing.push("sendSessionAffinityHeaders");
+  }
+  return missing;
 }
 function describeMissingDeepSeekCompat(model: PiModel): string[] {
@@ -881,6 +932,15 @@ const CACHE_PROVIDER_ADAPTERS: CacheProviderAdapter[] = [
     normalizeUsage(message) {
       return normalizeWithFallback(message, getOpenAIRawUsage);
     },
+    warningText(model) {
+      const missing = describeMissingOpenAIFamilyProxyCompat(model);
+      if (missing.length === 0) return undefined;
+      return (
+        `💡 pi-cache-optimizer: ${modelKey(model)} looks like a third-party GPT/OpenAI-compatible proxy but merged compat lacks ${missing.join(" and ")}. ` +
+        `For better cache locality, add compat: { "supportsLongCacheRetention": true, "sendSessionAffinityHeaders": true } in ~/.pi/agent/models.json when the endpoint supports these fields.`
+      );
+    },
   },
   {
     id: "gemini",
@@ -1013,30 +1073,56 @@ function parseCacheStats(value: unknown): CacheStats | undefined {
   };
 }
-function parsePersistedCacheStats(value: unknown): Partial<Record<CacheProviderId, CacheStats>> | undefined {
+function parsePersistedCacheStats(value: unknown): CacheStatsState | undefined {
   const record = asRecord(value);
   if (!record) return undefined;
-  if (record.version === 1) {
-    const migrated = parseCacheStats(record.stats);
-    return migrated ? { deepseek: migrated } : undefined;
-  }
+  // version 3: model-scoped stats + legacy family fallback
+  if (record.version === 3) {
+    const statsByModel: Record<string, CacheStats> = {};
+    const rawModelMap = asRecord(record.statsByModel);
+    if (rawModelMap) {
+      for (const [key, val] of Object.entries(rawModelMap)) {
+        const parsed = parseCacheStats(val);
+        if (parsed) statsByModel[key] = parsed;
+      }
+    }
+    const legacyFamily: Partial<Record<CacheProviderId, CacheStats>> = {};
+    const rawFamily = asRecord(record.legacyFamily);
+    if (rawFamily) {
+      for (const id of CACHE_PROVIDER_IDS) {
+        const stats = parseCacheStats(rawFamily[id]);
+        if (stats) legacyFamily[id] = stats;
+      }
+    }
-  if (record.version !== 2) return undefined;
+    return { statsByModel, legacyFamily };
+  }
-  const statsByProvider = asRecord(record.statsByProvider);
-  if (!statsByProvider) return undefined;
+  // version 2: migrate statsByProvider into legacyFamily
+  if (record.version === 2) {
+    const statsByProvider = asRecord(record.statsByProvider);
+    const legacyFamily: Partial<Record<CacheProviderId, CacheStats>> = {};
+    if (statsByProvider) {
+      for (const id of CACHE_PROVIDER_IDS) {
+        const stats = parseCacheStats(statsByProvider[id]);
+        if (stats) legacyFamily[id] = stats;
+      }
+    }
+    return { statsByModel: {}, legacyFamily };
+  }
-  const parsed: Partial<Record<CacheProviderId, CacheStats>> = {};
-  for (const id of CACHE_PROVIDER_IDS) {
-    const stats = parseCacheStats(statsByProvider[id]);
-    if (stats) parsed[id] = stats;
+  // version 1: single DeepSeek stats -> migrate to legacyFamily.deepseek
+  if (record.version === 1) {
+    const migrated = parseCacheStats(record.stats);
+    return migrated ? { statsByModel: {}, legacyFamily: { deepseek: migrated } } : undefined;
   }
-  return parsed;
+  return undefined;
 }
-async function readPersistedCacheStats(): Promise<Partial<Record<CacheProviderId, CacheStats>> | undefined> {
+async function readPersistedCacheStats(): Promise<CacheStatsState | undefined> {
   try {
     const raw = await readFile(STATE_FILE_PATH, "utf8");
     return parsePersistedCacheStats(JSON.parse(raw));
@@ -1076,231 +1162,20 @@ async function readPersistedCacheStats(): Promise<Partial<Record<CacheProviderId
   return undefined;
 }
-async function writePersistedCacheStats(statsByProvider: Partial<Record<CacheProviderId, CacheStats>>): Promise<void> {
+async function writePersistedCacheStats(state: CacheStatsState): Promise<void> {
   await mkdir(STATE_DIR, { recursive: true });
-  const payload: PersistedCacheStatsV2 = { version: 2, statsByProvider };
+  const payload: PersistedCacheStatsV3 = {
+    version: 3,
+    statsByModel: state.statsByModel,
+    legacyFamily: state.legacyFamily,
+  };
   const tempPath = `${STATE_FILE_PATH}.${process.pid}.${Date.now()}.tmp`;
   await writeFile(tempPath, JSON.stringify(payload, null, 2) + "\n", "utf8");
   await rename(tempPath, STATE_FILE_PATH);
 }
-// ============================================================
-// models.json auto-config (DeepSeek seed)
-// ============================================================
-type ModelsJsonShape = {
-  providers?: UnknownRecord;
-} & UnknownRecord;
-const DEEPSEEK_SEED_PROVIDER = {
-  baseUrl: "https://api.deepseek.com",
-  api: "openai-completions",
-  apiKey: "$DEEPSEEK_API_KEY",
-  models: [
-    {
-      id: "deepseek-v4-pro",
-      name: "DeepSeek V4 Pro",
-      contextWindow: 1_000_000,
-      maxTokens: 384_000,
-      input: ["text"],
-      reasoning: true,
-      cost: { input: 1.74, output: 3.48, cacheRead: 0.145, cacheWrite: 0 },
-      compat: {
-        requiresReasoningContentOnAssistantMessages: true,
-        thinkingFormat: "deepseek",
-        supportsLongCacheRetention: true,
-        sendSessionAffinityHeaders: true,
-        reasoningEffortMap: {
-          minimal: "high",
-          low: "high",
-          medium: "high",
-          high: "high",
-          xhigh: "max",
-        },
-      },
-    },
-    {
-      id: "deepseek-v4-flash",
-      name: "DeepSeek V4 Flash",
-      contextWindow: 1_000_000,
-      maxTokens: 384_000,
-      input: ["text"],
-      reasoning: true,
-      cost: { input: 0.14, output: 0.28, cacheRead: 0.028, cacheWrite: 0 },
-      compat: {
-        requiresReasoningContentOnAssistantMessages: true,
-        thinkingFormat: "deepseek",
-        supportsLongCacheRetention: true,
-        sendSessionAffinityHeaders: true,
-        reasoningEffortMap: {
-          minimal: "high",
-          low: "high",
-          medium: "high",
-          high: "high",
-          xhigh: "max",
-        },
-      },
-    },
-  ],
-} as const;
-function modelsJsonContainsDeepseek(parsed: ModelsJsonShape): boolean {
-  const providers = asRecord(parsed.providers);
-  if (!providers) return false;
-  // Respect user intent: a provider key literally named "deepseek" (case-insensitive)
-  // means the user already declared their own DeepSeek block, even if its models list is empty.
-  for (const key of Object.keys(providers)) {
-    if (key.toLowerCase() === "deepseek") return true;
-  }
-  for (const providerValue of Object.values(providers)) {
-    const provider = asRecord(providerValue);
-    if (!provider) continue;
-    const models = provider.models;
-    if (!Array.isArray(models)) continue;
-    for (const model of models) {
-      const record = asRecord(model);
-      if (!record) continue;
-      if (lower(record.id).includes("deepseek") || lower(record.name).includes("deepseek")) {
-        return true;
-      }
-    }
-  }
-  return false;
-}
-type EnsureDeepseekResult = {
-  // Whether some DeepSeek-like model is now present in models.json (either pre-existing or just-seeded).
-  deepseekPresent: boolean;
-  // Whether we just wrote the seed in this activation.
-  seeded: boolean;
-  // Whether auto-config was deliberately skipped (env opt-out or malformed file).
-  skipped: boolean;
-};
-function ensureDeepseekConfigured(notify?: (text: string, level: "info" | "warning") => void): EnsureDeepseekResult {
-  const result: EnsureDeepseekResult = { deepseekPresent: false, seeded: false, skipped: false };
-  if (isEnabledEnv(process.env[NO_AUTO_CONFIG_ENV])) {
-    result.skipped = true;
-    // Even when opted out, callers still need to know whether DeepSeek is present so the
-    // API-key hint can fire. Read-only inspection only; no writes.
-    try {
-      const raw = readFileSync(MODELS_JSON_PATH, "utf8");
-      const parsed = JSON.parse(raw) as ModelsJsonShape;
-      if (parsed && typeof parsed === "object") {
-        result.deepseekPresent = modelsJsonContainsDeepseek(parsed);
-      }
-    } catch {
-      // ignore: missing or unreadable file means "not present"
-    }
-    return result;
-  }
-  let originalBytes: string | undefined;
-  let parsed: ModelsJsonShape;
-  try {
-    originalBytes = readFileSync(MODELS_JSON_PATH, "utf8");
-  } catch (error) {
-    if (getErrorCode(error) !== "ENOENT") {
-      console.warn(`${LOG_PREFIX}: failed to read models.json; skipping auto-config`, error);
-      result.skipped = true;
-      return result;
-    }
-    parsed = { providers: {} };
-  }
-  if (originalBytes !== undefined) {
-    try {
-      const decoded = JSON.parse(originalBytes) as unknown;
-      if (decoded && typeof decoded === "object" && !Array.isArray(decoded)) {
-        parsed = decoded as ModelsJsonShape;
-      } else {
-        // A non-object top-level JSON (array/string/number) is unexpected; treat as malformed and abort.
-        console.warn(`${LOG_PREFIX}: models.json top-level is not an object; aborting auto-config`);
-        result.skipped = true;
-        return result;
-      }
-    } catch (error) {
-      // Malformed JSON: do NOT overwrite the user's file.
-      console.warn(`${LOG_PREFIX}: models.json is not valid JSON; aborting auto-config`, error);
-      result.skipped = true;
-      return result;
-    }
-  } else {
-    parsed = { providers: {} };
-  }
-  if (modelsJsonContainsDeepseek(parsed)) {
-    result.deepseekPresent = true;
-    return result;
-  }
-  // Decide we will seed. Snapshot the old bytes (or empty marker) into a backup before mutating.
-  const backupPath = `${MODELS_JSON_PATH}.bak.${Date.now()}`;
-  try {
-    mkdirSync(STATE_DIR, { recursive: true });
-    writeFileSync(backupPath, originalBytes ?? "", "utf8");
-  } catch (error) {
-    console.warn(`${LOG_PREFIX}: failed to write models.json backup; aborting auto-config`, error);
-    result.skipped = true;
-    return result;
-  }
-  const providersIn = asRecord(parsed.providers) ?? {};
-  const merged: ModelsJsonShape = {
-    ...parsed,
-    providers: { ...providersIn, deepseek: DEEPSEEK_SEED_PROVIDER },
-  };
-  const tempPath = `${MODELS_JSON_PATH}.tmp.${process.pid}`;
-  try {
-    writeFileSync(tempPath, JSON.stringify(merged, null, 2) + "\n", "utf8");
-  } catch (error) {
-    console.warn(`${LOG_PREFIX}: failed to write models.json temp file; aborting auto-config`, error);
-    result.skipped = true;
-    return result;
-  }
-  try {
-    renameSync(tempPath, MODELS_JSON_PATH);
-  } catch (error) {
-    console.warn(
-      `${LOG_PREFIX}: failed to atomically rename models.json (temp left at ${tempPath})`,
-      error,
-    );
-    result.skipped = true;
-    return result;
-  }
-  result.seeded = true;
-  result.deepseekPresent = true;
-  notify?.(
-    `${LOG_PREFIX}: seeded DeepSeek provider into ${MODELS_JSON_PATH} (backup at ${backupPath}). ` +
-      `Set ${DEEPSEEK_API_KEY_ENV} to use it; or set ${NO_AUTO_CONFIG_ENV}=1 next time to opt out.`,
-    "info",
-  );
-  return result;
-}
-function emitDeepseekApiKeyHintIfNeeded(
-  deepseekPresent: boolean,
-  notify: (text: string, level: "info" | "warning") => void,
-): void {
-  if (!deepseekPresent) return;
-  const value = process.env[DEEPSEEK_API_KEY_ENV];
-  if (typeof value === "string" && value.trim().length > 0) return;
-  notify(
-    `${LOG_PREFIX}: ${DEEPSEEK_API_KEY_ENV} is not set. ` +
-      `DeepSeek models in ${MODELS_JSON_PATH} reference $${DEEPSEEK_API_KEY_ENV}; ` +
-      `export ${DEEPSEEK_API_KEY_ENV}=... in your shell to enable them.`,
-    "info",
-  );
-}
 // Internal helpers exported only so the task verification script
 // (.trellis/tasks/.../verify.ts) can exercise them. They are not part of the
@@ -1315,42 +1190,75 @@ export const __internals_for_tests = {
   compressSkillsInSystemPrompt,
   MIN_STABLE_CANDIDATE_LENGTH,
   SKILL_COMPRESSION_MIN_COUNT,
+  // OpenAI-family cache-key helpers
+  addOpenAIPromptCacheKey,
+  clampPromptCacheKey,
+  hasEffectivePromptCacheKey,
+  isNonEmptyString,
+  shouldInjectOpenAIPromptCacheKey,
+  isOpenAICompatibleApi,
+  isOpenAIFamilyModel,
+  isOpenAIFamilyAssistantMessage,
+  isOpenAIFamilyToken,
+  describeMissingOpenAIFamilyProxyCompat,
+  isOfficialOpenAIBaseUrl,
+  getModelIdNameTokenValues,
+  getAssistantMessageModelTokenValues,
+  getCompat,
+  modelKey,
+  // Cache stats helpers (module-level, usable from verify script)
+  addUsageToCacheStats,
+  formatCacheStats,
+  emptyCacheStats,
+  emptyAllCacheStats,
+  parseCacheStats,
+  parsePersistedCacheStats,
 };
 export default function (pi: ExtensionAPI) {
   const warnedModels = new Set<string>();
-  let cacheStatsByProvider: Partial<Record<CacheProviderId, CacheStats>> = emptyAllCacheStats();
+  let cacheStatsByModel: Record<string, CacheStats> = {};
+  let cacheStatsLegacyFamily: Partial<Record<CacheProviderId, CacheStats>> = emptyAllCacheStats();
   let lastStatusText: string | undefined;
-  let latestPromptCacheKey: string | undefined;
   let persistenceWarningShown = false;
-  let apiKeyHintShown = false;
+  let persistTimer: ReturnType<typeof setTimeout> | null = null;
+  const PERSIST_DEBOUNCE_MS = 2000;
-  // Auto-config runs once at extension activation (idempotent: skips if DeepSeek already configured).
-  // Pi's UI logger is not yet bound here, so seed-time notifications go through console.warn / console.info.
-  // Per-session UI notification is emitted from the session_start hook below.
-  let autoConfig: EnsureDeepseekResult;
-  try {
-    autoConfig = ensureDeepseekConfigured((text, level) => {
-      if (level === "warning") console.warn(text);
-      else console.info(text);
-    });
-  } catch (error) {
-    console.warn(`${LOG_PREFIX}: ensureDeepseekConfigured threw; continuing without auto-config`, error);
-    autoConfig = { deepseekPresent: false, seeded: false, skipped: true };
+  function getCacheStatsState(): CacheStatsState {
+    return { statsByModel: cacheStatsByModel, legacyFamily: cacheStatsLegacyFamily };
+  }
+  /** Look up active stats for a model, falling back to legacy family. */
+  function getStatsForModel(model: PiModel | undefined, adapter: CacheProviderAdapter): CacheStats {
+    if (model) {
+      const key = modelKey(model);
+      const existing = cacheStatsByModel[key];
+      if (existing) return existing;
+    }
+    // Fallback: legacy family bucket — used when model key is unknown
+    // or this model hasn't been seen yet in this session.
+    const family = cacheStatsLegacyFamily[adapter.id];
+    if (family) return family;
+    const created = emptyCacheStats();
+    cacheStatsLegacyFamily[adapter.id] = created;
+    return created;
   }
-  function getStatsForAdapter(adapter: CacheProviderAdapter): CacheStats {
-    const existing = cacheStatsByProvider[adapter.id];
+  /** Get or create a stats entry for the given model key. */
+  function getOrCreateStatsByModelKey(key: string): CacheStats {
+    const existing = cacheStatsByModel[key];
     if (existing) return existing;
     const created = emptyCacheStats();
-    cacheStatsByProvider[adapter.id] = created;
+    cacheStatsByModel[key] = created;
     return created;
   }
   async function persistCacheStats(ctx?: ExtensionContext): Promise<void> {
     try {
-      await writePersistedCacheStats(cacheStatsByProvider);
+      await writePersistedCacheStats(getCacheStatsState());
     } catch (error) {
       console.warn(`${LOG_PREFIX}: failed to persist cache stats`, error);
       if (!persistenceWarningShown) {
@@ -1363,14 +1271,48 @@ export default function (pi: ExtensionAPI) {
     }
   }
+  /** Schedule a debounced persist. Coalesces rapid message_end writes
+   *  into a single disk write after PERSIST_DEBOUNCE_MS of silence.
+   *  In-memory stats remain instantly up-to-date for the footer; only
+   *  the on-disk persistence is delayed. */
+  function schedulePersistCacheStats(ctx?: ExtensionContext): void {
+    if (persistTimer !== null) clearTimeout(persistTimer);
+    persistTimer = setTimeout(() => {
+      persistTimer = null;
+      persistCacheStats(ctx).catch((err) => {
+        console.warn(`${LOG_PREFIX}: debounced persist failed`, err);
+      });
+    }, PERSIST_DEBOUNCE_MS);
+  }
+  /** Flush any pending debounced persist immediately (cancels timer + writes).
+   *  Used on reload and day-rollover where immediate durability matters. */
+  async function flushPersistCacheStats(ctx?: ExtensionContext): Promise<void> {
+    if (persistTimer !== null) {
+      clearTimeout(persistTimer);
+      persistTimer = null;
+    }
+    await persistCacheStats(ctx);
+  }
   async function rollOverStatsIfNeeded(ctx?: ExtensionContext): Promise<void> {
     const day = currentLocalDay();
     let changed = false;
+    // Roll over per-model entries.
+    for (const key of Object.keys(cacheStatsByModel)) {
+      const stats = cacheStatsByModel[key];
+      if (stats && stats.day !== day) {
+        cacheStatsByModel[key] = emptyCacheStats(day);
+        changed = true;
+      }
+    }
+    // Roll over legacy family entries.
     for (const id of CACHE_PROVIDER_IDS) {
-      const stats = cacheStatsByProvider[id];
+      const stats = cacheStatsLegacyFamily[id];
       if (stats && stats.day !== day) {
-        cacheStatsByProvider[id] = emptyCacheStats(day);
+        cacheStatsLegacyFamily[id] = emptyCacheStats(day);
         changed = true;
       }
     }
@@ -1383,13 +1325,21 @@ export default function (pi: ExtensionAPI) {
   async function restoreCacheStats(reason: string, ctx: ExtensionContext): Promise<void> {
     if (reason === "reload") {
-      cacheStatsByProvider = emptyAllCacheStats();
+      cacheStatsByModel = {};
+      cacheStatsLegacyFamily = emptyAllCacheStats();
       lastStatusText = undefined;
-      await persistCacheStats(ctx);
+      await flushPersistCacheStats(ctx);
       return;
     }
-    cacheStatsByProvider = (await readPersistedCacheStats()) ?? emptyAllCacheStats();
+    const persisted = await readPersistedCacheStats();
+    if (persisted) {
+      cacheStatsByModel = persisted.statsByModel;
+      cacheStatsLegacyFamily = persisted.legacyFamily;
+    } else {
+      cacheStatsByModel = {};
+      cacheStatsLegacyFamily = emptyAllCacheStats();
+    }
     lastStatusText = undefined;
     await rollOverStatsIfNeeded(ctx);
   }
@@ -1398,7 +1348,17 @@ export default function (pi: ExtensionAPI) {
     await rollOverStatsIfNeeded(ctx);
     const adapter = selectAdapterForModel(model);
-    let statusText: string | undefined = adapter ? formatCacheStats(adapter, getStatsForAdapter(adapter)) : undefined;
+    let statusText: string | undefined;
+    if (adapter) {
+      // Display only per-model scoped stats. A model that has never been
+      // used in this session shows 0/0 rather than falling back to legacy
+      // family aggregated stats (which could span different providers with
+      // the same model-family name). The message_end hook populates
+      // cacheStatsByModel[key] on first use with that model.
+      const key = model ? modelKey(model) : undefined;
+      const stats = key ? cacheStatsByModel[key] : undefined;
+      statusText = formatCacheStats(adapter, stats ?? emptyCacheStats());
+    }
     // If optimizeSystemPrompt detected structural truncation on this or
     // a recent turn, flag it once in the footer so the user knows to
@@ -1418,12 +1378,6 @@ export default function (pi: ExtensionAPI) {
   pi.on("session_start", async (event, ctx) => {
     await restoreCacheStats(event.reason, ctx);
     notifyCacheCompatIfNeeded(ctx.model, ctx, warnedModels);
-    if (!apiKeyHintShown) {
-      apiKeyHintShown = true;
-      emitDeepseekApiKeyHintIfNeeded(autoConfig.deepseekPresent, (text, level) => {
-        ctx.ui.notify(text, level);
-      });
-    }
     await publishStatus(ctx);
   });
@@ -1489,7 +1443,6 @@ export default function (pi: ExtensionAPI) {
     // cache key derived from `stablePrefix` reflects what actually
     // ships to the provider.
     const optimized = optimizeSystemPrompt(compressedPrompt, event.systemPromptOptions);
-    latestPromptCacheKey = buildPromptCacheKey(optimized.stablePrefix);
     if (optimized.changed && optimized.systemPrompt.trim().length > 0) {
       return { systemPrompt: optimized.systemPrompt };
@@ -1510,10 +1463,11 @@ export default function (pi: ExtensionAPI) {
   });
   pi.on("before_provider_request", (event, ctx) => {
-    if (!isEnabledEnv(process.env[OPENAI_CACHE_KEY_ENV])) return undefined;
+    if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
     if (!isOpenAIFamilyModel(ctx.model)) return undefined;
+    if (!isOpenAICompatibleApi(ctx.model?.api)) return undefined;
-    return addOpenAIPromptCacheKey(event.payload, latestPromptCacheKey);
+    return addOpenAIPromptCacheKey(event.payload, getSessionPromptCacheKey(ctx));
   });
   pi.on("message_end", async (event, ctx) => {
@@ -1524,8 +1478,17 @@ export default function (pi: ExtensionAPI) {
     if (!usage) return;
     await rollOverStatsIfNeeded(ctx);
-    addUsageToCacheStats(getStatsForAdapter(adapter), usage);
-    await persistCacheStats(ctx);
+    // Update stats scoped to the active model (provider/id key).
+    // Falls back to legacy family when ctx.model is undefined.
+    if (ctx.model) {
+      const key = modelKey(ctx.model);
+      addUsageToCacheStats(getOrCreateStatsByModelKey(key), usage);
+    } else {
+      addUsageToCacheStats(getStatsForModel(undefined, adapter), usage);
+    }
+    schedulePersistCacheStats(ctx);
     await publishStatus(ctx);
   });
 }
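The `schedulePersistCacheStats` / `flushPersistCacheStats` pair added above is a trailing-edge debounce with an explicit flush. Each `message_end` restarts a quiet-period timer, so a burst of events leads to a single disk write, and the flush path cancels the timer and writes immediately. Below is a minimal standalone sketch of that pattern. The `DebouncedPersister` class, its `pending` getter, and the injected `write` callback are hypothetical names used only for illustration. They are not part of the package's API.

```typescript
// Hypothetical standalone sketch of the debounce-then-flush pattern that
// schedulePersistCacheStats / flushPersistCacheStats use in the diff above.
type Writer = () => Promise<void>;

class DebouncedPersister {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly write: Writer;
  private readonly delayMs: number;

  constructor(write: Writer, delayMs: number) {
    this.write = write;
    this.delayMs = delayMs;
  }

  /** Restart the quiet-period timer; only the last call in a burst writes. */
  schedule(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.timer = null;
      // Errors are logged, not thrown, because nothing awaits a timer callback.
      this.write().catch((err) => console.warn("debounced write failed", err));
    }, this.delayMs);
  }

  /** Cancel any pending timer and write immediately (e.g. on reload). */
  async flush(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    await this.write();
  }

  /** True while a debounced write is still waiting for its quiet period. */
  get pending(): boolean {
    return this.timer !== null;
  }
}
```

With a delay of 2000 ms (the diff's `PERSIST_DEBOUNCE_MS`), three quick `schedule()` calls followed by `flush()` produce exactly one write. The in-memory stats stay current for the footer the whole time. Only the on-disk copy lags, by at most the debounce delay.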

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.3.0",
+  "version": "2.4.1",
   "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
   "keywords": [
     "pi-package",