
pi-cache-optimizer 2.6.6 → 2.6.9


Files changed (4)

package/README.md CHANGED

@@ -20,6 +20,7 @@ Pi extension for improving provider-side KV / prompt cache hit rates. It keeps s
 - [Anthropic adaptive thinking models](#anthropic-adaptive-thinking-models)
 - [Auto-repair with `/cache-optimizer fix`](#auto-repair-with-cache-optimizer-fix)
 - [Footer stats](#footer-stats)
+- [For router / virtual-channel extension authors](#for-router--virtual-channel-extension-authors)
 - [Uninstall](#uninstall)
 - [Verify effect](#verify-effect)
 - [License](#license)
@@ -51,6 +52,8 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 Run `/reload` in Pi after install/update/remove so extension hooks refresh.
+On Pi 0.79.7 and newer, `pi update` updates Pi itself only. To update installed Pi packages such as this extension, run `pi update --extensions` (packages only) or `pi update --all` (Pi + packages).
 ## Commands
 | Command | Effect |
@@ -213,7 +216,7 @@ If only one model should change, use `modelOverrides`:
 Stats are read-only local counters stored at `~/.pi/agent/pi-cache-optimizer-stats.json` and scoped by Pi session + provider/model. They contain only dates and numeric counters — no API keys, prompts, payloads, headers, responses, or model output.
-For virtual routing providers, completed assistant message metadata is authoritative: if the message carries real upstream `provider`, `model` / `responseModel`, `api`, and usage, stats are attributed to that upstream provider/model instead of the virtual router shell. Router extensions may also publish a live route adapter under `Symbol.for("pi.routing.registry.v1")` so footer, doctor, compat, and reset flows can resolve the current upstream before the final assistant message exists. The cache optimizer also exposes query-scoped prompt/cache hints via `Symbol.for("pi.cache.hints.v1")` for routers that forward to inner `streamSimple` calls. Both protocols are optional and versioned; no router package import is required.
+Pi 0.79+ also includes a built-in footer `CH` marker for the latest prompt cache hit rate. This extension complements that marker with persisted, provider/model/session-scoped counters plus proxy compat diagnostics.
 Example footer:
@@ -227,6 +230,113 @@ Supported footer labels include: DS, Claude, OpenAI, Gemini, Kimi, Qwen, GLM, Mi
 Adapter selection uses only model id/name (plus assistant message model/name on message end). Generic OpenAI-shaped APIs are not treated as OpenAI-family unless the model id/name matches a supported family.
+## For router / virtual-channel extension authors
+If your Pi extension provides a virtual routing provider (for example `router/auto`, `router/smart`, or a profile/channel that forwards to a real upstream), this extension can show cache stats for the real upstream provider/model instead of the virtual shell. Integration is optional, versioned, and does **not** require importing this package.
+### Minimum integration: final assistant message metadata
+For seamless final cache-stat attribution, relay the real upstream identity on completed assistant messages:
+```ts
+{
+  role: "assistant",
+  provider: "anthropic",              // real upstream provider
+  responseModel: "claude-opus-4-8",   // or model: "..."
+  api: "anthropic-messages",          // upstream Pi API id when known
+  usage: {
+    input: 1200,       // Pi-normalized uncached input tokens, if available
+    cacheRead: 8000,   // tokens read from provider prompt cache
+    cacheWrite: 500,   // tokens newly written to provider prompt cache
+  },
+}
+```
+`message_end` treats these assistant-message fields as authoritative. If `provider` + `model`/`responseModel` + cache usage are present, stats update the upstream bucket even when the active model is still `router/auto`. If upstream usage does not expose cache fields, leave them absent/zero; this extension will not fake cache hits.
+### Optional: live route registry for pre-response UX
+Final message metadata is enough for post-response stats. For pre-response flows — footer display before the first response, `/cache-optimizer doctor`, `/cache-optimizer compat`, `/cache-optimizer reset`, and OpenAI-compatible `prompt_cache_key` fallback — register a live route adapter under `Symbol.for("pi.routing.registry.v1")`.
+Protocol shape:
+```ts
+type PiRouteSnapshot = {
+  virtualProvider: string;
+  virtualModelId: string;
+  provider: string;
+  modelId: string;
+  api?: string;
+  canonicalModelId?: string;
+  routeLabel?: string;
+  status?: "planned" | "trying" | "selected" | "success" | "failed";
+  sessionIdHash?: string;
+  requestId?: string;
+  timestamp: number;
+};
+type PiRouterAdapterV1 = {
+  virtualProvider: string;
+  resolveActiveRoute(
+    virtualModelId: string,
+    hint?: { sessionIdHash?: string; requestId?: string },
+  ): PiRouteSnapshot | undefined;
+  resolveCandidateRoutes?(virtualModelId: string): PiRouteSnapshot[];
+  subscribe?(listener: (event: PiRouteSnapshot) => void): () => void;
+};
+```
+Registration pattern:
+```ts
+const ROUTING = Symbol.for("pi.routing.registry.v1");
+const registry = (globalThis as Record<symbol, unknown>)[ROUTING] as
+  | { version: 1; registerRouter(adapter: PiRouterAdapterV1): () => void }
+  | undefined;
+registry?.registerRouter({
+  virtualProvider: "router",
+  resolveActiveRoute(virtualModelId, hint) {
+    return {
+      virtualProvider: "router",
+      virtualModelId,
+      provider: "deepseek",
+      modelId: "deepseek-v4",
+      api: "openai-completions",
+      sessionIdHash: hint?.sessionIdHash,
+      timestamp: Date.now(),
+    };
+  },
+});
+```
+Do not overwrite an existing registry. If your extension loads before this optimizer, retry registration on `session_start` or create the same V1 registry shape only if no registry exists.
+### Optional: query-scoped cache hints
+Routers that forward to an inner Pi request path can read query-scoped hints from `Symbol.for("pi.cache.hints.v1")`:
+```ts
+const CACHE_HINTS = Symbol.for("pi.cache.hints.v1");
+const hints = (globalThis as Record<symbol, any>)[CACHE_HINTS]?.getHints?.({
+  sessionIdHash,
+  virtualProvider: "router",
+  virtualModelId: "auto",
+  upstreamProvider: "deepseek",
+  upstreamModelId: "deepseek-v4",
+  api: "openai-completions",
+});
+```
+When the query matches the current session/route, `hints` may contain `systemPrompt`, `promptCacheKey`, and `cacheRetention: "long"`. Treat these as advisory and sensitive: do not log them, do not expose prompt text, and do not overwrite an existing request-level `prompt_cache_key` / `promptCacheKey`.
+### Security and correctness rules
+- Do not import `pi-cache-optimizer`; use `Symbol.for(...)` discovery only.
+- Do not expose API keys, prompts, payloads, headers, response bodies, or model output in route snapshots or logs.
+- Use assistant-message metadata for final attribution; live registry data is advisory and may be stale by response time.
+- Preserve truthful usage. Missing cache usage should show as 0/under-reported, not as synthetic hits.
 ## Uninstall
 ```bash

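Reviewer sketch for the "Do not overwrite an existing registry" rule in the README hunk above: a minimal, hedged illustration of "create the same V1 registry shape only if no registry exists". The helper name `getOrCreateRoutingRegistry` and the `Set`-based storage are assumptions for illustration only. The optimizer's real registry probably exposes lookup methods that this diff does not show, so treat this as a shape example rather than the package's implementation.

```typescript
// Types trimmed from the protocol shape quoted in the README diff.
type PiRouteSnapshot = {
  virtualProvider: string;
  virtualModelId: string;
  provider: string;
  modelId: string;
  api?: string;
  sessionIdHash?: string;
  timestamp: number;
};

type PiRouterAdapterV1 = {
  virtualProvider: string;
  resolveActiveRoute(
    virtualModelId: string,
    hint?: { sessionIdHash?: string; requestId?: string },
  ): PiRouteSnapshot | undefined;
};

type RoutingRegistryV1 = {
  version: 1;
  registerRouter(adapter: PiRouterAdapterV1): () => void;
};

const ROUTING = Symbol.for("pi.routing.registry.v1");

// Hypothetical helper: reuse a V1 registry if one is already published,
// install a minimal one only when the slot is empty, and never replace
// whatever another extension put there.
function getOrCreateRoutingRegistry(): RoutingRegistryV1 | undefined {
  const slot = globalThis as unknown as Record<symbol, unknown>;
  const existing = slot[ROUTING];
  if (existing !== undefined) {
    const candidate = existing as Partial<RoutingRegistryV1>;
    const isV1 = candidate.version === 1 && typeof candidate.registerRouter === "function";
    return isV1 ? (candidate as RoutingRegistryV1) : undefined; // unknown shape: leave it alone
  }
  const adapters = new Set<PiRouterAdapterV1>(); // storage layout is an assumption
  const created: RoutingRegistryV1 = {
    version: 1,
    registerRouter(adapter) {
      adapters.add(adapter);
      return () => {
        adapters.delete(adapter);
      };
    },
  };
  slot[ROUTING] = created;
  return created;
}
```

Calling the helper again on `session_start` is harmless, because the second call returns the registry that is already published instead of creating a new one.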
package/README.zh-CN.md CHANGED

@@ -20,6 +20,7 @@
 - [Anthropic adaptive thinking 模型](#anthropic-adaptive-thinking-模型)
 - [使用 `/cache-optimizer fix` 自动修复](#使用-cache-optimizer-fix-自动修复)
 - [Footer 统计](#footer-统计)
+- [Router / Virtual-channel 扩展作者指南](#router--virtual-channel-扩展作者指南)
 - [卸载](#卸载)
 - [验证效果](#验证效果)
 - [License](#license)
@@ -51,6 +52,8 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
 安装、更新或移除后，在 Pi 中运行 `/reload`，让 extension hooks 刷新。
+Pi 0.79.7 及之后，`pi update` 默认只更新 Pi 本体。若要更新已安装的 Pi package（包括本扩展），请运行 `pi update --extensions`（只更新 packages）或 `pi update --all`（Pi 与 packages 一起更新）。
 ## 命令
 | 命令 | 作用 |
@@ -213,7 +216,7 @@ Provider 级最小 override：
 统计是只读本地计数，保存在 `~/.pi/agent/pi-cache-optimizer-stats.json`，按 Pi session + provider/model 隔离。文件只包含日期和数字计数，不包含 API key、prompt、payload、headers、响应或模型输出。
-对于虚拟 routing provider，最终 assistant message 的 metadata 是权威来源：如果 message 携带真实上游 `provider`、`model` / `responseModel`、`api` 和 usage，统计会归因到真实上游 provider/model，而不是虚拟 router 外壳。Router extension 也可以在 `Symbol.for("pi.routing.registry.v1")` 下发布 live route adapter，让 footer、doctor、compat 和 reset 在最终 assistant message 出现前解析当前上游。本扩展还通过 `Symbol.for("pi.cache.hints.v1")` 暴露按查询过滤的 prompt/cache hints，供转发到内部 `streamSimple` 的 router 使用。两个协议都是可选、版本化的；不需要导入任何 router 包。
+Pi 0.79+ 已内置 footer `CH` 标记，用于显示最近一次 prompt cache hit rate。本扩展在此基础上补充持久化的 provider/model/session-scoped 计数，以及代理 compat 诊断。
 示例 footer：
@@ -227,6 +230,113 @@ OpenAI cache 3/10 · 0.002M/0.005M tok (40%) ⚠️ compat
 Adapter 选择只看模型 id/name（以及 message_end 时 assistant message 的 model/name）。仅使用 OpenAI-shaped API 不会被当作 OpenAI-family，除非模型 id/name 匹配受支持的家族。
+## Router / Virtual-channel 扩展作者指南
+如果你的 Pi 扩展提供虚拟 routing provider（例如 `router/auto`、`router/smart`，或会转发到真实上游的 profile/channel），本扩展可以为真实上游 provider/model 显示缓存统计，而不是把统计记到虚拟外壳上。集成是可选、版本化的，并且**不需要导入本包**。
+### 最小集成：最终 assistant message metadata
+要无缝获得最终缓存统计归因，请在完成的 assistant message 上透传真实上游身份：
+```ts
+{
+  role: "assistant",
+  provider: "anthropic",              // 真实上游 provider
+  responseModel: "claude-opus-4-8",   // 或 model: "..."
+  api: "anthropic-messages",          // 已知时填写上游 Pi API id
+  usage: {
+    input: 1200,       // Pi-normalized 未缓存 input tokens，如可用
+    cacheRead: 8000,   // 从 provider prompt cache 读取的 tokens
+    cacheWrite: 500,   // 本次新写入 provider prompt cache 的 tokens
+  },
+}
+```
+`message_end` 会把这些 assistant-message 字段视为权威来源。只要存在 `provider` + `model`/`responseModel` + cache usage，即使 active model 仍是 `router/auto`，统计也会更新真实上游桶。如果上游 usage 没有 cache 字段，请保持缺失或为 0；本扩展不会伪造 cache hit。
+### 可选：用于预响应 UX 的实时路由注册表
+最终 message metadata 足以支持响应后的统计。若要支持响应前流程——首次响应前的 footer 显示、`/cache-optimizer doctor`、`/cache-optimizer compat`、`/cache-optimizer reset` 和 OpenAI-compatible `prompt_cache_key` fallback——请在 `Symbol.for("pi.routing.registry.v1")` 下注册 live route adapter。
+协议形状：
+```ts
+type PiRouteSnapshot = {
+  virtualProvider: string;
+  virtualModelId: string;
+  provider: string;
+  modelId: string;
+  api?: string;
+  canonicalModelId?: string;
+  routeLabel?: string;
+  status?: "planned" | "trying" | "selected" | "success" | "failed";
+  sessionIdHash?: string;
+  requestId?: string;
+  timestamp: number;
+};
+type PiRouterAdapterV1 = {
+  virtualProvider: string;
+  resolveActiveRoute(
+    virtualModelId: string,
+    hint?: { sessionIdHash?: string; requestId?: string },
+  ): PiRouteSnapshot | undefined;
+  resolveCandidateRoutes?(virtualModelId: string): PiRouteSnapshot[];
+  subscribe?(listener: (event: PiRouteSnapshot) => void): () => void;
+};
+```
+注册模式：
+```ts
+const ROUTING = Symbol.for("pi.routing.registry.v1");
+const registry = (globalThis as Record<symbol, unknown>)[ROUTING] as
+  | { version: 1; registerRouter(adapter: PiRouterAdapterV1): () => void }
+  | undefined;
+registry?.registerRouter({
+  virtualProvider: "router",
+  resolveActiveRoute(virtualModelId, hint) {
+    return {
+      virtualProvider: "router",
+      virtualModelId,
+      provider: "deepseek",
+      modelId: "deepseek-v4",
+      api: "openai-completions",
+      sessionIdHash: hint?.sessionIdHash,
+      timestamp: Date.now(),
+    };
+  },
+});
+```
+不要覆盖已有 registry。如果你的扩展比本优化器更早加载，请在 `session_start` 时重试注册，或仅在 registry 不存在时创建同样的 V1 registry 形状。
+### 可选：按查询过滤的缓存提示
+会转发到内部 Pi 请求路径的 router，可以从 `Symbol.for("pi.cache.hints.v1")` 读取按查询过滤的提示：
+```ts
+const CACHE_HINTS = Symbol.for("pi.cache.hints.v1");
+const hints = (globalThis as Record<symbol, any>)[CACHE_HINTS]?.getHints?.({
+  sessionIdHash,
+  virtualProvider: "router",
+  virtualModelId: "auto",
+  upstreamProvider: "deepseek",
+  upstreamModelId: "deepseek-v4",
+  api: "openai-completions",
+});
+```
+当查询匹配当前 session/route 时，`hints` 可能包含 `systemPrompt`、`promptCacheKey` 和 `cacheRetention: "long"`。这些提示是参考信息且可能敏感：不要记录日志，不要暴露 prompt 文本，也不要覆盖已有 request-level `prompt_cache_key` / `promptCacheKey`。
+### 安全与正确性规则
+- 不要导入 `pi-cache-optimizer`；只使用 `Symbol.for(...)` 发现协议。
+- 不要在 route snapshot 或日志中暴露 API key、prompt、payload、headers、response body 或模型输出。
+- 最终归因使用 assistant-message metadata；live registry 只是参考信息，到响应完成时可能已经过期。
+- 保持 usage 真实。缺失 cache usage 时应该显示 0 或低报，而不是合成命中。
 ## 卸载
 ```bash

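Reviewer sketch for the cache-hints paragraph that appears in both README variants: hints are advisory and must not overwrite a request-level `prompt_cache_key` / `promptCacheKey`. The `CacheHints` type is inferred from the three field names the README mentions, and `applyPromptCacheKeyHint` is a hypothetical helper, not part of the package.

```typescript
// Field names come from the README text; everything else is an assumption.
type CacheHints = {
  systemPrompt?: string;
  promptCacheKey?: string;
  cacheRetention?: "long";
};

// Hypothetical helper: copy the advisory key into an OpenAI-compatible
// payload only when the request does not already carry one.
function applyPromptCacheKeyHint(
  payload: Record<string, unknown>,
  hints: CacheHints | undefined,
): Record<string, unknown> {
  const alreadyKeyed =
    typeof payload["prompt_cache_key"] === "string" ||
    typeof payload["promptCacheKey"] === "string";
  if (alreadyKeyed || !hints?.promptCacheKey) return payload;
  // Return a copy so the caller's payload object is not mutated.
  return { ...payload, prompt_cache_key: hints.promptCacheKey };
}
```

The helper deliberately ignores `systemPrompt`, which the README flags as sensitive, and it logs nothing.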
package/index.ts CHANGED

@@ -1,5 +1,6 @@
 import { createHash } from "node:crypto";
 import { copyFile, mkdir, readFile, rename, unlink, writeFile } from "node:fs/promises";
+import { readFileSync } from "node:fs";
 import { homedir } from "node:os";
 import { dirname, join } from "node:path";
 import type { BuildSystemPromptOptions, ExtensionAPI, ExtensionContext } from "@earendil-works/pi-coding-agent";
@@ -958,12 +959,12 @@ function getNonNegativeNumber(record: UnknownRecord, key: string): number | unde
  */
 function getCompat(model: PiModel | undefined): CacheCompat {
   if (!model) return {} as CacheCompat;
   // Pi merges provider.compat with model.compat (model wins on conflicts).
   // ctx.model should already carry the merged result, so only model.compat
   // is read here; provider-level compat is not consulted separately.
   const modelCompat = (model.compat ?? {}) as CacheCompat;
   // Note: ctx.model from Pi should already contain merged compat,
   // but we document the two-level structure for clarity
   return modelCompat;
@@ -1941,6 +1942,13 @@ function describeMissingOpenAICompatibleProxyCompat(model: PiModel): string[] {
     missing.push("sendSessionAffinityHeaders");
   }
+  // NOTE: supportsLongCacheRetention is intentionally NOT checked here.
+  // Per spec, it is optional/risky advisory text only and must NOT trigger
+  // the ⚠️ compat marker. The before_provider_request hook proactively
+  // strips prompt_cache_retention for models without explicit opt-in,
+  // so 400 errors are prevented regardless of this compat flag.
+  // Doctor/compat may mention it as optional guidance separately.
   return missing;
 }
@@ -1963,6 +1971,9 @@ function buildSafeOpenAIProxyCompatSuggestion(missing: string[]): Record<string,
   if (missing.includes("sendSessionAffinityHeaders")) {
     suggestion.sendSessionAffinityHeaders = true;
   }
+  // supportsLongCacheRetention is NOT suggested here — per spec it is
+  // optional/risky and must not appear in the copyable safe snippet.
+  // The proactive stripping in before_provider_request handles 400 prevention.
   return suggestion;
 }
@@ -1984,6 +1995,10 @@ function hasPromptCacheRetentionUnsupportedSignal(headers: Record<string, string
     "unknown parameter",
     "not supported",
     "unsupported field",
+    "extra inputs",
+    "not permitted",
+    "unrecognized",
+    "bad request",
   ].some((needle) => normalized.includes(needle));
 }
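Reviewer sketch of the substring matching that the hunk above extends. Only the visible needles are listed. Any earlier entries, the way `normalized` is built from headers or body, and any extra checks are not shown in the diff, so lowercasing a single error string is an assumption here. The function name is hypothetical.

```typescript
// Needles visible in the hunk: three existing entries plus the four added in 2.6.9.
const VISIBLE_NEEDLES = [
  "unknown parameter",
  "not supported",
  "unsupported field",
  "extra inputs",
  "not permitted",
  "unrecognized",
  "bad request",
];

// Hypothetical stand-in for the matching step only.
function looksLikeUnsupportedParameter(errorText: string): boolean {
  const normalized = errorText.toLowerCase(); // assumed normalization
  return VISIBLE_NEEDLES.some((needle) => normalized.includes(needle));
}
```

Note that `"bad request"` is broad: taken in isolation it matches any generic 400 message, so the surrounding code presumably narrows the context before this check runs.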
@@ -4714,6 +4729,291 @@ function locateModelInJsonc(
   };
 }
+/**
+ * Scan produced by `analyzeModelsJsonForMissingEntry` when
+ * `locateModelInJsonc` cannot find the target provider/model.
+ */
+type MissingEntryDiagnosis =
+  | { scenario: "provider_missing"; providersEnd: number }
+  | { scenario: "model_missing"; modelsEnd: number; providerBrace: number; providerEndBrace: number }
+  | { scenario: "provider_without_models"; providerBrace: number; providerEndBrace: number };
+/**
+ * Light second-pass scan that determines *why* `locateModelInJsonc` failed.
+ * Returns a structured diagnosis so the fix handler can compose targeted
+ * guidance and an optional surgical insertion for API-logged-in models
+ * (e.g. opencode go) that never appear in `models.json`.
+ */
+function analyzeModelsJsonForMissingEntry(
+  text: string,
+  providerLabel: string,
+  modelId: string,
+): MissingEntryDiagnosis | undefined {
+  const clean = stripJsoncComments(text);
+  const rootBrace = skipJsonWhitespace(clean, 0);
+  if (clean[rootBrace] !== "{") return undefined;
+  const providersKey = findJsonObjectKey(clean, rootBrace, "providers");
+  if (!providersKey) {
+    // Root has no "providers" key at all — we don't auto-create one.
+    return undefined;
+  }
+  const providersBrace = skipJsonWhitespace(clean, providersKey.valueStart);
+  if (clean[providersBrace] !== "{") return undefined;
+  const providersEnd = findMatchingBracket(clean, providersBrace);
+  if (providersEnd === undefined) return undefined;
+  const providerKey = findJsonObjectKey(clean, providersBrace, providerLabel);
+  if (!providerKey || providerKey.keyStart > providersEnd) {
+    return { scenario: "provider_missing", providersEnd };
+  }
+  // Provider exists. Check for a models array so we know where to append.
+  const providerBrace = skipJsonWhitespace(clean, providerKey.valueStart);
+  if (clean[providerBrace] !== "{") return undefined;
+  const providerEndBrace = findMatchingBracket(clean, providerBrace);
+  if (providerEndBrace === undefined || providerEndBrace > providersEnd) return undefined;
+  const modelsKey = findJsonObjectKey(clean, providerBrace, "models");
+  if (modelsKey && modelsKey.keyStart < providerEndBrace) {
+    let mScan = skipJsonWhitespace(clean, modelsKey.valueStart);
+    if (clean[mScan] === "[") {
+      const modelsEnd = findMatchingBracket(clean, mScan);
+      if (modelsEnd !== undefined && modelsEnd <= providerEndBrace) {
+        return { scenario: "model_missing", modelsEnd, providerBrace, providerEndBrace };
+      }
+    }
+  }
+  // Provider exists, but there's no discoverable models array — treat as
+  // a provider that needs one.
+  return { scenario: "provider_without_models", providerBrace, providerEndBrace };
+}
+/**
+ * Build a copyable manual-edit snippet for the missing entry. Used when the
+ * terminal is non-interactive or the user chooses to edit by hand.
+ * Returns a complete provider→model→compat JSON block that the user can
+ * paste into `models.json` under `providers`.
+ */
+function formatMissingEntryManualSnippet(
+  providerLabel: string,
+  modelId: string,
+  compatKeys: Record<string, unknown>,
+): string {
+  const lines: string[] = [];
+  const sorted = Object.entries(compatKeys).sort(([a], [b]) => a.localeCompare(b));
+  const compatItems = sorted.map(([k, v]) => `      ${JSON.stringify(k)}: ${JSON.stringify(v)}`);
+  lines.push(`"${providerLabel}": {`);
+  lines.push(`    "models": [`);
+  lines.push(`      {`);
+  lines.push(`        "id": ${JSON.stringify(modelId)},`);
+  lines.push(`        "compat": {`);
+  lines.push(compatItems.join(",\n"));
+  lines.push(`        }`);
+  lines.push(`      }`);
+  lines.push(`    ]`);
+  lines.push(`  }`);
+  return lines.join("\n");
+}
+/**
+ * Surgically insert the missing provider/model entry into the original
+ * JSONC text. Returns the modified text and placement descriptor.
+ *
+ * Handles three scenarios:
+ * - `model_missing`: append a new model object to the provider's `models` array.
+ * - `provider_missing`: append a new provider block to the root `providers` object.
+ * - `provider_without_models`: inject a `"models": [...]` key into the existing provider.
+ */
+function composeMissingEntryInsertion(
+  originalText: string,
+  diagnosis: MissingEntryDiagnosis,
+  providerLabel: string,
+  modelId: string,
+  compatKeys: Record<string, unknown>,
+): { modifiedText: string; placementLabel: string } {
+  // Return the leading whitespace of the line that contains `offset`
+  // (a string index into the original text), or two spaces if there is none.
+  const indentUnitAt = (offset: number): string => {
+    const ls = originalText.lastIndexOf("\n", offset);
+    const line = originalText.slice(ls < 0 ? 0 : ls + 1, offset);
+    const m = line.match(/^(\s+)/);
+    return m ? m[1] : "  ";
+  };
+  // Figure out the base indent from the insertion point's own line.
+  // Then derive inner indents (+1 and +2 levels).
+  const sorted = Object.entries(compatKeys).sort(([a], [b]) => a.localeCompare(b));
+  const formatCompactCompat = (indent: string): string => {
+    // Single-line compact when there's only one key, multi-line otherwise.
+    if (sorted.length === 1) {
+      const [k, v] = sorted[0];
+      return `{ ${JSON.stringify(k)}: ${JSON.stringify(v)} }`;
+    }
+    return (
+      "{\n" +
+      sorted.map(([k, v]) => `${indent}${JSON.stringify(k)}: ${JSON.stringify(v)}`).join(",\n") +
+      "\n" +
+      indent.slice(0, -2) +
+      "}"
+    );
+  };
+  if (diagnosis.scenario === "model_missing") {
+    // Append to the provider's models array, right before `]`.
+    const unit = indentUnitAt(diagnosis.modelsEnd);
+    const inner0 = unit + unit; // indent of the model object's braces
+    const inner1 = inner0 + unit; // indent of the model object's keys ("id", "compat")
+    const inner2 = inner1 + unit; // indent of the keys inside compat
+    // Determine whether the array is empty (need to skip the leading newline).
+    const arrayInterior = originalText.slice(
+      originalText.lastIndexOf("[", diagnosis.modelsEnd) + 1,
+      diagnosis.modelsEnd,
+    ).trim();
+    const hasExistingElements = arrayInterior.length > 0;
+    const compatBlock = formatCompactCompat(inner2);
+    const modelBlock = [
+      hasExistingElements ? "," : "",
+      inner0 + "{",
+      inner1 + `"id": ${JSON.stringify(modelId)},`,
+      inner1 + `"compat": ` + compatBlock,
+      inner0 + "}",
+      unit,
+    ].filter(Boolean).join("\n");
+    const insertionPoint = diagnosis.modelsEnd;
+    const prefix = originalText.slice(0, insertionPoint);
+    const suffix = originalText.slice(insertionPoint); // starts with `]`
+    return {
+      modifiedText: prefix + modelBlock + suffix,
+      placementLabel: `providers["${providerLabel}"] -> models -> (new entry for "${modelId}")`,
+    };
+  }
+  if (diagnosis.scenario === "provider_missing") {
+    // Append a new provider entry to the root `providers` object, right
+    // before its closing `}`.
+    const unit = indentUnitAt(diagnosis.providersEnd);
+    const inner0 = unit + unit;
+    const inner1 = inner0 + unit;
+    const inner2 = inner1 + unit;
+    const inner3 = inner2 + unit;
+    const compatBlock = formatCompactCompat(inner3);
+    const providersInterior = originalText.slice(
+      originalText.lastIndexOf("{", diagnosis.providersEnd) + 1,
+      diagnosis.providersEnd,
+    ).trim();
+    const hasExisting = providersInterior.length > 0;
+    const providerBlock = [
+      hasExisting ? "," : "",
+      inner0 + `"${providerLabel}": {`,
+      inner1 + `"models": [`,
+      inner2 + "{",
+      inner3 + `"id": ${JSON.stringify(modelId)},`,
+      inner3 + `"compat": ` + compatBlock,
+      inner2 + "}",
+      inner1 + "]",
+      inner0 + "}",
+      unit,
+    ].filter(Boolean).join("\n");
+    const insertionPoint = diagnosis.providersEnd;
+    const prefix = originalText.slice(0, insertionPoint);
+    const suffix = originalText.slice(insertionPoint);
+    return {
+      modifiedText: prefix + providerBlock + suffix,
+      placementLabel: `providers -> (new entry "${providerLabel}")`,
+    };
+  }
+  // `provider_without_models`: inject a models array key into the
+  // existing provider block, right after the provider's opening `{`.
+  const unit = indentUnitAt(diagnosis.providerBrace);
+  const inner0 = unit + unit;
+  const inner1 = inner0 + unit;
+  const inner2 = inner1 + unit;
+  const compatBlock = formatCompactCompat(inner2);
+  const afterBrace = diagnosis.providerBrace + 1;
+  const modelsBlock = [
+    "",
+    inner0 + `"models": [`,
+    inner1 + "{",
+    inner2 + `"id": ${JSON.stringify(modelId)},`,
+    inner2 + `"compat": ` + compatBlock,
+    inner1 + "}",
+    inner0 + "],",
+    unit,
+  ].join("\n");
+  return {
+    modifiedText: originalText.slice(0, afterBrace) + modelsBlock + originalText.slice(afterBrace),
+    placementLabel: `providers["${providerLabel}"] -> (new "models" array with "${modelId}")`,
+  };
+}
+/**
+ * Lightweight self-check for a newly inserted entry.
+ * Parses the modified text as JSONC and confirms:
+ *   1. The target model exists under the provider.
+ *   2. Every compat key has the expected value (merged provider+model).
+ * Returns null on success, an error string on failure.
+ */
+function selfCheckMissingEntryInsertion(
+  originalText: string,
+  modifiedText: string,
+  providerLabel: string,
+  modelId: string,
+  compatKeys: Record<string, unknown>,
+): string | null {
+  try {
+    const modParsed = parseJsonc(modifiedText);
+    const providers = asRecord(asRecord(modParsed)?.providers);
+    if (!providers) return "Modified file: providers object missing or invalid";
+    const provider = asRecord(providers[providerLabel]);
+    if (!provider) return `Modified file: provider "${providerLabel}" not found`;
+    const models = provider.models;
+    if (!Array.isArray(models)) return `Modified file: provider "${providerLabel}".models is not an array`;
+    const targetModel = models.find((m: unknown) => asRecord(m)?.id === modelId);
+    if (!targetModel || typeof targetModel !== "object")
+      return `Modified file: model "${modelId}" not found in provider after insertion`;
+    // Validate effective merged compat
+    const provCompatRaw = (provider as Record<string, unknown>).compat;
+    const provCompat = (provCompatRaw && typeof provCompatRaw === "object" && !Array.isArray(provCompatRaw))
+      ? provCompatRaw as Record<string, unknown>
+      : {};
+    const mdlCompatRaw = (targetModel as Record<string, unknown>).compat;
+    const mdlCompat = (mdlCompatRaw && typeof mdlCompatRaw === "object" && !Array.isArray(mdlCompatRaw))
+      ? mdlCompatRaw as Record<string, unknown>
+      : {};
+    const merged = { ...provCompat, ...mdlCompat };
+    for (const [k, v] of Object.entries(compatKeys)) {
+      if (!(k in merged)) return `Modified file: effective compat.${k} not found`;
+      if (merged[k] !== v) return `Modified file: effective compat.${k} wrong value`;
+    }
+    if (modifiedText.length < originalText.length)
+      return "Modified file: content is shorter than original (possible truncation)";
+    const modClean = stripJsoncComments(modifiedText);
+    const rootStart = skipJsonWhitespace(modClean, 0);
+    const rootEnd = findMatchingBracket(modClean, rootStart);
+    if (rootEnd === undefined) return "Modified file: root bracket mismatch";
+    if (skipJsonWhitespace(modClean, rootEnd + 1) !== modClean.length)
+      return "Modified file: trailing content after root object";
+    return null;
+  } catch (e) {
+    return `Self-check error: ${e instanceof Error ? e.message : String(e)}`;
+  }
+}
 /**
  * Deep-equal comparison of two values, used for post-write self-check.
  * Compares all keys recursively, allowing `extraKeys` to be present in `a` but not in `b`.
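Reviewer sketch of the effective-compat check used by `selfCheckMissingEntryInsertion` above: provider-level and model-level compat are spread into one object with the model level winning, and each expected key is then compared strictly. `findCompatMismatch` is a hypothetical name that isolates that step so it can be read and tested on its own.

```typescript
type Compat = Record<string, unknown>;

// Returns null when every expected key has the expected value in the merged
// compat, otherwise a short description of the first mismatch.
function findCompatMismatch(
  providerCompat: Compat,
  modelCompat: Compat,
  expected: Compat,
): string | null {
  // Later spread wins, so model-level values override provider-level ones.
  const merged: Compat = { ...providerCompat, ...modelCompat };
  for (const [key, value] of Object.entries(expected)) {
    if (!(key in merged)) return `effective compat.${key} not found`;
    if (merged[key] !== value) return `effective compat.${key} wrong value`;
  }
  return null;
}
```

Because the comparison is `!==`, this approach only suits primitive compat values such as booleans and strings; nested objects would need a deep comparison.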
@@ -5003,7 +5303,7 @@ function selfCheckFix(
     if (models.length === 0) {
       return `Modified file: provider "${providerLabel}".models is empty`;
     }
     // Step 4: Find and validate target model
     const targetModel = models.find((m: Record<string, unknown>) => m.id === modelId);
     if (!targetModel || typeof targetModel !== 'object') {
@@ -5021,7 +5321,7 @@ function selfCheckFix(
     if (!origProvider || !origTargetModelRecord) {
       return `Original file: provider/model "${providerLabel}/${modelId}" not found`;
     }
     // Step 5: Compute the EFFECTIVE merged compat (provider-level + model-level),
     // mirroring Pi's mergeCompat behavior (model wins on conflicts). The fix may
     // have written either level, so validation must check the merged result.
@@ -5045,7 +5345,7 @@ function selfCheckFix(
         return `Modified file: effective compat.${k} has wrong value: expected ${JSON.stringify(v)}, got ${JSON.stringify(mergedCompat[k])}`;
       }
     }
     // Step 7: Validate original structure is preserved (no accidental deletions/changes)
     function isSubset(origVal: unknown, modVal: unknown, path = ''): boolean {
@@ -5092,12 +5392,12 @@ function selfCheckFix(
     if (!isSubset(origParsed, modParsed)) {
       return "Modified file: original structure was altered (data loss detected)";
     }
     // Step 8: Basic format sanity checks
     if (modified.length < original.length) {
       return "Modified file: content is shorter than original (possible truncation)";
     }
     // Step 9: Validate root bracket integrity with the same string/comment-aware
     // scanner used for edits. Do not count raw braces: comments or strings may
     // legitimately contain unmatched `{` / `}` bytes.
@@ -5791,9 +6091,56 @@ export default function (pi: ExtensionAPI) {
   ensureRoutingRegistry();
+  /**
+   * Check whether a model has an EXPLICIT supportsLongCacheRetention: true
+   * opt-in in models.json (either at provider-level or model-level).
+   * Model-level compat takes precedence over provider-level (mirrors Pi's
+   * mergeCompat behaviour: model wins on conflicts).
+   *
+   * Returns true ONLY when the user explicitly opted in. Returns false for:
+   *   - Explicit false (opt-out)
+   *   - In models.json but field absent (Pi defaults to true — unsafe)
+   *   - Not in models.json at all (API-logged-in providers)
+   *   - File missing/unreadable
+   *
+   * The caller strips prompt_cache_retention when this returns false.
+   */
+  function hasExplicitLongRetentionOptIn(model: PiModel): boolean {
+    try {
+      const text = readFileSync(MODELS_JSON_PATH, "utf8");
+      const parsed = parseJsonc(text);
+      const providers = asRecord(asRecord(parsed)?.providers);
+      if (!providers) return false;
+      const prov = asRecord(providers[model.provider]);
+      if (!prov) return false;
+      // Check model-level first (higher priority in Pi's merge logic)
+      const models = prov.models;
+      if (Array.isArray(models)) {
+        const modelEntry = models.find(m => asRecord(m)?.id === model.id);
+        if (modelEntry) {
+          const modelCompat = asRecord(asRecord(modelEntry)?.compat);
+          if (modelCompat?.supportsLongCacheRetention !== undefined) {
+            return modelCompat.supportsLongCacheRetention === true;
+          }
+        }
+      }
+      // Check provider-level
+      const provCompat = asRecord(prov.compat);
+      if (provCompat?.supportsLongCacheRetention !== undefined) {
+        return provCompat.supportsLongCacheRetention === true;
+      }
+      return false;
+    } catch {
+      return false;
+    }
+  }
   pi.on("session_start", async (event, ctx) => {
     await restoreCacheStats(event.reason, ctx);
-    if (runtimeOptimizerEnabled) notifyCacheCompatIfNeeded(resolveRouteModel(ctx.model, ctx) ?? ctx.model, ctx, warnedModels);
     await publishStatus(ctx);
   });
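Reviewer sketch of the lookup order in `hasExplicitLongRetentionOptIn`, rewritten as a pure function over an already-parsed config so the precedence is easy to test. The local `asRecord` is a stand-in, since the package's own helper is not shown in the diff, and `explicitLongRetentionOptIn` is a hypothetical name. File reading and JSONC parsing are left out.

```typescript
type Json = Record<string, unknown>;

// Stand-in for the package's asRecord helper (implementation assumed).
function asRecord(value: unknown): Json | undefined {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    ? (value as Json)
    : undefined;
}

// True only for an explicit `supportsLongCacheRetention: true`;
// a model-level setting, when present, overrides the provider-level one.
function explicitLongRetentionOptIn(parsed: unknown, provider: string, modelId: string): boolean {
  const prov = asRecord(asRecord(asRecord(parsed)?.["providers"])?.[provider]);
  if (!prov) return false;

  const models = prov["models"];
  if (Array.isArray(models)) {
    const entry = models.find((m) => asRecord(m)?.["id"] === modelId);
    const modelFlag = asRecord(asRecord(entry)?.["compat"])?.["supportsLongCacheRetention"];
    if (modelFlag !== undefined) return modelFlag === true;
  }

  const providerFlag = asRecord(prov["compat"])?.["supportsLongCacheRetention"];
  return providerFlag === true;
}
```

An absent field returns `false`, which matches the doc comment's point that Pi's implicit default of `true` is not treated as an opt-in.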
@@ -5915,6 +6262,39 @@ export default function (pi: ExtensionAPI) {
   });
   pi.on("before_provider_request", (event, ctx) => {
+    // ── Safety: strip prompt_cache_retention from payload for models that
+    // are not authorised to send it. Pi defaults supportsLongCacheRetention
+    // to true for all openai-completions models, but most third-party APIs
+    // reject the parameter with 400 “Extra inputs are not permitted”.
+    //
+    // Gate order (first match wins):
+    //   1. Official OpenAI          → keep (trusted to support it)
+    //   2. 400 history              → strip (empirical evidence overrides user config)
+    //   3. Explicit opt-in in models.json → keep (user explicitly wants it)
+    //   4. Everything else          → strip (safe default for third-party APIs)
+    //
+    // Gate 2 before Gate 3 is critical: if a user explicitly opted in but
+    // the API returned 400, we must strip — otherwise the 400 repeats forever.
+    if (runtimeOptimizerEnabled) {
+      const payload = event.payload as UnknownRecord;
+      if (payload && typeof payload.prompt_cache_retention === 'string') {
+        const rModel = resolveRouteModel(ctx.model, ctx) ?? ctx.model;
+        if (rModel) {
+          if (isOfficialOpenAIBaseUrl(rModel)) {
+            // Gate 1: Official OpenAI → keep
+          } else if (promptCacheRetention400Models.has(modelKey(rModel))) {
+            // Gate 2: 400 history → strip (overrides user opt-in)
+            delete payload.prompt_cache_retention;
+          } else if (hasExplicitLongRetentionOptIn(rModel)) {
+            // Gate 3: Explicit user opt-in → keep
+          } else {
+            // Gate 4: Safe default → strip
+            delete payload.prompt_cache_retention;
+          }
+        }
+      }
+    }
     if (!shouldInjectOpenAIPromptCacheKey()) return undefined;
     const requestModel = resolveRouteModel(ctx.model, ctx) ?? ctx.model;
     if (!isOpenAICompatibleApi(requestModel?.api)) return undefined;
@@ -6110,7 +6490,23 @@ export default function (pi: ExtensionAPI) {
           return;
         }
-        const suggestion = buildFixSuggestion(model);
+        let suggestion = buildFixSuggestion(model);
+        // If no regular missing compat flags but the model has a recorded
+        // prompt_cache_retention 400 (Pi sent `prompt_cache_retention` and
+        // the provider rejected it), offer to override
+        // `supportsLongCacheRetention` to false in models.json.
+        if (!suggestion && isPromptCacheRetention400Applicable(model) && promptCacheRetention400Models.has(modelKey(model))) {
+          const key = modelKey(model);
+          const slashIdx = key.indexOf("/");
+          const providerLabel = slashIdx > 0 ? key.slice(0, slashIdx) : key;
+          suggestion = {
+            providerLabel,
+            modelId: model.id,
+            compatKeys: { supportsLongCacheRetention: false },
+          };
+        }
         if (!suggestion) {
           const key = modelKey(model);
           cmdCtx.ui.notify(`✅ Nothing to fix for "${key}". Compat already configured.`, "info");
@@ -6120,24 +6516,40 @@ export default function (pi: ExtensionAPI) {
         if (!cmdCtx.hasUI) {
           // No UI — refuse to write, show manual guidance instead.
           const compatResult = buildCompatDiagnosis(model);
-          if (compatResult) {
-            cmdCtx.ui.notify(
-              `❌ Non-interactive terminal detected. Auto-fix requires UI confirmation.\n\n` +
-              `Manual steps:\n` +
-              `1. Open ${getModelsJsonDisplayPath()} in your editor.\n` +
-              `2. Go to providers["${suggestion.providerLabel}"] -> models -> entry with id "${suggestion.modelId}" -> compat.\n` +
-              `3. Add the missing keys:\n${formatCompatKeysForInsertion(suggestion.compatKeys)}\n` +
-              `4. Save and run /reload.\n\n` +
-              compatResult,
-              "warning",
+          const snippet = formatMissingEntryManualSnippet(
+            suggestion.providerLabel, suggestion.modelId, suggestion.compatKeys,
+          );
+          const manualLines = [
+            `❌ Non-interactive terminal detected. Auto-fix requires UI confirmation.`,
+            "",
+            `Edit ${getModelsJsonDisplayPath()} and run /reload.`,
+          ];
+          if (promptCacheRetention400Models.has(modelKey(model))) {
+            manualLines.push(
+              "",
+              "💡 This model returned HTTP 400 for prompt_cache_retention.",
+              "Create or edit the entry below to override supportsLongCacheRetention to false.",
             );
-          } else {
-            cmdCtx.ui.notify(
-              `❌ Non-interactive terminal detected. Auto-fix requires UI confirmation.\n` +
-              `Edit ${getModelsJsonDisplayPath()} manually and run /reload.`,
-              "warning",
+          }
+          manualLines.push(
+            "",
+            "If the provider/model already exists in models.json, add these compat keys under",
+            `providers["${suggestion.providerLabel}"] -> models -> entry with id "${suggestion.modelId}" -> compat:`,
+            formatCompatKeysForInsertion(suggestion.compatKeys),
+          );
+          if (snippet.length > 0) {
+            manualLines.push(
+              "",
+              "If the provider/model is missing (common for API-logged-in channels such as",
+              `opencode go), add a minimal entry under "providers" (keep existing auth as-is):`,
+              "",
+              snippet,
             );
           }
+          if (compatResult) {
+            manualLines.push("", compatResult);
+          }
+          cmdCtx.ui.notify(manualLines.join("\n"), "warning");
           return;
         }
@@ -6150,17 +6562,127 @@ export default function (pi: ExtensionAPI) {
           return;
         }
-        // Locate the model entry
+        // Locate the model entry. API-logged-in providers (e.g. opencode go)
+        // may not appear in models.json at all.
         const location = locateModelInJsonc(originalText, suggestion.providerLabel, suggestion.modelId);
         if (!location) {
-          cmdCtx.ui.notify(
-            `❌ Could not locate model "${suggestion.modelId}" in ${getModelsJsonDisplayPath()}.\n` +
-            `The JSONC scanner could not confidently find the target entry.\n` +
-            `Manual edit required: open the file, find providers["${suggestion.providerLabel}"] -> models, and add:\n` +
-            `${formatCompatKeysForInsertion(suggestion.compatKeys)}\n` +
-            `Then run /reload.`,
-            "warning",
-          );
+          const diagnosis = analyzeModelsJsonForMissingEntry(originalText, suggestion.providerLabel, suggestion.modelId);
+          if (diagnosis && cmdCtx.hasUI) {
+            // Offer to create the missing entry.
+            const plan = composeMissingEntryInsertion(
+              originalText, diagnosis,
+              suggestion.providerLabel, suggestion.modelId, suggestion.compatKeys,
+            );
+            const checkError = selfCheckMissingEntryInsertion(
+              originalText, plan.modifiedText,
+              suggestion.providerLabel, suggestion.modelId, suggestion.compatKeys,
+            );
+            if (checkError !== null) {
+              // Fall through to manual guidance.
+              cmdCtx.ui.notify(
+                `❌ Self-check would fail for auto-created entry: ${checkError}\n` +
+                `Falling back to manual guidance. No changes were made.`,
+                "error",
+              );
+              // Continue to manual guidance below.
+            } else {
+              const keysPreview = JSON.stringify(suggestion.compatKeys, null, 2);
+              const ts = backupTimestamp();
+              const backupPath = `${MODELS_JSON_PATH}.backup-cache-optimizer-${ts}`;
+              const previewLines = [
+                `📝 Preview of changes to ${getModelsJsonDisplayPath()}:`,
+                ``,
+                `Location: ${plan.placementLabel}`,
+                `Compat JSON to write:`,
+                keysPreview,
+                ``,
+                `⚠️  Risk notice:`,
+                `  1. This creates a new entry in models.json. Existing auth (e.g. login API tokens) is not affected.`,
+                `  2. A timestamped backup will be written to: ${backupPath}`,
+                `  3. You must run /reload or restart Pi for the change to take effect.`,
+                `  4. If the file contains comments or unusual formatting, please verify the result after write.`,
+              ];
+              if (promptCacheRetention400Models.has(modelKey(model))) {
+                previewLines.push(
+                  "",
+                  "💡  This fix overrides supportsLongCacheRetention to false because",
+                  "a 400 prompt_cache_retention error was observed for this model.",
+                  "After applying and reloading, Pi will no longer send the",
+                  "prompt_cache_retention parameter to this provider.",
+                );
+              }
+              previewLines.push("", `Apply these changes?`);
+              const confirmed = await cmdCtx.ui.confirm("Cache Optimizer — Fix (new entry)", previewLines.join("\n"));
+              if (confirmed) {
+                try {
+                  await copyFile(MODELS_JSON_PATH, backupPath);
+                  const tempPath = `${MODELS_JSON_PATH}.${process.pid}.${Date.now()}.fix.tmp`;
+                  await writeFile(tempPath, plan.modifiedText, "utf8");
+                  await rename(tempPath, MODELS_JSON_PATH);
+                  const writtenText = await readFile(MODELS_JSON_PATH, "utf8");
+                  const postErr = selfCheckMissingEntryInsertion(
+                    originalText, writtenText,
+                    suggestion.providerLabel, suggestion.modelId, suggestion.compatKeys,
+                  );
+                  if (postErr !== null) {
+                    await copyFile(backupPath, MODELS_JSON_PATH);
+                    cmdCtx.ui.notify(
+                      `❌ Post-write self-check failed: ${postErr}\n` +
+                      `The backup at ${backupPath} has been restored. No changes applied.`,
+                      "error",
+                    );
+                    return;
+                  }
+                  cmdCtx.ui.notify(
+                    `✅ Fix applied to ${getModelsJsonDisplayPath()}.\n` +
+                    `Backup saved to: ${backupPath}\n` +
+                    `Run /reload or restart Pi for the change to take effect.`,
+                    "info",
+                  );
+                } catch (e) {
+                  cmdCtx.ui.notify(
+                    `❌ Write failed: ${e instanceof Error ? e.message : String(e)}`,
+                    "error",
+                  );
+                }
+                return;
+              }
+              cmdCtx.ui.notify("No changes were made. Canceled by user.", "info");
+              return;
+            }
+          }
+          // Non-interactive or no diagnosis: show manual guidance.
+          const snippet = diagnosis
+            ? formatMissingEntryManualSnippet(suggestion.providerLabel, suggestion.modelId, suggestion.compatKeys)
+            : formatCompatKeysForInsertion(suggestion.compatKeys);
+          const adviceLines: string[] = [];
+          if (!diagnosis) {
+            adviceLines.push(
+              `❌ Could not locate model "${suggestion.modelId}" or provider "${suggestion.providerLabel}" in ${getModelsJsonDisplayPath()}.`,
+              "",
+              "Providers that were added via Pi /login API (e.g. opencode go) do not have",
+              "entries in models.json. You can create a minimal compat-only entry by hand:",
+            );
+          } else if (diagnosis.scenario === "provider_missing") {
+            adviceLines.push(
+              `ℹ️ Provider "${suggestion.providerLabel}" does not exist in ${getModelsJsonDisplayPath()}.`,
+              `This is common for API-logged-in providers (e.g. /login ...).`,
+              "",
+              "Add the following minimal block under the \"providers\" key (keep your",
+              "existing authentication as-is):",
+            );
+          } else {
+            adviceLines.push(
+              `ℹ️ Model "${suggestion.modelId}" was not found in ${getModelsJsonDisplayPath()}`,
+              `under providers["${suggestion.providerLabel}"].`,
+              "",
+              "Add the following entry to the models array (keep existing auth):",
+            );
+          }
+          adviceLines.push("", snippet, "", "Then save and run /reload.");
+          cmdCtx.ui.notify(adviceLines.join("\n"), "warning");
           return;
         }
@@ -6203,15 +6725,23 @@ export default function (pi: ExtensionAPI) {
           `Placement: ${decision.placement} level — ${decision.reason}`,
           `Compat JSON to write:`,
           keysPreview,
-          ``,
+          ``,
           `⚠️  Risk notice:`,
           scopeRiskLine,
           `  2. A timestamped backup will be written to: ${backupPath}`,
           `  3. You must restart Pi / run /reload for the change to take effect.`,
           `  4. If the file contains comments or unusual formatting, please verify the result after write.`,
-          ``,
-          `Apply these changes?`,
         ];
+        if (promptCacheRetention400Models.has(modelKey(model))) {
+          previewLines.push(
+            "",
+            "💡  This fix overrides supportsLongCacheRetention to false because",
+            "a 400 prompt_cache_retention error was observed for this model.",
+            "After applying and reloading, Pi will no longer send the",
+            "prompt_cache_retention parameter to this provider.",
+          );
+        }
+        previewLines.push("", `Apply these changes?`);
         const confirmed = await cmdCtx.ui.confirm("Cache Optimizer — Fix", previewLines.join("\n"));
         if (!confirmed) {
@@ -6383,13 +6913,13 @@ export default function (pi: ExtensionAPI) {
               `Placement: ${menuDecision.placement} level — ${menuDecision.reason}`,
               `Compat JSON to write:`,
               keysPreview,
-              ``,
+              ``,
               `⚠️  Risk notice:`,
               menuScopeRiskLine,
               `  2. A timestamped backup will be written to: ${backupPath}`,
               `  3. You must restart Pi / run /reload for the change to take effect.`,
               `  4. If the file contains comments, verify the result after write.`,
-              ``,
+              ``,
               `Apply these changes?`,
             ];
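
The gate order documented in the `before_provider_request` hunk above (official OpenAI, then 400 history, then explicit opt-in, then strip by default) can be read as a small pure decision function. The sketch below is illustrative only and is not the package's code: `RetentionGateInput`, `decidePromptCacheRetention`, and `applyRetentionGate` are hypothetical names, and the three booleans stand in for the results of `isOfficialOpenAIBaseUrl`, `promptCacheRetention400Models.has(modelKey(...))`, and `hasExplicitLongRetentionOptIn` from the diff. Unlike the diffed handler, which deletes the field from the payload in place, this sketch returns a copy.

```typescript
// Hypothetical names throughout; a sketch of the gate order, not the package's implementation.
type RetentionDecision = "keep" | "strip";

interface RetentionGateInput {
  isOfficialOpenAI: boolean; // assumed result of an official-base-URL check
  hasRecorded400: boolean;   // assumed: provider previously rejected the parameter with HTTP 400
  hasExplicitOptIn: boolean; // assumed: user set supportsLongCacheRetention in models.json
}

// First match wins, in the same order as the comment in the diff.
function decidePromptCacheRetention(input: RetentionGateInput): RetentionDecision {
  if (input.isOfficialOpenAI) return "keep";  // Gate 1
  if (input.hasRecorded400) return "strip";   // Gate 2: observed 400 overrides the opt-in
  if (input.hasExplicitOptIn) return "keep";  // Gate 3
  return "strip";                             // Gate 4: safe default for third-party APIs
}

// Returns the payload unchanged when there is nothing to strip,
// otherwise a shallow copy without prompt_cache_retention.
function applyRetentionGate(
  payload: Record<string, unknown>,
  input: RetentionGateInput,
): Record<string, unknown> {
  if (typeof payload.prompt_cache_retention !== "string") return payload;
  if (decidePromptCacheRetention(input) === "keep") return payload;
  const copy: Record<string, unknown> = { ...payload };
  delete copy.prompt_cache_retention;
  return copy;
}
```

Placing the 400-history check ahead of the opt-in check is what keeps a misconfigured opt-in from repeating the same rejected request: once a 400 is recorded, the opt-in no longer has any effect for that model.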

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.6.6",
+  "version": "2.6.9",
   "description": "Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.",
   "keywords": [
     "pi-package",