npm - pi-cache-optimizer - Versions diffs - 2.6.1 → 2.6.3 - Mend

pi-cache-optimizer 2.6.1 → 2.6.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md CHANGED

@@ -29,6 +29,7 @@ Pi extension for improving provider-side KV / prompt cache hit rates. It keeps s
 - Requests long cache retention when Pi/provider compat supports it.
 - Adds a session-id `prompt_cache_key` fallback for `openai-completions` / `openai-responses` payloads when no effective key exists.
 - Warns once for third-party OpenAI-compatible proxies missing cache/session-affinity compat flags.
+- Detects Anthropic adaptive thinking models (opus-4.6+, sonnet-4.6+, fable-5+) missing `forceAdaptiveThinking: true` compat.
 - Shows session-scoped footer stats for supported model families.
 Caching is provider-side and best-effort. Third-party proxies can still hide cache usage, reject unsupported parameters, or route requests across multiple upstreams.
@@ -58,6 +59,7 @@ Run `/reload` in Pi after install/update/remove so extension hooks refresh.
 | `/cache-optimizer compat` | Shows copyable compat advice for the active model, if applicable. |
 | `/cache-optimizer stats` | Shows today's session-scoped counters and recent trend for the active model. |
 | `/cache-optimizer reset` | Resets only local stats for the active session + model; upstream provider cache is not modified. |
+| `/cache-optimizer fix` | Auto-repairs safe compat issues for the active model (adaptive thinking, DeepSeek reasoning, OpenAI proxy session affinity). Shows preview + risk warning, requires confirmation. **Only modifies `models.json` after explicit user approval.** |
 `enable` / `disable` are current-process switches. For a persistent opt-out, use environment variables below.
@@ -99,7 +101,74 @@ Notes:
 - If you see `400 Unsupported parameter: prompt_cache_retention`, remove/avoid `supportsLongCacheRetention` for that channel. Keep `sendSessionAffinityHeaders` if supported.
 - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see model-specific advice.
 - For DeepSeek models, the Pi Mono guidance expects `compat.requiresReasoningContentOnAssistantMessages: true` and `compat.thinkingFormat: "deepseek"` alongside cache/session-affinity flags when the endpoint supports them.
-- This extension only advises; it does not edit `models.json`.
+- This extension's `doctor` and `compat` commands only advise; they do not modify `models.json`.
+## Anthropic adaptive thinking models
+Claude models from opus-4.6 / sonnet-4.6 / fable-5 onwards require `forceAdaptiveThinking: true` in compat. Without it, Pi sends the legacy thinking format and Anthropic rejects the request.
+Pi's built-in catalog already sets this flag for official models. Custom channels in `models.json` that override these models must include the flag:
+```json
+{
+  "providers": {
+    "your-claude-channel": {
+      "api": "anthropic-messages",
+      "baseUrl": "https://...",
+      "apiKey": "env:YOUR_KEY",
+      "compat": {
+        "forceAdaptiveThinking": true
+      },
+      "models": [
+        { "id": "claude-opus-4-8", "name": "Claude Opus 4.8" }
+      ]
+    }
+  }
+}
+```
+Or use model-level override:
+```json
+{
+  "providers": {
+    "your-claude-channel": {
+      "modelOverrides": {
+        "claude-opus-4-8": {
+          "compat": {
+            "forceAdaptiveThinking": true
+          }
+        }
+      }
+    }
+  }
+}
+```
+`/cache-optimizer doctor` and `/cache-optimizer compat` detect missing flags and show copyable JSON.
+## Auto-repair with `/cache-optimizer fix`
+**v2.6.0+** adds a `fix` subcommand that can auto-repair safe compat issues:
+- Anthropic adaptive thinking (`forceAdaptiveThinking: true`)
+- DeepSeek Pi Mono reasoning compat (`thinkingFormat: "deepseek"`, `requiresReasoningContentOnAssistantMessages: true`)
+- OpenAI-compatible proxy session affinity (`sendSessionAffinityHeaders: true` for `openai-completions`, `sendSessionIdHeader: true` for `openai-responses`)
+**Scope:** only the currently active model. Other channels require switching models and running `fix` again.
+**Safety:**
+1. Shows full preview of changes (file path, edit location, JSON to write, risks)
+2. Warns: ① changes affect all sessions using that channel, ② automatic backup created at `models.json.backup-cache-optimizer-<timestamp>`, ③ Pi reload required
+3. Uses comment-preserving surgical editor — existing comments, indentation, key order preserved
+4. Requires explicit user confirmation (interactive prompt or `ui.select`)
+5. Writes atomically (temp + rename); self-validates after write
+6. Falls back to manual guidance if JSONC scanner cannot confidently locate the target
+**Non-interactive mode:** refuses to write; shows manual edit guidance instead.
+**Run:** `/cache-optimizer fix` when the active model has detected compat issues. The command shows "nothing to fix" when compat is already complete.
 ### Channels without a `models.json` provider entry
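
The safety list for `/cache-optimizer fix` in the README hunk above mentions a timestamped backup, an atomic write (temp + rename), and validation after the write. A minimal sketch of that sequence follows, assuming Node's `fs` module and plain JSON. The helper name `writeWithBackup`, the temp-file naming, and the use of `JSON.parse` as the validity check are illustrative assumptions, not the extension's actual code (which the README describes as using a comment-preserving JSONC editor).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: back up `file`, then replace its contents atomically.
// Returns the path of the backup that was created.
function writeWithBackup(file: string, newText: string, timestamp: string): string {
  // Refuse to write text that does not parse. Plain JSON.parse is an
  // assumption here; a file with comments would need a JSONC-aware check.
  JSON.parse(newText);

  // Backup name follows the pattern quoted in the README.
  const backup = `${file}.backup-cache-optimizer-${timestamp}`;
  fs.copyFileSync(file, backup);

  // Write to a temp file in the same directory, then rename over the target,
  // so readers see either the old file or the new one, never a partial write.
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, newText, "utf8");
  fs.renameSync(tmp, file);

  // Validate after the write by reading the file back.
  if (fs.readFileSync(file, "utf8") !== newText) {
    throw new Error("post-write validation failed");
  }
  return backup;
}

// Usage: patch a throwaway models.json in a temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "models-demo-"));
const demoFile = path.join(demoDir, "models.json");
fs.writeFileSync(demoFile, '{"providers":{}}', "utf8");
const patched = JSON.stringify(
  { providers: { "your-claude-channel": { compat: { forceAdaptiveThinking: true } } } },
  null,
  2,
);
const demoBackup = writeWithBackup(demoFile, patched, "20250101T000000");
console.log("backup written to", demoBackup);
```

Keeping the temp file in the same directory as the target matters because a rename is only atomic within one filesystem.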

package/README.zh-CN.md CHANGED

@@ -29,6 +29,7 @@
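 - Requests long cache retention when Pi / provider compat supports it.
 - For `openai-completions` / `openai-responses` requests, fills in `prompt_cache_key` from the Pi session id when no effective key exists.
 - Gives a one-time warning for third-party OpenAI-compatible proxies missing cache / session-affinity compat.
+- Detects whether Anthropic adaptive thinking models (opus-4.6+, sonnet-4.6+, fable-5+) are missing `forceAdaptiveThinking: true` compat.
 - Shows session-isolated footer cache stats for supported model families.
 Caching is provider-side, best-effort behavior. Third-party proxies can still hide cache usage, reject unsupported parameters, or route requests across multiple upstreams.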
@@ -58,6 +59,7 @@ pi remove npm:pi-deepseek-cache-optimizer && pi install npm:pi-cache-optimizer
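 | `/cache-optimizer compat` | Shows copyable compat advice for the current model, if applicable. |
 | `/cache-optimizer stats` | Shows today's session-scoped stats and recent trend for the current model. |
 | `/cache-optimizer reset` | Resets only local stats for the current session + current model; the upstream provider cache is not modified. |
+| `/cache-optimizer fix` | Auto-repairs safe compat issues for the current model (adaptive thinking, DeepSeek reasoning, OpenAI proxy session affinity). Shows a preview + risk warning and requires user confirmation. **Only modifies `models.json` after explicit user approval.** |
 `enable` / `disable` are current-process switches. To turn off certain capabilities persistently, use the environment variables below.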
@@ -99,7 +101,74 @@ LiteLLM / OneAPI / NewAPI / OpenRouter-like channels and other third-party `openai-completion
 - If you see `400 Unsupported parameter: prompt_cache_retention`, remove / avoid `supportsLongCacheRetention` for that channel; keep `sendSessionAffinityHeaders` if it is supported.
 - Use `/cache-optimizer compat` or `/cache-optimizer doctor` to see specific advice for the current model.
 - For DeepSeek models, the Pi Mono guidance expects both `compat.requiresReasoningContentOnAssistantMessages: true` and `compat.thinkingFormat: "deepseek"` to be set when supported, alongside the cache / session-affinity compat flags.
-- This extension only advises; it does not modify `models.json`.
+- This extension's `doctor` and `compat` commands only advise; they do not modify `models.json`.
+## Anthropic adaptive thinking models
+Claude models from opus-4.6 / sonnet-4.6 / fable-5 onwards require `forceAdaptiveThinking: true` in compat. Without this flag, Pi sends the legacy thinking format and Anthropic rejects the request.
+Pi's built-in catalog already sets this flag for official models. Custom channels in `models.json` that override these models must include the flag:
+```json
+{
+  "providers": {
+    "your-claude-channel": {
+      "api": "anthropic-messages",
+      "baseUrl": "https://...",
+      "apiKey": "env:YOUR_KEY",
+      "compat": {
+        "forceAdaptiveThinking": true
+      },
+      "models": [
+        { "id": "claude-opus-4-8", "name": "Claude Opus 4.8" }
+      ]
+    }
+  }
+}
+```
+Or use a model-level override:
+```json
+{
+  "providers": {
+    "your-claude-channel": {
+      "modelOverrides": {
+        "claude-opus-4-8": {
+          "compat": {
+            "forceAdaptiveThinking": true
+          }
+        }
+      }
+    }
+  }
+}
+```
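+`/cache-optimizer doctor` and `/cache-optimizer compat` detect missing flags and show copyable JSON.
+## Auto-repair with `/cache-optimizer fix`
+**v2.6.0+** adds a `fix` subcommand that can auto-repair safe compat issues:
+- Anthropic adaptive thinking (`forceAdaptiveThinking: true`)
+- DeepSeek Pi Mono reasoning compat (`thinkingFormat: "deepseek"`, `requiresReasoningContentOnAssistantMessages: true`)
+- OpenAI-compatible proxy session affinity (`sendSessionAffinityHeaders: true` for `openai-completions`, `sendSessionIdHeader: true` for `openai-responses`)
+**Scope:** only the current active model. Other channels require switching models and running `fix` again.
+**Safety mechanisms:**
+1. Shows a full preview of the changes (file path, edit location, JSON to write, risk notes)
+2. Warns: ① changes affect all sessions using that channel, ② automatic backup to `models.json.backup-cache-optimizer-<timestamp>`, ③ Pi restart or reload required
+3. Uses a comment-preserving surgical editor; existing comments, indentation, and key order are all preserved
+4. Requires explicit user confirmation (interactive prompt or `ui.select`)
+5. Writes atomically (temp + rename); self-validates after the write
+6. Falls back to manual edit guidance if the JSONC scanner cannot confidently locate the target
+**Non-interactive mode:** refuses to write and shows manual edit guidance.
+**Run:** execute `/cache-optimizer fix` when compat issues are detected for the active model. When compat is already complete, the command shows "nothing to fix".
 ### Channels without a `models.json` provider entry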

package/index.ts CHANGED

@@ -1381,6 +1381,34 @@ function modelKey(model: PiModel): string {
   return `${model.provider}/${model.id}`;
 }
+function isRouterModel(model: PiModel | undefined): boolean {
+  return lower(model?.provider) === "router";
+}
+function modelFromAssistantMessage(message: unknown, fallback: PiModel | undefined): PiModel | undefined {
+  const record = getAssistantRecord(message);
+  if (!record) return fallback;
+  const id = lower(record.responseModel) || lower(record.model) || fallback?.id;
+  const provider = lower(record.provider) || fallback?.provider;
+  const api = lower(record.api) || fallback?.api;
+  if (!id || !provider || !api) return fallback;
+  return {
+    ...(fallback ?? {}),
+    id,
+    name: id,
+    provider,
+    api,
+    baseUrl: fallback?.baseUrl ?? "",
+    reasoning: fallback?.reasoning ?? false,
+    input: fallback?.input ?? ["text"],
+    cost: fallback?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+    contextWindow: fallback?.contextWindow ?? 0,
+    maxTokens: fallback?.maxTokens ?? 0,
+  } as PiModel;
+}
 function keyForModelExt(model: { provider: string; id: string }): string {
   return `${model.provider}/${model.id}`;
 }
@@ -2835,7 +2863,8 @@ function selectAdapterForModel(model: PiModel | undefined): CacheProviderAdapter
 }
 function selectAdapterForAssistantMessage(message: unknown, model: PiModel | undefined): CacheProviderAdapter | undefined {
-  return CACHE_PROVIDER_ADAPTERS.find((adapter) => adapter.matchesAssistantMessage(message, model));
+  const responseModel = isRouterModel(model) ? modelFromAssistantMessage(message, model) : model;
+  return CACHE_PROVIDER_ADAPTERS.find((adapter) => adapter.matchesAssistantMessage(message, responseModel));
 }
 function notifyCacheCompatIfNeeded(
@@ -5141,6 +5170,14 @@ export default function (pi: ExtensionAPI) {
     const adapter = selectAdapterForModel(model);
     let statusText: string | undefined;
+    if (!adapter && isRouterModel(model)) {
+      // router/auto has no stable target family before the first successful
+      // routed response. Keep the existing cache footer visible instead of
+      // clearing it on model_select; message_end will switch to the real
+      // upstream model/provider after pi-router relays the response metadata.
+      return;
+    }
     if (adapter) {
       // Display session-scoped stats. A model that has never been used
       // in this session shows 0/0. The message_end hook populates
@@ -5322,9 +5359,11 @@ export default function (pi: ExtensionAPI) {
     const usage = adapter.normalizeUsage(event.message);
+    const statsModel = isRouterModel(ctx.model) ? modelFromAssistantMessage(event.message, ctx.model) : ctx.model;
     // Record recent sample (even when usage is missing, for trend diagnosis)
-    if (ctx.model) {
-      const sk = sessionModelKey(ctx.model);
+    if (statsModel) {
+      const sk = sessionModelKey(statsModel);
       const missingFields = usage === undefined || (usage.cacheRead === 0 && usage.cacheWrite === 0 && usage.totalInput === 0)
         ? true
         : hasMissingUsageFields(event.message, adapter);
@@ -5335,17 +5374,17 @@ export default function (pi: ExtensionAPI) {
     await rollOverStatsIfNeeded(ctx);
-    // Update stats scoped to current session + active model.
-    // Falls back to legacy family when ctx.model is undefined.
-    if (ctx.model) {
-      const sk = sessionModelKey(ctx.model);
+    // Update stats scoped to current session + actual routed model.
+    // Falls back to legacy family when no model is available.
+    if (statsModel) {
+      const sk = sessionModelKey(statsModel);
       addUsageToCacheStats(getOrCreateStatsByModelKey(sk), usage);
     } else {
       addUsageToCacheStats(getStatsForModel(undefined, adapter), usage);
     }
     schedulePersistCacheStats(ctx);
-    await publishStatus(ctx);
+    await publishStatus(ctx, statsModel);
   });
   // ────────────────────────────────────────────────────────────────
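
The `index.ts` hunks above key stats by the model that actually answered when the active provider is `router`, rather than by the router placeholder. A simplified, self-contained sketch of that resolution is below. `SketchModel`, `SketchRecord`, `resolveStatsModel`, and this `lower` helper are stand-ins introduced for illustration; the real `PiModel`, `getAssistantRecord`, and `lower` are not shown in the diff and may behave differently.

```typescript
// Reduced stand-ins for the extension's types (assumptions, not the real shapes).
interface SketchModel { id: string; provider: string; api: string; }
interface SketchRecord { responseModel?: string; model?: string; provider?: string; api?: string; }

// Assumed behavior of `lower`: trim and lowercase, with undefined mapped to "".
const lower = (value: string | undefined): string => (value ?? "").trim().toLowerCase();

// For router models, prefer the response metadata; otherwise keep the active model.
function resolveStatsModel(
  record: SketchRecord | undefined,
  active: SketchModel | undefined,
): SketchModel | undefined {
  if (lower(active?.provider) !== "router") return active;
  if (!record) return active;
  const id = lower(record.responseModel) || lower(record.model) || active?.id;
  const provider = lower(record.provider) || active?.provider;
  const api = lower(record.api) || active?.api;
  // Any missing piece means the response cannot be attributed; keep the fallback.
  if (!id || !provider || !api) return active;
  return { id, provider, api };
}

// Same key shape as `modelKey` in the diff: provider/id.
const statsKey = (model: SketchModel): string => `${model.provider}/${model.id}`;

const routerModel: SketchModel = { id: "auto", provider: "router", api: "openai-completions" };
const routed = resolveStatsModel(
  { responseModel: "DeepSeek-Chat", provider: "deepseek", api: "openai-completions" },
  routerModel,
);
console.log(routed ? statsKey(routed) : "no model"); // prints "deepseek/deepseek-chat"
```

With no response record, or with a non-router active model, the function returns the active model unchanged, which matches the diff's choice to leave the existing footer alone until a routed response arrives.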

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.6.1",
+  "version": "2.6.3",
   "description": "Improve Pi prompt/KV cache hit rates with stable prompts, OpenAI-compatible cache keys, proxy compat warnings, and footer cache stats.",
   "keywords": [
     "pi-package",