claude-code-cache-fix 1.5.0 → 1.6.0 (npm)

Files changed (5)

package/README.md CHANGED

@@ -1,6 +1,8 @@
 # claude-code-cache-fix
-Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.92.
+English | [中文](./README.zh.md)
+Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.97.
 ## The problem

package/README.zh.md ADDED

@@ -0,0 +1,167 @@
+# claude-code-cache-fix
+[English](./README.md) | 中文
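+Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Verified on v2.1.92 through v2.1.97.
+## The problem
+When you use `--resume` or `/resume` in Claude Code, the prompt cache silently breaks. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that normally costs about $0.50 per hour can climb to $5-10 per hour with no warning.
+Three bugs cause this:
+1. **Scattered attachment blocks**: attachment blocks for the skills listing, MCP servers, deferred tools, and hooks belong in `messages[0]`. On resume they drift into later messages, which changes the cache prefix.
+2. **Unstable fingerprint**: the `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from the content of `messages[0]`, including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and the cache is invalidated.
+3. **Non-deterministic tool ordering**: tool definitions can arrive in a different order from turn to turn, which changes the request bytes and invalidates the cache key.
+In addition, images read via the Read tool persist as base64 in the conversation history and are resent on every subsequent API call, quietly adding to token cost.
+## Installation
+Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).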
+```bash
+npm install -g claude-code-cache-fix
+```
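+## Usage
+The fix works as a Node.js preload module that intercepts API requests before they leave your machine.
+### Option A: Wrapper script (recommended)
+Create a wrapper script (e.g. `~/bin/claude-fixed`):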
+```bash
+#!/bin/bash
+CLAUDE_NPM_CLI="$HOME/.npm-global/lib/node_modules/@anthropic-ai/claude-code/cli.js"
+if [ ! -f "$CLAUDE_NPM_CLI" ]; then
+  echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
+  echo "Install with: npm install -g @anthropic-ai/claude-code" >&2
+  exit 1
+fi
+exec env NODE_OPTIONS="--import claude-code-cache-fix" node "$CLAUDE_NPM_CLI" "$@"
+```
+```bash
+chmod +x ~/bin/claude-fixed
+```
+If your npm global prefix is different, adjust `CLAUDE_NPM_CLI` accordingly. Find it with:
+```bash
+npm root -g
+```
+### Option B: Shell alias
+```bash
+alias claude='NODE_OPTIONS="--import claude-code-cache-fix" node "$(npm root -g)/@anthropic-ai/claude-code/cli.js"'
+```
+### Option C: Direct invocation
+```bash
+NODE_OPTIONS="--import claude-code-cache-fix" claude
+```
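+> **Note**: This only works when `claude` points to the npm/Node installation. The standalone binary uses a different execution path that bypasses the Node.js preload.
+## How it works
+The module intercepts `globalThis.fetch` before Claude Code makes API calls to `/v1/messages`. On each call it:
+1. **Scans all user messages** for attachment blocks (skills, MCP, deferred tools, hooks) and moves the latest version of each type back to `messages[0]`, matching the layout of a fresh session
+2. **Sorts tool definitions** alphabetically by name for deterministic ordering
+3. **Recomputes the cc_version fingerprint** from the real user message text instead of meta/attachment content
+All fixes are idempotent: if nothing needs fixing, the request passes through unchanged. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before the request reaches the API.
+## Image stripping
+Images read via the Read tool are stored base64-encoded in `tool_result` blocks in the conversation history. They ride along **on every subsequent API call** until compaction. A single 500KB image adds roughly 62,500 tokens of overhead per turn.
+Enable image stripping to remove images from old tool results: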
+```bash
+export CACHE_FIX_IMAGE_KEEP_LAST=3
+```
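+This keeps images in the last 3 user messages and replaces older ones with a text placeholder. It only targets images inside `tool_result` blocks (Read tool output); images pasted by the user are not affected. The files remain on disk and can be re-read when needed.
+Set to `0` (the default) to disable.
+## Monitoring
+The interceptor also monitors several additional issues discovered by the community:
+### Microcompact / budget enforcement
+Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification.
+The interceptor detects cleared tool results and logs the count. It logs a warning when the total tool result character count approaches the 200K threshold.
+### False rate limiter
+The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
+### Quota tracking
+Parses `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization` from the response headers and saves them to `~/.claude/quota-status.json` for use by status line hooks or other tools.
+### Peak hour detection
+Anthropic raises the quota drain rate during weekday peak hours (13:00-19:00 UTC, Monday to Friday). The interceptor detects the peak window and writes `peak_hour: true/false` to `quota-status.json`. See `docs/peak-hours-reference.md` for details.
+### Usage telemetry and cost reporting
+The interceptor logs per-call usage data to `~/.claude/usage.jsonl`: one JSON line per API call, with the model, token counts, and cache breakdown. Use the bundled cost report tool to analyze spend: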
+```bash
+node tools/cost-report.mjs                    # today's costs from the interceptor log
+node tools/cost-report.mjs --date 2026-04-08  # a specific date
+node tools/cost-report.mjs --since 2h         # the last 2 hours
+node tools/cost-report.mjs --admin-key <key>  # cross-check against the Admin API
+```
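+Also works with any JSONL that contains Anthropic usage fields (`--file`, stdin), which suits SDK users and proxy setups. Supports text, JSON, and Markdown output formats. See `docs/cost-report.md` for details.
+## Debug mode
+Enable debug logging to verify that the fixes are being applied: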
+```bash
+CACHE_FIX_DEBUG=1 claude-fixed
+```
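+Logs are written to `~/.claude/cache-fix-debug.log`.
+## Environment variables
+| Variable | Default | Description |
+|------|--------|------|
+| `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
+| `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
+| `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in the last N user messages (0 = disabled) |
+| `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path of the per-call usage telemetry log |
+## Limitations
+- **npm installation only**: the standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
+- **Overage TTL downgrade**: exceeding 100% of the 5-hour quota triggers a server-side TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed on the client. The interceptor helps you avoid reaching overage in the first place by preventing cache instability.
+- **Microcompact cannot be prevented**: the monitoring can detect context degradation but cannot stop it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags, with no client-side option to disable them.
+## Related issues
+- [#34629](https://github.com/anthropics/claude-code/issues/34629): original report of the resume cache regression
+- [#40524](https://github.com/anthropics/claude-code/issues/40524): in-session fingerprint invalidation, image persistence
+- [#42052](https://github.com/anthropics/claude-code/issues/42052): community interceptor development, TTL downgrade discovery
+- [#44045](https://github.com/anthropics/claude-code/issues/44045): prompt cache partially missing on resume
+- [#41930](https://github.com/anthropics/claude-code/issues/41930): abnormal usage drain from multiple root causes
+## License
+MIT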
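
Reviewer note on the peak hour section of the README above: the stated window (13:00-19:00 UTC, Monday to Friday) can be checked with a few lines of JavaScript. This is a minimal sketch only. The name `isPeakHour` is hypothetical and is not taken from the package, and treating the window as half-open (19:00 itself excluded) is an assumption, since the README does not say whether the end of the window is inclusive.

```javascript
// Hypothetical helper, not part of claude-code-cache-fix.
// Assumption: the window is 13:00 (inclusive) to 19:00 (exclusive), UTC.
function isPeakHour(date) {
  const day = date.getUTCDay();    // 0 = Sunday ... 6 = Saturday
  const hour = date.getUTCHours(); // 0..23, always in UTC
  const isWeekday = day >= 1 && day <= 5;
  return isWeekday && hour >= 13 && hour < 19;
}

// 2026-04-08 is a Wednesday; 2026-04-11 is a Saturday.
console.log(isPeakHour(new Date("2026-04-08T14:30:00Z"))); // true
console.log(isPeakHour(new Date("2026-04-08T19:00:00Z"))); // false
console.log(isPeakHour(new Date("2026-04-11T14:30:00Z"))); // false
```

Using the `getUTC*` accessors keeps the result independent of the machine's local time zone, which matters because the window is defined in UTC.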

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.5.0",
+  "version": "1.6.0",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",

package/preload.mjs CHANGED

@@ -169,6 +169,75 @@ function sortSkillsBlock(text) {
   return header + entries.join("\n") + footer;
 }
+/**
+ * Sort deferred tools listing for deterministic ordering. The block format is:
+ *   <system-reminder>
+ *   The following deferred tools are now available via ToolSearch:
+ *   ToolName1
+ *   ToolName2
+ *   ...
+ *   </system-reminder>
+ *
+ * When MCP tools register asynchronously, new tools can appear between API
+ * calls, changing the block content and busting cache. Sorting ensures that
+ * once a tool appears, its position is deterministic.
+ */
+function sortDeferredToolsBlock(text) {
+  const match = text.match(
+    /^(<system-reminder>\nThe following deferred tools are now available[^\n]*\n)([\s\S]+?)(\n<\/system-reminder>\s*)$/
+  );
+  if (!match) return text;
+  const [, header, toolsList, footer] = match;
+  const tools = toolsList.split("\n").map(t => t.trim()).filter(Boolean);
+  tools.sort();
+  return header + tools.join("\n") + footer;
+}
+// --------------------------------------------------------------------------
+// Content pinning for MCP registration jitter (Bug 4)
+// --------------------------------------------------------------------------
+//
+// When MCP tools register asynchronously, the skills and deferred tools blocks
+// can change between consecutive API calls as new tools finish registering.
+// This causes repeated cache busts even though the final tool set is stable.
+//
+// Fix: track the content hash of each block type. When content changes, accept
+// one cache miss (the new tool needs to be visible), then pin the new content.
+// If the SAME content appears on consecutive calls, use the pinned version
+// with normalized whitespace to prevent trivial diffs.
+//
+// Reported by @bilby91 on #44045 (Agent SDK with MCP tools).
+// --------------------------------------------------------------------------
+const _pinnedBlocks = new Map(); // blockType → { hash, text }
+/**
+ * Normalize a block's trailing whitespace and pin its content. Returns the
+ * normalized text. On first call for a block type, pins the content. On
+ * subsequent calls, if the content hash matches the pin, returns the pinned
+ * version (byte-identical). If content changed, updates the pin and returns
+ * the new content (accepts one cache bust).
+ */
+function pinBlockContent(blockType, text) {
+  // Normalize: trim trailing whitespace inside the </system-reminder> tag
+  const normalized = text.replace(/\s+(<\/system-reminder>)\s*$/, "\n$1");
+  const hash = createHash("sha256").update(normalized).digest("hex").slice(0, 16);
+  const pinned = _pinnedBlocks.get(blockType);
+  if (pinned && pinned.hash === hash) {
+    // Content matches pin — return pinned version (byte-identical)
+    return pinned.text;
+  }
+  // Content changed or first call — update pin
+  if (pinned && pinned.hash !== hash) {
+    debugLog(`CONTENT PIN: ${blockType} changed (${pinned.hash} → ${hash}) — accepting one cache bust`);
+  }
+  _pinnedBlocks.set(blockType, { hash, text: normalized });
+  return normalized;
+}
 /**
  * Strip session_knowledge from hooks blocks — ephemeral content that differs
  * between sessions and would bust cache.
@@ -226,7 +295,46 @@ function normalizeResumeMessages(messages) {
       }
     }
   }
-  if (!hasScatteredBlocks) return messages;
+  // Even when blocks aren't scattered, apply sorting and content pinning to
+  // blocks in messages[0]. This handles MCP registration jitter where block
+  // CONTENT changes between calls (new tool registers) without scattering.
+  // (Reported by @bilby91 — Agent SDK with async MCP tools, #44045)
+  if (!hasScatteredBlocks) {
+    let contentModified = false;
+    const newContent = firstMsg.content.map((block) => {
+      const text = block.text || "";
+      if (!isRelocatableBlock(text)) return block;
+      let fixedText = text;
+      if (isSkillsBlock(text)) fixedText = sortSkillsBlock(text);
+      else if (isDeferredToolsBlock(text)) fixedText = sortDeferredToolsBlock(text);
+      else if (isHooksBlock(text)) fixedText = stripSessionKnowledge(text);
+      // Determine block type for pinning
+      let blockType;
+      if (isSkillsBlock(text)) blockType = "skills";
+      else if (isDeferredToolsBlock(text)) blockType = "deferred";
+      else if (isMcpBlock(text)) blockType = "mcp";
+      else if (isHooksBlock(text)) blockType = "hooks";
+      if (blockType) fixedText = pinBlockContent(blockType, fixedText);
+      if (fixedText !== text) {
+        contentModified = true;
+        const { cache_control, ...rest } = block;
+        return { ...rest, text: fixedText };
+      }
+      return block;
+    });
+    if (contentModified) {
+      return messages.map((msg, idx) =>
+        idx === firstUserIdx ? { ...msg, content: newContent } : msg
+      );
+    }
+    return messages;
+  }
   // Scan ALL user messages (including first) in reverse to collect the LATEST
   // version of each block type. This handles both full and partial scatter.
@@ -254,6 +362,10 @@ function normalizeResumeMessages(messages) {
         let fixedText = text;
         if (blockType === "hooks") fixedText = stripSessionKnowledge(text);
         if (blockType === "skills") fixedText = sortSkillsBlock(text);
+        if (blockType === "deferred") fixedText = sortDeferredToolsBlock(text);
+        // Pin content to prevent jitter from late MCP tool registration
+        fixedText = pinBlockContent(blockType, fixedText);
         const { cache_control, ...rest } = block;
         found.set(blockType, { ...rest, text: fixedText });
@@ -722,6 +834,41 @@ globalThis.fetch = async function (url, options) {
         }
       }
+      // Bug 5: 1h TTL enforcement
+      // The client gates 1h cache TTL behind a GrowthBook allowlist that checks
+      // querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
+      // Interactive CLI sessions may not match any pattern, causing the client to
+      // send cache_control without ttl (defaulting to 5m server-side).
+      // The server honors whatever TTL the client requests — so we inject it.
+      // Discovered by @TigerKay1926 on #42052 using our GrowthBook flag dump.
+      if (payload.system) {
+        let ttlInjected = 0;
+        payload.system = payload.system.map((block) => {
+          if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
+            ttlInjected++;
+            return { ...block, cache_control: { ...block.cache_control, ttl: "1h" } };
+          }
+          return block;
+        });
+        // Also check messages for cache_control blocks (conversation history breakpoints)
+        if (payload.messages) {
+          for (const msg of payload.messages) {
+            if (!Array.isArray(msg.content)) continue;
+            for (let i = 0; i < msg.content.length; i++) {
+              const b = msg.content[i];
+              if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
+                msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: "1h" } };
+                ttlInjected++;
+              }
+            }
+          }
+        }
+        if (ttlInjected > 0) {
+          modified = true;
+          debugLog(`APPLIED: 1h TTL injected on ${ttlInjected} cache_control block(s)`);
+        }
+      }
       if (modified) {
         options = { ...options, body: JSON.stringify(payload) };
         debugLog("Request body rewritten");
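
Reviewer note on the "Bug 5" hunk above: the TTL step can be read as a small pure function over the request payload. The sketch below is illustrative and is not the code in `preload.mjs`. The name `injectOneHourTtl` and its return shape are hypothetical. Unlike the hunk, this version does not mutate its input, and it also processes `messages` when `system` is absent or is not an array; both are choices of this sketch, not behaviour of the package.

```javascript
// Hypothetical standalone variant of the TTL step; names are illustrative.
function injectOneHourTtl(payload) {
  let injected = 0;

  // Add ttl: "1h" only to ephemeral cache_control entries that lack a ttl.
  const upgrade = (block) => {
    const cc = block && block.cache_control;
    if (cc && cc.type === "ephemeral" && !cc.ttl) {
      injected++;
      return { ...block, cache_control: { ...cc, ttl: "1h" } };
    }
    return block;
  };

  const next = { ...payload };
  if (Array.isArray(payload.system)) {
    next.system = payload.system.map(upgrade);
  }
  if (Array.isArray(payload.messages)) {
    next.messages = payload.messages.map((msg) =>
      Array.isArray(msg.content)
        ? { ...msg, content: msg.content.map(upgrade) }
        : msg // string content carries no cache_control
    );
  }
  return { payload: next, injected };
}

const { payload: out, injected } = injectOneHourTtl({
  system: [{ type: "text", text: "sys", cache_control: { type: "ephemeral" } }],
  messages: [
    { role: "user", content: [{ type: "text", text: "hi", cache_control: { type: "ephemeral", ttl: "5m" } }] },
    { role: "user", content: "plain string content" },
  ],
});
console.log(injected);                        // 1
console.log(out.system[0].cache_control.ttl); // "1h"
```

A block that already carries an explicit `ttl` is left alone, which matches the `!block.cache_control.ttl` guard in the hunk.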

package/tools/cache-test.sh ADDED

@@ -0,0 +1,249 @@
+#!/bin/bash
+# cache-test.sh — Test Claude Code cache behavior with and without interceptor.
+#
+# Runs four scenarios and captures cache stats for each:
+#   1. One-shot WITHOUT interceptor (baseline)
+#   2. One-shot WITH interceptor
+#   3. Multi-turn WITHOUT interceptor (conversation + resume)
+#   4. Multi-turn WITH interceptor (conversation + resume)
+#
+# Outputs a summary report comparing TTL tier, cache hit rates, and
+# whether the interceptor's fixes fired.
+#
+# Usage:
+#   ./cache-test.sh [--skip-resume]   # --skip-resume skips the resume tests
+#
+# Requires: Claude Code installed via npm, claude-code-cache-fix installed.
+set -euo pipefail
+CLAUDE_CLI="$HOME/.npm-global/lib/node_modules/@anthropic-ai/claude-code/cli.js"
+PRELOAD="$HOME/.claude/cache-fix-preload.mjs"
+QUOTA_FILE="$HOME/.claude/quota-status.json"
+USAGE_LOG="$HOME/.claude/usage.jsonl"
+DEBUG_LOG="$HOME/.claude/cache-fix-debug.log"
+REPORT_DIR="/tmp/cache-test-$(date +%Y%m%d_%H%M%S)"
+SKIP_RESUME=false
+for arg in "$@"; do
+  case "$arg" in
+    --skip-resume) SKIP_RESUME=true ;;
+  esac
+done
+# Verify prerequisites
+if [ ! -f "$CLAUDE_CLI" ]; then
+  echo "ERROR: Claude Code not found at $CLAUDE_CLI" >&2
+  echo "Install with: npm install -g @anthropic-ai/claude-code" >&2
+  exit 1
+fi
+if [ ! -f "$PRELOAD" ]; then
+  echo "ERROR: cache-fix preload not found at $PRELOAD" >&2
+  echo "Install with: npm install -g claude-code-cache-fix" >&2
+  exit 1
+fi
+CC_VERSION=$(node "$CLAUDE_CLI" --version 2>/dev/null | head -1)
+echo "=========================================="
+echo "  CACHE BEHAVIOR TEST"
+echo "  Claude Code: $CC_VERSION"
+echo "  Report dir:  $REPORT_DIR"
+echo "=========================================="
+echo ""
+mkdir -p "$REPORT_DIR"
+# Helper: snapshot cache state from quota-status.json
+snapshot_cache() {
+  local label="$1"
+  local outfile="$REPORT_DIR/${label}.json"
+  if [ -f "$QUOTA_FILE" ]; then
+    cp "$QUOTA_FILE" "$outfile"
+    local tier=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('ttl_tier','?'))" 2>/dev/null || echo "?")
+    local create=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('cache_creation',0))" 2>/dev/null || echo "?")
+    local read=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('cache_read',0))" 2>/dev/null || echo "?")
+    local e1h=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('ephemeral_1h',0))" 2>/dev/null || echo "?")
+    local e5m=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('ephemeral_5m',0))" 2>/dev/null || echo "?")
+    local hit=$(python3 -c "import json; d=json.load(open('$QUOTA_FILE')); print(d.get('cache',{}).get('hit_rate','?'))" 2>/dev/null || echo "?")
+    echo "  [$label] TTL=$tier  create=$create  read=$read  1h=$e1h  5m=$e5m  hit=$hit%"
+  else
+    echo "  [$label] No quota-status.json found"
+  fi
+}
+# Helper: count usage.jsonl entries
+count_usage() {
+  if [ -f "$USAGE_LOG" ]; then
+    wc -l < "$USAGE_LOG" | tr -d ' '
+  else
+    echo "0"
+  fi
+}
+# Helper: capture debug log entries
+snapshot_debug() {
+  local label="$1"
+  if [ -f "$DEBUG_LOG" ]; then
+    cp "$DEBUG_LOG" "$REPORT_DIR/${label}-debug.log"
+  fi
+}
+# ─── Test 1: One-shot WITHOUT interceptor ────────────────────────────────────
+echo "--- Test 1: One-shot WITHOUT interceptor ---"
+rm -f "$DEBUG_LOG"
+usage_before=$(count_usage)
+# Call 1: cold start
+node "$CLAUDE_CLI" -p "respond with exactly: cache-test-1a" --dangerously-skip-permissions > "$REPORT_DIR/test1a-output.txt" 2>&1
+snapshot_cache "test1a-no-interceptor"
+# Wait 2 seconds for any async writes
+sleep 2
+# Call 2: should get cache hit
+node "$CLAUDE_CLI" -p "respond with exactly: cache-test-1b" --dangerously-skip-permissions > "$REPORT_DIR/test1b-output.txt" 2>&1
+snapshot_cache "test1b-no-interceptor"
+usage_after=$(count_usage)
+echo "  Usage entries added: $((usage_after - usage_before))"
+echo ""
+# ─── Test 2: One-shot WITH interceptor ───────────────────────────────────────
+echo "--- Test 2: One-shot WITH interceptor ---"
+rm -f "$DEBUG_LOG"
+usage_before=$(count_usage)
+# Call 1: cold start with interceptor
+CACHE_FIX_DEBUG=1 NODE_OPTIONS="--import $PRELOAD" \
+  node "$CLAUDE_CLI" -p "respond with exactly: cache-test-2a" --dangerously-skip-permissions > "$REPORT_DIR/test2a-output.txt" 2>&1
+snapshot_cache "test2a-with-interceptor"
+snapshot_debug "test2a"
+sleep 2
+# Call 2: should get cache hit
+CACHE_FIX_DEBUG=1 NODE_OPTIONS="--import $PRELOAD" \
+  node "$CLAUDE_CLI" -p "respond with exactly: cache-test-2b" --dangerously-skip-permissions > "$REPORT_DIR/test2b-output.txt" 2>&1
+snapshot_cache "test2b-with-interceptor"
+snapshot_debug "test2b"
+usage_after=$(count_usage)
+echo "  Usage entries added: $((usage_after - usage_before))"
+echo ""
+# ─── Test 3 & 4: Multi-turn + Resume ────────────────────────────────────────
+if [ "$SKIP_RESUME" = true ]; then
+  echo "--- Tests 3 & 4: SKIPPED (--skip-resume) ---"
+  echo ""
+else
+  # Test 3: Multi-turn WITHOUT interceptor
+  echo "--- Test 3: Multi-turn + Resume WITHOUT interceptor ---"
+  rm -f "$DEBUG_LOG"
+  usage_before=$(count_usage)
+  # Start a session with a named session, do 2 turns, exit, then resume
+  SESSION_NAME="cache-test-no-fix-$$"
+  # Turn 1
+  node "$CLAUDE_CLI" -p "respond with exactly: turn1-done" \
+    --dangerously-skip-permissions -n "$SESSION_NAME" \
+    > "$REPORT_DIR/test3-turn1-output.txt" 2>&1
+  snapshot_cache "test3-turn1-no-interceptor"
+  sleep 2
+  # Turn 2 (resume)
+  node "$CLAUDE_CLI" -p "respond with exactly: turn2-done" \
+    --dangerously-skip-permissions -c \
+    > "$REPORT_DIR/test3-turn2-output.txt" 2>&1
+  snapshot_cache "test3-turn2-no-interceptor"
+  sleep 2
+  # Turn 3 (second resume — this is where scatter typically shows)
+  node "$CLAUDE_CLI" -p "respond with exactly: turn3-done" \
+    --dangerously-skip-permissions -c \
+    > "$REPORT_DIR/test3-turn3-output.txt" 2>&1
+  snapshot_cache "test3-turn3-no-interceptor"
+  usage_after=$(count_usage)
+  echo "  Usage entries added: $((usage_after - usage_before))"
+  echo ""
+  # Test 4: Multi-turn WITH interceptor
+  echo "--- Test 4: Multi-turn + Resume WITH interceptor ---"
+  rm -f "$DEBUG_LOG"
+  usage_before=$(count_usage)
+  SESSION_NAME="cache-test-with-fix-$$"
+  # Turn 1
+  CACHE_FIX_DEBUG=1 CACHE_FIX_PREFIXDIFF=1 NODE_OPTIONS="--import $PRELOAD" \
+    node "$CLAUDE_CLI" -p "respond with exactly: turn1-done" \
+    --dangerously-skip-permissions -n "$SESSION_NAME" \
+    > "$REPORT_DIR/test4-turn1-output.txt" 2>&1
+  snapshot_cache "test4-turn1-with-interceptor"
+  snapshot_debug "test4-turn1"
+  sleep 2
+  # Turn 2 (resume)
+  CACHE_FIX_DEBUG=1 CACHE_FIX_PREFIXDIFF=1 NODE_OPTIONS="--import $PRELOAD" \
+    node "$CLAUDE_CLI" -p "respond with exactly: turn2-done" \
+    --dangerously-skip-permissions -c \
+    > "$REPORT_DIR/test4-turn2-output.txt" 2>&1
+  snapshot_cache "test4-turn2-with-interceptor"
+  snapshot_debug "test4-turn2"
+  sleep 2
+  # Turn 3 (second resume)
+  CACHE_FIX_DEBUG=1 CACHE_FIX_PREFIXDIFF=1 NODE_OPTIONS="--import $PRELOAD" \
+    node "$CLAUDE_CLI" -p "respond with exactly: turn3-done" \
+    --dangerously-skip-permissions -c \
+    > "$REPORT_DIR/test4-turn3-output.txt" 2>&1
+  snapshot_cache "test4-turn3-with-interceptor"
+  snapshot_debug "test4-turn3"
+  usage_after=$(count_usage)
+  echo "  Usage entries added: $((usage_after - usage_before))"
+  echo ""
+fi
+# ─── Summary ────────────────────────────────────────────────────────────────
+echo "=========================================="
+echo "  SUMMARY"
+echo "=========================================="
+echo ""
+echo "All snapshots saved to: $REPORT_DIR"
+echo ""
+echo "Cache snapshots:"
+for f in "$REPORT_DIR"/*.json; do
+  label=$(basename "$f" .json)
+  tier=$(python3 -c "import json; d=json.load(open('$f')); print(d.get('cache',{}).get('ttl_tier','?'))" 2>/dev/null || echo "?")
+  create=$(python3 -c "import json; d=json.load(open('$f')); print(d.get('cache',{}).get('cache_creation',0))" 2>/dev/null || echo "?")
+  read=$(python3 -c "import json; d=json.load(open('$f')); print(d.get('cache',{}).get('cache_read',0))" 2>/dev/null || echo "?")
+  e1h=$(python3 -c "import json; d=json.load(open('$f')); print(d.get('cache',{}).get('ephemeral_1h',0))" 2>/dev/null || echo "?")
+  e5m=$(python3 -c "import json; d=json.load(open('$f')); print(d.get('cache',{}).get('ephemeral_5m',0))" 2>/dev/null || echo "?")
+  printf "  %-40s TTL=%-4s create=%-6s read=%-6s 1h=%-6s 5m=%-6s\n" "$label" "$tier" "$create" "$read" "$e1h" "$e5m"
+done
+# Check for interceptor actions in debug logs
+echo ""
+echo "Interceptor actions:"
+for f in "$REPORT_DIR"/*-debug.log; do
+  [ -f "$f" ] || continue
+  label=$(basename "$f" -debug.log)
+  # grep -c prints 0 itself and exits 1 on no match, so "|| echo 0" would yield two lines
+  applied=$(grep -c "APPLIED:" "$f" 2>/dev/null || true)
+  skipped=$(grep -c "SKIPPED:" "$f" 2>/dev/null || true)
+  pins=$(grep -c "CONTENT PIN:" "$f" 2>/dev/null || true)
+  echo "  $label: $applied applied, $skipped skipped, $pins content pins"
+done
+echo ""
+echo "Done. Review $REPORT_DIR for full details."