npm - claude-code-cache-fix - Versions diffs - 1.0.0 → 1.2.0 - Mend

claude-code-cache-fix 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # claude-code-cache-fix
-Fixes a prompt cache regression in [Claude Code](https://github.com/anthropics/claude-code) that causes **up to 20x cost increase** on resumed sessions. Confirmed broken through v2.1.92.
+Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.92.
 ## The problem
@@ -14,6 +14,8 @@ Three bugs cause this:
 3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
+Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
 ## Installation
 Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).
@@ -76,6 +78,63 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
 All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
+## Image stripping
+Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn in carry-forward.
+Enable image stripping to remove old images from tool results:
+```bash
+export CACHE_FIX_IMAGE_KEEP_LAST=3
+```
+This keeps images in the last 3 user messages and replaces older ones with a text placeholder. It only targets images inside `tool_result` blocks (Read tool output); user-pasted images are never touched. Files remain on disk for re-reading if needed.
+Set to `0` (default) to disable.
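The ~62,500-token figure appears to follow the same heuristic the interceptor uses when it reports stripped bytes (`Math.ceil(strippedBytes * 0.125)`, roughly one token per 8 base64 characters), which would make "500KB" mean 500,000 characters of base64 payload. A minimal sketch of that arithmetic, assuming the image is re-sent unchanged on every turn and ignoring cache-read discounts and compaction; both function names are hypothetical:

```javascript
// Rough token estimate for one base64 image, using the same 0.125
// tokens-per-character heuristic as stripOldToolResultImages().
function estimateImageTokens(base64Chars) {
  return Math.ceil(base64Chars * 0.125);
}

// Tokens re-sent for one image that rides along for `turns` API calls.
// Assumption: no compaction and no cache-read pricing discount.
function carryForwardTokens(base64Chars, turns) {
  return estimateImageTokens(base64Chars) * turns;
}

const perTurn = estimateImageTokens(500_000);     // 62,500
const tenTurns = carryForwardTokens(500_000, 10); // 625,000
console.log({ perTurn, tenTurns });
```

With `CACHE_FIX_IMAGE_KEEP_LAST=3`, an image stops contributing once three newer user messages exist, so in this model its carry-forward is bounded by a few turns instead of growing until compaction.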
+## Prefix lock (resume cache hit)
+Even with the block relocation fix, the first API call after `--resume` triggers a full cache rebuild because Claude Code reassembles messages with different system-reminder blocks, changing the prefix bytes. On a 300k-token context at Opus rates, that's ~$2.80 per resume.
+The prefix lock eliminates this by saving the exact `messages[0]` content after all fixes are applied, then replaying it on the next resume to produce a byte-identical prefix.
+```bash
+export CACHE_FIX_PREFIX_LOCK=1
+```
+Safety guards — the lock only fires when ALL of these match:
+- System prompt hash (same project, no CLAUDE.md changes)
+- Tools hash (no MCP/plugin changes)
+- User message text (same conversation)
+- User content hash (no substantive context changes)
+- Not a post-compaction conversation
+If any guard fails, the lock skips and falls back to normal behavior. The worst case is a skip — the lock cannot increase costs or cause context loss.
+Set to `0` (default) to disable.
+## Monitoring
+The interceptor includes monitoring for several additional issues identified by the community:
+### Microcompact / budget enforcement
+Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
+The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
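The detection side of this check can be sketched as a single pass over the request's `messages`: count `tool_result` blocks whose text contains the cleared-content placeholder, and sum tool-result characters against the 200,000-character cap. This is an illustration only. The package's own detection code is not part of this diff, the 80% warning ratio is an assumption, and `scanToolResults` is a hypothetical name.

```javascript
// Hypothetical sketch of microcompact / budget monitoring (not the package's code).
// Assumes tool_result content is a string or an array of { type: "text", text } items.
const CLEARED_MARKER = "[Old tool result content cleared]";
const AGGREGATE_CAP = 200_000;
const WARN_RATIO = 0.8; // assumed threshold for "approaching" the cap

function toolResultText(block) {
  if (typeof block.content === "string") return block.content;
  if (Array.isArray(block.content)) {
    return block.content
      .filter((item) => item.type === "text" && typeof item.text === "string")
      .map((item) => item.text)
      .join("");
  }
  return "";
}

function scanToolResults(messages) {
  let total = 0;
  let cleared = 0;
  let totalChars = 0;
  for (const msg of messages) {
    // tool_result blocks live in user messages
    if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_result") continue;
      const text = toolResultText(block);
      total++;
      totalChars += text.length;
      if (text.includes(CLEARED_MARKER)) cleared++;
    }
  }
  return { total, cleared, totalChars, nearBudget: totalChars >= AGGREGATE_CAP * WARN_RATIO };
}

// Example: one cleared result plus one large surviving result.
const stats = scanToolResults([
  {
    role: "user",
    content: [
      { type: "tool_result", tool_use_id: "a", content: CLEARED_MARKER },
      { type: "tool_result", tool_use_id: "b", content: [{ type: "text", text: "x".repeat(170_000) }] },
    ],
  },
]);
console.log(`MICROCOMPACT: ${stats.cleared}/${stats.total} tool results cleared`);
```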
+### False rate limiter
+The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
+### GrowthBook flag dump
+On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
+### Quota tracking
+Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
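A minimal sketch of the header-parsing step, assuming utilization values arrive as plain decimal strings. The output shape is hypothetical, since the layout of `quota-status.json` is not shown in this diff, and only the 5-hour header name is spelled out here; the 7-day header would go in the same `names` list. `readUtilization` is an illustrative name.

```javascript
// Hypothetical sketch: collect numeric utilization headers from a fetch
// Response's Headers into a plain object a status-line hook could read.
// The object shape is an assumption, not the package's file format.
function readUtilization(headers, names) {
  const status = { timestamp: new Date().toISOString() };
  for (const name of names) {
    const raw = headers.get(name);
    if (raw == null) continue; // header absent on this response
    const value = Number.parseFloat(raw);
    if (Number.isFinite(value)) status[name] = value;
  }
  return status;
}

// Example with a hand-built Headers object (a global in Node >= 18).
const status = readUtilization(
  new Headers({ "anthropic-ratelimit-unified-5h-utilization": "0.42" }),
  ["anthropic-ratelimit-unified-5h-utilization"]
);
// JSON.stringify(status) is what a caller would write to
// ~/.claude/quota-status.json.
console.log(JSON.stringify(status));
```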
 ## Debug mode
 Enable debug logging to verify the fix is working:
@@ -88,31 +147,71 @@ Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
 - `APPLIED: resume message relocation` — block scatter was detected and fixed
 - `APPLIED: tool order stabilization` — tools were reordered
 - `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
-- `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed (fresh session or already correct)
+- `APPLIED: stripped N images from old tool results` — images were stripped
+- `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
+- `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
+- `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
+- `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
+- `PREFIX LOCK: APPLIED — replayed saved messages[0]` — resume cache hit achieved
+- `PREFIX LOCK: skipped — <reason>` — guard prevented lock (expected, safe)
+- `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
+### Prefix diff mode
+Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
+```bash
+CACHE_FIX_PREFIXDIFF=1 claude-fixed
+```
+Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are generated on the first API call after a restart.
+## Environment variables
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
+| `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
+| `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
+| `CACHE_FIX_PREFIX_LOCK` | `0` | Replay saved messages[0] on resume for cache hit (0 = disabled) |
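All four variables are off unless set explicitly. In `preload.mjs` the boolean flags are compared against the exact string `"1"` and `CACHE_FIX_IMAGE_KEEP_LAST` goes through `parseInt`, so values such as `true` or `yes` leave a feature disabled. The sketch below mirrors that behavior. `readCacheFixConfig` is a hypothetical helper, and treating `CACHE_FIX_PREFIXDIFF` the same way is an assumption because its parsing is not visible in this diff.

```javascript
// Hypothetical helper mirroring how preload.mjs reads its settings.
// Boolean flags: only the exact string "1" enables them.
// CACHE_FIX_IMAGE_KEEP_LAST: parseInt, so "abc" yields NaN, which the
// stripping code's `!keepLast` check treats as disabled.
function readCacheFixConfig(env) {
  const flag = (name) => env[name] === "1";
  return {
    debug: flag("CACHE_FIX_DEBUG"),
    prefixDiff: flag("CACHE_FIX_PREFIXDIFF"), // assumed to match the others
    prefixLock: flag("CACHE_FIX_PREFIX_LOCK"),
    imageKeepLast: parseInt(env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10),
  };
}

const config = readCacheFixConfig(process.env);
console.log(config);
```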
 ## Limitations
 - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
 - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
+- **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
 - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
 ## Tracked issues
 - [#34629](https://github.com/anthropics/claude-code/issues/34629) — Original resume cache regression report
-- [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation
-- [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development and testing
+- [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
+- [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
 - [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
 - [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
 - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
+## Related research
+- **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
+- **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
 ## Contributors
 - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, and tighter block matchers
 - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
-- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, debug logging, overage TTL downgrade discovery, package maintainer
+- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
+- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
+- **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
+## Support
+If this tool saved you money, consider buying me a coffee:
+<a href="https://buymeacoffee.com/vsits" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
 ## License
 [MIT](LICENSE)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.0.0",
+  "version": "1.2.0",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",

package/preload.mjs CHANGED

@@ -8,51 +8,42 @@
 //   later user messages instead of messages[0]. This breaks the prompt cache
 //   prefix match. Fix: relocate them to messages[0] on every API call.
 //   (github.com/anthropics/claude-code/issues/34629)
-//   (github.com/anthropics/claude-code/issues/43657)
-//   (github.com/anthropics/claude-code/issues/44045)
 //
 // Bug 2: Fingerprint instability
 //   The cc_version fingerprint in the attribution header is computed from
 //   messages[0] content INCLUDING meta/attachment blocks. When those blocks
-//   change between turns, the fingerprint changes -> system prompt bytes
-//   change -> cache bust. Fix: recompute fingerprint from real user text.
+//   change between turns, the fingerprint changes, busting cache within the
+//   same session. Fix: stabilize the fingerprint from the real user message.
 //   (github.com/anthropics/claude-code/issues/40524)
 //
-// Bug 3: Non-deterministic tool schema ordering
-//   Tool definitions can arrive in different orders between turns, changing
-//   request bytes and busting cache. Fix: sort tools alphabetically by name.
+// Bug 3: Image carry-forward in conversation history
+//   Images read via the Read tool persist as base64 in conversation history
+//   and are sent on every subsequent API call. A single 500KB image costs
+//   ~62,500 tokens per turn in carry-forward. Fix: strip base64 image blocks
+//   from tool_result content older than N user turns.
+//   Set CACHE_FIX_IMAGE_KEEP_LAST=N to enable (default: 0 = disabled).
+//   (github.com/anthropics/claude-code/issues/40524)
+//
+// Monitoring:
+//   - GrowthBook flag dump on first API call (CACHE_FIX_DEBUG=1)
+//   - Microcompact / budget enforcement detection (logs cleared tool results)
+//   - False rate limiter detection (model: "<synthetic>")
+//   - Quota utilization tracking (writes ~/.claude/quota-status.json)
+//   - Prefix snapshot diffing across process restarts (CACHE_FIX_PREFIXDIFF=1)
 //
-// Based on community work by @VictorSun92 (original monkey-patch + partial
-// scatter fixes) and @jmarianski (MITM proxy root cause analysis).
+// Based on community fix by @VictorSun92 / @jmarianski (issue #34629),
+// enhanced with fingerprint stabilization, image stripping, and monitoring.
+// Bug research informed by @ArkNill's claude-code-hidden-problem-analysis.
 //
-// Usage: NODE_OPTIONS="--import claude-code-cache-fix" claude
+// Load via: NODE_OPTIONS="--import $HOME/.claude/cache-fix-preload.mjs"
 import { createHash } from "node:crypto";
-import { appendFileSync } from "node:fs";
-import { homedir } from "node:os";
-import { join } from "node:path";
-// ---------------------------------------------------------------------------
-// Debug logging (writes to ~/.claude/cache-fix-debug.log)
-// Set CACHE_FIX_DEBUG=1 to enable
-// ---------------------------------------------------------------------------
-const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
-const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
-function debugLog(...args) {
-  if (!DEBUG) return;
-  const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
-  try {
-    appendFileSync(LOG_PATH, line);
-  } catch {}
-}
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
 // Fingerprint stabilization (Bug 2)
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
-// Must match Claude Code src/utils/fingerprint.ts exactly.
+// Must match src/utils/fingerprint.ts exactly.
 const FINGERPRINT_SALT = "59cf53e54c78";
 const FINGERPRINT_INDICES = [4, 7, 20];
@@ -77,20 +68,14 @@ function extractRealUserMessageText(messages) {
     if (msg.role !== "user") continue;
     const content = msg.content;
     if (!Array.isArray(content)) {
-      if (
-        typeof content === "string" &&
-        !content.startsWith("<system-reminder>")
-      ) {
+      if (typeof content === "string" && !content.startsWith("<system-reminder>")) {
         return content;
       }
       continue;
     }
+    // Find first text block that isn't a system-reminder
     for (const block of content) {
-      if (
-        block.type === "text" &&
-        typeof block.text === "string" &&
-        !block.text.startsWith("<system-reminder>")
-      ) {
+      if (block.type === "text" && typeof block.text === "string" && !block.text.startsWith("<system-reminder>")) {
         return block.text;
       }
     }
@@ -100,17 +85,14 @@ function extractRealUserMessageText(messages) {
 /**
  * Extract current cc_version from system prompt blocks and recompute with
- * stable fingerprint. Returns { attrIdx, newText, oldFingerprint, stableFingerprint }
- * or null if no fix needed.
+ * stable fingerprint. Returns { oldVersion, newVersion, stableFingerprint }.
  */
 function stabilizeFingerprint(system, messages) {
   if (!Array.isArray(system)) return null;
+  // Find the attribution header block
   const attrIdx = system.findIndex(
-    (b) =>
-      b.type === "text" &&
-      typeof b.text === "string" &&
-      b.text.includes("x-anthropic-billing-header:")
+    (b) => b.type === "text" && typeof b.text === "string" && b.text.includes("x-anthropic-billing-header:")
   );
   if (attrIdx === -1) return null;
@@ -118,13 +100,14 @@ function stabilizeFingerprint(system, messages) {
   const versionMatch = attrBlock.text.match(/cc_version=([^;]+)/);
   if (!versionMatch) return null;
-  const fullVersion = versionMatch[1]; // e.g. "2.1.92.a3f"
+  const fullVersion = versionMatch[1]; // e.g. "2.1.87.a3f"
   const dotParts = fullVersion.split(".");
   if (dotParts.length < 4) return null;
-  const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.92"
+  const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
   const oldFingerprint = dotParts[3]; // "a3f"
+  // Compute stable fingerprint from real user text
   const realText = extractRealUserMessageText(messages);
   const stableFingerprint = computeFingerprint(realText, baseVersion);
@@ -139,38 +122,28 @@ function stabilizeFingerprint(system, messages) {
   return { attrIdx, newText, oldFingerprint, stableFingerprint };
 }
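The code that builds `newText` falls in the lines this hunk skips. Assuming it simply swaps the fourth dot-separated component of `cc_version` for the recomputed fingerprint, the rewrite step might look like the sketch below. `rewriteCcVersion` is hypothetical, and computing the stable fingerprint itself (via `FINGERPRINT_SALT` and `FINGERPRINT_INDICES`) is left out because that logic is elided here.

```javascript
// Hypothetical sketch of the cc_version rewrite, assuming attribution text
// of the form "...cc_version=<major>.<minor>.<patch>.<fingerprint>;...".
function rewriteCcVersion(attrText, stableFingerprint) {
  const match = attrText.match(/cc_version=([^;]+)/);
  if (!match) return null;
  const parts = match[1].split(".");
  if (parts.length < 4) return null; // no fingerprint suffix to replace
  const baseVersion = parts.slice(0, 3).join(".");
  const oldFingerprint = parts[3];
  if (oldFingerprint === stableFingerprint) return null; // already stable
  // Replace only the matched cc_version=... token; the rest is untouched.
  const newText = attrText.replace(match[0], `cc_version=${baseVersion}.${stableFingerprint}`);
  return { oldFingerprint, stableFingerprint, newText };
}

// "other=1" is a placeholder for whatever follows in the real header.
const fixed = rewriteCcVersion("x-anthropic-billing-header: cc_version=2.1.87.a3f; other=1", "b71");
console.log(fixed?.newText);
```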
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
 // Resume message relocation (Bug 1)
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
 function isSystemReminder(text) {
   return typeof text === "string" && text.startsWith("<system-reminder>");
 }
+// FIX: Match block headers with startsWith to avoid false positives from
+// quoted content (e.g. "Note:" file-change reminders embedding debug logs).
 const SR = "<system-reminder>\n";
 function isHooksBlock(text) {
-  return (
-    isSystemReminder(text) && text.substring(0, 200).includes("hook success")
-  );
+  // Hooks block header varies; fall back to head-region check
+  return isSystemReminder(text) && text.substring(0, 200).includes("hook success");
 }
 function isSkillsBlock(text) {
-  return (
-    typeof text === "string" &&
-    text.startsWith(SR + "The following skills are available")
-  );
+  return typeof text === "string" && text.startsWith(SR + "The following skills are available");
 }
 function isDeferredToolsBlock(text) {
-  return (
-    typeof text === "string" &&
-    text.startsWith(SR + "The following deferred tools are now available")
-  );
+  return typeof text === "string" && text.startsWith(SR + "The following deferred tools are now available");
 }
 function isMcpBlock(text) {
-  return (
-    typeof text === "string" &&
-    text.startsWith(SR + "# MCP Server Instructions")
-  );
+  return typeof text === "string" && text.startsWith(SR + "# MCP Server Instructions");
 }
 function isRelocatableBlock(text) {
   return (
@@ -208,18 +181,21 @@ function stripSessionKnowledge(text) {
 }
 /**
- * Core fix: on EVERY API call, scan the entire message array for the LATEST
+ * Core fix: on EVERY call, scan the entire message array for the LATEST
  * relocatable blocks (skills, MCP, deferred tools, hooks) and ensure they
  * are in messages[0]. This matches fresh session behavior where attachments
- * are always prepended to messages[0].
+ * are always prepended to messages[0] on every API call.
  *
- * The v2.1.90 native fix has a remaining detection gap: it bails early if
- * it sees *some* relocatable blocks in messages[0], missing the case where
- * others have scattered elsewhere (partial scatter).
+ * The original community fix only checked the last user message, which
+ * broke on subsequent turns because:
+ *   - Call 1: skills in last msg → relocated to messages[0] (3 blocks)
+ *   - Call 2: in-memory state unchanged, skills now in a middle msg,
+ *     last msg has no relocatable blocks → messages[0] back to 2 blocks
+ *   - Prefix changed → cache bust
  *
  * This version scans backwards to find the latest instance of each
  * relocatable block type, removes them from wherever they are, and
- * prepends them to messages[0] in fresh-session order. Idempotent.
+ * prepends them to messages[0]. Idempotent across calls.
  */
 function normalizeResumeMessages(messages) {
   if (!Array.isArray(messages) || messages.length < 2) return messages;
@@ -236,13 +212,11 @@ function normalizeResumeMessages(messages) {
   const firstMsg = messages[firstUserIdx];
   if (!Array.isArray(firstMsg?.content)) return messages;
-  // Check if ANY relocatable blocks are scattered outside first user msg.
+  // FIX: Check if ANY relocatable blocks are scattered outside first user msg.
+  // The old check (firstAlreadyHas → skip) missed partial scatter where some
+  // blocks stay in messages[0] but others drift to later messages (v2.1.89+).
   let hasScatteredBlocks = false;
-  for (
-    let i = firstUserIdx + 1;
-    i < messages.length && !hasScatteredBlocks;
-    i++
-  ) {
+  for (let i = firstUserIdx + 1; i < messages.length && !hasScatteredBlocks; i++) {
     const msg = messages[i];
     if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
     for (const block of msg.content) {
@@ -254,8 +228,8 @@ function normalizeResumeMessages(messages) {
   }
   if (!hasScatteredBlocks) return messages;
-  // Scan ALL user messages in reverse to collect the LATEST version of each
-  // block type. This handles both full and partial scatter.
+  // Scan ALL user messages (including first) in reverse to collect the LATEST
+  // version of each block type. This handles both full and partial scatter.
   const found = new Map();
   for (let i = messages.length - 1; i >= firstUserIdx; i--) {
@@ -267,6 +241,7 @@ function normalizeResumeMessages(messages) {
       const text = block.text || "";
       if (!isRelocatableBlock(text)) continue;
+      // Determine block type for dedup
       let blockType;
       if (isSkillsBlock(text)) blockType = "skills";
       else if (isMcpBlock(text)) blockType = "mcp";
@@ -274,6 +249,7 @@ function normalizeResumeMessages(messages) {
       else if (isHooksBlock(text)) blockType = "hooks";
       else continue;
+      // Keep only the LATEST (first found scanning backwards)
       if (!found.has(blockType)) {
         let fixedText = text;
         if (blockType === "hooks") fixedText = stripSessionKnowledge(text);
@@ -287,17 +263,15 @@ function normalizeResumeMessages(messages) {
   if (found.size === 0) return messages;
-  // Remove ALL relocatable blocks from ALL user messages
+  // Remove ALL relocatable blocks from ALL user messages (both first and later)
   const result = messages.map((msg) => {
     if (msg.role !== "user" || !Array.isArray(msg.content)) return msg;
-    const filtered = msg.content.filter(
-      (b) => !isRelocatableBlock(b.text || "")
-    );
+    const filtered = msg.content.filter((b) => !isRelocatableBlock(b.text || ""));
     if (filtered.length === msg.content.length) return msg;
     return { ...msg, content: filtered };
   });
-  // Order must match fresh session layout: deferred -> mcp -> skills -> hooks
+  // FIX: Order must match fresh session layout: deferred → mcp → skills → hooks
   const ORDER = ["deferred", "mcp", "skills", "hooks"];
   const toRelocate = ORDER.filter((t) => found.has(t)).map((t) => found.get(t));
@@ -309,12 +283,245 @@ function normalizeResumeMessages(messages) {
   return result;
 }
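The Call 1 / Call 2 failure described in the doc comment above comes down to idempotence: running the fix on its own output must leave `messages[0]` unchanged. A condensed sketch of the same "latest instance wins, fixed order" strategy makes that property easy to check. It is not the package's code: block types are read from a hypothetical `kind` field instead of the `isSkillsBlock`/`isMcpBlock`/... text matchers, hooks session-knowledge stripping is omitted, and the early return for unscattered blocks is dropped.

```javascript
// Condensed, hypothetical sketch of "latest instance wins, fixed order".
const ORDER = ["deferred", "mcp", "skills", "hooks"];
const classify = (block) => (ORDER.includes(block?.kind) ? block.kind : null);

function relocateLatest(messages) {
  const firstUserIdx = messages.findIndex((m) => m.role === "user");
  if (firstUserIdx === -1 || !Array.isArray(messages[firstUserIdx].content)) return messages;

  // Scan messages and blocks backwards: the first hit per type is the latest.
  const found = new Map();
  for (let i = messages.length - 1; i >= firstUserIdx; i--) {
    const msg = messages[i];
    if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
    for (const block of [...msg.content].reverse()) {
      const type = classify(block);
      if (type && !found.has(type)) found.set(type, block);
    }
  }
  if (found.size === 0) return messages;

  // Remove every relocatable block from every user message...
  const result = messages.map((m) =>
    m.role === "user" && Array.isArray(m.content)
      ? { ...m, content: m.content.filter((b) => !classify(b)) }
      : m
  );
  // ...then prepend the latest ones to the first user message in ORDER.
  const head = ORDER.filter((t) => found.has(t)).map((t) => found.get(t));
  result[firstUserIdx] = { ...result[firstUserIdx], content: [...head, ...result[firstUserIdx].content] };
  return result;
}

// Partial scatter: mcp stayed in the first message, skills drifted later.
const msgs = [
  { role: "user", content: [{ kind: "mcp", text: "m1" }, { type: "text", text: "hi" }] },
  { role: "assistant", content: [{ type: "text", text: "ok" }] },
  { role: "user", content: [{ kind: "skills", text: "s2" }, { type: "text", text: "next" }] },
];
const once = relocateLatest(msgs);
const twice = relocateLatest(once); // second call sees the same prefix
console.log(JSON.stringify(once) === JSON.stringify(twice));
```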
-// ---------------------------------------------------------------------------
-// Tool schema stabilization (Bug 3)
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
+// Image stripping from old tool results (cost optimization)
+// --------------------------------------------------------------------------
+// CACHE_FIX_IMAGE_KEEP_LAST=N  — keep images only in the last N user messages.
+// Unset or 0 = disabled (all images preserved, backward compatible).
+// Images in tool_result blocks older than N user messages from the end are
+// replaced with a text placeholder. User-pasted images (direct image blocks
+// in user messages, not inside tool_result) are left alone.
+const IMAGE_KEEP_LAST = parseInt(process.env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10);
+/**
+ * Strip base64 image blocks from tool_result content in older messages.
+ * Returns { messages, stats } where stats has stripping metrics.
+ */
+function stripOldToolResultImages(messages, keepLast) {
+  if (!keepLast || keepLast <= 0 || !Array.isArray(messages)) {
+    return { messages, stats: null };
+  }
+  // Find user message indices (turns) so we can count from the end
+  const userMsgIndices = [];
+  for (let i = 0; i < messages.length; i++) {
+    if (messages[i].role === "user") userMsgIndices.push(i);
+  }
+  if (userMsgIndices.length <= keepLast) {
+    return { messages, stats: null }; // not enough turns to strip anything
+  }
+  // Messages at or after this index are "recent" — keep their images
+  const cutoffIdx = userMsgIndices[userMsgIndices.length - keepLast];
+  let strippedCount = 0;
+  let strippedBytes = 0;
+  const result = messages.map((msg, msgIdx) => {
+    // Only process user messages before the cutoff (tool_result is in user msgs)
+    if (msg.role !== "user" || msgIdx >= cutoffIdx || !Array.isArray(msg.content)) {
+      return msg;
+    }
+    let msgModified = false;
+    const newContent = msg.content.map((block) => {
+      // Only strip images inside tool_result blocks, not user-pasted images
+      if (block.type === "tool_result" && Array.isArray(block.content)) {
+        let toolModified = false;
+        const newToolContent = block.content.map((item) => {
+          if (item.type === "image") {
+            strippedCount++;
+            if (item.source?.data) {
+              strippedBytes += item.source.data.length;
+            }
+            toolModified = true;
+            return {
+              type: "text",
+              text: "[image stripped from history — file may still be on disk]",
+            };
+          }
+          return item;
+        });
+        if (toolModified) {
+          msgModified = true;
+          return { ...block, content: newToolContent };
+        }
+      }
+      return block;
+    });
+    if (msgModified) {
+      return { ...msg, content: newContent };
+    }
+    return msg;
+  });
+  const stats = strippedCount > 0
+    ? { strippedCount, strippedBytes, estimatedTokens: Math.ceil(strippedBytes * 0.125) }
+    : null;
+  return { messages: strippedCount > 0 ? result : messages, stats };
+}
+// --------------------------------------------------------------------------
+// Prefix lock — replay saved messages[0] on resume for cache hit
+// --------------------------------------------------------------------------
+// CACHE_FIX_PREFIX_LOCK=1 — save messages[0] on every call and replay it on
+// resume to avoid a cache rebuild. Disabled by default.
+//
+// On resume, CC reassembles messages with blocks in different positions and
+// injects fresh system-reminders, changing the prefix bytes. Even after our
+// relocation fix corrects the blocks, the prefix differs from what the server
+// cached on the last pre-exit call, causing a full cache rebuild.
+//
+// This feature saves the exact messages[0] content after all fixes are applied.
+// On the first call of a new process (resume), if system prompt hash and tools
+// hash match the saved snapshot, and the real user message text matches, we
+// replay the saved messages[0] to produce a byte-identical prefix → cache hit.
+const PREFIX_LOCK = process.env.CACHE_FIX_PREFIX_LOCK === "1";
+const PREFIX_LOCK_FILE = join(homedir(), ".claude", "cache-fix-prefix-lock.json");
+let _prefixLockFirstCall = true;
+/**
+ * Compute hashes for prefix lock comparison.
+ */
+function computePrefixHashes(system, tools) {
+  const sysHash = system
+    ? createHash("sha256").update(JSON.stringify(system)).digest("hex").slice(0, 16)
+    : "none";
+  const toolHash = tools
+    ? createHash("sha256").update(JSON.stringify(tools.map(t => t.name).sort())).digest("hex").slice(0, 16)
+    : "none";
+  return { sysHash, toolHash };
+}
+/**
+ * Extract the real user message text from messages[0] (skipping system-reminders).
+ */
+function extractUserTextFromFirstMsg(msg) {
+  if (!msg || !Array.isArray(msg.content)) return "";
+  for (const block of msg.content) {
+    if (block.type === "text" && typeof block.text === "string" &&
+        !block.text.startsWith("<system-reminder>") &&
+        !block.text.startsWith("<local-command")) {
+      return block.text.slice(0, 200); // enough to identify, not too much to compare
+    }
+  }
+  return "";
+}
+/**
+ * Hash all non-system-reminder user content in messages[0] to detect
+ * substantive changes that the userText check (first 200 chars) might miss.
+ */
+function hashUserContent(msg) {
+  if (!msg || !Array.isArray(msg.content)) return "empty";
+  const userBlocks = msg.content.filter(b =>
+    b.type === "text" && typeof b.text === "string" &&
+    !b.text.startsWith("<system-reminder>") &&
+    !b.text.startsWith("<local-command")
+  );
+  if (userBlocks.length === 0) return "empty";
+  return createHash("sha256")
+    .update(userBlocks.map(b => b.text).join("\n"))
+    .digest("hex").slice(0, 16);
+}
+/**
+ * On resume: try to replay saved messages[0] for cache hit.
+ * Returns the locked messages array or the original if lock doesn't apply.
+ */
+function applyPrefixLock(messages, system, tools) {
+  if (!PREFIX_LOCK || !Array.isArray(messages) || messages.length < 2) return messages;
+  const firstUserIdx = messages.findIndex(m => m.role === "user");
+  if (firstUserIdx === -1) return messages;
+  const { sysHash, toolHash } = computePrefixHashes(system, tools);
+  const currentUserText = extractUserTextFromFirstMsg(messages[firstUserIdx]);
+  const currentContentHash = hashUserContent(messages[firstUserIdx]);
+  // Skip if this looks like a compacted conversation (system-reminder as first block
+  // with compaction summary markers)
+  const firstBlock = messages[firstUserIdx]?.content?.[0];
+  if (firstBlock?.text?.includes("CompactBoundary") || firstBlock?.text?.includes("compacted")) {
+    debugLog("PREFIX LOCK: skipped — compacted conversation detected");
+    return messages;
+  }
+  if (_prefixLockFirstCall) {
+    _prefixLockFirstCall = false;
+    // Try to load and apply saved prefix
+    try {
+      const saved = JSON.parse(readFileSync(PREFIX_LOCK_FILE, "utf8"));
+      if (saved.sysHash !== sysHash) {
+        debugLog("PREFIX LOCK: skipped — system prompt changed");
+      } else if (saved.toolHash !== toolHash) {
+        debugLog("PREFIX LOCK: skipped — tools changed");
+      } else if (saved.userText !== currentUserText) {
+        debugLog("PREFIX LOCK: skipped — user message text changed");
+      } else if (saved.contentHash && saved.contentHash !== currentContentHash) {
+        debugLog("PREFIX LOCK: skipped — user content hash changed (substantive context change)");
+      } else if (!saved.content || !Array.isArray(saved.content)) {
+        debugLog("PREFIX LOCK: skipped — saved content invalid");
+      } else {
+        // Apply the saved messages[0] content
+        const result = [...messages];
+        result[firstUserIdx] = { ...result[firstUserIdx], content: saved.content };
+        debugLog(`PREFIX LOCK: APPLIED — replayed saved messages[0] (${saved.content.length} blocks)`);
+        return result;
+      }
+    } catch {
+      debugLog("PREFIX LOCK: no saved prefix found (first run or file missing)");
+    }
+  }
+  return messages;
+}
+/**
+ * Save current messages[0] content for future resume replay.
+ * Called after all fixes are applied, before the request is sent.
+ */
+function savePrefixLock(messages, system, tools) {
+  if (!PREFIX_LOCK || !Array.isArray(messages)) return;
+  const firstUserIdx = messages.findIndex(m => m.role === "user");
+  if (firstUserIdx === -1) return;
+  const { sysHash, toolHash } = computePrefixHashes(system, tools);
+  const userText = extractUserTextFromFirstMsg(messages[firstUserIdx]);
+  const contentHash = hashUserContent(messages[firstUserIdx]);
+  const content = messages[firstUserIdx].content;
+  try {
+    writeFileSync(PREFIX_LOCK_FILE, JSON.stringify({
+      timestamp: new Date().toISOString(),
+      sysHash,
+      toolHash,
+      userText,
+      contentHash,
+      content,
+    }));
+  } catch (e) {
+    debugLog("PREFIX LOCK: failed to save:", e?.message);
+  }
+}
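Replaying the saved first user message matters because the prompt cache only hits when the request prefix is byte-identical, and the same attachment blocks in a different order serialize to different bytes. A minimal sketch, assuming an in-memory string in place of `PREFIX_LOCK_FILE` and hypothetical `saveFirstUser`/`replayFirstUser` helpers; it deliberately omits the hash guards that `applyPrefixLock` runs before replaying.

```javascript
// In-memory stand-in for PREFIX_LOCK_FILE (hypothetical; the package writes JSON to disk).
let store = null;

function saveFirstUser(messages) {
  const idx = messages.findIndex((m) => m.role === "user");
  if (idx === -1) return;
  // Serialize as writeFileSync(JSON.stringify(...)) would.
  store = JSON.stringify({ content: messages[idx].content });
}

function replayFirstUser(messages) {
  const idx = messages.findIndex((m) => m.role === "user");
  if (idx === -1 || store === null) return messages;
  const saved = JSON.parse(store);
  // Shallow copy: only the first user message object is replaced.
  const result = [...messages];
  result[idx] = { ...result[idx], content: saved.content };
  return result;
}

// Process 1: attachment blocks arrive in one order and the prefix is saved.
const firstRun = [
  { role: "user", content: [
    { type: "text", text: "<system-reminder>skills</system-reminder>" },
    { type: "text", text: "<system-reminder>mcp</system-reminder>" },
    { type: "text", text: "fix the build" },
  ] },
];
saveFirstUser(firstRun);

// Process 2 (resume): same blocks, different order.
const resumed = [
  { role: "user", content: [
    { type: "text", text: "<system-reminder>mcp</system-reminder>" },
    { type: "text", text: "<system-reminder>skills</system-reminder>" },
    { type: "text", text: "fix the build" },
  ] },
  { role: "assistant", content: [{ type: "text", text: "On it." }] },
];

const before = JSON.stringify(resumed[0]) === JSON.stringify(firstRun[0]);
const after = JSON.stringify(replayFirstUser(resumed)[0]) === JSON.stringify(firstRun[0]);
console.log(before, after); // false true
```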
+// --------------------------------------------------------------------------
+// Tool schema stabilization (Bug 2 secondary cause)
+// --------------------------------------------------------------------------
 /**
- * Sort tool definitions by name for deterministic ordering.
+ * Sort tool definitions by name for deterministic ordering. Tool schema bytes
+ * changing mid-session was acknowledged as a bug in the v2.1.88 changelog.
  */
 function stabilizeToolOrder(tools) {
   if (!Array.isArray(tools) || tools.length === 0) return tools;
@@ -325,9 +532,228 @@ function stabilizeToolOrder(tools) {
   });
 }
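Sorting by name makes the serialized `tools` array independent of arrival order. The body of `stabilizeToolOrder` is elided in this hunk, so the comparator below is an assumption: it compares names by UTF-16 code units rather than with `localeCompare`, because locale-aware collation can depend on the runtime's default locale and ICU data, which would undercut the goal of deterministic bytes. `sortToolsByName` is a hypothetical stand-in.

```javascript
// Hypothetical stand-in for stabilizeToolOrder: returns a new array sorted by
// tool name, leaving the input untouched.
function sortToolsByName(tools) {
  if (!Array.isArray(tools) || tools.length === 0) return tools;
  return [...tools].sort((a, b) => {
    const an = a.name ?? "";
    const bn = b.name ?? "";
    // Code-unit comparison: identical results on every runtime and locale.
    return an < bn ? -1 : an > bn ? 1 : 0;
  });
}

const turn1 = [{ name: "Read" }, { name: "Bash" }, { name: "mcp__github__search" }];
const turn2 = [{ name: "mcp__github__search" }, { name: "Read" }, { name: "Bash" }];

const same = JSON.stringify(sortToolsByName(turn1)) === JSON.stringify(sortToolsByName(turn2));
console.log(same); // true
console.log(sortToolsByName(turn1).map((t) => t.name).join(", ")); // Bash, Read, mcp__github__search
```

Note that code-unit order puts every uppercase name before any lowercase one, so `mcp__*` tools sort last; that is fine here because only stability matters, not human-friendly ordering.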
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
+// Debug logging (writes to ~/.claude/cache-fix-debug.log)
+// Set CACHE_FIX_DEBUG=1 to enable
+// --------------------------------------------------------------------------
+import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
+const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
+const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
+const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
+function debugLog(...args) {
+  if (!DEBUG) return;
+  const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
+  try { appendFileSync(LOG_PATH, line); } catch {}
+}
+// --------------------------------------------------------------------------
+// Prefix snapshot — captures message prefix for cross-process diff.
+// Set CACHE_FIX_PREFIXDIFF=1 to enable.
+//
+// On each API call: saves the first 5 messages (cache_control stripped, long
+// text truncated) plus hashes of the system prompt and tool names to
+// ~/.claude/cache-fix-snapshots/<session-hash>-last.json
+//
+// On first call after startup: compares against saved snapshot and writes
+// a diff report to ~/.claude/cache-fix-snapshots/<session-hash>-diff.json
+// --------------------------------------------------------------------------
+let _prefixDiffFirstCall = true;
+// --------------------------------------------------------------------------
+// GrowthBook flag dump (runs once on first API call)
+// --------------------------------------------------------------------------
+let _growthBookDumped = false;
+function dumpGrowthBookFlags() {
+  if (_growthBookDumped || !DEBUG) return;
+  _growthBookDumped = true;
+  try {
+    const claudeJson = JSON.parse(readFileSync(join(homedir(), ".claude.json"), "utf8"));
+    const features = claudeJson.cachedGrowthBookFeatures;
+    if (!features) { debugLog("GROWTHBOOK: no cachedGrowthBookFeatures found"); return; }
+    // Log the flags that matter for cost/cache/context behavior
+    const interesting = {
+      hawthorn_window: features.tengu_hawthorn_window,
+      pewter_kestrel: features.tengu_pewter_kestrel,
+      summarize_tool_results: features.tengu_summarize_tool_results,
+      slate_heron: features.tengu_slate_heron,
+      session_memory: features.tengu_session_memory,
+      sm_compact: features.tengu_sm_compact,
+      sm_compact_config: features.tengu_sm_compact_config,
+      sm_config: features.tengu_sm_config,
+      cache_plum_violet: features.tengu_cache_plum_violet,
+      prompt_cache_1h_config: features.tengu_prompt_cache_1h_config,
+      crystal_beam: features.tengu_crystal_beam,
+      cold_compact: features.tengu_cold_compact,
+      system_prompt_global_cache: features.tengu_system_prompt_global_cache,
+      compact_cache_prefix: features.tengu_compact_cache_prefix,
+    };
+    debugLog("GROWTHBOOK FLAGS:", JSON.stringify(interesting, null, 2));
+  } catch (e) {
+    debugLog("GROWTHBOOK: failed to read ~/.claude.json:", e?.message);
+  }
+}
+// --------------------------------------------------------------------------
+// Microcompact / budget monitoring
+// --------------------------------------------------------------------------
+/**
+ * Scan outgoing messages for signs of microcompact clearing and budget
+ * enforcement. Counts tool results that have been gutted and reports stats.
+ */
+function monitorContextDegradation(messages) {
+  if (!Array.isArray(messages)) return null;
+  let clearedToolResults = 0;
+  let totalToolResultChars = 0;
+  let totalToolResults = 0;
+  for (const msg of messages) {
+    if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
+    for (const block of msg.content) {
+      if (block.type === "tool_result") {
+        totalToolResults++;
+        const content = block.content;
+        if (typeof content === "string") {
+          if (content === "[Old tool result content cleared]") {
+            clearedToolResults++;
+          } else {
+            totalToolResultChars += content.length;
+          }
+        } else if (Array.isArray(content)) {
+          for (const item of content) {
+            if (item.type === "text") {
+              if (item.text === "[Old tool result content cleared]") {
+                clearedToolResults++;
+              } else {
+                totalToolResultChars += item.text.length;
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+  if (totalToolResults === 0) return null;
+  const stats = { totalToolResults, clearedToolResults, totalToolResultChars };
+  if (clearedToolResults > 0) {
+    debugLog(`MICROCOMPACT: ${clearedToolResults}/${totalToolResults} tool results cleared`);
+  }
+  // Warn once tool-result text exceeds 150K chars (75% of the 200K budget threshold)
+  if (totalToolResultChars > 150000) {
+    debugLog(`BUDGET WARNING: tool result chars at ${totalToolResultChars.toLocaleString()} / 200,000 threshold`);
+  }
+  return stats;
+}
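The counting rule in `monitorContextDegradation` can be checked on a hand-built conversation. This sketch condenses the same loop into a hypothetical `countToolResults` helper: a `tool_result` counts as cleared when its string content, or any of its text items, equals the placeholder `[Old tool result content cleared]`, and all other text contributes to the character total. As in the original loop, a `tool_result` whose array content holds several placeholder items is counted once per item, and non-text items such as images are ignored.

```javascript
const CLEARED = "[Old tool result content cleared]";

// Condensed version of the counting loop in monitorContextDegradation
// (hypothetical helper; returns null when there are no tool results).
function countToolResults(messages) {
  let total = 0;
  let cleared = 0;
  let chars = 0;
  for (const msg of messages) {
    if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_result") continue;
      total++;
      // String content and array-of-text content are treated the same way.
      const texts = typeof block.content === "string"
        ? [block.content]
        : Array.isArray(block.content)
          ? block.content.filter((i) => i.type === "text").map((i) => i.text)
          : [];
      for (const t of texts) {
        if (t === CLEARED) cleared++;
        else chars += t.length;
      }
    }
  }
  if (total === 0) return null;
  return { totalToolResults: total, clearedToolResults: cleared, totalToolResultChars: chars };
}

const messages = [
  { role: "user", content: [{ type: "tool_result", tool_use_id: "a", content: CLEARED }] },
  { role: "assistant", content: [{ type: "text", text: "ok" }] },
  { role: "user", content: [
    { type: "tool_result", tool_use_id: "b", content: [{ type: "text", text: "hello" }] },
  ] },
];
// totalToolResults 2, clearedToolResults 1, totalToolResultChars 5
console.log(countToolResults(messages));
```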
+function snapshotPrefix(payload) {
+  if (!PREFIXDIFF) return;
+  try {
+    mkdirSync(SNAPSHOT_DIR, { recursive: true });
+    // Session key: hash of the first 2000 chars of the serialized system prompt,
+    // stable across restarts for the same project. Different projects get
+    // different snapshots; the same project matches across resume.
+    const sessionKey = payload.system
+      ? createHash("sha256").update(JSON.stringify(payload.system).slice(0, 2000)).digest("hex").slice(0, 12)
+      : "default";
+    const snapshotFile = join(SNAPSHOT_DIR, `${sessionKey}-last.json`);
+    const diffFile = join(SNAPSHOT_DIR, `${sessionKey}-diff.json`);
+    // Build prefix snapshot: first 5 messages, stripped of cache_control
+    const prefixMsgs = (payload.messages || []).slice(0, 5).map(msg => {
+      const content = Array.isArray(msg.content)
+        ? msg.content.map(b => {
+            const { cache_control, ...rest } = b;
+            // Truncate long text blocks for diffing
+            if (rest.text && rest.text.length > 500) {
+              rest.text = rest.text.slice(0, 500) + `...[${rest.text.length} chars]`;
+            }
+            return rest;
+          })
+        : msg.content;
+      return { role: msg.role, content };
+    });
+    const toolsHash = payload.tools
+      ? createHash("sha256").update(JSON.stringify(payload.tools.map(t => t.name))).digest("hex").slice(0, 16)
+      : "none";
+    const systemHash = payload.system
+      ? createHash("sha256").update(JSON.stringify(payload.system)).digest("hex").slice(0, 16)
+      : "none";
+    const snapshot = {
+      timestamp: new Date().toISOString(),
+      messageCount: payload.messages?.length || 0,
+      toolsHash,
+      systemHash,
+      prefixMessages: prefixMsgs,
+    };
+    // On first call: compare against saved
+    if (_prefixDiffFirstCall) {
+      _prefixDiffFirstCall = false;
+      try {
+        const prev = JSON.parse(readFileSync(snapshotFile, "utf8"));
+        const diff = {
+          timestamp: snapshot.timestamp,
+          prevTimestamp: prev.timestamp,
+          toolsMatch: prev.toolsHash === snapshot.toolsHash,
+          systemMatch: prev.systemHash === snapshot.systemHash,
+          messageCountPrev: prev.messageCount,
+          messageCountNow: snapshot.messageCount,
+          prefixDiffs: [],
+        };
+        const maxIdx = Math.max(prev.prefixMessages.length, snapshot.prefixMessages.length);
+        for (let i = 0; i < maxIdx; i++) {
+          const prevMsg = JSON.stringify(prev.prefixMessages[i] || null);
+          const nowMsg = JSON.stringify(snapshot.prefixMessages[i] || null);
+          if (prevMsg !== nowMsg) {
+            diff.prefixDiffs.push({
+              index: i,
+              prev: prev.prefixMessages[i] || null,
+              now: snapshot.prefixMessages[i] || null,
+            });
+          }
+        }
+        writeFileSync(diffFile, JSON.stringify(diff, null, 2));
+        debugLog(`PREFIX DIFF: ${diff.prefixDiffs.length} differences in first 5 messages. tools=${diff.toolsMatch ? "match" : "DIFFER"} system=${diff.systemMatch ? "match" : "DIFFER"}`);
+      } catch {
+        // No previous snapshot — first run
+      }
+    }
+    // Save current snapshot
+    writeFileSync(snapshotFile, JSON.stringify(snapshot, null, 2));
+  } catch (e) {
+    debugLog("PREFIX SNAPSHOT ERROR:", e?.message);
+  }
+}
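The comparison in `snapshotPrefix` is an index-by-index diff: each prefix message is serialized and compared with its counterpart from the previous process, and a missing slot compares as `null`. A standalone sketch of that loop, with `diffPrefix` as a hypothetical name. Because `JSON.stringify` preserves key order, two messages with the same fields in a different key order are reported as different, which is what you want when hunting for byte-level cache breaks.

```javascript
// Hypothetical standalone version of the diff loop in snapshotPrefix.
function diffPrefix(prevMsgs, nowMsgs) {
  const diffs = [];
  const maxIdx = Math.max(prevMsgs.length, nowMsgs.length);
  for (let i = 0; i < maxIdx; i++) {
    const prev = prevMsgs[i] || null;
    const now = nowMsgs[i] || null;
    // Serialized comparison: any byte-level change, including key order, counts.
    if (JSON.stringify(prev) !== JSON.stringify(now)) {
      diffs.push({ index: i, prev, now });
    }
  }
  return diffs;
}

const prev = [
  { role: "user", content: [{ type: "text", text: "fix the build" }] },
  { role: "assistant", content: [{ type: "text", text: "On it." }] },
];
const now = [
  { role: "user", content: [{ type: "text", text: "fix the build" }] },
  { role: "assistant", content: [{ text: "On it.", type: "text" }] }, // same fields, new key order
  { role: "user", content: [{ type: "text", text: "thanks" }] },
];

const diffs = diffPrefix(prev, now);
console.log(diffs.map((d) => d.index).join(",")); // 1,2
```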
+// --------------------------------------------------------------------------
 // Fetch interceptor
-// ---------------------------------------------------------------------------
+// --------------------------------------------------------------------------
 const _origFetch = globalThis.fetch;
@@ -339,23 +765,27 @@ globalThis.fetch = async function (url, options) {
     !urlStr.includes("batches") &&
     !urlStr.includes("count_tokens");
-  if (
-    isMessagesEndpoint &&
-    options?.body &&
-    typeof options.body === "string"
-  ) {
+  if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
     try {
       const payload = JSON.parse(options.body);
       let modified = false;
+      // One-time GrowthBook flag dump on first API call
+      dumpGrowthBookFlags();
       debugLog("--- API call to", urlStr);
       debugLog("message count:", payload.messages?.length);
-      // Bug 1: Relocate scattered attachment blocks
+      // Detect synthetic model (false rate limiter, B3)
+      if (payload.model === "<synthetic>") {
+        debugLog("FALSE RATE LIMIT: synthetic model detected — client-side rate limit, no real API call");
+      }
+      // Bug 1: Relocate resume attachment blocks
       if (payload.messages) {
+        // Log message structure for debugging
         if (DEBUG) {
-          let firstUserIdx = -1;
-          let lastUserIdx = -1;
+          let firstUserIdx = -1, lastUserIdx = -1;
           for (let i = 0; i < payload.messages.length; i++) {
             if (payload.messages[i].role === "user") {
               if (firstUserIdx === -1) firstUserIdx = i;
@@ -365,39 +795,20 @@ globalThis.fetch = async function (url, options) {
           if (firstUserIdx !== -1) {
             const firstContent = payload.messages[firstUserIdx].content;
             const lastContent = payload.messages[lastUserIdx].content;
-            debugLog(
-              "firstUserIdx:",
-              firstUserIdx,
-              "lastUserIdx:",
-              lastUserIdx
-            );
-            debugLog(
-              "first user msg blocks:",
-              Array.isArray(firstContent) ? firstContent.length : "string"
-            );
+            debugLog("firstUserIdx:", firstUserIdx, "lastUserIdx:", lastUserIdx);
+            debugLog("first user msg blocks:", Array.isArray(firstContent) ? firstContent.length : "string");
             if (Array.isArray(firstContent)) {
               for (const b of firstContent) {
                 const t = (b.text || "").substring(0, 80);
-                debugLog(
-                  "  first[block]:",
-                  isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
-                  JSON.stringify(t)
-                );
+                debugLog("  first[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
               }
             }
             if (firstUserIdx !== lastUserIdx) {
-              debugLog(
-                "last user msg blocks:",
-                Array.isArray(lastContent) ? lastContent.length : "string"
-              );
+              debugLog("last user msg blocks:", Array.isArray(lastContent) ? lastContent.length : "string");
               if (Array.isArray(lastContent)) {
                 for (const b of lastContent) {
                   const t = (b.text || "").substring(0, 80);
-                  debugLog(
-                    "  last[block]:",
-                    isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
-                    JSON.stringify(t)
-                  );
+                  debugLog("  last[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
                 }
               }
             } else {
@@ -412,13 +823,37 @@ globalThis.fetch = async function (url, options) {
           modified = true;
           debugLog("APPLIED: resume message relocation");
         } else {
+          debugLog("SKIPPED: resume relocation (not a resume or already correct)");
+        }
+      }
+      // Image stripping: remove old tool_result images to reduce token waste
+      if (payload.messages && IMAGE_KEEP_LAST > 0) {
+        const { messages: imgStripped, stats: imgStats } = stripOldToolResultImages(
+          payload.messages, IMAGE_KEEP_LAST
+        );
+        if (imgStats) {
+          payload.messages = imgStripped;
+          modified = true;
           debugLog(
-            "SKIPPED: resume relocation (not a resume or already correct)"
+            `APPLIED: stripped ${imgStats.strippedCount} images from old tool results`,
+            `(~${imgStats.strippedBytes} base64 bytes, ~${imgStats.estimatedTokens} tokens saved)`
           );
+        } else {
+          debugLog("SKIPPED: image stripping (no old images found or not enough turns)");
+        }
+      }
+      // Prefix lock: replay saved messages[0] on resume for cache hit
+      if (payload.messages && payload.system) {
+        const locked = applyPrefixLock(payload.messages, payload.system, payload.tools);
+        if (locked !== payload.messages) {
+          payload.messages = locked;
+          modified = true;
         }
       }
-      // Bug 3: Stabilize tool ordering
+      // Bug 2a: Stabilize tool ordering
       if (payload.tools) {
         const sorted = stabilizeToolOrder(payload.tools);
         const changed = sorted.some(
@@ -431,7 +866,7 @@ globalThis.fetch = async function (url, options) {
         }
       }
-      // Bug 2: Stabilize fingerprint in attribution header
+      // Bug 2b: Stabilize fingerprint in attribution header
       if (payload.system && payload.messages) {
         const fix = stabilizeFingerprint(payload.system, payload.messages);
         if (fix) {
@@ -441,12 +876,7 @@ globalThis.fetch = async function (url, options) {
             text: fix.newText,
           };
           modified = true;
-          debugLog(
-            "APPLIED: fingerprint stabilized from",
-            fix.oldFingerprint,
-            "to",
-            fix.stableFingerprint
-          );
+          debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
         }
       }
@@ -454,11 +884,53 @@ globalThis.fetch = async function (url, options) {
         options = { ...options, body: JSON.stringify(payload) };
         debugLog("Request body rewritten");
       }
+      // Save prefix lock after all fixes applied
+      if (payload.messages && payload.system) {
+        savePrefixLock(payload.messages, payload.system, payload.tools);
+      }
+      // Monitor for microcompact / budget enforcement degradation
+      if (payload.messages) {
+        monitorContextDegradation(payload.messages);
+      }
+      // Capture prefix snapshot for cross-process diff analysis
+      snapshotPrefix(payload);
     } catch (e) {
       debugLog("ERROR in interceptor:", e?.message);
       // Parse failure — pass through unmodified
     }
   }
-  return _origFetch.apply(this, [url, options]);
+  const response = await _origFetch.apply(this, [url, options]);
+  // Extract quota utilization from response headers and save for hooks/MCP
+  if (isMessagesEndpoint) {
+    try {
+      const h5 = response.headers.get("anthropic-ratelimit-unified-5h-utilization");
+      const h7d = response.headers.get("anthropic-ratelimit-unified-7d-utilization");
+      const reset5h = response.headers.get("anthropic-ratelimit-unified-5h-reset");
+      const reset7d = response.headers.get("anthropic-ratelimit-unified-7d-reset");
+      const status = response.headers.get("anthropic-ratelimit-unified-status");
+      const overage = response.headers.get("anthropic-ratelimit-unified-overage-status");
+      if (h5 || h7d) {
+        const quota = {
+          timestamp: new Date().toISOString(),
+          five_hour: h5 ? { utilization: parseFloat(h5), pct: Math.round(parseFloat(h5) * 100), resets_at: reset5h ? parseInt(reset5h, 10) : null } : null,
+          seven_day: h7d ? { utilization: parseFloat(h7d), pct: Math.round(parseFloat(h7d) * 100), resets_at: reset7d ? parseInt(reset7d, 10) : null } : null,
+          status: status || null,
+          overage_status: overage || null,
+        };
+        const quotaFile = join(homedir(), ".claude", "quota-status.json");
+        writeFileSync(quotaFile, JSON.stringify(quota, null, 2));
+      }
+    } catch {
+      // Non-critical — don't break the response
+    }
+  }
+  return response;
 };
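The header handling at the end of the interceptor turns the `anthropic-ratelimit-unified-*` response headers into the JSON written to `quota-status.json`. The sketch below factors that into a hypothetical pure function over a WHATWG `Headers` object (a global in Node.js 18+, which the package already requires), so it can be tested without a network call. The header names come from the code above; the sample values are made up, and the sketch does not assume any particular unit for the reset value.

```javascript
// Hypothetical pure version of the quota extraction at the end of the interceptor.
function parseQuota(headers, now = new Date()) {
  const prefix = "anthropic-ratelimit-unified-";
  // Build one window entry, or null when its utilization header is absent.
  const windowFor = (name) => {
    const raw = headers.get(`${prefix}${name}-utilization`);
    if (!raw) return null;
    const utilization = parseFloat(raw);
    const reset = headers.get(`${prefix}${name}-reset`);
    return {
      utilization,
      pct: Math.round(utilization * 100),
      resets_at: reset ? parseInt(reset, 10) : null,
    };
  };
  const five_hour = windowFor("5h");
  const seven_day = windowFor("7d");
  if (!five_hour && !seven_day) return null; // nothing worth writing
  return {
    timestamp: now.toISOString(),
    five_hour,
    seven_day,
    status: headers.get(`${prefix}status`) || null,
    overage_status: headers.get(`${prefix}overage-status`) || null,
  };
}

// Sample values are made up for illustration.
const headers = new Headers({
  "anthropic-ratelimit-unified-5h-utilization": "0.42",
  "anthropic-ratelimit-unified-5h-reset": "1700000000",
  "anthropic-ratelimit-unified-status": "allowed",
});
const quota = parseQuota(headers, new Date(0));
console.log(quota.five_hour.pct, quota.seven_day); // 42 null
```

Keeping the parsing pure leaves the interceptor responsible only for the side effect (the `writeFileSync`), which stays inside its `try` so a failed write never affects the response returned to Claude Code.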