claude-code-cache-fix 1.0.0 → 1.1.0

Files changed (3)
  1. package/README.md +79 -5
  2. package/package.json +1 -1
  3. package/preload.mjs +443 -135
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # claude-code-cache-fix
2
2
 
3
- Fixes a prompt cache regression in [Claude Code](https://github.com/anthropics/claude-code) that causes **up to 20x cost increase** on resumed sessions. Confirmed broken through v2.1.92.
3
+ Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.92.
4
4
 
5
5
  ## The problem
6
6
 
@@ -14,6 +14,8 @@ Three bugs cause this:
14
14
 
15
15
  3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
16
16
 
17
+ Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
18
+
17
19
  ## Installation
18
20
 
19
21
  Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).
@@ -76,6 +78,42 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
76
78
 
77
79
  All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
78
80
 
81
+ ## Image stripping
82
+
83
+ Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn in carry-forward.
84
+
85
+ Enable image stripping to remove old images from tool results:
86
+
87
+ ```bash
88
+ export CACHE_FIX_IMAGE_KEEP_LAST=3
89
+ ```
90
+
91
+ This keeps images in the last 3 user messages and replaces older ones with a text placeholder. It only targets images inside `tool_result` blocks (Read tool output); user-pasted images are never touched. Files remain on disk for re-reading if needed.
92
+
93
+ Set to `0` (default) to disable.
94
+
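Note on the token figure: the ~62,500 number quoted in this section is consistent with a flat 0.125 tokens per base64 character, the same factor `stripOldToolResultImages` uses for `estimatedTokens` in `preload.mjs`. A minimal sketch of that arithmetic follows. `estimateCarryForwardTokens` is a hypothetical helper name, and the factor is the package's heuristic, not a documented tokenizer rate.

```javascript
// Hypothetical helper: rough token cost of one base64 payload carried in history.
// The 0.125 factor mirrors the estimate in preload.mjs; it is a heuristic.
const TOKENS_PER_BASE64_CHAR = 0.125;

function estimateCarryForwardTokens(base64Length, turns = 1) {
  // Cost per call, multiplied by the number of calls the image rides along on.
  return Math.ceil(base64Length * TOKENS_PER_BASE64_CHAR) * turns;
}

console.log(estimateCarryForwardTokens(500_000));     // 62500
console.log(estimateCarryForwardTokens(500_000, 10)); // 625000
```

Under this estimate the cost scales linearly with the number of turns an image stays in history, which is why capping retention at a few user messages matters.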
95
+ ## Monitoring
96
+
97
+ The interceptor includes monitoring for several additional issues identified by the community:
98
+
99
+ ### Microcompact / budget enforcement
100
+
101
+ Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
102
+
103
+ The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
104
+
105
+ ### False rate limiter
106
+
107
+ The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
108
+
109
+ ### GrowthBook flag dump
110
+
111
+ On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
112
+
113
+ ### Quota tracking
114
+
115
+ Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, and the values are saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
116
+
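A rough sketch of how a utilization header could be read from a fetch `Response`, using the standard `Headers` API available in Node.js 18+. `readUtilization` is a hypothetical helper, the sample value is made up, and the README does not say whether the header carries a fraction or a percentage, so the number is returned as-is.

```javascript
// Hypothetical helper: read one utilization header as a number.
// Returns null when the header is absent or not numeric.
function readUtilization(headers, name) {
  const raw = headers.get(name);
  if (raw === null) return null;
  const value = Number.parseFloat(raw);
  return Number.isNaN(value) ? null : value;
}

// Stand-in for a real API response's headers; the value is invented.
const headers = new Headers({
  "anthropic-ratelimit-unified-5h-utilization": "0.42",
});

const fiveHour = readUtilization(headers, "anthropic-ratelimit-unified-5h-utilization");
console.log(fiveHour);
```

Returning `null` for a missing header lets a status line hook tell "no data yet" apart from a utilization of zero.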
79
117
  ## Debug mode
80
118
 
81
119
  Enable debug logging to verify the fix is working:
@@ -88,31 +126,67 @@ Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
88
126
  - `APPLIED: resume message relocation` — block scatter was detected and fixed
89
127
  - `APPLIED: tool order stabilization` — tools were reordered
90
128
  - `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
91
- - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed (fresh session or already correct)
129
+ - `APPLIED: stripped N images from old tool results` — images were stripped
130
+ - `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
131
+ - `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
132
+ - `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
133
+ - `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
134
+ - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
135
+
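To check these markers across a long session, the log can be tallied rather than read line by line. The sketch below assumes the `[ISO timestamp] message` line format written by `debugLog` in `preload.mjs`. `summarizeDebugLog` and the sample lines are hypothetical.

```javascript
// Marker prefixes taken from the README list; the summary helper is hypothetical.
const MARKERS = [
  "APPLIED:",
  "SKIPPED:",
  "MICROCOMPACT:",
  "BUDGET WARNING:",
  "FALSE RATE LIMIT:",
  "GROWTHBOOK FLAGS:",
];

function summarizeDebugLog(logText) {
  const counts = Object.fromEntries(MARKERS.map((m) => [m, 0]));
  for (const line of logText.split("\n")) {
    // Each entry looks like "[2026-01-01T00:00:00.000Z] MESSAGE"; drop the timestamp.
    const message = line.replace(/^\[[^\]]*\]\s*/, "");
    const marker = MARKERS.find((m) => message.startsWith(m));
    if (marker) counts[marker]++;
  }
  return counts;
}

// Invented sample entries in the debugLog format.
const sample = [
  "[2026-01-01T00:00:00.000Z] APPLIED: tool order stabilization",
  "[2026-01-01T00:00:01.000Z] MICROCOMPACT: 2/10 tool results cleared",
  "[2026-01-01T00:00:02.000Z] APPLIED: stripped 1 images from old tool results",
].join("\n");

console.log(summarizeDebugLog(sample));
```

In practice `logText` would come from reading `~/.claude/cache-fix-debug.log`. The multi-line `GROWTHBOOK FLAGS` entry is counted once, on its first line.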
136
+ ### Prefix diff mode
137
+
138
+ Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
139
+
140
+ ```bash
141
+ CACHE_FIX_PREFIXDIFF=1 claude-fixed
142
+ ```
143
+
144
+ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are generated on the first API call after a restart.
145
+
146
+ ## Environment variables
147
+
148
+ | Variable | Default | Description |
149
+ |----------|---------|-------------|
150
+ | `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
151
+ | `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
152
+ | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
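
The table's defaults can be read as follows. This sketch mirrors the `=== "1"` and `parseInt(..., 10)` checks visible in `preload.mjs`. `readCacheFixConfig` is a hypothetical name, and the clamp to 0 for invalid values is an assumption about the intended behavior.

```javascript
// Sketch: derive settings from an env-like object, mirroring the table's defaults.
function readCacheFixConfig(env = process.env) {
  const keepLast = Number.parseInt(env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10);
  return {
    // Only the literal string "1" enables a flag; "true" or "yes" do not.
    debug: env.CACHE_FIX_DEBUG === "1",
    prefixDiff: env.CACHE_FIX_PREFIXDIFF === "1",
    // Non-numeric or non-positive values fall back to 0 (disabled).
    imageKeepLast: Number.isInteger(keepLast) && keepLast > 0 ? keepLast : 0,
  };
}

console.log(readCacheFixConfig({ CACHE_FIX_DEBUG: "1", CACHE_FIX_IMAGE_KEEP_LAST: "3" }));
// { debug: true, prefixDiff: false, imageKeepLast: 3 }
```

One consequence worth knowing: `CACHE_FIX_DEBUG=true` leaves debug logging off, because the check is an exact comparison against `"1"`.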
92
153
 
93
154
  ## Limitations
94
155
 
95
156
  - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
96
157
  - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
158
+ - **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
97
159
  - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
98
160
 
99
161
  ## Tracked issues
100
162
 
101
163
  - [#34629](https://github.com/anthropics/claude-code/issues/34629) — Original resume cache regression report
102
- - [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation
103
- - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development and testing
164
+ - [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
165
+ - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
104
166
  - [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
105
167
  - [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
106
168
  - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
107
169
 
170
+ ## Related research
171
+
172
+ - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
173
+
108
174
  ## Contributors
109
175
 
110
176
  - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, and tighter block matchers
111
177
  - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
112
- - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, debug logging, overage TTL downgrade discovery, package maintainer
178
+ - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
179
+ - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
180
+ - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
113
181
 
114
182
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
115
183
 
184
+ ## Support
185
+
186
+ If this tool saved you money, consider buying me a coffee:
187
+
188
+ <a href="https://buymeacoffee.com/vsits" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
189
+
116
190
  ## License
117
191
 
118
192
  [MIT](LICENSE)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "1.0.0",
3
+ "version": "1.1.0",
4
4
  "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
5
5
  "type": "module",
6
6
  "exports": "./preload.mjs",
package/preload.mjs CHANGED
@@ -8,51 +8,42 @@
8
8
  // later user messages instead of messages[0]. This breaks the prompt cache
9
9
  // prefix match. Fix: relocate them to messages[0] on every API call.
10
10
  // (github.com/anthropics/claude-code/issues/34629)
11
- // (github.com/anthropics/claude-code/issues/43657)
12
- // (github.com/anthropics/claude-code/issues/44045)
13
11
  //
14
12
  // Bug 2: Fingerprint instability
15
13
  // The cc_version fingerprint in the attribution header is computed from
16
14
  // messages[0] content INCLUDING meta/attachment blocks. When those blocks
17
- // change between turns, the fingerprint changes -> system prompt bytes
18
- // change -> cache bust. Fix: recompute fingerprint from real user text.
15
+ // change between turns, the fingerprint changes, busting cache within the
16
+ // same session. Fix: stabilize the fingerprint from the real user message.
19
17
  // (github.com/anthropics/claude-code/issues/40524)
20
18
  //
21
- // Bug 3: Non-deterministic tool schema ordering
22
- // Tool definitions can arrive in different orders between turns, changing
23
- // request bytes and busting cache. Fix: sort tools alphabetically by name.
19
+ // Bug 3: Image carry-forward in conversation history
20
+ // Images read via the Read tool persist as base64 in conversation history
21
+ // and are sent on every subsequent API call. A single 500KB image costs
22
+ // ~62,500 tokens per turn in carry-forward. Fix: strip base64 image blocks
23
+ // from tool_result content older than N user turns.
24
+ // Set CACHE_FIX_IMAGE_KEEP_LAST=N to enable (default: 0 = disabled).
25
+ // (github.com/anthropics/claude-code/issues/40524)
26
+ //
27
+ // Monitoring:
28
+ // - GrowthBook flag dump on first API call (CACHE_FIX_DEBUG=1)
29
+ // - Microcompact / budget enforcement detection (logs cleared tool results)
30
+ // - False rate limiter detection (model: "<synthetic>")
31
+ // - Quota utilization tracking (writes ~/.claude/quota-status.json)
32
+ // - Prefix snapshot diffing across process restarts (CACHE_FIX_PREFIXDIFF=1)
24
33
  //
25
- // Based on community work by @VictorSun92 (original monkey-patch + partial
26
- // scatter fixes) and @jmarianski (MITM proxy root cause analysis).
34
+ // Based on community fix by @VictorSun92 / @jmarianski (issue #34629),
35
+ // enhanced with fingerprint stabilization, image stripping, and monitoring.
36
+ // Bug research informed by @ArkNill's claude-code-hidden-problem-analysis.
27
37
  //
28
- // Usage: NODE_OPTIONS="--import claude-code-cache-fix" claude
38
+ // Load via: NODE_OPTIONS="--import $HOME/.claude/cache-fix-preload.mjs"
29
39
 
30
40
  import { createHash } from "node:crypto";
31
- import { appendFileSync } from "node:fs";
32
- import { homedir } from "node:os";
33
- import { join } from "node:path";
34
41
 
35
- // ---------------------------------------------------------------------------
36
- // Debug logging (writes to ~/.claude/cache-fix-debug.log)
37
- // Set CACHE_FIX_DEBUG=1 to enable
38
- // ---------------------------------------------------------------------------
39
-
40
- const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
41
- const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
42
-
43
- function debugLog(...args) {
44
- if (!DEBUG) return;
45
- const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
46
- try {
47
- appendFileSync(LOG_PATH, line);
48
- } catch {}
49
- }
50
-
51
- // ---------------------------------------------------------------------------
42
+ // --------------------------------------------------------------------------
52
43
  // Fingerprint stabilization (Bug 2)
53
- // ---------------------------------------------------------------------------
44
+ // --------------------------------------------------------------------------
54
45
 
55
- // Must match Claude Code src/utils/fingerprint.ts exactly.
46
+ // Must match src/utils/fingerprint.ts exactly.
56
47
  const FINGERPRINT_SALT = "59cf53e54c78";
57
48
  const FINGERPRINT_INDICES = [4, 7, 20];
58
49
 
@@ -77,20 +68,14 @@ function extractRealUserMessageText(messages) {
77
68
  if (msg.role !== "user") continue;
78
69
  const content = msg.content;
79
70
  if (!Array.isArray(content)) {
80
- if (
81
- typeof content === "string" &&
82
- !content.startsWith("<system-reminder>")
83
- ) {
71
+ if (typeof content === "string" && !content.startsWith("<system-reminder>")) {
84
72
  return content;
85
73
  }
86
74
  continue;
87
75
  }
76
+ // Find first text block that isn't a system-reminder
88
77
  for (const block of content) {
89
- if (
90
- block.type === "text" &&
91
- typeof block.text === "string" &&
92
- !block.text.startsWith("<system-reminder>")
93
- ) {
78
+ if (block.type === "text" && typeof block.text === "string" && !block.text.startsWith("<system-reminder>")) {
94
79
  return block.text;
95
80
  }
96
81
  }
@@ -100,17 +85,14 @@ function extractRealUserMessageText(messages) {
100
85
 
101
86
  /**
102
87
  * Extract current cc_version from system prompt blocks and recompute with
103
- * stable fingerprint. Returns { attrIdx, newText, oldFingerprint, stableFingerprint }
104
- * or null if no fix needed.
88
+ * stable fingerprint. Returns { attrIdx, newText, oldFingerprint, stableFingerprint }, or null if no fix is needed.
105
89
  */
106
90
  function stabilizeFingerprint(system, messages) {
107
91
  if (!Array.isArray(system)) return null;
108
92
 
93
+ // Find the attribution header block
109
94
  const attrIdx = system.findIndex(
110
- (b) =>
111
- b.type === "text" &&
112
- typeof b.text === "string" &&
113
- b.text.includes("x-anthropic-billing-header:")
95
+ (b) => b.type === "text" && typeof b.text === "string" && b.text.includes("x-anthropic-billing-header:")
114
96
  );
115
97
  if (attrIdx === -1) return null;
116
98
 
@@ -118,13 +100,14 @@ function stabilizeFingerprint(system, messages) {
118
100
  const versionMatch = attrBlock.text.match(/cc_version=([^;]+)/);
119
101
  if (!versionMatch) return null;
120
102
 
121
- const fullVersion = versionMatch[1]; // e.g. "2.1.92.a3f"
103
+ const fullVersion = versionMatch[1]; // e.g. "2.1.87.a3f"
122
104
  const dotParts = fullVersion.split(".");
123
105
  if (dotParts.length < 4) return null;
124
106
 
125
- const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.92"
107
+ const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
126
108
  const oldFingerprint = dotParts[3]; // "a3f"
127
109
 
110
+ // Compute stable fingerprint from real user text
128
111
  const realText = extractRealUserMessageText(messages);
129
112
  const stableFingerprint = computeFingerprint(realText, baseVersion);
130
113
 
@@ -139,38 +122,28 @@ function stabilizeFingerprint(system, messages) {
139
122
  return { attrIdx, newText, oldFingerprint, stableFingerprint };
140
123
  }
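
The version-splitting step in `stabilizeFingerprint` can be illustrated in isolation. The sketch below reuses the same `cc_version=([^;]+)` pattern and the four-part split shown in the hunk. `parseCcVersion`, `withFingerprint`, and the sample header string are hypothetical, and the way the package builds `newText` is not visible in this diff.

```javascript
// Sketch: split "cc_version=2.1.87.a3f" into base version and fingerprint suffix.
function parseCcVersion(headerText) {
  const match = headerText.match(/cc_version=([^;]+)/);
  if (!match) return null;
  const parts = match[1].split(".");
  if (parts.length < 4) return null; // no fingerprint suffix present
  return { baseVersion: parts.slice(0, 3).join("."), fingerprint: parts[3] };
}

// Swap in a replacement fingerprint, leaving the rest of the header untouched.
function withFingerprint(headerText, newFingerprint) {
  const parsed = parseCcVersion(headerText);
  if (!parsed) return headerText;
  return headerText.replace(
    `cc_version=${parsed.baseVersion}.${parsed.fingerprint}`,
    `cc_version=${parsed.baseVersion}.${newFingerprint}`
  );
}

// Invented header text; only the cc_version field follows the format in the diff.
const header = "x-anthropic-billing-header: cc_version=2.1.87.a3f; example=1";
console.log(parseCcVersion(header));
console.log(withFingerprint(header, "b7c"));
```

Returning the input unchanged when parsing fails keeps the rewrite a no-op for headers that do not match the expected shape.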
141
124
 
142
- // ---------------------------------------------------------------------------
125
+ // --------------------------------------------------------------------------
143
126
  // Resume message relocation (Bug 1)
144
- // ---------------------------------------------------------------------------
127
+ // --------------------------------------------------------------------------
145
128
 
146
129
  function isSystemReminder(text) {
147
130
  return typeof text === "string" && text.startsWith("<system-reminder>");
148
131
  }
149
-
132
+ // FIX: Match block headers with startsWith to avoid false positives from
133
+ // quoted content (e.g. "Note:" file-change reminders embedding debug logs).
150
134
  const SR = "<system-reminder>\n";
151
-
152
135
  function isHooksBlock(text) {
153
- return (
154
- isSystemReminder(text) && text.substring(0, 200).includes("hook success")
155
- );
136
+ // Hooks block header varies; fall back to head-region check
137
+ return isSystemReminder(text) && text.substring(0, 200).includes("hook success");
156
138
  }
157
139
  function isSkillsBlock(text) {
158
- return (
159
- typeof text === "string" &&
160
- text.startsWith(SR + "The following skills are available")
161
- );
140
+ return typeof text === "string" && text.startsWith(SR + "The following skills are available");
162
141
  }
163
142
  function isDeferredToolsBlock(text) {
164
- return (
165
- typeof text === "string" &&
166
- text.startsWith(SR + "The following deferred tools are now available")
167
- );
143
+ return typeof text === "string" && text.startsWith(SR + "The following deferred tools are now available");
168
144
  }
169
145
  function isMcpBlock(text) {
170
- return (
171
- typeof text === "string" &&
172
- text.startsWith(SR + "# MCP Server Instructions")
173
- );
146
+ return typeof text === "string" && text.startsWith(SR + "# MCP Server Instructions");
174
147
  }
175
148
  function isRelocatableBlock(text) {
176
149
  return (
@@ -208,18 +181,21 @@ function stripSessionKnowledge(text) {
208
181
  }
209
182
 
210
183
  /**
211
- * Core fix: on EVERY API call, scan the entire message array for the LATEST
184
+ * Core fix: on EVERY call, scan the entire message array for the LATEST
212
185
  * relocatable blocks (skills, MCP, deferred tools, hooks) and ensure they
213
186
  * are in messages[0]. This matches fresh session behavior where attachments
214
- * are always prepended to messages[0].
187
+ * are always prepended to messages[0] on every API call.
215
188
  *
216
- * The v2.1.90 native fix has a remaining detection gap: it bails early if
217
- * it sees *some* relocatable blocks in messages[0], missing the case where
218
- * others have scattered elsewhere (partial scatter).
189
+ * The original community fix only checked the last user message, which
190
+ * broke on subsequent turns because:
191
+ * - Call 1: skills in last msg → relocated to messages[0] (3 blocks)
192
+ * - Call 2: in-memory state unchanged, skills now in a middle msg,
193
+ * last msg has no relocatable blocks → messages[0] back to 2 blocks
194
+ * - Prefix changed → cache bust
219
195
  *
220
196
  * This version scans backwards to find the latest instance of each
221
197
  * relocatable block type, removes them from wherever they are, and
222
- * prepends them to messages[0] in fresh-session order. Idempotent.
198
+ * prepends them to messages[0]. Idempotent across calls.
223
199
  */
224
200
  function normalizeResumeMessages(messages) {
225
201
  if (!Array.isArray(messages) || messages.length < 2) return messages;
@@ -236,13 +212,11 @@ function normalizeResumeMessages(messages) {
236
212
  const firstMsg = messages[firstUserIdx];
237
213
  if (!Array.isArray(firstMsg?.content)) return messages;
238
214
 
239
- // Check if ANY relocatable blocks are scattered outside first user msg.
215
+ // FIX: Check if ANY relocatable blocks are scattered outside first user msg.
216
+ // The old check (firstAlreadyHas → skip) missed partial scatter where some
217
+ // blocks stay in messages[0] but others drift to later messages (v2.1.89+).
240
218
  let hasScatteredBlocks = false;
241
- for (
242
- let i = firstUserIdx + 1;
243
- i < messages.length && !hasScatteredBlocks;
244
- i++
245
- ) {
219
+ for (let i = firstUserIdx + 1; i < messages.length && !hasScatteredBlocks; i++) {
246
220
  const msg = messages[i];
247
221
  if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
248
222
  for (const block of msg.content) {
@@ -254,8 +228,8 @@ function normalizeResumeMessages(messages) {
254
228
  }
255
229
  if (!hasScatteredBlocks) return messages;
256
230
 
257
- // Scan ALL user messages in reverse to collect the LATEST version of each
258
- // block type. This handles both full and partial scatter.
231
+ // Scan ALL user messages (including first) in reverse to collect the LATEST
232
+ // version of each block type. This handles both full and partial scatter.
259
233
  const found = new Map();
260
234
 
261
235
  for (let i = messages.length - 1; i >= firstUserIdx; i--) {
@@ -267,6 +241,7 @@ function normalizeResumeMessages(messages) {
267
241
  const text = block.text || "";
268
242
  if (!isRelocatableBlock(text)) continue;
269
243
 
244
+ // Determine block type for dedup
270
245
  let blockType;
271
246
  if (isSkillsBlock(text)) blockType = "skills";
272
247
  else if (isMcpBlock(text)) blockType = "mcp";
@@ -274,6 +249,7 @@ function normalizeResumeMessages(messages) {
274
249
  else if (isHooksBlock(text)) blockType = "hooks";
275
250
  else continue;
276
251
 
252
+ // Keep only the LATEST (first found scanning backwards)
277
253
  if (!found.has(blockType)) {
278
254
  let fixedText = text;
279
255
  if (blockType === "hooks") fixedText = stripSessionKnowledge(text);
@@ -287,17 +263,15 @@ function normalizeResumeMessages(messages) {
287
263
 
288
264
  if (found.size === 0) return messages;
289
265
 
290
- // Remove ALL relocatable blocks from ALL user messages
266
+ // Remove ALL relocatable blocks from ALL user messages (both first and later)
291
267
  const result = messages.map((msg) => {
292
268
  if (msg.role !== "user" || !Array.isArray(msg.content)) return msg;
293
- const filtered = msg.content.filter(
294
- (b) => !isRelocatableBlock(b.text || "")
295
- );
269
+ const filtered = msg.content.filter((b) => !isRelocatableBlock(b.text || ""));
296
270
  if (filtered.length === msg.content.length) return msg;
297
271
  return { ...msg, content: filtered };
298
272
  });
299
273
 
300
- // Order must match fresh session layout: deferred -> mcp -> skills -> hooks
274
+ // FIX: Order must match fresh session layout: deferred -> mcp -> skills -> hooks
301
275
  const ORDER = ["deferred", "mcp", "skills", "hooks"];
302
276
  const toRelocate = ORDER.filter((t) => found.has(t)).map((t) => found.get(t));
303
277
 
@@ -309,12 +283,95 @@ function normalizeResumeMessages(messages) {
309
283
  return result;
310
284
  }
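
The "latest wins, then emit in fresh-session order" logic of `normalizeResumeMessages` is easier to follow on a flattened list. This sketch keeps the `ORDER` array and the backwards scan from the hunk. The `{ type, text }` input shape and the `latestInFreshOrder` name are simplifications for illustration, not the package's data model.

```javascript
const ORDER = ["deferred", "mcp", "skills", "hooks"];

// blocks: [{ type, text }] in message order, oldest first (hypothetical shape).
function latestInFreshOrder(blocks) {
  const found = new Map();
  for (let i = blocks.length - 1; i >= 0; i--) {
    const { type, text } = blocks[i];
    // The first hit while scanning backwards is the latest instance of that type.
    if (!found.has(type)) found.set(type, text);
  }
  return ORDER.filter((t) => found.has(t)).map((t) => found.get(t));
}

const scattered = [
  { type: "skills", text: "skills v1" },
  { type: "hooks", text: "hooks v1" },
  { type: "skills", text: "skills v2" },
  { type: "deferred", text: "deferred v1" },
];

console.log(latestInFreshOrder(scattered));
// [ 'deferred v1', 'skills v2', 'hooks v1' ]
```

Because the output depends only on which block types exist and their latest text, running it again on already-normalized input gives the same result, which is the idempotence the comment describes.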
311
285
 
312
- // ---------------------------------------------------------------------------
313
- // Tool schema stabilization (Bug 3)
314
- // ---------------------------------------------------------------------------
286
+ // --------------------------------------------------------------------------
287
+ // Image stripping from old tool results (cost optimization)
288
+ // --------------------------------------------------------------------------
289
+
290
+ // CACHE_FIX_IMAGE_KEEP_LAST=N — keep images only in the last N user messages.
291
+ // Unset or 0 = disabled (all images preserved, backward compatible).
292
+ // Images in tool_result blocks older than N user messages from the end are
293
+ // replaced with a text placeholder. User-pasted images (direct image blocks
294
+ // in user messages, not inside tool_result) are left alone.
295
+ const IMAGE_KEEP_LAST = parseInt(process.env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10);
296
+
297
+ /**
298
+ * Strip base64 image blocks from tool_result content in older messages.
299
+ * Returns { messages, stats } where stats has stripping metrics.
300
+ */
301
+ function stripOldToolResultImages(messages, keepLast) {
302
+ if (!keepLast || keepLast <= 0 || !Array.isArray(messages)) {
303
+ return { messages, stats: null };
304
+ }
305
+
306
+ // Find user message indices (turns) so we can count from the end
307
+ const userMsgIndices = [];
308
+ for (let i = 0; i < messages.length; i++) {
309
+ if (messages[i].role === "user") userMsgIndices.push(i);
310
+ }
311
+
312
+ if (userMsgIndices.length <= keepLast) {
313
+ return { messages, stats: null }; // not enough turns to strip anything
314
+ }
315
+
316
+ // Messages at or after this index are "recent" — keep their images
317
+ const cutoffIdx = userMsgIndices[userMsgIndices.length - keepLast];
318
+
319
+ let strippedCount = 0;
320
+ let strippedBytes = 0;
321
+
322
+ const result = messages.map((msg, msgIdx) => {
323
+ // Only process user messages before the cutoff (tool_result is in user msgs)
324
+ if (msg.role !== "user" || msgIdx >= cutoffIdx || !Array.isArray(msg.content)) {
325
+ return msg;
326
+ }
327
+
328
+ let msgModified = false;
329
+ const newContent = msg.content.map((block) => {
330
+ // Only strip images inside tool_result blocks, not user-pasted images
331
+ if (block.type === "tool_result" && Array.isArray(block.content)) {
332
+ let toolModified = false;
333
+ const newToolContent = block.content.map((item) => {
334
+ if (item.type === "image") {
335
+ strippedCount++;
336
+ if (item.source?.data) {
337
+ strippedBytes += item.source.data.length;
338
+ }
339
+ toolModified = true;
340
+ return {
341
+ type: "text",
342
+ text: "[image stripped from history — file may still be on disk]",
343
+ };
344
+ }
345
+ return item;
346
+ });
347
+ if (toolModified) {
348
+ msgModified = true;
349
+ return { ...block, content: newToolContent };
350
+ }
351
+ }
352
+ return block;
353
+ });
354
+
355
+ if (msgModified) {
356
+ return { ...msg, content: newContent };
357
+ }
358
+ return msg;
359
+ });
360
+
361
+ const stats = strippedCount > 0
362
+ ? { strippedCount, strippedBytes, estimatedTokens: Math.ceil(strippedBytes * 0.125) }
363
+ : null;
364
+
365
+ return { messages: strippedCount > 0 ? result : messages, stats };
366
+ }
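
The cutoff calculation in `stripOldToolResultImages` is the part most likely to be misread, so here it is on its own. The sketch follows the same `userMsgIndices[length - keepLast]` rule. `recentCutoffIndex` and the `-1` sentinel are hypothetical conveniences.

```javascript
// Sketch: index of the first message that counts as "recent" for keepLast.
// Returns -1 when stripping is disabled or there are too few user turns.
function recentCutoffIndex(messages, keepLast) {
  const userIdx = [];
  messages.forEach((m, i) => {
    if (m.role === "user") userIdx.push(i);
  });
  if (!(keepLast > 0) || userIdx.length <= keepLast) return -1;
  return userIdx[userIdx.length - keepLast];
}

// Four user turns interleaved with assistant replies.
const roles = ["user", "assistant", "user", "assistant", "user", "assistant", "user"];
const messages = roles.map((role) => ({ role, content: [] }));

console.log(recentCutoffIndex(messages, 3)); // 2
console.log(recentCutoffIndex(messages, 4)); // -1
```

With `keepLast = 3` only the user message at index 0 lies before the cutoff, so only its `tool_result` images would be eligible for replacement.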
367
+
368
+ // --------------------------------------------------------------------------
369
+ // Tool schema stabilization (Bug 2 secondary cause)
370
+ // --------------------------------------------------------------------------
315
371
 
316
372
  /**
317
- * Sort tool definitions by name for deterministic ordering.
373
+ * Sort tool definitions by name for deterministic ordering. Tool schema bytes
374
+ * changing mid-session was acknowledged as a bug in the v2.1.88 changelog.
318
375
  */
319
376
  function stabilizeToolOrder(tools) {
320
377
  if (!Array.isArray(tools) || tools.length === 0) return tools;
@@ -325,9 +382,228 @@ function stabilizeToolOrder(tools) {
325
382
  });
326
383
  }
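
The body of `stabilizeToolOrder` is elided by the hunk boundary, so the following is only a sketch of what sorting by name for deterministic ordering can look like. `sortToolsByName` is a hypothetical name, and the plain code-unit comparison (rather than `localeCompare`) is an assumption chosen so the result does not depend on the host locale.

```javascript
// Sketch: return a name-sorted copy so the serialized request bytes are stable.
function sortToolsByName(tools) {
  if (!Array.isArray(tools) || tools.length === 0) return tools;
  return [...tools].sort((a, b) => {
    const x = a.name || "";
    const y = b.name || "";
    return x < y ? -1 : x > y ? 1 : 0;
  });
}

// The same tools arriving in two different orders.
const turnA = [{ name: "Read" }, { name: "Bash" }, { name: "Grep" }];
const turnB = [{ name: "Grep" }, { name: "Read" }, { name: "Bash" }];

console.log(JSON.stringify(sortToolsByName(turnA)) === JSON.stringify(sortToolsByName(turnB)));
```

Copying before sorting leaves the caller's array untouched, which fits the README's claim that the interceptor only normalizes the outgoing request.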
327
384
 
328
- // ---------------------------------------------------------------------------
385
+ // --------------------------------------------------------------------------
329
386
  // Fetch interceptor
330
- // ---------------------------------------------------------------------------
387
+ // --------------------------------------------------------------------------
388
+
389
+ // --------------------------------------------------------------------------
390
+ // Debug logging (writes to ~/.claude/cache-fix-debug.log)
391
+ // Set CACHE_FIX_DEBUG=1 to enable
392
+ // --------------------------------------------------------------------------
393
+
394
+ import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
395
+ import { homedir } from "node:os";
396
+ import { join } from "node:path";
397
+
398
+ const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
399
+ const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
400
+ const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
401
+ const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
402
+
403
+ function debugLog(...args) {
404
+ if (!DEBUG) return;
405
+ const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
406
+ try { appendFileSync(LOG_PATH, line); } catch {}
407
+ }
408
+
409
+ // --------------------------------------------------------------------------
410
+ // Prefix snapshot — captures message prefix for cross-process diff.
411
+ // Set CACHE_FIX_PREFIXDIFF=1 to enable.
412
+ //
413
+ // On each API call: saves JSON of first 5 messages + system + tools hash
414
+ // to ~/.claude/cache-fix-snapshots/<session-hash>-last.json
415
+ //
416
+ // On first call after startup: compares against saved snapshot and writes
417
+ // a diff report to ~/.claude/cache-fix-snapshots/<session-hash>-diff.json
418
+ // --------------------------------------------------------------------------
419
+
420
+ let _prefixDiffFirstCall = true;
421
+
422
+ // --------------------------------------------------------------------------
423
+ // GrowthBook flag dump (runs once on first API call)
424
+ // --------------------------------------------------------------------------
425
+
426
+ let _growthBookDumped = false;
427
+
428
+ function dumpGrowthBookFlags() {
429
+ if (_growthBookDumped || !DEBUG) return;
430
+ _growthBookDumped = true;
431
+ try {
432
+ const claudeJson = JSON.parse(readFileSync(join(homedir(), ".claude.json"), "utf8"));
433
+ const features = claudeJson.cachedGrowthBookFeatures;
434
+ if (!features) { debugLog("GROWTHBOOK: no cachedGrowthBookFeatures found"); return; }
435
+
436
+ // Log the flags that matter for cost/cache/context behavior
437
+ const interesting = {
438
+ hawthorn_window: features.tengu_hawthorn_window,
439
+ pewter_kestrel: features.tengu_pewter_kestrel,
440
+ summarize_tool_results: features.tengu_summarize_tool_results,
441
+ slate_heron: features.tengu_slate_heron,
442
+ session_memory: features.tengu_session_memory,
443
+ sm_compact: features.tengu_sm_compact,
444
+ sm_compact_config: features.tengu_sm_compact_config,
445
+ sm_config: features.tengu_sm_config,
446
+ cache_plum_violet: features.tengu_cache_plum_violet,
447
+ prompt_cache_1h_config: features.tengu_prompt_cache_1h_config,
448
+ crystal_beam: features.tengu_crystal_beam,
449
+ cold_compact: features.tengu_cold_compact,
450
+ system_prompt_global_cache: features.tengu_system_prompt_global_cache,
451
+ compact_cache_prefix: features.tengu_compact_cache_prefix,
452
+ };
453
+ debugLog("GROWTHBOOK FLAGS:", JSON.stringify(interesting, null, 2));
454
+ } catch (e) {
455
+ debugLog("GROWTHBOOK: failed to read ~/.claude.json:", e?.message);
456
+ }
457
+ }
458
+
459
+ // --------------------------------------------------------------------------
460
+ // Microcompact / budget monitoring
461
+ // --------------------------------------------------------------------------
462
+
463
+ /**
464
+ * Scan outgoing messages for signs of microcompact clearing and budget
465
+ * enforcement. Counts tool results that have been gutted and reports stats.
466
+ */
467
+ function monitorContextDegradation(messages) {
468
+ if (!Array.isArray(messages)) return null;
469
+
470
+ let clearedToolResults = 0;
471
+ let totalToolResultChars = 0;
472
+ let totalToolResults = 0;
473
+
474
+ for (const msg of messages) {
475
+ if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
476
+ for (const block of msg.content) {
477
+ if (block.type === "tool_result") {
478
+ totalToolResults++;
479
+ const content = block.content;
480
+ if (typeof content === "string") {
481
+ if (content === "[Old tool result content cleared]") {
482
+ clearedToolResults++;
483
+ } else {
484
+ totalToolResultChars += content.length;
485
+ }
486
+ } else if (Array.isArray(content)) {
487
+ for (const item of content) {
488
+ if (item.type === "text") {
489
+ if (item.text === "[Old tool result content cleared]") {
490
+ clearedToolResults++;
491
+ } else {
492
+ totalToolResultChars += item.text.length;
493
+ }
494
+ }
495
+ }
496
+ }
497
+ }
498
+ }
499
+ }
500
+
501
+ if (totalToolResults === 0) return null;
502
+
503
+ const stats = { totalToolResults, clearedToolResults, totalToolResultChars };
504
+
505
+ if (clearedToolResults > 0) {
506
+ debugLog(`MICROCOMPACT: ${clearedToolResults}/${totalToolResults} tool results cleared`);
507
+ }
508
+
509
+ // Warn when approaching the 200K budget threshold
510
+ if (totalToolResultChars > 150000) {
511
+ debugLog(`BUDGET WARNING: tool result chars at ${totalToolResultChars.toLocaleString()} / 200,000 threshold`);
512
+ }
513
+
514
+ return stats;
515
+ }
516
+
517
+ function snapshotPrefix(payload) {
518
+ if (!PREFIXDIFF) return;
519
+ try {
520
+ mkdirSync(SNAPSHOT_DIR, { recursive: true });
521
+
522
+ // Session key: use system prompt hash — stable across restarts for the same project.
523
+ // Different projects get different snapshots, same project matches across resume.
524
+ const sessionKey = payload.system
525
+ ? createHash("sha256").update(JSON.stringify(payload.system).slice(0, 2000)).digest("hex").slice(0, 12)
526
+ : "default";
527
+
528
+ const snapshotFile = join(SNAPSHOT_DIR, `${sessionKey}-last.json`);
529
+ const diffFile = join(SNAPSHOT_DIR, `${sessionKey}-diff.json`);
530
+
531
+ // Build prefix snapshot: first 5 messages, stripped of cache_control
532
+ const prefixMsgs = (payload.messages || []).slice(0, 5).map(msg => {
533
+ const content = Array.isArray(msg.content)
534
+ ? msg.content.map(b => {
535
+ const { cache_control, ...rest } = b;
536
+ // Truncate long text blocks for diffing
537
+ if (rest.text && rest.text.length > 500) {
538
+ rest.text = rest.text.slice(0, 500) + `...[${rest.text.length} chars]`;
539
+ }
540
+ return rest;
541
+ })
542
+ : msg.content;
543
+ return { role: msg.role, content };
544
+ });
545
+
546
+ const toolsHash = payload.tools
547
+ ? createHash("sha256").update(JSON.stringify(payload.tools.map(t => t.name))).digest("hex").slice(0, 16)
548
+ : "none";
549
+
550
+ const systemHash = payload.system
551
+ ? createHash("sha256").update(JSON.stringify(payload.system)).digest("hex").slice(0, 16)
552
+ : "none";
553
+
554
+ const snapshot = {
555
+ timestamp: new Date().toISOString(),
556
+ messageCount: payload.messages?.length || 0,
557
+ toolsHash,
558
+ systemHash,
559
+ prefixMessages: prefixMsgs,
560
+ };
561
+
562
+ // On first call: compare against saved
563
+ if (_prefixDiffFirstCall) {
564
+ _prefixDiffFirstCall = false;
565
+ try {
566
+ const prev = JSON.parse(readFileSync(snapshotFile, "utf8"));
567
+ const diff = {
568
+ timestamp: snapshot.timestamp,
569
+ prevTimestamp: prev.timestamp,
570
+ toolsMatch: prev.toolsHash === snapshot.toolsHash,
571
+ systemMatch: prev.systemHash === snapshot.systemHash,
572
+ messageCountPrev: prev.messageCount,
573
+ messageCountNow: snapshot.messageCount,
574
+ prefixDiffs: [],
575
+ };
576
+
577
+ const maxIdx = Math.max(prev.prefixMessages.length, snapshot.prefixMessages.length);
578
+ for (let i = 0; i < maxIdx; i++) {
579
+ const prevMsg = JSON.stringify(prev.prefixMessages[i] || null);
580
+ const nowMsg = JSON.stringify(snapshot.prefixMessages[i] || null);
581
+ if (prevMsg !== nowMsg) {
582
+ diff.prefixDiffs.push({
583
+ index: i,
584
+ prev: prev.prefixMessages[i] || null,
585
+ now: snapshot.prefixMessages[i] || null,
586
+ });
587
+ }
588
+ }
589
+
590
+ writeFileSync(diffFile, JSON.stringify(diff, null, 2));
591
+ debugLog(`PREFIX DIFF: ${diff.prefixDiffs.length} differences in first 5 messages. tools=${diff.toolsMatch ? "match" : "DIFFER"} system=${diff.systemMatch ? "match" : "DIFFER"}`);
592
+ } catch {
593
+ // No previous snapshot — first run
594
+ }
595
+ }
596
+
597
+ // Save current snapshot
598
+ writeFileSync(snapshotFile, JSON.stringify(snapshot, null, 2));
599
+ } catch (e) {
600
+ debugLog("PREFIX SNAPSHOT ERROR:", e?.message);
601
+ }
602
+ }
603
+
604
+ // --------------------------------------------------------------------------
605
+ // Fetch interceptor
606
+ // --------------------------------------------------------------------------
331
607
 
332
608
  const _origFetch = globalThis.fetch;
333
609
 
@@ -339,23 +615,27 @@ globalThis.fetch = async function (url, options) {
339
615
  !urlStr.includes("batches") &&
340
616
  !urlStr.includes("count_tokens");
341
617
 
342
- if (
343
- isMessagesEndpoint &&
344
- options?.body &&
345
- typeof options.body === "string"
346
- ) {
618
+ if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
347
619
  try {
348
620
  const payload = JSON.parse(options.body);
349
621
  let modified = false;
350
622
 
623
+ // One-time GrowthBook flag dump on first API call
624
+ dumpGrowthBookFlags();
625
+
351
626
  debugLog("--- API call to", urlStr);
352
627
  debugLog("message count:", payload.messages?.length);
353
628
 
354
- // Bug 1: Relocate scattered attachment blocks
629
+ // Detect synthetic model (false rate limiter, B3)
630
+ if (payload.model === "<synthetic>") {
631
+ debugLog("FALSE RATE LIMIT: synthetic model detected — client-side rate limit, no real API call");
632
+ }
633
+
634
+ // Bug 1: Relocate resume attachment blocks
355
635
  if (payload.messages) {
636
+ // Log message structure for debugging
356
637
  if (DEBUG) {
357
- let firstUserIdx = -1;
358
- let lastUserIdx = -1;
638
+ let firstUserIdx = -1, lastUserIdx = -1;
359
639
  for (let i = 0; i < payload.messages.length; i++) {
360
640
  if (payload.messages[i].role === "user") {
361
641
  if (firstUserIdx === -1) firstUserIdx = i;
@@ -365,39 +645,20 @@ globalThis.fetch = async function (url, options) {
365
645
  if (firstUserIdx !== -1) {
366
646
  const firstContent = payload.messages[firstUserIdx].content;
367
647
  const lastContent = payload.messages[lastUserIdx].content;
368
- debugLog(
369
- "firstUserIdx:",
370
- firstUserIdx,
371
- "lastUserIdx:",
372
- lastUserIdx
373
- );
374
- debugLog(
375
- "first user msg blocks:",
376
- Array.isArray(firstContent) ? firstContent.length : "string"
377
- );
648
+ debugLog("firstUserIdx:", firstUserIdx, "lastUserIdx:", lastUserIdx);
649
+ debugLog("first user msg blocks:", Array.isArray(firstContent) ? firstContent.length : "string");
378
650
  if (Array.isArray(firstContent)) {
379
651
  for (const b of firstContent) {
380
652
  const t = (b.text || "").substring(0, 80);
381
- debugLog(
382
- " first[block]:",
383
- isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
384
- JSON.stringify(t)
385
- );
653
+ debugLog(" first[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
386
654
  }
387
655
  }
388
656
  if (firstUserIdx !== lastUserIdx) {
389
- debugLog(
390
- "last user msg blocks:",
391
- Array.isArray(lastContent) ? lastContent.length : "string"
392
- );
657
+ debugLog("last user msg blocks:", Array.isArray(lastContent) ? lastContent.length : "string");
393
658
  if (Array.isArray(lastContent)) {
394
659
  for (const b of lastContent) {
395
660
  const t = (b.text || "").substring(0, 80);
396
- debugLog(
397
- " last[block]:",
398
- isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
399
- JSON.stringify(t)
400
- );
661
+ debugLog(" last[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
401
662
  }
402
663
  }
403
664
  } else {
@@ -412,13 +673,28 @@ globalThis.fetch = async function (url, options) {
412
673
  modified = true;
413
674
  debugLog("APPLIED: resume message relocation");
414
675
  } else {
676
+ debugLog("SKIPPED: resume relocation (not a resume or already correct)");
677
+ }
678
+ }
679
+
680
+ // Image stripping: remove old tool_result images to reduce token waste
681
+ if (payload.messages && IMAGE_KEEP_LAST > 0) {
682
+ const { messages: imgStripped, stats: imgStats } = stripOldToolResultImages(
683
+ payload.messages, IMAGE_KEEP_LAST
684
+ );
685
+ if (imgStats) {
686
+ payload.messages = imgStripped;
687
+ modified = true;
415
688
  debugLog(
416
- "SKIPPED: resume relocation (not a resume or already correct)"
689
+ `APPLIED: stripped ${imgStats.strippedCount} images from old tool results`,
690
+ `(~${imgStats.strippedBytes} base64 bytes, ~${imgStats.estimatedTokens} tokens saved)`
417
691
  );
692
+ } else if (IMAGE_KEEP_LAST > 0) {
693
+ debugLog("SKIPPED: image stripping (no old images found or not enough turns)");
418
694
  }
419
695
  }
420
696
 
421
- // Bug 3: Stabilize tool ordering
697
+ // Bug 2a: Stabilize tool ordering
422
698
  if (payload.tools) {
423
699
  const sorted = stabilizeToolOrder(payload.tools);
424
700
  const changed = sorted.some(
@@ -431,7 +707,7 @@ globalThis.fetch = async function (url, options) {
431
707
  }
432
708
  }
433
709
 
434
- // Bug 2: Stabilize fingerprint in attribution header
710
+ // Bug 2b: Stabilize fingerprint in attribution header
435
711
  if (payload.system && payload.messages) {
436
712
  const fix = stabilizeFingerprint(payload.system, payload.messages);
437
713
  if (fix) {
@@ -441,12 +717,7 @@ globalThis.fetch = async function (url, options) {
441
717
  text: fix.newText,
442
718
  };
443
719
  modified = true;
444
- debugLog(
445
- "APPLIED: fingerprint stabilized from",
446
- fix.oldFingerprint,
447
- "to",
448
- fix.stableFingerprint
449
- );
720
+ debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
450
721
  }
451
722
  }
452
723
 
@@ -454,11 +725,48 @@ globalThis.fetch = async function (url, options) {
454
725
  options = { ...options, body: JSON.stringify(payload) };
455
726
  debugLog("Request body rewritten");
456
727
  }
728
+
729
+ // Monitor for microcompact / budget enforcement degradation
730
+ if (payload.messages) {
731
+ monitorContextDegradation(payload.messages);
732
+ }
733
+
734
+ // Capture prefix snapshot for cross-process diff analysis
735
+ snapshotPrefix(payload);
736
+
457
737
  } catch (e) {
458
738
  debugLog("ERROR in interceptor:", e?.message);
459
739
  // Parse failure — pass through unmodified
460
740
  }
461
741
  }
462
742
 
463
- return _origFetch.apply(this, [url, options]);
743
+ const response = await _origFetch.apply(this, [url, options]);
744
+
745
+ // Extract quota utilization from response headers and save for hooks/MCP
746
+ if (isMessagesEndpoint) {
747
+ try {
748
+ const h5 = response.headers.get("anthropic-ratelimit-unified-5h-utilization");
749
+ const h7d = response.headers.get("anthropic-ratelimit-unified-7d-utilization");
750
+ const reset5h = response.headers.get("anthropic-ratelimit-unified-5h-reset");
751
+ const reset7d = response.headers.get("anthropic-ratelimit-unified-7d-reset");
752
+ const status = response.headers.get("anthropic-ratelimit-unified-status");
753
+ const overage = response.headers.get("anthropic-ratelimit-unified-overage-status");
754
+
755
+ if (h5 || h7d) {
756
+ const quota = {
757
+ timestamp: new Date().toISOString(),
758
+ five_hour: h5 ? { utilization: parseFloat(h5), pct: Math.round(parseFloat(h5) * 100), resets_at: reset5h ? parseInt(reset5h) : null } : null,
759
+ seven_day: h7d ? { utilization: parseFloat(h7d), pct: Math.round(parseFloat(h7d) * 100), resets_at: reset7d ? parseInt(reset7d) : null } : null,
760
+ status: status || null,
761
+ overage_status: overage || null,
762
+ };
763
+ const quotaFile = join(homedir(), ".claude", "quota-status.json");
764
+ writeFileSync(quotaFile, JSON.stringify(quota, null, 2));
765
+ }
766
+ } catch {
767
+ // Non-critical — don't break the response
768
+ }
769
+ }
770
+
771
+ return response;
464
772
  };
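
The interceptor above writes quota utilization to `~/.claude/quota-status.json` "for hooks/MCP", but the diff does not show a consumer. As a minimal sketch of what one might look like, the helper below turns an object of that shape into a one-line summary. The function name `formatQuota`, the `warnAtPct` parameter, and the sample values (including the `"allowed"` status string) are assumptions for illustration only. The field names `five_hour`, `seven_day`, `pct`, and `status` are taken from the object built in the diff.

```javascript
// Hypothetical helper, not part of the package: summarizes the quota object
// that the interceptor serializes to quota-status.json.
function formatQuota(quota, warnAtPct = 80) {
  if (!quota || typeof quota !== "object") return "quota: unavailable";

  const parts = [];
  for (const [label, key] of [["5h", "five_hour"], ["7d", "seven_day"]]) {
    const win = quota[key];
    // A window is null when its header was absent. pct can also be null,
    // because a NaN from parseFloat is serialized as null by JSON.stringify.
    if (!win || !Number.isFinite(win.pct)) continue;
    const flag = win.pct >= warnAtPct ? " (high)" : "";
    parts.push(`${label} ${win.pct}%${flag}`);
  }

  if (parts.length === 0) return "quota: unavailable";
  const status = quota.status ? ` [${quota.status}]` : "";
  return `quota: ${parts.join(", ")}${status}`;
}

// Sample input shaped like quota-status.json (all values are made up).
const sample = {
  timestamp: "2025-01-01T00:00:00.000Z",
  five_hour: { utilization: 0.42, pct: 42, resets_at: null },
  seven_day: { utilization: 0.91, pct: 91, resets_at: null },
  status: "allowed",
  overage_status: null,
};

console.log(formatQuota(sample)); // quota: 5h 42%, 7d 91% (high) [allowed]
```

A hook could presumably load the real file with the same calls the interceptor uses, `JSON.parse(readFileSync(join(homedir(), ".claude", "quota-status.json"), "utf8"))`, and pass the result to this helper. The file only exists after at least one response carried a utilization header, so a reader should treat a missing or unparsable file as "unavailable" rather than as an error.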