claude-code-cache-fix 1.0.0 → 1.2.0

This diff shows the published contents of two versions of the package as they appear in a public package registry. It is provided for informational purposes only.
Files changed (3)
  1. package/README.md +104 -5
  2. package/package.json +1 -1
  3. package/preload.mjs +607 -135
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # claude-code-cache-fix
2
2
 
3
- Fixes a prompt cache regression in [Claude Code](https://github.com/anthropics/claude-code) that causes **up to 20x cost increase** on resumed sessions. Confirmed broken through v2.1.92.
3
+ Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed broken through v2.1.92.
4
4
 
5
5
  ## The problem
6
6
 
@@ -14,6 +14,8 @@ Three bugs cause this:
14
14
 
15
15
  3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
16
16
 
17
+ Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
18
+
17
19
  ## Installation
18
20
 
19
21
  Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).
@@ -76,6 +78,63 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
76
78
 
77
79
  All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
78
80
 
81
+ ## Image stripping
82
+
83
+ Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn in carry-forward.
84
+
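The ~62,500-token figure matches the heuristic that `stripOldToolResultImages` uses for its stats (`Math.ceil(strippedBytes * 0.125)`, where the "bytes" are the length of the base64 string), treating 500KB as 500,000 base64 characters. A minimal sketch of that arithmetic, assuming the same cost simply repeats on every later call; the helper names are hypothetical, and caching discounts and compaction are ignored:

```javascript
// Heuristic taken from preload.mjs stats: ~0.125 tokens per base64 character.
const TOKENS_PER_BASE64_CHAR = 0.125;

// Estimated tokens for one image, given the length of its base64 payload.
function estimateImageTokens(base64Length) {
  return Math.ceil(base64Length * TOKENS_PER_BASE64_CHAR);
}

// Tokens re-sent for an image that stays in history for `laterCalls` more API calls.
// Hypothetical helper: ignores cache-read pricing and compaction.
function carryForwardTokens(base64Length, laterCalls) {
  return estimateImageTokens(base64Length) * laterCalls;
}

console.log(estimateImageTokens(500_000)); // 62500
console.log(carryForwardTokens(500_000, 10)); // 625000
```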
85
+ Enable image stripping to remove old images from tool results:
86
+
87
+ ```bash
88
+ export CACHE_FIX_IMAGE_KEEP_LAST=3
89
+ ```
90
+
91
+ This keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only targets images inside `tool_result` blocks (Read tool output) — user-pasted images are never touched. Files remain on disk for re-reading if needed.
92
+
93
+ Set to `0` (default) to disable.
94
+
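"User messages" here means every `role: "user"` entry in the API request. Tool results travel in user messages (the code comments note that `tool_result` lives in user messages), so each Read call and each typed prompt uses one slot. A sketch of where the cutoff falls, mirroring the `cutoffIdx` logic in `stripOldToolResultImages`; `imageCutoffIndex` and the sample history are hypothetical:

```javascript
// Index of the first message whose tool_result images are kept,
// or -1 when stripping is disabled or there are too few user messages.
function imageCutoffIndex(messages, keepLast) {
  const userIdx = [];
  messages.forEach((m, i) => {
    if (m.role === "user") userIdx.push(i);
  });
  if (!keepLast || keepLast <= 0 || userIdx.length <= keepLast) return -1;
  return userIdx[userIdx.length - keepLast];
}

const history = [
  { role: "user", content: "Look at diagram.png" },        // 0: typed prompt
  { role: "assistant", content: [/* tool_use: Read */] },  // 1
  { role: "user", content: [/* tool_result + image */] },  // 2
  { role: "assistant", content: [/* tool_use: Read */] },  // 3
  { role: "user", content: [/* tool_result + image */] },  // 4
  { role: "assistant", content: "Both images show..." },   // 5
  { role: "user", content: "Thanks, now compare them" },   // 6: typed prompt
];

// With keepLast = 2, the image in message 2 would already be stripped.
console.log(imageCutoffIndex(history, 2)); // 4
```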
95
+ ## Prefix lock (resume cache hit)
96
+
97
+ Even with the block relocation fix, the first API call after `--resume` triggers a full cache rebuild because Claude Code reassembles messages with different system-reminder blocks, changing the prefix bytes. On a 300k token context at Opus rates, that's ~$2.80 per resume.
98
+
99
+ The prefix lock eliminates this by saving the exact `messages[0]` content after all fixes are applied, then replaying it on the next resume to produce a byte-identical prefix.
100
+
101
+ ```bash
102
+ export CACHE_FIX_PREFIX_LOCK=1
103
+ ```
104
+
105
+ Safety guards — the lock only fires when ALL of these match:
106
+ - System prompt hash (same project, no CLAUDE.md changes)
107
+ - Tools hash (no MCP/plugin changes)
108
+ - User message text (same conversation)
109
+ - User content hash (no substantive context changes)
110
+ - Not a post-compaction conversation
111
+
112
+ If any guard fails, the lock skips and falls back to normal behavior. The worst case is a skip — the lock cannot increase costs or cause context loss.
113
+
114
+ Set to `0` (default) to disable.
115
+
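The guards can be read as a pure decision function. This sketch mirrors the order of checks in `applyPrefixLock` (compaction, system hash, tools hash, user text, content hash, saved content); the `prefixLockDecision` name, the snapshot shape, and the missing-snapshot branch are assumptions for illustration:

```javascript
// Hypothetical snapshot shape: { sysHash, toolHash, userText, contentHash, content }.
// Returns whether the saved messages[0] may be replayed, plus a skip reason.
function prefixLockDecision(saved, current, isCompacted) {
  if (isCompacted) return { apply: false, reason: "compacted conversation detected" };
  if (!saved) return { apply: false, reason: "no saved snapshot" }; // assumption
  if (saved.sysHash !== current.sysHash) return { apply: false, reason: "system prompt changed" };
  if (saved.toolHash !== current.toolHash) return { apply: false, reason: "tools changed" };
  if (saved.userText !== current.userText) return { apply: false, reason: "user message text changed" };
  // Older snapshots may lack contentHash, so it is only compared when recorded.
  if (saved.contentHash && saved.contentHash !== current.contentHash) {
    return { apply: false, reason: "user content hash changed" };
  }
  if (!Array.isArray(saved.content)) return { apply: false, reason: "saved content invalid" };
  return { apply: true, reason: "all guards matched" };
}

const snap = { sysHash: "s1", toolHash: "t1", userText: "fix the build", contentHash: "c1", content: [] };
console.log(prefixLockDecision(snap, { ...snap }, false).apply); // true
console.log(prefixLockDecision(snap, { ...snap, toolHash: "t2" }, false).reason); // tools changed
```

Every branch except the last returns `apply: false`, which is what makes a failed guard fall back to normal behavior.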
116
+ ## Monitoring
117
+
118
+ The interceptor includes monitoring for several additional issues identified by the community:
119
+
120
+ ### Microcompact / budget enforcement
121
+
122
+ Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
123
+
124
+ The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
125
+
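A sketch of that detection, assuming tool results appear as `tool_result` blocks in user messages whose `content` is either a string or an array of text blocks. The marker string and the 200,000 cap come from the text above; the 80% warning threshold and the helper names are hypothetical, since the exact point at which the package warns is not stated here:

```javascript
const CLEARED_MARKER = "[Old tool result content cleared]";
const AGGREGATE_CAP = 200_000;
const WARN_FRACTION = 0.8; // hypothetical threshold

// Flatten a tool_result's content to plain text.
function toolResultText(block) {
  if (typeof block.content === "string") return block.content;
  if (Array.isArray(block.content)) {
    return block.content.filter((c) => c.type === "text").map((c) => c.text).join("");
  }
  return "";
}

// Count tool results, cleared ones, and total characters across the history.
function scanToolResults(messages) {
  let total = 0;
  let cleared = 0;
  let totalChars = 0;
  for (const msg of messages) {
    if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_result") continue;
      const text = toolResultText(block);
      total++;
      totalChars += text.length;
      if (text.includes(CLEARED_MARKER)) cleared++;
    }
  }
  return { total, cleared, totalChars, warn: totalChars >= AGGREGATE_CAP * WARN_FRACTION };
}
```

A log line in the style of `MICROCOMPACT: N/M tool results cleared` then corresponds to `cleared` and `total`.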
126
+ ### False rate limiter
127
+
128
+ The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
129
+
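The README does not say where in the payload the `"model": "<synthetic>"` field sits, so this sketch searches a parsed JSON value recursively instead of assuming a fixed path. `containsSyntheticModel` is a hypothetical helper, and the depth limit is an arbitrary guard:

```javascript
const SYNTHETIC_MODEL = "<synthetic>";

// True if any object (or array element) nested in `value` has model === "<synthetic>".
function containsSyntheticModel(value, depth = 0) {
  if (depth > 10 || value === null || typeof value !== "object") return false;
  if (value.model === SYNTHETIC_MODEL) return true;
  return Object.values(value).some((v) => containsSyntheticModel(v, depth + 1));
}

console.log(containsSyntheticModel({ message: { model: "<synthetic>", content: [] } })); // true
console.log(containsSyntheticModel({ model: "some-real-model" })); // false
```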
130
+ ### GrowthBook flag dump
131
+
132
+ On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
133
+
134
+ ### Quota tracking
135
+
136
+ Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, and the values are saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
137
+
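A sketch of the header parsing, written against anything with a `get(name)` method (a fetch `Response.headers` object or a `Map`). The full 7d header name is inferred from the 5h one, the parsing assumes plain decimal strings, and the `{ fiveHour, sevenDay }` shape is hypothetical rather than the actual `quota-status.json` format:

```javascript
// The 7d name is assumed by analogy with the 5h header named in the README.
const QUOTA_HEADERS = {
  fiveHour: "anthropic-ratelimit-unified-5h-utilization",
  sevenDay: "anthropic-ratelimit-unified-7d-utilization",
};

// Returns { fiveHour, sevenDay }, using null when a header is missing or non-numeric.
function readQuota(headers) {
  const out = {};
  for (const [key, name] of Object.entries(QUOTA_HEADERS)) {
    const raw = headers.get(name); // Headers.get returns null; Map.get returns undefined
    const value = raw == null ? NaN : Number.parseFloat(raw);
    out[key] = Number.isFinite(value) ? value : null;
  }
  return out;
}

const q = readQuota(new Map([["anthropic-ratelimit-unified-5h-utilization", "0.42"]]));
console.log(q.fiveHour, q.sevenDay); // 0.42 null
```

In an interceptor, this would run on the `Response` returned by the wrapped `fetch`, with the result written out via `writeFileSync` from `node:fs`.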
79
138
  ## Debug mode
80
139
 
81
140
  Enable debug logging to verify the fix is working:
@@ -88,31 +147,71 @@ Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
88
147
  - `APPLIED: resume message relocation` — block scatter was detected and fixed
89
148
  - `APPLIED: tool order stabilization` — tools were reordered
90
149
  - `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
91
- - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed (fresh session or already correct)
150
+ - `APPLIED: stripped N images from old tool results` — images were stripped
151
+ - `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
152
+ - `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
153
+ - `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
154
+ - `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
155
+ - `PREFIX LOCK: APPLIED — replayed saved messages[0]` — resume cache hit achieved
156
+ - `PREFIX LOCK: skipped — <reason>` — guard prevented lock (expected, safe)
157
+ - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
158
+
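To check a session at a glance, the markers above can be tallied from the log text. A minimal sketch, assuming one entry per line as in the 1.0.0 `debugLog` format (`[timestamp] message`); `summarizeLog` is hypothetical, and reading the file is left to the caller (for example `readFileSync(path, "utf8")` from `node:fs`):

```javascript
// Marker prefixes taken from the list above.
const MARKERS = [
  "APPLIED: resume message relocation",
  "APPLIED: tool order stabilization",
  "APPLIED: fingerprint stabilized",
  "APPLIED: stripped",
  "MICROCOMPACT:",
  "BUDGET WARNING:",
  "FALSE RATE LIMIT:",
  "GROWTHBOOK FLAGS:",
  "PREFIX LOCK: APPLIED",
  "PREFIX LOCK: skipped",
  "SKIPPED: resume relocation",
];

// Count log lines containing each marker.
function summarizeLog(text) {
  const counts = Object.fromEntries(MARKERS.map((m) => [m, 0]));
  for (const line of text.split("\n")) {
    for (const m of MARKERS) {
      if (line.includes(m)) counts[m]++;
    }
  }
  return counts;
}
```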
159
+ ### Prefix diff mode
160
+
161
+ Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
162
+
163
+ ```bash
164
+ CACHE_FIX_PREFIXDIFF=1 claude-fixed
165
+ ```
166
+
167
+ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are generated on the first API call after a restart.
168
+
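The core of diagnosing a cache bust is finding the first position at which two serialized prefixes diverge. The sketch below shows that idea on two `messages[0]` values; it is not the package's snapshot or report format, and `firstDivergence` and `describeDivergence` are hypothetical:

```javascript
// First character offset where two strings differ, or -1 if they are identical.
function firstDivergence(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Human-readable report comparing two messages[0] values after JSON serialization.
function describeDivergence(prevFirstMsg, nextFirstMsg, context = 40) {
  const a = JSON.stringify(prevFirstMsg);
  const b = JSON.stringify(nextFirstMsg);
  const at = firstDivergence(a, b);
  if (at === -1) return "identical prefix";
  const left = JSON.stringify(a.slice(at, at + context));
  const right = JSON.stringify(b.slice(at, at + context));
  return `diverges at char ${at}: ${left} vs ${right}`;
}
```

Since the prompt cache matches on a prefix, everything after the first divergence is rebuilt, so that offset is the most useful single number in such a report.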
169
+ ## Environment variables
170
+
171
+ | Variable | Default | Description |
172
+ |----------|---------|-------------|
173
+ | `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
174
+ | `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
175
+ | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
176
+ | `CACHE_FIX_PREFIX_LOCK` | `0` | Replay saved messages[0] on resume for cache hit (0 = disabled) |
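
The switches are compared against the literal string `"1"` in the code (for example `process.env.CACHE_FIX_PREFIX_LOCK === "1"`), so a value like `true` leaves them off. A non-numeric `CACHE_FIX_IMAGE_KEEP_LAST` parses to `NaN` and disables stripping. A sketch of equivalent parsing; `readCacheFixConfig` is hypothetical, and the `=== "1"` checks for `CACHE_FIX_DEBUG` and `CACHE_FIX_PREFIXDIFF` are assumed from the 1.0.0 debug flag and the prefix-lock flag shown in this diff:

```javascript
// Mirrors how preload.mjs reads its switches: flags are on only for the exact string "1".
function readCacheFixConfig(env) {
  const keep = Number.parseInt(env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10);
  return {
    debug: env.CACHE_FIX_DEBUG === "1",
    prefixDiff: env.CACHE_FIX_PREFIXDIFF === "1", // assumed; comparison not shown in this diff
    imageKeepLast: Number.isInteger(keep) && keep > 0 ? keep : 0, // NaN or <= 0 means disabled
    prefixLock: env.CACHE_FIX_PREFIX_LOCK === "1",
  };
}

const cfg = readCacheFixConfig({ CACHE_FIX_IMAGE_KEEP_LAST: "3", CACHE_FIX_PREFIX_LOCK: "true" });
console.log(cfg.imageKeepLast, cfg.prefixLock); // 3 false
```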
92
177
 
93
178
  ## Limitations
94
179
 
95
180
  - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
96
181
  - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
182
+ - **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
97
183
  - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
98
184
 
99
185
  ## Tracked issues
100
186
 
101
187
  - [#34629](https://github.com/anthropics/claude-code/issues/34629) — Original resume cache regression report
102
- - [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation
103
- - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development and testing
188
+ - [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
189
+ - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
104
190
  - [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
105
191
  - [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
106
192
  - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
107
193
 
194
+ ## Related research
195
+
196
+ - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
197
+ - **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
198
+
108
199
  ## Contributors
109
200
 
110
201
  - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, and tighter block matchers
111
202
  - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
112
- - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, debug logging, overage TTL downgrade discovery, package maintainer
203
+ - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
204
+ - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
205
+ - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
113
206
 
114
207
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
115
208
 
209
+ ## Support
210
+
211
+ If this tool saved you money, consider buying me a coffee:
212
+
213
+ <a href="https://buymeacoffee.com/vsits" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
214
+
116
215
  ## License
117
216
 
118
217
  [MIT](LICENSE)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "1.0.0",
3
+ "version": "1.2.0",
4
4
  "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
5
5
  "type": "module",
6
6
  "exports": "./preload.mjs",
package/preload.mjs CHANGED
@@ -8,51 +8,42 @@
8
8
  // later user messages instead of messages[0]. This breaks the prompt cache
9
9
  // prefix match. Fix: relocate them to messages[0] on every API call.
10
10
  // (github.com/anthropics/claude-code/issues/34629)
11
- // (github.com/anthropics/claude-code/issues/43657)
12
- // (github.com/anthropics/claude-code/issues/44045)
13
11
  //
14
12
  // Bug 2: Fingerprint instability
15
13
  // The cc_version fingerprint in the attribution header is computed from
16
14
  // messages[0] content INCLUDING meta/attachment blocks. When those blocks
17
- // change between turns, the fingerprint changes -> system prompt bytes
18
- // change -> cache bust. Fix: recompute fingerprint from real user text.
15
+ // change between turns, the fingerprint changes, busting cache within the
16
+ // same session. Fix: stabilize the fingerprint from the real user message.
19
17
  // (github.com/anthropics/claude-code/issues/40524)
20
18
  //
21
- // Bug 3: Non-deterministic tool schema ordering
22
- // Tool definitions can arrive in different orders between turns, changing
23
- // request bytes and busting cache. Fix: sort tools alphabetically by name.
19
+ // Bug 3: Image carry-forward in conversation history
20
+ // Images read via the Read tool persist as base64 in conversation history
21
+ // and are sent on every subsequent API call. A single 500KB image costs
22
+ // ~62,500 tokens per turn in carry-forward. Fix: strip base64 image blocks
23
+ // from tool_result content older than N user turns.
24
+ // Set CACHE_FIX_IMAGE_KEEP_LAST=N to enable (default: 0 = disabled).
25
+ // (github.com/anthropics/claude-code/issues/40524)
26
+ //
27
+ // Monitoring:
28
+ // - GrowthBook flag dump on first API call (CACHE_FIX_DEBUG=1)
29
+ // - Microcompact / budget enforcement detection (logs cleared tool results)
30
+ // - False rate limiter detection (model: "<synthetic>")
31
+ // - Quota utilization tracking (writes ~/.claude/quota-status.json)
32
+ // - Prefix snapshot diffing across process restarts (CACHE_FIX_PREFIXDIFF=1)
24
33
  //
25
- // Based on community work by @VictorSun92 (original monkey-patch + partial
26
- // scatter fixes) and @jmarianski (MITM proxy root cause analysis).
34
+ // Based on community fix by @VictorSun92 / @jmarianski (issue #34629),
35
+ // enhanced with fingerprint stabilization, image stripping, and monitoring.
36
+ // Bug research informed by @ArkNill's claude-code-hidden-problem-analysis.
27
37
  //
28
- // Usage: NODE_OPTIONS="--import claude-code-cache-fix" claude
38
+ // Load via: NODE_OPTIONS="--import $HOME/.claude/cache-fix-preload.mjs"
29
39
 
30
40
  import { createHash } from "node:crypto";
31
- import { appendFileSync } from "node:fs";
32
- import { homedir } from "node:os";
33
- import { join } from "node:path";
34
-
35
- // ---------------------------------------------------------------------------
36
- // Debug logging (writes to ~/.claude/cache-fix-debug.log)
37
- // Set CACHE_FIX_DEBUG=1 to enable
38
- // ---------------------------------------------------------------------------
39
-
40
- const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
41
- const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
42
-
43
- function debugLog(...args) {
44
- if (!DEBUG) return;
45
- const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
46
- try {
47
- appendFileSync(LOG_PATH, line);
48
- } catch {}
49
- }
50
41
 
51
- // ---------------------------------------------------------------------------
42
+ // --------------------------------------------------------------------------
52
43
  // Fingerprint stabilization (Bug 2)
53
- // ---------------------------------------------------------------------------
44
+ // --------------------------------------------------------------------------
54
45
 
55
- // Must match Claude Code src/utils/fingerprint.ts exactly.
46
+ // Must match src/utils/fingerprint.ts exactly.
56
47
  const FINGERPRINT_SALT = "59cf53e54c78";
57
48
  const FINGERPRINT_INDICES = [4, 7, 20];
58
49
 
@@ -77,20 +68,14 @@ function extractRealUserMessageText(messages) {
77
68
  if (msg.role !== "user") continue;
78
69
  const content = msg.content;
79
70
  if (!Array.isArray(content)) {
80
- if (
81
- typeof content === "string" &&
82
- !content.startsWith("<system-reminder>")
83
- ) {
71
+ if (typeof content === "string" && !content.startsWith("<system-reminder>")) {
84
72
  return content;
85
73
  }
86
74
  continue;
87
75
  }
76
+ // Find first text block that isn't a system-reminder
88
77
  for (const block of content) {
89
- if (
90
- block.type === "text" &&
91
- typeof block.text === "string" &&
92
- !block.text.startsWith("<system-reminder>")
93
- ) {
78
+ if (block.type === "text" && typeof block.text === "string" && !block.text.startsWith("<system-reminder>")) {
94
79
  return block.text;
95
80
  }
96
81
  }
@@ -100,17 +85,14 @@ function extractRealUserMessageText(messages) {
100
85
 
101
86
  /**
102
87
  * Extract current cc_version from system prompt blocks and recompute with
103
- * stable fingerprint. Returns { attrIdx, newText, oldFingerprint, stableFingerprint }
104
- * or null if no fix needed.
88
+ * stable fingerprint. Returns { attrIdx, newText, oldFingerprint, stableFingerprint }, or null if no fix is needed.
105
89
  */
106
90
  function stabilizeFingerprint(system, messages) {
107
91
  if (!Array.isArray(system)) return null;
108
92
 
93
+ // Find the attribution header block
109
94
  const attrIdx = system.findIndex(
110
- (b) =>
111
- b.type === "text" &&
112
- typeof b.text === "string" &&
113
- b.text.includes("x-anthropic-billing-header:")
95
+ (b) => b.type === "text" && typeof b.text === "string" && b.text.includes("x-anthropic-billing-header:")
114
96
  );
115
97
  if (attrIdx === -1) return null;
116
98
 
@@ -118,13 +100,14 @@ function stabilizeFingerprint(system, messages) {
118
100
  const versionMatch = attrBlock.text.match(/cc_version=([^;]+)/);
119
101
  if (!versionMatch) return null;
120
102
 
121
- const fullVersion = versionMatch[1]; // e.g. "2.1.92.a3f"
103
+ const fullVersion = versionMatch[1]; // e.g. "2.1.87.a3f"
122
104
  const dotParts = fullVersion.split(".");
123
105
  if (dotParts.length < 4) return null;
124
106
 
125
- const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.92"
107
+ const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
126
108
  const oldFingerprint = dotParts[3]; // "a3f"
127
109
 
110
+ // Compute stable fingerprint from real user text
128
111
  const realText = extractRealUserMessageText(messages);
129
112
  const stableFingerprint = computeFingerprint(realText, baseVersion);
130
113
 
@@ -139,38 +122,28 @@ function stabilizeFingerprint(system, messages) {
139
122
  return { attrIdx, newText, oldFingerprint, stableFingerprint };
140
123
  }
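
The version-string handling in `stabilizeFingerprint` can be tried in isolation. This sketch covers only the split-and-replace step and assumes the attribution text contains `cc_version=<major>.<minor>.<patch>.<fingerprint>` terminated by `;`. Computing the stable value itself (`computeFingerprint` with `FINGERPRINT_SALT`) is not reproduced, so the replacement is passed in. `replaceFingerprint` and the sample header text are hypothetical:

```javascript
// Splits "cc_version=2.1.87.a3f" into base version and fingerprint, then swaps the fingerprint.
// Returns null when there is no usable cc_version or the fingerprint is already stable.
function replaceFingerprint(attrText, stableFingerprint) {
  const match = attrText.match(/cc_version=([^;]+)/);
  if (!match) return null;
  const dotParts = match[1].split(".");
  if (dotParts.length < 4) return null;
  const baseVersion = dotParts.slice(0, 3).join(".");
  const oldFingerprint = dotParts[3];
  if (oldFingerprint === stableFingerprint) return null;
  const newText = attrText.replace(match[0], `cc_version=${baseVersion}.${stableFingerprint}`);
  return { oldFingerprint, newText };
}

const fp = replaceFingerprint("cc_version=2.1.87.a3f; other=1", "b12");
console.log(fp.newText); // cc_version=2.1.87.b12; other=1
```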
141
124
 
142
- // ---------------------------------------------------------------------------
125
+ // --------------------------------------------------------------------------
143
126
  // Resume message relocation (Bug 1)
144
- // ---------------------------------------------------------------------------
127
+ // --------------------------------------------------------------------------
145
128
 
146
129
  function isSystemReminder(text) {
147
130
  return typeof text === "string" && text.startsWith("<system-reminder>");
148
131
  }
149
-
132
+ // FIX: Match block headers with startsWith to avoid false positives from
133
+ // quoted content (e.g. "Note:" file-change reminders embedding debug logs).
150
134
  const SR = "<system-reminder>\n";
151
-
152
135
  function isHooksBlock(text) {
153
- return (
154
- isSystemReminder(text) && text.substring(0, 200).includes("hook success")
155
- );
136
+ // Hooks block header varies; fall back to head-region check
137
+ return isSystemReminder(text) && text.substring(0, 200).includes("hook success");
156
138
  }
157
139
  function isSkillsBlock(text) {
158
- return (
159
- typeof text === "string" &&
160
- text.startsWith(SR + "The following skills are available")
161
- );
140
+ return typeof text === "string" && text.startsWith(SR + "The following skills are available");
162
141
  }
163
142
  function isDeferredToolsBlock(text) {
164
- return (
165
- typeof text === "string" &&
166
- text.startsWith(SR + "The following deferred tools are now available")
167
- );
143
+ return typeof text === "string" && text.startsWith(SR + "The following deferred tools are now available");
168
144
  }
169
145
  function isMcpBlock(text) {
170
- return (
171
- typeof text === "string" &&
172
- text.startsWith(SR + "# MCP Server Instructions")
173
- );
146
+ return typeof text === "string" && text.startsWith(SR + "# MCP Server Instructions");
174
147
  }
175
148
  function isRelocatableBlock(text) {
176
149
  return (
@@ -208,18 +181,21 @@ function stripSessionKnowledge(text) {
208
181
  }
209
182
 
210
183
  /**
211
- * Core fix: on EVERY API call, scan the entire message array for the LATEST
184
+ * Core fix: on EVERY call, scan the entire message array for the LATEST
212
185
  * relocatable blocks (skills, MCP, deferred tools, hooks) and ensure they
213
186
  * are in messages[0]. This matches fresh session behavior where attachments
214
- * are always prepended to messages[0].
187
+ * are always prepended to messages[0] on every API call.
215
188
  *
216
- * The v2.1.90 native fix has a remaining detection gap: it bails early if
217
- * it sees *some* relocatable blocks in messages[0], missing the case where
218
- * others have scattered elsewhere (partial scatter).
189
+ * The original community fix only checked the last user message, which
190
+ * broke on subsequent turns because:
191
+ * - Call 1: skills in last msg → relocated to messages[0] (3 blocks)
192
+ * - Call 2: in-memory state unchanged, skills now in a middle msg,
193
+ * last msg has no relocatable blocks → messages[0] back to 2 blocks
194
+ * - Prefix changed → cache bust
219
195
  *
220
196
  * This version scans backwards to find the latest instance of each
221
197
  * relocatable block type, removes them from wherever they are, and
222
- * prepends them to messages[0] in fresh-session order. Idempotent.
198
+ * prepends them to messages[0]. Idempotent across calls.
223
199
  */
224
200
  function normalizeResumeMessages(messages) {
225
201
  if (!Array.isArray(messages) || messages.length < 2) return messages;
@@ -236,13 +212,11 @@ function normalizeResumeMessages(messages) {
236
212
  const firstMsg = messages[firstUserIdx];
237
213
  if (!Array.isArray(firstMsg?.content)) return messages;
238
214
 
239
- // Check if ANY relocatable blocks are scattered outside first user msg.
215
+ // FIX: Check if ANY relocatable blocks are scattered outside first user msg.
216
+ // The old check (firstAlreadyHas → skip) missed partial scatter where some
217
+ // blocks stay in messages[0] but others drift to later messages (v2.1.89+).
240
218
  let hasScatteredBlocks = false;
241
- for (
242
- let i = firstUserIdx + 1;
243
- i < messages.length && !hasScatteredBlocks;
244
- i++
245
- ) {
219
+ for (let i = firstUserIdx + 1; i < messages.length && !hasScatteredBlocks; i++) {
246
220
  const msg = messages[i];
247
221
  if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
248
222
  for (const block of msg.content) {
@@ -254,8 +228,8 @@ function normalizeResumeMessages(messages) {
254
228
  }
255
229
  if (!hasScatteredBlocks) return messages;
256
230
 
257
- // Scan ALL user messages in reverse to collect the LATEST version of each
258
- // block type. This handles both full and partial scatter.
231
+ // Scan ALL user messages (including first) in reverse to collect the LATEST
232
+ // version of each block type. This handles both full and partial scatter.
259
233
  const found = new Map();
260
234
 
261
235
  for (let i = messages.length - 1; i >= firstUserIdx; i--) {
@@ -267,6 +241,7 @@ function normalizeResumeMessages(messages) {
267
241
  const text = block.text || "";
268
242
  if (!isRelocatableBlock(text)) continue;
269
243
 
244
+ // Determine block type for dedup
270
245
  let blockType;
271
246
  if (isSkillsBlock(text)) blockType = "skills";
272
247
  else if (isMcpBlock(text)) blockType = "mcp";
@@ -274,6 +249,7 @@ function normalizeResumeMessages(messages) {
274
249
  else if (isHooksBlock(text)) blockType = "hooks";
275
250
  else continue;
276
251
 
252
+ // Keep only the LATEST (first found scanning backwards)
277
253
  if (!found.has(blockType)) {
278
254
  let fixedText = text;
279
255
  if (blockType === "hooks") fixedText = stripSessionKnowledge(text);
@@ -287,17 +263,15 @@ function normalizeResumeMessages(messages) {
287
263
 
288
264
  if (found.size === 0) return messages;
289
265
 
290
- // Remove ALL relocatable blocks from ALL user messages
266
+ // Remove ALL relocatable blocks from ALL user messages (both first and later)
291
267
  const result = messages.map((msg) => {
292
268
  if (msg.role !== "user" || !Array.isArray(msg.content)) return msg;
293
- const filtered = msg.content.filter(
294
- (b) => !isRelocatableBlock(b.text || "")
295
- );
269
+ const filtered = msg.content.filter((b) => !isRelocatableBlock(b.text || ""));
296
270
  if (filtered.length === msg.content.length) return msg;
297
271
  return { ...msg, content: filtered };
298
272
  });
299
273
 
300
- // Order must match fresh session layout: deferred -> mcp -> skills -> hooks
274
+ // FIX: Order must match fresh session layout: deferred → mcp → skills → hooks
301
275
  const ORDER = ["deferred", "mcp", "skills", "hooks"];
302
276
  const toRelocate = ORDER.filter((t) => found.has(t)).map((t) => found.get(t));
303
277
 
@@ -309,12 +283,245 @@ function normalizeResumeMessages(messages) {
309
283
  return result;
310
284
  }
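
The docstring above describes the relocation as "find the latest of each block type, remove all of them, prepend in fresh-session order". Here is a condensed sketch under simplifying assumptions. It reuses the same header checks as `isDeferredToolsBlock`, `isMcpBlock`, `isSkillsBlock` and `isHooksBlock`. It skips the early "nothing scattered" return and the `stripSessionKnowledge` rewrite of the hooks block. It also scans blocks within a message from the end, which is an assumption because that loop is not shown in this diff. `classify` and `relocate` are hypothetical names:

```javascript
const SR = "<system-reminder>\n";
const ORDER = ["deferred", "mcp", "skills", "hooks"];

// Block type for a relocatable system-reminder, or null for anything else.
function classify(text) {
  if (typeof text !== "string") return null;
  if (text.startsWith(SR + "The following deferred tools are now available")) return "deferred";
  if (text.startsWith(SR + "# MCP Server Instructions")) return "mcp";
  if (text.startsWith(SR + "The following skills are available")) return "skills";
  if (text.startsWith("<system-reminder>") && text.slice(0, 200).includes("hook success")) return "hooks";
  return null;
}

function relocate(messages) {
  const first = messages.findIndex((m) => m.role === "user");
  if (first === -1 || !Array.isArray(messages[first].content)) return messages;

  // Scan backwards so the first hit per type is the LATEST copy.
  const latest = new Map();
  for (let i = messages.length - 1; i >= first; i--) {
    const m = messages[i];
    if (m.role !== "user" || !Array.isArray(m.content)) continue;
    for (let j = m.content.length - 1; j >= 0; j--) {
      const type = classify(m.content[j].text);
      if (type && !latest.has(type)) latest.set(type, m.content[j]);
    }
  }
  if (latest.size === 0) return messages;

  // Remove every relocatable block, then prepend the latest ones in ORDER.
  const out = messages.map((m) =>
    m.role === "user" && Array.isArray(m.content)
      ? { ...m, content: m.content.filter((b) => !classify(b.text)) }
      : m
  );
  const moved = ORDER.filter((t) => latest.has(t)).map((t) => latest.get(t));
  out[first] = { ...out[first], content: [...moved, ...out[first].content] };
  return out;
}

const demo = relocate([
  { role: "user", content: [{ type: "text", text: SR + "# MCP Server Instructions\nA" }, { type: "text", text: "hello" }] },
  { role: "assistant", content: "ok" },
  { role: "user", content: [{ type: "text", text: SR + "The following skills are available\nS" }, { type: "text", text: "next" }] },
]);
console.log(demo[0].content.map((b) => classify(b.text) ?? b.text)); // [ 'mcp', 'skills', 'hello' ]
```

Running `relocate` on its own output yields the same array, which is the idempotence property the docstring relies on.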
311
285
 
312
- // ---------------------------------------------------------------------------
313
- // Tool schema stabilization (Bug 3)
314
- // ---------------------------------------------------------------------------
286
+ // --------------------------------------------------------------------------
287
+ // Image stripping from old tool results (cost optimization)
288
+ // --------------------------------------------------------------------------
289
+
290
+ // CACHE_FIX_IMAGE_KEEP_LAST=N — keep images only in the last N user messages.
291
+ // Unset or 0 = disabled (all images preserved, backward compatible).
292
+ // Images in tool_result blocks older than N user messages from the end are
293
+ // replaced with a text placeholder. User-pasted images (direct image blocks
294
+ // in user messages, not inside tool_result) are left alone.
295
+ const IMAGE_KEEP_LAST = parseInt(process.env.CACHE_FIX_IMAGE_KEEP_LAST || "0", 10);
296
+
297
+ /**
298
+ * Strip base64 image blocks from tool_result content in older messages.
299
+ * Returns { messages, stats } where stats has stripping metrics.
300
+ */
301
+ function stripOldToolResultImages(messages, keepLast) {
302
+ if (!keepLast || keepLast <= 0 || !Array.isArray(messages)) {
303
+ return { messages, stats: null };
304
+ }
305
+
306
+ // Find user message indices (turns) so we can count from the end
307
+ const userMsgIndices = [];
308
+ for (let i = 0; i < messages.length; i++) {
309
+ if (messages[i].role === "user") userMsgIndices.push(i);
310
+ }
311
+
312
+ if (userMsgIndices.length <= keepLast) {
313
+ return { messages, stats: null }; // not enough turns to strip anything
314
+ }
315
+
316
+ // Messages at or after this index are "recent" — keep their images
317
+ const cutoffIdx = userMsgIndices[userMsgIndices.length - keepLast];
318
+
319
+ let strippedCount = 0;
320
+ let strippedBytes = 0;
321
+
322
+ const result = messages.map((msg, msgIdx) => {
323
+ // Only process user messages before the cutoff (tool_result is in user msgs)
324
+ if (msg.role !== "user" || msgIdx >= cutoffIdx || !Array.isArray(msg.content)) {
325
+ return msg;
326
+ }
327
+
328
+ let msgModified = false;
329
+ const newContent = msg.content.map((block) => {
330
+ // Only strip images inside tool_result blocks, not user-pasted images
331
+ if (block.type === "tool_result" && Array.isArray(block.content)) {
332
+ let toolModified = false;
333
+ const newToolContent = block.content.map((item) => {
334
+ if (item.type === "image") {
335
+ strippedCount++;
336
+ if (item.source?.data) {
337
+ strippedBytes += item.source.data.length;
338
+ }
339
+ toolModified = true;
340
+ return {
341
+ type: "text",
342
+ text: "[image stripped from history — file may still be on disk]",
343
+ };
344
+ }
345
+ return item;
346
+ });
347
+ if (toolModified) {
348
+ msgModified = true;
349
+ return { ...block, content: newToolContent };
350
+ }
351
+ }
352
+ return block;
353
+ });
354
+
355
+ if (msgModified) {
356
+ return { ...msg, content: newContent };
357
+ }
358
+ return msg;
359
+ });
360
+
361
+ const stats = strippedCount > 0
362
+ ? { strippedCount, strippedBytes, estimatedTokens: Math.ceil(strippedBytes * 0.125) }
363
+ : null;
364
+
365
+ return { messages: strippedCount > 0 ? result : messages, stats };
366
+ }
367
+
368
+ // --------------------------------------------------------------------------
369
+ // Prefix lock — replay saved messages[0] on resume for cache hit
370
+ // --------------------------------------------------------------------------
371
+
372
+ // CACHE_FIX_PREFIX_LOCK=1 — save messages[0] on every call and replay it on
373
+ // resume to avoid a cache rebuild. Disabled by default.
374
+ //
375
+ // On resume, CC reassembles messages with blocks in different positions and
376
+ // injects fresh system-reminders, changing the prefix bytes. Even after our
377
+ // relocation fix corrects the blocks, the prefix differs from what the server
378
+ // cached on the last pre-exit call, causing a full cache rebuild.
379
+ //
380
+ // This feature saves the exact messages[0] content after all fixes are applied.
381
+ // On the first call of a new process (resume), if system prompt hash and tools
382
+ // hash match the saved snapshot, and the real user message text matches, we
383
+ // replay the saved messages[0] to produce a byte-identical prefix → cache hit.
384
+
385
+ const PREFIX_LOCK = process.env.CACHE_FIX_PREFIX_LOCK === "1";
386
+ const PREFIX_LOCK_FILE = join(homedir(), ".claude", "cache-fix-prefix-lock.json");
387
+
388
+ let _prefixLockFirstCall = true;
389
+
390
+ /**
391
+ * Compute hashes for prefix lock comparison.
392
+ */
393
+ function computePrefixHashes(system, tools) {
394
+ const sysHash = system
395
+ ? createHash("sha256").update(JSON.stringify(system)).digest("hex").slice(0, 16)
396
+ : "none";
397
+ const toolHash = tools
398
+ ? createHash("sha256").update(JSON.stringify(tools.map(t => t.name).sort())).digest("hex").slice(0, 16)
399
+ : "none";
400
+ return { sysHash, toolHash };
401
+ }
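
Because `computePrefixHashes` feeds only the sorted tool names into the tools hash, that guard ignores tool order (as intended) and also ignores any change to a tool's description or schema that keeps the same name. A sketch of the hashed input with the SHA-256 step omitted; `toolKey` and the sample tools are hypothetical:

```javascript
// Same input computePrefixHashes passes to SHA-256 for tools (hashing omitted here).
function toolKey(tools) {
  return tools ? JSON.stringify(tools.map((t) => t.name).sort()) : "none";
}

const before = [{ name: "Read", description: "v1" }, { name: "Bash", description: "v1" }];
const after = [{ name: "Bash", description: "v2" }, { name: "Read", description: "v2" }];
console.log(toolKey(before) === toolKey(after)); // true: reordering and description edits are ignored
console.log(toolKey([...before, { name: "Grep" }]) === toolKey(before)); // false: an added tool is detected
```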
402
+
403
+ /**
404
+ * Extract the real user message text from messages[0] (skipping system-reminders).
405
+ */
406
+ function extractUserTextFromFirstMsg(msg) {
407
+ if (!msg || !Array.isArray(msg.content)) return "";
408
+ for (const block of msg.content) {
409
+ if (block.type === "text" && typeof block.text === "string" &&
410
+ !block.text.startsWith("<system-reminder>") &&
411
+ !block.text.startsWith("<local-command")) {
412
+ return block.text.slice(0, 200); // enough to identify, not too much to compare
413
+ }
414
+ }
415
+ return "";
416
+ }
417
+
418
+ /**
419
+ * Hash all non-system-reminder user content in messages[0] to detect
420
+ * substantive changes that the userText check (first 200 chars) might miss.
421
+ */
422
+ function hashUserContent(msg) {
423
+ if (!msg || !Array.isArray(msg.content)) return "empty";
424
+ const userBlocks = msg.content.filter(b =>
425
+ b.type === "text" && typeof b.text === "string" &&
426
+ !b.text.startsWith("<system-reminder>") &&
427
+ !b.text.startsWith("<local-command")
428
+ );
429
+ if (userBlocks.length === 0) return "empty";
430
+ return createHash("sha256")
431
+ .update(userBlocks.map(b => b.text).join("\n"))
432
+ .digest("hex").slice(0, 16);
433
+ }
434
+
435
+ /**
436
+ * On the first intercepted call (e.g. after a resume), try to replay the saved first user message for a cache hit.
437
+ * Returns a new messages array with the saved content, or the original array if the lock doesn't apply.
438
+ */
439
+ function applyPrefixLock(messages, system, tools) {
440
+ if (!PREFIX_LOCK || !Array.isArray(messages) || messages.length < 2) return messages;
441
+
442
+ const firstUserIdx = messages.findIndex(m => m.role === "user");
443
+ if (firstUserIdx === -1) return messages;
444
+
445
+ const { sysHash, toolHash } = computePrefixHashes(system, tools);
446
+ const currentUserText = extractUserTextFromFirstMsg(messages[firstUserIdx]);
447
+ const currentContentHash = hashUserContent(messages[firstUserIdx]);
448
+
449
+ // Skip if this looks like a compacted conversation (system-reminder as first block
450
+ // with compaction summary markers)
451
+ const firstBlock = messages[firstUserIdx]?.content?.[0];
452
+ if (firstBlock?.text?.includes("CompactBoundary") || firstBlock?.text?.includes("compacted")) {
453
+ debugLog("PREFIX LOCK: skipped — compacted conversation detected");
454
+ return messages;
455
+ }
456
+
457
+ if (_prefixLockFirstCall) {
458
+ _prefixLockFirstCall = false;
459
+
460
+ // Try to load and apply saved prefix
461
+ try {
462
+ const saved = JSON.parse(readFileSync(PREFIX_LOCK_FILE, "utf8"));
463
+
464
+ if (saved.sysHash !== sysHash) {
465
+ debugLog("PREFIX LOCK: skipped — system prompt changed");
466
+ } else if (saved.toolHash !== toolHash) {
467
+ debugLog("PREFIX LOCK: skipped — tools changed");
468
+ } else if (saved.userText !== currentUserText) {
469
+ debugLog("PREFIX LOCK: skipped — user message text changed");
470
+ } else if (saved.contentHash && saved.contentHash !== currentContentHash) {
471
+ debugLog("PREFIX LOCK: skipped — user content hash changed (substantive context change)");
472
+ } else if (!saved.content || !Array.isArray(saved.content)) {
473
+ debugLog("PREFIX LOCK: skipped — saved content invalid");
474
+ } else {
475
+ // Apply the saved messages[0] content
476
+ const result = [...messages];
477
+ result[firstUserIdx] = { ...result[firstUserIdx], content: saved.content };
478
+ debugLog(`PREFIX LOCK: APPLIED — replayed saved messages[0] (${saved.content.length} blocks)`);
479
+ return result;
480
+ }
481
+ } catch {
482
+ debugLog("PREFIX LOCK: no saved prefix found (first run or file missing)");
483
+ }
484
+ }
485
+
486
+ return messages;
487
+ }
488
+
489
+ /**
490
+ * Save the first user message's content for future resume replay.
491
+ * Called after all fixes are applied, before the request is sent.
492
+ */
493
+ function savePrefixLock(messages, system, tools) {
494
+ if (!PREFIX_LOCK || !Array.isArray(messages)) return;
495
+
496
+ const firstUserIdx = messages.findIndex(m => m.role === "user");
497
+ if (firstUserIdx === -1) return;
498
+
499
+ const { sysHash, toolHash } = computePrefixHashes(system, tools);
500
+ const userText = extractUserTextFromFirstMsg(messages[firstUserIdx]);
501
+ const contentHash = hashUserContent(messages[firstUserIdx]);
502
+ const content = messages[firstUserIdx].content;
503
+
504
+ try {
505
+ writeFileSync(PREFIX_LOCK_FILE, JSON.stringify({
506
+ timestamp: new Date().toISOString(),
507
+ sysHash,
508
+ toolHash,
509
+ userText,
510
+ contentHash,
511
+ content,
512
+ }));
513
+ } catch (e) {
514
+ debugLog("PREFIX LOCK: failed to save:", e?.message);
515
+ }
516
+ }
517
+
518
+ // --------------------------------------------------------------------------
519
+ // Tool schema stabilization (Bug 2 secondary cause)
520
+ // --------------------------------------------------------------------------
315
521
 
316
522
  /**
317
- * Sort tool definitions by name for deterministic ordering.
523
+ * Sort tool definitions by name for deterministic ordering. Tool schema bytes
524
+ * changing mid-session was acknowledged as a bug in the v2.1.88 changelog.
318
525
  */
319
526
  function stabilizeToolOrder(tools) {
320
527
  if (!Array.isArray(tools) || tools.length === 0) return tools;
@@ -325,9 +532,228 @@ function stabilizeToolOrder(tools) {
325
532
  });
326
533
  }
327
534
 
328
- // ---------------------------------------------------------------------------
535
+ // --------------------------------------------------------------------------
536
+ // Fetch interceptor
537
+ // --------------------------------------------------------------------------
538
+
539
+ // --------------------------------------------------------------------------
540
+ // Debug logging (writes to ~/.claude/cache-fix-debug.log)
541
+ // Set CACHE_FIX_DEBUG=1 to enable
542
+ // --------------------------------------------------------------------------
543
+
544
+ import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
545
+ import { homedir } from "node:os";
546
+ import { join } from "node:path";
547
+
548
+ const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
549
+ const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
550
+ const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
551
+ const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
552
+
553
+ function debugLog(...args) {
554
+ if (!DEBUG) return;
555
+ const line = `[${new Date().toISOString()}] ${args.join(" ")}\n`;
556
+ try { appendFileSync(LOG_PATH, line); } catch {}
557
+ }
558
+
559
+ // --------------------------------------------------------------------------
560
+ // Prefix snapshot — captures message prefix for cross-process diff.
561
+ // Set CACHE_FIX_PREFIXDIFF=1 to enable.
562
+ //
563
+ // On each API call: saves JSON of first 5 messages + system + tools hash
564
+ // to ~/.claude/cache-fix-snapshots/<session-hash>-last.json
565
+ //
566
+ // On first call after startup: compares against saved snapshot and writes
567
+ // a diff report to ~/.claude/cache-fix-snapshots/<session-hash>-diff.json
568
+ // --------------------------------------------------------------------------
569
+
570
+ let _prefixDiffFirstCall = true;
571
+
572
+ // --------------------------------------------------------------------------
573
+ // GrowthBook flag dump (runs once on first API call)
574
+ // --------------------------------------------------------------------------
575
+
576
+ let _growthBookDumped = false;
577
+
578
+ function dumpGrowthBookFlags() {
579
+ if (_growthBookDumped || !DEBUG) return;
580
+ _growthBookDumped = true;
581
+ try {
582
+ const claudeJson = JSON.parse(readFileSync(join(homedir(), ".claude.json"), "utf8"));
583
+ const features = claudeJson.cachedGrowthBookFeatures;
584
+ if (!features) { debugLog("GROWTHBOOK: no cachedGrowthBookFeatures found"); return; }
585
+
586
+ // Log the flags that matter for cost/cache/context behavior
587
+ const interesting = {
588
+ hawthorn_window: features.tengu_hawthorn_window,
589
+ pewter_kestrel: features.tengu_pewter_kestrel,
590
+ summarize_tool_results: features.tengu_summarize_tool_results,
591
+ slate_heron: features.tengu_slate_heron,
592
+ session_memory: features.tengu_session_memory,
593
+ sm_compact: features.tengu_sm_compact,
594
+ sm_compact_config: features.tengu_sm_compact_config,
595
+ sm_config: features.tengu_sm_config,
596
+ cache_plum_violet: features.tengu_cache_plum_violet,
597
+ prompt_cache_1h_config: features.tengu_prompt_cache_1h_config,
598
+ crystal_beam: features.tengu_crystal_beam,
599
+ cold_compact: features.tengu_cold_compact,
600
+ system_prompt_global_cache: features.tengu_system_prompt_global_cache,
601
+ compact_cache_prefix: features.tengu_compact_cache_prefix,
602
+ };
603
+ debugLog("GROWTHBOOK FLAGS:", JSON.stringify(interesting, null, 2));
604
+ } catch (e) {
605
+ debugLog("GROWTHBOOK: failed to read ~/.claude.json:", e?.message);
606
+ }
607
+ }
608
+
609
+ // --------------------------------------------------------------------------
610
+ // Microcompact / budget monitoring
611
+ // --------------------------------------------------------------------------
612
+
613
+ /**
614
+ * Scan outgoing messages for signs of microcompact clearing and budget
615
+ * enforcement. Counts tool results that have been gutted and reports stats.
616
+ */
617
+ function monitorContextDegradation(messages) {
618
+ if (!Array.isArray(messages)) return null;
619
+
620
+ let clearedToolResults = 0;
621
+ let totalToolResultChars = 0;
622
+ let totalToolResults = 0;
623
+
624
+ for (const msg of messages) {
625
+ if (msg.role !== "user" || !Array.isArray(msg.content)) continue;
626
+ for (const block of msg.content) {
627
+ if (block.type === "tool_result") {
628
+ totalToolResults++;
629
+ const content = block.content;
630
+ if (typeof content === "string") {
631
+ if (content === "[Old tool result content cleared]") {
632
+ clearedToolResults++;
633
+ } else {
634
+ totalToolResultChars += content.length;
635
+ }
636
+ } else if (Array.isArray(content)) {
637
+ for (const item of content) {
638
+ if (item.type === "text") {
639
+ if (item.text === "[Old tool result content cleared]") {
640
+ clearedToolResults++;
641
+ } else {
642
+ totalToolResultChars += item.text.length;
643
+ }
644
+ }
645
+ }
646
+ }
647
+ }
648
+ }
649
+ }
650
+
651
+ if (totalToolResults === 0) return null;
652
+
653
+ const stats = { totalToolResults, clearedToolResults, totalToolResultChars };
654
+
655
+ if (clearedToolResults > 0) {
656
+ debugLog(`MICROCOMPACT: ${clearedToolResults}/${totalToolResults} tool results cleared`);
657
+ }
658
+
659
+ // Warn when tool-result text approaches the 200,000-char budget threshold
660
+ if (totalToolResultChars > 150000) {
661
+ debugLog(`BUDGET WARNING: tool result chars at ${totalToolResultChars.toLocaleString()} / 200,000 threshold`);
662
+ }
663
+
664
+ return stats;
665
+ }
666
+
667
+ function snapshotPrefix(payload) {
668
+ if (!PREFIXDIFF) return;
669
+ try {
670
+ mkdirSync(SNAPSHOT_DIR, { recursive: true });
671
+
672
+ // Session key: use system prompt hash — stable across restarts for the same project.
673
+ // Different projects get different snapshots, same project matches across resume.
674
+ const sessionKey = payload.system
675
+ ? createHash("sha256").update(JSON.stringify(payload.system).slice(0, 2000)).digest("hex").slice(0, 12)
676
+ : "default";
677
+
678
+ const snapshotFile = join(SNAPSHOT_DIR, `${sessionKey}-last.json`);
679
+ const diffFile = join(SNAPSHOT_DIR, `${sessionKey}-diff.json`);
680
+
681
+ // Build prefix snapshot: first 5 messages, stripped of cache_control
682
+ const prefixMsgs = (payload.messages || []).slice(0, 5).map(msg => {
683
+ const content = Array.isArray(msg.content)
684
+ ? msg.content.map(b => {
685
+ const { cache_control, ...rest } = b;
686
+ // Truncate long text blocks for diffing
687
+ if (rest.text && rest.text.length > 500) {
688
+ rest.text = rest.text.slice(0, 500) + `...[${rest.text.length} chars]`;
689
+ }
690
+ return rest;
691
+ })
692
+ : msg.content;
693
+ return { role: msg.role, content };
694
+ });
695
+
696
+ const toolsHash = payload.tools
697
+ ? createHash("sha256").update(JSON.stringify(payload.tools.map(t => t.name))).digest("hex").slice(0, 16)
698
+ : "none";
699
+
700
+ const systemHash = payload.system
701
+ ? createHash("sha256").update(JSON.stringify(payload.system)).digest("hex").slice(0, 16)
702
+ : "none";
703
+
704
+ const snapshot = {
705
+ timestamp: new Date().toISOString(),
706
+ messageCount: payload.messages?.length || 0,
707
+ toolsHash,
708
+ systemHash,
709
+ prefixMessages: prefixMsgs,
710
+ };
711
+
712
+ // On first call: compare against saved
713
+ if (_prefixDiffFirstCall) {
714
+ _prefixDiffFirstCall = false;
715
+ try {
716
+ const prev = JSON.parse(readFileSync(snapshotFile, "utf8"));
717
+ const diff = {
718
+ timestamp: snapshot.timestamp,
719
+ prevTimestamp: prev.timestamp,
720
+ toolsMatch: prev.toolsHash === snapshot.toolsHash,
721
+ systemMatch: prev.systemHash === snapshot.systemHash,
722
+ messageCountPrev: prev.messageCount,
723
+ messageCountNow: snapshot.messageCount,
724
+ prefixDiffs: [],
725
+ };
726
+
727
+ const maxIdx = Math.max(prev.prefixMessages.length, snapshot.prefixMessages.length);
728
+ for (let i = 0; i < maxIdx; i++) {
729
+ const prevMsg = JSON.stringify(prev.prefixMessages[i] || null);
730
+ const nowMsg = JSON.stringify(snapshot.prefixMessages[i] || null);
731
+ if (prevMsg !== nowMsg) {
732
+ diff.prefixDiffs.push({
733
+ index: i,
734
+ prev: prev.prefixMessages[i] || null,
735
+ now: snapshot.prefixMessages[i] || null,
736
+ });
737
+ }
738
+ }
739
+
740
+ writeFileSync(diffFile, JSON.stringify(diff, null, 2));
741
+ debugLog(`PREFIX DIFF: ${diff.prefixDiffs.length} differences in first 5 messages. tools=${diff.toolsMatch ? "match" : "DIFFER"} system=${diff.systemMatch ? "match" : "DIFFER"}`);
742
+ } catch {
743
+ // No previous snapshot — first run
744
+ }
745
+ }
746
+
747
+ // Save current snapshot
748
+ writeFileSync(snapshotFile, JSON.stringify(snapshot, null, 2));
749
+ } catch (e) {
750
+ debugLog("PREFIX SNAPSHOT ERROR:", e?.message);
751
+ }
752
+ }
753
+
754
+ // --------------------------------------------------------------------------
329
755
  // Fetch interceptor
330
- // ---------------------------------------------------------------------------
756
+ // --------------------------------------------------------------------------
331
757
 
332
758
  const _origFetch = globalThis.fetch;
333
759
 
@@ -339,23 +765,27 @@ globalThis.fetch = async function (url, options) {
339
765
  !urlStr.includes("batches") &&
340
766
  !urlStr.includes("count_tokens");
341
767
 
342
- if (
343
- isMessagesEndpoint &&
344
- options?.body &&
345
- typeof options.body === "string"
346
- ) {
768
+ if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
347
769
  try {
348
770
  const payload = JSON.parse(options.body);
349
771
  let modified = false;
350
772
 
773
+ // One-time GrowthBook flag dump on first API call
774
+ dumpGrowthBookFlags();
775
+
351
776
  debugLog("--- API call to", urlStr);
352
777
  debugLog("message count:", payload.messages?.length);
353
778
 
354
- // Bug 1: Relocate scattered attachment blocks
779
+ // Detect synthetic model (false rate limiter, B3)
780
+ if (payload.model === "<synthetic>") {
781
+ debugLog("FALSE RATE LIMIT: synthetic model detected — client-side rate limit, no real API call");
782
+ }
783
+
784
+ // Bug 1: Relocate resume attachment blocks
355
785
  if (payload.messages) {
786
+ // Log message structure for debugging
356
787
  if (DEBUG) {
357
- let firstUserIdx = -1;
358
- let lastUserIdx = -1;
788
+ let firstUserIdx = -1, lastUserIdx = -1;
359
789
  for (let i = 0; i < payload.messages.length; i++) {
360
790
  if (payload.messages[i].role === "user") {
361
791
  if (firstUserIdx === -1) firstUserIdx = i;
@@ -365,39 +795,20 @@ globalThis.fetch = async function (url, options) {
365
795
  if (firstUserIdx !== -1) {
366
796
  const firstContent = payload.messages[firstUserIdx].content;
367
797
  const lastContent = payload.messages[lastUserIdx].content;
368
- debugLog(
369
- "firstUserIdx:",
370
- firstUserIdx,
371
- "lastUserIdx:",
372
- lastUserIdx
373
- );
374
- debugLog(
375
- "first user msg blocks:",
376
- Array.isArray(firstContent) ? firstContent.length : "string"
377
- );
798
+ debugLog("firstUserIdx:", firstUserIdx, "lastUserIdx:", lastUserIdx);
799
+ debugLog("first user msg blocks:", Array.isArray(firstContent) ? firstContent.length : "string");
378
800
  if (Array.isArray(firstContent)) {
379
801
  for (const b of firstContent) {
380
802
  const t = (b.text || "").substring(0, 80);
381
- debugLog(
382
- " first[block]:",
383
- isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
384
- JSON.stringify(t)
385
- );
803
+ debugLog(" first[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
386
804
  }
387
805
  }
388
806
  if (firstUserIdx !== lastUserIdx) {
389
- debugLog(
390
- "last user msg blocks:",
391
- Array.isArray(lastContent) ? lastContent.length : "string"
392
- );
807
+ debugLog("last user msg blocks:", Array.isArray(lastContent) ? lastContent.length : "string");
393
808
  if (Array.isArray(lastContent)) {
394
809
  for (const b of lastContent) {
395
810
  const t = (b.text || "").substring(0, 80);
396
- debugLog(
397
- " last[block]:",
398
- isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep",
399
- JSON.stringify(t)
400
- );
811
+ debugLog(" last[block]:", isRelocatableBlock(b.text) ? "RELOCATABLE" : "keep", JSON.stringify(t));
401
812
  }
402
813
  }
403
814
  } else {
@@ -412,13 +823,37 @@ globalThis.fetch = async function (url, options) {
412
823
  modified = true;
413
824
  debugLog("APPLIED: resume message relocation");
414
825
  } else {
826
+ debugLog("SKIPPED: resume relocation (not a resume or already correct)");
827
+ }
828
+ }
829
+
830
+ // Image stripping: remove old tool_result images to reduce token waste
831
+ if (payload.messages && IMAGE_KEEP_LAST > 0) {
832
+ const { messages: imgStripped, stats: imgStats } = stripOldToolResultImages(
833
+ payload.messages, IMAGE_KEEP_LAST
834
+ );
835
+ if (imgStats) {
836
+ payload.messages = imgStripped;
837
+ modified = true;
415
838
  debugLog(
416
- "SKIPPED: resume relocation (not a resume or already correct)"
839
+ `APPLIED: stripped ${imgStats.strippedCount} images from old tool results`,
840
+ `(~${imgStats.strippedBytes} base64 bytes, ~${imgStats.estimatedTokens} tokens saved)`
417
841
  );
842
+ } else if (IMAGE_KEEP_LAST > 0) {
843
+ debugLog("SKIPPED: image stripping (no old images found or not enough turns)");
844
+ }
845
+ }
846
+
847
+ // Prefix lock: replay saved messages[0] on resume for cache hit
848
+ if (payload.messages && payload.system) {
849
+ const locked = applyPrefixLock(payload.messages, payload.system, payload.tools);
850
+ if (locked !== payload.messages) {
851
+ payload.messages = locked;
852
+ modified = true;
418
853
  }
419
854
  }
420
855
 
421
- // Bug 3: Stabilize tool ordering
856
+ // Bug 2a: Stabilize tool ordering
422
857
  if (payload.tools) {
423
858
  const sorted = stabilizeToolOrder(payload.tools);
424
859
  const changed = sorted.some(
@@ -431,7 +866,7 @@ globalThis.fetch = async function (url, options) {
431
866
  }
432
867
  }
433
868
 
434
- // Bug 2: Stabilize fingerprint in attribution header
869
+ // Bug 2b: Stabilize fingerprint in attribution header
435
870
  if (payload.system && payload.messages) {
436
871
  const fix = stabilizeFingerprint(payload.system, payload.messages);
437
872
  if (fix) {
@@ -441,12 +876,7 @@ globalThis.fetch = async function (url, options) {
441
876
  text: fix.newText,
442
877
  };
443
878
  modified = true;
444
- debugLog(
445
- "APPLIED: fingerprint stabilized from",
446
- fix.oldFingerprint,
447
- "to",
448
- fix.stableFingerprint
449
- );
879
+ debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
450
880
  }
451
881
  }
452
882
 
@@ -454,11 +884,53 @@ globalThis.fetch = async function (url, options) {
454
884
  options = { ...options, body: JSON.stringify(payload) };
455
885
  debugLog("Request body rewritten");
456
886
  }
887
+
888
+ // Save prefix lock after all fixes applied
889
+ if (payload.messages && payload.system) {
890
+ savePrefixLock(payload.messages, payload.system, payload.tools);
891
+ }
892
+
893
+ // Monitor for microcompact / budget enforcement degradation
894
+ if (payload.messages) {
895
+ monitorContextDegradation(payload.messages);
896
+ }
897
+
898
+ // Capture prefix snapshot for cross-process diff analysis
899
+ snapshotPrefix(payload);
900
+
457
901
  } catch (e) {
458
902
  debugLog("ERROR in interceptor:", e?.message);
459
903
  // Parse failure — pass through unmodified
460
904
  }
461
905
  }
462
906
 
463
- return _origFetch.apply(this, [url, options]);
907
+ const response = await _origFetch.apply(this, [url, options]);
908
+
909
+ // Extract quota utilization from response headers and save for hooks/MCP
910
+ if (isMessagesEndpoint) {
911
+ try {
912
+ const h5 = response.headers.get("anthropic-ratelimit-unified-5h-utilization");
913
+ const h7d = response.headers.get("anthropic-ratelimit-unified-7d-utilization");
914
+ const reset5h = response.headers.get("anthropic-ratelimit-unified-5h-reset");
915
+ const reset7d = response.headers.get("anthropic-ratelimit-unified-7d-reset");
916
+ const status = response.headers.get("anthropic-ratelimit-unified-status");
917
+ const overage = response.headers.get("anthropic-ratelimit-unified-overage-status");
918
+
919
+ if (h5 || h7d) {
920
+ const quota = {
921
+ timestamp: new Date().toISOString(),
922
+ five_hour: h5 ? { utilization: parseFloat(h5), pct: Math.round(parseFloat(h5) * 100), resets_at: reset5h ? parseInt(reset5h, 10) : null } : null,
923
+ seven_day: h7d ? { utilization: parseFloat(h7d), pct: Math.round(parseFloat(h7d) * 100), resets_at: reset7d ? parseInt(reset7d, 10) : null } : null,
924
+ status: status || null,
925
+ overage_status: overage || null,
926
+ };
927
+ const quotaFile = join(homedir(), ".claude", "quota-status.json");
928
+ writeFileSync(quotaFile, JSON.stringify(quota, null, 2));
929
+ }
930
+ } catch {
931
+ // Non-critical — don't break the response
932
+ }
933
+ }
934
+
935
+ return response;
464
936
  };
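The interceptor writes `~/.claude/quota-status.json` after each messages call so that hooks or MCP servers can read quota utilization. Below is a minimal sketch of the consuming side, assuming a hook has already read and JSON-parsed that file. The `quotaWarnings` function and its 80% default threshold are hypothetical illustrations, not part of claude-code-cache-fix; only the field names come from the interceptor code above.

```javascript
// Hypothetical helper (not part of claude-code-cache-fix). `quota` is the parsed
// contents of ~/.claude/quota-status.json as written by the fetch interceptor:
// { five_hour: { utilization, pct, resets_at } | null, seven_day: ... | null, status, overage_status }
function quotaWarnings(quota, thresholdPct = 80) {
  const warnings = [];
  if (!quota || typeof quota !== "object") return warnings;

  // Map each quota window key to a short label for messages.
  const windows = [
    ["five_hour", "5h"],
    ["seven_day", "7d"],
  ];
  for (const [key, label] of windows) {
    const win = quota[key];
    // The interceptor writes null for a window whose utilization header was absent.
    if (!win || typeof win.pct !== "number") continue;
    if (win.pct >= thresholdPct) {
      // resets_at is passed through unconverted; no unit is assumed here.
      warnings.push(`${label} quota at ${win.pct}% (resets_at ${win.resets_at ?? "unknown"})`);
    }
  }
  return warnings;
}
```

A hook would feed it the result of `JSON.parse(readFileSync(path, "utf8"))` and treat a missing file as "no data", since the interceptor only writes the file when at least one utilization header is present in the response.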