claude-code-cache-fix 1.8.0 → 1.9.1

package/README.md CHANGED
@@ -1,6 +1,6 @@
  # claude-code-cache-fix
 
- English | [中文](./README.zh.md)
+ English | [中文](./README.zh.md) | [Português](./docs/guia-pt-br.md)
 
  Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.97.
 
@@ -36,7 +36,10 @@ Create a wrapper script (e.g. `~/bin/claude-fixed`):
 
  ```bash
  #!/bin/bash
- CLAUDE_NPM_CLI="$HOME/.npm-global/lib/node_modules/@anthropic-ai/claude-code/cli.js"
+ NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
+
+ CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
+ CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
 
  if [ ! -f "$CLAUDE_NPM_CLI" ]; then
  echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
@@ -44,7 +47,13 @@ if [ ! -f "$CLAUDE_NPM_CLI" ]; then
  exit 1
  fi
 
- exec env NODE_OPTIONS="--import claude-code-cache-fix" node "$CLAUDE_NPM_CLI" "$@"
+ if [ ! -f "$CACHE_FIX" ]; then
+ echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
+ echo "Install with: npm install -g claude-code-cache-fix" >&2
+ exit 1
+ fi
+
+ exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
  ```
 
  ```bash
@@ -95,6 +104,30 @@ The wrapper dynamically resolves your npm global root, constructs a `file:///` U
 
  Credit: [@TomTheMenace](https://github.com/anthropics/claude-code/issues/38335) contributed the Windows wrapper and validated the interceptor across a 7.5-hour, 536-call Opus 4.6 session on Windows — 98.4% cache hit rate, 81% of calls had fingerprint instability that the interceptor corrected.
 
+ ## VS Code Extension (experimental)
+
+ If you use Claude Code through the VS Code extension rather than the CLI, you may be able to load the interceptor via VS Code settings:
+
+ ```json
+ {
+ "claude-code.environmentVariables": {
+ "NODE_OPTIONS": "--import /path/to/claude-code-cache-fix/preload.mjs"
+ }
+ }
+ ```
+
+ Replace `/path/to` with your npm global root (`npm root -g`). Example for a typical Linux setup:
+
+ ```json
+ {
+ "claude-code.environmentVariables": {
+ "NODE_OPTIONS": "--import /home/username/.npm-global/lib/node_modules/claude-code-cache-fix/preload.mjs"
+ }
+ }
+ ```
+
+ **Status: needs community testing.** We've confirmed the `claude-code.environmentVariables` setting exists but haven't verified it propagates `NODE_OPTIONS` to the CC subprocess. If you test this, please report back on [#16](https://github.com/cnighswonger/claude-code-cache-fix/issues/16).
+
  ## How it works
 
  The module intercepts `globalThis.fetch` before Claude Code makes API calls to `/v1/messages`. On each call it:
@@ -198,7 +231,23 @@ Add to `~/.claude/settings.json`:
  }
  ```
 
- ### Why this matters
+ ### Recommended: disable git-status injection
+
+ Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
+
+ ```bash
+ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
+ ```
+
+ Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
+
+ The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
+
+ Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
+
+ **Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
+
+ ### Why the status line matters
 
  When the server downgrades your TTL to 5m (Layer 2 — quota-aware downgrade at Q5h ≥ 100%), **every idle longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible — you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
 
@@ -406,8 +455,13 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
  | `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
  | `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
  | `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
- | `CACHE_FIX_SKIP_TTL` | `0` | Skip 1h TTL injection (Bug 5) |
+ | `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
  | `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
+ | `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
+ | `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
+ | `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
+ | `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
+ | `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
 
  ## Limitations
 
@@ -491,6 +545,8 @@ measurable signature of cache-efficiency degradation.
  - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
  - **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
  - **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
+ - **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
+ - **[@beekamai](https://github.com/beekamai)** — Windows URL-encoding fix for `claude-fixed.bat` when npm root contains spaces (PR #17)
 
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
 
package/claude-fixed.bat CHANGED
@@ -17,6 +17,10 @@ REM
  REM Credit: @TomTheMenace (https://github.com/anthropics/claude-code/issues/38335)
  REM Part of claude-code-cache-fix: https://github.com/cnighswonger/claude-code-cache-fix
 
+ REM Resolve npm global root and URL-encode it so spaces (e.g. "C:\Program Files\nodejs")
+ REM don't break NODE_OPTIONS parsing. Without encoding, Node splits --import file:/// on
+ REM the literal space and fails with ERR_MODULE_NOT_FOUND: Cannot find module 'C:\Program'.
  for /f "delims=" %%G in ('npm root -g') do set "NPM_GLOBAL=%%G"
- set NODE_OPTIONS=--import file:///%NPM_GLOBAL:\=/%/claude-code-cache-fix/preload.mjs
+ for /f "delims=" %%U in ('powershell -NoProfile -Command "[System.Uri]::EscapeUriString(('%NPM_GLOBAL:\=/%' + '/claude-code-cache-fix/preload.mjs'))"') do set "PRELOAD_URL=%%U"
+ set NODE_OPTIONS=--import file:///%PRELOAD_URL%
  node "%NPM_GLOBAL%\@anthropic-ai\claude-code\cli.js" %*
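
The encoding step added to `claude-fixed.bat` can be reproduced in JavaScript to see what it changes. This is an illustration only, not code from the package: `toPreloadUrl` is a hypothetical helper, and the built-in `encodeURI` stands in for the PowerShell `[System.Uri]::EscapeUriString` call used by the wrapper. The whitespace split is a simplified model of the failure described in the wrapper's comment.

```javascript
// Hypothetical helper (not part of claude-code-cache-fix): builds the value
// that follows `--import` from an npm global root, mirroring the .bat wrapper.
function toPreloadUrl(npmGlobalRoot) {
  // Same substitution as %NPM_GLOBAL:\=/% in the batch file.
  const forwardSlashed = npmGlobalRoot.replace(/\\/g, "/");
  const fullPath = forwardSlashed + "/claude-code-cache-fix/preload.mjs";
  // encodeURI leaves ":" and "/" alone but turns a space into %20.
  return "file:///" + encodeURI(fullPath);
}

const unencoded =
  "file:///C:/Program Files/nodejs/node_modules/claude-code-cache-fix/preload.mjs";
const encoded = toPreloadUrl("C:\\Program Files\\nodejs\\node_modules");

// Simplified model: a value split on whitespace breaks into two tokens
// when the path contains a literal space, and stays whole once encoded.
console.log(unencoded.split(/\s+/).length); // 2
console.log(encoded.split(/\s+/).length); // 1
console.log(encoded);
```

With the space percent-encoded, the `--import` argument reaches Node as a single token, which is the behavior the wrapper change is after.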
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "claude-code-cache-fix",
- "version": "1.8.0",
+ "version": "1.9.1",
  "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
  "type": "module",
  "exports": "./preload.mjs",
package/preload.mjs CHANGED
@@ -631,6 +631,9 @@ import { join } from "node:path";
  const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
  const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
  const NORMALIZE_IDENTITY = process.env.CACHE_FIX_NORMALIZE_IDENTITY === "1";
+ const STRIP_GIT_STATUS = process.env.CACHE_FIX_STRIP_GIT_STATUS === "1";
+ const TTL_MAIN = (process.env.CACHE_FIX_TTL_MAIN || "1h").toLowerCase();
+ const TTL_SUBAGENT = (process.env.CACHE_FIX_TTL_SUBAGENT || "1h").toLowerCase();
  const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
  const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
  const USAGE_JSONL = process.env.CACHE_FIX_USAGE_LOG || join(homedir(), ".claude", "usage.jsonl");
@@ -672,6 +675,7 @@ const _STATS_SCHEMA = {
  tool_sort: { applied: 0, skipped: 0, lastApplied: null },
  ttl: { applied: 0, skipped: 0, lastApplied: null },
  identity: { applied: 0, skipped: 0, lastApplied: null },
+ git_status: { applied: 0, skipped: 0, lastApplied: null },
  };
 
  function _createEmptyStats() {
@@ -1221,41 +1225,88 @@ globalThis.fetch = async function (url, options) {
  }
  }
 
- // Bug 5: 1h TTL enforcement
+ // Optimization: strip volatile git-status from system prompt
+ // CC injects live git-status output (branch, changed files, recent commits)
+ // into a system text block. This changes on every file edit, busting the
+ // entire prefix cache. Opt-in via CACHE_FIX_STRIP_GIT_STATUS=1.
+ // The model can still run `git status` via Bash tool when it needs context.
+ if (STRIP_GIT_STATUS && shouldApplyFix("git_status") && payload.system && Array.isArray(payload.system)) {
+ let stripped = 0;
+ payload.system = payload.system.map((block) => {
+ if (block?.type !== "text" || typeof block.text !== "string") return block;
+ // Match the gitStatus section CC injects. Pattern:
+ // "gitStatus: This is the git status..."
+ // followed by branch, status, commits until the next section or end
+ const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
+ if (!gitStatusPattern.test(block.text)) return block;
+ const newText = block.text.replace(gitStatusPattern, "gitStatus: [stripped by cache-fix for prefix stability]");
+ if (newText !== block.text) {
+ stripped++;
+ return { ...block, text: newText };
+ }
+ return block;
+ });
+ if (stripped > 0) {
+ modified = true;
+ debugLog(`APPLIED: git-status stripped from ${stripped} system block(s)`);
+ recordFixResult("git_status", "applied");
+ } else {
+ recordFixResult("git_status", "skipped");
+ }
+ }
+
+ // Bug 5: TTL enforcement (configurable per request type)
  // The client gates 1h cache TTL behind a GrowthBook allowlist that checks
  // querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
  // Interactive CLI sessions may not match any pattern, causing the client to
  // send cache_control without ttl (defaulting to 5m server-side).
  // The server honors whatever TTL the client requests — so we inject it.
  // Discovered by @TigerKay1926 on #42052 using our GrowthBook flag dump.
+ //
+ // v1.9.0: configurable per request type via CACHE_FIX_TTL_MAIN and
+ // CACHE_FIX_TTL_SUBAGENT. Values: "1h" (default), "5m", "none".
+ // "none" = don't inject TTL, pass through caller's original cache_control.
  if (payload.system && shouldApplyFix("ttl")) {
- let ttlInjected = 0;
- payload.system = payload.system.map((block) => {
- if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
- ttlInjected++;
- return { ...block, cache_control: { ...block.cache_control, ttl: "1h" } };
- }
- return block;
- });
- // Also check messages for cache_control blocks (conversation history breakpoints)
- if (payload.messages) {
- for (const msg of payload.messages) {
- if (!Array.isArray(msg.content)) continue;
- for (let i = 0; i < msg.content.length; i++) {
- const b = msg.content[i];
- if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
- msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: "1h" } };
- ttlInjected++;
+ // Detect subagent: Agent SDK identity in system[1]
+ const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
+ const isSubagent = Array.isArray(payload.system) &&
+ payload.system.some((b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX));
+ const ttlValue = isSubagent ? TTL_SUBAGENT : TTL_MAIN;
+ const requestType = isSubagent ? "subagent" : "main";
+
+ if (ttlValue === "none") {
+ debugLog(`SKIPPED: TTL injection (${requestType} set to 'none' pass-through)`);
+ recordFixResult("ttl", "skipped");
+ } else {
+ const ttlParam = ttlValue === "5m" ? "5m" : "1h";
+ let ttlInjected = 0;
+ payload.system = payload.system.map((block) => {
+ if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
+ ttlInjected++;
+ return { ...block, cache_control: { ...block.cache_control, ttl: ttlParam } };
+ }
+ return block;
+ });
+ // Also check messages for cache_control blocks (conversation history breakpoints)
+ if (payload.messages) {
+ for (const msg of payload.messages) {
+ if (!Array.isArray(msg.content)) continue;
+ for (let i = 0; i < msg.content.length; i++) {
+ const b = msg.content[i];
+ if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
+ msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: ttlParam } };
+ ttlInjected++;
+ }
  }
  }
  }
- }
- if (ttlInjected > 0) {
- modified = true;
- debugLog(`APPLIED: 1h TTL injected on ${ttlInjected} cache_control block(s)`);
- recordFixResult("ttl", "applied");
- } else {
- recordFixResult("ttl", "skipped");
+ if (ttlInjected > 0) {
+ modified = true;
+ debugLog(`APPLIED: ${ttlParam} TTL injected on ${ttlInjected} cache_control block(s) (${requestType})`);
+ recordFixResult("ttl", "applied");
+ } else {
+ recordFixResult("ttl", "skipped");
+ }
  }
  } else if (payload.system && !shouldApplyFix("ttl")) {
  debugLog("SKIPPED: TTL injection disabled via env var");
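
The two behaviors this hunk adds, git-status stripping and per-request-type TTL selection, can be tried in isolation with a small standalone sketch. The regex and the Agent SDK prefix string are copied from the added lines. Everything else (the function names `stripGitStatus`, `pickTtl`, `injectTtl`, and the sample `system` array) is hypothetical and only approximates what the interceptor does inside its `fetch` wrapper.

```javascript
// Copied from the hunk above.
const GIT_STATUS_PATTERN = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";

// Hypothetical helper: replace the volatile gitStatus section with a stable marker.
function stripGitStatus(text) {
  return text.replace(GIT_STATUS_PATTERN, "gitStatus: [stripped by cache-fix for prefix stability]");
}

// Hypothetical helper: choose the TTL for a request, or null for pass-through.
function pickTtl(system, ttlMain = "1h", ttlSubagent = "1h") {
  const isSubagent = system.some(
    (b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX)
  );
  const value = isSubagent ? ttlSubagent : ttlMain;
  if (value === "none") return null;
  // As in the diff, any value other than "5m" falls back to "1h".
  return value === "5m" ? "5m" : "1h";
}

// Hypothetical helper: add ttl only to ephemeral cache_control blocks that lack one.
function injectTtl(system, ttl) {
  if (ttl === null) return system;
  return system.map((block) =>
    block.cache_control?.type === "ephemeral" && !block.cache_control.ttl
      ? { ...block, cache_control: { ...block.cache_control, ttl } }
      : block
  );
}

// Made-up sample payload for illustration.
const system = [
  { type: "text", text: "You are Claude Code." },
  {
    type: "text",
    text: "# Environment\ngitStatus: This is the git status\nM src/app.js\n# Tone\nBe concise.",
    cache_control: { type: "ephemeral" },
  },
];

const stripped = system.map((b) => ({ ...b, text: stripGitStatus(b.text) }));
const result = injectTtl(stripped, pickTtl(stripped, "5m", "1h"));
console.log(result[1].text);
console.log(result[1].cache_control); // { type: 'ephemeral', ttl: '5m' }
```

In this sample the changed-file line disappears from the second block while the `# Tone` section that follows is preserved, and only the block that already carried an ephemeral `cache_control` receives a `ttl`.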
@@ -1271,6 +1322,60 @@ globalThis.fetch = async function (url, options) {
  monitorContextDegradation(payload.messages);
  }
 
+ // Diagnostic: dump cache breakpoint structure to a file when
+ // CACHE_FIX_DUMP_BREAKPOINTS=<path> is set. Maps where cache_control markers
+ // sit across system blocks and message content. Used to investigate #12
+ // (missing breakpoint #3 for skills/CLAUDE.md).
+ if (process.env.CACHE_FIX_DUMP_BREAKPOINTS && payload.system) {
+ try {
+ const dumpPath = process.env.CACHE_FIX_DUMP_BREAKPOINTS;
+ const breakpoints = [];
+ // System blocks
+ if (Array.isArray(payload.system)) {
+ payload.system.forEach((block, idx) => {
+ if (block.cache_control) {
+ breakpoints.push({
+ location: "system",
+ index: idx,
+ type: block.type,
+ cache_control: block.cache_control,
+ text_preview: (block.text || "").slice(0, 120),
+ text_chars: (block.text || "").length,
+ });
+ }
+ });
+ }
+ // Message blocks
+ if (payload.messages) {
+ payload.messages.forEach((msg, msgIdx) => {
+ if (!Array.isArray(msg.content)) return;
+ msg.content.forEach((block, blockIdx) => {
+ if (block.cache_control) {
+ breakpoints.push({
+ location: `messages[${msgIdx}].content`,
+ role: msg.role,
+ index: blockIdx,
+ type: block.type,
+ cache_control: block.cache_control,
+ text_preview: (block.text || "").slice(0, 120),
+ text_chars: (block.text || "").length,
+ });
+ }
+ });
+ });
+ }
+ const dump = {
+ timestamp: new Date().toISOString(),
+ breakpoint_count: breakpoints.length,
+ breakpoints,
+ system_block_count: Array.isArray(payload.system) ? payload.system.length : 0,
+ message_count: payload.messages ? payload.messages.length : 0,
+ };
+ writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
+ debugLog(`DUMP: ${breakpoints.length} cache breakpoints written to ${dumpPath}`);
+ } catch (e) { debugLog("BREAKPOINT DUMP ERROR:", e?.message); }
+ }
+
  // Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
  // when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
  // analysis and for understanding which tools contribute prefix bloat. First used
@@ -397,13 +397,24 @@ function calculateCosts(entries, ratesData) {
  continue;
  }
 
- // Determine cache write tier breakdown
- // If telemetry has eph_1h/eph_5m, use those; otherwise assume all cache_create is 5m
- let cw1h = entry.eph_1h;
- let cw5m = entry.eph_5m;
- if (cw1h === 0 && cw5m === 0 && entry.cache_create > 0) {
- // No tier breakdown available; assume 5m (conservative — lower rate)
- cw5m = entry.cache_create;
+ // Determine cache write tier for cache_creation tokens.
+ // eph_1h/eph_5m are READ tokens (cache hits per tier), not write tokens.
+ // But they tell us which tier the request was on — and cache creation on
+ // that request uses the same tier's write rate.
+ // Fix for #7: previously assigned all creation to 5m when eph fields were 0.
+ let cw1h = 0;
+ let cw5m = 0;
+ if (entry.cache_create > 0) {
+ if (entry.eph_1h > 0) {
+ // Request was on 1h tier — creation charged at 1h write rate
+ cw1h = entry.cache_create;
+ } else if (entry.eph_5m > 0) {
+ // Request was on 5m tier — creation charged at 5m write rate
+ cw5m = entry.cache_create;
+ } else {
+ // No tier signal available; assume 5m (conservative — lower rate)
+ cw5m = entry.cache_create;
+ }
  }
 
  const cost = (
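
The effect of the tier-assignment change in `calculateCosts` is easiest to see by running the removed and added branches side by side on one entry. The sketch below reconstructs both from the hunk above. The function names `splitOld` and `splitNew` and the sample entry are hypothetical, and the field names (`cache_create`, `eph_1h`, `eph_5m`) are taken from the diff.

```javascript
// Reconstructed from the removed lines: treats eph_1h/eph_5m as write tokens.
function splitOld(entry) {
  let cw1h = entry.eph_1h;
  let cw5m = entry.eph_5m;
  if (cw1h === 0 && cw5m === 0 && entry.cache_create > 0) {
    cw5m = entry.cache_create;
  }
  return { cw1h, cw5m };
}

// Reconstructed from the added lines: eph_* only signals which tier the
// request was on; the write tokens always come from cache_create.
function splitNew(entry) {
  let cw1h = 0;
  let cw5m = 0;
  if (entry.cache_create > 0) {
    if (entry.eph_1h > 0) {
      cw1h = entry.cache_create;
    } else {
      // Covers both the 5m tier and the "no tier signal" fallback.
      cw5m = entry.cache_create;
    }
  }
  return { cw1h, cw5m };
}

// Made-up entry: 2000 tokens written to cache, 500 read from the 1h tier.
const entry = { cache_create: 2000, eph_1h: 500, eph_5m: 0 };

console.log(splitOld(entry)); // { cw1h: 500, cw5m: 0 }
console.log(splitNew(entry)); // { cw1h: 2000, cw5m: 0 }
```

For this entry the old branch would have billed 500 tokens at the 1h write rate, while the new branch bills the full 2000 created tokens at that rate.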