claude-code-cache-fix 1.7.2 → 1.9.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # claude-code-cache-fix
2
2
 
3
- English | [中文](./README.zh.md)
3
+ English | [中文](./README.zh.md) | [Português](./docs/guia-pt-br.md)
4
4
 
5
5
  Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.97.
6
6
 
@@ -36,7 +36,10 @@ Create a wrapper script (e.g. `~/bin/claude-fixed`):
36
36
 
37
37
  ```bash
38
38
  #!/bin/bash
39
- CLAUDE_NPM_CLI="$HOME/.npm-global/lib/node_modules/@anthropic-ai/claude-code/cli.js"
39
+ NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
40
+
41
+ CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
42
+ CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
40
43
 
41
44
  if [ ! -f "$CLAUDE_NPM_CLI" ]; then
42
45
  echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
@@ -44,7 +47,13 @@ if [ ! -f "$CLAUDE_NPM_CLI" ]; then
44
47
  exit 1
45
48
  fi
46
49
 
47
- exec env NODE_OPTIONS="--import claude-code-cache-fix" node "$CLAUDE_NPM_CLI" "$@"
50
+ if [ ! -f "$CACHE_FIX" ]; then
51
+ echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
52
+ echo "Install with: npm install -g claude-code-cache-fix" >&2
53
+ exit 1
54
+ fi
55
+
56
+ exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
48
57
  ```
49
58
 
50
59
  ```bash
@@ -105,6 +114,67 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
105
114
 
106
115
  All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
107
116
 
117
+ ## Graduating from Fixes
118
+
119
+ The interceptor serves three purposes with different lifecycles:
120
+
121
+ | Purpose | Examples | When to disable |
122
+ |---------|----------|-----------------|
123
+ | **Bug fixes** | Block relocation, fingerprint, tool sort, TTL | When CC fixes the underlying bug — check the health line |
124
+ | **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
125
+ | **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
126
+
127
+ ### Health status
128
+
129
+ On the first API call, the interceptor logs a health status line (requires `CACHE_FIX_DEBUG=1`):
130
+
131
+ ```
132
+ cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
133
+ ```
134
+
135
+ Status meanings:
136
+ - **active(Xh ago)** — fix was applied recently
137
+ - **dormant(N clean sessions)** — bug not detected in N resume sessions; CC may have fixed it
138
+ - **safety-blocked(Nx)** — round-trip verification failed; CC changed its algorithm, fix auto-disabled
139
+ - **waiting** — fix hasn't been triggered yet
140
+
141
+ When a fix shows `dormant`, you can safely disable it:
142
+ ```bash
143
+ export CACHE_FIX_SKIP_RELOCATE=1 # example
144
+ ```
145
+
146
+ To disable all fixes but keep monitoring:
147
+ ```bash
148
+ export CACHE_FIX_DISABLED=1
149
+ ```
150
+
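How the per-fix switch and the master switch combine can be sketched as follows. This restates the `shouldApplyFix` helper from the `preload.mjs` changes in this diff, with one assumption made for testability: the environment is passed in as a plain object instead of being read from `process.env`.

```javascript
// Pure variant of shouldApplyFix. The `env` parameter is an assumption of this
// sketch; the version in the diff reads process.env directly.
function shouldApplyFix(fixName, env) {
  if (env.CACHE_FIX_DISABLED === "1") return false; // master switch wins
  return env[`CACHE_FIX_SKIP_${fixName.toUpperCase()}`] !== "1";
}

console.log(shouldApplyFix("tool_sort", {}));
console.log(shouldApplyFix("tool_sort", { CACHE_FIX_SKIP_TOOL_SORT: "1" }));
console.log(shouldApplyFix("ttl", { CACHE_FIX_SKIP_TOOL_SORT: "1" }));
console.log(shouldApplyFix("ttl", { CACHE_FIX_DISABLED: "1" }));
console.log(shouldApplyFix("ttl", { CACHE_FIX_DISABLED: "true" })); // only the exact string "1" counts
```

Both switches compare against the exact string `"1"`, so values such as `true` or `yes` leave the fix enabled.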
151
+ ### Regression detection
152
+
153
+ If the cache_read ratio stays below 50% on each of the last 5 tracked calls after disabling fixes, you'll see:
154
+ ```
155
+ REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls.
156
+ Fixes are disabled — consider re-enabling to recover cache performance.
157
+ ```
158
+
159
+ ## Safety
160
+
161
+ ### Fingerprint round-trip verification
162
+
163
+ Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
164
+ hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
165
+ If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
166
+ This ensures the interceptor can never make cache performance *worse* than stock CC.
167
+
168
+ ### Fail-safe design
169
+
170
+ Every fix is designed to fail to a no-op:
171
+ - If block detection regexes don't match → blocks aren't relocated (CC behavior)
172
+ - If fingerprint format changes → fingerprint isn't rewritten (CC behavior)
173
+ - If tool sort produces no changes → payload passes through untouched
174
+ - If TTL injection target structure changes → TTL isn't injected (CC behavior)
175
+
176
+ The interceptor can only *help* or *do nothing*. It cannot make things worse.
177
+
108
178
  ## Status line — quota warnings in real time
109
179
 
110
180
  The interceptor writes quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script reads this file and displays a live status line in Claude Code showing:
@@ -137,7 +207,23 @@ Add to `~/.claude/settings.json`:
137
207
  }
138
208
  ```
139
209
 
140
- ### Why this matters
210
+ ### Recommended: disable git-status injection
211
+
212
+ Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
213
+
214
+ ```bash
215
+ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
216
+ ```
217
+
218
+ Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
219
+
220
+ The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
221
+
222
+ Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
223
+
224
+ **Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
225
+
226
+ ### Why the status line matters
141
227
 
142
228
  When the server downgrades your TTL to 5m (Layer 2: quota-aware downgrade at Q5h ≥ 100%), **every idle period longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible; you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
143
229
 
@@ -341,6 +427,17 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
341
427
  | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
342
428
  | `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
343
429
  | `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
430
+ | `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes (and git-status stripping); keep monitoring, image stripping, and the output-efficiency rewrite active |
431
+ | `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
432
+ | `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
433
+ | `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
434
+ | `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
435
+ | `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
436
+ | `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
437
+ | `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
438
+ | `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
439
+ | `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
440
+ | `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
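For `CACHE_FIX_STRIP_GIT_STATUS`, the stripping step can be sketched as below. The regular expression and placeholder string are copied from the `preload.mjs` changes in this diff; the `stripGitStatus` helper name and the sample system block are made up for illustration, and real Claude Code prompts may be laid out differently.

```javascript
// Pattern copied from the preload.mjs hunk: match from "gitStatus:" up to (not
// including) the next section-like line, or the end of the text.
const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
const PLACEHOLDER = "gitStatus: [stripped by cache-fix for prefix stability]";

// Hypothetical helper: returns the same block object when nothing changes, so a
// caller can detect a no-op with a reference comparison.
function stripGitStatus(block) {
  if (block?.type !== "text" || typeof block.text !== "string") return block;
  const newText = block.text.replace(gitStatusPattern, PLACEHOLDER);
  return newText === block.text ? block : { ...block, text: newText };
}

// Made-up system block for illustration.
const before = {
  type: "text",
  text: "# Environment\ngitStatus: This is the git status.\nCurrent branch: main\nM src/app.js\n# Tone\nBe concise.",
};
const after = stripGitStatus(before);
const again = stripGitStatus(after);

console.log(after.text);
console.log(again === after); // a second pass changes nothing
```

One caveat that follows from the pattern itself: the match stops at the first line beginning with `When ` or `Answer `, so status output containing such a line (a commit subject, for example) would presumably be stripped only partially.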
344
441
 
345
442
  ## Limitations
346
443
 
@@ -424,6 +521,7 @@ measurable signature of cache-efficiency degradation.
424
521
  - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
425
522
  - **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
426
523
  - **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
524
+ - **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
427
525
 
428
526
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
429
527
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "1.7.2",
3
+ "version": "1.9.0",
4
4
  "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
5
5
  "type": "module",
6
6
  "exports": "./preload.mjs",
package/preload.mjs CHANGED
@@ -83,6 +83,25 @@ function extractRealUserMessageText(messages) {
83
83
  return "";
84
84
  }
85
85
 
86
+ /**
87
+ * Extract text from messages[0] the way CC's original fingerprint code does —
88
+ * including meta/attachment blocks. Used only for round-trip verification.
89
+ */
90
+ function extractFirstMessageText(messages) {
91
+ if (!Array.isArray(messages) || messages.length === 0) return "";
92
+ const first = messages[0];
93
+ if (!first || first.role !== "user") return "";
94
+ const content = first.content;
95
+ if (typeof content === "string") return content;
96
+ if (!Array.isArray(content)) return "";
97
+ for (const block of content) {
98
+ if (block.type === "text" && typeof block.text === "string") {
99
+ return block.text;
100
+ }
101
+ }
102
+ return "";
103
+ }
104
+
86
105
  /**
87
106
  * Extract current cc_version from system prompt blocks and recompute with
88
107
  * stable fingerprint. Returns { oldVersion, newVersion, stableFingerprint }.
@@ -107,6 +126,23 @@ function stabilizeFingerprint(system, messages) {
107
126
  const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
108
127
  const oldFingerprint = dotParts[3]; // "a3f"
109
128
 
129
+ // --- SAFETY: Round-trip verification ---
130
+ // Verify our salt/indices reproduce CC's fingerprint for the ORIGINAL
131
+ // message text (messages[0] content, which is what CC used).
132
+ // If our computation doesn't match, our constants are stale — skip rewrite.
133
+ const originalText = extractFirstMessageText(messages);
134
+ const verification = computeFingerprint(originalText, baseVersion);
135
+ if (verification !== oldFingerprint) {
136
+ debugLog(
137
+ "FINGERPRINT SAFETY: round-trip verification failed.",
138
+ `CC sent '${oldFingerprint}', we computed '${verification}'.`,
139
+ "Salt/indices may have changed in this CC version. Skipping rewrite."
140
+ );
141
+ recordFixResult("fingerprint", "safety_blocked");
142
+ return null;
143
+ }
144
+ // --- END SAFETY ---
145
+
110
146
  // Compute stable fingerprint from real user text
111
147
  const realText = extractRealUserMessageText(messages);
112
148
  const stableFingerprint = computeFingerprint(realText, baseVersion);
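The round-trip check in this hunk can be sketched in isolation. `extractFirstMessageText` is restated from the earlier hunk; `toyFingerprint` is a hypothetical stand-in, because the real `computeFingerprint` (its salt, indices, and hash) is not part of this diff, and `canRewrite` is an illustrative name.

```javascript
// Restated from the diff: text of the first text block in messages[0].
function extractFirstMessageText(messages) {
  if (!Array.isArray(messages) || messages.length === 0) return "";
  const first = messages[0];
  if (!first || first.role !== "user") return "";
  const content = first.content;
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  for (const block of content) {
    if (block.type === "text" && typeof block.text === "string") return block.text;
  }
  return "";
}

// HYPOTHETICAL stand-in for computeFingerprint. It only needs to be
// deterministic for this sketch; it is not Claude Code's algorithm.
function toyFingerprint(text, version) {
  let h = 0;
  for (const ch of `${version}:${text}`) h = (h * 31 + ch.codePointAt(0)) % 4096;
  return h.toString(16).padStart(3, "0");
}

// Round-trip guard: allow a rewrite only if we can reproduce what the client sent.
function canRewrite(messages, baseVersion, oldFingerprint, compute) {
  return compute(extractFirstMessageText(messages), baseVersion) === oldFingerprint;
}

const messages = [{ role: "user", content: [{ type: "text", text: "hello" }] }];
const sent = toyFingerprint("hello", "2.1.87");

console.log(canRewrite(messages, "2.1.87", sent, toyFingerprint)); // constants still match
console.log(canRewrite(messages, "2.1.87", "zzz", toyFingerprint)); // mismatch, so skip the rewrite
```

Passing the fingerprint function as a parameter is a choice made for this sketch so the guard can be exercised without the real constants.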
@@ -588,13 +624,16 @@ function replaceOutputEfficiencySection(text) {
588
624
  // Set CACHE_FIX_DEBUG=1 to enable
589
625
  // --------------------------------------------------------------------------
590
626
 
591
- import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
627
+ import { appendFileSync, readFileSync, writeFileSync, mkdirSync, renameSync } from "node:fs";
592
628
  import { homedir } from "node:os";
593
629
  import { join } from "node:path";
594
630
 
595
631
  const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
596
632
  const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
597
633
  const NORMALIZE_IDENTITY = process.env.CACHE_FIX_NORMALIZE_IDENTITY === "1";
634
+ const STRIP_GIT_STATUS = process.env.CACHE_FIX_STRIP_GIT_STATUS === "1";
635
+ const TTL_MAIN = (process.env.CACHE_FIX_TTL_MAIN || "1h").toLowerCase();
636
+ const TTL_SUBAGENT = (process.env.CACHE_FIX_TTL_SUBAGENT || "1h").toLowerCase();
598
637
  const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
599
638
  const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
600
639
  const USAGE_JSONL = process.env.CACHE_FIX_USAGE_LOG || join(homedir(), ".claude", "usage.jsonl");
@@ -605,6 +644,104 @@ function debugLog(...args) {
605
644
  try { appendFileSync(LOG_PATH, line); } catch {}
606
645
  }
607
646
 
647
+ // --------------------------------------------------------------------------
648
+ // Kill switches — disable fixes while keeping monitoring active
649
+ // --------------------------------------------------------------------------
650
+
651
+ const FIXES_DISABLED = process.env.CACHE_FIX_DISABLED === "1";
652
+
653
+ /**
654
+ * Check if a specific fix should be applied.
655
+ * Returns false if master kill switch is on OR individual fix is skipped.
656
+ * Monitoring and the image-strip and output-efficiency optimizations are NOT
657
+ * affected by CACHE_FIX_DISABLED. Bug fixes and git-status stripping are.
658
+ */
659
+ function shouldApplyFix(fixName) {
660
+ if (FIXES_DISABLED) return false;
661
+ const skipKey = `CACHE_FIX_SKIP_${fixName.toUpperCase()}`;
662
+ if (process.env[skipKey] === "1") return false;
663
+ return true;
664
+ }
665
+
666
+ // --------------------------------------------------------------------------
667
+ // Persistent effectiveness stats
668
+ // --------------------------------------------------------------------------
669
+
670
+ const STATS_PATH = join(homedir(), ".claude", "cache-fix-stats.json");
671
+
672
+ const _STATS_SCHEMA = {
673
+ relocate: { applied: 0, skipped: 0, bugPresent: 0, resumeScanned: 0, lastApplied: null, lastScanned: null },
674
+ fingerprint: { applied: 0, skipped: 0, safetyBlocked: 0, lastApplied: null },
675
+ tool_sort: { applied: 0, skipped: 0, lastApplied: null },
676
+ ttl: { applied: 0, skipped: 0, lastApplied: null },
677
+ identity: { applied: 0, skipped: 0, lastApplied: null },
678
+ git_status: { applied: 0, skipped: 0, lastApplied: null },
679
+ };
680
+
681
+ function _createEmptyStats() {
682
+ return {
683
+ version: 1,
684
+ created: new Date().toISOString(),
685
+ lastUpdated: null,
686
+ fixes: JSON.parse(JSON.stringify(_STATS_SCHEMA)),
687
+ };
688
+ }
689
+
690
+ /** Read stats from disk. Returns empty stats on any error. */
691
+ function readStats() {
692
+ try {
693
+ const data = JSON.parse(readFileSync(STATS_PATH, "utf8"));
694
+ if (data.created) {
695
+ const ageDays = (Date.now() - new Date(data.created).getTime()) / (1000 * 60 * 60 * 24);
696
+ if (ageDays > 30) return _createEmptyStats();
697
+ }
698
+ for (const [key, schema] of Object.entries(_STATS_SCHEMA)) {
699
+ if (!data.fixes[key]) data.fixes[key] = { ...schema };
700
+ }
701
+ return data;
702
+ } catch {
703
+ return _createEmptyStats();
704
+ }
705
+ }
706
+
707
+ /** Atomic write: temp file + rename to avoid corruption. */
708
+ function writeStats(stats) {
709
+ try {
710
+ stats.lastUpdated = new Date().toISOString();
711
+ const tmp = STATS_PATH + ".tmp";
712
+ writeFileSync(tmp, JSON.stringify(stats, null, 2));
713
+ renameSync(tmp, STATS_PATH);
714
+ } catch (e) {
715
+ debugLog("STATS WRITE ERROR:", e?.message);
716
+ }
717
+ }
718
+
719
+ function recordFixResult(fixName, result) {
720
+ const stats = readStats();
721
+ if (!stats.fixes[fixName]) return;
722
+ const now = new Date().toISOString();
723
+ stats.lastUpdated = now;
724
+ if (result === "applied") {
725
+ stats.fixes[fixName].applied++;
726
+ stats.fixes[fixName].lastApplied = now;
727
+ } else if (result === "skipped") {
728
+ stats.fixes[fixName].skipped++;
729
+ } else if (result === "safety_blocked") {
730
+ stats.fixes[fixName].safetyBlocked = (stats.fixes[fixName].safetyBlocked || 0) + 1;
731
+ }
732
+ writeStats(stats);
733
+ }
734
+
735
+ function recordRelocateScan(bugFound) {
736
+ const stats = readStats();
737
+ const now = new Date().toISOString();
738
+ stats.lastUpdated = now;
739
+ stats.fixes.relocate.resumeScanned++;
740
+ stats.fixes.relocate.lastScanned = now;
741
+ if (bugFound) stats.fixes.relocate.bugPresent++;
742
+ writeStats(stats);
743
+ }
744
+
608
745
  // --------------------------------------------------------------------------
609
746
  // Prefix snapshot — captures message prefix for cross-process diff.
610
747
  // Set CACHE_FIX_PREFIXDIFF=1 to enable.
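The post-processing that `readStats` applies in the hunk above (a 30-day reset plus backfilling fixes missing from an older stats file) can be sketched as a pure function. The trimmed `SCHEMA`, the `normalizeStats` name, and the injected `now` are assumptions of this sketch; the real code reads from disk and falls back to empty stats on any error.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Trimmed-down schema for illustration; the one in the diff lists six fixes.
const SCHEMA = {
  relocate: { applied: 0, skipped: 0 },
  ttl: { applied: 0, skipped: 0 },
};

function emptyStats(now) {
  return {
    version: 1,
    created: new Date(now).toISOString(),
    lastUpdated: null,
    fixes: JSON.parse(JSON.stringify(SCHEMA)), // deep copy so counters are not shared
  };
}

// Expire stats older than 30 days; otherwise add entries for fixes that the
// stored file does not know about yet.
function normalizeStats(data, now) {
  if (data.created && (now - new Date(data.created).getTime()) / DAY_MS > 30) {
    return emptyStats(now);
  }
  for (const [key, defaults] of Object.entries(SCHEMA)) {
    if (!data.fixes[key]) data.fixes[key] = { ...defaults };
  }
  return data;
}

// Made-up stats files for illustration.
const now = Date.parse("2025-03-01T00:00:00Z");
const recent = normalizeStats({ created: "2025-02-20T00:00:00Z", fixes: { relocate: { applied: 4, skipped: 1 } } }, now);
const stale = normalizeStats({ created: "2025-01-01T00:00:00Z", fixes: { relocate: { applied: 4, skipped: 1 } } }, now);

console.log(recent.fixes.relocate.applied, recent.fixes.ttl.applied);
console.log(stale.fixes.relocate.applied);
```

The reset means a `dormant` status is always based on at most the last 30 days of counters.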
@@ -656,6 +793,59 @@ function dumpGrowthBookFlags() {
656
793
  }
657
794
  }
658
795
 
796
+ // --------------------------------------------------------------------------
797
+ // Startup health status line
798
+ // --------------------------------------------------------------------------
799
+
800
+ let _healthLinePrinted = false;
801
+
802
+ function _formatTimeSince(isoString) {
803
+ if (!isoString) return "never";
804
+ const ms = Date.now() - new Date(isoString).getTime();
805
+ const hours = Math.floor(ms / (1000 * 60 * 60));
806
+ const days = Math.floor(hours / 24);
807
+ if (days > 0) return `${days}d ago`;
808
+ if (hours > 0) return `${hours}h ago`;
809
+ const mins = Math.floor(ms / (1000 * 60));
810
+ return `${mins}m ago`;
811
+ }
812
+
813
+ function _formatFixStatus(fixName, fixStats, dormantThreshold = 5) {
814
+ if (fixName === "relocate") {
815
+ if (fixStats.resumeScanned >= dormantThreshold && fixStats.bugPresent === 0) {
816
+ return `dormant(${fixStats.resumeScanned} clean sessions)`;
817
+ }
818
+ } else {
819
+ if (fixStats.skipped >= dormantThreshold && fixStats.applied === 0) {
820
+ return `dormant(${fixStats.skipped} skips)`;
821
+ }
822
+ }
823
+ if (fixStats.safetyBlocked > 0) return `safety-blocked(${fixStats.safetyBlocked}x)`;
824
+ if (fixStats.lastApplied) return `active(${_formatTimeSince(fixStats.lastApplied)})`;
825
+ return "waiting";
826
+ }
827
+
828
+ function printHealthLine() {
829
+ if (_healthLinePrinted) return;
830
+ _healthLinePrinted = true;
831
+ const stats = readStats();
832
+ const parts = [];
833
+ for (const [name, fixStats] of Object.entries(stats.fixes)) {
834
+ const status = _formatFixStatus(name, fixStats);
835
+ parts.push(`${name}=${status}`);
836
+ if (status.startsWith("dormant")) {
837
+ debugLog(`DORMANT: ${name} — CC may have fixed this. Consider CACHE_FIX_SKIP_${name.toUpperCase()}=1`);
838
+ }
839
+ if (status.startsWith("safety-blocked")) {
840
+ debugLog(`SAFETY: ${name} — salt/indices may have changed. Fix is auto-disabled.`);
841
+ }
842
+ }
843
+ debugLog(`HEALTH: ${parts.join(" ")}`);
844
+ if (FIXES_DISABLED) {
845
+ debugLog("HEALTH: all fixes disabled via CACHE_FIX_DISABLED=1 (monitoring active)");
846
+ }
847
+ }
848
+
659
849
  // --------------------------------------------------------------------------
660
850
  // Microcompact / budget monitoring
661
851
  // --------------------------------------------------------------------------
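The order of the checks in `_formatFixStatus` determines which status wins. The sketch below restates both formatting helpers from the hunk above, with the leading underscores dropped and `now` made injectable so the output is reproducible (both are changes made for this sketch); the stats values are made up.

```javascript
// Restated from the diff, with `now` injectable for reproducible output.
function formatTimeSince(isoString, now = Date.now()) {
  if (!isoString) return "never";
  const ms = now - new Date(isoString).getTime();
  const hours = Math.floor(ms / (1000 * 60 * 60));
  const days = Math.floor(hours / 24);
  if (days > 0) return `${days}d ago`;
  if (hours > 0) return `${hours}h ago`;
  return `${Math.floor(ms / (1000 * 60))}m ago`;
}

function formatFixStatus(fixName, s, now, dormantThreshold = 5) {
  if (fixName === "relocate") {
    if (s.resumeScanned >= dormantThreshold && s.bugPresent === 0) {
      return `dormant(${s.resumeScanned} clean sessions)`;
    }
  } else if (s.skipped >= dormantThreshold && s.applied === 0) {
    return `dormant(${s.skipped} skips)`;
  }
  if (s.safetyBlocked > 0) return `safety-blocked(${s.safetyBlocked}x)`;
  if (s.lastApplied) return `active(${formatTimeSince(s.lastApplied, now)})`;
  return "waiting";
}

// Made-up stats for illustration only.
const now = Date.parse("2025-01-01T12:00:00Z");
const fixes = {
  relocate: { applied: 3, skipped: 9, bugPresent: 2, resumeScanned: 6, lastApplied: "2025-01-01T10:00:00Z" },
  fingerprint: { applied: 0, skipped: 5, safetyBlocked: 1, lastApplied: null },
  ttl: { applied: 0, skipped: 0, lastApplied: null },
};
const line = Object.entries(fixes)
  .map(([name, s]) => `${name}=${formatFixStatus(name, s, now)}`)
  .join(" ");
console.log(line);
```

Two details follow from the code as written: fixes other than `relocate` report `dormant(N skips)` where `relocate` reports `dormant(N clean sessions)`, and the dormant check runs before the safety-blocked check, so a fix with five or more skips and no applications reports `dormant` even when `safetyBlocked` is non-zero.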
@@ -801,6 +991,50 @@ function snapshotPrefix(payload) {
801
991
  }
802
992
  }
803
993
 
994
+ // --------------------------------------------------------------------------
995
+ // Cache regression detector
996
+ // --------------------------------------------------------------------------
997
+
998
+ const _cacheHistory = []; // in-memory ring buffer of { ratio, turn }
999
+ const REGRESSION_MIN_CALLS = 5;
1000
+ const REGRESSION_MIN_RATIO = 0.5;
1001
+ let _apiCallCount = 0;
1002
+
1003
+ function _computeCacheRatio(usage) {
1004
+ if (!usage) return null;
1005
+ const read = usage.cache_read_input_tokens || 0;
1006
+ const creation = usage.cache_creation_input_tokens || 0;
1007
+ const input = usage.input_tokens || 0;
1008
+ const total = read + creation + input;
1009
+ if (total === 0) return null;
1010
+ return read / total;
1011
+ }
1012
+
1013
+ function _checkCacheRegression() {
1014
+ if (_cacheHistory.length < REGRESSION_MIN_CALLS) return;
1015
+ const recent = _cacheHistory.slice(-REGRESSION_MIN_CALLS);
1016
+ const allLow = recent.every((h) => h.ratio < REGRESSION_MIN_RATIO);
1017
+ if (allLow) {
1018
+ const avgRatio = recent.reduce((sum, h) => sum + h.ratio, 0) / recent.length;
1019
+ debugLog(
1020
+ `REGRESSION WARNING: cache_read ratio averaged ${Math.round(avgRatio * 100)}%`,
1021
+ `across last ${REGRESSION_MIN_CALLS} calls (threshold: ${REGRESSION_MIN_RATIO * 100}%).`,
1022
+ FIXES_DISABLED
1023
+ ? "Fixes are disabled — consider re-enabling to recover cache performance."
1024
+ : "Fixes are active but cache is still degraded — CC may have introduced a new bug."
1025
+ );
1026
+ }
1027
+ }
1028
+
1029
+ function _trackCacheRatio(usage) {
1030
+ if (_apiCallCount <= 1) return; // skip first call (cache creation, no reads)
1031
+ const ratio = _computeCacheRatio(usage);
1032
+ if (ratio === null) return;
1033
+ _cacheHistory.push({ ratio, turn: _apiCallCount });
1034
+ if (_cacheHistory.length > 20) _cacheHistory.shift(); // ring buffer
1035
+ _checkCacheRegression();
1036
+ }
1037
+
804
1038
  // --------------------------------------------------------------------------
805
1039
  // Fetch interceptor
806
1040
  // --------------------------------------------------------------------------
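The regression check above reduces to two small computations, sketched here without the logging or the module-level history buffer. The constants and the ratio formula are restated from the hunk; `isRegressed` is an illustrative name and the usage numbers are made up.

```javascript
const REGRESSION_MIN_CALLS = 5;
const REGRESSION_MIN_RATIO = 0.5;

// Share of prompt tokens served from cache; null when usage is missing or empty.
function computeCacheRatio(usage) {
  if (!usage) return null;
  const read = usage.cache_read_input_tokens || 0;
  const creation = usage.cache_creation_input_tokens || 0;
  const input = usage.input_tokens || 0;
  const total = read + creation + input;
  return total === 0 ? null : read / total;
}

// True only when each of the last REGRESSION_MIN_CALLS ratios is below the threshold.
function isRegressed(ratios) {
  if (ratios.length < REGRESSION_MIN_CALLS) return false;
  return ratios.slice(-REGRESSION_MIN_CALLS).every((r) => r < REGRESSION_MIN_RATIO);
}

// Made-up usage numbers for illustration.
const healthy = computeCacheRatio({ cache_read_input_tokens: 9000, cache_creation_input_tokens: 500, input_tokens: 500 });
const degraded = computeCacheRatio({ cache_read_input_tokens: 1200, cache_creation_input_tokens: 8300, input_tokens: 500 });

console.log(healthy, degraded);
console.log(isRegressed([degraded, degraded, degraded, degraded, degraded]));
console.log(isRegressed([degraded, degraded, healthy, degraded, degraded]));
```

Because every one of the last five ratios must be low, a single well-cached call inside the window suppresses the warning, even if the average over those calls is still under 50%.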
@@ -817,11 +1051,17 @@ globalThis.fetch = async function (url, options) {
817
1051
 
818
1052
  if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
819
1053
  try {
1054
+ _apiCallCount++;
820
1055
  const payload = JSON.parse(options.body);
821
1056
  let modified = false;
822
1057
 
823
1058
  // One-time GrowthBook flag dump on first API call
824
1059
  dumpGrowthBookFlags();
1060
+ printHealthLine();
1061
+
1062
+ if (FIXES_DISABLED) {
1063
+ debugLog("CACHE_FIX_DISABLED=1 — all bug fixes bypassed, monitoring active");
1064
+ }
825
1065
 
826
1066
  debugLog("--- API call to", urlStr);
827
1067
  debugLog("message count:", payload.messages?.length);
@@ -832,7 +1072,7 @@ globalThis.fetch = async function (url, options) {
832
1072
  }
833
1073
 
834
1074
  // Bug 1: Relocate resume attachment blocks
835
- if (payload.messages) {
1075
+ if (payload.messages && shouldApplyFix("relocate")) {
836
1076
  // Log message structure for debugging
837
1077
  if (DEBUG) {
838
1078
  let firstUserIdx = -1, lastUserIdx = -1;
@@ -868,13 +1108,21 @@ globalThis.fetch = async function (url, options) {
868
1108
  }
869
1109
 
870
1110
  const normalized = normalizeResumeMessages(payload.messages);
1111
+ // Track bug presence for dormancy detection (heuristic: more than 5 messages counts as a resume)
1112
+ const isResume = payload.messages.length > 5;
1113
+ if (isResume) recordRelocateScan(normalized !== payload.messages);
1114
+
871
1115
  if (normalized !== payload.messages) {
872
1116
  payload.messages = normalized;
873
1117
  modified = true;
874
1118
  debugLog("APPLIED: resume message relocation");
1119
+ recordFixResult("relocate", "applied");
875
1120
  } else {
876
1121
  debugLog("SKIPPED: resume relocation (not a resume or already correct)");
1122
+ recordFixResult("relocate", "skipped");
877
1123
  }
1124
+ } else if (payload.messages && !shouldApplyFix("relocate")) {
1125
+ debugLog("SKIPPED: relocate fix disabled via env var");
878
1126
  }
879
1127
 
880
1128
  // Image stripping: remove old tool_result images to reduce token waste
@@ -895,7 +1143,7 @@ globalThis.fetch = async function (url, options) {
895
1143
  }
896
1144
 
897
1145
  // Bug 2a: Stabilize tool ordering
898
- if (payload.tools) {
1146
+ if (payload.tools && shouldApplyFix("tool_sort")) {
899
1147
  const sorted = stabilizeToolOrder(payload.tools);
900
1148
  const changed = sorted.some(
901
1149
  (t, i) => t.name !== payload.tools[i]?.name
@@ -904,11 +1152,16 @@ globalThis.fetch = async function (url, options) {
904
1152
  payload.tools = sorted;
905
1153
  modified = true;
906
1154
  debugLog("APPLIED: tool order stabilization");
1155
+ recordFixResult("tool_sort", "applied");
1156
+ } else {
1157
+ recordFixResult("tool_sort", "skipped");
907
1158
  }
1159
+ } else if (payload.tools && !shouldApplyFix("tool_sort")) {
1160
+ debugLog("SKIPPED: tool sort fix disabled via env var");
908
1161
  }
909
1162
 
910
1163
  // Bug 2b: Stabilize fingerprint in attribution header
911
- if (payload.system && payload.messages) {
1164
+ if (payload.system && payload.messages && shouldApplyFix("fingerprint")) {
912
1165
  const fix = stabilizeFingerprint(payload.system, payload.messages);
913
1166
  if (fix) {
914
1167
  payload.system = [...payload.system];
@@ -918,7 +1171,12 @@ globalThis.fetch = async function (url, options) {
918
1171
  };
919
1172
  modified = true;
920
1173
  debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
1174
+ recordFixResult("fingerprint", "applied");
1175
+ } else {
1176
+ recordFixResult("fingerprint", "skipped");
921
1177
  }
1178
+ } else if (payload.system && payload.messages && !shouldApplyFix("fingerprint")) {
1179
+ debugLog("SKIPPED: fingerprint fix disabled via env var");
922
1180
  }
923
1181
 
924
1182
  // Bug 6: Identity string normalization for Agent()/SendMessage() cache parity
@@ -931,7 +1189,7 @@ globalThis.fetch = async function (url, options) {
931
1189
  // turn even though system[2] (the actual instructions) is byte-identical.
932
1190
  // Confirmed by @labzink via mitmproxy on #44724.
933
1191
  // Opt-in because it's a model-perceivable behavior change (subagent thinks it's CC).
934
- if (NORMALIZE_IDENTITY && payload.system && Array.isArray(payload.system)) {
1192
+ if (NORMALIZE_IDENTITY && shouldApplyFix("identity") && payload.system && Array.isArray(payload.system)) {
935
1193
  const CANONICAL = "You are Claude Code, Anthropic's official CLI for Claude.";
936
1194
  const AGENT_SDK = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
937
1195
  let normalized = 0;
@@ -949,6 +1207,9 @@ globalThis.fetch = async function (url, options) {
949
1207
  if (normalized > 0) {
950
1208
  modified = true;
951
1209
  debugLog(`APPLIED: identity normalized on ${normalized} system block(s) (Agent SDK → Claude Code)`);
1210
+ recordFixResult("identity", "applied");
1211
+ } else {
1212
+ recordFixResult("identity", "skipped");
952
1213
  }
953
1214
  }
954
1215
 
@@ -964,39 +1225,91 @@ globalThis.fetch = async function (url, options) {
964
1225
  }
965
1226
  }
966
1227
 
967
- // Bug 5: 1h TTL enforcement
1228
+ // Optimization: strip volatile git-status from system prompt
1229
+ // CC injects live git-status output (branch, changed files, recent commits)
1230
+ // into a system text block. This changes on every file edit, busting the
1231
+ // entire prefix cache. Opt-in via CACHE_FIX_STRIP_GIT_STATUS=1.
1232
+ // The model can still run `git status` via Bash tool when it needs context.
1233
+ if (STRIP_GIT_STATUS && shouldApplyFix("git_status") && payload.system && Array.isArray(payload.system)) {
1234
+ let stripped = 0;
1235
+ payload.system = payload.system.map((block) => {
1236
+ if (block?.type !== "text" || typeof block.text !== "string") return block;
1237
+ // Match the gitStatus section CC injects. Pattern:
1238
+ // "gitStatus: This is the git status..."
1239
+ // followed by branch, status, commits until the next section or end
1240
+ const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
1241
+ if (!gitStatusPattern.test(block.text)) return block;
1242
+ const newText = block.text.replace(gitStatusPattern, "gitStatus: [stripped by cache-fix for prefix stability]");
1243
+ if (newText !== block.text) {
1244
+ stripped++;
1245
+ return { ...block, text: newText };
1246
+ }
1247
+ return block;
1248
+ });
1249
+ if (stripped > 0) {
1250
+ modified = true;
1251
+ debugLog(`APPLIED: git-status stripped from ${stripped} system block(s)`);
1252
+ recordFixResult("git_status", "applied");
1253
+ } else {
1254
+ recordFixResult("git_status", "skipped");
1255
+ }
1256
+ }
1257
+
1258
+ // Bug 5: TTL enforcement (configurable per request type)
968
1259
  // The client gates 1h cache TTL behind a GrowthBook allowlist that checks
969
1260
  // querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
970
1261
  // Interactive CLI sessions may not match any pattern, causing the client to
971
1262
  // send cache_control without ttl (defaulting to 5m server-side).
972
1263
  // The server honors whatever TTL the client requests — so we inject it.
973
1264
  // Discovered by @TigerKay1926 on #42052 using our GrowthBook flag dump.
974
- if (payload.system) {
975
- let ttlInjected = 0;
976
- payload.system = payload.system.map((block) => {
977
- if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
978
- ttlInjected++;
979
- return { ...block, cache_control: { ...block.cache_control, ttl: "1h" } };
980
- }
981
- return block;
982
- });
983
- // Also check messages for cache_control blocks (conversation history breakpoints)
984
- if (payload.messages) {
985
- for (const msg of payload.messages) {
986
- if (!Array.isArray(msg.content)) continue;
987
- for (let i = 0; i < msg.content.length; i++) {
988
- const b = msg.content[i];
989
- if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
990
- msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: "1h" } };
991
- ttlInjected++;
1265
+ //
1266
+ // v1.9.0: configurable per request type via CACHE_FIX_TTL_MAIN and
1267
+ // CACHE_FIX_TTL_SUBAGENT. Values: "1h" (default), "5m", "none".
1268
+ // "none" = don't inject TTL, pass through caller's original cache_control.
1269
+ if (payload.system && shouldApplyFix("ttl")) {
1270
+ // Detect subagent: any system text block that starts with the Agent SDK identity
1271
+ const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
1272
+ const isSubagent = Array.isArray(payload.system) &&
1273
+ payload.system.some((b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX));
1274
+ const ttlValue = isSubagent ? TTL_SUBAGENT : TTL_MAIN;
1275
+ const requestType = isSubagent ? "subagent" : "main";
1276
+
1277
+ if (ttlValue === "none") {
1278
+ debugLog(`SKIPPED: TTL injection (${requestType} set to 'none' — pass-through)`);
1279
+ recordFixResult("ttl", "skipped");
1280
+ } else {
1281
+ const ttlParam = ttlValue === "5m" ? "5m" : "1h"; // any unrecognized value falls back to "1h"
1282
+ let ttlInjected = 0;
1283
+ payload.system = payload.system.map((block) => {
1284
+ if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
1285
+ ttlInjected++;
1286
+ return { ...block, cache_control: { ...block.cache_control, ttl: ttlParam } };
1287
+ }
1288
+ return block;
1289
+ });
1290
+ // Also check messages for cache_control blocks (conversation history breakpoints)
1291
+ if (payload.messages) {
1292
+ for (const msg of payload.messages) {
1293
+ if (!Array.isArray(msg.content)) continue;
1294
+ for (let i = 0; i < msg.content.length; i++) {
1295
+ const b = msg.content[i];
1296
+ if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
1297
+ msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: ttlParam } };
1298
+ ttlInjected++;
1299
+ }
992
1300
  }
993
1301
  }
994
1302
  }
1303
+ if (ttlInjected > 0) {
1304
+ modified = true;
1305
+ debugLog(`APPLIED: ${ttlParam} TTL injected on ${ttlInjected} cache_control block(s) (${requestType})`);
1306
+ recordFixResult("ttl", "applied");
1307
+ } else {
1308
+ recordFixResult("ttl", "skipped");
1309
+ }
995
1310
  }
996
- if (ttlInjected > 0) {
997
- modified = true;
998
- debugLog(`APPLIED: 1h TTL injected on ${ttlInjected} cache_control block(s)`);
999
- }
1311
+ } else if (payload.system && !shouldApplyFix("ttl")) {
1312
+ debugLog("SKIPPED: TTL injection disabled via env var");
1000
1313
  }
1001
1314
 
1002
1315
  if (modified) {
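The per-request-type TTL rule in this hunk can be restated as a small pure function. The sketch below is a simplified illustration, not the package's code: `resolveTtl`, `addTtl`, and `injectTtl` are hypothetical names, and reading `CACHE_FIX_TTL_MAIN` / `CACHE_FIX_TTL_SUBAGENT` from a plain `env` object is an assumption about how `TTL_MAIN` and `TTL_SUBAGENT` are populated. Unlike the hunk, it returns a copy instead of mutating the payload.

```javascript
// Prefix string copied from the diff; every function name below is hypothetical.
const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";

// Assumption: settings arrive as strings in an env-like object and default to "1h".
function resolveTtl(env, isSubagent) {
  const raw = isSubagent ? env.CACHE_FIX_TTL_SUBAGENT : env.CACHE_FIX_TTL_MAIN;
  if (raw === "none") return null;    // pass-through: leave cache_control untouched
  return raw === "5m" ? "5m" : "1h";  // any other value falls back to "1h"
}

// Returns a copy with ttl set, or the same object when nothing needs changing.
function addTtl(block, ttl) {
  const cc = block && block.cache_control;
  if (cc && cc.type === "ephemeral" && !cc.ttl) {
    return { ...block, cache_control: { ...cc, ttl } };
  }
  return block;
}

function injectTtl(payload, env = {}) {
  const isSubagent = Array.isArray(payload.system) && payload.system.some(
    (b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX)
  );
  const requestType = isSubagent ? "subagent" : "main";
  const ttl = resolveTtl(env, isSubagent);
  if (ttl === null) return { payload, injected: 0, requestType };

  let injected = 0;
  const apply = (block) => {
    const next = addTtl(block, ttl);
    if (next !== block) injected++;   // count only blocks that actually changed
    return next;
  };
  const out = { ...payload };
  if (Array.isArray(payload.system)) out.system = payload.system.map(apply);
  if (Array.isArray(payload.messages)) {
    out.messages = payload.messages.map((msg) =>
      Array.isArray(msg.content) ? { ...msg, content: msg.content.map(apply) } : msg
    );
  }
  return { payload: out, injected, requestType };
}

// Made-up request: one system breakpoint without a TTL, one message breakpoint with one.
const example = {
  system: [{ type: "text", text: "You are helpful.", cache_control: { type: "ephemeral" } }],
  messages: [
    { role: "user", content: [{ type: "text", text: "hi", cache_control: { type: "ephemeral", ttl: "5m" } }] },
  ],
};
const result = injectTtl(example, { CACHE_FIX_TTL_MAIN: "5m" });
console.log(result.requestType, result.injected, result.payload.system[0].cache_control.ttl);
// prints: main 1 5m
```

Only the system block is counted, because the message block already carries a `ttl` and is left alone, which matches the `!block.cache_control.ttl` guard in the hunk.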
@@ -1009,6 +1322,60 @@ globalThis.fetch = async function (url, options) {
  monitorContextDegradation(payload.messages);
  }
 
+ // Diagnostic: dump cache breakpoint structure to a file when
+ // CACHE_FIX_DUMP_BREAKPOINTS=<path> is set. Maps where cache_control markers
+ // sit across system blocks and message content. Used to investigate #12
+ // (missing breakpoint #3 for skills/CLAUDE.md).
+ if (process.env.CACHE_FIX_DUMP_BREAKPOINTS && payload.system) {
+ try {
+ const dumpPath = process.env.CACHE_FIX_DUMP_BREAKPOINTS;
+ const breakpoints = [];
+ // System blocks
+ if (Array.isArray(payload.system)) {
+ payload.system.forEach((block, idx) => {
+ if (block.cache_control) {
+ breakpoints.push({
+ location: "system",
+ index: idx,
+ type: block.type,
+ cache_control: block.cache_control,
+ text_preview: (block.text || "").slice(0, 120),
+ text_chars: (block.text || "").length,
+ });
+ }
+ });
+ }
+ // Message blocks
+ if (payload.messages) {
+ payload.messages.forEach((msg, msgIdx) => {
+ if (!Array.isArray(msg.content)) return;
+ msg.content.forEach((block, blockIdx) => {
+ if (block.cache_control) {
+ breakpoints.push({
+ location: `messages[${msgIdx}].content`,
+ role: msg.role,
+ index: blockIdx,
+ type: block.type,
+ cache_control: block.cache_control,
+ text_preview: (block.text || "").slice(0, 120),
+ text_chars: (block.text || "").length,
+ });
+ }
+ });
+ });
+ }
+ const dump = {
+ timestamp: new Date().toISOString(),
+ breakpoint_count: breakpoints.length,
+ breakpoints,
+ system_block_count: Array.isArray(payload.system) ? payload.system.length : 0,
+ message_count: payload.messages ? payload.messages.length : 0,
+ };
+ writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
+ debugLog(`DUMP: ${breakpoints.length} cache breakpoints written to ${dumpPath}`);
+ } catch (e) { debugLog("BREAKPOINT DUMP ERROR:", e?.message); }
+ }
+
  // Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
  // when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
  // analysis and for understanding which tools contribute prefix bloat. First used
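Once a breakpoint dump has been written, it may help to condense it to one line per breakpoint. The following is a minimal sketch under stated assumptions: `summarizeDump` is a hypothetical helper that is not part of the package, and `sampleDump` is made-up data that only reuses the field names written by the hunk above.

```javascript
// Hypothetical helper: formats each breakpoint of a parsed dump as one line.
function summarizeDump(dump) {
  return dump.breakpoints.map((bp) => {
    const where = `${bp.location}[${bp.index}]`;
    const role = bp.role ? ` role=${bp.role}` : "";   // only message blocks carry a role
    const ttl = bp.cache_control.ttl ?? "(no ttl)";
    return `${where}${role} type=${bp.type} ttl=${ttl} chars=${bp.text_chars}`;
  });
}

// Made-up sample that follows the shape of the dump object in the diff.
const sampleDump = {
  timestamp: "2025-01-01T00:00:00.000Z",
  breakpoint_count: 2,
  breakpoints: [
    { location: "system", index: 2, type: "text",
      cache_control: { type: "ephemeral", ttl: "1h" }, text_preview: "...", text_chars: 5120 },
    { location: "messages[4].content", role: "user", index: 0, type: "text",
      cache_control: { type: "ephemeral" }, text_preview: "...", text_chars: 42 },
  ],
  system_block_count: 3,
  message_count: 5,
};

for (const line of summarizeDump(sampleDump)) console.log(line);
// prints:
// system[2] type=text ttl=1h chars=5120
// messages[4].content[0] role=user type=text ttl=(no ttl) chars=42
```

For a real file, the input would presumably come from `JSON.parse(fs.readFileSync(path, "utf8"))`, with `path` being whatever `CACHE_FIX_DUMP_BREAKPOINTS` was set to.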
@@ -1199,6 +1566,7 @@ async function drainTTLFromClone(clone, model, quotaHeaders) {
  if (event.type === "message_start" && event.message?.usage) {
  const u = event.message.usage;
  startUsage = u;
+ _trackCacheRatio(u);
  const cc = u.cache_creation || {};
  const e1h = cc.ephemeral_1h_input_tokens ?? 0;
  const e5m = cc.ephemeral_5m_input_tokens ?? 0;
@@ -397,13 +397,24 @@ function calculateCosts(entries, ratesData) {
  continue;
  }
 
- // Determine cache write tier breakdown
- // If telemetry has eph_1h/eph_5m, use those; otherwise assume all cache_create is 5m
- let cw1h = entry.eph_1h;
- let cw5m = entry.eph_5m;
- if (cw1h === 0 && cw5m === 0 && entry.cache_create > 0) {
- // No tier breakdown available; assume 5m (conservative — lower rate)
- cw5m = entry.cache_create;
+ // Determine cache write tier for cache_creation tokens.
+ // eph_1h/eph_5m are READ tokens (cache hits per tier), not write tokens.
+ // But they tell us which tier the request was on — and cache creation on
+ // that request uses the same tier's write rate.
+ // Fix for #7: previously assigned all creation to 5m when eph fields were 0.
+ let cw1h = 0;
+ let cw5m = 0;
+ if (entry.cache_create > 0) {
+ if (entry.eph_1h > 0) {
+ // Request was on 1h tier — creation charged at 1h write rate
+ cw1h = entry.cache_create;
+ } else if (entry.eph_5m > 0) {
+ // Request was on 5m tier — creation charged at 5m write rate
+ cw5m = entry.cache_create;
+ } else {
+ // No tier signal available; assume 5m (conservative — lower rate)
+ cw5m = entry.cache_create;
+ }
  }
 
  const cost = (
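To see what this hunk changes in practice, the removed and added rules can be run side by side on one telemetry entry. This is a sketch only: `splitCacheWriteOld` and `splitCacheWriteNew` are hypothetical names wrapping the logic shown above, and the entry values are made up. Whether `eph_1h` / `eph_5m` really hold read tokens depends on how the telemetry is recorded; the sketch follows the rule as the diff's comments state it.

```javascript
// Removed rule: use the eph_* fields directly as write-token counts.
function splitCacheWriteOld(entry) {
  let cw1h = entry.eph_1h;
  let cw5m = entry.eph_5m;
  if (cw1h === 0 && cw5m === 0 && entry.cache_create > 0) {
    cw5m = entry.cache_create;      // no breakdown: assume the 5m tier
  }
  return { cw1h, cw5m };
}

// Added rule: the eph_* fields only select the tier; all of cache_create is billed there.
function splitCacheWriteNew(entry) {
  let cw1h = 0;
  let cw5m = 0;
  if (entry.cache_create > 0) {
    if (entry.eph_1h > 0) {
      cw1h = entry.cache_create;
    } else if (entry.eph_5m > 0) {
      cw5m = entry.cache_create;
    } else {
      cw5m = entry.cache_create;    // no tier signal: assume the 5m tier
    }
  }
  return { cw1h, cw5m };
}

// Made-up entry: 1000 creation tokens on a request that shows 1h-tier activity.
const entry = { cache_create: 1000, eph_1h: 400, eph_5m: 0 };
console.log(splitCacheWriteOld(entry)); // { cw1h: 400, cw5m: 0 }
console.log(splitCacheWriteNew(entry)); // { cw1h: 1000, cw5m: 0 }
```

With the old rule, only 400 of the 1000 creation tokens would be priced in this example; the new rule prices all 1000 at the 1h write rate.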
@@ -59,6 +59,16 @@ try:
  if ttl:
  if ttl == '5m':
  label += ' | \033[31mTTL:5m\033[0m' # red
+ # When on 5m tier, show the cold-rebuild size so users know
+ # the cost of idling past 5 minutes
+ cache_cr = qs.get('cache', {}).get('cache_creation', 0)
+ cache_rd = qs.get('cache', {}).get('cache_read', 0)
+ prefix = cache_cr + cache_rd
+ if prefix > 0:
+ if prefix >= 1_000_000:
+ label += ' \033[31m\u26A0 idle >5m = {:.1f}M rebuild\033[0m'.format(prefix / 1_000_000)
+ else:
+ label += ' \033[31m\u26A0 idle >5m = {:.0f}K rebuild\033[0m'.format(prefix / 1_000)
  else:
  label += ' | TTL:' + ttl
  if hit and hit != 'N/A':