npm - claude-code-cache-fix - Versions diffs - 1.7.2 → 1.9.0 - Mend

claude-code-cache-fix 1.7.2 → 1.9.0


Files changed (5)

package/README.md +102 -4
package/package.json +1 -1
package/preload.mjs +396 -28
package/tools/cost-report.mjs +18 -7
package/tools/quota-statusline.sh +10 -0

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # claude-code-cache-fix
-English | [中文](./README.zh.md)
+English | [中文](./README.zh.md) | [Português](./docs/guia-pt-br.md)
 Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.97.
@@ -36,7 +36,10 @@ Create a wrapper script (e.g. `~/bin/claude-fixed`):
 ```bash
 #!/bin/bash
-CLAUDE_NPM_CLI="$HOME/.npm-global/lib/node_modules/@anthropic-ai/claude-code/cli.js"
+NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
+CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
+CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
 if [ ! -f "$CLAUDE_NPM_CLI" ]; then
   echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
@@ -44,7 +47,13 @@ if [ ! -f "$CLAUDE_NPM_CLI" ]; then
   exit 1
 fi
-exec env NODE_OPTIONS="--import claude-code-cache-fix" node "$CLAUDE_NPM_CLI" "$@"
+if [ ! -f "$CACHE_FIX" ]; then
+  echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
+  echo "Install with: npm install -g claude-code-cache-fix" >&2
+  exit 1
+fi
+exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
 ```
 ```bash
@@ -105,6 +114,67 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
 All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
+## Graduating from Fixes
+The interceptor serves three purposes with different lifecycles:
+| Purpose | Examples | When to disable |
+|---------|----------|-----------------|
+| **Bug fixes** | Block relocation, fingerprint, tool sort, TTL | When CC fixes the underlying bug — check the health line |
+| **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
+| **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
+### Health status
+On first API call, the interceptor logs a health status line (requires `CACHE_FIX_DEBUG=1`):
+```
+cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
+```
+Status meanings:
+- **active(Xh ago)** — fix was applied recently
+- **dormant(N clean sessions)** — bug not detected in N resume sessions; CC may have fixed it
+- **safety-blocked(Nx)** — round-trip verification failed; CC changed its algorithm, fix auto-disabled
+- **waiting** — fix hasn't been triggered yet
+When a fix shows `dormant`, you can safely disable it:
+```bash
+export CACHE_FIX_SKIP_RELOCATE=1  # example
+```
+To disable all fixes but keep monitoring:
+```bash
+export CACHE_FIX_DISABLED=1
+```
+### Regression detection
+If cache_read ratio drops below 50% across 5+ calls after disabling fixes, you'll see:
+```
+REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls.
+Fixes are disabled — consider re-enabling to recover cache performance.
+```
+## Safety
+### Fingerprint round-trip verification
+Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
+hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
+If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
+This ensures the interceptor can never make cache performance *worse* than stock CC.
+### Fail-safe design
+Every fix is designed to fail to a no-op:
+- If block detection regexes don't match → blocks aren't relocated (CC behavior)
+- If fingerprint format changes → fingerprint isn't rewritten (CC behavior)
+- If tool sort produces no changes → payload passes through untouched
+- If TTL injection target structure changes → TTL isn't injected (CC behavior)
+The interceptor can only *help* or *do nothing*. It cannot make things worse.
 ## Status line — quota warnings in real time
 The interceptor writes quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script reads this file and displays a live status line in Claude Code showing:
@@ -137,7 +207,23 @@ Add to `~/.claude/settings.json`:
 }
 ```
-### Why this matters
+### Recommended: disable git-status injection
+Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
+```bash
+export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
+```
+Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
+The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
+Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
+**Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
+### Why the status line matters
 When the server downgrades your TTL to 5m (Layer 2 — quota-aware downgrade at Q5h ≥ 100%), **every idle longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible — you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
@@ -341,6 +427,17 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
 | `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
 | `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
+| `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes; keep monitoring + optimizations active |
+| `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
+| `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
+| `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
+| `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
+| `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
+| `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
+| `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
+| `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
+| `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
+| `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
 ## Limitations
@@ -424,6 +521,7 @@ measurable signature of cache-efficiency degradation.
 - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
 - **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
 - **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
+- **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.7.2",
+  "version": "1.9.0",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",

package/preload.mjs CHANGED

@@ -83,6 +83,25 @@ function extractRealUserMessageText(messages) {
   return "";
 }
+/**
+ * Extract text from messages[0] the way CC's original fingerprint code does —
+ * including meta/attachment blocks. Used only for round-trip verification.
+ */
+function extractFirstMessageText(messages) {
+  if (!Array.isArray(messages) || messages.length === 0) return "";
+  const first = messages[0];
+  if (!first || first.role !== "user") return "";
+  const content = first.content;
+  if (typeof content === "string") return content;
+  if (!Array.isArray(content)) return "";
+  for (const block of content) {
+    if (block.type === "text" && typeof block.text === "string") {
+      return block.text;
+    }
+  }
+  return "";
+}
 /**
  * Extract current cc_version from system prompt blocks and recompute with
  * stable fingerprint. Returns { oldVersion, newVersion, stableFingerprint }.
@@ -107,6 +126,23 @@ function stabilizeFingerprint(system, messages) {
   const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
   const oldFingerprint = dotParts[3]; // "a3f"
+  // --- SAFETY: Round-trip verification ---
+  // Verify our salt/indices reproduce CC's fingerprint for the ORIGINAL
+  // message text (messages[0] content, which is what CC used).
+  // If our computation doesn't match, our constants are stale — skip rewrite.
+  const originalText = extractFirstMessageText(messages);
+  const verification = computeFingerprint(originalText, baseVersion);
+  if (verification !== oldFingerprint) {
+    debugLog(
+      "FINGERPRINT SAFETY: round-trip verification failed.",
+      `CC sent '${oldFingerprint}', we computed '${verification}'.`,
+      "Salt/indices may have changed in this CC version. Skipping rewrite."
+    );
+    recordFixResult("fingerprint", "safety_blocked");
+    return null;
+  }
+  // --- END SAFETY ---
   // Compute stable fingerprint from real user text
   const realText = extractRealUserMessageText(messages);
   const stableFingerprint = computeFingerprint(realText, baseVersion);
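Review note: the round-trip check in the hunk above can be read as a small guard around the rewrite. The sketch below illustrates that pattern only. `toyFingerprint` and `guardedRewrite` are hypothetical names, and the toy hash is a stand-in, since the package's real salt and character indices are not part of this diff.

```javascript
// Hypothetical stand-in for computeFingerprint; NOT the package's algorithm.
function toyFingerprint(text, version) {
  let h = 0;
  for (const ch of `${version}:${text}`) {
    h = (h * 31 + ch.codePointAt(0)) % 4096; // keep the value within 3 hex digits
  }
  return h.toString(16).padStart(3, "0");
}

// Returns the stable fingerprint, or null when the round-trip check fails.
function guardedRewrite(compute, originalText, realText, version, sentFingerprint) {
  // If we cannot reproduce what the client sent, our constants are stale:
  // fall back to a no-op rather than risk a worse cache key.
  if (compute(originalText, version) !== sentFingerprint) return null;
  return compute(realText, version);
}

const version = "2.1.87";
const metaText = "<meta>attachment</meta>"; // what messages[0] holds on resume
const sent = toyFingerprint(metaText, version);

const rewritten = guardedRewrite(toyFingerprint, metaText, "fix the bug", version, sent);
const blocked = guardedRewrite(toyFingerprint, metaText, "fix the bug", version, "zzz");

console.log(rewritten); // a 3-character hex string
console.log(blocked);   // null
```

The design point is that a failed verification degrades to stock behaviour: the caller sees `null` and leaves the payload alone.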
@@ -588,13 +624,16 @@ function replaceOutputEfficiencySection(text) {
 // Set CACHE_FIX_DEBUG=1 to enable
 // --------------------------------------------------------------------------
-import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
+import { appendFileSync, readFileSync, writeFileSync, mkdirSync, renameSync } from "node:fs";
 import { homedir } from "node:os";
 import { join } from "node:path";
 const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
 const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
 const NORMALIZE_IDENTITY = process.env.CACHE_FIX_NORMALIZE_IDENTITY === "1";
+const STRIP_GIT_STATUS = process.env.CACHE_FIX_STRIP_GIT_STATUS === "1";
+const TTL_MAIN = (process.env.CACHE_FIX_TTL_MAIN || "1h").toLowerCase();
+const TTL_SUBAGENT = (process.env.CACHE_FIX_TTL_SUBAGENT || "1h").toLowerCase();
 const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
 const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
 const USAGE_JSONL = process.env.CACHE_FIX_USAGE_LOG || join(homedir(), ".claude", "usage.jsonl");
@@ -605,6 +644,104 @@ function debugLog(...args) {
   try { appendFileSync(LOG_PATH, line); } catch {}
 }
+// --------------------------------------------------------------------------
+// Kill switches — disable fixes while keeping monitoring active
+// --------------------------------------------------------------------------
+const FIXES_DISABLED = process.env.CACHE_FIX_DISABLED === "1";
+/**
+ * Check if a specific fix should be applied.
+ * Returns false if master kill switch is on OR individual fix is skipped.
+ * Monitoring and optimizations (image strip, output efficiency) are NOT
+ * affected by CACHE_FIX_DISABLED — only bug fixes are.
+ */
+function shouldApplyFix(fixName) {
+  if (FIXES_DISABLED) return false;
+  const skipKey = `CACHE_FIX_SKIP_${fixName.toUpperCase()}`;
+  if (process.env[skipKey] === "1") return false;
+  return true;
+}
+// --------------------------------------------------------------------------
+// Persistent effectiveness stats
+// --------------------------------------------------------------------------
+const STATS_PATH = join(homedir(), ".claude", "cache-fix-stats.json");
+const _STATS_SCHEMA = {
+  relocate: { applied: 0, skipped: 0, bugPresent: 0, resumeScanned: 0, lastApplied: null, lastScanned: null },
+  fingerprint: { applied: 0, skipped: 0, safetyBlocked: 0, lastApplied: null },
+  tool_sort: { applied: 0, skipped: 0, lastApplied: null },
+  ttl: { applied: 0, skipped: 0, lastApplied: null },
+  identity: { applied: 0, skipped: 0, lastApplied: null },
+  git_status: { applied: 0, skipped: 0, lastApplied: null },
+};
+function _createEmptyStats() {
+  return {
+    version: 1,
+    created: new Date().toISOString(),
+    lastUpdated: null,
+    fixes: JSON.parse(JSON.stringify(_STATS_SCHEMA)),
+  };
+}
+/** Read stats from disk. Returns empty stats on any error. */
+function readStats() {
+  try {
+    const data = JSON.parse(readFileSync(STATS_PATH, "utf8"));
+    if (data.created) {
+      const ageDays = (Date.now() - new Date(data.created).getTime()) / (1000 * 60 * 60 * 24);
+      if (ageDays > 30) return _createEmptyStats();
+    }
+    for (const [key, schema] of Object.entries(_STATS_SCHEMA)) {
+      if (!data.fixes[key]) data.fixes[key] = { ...schema };
+    }
+    return data;
+  } catch {
+    return _createEmptyStats();
+  }
+}
+/** Atomic write: temp file + rename to avoid corruption. */
+function writeStats(stats) {
+  try {
+    stats.lastUpdated = new Date().toISOString();
+    const tmp = STATS_PATH + ".tmp";
+    writeFileSync(tmp, JSON.stringify(stats, null, 2));
+    renameSync(tmp, STATS_PATH);
+  } catch (e) {
+    debugLog("STATS WRITE ERROR:", e?.message);
+  }
+}
+function recordFixResult(fixName, result) {
+  const stats = readStats();
+  if (!stats.fixes[fixName]) return;
+  const now = new Date().toISOString();
+  stats.lastUpdated = now;
+  if (result === "applied") {
+    stats.fixes[fixName].applied++;
+    stats.fixes[fixName].lastApplied = now;
+  } else if (result === "skipped") {
+    stats.fixes[fixName].skipped++;
+  } else if (result === "safety_blocked") {
+    stats.fixes[fixName].safetyBlocked = (stats.fixes[fixName].safetyBlocked || 0) + 1;
+  }
+  writeStats(stats);
+}
+function recordRelocateScan(bugFound) {
+  const stats = readStats();
+  const now = new Date().toISOString();
+  stats.lastUpdated = now;
+  stats.fixes.relocate.resumeScanned++;
+  stats.fixes.relocate.lastScanned = now;
+  if (bugFound) stats.fixes.relocate.bugPresent++;
+  writeStats(stats);
+}
 // --------------------------------------------------------------------------
 // Prefix snapshot — captures message prefix for cross-process diff.
 // Set CACHE_FIX_PREFIXDIFF=1 to enable.
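Review note: the kill-switch lookup added in this hunk derives the environment variable name from the fix name. A minimal sketch of the same logic follows, with the environment passed in as a parameter (an assumption made here so the behaviour can be exercised without touching `process.env`).

```javascript
// Same decision logic as shouldApplyFix in the hunk, but with env injected.
function shouldApplyFix(fixName, env) {
  if (env.CACHE_FIX_DISABLED === "1") return false; // master switch wins
  const skipKey = `CACHE_FIX_SKIP_${fixName.toUpperCase()}`;
  return env[skipKey] !== "1";
}

const results = {
  noEnv: shouldApplyFix("ttl", {}),
  master: shouldApplyFix("ttl", { CACHE_FIX_DISABLED: "1" }),
  perFix: shouldApplyFix("tool_sort", { CACHE_FIX_SKIP_TOOL_SORT: "1" }),
  // Only the literal string "1" counts; "true" leaves the fix enabled.
  nonOne: shouldApplyFix("relocate", { CACHE_FIX_SKIP_RELOCATE: "true" }),
};

console.log(results);
// { noEnv: true, master: false, perFix: false, nonOne: true }
```

Note that fix names with underscores map directly: `tool_sort` becomes `CACHE_FIX_SKIP_TOOL_SORT` and `git_status` becomes `CACHE_FIX_SKIP_GIT_STATUS`.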
@@ -656,6 +793,59 @@ function dumpGrowthBookFlags() {
   }
 }
+// --------------------------------------------------------------------------
+// Startup health status line
+// --------------------------------------------------------------------------
+let _healthLinePrinted = false;
+function _formatTimeSince(isoString) {
+  if (!isoString) return "never";
+  const ms = Date.now() - new Date(isoString).getTime();
+  const hours = Math.floor(ms / (1000 * 60 * 60));
+  const days = Math.floor(hours / 24);
+  if (days > 0) return `${days}d ago`;
+  if (hours > 0) return `${hours}h ago`;
+  const mins = Math.floor(ms / (1000 * 60));
+  return `${mins}m ago`;
+}
+function _formatFixStatus(fixName, fixStats, dormantThreshold = 5) {
+  if (fixName === "relocate") {
+    if (fixStats.resumeScanned >= dormantThreshold && fixStats.bugPresent === 0) {
+      return `dormant(${fixStats.resumeScanned} clean sessions)`;
+    }
+  } else {
+    if (fixStats.skipped >= dormantThreshold && fixStats.applied === 0) {
+      return `dormant(${fixStats.skipped} skips)`;
+    }
+  }
+  if (fixStats.safetyBlocked > 0) return `safety-blocked(${fixStats.safetyBlocked}x)`;
+  if (fixStats.lastApplied) return `active(${_formatTimeSince(fixStats.lastApplied)})`;
+  return "waiting";
+}
+function printHealthLine() {
+  if (_healthLinePrinted) return;
+  _healthLinePrinted = true;
+  const stats = readStats();
+  const parts = [];
+  for (const [name, fixStats] of Object.entries(stats.fixes)) {
+    const status = _formatFixStatus(name, fixStats);
+    parts.push(`${name}=${status}`);
+    if (status.startsWith("dormant")) {
+      debugLog(`DORMANT: ${name} — CC may have fixed this. Consider CACHE_FIX_SKIP_${name.toUpperCase()}=1`);
+    }
+    if (status.startsWith("safety-blocked")) {
+      debugLog(`SAFETY: ${name} — salt/indices may have changed. Fix is auto-disabled.`);
+    }
+  }
+  debugLog(`HEALTH: ${parts.join(" ")}`);
+  if (FIXES_DISABLED) {
+    debugLog("HEALTH: all fixes disabled via CACHE_FIX_DISABLED=1 (monitoring active)");
+  }
+}
 // --------------------------------------------------------------------------
 // Microcompact / budget monitoring
 // --------------------------------------------------------------------------
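Review note: the order of the checks in `_formatFixStatus` decides which label wins when several conditions hold. The sketch below re-states the hunk's logic with the current time passed in as `now` (an assumption for determinism; the original reads `Date.now()`), then runs it on made-up stats objects.

```javascript
function formatTimeSince(isoString, now) {
  if (!isoString) return "never";
  const ms = now - new Date(isoString).getTime();
  const hours = Math.floor(ms / (1000 * 60 * 60));
  const days = Math.floor(hours / 24);
  if (days > 0) return `${days}d ago`;
  if (hours > 0) return `${hours}h ago`;
  return `${Math.floor(ms / (1000 * 60))}m ago`;
}

function formatFixStatus(fixName, s, now, dormantThreshold = 5) {
  if (fixName === "relocate") {
    if (s.resumeScanned >= dormantThreshold && s.bugPresent === 0) {
      return `dormant(${s.resumeScanned} clean sessions)`;
    }
  } else if (s.skipped >= dormantThreshold && s.applied === 0) {
    return `dormant(${s.skipped} skips)`;
  }
  if (s.safetyBlocked > 0) return `safety-blocked(${s.safetyBlocked}x)`;
  if (s.lastApplied) return `active(${formatTimeSince(s.lastApplied, now)})`;
  return "waiting";
}

const now = Date.parse("2024-01-01T12:00:00Z"); // illustrative timestamp
const statuses = [
  formatFixStatus("relocate", { resumeScanned: 6, bugPresent: 0 }, now),
  formatFixStatus("ttl", { applied: 3, skipped: 0, lastApplied: "2024-01-01T10:00:00Z" }, now),
  formatFixStatus("identity", { applied: 0, skipped: 0, lastApplied: null }, now),
  // Dormancy is tested first, so it masks safety-blocked once skips reach 5.
  formatFixStatus("fingerprint", { applied: 0, skipped: 7, safetyBlocked: 7, lastApplied: null }, now),
];

console.log(statuses);
// [ 'dormant(6 clean sessions)', 'active(2h ago)', 'waiting', 'dormant(7 skips)' ]
```

Two things stand out when reading the hunk this way: only `relocate` reports "clean sessions" (other fixes report "skips"), and a fingerprint fix that is repeatedly safety-blocked also accumulates skips in the interceptor, so it may surface as `dormant` rather than `safety-blocked`.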
@@ -801,6 +991,50 @@ function snapshotPrefix(payload) {
   }
 }
+// --------------------------------------------------------------------------
+// Cache regression detector
+// --------------------------------------------------------------------------
+const _cacheHistory = []; // in-memory ring buffer of { ratio, turn }
+const REGRESSION_MIN_CALLS = 5;
+const REGRESSION_MIN_RATIO = 0.5;
+let _apiCallCount = 0;
+function _computeCacheRatio(usage) {
+  if (!usage) return null;
+  const read = usage.cache_read_input_tokens || 0;
+  const creation = usage.cache_creation_input_tokens || 0;
+  const input = usage.input_tokens || 0;
+  const total = read + creation + input;
+  if (total === 0) return null;
+  return read / total;
+}
+function _checkCacheRegression() {
+  if (_cacheHistory.length < REGRESSION_MIN_CALLS) return;
+  const recent = _cacheHistory.slice(-REGRESSION_MIN_CALLS);
+  const allLow = recent.every((h) => h.ratio < REGRESSION_MIN_RATIO);
+  if (allLow) {
+    const avgRatio = recent.reduce((sum, h) => sum + h.ratio, 0) / recent.length;
+    debugLog(
+      `REGRESSION WARNING: cache_read ratio averaged ${Math.round(avgRatio * 100)}%`,
+      `across last ${REGRESSION_MIN_CALLS} calls (threshold: ${REGRESSION_MIN_RATIO * 100}%).`,
+      FIXES_DISABLED
+        ? "Fixes are disabled — consider re-enabling to recover cache performance."
+        : "Fixes are active but cache is still degraded — CC may have introduced a new bug."
+    );
+  }
+}
+function _trackCacheRatio(usage) {
+  if (_apiCallCount <= 1) return; // skip first call (cache creation, no reads)
+  const ratio = _computeCacheRatio(usage);
+  if (ratio === null) return;
+  _cacheHistory.push({ ratio, turn: _apiCallCount });
+  if (_cacheHistory.length > 20) _cacheHistory.shift(); // ring buffer
+  _checkCacheRegression();
+}
 // --------------------------------------------------------------------------
 // Fetch interceptor
 // --------------------------------------------------------------------------
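Review note: a small worked example of the regression detector's arithmetic. The two functions mirror `_computeCacheRatio` and the condition in `_checkCacheRegression`; the name `isRegression` and the sample numbers are illustrative assumptions.

```javascript
// ratio = cache_read / (cache_read + cache_creation + uncached input)
function cacheRatio(usage) {
  if (!usage) return null;
  const read = usage.cache_read_input_tokens || 0;
  const creation = usage.cache_creation_input_tokens || 0;
  const input = usage.input_tokens || 0;
  const total = read + creation + input;
  return total === 0 ? null : read / total;
}

// The warning fires only when EVERY one of the last `minCalls` ratios is
// below the threshold; the average is used for the log message only.
function isRegression(ratios, minCalls = 5, minRatio = 0.5) {
  if (ratios.length < minCalls) return false;
  return ratios.slice(-minCalls).every((r) => r < minRatio);
}

const healthy = cacheRatio({
  cache_read_input_tokens: 9000,
  cache_creation_input_tokens: 500,
  input_tokens: 500,
});
console.log(healthy); // 0.9

console.log(isRegression([0.1, 0.2, 0.1, 0.3, 0.1])); // true
console.log(isRegression([0.1, 0.2, 0.1, 0.6, 0.1])); // false: one call at 0.6
```

The second case averages about 22%, well under the 50% threshold, yet no warning is logged because one of the five calls recovered.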
@@ -817,11 +1051,17 @@ globalThis.fetch = async function (url, options) {
   if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
     try {
+      _apiCallCount++;
       const payload = JSON.parse(options.body);
       let modified = false;
       // One-time GrowthBook flag dump on first API call
       dumpGrowthBookFlags();
+      printHealthLine();
+      if (FIXES_DISABLED) {
+        debugLog("CACHE_FIX_DISABLED=1 — all bug fixes bypassed, monitoring active");
+      }
       debugLog("--- API call to", urlStr);
       debugLog("message count:", payload.messages?.length);
@@ -832,7 +1072,7 @@ globalThis.fetch = async function (url, options) {
       }
       // Bug 1: Relocate resume attachment blocks
-      if (payload.messages) {
+      if (payload.messages && shouldApplyFix("relocate")) {
         // Log message structure for debugging
         if (DEBUG) {
           let firstUserIdx = -1, lastUserIdx = -1;
@@ -868,13 +1108,21 @@ globalThis.fetch = async function (url, options) {
         }
         const normalized = normalizeResumeMessages(payload.messages);
+        // Track bug presence for dormancy detection (resume = messages > 5)
+        const isResume = payload.messages.length > 5;
+        if (isResume) recordRelocateScan(normalized !== payload.messages);
         if (normalized !== payload.messages) {
           payload.messages = normalized;
           modified = true;
           debugLog("APPLIED: resume message relocation");
+          recordFixResult("relocate", "applied");
         } else {
           debugLog("SKIPPED: resume relocation (not a resume or already correct)");
+          recordFixResult("relocate", "skipped");
         }
+      } else if (payload.messages && !shouldApplyFix("relocate")) {
+        debugLog("SKIPPED: relocate fix disabled via env var");
       }
       // Image stripping: remove old tool_result images to reduce token waste
@@ -895,7 +1143,7 @@ globalThis.fetch = async function (url, options) {
       }
       // Bug 2a: Stabilize tool ordering
-      if (payload.tools) {
+      if (payload.tools && shouldApplyFix("tool_sort")) {
         const sorted = stabilizeToolOrder(payload.tools);
         const changed = sorted.some(
           (t, i) => t.name !== payload.tools[i]?.name
@@ -904,11 +1152,16 @@ globalThis.fetch = async function (url, options) {
           payload.tools = sorted;
           modified = true;
           debugLog("APPLIED: tool order stabilization");
+          recordFixResult("tool_sort", "applied");
+        } else {
+          recordFixResult("tool_sort", "skipped");
         }
+      } else if (payload.tools && !shouldApplyFix("tool_sort")) {
+        debugLog("SKIPPED: tool sort fix disabled via env var");
       }
       // Bug 2b: Stabilize fingerprint in attribution header
-      if (payload.system && payload.messages) {
+      if (payload.system && payload.messages && shouldApplyFix("fingerprint")) {
         const fix = stabilizeFingerprint(payload.system, payload.messages);
         if (fix) {
           payload.system = [...payload.system];
@@ -918,7 +1171,12 @@ globalThis.fetch = async function (url, options) {
           };
           modified = true;
           debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
+          recordFixResult("fingerprint", "applied");
+        } else {
+          recordFixResult("fingerprint", "skipped");
         }
+      } else if (payload.system && payload.messages && !shouldApplyFix("fingerprint")) {
+        debugLog("SKIPPED: fingerprint fix disabled via env var");
       }
       // Bug 6: Identity string normalization for Agent()/SendMessage() cache parity
@@ -931,7 +1189,7 @@ globalThis.fetch = async function (url, options) {
       // turn even though system[2] (the actual instructions) is byte-identical.
       // Confirmed by @labzink via mitmproxy on #44724.
       // Opt-in because it's a model-perceivable behavior change (subagent thinks it's CC).
-      if (NORMALIZE_IDENTITY && payload.system && Array.isArray(payload.system)) {
+      if (NORMALIZE_IDENTITY && shouldApplyFix("identity") && payload.system && Array.isArray(payload.system)) {
         const CANONICAL = "You are Claude Code, Anthropic's official CLI for Claude.";
         const AGENT_SDK = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
         let normalized = 0;
@@ -949,6 +1207,9 @@ globalThis.fetch = async function (url, options) {
         if (normalized > 0) {
           modified = true;
           debugLog(`APPLIED: identity normalized on ${normalized} system block(s) (Agent SDK → Claude Code)`);
+          recordFixResult("identity", "applied");
+        } else {
+          recordFixResult("identity", "skipped");
         }
       }
@@ -964,39 +1225,91 @@ globalThis.fetch = async function (url, options) {
         }
       }
-      // Bug 5: 1h TTL enforcement
+      // Optimization: strip volatile git-status from system prompt
+      // CC injects live git-status output (branch, changed files, recent commits)
+      // into a system text block. This changes on every file edit, busting the
+      // entire prefix cache. Opt-in via CACHE_FIX_STRIP_GIT_STATUS=1.
+      // The model can still run `git status` via Bash tool when it needs context.
+      if (STRIP_GIT_STATUS && shouldApplyFix("git_status") && payload.system && Array.isArray(payload.system)) {
+        let stripped = 0;
+        payload.system = payload.system.map((block) => {
+          if (block?.type !== "text" || typeof block.text !== "string") return block;
+          // Match the gitStatus section CC injects. Pattern:
+          //   "gitStatus: This is the git status..."
+          //   followed by branch, status, commits until the next section or end
+          const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
+          if (!gitStatusPattern.test(block.text)) return block;
+          const newText = block.text.replace(gitStatusPattern, "gitStatus: [stripped by cache-fix for prefix stability]");
+          if (newText !== block.text) {
+            stripped++;
+            return { ...block, text: newText };
+          }
+          return block;
+        });
+        if (stripped > 0) {
+          modified = true;
+          debugLog(`APPLIED: git-status stripped from ${stripped} system block(s)`);
+          recordFixResult("git_status", "applied");
+        } else {
+          recordFixResult("git_status", "skipped");
+        }
+      }
+      // Bug 5: TTL enforcement (configurable per request type)
       // The client gates 1h cache TTL behind a GrowthBook allowlist that checks
       // querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
       // Interactive CLI sessions may not match any pattern, causing the client to
       // send cache_control without ttl (defaulting to 5m server-side).
       // The server honors whatever TTL the client requests — so we inject it.
       // Discovered by @TigerKay1926 on #42052 using our GrowthBook flag dump.
-      if (payload.system) {
-        let ttlInjected = 0;
-        payload.system = payload.system.map((block) => {
-          if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
-            ttlInjected++;
-            return { ...block, cache_control: { ...block.cache_control, ttl: "1h" } };
-          }
-          return block;
-        });
-        // Also check messages for cache_control blocks (conversation history breakpoints)
-        if (payload.messages) {
-          for (const msg of payload.messages) {
-            if (!Array.isArray(msg.content)) continue;
-            for (let i = 0; i < msg.content.length; i++) {
-              const b = msg.content[i];
-              if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
-                msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: "1h" } };
-                ttlInjected++;
+      //
+      // v1.9.0: configurable per request type via CACHE_FIX_TTL_MAIN and
+      // CACHE_FIX_TTL_SUBAGENT. Values: "1h" (default), "5m", "none".
+      // "none" = don't inject TTL, pass through caller's original cache_control.
+      if (payload.system && shouldApplyFix("ttl")) {
+        // Detect subagent: Agent SDK identity in system[1]
+        const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
+        const isSubagent = Array.isArray(payload.system) &&
+          payload.system.some((b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX));
+        const ttlValue = isSubagent ? TTL_SUBAGENT : TTL_MAIN;
+        const requestType = isSubagent ? "subagent" : "main";
+        if (ttlValue === "none") {
+          debugLog(`SKIPPED: TTL injection (${requestType} set to 'none' — pass-through)`);
+          recordFixResult("ttl", "skipped");
+        } else {
+          const ttlParam = ttlValue === "5m" ? "5m" : "1h";
+          let ttlInjected = 0;
+          payload.system = payload.system.map((block) => {
+            if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
+              ttlInjected++;
+              return { ...block, cache_control: { ...block.cache_control, ttl: ttlParam } };
+            }
+            return block;
+          });
+          // Also check messages for cache_control blocks (conversation history breakpoints)
+          if (payload.messages) {
+            for (const msg of payload.messages) {
+              if (!Array.isArray(msg.content)) continue;
+              for (let i = 0; i < msg.content.length; i++) {
+                const b = msg.content[i];
+                if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
+                  msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: ttlParam } };
+                  ttlInjected++;
+                }
               }
             }
           }
+          if (ttlInjected > 0) {
+            modified = true;
+            debugLog(`APPLIED: ${ttlParam} TTL injected on ${ttlInjected} cache_control block(s) (${requestType})`);
+            recordFixResult("ttl", "applied");
+          } else {
+            recordFixResult("ttl", "skipped");
+          }
         }
-        if (ttlInjected > 0) {
-          modified = true;
-          debugLog(`APPLIED: 1h TTL injected on ${ttlInjected} cache_control block(s)`);
-        }
+      } else if (payload.system && !shouldApplyFix("ttl")) {
+        debugLog("SKIPPED: TTL injection disabled via env var");
       }
       if (modified) {
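Review note: the per-request-type TTL selection in this hunk can be isolated into two small functions. This is a sketch, not the package's code: `resolveTtl` and `injectTtl` are hypothetical names, and the environment is passed in as a parameter for testability.

```javascript
const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";

// Returns "1h", "5m", or null (null means pass-through, i.e. "none").
function resolveTtl(system, env) {
  const main = (env.CACHE_FIX_TTL_MAIN || "1h").toLowerCase();
  const sub = (env.CACHE_FIX_TTL_SUBAGENT || "1h").toLowerCase();
  const isSubagent = Array.isArray(system) && system.some(
    (b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX)
  );
  const value = isSubagent ? sub : main;
  if (value === "none") return null;
  return value === "5m" ? "5m" : "1h"; // anything unrecognized falls back to 1h
}

// Adds ttl only to ephemeral cache_control blocks that do not already set one.
function injectTtl(blocks, ttl) {
  return blocks.map((b) =>
    b.cache_control?.type === "ephemeral" && !b.cache_control.ttl
      ? { ...b, cache_control: { ...b.cache_control, ttl } }
      : b
  );
}

const subagentSystem = [{ type: "text", text: AGENT_SDK_PREFIX + " More instructions." }];
const mainSystem = [
  { type: "text", text: "You are Claude Code.", cache_control: { type: "ephemeral" } },
  { type: "text", text: "Pinned block.", cache_control: { type: "ephemeral", ttl: "5m" } },
];

console.log(resolveTtl(subagentSystem, { CACHE_FIX_TTL_SUBAGENT: "5m" })); // 5m
console.log(resolveTtl(mainSystem, { CACHE_FIX_TTL_MAIN: "NONE" }));       // null
console.log(resolveTtl(mainSystem, { CACHE_FIX_TTL_MAIN: "2h" }));         // 1h

const injected = injectTtl(mainSystem, "1h");
console.log(injected.map((b) => b.cache_control.ttl)); // [ '1h', '5m' ]
```

Blocks that already carry a `ttl` are left untouched, which matches the idempotence the README claims for the fixes.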
@@ -1009,6 +1322,60 @@ globalThis.fetch = async function (url, options) {
         monitorContextDegradation(payload.messages);
       }
+      // Diagnostic: dump cache breakpoint structure to a file when
+      // CACHE_FIX_DUMP_BREAKPOINTS=<path> is set. Maps where cache_control markers
+      // sit across system blocks and message content. Used to investigate #12
+      // (missing breakpoint #3 for skills/CLAUDE.md).
+      if (process.env.CACHE_FIX_DUMP_BREAKPOINTS && payload.system) {
+        try {
+          const dumpPath = process.env.CACHE_FIX_DUMP_BREAKPOINTS;
+          const breakpoints = [];
+          // System blocks
+          if (Array.isArray(payload.system)) {
+            payload.system.forEach((block, idx) => {
+              if (block.cache_control) {
+                breakpoints.push({
+                  location: "system",
+                  index: idx,
+                  type: block.type,
+                  cache_control: block.cache_control,
+                  text_preview: (block.text || "").slice(0, 120),
+                  text_chars: (block.text || "").length,
+                });
+              }
+            });
+          }
+          // Message blocks
+          if (payload.messages) {
+            payload.messages.forEach((msg, msgIdx) => {
+              if (!Array.isArray(msg.content)) return;
+              msg.content.forEach((block, blockIdx) => {
+                if (block.cache_control) {
+                  breakpoints.push({
+                    location: `messages[${msgIdx}].content`,
+                    role: msg.role,
+                    index: blockIdx,
+                    type: block.type,
+                    cache_control: block.cache_control,
+                    text_preview: (block.text || "").slice(0, 120),
+                    text_chars: (block.text || "").length,
+                  });
+                }
+              });
+            });
+          }
+          const dump = {
+            timestamp: new Date().toISOString(),
+            breakpoint_count: breakpoints.length,
+            breakpoints,
+            system_block_count: Array.isArray(payload.system) ? payload.system.length : 0,
+            message_count: payload.messages ? payload.messages.length : 0,
+          };
+          writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
+          debugLog(`DUMP: ${breakpoints.length} cache breakpoints written to ${dumpPath}`);
+        } catch (e) { debugLog("BREAKPOINT DUMP ERROR:", e?.message); }
+      }
       // Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
       // when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
       // analysis and for understanding which tools contribute prefix bloat. First used
@@ -1199,6 +1566,7 @@ async function drainTTLFromClone(clone, model, quotaHeaders) {
           if (event.type === "message_start" && event.message?.usage) {
             const u = event.message.usage;
             startUsage = u;
+            _trackCacheRatio(u);
             const cc = u.cache_creation || {};
             const e1h = cc.ephemeral_1h_input_tokens ?? 0;
             const e5m = cc.ephemeral_5m_input_tokens ?? 0;
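
The TTL hunk above adds a `ttl` to every `ephemeral` `cache_control` marker that lacks one, in both system blocks and message content. A minimal standalone sketch of that idea follows. `injectTtl` is a hypothetical helper name, not a function in `preload.mjs`. The `"1h"` default is an assumption taken from the removed log line. The sketch also assumes system blocks use the same condition as the message blocks shown, and it rebuilds each content array with `map` where the diff assigns by index.

```javascript
// Hypothetical helper (not part of preload.mjs): adds `ttl` to each ephemeral
// cache_control marker that has none and returns how many markers it changed.
function injectTtl(payload, ttlParam = "1h") {
  let injected = 0;

  // Returns a patched copy when the block qualifies, otherwise the block itself.
  const patch = (block) => {
    if (block?.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
      injected++;
      return { ...block, cache_control: { ...block.cache_control, ttl: ttlParam } };
    }
    return block;
  };

  // System blocks.
  if (Array.isArray(payload.system)) {
    payload.system = payload.system.map(patch);
  }

  // Conversation-history breakpoints; string-valued content is skipped.
  for (const msg of payload.messages ?? []) {
    if (Array.isArray(msg.content)) {
      msg.content = msg.content.map(patch);
    }
  }

  return injected;
}
```

Markers that already carry a `ttl` are left untouched, so a zero return value corresponds to the `"skipped"` branch in the diff.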

package/tools/cost-report.mjs CHANGED

@@ -397,13 +397,24 @@ function calculateCosts(entries, ratesData) {
       continue;
     }
-    // Determine cache write tier breakdown
-    // If telemetry has eph_1h/eph_5m, use those; otherwise assume all cache_create is 5m
-    let cw1h = entry.eph_1h;
-    let cw5m = entry.eph_5m;
-    if (cw1h === 0 && cw5m === 0 && entry.cache_create > 0) {
-      // No tier breakdown available; assume 5m (conservative — lower rate)
-      cw5m = entry.cache_create;
+    // Determine cache write tier for cache_creation tokens.
+    // eph_1h/eph_5m are READ tokens (cache hits per tier), not write tokens.
+    // But they tell us which tier the request was on — and cache creation on
+    // that request uses the same tier's write rate.
+    // Fix for #7: previously assigned all creation to 5m when eph fields were 0.
+    let cw1h = 0;
+    let cw5m = 0;
+    if (entry.cache_create > 0) {
+      if (entry.eph_1h > 0) {
+        // Request was on 1h tier — creation charged at 1h write rate
+        cw1h = entry.cache_create;
+      } else if (entry.eph_5m > 0) {
+        // Request was on 5m tier — creation charged at 5m write rate
+        cw5m = entry.cache_create;
+      } else {
+        // No tier signal available; assume 5m (conservative — lower rate)
+        cw5m = entry.cache_create;
+      }
     }
     const cost = (
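
The tier-selection step in this hunk can be read in isolation. The sketch below assumes an `entry` object with only the three fields the diff uses (`cache_create`, `eph_1h`, `eph_5m`). `splitCacheWrites` is a hypothetical name, not a function in `cost-report.mjs`.

```javascript
// Hypothetical standalone version of the tier-selection step.
// Assigns all cache_create tokens to one write tier based on the eph_* signal.
function splitCacheWrites(entry) {
  let cw1h = 0;
  let cw5m = 0;
  if (entry.cache_create > 0) {
    if (entry.eph_1h > 0) {
      // Request was on the 1h tier: charge creation at the 1h write rate.
      cw1h = entry.cache_create;
    } else {
      // Either a 5m signal or no signal at all: fall back to the 5m write rate.
      cw5m = entry.cache_create;
    }
  }
  return { cw1h, cw5m };
}

console.log(splitCacheWrites({ cache_create: 1000, eph_1h: 200, eph_5m: 0 }));
// { cw1h: 1000, cw5m: 0 }
console.log(splitCacheWrites({ cache_create: 1000, eph_1h: 0, eph_5m: 0 }));
// { cw1h: 0, cw5m: 1000 }
```

The `eph_5m > 0` branch and the no-signal branch in the diff produce the same assignment, so the sketch merges them. The original keeps them separate for the sake of the comments.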

package/tools/quota-statusline.sh CHANGED

@@ -59,6 +59,16 @@ try:
     if ttl:
         if ttl == '5m':
             label += ' | \033[31mTTL:5m\033[0m'  # red
+            # When on 5m tier, show the cold-rebuild size so users know
+            # the cost of idling past 5 minutes
+            cache_cr = qs.get('cache', {}).get('cache_creation', 0)
+            cache_rd = qs.get('cache', {}).get('cache_read', 0)
+            prefix = cache_cr + cache_rd
+            if prefix > 0:
+                if prefix >= 1_000_000:
+                    label += ' \033[31m\u26A0 idle >5m = {:.1f}M rebuild\033[0m'.format(prefix / 1_000_000)
+                else:
+                    label += ' \033[31m\u26A0 idle >5m = {:.0f}K rebuild\033[0m'.format(prefix / 1_000)
         else:
             label += ' | TTL:' + ttl
     if hit and hit != 'N/A':
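
The rebuild warning added in this hunk sums cache-creation and cache-read tokens, then formats the total in K or M. A rough JavaScript transliteration of that formatting, written in the package's main language rather than the script's embedded Python, is sketched below. `formatRebuild` is a hypothetical name, and the ANSI colour codes and warning glyph are left out for readability.

```javascript
// Hypothetical helper mirroring the statusline's rebuild-size label.
// Returns "" when there is no cached prefix to rebuild.
function formatRebuild(cacheCreation, cacheRead) {
  const prefix = cacheCreation + cacheRead;
  if (prefix <= 0) return "";
  if (prefix >= 1_000_000) {
    return `idle >5m = ${(prefix / 1_000_000).toFixed(1)}M rebuild`;
  }
  return `idle >5m = ${(prefix / 1_000).toFixed(0)}K rebuild`;
}

console.log(formatRebuild(500_000, 1_000_000)); // idle >5m = 1.5M rebuild
console.log(formatRebuild(50_000, 200_000));    // idle >5m = 250K rebuild
```

`toFixed` and Python's `format` can round exact halves differently, so values on a rounding boundary may differ by one unit from the script's output.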