claude-code-cache-fix 1.6.4 → 1.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +72 -0
package/claude-fixed.bat +22 -0
package/package.json +3 -2
package/preload.mjs +44 -0
package/tools/cost-report.mjs +31 -0
package/tools/cross-version-cache-test.sh +304 -0
package/tools/sim-cost-reconcile.sh +60 -0
package/tools/usage-to-dashboard-ndjson.mjs +352 -0

package/README.md CHANGED

@@ -70,6 +70,31 @@ NODE_OPTIONS="--import claude-code-cache-fix" claude
 > **Note**: This only works if `claude` points to the npm/Node installation. The standalone binary uses a different execution path that bypasses Node.js preloads.
+### Windows users
+On Windows, `NODE_OPTIONS="--import ..."` doesn't work the same way as on Linux/macOS. Use the included `claude-fixed.bat` wrapper instead:
+1. After installing both packages globally:
+   ```bat
+   npm install -g claude-code-cache-fix
+   npm install -g @anthropic-ai/claude-code
+   ```
+2. Copy `claude-fixed.bat` from this package to a directory in your PATH (e.g., `C:\Users\<you>\bin\`):
+   ```bat
+   copy "%NPM_ROOT%\claude-code-cache-fix\claude-fixed.bat" C:\Users\%USERNAME%\bin\
+   ```
+   Or find the file manually at your npm global root (run `npm root -g` to locate it).
+3. Run Claude Code with the interceptor active:
+   ```bat
+   claude-fixed [any claude args...]
+   ```
+The wrapper dynamically resolves your npm global root, constructs a `file:///` URL for the preload module (converting backslashes to forward slashes for Node.js), and launches Claude Code with the interceptor loaded. All environment variables (`CACHE_FIX_DEBUG`, `CACHE_FIX_IMAGE_KEEP_LAST`, etc.) work the same as on Linux/macOS.
+Credit: [@TomTheMenace](https://github.com/anthropics/claude-code/issues/38335) contributed the Windows wrapper and validated the interceptor across a 7.5-hour, 536-call Opus 4.6 session on Windows — 98.4% cache hit rate, 81% of calls had fingerprint instability that the interceptor corrected.
 ## How it works
 The module intercepts `globalThis.fetch` before Claude Code makes API calls to `/v1/messages`. On each call it:
@@ -303,6 +328,51 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
 - **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
+- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring, and forced-restart/compaction detection. Reads from Claude Code's native session JSONL files and optionally from an HTTP proxy NDJSON stream. v1.4.0 documented the forced-session-restart mechanism at quota-cap boundaries (~490K tokens per event) and the 78–91% cache-wipe pattern at compaction events. Complementary to our interceptor's in-process vantage point. See [Works with @fgrosswig's dashboard](#works-with-fgrosswigs-dashboard) below for the interop pattern.
+## Works with @fgrosswig's dashboard
+This interceptor and [@fgrosswig](https://github.com/fgrosswig)'s
+[claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)
+solve strongly complementary problems. The interceptor captures per-call API
+data from inside the Node.js process — cache metrics, quota state, TTL tier,
+rewrites applied. The dashboard provides the visualization layer — historical
+trending, per-day charts, multi-host aggregation, cache-health scoring.
+Running both gives you the best of both tools, and the integration is a
+one-liner thanks to the dashboard's tolerant NDJSON ingest and our new
+`usage-to-dashboard-ndjson` translator.
+### Quick setup
+```bash
+# Install both tools
+npm install -g claude-code-cache-fix
+# (follow fgrosswig's dashboard install: https://github.com/fgrosswig/claude-usage-dashboard)
+# One-shot translation (reads ~/.claude/usage.jsonl, writes to
+# ~/.claude/anthropic-proxy-logs/proxy-YYYY-MM-DD.ndjson, which his
+# dashboard already watches)
+node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs
+# Or keep it live-updating as the interceptor logs new calls
+node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs --follow &
+```
+No configuration required on the dashboard side — fgrosswig's
+`collectProxyNdjsonFiles()` auto-discovers files in
+`~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`), and our
+translator writes to exactly that path with the expected `proxy-YYYY-MM-DD.ndjson`
+filename convention. The dashboard's tolerant ingestion layer ignores unknown
+fields, so interceptor-specific extras (`ttl_tier`, `ephemeral_1h_input_tokens`,
+`ephemeral_5m_input_tokens`, `peak_hour`, quota state) pass through cleanly
+and remain available to downstream consumers that know to read them.
+The `cost_factor` metric in `tools/cost-report.mjs` also comes from
+fgrosswig's methodology — the `(input + output + cache_read + cache_creation) / output`
+ratio that gives a single-number measure of how much context is being paid
+per useful output token. A rising cost factor across a long session is the
+measurable signature of cache-efficiency degradation.
 ## Used in production
@@ -316,6 +386,8 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
 - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
 - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
+- **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
+- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
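
The interop section added to the README above says the translator writes to `~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`) using the `proxy-YYYY-MM-DD.ndjson` filename convention, one file per UTC day according to the tool's header comment. The sketch below shows how such a target path could be derived. The helper name `proxyLogPath` and its parameters are hypothetical, and plain `/` joining is a simplification; this is not the package's actual code.

```javascript
// Hypothetical helper (not part of the package): work out which NDJSON file
// a record with the given timestamp would be appended to.
function proxyLogPath(timestamp, env, homeDir) {
  // The environment override mirrors ANTHROPIC_PROXY_LOG_DIR from the README.
  const dir = env.ANTHROPIC_PROXY_LOG_DIR || `${homeDir}/.claude/anthropic-proxy-logs`;
  // toISOString() always reports UTC, so its first 10 characters are the UTC date.
  const day = new Date(timestamp).toISOString().slice(0, 10);
  return `${dir}/proxy-${day}.ndjson`;
}

console.log(proxyLogPath("2026-04-11T23:59:30Z", {}, "/home/me"));
```

Because the date is taken in UTC, a call logged late in the evening in a timezone west of UTC lands in the next day's file.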

package/claude-fixed.bat ADDED

@@ -0,0 +1,22 @@
+@echo off
+REM claude-fixed.bat — Windows wrapper for Claude Code with cache-fix interceptor.
+REM
+REM Resolves the npm global root dynamically, constructs a file:/// URL for the
+REM preload module (converting backslashes to forward slashes for Node.js), and
+REM launches Claude Code with the interceptor active.
+REM
+REM Usage:
+REM   claude-fixed [any claude args...]
+REM
+REM Prerequisites:
+REM   npm install -g claude-code-cache-fix
+REM   npm install -g @anthropic-ai/claude-code
+REM
+REM Save this file somewhere in your PATH (e.g. C:\Users\<you>\bin\claude-fixed.bat).
+REM
+REM Credit: @TomTheMenace (https://github.com/anthropics/claude-code/issues/38335)
+REM Part of claude-code-cache-fix: https://github.com/cnighswonger/claude-code-cache-fix
+for /f "delims=" %%G in ('npm root -g') do set "NPM_GLOBAL=%%G"
+set NODE_OPTIONS=--import file:///%NPM_GLOBAL:\=/%/claude-code-cache-fix/preload.mjs
+node "%NPM_GLOBAL%\@anthropic-ai\claude-code\cli.js" %*
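
The last three lines of the wrapper do the real work: they take the npm global root, swap backslashes for forward slashes, and build a `file:///` URL for `NODE_OPTIONS`. The same transformation can be sketched in JavaScript to make it easier to check by hand. The function name `buildWrapperCommand` and the example path are hypothetical. Like the batch file, this sketch does not percent-encode spaces or other special characters in the path.

```javascript
// Hypothetical re-statement of the batch wrapper's string handling.
function buildWrapperCommand(npmGlobalRoot) {
  // Equivalent of %NPM_GLOBAL:\=/% in the batch file.
  const forwardSlashed = npmGlobalRoot.replace(/\\/g, "/");
  return {
    nodeOptions: `--import file:///${forwardSlashed}/claude-code-cache-fix/preload.mjs`,
    // The CLI entry point keeps native backslashes, as in the batch file.
    cliPath: `${npmGlobalRoot}\\@anthropic-ai\\claude-code\\cli.js`,
  };
}

console.log(buildWrapperCommand("C:\\Users\\me\\npm\\node_modules"));
```

For paths that may contain spaces, Node's built-in `url.pathToFileURL()` produces a correctly encoded URL and may be a safer basis than manual string replacement.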

package/package.json CHANGED

@@ -1,13 +1,14 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.6.4",
+  "version": "1.7.1",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",
   "main": "./preload.mjs",
   "files": [
     "preload.mjs",
-    "tools/"
+    "tools/",
+    "claude-fixed.bat"
   ],
   "engines": {
     "node": ">=18"

package/preload.mjs CHANGED

@@ -1009,6 +1009,30 @@ globalThis.fetch = async function (url, options) {
         monitorContextDegradation(payload.messages);
       }
+      // Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
+      // when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
+      // analysis and for understanding which tools contribute prefix bloat. First used
+      // during the 2026-04-11 cross-version regression investigation.
+      if (process.env.CACHE_FIX_DUMP_TOOLS && payload.tools) {
+        try {
+          const dumpPath = process.env.CACHE_FIX_DUMP_TOOLS;
+          const dump = {
+            timestamp: new Date().toISOString(),
+            tool_count: payload.tools.length,
+            tools: payload.tools.map(t => ({
+              name: t.name,
+              description: t.description || "",
+              desc_chars: (t.description || "").length,
+              schema_chars: JSON.stringify(t.input_schema || {}).length,
+              total_chars: JSON.stringify(t).length,
+            })),
+            system_chars: JSON.stringify(payload.system || "").length,
+            total_tools_chars: JSON.stringify(payload.tools).length,
+          };
+          writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
+        } catch (e) { debugLog("DUMP ERROR:", e?.message); }
+      }
       // Prompt size measurement — log system prompt, tools, and injected block sizes
       if (DEBUG && payload.system && payload.tools && payload.messages) {
         const sysChars = JSON.stringify(payload.system).length;
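
The hunk above writes a per-tool size dump when `CACHE_FIX_DUMP_TOOLS` is set, and its comment mentions per-version tool-schema drift analysis. One way such dumps might be compared is sketched below. The helper `diffToolDumps` is hypothetical and not part of the package; it only assumes the `tools[].name` and `tools[].total_chars` fields that the dump code emits.

```javascript
// Hypothetical helper: list tools whose serialized size differs between two dumps.
function diffToolDumps(before, after) {
  const sizeByName = (dump) => new Map(dump.tools.map((t) => [t.name, t.total_chars]));
  const oldSizes = sizeByName(before);
  const newSizes = sizeByName(after);
  const names = new Set([...oldSizes.keys(), ...newSizes.keys()]);
  const changes = [];
  for (const name of [...names].sort()) {
    // A tool missing from one dump is treated as size 0 on that side.
    const oldSize = oldSizes.get(name) ?? 0;
    const newSize = newSizes.get(name) ?? 0;
    if (oldSize !== newSize) {
      changes.push({ name, oldSize, newSize, delta: newSize - oldSize });
    }
  }
  return changes;
}

// Illustrative data only; real dumps come from the files written by the interceptor.
const older = { tools: [{ name: "Bash", total_chars: 100 }, { name: "Read", total_chars: 50 }] };
const newer = { tools: [{ name: "Bash", total_chars: 120 }, { name: "Read", total_chars: 50 }, { name: "Edit", total_chars: 30 }] };
console.log(diffToolDumps(older, newer));
```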
@@ -1061,6 +1085,25 @@ globalThis.fetch = async function (url, options) {
       const status = response.headers.get("anthropic-ratelimit-unified-status");
       const overage = response.headers.get("anthropic-ratelimit-unified-overage-status");
+      // Capture ALL anthropic-* and request-id/cf-ray response headers.
+      // Pattern borrowed from @fgrosswig's claude-usage-dashboard proxy:
+      //   https://github.com/fgrosswig/claude-usage-dashboard
+      // Widening beyond the specific unified-ratelimit headers above future-proofs
+      // us against Anthropic adding new headers (e.g. experimental rollout flags,
+      // region hints, new quota dimensions) without needing code changes.
+      const allAnthropicHeaders = {};
+      for (const [name, value] of response.headers.entries()) {
+        const lower = name.toLowerCase();
+        if (
+          lower.startsWith("anthropic-") ||
+          lower === "request-id" ||
+          lower === "x-request-id" ||
+          lower === "cf-ray"
+        ) {
+          allAnthropicHeaders[lower] = value;
+        }
+      }
       if (h5 || h7d) {
         const quotaFile = join(homedir(), ".claude", "quota-status.json");
         let quota = {};
@@ -1070,6 +1113,7 @@ globalThis.fetch = async function (url, options) {
         quota.seven_day = h7d ? { utilization: parseFloat(h7d), pct: Math.round(parseFloat(h7d) * 100), resets_at: reset7d ? parseInt(reset7d) : null } : quota.seven_day;
         quota.status = status || null;
         quota.overage_status = overage || null;
+        quota.all_headers = allAnthropicHeaders;
         // Peak hour detection — Anthropic applies higher quota drain rate during
         // weekday peak hours: 13:00–19:00 UTC (Mon–Fri).
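
The header-capture loop added in this file keeps every response header that starts with `anthropic-`, plus `request-id`, `x-request-id`, and `cf-ray`. The filter is restated below as a standalone function so it can be tried outside the interceptor. The name `pickDiagnosticHeaders` is hypothetical and the header values are placeholders, not real server output.

```javascript
// Hypothetical standalone version of the header filter in the hunk above.
// Accepts any iterable of [name, value] pairs, such as response.headers.entries().
function pickDiagnosticHeaders(headerEntries) {
  const exactNames = new Set(["request-id", "x-request-id", "cf-ray"]);
  const picked = {};
  for (const [name, value] of headerEntries) {
    const lower = name.toLowerCase();
    if (lower.startsWith("anthropic-") || exactNames.has(lower)) {
      picked[lower] = value;
    }
  }
  return picked;
}

// Placeholder values for illustration only.
console.log(pickDiagnosticHeaders([
  ["Anthropic-Ratelimit-Unified-Status", "placeholder-status"],
  ["Content-Type", "application/json"],
  ["CF-Ray", "placeholder-ray-id"],
]));
```

Keys are stored lower-cased, so downstream readers of `quota.all_headers` can look names up without worrying about case.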

package/tools/cost-report.mjs CHANGED

@@ -484,6 +484,12 @@ function printJsonReport(results, summary, ratesData, adminSummary) {
       total_cost: summary.totalCost,
       avg_cost_per_call: summary.totalCost / summary.calls,
       tokens: summary.totals,
+      cost_factor: (function () {
+        // fgrosswig-style overhead ratio: gross tokens / output tokens
+        const gross = summary.totals.input + summary.totals.output +
+                      summary.totals.cache_read + summary.totals.cache_1h + summary.totals.cache_5m;
+        return summary.totals.output > 0 ? gross / summary.totals.output : null;
+      })(),
       by_model: summary.byModel,
       degradation: summary.degradedCalls > 0 ? {
         degraded_calls: summary.degradedCalls,
@@ -544,6 +550,15 @@ function printMarkdownReport(results, summary, ratesData, adminSummary) {
   lines.push(`| Total cache write 5m | ${fmt(summary.totals.cache_5m)} |`);
   lines.push(`| **Total cost** | **${fmtCost(summary.totalCost)}** |`);
   lines.push(`| Avg cost per call | ${fmtCost(summary.totalCost / summary.calls)} |`);
+  {
+    // Cost factor: popularized by @fgrosswig's claude-usage-dashboard
+    // (https://github.com/fgrosswig/claude-usage-dashboard)
+    const grossTokens = summary.totals.input + summary.totals.output +
+                        summary.totals.cache_read + summary.totals.cache_1h + summary.totals.cache_5m;
+    if (summary.totals.output > 0) {
+      lines.push(`| Cost factor (tokens/output) | ${(grossTokens / summary.totals.output).toFixed(1)}× |`);
+    }
+  }
   lines.push('');
   // By model
@@ -680,6 +695,22 @@ function printTextReport(results, summary, ratesData, adminSummary) {
       }
     }
   }
+  // ── Cost factor (overhead ratio) ──
+  // Credit: this metric was popularized by @fgrosswig's claude-usage-dashboard
+  // (https://github.com/fgrosswig/claude-usage-dashboard). It divides total
+  // tokens processed (input + output + cache_read + cache_creation) by useful
+  // output tokens, giving a single-number "how much context am I carrying
+  // per useful word of output" multiplier. Values climb over long sessions
+  // due to resume/compaction cycles; a rising curve is a signal that cache
+  // efficiency is degrading.
+  const totalCacheCreate = summary.totals.cache_1h + summary.totals.cache_5m;
+  const grossTokens = summary.totals.input + summary.totals.output +
+                      summary.totals.cache_read + totalCacheCreate;
+  if (summary.totals.output > 0) {
+    const costFactor = grossTokens / summary.totals.output;
+    console.log(`  Cost factor:           ${costFactor.toFixed(1)}× (tokens/output)`);
+  }
   console.log('');
   // ── Degradation ──
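
All three report formats in this file compute the same ratio: gross tokens (input, output, cache read, and both cache-write tiers) divided by output tokens. A worked example may make the number easier to interpret. The helper `costFactor` below is a hypothetical consolidation of that arithmetic, and the token counts are invented for illustration.

```javascript
// Hypothetical helper mirroring the cost-factor arithmetic in the hunks above.
function costFactor(totals) {
  const cacheCreation = totals.cache_1h + totals.cache_5m;
  const gross = totals.input + totals.output + totals.cache_read + cacheCreation;
  // Same guard as the JSON report: no output tokens means no meaningful ratio.
  return totals.output > 0 ? gross / totals.output : null;
}

// Invented totals: 1,000 input + 500 output + 8,000 cache read + 500 cache write
// gives 10,000 gross tokens for 500 output tokens.
const example = { input: 1000, output: 500, cache_read: 8000, cache_1h: 400, cache_5m: 100 };
console.log(costFactor(example)); // 20
```

A factor of 20 here means 20 tokens were processed for every output token produced.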

package/tools/cross-version-cache-test.sh ADDED

@@ -0,0 +1,304 @@
+#!/usr/bin/env bash
+# cross-version-cache-test — replicable cache-behavior test across installed Claude Code versions.
+#
+# What it tests:
+#   Phase A (always): per-version steady-state cache behavior via 5 sequential Haiku -p calls,
+#                     fired within seconds of each other. Captures:
+#                       - Turn 1 prefix size (cache_creation cold start)
+#                       - Turns 2-5 cache hit stability (should be ~100% cache_read if TTL holds)
+#                       - Per-turn q5h_pct delta
+#                       - TTL tier granted by server
+#   Phase B (optional, --include-idle): per-version idle-gap behavior via two calls 6 minutes apart.
+#                     Captures whether the 1h TTL grant holds across a >5-minute idle, or whether
+#                     the server flips to 5m tier and forces a rebuild.
+#
+# Safety:
+#   - Uses Haiku exclusively (~$0.006/call at Haiku 4.5 rates; full test at ~30 calls = ~$0.20)
+#   - No deliberate quota burn; exits gracefully if Q5h > 80% at start
+#   - Runs against fixed seed prompt to keep per-call overhead minimal
+#   - Does not trigger overage, does not pin quota state for the session
+#
+# Usage:
+#   ./cross-version-cache-test.sh                       # Phase A only, quick
+#   ./cross-version-cache-test.sh --include-idle        # Phase A + Phase B (takes ~25 minutes)
+#   ./cross-version-cache-test.sh --output /some/path   # Custom output dir
+#
+# Output:
+#   /tmp/cross-version-test-YYYYMMDD-HHMMSS/ (default) containing:
+#     - <version>-phase-a.jsonl    # one usage.jsonl record per call
+#     - <version>-phase-b.jsonl    # optional, only with --include-idle
+#     - summary.md                 # tabulated comparison across versions
+#     - raw-quota-status-*.json    # quota state snapshots
+#
+# Part of claude-code-cache-fix. Requires:
+#   - ~/bin/cc-version launcher (see repo)
+#   - Installed versions at ~/cc-versions/<version>/ (this script checks and warns)
+#   - Interceptor active (the script verifies usage.jsonl grows per call)
+#
+# First created 2026-04-11 for the March 23 regression investigation follow-up.
+set -euo pipefail
+# ─── Configuration ──────────────────────────────────────────────────────────
+VERSIONS=(2.1.81 2.1.83 2.1.90 2.1.101)
+STEADY_STATE_TURNS=5
+IDLE_GAP_SECONDS=360  # 6 minutes, crosses the 5m TTL boundary
+SEED_PROMPT='Reply with exactly: ok'
+MODEL='haiku'
+# ─── CLI parsing ────────────────────────────────────────────────────────────
+INCLUDE_IDLE=0
+OUTPUT_DIR=""
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --include-idle) INCLUDE_IDLE=1; shift ;;
+        --output)       OUTPUT_DIR="$2"; shift 2 ;;
+        -h|--help)
+            sed -n '3,34p' "$0" | sed 's/^# \?//'
+            exit 0
+            ;;
+        *)
+            echo "unknown flag: $1" >&2
+            exit 1
+            ;;
+    esac
+done
+# Default output dir
+if [[ -z "$OUTPUT_DIR" ]]; then
+    OUTPUT_DIR="/tmp/cross-version-test-$(date +%Y%m%d-%H%M%S)"
+fi
+mkdir -p "$OUTPUT_DIR"
+SUMMARY="$OUTPUT_DIR/summary.md"
+echo "# Cross-Version Cache Test — $(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$SUMMARY"
+echo "" >> "$SUMMARY"
+echo "Output directory: \`$OUTPUT_DIR\`" >> "$SUMMARY"
+echo "" >> "$SUMMARY"
+# ─── Preflight ──────────────────────────────────────────────────────────────
+echo "=== Cross-version cache test ===" | tee -a "$SUMMARY"
+# Check launcher
+if [[ ! -x "$HOME/bin/cc-version" ]]; then
+    echo "ERROR: $HOME/bin/cc-version not found or not executable" >&2
+    exit 1
+fi
+# Check installed versions
+for v in "${VERSIONS[@]}"; do
+    if [[ ! -f "$HOME/cc-versions/$v/node_modules/@anthropic-ai/claude-code/cli.js" ]]; then
+        echo "ERROR: v$v not installed at ~/cc-versions/$v — run the install snippet in docs/march-23-regression-investigation.md" >&2
+        exit 1
+    fi
+done
+# Quota safety check — abort if Q5h is already high
+Q5H=$(python3 -c "
+import json
+try:
+    q = json.load(open('$HOME/.claude/quota-status.json'))
+    print(q['five_hour']['pct'])
+except Exception:
+    print(0)
+" 2>/dev/null || echo 0)
+if [[ "$Q5H" -gt 80 ]]; then
+    echo "ABORT: Q5h is at ${Q5H}% — too close to cap. Test deferred." | tee -a "$SUMMARY"
+    exit 2
+fi
+echo "Preflight OK: Q5h at ${Q5H}%, 4 versions installed, launcher present." | tee -a "$SUMMARY"
+echo "" | tee -a "$SUMMARY"
+# Snapshot quota state at start
+cp "$HOME/.claude/quota-status.json" "$OUTPUT_DIR/raw-quota-status-start.json" 2>/dev/null || true
+# ─── Phase A: steady-state per version ─────────────────────────────────────
+echo "## Phase A — Steady-state" | tee -a "$SUMMARY"
+echo "" | tee -a "$SUMMARY"
+echo "5 sequential Haiku calls per version, fired in quick succession (<30s gap each)." | tee -a "$SUMMARY"
+echo "" | tee -a "$SUMMARY"
+for v in "${VERSIONS[@]}"; do
+    echo "--- Phase A: v$v ---"
+    OUTFILE="$OUTPUT_DIR/$v-phase-a.jsonl"
+    : > "$OUTFILE"
+    for i in $(seq 1 "$STEADY_STATE_TURNS"); do
+        USAGE_LINES_BEFORE=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        echo "$SEED_PROMPT" | "$HOME/bin/cc-version" "$v" -p --model "$MODEL" > /dev/null 2>&1 || {
+            echo "WARNING: v$v turn $i failed" | tee -a "$SUMMARY"
+            continue
+        }
+        USAGE_LINES_AFTER=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        if [[ "$USAGE_LINES_AFTER" -gt "$USAGE_LINES_BEFORE" ]]; then
+            # Capture the newly-added usage.jsonl line(s) for this version
+            tail -n "$((USAGE_LINES_AFTER - USAGE_LINES_BEFORE))" "$HOME/.claude/usage.jsonl" >> "$OUTFILE"
+        fi
+        # Tiny sleep to let the interceptor finish writing the telemetry
+        sleep 0.5
+    done
+    TURNS_CAPTURED=$(wc -l < "$OUTFILE")
+    echo "  v$v: $TURNS_CAPTURED turns captured → $OUTFILE"
+done
+echo "" | tee -a "$SUMMARY"
+# ─── Phase B: idle-gap (optional) ──────────────────────────────────────────
+if [[ "$INCLUDE_IDLE" -eq 1 ]]; then
+    echo "## Phase B — Idle-gap behavior" | tee -a "$SUMMARY"
+    echo "" | tee -a "$SUMMARY"
+    echo "Per version: turn 1, wait ${IDLE_GAP_SECONDS}s (crosses 5m TTL), turn 2." | tee -a "$SUMMARY"
+    echo "" | tee -a "$SUMMARY"
+    for v in "${VERSIONS[@]}"; do
+        echo "--- Phase B: v$v ---"
+        OUTFILE="$OUTPUT_DIR/$v-phase-b.jsonl"
+        : > "$OUTFILE"
+        # Turn 1
+        USAGE_LINES_BEFORE=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        echo "$SEED_PROMPT" | "$HOME/bin/cc-version" "$v" -p --model "$MODEL" > /dev/null 2>&1 || true
+        USAGE_LINES_AFTER=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        if [[ "$USAGE_LINES_AFTER" -gt "$USAGE_LINES_BEFORE" ]]; then
+            tail -n "$((USAGE_LINES_AFTER - USAGE_LINES_BEFORE))" "$HOME/.claude/usage.jsonl" >> "$OUTFILE"
+        fi
+        echo "  v$v: turn 1 done, waiting ${IDLE_GAP_SECONDS}s..."
+        sleep "$IDLE_GAP_SECONDS"
+        # Turn 2
+        USAGE_LINES_BEFORE=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        echo "$SEED_PROMPT" | "$HOME/bin/cc-version" "$v" -p --model "$MODEL" > /dev/null 2>&1 || true
+        USAGE_LINES_AFTER=$(wc -l < "$HOME/.claude/usage.jsonl" 2>/dev/null || echo 0)
+        if [[ "$USAGE_LINES_AFTER" -gt "$USAGE_LINES_BEFORE" ]]; then
+            tail -n "$((USAGE_LINES_AFTER - USAGE_LINES_BEFORE))" "$HOME/.claude/usage.jsonl" >> "$OUTFILE"
+        fi
+        echo "  v$v: turn 2 done"
+    done
+    echo "" | tee -a "$SUMMARY"
+fi
+# Snapshot quota state at end
+cp "$HOME/.claude/quota-status.json" "$OUTPUT_DIR/raw-quota-status-end.json" 2>/dev/null || true
+# ─── Analysis ──────────────────────────────────────────────────────────────
+echo "## Phase A Results" >> "$SUMMARY"
+echo "" >> "$SUMMARY"
+python3 <<EOF >> "$SUMMARY"
+import json, os
+output_dir = "$OUTPUT_DIR"
+versions = ["2.1.81", "2.1.83", "2.1.90", "2.1.101"]
+include_idle = $INCLUDE_IDLE
+def load_jsonl(path):
+    if not os.path.exists(path):
+        return []
+    rows = []
+    with open(path) as f:
+        for line in f:
+            line = line.strip()
+            if line:
+                try:
+                    rows.append(json.loads(line))
+                except Exception:
+                    pass
+    return rows
+# Phase A steady-state table
+print("### Per-version per-turn usage (Phase A)")
+print("")
+print("| Version | Turn | cc (creation) | cr (read) | prefix | out | ttl | q5h% |")
+print("|---|---:|---:|---:|---:|---:|---|---:|")
+for v in versions:
+    rows = load_jsonl(os.path.join(output_dir, f"{v}-phase-a.jsonl"))
+    for i, r in enumerate(rows, 1):
+        cc = r.get("cache_creation_input_tokens", 0)
+        cr = r.get("cache_read_input_tokens", 0)
+        prefix = cc + cr
+        out = r.get("output_tokens", 0)
+        ttl = r.get("ttl_tier", "?")
+        q5h = r.get("q5h_pct", "?")
+        print(f"| v{v} | {i} | {cc:>6,} | {cr:>6,} | {prefix:>6,} | {out:>3} | {ttl} | {q5h}% |")
+print("")
+# Steady-state summary: turn-2-onwards averages
+print("### Steady-state averages (turns 2-5)")
+print("")
+print("| Version | avg prefix | avg cc | avg cr | cache hit rate | Turn 1 cold cc | q5h delta turn 1→5 |")
+print("|---|---:|---:|---:|---:|---:|---:|")
+for v in versions:
+    rows = load_jsonl(os.path.join(output_dir, f"{v}-phase-a.jsonl"))
+    if len(rows) < 2:
+        print(f"| v{v} | (insufficient data) | | | | | |")
+        continue
+    turn1 = rows[0]
+    tail = rows[1:]
+    avg_prefix = sum((r.get("cache_creation_input_tokens",0) + r.get("cache_read_input_tokens",0)) for r in tail) / len(tail)
+    avg_cc = sum(r.get("cache_creation_input_tokens",0) for r in tail) / len(tail)
+    avg_cr = sum(r.get("cache_read_input_tokens",0) for r in tail) / len(tail)
+    hit_rate = avg_cr / avg_prefix if avg_prefix > 0 else 0
+    q5h_start = rows[0].get("q5h_pct", 0)
+    q5h_end = rows[-1].get("q5h_pct", 0)
+    q5h_delta = (q5h_end - q5h_start) if isinstance(q5h_start, (int, float)) and isinstance(q5h_end, (int, float)) else "?"
+    print(f"| v{v} | {avg_prefix:>7,.0f} | {avg_cc:>6,.0f} | {avg_cr:>6,.0f} | {hit_rate*100:.1f}% | {turn1.get('cache_creation_input_tokens',0):>7,} | {q5h_delta}% |")
+print("")
+if include_idle:
+    print("## Phase B Results (idle-gap behavior)")
+    print("")
+    print("| Version | Turn 1 prefix | Turn 1 ttl | idle (s) | Turn 2 cc | Turn 2 cr | Turn 2 ttl | rebuilt? |")
+    print("|---|---:|---|---:|---:|---:|---|:---:|")
+    for v in versions:
+        rows = load_jsonl(os.path.join(output_dir, f"{v}-phase-b.jsonl"))
+        if len(rows) < 2:
+            print(f"| v{v} | (incomplete) | | | | | | |")
+            continue
+        t1, t2 = rows[0], rows[1]
+        t1_prefix = t1.get("cache_creation_input_tokens",0) + t1.get("cache_read_input_tokens",0)
+        t2_cc = t2.get("cache_creation_input_tokens",0)
+        t2_cr = t2.get("cache_read_input_tokens",0)
+        # Idle gap we configured
+        idle_s = $IDLE_GAP_SECONDS
+        # Rebuilt = turn 2 had substantial cache_creation relative to turn 1 prefix
+        rebuilt = "✗ expired" if t2_cc > (t1_prefix * 0.5) else "✓ warm"
+        print(f"| v{v} | {t1_prefix:>7,} | {t1.get('ttl_tier','?')} | {idle_s} | {t2_cc:>7,} | {t2_cr:>7,} | {t2.get('ttl_tier','?')} | {rebuilt} |")
+    print("")
+print("---")
+print("")
+print("*Generated by cross-version-cache-test.sh*")
+EOF
+echo ""
+echo "=== Test complete ==="
+echo "Summary written to: $SUMMARY"
+echo ""
+echo "Raw per-version JSONLs in: $OUTPUT_DIR"
+echo ""
+if [[ "$Q5H" -lt 50 ]]; then
+    NEW_Q5H=$(python3 -c "
+import json
+try:
+    print(json.load(open('$HOME/.claude/quota-status.json'))['five_hour']['pct'])
+except Exception:
+    print('?')
+" 2>/dev/null)
+    echo "Q5h at start: ${Q5H}%"
+    echo "Q5h at end:   ${NEW_Q5H}%"
+fi
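
The embedded Python analysis derives two numbers worth spelling out: the steady-state cache hit rate over turns 2 onward, and the Phase B "rebuilt" verdict, which flags a turn whose cache creation exceeds half of the first turn's prefix. The JavaScript sketch below restates both under the same field names the script reads from `usage.jsonl`. The function names and the sample rows are hypothetical.

```javascript
// Prefix size as the script defines it: cache creation plus cache read.
function prefixTokens(row) {
  return (row.cache_creation_input_tokens ?? 0) + (row.cache_read_input_tokens ?? 0);
}

// Hit rate over turns 2..N; returns null when fewer than two turns were captured.
function steadyStateHitRate(rows) {
  const tail = rows.slice(1);
  if (tail.length === 0) return null;
  const totalRead = tail.reduce((sum, r) => sum + (r.cache_read_input_tokens ?? 0), 0);
  const totalPrefix = tail.reduce((sum, r) => sum + prefixTokens(r), 0);
  return totalPrefix > 0 ? totalRead / totalPrefix : 0;
}

// Phase B heuristic: turn 2 rebuilt more than half of turn 1's prefix.
function rebuiltAfterIdle(turn1, turn2) {
  return (turn2.cache_creation_input_tokens ?? 0) > prefixTokens(turn1) * 0.5;
}

// Invented rows for illustration.
const rows = [
  { cache_creation_input_tokens: 12000, cache_read_input_tokens: 0 },
  { cache_creation_input_tokens: 0, cache_read_input_tokens: 12000 },
  { cache_creation_input_tokens: 600, cache_read_input_tokens: 11400 },
];
console.log(steadyStateHitRate(rows), rebuiltAfterIdle(rows[0], rows[1]));
```

Dividing summed reads by summed prefixes gives the same result as the script's ratio of averages, since both averages share the same turn count.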

package/tools/sim-cost-reconcile.sh ADDED

@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+# sim-cost-reconcile — One-liner wrapper for running cost-report.mjs against
+# a simulation log with admin API cross-reference enabled.
+#
+# Usage:
+#   sim-cost-reconcile <sim-dir-or-log> [extra cost-report.mjs args...]
+#
+# Examples:
+#   sim-cost-reconcile ~/git_repos/kanfei_test/kanfei-nowcast/.test_cache/simulations/realtime_sim_harnett_county_qlcs_2026_20260411_024836
+#   sim-cost-reconcile path/to/simulation.log --format md > report.md
+#
+# Reads admin key from $ANTHROPIC_ADMIN_KEY or ~/.config/anthropic/admin-key.
+# If no admin key is available, runs with telemetry only and warns.
+#
+# NOTE on admin reconciliation: the admin API returns data at 1h-bucket
+# resolution, so if multiple sims (or other API activity) overlap the same
+# hour, the admin total will include all of it. For an accurate multi-sim
+# aggregate, run this on each sim and sum the telemetry totals, then pull
+# the admin total once over the full window.
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+COST_REPORT="$SCRIPT_DIR/cost-report.mjs"
+if [[ $# -lt 1 ]]; then
+    echo "Usage: $(basename "$0") <sim-dir-or-log> [extra cost-report args...]" >&2
+    exit 1
+fi
+TARGET="$1"
+shift
+# Resolve a dir to its simulation.log
+if [[ -d "$TARGET" ]]; then
+    LOG="$TARGET/simulation.log"
+    if [[ ! -f "$LOG" ]]; then
+        echo "ERROR: no simulation.log in $TARGET" >&2
+        exit 1
+    fi
+elif [[ -f "$TARGET" ]]; then
+    LOG="$TARGET"
+else
+    echo "ERROR: $TARGET is neither a file nor a directory" >&2
+    exit 1
+fi
+# Load admin key
+ADMIN_KEY_FILE="${HOME}/.config/anthropic/admin-key"
+if [[ -n "${ANTHROPIC_ADMIN_KEY:-}" ]]; then
+    KEY="$ANTHROPIC_ADMIN_KEY"
+elif [[ -r "$ADMIN_KEY_FILE" ]]; then
+    KEY="$(cat "$ADMIN_KEY_FILE")"
+else
+    echo "WARNING: no admin key found ($ADMIN_KEY_FILE missing, ANTHROPIC_ADMIN_KEY unset)" >&2
+    echo "         running telemetry-only — pass --admin-key or set env var to enable reconciliation" >&2
+    exec node "$COST_REPORT" --sim-log "$LOG" "$@"
+fi
+exec node "$COST_REPORT" --sim-log "$LOG" --admin-key "$KEY" "$@"

package/tools/usage-to-dashboard-ndjson.mjs ADDED

@@ -0,0 +1,352 @@
+#!/usr/bin/env node
+/**
+ * usage-to-dashboard-ndjson — Translate claude-code-cache-fix's usage.jsonl
+ * into the proxy NDJSON format expected by @fgrosswig's claude-usage-dashboard,
+ * and write to the directory his dashboard already watches.
+ *
+ *   https://github.com/fgrosswig/claude-usage-dashboard
+ *
+ * # Why this exists
+ *
+ * Our interceptor and fgrosswig's dashboard are strongly complementary:
+ * the interceptor captures per-call API data from inside the Node.js process
+ * (cache metrics, quota state, request rewrites), while his dashboard
+ * provides visualization, historical trending, and multi-host aggregation.
+ *
+ * Rather than build our own visualization layer, we translate our per-call
+ * usage records into the NDJSON schema his dashboard ingests. A user running
+ * both tools gets the best of both: the interceptor fixes what it can fix
+ * and emits rich per-call data, and the dashboard displays that data
+ * alongside whatever Claude Code's own session JSONLs already capture.
+ *
+ * # What this tool does
+ *
+ * Reads `~/.claude/usage.jsonl` (our interceptor's per-call log) and
+ * translates each entry into a minimal-but-compatible record in the shape
+ * his dashboard expects under `~/.claude/anthropic-proxy-logs/*.ndjson`.
+ * The output file follows the convention `proxy-YYYY-MM-DD.ndjson`, one
+ * file per UTC day, matching the filename pattern his `collectProxyNdjsonFiles()`
+ * helper discovers.
+ *
+ * # Fields emitted
+ *
+ * Mapped from our usage.jsonl to fgrosswig's proxy-core.js shape:
+ *
+ *   {
+ *     "ts_start":  <our timestamp>,
+ *     "ts_end":    <our timestamp>,        // single-point, no duration
+ *     "duration_ms": null,                 // we don't measure this
+ *     "method":    "POST",
+ *     "path":      "/v1/messages",
+ *     "upstream_status": 200,              // implicit from usage presence
+ *     "usage": {
+ *       "input_tokens": <ours>,
+ *       "output_tokens": <ours>,
+ *       "cache_read_input_tokens": <ours>,
+ *       "cache_creation_input_tokens": <ours>
+ *     },
+ *     "cache_read_ratio": <computed>,
+ *     "cache_health":     "healthy" | "affected" | "mixed",
+ *     "request_hints":    { "model": <ours> },
+ *     "response_anthropic_headers": {      // if quota fields available
+ *       "anthropic-ratelimit-unified-5h-utilization": "<ours>",
+ *       "anthropic-ratelimit-unified-7d-utilization": "<ours>"
+ *     },
+ *     "ttl_tier":         <ours, interceptor-specific>,
+ *     "ephemeral_1h_input_tokens": <ours, interceptor-specific>,
+ *     "ephemeral_5m_input_tokens": <ours, interceptor-specific>,
+ *     "peak_hour":        <ours, interceptor-specific>,
+ *     "source": "claude-code-cache-fix",
+ *     "req_id": "ccf_<timestamp digits>_<last 6 chars of model>"  // synthetic dedup key
+ *   }
+ *
+ * Extra fields beyond fgrosswig's native schema (ttl_tier, ephemeral_*,
+ * peak_hour, source) are added for forward-compatibility. His dashboard
+ * ignores unknown fields per its tolerant-ingest design, and our own tooling
+ * downstream may find them useful when consuming the same NDJSON.
+ *
+ * # Usage
+ *
+ *   # One-shot translation (reads all of usage.jsonl, rewrites one file per UTC day found)
+ *   node tools/usage-to-dashboard-ndjson.mjs
+ *
+ *   # Follow mode (tail usage.jsonl, append new records as they arrive)
+ *   node tools/usage-to-dashboard-ndjson.mjs --follow
+ *
+ *   # Custom input/output paths
+ *   node tools/usage-to-dashboard-ndjson.mjs --input /path/to/usage.jsonl --output-dir /path/to/ndjson-dir
+ *
+ *   # Dry-run: print to stdout instead of writing files
+ *   node tools/usage-to-dashboard-ndjson.mjs --stdout
+ *
+ * # Environment
+ *
+ *   ANTHROPIC_PROXY_LOG_DIR  Override output directory (matches fgrosswig's
+ *                            dashboard env var so both tools stay in sync).
+ *
+ * Part of claude-code-cache-fix. MIT licensed.
+ *   https://github.com/cnighswonger/claude-code-cache-fix
+ */
+import { readFileSync, writeFileSync, appendFileSync, existsSync, mkdirSync, statSync, watch } from 'node:fs';
+import { join } from 'node:path';
+import { homedir } from 'node:os';
+// ─── CLI parsing ────────────────────────────────────────────────────────────
+function parseArgs() {
+  const args = process.argv.slice(2);
+  const opts = {
+    input: join(homedir(), '.claude', 'usage.jsonl'),
+    outputDir: process.env.ANTHROPIC_PROXY_LOG_DIR || join(homedir(), '.claude', 'anthropic-proxy-logs'),
+    stdout: false,
+    follow: false,
+    help: false,
+  };
+  for (let i = 0; i < args.length; i++) {
+    switch (args[i]) {
+      case '--input':      opts.input = args[++i]; break;
+      case '--output-dir': opts.outputDir = args[++i]; break;
+      case '--stdout':     opts.stdout = true; break;
+      case '--follow':     opts.follow = true; break;
+      case '-h':
+      case '--help':       opts.help = true; break;
+      default:
+        console.error(`unknown flag: ${args[i]}`);
+        opts.help = true;
+    }
+  }
+  return opts;
+}
+function printUsage() {
+  console.log(`usage-to-dashboard-ndjson — Translate cache-fix usage.jsonl to fgrosswig dashboard NDJSON.
+Usage:
+  node usage-to-dashboard-ndjson.mjs                 One-shot: read all, rewrite one file per UTC day
+  node usage-to-dashboard-ndjson.mjs --follow        Tail usage.jsonl, append new records live
+  node usage-to-dashboard-ndjson.mjs --stdout        Print NDJSON to stdout instead of files
+  node usage-to-dashboard-ndjson.mjs --input <path>  Custom input (default: ~/.claude/usage.jsonl)
+  node usage-to-dashboard-ndjson.mjs --output-dir <path>  Custom output dir (default: ~/.claude/anthropic-proxy-logs)
+Output files follow the convention: proxy-YYYY-MM-DD.ndjson (one per UTC day).
+Environment:
+  ANTHROPIC_PROXY_LOG_DIR  Override output directory (also used by fgrosswig's dashboard).
+Credit: this tool writes the NDJSON schema expected by @fgrosswig's
+claude-usage-dashboard (https://github.com/fgrosswig/claude-usage-dashboard).
+Running both tools together gives users per-call data from our interceptor
+plus the visualization layer from his dashboard, with no coordination needed.
+`);
+}
+// ─── Record translation ─────────────────────────────────────────────────────
+/**
+ * Translate one claude-code-cache-fix usage.jsonl record into a
+ * fgrosswig-dashboard-compatible NDJSON record. Returns null if the
+ * record doesn't have enough fields to be usable.
+ */
+function translateRecord(entry) {
+  if (!entry || !entry.timestamp || !entry.model) return null;
+  const inTok = entry.input_tokens || 0;
+  const outTok = entry.output_tokens || 0;
+  const crTok = entry.cache_read_input_tokens || 0;
+  const ccTok = entry.cache_creation_input_tokens || 0;
+  // Cache health (fgrosswig's semantic labels)
+  const totalCacheInput = crTok + ccTok;
+  const cacheReadRatio = totalCacheInput > 0 ? crTok / totalCacheInput : null;
+  let cacheHealth = 'na';
+  if (cacheReadRatio != null) {
+    if (cacheReadRatio >= 0.8) cacheHealth = 'healthy';
+    else if (cacheReadRatio < 0.4 && ccTok > 0) cacheHealth = 'affected';
+    else cacheHealth = 'mixed';
+  }
+  // Reconstruct a minimal response_anthropic_headers blob from the quota
+  // pct fields we captured. Not byte-identical to what the proxy would see
+  // on the wire, but structurally compatible for the dashboard's consumers.
+  const responseHeaders = {};
+  if (entry.q5h_pct != null) {
+    responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h_pct / 100);
+  }
+  if (entry.q7d_pct != null) {
+    responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d_pct / 100);
+  }
+  const rec = {
+    ts_start: entry.timestamp,
+    ts_end: entry.timestamp,
+    duration_ms: null,
+    method: 'POST',
+    path: '/v1/messages',
+    upstream_status: 200,
+    usage: {
+      input_tokens: inTok,
+      output_tokens: outTok,
+      cache_read_input_tokens: crTok,
+      cache_creation_input_tokens: ccTok,
+    },
+    cache_read_ratio: cacheReadRatio,
+    cache_health: cacheHealth,
+    request_hints: {
+      model: entry.model,
+    },
+    response_anthropic_headers: responseHeaders,
+    // Interceptor-specific extras — fgrosswig's dashboard ignores unknown
+    // fields, so these pass through without breaking ingestion.
+    ttl_tier: entry.ttl_tier || null,
+    ephemeral_1h_input_tokens: entry.ephemeral_1h_input_tokens || 0,
+    ephemeral_5m_input_tokens: entry.ephemeral_5m_input_tokens || 0,
+    peak_hour: entry.peak_hour || false,
+    source: 'claude-code-cache-fix',
+  };
+  // Synthesize a stable pseudo-request-id from timestamp + model for dedup
+  // at the dashboard layer. Not a real request ID — just a deterministic key.
+  rec.req_id = 'ccf_' + entry.timestamp.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
+  return rec;
+}
+// ─── File output ────────────────────────────────────────────────────────────
+function dayFileFor(outputDir, isoTimestamp) {
+  // proxy-YYYY-MM-DD.ndjson from UTC date
+  const date = isoTimestamp.slice(0, 10);
+  return join(outputDir, `proxy-${date}.ndjson`);
+}
+function ensureDir(dir) {
+  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
+}
+function writeRecords(records, outputDir, useStdout) {
+  if (useStdout) {
+    for (const r of records) {
+      process.stdout.write(JSON.stringify(r) + '\n');
+    }
+    return records.length;
+  }
+  ensureDir(outputDir);
+  // Group by day so each day file is written exactly once
+  const byDay = new Map();
+  for (const r of records) {
+    const day = dayFileFor(outputDir, r.ts_start);
+    if (!byDay.has(day)) byDay.set(day, []);
+    byDay.get(day).push(r);
+  }
+  for (const [dayFile, dayRecords] of byDay) {
+    const payload = dayRecords.map(r => JSON.stringify(r)).join('\n') + '\n';
+    // Overwrite rather than append: replaying the same input file yields
+    // identical day files, so one-shot runs are idempotent.
+    writeFileSync(dayFile, payload);
+  }
+  return records.length;
+}
+// ─── One-shot batch mode ────────────────────────────────────────────────────
+function runBatch(opts) {
+  if (!existsSync(opts.input)) {
+    console.error(`ERROR: input file not found: ${opts.input}`);
+    process.exit(1);
+  }
+  const raw = readFileSync(opts.input, 'utf8');
+  const lines = raw.split('\n').filter(l => l.trim());
+  const records = [];
+  let skipped = 0;
+  for (const line of lines) {
+    try {
+      const entry = JSON.parse(line);
+      const rec = translateRecord(entry);
+      if (rec) records.push(rec);
+      else skipped++;
+    } catch {
+      skipped++;
+    }
+  }
+  const written = writeRecords(records, opts.outputDir, opts.stdout);
+  if (!opts.stdout) {
+    console.error(`usage-to-dashboard-ndjson: wrote ${written} records to ${opts.outputDir} (${skipped} skipped)`);
+  }
+}
+// ─── Follow mode ────────────────────────────────────────────────────────────
+function runFollow(opts) {
+  if (!existsSync(opts.input)) {
+    console.error(`ERROR: input file not found: ${opts.input}`);
+    process.exit(1);
+  }
+  // First, catch up on the existing file (idempotent write)
+  runBatch(opts);
+  // Then watch for new entries
+  console.error(`usage-to-dashboard-ndjson: watching ${opts.input} for new records...`);
+  let lastSize = statSync(opts.input).size;
+  watch(opts.input, { persistent: true }, () => {
+    let currentSize;
+    try { currentSize = statSync(opts.input).size; } catch { return; }
+    if (currentSize <= lastSize) {
+      // File truncated or unchanged — rewind lastSize
+      if (currentSize < lastSize) lastSize = 0;
+      return;
+    }
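+    // Keep only the bytes appended since the last event. lastSize is a byte
+    // offset, so slice the raw Buffer before decoding; slicing the decoded
+    // string would misalign on multi-byte UTF-8 characters.
+    try {
+      const buf = readFileSync(opts.input);
+      const newContent = buf.subarray(lastSize).toString('utf8');
+      lastSize = buf.length;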
+      const newLines = newContent.split('\n').filter(l => l.trim());
+      const newRecs = [];
+      for (const line of newLines) {
+        try {
+          const entry = JSON.parse(line);
+          const rec = translateRecord(entry);
+          if (rec) newRecs.push(rec);
+        } catch {}
+      }
+      if (newRecs.length > 0) {
+        // Append each record to the day file matching its own timestamp
+        ensureDir(opts.outputDir);
+        for (const r of newRecs) {
+          const dayFile = dayFileFor(opts.outputDir, r.ts_start);
+          appendFileSync(dayFile, JSON.stringify(r) + '\n');
+        }
+        console.error(`[${new Date().toISOString()}] appended ${newRecs.length} records`);
+      }
+    } catch (err) {
+      console.error(`watch error: ${err.message}`);
+    }
+  });
+  // Keep the process alive
+  process.stdin.resume();
+}
+// ─── Main ───────────────────────────────────────────────────────────────────
+const opts = parseArgs();
+if (opts.help) {
+  printUsage();
+  process.exit(0);
+}
+if (opts.follow) {
+  runFollow(opts);
+} else {
+  runBatch(opts);
+}
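
The derived fields in `translateRecord()` are easier to check in isolation. The sketch below restates the cache-health thresholds and the day-file naming from the diff above as two standalone functions. The names `classifyCacheHealth` and `dayFileName` are hypothetical and do not exist in the package, and the sample timestamp is made up for illustration.

```javascript
// Hypothetical helpers; thresholds mirror translateRecord() in the diff above.
function classifyCacheHealth(cacheReadTokens, cacheCreationTokens) {
  const total = cacheReadTokens + cacheCreationTokens;
  // No cache traffic: the ratio is undefined, so report null and 'na'.
  if (!(total > 0)) return { ratio: null, health: 'na' };
  const ratio = cacheReadTokens / total;
  let health = 'mixed';
  if (ratio >= 0.8) health = 'healthy';
  else if (ratio < 0.4 && cacheCreationTokens > 0) health = 'affected';
  return { ratio, health };
}

// The day file is named from the first 10 characters (YYYY-MM-DD) of an
// ISO-8601 timestamp, which is assumed to already be in UTC.
function dayFileName(isoTimestamp) {
  return `proxy-${isoTimestamp.slice(0, 10)}.ndjson`;
}

console.log(classifyCacheHealth(900, 100).health); // healthy
console.log(classifyCacheHealth(100, 900).health); // affected
console.log(classifyCacheHealth(500, 500).health); // mixed
console.log(classifyCacheHealth(0, 0).health);     // na
console.log(dayFileName('2026-04-10T23:59:59.000Z')); // proxy-2026-04-10.ndjson
```

Because the file name comes from a string slice rather than a parsed date, a timestamp carrying a non-UTC offset would be bucketed by its local calendar date, not by its UTC day.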