npm - claude-code-cache-fix - Versions diffs - 1.6.3 → 1.7.0 - Mend

claude-code-cache-fix 1.6.3 → 1.7.0

Files changed (8)

package/README.md +71 -0
package/README.zh.md +5 -0
package/package.json +2 -2
package/preload.mjs +44 -0
package/tools/cost-report.mjs +31 -0
package/tools/quota-analysis.mjs +539 -0
package/tools/sim-cost-reconcile.sh +60 -0
package/tools/usage-to-dashboard-ndjson.mjs +352 -0

package/README.md CHANGED

@@ -217,6 +217,26 @@ node tools/cost-report.mjs --admin-key <key>  # cross-reference with Admin API
 Also works with any JSONL containing Anthropic usage fields (`--file`, stdin) — useful for SDK users and proxy setups. See `docs/cost-report.md` for full documentation.
+### Quota analysis (5-hour quota counting)
+The same `usage.jsonl` log can be analyzed to test how Anthropic's 5-hour quota is actually computed. Run the bundled tool:
+```bash
+node tools/quota-analysis.mjs              # analyze your default log
+node tools/quota-analysis.mjs --since 24h  # last 24 hours only
+node tools/quota-analysis.mjs --json       # machine-readable output
+```
+The tool answers three questions from your own data:
+1. **Does `cache_read` count toward your 5-hour quota?** Tests three hypotheses (cache_read costs 0x / 0.1x / 1x of input rate) and reports which one best explains your `q5h_pct` trajectory across reset windows. Lower coefficient of variation across windows = better fit.
+2. **Do peak hours cost more quota per token?** Splits windows into peak-dominant (≥80% peak calls) and off-peak-dominant (≤20%) and compares the implied 100% quota under the best-fit model.
+3. **What is your account's effective 5-hour quota in token-equivalents?** Reports a concrete number you can compare against your subscription tier or against what other users measure.
+Requires `q5h_pct`, `q7d_pct`, and `peak_hour` fields in usage.jsonl, which were added in v1.6.1 (2026-04-09). Older entries are silently filtered out.
+**Help us validate across accounts:** if you run this on your own log, please open an issue or PR on this repo with your output (or just the best-fit hypothesis name and your peak/off-peak ratio). Cross-validating across multiple accounts is the only way to distinguish per-account variance from real findings. Reference: [anthropics/claude-code#45756](https://github.com/anthropics/claude-code/issues/45756).
 ## Debug mode
 Enable debug logging to verify the fix is working:
@@ -283,14 +303,65 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
 - **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
+- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring, and forced-restart/compaction detection. Reads from Claude Code's native session JSONL files and optionally from an HTTP proxy NDJSON stream. v1.4.0 documented the forced-session-restart mechanism at quota-cap boundaries (~490K tokens per event) and the 78–91% cache-wipe pattern at compaction events. Complementary to our interceptor's in-process vantage point. See [Works with @fgrosswig's dashboard](#works-with-fgrosswigs-dashboard) below for the interop pattern.
+## Works with @fgrosswig's dashboard
+This interceptor and [@fgrosswig](https://github.com/fgrosswig)'s
+[claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)
+solve strongly complementary problems. The interceptor captures per-call API
+data from inside the Node.js process — cache metrics, quota state, TTL tier,
+rewrites applied. The dashboard provides the visualization layer — historical
+trending, per-day charts, multi-host aggregation, cache-health scoring.
+Running both gives you the best of both tools, and the integration is a
+one-liner thanks to the dashboard's tolerant NDJSON ingest and our new
+`usage-to-dashboard-ndjson` translator.
+### Quick setup
+```bash
+# Install both tools
+npm install -g claude-code-cache-fix
+# (follow fgrosswig's dashboard install: https://github.com/fgrosswig/claude-usage-dashboard)
+# One-shot translation (reads ~/.claude/usage.jsonl, writes to
+# ~/.claude/anthropic-proxy-logs/proxy-YYYY-MM-DD.ndjson, which his
+# dashboard already watches)
+node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs
+# Or keep it live-updating as the interceptor logs new calls
+node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs --follow &
+```
+No configuration required on the dashboard side — fgrosswig's
+`collectProxyNdjsonFiles()` auto-discovers files in
+`~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`), and our
+translator writes to exactly that path with the expected `proxy-YYYY-MM-DD.ndjson`
+filename convention. The dashboard's tolerant ingestion layer ignores unknown
+fields, so interceptor-specific extras (`ttl_tier`, `ephemeral_1h_input_tokens`,
+`ephemeral_5m_input_tokens`, `peak_hour`, quota state) pass through cleanly
+and remain available to downstream consumers that know to read them.
+The `cost_factor` metric in `tools/cost-report.mjs` also comes from
+fgrosswig's methodology — the `(input + output + cache_read + cache_creation) / output`
+ratio that gives a single-number measure of how much context is being paid
+per useful output token. A rising cost factor across a long session is the
+measurable signature of cache-efficiency degradation.
+## Used in production
+- **[Crunchloop DAP](https://dap.crunchloop.ai)** — Agent SDK / DAP development environment. First production team to merge the interceptor to trunk for team-wide deployment (2026-04-10). Identified two distinct cache regression patterns through real-world testing — tool ordering jitter and the fresh-session sort gap — and contributed debug traces that drove the v1.5.1 and v1.6.2 fixes.
 ## Contributors
 - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
+- **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk.
 - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
 - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
 - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
 - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
+- **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
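
The hypothesis test described in the "Quota analysis" hunk above (three candidate weights for `cache_read`, with the lowest coefficient of variation winning) can be sketched roughly as follows. This is an illustrative sketch, not the bundled tool: the function names, the window object shape, and the sample numbers are all hypothetical. The weights 1.0 / 5.0 / 2.0 are the ones stated in the `quota-analysis.mjs` header comment.

```javascript
// Sketch only. Window shape and sample numbers are made up for illustration.
// Weights follow the header comment of tools/quota-analysis.mjs.
const WEIGHTS = { input: 1.0, output: 5.0, cacheCreation: 2.0 };

// Extrapolate one window's token-equivalents to a full (100%) quota.
function impliedFullQuota(win, cacheReadWeight) {
  const equivalent =
    win.input * WEIGHTS.input +
    win.output * WEIGHTS.output +
    win.cacheCreation * WEIGHTS.cacheCreation +
    win.cacheRead * cacheReadWeight;
  return (equivalent * 100) / win.deltaPct;
}

// Sample coefficient of variation (stdev / mean). Assumption of this sketch:
// fewer than two values is treated as "no evidence" and returns Infinity.
function coefficientOfVariation(xs) {
  if (xs.length < 2) return Infinity;
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  if (mean === 0) return Infinity;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance) / mean;
}

// Pick the cache_read weight whose extrapolations agree best across windows.
function bestHypothesis(windows) {
  const hypotheses = { zero: 0.0, billed: 0.1, full: 1.0 };
  let best = null;
  for (const [name, weight] of Object.entries(hypotheses)) {
    const cv = coefficientOfVariation(windows.map(w => impliedFullQuota(w, weight)));
    if (best === null || cv < best.cv) best = { name, cv };
  }
  return best;
}

// Two hypothetical reset windows with very different cache_read volumes.
const sampleWindows = [
  { input: 1000, output: 200, cacheCreation: 500, cacheRead: 100000, deltaPct: 10 },
  { input: 2000, output: 400, cacheCreation: 1000, cacheRead: 10000, deltaPct: 20 },
];
console.log(bestHypothesis(sampleWindows));
```

In this made-up sample both windows extrapolate to the same full quota only when `cache_read` is weighted at zero, so that hypothesis has the lowest CV. Real logs will be noisier, which is why the README asks for cross-account comparison.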

package/README.zh.md CHANGED

@@ -277,9 +277,14 @@ CACHE_FIX_PREFIXDIFF=1 claude-fixed
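 - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction and token measurements
 - [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion of the `Output efficiency` system prompt change and its possible effect on model behavior
+## Used in production
+- **[Crunchloop DAP](https://dap.crunchloop.ai)** — Agent SDK / DAP development environment. First production team to merge this interceptor to trunk for team-wide deployment (2026-04-10). Identified two distinct cache regression problems through real-world testing (tool ordering jitter and the fresh-session sort gap) and contributed the debug logs that drove the v1.5.1 and v1.6.2 fixes.
 ## Contributors
 - **[@VictorSun92](https://github.com/VictorSun92)** — Author of the original v2.1.88 monkey-patch fix; identified the partial block scatter in v2.1.90 and contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
+- **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug logs (fixed in v1.5.1), fresh-session sort bug discovery via the SKILLS SORT diagnostic (fixed in v1.6.2). First production team to merge this interceptor to trunk.
 - **[@jmarianski](https://github.com/jmarianski)** — Root cause identified via MITM proxy capture and Ghidra reverse engineering; also provided a multi-mode cache test script
 - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery; maintainer of this package
 - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification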

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.6.3",
+  "version": "1.7.0",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",
@@ -13,7 +13,7 @@
     "node": ">=18"
   },
   "scripts": {
-    "test": "node --test 'test/**/*.test.mjs'"
+    "test": "node --test"
   },
   "keywords": [
     "claude-code",

package/preload.mjs CHANGED

@@ -1009,6 +1009,30 @@ globalThis.fetch = async function (url, options) {
         monitorContextDegradation(payload.messages);
       }
+      // Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
+      // when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
+      // analysis and for understanding which tools contribute prefix bloat. First used
+      // during the 2026-04-11 cross-version regression investigation.
+      if (process.env.CACHE_FIX_DUMP_TOOLS && payload.tools) {
+        try {
+          const dumpPath = process.env.CACHE_FIX_DUMP_TOOLS;
+          const dump = {
+            timestamp: new Date().toISOString(),
+            tool_count: payload.tools.length,
+            tools: payload.tools.map(t => ({
+              name: t.name,
+              description: t.description || "",
+              desc_chars: (t.description || "").length,
+              schema_chars: JSON.stringify(t.input_schema || {}).length,
+              total_chars: JSON.stringify(t).length,
+            })),
+            system_chars: JSON.stringify(payload.system || "").length,
+            total_tools_chars: JSON.stringify(payload.tools).length,
+          };
+          writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
+        } catch (e) { debugLog("DUMP ERROR:", e?.message); }
+      }
       // Prompt size measurement — log system prompt, tools, and injected block sizes
       if (DEBUG && payload.system && payload.tools && payload.messages) {
         const sysChars = JSON.stringify(payload.system).length;
@@ -1061,6 +1085,25 @@ globalThis.fetch = async function (url, options) {
       const status = response.headers.get("anthropic-ratelimit-unified-status");
       const overage = response.headers.get("anthropic-ratelimit-unified-overage-status");
+      // Capture ALL anthropic-* and request-id/cf-ray response headers.
+      // Pattern borrowed from @fgrosswig's claude-usage-dashboard proxy:
+      //   https://github.com/fgrosswig/claude-usage-dashboard
+      // Widening beyond the specific unified-ratelimit headers above future-proofs
+      // us against Anthropic adding new headers (e.g. experimental rollout flags,
+      // region hints, new quota dimensions) without needing code changes.
+      const allAnthropicHeaders = {};
+      for (const [name, value] of response.headers.entries()) {
+        const lower = name.toLowerCase();
+        if (
+          lower.startsWith("anthropic-") ||
+          lower === "request-id" ||
+          lower === "x-request-id" ||
+          lower === "cf-ray"
+        ) {
+          allAnthropicHeaders[lower] = value;
+        }
+      }
       if (h5 || h7d) {
         const quotaFile = join(homedir(), ".claude", "quota-status.json");
         let quota = {};
@@ -1070,6 +1113,7 @@ globalThis.fetch = async function (url, options) {
         quota.seven_day = h7d ? { utilization: parseFloat(h7d), pct: Math.round(parseFloat(h7d) * 100), resets_at: reset7d ? parseInt(reset7d) : null } : quota.seven_day;
         quota.status = status || null;
         quota.overage_status = overage || null;
+        quota.all_headers = allAnthropicHeaders;
         // Peak hour detection — Anthropic applies higher quota drain rate during
         // weekday peak hours: 13:00–19:00 UTC (Mon–Fri).
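
The header-capture hunk above boils down to a small allow-list filter, which can be pulled out and exercised on its own. The sketch below is hedged: `pickDiagnosticHeaders` is a hypothetical name that does not exist in the package, and the sample header values are invented. It accepts any object exposing `entries()`, such as a fetch `Headers` instance or a `Map`.

```javascript
// Sketch only: a standalone version of the allow-list filter in the hunk above.
// `pickDiagnosticHeaders` is a hypothetical helper, not part of the package.
function pickDiagnosticHeaders(headers) {
  const picked = {};
  for (const [name, value] of headers.entries()) {
    const lower = name.toLowerCase();
    if (
      lower.startsWith('anthropic-') ||
      lower === 'request-id' ||
      lower === 'x-request-id' ||
      lower === 'cf-ray'
    ) {
      picked[lower] = value; // keys are normalized to lowercase
    }
  }
  return picked;
}

// Invented sample values; a Map stands in for a real response.headers object.
const sampleHeaders = new Map([
  ['Anthropic-Ratelimit-Unified-Status', 'allowed'],
  ['CF-Ray', 'abc123'],
  ['Content-Type', 'application/json'],
]);
console.log(pickDiagnosticHeaders(sampleHeaders));
```

Unrelated headers such as `Content-Type` are dropped, so the object stored under `quota.all_headers` stays limited to the diagnostic fields.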

package/tools/cost-report.mjs CHANGED

@@ -484,6 +484,12 @@ function printJsonReport(results, summary, ratesData, adminSummary) {
       total_cost: summary.totalCost,
       avg_cost_per_call: summary.totalCost / summary.calls,
       tokens: summary.totals,
+      cost_factor: (function () {
+        // fgrosswig-style overhead ratio: gross tokens / output tokens
+        const gross = summary.totals.input + summary.totals.output +
+                      summary.totals.cache_read + summary.totals.cache_1h + summary.totals.cache_5m;
+        return summary.totals.output > 0 ? gross / summary.totals.output : null;
+      })(),
       by_model: summary.byModel,
       degradation: summary.degradedCalls > 0 ? {
         degraded_calls: summary.degradedCalls,
@@ -544,6 +550,15 @@ function printMarkdownReport(results, summary, ratesData, adminSummary) {
   lines.push(`| Total cache write 5m | ${fmt(summary.totals.cache_5m)} |`);
   lines.push(`| **Total cost** | **${fmtCost(summary.totalCost)}** |`);
   lines.push(`| Avg cost per call | ${fmtCost(summary.totalCost / summary.calls)} |`);
+  {
+    // Cost factor: popularized by @fgrosswig's claude-usage-dashboard
+    // (https://github.com/fgrosswig/claude-usage-dashboard)
+    const grossTokens = summary.totals.input + summary.totals.output +
+                        summary.totals.cache_read + summary.totals.cache_1h + summary.totals.cache_5m;
+    if (summary.totals.output > 0) {
+      lines.push(`| Cost factor (tokens/output) | ${(grossTokens / summary.totals.output).toFixed(1)}× |`);
+    }
+  }
   lines.push('');
   // By model
@@ -680,6 +695,22 @@ function printTextReport(results, summary, ratesData, adminSummary) {
       }
     }
   }
+  // ── Cost factor (overhead ratio) ──
+  // Credit: this metric was popularized by @fgrosswig's claude-usage-dashboard
+  // (https://github.com/fgrosswig/claude-usage-dashboard). It divides total
+  // tokens processed (input + output + cache_read + cache_creation) by useful
+  // output tokens, giving a single-number "how much context am I carrying
+  // per useful word of output" multiplier. Values climb over long sessions
+  // due to resume/compaction cycles; a rising curve is a signal that cache
+  // efficiency is degrading.
+  const totalCacheCreate = summary.totals.cache_1h + summary.totals.cache_5m;
+  const grossTokens = summary.totals.input + summary.totals.output +
+                      summary.totals.cache_read + totalCacheCreate;
+  if (summary.totals.output > 0) {
+    const costFactor = grossTokens / summary.totals.output;
+    console.log(`  Cost factor:           ${costFactor.toFixed(1)}× (tokens/output)`);
+  }
   console.log('');
   // ── Degradation ──
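
All three report formats in the hunks above compute the same ratio: gross tokens divided by output tokens. A minimal worked example follows, assuming a `totals` object with the field names used in the diff (`input`, `output`, `cache_read`, `cache_1h`, `cache_5m`). The `costFactor` function name and the sample numbers are hypothetical.

```javascript
// Sketch only: the cost-factor ratio from the hunks above as one function.
// `costFactor` is a hypothetical name; sample totals are invented.
function costFactor(totals) {
  const gross =
    totals.input + totals.output +
    totals.cache_read + totals.cache_1h + totals.cache_5m;
  // Mirror the JSON report: no output tokens means the ratio is undefined.
  return totals.output > 0 ? gross / totals.output : null;
}

const sampleTotals = {
  input: 2000,
  output: 1000,
  cache_read: 40000,
  cache_1h: 6000,
  cache_5m: 1000,
};
// gross = 2000 + 1000 + 40000 + 6000 + 1000 = 50000, so the factor is 50
console.log(`${costFactor(sampleTotals).toFixed(1)}× (tokens/output)`);
```

Folding the three inline copies into one shared helper like this would be one way to keep the text, markdown, and JSON reports in step, though the diff keeps them separate.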

package/tools/quota-analysis.mjs ADDED

@@ -0,0 +1,539 @@
+#!/usr/bin/env node
+/**
+ * quota-analysis — Test how Anthropic's 5-hour quota is actually computed
+ * by analyzing your own per-call telemetry.
+ *
+ * Reads usage.jsonl (the per-call log written by claude-code-cache-fix v1.6.1+)
+ * and answers three questions:
+ *
+ *   1. Does cache_read count toward your 5-hour quota?
+ *      Tests three hypotheses (cache_read costs 0x / 0.1x / 1x of input rate)
+ *      and reports which one best explains the q5h_pct trajectory across
+ *      reset windows in your data.
+ *
+ *   2. Do peak hours (weekday 13:00–19:00 UTC) cost more quota per token?
+ *      Splits windows into peak-dominant vs off-peak-dominant and compares
+ *      the implied 100% quota under the best-fit counting model.
+ *
+ *   3. What is your account's effective 5-hour quota in token-equivalents?
+ *      Reports a concrete number you can compare against your subscription
+ *      tier or against what other users are seeing.
+ *
+ * Telemetry requirements:
+ *   - usage.jsonl entries must include q5h_pct, q7d_pct, peak_hour fields
+ *   - These were added in claude-code-cache-fix v1.6.1 (2026-04-09)
+ *   - Older entries are silently filtered out
+ *   - Need at least 2 q5h reset events in the data for meaningful analysis
+ *     (typically 10+ hours of active use)
+ *
+ * Methodology and caveats:
+ *   - q5h is a 5-hour SLIDING window. We approximate it as discrete reset
+ *     boundaries by looking for drops in q5h_pct >= 5 percentage points.
+ *   - Token-equivalent weights: uncached_input = 1.0, output = 5.0,
+ *     cache_creation = 2.0 (treats all writes as 1h-tier; the 5m tier is
+ *     1.25 but most writes are 1h with the interceptor's TTL injection).
+ *   - Coefficient of variation (CV) is used to compare hypotheses: lower
+ *     CV across windows = better fit. CV < 50% suggests a clear winner;
+ *     CV > 80% suggests the model is wrong or sample is too small.
+ *   - Single-account analysis. Sample is yours. Findings should be
+ *     compared across multiple accounts before generalizing.
+ *
+ * Part of claude-code-cache-fix. Works with the interceptor's usage log.
+ * https://github.com/cnighswonger/claude-code-cache-fix
+ *
+ * Reference: anthropics/claude-code#45756 (cache_read quota counting hypothesis)
+ */
+import { readFileSync, existsSync } from 'node:fs';
+import { homedir } from 'node:os';
+import { join } from 'node:path';
+const DEFAULT_USAGE_LOG = join(homedir(), '.claude', 'usage.jsonl');
+// Token-equivalent weights for the H_zero counting model.
+// (cache_read weight is the variable being tested.)
+const W_UNCACHED_INPUT = 1.0;
+const W_OUTPUT = 5.0;
+const W_CACHE_CREATION = 2.0;  // 1h tier conservative; 5m would be 1.25
+// Q5h window boundary detection threshold (in percentage points)
+const RESET_THRESHOLD = 5;
+// Window classification thresholds
+const PEAK_WINDOW_MIN_PCT = 80;     // >= 80% peak calls = peak-dominant window
+const OFFPEAK_WINDOW_MAX_PCT = 20;  // <= 20% peak calls = offpeak-dominant window
+// Minimum delta_q5h for a window to be useful for extrapolation
+const MIN_DELTA_Q5H = 5;
+// ─── CLI parsing ────────────────────────────────────────────────────────────
+function parseArgs() {
+  const args = process.argv.slice(2);
+  const opts = { file: null, since: null, format: 'text', help: false };
+  for (let i = 0; i < args.length; i++) {
+    const a = args[i];
+    if (a === '--help' || a === '-h') opts.help = true;
+    else if (a === '--file' || a === '-f') opts.file = args[++i];
+    else if (a === '--since' || a === '-s') opts.since = args[++i];
+    else if (a === '--format') opts.format = args[++i];
+    else if (a === '--json') opts.format = 'json';
+    else { console.error(`Unknown argument: ${a}`); opts.help = true; }
+  }
+  return opts;
+}
+function printUsage() {
+  console.log(`quota-analysis — analyze 5-hour quota counting from usage telemetry
+Usage:
+  quota-analysis [options]
+Options:
+  -f, --file <path>      JSONL file to read (default: ~/.claude/usage.jsonl)
+  -s, --since <duration> Filter to last N hours/days (e.g. 24h, 3d, 7d)
+      --format <fmt>     Output format: text (default), json, markdown
+      --json             Shorthand for --format json
+  -h, --help             Show this help
+Examples:
+  quota-analysis                        # Analyze your default log
+  quota-analysis --since 24h            # Last 24 hours only
+  quota-analysis --file /tmp/team.jsonl # A different log file
+  quota-analysis --json > report.json   # Machine-readable output
+Methodology:
+  Tests three counting hypotheses for cache_read in the 5-hour quota:
+    H_zero   = cache_read costs nothing for quota
+    H_billed = cache_read costs 0.1x of input rate (matches the billing rate)
+    H_full   = cache_read costs 1.0x of input rate (the original concern)
+  The hypothesis with the lowest coefficient of variation across reset
+  windows is the best fit for your data.
+  Then splits windows into peak (weekday 13:00–19:00 UTC) and off-peak
+  groups and compares the effective quota multiplier between them.
+Reference:
+  anthropics/claude-code#45756 — original "cache_read counts at full rate"
+  hypothesis from @molu0219.
+`);
+}
+// ─── Data loading ───────────────────────────────────────────────────────────
+function loadUsage(filePath) {
+  if (!existsSync(filePath)) {
+    console.error(`Error: usage file not found: ${filePath}`);
+    console.error(`Hint: claude-code-cache-fix writes its log to ${DEFAULT_USAGE_LOG} by default.`);
+    process.exit(1);
+  }
+  const text = readFileSync(filePath, 'utf8');
+  const rows = [];
+  for (const line of text.split('\n')) {
+    const t = line.trim();
+    if (!t) continue;
+    try { rows.push(JSON.parse(t)); }
+    catch { /* skip malformed */ }
+  }
+  return rows;
+}
+function filterSince(rows, since) {
+  if (!since) return rows;
+  const m = since.match(/^(\d+)([hd])$/);
+  if (!m) {
+    console.error(`Invalid --since format: ${since}. Expected like 24h, 3d.`);
+    process.exit(1);
+  }
+  const n = parseInt(m[1], 10);
+  const ms = m[2] === 'h' ? n * 3600 * 1000 : n * 86400 * 1000;
+  const cutoff = new Date(Date.now() - ms).toISOString();
+  return rows.filter(r => r.timestamp >= cutoff);
+}
+// ─── Window detection ───────────────────────────────────────────────────────
+function findResetWindows(rows) {
+  // Sort by timestamp (defensive — should already be sorted)
+  rows = rows.slice().sort((a, b) => a.timestamp.localeCompare(b.timestamp));
+  // Find indices where q5h_pct drops by RESET_THRESHOLD or more
+  // (these are window boundaries)
+  const windowStarts = [0]; // first call is always a window start
+  for (let i = 1; i < rows.length; i++) {
+    const prev = rows[i - 1].q5h_pct;
+    const cur = rows[i].q5h_pct;
+    if (typeof prev === 'number' && typeof cur === 'number' && cur < prev - RESET_THRESHOLD) {
+      windowStarts.push(i);
+    }
+  }
+  windowStarts.push(rows.length); // sentinel for last window
+  const windows = [];
+  for (let i = 0; i < windowStarts.length - 1; i++) {
+    const slice = rows.slice(windowStarts[i], windowStarts[i + 1]);
+    if (slice.length === 0) continue;
+    windows.push(slice);
+  }
+  return windows;
+}
+// ─── Token-equivalent calculation ───────────────────────────────────────────
+function callEquivalent(r, cacheReadWeight) {
+  return (
+    (r.input_tokens || 0) * W_UNCACHED_INPUT
+    + (r.output_tokens || 0) * W_OUTPUT
+    + (r.cache_creation_input_tokens || 0) * W_CACHE_CREATION
+    + (r.cache_read_input_tokens || 0) * cacheReadWeight
+  );
+}
+function windowEquivalent(window, cacheReadWeight) {
+  let sum = 0;
+  for (const r of window) sum += callEquivalent(r, cacheReadWeight);
+  return sum;
+}
+function windowDeltaQ5h(window) {
+  const start = window[0].q5h_pct ?? 0;
+  let peak = start;
+  for (const r of window) {
+    if (typeof r.q5h_pct === 'number' && r.q5h_pct > peak) peak = r.q5h_pct;
+  }
+  return peak - start;
+}
+function windowPeakFraction(window) {
+  let peakCount = 0;
+  for (const r of window) if (r.peak_hour) peakCount++;
+  return peakCount / window.length;
+}
+// ─── Statistics helpers ─────────────────────────────────────────────────────
+function mean(xs) {
+  if (xs.length === 0) return 0;
+  return xs.reduce((a, b) => a + b, 0) / xs.length;
+}
+function stdev(xs) {
+  if (xs.length < 2) return 0;
+  const m = mean(xs);
+  const sq = xs.map(x => (x - m) ** 2);
+  return Math.sqrt(sq.reduce((a, b) => a + b, 0) / (xs.length - 1));
+}
+function cv(xs) {
+  const m = mean(xs);
+  if (m === 0) return Infinity;
+  return stdev(xs) / m;
+}
+// ─── Counting model fit ─────────────────────────────────────────────────────
+function fitCountingModels(windows) {
+  // For each window, compute equivalent tokens under each hypothesis,
+  // then extrapolate to 100% quota using the observed delta_q5h.
+  // The model whose extrapolations are most consistent (lowest CV) wins.
+  const models = {
+    zero:   { weight: 0.0, label: 'H_zero (cache_read = 0.0x)',   extrapolations: [] },
+    billed: { weight: 0.1, label: 'H_billed (cache_read = 0.1x)', extrapolations: [] },
+    full:   { weight: 1.0, label: 'H_full (cache_read = 1.0x)',   extrapolations: [] },
+  };
+  for (const w of windows) {
+    const delta = windowDeltaQ5h(w);
+    if (delta < MIN_DELTA_Q5H) continue;
+    for (const key of Object.keys(models)) {
+      const eq = windowEquivalent(w, models[key].weight);
+      const implied100 = eq / (delta / 100);
+      models[key].extrapolations.push(implied100);
+    }
+  }
+  // Compute CV for each model
+  const usableWindows = models.zero.extrapolations.length;
+  const fits = {};
+  for (const key of Object.keys(models)) {
+    const xs = models[key].extrapolations;
+    fits[key] = {
+      label: models[key].label,
+      weight: models[key].weight,
+      mean: mean(xs),
+      stdev: stdev(xs),
+      cv: cv(xs),
+      values: xs,
+    };
+  }
+  // Determine the best fit
+  let bestKey = null;
+  let bestCv = Infinity;
+  for (const key of Object.keys(fits)) {
+    if (fits[key].cv < bestCv) {
+      bestCv = fits[key].cv;
+      bestKey = key;
+    }
+  }
+  return { fits, bestKey, usableWindows };
+}
+// ─── Peak vs off-peak analysis ─────────────────────────────────────────────
+function peakSplit(windows, weight) {
+  // Returns { peakWindows: [...], offPeakWindows: [...], skipped: [...] }
+  // and computes mean implied 100% quota for each group under the given
+  // cache_read weight.
+  const peakDom = [];
+  const offDom = [];
+  const skipped = [];
+  for (const w of windows) {
+    const delta = windowDeltaQ5h(w);
+    if (delta < MIN_DELTA_Q5H) {
+      skipped.push({ reason: 'delta_q5h too small', window: w });
+      continue;
+    }
+    const eq = windowEquivalent(w, weight);
+    const implied100 = eq / (delta / 100);
+    const pf = windowPeakFraction(w) * 100;
+    const entry = {
+      start: w[0].timestamp,
+      end: w[w.length - 1].timestamp,
+      calls: w.length,
+      delta,
+      peakFraction: pf,
+      eq,
+      implied100,
+    };
+    if (pf >= PEAK_WINDOW_MIN_PCT) peakDom.push(entry);
+    else if (pf <= OFFPEAK_WINDOW_MAX_PCT) offDom.push(entry);
+    else skipped.push({ reason: 'mixed peak/off-peak', ...entry });
+  }
+  return { peakDom, offDom, skipped };
+}
+// ─── Output rendering ───────────────────────────────────────────────────────
+function fmt(n, decimals = 2) {
+  if (n === null || n === undefined || !isFinite(n)) return 'n/a';
+  if (Math.abs(n) >= 1e6) return (n / 1e6).toFixed(decimals) + 'M';
+  if (Math.abs(n) >= 1e3) return (n / 1e3).toFixed(decimals) + 'K';
+  return n.toFixed(decimals);
+}
+function pct(n) { return (n * 100).toFixed(1) + '%'; }
+function printText(report) {
+  const { meta, windows, fit, peak } = report;
+  console.log('═══════════════════════════════════════════════════════════════════════');
+  console.log('  CLAUDE 5-HOUR QUOTA ANALYSIS');
+  console.log('═══════════════════════════════════════════════════════════════════════');
+  console.log();
+  console.log(`Data source:      ${meta.file}`);
+  console.log(`Total entries:    ${meta.totalRows}`);
+  console.log(`With q5h_pct:     ${meta.withQuota} (${pct(meta.withQuota / meta.totalRows)})`);
+  console.log(`Time range:       ${meta.timeStart}`);
+  console.log(`             →    ${meta.timeEnd}`);
+  console.log(`Reset windows:    ${windows.total} detected, ${windows.usable} usable for fit`);
+  console.log();
+  if (windows.usable < 2) {
+    console.log('⚠  Not enough usable reset windows to fit counting models.');
+    console.log('   Need at least 2 windows with q5h_pct increase ≥ 5%.');
+    console.log('   Run the interceptor through more activity and try again.');
+    return;
+  }
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log('  Per-window breakdown');
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log();
+  console.log('  ' + 'Window'.padEnd(34) + 'Calls'.padStart(6) + 'Δq5h'.padStart(6) + 'Peak%'.padStart(7) + 'EqToks'.padStart(10) + '100%impl'.padStart(11));
+  for (const wr of report.windowRows) {
+    console.log('  ' + wr.label.padEnd(34) + String(wr.calls).padStart(6) + (wr.delta + '%').padStart(6) + (wr.peakFraction.toFixed(0) + '%').padStart(7) + fmt(wr.eq).padStart(10) + fmt(wr.implied100).padStart(11));
+  }
+  console.log();
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log('  Q1: Does cache_read count toward 5h quota?');
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log();
+  console.log('  Tests three hypotheses against your data. Lower CV = better fit.');
+  console.log();
+  console.log('  ' + 'Hypothesis'.padEnd(34) + 'Mean impl 100%'.padStart(18) + 'CV'.padStart(10));
+  for (const key of ['zero', 'billed', 'full']) {
+    const f = fit.fits[key];
+    const marker = key === fit.bestKey ? ' ★' : '';
+    console.log('  ' + f.label.padEnd(34) + (fmt(f.mean) + ' tok').padStart(18) + (f.cv === Infinity ? 'inf' : (f.cv * 100).toFixed(1) + '%').padStart(10) + marker);
+  }
+  console.log();
+  console.log('  ★ = best fit (lowest coefficient of variation)');
+  console.log();
+  const bestFit = fit.fits[fit.bestKey];
+  if (bestFit.cv < 0.5) {
+    console.log(`  Verdict: ${bestFit.label} is the best fit (CV ${(bestFit.cv * 100).toFixed(1)}%).`);
+    if (fit.bestKey === 'zero') {
+      console.log('  Interpretation: cache_read does NOT meaningfully count toward your 5h quota.');
+      console.log('  The cache really is saving you quota, not just billing.');
+    } else if (fit.bestKey === 'billed') {
+      console.log('  Interpretation: cache_read counts at the BILLING rate (0.1x of input).');
+      console.log('  Quota and billing are aligned for cache reads.');
+    } else {
+      console.log('  Interpretation: cache_read counts at the FULL input rate for quota purposes.');
+      console.log('  This means cache hits save you billing but NOT quota — a stealth multiplier.');
+    }
+  } else {
+    console.log(`  Verdict: No clear winner. Best fit (${fit.fits[fit.bestKey].label}) has CV ${(fit.fits[fit.bestKey].cv * 100).toFixed(1)}%.`);
+    console.log('  Likely cause: small sample, mixed-model traffic, or sliding-window noise.');
+    console.log('  Run for longer and try again.');
+  }
+  console.log();
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log('  Q2: Do peak hours cost more quota per token?');
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log();
+  console.log(`  Peak hours: weekday 13:00–19:00 UTC (interceptor default)`);
+  console.log();
+  if (peak.peakDom.length === 0 && peak.offDom.length === 0) {
+    console.log('  Not enough peak-dominant or off-peak-dominant windows to compare.');
+    console.log('  Need at least 1 of each (≥80% same-bucket calls per window).');
+  } else {
+    console.log('  ' + 'Group'.padEnd(20) + 'Windows'.padStart(10) + 'Mean impl 100%'.padStart(20));
+    if (peak.peakDom.length > 0) {
+      const m = mean(peak.peakDom.map(p => p.implied100));
+      console.log('  ' + 'Peak-dominant'.padEnd(20) + String(peak.peakDom.length).padStart(10) + (fmt(m) + ' tok').padStart(20));
+    }
+    if (peak.offDom.length > 0) {
+      const m = mean(peak.offDom.map(p => p.implied100));
+      console.log('  ' + 'Off-peak'.padEnd(20) + String(peak.offDom.length).padStart(10) + (fmt(m) + ' tok').padStart(20));
+    }
+    if (peak.peakDom.length > 0 && peak.offDom.length > 0) {
+      const peakMean = mean(peak.peakDom.map(p => p.implied100));
+      const offMean = mean(peak.offDom.map(p => p.implied100));
+      const ratio = peakMean / offMean;
+      console.log();
+      if (ratio < 0.85) {
+        console.log(`  ⚠  Peak windows imply ${pct(ratio)} of off-peak quota.`);
+        console.log(`     That's a ${pct(1 - ratio)} effective quota REDUCTION during peak hours.`);
+        console.log('     Same usage pattern, fewer tokens until you hit 100%.');
+      } else if (ratio > 1.15) {
+        console.log(`  Peak windows imply ${pct(ratio)} of off-peak quota — peak is MORE generous?`);
+        console.log('  Unusual. Check your sample size and time range.');
+      } else {
+        console.log(`  Peak / off-peak ratio is ${pct(ratio)} — no significant peak penalty detected.`);
+      }
+    } else {
+      console.log();
+      console.log('  Need both peak-dominant AND off-peak-dominant windows for the comparison.');
+    }
+  }
+  console.log();
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log('  Q3: Implied 5h quota for your account');
+  console.log('───────────────────────────────────────────────────────────────────────');
+  console.log();
+  console.log(`  Under best-fit model (${fit.fits[fit.bestKey].label}):`);
+  console.log(`    Mean implied 100% quota: ${fmt(fit.fits[fit.bestKey].mean)} token-equivalents`);
+  console.log();
+  console.log('  Token-equivalent weights used:');
+  console.log(`    uncached input  × ${W_UNCACHED_INPUT}`);
+  console.log(`    output          × ${W_OUTPUT}   (Opus output is 5x input rate)`);
+  console.log(`    cache_creation  × ${W_CACHE_CREATION}   (1h tier; 5m tier would be 1.25)`);
+  console.log(`    cache_read      × ${fit.fits[fit.bestKey].weight}   (this hypothesis)`);
+  console.log();
+  console.log('  Compare against your subscription tier and plan estimate. If your');
+  console.log('  number is wildly different from other reports, your sample may be');
+  console.log('  too small or your model mix may differ significantly.');
+  console.log();
+  console.log('═══════════════════════════════════════════════════════════════════════');
+  console.log();
+  console.log('Caveats:');
+  console.log('  • q5h is a 5-hour SLIDING window; we approximate as discrete resets');
+  console.log('  • Single account; aggregate findings need cross-validation');
+  console.log('  • cache_creation TTL weight averaged at 2.0; mixed 5m/1h would lower it');
+  console.log('  • Only Anthropic knows the exact quota formula');
+  console.log();
+  console.log('Reference: anthropics/claude-code#45756');
+  console.log('Report your findings: open an issue or PR on cnighswonger/claude-code-cache-fix');
+}
+function printJson(report) {
+  console.log(JSON.stringify(report, null, 2));
+}
+// ─── Main ───────────────────────────────────────────────────────────────────
+function main() {
+  const opts = parseArgs();
+  if (opts.help) { printUsage(); return; }
+  const filePath = opts.file || DEFAULT_USAGE_LOG;
+  const rawRows = loadUsage(filePath);
+  const filtered = filterSince(rawRows, opts.since);
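+  // Require a string timestamp too: the sort and window labels below call
+  // string methods on it and would throw on a malformed row.
+  const withQuota = filtered.filter(r => typeof r.q5h_pct === 'number' && typeof r.timestamp === 'string');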
+  if (withQuota.length === 0) {
+    console.error('No entries with q5h_pct field found.');
+    console.error('This field was added in claude-code-cache-fix v1.6.1 (2026-04-09).');
+    console.error('Older log entries are silently filtered out.');
+    process.exit(1);
+  }
+  withQuota.sort((a, b) => a.timestamp.localeCompare(b.timestamp));
+  const allWindows = findResetWindows(withQuota);
+  const fit = fitCountingModels(allWindows);
+  // Use the best-fit weight for the peak/off-peak analysis
+  const bestWeight = fit.fits[fit.bestKey].weight;
+  const peak = peakSplit(allWindows, bestWeight);
+  // Build per-window rows for the breakdown table
+  const windowRows = [];
+  for (const w of allWindows) {
+    const delta = windowDeltaQ5h(w);
+    if (delta < MIN_DELTA_Q5H) continue;
+    const eq = windowEquivalent(w, bestWeight);
+    const implied100 = eq / (delta / 100);
+    const pf = windowPeakFraction(w) * 100;
+    windowRows.push({
+      label: `${w[0].timestamp.slice(5, 16)} → ${w[w.length - 1].timestamp.slice(5, 16)}`,
+      calls: w.length,
+      delta,
+      peakFraction: pf,
+      eq,
+      implied100,
+    });
+  }
+  const report = {
+    meta: {
+      file: filePath,
+      totalRows: rawRows.length,
+      filteredRows: filtered.length,
+      withQuota: withQuota.length,
+      timeStart: withQuota[0].timestamp,
+      timeEnd: withQuota[withQuota.length - 1].timestamp,
+      since: opts.since,
+    },
+    windows: { total: allWindows.length, usable: fit.usableWindows },
+    windowRows,
+    fit,
+    peak,
+  };
+  if (opts.format === 'json') printJson(report);
+  else printText(report);
+}
+main();
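
The Q1 and Q2 logic printed by the report can be illustrated with a minimal sketch. It is not the tool's implementation: the helper names (`tokenEquivalent`, `coefficientOfVariation`, `fitModels`, `classifyPeakRatio`) and the window shape are hypothetical, the input/output/cache_creation weights are assumptions inferred from the report's printed notes (the real `W_*` constants are defined outside this excerpt), and the population standard deviation is an assumed choice. The `MIN_DELTA_Q5H` filter is not modeled.

```javascript
// Assumed weights, inferred from the report's notes (output 5x, 1h cache_creation 2.0).
const WEIGHTS = { input: 1, output: 5, cacheCreation: 2 };
// The three cache_read hypotheses named in the README: 0x, 0.1x, 1x of input rate.
const HYPOTHESES = { zero: 0, billed: 0.1, full: 1 };

// Weighted token total for one window under a given cache_read weight.
function tokenEquivalent(w, cacheReadWeight) {
  return w.input * WEIGHTS.input
    + w.output * WEIGHTS.output
    + w.cacheCreation * WEIGHTS.cacheCreation
    + w.cacheRead * cacheReadWeight;
}

// Standard deviation divided by mean (population form, an assumption).
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  if (values.length < 2 || mean === 0) return Infinity;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// For each hypothesis, extrapolate every window to an implied 100% quota
// and score the hypothesis by how consistent those extrapolations are.
function fitModels(windows) {
  const fits = {};
  for (const [key, weight] of Object.entries(HYPOTHESES)) {
    const implied = windows.map(w => tokenEquivalent(w, weight) / (w.deltaPct / 100));
    const mean = implied.reduce((a, b) => a + b, 0) / implied.length;
    fits[key] = { weight, mean, cv: coefficientOfVariation(implied) };
  }
  const bestKey = Object.keys(fits).reduce((a, b) => (fits[b].cv < fits[a].cv ? b : a));
  return { fits, bestKey };
}

// Thresholds taken from the report: below 0.85 is a peak penalty, above 1.15 is unusual.
function classifyPeakRatio(peakMean, offMean) {
  const ratio = peakMean / offMean;
  if (ratio < 0.85) return 'peak-penalty';
  if (ratio > 1.15) return 'peak-generous';
  return 'no-significant-difference';
}

// Synthetic windows built so that cache_read has no effect on quota.
const windows = [
  { input: 10000, output: 2000, cacheCreation: 5000, cacheRead: 500000, deltaPct: 3 },
  { input: 20000, output: 4000, cacheCreation: 10000, cacheRead: 100000, deltaPct: 6 },
  { input: 5000, output: 1000, cacheCreation: 0, cacheRead: 2000000, deltaPct: 1 },
];
const result = fitModels(windows);
console.log(result.bestKey); // 'zero' for this synthetic data
```

With real logs the implied totals never line up this cleanly, which is why the report only announces a verdict when the best CV is below 0.5.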

package/tools/sim-cost-reconcile.sh ADDED

@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+# sim-cost-reconcile — One-liner wrapper for running cost-report.mjs against
+# a simulation log with admin API cross-reference enabled.
+#
+# Usage:
+#   sim-cost-reconcile <sim-dir-or-log> [extra cost-report.mjs args...]
+#
+# Examples:
+#   sim-cost-reconcile ~/git_repos/kanfei_test/kanfei-nowcast/.test_cache/simulations/realtime_sim_harnett_county_qlcs_2026_20260411_024836
+#   sim-cost-reconcile path/to/simulation.log --format md > report.md
+#
+# Reads admin key from $ANTHROPIC_ADMIN_KEY or ~/.config/anthropic/admin-key.
+# If no admin key is available, runs with telemetry only and warns.
+#
+# NOTE on admin reconciliation: the admin API returns data at 1h-bucket
+# resolution, so if multiple sims (or other API activity) overlap the same
+# hour, the admin total will include all of it. For an accurate multi-sim
+# aggregate, run this on each sim and sum the telemetry totals, then pull
+# the admin total once over the full window.
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+COST_REPORT="$SCRIPT_DIR/cost-report.mjs"
+if [[ $# -lt 1 ]]; then
+    echo "Usage: $(basename "$0") <sim-dir-or-log> [extra cost-report args...]" >&2
+    exit 1
+fi
+TARGET="$1"
+shift
+# Resolve a dir to its simulation.log
+if [[ -d "$TARGET" ]]; then
+    LOG="$TARGET/simulation.log"
+    if [[ ! -f "$LOG" ]]; then
+        echo "ERROR: no simulation.log in $TARGET" >&2
+        exit 1
+    fi
+elif [[ -f "$TARGET" ]]; then
+    LOG="$TARGET"
+else
+    echo "ERROR: $TARGET is neither a file nor a directory" >&2
+    exit 1
+fi
+# Load admin key
+ADMIN_KEY_FILE="${HOME}/.config/anthropic/admin-key"
+if [[ -n "${ANTHROPIC_ADMIN_KEY:-}" ]]; then
+    KEY="$ANTHROPIC_ADMIN_KEY"
+elif [[ -r "$ADMIN_KEY_FILE" ]]; then
+    KEY="$(cat "$ADMIN_KEY_FILE")"
+else
+    echo "WARNING: no admin key found ($ADMIN_KEY_FILE missing, ANTHROPIC_ADMIN_KEY unset)" >&2
+    echo "         running telemetry-only — pass --admin-key or set env var to enable reconciliation" >&2
+    exec node "$COST_REPORT" --sim-log "$LOG" "$@"
+fi
+exec node "$COST_REPORT" --sim-log "$LOG" --admin-key "$KEY" "$@"
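
The key-resolution order used by the wrapper (environment variable first, then the key file, otherwise telemetry-only) can be sketched in JavaScript as follows. This is an illustration only: `buildCostReportArgs` and the injected `readKeyFile` callback are hypothetical names, and the sketch only builds the argument list rather than executing `cost-report.mjs`.

```javascript
// Build the argument list passed to cost-report.mjs, mirroring the wrapper's
// precedence. `env` is a plain object standing in for process.env, and
// `readKeyFile` is a hypothetical reader returning the file text, or null
// when the key file is missing or unreadable.
function buildCostReportArgs(logPath, extraArgs, env, readKeyFile) {
  const args = ['--sim-log', logPath];
  let key;
  if (env.ANTHROPIC_ADMIN_KEY) {
    // 1. A non-empty environment variable wins.
    key = env.ANTHROPIC_ADMIN_KEY;
  } else {
    // 2. Fall back to the key file; strip trailing newlines like $(cat ...).
    const text = readKeyFile();
    if (text !== null) key = text.replace(/\n+$/, '');
  }
  // 3. With no key at all, run telemetry-only (no --admin-key flag).
  if (key !== undefined) args.push('--admin-key', key);
  return [...args, ...extraArgs];
}

// Example with placeholder values.
const exampleArgs = buildCostReportArgs(
  'simulation.log',
  ['--format', 'md'],
  {},
  () => 'placeholder-key\n',
);
console.log(exampleArgs.join(' '));
```

Extra arguments are appended last, matching the wrapper's `"$@"` placement.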

package/tools/usage-to-dashboard-ndjson.mjs ADDED

@@ -0,0 +1,352 @@
+#!/usr/bin/env node
+/**
+ * usage-to-dashboard-ndjson — Translate claude-code-cache-fix's usage.jsonl
+ * into the proxy NDJSON format expected by @fgrosswig's claude-usage-dashboard,
+ * and write to the directory his dashboard already watches.
+ *
+ *   https://github.com/fgrosswig/claude-usage-dashboard
+ *
+ * # Why this exists
+ *
+ * Our interceptor and fgrosswig's dashboard are strongly complementary:
+ * the interceptor captures per-call API data from inside the Node.js process
+ * (cache metrics, quota state, request rewrites), while his dashboard
+ * provides visualization, historical trending, and multi-host aggregation.
+ *
+ * Rather than build our own visualization layer, we translate our per-call
+ * usage records into the NDJSON schema his dashboard ingests. A user running
+ * both tools gets the best of both: the interceptor fixes what it can fix
+ * and emits rich per-call data, and the dashboard displays that data
+ * alongside whatever Claude Code's own session JSONLs already capture.
+ *
+ * # What this tool does
+ *
+ * Reads `~/.claude/usage.jsonl` (our interceptor's per-call log) and
+ * translates each entry into a minimal-but-compatible record in the shape
+ * his dashboard expects under `~/.claude/anthropic-proxy-logs/*.ndjson`.
+ * The output file follows the convention `proxy-YYYY-MM-DD.ndjson`, one
+ * file per UTC day, matching the filename pattern his `collectProxyNdjsonFiles()`
+ * helper discovers.
+ *
+ * # Fields emitted
+ *
+ * Mapped from our usage.jsonl to fgrosswig's proxy-core.js shape:
+ *
+ *   {
+ *     "ts_start":  <our timestamp>,
+ *     "ts_end":    <our timestamp>,        // single-point, no duration
+ *     "duration_ms": null,                 // we don't measure this
+ *     "method":    "POST",
+ *     "path":      "/v1/messages",
+ *     "upstream_status": 200,              // implicit from usage presence
+ *     "usage": {
+ *       "input_tokens": <ours>,
+ *       "output_tokens": <ours>,
+ *       "cache_read_input_tokens": <ours>,
+ *       "cache_creation_input_tokens": <ours>
+ *     },
+ *     "cache_read_ratio": <computed>,
+ *     "cache_health":     "healthy" | "affected" | "mixed" | "na",
+ *     "request_hints":    { "model": <ours> },
+ *     "response_anthropic_headers": {      // if quota fields available
+ *       "anthropic-ratelimit-unified-5h-utilization": "<ours>",
+ *       "anthropic-ratelimit-unified-7d-utilization": "<ours>"
+ *     },
+ *     "ttl_tier":         <ours, interceptor-specific>,
+ *     "ephemeral_1h_input_tokens": <ours, interceptor-specific>,
+ *     "ephemeral_5m_input_tokens": <ours, interceptor-specific>,
+ *     "peak_hour":        <ours, interceptor-specific>,
+ *     "source": "claude-code-cache-fix",
+ *     "req_id": "ccf_<timestamp digits>_<model suffix>"   // synthetic dedup key
+ *   }
+ *
+ * Extra fields beyond fgrosswig's native schema (ttl_tier, ephemeral_*,
+ * peak_hour, source) are added for forward-compatibility. His dashboard
+ * ignores unknown fields per its tolerant-ingest design, and our own tooling
+ * downstream may find them useful when consuming the same NDJSON.
+ *
+ * # Usage
+ *
+ *   # One-shot translation (reads all of usage.jsonl, rewrites one file per UTC day)
+ *   node tools/usage-to-dashboard-ndjson.mjs
+ *
+ *   # Follow mode (tail usage.jsonl, append new records as they arrive)
+ *   node tools/usage-to-dashboard-ndjson.mjs --follow
+ *
+ *   # Custom input/output paths
+ *   node tools/usage-to-dashboard-ndjson.mjs --input /path/to/usage.jsonl --output-dir /path/to/ndjson-dir
+ *
+ *   # Dry-run: print to stdout instead of writing files
+ *   node tools/usage-to-dashboard-ndjson.mjs --stdout
+ *
+ * # Environment
+ *
+ *   ANTHROPIC_PROXY_LOG_DIR  Override output directory (matches fgrosswig's
+ *                            dashboard env var so both tools stay in sync).
+ *
+ * Part of claude-code-cache-fix. MIT licensed.
+ *   https://github.com/cnighswonger/claude-code-cache-fix
+ */
+import { readFileSync, writeFileSync, appendFileSync, existsSync, mkdirSync, statSync, watch } from 'node:fs';
+import { join } from 'node:path';
+import { homedir } from 'node:os';
+// ─── CLI parsing ────────────────────────────────────────────────────────────
+function parseArgs() {
+  const args = process.argv.slice(2);
+  const opts = {
+    input: join(homedir(), '.claude', 'usage.jsonl'),
+    outputDir: process.env.ANTHROPIC_PROXY_LOG_DIR || join(homedir(), '.claude', 'anthropic-proxy-logs'),
+    stdout: false,
+    follow: false,
+    help: false,
+  };
+  for (let i = 0; i < args.length; i++) {
+    switch (args[i]) {
+      case '--input':      opts.input = args[++i]; break;
+      case '--output-dir': opts.outputDir = args[++i]; break;
+      case '--stdout':     opts.stdout = true; break;
+      case '--follow':     opts.follow = true; break;
+      case '-h':
+      case '--help':       opts.help = true; break;
+      default:
+        console.error(`unknown flag: ${args[i]}`);
+        opts.help = true;
+    }
+  }
+  return opts;
+}
+function printUsage() {
+  console.log(`usage-to-dashboard-ndjson — Translate cache-fix usage.jsonl to fgrosswig dashboard NDJSON.
+Usage:
+  node usage-to-dashboard-ndjson.mjs                 One-shot: read all, rewrite one file per UTC day
+  node usage-to-dashboard-ndjson.mjs --follow        Tail usage.jsonl, append new records live
+  node usage-to-dashboard-ndjson.mjs --stdout        Print NDJSON to stdout instead of files
+  node usage-to-dashboard-ndjson.mjs --input <path>  Custom input (default: ~/.claude/usage.jsonl)
+  node usage-to-dashboard-ndjson.mjs --output-dir <path>  Custom output dir (default: ~/.claude/anthropic-proxy-logs)
+Output files follow the convention: proxy-YYYY-MM-DD.ndjson (one per UTC day).
+Environment:
+  ANTHROPIC_PROXY_LOG_DIR  Override output directory (also used by fgrosswig's dashboard).
+Credit: this tool writes the NDJSON schema expected by @fgrosswig's
+claude-usage-dashboard (https://github.com/fgrosswig/claude-usage-dashboard).
+Running both tools together gives users per-call data from our interceptor
+plus the visualization layer from his dashboard, with no coordination needed.
+`);
+}
+// ─── Record translation ─────────────────────────────────────────────────────
+/**
+ * Translate one claude-code-cache-fix usage.jsonl record into a
+ * fgrosswig-dashboard-compatible NDJSON record. Returns null if the
+ * record doesn't have enough fields to be usable.
+ */
+function translateRecord(entry) {
+  if (!entry || !entry.timestamp || !entry.model) return null;
+  const inTok = entry.input_tokens || 0;
+  const outTok = entry.output_tokens || 0;
+  const crTok = entry.cache_read_input_tokens || 0;
+  const ccTok = entry.cache_creation_input_tokens || 0;
+  // Cache health (fgrosswig's semantic labels)
+  const totalCacheInput = crTok + ccTok;
+  const cacheReadRatio = totalCacheInput > 0 ? crTok / totalCacheInput : null;
+  let cacheHealth = 'na';
+  if (cacheReadRatio != null) {
+    if (cacheReadRatio >= 0.8) cacheHealth = 'healthy';
+    else if (cacheReadRatio < 0.4 && ccTok > 0) cacheHealth = 'affected';
+    else cacheHealth = 'mixed';
+  }
+  // Reconstruct a minimal response_anthropic_headers blob from the quota
+  // pct fields we captured. Not byte-identical to what the proxy would see
+  // on the wire, but structurally compatible for the dashboard's consumers.
+  const responseHeaders = {};
+  if (entry.q5h_pct != null) {
+    responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h_pct / 100);
+  }
+  if (entry.q7d_pct != null) {
+    responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d_pct / 100);
+  }
+  const rec = {
+    ts_start: entry.timestamp,
+    ts_end: entry.timestamp,
+    duration_ms: null,
+    method: 'POST',
+    path: '/v1/messages',
+    upstream_status: 200,
+    usage: {
+      input_tokens: inTok,
+      output_tokens: outTok,
+      cache_read_input_tokens: crTok,
+      cache_creation_input_tokens: ccTok,
+    },
+    cache_read_ratio: cacheReadRatio,
+    cache_health: cacheHealth,
+    request_hints: {
+      model: entry.model,
+    },
+    response_anthropic_headers: responseHeaders,
+    // Interceptor-specific extras — fgrosswig's dashboard ignores unknown
+    // fields, so these pass through without breaking ingestion.
+    ttl_tier: entry.ttl_tier || null,
+    ephemeral_1h_input_tokens: entry.ephemeral_1h_input_tokens || 0,
+    ephemeral_5m_input_tokens: entry.ephemeral_5m_input_tokens || 0,
+    peak_hour: entry.peak_hour || false,
+    source: 'claude-code-cache-fix',
+  };
+  // Synthesize a stable pseudo-request-id from timestamp + model for dedup
+  // at the dashboard layer. Not a real request ID — just a deterministic key.
+  rec.req_id = 'ccf_' + entry.timestamp.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
+  return rec;
+}
+// ─── File output ────────────────────────────────────────────────────────────
+function dayFileFor(outputDir, isoTimestamp) {
+  // proxy-YYYY-MM-DD.ndjson from UTC date
+  const date = isoTimestamp.slice(0, 10);
+  return join(outputDir, `proxy-${date}.ndjson`);
+}
+function ensureDir(dir) {
+  if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
+}
+function writeRecords(records, outputDir, useStdout) {
+  if (useStdout) {
+    for (const r of records) {
+      process.stdout.write(JSON.stringify(r) + '\n');
+    }
+    return records.length;
+  }
+  ensureDir(outputDir);
+  // Group by day for efficient appending
+  const byDay = new Map();
+  for (const r of records) {
+    const day = dayFileFor(outputDir, r.ts_start);
+    if (!byDay.has(day)) byDay.set(day, []);
+    byDay.get(day).push(r);
+  }
+  for (const [dayFile, dayRecords] of byDay) {
+    const payload = dayRecords.map(r => JSON.stringify(r)).join('\n') + '\n';
+    // Overwrite in one-shot mode: a full replay of the same input regenerates
+    // each day file identically, so rewriting is safe. Any existing file with
+    // the same name is replaced.
+    writeFileSync(dayFile, payload);
+  }
+  return records.length;
+}
+// ─── One-shot batch mode ────────────────────────────────────────────────────
+function runBatch(opts) {
+  if (!existsSync(opts.input)) {
+    console.error(`ERROR: input file not found: ${opts.input}`);
+    process.exit(1);
+  }
+  const raw = readFileSync(opts.input, 'utf8');
+  const lines = raw.split('\n').filter(l => l.trim());
+  const records = [];
+  let skipped = 0;
+  for (const line of lines) {
+    try {
+      const entry = JSON.parse(line);
+      const rec = translateRecord(entry);
+      if (rec) records.push(rec);
+      else skipped++;
+    } catch {
+      skipped++;
+    }
+  }
+  const written = writeRecords(records, opts.outputDir, opts.stdout);
+  if (!opts.stdout) {
+    console.error(`usage-to-dashboard-ndjson: wrote ${written} records to ${opts.outputDir} (${skipped} skipped)`);
+  }
+}
+// ─── Follow mode ────────────────────────────────────────────────────────────
+function runFollow(opts) {
+  if (!existsSync(opts.input)) {
+    console.error(`ERROR: input file not found: ${opts.input}`);
+    process.exit(1);
+  }
+  // First, catch up on the existing file (idempotent write)
+  runBatch(opts);
+  // Then watch for new entries
+  console.error(`usage-to-dashboard-ndjson: watching ${opts.input} for new records...`);
+  let lastSize = statSync(opts.input).size;
+  watch(opts.input, { persistent: true }, () => {
+    let currentSize;
+    try { currentSize = statSync(opts.input).size; } catch { return; }
+    if (currentSize <= lastSize) {
+      // File truncated or unchanged — rewind lastSize
+      if (currentSize < lastSize) lastSize = 0;
+      return;
+    }
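+    // Decode only the newly appended bytes. lastSize is a byte offset, so
+    // slicing the decoded string would misalign once the log contains
+    // multi-byte UTF-8 characters.
+    try {
+      const buf = readFileSync(opts.input);
+      const chunk = buf.subarray(lastSize, currentSize);
+      // Consume complete lines only; a partially written trailing line is
+      // left for the next change event.
+      const lastNewline = chunk.lastIndexOf(0x0a);
+      if (lastNewline === -1) return;
+      const newContent = chunk.subarray(0, lastNewline + 1).toString('utf8');
+      lastSize += lastNewline + 1;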
+      const newLines = newContent.split('\n').filter(l => l.trim());
+      const newRecs = [];
+      for (const line of newLines) {
+        try {
+          const entry = JSON.parse(line);
+          const rec = translateRecord(entry);
+          if (rec) newRecs.push(rec);
+        } catch {}
+      }
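+      if (newRecs.length > 0) {
+        if (opts.stdout) {
+          // Honor --stdout in follow mode instead of writing files
+          writeRecords(newRecs, opts.outputDir, true);
+        } else {
+          // Append each record to the day file for its own UTC date
+          ensureDir(opts.outputDir);
+          for (const r of newRecs) {
+            const dayFile = dayFileFor(opts.outputDir, r.ts_start);
+            appendFileSync(dayFile, JSON.stringify(r) + '\n');
+          }
+        }
+        console.error(`[${new Date().toISOString()}] ${opts.stdout ? 'printed' : 'appended'} ${newRecs.length} records`);
+      }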
+    } catch (err) {
+      console.error(`watch error: ${err.message}`);
+    }
+  });
+  // Keep the process alive
+  process.stdin.resume();
+}
+// ─── Main ───────────────────────────────────────────────────────────────────
+const opts = parseArgs();
+if (opts.help) {
+  printUsage();
+  process.exit(0);
+}
+if (opts.follow) {
+  runFollow(opts);
+} else {
+  runBatch(opts);
+}
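
The cache-health labelling and per-day file grouping performed by `translateRecord` and `writeRecords` can be shown in isolation with a small sketch. The function names `classifyCacheHealth`, `dayFileName`, and `groupByDay` are hypothetical, the sample records are synthetic, and nothing is written to disk; the thresholds (0.8 and 0.4) and the `proxy-YYYY-MM-DD.ndjson` pattern are taken from the code above.

```javascript
// Label a call by the share of cached input that was read rather than created.
function classifyCacheHealth(cacheRead, cacheCreation) {
  const total = cacheRead + cacheCreation;
  if (total === 0) return { ratio: null, health: 'na' }; // no cache activity
  const ratio = cacheRead / total;
  if (ratio >= 0.8) return { ratio, health: 'healthy' };
  if (ratio < 0.4 && cacheCreation > 0) return { ratio, health: 'affected' };
  return { ratio, health: 'mixed' };
}

// Derive the day file name from the date part of an ISO 8601 UTC timestamp.
function dayFileName(isoTimestamp) {
  return `proxy-${isoTimestamp.slice(0, 10)}.ndjson`;
}

// Bucket records by day file so each file can be written in one pass.
function groupByDay(records) {
  const byDay = new Map();
  for (const r of records) {
    const name = dayFileName(r.timestamp);
    if (!byDay.has(name)) byDay.set(name, []);
    byDay.get(name).push(r);
  }
  return byDay;
}

// Synthetic records straddling a UTC midnight.
const sample = [
  { timestamp: '2026-04-10T23:59:58.000Z', cacheRead: 900, cacheCreation: 100 },
  { timestamp: '2026-04-11T00:00:03.000Z', cacheRead: 100, cacheCreation: 900 },
  { timestamp: '2026-04-11T00:05:00.000Z', cacheRead: 0, cacheCreation: 0 },
];
for (const [file, recs] of groupByDay(sample)) {
  const labels = recs.map(r => classifyCacheHealth(r.cacheRead, r.cacheCreation).health);
  console.log(file, labels.join(','));
}
```

Because the date is sliced from the timestamp string, the grouping is only per UTC day when the log's timestamps are themselves in UTC.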