claude-code-cache-fix 3.0.1 → 3.0.3

package/README.md CHANGED
@@ -4,13 +4,13 @@
4
4
 
5
5
  English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](./docs/guia-pt-br.md)
6
6
 
7
- Cache optimization proxy and interceptor for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
7
+ Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
8
8
 
9
- > **v3.0.0** adds a local HTTP proxy with hot-reloadable extensions. This is the recommended path for CC v2.1.113+ where the preload interceptor no longer works. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
9
+ > **v3.0.3** Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
10
10
 
11
- > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts. Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` (may reduce quality). See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) for full analysis.
11
+ > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
12
12
 
13
- ## Quick Start: Proxy (recommended for CC v2.1.113+)
13
+ ## Quick Start: Proxy (recommended)
14
14
 
15
15
  The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
16
16
 
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
29
29
 
30
30
  ### What the proxy does
31
31
 
32
- On every request passing through, 7 extensions run in order:
32
+ On every `/v1/messages` request, 7 extensions run in order:
33
33
 
34
34
  | Extension | What it fixes |
35
35
  |-----------|--------------|
@@ -45,232 +45,140 @@ Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/
45
45
 
46
46
  ### Running as a service
47
47
 
48
- For persistent use, run the proxy in the background:
48
+ **Linux (systemd recommended):**
49
49
 
50
- ```bash
51
- # Start in background with logging
52
- nohup node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" > /tmp/cache-fix-proxy.log 2>&1 &
50
+ Create `~/.config/systemd/user/cache-fix-proxy.service`:
53
51
 
54
- # Add to your shell profile
55
- echo 'export ANTHROPIC_BASE_URL=http://127.0.0.1:9801' >> ~/.bashrc
56
- ```
52
+ ```ini
53
+ [Unit]
54
+ Description=Claude Code Cache Fix Proxy (v3.x)
55
+ After=network.target
57
56
 
58
- ### Health check
57
+ [Service]
58
+ Type=simple
59
+ ExecStart=/usr/local/bin/node /path/to/claude-code-cache-fix/proxy/server.mjs
60
+ Restart=on-failure
61
+ RestartSec=5
62
+ Environment=CACHE_FIX_PROXY_PORT=9801
59
63
 
60
- ```bash
61
- curl http://127.0.0.1:9801/health
62
- # {"status":"ok"}
64
+ [Install]
65
+ WantedBy=default.target
63
66
  ```
64
67
 
65
- ## Quick Start: Preload (for CC v2.1.112 and earlier)
66
-
67
- If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor still works and requires no proxy:
68
-
69
68
  ```bash
70
- npm install -g claude-code-cache-fix
71
- NODE_OPTIONS="--import claude-code-cache-fix" claude
72
- ```
73
-
74
- > **Note:** The preload does NOT work on CC v2.1.113+ (Bun binary). Use the proxy path above.
75
-
76
- See [Preload Setup Details](#preload-setup-details) below for wrapper scripts, shell aliases, and Windows instructions.
77
-
78
- ## Security model
79
-
80
- > **This interceptor patches `globalThis.fetch`.** By design, it has full read/write access to all API requests and responses in the Claude Code process. This is inherent to the approach — any fetch interceptor, proxy, or gateway has this position.
81
-
82
- **What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
83
-
84
- **What it does NOT do:** No network calls from the interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
85
-
86
- **Supply chain:** Single unminified file (`preload.mjs`, ~1,700 lines). One dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
87
-
88
- **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
69
+ systemctl --user daemon-reload
70
+ systemctl --user enable --now cache-fix-proxy
89
71
 
90
- ## The problem
91
-
92
- When you use `--resume` or `/resume` in Claude Code, the prompt cache breaks silently. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that should cost ~$0.50/hour can burn through $5–10/hour with no visible indication anything is wrong.
93
-
94
- Three bugs cause this:
95
-
96
- 1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all of them drift to later messages, changing the cache prefix.
97
-
98
- 2. **Fingerprint instability** — The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and cache busts.
99
-
100
- 3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
101
-
102
- Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
103
-
104
- ## Preload Setup Details
105
-
106
- <details>
107
- <summary>Expand for preload interceptor setup (CC v2.1.112 and earlier only)</summary>
72
+ # Optional: start on boot (before login)
73
+ sudo loginctl enable-linger $USER
74
+ ```
108
75
 
109
- ### Installation
76
+ A `cache-fix-proxy install-service` subcommand is planned for v3.1.0 ([#48](https://github.com/cnighswonger/claude-code-cache-fix/issues/48)).
110
77
 
111
- Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).
78
+ **Fallback (any OS):**
112
79
 
113
80
  ```bash
114
- npm install -g claude-code-cache-fix
81
+ nohup node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" > /tmp/cache-fix-proxy.log 2>&1 &
82
+ echo 'export ANTHROPIC_BASE_URL=http://127.0.0.1:9801' >> ~/.bashrc
115
83
  ```
116
84
 
117
- ### Usage
118
-
119
- The preload works as a Node.js module that intercepts API requests before they leave your machine.
120
-
121
- ### Option A: Wrapper script (recommended)
122
-
123
- Create a wrapper script (e.g. `~/bin/claude-fixed`):
85
+ ### Health check
124
86
 
125
87
  ```bash
126
- #!/bin/bash
127
- NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
128
-
129
- CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
130
- CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
88
+ curl http://127.0.0.1:9801/health
89
+ # {"status":"ok"}
90
+ ```
131
91
 
132
- if [ ! -f "$CLAUDE_NPM_CLI" ]; then
133
- echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
134
- echo "Install with: npm install -g @anthropic-ai/claude-code" >&2
135
- exit 1
136
- fi
92
+ ### Corporate environments (proxies, custom CAs)
137
93
 
138
- if [ ! -f "$CACHE_FIX" ]; then
139
- echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
140
- echo "Install with: npm install -g claude-code-cache-fix" >&2
141
- exit 1
142
- fi
94
+ The proxy honors the following environment variables when forwarding to `api.anthropic.com`. Behind Zscaler / Netskope / Forcepoint / Bluecoat / corporate squid, set these in the proxy's environment.
143
95
 
144
- exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
145
- ```
96
+ | Variable | Effect |
97
+ |----------|--------|
98
+ | `HTTPS_PROXY` / `HTTP_PROXY` (and lowercase variants) | Routes upstream requests through the corporate HTTP CONNECT proxy. |
99
+ | `NO_PROXY` | Comma-separated host list to bypass the proxy. Supports `*` and `.suffix.example.com`. |
100
+ | `CACHE_FIX_PROXY_CA_FILE` | Path to a PEM file with one or more extra CA certificates (for SSL-inspecting proxies). |
101
+ | `NODE_EXTRA_CA_CERTS` | Standard Node mechanism — also honored. |
102
+ | `CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=0` | **Insecure escape hatch.** Disables TLS verification. Use only as a last resort while you wait for IT to provide the corp CA bundle. |
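
The `NO_PROXY` rules in the table above can be illustrated with a small sketch. The function name `shouldBypassProxy` and the exact matching semantics are assumptions made for illustration, not the proxy's actual implementation; in particular, treating `.corp.example` as also matching the bare `corp.example` is a guess.

```javascript
// Decide whether `hostname` should bypass the upstream proxy for a given
// NO_PROXY-style value. Assumed semantics (not taken from the proxy's source):
//   "*"                    bypasses everything
//   ".suffix.example.com"  matches any subdomain, and the bare domain itself
//   "host.example.com"     matches that exact host only
function shouldBypassProxy(hostname, noProxy) {
  if (!noProxy) return false;
  const host = hostname.toLowerCase();
  return noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0)
    .some((entry) => {
      if (entry === "*") return true;
      if (entry.startsWith(".")) {
        return host.endsWith(entry) || host === entry.slice(1);
      }
      return host === entry;
    });
}
```

With `NO_PROXY=localhost,127.0.0.1,.corp.example`, this sketch would bypass the corporate proxy for `git.corp.example` but still route `api.anthropic.com` through it, which is the behavior you want when only the upstream API call needs the proxy.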
146
103
 
147
- ```bash
148
- chmod +x ~/bin/claude-fixed
149
- ```
104
+ Example (Windows PowerShell):
150
105
 
151
- Adjust `CLAUDE_NPM_CLI` if your npm global prefix differs. Find it with:
152
- ```bash
153
- npm root -g
106
+ ```powershell
107
+ $env:HTTPS_PROXY = 'http://proxy.corp.example:8080'
108
+ $env:NO_PROXY = 'localhost,127.0.0.1,.corp.example'
109
+ $env:CACHE_FIX_PROXY_CA_FILE = 'C:\corp\zscaler-root.pem'
110
+ node "$(npm root -g)\claude-code-cache-fix\proxy\server.mjs"
154
111
  ```
155
112
 
156
- ### Option B: Shell alias
113
+ Stderr will print `[upstream] using proxy http://proxy.corp.example:8080 ...` on first request when the agent is wired correctly. With no proxy/CA env vars set, behavior is unchanged from earlier versions (Node default agent, system trust store).
157
114
 
158
- ```bash
159
- alias claude='NODE_OPTIONS="--import claude-code-cache-fix" node "$(npm root -g)/@anthropic-ai/claude-code/cli.js"'
160
- ```
115
+ ## Quick Start: Preload (CC v2.1.112 and earlier)
161
116
 
162
- ### Option C: Direct invocation
117
+ If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
163
118
 
164
119
  ```bash
120
+ npm install -g claude-code-cache-fix
165
121
  NODE_OPTIONS="--import claude-code-cache-fix" claude
166
122
  ```
167
123
 
168
- > **Note**: This only works if `claude` points to the npm/Node installation. The standalone binary uses a different execution path that bypasses Node.js preloads.
169
-
170
- ### Windows users
171
-
172
- On Windows, `NODE_OPTIONS="--import ..."` doesn't work the same way as on Linux/macOS. Use the included `claude-fixed.bat` wrapper instead:
173
-
174
- 1. After installing both packages globally:
175
- ```bat
176
- npm install -g claude-code-cache-fix
177
- npm install -g @anthropic-ai/claude-code
178
- ```
124
+ > **Note:** The preload does NOT work on CC v2.1.113+ (Bun binary). Use the proxy above.
179
125
 
180
- 2. Copy `claude-fixed.bat` from this package to a directory in your PATH (e.g., `C:\Users\<you>\bin\`):
181
- ```bat
182
- copy "%NPM_ROOT%\claude-code-cache-fix\claude-fixed.bat" C:\Users\%USERNAME%\bin\
183
- ```
184
- Or find the file manually at your npm global root (run `npm root -g` to locate it).
185
-
186
- 3. Run Claude Code with the interceptor active:
187
- ```bat
188
- claude-fixed [any claude args...]
189
- ```
190
-
191
- The wrapper dynamically resolves your npm global root, constructs a `file:///` URL for the preload module (converting backslashes to forward slashes for Node.js), and launches Claude Code with the interceptor loaded. All environment variables (`CACHE_FIX_DEBUG`, `CACHE_FIX_IMAGE_KEEP_LAST`, etc.) work the same as on Linux/macOS.
192
-
193
- Credit: [@TomTheMenace](https://github.com/anthropics/claude-code/issues/38335) contributed the Windows wrapper and validated the interceptor across a 7.5-hour, 536-call Opus 4.6 session on Windows — 98.4% cache hit rate, 81% of calls had fingerprint instability that the interceptor corrected.
126
+ See [docs/preload-setup.md](docs/preload-setup.md) for wrapper scripts, shell aliases, Windows instructions, and VS Code preload-mode integration.
194
127
 
195
128
  ## VS Code Extension
196
129
 
197
- ### Option A: VSIX extension (recommended)
130
+ The [VS Code extension](https://github.com/cnighswonger/claude-code-cache-fix-vscode) (v0.5.0) supports both proxy and preload modes:
198
131
 
199
- The easiest path — a VS Code extension that handles everything automatically:
132
+ **Proxy mode (recommended):**
133
+ 1. Start the proxy (see above)
134
+ 2. In VS Code command palette: **Claude Code Cache Fix: Enable Proxy Mode**
135
+ 3. Restart any active Claude Code session
200
136
 
201
- 1. Install the interceptor: `npm install -g claude-code-cache-fix`
137
+ **Preload mode (CC ≤v2.1.112):**
138
+ 1. `npm install -g claude-code-cache-fix`
202
139
  2. Download the VSIX from [GitHub Releases](https://github.com/cnighswonger/claude-code-cache-fix-vscode/releases/latest)
203
- 3. Install: `code --install-extension claude-code-cache-fix-0.1.0.vsix`
204
- (or in VS Code: Extensions `...` menu → "Install from VSIX...")
205
- 4. Restart any active Claude Code session
140
+ 3. Install: `code --install-extension claude-code-cache-fix-0.5.0.vsix`
141
+ 4. Command palette: **Claude Code Cache Fix: Enable**
206
142
 
207
- The extension auto-configures `claudeCode.claudeProcessWrapper` on activation. No manual settings needed. Works on Windows, macOS, and Linux.
143
+ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md](docs/preload-setup.md#vs-code-preload-mode).
208
144
 
209
- Commands available in the VS Code command palette:
210
- - **Claude Code Cache Fix: Enable** / **Disable** / **Show Status**
211
-
212
- ### Option B: Manual wrapper (if you prefer not to install the VSIX)
213
-
214
- The VS Code Claude Code extension spawns `claude.exe` / `claude` as a subprocess. The `claude-code.environmentVariables` setting does **not** propagate `NODE_OPTIONS`, so a process wrapper is required.
145
+ ## Security model
215
146
 
216
- **Linux / macOS**: create `~/bin/claude-vscode-wrapper`:
147
+ > **The proxy and interceptor have full read/write access to API requests and responses.** This is inherent to the approach: any fetch interceptor, proxy, or gateway has this position.
217
148
 
218
- ```bash
219
- #!/bin/bash
220
- NPM_ROOT="$(npm root -g 2>/dev/null)"
221
- PRELOAD="$NPM_ROOT/claude-code-cache-fix/preload.mjs"
222
- shift # VS Code passes the original claude path as $1
223
- export NODE_OPTIONS="--import $PRELOAD"
224
- exec node "$NPM_ROOT/@anthropic-ai/claude-code/cli.js" "$@"
225
- ```
149
+ **What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
226
150
 
227
- ```bash
228
- chmod +x ~/bin/claude-vscode-wrapper
229
- ```
151
+ **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
230
152
 
231
- Add to VS Code `settings.json`:
153
+ **Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
232
154
 
233
- ```json
234
- {
235
- "claudeCode.claudeProcessWrapper": "/home/YOUR_USERNAME/bin/claude-vscode-wrapper"
236
- }
237
- ```
238
-
239
- **Windows** — `.bat`/`.cmd` wrappers fail because the extension uses `child_process.spawn()` without `shell: true`. Use the C wrapper source included in this package (`tools/claude-vscode-wrapper.c`):
155
+ **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
240
156
 
241
- ```cmd
242
- cl tools\claude-vscode-wrapper.c /Fe:claude-vscode-wrapper.exe
243
- ```
157
+ ## The problem
244
158
 
245
- Then set in VS Code `settings.json`:
159
+ When you use `--resume` or `/resume` in Claude Code, the prompt cache breaks silently. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that should cost ~$0.50/hour can burn through $5–10/hour with no visible indication anything is wrong.
246
160
 
247
- ```json
248
- {
249
- "claudeCode.claudeProcessWrapper": "C:\\path\\to\\claude-vscode-wrapper.exe"
250
- }
251
- ```
161
+ Three bugs cause this:
252
162
 
253
- ### Known limitations (VS Code)
163
+ 1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all drift to later messages, changing the cache prefix.
254
164
 
255
- - **Fingerprint fix**: Fixed in v1.11.0. The safety check now handles both the v2.1.108+ extraction method and the legacy method. No workaround needed. (Previously required `CACHE_FIX_SKIP_FINGERPRINT=1`.)
165
+ 2. **Fingerprint instability**: The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and the cache busts.
256
166
 
257
- Credit: [@JEONG-JIWOO](https://github.com/JEONG-JIWOO) and [@X-15](https://github.com/X-15) for the VS Code extension investigation and C wrapper ([#16](https://github.com/cnighswonger/claude-code-cache-fix/issues/16)).
167
+ 3. **Non-deterministic tool ordering**: Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
258
168
 
259
- </details>
169
+ Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
260
170
 
261
171
  ## How it works
262
172
 
263
- The module intercepts `globalThis.fetch` before Claude Code makes API calls to `/v1/messages`. On each call it:
173
+ **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
264
174
 
265
- 1. **Scans all user messages** for relocated attachment blocks (skills, MCP, deferred tools, hooks) and moves the latest version of each back to `messages[0]`, matching fresh session layout
266
- 2. **Sorts tool definitions** alphabetically by name for deterministic ordering
267
- 3. **Recomputes the cc_version fingerprint** from the real user message text instead of meta/attachment content
175
+ **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. It applies the same fixes inline: it scans user messages for relocated blocks, sorts tools, recomputes fingerprints, and injects TTL markers.
268
176
 
269
- All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
177
+ Both modes are idempotent — if nothing needs fixing, the request passes through unmodified. Neither mode modifies your conversation; they only normalize the request structure before it hits the API.
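
As an illustration of one such idempotent fix, here is a minimal sketch of deterministic tool ordering. It assumes only that the request body carries a `tools` array whose entries have a `name` field; the function name `stabilizeToolOrder` and the code-unit comparison are choices made for this sketch, not the package's actual extension code.

```javascript
// Return the request body with `tools` in a deterministic, name-sorted order.
// If the order is already stable, the very same object is returned, so the
// request passes through unmodified.
function stabilizeToolOrder(body) {
  if (!Array.isArray(body.tools) || body.tools.length < 2) return body;
  // Compare by UTF-16 code units rather than localeCompare so the result
  // does not depend on the machine's locale.
  const sorted = [...body.tools].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0
  );
  const unchanged = sorted.every((tool, i) => tool === body.tools[i]);
  return unchanged ? body : { ...body, tools: sorted };
}

const fixed = stabilizeToolOrder({
  model: "example-model", // placeholder value for the sketch
  tools: [{ name: "Read" }, { name: "Bash" }, { name: "Edit" }],
});
console.log(fixed.tools.map((t) => t.name)); // [ 'Bash', 'Edit', 'Read' ]
```

Running the function a second time on `fixed` returns `fixed` itself, which is the "do nothing when nothing needs fixing" property described above.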
270
178
 
271
- ## Graduating from Fixes
179
+ ## Graduating from fixes
272
180
 
273
- The interceptor serves three purposes with different lifecycles:
181
+ The package serves three purposes with different lifecycles:
274
182
 
275
183
  | Purpose | Examples | When to disable |
276
184
  |---------|----------|-----------------|
@@ -278,7 +186,7 @@ The interceptor serves three purposes with different lifecycles:
278
186
  | **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
279
187
  | **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
280
188
 
281
- ### Health status
189
+ ### Health status (preload mode)
282
190
 
283
191
  On first API call, the interceptor logs a health status line (requires `CACHE_FIX_DEBUG=1`):
284
192
 
@@ -286,25 +194,14 @@ On first API call, the interceptor logs a health status line (requires `CACHE_FI
286
194
  cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
287
195
  ```
288
196
 
289
- Status meanings:
290
197
  - **active(Xh ago)** — fix was applied recently
291
- - **dormant(N clean sessions)** — bug not detected in N resume sessions; CC may have fixed it
292
- - **safety-blocked(Nx)** — round-trip verification failed; CC changed its algorithm, fix auto-disabled
198
+ - **dormant(N clean sessions)** — bug not detected in N sessions; CC may have fixed it
199
+ - **safety-blocked(Nx)** — round-trip verification failed; fix auto-disabled
293
200
  - **waiting** — fix hasn't been triggered yet
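
If you want to act on the health line programmatically (for example, to flag `safety-blocked` fixes in a script), it can be parsed with a short sketch like the one below. The `parseHealthLine` helper is hypothetical and assumes the log format matches the sample line shown above.

```javascript
// Parse a "cache-fix health: key=state(detail) ..." line into an object.
// Returns null when the line is not a health status line.
function parseHealthLine(line) {
  const prefix = "cache-fix health:";
  const start = line.indexOf(prefix);
  if (start === -1) return null;
  const rest = line.slice(start + prefix.length);
  const result = {};
  // key=state, optionally followed by a parenthesised detail such as "(2h ago)".
  const pattern = /([a-z_]+)=([a-z-]+)(?:\(([^)]*)\))?/g;
  for (const match of rest.matchAll(pattern)) {
    const [, fix, state, detail] = match;
    result[fix] = { state, detail: detail ?? null };
  }
  return result;
}

const health = parseHealthLine(
  "cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting"
);
console.log(health.fingerprint); // { state: 'dormant', detail: '5 clean sessions' }
```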
294
201
 
295
- When a fix shows `dormant`, you can safely disable it:
296
- ```bash
297
- export CACHE_FIX_SKIP_RELOCATE=1 # example
298
- ```
299
-
300
- To disable all fixes but keep monitoring:
301
- ```bash
302
- export CACHE_FIX_DISABLED=1
303
- ```
304
-
305
202
  ### Regression detection
306
203
 
307
- If cache_read ratio drops below 50% across 5+ calls after disabling fixes, you'll see:
204
+ If cache_read ratio drops below 50% across 5+ calls after disabling fixes:
308
205
  ```
309
206
  REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls.
310
207
  Fixes are disabled — consider re-enabling to recover cache performance.
@@ -314,10 +211,7 @@ Fixes are disabled — consider re-enabling to recover cache performance.
314
211
 
315
212
  ### Fingerprint round-trip verification
316
213
 
317
- Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
318
- hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
319
- If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
320
- This ensures the interceptor can never make cache performance *worse* than stock CC.
214
+ Before rewriting the `cc_version` fingerprint, the interceptor verifies that its hardcoded salt and character indices reproduce the fingerprint Claude Code sent. If verification fails (CC changed its algorithm), the rewrite is skipped automatically. This ensures the interceptor can never make cache performance *worse* than stock CC.
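
The round-trip guard can be sketched as follows. This is an illustration of the pattern only: `computeFingerprint` is a hypothetical stand-in for the real algorithm (its salt and character indices are not reproduced here), and the parameter names are assumptions.

```javascript
// Rewrite a fingerprint only when our reimplementation provably reproduces
// the value the client sent. Otherwise leave the request as the client built it.
function rewriteIfVerified({ sentFingerprint, originalInput, stableInput, computeFingerprint }) {
  const reproduced = computeFingerprint(originalInput);
  if (reproduced !== sentFingerprint) {
    // Algorithm drift detected: skip the rewrite so we can never do worse than stock.
    return { fingerprint: sentFingerprint, rewritten: false };
  }
  return { fingerprint: computeFingerprint(stableInput), rewritten: true };
}

// Toy algorithm used purely to exercise the guard; NOT the real fingerprint.
const toyFingerprint = (text) => text.slice(0, 3);

console.log(
  rewriteIfVerified({
    sentFingerprint: "zzz", // does not match toyFingerprint("abcdef")
    originalInput: "abcdef",
    stableInput: "xyz123",
    computeFingerprint: toyFingerprint,
  })
); // { fingerprint: 'zzz', rewritten: false }
```

The design choice is that a failed verification degrades to a no-op rather than an error, which matches the fail-safe behavior described in this section.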
321
215
 
322
216
  ### Fail-safe design
323
217
 
@@ -331,21 +225,18 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
331
225
 
332
226
  ## Status line — quota warnings in real time
333
227
 
334
- The interceptor writes quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script reads this file and displays a live status line in Claude Code showing:
228
+ Both proxy and preload modes write quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script displays a live status line showing:
335
229
 
336
230
  - **Q5h %** with burn rate (%/min)
337
231
  - **Q7d %** with burn rate (%/hr)
338
- - **TTL tier** — shows `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
232
+ - **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
339
233
  - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
340
234
  - **Cache hit rate %**
341
235
  - **OVERAGE** flag when active
342
236
 
343
237
  ### Setup
344
238
 
345
- Copy the script and configure Claude Code to use it:
346
-
347
239
  ```bash
348
- # Copy from the npm package to Claude Code's hooks directory
349
240
  mkdir -p ~/.claude/hooks
350
241
  cp "$(npm root -g)/claude-code-cache-fix/tools/quota-statusline.sh" ~/.claude/hooks/
351
242
  chmod +x ~/.claude/hooks/quota-statusline.sh
@@ -362,305 +253,57 @@ Add to `~/.claude/settings.json`:
362
253
  }
363
254
  ```
364
255
 
365
- ### Recommended: disable git-status injection
366
-
367
- Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
368
-
369
- ```bash
370
- export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
371
- ```
372
-
373
- Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
374
-
375
- The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
376
-
377
- Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
378
-
379
- **Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
380
-
381
256
  ### Why the status line matters
382
257
 
383
- When the server downgrades your TTL to 5m (Layer 2 — quota-aware downgrade at Q5h ≥ 100%), **every idle longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible — you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
384
-
385
- ## Image stripping
386
-
387
- Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and potentially **~85,000+ tokens on Opus 4.7** due to the new tokenizer (up to 35% inflation) and high-res image support (2576px max, up from 1568px). Image stripping is strongly recommended on 4.7.
388
-
389
- Enable image stripping to remove old images from tool results:
390
-
391
- ```bash
392
- export CACHE_FIX_IMAGE_KEEP_LAST=3
393
- ```
394
-
395
- This keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only targets images inside `tool_result` blocks (Read tool output) — user-pasted images are never touched. Files remain on disk for re-reading if needed.
396
-
397
- Set to `0` (default) to disable.
398
-
399
- ## System prompt rewrite (optional)
400
-
401
- The interceptor can also rewrite Claude Code's `# Output efficiency` system-prompt section before the request is sent.
402
-
403
- This feature is **optional** and **disabled by default**. If `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` is unset, nothing is changed.
404
-
405
- Enable it by setting a replacement text:
406
-
407
- ```bash
408
- export CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT=$'# Output efficiency\n\n...'
409
- ```
410
-
411
- The rewrite is intentionally narrow:
412
-
413
- - Only Claude Code's `# Output efficiency` section is replaced
414
- - Other system prompt sections are preserved
415
- - Existing system block structure and fields such as `cache_control` are preserved
416
-
417
- This may be useful for users who want to stay on current Claude Code versions but experiment with a different `Output efficiency` instruction set instead of downgrading to an earlier release.
418
-
419
- ### Prompt variants
420
-
421
- <details>
422
- <summary>Anthropic internal / <code>USER_TYPE=ant</code> version</summary>
423
-
424
- ```text
425
- # Output efficiency
426
-
427
- When sending user-facing text, you're writing for a person, not logging to a console. Assume users can't see most tool calls or thinking - only your text output. Before your first tool call, briefly state what you're about to do. While working, give short updates at key moments: when you find something load-bearing (a bug, a root cause), when changing direction, when you've made progress without an update.
428
-
429
- When you give updates, assume the recipient may have stepped away and lost the thread. They do not know your internal shorthand, codenames, or half-formed plan. Write in complete, grammatical sentences that can be understood cold. Spell out technical terms when helpful. If unsure, err on the side of a bit more explanation. Adapt to the user's expertise: experts can handle denser updates, but don't make novice users reconstruct context on their own.
430
-
431
- User-facing text should read like natural prose. Avoid clipped sentence fragments, excessive dashes, symbolic shorthand, or formatting that reads like console output. Use tables only when they genuinely improve scanability, such as compact facts (files, lines, pass/fail) or quantitative comparisons. Keep explanatory reasoning in prose around the table, not inside it. Avoid semantic backtracking: structure sentences so the user can follow them linearly without having to reinterpret earlier clauses after reading later ones.
432
-
433
- Optimize for fast human comprehension, not minimal surface area. If the user has to reread your summary or ask a follow-up just to understand what happened, you saved the wrong tokens. Match the level of structure to the task: for a simple question, answer in plain prose without unnecessary headings or numbered lists. While staying clear and direct, also be concise and avoid fluff. Skip filler, obvious restatements, and throat-clearing. Get to the point. Don't over-focus on low-signal details from your process. When it helps, use an inverted pyramid structure with the conclusion first and details later.
434
-
435
- These user-facing text instructions do not apply to code or tool calls.
436
- ```
437
-
438
- </details>
439
-
440
- <details>
441
- <summary>Public / default Claude Code version</summary>
442
-
443
- ```text
444
- # Output efficiency
445
-
446
- IMPORTANT: Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.
447
-
448
- Your text output is brief, direct, and to the point. Lead with the answer or action, not the reasoning. Omit filler, preamble, and unnecessary transitions. Do not restate the user's request; move directly to the work. When explanation is needed, include only what helps the user understand the outcome.
449
-
450
- Prioritize user-facing text for:
451
- - decisions that require user input
452
- - high-signal progress updates at natural milestones
453
- - errors or blockers that change the plan
454
-
455
- If a sentence can do the job, do not turn it into three. Favor short, direct constructions over long explanatory prose. These instructions do not apply to code or tool calls.
456
- ```
457
-
458
- </details>
459
-
460
- <details>
461
- <summary>Example custom replacement (a middle-ground version combining the two versions above)</summary>
462
-
463
- ```text
464
- # Output efficiency
465
-
466
- When sending user-facing text, write for a person, not a log file. Assume the user cannot see most tool calls or hidden reasoning - only your text output.
467
-
468
- Keep user-facing text clear, direct, and reasonably concise. Lead with the answer or action. Skip filler, repetition, and unnecessary preamble.
469
-
470
- Explain enough for the user to understand the reasoning, tradeoffs, or root cause when that would help them learn or make a decision, but do not turn simple answers into long writeups.
471
-
472
- These instructions apply to user-facing text only. They do not apply to investigation, code reading, tool use, or verification.
473
-
474
- Before making changes, read the relevant code and understand the surrounding context. Check types, signatures, call sites, and error causes before editing. Do not confuse brevity with rushing, and do not replace understanding with trial and error.
475
-
476
- While working, give short updates at meaningful moments: when you find the root cause, when the plan changes, when you hit a blocker, or when a meaningful milestone is complete. Do not narrate every step.
477
-
478
- When reporting results, be accurate and concrete. If you did not verify something, say so plainly. If a check failed, say that plainly too.
479
- ```
480
-
481
- </details>
482
-
483
- ## Monitoring
484
-
485
- The interceptor includes monitoring for several additional issues identified by the community:
486
-
487
- ### Microcompact / budget enforcement
488
-
489
- Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
490
-
491
- The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
492
-
493
- ### False rate limiter
494
-
495
- The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
496
-
497
- ### GrowthBook flag dump
498
-
499
- On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
500
-
501
- ### Quota tracking
502
-
503
- Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
504
-
505
- ### Peak hour detection
258
+ When the server downgrades your TTL to 5m (quota-aware downgrade at Q5h ≥ 100%), **every idle period longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible. With it, the red `TTL:5m` warning tells you: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
506
259
 
507
- Anthropic applies elevated quota drain rates during weekday peak hours (13:00–19:00 UTC, Mon–Fri). The interceptor detects peak windows and writes `peak_hour: true/false` to `quota-status.json`. See `docs/peak-hours-reference.md` for sources and details.
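
The peak-window check described above can be sketched as follows. This is a minimal illustration, not the interceptor's actual code: `isPeakHour` is a hypothetical name, and the sketch assumes the window is half-open, so 13:00 UTC counts as peak and 19:00 UTC does not.

```javascript
// Hypothetical helper: returns true when `date` falls inside the weekday
// peak window (13:00-19:00 UTC, Mon-Fri) described in this README.
// Assumption: the end of the window is exclusive.
function isPeakHour(date) {
  const day = date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const hour = date.getUTCHours();
  const isWeekday = day >= 1 && day <= 5;
  return isWeekday && hour >= 13 && hour < 19;
}
```

Using the UTC accessors keeps the result independent of the machine's local time zone, which matters because the window is defined in UTC.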
508
-
509
- ### Usage telemetry and cost reporting
260
+ ### Recommended: disable git-status injection
510
261
 
511
- The interceptor logs per-call usage data to `~/.claude/usage.jsonl`, one JSON line per API call, with model, token counts, and cache breakdown. Use the bundled cost report tool to analyze costs:
262
+ Claude Code injects live `git status` into the system prompt on every call. Any file edit changes the git status, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call:
512
263
 
513
264
  ```bash
514
- node tools/cost-report.mjs # today's costs from interceptor log
515
- node tools/cost-report.mjs --date 2026-04-08 # specific date
516
- node tools/cost-report.mjs --since 2h # last 2 hours
517
- node tools/cost-report.mjs --admin-key <key> # cross-reference with Admin API
265
+ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
518
266
  ```
519
267
 
520
- Also works with any JSONL containing Anthropic usage fields (`--file`, stdin), which is useful for SDK users and proxy setups. See `docs/cost-report.md` for full documentation.
268
+ Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs context. Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag).
521
269
 
522
- ### Quota analysis (5-hour quota counting)
270
+ ## Image stripping (preload mode)
523
271
 
524
- The same `usage.jsonl` log can be analyzed to test how Anthropic's 5-hour quota is actually computed. Run the bundled tool:
272
+ Images read via the Read tool persist as base64 in conversation history, riding along on every subsequent API call. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and **~85,000+ on Opus 4.7** due to the new tokenizer. Image stripping is strongly recommended on 4.7.
525
273
 
526
274
  ```bash
527
- node tools/quota-analysis.mjs # analyze your default log
528
- node tools/quota-analysis.mjs --since 24h # last 24 hours only
529
- node tools/quota-analysis.mjs --json # machine-readable output
275
+ export CACHE_FIX_IMAGE_KEEP_LAST=3
530
276
  ```
531
277
 
532
- The tool answers three questions from your own data:
533
-
534
- 1. **Does `cache_read` count toward your 5-hour quota?** Tests three hypotheses (cache_read costs 0x / 0.1x / 1x of input rate) and reports which one best explains your `q5h_pct` trajectory across reset windows. Lower coefficient of variation across windows = better fit.
535
- 2. **Do peak hours cost more quota per token?** Splits windows into peak-dominant (≥80% peak calls) and off-peak-dominant (≤20%) and compares the implied 100% quota under the best-fit model.
536
- 3. **What is your account's effective 5-hour quota in token-equivalents?** Reports a concrete number you can compare against your subscription tier or against what other users measure.
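
The hypothesis test in question 1 can be sketched roughly as below. This is an illustration under stated assumptions, not the logic of `tools/quota-analysis.mjs`: the per-window field names (`input`, `output`, `cacheCreation`, `cacheRead`, `q5hPctUsed`) are hypothetical, and the sketch assumes every token class other than `cache_read` is weighted at 1x.

```javascript
// Candidate weights for cache_read relative to the input rate.
const HYPOTHESES = { "cache_read=0x": 0, "cache_read=0.1x": 0.1, "cache_read=1x": 1 };

// Coefficient of variation: population standard deviation divided by the mean.
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// For each hypothesis, compute the token budget that 100% quota would imply
// in every window, then pick the hypothesis whose implied budget varies least.
function bestFitHypothesis(windows) {
  let best = null;
  for (const [name, weight] of Object.entries(HYPOTHESES)) {
    const implied = windows.map((w) => {
      const weighted = w.input + w.output + w.cacheCreation + weight * w.cacheRead;
      return weighted / (w.q5hPctUsed / 100);
    });
    const cv = coefficientOfVariation(implied);
    if (best === null || cv < best.cv) best = { name, cv };
  }
  return best;
}
```

If the true weighting matches one hypothesis, the implied 100% budget should come out nearly constant across reset windows, which is why the lowest coefficient of variation is treated as the best fit.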
537
-
538
- Requires `q5h_pct`, `q7d_pct`, and `peak_hour` fields in usage.jsonl, which were added in v1.6.1 (2026-04-09). Older entries are silently filtered out.
278
+ Keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only `tool_result` blocks are targeted; user-pasted images are never touched.
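
A minimal sketch of that stripping rule, assuming the standard Messages API shape (user messages whose `content` array holds `tool_result` blocks, which in turn hold `image` blocks). The function name `stripOldImages` and the placeholder text are hypothetical; the package's actual implementation and wording may differ.

```javascript
// Hypothetical placeholder text, chosen for illustration only.
const PLACEHOLDER = "[image removed from older tool result]";

// Returns a new messages array in which images inside tool_result blocks are
// replaced by a text placeholder, except in the last `keepLast` user messages.
function stripOldImages(messages, keepLast) {
  if (!keepLast || keepLast <= 0) return messages; // 0 = disabled

  // Indices of user messages, oldest first; the newest `keepLast` are protected.
  const userIdx = messages
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  const protectedIdx = new Set(userIdx.slice(-keepLast));

  return messages.map((msg, i) => {
    if (msg.role !== "user" || protectedIdx.has(i) || !Array.isArray(msg.content)) {
      return msg;
    }
    const content = msg.content.map((block) => {
      // Only tool_result blocks are rewritten; top-level (user-pasted) images pass through.
      if (block.type !== "tool_result" || !Array.isArray(block.content)) return block;
      return {
        ...block,
        content: block.content.map((inner) =>
          inner.type === "image" ? { type: "text", text: PLACEHOLDER } : inner
        ),
      };
    });
    return { ...msg, content };
  });
}
```

The sketch copies rather than mutates, so the original conversation history stays intact if the rewritten request needs to be compared against it.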
539
279
 
540
- **Help us validate across accounts:** if you run this on your own log, please open an issue or PR on this repo with your output (or just the best-fit hypothesis name and your peak/off-peak ratio). Cross-validating across multiple accounts is the only way to distinguish per-account variance from real findings. Reference: [anthropics/claude-code#45756](https://github.com/anthropics/claude-code/issues/45756).
280
+ ## System prompt rewrite (preload mode, optional)
541
281
 
542
- ## Debug mode
282
+ The interceptor can rewrite Claude Code's `# Output efficiency` system-prompt section. Disabled by default. Enable with `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. See [docs/output-efficiency-prompts.md](docs/output-efficiency-prompts.md) for the three known prompt variants and usage instructions.
543
283
 
544
- Enable debug logging to verify the fix is working:
284
+ ## Monitoring & diagnostics
545
285
 
546
- ```bash
547
- CACHE_FIX_DEBUG=1 claude-fixed
548
- ```
549
-
550
- Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
551
- - `APPLIED: resume message relocation` — block scatter was detected and fixed
552
- - `APPLIED: tool order stabilization` — tools were reordered
553
- - `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
554
- - `APPLIED: stripped N images from old tool results` — images were stripped
555
- - `APPLIED: output efficiency section rewritten` — output-efficiency section was replaced
556
- - `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
557
- - `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
558
- - `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
559
- - `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
560
- - `PROMPT SIZE: system=N tools=N injected=N (skills=N mcp=N ...)` — per-call prompt size breakdown
561
- - `CACHE TTL: tier=1h create=N read=N hit=N% (1h=N 5m=N)` — TTL tier and cache hit rate per call
562
- - `PEAK HOUR: weekday 13:00-19:00 UTC` — Anthropic peak hour throttling active
563
- - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
564
- - `SKIPPED: output efficiency rewrite (section not found)` — no matching output-efficiency section found
565
-
566
- ### Prefix diff mode
567
-
568
- Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
569
-
570
- ```bash
571
- CACHE_FIX_PREFIXDIFF=1 claude-fixed
572
- ```
286
+ The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works in both proxy and preload modes via `~/.claude/quota-status.json`.
573
287
 
574
- Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are generated on the first API call after a restart.
575
-
576
- ## Environment variables
577
-
578
- | Variable | Default | Description |
579
- |----------|---------|-------------|
580
- | `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
581
- | `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
582
- | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
583
- | `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
584
- | `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
585
- | `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes; keep monitoring + optimizations active |
586
- | `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
587
- | `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
588
- | `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
589
- | `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
590
- | `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
591
- | `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
592
- | `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
593
- | `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
594
- | `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
595
- | `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
288
+ See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
596
289
 
597
290
  ## Limitations
598
291
 
599
- - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
600
- - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
601
- - **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
602
- - **System prompt rewrite is experimental** — This hook only rewrites one system-prompt section and is opt-in, but there are still unknowns: it is not proven that this prompt text is responsible for the behavior differences discussed in community reports, and it is not known whether future server-side validation could react to modified system prompts. Use at your own risk.
292
+ - **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
293
+ - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is server-side and cannot be fixed client-side. The proxy/interceptor prevents the cache instability that can push you into overage in the first place.
294
+ - **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. Microcompact and budget enforcement are server-controlled via GrowthBook flags with no client-side disable option.
295
+ - **System prompt rewrite is experimental** — Preload-only, opt-in. Not proven to be the cause of behavior differences discussed in community reports. Use at your own risk.
603
296
  - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
604
297
 
605
298
  ## Tracked issues
606
299
 
607
- - [#34629](https://github.com/anthropics/claude-code/issues/34629) Original resume cache regression report
608
- - [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
609
- - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
610
- - [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
611
- - [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
612
- - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
613
- - [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion around the `Output efficiency` system-prompt change and its possible effect on model behavior
300
+ We monitor 30+ upstream Claude Code issues related to cache, quota, and context bugs. See [TRACKED_ISSUES.md](TRACKED_ISSUES.md) for the full list with our involvement, community research, and key contributors.
614
301
 
615
302
  ## Related research
616
303
 
617
- - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
618
- - **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
619
- - **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring, and forced-restart/compaction detection. Reads from Claude Code's native session JSONL files and optionally from an HTTP proxy NDJSON stream. v1.4.0 documented the forced-session-restart mechanism at quota-cap boundaries (~490K tokens per event) and the 78–91% cache-wipe pattern at compaction events. Complementary to our interceptor's in-process vantage point. See [Works with @fgrosswig's dashboard](#works-with-fgrosswigs-dashboard) below for the interop pattern.
620
-
621
- ## Works with @fgrosswig's dashboard
622
-
623
- This interceptor and [@fgrosswig](https://github.com/fgrosswig)'s
624
- [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)
625
- solve strongly complementary problems. The interceptor captures per-call API
626
- data from inside the Node.js process — cache metrics, quota state, TTL tier,
627
- rewrites applied. The dashboard provides the visualization layer — historical
628
- trending, per-day charts, multi-host aggregation, cache-health scoring.
629
-
630
- Running both gives you the best of both tools, and the integration is a
631
- one-liner thanks to the dashboard's tolerant NDJSON ingest and our new
632
- `usage-to-dashboard-ndjson` translator.
633
-
634
- ### Quick setup
635
-
636
- ```bash
637
- # Install both tools
638
- npm install -g claude-code-cache-fix
639
- # (follow fgrosswig's dashboard install: https://github.com/fgrosswig/claude-usage-dashboard)
640
-
641
- # One-shot translation (reads ~/.claude/usage.jsonl, writes to
642
- # ~/.claude/anthropic-proxy-logs/proxy-YYYY-MM-DD.ndjson, which his
643
- # dashboard already watches)
644
- node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs
645
-
646
- # Or keep it live-updating as the interceptor logs new calls
647
- node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs --follow &
648
- ```
649
-
650
- No configuration required on the dashboard side — fgrosswig's
651
- `collectProxyNdjsonFiles()` auto-discovers files in
652
- `~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`), and our
653
- translator writes to exactly that path with the expected `proxy-YYYY-MM-DD.ndjson`
654
- filename convention. The dashboard's tolerant ingestion layer ignores unknown
655
- fields, so interceptor-specific extras (`ttl_tier`, `ephemeral_1h_input_tokens`,
656
- `ephemeral_5m_input_tokens`, `peak_hour`, quota state) pass through cleanly
657
- and remain available to downstream consumers that know to read them.
658
-
659
- The `cost_factor` metric in `tools/cost-report.mjs` also comes from
660
- fgrosswig's methodology — the `(input + output + cache_read + cache_creation) / output`
661
- ratio that gives a single-number measure of how much context is being paid
662
- per useful output token. A rising cost factor across a long session is the
663
- measurable signature of cache-efficiency degradation.
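
The ratio above is simple enough to compute by hand. A minimal sketch follows; the field names on the usage record are assumptions for illustration and may not match the keys used in `usage.jsonl` or `tools/cost-report.mjs`.

```javascript
// Cost factor: total tokens processed per output token, following the
// (input + output + cache_read + cache_creation) / output ratio quoted above.
// Field names are hypothetical.
function costFactor(usage) {
  const total = usage.input + usage.output + usage.cacheRead + usage.cacheCreation;
  // Guard against division by zero for calls that produced no output tokens.
  return usage.output > 0 ? total / usage.output : Infinity;
}
```

For example, a call with 100 input, 50 output, 800 cache-read, and 50 cache-creation tokens has a cost factor of 1000 / 50 = 20.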
304
+ - **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — 38,996-request proxy-based analysis: 7 bugs (microcompact, budget caps, false rate limiter, JSONL duplication, extended thinking), GrowthBook feature flag causal testing, Opus 4.7 burn rate advisory. The monitoring features in v1.1.0 are informed by this research.
305
+ - **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds. Works with any Claude client that supports `ANTHROPIC_BASE_URL`.
306
+ - **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring. Complementary to our proxy's vantage point. See [docs/dashboard-integration.md](docs/dashboard-integration.md) for the interop setup.
664
307
 
665
308
  ## Used in production
666
309
 
@@ -671,17 +314,16 @@ measurable signature of cache-efficiency degradation.
671
314
  - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
672
315
  - **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk.
673
316
  - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
674
- - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
675
- - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
317
+ - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, proxy architecture, package maintainer
318
+ - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification, fingerprint verification fix for CC v2.1.108+ (PR #21), Korean README (PR #22), [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
676
319
  - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
677
320
  - **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
678
- - **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
321
+ - **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate)
679
322
  - **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
680
323
  - **[@beekamai](https://github.com/beekamai)** — Windows URL-encoding fix for `claude-fixed.bat` when npm root contains spaces (PR #17)
681
324
  - **[@JEONG-JIWOO](https://github.com/JEONG-JIWOO)** — VS Code extension investigation: discovered `claudeCode.claudeProcessWrapper` as the working integration path, wrote the C wrapper for Windows (#16)
682
325
  - **[@X-15](https://github.com/X-15)** — VS Code extension validation, per-fix health status analysis confirming safety check behavior on v2.1.105 (#16)
683
- - **[@ArkNill](https://github.com/ArkNill)** — Fingerprint verification fix for CC v2.1.108+ (`isMeta` filter change, PR #21), Korean README (PR #22), original [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
684
- - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery
326
+ - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
685
327
 
686
328
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
687
329