claude-code-cache-fix 3.0.1 → 3.0.3
- package/README.ko.md +186 -174
- package/README.md +114 -472
- package/README.zh.md +153 -195
- package/package.json +4 -1
- package/preload.mjs +2 -2
- package/proxy/config.mjs +15 -0
- package/proxy/pipeline.mjs +2 -1
- package/proxy/upstream.mjs +102 -0
package/README.md
CHANGED
|
@@ -4,13 +4,13 @@
|
|
|
4
4
|
|
|
5
5
|
English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](./docs/guia-pt-br.md)
|
|
6
6
|
|
|
7
|
-
Cache optimization proxy
|
|
7
|
+
Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
|
|
8
8
|
|
|
9
|
-
> **v3.0.
|
|
9
|
+
> **v3.0.3** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
|
|
10
10
|
|
|
11
|
-
> **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts. Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1`
|
|
11
|
+
> **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
|
|
12
12
|
|
|
13
|
-
## Quick Start: Proxy (recommended
|
|
13
|
+
## Quick Start: Proxy (recommended)
|
|
14
14
|
|
|
15
15
|
The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
|
|
16
16
|
|
|
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
|
|
|
29
29
|
|
|
30
30
|
### What the proxy does
|
|
31
31
|
|
|
32
|
-
On every request
|
|
32
|
+
On every `/v1/messages` request, 7 extensions run in order:
|
|
33
33
|
|
|
34
34
|
| Extension | What it fixes |
|
|
35
35
|
|-----------|--------------|
|
|
@@ -45,232 +45,140 @@ Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/
|
|
|
45
45
|
|
|
46
46
|
### Running as a service
|
|
47
47
|
|
|
48
|
-
|
|
48
|
+
**Linux (systemd — recommended):**
|
|
49
49
|
|
|
50
|
-
|
|
51
|
-
# Start in background with logging
|
|
52
|
-
nohup node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" > /tmp/cache-fix-proxy.log 2>&1 &
|
|
50
|
+
Create `~/.config/systemd/user/cache-fix-proxy.service`:
|
|
53
51
|
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
52
|
+
```ini
|
|
53
|
+
[Unit]
|
|
54
|
+
Description=Claude Code Cache Fix Proxy (v3.x)
|
|
55
|
+
After=network.target
|
|
57
56
|
|
|
58
|
-
|
|
57
|
+
[Service]
|
|
58
|
+
Type=simple
|
|
59
|
+
ExecStart=/usr/local/bin/node /path/to/claude-code-cache-fix/proxy/server.mjs
|
|
60
|
+
Restart=on-failure
|
|
61
|
+
RestartSec=5
|
|
62
|
+
Environment=CACHE_FIX_PROXY_PORT=9801
|
|
59
63
|
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
# {"status":"ok"}
|
|
64
|
+
[Install]
|
|
65
|
+
WantedBy=default.target
|
|
63
66
|
```
|
|
64
67
|
|
|
65
|
-
## Quick Start: Preload (for CC v2.1.112 and earlier)
|
|
66
|
-
|
|
67
|
-
If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor still works and requires no proxy:
|
|
68
|
-
|
|
69
68
|
```bash
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
```
|
|
73
|
-
|
|
74
|
-
> **Note:** The preload does NOT work on CC v2.1.113+ (Bun binary). Use the proxy path above.
|
|
75
|
-
|
|
76
|
-
See [Preload Setup Details](#preload-setup-details) below for wrapper scripts, shell aliases, and Windows instructions.
|
|
77
|
-
|
|
78
|
-
## Security model
|
|
79
|
-
|
|
80
|
-
> **This interceptor patches `globalThis.fetch`.** By design, it has full read/write access to all API requests and responses in the Claude Code process. This is inherent to the approach — any fetch interceptor, proxy, or gateway has this position.
|
|
81
|
-
|
|
82
|
-
**What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
|
|
83
|
-
|
|
84
|
-
**What it does NOT do:** No network calls from the interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
|
|
85
|
-
|
|
86
|
-
**Supply chain:** Single unminified file (`preload.mjs`, ~1,700 lines). One dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
|
|
87
|
-
|
|
88
|
-
**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
|
|
69
|
+
systemctl --user daemon-reload
|
|
70
|
+
systemctl --user enable --now cache-fix-proxy
|
|
89
71
|
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
Three bugs cause this:
|
|
95
|
-
|
|
96
|
-
1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all of them drift to later messages, changing the cache prefix.
|
|
97
|
-
|
|
98
|
-
2. **Fingerprint instability** — The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and cache busts.
|
|
99
|
-
|
|
100
|
-
3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
|
|
101
|
-
|
|
102
|
-
Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
|
|
103
|
-
|
|
104
|
-
## Preload Setup Details
|
|
105
|
-
|
|
106
|
-
<details>
|
|
107
|
-
<summary>Expand for preload interceptor setup (CC v2.1.112 and earlier only)</summary>
|
|
72
|
+
# Optional: start on boot (before login)
|
|
73
|
+
sudo loginctl enable-linger $USER
|
|
74
|
+
```
|
|
108
75
|
|
|
109
|
-
|
|
76
|
+
A `cache-fix-proxy install-service` subcommand is planned for v3.1.0 ([#48](https://github.com/cnighswonger/claude-code-cache-fix/issues/48)).
|
|
110
77
|
|
|
111
|
-
|
|
78
|
+
**Fallback (any OS):**
|
|
112
79
|
|
|
113
80
|
```bash
|
|
114
|
-
npm
|
|
81
|
+
nohup node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" > /tmp/cache-fix-proxy.log 2>&1 &
|
|
82
|
+
echo 'export ANTHROPIC_BASE_URL=http://127.0.0.1:9801' >> ~/.bashrc
|
|
115
83
|
```
|
|
116
84
|
|
|
117
|
-
###
|
|
118
|
-
|
|
119
|
-
The preload works as a Node.js module that intercepts API requests before they leave your machine.
|
|
120
|
-
|
|
121
|
-
### Option A: Wrapper script (recommended)
|
|
122
|
-
|
|
123
|
-
Create a wrapper script (e.g. `~/bin/claude-fixed`):
|
|
85
|
+
### Health check
|
|
124
86
|
|
|
125
87
|
```bash
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
|
|
130
|
-
CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
|
|
88
|
+
curl http://127.0.0.1:9801/health
|
|
89
|
+
# {"status":"ok"}
|
|
90
|
+
```
|
|
131
91
|
|
|
132
|
-
|
|
133
|
-
echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
|
|
134
|
-
echo "Install with: npm install -g @anthropic-ai/claude-code" >&2
|
|
135
|
-
exit 1
|
|
136
|
-
fi
|
|
92
|
+
### Corporate environments (proxies, custom CAs)
|
|
137
93
|
|
|
138
|
-
|
|
139
|
-
echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
|
|
140
|
-
echo "Install with: npm install -g claude-code-cache-fix" >&2
|
|
141
|
-
exit 1
|
|
142
|
-
fi
|
|
94
|
+
The proxy honors the following environment variables when forwarding to `api.anthropic.com`. Behind Zscaler / Netskope / Forcepoint / Bluecoat / corporate squid, set these in the proxy's environment.
|
|
143
95
|
|
|
144
|
-
|
|
145
|
-
|
|
96
|
+
| Variable | Effect |
|
|
97
|
+
|----------|--------|
|
|
98
|
+
| `HTTPS_PROXY` / `HTTP_PROXY` (and lowercase variants) | Routes upstream requests through the corporate HTTP CONNECT proxy. |
|
|
99
|
+
| `NO_PROXY` | Comma-separated host list to bypass the proxy. Supports `*` and `.suffix.example.com`. |
|
|
100
|
+
| `CACHE_FIX_PROXY_CA_FILE` | Path to a PEM file with one or more extra CA certificates (for SSL-inspecting proxies). |
|
|
101
|
+
| `NODE_EXTRA_CA_CERTS` | Standard Node mechanism — also honored. |
|
|
102
|
+
| `CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=0` | **Insecure escape hatch.** Disables TLS verification. Use only as a last resort while you wait for IT to provide the corp CA bundle. |
|
|
146
103
|
|
|
147
|
-
|
|
148
|
-
chmod +x ~/bin/claude-fixed
|
|
149
|
-
```
|
|
104
|
+
Example (Windows PowerShell):
|
|
150
105
|
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
106
|
+
```powershell
|
|
107
|
+
$env:HTTPS_PROXY = 'http://proxy.corp.example:8080'
|
|
108
|
+
$env:NO_PROXY = 'localhost,127.0.0.1,.corp.example'
|
|
109
|
+
$env:CACHE_FIX_PROXY_CA_FILE = 'C:\corp\zscaler-root.pem'
|
|
110
|
+
node "$(npm root -g)\claude-code-cache-fix\proxy\server.mjs"
|
|
154
111
|
```
|
|
155
112
|
|
|
156
|
-
|
|
113
|
+
Stderr will print `[upstream] using proxy http://proxy.corp.example:8080 ...` on first request when the agent is wired correctly. With no proxy/CA env vars set, behavior is unchanged from earlier versions (Node default agent, system trust store).
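To make the `NO_PROXY` row above more concrete, here is a minimal sketch of how a comma-separated bypass list with `*` and `.suffix` entries might be evaluated. The function name `shouldBypassProxy` and the exact matching rules are assumptions for illustration; the package's actual logic in `proxy/upstream.mjs` is not shown in this diff and may differ.

```javascript
// Hypothetical helper: decide whether a host should bypass the corporate proxy.
// The rules below are an assumption based on the NO_PROXY table row
// ("*" and ".suffix.example.com"); the real implementation may differ.
function shouldBypassProxy(hostname, noProxy) {
  if (!noProxy) return false;
  const host = hostname.toLowerCase();
  const entries = noProxy
    .split(',')
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);

  return entries.some((entry) => {
    if (entry === '*') return true; // wildcard: bypass the proxy for every host
    if (entry.startsWith('.')) {
      // ".corp.example" matches "git.corp.example" and "corp.example" itself
      return host.endsWith(entry) || host === entry.slice(1);
    }
    return host === entry; // plain entry: exact host match only
  });
}

const noProxy = 'localhost,127.0.0.1,.corp.example';
console.log(shouldBypassProxy('git.corp.example', noProxy));  // true
console.log(shouldBypassProxy('api.anthropic.com', noProxy)); // false
```

With the PowerShell values shown earlier, a request to `api.anthropic.com` would go through the corporate proxy under these assumed rules, while internal `.corp.example` hosts would connect directly.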
|
|
157
114
|
|
|
158
|
-
|
|
159
|
-
alias claude='NODE_OPTIONS="--import claude-code-cache-fix" node "$(npm root -g)/@anthropic-ai/claude-code/cli.js"'
|
|
160
|
-
```
|
|
115
|
+
## Quick Start: Preload (CC v2.1.112 and earlier)
|
|
161
116
|
|
|
162
|
-
|
|
117
|
+
If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
|
|
163
118
|
|
|
164
119
|
```bash
|
|
120
|
+
npm install -g claude-code-cache-fix
|
|
165
121
|
NODE_OPTIONS="--import claude-code-cache-fix" claude
|
|
166
122
|
```
|
|
167
123
|
|
|
168
|
-
> **Note
|
|
169
|
-
|
|
170
|
-
### Windows users
|
|
171
|
-
|
|
172
|
-
On Windows, `NODE_OPTIONS="--import ..."` doesn't work the same way as on Linux/macOS. Use the included `claude-fixed.bat` wrapper instead:
|
|
173
|
-
|
|
174
|
-
1. After installing both packages globally:
|
|
175
|
-
```bat
|
|
176
|
-
npm install -g claude-code-cache-fix
|
|
177
|
-
npm install -g @anthropic-ai/claude-code
|
|
178
|
-
```
|
|
124
|
+
> **Note:** The preload does NOT work on CC v2.1.113+ (Bun binary). Use the proxy above.
|
|
179
125
|
|
|
180
|
-
|
|
181
|
-
```bat
|
|
182
|
-
copy "%NPM_ROOT%\claude-code-cache-fix\claude-fixed.bat" C:\Users\%USERNAME%\bin\
|
|
183
|
-
```
|
|
184
|
-
Or find the file manually at your npm global root (run `npm root -g` to locate it).
|
|
185
|
-
|
|
186
|
-
3. Run Claude Code with the interceptor active:
|
|
187
|
-
```bat
|
|
188
|
-
claude-fixed [any claude args...]
|
|
189
|
-
```
|
|
190
|
-
|
|
191
|
-
The wrapper dynamically resolves your npm global root, constructs a `file:///` URL for the preload module (converting backslashes to forward slashes for Node.js), and launches Claude Code with the interceptor loaded. All environment variables (`CACHE_FIX_DEBUG`, `CACHE_FIX_IMAGE_KEEP_LAST`, etc.) work the same as on Linux/macOS.
|
|
192
|
-
|
|
193
|
-
Credit: [@TomTheMenace](https://github.com/anthropics/claude-code/issues/38335) contributed the Windows wrapper and validated the interceptor across a 7.5-hour, 536-call Opus 4.6 session on Windows — 98.4% cache hit rate, 81% of calls had fingerprint instability that the interceptor corrected.
|
|
126
|
+
See [docs/preload-setup.md](docs/preload-setup.md) for wrapper scripts, shell aliases, Windows instructions, and VS Code preload-mode integration.
|
|
194
127
|
|
|
195
128
|
## VS Code Extension
|
|
196
129
|
|
|
197
|
-
|
|
130
|
+
The [VS Code extension](https://github.com/cnighswonger/claude-code-cache-fix-vscode) (v0.5.0) supports both proxy and preload modes:
|
|
198
131
|
|
|
199
|
-
|
|
132
|
+
**Proxy mode (recommended):**
|
|
133
|
+
1. Start the proxy (see above)
|
|
134
|
+
2. In VS Code command palette: **Claude Code Cache Fix: Enable Proxy Mode**
|
|
135
|
+
3. Restart any active Claude Code session
|
|
200
136
|
|
|
201
|
-
|
|
137
|
+
**Preload mode (CC ≤v2.1.112):**
|
|
138
|
+
1. `npm install -g claude-code-cache-fix`
|
|
202
139
|
2. Download the VSIX from [GitHub Releases](https://github.com/cnighswonger/claude-code-cache-fix-vscode/releases/latest)
|
|
203
|
-
3. Install: `code --install-extension claude-code-cache-fix-0.
|
|
204
|
-
|
|
205
|
-
4. Restart any active Claude Code session
|
|
140
|
+
3. Install: `code --install-extension claude-code-cache-fix-0.5.0.vsix`
|
|
141
|
+
4. Command palette: **Claude Code Cache Fix: Enable**
|
|
206
142
|
|
|
207
|
-
|
|
143
|
+
For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md](docs/preload-setup.md#vs-code-preload-mode).
|
|
208
144
|
|
|
209
|
-
|
|
210
|
-
- **Claude Code Cache Fix: Enable** / **Disable** / **Show Status**
|
|
211
|
-
|
|
212
|
-
### Option B: Manual wrapper (if you prefer not to install the VSIX)
|
|
213
|
-
|
|
214
|
-
The VS Code Claude Code extension spawns `claude.exe` / `claude` as a subprocess. The `claude-code.environmentVariables` setting does **not** propagate `NODE_OPTIONS`, so a process wrapper is required.
|
|
145
|
+
## Security model
|
|
215
146
|
|
|
216
|
-
**
|
|
147
|
+
> **The proxy and interceptor have full read/write access to API requests and responses.** This is inherent to the approach — any fetch interceptor, proxy, or gateway has this position.
|
|
217
148
|
|
|
218
|
-
|
|
219
|
-
#!/bin/bash
|
|
220
|
-
NPM_ROOT="$(npm root -g 2>/dev/null)"
|
|
221
|
-
PRELOAD="$NPM_ROOT/claude-code-cache-fix/preload.mjs"
|
|
222
|
-
shift # VS Code passes the original claude path as $1
|
|
223
|
-
export NODE_OPTIONS="--import $PRELOAD"
|
|
224
|
-
exec node "$NPM_ROOT/@anthropic-ai/claude-code/cli.js" "$@"
|
|
225
|
-
```
|
|
149
|
+
**What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
|
|
226
150
|
|
|
227
|
-
|
|
228
|
-
chmod +x ~/bin/claude-vscode-wrapper
|
|
229
|
-
```
|
|
151
|
+
**What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
|
|
230
152
|
|
|
231
|
-
|
|
153
|
+
**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
|
|
232
154
|
|
|
233
|
-
|
|
234
|
-
{
|
|
235
|
-
"claudeCode.claudeProcessWrapper": "/home/YOUR_USERNAME/bin/claude-vscode-wrapper"
|
|
236
|
-
}
|
|
237
|
-
```
|
|
238
|
-
|
|
239
|
-
**Windows** — `.bat`/`.cmd` wrappers fail because the extension uses `child_process.spawn()` without `shell: true`. Use the C wrapper source included in this package (`tools/claude-vscode-wrapper.c`):
|
|
155
|
+
**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
|
|
240
156
|
|
|
241
|
-
|
|
242
|
-
cl tools\claude-vscode-wrapper.c /Fe:claude-vscode-wrapper.exe
|
|
243
|
-
```
|
|
157
|
+
## The problem
|
|
244
158
|
|
|
245
|
-
|
|
159
|
+
When you use `--resume` or `/resume` in Claude Code, the prompt cache breaks silently. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that should cost ~$0.50/hour can burn through $5–10/hour with no visible indication anything is wrong.
|
|
246
160
|
|
|
247
|
-
|
|
248
|
-
{
|
|
249
|
-
"claudeCode.claudeProcessWrapper": "C:\\path\\to\\claude-vscode-wrapper.exe"
|
|
250
|
-
}
|
|
251
|
-
```
|
|
161
|
+
Three bugs cause this:
|
|
252
162
|
|
|
253
|
-
|
|
163
|
+
1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all drift to later messages, changing the cache prefix.
|
|
254
164
|
|
|
255
|
-
|
|
165
|
+
2. **Fingerprint instability** — The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and cache busts.
|
|
256
166
|
|
|
257
|
-
|
|
167
|
+
3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
|
|
258
168
|
|
|
259
|
-
|
|
169
|
+
Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
|
|
260
170
|
|
|
261
171
|
## How it works
|
|
262
172
|
|
|
263
|
-
|
|
173
|
+
**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
|
|
264
174
|
|
|
265
|
-
|
|
266
|
-
2. **Sorts tool definitions** alphabetically by name for deterministic ordering
|
|
267
|
-
3. **Recomputes the cc_version fingerprint** from the real user message text instead of meta/attachment content
|
|
175
|
+
**Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
|
|
268
176
|
|
|
269
|
-
|
|
177
|
+
Both modes are idempotent — if nothing needs fixing, the request passes through unmodified. Neither mode modifies your conversation; they only normalize the request structure before it hits the API.
|
|
270
178
|
|
|
271
|
-
## Graduating from
|
|
179
|
+
## Graduating from fixes
|
|
272
180
|
|
|
273
|
-
The
|
|
181
|
+
The package serves three purposes with different lifecycles:
|
|
274
182
|
|
|
275
183
|
| Purpose | Examples | When to disable |
|
|
276
184
|
|---------|----------|-----------------|
|
|
@@ -278,7 +186,7 @@ The interceptor serves three purposes with different lifecycles:
|
|
|
278
186
|
| **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
|
|
279
187
|
| **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
|
|
280
188
|
|
|
281
|
-
### Health status
|
|
189
|
+
### Health status (preload mode)
|
|
282
190
|
|
|
283
191
|
On first API call, the interceptor logs a health status line (requires `CACHE_FIX_DEBUG=1`):
|
|
284
192
|
|
|
@@ -286,25 +194,14 @@ On first API call, the interceptor logs a health status line (requires `CACHE_FI
|
|
|
286
194
|
cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
|
|
287
195
|
```
|
|
288
196
|
|
|
289
|
-
Status meanings:
|
|
290
197
|
- **active(Xh ago)** — fix was applied recently
|
|
291
|
-
- **dormant(N clean sessions)** — bug not detected in N
|
|
292
|
-
- **safety-blocked(Nx)** — round-trip verification failed;
|
|
198
|
+
- **dormant(N clean sessions)** — bug not detected in N sessions; CC may have fixed it
|
|
199
|
+
- **safety-blocked(Nx)** — round-trip verification failed; fix auto-disabled
|
|
293
200
|
- **waiting** — fix hasn't been triggered yet
|
|
294
201
|
|
|
295
|
-
When a fix shows `dormant`, you can safely disable it:
|
|
296
|
-
```bash
|
|
297
|
-
export CACHE_FIX_SKIP_RELOCATE=1 # example
|
|
298
|
-
```
|
|
299
|
-
|
|
300
|
-
To disable all fixes but keep monitoring:
|
|
301
|
-
```bash
|
|
302
|
-
export CACHE_FIX_DISABLED=1
|
|
303
|
-
```
|
|
304
|
-
|
|
305
202
|
### Regression detection
|
|
306
203
|
|
|
307
|
-
If cache_read ratio drops below 50% across 5+ calls after disabling fixes
|
|
204
|
+
If cache_read ratio drops below 50% across 5+ calls after disabling fixes:
|
|
308
205
|
```
|
|
309
206
|
REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls.
|
|
310
207
|
Fixes are disabled — consider re-enabling to recover cache performance.
|
|
@@ -314,10 +211,7 @@ Fixes are disabled — consider re-enabling to recover cache performance.
|
|
|
314
211
|
|
|
315
212
|
### Fingerprint round-trip verification
|
|
316
213
|
|
|
317
|
-
Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
|
|
318
|
-
hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
|
|
319
|
-
If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
|
|
320
|
-
This ensures the interceptor can never make cache performance *worse* than stock CC.
|
|
214
|
+
Before rewriting the `cc_version` fingerprint, the interceptor verifies that its hardcoded salt and character indices reproduce the fingerprint Claude Code sent. If verification fails (CC changed its algorithm), the rewrite is skipped automatically. This ensures the interceptor can never make cache performance *worse* than stock CC.
|
|
321
215
|
|
|
322
216
|
### Fail-safe design
|
|
323
217
|
|
|
@@ -331,21 +225,18 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
|
|
|
331
225
|
|
|
332
226
|
## Status line — quota warnings in real time
|
|
333
227
|
|
|
334
|
-
|
|
228
|
+
Both proxy and preload modes write quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script displays a live status line showing:
|
|
335
229
|
|
|
336
230
|
- **Q5h %** with burn rate (%/min)
|
|
337
231
|
- **Q7d %** with burn rate (%/hr)
|
|
338
|
-
- **TTL tier** —
|
|
232
|
+
- **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
|
|
339
233
|
- **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
|
|
340
234
|
- **Cache hit rate %**
|
|
341
235
|
- **OVERAGE** flag when active
|
|
342
236
|
|
|
343
237
|
### Setup
|
|
344
238
|
|
|
345
|
-
Copy the script and configure Claude Code to use it:
|
|
346
|
-
|
|
347
239
|
```bash
|
|
348
|
-
# Copy from the npm package to Claude Code's hooks directory
|
|
349
240
|
mkdir -p ~/.claude/hooks
|
|
350
241
|
cp "$(npm root -g)/claude-code-cache-fix/tools/quota-statusline.sh" ~/.claude/hooks/
|
|
351
242
|
chmod +x ~/.claude/hooks/quota-statusline.sh
|
|
@@ -362,305 +253,57 @@ Add to `~/.claude/settings.json`:
|
|
|
362
253
|
}
|
|
363
254
|
```
|
|
364
255
|
|
|
365
|
-
### Recommended: disable git-status injection
|
|
366
|
-
|
|
367
|
-
Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
|
|
368
|
-
|
|
369
|
-
```bash
|
|
370
|
-
export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
|
|
371
|
-
```
|
|
372
|
-
|
|
373
|
-
Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
|
|
374
|
-
|
|
375
|
-
The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
|
|
376
|
-
|
|
377
|
-
Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
|
|
378
|
-
|
|
379
|
-
**Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
|
|
380
|
-
|
|
381
256
|
### Why the status line matters
|
|
382
257
|
|
|
383
|
-
When the server downgrades your TTL to 5m (
|
|
384
|
-
|
|
385
|
-
## Image stripping
|
|
386
|
-
|
|
387
|
-
Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and potentially **~85,000+ tokens on Opus 4.7** due to the new tokenizer (up to 35% inflation) and high-res image support (2576px max, up from 1568px). Image stripping is strongly recommended on 4.7.
|
|
388
|
-
|
|
389
|
-
Enable image stripping to remove old images from tool results:
|
|
390
|
-
|
|
391
|
-
```bash
|
|
392
|
-
export CACHE_FIX_IMAGE_KEEP_LAST=3
|
|
393
|
-
```
|
|
394
|
-
|
|
395
|
-
This keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only targets images inside `tool_result` blocks (Read tool output) — user-pasted images are never touched. Files remain on disk for re-reading if needed.
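The keep-last behaviour described above can be sketched roughly as follows. This is an illustrative outline only: the function name `stripOldToolImages`, the placeholder text, and the handling of edge cases are assumptions, and the message shape follows the public Anthropic Messages API (`tool_result` blocks whose `content` is an array of parts) rather than anything taken from the package source.

```javascript
// Sketch only: names and placeholder text are hypothetical, not the package's.
const PLACEHOLDER = { type: 'text', text: '[image removed to save tokens]' };

function stripOldToolImages(messages, keepLast) {
  if (!keepLast || keepLast <= 0) return messages; // 0 disables stripping

  // Positions of user messages, oldest first; the last `keepLast` are protected.
  const userIdx = messages
    .map((m, i) => (m.role === 'user' ? i : -1))
    .filter((i) => i >= 0);
  const protectedIdx = new Set(userIdx.slice(-keepLast));

  return messages.map((msg, i) => {
    if (msg.role !== 'user' || protectedIdx.has(i) || !Array.isArray(msg.content)) {
      return msg;
    }
    const content = msg.content.map((block) => {
      if (block.type !== 'tool_result' || !Array.isArray(block.content)) {
        return block; // user-pasted images sit outside tool_result: untouched
      }
      return {
        ...block,
        content: block.content.map((part) => (part.type === 'image' ? PLACEHOLDER : part)),
      };
    });
    return { ...msg, content }; // copy; the original history is not mutated
  });
}

const history = [
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'a', content: [
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'AAAA' } },
  ] }] },
  { role: 'assistant', content: [{ type: 'text', text: 'Looked at the image.' }] },
  { role: 'user', content: [{ type: 'text', text: 'Thanks, next question.' }] },
];
const trimmed = stripOldToolImages(history, 1);
console.log(trimmed[0].content[0].content[0].type); // "text"
```

Returning copies instead of editing in place keeps the sketch safe to run repeatedly on the same history; whether the real interceptor mutates or copies is not stated in the README.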
|
|
396
|
-
|
|
397
|
-
Set to `0` (default) to disable.
|
|
398
|
-
|
|
399
|
-
## System prompt rewrite (optional)
|
|
400
|
-
|
|
401
|
-
The interceptor can also rewrite Claude Code's `# Output efficiency` system-prompt section before the request is sent.
|
|
402
|
-
|
|
403
|
-
This feature is **optional** and **disabled by default**. If `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` is unset, nothing is changed.
|
|
404
|
-
|
|
405
|
-
Enable it by setting a replacement text:
|
|
406
|
-
|
|
407
|
-
```bash
|
|
408
|
-
export CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT=$'# Output efficiency\n\n...'
|
|
409
|
-
```
|
|
410
|
-
|
|
411
|
-
The rewrite is intentionally narrow:
|
|
412
|
-
|
|
413
|
-
- Only Claude Code's `# Output efficiency` section is replaced
|
|
414
|
-
- Other system prompt sections are preserved
|
|
415
|
-
- Existing system block structure and fields such as `cache_control` are preserved
|
|
416
|
-
|
|
417
|
-
This may be useful for users who want to stay on current Claude Code versions but experiment with a different `Output efficiency` instruction set instead of downgrading to an earlier release.
|
|
418
|
-
|
|
419
|
-
### Prompt variants
|
|
420
|
-
|
|
421
|
-
<details>
|
|
422
|
-
<summary>Anthropic internal / <code>USER_TYPE=ant</code> version</summary>
|
|
423
|
-
|
|
424
|
-
```text
|
|
425
|
-
# Output efficiency
|
|
426
|
-
|
|
427
|
-
When sending user-facing text, you're writing for a person, not logging to a console. Assume users can't see most tool calls or thinking - only your text output. Before your first tool call, briefly state what you're about to do. While working, give short updates at key moments: when you find something load-bearing (a bug, a root cause), when changing direction, when you've made progress without an update.
|
|
428
|
-
|
|
429
|
-
When you give updates, assume the recipient may have stepped away and lost the thread. They do not know your internal shorthand, codenames, or half-formed plan. Write in complete, grammatical sentences that can be understood cold. Spell out technical terms when helpful. If unsure, err on the side of a bit more explanation. Adapt to the user's expertise: experts can handle denser updates, but don't make novice users reconstruct context on their own.
|
|
430
|
-
|
|
431
|
-
User-facing text should read like natural prose. Avoid clipped sentence fragments, excessive dashes, symbolic shorthand, or formatting that reads like console output. Use tables only when they genuinely improve scanability, such as compact facts (files, lines, pass/fail) or quantitative comparisons. Keep explanatory reasoning in prose around the table, not inside it. Avoid semantic backtracking: structure sentences so the user can follow them linearly without having to reinterpret earlier clauses after reading later ones.
|
|
432
|
-
|
|
433
|
-
Optimize for fast human comprehension, not minimal surface area. If the user has to reread your summary or ask a follow-up just to understand what happened, you saved the wrong tokens. Match the level of structure to the task: for a simple question, answer in plain prose without unnecessary headings or numbered lists. While staying clear and direct, also be concise and avoid fluff. Skip filler, obvious restatements, and throat-clearing. Get to the point. Don't over-focus on low-signal details from your process. When it helps, use an inverted pyramid structure with the conclusion first and details later.
|
|
434
|
-
|
|
435
|
-
These user-facing text instructions do not apply to code or tool calls.
|
|
436
|
-
```
|
|
437
|
-
|
|
438
|
-
</details>
|
|
439
|
-
|
|
440
|
-
<details>
|
|
441
|
-
<summary>Public / default Claude Code version</summary>
|
|
442
|
-
|
|
443
|
-
```text
|
|
444
|
-
# Output efficiency
|
|
445
|
-
|
|
446
|
-
IMPORTANT: Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.
|
|
447
|
-
|
|
448
|
-
Your text output is brief, direct, and to the point. Lead with the answer or action, not the reasoning. Omit filler, preamble, and unnecessary transitions. Do not restate the user's request; move directly to the work. When explanation is needed, include only what helps the user understand the outcome.
|
|
449
|
-
|
|
450
|
-
Prioritize user-facing text for:
|
|
451
|
-
- decisions that require user input
|
|
452
|
-
- high-signal progress updates at natural milestones
|
|
453
|
-
- errors or blockers that change the plan
|
|
454
|
-
|
|
455
|
-
If a sentence can do the job, do not turn it into three. Favor short, direct constructions over long explanatory prose. These instructions do not apply to code or tool calls.
|
|
456
|
-
```
|
|
457
|
-
|
|
458
|
-
</details>
|
|
459
|
-
|
|
460
|
-
<details>
|
|
461
|
-
<summary>Example custom replacement (a middle-ground version combining the two versions above)</summary>
|
|
462
|
-
|
|
463
|
-
```text
|
|
464
|
-
# Output efficiency
|
|
465
|
-
|
|
466
|
-
When sending user-facing text, write for a person, not a log file. Assume the user cannot see most tool calls or hidden reasoning - only your text output.
|
|
467
|
-
|
|
468
|
-
Keep user-facing text clear, direct, and reasonably concise. Lead with the answer or action. Skip filler, repetition, and unnecessary preamble.
|
|
469
|
-
|
|
470
|
-
Explain enough for the user to understand the reasoning, tradeoffs, or root cause when that would help them learn or make a decision, but do not turn simple answers into long writeups.
|
|
471
|
-
|
|
472
|
-
These instructions apply to user-facing text only. They do not apply to investigation, code reading, tool use, or verification.
|
|
473
|
-
|
|
474
|
-
Before making changes, read the relevant code and understand the surrounding context. Check types, signatures, call sites, and error causes before editing. Do not confuse brevity with rushing, and do not replace understanding with trial and error.
|
|
475
|
-
|
|
476
|
-
While working, give short updates at meaningful moments: when you find the root cause, when the plan changes, when you hit a blocker, or when a meaningful milestone is complete. Do not narrate every step.
|
|
477
|
-
|
|
478
|
-
When reporting results, be accurate and concrete. If you did not verify something, say so plainly. If a check failed, say that plainly too.
|
|
479
|
-
```
|
|
480
|
-
|
|
481
|
-
</details>
|
|
482
|
-
|
|
483
|
-
## Monitoring
|
|
484
|
-
|
|
485
|
-
The interceptor includes monitoring for several additional issues identified by the community:
|
|
486
|
-
|
|
487
|
-
### Microcompact / budget enforcement
|
|
488
|
-
|
|
489
|
-
Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
|
|
490
|
-
|
|
491
|
-
The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
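
The detection described above can be sketched roughly as follows. This is a minimal illustration, not the interceptor's actual code: the function names, the 90% warning ratio, and the assumption that requests follow the Messages API shape (`tool_result` blocks whose `content` is a string or an array of text blocks) are all hypothetical. Only the marker string and the 200,000-character cap come from the text above.

```javascript
// Hypothetical sketch; names and the warning ratio are assumptions.
const CLEARED_MARKER = "[Old tool result content cleared]"; // from the README text
const AGGREGATE_CAP = 200000; // aggregate character cap cited above

// Flatten a tool_result block's content to plain text.
function toolResultText(block) {
  if (typeof block.content === "string") return block.content;
  if (Array.isArray(block.content)) {
    return block.content
      .filter((inner) => inner.type === "text")
      .map((inner) => inner.text)
      .join("");
  }
  return "";
}

// Count tool results, cleared tool results, and total tool-result characters.
function scanToolResults(messages, warnRatio = 0.9) {
  let total = 0;
  let cleared = 0;
  let chars = 0;
  for (const msg of messages) {
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_result") continue;
      total += 1;
      const text = toolResultText(block);
      if (text.includes(CLEARED_MARKER)) cleared += 1;
      chars += text.length;
    }
  }
  // nearCap is true once the total reaches the assumed warning ratio of the cap.
  return { total, cleared, chars, nearCap: chars >= AGGREGATE_CAP * warnRatio };
}
```

A caller could log `cleared`/`total` on every request and emit a warning whenever `nearCap` is true.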
|
|
492
|
-
|
|
493
|
-
### False rate limiter
|
|
494
|
-
|
|
495
|
-
The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
|
|
496
|
-
|
|
497
|
-
### GrowthBook flag dump
|
|
498
|
-
|
|
499
|
-
On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
|
|
500
|
-
|
|
501
|
-
### Quota tracking
|
|
502
|
-
|
|
503
|
-
Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
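
A minimal sketch of the header parsing step, under stated assumptions: the full name of the 7-day header is inferred from the 5-hour one, the output field names (`q5h`, `q7d`) are hypothetical, and the numeric scale of the values is not specified here, so the sketch only parses them as numbers. Writing the result to `~/.claude/quota-status.json` is left out.

```javascript
// Hypothetical sketch; not the package's actual implementation.
const HEADER_5H = "anthropic-ratelimit-unified-5h-utilization";
// Assumption: the 7-day header uses the same prefix as the 5-hour one.
const HEADER_7D = "anthropic-ratelimit-unified-7d-utilization";

// `headers` is anything with a get(name) method, such as a fetch Headers object.
function readQuotaHeaders(headers) {
  const parse = (name) => {
    const raw = headers.get(name);
    const value = raw == null ? NaN : Number.parseFloat(raw);
    return Number.isFinite(value) ? value : null; // null when missing or malformed
  };
  return { q5h: parse(HEADER_5H), q7d: parse(HEADER_7D) };
}
```

Returning `null` for a missing header lets a status line hook distinguish "no data yet" from a genuine zero.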
|
|
504
|
-
|
|
505
|
-
### Peak hour detection
|
|
258
|
+
When the server downgrades your TTL to 5m (quota-aware downgrade at Q5h ≥ 100%), **every idle period longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible. With it, the red `TTL:5m` warning tells you: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
|
|
506
259
|
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
### Usage telemetry and cost reporting
|
|
260
|
+
### Recommended: disable git-status injection
|
|
510
261
|
|
|
511
|
-
|
|
262
|
+
Claude Code injects live `git status` into the system prompt on every call. Any file edit changes the git status, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call:
|
|
512
263
|
|
|
513
264
|
```bash
|
|
514
|
-
|
|
515
|
-
node tools/cost-report.mjs --date 2026-04-08 # specific date
|
|
516
|
-
node tools/cost-report.mjs --since 2h # last 2 hours
|
|
517
|
-
node tools/cost-report.mjs --admin-key <key> # cross-reference with Admin API
|
|
265
|
+
export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
|
|
518
266
|
```
|
|
519
267
|
|
|
520
|
-
|
|
268
|
+
Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs context. Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag).
|
|
521
269
|
|
|
522
|
-
|
|
270
|
+
## Image stripping (preload mode)
|
|
523
271
|
|
|
524
|
-
|
|
272
|
+
Images read via the Read tool persist as base64 in conversation history, riding along on every subsequent API call. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and **~85,000+ on Opus 4.7** due to the new tokenizer. Image stripping is strongly recommended on 4.7.
|
|
525
273
|
|
|
526
274
|
```bash
|
|
527
|
-
|
|
528
|
-
node tools/quota-analysis.mjs --since 24h # last 24 hours only
|
|
529
|
-
node tools/quota-analysis.mjs --json # machine-readable output
|
|
275
|
+
export CACHE_FIX_IMAGE_KEEP_LAST=3
|
|
530
276
|
```
|
|
531
277
|
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
1. **Does `cache_read` count toward your 5-hour quota?** Tests three hypotheses (cache_read costs 0x / 0.1x / 1x of input rate) and reports which one best explains your `q5h_pct` trajectory across reset windows. Lower coefficient of variation across windows = better fit.
|
|
535
|
-
2. **Do peak hours cost more quota per token?** Splits windows into peak-dominant (≥80% peak calls) and off-peak-dominant (≤20%) and compares the implied 100% quota under the best-fit model.
|
|
536
|
-
3. **What is your account's effective 5-hour quota in token-equivalents?** Reports a concrete number you can compare against your subscription tier or against what other users measure.
|
|
537
|
-
|
|
538
|
-
Requires `q5h_pct`, `q7d_pct`, and `peak_hour` fields in usage.jsonl, which were added in v1.6.1 (2026-04-09). Older entries are silently filtered out.
|
|
278
|
+
This keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only `tool_result` blocks are targeted; user-pasted images are never touched.
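
The behavior can be illustrated with a short sketch. It is not the package's actual code: the function name and placeholder text are hypothetical, and it assumes Messages API-style content blocks in which images read by a tool appear inside a `tool_result` block while user-pasted images are top-level `image` blocks.

```javascript
// Hypothetical sketch; the placeholder text and function name are assumptions.
const PLACEHOLDER = "[image removed to save tokens]";

function stripOldToolResultImages(messages, keepLast) {
  if (!keepLast || keepLast <= 0) return messages; // treated as disabled

  // Indices of the last `keepLast` user messages; their images are kept.
  const userIdx = messages
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  const protectedIdx = new Set(userIdx.slice(-keepLast));

  return messages.map((msg, i) => {
    if (msg.role !== "user" || protectedIdx.has(i) || !Array.isArray(msg.content)) {
      return msg;
    }
    const content = msg.content.map((block) => {
      // Only tool_result blocks are rewritten; top-level image blocks
      // (user-pasted images) pass through unchanged.
      if (block.type !== "tool_result" || !Array.isArray(block.content)) return block;
      return {
        ...block,
        content: block.content.map((inner) =>
          inner.type === "image" ? { type: "text", text: PLACEHOLDER } : inner
        ),
      };
    });
    return { ...msg, content };
  });
}
```

The sketch returns new objects rather than mutating the input, so the original conversation history stays intact if the rewritten request is discarded.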
|
|
539
279
|
|
|
540
|
-
|
|
280
|
+
## System prompt rewrite (preload mode, optional)
|
|
541
281
|
|
|
542
|
-
|
|
282
|
+
The interceptor can rewrite Claude Code's `# Output efficiency` system-prompt section. Disabled by default. Enable with `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. See [docs/output-efficiency-prompts.md](docs/output-efficiency-prompts.md) for the three known prompt variants and usage instructions.
|
|
543
283
|
|
|
544
|
-
|
|
284
|
+
## Monitoring & diagnostics
|
|
545
285
|
|
|
546
|
-
|
|
547
|
-
CACHE_FIX_DEBUG=1 claude-fixed
|
|
548
|
-
```
|
|
549
|
-
|
|
550
|
-
Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
|
|
551
|
-
- `APPLIED: resume message relocation` — block scatter was detected and fixed
|
|
552
|
-
- `APPLIED: tool order stabilization` — tools were reordered
|
|
553
|
-
- `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
|
|
554
|
-
- `APPLIED: stripped N images from old tool results` — images were stripped
|
|
555
|
-
- `APPLIED: output efficiency section rewritten` — output-efficiency section was replaced
|
|
556
|
-
- `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
|
|
557
|
-
- `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
|
|
558
|
-
- `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
|
|
559
|
-
- `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
|
|
560
|
-
- `PROMPT SIZE: system=N tools=N injected=N (skills=N mcp=N ...)` — per-call prompt size breakdown
|
|
561
|
-
- `CACHE TTL: tier=1h create=N read=N hit=N% (1h=N 5m=N)` — TTL tier and cache hit rate per call
|
|
562
|
-
- `PEAK HOUR: weekday 13:00-19:00 UTC` — Anthropic peak hour throttling active
|
|
563
|
-
- `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
|
|
564
|
-
- `SKIPPED: output efficiency rewrite (section not found)` — no matching output-efficiency section found
|
|
565
|
-
|
|
566
|
-
### Prefix diff mode
|
|
567
|
-
|
|
568
|
-
Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
|
|
569
|
-
|
|
570
|
-
```bash
|
|
571
|
-
CACHE_FIX_PREFIXDIFF=1 claude-fixed
|
|
572
|
-
```
|
|
286
|
+
The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works in both proxy and preload modes via `~/.claude/quota-status.json`.
|
|
573
287
|
|
|
574
|
-
|
|
575
|
-
|
|
576
|
-
## Environment variables
|
|
577
|
-
|
|
578
|
-
| Variable | Default | Description |
|
|
579
|
-
|----------|---------|-------------|
|
|
580
|
-
| `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
|
|
581
|
-
| `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
|
|
582
|
-
| `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
|
|
583
|
-
| `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
|
|
584
|
-
| `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
|
|
585
|
-
| `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes; keep monitoring + optimizations active |
|
|
586
|
-
| `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
|
|
587
|
-
| `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
|
|
588
|
-
| `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
|
|
589
|
-
| `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
|
|
590
|
-
| `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
|
|
591
|
-
| `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
|
|
592
|
-
| `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
|
|
593
|
-
| `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
|
|
594
|
-
| `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
|
|
595
|
-
| `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
|
|
288
|
+
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
|
|
596
289
|
|
|
597
290
|
## Limitations
|
|
598
291
|
|
|
599
|
-
- **
|
|
600
|
-
- **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is
|
|
601
|
-
- **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it.
|
|
602
|
-
- **System prompt rewrite is experimental** —
|
|
292
|
+
- **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
|
|
293
|
+
- **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is server-side and cannot be fixed client-side. The proxy/interceptor prevents the cache instability that can push you into overage in the first place.
|
|
294
|
+
- **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. Microcompact and budget enforcement are server-controlled via GrowthBook flags with no client-side disable option.
|
|
295
|
+
- **System prompt rewrite is experimental** — Preload-only, opt-in. Not proven to be the cause of behavior differences discussed in community reports. Use at your own risk.
|
|
603
296
|
- **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
|
|
604
297
|
|
|
605
298
|
## Tracked issues
|
|
606
299
|
|
|
607
|
-
|
|
608
|
-
- [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
|
|
609
|
-
- [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
|
|
610
|
-
- [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
|
|
611
|
-
- [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
|
|
612
|
-
- [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
|
|
613
|
-
- [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion around the `Output efficiency` system-prompt change and its possible effect on model behavior
|
|
300
|
+
We monitor 30+ upstream Claude Code issues related to cache, quota, and context bugs. See [TRACKED_ISSUES.md](TRACKED_ISSUES.md) for the full list with our involvement, community research, and key contributors.
|
|
614
301
|
|
|
615
302
|
## Related research
|
|
616
303
|
|
|
617
|
-
- **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** —
|
|
618
|
-
- **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds
|
|
619
|
-
- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring
|
|
620
|
-
|
|
621
|
-
## Works with @fgrosswig's dashboard
|
|
622
|
-
|
|
623
|
-
This interceptor and [@fgrosswig](https://github.com/fgrosswig)'s
|
|
624
|
-
[claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)
|
|
625
|
-
solve strongly complementary problems. The interceptor captures per-call API
|
|
626
|
-
data from inside the Node.js process — cache metrics, quota state, TTL tier,
|
|
627
|
-
rewrites applied. The dashboard provides the visualization layer — historical
|
|
628
|
-
trending, per-day charts, multi-host aggregation, cache-health scoring.
|
|
629
|
-
|
|
630
|
-
Running both gives you the best of both tools, and the integration is a
|
|
631
|
-
one-liner thanks to the dashboard's tolerant NDJSON ingest and our new
|
|
632
|
-
`usage-to-dashboard-ndjson` translator.
|
|
633
|
-
|
|
634
|
-
### Quick setup
|
|
635
|
-
|
|
636
|
-
```bash
|
|
637
|
-
# Install both tools
|
|
638
|
-
npm install -g claude-code-cache-fix
|
|
639
|
-
# (follow fgrosswig's dashboard install: https://github.com/fgrosswig/claude-usage-dashboard)
|
|
640
|
-
|
|
641
|
-
# One-shot translation (reads ~/.claude/usage.jsonl, writes to
|
|
642
|
-
# ~/.claude/anthropic-proxy-logs/proxy-YYYY-MM-DD.ndjson, which his
|
|
643
|
-
# dashboard already watches)
|
|
644
|
-
node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs
|
|
645
|
-
|
|
646
|
-
# Or keep it live-updating as the interceptor logs new calls
|
|
647
|
-
node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs --follow &
|
|
648
|
-
```
|
|
649
|
-
|
|
650
|
-
No configuration required on the dashboard side — fgrosswig's
|
|
651
|
-
`collectProxyNdjsonFiles()` auto-discovers files in
|
|
652
|
-
`~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`), and our
|
|
653
|
-
translator writes to exactly that path with the expected `proxy-YYYY-MM-DD.ndjson`
|
|
654
|
-
filename convention. The dashboard's tolerant ingestion layer ignores unknown
|
|
655
|
-
fields, so interceptor-specific extras (`ttl_tier`, `ephemeral_1h_input_tokens`,
|
|
656
|
-
`ephemeral_5m_input_tokens`, `peak_hour`, quota state) pass through cleanly
|
|
657
|
-
and remain available to downstream consumers that know to read them.
|
|
658
|
-
|
|
659
|
-
The `cost_factor` metric in `tools/cost-report.mjs` also comes from
|
|
660
|
-
fgrosswig's methodology — the `(input + output + cache_read + cache_creation) / output`
|
|
661
|
-
ratio that gives a single-number measure of how much context is being paid
|
|
662
|
-
per useful output token. A rising cost factor across a long session is the
|
|
663
|
-
measurable signature of cache-efficiency degradation.
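
As a worked illustration of that ratio, here is a small sketch. The function and field names are hypothetical and are not taken from `tools/cost-report.mjs`; only the formula itself comes from the text above.

```javascript
// Hypothetical sketch of the cost-factor ratio described above.
function costFactor({ input = 0, output = 0, cacheRead = 0, cacheCreation = 0 }) {
  if (output <= 0) return null; // ratio is undefined without output tokens
  return (input + output + cacheRead + cacheCreation) / output;
}

// Example with made-up numbers: 200 input, 1,000 output,
// 90,000 cache-read and 800 cache-creation tokens.
const example = costFactor({ input: 200, output: 1000, cacheRead: 90000, cacheCreation: 800 });
// (200 + 1000 + 90000 + 800) / 1000 = 92
```

Comparing this value across the calls of a long session shows whether the context paid per useful output token is growing.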
|
|
304
|
+
- **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — 38,996-request proxy-based analysis: 7 bugs (microcompact, budget caps, false rate limiter, JSONL duplication, extended thinking), GrowthBook feature flag causal testing, Opus 4.7 burn rate advisory. The monitoring features in v1.1.0 are informed by this research.
|
|
305
|
+
- **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds. Works with any Claude client that supports `ANTHROPIC_BASE_URL`.
|
|
306
|
+
- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring. Complementary to our proxy's vantage point. See [docs/dashboard-integration.md](docs/dashboard-integration.md) for the interop setup.
|
|
664
307
|
|
|
665
308
|
## Used in production
|
|
666
309
|
|
|
@@ -671,17 +314,16 @@ measurable signature of cache-efficiency degradation.
|
|
|
671
314
|
- **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
|
|
672
315
|
- **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk.
|
|
673
316
|
- **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
|
|
674
|
-
- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
|
|
675
|
-
- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
|
|
317
|
+
- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, proxy architecture, package maintainer
|
|
318
|
+
- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification, fingerprint verification fix for CC v2.1.108+ (PR #21), Korean README (PR #22), [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
|
|
676
319
|
- **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
|
|
677
320
|
- **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
|
|
678
|
-
- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper
|
|
321
|
+
- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate)
|
|
679
322
|
- **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
|
|
680
323
|
- **[@beekamai](https://github.com/beekamai)** — Windows URL-encoding fix for `claude-fixed.bat` when npm root contains spaces (PR #17)
|
|
681
324
|
- **[@JEONG-JIWOO](https://github.com/JEONG-JIWOO)** — VS Code extension investigation: discovered `claudeCode.claudeProcessWrapper` as the working integration path, wrote the C wrapper for Windows (#16)
|
|
682
325
|
- **[@X-15](https://github.com/X-15)** — VS Code extension validation, per-fix health status analysis confirming safety check behavior on v2.1.105 (#16)
|
|
683
|
-
- **[@
|
|
684
|
-
- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery
|
|
326
|
+
- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
|
|
685
327
|
|
|
686
328
|
If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
|
|
687
329
|
|