claude-code-cache-fix 3.6.1 → 3.6.2
- package/README.md +47 -3
- package/package.json +1 -1
- package/proxy/extensions.json +68 -17
- package/tools/quota-statusline.sh +71 -23
package/README.md
CHANGED
@@ -208,6 +208,40 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 
 *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
 
+## Recommended CC operational config
+
+The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
+
+These findings come from [@fgrosswig](https://github.com/fgrosswig)'s binary analysis of CC v2.1.91. Methodology is public PowerShell + ASCII string extraction; he shared the resulting punch list privately as a courtesy.
+
+### Suggested `~/.claude/settings.json` env block
+
+The model IDs below are illustrative — replace with your preferred main and small-fast models. The point is that pinning *something* explicit beats relying on CC's defaults.
+
+```json
+{
+  "env": {
+    "CLAUDE_CODE_DISABLE_LEGACY_MODEL_REMAP": "1",
+    "ANTHROPIC_MODEL": "claude-opus-4-7",
+    "ANTHROPIC_SMALL_FAST_MODEL": "claude-haiku-4-5-20251001"
+  }
+}
+```
+
+**`CLAUDE_CODE_DISABLE_LEGACY_MODEL_REMAP=1`** — single most impactful flag. CC has a legacy code path that silently remaps your pinned model to a different one after certain version updates. Setting this to `1` disables the remap; the model you pin is the model you get. (If you don't pin, CC's defaults apply as usual.)
+
+**`ANTHROPIC_MODEL`** — pins the primary model. Keeping this explicit means the cache prefix hash stays stable across CC version bumps that would otherwise swap your default. Adjust to whichever model you actually want.
+
+**`ANTHROPIC_SMALL_FAST_MODEL`** — pins the side-channel "fast" model CC uses for short auxiliary calls (e.g., title generation, classification). Without an explicit pin, this can silently fall back to a different family on update.
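
One way to confirm that the three keys above are actually pinned is a small check against the settings file. This is a minimal sketch, not part of the package: the helper names `missing_pins` and `check_settings_file` are hypothetical, and it assumes the file is plain JSON with a top-level `env` object as in the suggested block.

```python
import json
from pathlib import Path

# The three env keys recommended in the README section above.
RECOMMENDED_KEYS = (
    "CLAUDE_CODE_DISABLE_LEGACY_MODEL_REMAP",
    "ANTHROPIC_MODEL",
    "ANTHROPIC_SMALL_FAST_MODEL",
)


def missing_pins(settings: dict) -> list[str]:
    """Return the recommended env keys that are absent or empty in a parsed settings dict."""
    env = settings.get("env") or {}
    return [key for key in RECOMMENDED_KEYS if not env.get(key)]


def check_settings_file(path: Path = Path.home() / ".claude" / "settings.json") -> list[str]:
    """Load settings.json if present; a missing file means nothing is pinned."""
    if not path.is_file():
        return list(RECOMMENDED_KEYS)
    return missing_pins(json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    for key in check_settings_file():
        print(f"not pinned: {key}")
```

The check only reports whether a key has a value; it does not judge which model ID is appropriate, since the README leaves that choice to the user.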
+
+### `autoCompactWindow=1000000` caveat
+
+If you've seen the `autoCompactWindow: 1000000` setting recommended elsewhere: it only takes effect when the active model qualifies for 1M-context (currently `claude-sonnet-4-6` or `claude-opus-4-6` with the appropriate beta header). Without those preconditions it caps at the hardcoded 200K regardless of what you set.
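
The caveat can be restated as a small function. This is only an illustration of the rule as described, not CC's actual logic: the function name, the exact-match comparison on model IDs, and the boolean beta-header flag are all assumptions made for the sketch.

```python
# Model IDs taken from the caveat above; the real eligibility check lives inside CC
# and may match IDs differently (for example with date suffixes).
ONE_M_CAPABLE_MODELS = {"claude-sonnet-4-6", "claude-opus-4-6"}
DEFAULT_CAP = 200_000  # the hardcoded 200K cap mentioned in the caveat


def effective_compact_window(requested: int, model: str, has_1m_beta_header: bool) -> int:
    """Return the window that would apply under the rule described above."""
    if model in ONE_M_CAPABLE_MODELS and has_1m_beta_header:
        return requested
    # Without both preconditions the setting is capped, whatever value was requested.
    return min(requested, DEFAULT_CAP)


print(effective_compact_window(1_000_000, "claude-opus-4-6", True))   # 1000000
print(effective_compact_window(1_000_000, "claude-opus-4-6", False))  # 200000
```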
+
+### `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1` schema-strip side effect
+
+If you set this flag, CC strips any tool field outside `["name", "description", "input_schema", "cache_control"]` from outgoing requests. Custom tools relying on `defer_loading` or `eager_input_streaming` will silently lose those fields and behave differently. Worth knowing before turning the flag on.
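
To see which fields of a custom tool definition would be affected, the allowlist can be applied by hand. The sketch below is an assumption-laden illustration: `strip_tool` and `dropped_fields` are hypothetical helpers and the sample tool is invented; only the four allowed field names and the two dropped field names come from the text above.

```python
# Allowlist quoted in the README text above.
ALLOWED_TOOL_FIELDS = ("name", "description", "input_schema", "cache_control")


def strip_tool(tool: dict) -> dict:
    """Keep only allowlisted top-level fields, mimicking the described stripping."""
    return {key: value for key, value in tool.items() if key in ALLOWED_TOOL_FIELDS}


def dropped_fields(tool: dict) -> list[str]:
    """List the top-level fields that the stripping would remove."""
    return sorted(key for key in tool if key not in ALLOWED_TOOL_FIELDS)


# Hypothetical custom tool definition, for illustration only.
tool = {
    "name": "search_docs",
    "description": "Search local documentation",
    "input_schema": {"type": "object", "properties": {}},
    "defer_loading": True,
    "eager_input_streaming": True,
}

print(dropped_fields(tool))  # ['defer_loading', 'eager_input_streaming']
```

Running `dropped_fields` over your own tool list before enabling the flag shows what would be lost without sending any request.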
+
 ## Quick Start: Preload (CC v2.1.112 and earlier)
 
 If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
@@ -246,7 +280,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
 
 **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
 
-**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. npm provenance
+**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
 
 **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
 
@@ -323,13 +357,23 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
 
 Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into `~/.claude/quota-status/account.json` (account-global fields: Q5h/Q7d, status, overage) plus `~/.claude/quota-status/sessions/<id>.json` (per-session cache fields: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script displays a live status line showing:
 
-- **Q5h
-- **Q7d
+- **Q5h** quota bar `[███░┃░░░░░]` + percent + `(exhaust X, reset Y)`. Filled cells are consumed quota; the heavy-vertical tick is wall-clock elapsed position in the window. Tick to the right of the fill = under pace; tick inside the fill = burning faster than time (over pace). `exhaust` is the projected time-to-100% at the current burn rate; `reset` is the wall-clock time until the window rolls over. When `exhaust < reset`, you will hit 100% before the window resets — back off.
+- **Q7d** same shape with day-scale durations (e.g. `(exhaust 3d 13h, reset 3d 0h)`).
 - **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
 - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
 - **Cache hit rate %**
 - **OVERAGE** flag when active
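
The PEAK condition in the list above can be sketched as a timestamp check. This is not the script's implementation: the helper name `is_peak` is hypothetical, "weekday" is assumed to mean Monday through Friday in UTC, and the 19:00 end is assumed to be exclusive.

```python
from datetime import datetime, timezone

PEAK_START_HOUR_UTC = 13
PEAK_END_HOUR_UTC = 19  # assumed exclusive: 19:00:00 UTC is treated as off-peak


def is_peak(moment: datetime) -> bool:
    """True when a timezone-aware moment falls in the weekday 13:00-19:00 UTC window."""
    utc = moment.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and PEAK_START_HOUR_UTC <= utc.hour < PEAK_END_HOUR_UTC


if __name__ == "__main__":
    print(is_peak(datetime.now(timezone.utc)))
```

Pass timezone-aware datetimes; a naive value would be interpreted as local time by `astimezone`.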
 
+Example line (mid-window, healthy state):
+
+```
+Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust 3d 13h, reset 3d 0h) | TTL:1h 98.3%
+```
+
+The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first minute (Q5h) or six minutes (Q7d) after window start the burn rate isn't stable enough to project, so `exhaust` is held back until then; a stale `resets_at` (the server-reported value sits in the past, before the next API call refreshes it) drops both.
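
The numbers in the example line can be re-derived by hand. The sketch below assumes the projection is a linear extrapolation of the average burn rate so far, which matches the formula in the `tools/quota-statusline.sh` hunk of this diff; the function name is hypothetical and the minimum-elapsed gating is left out for brevity.

```python
def time_to_exhaust(pct: float, elapsed_sec: float):
    """Seconds until 100% at the average burn rate so far (pct / elapsed_sec)."""
    if pct <= 0 or pct >= 100 or elapsed_sec <= 0:
        return None  # nothing meaningful to project
    return (100 - pct) * elapsed_sec / pct


# Q5h in the example: 30% used with 3h00m until reset, so 2h of the 5h window elapsed.
q5h = time_to_exhaust(30, 2 * 3600)
print(q5h)  # 16800.0 seconds, i.e. 4h40m

# Q7d in the example: 53% used with 3d 0h until reset, so 4 days elapsed.
q7d = time_to_exhaust(53, 4 * 86400)
print(int(q7d // 86400), int(q7d % 86400 // 3600))  # 3 13
```

In both windows the projected exhaustion lands after the reset, which is why the example is labelled healthy.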
+
+The bar uses Unicode block characters (`█┃░`) — most modern terminals render these correctly. If your terminal substitutes boxes or replacement glyphs, configure a Unicode-capable font (any DejaVu, Fira, Iosevka, JetBrains Mono, etc.).
+
 ### Setup
 
 ```bash
package/package.json
CHANGED
package/proxy/extensions.json
CHANGED
@@ -1,19 +1,70 @@
 {
-  "ttl-tier-detect": {
-[old lines 3-18: content not preserved in this extract]
+  "ttl-tier-detect": {
+    "enabled": true,
+    "order": 75
+  },
+  "fingerprint-strip": {
+    "enabled": true,
+    "order": 100
+  },
+  "image-strip": {
+    "enabled": true,
+    "order": 150
+  },
+  "sort-stabilization": {
+    "enabled": true,
+    "order": 200
+  },
+  "fresh-session-sort": {
+    "enabled": true,
+    "order": 250
+  },
+  "identity-normalization": {
+    "enabled": true,
+    "order": 300
+  },
+  "smoosh-split": {
+    "enabled": true,
+    "order": 320
+  },
+  "content-strip": {
+    "enabled": true,
+    "order": 330
+  },
+  "tool-input-normalize": {
+    "enabled": true,
+    "order": 340
+  },
+  "microcompact-stability": {
+    "enabled": true,
+    "order": 350
+  },
+  "thinking-display": {
+    "enabled": true,
+    "order": 360
+  },
+  "cache-control-normalize": {
+    "enabled": true,
+    "order": 400
+  },
+  "messages-cache-breakpoint": {
+    "enabled": true,
+    "order": 410
+  },
+  "ttl-management": {
+    "enabled": true,
+    "order": 500
+  },
+  "cache-telemetry": {
+    "enabled": true,
+    "order": 600
+  },
+  "overage-warning": {
+    "enabled": true,
+    "order": 610
+  },
+  "request-log": {
+    "enabled": false,
+    "order": 700
+  }
 }
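
A quick way to read a config of this shape is to list the enabled entries sorted by `order`. This is a sketch under assumptions: the proxy's own loader is not shown in this diff, so treating a lower `order` as "runs earlier" is a guess, and `enabled_in_order` is a hypothetical helper. The sample uses four entries copied from the hunk above.

```python
import json


def enabled_in_order(config: dict) -> list[str]:
    """Names of enabled extensions, sorted by ascending `order` (assumed to mean run order)."""
    enabled = [
        (entry["order"], name)
        for name, entry in config.items()
        if entry.get("enabled")
    ]
    return [name for _, name in sorted(enabled)]


# Four entries copied from the new extensions.json shown above.
sample = json.loads("""
{
  "cache-telemetry": {"enabled": true, "order": 600},
  "request-log": {"enabled": false, "order": 700},
  "ttl-tier-detect": {"enabled": true, "order": 75},
  "fingerprint-strip": {"enabled": true, "order": 100}
}
""")

print(enabled_in_order(sample))
# ['ttl-tier-detect', 'fingerprint-strip', 'cache-telemetry']
```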
package/tools/quota-statusline.sh
CHANGED
@@ -41,7 +41,7 @@ fi
 # through os.environ, never via a shell-substituted string.
 result=$(python3 <<'PYEOF' 2>/dev/null
 import sys, json, os, re, hashlib
-from datetime import datetime, timezone
+from datetime import datetime, timezone
 
 home = os.path.expanduser('~')
 account_path = os.path.join(home, '.claude', 'quota-status', 'account.json')
@@ -98,28 +98,76 @@ ts = sess.get('timestamp') or acc.get('timestamp', '')
 
 now = datetime.fromisoformat(ts.replace('Z', '+00:00')) if ts else datetime.now(timezone.utc)
 
-[old lines 101-122: content not preserved in this extract]
+BAR_WIDTH = 10
+
+def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
+    # Tick overlays a fill cell when consumed > elapsed, keeping bar width
+    # constant — that's what makes the over-pace state legible (┃ inside the
+    # filled run) rather than just pushing fill cells around.
+    fill = int(round(max(0, min(100, consumed_pct)) / 100 * width))
+    if elapsed_pct is None:
+        tick = -1
+    else:
+        tick = min(int(max(0, min(100, elapsed_pct)) / 100 * width), width - 1)
+    cells = []
+    remaining = fill
+    for i in range(width):
+        if i == tick:
+            cells.append('┃')
+        elif remaining > 0:
+            cells.append('█')
+            remaining -= 1
+        else:
+            cells.append('░')
+    return '[' + ''.join(cells) + ']'
+
+def fmt_hm(secs):
+    if secs is None or secs <= 0:
+        return ''
+    return '{}h{:02d}m'.format(int(secs // 3600), int((secs % 3600) // 60))
+
+def fmt_dh(secs):
+    if secs is None or secs <= 0:
+        return ''
+    return '{}d {}h'.format(int(secs // 86400), int((secs % 86400) // 3600))
+
+def window_view(reset_ts, window_secs):
+    # Returns (elapsed_sec, secs_left). elapsed_sec may be negative (server
+    # gave us a reset_at past the window head — invalid) or exceed window_secs
+    # (stale reset_at not yet refreshed by the next API call). Callers handle
+    # both; downstream rendering clamps the tick to the bar edges.
+    if reset_ts <= 0:
+        return None, None
+    window_start = datetime.fromtimestamp(reset_ts - window_secs, tz=timezone.utc)
+    return (now - window_start).total_seconds(), reset_ts - now.timestamp()
+
+def time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec):
+    # (100 - pct) divided by current burn rate (pct / elapsed_sec). Gated on
+    # min_elapsed_sec so very-fresh windows don't project off noise.
+    if elapsed_sec is None or elapsed_sec <= min_elapsed_sec:
+        return None
+    if pct <= 0 or pct >= 100:
+        return None
+    return (100 - pct) * elapsed_sec / pct
+
+def format_window(name, pct, elapsed_sec, window_secs, secs_left, fmt_time, min_elapsed_sec):
+    ep = None if elapsed_sec is None or elapsed_sec < 0 else elapsed_sec / window_secs * 100
+    extras = []
+    stale = secs_left is not None and secs_left <= 0
+    if not stale:
+        exhaust = time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec)
+        if exhaust is not None:
+            extras.append('exhaust ' + fmt_time(exhaust))
+    if secs_left is not None and secs_left > 0:
+        extras.append('reset ' + fmt_time(secs_left))
+    tail = ' (' + ', '.join(extras) + ')' if extras else ''
+    return '{} {} {}%{}'.format(name, draw_bar(pct, ep), pct, tail)
+
+elapsed_5h, left_5h = window_view(q5h_reset, 5 * 3600)
+elapsed_7d, left_7d = window_view(q7d_reset, 7 * 86400)
+
+label = format_window('Q5h', q5h, elapsed_5h, 5 * 3600, left_5h, fmt_hm, 60)
+label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 * 86400, left_7d, fmt_dh, 360)
 if overage == 'active':
     label += ' | OVERAGE'
 