claude-code-cache-fix 3.8.0 → 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,7 +12,7 @@ Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-
 
  ## Quick Start: Proxy (recommended)
 
- The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
+ The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
 
  ```bash
  # Install
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
 
  ### What the proxy does
 
- On every `/v1/messages` request, 9 extensions run in order (one opt-in):
+ On every `/v1/messages` request, 9 extensions run in order:
 
  | Extension | What it fixes |
  |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
  | `cache-control-normalize` | Normalizes cache_control markers across messages |
  | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
  | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
- | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
+ | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
 
- Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
+ Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
 
  **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
 
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
  | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
  | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
  | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
+ | `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
 
  ### Corporate environments (proxies, custom CAs)
 
@@ -210,6 +211,73 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 
  *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
 
+ ## Upgrading from v3.x
+
+ **Behavior changes in v4.0.0:**
+
+ - **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
+ - **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
+
+ ### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
+
+ v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
+
+ - Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
+ - Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
+
+ The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
+
+ Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
+
+ ### Flow 1 — code-only npm upgrade (recommended default)
+
+ Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
+
+ **Linux (systemd user unit):**
+
+ ```
+ npm install -g claude-code-cache-fix@4
+ systemctl --user restart cache-fix-proxy
+ ```
+
+ No `daemon-reload` required — the unit file content is unchanged.
+
+ **macOS (launchd user agent):**
+
+ ```
+ npm install -g claude-code-cache-fix@4
+ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+ ```
+
+ `kickstart` re-execs the agent under the existing plist.
+
+ ### Flow 2 — opt back into hot-reload at the supervisor layer
+
+ Run if you actively use hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without restart). This rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
+
+ **Linux (systemd user unit):**
+
+ ```
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+ systemctl --user daemon-reload
+ systemctl --user restart cache-fix-proxy
+ ```
+
+ `daemon-reload` is required because the unit file content changed.
+
+ **macOS (launchd user agent):**
+
+ ```
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+ launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+ launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
+ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+ ```
+
+ `bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
+
+ **Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
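The degraded-health signal described in the note above could be checked from a watchdog script along these lines. This is a minimal sketch that is not taken from the package source: `classifyHealth` is a hypothetical helper, and treating a `200` response as healthy is an assumption, since the README text only specifies the `503` plus `status: "degraded"` case.

```javascript
// Hypothetical helper: maps a /health response to a coarse verdict.
// Only the 503 + {status:"degraded"} case comes from the README text;
// treating 200 as healthy is an assumption made for illustration.
function classifyHealth(statusCode, bodyText) {
  let body = null;
  try {
    body = JSON.parse(bodyText);
  } catch {
    // Non-JSON body: leave body as null and fall through.
  }
  if (statusCode === 503 && body?.status === "degraded") {
    return "restart-needed"; // per the note, a process restart is the only recovery
  }
  return statusCode === 200 ? "ok" : "unknown";
}
```

A caller would pass in the status code and raw body of a `GET /health` request, then trigger the supervisor restart from Flow 1 when the verdict is `restart-needed`. The host and port to query depend on how the proxy is deployed.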
+
  ## What this proxy defends against
 
  **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
@@ -231,6 +299,24 @@ Note: cache-fix v3.6.2 and earlier returned 404 for the bootstrap path because t
  - [`CHANGELOG.md`](CHANGELOG.md#371---2026-05-27) — v3.7.1 release entry (extended surface coverage + allowlist mode); [v3.7.0 entry](CHANGELOG.md#370---2026-05-26) covers the prior behavior-change note
  - [`cnighswonger/heron-brook-poc`](https://github.com/cnighswonger/heron-brook-poc) — reproducer for the bootstrap-channel behavior
 
+ **Auto-1M-context overage protection.** CC v2.1.161 onward (notably the VS Code Extension surface) can auto-select 1M context on Pro Plan without user request, immediately consuming overage credits. The proxy's `auto-1m-guard` extension detects the `context-1m-2025-08-07` token on the outbound `anthropic-beta` header and either warns or strips it, depending on the mode you opt into via `CACHE_FIX_AUTO_1M_GUARD`:
+
+ | Mode | Default? | Behavior |
+ |---|---|---|
+ | `off` | no | Extension no-op. |
+ | `warn` | yes | Detect the token. Stash an annotation into the per-session JSON (`auto_1m_detected`, `auto_1m_action: "warn"`, `auto_1m_advice`) and emit a stderr log line. Does not modify the request. |
+ | `strip` | opt-in | Detect AND remove the token from the `anthropic-beta` header before forwarding. Annotation: `auto_1m_action: "stripped"`. |
+
+ The CC-side kill switch is `CLAUDE_CODE_DISABLE_1M_CONTEXT=1` (env var), which is the right fix when it actually reaches the CC process. On the VS Code extension surface that env var is reportedly unreliable; the proxy intercept bypasses that gap because it acts on the wire regardless of which CC launcher produced the request. Tracks [CC#64919](https://github.com/anthropics/claude-code/issues/64919); see [`docs/directives/proxy-auto-1m-guard.md`](docs/directives/proxy-auto-1m-guard.md) for the binary-walk that confirms the proxy-visible signal is the beta header (CC strips the `[1m]` suffix from `req.body.model` client-side before sending).
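The three guard modes in the table above can be illustrated with a small header transform. This is a sketch under stated assumptions, not the `auto-1m-guard` source: the function name `guardAuto1m` and its return shape are hypothetical, and only the token string, the header name, and the mode names come from the README text.

```javascript
// Token and mode names are from the README; everything else is illustrative.
const AUTO_1M_TOKEN = "context-1m-2025-08-07";

// Hypothetical helper: takes the outbound anthropic-beta header value and a
// mode ("off" | "warn" | "strip"), returns the header to forward plus what happened.
function guardAuto1m(betaHeader, mode = "warn") {
  const tokens = String(betaHeader ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);
  const detected = tokens.includes(AUTO_1M_TOKEN);

  // "off" is a no-op; so is any mode when the token is absent.
  if (mode === "off" || !detected) {
    return { header: betaHeader, detected: false, action: null };
  }
  if (mode === "strip") {
    const kept = tokens.filter((t) => t !== AUTO_1M_TOKEN);
    // undefined signals "drop the header entirely" when nothing else remains.
    return {
      header: kept.length > 0 ? kept.join(",") : undefined,
      detected: true,
      action: "stripped",
    };
  }
  // "warn": report the detection but leave the request untouched.
  return { header: betaHeader, detected: true, action: "warn" };
}
```

In `warn` mode the caller would record the result (for example in the per-session JSON) and forward the original header unchanged. Only `strip` alters what goes on the wire.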
+
+ ## Client-side hooks
+
+ Some Claude Code behaviors live below the request layer — they happen client-side, in the tool-dispatch path, before the proxy ever sees traffic. cache-fix ships standalone hook scripts under [`hooks/examples/`](hooks/README.md) for those cases. They're independent of the proxy and you install them by pointing at them from your own `~/.claude/settings.json`.
+
+ | Script | What it does |
+ |---|---|
+ | [`worktree-edit-guard.py`](docs/hooks/worktree-edit-guard.md) | Block `Edit`/`Write`/`MultiEdit`/`NotebookEdit` tool calls whose target path escapes the active git worktree, preventing parent-checkout corruption from worktree sessions. Addresses [CC#59628](https://github.com/anthropics/claude-code/issues/59628). |
+
  ## Recommended CC operational config
 
  The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
@@ -335,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
 
  ## How it works
 
- **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
+ **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
 
  **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
 
@@ -747,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
  | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
  | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
 
- ## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
+ ## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
 
  The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
 
  The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
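The turn-selection rule in the paragraph above can be sketched as a pure function over the `messages` array. This is an illustration written from the prose description, not the extension's code: the helper names are hypothetical, matching the `tool_result` to the `tool_use` by `tool_use_id` is an assumption about what "answered by" means, and edge cases such as a turn left with no content blocks are not handled.

```javascript
// An "omitted" thinking block has empty text (its signature may still be intact).
function isOmittedThinking(block) {
  return block?.type === "thinking" && block.thinking === "";
}

// Active tool-continuation: the turn's last block is a tool_use, and the next
// message carries a tool_result for it (matched by id; an assumption here).
function isActiveToolContinuation(messages, idx) {
  const content = messages[idx].content;
  const last = Array.isArray(content) ? content[content.length - 1] : undefined;
  if (last?.type !== "tool_use") return false;
  const next = messages[idx + 1];
  return (
    next?.role === "user" &&
    Array.isArray(next.content) &&
    next.content.some((b) => b.type === "tool_result" && b.tool_use_id === last.id)
  );
}

// Drops omitted thinking from every assistant turn, except a latest turn that
// is an active tool-continuation. Returns new messages plus a dropped count.
function dropOmittedThinking(messages) {
  let latest = -1;
  messages.forEach((m, i) => {
    if (m.role === "assistant") latest = i;
  });

  let dropped = 0;
  const out = messages.map((m, i) => {
    if (m.role !== "assistant" || !Array.isArray(m.content)) return m;
    if (i === latest && isActiveToolContinuation(messages, i)) return m; // leave intact
    const kept = m.content.filter((b) => !isOmittedThinking(b));
    dropped += m.content.length - kept.length;
    return kept.length === m.content.length ? m : { ...m, content: kept };
  });
  return { messages: out, dropped };
}
```

The returned `dropped` value plays the role of the counts-only telemetry the section describes. Non-empty thinking blocks pass through unchanged because the filter only matches empty text.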
 
- **Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
+ **On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
 
  | Env var | Default | Purpose |
  |---------|---------|---------|
- | `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
+ | `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat same as unset). |
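The value mapping in the v4.0.0 table row above amounts to a small lookup. The sketch below is hypothetical (`resolveSanitizeMode` is an illustrative name, not the package's `modeFromEnv()`), and two behaviors in it are assumptions the README does not state: trimming and lowercasing the value, and falling back to `v1` for unrecognized values.

```javascript
// Hypothetical resolver for CACHE_FIX_THINKING_SANITIZE under v4.0.0 defaults.
// Takes an env-like object so it can be evaluated per request and tested easily.
function resolveSanitizeMode(env = process.env) {
  // Assumption: normalize whitespace and case before comparing.
  const raw = (env.CACHE_FIX_THINKING_SANITIZE ?? "").trim().toLowerCase();
  if (raw === "off") return "off"; // explicit disable (the v3.x default behavior)
  if (raw === "v2") return "v2";   // v1 drop plus the tools-hash-mismatch drop
  // Unset and "on" both mean v1 per the table; treating other values as v1
  // is an assumption made for this sketch.
  return "v1";
}
```

Resolving the mode on each request rather than once at module load is what lets an embedder set `process.env.CACHE_FIX_THINKING_SANITIZE` at any point before request handling and still have it take effect.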
 
  ## System prompt rewrite (preload mode, optional)
 
@@ -809,6 +895,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
  - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
  - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
  - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
+ - **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
  - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
 
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.