npm - claude-code-cache-fix - Versions diffs - 3.8.0 → 4.0.0 - Mend

claude-code-cache-fix 3.8.0 → 4.0.0

Files changed (18)

package/README.md +95 -8
package/README.zh.md +691 -159
package/bin/claude-via-proxy.mjs +1 -0
package/bin/install-service.mjs +15 -0
package/hooks/README.md +36 -0
package/hooks/examples/worktree-edit-guard.py +93 -0
package/package.json +2 -1
package/proxy/extensions/auto-1m-guard.mjs +117 -0
package/proxy/extensions/cache-telemetry.mjs +20 -3
package/proxy/extensions/signature-surface-hash.mjs +60 -0
package/proxy/extensions/thinking-block-sanitize.mjs +233 -19
package/proxy/pipeline.mjs +22 -1
package/proxy/server.mjs +44 -2
package/templates/cache-fix-proxy.service.template +1 -0
package/templates/com.cnighswonger.cache-fix-proxy.plist.template +1 -0
package/tools/MANUAL-COMPACT.md +31 -9
package/tools/manual-compact.sh +17 -11
package/tools/quota-statusline.sh +4 -2

package/README.md CHANGED

@@ -12,7 +12,7 @@ Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-
 ## Quick Start: Proxy (recommended)
-The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
+The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
 ```bash
 # Install
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
 ### What the proxy does
-On every `/v1/messages` request, 9 extensions run in order (one opt-in):
+On every `/v1/messages` request, 9 extensions run in order:
 | Extension | What it fixes |
 |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
 | `cache-control-normalize` | Normalizes cache_control markers across messages |
 | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
 | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
-| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
+| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, or `=v2` to additionally enable the opt-in tools-hash-mismatch drop. |
-Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
+Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
 **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
 | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
 | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
 | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
+| `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
 ### Corporate environments (proxies, custom CAs)
@@ -210,6 +211,73 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
+## Upgrading from v3.x
+**Behavior changes in v4.0.0:**
+- **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
+- **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
+### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
+v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
+- Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
+- Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
+The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
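The mode mapping described in the embedder note can be sketched as follows. This is a hypothetical stand-in, not the package's actual `modeFromEnv()`: the function name `sanitizeModeFromEnv`, the injectable `env` parameter, and the treatment of unrecognized values as `v1` are assumptions made for illustration. The point it shows is that reading the variable on every call lets a host change the mode after module load.

```javascript
// Hypothetical sketch; the real modeFromEnv() in the package may differ.
// Mapping taken from the README: unset or "on" -> v1, "off" -> disabled, "v2" -> v2.
function sanitizeModeFromEnv(env = process.env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  if (raw === "off") return "off";
  if (raw === "v2") return "v2";
  // Assumption: anything else (unset, "on", unrecognized) falls back to v1.
  return "v1";
}

// Because the value is read per call and not cached at module load,
// a host can flip the mode at any point before the next request.
const hostEnv = {};
const before = sanitizeModeFromEnv(hostEnv); // "v1" (unset)
hostEnv.CACHE_FIX_THINKING_SANITIZE = "off";
const after = sanitizeModeFromEnv(hostEnv); // "off"
```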
+Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
+### Flow 1 — code-only npm upgrade (recommended default)
+Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
+**Linux (systemd user unit):**
+```
+npm install -g claude-code-cache-fix@4
+systemctl --user restart cache-fix-proxy
+```
+No `daemon-reload` required — the unit file content is unchanged.
+**macOS (launchd user agent):**
+```
+npm install -g claude-code-cache-fix@4
+launchctl kickstart -k gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`kickstart -k` kills the running agent and restarts it under the existing plist; without `-k`, `kickstart` leaves an already-running agent untouched.
+### Flow 2 — opt back into hot-reload at the supervisor layer
+Use this flow if you actively rely on hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without a restart). It rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
+**Linux (systemd user unit):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+systemctl --user daemon-reload
+systemctl --user restart cache-fix-proxy
+```
+`daemon-reload` is required because the unit file content changed.
+**macOS (launchd user agent):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
+launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
+**Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
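A monitoring script could act on the degraded signal described in the note above roughly as follows. This is a minimal sketch: the function names `needsRestart` and `checkProxyHealth` are hypothetical, only the `503` status and the `status: "degraded"` body field come from the README, and the default base URL assumes the `localhost:9801` listener mentioned elsewhere in it.

```javascript
// Hypothetical helper: decides whether a /health response indicates the
// degraded state for which the README says a process restart is the only recovery.
function needsRestart(statusCode, body) {
  return statusCode === 503 && body != null && body.status === "degraded";
}

// Hypothetical poller using the global fetch available in Node 18+.
// Assumes the proxy listens on localhost:9801.
async function checkProxyHealth(baseUrl = "http://localhost:9801") {
  const res = await fetch(`${baseUrl}/health`);
  let body = null;
  try {
    body = await res.json();
  } catch {
    // Non-JSON body: treat as "no degraded marker present".
  }
  return needsRestart(res.status, body);
}
```

When `checkProxyHealth()` resolves to `true`, the caller would trigger the supervisor restart for its platform (Flow 1 above).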
 ## What this proxy defends against
 **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
@@ -231,6 +299,24 @@ Note: cache-fix v3.6.2 and earlier returned 404 for the bootstrap path because t
 - [`CHANGELOG.md`](CHANGELOG.md#371---2026-05-27) — v3.7.1 release entry (extended surface coverage + allowlist mode); [v3.7.0 entry](CHANGELOG.md#370---2026-05-26) covers the prior behavior-change note
 - [`cnighswonger/heron-brook-poc`](https://github.com/cnighswonger/heron-brook-poc) — reproducer for the bootstrap-channel behavior
+**Auto-1M-context overage protection.** CC v2.1.161 onward (notably the VS Code Extension surface) can auto-select 1M context on Pro Plan without user request, immediately consuming overage credits. The proxy's `auto-1m-guard` extension detects the `context-1m-2025-08-07` token on the outbound `anthropic-beta` header and either warns or strips it, depending on the mode you opt into via `CACHE_FIX_AUTO_1M_GUARD`:
+| Mode | Default? | Behavior |
+|---|---|---|
+| `off` | no | Extension no-op. |
+| `warn` | yes | Detect the token. Stash an annotation into the per-session JSON (`auto_1m_detected`, `auto_1m_action: "warn"`, `auto_1m_advice`) and emit a stderr log line. Does not modify the request. |
+| `strip` | opt-in | Detect AND remove the token from the `anthropic-beta` header before forwarding. Annotation: `auto_1m_action: "stripped"`. |
+The CC-side kill switch is `CLAUDE_CODE_DISABLE_1M_CONTEXT=1` (env var), which is the right fix when it actually reaches the CC process. On the VS Code extension surface that env var is reportedly unreliable; the proxy intercept bypasses that gap because it acts on the wire regardless of which CC launcher produced the request. Tracks [CC#64919](https://github.com/anthropics/claude-code/issues/64919); see [`docs/directives/proxy-auto-1m-guard.md`](docs/directives/proxy-auto-1m-guard.md) for the binary-walk that confirms the proxy-visible signal is the beta header (CC strips the `[1m]` suffix from `req.body.model` client-side before sending).
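The three modes in the table above can be sketched as a pure function over the header value. This is an illustrative approximation, not the shipped `auto-1m-guard.mjs`: the function name `guardAuto1m` and its return shape are assumptions, and the `auto_1m_advice` field and the stderr log line are left out.

```javascript
const AUTO_1M_TOKEN = "context-1m-2025-08-07";

// Hypothetical sketch of the off / warn / strip behavior.
// `betaHeader` is the raw comma-separated `anthropic-beta` header value.
function guardAuto1m(betaHeader, mode = "warn") {
  const tokens = (betaHeader ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);
  const detected = tokens.includes(AUTO_1M_TOKEN);

  // "off" is a no-op, and there is nothing to do when the token is absent.
  if (mode === "off" || !detected) {
    return { header: betaHeader, annotation: null };
  }

  if (mode === "strip") {
    const kept = tokens.filter((t) => t !== AUTO_1M_TOKEN);
    return {
      // Drop the header entirely if the 1M token was its only entry.
      header: kept.length > 0 ? kept.join(",") : undefined,
      annotation: { auto_1m_detected: true, auto_1m_action: "stripped" },
    };
  }

  // "warn": annotate but leave the request untouched.
  return {
    header: betaHeader,
    annotation: { auto_1m_detected: true, auto_1m_action: "warn" },
  };
}
```

Returning the annotation alongside the header keeps the decision separate from the per-session JSON write, which makes the logic testable without a running proxy.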
+## Client-side hooks
+Some Claude Code behaviors live below the request layer — they happen client-side, in the tool-dispatch path, before the proxy ever sees traffic. cache-fix ships standalone hook scripts under [`hooks/examples/`](hooks/README.md) for those cases. They're independent of the proxy and you install them by pointing at them from your own `~/.claude/settings.json`.
+| Script | What it does |
+|---|---|
+| [`worktree-edit-guard.py`](docs/hooks/worktree-edit-guard.md) | Block `Edit`/`Write`/`MultiEdit`/`NotebookEdit` tool calls whose target path escapes the active git worktree, preventing parent-checkout corruption from worktree sessions. Addresses [CC#59628](https://github.com/anthropics/claude-code/issues/59628). |
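The containment check at the heart of such a guard can be sketched as below. The shipped hook is a Python script; this JavaScript version is only an illustration of the idea, with hypothetical names (`normalizePosix`, `escapesWorktree`). It handles POSIX-style paths lexically and does not resolve symlinks or discover the worktree root from git, both of which a real guard would need to consider.

```javascript
// Collapse "." and ".." segments of an absolute POSIX path without touching the filesystem.
function normalizePosix(absPath) {
  const out = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Hypothetical check: true when targetPath resolves outside worktreeRoot.
// Relative targets are interpreted against the worktree root.
function escapesWorktree(worktreeRoot, targetPath) {
  const root = normalizePosix(worktreeRoot);
  const target = normalizePosix(
    targetPath.startsWith("/") ? targetPath : `${root}/${targetPath}`
  );
  // Compare with a trailing slash so "/repo/wt-2" is not treated as inside "/repo/wt".
  const prefix = root === "/" ? "/" : `${root}/`;
  return !(target === root || target.startsWith(prefix));
}
```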
 ## Recommended CC operational config
 The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
@@ -335,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
 ## How it works
-**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
+**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
 **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
@@ -747,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
 | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
 | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
-## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
+## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
 The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
 The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
-**Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
+**On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
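The v1 turn-selection rule described above can be sketched as follows. This is a simplified reading of the README text, not the code in `thinking-block-sanitize.mjs`: the function names are hypothetical, the message shapes assume the Messages API convention of `tool_use.id` matched by `tool_result.tool_use_id`, and edge cases such as an assistant turn left with empty content are not handled.

```javascript
// An "omitted" thinking block has empty text (its signature is left intact by CC).
const isOmittedThinking = (b) => b != null && b.type === "thinking" && b.thinking === "";

// True when the assistant turn at `idx` ends in a tool_use that the next
// message answers with a matching tool_result.
function isActiveToolContinuation(messages, idx) {
  const content = messages[idx].content;
  if (!Array.isArray(content) || content.length === 0) return false;
  const last = content[content.length - 1];
  if (last.type !== "tool_use") return false;
  const next = messages[idx + 1];
  return Boolean(
    next &&
      next.role === "user" &&
      Array.isArray(next.content) &&
      next.content.some((b) => b.type === "tool_result" && b.tool_use_id === last.id)
  );
}

// Returns a new message array plus a count of dropped blocks (counts only, never content).
function dropOmittedThinking(messages) {
  let latestAssistant = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") {
      latestAssistant = i;
      break;
    }
  }
  let dropped = 0;
  const out = messages.map((msg, i) => {
    if (msg.role !== "assistant" || !Array.isArray(msg.content)) return msg;
    // The one exception: leave an active tool-continuation turn untouched.
    if (i === latestAssistant && isActiveToolContinuation(messages, i)) return msg;
    const kept = msg.content.filter((b) => !isOmittedThinking(b));
    dropped += msg.content.length - kept.length;
    return kept.length === msg.content.length ? msg : { ...msg, content: kept };
  });
  return { messages: out, dropped };
}
```

Non-empty thinking and `redacted_thinking` blocks pass through unchanged because the predicate only matches `type: "thinking"` with empty text.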
 | Env var | Default | Purpose |
 |---------|---------|---------|
-| `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
+| `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat — same as unset). |
 ## System prompt rewrite (preload mode, optional)
@@ -809,6 +895,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
 - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
 - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
 - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
+- **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
 - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.