claude-code-cache-fix 3.9.0 → 4.1.0

package/README.md CHANGED
@@ -6,13 +6,13 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
6
6
 
7
7
  Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
8
8
 
9
- > **v3.0.3** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
9
+ > **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
10
10
 
11
11
  > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
12
12
 
13
13
  ## Quick Start: Proxy (recommended)
14
14
 
15
- The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
15
+ The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
16
16
 
17
17
  ```bash
18
18
  # Install
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
25
25
  ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
26
26
  ```
27
27
 
28
- That's it. The proxy applies all 7 cache-fix extensions automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
28
+ That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
29
29
 
30
30
  ### What the proxy does
31
31
 
32
- On every `/v1/messages` request, 9 extensions run in order (one opt-in):
32
+ On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image, microcompact, breakpoint, bootstrap-channel, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
33
33
 
34
34
  | Extension | What it fixes |
35
35
  |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
41
41
  | `cache-control-normalize` | Normalizes cache_control markers across messages |
42
42
  | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
43
43
  | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
44
- | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
44
+ | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
45
45
 
46
- Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
46
+ Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
47
47
 
48
48
  **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
49
49
 
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
116
116
  ghcr.io/cnighswonger/claude-code-cache-fix:latest
117
117
  ```
118
118
 
119
- Image tags: `latest`, `3`, `3.2`, `3.2.1` (semver-ladder, so `3` always points to the newest 3.x). `latest` always tracks the newest tagged release.
119
+ Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
120
120
 
121
121
  **Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
122
122
 
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
147
147
  | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
148
148
  | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
149
149
  | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
150
+ | `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
150
151
 
151
152
  ### Corporate environments (proxies, custom CAs)
152
153
 
@@ -210,11 +211,78 @@ Options (all optional; all fall back to the same env vars used by the CLI):
210
211
 
211
212
  *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
212
213
 
214
+ ## Upgrading from v3.x
215
+
216
+ **Behavior changes in v4.0.0:**
217
+
218
+ - **`thinking-block-sanitize` v1 is now on by default.** It was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day), the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
219
+ - **In-process extension hot-reload is now off by default.** It was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Leaving it off by default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
220
+
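The stale-import race is a general property of Node's ESM loader, and a minimal reproduction makes it concrete. The sketch below assumes a watcher that re-imports an extension with a cache-busting query string (an assumption; the proxy's actual loader is not shown here) and uses hypothetical `ext.mjs` / `dep.mjs` files in a temp directory. The re-imported extension is re-evaluated, but its dependency resolves to a URL that is already in the loader's module map, so the stale copy is reused:

```javascript
import { mkdtempSync, writeFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// Scratch directory holding a tiny hypothetical "extension" and one dependency.
const dir = mkdtempSync(join(tmpdir(), "esm-stale-import-"));
const depPath = join(dir, "dep.mjs");
const extPath = join(dir, "ext.mjs");
writeFileSync(depPath, 'export const version = "v1";\n');
writeFileSync(extPath, 'import { version } from "./dep.mjs";\nexport const seen = version;\n');

const extUrl = pathToFileURL(extPath).href;

// First load, as a cold start would do it.
const first = await import(`${extUrl}?reload=1`);

// Edit the dependency, then re-import the extension under a fresh query string
// (a common cache-busting pattern for hot-reload).
writeFileSync(depPath, 'export const version = "v2";\n');
const second = await import(`${extUrl}?reload=2`);

// ext.mjs is re-evaluated, but "./dep.mjs" resolves to the same URL that is
// already cached, so the old dependency module is reused.
console.log(first.seen, second.seen); // v1 v1

rmSync(dir, { recursive: true, force: true });
```

A new process importing the edited files would see `v2`, which is why a supervisor-level restart is the reliable way to pick up changes.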
221
+ ### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
222
+
223
+ v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
224
+
225
+ - Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
226
+ - Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
227
+
228
+ The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
229
+
230
+ Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
231
+
232
+ ### Flow 1 — code-only npm upgrade (recommended default)
233
+
234
+ Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
235
+
236
+ **Linux (systemd user unit):**
237
+
238
+ ```
239
+ npm install -g claude-code-cache-fix@4
240
+ systemctl --user restart cache-fix-proxy
241
+ ```
242
+
243
+ No `daemon-reload` required — the unit file content is unchanged.
244
+
245
+ **macOS (launchd user agent):**
246
+
247
+ ```
248
+ npm install -g claude-code-cache-fix@4
249
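+ launchctl kickstart -k gui/$(id -u)/com.cnighswonger.cache-fix-proxy
250
+ ```
251
+
252
+ `kickstart -k` kills the running instance and restarts the agent under the existing plist; without `-k`, `kickstart` does nothing if the agent is already running.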
253
+
254
+ ### Flow 2 — opt back into hot-reload at the supervisor layer
255
+
256
+ Use this flow if you actively rely on hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without a restart). It rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
257
+
258
+ **Linux (systemd user unit):**
259
+
260
+ ```
261
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
262
+ systemctl --user daemon-reload
263
+ systemctl --user restart cache-fix-proxy
264
+ ```
265
+
266
+ `daemon-reload` is required because the unit file content changed.
267
+
268
+ **macOS (launchd user agent):**
269
+
270
+ ```
271
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
272
+ launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
273
+ launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
274
+ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
275
+ ```
276
+
277
+ `bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
278
+
279
+ **Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
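On unattended hosts, a cron job or supervisor hook could poll `/health` and trigger the restart itself. This is a minimal sketch that assumes only what the note above states (HTTP 503 and a JSON body with `status: "degraded"`). The function name `probeProxyHealth` is hypothetical, and treating any other 2xx as healthy is an assumption:

```javascript
// Hypothetical health probe for a cron job or supervisor hook.
// Assumes a degraded proxy answers GET /health with 503 and {"status":"degraded",...}.
async function probeProxyHealth(baseUrl = "http://127.0.0.1:9801") {
  const res = await fetch(new URL("/health", baseUrl));
  let body = null;
  try {
    body = await res.json();
  } catch {
    // Non-JSON body: keep null and rely on the status code alone.
  }
  const degraded = res.status === 503 || body?.status === "degraded";
  return { ok: res.ok && !degraded, degraded, status: res.status, body };
}

// Example (needs a running proxy): exit non-zero so the caller can restart it.
// const health = await probeProxyHealth();
// if (health.degraded) process.exitCode = 1;
```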
280
+
213
281
  ## What this proxy defends against
214
282
 
215
283
  **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
216
284
 
217
- **Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 added explicit handling for this path. v3.7.1 extends it to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body).
285
+ **Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
218
286
 
219
287
  Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
220
288
 
@@ -333,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
333
401
 
334
402
  **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
335
403
 
336
- **Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
404
+ **Supply chain:** Proxy mode: small, focused extension modules in `proxy/extensions/` (most under a few hundred lines; because the pipeline is composable, each one can be read in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
337
405
 
338
406
  **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
339
407
 
@@ -353,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
353
421
 
354
422
  ## How it works
355
423
 
356
- **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
424
+ **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
357
425
 
358
426
  **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
359
427
 
@@ -765,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
765
833
  | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
766
834
  | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
767
835
 
768
- ## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
836
+ ## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
769
837
 
770
838
  The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
771
839
 
772
840
  The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
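The turn-selection rule can be sketched as a pure function over the request's `messages` array. This is an illustration of the rule as written above, not the extension's source; `sanitizeThinking` and its helpers are hypothetical names, and skipping a turn whose content would become empty is an added assumption:

```javascript
// Hypothetical sketch of the v1 turn-selection rule; not the extension's source.

// The "omitted" shape: a thinking block whose text was emptied on replay.
// Non-empty thinking and redacted_thinking never match.
function isOmittedThinking(block) {
  return block?.type === "thinking" && block.thinking === "";
}

// Active tool-continuation: the turn's last block is a tool_use that the
// next (user) message answers with a matching tool_result.
function isActiveToolContinuation(messages, idx) {
  const blocks = messages[idx].content;
  const last = blocks[blocks.length - 1];
  if (last?.type !== "tool_use") return false;
  const next = messages[idx + 1];
  return (
    next?.role === "user" &&
    Array.isArray(next.content) &&
    next.content.some((b) => b.type === "tool_result" && b.tool_use_id === last.id)
  );
}

function sanitizeThinking(messages) {
  const latestAssistant = messages.findLastIndex((m) => m.role === "assistant");
  let dropped = 0;
  const out = messages.map((m, i) => {
    if (m.role !== "assistant" || !Array.isArray(m.content)) return m;
    // The one exception: the API needs the signed thinking intact here.
    if (i === latestAssistant && isActiveToolContinuation(messages, i)) return m;
    const kept = m.content.filter((b) => !isOmittedThinking(b));
    // Assumption: never emit an assistant turn with an empty content array.
    if (kept.length === 0 || kept.length === m.content.length) return m;
    dropped += m.content.length - kept.length;
    return { ...m, content: kept };
  });
  return { messages: out, dropped };
}

const request = [
  { role: "user", content: "hi" },
  { role: "assistant", content: [
    { type: "thinking", thinking: "", signature: "sig-1" },
    { type: "text", text: "hello" } ] },
  { role: "user", content: "run the tool" },
  { role: "assistant", content: [
    { type: "thinking", thinking: "", signature: "sig-2" },
    { type: "tool_use", id: "toolu_1", name: "Bash", input: {} } ] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_1", content: "ok" }] },
];
const result = sanitizeThinking(request);
console.log(result.dropped); // 1 (prior turn cleaned, latest tool-continuation left intact)
```

Returning a new array and copying only the turns that change keeps the transform deterministic for identical input, which matters for a prefix that has to stay cache-stable.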
773
841
 
774
- **Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
842
+ **On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy), the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
775
843
 
776
844
  | Env var | Default | Purpose |
777
845
  |---------|---------|---------|
778
- | `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
846
+ | `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (restores the v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat; equivalent to unset). |
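The table amounts to a small mapping from the env value to a mode. A minimal sketch follows; `sanitizeModeFromEnv` is a hypothetical stand-in for the proxy's own `modeFromEnv()`, and the fallback for unrecognized values is an assumption. Reading the env on every call, rather than once at module load, is what lets an embedder change `process.env` before requests are handled:

```javascript
// Hypothetical sketch of the env-to-mode mapping in the table above.
// Not the proxy's actual modeFromEnv().
function sanitizeModeFromEnv(env = process.env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  if (raw === "off") return "off"; // explicit opt-out (v3.x default behavior)
  if (raw === "v2") return "v2";   // v1 drop plus the tools-hash-mismatch drop
  // Unset and "on" both select v1, the v4.0.0 default.
  // Assumption: any other value also falls back to v1.
  return "v1";
}

// Because the env is read per call, flipping it takes effect on the next request.
process.env.CACHE_FIX_THINKING_SANITIZE = "off";
console.log(sanitizeModeFromEnv()); // off
delete process.env.CACHE_FIX_THINKING_SANITIZE;
console.log(sanitizeModeFromEnv()); // v1
```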
779
847
 
780
848
  ## System prompt rewrite (preload mode, optional)
781
849
 
@@ -787,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
787
855
 
788
856
  See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
789
857
 
858
+ ### `usage-log` extension and the `MeterRowSchema v:1` wire format
859
+
860
+ The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
861
+
862
+ | Field | Type | Source |
863
+ |---|---|---|
864
+ | `v` | literal `1` | constant |
865
+ | `ts` | ISO-8601 datetime | server time at row emission |
866
+ | `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
867
+ | `model` | string ≤64 | `message_start.message.model` from the response stream |
868
+ | `requested_model` | string ≤64 (optional) | request body `model` field |
869
+ | `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
870
+ | `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
871
+ | `service_tier` | string ≤32 | response `usage.service_tier` |
872
+ | `input_tokens` | int ≥0 | response usage |
873
+ | `output_tokens` | int ≥0 | response usage |
874
+ | `cache_creation_input_tokens` | int ≥0 | response usage |
875
+ | `cache_read_input_tokens` | int ≥0 | response usage |
876
+ | `ephemeral_1h_input_tokens` | int ≥0 | response usage |
877
+ | `ephemeral_5m_input_tokens` | int ≥0 | response usage |
878
+ | `web_search_requests` | int ≥0 | response usage |
879
+ | `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
880
+ | `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
881
+ | `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
882
+ | `qfallback_pct` | float 0–1 | unified fallback percentage |
883
+ | `qoverage_util` | float ≥0 (optional) | overage utilization header |
884
+ | `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
885
+ | `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)` — never raw |
886
+ | `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
887
+ | `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
888
+ | `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
889
+ | `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
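Several fields in the table are derived rather than copied. A minimal sketch of those derivations, following the formulas above: `derivedMeterFields` and its argument shape are hypothetical, `headers` is assumed to be a WHATWG `Headers` object, the model names are placeholders, and returning `0` for a response with no input-side tokens is an assumption the table does not cover:

```javascript
import { createHash } from "node:crypto";

// Hypothetical sketch of derived MeterRowSchema v:1 fields; not the usage-log source.
function derivedMeterFields({ usage, headers, requestedModel, model, prev }) {
  const input = usage.input_tokens ?? 0;
  const creation = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  const total = input + creation + read;

  // Utilization headers arrive as strings; absent headers stay undefined.
  const util = (name) => {
    const v = headers.get(name);
    return v === null ? undefined : Number(v);
  };
  const q5h = util("anthropic-ratelimit-unified-5h-utilization");
  const q7d = util("anthropic-ratelimit-unified-7d-utilization");

  // Per-call delta against the previous row; 0 on the first call after restart.
  const delta = (cur, before) =>
    cur === undefined || before === undefined ? 0 : cur - before;

  const row = {
    // Assumption: 0 when a response reports no input-side tokens at all.
    cache_hit_rate: total > 0 ? read / total : 0,
    q5h,
    q7d,
    q5h_delta: delta(q5h, prev?.q5h),
    q7d_delta: delta(q7d, prev?.q7d),
  };

  if (requestedModel) row.requested_model = requestedModel;
  if (requestedModel && model) row.model_mismatch = requestedModel !== model;

  // Only a truncated SHA-256 of the organization id is kept, never the raw value.
  const org = headers.get("anthropic-organization-id");
  if (org) row.org_id = createHash("sha256").update(org).digest("hex").slice(0, 16);

  return row;
}

const row = derivedMeterFields({
  usage: { input_tokens: 100, cache_creation_input_tokens: 200, cache_read_input_tokens: 700 },
  headers: new Headers({
    "anthropic-ratelimit-unified-5h-utilization": "0.42",
    "anthropic-ratelimit-unified-7d-utilization": "0.10",
    "anthropic-organization-id": "org-example",
  }),
  requestedModel: "example-model",
  model: "example-model",
  prev: { q5h: 0.4, q7d: 0.1 },
});
console.log(row.cache_hit_rate); // 0.7
```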
890
+
891
+ **Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
892
+
893
+ ```bash
894
+ # Find which CC session each usage.jsonl row belongs to:
895
+ jq -r '.request_id // empty' ~/.claude/usage.jsonl |
896
+ while IFS= read -r req; do
897
+ [ -z "$req" ] && continue
898
+ grep -lF "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
899
+ done
900
+ ```
901
+
902
+ The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
903
+
790
904
  ## Limitations
791
905
 
792
906
  - **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
@@ -827,6 +941,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
827
941
  - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
828
942
  - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
829
943
  - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
944
+ - **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
830
945
  - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
831
946
 
832
947
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
@@ -55,6 +55,7 @@ async function dispatch() {
55
55
  " CACHE_FIX_PROXY_PORT Port for the proxy server\n" +
56
56
  " CACHE_FIX_PROXY_UPSTREAM Upstream URL\n" +
57
57
  " CACHE_FIX_DEBUG=1 Verbose proxy logging\n" +
58
+ " CACHE_FIX_HOT_RELOAD=on Enable in-process extension hot-reload (off by default; see #196)\n" +
58
59
  " CACHE_FIX_CLAUDE_CMD Override the `claude` command for the wrapper\n",
59
60
  );
60
61
  return 0;
@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
13
13
  import { fileURLToPath } from "node:url";
14
14
  import { dirname, resolve, join } from "node:path";
15
15
  import { homedir, platform } from "node:os";
16
+ import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
16
17
 
17
18
  const __dirname = dirname(fileURLToPath(import.meta.url));
18
19
  const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
@@ -22,7 +23,14 @@ function getDefaults() {
22
23
  return {
23
24
  port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
24
25
  upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
26
+ caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
27
+ rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
25
28
  debug: process.env.CACHE_FIX_DEBUG || "",
29
+ // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
30
+ // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
31
+ // generated unit/plist via `CACHE_FIX_HOT_RELOAD=on cache-fix-proxy
32
+ // install-service`. Strict "on" match — anything else renders nothing.
33
+ hotReload: process.env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "",
26
34
  workingDir: resolve(__dirname, ".."),
27
35
  };
28
36
  }
@@ -88,10 +96,19 @@ function getPaths(plat = platform()) {
88
96
 
89
97
  function renderSystemdTemplate(template, vars) {
90
98
  const upstreamLine = vars.upstream
91
- ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
99
+ ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
100
+ : "";
101
+ const caFileLine = vars.caFile
102
+ ? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
103
+ : "";
104
+ const rejectUnauthorizedLine = vars.rejectUnauthorized
105
+ ? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
92
106
  : "";
93
107
  const debugLine = vars.debug
94
- ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
108
+ ? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
109
+ : "";
110
+ const hotReloadLine = vars.hotReload
111
+ ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
95
112
  : "";
96
113
  // Allow callers to wire a Requires= line (e.g. another service the proxy
97
114
  // chains to). Empty string by default so the unit has no extra deps.
@@ -103,7 +120,10 @@ function renderSystemdTemplate(template, vars) {
103
120
  .replaceAll("{{SERVER_PATH}}", vars.serverPath)
104
121
  .replaceAll("{{PORT}}", vars.port)
105
122
  .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
123
+ .replaceAll("{{CA_FILE_LINE}}", caFileLine)
124
+ .replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
106
125
  .replaceAll("{{DEBUG_LINE}}", debugLine)
126
+ .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
107
127
  .replaceAll("{{REQUIRES_LINE}}", requiresLine)
108
128
  .replaceAll("{{WORKING_DIR}}", vars.workingDir)
109
129
  // Collapse triple newlines from empty optional lines down to single blank
@@ -112,17 +132,29 @@ function renderSystemdTemplate(template, vars) {
112
132
 
113
133
  function renderLaunchdTemplate(template, vars) {
114
134
  const upstreamPlist = vars.upstream
115
- ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${vars.upstream}</string>`
135
+ ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${xmlEscape(vars.upstream)}</string>`
136
+ : "";
137
+ const caFilePlist = vars.caFile
138
+ ? ` <key>CACHE_FIX_PROXY_CA_FILE</key>\n <string>${xmlEscape(vars.caFile)}</string>`
139
+ : "";
140
+ const rejectUnauthorizedPlist = vars.rejectUnauthorized
141
+ ? ` <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
116
142
  : "";
117
143
  const debugPlist = vars.debug
118
- ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${vars.debug}</string>`
144
+ ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${xmlEscape(vars.debug)}</string>`
145
+ : "";
146
+ const hotReloadPlist = vars.hotReload
147
+ ? ` <key>CACHE_FIX_HOT_RELOAD</key>\n <string>${vars.hotReload}</string>`
119
148
  : "";
120
149
  return template
121
150
  .replaceAll("{{NODE}}", vars.node)
122
151
  .replaceAll("{{SERVER_PATH}}", vars.serverPath)
123
152
  .replaceAll("{{PORT}}", vars.port)
124
153
  .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
154
+ .replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
155
+ .replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
125
156
  .replaceAll("{{DEBUG_PLIST}}", debugPlist)
157
+ .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
126
158
  .replaceAll("{{WORKING_DIR}}", vars.workingDir)
127
159
  .replaceAll("{{LOG_DIR}}", vars.logDir)
128
160
  .replace(/\n\n+/g, "\n");
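Both render functions now pass every optional value through an escaping helper (`systemdEscape`, `xmlEscape`). Neither helper's definition appears in this diff. The sketch below covers the plist side only. It assumes `xmlEscape` applies standard escaping of the five XML special characters, and the `optionalPlistEntry` and `renderSketch` names are hypothetical. It shows why an upstream URL containing `&` needs escaping, and how the trailing `.replace(/\n\n+/g, "\n")` removes the blank lines that unset entries leave behind:

```javascript
// Hypothetical stand-in for the package's xmlEscape (not shown in this diff):
// escape the five XML special characters so arbitrary paths/URLs can sit
// inside a plist <string> element. "&" must be replaced first.
function xmlEscape(value) {
  return String(value)
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Optional entry: empty string when unset, mirroring `vars.x ? ... : ""`.
function optionalPlistEntry(key, value) {
  return value
    ? `    <key>${key}</key>\n    <string>${xmlEscape(value)}</string>`
    : "";
}

// Substitute placeholders, then collapse the blank lines left behind by
// empty optional entries (same regex as renderLaunchdTemplate).
function renderSketch(template, vars) {
  return template
    .replaceAll("{{UPSTREAM_PLIST}}", optionalPlistEntry("CACHE_FIX_PROXY_UPSTREAM", vars.upstream))
    .replaceAll("{{CA_FILE_PLIST}}", optionalPlistEntry("CACHE_FIX_PROXY_CA_FILE", vars.caFile))
    .replace(/\n\n+/g, "\n");
}

const template = "<dict>\n{{UPSTREAM_PLIST}}\n{{CA_FILE_PLIST}}\n</dict>";
console.log(renderSketch(template, { upstream: "https://example.test/?a=1&b=2" }));
// <dict>
//     <key>CACHE_FIX_PROXY_UPSTREAM</key>
//     <string>https://example.test/?a=1&amp;b=2</string>
// </dict>
```

Without escaping, the raw `&` in the query string would make the generated plist malformed XML.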
@@ -173,11 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
173
205
  const rendered = renderSystemdTemplate(template, {
174
206
  node: process.execPath,
175
207
  serverPath: SERVER_PATH,
176
- port: defaults.port,
177
- upstream: defaults.upstream,
178
- debug: defaults.debug,
179
- workingDir: defaults.workingDir,
180
208
  requires: "",
209
+ ...defaults,
181
210
  });
182
211
  await mkdir(paths.configDir, { recursive: true });
183
212
  await writeFile(targetPath, rendered);
@@ -272,11 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
272
301
  const rendered = renderLaunchdTemplate(template, {
273
302
  node: process.execPath,
274
303
  serverPath: SERVER_PATH,
275
- port: defaults.port,
276
- upstream: defaults.upstream,
277
- debug: defaults.debug,
278
- workingDir: defaults.workingDir,
279
304
  logDir: paths.logDir,
305
+ ...defaults,
280
306
  });
281
307
  await mkdir(paths.configDir, { recursive: true });
282
308
  await writeFile(targetPath, rendered);
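Both install functions now spread `...defaults` last instead of listing `port`, `upstream`, `debug` and `workingDir` by hand. This relies on object-spread precedence: later keys win. New defaults such as `caFile`, `rejectUnauthorized` and `hotReload` therefore reach the renderer without further edits. A sketch with illustrative values follows. The paths, port, and the `buildSystemdVars` name are hypothetical:

```javascript
// Hypothetical stand-ins for process.execPath and SERVER_PATH.
const NODE = "/usr/local/bin/node";
const SERVER = "/opt/cache-fix/proxy/server.mjs";

// Same shape as the new installSystemd call: fixed keys first, spread last.
// Any key present in `defaults` overrides the literal before it, and keys
// that were never listed explicitly are carried through unchanged.
function buildSystemdVars(defaults) {
  return {
    node: NODE,
    serverPath: SERVER,
    requires: "",
    ...defaults,
  };
}

// Illustrative defaults only; not the package's real values.
const vars = buildSystemdVars({
  port: 9801,
  upstream: "",
  caFile: "/etc/ssl/corp-ca.pem",
  hotReload: "on",
  workingDir: "/opt/cache-fix",
});
console.log(vars.caFile, vars.hotReload, vars.requires === ""); // /etc/ssl/corp-ca.pem on true

// A `requires` key inside defaults replaces the "" placeholder.
console.log(buildSystemdVars({ requires: "other.service" }).requires); // other.service
```

The same ordering applies to `installLaunchd`: `logDir: paths.logDir` precedes the spread, so a `logDir` key in `defaults` would take precedence over it.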
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "3.9.0",
3
+ "version": "4.1.0",
4
4
  "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
5
5
  "type": "module",
6
6
  "exports": {
@@ -56,7 +56,12 @@ export function sessionFilePath(rawId) {
56
56
  return join(paths().sessionsDir, `${sessionFilename(rawId)}.json`);
57
57
  }
58
58
 
59
- function resolveSessionId(headers) {
59
+ // Exported so sibling extensions can read the canonical session id from
60
+ // REQUEST headers at their own onRequest time — they can't rely on
61
+ // ctx.meta._sessionId being set, because this writer's onRequest is the
62
+ // thing that populates it (and runs at order 600, after most extensions).
63
+ // thinking-block-sanitize v2 (order 550) uses this for the same reason.
64
+ export function resolveSessionId(headers) {
60
65
  if (!headers) return null;
61
66
  const sid =
62
67
  headers["x-claude-code-session-id"] ||
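The comment on `resolveSessionId` describes an ordering constraint. An extension at order 550 runs before the session writer at order 600, so `ctx.meta._sessionId` is still unset when that extension executes. A toy pipeline illustrates the constraint. The `ctx` shape, the extension objects and `runPipeline` are assumptions for illustration. `resolveSessionIdSketch` reads only the `x-claude-code-session-id` header, because the real function's remaining fallbacks are elided in this diff:

```javascript
// Hypothetical partial resolver: only the header name shown in the diff.
function resolveSessionIdSketch(headers) {
  if (!headers) return null;
  return headers["x-claude-code-session-id"] || null;
}

// Toy extensions; they run in ascending `order` and share one ctx.
const extensions = [
  {
    name: "session-writer",
    order: 600,
    onRequest(ctx) {
      ctx.meta._sessionId = resolveSessionIdSketch(ctx.headers);
    },
  },
  {
    name: "sanitize-v2",
    order: 550,
    onRequest(ctx) {
      ctx.meta.seenFromMeta = ctx.meta._sessionId; // writer has not run yet
      ctx.meta.seenFromHeaders = resolveSessionIdSketch(ctx.headers); // available now
    },
  },
];

function runPipeline(headers) {
  const ctx = { headers, meta: {} };
  for (const ext of [...extensions].sort((x, y) => x.order - y.order)) {
    ext.onRequest(ctx);
  }
  return ctx.meta;
}

const meta = runPipeline({ "x-claude-code-session-id": "abc123" });
console.log(meta.seenFromMeta, meta.seenFromHeaders); // undefined abc123
```

Reading from the request headers gives the order-550 extension the same id the writer will later record, with no dependency on execution order.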
@@ -233,9 +238,16 @@ export default {
233
238
  // 590, stashes these before this writer runs). Optional — absent if
234
239
  // that extension is disabled or produced nothing this request.
235
240
  ...(ctx.meta._sessionHealth || {}),
236
- // Additive thinking-block-sanitize drop count (order 550, opt-in).
237
- // Optional absent unless CACHE_FIX_THINKING_SANITIZE=on.
241
+ // Additive thinking-block-sanitize drop count (order 550). On by
242
+ // default since v4.0.0; present (possibly with thinking_blocks_dropped:0)
243
+ // whenever sanitize ran. Absent when CACHE_FIX_THINKING_SANITIZE=off
244
+ // or when the extension returned early before reaching the planner
245
+ // (e.g., body.messages not an array).
238
246
  ...(ctx.meta._thinkingSanitize || {}),
247
+ // Additive thinking-block-sanitize v2 fields (order 550, opt-in via
248
+ // CACHE_FIX_THINKING_SANITIZE=v2). Optional — absent unless v2 is
249
+ // enabled. Keys: thinking_blocks_dropped_v2 / tools_hash_baseline.
250
+ ...(ctx.meta._thinkingSanitizeV2 || {}),
239
251
  // Additive auto-1m-guard annotation (order 520). Optional — absent
240
252
  // unless the outbound request carried context-1m-2025-08-07 and the
241
253
  // mode wasn't off. Keys: auto_1m_detected / auto_1m_action /
@@ -0,0 +1,60 @@
1
+ // signature-surface-hash — hash helper for thinking-block-sanitize v2.
2
+ //
3
+ // Computes a deterministic 16-hex-char fingerprint of the inputs that
4
+ // participate in the API's thinking-block signature: the tools surface,
5
+ // and (forward-compat) optionally the system block or anthropic-beta
6
+ // header value.
7
+ //
8
+ // v2 only passes `{ tools }`. The signature is left forward-compatible so
9
+ // a future v3 directive can extend coverage without renaming this helper.
10
+ //
11
+ // Canonicalization rules (per directive proxy-thinking-block-sanitize-v2.md):
12
+ // - Each tool object: recursive stable JSON stringify with recursive key
13
+ // sorting at every nesting level. Nested JSON-schema objects (in
14
+ // input_schema, parameters, etc.) have their own keys, which also
15
+ // sort stably.
16
+ // - Preserve tools[] array order. Reordering tools changes which slot
17
+ // which tool occupies in the API's view; the hash MUST reflect that.
18
+ // (Note: sort-stabilization at order 200 currently locks the array
19
+ // order before v2 fires, so this rule is forward-compatibility against
20
+ // any future change in upstream ordering.)
21
+ // - Sentinel for empty/absent: if tools is undefined, null, or [], the
22
+ // hash input is the literal string "none". Rules out collision with
23
+ // other empty-shaped inputs in a future extension.
24
+ //
25
+ // Output: sha256(canonical_input).slice(0, 16) — 16 hex chars matches the
26
+ // existing _sessionHealth / _thinkingSanitize precedent for in-JSON identifiers.
27
+
28
+ import { createHash } from "node:crypto";
29
+
30
+ // Recursive stable stringify: object keys sort, arrays preserve order,
31
+ // primitives go through JSON.stringify as-is. Handles nested objects and
32
+ // arrays to arbitrary depth.
33
+ export function canonicalStringify(value) {
34
+ if (value === null || typeof value !== "object") {
35
+ return JSON.stringify(value);
36
+ }
37
+ if (Array.isArray(value)) {
38
+ return "[" + value.map(canonicalStringify).join(",") + "]";
39
+ }
40
+ const keys = Object.keys(value).sort();
41
+ const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
42
+ return "{" + parts.join(",") + "}";
43
+ }
44
+
45
+ // Compute the signature-surface hash. v2 passes only { tools }; system and
46
+ // anthropic_beta are reserved for future versions.
47
+ export function computeSignatureSurfaceHash({ tools, system, anthropic_beta } = {}) {
48
+ // Empty/absent tools → "none" sentinel (not the canonical-stringify of [],
49
+ // which would be "[]" and could collide with other empty-shaped inputs).
50
+ const toolsPart =
51
+ tools == null || (Array.isArray(tools) && tools.length === 0)
52
+ ? "none"
53
+ : canonicalStringify(tools);
54
+ // Reserved inputs — passed by future versions; v2 always omits them, so
55
+ // they contribute nothing to the hash today. Kept in the signature so
56
+ // existing call sites don't need to change when v3 adds them.
57
+ void system;
58
+ void anthropic_beta;
59
+ return createHash("sha256").update(toolsPart).digest("hex").slice(0, 16);
60
+ }