npm - claude-code-cache-fix - Versions diffs - 3.9.0 → 4.1.0 - Mend

claude-code-cache-fix 3.9.0 → 4.1.0


Files changed (17)

package/README.md +128 -13
package/bin/claude-via-proxy.mjs +1 -0
package/bin/install-service.mjs +38 -12
package/package.json +1 -1
package/proxy/extensions/cache-telemetry.mjs +15 -3
package/proxy/extensions/signature-surface-hash.mjs +60 -0
package/proxy/extensions/thinking-block-sanitize.mjs +233 -19
package/proxy/extensions/usage-log.mjs +46 -1
package/proxy/extensions.json +18 -80
package/proxy/helpers.mjs +30 -0
package/proxy/pipeline.mjs +22 -1
package/proxy/server.mjs +136 -13
package/proxy/upstream.mjs +15 -1
package/templates/cache-fix-proxy.service.template +3 -0
package/templates/com.cnighswonger.cache-fix-proxy.plist.template +3 -0
package/tools/MANUAL-COMPACT.md +16 -1
package/tools/cache_analysis.py +229 -0

package/README.md CHANGED

@@ -6,13 +6,13 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
 Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
-> **v3.0.3** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
+> **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
 > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
 ## Quick Start: Proxy (recommended)
-The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
+The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
 ```bash
 # Install
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
 ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
 ```
-That's it. The proxy applies all 7 cache-fix extensions automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
+That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
 ### What the proxy does
-On every `/v1/messages` request, 9 extensions run in order (one opt-in):
+On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image, microcompact, breakpoint, bootstrap-channel, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
 | Extension | What it fixes |
 |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
 | `cache-control-normalize` | Normalizes cache_control markers across messages |
 | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
 | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
-| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
+| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
-Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
+Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
 **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
   ghcr.io/cnighswonger/claude-code-cache-fix:latest
 ```
-Image tags: `latest`, `3`, `3.2`, `3.2.1` (semver-ladder, so `3` always points to the newest 3.x). `latest` always tracks the newest tagged release.
+Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
 **Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
 | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
 | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
 | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
+| `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
 ### Corporate environments (proxies, custom CAs)
@@ -210,11 +211,78 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
+## Upgrading from v3.x
+**Behavior changes in v4.0.0:**
+- **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
+- **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
+### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
+v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
+- Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
+- Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
+The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
+Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
+### Flow 1 — code-only npm upgrade (recommended default)
+Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
+**Linux (systemd user unit):**
+```
+npm install -g claude-code-cache-fix@4
+systemctl --user restart cache-fix-proxy
+```
+No `daemon-reload` required — the unit file content is unchanged.
+**macOS (launchd user agent):**
+```
+npm install -g claude-code-cache-fix@4
+launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`kickstart` re-execs the agent under the existing plist.
+### Flow 2 — opt back into hot-reload at the supervisor layer
+Run if you actively use hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without restart). This rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
+**Linux (systemd user unit):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+systemctl --user daemon-reload
+systemctl --user restart cache-fix-proxy
+```
+`daemon-reload` is required because the unit file content changed.
+**macOS (launchd user agent):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
+launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
+**Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
 ## What this proxy defends against
 **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
-**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 added explicit handling for this path. v3.7.1 extends it to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body).
+**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
 Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
@@ -333,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
 **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
-**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
+**Supply chain:** Proxy mode: small focused extension modules in `proxy/extensions/` (most under a few hundred lines; the pipeline is composable, you can read any single one in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
 **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
@@ -353,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
 ## How it works
-**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
+**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
 **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
@@ -765,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
 | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
 | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
-## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
+## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
 The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
 The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
-**Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
+**On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
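The turn-selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `thinking-block-sanitize.mjs` (that file's diff is not reproduced here): the function names `isOmittedThinking`, `isActiveToolContinuation`, and `dropOmittedThinking` are hypothetical, the sketch matches a `tool_result` to its `tool_use` by `tool_use_id`, and it does not address what happens when a message's content becomes empty after the drop.

```javascript
// Hypothetical sketch of the v1 turn-selection rule; names are illustrative only.

// An "omitted" thinking block has empty text but still carries its signature.
function isOmittedThinking(block) {
  return (
    block !== null &&
    typeof block === "object" &&
    block.type === "thinking" &&
    block.thinking === "" &&
    typeof block.signature === "string"
  );
}

// True when message i ends in a tool_use that the next message answers
// with a matching tool_result (an active tool-continuation).
function isActiveToolContinuation(messages, i) {
  const content = messages[i].content;
  if (!Array.isArray(content) || content.length === 0) return false;
  const last = content[content.length - 1];
  if (!last || last.type !== "tool_use") return false;
  const next = messages[i + 1];
  return (
    next !== undefined &&
    Array.isArray(next.content) &&
    next.content.some(
      (b) => b && b.type === "tool_result" && b.tool_use_id === last.id,
    )
  );
}

// Drops omitted thinking from every assistant turn, except the latest
// assistant turn when it is an active tool-continuation.
function dropOmittedThinking(messages) {
  let latestAssistant = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") {
      latestAssistant = i;
      break;
    }
  }
  let dropped = 0;
  const out = messages.map((msg, i) => {
    if (msg.role !== "assistant" || !Array.isArray(msg.content)) return msg;
    if (i === latestAssistant && isActiveToolContinuation(messages, i)) {
      return msg; // signed thinking must stay intact here
    }
    const kept = msg.content.filter((b) => !isOmittedThinking(b));
    dropped += msg.content.length - kept.length;
    return kept.length === msg.content.length ? msg : { ...msg, content: kept };
  });
  return { messages: out, dropped };
}
```

The returned `dropped` count corresponds in spirit to the `thinking_blocks_dropped` telemetry field: a count only, never block content.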
 | Env var | Default | Purpose |
 |---------|---------|---------|
-| `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
+| `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat — same as unset). |
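As a reading aid for the table above, the value-to-mode mapping can be written out as a small function. This is a hypothetical sketch: `resolveSanitizeMode` is an invented name and the package's real `modeFromEnv()` may differ. Two behaviors here are assumptions not stated in the README: matching is exact and lowercase, and unrecognized values fall back to the v1 default.

```javascript
// Hypothetical resolver mirroring the env-var table; not the package's modeFromEnv().
function resolveSanitizeMode(env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  if (raw === "off") return "off"; // explicit disable (v3.x default behavior)
  if (raw === "v2") return "v2"; // v1 drop plus the tools-hash-mismatch drop
  // Unset and "on" both select v1.
  // ASSUMPTION: any other value also falls back to v1.
  return "v1";
}
```

Calling `resolveSanitizeMode(process.env)` inside the request handler, rather than once at module load, matches the per-request read described in the embedder note.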
 ## System prompt rewrite (preload mode, optional)
@@ -787,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
 See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
+### `usage-log` extension and the `MeterRowSchema v:1` wire format
+The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
+| Field | Type | Source |
+|---|---|---|
+| `v` | literal `1` | constant |
+| `ts` | ISO-8601 datetime | server time at row emission |
+| `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
+| `model` | string ≤64 | `message_start.message.model` from the response stream |
+| `requested_model` | string ≤64 (optional) | request body `model` field |
+| `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
+| `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
+| `service_tier` | string ≤32 | response `usage.service_tier` |
+| `input_tokens` | int ≥0 | response usage |
+| `output_tokens` | int ≥0 | response usage |
+| `cache_creation_input_tokens` | int ≥0 | response usage |
+| `cache_read_input_tokens` | int ≥0 | response usage |
+| `ephemeral_1h_input_tokens` | int ≥0 | response usage |
+| `ephemeral_5m_input_tokens` | int ≥0 | response usage |
+| `web_search_requests` | int ≥0 | response usage |
+| `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
+| `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
+| `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
+| `qfallback_pct` | float 0–1 | unified fallback percentage |
+| `qoverage_util` | float ≥0 (optional) | overage utilization header |
+| `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
+| `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)` — never raw |
+| `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
+| `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
+| `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
+| `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
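The two derived fields in the table, `cache_hit_rate` and the `q5h_delta` / `q7d_delta` pair, follow directly from the formulas given there. A minimal sketch, assuming hypothetical helper names (`cacheHitRate`, `quotaDeltas`) and assuming a rate of `0` when the denominator is zero (the table does not say what the extension does in that case):

```javascript
// Hypothetical helpers illustrating the derived MeterRowSchema v:1 fields.

// cache_read / (input + cache_creation + cache_read), per the table.
function cacheHitRate(usage) {
  const input = usage.input_tokens ?? 0;
  const created = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  const total = input + created + read;
  // ASSUMPTION: report 0 when there are no input-side tokens at all.
  return total === 0 ? 0 : read / total;
}

// Per-call delta from the previous row; 0 on the first call after restart,
// which is modeled here as "no previous row".
function quotaDeltas(current, previous) {
  if (!previous) return { q5h_delta: 0, q7d_delta: 0 };
  return {
    q5h_delta: current.q5h - previous.q5h,
    q7d_delta: current.q7d - previous.q7d,
  };
}
```

For example, a response with 100 uncached input tokens, 100 cache-creation tokens, and 800 cache-read tokens has a hit rate of 800 / 1000.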
+**Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
+```bash
+# Find which CC session each usage.jsonl row belongs to:
+for row in $(jq -c . < ~/.claude/usage.jsonl); do
+  req=$(jq -r '.request_id // empty' <<< "$row")
+  [ -z "$req" ] && continue
+  grep -l "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
+done
+```
+The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
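The same join can be done in one pass by indexing transcripts first, which avoids re-scanning every transcript per row. The sketch below is illustrative only: `attributeRows` is a hypothetical name, inputs are passed in as strings so the function stays pure, and it assumes `requestId` appears as a top-level field on transcript entries (the `grep` above would also match a nested occurrence).

```javascript
// Hypothetical join of usage.jsonl rows to CC session transcripts by request id.
// usageLines: array of JSON lines from usage.jsonl
// transcripts: Map of session UUID -> full transcript text (JSONL)
function attributeRows(usageLines, transcripts) {
  // Build requestId -> session UUID index from every transcript.
  const sessionByRequest = new Map();
  for (const [sessionId, text] of transcripts) {
    for (const line of text.split("\n")) {
      if (!line.trim()) continue;
      let entry;
      try {
        entry = JSON.parse(line);
      } catch {
        continue; // skip partial or corrupt lines
      }
      // ASSUMPTION: requestId is a top-level string field.
      if (entry && typeof entry.requestId === "string") {
        sessionByRequest.set(entry.requestId, sessionId);
      }
    }
  }
  // Attribute each usage row that carries a request_id.
  const result = [];
  for (const line of usageLines) {
    if (!line.trim()) continue;
    let row;
    try {
      row = JSON.parse(line);
    } catch {
      continue;
    }
    if (!row || typeof row.request_id !== "string") continue; // field is gated
    result.push({
      request_id: row.request_id,
      session: sessionByRequest.get(row.request_id) ?? null,
    });
  }
  return result;
}
```

To run it against real files, read `~/.claude/usage.jsonl` with `readFileSync` from `node:fs`, split on newlines, and build the `Map` from each transcript's filename (minus `.jsonl`) and contents. Rows emitted while `CACHE_FIX_USAGE_LOG_REQID` was off have no `request_id` and are skipped.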
 ## Limitations
 - **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
@@ -827,6 +941,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
 - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
 - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
 - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
+- **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
 - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.

package/bin/claude-via-proxy.mjs CHANGED

@@ -55,6 +55,7 @@ async function dispatch() {
         "  CACHE_FIX_PROXY_PORT     Port for the proxy server\n" +
         "  CACHE_FIX_PROXY_UPSTREAM Upstream URL\n" +
         "  CACHE_FIX_DEBUG=1        Verbose proxy logging\n" +
+        "  CACHE_FIX_HOT_RELOAD=on  Enable in-process extension hot-reload (off by default; see #196)\n" +
         "  CACHE_FIX_CLAUDE_CMD     Override the `claude` command for the wrapper\n",
     );
     return 0;

package/bin/install-service.mjs CHANGED

@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
 import { fileURLToPath } from "node:url";
 import { dirname, resolve, join } from "node:path";
 import { homedir, platform } from "node:os";
+import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
 const __dirname = dirname(fileURLToPath(import.meta.url));
 const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
@@ -22,7 +23,14 @@ function getDefaults() {
   return {
     port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
     upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
+    caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
+    rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
     debug: process.env.CACHE_FIX_DEBUG || "",
+    // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
+    // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
+    // generated unit/plist via `CACHE_FIX_HOT_RELOAD=on cache-fix-proxy
+    // install-service`. Strict "on" match — anything else renders nothing.
+    hotReload: process.env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "",
     workingDir: resolve(__dirname, ".."),
   };
 }
@@ -88,10 +96,19 @@ function getPaths(plat = platform()) {
 function renderSystemdTemplate(template, vars) {
   const upstreamLine = vars.upstream
-    ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
+    ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
+    : "";
+  const caFileLine = vars.caFile
+    ? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
+    : "";
+  const rejectUnauthorizedLine = vars.rejectUnauthorized
+    ? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
     : "";
   const debugLine = vars.debug
-    ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
+    ? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
+    : "";
+  const hotReloadLine = vars.hotReload
+    ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
     : "";
   // Allow callers to wire a Requires= line (e.g. another service the proxy
   // chains to). Empty string by default so the unit has no extra deps.
@@ -103,7 +120,10 @@ function renderSystemdTemplate(template, vars) {
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
+    .replaceAll("{{CA_FILE_LINE}}", caFileLine)
+    .replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
     .replaceAll("{{DEBUG_LINE}}", debugLine)
+    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
     .replaceAll("{{REQUIRES_LINE}}", requiresLine)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     // Collapse triple newlines from empty optional lines down to single blank
@@ -112,17 +132,29 @@ function renderSystemdTemplate(template, vars) {
 function renderLaunchdTemplate(template, vars) {
   const upstreamPlist = vars.upstream
-    ? `        <key>CACHE_FIX_PROXY_UPSTREAM</key>\n        <string>${vars.upstream}</string>`
+    ? `        <key>CACHE_FIX_PROXY_UPSTREAM</key>\n        <string>${xmlEscape(vars.upstream)}</string>`
+    : "";
+  const caFilePlist = vars.caFile
+    ? `        <key>CACHE_FIX_PROXY_CA_FILE</key>\n        <string>${xmlEscape(vars.caFile)}</string>`
+    : "";
+  const rejectUnauthorizedPlist = vars.rejectUnauthorized
+    ? `        <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n        <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
     : "";
   const debugPlist = vars.debug
-    ? `        <key>CACHE_FIX_DEBUG</key>\n        <string>${vars.debug}</string>`
+    ? `        <key>CACHE_FIX_DEBUG</key>\n        <string>${xmlEscape(vars.debug)}</string>`
+    : "";
+  const hotReloadPlist = vars.hotReload
+    ? `        <key>CACHE_FIX_HOT_RELOAD</key>\n        <string>${vars.hotReload}</string>`
     : "";
   return template
     .replaceAll("{{NODE}}", vars.node)
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
+    .replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
+    .replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
     .replaceAll("{{DEBUG_PLIST}}", debugPlist)
+    .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     .replaceAll("{{LOG_DIR}}", vars.logDir)
     .replace(/\n\n+/g, "\n");
@@ -173,11 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
   const rendered = renderSystemdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    workingDir: defaults.workingDir,
     requires: "",
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
@@ -272,11 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
   const rendered = renderLaunchdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    workingDir: defaults.workingDir,
     logDir: paths.logDir,
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
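
Note on the two changes above: optional settings render to an empty string when unset, and the installers now spread `defaults` last into the vars object. The sketch below illustrates both behaviours under stated assumptions. `renderUnit`, the template text, `EXAMPLE_PORT`, and the sample values are hypothetical; escaping is left out because the implementations of `systemdEscape` and `xmlEscape` are not shown in this diff; and the newline-collapse regex is an assumption based on the truncated comment.

```javascript
// Illustrative sketch only: renderUnit, the template text, EXAMPLE_PORT and
// the sample values are hypothetical, not taken from the package.
function renderUnit(template, vars) {
  // An unset option renders as an empty string, leaving a blank line behind.
  const debugLine = vars.debug ? `Environment=CACHE_FIX_DEBUG=${vars.debug}` : "";
  const hotReloadLine = vars.hotReload
    ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
    : "";
  return template
    .replaceAll("{{PORT}}", String(vars.port))
    .replaceAll("{{DEBUG_LINE}}", debugLine)
    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
    // Assumed collapse rule: runs of 3+ newlines become one blank line.
    .replace(/\n{3,}/g, "\n\n");
}

const template = [
  "[Service]",
  "Environment=EXAMPLE_PORT={{PORT}}",
  "{{DEBUG_LINE}}",
  "{{HOT_RELOAD_LINE}}",
  "",
  "[Install]",
].join("\n");

// Later keys win in an object literal, so spreading defaults last lets it
// override the explicit `requires: ""` placed before it.
const defaults = { port: 8080, hotReload: "on", requires: "example.service" };
const vars = { requires: "", ...defaults };

console.log(vars.requires);
console.log(renderUnit(template, vars));
```

One consequence of the `...defaults` placement is that any key present in `defaults` takes precedence over the explicit entries written before it, so a caller-supplied `requires` (or `node`, `serverPath`) would replace the hard-coded value.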

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "3.9.0",
+  "version": "4.1.0",
   "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
   "type": "module",
   "exports": {

package/proxy/extensions/cache-telemetry.mjs CHANGED

@@ -56,7 +56,12 @@ export function sessionFilePath(rawId) {
   return join(paths().sessionsDir, `${sessionFilename(rawId)}.json`);
 }
-function resolveSessionId(headers) {
+// Exported so sibling extensions can read the canonical session id from
+// REQUEST headers at their own onRequest time — they can't rely on
+// ctx.meta._sessionId being set, because this writer's onRequest is the
+// thing that populates it (and runs at order 600, after most extensions).
+// thinking-block-sanitize v2 (order 550) uses this for the same reason.
+export function resolveSessionId(headers) {
   if (!headers) return null;
   const sid =
     headers["x-claude-code-session-id"] ||
@@ -233,9 +238,16 @@ export default {
           // 590, stashes these before this writer runs). Optional — absent if
           // that extension is disabled or produced nothing this request.
           ...(ctx.meta._sessionHealth || {}),
-          // Additive thinking-block-sanitize drop count (order 550, opt-in).
-          // Optional — absent unless CACHE_FIX_THINKING_SANITIZE=on.
+          // Additive thinking-block-sanitize drop count (order 550). On by
+          // default since v4.0.0; present (possibly with thinking_blocks_dropped:0)
+          // whenever sanitize ran. Absent when CACHE_FIX_THINKING_SANITIZE=off
+          // or when the extension returned early before reaching the planner
+          // (e.g., body.messages not an array).
           ...(ctx.meta._thinkingSanitize || {}),
+          // Additive thinking-block-sanitize v2 fields (order 550, opt-in via
+          // CACHE_FIX_THINKING_SANITIZE=v2). Optional — absent unless v2 is
+          // enabled. Keys: thinking_blocks_dropped_v2 / tools_hash_baseline.
+          ...(ctx.meta._thinkingSanitizeV2 || {}),
           // Additive auto-1m-guard annotation (order 520). Optional — absent
           // unless the outbound request carried context-1m-2025-08-07 and the
           // mode wasn't off. Keys: auto_1m_detected / auto_1m_action /
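
Note on the additive fields in this hunk: each extension stashes an optional object on `ctx.meta`, and the writer merges it with `...(value || {})`, so a disabled extension contributes no keys at all. A minimal sketch of that pattern follows. `buildRecord` and the `ts` field are hypothetical; only the `ctx.meta` key names and the `thinking_blocks_dropped` / `thinking_blocks_dropped_v2` field names come from the diff.

```javascript
// Illustrative sketch only: buildRecord and `ts` are hypothetical. The
// ctx.meta keys mirror the ones named in the diff above.
function buildRecord(ctx) {
  return {
    ts: ctx.ts,
    // Each spread adds keys only when the matching extension stashed data.
    ...(ctx.meta._sessionHealth || {}),
    ...(ctx.meta._thinkingSanitize || {}),
    ...(ctx.meta._thinkingSanitizeV2 || {}),
  };
}

// v1 sanitize ran and dropped nothing; v2 is not enabled.
const record = buildRecord({
  ts: 0,
  meta: { _thinkingSanitize: { thinking_blocks_dropped: 0 } },
});

console.log(record);
console.log("thinking_blocks_dropped_v2" in record);
```

A zero count and an absent key mean different things to a log consumer: the first says the extension ran and dropped nothing, the second says it did not run.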

package/proxy/extensions/signature-surface-hash.mjs ADDED

@@ -0,0 +1,60 @@
+// signature-surface-hash — hash helper for thinking-block-sanitize v2.
+//
+// Computes a deterministic 16-hex-char fingerprint of the inputs that
+// participate in the API's thinking-block signature: the tools surface,
+// and (forward-compat) optionally the system block or anthropic-beta
+// header value.
+//
+// v2 only passes `{ tools }`. The function signature is left forward-compatible so
+// a future v3 directive can extend coverage without renaming this helper.
+//
+// Canonicalization rules (per directive proxy-thinking-block-sanitize-v2.md):
+//   - Each tool object: recursive stable JSON stringify with recursive key
+//     sorting at every nesting level. Nested JSON-schema objects (in
+//     input_schema, parameters, etc.) have their own keys, which also
+//     sort stably.
+//   - Preserve tools[] array order. Reordering tools changes which slot
+//     each tool occupies in the API's view; the hash MUST reflect that.
+//     (Note: sort-stabilization at order 200 currently locks the array
+//     order before v2 fires, so this rule is forward-compatibility against
+//     any future change in upstream ordering.)
+//   - Sentinel for empty/absent: if tools is undefined, null, or [], the
+//     hash input is the literal string "none". Rules out collision with
+//     other empty-shaped inputs in a future extension.
+//
+// Output: sha256(canonical_input).slice(0, 16) — 16 hex chars matches the
+// existing _sessionHealth / _thinkingSanitize precedent for in-JSON identifiers.
+import { createHash } from "node:crypto";
+// Recursive stable stringify: object keys sort, arrays preserve order,
+// primitives go through JSON.stringify as-is. Handles nested objects and
+// arrays to arbitrary depth.
+export function canonicalStringify(value) {
+  if (value === null || typeof value !== "object") {
+    return JSON.stringify(value);
+  }
+  if (Array.isArray(value)) {
+    return "[" + value.map(canonicalStringify).join(",") + "]";
+  }
+  const keys = Object.keys(value).sort();
+  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
+  return "{" + parts.join(",") + "}";
+}
+// Compute the signature-surface hash. v2 passes only { tools }; system and
+// anthropic_beta are reserved for future versions.
+export function computeSignatureSurfaceHash({ tools, system, anthropic_beta } = {}) {
+  // Empty/absent tools → "none" sentinel (not the canonical-stringify of [],
+  // which would be "[]" and could collide with other empty-shaped inputs).
+  const toolsPart =
+    tools == null || (Array.isArray(tools) && tools.length === 0)
+      ? "none"
+      : canonicalStringify(tools);
+  // Reserved inputs — passed by future versions; v2 always omits them, so
+  // they contribute nothing to the hash today. Kept in the signature so
+  // existing call sites don't need to change when v3 adds them.
+  void system;
+  void anthropic_beta;
+  return createHash("sha256").update(toolsPart).digest("hex").slice(0, 16);
+}
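
Note on the canonicalization rules in this file: the sketch below exercises them on made-up tool objects. `canonicalStringify` is copied from the added file with the `export` keyword removed. `surfaceInput` is a hypothetical helper that returns the string the real function feeds to SHA-256, so the example needs no crypto import. The tool names and schemas are invented for illustration.

```javascript
// canonicalStringify is copied from the diff above (minus `export`).
function canonicalStringify(value) {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalStringify).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
  return "{" + parts.join(",") + "}";
}

// Hypothetical helper: the pre-hash input, including the "none" sentinel.
function surfaceInput(tools) {
  return tools == null || (Array.isArray(tools) && tools.length === 0)
    ? "none"
    : canonicalStringify(tools);
}

// Made-up tools; key order differs between a1 and a2, including nested keys.
const a1 = { name: "read", input_schema: { type: "object", required: ["path"] } };
const a2 = { input_schema: { required: ["path"], type: "object" }, name: "read" };
const b = { name: "write" };

console.log(surfaceInput([a1]) === surfaceInput([a2]));       // key order is ignored
console.log(surfaceInput([a1, b]) === surfaceInput([b, a1])); // array order is not
console.log(surfaceInput([]), surfaceInput(undefined));
```

Because the hash is a pure function of this string, two requests whose tool definitions differ only in object key order share a fingerprint, while swapping two tools in the array produces a different one.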