claude-code-cache-fix 3.9.0 → 4.1.0
- package/README.md +128 -13
- package/bin/claude-via-proxy.mjs +1 -0
- package/bin/install-service.mjs +38 -12
- package/package.json +1 -1
- package/proxy/extensions/cache-telemetry.mjs +15 -3
- package/proxy/extensions/signature-surface-hash.mjs +60 -0
- package/proxy/extensions/thinking-block-sanitize.mjs +233 -19
- package/proxy/extensions/usage-log.mjs +46 -1
- package/proxy/extensions.json +18 -80
- package/proxy/helpers.mjs +30 -0
- package/proxy/pipeline.mjs +22 -1
- package/proxy/server.mjs +136 -13
- package/proxy/upstream.mjs +15 -1
- package/templates/cache-fix-proxy.service.template +3 -0
- package/templates/com.cnighswonger.cache-fix-proxy.plist.template +3 -0
- package/tools/MANUAL-COMPACT.md +16 -1
- package/tools/cache_analysis.py +229 -0
package/README.md
CHANGED
|
@@ -6,13 +6,13 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
|
|
|
6
6
|
|
|
7
7
|
Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
|
|
8
8
|
|
|
9
|
-
> **
|
|
9
|
+
> **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
|
|
10
10
|
|
|
11
11
|
> **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
|
|
12
12
|
|
|
13
13
|
## Quick Start: Proxy (recommended)
|
|
14
14
|
|
|
15
|
-
The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as
|
|
15
|
+
The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
|
|
16
16
|
|
|
17
17
|
```bash
|
|
18
18
|
# Install
|
|
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
|
|
|
25
25
|
ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
|
|
26
26
|
```
|
|
27
27
|
|
|
28
|
-
That's it. The proxy applies
|
|
28
|
+
That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
|
|
29
29
|
|
|
30
30
|
### What the proxy does
|
|
31
31
|
|
|
32
|
-
On every `/v1/messages` request,
|
|
32
|
+
On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image, microcompact, breakpoint, bootstrap-channel, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
|
|
33
33
|
|
|
34
34
|
| Extension | What it fixes |
|
|
35
35
|
|-----------|--------------|
|
|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
|
|
|
41
41
|
| `cache-control-normalize` | Normalizes cache_control markers across messages |
|
|
42
42
|
| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
|
|
43
43
|
| `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
|
|
44
|
-
| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **
|
|
44
|
+
| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
|
|
45
45
|
|
|
46
|
-
Extensions
|
|
46
|
+
Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
|
|
47
47
|
|
|
48
48
|
**Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
|
|
49
49
|
|
|
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
|
|
|
116
116
|
ghcr.io/cnighswonger/claude-code-cache-fix:latest
|
|
117
117
|
```
|
|
118
118
|
|
|
119
|
-
Image tags: `latest`, `
|
|
119
|
+
Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
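The tag ladder can be illustrated with a small sketch. `semverLadderTags` is a hypothetical helper, not something the package or its release workflow is known to contain. It assumes a plain `MAJOR.MINOR.PATCH` version string and that the release being tagged is the newest one, since otherwise `latest` and the shorter tags would not move.

```javascript
// Hypothetical helper (not part of the package): derive the semver-ladder
// image tags for one release version string such as "4.0.0".
function semverLadderTags(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  // Assumption: this release is the newest, so every rung of the ladder moves.
  return ["latest", major, `${major}.${minor}`, `${major}.${minor}.${patch}`];
}

console.log(semverLadderTags("4.0.0"));
```

Pinning to `4.0.0` gives a fixed image, while pinning to `4` follows every later 4.x release.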
|
|
120
120
|
|
|
121
121
|
**Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
|
|
122
122
|
|
|
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
|
|
|
147
147
|
| `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
|
|
148
148
|
| `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
|
|
149
149
|
| `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
|
|
150
|
+
| `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
|
|
150
151
|
|
|
151
152
|
### Corporate environments (proxies, custom CAs)
|
|
152
153
|
|
|
@@ -210,11 +211,78 @@ Options (all optional; all fall back to the same env vars used by the CLI):
|
|
|
210
211
|
|
|
211
212
|
*The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
|
|
212
213
|
|
|
214
|
+
## Upgrading from v3.x
|
|
215
|
+
|
|
216
|
+
**Behavior changes in v4.0.0:**
|
|
217
|
+
|
|
218
|
+
- **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
|
|
219
|
+
- **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
|
|
220
|
+
|
|
221
|
+
### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
|
|
222
|
+
|
|
223
|
+
v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
|
|
224
|
+
|
|
225
|
+
- Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
|
|
226
|
+
- Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
|
|
227
|
+
|
|
228
|
+
The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
|
|
229
|
+
|
|
230
|
+
Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
|
|
231
|
+
|
|
232
|
+
### Flow 1 — code-only npm upgrade (recommended default)
|
|
233
|
+
|
|
234
|
+
Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
|
|
235
|
+
|
|
236
|
+
**Linux (systemd user unit):**
|
|
237
|
+
|
|
238
|
+
```
npm install -g claude-code-cache-fix@4
systemctl --user restart cache-fix-proxy
```
|
|
242
|
+
|
|
243
|
+
No `daemon-reload` required — the unit file content is unchanged.
|
|
244
|
+
|
|
245
|
+
**macOS (launchd user agent):**
|
|
246
|
+
|
|
247
|
+
```
npm install -g claude-code-cache-fix@4
launchctl kickstart -k gui/$(id -u)/com.cnighswonger.cache-fix-proxy
```

`kickstart -k` kills the running instance and restarts the agent under the existing plist; without `-k`, `kickstart` leaves an already-running agent untouched.
|
|
253
|
+
|
|
254
|
+
### Flow 2 — opt back into hot-reload at the supervisor layer
|
|
255
|
+
|
|
256
|
+
Use this flow if you actively rely on hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without a restart). It rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
|
|
257
|
+
|
|
258
|
+
**Linux (systemd user unit):**
|
|
259
|
+
|
|
260
|
+
```
CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
systemctl --user daemon-reload
systemctl --user restart cache-fix-proxy
```
|
|
265
|
+
|
|
266
|
+
`daemon-reload` is required because the unit file content changed.
|
|
267
|
+
|
|
268
|
+
**macOS (launchd user agent):**
|
|
269
|
+
|
|
270
|
+
```
CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
```
|
|
276
|
+
|
|
277
|
+
`bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
|
|
278
|
+
|
|
279
|
+
**Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
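A wrapper or watchdog could act on the degraded signal described above. The following is a minimal sketch under stated assumptions: `healthNeedsRestart` is a hypothetical name, and the only facts taken from this README are the `503` status code and the `status: "degraded"` field. Any other fields in the `/health` body are not assumed.

```javascript
// Hypothetical helper: decide from a /health response whether the proxy
// is in the degraded state that only a process restart can clear.
function healthNeedsRestart(statusCode, body) {
  // Both conditions come from the README: HTTP 503 plus status "degraded".
  return statusCode === 503 && body != null && body.status === "degraded";
}

console.log(healthNeedsRestart(503, { status: "degraded" }));
console.log(healthNeedsRestart(200, { status: "ok" }));
```

A supervisor-side script could poll `/health`, feed the status code and parsed JSON body into a check like this, and trigger the restart flow for its platform when the check returns `true`.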
|
|
280
|
+
|
|
213
281
|
## What this proxy defends against
|
|
214
282
|
|
|
215
283
|
**Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
|
|
216
284
|
|
|
217
|
-
**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix
|
|
285
|
+
**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
|
|
218
286
|
|
|
219
287
|
Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
|
|
220
288
|
|
|
@@ -333,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
|
|
|
333
401
|
|
|
334
402
|
**What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
|
|
335
403
|
|
|
336
|
-
**Supply chain:** Proxy mode:
|
|
404
|
+
**Supply chain:** Proxy mode: small focused extension modules in `proxy/extensions/` (most under a few hundred lines; the pipeline is composable, you can read any single one in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
|
|
337
405
|
|
|
338
406
|
**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
|
|
339
407
|
|
|
@@ -353,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
|
|
|
353
421
|
|
|
354
422
|
## How it works
|
|
355
423
|
|
|
356
|
-
**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests.
|
|
424
|
+
**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
|
|
357
425
|
|
|
358
426
|
**Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
|
|
359
427
|
|
|
@@ -765,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
|
|
|
765
833
|
| `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
|
|
766
834
|
| `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
|
|
767
835
|
|
|
768
|
-
## Thinking-block sanitize (proxy mode,
|
|
836
|
+
## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
|
|
769
837
|
|
|
770
838
|
The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
|
|
771
839
|
|
|
772
840
|
The `thinking-block-sanitize` extension drops those omitted blocks, which the API treats as optional history, from the request before it is forwarded. The turn-selection rule, resolved empirically: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy: no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400`. For that case the only remedy is to avoid resuming and to heal or retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
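The turn-selection rule can be sketched as follows. This is an illustration of the rule as stated in the paragraph above, not the extension's actual code. The function names are hypothetical, and the sketch assumes the Messages API shape, where `tool_result` blocks reference a `tool_use` through `tool_use_id`. It does not handle an assistant turn left with no blocks after the drop.

```javascript
// Hypothetical sketch of the v1 rule; not the extension's implementation.

// An "omitted" block keeps its signature but carries empty thinking text.
function isOmittedThinking(block) {
  return block != null && block.type === "thinking" && block.thinking === "";
}

// True when the turn's last block is a tool_use answered by a tool_result
// in the immediately following user message.
function isActiveToolContinuation(messages, idx) {
  const content = messages[idx].content;
  if (!Array.isArray(content) || content.length === 0) return false;
  const last = content[content.length - 1];
  if (last.type !== "tool_use") return false;
  const next = messages[idx + 1];
  return Boolean(
    next &&
      next.role === "user" &&
      Array.isArray(next.content) &&
      next.content.some((b) => b.type === "tool_result" && b.tool_use_id === last.id)
  );
}

function sanitizeThinking(messages) {
  let latest = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") {
      latest = i;
      break;
    }
  }
  let dropped = 0;
  const out = messages.map((msg, i) => {
    if (msg.role !== "assistant" || !Array.isArray(msg.content)) return msg;
    // The one exception: leave an active tool-continuation untouched.
    if (i === latest && isActiveToolContinuation(messages, i)) return msg;
    const kept = msg.content.filter((b) => !isOmittedThinking(b));
    dropped += msg.content.length - kept.length;
    return kept.length === msg.content.length ? msg : { ...msg, content: kept };
  });
  // Counts only, never content, mirroring the documented telemetry field.
  return { messages: out, dropped };
}
```

The input array is not mutated: untouched turns are returned by reference and changed turns are shallow copies, so the transform stays deterministic for a given request body.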
|
|
773
841
|
|
|
774
|
-
**
|
|
842
|
+
**On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
|
|
775
843
|
|
|
776
844
|
| Env var | Default | Purpose |
|
|
777
845
|
|---------|---------|---------|
|
|
778
|
-
| `CACHE_FIX_THINKING_SANITIZE` | unset (
|
|
846
|
+
| `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat — same as unset). |
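The value mapping in the table can be summarized in a short sketch. `sanitizeModeFromEnv` is a hypothetical stand-in and may differ from the package's own `modeFromEnv()`. Two behaviors here are assumptions rather than documented facts: treating an empty string like unset, and falling back to `v1` for unrecognized values.

```javascript
// Hypothetical sketch of the documented CACHE_FIX_THINKING_SANITIZE mapping.
// The env object is passed in so the mode can be re-read on every request.
function sanitizeModeFromEnv(env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  // Unset and "on" both select v1 (the v4.0.0 default). Empty string is
  // treated as unset here, which is an assumption.
  if (raw === undefined || raw === "" || raw === "on") return "v1";
  if (raw === "off") return "off";
  if (raw === "v2") return "v2";
  // Assumption: unrecognized values fall back to the default.
  return "v1";
}

console.log(sanitizeModeFromEnv({}));
console.log(sanitizeModeFromEnv({ CACHE_FIX_THINKING_SANITIZE: "off" }));
```

Calling a function like this with `process.env` inside the request handler, rather than once at module load, is what lets an embedder change the mode at runtime.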
|
|
779
847
|
|
|
780
848
|
## System prompt rewrite (preload mode, optional)
|
|
781
849
|
|
|
@@ -787,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
|
|
|
787
855
|
|
|
788
856
|
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
|
|
789
857
|
|
|
858
|
+
### `usage-log` extension and the `MeterRowSchema v:1` wire format
|
|
859
|
+
|
|
860
|
+
The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
|
|
861
|
+
|
|
862
|
+
| Field | Type | Source |
|---|---|---|
| `v` | literal `1` | constant |
| `ts` | ISO-8601 datetime | server time at row emission |
| `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
| `model` | string ≤64 | `message_start.message.model` from the response stream |
| `requested_model` | string ≤64 (optional) | request body `model` field |
| `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
| `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
| `service_tier` | string ≤32 | response `usage.service_tier` |
| `input_tokens` | int ≥0 | response usage |
| `output_tokens` | int ≥0 | response usage |
| `cache_creation_input_tokens` | int ≥0 | response usage |
| `cache_read_input_tokens` | int ≥0 | response usage |
| `ephemeral_1h_input_tokens` | int ≥0 | response usage |
| `ephemeral_5m_input_tokens` | int ≥0 | response usage |
| `web_search_requests` | int ≥0 | response usage |
| `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
| `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
| `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
| `qfallback_pct` | float 0–1 | unified fallback percentage |
| `qoverage_util` | float ≥0 (optional) | overage utilization header |
| `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
| `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)`; never raw |
| `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
| `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
| `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
| `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
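The derived fields in the table (`cache_hit_rate`, `model_mismatch`, and the two deltas) follow directly from the other columns. The sketch below shows one way to compute them. `deriveRowFields` is a hypothetical name, and returning `0` for the hit rate when the denominator is zero is an assumption, since the table does not say what the extension does in that case.

```javascript
// Hypothetical sketch: compute the derived MeterRowSchema fields from a
// row's raw values and the previous row (null on the first call).
function deriveRowFields(row, prevRow) {
  const denom =
    row.input_tokens + row.cache_creation_input_tokens + row.cache_read_input_tokens;
  return {
    // Assumption: a zero denominator yields 0 rather than NaN.
    cache_hit_rate: denom > 0 ? row.cache_read_input_tokens / denom : 0,
    model_mismatch: Boolean(
      row.requested_model && row.model && row.requested_model !== row.model
    ),
    // Deltas are 0 on the first call after a restart (no previous row).
    q5h_delta: prevRow ? row.q5h - prevRow.q5h : 0,
    q7d_delta: prevRow ? row.q7d - prevRow.q7d : 0,
  };
}
```

For example, a call with 100 input tokens, 100 cache-creation tokens, and 800 cache-read tokens has a hit rate of 800 / 1000 = 0.8.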
|
|
890
|
+
|
|
891
|
+
**Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
|
|
892
|
+
|
|
893
|
+
```bash
# Find which CC session each usage.jsonl row belongs to.
# Read line by line so rows are never word-split or glob-expanded by the shell.
while IFS= read -r row; do
  req=$(jq -r '.request_id // empty' <<< "$row")
  [ -z "$req" ] && continue
  # -F treats the request id as a fixed string; -l prints the matching transcript path.
  grep -lF "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
done < <(jq -c . ~/.claude/usage.jsonl)
```
|
|
901
|
+
|
|
902
|
+
The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
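Once each `request_id` has been mapped to a session UUID, per-session totals are a simple aggregation. The sketch below assumes the rows are already parsed and the mapping is already built. `tokensBySession` and the `"unmatched"` bucket are hypothetical, and counting `input_tokens + output_tokens` is only one possible definition of "burn".

```javascript
// Hypothetical sketch: sum tokens per CC session.
// rows: parsed usage.jsonl rows.
// requestToSession: Map from request id to CC session UUID.
function tokensBySession(rows, requestToSession) {
  const totals = new Map();
  for (const row of rows) {
    // Rows emitted with CACHE_FIX_USAGE_LOG_REQID off carry no request_id.
    if (!row.request_id) continue;
    const session = requestToSession.get(row.request_id) ?? "unmatched";
    // Assumption: "burn" is measured as input plus output tokens.
    const tokens = (row.input_tokens || 0) + (row.output_tokens || 0);
    totals.set(session, (totals.get(session) || 0) + tokens);
  }
  return totals;
}
```

Sorting the resulting map by value answers the "which session burned the most tokens" question from `usage.jsonl` plus the transcripts.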
|
|
903
|
+
|
|
790
904
|
## Limitations
|
|
791
905
|
|
|
792
906
|
- **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
|
|
@@ -827,6 +941,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
|
|
|
827
941
|
- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
|
|
828
942
|
- **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
|
|
829
943
|
- **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
|
|
944
|
+
- **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
|
|
830
945
|
- **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
|
|
831
946
|
|
|
832
947
|
If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
|
package/bin/claude-via-proxy.mjs
CHANGED
|
@@ -55,6 +55,7 @@ async function dispatch() {
|
|
|
55
55
|
" CACHE_FIX_PROXY_PORT Port for the proxy server\n" +
|
|
56
56
|
" CACHE_FIX_PROXY_UPSTREAM Upstream URL\n" +
|
|
57
57
|
" CACHE_FIX_DEBUG=1 Verbose proxy logging\n" +
|
|
58
|
+
" CACHE_FIX_HOT_RELOAD=on Enable in-process extension hot-reload (off by default; see #196)\n" +
|
|
58
59
|
" CACHE_FIX_CLAUDE_CMD Override the `claude` command for the wrapper\n",
|
|
59
60
|
);
|
|
60
61
|
return 0;
|
package/bin/install-service.mjs
CHANGED
|
@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
|
|
|
13
13
|
import { fileURLToPath } from "node:url";
|
|
14
14
|
import { dirname, resolve, join } from "node:path";
|
|
15
15
|
import { homedir, platform } from "node:os";
|
|
16
|
+
import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
|
|
16
17
|
|
|
17
18
|
const __dirname = dirname(fileURLToPath(import.meta.url));
|
|
18
19
|
const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
|
|
@@ -22,7 +23,14 @@ function getDefaults() {
|
|
|
22
23
|
return {
|
|
23
24
|
port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
|
|
24
25
|
upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
|
|
26
|
+
caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
|
|
27
|
+
rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
|
|
25
28
|
debug: process.env.CACHE_FIX_DEBUG || "",
|
|
29
|
+
// Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
|
|
30
|
+
// time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
|
|
31
|
+
// generated unit/plist via `CACHE_FIX_HOT_RELOAD=on cache-fix-proxy
|
|
32
|
+
// install-service`. Strict "on" match — anything else renders nothing.
|
|
33
|
+
hotReload: process.env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "",
|
|
26
34
|
workingDir: resolve(__dirname, ".."),
|
|
27
35
|
};
|
|
28
36
|
}
|
|
@@ -88,10 +96,19 @@ function getPaths(plat = platform()) {
|
|
|
88
96
|
|
|
89
97
|
function renderSystemdTemplate(template, vars) {
|
|
90
98
|
const upstreamLine = vars.upstream
|
|
91
|
-
? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
|
|
99
|
+
? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
|
|
100
|
+
: "";
|
|
101
|
+
const caFileLine = vars.caFile
|
|
102
|
+
? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
|
|
103
|
+
: "";
|
|
104
|
+
const rejectUnauthorizedLine = vars.rejectUnauthorized
|
|
105
|
+
? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
|
|
92
106
|
: "";
|
|
93
107
|
const debugLine = vars.debug
|
|
94
|
-
? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
|
|
108
|
+
? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
|
|
109
|
+
: "";
|
|
110
|
+
const hotReloadLine = vars.hotReload
|
|
111
|
+
? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
|
|
95
112
|
: "";
|
|
96
113
|
// Allow callers to wire a Requires= line (e.g. another service the proxy
|
|
97
114
|
// chains to). Empty string by default so the unit has no extra deps.
|
|
@@ -103,7 +120,10 @@ function renderSystemdTemplate(template, vars) {
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
+    .replaceAll("{{CA_FILE_LINE}}", caFileLine)
+    .replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
     .replaceAll("{{DEBUG_LINE}}", debugLine)
+    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
     .replaceAll("{{REQUIRES_LINE}}", requiresLine)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     // Collapse triple newlines from empty optional lines down to single blank
@@ -112,17 +132,29 @@ function renderSystemdTemplate(template, vars) {
 
 function renderLaunchdTemplate(template, vars) {
   const upstreamPlist = vars.upstream
-    ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${vars.upstream}</string>`
+    ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${xmlEscape(vars.upstream)}</string>`
+    : "";
+  const caFilePlist = vars.caFile
+    ? ` <key>CACHE_FIX_PROXY_CA_FILE</key>\n <string>${xmlEscape(vars.caFile)}</string>`
+    : "";
+  const rejectUnauthorizedPlist = vars.rejectUnauthorized
+    ? ` <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
     : "";
   const debugPlist = vars.debug
-    ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${vars.debug}</string>`
+    ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${xmlEscape(vars.debug)}</string>`
+    : "";
+  const hotReloadPlist = vars.hotReload
+    ? ` <key>CACHE_FIX_HOT_RELOAD</key>\n <string>${vars.hotReload}</string>`
     : "";
   return template
     .replaceAll("{{NODE}}", vars.node)
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
+    .replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
+    .replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
     .replaceAll("{{DEBUG_PLIST}}", debugPlist)
+    .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     .replaceAll("{{LOG_DIR}}", vars.logDir)
     .replace(/\n\n+/g, "\n");
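The launchd renderer combines two ideas: values are XML-escaped before they are embedded in the plist, and placeholders for unset options render as empty strings whose leftover blank lines are then collapsed. The sketch below shows both on a tiny stand-in template. `xmlEscapeSketch`, `renderSketch`, and the template text are hypothetical; the package's own `xmlEscape` and plist template are not shown in this section and may differ.

```javascript
// Hypothetical stand-in for the package's xmlEscape helper.
// "&" is replaced first so entities produced by later steps are not re-escaped.
function xmlEscapeSketch(value) {
  return String(value)
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Tiny stand-in template; the real plist template is not reproduced here.
const template = "<dict>\n{{UPSTREAM_PLIST}}\n{{DEBUG_PLIST}}\n</dict>";

function renderSketch(vars) {
  const upstream = vars.upstream
    ? `<key>CACHE_FIX_PROXY_UPSTREAM</key>\n<string>${xmlEscapeSketch(vars.upstream)}</string>`
    : "";
  const debug = vars.debug
    ? `<key>CACHE_FIX_DEBUG</key>\n<string>${xmlEscapeSketch(vars.debug)}</string>`
    : "";
  return template
    .replaceAll("{{UPSTREAM_PLIST}}", upstream)
    .replaceAll("{{DEBUG_PLIST}}", debug)
    // Same collapse as the diff: runs of newlines left by empty
    // placeholders shrink to a single newline.
    .replace(/\n\n+/g, "\n");
}

console.log(renderSketch({ upstream: "https://example.test/?a=1&b=2" }));
```

Without the escape step, a raw `&` or `<` in a value would produce a plist that is not well-formed XML.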
@@ -173,11 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
   const rendered = renderSystemdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    workingDir: defaults.workingDir,
     requires: "",
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
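Replacing the hand-copied fields with `...defaults` means every key on the defaults object reaches the renderer without being listed by name. A small sketch of the object-spread behavior this relies on follows; all values are made up for illustration.

```javascript
// Illustrative defaults object; field values are hypothetical.
const defaults = {
  port: "9801",
  upstream: "",
  caFile: "/etc/ssl/corp-ca.pem",
  hotReload: "on",
  workingDir: "/opt/proxy",
};

// Keys written before the spread act as fallbacks; keys from the spread
// are copied in afterwards and win on any name collision.
const vars = {
  node: "/usr/bin/node",
  requires: "",
  ...defaults,
};

console.log(vars.caFile);    // "/etc/ssl/corp-ca.pem"
console.log(vars.hotReload); // "on"

// Collision example: a later spread overrides the earlier literal key.
const overridden = { requires: "", ...{ requires: "other.service" } };
console.log(overridden.requires); // "other.service"
```

One consequence of the ordering in the diff: if the defaults object ever gained a `requires` (or, for launchd, `logDir`) key, it would override the literal written just above the spread.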
@@ -272,11 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
   const rendered = renderLaunchdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    workingDir: defaults.workingDir,
     logDir: paths.logDir,
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
package/package.json
CHANGED
package/proxy/extensions/cache-telemetry.mjs
CHANGED
@@ -56,7 +56,12 @@ export function sessionFilePath(rawId) {
   return join(paths().sessionsDir, `${sessionFilename(rawId)}.json`);
 }
 
-
+// Exported so sibling extensions can read the canonical session id from
+// REQUEST headers at their own onRequest time — they can't rely on
+// ctx.meta._sessionId being set, because this writer's onRequest is the
+// thing that populates it (and runs at order 600, after most extensions).
+// thinking-block-sanitize v2 (order 550) uses this for the same reason.
+export function resolveSessionId(headers) {
   if (!headers) return null;
   const sid =
     headers["x-claude-code-session-id"] ||
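The comment in this hunk describes an ordering problem: an extension at order 550 runs before the writer at order 600 that populates `ctx.meta._sessionId`, so it has to read the id from the request headers itself. The sketch below reproduces that situation with a toy pipeline. The pipeline runner, the context shape, and `resolveSessionIdSketch` are assumptions made for illustration; the real `resolveSessionId` has further fallbacks after the first header that are not visible in this hunk.

```javascript
// Hypothetical, simplified resolver: only the one header visible in the diff.
function resolveSessionIdSketch(headers) {
  if (!headers) return null;
  return headers["x-claude-code-session-id"] || null;
}

// Toy extensions; names and the ctx shape are illustrative only.
const extensions = [
  {
    name: "writer",
    order: 600,
    onRequest(ctx) {
      ctx.meta._sessionId = resolveSessionIdSketch(ctx.headers);
    },
  },
  {
    name: "sanitize-v2",
    order: 550,
    onRequest(ctx) {
      // At order 550 the writer has not run yet, so meta is still empty.
      ctx.seen.push({
        fromMeta: ctx.meta._sessionId ?? null,
        fromHeaders: resolveSessionIdSketch(ctx.headers),
      });
    },
  },
];

// Run extensions in ascending order, lowest first.
function runPipeline(ctx) {
  const ordered = [...extensions].sort((a, b) => a.order - b.order);
  for (const ext of ordered) ext.onRequest(ctx);
  return ctx;
}

const ctx = runPipeline({
  headers: { "x-claude-code-session-id": "abc123" },
  meta: {},
  seen: [],
});
console.log(ctx.seen[0]); // { fromMeta: null, fromHeaders: 'abc123' }
```

Exporting the resolver, rather than moving the writer earlier in the pipeline, keeps the existing order values stable while still giving earlier extensions a single canonical way to derive the id.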
@@ -233,9 +238,16 @@ export default {
       // 590, stashes these before this writer runs). Optional — absent if
       // that extension is disabled or produced nothing this request.
       ...(ctx.meta._sessionHealth || {}),
-      // Additive thinking-block-sanitize drop count (order 550
-      //
+      // Additive thinking-block-sanitize drop count (order 550). On by
+      // default since v4.0.0; present (possibly with thinking_blocks_dropped:0)
+      // whenever sanitize ran. Absent when CACHE_FIX_THINKING_SANITIZE=off
+      // or when the extension returned early before reaching the planner
+      // (e.g., body.messages not an array).
       ...(ctx.meta._thinkingSanitize || {}),
+      // Additive thinking-block-sanitize v2 fields (order 550, opt-in via
+      // CACHE_FIX_THINKING_SANITIZE=v2). Optional — absent unless v2 is
+      // enabled. Keys: thinking_blocks_dropped_v2 / tools_hash_baseline.
+      ...(ctx.meta._thinkingSanitizeV2 || {}),
       // Additive auto-1m-guard annotation (order 520). Optional — absent
       // unless the outbound request carried context-1m-2025-08-07 and the
       // mode wasn't off. Keys: auto_1m_detected / auto_1m_action /
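The record built in this hunk uses the `...(maybeObject || {})` pattern so that each extension's fields appear only when that extension actually stashed something. A minimal sketch of the resulting behavior follows; `buildRecordSketch` and the `model` field are hypothetical, while the `_thinkingSanitize*` names and their keys come from the comments above.

```javascript
// Hypothetical record builder using the same optional-spread pattern.
function buildRecordSketch(meta) {
  return {
    model: "example-model", // placeholder base field
    ...(meta._sessionHealth || {}),
    ...(meta._thinkingSanitize || {}),
    ...(meta._thinkingSanitizeV2 || {}),
  };
}

// Sanitize v1 ran and dropped nothing; v2 is not enabled.
const record = buildRecordSketch({
  _thinkingSanitize: { thinking_blocks_dropped: 0 },
});
console.log(record);
// { model: 'example-model', thinking_blocks_dropped: 0 }

// No extension stashed anything: only the base field remains.
console.log(Object.keys(buildRecordSketch({}))); // [ 'model' ]
```

A consumer of the log can therefore tell "sanitize ran and dropped zero blocks" (key present with value 0) apart from "sanitize did not run" (key absent).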
package/proxy/extensions/signature-surface-hash.mjs
ADDED
@@ -0,0 +1,60 @@
+// signature-surface-hash — hash helper for thinking-block-sanitize v2.
+//
+// Computes a deterministic 16-hex-char fingerprint of the inputs that
+// participate in the API's thinking-block signature: the tools surface,
+// and (forward-compat) optionally the system block or anthropic-beta
+// header value.
+//
+// v2 only passes `{ tools }`. The signature is left forward-compatible so
+// a future v3 directive can extend coverage without renaming this helper.
+//
+// Canonicalization rules (per directive proxy-thinking-block-sanitize-v2.md):
+// - Each tool object: recursive stable JSON stringify with recursive key
+//   sorting at every nesting level. Nested JSON-schema objects (in
+//   input_schema, parameters, etc.) have their own keys, which also
+//   sort stably.
+// - Preserve tools[] array order. Reordering tools changes which slot
+//   which tool occupies in the API's view; the hash MUST reflect that.
+//   (Note: sort-stabilization at order 200 currently locks the array
+//   order before v2 fires, so this rule is forward-compatibility against
+//   any future change in upstream ordering.)
+// - Sentinel for empty/absent: if tools is undefined, null, or [], the
+//   hash input is the literal string "none". Rules out collision with
+//   other empty-shaped inputs in a future extension.
+//
+// Output: sha256(canonical_input).slice(0, 16) — 16 hex chars matches the
+// existing _sessionHealth / _thinkingSanitize precedent for in-JSON identifiers.
+
+import { createHash } from "node:crypto";
+
+// Recursive stable stringify: object keys sort, arrays preserve order,
+// primitives go through JSON.stringify as-is. Handles nested objects and
+// arrays to arbitrary depth.
+export function canonicalStringify(value) {
+  if (value === null || typeof value !== "object") {
+    return JSON.stringify(value);
+  }
+  if (Array.isArray(value)) {
+    return "[" + value.map(canonicalStringify).join(",") + "]";
+  }
+  const keys = Object.keys(value).sort();
+  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
+  return "{" + parts.join(",") + "}";
+}
+
+// Compute the signature-surface hash. v2 passes only { tools }; system and
+// anthropic_beta are reserved for future versions.
+export function computeSignatureSurfaceHash({ tools, system, anthropic_beta } = {}) {
+  // Empty/absent tools → "none" sentinel (not the canonical-stringify of [],
+  // which would be "[]" and could collide with other empty-shaped inputs).
+  const toolsPart =
+    tools == null || (Array.isArray(tools) && tools.length === 0)
+      ? "none"
+      : canonicalStringify(tools);
+  // Reserved inputs — passed by future versions; v2 always omits them, so
+  // they contribute nothing to the hash today. Kept in the signature so
+  // existing call sites don't need to change when v3 adds them.
+  void system;
+  void anthropic_beta;
+  return createHash("sha256").update(toolsPart).digest("hex").slice(0, 16);
+}
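The canonicalization rules can be checked without hashing, since the hash is just SHA-256 over the canonical string. The sketch below copies `canonicalStringify` from the file above so it runs on its own; `toolsHashInputSketch` is a hypothetical name for the sentinel logic, and the tool objects are invented examples.

```javascript
// Copied from the added file so this sketch is self-contained.
function canonicalStringify(value) {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalStringify).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
  return "{" + parts.join(",") + "}";
}

// Hypothetical helper: the string that would be fed to sha256.
function toolsHashInputSketch(tools) {
  return tools == null || (Array.isArray(tools) && tools.length === 0)
    ? "none"
    : canonicalStringify(tools);
}

// Same tool, different key order at two nesting levels.
const a = { name: "read", input_schema: { type: "object", required: ["path"] } };
const b = { input_schema: { required: ["path"], type: "object" }, name: "read" };
const other = { name: "write", input_schema: { type: "object" } };

console.log(canonicalStringify(a) === canonicalStringify(b)); // true
// Array order is preserved, so swapping tools changes the input.
console.log(toolsHashInputSketch([a, other]) === toolsHashInputSketch([other, a])); // false
// undefined, null, and [] all map to the same sentinel.
console.log(toolsHashInputSketch(undefined), toolsHashInputSketch(null), toolsHashInputSketch([]));
```

One behavior worth knowing: unlike `JSON.stringify`, this stringifier does not omit object properties whose value is `undefined` (it emits the text `undefined` for them). That should not arise for tool definitions parsed from a JSON request body, which cannot contain `undefined`.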