claude-code-cache-fix 4.0.0 → 4.1.0
- package/README.md +53 -7
- package/bin/install-service.mjs +25 -14
- package/package.json +1 -1
- package/proxy/extensions/usage-log.mjs +46 -1
- package/proxy/extensions.json +18 -80
- package/proxy/helpers.mjs +30 -0
- package/proxy/server.mjs +92 -11
- package/proxy/upstream.mjs +15 -1
- package/templates/cache-fix-proxy.service.template +2 -0
- package/templates/com.cnighswonger.cache-fix-proxy.plist.template +2 -0
- package/tools/cache_analysis.py +229 -0
package/README.md
CHANGED
|
@@ -6,7 +6,7 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
|
|
|
6
6
|
|
|
7
7
|
Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
|
|
8
8
|
|
|
9
|
-
> **
|
|
9
|
+
> **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
|
|
10
10
|
|
|
11
11
|
> **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
|
|
12
12
|
|
|
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
|
|
|
25
25
|
ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
|
|
26
26
|
```
|
|
27
27
|
|
|
28
|
-
That's it. The proxy applies
|
|
28
|
+
That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
|
|
29
29
|
|
|
30
30
|
### What the proxy does
|
|
31
31
|
|
|
32
|
-
On every `/v1/messages` request,
|
|
32
|
+
On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image, microcompact, breakpoint, bootstrap-channel, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
|
|
33
33
|
|
|
34
34
|
| Extension | What it fixes |
|
|
35
35
|
|-----------|--------------|
|
|
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
|
|
|
116
116
|
ghcr.io/cnighswonger/claude-code-cache-fix:latest
|
|
117
117
|
```
|
|
118
118
|
|
|
119
|
-
Image tags: `latest`, `
|
|
119
|
+
Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
|
|
120
120
|
|
|
121
121
|
**Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
|
|
122
122
|
|
|
@@ -282,7 +282,7 @@ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
|
|
|
282
282
|
|
|
283
283
|
**Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
|
|
284
284
|
|
|
285
|
-
**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix
|
|
285
|
+
**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
|
|
286
286
|
|
|
287
287
|
Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
|
|
288
288
|
|
|
@@ -401,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
|
|
|
401
401
|
|
|
402
402
|
**What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
|
|
403
403
|
|
|
404
|
-
**Supply chain:** Proxy mode:
|
|
404
|
+
**Supply chain:** Proxy mode: small, focused extension modules in `proxy/extensions/` (most under a few hundred lines; the pipeline is composable, so you can read any single one in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published and is tracked as a follow-up.
|
|
405
405
|
|
|
406
406
|
**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
|
|
407
407
|
|
|
@@ -421,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
|
|
|
421
421
|
|
|
422
422
|
## How it works
|
|
423
423
|
|
|
424
|
-
**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests.
|
|
424
|
+
**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
|
|
425
425
|
|
|
426
426
|
**Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
|
|
427
427
|
|
|
@@ -855,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
|
|
|
855
855
|
|
|
856
856
|
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
|
|
857
857
|
|
|
858
|
+
### `usage-log` extension and the `MeterRowSchema v:1` wire format
|
|
859
|
+
|
|
860
|
+
The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
|
|
861
|
+
|
|
862
|
+
| Field | Type | Source |
|
|
863
|
+
|---|---|---|
|
|
864
|
+
| `v` | literal `1` | constant |
|
|
865
|
+
| `ts` | ISO-8601 datetime | server time at row emission |
|
|
866
|
+
| `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
|
|
867
|
+
| `model` | string ≤64 | `message_start.message.model` from the response stream |
|
|
868
|
+
| `requested_model` | string ≤64 (optional) | request body `model` field |
|
|
869
|
+
| `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
|
|
870
|
+
| `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
|
|
871
|
+
| `service_tier` | string ≤32 | response `usage.service_tier` |
|
|
872
|
+
| `input_tokens` | int ≥0 | response usage |
|
|
873
|
+
| `output_tokens` | int ≥0 | response usage |
|
|
874
|
+
| `cache_creation_input_tokens` | int ≥0 | response usage |
|
|
875
|
+
| `cache_read_input_tokens` | int ≥0 | response usage |
|
|
876
|
+
| `ephemeral_1h_input_tokens` | int ≥0 | response usage |
|
|
877
|
+
| `ephemeral_5m_input_tokens` | int ≥0 | response usage |
|
|
878
|
+
| `web_search_requests` | int ≥0 | response usage |
|
|
879
|
+
| `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
|
|
880
|
+
| `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
|
|
881
|
+
| `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
|
|
882
|
+
| `qfallback_pct` | float 0–1 | unified fallback percentage |
|
|
883
|
+
| `qoverage_util` | float ≥0 (optional) | overage utilization header |
|
|
884
|
+
| `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
|
|
885
|
+
| `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)` — never raw |
|
|
886
|
+
| `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
|
|
887
|
+
| `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
|
|
888
|
+
| `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
|
|
889
|
+
| `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
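The derived columns in the table (`cache_hit_rate`, `model_mismatch`, and the `q5h_delta` / `q7d_delta` pair) can be illustrated with a small sketch. The function names `cacheHitRate`, `modelMismatch`, and `quotaDelta` are hypothetical and are not the extension's exports. The zero-denominator guard is an assumption, since the table does not say how an all-zero usage block is handled.

```javascript
// Illustrative only: hypothetical helpers mirroring the formulas in the table.

// cache_read_input_tokens / (input + cache_creation + cache_read)
function cacheHitRate(usage) {
  const input = usage.input_tokens || 0;
  const created = usage.cache_creation_input_tokens || 0;
  const read = usage.cache_read_input_tokens || 0;
  const total = input + created + read;
  // Assumption: report 0 when no input tokens were counted at all.
  return total === 0 ? 0 : read / total;
}

// True only when both model names are present and differ.
function modelMismatch(requestedModel, model) {
  return Boolean(requestedModel && model && requestedModel !== model);
}

// Per-call delta; 0 when there is no previous row (first call after restart).
function quotaDelta(current, previous) {
  if (previous === null || previous === undefined) return 0;
  return current - previous;
}

// Made-up usage numbers, chosen so the arithmetic is easy to follow.
const usage = {
  input_tokens: 1200,
  cache_creation_input_tokens: 800,
  cache_read_input_tokens: 18000,
};

console.log(cacheHitRate(usage)); // 18000 / 20000
console.log(modelMismatch("model-a", "model-b"));
console.log(quotaDelta(0.5, 0.25), quotaDelta(0.5, null));
```

A consumer that only has raw usage fields could recompute these values this way to cross-check a row, provided the shipped formulas match the table.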
|
|
890
|
+
|
|
891
|
+
**Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
|
|
892
|
+
|
|
893
|
+
```bash
|
|
894
|
+
# Find which CC session each usage.jsonl row belongs to:
|
|
895
|
+
jq -c . ~/.claude/usage.jsonl | while IFS= read -r row; do
|
|
896
|
+
req=$(jq -r '.request_id // empty' <<< "$row")
|
|
897
|
+
[ -z "$req" ] && continue
|
|
898
|
+
grep -l "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
|
|
899
|
+
done
|
|
900
|
+
```
|
|
901
|
+
|
|
902
|
+
The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
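The same join can be sketched in JavaScript over data that has already been parsed. This is a minimal illustration, not part of the package: `attributeRows` is a hypothetical name, the input shapes (an array of meter rows, and an object mapping transcript paths to their parsed lines) are assumptions, and the ids are made up.

```javascript
// Hypothetical helper: map each meter row to the CC session whose transcript
// contains the same request id. Rows without request_id are skipped.
function attributeRows(rows, transcripts) {
  const sessionByRequestId = new Map();
  for (const [path, entries] of Object.entries(transcripts)) {
    // The transcript filename (minus .jsonl) is the session UUID.
    const session = path.split("/").pop().replace(/\.jsonl$/, "");
    for (const entry of entries) {
      if (entry.requestId) sessionByRequestId.set(entry.requestId, session);
    }
  }
  return rows
    .filter((row) => typeof row.request_id === "string" && row.request_id.length > 0)
    .map((row) => ({
      request_id: row.request_id,
      session: sessionByRequestId.get(row.request_id) ?? null, // null = no transcript match
    }));
}

// Made-up sample data for illustration.
const rows = [
  { sid: "ab12cd34", request_id: "req_example_1" },
  { sid: "ab12cd34", request_id: "req_example_2" },
  { sid: "ab12cd34" }, // emitted while the gate was off
  { sid: "ab12cd34", request_id: "req_example_3" },
];
const transcripts = {
  "projects/demo/11111111-1111-1111-1111-111111111111.jsonl": [{ requestId: "req_example_1" }],
  "projects/demo/22222222-2222-2222-2222-222222222222.jsonl": [{ requestId: "req_example_2" }],
};

const attributed = attributeRows(rows, transcripts);
console.log(attributed);
```

Building the lookup map once keeps the join linear in the number of rows and transcript lines, whereas the shell loop re-scans every transcript for each row.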
|
|
903
|
+
|
|
858
904
|
## Limitations
|
|
859
905
|
|
|
860
906
|
- **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
|
package/bin/install-service.mjs
CHANGED
|
@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
|
|
|
13
13
|
import { fileURLToPath } from "node:url";
|
|
14
14
|
import { dirname, resolve, join } from "node:path";
|
|
15
15
|
import { homedir, platform } from "node:os";
|
|
16
|
+
import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
|
|
16
17
|
|
|
17
18
|
const __dirname = dirname(fileURLToPath(import.meta.url));
|
|
18
19
|
const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
|
|
@@ -22,6 +23,8 @@ function getDefaults() {
|
|
|
22
23
|
return {
|
|
23
24
|
port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
|
|
24
25
|
upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
|
|
26
|
+
caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
|
|
27
|
+
rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
|
|
25
28
|
debug: process.env.CACHE_FIX_DEBUG || "",
|
|
26
29
|
// Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
|
|
27
30
|
// time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
|
|
@@ -93,10 +96,16 @@ function getPaths(plat = platform()) {
|
|
|
93
96
|
|
|
94
97
|
function renderSystemdTemplate(template, vars) {
|
|
95
98
|
const upstreamLine = vars.upstream
|
|
96
|
-
? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
|
|
99
|
+
? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
|
|
100
|
+
: "";
|
|
101
|
+
const caFileLine = vars.caFile
|
|
102
|
+
? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
|
|
103
|
+
: "";
|
|
104
|
+
const rejectUnauthorizedLine = vars.rejectUnauthorized
|
|
105
|
+
? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
|
|
97
106
|
: "";
|
|
98
107
|
const debugLine = vars.debug
|
|
99
|
-
? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
|
|
108
|
+
? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
|
|
100
109
|
: "";
|
|
101
110
|
const hotReloadLine = vars.hotReload
|
|
102
111
|
? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
|
|
@@ -111,6 +120,8 @@ function renderSystemdTemplate(template, vars) {
|
|
|
111
120
|
.replaceAll("{{SERVER_PATH}}", vars.serverPath)
|
|
112
121
|
.replaceAll("{{PORT}}", vars.port)
|
|
113
122
|
.replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
|
|
123
|
+
.replaceAll("{{CA_FILE_LINE}}", caFileLine)
|
|
124
|
+
.replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
|
|
114
125
|
.replaceAll("{{DEBUG_LINE}}", debugLine)
|
|
115
126
|
.replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
|
|
116
127
|
.replaceAll("{{REQUIRES_LINE}}", requiresLine)
|
|
@@ -121,10 +132,16 @@ function renderSystemdTemplate(template, vars) {
|
|
|
121
132
|
|
|
122
133
|
function renderLaunchdTemplate(template, vars) {
|
|
123
134
|
const upstreamPlist = vars.upstream
|
|
124
|
-
? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${vars.upstream}</string>`
|
|
135
|
+
? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${xmlEscape(vars.upstream)}</string>`
|
|
136
|
+
: "";
|
|
137
|
+
const caFilePlist = vars.caFile
|
|
138
|
+
? ` <key>CACHE_FIX_PROXY_CA_FILE</key>\n <string>${xmlEscape(vars.caFile)}</string>`
|
|
139
|
+
: "";
|
|
140
|
+
const rejectUnauthorizedPlist = vars.rejectUnauthorized
|
|
141
|
+
? ` <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
|
|
125
142
|
: "";
|
|
126
143
|
const debugPlist = vars.debug
|
|
127
|
-
? ` <key>CACHE_FIX_DEBUG</key>\n <string>${vars.debug}</string>`
|
|
144
|
+
? ` <key>CACHE_FIX_DEBUG</key>\n <string>${xmlEscape(vars.debug)}</string>`
|
|
128
145
|
: "";
|
|
129
146
|
const hotReloadPlist = vars.hotReload
|
|
130
147
|
? ` <key>CACHE_FIX_HOT_RELOAD</key>\n <string>${vars.hotReload}</string>`
|
|
@@ -134,6 +151,8 @@ function renderLaunchdTemplate(template, vars) {
|
|
|
134
151
|
.replaceAll("{{SERVER_PATH}}", vars.serverPath)
|
|
135
152
|
.replaceAll("{{PORT}}", vars.port)
|
|
136
153
|
.replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
|
|
154
|
+
.replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
|
|
155
|
+
.replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
|
|
137
156
|
.replaceAll("{{DEBUG_PLIST}}", debugPlist)
|
|
138
157
|
.replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
|
|
139
158
|
.replaceAll("{{WORKING_DIR}}", vars.workingDir)
|
|
@@ -186,12 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
|
|
|
186
205
|
const rendered = renderSystemdTemplate(template, {
|
|
187
206
|
node: process.execPath,
|
|
188
207
|
serverPath: SERVER_PATH,
|
|
189
|
-
port: defaults.port,
|
|
190
|
-
upstream: defaults.upstream,
|
|
191
|
-
debug: defaults.debug,
|
|
192
|
-
hotReload: defaults.hotReload,
|
|
193
|
-
workingDir: defaults.workingDir,
|
|
194
208
|
requires: "",
|
|
209
|
+
...defaults,
|
|
195
210
|
});
|
|
196
211
|
await mkdir(paths.configDir, { recursive: true });
|
|
197
212
|
await writeFile(targetPath, rendered);
|
|
@@ -286,12 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
|
|
|
286
301
|
const rendered = renderLaunchdTemplate(template, {
|
|
287
302
|
node: process.execPath,
|
|
288
303
|
serverPath: SERVER_PATH,
|
|
289
|
-
port: defaults.port,
|
|
290
|
-
upstream: defaults.upstream,
|
|
291
|
-
debug: defaults.debug,
|
|
292
|
-
hotReload: defaults.hotReload,
|
|
293
|
-
workingDir: defaults.workingDir,
|
|
294
304
|
logDir: paths.logDir,
|
|
305
|
+
...defaults,
|
|
295
306
|
});
|
|
296
307
|
await mkdir(paths.configDir, { recursive: true });
|
|
297
308
|
await writeFile(targetPath, rendered);
|
package/package.json
CHANGED
package/proxy/extensions/usage-log.mjs
CHANGED
|
@@ -26,6 +26,7 @@
|
|
|
26
26
|
// overage_disabled_reason?: string ≤64 (optional)
|
|
27
27
|
// cache_hit_rate: float 0–1
|
|
28
28
|
// q5h_delta, q7d_delta: float (0 on first call after restart)
|
|
29
|
+
// request_id?: string ≤64 (optional, gated)
|
|
29
30
|
//
|
|
30
31
|
// `peak_hour` is NOT in the wire format. It can be derived from `ts` if any
|
|
31
32
|
// consumer needs it.
|
|
@@ -36,6 +37,17 @@
|
|
|
36
37
|
// CACHE_FIX_USAGE_LOG=<path> overrides the destination path only — it is NOT
|
|
37
38
|
// an enable flag and never has been.
|
|
38
39
|
//
|
|
40
|
+
// CACHE_FIX_USAGE_LOG_REQID=on emits the optional `request_id` field
|
|
41
|
+
// (sourced from the upstream `request-id` response header). Default-off in
|
|
42
|
+
// v4.1.0 to avoid breaking unpatched claude-meter installs whose strict-
|
|
43
|
+
// object schema rejects unknown keys. claude-meter v0.7.0+ accepts the
|
|
44
|
+
// optional field; the v4.2.0 flip to default-on assumes that floor.
|
|
45
|
+
// The field is the post-hoc join key against CC's per-session JSONL
|
|
46
|
+
// transcripts
|
|
47
|
+
// (`~/.claude/projects/<project>/<session-uuid>.jsonl` carry `requestId`
|
|
48
|
+
// for every API call), which recovers per-CC-session attribution that
|
|
49
|
+
// `sid` alone cannot provide. See docs/directives/proxy-usage-log-request-id.md.
|
|
50
|
+
//
|
|
39
51
|
// See `docs/directives/proxy-claude-meter-compat.md` for full design.
|
|
40
52
|
|
|
41
53
|
import { appendFile, mkdir } from "node:fs/promises";
|
|
@@ -91,6 +103,17 @@ export function extractMessageDeltaFields(event) {
|
|
|
91
103
|
return { output_tokens: event.usage.output_tokens || 0 };
|
|
92
104
|
}
|
|
93
105
|
|
|
106
|
+
// Extract upstream request-id from response headers, guarded against the
|
|
107
|
+
// max(64) MeterRowSchema constraint. Returns the string when valid, or
|
|
108
|
+
// `undefined` so the optional schema field is omitted on bad input rather
|
|
109
|
+
// than emitting a row that would fail meter-side validation.
|
|
110
|
+
export function extractRequestId(headers) {
|
|
111
|
+
const raw = headers?.["request-id"];
|
|
112
|
+
if (typeof raw !== "string") return undefined;
|
|
113
|
+
if (raw.length === 0 || raw.length > 64) return undefined;
|
|
114
|
+
return raw;
|
|
115
|
+
}
|
|
116
|
+
|
|
94
117
|
function num(headers, key) {
|
|
95
118
|
const v = headers?.[key];
|
|
96
119
|
if (v === undefined || v === null || v === "") return null;
|
|
@@ -134,7 +157,7 @@ export function computeDelta(current, previous) {
|
|
|
134
157
|
return current - previous;
|
|
135
158
|
}
|
|
136
159
|
|
|
137
|
-
export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, now = new Date() }) {
|
|
160
|
+
export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, requestId, now = new Date() }) {
|
|
138
161
|
const s = start || {};
|
|
139
162
|
const d = delta || {};
|
|
140
163
|
const q = quota || {};
|
|
@@ -194,6 +217,26 @@ export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ
|
|
|
194
217
|
record.overage_disabled_reason = q.overage_disabled_reason;
|
|
195
218
|
}
|
|
196
219
|
|
|
220
|
+
// Optional: emit request_id when CACHE_FIX_USAGE_LOG_REQID=on AND the
|
|
221
|
+
// captured value is a non-empty string within the schema's max(64)
|
|
222
|
+
// constraint. Belt-and-braces: extractRequestId enforces these guards at
|
|
223
|
+
// capture time, and assembleRecord re-enforces them here so a future
|
|
224
|
+
// refactor that bypasses the extractor can't emit a row that would fail
|
|
225
|
+
// claude-meter's strict-object validation.
|
|
226
|
+
// Env read happens per-call so operators can flip it at runtime without
|
|
227
|
+
// proxy restart, matching the image-strip debug-gate pattern.
|
|
228
|
+
// Cross-repo contract: claude-code-meter v0.7.0+ accepts this optional
|
|
229
|
+
// field; older meter installs reject rows that carry it, so the gate
|
|
230
|
+
// stays default-off in v4.1.0. Default flips on in cache-fix v4.2.0.
|
|
231
|
+
if (
|
|
232
|
+
process.env.CACHE_FIX_USAGE_LOG_REQID === "on" &&
|
|
233
|
+
typeof requestId === "string" &&
|
|
234
|
+
requestId.length > 0 &&
|
|
235
|
+
requestId.length <= 64
|
|
236
|
+
) {
|
|
237
|
+
record.request_id = requestId;
|
|
238
|
+
}
|
|
239
|
+
|
|
197
240
|
return record;
|
|
198
241
|
}
|
|
199
242
|
|
|
@@ -250,6 +293,7 @@ export default {
|
|
|
250
293
|
const delta = extractMessageDeltaFields(ctx.event);
|
|
251
294
|
const quota = parseQuotaHeaders(ctx.responseHeaders || {});
|
|
252
295
|
const requestedModel = ctx.telemetry?.requestedModel || undefined;
|
|
296
|
+
const requestId = extractRequestId(ctx.responseHeaders || {});
|
|
253
297
|
|
|
254
298
|
const record = assembleRecord({
|
|
255
299
|
start,
|
|
@@ -259,6 +303,7 @@ export default {
|
|
|
259
303
|
sid: _sid,
|
|
260
304
|
prevQ5h: _lastQ5h,
|
|
261
305
|
prevQ7d: _lastQ7d,
|
|
306
|
+
requestId,
|
|
262
307
|
now: new Date(),
|
|
263
308
|
});
|
|
264
309
|
|
package/proxy/extensions.json
CHANGED
|
@@ -1,82 +1,20 @@
|
|
|
1
1
|
{
|
|
2
|
-
"bootstrap-defense": {
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
},
|
|
6
|
-
"
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
},
|
|
10
|
-
"
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
},
|
|
14
|
-
"
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
},
|
|
18
|
-
"
|
|
19
|
-
|
|
20
|
-
"order": 200
|
|
21
|
-
},
|
|
22
|
-
"fresh-session-sort": {
|
|
23
|
-
"enabled": true,
|
|
24
|
-
"order": 250
|
|
25
|
-
},
|
|
26
|
-
"identity-normalization": {
|
|
27
|
-
"enabled": true,
|
|
28
|
-
"order": 300
|
|
29
|
-
},
|
|
30
|
-
"smoosh-split": {
|
|
31
|
-
"enabled": true,
|
|
32
|
-
"order": 320
|
|
33
|
-
},
|
|
34
|
-
"content-strip": {
|
|
35
|
-
"enabled": true,
|
|
36
|
-
"order": 330
|
|
37
|
-
},
|
|
38
|
-
"tool-input-normalize": {
|
|
39
|
-
"enabled": true,
|
|
40
|
-
"order": 340
|
|
41
|
-
},
|
|
42
|
-
"microcompact-stability": {
|
|
43
|
-
"enabled": true,
|
|
44
|
-
"order": 350
|
|
45
|
-
},
|
|
46
|
-
"thinking-display": {
|
|
47
|
-
"enabled": true,
|
|
48
|
-
"order": 360
|
|
49
|
-
},
|
|
50
|
-
"cache-control-normalize": {
|
|
51
|
-
"enabled": true,
|
|
52
|
-
"order": 400
|
|
53
|
-
},
|
|
54
|
-
"messages-cache-breakpoint": {
|
|
55
|
-
"enabled": true,
|
|
56
|
-
"order": 410
|
|
57
|
-
},
|
|
58
|
-
"ttl-management": {
|
|
59
|
-
"enabled": true,
|
|
60
|
-
"order": 500
|
|
61
|
-
},
|
|
62
|
-
"cache-telemetry": {
|
|
63
|
-
"enabled": true,
|
|
64
|
-
"order": 600
|
|
65
|
-
},
|
|
66
|
-
"overage-warning": {
|
|
67
|
-
"enabled": true,
|
|
68
|
-
"order": 610
|
|
69
|
-
},
|
|
70
|
-
"request-log": {
|
|
71
|
-
"enabled": false,
|
|
72
|
-
"order": 700
|
|
73
|
-
},
|
|
74
|
-
"usage-log": {
|
|
75
|
-
"enabled": true,
|
|
76
|
-
"order": 650
|
|
77
|
-
},
|
|
78
|
-
"rate-limit-log": {
|
|
79
|
-
"enabled": true,
|
|
80
|
-
"order": 660
|
|
81
|
-
}
|
|
2
|
+
"bootstrap-defense": { "enabled": true, "order": 45 },
|
|
3
|
+
"ttl-tier-detect": { "enabled": true, "order": 75 },
|
|
4
|
+
"fingerprint-strip": { "enabled": true, "order": 100 },
|
|
5
|
+
"image-strip": { "enabled": true, "order": 150 },
|
|
6
|
+
"sort-stabilization": { "enabled": true, "order": 200 },
|
|
7
|
+
"fresh-session-sort": { "enabled": true, "order": 250 },
|
|
8
|
+
"identity-normalization": { "enabled": true, "order": 300 },
|
|
9
|
+
"smoosh-split": { "enabled": true, "order": 320 },
|
|
10
|
+
"content-strip": { "enabled": true, "order": 330 },
|
|
11
|
+
"tool-input-normalize": { "enabled": true, "order": 340 },
|
|
12
|
+
"microcompact-stability": { "enabled": true, "order": 350 },
|
|
13
|
+
"thinking-display": { "enabled": true, "order": 360 },
|
|
14
|
+
"cache-control-normalize": { "enabled": true, "order": 400 },
|
|
15
|
+
"messages-cache-breakpoint": { "enabled": true, "order": 410 },
|
|
16
|
+
"ttl-management": { "enabled": true, "order": 500 },
|
|
17
|
+
"cache-telemetry": { "enabled": true, "order": 600 },
|
|
18
|
+
"overage-warning": { "enabled": true, "order": 610 },
|
|
19
|
+
"request-log": { "enabled": false, "order": 700 }
|
|
82
20
|
}
|
package/proxy/helpers.mjs
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
// Escape a value for safe rendering into a systemd `Environment=KEY=VALUE` line.
|
|
2
|
+
//
|
|
3
|
+
// Per systemd.exec(5) Environment= and systemd.unit(5) Specifier Expansion:
|
|
4
|
+
// - Literal `%` is the specifier-expansion marker; to embed one in a value
|
|
5
|
+
// the unit file must write `%%`. Without escaping, `a%20b` is parsed as
|
|
6
|
+
// a failed `%20` specifier expansion, systemd logs "Invalid slot" and
|
|
7
|
+
// silently drops the variable (empirically reproduced 2026-06-07).
|
|
8
|
+
// - Backslash is a C-string escape inside quoted strings AND inside the
|
|
9
|
+
// Environment= value parser; `\b` becomes byte 0x08 (backspace), `\n`
|
|
10
|
+
// becomes LF, etc. To embed a literal `\` the unit must write `\\`.
|
|
11
|
+
// - `"` requires `\"` (after the backslash escape rule above).
|
|
12
|
+
// - Whitespace requires the whole value to be quoted (`"..."`).
|
|
13
|
+
//
|
|
14
|
+
// Order matters: escape `%` first (it produces `%%`, neither of which we
|
|
15
|
+
// want to re-escape later), then handle `\` and `"` together inside the
|
|
16
|
+
// quoting branch.
|
|
17
|
+
export const systemdEscape = (v) => {
|
|
18
|
+
const percentEscaped = v.replace(/%/g, '%%');
|
|
19
|
+
const needsQuoting = /[\s"\\]/.test(v);
|
|
20
|
+
if (!needsQuoting) return percentEscaped;
|
|
21
|
+
return `"${percentEscaped.replace(/[\\"]/g, '\\$&')}"`;
|
|
22
|
+
};
|
|
23
|
+
|
|
24
|
+
export const xmlEscape = (v) => v.replace(/[&<>'"]/g, c => ({
|
|
25
|
+
'&': '&amp;',
|
|
26
|
+
'<': '&lt;',
|
|
27
|
+
'>': '&gt;',
|
|
28
|
+
"'": '&apos;',
|
|
29
|
+
'"': '&quot;'
|
|
30
|
+
})[c]);
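To make the escaping rules concrete, here is a standalone sketch that re-declares the two helpers and runs them on a few values. `systemdEscape` follows the hunk above. The entity strings in `xmlEscape` are the standard XML predefined entities, which is an assumption about the shipped file. `envLine` is a hypothetical wrapper modelled on the `Environment=` lines built in `install-service.mjs`, and the sample values are made up.

```javascript
// Re-declared locally so the sketch runs on its own.
const systemdEscape = (v) => {
  const percentEscaped = v.replace(/%/g, "%%"); // `%` is systemd's specifier marker
  if (!/[\s"\\]/.test(v)) return percentEscaped; // nothing that needs quoting
  // Quote the whole value and backslash-escape `\` and `"` inside it.
  return `"${percentEscaped.replace(/[\\"]/g, "\\$&")}"`;
};

// Assumption: standard XML predefined entities.
const XML_ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&apos;", '"': "&quot;" };
const xmlEscape = (v) => v.replace(/[&<>'"]/g, (c) => XML_ENTITIES[c]);

// Hypothetical wrapper: emit the line only when a value is set.
const envLine = (key, value) =>
  value ? `Environment=${key}=${systemdEscape(value)}` : "";

console.log(envLine("CACHE_FIX_PROXY_UPSTREAM", "http://host/a%20b"));
console.log(envLine("CACHE_FIX_PROXY_CA_FILE", "/etc/my certs/ca.pem"));
console.log(xmlEscape("http://host/?a=1&b=<2>"));
```

The percent-doubling runs unconditionally, while quoting only kicks in for whitespace, `"`, or `\`, so ordinary URLs and paths render unchanged.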
|
package/proxy/server.mjs
CHANGED
|
@@ -6,6 +6,47 @@ import { streamResponse, createTelemetryRecord } from "./stream.mjs";
|
|
|
6
6
|
import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse, getFailedExtensions } from "./pipeline.mjs";
|
|
7
7
|
import { startWatcher } from "./watcher.mjs";
|
|
8
8
|
|
|
9
|
+
// Debug logging — writes to ~/.claude/cache-fix-debug.log (override path with
|
|
10
|
+
// CACHE_FIX_DEBUG_LOG). Self-gated on CACHE_FIX_DEBUG=1; a no-op otherwise.
|
|
11
|
+
// Env is read on every call so tests (and operators flipping the flag at
|
|
12
|
+
// runtime) see live behavior — same pattern as image-strip's #98 gate.
|
|
13
|
+
import { appendFileSync, mkdirSync } from "node:fs";
|
|
14
|
+
import { homedir } from "node:os";
|
|
15
|
+
import { dirname, join } from "node:path";
|
|
16
|
+
import util from "node:util";
|
|
17
|
+
|
|
18
|
+
function debugLogPath() {
|
|
19
|
+
return process.env.CACHE_FIX_DEBUG_LOG ||
|
|
20
|
+
join(homedir(), ".claude", "cache-fix-debug.log");
|
|
21
|
+
}
|
|
22
|
+
|
|
23
|
+
// Never spread raw headers to the log: Authorization / x-api-key / cookies
|
|
24
|
+
// must never persist to disk. Same discipline as bootstrap-defense.mjs's
|
|
25
|
+
// audit-record contract — extract named scalars only.
|
|
26
|
+
const SENSITIVE_HEADERS = new Set([
|
|
27
|
+
"authorization",
|
|
28
|
+
"x-api-key",
|
|
29
|
+
"cookie",
|
|
30
|
+
"set-cookie",
|
|
31
|
+
"proxy-authorization",
|
|
32
|
+
]);
|
|
33
|
+
|
|
34
|
+
function redactHeaders(headers) {
|
|
35
|
+
const out = {};
|
|
36
|
+
for (const [k, v] of Object.entries(headers || {})) {
|
|
37
|
+
out[k] = SENSITIVE_HEADERS.has(k.toLowerCase()) ? "[REDACTED]" : v;
|
|
38
|
+
}
|
|
39
|
+
return out;
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
function debugLog(...args) {
|
|
43
|
+
if (process.env.CACHE_FIX_DEBUG !== "1") return;
|
|
44
|
+
const path = debugLogPath();
|
|
45
|
+
try { mkdirSync(dirname(path), { recursive: true }); } catch {}
|
|
46
|
+
const line = `[${new Date().toISOString()}] ${util.format(...args)}\n`;
|
|
47
|
+
try { appendFileSync(path, line); } catch {}
|
|
48
|
+
}
|
|
49
|
+
|
|
9
50
|
function collectBody(req) {
|
|
10
51
|
return new Promise((resolve, reject) => {
|
|
11
52
|
const chunks = [];
|
|
@@ -74,7 +115,13 @@ async function handleMessages(clientReq, clientRes) {
|
|
|
74
115
|
});
|
|
75
116
|
|
|
76
117
|
const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "messages");
|
|
77
|
-
if (pre.handled)
|
|
118
|
+
if (pre.handled) {
|
|
119
|
+
debugLog("[PROXY] handled internally without upstream request",
|
|
120
|
+
"method:", clientReq.method, "url:", clientReq.url,
|
|
121
|
+
"status:", clientRes.statusCode,
|
|
122
|
+
"response headers:", redactHeaders(clientRes.getHeaders()));
|
|
123
|
+
return;
|
|
124
|
+
}
|
|
78
125
|
const { parsed, forwardBody, meta } = pre;
|
|
79
126
|
|
|
80
127
|
const requestedModel = parsed?.model || null;
|
|
@@ -88,6 +135,7 @@ async function handleMessages(clientReq, clientRes) {
|
|
|
88
135
|
abortController.signal
|
|
89
136
|
));
|
|
90
137
|
} catch (err) {
|
|
138
|
+
debugLog("[PROXY] forwardRequest error:", err.message);
|
|
91
139
|
if (abortController.signal.aborted) return;
|
|
92
140
|
clientRes.writeHead(502, { "content-type": "application/json" });
|
|
93
141
|
clientRes.end(JSON.stringify({ error: "upstream_error", message: err.message }));
|
|
@@ -99,6 +147,11 @@ async function handleMessages(clientReq, clientRes) {
|
|
|
99
147
|
// socket carried the request without each one re-instrumenting upstream.
|
|
100
148
|
meta._upstreamConnectionId = upstreamConnectionId ?? null;
|
|
101
149
|
|
|
150
|
+
debugLog("[UPSTREAM -> PROXY -> CLAUDE] RESPONSE",
|
|
151
|
+
"status:", statusCode, "message:", upstreamRes.statusMessage,
|
|
152
|
+
"upstream headers:", redactHeaders(upstreamRes.headers),
|
|
153
|
+
"proxy headers:", redactHeaders(responseHeaders));
|
|
154
|
+
|
|
102
155
|
if (extSnapshot.length > 0) {
|
|
103
156
|
const resCtx = { status: statusCode, headers: responseHeaders, meta };
|
|
104
157
|
await runOnResponseStart(resCtx, extSnapshot);
|
|
@@ -274,16 +327,44 @@ function handleNotFound(_req, res) {
|
|
|
274
327
|
*/
|
|
275
328
|
export function createProxyServer() {
|
|
276
329
|
return http.createServer((req, res) => {
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
|
|
330
|
+
// Async IIFE: handleMessages/handleBootstrap return promises, so we have
|
|
331
|
+
// to await them inside the try/catch — a bare return would let rejections
|
|
332
|
+
// escape to unhandledRejection and (on Node 15+) crash the process.
|
|
333
|
+
(async () => {
|
|
334
|
+
try {
|
|
335
|
+
debugLog("[CLAUDE -> PROXY] REQUEST",
|
|
336
|
+
"method:", req.method, "url:", req.url,
|
|
337
|
+
"headers:", redactHeaders(req.headers));
|
|
338
|
+
|
|
339
|
+
// Wrap res.write/res.end to log chunk-level activity when debug is on.
|
|
340
|
+
// These are sync monkey-patches; the inner debugLog self-gates so the
|
|
341
|
+
// overhead is negligible when CACHE_FIX_DEBUG is unset.
|
|
342
|
+
const originalWrite = res.write;
|
|
343
|
+
const originalEnd = res.end;
|
|
344
|
+
res.write = function (chunk, ...args) {
|
|
345
|
+
debugLog(`[PROXY -> CLAUDE] Send chunk. Size: ${chunk ? chunk.length : 0} bytes`);
|
|
346
|
+
return originalWrite.apply(res, [chunk, ...args]);
|
|
347
|
+
};
|
|
348
|
+
res.end = function (chunk, ...args) {
|
|
349
|
+
debugLog("[PROXY -> CLAUDE] Close connection (res.end)");
|
|
350
|
+
return originalEnd.apply(res, [chunk, ...args]);
|
|
351
|
+
};
|
|
352
|
+
|
|
353
|
+
if (req.method === "GET" && req.url === "/health") return handleHealth(req, res);
|
|
354
|
+
if (req.method === "POST" && req.url?.startsWith("/v1/messages")) return await handleMessages(req, res);
|
|
355
|
+
if (req.url?.startsWith("/api/claude_cli/bootstrap")) return await handleBootstrap(req, res);
|
|
356
|
+
debugLog("ERROR: handler not found for req.url=", req.url, "method=", req.method);
|
|
357
|
+
handleNotFound(req, res);
|
|
358
|
+
} catch (error) {
|
|
359
|
+
debugLog("REQUEST HANDLER ERROR:", error?.message, error?.stack);
|
|
360
|
+
// Generic body: do NOT echo error.message (may include internal paths,
|
|
361
|
+
// upstream URLs, or other server state).
|
|
362
|
+
if (!res.headersSent) {
|
|
363
|
+
res.writeHead(500, { "content-type": "application/json" });
|
|
364
|
+
res.end(JSON.stringify({ error: "internal_proxy_error" }));
|
|
365
|
+
}
|
|
366
|
+
}
|
|
367
|
+
})();
|
|
287
368
|
});
|
|
288
369
|
}
|
|
289
370
|
|
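The comment in the hunk above explains why the handlers are awaited inside the `try`/`catch`. A minimal sketch of that pattern follows, under stated assumptions: `failingHandler`, `makeFakeRes`, and `safeListener` are hypothetical names, the response object is a hand-rolled fake rather than a real `http.ServerResponse`, and unlike the real listener this sketch returns the promise so the outcome can be observed.

```javascript
// Hypothetical stand-in for handleMessages: an async handler that rejects.
async function failingHandler() {
  throw new Error("upstream exploded at /internal/path");
}

// Minimal fake of the parts of http.ServerResponse used below.
function makeFakeRes() {
  return {
    headersSent: false,
    statusCode: null,
    body: null,
    writeHead(status) {
      this.statusCode = status;
      this.headersSent = true;
    },
    end(payload) {
      this.body = payload;
    },
  };
}

// Same shape as the listener in the hunk: awaiting inside try/catch turns a
// rejection into a generic 500 instead of an unhandled rejection.
function safeListener(handler, res) {
  return (async () => {
    try {
      return await handler(res);
    } catch (error) {
      // Generic body only: error.message is deliberately not echoed.
      if (!res.headersSent) {
        res.writeHead(500, { "content-type": "application/json" });
        res.end(JSON.stringify({ error: "internal_proxy_error" }));
      }
    }
  })();
}

const res = makeFakeRes();
const done = safeListener(failingHandler, res).then(() => {
  console.log(res.statusCode, res.body);
});
```

Replacing `return await handler(res)` with a bare `return handler(res)` would let the rejection bypass the `catch` block, which is the failure mode the hunk's comment describes.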
package/proxy/upstream.mjs
CHANGED
|
@@ -183,9 +183,23 @@ function getAgent(isHTTPS, hostname) {
|
|
|
183
183
|
return agent;
|
|
184
184
|
}
|
|
185
185
|
|
|
186
|
+
// Build the upstream URL by concatenating the configured base (with any path
|
|
187
|
+
// component preserved) with the client request URL. The historical
|
|
188
|
+
// `new URL(clientReq.url, base)` approach is RFC 3986 relative-resolution,
|
|
189
|
+
// which drops the base's path component when the relative is path-absolute
|
|
190
|
+
// (`/v1/messages`). That breaks corp-proxy / mirror setups where the
|
|
191
|
+
// configured upstream is `https://corp-proxy.example.net/anthropic-mirror`
|
|
192
|
+
// — the request would land at `https://corp-proxy.example.net/v1/messages`
|
|
193
|
+
// with `/anthropic-mirror` silently dropped. See PR #188 / @nisqatsi.
|
|
194
|
+
export function buildUpstreamUrl(base, clientUrl) {
|
|
195
|
+
const trimmedBase = base.endsWith("/") ? base.slice(0, -1) : base;
|
|
196
|
+
const relative = clientUrl.startsWith("/") ? clientUrl : "/" + clientUrl;
|
|
197
|
+
return new URL(trimmedBase + relative);
|
|
198
|
+
}
|
|
199
|
+
|
|
186
200
|
export function forwardRequest(clientReq, body, signal) {
|
|
187
201
|
return new Promise((resolve, reject) => {
|
|
188
|
-
const upstreamUrl =
|
|
202
|
+
const upstreamUrl = buildUpstreamUrl(config.upstream, clientReq.url);
|
|
189
203
|
|
|
190
204
|
const headers = buildUpstreamHeaders(clientReq.headers, upstreamUrl.hostname);
|
|
191
205
|
if (body) {
|
|
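The difference between relative resolution and concatenation described in the comment above can be checked directly with the WHATWG `URL` class. This is a small sketch: `buildUpstreamUrl` repeats the logic added in this hunk, the base is the example from the comment, and the `?beta=true` query string is a hypothetical addition to show that the query survives.

```javascript
// Same logic as the buildUpstreamUrl added in this hunk.
function buildUpstreamUrl(base, clientUrl) {
  const trimmedBase = base.endsWith("/") ? base.slice(0, -1) : base;
  const relative = clientUrl.startsWith("/") ? clientUrl : "/" + clientUrl;
  return new URL(trimmedBase + relative);
}

const base = "https://corp-proxy.example.net/anthropic-mirror";
const clientUrl = "/v1/messages?beta=true"; // hypothetical query string

// Relative resolution: a path-absolute reference replaces the base's path.
const resolved = new URL(clientUrl, base);

// Concatenation: the base's path component is preserved.
const concatenated = buildUpstreamUrl(base, clientUrl);

console.log(resolved.href);
console.log(concatenated.href);

// A trailing slash on the base does not produce a doubled slash.
const withSlash = buildUpstreamUrl(base + "/", clientUrl);
console.log(withSlash.href);
```

For a base without a path component (for example a bare origin), both approaches produce the same URL, so the change only affects setups where the configured upstream includes a path prefix.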
package/tools/cache_analysis.py
@@ -0,0 +1,229 @@
|
|
|
1
|
+
"""Shared cache analysis helpers for hooks and MCP tools.
|
|
2
|
+
|
|
3
|
+
Reference Python helper for consumers that want to read cache-fix's
|
|
4
|
+
``quota-status`` output and reason about cache-state from a Claude Code
|
|
5
|
+
transcript. Used by host-side hooks (e.g. ``~/.claude/hooks/
|
|
6
|
+
context-advisor-analyze.py``) and MCP tools that need quota-aware
|
|
7
|
+
behavior.
|
|
8
|
+
|
|
9
|
+
Consumer pattern: copy or symlink this file into ``~/.claude/mcp/`` (or
|
|
10
|
+
wherever your hook / tool expects to import from) and ``from cache_analysis
|
|
11
|
+
import read_quota_status, analyze_transcript`` etc. The file ships in the
|
|
12
|
+
cache-fix npm package's ``tools/`` directory; npm consumers can reference
|
|
13
|
+
``node_modules/claude-code-cache-fix/tools/cache_analysis.py`` directly or
|
|
14
|
+
copy it out for non-npm installs.
|
|
15
|
+
|
|
16
|
+
The ``read_quota_status()`` helper handles both cache-fix v3.5.0+ (proxy
|
|
17
|
+
mode, per-session split at ``~/.claude/quota-status/account.json``) and
|
|
18
|
+
v3.4.x and earlier / preload mode (single global
|
|
19
|
+
``~/.claude/quota-status.json``). See the README's "Migration:
|
|
20
|
+
v3.4.x → v3.5.0+" section.
|
|
21
|
+
"""
|
|
22
|
+
|
|
23
|
+
import json
|
|
24
|
+
import subprocess
|
|
25
|
+
from datetime import datetime, timezone
|
|
26
|
+
|
|
27
|
+
CACHE_TTL_5M = 300 # 5-minute ephemeral TTL
|
|
28
|
+
CACHE_TTL_1H = 3600 # 1-hour extended TTL
|
|
29
|
+
CONTEXT_THRESHOLD = 50_000 # Minimum tokens to recommend compact
|
|
30
|
+
COMPACT_RESULT_ESTIMATE = 12_000 # Estimated tokens after compaction
|
|
31
|
+
CACHE_CREATE_RATE_5M = 3.75 # Opus $/MTok for 5min cache writes
|
|
32
|
+
CACHE_CREATE_RATE_1H = 7.50 # Opus $/MTok for 1h cache writes
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
def read_tail_lines(filepath, n=300):
|
|
36
|
+
"""Read last N lines efficiently using tail."""
|
|
37
|
+
try:
|
|
38
|
+
result = subprocess.run(
|
|
39
|
+
["tail", "-n", str(n), filepath],
|
|
40
|
+
capture_output=True, text=True, timeout=5,
|
|
41
|
+
)
|
|
42
|
+
return result.stdout.splitlines()
|
|
43
|
+
except Exception:
|
|
44
|
+
return []
|
|
45
|
+
|
|
46
|
+
|
|
47
|
+
def parse_assistant_usage(lines):
|
|
48
|
+
"""Extract assistant messages with usage data from transcript lines."""
|
|
49
|
+
messages = []
|
|
50
|
+
for line in lines:
|
|
51
|
+
line = line.strip()
|
|
52
|
+
if not line:
|
|
53
|
+
continue
|
|
54
|
+
try:
|
|
55
|
+
obj = json.loads(line)
|
|
56
|
+
except json.JSONDecodeError:
|
|
57
|
+
continue
|
|
58
|
+
if obj.get("type") != "assistant":
|
|
59
|
+
continue
|
|
60
|
+
msg = obj.get("message", {})
|
|
61
|
+
usage = msg.get("usage")
|
|
62
|
+
ts = obj.get("timestamp")
|
|
63
|
+
if not usage or not ts:
|
|
64
|
+
continue
|
|
65
|
+
cr = usage.get("cache_creation_input_tokens", 0)
|
|
66
|
+
rd = usage.get("cache_read_input_tokens", 0)
|
|
67
|
+
inp = usage.get("input_tokens", 0)
|
|
68
|
+
out = usage.get("output_tokens", 0)
|
|
69
|
+
if cr == 0 and rd == 0 and inp == 0:
|
|
70
|
+
continue
|
|
71
|
+
# Extract TTL tier breakdown if available
|
|
72
|
+
cr_detail = usage.get("cache_creation", {})
|
|
73
|
+
cr_1h = cr_detail.get("ephemeral_1h_input_tokens", 0) if isinstance(cr_detail, dict) else 0
|
|
74
|
+
cr_5m = cr_detail.get("ephemeral_5m_input_tokens", 0) if isinstance(cr_detail, dict) else 0
|
|
75
|
+
messages.append({
|
|
76
|
+
"timestamp": ts,
|
|
77
|
+
"input_tokens": inp,
|
|
78
|
+
"cache_creation": cr,
|
|
79
|
+
"cache_read": rd,
|
|
80
|
+
"output_tokens": out,
|
|
81
|
+
"total_in": cr + rd + inp,
|
|
82
|
+
"cr_1h": cr_1h,
|
|
83
|
+
"cr_5m": cr_5m,
|
|
84
|
+
})
|
|
85
|
+
return messages
|
|
86
|
+
|
|
87
|
+
|
|
88
|
+
def detect_cache_ttl(messages):
|
|
89
|
+
"""Detect the effective cache TTL from recent API call usage data.
|
|
90
|
+
|
|
91
|
+
If any recent calls show ephemeral_1h_input_tokens > 0, the account
|
|
92
|
+
is on the 1-hour tier. Otherwise, assume 5-minute ephemeral.
|
|
93
|
+
Returns (ttl_seconds, tier_name).
|
|
94
|
+
"""
|
|
95
|
+
recent = messages[-10:] if len(messages) >= 10 else messages
|
|
96
|
+
has_1h = any(m.get("cr_1h", 0) > 0 for m in recent)
|
|
97
|
+
has_5m = any(m.get("cr_5m", 0) > 0 for m in recent)
|
|
98
|
+
|
|
99
|
+
if has_1h:
|
|
100
|
+
return CACHE_TTL_1H, "1h"
|
|
101
|
+
if has_5m:
|
|
102
|
+
return CACHE_TTL_5M, "5m"
|
|
103
|
+
# No cache_creation breakdown available — conservative default
|
|
104
|
+
return CACHE_TTL_5M, "5m (default)"
|
|
105
|
+
|
|
106
|
+
|
|
107
|
+
def estimate_thinking_overhead(messages):
|
|
108
|
+
"""Estimate thinking block replay overhead.
|
|
109
|
+
|
|
110
|
+
Thinking blocks from prior turns replay as input tokens. Heuristic:
|
|
111
|
+
cumulative output_tokens approximates thinking content that gets replayed.
|
|
112
|
+
"""
|
|
113
|
+
if len(messages) < 2:
|
|
114
|
+
return 0
|
|
115
|
+
return sum(m["output_tokens"] for m in messages[:-1])
|
|
116
|
+
|
|
117
|
+
|
|
118
|
+
def format_tokens(n):
|
|
119
|
+
if n >= 1_000_000:
|
|
120
|
+
return f"{n / 1_000_000:.1f}M"
|
|
121
|
+
if n >= 1_000:
|
|
122
|
+
return f"{n / 1_000:.0f}k"
|
|
123
|
+
return str(n)
|
|
124
|
+
|
|
125
|
+
|
|
126
|
+
def format_duration(seconds):
|
|
127
|
+
if seconds >= 3600:
|
|
128
|
+
return f"{seconds / 3600:.1f}h"
|
|
129
|
+
return f"{int(seconds / 60)}m"
|
|
130
|
+
|
|
131
|
+
|
|
132
|
+
def estimate_savings(total_context, ttl_tier="5m"):
|
|
133
|
+
"""Estimate $ savings from compacting before a cold start.
|
|
134
|
+
|
|
135
|
+
Rate depends on the active cache TTL tier — 1h cache writes are 2x the
|
|
136
|
+
5m rate. Caller should pass the tier returned by detect_cache_ttl().
|
|
137
|
+
Default is the conservative 5m rate for backward compatibility.
|
|
138
|
+
"""
|
|
139
|
+
rate = CACHE_CREATE_RATE_1H if ttl_tier.startswith("1h") else CACHE_CREATE_RATE_5M
|
|
140
|
+
cold_cost = (total_context / 1_000_000) * rate
|
|
141
|
+
compact_cost = (COMPACT_RESULT_ESTIMATE / 1_000_000) * rate
|
|
142
|
+
return cold_cost - compact_cost
|
|
143
|
+
|
|
144
|
+
|
|
145
|
+
def read_quota_status():
|
|
146
|
+
"""Read current quota utilization from cache-fix's quota-status file.
|
|
147
|
+
|
|
148
|
+
Written by the cache-fix interceptor from API response headers. Path
|
|
149
|
+
depends on cache-fix version:
|
|
150
|
+
- v3.5.0+ (proxy mode, per-session split): ~/.claude/quota-status/account.json
|
|
151
|
+
- v3.4.x and earlier (or preload mode): ~/.claude/quota-status.json (flat)
|
|
152
|
+
|
|
153
|
+
Tries the v3.5.0+ path first, falls back to the legacy flat path. A
|
|
154
|
+
candidate file whose JSON parses but isn't a dict (e.g. a partial write
|
|
155
|
+
that lands as ``[]`` or ``null``) is skipped so the next candidate gets
|
|
156
|
+
a chance — and so callers never receive a non-dict and break on
|
|
157
|
+
``status.get(...)`` accessors downstream.
|
|
158
|
+
|
|
159
|
+
Returns dict with five_hour/seven_day pct (and other fields written by
|
|
160
|
+
cache-fix's response-header capture), or None if no candidate yields a
|
|
161
|
+
dict-shaped payload.
|
|
162
|
+
"""
|
|
163
|
+
import os
|
|
164
|
+
for quota_file in (
|
|
165
|
+
os.path.expanduser("~/.claude/quota-status/account.json"),
|
|
166
|
+
os.path.expanduser("~/.claude/quota-status.json"),
|
|
167
|
+
):
|
|
168
|
+
try:
|
|
169
|
+
with open(quota_file) as f:
|
|
170
|
+
data = json.load(f)
|
|
171
|
+
except (OSError, json.JSONDecodeError):
|
|
172
|
+
continue
|
|
173
|
+
if isinstance(data, dict):
|
|
174
|
+
return data
|
|
175
|
+
# Valid JSON but wrong shape — try the next candidate.
|
|
176
|
+
return None
|
|
177
|
+
|
|
178
|
+
|
|
179
|
+
def analyze_transcript(transcript_path):
|
|
180
|
+
"""Full analysis of a transcript. Returns a dict with all cache state info.
|
|
181
|
+
|
|
182
|
+
Returns None if analysis can't be performed (no data, etc).
|
|
183
|
+
"""
|
|
184
|
+
lines = read_tail_lines(transcript_path, 300)
|
|
185
|
+
if not lines:
|
|
186
|
+
return None
|
|
187
|
+
|
|
188
|
+
messages = parse_assistant_usage(lines)
|
|
189
|
+
if not messages:
|
|
190
|
+
return None
|
|
191
|
+
|
|
192
|
+
last = messages[-1]
|
|
193
|
+
try:
|
|
194
|
+
last_ts = datetime.fromisoformat(last["timestamp"].replace("Z", "+00:00"))
|
|
195
|
+
except (ValueError, KeyError):
|
|
196
|
+
return None
|
|
197
|
+
|
|
198
|
+
now = datetime.now(timezone.utc)
|
|
199
|
+
gap_seconds = (now - last_ts).total_seconds()
|
|
200
|
+
|
|
201
|
+
context_tokens = last["total_in"]
|
|
202
|
+
thinking_overhead = estimate_thinking_overhead(messages)
|
|
203
|
+
total_with_thinking = context_tokens + thinking_overhead
|
|
204
|
+
|
|
205
|
+
ttl_seconds, ttl_tier = detect_cache_ttl(messages)
|
|
206
|
+
cache_expired = gap_seconds > ttl_seconds
|
|
207
|
+
|
|
208
|
+
# Last few turns' cache efficiency
|
|
209
|
+
recent = messages[-5:] if len(messages) >= 5 else messages
|
|
210
|
+
recent_cr = sum(m["cache_creation"] for m in recent)
|
|
211
|
+
recent_total = sum(m["total_in"] for m in recent)
|
|
212
|
+
cr_pct = (recent_cr / recent_total * 100) if recent_total else 0
|
|
213
|
+
|
|
214
|
+
quota = read_quota_status()
|
|
215
|
+
|
|
216
|
+
return {
|
|
217
|
+
"context_tokens": context_tokens,
|
|
218
|
+
"thinking_overhead": thinking_overhead,
|
|
219
|
+
"total_with_thinking": total_with_thinking,
|
|
220
|
+
"gap_seconds": gap_seconds,
|
|
221
|
+
"cache_expired": cache_expired,
|
|
222
|
+
"ttl_seconds": ttl_seconds,
|
|
223
|
+
"ttl_tier": ttl_tier,
|
|
224
|
+
"last_timestamp": last["timestamp"],
|
|
225
|
+
"num_messages": len(messages),
|
|
226
|
+
"recent_cr_pct": cr_pct,
|
|
227
|
+
"savings": estimate_savings(total_with_thinking, ttl_tier) if cache_expired else 0,
|
|
228
|
+
"quota": quota,
|
|
229
|
+
}
|
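The arithmetic in `estimate_savings()` is easier to follow with numbers plugged in. The sketch below is a JavaScript transliteration for illustration only and is not part of the package: the constants are copied from the module above, `estimateSavings` is a hypothetical name, and the 200,000-token context is an assumed input.

```javascript
// Constants copied from cache_analysis.py (Opus $/MTok, per the module's comments).
const COMPACT_RESULT_ESTIMATE = 12_000;
const CACHE_CREATE_RATE_5M = 3.75;
const CACHE_CREATE_RATE_1H = 7.5;

// Transliteration of estimate_savings(): cost of re-creating the full context
// on a cold cache, minus the cost of re-creating the compacted context.
function estimateSavings(totalContext, ttlTier = "5m") {
  const rate = ttlTier.startsWith("1h") ? CACHE_CREATE_RATE_1H : CACHE_CREATE_RATE_5M;
  const coldCost = (totalContext / 1_000_000) * rate;
  const compactCost = (COMPACT_RESULT_ESTIMATE / 1_000_000) * rate;
  return coldCost - compactCost;
}

const context = 200_000; // hypothetical context size in tokens

const savings5m = estimateSavings(context, "5m");
const savings1h = estimateSavings(context, "1h");
// The "5m (default)" tier string from detect_cache_ttl() also selects the 5m rate.
const savingsDefault = estimateSavings(context, "5m (default)");

console.log(savings5m.toFixed(3), savings1h.toFixed(3), savingsDefault.toFixed(3));
```

With these inputs the 5m case works out to 0.2 × 3.75 − 0.012 × 3.75 ≈ 0.705 dollars, and the 1h case is twice that, since only the rate changes.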