claude-code-cache-fix 4.0.0 → 4.1.0

package/README.md CHANGED
@@ -6,7 +6,7 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
6
6
 
7
7
  Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
8
8
 
9
- > **v3.0.3** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
9
+ > **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
10
10
 
11
11
  > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
12
12
 
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
25
25
  ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
26
26
  ```
27
27
 
28
- That's it. The proxy applies all 7 cache-fix extensions automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
28
+ That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
29
29
 
30
30
  ### What the proxy does
31
31
 
32
- On every `/v1/messages` request, 9 extensions run in order:
32
+ On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image handling, microcompact stability, cache breakpoints, bootstrap-channel handling, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
33
33
 
34
34
  | Extension | What it fixes |
35
35
  |-----------|--------------|
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
116
116
  ghcr.io/cnighswonger/claude-code-cache-fix:latest
117
117
  ```
118
118
 
119
- Image tags: `latest`, `3`, `3.2`, `3.2.1` (semver-ladder, so `3` always points to the newest 3.x). `latest` always tracks the newest tagged release.
119
+ Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
120
120
 
121
121
  **Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
122
122
 
@@ -282,7 +282,7 @@ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
282
282
 
283
283
  **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
284
284
 
285
- **Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 added explicit handling for this path. v3.7.1 extends it to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body).
285
+ **Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
286
286
 
287
287
  Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
288
288
 
@@ -401,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
401
401
 
402
402
  **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
403
403
 
404
- **Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
404
+ **Supply chain:** Proxy mode: small, focused extension modules in `proxy/extensions/` (most under a few hundred lines; the pipeline is composable, so any single module can be read in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
405
405
 
406
406
  **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
407
407
 
@@ -421,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
421
421
 
422
422
  ## How it works
423
423
 
424
- **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
424
+ **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
425
425
 
426
426
  **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
427
427
 
@@ -855,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
855
855
 
856
856
  See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
857
857
 
858
+ ### `usage-log` extension and the `MeterRowSchema v:1` wire format
859
+
860
+ The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
861
+
862
+ | Field | Type | Source |
863
+ |---|---|---|
864
+ | `v` | literal `1` | constant |
865
+ | `ts` | ISO-8601 datetime | server time at row emission |
866
+ | `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
867
+ | `model` | string ≤64 | `message_start.message.model` from the response stream |
868
+ | `requested_model` | string ≤64 (optional) | request body `model` field |
869
+ | `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
870
+ | `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
871
+ | `service_tier` | string ≤32 | response `usage.service_tier` |
872
+ | `input_tokens` | int ≥0 | response usage |
873
+ | `output_tokens` | int ≥0 | response usage |
874
+ | `cache_creation_input_tokens` | int ≥0 | response usage |
875
+ | `cache_read_input_tokens` | int ≥0 | response usage |
876
+ | `ephemeral_1h_input_tokens` | int ≥0 | response usage |
877
+ | `ephemeral_5m_input_tokens` | int ≥0 | response usage |
878
+ | `web_search_requests` | int ≥0 | response usage |
879
+ | `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
880
+ | `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
881
+ | `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
882
+ | `qfallback_pct` | float 0–1 | unified fallback percentage |
883
+ | `qoverage_util` | float ≥0 (optional) | overage utilization header |
884
+ | `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
885
+ | `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)` — never raw |
886
+ | `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
887
+ | `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
888
+ | `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
889
+ | `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
890
+
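Reviewer note: the two derived rows in the table above, `cache_hit_rate` and `model_mismatch`, reduce to a few lines of arithmetic. The sketch below is illustrative only and is not the extension's shipped code. The function names `cacheHitRate` and `modelMismatch` are hypothetical, and returning 0 when the denominator is empty is an assumption the table does not state.

```javascript
// Hypothetical helpers mirroring the formulas in the table; not the shipped code.
function cacheHitRate(usage) {
  const input = usage.input_tokens || 0;
  const created = usage.cache_creation_input_tokens || 0;
  const read = usage.cache_read_input_tokens || 0;
  const total = input + created + read;
  // Assumption: a response with no input-side tokens reports a rate of 0.
  return total === 0 ? 0 : read / total;
}

function modelMismatch(requestedModel, model) {
  // Mirrors `requested_model && model && requested_model !== model`,
  // coerced to a strict boolean.
  return Boolean(requestedModel && model && requestedModel !== model);
}

// Made-up token counts: 800 cached reads out of 1000 input-side tokens.
const rate = cacheHitRate({
  input_tokens: 100,
  cache_creation_input_tokens: 100,
  cache_read_input_tokens: 800,
});
console.log(rate);
console.log(modelMismatch("model-a", "model-b"));
```

Output tokens do not enter the ratio: it only describes how much of the input side was served from cache.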
891
+ **Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
892
+
893
+ ```bash
894
+ # Find which CC session each usage.jsonl row belongs to:
895
+ while IFS= read -r row; do
896
+   req=$(jq -r '.request_id // empty' <<< "$row")
897
+   [ -z "$req" ] && continue
898
+   grep -lF "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
899
+ done < <(jq -c . ~/.claude/usage.jsonl)
900
+ ```
901
+
902
+ The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
903
+
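Reviewer note: the same join can be sketched in JavaScript for readers who post-process these files programmatically. This is a minimal illustration and not part of the package. `attributeRows` is a hypothetical name, the inputs are passed in as already-read lines rather than read from disk, and the match is a plain substring test on `"requestId":"<id>"`, the same pattern the `grep` in the README snippet uses.

```javascript
// Hypothetical helper: maps each usage.jsonl row that carries request_id to the
// transcript (keyed by session UUID) whose lines mention the same requestId.
function attributeRows(usageLines, transcriptsBySession) {
  const result = [];
  for (const line of usageLines) {
    let row;
    try {
      row = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than abort the whole join
    }
    const requestId = row.request_id;
    if (typeof requestId !== "string" || requestId.length === 0) continue;
    const needle = `"requestId":"${requestId}"`;
    for (const [sessionUuid, lines] of Object.entries(transcriptsBySession)) {
      if (lines.some((l) => l.includes(needle))) {
        result.push({ requestId, sessionUuid });
        break; // first matching transcript wins
      }
    }
  }
  return result;
}

// Made-up sample data for illustration only.
const usageLines = [
  '{"v":1,"sid":"0123abcd","request_id":"req_A"}',
  '{"v":1,"sid":"0123abcd"}',
  '{"v":1,"sid":"0123abcd","request_id":"req_B"}',
];
const transcriptsBySession = {
  "session-one": ['{"type":"assistant","requestId":"req_B"}'],
  "session-two": ['{"type":"assistant","requestId":"req_A"}'],
};
const attributed = attributeRows(usageLines, transcriptsBySession);
console.log(attributed);
```

Rows written while the gate was off carry no `request_id` and are skipped, so attribution only covers the period after `CACHE_FIX_USAGE_LOG_REQID=on` was set.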
858
904
  ## Limitations
859
905
 
860
906
  - **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
13
13
  import { fileURLToPath } from "node:url";
14
14
  import { dirname, resolve, join } from "node:path";
15
15
  import { homedir, platform } from "node:os";
16
+ import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
16
17
 
17
18
  const __dirname = dirname(fileURLToPath(import.meta.url));
18
19
  const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
@@ -22,6 +23,8 @@ function getDefaults() {
22
23
  return {
23
24
  port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
24
25
  upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
26
+ caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
27
+ rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
25
28
  debug: process.env.CACHE_FIX_DEBUG || "",
26
29
  // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
27
30
  // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
@@ -93,10 +96,16 @@ function getPaths(plat = platform()) {
93
96
 
94
97
  function renderSystemdTemplate(template, vars) {
95
98
  const upstreamLine = vars.upstream
96
- ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
99
+ ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
100
+ : "";
101
+ const caFileLine = vars.caFile
102
+ ? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
103
+ : "";
104
+ const rejectUnauthorizedLine = vars.rejectUnauthorized
105
+ ? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
97
106
  : "";
98
107
  const debugLine = vars.debug
99
- ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
108
+ ? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
100
109
  : "";
101
110
  const hotReloadLine = vars.hotReload
102
111
  ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
@@ -111,6 +120,8 @@ function renderSystemdTemplate(template, vars) {
111
120
  .replaceAll("{{SERVER_PATH}}", vars.serverPath)
112
121
  .replaceAll("{{PORT}}", vars.port)
113
122
  .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
123
+ .replaceAll("{{CA_FILE_LINE}}", caFileLine)
124
+ .replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
114
125
  .replaceAll("{{DEBUG_LINE}}", debugLine)
115
126
  .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
116
127
  .replaceAll("{{REQUIRES_LINE}}", requiresLine)
@@ -121,10 +132,16 @@ function renderSystemdTemplate(template, vars) {
121
132
 
122
133
  function renderLaunchdTemplate(template, vars) {
123
134
  const upstreamPlist = vars.upstream
124
- ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${vars.upstream}</string>`
135
+ ? ` <key>CACHE_FIX_PROXY_UPSTREAM</key>\n <string>${xmlEscape(vars.upstream)}</string>`
136
+ : "";
137
+ const caFilePlist = vars.caFile
138
+ ? ` <key>CACHE_FIX_PROXY_CA_FILE</key>\n <string>${xmlEscape(vars.caFile)}</string>`
139
+ : "";
140
+ const rejectUnauthorizedPlist = vars.rejectUnauthorized
141
+ ? ` <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
125
142
  : "";
126
143
  const debugPlist = vars.debug
127
- ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${vars.debug}</string>`
144
+ ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${xmlEscape(vars.debug)}</string>`
128
145
  : "";
129
146
  const hotReloadPlist = vars.hotReload
130
147
  ? ` <key>CACHE_FIX_HOT_RELOAD</key>\n <string>${vars.hotReload}</string>`
@@ -134,6 +151,8 @@ function renderLaunchdTemplate(template, vars) {
134
151
  .replaceAll("{{SERVER_PATH}}", vars.serverPath)
135
152
  .replaceAll("{{PORT}}", vars.port)
136
153
  .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
154
+ .replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
155
+ .replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
137
156
  .replaceAll("{{DEBUG_PLIST}}", debugPlist)
138
157
  .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
139
158
  .replaceAll("{{WORKING_DIR}}", vars.workingDir)
@@ -186,12 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
186
205
  const rendered = renderSystemdTemplate(template, {
187
206
  node: process.execPath,
188
207
  serverPath: SERVER_PATH,
189
- port: defaults.port,
190
- upstream: defaults.upstream,
191
- debug: defaults.debug,
192
- hotReload: defaults.hotReload,
193
- workingDir: defaults.workingDir,
194
208
  requires: "",
209
+ ...defaults,
195
210
  });
196
211
  await mkdir(paths.configDir, { recursive: true });
197
212
  await writeFile(targetPath, rendered);
@@ -286,12 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
286
301
  const rendered = renderLaunchdTemplate(template, {
287
302
  node: process.execPath,
288
303
  serverPath: SERVER_PATH,
289
- port: defaults.port,
290
- upstream: defaults.upstream,
291
- debug: defaults.debug,
292
- hotReload: defaults.hotReload,
293
- workingDir: defaults.workingDir,
294
304
  logDir: paths.logDir,
305
+ ...defaults,
295
306
  });
296
307
  await mkdir(paths.configDir, { recursive: true });
297
308
  await writeFile(targetPath, rendered);
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "4.0.0",
3
+ "version": "4.1.0",
4
4
  "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
5
5
  "type": "module",
6
6
  "exports": {
@@ -26,6 +26,7 @@
26
26
  // overage_disabled_reason?: string ≤64 (optional)
27
27
  // cache_hit_rate: float 0–1
28
28
  // q5h_delta, q7d_delta: float (0 on first call after restart)
29
+ // request_id?: string ≤64 (optional, gated)
29
30
  //
30
31
  // `peak_hour` is NOT in the wire format. It can be derived from `ts` if any
31
32
  // consumer needs it.
@@ -36,6 +37,17 @@
36
37
  // CACHE_FIX_USAGE_LOG=<path> overrides the destination path only — it is NOT
37
38
  // an enable flag and never has been.
38
39
  //
40
+ // CACHE_FIX_USAGE_LOG_REQID=on emits the optional `request_id` field
41
+ // (sourced from the upstream `request-id` response header). Default-off in
42
+ // v4.1.0 to avoid breaking unpatched claude-code-meter installs, whose
43
+ // strict-object schema rejects unknown keys. claude-code-meter v0.7.0+
44
+ // accepts the optional field; the planned v4.2.0 flip to default-on
45
+ // assumes that floor. The field is the post-hoc join key against CC's
46
+ // per-session JSONL transcripts: each
47
+ // `~/.claude/projects/<project>/<session-uuid>.jsonl` carries `requestId`
48
+ // for every API call, which recovers the per-CC-session attribution that
49
+ // `sid` alone cannot provide. See docs/directives/proxy-usage-log-request-id.md.
50
+ //
39
51
  // See `docs/directives/proxy-claude-meter-compat.md` for full design.
40
52
 
41
53
  import { appendFile, mkdir } from "node:fs/promises";
@@ -91,6 +103,17 @@ export function extractMessageDeltaFields(event) {
91
103
  return { output_tokens: event.usage.output_tokens || 0 };
92
104
  }
93
105
 
106
+ // Extract upstream request-id from response headers, guarded against the
107
+ // max(64) MeterRowSchema constraint. Returns the string when valid, or
108
+ // `undefined` so the optional schema field is omitted on bad input rather
109
+ // than emitting a row that would fail meter-side validation.
110
+ export function extractRequestId(headers) {
111
+ const raw = headers?.["request-id"];
112
+ if (typeof raw !== "string") return undefined;
113
+ if (raw.length === 0 || raw.length > 64) return undefined;
114
+ return raw;
115
+ }
116
+
94
117
  function num(headers, key) {
95
118
  const v = headers?.[key];
96
119
  if (v === undefined || v === null || v === "") return null;
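Reviewer note on the `extractRequestId` guard added in this hunk: the snippet below re-declares the function (copied from the diff, minus `export`) and runs it against a few made-up header objects to show which inputs are dropped. The sample values are hypothetical. The lookup is case-sensitive, so it relies on the caller passing lower-cased header names, which is how Node's `http` module exposes incoming headers. Whether `ctx.responseHeaders` always has that shape is an assumption here.

```javascript
// Copied from the hunk above (without `export`) so the snippet runs on its own.
function extractRequestId(headers) {
  const raw = headers?.["request-id"];
  if (typeof raw !== "string") return undefined;
  if (raw.length === 0 || raw.length > 64) return undefined;
  return raw;
}

// Sample header values are made up for illustration.
const ok = extractRequestId({ "request-id": "req_example123" });
const missing = extractRequestId({});
const empty = extractRequestId({ "request-id": "" });
const tooLong = extractRequestId({ "request-id": "x".repeat(65) });
const repeated = extractRequestId({ "request-id": ["req_a", "req_b"] });
const mixedCase = extractRequestId({ "Request-Id": "req_example123" });

console.log({ ok, missing, empty, tooLong, repeated, mixedCase });
```

Only the first call yields a value. Every other case returns `undefined`, so the optional field is omitted from the row instead of producing one that the meter-side schema would reject.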
@@ -134,7 +157,7 @@ export function computeDelta(current, previous) {
134
157
  return current - previous;
135
158
  }
136
159
 
137
- export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, now = new Date() }) {
160
+ export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, requestId, now = new Date() }) {
138
161
  const s = start || {};
139
162
  const d = delta || {};
140
163
  const q = quota || {};
@@ -194,6 +217,26 @@ export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ
194
217
  record.overage_disabled_reason = q.overage_disabled_reason;
195
218
  }
196
219
 
220
+ // Optional: emit request_id when CACHE_FIX_USAGE_LOG_REQID=on AND the
221
+ // captured value is a non-empty string within the schema's max(64)
222
+ // constraint. Belt-and-braces: extractRequestId enforces these guards at
223
+ // capture time, and assembleRecord re-enforces them here so a future
224
+ // refactor that bypasses the extractor can't emit a row that would fail
225
+ // claude-code-meter's strict-object validation.
226
+ // Env read happens per-call so operators can flip it at runtime without
227
+ // proxy restart, matching the image-strip debug-gate pattern.
228
+ // Cross-repo contract: claude-code-meter v0.7.0+ accepts this optional
229
+ // field; older meter installs reject rows that carry it, so the gate
230
+ // stays default-off in v4.1.0. Default flips on in cache-fix v4.2.0.
231
+ if (
232
+ process.env.CACHE_FIX_USAGE_LOG_REQID === "on" &&
233
+ typeof requestId === "string" &&
234
+ requestId.length > 0 &&
235
+ requestId.length <= 64
236
+ ) {
237
+ record.request_id = requestId;
238
+ }
239
+
197
240
  return record;
198
241
  }
199
242
 
@@ -250,6 +293,7 @@ export default {
250
293
  const delta = extractMessageDeltaFields(ctx.event);
251
294
  const quota = parseQuotaHeaders(ctx.responseHeaders || {});
252
295
  const requestedModel = ctx.telemetry?.requestedModel || undefined;
296
+ const requestId = extractRequestId(ctx.responseHeaders || {});
253
297
 
254
298
  const record = assembleRecord({
255
299
  start,
@@ -259,6 +303,7 @@ export default {
259
303
  sid: _sid,
260
304
  prevQ5h: _lastQ5h,
261
305
  prevQ7d: _lastQ7d,
306
+ requestId,
262
307
  now: new Date(),
263
308
  });
264
309
 
@@ -1,82 +1,20 @@
1
1
  {
2
- "bootstrap-defense": {
3
- "enabled": true,
4
- "order": 45
5
- },
6
- "ttl-tier-detect": {
7
- "enabled": true,
8
- "order": 75
9
- },
10
- "fingerprint-strip": {
11
- "enabled": true,
12
- "order": 100
13
- },
14
- "image-strip": {
15
- "enabled": true,
16
- "order": 150
17
- },
18
- "sort-stabilization": {
19
- "enabled": true,
20
- "order": 200
21
- },
22
- "fresh-session-sort": {
23
- "enabled": true,
24
- "order": 250
25
- },
26
- "identity-normalization": {
27
- "enabled": true,
28
- "order": 300
29
- },
30
- "smoosh-split": {
31
- "enabled": true,
32
- "order": 320
33
- },
34
- "content-strip": {
35
- "enabled": true,
36
- "order": 330
37
- },
38
- "tool-input-normalize": {
39
- "enabled": true,
40
- "order": 340
41
- },
42
- "microcompact-stability": {
43
- "enabled": true,
44
- "order": 350
45
- },
46
- "thinking-display": {
47
- "enabled": true,
48
- "order": 360
49
- },
50
- "cache-control-normalize": {
51
- "enabled": true,
52
- "order": 400
53
- },
54
- "messages-cache-breakpoint": {
55
- "enabled": true,
56
- "order": 410
57
- },
58
- "ttl-management": {
59
- "enabled": true,
60
- "order": 500
61
- },
62
- "cache-telemetry": {
63
- "enabled": true,
64
- "order": 600
65
- },
66
- "overage-warning": {
67
- "enabled": true,
68
- "order": 610
69
- },
70
- "request-log": {
71
- "enabled": false,
72
- "order": 700
73
- },
74
- "usage-log": {
75
- "enabled": true,
76
- "order": 650
77
- },
78
- "rate-limit-log": {
79
- "enabled": true,
80
- "order": 660
81
- }
2
+ "bootstrap-defense": { "enabled": true, "order": 45 },
3
+ "ttl-tier-detect": { "enabled": true, "order": 75 },
4
+ "fingerprint-strip": { "enabled": true, "order": 100 },
5
+ "image-strip": { "enabled": true, "order": 150 },
6
+ "sort-stabilization": { "enabled": true, "order": 200 },
7
+ "fresh-session-sort": { "enabled": true, "order": 250 },
8
+ "identity-normalization": { "enabled": true, "order": 300 },
9
+ "smoosh-split": { "enabled": true, "order": 320 },
10
+ "content-strip": { "enabled": true, "order": 330 },
11
+ "tool-input-normalize": { "enabled": true, "order": 340 },
12
+ "microcompact-stability": { "enabled": true, "order": 350 },
13
+ "thinking-display": { "enabled": true, "order": 360 },
14
+ "cache-control-normalize": { "enabled": true, "order": 400 },
15
+ "messages-cache-breakpoint": { "enabled": true, "order": 410 },
16
+ "ttl-management": { "enabled": true, "order": 500 },
17
+ "cache-telemetry": { "enabled": true, "order": 600 },
18
+ "overage-warning": { "enabled": true, "order": 610 },
19
+ "request-log": { "enabled": false, "order": 700 }
82
20
  }
@@ -0,0 +1,30 @@
1
+ // Escape a value for safe rendering into a systemd `Environment=KEY=VALUE` line.
2
+ //
3
+ // Per systemd.exec(5) Environment= and systemd.unit(5) Specifier Expansion:
4
+ // - Literal `%` is the specifier-expansion marker; to embed one in a value
5
+ // the unit file must write `%%`. Without escaping, `a%20b` is parsed as
6
+ // a failed `%20` specifier expansion, systemd logs "Invalid slot" and
7
+ // silently drops the variable (empirically reproduced 2026-06-07).
8
+ // - Backslash is a C-string escape inside quoted strings AND inside the
9
+ // Environment= value parser; `\b` becomes byte 0x08 (backspace), `\n`
10
+ // becomes LF, etc. To embed a literal `\` the unit must write `\\`.
11
+ // - `"` requires `\"` (after the backslash escape rule above).
12
+ // - Whitespace requires the whole value to be quoted (`"..."`).
13
+ //
14
+ // Order matters: escape `%` first (it produces `%%`, neither of which we
15
+ // want to re-escape later), then handle `\` and `"` together inside the
16
+ // quoting branch.
17
+ export const systemdEscape = (v) => {
18
+ const percentEscaped = v.replace(/%/g, '%%');
19
+ const needsQuoting = /[\s"\\]/.test(v);
20
+ if (!needsQuoting) return percentEscaped;
21
+ return `"${percentEscaped.replace(/[\\"]/g, '\\$&')}"`;
22
+ };
23
+
24
+ export const xmlEscape = (v) => v.replace(/[&<>'"]/g, c => ({
25
+ '&': '&amp;',
26
+ '<': '&lt;',
27
+ '>': '&gt;',
28
+ "'": '&apos;',
29
+ '"': '&quot;'
30
+ })[c]);
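Reviewer note: to make the escaping rules in the new helpers concrete, the snippet below re-declares both functions (copied from the diff, minus `export`) and applies them to a few hypothetical values of the kind the installer renders into unit files and plists. The URLs and paths are made up for illustration.

```javascript
// Copied from the new helpers file above (without `export`) so the snippet is self-contained.
const systemdEscape = (v) => {
  const percentEscaped = v.replace(/%/g, '%%');
  const needsQuoting = /[\s"\\]/.test(v);
  if (!needsQuoting) return percentEscaped;
  return `"${percentEscaped.replace(/[\\"]/g, '\\$&')}"`;
};

const xmlEscape = (v) => v.replace(/[&<>'"]/g, c => ({
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  "'": '&apos;',
  '"': '&quot;'
})[c]);

// Hypothetical sample values.
const percent = systemdEscape("http://relay.example/a%20b"); // `%` doubled, no quoting needed
const spaced = systemdEscape("/etc/ssl/my certs/ca.pem");    // whitespace forces quoting
const quoted = systemdEscape('say "hi"');                    // inner quotes get a backslash
const slashed = systemdEscape(String.raw`a\b`);              // backslash is doubled and quoted
const xml = xmlEscape("http://relay.example/?a=1&b=<2>");    // XML metacharacters become entities

for (const value of [percent, spaced, quoted, slashed, xml]) console.log(value);
```

The quoting decision is made on the original value, while the backslash and quote escaping runs on the percent-escaped string. The order is safe because doubling `%` never introduces a backslash or a quote.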
package/proxy/server.mjs CHANGED
@@ -6,6 +6,47 @@ import { streamResponse, createTelemetryRecord } from "./stream.mjs";
6
6
  import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse, getFailedExtensions } from "./pipeline.mjs";
7
7
  import { startWatcher } from "./watcher.mjs";
8
8
 
9
+ // Debug logging — writes to ~/.claude/cache-fix-debug.log (override path with
10
+ // CACHE_FIX_DEBUG_LOG). Self-gated on CACHE_FIX_DEBUG=1; a no-op otherwise.
11
+ // Env is read on every call so tests (and operators flipping the flag at
12
+ // runtime) see live behavior — same pattern as image-strip's #98 gate.
13
+ import { appendFileSync, mkdirSync } from "node:fs";
14
+ import { homedir } from "node:os";
15
+ import { dirname, join } from "node:path";
16
+ import util from "node:util";
17
+
18
+ function debugLogPath() {
19
+ return process.env.CACHE_FIX_DEBUG_LOG ||
20
+ join(homedir(), ".claude", "cache-fix-debug.log");
21
+ }
22
+
23
+ // Never spread raw headers to the log: Authorization / x-api-key / cookies
24
+ // must never persist to disk. Same discipline as bootstrap-defense.mjs's
25
+ // audit-record contract — extract named scalars only.
26
+ const SENSITIVE_HEADERS = new Set([
27
+ "authorization",
28
+ "x-api-key",
29
+ "cookie",
30
+ "set-cookie",
31
+ "proxy-authorization",
32
+ ]);
33
+
34
+ function redactHeaders(headers) {
35
+ const out = {};
36
+ for (const [k, v] of Object.entries(headers || {})) {
37
+ out[k] = SENSITIVE_HEADERS.has(k.toLowerCase()) ? "[REDACTED]" : v;
38
+ }
39
+ return out;
40
+ }
41
+
42
+ function debugLog(...args) {
43
+ if (process.env.CACHE_FIX_DEBUG !== "1") return;
44
+ const path = debugLogPath();
45
+ try { mkdirSync(dirname(path), { recursive: true }); } catch {}
46
+ const line = `[${new Date().toISOString()}] ${util.format(...args)}\n`;
47
+ try { appendFileSync(path, line); } catch {}
48
+ }
49
+
9
50
  function collectBody(req) {
10
51
  return new Promise((resolve, reject) => {
11
52
  const chunks = [];
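Reviewer note on the header redaction added in this hunk: the snippet below re-declares `SENSITIVE_HEADERS` and `redactHeaders` (copied from the diff) and runs them on a hypothetical header object. The token values are placeholders. It shows that matching is case-insensitive while the original key casing is preserved in the output.

```javascript
// Copied from the hunk above so the snippet runs standalone.
const SENSITIVE_HEADERS = new Set([
  "authorization",
  "x-api-key",
  "cookie",
  "set-cookie",
  "proxy-authorization",
]);

function redactHeaders(headers) {
  const out = {};
  for (const [k, v] of Object.entries(headers || {})) {
    out[k] = SENSITIVE_HEADERS.has(k.toLowerCase()) ? "[REDACTED]" : v;
  }
  return out;
}

// Hypothetical request headers; the credential values are placeholders.
const original = {
  "Authorization": "Bearer placeholder-token",
  "x-api-key": "placeholder-key",
  "content-type": "application/json",
};
const redacted = redactHeaders(original);
console.log(redacted);
```

The function builds a new object and leaves its input untouched, so the live request headers that are later forwarded upstream are not affected by logging.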
@@ -74,7 +115,13 @@ async function handleMessages(clientReq, clientRes) {
74
115
  });
75
116
 
76
117
  const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "messages");
77
- if (pre.handled) return;
118
+ if (pre.handled) {
119
+ debugLog("[PROXY] handled internally without upstream request",
120
+ "method:", clientReq.method, "url:", clientReq.url,
121
+ "status:", clientRes.statusCode,
122
+ "response headers:", redactHeaders(clientRes.getHeaders()));
123
+ return;
124
+ }
78
125
  const { parsed, forwardBody, meta } = pre;
79
126
 
80
127
  const requestedModel = parsed?.model || null;
@@ -88,6 +135,7 @@ async function handleMessages(clientReq, clientRes) {
88
135
  abortController.signal
89
136
  ));
90
137
  } catch (err) {
138
+ debugLog("[PROXY] forwardRequest error:", err.message);
91
139
  if (abortController.signal.aborted) return;
92
140
  clientRes.writeHead(502, { "content-type": "application/json" });
93
141
  clientRes.end(JSON.stringify({ error: "upstream_error", message: err.message }));
@@ -99,6 +147,11 @@ async function handleMessages(clientReq, clientRes) {
   // socket carried the request without each one re-instrumenting upstream.
   meta._upstreamConnectionId = upstreamConnectionId ?? null;
 
+  debugLog("[UPSTREAM -> PROXY -> CLAUDE] RESPONSE",
+    "status:", statusCode, "message:", upstreamRes.statusMessage,
+    "upstream headers:", redactHeaders(upstreamRes.headers),
+    "proxy headers:", redactHeaders(responseHeaders));
+
   if (extSnapshot.length > 0) {
     const resCtx = { status: statusCode, headers: responseHeaders, meta };
     await runOnResponseStart(resCtx, extSnapshot);
@@ -274,16 +327,44 @@ function handleNotFound(_req, res) {
  */
 export function createProxyServer() {
   return http.createServer((req, res) => {
-    if (req.method === "GET" && req.url === "/health") {
-      return handleHealth(req, res);
-    }
-    if (req.method === "POST" && req.url?.startsWith("/v1/messages")) {
-      return handleMessages(req, res);
-    }
-    if (req.url?.startsWith("/api/claude_cli/bootstrap")) {
-      return handleBootstrap(req, res);
-    }
-    handleNotFound(req, res);
+    // Async IIFE: handleMessages/handleBootstrap return promises, so we have
+    // to await them inside the try/catch — a bare return would let rejections
+    // escape to unhandledRejection and (on Node 15+) crash the process.
+    (async () => {
+      try {
+        debugLog("[CLAUDE -> PROXY] REQUEST",
+          "method:", req.method, "url:", req.url,
+          "headers:", redactHeaders(req.headers));
+
+        // Wrap res.write/res.end to log chunk-level activity when debug is on.
+        // These are sync monkey-patches; the inner debugLog self-gates so the
+        // overhead is negligible when CACHE_FIX_DEBUG is unset.
+        const originalWrite = res.write;
+        const originalEnd = res.end;
+        res.write = function (chunk, ...args) {
+          debugLog(`[PROXY -> CLAUDE] Send chunk. Size: ${chunk ? chunk.length : 0} bytes`);
+          return originalWrite.apply(res, [chunk, ...args]);
+        };
+        res.end = function (chunk, ...args) {
+          debugLog("[PROXY -> CLAUDE] Close connection (res.end)");
+          return originalEnd.apply(res, [chunk, ...args]);
+        };
+
+        if (req.method === "GET" && req.url === "/health") return handleHealth(req, res);
+        if (req.method === "POST" && req.url?.startsWith("/v1/messages")) return await handleMessages(req, res);
+        if (req.url?.startsWith("/api/claude_cli/bootstrap")) return await handleBootstrap(req, res);
+        debugLog("ERROR: handler not found for req.url=", req.url, "method=", req.method);
+        handleNotFound(req, res);
+      } catch (error) {
+        debugLog("REQUEST HANDLER ERROR:", error?.message, error?.stack);
+        // Generic body: do NOT echo error.message (may include internal paths,
+        // upstream URLs, or other server state).
+        if (!res.headersSent) {
+          res.writeHead(500, { "content-type": "application/json" });
+          res.end(JSON.stringify({ error: "internal_proxy_error" }));
+        }
+      }
+    })();
   });
 }
 
@@ -183,9 +183,23 @@ function getAgent(isHTTPS, hostname) {
   return agent;
 }
 
+// Build the upstream URL by concatenating the configured base (with any path
+// component preserved) with the client request URL. The historical
+// `new URL(clientReq.url, base)` approach is RFC 3986 relative-resolution,
+// which drops the base's path component when the relative is path-absolute
+// (`/v1/messages`). That breaks corp-proxy / mirror setups where the
+// configured upstream is `https://corp-proxy.example.net/anthropic-mirror`
+// — the request would land at `https://corp-proxy.example.net/v1/messages`
+// with `/anthropic-mirror` silently dropped. See PR #188 / @nisqatsi.
+export function buildUpstreamUrl(base, clientUrl) {
+  const trimmedBase = base.endsWith("/") ? base.slice(0, -1) : base;
+  const relative = clientUrl.startsWith("/") ? clientUrl : "/" + clientUrl;
+  return new URL(trimmedBase + relative);
+}
+
 export function forwardRequest(clientReq, body, signal) {
   return new Promise((resolve, reject) => {
-    const upstreamUrl = new URL(clientReq.url, config.upstream);
+    const upstreamUrl = buildUpstreamUrl(config.upstream, clientReq.url);
 
     const headers = buildUpstreamHeaders(clientReq.headers, upstreamUrl.hostname);
     if (body) {
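Note on the `buildUpstreamUrl` hunk above: the path-dropping behaviour its comment describes can be reproduced outside Node. The sketch below uses Python's `urllib.parse.urljoin`, which resolves a path-absolute reference against a base the same way `new URL(relative, base)` does, and compares it with plain concatenation. `build_upstream_url` is a hypothetical Python analogue written only for illustration (it is not part of the package), and the query string is an invented example.

```python
from urllib.parse import urljoin


def build_upstream_url(base: str, client_url: str) -> str:
    """Hypothetical analogue of buildUpstreamUrl: concatenate, keep the base path."""
    # Strip a single trailing slash from the base so the join has exactly one "/".
    trimmed_base = base[:-1] if base.endswith("/") else base
    # Make sure the client URL starts with "/" before appending it.
    relative = client_url if client_url.startswith("/") else "/" + client_url
    return trimmed_base + relative


base = "https://corp-proxy.example.net/anthropic-mirror"
client_url = "/v1/messages?beta=true"  # assumed example request URL

# RFC 3986 resolution: a path-absolute reference replaces the base path.
resolved = urljoin(base, client_url)

# Concatenation: the mirror prefix configured in the base survives.
joined = build_upstream_url(base, client_url)

print("resolved:", resolved)
print("joined:  ", joined)
```

With a base that has no path component (for example `https://api.anthropic.com`), both approaches produce the same URL, so the difference only shows up for mirror or corp-proxy setups.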
@@ -10,6 +10,8 @@ Restart=on-failure
 RestartSec=5
 Environment=CACHE_FIX_PROXY_PORT={{PORT}}
 {{UPSTREAM_LINE}}
+{{CA_FILE_LINE}}
+{{REJECT_UNAUTHORIZED_LINE}}
 {{DEBUG_LINE}}
 {{HOT_RELOAD_LINE}}
 WorkingDirectory={{WORKING_DIR}}
@@ -14,6 +14,8 @@
 <key>CACHE_FIX_PROXY_PORT</key>
 <string>{{PORT}}</string>
 {{UPSTREAM_PLIST}}
+{{CA_FILE_PLIST}}
+{{REJECT_UNAUTHORIZED_PLIST}}
 {{DEBUG_PLIST}}
 {{HOT_RELOAD_PLIST}}
 </dict>
@@ -0,0 +1,229 @@
+"""Shared cache analysis helpers for hooks and MCP tools.
+
+Reference Python helper for consumers that want to read cache-fix's
+``quota-status`` output and reason about cache-state from a Claude Code
+transcript. Used by host-side hooks (e.g. ``~/.claude/hooks/
+context-advisor-analyze.py``) and MCP tools that need quota-aware
+behavior.
+
+Consumer pattern: copy or symlink this file into ``~/.claude/mcp/`` (or
+wherever your hook / tool expects to import from) and ``from cache_analysis
+import read_quota_status, analyze_transcript`` etc. The file ships in the
+cache-fix npm package's ``tools/`` directory; npm consumers can reference
+``node_modules/claude-code-cache-fix/tools/cache_analysis.py`` directly or
+copy it out for non-npm installs.
+
+The ``read_quota_status()`` helper handles both cache-fix v3.5.0+ (proxy
+mode, per-session split at ``~/.claude/quota-status/account.json``) and
+v3.4.x and earlier / preload mode (single global
+``~/.claude/quota-status.json``). See the README's "Migration:
+v3.4.x → v3.5.0+" section.
+"""
+
+import json
+import subprocess
+from datetime import datetime, timezone
+
+CACHE_TTL_5M = 300  # 5-minute ephemeral TTL
+CACHE_TTL_1H = 3600  # 1-hour extended TTL
+CONTEXT_THRESHOLD = 50_000  # Minimum tokens to recommend compact
+COMPACT_RESULT_ESTIMATE = 12_000  # Estimated tokens after compaction
+CACHE_CREATE_RATE_5M = 3.75  # Opus $/MTok for 5min cache writes
+CACHE_CREATE_RATE_1H = 7.50  # Opus $/MTok for 1h cache writes
+
+
+def read_tail_lines(filepath, n=300):
+    """Read last N lines efficiently using tail."""
+    try:
+        result = subprocess.run(
+            ["tail", "-n", str(n), filepath],
+            capture_output=True, text=True, timeout=5,
+        )
+        return result.stdout.splitlines()
+    except Exception:
+        return []
+
+
+def parse_assistant_usage(lines):
+    """Extract assistant messages with usage data from transcript lines."""
+    messages = []
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            obj = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if obj.get("type") != "assistant":
+            continue
+        msg = obj.get("message", {})
+        usage = msg.get("usage")
+        ts = obj.get("timestamp")
+        if not usage or not ts:
+            continue
+        cr = usage.get("cache_creation_input_tokens", 0)
+        rd = usage.get("cache_read_input_tokens", 0)
+        inp = usage.get("input_tokens", 0)
+        out = usage.get("output_tokens", 0)
+        if cr == 0 and rd == 0 and inp == 0:
+            continue
+        # Extract TTL tier breakdown if available
+        cr_detail = usage.get("cache_creation", {})
+        cr_1h = cr_detail.get("ephemeral_1h_input_tokens", 0) if isinstance(cr_detail, dict) else 0
+        cr_5m = cr_detail.get("ephemeral_5m_input_tokens", 0) if isinstance(cr_detail, dict) else 0
+        messages.append({
+            "timestamp": ts,
+            "input_tokens": inp,
+            "cache_creation": cr,
+            "cache_read": rd,
+            "output_tokens": out,
+            "total_in": cr + rd + inp,
+            "cr_1h": cr_1h,
+            "cr_5m": cr_5m,
+        })
+    return messages
+
+
+def detect_cache_ttl(messages):
+    """Detect the effective cache TTL from recent API call usage data.
+
+    If any recent calls show ephemeral_1h_input_tokens > 0, the account
+    is on the 1-hour tier. Otherwise, assume 5-minute ephemeral.
+    Returns (ttl_seconds, tier_name).
+    """
+    recent = messages[-10:] if len(messages) >= 10 else messages
+    has_1h = any(m.get("cr_1h", 0) > 0 for m in recent)
+    has_5m = any(m.get("cr_5m", 0) > 0 for m in recent)
+
+    if has_1h:
+        return CACHE_TTL_1H, "1h"
+    if has_5m:
+        return CACHE_TTL_5M, "5m"
+    # No cache_creation breakdown available — conservative default
+    return CACHE_TTL_5M, "5m (default)"
+
+
+def estimate_thinking_overhead(messages):
+    """Estimate thinking block replay overhead.
+
+    Thinking blocks from prior turns replay as input tokens. Heuristic:
+    cumulative output_tokens approximates thinking content that gets replayed.
+    """
+    if len(messages) < 2:
+        return 0
+    return sum(m["output_tokens"] for m in messages[:-1])
+
+
+def format_tokens(n):
+    if n >= 1_000_000:
+        return f"{n / 1_000_000:.1f}M"
+    if n >= 1_000:
+        return f"{n / 1_000:.0f}k"
+    return str(n)
+
+
+def format_duration(seconds):
+    if seconds >= 3600:
+        return f"{seconds / 3600:.1f}h"
+    return f"{int(seconds / 60)}m"
+
+
+def estimate_savings(total_context, ttl_tier="5m"):
+    """Estimate $ savings from compacting before a cold start.
+
+    Rate depends on the active cache TTL tier — 1h cache writes are 2x the
+    5m rate. Caller should pass the tier returned by detect_cache_ttl().
+    Default is the conservative 5m rate for backward compatibility.
+    """
+    rate = CACHE_CREATE_RATE_1H if ttl_tier.startswith("1h") else CACHE_CREATE_RATE_5M
+    cold_cost = (total_context / 1_000_000) * rate
+    compact_cost = (COMPACT_RESULT_ESTIMATE / 1_000_000) * rate
+    return cold_cost - compact_cost
+
+
+def read_quota_status():
+    """Read current quota utilization from cache-fix's quota-status file.
+
+    Written by the cache-fix interceptor from API response headers. Path
+    depends on cache-fix version:
+      - v3.5.0+ (proxy mode, per-session split): ~/.claude/quota-status/account.json
+      - v3.4.x and earlier (or preload mode): ~/.claude/quota-status.json (flat)
+
+    Tries the v3.5.0+ path first, falls back to the legacy flat path. A
+    candidate file whose JSON parses but isn't a dict (e.g. a partial write
+    that lands as ``[]`` or ``null``) is skipped so the next candidate gets
+    a chance — and so callers never receive a non-dict and break on
+    ``status.get(...)`` accessors downstream.
+
+    Returns dict with five_hour/seven_day pct (and other fields written by
+    cache-fix's response-header capture), or None if no candidate yields a
+    dict-shaped payload.
+    """
+    import os
+    for quota_file in (
+        os.path.expanduser("~/.claude/quota-status/account.json"),
+        os.path.expanduser("~/.claude/quota-status.json"),
+    ):
+        try:
+            with open(quota_file) as f:
+                data = json.load(f)
+        except (OSError, json.JSONDecodeError):
+            continue
+        if isinstance(data, dict):
+            return data
+        # Valid JSON but wrong shape — try the next candidate.
+    return None
+
+
+def analyze_transcript(transcript_path):
+    """Full analysis of a transcript. Returns a dict with all cache state info.
+
+    Returns None if analysis can't be performed (no data, etc).
+    """
+    lines = read_tail_lines(transcript_path, 300)
+    if not lines:
+        return None
+
+    messages = parse_assistant_usage(lines)
+    if not messages:
+        return None
+
+    last = messages[-1]
+    try:
+        last_ts = datetime.fromisoformat(last["timestamp"].replace("Z", "+00:00"))
+    except (ValueError, KeyError):
+        return None
+
+    now = datetime.now(timezone.utc)
+    gap_seconds = (now - last_ts).total_seconds()
+
+    context_tokens = last["total_in"]
+    thinking_overhead = estimate_thinking_overhead(messages)
+    total_with_thinking = context_tokens + thinking_overhead
+
+    ttl_seconds, ttl_tier = detect_cache_ttl(messages)
+    cache_expired = gap_seconds > ttl_seconds
+
+    # Last few turns' cache efficiency
+    recent = messages[-5:] if len(messages) >= 5 else messages
+    recent_cr = sum(m["cache_creation"] for m in recent)
+    recent_total = sum(m["total_in"] for m in recent)
+    cr_pct = (recent_cr / recent_total * 100) if recent_total else 0
+
+    quota = read_quota_status()
+
+    return {
+        "context_tokens": context_tokens,
+        "thinking_overhead": thinking_overhead,
+        "total_with_thinking": total_with_thinking,
+        "gap_seconds": gap_seconds,
+        "cache_expired": cache_expired,
+        "ttl_seconds": ttl_seconds,
+        "ttl_tier": ttl_tier,
+        "last_timestamp": last["timestamp"],
+        "num_messages": len(messages),
+        "recent_cr_pct": cr_pct,
+        "savings": estimate_savings(total_with_thinking, ttl_tier) if cache_expired else 0,
+        "quota": quota,
+    }
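Note on the helper above: the expiry check and the savings estimate in `analyze_transcript` come down to a few lines of arithmetic. The sketch below reproduces them standalone so the numbers can be checked by hand. The constants and the `estimate_savings` body are copied from the helper and reflect its assumptions rather than authoritative pricing. `is_cache_expired`, the fixed `now`, the sample timestamp, and the 200,000-token context are hypothetical values chosen for illustration.

```python
from datetime import datetime, timezone

# Constants copied from the helper; they are its assumptions, not a price list.
CACHE_TTL_5M = 300
CACHE_TTL_1H = 3600
COMPACT_RESULT_ESTIMATE = 12_000
CACHE_CREATE_RATE_5M = 3.75
CACHE_CREATE_RATE_1H = 7.50


def estimate_savings(total_context, ttl_tier="5m"):
    """Same arithmetic as the helper: cold-start write cost minus post-compact cost."""
    rate = CACHE_CREATE_RATE_1H if ttl_tier.startswith("1h") else CACHE_CREATE_RATE_5M
    cold_cost = (total_context / 1_000_000) * rate
    compact_cost = (COMPACT_RESULT_ESTIMATE / 1_000_000) * rate
    return cold_cost - compact_cost


def is_cache_expired(last_timestamp, ttl_seconds, now):
    """Hypothetical wrapper around the gap check done in analyze_transcript."""
    last_ts = datetime.fromisoformat(last_timestamp.replace("Z", "+00:00"))
    return (now - last_ts).total_seconds() > ttl_seconds


# Hypothetical inputs: the last assistant turn was 10 minutes before "now".
now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
last = "2025-01-01T11:50:00Z"

expired_5m = is_cache_expired(last, CACHE_TTL_5M, now)  # 600 s gap vs 300 s TTL
expired_1h = is_cache_expired(last, CACHE_TTL_1H, now)  # 600 s gap vs 3600 s TTL

savings_5m = estimate_savings(200_000, "5m")
savings_1h = estimate_savings(200_000, "1h")

print(f"5m tier: expired={expired_5m}, savings=${savings_5m:.3f}")
print(f"1h tier: expired={expired_1h}, savings=${savings_1h:.3f}")
```

For a 200,000-token context the estimate is (200,000 - 12,000) / 1,000,000 times the tier rate, which is about $0.705 on the 5m tier and $1.41 on the 1h tier under the helper's constants. In `analyze_transcript` the savings figure is only reported when the cache has already expired, so with a 10-minute gap it would be non-zero on the 5m tier and 0 on the 1h tier.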