npm - claude-code-cache-fix - Versions diffs - 4.0.0 → 4.1.0 - Mend

claude-code-cache-fix 4.0.0 → 4.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +53 -7
package/bin/install-service.mjs +25 -14
package/package.json +1 -1
package/proxy/extensions/usage-log.mjs +46 -1
package/proxy/extensions.json +18 -80
package/proxy/helpers.mjs +30 -0
package/proxy/server.mjs +92 -11
package/proxy/upstream.mjs +15 -1
package/templates/cache-fix-proxy.service.template +2 -0
package/templates/com.cnighswonger.cache-fix-proxy.plist.template +2 -0
package/tools/cache_analysis.py +229 -0

package/README.md CHANGED

@@ -6,7 +6,7 @@ English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](
 Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
-> **v3.0.3** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
+> **v4.0.0** — Local HTTP proxy with a pipeline of cost-impact and observability extensions. Two long-standing defaults flipped: `thinking-block-sanitize` v1 is on by default (mitigates the thinking-desync `400` wedge — [#63147](https://github.com/anthropics/claude-code/issues/63147)) and in-process extension hot-reload is opt-in (`CACHE_FIX_HOT_RELOAD=on`). A/B baseline (v3.0.0 on v2.1.117): **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v4.0.0)
 > **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
@@ -25,11 +25,11 @@ node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
 ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
 ```
-That's it. The proxy applies all 7 cache-fix extensions automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
+That's it. The proxy applies its default extension pipeline automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
 ### What the proxy does
-On every `/v1/messages` request, 9 extensions run in order:
+On every `/v1/messages` request, the pipeline runs an ordered chain of extensions covering cache stability, observability, thinking-desync mitigation, image, microcompact, breakpoint, bootstrap-channel, and other surfaces. Several are gated behind env vars documented in their own sections below; bootstrap-channel handling defaults to `audit` mode. The headliners:
 | Extension | What it fixes |
 |-----------|--------------|
@@ -116,7 +116,7 @@ docker run -d --name cache-fix-proxy --restart=always -p 9801:9801 \
   ghcr.io/cnighswonger/claude-code-cache-fix:latest
 ```
-Image tags: `latest`, `3`, `3.2`, `3.2.1` (semver-ladder, so `3` always points to the newest 3.x). `latest` always tracks the newest tagged release.
+Image tags: `latest`, `4`, `4.0`, `4.0.0` (semver-ladder, so `4` always points to the newest 4.x). `latest` always tracks the newest tagged release.
 **Linux note:** the chained-upstream `host.docker.internal` example below is automatic on Docker Desktop (macOS / Windows). On plain Linux Docker Engine you usually need `--add-host=host.docker.internal:host-gateway` so the name resolves to the host bridge. Without it, the container's name lookup fails and the proxy can't reach the upstream service running on the host. Example chaining cache-fix proxy through `llm-relay` running on the host:
@@ -282,7 +282,7 @@ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
 **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
-**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 added explicit handling for this path. v3.7.1 extends it to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body).
+**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix shipped explicit handling for this path in v3.7.0 and extended it in v3.7.1 to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body). Stable in the current v4.x line.
 Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
@@ -401,7 +401,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
 **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
-**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
+**Supply chain:** Proxy mode: small focused extension modules in `proxy/extensions/` (most under a few hundred lines; the pipeline is composable, so you can read any single one in isolation). Preload mode: single unminified file (`preload.mjs`). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published — tracked as a follow-up.
 **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
@@ -421,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
 ## How it works
-**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
+**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. A pipeline of extension modules processes each request — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers, sanitizing thinking blocks, recording telemetry, and more. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
 **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
@@ -855,6 +855,52 @@ The preload interceptor includes monitoring for microcompact degradation, false
 See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
+### `usage-log` extension and the `MeterRowSchema v:1` wire format
+The `usage-log` extension (opt-in via `proxy/extensions.json`) appends one JSON line per API response to `~/.claude/usage.jsonl`. The row shape is `MeterRowSchema v:1` — the cross-repo contract validated by [`claude-code-meter`](https://github.com/cnighswonger/claude-code-meter)'s strict schema. Every field below is captured per call:
+| Field | Type | Source |
+|---|---|---|
+| `v` | literal `1` | constant |
+| `ts` | ISO-8601 datetime | server time at row emission |
+| `sid` | 8-char lowercase hex | proxy session id, sticky for the proxy's lifetime |
+| `model` | string ≤64 | `message_start.message.model` from the response stream |
+| `requested_model` | string ≤64 (optional) | request body `model` field |
+| `model_mismatch` | bool (optional) | true when `requested_model && model && requested_model !== model` |
+| `speed` | `"standard"` / `"fast"` / `""` | response `usage.speed` |
+| `service_tier` | string ≤32 | response `usage.service_tier` |
+| `input_tokens` | int ≥0 | response usage |
+| `output_tokens` | int ≥0 | response usage |
+| `cache_creation_input_tokens` | int ≥0 | response usage |
+| `cache_read_input_tokens` | int ≥0 | response usage |
+| `ephemeral_1h_input_tokens` | int ≥0 | response usage |
+| `ephemeral_5m_input_tokens` | int ≥0 | response usage |
+| `web_search_requests` | int ≥0 | response usage |
+| `q5h` / `q7d` | float 0–2 | `anthropic-ratelimit-unified-{5h,7d}-utilization` headers |
+| `q5h_reset` / `q7d_reset` | int (unix sec) | corresponding reset headers |
+| `qstatus`, `qoverage`, `qclaim` | lowercase enums | unified status / overage / claim headers |
+| `qfallback_pct` | float 0–1 | unified fallback percentage |
+| `qoverage_util` | float ≥0 (optional) | overage utilization header |
+| `qrepresentative_claim` | string ≤16 (optional) | representative-claim header |
+| `org_id` | 16-char hex (optional) | `sha256(anthropic-organization-id).slice(0, 16)` — never raw |
+| `overage_disabled_reason` | string ≤64 (optional) | overage-disabled-reason header |
+| `cache_hit_rate` | float 0–1 | `cache_read_input_tokens / (input + cache_creation + cache_read)` |
+| `q5h_delta`, `q7d_delta` | float | per-call delta from the previous row's q5h/q7d; 0 on first call after restart |
+| `request_id` | string ≤64 (optional, gated) | upstream `request-id` response header. Default-off; enable with `CACHE_FIX_USAGE_LOG_REQID=on`. **Cross-repo gate:** `claude-code-meter >= v0.7.0` accepts the optional field; older meter installs reject unknown keys via the strict-object schema. |
+**Why `request_id` matters operationally.** The `sid` field is generated once at proxy boot and shared across every CC session that proxy serves. On hosts running multiple concurrent CC sessions through one proxy (common in agent fleets), every session's rows collapse into the same `sid` — there's no way to ask "which session burned 80% of today's Opus tokens?" from `usage.jsonl` alone. CC's per-session JSONL transcripts at `~/.claude/projects/<project>/<session-uuid>.jsonl` already carry `requestId` for every API call. Capturing the same value in the meter row makes the post-hoc join trivial:
+```bash
+# Find which CC session each usage.jsonl row belongs to:
+for row in $(jq -c . < ~/.claude/usage.jsonl); do
+  req=$(jq -r '.request_id // empty' <<< "$row")
+  [ -z "$req" ] && continue
+  grep -l "\"requestId\":\"$req\"" ~/.claude/projects/*/*.jsonl
+done
+```
+The filename of the matching transcript is the CC session UUID, recovering per-session attribution for every meter row that was emitted with the field on.
 ## Limitations
 - **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
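Instead of one `grep` per usage row, the same `request_id` join can be done in a single pass by indexing the transcripts first. A minimal sketch in Node, assuming the files have already been read into strings (file I/O is omitted so the example stays self-contained) and assuming, as the README describes, that transcript lines carry a top-level `requestId`. `parseLine`, `buildRequestIndex`, and `attributeRows` are hypothetical names, not part of the package.

```javascript
// Hypothetical helpers (not part of claude-code-cache-fix): attribute
// usage.jsonl rows to Claude Code sessions by joining the meter row's
// `request_id` against the `requestId` carried by transcript lines.

// Parse one JSONL line, returning null for blank or corrupt lines.
function parseLine(line) {
  if (!line.trim()) return null;
  try {
    return JSON.parse(line);
  } catch {
    return null;
  }
}

// Build requestId -> session UUID from transcript contents.
// `transcripts` maps a transcript basename ("<session-uuid>.jsonl") to its
// raw JSONL text. Assumes `requestId` is a top-level key on each line.
function buildRequestIndex(transcripts) {
  const index = new Map();
  for (const [basename, text] of Object.entries(transcripts)) {
    const sessionId = basename.replace(/\.jsonl$/, "");
    for (const line of text.split("\n")) {
      const entry = parseLine(line);
      if (entry && typeof entry.requestId === "string") {
        index.set(entry.requestId, sessionId);
      }
    }
  }
  return index;
}

// Pair every usage row that carries request_id with its session UUID
// (null when no transcript matches). Rows written with the gate off have
// no request_id and are skipped because they cannot be attributed.
function attributeRows(usageText, transcripts) {
  const index = buildRequestIndex(transcripts);
  const out = [];
  for (const line of usageText.split("\n")) {
    const row = parseLine(line);
    if (!row || typeof row.request_id !== "string") continue;
    out.push({ row, sessionId: index.get(row.request_id) ?? null });
  }
  return out;
}

// Inline sample data with made-up values.
const usage = [
  '{"v":1,"sid":"ab12cd34","request_id":"req_A"}',
  '{"v":1,"sid":"ab12cd34"}',
].join("\n");
const transcripts = {
  "1111-aaaa.jsonl": '{"type":"assistant","requestId":"req_A"}\n',
};
console.log(attributeRows(usage, transcripts).map((r) => r.sessionId));
// prints [ '1111-aaaa' ]
```

In practice the inputs would come from `fs.readFileSync` on `~/.claude/usage.jsonl` and on each transcript under `~/.claude/projects/`. A `request_id` that appears in no transcript maps to `null` rather than being dropped, so attribution gaps stay visible.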

package/bin/install-service.mjs CHANGED

@@ -13,6 +13,7 @@ import { spawn } from "node:child_process";
 import { fileURLToPath } from "node:url";
 import { dirname, resolve, join } from "node:path";
 import { homedir, platform } from "node:os";
+import { systemdEscape, xmlEscape } from "../proxy/helpers.mjs";
 const __dirname = dirname(fileURLToPath(import.meta.url));
 const TEMPLATE_DIR = resolve(__dirname, "..", "templates");
@@ -22,6 +23,8 @@ function getDefaults() {
   return {
     port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
     upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
+    caFile: process.env.CACHE_FIX_PROXY_CA_FILE || "",
+    rejectUnauthorized: process.env.CACHE_FIX_PROXY_REJECT_UNAUTHORIZED || "",
     debug: process.env.CACHE_FIX_DEBUG || "",
     // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
     // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
@@ -93,10 +96,16 @@ function getPaths(plat = platform()) {
 function renderSystemdTemplate(template, vars) {
   const upstreamLine = vars.upstream
-    ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${vars.upstream}`
+    ? `Environment=CACHE_FIX_PROXY_UPSTREAM=${systemdEscape(vars.upstream)}`
+    : "";
+  const caFileLine = vars.caFile
+    ? `Environment=CACHE_FIX_PROXY_CA_FILE=${systemdEscape(vars.caFile)}`
+    : "";
+  const rejectUnauthorizedLine = vars.rejectUnauthorized
+    ? `Environment=CACHE_FIX_PROXY_REJECT_UNAUTHORIZED=${systemdEscape(vars.rejectUnauthorized)}`
     : "";
   const debugLine = vars.debug
-    ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
+    ? `Environment=CACHE_FIX_DEBUG=${systemdEscape(vars.debug)}`
     : "";
   const hotReloadLine = vars.hotReload
     ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
@@ -111,6 +120,8 @@ function renderSystemdTemplate(template, vars) {
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
+    .replaceAll("{{CA_FILE_LINE}}", caFileLine)
+    .replaceAll("{{REJECT_UNAUTHORIZED_LINE}}", rejectUnauthorizedLine)
     .replaceAll("{{DEBUG_LINE}}", debugLine)
     .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
     .replaceAll("{{REQUIRES_LINE}}", requiresLine)
@@ -121,10 +132,16 @@ function renderSystemdTemplate(template, vars) {
 function renderLaunchdTemplate(template, vars) {
   const upstreamPlist = vars.upstream
-    ? `        <key>CACHE_FIX_PROXY_UPSTREAM</key>\n        <string>${vars.upstream}</string>`
+    ? `        <key>CACHE_FIX_PROXY_UPSTREAM</key>\n        <string>${xmlEscape(vars.upstream)}</string>`
+    : "";
+  const caFilePlist = vars.caFile
+    ? `        <key>CACHE_FIX_PROXY_CA_FILE</key>\n        <string>${xmlEscape(vars.caFile)}</string>`
+    : "";
+  const rejectUnauthorizedPlist = vars.rejectUnauthorized
+    ? `        <key>CACHE_FIX_PROXY_REJECT_UNAUTHORIZED</key>\n        <string>${xmlEscape(vars.rejectUnauthorized)}</string>`
     : "";
   const debugPlist = vars.debug
-    ? `        <key>CACHE_FIX_DEBUG</key>\n        <string>${vars.debug}</string>`
+    ? `        <key>CACHE_FIX_DEBUG</key>\n        <string>${xmlEscape(vars.debug)}</string>`
     : "";
   const hotReloadPlist = vars.hotReload
     ? `        <key>CACHE_FIX_HOT_RELOAD</key>\n        <string>${vars.hotReload}</string>`
@@ -134,6 +151,8 @@ function renderLaunchdTemplate(template, vars) {
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
+    .replaceAll("{{CA_FILE_PLIST}}", caFilePlist)
+    .replaceAll("{{REJECT_UNAUTHORIZED_PLIST}}", rejectUnauthorizedPlist)
     .replaceAll("{{DEBUG_PLIST}}", debugPlist)
     .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
@@ -186,12 +205,8 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
   const rendered = renderSystemdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    hotReload: defaults.hotReload,
-    workingDir: defaults.workingDir,
     requires: "",
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
@@ -286,12 +301,8 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
   const rendered = renderLaunchdTemplate(template, {
     node: process.execPath,
     serverPath: SERVER_PATH,
-    port: defaults.port,
-    upstream: defaults.upstream,
-    debug: defaults.debug,
-    hotReload: defaults.hotReload,
-    workingDir: defaults.workingDir,
     logDir: paths.logDir,
+    ...defaults,
   });
   await mkdir(paths.configDir, { recursive: true });
   await writeFile(targetPath, rendered);
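The optional `Environment=` lines in `renderSystemdTemplate` now follow the same ternary shape five times, which could also be driven from a table. A hedged sketch of that refactor for the systemd side only: `OPTIONAL_ENV` and `renderOptionalEnv` are hypothetical names, the escape function repeats the `systemdEscape` logic from `proxy/helpers.mjs` so the block runs standalone, and, unlike the diff, it also escapes `CACHE_FIX_HOT_RELOAD` (the diff writes that value unescaped).

```javascript
// Same escaping logic as proxy/helpers.mjs, repeated so this runs standalone.
const systemdEscape = (v) => {
  const percentEscaped = v.replace(/%/g, "%%");
  if (!/[\s"\\]/.test(v)) return percentEscaped;
  return `"${percentEscaped.replace(/[\\"]/g, "\\$&")}"`;
};

// Hypothetical table: [template placeholder, env var name, key in `vars`].
const OPTIONAL_ENV = [
  ["{{UPSTREAM_LINE}}", "CACHE_FIX_PROXY_UPSTREAM", "upstream"],
  ["{{CA_FILE_LINE}}", "CACHE_FIX_PROXY_CA_FILE", "caFile"],
  ["{{REJECT_UNAUTHORIZED_LINE}}", "CACHE_FIX_PROXY_REJECT_UNAUTHORIZED", "rejectUnauthorized"],
  ["{{DEBUG_LINE}}", "CACHE_FIX_DEBUG", "debug"],
  ["{{HOT_RELOAD_LINE}}", "CACHE_FIX_HOT_RELOAD", "hotReload"],
];

function renderOptionalEnv(template, vars) {
  let out = template;
  for (const [placeholder, envName, key] of OPTIONAL_ENV) {
    const value = vars[key];
    // Empty or unset values render as "", matching the diff's
    // `vars.x ? ... : ""` pattern, so the placeholder line ends up blank.
    const line = value ? `Environment=${envName}=${systemdEscape(value)}` : "";
    out = out.replaceAll(placeholder, line);
  }
  return out;
}

const template = "[Service]\n{{UPSTREAM_LINE}}\n{{CA_FILE_LINE}}\n{{DEBUG_LINE}}";
console.log(
  renderOptionalEnv(template, {
    upstream: "http://127.0.0.1:8080",
    caFile: "/etc/ssl/my ca.pem", // the space forces quoting
  }),
);
```

One detail of the diff's `...defaults` change is worth keeping in mind: object spread is applied left to right, so with `...defaults` last, any key that `getDefaults()` returns would override an explicitly listed field of the same name (for example `requires` or `logDir`, if it ever returned them).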

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "4.0.0",
+  "version": "4.1.0",
   "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
   "type": "module",
   "exports": {

package/proxy/extensions/usage-log.mjs CHANGED

@@ -26,6 +26,7 @@
 //   overage_disabled_reason?: string ≤64             (optional)
 //   cache_hit_rate: float 0–1
 //   q5h_delta, q7d_delta: float (0 on first call after restart)
+//   request_id?: string ≤64                          (optional, gated)
 //
 // `peak_hour` is NOT in the wire format. It can be derived from `ts` if any
 // consumer needs it.
@@ -36,6 +37,17 @@
 // CACHE_FIX_USAGE_LOG=<path> overrides the destination path only — it is NOT
 // an enable flag and never has been.
 //
+// CACHE_FIX_USAGE_LOG_REQID=on emits the optional `request_id` field
+// (sourced from the upstream `request-id` response header). Default-off in
+// v4.1.0 to avoid breaking unpatched claude-meter installs whose strict-
+// object schema rejects unknown keys. claude-meter v0.7.0+ accepts the
+// optional field; the v4.2.0 flip to default-on assumes that floor.
+// The field is the post-hoc join key against CC's per-session JSONL
+// transcripts
+// (`~/.claude/projects/<project>/<session-uuid>.jsonl` carry `requestId`
+// for every API call), which recovers per-CC-session attribution that
+// `sid` alone cannot provide. See docs/directives/proxy-usage-log-request-id.md.
+//
 // See `docs/directives/proxy-claude-meter-compat.md` for full design.
 import { appendFile, mkdir } from "node:fs/promises";
@@ -91,6 +103,17 @@ export function extractMessageDeltaFields(event) {
   return { output_tokens: event.usage.output_tokens || 0 };
 }
+// Extract upstream request-id from response headers, guarded against the
+// max(64) MeterRowSchema constraint. Returns the string when valid, or
+// `undefined` so the optional schema field is omitted on bad input rather
+// than emitting a row that would fail meter-side validation.
+export function extractRequestId(headers) {
+  const raw = headers?.["request-id"];
+  if (typeof raw !== "string") return undefined;
+  if (raw.length === 0 || raw.length > 64) return undefined;
+  return raw;
+}
 function num(headers, key) {
   const v = headers?.[key];
   if (v === undefined || v === null || v === "") return null;
@@ -134,7 +157,7 @@ export function computeDelta(current, previous) {
   return current - previous;
 }
-export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, now = new Date() }) {
+export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ5h, prevQ7d, requestId, now = new Date() }) {
   const s = start || {};
   const d = delta || {};
   const q = quota || {};
@@ -194,6 +217,26 @@ export function assembleRecord({ start, delta, quota, requestedModel, sid, prevQ
     record.overage_disabled_reason = q.overage_disabled_reason;
   }
+  // Optional: emit request_id when CACHE_FIX_USAGE_LOG_REQID=on AND the
+  // captured value is a non-empty string within the schema's max(64)
+  // constraint. Belt-and-braces: extractRequestId enforces these guards at
+  // capture time, and assembleRecord re-enforces them here so a future
+  // refactor that bypasses the extractor can't emit a row that would fail
+  // claude-meter's strict-object validation.
+  // Env read happens per-call so operators can flip it at runtime without
+  // proxy restart, matching the image-strip debug-gate pattern.
+  // Cross-repo contract: claude-code-meter v0.7.0+ accepts this optional
+  // field; older meter installs reject rows that carry it, so the gate
+  // stays default-off in v4.1.0. Default flips on in cache-fix v4.2.0.
+  if (
+    process.env.CACHE_FIX_USAGE_LOG_REQID === "on" &&
+    typeof requestId === "string" &&
+    requestId.length > 0 &&
+    requestId.length <= 64
+  ) {
+    record.request_id = requestId;
+  }
   return record;
 }
@@ -250,6 +293,7 @@ export default {
       const delta = extractMessageDeltaFields(ctx.event);
       const quota = parseQuotaHeaders(ctx.responseHeaders || {});
       const requestedModel = ctx.telemetry?.requestedModel || undefined;
+      const requestId = extractRequestId(ctx.responseHeaders || {});
       const record = assembleRecord({
         start,
@@ -259,6 +303,7 @@ export default {
         sid: _sid,
         prevQ5h: _lastQ5h,
         prevQ7d: _lastQ7d,
+        requestId,
         now: new Date(),
       });
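The derived fields in the `MeterRowSchema v:1` table reduce to simple arithmetic over the response usage and the previous row's quota readings. A sketch, assuming a zero denominator yields a rate of 0 and a missing reading on either side yields a delta of 0. The diff shows only the tail of the real `computeDelta` and none of `assembleRecord`'s rate code, so `derivedFields` is a hypothetical stand-in, not the extension's implementation.

```javascript
// Hypothetical stand-in for two derived MeterRowSchema fields; field names
// come from the README table, the function and edge-case rules are assumed.
function derivedFields(usage, previous, current) {
  const input = usage.input_tokens || 0;
  const creation = usage.cache_creation_input_tokens || 0;
  const read = usage.cache_read_input_tokens || 0;
  const total = input + creation + read;

  // cache_hit_rate = cache_read / (input + cache_creation + cache_read);
  // assumed 0 when the request carried no input tokens at all.
  const cacheHitRate = total > 0 ? read / total : 0;

  // Per-call quota delta; assumed 0 when either side is missing, which
  // covers the "first call after restart" case from the table.
  const delta = (now, before) =>
    typeof now === "number" && typeof before === "number" ? now - before : 0;

  return {
    cache_hit_rate: cacheHitRate,
    q5h_delta: delta(current.q5h, previous.q5h),
    q7d_delta: delta(current.q7d, previous.q7d),
  };
}

const usage = {
  input_tokens: 100,
  cache_creation_input_tokens: 300,
  cache_read_input_tokens: 600,
};
console.log(derivedFields(usage, { q5h: 0.25, q7d: 0.5 }, { q5h: 0.5, q7d: 0.75 }));
// prints { cache_hit_rate: 0.6, q5h_delta: 0.25, q7d_delta: 0.25 }
console.log(derivedFields(usage, {}, { q5h: 0.5, q7d: 0.75 }).q5h_delta);
// prints 0
```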

package/proxy/extensions.json CHANGED

@@ -1,82 +1,20 @@
 {
-  "bootstrap-defense": {
-    "enabled": true,
-    "order": 45
-  },
-  "ttl-tier-detect": {
-    "enabled": true,
-    "order": 75
-  },
-  "fingerprint-strip": {
-    "enabled": true,
-    "order": 100
-  },
-  "image-strip": {
-    "enabled": true,
-    "order": 150
-  },
-  "sort-stabilization": {
-    "enabled": true,
-    "order": 200
-  },
-  "fresh-session-sort": {
-    "enabled": true,
-    "order": 250
-  },
-  "identity-normalization": {
-    "enabled": true,
-    "order": 300
-  },
-  "smoosh-split": {
-    "enabled": true,
-    "order": 320
-  },
-  "content-strip": {
-    "enabled": true,
-    "order": 330
-  },
-  "tool-input-normalize": {
-    "enabled": true,
-    "order": 340
-  },
-  "microcompact-stability": {
-    "enabled": true,
-    "order": 350
-  },
-  "thinking-display": {
-    "enabled": true,
-    "order": 360
-  },
-  "cache-control-normalize": {
-    "enabled": true,
-    "order": 400
-  },
-  "messages-cache-breakpoint": {
-    "enabled": true,
-    "order": 410
-  },
-  "ttl-management": {
-    "enabled": true,
-    "order": 500
-  },
-  "cache-telemetry": {
-    "enabled": true,
-    "order": 600
-  },
-  "overage-warning": {
-    "enabled": true,
-    "order": 610
-  },
-  "request-log": {
-    "enabled": false,
-    "order": 700
-  },
-  "usage-log": {
-    "enabled": true,
-    "order": 650
-  },
-  "rate-limit-log": {
-    "enabled": true,
-    "order": 660
-  }
+  "bootstrap-defense": { "enabled": true, "order": 45 },
+  "ttl-tier-detect": { "enabled": true, "order": 75 },
+  "fingerprint-strip": { "enabled": true, "order": 100 },
+  "image-strip": { "enabled": true, "order": 150 },
+  "sort-stabilization": { "enabled": true, "order": 200 },
+  "fresh-session-sort": { "enabled": true, "order": 250 },
+  "identity-normalization": { "enabled": true, "order": 300 },
+  "smoosh-split": { "enabled": true, "order": 320 },
+  "content-strip": { "enabled": true, "order": 330 },
+  "tool-input-normalize": { "enabled": true, "order": 340 },
+  "microcompact-stability": { "enabled": true, "order": 350 },
+  "thinking-display": { "enabled": true, "order": 360 },
+  "cache-control-normalize": { "enabled": true, "order": 400 },
+  "messages-cache-breakpoint": { "enabled": true, "order": 410 },
+  "ttl-management": { "enabled": true, "order": 500 },
+  "cache-telemetry": { "enabled": true, "order": 600 },
+  "overage-warning": { "enabled": true, "order": 610 },
+  "request-log": { "enabled": false, "order": 700 }
 }
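Assuming extensions run by ascending `order` and `enabled: false` entries are skipped, resolving this file into a run order looks like the sketch below. The real loader (`loadExtensions()` in `proxy/pipeline.mjs`) is not part of this diff, and `resolvePipeline` is a hypothetical name. Entries absent from the file, as `usage-log` and `rate-limit-log` are after this change, simply do not appear in the sketch's output; how the real loader treats a missing entry is not visible here.

```javascript
// Hypothetical: resolve an extensions.json-shaped object into run order.
// Assumes ascending `order` and skipping `enabled: false`; the package's
// real loader (loadExtensions in proxy/pipeline.mjs) is not shown here.
function resolvePipeline(config) {
  return Object.entries(config)
    .filter(([, cfg]) => cfg.enabled === true)
    .sort(([, a], [, b]) => a.order - b.order)
    .map(([name]) => name);
}

// Excerpt of the 4.1.0 file, deliberately listed out of order.
const config = JSON.parse(`{
  "request-log": { "enabled": false, "order": 700 },
  "overage-warning": { "enabled": true, "order": 610 },
  "bootstrap-defense": { "enabled": true, "order": 45 },
  "fingerprint-strip": { "enabled": true, "order": 100 }
}`);

console.log(resolvePipeline(config));
// prints [ 'bootstrap-defense', 'fingerprint-strip', 'overage-warning' ]
```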

package/proxy/helpers.mjs ADDED

@@ -0,0 +1,30 @@
+// Escape a value for safe rendering into a systemd `Environment=KEY=VALUE` line.
+//
+// Per systemd.exec(5) Environment= and systemd.unit(5) Specifier Expansion:
+//   - Literal `%` is the specifier-expansion marker; to embed one in a value
+//     the unit file must write `%%`. Without escaping, `a%20b` is parsed as
+//     a failed `%20` specifier expansion, systemd logs "Invalid slot" and
+//     silently drops the variable (empirically reproduced 2026-06-07).
+//   - Backslash is a C-string escape inside quoted strings AND inside the
+//     Environment= value parser; `\b` becomes byte 0x08 (backspace), `\n`
+//     becomes LF, etc. To embed a literal `\` the unit must write `\\`.
+//   - `"` requires `\"` (after the backslash escape rule above).
+//   - Whitespace requires the whole value to be quoted (`"..."`).
+//
+// Order matters: escape `%` first (it produces `%%`, neither of which we
+// want to re-escape later), then handle `\` and `"` together inside the
+// quoting branch.
+export const systemdEscape = (v) => {
+  const percentEscaped = v.replace(/%/g, '%%');
+  const needsQuoting = /[\s"\\]/.test(v);
+  if (!needsQuoting) return percentEscaped;
+  return `"${percentEscaped.replace(/[\\"]/g, '\\$&')}"`;
+};
+export const xmlEscape = (v) => v.replace(/[&<>'"]/g, c => ({
+  '&': '&amp;',
+  '<': '&lt;',
+  '>': '&gt;',
+  "'": '&apos;',
+  '"': '&quot;'
+})[c]);
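Because the escaping rules interact (the `%` pass runs first, and the quoting decision is made from the raw value), a few concrete inputs help. The block repeats the two helpers from this file without `export` so it runs as a plain script, then prints their output for representative values; the sample URLs, paths, and strings are made up.

```javascript
// Same logic as proxy/helpers.mjs, minus `export`, so this runs standalone.
const systemdEscape = (v) => {
  const percentEscaped = v.replace(/%/g, "%%");
  const needsQuoting = /[\s"\\]/.test(v);
  if (!needsQuoting) return percentEscaped;
  return `"${percentEscaped.replace(/[\\"]/g, "\\$&")}"`;
};
const xmlEscape = (v) =>
  v.replace(/[&<>'"]/g, (c) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&apos;",
    '"': "&quot;",
  })[c]);

// `%` is doubled and no quoting is needed (the "Invalid slot" case above).
console.log(systemdEscape("http://host/a%20b"));   // prints http://host/a%%20b
// Whitespace forces the whole value into quotes.
console.log(systemdEscape("/opt/my certs/ca.pem")); // prints "/opt/my certs/ca.pem"
// Backslashes are doubled inside the quotes so systemd keeps them literal.
console.log(systemdEscape("C:\\tmp\\x"));           // prints "C:\\tmp\\x"
// plist <string> values: XML metacharacters become entities.
console.log(xmlEscape('a<b & "c"'));                // prints a&lt;b &amp; &quot;c&quot;
```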

package/proxy/server.mjs CHANGED

@@ -6,6 +6,47 @@ import { streamResponse, createTelemetryRecord } from "./stream.mjs";
 import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse, getFailedExtensions } from "./pipeline.mjs";
 import { startWatcher } from "./watcher.mjs";
+// Debug logging — writes to ~/.claude/cache-fix-debug.log (override path with
+// CACHE_FIX_DEBUG_LOG). Self-gated on CACHE_FIX_DEBUG=1; a no-op otherwise.
+// Env is read on every call so tests (and operators flipping the flag at
+// runtime) see live behavior — same pattern as image-strip's #98 gate.
+import { appendFileSync, mkdirSync } from "node:fs";
+import { homedir } from "node:os";
+import { dirname, join } from "node:path";
+import util from "node:util";
+function debugLogPath() {
+  return process.env.CACHE_FIX_DEBUG_LOG ||
+    join(homedir(), ".claude", "cache-fix-debug.log");
+}
+// Never spread raw headers to the log: Authorization / x-api-key / cookies
+// must never persist to disk. Same discipline as bootstrap-defense.mjs's
+// audit-record contract — extract named scalars only.
+const SENSITIVE_HEADERS = new Set([
+  "authorization",
+  "x-api-key",
+  "cookie",
+  "set-cookie",
+  "proxy-authorization",
+]);
+function redactHeaders(headers) {
+  const out = {};
+  for (const [k, v] of Object.entries(headers || {})) {
+    out[k] = SENSITIVE_HEADERS.has(k.toLowerCase()) ? "[REDACTED]" : v;
+  }
+  return out;
+}
+function debugLog(...args) {
+  if (process.env.CACHE_FIX_DEBUG !== "1") return;
+  const path = debugLogPath();
+  try { mkdirSync(dirname(path), { recursive: true }); } catch {}
+  const line = `[${new Date().toISOString()}] ${util.format(...args)}\n`;
+  try { appendFileSync(path, line); } catch {}
+}
 function collectBody(req) {
   return new Promise((resolve, reject) => {
     const chunks = [];
@@ -74,7 +115,13 @@ async function handleMessages(clientReq, clientRes) {
   });
   const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "messages");
-  if (pre.handled) return;
+  if (pre.handled) {
+    debugLog("[PROXY] handled internally without upstream request",
+             "method:", clientReq.method, "url:", clientReq.url,
+             "status:", clientRes.statusCode,
+             "response headers:", redactHeaders(clientRes.getHeaders()));
+    return;
+  }
   const { parsed, forwardBody, meta } = pre;
   const requestedModel = parsed?.model || null;
@@ -88,6 +135,7 @@ async function handleMessages(clientReq, clientRes) {
       abortController.signal
     ));
   } catch (err) {
+    debugLog("[PROXY] forwardRequest error:", err.message);
     if (abortController.signal.aborted) return;
     clientRes.writeHead(502, { "content-type": "application/json" });
     clientRes.end(JSON.stringify({ error: "upstream_error", message: err.message }));
@@ -99,6 +147,11 @@ async function handleMessages(clientReq, clientRes) {
   // socket carried the request without each one re-instrumenting upstream.
   meta._upstreamConnectionId = upstreamConnectionId ?? null;
+  debugLog("[UPSTREAM -> PROXY -> CLAUDE] RESPONSE",
+           "status:", statusCode, "message:", upstreamRes.statusMessage,
+           "upstream headers:", redactHeaders(upstreamRes.headers),
+           "proxy headers:", redactHeaders(responseHeaders));
   if (extSnapshot.length > 0) {
     const resCtx = { status: statusCode, headers: responseHeaders, meta };
     await runOnResponseStart(resCtx, extSnapshot);
@@ -274,16 +327,44 @@ function handleNotFound(_req, res) {
  */
 export function createProxyServer() {
   return http.createServer((req, res) => {
-    if (req.method === "GET" && req.url === "/health") {
-      return handleHealth(req, res);
-    }
-    if (req.method === "POST" && req.url?.startsWith("/v1/messages")) {
-      return handleMessages(req, res);
-    }
-    if (req.url?.startsWith("/api/claude_cli/bootstrap")) {
-      return handleBootstrap(req, res);
-    }
-    handleNotFound(req, res);
+    // Async IIFE: handleMessages/handleBootstrap return promises, so we have
+    // to await them inside the try/catch — a bare return would let rejections
+    // escape to unhandledRejection and (on Node 15+) crash the process.
+    (async () => {
+      try {
+        debugLog("[CLAUDE -> PROXY] REQUEST",
+                 "method:", req.method, "url:", req.url,
+                 "headers:", redactHeaders(req.headers));
+        // Wrap res.write/res.end to log chunk-level activity when debug is on.
+        // These are sync monkey-patches; the inner debugLog self-gates so the
+        // overhead is negligible when CACHE_FIX_DEBUG is unset.
+        const originalWrite = res.write;
+        const originalEnd = res.end;
+        res.write = function (chunk, ...args) {
+          debugLog(`[PROXY -> CLAUDE] Send chunk. Size: ${chunk ? chunk.length : 0} bytes`);
+          return originalWrite.apply(res, [chunk, ...args]);
+        };
+        res.end = function (chunk, ...args) {
+          debugLog("[PROXY -> CLAUDE] Close connection (res.end)");
+          return originalEnd.apply(res, [chunk, ...args]);
+        };
+        if (req.method === "GET" && req.url === "/health") return handleHealth(req, res);
+        if (req.method === "POST" && req.url?.startsWith("/v1/messages")) return await handleMessages(req, res);
+        if (req.url?.startsWith("/api/claude_cli/bootstrap")) return await handleBootstrap(req, res);
+        debugLog("ERROR: handler not found for req.url=", req.url, "method=", req.method);
+        handleNotFound(req, res);
+      } catch (error) {
+        debugLog("REQUEST HANDLER ERROR:", error?.message, error?.stack);
+        // Generic body: do NOT echo error.message (may include internal paths,
+        // upstream URLs, or other server state).
+        if (!res.headersSent) {
+          res.writeHead(500, { "content-type": "application/json" });
+          res.end(JSON.stringify({ error: "internal_proxy_error" }));
+        }
+      }
+    })();
   });
 }
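The async-IIFE pattern in `createProxyServer()` can be isolated into a small wrapper. The sketch below uses a hypothetical helper, `guardAsyncHandler`, which is not part of cache-fix. It relies only on the standard `res.headersSent`, `res.writeHead()` and `res.end()` members of Node's `http.ServerResponse`. It shows why the handler has to be awaited inside the `try`: a rejected promise lands in `catch` and becomes a generic 500 instead of an `unhandledRejection`.

```javascript
// Hypothetical helper (not part of cache-fix): adapts an async (req, res)
// handler to the synchronous callback that http.createServer() invokes.
function guardAsyncHandler(handler) {
  return (req, res) =>
    // Async IIFE: awaiting the handler inside try/catch turns a rejection
    // into a caught exception instead of an unhandledRejection, which
    // terminates the process by default on Node 15+.
    (async () => {
      try {
        await handler(req, res);
      } catch (error) {
        // Server-side log only; the proxy itself uses debugLog here.
        console.error("handler error:", error?.message);
        // Once headers are out the status line is fixed, so only respond
        // if nothing has been sent yet.
        if (!res.headersSent) {
          res.writeHead(500, { "content-type": "application/json" });
          // Generic body: error.message is deliberately not echoed.
          res.end(JSON.stringify({ error: "internal_proxy_error" }));
        }
      }
    })(); // Promise returned only so tests can await it.
}

// Usage (sketch):
//   import http from "node:http";
//   http.createServer(guardAsyncHandler(async (req, res) => { /* ... */ })).listen(0);
```

Unlike the code in the diff, the sketch returns the IIFE's promise so a test can await it. `http.createServer()` ignores the request listener's return value, so this does not change server behavior.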

package/proxy/upstream.mjs CHANGED

@@ -183,9 +183,23 @@ function getAgent(isHTTPS, hostname) {
   return agent;
 }
+// Build the upstream URL by concatenating the configured base (with any path
+// component preserved) with the client request URL. The historical
+// `new URL(clientReq.url, base)` approach is RFC 3986 relative-resolution,
+// which drops the base's path component when the relative is path-absolute
+// (`/v1/messages`). That breaks corp-proxy / mirror setups where the
+// configured upstream is `https://corp-proxy.example.net/anthropic-mirror`
+// — the request would land at `https://corp-proxy.example.net/v1/messages`
+// with `/anthropic-mirror` silently dropped. See PR #188 / @nisqatsi.
+export function buildUpstreamUrl(base, clientUrl) {
+  const trimmedBase = base.endsWith("/") ? base.slice(0, -1) : base;
+  const relative = clientUrl.startsWith("/") ? clientUrl : "/" + clientUrl;
+  return new URL(trimmedBase + relative);
+}
 export function forwardRequest(clientReq, body, signal) {
   return new Promise((resolve, reject) => {
-    const upstreamUrl = new URL(clientReq.url, config.upstream);
+    const upstreamUrl = buildUpstreamUrl(config.upstream, clientReq.url);
     const headers = buildUpstreamHeaders(clientReq.headers, upstreamUrl.hostname);
     if (body) {
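One way to see the difference described in the comment above is to run both constructions against the mirror base it mentions. The sketch reproduces `buildUpstreamUrl()` from the diff verbatim. `corp-proxy.example.net/anthropic-mirror` is the illustrative host from that comment, and only the global WHATWG `URL` class is used.

```javascript
// Same logic as the buildUpstreamUrl() added in this diff.
function buildUpstreamUrl(base, clientUrl) {
  const trimmedBase = base.endsWith("/") ? base.slice(0, -1) : base;
  const relative = clientUrl.startsWith("/") ? clientUrl : "/" + clientUrl;
  return new URL(trimmedBase + relative);
}

// Illustrative mirror base, taken from the code comment in the diff.
const base = "https://corp-proxy.example.net/anthropic-mirror";

// Old behavior: relative resolution. A path-absolute reference replaces
// the base's entire path, so "/anthropic-mirror" is lost.
const resolved = new URL("/v1/messages", base).href;
console.log(resolved); // https://corp-proxy.example.net/v1/messages

// New behavior: plain concatenation keeps the mirror prefix.
const concatenated = buildUpstreamUrl(base, "/v1/messages").href;
console.log(concatenated); // https://corp-proxy.example.net/anthropic-mirror/v1/messages

// A trailing slash on the base is trimmed, so no "//" appears.
const trailing = buildUpstreamUrl(base + "/", "/v1/messages").href;
console.log(trailing === concatenated); // true
```

Adding a trailing slash to the base would not rescue the old call: `new URL("/v1/messages", base + "/")` still yields `https://corp-proxy.example.net/v1/messages`, because a reference that starts with `/` always replaces the whole base path. Concatenation sidesteps resolution entirely, which is presumably why the fix takes that route.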

package/templates/cache-fix-proxy.service.template CHANGED

@@ -10,6 +10,8 @@ Restart=on-failure
 RestartSec=5
 Environment=CACHE_FIX_PROXY_PORT={{PORT}}
 {{UPSTREAM_LINE}}
+{{CA_FILE_LINE}}
+{{REJECT_UNAUTHORIZED_LINE}}
 {{DEBUG_LINE}}
 {{HOT_RELOAD_LINE}}
 WorkingDirectory={{WORKING_DIR}}

package/templates/com.cnighswonger.cache-fix-proxy.plist.template CHANGED

@@ -14,6 +14,8 @@
         <key>CACHE_FIX_PROXY_PORT</key>
         <string>{{PORT}}</string>
 {{UPSTREAM_PLIST}}
+{{CA_FILE_PLIST}}
+{{REJECT_UNAUTHORIZED_PLIST}}
 {{DEBUG_PLIST}}
 {{HOT_RELOAD_PLIST}}
     </dict>

package/tools/cache_analysis.py ADDED

@@ -0,0 +1,229 @@
+"""Shared cache analysis helpers for hooks and MCP tools.
+
+Reference Python helper for consumers that want to read cache-fix's
+``quota-status`` output and reason about cache-state from a Claude Code
+transcript. Used by host-side hooks (e.g. ``~/.claude/hooks/
+context-advisor-analyze.py``) and MCP tools that need quota-aware
+behavior.
+
+Consumer pattern: copy or symlink this file into ``~/.claude/mcp/`` (or
+wherever your hook / tool expects to import from) and ``from cache_analysis
+import read_quota_status, analyze_transcript`` etc. The file ships in the
+cache-fix npm package's ``tools/`` directory; npm consumers can reference
+``node_modules/claude-code-cache-fix/tools/cache_analysis.py`` directly or
+copy it out for non-npm installs.
+
+The ``read_quota_status()`` helper handles both cache-fix v3.5.0+ (proxy
+mode, per-session split at ``~/.claude/quota-status/account.json``) and
+v3.4.x and earlier / preload mode (single global
+``~/.claude/quota-status.json``). See the README's "Migration:
+v3.4.x → v3.5.0+" section.
+"""
+
+import json
+import subprocess
+from datetime import datetime, timezone
+
+CACHE_TTL_5M = 300                # 5-minute ephemeral TTL
+CACHE_TTL_1H = 3600               # 1-hour extended TTL
+CONTEXT_THRESHOLD = 50_000        # Minimum tokens to recommend compact
+COMPACT_RESULT_ESTIMATE = 12_000  # Estimated tokens after compaction
+CACHE_CREATE_RATE_5M = 3.75       # Opus $/MTok for 5min cache writes
+CACHE_CREATE_RATE_1H = 7.50       # Opus $/MTok for 1h cache writes
+
+
+def read_tail_lines(filepath, n=300):
+    """Read last N lines efficiently using tail."""
+    try:
+        result = subprocess.run(
+            ["tail", "-n", str(n), filepath],
+            capture_output=True, text=True, timeout=5,
+        )
+        return result.stdout.splitlines()
+    except Exception:
+        return []
+
+
+def parse_assistant_usage(lines):
+    """Extract assistant messages with usage data from transcript lines."""
+    messages = []
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            obj = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if obj.get("type") != "assistant":
+            continue
+        msg = obj.get("message", {})
+        usage = msg.get("usage")
+        ts = obj.get("timestamp")
+        if not usage or not ts:
+            continue
+        cr = usage.get("cache_creation_input_tokens", 0)
+        rd = usage.get("cache_read_input_tokens", 0)
+        inp = usage.get("input_tokens", 0)
+        out = usage.get("output_tokens", 0)
+        if cr == 0 and rd == 0 and inp == 0:
+            continue
+        # Extract TTL tier breakdown if available
+        cr_detail = usage.get("cache_creation", {})
+        cr_1h = cr_detail.get("ephemeral_1h_input_tokens", 0) if isinstance(cr_detail, dict) else 0
+        cr_5m = cr_detail.get("ephemeral_5m_input_tokens", 0) if isinstance(cr_detail, dict) else 0
+        messages.append({
+            "timestamp": ts,
+            "input_tokens": inp,
+            "cache_creation": cr,
+            "cache_read": rd,
+            "output_tokens": out,
+            "total_in": cr + rd + inp,
+            "cr_1h": cr_1h,
+            "cr_5m": cr_5m,
+        })
+    return messages
+
+
+def detect_cache_ttl(messages):
+    """Detect the effective cache TTL from recent API call usage data.
+
+    If any recent calls show ephemeral_1h_input_tokens > 0, the account
+    is on the 1-hour tier. Otherwise, assume 5-minute ephemeral.
+
+    Returns (ttl_seconds, tier_name).
+    """
+    recent = messages[-10:] if len(messages) >= 10 else messages
+    has_1h = any(m.get("cr_1h", 0) > 0 for m in recent)
+    has_5m = any(m.get("cr_5m", 0) > 0 for m in recent)
+    if has_1h:
+        return CACHE_TTL_1H, "1h"
+    if has_5m:
+        return CACHE_TTL_5M, "5m"
+    # No cache_creation breakdown available — conservative default
+    return CACHE_TTL_5M, "5m (default)"
+
+
+def estimate_thinking_overhead(messages):
+    """Estimate thinking block replay overhead.
+
+    Thinking blocks from prior turns replay as input tokens. Heuristic:
+    cumulative output_tokens approximates thinking content that gets replayed.
+    """
+    if len(messages) < 2:
+        return 0
+    return sum(m["output_tokens"] for m in messages[:-1])
+
+
+def format_tokens(n):
+    if n >= 1_000_000:
+        return f"{n / 1_000_000:.1f}M"
+    if n >= 1_000:
+        return f"{n / 1_000:.0f}k"
+    return str(n)
+
+
+def format_duration(seconds):
+    if seconds >= 3600:
+        return f"{seconds / 3600:.1f}h"
+    return f"{int(seconds / 60)}m"
+
+
+def estimate_savings(total_context, ttl_tier="5m"):
+    """Estimate $ savings from compacting before a cold start.
+
+    Rate depends on the active cache TTL tier — 1h cache writes are 2x the
+    5m rate. Caller should pass the tier returned by detect_cache_ttl().
+    Default is the conservative 5m rate for backward compatibility.
+    """
+    rate = CACHE_CREATE_RATE_1H if ttl_tier.startswith("1h") else CACHE_CREATE_RATE_5M
+    cold_cost = (total_context / 1_000_000) * rate
+    compact_cost = (COMPACT_RESULT_ESTIMATE / 1_000_000) * rate
+    return cold_cost - compact_cost
+
+
+def read_quota_status():
+    """Read current quota utilization from cache-fix's quota-status file.
+
+    Written by the cache-fix interceptor from API response headers. Path
+    depends on cache-fix version:
+
+      - v3.5.0+ (proxy mode, per-session split): ~/.claude/quota-status/account.json
+      - v3.4.x and earlier (or preload mode): ~/.claude/quota-status.json (flat)
+
+    Tries the v3.5.0+ path first, falls back to the legacy flat path. A
+    candidate file whose JSON parses but isn't a dict (e.g. a partial write
+    that lands as ``[]`` or ``null``) is skipped so the next candidate gets
+    a chance — and so callers never receive a non-dict and break on
+    ``status.get(...)`` accessors downstream.
+
+    Returns dict with five_hour/seven_day pct (and other fields written by
+    cache-fix's response-header capture), or None if no candidate yields a
+    dict-shaped payload.
+    """
+    import os
+    for quota_file in (
+        os.path.expanduser("~/.claude/quota-status/account.json"),
+        os.path.expanduser("~/.claude/quota-status.json"),
+    ):
+        try:
+            with open(quota_file) as f:
+                data = json.load(f)
+        except (OSError, json.JSONDecodeError):
+            continue
+        if isinstance(data, dict):
+            return data
+        # Valid JSON but wrong shape — try the next candidate.
+    return None
+
+
+def analyze_transcript(transcript_path):
+    """Full analysis of a transcript. Returns a dict with all cache state info.
+
+    Returns None if analysis can't be performed (no data, etc).
+    """
+    lines = read_tail_lines(transcript_path, 300)
+    if not lines:
+        return None
+    messages = parse_assistant_usage(lines)
+    if not messages:
+        return None
+    last = messages[-1]
+    try:
+        last_ts = datetime.fromisoformat(last["timestamp"].replace("Z", "+00:00"))
+    except (ValueError, KeyError):
+        return None
+    now = datetime.now(timezone.utc)
+    gap_seconds = (now - last_ts).total_seconds()
+    context_tokens = last["total_in"]
+    thinking_overhead = estimate_thinking_overhead(messages)
+    total_with_thinking = context_tokens + thinking_overhead
+    ttl_seconds, ttl_tier = detect_cache_ttl(messages)
+    cache_expired = gap_seconds > ttl_seconds
+    # Last few turns' cache efficiency
+    recent = messages[-5:] if len(messages) >= 5 else messages
+    recent_cr = sum(m["cache_creation"] for m in recent)
+    recent_total = sum(m["total_in"] for m in recent)
+    cr_pct = (recent_cr / recent_total * 100) if recent_total else 0
+    quota = read_quota_status()
+    return {
+        "context_tokens": context_tokens,
+        "thinking_overhead": thinking_overhead,
+        "total_with_thinking": total_with_thinking,
+        "gap_seconds": gap_seconds,
+        "cache_expired": cache_expired,
+        "ttl_seconds": ttl_seconds,
+        "ttl_tier": ttl_tier,
+        "last_timestamp": last["timestamp"],
+        "num_messages": len(messages),
+        "recent_cr_pct": cr_pct,
+        "savings": estimate_savings(total_with_thinking, ttl_tier) if cache_expired else 0,
+        "quota": quota,
+    }
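As a worked reading of `estimate_savings()` together with the `cache_expired` check in `analyze_transcript()`, using the rate constants exactly as shipped in this file, the `savings` field reduces to one expression. Here $C$ is `total_with_thinking` (`context_tokens + thinking_overhead`), $\Delta t$ is `gap_seconds`, $T$ is the TTL returned by `detect_cache_ttl()`, and $r$ is in $/MTok:

```latex
S =
\begin{cases}
  \dfrac{C - 12\,000}{10^{6}} \cdot r & \text{if } \Delta t > T \\[4pt]
  0 & \text{otherwise}
\end{cases}
\qquad
r =
\begin{cases}
  7.50 & \text{1h tier } (T = 3600\ \text{s}) \\
  3.75 & \text{5m tier or ``5m (default)'' } (T = 300\ \text{s})
\end{cases}
```

The subtraction comes from `cold_cost - compact_cost`, which is $C \cdot r / 10^6 - 12\,000 \cdot r / 10^6$. For example, an expired 150,000-token context on the 5m tier gives (150,000 - 12,000) / 10^6 × 3.75 = 0.5175 USD, and the same context on the 1h tier gives 0.138 × 7.50 = 1.035 USD. For $C$ below `COMPACT_RESULT_ESTIMATE` (12,000 tokens) the expression goes negative.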