claude-code-cache-fix 3.6.2 → 3.7.1

package/README.md CHANGED
@@ -208,6 +208,27 @@ Options (all optional; all fall back to the same env vars used by the CLI):
208
208
 
209
209
  *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
210
210
 
211
+ ## What this proxy defends against
212
+
213
+ **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
214
+
215
+ **Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 added explicit handling for this path. v3.7.1 extends it to also cover the env-var-selected GrowthBook prompt-injection surface that landed in CC v2.1.152 (remote-control mode: `CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE` names a flag key whose cached value is used as the system prompt body).
216
+
217
+ Cache-fix's `bootstrap-defense` extension ships three modes, selected via `CACHE_FIX_BOOTSTRAP_MODE`:
218
+
219
+ | Mode | Default? | Behavior |
220
+ |---|---|---|
221
+ | `audit` | yes | Bootstrap responses proxy through to CC. Each response is logged to `~/.claude/cache-fix-bootstrap-log.jsonl` with surface metadata: which prompt-source surfaces fired (`tengu_heron_brook` legacy and/or env-var-selected), the SHA-256 hash of the value (first 16 hex chars — never the value itself), and the `CLAUDE_CODE_REMOTE` flag. Multi-surface responses emit one record per surface, correlated by `request_id` + timestamp window. |
222
+ | `block` | opt-in | `onRequest` returns a 200 with an empty JSON object (`{}`) as the body. Upstream is never called, so no flag map ever reaches the on-disk GrowthBook cache. This defeats both the legacy and env-var-selected injection surfaces. |
223
+ | `allowlist` | opt-in (experimental) | Bootstrap response proxies through, but prompt-source-eligible keys (legacy `tengu_heron_brook` + env-var-selected key) not in the allowlist are stripped from the response body before it reaches CC. Default allowlist is `tengu_heron_brook` (the only known-legitimate historical key); configure via `CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS=comma,separated,list`. Pass `CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS=` (explicit empty) for full deny-all. Other GrowthBook flag keys pass through untouched. May need updates if Anthropic adds legitimate prompt-source keys in future CC releases. |
224
+
225
+ Note: cache-fix v3.6.2 and earlier returned 404 for the bootstrap path because the proxy router did not include it — the practical effect was that bootstrap content was not reaching CC for cache-fix users. v3.7.0's default `audit` changes that behavior; explicit `CACHE_FIX_BOOTSTRAP_MODE=block` preserves it. The full disclosure record, including Anthropic's verbatim close text, is in [`docs/disclosure/heron-brook-2026-05.md`](docs/disclosure/heron-brook-2026-05.md).
226
+
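A consumer of the audit log might correlate multi-surface records along the following lines. This is a minimal sketch, not part of the package: `groupByRequestId` is a hypothetical helper and the sample records are invented for illustration. Only the field names (`request_id`, `surface`, `prompt_key`, `prompt_value_hash`, `stripped_keys`) are taken from the record shape in this release. The sketch groups on `request_id` alone; a real consumer would presumably also apply the timestamp window mentioned in the table.

```javascript
// Hypothetical helper (not shipped with cache-fix): group JSONL audit
// records by request_id so multi-surface responses can be read together.
function groupByRequestId(jsonlText) {
  const groups = new Map();
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, e.g. after the final newline
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // tolerate a partially written line
    }
    const key = record.request_id ?? "(no request_id)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  return groups;
}

// Invented sample data: one response that fired both surfaces, one that fired none.
const sampleLog = [
  { request_id: "req_1", surface: "bootstrap", prompt_key: "tengu_heron_brook", prompt_value_hash: "0123456789abcdef", stripped_keys: [] },
  { request_id: "req_1", surface: "prompt_injection_gb", prompt_key: "example_flag", prompt_value_hash: null, stripped_keys: [] },
  { request_id: "req_2", surface: "bootstrap", prompt_key: null, prompt_value_hash: null, stripped_keys: [] },
].map((r) => JSON.stringify(r)).join("\n") + "\n";

const grouped = groupByRequestId(sampleLog);
for (const [requestId, records] of grouped) {
  console.log(requestId, records.map((r) => r.surface).join(", "));
}
```

Records without a `request_id` are collected under a placeholder key rather than dropped, since the record shape allows `request_id: null`.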
227
+ **Reference material:**
228
+ - [`docs/disclosure/heron-brook-2026-05.md`](docs/disclosure/heron-brook-2026-05.md) — full disclosure record
229
+ - [`CHANGELOG.md`](CHANGELOG.md#371---2026-05-27) — v3.7.1 release entry (extended surface coverage + allowlist mode); [v3.7.0 entry](CHANGELOG.md#370---2026-05-26) covers the prior behavior-change note
230
+ - [`cnighswonger/heron-brook-poc`](https://github.com/cnighswonger/heron-brook-poc) — reproducer for the bootstrap-channel behavior
231
+
211
232
  ## Recommended CC operational config
212
233
 
213
234
  The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
@@ -242,6 +263,18 @@ If you've seen the `autoCompactWindow: 1000000` setting recommended elsewhere: i
242
263
 
243
264
  If you set this flag, CC strips any tool field outside `["name", "description", "input_schema", "cache_control"]` from outgoing requests. Custom tools relying on `defer_loading` or `eager_input_streaming` will silently lose those fields and behave differently. Worth knowing before turning the flag on.
244
265
 
266
+ ## Known CC behaviors that affect cache cost
267
+
268
+ These aren't bugs cache-fix patches — they're upstream CC behaviors users should be aware of when sizing their session cost.
269
+
270
+ ### Diagnostic slash commands inflate conversation history ([#49335](https://github.com/anthropics/claude-code/issues/49335))
271
+
272
+ Running `/context` or `/release-notes` (and likely other state-inspection commands) appends the diagnostic output to conversation history rather than rendering it in the terminal only. Subsequent turns replay the inflated payload via prompt cache, compounding token cost on a state-inspection action that should be free. Empirically measured at +3,480 `cache_creation_input_tokens` for a single `/context` invocation on v2.1.148; another user reports ~5K on a separate session. `/release-notes` is worse: it defaults to dumping the full changelog.
273
+
274
+ Worse for diagnosis: the inflated payload that bills against your cache isn't written to the local JSONL transcript, so you can't audit the cost source locally — you can only infer it from `cache_creation_input_tokens` jumps in response usage metadata. (Proxy-mode users can inspect the deltas in `~/.claude/quota-status/` files, which the proxy writes directly from response headers.)
275
+
276
+ **Workaround until upstream fix:** use these commands sparingly in long sessions. If you need them frequently in a session, consider `/compact` after a diagnostic run to reset the bleed.
277
+
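Since the cost source can only be inferred from `cache_creation_input_tokens` in response usage metadata, one way to spot it is to scan per-turn usage objects for unusually large values. The sketch below assumes you have already collected those usage objects yourself; `findCacheCreationJumps` is a hypothetical helper, the threshold is arbitrary, and the sample numbers are invented (apart from being sized around the +3,480 measurement quoted above).

```javascript
// Hypothetical helper: flag turns whose cache_creation_input_tokens meets or
// exceeds a caller-chosen threshold. `usages` is an array of per-turn usage
// objects in conversation order.
function findCacheCreationJumps(usages, thresholdTokens) {
  const jumps = [];
  usages.forEach((usage, turn) => {
    const created = usage.cache_creation_input_tokens ?? 0;
    if (created >= thresholdTokens) jumps.push({ turn, created });
  });
  return jumps;
}

// Invented sample: turn 2 is where a diagnostic command was run.
const sampleUsages = [
  { cache_creation_input_tokens: 150 },
  { cache_creation_input_tokens: 210 },
  { cache_creation_input_tokens: 3690 },
  { cache_creation_input_tokens: 180 },
];

const jumps = findCacheCreationJumps(sampleUsages, 3000);
console.log(jumps);
```

A fixed threshold is the simplest possible rule. Comparing each turn against a running median would be more robust in sessions where ordinary turns already create a lot of cache.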
245
278
  ## Quick Start: Preload (CC v2.1.112 and earlier)
246
279
 
247
280
  If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
@@ -358,7 +391,7 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
358
391
  Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into `~/.claude/quota-status/account.json` (account-global fields: Q5h/Q7d, status, overage) plus `~/.claude/quota-status/sessions/<id>.json` (per-session cache fields: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script displays a live status line showing:
359
392
 
360
393
  - **Q5h** quota bar `[███░┃░░░░░]` + percent + `(exhaust X, reset Y)`. Filled cells are consumed quota; the heavy-vertical tick is wall-clock elapsed position in the window. Tick to the right of the fill = under pace; tick inside the fill = burning faster than time (over pace). `exhaust` is the projected time-to-100% at the current burn rate; `reset` is the wall-clock time until the window rolls over. When `exhaust < reset`, you will hit 100% before the window resets — back off.
361
- - **Q7d** same shape with day-scale durations (e.g. `(exhaust 3d 13h, reset 3d 0h)`).
394
+ - **Q7d** same shape with day-scale durations (e.g. `(exhaust 3d13h, reset 3d0h)`). Below a day, the suffix auto-switches to `h/m` format (e.g. `(exhaust 1h41m, reset 0h30m)`).
362
395
  - **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
363
396
  - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
364
397
  - **Cache hit rate %**
@@ -367,10 +400,10 @@ Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into
367
400
  Example line (mid-window, healthy state):
368
401
 
369
402
  ```
370
- Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust 3d 13h, reset 3d 0h) | TTL:1h 98.3%
403
+ Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust 3d13h, reset 3d0h) | TTL:1h 98.3%
371
404
  ```
372
405
 
373
- The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first minute (Q5h) or six minutes (Q7d) after window start the burn rate isn't stable enough to project, so `exhaust` is held back until then; a stale `resets_at` (the server-reported value sits in the past, before the next API call refreshes it) drops both.
406
+ The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first 5 minutes after window start the burn rate isn't stable enough to project (a single early call dominates the rate), so `exhaust` is held back until then on both Q5h and Q7d; a stale `resets_at` (the server-reported value sits in the past, before the next API call refreshes it) drops both.
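The suffix rules above can be sketched as follows. This is an illustration in JavaScript, not the logic of `tools/quota-statusline.sh` itself: the function names are hypothetical, the linear projection (remaining quota divided by the average burn rate since window start) is an assumption, and so are the per-value choice between `d/h` and `h/m` formats and the `(reset …)` rendering of the reset-only case.

```javascript
const WARMUP_SEC = 5 * 60; // no exhaust projection in the first 5 minutes
const DAY_SEC = 86400;

// Assumed formatting: "3d13h" at a day or more, "4h40m" below a day.
function formatDuration(totalSec) {
  const sec = Math.max(0, Math.floor(totalSec));
  if (sec >= DAY_SEC) {
    const days = Math.floor(sec / DAY_SEC);
    const hours = Math.floor((sec % DAY_SEC) / 3600);
    return `${days}d${hours}h`;
  }
  const hours = Math.floor(sec / 3600);
  const minutes = Math.floor((sec % 3600) / 60);
  return `${hours}h${String(minutes).padStart(2, "0")}m`;
}

// pct: percent of quota consumed; elapsedSec: time since window start;
// resetSec: time until the window rolls over (<= 0 means stale resets_at).
function quotaSuffix({ pct, elapsedSec, resetSec }) {
  if (resetSec <= 0) return ""; // stale resets_at: drop both parts
  const reset = formatDuration(resetSec);
  const projectable = pct > 0 && pct < 100 && elapsedSec >= WARMUP_SEC;
  if (!projectable) return `(reset ${reset})`;
  // Time to burn the remaining quota at the average rate observed so far.
  const exhaustSec = (elapsedSec * (100 - pct)) / pct;
  return `(exhaust ${formatDuration(exhaustSec)}, reset ${reset})`;
}

console.log(quotaSuffix({ pct: 30, elapsedSec: 2 * 3600, resetSec: 3 * 3600 }));
console.log(quotaSuffix({ pct: 53, elapsedSec: 4 * DAY_SEC, resetSec: 3 * DAY_SEC }));
```

With these inputs the sketch prints `(exhaust 4h40m, reset 3h00m)` and `(exhaust 3d13h, reset 3d0h)`, which matches the example status line: 30% consumed two hours into a five-hour window leaves 70% at 15% per hour, or 4h40m.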
374
407
 
375
408
  The bar uses Unicode block characters (`█┃░`) — most modern terminals render these correctly. If your terminal substitutes boxes or replacement glyphs, configure a Unicode-capable font (any DejaVu, Fira, Iosevka, JetBrains Mono, etc.).
376
409
 
@@ -740,6 +773,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
740
773
  - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
741
774
  - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
742
775
  - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
776
+ - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
743
777
 
744
778
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
745
779
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "3.6.2",
3
+ "version": "3.7.1",
4
4
  "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
5
5
  "type": "module",
6
6
  "exports": {
package/proxy/config.mjs CHANGED
@@ -10,14 +10,15 @@ function envInt(name, fallback) {
10
10
 
11
11
  const __dirname = dirname(fileURLToPath(import.meta.url));
12
12
 
13
- // Existing fields are read once at module init (preserving prior behavior).
14
- // New corp-proxy/CA fields are getters so they reflect live env — important
15
- // for test isolation (see test/proxy-upstream-corp-proxy.test.mjs) and for
16
- // callers that legitimately want to flip env at runtime.
13
+ // Most fields are read once at module init (preserving prior behavior).
14
+ // Corp-proxy/CA fields and `upstream` are getters so they reflect live env —
15
+ // important for test isolation (see test/proxy-upstream-corp-proxy.test.mjs
16
+ // and test/proxy-server-bootstrap.test.mjs) and for callers that legitimately
17
+ // want to flip env at runtime.
17
18
  const config = {
18
19
  port: envInt("CACHE_FIX_PROXY_PORT", 9801),
19
20
  bind: process.env.CACHE_FIX_PROXY_BIND || "127.0.0.1",
20
- upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "https://api.anthropic.com",
21
+ get upstream() { return process.env.CACHE_FIX_PROXY_UPSTREAM || "https://api.anthropic.com"; },
21
22
  timeout: envInt("CACHE_FIX_PROXY_TIMEOUT", 600_000),
22
23
  extensionsDir: process.env.CACHE_FIX_EXTENSIONS_DIR || join(__dirname, "extensions"),
23
24
  extensionsConfig: process.env.CACHE_FIX_EXTENSIONS_CONFIG || join(__dirname, "extensions.json"),
@@ -0,0 +1,286 @@
1
+ import { appendFileSync, statSync, renameSync, mkdirSync } from "node:fs";
2
+ import { join, dirname } from "node:path";
3
+ import { homedir } from "node:os";
4
+ import { createHash } from "node:crypto";
5
+
6
+ const LOG_ROTATE_BYTES = 5 * 1024 * 1024;
7
+ const SCHEMA_VERSION = 2;
8
+ const EXTENSION_VERSION = "v3.7.1";
9
+
10
+ const LEGACY_PROMPT_KEY = "tengu_heron_brook";
11
+
12
+ function logPath() {
13
+ return process.env.CACHE_FIX_BOOTSTRAP_LOG_PATH || join(homedir(), ".claude", "cache-fix-bootstrap-log.jsonl");
14
+ }
15
+
16
+ // Single-tier rotation by design: any previous .1 gets overwritten. The audit
17
+ // log is a forward-looking signal feed, not an archival record — if v3.7.0
18
+ // anomaly detection wants long-term retention it can subscribe to the stream
19
+ // directly. Keeping rotation to one tier bounds disk usage at 2×5MB = 10MB.
20
+ function rotateIfNeeded(path) {
21
+ let size = 0;
22
+ try { size = statSync(path).size; } catch { return; }
23
+ if (size < LOG_ROTATE_BYTES) return;
24
+ try { renameSync(path, `${path}.1`); } catch {}
25
+ }
26
+
27
+ // Single-writer invariant: cache-fix-proxy is one Node process per host, and
28
+ // every bootstrap response that needs logging flows through this extension.
29
+ // All writes are appendFileSync from a single event loop, so no inter-process
30
+ // or inter-extension locking is required. If that invariant ever changes
31
+ // (e.g. multi-process proxy, sibling extension writing to the same file),
32
+ // this writer needs to gain a lock.
33
+ function appendRecord(record) {
34
+ const path = logPath();
35
+ try {
36
+ mkdirSync(dirname(path), { recursive: true });
37
+ rotateIfNeeded(path);
38
+ appendFileSync(path, JSON.stringify(record) + "\n");
39
+ } catch (err) {
40
+ process.stderr.write(`[bootstrap-defense] log write failed: ${err.message}\n`);
41
+ }
42
+ }
43
+
44
+ function modeFromEnv(fallback) {
45
+ const raw = process.env.CACHE_FIX_BOOTSTRAP_MODE;
46
+ if (raw === "audit" || raw === "block" || raw === "allowlist") return raw;
47
+ return fallback;
48
+ }
49
+
50
+ // Parse the allowlist env var. Distinguishes three states:
51
+ // - var unset → default allowlist [LEGACY_PROMPT_KEY]
52
+ // - var = "" (explicit) → empty allowlist (deny-all)
53
+ // - var = "a,b,c" → [a, b, c] (trimmed, empty entries dropped)
54
+ function allowedKeysFromEnv() {
55
+ const raw = process.env.CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS;
56
+ if (raw === undefined) return new Set([LEGACY_PROMPT_KEY]);
57
+ if (raw === "") return new Set();
58
+ return new Set(raw.split(",").map((s) => s.trim()).filter((s) => s.length > 0));
59
+ }
60
+
61
+ // Hash derivation contract (directive § Hash derivation): first 16 chars of
62
+ // the lowercase hex SHA-256 digest over the UTF-8-encoded flag value. Pinned
63
+ // here so a future refactor cannot silently change historical-record identity.
64
+ function hashFlagValue(value) {
65
+ return createHash("sha256").update(value, "utf8").digest("hex").slice(0, 16);
66
+ }
67
+
68
+ // PII discipline: the audit log MUST NOT include client headers (Authorization,
69
+ // x-api-key, cookies, etc.) or request/response bodies. Callers pass only the
70
+ // extracted scalar fields below — the full headers object is never threaded
71
+ // through this function, so a future maintainer can't accidentally widen the
72
+ // log surface by spreading the parameter object.
73
+ //
74
+ // v3.7.0 extension fields (baseline_hash, anomaly_status) are emitted as null
75
+ // so log readers from that schema don't break. v3.7.1 adds the prompt-source
76
+ // surface fields (surface, prompt_key, prompt_value_hash, remote_mode,
77
+ // stripped_keys) — consumers reading v1 records should treat these as
78
+ // null/empty array.
79
+ function recordShape({
80
+ phase, mode, status, body_bytes, upstream_host, request_id, error,
81
+ surface, prompt_key, prompt_value_hash, remote_mode, stripped_keys,
82
+ }) {
83
+ return {
84
+ schema_version: SCHEMA_VERSION,
85
+ extension_version: EXTENSION_VERSION,
86
+ timestamp: new Date().toISOString(),
87
+ phase,
88
+ mode,
89
+ status: status ?? null,
90
+ body_bytes: body_bytes ?? null,
91
+ upstream_host: upstream_host ?? null,
92
+ request_id: request_id ?? null,
93
+ baseline_hash: null,
94
+ anomaly_status: null,
95
+ error: error ?? null,
96
+ surface: surface ?? "bootstrap",
97
+ prompt_key: prompt_key ?? null,
98
+ prompt_value_hash: prompt_value_hash ?? null,
99
+ remote_mode: remote_mode ?? false,
100
+ stripped_keys: stripped_keys ?? [],
101
+ };
102
+ }
103
+
104
+ // Extract audit-record scalars from the request context. Meta wins over
105
+ // headers so that the canonical request-side fields stashed by the server
106
+ // (handleBootstrap stashes `_bootstrapUpstreamHost` and `_bootstrapRequestId`
107
+ // before the upstream call) are authoritative on the onResponse path, where
108
+ // ctx.headers carries the upstream RESPONSE headers (no Host, no request-id).
109
+ // On the onRequest block path, meta hasn't been populated yet — the helper
110
+ // falls back to ctx.headers (the client request headers) so request_id is
111
+ // still captured. Upstream_host falls back to null on the request path
112
+ // because the resolved upstream isn't known until handleBootstrap runs.
113
+ //
114
+ // Signature is scalar-only by design: callers MUST NOT spread a headers
115
+ // object into recordShape, so a future maintainer can't accidentally widen
116
+ // the log surface to carry Authorization / x-api-key / cookies.
117
+ function extractAuditFields(meta, headers) {
118
+ const requestId =
119
+ meta?._bootstrapRequestId ??
120
+ headers?.["request-id"] ??
121
+ headers?.["x-request-id"] ??
122
+ null;
123
+ return {
124
+ upstream_host: meta?._bootstrapUpstreamHost ?? null,
125
+ request_id: requestId,
126
+ };
127
+ }
128
+
129
+ // Surface-activation logic (directive § Multi-surface records):
130
+ // - "bootstrap" surface fires when the legacy hardcoded key is in the body.
131
+ // - "prompt_injection_gb" surface fires when CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE
132
+ // is set, independent of whether that named key is in the body — this gives
133
+ // the audit log operator-visibility into env-configured intent even when
134
+ // the upstream did not deliver the value.
135
+ function detectSurfaces(body) {
136
+ const surfaces = [];
137
+ const bodyIsObject = body !== null && typeof body === "object" && !Array.isArray(body);
138
+ if (bodyIsObject && Object.prototype.hasOwnProperty.call(body, LEGACY_PROMPT_KEY)) {
139
+ surfaces.push({ surface: "bootstrap", prompt_key: LEGACY_PROMPT_KEY });
140
+ }
141
+ const envKey = process.env.CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE;
142
+ if (envKey) {
143
+ surfaces.push({ surface: "prompt_injection_gb", prompt_key: envKey });
144
+ }
145
+ return surfaces;
146
+ }
147
+
148
+ export default {
149
+ name: "bootstrap-defense",
150
+ description:
151
+ "Audit (default), block, or allowlist server-controlled system-prompt injection through /api/claude_cli/bootstrap. " +
152
+ "Covers both the legacy tengu_heron_brook reader and the env-var-selected (CLAUDE_CODE_SYSTEM_PROMPT_GB_FEATURE) reader. " +
153
+ "Audit mode logs surface metadata to ~/.claude/cache-fix-bootstrap-log.jsonl. Block mode returns empty 200 from onRequest. " +
154
+ "Allowlist mode strips non-allowlisted prompt-source-eligible keys from the response body before returning it to CC.",
155
+ // Pipeline route scoping (pipeline.mjs:appliesToRoute) gates this extension
156
+ // to the bootstrap path. The internal route guard below is belt-and-suspenders
157
+ // in case a caller invokes the hook directly.
158
+ routes: ["bootstrap"],
159
+ order: 45,
160
+
161
+ async onRequest(ctx) {
162
+ if (ctx.meta?.route !== "bootstrap") return;
163
+
164
+ const mode = modeFromEnv("audit");
165
+ ctx.meta._bootstrapDefenseMode = mode;
166
+
167
+ if (mode === "block") {
168
+ appendRecord(
169
+ recordShape({
170
+ phase: "request_blocked",
171
+ mode,
172
+ ...extractAuditFields(ctx.meta, ctx.headers),
173
+ remote_mode: Boolean(process.env.CLAUDE_CODE_REMOTE),
174
+ }),
175
+ );
176
+ return {
177
+ skip: true,
178
+ status: 200,
179
+ headers: { "content-type": "application/json" },
180
+ body: {},
181
+ };
182
+ }
183
+ },
184
+
185
+ async onResponse(ctx) {
186
+ if (ctx.meta?.route !== "bootstrap") return;
187
+ const mode = ctx.meta._bootstrapDefenseMode || "audit";
188
+ if (mode !== "audit" && mode !== "allowlist") return;
189
+
190
+ // Prefer raw on-wire byte count from server.mjs (accurate even for
191
+ // non-JSON / unparseable upstream responses). The fallback stringify path
192
+ // exists only for direct unit-test invocation of onResponse without going
193
+ // through handleBootstrap — production traffic always sets the meta field.
194
+ let bodyBytes = ctx.meta?._bootstrapBodyBytes ?? null;
195
+ if (bodyBytes === null) {
196
+ try { bodyBytes = Buffer.byteLength(JSON.stringify(ctx.body ?? {})); } catch {}
197
+ }
198
+
199
+ const upstreamError = ctx.meta?._bootstrapUpstreamError ?? null;
200
+ const auditFields = extractAuditFields(ctx.meta, ctx.headers);
201
+ const remoteMode = Boolean(process.env.CLAUDE_CODE_REMOTE);
202
+ const baseRecord = {
203
+ mode,
204
+ status: ctx.status,
205
+ body_bytes: bodyBytes,
206
+ ...auditFields,
207
+ remote_mode: remoteMode,
208
+ };
209
+
210
+ // Anomaly path: upstream error or unparseable body. Single record with
211
+ // new fields defaulted; no multi-surface emission (directive § Multi-surface
212
+ // records, last paragraph).
213
+ if (upstreamError) {
214
+ appendRecord(recordShape({
215
+ ...baseRecord,
216
+ phase: "upstream_error_audited",
217
+ error: upstreamError,
218
+ }));
219
+ return;
220
+ }
221
+ if (ctx.body === null || ctx.body === undefined) {
222
+ appendRecord(recordShape({
223
+ ...baseRecord,
224
+ phase: "response_audited",
225
+ }));
226
+ return;
227
+ }
228
+
229
+ // Detect which prompt-source surfaces apply to this response.
230
+ const surfaces = detectSurfaces(ctx.body);
231
+
232
+ // No injection-detected baseline: single record preserving v3.7.0's
233
+ // record shape (surface defaults to "bootstrap", prompt_key null).
234
+ if (surfaces.length === 0) {
235
+ appendRecord(recordShape({
236
+ ...baseRecord,
237
+ phase: "response_audited",
238
+ }));
239
+ return;
240
+ }
241
+
242
+ // Compute per-surface fields. Read from the original (unmutated) body
243
+ // for every surface BEFORE applying any allowlist strips, so multiple
244
+ // surfaces referencing the same key (env-var-aliases-legacy-key) both
245
+ // see the same value and emit identical prompt_value_hash.
246
+ const allowed = mode === "allowlist" ? allowedKeysFromEnv() : null;
247
+ const perSurface = surfaces.map((s) => {
248
+ const value = ctx.body[s.prompt_key];
249
+ const hash = typeof value === "string" ? hashFlagValue(value) : null;
250
+ const stripThis =
251
+ mode === "allowlist" &&
252
+ !allowed.has(s.prompt_key) &&
253
+ Object.prototype.hasOwnProperty.call(ctx.body, s.prompt_key);
254
+ return {
255
+ ...s,
256
+ prompt_value_hash: hash,
257
+ stripped_keys: stripThis ? [s.prompt_key] : [],
258
+ };
259
+ });
260
+
261
+ // Apply strips to ctx.body. Deletion is idempotent across multiple
262
+ // surfaces that target the same key (alias case).
263
+ if (mode === "allowlist") {
264
+ for (const r of perSurface) {
265
+ for (const k of r.stripped_keys) {
266
+ delete ctx.body[k];
267
+ }
268
+ }
269
+ }
270
+
271
+ // Emit one audit record per detected surface. Shared scalars
272
+ // (request_id, timestamp window, body_bytes, etc.) are duplicated across
273
+ // records emitted from a single response; consumers correlate by
274
+ // request_id + timestamp window.
275
+ for (const r of perSurface) {
276
+ appendRecord(recordShape({
277
+ ...baseRecord,
278
+ phase: "response_audited",
279
+ surface: r.surface,
280
+ prompt_key: r.prompt_key,
281
+ prompt_value_hash: r.prompt_value_hash,
282
+ stripped_keys: r.stripped_keys,
283
+ }));
284
+ }
285
+ },
286
+ };
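To make the three allowlist states concrete, here is a small worked example. It restates the parsing rule from `allowedKeysFromEnv` with the environment passed in as a parameter, so it can be run without touching `process.env`. The `allowedKeys` and `keysToStrip` names, the `example_flag` key, and the sample body are hypothetical and used for illustration only.

```javascript
const LEGACY_PROMPT_KEY = "tengu_heron_brook";

// Same three-state rule as allowedKeysFromEnv: unset, explicit empty, or a list.
function allowedKeys(env) {
  const raw = env.CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS;
  if (raw === undefined) return new Set([LEGACY_PROMPT_KEY]);
  if (raw === "") return new Set();
  return new Set(raw.split(",").map((s) => s.trim()).filter((s) => s.length > 0));
}

// Which prompt-source-eligible keys present in the body would be stripped?
function keysToStrip(body, eligibleKeys, allowed) {
  return eligibleKeys.filter(
    (k) => Object.prototype.hasOwnProperty.call(body, k) && !allowed.has(k),
  );
}

// Invented sample body; unrelated_flag is not prompt-source-eligible.
const body = { tengu_heron_brook: "legacy text", example_flag: "other text", unrelated_flag: true };
const eligible = [LEGACY_PROMPT_KEY, "example_flag"]; // legacy key + env-var-selected key

const unsetCase = keysToStrip(body, eligible, allowedKeys({}));
const emptyCase = keysToStrip(body, eligible, allowedKeys({ CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS: "" }));
const listCase = keysToStrip(body, eligible, allowedKeys({ CACHE_FIX_BOOTSTRAP_ALLOWED_KEYS: " example_flag , " }));

console.log(unsetCase); // variable unset: default allowlist keeps the legacy key
console.log(emptyCase); // explicit empty: deny-all
console.log(listCase);  // custom list: only example_flag is allowed
```

Note that a custom list replaces the default rather than extending it, so the legacy key is stripped in the third case unless it is listed explicitly.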
@@ -1,70 +1,20 @@
1
1
  {
2
- "ttl-tier-detect": {
3
- "enabled": true,
4
- "order": 75
5
- },
6
- "fingerprint-strip": {
7
- "enabled": true,
8
- "order": 100
9
- },
10
- "image-strip": {
11
- "enabled": true,
12
- "order": 150
13
- },
14
- "sort-stabilization": {
15
- "enabled": true,
16
- "order": 200
17
- },
18
- "fresh-session-sort": {
19
- "enabled": true,
20
- "order": 250
21
- },
22
- "identity-normalization": {
23
- "enabled": true,
24
- "order": 300
25
- },
26
- "smoosh-split": {
27
- "enabled": true,
28
- "order": 320
29
- },
30
- "content-strip": {
31
- "enabled": true,
32
- "order": 330
33
- },
34
- "tool-input-normalize": {
35
- "enabled": true,
36
- "order": 340
37
- },
38
- "microcompact-stability": {
39
- "enabled": true,
40
- "order": 350
41
- },
42
- "thinking-display": {
43
- "enabled": true,
44
- "order": 360
45
- },
46
- "cache-control-normalize": {
47
- "enabled": true,
48
- "order": 400
49
- },
50
- "messages-cache-breakpoint": {
51
- "enabled": true,
52
- "order": 410
53
- },
54
- "ttl-management": {
55
- "enabled": true,
56
- "order": 500
57
- },
58
- "cache-telemetry": {
59
- "enabled": true,
60
- "order": 600
61
- },
62
- "overage-warning": {
63
- "enabled": true,
64
- "order": 610
65
- },
66
- "request-log": {
67
- "enabled": false,
68
- "order": 700
69
- }
2
+ "bootstrap-defense": { "enabled": true, "order": 45 },
3
+ "ttl-tier-detect": { "enabled": true, "order": 75 },
4
+ "fingerprint-strip": { "enabled": true, "order": 100 },
5
+ "image-strip": { "enabled": true, "order": 150 },
6
+ "sort-stabilization": { "enabled": true, "order": 200 },
7
+ "fresh-session-sort": { "enabled": true, "order": 250 },
8
+ "identity-normalization": { "enabled": true, "order": 300 },
9
+ "smoosh-split": { "enabled": true, "order": 320 },
10
+ "content-strip": { "enabled": true, "order": 330 },
11
+ "tool-input-normalize": { "enabled": true, "order": 340 },
12
+ "microcompact-stability": { "enabled": true, "order": 350 },
13
+ "thinking-display": { "enabled": true, "order": 360 },
14
+ "cache-control-normalize": { "enabled": true, "order": 400 },
15
+ "messages-cache-breakpoint": { "enabled": true, "order": 410 },
16
+ "ttl-management": { "enabled": true, "order": 500 },
17
+ "cache-telemetry": { "enabled": true, "order": 600 },
18
+ "overage-warning": { "enabled": true, "order": 610 },
19
+ "request-log": { "enabled": false, "order": 700 }
70
20
  }
@@ -46,10 +46,27 @@ export function snapshotRegistry() {
46
46
  return [...registry];
47
47
  }
48
48
 
49
+ // Route scoping: extensions default to messages-only so that adding a new
50
+ // route (e.g. /api/claude_cli/bootstrap) doesn't drag every existing
51
+ // message-mutating extension onto it — most throw on a null body because
52
+ // they were never designed for non-messages traffic. Cross-cutting
53
+ // extensions (cache-telemetry, usage-log, …) opt into additional routes
54
+ // by declaring an explicit `routes` array on their default export.
55
+ //
56
+ // If ctx.meta.route is undefined we skip filtering entirely — preserves
57
+ // back-compat for callers that don't tag routes (legacy tests, embedders).
58
+ function appliesToRoute(ext, route) {
59
+ if (!route) return true;
60
+ const routes = ext.routes || ["messages"];
61
+ return routes.includes(route);
62
+ }
63
+
49
64
  export async function runOnRequest(ctx, snapshot) {
50
65
  const exts = snapshot || registry;
66
+ const route = ctx.meta?.route;
51
67
  for (const ext of exts) {
52
68
  if (!ext.onRequest) continue;
69
+ if (!appliesToRoute(ext, route)) continue;
53
70
  try {
54
71
  const result = await ext.onRequest(ctx);
55
72
  if (result && result.skip) return result;
@@ -62,8 +79,10 @@ export async function runOnRequest(ctx, snapshot) {
62
79
 
63
80
  export async function runOnResponseStart(ctx, snapshot) {
64
81
  const exts = snapshot || registry;
82
+ const route = ctx.meta?.route;
65
83
  for (const ext of exts) {
66
84
  if (!ext.onResponseStart) continue;
85
+ if (!appliesToRoute(ext, route)) continue;
67
86
  try {
68
87
  await ext.onResponseStart(ctx);
69
88
  } catch (err) {
@@ -74,8 +93,10 @@ export async function runOnResponseStart(ctx, snapshot) {
74
93
 
75
94
  export async function runOnStreamEvent(ctx, snapshot) {
76
95
  const exts = snapshot || registry;
96
+ const route = ctx.meta?.route;
77
97
  for (const ext of exts) {
78
98
  if (!ext.onStreamEvent) continue;
99
+ if (!appliesToRoute(ext, route)) continue;
79
100
  try {
80
101
  await ext.onStreamEvent(ctx);
81
102
  } catch (err) {
@@ -86,8 +107,10 @@ export async function runOnStreamEvent(ctx, snapshot) {
86
107
 
87
108
  export async function runOnResponse(ctx, snapshot) {
88
109
  const exts = snapshot || registry;
110
+ const route = ctx.meta?.route;
89
111
  for (const ext of exts) {
90
112
  if (!ext.onResponse) continue;
113
+ if (!appliesToRoute(ext, route)) continue;
91
114
  try {
92
115
  await ext.onResponse(ctx);
93
116
  } catch (err) {
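The effect of the route-scoping rule in the hunks above can be seen with a short standalone example. The `appliesToRoute` function is copied from the diff; the three extension objects and the `namesFor` helper are hypothetical and exist only to show which extensions would run on each route.

```javascript
// Copied from the diff: extensions default to the "messages" route.
function appliesToRoute(ext, route) {
  if (!route) return true; // untagged callers: no filtering
  const routes = ext.routes || ["messages"];
  return routes.includes(route);
}

// Hypothetical extensions for illustration only.
const extensions = [
  { name: "messages-only-ext" }, // no `routes`: defaults to ["messages"]
  { name: "bootstrap-only-ext", routes: ["bootstrap"] },
  { name: "cross-cutting-ext", routes: ["messages", "bootstrap"] },
];

function namesFor(route) {
  return extensions.filter((ext) => appliesToRoute(ext, route)).map((ext) => ext.name);
}

console.log(namesFor("messages"));
console.log(namesFor("bootstrap"));
console.log(namesFor(undefined));
```

An extension without a `routes` array never sees bootstrap traffic, which is what keeps the existing message-mutating extensions away from a request whose body may be null.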
package/proxy/server.mjs CHANGED
@@ -1,5 +1,5 @@
1
1
  import http from "node:http";
2
- import { pathToFileURL } from "node:url";
2
+ import { pathToFileURL, URL } from "node:url";
3
3
  import config from "./config.mjs";
4
4
  import { forwardRequest } from "./upstream.mjs";
5
5
  import { streamResponse, createTelemetryRecord } from "./stream.mjs";
@@ -15,16 +15,15 @@ function collectBody(req) {
    });
  }

- async function handleMessages(clientReq, clientRes) {
-   const abortController = new AbortController();
-   const extSnapshot = snapshotRegistry();
-
-   clientReq.on("close", () => {
-     if (!clientRes.writableEnded) {
-       abortController.abort();
-     }
-   });
-
+ // Run the pre-forward pipeline stages (collect body, parse, runOnRequest)
+ // and either short-circuit with an extension-supplied response (block mode,
+ // auth-failure synth, etc.) or return the inputs the caller needs to drive
+ // forwarding and the post-response stages.
+ //
+ // `routeName` is stashed on ctx.meta.route so route-aware extensions
+ // (bootstrap-defense, env-flag-detector) can discriminate without each
+ // route needing its own pipeline hook.
+ async function preForward(clientReq, clientRes, _abortController, extSnapshot, routeName, baseMeta = {}) {
    const rawBody = await collectBody(clientReq);

    let parsed;
@@ -35,23 +34,49 @@ async function handleMessages(clientReq, clientRes) {
    }

    let forwardBody = rawBody;
-   const meta = {};
+   // baseMeta lets routes pre-populate audit scalars (e.g. resolved upstream
+   // hostname, request_id) so they're available to onRequest hooks BEFORE the
+   // upstream call — block-mode short-circuits in onRequest, so a post-call
+   // stash would miss the block-path audit record.
+   const meta = { ...baseMeta, route: routeName };

-   if (parsed && extSnapshot.length > 0) {
+   if (extSnapshot.length > 0) {
      const reqCtx = { body: parsed, headers: { ...clientReq.headers }, meta };
      const skipResult = await runOnRequest(reqCtx, extSnapshot);

      if (skipResult && skipResult.skip) {
        const status = skipResult.status || 400;
-       const body = skipResult.body || { error: "blocked_by_extension" };
-       clientRes.writeHead(status, { "content-type": "application/json" });
-       clientRes.end(JSON.stringify(body));
-       return;
+       const headers = skipResult.headers || { "content-type": "application/json" };
+       const body = skipResult.body ?? { error: "blocked_by_extension" };
+       clientRes.writeHead(status, headers);
+       clientRes.end(typeof body === "string" ? body : JSON.stringify(body));
+       return { handled: true };
      }

-     forwardBody = Buffer.from(JSON.stringify(reqCtx.body));
+     if (parsed) {
+       forwardBody = Buffer.from(JSON.stringify(reqCtx.body));
+     }
    }

+   return { handled: false, parsed, forwardBody, meta };
+ }
+
+ async function handleMessages(clientReq, clientRes) {
+   const abortController = new AbortController();
+   const extSnapshot = snapshotRegistry();
+
+   // Streaming SSE: if the client gives up mid-stream, free the upstream.
+   // Bootstrap (handleBootstrap) doesn't install this because its response is
+   // a single non-SSE JSON payload — aborting on clientReq close prematurely
+   // would race the response write on fast-failure paths (e.g. ECONNREFUSED).
+   clientReq.on("close", () => {
+     if (!clientRes.writableEnded) abortController.abort();
+   });
+
+   const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "messages");
+   if (pre.handled) return;
+   const { parsed, forwardBody, meta } = pre;
+
    const requestedModel = parsed?.model || null;

    let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
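
The short-circuit branch added to `preForward` in the hunk above changes how an extension's skip result becomes an HTTP response: headers can now be supplied by the extension, the body default uses `??` instead of `||`, and string bodies are written as-is. The following is a minimal sketch of that mapping in isolation. `resolveSkipResponse` is a hypothetical helper name introduced here for illustration and does not exist in the package; the defaults are copied from the `+` lines.

```javascript
// Hypothetical helper (not part of the package): mirrors the defaults that
// the `+` lines in preForward apply to an extension's skip result.
function resolveSkipResponse(skipResult) {
  const status = skipResult.status || 400;
  const headers = skipResult.headers || { "content-type": "application/json" };
  // `??` falls back only on null/undefined, so an empty-string body survives.
  const body = skipResult.body ?? { error: "blocked_by_extension" };
  // Strings pass through untouched; anything else is serialized as JSON.
  const payload = typeof body === "string" ? body : JSON.stringify(body);
  return { status, headers, payload };
}

// A bare block: every field falls back to its default.
const blocked = resolveSkipResponse({ skip: true });

// A synthesized plain-text response with an intentionally empty body.
const synthesized = resolveSkipResponse({
  skip: true,
  status: 200,
  headers: { "content-type": "text/plain" },
  body: "",
});

console.log(blocked);
console.log(synthesized);
```

Under the previous `||` default, the empty-string body in the second call would have been replaced by the `blocked_by_extension` object, which appears to be the case the switch to `??` is meant to cover.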
@@ -129,6 +154,89 @@ async function handleMessages(clientReq, clientRes) {
    }
  }

+ // Route handler for `/api/claude_cli/bootstrap` (CC v2.1.150+ system-prompt
+ // injection channel). Same pipeline shape as handleMessages but without
+ // the streaming branch — bootstrap is a single non-SSE JSON response.
+ // The bootstrap-defense extension binds to onRequest/onResponse with
+ // `ctx.meta.route === "bootstrap"` to drive audit/block behavior.
+ async function handleBootstrap(clientReq, clientRes) {
+   const abortController = new AbortController();
+   const extSnapshot = snapshotRegistry();
+
+   // Resolve audit-record scalars BEFORE preForward so they're visible to
+   // onRequest hooks (block-mode short-circuits there). HTTP responses don't
+   // carry a Host header, so the audit log derives upstream_host from
+   // config.upstream — the actual destination requests were forwarded to.
+   let upstreamHost = null;
+   try {
+     upstreamHost = new URL(config.upstream).hostname;
+   } catch {}
+   const baseMeta = {
+     _bootstrapUpstreamHost: upstreamHost,
+     _bootstrapRequestId:
+       clientReq.headers["request-id"] ?? clientReq.headers["x-request-id"] ?? null,
+   };
+
+   const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "bootstrap", baseMeta);
+   if (pre.handled) return;
+   const { forwardBody, meta } = pre;
+
+   let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
+
+   try {
+     ({ upstreamRes, responseHeaders, statusCode, upstreamConnectionId } = await forwardRequest(
+       clientReq,
+       forwardBody,
+       abortController.signal,
+     ));
+   } catch (err) {
+     // Anomaly audit: bootstrap upstream errors are exactly the kind of event
+     // an attacker triggering DNS shenanigans or an outage would produce, so
+     // route them through the extension pipeline before responding 502.
+     if (extSnapshot.length > 0) {
+       meta._bootstrapUpstreamError = err.message;
+       meta._bootstrapBodyBytes = 0;
+       const errCtx = { status: 502, headers: {}, body: null, meta };
+       await runOnResponse(errCtx, extSnapshot);
+     }
+     clientRes.writeHead(502, { "content-type": "application/json" });
+     clientRes.end(JSON.stringify({ error: "upstream_error", message: err.message }));
+     return;
+   }
+
+   meta._upstreamConnectionId = upstreamConnectionId ?? null;
+
+   if (extSnapshot.length > 0) {
+     const resCtx = { status: statusCode, headers: responseHeaders, meta };
+     await runOnResponseStart(resCtx, extSnapshot);
+   }
+
+   const chunks = [];
+   for await (const chunk of upstreamRes) chunks.push(chunk);
+   const rawResponse = Buffer.concat(chunks);
+
+   if (extSnapshot.length > 0) {
+     let responseBody = null;
+     try {
+       responseBody = JSON.parse(rawResponse.toString());
+     } catch {}
+     // Stash raw byte count so bootstrap-defense (and future audit extensions)
+     // can record the on-wire payload size even when the body fails to parse.
+     // Non-JSON responses are exactly the anomaly audit mode needs to capture.
+     meta._bootstrapBodyBytes = rawResponse.length;
+     const resCtx = { status: statusCode, headers: responseHeaders, body: responseBody, meta };
+     await runOnResponse(resCtx, extSnapshot);
+     if (responseBody !== null) {
+       clientRes.writeHead(statusCode, resCtx.headers);
+       clientRes.end(JSON.stringify(resCtx.body));
+       return;
+     }
+   }
+
+   clientRes.writeHead(statusCode, responseHeaders);
+   clientRes.end(rawResponse);
+ }
+
  function handleHealth(_req, res) {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ status: "ok" }));
@@ -157,6 +265,9 @@ export function createProxyServer() {
      if (req.method === "POST" && req.url?.startsWith("/v1/messages")) {
        return handleMessages(req, res);
      }
+     if (req.url?.startsWith("/api/claude_cli/bootstrap")) {
+       return handleBootstrap(req, res);
+     }
      handleNotFound(req, res);
    });
  }
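
The `handleBootstrap` handler in this file resolves two audit scalars before any extension hook runs: the upstream hostname parsed from `config.upstream`, and a request ID taken from either of two request headers. A minimal sketch of that resolution is shown below, assuming only what the `+` lines show. `buildBootstrapMeta` is a hypothetical helper name, and the upstream URL and request ID are placeholder values; the package inlines this logic rather than exposing a function.

```javascript
// Hypothetical helper (not part of the package): mirrors how handleBootstrap
// derives the scalars it passes to preForward as baseMeta.
function buildBootstrapMeta(upstream, headers) {
  let upstreamHost = null;
  try {
    // The global URL constructor throws a TypeError on unparseable input.
    upstreamHost = new URL(upstream).hostname;
  } catch {
    // Leave the host as null rather than failing the request.
  }
  return {
    _bootstrapUpstreamHost: upstreamHost,
    // Prefer `request-id`, then `x-request-id`, then null.
    _bootstrapRequestId: headers["request-id"] ?? headers["x-request-id"] ?? null,
  };
}

// Placeholder inputs, chosen for illustration only.
const resolved = buildBootstrapMeta("https://upstream.example:8443/v1", {
  "x-request-id": "req_123",
});
const unresolved = buildBootstrapMeta("not a url", {});

console.log(resolved);
console.log(unresolved);
```

Resolving these values up front matters because a block-mode extension can short-circuit in `onRequest`, in which case nothing after the upstream call would ever run.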
@@ -98,6 +98,17 @@ ts = sess.get('timestamp') or acc.get('timestamp', '')

  now = datetime.fromisoformat(ts.replace('Z', '+00:00')) if ts else datetime.now(timezone.utc)

+ SECS_PER_MIN = 60
+ MINS_PER_HR = 60
+ HRS_PER_DAY = 24
+ SECS_PER_HR = SECS_PER_MIN * MINS_PER_HR
+ SECS_PER_DAY = SECS_PER_HR * HRS_PER_DAY
+
+ # Minimum elapsed time in a window before we'll project an exhaust ETA from
+ # its burn rate. Below this the rate is dominated by a single early call and
+ # the projection is noise.
+ BURN_WARMUP_SEC = 5 * SECS_PER_MIN
+
  BAR_WIDTH = 10

  def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
@@ -121,15 +132,15 @@ def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
              cells.append('░')
      return '[' + ''.join(cells) + ']'

- def fmt_hm(secs):
-     if secs is None or secs <= 0:
-         return ''
-     return '{}h{:02d}m'.format(int(secs // 3600), int((secs % 3600) // 60))
-
- def fmt_dh(secs):
+ def fmt_time(secs):
+     # Autoselect scale: `{D}d{H}h` for >=1 day, `{H}h{MM}m` below that.
+     # One formatter so the Q5h (always h/m) and Q7d (h/m or d/h depending on
+     # how close to reset) callers don't need to pick.
      if secs is None or secs <= 0:
          return ''
-     return '{}d {}h'.format(int(secs // 86400), int((secs % 86400) // 3600))
+     if secs >= SECS_PER_DAY:
+         return '{}d{}h'.format(int(secs // SECS_PER_DAY), int((secs % SECS_PER_DAY) // SECS_PER_HR))
+     return '{}h{:02d}m'.format(int(secs // SECS_PER_HR), int((secs % SECS_PER_HR) // SECS_PER_MIN))

  def window_view(reset_ts, window_secs):
      # Returns (elapsed_sec, secs_left). elapsed_sec may be negative (server
@@ -150,7 +161,7 @@ def time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec):
          return None
      return (100 - pct) * elapsed_sec / pct

- def format_window(name, pct, elapsed_sec, window_secs, secs_left, fmt_time, min_elapsed_sec):
+ def format_window(name, pct, elapsed_sec, window_secs, secs_left, min_elapsed_sec):
      ep = None if elapsed_sec is None or elapsed_sec < 0 else elapsed_sec / window_secs * 100
      extras = []
      stale = secs_left is not None and secs_left <= 0
@@ -163,11 +174,11 @@ def format_window(name, pct, elapsed_sec, window_secs, secs_left, fmt_time, min_
      tail = ' (' + ', '.join(extras) + ')' if extras else ''
      return '{} {} {}%{}'.format(name, draw_bar(pct, ep), pct, tail)

- elapsed_5h, left_5h = window_view(q5h_reset, 5 * 3600)
- elapsed_7d, left_7d = window_view(q7d_reset, 7 * 86400)
+ elapsed_5h, left_5h = window_view(q5h_reset, 5 * SECS_PER_HR)
+ elapsed_7d, left_7d = window_view(q7d_reset, 7 * SECS_PER_DAY)

- label = format_window('Q5h', q5h, elapsed_5h, 5 * 3600, left_5h, fmt_hm, 60)
- label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 * 86400, left_7d, fmt_dh, 360)
+ label = format_window('Q5h', q5h, elapsed_5h, 5 * SECS_PER_HR, left_5h, BURN_WARMUP_SEC)
+ label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 * SECS_PER_DAY, left_7d, BURN_WARMUP_SEC)
  if overage == 'active':
      label += ' | OVERAGE'