claude-code-cache-fix 3.6.2 → 3.7.0
- package/README.md +26 -3
- package/package.json +1 -1
- package/proxy/config.mjs +6 -5
- package/proxy/extensions/bootstrap-defense.mjs +161 -0
- package/proxy/extensions.json +18 -68
- package/proxy/pipeline.mjs +23 -0
- package/proxy/server.mjs +129 -18
- package/tools/quota-statusline.sh +23 -12
package/README.md
CHANGED
@@ -208,6 +208,17 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 
 *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
 
+## What this proxy defends against
+
+**Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
+
+**Bootstrap-channel observability.** Claude Code v2.1.150 introduced a prompt-section consumer that fetches a server-supplied string from `/api/claude_cli/bootstrap` and merges it into the agent's behavioral-instructions prompt path. We filed this behavior with Anthropic's security team in May 2026; Anthropic closed the report as *Informative*, treating TLS as the transport-integrity boundary and declining to add application-layer authenticity checks. Cache-fix v3.7.0 adds explicit handling for this path. Default mode is `audit` — bootstrap responses proxy through to CC and are logged to `~/.claude/cache-fix-bootstrap-log.jsonl` so users can inspect them locally. To opt into block mode instead, set `CACHE_FIX_BOOTSTRAP_MODE=block` in the proxy environment; block mode short-circuits the upstream call and returns a 200 with an empty JSON body, dropping bootstrap content before it reaches CC. (Note: cache-fix v3.6.2 and earlier returned 404 for this path because the proxy router did not include it — the practical effect was that bootstrap content was not previously reaching CC for cache-fix users. v3.7.0's default `audit` changes that behavior; explicit `CACHE_FIX_BOOTSTRAP_MODE=block` preserves it.) The full disclosure record, including Anthropic's verbatim close text, is in [`docs/disclosure/heron-brook-2026-05.md`](docs/disclosure/heron-brook-2026-05.md).
+
+**Reference material:**
+- [`docs/disclosure/heron-brook-2026-05.md`](docs/disclosure/heron-brook-2026-05.md) — full disclosure record
+- [`CHANGELOG.md`](CHANGELOG.md#370---2026-05-26) — v3.7.0 release entry (includes the behavior-change note for prior users)
+- [`cnighswonger/heron-brook-poc`](https://github.com/cnighswonger/heron-brook-poc) — reproducer for the bootstrap-channel behavior
+
 ## Recommended CC operational config
 
 The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
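The README hunk above says audit mode writes one JSON record per line to `~/.claude/cache-fix-bootstrap-log.jsonl`. The following is a minimal sketch of how such a log could be summarized locally. `summarizeAuditLog` is a hypothetical helper, not part of the package; the field names (`phase`, `body_bytes`) are taken from `recordShape` in `proxy/extensions/bootstrap-defense.mjs` in this diff, and the sample values are invented.

```javascript
// Hypothetical helper: count audit records per phase and track the largest body.
// Takes the raw JSONL text; reading the file from disk is left to the caller.
function summarizeAuditLog(jsonlText) {
  const summary = { total: 0, byPhase: {}, maxBodyBytes: 0, unparseable: 0 };
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue; // skip the trailing blank line
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      summary.unparseable += 1; // tolerate a torn or corrupted line
      continue;
    }
    summary.total += 1;
    const phase = record.phase ?? "unknown";
    summary.byPhase[phase] = (summary.byPhase[phase] ?? 0) + 1;
    if (typeof record.body_bytes === "number") {
      summary.maxBodyBytes = Math.max(summary.maxBodyBytes, record.body_bytes);
    }
  }
  return summary;
}

// Invented sample records shaped like the extension's output.
const sampleLog = [
  JSON.stringify({ schema_version: 1, phase: "response_audited", mode: "audit", status: 200, body_bytes: 412 }),
  JSON.stringify({ schema_version: 1, phase: "request_blocked", mode: "block", status: null, body_bytes: null }),
  "",
].join("\n");

console.log(summarizeAuditLog(sampleLog));
```

In practice the input would be the contents of the log file, for example read with `fs.readFileSync(path, "utf8")`. A sudden jump in `maxBodyBytes` or the appearance of `upstream_error_audited` records is the kind of change this summary makes easy to spot.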
@@ -242,6 +253,18 @@ If you've seen the `autoCompactWindow: 1000000` setting recommended elsewhere: i
 
 If you set this flag, CC strips any tool field outside `["name", "description", "input_schema", "cache_control"]` from outgoing requests. Custom tools relying on `defer_loading` or `eager_input_streaming` will silently lose those fields and behave differently. Worth knowing before turning the flag on.
 
+## Known CC behaviors that affect cache cost
+
+These aren't bugs cache-fix patches — they're upstream CC behaviors users should be aware of when sizing their session cost.
+
+### Diagnostic slash commands inflate conversation history ([#49335](https://github.com/anthropics/claude-code/issues/49335))
+
+Running `/context`, `/release-notes` (and likely other state-inspection commands) appends the diagnostic output to conversation history rather than rendering terminal-only. Subsequent turns replay the inflated payload via prompt cache, compounding token cost on a state-inspection action that should be free. Empirically measured at +3,480 `cache_creation_input_tokens` for a single `/context` invocation on v2.1.148; another user reports ~5K on a separate session. `/release-notes` is worse — defaults to dumping the full changelog.
+
+Worse for diagnosis: the inflated payload that bills against your cache isn't written to the local JSONL transcript, so you can't audit the cost source locally — you can only infer it from `cache_creation_input_tokens` jumps in response usage metadata. (Proxy-mode users can inspect the deltas in `~/.claude/quota-status/` files, which the proxy writes directly from response headers.)
+
+**Workaround until upstream fix:** use these commands sparingly in long sessions. If you need them frequently in a session, consider `/compact` after a diagnostic run to reset the bleed.
+
 ## Quick Start: Preload (CC v2.1.112 and earlier)
 
 If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
@@ -358,7 +381,7 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
 Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into `~/.claude/quota-status/account.json` (account-global fields: Q5h/Q7d, status, overage) plus `~/.claude/quota-status/sessions/<id>.json` (per-session cache fields: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script displays a live status line showing:
 
 - **Q5h** quota bar `[███░┃░░░░░]` + percent + `(exhaust X, reset Y)`. Filled cells are consumed quota; the heavy-vertical tick is wall-clock elapsed position in the window. Tick to the right of the fill = under pace; tick inside the fill = burning faster than time (over pace). `exhaust` is the projected time-to-100% at the current burn rate; `reset` is the wall-clock time until the window rolls over. When `exhaust < reset`, you will hit 100% before the window resets — back off.
-- **Q7d** same shape with day-scale durations (e.g. `(exhaust
+- **Q7d** same shape with day-scale durations (e.g. `(exhaust 3d13h, reset 3d0h)`). Below a day, the suffix auto-switches to `h/m` format (e.g. `(exhaust 1h41m, reset 0h30m)`).
 - **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
 - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
 - **Cache hit rate %**
@@ -367,10 +390,10 @@ Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into
 Example line (mid-window, healthy state):
 
 ```
-Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust
+Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust 3d13h, reset 3d0h) | TTL:1h 98.3%
 ```
 
-The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first
+The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first 5 minutes after window start the burn rate isn't stable enough to project (a single early call dominates the rate), so `exhaust` is held back until then on both Q5h and Q7d; a stale `resets_at` (the server-reported value sits in the past, before the next API call refreshes it) drops both.
 
 The bar uses Unicode block characters (`█┃░`) — most modern terminals render these correctly. If your terminal substitutes boxes or replacement glyphs, configure a Unicode-capable font (any DejaVu, Fira, Iosevka, JetBrains Mono, etc.).
 
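The `exhaust` projection described in the README text above is a linear extrapolation of the burn rate. Below is a minimal JavaScript sketch of that arithmetic. The formula `(100 - pct) * elapsed / pct` appears in `tools/quota-statusline.sh` in this diff; the guard conditions are an assumption reconstructed from the README wording (no projection at 0%, at 100%, or during the 5-minute warm-up), and `willExhaustBeforeReset` is a hypothetical helper.

```javascript
const BURN_WARMUP_SEC = 5 * 60; // the 5-minute warm-up named in the README text

// Projected seconds until 100%, or null when no projection is meaningful.
function timeToExhaustSec(pct, elapsedSec, minElapsedSec = BURN_WARMUP_SEC) {
  if (pct <= 0 || pct >= 100) return null; // fresh or already exhausted window
  if (elapsedSec == null || elapsedSec < minElapsedSec) return null; // rate not stable yet
  return ((100 - pct) * elapsedSec) / pct;
}

// Hypothetical helper: true when the "back off" condition (exhaust < reset) holds.
function willExhaustBeforeReset(pct, elapsedSec, secsLeft) {
  const exhaust = timeToExhaustSec(pct, elapsedSec);
  return exhaust !== null && secsLeft > 0 && exhaust < secsLeft;
}

// README example: Q5h at 30%, two hours into a five-hour window.
const elapsed = 2 * 3600;
const secsLeft = 3 * 3600;
console.log(timeToExhaustSec(30, elapsed)); // 16800 seconds, i.e. 4h40m
console.log(willExhaustBeforeReset(30, elapsed, secsLeft)); // false
```

The numbers line up with the README's example line: 30% consumed after 2h projects to 4h40m until exhaustion, which is later than the 3h00m reset, so the window is under pace.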
package/package.json
CHANGED
package/proxy/config.mjs
CHANGED
@@ -10,14 +10,15 @@ function envInt(name, fallback) {
 
 const __dirname = dirname(fileURLToPath(import.meta.url));
 
-//
-//
-// for test isolation (see test/proxy-upstream-corp-proxy.test.mjs
-//
+// Most fields are read once at module init (preserving prior behavior).
+// Corp-proxy/CA fields and `upstream` are getters so they reflect live env —
+// important for test isolation (see test/proxy-upstream-corp-proxy.test.mjs
+// and test/proxy-server-bootstrap.test.mjs) and for callers that legitimately
+// want to flip env at runtime.
 const config = {
   port: envInt("CACHE_FIX_PROXY_PORT", 9801),
   bind: process.env.CACHE_FIX_PROXY_BIND || "127.0.0.1",
-  upstream
+  get upstream() { return process.env.CACHE_FIX_PROXY_UPSTREAM || "https://api.anthropic.com"; },
   timeout: envInt("CACHE_FIX_PROXY_TIMEOUT", 600_000),
   extensionsDir: process.env.CACHE_FIX_EXTENSIONS_DIR || join(__dirname, "extensions"),
   extensionsConfig: process.env.CACHE_FIX_EXTENSIONS_CONFIG || join(__dirname, "extensions.json"),
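The `config.mjs` change turns `upstream` into a getter so that it reflects the environment at access time rather than at module load. A small sketch of the difference follows; `DEMO_PROXY_UPSTREAM` and `demoConfig` are hypothetical names used only for this illustration and are not read by the package.

```javascript
// Hypothetical env var name, chosen so the demo does not touch real proxy settings.
const DEMO_VAR = "DEMO_PROXY_UPSTREAM";
const DEFAULT_UPSTREAM = "https://api.anthropic.com";

delete process.env[DEMO_VAR]; // start from a known state

const demoConfig = {
  // Plain property: evaluated once, when the object literal is created.
  snapshotUpstream: process.env[DEMO_VAR] || DEFAULT_UPSTREAM,
  // Getter: re-reads the environment on every access.
  get liveUpstream() {
    return process.env[DEMO_VAR] || DEFAULT_UPSTREAM;
  },
};

// Flip the env after the config object already exists.
process.env[DEMO_VAR] = "http://127.0.0.1:9999";

console.log(demoConfig.snapshotUpstream); // still the default
console.log(demoConfig.liveUpstream); // reflects the new value
```

This is why a test can point the proxy at a local stub server by setting the env var after import, which a read-once field would not allow.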
package/proxy/extensions/bootstrap-defense.mjs
ADDED
@@ -0,0 +1,161 @@
+import { appendFileSync, statSync, renameSync, mkdirSync } from "node:fs";
+import { join, dirname } from "node:path";
+import { homedir } from "node:os";
+
+const LOG_ROTATE_BYTES = 5 * 1024 * 1024;
+const SCHEMA_VERSION = 1;
+const EXTENSION_VERSION = "v3.6.3";
+
+function logPath() {
+  return process.env.CACHE_FIX_BOOTSTRAP_LOG_PATH || join(homedir(), ".claude", "cache-fix-bootstrap-log.jsonl");
+}
+
+// Single-tier rotation by design: any previous .1 gets overwritten. The audit
+// log is a forward-looking signal feed, not an archival record — if v3.7.0
+// anomaly detection wants long-term retention it can subscribe to the stream
+// directly. Keeping rotation to one tier bounds disk usage at 2×5MB = 10MB.
+function rotateIfNeeded(path) {
+  let size = 0;
+  try { size = statSync(path).size; } catch { return; }
+  if (size < LOG_ROTATE_BYTES) return;
+  try { renameSync(path, `${path}.1`); } catch {}
+}
+
+// Single-writer invariant: cache-fix-proxy is one Node process per host, and
+// every bootstrap response that needs logging flows through this extension.
+// All writes are appendFileSync from a single event loop, so no inter-process
+// or inter-extension locking is required. If that invariant ever changes
+// (e.g. multi-process proxy, sibling extension writing to the same file),
+// this writer needs to gain a lock.
+function appendRecord(record) {
+  const path = logPath();
+  try {
+    mkdirSync(dirname(path), { recursive: true });
+    rotateIfNeeded(path);
+    appendFileSync(path, JSON.stringify(record) + "\n");
+  } catch (err) {
+    process.stderr.write(`[bootstrap-defense] log write failed: ${err.message}\n`);
+  }
+}
+
+function modeFromEnv(fallback) {
+  const raw = process.env.CACHE_FIX_BOOTSTRAP_MODE;
+  if (raw === "audit" || raw === "block") return raw;
+  return fallback;
+}
+
+// PII discipline: the audit log MUST NOT include client headers (Authorization,
+// x-api-key, cookies, etc.) or request/response bodies. Callers pass only the
+// extracted scalar fields below — the full headers object is never threaded
+// through this function, so a future maintainer can't accidentally widen the
+// log surface by spreading the parameter object.
+//
+// v3.7.0 extension fields are emitted as null/defaulted so log readers don't
+// break when v3.7.0 starts populating baseline_hash, anomaly_status, etc.
+function recordShape({ phase, mode, status, body_bytes, upstream_host, request_id, error }) {
+  return {
+    schema_version: SCHEMA_VERSION,
+    extension_version: EXTENSION_VERSION,
+    timestamp: new Date().toISOString(),
+    phase,
+    mode,
+    status: status ?? null,
+    body_bytes: body_bytes ?? null,
+    upstream_host: upstream_host ?? null,
+    request_id: request_id ?? null,
+    baseline_hash: null,
+    anomaly_status: null,
+    error: error ?? null,
+  };
+}
+
+// Extract audit-record scalars from the request context. Meta wins over
+// headers so that the canonical request-side fields stashed by the server
+// (handleBootstrap stashes `_bootstrapUpstreamHost` and `_bootstrapRequestId`
+// before the upstream call) are authoritative on the onResponse path, where
+// ctx.headers carries the upstream RESPONSE headers (no Host, no request-id).
+// On the onRequest block path, meta hasn't been populated yet — the helper
+// falls back to ctx.headers (the client request headers) so request_id is
+// still captured. Upstream_host falls back to null on the request path
+// because the resolved upstream isn't known until handleBootstrap runs.
+//
+// Signature is scalar-only by design: callers MUST NOT spread a headers
+// object into recordShape, so a future maintainer can't accidentally widen
+// the log surface to carry Authorization / x-api-key / cookies.
+function extractAuditFields(meta, headers) {
+  const requestId =
+    meta?._bootstrapRequestId ??
+    headers?.["request-id"] ??
+    headers?.["x-request-id"] ??
+    null;
+  return {
+    upstream_host: meta?._bootstrapUpstreamHost ?? null,
+    request_id: requestId,
+  };
+}
+
+export default {
+  name: "bootstrap-defense",
+  description:
+    "Audit (default) or block /api/claude_cli/bootstrap traffic. Audit mode proxies the response through to CC and logs metadata to ~/.claude/cache-fix-bootstrap-log.jsonl. Block mode returns empty 200, preserving v3.6.2's de-facto behavior in explicit form.",
+  // Pipeline route scoping (pipeline.mjs:appliesToRoute) gates this extension
+  // to the bootstrap path. The internal route guard below is belt-and-suspenders
+  // in case a caller invokes the hook directly.
+  routes: ["bootstrap"],
+  order: 45,
+
+  async onRequest(ctx) {
+    if (ctx.meta?.route !== "bootstrap") return;
+
+    const mode = modeFromEnv("audit");
+    ctx.meta._bootstrapDefenseMode = mode;
+
+    if (mode === "block") {
+      appendRecord(
+        recordShape({
+          phase: "request_blocked",
+          mode,
+          ...extractAuditFields(ctx.meta, ctx.headers),
+        }),
+      );
+      return {
+        skip: true,
+        status: 200,
+        headers: { "content-type": "application/json" },
+        body: {},
+      };
+    }
+  },
+
+  async onResponse(ctx) {
+    if (ctx.meta?.route !== "bootstrap") return;
+    const mode = ctx.meta._bootstrapDefenseMode || "audit";
+    if (mode !== "audit") return;
+
+    // Prefer raw on-wire byte count from server.mjs (accurate even for
+    // non-JSON / unparseable upstream responses). The fallback stringify path
+    // exists only for direct unit-test invocation of onResponse without going
+    // through handleBootstrap — production traffic always sets the meta field.
+    let bodyBytes = ctx.meta?._bootstrapBodyBytes ?? null;
+    if (bodyBytes === null) {
+      try { bodyBytes = Buffer.byteLength(JSON.stringify(ctx.body ?? {})); } catch {}
+    }
+
+    const upstreamError = ctx.meta?._bootstrapUpstreamError ?? null;
+    // Request-side context (request_id, upstream_host) is captured at
+    // forward time and stashed on ctx.meta — we read from meta here rather
+    // than ctx.headers (which on onResponse contains the upstream RESPONSE
+    // headers, not the client request).
+    appendRecord(
+      recordShape({
+        phase: upstreamError ? "upstream_error_audited" : "response_audited",
+        mode,
+        status: ctx.status,
+        body_bytes: bodyBytes,
+        upstream_host: ctx.meta?._bootstrapUpstreamHost ?? null,
+        request_id: ctx.meta?._bootstrapRequestId ?? null,
+        error: upstreamError,
+      }),
+    );
+  },
+};
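The precedence rules in `extractAuditFields` are easy to misread, so here is a small usage sketch. The function body is copied from the file above; the request ids, host, and header values are invented for illustration.

```javascript
// Copied from bootstrap-defense.mjs above so the example runs standalone.
function extractAuditFields(meta, headers) {
  const requestId =
    meta?._bootstrapRequestId ??
    headers?.["request-id"] ??
    headers?.["x-request-id"] ??
    null;
  return {
    upstream_host: meta?._bootstrapUpstreamHost ?? null,
    request_id: requestId,
  };
}

// Case 1: meta is populated, so it wins over whatever the headers carry.
const fromMeta = extractAuditFields(
  { _bootstrapRequestId: "req-meta", _bootstrapUpstreamHost: "api.anthropic.com" },
  { "request-id": "req-header", authorization: "Bearer invented-token" },
);

// Case 2: meta has no request id, so the header fallback chain applies.
const fromHeaders = extractAuditFields({}, { "x-request-id": "req-x" });

console.log(fromMeta); // request_id comes from meta; no authorization field
console.log(fromHeaders); // request_id comes from x-request-id; upstream_host is null
```

Because `??` only falls through on `null` or `undefined`, a meta value of `null` behaves the same as a missing key. The return value is built field by field, so sensitive headers such as `authorization` cannot leak into the record even when the whole headers object is passed in.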
package/proxy/extensions.json
CHANGED
@@ -1,70 +1,20 @@
 {
-  "
-
-
-  },
-  "
-
-
-  },
-  "
-
-
-  },
-  "
-
-
-  },
-  "
-
-    "order": 250
-  },
-  "identity-normalization": {
-    "enabled": true,
-    "order": 300
-  },
-  "smoosh-split": {
-    "enabled": true,
-    "order": 320
-  },
-  "content-strip": {
-    "enabled": true,
-    "order": 330
-  },
-  "tool-input-normalize": {
-    "enabled": true,
-    "order": 340
-  },
-  "microcompact-stability": {
-    "enabled": true,
-    "order": 350
-  },
-  "thinking-display": {
-    "enabled": true,
-    "order": 360
-  },
-  "cache-control-normalize": {
-    "enabled": true,
-    "order": 400
-  },
-  "messages-cache-breakpoint": {
-    "enabled": true,
-    "order": 410
-  },
-  "ttl-management": {
-    "enabled": true,
-    "order": 500
-  },
-  "cache-telemetry": {
-    "enabled": true,
-    "order": 600
-  },
-  "overage-warning": {
-    "enabled": true,
-    "order": 610
-  },
-  "request-log": {
-    "enabled": false,
-    "order": 700
-  }
+  "bootstrap-defense": { "enabled": true, "order": 45 },
+  "ttl-tier-detect": { "enabled": true, "order": 75 },
+  "fingerprint-strip": { "enabled": true, "order": 100 },
+  "image-strip": { "enabled": true, "order": 150 },
+  "sort-stabilization": { "enabled": true, "order": 200 },
+  "fresh-session-sort": { "enabled": true, "order": 250 },
+  "identity-normalization": { "enabled": true, "order": 300 },
+  "smoosh-split": { "enabled": true, "order": 320 },
+  "content-strip": { "enabled": true, "order": 330 },
+  "tool-input-normalize": { "enabled": true, "order": 340 },
+  "microcompact-stability": { "enabled": true, "order": 350 },
+  "thinking-display": { "enabled": true, "order": 360 },
+  "cache-control-normalize": { "enabled": true, "order": 400 },
+  "messages-cache-breakpoint": { "enabled": true, "order": 410 },
+  "ttl-management": { "enabled": true, "order": 500 },
+  "cache-telemetry": { "enabled": true, "order": 600 },
+  "overage-warning": { "enabled": true, "order": 610 },
+  "request-log": { "enabled": false, "order": 700 }
 }
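Each entry in `extensions.json` carries an `enabled` flag and a numeric `order`. The loader that consumes this file is not part of the diff, so the following is only a sketch under two stated assumptions: disabled entries are skipped, and the remaining extensions run in ascending `order`. `activeExtensionOrder` is a hypothetical helper, and the sample object is a four-entry subset of the new file.

```javascript
// Hypothetical helper. ASSUMPTION: lower `order` runs first, disabled entries are skipped.
function activeExtensionOrder(extensionsConfig) {
  return Object.entries(extensionsConfig)
    .filter(([, entry]) => entry.enabled)
    .sort(([, a], [, b]) => a.order - b.order)
    .map(([name]) => name);
}

// Subset of the new extensions.json, deliberately listed out of order.
const sampleConfig = {
  "request-log": { enabled: false, order: 700 },
  "ttl-tier-detect": { enabled: true, order: 75 },
  "bootstrap-defense": { enabled: true, order: 45 },
  "fingerprint-strip": { enabled: true, order: 100 },
};

console.log(activeExtensionOrder(sampleConfig));
```

If that assumption holds, `order: 45` would place `bootstrap-defense` ahead of every other listed extension, which fits its role of short-circuiting the request in block mode.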
package/proxy/pipeline.mjs
CHANGED
@@ -46,10 +46,27 @@ export function snapshotRegistry() {
   return [...registry];
 }
 
+// Route scoping: extensions default to messages-only so that adding a new
+// route (e.g. /api/claude_cli/bootstrap) doesn't drag every existing
+// message-mutating extension onto it — most throw on a null body because
+// they were never designed for non-messages traffic. Cross-cutting
+// extensions (cache-telemetry, usage-log, …) opt into additional routes
+// by declaring an explicit `routes` array on their default export.
+//
+// If ctx.meta.route is undefined we skip filtering entirely — preserves
+// back-compat for callers that don't tag routes (legacy tests, embedders).
+function appliesToRoute(ext, route) {
+  if (!route) return true;
+  const routes = ext.routes || ["messages"];
+  return routes.includes(route);
+}
+
 export async function runOnRequest(ctx, snapshot) {
   const exts = snapshot || registry;
+  const route = ctx.meta?.route;
   for (const ext of exts) {
     if (!ext.onRequest) continue;
+    if (!appliesToRoute(ext, route)) continue;
     try {
       const result = await ext.onRequest(ctx);
       if (result && result.skip) return result;
@@ -62,8 +79,10 @@ export async function runOnRequest(ctx, snapshot) {
 
 export async function runOnResponseStart(ctx, snapshot) {
   const exts = snapshot || registry;
+  const route = ctx.meta?.route;
   for (const ext of exts) {
     if (!ext.onResponseStart) continue;
+    if (!appliesToRoute(ext, route)) continue;
     try {
       await ext.onResponseStart(ctx);
     } catch (err) {
@@ -74,8 +93,10 @@ export async function runOnResponseStart(ctx, snapshot) {
 
 export async function runOnStreamEvent(ctx, snapshot) {
   const exts = snapshot || registry;
+  const route = ctx.meta?.route;
   for (const ext of exts) {
     if (!ext.onStreamEvent) continue;
+    if (!appliesToRoute(ext, route)) continue;
     try {
       await ext.onStreamEvent(ctx);
     } catch (err) {
@@ -86,8 +107,10 @@ export async function runOnStreamEvent(ctx, snapshot) {
 
 export async function runOnResponse(ctx, snapshot) {
   const exts = snapshot || registry;
+  const route = ctx.meta?.route;
   for (const ext of exts) {
     if (!ext.onResponse) continue;
+    if (!appliesToRoute(ext, route)) continue;
     try {
       await ext.onResponse(ctx);
     } catch (err) {
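To make the route-scoping rule in `appliesToRoute` concrete, the sketch below runs the function (copied from the hunk above) against three example extension objects. The extension names and the `namesFor` helper are hypothetical; only the `routes` property and the default of `["messages"]` come from the diff.

```javascript
// Copied from pipeline.mjs above so the example runs standalone.
function appliesToRoute(ext, route) {
  if (!route) return true;
  const routes = ext.routes || ["messages"];
  return routes.includes(route);
}

// Hypothetical extensions illustrating the three scoping cases.
const exampleExtensions = [
  { name: "message-only-example" }, // no `routes`: defaults to messages
  { name: "bootstrap-only-example", routes: ["bootstrap"] },
  { name: "cross-cutting-example", routes: ["messages", "bootstrap"] },
];

// Hypothetical helper: names of the extensions that would run on a route.
function namesFor(route) {
  return exampleExtensions.filter((ext) => appliesToRoute(ext, route)).map((ext) => ext.name);
}

console.log(namesFor("messages")); // message-only and cross-cutting
console.log(namesFor("bootstrap")); // bootstrap-only and cross-cutting
console.log(namesFor(undefined)); // all three: untagged callers skip filtering
```

The default keeps existing message-mutating extensions off the new bootstrap route without editing any of them, while an untagged context still reaches every extension as before.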
package/proxy/server.mjs
CHANGED
@@ -1,5 +1,5 @@
 import http from "node:http";
-import { pathToFileURL } from "node:url";
+import { pathToFileURL, URL } from "node:url";
 import config from "./config.mjs";
 import { forwardRequest } from "./upstream.mjs";
 import { streamResponse, createTelemetryRecord } from "./stream.mjs";
@@ -15,16 +15,15 @@ function collectBody(req) {
   });
 }
 
-
-
-
-
-
-
-
-
-
-
+// Run the pre-forward pipeline stages (collect body, parse, runOnRequest)
+// and either short-circuit with an extension-supplied response (block mode,
+// auth-failure synth, etc.) or return the inputs the caller needs to drive
+// forwarding and the post-response stages.
+//
+// `routeName` is stashed on ctx.meta.route so route-aware extensions
+// (bootstrap-defense, env-flag-detector) can discriminate without each
+// route needing its own pipeline hook.
+async function preForward(clientReq, clientRes, _abortController, extSnapshot, routeName, baseMeta = {}) {
   const rawBody = await collectBody(clientReq);
 
   let parsed;
@@ -35,23 +34,49 @@ async function handleMessages(clientReq, clientRes) {
   }
 
   let forwardBody = rawBody;
-
+  // baseMeta lets routes pre-populate audit scalars (e.g. resolved upstream
+  // hostname, request_id) so they're available to onRequest hooks BEFORE the
+  // upstream call — block-mode short-circuits in onRequest, so a post-call
+  // stash would miss the block-path audit record.
+  const meta = { ...baseMeta, route: routeName };
 
-  if (
+  if (extSnapshot.length > 0) {
     const reqCtx = { body: parsed, headers: { ...clientReq.headers }, meta };
     const skipResult = await runOnRequest(reqCtx, extSnapshot);
 
     if (skipResult && skipResult.skip) {
       const status = skipResult.status || 400;
-      const
-
-      clientRes.
-
+      const headers = skipResult.headers || { "content-type": "application/json" };
+      const body = skipResult.body ?? { error: "blocked_by_extension" };
+      clientRes.writeHead(status, headers);
+      clientRes.end(typeof body === "string" ? body : JSON.stringify(body));
+      return { handled: true };
     }
 
-
+    if (parsed) {
+      forwardBody = Buffer.from(JSON.stringify(reqCtx.body));
+    }
   }
 
+  return { handled: false, parsed, forwardBody, meta };
+}
+
+async function handleMessages(clientReq, clientRes) {
+  const abortController = new AbortController();
+  const extSnapshot = snapshotRegistry();
+
+  // Streaming SSE: if the client gives up mid-stream, free the upstream.
+  // Bootstrap (handleBootstrap) doesn't install this because its response is
+  // a single non-SSE JSON payload — aborting on clientReq close prematurely
+  // would race the response write on fast-failure paths (e.g. ECONNREFUSED).
+  clientReq.on("close", () => {
+    if (!clientRes.writableEnded) abortController.abort();
+  });
+
+  const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "messages");
+  if (pre.handled) return;
+  const { parsed, forwardBody, meta } = pre;
+
   const requestedModel = parsed?.model || null;
 
   let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
@@ -129,6 +154,89 @@ async function handleMessages(clientReq, clientRes) {
   }
 }
 
+// Route handler for `/api/claude_cli/bootstrap` (CC v2.1.150+ system-prompt
+// injection channel). Same pipeline shape as handleMessages but without
+// the streaming branch — bootstrap is a single non-SSE JSON response.
+// The bootstrap-defense extension binds to onRequest/onResponse with
+// `ctx.meta.route === "bootstrap"` to drive audit/block behavior.
+async function handleBootstrap(clientReq, clientRes) {
+  const abortController = new AbortController();
+  const extSnapshot = snapshotRegistry();
+
+  // Resolve audit-record scalars BEFORE preForward so they're visible to
+  // onRequest hooks (block-mode short-circuits there). HTTP responses don't
+  // carry a Host header, so the audit log derives upstream_host from
+  // config.upstream — the actual destination requests were forwarded to.
+  let upstreamHost = null;
+  try {
+    upstreamHost = new URL(config.upstream).hostname;
+  } catch {}
+  const baseMeta = {
+    _bootstrapUpstreamHost: upstreamHost,
+    _bootstrapRequestId:
+      clientReq.headers["request-id"] ?? clientReq.headers["x-request-id"] ?? null,
+  };
+
+  const pre = await preForward(clientReq, clientRes, abortController, extSnapshot, "bootstrap", baseMeta);
+  if (pre.handled) return;
+  const { forwardBody, meta } = pre;
+
+  let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
+
+  try {
+    ({ upstreamRes, responseHeaders, statusCode, upstreamConnectionId } = await forwardRequest(
+      clientReq,
+      forwardBody,
+      abortController.signal,
+    ));
+  } catch (err) {
+    // Anomaly audit: bootstrap upstream errors are exactly the kind of event
+    // an attacker triggering DNS shenanigans or an outage would produce, so
+    // route them through the extension pipeline before responding 502.
+    if (extSnapshot.length > 0) {
+      meta._bootstrapUpstreamError = err.message;
+      meta._bootstrapBodyBytes = 0;
+      const errCtx = { status: 502, headers: {}, body: null, meta };
+      await runOnResponse(errCtx, extSnapshot);
+    }
+    clientRes.writeHead(502, { "content-type": "application/json" });
+    clientRes.end(JSON.stringify({ error: "upstream_error", message: err.message }));
+    return;
+  }
+
+  meta._upstreamConnectionId = upstreamConnectionId ?? null;
+
+  if (extSnapshot.length > 0) {
+    const resCtx = { status: statusCode, headers: responseHeaders, meta };
+    await runOnResponseStart(resCtx, extSnapshot);
+  }
+
+  const chunks = [];
+  for await (const chunk of upstreamRes) chunks.push(chunk);
+  const rawResponse = Buffer.concat(chunks);
+
+  if (extSnapshot.length > 0) {
+    let responseBody = null;
+    try {
+      responseBody = JSON.parse(rawResponse.toString());
+    } catch {}
+    // Stash raw byte count so bootstrap-defense (and future audit extensions)
+    // can record the on-wire payload size even when the body fails to parse.
+    // Non-JSON responses are exactly the anomaly audit mode needs to capture.
+    meta._bootstrapBodyBytes = rawResponse.length;
+    const resCtx = { status: statusCode, headers: responseHeaders, body: responseBody, meta };
+    await runOnResponse(resCtx, extSnapshot);
+    if (responseBody !== null) {
+      clientRes.writeHead(statusCode, resCtx.headers);
+      clientRes.end(JSON.stringify(resCtx.body));
+      return;
+    }
+  }
+
+  clientRes.writeHead(statusCode, responseHeaders);
+  clientRes.end(rawResponse);
+}
+
 function handleHealth(_req, res) {
   res.writeHead(200, { "content-type": "application/json" });
   res.end(JSON.stringify({ status: "ok" }));
@@ -157,6 +265,9 @@ export function createProxyServer() {
     if (req.method === "POST" && req.url?.startsWith("/v1/messages")) {
       return handleMessages(req, res);
     }
+    if (req.url?.startsWith("/api/claude_cli/bootstrap")) {
+      return handleBootstrap(req, res);
+    }
     handleNotFound(req, res);
   });
 }
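The short-circuit branch in `preForward` fills in defaults for whatever an extension leaves out of its skip result. The sketch below isolates that normalization as a pure function so the defaults can be checked without an HTTP server. `renderSkipResult` is a hypothetical name; the three default expressions are copied from the hunk above.

```javascript
// Hypothetical pure-function version of the skip-result handling in preForward.
function renderSkipResult(skipResult) {
  const status = skipResult.status || 400;
  const headers = skipResult.headers || { "content-type": "application/json" };
  const body = skipResult.body ?? { error: "blocked_by_extension" };
  // Strings are sent as-is; anything else is serialized as JSON.
  const payload = typeof body === "string" ? body : JSON.stringify(body);
  return { status, headers, payload };
}

// What bootstrap-defense returns in block mode (from the extension's onRequest).
const blockModeResult = {
  skip: true,
  status: 200,
  headers: { "content-type": "application/json" },
  body: {},
};

console.log(renderSkipResult(blockModeResult)); // status 200, payload "{}"
console.log(renderSkipResult({ skip: true })); // falls back to 400 and the default error body
```

The body default uses `??` rather than `||`, so an empty string body is preserved as an empty response instead of being swapped for the default error object. The empty object `{}` from block mode is truthy in either case and serializes to `{}`.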
package/tools/quota-statusline.sh
CHANGED
@@ -98,6 +98,17 @@ ts = sess.get('timestamp') or acc.get('timestamp', '')
 
 now = datetime.fromisoformat(ts.replace('Z', '+00:00')) if ts else datetime.now(timezone.utc)
 
+SECS_PER_MIN = 60
+MINS_PER_HR = 60
+HRS_PER_DAY = 24
+SECS_PER_HR = SECS_PER_MIN * MINS_PER_HR
+SECS_PER_DAY = SECS_PER_HR * HRS_PER_DAY
+
+# Minimum elapsed time in a window before we'll project an exhaust ETA from
+# its burn rate. Below this the rate is dominated by a single early call and
+# the projection is noise.
+BURN_WARMUP_SEC = 5 * SECS_PER_MIN
+
 BAR_WIDTH = 10
 
 def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
@@ -121,15 +132,15 @@ def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
             cells.append('░')
     return '[' + ''.join(cells) + ']'
 
-def
-
-
-
-
-def fmt_dh(secs):
+def fmt_time(secs):
+    # Autoselect scale: `{D}d{H}h` for >=1 day, `{H}h{MM}m` below that.
+    # One formatter so the Q5h (always h/m) and Q7d (h/m or d/h depending on
+    # how close to reset) callers don't need to pick.
     if secs is None or secs <= 0:
         return ''
-
+    if secs >= SECS_PER_DAY:
+        return '{}d{}h'.format(int(secs // SECS_PER_DAY), int((secs % SECS_PER_DAY) // SECS_PER_HR))
+    return '{}h{:02d}m'.format(int(secs // SECS_PER_HR), int((secs % SECS_PER_HR) // SECS_PER_MIN))
 
 def window_view(reset_ts, window_secs):
     # Returns (elapsed_sec, secs_left). elapsed_sec may be negative (server
@@ -150,7 +161,7 @@ def time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec):
         return None
     return (100 - pct) * elapsed_sec / pct
 
-def format_window(name, pct, elapsed_sec, window_secs, secs_left,
+def format_window(name, pct, elapsed_sec, window_secs, secs_left, min_elapsed_sec):
     ep = None if elapsed_sec is None or elapsed_sec < 0 else elapsed_sec / window_secs * 100
     extras = []
     stale = secs_left is not None and secs_left <= 0
@@ -163,11 +174,11 @@ def format_window(name, pct, elapsed_sec, window_secs, secs_left, fmt_time, min_
     tail = ' (' + ', '.join(extras) + ')' if extras else ''
     return '{} {} {}%{}'.format(name, draw_bar(pct, ep), pct, tail)
 
-elapsed_5h, left_5h = window_view(q5h_reset, 5 *
-elapsed_7d, left_7d = window_view(q7d_reset, 7 *
+elapsed_5h, left_5h = window_view(q5h_reset, 5 * SECS_PER_HR)
+elapsed_7d, left_7d = window_view(q7d_reset, 7 * SECS_PER_DAY)
 
-label = format_window('Q5h', q5h, elapsed_5h, 5 *
-label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 *
+label = format_window('Q5h', q5h, elapsed_5h, 5 * SECS_PER_HR, left_5h, BURN_WARMUP_SEC)
+label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 * SECS_PER_DAY, left_7d, BURN_WARMUP_SEC)
 if overage == 'active':
     label += ' | OVERAGE'
 
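The unified `fmt_time` formatter above picks its scale from the magnitude of the input. The sketch below is a JavaScript port of that Python function, written for illustration only (the package itself ships the Python version inside the shell script); `fmtTime` is a name chosen here.

```javascript
const SECS_PER_MIN = 60;
const SECS_PER_HR = 60 * SECS_PER_MIN;
const SECS_PER_DAY = 24 * SECS_PER_HR;

// Port of fmt_time: `{D}d{H}h` at one day or more, `{H}h{MM}m` below that.
function fmtTime(secs) {
  if (secs == null || secs <= 0) return "";
  if (secs >= SECS_PER_DAY) {
    const days = Math.floor(secs / SECS_PER_DAY);
    const hours = Math.floor((secs % SECS_PER_DAY) / SECS_PER_HR);
    return `${days}d${hours}h`;
  }
  const hours = Math.floor(secs / SECS_PER_HR);
  const mins = Math.floor((secs % SECS_PER_HR) / SECS_PER_MIN);
  return `${hours}h${String(mins).padStart(2, "0")}m`;
}

console.log(fmtTime(16800)); // "4h40m"
console.log(fmtTime(3 * SECS_PER_DAY)); // "3d0h"
console.log(fmtTime(1800)); // "0h30m"
```

Having one formatter means the Q7d caller gets `d/h` output while far from reset and drops to `h/m` automatically inside the final day, which matches the README examples `(exhaust 3d13h, reset 3d0h)` and `(exhaust 1h41m, reset 0h30m)`.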