claude-code-cache-fix 3.9.0 → 4.0.0

package/README.md CHANGED
@@ -12,7 +12,7 @@ Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-
12
12
 
13
13
  ## Quick Start: Proxy (recommended)
14
14
 
15
- The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
15
+ The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
16
16
 
17
17
  ```bash
18
18
  # Install
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
29
29
 
30
30
  ### What the proxy does
31
31
 
32
- On every `/v1/messages` request, 9 extensions run in order (one opt-in):
32
+ On every `/v1/messages` request, 9 extensions run in order:
33
33
 
34
34
  | Extension | What it fixes |
35
35
  |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
41
41
  | `cache-control-normalize` | Normalizes cache_control markers across messages |
42
42
  | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
43
43
  | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
44
- | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
44
+ | `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
45
45
 
46
- Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
46
+ Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
47
47
 
48
48
  **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
49
49
 
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
147
147
  | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
148
148
  | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
149
149
  | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
150
+ | `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
150
151
 
151
152
  ### Corporate environments (proxies, custom CAs)
152
153
 
@@ -210,6 +211,73 @@ Options (all optional; all fall back to the same env vars used by the CLI):
210
211
 
211
212
  *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
212
213
 
214
+ ## Upgrading from v3.x
215
+
216
+ **Behavior changes in v4.0.0:**
217
+
218
+ - **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
219
+ - **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
220
+
221
+ ### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
222
+
223
+ v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
224
+
225
+ - Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
226
+ - Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
227
+
228
+ The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
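
The per-request read mentioned in the embedder note can be pictured with a small sketch. `modeFromEnvSketch` is a hypothetical stand-in written for illustration, not the package's actual `modeFromEnv()`. It assumes exact, case-sensitive string matching and follows the documented modes for `CACHE_FIX_THINKING_SANITIZE`: unset or `on` gives v1, `off` disables, `v2` enables both drops, and any other value falls back to v1.

```javascript
// Hypothetical sketch, not the package's real modeFromEnv().
// Assumes exact, case-sensitive matching of the env value.
function modeFromEnvSketch(env = process.env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  if (raw === "off") return "off"; // explicit disable
  if (raw === "v2") return "v2";   // v1 drop plus tools-hash-mismatch drop
  return "v1";                     // unset, "on", or any other value
}

// Because the value is read on each call, a host can change it after the
// module has loaded and the next request sees the new mode.
const hostEnv = {};
console.log(modeFromEnvSketch(hostEnv)); // v1
hostEnv.CACHE_FIX_THINKING_SANITIZE = "off";
console.log(modeFromEnvSketch(hostEnv)); // off
```

Reading the variable per call rather than once at import is what makes the second option in the list above workable for embedders that configure themselves in code.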
229
+
230
+ Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
231
+
232
+ ### Flow 1 — code-only npm upgrade (recommended default)
233
+
234
+ Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
235
+
236
+ **Linux (systemd user unit):**
237
+
238
+ ```
239
+ npm install -g claude-code-cache-fix@4
240
+ systemctl --user restart cache-fix-proxy
241
+ ```
242
+
243
+ No `daemon-reload` required — the unit file content is unchanged.
244
+
245
+ **macOS (launchd user agent):**
246
+
247
+ ```
248
+ npm install -g claude-code-cache-fix@4
249
+ launchctl kickstart -k gui/$(id -u)/com.cnighswonger.cache-fix-proxy
250
+ ```
251
+
252
+ `kickstart` re-execs the agent under the existing plist.
253
+
254
+ ### Flow 2 — opt back into hot-reload at the supervisor layer
255
+
256
+ Run if you actively use hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without restart). This rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
257
+
258
+ **Linux (systemd user unit):**
259
+
260
+ ```
261
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
262
+ systemctl --user daemon-reload
263
+ systemctl --user restart cache-fix-proxy
264
+ ```
265
+
266
+ `daemon-reload` is required because the unit file content changed.
267
+
268
+ **macOS (launchd user agent):**
269
+
270
+ ```
271
+ CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
272
+ launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
273
+ launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
274
+ launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
275
+ ```
276
+
277
+ `bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
278
+
279
+ **Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
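
A monitoring script could turn the degraded signal described in that note into a restart decision. The sketch below is an assumption-laden illustration: `needsRestart` is a hypothetical helper, and the only facts taken from the README are the `503` status and a JSON body whose `status` field is `"degraded"`. Any other fields in the body are ignored.

```javascript
// Hypothetical helper: decide whether a /health response indicates the
// degraded state that only a process restart can clear.
// Assumes the body is JSON text with a top-level "status" field.
function needsRestart(statusCode, bodyText) {
  if (statusCode !== 503) return false;
  try {
    const parsed = JSON.parse(bodyText);
    return (
      parsed !== null &&
      typeof parsed === "object" &&
      parsed.status === "degraded"
    );
  } catch {
    // A 503 with an unparseable body is treated as "unknown", not degraded.
    return false;
  }
}

console.log(needsRestart(503, '{"status":"degraded"}')); // true
console.log(needsRestart(200, '{"status":"ok"}'));       // false
```

A caller would feed this the status code and body from a `GET /health` against the proxy port, then run the supervisor restart command for its platform when it returns `true`.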
280
+
213
281
  ## What this proxy defends against
214
282
 
215
283
  **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
@@ -353,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
353
421
 
354
422
  ## How it works
355
423
 
356
- **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
424
+ **Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
357
425
 
358
426
  **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
359
427
 
@@ -765,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
765
833
  | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
766
834
  | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
767
835
 
768
- ## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
836
+ ## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
769
837
 
770
838
  The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
771
839
 
772
840
  The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
773
841
 
774
- **Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
842
+ **On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
775
843
 
776
844
  | Env var | Default | Purpose |
777
845
  |---------|---------|---------|
778
- | `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
846
+ | `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. `on` is still accepted for back-compat and behaves the same as unset (v1). Any other value is also treated as v1, not as off. |
779
847
 
780
848
  ## System prompt rewrite (preload mode, optional)
781
849
 
@@ -827,6 +895,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
827
895
  - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
828
896
  - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
829
897
  - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
898
+ - **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
830
899
  - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
831
900
 
832
901
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
@@ -55,6 +55,7 @@ async function dispatch() {
55
55
  " CACHE_FIX_PROXY_PORT Port for the proxy server\n" +
56
56
  " CACHE_FIX_PROXY_UPSTREAM Upstream URL\n" +
57
57
  " CACHE_FIX_DEBUG=1 Verbose proxy logging\n" +
58
+ " CACHE_FIX_HOT_RELOAD=on Enable in-process extension hot-reload (off by default; see #196)\n" +
58
59
  " CACHE_FIX_CLAUDE_CMD Override the `claude` command for the wrapper\n",
59
60
  );
60
61
  return 0;
@@ -23,6 +23,11 @@ function getDefaults() {
23
23
  port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
24
24
  upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
25
25
  debug: process.env.CACHE_FIX_DEBUG || "",
26
+ // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
27
+ // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
28
+ // generated unit/plist via `CACHE_FIX_HOT_RELOAD=on cache-fix-proxy
29
+ // install-service`. Strict "on" match — anything else renders nothing.
30
+ hotReload: process.env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "",
26
31
  workingDir: resolve(__dirname, ".."),
27
32
  };
28
33
  }
@@ -93,6 +98,9 @@ function renderSystemdTemplate(template, vars) {
93
98
  const debugLine = vars.debug
94
99
  ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
95
100
  : "";
101
+ const hotReloadLine = vars.hotReload
102
+ ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
103
+ : "";
96
104
  // Allow callers to wire a Requires= line (e.g. another service the proxy
97
105
  // chains to). Empty string by default so the unit has no extra deps.
98
106
  const requiresLine = vars.requires
@@ -104,6 +112,7 @@ function renderSystemdTemplate(template, vars) {
104
112
  .replaceAll("{{PORT}}", vars.port)
105
113
  .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
106
114
  .replaceAll("{{DEBUG_LINE}}", debugLine)
115
+ .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
107
116
  .replaceAll("{{REQUIRES_LINE}}", requiresLine)
108
117
  .replaceAll("{{WORKING_DIR}}", vars.workingDir)
109
118
  // Collapse triple newlines from empty optional lines down to single blank
@@ -117,12 +126,16 @@ function renderLaunchdTemplate(template, vars) {
117
126
  const debugPlist = vars.debug
118
127
  ? ` <key>CACHE_FIX_DEBUG</key>\n <string>${vars.debug}</string>`
119
128
  : "";
129
+ const hotReloadPlist = vars.hotReload
130
+ ? ` <key>CACHE_FIX_HOT_RELOAD</key>\n <string>${vars.hotReload}</string>`
131
+ : "";
120
132
  return template
121
133
  .replaceAll("{{NODE}}", vars.node)
122
134
  .replaceAll("{{SERVER_PATH}}", vars.serverPath)
123
135
  .replaceAll("{{PORT}}", vars.port)
124
136
  .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
125
137
  .replaceAll("{{DEBUG_PLIST}}", debugPlist)
138
+ .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
126
139
  .replaceAll("{{WORKING_DIR}}", vars.workingDir)
127
140
  .replaceAll("{{LOG_DIR}}", vars.logDir)
128
141
  .replace(/\n\n+/g, "\n");
@@ -176,6 +189,7 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
176
189
  port: defaults.port,
177
190
  upstream: defaults.upstream,
178
191
  debug: defaults.debug,
192
+ hotReload: defaults.hotReload,
179
193
  workingDir: defaults.workingDir,
180
194
  requires: "",
181
195
  });
@@ -275,6 +289,7 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
275
289
  port: defaults.port,
276
290
  upstream: defaults.upstream,
277
291
  debug: defaults.debug,
292
+ hotReload: defaults.hotReload,
278
293
  workingDir: defaults.workingDir,
279
294
  logDir: paths.logDir,
280
295
  });
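
The install-service changes above follow one pattern: capture the flag with a strict `"on"` match, render an optional line, and substitute it into a template placeholder. The sketch below illustrates that pattern in isolation. The template text, the `ExecStart` path, and the function names are hypothetical; only the `{{HOT_RELOAD_LINE}}` placeholder, the `Environment=` line format, and the strict match come from the diff.

```javascript
// Strict match as in getDefaults(): anything other than "on" renders nothing.
function captureHotReload(env) {
  return env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "";
}

// Simplified stand-in for renderSystemdTemplate(); handles two placeholders.
function renderUnitSketch(template, vars) {
  const hotReloadLine = vars.hotReload
    ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
    : "";
  return template
    .replaceAll("{{PORT}}", vars.port)
    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
    // Collapse runs of blank lines left behind by empty optional lines.
    .replace(/\n{3,}/g, "\n\n");
}

// Hypothetical template; the real unit template ships with the package.
const unitTemplate = [
  "[Service]",
  "Environment=CACHE_FIX_PROXY_PORT={{PORT}}",
  "{{HOT_RELOAD_LINE}}",
  "ExecStart=/usr/bin/node server.mjs",
].join("\n");

const withFlag = renderUnitSketch(unitTemplate, {
  port: "9801",
  hotReload: captureHotReload({ CACHE_FIX_HOT_RELOAD: "on" }),
});
const withoutFlag = renderUnitSketch(unitTemplate, {
  port: "9801",
  hotReload: captureHotReload({ CACHE_FIX_HOT_RELOAD: "true" }),
});
console.log(withFlag);
console.log(withoutFlag);
```

Note that a value such as `true` or `1` does not pass the strict match, so the rendered unit carries no hot-reload line in that case.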
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "3.9.0",
3
+ "version": "4.0.0",
4
4
  "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
5
5
  "type": "module",
6
6
  "exports": {
@@ -56,7 +56,12 @@ export function sessionFilePath(rawId) {
56
56
  return join(paths().sessionsDir, `${sessionFilename(rawId)}.json`);
57
57
  }
58
58
 
59
- function resolveSessionId(headers) {
59
+ // Exported so sibling extensions can read the canonical session id from
60
+ // REQUEST headers at their own onRequest time — they can't rely on
61
+ // ctx.meta._sessionId being set, because this writer's onRequest is the
62
+ // thing that populates it (and runs at order 600, after most extensions).
63
+ // thinking-block-sanitize v2 (order 550) uses this for the same reason.
64
+ export function resolveSessionId(headers) {
60
65
  if (!headers) return null;
61
66
  const sid =
62
67
  headers["x-claude-code-session-id"] ||
@@ -233,9 +238,16 @@ export default {
233
238
  // 590, stashes these before this writer runs). Optional — absent if
234
239
  // that extension is disabled or produced nothing this request.
235
240
  ...(ctx.meta._sessionHealth || {}),
236
- // Additive thinking-block-sanitize drop count (order 550, opt-in).
237
- // Optional absent unless CACHE_FIX_THINKING_SANITIZE=on.
241
+ // Additive thinking-block-sanitize drop count (order 550). On by
242
+ // default since v4.0.0; present (possibly with thinking_blocks_dropped:0)
243
+ // whenever sanitize ran. Absent when CACHE_FIX_THINKING_SANITIZE=off
244
+ // or when the extension returned early before reaching the planner
245
+ // (e.g., body.messages not an array).
238
246
  ...(ctx.meta._thinkingSanitize || {}),
247
+ // Additive thinking-block-sanitize v2 fields (order 550, opt-in via
248
+ // CACHE_FIX_THINKING_SANITIZE=v2). Optional — absent unless v2 is
249
+ // enabled. Keys: thinking_blocks_dropped_v2 / tools_hash_baseline.
250
+ ...(ctx.meta._thinkingSanitizeV2 || {}),
239
251
  // Additive auto-1m-guard annotation (order 520). Optional — absent
240
252
  // unless the outbound request carried context-1m-2025-08-07 and the
241
253
  // mode wasn't off. Keys: auto_1m_detected / auto_1m_action /
@@ -0,0 +1,60 @@
1
+ // signature-surface-hash — hash helper for thinking-block-sanitize v2.
2
+ //
3
+ // Computes a deterministic 16-hex-char fingerprint of the inputs that
4
+ // participate in the API's thinking-block signature: the tools surface,
5
+ // and (forward-compat) optionally the system block or anthropic-beta
6
+ // header value.
7
+ //
8
+ // v2 only passes `{ tools }`. The function signature is left forward-compatible so
9
+ // a future v3 directive can extend coverage without renaming this helper.
10
+ //
11
+ // Canonicalization rules (per directive proxy-thinking-block-sanitize-v2.md):
12
+ // - Each tool object: recursive stable JSON stringify with recursive key
13
+ // sorting at every nesting level. Nested JSON-schema objects (in
14
+ // input_schema, parameters, etc.) have their own keys, which also
15
+ // sort stably.
16
+ // - Preserve tools[] array order. Reordering tools changes which slot
17
+ // each tool occupies in the API's view; the hash MUST reflect that.
18
+ // (Note: sort-stabilization at order 200 currently locks the array
19
+ // order before v2 fires, so this rule is forward-compatibility against
20
+ // any future change in upstream ordering.)
21
+ // - Sentinel for empty/absent: if tools is undefined, null, or [], the
22
+ // hash input is the literal string "none". Rules out collision with
23
+ // other empty-shaped inputs in a future extension.
24
+ //
25
+ // Output: sha256(canonical_input).slice(0, 16) — 16 hex chars matches the
26
+ // existing _sessionHealth / _thinkingSanitize precedent for in-JSON identifiers.
27
+
28
+ import { createHash } from "node:crypto";
29
+
30
+ // Recursive stable stringify: object keys sort, arrays preserve order,
31
+ // primitives go through JSON.stringify as-is. Handles nested objects and
32
+ // arrays to arbitrary depth.
33
+ export function canonicalStringify(value) {
34
+ if (value === null || typeof value !== "object") {
35
+ return JSON.stringify(value);
36
+ }
37
+ if (Array.isArray(value)) {
38
+ return "[" + value.map(canonicalStringify).join(",") + "]";
39
+ }
40
+ const keys = Object.keys(value).sort();
41
+ const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
42
+ return "{" + parts.join(",") + "}";
43
+ }
44
+
45
+ // Compute the signature-surface hash. v2 passes only { tools }; system and
46
+ // anthropic_beta are reserved for future versions.
47
+ export function computeSignatureSurfaceHash({ tools, system, anthropic_beta } = {}) {
48
+ // Empty/absent tools → "none" sentinel (not the canonical-stringify of [],
49
+ // which would be "[]" and could collide with other empty-shaped inputs).
50
+ const toolsPart =
51
+ tools == null || (Array.isArray(tools) && tools.length === 0)
52
+ ? "none"
53
+ : canonicalStringify(tools);
54
+ // Reserved inputs — passed by future versions; v2 always omits them, so
55
+ // they contribute nothing to the hash today. Kept in the signature so
56
+ // existing call sites don't need to change when v3 adds them.
57
+ void system;
58
+ void anthropic_beta;
59
+ return createHash("sha256").update(toolsPart).digest("hex").slice(0, 16);
60
+ }
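
To see what the canonicalization rules buy, the sketch below repeats `canonicalStringify` from the new file and runs it on two hypothetical tool definitions that differ only in key order, then on a reordered array. The tool objects are invented for the example. The hashing step is left out so the snippet needs no imports; since the hash is SHA-256 over the canonical string, equal strings imply equal hashes.

```javascript
// Copied from signature-surface-hash.mjs for a self-contained demo.
function canonicalStringify(value) {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalStringify).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
  return "{" + parts.join(",") + "}";
}

// Hypothetical tool definitions: same content, different key order.
const toolA = { name: "read", input_schema: { type: "object", required: ["path"] } };
const toolB = { input_schema: { required: ["path"], type: "object" }, name: "read" };
const other = { name: "write" };

// Key order inside objects does not matter at any nesting level.
console.log(canonicalStringify(toolA) === canonicalStringify(toolB)); // true

// Array order is preserved, so reordering tools[] changes the result.
console.log(canonicalStringify([toolA, other]) === canonicalStringify([other, toolA])); // false

console.log(canonicalStringify({ b: 1, a: { d: 2, c: 3 } })); // {"a":{"c":3,"d":2},"b":1}
```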
@@ -1,12 +1,27 @@
1
1
  // thinking-block-sanitize — request-path mitigation for the CC thinking-desync
2
- // wedge (anthropics/claude-code#63147). On replay paths (resume / --continue /
2
+ // wedge (anthropics/claude-code#63147).
3
+ //
4
+ // v1 (default since v4.0.0; CACHE_FIX_THINKING_SANITIZE unset or =on): On replay paths (resume / --continue /
3
5
  // auto-compaction / parallel-tool-cancel), CC re-sends prior assistant turns'
4
6
  // thinking in the OMITTED shape `{ type:"thinking", thinking:"", signature }`.
5
7
  // The API rejects modified thinking in the *latest* assistant message with a
6
- // permanent 400, which wedges the session. This extension drops the omitted
7
- // thinking blocks the API treats as optional, before the request is forwarded.
8
+ // permanent 400, which wedges the session. v1 drops these omitted blocks
9
+ // before forwarding. Never touches non-empty thinking; never touches
10
+ // redacted_thinking (v1's empirical exclusion — zero observed in worst-case
11
+ // wedged transcripts).
12
+ //
13
+ // v2 (CACHE_FIX_THINKING_SANITIZE=v2): Additionally handles yurukusa's "13E"
14
+ // pattern — when ToolSearch dynamically loads a tool mid-conversation, the
15
+ // prior assistant turn's thinking signature is invalidated because it was
16
+ // computed over the now-stale tools surface. The API rejects + CC's harness
17
+ // strips-and-retries, paying a 400 + retry tax every turn. v2 detects
18
+ // cross-request tools-surface change via a per-session tools-hash baseline,
19
+ // and strips ALL prior-turn signed thinking (both `thinking` blocks with
20
+ // non-empty text AND `redacted_thinking` blocks — v2's scope is structural,
21
+ // not empirical) on hash mismatch. Same active-tool-continuation latest-turn
22
+ // guard as v1.
8
23
  //
9
- // Resolved turn-selection rule (directive Open Question 1, empirical capture):
24
+ // Resolved turn-selection rule (v1 directive Open Question 1, empirical capture):
10
25
  // - drop omitted thinking from ALL prior assistant turns, AND
11
26
  // - from the LATEST assistant turn UNLESS it is an active tool-continuation
12
27
  // (last block is a tool_use with a following tool_result) — that case is
@@ -16,15 +31,47 @@
16
31
  // / MAX_THINKING_TOKENS=0 stop it only by disabling thinking entirely
17
32
  // (lossy); DISABLE_INTERLEAVED_THINKING=1 does NOT stop the 400 — so the
18
33
  // answer for that case is don't-resume + heal/retire.
19
- // Never touches non-empty thinking, and never touches redacted_thinking (v1).
20
34
  //
21
- // OPT-IN for v1: only runs when CACHE_FIX_THINKING_SANITIZE=on (default off)
22
- // it mutates request bodies and its coverage is not yet live-validated.
35
+ // v2 state pattern (per directive proxy-thinking-block-sanitize-v2.md):
36
+ // - In-memory per-session map keyed by canonical session filename, seeded
37
+ // once from sessions/<sid>.json on the first request for that session. Mirrors
38
+ // session-health's pattern. Lives at module scope.
39
+ // - Baseline updates ONLY on response success (HTTP 2xx). 4xx/5xx leave
40
+ // the baseline unchanged so a failed request's hash doesn't become the
41
+ // new ground truth.
42
+ // - First request observes-and-establishes (no strip; baseline is set
43
+ // after the response succeeds).
44
+ // - When canonical session id is "unknown" (raw id null/empty/whitespace),
45
+ // v2 no-ops entirely. The shared sessions/unknown.json would cross-
46
+ // contaminate baselines across unrelated agents otherwise.
47
+ //
48
+ // Modes via CACHE_FIX_THINKING_SANITIZE (as of v4.0.0 — v1 default-on flip):
49
+ // unset (or "on") — v1 only (omitted-text drop). DEFAULT.
50
+ // "off" — extension no-ops (explicit disable)
51
+ // "v2" — v1 + v2 (omitted-text drop AND
52
+ // tools-hash-mismatch drop). v2 is
53
+ // a strict superset of v1.
54
+ // any other value — treated as v1 (the default), not off.
55
+ // Matches the precedent of being
56
+ // permissive about the on-path.
57
+ //
58
+ // v1 default-on rationale: 7-day prod dogfood across 37 sessions (2026-05-29
59
+ // → 2026-06-05) on `=on`: zero `cannot be modified` 400s, cache hit-rate
60
+ // aggregate 94.66% vs 92.44% baseline (no prefix degradation), sanitize fired
61
+ // on ~35% of sessions, ~800 blocks dropped per day, max 938K context healthy.
62
+ // v2 stays opt-in via `=v2` because the dogfood only ran v1.
23
63
  //
24
64
  // Order 550: after the request-body mutators (ttl-management 500) and before
25
65
  // session-health (590), so #160's thinking_block_count reflects the forwarded
26
- // body. The per-request drop count is exposed via ctx.meta._thinkingSanitize
27
- // for cache-telemetry (600) to merge into the per-session JSON.
66
+ // body. The per-request drop counts are exposed via ctx.meta._thinkingSanitize
67
+ // (v1 counter) and ctx.meta._thinkingSanitizeV2 (v2 counter + baseline) for
68
+ // cache-telemetry (600) to merge into the per-session JSON.
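
The v2 state pattern in the header comment reduces to a small state machine. The sketch below is a minimal in-memory illustration under stated assumptions: the function names are hypothetical, the hash is passed in as an opaque string, and seeding the map from `sessions/<sid>.json` is omitted. It covers the three documented rules: the first request only observes, the baseline moves only on a 2xx response, and the `"unknown"` session never participates.

```javascript
// Hypothetical in-memory baseline map: session key -> last accepted tools hash.
const baselines = new Map();

// Should the v2 signed-thinking drop fire for this request?
function shouldStripSigned(sessionKey, currentHash) {
  if (sessionKey === "unknown") return false; // shared file, so v2 no-ops
  const baseline = baselines.get(sessionKey);
  if (baseline === undefined) return false;   // first request: observe only
  return baseline !== currentHash;            // strip on tools-surface change
}

// Called once the upstream response status is known.
function recordResponse(sessionKey, currentHash, statusCode) {
  if (sessionKey === "unknown") return;
  if (statusCode >= 200 && statusCode < 300) {
    baselines.set(sessionKey, currentHash);   // only success moves the baseline
  }
}

console.log(shouldStripSigned("s1", "aaa")); // false (no baseline yet)
recordResponse("s1", "aaa", 200);
console.log(shouldStripSigned("s1", "bbb")); // true (tools surface changed)
recordResponse("s1", "bbb", 400);            // failed request: baseline stays "aaa"
console.log(shouldStripSigned("s1", "bbb")); // true
```

Keeping the baseline unchanged on 4xx/5xx means a rejected request is retried against the last hash the API actually accepted.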
69
+
70
+ import { readFileSync } from "node:fs";
71
+ import { resolveSessionId, sessionFilePath, sessionFilename } from "./cache-telemetry.mjs";
72
+ import { computeSignatureSurfaceHash } from "./signature-surface-hash.mjs";
73
+
74
+ // --- v1 predicates ---
28
75
 
29
76
  export function isOmittedThinking(block) {
30
77
  return (
@@ -70,14 +117,38 @@ function latestAssistantIndex(messages) {
70
117
  return -1;
71
118
  }
72
119
 
73
- // Pure planner: returns { messages, dropped }. Does not mutate the input.
74
- // `messages` is the new array (a message that loses all content is dropped).
75
- export function planSanitize(messages) {
76
- if (!Array.isArray(messages)) return { messages, dropped: 0 };
120
+ // --- v2 predicate ---
121
+
122
+ // v2 strips signed `thinking` blocks (non-empty text) AND `redacted_thinking`
123
+ // blocks. v1's `isOmittedThinking` filter handles the empty-text case
124
+ // independently — when both flags are active, v1 drops the empty ones and v2
125
+ // drops the signed ones; predicates are non-overlapping.
126
+ export function isSignedThinkingForV2(block) {
127
+ if (!block) return false;
128
+ if (block.type === "redacted_thinking") return true;
129
+ // Non-empty thinking with a signature — v1 leaves these alone by design.
130
+ return (
131
+ block.type === "thinking" &&
132
+ typeof block.thinking === "string" &&
133
+ block.thinking.trim() !== "" &&
134
+ typeof block.signature === "string" &&
135
+ block.signature.length > 0
136
+ );
137
+ }
138
+
139
+ // --- Pure planner ---
140
+ //
141
+ // Returns { messages, dropped, droppedV2 }. Does not mutate input.
142
+ // `v2StripSigned` is the externally-determined boolean: should v2's
143
+ // signed-thinking drop fire this request? (Caller has already computed
144
+ // hash mismatch + session-state checks.)
145
+ export function planSanitize(messages, { v2StripSigned = false } = {}) {
146
+ if (!Array.isArray(messages)) return { messages, dropped: 0, droppedV2: 0 };
77
147
  const latestAsst = latestAssistantIndex(messages);
78
148
  const protectLatest = latestAsst >= 0 && isActiveToolContinuation(messages, latestAsst);
79
149
 
80
150
  let dropped = 0;
151
+ let droppedV2 = 0;
81
152
  let changed = false;
82
153
  const out = [];
83
154
  for (let i = 0; i < messages.length; i++) {
@@ -87,14 +158,24 @@ export function planSanitize(messages) {
87
158
  continue;
88
159
  }
89
160
  if (i === latestAsst && protectLatest) {
90
- out.push(msg); // active continuation — leave its thinking intact
161
+ // Active continuation — leave thinking intact (both v1 and v2 respect
162
+ // this; the API needs the signed thinking for the pending tool call).
163
+ out.push(msg);
91
164
  continue;
92
165
  }
93
166
  const kept = msg.content.filter((b) => {
167
+ // v1 always-active drop predicate.
94
168
  if (isOmittedThinking(b)) {
95
169
  dropped++;
96
170
  return false;
97
171
  }
172
+ // v2-only drop predicate. Predicates are mutually exclusive on a single
173
+ // block: omitted thinking matches v1's predicate but not v2's, and
174
+ // signed/redacted thinking matches v2's predicate but not v1's.
175
+ if (v2StripSigned && isSignedThinkingForV2(b)) {
176
+ droppedV2++;
177
+ return false;
178
+ }
98
179
  return true;
99
180
  });
100
181
  if (kept.length === msg.content.length) {
@@ -106,25 +187,158 @@ export function planSanitize(messages) {
106
187
  changed = true;
107
188
  }
108
189
  }
109
- return { messages: changed ? out : messages, dropped };
190
+ return { messages: changed ? out : messages, dropped, droppedV2 };
191
+ }
192
+
193
+ // --- v2 mode + state ---
194
+
195
+ // "off" | "on" | "v2". As of v4.0.0 the default flipped from "off" to "on" —
196
+ // v1 (omitted-text drop) is the new default behavior. Set
197
+ // CACHE_FIX_THINKING_SANITIZE=off to explicitly disable; =v2 to additionally
198
+ // enable the v2 tools-hash-mismatch drop (still opt-in pending its own
199
+ // prod-dogfood window after #200 closes the silent-load failure mode).
200
+ // Unknown values fall through to "on" — we are permissive about the on-path
201
+ // and only treat the literal "off" as a disable.
202
+ export function modeFromEnv(env = process.env) {
203
+ const v = env.CACHE_FIX_THINKING_SANITIZE;
204
+ if (v === "off") return "off";
205
+ if (v === "v2") return "v2";
206
+ return "on";
207
+ }
208
+
209
+ // Per-session state, in memory. Keyed by canonical session filename
210
+ // (sessionFilename(rawId)). Each entry: { tools_hash_baseline }.
211
+ // Mirrors session-health's pattern: seeded once from disk on first request
212
+ // for that session, then maintained in memory + persisted via cache-telemetry's
213
+ // spread of ctx.meta._thinkingSanitizeV2.
214
+ const v2SessionState = new Map();
215
+
216
+ function seedV2FromFile(rawSid) {
217
+ let prev = null;
218
+ try {
219
+ prev = JSON.parse(readFileSync(sessionFilePath(rawSid), "utf8"));
220
+ } catch {}
221
+ return {
222
+ tools_hash_baseline:
223
+ typeof prev?.tools_hash_baseline === "string" ? prev.tools_hash_baseline : null,
224
+ };
225
+ }
226
+
227
+ // Test-only reset (also useful for proxy-restart simulation in unit tests).
228
+ export function _resetV2State() {
229
+ v2SessionState.clear();
110
230
  }
111
231
 
232
+ // --- Extension default-export ---
233
+
112
234
  export default {
113
235
  name: "thinking-block-sanitize",
114
236
  description:
115
- "Drop omitted (empty-text) thinking blocks from prior assistant turns and the latest non-continuation turn, to head off the CC thinking-desync 400 (#63147). Opt-in via CACHE_FIX_THINKING_SANITIZE=on.",
237
+ "Drop omitted (empty-text) thinking blocks from prior assistant turns and the latest non-continuation turn, to head off the CC thinking-desync 400 (#63147). v1 mode: omitted-text drop only. v2 mode: also drop signed thinking + redacted_thinking on cross-request tools-hash mismatch (ToolSearch surface). v1 is now ON by default as of v4.0.0; set CACHE_FIX_THINKING_SANITIZE=off to disable, =v2 to additionally opt into v2.",
116
238
  order: 550,
117
239
 
118
240
  async onRequest(ctx) {
119
- if (process.env.CACHE_FIX_THINKING_SANITIZE !== "on") return;
241
+ const mode = modeFromEnv();
242
+ if (mode === "off") return;
243
+
120
244
  const body = ctx.body;
121
245
  if (!body || !Array.isArray(body.messages)) return;
122
246
 
123
- const { messages, dropped } = planSanitize(body.messages);
124
- if (dropped > 0) body.messages = messages;
247
+ // v2 only fires when mode === "v2" AND we have a usable session id.
248
+ let v2StripSigned = false;
249
+ let stateKey = null;
250
+ let currentHash = null;
251
+
252
+ if (mode === "v2") {
253
+ // Resolve session id inline — cache-telemetry's onRequest runs at order
254
+ // 600, after us, so ctx.meta._sessionId is not yet set when we fire at
255
+ // order 550. We import resolveSessionId from cache-telemetry to keep
256
+ // canonicalization consistent.
257
+ const rawSid = resolveSessionId(ctx.headers);
258
+ stateKey = sessionFilename(rawSid);
259
+
260
+ // "unknown" canonical id → no-op for v2 (cross-contamination risk on
261
+ // the shared sessions/unknown.json baseline). v1's strip still runs
262
+ // below regardless.
263
+ if (stateKey !== "unknown") {
264
+ currentHash = computeSignatureSurfaceHash({ tools: body.tools });
265
+
266
+ // Seed in-memory state from disk on first encounter of that session
267
+ // (covers proxy restart — re-reads persisted baseline).
268
+ let st = v2SessionState.get(stateKey);
269
+ if (!st) {
270
+ st = seedV2FromFile(rawSid);
271
+ v2SessionState.set(stateKey, st);
272
+ }
273
+
274
+ const baseline = st.tools_hash_baseline;
275
+ // Mismatch only fires when there IS a baseline AND it differs.
276
+ // First request (baseline === null) observes-and-establishes — no strip.
277
+ v2StripSigned = baseline !== null && baseline !== currentHash;
278
+
279
+ // Stash for the onResponseStart hook to advance the baseline iff the
280
+ // response succeeded. Stash BEFORE the plan + strip so the response
281
+ // path has access regardless of whether anything was dropped.
282
+ ctx.meta._thinkingSanitizeV2PendingHash = currentHash;
283
+ ctx.meta._thinkingSanitizeV2StateKey = stateKey;
284
+ }
285
+ }
286
+
287
+ const { messages, dropped, droppedV2 } = planSanitize(body.messages, {
288
+ v2StripSigned,
289
+ });
290
+ if (dropped > 0 || droppedV2 > 0) body.messages = messages;
125
291
 
126
292
  // Counts only — never content. Exposed for cache-telemetry to persist and
127
293
  // for the #160 session-health signal.
294
+ // v1 counter — unchanged, fires for both modes.
128
295
  ctx.meta._thinkingSanitize = { thinking_blocks_dropped: dropped };
296
+
297
+ // v2 counter — fires only in v2 mode. Includes the post-mismatch baseline
298
+ // value that cache-telemetry will persist (so consumers can see the
299
+ // current baseline in the session JSON even on requests that didn't
300
+ // strip). The actual advance only happens on response success below.
301
+ if (mode === "v2" && stateKey && stateKey !== "unknown") {
302
+ ctx.meta._thinkingSanitizeV2 = {
303
+ thinking_blocks_dropped_v2: droppedV2,
304
+ // Stash the CURRENT baseline here, not the pending hash. If the pending
305
+ // hash were stashed instead, a 4xx/5xx would still reach disk through
306
+ // the cache-telemetry write while the in-memory state stayed unadvanced,
307
+ // so a later seed from disk would see a baseline the API never accepted.
308
+ // We resolve that by NOT writing the new hash to disk on failure:
309
+ // see onResponseStart below, which is the only thing that advances
310
+ // both in-memory and (indirectly via cache-telemetry's spread) disk.
311
+ // Until then tools_hash_baseline stays at the CURRENT baseline value;
312
+ // onResponseStart will overwrite this in meta if the response is 2xx.
313
+ tools_hash_baseline: v2SessionState.get(stateKey)?.tools_hash_baseline ?? null,
314
+ };
315
+ }
316
+ },
317
+
318
+ // Advance the baseline only on HTTP 2xx response. 4xx/5xx leaves the
319
+ // in-memory state and the meta-stashed baseline untouched, so a failed
320
+ // request's hash never becomes the new ground truth.
321
+ async onResponseStart(ctx) {
322
+ const stateKey = ctx.meta._thinkingSanitizeV2StateKey;
323
+ const pendingHash = ctx.meta._thinkingSanitizeV2PendingHash;
324
+ if (!stateKey || !pendingHash) return;
325
+ if (typeof ctx.status !== "number") return;
326
+ if (ctx.status < 200 || ctx.status >= 300) return;
327
+
328
+ // Advance in-memory baseline.
329
+ let st = v2SessionState.get(stateKey);
330
+ if (!st) {
331
+ st = { tools_hash_baseline: null };
332
+ v2SessionState.set(stateKey, st);
333
+ }
334
+ st.tools_hash_baseline = pendingHash;
335
+
336
+ // Update the meta-stashed baseline so cache-telemetry's spread writes
337
+ // the new value to disk. If meta._thinkingSanitizeV2 wasn't stashed
338
+ // (e.g. mode flip mid-request), construct it now.
339
+ if (!ctx.meta._thinkingSanitizeV2) {
340
+ ctx.meta._thinkingSanitizeV2 = { thinking_blocks_dropped_v2: 0 };
341
+ }
342
+ ctx.meta._thinkingSanitizeV2.tools_hash_baseline = pendingHash;
129
343
  },
130
344
  };
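The mode resolution, the v2 predicate, and the mismatch trigger above can be exercised in isolation. The following is a minimal sketch, not the shipped module: `modeFromEnv` and `isSignedThinkingForV2` are copied from the diff, while `shouldStripSigned` is a hypothetical name for the inline expression `baseline !== null && baseline !== currentHash` in `onRequest`, and the sample blocks and hash strings are made up for illustration.

```javascript
// Copied from the diff: only the literal "off" disables and only the literal
// "v2" opts into v2; any other value (or unset) resolves to "on".
function modeFromEnv(env) {
  const v = env.CACHE_FIX_THINKING_SANITIZE;
  if (v === "off") return "off";
  if (v === "v2") return "v2";
  return "on";
}

// Copied from the diff: matches redacted_thinking, or thinking with
// non-empty text and a non-empty signature.
function isSignedThinkingForV2(block) {
  if (!block) return false;
  if (block.type === "redacted_thinking") return true;
  return (
    block.type === "thinking" &&
    typeof block.thinking === "string" &&
    block.thinking.trim() !== "" &&
    typeof block.signature === "string" &&
    block.signature.length > 0
  );
}

// Hypothetical helper (assumption): wraps the inline mismatch check.
// A null baseline means "first request, observe only", so nothing is stripped.
function shouldStripSigned(baseline, currentHash) {
  return baseline !== null && baseline !== currentHash;
}

const resolvedModes = ["off", "v2", "on", "OFF", "0", undefined].map((value) =>
  modeFromEnv({ CACHE_FIX_THINKING_SANITIZE: value }),
);
console.log(resolvedModes.join(",")); // off,v2,on,on,on,on

// Illustrative blocks only; the third one has empty text, which is v1's case.
const sampleBlocks = [
  { type: "redacted_thinking", data: "opaque" },
  { type: "thinking", thinking: "step 1", signature: "sig" },
  { type: "thinking", thinking: "", signature: "sig" },
  { type: "text", text: "hello" },
];
const v2Matches = sampleBlocks.map(isSignedThinkingForV2);
console.log(v2Matches.join(",")); // true,true,false,false
```

Note that `OFF` and `0` both resolve to `"on"`: disabling requires the exact lowercase token `off`.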
@@ -3,6 +3,7 @@ import { join } from "node:path";
3
3
  import { pathToFileURL } from "node:url";
4
4
 
5
5
  let registry = [];
6
+ let failedExtensions = []; // [{ file, error, lastAttempt }]
6
7
 
7
8
  export async function loadExtensions(dir, configPath) {
8
9
  let config = {};
@@ -15,6 +16,7 @@ export async function loadExtensions(dir, configPath) {
15
16
  const mjsFiles = files.filter((f) => f.endsWith(".mjs")).sort();
16
17
 
17
18
  const extensions = [];
19
+ const newlyFailed = [];
18
20
  for (const file of mjsFiles) {
19
21
  try {
20
22
  const mod = await import(pathToFileURL(join(dir, file)).href + "?t=" + Date.now());
@@ -29,12 +31,24 @@ export async function loadExtensions(dir, configPath) {
29
31
  extensions.push({ ...ext, order, _file: file });
30
32
  }
31
33
  } catch (err) {
32
- process.stderr.write(`[pipeline] failed to load ${file}: ${err.message}\n`);
34
+ // Load-bearing observability: this branch is the only signal that the
35
+ // proxy is running with a degraded extension graph. See #196: a Node
36
+ // ESM cache stale-import race silently broke thinking-block-sanitize
37
+ // v2 for 17 hours post-merge before AITL grepped the journal. The
38
+ // [CRITICAL] prefix is harder to miss than the prior [pipeline] one,
39
+ // and the explicit "restart proxy to recover" hint tells the operator
40
+ // what to do — the underlying Node ESM cache problem can't be fixed
41
+ // in-process (you can't evict cached transitive imports), so a full
42
+ // process restart is the only path to recover the extension graph.
43
+ const msg = `[CRITICAL] extension load failed: ${file}: ${err.message} — restart the proxy via your supervisor to recover (in-process reload cannot fix stale ESM cache; see #196)\n`;
44
+ process.stderr.write(msg);
45
+ newlyFailed.push({ file, error: String(err.message || err), lastAttempt: new Date().toISOString() });
33
46
  }
34
47
  }
35
48
 
36
49
  extensions.sort((a, b) => a.order - b.order);
37
50
  registry = extensions;
51
+ failedExtensions = newlyFailed;
38
52
  return extensions;
39
53
  }
40
54
 
@@ -46,6 +60,13 @@ export function snapshotRegistry() {
46
60
  return [...registry];
47
61
  }
48
62
 
63
+ // Exposed for /health and any operator-facing tool that wants to surface
64
+ // extension-load failures. Returns a fresh array per call so callers can't
65
+ // mutate internal state.
66
+ export function getFailedExtensions() {
67
+ return failedExtensions.map((f) => ({ ...f }));
68
+ }
69
+
49
70
  // Route scoping: extensions default to messages-only so that adding a new
50
71
  // route (e.g. /api/claude_cli/bootstrap) doesn't drag every existing
51
72
  // message-mutating extension onto it — most throw on a null body because
package/proxy/server.mjs CHANGED
@@ -3,7 +3,7 @@ import { pathToFileURL, URL } from "node:url";
3
3
  import config from "./config.mjs";
4
4
  import { forwardRequest } from "./upstream.mjs";
5
5
  import { streamResponse, createTelemetryRecord } from "./stream.mjs";
6
- import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse } from "./pipeline.mjs";
6
+ import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse, getFailedExtensions } from "./pipeline.mjs";
7
7
  import { startWatcher } from "./watcher.mjs";
8
8
 
9
9
  function collectBody(req) {
@@ -238,6 +238,21 @@ async function handleBootstrap(clientReq, clientRes) {
238
238
  }
239
239
 
240
240
  function handleHealth(_req, res) {
241
+ // Surface extension-load failures so callers (operators, monitoring) see
242
+ // a degraded proxy state instead of a misleading "ok". See #196: a Node
243
+ // ESM cache stale-import race silently broke thinking-block-sanitize v2
244
+ // for 17 hours post-merge before anyone noticed. /health returning "ok"
245
+ // through that window was load-bearing in the silence.
246
+ const failed = getFailedExtensions();
247
+ if (failed.length > 0) {
248
+ res.writeHead(503, { "content-type": "application/json" });
249
+ res.end(JSON.stringify({
250
+ status: "degraded",
251
+ failed_extensions: failed,
252
+ hint: "restart the proxy via your supervisor to recover (in-process reload cannot fix stale ESM cache; #196)",
253
+ }));
254
+ return;
255
+ }
241
256
  res.writeHead(200, { "content-type": "application/json" });
242
257
  res.end(JSON.stringify({ status: "ok" }));
243
258
  }
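The failure tracking in `pipeline.mjs` and the `/health` branch above fit together roughly as follows. This is a simplified sketch under stated assumptions: `recordLoadFailure` and `healthResponse` are hypothetical names, the real loader collects failures per load pass in `newlyFailed` rather than pushing directly, and the real handler writes to `res` instead of returning a value.

```javascript
// Stand-in for the module-level list in pipeline.mjs.
let failedExtensions = [];

// Hypothetical helper (assumption): records one failure in the same shape
// the diff uses ({ file, error, lastAttempt }).
function recordLoadFailure(file, err) {
  failedExtensions.push({
    file,
    error: String(err.message || err),
    lastAttempt: new Date().toISOString(),
  });
}

// As in the diff: return fresh copies so callers cannot mutate internal state.
function getFailedExtensions() {
  return failedExtensions.map((f) => ({ ...f }));
}

// Hypothetical pure variant of handleHealth (assumption): returns the status
// code and JSON body instead of writing them to the response.
function healthResponse(failed) {
  if (failed.length > 0) {
    return { status: 503, body: { status: "degraded", failed_extensions: failed } };
  }
  return { status: 200, body: { status: "ok" } };
}

const healthyBefore = healthResponse(getFailedExtensions());
recordLoadFailure("thinking-block-sanitize.mjs", new Error("stale import"));
const degradedAfter = healthResponse(getFailedExtensions());
console.log(healthyBefore.status, degradedAfter.status); // 200 503
```

Returning 503 rather than a 200 with a warning field means a plain status-code probe picks up the degraded state without parsing the body.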
@@ -290,7 +305,34 @@ export async function startProxy(options = {}) {
290
305
  const bind = options.bind ?? config.bind;
291
306
  const extensionsDir = options.extensionsDir ?? config.extensionsDir;
292
307
  const extensionsConfig = options.extensionsConfig ?? config.extensionsConfig;
293
- const watch = options.watch !== false;
308
+ // Hot-reload is opt-in as of v4.0.0 (#196). The in-process watcher is the
309
+ // only code path that triggers the Node ESM stale-import race; cold starts
310
+ // have an empty module cache and load extensions cleanly. Strict `=== "on"`
311
+ // means any other value (including "true"/"1"/"yes") is treated as off —
312
+ // the safe default. Note this is the opposite stance from
313
+ // CACHE_FIX_THINKING_SANITIZE (default-on; only literal "off" disables):
314
+ // a hot-reload enable is a footgun, so we require the operator to type the
315
+ // exact opt-in token; a sanitize disable is also a footgun (loses the
316
+ // wedge mitigation), so we require the exact disable token there.
317
+ const hotReloadOptIn = process.env.CACHE_FIX_HOT_RELOAD === "on";
318
+ const watch = options.watch !== false && hotReloadOptIn;
319
+
320
+ // Boot banner on stderr so the EFFECTIVE hot-reload mode is visible in the
321
+ // supervisor's log (journalctl --user / ~/Library/Logs/) without being
322
+ // noisy for monitoring tools that line-grep stderr. Keyed off the effective
323
+ // `watch` value, not the raw envvar, so an embedder calling startProxy({
324
+ // watch: false }) with the envvar set sees "off" (which is the truth — the
325
+ // watcher is suppressed regardless of envvar in that case). Supervisor-
326
+ // neutral wording — no version pin (lives in CHANGELOG/README instead).
327
+ if (watch) {
328
+ process.stderr.write(
329
+ "[cache-fix] hot-reload: on (CACHE_FIX_HOT_RELOAD=on) — long-running processes can hit a Node ESM stale-import race; see #196. Restart the proxy via your supervisor to recover.\n",
330
+ );
331
+ } else {
332
+ process.stderr.write(
333
+ "[cache-fix] hot-reload: off (set CACHE_FIX_HOT_RELOAD=on to enable). Extension changes require a supervisor-level proxy restart.\n",
334
+ );
335
+ }
294
336
 
295
337
  let watcher = null;
296
338
  try {
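The effective hot-reload decision above combines the embedder option with the strict envvar check. A small sketch of that gate, assuming a hypothetical helper name `effectiveWatch` (the diff computes this inline in `startProxy`):

```javascript
// Hypothetical helper (assumption): same expression as the diff, with the
// environment passed in so it can be checked without touching process.env.
function effectiveWatch(options, env) {
  const hotReloadOptIn = env.CACHE_FIX_HOT_RELOAD === "on";
  return options.watch !== false && hotReloadOptIn;
}

const gateResults = [
  effectiveWatch({}, { CACHE_FIX_HOT_RELOAD: "on" }),               // exact token
  effectiveWatch({}, { CACHE_FIX_HOT_RELOAD: "true" }),             // not the token
  effectiveWatch({}, {}),                                           // unset
  effectiveWatch({ watch: false }, { CACHE_FIX_HOT_RELOAD: "on" }), // embedder veto
];
console.log(gateResults.join(",")); // true,false,false,false
```

Only the exact token `on` enables the watcher, and an embedder passing `watch: false` still wins over the envvar.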
@@ -11,6 +11,7 @@ RestartSec=5
11
11
  Environment=CACHE_FIX_PROXY_PORT={{PORT}}
12
12
  {{UPSTREAM_LINE}}
13
13
  {{DEBUG_LINE}}
14
+ {{HOT_RELOAD_LINE}}
14
15
  WorkingDirectory={{WORKING_DIR}}
15
16
 
16
17
  [Install]
@@ -15,6 +15,7 @@
15
15
  <string>{{PORT}}</string>
16
16
  {{UPSTREAM_PLIST}}
17
17
  {{DEBUG_PLIST}}
18
+ {{HOT_RELOAD_PLIST}}
18
19
  </dict>
19
20
  <key>WorkingDirectory</key>
20
21
  <string>{{WORKING_DIR}}</string>
@@ -56,7 +56,7 @@ Always:
56
56
  ```
57
57
 
58
58
  ```
59
- Project directory: /home/manager/git_repos/kanfei_nowcast_e3b
59
+ Project directory: ~/git_repos/your-project
60
60
  Auto-detected session: db11f377-4ca8-4fc3-9b6d-1069da58c1b2.jsonl
61
61
  Modified: 2026-04-19 13:26:42
62
62
  Size: 4.8M
@@ -155,6 +155,21 @@ The cold rebuild consumed ~15% Q5h in one call on our Max 5x account. After that
155
155
 
156
156
  **Total cost of a manual compact cycle:** roughly 15% for the cold rebuild plus a few % for the Opus summarization. Compare to hitting the 1M wall and losing the session entirely.
157
157
 
158
+ ### Stale transcripts get swept (CC's `cleanupPeriodDays`)
159
+
160
+ Heads up if you're treating the on-disk `.jsonl` as a "keep just in case" backup after `/clear`: it isn't durable. Claude Code has a transcript-retention setting, `cleanupPeriodDays`, in `~/.claude/settings.json` (default 30 days). CC runs a transcript cleanup at startup when its `~/.claude/.last-cleanup` sentinel is past the 24h freshness window. When that fires, CC walks every `.jsonl` under `~/.claude/projects/` and deletes any whose `mtime` is older than the cutoff, along with the matching `<session-id>/` companion directory next to it. A session you compacted, `/clear`-ed, and stopped writing to ~31 days ago will be gone after the next launch that crosses the cleanup gate, even if you'd planned to grep it for context.
161
+
162
+ Practical implications:
163
+
164
+ - **If you need the post-compact JSONL preserved**, copy it out of `~/.claude/projects/` to a path that isn't subject to CC's cleanup — e.g. `~/snapshots/cc-jsonl-backups/`.
165
+ - **A stopped session held in heal-and-await state is especially vulnerable** — it's idle by definition, so it crosses `cleanupPeriodDays` faster than an actively-used session whose appends keep mtime fresh. If you've stopped a session intending to resume later, either resume promptly, `touch` the `.jsonl` to refresh mtime, or copy it out of the tree.
166
+ - Cleanup keys off `mtime`, and plain reads (`cat`/`grep`/`less`) don't refresh `mtime` — inspection doesn't extend retention.
167
+ - **Raise the retention setting on every machine you use CC on.** Adding `"cleanupPeriodDays": 36500` (~100 years) to `~/.claude/settings.json` defangs the documented cleanup path entirely. There's no documented upper bound; the schema just wants a positive integer. The cleanup logic re-reads the setting at each sweep, so you can land this even on machines where prior sweeps already happened.
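
As a rough illustration of the last bullet, here is one way to add the setting without clobbering whatever else is in the file. This is a sketch under assumptions: `withRetention` is a hypothetical helper, not part of cache-fix or Claude Code, and the `theme` key is only a placeholder for any existing settings.

```javascript
// Hypothetical helper (assumption): takes the current text of
// ~/.claude/settings.json and returns new text with cleanupPeriodDays set,
// keeping every other key as-is.
function withRetention(settingsText, days) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("cleanupPeriodDays must be a positive integer");
  }
  // Treat an empty file as an empty settings object.
  const parsed = settingsText.trim() === "" ? {} : JSON.parse(settingsText);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new TypeError("settings.json must contain a JSON object");
  }
  return JSON.stringify({ ...parsed, cleanupPeriodDays: days }, null, 2) + "\n";
}

// Placeholder input; a real file would be read from disk first.
const updatedSettings = withRetention('{ "theme": "dark" }', 36500);
console.log(updatedSettings);
```

In practice the input would come from `readFileSync` and the result would go back via `writeFileSync` (both from `node:fs`); keeping the helper pure makes it easy to check before pointing it at a real settings file.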
168
+
169
+ **If a transcript was already swept** and you need to recover it, [`vsits/restore-claude-history-linux`](https://github.com/vsits/restore-claude-history-linux) (RCB) restores deleted `.jsonl` files from Linux filesystem snapshots — **ZFS**, **Btrfs**, or **Timeshift**. End-to-end-verified on Ubuntu 24.04; a real Btrfs dogfood confirmed a recovered transcript loads and resumes via `/resume` in a fresh CC session. macOS users have the same shape via the upstream [`garrettmoss/restore-claude-history`](https://github.com/garrettmoss/restore-claude-history) (Time Machine). Both tools also remind you to set `cleanupPeriodDays` afterward — otherwise the restored transcript gets re-swept on the next cleanup pass.
170
+
171
+ Tracked upstream as [anthropics/claude-code#62272](https://github.com/anthropics/claude-code/issues/62272) — cache-fix doesn't touch this surface, but documenting it because manual-compact users are the population most likely to bank on the `.jsonl` sticking around.
172
+
158
173
  ### Summarizer model
159
174
 
160
175
  The tool defaults to `claude --print --model claude-opus-4-7` for the highest-fidelity summary. Override with the `MANUAL_COMPACT_MODEL` env var — e.g. `MANUAL_COMPACT_MODEL=claude-sonnet-4-6` to minimize Q5h impact, or to point at a different model if Opus is rate-limited or retired.