npm - claude-code-cache-fix - Versions diffs - 3.9.0 → 4.0.0 - Mend

claude-code-cache-fix 3.9.0 → 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +77 -8
package/bin/claude-via-proxy.mjs +1 -0
package/bin/install-service.mjs +15 -0
package/package.json +1 -1
package/proxy/extensions/cache-telemetry.mjs +15 -3
package/proxy/extensions/signature-surface-hash.mjs +60 -0
package/proxy/extensions/thinking-block-sanitize.mjs +233 -19
package/proxy/pipeline.mjs +22 -1
package/proxy/server.mjs +44 -2
package/templates/cache-fix-proxy.service.template +1 -0
package/templates/com.cnighswonger.cache-fix-proxy.plist.template +1 -0
package/tools/MANUAL-COMPACT.md +16 -1

package/README.md CHANGED

@@ -12,7 +12,7 @@ Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-
 ## Quick Start: Proxy (recommended)
-The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
+The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as composable extensions.
 ```bash
 # Install
@@ -29,7 +29,7 @@ That's it. The proxy applies all 7 cache-fix extensions automatically. No wrappe
 ### What the proxy does
-On every `/v1/messages` request, 9 extensions run in order (one opt-in):
+On every `/v1/messages` request, 9 extensions run in order:
 | Extension | What it fixes |
 |-----------|--------------|
@@ -41,9 +41,9 @@ On every `/v1/messages` request, 9 extensions run in order (one opt-in):
 | `cache-control-normalize` | Normalizes cache_control markers across messages |
 | `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
 | `session-health` | Observes per-session thinking-desync risk (context size + thinking-block count) and warns before a session reaches the danger zone. Read-only |
-| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **Opt-in** (`CACHE_FIX_THINKING_SANITIZE=on`) |
+| `thinking-block-sanitize` | Drops omitted (empty-text) thinking blocks to head off the CC thinking-desync `400` (#63147). **On by default as of v4.0.0** (v1 mode). Set `CACHE_FIX_THINKING_SANITIZE=off` to disable, `=v2` for additional tools-hash-mismatch drop (opt-in). |
-Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
+Extensions live as `.mjs` files in `proxy/extensions/` with configuration in `proxy/extensions.json`. As of v4.0.0 the proxy loads them once at startup; adding, removing, or modifying an extension requires a supervisor-level proxy restart (see [Upgrading from v3.x](#upgrading-from-v3x)). Hot-reload is available as opt-in via `CACHE_FIX_HOT_RELOAD=on` for users who want the v3.x behavior back; that path is subject to the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196).
 **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
@@ -147,6 +147,7 @@ All proxy settings are controlled via environment variables. Set them before sta
 | `CACHE_FIX_EXTENSIONS_DIR` | `proxy/extensions/` | Directory for extension `.mjs` files |
 | `CACHE_FIX_EXTENSIONS_CONFIG` | `proxy/extensions.json` | Extension configuration file |
 | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
+| `CACHE_FIX_HOT_RELOAD` | unset | Set to `on` to enable in-process extension hot-reload. Off by default as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x) for details and the supervisor restart flow. |
 ### Corporate environments (proxies, custom CAs)
@@ -210,6 +211,73 @@ Options (all optional; all fall back to the same env vars used by the CLI):
 *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
+## Upgrading from v3.x
+**Behavior changes in v4.0.0:**
+- **`thinking-block-sanitize` v1 is now on by default.** Was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day) the v1 mitigation is the new default. Set `CACHE_FIX_THINKING_SANITIZE=off` to explicitly disable. v2 (additional tools-hash-mismatch drop) stays opt-in via `=v2`. See [#63147](https://github.com/anthropics/claude-code/issues/63147) and [#162](https://github.com/cnighswonger/claude-code-cache-fix/issues/162).
+- **In-process extension hot-reload is now off by default.** Was on in v3.x. Set `CACHE_FIX_HOT_RELOAD=on` to restore the prior behavior. Off-by-default eliminates the Node ESM stale-import race documented in [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196), where the watcher silently failed to load a newly-merged extension for 17 hours after a hot-reload trigger. The race fires when the file watcher re-imports an extension whose transitive dependencies are already cached by Node's loader; cold starts are unaffected.
+### Embedder note (Bun hosts, DAP-style integrations using `createProxyServer()` / `startProxy()`)
+v4.0.0 flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on. The v1 omitted-text drop will run on every request body passing through the embedded proxy. If your host depends on the prior no-sanitization behavior (e.g., your downstream code expects empty `thinking` blocks to survive the proxy round-trip), preserve it by either:
+- Setting `CACHE_FIX_THINKING_SANITIZE=off` in your host's environment, OR
+- Setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in your code at any point before request handling — the mode is read per-request via `modeFromEnv()`, not cached at module load.
+The flip is backed by 7 days of prod dogfood (37 sessions, zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs 92.44% baseline). See [PR #201](https://github.com/cnighswonger/claude-code-cache-fix/pull/201) for the validation data and [#63147](https://github.com/anthropics/claude-code/issues/63147) for upstream context.
+Picking up a new extension or a code change to an existing one in v4.0.0 requires a supervisor-level proxy restart. There are two upgrade flows depending on whether you also want to opt back into hot-reload.
+### Flow 1 — code-only npm upgrade (recommended default)
+Your existing systemd unit / launchd plist is unchanged; only the proxy code on disk is updated by npm. Restart the running process to pick up the new code.
+**Linux (systemd user unit):**
+```
+npm install -g claude-code-cache-fix@4
+systemctl --user restart cache-fix-proxy
+```
+No `daemon-reload` required — the unit file content is unchanged.
+**macOS (launchd user agent):**
+```
+npm install -g claude-code-cache-fix@4
+launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`kickstart` re-execs the agent under the existing plist.
+### Flow 2 — opt back into hot-reload at the supervisor layer
+Run if you actively use hot-reload (e.g., you drop custom extensions into the extensions dir on a live proxy and want them picked up without restart). This rewrites the unit / plist so `CACHE_FIX_HOT_RELOAD=on` is set every time the supervisor starts the proxy.
+**Linux (systemd user unit):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+systemctl --user daemon-reload
+systemctl --user restart cache-fix-proxy
+```
+`daemon-reload` is required because the unit file content changed.
+**macOS (launchd user agent):**
+```
+CACHE_FIX_HOT_RELOAD=on cache-fix-proxy install-service
+launchctl bootout gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cnighswonger.cache-fix-proxy.plist
+launchctl kickstart gui/$(id -u)/com.cnighswonger.cache-fix-proxy
+```
+`bootout` + `bootstrap` is required because the plist contents changed — `kickstart` alone does not pick up plist changes.
+**Note on the hot-reload tradeoff:** even on the opt-in path, the ESM stale-import race remains possible on long-running processes. If you hit a degraded `/health` (returns 503 + `{status:"degraded",...}`), a process restart is the only recovery; the proxy logs a `[CRITICAL]` hint when this happens. See [#197](https://github.com/cnighswonger/claude-code-cache-fix/pull/197) for the observability layer.
 ## What this proxy defends against
 **Cache-economics regressions.** The original purpose of cache-fix is to absorb the cache-handling behaviors in Claude Code that cost users real money and quota — TTL downgrades, cache-breaking header churn, identity-latching issues, and the rest of the regression catalog documented across our issue history. The proxy sits between CC and the Anthropic API, normalizes the request and response stream, and emits enough observability (via statusline integration and the quota-status files) that users can see what their session is actually doing. This is the load-bearing feature for almost every user today.
@@ -353,7 +421,7 @@ Additionally, images read via the Read tool persist as base64 in conversation hi
 ## How it works
-**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
+**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions live as `.mjs` files configured in `proxy/extensions.json` and load once at proxy startup (hot-reload is opt-in as of v4.0.0 — see [Upgrading from v3.x](#upgrading-from-v3x)). All other traffic passes through untouched.
 **Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
@@ -765,17 +833,17 @@ Token thresholds are anchored to the observed ~382K-token trip with margin; the
 | `CACHE_FIX_THINKING_RISK_HIGH_TOKENS` | `340000` | Context-token level at which risk becomes `high` and the one-time stderr warn fires. |
 | `CACHE_FIX_THINKING_RISK` | unset (on) | Set to `off` to suppress the warning signal (stderr line + `thinking_desync_risk` field). Raw count telemetry keeps recording. |
-## Thinking-block sanitize (proxy mode, opt-in, thinking-desync mitigation)
+## Thinking-block sanitize (proxy mode, on by default, thinking-desync mitigation)
 The *mitigate* half of the thinking-desync response (the *warn-before* half is session-health above). On history-replay paths (resume / `--continue` / auto-compaction / parallel-tool-cancel), Claude Code re-sends prior assistant turns' extended thinking in the **omitted** shape `{ "type":"thinking", "thinking":"", "signature":"<intact>" }`. The API rejects modified thinking in the **latest** assistant message with a permanent `400 … thinking … blocks cannot be modified`, which wedges the session on every subsequent turn (upstream root cause: [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147)).
 The `thinking-block-sanitize` extension drops those omitted blocks — which the API treats as optional history — from the request before it is forwarded. Empirically-resolved turn-selection rule: drop omitted thinking from **all prior assistant turns and the latest assistant turn, unless the latest turn is an active tool-continuation** (its last block is a `tool_use` answered by a following `tool_result`). In that one case the API requires the signed thinking intact and the proxy cannot restore the emptied text, so it leaves the turn untouched. **No env var both preserves thinking and avoids the wedge for that case:** `CLAUDE_CODE_DISABLE_THINKING=1` / `MAX_THINKING_TOKENS=0` stop the wedge only by disabling thinking entirely (lossy — no reasoning), and `DISABLE_INTERLEAVED_THINKING=1` does *not* stop the `400` — so there the answer is don't-resume + heal/retire the session. That is exactly why the proxy mitigation matters: **it is the only path that preserves reasoning while avoiding the wedge** for the history-replay paths it covers. Non-empty thinking is never touched; `redacted_thinking` is out of scope for v1.
-**Opt-in.** v1 ships behind `CACHE_FIX_THINKING_SANITIZE=on` (default off): it mutates request bodies and full live-coverage validation is pending. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal.
+**On by default as of v4.0.0.** v1 was opt-in via `CACHE_FIX_THINKING_SANITIZE=on` in v3.8.0–v3.9.x. After seven days of prod dogfood across 37 sessions (zero `cannot be modified` 400s, cache hit-rate aggregate 94.66% vs. 92.44% baseline, sanitize firing on ~35% of sessions with ~800 blocks dropped per day, max 938K context healthy) the v1 mitigation is the new default. The transform is deterministic and cache-prefix-stable, and emits a per-request `thinking_blocks_dropped` count into the per-session JSON (counts only — never content) that complements the session-health signal. v2 stays opt-in pending its own prod-dogfood window after [#196](https://github.com/cnighswonger/claude-code-cache-fix/issues/196) closes the silent-load failure mode that prevented v2 from running in prior testing.
 | Env var | Default | Purpose |
 |---------|---------|---------|
-| `CACHE_FIX_THINKING_SANITIZE` | unset (off) | Set to `on` to enable the request-path drop of omitted thinking blocks. Off = no-op (no mutation, no telemetry). |
+| `CACHE_FIX_THINKING_SANITIZE` | unset (= v1) | v4.0.0+: v1 omitted-block drop is the default. Set to `off` to explicitly disable (returns to v3.x default-off behavior). Set to `v2` to additionally enable the v2 tools-hash-mismatch drop. Set to `on` for v1 (back-compat — same as unset). |
 ## System prompt rewrite (preload mode, optional)
@@ -827,6 +895,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
 - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
 - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
 - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
+- **[@yurukusa](https://github.com/yurukusa)** — [Cluster taxonomy](https://yurukusa.github.io/cc-safe-setup/cluster-tracker.html#cluster-extended-thinking-wedge) for [anthropics/claude-code#63147](https://github.com/anthropics/claude-code/issues/63147) thinking-desync wedge; the 13E (ToolSearch) sub-pattern synthesis that made the `thinking-block-sanitize` v2 directive predicate tractable (cache-fix #171, shipped behind `=v2` opt-in in v4.0.0)
 - **[@schuay](https://github.com/schuay)** — `quota-statusline.sh` enhancements: 10-cell quota bar with elapsed-time tick and exhaust-vs-reset projection replacing the prior `%/min` burn-rate display (PR #140, v3.6.2), and d/h vs h/m time-format autoselect plus named time-unit and burn-warmup constants (PR #143, v3.7.0)
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
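The v1 turn-selection rule in the README's thinking-block-sanitize section can be illustrated as a pure function over the `messages` array. This is a minimal sketch, not the package's `planSanitize`. The names `isOmitted`, `latestAssistantIdx`, `isActiveContinuation`, and `sanitizeV1` are hypothetical. The continuation check assumes that "answered by a following `tool_result`" means a later message carries a `tool_result` whose `tool_use_id` matches the trailing `tool_use`. The empty-text test uses `trim()`, which is assumed to mirror the non-overlap with `isSignedThinkingForV2`.

```javascript
// Hypothetical sketch of the v1 turn-selection rule; not the package's
// planSanitize(). Helper names and the continuation check are assumptions.

// v1 target: a `thinking` block whose text is empty (the "omitted" shape).
function isOmitted(block) {
  return (
    block != null &&
    block.type === "thinking" &&
    typeof block.thinking === "string" &&
    block.thinking.trim() === ""
  );
}

function latestAssistantIdx(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") return i;
  }
  return -1;
}

// Assumption: "active tool-continuation" means the turn ends in a tool_use
// block and a later message carries a tool_result for that tool_use id.
function isActiveContinuation(messages, idx) {
  const content = messages[idx].content;
  if (!Array.isArray(content) || content.length === 0) return false;
  const last = content[content.length - 1];
  if (last.type !== "tool_use") return false;
  return messages.slice(idx + 1).some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some((b) => b.type === "tool_result" && b.tool_use_id === last.id),
  );
}

// Pure: returns a new array plus the drop count; the input is not mutated.
function sanitizeV1(messages) {
  const latest = latestAssistantIdx(messages);
  const protectLatest = latest >= 0 && isActiveContinuation(messages, latest);
  let dropped = 0;
  const out = [];
  messages.forEach((msg, i) => {
    const skip =
      msg.role !== "assistant" ||
      !Array.isArray(msg.content) ||
      (i === latest && protectLatest); // protected turn keeps its signed thinking
    if (skip) {
      out.push(msg);
      return;
    }
    const kept = msg.content.filter((b) => !isOmitted(b));
    dropped += msg.content.length - kept.length;
    if (kept.length === msg.content.length) out.push(msg); // unchanged
    else if (kept.length > 0) out.push({ ...msg, content: kept });
    // else: the message lost all of its content, so it is dropped entirely
  });
  return { messages: out, dropped };
}

// Usage: a prior turn with omitted thinking, then an active tool continuation.
const history = [
  { role: "user", content: "hi" },
  { role: "assistant", content: [
    { type: "thinking", thinking: "", signature: "sig-a" },
    { type: "text", text: "hello" },
  ] },
  { role: "user", content: "continue" },
  { role: "assistant", content: [
    { type: "thinking", thinking: "", signature: "sig-b" },
    { type: "tool_use", id: "tu_1", name: "Read", input: {} },
  ] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "tu_1", content: "ok" }] },
];
const { messages, dropped } = sanitizeV1(history);
console.log(dropped); // 1 (sig-b sits in a protected active continuation)
console.log(messages[1].content.map((b) => b.type)); // [ 'text' ]
```

Keeping the planner pure means the same input always yields the same output. The README's "deterministic and cache-prefix-stable" property depends on that.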

package/bin/claude-via-proxy.mjs CHANGED

@@ -55,6 +55,7 @@ async function dispatch() {
         "  CACHE_FIX_PROXY_PORT     Port for the proxy server\n" +
         "  CACHE_FIX_PROXY_UPSTREAM Upstream URL\n" +
         "  CACHE_FIX_DEBUG=1        Verbose proxy logging\n" +
+        "  CACHE_FIX_HOT_RELOAD=on  Enable in-process extension hot-reload (off by default; see #196)\n" +
         "  CACHE_FIX_CLAUDE_CMD     Override the `claude` command for the wrapper\n",
     );
     return 0;
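The help text now lists `CACHE_FIX_HOT_RELOAD=on`. The two flags touched in this release parse differently:

- `install-service` captures hot-reload only on an exact `"on"` (strict opt-in).
- The sanitize header comment treats every value other than `"off"` and `"v2"` as v1 (permissive on-path).

Below is a minimal sketch of both policies. It assumes exact, case-sensitive string matches. The diff does not show the body of `modeFromEnv()`, so `hotReloadEnabled` and `sanitizeMode` are hypothetical stand-ins.

```javascript
// Hypothetical sketch of the two env-var parsing policies in this release.
// Function names are illustrative, not the package's exports.

// Hot-reload: strict opt-in. Only the exact string "on" enables it
// (mirrors the `=== "on"` check in install-service.mjs getDefaults()).
function hotReloadEnabled(env) {
  return env.CACHE_FIX_HOT_RELOAD === "on";
}

// Sanitize: permissive on-path. Only "off" disables and "v2" adds the
// tools-hash-mismatch drop; unset, "on", or any other value means v1.
function sanitizeMode(env) {
  const raw = env.CACHE_FIX_THINKING_SANITIZE;
  if (raw === "off") return "off";
  if (raw === "v2") return "v2";
  return "v1";
}

// Read the live environment at call time instead of caching at module load.
console.log(hotReloadEnabled(process.env), sanitizeMode(process.env));
console.log(hotReloadEnabled({ CACHE_FIX_HOT_RELOAD: "1" })); // false
console.log(sanitizeMode({ CACHE_FIX_THINKING_SANITIZE: "yes" })); // v1
```

Passing the environment object on each call mirrors the README's note that the sanitize mode is read per request. That is why setting `process.env.CACHE_FIX_THINKING_SANITIZE = "off"` in an embedder takes effect without a reload.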

package/bin/install-service.mjs CHANGED

@@ -23,6 +23,11 @@ function getDefaults() {
     port: validatePort(process.env.CACHE_FIX_PROXY_PORT || "9801"),
     upstream: process.env.CACHE_FIX_PROXY_UPSTREAM || "",
     debug: process.env.CACHE_FIX_DEBUG || "",
+    // Hot-reload is opt-in as of v4.0.0 (#196). Capture from env at install
+    // time so the operator can bake `CACHE_FIX_HOT_RELOAD=on` into the
+    // generated unit/plist via `CACHE_FIX_HOT_RELOAD=on cache-fix-proxy
+    // install-service`. Strict "on" match — anything else renders nothing.
+    hotReload: process.env.CACHE_FIX_HOT_RELOAD === "on" ? "on" : "",
     workingDir: resolve(__dirname, ".."),
   };
 }
@@ -93,6 +98,9 @@ function renderSystemdTemplate(template, vars) {
   const debugLine = vars.debug
     ? `Environment=CACHE_FIX_DEBUG=${vars.debug}`
     : "";
+  const hotReloadLine = vars.hotReload
+    ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
+    : "";
   // Allow callers to wire a Requires= line (e.g. another service the proxy
   // chains to). Empty string by default so the unit has no extra deps.
   const requiresLine = vars.requires
@@ -104,6 +112,7 @@ function renderSystemdTemplate(template, vars) {
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_LINE}}", upstreamLine)
     .replaceAll("{{DEBUG_LINE}}", debugLine)
+    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
     .replaceAll("{{REQUIRES_LINE}}", requiresLine)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     // Collapse triple newlines from empty optional lines down to single blank
@@ -117,12 +126,16 @@ function renderLaunchdTemplate(template, vars) {
   const debugPlist = vars.debug
     ? `        <key>CACHE_FIX_DEBUG</key>\n        <string>${vars.debug}</string>`
     : "";
+  const hotReloadPlist = vars.hotReload
+    ? `        <key>CACHE_FIX_HOT_RELOAD</key>\n        <string>${vars.hotReload}</string>`
+    : "";
   return template
     .replaceAll("{{NODE}}", vars.node)
     .replaceAll("{{SERVER_PATH}}", vars.serverPath)
     .replaceAll("{{PORT}}", vars.port)
     .replaceAll("{{UPSTREAM_PLIST}}", upstreamPlist)
     .replaceAll("{{DEBUG_PLIST}}", debugPlist)
+    .replaceAll("{{HOT_RELOAD_PLIST}}", hotReloadPlist)
     .replaceAll("{{WORKING_DIR}}", vars.workingDir)
     .replaceAll("{{LOG_DIR}}", vars.logDir)
     .replace(/\n\n+/g, "\n");
@@ -176,6 +189,7 @@ async function installSystemd({ paths, defaults, force = false } = {}) {
     port: defaults.port,
     upstream: defaults.upstream,
     debug: defaults.debug,
+    hotReload: defaults.hotReload,
     workingDir: defaults.workingDir,
     requires: "",
   });
@@ -275,6 +289,7 @@ async function installLaunchd({ paths, defaults, force = false } = {}) {
     port: defaults.port,
     upstream: defaults.upstream,
     debug: defaults.debug,
+    hotReload: defaults.hotReload,
     workingDir: defaults.workingDir,
     logDir: paths.logDir,
   });
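The `install-service` change threads one new optional value through both templates. `hotReload` is captured from the environment at install time, rendered as an `Environment=` line only when set, and left as an empty placeholder otherwise. Below is a minimal sketch of that pattern for the systemd side. It assumes a simplified template: the `[Service]` body, `renderUnit`, the placeholder paths, and the `\n{3,}` collapse regex are illustrative. The real template contents and the systemd collapse regex are not shown in this diff.

```javascript
// Hypothetical sketch of optional-line rendering for a systemd user unit.
// TEMPLATE and renderUnit() are illustrative; the real template lives in
// templates/cache-fix-proxy.service.template.
const TEMPLATE = [
  "[Service]",
  "ExecStart={{NODE}} {{SERVER_PATH}}",
  "Environment=CACHE_FIX_PROXY_PORT={{PORT}}",
  "{{DEBUG_LINE}}",
  "{{HOT_RELOAD_LINE}}",
  "",
  "[Install]",
  "WantedBy=default.target",
  "",
].join("\n");

function renderUnit(template, vars) {
  // Each optional Environment= line renders only when its value is non-empty.
  const debugLine = vars.debug ? `Environment=CACHE_FIX_DEBUG=${vars.debug}` : "";
  const hotReloadLine = vars.hotReload
    ? `Environment=CACHE_FIX_HOT_RELOAD=${vars.hotReload}`
    : "";
  return template
    .replaceAll("{{NODE}}", vars.node)
    .replaceAll("{{SERVER_PATH}}", vars.serverPath)
    .replaceAll("{{PORT}}", vars.port)
    .replaceAll("{{DEBUG_LINE}}", debugLine)
    .replaceAll("{{HOT_RELOAD_LINE}}", hotReloadLine)
    // Assumption: collapse runs of 3+ newlines left by empty placeholders
    // down to a single blank line.
    .replace(/\n{3,}/g, "\n\n");
}

// Placeholder paths for illustration only.
const base = { node: "/usr/bin/node", serverPath: "/opt/cache-fix/proxy/server.mjs", port: "9801", debug: "" };
const withHotReload = renderUnit(TEMPLATE, { ...base, hotReload: "on" });
const withoutHotReload = renderUnit(TEMPLATE, { ...base, hotReload: "" });
console.log(withHotReload);
```

Because the value is baked into the unit file at install time, changing it rewrites the unit. That is why Flow 2 in the README needs `daemon-reload` (systemd) or `bootout` + `bootstrap` (launchd), while Flow 1 does not.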

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "3.9.0",
+  "version": "4.0.0",
   "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
   "type": "module",
   "exports": {

package/proxy/extensions/cache-telemetry.mjs CHANGED

@@ -56,7 +56,12 @@ export function sessionFilePath(rawId) {
   return join(paths().sessionsDir, `${sessionFilename(rawId)}.json`);
 }
-function resolveSessionId(headers) {
+// Exported so sibling extensions can read the canonical session id from
+// REQUEST headers at their own onRequest time — they can't rely on
+// ctx.meta._sessionId being set, because this writer's onRequest is the
+// thing that populates it (and runs at order 600, after most extensions).
+// thinking-block-sanitize v2 (order 550) uses this for the same reason.
+export function resolveSessionId(headers) {
   if (!headers) return null;
   const sid =
     headers["x-claude-code-session-id"] ||
@@ -233,9 +238,16 @@ export default {
           // 590, stashes these before this writer runs). Optional — absent if
           // that extension is disabled or produced nothing this request.
           ...(ctx.meta._sessionHealth || {}),
-          // Additive thinking-block-sanitize drop count (order 550, opt-in).
-          // Optional — absent unless CACHE_FIX_THINKING_SANITIZE=on.
+          // Additive thinking-block-sanitize drop count (order 550). On by
+          // default since v4.0.0; present (possibly with thinking_blocks_dropped:0)
+          // whenever sanitize ran. Absent when CACHE_FIX_THINKING_SANITIZE=off
+          // or when the extension returned early before reaching the planner
+          // (e.g., body.messages not an array).
           ...(ctx.meta._thinkingSanitize || {}),
+          // Additive thinking-block-sanitize v2 fields (order 550, opt-in via
+          // CACHE_FIX_THINKING_SANITIZE=v2). Optional — absent unless v2 is
+          // enabled. Keys: thinking_blocks_dropped_v2 / tools_hash_baseline.
+          ...(ctx.meta._thinkingSanitizeV2 || {}),
           // Additive auto-1m-guard annotation (order 520). Optional — absent
           // unless the outbound request carried context-1m-2025-08-07 and the
           // mode wasn't off. Keys: auto_1m_detected / auto_1m_action /
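The new `_thinkingSanitizeV2` merge carries a `tools_hash_baseline` field. The newly exported `resolveSessionId` exists so v2 can key state by session at request time. The thinking-block-sanitize header comment lists the baseline rules:

- Keep a per-session map.
- The first request only establishes the baseline.
- Update the baseline only on a 2xx response.
- No-op for `unknown` sessions.

A minimal sketch of those rules as a small state object follows. It assumes the caller already has a canonical session id and a tools hash. `createBaselineTracker`, `decide`, `commit`, and `get` are hypothetical names, and the one-time seed from `sessions/<sid>.json` is omitted.

```javascript
// Hypothetical sketch of the v2 per-session tools-hash baseline rules.
// Names are illustrative; seeding from sessions/<sid>.json is omitted.
function createBaselineTracker() {
  const baselines = new Map(); // canonical session id -> last committed tools hash

  return {
    // Request time: should v2 strip prior-turn signed thinking?
    decide(sessionId, toolsHash) {
      if (!sessionId || sessionId === "unknown") return false; // v2 no-ops
      const baseline = baselines.get(sessionId);
      if (baseline === undefined) return false; // first request: observe only
      return baseline !== toolsHash; // strip only when the tools surface changed
    },
    // Response time: only a 2xx response promotes the hash to the new baseline.
    commit(sessionId, toolsHash, status) {
      if (!sessionId || sessionId === "unknown") return;
      if (status >= 200 && status < 300) baselines.set(sessionId, toolsHash);
    },
    get(sessionId) {
      return baselines.get(sessionId);
    },
  };
}

const tracker = createBaselineTracker();
console.log(tracker.decide("s1", "aaaa")); // false (first request establishes)
tracker.commit("s1", "aaaa", 200);
console.log(tracker.decide("s1", "bbbb")); // true (tools surface changed)
tracker.commit("s1", "bbbb", 400); // failed request: baseline stays "aaaa"
console.log(tracker.get("s1")); // aaaa
```

Committing only after a successful response keeps a rejected request's tools surface from becoming the new ground truth, as the header comment describes.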

package/proxy/extensions/signature-surface-hash.mjs ADDED

@@ -0,0 +1,60 @@
+// signature-surface-hash — hash helper for thinking-block-sanitize v2.
+//
+// Computes a deterministic 16-hex-char fingerprint of the inputs that
+// participate in the API's thinking-block signature: the tools surface,
+// and (forward-compat) optionally the system block or anthropic-beta
+// header value.
+//
+// v2 only passes `{ tools }`. The signature is left forward-compatible so
+// a future v3 directive can extend coverage without renaming this helper.
+//
+// Canonicalization rules (per directive proxy-thinking-block-sanitize-v2.md):
+//   - Each tool object: recursive stable JSON stringify with recursive key
+//     sorting at every nesting level. Nested JSON-schema objects (in
+//     input_schema, parameters, etc.) have their own keys, which also
+//     sort stably.
+//   - Preserve tools[] array order. Reordering tools changes which slot
+//     which tool occupies in the API's view; the hash MUST reflect that.
+//     (Note: sort-stabilization at order 200 currently locks the array
+//     order before v2 fires, so this rule is forward-compatibility against
+//     any future change in upstream ordering.)
+//   - Sentinel for empty/absent: if tools is undefined, null, or [], the
+//     hash input is the literal string "none". Rules out collision with
+//     other empty-shaped inputs in a future extension.
+//
+// Output: sha256(canonical_input).slice(0, 16) — 16 hex chars matches the
+// existing _sessionHealth / _thinkingSanitize precedent for in-JSON identifiers.
+import { createHash } from "node:crypto";
+// Recursive stable stringify: object keys sort, arrays preserve order,
+// primitives go through JSON.stringify as-is. Handles nested objects and
+// arrays to arbitrary depth.
+export function canonicalStringify(value) {
+  if (value === null || typeof value !== "object") {
+    return JSON.stringify(value);
+  }
+  if (Array.isArray(value)) {
+    return "[" + value.map(canonicalStringify).join(",") + "]";
+  }
+  const keys = Object.keys(value).sort();
+  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
+  return "{" + parts.join(",") + "}";
+}
+// Compute the signature-surface hash. v2 passes only { tools }; system and
+// anthropic_beta are reserved for future versions.
+export function computeSignatureSurfaceHash({ tools, system, anthropic_beta } = {}) {
+  // Empty/absent tools → "none" sentinel (not the canonical-stringify of [],
+  // which would be "[]" and could collide with other empty-shaped inputs).
+  const toolsPart =
+    tools == null || (Array.isArray(tools) && tools.length === 0)
+      ? "none"
+      : canonicalStringify(tools);
+  // Reserved inputs — passed by future versions; v2 always omits them, so
+  // they contribute nothing to the hash today. Kept in the signature so
+  // existing call sites don't need to change when v3 adds them.
+  void system;
+  void anthropic_beta;
+  return createHash("sha256").update(toolsPart).digest("hex").slice(0, 16);
+}
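The canonicalization rules in the header comment of the added file can be checked directly. This sketch reuses `canonicalStringify` verbatim from the file above, plus a hypothetical `canonicalToolsInput` that mirrors the `"none"`-sentinel branch of `computeSignatureSurfaceHash`. It stops before the `sha256(...).slice(0, 16)` step so it runs without imports. It therefore compares canonical inputs rather than hashes; equal inputs yield equal hashes.

```javascript
// canonicalStringify copied from signature-surface-hash.mjs.
function canonicalStringify(value) {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalStringify).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalStringify(value[k]));
  return "{" + parts.join(",") + "}";
}

// Hypothetical helper: the hash *input* computeSignatureSurfaceHash would use.
function canonicalToolsInput(tools) {
  return tools == null || (Array.isArray(tools) && tools.length === 0)
    ? "none"
    : canonicalStringify(tools);
}

const read = { name: "Read", input_schema: { type: "object", properties: { path: { type: "string" } } } };
const readReordered = { input_schema: { properties: { path: { type: "string" } }, type: "object" }, name: "Read" };
const grep = { name: "Grep", input_schema: { type: "object" } };

// Key order inside a tool, at any nesting depth, does not change the input...
console.log(canonicalToolsInput([read]) === canonicalToolsInput([readReordered])); // true
// ...but the order of tools in the array does.
console.log(canonicalToolsInput([read, grep]) === canonicalToolsInput([grep, read])); // false
// Absent and empty tools share the sentinel.
console.log(canonicalToolsInput(undefined), canonicalToolsInput([])); // none none
```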

package/proxy/extensions/thinking-block-sanitize.mjs CHANGED

@@ -1,12 +1,27 @@
 // thinking-block-sanitize — request-path mitigation for the CC thinking-desync
-// wedge (anthropics/claude-code#63147). On replay paths (resume / --continue /
+// wedge (anthropics/claude-code#63147).
+//
+// v1 (default since v4.0.0; CACHE_FIX_THINKING_SANITIZE unset or =on): On replay paths (resume / --continue /
 // auto-compaction / parallel-tool-cancel), CC re-sends prior assistant turns'
 // thinking in the OMITTED shape `{ type:"thinking", thinking:"", signature }`.
 // The API rejects modified thinking in the *latest* assistant message with a
-// permanent 400, which wedges the session. This extension drops the omitted
-// thinking blocks the API treats as optional, before the request is forwarded.
+// permanent 400, which wedges the session. v1 drops these omitted blocks
+// before forwarding. Never touches non-empty thinking; never touches
+// redacted_thinking (v1's empirical exclusion — zero observed in worst-case
+// wedged transcripts).
+//
+// v2 (CACHE_FIX_THINKING_SANITIZE=v2): Additionally handles yurukusa's "13E"
+// pattern — when ToolSearch dynamically loads a tool mid-conversation, the
+// prior assistant turn's thinking signature is invalidated because it was
+// computed over the now-stale tools surface. The API rejects + CC's harness
+// strips-and-retries, paying a 400 + retry tax every turn. v2 detects
+// cross-request tools-surface change via a per-session tools-hash baseline,
+// and strips ALL prior-turn signed thinking (both `thinking` blocks with
+// non-empty text AND `redacted_thinking` blocks — v2's scope is structural,
+// not empirical) on hash mismatch. Same active-tool-continuation latest-turn
+// guard as v1.
 //
-// Resolved turn-selection rule (directive Open Question 1, empirical capture):
+// Resolved turn-selection rule (v1 directive Open Question 1, empirical capture):
 //   - drop omitted thinking from ALL prior assistant turns, AND
 //   - from the LATEST assistant turn UNLESS it is an active tool-continuation
 //     (last block is a tool_use with a following tool_result) — that case is
@@ -16,15 +31,47 @@
 //     / MAX_THINKING_TOKENS=0 stop it only by disabling thinking entirely
 //     (lossy); DISABLE_INTERLEAVED_THINKING=1 does NOT stop the 400 — so the
 //     answer for that case is don't-resume + heal/retire.
-// Never touches non-empty thinking, and never touches redacted_thinking (v1).
 //
-// OPT-IN for v1: only runs when CACHE_FIX_THINKING_SANITIZE=on (default off) —
-// it mutates request bodies and its coverage is not yet live-validated.
+// v2 state pattern (per directive proxy-thinking-block-sanitize-v2.md):
+//   - In-memory per-session map keyed by canonical session filename, seeded
+//     once from sessions/<sid>.json on first request that session. Mirrors
+//     session-health's pattern. Lives at module scope.
+//   - Baseline updates ONLY on response success (HTTP 2xx). 4xx/5xx leave
+//     the baseline unchanged so a failed request's hash doesn't become the
+//     new ground truth.
+//   - First request observes-and-establishes (no strip; baseline is set
+//     after the response succeeds).
+//   - When canonical session id is "unknown" (raw id null/empty/whitespace),
+//     v2 no-ops entirely. The shared sessions/unknown.json would cross-
+//     contaminate baselines across unrelated agents otherwise.
+//
+// Modes via CACHE_FIX_THINKING_SANITIZE (as of v4.0.0 — v1 default-on flip):
+//   unset (or "on")                       — v1 only (omitted-text drop). DEFAULT.
+//   "off"                                 — extension no-ops (explicit disable)
+//   "v2"                                  — v1 + v2 (omitted-text drop AND
+//                                           tools-hash-mismatch drop). v2 is
+//                                           strict superset of v1.
+//   any other value                       — treated as v1 (the default), not off.
+//                                           Matches the precedent of being
+//                                           permissive about the on-path.
+//
+// v1 default-on rationale: 7-day prod dogfood across 37 sessions (2026-05-29
+// → 2026-06-05) on `=on`: zero `cannot be modified` 400s, cache hit-rate
+// aggregate 94.66% vs 92.44% baseline (no prefix degradation), sanitize fired
+// on ~35% of sessions, ~800 blocks dropped per day, max 938K context healthy.
+// v2 stays opt-in via `=v2` because the dogfood only ran v1.
 //
 // Order 550: after the request-body mutators (ttl-management 500) and before
 // session-health (590), so #160's thinking_block_count reflects the forwarded
-// body. The per-request drop count is exposed via ctx.meta._thinkingSanitize
-// for cache-telemetry (600) to merge into the per-session JSON.
+// body. The per-request drop counts are exposed via ctx.meta._thinkingSanitize
+// (v1 counter) and ctx.meta._thinkingSanitizeV2 (v2 counter + baseline) for
+// cache-telemetry (600) to merge into the per-session JSON.
+import { readFileSync } from "node:fs";
+import { resolveSessionId, sessionFilePath, sessionFilename } from "./cache-telemetry.mjs";
+import { computeSignatureSurfaceHash } from "./signature-surface-hash.mjs";
+// --- v1 predicates ---
 export function isOmittedThinking(block) {
   return (
@@ -70,14 +117,38 @@ function latestAssistantIndex(messages) {
   return -1;
 }
-// Pure planner: returns { messages, dropped }. Does not mutate the input.
-// `messages` is the new array (a message that loses all content is dropped).
-export function planSanitize(messages) {
-  if (!Array.isArray(messages)) return { messages, dropped: 0 };
+// --- v2 predicate ---
+// v2 strips signed `thinking` blocks (non-empty text) AND `redacted_thinking`
+// blocks. v1's `isOmittedThinking` filter handles the empty-text case
+// independently — when both flags are active, v1 drops the empty ones and v2
+// drops the signed ones; predicates are non-overlapping.
+export function isSignedThinkingForV2(block) {
+  if (!block) return false;
+  if (block.type === "redacted_thinking") return true;
+  // Non-empty thinking with a signature — v1 leaves these alone by design.
+  return (
+    block.type === "thinking" &&
+    typeof block.thinking === "string" &&
+    block.thinking.trim() !== "" &&
+    typeof block.signature === "string" &&
+    block.signature.length > 0
+  );
+}
+// --- Pure planner ---
+//
+// Returns { messages, dropped, droppedV2 }. Does not mutate input.
+// `v2StripSigned` is the externally-determined boolean: should v2's
+// signed-thinking drop fire this request? (Caller has already computed
+// hash mismatch + session-state checks.)
+export function planSanitize(messages, { v2StripSigned = false } = {}) {
+  if (!Array.isArray(messages)) return { messages, dropped: 0, droppedV2: 0 };
   const latestAsst = latestAssistantIndex(messages);
   const protectLatest = latestAsst >= 0 && isActiveToolContinuation(messages, latestAsst);
   let dropped = 0;
+  let droppedV2 = 0;
   let changed = false;
   const out = [];
   for (let i = 0; i < messages.length; i++) {
@@ -87,14 +158,24 @@ export function planSanitize(messages) {
       continue;
     }
     if (i === latestAsst && protectLatest) {
-      out.push(msg); // active continuation — leave its thinking intact
+      // Active continuation — leave thinking intact (both v1 and v2 respect
+      // this; the API needs the signed thinking for the pending tool call).
+      out.push(msg);
       continue;
     }
     const kept = msg.content.filter((b) => {
+      // v1 always-active drop predicate.
       if (isOmittedThinking(b)) {
         dropped++;
         return false;
       }
+      // v2-only drop predicate. Predicates are mutually exclusive on a single
+      // block: omitted thinking matches v1's predicate but not v2's, and
+      // signed/redacted thinking matches v2's predicate but not v1's.
+      if (v2StripSigned && isSignedThinkingForV2(b)) {
+        droppedV2++;
+        return false;
+      }
       return true;
     });
     if (kept.length === msg.content.length) {
@@ -106,25 +187,158 @@ export function planSanitize(messages) {
       changed = true;
     }
   }
-  return { messages: changed ? out : messages, dropped };
+  return { messages: changed ? out : messages, dropped, droppedV2 };
+}
+// --- v2 mode + state ---
+// "off" | "on" | "v2". As of v4.0.0 the default flipped from "off" to "on" —
+// v1 (omitted-text drop) is the new default behavior. Set
+// CACHE_FIX_THINKING_SANITIZE=off to explicitly disable; =v2 to additionally
+// enable the v2 tools-hash-mismatch drop (still opt-in pending its own
+// prod-dogfood window after #200 closes the silent-load failure mode).
+// Unknown values fall through to "on" — we are permissive about the on-path
+// and only treat the literal "off" as a disable.
+export function modeFromEnv(env = process.env) {
+  const v = env.CACHE_FIX_THINKING_SANITIZE;
+  if (v === "off") return "off";
+  if (v === "v2") return "v2";
+  return "on";
+}
+// Per-session state, in memory. Keyed by canonical session filename
+// (sessionFilename(rawId)). Each entry: { tools_hash_baseline }.
+// Mirrors session-health's pattern: seeded once from disk on first request
+// that session, then maintained in memory + persisted via cache-telemetry's
+// spread of ctx.meta._thinkingSanitizeV2.
+const v2SessionState = new Map();
+function seedV2FromFile(rawSid) {
+  let prev = null;
+  try {
+    prev = JSON.parse(readFileSync(sessionFilePath(rawSid), "utf8"));
+  } catch {}
+  return {
+    tools_hash_baseline:
+      typeof prev?.tools_hash_baseline === "string" ? prev.tools_hash_baseline : null,
+  };
+}
+// Test-only reset (also useful for proxy-restart simulation in unit tests).
+export function _resetV2State() {
+  v2SessionState.clear();
 }
+// --- Extension default-export ---
 export default {
   name: "thinking-block-sanitize",
   description:
-    "Drop omitted (empty-text) thinking blocks from prior assistant turns and the latest non-continuation turn, to head off the CC thinking-desync 400 (#63147). Opt-in via CACHE_FIX_THINKING_SANITIZE=on.",
+    "Drop omitted (empty-text) thinking blocks from prior assistant turns and the latest non-continuation turn, to head off the CC thinking-desync 400 (#63147). v1 mode: omitted-text drop only. v2 mode: also drop signed thinking + redacted_thinking on cross-request tools-hash mismatch (ToolSearch surface). v1 is now ON by default as of v4.0.0; set CACHE_FIX_THINKING_SANITIZE=off to disable, =v2 to additionally opt into v2.",
   order: 550,
   async onRequest(ctx) {
-    if (process.env.CACHE_FIX_THINKING_SANITIZE !== "on") return;
+    const mode = modeFromEnv();
+    if (mode === "off") return;
     const body = ctx.body;
     if (!body || !Array.isArray(body.messages)) return;
-    const { messages, dropped } = planSanitize(body.messages);
-    if (dropped > 0) body.messages = messages;
+    // v2 only fires when mode === "v2" AND we have a usable session id.
+    let v2StripSigned = false;
+    let stateKey = null;
+    let currentHash = null;
+    if (mode === "v2") {
+      // Resolve session id inline — cache-telemetry's onRequest runs at order
+      // 600, after us, so ctx.meta._sessionId is not yet set when we fire at
+      // order 550. We import resolveSessionId from cache-telemetry to keep
+      // canonicalization consistent.
+      const rawSid = resolveSessionId(ctx.headers);
+      stateKey = sessionFilename(rawSid);
+      // "unknown" canonical id → no-op for v2 (cross-contamination risk on
+      // the shared sessions/unknown.json baseline). v1's strip still runs
+      // below regardless.
+      if (stateKey !== "unknown") {
+        currentHash = computeSignatureSurfaceHash({ tools: body.tools });
+        // Seed in-memory state from disk on first encounter that session
+        // (covers proxy restart — re-reads persisted baseline).
+        let st = v2SessionState.get(stateKey);
+        if (!st) {
+          st = seedV2FromFile(rawSid);
+          v2SessionState.set(stateKey, st);
+        }
+        const baseline = st.tools_hash_baseline;
+        // Mismatch only fires when there IS a baseline AND it differs.
+        // First request (baseline === null) observes-and-establishes — no strip.
+        v2StripSigned = baseline !== null && baseline !== currentHash;
+        // Stash for the onResponseStart hook to advance the baseline iff the
+        // response succeeded. Stash BEFORE the plan + strip so the response
+        // path has access regardless of whether anything was dropped.
+        ctx.meta._thinkingSanitizeV2PendingHash = currentHash;
+        ctx.meta._thinkingSanitizeV2StateKey = stateKey;
+      }
+    }
+    const { messages, dropped, droppedV2 } = planSanitize(body.messages, {
+      v2StripSigned,
+    });
+    if (dropped > 0 || droppedV2 > 0) body.messages = messages;
     // Counts only — never content. Exposed for cache-telemetry to persist and
     // for the #160 session-health signal.
+    // v1 counter — unchanged, fires for both modes.
     ctx.meta._thinkingSanitize = { thinking_blocks_dropped: dropped };
+    // v2 counter, set only in v2 mode. It also carries the tools-hash
+    // baseline that cache-telemetry persists, so the session JSON shows the
+    // current baseline even on requests that didn't strip. The baseline
+    // itself advances only on a 2xx response (see onResponseStart below).
+    if (mode === "v2" && stateKey && stateKey !== "unknown") {
+      ctx.meta._thinkingSanitizeV2 = {
+        thinking_blocks_dropped_v2: droppedV2,
+        // Stash the CURRENT baseline. onResponseStart overwrites this field
+        // with the pending hash only when the response is 2xx. On 4xx/5xx,
+        // cache-telemetry still writes the session JSON, but with the old
+        // baseline, so neither the in-memory state nor the on-disk copy
+        // adopts a failed request's hash.
+        tools_hash_baseline: v2SessionState.get(stateKey)?.tools_hash_baseline ?? null,
+      };
+    }
+  },
+  // Advance the baseline only on HTTP 2xx response. 4xx/5xx leaves the
+  // in-memory state and the meta-stashed baseline untouched, so a failed
+  // request's hash never becomes the new ground truth.
+  async onResponseStart(ctx) {
+    const stateKey = ctx.meta._thinkingSanitizeV2StateKey;
+    const pendingHash = ctx.meta._thinkingSanitizeV2PendingHash;
+    if (!stateKey || !pendingHash) return;
+    if (typeof ctx.status !== "number") return;
+    if (ctx.status < 200 || ctx.status >= 300) return;
+    // Advance in-memory baseline.
+    let st = v2SessionState.get(stateKey);
+    if (!st) {
+      st = { tools_hash_baseline: null };
+      v2SessionState.set(stateKey, st);
+    }
+    st.tools_hash_baseline = pendingHash;
+    // Update the meta-stashed baseline so cache-telemetry's spread writes
+    // the new value to disk. If meta._thinkingSanitizeV2 wasn't stashed
+    // (e.g. mode flip mid-request), construct it now.
+    if (!ctx.meta._thinkingSanitizeV2) {
+      ctx.meta._thinkingSanitizeV2 = { thinking_blocks_dropped_v2: 0 };
+    }
+    ctx.meta._thinkingSanitizeV2.tools_hash_baseline = pendingHash;
   },
 };
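The two drop predicates and the 2xx-gated baseline can be modeled in isolation. This is a hedged, simplified sketch, not the extension itself. `isOmittedThinkingSketch` is a hypothetical stand-in for v1's `isOmittedThinking`, whose body isn't shown in this diff; it assumes "omitted" means a `thinking` block with missing or blank text. `planSketch` takes an explicit `protectIndex` instead of calling `isActiveToolContinuation`. `createBaselineTracker` keeps per-session baselines in a plain `Map` and leaves out the on-disk seeding and cache-telemetry persistence.

```javascript
// Hypothetical stand-in for v1's isOmittedThinking: assumes "omitted" means a
// thinking block whose text is missing or blank.
function isOmittedThinkingSketch(block) {
  return (
    !!block &&
    block.type === "thinking" &&
    (typeof block.thinking !== "string" || block.thinking.trim() === "")
  );
}

// Same logic as isSignedThinkingForV2 in the diff above.
function isSignedThinkingForV2(block) {
  if (!block) return false;
  if (block.type === "redacted_thinking") return true;
  return (
    block.type === "thinking" &&
    typeof block.thinking === "string" &&
    block.thinking.trim() !== "" &&
    typeof block.signature === "string" &&
    block.signature.length > 0
  );
}

// Simplified planner. It never mutates its input and drops a message whose
// content empties out entirely. protectIndex stands in for the
// active-tool-continuation check on the latest assistant turn.
function planSketch(messages, { v2StripSigned = false, protectIndex = -1 } = {}) {
  let dropped = 0;
  let droppedV2 = 0;
  const out = [];
  messages.forEach((msg, i) => {
    if (i === protectIndex || msg.role !== "assistant" || !Array.isArray(msg.content)) {
      out.push(msg);
      return;
    }
    const kept = msg.content.filter((b) => {
      if (isOmittedThinkingSketch(b)) {
        dropped++; // v1: always active
        return false;
      }
      if (v2StripSigned && isSignedThinkingForV2(b)) {
        droppedV2++; // v2: only on a tools-hash mismatch
        return false;
      }
      return true;
    });
    if (kept.length === msg.content.length) out.push(msg);
    else if (kept.length > 0) out.push({ ...msg, content: kept });
  });
  return { messages: out, dropped, droppedV2 };
}

// Per-session baseline: the first request establishes it, a later mismatch
// triggers the v2 strip, and only a 2xx response advances it.
function createBaselineTracker() {
  const baselines = new Map(); // sessionKey -> tools hash string
  return {
    onRequest(sessionKey, currentHash) {
      const baseline = baselines.get(sessionKey) ?? null;
      return {
        stripSigned: baseline !== null && baseline !== currentHash,
        pendingHash: currentHash,
      };
    },
    onResponseStart(sessionKey, pendingHash, status) {
      if (status >= 200 && status < 300) baselines.set(sessionKey, pendingHash);
    },
    baseline(sessionKey) {
      return baselines.get(sessionKey) ?? null;
    },
  };
}
```

Because a 4xx/5xx leaves the baseline where it was, the next request that carries the same new tools hash still sees a mismatch and strips again; nothing is "learned" from a failed request.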

package/proxy/pipeline.mjs CHANGED

@@ -3,6 +3,7 @@ import { join } from "node:path";
 import { pathToFileURL } from "node:url";
 let registry = [];
+let failedExtensions = []; // [{ file, error, lastAttempt }]
 export async function loadExtensions(dir, configPath) {
   let config = {};
@@ -15,6 +16,7 @@ export async function loadExtensions(dir, configPath) {
   const mjsFiles = files.filter((f) => f.endsWith(".mjs")).sort();
   const extensions = [];
+  const newlyFailed = [];
   for (const file of mjsFiles) {
     try {
       const mod = await import(pathToFileURL(join(dir, file)).href + "?t=" + Date.now());
@@ -29,12 +31,24 @@ export async function loadExtensions(dir, configPath) {
         extensions.push({ ...ext, order, _file: file });
       }
     } catch (err) {
-      process.stderr.write(`[pipeline] failed to load ${file}: ${err.message}\n`);
+      // Load-bearing observability: this branch is the only signal that the
+      // proxy is running with a degraded extension graph. See #196: a Node
+      // ESM cache stale-import race silently broke thinking-block-sanitize
+      // v2 for 17 hours post-merge before AITL grepped the journal. The
+      // [CRITICAL] prefix is harder to miss than the prior [pipeline] one,
+      // and the explicit "restart proxy to recover" hint tells the operator
+      // what to do — the underlying Node ESM cache problem can't be fixed
+      // in-process (you can't evict cached transitive imports), so a full
+      // process restart is the only path to recover the extension graph.
+      const msg = `[CRITICAL] extension load failed: ${file}: ${err.message} — restart the proxy via your supervisor to recover (in-process reload cannot fix stale ESM cache; see #196)\n`;
+      process.stderr.write(msg);
+      newlyFailed.push({ file, error: String(err.message || err), lastAttempt: new Date().toISOString() });
     }
   }
   extensions.sort((a, b) => a.order - b.order);
   registry = extensions;
+  failedExtensions = newlyFailed;
   return extensions;
 }
@@ -46,6 +60,13 @@ export function snapshotRegistry() {
   return [...registry];
 }
+// Exposed for /health and any operator-facing tool that wants to surface
+// extension-load failures. Returns a fresh array per call so callers can't
+// mutate internal state.
+export function getFailedExtensions() {
+  return failedExtensions.map((f) => ({ ...f }));
+}
 // Route scoping: extensions default to messages-only so that adding a new
 // route (e.g. /api/claude_cli/bootstrap) doesn't drag every existing
 // message-mutating extension onto it — most throw on a null body because
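The load loop's bookkeeping (keep what loaded, sort by `order`, remember what failed, hand out copies) can be modeled without the filesystem. In this hedged sketch, `createRegistry` is an illustrative name, not part of the package, and each hypothetical `load` function stands in for the real per-file dynamic `import()`. It assumes every extension declares a numeric `order`.

```javascript
// Illustrative model of loadExtensions' success/failure bookkeeping.
function createRegistry() {
  let registry = [];
  let failed = []; // [{ file, error, lastAttempt }]
  return {
    // loaders: [{ file, load }], where load() returns an extension or throws.
    loadAll(loaders, now = () => new Date().toISOString()) {
      const ok = [];
      const newlyFailed = [];
      for (const { file, load } of loaders) {
        try {
          const ext = load();
          ok.push({ ...ext, _file: file });
        } catch (err) {
          // Record the failure instead of only logging it, so /health can see it.
          newlyFailed.push({ file, error: String(err?.message || err), lastAttempt: now() });
        }
      }
      ok.sort((a, b) => a.order - b.order);
      registry = ok;
      failed = newlyFailed; // each reload replaces the failure list wholesale
      return ok;
    },
    snapshot() {
      return [...registry];
    },
    // Fresh shallow copies so callers can't mutate internal state.
    getFailed() {
      return failed.map((f) => ({ ...f }));
    },
  };
}
```

Replacing the failure list on every load (rather than appending) mirrors `failedExtensions = newlyFailed`: a later load in which every file succeeds clears the degraded state.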

package/proxy/server.mjs CHANGED

@@ -3,7 +3,7 @@ import { pathToFileURL, URL } from "node:url";
 import config from "./config.mjs";
 import { forwardRequest } from "./upstream.mjs";
 import { streamResponse, createTelemetryRecord } from "./stream.mjs";
-import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse } from "./pipeline.mjs";
+import { loadExtensions, snapshotRegistry, runOnRequest, runOnResponseStart, runOnResponse, getFailedExtensions } from "./pipeline.mjs";
 import { startWatcher } from "./watcher.mjs";
 function collectBody(req) {
@@ -238,6 +238,21 @@ async function handleBootstrap(clientReq, clientRes) {
 }
 function handleHealth(_req, res) {
+  // Surface extension-load failures so callers (operators, monitoring) see
+  // a degraded proxy state instead of a misleading "ok". See #196: a Node
+  // ESM cache stale-import race silently broke thinking-block-sanitize v2
+  // for 17 hours post-merge before anyone noticed. /health returning "ok"
+  // through that window was load-bearing in the silence.
+  const failed = getFailedExtensions();
+  if (failed.length > 0) {
+    res.writeHead(503, { "content-type": "application/json" });
+    res.end(JSON.stringify({
+      status: "degraded",
+      failed_extensions: failed,
+      hint: "restart the proxy via your supervisor to recover (in-process reload cannot fix stale ESM cache; #196)",
+    }));
+    return;
+  }
   res.writeHead(200, { "content-type": "application/json" });
   res.end(JSON.stringify({ status: "ok" }));
 }
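Because `/health` now answers 503 with `status: "degraded"` and a `failed_extensions` list, a supervisor or monitoring probe can turn the reply into an actionable message. A minimal sketch, assuming the probe has already fetched the status code and body text from the proxy's configured bind address and port; `summarizeHealth` is a hypothetical helper.

```javascript
// Hypothetical probe helper: turns the proxy's /health reply into a verdict.
function summarizeHealth(statusCode, bodyText) {
  let body;
  try {
    body = JSON.parse(bodyText);
  } catch {
    return { healthy: false, message: `unparseable /health body (HTTP ${statusCode})` };
  }
  if (statusCode === 200 && body?.status === "ok") {
    return { healthy: true, message: "ok" };
  }
  if (statusCode === 503 && body?.status === "degraded") {
    // failed_extensions entries carry { file, error, lastAttempt }.
    const failed = (body.failed_extensions ?? []).map((f) => `${f.file} (${f.error})`);
    const hint = body.hint ? ` ${body.hint}` : "";
    return { healthy: false, message: `degraded: ${failed.join(", ")}.${hint}` };
  }
  return { healthy: false, message: `unexpected /health reply (HTTP ${statusCode})` };
}
```

Node 18+ ships a global `fetch` that resolves (rather than rejects) on a 503, so a probe can pass `res.status` and `await res.text()` straight in.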
@@ -290,7 +305,34 @@ export async function startProxy(options = {}) {
   const bind = options.bind ?? config.bind;
   const extensionsDir = options.extensionsDir ?? config.extensionsDir;
   const extensionsConfig = options.extensionsConfig ?? config.extensionsConfig;
-  const watch = options.watch !== false;
+  // Hot-reload is opt-in as of v4.0.0 (#196). The in-process watcher is the
+  // only code path that triggers the Node ESM stale-import race; cold starts
+  // have an empty module cache and load extensions cleanly. Strict `=== "on"`
+  // means any other value (including "true"/"1"/"yes") is treated as off —
+  // the safe default. Note this is the opposite stance from
+  // CACHE_FIX_THINKING_SANITIZE (default-on; only literal "off" disables):
+  // a hot-reload enable is a footgun, so we require the operator to type the
+  // exact opt-in token; a sanitize disable is also a footgun (loses the
+  // wedge mitigation), so we require the exact disable token there.
+  const hotReloadOptIn = process.env.CACHE_FIX_HOT_RELOAD === "on";
+  const watch = options.watch !== false && hotReloadOptIn;
+  // Boot banner on stderr so the EFFECTIVE hot-reload mode is visible in the
+  // supervisor's log (journalctl --user / ~/Library/Logs/) without being
+  // noisy for monitoring tools that line-grep stderr. Keyed off the effective
+  // `watch` value, not the raw envvar, so an embedder calling startProxy({
+  // watch: false }) with the envvar set sees "off" (which is the truth — the
+  // watcher is suppressed regardless of envvar in that case). Supervisor-
+  // neutral wording — no version pin (lives in CHANGELOG/README instead).
+  if (watch) {
+    process.stderr.write(
+      "[cache-fix] hot-reload: on (CACHE_FIX_HOT_RELOAD=on) — long-running processes can hit a Node ESM stale-import race; see #196. Restart the proxy via your supervisor to recover.\n",
+    );
+  } else {
+    process.stderr.write(
+      "[cache-fix] hot-reload: off (set CACHE_FIX_HOT_RELOAD=on to enable). Extension changes require a supervisor-level proxy restart.\n",
+    );
+  }
   let watcher = null;
   try {
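The two environment switches deliberately parse in opposite directions: hot-reload turns on only for the exact token `on`, while thinking-block sanitize stays on unless it sees the exact token `off` (or `v2`). A hedged sketch of both rules side by side; `effectiveHotReload` is an illustrative name, and `sanitizeMode` mirrors `modeFromEnv` from the thinking-block-sanitize diff.

```javascript
// Opt-in: only the literal "on" enables the watcher, and an embedder passing
// { watch: false } wins regardless of the envvar.
function effectiveHotReload(env, options = {}) {
  return options.watch !== false && env.CACHE_FIX_HOT_RELOAD === "on";
}

// Opt-out: only the literal "off" disables, "v2" adds the v2 drop, and
// anything else (including unset or a typo) falls through to "on".
function sanitizeMode(env) {
  const v = env.CACHE_FIX_THINKING_SANITIZE;
  if (v === "off") return "off";
  if (v === "v2") return "v2";
  return "on";
}
```

Both comparisons are case-sensitive: `ON` or `On` leaves hot-reload off and leaves sanitize on, which is the safe direction for each switch.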

package/templates/cache-fix-proxy.service.template CHANGED

@@ -11,6 +11,7 @@ RestartSec=5
 Environment=CACHE_FIX_PROXY_PORT={{PORT}}
 {{UPSTREAM_LINE}}
 {{DEBUG_LINE}}
+{{HOT_RELOAD_LINE}}
 WorkingDirectory={{WORKING_DIR}}
 [Install]

package/templates/com.cnighswonger.cache-fix-proxy.plist.template CHANGED

@@ -15,6 +15,7 @@
         <string>{{PORT}}</string>
 {{UPSTREAM_PLIST}}
 {{DEBUG_PLIST}}
+{{HOT_RELOAD_PLIST}}
     </dict>
     <key>WorkingDirectory</key>
     <string>{{WORKING_DIR}}</string>

package/tools/MANUAL-COMPACT.md CHANGED

@@ -56,7 +56,7 @@ Always:
 ```
 ```
-Project directory: /home/manager/git_repos/kanfei_nowcast_e3b
+Project directory: ~/git_repos/your-project
 Auto-detected session: db11f377-4ca8-4fc3-9b6d-1069da58c1b2.jsonl
   Modified: 2026-04-19 13:26:42
   Size: 4.8M
@@ -155,6 +155,21 @@ The cold rebuild consumed ~15% Q5h in one call on our Max 5x account. After that
 **Total cost of a manual compact cycle:** roughly ~15% cold rebuild plus a few % for the Opus summarization. Compare to hitting the 1M wall and losing the session entirely.
+### Stale transcripts get swept (CC's `cleanupPeriodDays`)
+Heads up if you're treating the on-disk `.jsonl` as a "keep just in case" backup after `/clear`: it isn't durable. Claude Code has a transcript-retention setting, `cleanupPeriodDays`, in `~/.claude/settings.json` (default 30 days). At startup, if its `~/.claude/.last-cleanup` sentinel is older than the 24h freshness window, CC runs a transcript cleanup: it walks every `.jsonl` under `~/.claude/projects/` and deletes any whose `mtime` is past the cutoff, along with the matching `<session-id>/` companion directory next to it. A session you compacted, `/clear`-ed, and last touched ~31 days ago will be gone after the next launch that crosses the cleanup gate, even if you'd planned to grep it for context.
+Practical implications:
+- **If you need the post-compact JSONL preserved**, copy it out of `~/.claude/projects/` to a path that isn't subject to CC's cleanup — e.g. `~/snapshots/cc-jsonl-backups/`.
+- **A stopped session held in heal-and-await state is especially vulnerable** — it's idle by definition, so it crosses `cleanupPeriodDays` faster than an actively-used session whose appends keep mtime fresh. If you've stopped a session intending to resume later, either resume promptly, `touch` the `.jsonl` to refresh mtime, or copy it out of the tree.
+- Cleanup keys off `mtime`, and plain reads (`cat`/`grep`/`less`) don't refresh `mtime` — inspection doesn't extend retention.
+- **Raise the retention setting on every machine you use CC on.** Adding `"cleanupPeriodDays": 36500` (~100 years) to `~/.claude/settings.json` defangs the documented cleanup path entirely. There's no documented upper bound; the schema just wants a positive integer. The cleanup logic re-reads the setting at each sweep, so you can land this even on machines where prior sweeps already happened.
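To see which transcripts are close to the sweep, compare each `.jsonl`'s `mtime` against `cleanupPeriodDays`. This is a hedged sketch: it assumes CC's cutoff is simply `now - cleanupPeriodDays` days (the exact boundary comparison isn't documented here), and `transcriptsAtRisk` is a hypothetical helper that takes `{ path, mtimeMs }` entries rather than walking `~/.claude/projects/` itself, so it has no filesystem side effects.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns entries within warnWithinDays of the cutoff, soonest first, with
// daysLeft (zero or negative means already eligible for the next sweep).
function transcriptsAtRisk(
  entries,
  { cleanupPeriodDays = 30, now = Date.now(), warnWithinDays = 7 } = {},
) {
  const cutoffMs = now - cleanupPeriodDays * DAY_MS;
  return entries
    .map((e) => ({ path: e.path, daysLeft: (e.mtimeMs - cutoffMs) / DAY_MS }))
    .filter((e) => e.daysLeft <= warnWithinDays)
    .sort((a, b) => a.daysLeft - b.daysLeft);
}
```

Feeding it real data means collecting `{ path, mtimeMs }` via `fs.statSync(file).mtimeMs` for each `.jsonl` under `~/.claude/projects/`; anything it returns is a candidate to copy out or `touch`.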
+**If a transcript was already swept** and you need to recover it, [`vsits/restore-claude-history-linux`](https://github.com/vsits/restore-claude-history-linux) (RCB) restores deleted `.jsonl` files from Linux filesystem snapshots: **ZFS**, **Btrfs**, or **Timeshift**. It has been verified end to end on Ubuntu 24.04, and a real Btrfs dogfood confirmed that a recovered transcript loads and resumes via `/resume` in a fresh CC session. macOS users get the same workflow from the upstream [`garrettmoss/restore-claude-history`](https://github.com/garrettmoss/restore-claude-history) (Time Machine). Both tools also remind you to set `cleanupPeriodDays` afterward; otherwise the restored transcript gets re-swept on the next cleanup pass.
+Tracked upstream as [anthropics/claude-code#62272](https://github.com/anthropics/claude-code/issues/62272). cache-fix doesn't touch this surface, but it's documented here because manual-compact users are the population most likely to bank on the `.jsonl` sticking around.
 ### Summarizer model
 The tool defaults to `claude --print --model claude-opus-4-7` for the highest-fidelity summary. Override with the `MANUAL_COMPACT_MODEL` env var — e.g. `MANUAL_COMPACT_MODEL=claude-sonnet-4-6` to minimize Q5h impact, or to point at a different model if Opus is rate-limited or retired.