npm - claude-code-cache-fix - Versions diffs - 3.0.0 → 3.0.2 - Mend

claude-code-cache-fix 3.0.0 → 3.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/README.md CHANGED Viewed

@@ -4,199 +4,158 @@
 English | [中文](./README.zh.md) | [한국어](./README.ko.md) | [Português](./docs/guia-pt-br.md)
-Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.112. Opus 4.7 compatible.
+Cache optimization proxy for [Claude Code](https://github.com/anthropics/claude-code). Fixes prompt cache bugs that cause excessive quota burn, stabilizes the request prefix, and monitors for silent regressions. Works with all CC versions including the v2.1.113+ Bun binary.
-> **Opus 4.7 advisory:** Our metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts. Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` (may reduce quality). Image stripping (`CACHE_FIX_IMAGE_KEEP_LAST`) is even more important on 4.7 due to high-res image support increasing image token counts. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) for full analysis.
+> **v3.0.1** — Local HTTP proxy with 7 hot-reloadable extensions. A/B tested on v2.1.117: **95.5% cache hit rate through proxy vs 82.3% direct** on first warm turn. [Full release notes →](https://github.com/cnighswonger/claude-code-cache-fix/releases/tag/v3.0.0)
-## Security model
-> **This interceptor patches `globalThis.fetch`.** By design, it has full read/write access to all API requests and responses in the Claude Code process. This is inherent to the approach — any fetch interceptor, proxy, or gateway has this position.
-**What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
-**What it does NOT do:** No network calls from the interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
+> **Opus 4.7 advisory:** Metered data shows 4.7 burns Q5h quota at **~2.4x the rate of 4.6** for equivalent visible token counts ([independently confirmed by @ArkNill](https://github.com/ArkNill/claude-code-hidden-problem-analysis/blob/main/16_OPUS-47-ADVISORY.md)). Two factors: a new tokenizer (up to 35% more tokens, [documented](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7)) and adaptive thinking overhead (~105%, not documented in usage response). The Q5h impact compounds into **Q7d** — the weekly quota ceiling that most heavy users will hit first. Workaround: `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` reduces burn by ~3.3x but may reduce quality on complex tasks. See [Discussion #25](https://github.com/cnighswonger/claude-code-cache-fix/discussions/25) (initial observation) and [Discussion #42](https://github.com/cnighswonger/claude-code-cache-fix/discussions/42) (controlled A/B data + Q7d analysis).
-**Supply chain:** Single unminified file (`preload.mjs`, ~1,700 lines). One dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
+## Quick Start: Proxy (recommended)
-**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
+The proxy works with any CC version — Node.js or Bun binary. It sits between Claude Code and the Anthropic API, applying cache fixes as hot-reloadable extensions.
-## The problem
+```bash
+# Install
+npm install -g claude-code-cache-fix
-When you use `--resume` or `/resume` in Claude Code, the prompt cache breaks silently. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that should cost ~$0.50/hour can burn through $5–10/hour with no visible indication anything is wrong.
+# Start the proxy (runs on localhost:9801)
+node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" &
-Three bugs cause this:
+# Launch Claude Code through it
+ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
+```
-1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all of them drift to later messages, changing the cache prefix.
+That's it. The proxy applies all 7 cache-fix extensions automatically. No wrapper scripts, no `NODE_OPTIONS`, no preload.
-2. **Fingerprint instability** — The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and cache busts.
+### What the proxy does
-3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
+On every `/v1/messages` request, 7 extensions run in order:
-Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
+| Extension | What it fixes |
+|-----------|--------------|
+| `fingerprint-strip` | Removes unstable cc_version fingerprint from system prompt |
+| `sort-stabilization` | Deterministic ordering of tool and MCP definitions |
+| `ttl-management` | Detects server TTL tier, injects correct cache_control markers |
+| `identity-normalization` | Normalizes message identity fields for prefix stability |
+| `fresh-session-sort` | Fixes non-deterministic ordering on first turn |
+| `cache-control-normalize` | Normalizes cache_control markers across messages |
+| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status.json` |
-## Installation
+Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
-Requires Node.js >= 18 and Claude Code installed via npm (not the standalone binary).
+### Running as a service
-```bash
-npm install -g claude-code-cache-fix
-```
+**Linux (systemd — recommended):**
-## Usage
+Create `~/.config/systemd/user/cache-fix-proxy.service`:
-The fix works as a Node.js preload module that intercepts API requests before they leave your machine.
+```ini
+[Unit]
+Description=Claude Code Cache Fix Proxy (v3.x)
+After=network.target
-### Option A: Wrapper script (recommended)
+[Service]
+Type=simple
+ExecStart=/usr/local/bin/node /path/to/claude-code-cache-fix/proxy/server.mjs
+Restart=on-failure
+RestartSec=5
+Environment=CACHE_FIX_PROXY_PORT=9801
-Create a wrapper script (e.g. `~/bin/claude-fixed`):
+[Install]
+WantedBy=default.target
+```
 ```bash
-#!/bin/bash
-NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
-CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
-CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
+systemctl --user daemon-reload
+systemctl --user enable --now cache-fix-proxy
-if [ ! -f "$CLAUDE_NPM_CLI" ]; then
-  echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
-  echo "Install with: npm install -g @anthropic-ai/claude-code" >&2
-  exit 1
-fi
-if [ ! -f "$CACHE_FIX" ]; then
-  echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
-  echo "Install with: npm install -g claude-code-cache-fix" >&2
-  exit 1
-fi
-exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
+# Optional: start on boot (before login)
+sudo loginctl enable-linger $USER
 ```
-```bash
-chmod +x ~/bin/claude-fixed
-```
+A `cache-fix-proxy install-service` subcommand is planned for v3.1.0 ([#48](https://github.com/cnighswonger/claude-code-cache-fix/issues/48)).
+**Fallback (any OS):**
-Adjust `CLAUDE_NPM_CLI` if your npm global prefix differs. Find it with:
 ```bash
-npm root -g
+nohup node "$(npm root -g)/claude-code-cache-fix/proxy/server.mjs" > /tmp/cache-fix-proxy.log 2>&1 &
+echo 'export ANTHROPIC_BASE_URL=http://127.0.0.1:9801' >> ~/.bashrc
 ```
-### Option B: Shell alias
+### Health check
 ```bash
-alias claude='NODE_OPTIONS="--import claude-code-cache-fix" node "$(npm root -g)/@anthropic-ai/claude-code/cli.js"'
+curl http://127.0.0.1:9801/health
+# {"status":"ok"}
 ```
-### Option C: Direct invocation
+## Quick Start: Preload (CC v2.1.112 and earlier)
+If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
 ```bash
+npm install -g claude-code-cache-fix
 NODE_OPTIONS="--import claude-code-cache-fix" claude
 ```
-> **Note**: This only works if `claude` points to the npm/Node installation. The standalone binary uses a different execution path that bypasses Node.js preloads.
-### Windows users
-On Windows, `NODE_OPTIONS="--import ..."` doesn't work the same way as on Linux/macOS. Use the included `claude-fixed.bat` wrapper instead:
-1. After installing both packages globally:
-   ```bat
-   npm install -g claude-code-cache-fix
-   npm install -g @anthropic-ai/claude-code
-   ```
+> **Note:** The preload does NOT work on CC v2.1.113+ (Bun binary). Use the proxy above.
-2. Copy `claude-fixed.bat` from this package to a directory in your PATH (e.g., `C:\Users\<you>\bin\`):
-   ```bat
-   copy "%NPM_ROOT%\claude-code-cache-fix\claude-fixed.bat" C:\Users\%USERNAME%\bin\
-   ```
-   Or find the file manually at your npm global root (run `npm root -g` to locate it).
-3. Run Claude Code with the interceptor active:
-   ```bat
-   claude-fixed [any claude args...]
-   ```
-The wrapper dynamically resolves your npm global root, constructs a `file:///` URL for the preload module (converting backslashes to forward slashes for Node.js), and launches Claude Code with the interceptor loaded. All environment variables (`CACHE_FIX_DEBUG`, `CACHE_FIX_IMAGE_KEEP_LAST`, etc.) work the same as on Linux/macOS.
-Credit: [@TomTheMenace](https://github.com/anthropics/claude-code/issues/38335) contributed the Windows wrapper and validated the interceptor across a 7.5-hour, 536-call Opus 4.6 session on Windows — 98.4% cache hit rate, 81% of calls had fingerprint instability that the interceptor corrected.
+See [docs/preload-setup.md](docs/preload-setup.md) for wrapper scripts, shell aliases, Windows instructions, and VS Code preload-mode integration.
 ## VS Code Extension
-### Option A: VSIX extension (recommended)
+The [VS Code extension](https://github.com/cnighswonger/claude-code-cache-fix-vscode) (v0.5.0) supports both proxy and preload modes:
-The easiest path — a VS Code extension that handles everything automatically:
+**Proxy mode (recommended):**
+1. Start the proxy (see above)
+2. In VS Code command palette: **Claude Code Cache Fix: Enable Proxy Mode**
+3. Restart any active Claude Code session
-1. Install the interceptor: `npm install -g claude-code-cache-fix`
+**Preload mode (CC ≤v2.1.112):**
+1. `npm install -g claude-code-cache-fix`
 2. Download the VSIX from [GitHub Releases](https://github.com/cnighswonger/claude-code-cache-fix-vscode/releases/latest)
-3. Install: `code --install-extension claude-code-cache-fix-0.1.0.vsix`
-   (or in VS Code: Extensions → `...` menu → "Install from VSIX...")
-4. Restart any active Claude Code session
+3. Install: `code --install-extension claude-code-cache-fix-0.5.0.vsix`
+4. Command palette: **Claude Code Cache Fix: Enable**
-The extension auto-configures `claudeCode.claudeProcessWrapper` on activation. No manual settings needed. Works on Windows, macOS, and Linux.
+For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md](docs/preload-setup.md#vs-code-preload-mode).
-Commands available in the VS Code command palette:
-- **Claude Code Cache Fix: Enable** / **Disable** / **Show Status**
-### Option B: Manual wrapper (if you prefer not to install the VSIX)
+## Security model
-The VS Code Claude Code extension spawns `claude.exe` / `claude` as a subprocess. The `claude-code.environmentVariables` setting does **not** propagate `NODE_OPTIONS`, so a process wrapper is required.
+> **The proxy and interceptor have full read/write access to API requests and responses.** This is inherent to the approach — any fetch interceptor, proxy, or gateway has this position.
-**Linux / macOS** — create `~/bin/claude-vscode-wrapper`:
+**What it does:** Modifies outgoing request structure (block order, fingerprint, TTL, git-status) to fix cache bugs. Reads response headers and SSE usage data for monitoring.
-```bash
-#!/bin/bash
-NPM_ROOT="$(npm root -g 2>/dev/null)"
-PRELOAD="$NPM_ROOT/claude-code-cache-fix/preload.mjs"
-shift  # VS Code passes the original claude path as $1
-export NODE_OPTIONS="--import $PRELOAD"
-exec node "$NPM_ROOT/@anthropic-ai/claude-code/cli.js" "$@"
-```
+**What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine unless you explicitly opt in to [claude-code-meter](https://github.com/cnighswonger/claude-code-meter) sharing (separate package, requires interactive consent).
-```bash
-chmod +x ~/bin/claude-vscode-wrapper
-```
+**Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
-Add to VS Code `settings.json`:
-```json
-{
-  "claudeCode.claudeProcessWrapper": "/home/YOUR_USERNAME/bin/claude-vscode-wrapper"
-}
-```
+**Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
-**Windows** — `.bat`/`.cmd` wrappers fail because the extension uses `child_process.spawn()` without `shell: true`. Use the C wrapper source included in this package (`tools/claude-vscode-wrapper.c`):
+## The problem
-```cmd
-cl tools\claude-vscode-wrapper.c /Fe:claude-vscode-wrapper.exe
-```
+When you use `--resume` or `/resume` in Claude Code, the prompt cache breaks silently. Instead of reading cached tokens (cheap), the API rebuilds them from scratch on every turn (expensive). A session that should cost ~$0.50/hour can burn through $5–10/hour with no visible indication anything is wrong.
-Then set in VS Code `settings.json`:
+Three bugs cause this:
-```json
-{
-  "claudeCode.claudeProcessWrapper": "C:\\path\\to\\claude-vscode-wrapper.exe"
-}
-```
+1. **Partial block scatter** — Attachment blocks (skills listing, MCP servers, deferred tools, hooks) are supposed to live in `messages[0]`. On resume, some or all drift to later messages, changing the cache prefix.
-### Known limitations (VS Code)
+2. **Fingerprint instability** — The `cc_version` fingerprint (e.g. `2.1.92.a3f`) is computed from `messages[0]` content including meta/attachment blocks. When those blocks shift, the fingerprint changes, the system prompt changes, and cache busts.
-- **Fingerprint fix**: Fixed in v1.11.0 — the safety check now handles both the v2.1.108+ extraction method and the legacy method. No workaround needed. (Previously required `CACHE_FIX_SKIP_FINGERPRINT=1`.)
+3. **Non-deterministic tool ordering** — Tool definitions can arrive in different orders between turns, changing request bytes and invalidating the cache key.
-Credit: [@JEONG-JIWOO](https://github.com/JEONG-JIWOO) and [@X-15](https://github.com/X-15) for the VS Code extension investigation and C wrapper ([#16](https://github.com/cnighswonger/claude-code-cache-fix/issues/16)).
+Additionally, images read via the Read tool persist as base64 in conversation history and are sent on every subsequent API call, compounding token costs silently.
 ## How it works
-The module intercepts `globalThis.fetch` before Claude Code makes API calls to `/v1/messages`. On each call it:
+**Proxy mode** (v3.0.0+): An HTTP server on `localhost:9801` intercepts `POST /v1/messages` requests. Seven extension modules process each request through a pipeline — normalizing block order, stripping fingerprints, stabilizing tool sort, managing TTL markers. Extensions are hot-reloadable `.mjs` files configured in `proxy/extensions.json`. All other traffic passes through untouched.
-1. **Scans all user messages** for relocated attachment blocks (skills, MCP, deferred tools, hooks) and moves the latest version of each back to `messages[0]`, matching fresh session layout
-2. **Sorts tool definitions** alphabetically by name for deterministic ordering
-3. **Recomputes the cc_version fingerprint** from the real user message text instead of meta/attachment content
+**Preload mode** (v2.x): A Node.js `--import` module that patches `globalThis.fetch` before Claude Code makes API calls. Applies the same fixes inline — scans user messages for relocated blocks, sorts tools, recomputes fingerprints, injects TTL markers.
-All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
+Both modes are idempotent — if nothing needs fixing, the request passes through unmodified. Neither mode modifies your conversation; they only normalize the request structure before it hits the API.
-## Graduating from Fixes
+## Graduating from fixes
-The interceptor serves three purposes with different lifecycles:
+The package serves three purposes with different lifecycles:
 | Purpose | Examples | When to disable |
 |---------|----------|-----------------|
@@ -204,7 +163,7 @@ The interceptor serves three purposes with different lifecycles:
 | **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
 | **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
-### Health status
+### Health status (preload mode)
 On first API call, the interceptor logs a health status line (requires `CACHE_FIX_DEBUG=1`):
@@ -212,25 +171,14 @@ On first API call, the interceptor logs a health status line (requires `CACHE_FI
 cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
 ```
-Status meanings:
 - **active(Xh ago)** — fix was applied recently
-- **dormant(N clean sessions)** — bug not detected in N resume sessions; CC may have fixed it
-- **safety-blocked(Nx)** — round-trip verification failed; CC changed its algorithm, fix auto-disabled
+- **dormant(N clean sessions)** — bug not detected in N sessions; CC may have fixed it
+- **safety-blocked(Nx)** — round-trip verification failed; fix auto-disabled
 - **waiting** — fix hasn't been triggered yet
-When a fix shows `dormant`, you can safely disable it:
-```bash
-export CACHE_FIX_SKIP_RELOCATE=1  # example
-```
-To disable all fixes but keep monitoring:
-```bash
-export CACHE_FIX_DISABLED=1
-```
 ### Regression detection
-If cache_read ratio drops below 50% across 5+ calls after disabling fixes, you'll see:
+If cache_read ratio drops below 50% across 5+ calls after disabling fixes:
 ```
 REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls.
 Fixes are disabled — consider re-enabling to recover cache performance.
@@ -240,10 +188,7 @@ Fixes are disabled — consider re-enabling to recover cache performance.
 ### Fingerprint round-trip verification
-Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
-hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
-If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
-This ensures the interceptor can never make cache performance *worse* than stock CC.
+Before rewriting the `cc_version` fingerprint, the interceptor verifies that its hardcoded salt and character indices reproduce the fingerprint Claude Code sent. If verification fails (CC changed its algorithm), the rewrite is skipped automatically. This ensures the interceptor can never make cache performance *worse* than stock CC.
 ### Fail-safe design
@@ -257,21 +202,18 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
 ## Status line — quota warnings in real time
-The interceptor writes quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script reads this file and displays a live status line in Claude Code showing:
+Both proxy and preload modes write quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script displays a live status line showing:
 - **Q5h %** with burn rate (%/min)
 - **Q7d %** with burn rate (%/hr)
-- **TTL tier** — shows `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
+- **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
 - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
 - **Cache hit rate %**
 - **OVERAGE** flag when active
 ### Setup
-Copy the script and configure Claude Code to use it:
 ```bash
-# Copy from the npm package to Claude Code's hooks directory
 mkdir -p ~/.claude/hooks
 cp "$(npm root -g)/claude-code-cache-fix/tools/quota-statusline.sh" ~/.claude/hooks/
 chmod +x ~/.claude/hooks/quota-statusline.sh
@@ -288,305 +230,57 @@ Add to `~/.claude/settings.json`:
 }
 ```
-### Recommended: disable git-status injection
-Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
-```bash
-export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
-```
-Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
-The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
-Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
-**Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
 ### Why the status line matters
-When the server downgrades your TTL to 5m (Layer 2 — quota-aware downgrade at Q5h ≥ 100%), **every idle longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible — you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
-## Image stripping
-Images read via the Read tool are encoded as base64 and stored in `tool_result` blocks in conversation history. They ride along on **every subsequent API call** until compaction. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and potentially **~85,000+ tokens on Opus 4.7** due to the new tokenizer (up to 35% inflation) and high-res image support (2576px max, up from 1568px). Image stripping is strongly recommended on 4.7.
-Enable image stripping to remove old images from tool results:
-```bash
-export CACHE_FIX_IMAGE_KEEP_LAST=3
-```
-This keeps images in the last 3 user messages and replaces older ones with a text placeholder. Only targets images inside `tool_result` blocks (Read tool output) — user-pasted images are never touched. Files remain on disk for re-reading if needed.
-Set to `0` (default) to disable.
-## System prompt rewrite (optional)
-The interceptor can also rewrite Claude Code's `# Output efficiency` system-prompt section before the request is sent.
-This feature is **optional** and **disabled by default**. If `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` is unset, nothing is changed.
-Enable it by setting a replacement text:
-```bash
-export CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT=$'# Output efficiency\n\n...'
-```
-The rewrite is intentionally narrow:
-- Only Claude Code's `# Output efficiency` section is replaced
-- Other system prompt sections are preserved
-- Existing system block structure and fields such as `cache_control` are preserved
-This may be useful for users who want to stay on current Claude Code versions but experiment with a different `Output efficiency` instruction set instead of downgrading to an earlier release.
-### Prompt variants
-<details>
-<summary>Anthropic internal / <code>USER_TYPE=ant</code> version</summary>
-```text
-# Output efficiency
-When sending user-facing text, you're writing for a person, not logging to a console. Assume users can't see most tool calls or thinking - only your text output. Before your first tool call, briefly state what you're about to do. While working, give short updates at key moments: when you find something load-bearing (a bug, a root cause), when changing direction, when you've made progress without an update.
-When you give updates, assume the recipient may have stepped away and lost the thread. They do not know your internal shorthand, codenames, or half-formed plan. Write in complete, grammatical sentences that can be understood cold. Spell out technical terms when helpful. If unsure, err on the side of a bit more explanation. Adapt to the user's expertise: experts can handle denser updates, but don't make novice users reconstruct context on their own.
-User-facing text should read like natural prose. Avoid clipped sentence fragments, excessive dashes, symbolic shorthand, or formatting that reads like console output. Use tables only when they genuinely improve scanability, such as compact facts (files, lines, pass/fail) or quantitative comparisons. Keep explanatory reasoning in prose around the table, not inside it. Avoid semantic backtracking: structure sentences so the user can follow them linearly without having to reinterpret earlier clauses after reading later ones.
-Optimize for fast human comprehension, not minimal surface area. If the user has to reread your summary or ask a follow-up just to understand what happened, you saved the wrong tokens. Match the level of structure to the task: for a simple question, answer in plain prose without unnecessary headings or numbered lists. While staying clear and direct, also be concise and avoid fluff. Skip filler, obvious restatements, and throat-clearing. Get to the point. Don't over-focus on low-signal details from your process. When it helps, use an inverted pyramid structure with the conclusion first and details later.
-These user-facing text instructions do not apply to code or tool calls.
-```
-</details>
-<details>
-<summary>Public / default Claude Code version</summary>
-```text
-# Output efficiency
-IMPORTANT: Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.
-Your text output is brief, direct, and to the point. Lead with the answer or action, not the reasoning. Omit filler, preamble, and unnecessary transitions. Do not restate the user's request; move directly to the work. When explanation is needed, include only what helps the user understand the outcome.
-Prioritize user-facing text for:
-- decisions that require user input
-- high-signal progress updates at natural milestones
-- errors or blockers that change the plan
-If a sentence can do the job, do not turn it into three. Favor short, direct constructions over long explanatory prose. These instructions do not apply to code or tool calls.
-```
-</details>
-<details>
-<summary>Example custom replacement(A middle-ground version combining the two versions above)</summary>
-```text
-# Output efficiency
-When sending user-facing text, write for a person, not a log file. Assume the user cannot see most tool calls or hidden reasoning - only your text output.
-Keep user-facing text clear, direct, and reasonably concise. Lead with the answer or action. Skip filler, repetition, and unnecessary preamble.
-Explain enough for the user to understand the reasoning, tradeoffs, or root cause when that would help them learn or make a decision, but do not turn simple answers into long writeups.
-These instructions apply to user-facing text only. They do not apply to investigation, code reading, tool use, or verification.
-Before making changes, read the relevant code and understand the surrounding context. Check types, signatures, call sites, and error causes before editing. Do not confuse brevity with rushing, and do not replace understanding with trial and error.
+When the server downgrades your TTL to 5m (quota-aware downgrade at Q5h ≥ 100%), **every idle longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible. With it, the red `TTL:5m` warning tells you: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
-While working, give short updates at meaningful moments: when you find the root cause, when the plan changes, when you hit a blocker, or when a meaningful milestone is complete. Do not narrate every step.
-When reporting results, be accurate and concrete. If you did not verify something, say so plainly. If a check failed, say that plainly too.
-```
-</details>
-## Monitoring
-The interceptor includes monitoring for several additional issues identified by the community:
-### Microcompact / budget enforcement
-Claude Code silently replaces old tool results with `[Old tool result content cleared]` via server-controlled mechanisms (GrowthBook flags). A 200,000-character aggregate cap and per-tool caps (Bash: 30K, Grep: 20K) truncate older results without notification. There is no `DISABLE_MICROCOMPACT` environment variable.
-The interceptor detects cleared tool results and logs counts. When total tool result characters approach the 200K threshold, a warning is logged.
-### False rate limiter
-The client can generate synthetic "Rate limit reached" errors without making an API call, identifiable by `"model": "<synthetic>"`. The interceptor logs these events.
-### GrowthBook flag dump
-On the first API call, the interceptor reads `~/.claude.json` and logs the current state of cost/cache-relevant server-controlled flags (hawthorn_window, pewter_kestrel, slate_heron, session_memory, etc.).
-### Quota tracking
-Response headers are parsed for `anthropic-ratelimit-unified-5h-utilization` and `7d-utilization`, saved to `~/.claude/quota-status.json` for consumption by status line hooks or other tools.
-### Peak hour detection
-Anthropic applies elevated quota drain rates during weekday peak hours (13:00–19:00 UTC, Mon–Fri). The interceptor detects peak windows and writes `peak_hour: true/false` to `quota-status.json`. See `docs/peak-hours-reference.md` for sources and details.
-### Usage telemetry and cost reporting
+### Recommended: disable git-status injection
-The interceptor logs per-call usage data to `~/.claude/usage.jsonl` — one JSON line per API call with model, token counts, and cache breakdown. Use the bundled cost report tool to analyze costs:
+Claude Code injects live `git status` into the system prompt on every call. Any file edit changes the git status, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call:
 ```bash
-node tools/cost-report.mjs                    # today's costs from interceptor log
-node tools/cost-report.mjs --date 2026-04-08  # specific date
-node tools/cost-report.mjs --since 2h         # last 2 hours
-node tools/cost-report.mjs --admin-key <key>  # cross-reference with Admin API
+export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
 ```
-Also works with any JSONL containing Anthropic usage fields (`--file`, stdin) — useful for SDK users and proxy setups. See `docs/cost-report.md` for full documentation.
+Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs context. Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag).
-### Quota analysis (5-hour quota counting)
+## Image stripping (preload mode)
-The same `usage.jsonl` log can be analyzed to test how Anthropic's 5-hour quota is actually computed. Run the bundled tool:
+Images read via the Read tool persist as base64 in conversation history, riding along on every subsequent API call. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and **~85,000+ on Opus 4.7** due to the new tokenizer. Image stripping is strongly recommended on 4.7.
 ```bash
-node tools/quota-analysis.mjs              # analyze your default log
-node tools/quota-analysis.mjs --since 24h  # last 24 hours only
-node tools/quota-analysis.mjs --json       # machine-readable output
+export CACHE_FIX_IMAGE_KEEP_LAST=3
 ```
-The tool answers three questions from your own data:
+Keeps images in the last 3 user messages, replaces older ones with a text placeholder. Only targets `tool_result` blocks — user-pasted images are never touched.
-1. **Does `cache_read` count toward your 5-hour quota?** Tests three hypotheses (cache_read costs 0x / 0.1x / 1x of input rate) and reports which one best explains your `q5h_pct` trajectory across reset windows. Lower coefficient of variation across windows = better fit.
-2. **Do peak hours cost more quota per token?** Splits windows into peak-dominant (≥80% peak calls) and off-peak-dominant (≤20%) and compares the implied 100% quota under the best-fit model.
-3. **What is your account's effective 5-hour quota in token-equivalents?** Reports a concrete number you can compare against your subscription tier or against what other users measure.
+## System prompt rewrite (preload mode, optional)
-Requires `q5h_pct`, `q7d_pct`, and `peak_hour` fields in usage.jsonl, which were added in v1.6.1 (2026-04-09). Older entries are silently filtered out.
+The interceptor can rewrite Claude Code's `# Output efficiency` system-prompt section. Disabled by default. Enable with `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. See [docs/output-efficiency-prompts.md](docs/output-efficiency-prompts.md) for the three known prompt variants and usage instructions.
-**Help us validate across accounts:** if you run this on your own log, please open an issue or PR on this repo with your output (or just the best-fit hypothesis name and your peak/off-peak ratio). Cross-validating across multiple accounts is the only way to distinguish per-account variance from real findings. Reference: [anthropics/claude-code#45756](https://github.com/anthropics/claude-code/issues/45756).
+## Monitoring & diagnostics
-## Debug mode
+The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works in both proxy and preload modes via `~/.claude/quota-status.json`.
-Enable debug logging to verify the fix is working:
-```bash
-CACHE_FIX_DEBUG=1 claude-fixed
-```
-Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
-- `APPLIED: resume message relocation` — block scatter was detected and fixed
-- `APPLIED: tool order stabilization` — tools were reordered
-- `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
-- `APPLIED: stripped N images from old tool results` — images were stripped
-- `APPLIED: output efficiency section rewritten` — output-efficiency section was replaced
-- `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
-- `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
-- `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
-- `GROWTHBOOK FLAGS: {...}` — server-controlled feature flags on first call
-- `PROMPT SIZE: system=N tools=N injected=N (skills=N mcp=N ...)` — per-call prompt size breakdown
-- `CACHE TTL: tier=1h create=N read=N hit=N% (1h=N 5m=N)` — TTL tier and cache hit rate per call
-- `PEAK HOUR: weekday 13:00-19:00 UTC` — Anthropic peak hour throttling active
-- `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
-- `SKIPPED: output efficiency rewrite (section not found)` — no matching output-efficiency section found
-### Prefix diff mode
-Enable cross-process prefix snapshot diffing to diagnose cache busts on restart:
-```bash
-CACHE_FIX_PREFIXDIFF=1 claude-fixed
-```
-Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are generated on the first API call after a restart.
-## Environment variables
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
-| `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
-| `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
-| `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
-| `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
-| `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes; keep monitoring + optimizations active |
-| `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
-| `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
-| `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
-| `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
-| `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
-| `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping |
-| `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
-| `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
-| `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
-| `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
+See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
 ## Limitations
-- **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
-- **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
-- **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
-- **System prompt rewrite is experimental** — This hook only rewrites one system-prompt section and is opt-in, but there are still unknowns: it is not proven that this prompt text is responsible for the behavior differences discussed in community reports, and it is not known whether future server-side validation could react to modified system prompts. Use at your own risk.
+- **Proxy requires a running process** — The proxy must be started before Claude Code. If it's not running and `ANTHROPIC_BASE_URL` points to it, CC will fail to connect. We recommend running it as a systemd service or with a health-checking wrapper script.
+- **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is server-side and cannot be fixed client-side. The proxy/interceptor prevents the cache instability that can push you into overage in the first place.
+- **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. Microcompact and budget enforcement are server-controlled via GrowthBook flags with no client-side disable option.
+- **System prompt rewrite is experimental** — Preload-only, opt-in. Not proven to be the cause of behavior differences discussed in community reports. Use at your own risk.
 - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
 ## Tracked issues
-- [#34629](https://github.com/anthropics/claude-code/issues/34629) — Original resume cache regression report
-- [#40524](https://github.com/anthropics/claude-code/issues/40524) — Within-session fingerprint invalidation, image persistence
-- [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
-- [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
-- [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
-- [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
-- [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion around the `Output efficiency` system-prompt change and its possible effect on model behavior
+We monitor 30+ upstream Claude Code issues related to cache, quota, and context bugs. See [TRACKED_ISSUES.md](TRACKED_ISSUES.md) for the full list with our involvement, community research, and key contributors.
 ## Related research
-- **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — Systematic proxy-based analysis of 7 bugs including microcompact, budget enforcement, false rate limiter, and extended thinking quota impact. The monitoring features in v1.1.0 are informed by this research.
-- **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds, and multi-stream JSONL logging. Works with any Claude client that supports `ANTHROPIC_BASE_URL` (CLI, VS Code extension, desktop app), complementing this package's CLI-only `NODE_OPTIONS` approach.
-- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring, and forced-restart/compaction detection. Reads from Claude Code's native session JSONL files and optionally from an HTTP proxy NDJSON stream. v1.4.0 documented the forced-session-restart mechanism at quota-cap boundaries (~490K tokens per event) and the 78–91% cache-wipe pattern at compaction events. Complementary to our interceptor's in-process vantage point. See [Works with @fgrosswig's dashboard](#works-with-fgrosswigs-dashboard) below for the interop pattern.
-## Works with @fgrosswig's dashboard
-This interceptor and [@fgrosswig](https://github.com/fgrosswig)'s
-[claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)
-solve strongly complementary problems. The interceptor captures per-call API
-data from inside the Node.js process — cache metrics, quota state, TTL tier,
-rewrites applied. The dashboard provides the visualization layer — historical
-trending, per-day charts, multi-host aggregation, cache-health scoring.
-Running both gives you the best of both tools, and the integration is a
-one-liner thanks to the dashboard's tolerant NDJSON ingest and our new
-`usage-to-dashboard-ndjson` translator.
-### Quick setup
-```bash
-# Install both tools
-npm install -g claude-code-cache-fix
-# (follow fgrosswig's dashboard install: https://github.com/fgrosswig/claude-usage-dashboard)
-# One-shot translation (reads ~/.claude/usage.jsonl, writes to
-# ~/.claude/anthropic-proxy-logs/proxy-YYYY-MM-DD.ndjson, which his
-# dashboard already watches)
-node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs
-# Or keep it live-updating as the interceptor logs new calls
-node $(npm root -g)/claude-code-cache-fix/tools/usage-to-dashboard-ndjson.mjs --follow &
-```
-No configuration required on the dashboard side — fgrosswig's
-`collectProxyNdjsonFiles()` auto-discovers files in
-`~/.claude/anthropic-proxy-logs/` (or `$ANTHROPIC_PROXY_LOG_DIR`), and our
-translator writes to exactly that path with the expected `proxy-YYYY-MM-DD.ndjson`
-filename convention. The dashboard's tolerant ingestion layer ignores unknown
-fields, so interceptor-specific extras (`ttl_tier`, `ephemeral_1h_input_tokens`,
-`ephemeral_5m_input_tokens`, `peak_hour`, quota state) pass through cleanly
-and remain available to downstream consumers that know to read them.
-The `cost_factor` metric in `tools/cost-report.mjs` also comes from
-fgrosswig's methodology — the `(input + output + cache_read + cache_creation) / output`
-ratio that gives a single-number measure of how much context is being paid
-per useful output token. A rising cost factor across a long session is the
-measurable signature of cache-efficiency degradation.
+- **[@ArkNill/claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis)** — 38,996-request proxy-based analysis: 7 bugs (microcompact, budget caps, false rate limiter, JSONL duplication, extended thinking), GrowthBook feature flag causal testing, Opus 4.7 burn rate advisory. The monitoring features in v1.1.0 are informed by this research.
+- **[@Renvect/X-Ray-Claude-Code-Interceptor](https://github.com/Renvect/X-Ray-Claude-Code-Interceptor)** — Diagnostic HTTPS proxy with real-time dashboard, system prompt section diffing, per-tool stripping thresholds. Works with any Claude client that supports `ANTHROPIC_BASE_URL`.
+- **[@fgrosswig/claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard)** — Self-hosted forensic dashboard with SSE live monitoring, multi-host aggregation, cache-health scoring. Complementary to our proxy's vantage point. See [docs/dashboard-integration.md](docs/dashboard-integration.md) for the interop setup.
 ## Used in production
@@ -597,17 +291,16 @@ measurable signature of cache-efficiency degradation.
 - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
 - **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk.
 - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
-- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
-- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
+- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, proxy architecture, package maintainer
+- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification, fingerprint verification fix for CC v2.1.108+ (PR #21), Korean README (PR #22), [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
 - **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
 - **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
-- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
+- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate)
 - **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
 - **[@beekamai](https://github.com/beekamai)** — Windows URL-encoding fix for `claude-fixed.bat` when npm root contains spaces (PR #17)
 - **[@JEONG-JIWOO](https://github.com/JEONG-JIWOO)** — VS Code extension investigation: discovered `claudeCode.claudeProcessWrapper` as the working integration path, wrote the C wrapper for Windows (#16)
 - **[@X-15](https://github.com/X-15)** — VS Code extension validation, per-fix health status analysis confirming safety check behavior on v2.1.105 (#16)
-- **[@ArkNill](https://github.com/ArkNill)** — Fingerprint verification fix for CC v2.1.108+ (`isMeta` filter change, PR #21), Korean README (PR #22), original [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
-- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery
+- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
 If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.

package/package.json CHANGED Viewed

@@ -1,12 +1,12 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "3.0.0",
+  "version": "3.0.2",
   "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
   "type": "module",
   "exports": "./preload.mjs",
   "main": "./preload.mjs",
   "bin": {
-    "claude-via-proxy": "./bin/claude-via-proxy.mjs"
+    "cache-fix-proxy": "./bin/claude-via-proxy.mjs"
   },
   "files": [
     "preload.mjs",

package/preload.mjs CHANGED Viewed

@@ -2268,7 +2268,7 @@ globalThis.fetch = async function (url, options) {
             }
             return true;
           });
-          if (kept.length !== msg.content.length) msg.content = kept;
+          if (kept.length !== msg.content.length && kept.length > 0) msg.content = kept;
         }
         if (trailerStripped > 0) {
           modified = true;
@@ -2340,7 +2340,7 @@ globalThis.fetch = async function (url, options) {
             }
             return true;
           });
-          if (kept.length !== msg.content.length) msg.content = kept;
+          if (kept.length !== msg.content.length && kept.length > 0) msg.content = kept;
         }
         if (reminderStripped > 0) {
           modified = true;

package/proxy/pipeline.mjs CHANGED Viewed

@@ -1,5 +1,6 @@
 import { readdir, readFile } from "node:fs/promises";
 import { join } from "node:path";
+import { pathToFileURL } from "node:url";
 let registry = [];
@@ -16,7 +17,7 @@ export async function loadExtensions(dir, configPath) {
   const extensions = [];
   for (const file of mjsFiles) {
     try {
-      const mod = await import(join(dir, file) + "?t=" + Date.now());
+      const mod = await import(pathToFileURL(join(dir, file)).href + "?t=" + Date.now());
       const ext = mod.default;
       if (!ext || !ext.name) continue;