claude-code-cache-fix 1.6.2-debug.1 → 1.6.3

Files changed (4)

package/README.md CHANGED

@@ -94,6 +94,90 @@ This keeps images in the last 3 user messages and replaces older ones with a tex
 Set to `0` (default) to disable.
+## System prompt rewrite (optional)
+The interceptor can also rewrite Claude Code's `# Output efficiency` system-prompt section before the request is sent.
+This feature is **optional** and **disabled by default**. If `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` is unset, nothing is changed.
+Enable it by setting a replacement text:
+```bash
+export CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT=$'# Output efficiency\n\n...'
+```
+The rewrite is intentionally narrow:
+- Only Claude Code's `# Output efficiency` section is replaced
+- Other system prompt sections are preserved
+- Existing system block structure and fields such as `cache_control` are preserved
+This may be useful for users who want to stay on current Claude Code versions but experiment with a different `Output efficiency` instruction set instead of downgrading to an earlier release.
+### Prompt variants
+<details>
+<summary>Anthropic internal / <code>USER_TYPE=ant</code> version</summary>
+```text
+# Output efficiency
+When sending user-facing text, you're writing for a person, not logging to a console. Assume users can't see most tool calls or thinking - only your text output. Before your first tool call, briefly state what you're about to do. While working, give short updates at key moments: when you find something load-bearing (a bug, a root cause), when changing direction, when you've made progress without an update.
+When you give updates, assume the recipient may have stepped away and lost the thread. They do not know your internal shorthand, codenames, or half-formed plan. Write in complete, grammatical sentences that can be understood cold. Spell out technical terms when helpful. If unsure, err on the side of a bit more explanation. Adapt to the user's expertise: experts can handle denser updates, but don't make novice users reconstruct context on their own.
+User-facing text should read like natural prose. Avoid clipped sentence fragments, excessive dashes, symbolic shorthand, or formatting that reads like console output. Use tables only when they genuinely improve scanability, such as compact facts (files, lines, pass/fail) or quantitative comparisons. Keep explanatory reasoning in prose around the table, not inside it. Avoid semantic backtracking: structure sentences so the user can follow them linearly without having to reinterpret earlier clauses after reading later ones.
+Optimize for fast human comprehension, not minimal surface area. If the user has to reread your summary or ask a follow-up just to understand what happened, you saved the wrong tokens. Match the level of structure to the task: for a simple question, answer in plain prose without unnecessary headings or numbered lists. While staying clear and direct, also be concise and avoid fluff. Skip filler, obvious restatements, and throat-clearing. Get to the point. Don't over-focus on low-signal details from your process. When it helps, use an inverted pyramid structure with the conclusion first and details later.
+These user-facing text instructions do not apply to code or tool calls.
+```
+</details>
+<details>
+<summary>Public / default Claude Code version</summary>
+```text
+# Output efficiency
+IMPORTANT: Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.
+Your text output is brief, direct, and to the point. Lead with the answer or action, not the reasoning. Omit filler, preamble, and unnecessary transitions. Do not restate the user's request; move directly to the work. When explanation is needed, include only what helps the user understand the outcome.
+Prioritize user-facing text for:
+- decisions that require user input
+- high-signal progress updates at natural milestones
+- errors or blockers that change the plan
+If a sentence can do the job, do not turn it into three. Favor short, direct constructions over long explanatory prose. These instructions do not apply to code or tool calls.
+```
+</details>
+<details>
+<summary>Example custom replacement (a middle-ground version combining the two versions above)</summary>
+```text
+# Output efficiency
+When sending user-facing text, write for a person, not a log file. Assume the user cannot see most tool calls or hidden reasoning - only your text output.
+Keep user-facing text clear, direct, and reasonably concise. Lead with the answer or action. Skip filler, repetition, and unnecessary preamble.
+Explain enough for the user to understand the reasoning, tradeoffs, or root cause when that would help them learn or make a decision, but do not turn simple answers into long writeups.
+These instructions apply to user-facing text only. They do not apply to investigation, code reading, tool use, or verification.
+Before making changes, read the relevant code and understand the surrounding context. Check types, signatures, call sites, and error causes before editing. Do not confuse brevity with rushing, and do not replace understanding with trial and error.
+While working, give short updates at meaningful moments: when you find the root cause, when the plan changes, when you hit a blocker, or when a meaningful milestone is complete. Do not narrate every step.
+When reporting results, be accurate and concrete. If you did not verify something, say so plainly. If a check failed, say that plainly too.
+```
+</details>
 ## Monitoring
 The interceptor includes monitoring for several additional issues identified by the community:
@@ -146,6 +230,7 @@ Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
 - `APPLIED: tool order stabilization` — tools were reordered
 - `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
 - `APPLIED: stripped N images from old tool results` — images were stripped
+- `APPLIED: output efficiency section rewritten` — output-efficiency section was replaced
 - `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
 - `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
 - `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit
@@ -154,6 +239,7 @@ Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
 - `CACHE TTL: tier=1h create=N read=N hit=N% (1h=N 5m=N)` — TTL tier and cache hit rate per call
 - `PEAK HOUR: weekday 13:00-19:00 UTC` — Anthropic peak hour throttling active
 - `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
+- `SKIPPED: output efficiency rewrite (section not found)` — no matching output-efficiency section found
 ### Prefix diff mode
@@ -172,6 +258,7 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 | `CACHE_FIX_DEBUG` | `0` | Enable debug logging to `~/.claude/cache-fix-debug.log` |
 | `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
 | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
+| `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
 | `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
 ## Limitations
@@ -179,6 +266,7 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
 - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-enforced TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
 - **Microcompact is not preventable** — The monitoring features detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags with no client-side disable option.
+- **System prompt rewrite is experimental** — This hook only rewrites one system-prompt section and is opt-in, but there are still unknowns: it is not proven that this prompt text is responsible for the behavior differences discussed in community reports, and it is not known whether future server-side validation could react to modified system prompts. Use at your own risk.
 - **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
 ## Tracked issues
@@ -189,6 +277,7 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 - [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads 0% context on v2.1.91
 - [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
 - [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
+- [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion around the `Output efficiency` system-prompt change and its possible effect on model behavior
 ## Related research
@@ -197,7 +286,7 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
 ## Contributors
-- **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, and tighter block matchers
+- **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
 - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
 - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
 - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
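
The README's enablement step sets a raw replacement string through `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. The sketch below restates, in standalone form, how the `normalizeOutputEfficiencyReplacement` helper added to `preload.mjs` later in this diff appears to treat that value. The function name `normalizeReplacement` and the sample inputs are illustrative assumptions, and the real helper reads the value from `process.env` rather than taking an argument.

```javascript
// Standalone restatement for illustration only; not the package's export.
const SECTION_HEADER = "# Output efficiency";

function normalizeReplacement(raw) {
  const trimmed = typeof raw === "string" ? raw.trim() : "";
  if (!trimmed) return ""; // unset or whitespace-only: the rewrite stays off
  // Prepend the section header when the user supplied only the body text.
  return trimmed.startsWith(SECTION_HEADER)
    ? trimmed
    : `${SECTION_HEADER}\n\n${trimmed}`;
}

const withHeader = normalizeReplacement("# Output efficiency\n\nBe clear.\n");
const withoutHeader = normalizeReplacement("Be clear.");
const disabled = normalizeReplacement("   ");

console.log(JSON.stringify(withHeader));
console.log(JSON.stringify(withoutHeader));
console.log(JSON.stringify(disabled));
```

Under these assumptions, a value with or without the leading header ends up as the same section text, and a blank value leaves the feature disabled.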

package/README.zh.md CHANGED

@@ -52,6 +52,7 @@ chmod +x ~/bin/claude-fixed
 ```
 If your npm global prefix is different, adjust `CLAUDE_NPM_CLI` accordingly. Find it with:
 ```bash
 npm root -g
 ```
@@ -94,6 +95,90 @@ export CACHE_FIX_IMAGE_KEEP_LAST=3
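 Set to `0` (default) to disable.
+## System prompt rewrite (optional)
+The interceptor can also rewrite Claude Code's `# Output efficiency` system-prompt section before the request is sent.
+This feature is **optional** and **disabled by default**. If `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` is unset, nothing is changed.
+Enable it by setting a replacement text: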
+```bash
+export CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT=$'# Output efficiency\n\n...'
+```
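+The rewrite is intentionally narrow:
+- Only Claude Code's `# Output efficiency` section is replaced
+- Other system prompt sections are preserved
+- The existing system block structure and fields such as `cache_control` are preserved
+This may be useful for users who want to stay on newer Claude Code versions but try a different `Output efficiency` instruction set instead of downgrading to an older release.
+### Prompt variants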
+<details>
+<summary>Anthropic internal / <code>USER_TYPE=ant</code> version</summary>
+```text
+# Output efficiency
+When sending user-facing text, you're writing for a person, not logging to a console. Assume users can't see most tool calls or thinking - only your text output. Before your first tool call, briefly state what you're about to do. While working, give short updates at key moments: when you find something load-bearing (a bug, a root cause), when changing direction, when you've made progress without an update.
+When you give updates, assume the recipient may have stepped away and lost the thread. They do not know your internal shorthand, codenames, or half-formed plan. Write in complete, grammatical sentences that can be understood cold. Spell out technical terms when helpful. If unsure, err on the side of a bit more explanation. Adapt to the user's expertise: experts can handle denser updates, but don't make novice users reconstruct context on their own.
+User-facing text should read like natural prose. Avoid clipped sentence fragments, excessive dashes, symbolic shorthand, or formatting that reads like console output. Use tables only when they genuinely improve scanability, such as compact facts (files, lines, pass/fail) or quantitative comparisons. Keep explanatory reasoning in prose around the table, not inside it. Avoid semantic backtracking: structure sentences so the user can follow them linearly without having to reinterpret earlier clauses after reading later ones.
+Optimize for fast human comprehension, not minimal surface area. If the user has to reread your summary or ask a follow-up just to understand what happened, you saved the wrong tokens. Match the level of structure to the task: for a simple question, answer in plain prose without unnecessary headings or numbered lists. While staying clear and direct, also be concise and avoid fluff. Skip filler, obvious restatements, and throat-clearing. Get to the point. Don't over-focus on low-signal details from your process. When it helps, use an inverted pyramid structure with the conclusion first and details later.
+These user-facing text instructions do not apply to code or tool calls.
+```
+</details>
+<details>
+<summary>Public / default Claude Code version</summary>
+```text
+# Output efficiency
+IMPORTANT: Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.
+Your text output is brief, direct, and to the point. Lead with the answer or action, not the reasoning. Omit filler, preamble, and unnecessary transitions. Do not restate the user's request; move directly to the work. When explanation is needed, include only what helps the user understand the outcome.
+Prioritize user-facing text for:
+- decisions that require user input
+- high-signal progress updates at natural milestones
+- errors or blockers that change the plan
+If a sentence can do the job, do not turn it into three. Favor short, direct constructions over long explanatory prose. These instructions do not apply to code or tool calls.
+```
+</details>
+<details>
+<summary>Example custom replacement (a middle-ground version combining the two versions above)</summary>
+```text
+# Output efficiency
+When sending user-facing text, write for a person, not a log file. Assume the user cannot see most tool calls or hidden reasoning - only your text output.
+Keep user-facing text clear, direct, and reasonably concise. Lead with the answer or action. Skip filler, repetition, and unnecessary preamble.
+Explain enough for the user to understand the reasoning, tradeoffs, or root cause when that would help them learn or make a decision, but do not turn simple answers into long writeups.
+These instructions apply to user-facing text only. They do not apply to investigation, code reading, tool use, or verification.
+Before making changes, read the relevant code and understand the surrounding context. Check types, signatures, call sites, and error causes before editing. Do not confuse brevity with rushing, and do not replace understanding with trial and error.
+While working, give short updates at meaningful moments: when you find the root cause, when the plan changes, when you hit a blocker, or when a meaningful milestone is complete. Do not narrate every step.
+When reporting results, be accurate and concrete. If you did not verify something, say so plainly. If a check failed, say that plainly too.
+```
+</details>
 ## Monitoring
 The interceptor includes monitoring for several additional issues identified by the community:
@@ -137,7 +222,32 @@ node tools/cost-report.mjs --admin-key <key>  # cross-check against the Admin API
 CACHE_FIX_DEBUG=1 claude-fixed
 ```
-Logs are written to `~/.claude/cache-fix-debug.log`.
+Logs are written to `~/.claude/cache-fix-debug.log`. Look for:
+- `APPLIED: resume message relocation` — block scatter was detected and fixed
+- `APPLIED: tool order stabilization` — tools were reordered
+- `APPLIED: fingerprint stabilized from XXX to YYY` — fingerprint was corrected
+- `APPLIED: stripped N images from old tool results` — images were stripped from old tool results
+- `APPLIED: output efficiency section rewritten` — the `output efficiency` section was replaced
+- `MICROCOMPACT: N/M tool results cleared` — microcompact degradation detected
+- `BUDGET WARNING: tool result chars at N / 200,000 threshold` — approaching budget cap
+- `FALSE RATE LIMIT: synthetic model detected` — client-side false rate limit detected
+- `GROWTHBOOK FLAGS: {...}` — server-controlled flags logged on the first call
+- `PROMPT SIZE: system=N tools=N injected=N (skills=N mcp=N ...)` — per-call prompt size breakdown
+- `CACHE TTL: tier=1h create=N read=N hit=N% (1h=N 5m=N)` — TTL tier and cache hit rate per call
+- `PEAK HOUR: weekday 13:00-19:00 UTC` — Anthropic peak hour throttling active
+- `SKIPPED: resume relocation (not a resume or already correct)` — no fix needed
+- `SKIPPED: output efficiency rewrite (section not found)` — no matching `output efficiency` section found
+### Prefix diff mode
+Enable cross-process prefix snapshot diffing to diagnose cache invalidation after a restart:
+```bash
+CACHE_FIX_PREFIXDIFF=1 claude-fixed
+```
+Snapshots are saved to `~/.claude/cache-fix-snapshots/`, and a diff report is generated on the first API call after a restart.
 ## Environment variables
@@ -146,6 +256,7 @@ CACHE_FIX_DEBUG=1 claude-fixed
 | `CACHE_FIX_DEBUG` | `0` | Enable debug logging |
 | `CACHE_FIX_PREFIXDIFF` | `0` | Enable prefix snapshot diffing |
 | `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in the last N user messages (0 = disabled) |
+| `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
 | `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for the per-call usage telemetry log |
 ## Limitations
@@ -153,14 +264,28 @@ CACHE_FIX_DEBUG=1 claude-fixed
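 - **npm installation only** — The standalone Claude Code binary has Zig-level attestation that bypasses Node.js. This fix only works with the npm package (`npm install -g @anthropic-ai/claude-code`).
 - **Overage TTL downgrade** — Exceeding 100% of the 5-hour quota triggers a server-side TTL downgrade from 1h to 5m. This is a server-side decision and cannot be fixed client-side. The interceptor prevents the cache instability that can push you into overage in the first place.
 - **Microcompact is not preventable** — The monitoring features can detect context degradation but cannot prevent it. The microcompact and budget enforcement mechanisms are server-controlled via GrowthBook flags, with no client-side disable option.
+- **System prompt rewrite is experimental** — This hook only rewrites one system-prompt section and is disabled by default, but there are still unknowns: it has not been shown that this prompt text is itself the root cause of the behavior differences in community reports, and it cannot be confirmed whether future server-side validation will react to modified system prompts. Use at your own risk.
+- **Version coupling** — The fingerprint salt and block detection heuristics are derived from Claude Code internals. A major refactor could require an update to this package.
 ## Related issues
 - [#34629](https://github.com/anthropics/claude-code/issues/34629) — Original report of the resume cache regression
 - [#40524](https://github.com/anthropics/claude-code/issues/40524) — In-session fingerprint invalidation, image persistence
 - [#42052](https://github.com/anthropics/claude-code/issues/42052) — Community interceptor development, TTL downgrade discovery
-- [#44045](https://github.com/anthropics/claude-code/issues/44045) — Prompt cache partially missing on resume
-- [#41930](https://github.com/anthropics/claude-code/issues/41930) — Abnormal usage consumption from multiple root causes
+- [#43044](https://github.com/anthropics/claude-code/issues/43044) — Resume loads only 0% context on v2.1.91
+- [#43657](https://github.com/anthropics/claude-code/issues/43657) — Resume cache invalidation confirmed on v2.1.92
+- [#44045](https://github.com/anthropics/claude-code/issues/44045) — SDK-level reproduction with token measurements
+- [#32508](https://github.com/anthropics/claude-code/issues/32508) — Community discussion of the `Output efficiency` system-prompt change and its possible effect on model behavior
+## Contributors
+- **[@VictorSun92](https://github.com/VictorSun92)** — Author of the original monkey-patch fix for v2.1.88, identified partial block scatter on v2.1.90, and contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
+- **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
+- **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, package maintainer
+- **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification
+- **[@Renvect](https://github.com/Renvect)** — Discovery of the duplicate image sending issue, cross-project directory contamination analysis
+If you took part in the community work on these issues but are not listed yet, please open an issue or PR. We want to credit every contributor properly.
 ## License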

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "1.6.2-debug.1",
+  "version": "1.6.3",
   "description": "Fixes prompt cache regression in Claude Code that causes up to 20x cost increase on resumed sessions",
   "type": "module",
   "exports": "./preload.mjs",
@@ -12,6 +12,9 @@
   "engines": {
     "node": ">=18"
   },
+  "scripts": {
+    "test": "node --test 'test/**/*.test.mjs'"
+  },
   "keywords": [
     "claude-code",
     "claude",

package/preload.mjs CHANGED

@@ -276,7 +276,13 @@ function stripSessionKnowledge(text) {
  * prepends them to messages[0]. Idempotent across calls.
  */
 function normalizeResumeMessages(messages) {
-  if (!Array.isArray(messages) || messages.length < 2) return messages;
+  if (!Array.isArray(messages)) return messages;
+  // NOTE: We used to return early here for messages.length < 2 (fresh sessions)
+  // because there's nothing to relocate. But this left the first call's blocks
+  // in CC's raw, non-deterministic order. On call 2+, sorting/pinning would run
+  // and produce DIFFERENT bytes — busting cache on the first resume turn.
+  // Fix: always run sort+pin, even on single-message calls, so the first call
+  // establishes a deterministic baseline. (@bilby91 #44045)
   let firstUserIdx = -1;
   for (let i = 0; i < messages.length; i++) {
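
The comment in the hunk above argues that skipping normalization on single-message calls leaves the first call with different bytes than later calls. A simplified sketch of that effect follows. The helpers `sortBlocks`, `normalizeOld`, and `normalizeNew` and the sample block texts are hypothetical stand-ins, and the real `normalizeResumeMessages` also relocates and pins blocks, which this sketch omits.

```javascript
// Hypothetical stand-in for the sort step: order blocks by text so the
// serialized bytes do not depend on the order the client emitted them in.
function sortBlocks(blocks) {
  return [...blocks].sort((a, b) => (a.text < b.text ? -1 : a.text > b.text ? 1 : 0));
}

function sortAll(messages) {
  return messages.map((m) => ({ ...m, content: sortBlocks(m.content) }));
}

// Old guard: single-message calls are returned untouched.
function normalizeOld(messages) {
  if (!Array.isArray(messages) || messages.length < 2) return messages;
  return sortAll(messages);
}

// New guard: every call is normalized, including the first.
function normalizeNew(messages) {
  if (!Array.isArray(messages)) return messages;
  return sortAll(messages);
}

const first = {
  role: "user",
  content: [
    { type: "text", text: "skills: b" },
    { type: "text", text: "deferred tools: a" },
  ],
};
const call1 = [first];
const call2 = [
  first,
  { role: "assistant", content: [{ type: "text", text: "ok" }] },
  { role: "user", content: [{ type: "text", text: "next" }] },
];

// Compare the serialized first message, which sits at the front of the prefix.
const prefix = (msgs) => JSON.stringify(msgs[0]);
const oldStable = prefix(normalizeOld(call1)) === prefix(normalizeOld(call2));
const newStable = prefix(normalizeNew(call1)) === prefix(normalizeNew(call2));

console.log({ oldStable, newStable });
```

In this toy setup the old guard yields a first message that serializes differently on call 1 and call 2, while the new guard yields identical bytes on both.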
@@ -503,6 +509,76 @@ function stabilizeToolOrder(tools) {
   });
 }
+// --------------------------------------------------------------------------
+// System prompt rewrite (optional)
+// --------------------------------------------------------------------------
+const OUTPUT_EFFICIENCY_SECTION_HEADER = "# Output efficiency";
+const OUTPUT_EFFICIENCY_REPLACEMENT_RAW =
+  process.env.CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT || "";
+const OUTPUT_EFFICIENCY_SECTION_REPLACEMENT =
+  normalizeOutputEfficiencyReplacement(OUTPUT_EFFICIENCY_REPLACEMENT_RAW);
+function normalizeOutputEfficiencyReplacement(text) {
+  const trimmed = typeof text === "string" ? text.trim() : "";
+  if (!trimmed) return "";
+  return trimmed.startsWith(OUTPUT_EFFICIENCY_SECTION_HEADER)
+    ? trimmed
+    : `${OUTPUT_EFFICIENCY_SECTION_HEADER}\n\n${trimmed}`;
+}
+/**
+ * Replace Claude Code's entire output-efficiency section in-place while
+ * preserving the existing system block structure and cache_control fields.
+ */
+function rewriteOutputEfficiencyInstruction(system) {
+  if (!Array.isArray(system) || !OUTPUT_EFFICIENCY_SECTION_REPLACEMENT) {
+    return null;
+  }
+  let changed = false;
+  const rewritten = system.map((block) => {
+    if (
+      block?.type !== "text" ||
+      typeof block.text !== "string" ||
+      !block.text.includes(OUTPUT_EFFICIENCY_SECTION_HEADER)
+    ) {
+      return block;
+    }
+    const nextText = replaceOutputEfficiencySection(block.text);
+    if (!nextText || nextText === block.text) {
+      return block;
+    }
+    changed = true;
+    return { ...block, text: nextText };
+  });
+  return changed ? rewritten : null;
+}
+function replaceOutputEfficiencySection(text) {
+  const start = text.indexOf(OUTPUT_EFFICIENCY_SECTION_HEADER);
+  if (start === -1) return null;
+  const afterHeader = start + OUTPUT_EFFICIENCY_SECTION_HEADER.length;
+  const remainder = text.slice(afterHeader);
+  const nextHeadingMatch = remainder.match(/\n# [^\n]+/);
+  if (!nextHeadingMatch || nextHeadingMatch.index == null) {
+    return text.slice(0, start) + OUTPUT_EFFICIENCY_SECTION_REPLACEMENT;
+  }
+  const nextHeadingStart = afterHeader + nextHeadingMatch.index + 1;
+  return (
+    text.slice(0, start) +
+    OUTPUT_EFFICIENCY_SECTION_REPLACEMENT +
+    "\n\n" +
+    text.slice(nextHeadingStart)
+  );
+}
 // --------------------------------------------------------------------------
 // Fetch interceptor
 // --------------------------------------------------------------------------
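
To make the section-replacement logic in the hunk above easier to follow, here is a standalone sketch run against sample data. The constant `REPLACEMENT` stands in for the normalized environment value, and the sample system blocks are invented for illustration. The matching logic mirrors `replaceOutputEfficiencySection` as shown in the diff, but this is a restatement and not the package's code path.

```javascript
const HEADER = "# Output efficiency";
// Assumed stand-in for the normalized CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT value.
const REPLACEMENT = "# Output efficiency\n\nWrite for a person.";

function replaceSection(text) {
  const start = text.indexOf(HEADER);
  if (start === -1) return null;
  const afterHeader = start + HEADER.length;
  // Only a top-level heading ("\n# ") ends the section; "\n## " does not match.
  const next = text.slice(afterHeader).match(/\n# [^\n]+/);
  if (!next) return text.slice(0, start) + REPLACEMENT;
  const nextStart = afterHeader + next.index + 1; // skip the leading newline
  return text.slice(0, start) + REPLACEMENT + "\n\n" + text.slice(nextStart);
}

// Invented sample payload: two text blocks, one carrying cache_control.
const system = [
  { type: "text", text: "You are Claude Code." },
  {
    type: "text",
    text:
      "# Tone\nBe kind.\n\n# Output efficiency\nBe extra concise.\n" +
      "## Detail\nStill old.\n\n# Tools\nUse them.",
    cache_control: { type: "ephemeral" },
  },
];

const rewritten = system.map((block) => {
  if (block.type !== "text" || !block.text.includes(HEADER)) return block;
  const nextText = replaceSection(block.text);
  // Spread keeps every other field (such as cache_control) untouched.
  return nextText && nextText !== block.text ? { ...block, text: nextText } : block;
});

console.log(rewritten[1].text);
console.log(rewritten[1].cache_control);
```

Two properties of this approach are visible in the sample: the `## Detail` subsection is swallowed along with the rest of the old section because only a top-level `# ` heading ends it, and a block whose text is already equal to the rewritten form is returned unchanged.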
@@ -518,6 +594,7 @@ import { join } from "node:path";
 const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
 const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
+const NORMALIZE_IDENTITY = process.env.CACHE_FIX_NORMALIZE_IDENTITY === "1";
 const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
 const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
 const USAGE_JSONL = process.env.CACHE_FIX_USAGE_LOG || join(homedir(), ".claude", "usage.jsonl");
@@ -571,6 +648,7 @@ function dumpGrowthBookFlags() {
       cold_compact: features.tengu_cold_compact,
       system_prompt_global_cache: features.tengu_system_prompt_global_cache,
       compact_cache_prefix: features.tengu_compact_cache_prefix,
+      onyx_plover: features.tengu_onyx_plover,
     };
     debugLog("GROWTHBOOK FLAGS:", JSON.stringify(interesting, null, 2));
   } catch (e) {
@@ -843,6 +921,49 @@ globalThis.fetch = async function (url, options) {
         }
       }
+      // Bug 6: Identity string normalization for Agent()/SendMessage() cache parity
+      // The CC orchestrator emits a different identity string in system[1] depending
+      // on whether the call originated from Agent() vs SendMessage() (subagent resume):
+      //   Agent():       "You are Claude Code, Anthropic's official CLI for Claude."
+      //   SendMessage(): "You are a Claude agent, built on Anthropic's Claude Agent SDK."
+      // Both blocks carry cache_control: ephemeral. The ~50-char identity swap is enough
+      // to invalidate the entire cache prefix, producing cache_read=0 on first SendMessage
+      // turn even though system[2] (the actual instructions) is byte-identical.
+      // Confirmed by @labzink via mitmproxy on #44724.
+      // Opt-in because it's a model-perceivable behavior change (subagent thinks it's CC).
+      if (NORMALIZE_IDENTITY && payload.system && Array.isArray(payload.system)) {
+        const CANONICAL = "You are Claude Code, Anthropic's official CLI for Claude.";
+        const AGENT_SDK = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
+        let normalized = 0;
+        payload.system = payload.system.map((block) => {
+          if (
+            block?.type === "text" &&
+            typeof block.text === "string" &&
+            block.text.startsWith(AGENT_SDK)
+          ) {
+            normalized++;
+            return { ...block, text: CANONICAL + block.text.slice(AGENT_SDK.length) };
+          }
+          return block;
+        });
+        if (normalized > 0) {
+          modified = true;
+          debugLog(`APPLIED: identity normalized on ${normalized} system block(s) (Agent SDK → Claude Code)`);
+        }
+      }
+      // Optional: rewrite Claude Code's default output-efficiency section
+      if (payload.system && OUTPUT_EFFICIENCY_SECTION_REPLACEMENT) {
+        const rewritten = rewriteOutputEfficiencyInstruction(payload.system);
+        if (rewritten) {
+          payload.system = rewritten;
+          modified = true;
+          debugLog("APPLIED: output efficiency section rewritten");
+        } else {
+          debugLog("SKIPPED: output efficiency rewrite (section not found)");
+        }
+      }
       // Bug 5: 1h TTL enforcement
       // The client gates 1h cache TTL behind a GrowthBook allowlist that checks
       // querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
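
The Bug 6 comment in the hunk above describes two identity strings that differ while the remaining system blocks stay byte-identical. The sketch below applies the same prefix swap to two invented payloads and compares the serialized results. The wrapper `normalizeIdentity` and the sample blocks are assumptions made for illustration; the two identity strings are copied from the diff.

```javascript
const CANONICAL = "You are Claude Code, Anthropic's official CLI for Claude.";
const AGENT_SDK = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";

// Hypothetical wrapper around the mapping shown in the diff.
function normalizeIdentity(system) {
  let normalized = 0;
  const blocks = system.map((block) => {
    if (
      block?.type === "text" &&
      typeof block.text === "string" &&
      block.text.startsWith(AGENT_SDK)
    ) {
      normalized++;
      // Swap only the identity prefix; keep the rest of the text and all fields.
      return { ...block, text: CANONICAL + block.text.slice(AGENT_SDK.length) };
    }
    return block;
  });
  return { blocks, normalized };
}

// Invented payloads: identical except for the identity block.
const instructions = { type: "text", text: "Follow the project conventions." };
const agentCall = [
  { type: "text", text: CANONICAL, cache_control: { type: "ephemeral" } },
  instructions,
];
const sendMessageCall = [
  { type: "text", text: AGENT_SDK, cache_control: { type: "ephemeral" } },
  instructions,
];

const a = normalizeIdentity(agentCall);
const b = normalizeIdentity(sendMessageCall);
const samePrefix = JSON.stringify(a.blocks) === JSON.stringify(b.blocks);

console.log({ samePrefix, normalizedA: a.normalized, normalizedB: b.normalized });
```

As the diff's comment notes, the trade-off is that the subagent is then told it is Claude Code, which is why the behavior sits behind `CACHE_FIX_NORMALIZE_IDENTITY=1`.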
@@ -1120,3 +1241,31 @@ async function drainTTLFromClone(clone, model, quotaHeaders) {
     }
   }
 }
+// --------------------------------------------------------------------------
+// Test exports
+// --------------------------------------------------------------------------
+//
+// These exports exist for unit testing the pure functions in this file. They
+// have no effect on the interceptor's runtime behavior — production callers
+// load this module via NODE_OPTIONS=--import and never use named imports.
+// Tests import from this file directly: `import { sortSkillsBlock } from
+// '../preload.mjs'`. The fetch patching above runs at import time but is
+// harmless in a test process since tests do not make fetch calls.
+export {
+  sortSkillsBlock,
+  sortDeferredToolsBlock,
+  pinBlockContent,
+  stripSessionKnowledge,
+  stabilizeFingerprint,
+  computeFingerprint,
+  isSkillsBlock,
+  isDeferredToolsBlock,
+  isHooksBlock,
+  isMcpBlock,
+  isRelocatableBlock,
+  rewriteOutputEfficiencyInstruction,
+  normalizeOutputEfficiencyReplacement,
+  _pinnedBlocks,  // exported so tests can reset between runs
+};