claude-code-cache-fix 1.7.2 → 1.9.0
- package/README.md +102 -4
- package/package.json +1 -1
- package/preload.mjs +396 -28
- package/tools/cost-report.mjs +18 -7
- package/tools/quota-statusline.sh +10 -0
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# claude-code-cache-fix
|
|
2
2
|
|
|
3
|
-
English | [中文](./README.zh.md)
|
|
3
|
+
English | [中文](./README.zh.md) | [Português](./docs/guia-pt-br.md)
|
|
4
4
|
|
|
5
5
|
Fixes prompt cache regressions in [Claude Code](https://github.com/anthropics/claude-code) that cause **up to 20x cost increase** on resumed sessions, plus monitoring for silent context degradation. Confirmed through v2.1.97.
|
|
6
6
|
|
|
@@ -36,7 +36,10 @@ Create a wrapper script (e.g. `~/bin/claude-fixed`):
|
|
|
36
36
|
|
|
37
37
|
```bash
|
|
38
38
|
#!/bin/bash
|
|
39
|
-
|
|
39
|
+
NPM_GLOBAL_ROOT="$(npm root -g 2>/dev/null)"
|
|
40
|
+
|
|
41
|
+
CLAUDE_NPM_CLI="$NPM_GLOBAL_ROOT/@anthropic-ai/claude-code/cli.js"
|
|
42
|
+
CACHE_FIX="$NPM_GLOBAL_ROOT/claude-code-cache-fix/preload.mjs"
|
|
40
43
|
|
|
41
44
|
if [ ! -f "$CLAUDE_NPM_CLI" ]; then
|
|
42
45
|
echo "Error: Claude Code npm package not found at $CLAUDE_NPM_CLI" >&2
|
|
@@ -44,7 +47,13 @@ if [ ! -f "$CLAUDE_NPM_CLI" ]; then
|
|
|
44
47
|
exit 1
|
|
45
48
|
fi
|
|
46
49
|
|
|
47
|
-
|
|
50
|
+
if [ ! -f "$CACHE_FIX" ]; then
|
|
51
|
+
echo "Error: claude-code-cache-fix not found at $CACHE_FIX" >&2
|
|
52
|
+
echo "Install with: npm install -g claude-code-cache-fix" >&2
|
|
53
|
+
exit 1
|
|
54
|
+
fi
|
|
55
|
+
|
|
56
|
+
exec env NODE_OPTIONS="--import $CACHE_FIX" node "$CLAUDE_NPM_CLI" "$@"
|
|
48
57
|
```
|
|
49
58
|
|
|
50
59
|
```bash
|
|
@@ -105,6 +114,67 @@ The module intercepts `globalThis.fetch` before Claude Code makes API calls to `
|
|
|
105
114
|
|
|
106
115
|
All fixes are idempotent — if nothing needs fixing, the request passes through unmodified. The interceptor is read-only with respect to your conversation; it only normalizes the request structure before it hits the API.
|
|
107
116
|
|
|
117
|
+
## Graduating from Fixes
|
|
118
|
+
|
|
119
|
+
The interceptor serves three purposes with different lifecycles:
|
|
120
|
+
|
|
121
|
+
| Purpose | Examples | When to disable |
|
|
122
|
+
|---------|----------|-----------------|
|
|
123
|
+
| **Bug fixes** | Block relocation, fingerprint, tool sort, TTL | When CC fixes the underlying bug — check the health line |
|
|
124
|
+
| **Monitoring** | Quota tracking, microcompact detection, GrowthBook flags | Keep permanently — these detect future regressions |
|
|
125
|
+
| **Optimizations** | Image stripping, output efficiency rewrite | Keep as long as they help your workflow |
|
|
126
|
+
|
|
127
|
+
### Health status
|
|
128
|
+
|
|
129
|
+
On the first API call, the interceptor logs a one-time health status line (requires `CACHE_FIX_DEBUG=1`):
|
|
130
|
+
|
|
131
|
+
```
|
|
132
|
+
cache-fix health: relocate=active(2h ago) fingerprint=dormant(5 clean sessions) tool_sort=active ttl=active identity=waiting
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
Status meanings:
|
|
136
|
+
- **active(Xh ago)** — fix was applied recently
|
|
137
|
+
- **dormant(N clean sessions)** — bug not detected in N resume sessions; CC may have fixed it
|
|
138
|
+
- **safety-blocked(Nx)** — round-trip verification failed; CC changed its algorithm, fix auto-disabled
|
|
139
|
+
- **waiting** — fix hasn't been triggered yet
|
|
140
|
+
|
|
141
|
+
When a fix shows `dormant`, you can safely disable it:
|
|
142
|
+
```bash
|
|
143
|
+
export CACHE_FIX_SKIP_RELOCATE=1 # example
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
To disable all fixes but keep monitoring:
|
|
147
|
+
```bash
|
|
148
|
+
export CACHE_FIX_DISABLED=1
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
### Regression detection
|
|
152
|
+
|
|
153
|
+
If the cache_read ratio stays below 50% on each of the last 5 tracked calls, the interceptor writes a regression warning to the debug log. With fixes disabled, you'll see:
|
|
154
|
+
```
|
|
155
|
+
REGRESSION WARNING: cache_read ratio averaged 12% across last 5 calls (threshold: 50%).
|
|
156
|
+
Fixes are disabled — consider re-enabling to recover cache performance.
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
## Safety
|
|
160
|
+
|
|
161
|
+
### Fingerprint round-trip verification
|
|
162
|
+
|
|
163
|
+
Before rewriting the `cc_version` fingerprint, the interceptor verifies that its
|
|
164
|
+
hardcoded salt and character indices reproduce the fingerprint Claude Code sent.
|
|
165
|
+
If verification fails (CC changed its algorithm), the rewrite is skipped automatically.
|
|
166
|
+
This ensures the interceptor can never make cache performance *worse* than stock CC.
|
|
167
|
+
|
|
168
|
+
### Fail-safe design
|
|
169
|
+
|
|
170
|
+
Every fix is designed to fail to a no-op:
|
|
171
|
+
- If block detection regexes don't match → blocks aren't relocated (CC behavior)
|
|
172
|
+
- If fingerprint format changes → fingerprint isn't rewritten (CC behavior)
|
|
173
|
+
- If tool sort produces no changes → payload passes through untouched
|
|
174
|
+
- If TTL injection target structure changes → TTL isn't injected (CC behavior)
|
|
175
|
+
|
|
176
|
+
The interceptor can only *help* or *do nothing*. It cannot make things worse.
|
|
177
|
+
|
|
108
178
|
## Status line — quota warnings in real time
|
|
109
179
|
|
|
110
180
|
The interceptor writes quota state to `~/.claude/quota-status.json` on every API call. The included `tools/quota-statusline.sh` script reads this file and displays a live status line in Claude Code showing:
|
|
@@ -137,7 +207,23 @@ Add to `~/.claude/settings.json`:
|
|
|
137
207
|
}
|
|
138
208
|
```
|
|
139
209
|
|
|
140
|
-
###
|
|
210
|
+
### Recommended: disable git-status injection
|
|
211
|
+
|
|
212
|
+
Claude Code injects live `git status` output into the system prompt on every call. Any file edit changes the git status, which changes the system prompt, which busts the entire prefix cache. Disabling this saves ~1,800 tokens per call and fully stabilizes the system prompt across file edits:
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can still run `git status` via the Bash tool when it needs git context — it just won't pre-inject it into every system prompt.
|
|
219
|
+
|
|
220
|
+
The flag also shrinks the Bash tool description by ~6,364 chars (the Bash tool includes git-related instructions that are stripped when the flag is set), for a total prefix savings of ~7,180 chars (~1,800 tokens) per call.
|
|
221
|
+
|
|
222
|
+
Community-validated by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18-token cache creation across git state changes (vs thousands without the flag). See [#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11) for the full telemetry comparison.
|
|
223
|
+
|
|
224
|
+
**Note:** this flag does not address the `"Primary working directory"` line in the system prompt, which changes per git worktree. A v1.9.0 interceptor fix to strip/normalize both is planned ([#11](https://github.com/cnighswonger/claude-code-cache-fix/issues/11)).
|
|
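To make the cache effect concrete, here is a minimal sketch that applies the regular expression from this release's `CACHE_FIX_STRIP_GIT_STATUS` code path (copied from the preload.mjs changes in this diff) to two prompts that differ only in working-tree state. The sample prompt text and the `stripGitStatus` helper name are invented for illustration; real Claude Code prompts will look different.

```javascript
// Pattern and placeholder copied from the preload.mjs changes in this diff.
const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
const PLACEHOLDER = "gitStatus: [stripped by cache-fix for prefix stability]";

// Hypothetical helper: replace the volatile section with a fixed placeholder.
function stripGitStatus(text) {
  return text.replace(gitStatusPattern, PLACEHOLDER);
}

// Invented sample prompts: same instructions, different working-tree state.
const beforeEdit =
  "# Environment\ngitStatus: This is the git status.\nCurrent branch: main\nStatus:\n(clean)\n# Tone\nBe concise.";
const afterEdit =
  "# Environment\ngitStatus: This is the git status.\nCurrent branch: main\nStatus:\nM src/app.js\n# Tone\nBe concise.";

// The raw prompts differ, so a prefix cache keyed on them would miss.
console.log(beforeEdit === afterEdit); // false

// After stripping, both prompts are byte-identical and the prefix stays stable.
console.log(stripGitStatus(beforeEdit) === stripGitStatus(afterEdit)); // true
```

The lazy match stops at the first following section marker (for example a line starting with `# `), so everything after the git-status section is left untouched.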
225
|
+
|
|
226
|
+
### Why the status line matters
|
|
141
227
|
|
|
142
228
|
When the server downgrades your TTL to 5m (Layer 2: quota-aware downgrade at Q5h ≥ 100%), **every idle period longer than 5 minutes causes a full context rebuild**. Without the status line, this is invisible: you just notice things getting slower and more expensive. With the status line, the red `TTL:5m` warning tells you immediately: **stop working, wait for the Q5h window to reset, then resume**. Powering through overage compounds the drain; pausing breaks the cycle.
|
|
143
229
|
|
|
@@ -341,6 +427,17 @@ Snapshots are saved to `~/.claude/cache-fix-snapshots/` and diff reports are gen
|
|
|
341
427
|
| `CACHE_FIX_IMAGE_KEEP_LAST` | `0` | Keep images in last N user messages (0 = disabled) |
|
|
342
428
|
| `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT` | unset | Replace Claude Code's `# Output efficiency` system-prompt section before the request is sent |
|
|
343
429
|
| `CACHE_FIX_USAGE_LOG` | `~/.claude/usage.jsonl` | Path for per-call usage telemetry log |
|
|
430
|
+
| `CACHE_FIX_DISABLED` | `0` | Disable all bug fixes; keep monitoring + optimizations active |
|
|
431
|
+
| `CACHE_FIX_SKIP_RELOCATE` | `0` | Skip block relocation fix (Bug 1) |
|
|
432
|
+
| `CACHE_FIX_SKIP_FINGERPRINT` | `0` | Skip fingerprint stabilization (Bug 2b) |
|
|
433
|
+
| `CACHE_FIX_SKIP_TOOL_SORT` | `0` | Skip tool ordering stabilization (Bug 2a) |
|
|
434
|
+
| `CACHE_FIX_SKIP_TTL` | `0` | Skip TTL injection (Bug 5) |
|
|
435
|
+
| `CACHE_FIX_SKIP_IDENTITY` | `0` | Skip identity normalization (Bug 6) |
|
|
436
|
+
| `CACHE_FIX_SKIP_GIT_STATUS` | `0` | Skip git-status stripping even when `CACHE_FIX_STRIP_GIT_STATUS=1` |
|
|
437
|
+
| `CACHE_FIX_STRIP_GIT_STATUS` | `0` | Strip volatile git-status from system prompt for prefix stability. Model can still run `git status` via Bash. |
|
|
438
|
+
| `CACHE_FIX_TTL_MAIN` | `1h` | TTL for main-thread requests: `1h`, `5m`, or `none` (pass-through) |
|
|
439
|
+
| `CACHE_FIX_TTL_SUBAGENT` | `1h` | TTL for subagent requests: `1h`, `5m`, or `none` (pass-through) |
|
|
440
|
+
| `CACHE_FIX_DUMP_BREAKPOINTS` | unset | Path to dump cache breakpoint structure (diagnostic for #12) |
|
|
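The switches in this table interact: the master switch overrides everything, each fix has its own skip flag, and two fixes (identity and git-status) additionally need an opt-in flag. The sketch below shows one way to reason about the combination. It follows the `shouldApplyFix` logic in this diff but takes the environment as a parameter so it can be run in isolation; `enabledFixes`, `FIXES` and `OPT_IN` are hypothetical names, not part of the package.

```javascript
// Fix names as they appear in the stats schema in this diff.
const FIXES = ["relocate", "fingerprint", "tool_sort", "ttl", "identity", "git_status"];

// Fixes that are off unless their opt-in variable is exactly "1".
const OPT_IN = {
  identity: "CACHE_FIX_NORMALIZE_IDENTITY",
  git_status: "CACHE_FIX_STRIP_GIT_STATUS",
};

// Same rule as the diff's shouldApplyFix, with env passed in explicitly.
function shouldApplyFix(fixName, env) {
  if (env.CACHE_FIX_DISABLED === "1") return false;
  return env[`CACHE_FIX_SKIP_${fixName.toUpperCase()}`] !== "1";
}

// Hypothetical helper: list the fixes that would run for a given environment.
function enabledFixes(env) {
  return FIXES.filter((name) => {
    const optIn = OPT_IN[name];
    if (optIn && env[optIn] !== "1") return false;
    return shouldApplyFix(name, env);
  });
}

console.log(enabledFixes({}));
// [ 'relocate', 'fingerprint', 'tool_sort', 'ttl' ]
console.log(enabledFixes({ CACHE_FIX_SKIP_TTL: "1", CACHE_FIX_STRIP_GIT_STATUS: "1" }));
// [ 'relocate', 'fingerprint', 'tool_sort', 'git_status' ]
console.log(enabledFixes({ CACHE_FIX_DISABLED: "1" }));
// []
```

Note that only the literal string `1` counts: a value such as `true` leaves the corresponding fix enabled.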
344
441
|
|
|
345
442
|
## Limitations
|
|
346
443
|
|
|
@@ -424,6 +521,7 @@ measurable signature of cache-efficiency degradation.
|
|
|
424
521
|
- **[@Renvect](https://github.com/Renvect)** — Image duplication discovery, cross-project directory contamination analysis
|
|
425
522
|
- **[@fgrosswig](https://github.com/fgrosswig)** — [claude-usage-dashboard](https://github.com/fgrosswig/claude-usage-dashboard) forensic methodology: cost-factor overhead ratio metric, `anthropic-*` header capture pattern, proxy NDJSON schema that informed our dashboard interop layer
|
|
426
523
|
- **[@TomTheMenace](https://github.com/TomTheMenace)** — Windows `.bat` wrapper for the interceptor, first Windows platform validation (7.5h/536-call Opus 4.6 session, 98.4% cache hit rate, 81% fingerprint instability corrected)
|
|
524
|
+
- **[@arjansingh](https://github.com/arjansingh)** — nvm-compatible wrapper script with dynamic `npm root -g` path resolution (PR #15)
|
|
427
525
|
|
|
428
526
|
If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
|
|
429
527
|
|
package/package.json
CHANGED
package/preload.mjs
CHANGED
|
@@ -83,6 +83,25 @@ function extractRealUserMessageText(messages) {
|
|
|
83
83
|
return "";
|
|
84
84
|
}
|
|
85
85
|
|
|
86
|
+
/**
|
|
87
|
+
* Extract text from messages[0] the way CC's original fingerprint code does —
|
|
88
|
+
* including meta/attachment blocks. Used only for round-trip verification.
|
|
89
|
+
*/
|
|
90
|
+
function extractFirstMessageText(messages) {
|
|
91
|
+
if (!Array.isArray(messages) || messages.length === 0) return "";
|
|
92
|
+
const first = messages[0];
|
|
93
|
+
if (!first || first.role !== "user") return "";
|
|
94
|
+
const content = first.content;
|
|
95
|
+
if (typeof content === "string") return content;
|
|
96
|
+
if (!Array.isArray(content)) return "";
|
|
97
|
+
for (const block of content) {
|
|
98
|
+
if (block.type === "text" && typeof block.text === "string") {
|
|
99
|
+
return block.text;
|
|
100
|
+
}
|
|
101
|
+
}
|
|
102
|
+
return "";
|
|
103
|
+
}
|
|
104
|
+
|
|
86
105
|
/**
|
|
87
106
|
* Extract current cc_version from system prompt blocks and recompute with
|
|
88
107
|
* stable fingerprint. Returns { oldVersion, newVersion, stableFingerprint }.
|
|
@@ -107,6 +126,23 @@ function stabilizeFingerprint(system, messages) {
|
|
|
107
126
|
const baseVersion = dotParts.slice(0, 3).join("."); // "2.1.87"
|
|
108
127
|
const oldFingerprint = dotParts[3]; // "a3f"
|
|
109
128
|
|
|
129
|
+
// --- SAFETY: Round-trip verification ---
|
|
130
|
+
// Verify our salt/indices reproduce CC's fingerprint for the ORIGINAL
|
|
131
|
+
// message text (messages[0] content, which is what CC used).
|
|
132
|
+
// If our computation doesn't match, our constants are stale — skip rewrite.
|
|
133
|
+
const originalText = extractFirstMessageText(messages);
|
|
134
|
+
const verification = computeFingerprint(originalText, baseVersion);
|
|
135
|
+
if (verification !== oldFingerprint) {
|
|
136
|
+
debugLog(
|
|
137
|
+
"FINGERPRINT SAFETY: round-trip verification failed.",
|
|
138
|
+
`CC sent '${oldFingerprint}', we computed '${verification}'.`,
|
|
139
|
+
"Salt/indices may have changed in this CC version. Skipping rewrite."
|
|
140
|
+
);
|
|
141
|
+
recordFixResult("fingerprint", "safety_blocked");
|
|
142
|
+
return null;
|
|
143
|
+
}
|
|
144
|
+
// --- END SAFETY ---
|
|
145
|
+
|
|
110
146
|
// Compute stable fingerprint from real user text
|
|
111
147
|
const realText = extractRealUserMessageText(messages);
|
|
112
148
|
const stableFingerprint = computeFingerprint(realText, baseVersion);
|
|
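The guard added in this hunk follows a general pattern: recompute the value the upstream client sent using your own constants, and rewrite only when the two agree. The sketch below illustrates that pattern with a toy hash. `toyFingerprint`, `SALT` and `safeRewrite` are hypothetical stand-ins; the package's real `computeFingerprint`, salt and character indices are not shown in this diff.

```javascript
// Hypothetical stand-in for the real fingerprint function (not shown in this diff).
// It mixes a salt, the message text and the version into three hex characters.
const SALT = "example-salt"; // assumption: illustrative value only

function toyFingerprint(text, version, salt = SALT) {
  let h = 0;
  for (const ch of `${salt}|${text}|${version}`) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h.toString(16).padStart(8, "0").slice(-3);
}

// Rewrite the fingerprint only if our constants reproduce what the client sent.
function safeRewrite({ sentFingerprint, originalText, stableText, version }) {
  const recomputed = toyFingerprint(originalText, version);
  if (recomputed !== sentFingerprint) {
    // Our constants look stale: leave the request exactly as the client built it.
    return { rewritten: false, fingerprint: sentFingerprint };
  }
  return { rewritten: true, fingerprint: toyFingerprint(stableText, version) };
}

// Case 1: the client used the algorithm we expect, so verification passes.
const sent = toyFingerprint("<attachment> hello", "2.1.87");
const ok = safeRewrite({
  sentFingerprint: sent,
  originalText: "<attachment> hello",
  stableText: "hello",
  version: "2.1.87",
});

// Case 2: the client sent a value our constants cannot reproduce.
const foreign = sent === "fff" ? "000" : "fff";
const blocked = safeRewrite({
  sentFingerprint: foreign,
  originalText: "<attachment> hello",
  stableText: "hello",
  version: "2.1.87",
});

console.log(ok.rewritten, blocked.rewritten); // true false
```

The useful property is that a mismatch degrades to the client's own behavior rather than to a guess, which is what the hunk's early `return null` achieves.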
@@ -588,13 +624,16 @@ function replaceOutputEfficiencySection(text) {
|
|
|
588
624
|
// Set CACHE_FIX_DEBUG=1 to enable
|
|
589
625
|
// --------------------------------------------------------------------------
|
|
590
626
|
|
|
591
|
-
import { appendFileSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
|
|
627
|
+
import { appendFileSync, readFileSync, writeFileSync, mkdirSync, renameSync } from "node:fs";
|
|
592
628
|
import { homedir } from "node:os";
|
|
593
629
|
import { join } from "node:path";
|
|
594
630
|
|
|
595
631
|
const DEBUG = process.env.CACHE_FIX_DEBUG === "1";
|
|
596
632
|
const PREFIXDIFF = process.env.CACHE_FIX_PREFIXDIFF === "1";
|
|
597
633
|
const NORMALIZE_IDENTITY = process.env.CACHE_FIX_NORMALIZE_IDENTITY === "1";
|
|
634
|
+
const STRIP_GIT_STATUS = process.env.CACHE_FIX_STRIP_GIT_STATUS === "1";
|
|
635
|
+
const TTL_MAIN = (process.env.CACHE_FIX_TTL_MAIN || "1h").toLowerCase();
|
|
636
|
+
const TTL_SUBAGENT = (process.env.CACHE_FIX_TTL_SUBAGENT || "1h").toLowerCase();
|
|
598
637
|
const LOG_PATH = join(homedir(), ".claude", "cache-fix-debug.log");
|
|
599
638
|
const SNAPSHOT_DIR = join(homedir(), ".claude", "cache-fix-snapshots");
|
|
600
639
|
const USAGE_JSONL = process.env.CACHE_FIX_USAGE_LOG || join(homedir(), ".claude", "usage.jsonl");
|
|
@@ -605,6 +644,104 @@ function debugLog(...args) {
|
|
|
605
644
|
try { appendFileSync(LOG_PATH, line); } catch {}
|
|
606
645
|
}
|
|
607
646
|
|
|
647
|
+
// --------------------------------------------------------------------------
|
|
648
|
+
// Kill switches — disable fixes while keeping monitoring active
|
|
649
|
+
// --------------------------------------------------------------------------
|
|
650
|
+
|
|
651
|
+
const FIXES_DISABLED = process.env.CACHE_FIX_DISABLED === "1";
|
|
652
|
+
|
|
653
|
+
/**
|
|
654
|
+
* Check if a specific fix should be applied.
|
|
655
|
+
* Returns false if master kill switch is on OR individual fix is skipped.
|
|
656
|
+
* Monitoring and optimizations (image strip, output efficiency) are NOT
|
|
657
|
+
* affected by CACHE_FIX_DISABLED — only bug fixes are.
|
|
658
|
+
*/
|
|
659
|
+
function shouldApplyFix(fixName) {
|
|
660
|
+
if (FIXES_DISABLED) return false;
|
|
661
|
+
const skipKey = `CACHE_FIX_SKIP_${fixName.toUpperCase()}`;
|
|
662
|
+
if (process.env[skipKey] === "1") return false;
|
|
663
|
+
return true;
|
|
664
|
+
}
|
|
665
|
+
|
|
666
|
+
// --------------------------------------------------------------------------
|
|
667
|
+
// Persistent effectiveness stats
|
|
668
|
+
// --------------------------------------------------------------------------
|
|
669
|
+
|
|
670
|
+
const STATS_PATH = join(homedir(), ".claude", "cache-fix-stats.json");
|
|
671
|
+
|
|
672
|
+
const _STATS_SCHEMA = {
|
|
673
|
+
relocate: { applied: 0, skipped: 0, bugPresent: 0, resumeScanned: 0, lastApplied: null, lastScanned: null },
|
|
674
|
+
fingerprint: { applied: 0, skipped: 0, safetyBlocked: 0, lastApplied: null },
|
|
675
|
+
tool_sort: { applied: 0, skipped: 0, lastApplied: null },
|
|
676
|
+
ttl: { applied: 0, skipped: 0, lastApplied: null },
|
|
677
|
+
identity: { applied: 0, skipped: 0, lastApplied: null },
|
|
678
|
+
git_status: { applied: 0, skipped: 0, lastApplied: null },
|
|
679
|
+
};
|
|
680
|
+
|
|
681
|
+
function _createEmptyStats() {
|
|
682
|
+
return {
|
|
683
|
+
version: 1,
|
|
684
|
+
created: new Date().toISOString(),
|
|
685
|
+
lastUpdated: null,
|
|
686
|
+
fixes: JSON.parse(JSON.stringify(_STATS_SCHEMA)),
|
|
687
|
+
};
|
|
688
|
+
}
|
|
689
|
+
|
|
690
|
+
/** Read stats from disk. Returns empty stats on any error. */
|
|
691
|
+
function readStats() {
|
|
692
|
+
try {
|
|
693
|
+
const data = JSON.parse(readFileSync(STATS_PATH, "utf8"));
|
|
694
|
+
if (data.created) {
|
|
695
|
+
const ageDays = (Date.now() - new Date(data.created).getTime()) / (1000 * 60 * 60 * 24);
|
|
696
|
+
if (ageDays > 30) return _createEmptyStats();
|
|
697
|
+
}
|
|
698
|
+
for (const [key, schema] of Object.entries(_STATS_SCHEMA)) {
|
|
699
|
+
if (!data.fixes[key]) data.fixes[key] = { ...schema };
|
|
700
|
+
}
|
|
701
|
+
return data;
|
|
702
|
+
} catch {
|
|
703
|
+
return _createEmptyStats();
|
|
704
|
+
}
|
|
705
|
+
}
|
|
706
|
+
|
|
707
|
+
/** Atomic write: temp file + rename to avoid corruption. */
|
|
708
|
+
function writeStats(stats) {
|
|
709
|
+
try {
|
|
710
|
+
stats.lastUpdated = new Date().toISOString();
|
|
711
|
+
const tmp = STATS_PATH + ".tmp";
|
|
712
|
+
writeFileSync(tmp, JSON.stringify(stats, null, 2));
|
|
713
|
+
renameSync(tmp, STATS_PATH);
|
|
714
|
+
} catch (e) {
|
|
715
|
+
debugLog("STATS WRITE ERROR:", e?.message);
|
|
716
|
+
}
|
|
717
|
+
}
|
|
718
|
+
|
|
719
|
+
function recordFixResult(fixName, result) {
|
|
720
|
+
const stats = readStats();
|
|
721
|
+
if (!stats.fixes[fixName]) return;
|
|
722
|
+
const now = new Date().toISOString();
|
|
723
|
+
stats.lastUpdated = now;
|
|
724
|
+
if (result === "applied") {
|
|
725
|
+
stats.fixes[fixName].applied++;
|
|
726
|
+
stats.fixes[fixName].lastApplied = now;
|
|
727
|
+
} else if (result === "skipped") {
|
|
728
|
+
stats.fixes[fixName].skipped++;
|
|
729
|
+
} else if (result === "safety_blocked") {
|
|
730
|
+
stats.fixes[fixName].safetyBlocked = (stats.fixes[fixName].safetyBlocked || 0) + 1;
|
|
731
|
+
}
|
|
732
|
+
writeStats(stats);
|
|
733
|
+
}
|
|
734
|
+
|
|
735
|
+
function recordRelocateScan(bugFound) {
|
|
736
|
+
const stats = readStats();
|
|
737
|
+
const now = new Date().toISOString();
|
|
738
|
+
stats.lastUpdated = now;
|
|
739
|
+
stats.fixes.relocate.resumeScanned++;
|
|
740
|
+
stats.fixes.relocate.lastScanned = now;
|
|
741
|
+
if (bugFound) stats.fixes.relocate.bugPresent++;
|
|
742
|
+
writeStats(stats);
|
|
743
|
+
}
|
|
744
|
+
|
|
608
745
|
// --------------------------------------------------------------------------
|
|
609
746
|
// Prefix snapshot — captures message prefix for cross-process diff.
|
|
610
747
|
// Set CACHE_FIX_PREFIXDIFF=1 to enable.
|
|
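The counter updates in `recordFixResult` above can also be expressed as a pure function, which makes the bookkeeping easy to check without touching `~/.claude/cache-fix-stats.json`. This is only a sketch: `applyResult` is a hypothetical name, the field names follow the schema in this hunk, and the timestamps are invented.

```javascript
// Pure variant of the counter update: returns a new stats object, no file I/O.
function applyResult(stats, fixName, result, now = new Date().toISOString()) {
  const current = stats.fixes[fixName];
  if (!current) return stats; // unknown fix names are ignored, as in the hunk above
  const next = { ...current };
  if (result === "applied") {
    next.applied += 1;
    next.lastApplied = now;
  } else if (result === "skipped") {
    next.skipped += 1;
  } else if (result === "safety_blocked") {
    next.safetyBlocked = (next.safetyBlocked || 0) + 1;
  }
  return { ...stats, lastUpdated: now, fixes: { ...stats.fixes, [fixName]: next } };
}

// Start from a one-fix stats object and replay three events.
let stats = {
  version: 1,
  lastUpdated: null,
  fixes: { ttl: { applied: 0, skipped: 0, lastApplied: null } },
};
stats = applyResult(stats, "ttl", "applied", "2026-01-01T00:00:00.000Z");
stats = applyResult(stats, "ttl", "skipped", "2026-01-01T00:05:00.000Z");
stats = applyResult(stats, "unknown_fix", "applied", "2026-01-01T00:06:00.000Z");

console.log(JSON.stringify(stats.fixes.ttl));
// {"applied":1,"skipped":1,"lastApplied":"2026-01-01T00:00:00.000Z"}
```

Separating the update from the read and write steps is a design choice for testability; the version in this diff reads and writes the stats file on each recorded result.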
@@ -656,6 +793,59 @@ function dumpGrowthBookFlags() {
|
|
|
656
793
|
}
|
|
657
794
|
}
|
|
658
795
|
|
|
796
|
+
// --------------------------------------------------------------------------
|
|
797
|
+
// Startup health status line
|
|
798
|
+
// --------------------------------------------------------------------------
|
|
799
|
+
|
|
800
|
+
let _healthLinePrinted = false;
|
|
801
|
+
|
|
802
|
+
function _formatTimeSince(isoString) {
|
|
803
|
+
if (!isoString) return "never";
|
|
804
|
+
const ms = Date.now() - new Date(isoString).getTime();
|
|
805
|
+
const hours = Math.floor(ms / (1000 * 60 * 60));
|
|
806
|
+
const days = Math.floor(hours / 24);
|
|
807
|
+
if (days > 0) return `${days}d ago`;
|
|
808
|
+
if (hours > 0) return `${hours}h ago`;
|
|
809
|
+
const mins = Math.floor(ms / (1000 * 60));
|
|
810
|
+
return `${mins}m ago`;
|
|
811
|
+
}
|
|
812
|
+
|
|
813
|
+
function _formatFixStatus(fixName, fixStats, dormantThreshold = 5) {
|
|
814
|
+
if (fixName === "relocate") {
|
|
815
|
+
if (fixStats.resumeScanned >= dormantThreshold && fixStats.bugPresent === 0) {
|
|
816
|
+
return `dormant(${fixStats.resumeScanned} clean sessions)`;
|
|
817
|
+
}
|
|
818
|
+
} else {
|
|
819
|
+
if (fixStats.skipped >= dormantThreshold && fixStats.applied === 0) {
|
|
820
|
+
return `dormant(${fixStats.skipped} skips)`;
|
|
821
|
+
}
|
|
822
|
+
}
|
|
823
|
+
if (fixStats.safetyBlocked > 0) return `safety-blocked(${fixStats.safetyBlocked}x)`;
|
|
824
|
+
if (fixStats.lastApplied) return `active(${_formatTimeSince(fixStats.lastApplied)})`;
|
|
825
|
+
return "waiting";
|
|
826
|
+
}
|
|
827
|
+
|
|
828
|
+
function printHealthLine() {
|
|
829
|
+
if (_healthLinePrinted) return;
|
|
830
|
+
_healthLinePrinted = true;
|
|
831
|
+
const stats = readStats();
|
|
832
|
+
const parts = [];
|
|
833
|
+
for (const [name, fixStats] of Object.entries(stats.fixes)) {
|
|
834
|
+
const status = _formatFixStatus(name, fixStats);
|
|
835
|
+
parts.push(`${name}=${status}`);
|
|
836
|
+
if (status.startsWith("dormant")) {
|
|
837
|
+
debugLog(`DORMANT: ${name} — CC may have fixed this. Consider CACHE_FIX_SKIP_${name.toUpperCase()}=1`);
|
|
838
|
+
}
|
|
839
|
+
if (status.startsWith("safety-blocked")) {
|
|
840
|
+
debugLog(`SAFETY: ${name} — salt/indices may have changed. Fix is auto-disabled.`);
|
|
841
|
+
}
|
|
842
|
+
}
|
|
843
|
+
debugLog(`HEALTH: ${parts.join(" ")}`);
|
|
844
|
+
if (FIXES_DISABLED) {
|
|
845
|
+
debugLog("HEALTH: all fixes disabled via CACHE_FIX_DISABLED=1 (monitoring active)");
|
|
846
|
+
}
|
|
847
|
+
}
|
|
848
|
+
|
|
659
849
|
// --------------------------------------------------------------------------
|
|
660
850
|
// Microcompact / budget monitoring
|
|
661
851
|
// --------------------------------------------------------------------------
|
|
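The order of the checks in `_formatFixStatus` determines which label wins when several conditions hold at once. The sketch below transcribes those rules (with the elapsed-time suffix dropped for brevity) and runs them on invented sample counters. One consequence worth noting: because the dormant check comes first, a fix whose safety-blocked calls are also counted as skips, as the fingerprint branch in this diff appears to do, would be reported as `dormant` once it reaches five skips with nothing applied.

```javascript
// Status rules transcribed from the hunk above; time formatting is simplified.
function formatFixStatus(fixName, s, dormantThreshold = 5) {
  if (fixName === "relocate") {
    if (s.resumeScanned >= dormantThreshold && s.bugPresent === 0) {
      return `dormant(${s.resumeScanned} clean sessions)`;
    }
  } else if (s.skipped >= dormantThreshold && s.applied === 0) {
    return `dormant(${s.skipped} skips)`;
  }
  if (s.safetyBlocked > 0) return `safety-blocked(${s.safetyBlocked}x)`;
  if (s.lastApplied) return "active";
  return "waiting";
}

// Invented sample counters, one per interesting case.
const samples = [
  ["relocate", { applied: 0, skipped: 6, bugPresent: 0, resumeScanned: 6, lastApplied: null }],
  ["tool_sort", { applied: 0, skipped: 7, lastApplied: null }],
  ["ttl", { applied: 12, skipped: 0, lastApplied: "2026-01-01T00:00:00.000Z" }],
  ["fingerprint", { applied: 0, skipped: 2, safetyBlocked: 2, lastApplied: null }],
  ["fingerprint", { applied: 0, skipped: 5, safetyBlocked: 5, lastApplied: null }],
  ["identity", { applied: 0, skipped: 0, lastApplied: null }],
];

const labels = samples.map(([name, s]) => `${name}=${formatFixStatus(name, s)}`);
console.log(labels.join("\n"));
// relocate=dormant(6 clean sessions)
// tool_sort=dormant(7 skips)
// ttl=active
// fingerprint=safety-blocked(2x)
// fingerprint=dormant(5 skips)
// identity=waiting
```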
@@ -801,6 +991,50 @@ function snapshotPrefix(payload) {
|
|
|
801
991
|
}
|
|
802
992
|
}
|
|
803
993
|
|
|
994
|
+
// --------------------------------------------------------------------------
|
|
995
|
+
// Cache regression detector
|
|
996
|
+
// --------------------------------------------------------------------------
|
|
997
|
+
|
|
998
|
+
const _cacheHistory = []; // in-memory ring buffer of { ratio, turn }
|
|
999
|
+
const REGRESSION_MIN_CALLS = 5;
|
|
1000
|
+
const REGRESSION_MIN_RATIO = 0.5;
|
|
1001
|
+
let _apiCallCount = 0;
|
|
1002
|
+
|
|
1003
|
+
function _computeCacheRatio(usage) {
|
|
1004
|
+
if (!usage) return null;
|
|
1005
|
+
const read = usage.cache_read_input_tokens || 0;
|
|
1006
|
+
const creation = usage.cache_creation_input_tokens || 0;
|
|
1007
|
+
const input = usage.input_tokens || 0;
|
|
1008
|
+
const total = read + creation + input;
|
|
1009
|
+
if (total === 0) return null;
|
|
1010
|
+
return read / total;
|
|
1011
|
+
}
|
|
1012
|
+
|
|
1013
|
+
function _checkCacheRegression() {
|
|
1014
|
+
if (_cacheHistory.length < REGRESSION_MIN_CALLS) return;
|
|
1015
|
+
const recent = _cacheHistory.slice(-REGRESSION_MIN_CALLS);
|
|
1016
|
+
const allLow = recent.every((h) => h.ratio < REGRESSION_MIN_RATIO);
|
|
1017
|
+
if (allLow) {
|
|
1018
|
+
const avgRatio = recent.reduce((sum, h) => sum + h.ratio, 0) / recent.length;
|
|
1019
|
+
debugLog(
|
|
1020
|
+
`REGRESSION WARNING: cache_read ratio averaged ${Math.round(avgRatio * 100)}%`,
|
|
1021
|
+
`across last ${REGRESSION_MIN_CALLS} calls (threshold: ${REGRESSION_MIN_RATIO * 100}%).`,
|
|
1022
|
+
FIXES_DISABLED
|
|
1023
|
+
? "Fixes are disabled — consider re-enabling to recover cache performance."
|
|
1024
|
+
: "Fixes are active but cache is still degraded — CC may have introduced a new bug."
|
|
1025
|
+
);
|
|
1026
|
+
}
|
|
1027
|
+
}
|
|
1028
|
+
|
|
1029
|
+
function _trackCacheRatio(usage) {
|
|
1030
|
+
if (_apiCallCount <= 1) return; // skip first call (cache creation, no reads)
|
|
1031
|
+
const ratio = _computeCacheRatio(usage);
|
|
1032
|
+
if (ratio === null) return;
|
|
1033
|
+
_cacheHistory.push({ ratio, turn: _apiCallCount });
|
|
1034
|
+
if (_cacheHistory.length > 20) _cacheHistory.shift(); // ring buffer
|
|
1035
|
+
_checkCacheRegression();
|
|
1036
|
+
}
|
|
1037
|
+
|
|
804
1038
|
// --------------------------------------------------------------------------
|
|
805
1039
|
// Fetch interceptor
|
|
806
1040
|
// --------------------------------------------------------------------------
|
|
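A worked example may help when reading the detector above. The ratio is `cache_read / (cache_read + cache_creation + input)`, and the warning fires only when every one of the last five tracked calls is below the threshold; the average is used for the message, not for the decision. The sketch below reuses the constants from the hunk with invented token counts; `cacheRatio` and `isRegressed` are illustrative names.

```javascript
const MIN_CALLS = 5;   // same value as REGRESSION_MIN_CALLS in the hunk above
const MIN_RATIO = 0.5; // same value as REGRESSION_MIN_RATIO in the hunk above

// Share of prompt tokens served from cache for one API response.
function cacheRatio(usage) {
  const read = usage.cache_read_input_tokens || 0;
  const creation = usage.cache_creation_input_tokens || 0;
  const input = usage.input_tokens || 0;
  const total = read + creation + input;
  return total === 0 ? null : read / total;
}

// True only if each of the last MIN_CALLS ratios is below MIN_RATIO.
function isRegressed(ratios) {
  if (ratios.length < MIN_CALLS) return false;
  return ratios.slice(-MIN_CALLS).every((r) => r < MIN_RATIO);
}

// Invented token counts: 90000 / 100000 = 0.9 and 12000 / 100000 = 0.12.
const healthy = { cache_read_input_tokens: 90000, cache_creation_input_tokens: 500, input_tokens: 9500 };
const busted = { cache_read_input_tokens: 12000, cache_creation_input_tokens: 85000, input_tokens: 3000 };

const allBusted = Array(5).fill(cacheRatio(busted));
const oneHealthy = [busted, busted, healthy, busted, busted].map(cacheRatio);

console.log(cacheRatio(healthy), cacheRatio(busted)); // 0.9 0.12
console.log(isRegressed(allBusted));  // true
console.log(isRegressed(oneHealthy)); // false, although the average (0.276) is below 0.5
```

Requiring every recent call to be low makes the warning conservative: a single well-cached call inside the window suppresses it.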
@@ -817,11 +1051,17 @@ globalThis.fetch = async function (url, options) {
|
|
|
817
1051
|
|
|
818
1052
|
if (isMessagesEndpoint && options?.body && typeof options.body === "string") {
|
|
819
1053
|
try {
|
|
1054
|
+
_apiCallCount++;
|
|
820
1055
|
const payload = JSON.parse(options.body);
|
|
821
1056
|
let modified = false;
|
|
822
1057
|
|
|
823
1058
|
// One-time GrowthBook flag dump on first API call
|
|
824
1059
|
dumpGrowthBookFlags();
|
|
1060
|
+
printHealthLine();
|
|
1061
|
+
|
|
1062
|
+
if (FIXES_DISABLED) {
|
|
1063
|
+
debugLog("CACHE_FIX_DISABLED=1 — all bug fixes bypassed, monitoring active");
|
|
1064
|
+
}
|
|
825
1065
|
|
|
826
1066
|
debugLog("--- API call to", urlStr);
|
|
827
1067
|
debugLog("message count:", payload.messages?.length);
|
|
@@ -832,7 +1072,7 @@ globalThis.fetch = async function (url, options) {
|
|
|
832
1072
|
}
|
|
833
1073
|
|
|
834
1074
|
// Bug 1: Relocate resume attachment blocks
|
|
835
|
-
if (payload.messages) {
|
|
1075
|
+
if (payload.messages && shouldApplyFix("relocate")) {
|
|
836
1076
|
// Log message structure for debugging
|
|
837
1077
|
if (DEBUG) {
|
|
838
1078
|
let firstUserIdx = -1, lastUserIdx = -1;
|
|
@@ -868,13 +1108,21 @@ globalThis.fetch = async function (url, options) {
|
|
|
868
1108
|
}
|
|
869
1109
|
|
|
870
1110
|
const normalized = normalizeResumeMessages(payload.messages);
|
|
1111
|
+
// Track bug presence for dormancy detection (resume = messages > 5)
|
|
1112
|
+
const isResume = payload.messages.length > 5;
|
|
1113
|
+
if (isResume) recordRelocateScan(normalized !== payload.messages);
|
|
1114
|
+
|
|
871
1115
|
if (normalized !== payload.messages) {
|
|
872
1116
|
payload.messages = normalized;
|
|
873
1117
|
modified = true;
|
|
874
1118
|
debugLog("APPLIED: resume message relocation");
|
|
1119
|
+
recordFixResult("relocate", "applied");
|
|
875
1120
|
} else {
|
|
876
1121
|
debugLog("SKIPPED: resume relocation (not a resume or already correct)");
|
|
1122
|
+
recordFixResult("relocate", "skipped");
|
|
877
1123
|
}
|
|
1124
|
+
} else if (payload.messages && !shouldApplyFix("relocate")) {
|
|
1125
|
+
debugLog("SKIPPED: relocate fix disabled via env var");
|
|
878
1126
|
}
|
|
879
1127
|
|
|
880
1128
|
// Image stripping: remove old tool_result images to reduce token waste
|
|
@@ -895,7 +1143,7 @@ globalThis.fetch = async function (url, options) {
|
|
|
895
1143
|
}
|
|
896
1144
|
|
|
897
1145
|
// Bug 2a: Stabilize tool ordering
|
|
898
|
-
if (payload.tools) {
|
|
1146
|
+
if (payload.tools && shouldApplyFix("tool_sort")) {
|
|
899
1147
|
const sorted = stabilizeToolOrder(payload.tools);
|
|
900
1148
|
const changed = sorted.some(
|
|
901
1149
|
(t, i) => t.name !== payload.tools[i]?.name
|
|
@@ -904,11 +1152,16 @@ globalThis.fetch = async function (url, options) {
|
|
|
904
1152
|
payload.tools = sorted;
|
|
905
1153
|
modified = true;
|
|
906
1154
|
debugLog("APPLIED: tool order stabilization");
|
|
1155
|
+
recordFixResult("tool_sort", "applied");
|
|
1156
|
+
} else {
|
|
1157
|
+
recordFixResult("tool_sort", "skipped");
|
|
907
1158
|
}
|
|
1159
|
+
} else if (payload.tools && !shouldApplyFix("tool_sort")) {
|
|
1160
|
+
debugLog("SKIPPED: tool sort fix disabled via env var");
|
|
908
1161
|
}
|
|
909
1162
|
|
|
910
1163
|
// Bug 2b: Stabilize fingerprint in attribution header
|
|
911
|
-
if (payload.system && payload.messages) {
|
|
1164
|
+
if (payload.system && payload.messages && shouldApplyFix("fingerprint")) {
|
|
912
1165
|
const fix = stabilizeFingerprint(payload.system, payload.messages);
|
|
913
1166
|
if (fix) {
|
|
914
1167
|
payload.system = [...payload.system];
|
|
@@ -918,7 +1171,12 @@ globalThis.fetch = async function (url, options) {
|
|
|
918
1171
|
};
|
|
919
1172
|
modified = true;
|
|
920
1173
|
debugLog("APPLIED: fingerprint stabilized from", fix.oldFingerprint, "to", fix.stableFingerprint);
|
|
1174
|
+
recordFixResult("fingerprint", "applied");
|
|
1175
|
+
} else {
|
|
1176
|
+
recordFixResult("fingerprint", "skipped");
|
|
921
1177
|
}
|
|
1178
|
+
} else if (payload.system && payload.messages && !shouldApplyFix("fingerprint")) {
|
|
1179
|
+
debugLog("SKIPPED: fingerprint fix disabled via env var");
|
|
922
1180
|
}
|
|
923
1181
|
|
|
924
1182
|
// Bug 6: Identity string normalization for Agent()/SendMessage() cache parity
|
|
@@ -931,7 +1189,7 @@ globalThis.fetch = async function (url, options) {
|
|
|
931
1189
|
// turn even though system[2] (the actual instructions) is byte-identical.
|
|
932
1190
|
// Confirmed by @labzink via mitmproxy on #44724.
|
|
933
1191
|
// Opt-in because it's a model-perceivable behavior change (subagent thinks it's CC).
|
|
934
|
-
if (NORMALIZE_IDENTITY && payload.system && Array.isArray(payload.system)) {
|
|
1192
|
+
if (NORMALIZE_IDENTITY && shouldApplyFix("identity") && payload.system && Array.isArray(payload.system)) {
|
|
935
1193
|
const CANONICAL = "You are Claude Code, Anthropic's official CLI for Claude.";
|
|
936
1194
|
const AGENT_SDK = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
|
|
937
1195
|
let normalized = 0;
|
|
@@ -949,6 +1207,9 @@ globalThis.fetch = async function (url, options) {
|
|
|
949
1207
|
if (normalized > 0) {
|
|
950
1208
|
modified = true;
|
|
951
1209
|
debugLog(`APPLIED: identity normalized on ${normalized} system block(s) (Agent SDK → Claude Code)`);
|
|
1210
|
+
recordFixResult("identity", "applied");
|
|
1211
|
+
} else {
|
|
1212
|
+
recordFixResult("identity", "skipped");
|
|
952
1213
|
}
|
|
953
1214
|
}
|
|
954
1215
|
|
|
@@ -964,39 +1225,91 @@ globalThis.fetch = async function (url, options) {
|
|
|
964
1225
|
}
|
|
965
1226
|
}
|
|
966
1227
|
|
|
967
|
-
//
|
|
1228
|
+
// Optimization: strip volatile git-status from system prompt
|
|
1229
|
+
// CC injects live git-status output (branch, changed files, recent commits)
|
|
1230
|
+
// into a system text block. This changes on every file edit, busting the
|
|
1231
|
+
// entire prefix cache. Opt-in via CACHE_FIX_STRIP_GIT_STATUS=1.
|
|
1232
|
+
// The model can still run `git status` via Bash tool when it needs context.
|
|
1233
|
+
if (STRIP_GIT_STATUS && shouldApplyFix("git_status") && payload.system && Array.isArray(payload.system)) {
|
|
1234
|
+
let stripped = 0;
|
|
1235
|
+
payload.system = payload.system.map((block) => {
|
|
1236
|
+
if (block?.type !== "text" || typeof block.text !== "string") return block;
|
|
1237
|
+
// Match the gitStatus section CC injects. Pattern:
|
|
1238
|
+
// "gitStatus: This is the git status..."
|
|
1239
|
+
// followed by branch, status, commits until the next section or end
|
|
1240
|
+
const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
|
|
1241
|
+
if (!gitStatusPattern.test(block.text)) return block;
|
|
1242
|
+
const newText = block.text.replace(gitStatusPattern, "gitStatus: [stripped by cache-fix for prefix stability]");
|
|
1243
|
+
if (newText !== block.text) {
|
|
1244
|
+
stripped++;
|
|
1245
|
+
return { ...block, text: newText };
|
|
1246
|
+
}
|
|
1247
|
+
return block;
|
|
1248
|
+
});
|
|
1249
|
+
if (stripped > 0) {
|
|
1250
|
+
modified = true;
|
|
1251
|
+
debugLog(`APPLIED: git-status stripped from ${stripped} system block(s)`);
|
|
1252
|
+
recordFixResult("git_status", "applied");
|
|
1253
|
+
} else {
|
|
1254
|
+
recordFixResult("git_status", "skipped");
|
|
1255
|
+
}
|
|
1256
|
+
}
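The stripping step above can be tried in isolation. The following is a minimal sketch that reuses the pattern and placeholder from the hunk; the `stripGitStatus` helper name and the sample prompt text are hypothetical and only illustrate where the match starts and stops.

```javascript
// Pattern and placeholder copied from the hunk above.
const gitStatusPattern = /gitStatus:.*?(?=\n# |\n## |\nWhen |\nAnswer |\n<[a-z]|$)/s;
const GIT_STATUS_PLACEHOLDER = "gitStatus: [stripped by cache-fix for prefix stability]";

// Hypothetical helper: replaces the volatile section with the fixed placeholder.
function stripGitStatus(text) {
  return text.replace(gitStatusPattern, GIT_STATUS_PLACEHOLDER);
}

// Hypothetical system-block text; the real prompt layout may differ.
const sampleSystemText = [
  "# Environment",
  "gitStatus: This is the git status at the start of the conversation.",
  "Current branch: main",
  "Status:",
  " M src/index.js",
  "# Tone",
  "Be concise.",
].join("\n");

const strippedOnce = stripGitStatus(sampleSystemText);
const strippedTwice = stripGitStatus(strippedOnce);

console.log(strippedOnce);
// A second pass leaves the text unchanged, so the prefix stays byte-stable.
console.log(strippedOnce === strippedTwice);
```

Because the placeholder itself begins with `gitStatus:`, a second pass matches it and replaces it with identical text, which is why the hunk compares `newText !== block.text` before counting a block as stripped.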
|
|
1257
|
+
|
|
1258
|
+
// Bug 5: TTL enforcement (configurable per request type)
|
|
968
1259
|
// The client gates 1h cache TTL behind a GrowthBook allowlist that checks
|
|
969
1260
|
// querySource against patterns like "repl_main_thread*", "sdk", "auto_mode".
|
|
970
1261
|
// Interactive CLI sessions may not match any pattern, causing the client to
|
|
971
1262
|
// send cache_control without ttl (defaulting to 5m server-side).
|
|
972
1263
|
// The server honors whatever TTL the client requests — so we inject it.
|
|
973
1264
|
// Discovered by @TigerKay1926 on #42052 using our GrowthBook flag dump.
|
|
974
|
-
|
|
975
|
-
|
|
976
|
-
|
|
977
|
-
|
|
978
|
-
|
|
979
|
-
|
|
980
|
-
|
|
981
|
-
|
|
982
|
-
|
|
983
|
-
|
|
984
|
-
|
|
985
|
-
|
|
986
|
-
|
|
987
|
-
|
|
988
|
-
|
|
989
|
-
|
|
990
|
-
|
|
991
|
-
|
|
1265
|
+
//
|
|
1266
|
+
// v1.9.0: configurable per request type via CACHE_FIX_TTL_MAIN and
|
|
1267
|
+
// CACHE_FIX_TTL_SUBAGENT. Values: "1h" (default), "5m", "none".
|
|
1268
|
+
// "none" = don't inject TTL, pass through caller's original cache_control.
|
|
1269
|
+
if (payload.system && shouldApplyFix("ttl")) {
|
|
1270
|
+
// Detect subagent: Agent SDK identity in system[1]
|
|
1271
|
+
const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";
|
|
1272
|
+
const isSubagent = Array.isArray(payload.system) &&
|
|
1273
|
+
payload.system.some((b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX));
|
|
1274
|
+
const ttlValue = isSubagent ? TTL_SUBAGENT : TTL_MAIN;
|
|
1275
|
+
const requestType = isSubagent ? "subagent" : "main";
|
|
1276
|
+
|
|
1277
|
+
if (ttlValue === "none") {
|
|
1278
|
+
debugLog(`SKIPPED: TTL injection (${requestType} set to 'none' — pass-through)`);
|
|
1279
|
+
recordFixResult("ttl", "skipped");
|
|
1280
|
+
} else {
|
|
1281
|
+
const ttlParam = ttlValue === "5m" ? "5m" : "1h";
|
|
1282
|
+
let ttlInjected = 0;
|
|
1283
|
+
payload.system = payload.system.map((block) => {
|
|
1284
|
+
if (block.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
|
|
1285
|
+
ttlInjected++;
|
|
1286
|
+
return { ...block, cache_control: { ...block.cache_control, ttl: ttlParam } };
|
|
1287
|
+
}
|
|
1288
|
+
return block;
|
|
1289
|
+
});
|
|
1290
|
+
// Also check messages for cache_control blocks (conversation history breakpoints)
|
|
1291
|
+
if (payload.messages) {
|
|
1292
|
+
for (const msg of payload.messages) {
|
|
1293
|
+
if (!Array.isArray(msg.content)) continue;
|
|
1294
|
+
for (let i = 0; i < msg.content.length; i++) {
|
|
1295
|
+
const b = msg.content[i];
|
|
1296
|
+
if (b.cache_control?.type === "ephemeral" && !b.cache_control.ttl) {
|
|
1297
|
+
msg.content[i] = { ...b, cache_control: { ...b.cache_control, ttl: ttlParam } };
|
|
1298
|
+
ttlInjected++;
|
|
1299
|
+
}
|
|
992
1300
|
}
|
|
993
1301
|
}
|
|
994
1302
|
}
|
|
1303
|
+
if (ttlInjected > 0) {
|
|
1304
|
+
modified = true;
|
|
1305
|
+
debugLog(`APPLIED: ${ttlParam} TTL injected on ${ttlInjected} cache_control block(s) (${requestType})`);
|
|
1306
|
+
recordFixResult("ttl", "applied");
|
|
1307
|
+
} else {
|
|
1308
|
+
recordFixResult("ttl", "skipped");
|
|
1309
|
+
}
|
|
995
1310
|
}
|
|
996
|
-
|
|
997
|
-
|
|
998
|
-
debugLog(`APPLIED: 1h TTL injected on ${ttlInjected} cache_control block(s)`);
|
|
999
|
-
}
|
|
1311
|
+
} else if (payload.system && !shouldApplyFix("ttl")) {
|
|
1312
|
+
debugLog("SKIPPED: TTL injection disabled via env var");
|
|
1000
1313
|
}
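The TTL selection and injection above can be summarized as a small pure function. This is a sketch under assumptions, not the package's code: the `isSubagentRequest` and `injectTtl` names and the sample payload are hypothetical, the sketch returns copies instead of mutating `msg.content` in place, and the `Array.isArray` guard around `system` is an addition of the sketch.

```javascript
const AGENT_SDK_PREFIX = "You are a Claude agent, built on Anthropic's Claude Agent SDK.";

// Subagent detection as in the hunk: any text system block starting with the SDK identity.
function isSubagentRequest(system) {
  return Array.isArray(system) &&
    system.some((b) => b?.type === "text" && typeof b.text === "string" && b.text.startsWith(AGENT_SDK_PREFIX));
}

// Hypothetical helper: returns a new payload plus the number of blocks that received a ttl.
function injectTtl(payload, ttlMain = "1h", ttlSubagent = "1h") {
  const ttlValue = isSubagentRequest(payload.system) ? ttlSubagent : ttlMain;
  if (ttlValue === "none") return { payload, injected: 0 }; // pass-through
  const ttl = ttlValue === "5m" ? "5m" : "1h";
  let injected = 0;
  const fix = (block) => {
    // Only ephemeral markers without an explicit ttl are touched.
    if (block?.cache_control?.type === "ephemeral" && !block.cache_control.ttl) {
      injected++;
      return { ...block, cache_control: { ...block.cache_control, ttl } };
    }
    return block;
  };
  const system = Array.isArray(payload.system) ? payload.system.map(fix) : payload.system;
  const messages = payload.messages
    ? payload.messages.map((msg) => (Array.isArray(msg.content) ? { ...msg, content: msg.content.map(fix) } : msg))
    : payload.messages;
  return { payload: { ...payload, system, messages }, injected };
}

// Hypothetical request body.
const examplePayload = {
  system: [
    { type: "text", text: "You are Claude Code.", cache_control: { type: "ephemeral" } },
    { type: "text", text: "Project notes." },
  ],
  messages: [
    { role: "user", content: [{ type: "text", text: "hi", cache_control: { type: "ephemeral", ttl: "5m" } }] },
    { role: "assistant", content: "plain string content is left alone" },
    { role: "user", content: [{ type: "text", text: "next", cache_control: { type: "ephemeral" } }] },
  ],
};

const ttlResult = injectTtl(examplePayload, "1h", "5m");
console.log(ttlResult.injected); // 2
console.log(ttlResult.payload.system[0].cache_control);
```

A block that already carries an explicit `ttl` (the first user message here) keeps it, which matches the `!block.cache_control.ttl` check in the hunk.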
|
|
1001
1314
|
|
|
1002
1315
|
if (modified) {
|
|
@@ -1009,6 +1322,60 @@ globalThis.fetch = async function (url, options) {
|
|
|
1009
1322
|
monitorContextDegradation(payload.messages);
|
|
1010
1323
|
}
|
|
1011
1324
|
|
|
1325
|
+
// Diagnostic: dump cache breakpoint structure to a file when
|
|
1326
|
+
// CACHE_FIX_DUMP_BREAKPOINTS=<path> is set. Maps where cache_control markers
|
|
1327
|
+
// sit across system blocks and message content. Used to investigate #12
|
|
1328
|
+
// (missing breakpoint #3 for skills/CLAUDE.md).
|
|
1329
|
+
if (process.env.CACHE_FIX_DUMP_BREAKPOINTS && payload.system) {
|
|
1330
|
+
try {
|
|
1331
|
+
const dumpPath = process.env.CACHE_FIX_DUMP_BREAKPOINTS;
|
|
1332
|
+
const breakpoints = [];
|
|
1333
|
+
// System blocks
|
|
1334
|
+
if (Array.isArray(payload.system)) {
|
|
1335
|
+
payload.system.forEach((block, idx) => {
|
|
1336
|
+
if (block.cache_control) {
|
|
1337
|
+
breakpoints.push({
|
|
1338
|
+
location: "system",
|
|
1339
|
+
index: idx,
|
|
1340
|
+
type: block.type,
|
|
1341
|
+
cache_control: block.cache_control,
|
|
1342
|
+
text_preview: (block.text || "").slice(0, 120),
|
|
1343
|
+
text_chars: (block.text || "").length,
|
|
1344
|
+
});
|
|
1345
|
+
}
|
|
1346
|
+
});
|
|
1347
|
+
}
|
|
1348
|
+
// Message blocks
|
|
1349
|
+
if (payload.messages) {
|
|
1350
|
+
payload.messages.forEach((msg, msgIdx) => {
|
|
1351
|
+
if (!Array.isArray(msg.content)) return;
|
|
1352
|
+
msg.content.forEach((block, blockIdx) => {
|
|
1353
|
+
if (block.cache_control) {
|
|
1354
|
+
breakpoints.push({
|
|
1355
|
+
location: `messages[${msgIdx}].content`,
|
|
1356
|
+
role: msg.role,
|
|
1357
|
+
index: blockIdx,
|
|
1358
|
+
type: block.type,
|
|
1359
|
+
cache_control: block.cache_control,
|
|
1360
|
+
text_preview: (block.text || "").slice(0, 120),
|
|
1361
|
+
text_chars: (block.text || "").length,
|
|
1362
|
+
});
|
|
1363
|
+
}
|
|
1364
|
+
});
|
|
1365
|
+
});
|
|
1366
|
+
}
|
|
1367
|
+
const dump = {
|
|
1368
|
+
timestamp: new Date().toISOString(),
|
|
1369
|
+
breakpoint_count: breakpoints.length,
|
|
1370
|
+
breakpoints,
|
|
1371
|
+
system_block_count: Array.isArray(payload.system) ? payload.system.length : 0,
|
|
1372
|
+
message_count: payload.messages ? payload.messages.length : 0,
|
|
1373
|
+
};
|
|
1374
|
+
writeFileSync(dumpPath, JSON.stringify(dump, null, 2));
|
|
1375
|
+
debugLog(`DUMP: ${breakpoints.length} cache breakpoints written to ${dumpPath}`);
|
|
1376
|
+
} catch (e) { debugLog("BREAKPOINT DUMP ERROR:", e?.message); }
|
|
1377
|
+
}
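To see what the breakpoint dump contains without setting `CACHE_FIX_DUMP_BREAKPOINTS`, the collection logic can be sketched as a function that returns the object instead of writing it. The `summarizeBreakpoints` name and the sample payload are hypothetical, and the `timestamp` field is left out so the result is deterministic.

```javascript
// Hypothetical helper mirroring the dump's shape; it returns the summary rather than writing a file.
function summarizeBreakpoints(payload) {
  const breakpoints = [];
  const describe = (block) => ({
    type: block.type,
    cache_control: block.cache_control,
    text_preview: (block.text || "").slice(0, 120),
    text_chars: (block.text || "").length,
  });
  // System blocks
  if (Array.isArray(payload.system)) {
    payload.system.forEach((block, idx) => {
      if (block.cache_control) breakpoints.push({ location: "system", index: idx, ...describe(block) });
    });
  }
  // Message blocks (string-valued content cannot carry cache_control, so it is skipped)
  (payload.messages || []).forEach((msg, msgIdx) => {
    if (!Array.isArray(msg.content)) return;
    msg.content.forEach((block, blockIdx) => {
      if (block.cache_control) {
        breakpoints.push({ location: `messages[${msgIdx}].content`, role: msg.role, index: blockIdx, ...describe(block) });
      }
    });
  });
  return {
    breakpoint_count: breakpoints.length,
    breakpoints,
    system_block_count: Array.isArray(payload.system) ? payload.system.length : 0,
    message_count: payload.messages ? payload.messages.length : 0,
  };
}

// Hypothetical request body with two breakpoints.
const dumpExample = summarizeBreakpoints({
  system: [
    { type: "text", text: "Identity block" },
    { type: "text", text: "Long instructions", cache_control: { type: "ephemeral", ttl: "1h" } },
  ],
  messages: [
    { role: "user", content: "string content" },
    { role: "user", content: [{ type: "text", text: "latest turn", cache_control: { type: "ephemeral" } }] },
  ],
});

console.log(JSON.stringify(dumpExample, null, 2));
```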
|
|
1378
|
+
|
|
1012
1379
|
// Diagnostic: dump full tools array (names, descriptions, schemas, sizes) to a file
|
|
1013
1380
|
// when CACHE_FIX_DUMP_TOOLS=<path> is set. Useful for per-version tool-schema drift
|
|
1014
1381
|
// analysis and for understanding which tools contribute prefix bloat. First used
|
|
@@ -1199,6 +1566,7 @@ async function drainTTLFromClone(clone, model, quotaHeaders) {
|
|
|
1199
1566
|
if (event.type === "message_start" && event.message?.usage) {
|
|
1200
1567
|
const u = event.message.usage;
|
|
1201
1568
|
startUsage = u;
|
|
1569
|
+
_trackCacheRatio(u);
|
|
1202
1570
|
const cc = u.cache_creation || {};
|
|
1203
1571
|
const e1h = cc.ephemeral_1h_input_tokens ?? 0;
|
|
1204
1572
|
const e5m = cc.ephemeral_5m_input_tokens ?? 0;
|
package/tools/cost-report.mjs
CHANGED
|
@@ -397,13 +397,24 @@ function calculateCosts(entries, ratesData) {
|
|
|
397
397
|
continue;
|
|
398
398
|
}
|
|
399
399
|
|
|
400
|
-
// Determine cache write tier
|
|
401
|
-
//
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
400
|
+
// Determine cache write tier for cache_creation tokens.
|
|
401
|
+
// eph_1h/eph_5m are READ tokens (cache hits per tier), not write tokens.
|
|
402
|
+
// But they tell us which tier the request was on — and cache creation on
|
|
403
|
+
// that request uses the same tier's write rate.
|
|
404
|
+
// Fix for #7: previously assigned all creation to 5m when eph fields were 0.
|
|
405
|
+
let cw1h = 0;
|
|
406
|
+
let cw5m = 0;
|
|
407
|
+
if (entry.cache_create > 0) {
|
|
408
|
+
if (entry.eph_1h > 0) {
|
|
409
|
+
// Request was on 1h tier — creation charged at 1h write rate
|
|
410
|
+
cw1h = entry.cache_create;
|
|
411
|
+
} else if (entry.eph_5m > 0) {
|
|
412
|
+
// Request was on 5m tier — creation charged at 5m write rate
|
|
413
|
+
cw5m = entry.cache_create;
|
|
414
|
+
} else {
|
|
415
|
+
// No tier signal available; assume 5m (conservative — lower rate)
|
|
416
|
+
cw5m = entry.cache_create;
|
|
417
|
+
}
|
|
407
418
|
}
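The tier assignment above reduces to a three-way branch. A minimal sketch follows, assuming entries shaped like the ones the hunk reads (`cache_create`, `eph_1h`, `eph_5m`); the `splitCacheWrite` name and the sample token counts are hypothetical.

```javascript
// Hypothetical helper: assigns all cache-creation tokens of an entry to one write tier.
function splitCacheWrite(entry) {
  let cw1h = 0;
  let cw5m = 0;
  if (entry.cache_create > 0) {
    if (entry.eph_1h > 0) {
      cw1h = entry.cache_create; // request was on the 1h tier
    } else if (entry.eph_5m > 0) {
      cw5m = entry.cache_create; // request was on the 5m tier
    } else {
      cw5m = entry.cache_create; // no tier signal: fall back to 5m, as the hunk does
    }
  }
  return { cw1h, cw5m };
}

// Hypothetical entries covering each branch.
const tierExamples = [
  { cache_create: 12000, eph_1h: 12000, eph_5m: 0 },
  { cache_create: 8000, eph_1h: 0, eph_5m: 8000 },
  { cache_create: 5000, eph_1h: 0, eph_5m: 0 },
  { cache_create: 0, eph_1h: 300, eph_5m: 0 },
].map(splitCacheWrite);

console.log(tierExamples);
```

Note that the split is all-or-nothing per entry: when both `eph_1h` and `eph_5m` are positive, the 1h branch wins and the whole `cache_create` amount is priced at the 1h write rate.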
|
|
408
419
|
|
|
409
420
|
const cost = (
|
package/tools/quota-statusline.sh
CHANGED
|
@@ -59,6 +59,16 @@ try:
|
|
|
59
59
|
if ttl:
|
|
60
60
|
if ttl == '5m':
|
|
61
61
|
label += ' | \033[31mTTL:5m\033[0m' # red
|
|
62
|
+
# When on 5m tier, show the cold-rebuild size so users know
|
|
63
|
+
# the cost of idling past 5 minutes
|
|
64
|
+
cache_cr = qs.get('cache', {}).get('cache_creation', 0)
|
|
65
|
+
cache_rd = qs.get('cache', {}).get('cache_read', 0)
|
|
66
|
+
prefix = cache_cr + cache_rd
|
|
67
|
+
if prefix > 0:
|
|
68
|
+
if prefix >= 1_000_000:
|
|
69
|
+
label += ' \033[31m\u26A0 idle >5m = {:.1f}M rebuild\033[0m'.format(prefix / 1_000_000)
|
|
70
|
+
else:
|
|
71
|
+
label += ' \033[31m\u26A0 idle >5m = {:.0f}K rebuild\033[0m'.format(prefix / 1_000)
|
|
62
72
|
else:
|
|
63
73
|
label += ' | TTL:' + ttl
|
|
64
74
|
if hit and hit != 'N/A':
|