claude-code-cache-fix 3.3.0 → 3.5.0
- package/README.ko.md +3 -3
- package/README.md +58 -3
- package/README.zh.md +3 -3
- package/package.json +2 -2
- package/proxy/extensions/cache-telemetry.mjs +139 -17
- package/proxy/extensions/identity-normalization.mjs +1 -1
- package/proxy/extensions/image-strip.mjs +7 -2
- package/proxy/extensions/messages-cache-breakpoint.mjs +314 -0
- package/proxy/extensions/microcompact-stability.mjs +429 -0
- package/proxy/extensions/ttl-management.mjs +2 -1
- package/proxy/extensions/ttl-tier-detect.mjs +33 -0
- package/proxy/extensions.json +3 -0
- package/tools/cache-test.sh +19 -11
- package/tools/cross-version-cache-test.sh +4 -4
- package/tools/quota-statusline.sh +75 -19
package/README.ko.md
CHANGED
|
@@ -39,7 +39,7 @@ ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
|
|
|
39
39
|
| `identity-normalization` | Normalizes message ID fields for prefix stability |
|
|
40
40
|
| `fresh-session-sort` | Fixes non-deterministic ordering on the first turn |
|
|
41
41
|
| `cache-control-normalize` | Normalizes cache_control markers across messages |
|
|
42
|
-
| `cache-telemetry` | Extracts cache stats from response headers and writes them to `~/.claude/quota-status.json` |
|
|
42
|
+
| `cache-telemetry` | Extracts cache stats from response headers and writes them to `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
|
|
43
43
|
|
|
44
44
|
Extensions are hot-reloaded: add, remove, or modify `.mjs` files in `proxy/extensions/` and the change applies from the next request without restarting the proxy. Configuration lives in `proxy/extensions.json`.
|
|
45
45
|
|
|
@@ -202,7 +202,7 @@ Fixes are disabled — consider re-enabling to recover cache performance.
|
|
|
202
202
|
|
|
203
203
|
## Status line: real-time quota warnings
|
|
204
204
|
|
|
205
|
-
|
|
205
|
+
Both modes record quota state on every API call. Proxy mode (v3.5.0+) splits it into `~/.claude/quota-status/account.json` (account-global: Q5h/Q7d, status, overage) and `~/.claude/quota-status/sessions/<id>.json` (per-session: TTL tier, hit rate). Preload mode keeps the existing `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script can display the live status:
|
|
206
206
|
|
|
207
207
|
- **Q5h %** (burn rate, %/min)
|
|
208
208
|
- **Q7d %** (burn rate, %/hr)
|
|
@@ -292,7 +292,7 @@ npm install sharp
|
|
|
292
292
|
|
|
293
293
|
## Monitoring & diagnostics
|
|
294
294
|
|
|
295
|
-
The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reports. Quota tracking
|
|
295
|
+
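The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reports. Quota tracking works via `~/.claude/quota-status/` (proxy: per-session split) or `~/.claude/quota-status.json` (preload: single-session legacy path).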
|
|
296
296
|
|
|
297
297
|
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix comparison, environment variables, and the built-in quota analysis tool.
|
|
298
298
|
|
package/README.md
CHANGED
|
@@ -39,7 +39,7 @@ On every `/v1/messages` request, 7 extensions run in order:
|
|
|
39
39
|
| `identity-normalization` | Normalizes message identity fields for prefix stability |
|
|
40
40
|
| `fresh-session-sort` | Fixes non-deterministic ordering on first turn |
|
|
41
41
|
| `cache-control-normalize` | Normalizes cache_control markers across messages |
|
|
42
|
-
| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status.json` |
|
|
42
|
+
| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
|
|
43
43
|
|
|
44
44
|
Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
|
|
45
45
|
|
|
@@ -280,7 +280,7 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
|
|
|
280
280
|
|
|
281
281
|
## Status line — quota warnings in real time
|
|
282
282
|
|
|
283
|
-
Both
|
|
283
|
+
Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into `~/.claude/quota-status/account.json` (account-global fields: Q5h/Q7d, status, overage) plus `~/.claude/quota-status/sessions/<id>.json` (per-session cache fields: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script displays a live status line showing:
|
|
284
284
|
|
|
285
285
|
- **Q5h %** with burn rate (%/min)
|
|
286
286
|
- **Q7d %** with burn rate (%/hr)
|
|
@@ -409,13 +409,66 @@ If `sharp` is missing, Pass 3 skips cleanly (telemetry records `library_missing:
|
|
|
409
409
|
| `CACHE_FIX_IMAGE_REQUEST_SIZE_MAX` | 31457280 (30 MB) | Pass 2 byte budget. 2 MB headroom from Anthropic's 32 MB ceiling. |
|
|
410
410
|
| `CACHE_FIX_IMAGE_COUNT_MAX` | 100 | Hard image-count cap. Set to 600 for legacy Claude 1/2.x/Instant if needed. |
|
|
411
411
|
|
|
412
|
+
## Cache breakpoints (proxy mode, opt-in)
|
|
413
|
+
|
|
414
|
+
Anthropic's prompt cache supports up to **four** `cache_control` markers per request. Claude Code currently sets three of the four; the one that belongs third in request order, between the auto-injected `messages[0]` content (hooks, skills, project CLAUDE.md, deferred tools, MCP server descriptions) and the first real user content, is missing entirely. Without that marker, every change inside the auto-injected span busts the cache for everything that follows. wadabum projected ~6,500 token savings per fresh-session first turn from adding it ([anthropics/claude-code#47098](https://github.com/anthropics/claude-code/issues/47098)).
|
|
415
|
+
|
|
416
|
+
The proxy can inject the missing marker when you opt in. It stays off by default until validated against community data.
|
|
417
|
+
|
|
418
|
+
```sh
|
|
419
|
+
export CACHE_FIX_INJECT_MESSAGES_BREAKPOINT=1
|
|
420
|
+
```
|
|
421
|
+
|
|
422
|
+
The injection is conservative: it fires only when the request already carries 1–3 markers (the typical CC shape). It refuses if the request is already at the 4-marker limit (adding another would produce a 400) or has zero markers (the Agent SDK / API-direct shape, which this extension isn't built for). Boundary detection covers all five observed auto-injected block kinds (hooks, skills, CLAUDE.md, deferred-tools, MCP) and places the marker on the last auto-injected block.
|
|
423
|
+
|
|
424
|
+
A diagnostic-only env var dumps the structural shape of `messages[0]` for fixture sourcing without mutating the request:
|
|
425
|
+
|
|
426
|
+
```sh
|
|
427
|
+
export CACHE_FIX_DUMP_MESSAGES_HEAD=/tmp/messages-head.jsonl
|
|
428
|
+
```
|
|
429
|
+
|
|
430
|
+
| Env var | Default | Purpose |
|
|
431
|
+
|---------|---------|---------|
|
|
432
|
+
| `CACHE_FIX_INJECT_MESSAGES_BREAKPOINT` | unset | Enable breakpoint #3 injection (`=1` opt-in). |
|
|
433
|
+
| `CACHE_FIX_DUMP_MESSAGES_HEAD` | unset | Diagnostic JSONL dump of `messages[0].content` shape — read-only, no mutation. |
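
The marker-count guard described in this section can be illustrated roughly as follows. This is a minimal sketch, not the extension's code: the helper names `countCacheControlMarkers` and `canInjectBreakpoint` are hypothetical, and it assumes markers may appear on `tools` entries, `system` blocks, and message content blocks.

```javascript
// Hypothetical helper: count cache_control markers across a /v1/messages body.
// Assumption: markers can sit on tools, system blocks, and message content blocks.
function countCacheControlMarkers(body) {
  let count = 0;
  const visit = (block) => {
    if (block && typeof block === "object" && block.cache_control) count += 1;
  };
  if (Array.isArray(body.tools)) body.tools.forEach(visit);
  if (Array.isArray(body.system)) body.system.forEach(visit);
  for (const msg of body.messages || []) {
    // String-valued content cannot carry a marker, so only arrays are scanned.
    if (Array.isArray(msg.content)) msg.content.forEach(visit);
  }
  return count;
}

// Hypothetical guard mirroring the stated rule: inject only at 1-3 markers.
function canInjectBreakpoint(body) {
  const n = countCacheControlMarkers(body);
  return n >= 1 && n <= 3;
}

// Illustrative request: one marker on the system prompt, none in messages.
const request = {
  system: [{ type: "text", text: "system prompt", cache_control: { type: "ephemeral" } }],
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "<auto-injected hook output>" },
        { type: "text", text: "first real user message" },
      ],
    },
  ],
};

console.log(countCacheControlMarkers(request), canInjectBreakpoint(request));
```

A request with zero markers or with four markers makes `canInjectBreakpoint` return `false`, which corresponds to the two refusal cases listed above.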
|
|
434
|
+
|
|
435
|
+
## Microcompact stability (proxy mode, opt-in)
|
|
436
|
+
|
|
437
|
+
After ~90 minutes idle, Claude Code's `time_based_microcompact` (and the cold-compact path triggered by `FDY()`) replaces old `tool_result` content with a sentinel string. The original content is gone for cache purposes; that part is unrecoverable from the proxy. But the sentinel itself can carry an embedded timestamp (`[Old tool result content cleared at 2026-04-30T13:42:11Z]`), which means a *second* microcompact pass against the same already-cleared position writes different bytes — busting the cache for everything after that position even though no new content was added.
|
|
438
|
+
|
|
439
|
+
This extension addresses the recoverable half: normalize the sentinel to a byte-stable canonical form so repeat microcompacts don't churn the cache. **Phase 1 only** — diagnostic + opt-in normalization. Phase 2 (snapshot-and-restore of original tool_result content) is deferred to v3.5.0+ pending Phase 1 production data.
|
|
440
|
+
|
|
441
|
+
```sh
|
|
442
|
+
# Step 1 (diagnostic): characterize what CC's sentinel actually looks like.
|
|
443
|
+
export CACHE_FIX_DUMP_MICROCOMPACT=/tmp/microcompact-dump.jsonl
|
|
444
|
+
|
|
445
|
+
# Step 2 (normalize): once the sentinel format is confirmed, opt-in.
|
|
446
|
+
export CACHE_FIX_NORMALIZE_MICROCOMPACT=1
|
|
447
|
+
```
|
|
448
|
+
|
|
449
|
+
Detection has two modes:
|
|
450
|
+
- **Mode A** — exact match against confirmed CC sentinel patterns (the bare form and the ISO-8601 timestamp variant). Mode A matches are eligible for normalization.
|
|
451
|
+
- **Mode B** — prefix-only match (text begins with `[Old tool result content cleared` but does not exactly match a Mode A pattern). Mode B is **diagnostic-only**: never normalized, dump records redact to a 64-char prefix only.
|
|
452
|
+
|
|
453
|
+
The Mode A/B separation protects against cases where the sentinel might be followed by user-derived content (e.g., a tool that echoed user input back into its result) — the redaction guarantee on Mode B keeps that content out of the diagnostic dump.
|
|
454
|
+
|
|
455
|
+
| Env var | Default | Purpose |
|
|
456
|
+
|---------|---------|---------|
|
|
457
|
+
| `CACHE_FIX_DUMP_MICROCOMPACT` | unset | Path for diagnostic JSONL dump of detected sentinels. Read-only — no mutation. |
|
|
458
|
+
| `CACHE_FIX_NORMALIZE_MICROCOMPACT` | unset | Enable normalization (`=1` opts in). Mutates Mode A matches to canonical form. |
|
|
459
|
+
| `CACHE_FIX_MICROCOMPACT_NORMALIZED` | `[Old tool result content cleared]` | Override the canonical replacement string. |
|
|
460
|
+
| `CACHE_FIX_MICROCOMPACT_SENTINEL_PATTERN_<N>` | unset | Add custom Mode A regex pattern(s). Numbered (1-indexed, sparse OK). |
|
|
461
|
+
| `CACHE_FIX_MICROCOMPACT_SENTINEL_PREFIX_<N>` | unset | Custom Mode B literal prefix(es). Pair with a custom Mode A pattern from a non-default sentinel family so prefix-only variants of that family also get redacted Mode B capture. |
|
|
462
|
+
| `CACHE_FIX_MICROCOMPACT_REDACT_LEN` | `64` | Mode B prefix length in dump records. Set to `0` to suppress the prefix entirely. |
|
|
463
|
+
| `CACHE_FIX_DUMP_MICROCOMPACT_INCLUDE_NORMALIZED` | unset | Add post-normalization text alongside (not replacing) raw `sentinel_text` in dump records. |
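
The Mode A / Mode B split can be sketched as below. The regular expressions are assumptions reconstructed from the two sentinel forms quoted in this section (bare, and with an ISO-8601 timestamp); the extension's real patterns may differ, and `classifySentinel` is a hypothetical name.

```javascript
// Assumed Mode A patterns: the bare sentinel and an ISO-8601 timestamp variant.
const MODE_A_PATTERNS = [
  /^\[Old tool result content cleared\]$/,
  /^\[Old tool result content cleared at \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z\]$/,
];
const MODE_B_PREFIX = "[Old tool result content cleared";
const CANONICAL = "[Old tool result content cleared]";
const REDACT_LEN = 64; // documented default of CACHE_FIX_MICROCOMPACT_REDACT_LEN

// Hypothetical classifier: Mode A is eligible for normalization,
// Mode B is diagnostic-only and exposes at most a short prefix.
function classifySentinel(text) {
  if (typeof text !== "string") return null;
  if (MODE_A_PATTERNS.some((re) => re.test(text))) {
    return { mode: "A", normalized: CANONICAL };
  }
  if (text.startsWith(MODE_B_PREFIX)) {
    return { mode: "B", redacted: text.slice(0, REDACT_LEN) };
  }
  return null;
}

console.log(classifySentinel("[Old tool result content cleared at 2026-04-30T13:42:11Z]"));
console.log(classifySentinel("[Old tool result content cleared] followed by echoed text"));
console.log(classifySentinel("ordinary tool output"));
```

Two Mode A sentinels that differ only in their timestamp both map to the same canonical string, which is the property that keeps a repeat microcompact pass from changing the bytes at that position.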
|
|
464
|
+
|
|
412
465
|
## System prompt rewrite (preload mode, optional)
|
|
413
466
|
|
|
414
467
|
The interceptor can rewrite Claude Code's `# Output efficiency` system-prompt section. Disabled by default. Enable with `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. See [docs/output-efficiency-prompts.md](docs/output-efficiency-prompts.md) for the three known prompt variants and usage instructions.
|
|
415
468
|
|
|
416
469
|
## Monitoring & diagnostics
|
|
417
470
|
|
|
418
|
-
The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works in both proxy and preload modes via `~/.claude/quota-status.json
|
|
471
|
+
The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works in both proxy and preload modes via `~/.claude/quota-status/` (proxy: per-session split) or `~/.claude/quota-status.json` (preload: single-session legacy path).
|
|
419
472
|
|
|
420
473
|
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the bundled quota analysis tool.
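
For readers consuming the proxy-mode split, the sketch below shows one way the account-global and per-session records might be combined. It is illustrative only: `mergeQuotaStatus` is a hypothetical helper, `q5h_pct` is a placeholder rather than a confirmed `account.json` field name, and the values are made up. The `cache`, `timestamp`, and `session_id` keys follow the session payload written by the `cache-telemetry` extension.

```javascript
// Hypothetical reader-side merge of the two proxy-mode records.
// A missing session record (for example, a session with no calls yet) yields null cache data.
function mergeQuotaStatus(account, session) {
  return {
    ...account,
    cache: session ? session.cache : null,
    session_id: session ? session.session_id : null,
  };
}

// Illustrative records; field values are invented for the example.
const account = { q5h_pct: 42, timestamp: "2026-04-30T13:42:11.000Z" };
const session = {
  cache: { ttl_tier: "1h", hit_rate: 0.97 },
  timestamp: "2026-04-30T13:42:11.000Z",
  session_id: "abc-123",
};

console.log(mergeQuotaStatus(account, session));
console.log(mergeQuotaStatus(account, null));
```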
|
|
421
474
|
|
|
@@ -440,6 +493,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
|
|
|
440
493
|
## Used in production
|
|
441
494
|
|
|
442
495
|
- **[Crunchloop DAP](https://dap.crunchloop.ai)** — Agent SDK / DAP development environment. First production team to merge the interceptor to trunk for team-wide deployment (2026-04-10). Identified two distinct cache regression patterns through real-world testing — tool ordering jitter and the fresh-session sort gap — and contributed debug traces that drove the v1.5.1 and v1.6.2 fixes.
|
|
496
|
+
- **[VM Farms](https://vmfarms.com)** ([@vmfarms](https://github.com/vmfarms)) — Agent development environment running concurrent multi-runner workloads with `--resume --fork-session`. Surfaced three cache-fix proxy-mode bugs: the resume-marker regex no-op (#96), TTL tier detection gap vs preload mode (#97), and image-strip stderr leak past `CACHE_FIX_DEBUG` (#98) — all addressed in the v3.4.0 release.
|
|
443
497
|
|
|
444
498
|
## Contributors
|
|
445
499
|
|
|
@@ -456,6 +510,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
|
|
|
456
510
|
- **[@JEONG-JIWOO](https://github.com/JEONG-JIWOO)** — VS Code extension investigation: discovered `claudeCode.claudeProcessWrapper` as the working integration path, wrote the C wrapper for Windows (#16)
|
|
457
511
|
- **[@X-15](https://github.com/X-15)** — VS Code extension validation, per-fix health status analysis confirming safety check behavior on v2.1.105 (#16)
|
|
458
512
|
- **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
|
|
513
|
+
- **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
|
|
459
514
|
|
|
460
515
|
If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
|
|
461
516
|
|
package/README.zh.md
CHANGED
|
@@ -39,7 +39,7 @@ ANTHROPIC_BASE_URL=http://127.0.0.1:9801 claude
|
|
|
39
39
|
| `identity-normalization` | Normalizes message identity fields to keep the prefix stable |
|
|
40
40
|
| `fresh-session-sort` | Fixes non-deterministic ordering on the first turn |
|
|
41
41
|
| `cache-control-normalize` | Normalizes cache_control markers across messages |
|
|
42
|
-
| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status.json` |
|
|
42
|
+
| `cache-telemetry` | Extracts cache stats from response headers → `~/.claude/quota-status/{account.json,sessions/<id>.json}` |
|
|
43
43
|
|
|
44
44
|
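Extensions support hot reload: add, remove, or modify `.mjs` files in `proxy/extensions/` and the changes take effect on the next request, with no restart needed. Configuration is in `proxy/extensions.json`.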
|
|
45
45
|
|
|
@@ -167,7 +167,7 @@ NODE_OPTIONS="--import claude-code-cache-fix" claude
|
|
|
167
167
|
|
|
168
168
|
## Status line: real-time quota warnings
|
|
169
169
|
|
|
170
|
-
|
|
170
|
+
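Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits it into `~/.claude/quota-status/account.json` (account-level: Q5h/Q7d, status, overage) and `~/.claude/quota-status/sessions/<id>.json` (per-session: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The bundled `tools/quota-statusline.sh` script displays a live status line: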
|
|
171
171
|
|
|
172
172
|
- **Q5h %** with burn rate (%/min)
|
|
173
173
|
- **Q7d %** with burn rate (%/hr)
|
|
@@ -200,7 +200,7 @@ export CACHE_FIX_IMAGE_KEEP_LAST=3
|
|
|
200
200
|
|
|
201
201
|
## Monitoring and diagnostics
|
|
202
202
|
|
|
203
|
-
The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook
|
|
203
|
+
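The preload interceptor includes monitoring for microcompact degradation, false rate limiters, GrowthBook flag state, usage telemetry, and cost reporting. Quota tracking works via `~/.claude/quota-status/` (proxy: per-session split) or `~/.claude/quota-status.json` (preload: single-session legacy path).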
|
|
204
204
|
|
|
205
205
|
See [docs/monitoring.md](docs/monitoring.md) for full details, debug mode, prefix diffing, environment variables, and the built-in quota analysis tool.
|
|
206
206
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-code-cache-fix",
|
|
3
|
-
"version": "3.
|
|
3
|
+
"version": "3.5.0",
|
|
4
4
|
"description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"exports": "./preload.mjs",
|
|
@@ -53,5 +53,5 @@
|
|
|
53
53
|
"url": "https://buymeacoffee.com/vsits"
|
|
54
54
|
},
|
|
55
55
|
"license": "MIT",
|
|
56
|
-
"author": "Chris Nighswonger <
|
|
56
|
+
"author": "Chris Nighswonger <dev@vsits.co> (https://vsits.co)"
|
|
57
57
|
}
|
|
package/proxy/extensions/cache-telemetry.mjs
CHANGED
|
@@ -1,8 +1,63 @@
|
|
|
1
|
-
import {
|
|
1
|
+
import {
|
|
2
|
+
writeFileSync,
|
|
3
|
+
renameSync,
|
|
4
|
+
unlinkSync,
|
|
5
|
+
mkdirSync,
|
|
6
|
+
readdirSync,
|
|
7
|
+
statSync,
|
|
8
|
+
} from "node:fs";
|
|
2
9
|
import { join } from "node:path";
|
|
3
10
|
import { homedir } from "node:os";
|
|
11
|
+
import { createHash, randomBytes } from "node:crypto";
|
|
4
12
|
|
|
5
|
-
|
|
13
|
+
// Paths are resolved per call (not cached at module load) so tests can swap
|
|
14
|
+
// $HOME between cases. The homedir() call is essentially free.
|
|
15
|
+
function paths() {
|
|
16
|
+
const home = homedir();
|
|
17
|
+
const quotaDir = join(home, ".claude", "quota-status");
|
|
18
|
+
return {
|
|
19
|
+
quotaDir,
|
|
20
|
+
accountPath: join(quotaDir, "account.json"),
|
|
21
|
+
sessionsDir: join(quotaDir, "sessions"),
|
|
22
|
+
legacyPath: join(home, ".claude", "quota-status.json"),
|
|
23
|
+
};
|
|
24
|
+
}
|
|
25
|
+
|
|
26
|
+
const SAFE_NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
|
|
27
|
+
const SWEEP_THROTTLE_MS = 60_000;
|
|
28
|
+
const DEFAULT_TTL_DAYS = 7;
|
|
29
|
+
|
|
30
|
+
// --- Module-scope state ---
|
|
31
|
+
let legacyCleanupDone = false;
|
|
32
|
+
let lastSweepMs = 0;
|
|
33
|
+
|
|
34
|
+
// Per directive `proxy-quota-status-per-session.md` — derive a filesystem-safe
|
|
35
|
+
// filename from a raw session id. Both writer (this extension) and readers
|
|
36
|
+
// (tools/quota-statusline.sh, etc.) must apply the same rule.
|
|
37
|
+
//
|
|
38
|
+
// Rules:
|
|
39
|
+
// - null/undefined/empty/whitespace → "unknown"
|
|
40
|
+
// - matches /^[A-Za-z0-9_-]{1,128}$/ → raw passthrough
|
|
41
|
+
// - else → "inv-" + sha256(raw)[:16]
|
|
42
|
+
//
|
|
43
|
+
// Exported for unit testing and for the directive's writer/reader contract.
|
|
44
|
+
export function sessionFilename(rawId) {
|
|
45
|
+
if (rawId === null || rawId === undefined) return "unknown";
|
|
46
|
+
const s = String(rawId).trim();
|
|
47
|
+
if (s.length === 0) return "unknown";
|
|
48
|
+
if (SAFE_NAME_RE.test(s)) return s;
|
|
49
|
+
return "inv-" + createHash("sha256").update(s).digest("hex").slice(0, 16);
|
|
50
|
+
}
|
|
51
|
+
|
|
52
|
+
function resolveSessionId(headers) {
|
|
53
|
+
if (!headers) return null;
|
|
54
|
+
const sid =
|
|
55
|
+
headers["x-claude-code-session-id"] ||
|
|
56
|
+
headers["x-session-id"] ||
|
|
57
|
+
headers["x-anthropic-session-id"] ||
|
|
58
|
+
null;
|
|
59
|
+
return sid || null;
|
|
60
|
+
}
|
|
6
61
|
|
|
7
62
|
function parseHeaders(headers) {
|
|
8
63
|
const get = (key) => headers[key] || "";
|
|
@@ -44,9 +99,56 @@ function parseHeaders(headers) {
|
|
|
44
99
|
};
|
|
45
100
|
}
|
|
46
101
|
|
|
102
|
+
function atomicWrite(finalPath, content) {
|
|
103
|
+
const tmp = `${finalPath}.tmp.${process.pid}.${randomBytes(4).toString("hex")}`;
|
|
104
|
+
writeFileSync(tmp, content);
|
|
105
|
+
renameSync(tmp, finalPath);
|
|
106
|
+
}
|
|
107
|
+
|
|
108
|
+
function cleanupLegacyOnce() {
|
|
109
|
+
if (legacyCleanupDone) return;
|
|
110
|
+
legacyCleanupDone = true;
|
|
111
|
+
try {
|
|
112
|
+
unlinkSync(paths().legacyPath);
|
|
113
|
+
} catch {}
|
|
114
|
+
}
|
|
115
|
+
|
|
116
|
+
function sweepStaleSessions(ttlDays) {
|
|
117
|
+
const now = Date.now();
|
|
118
|
+
if (now - lastSweepMs < SWEEP_THROTTLE_MS) return;
|
|
119
|
+
lastSweepMs = now;
|
|
120
|
+
|
|
121
|
+
const cutoffMs = now - ttlDays * 86_400_000;
|
|
122
|
+
const { sessionsDir } = paths();
|
|
123
|
+
let entries;
|
|
124
|
+
try {
|
|
125
|
+
entries = readdirSync(sessionsDir);
|
|
126
|
+
} catch {
|
|
127
|
+
return;
|
|
128
|
+
}
|
|
129
|
+
for (const name of entries) {
|
|
130
|
+
const p = join(sessionsDir, name);
|
|
131
|
+
try {
|
|
132
|
+
const st = statSync(p);
|
|
133
|
+
if (st.mtimeMs < cutoffMs) {
|
|
134
|
+
try {
|
|
135
|
+
unlinkSync(p);
|
|
136
|
+
} catch {}
|
|
137
|
+
}
|
|
138
|
+
} catch {}
|
|
139
|
+
}
|
|
140
|
+
}
|
|
141
|
+
|
|
142
|
+
function getTtlDays() {
|
|
143
|
+
const raw = process.env.CACHE_FIX_QUOTA_STATUS_TTL_DAYS;
|
|
144
|
+
if (raw === undefined || raw === "") return DEFAULT_TTL_DAYS;
|
|
145
|
+
const n = Number(raw);
|
|
146
|
+
return Number.isFinite(n) && n >= 0 ? n : DEFAULT_TTL_DAYS;
|
|
147
|
+
}
|
|
148
|
+
|
|
47
149
|
export default {
|
|
48
150
|
name: "cache-telemetry",
|
|
49
|
-
description: "Extract cache stats from response stream, persist quota state to ~/.claude/quota-status.json",
|
|
151
|
+
description: "Extract cache stats from response stream, persist quota state to ~/.claude/quota-status/{account.json,sessions/<filename>.json}",
|
|
50
152
|
order: 600,
|
|
51
153
|
|
|
52
154
|
async onResponseStart(ctx) {
|
|
@@ -56,6 +158,7 @@ export default {
|
|
|
56
158
|
if (!quota) return;
|
|
57
159
|
|
|
58
160
|
ctx.meta._quotaData = quota;
|
|
161
|
+
ctx.meta._sessionId = resolveSessionId(ctx.headers);
|
|
59
162
|
},
|
|
60
163
|
|
|
61
164
|
async onStreamEvent(ctx) {
|
|
@@ -89,24 +192,43 @@ export default {
|
|
|
89
192
|
|
|
90
193
|
const ttl = cr > 0 ? "1h" : (cc > 0 ? "5m" : "unknown");
|
|
91
194
|
|
|
92
|
-
const
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
195
|
+
const timestamp = new Date().toISOString();
|
|
196
|
+
const rawSid = ctx.meta._sessionId;
|
|
197
|
+
const filename = sessionFilename(rawSid);
|
|
198
|
+
|
|
199
|
+
const accountPayload = JSON.stringify({ ...quota, timestamp }, null, 2);
|
|
200
|
+
const sessionPayload = JSON.stringify(
|
|
201
|
+
{
|
|
202
|
+
cache: {
|
|
203
|
+
ttl_tier: ttl,
|
|
204
|
+
cache_creation: cc,
|
|
205
|
+
cache_read: cr,
|
|
206
|
+
ephemeral_1h: ephemeral1h,
|
|
207
|
+
ephemeral_5m: ephemeral5m,
|
|
208
|
+
hit_rate: hitRate,
|
|
209
|
+
timestamp,
|
|
210
|
+
},
|
|
211
|
+
timestamp,
|
|
212
|
+
session_id: rawSid,
|
|
101
213
|
},
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
214
|
+
null,
|
|
215
|
+
2,
|
|
216
|
+
);
|
|
105
217
|
|
|
106
218
|
try {
|
|
107
|
-
|
|
108
|
-
|
|
219
|
+
cleanupLegacyOnce();
|
|
220
|
+
const { sessionsDir, accountPath } = paths();
|
|
221
|
+
mkdirSync(sessionsDir, { recursive: true });
|
|
222
|
+
atomicWrite(accountPath, accountPayload);
|
|
223
|
+
atomicWrite(join(sessionsDir, `${filename}.json`), sessionPayload);
|
|
224
|
+
sweepStaleSessions(getTtlDays());
|
|
109
225
|
} catch {}
|
|
110
226
|
}
|
|
111
227
|
},
|
|
228
|
+
|
|
229
|
+
// Test-only: reset module state between tests.
|
|
230
|
+
__resetForTests() {
|
|
231
|
+
legacyCleanupDone = false;
|
|
232
|
+
lastSweepMs = 0;
|
|
233
|
+
},
|
|
112
234
|
};
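
The retention rule used by the stale-session sweep can be restated as two pure functions. This is a sketch for illustration: `parseTtlDays` follows the same parsing rule as `getTtlDays()` above but takes the raw value as an argument, and `isStale` is a hypothetical name for the `mtimeMs < cutoffMs` comparison.

```javascript
const DEFAULT_TTL_DAYS = 7;
const MS_PER_DAY = 86_400_000;

// Same rule as getTtlDays(): unset, empty, non-numeric, or negative falls back to 7.
function parseTtlDays(raw) {
  if (raw === undefined || raw === "") return DEFAULT_TTL_DAYS;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : DEFAULT_TTL_DAYS;
}

// A session file is stale when its mtime is strictly older than now - ttlDays.
function isStale(mtimeMs, nowMs, ttlDays) {
  return mtimeMs < nowMs - ttlDays * MS_PER_DAY;
}

const now = Date.UTC(2026, 3, 30); // fixed reference time for the example
console.log(parseTtlDays(undefined), parseTtlDays("0.5"), parseTtlDays("abc"));
console.log(isStale(now - 8 * MS_PER_DAY, now, 7), isStale(now - 6 * MS_PER_DAY, now, 7));
```

Note that a value of `0` is accepted by this rule, in which case any file with an mtime earlier than the sweep's own timestamp counts as stale.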
|
|
package/proxy/extensions/identity-normalization.mjs
CHANGED
|
@@ -2,7 +2,7 @@ import { createHash } from "node:crypto";
|
|
|
2
2
|
|
|
3
3
|
const _pinnedBlocks = new Map();
|
|
4
4
|
|
|
5
|
-
const SESSION_START_RESUME_MARKER = /SessionStart:
|
|
5
|
+
const SESSION_START_RESUME_MARKER = /SessionStart:resume hook success:/g;
|
|
6
6
|
const SESSION_START_ID_TAG = /\n?<session-id>[^<]*<\/session-id>/g;
|
|
7
7
|
const SESSION_START_LAST_ACTIVE_LINE = /\nLast active:[^\n]*/g;
|
|
8
8
|
const CONTINUE_TRAILER_TEXT = "Continue from where you left off.";
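
To show what the two field-stripping patterns above match, here is a small sketch. The regular expressions are copied from the file; how the extension actually applies them is not visible in this hunk, so `stripVolatileFields` and the sample text are assumptions made for illustration.

```javascript
const SESSION_START_ID_TAG = /\n?<session-id>[^<]*<\/session-id>/g;
const SESSION_START_LAST_ACTIVE_LINE = /\nLast active:[^\n]*/g;

// Hypothetical helper: remove per-session fields so that otherwise identical
// hook output becomes byte-identical across sessions.
function stripVolatileFields(text) {
  return text
    .replace(SESSION_START_ID_TAG, "")
    .replace(SESSION_START_LAST_ACTIVE_LINE, "");
}

const first = "hook output\n<session-id>abc</session-id>\nLast active: 2026-04-30\nbody";
const second = "hook output\n<session-id>xyz</session-id>\nLast active: 2026-05-01\nbody";

console.log(JSON.stringify(stripVolatileFields(first)));
console.log(stripVolatileFields(first) === stripVolatileFields(second));
```

Because both patterns carry the `g` flag, `String.prototype.replace` removes every occurrence rather than only the first.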
|
|
package/proxy/extensions/image-strip.mjs
CHANGED
|
@@ -48,6 +48,9 @@ function getRequestSizeMax() {
|
|
|
48
48
|
const v = parseInt(process.env.CACHE_FIX_IMAGE_REQUEST_SIZE_MAX || "31457280", 10);
|
|
49
49
|
return v > 0 ? v : 31457280;
|
|
50
50
|
}
|
|
51
|
+
function isDebug() {
|
|
52
|
+
return process.env.CACHE_FIX_DEBUG === "1";
|
|
53
|
+
}
|
|
51
54
|
function getImageCountMax() {
|
|
52
55
|
// Default 100 — single cap covering the only model family in active CC use.
|
|
53
56
|
// Users on legacy Claude 1/2.x/Instant who genuinely need 600 can override.
|
|
@@ -649,7 +652,9 @@ export default {
|
|
|
649
652
|
|
|
650
653
|
if (logParts.length > 0) {
|
|
651
654
|
ctx.body.messages = messages;
|
|
652
|
-
|
|
655
|
+
if (isDebug()) {
|
|
656
|
+
process.stderr.write(`[image-strip] ${logParts.join("; ")}\n`);
|
|
657
|
+
}
|
|
653
658
|
}
|
|
654
659
|
return;
|
|
655
660
|
}
|
|
@@ -676,7 +681,7 @@ export default {
|
|
|
676
681
|
stats.resize_succeeded > 0 ||
|
|
677
682
|
stats.unsupported_format_count > 0 ||
|
|
678
683
|
stats.dimension_probe_fail_count > 0;
|
|
679
|
-
if (didSomething) {
|
|
684
|
+
if (didSomething && isDebug()) {
|
|
680
685
|
const parts = [];
|
|
681
686
|
if (stats.resize_succeeded > 0) parts.push(`resized=${stats.resize_succeeded}`);
|
|
682
687
|
if (stats.resize_failed > 0) parts.push(`resize_failed=${stats.resize_failed}`);
|