claude-code-cache-fix 3.5.2 → 3.5.4

package/README.ko.md CHANGED
@@ -244,6 +244,12 @@ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
244
244
 
245
245
  Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can run `git status` itself through the Bash tool when it needs that context. Community-verified by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18 tokens of cache creation across git state changes (versus thousands of tokens without this flag).
246
246
 
247
+ ## Migration: v3.4.x → v3.5.0+
248
+
249
+ > **Translation needed.** This section is currently available in English only. If you wrote a custom statusline, monitoring script, or other tool that reads `~/.claude/quota-status.json` directly, see the ["Migration: v3.4.x → v3.5.0+" section of the English README](README.md#migration-v34x--v350) for details on the v3.5.0 proxy-mode split and the consumer-side migration pattern (try the new path, then fall back to the legacy path).
250
+ >
251
+ > Want to contribute a Korean translation? PRs are welcome.
252
+
247
253
  ## Image stripping (preload mode)
248
254
 
249
255
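  Images read via the Read tool are base64-encoded and stored in conversation history, then sent along with every subsequent API call. A single 500KB image costs about 62,500 tokens per turn on Opus 4.6, and **about 85,000+ tokens on Opus 4.7** because of the new tokenizer. Image stripping is strongly recommended on 4.7.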
package/README.md CHANGED
@@ -324,6 +324,120 @@ Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Co
324
324
 
325
325
  **Why we don't ship a proxy extension for this:** the proxy intercepts requests after Claude Code has already composed the system prompt — by then the volatile `git status` text is already part of the prefix that the model conditioned on in the previous turn, and stripping it post-hoc would itself bust the cache. The fix has to happen at the source. `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1` prevents the injection before the prompt is composed, which is why the native flag is the right tool. Stripping post-hoc would also remove model-visible context that an explicit Bash call can recover, and would risk false-positive matches against assistant-written text.
326
326
 
327
+ ## Migration: v3.4.x → v3.5.0+
328
+
329
+ If you wrote a custom statusline, monitoring script, or anything else that reads `~/.claude/quota-status.json` directly, this section is for you. v3.5.0 split that file in proxy mode; preload mode is unchanged.
330
+
331
+ ### What changed
332
+
333
+ | | v3.4.x and earlier (proxy + preload) | v3.5.0+ proxy mode | v3.5.0+ preload mode |
334
+ |---|---|---|---|
335
+ | Quota fields (Q5h, Q7d, status, overage) | `~/.claude/quota-status.json` | `~/.claude/quota-status/account.json` | `~/.claude/quota-status.json` (legacy path) |
336
+ | Cache fields (TTL tier, hit rate, cache_creation/read) | same file as above | `~/.claude/quota-status/sessions/<filename>.json` | same file as above |
337
+ | Multi-session attribution | none — last writer wins | per-session files | preload is single-session by construction |
338
+
339
+ `<filename>` is derived from the request's `x-claude-code-session-id` header via a deterministic safe-name rule: UUIDs and other ids matching `[A-Za-z0-9_-]{1,128}` pass through; null/empty/whitespace become `unknown`; anything else is mapped to `inv-<sha256-prefix>`. Full rule is documented at [`docs/directives/proxy-quota-status-per-session.md`](docs/directives/proxy-quota-status-per-session.md).
340
+
341
+ The legacy `~/.claude/quota-status.json` is auto-deleted on the first proxy-mode write after upgrade. Per-session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS` (default `7`) are swept on write.
342
+
343
+ ### Consumer-side migration pattern
344
+
345
+ Your script should try the v3.5.0+ proxy paths first and fall back to the legacy path if not present. That way it works in both modes (and on hosts mid-upgrade). The session id usually comes from Claude Code's stdin when it invokes a statusline hook; for other consumers, capture it from the most-recently-modified `~/.claude/projects/*/*.jsonl` filename.
346
+
347
+ **Bash (statusline-style):**
348
+ ```bash
349
+ QS_DIR="$HOME/.claude/quota-status"
350
+ ACCOUNT="$QS_DIR/account.json"
351
+ LEGACY="$HOME/.claude/quota-status.json"
352
+
353
+ # Canonical filename rule — must mirror proxy/extensions/cache-telemetry.mjs
354
+ # sessionFilename(): trim, then "" → unknown, safe regex passthrough, else
355
+ # inv-<sha256-prefix>. Without this, malformed or whitespace ids miss the
356
+ # per-session file even though the writer created one under the canonical name.
357
+ session_filename() {
358
+ local trimmed
359
+ trimmed="$(printf '%s' "$1" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
360
+ if [ -z "$trimmed" ]; then echo unknown; return; fi
361
+ if printf '%s' "$trimmed" | grep -qE '^[A-Za-z0-9_-]{1,128}$'; then
362
+ printf '%s' "$trimmed"
363
+ else
364
+ # sha256sum on Linux; shasum -a 256 on macOS. Both emit "<hex> -".
365
+ local hash
366
+ if command -v sha256sum >/dev/null 2>&1; then
367
+ hash="$(printf '%s' "$trimmed" | sha256sum)"
368
+ else
369
+ hash="$(printf '%s' "$trimmed" | shasum -a 256)"
370
+ fi
371
+ printf 'inv-%s' "$(printf '%s' "$hash" | cut -c1-16)"
372
+ fi
373
+ }
374
+
375
+ # session id: prefer CC stdin, fall back to most-recent jsonl
376
+ sid="$(jq -r '.session_id // empty' 2>/dev/null < /dev/stdin || true)"
377
+ if [ -z "$sid" ]; then
378
+ sid="$(ls -t "$HOME"/.claude/projects/*/*.jsonl 2>/dev/null | head -1 | xargs -I{} basename {} .jsonl)"
379
+ fi
380
+ filename="$(session_filename "$sid")"
381
+
382
+ # quota: account.json (v3.5.0+) → fall back to legacy
383
+ if [ -f "$ACCOUNT" ]; then
384
+ quota_json="$(cat "$ACCOUNT")"
385
+ elif [ -f "$LEGACY" ]; then
386
+ quota_json="$(cat "$LEGACY")"
387
+ fi
388
+
389
+ # cache: sessions/<filename>.json (v3.5.0+) → fall back to legacy
390
+ if [ -f "$QS_DIR/sessions/$filename.json" ]; then
391
+ cache_json="$(cat "$QS_DIR/sessions/$filename.json")"
392
+ elif [ -f "$LEGACY" ]; then
393
+ cache_json="$(cat "$LEGACY")"
394
+ fi
395
+ ```
396
+
397
+ **Node:**
398
+ ```js
399
+ import { readFileSync, existsSync } from "node:fs";
400
+ import { homedir } from "node:os";
401
+ import { join } from "node:path";
402
+ import { createHash } from "node:crypto";
403
+
404
+ const home = homedir();
405
+ const accountPath = join(home, ".claude", "quota-status", "account.json");
406
+ const legacyPath = join(home, ".claude", "quota-status.json");
407
+
408
+ const SAFE_NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
409
+
410
+ // Mirror of cache-telemetry.mjs sessionFilename(). Reader-side rule must match
411
+ // writer-side rule; otherwise malformed/whitespace ids miss their per-session file.
412
+ function sessionFilename(rawId) {
413
+ if (rawId === null || rawId === undefined) return "unknown";
414
+ const s = String(rawId).trim();
415
+ if (s.length === 0) return "unknown";
416
+ if (SAFE_NAME_RE.test(s)) return s;
417
+ return "inv-" + createHash("sha256").update(s).digest("hex").slice(0, 16);
418
+ }
419
+
420
+ function readQuotaJson() {
421
+ if (existsSync(accountPath)) return JSON.parse(readFileSync(accountPath, "utf8"));
422
+ if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
423
+ return null;
424
+ }
425
+
426
+ function readCacheJson(sessionId) {
427
+ const filename = sessionFilename(sessionId);
428
+ const p = join(home, ".claude", "quota-status", "sessions", `${filename}.json`);
429
+ if (existsSync(p)) return JSON.parse(readFileSync(p, "utf8"));
430
+ if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
431
+ return null;
432
+ }
433
+ ```
434
+
435
+ The shipped [`tools/quota-statusline.sh`](tools/quota-statusline.sh) is the reference implementation for the bash version. The [`/coffee` skill](https://github.com/cnighswonger/claude-code-coffee) v1.4.0 is the reference for the per-session warmth gate.
436
+
437
+ ### Why per-session
438
+
439
+ On multi-agent hosts (multiple Claude Code sessions sharing one proxy), the pre-v3.5.0 single global file caused every session to overwrite the others' cache stats with each response. A statusline reading from session A would show session B's TTL tier whenever B sent a request more recently. Per-session files plus an account-global quota file resolve this without losing the easy account-wide view. See [#104](https://github.com/cnighswonger/claude-code-cache-fix/issues/104) for the original report.
440
+
327
441
  ## Image stripping (preload mode)
328
442
 
329
443
  Images read via the Read tool persist as base64 in conversation history, riding along on every subsequent API call. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and **~85,000+ on Opus 4.7** due to the new tokenizer. Image stripping is strongly recommended on 4.7.
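The sweep rule from the migration section above (per-session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS`, default `7`, are removed on write) can be sketched as a pure age check. This is a minimal sketch under stated assumptions: `ttlDays` and `isStale` are hypothetical helper names, and both the fallback for invalid values and the treatment of the exact boundary are guesses, not behavior confirmed by the package.

```javascript
// Hypothetical helpers; only the env var name and its default of 7 come from the README.
const DAY_MS = 24 * 60 * 60 * 1000;

// Read the TTL from an env-like object; fall back to 7 on missing or invalid input (assumed).
function ttlDays(env) {
  const n = Number(env.CACHE_FIX_QUOTA_STATUS_TTL_DAYS);
  return Number.isFinite(n) && n > 0 ? n : 7;
}

// A per-session file counts as stale once its mtime is older than the TTL window.
function isStale(mtimeMs, nowMs, days) {
  return nowMs - mtimeMs > days * DAY_MS;
}

const now = Date.UTC(2026, 4, 8);
console.log(isStale(now - 8 * DAY_MS, now, ttlDays({}))); // 8 days old, 7-day TTL
console.log(isStale(now - 6 * DAY_MS, now, ttlDays({}))); // 6 days old, 7-day TTL
```

A consumer that caches per-session reads could apply the same check to avoid trusting a file the proxy is about to sweep.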
package/README.zh.md CHANGED
@@ -186,6 +186,120 @@ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
186
186
 
187
187
  Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Community-verified by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): only 18 tokens of cache creation across git state changes (thousands of tokens before disabling).
188
188
 
189
+ ## Migration: v3.4.x → v3.5.0+
190
+
191
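+ If you wrote a custom statusline, monitoring script, or other tool that reads `~/.claude/quota-status.json` directly, this section is for you. v3.5.0 split that file in proxy mode; preload mode is unchanged.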
192
+
193
+ ### What changed
194
+
195
+ | | v3.4.x and earlier (proxy + preload) | v3.5.0+ proxy mode | v3.5.0+ preload mode |
196
+ |---|---|---|---|
197
+ | Quota fields (Q5h, Q7d, status, overage) | `~/.claude/quota-status.json` | `~/.claude/quota-status/account.json` | `~/.claude/quota-status.json` (legacy path) |
198
+ | Cache fields (TTL tier, hit rate, cache_creation/read) | same file as above | `~/.claude/quota-status/sessions/<filename>.json` | same file as above |
199
+ | Multi-session attribution | none (last writer wins) | per-session files | preload is single-session by construction |
200
+
201
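+ `<filename>` is derived from the request's `x-claude-code-session-id` header via a deterministic safe-name rule: UUIDs and other ids matching `[A-Za-z0-9_-]{1,128}` pass through; empty/null/whitespace values map to `unknown`; anything else maps to `inv-<sha256-prefix>`. The full rule is documented in [`docs/directives/proxy-quota-status-per-session.md`](docs/directives/proxy-quota-status-per-session.md).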
202
+
203
+ The first proxy-mode write after upgrade automatically deletes the legacy `~/.claude/quota-status.json`. Per-session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS` (default `7`) are swept on write.
204
+
205
+ ### Consumer-side migration pattern
206
+
207
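+ Your script should try the v3.5.0+ proxy paths first and fall back to the legacy path on failure. That way it works in both modes (and on hosts mid-upgrade). The session id usually comes from stdin when Claude Code invokes a statusline hook; in other cases, capture it from the filename of the most recently modified `~/.claude/projects/*/*.jsonl`.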
208
+
209
+ **Bash (statusline-style):**
210
+ ```bash
211
+ QS_DIR="$HOME/.claude/quota-status"
212
+ ACCOUNT="$QS_DIR/account.json"
213
+ LEGACY="$HOME/.claude/quota-status.json"
214
+
215
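+ # Canonical filename rule: must mirror proxy/extensions/cache-telemetry.mjs
216
+ # sessionFilename(): trim, then "" → unknown, safe regex passthrough, else
217
+ # inv-<sha256-prefix>. Without this, malformed or whitespace ids miss the
218
+ # per-session file even though the writer created one under the canonical name.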
219
+ session_filename() {
220
+ local trimmed
221
+ trimmed="$(printf '%s' "$1" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
222
+ if [ -z "$trimmed" ]; then echo unknown; return; fi
223
+ if printf '%s' "$trimmed" | grep -qE '^[A-Za-z0-9_-]{1,128}$'; then
224
+ printf '%s' "$trimmed"
225
+ else
226
+ # sha256sum on Linux; shasum -a 256 on macOS. Both emit "<hex> -".
227
+ local hash
228
+ if command -v sha256sum >/dev/null 2>&1; then
229
+ hash="$(printf '%s' "$trimmed" | sha256sum)"
230
+ else
231
+ hash="$(printf '%s' "$trimmed" | shasum -a 256)"
232
+ fi
233
+ printf 'inv-%s' "$(printf '%s' "$hash" | cut -c1-16)"
234
+ fi
235
+ }
236
+
237
+ # session id: prefer CC stdin, fall back to most-recent jsonl
238
+ sid="$(jq -r '.session_id // empty' 2>/dev/null < /dev/stdin || true)"
239
+ if [ -z "$sid" ]; then
240
+ sid="$(ls -t "$HOME"/.claude/projects/*/*.jsonl 2>/dev/null | head -1 | xargs -I{} basename {} .jsonl)"
241
+ fi
242
+ filename="$(session_filename "$sid")"
243
+
244
+ # quota: account.json (v3.5.0+) → fall back to legacy
245
+ if [ -f "$ACCOUNT" ]; then
246
+ quota_json="$(cat "$ACCOUNT")"
247
+ elif [ -f "$LEGACY" ]; then
248
+ quota_json="$(cat "$LEGACY")"
249
+ fi
250
+
251
+ # cache: sessions/<filename>.json (v3.5.0+) → fall back to legacy
252
+ if [ -f "$QS_DIR/sessions/$filename.json" ]; then
253
+ cache_json="$(cat "$QS_DIR/sessions/$filename.json")"
254
+ elif [ -f "$LEGACY" ]; then
255
+ cache_json="$(cat "$LEGACY")"
256
+ fi
257
+ ```
258
+
259
+ **Node:**
260
+ ```js
261
+ import { readFileSync, existsSync } from "node:fs";
262
+ import { homedir } from "node:os";
263
+ import { join } from "node:path";
264
+ import { createHash } from "node:crypto";
265
+
266
+ const home = homedir();
267
+ const accountPath = join(home, ".claude", "quota-status", "account.json");
268
+ const legacyPath = join(home, ".claude", "quota-status.json");
269
+
270
+ const SAFE_NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
271
+
272
+ // Mirror of cache-telemetry.mjs sessionFilename(). Reader-side rule must match
273
+ // writer-side rule; otherwise malformed/whitespace ids miss their per-session file.
274
+ function sessionFilename(rawId) {
275
+ if (rawId === null || rawId === undefined) return "unknown";
276
+ const s = String(rawId).trim();
277
+ if (s.length === 0) return "unknown";
278
+ if (SAFE_NAME_RE.test(s)) return s;
279
+ return "inv-" + createHash("sha256").update(s).digest("hex").slice(0, 16);
280
+ }
281
+
282
+ function readQuotaJson() {
283
+ if (existsSync(accountPath)) return JSON.parse(readFileSync(accountPath, "utf8"));
284
+ if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
285
+ return null;
286
+ }
287
+
288
+ function readCacheJson(sessionId) {
289
+ const filename = sessionFilename(sessionId);
290
+ const p = join(home, ".claude", "quota-status", "sessions", `${filename}.json`);
291
+ if (existsSync(p)) return JSON.parse(readFileSync(p, "utf8"));
292
+ if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
293
+ return null;
294
+ }
295
+ ```
296
+
297
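+ The [`tools/quota-statusline.sh`](tools/quota-statusline.sh) shipped with the package is the reference implementation for the bash version. The [`/coffee` skill](https://github.com/cnighswonger/claude-code-coffee) v1.4.0 is the reference for the per-session keep-warm gate.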
298
+
299
+ ### Why per-session
300
+
301
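+ On multi-agent hosts (multiple Claude Code sessions sharing one proxy), the single global file used before v3.5.0 let every session overwrite the other sessions' cache stats with its own responses. A statusline reading for session A would show session B's TTL tier whenever B had sent a request more recently. Per-session files plus one account-level quota file fix this while keeping the account-wide view. See [#104](https://github.com/cnighswonger/claude-code-cache-fix/issues/104) for the original report.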
302
+
189
303
  ## Image stripping (preload mode)
190
304
 
191
305
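  Images read via the Read tool persist as base64 in conversation history and ride along on every subsequent API call. A single 500KB image costs about 62,500 tokens per turn on Opus 4.6, and **about 85,000+ tokens on Opus 4.7** (because of the new tokenizer). Enabling image stripping on 4.7 is strongly recommended.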
package/THIRD_PARTY_LICENSES ADDED
@@ -0,0 +1,15 @@
1
+ Third-Party Licenses
2
+ ====================
3
+
4
+ 1. claude-usage-dashboard — NDJSON proxy log schema
5
+
6
+ Source: https://github.com/fgrosswig/claude-usage-dashboard
7
+ Author: Falk Grosswig (@fgrosswig)
8
+ License: Apache License 2.0
9
+
10
+ The proxy NDJSON log schema used by tools/usage-to-dashboard-ndjson.mjs
11
+ (field names, structure, file naming convention, cache_health semantics,
12
+ and cost_factor methodology) originates from the claude-usage-dashboard.
13
+
14
+ Used under the Apache License, Version 2.0.
15
+ https://www.apache.org/licenses/LICENSE-2.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-code-cache-fix",
3
- "version": "3.5.2",
3
+ "version": "3.5.4",
4
4
  "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
5
5
  "type": "module",
6
6
  "exports": "./preload.mjs",
@@ -15,7 +15,8 @@
15
15
  "claude-fixed.bat",
16
16
  "proxy/",
17
17
  "bin/",
18
- "templates/"
18
+ "templates/",
19
+ "THIRD_PARTY_LICENSES"
19
20
  ],
20
21
  "engines": {
21
22
  "node": ">=18"
@@ -0,0 +1,251 @@
1
+ // rate-limit-log — append per-event record to ~/.claude/usage-log/rate-limit-events.jsonl
2
+ // when an upstream response carries the canonical Anthropic rate-limit error
3
+ // envelope. This is a SUPERSET of burst/concurrency events: the same
4
+ // envelope is returned for RPM/ITPM/OTPM and auto-mode classifier overflow.
5
+ // Splitting the classes is a post-analysis problem on the JSONL stream.
6
+ //
7
+ // See docs/directives/proxy-rate-limit-logging.md for the full design and
8
+ // the post-analysis playbook.
9
+ //
10
+ // Activation: enabled:false in the export default. Users opt in via
11
+ // "rate-limit-log": { "enabled": true, "order": 660 }
12
+ // in proxy/extensions.json. No env-var enable flag.
13
+ //
14
+ // Detection signature is grounded in 88 captured 429 responses from the
15
+ // 2026-05-08 00:06-00:21 UTC burst (15 min window, single account, full HTTP
16
+ // fidelity via tee between cache-fix-proxy and llm-relay). Brief at
17
+ // ~/git_repos/claude/docs/issues/cache-fix-429-burst-data-2026-05-08.md
18
+ // Across all 88: status === 429, content-type: application/json (no SSE),
19
+ // body.type === "error", body.error.type === "rate_limit_error",
20
+ // x-should-retry: "true". No Retry-After. No anthropic-ratelimit-* headers.
21
+ // Anthropic's `error.message` is literally "Error" — no class hint.
22
+ //
23
+ // SCOPE NOTE: this extension logs ALL `rate_limit_error` 429s. Per
24
+ // Anthropic's public docs, that error type is shared across the
25
+ // burst/concurrency limiter, classic RPM/ITPM/OTPM limiters, and (per
26
+ // Lead's 2026-05-08 follow-up) auto-mode classifier traffic on Opus 4.7.
27
+ // The response itself carries no signal that distinguishes those classes —
28
+ // `error.message` is literally "Error". Splitting burst-vs-RPM/TPM and
29
+ // classifier-vs-main-inference happens in post-analysis, using the
30
+ // recorded `requested_model`, `request_path`, inter-arrival timing, and
31
+ // `request_size_tokens` fields on each row. Do NOT treat this file as
32
+ // burst-limit-only evidence; it's a superset.
33
+
34
+ import { mkdir, appendFile } from "node:fs/promises";
35
+ import { readdirSync, statSync, readFileSync } from "node:fs";
36
+ import { join, dirname } from "node:path";
37
+ import { homedir } from "node:os";
38
+
39
+ // Paths resolved per call so tests can swap $HOME between cases. The
40
+ // homedir() call is essentially free.
41
+ function paths() {
42
+ const home = homedir();
43
+ return {
44
+ logPath: join(home, ".claude", "usage-log", "rate-limit-events.jsonl"),
45
+ accountPath: join(home, ".claude", "quota-status", "account.json"),
46
+ sessionsDir: join(home, ".claude", "quota-status", "sessions"),
47
+ };
48
+ }
49
+
50
+ const BODY_EXCERPT_MAX = 256;
51
+ const ACTIVE_SESSION_WINDOW_MS = 5 * 60 * 1000;
52
+
53
+ // --- Detection predicate ---
54
+ //
55
+ // Matches the canonical Anthropic rate-limit error envelope:
56
+ // { "type": "error", "error": { "type": "rate_limit_error", ... } } at 429
57
+ //
58
+ // Per Anthropic's public docs and the 2026-05-08 capture, this envelope is
59
+ // returned for every flavor of 429 — burst/concurrency, RPM, ITPM, OTPM,
60
+ // and auto-mode classifier overflow. The response itself carries no
61
+ // discriminator. Distinguishing the classes is a post-analysis problem
62
+ // over the JSONL using inter-arrival timing, `requested_model`, and
63
+ // `request_size_tokens`. See directive for the analysis playbook.
64
+ //
65
+ // Header signals (x-should-retry, request-id) are recorded in the row but
66
+ // NOT used as detection gates — Anthropic could change them independently
67
+ // of the body, and the body schema is the canonical contract.
68
+ export function isRateLimitResponse(ctx) {
69
+ if (!ctx || typeof ctx.status !== "number") return false;
70
+ if (ctx.status !== 429) return false;
71
+ const body = ctx.body;
72
+ if (!body || typeof body !== "object") return false;
73
+ return body.type === "error" && body.error?.type === "rate_limit_error";
74
+ }
75
+
76
+ // --- Field extractors (test seams) ---
77
+
78
+ export function estimateRequestSizeTokens(body) {
79
+ if (!body || typeof body !== "object") return 0;
80
+ let chars = 0;
81
+ if (typeof body.system === "string") chars += body.system.length;
82
+ if (Array.isArray(body.system)) {
83
+ for (const block of body.system) {
84
+ if (block && typeof block.text === "string") chars += block.text.length;
85
+ }
86
+ }
87
+ if (Array.isArray(body.messages)) {
88
+ for (const msg of body.messages) {
89
+ if (typeof msg?.content === "string") {
90
+ chars += msg.content.length;
91
+ } else if (Array.isArray(msg?.content)) {
92
+ for (const block of msg.content) {
93
+ if (typeof block?.text === "string") chars += block.text.length;
94
+ }
95
+ }
96
+ }
97
+ }
98
+ return Math.ceil(chars / 4);
99
+ }
100
+
101
+ // Bounds the persisted excerpt length, NOT the temporary serialization
102
+ // allocation. JSON.stringify still materializes the full body string before
103
+ // the slice. In practice this is fine — Anthropic's 429 bodies are ~120
104
+ // bytes per the 2026-05-08 capture, and the proxy already buffers the full
105
+ // body upstream of this extension (server.mjs:79-82). For string inputs the
106
+ // length is capped pre-stringify, so a hostile pre-rendered string can't
107
+ // blow up here. If a future call site ever passes a giant pre-built object
108
+ // graph, the upstream-buffering and per-extension try/catch isolate the
109
+ // allocation cost from the rest of the pipeline.
110
+ export function bodyExcerpt(body) {
111
+ if (body === undefined || body === null) return "";
112
+ if (typeof body === "string") return body.slice(0, BODY_EXCERPT_MAX);
113
+ let s;
114
+ try {
115
+ s = JSON.stringify(body);
116
+ } catch {
117
+ s = String(body);
118
+ }
119
+ return s.slice(0, BODY_EXCERPT_MAX);
120
+ }
121
+
122
+ export function isPeakHourOldSchedule(now = new Date()) {
123
+ const day = now.getUTCDay(); // 0 = Sun, 1..5 = Mon..Fri, 6 = Sat
124
+ const hour = now.getUTCHours();
125
+ return day >= 1 && day <= 5 && hour >= 13 && hour < 19;
126
+ }
127
+
128
+ export function countActiveSessions(now = Date.now(), sessionsDir = paths().sessionsDir) {
129
+ let entries;
130
+ try {
131
+ entries = readdirSync(sessionsDir);
132
+ } catch {
133
+ return 0;
134
+ }
135
+ let count = 0;
136
+ const cutoff = now - ACTIVE_SESSION_WINDOW_MS;
137
+ for (const name of entries) {
138
+ try {
139
+ const st = statSync(join(sessionsDir, name));
140
+ if (st.mtimeMs >= cutoff) count++;
141
+ } catch {}
142
+ }
143
+ return count;
144
+ }
145
+
146
+ export function readQ5hPctAtEvent(accountPath = paths().accountPath) {
147
+ try {
148
+ const data = JSON.parse(readFileSync(accountPath, "utf8"));
149
+ return data?.five_hour?.pct ?? null;
150
+ } catch {
151
+ return null;
152
+ }
153
+ }
154
+
155
+ export function buildRecord({ ctx, now = new Date() }) {
156
+ // Anthropic's error responses carry the request id in TWO places: the
157
+ // `request-id` response header and the body's `request_id` field. Prefer
158
+ // body (canonical), fall back to header.
159
+ const headerReqId = ctx?.headers?.["request-id"] || null;
160
+ const bodyReqId = (ctx?.body && typeof ctx.body === "object")
161
+ ? (ctx.body.request_id || null)
162
+ : null;
163
+ const xShouldRetry = ctx?.headers?.["x-should-retry"] || null;
164
+
165
+ return {
166
+ schema_version: 1,
167
+ ts: now.toISOString(),
168
+ type: "rate_limit",
169
+ session_id: ctx?.meta?._sessionId ?? null,
170
+ requested_model: ctx?.meta?._requestedModel ?? null,
171
+ request_path: ctx?.meta?._requestPath || "/v1/messages",
172
+ request_size_tokens: ctx?.meta?._requestSizeTokens ?? 0,
173
+ response_status: ctx?.status ?? null,
174
+ response_body_excerpt: bodyExcerpt(ctx?.body),
175
+ concurrent_sessions_estimate: countActiveSessions(now.getTime()),
176
+ q5h_pct_at_event: readQ5hPctAtEvent(),
177
+ peak_hour_old_schedule: isPeakHourOldSchedule(now),
178
+ upstream_request_id: bodyReqId || headerReqId,
179
+ x_should_retry: xShouldRetry,
180
+ // Stable id of the underlying TCP socket that carried this request,
181
+ // assigned in proxy/upstream.mjs via WeakMap<Socket, id>. Persists across
182
+ // keep-alive reuse, recycles on socket close. Null if upstream errored
183
+ // before a socket was assigned. Populated by server.mjs after
184
+ // forwardRequest resolves. Use for H3-vs-H4 verification per Lead's
185
+ // 2026-05-08 brief: if 429s cluster on one connection id, the limiter
186
+ // is per-connection (H3); if they spread across many, client-side
187
+ // queue saturation (H4) is more likely.
188
+ upstream_connection_id: ctx?.meta?._upstreamConnectionId ?? null,
189
+ };
190
+ }
191
+
192
+ // --- I/O ---
193
+
194
+ async function appendJsonl(record, path = paths().logPath) {
195
+ await mkdir(dirname(path), { recursive: true });
196
+ await appendFile(path, JSON.stringify(record) + "\n");
197
+ }
198
+
199
+ // Test helper: write to a caller-supplied path (bypasses default).
200
+ export async function writeRecord(record, path) {
201
+ await mkdir(dirname(path), { recursive: true });
202
+ await appendFile(path, JSON.stringify(record) + "\n");
203
+ }
204
+
205
+ // Exported so tests / external diagnostics can resolve the current path.
206
+ export function getLogPath() {
207
+ return paths().logPath;
208
+ }
209
+
210
+ // --- Extension contract ---
211
+
212
+ export default {
213
+ name: "rate-limit-log",
214
+ description: "Append rate-limit incident records to ~/.claude/usage-log/rate-limit-events.jsonl (opt-in)",
215
+ enabled: false,
216
+ order: 660,
217
+
218
+ async onRequest(ctx) {
219
+ if (!ctx || !ctx.body) return;
220
+ try {
221
+ ctx.meta = ctx.meta || {};
222
+ ctx.meta._requestSizeTokens = estimateRequestSizeTokens(ctx.body);
223
+ // Capture the requested model so post-analysis can distinguish
224
+ // auto-mode classifier traffic (Opus 4.7) from main-inference (any
225
+ // model). Per Lead's 2026-05-08 finding, CC's auto-mode safety
226
+ // classifier runs a separate Opus-4-7 API call before each Edit, and
227
+ // those classifier calls share the same account-wide concurrency
228
+ // limiter — so the rate-limit JSONL is naturally a mix of both
229
+ // traffic types. requested_model + request_size_tokens together let
230
+ // post-analysis split them.
231
+ if (typeof ctx.body.model === "string") {
232
+ ctx.meta._requestedModel = ctx.body.model;
233
+ }
234
+ // Future-proof: when the proxy gains other paths beyond /v1/messages,
235
+ // pass the path through ctx so we can record it. Until then default in
236
+ // buildRecord. We don't have ctx.path today, so this is a no-op.
237
+ } catch {
238
+ // Fail-open: never throw to the pipeline.
239
+ }
240
+ },
241
+
242
+ async onResponse(ctx) {
243
+ if (!isRateLimitResponse(ctx)) return;
244
+ try {
245
+ const record = buildRecord({ ctx });
246
+ await appendJsonl(record);
247
+ } catch {
248
+ // Fail-open: never throw to the pipeline.
249
+ }
250
+ },
251
+ };
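The H3-vs-H4 check described in the `upstream_connection_id` comment can be sketched as a small post-analysis pass over the JSONL. This is a minimal sketch, not the playbook from the directive: `parseJsonl`, `countByConnection`, and `topConnectionShare` are hypothetical helper names and the sample rows are invented; only the `type` and `upstream_connection_id` fields come from `buildRecord`.

```javascript
// Parse JSONL text into records, skipping blank or malformed lines.
function parseJsonl(text) {
  const rows = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      rows.push(JSON.parse(line));
    } catch {
      // Skip lines that are not valid JSON (for example a partially written row).
    }
  }
  return rows;
}

// Count rate-limit rows per upstream_connection_id; null ids are grouped under "none".
function countByConnection(rows) {
  const counts = new Map();
  for (const r of rows) {
    if (r.type !== "rate_limit") continue;
    const key = r.upstream_connection_id ?? "none";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Share of events on the busiest connection: close to 1 hints at per-connection
// limiting (H3), while a flat spread is more consistent with H4.
function topConnectionShare(counts) {
  let total = 0;
  let max = 0;
  for (const n of counts.values()) {
    total += n;
    max = Math.max(max, n);
  }
  return total === 0 ? 0 : max / total;
}

// Invented sample data, including one malformed line.
const sample = [
  '{"type":"rate_limit","upstream_connection_id":"cn-1"}',
  '{"type":"rate_limit","upstream_connection_id":"cn-1"}',
  '{"type":"rate_limit","upstream_connection_id":"cn-2"}',
  "not json",
].join("\n");

const counts = countByConnection(parseJsonl(sample));
console.log(Object.fromEntries(counts), topConnectionShare(counts));
```

A single share figure is only a first look; separating burst from RPM/TPM classes would still need the timing and `request_size_tokens` fields the source comments mention.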
package/proxy/server.mjs CHANGED
@@ -53,10 +53,10 @@ async function handleMessages(clientReq, clientRes) {
53
53
 
54
54
  const requestedModel = parsed?.model || null;
55
55
 
56
- let upstreamRes, responseHeaders, statusCode;
56
+ let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
57
57
 
58
58
  try {
59
- ({ upstreamRes, responseHeaders, statusCode } = await forwardRequest(
59
+ ({ upstreamRes, responseHeaders, statusCode, upstreamConnectionId } = await forwardRequest(
60
60
  clientReq,
61
61
  forwardBody,
62
62
  abortController.signal
@@ -68,6 +68,11 @@ async function handleMessages(clientReq, clientRes) {
68
68
  return;
69
69
  }
70
70
 
71
+ // Stash upstream connection id on meta so downstream extensions
72
+ // (rate-limit-log, future per-connection diagnostics) can record which
73
+ // socket carried the request without each one re-instrumenting upstream.
74
+ meta._upstreamConnectionId = upstreamConnectionId ?? null;
75
+
71
76
  if (extSnapshot.length > 0) {
72
77
  const resCtx = { status: statusCode, headers: responseHeaders, meta };
73
78
  await runOnResponseStart(resCtx, extSnapshot);
package/proxy/upstream.mjs CHANGED
@@ -54,6 +54,42 @@ const _agents = new Map(); // cache key → Agent | null
54
54
  const _loggedProxies = new Set(); // dedupe stderr "using proxy" lines per (url, isHTTPS)
55
55
  let _warnedTlsDisabled = false;
56
56
 
57
+ // --- Upstream connection identity ---
58
+ //
59
+ // Each underlying TCP socket gets a stable id the first time we see it. The
60
+ // id persists across keep-alive reuses of the same socket (WeakMap by socket
61
+ // reference) and dies when the socket is GC'd. New sockets — including
62
+ // reconnects after a closed connection — get fresh ids.
63
+ //
64
+ // This lets the rate-limit-log extension (and any future per-connection
65
+ // diagnostic) record which upstream connection a response came back on, so
66
+ // post-analysis can distinguish per-connection limiter behavior (Lead's H3,
67
+ // 2026-05-08 brief) from client-side queue saturation (H4) or genuinely
68
+ // account-wide limiting.
69
+ //
70
+ // Format: "cn-<int>" — opaque to consumers; only the equality and cardinality
71
+ // matter for analysis.
72
+
73
+ let _connectionIdCounter = 0;
74
+ const _socketIds = new WeakMap();
75
+
76
+ export function getOrAssignConnectionId(socket) {
77
+ if (!socket) return null;
78
+ let id = _socketIds.get(socket);
79
+ if (id === undefined) {
80
+ id = `cn-${++_connectionIdCounter}`;
81
+ _socketIds.set(socket, id);
82
+ }
83
+ return id;
84
+ }
85
+
86
+ // Test-only: reset the monotonic counter. The WeakMap entries die with their
87
+ // sockets so we don't need to clear them; we just need a predictable start
88
+ // for assertions on id values across cases.
89
+ export function __resetConnectionIdsForTests() {
90
+ _connectionIdCounter = 0;
91
+ }
92
+
57
93
  function shouldBypassProxy(hostname) {
58
94
  if (!config.noProxy) return false;
59
95
  const list = config.noProxy.split(",").map((s) => s.trim().toLowerCase()).filter(Boolean);
@@ -170,10 +206,25 @@ export function forwardRequest(clientReq, body, signal) {
170
206
  agent: getAgent(isHTTPS, upstreamUrl.hostname),
171
207
  };
172
208
 
209
+ let upstreamConnectionId = null;
210
+ // The 'socket' event fires once a socket is assigned to this request, for
211
+ // both new and pooled keep-alive sockets. Node emits it on a later tick, so
212
+ // the listener attached right after transport.request() returns still sees
213
+ // it, and the id is known before the response callback runs.
214
+ const captureSocket = (sock) => {
215
+ upstreamConnectionId = getOrAssignConnectionId(sock);
216
+ };
217
+
173
218
  const upstreamReq = transport.request(options, (upstreamRes) => {
174
219
  const responseHeaders = filterResponseHeaders(upstreamRes.headers);
175
- resolve({ upstreamRes, responseHeaders, statusCode: upstreamRes.statusCode });
220
+ resolve({
221
+ upstreamRes,
222
+ responseHeaders,
223
+ statusCode: upstreamRes.statusCode,
224
+ upstreamConnectionId,
225
+ });
176
226
  });
227
+ upstreamReq.on("socket", captureSocket);
177
228
 
178
229
  upstreamReq.on("error", reject);
179
230
  upstreamReq.on("timeout", () => {
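The keep-alive behavior of the connection ids added in this file can be illustrated in isolation. The sketch below re-declares the id-assignment logic from the hunk above so that it runs standalone, and uses plain objects as stand-ins for sockets; those stand-ins are an assumption made for illustration, since real callers pass `net.Socket` instances.

```javascript
// Standalone copy of the id-assignment logic, for illustration only.
let counter = 0;
const ids = new WeakMap();

function getOrAssignConnectionId(socket) {
  if (!socket) return null;
  let id = ids.get(socket);
  if (id === undefined) {
    id = `cn-${++counter}`;
    ids.set(socket, id);
  }
  return id;
}

// Plain objects stand in for sockets here (assumption).
const sockA = {};
const sockB = {};

const first = getOrAssignConnectionId(sockA);  // new socket: fresh id
const reused = getOrAssignConnectionId(sockA); // same socket reused: same id
const second = getOrAssignConnectionId(sockB); // different socket: next id

console.log(first, reused, second); // cn-1 cn-1 cn-2
```

Because the counter only increases, a reconnect never reuses an earlier id, so equal ids in the log always mean the same socket object.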
package/tools/usage-to-dashboard-ndjson.mjs CHANGED
@@ -82,8 +82,13 @@
82
82
  * ANTHROPIC_PROXY_LOG_DIR Override output directory (matches fgrosswig's
83
83
  * dashboard env var so both tools stay in sync).
84
84
  *
85
- * Part of claude-code-cache-fix. MIT licensed.
85
+ * Part of claude-code-cache-fix.
86
86
  * https://github.com/cnighswonger/claude-code-cache-fix
87
+ *
88
+ * The NDJSON proxy log schema (field names, structure, file naming convention,
89
+ * cache_health semantics) originates from fgrosswig/claude-usage-dashboard
90
+ * and is used under the Apache License 2.0.
91
+ * https://github.com/fgrosswig/claude-usage-dashboard
87
92
  */
88
93
 
89
94
  import { readFileSync, writeFileSync, appendFileSync, existsSync, mkdirSync, statSync, watch } from 'node:fs';
@@ -147,9 +152,19 @@ plus the visualization layer from his dashboard, with no coordination needed.
147
152
  * Translate one claude-code-cache-fix usage.jsonl record into a
148
153
  * fgrosswig-dashboard-compatible NDJSON record. Returns null if the
149
154
  * record doesn't have enough fields to be usable.
155
+ *
156
+ * Accepts both schemas:
157
+ * - Preload-era: `entry.timestamp`, `entry.q5h_pct` / `entry.q7d_pct` (int 0-100)
158
+ * - Proxy v:1 (MeterRowSchema, written by `usage-log` extension v3.2.0+):
159
+ * `entry.ts`, `entry.q5h` / `entry.q7d` (float 0-1)
160
+ *
161
+ * Both forms are handled via fallback so this translator continues to work
162
+ * across the schema evolution. Tracking issue: #112.
150
163
  */
151
- function translateRecord(entry) {
152
- if (!entry || !entry.timestamp || !entry.model) return null;
164
+ export function translateRecord(entry) {
165
+ // Entry guard accepts both formats. Drop only when neither timestamp form
166
+ // is present, or model is missing.
167
+ if (!entry || !(entry.timestamp || entry.ts) || !entry.model) return null;
153
168
 
154
169
  const inTok = entry.input_tokens || 0;
155
170
  const outTok = entry.output_tokens || 0;
@@ -167,19 +182,28 @@ function translateRecord(entry) {
167
182
  }
168
183
 
169
184
  // Reconstruct a minimal response_anthropic_headers blob from the quota
170
- // pct fields we captured. Not byte-identical to what the proxy would see
171
- // on the wire, but structurally compatible for the dashboard's consumers.
185
+ // fields we captured. Two schema flavors:
186
+ // preload: q5h_pct / q7d_pct as int 0-100 (divide by 100 to get utilization)
187
+ // v:1: q5h / q7d as float 0-1 (already in utilization form)
188
+ // Not byte-identical to what the proxy would see on the wire, but
189
+ // structurally compatible for the dashboard's consumers.
172
190
  const responseHeaders = {};
173
191
  if (entry.q5h_pct != null) {
174
192
  responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h_pct / 100);
193
+ } else if (entry.q5h != null) {
194
+ responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h);
175
195
  }
176
196
  if (entry.q7d_pct != null) {
177
197
  responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d_pct / 100);
198
+ } else if (entry.q7d != null) {
199
+ responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d);
178
200
  }
179
201
 
202
+ const entryTs = entry.timestamp || entry.ts;
203
+
180
204
  const rec = {
181
- ts_start: entry.timestamp,
182
- ts_end: entry.timestamp,
205
+ ts_start: entryTs,
206
+ ts_end: entryTs,
183
207
  duration_ms: null,
184
208
  method: 'POST',
185
209
  path: '/v1/messages',
@@ -207,7 +231,7 @@ function translateRecord(entry) {
207
231
 
208
232
  // Synthesize a stable pseudo-request-id from timestamp + model for dedup
209
233
  // at the dashboard layer. Not a real request ID — just a deterministic key.
210
- rec.req_id = 'ccf_' + entry.timestamp.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
234
+ rec.req_id = 'ccf_' + entryTs.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
211
235
 
212
236
  return rec;
213
237
  }
@@ -339,14 +363,19 @@ function runFollow(opts) {
339
363
 
340
364
  // ─── Main ───────────────────────────────────────────────────────────────────
341
365
 
342
- const opts = parseArgs();
343
- if (opts.help) {
344
- printUsage();
345
- process.exit(0);
346
- }
366
+ // Guard CLI execution so tests can `import { translateRecord }` without
367
+ // auto-running the batch/follow flow.
368
+ const _isMain = import.meta.url === `file://${process.argv[1]}`;
369
+ if (_isMain) {
370
+ const opts = parseArgs();
371
+ if (opts.help) {
372
+ printUsage();
373
+ process.exit(0);
374
+ }
347
375
 
348
- if (opts.follow) {
349
- runFollow(opts);
350
- } else {
351
- runBatch(opts);
376
+ if (opts.follow) {
377
+ runFollow(opts);
378
+ } else {
379
+ runBatch(opts);
380
+ }
352
381
  }
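The dual-schema fallback that `translateRecord` gained in this release comes down to two small normalizations: utilization as a 0-1 float, and a single timestamp field. The sketch below shows that logic on its own. `utilization` and `entryTimestamp` are hypothetical helper names rather than exports of the tool, and the sample entries are invented.

```javascript
// Hypothetical helper: utilization as a 0-1 float from either schema, or null if absent.
function utilization(entry, window) {
  const pct = entry[`${window}_pct`]; // preload-era field: int 0-100
  if (pct != null) return pct / 100;
  const frac = entry[window];         // proxy v:1 field: float 0-1
  return frac != null ? frac : null;
}

// Hypothetical helper: prefer the preload-era `timestamp`, fall back to v:1 `ts`.
function entryTimestamp(entry) {
  return entry.timestamp || entry.ts || null;
}

// Invented sample entries, one per schema.
const preloadEntry = { timestamp: "2026-05-08T00:06:00.000Z", q5h_pct: 42 };
const proxyEntry = { ts: "2026-05-08T00:07:00.000Z", q5h: 0.42 };

console.log(utilization(preloadEntry, "q5h"), entryTimestamp(preloadEntry));
console.log(utilization(proxyEntry, "q5h"), entryTimestamp(proxyEntry));
console.log(utilization({}, "q7d"));
```

Checking the `_pct` field first mirrors the order of the `if` / `else if` branches in the diff, so an entry carrying both forms resolves the same way.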