npm - claude-code-cache-fix - Versions diffs - 3.5.2 → 3.5.4 - Mend

claude-code-cache-fix 3.5.2 → 3.5.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.ko.md +6 -0
package/README.md +114 -0
package/README.zh.md +114 -0
package/THIRD_PARTY_LICENSES +15 -0
package/package.json +3 -2
package/proxy/extensions/rate-limit-log.mjs +251 -0
package/proxy/server.mjs +7 -2
package/proxy/upstream.mjs +52 -1
package/tools/usage-to-dashboard-ndjson.mjs +46 -17

package/README.ko.md CHANGED

@@ -244,6 +244,12 @@ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
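 Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Code can run `git status` itself through the Bash tool when it needs that context. Community-verified by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): 18 tokens of cache creation across git state changes (thousands of tokens without this flag).
+## Migration: v3.4.x → v3.5.0+
+> **Translation needed.** This section is currently available only in English. If you wrote a custom statusline, monitoring script, or other tool that reads `~/.claude/quota-status.json` directly, see the ["Migration: v3.4.x → v3.5.0+"](README.md#migration-v34x--v350) section of the English README for details on the v3.5.0 proxy-mode split and the consumer-side migration pattern (try the new path, then fall back to the legacy path).
+>
+> Would you like to contribute a Korean translation? PRs are welcome.
 ## Image stripping (preload mode)
 Images read with the Read tool are base64-encoded and stored in the conversation history, then sent along with every subsequent API call. A single 500KB image adds roughly 62,500 tokens per turn on Opus 4.6, and **roughly 85,000+ tokens on Opus 4.7 because of the new tokenizer**. Image stripping is strongly recommended on 4.7.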

package/README.md CHANGED

@@ -324,6 +324,120 @@ Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Claude Co
 **Why we don't ship a proxy extension for this:** the proxy intercepts requests after Claude Code has already composed the system prompt — by then the volatile `git status` text is already part of the prefix that the model conditioned on in the previous turn, and stripping it post-hoc would itself bust the cache. The fix has to happen at the source. `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1` prevents the injection before the prompt is composed, which is why the native flag is the right tool. Stripping post-hoc would also remove model-visible context that an explicit Bash call can recover, and would risk false-positive matches against assistant-written text.
+## Migration: v3.4.x → v3.5.0+
+If you wrote a custom statusline, monitoring script, or anything else that reads `~/.claude/quota-status.json` directly, this section is for you. v3.5.0 split that file in proxy mode; preload mode is unchanged.
+### What changed
+| | v3.4.x and earlier (proxy + preload) | v3.5.0+ proxy mode | v3.5.0+ preload mode |
+|---|---|---|---|
+| Quota fields (Q5h, Q7d, status, overage) | `~/.claude/quota-status.json` | `~/.claude/quota-status/account.json` | `~/.claude/quota-status.json` (legacy path) |
+| Cache fields (TTL tier, hit rate, cache_creation/read) | same file as above | `~/.claude/quota-status/sessions/<filename>.json` | same file as above |
+| Multi-session attribution | none — last writer wins | per-session files | preload is single-session by construction |
+`<filename>` is derived from the request's `x-claude-code-session-id` header via a deterministic safe-name rule: UUIDs and other ids matching `[A-Za-z0-9_-]{1,128}` pass through; null/empty/whitespace become `unknown`; anything else is mapped to `inv-<sha256-prefix>`. Full rule is documented at [`docs/directives/proxy-quota-status-per-session.md`](docs/directives/proxy-quota-status-per-session.md).
+The legacy `~/.claude/quota-status.json` is auto-deleted on the first proxy-mode write after upgrade. Per-session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS` (default `7`) are swept on write.
+### Consumer-side migration pattern
+Your script should try the v3.5.0+ proxy paths first and fall back to the legacy path if not present. That way it works in both modes (and on hosts mid-upgrade). The session id usually comes from Claude Code's stdin when it invokes a statusline hook; for other consumers, capture it from the most-recently-modified `~/.claude/projects/*/*.jsonl` filename.
+**Bash (statusline-style):**
+```bash
+QS_DIR="$HOME/.claude/quota-status"
+ACCOUNT="$QS_DIR/account.json"
+LEGACY="$HOME/.claude/quota-status.json"
+# Canonical filename rule — must mirror proxy/extensions/cache-telemetry.mjs
+# sessionFilename(): trim, then "" → unknown, safe regex passthrough, else
+# inv-<sha256-prefix>. Without this, malformed or whitespace ids miss the
+# per-session file even though the writer created one under the canonical name.
+session_filename() {
+  local trimmed
+  trimmed="$(printf '%s' "$1" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
+  if [ -z "$trimmed" ]; then echo unknown; return; fi
+  if printf '%s' "$trimmed" | grep -qE '^[A-Za-z0-9_-]{1,128}$'; then
+    printf '%s' "$trimmed"
+  else
+    # sha256sum on Linux; shasum -a 256 on macOS. Both emit "<hex>  -".
+    local hash
+    if command -v sha256sum >/dev/null 2>&1; then
+      hash="$(printf '%s' "$trimmed" | sha256sum)"
+    else
+      hash="$(printf '%s' "$trimmed" | shasum -a 256)"
+    fi
+    printf 'inv-%s' "$(printf '%s' "$hash" | cut -c1-16)"
+  fi
+}
+# session id: prefer CC stdin, fall back to most-recent jsonl
+sid="$(jq -r '.session_id // empty' 2>/dev/null < /dev/stdin || true)"
+if [ -z "$sid" ]; then
+  sid="$(ls -t "$HOME"/.claude/projects/*/*.jsonl 2>/dev/null | head -1 | xargs -I{} basename {} .jsonl)"
+fi
+filename="$(session_filename "$sid")"
+# quota: account.json (v3.5.0+) → fall back to legacy
+if [ -f "$ACCOUNT" ]; then
+  quota_json="$(cat "$ACCOUNT")"
+elif [ -f "$LEGACY" ]; then
+  quota_json="$(cat "$LEGACY")"
+fi
+# cache: sessions/<filename>.json (v3.5.0+) → fall back to legacy
+if [ -f "$QS_DIR/sessions/$filename.json" ]; then
+  cache_json="$(cat "$QS_DIR/sessions/$filename.json")"
+elif [ -f "$LEGACY" ]; then
+  cache_json="$(cat "$LEGACY")"
+fi
+```
+**Node:**
+```js
+import { readFileSync, existsSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+import { createHash } from "node:crypto";
+const home = homedir();
+const accountPath = join(home, ".claude", "quota-status", "account.json");
+const legacyPath = join(home, ".claude", "quota-status.json");
+const SAFE_NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
+// Mirror of cache-telemetry.mjs sessionFilename(). Reader-side rule must match
+// writer-side rule; otherwise malformed/whitespace ids miss their per-session file.
+function sessionFilename(rawId) {
+  if (rawId === null || rawId === undefined) return "unknown";
+  const s = String(rawId).trim();
+  if (s.length === 0) return "unknown";
+  if (SAFE_NAME_RE.test(s)) return s;
+  return "inv-" + createHash("sha256").update(s).digest("hex").slice(0, 16);
+}
+function readQuotaJson() {
+  if (existsSync(accountPath)) return JSON.parse(readFileSync(accountPath, "utf8"));
+  if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
+  return null;
+}
+function readCacheJson(sessionId) {
+  const filename = sessionFilename(sessionId);
+  const p = join(home, ".claude", "quota-status", "sessions", `${filename}.json`);
+  if (existsSync(p)) return JSON.parse(readFileSync(p, "utf8"));
+  if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
+  return null;
+}
+```
+The shipped [`tools/quota-statusline.sh`](tools/quota-statusline.sh) is the reference implementation for the bash version. The [`/coffee` skill](https://github.com/cnighswonger/claude-code-coffee) v1.4.0 is the reference for the per-session warmth gate.
+### Why per-session
+On multi-agent hosts (multiple Claude Code sessions sharing one proxy), the pre-v3.5.0 single global file caused every session to overwrite the others' cache stats with each response. A statusline reading from session A would show session B's TTL tier whenever B sent a request more recently. Per-session files plus an account-global quota file resolve this without losing the easy account-wide view. See [#104](https://github.com/cnighswonger/claude-code-cache-fix/issues/104) for the original report.
 ## Image stripping (preload mode)
 Images read via the Read tool persist as base64 in conversation history, riding along on every subsequent API call. A single 500KB image costs ~62,500 tokens per turn on Opus 4.6, and **~85,000+ on Opus 4.7** due to the new tokenizer. Image stripping is strongly recommended on 4.7.
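The README's migration section states that per-session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS` (default `7`) are swept on write. Below is a minimal sketch of that age check, not the proxy's code: the helper names `ttlDaysFrom` and `staleSessionFiles`, the `{ name, mtimeMs }` entry shape, the use of file modification time as the age signal, and the fallback to 7 for unparseable values are all assumptions.

```javascript
// Hypothetical sketch of the per-session TTL sweep decision.
// Not the package's implementation; names and entry shape are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse a TTL (in days) from an env-style string. Missing, zero, negative,
// or non-numeric values fall back to the documented default of 7 (assumed).
function ttlDaysFrom(rawValue) {
  const n = Number(rawValue);
  return Number.isFinite(n) && n > 0 ? n : 7;
}

// Given per-session directory entries ({ name, mtimeMs }), return the names
// of .json files last modified before the TTL cutoff.
function staleSessionFiles(entries, nowMs, ttlDays) {
  const cutoff = nowMs - ttlDays * DAY_MS;
  return entries
    .filter((e) => e.name.endsWith(".json") && e.mtimeMs < cutoff)
    .map((e) => e.name);
}

const now = Date.UTC(2026, 4, 8);
const entries = [
  { name: "fresh.json", mtimeMs: now - 1 * DAY_MS },
  { name: "old.json", mtimeMs: now - 10 * DAY_MS },
  { name: "notes.txt", mtimeMs: now - 30 * DAY_MS },
];
console.log(staleSessionFiles(entries, now, ttlDaysFrom(undefined)));
```

In a real consumer the TTL would come from `ttlDaysFrom(process.env.CACHE_FIX_QUOTA_STATUS_TTL_DAYS)`, and the entries from a `readdirSync`/`statSync` pass similar to the one `countActiveSessions()` performs in rate-limit-log.mjs. Keeping the decision pure makes it testable without touching `$HOME`.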

package/README.zh.md CHANGED

@@ -186,6 +186,120 @@ export CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1
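 Or add `"includeGitInstructions": false` to `~/.claude/settings.json`. Community-verified by [@wadabum](https://github.com/cnighswonger/claude-code-cache-fix/issues/11): only 18 tokens of cache creation across git state changes (thousands of tokens before the flag was set).
+## Migration: v3.4.x → v3.5.0+
+If you wrote a custom statusline, monitoring script, or other tool that reads `~/.claude/quota-status.json` directly, this section applies to you. v3.5.0 split that file in proxy mode; preload mode is unchanged.
+### What changed
+| | v3.4.x and earlier (proxy + preload) | v3.5.0+ proxy mode | v3.5.0+ preload mode |
+|---|---|---|---|
+| Quota fields (Q5h, Q7d, status, overage) | `~/.claude/quota-status.json` | `~/.claude/quota-status/account.json` | `~/.claude/quota-status.json` (legacy path) |
+| Cache fields (TTL tier, hit rate, cache_creation/read) | same file as above | `~/.claude/quota-status/sessions/<filename>.json` | same file as above |
+| Multi-session attribution | none; the last writer overwrites | per-session files | preload is single-session by construction |
+`<filename>` is derived from the request's `x-claude-code-session-id` header via a deterministic safe-name rule: UUIDs and other ids matching `[A-Za-z0-9_-]{1,128}` pass through unchanged; empty/null/whitespace ids map to `unknown`; anything else maps to `inv-<sha256-prefix>`. The full rule is documented in [`docs/directives/proxy-quota-status-per-session.md`](docs/directives/proxy-quota-status-per-session.md).
+The legacy `~/.claude/quota-status.json` is deleted automatically on the first proxy-mode write after upgrading. Session files older than `CACHE_FIX_QUOTA_STATUS_TTL_DAYS` (default `7`) are cleaned up on write.
+### Consumer-side migration pattern
+Your script should try the v3.5.0+ proxy paths first and fall back to the legacy path if they are missing. That way it works in both modes (and on hosts that are mid-upgrade). The session id usually comes from stdin when Claude Code invokes the statusline hook; in other contexts, capture it from the filename of the most recently modified `~/.claude/projects/*/*.jsonl`.
+**Bash (statusline-style):**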
+```bash
+QS_DIR="$HOME/.claude/quota-status"
+ACCOUNT="$QS_DIR/account.json"
+LEGACY="$HOME/.claude/quota-status.json"
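+# Filename canonicalization rule: must stay consistent with sessionFilename() in
+# proxy/extensions/cache-telemetry.mjs: trim first; empty → unknown; matches the safe
+# regex → pass through; otherwise → inv-<sha256-prefix>. Without this, whitespace or
+# malformed ids miss the file the writer created under the canonical name.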
+session_filename() {
+  local trimmed
+  trimmed="$(printf '%s' "$1" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
+  if [ -z "$trimmed" ]; then echo unknown; return; fi
+  if printf '%s' "$trimmed" | grep -qE '^[A-Za-z0-9_-]{1,128}$'; then
+    printf '%s' "$trimmed"
+  else
+    # sha256sum on Linux; shasum -a 256 on macOS. Both emit "<hex>  -".
+    local hash
+    if command -v sha256sum >/dev/null 2>&1; then
+      hash="$(printf '%s' "$trimmed" | sha256sum)"
+    else
+      hash="$(printf '%s' "$trimmed" | shasum -a 256)"
+    fi
+    printf 'inv-%s' "$(printf '%s' "$hash" | cut -c1-16)"
+  fi
+}
+# session id: prefer CC stdin, fall back to the most recent jsonl
+sid="$(jq -r '.session_id // empty' 2>/dev/null < /dev/stdin || true)"
+if [ -z "$sid" ]; then
+  sid="$(ls -t "$HOME"/.claude/projects/*/*.jsonl 2>/dev/null | head -1 | xargs -I{} basename {} .jsonl)"
+fi
+filename="$(session_filename "$sid")"
+# quota: account.json (v3.5.0+) → fall back to the legacy path
+if [ -f "$ACCOUNT" ]; then
+  quota_json="$(cat "$ACCOUNT")"
+elif [ -f "$LEGACY" ]; then
+  quota_json="$(cat "$LEGACY")"
+fi
+# cache: sessions/<filename>.json (v3.5.0+) → fall back to the legacy path
+if [ -f "$QS_DIR/sessions/$filename.json" ]; then
+  cache_json="$(cat "$QS_DIR/sessions/$filename.json")"
+elif [ -f "$LEGACY" ]; then
+  cache_json="$(cat "$LEGACY")"
+fi
+```
+**Node:**
+```js
+import { readFileSync, existsSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+import { createHash } from "node:crypto";
+const home = homedir();
+const accountPath = join(home, ".claude", "quota-status", "account.json");
+const legacyPath = join(home, ".claude", "quota-status.json");
+const SAFE_NAME_RE = /^[A-Za-z0-9_-]{1,128}$/;
+// Mirror of cache-telemetry.mjs sessionFilename(). The reader-side rule must match the
+// writer-side rule; otherwise whitespace/malformed ids won't find their session file.
+function sessionFilename(rawId) {
+  if (rawId === null || rawId === undefined) return "unknown";
+  const s = String(rawId).trim();
+  if (s.length === 0) return "unknown";
+  if (SAFE_NAME_RE.test(s)) return s;
+  return "inv-" + createHash("sha256").update(s).digest("hex").slice(0, 16);
+}
+function readQuotaJson() {
+  if (existsSync(accountPath)) return JSON.parse(readFileSync(accountPath, "utf8"));
+  if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
+  return null;
+}
+function readCacheJson(sessionId) {
+  const filename = sessionFilename(sessionId);
+  const p = join(home, ".claude", "quota-status", "sessions", `${filename}.json`);
+  if (existsSync(p)) return JSON.parse(readFileSync(p, "utf8"));
+  if (existsSync(legacyPath)) return JSON.parse(readFileSync(legacyPath, "utf8"));
+  return null;
+}
+```
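+The shipped [`tools/quota-statusline.sh`](tools/quota-statusline.sh) is the reference implementation for the bash version. The [`/coffee` skill](https://github.com/cnighswonger/claude-code-coffee) v1.4.0 is the reference for the per-session keep-warm gate.
+### Why split per session
+On multi-agent hosts (several Claude Code sessions sharing one proxy), the pre-v3.5.0 single global file let each session overwrite the other sessions' cache stats with its own responses. A statusline reading for session A would show session B's TTL tier whenever B had sent a request more recently. Per-session files plus one account-level quota file solve this while keeping the account-wide view. See [#104](https://github.com/cnighswonger/claude-code-cache-fix/issues/104) for the original report.
 ## Image stripping (preload mode)
 Images read via the Read tool persist as base64 in the conversation history and are sent along with every subsequent API call. A single 500KB image adds about 62,500 tokens per turn on Opus 4.6, and **about 85,000+ tokens on Opus 4.7** (due to the new tokenizer). Image stripping is strongly recommended on 4.7.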

package/THIRD_PARTY_LICENSES ADDED

@@ -0,0 +1,15 @@
+Third-Party Licenses
+====================
+1. claude-usage-dashboard — NDJSON proxy log schema
+   Source:  https://github.com/fgrosswig/claude-usage-dashboard
+   Author:  Falk Grosswig (@fgrosswig)
+   License: Apache License 2.0
+   The proxy NDJSON log schema used by tools/usage-to-dashboard-ndjson.mjs
+   (field names, structure, file naming convention, cache_health semantics,
+   and cost_factor methodology) originates from the claude-usage-dashboard.
+   Used under the Apache License, Version 2.0.
+   https://www.apache.org/licenses/LICENSE-2.0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-cache-fix",
-  "version": "3.5.2",
+  "version": "3.5.4",
   "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
   "type": "module",
   "exports": "./preload.mjs",
@@ -15,7 +15,8 @@
     "claude-fixed.bat",
     "proxy/",
     "bin/",
-    "templates/"
+    "templates/",
+    "THIRD_PARTY_LICENSES"
   ],
   "engines": {
     "node": ">=18"

package/proxy/extensions/rate-limit-log.mjs ADDED

@@ -0,0 +1,251 @@
+// rate-limit-log — append per-event record to ~/.claude/usage-log/rate-limit-events.jsonl
+// when an upstream response carries the canonical Anthropic rate-limit error
+// envelope. This is a SUPERSET of burst/concurrency events: the same
+// envelope is returned for RPM/ITPM/OTPM and auto-mode classifier overflow.
+// Splitting the classes is a post-analysis problem on the JSONL stream.
+//
+// See docs/directives/proxy-rate-limit-logging.md for the full design and
+// the post-analysis playbook.
+//
+// Activation: enabled:false in the export default. Users opt in via
+//   "rate-limit-log": { "enabled": true, "order": 660 }
+// in proxy/extensions.json. No env-var enable flag.
+//
+// Detection signature is grounded in 88 captured 429 responses from the
+// 2026-05-08 00:06-00:21 UTC burst (15 min window, single account, full HTTP
+// fidelity via tee between cache-fix-proxy and llm-relay). Brief at
+//   ~/git_repos/claude/docs/issues/cache-fix-429-burst-data-2026-05-08.md
+// Across all 88: status === 429, content-type: application/json (no SSE),
+// body.type === "error", body.error.type === "rate_limit_error",
+// x-should-retry: "true". No Retry-After. No anthropic-ratelimit-* headers.
+// Anthropic's `error.message` is literally "Error" — no class hint.
+//
+// SCOPE NOTE: this extension logs ALL `rate_limit_error` 429s. Per
+// Anthropic's public docs, that error type is shared across the
+// burst/concurrency limiter, classic RPM/ITPM/OTPM limiters, and (per
+// Lead's 2026-05-08 follow-up) auto-mode classifier traffic on Opus 4.7.
+// The response itself carries no signal that distinguishes those classes —
+// `error.message` is literally "Error". Splitting burst-vs-RPM/TPM and
+// classifier-vs-main-inference happens in post-analysis, using the
+// recorded `requested_model`, `request_path`, inter-arrival timing, and
+// `request_size_tokens` fields on each row. Do NOT treat this file as
+// burst-limit-only evidence; it's a superset.
+import { mkdir, appendFile } from "node:fs/promises";
+import { readdirSync, statSync, readFileSync } from "node:fs";
+import { join, dirname } from "node:path";
+import { homedir } from "node:os";
+// Paths resolved per call so tests can swap $HOME between cases. The
+// homedir() call is essentially free.
+function paths() {
+  const home = homedir();
+  return {
+    logPath: join(home, ".claude", "usage-log", "rate-limit-events.jsonl"),
+    accountPath: join(home, ".claude", "quota-status", "account.json"),
+    sessionsDir: join(home, ".claude", "quota-status", "sessions"),
+  };
+}
+const BODY_EXCERPT_MAX = 256;
+const ACTIVE_SESSION_WINDOW_MS = 5 * 60 * 1000;
+// --- Detection predicate ---
+//
+// Matches the canonical Anthropic rate-limit error envelope:
+//   { "type": "error", "error": { "type": "rate_limit_error", ... } } at 429
+//
+// Per Anthropic's public docs and the 2026-05-08 capture, this envelope is
+// returned for every flavor of 429 — burst/concurrency, RPM, ITPM, OTPM,
+// and auto-mode classifier overflow. The response itself carries no
+// discriminator. Distinguishing the classes is a post-analysis problem
+// over the JSONL using inter-arrival timing, `requested_model`, and
+// `request_size_tokens`. See directive for the analysis playbook.
+//
+// Header signals (x-should-retry, request-id) are recorded in the row but
+// NOT used as detection gates — Anthropic could change them independently
+// of the body, and the body schema is the canonical contract.
+export function isRateLimitResponse(ctx) {
+  if (!ctx || typeof ctx.status !== "number") return false;
+  if (ctx.status !== 429) return false;
+  const body = ctx.body;
+  if (!body || typeof body !== "object") return false;
+  return body.type === "error" && body.error?.type === "rate_limit_error";
+}
+// --- Field extractors (test seams) ---
+export function estimateRequestSizeTokens(body) {
+  if (!body || typeof body !== "object") return 0;
+  let chars = 0;
+  if (typeof body.system === "string") chars += body.system.length;
+  if (Array.isArray(body.system)) {
+    for (const block of body.system) {
+      if (block && typeof block.text === "string") chars += block.text.length;
+    }
+  }
+  if (Array.isArray(body.messages)) {
+    for (const msg of body.messages) {
+      if (typeof msg?.content === "string") {
+        chars += msg.content.length;
+      } else if (Array.isArray(msg?.content)) {
+        for (const block of msg.content) {
+          if (typeof block?.text === "string") chars += block.text.length;
+        }
+      }
+    }
+  }
+  return Math.ceil(chars / 4);
+}
+// Bounds the persisted excerpt length, NOT the temporary serialization
+// allocation. JSON.stringify still materializes the full body string before
+// the slice. In practice this is fine — Anthropic's 429 bodies are ~120
+// bytes per the 2026-05-08 capture, and the proxy already buffers the full
+// body upstream of this extension (server.mjs:79-82). For string inputs the
+// length is capped pre-stringify, so a hostile pre-rendered string can't
+// blow up here. If a future call site ever passes a giant pre-built object
+// graph, the upstream-buffering and per-extension try/catch isolate the
+// allocation cost from the rest of the pipeline.
+export function bodyExcerpt(body) {
+  if (body === undefined || body === null) return "";
+  if (typeof body === "string") return body.slice(0, BODY_EXCERPT_MAX);
+  let s;
+  try {
+    s = JSON.stringify(body);
+  } catch {
+    s = String(body);
+  }
+  return s.slice(0, BODY_EXCERPT_MAX);
+}
+export function isPeakHourOldSchedule(now = new Date()) {
+  const day = now.getUTCDay(); // 0 = Sun, 1..5 = Mon..Fri, 6 = Sat
+  const hour = now.getUTCHours();
+  return day >= 1 && day <= 5 && hour >= 13 && hour < 19;
+}
+export function countActiveSessions(now = Date.now(), sessionsDir = paths().sessionsDir) {
+  let entries;
+  try {
+    entries = readdirSync(sessionsDir);
+  } catch {
+    return 0;
+  }
+  let count = 0;
+  const cutoff = now - ACTIVE_SESSION_WINDOW_MS;
+  for (const name of entries) {
+    try {
+      const st = statSync(join(sessionsDir, name));
+      if (st.mtimeMs >= cutoff) count++;
+    } catch {}
+  }
+  return count;
+}
+export function readQ5hPctAtEvent(accountPath = paths().accountPath) {
+  try {
+    const data = JSON.parse(readFileSync(accountPath, "utf8"));
+    return data?.five_hour?.pct ?? null;
+  } catch {
+    return null;
+  }
+}
+export function buildRecord({ ctx, now = new Date() }) {
+  // Anthropic's error responses carry the request id in TWO places: the
+  // `request-id` response header and the body's `request_id` field. Prefer
+  // body (canonical), fall back to header.
+  const headerReqId = ctx?.headers?.["request-id"] || null;
+  const bodyReqId = (ctx?.body && typeof ctx.body === "object")
+    ? (ctx.body.request_id || null)
+    : null;
+  const xShouldRetry = ctx?.headers?.["x-should-retry"] || null;
+  return {
+    schema_version: 1,
+    ts: now.toISOString(),
+    type: "rate_limit",
+    session_id: ctx?.meta?._sessionId ?? null,
+    requested_model: ctx?.meta?._requestedModel ?? null,
+    request_path: ctx?.meta?._requestPath || "/v1/messages",
+    request_size_tokens: ctx?.meta?._requestSizeTokens ?? 0,
+    response_status: ctx?.status ?? null,
+    response_body_excerpt: bodyExcerpt(ctx?.body),
+    concurrent_sessions_estimate: countActiveSessions(now.getTime()),
+    q5h_pct_at_event: readQ5hPctAtEvent(),
+    peak_hour_old_schedule: isPeakHourOldSchedule(now),
+    upstream_request_id: bodyReqId || headerReqId,
+    x_should_retry: xShouldRetry,
+    // Stable id of the underlying TCP socket that carried this request,
+    // assigned in proxy/upstream.mjs via WeakMap<Socket, id>. Persists across
+    // keep-alive reuse; a socket opened after a close gets a fresh id. Null if upstream errored
+    // before a socket was assigned. Populated by server.mjs after
+    // forwardRequest resolves. Use for H3-vs-H4 verification per Lead's
+    // 2026-05-08 brief: if 429s cluster on one connection id, the limiter
+    // is per-connection (H3); if they spread across many, client-side
+    // queue saturation (H4) is more likely.
+    upstream_connection_id: ctx?.meta?._upstreamConnectionId ?? null,
+  };
+}
+// --- I/O ---
+async function appendJsonl(record, path = paths().logPath) {
+  await mkdir(dirname(path), { recursive: true });
+  await appendFile(path, JSON.stringify(record) + "\n");
+}
+// Test helper: write to a caller-supplied path (bypasses default).
+export async function writeRecord(record, path) {
+  await mkdir(dirname(path), { recursive: true });
+  await appendFile(path, JSON.stringify(record) + "\n");
+}
+// Exported so tests / external diagnostics can resolve the current path.
+export function getLogPath() {
+  return paths().logPath;
+}
+// --- Extension contract ---
+export default {
+  name: "rate-limit-log",
+  description: "Append rate-limit incident records to ~/.claude/usage-log/rate-limit-events.jsonl (opt-in)",
+  enabled: false,
+  order: 660,
+  async onRequest(ctx) {
+    if (!ctx || !ctx.body) return;
+    try {
+      ctx.meta = ctx.meta || {};
+      ctx.meta._requestSizeTokens = estimateRequestSizeTokens(ctx.body);
+      // Capture the requested model so post-analysis can distinguish
+      // auto-mode classifier traffic (Opus 4.7) from main-inference (any
+      // model). Per Lead's 2026-05-08 finding, CC's auto-mode safety
+      // classifier runs a separate Opus-4-7 API call before each Edit, and
+      // those classifier calls share the same account-wide concurrency
+      // limiter — so the rate-limit JSONL is naturally a mix of both
+      // traffic types. requested_model + request_size_tokens together let
+      // post-analysis split them.
+      if (typeof ctx.body.model === "string") {
+        ctx.meta._requestedModel = ctx.body.model;
+      }
+      // Future-proof: when the proxy gains other paths beyond /v1/messages,
+      // pass the path through ctx so we can record it. Until then default in
+      // buildRecord. We don't have ctx.path today, so this is a no-op.
+    } catch {
+      // Fail-open: never throw to the pipeline.
+    }
+  },
+  async onResponse(ctx) {
+    if (!isRateLimitResponse(ctx)) return;
+    try {
+      const record = buildRecord({ ctx });
+      await appendJsonl(record);
+    } catch {
+      // Fail-open: never throw to the pipeline.
+    }
+  },
+};
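rate-limit-log.mjs deliberately leaves class separation (burst vs RPM/TPM, classifier vs main inference) to post-analysis over the JSONL, keyed on `requested_model`, `request_size_tokens`, and inter-arrival timing. The sketch below shows one possible first pass over that file; it is not the playbook in `docs/directives/proxy-rate-limit-logging.md`, which is not reproduced here. The function names, the 1000 ms burst-gap threshold, and the placeholder model names `model-a`/`model-b` are hypothetical.

```javascript
// Hypothetical post-analysis sketch over rate-limit-events.jsonl text.
// Field names (ts, requested_model, request_size_tokens) follow buildRecord();
// the grouping and the burst-gap threshold are illustrative assumptions.

// Parse JSONL text into row objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Group rows by requested_model and report, per model: the event count, how
// many events arrived within burstGapMs of the previous event for that model,
// and the median request_size_tokens (upper median for even counts).
function summarizeByModel(rows, burstGapMs = 1000) {
  const byModel = new Map();
  for (const row of rows) {
    const key = row.requested_model ?? "unknown";
    if (!byModel.has(key)) byModel.set(key, []);
    byModel.get(key).push(row);
  }
  const summary = {};
  for (const [model, group] of byModel) {
    const times = group.map((r) => Date.parse(r.ts)).sort((a, b) => a - b);
    let closeFollowers = 0;
    for (let i = 1; i < times.length; i++) {
      if (times[i] - times[i - 1] <= burstGapMs) closeFollowers++;
    }
    const sizes = group
      .map((r) => r.request_size_tokens ?? 0)
      .sort((a, b) => a - b);
    const medianSize = sizes[Math.floor(sizes.length / 2)];
    summary[model] = { events: group.length, closeFollowers, medianSize };
  }
  return summary;
}

// Placeholder rows; the model names are not real model ids.
const sample = [
  '{"ts":"2026-05-08T00:06:00.000Z","requested_model":"model-a","request_size_tokens":900}',
  '{"ts":"2026-05-08T00:06:00.400Z","requested_model":"model-a","request_size_tokens":1100}',
  '{"ts":"2026-05-08T00:07:30.000Z","requested_model":"model-b","request_size_tokens":52000}',
  "",
].join("\n");
console.log(summarizeByModel(parseJsonl(sample)));
```

Because `request_size_tokens` is the extension's chars/4 estimate rather than a tokenizer count, it is better suited to separating small classifier-sized calls from large main-inference calls than to measuring exact TPM usage.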

package/proxy/server.mjs CHANGED

@@ -53,10 +53,10 @@ async function handleMessages(clientReq, clientRes) {
   const requestedModel = parsed?.model || null;
-  let upstreamRes, responseHeaders, statusCode;
+  let upstreamRes, responseHeaders, statusCode, upstreamConnectionId;
   try {
-    ({ upstreamRes, responseHeaders, statusCode } = await forwardRequest(
+    ({ upstreamRes, responseHeaders, statusCode, upstreamConnectionId } = await forwardRequest(
       clientReq,
       forwardBody,
       abortController.signal
@@ -68,6 +68,11 @@ async function handleMessages(clientReq, clientRes) {
     return;
   }
+  // Stash upstream connection id on meta so downstream extensions
+  // (rate-limit-log, future per-connection diagnostics) can record which
+  // socket carried the request without each one re-instrumenting upstream.
+  meta._upstreamConnectionId = upstreamConnectionId ?? null;
   if (extSnapshot.length > 0) {
     const resCtx = { status: statusCode, headers: responseHeaders, meta };
     await runOnResponseStart(resCtx, extSnapshot);

package/proxy/upstream.mjs CHANGED

@@ -54,6 +54,42 @@ const _agents = new Map();        // cache key → Agent | null
 const _loggedProxies = new Set(); // dedupe stderr "using proxy" lines per (url, isHTTPS)
 let _warnedTlsDisabled = false;
+// --- Upstream connection identity ---
+//
+// Each underlying TCP socket gets a stable id the first time we see it. The
+// id persists across keep-alive reuses of the same socket (WeakMap by socket
+// reference) and dies when the socket is GC'd. New sockets — including
+// reconnects after a closed connection — get fresh ids.
+//
+// This lets the rate-limit-log extension (and any future per-connection
+// diagnostic) record which upstream connection a response came back on, so
+// post-analysis can distinguish per-connection limiter behavior (Lead's H3,
+// 2026-05-08 brief) from client-side queue saturation (H4) or genuinely
+// account-wide limiting.
+//
+// Format: "cn-<int>" — opaque to consumers; only the equality and cardinality
+// matter for analysis.
+let _connectionIdCounter = 0;
+const _socketIds = new WeakMap();
+export function getOrAssignConnectionId(socket) {
+  if (!socket) return null;
+  let id = _socketIds.get(socket);
+  if (id === undefined) {
+    id = `cn-${++_connectionIdCounter}`;
+    _socketIds.set(socket, id);
+  }
+  return id;
+}
+// Test-only: reset the monotonic counter. The WeakMap entries die with their
+// sockets so we don't need to clear them; we just need a predictable start
+// for assertions on id values across cases.
+export function __resetConnectionIdsForTests() {
+  _connectionIdCounter = 0;
+}
 function shouldBypassProxy(hostname) {
   if (!config.noProxy) return false;
   const list = config.noProxy.split(",").map((s) => s.trim().toLowerCase()).filter(Boolean);
@@ -170,10 +206,25 @@ export function forwardRequest(clientReq, body, signal) {
       agent: getAgent(isHTTPS, upstreamUrl.hostname),
     };
+    let upstreamConnectionId = null;
+    // The 'socket' event fires once a socket is assigned to this request,
+    // for both new and pooled keep-alive sockets. Node emits it on a later
+    // tick than the one in which transport.request() returns, so the listener
+    // attached below still sees it, and it always precedes the response callback.
+    const captureSocket = (sock) => {
+      upstreamConnectionId = getOrAssignConnectionId(sock);
+    };
     const upstreamReq = transport.request(options, (upstreamRes) => {
       const responseHeaders = filterResponseHeaders(upstreamRes.headers);
-      resolve({ upstreamRes, responseHeaders, statusCode: upstreamRes.statusCode });
+      resolve({
+        upstreamRes,
+        responseHeaders,
+        statusCode: upstreamRes.statusCode,
+        upstreamConnectionId,
+      });
     });
+    upstreamReq.on("socket", captureSocket);
     upstreamReq.on("error", reject);
     upstreamReq.on("timeout", () => {
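The `upstream_connection_id` value exists so post-analysis can tell whether 429s cluster on one socket (H3, a per-connection limiter) or spread across many (H4, client-side queue saturation), per the comments in rate-limit-log.mjs and upstream.mjs. A hypothetical helper that reduces logged rows to those two signals might look like the following; the function name and the `topShare` metric are assumptions, and no specific cutoff for "clustered" is implied by the source.

```javascript
// Hypothetical sketch: summarize how rate-limit rows spread across upstream
// connection ids (the opaque "cn-<int>" values). Only equality and
// cardinality of ids are used, matching the note in upstream.mjs.
function connectionSpread(rows) {
  const counts = new Map();
  let withId = 0;
  for (const row of rows) {
    const id = row.upstream_connection_id;
    if (id == null) continue; // no socket recorded; excluded from the spread
    withId++;
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  let topId = null;
  let topCount = 0;
  for (const [id, n] of counts) {
    if (n > topCount) {
      topId = id;
      topCount = n;
    }
  }
  return {
    events: withId,
    distinctConnections: counts.size,
    topConnection: topId,
    // Share of events on the busiest connection: values near 1 suggest
    // clustering on one socket (H3); low values across many ids suggest H4.
    topShare: withId === 0 ? 0 : topCount / withId,
  };
}

const rows = [
  { upstream_connection_id: "cn-3" },
  { upstream_connection_id: "cn-3" },
  { upstream_connection_id: "cn-3" },
  { upstream_connection_id: "cn-7" },
  { upstream_connection_id: null },
];
console.log(connectionSpread(rows));
```

Since ids come from a process-local counter, they are only comparable within one proxy run; rows from different proxy restarts can reuse the same `cn-<int>` for unrelated sockets.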

package/tools/usage-to-dashboard-ndjson.mjs CHANGED

@@ -82,8 +82,13 @@
  *   ANTHROPIC_PROXY_LOG_DIR  Override output directory (matches fgrosswig's
  *                            dashboard env var so both tools stay in sync).
  *
- * Part of claude-code-cache-fix. MIT licensed.
+ * Part of claude-code-cache-fix.
  *   https://github.com/cnighswonger/claude-code-cache-fix
+ *
+ * The NDJSON proxy log schema (field names, structure, file naming convention,
+ * cache_health semantics) originates from fgrosswig/claude-usage-dashboard
+ * and is used under the Apache License 2.0.
+ *   https://github.com/fgrosswig/claude-usage-dashboard
  */
 import { readFileSync, writeFileSync, appendFileSync, existsSync, mkdirSync, statSync, watch } from 'node:fs';
@@ -147,9 +152,19 @@ plus the visualization layer from his dashboard, with no coordination needed.
  * Translate one claude-code-cache-fix usage.jsonl record into a
  * fgrosswig-dashboard-compatible NDJSON record. Returns null if the
  * record doesn't have enough fields to be usable.
+ *
+ * Accepts both schemas:
+ *   - Preload-era: `entry.timestamp`, `entry.q5h_pct` / `entry.q7d_pct` (int 0-100)
+ *   - Proxy v:1 (MeterRowSchema, written by `usage-log` extension v3.2.0+):
+ *     `entry.ts`, `entry.q5h` / `entry.q7d` (float 0-1)
+ *
+ * Both forms are handled via fallback so this translator continues to work
+ * across the schema evolution. Tracking issue: #112.
  */
-function translateRecord(entry) {
-  if (!entry || !entry.timestamp || !entry.model) return null;
+export function translateRecord(entry) {
+  // Entry guard — accept both formats. Drop only when neither timestamp form
+  // is present, or model is missing.
+  if (!entry || !(entry.timestamp || entry.ts) || !entry.model) return null;
   const inTok = entry.input_tokens || 0;
   const outTok = entry.output_tokens || 0;
@@ -167,19 +182,28 @@ function translateRecord(entry) {
   }
   // Reconstruct a minimal response_anthropic_headers blob from the quota
-  // pct fields we captured. Not byte-identical to what the proxy would see
-  // on the wire, but structurally compatible for the dashboard's consumers.
+  // fields we captured. Two schema flavors:
+  //   preload: q5h_pct / q7d_pct as int 0-100 (divide by 100 to get utilization)
+  //   v:1:     q5h / q7d as float 0-1 (already in utilization form)
+  // Not byte-identical to what the proxy would see on the wire, but
+  // structurally compatible for the dashboard's consumers.
   const responseHeaders = {};
   if (entry.q5h_pct != null) {
     responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h_pct / 100);
+  } else if (entry.q5h != null) {
+    responseHeaders['anthropic-ratelimit-unified-5h-utilization'] = String(entry.q5h);
   }
   if (entry.q7d_pct != null) {
     responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d_pct / 100);
+  } else if (entry.q7d != null) {
+    responseHeaders['anthropic-ratelimit-unified-7d-utilization'] = String(entry.q7d);
   }
+  const entryTs = entry.timestamp || entry.ts;
   const rec = {
-    ts_start: entry.timestamp,
-    ts_end: entry.timestamp,
+    ts_start: entryTs,
+    ts_end: entryTs,
     duration_ms: null,
     method: 'POST',
     path: '/v1/messages',
@@ -207,7 +231,7 @@ function translateRecord(entry) {
   // Synthesize a stable pseudo-request-id from timestamp + model for dedup
   // at the dashboard layer. Not a real request ID — just a deterministic key.
-  rec.req_id = 'ccf_' + entry.timestamp.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
+  rec.req_id = 'ccf_' + entryTs.replace(/[^0-9]/g, '') + '_' + entry.model.slice(-6);
   return rec;
 }
@@ -339,14 +363,19 @@ function runFollow(opts) {
 // ─── Main ───────────────────────────────────────────────────────────────────
-const opts = parseArgs();
-if (opts.help) {
-  printUsage();
-  process.exit(0);
-}
+// Guard CLI execution so tests can `import { translateRecord }` without
+// auto-running the batch/follow flow.
+const _isMain = import.meta.url === `file://${process.argv[1]}`;
+if (_isMain) {
+  const opts = parseArgs();
+  if (opts.help) {
+    printUsage();
+    process.exit(0);
+  }
-if (opts.follow) {
-  runFollow(opts);
-} else {
-  runBatch(opts);
+  if (opts.follow) {
+    runFollow(opts);
+  } else {
+    runBatch(opts);
+  }
 }
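`translateRecord()` now accepts both quota schemas: it divides preload-era `q5h_pct`/`q7d_pct` (integers 0-100) by 100 and passes proxy v:1 `q5h`/`q7d` (floats 0-1) through unchanged, preferring the `_pct` form when both exist. The hypothetical helpers below isolate that normalization and the `timestamp || ts` fallback so the equivalence can be checked directly; they are not exported by the package.

```javascript
// Hypothetical helpers mirroring translateRecord()'s schema fallbacks.
//   preload-era: q5h_pct / q7d_pct as integers 0-100
//   proxy v:1:   q5h / q7d as floats 0-1
// Returns a utilization fraction, or null when neither field is present.
function utilization(entry, windowKey) {
  const pct = entry[`${windowKey}_pct`];
  if (pct != null) return pct / 100;
  const frac = entry[windowKey];
  if (frac != null) return frac;
  return null;
}

// Pick whichever timestamp field is present, like `entry.timestamp || entry.ts`.
function entryTimestamp(entry) {
  return entry.timestamp || entry.ts || null;
}

const preload = { timestamp: "2026-05-08T00:06:00Z", model: "m", q5h_pct: 42 };
const proxyV1 = { ts: "2026-05-08T00:06:00Z", model: "m", q5h: 0.42 };
console.log(utilization(preload, "q5h"), utilization(proxyV1, "q5h"));
console.log(entryTimestamp(preload) === entryTimestamp(proxyV1));
```

Both sample records normalize to the same utilization and timestamp, so the reconstructed `anthropic-ratelimit-unified-5h-utilization` header is identical regardless of which logger wrote the row.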