claude-code-cache-fix 3.6.0 → 3.6.2

package/README.md CHANGED
@@ -43,6 +43,8 @@ On every `/v1/messages` request, 7 extensions run in order:
43
43
 
44
44
  Extensions are hot-reloadable — add, remove, or modify `.mjs` files in `proxy/extensions/` and changes apply to the next request without restarting. Configuration in `proxy/extensions.json`.
45
45
 
46
+ **Developing a new extension?** See [docs/parallel-proxy-test-harness.md](docs/parallel-proxy-test-harness.md) for the pattern we use to test extensions end-to-end against real `claude -p` traffic without disturbing the production proxy.
47
+
46
48
  ### Running as a service
47
49
 
48
50
  **Recommended (Linux/macOS) — `install-service` subcommand:**
@@ -204,6 +206,42 @@ Options (all optional; all fall back to the same env vars used by the CLI):
204
206
 
205
207
  **CLI invocation is unchanged.** `node proxy/server.mjs`, `cache-fix-proxy server`, and the wrapper's child-fork path all auto-listen and install SIGTERM/SIGINT handlers as before. Library imports never trigger that behavior — the auto-listen is gated behind a main-module check.
206
208
 
209
+ *The embeddable factory was contributed by [@bilby91](https://github.com/bilby91) at [Crunchloop DAP](https://dap.crunchloop.ai) — see [PR #123](https://github.com/cnighswonger/claude-code-cache-fix/pull/123).*
210
+
211
+ ## Recommended CC operational config
212
+
213
+ The proxy fixes what it can fix at the request layer. A handful of CC client-side env vars and `~/.claude/settings.json` knobs solve adjacent problems the proxy can't reach — silent model swaps on CC update, ambiguous model fallback, schema-strip side effects. Surfacing these here as a recommendation; users decide their own config.
214
+
215
+ These findings come from [@fgrosswig](https://github.com/fgrosswig)'s binary analysis of CC v2.1.91. Methodology is public PowerShell + ASCII string extraction; he shared the resulting punch list privately as a courtesy.
216
+
217
+ ### Suggested `~/.claude/settings.json` env block
218
+
219
+ The model IDs below are illustrative — replace with your preferred main and small-fast models. The point is that pinning *something* explicit beats relying on CC's defaults.
220
+
221
+ ```json
222
+ {
223
+ "env": {
224
+ "CLAUDE_CODE_DISABLE_LEGACY_MODEL_REMAP": "1",
225
+ "ANTHROPIC_MODEL": "claude-opus-4-7",
226
+ "ANTHROPIC_SMALL_FAST_MODEL": "claude-haiku-4-5-20251001"
227
+ }
228
+ }
229
+ ```
230
+
231
+ **`CLAUDE_CODE_DISABLE_LEGACY_MODEL_REMAP=1`** — single most impactful flag. CC has a legacy code path that silently remaps your pinned model to a different one after certain version updates. Setting this to `1` disables the remap; the model you pin is the model you get. (If you don't pin, CC's defaults apply as usual.)
232
+
233
+ **`ANTHROPIC_MODEL`** — pins the primary model. Keeping this explicit means the cache prefix hash stays stable across CC version bumps that would otherwise swap your default. Adjust to whichever model you actually want.
234
+
235
+ **`ANTHROPIC_SMALL_FAST_MODEL`** — pins the side-channel "fast" model CC uses for short auxiliary calls (e.g., title generation, classification). Without an explicit pin, this can silently fall back to a different family on update.
236
+
237
+ ### `autoCompactWindow=1000000` caveat
238
+
239
+ If you've seen the `autoCompactWindow: 1000000` setting recommended elsewhere: it only takes effect when the active model qualifies for 1M-context (currently `claude-sonnet-4-6` or `claude-opus-4-6` with the appropriate beta header). Without those preconditions it caps at the hardcoded 200K regardless of what you set.
240
+
241
+ ### `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1` schema-strip side effect
242
+
243
+ If you set this flag, CC strips any tool field outside `["name", "description", "input_schema", "cache_control"]` from outgoing requests. Custom tools relying on `defer_loading` or `eager_input_streaming` will silently lose those fields and behave differently. Worth knowing before turning the flag on.
244
+
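To make the side effect concrete, here is a minimal sketch of an allowlist filter that behaves the way the paragraph above describes. It is not CC's code: `stripToolFields` and the sample tool are hypothetical, and only the four allowed field names come from the text.

```javascript
// Illustration only: mimics the described behavior, not CC's implementation.
// The allowlist is the one quoted in the README text above.
const ALLOWED_TOOL_FIELDS = ["name", "description", "input_schema", "cache_control"];

// Hypothetical helper: keep only the allowlisted keys of a tool definition.
function stripToolFields(tool) {
  const kept = {};
  for (const key of ALLOWED_TOOL_FIELDS) {
    if (key in tool) kept[key] = tool[key];
  }
  return kept;
}

// Hypothetical custom tool that relies on a non-allowlisted field.
const customTool = {
  name: "lookup_order",
  description: "Look up an order by id",
  input_schema: { type: "object", properties: { id: { type: "string" } } },
  defer_loading: true,
};

const sent = stripToolFields(customTool);
console.log(Object.keys(sent)); // defer_loading is no longer present
```

Nothing in the filtered object signals that a field was dropped, which is why the change in tool behavior is easy to miss.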
207
245
  ## Quick Start: Preload (CC v2.1.112 and earlier)
208
246
 
209
247
  If you're on a Node.js-based CC version (v2.1.112 or earlier), the preload interceptor works without a proxy:
@@ -242,7 +280,7 @@ For manual VS Code wrapper setup (without the VSIX), see [docs/preload-setup.md]
242
280
 
243
281
  **What it does NOT do:** No network calls from the proxy or interceptor. All telemetry is written to local files under `~/.claude/`. No data leaves your machine.
244
282
 
245
- **Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. npm provenance links each published version to its source commit.
283
+ **Supply chain:** Proxy mode: 7 small extension modules in `proxy/extensions/` (each under 200 lines). Preload mode: single unminified file (`preload.mjs`, ~1,700 lines). One dev dependency (`zod` for schema validation in tests only). Review before installing. Published builds carry npm's default registry signatures; sigstore provenance attestation is not currently published and is tracked as a follow-up.
246
284
 
247
285
  **Independent audit:** [Assessed as "LEGITIMATE TOOL"](https://github.com/anthropics/claude-code/issues/38335#issuecomment-4244413605) by @TheAuditorTool (2026-04-14).
248
286
 
@@ -319,13 +357,23 @@ The interceptor can only *help* or *do nothing*. It cannot make things worse.
319
357
 
320
358
  Both modes write quota state on every API call. Proxy mode (v3.5.0+) splits into `~/.claude/quota-status/account.json` (account-global fields: Q5h/Q7d, status, overage) plus `~/.claude/quota-status/sessions/<id>.json` (per-session cache fields: TTL tier, hit rate). Preload mode keeps the legacy `~/.claude/quota-status.json` (single-session by construction). The included `tools/quota-statusline.sh` script displays a live status line showing:
321
359
 
322
- - **Q5h %** with burn rate (%/min)
323
- - **Q7d %** with burn rate (%/hr)
360
+ - **Q5h** quota bar `[███░┃░░░░░]` + percent + `(exhaust X, reset Y)`. Filled cells are consumed quota; the heavy-vertical tick is wall-clock elapsed position in the window. Tick to the right of the fill = under pace; tick inside the fill = burning faster than time (over pace). `exhaust` is the projected time-to-100% at the current burn rate; `reset` is the wall-clock time until the window rolls over. When `exhaust < reset`, you will hit 100% before the window resets — back off.
361
+ - **Q7d** same shape with day-scale durations (e.g. `(exhaust 3d 13h, reset 3d 0h)`).
324
362
  - **TTL tier** — `TTL:1h` when healthy, **`TTL:5m` in red when the server has downgraded you** (typically at Q5h ≥ 100%)
325
363
  - **PEAK** in yellow during weekday peak hours (13:00–19:00 UTC)
326
364
  - **Cache hit rate %**
327
365
  - **OVERAGE** flag when active
328
366
 
367
+ Example line (mid-window, healthy state):
368
+
369
+ ```
370
+ Q5h [███░┃░░░░░] 30% (exhaust 4h40m, reset 3h00m) | Q7d [█████┃░░░░] 53% (exhaust 3d 13h, reset 3d 0h) | TTL:1h 98.3%
371
+ ```
372
+
373
+ The `(exhaust …, reset …)` suffix is dropped piecewise when projection isn't meaningful: at 0% (fresh window) and 100% (already exhausted) only `reset` is shown; in the first minute (Q5h) or six minutes (Q7d) after window start the burn rate isn't stable enough to project, so `exhaust` is held back until then; a stale `resets_at` (the server-reported value sits in the past, before the next API call refreshes it) drops both.
374
+
375
+ The bar uses Unicode block characters (`█┃░`) — most modern terminals render these correctly. If your terminal substitutes boxes or replacement glyphs, configure a Unicode-capable font (any DejaVu, Fira, Iosevka, JetBrains Mono, etc.).
376
+
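The `exhaust` figure in the example line can be reproduced by hand. The sketch below is a JavaScript port of the projection arithmetic, written for illustration; the shipped status line does this in Python, and the helper names here are only modeled on that script.

```javascript
// JavaScript port of the projection arithmetic, for illustration only.

// Projected seconds until 100% at the current burn rate (pct / elapsedSec).
// Returns null when a projection is not meaningful.
function timeToExhaustSec(pct, elapsedSec, minElapsedSec) {
  if (elapsedSec === null || elapsedSec <= minElapsedSec) return null;
  if (pct <= 0 || pct >= 100) return null;
  return ((100 - pct) * elapsedSec) / pct;
}

// Format seconds as "<h>h<mm>m", the Q5h style.
function fmtHm(secs) {
  const h = Math.floor(secs / 3600);
  const m = Math.floor((secs % 3600) / 60);
  return `${h}h${String(m).padStart(2, "0")}m`;
}

// Q5h from the example line: 30% consumed, 3h00m until reset,
// so 2h of the 5h window has elapsed.
const windowSecs = 5 * 3600;
const secsLeft = 3 * 3600;
const elapsedSec = windowSecs - secsLeft;
const exhaust = timeToExhaustSec(30, elapsedSec, 60);

console.log(`exhaust ${fmtHm(exhaust)}, reset ${fmtHm(secsLeft)}`);
console.log(exhaust < secsLeft ? "over pace: back off" : "under pace");
```

With 30% burned in 2 hours, the remaining 70% takes 70 / 30 × 2h = 4h40m at the same rate. That is longer than the 3h00m left in the window, so the example is under pace.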
329
377
  ### Setup
330
378
 
331
379
  ```bash
@@ -613,6 +661,35 @@ The Mode A/B separation protects against cases where the sentinel might be follo
613
661
  | `CACHE_FIX_MICROCOMPACT_REDACT_LEN` | `64` | Mode B prefix length in dump records. Set to `0` to suppress the prefix entirely. |
614
662
  | `CACHE_FIX_DUMP_MICROCOMPACT_INCLUDE_NORMALIZED` | unset | Add post-normalization text alongside (not replacing) raw `sentinel_text` in dump records. |
615
663
 
664
+ ## Thinking summaries (proxy mode, opt-in, Opus 4.7+)
665
+
666
+ On Opus 4.7, Anthropic flipped the API default for `thinking.display` from `"summarized"` to `"omitted"`. In parallel, Claude Code's CLI has a `!getIsNonInteractiveSession()` gate that propagates `display: "summarized"` only when the session is interactive. The combination means every CC subprocess spawned with `--input-format stream-json` — the VS Code chat panel, the Antigravity panel, the SDK, `claude --print` — sends a thinking-enabled request (`thinking.type` is either `"enabled"` or `"adaptive"` depending on CC version) without `display`, and the API responds with thinking blocks whose `thinking` field is empty (plus a multi-KB signature). The UI shows a static "Thinking" stub while the agent runs but never any reasoning content.
667
+
668
+ Upstream root cause and patch proposed in [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) (credit: [@ojura](https://github.com/ojura)). This extension is the proxy-side complement: when a request to an Opus 4.7 endpoint has thinking enabled but `display` unset, it injects the configured mode at the API boundary. It works on any CC version routed through cache-fix-proxy, without waiting for Anthropic to ship the CLI fix.
669
+
670
+ ```sh
671
+ # Restore summaries (the built-in default — non-interactive surfaces get reasoning content)
672
+ export CACHE_FIX_THINKING_DISPLAY=summarized
673
+
674
+ # Force-suppress override (agent runtimes that don't want thinking blocks at all)
675
+ export CACHE_FIX_THINKING_DISPLAY=omitted
676
+
677
+ # Explicit no-op (extension passes through unchanged)
678
+ export CACHE_FIX_THINKING_DISPLAY=disabled
679
+ ```
680
+
681
+ The extension is **default-on** as of v3.6.1. The cache-prefix test measured 0% absolute drop in steady-state `cache_read` ratio when injection is active on Opus 4.7 (5 sequential `claude -p` calls per window, baseline vs injected — both windows held 1.000 cache_read ratio from call 2 onward). Adding `thinking.display` to the request body changes the bytes Anthropic hashes, but Anthropic's cache layer accepts and indexes the injected-prefix the same way it does any other prefix. Users who want the older "no injection" behavior (e.g. to avoid any request-body mutation at all) explicitly set `CACHE_FIX_THINKING_DISPLAY=disabled`.
682
+
683
+ Scoping rules baked into the extension:
684
+
685
+ - **Model-gated.** Only fires on requests whose `model` matches `/^claude-opus-4-7/` — covers `claude-opus-4-7` and `claude-opus-4-7-1m`. Sonnet 4.7 needs separate verification (the API default-flip may differ); future versions (4.8+) require an explicit cache-fix bump rather than auto-applying unverified behavior.
686
+ - **User opt-out preserved.** If the request already has `thinking.display` set (either `"summarized"` or `"omitted"`), the extension never overwrites. Explicit user choice always wins.
687
+ - **Thinking-active types only.** The extension fires on `thinking.type` ∈ `{ "enabled", "adaptive" }` — the two active modes that produce thinking blocks on Opus 4.7. Other values (`"disabled"`, future modes) are skipped. Conservative: if Anthropic ships a new thinking type with different display semantics, we'd rather miss the fix than auto-apply incorrect behavior.
688
+
689
+ | Env var | Default | Purpose |
690
+ |---------|---------|---------|
691
+ | `CACHE_FIX_THINKING_DISPLAY` | `summarized` (built-in) | One of `summarized` / `omitted` / `disabled`. `summarized` restores thinking summaries (default). `omitted` force-suppresses thinking blocks. `disabled` opts the extension out entirely. |
692
+
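The scoping rules can be checked against a few sample request bodies. The sketch below restates the three checks in standalone form; the sample bodies are hypothetical and show only the fields the rules inspect.

```javascript
// Standalone restatement of the three scoping rules, for illustration.
// The regex and the active-type set are the ones named in the rules above.
const MODEL_REGEX = /^claude-opus-4-7/;
const ACTIVE_THINKING_TYPES = new Set(["enabled", "adaptive"]);

function shouldInject(body) {
  if (!body || typeof body !== "object") return false;
  if (typeof body.model !== "string" || !MODEL_REGEX.test(body.model)) return false;
  if (!body.thinking || typeof body.thinking !== "object") return false;
  if (!ACTIVE_THINKING_TYPES.has(body.thinking.type)) return false;
  // Explicit choices are preserved: inject only when display is unset.
  return body.thinking.display === undefined;
}

// Hypothetical request bodies.
const samples = [
  { model: "claude-opus-4-7", thinking: { type: "adaptive" } },                   // inject
  { model: "claude-opus-4-7", thinking: { type: "enabled", display: "omitted" } }, // user choice wins
  { model: "claude-opus-4-7", thinking: { type: "disabled" } },                   // not an active type
  { model: "claude-sonnet-4-6", thinking: { type: "enabled" } },                  // model gate
];

const decisions = samples.map(shouldInject);
console.log(decisions); // true, false, false, false
```

Only the first body is modified. Each of the other three fails exactly one rule, so the request passes through unchanged.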
616
693
  ## System prompt rewrite (preload mode, optional)
617
694
 
618
695
  The interceptor can rewrite Claude Code's `# Output efficiency` system-prompt section. Disabled by default. Enable with `CACHE_FIX_OUTPUT_EFFICIENCY_REPLACEMENT`. See [docs/output-efficiency-prompts.md](docs/output-efficiency-prompts.md) for the three known prompt variants and usage instructions.
@@ -643,13 +720,13 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
643
720
 
644
721
  ## Used in production
645
722
 
646
- - **[Crunchloop DAP](https://dap.crunchloop.ai)** — Agent SDK / DAP development environment. First production team to merge the interceptor to trunk for team-wide deployment (2026-04-10). Identified two distinct cache regression patterns through real-world testing — tool ordering jitter and the fresh-session sort gap — and contributed debug traces that drove the v1.5.1 and v1.6.2 fixes.
723
+ - **[Crunchloop DAP](https://dap.crunchloop.ai)** — Agent SDK / DAP development environment. First production team to merge the interceptor to trunk for team-wide deployment (2026-04-10). Identified two distinct cache regression patterns through real-world testing — tool ordering jitter and the fresh-session sort gap — and contributed debug traces that drove the v1.5.1 and v1.6.2 fixes. Contributed the embeddable proxy factory (v3.6.0) that lets the proxy run in-process inside Bun-compiled and DAP-style agent binaries without forking a Node child.
647
724
  - **[VM Farms](https://vmfarms.com)** ([@vmfarms](https://github.com/vmfarms)) — Agent development environment running concurrent multi-runner workloads with `--resume --fork-session`. Surfaced three cache-fix proxy-mode bugs: the resume-marker regex no-op (#96), TTL tier detection gap vs preload mode (#97), and image-strip stderr leak past `CACHE_FIX_DEBUG` (#98) — all addressed in the v3.4.0 release.
648
725
 
649
726
  ## Contributors
650
727
 
651
728
  - **[@VictorSun92](https://github.com/VictorSun92)** — Original monkey-patch fix for v2.1.88, identified partial scatter on v2.1.90, contributed forward-scan detection, correct block ordering, tighter block matchers, and the optional output-efficiency rewrite hook
652
- - **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk.
729
+ - **[@bilby91](https://github.com/bilby91)** ([Crunchloop DAP](https://dap.crunchloop.ai)) — Agent SDK / DAP production environment validation, 1h cache TTL confirmation, tool ordering jitter discovery via debug trace (fixed in v1.5.1), fresh-session sort bug discovery via SKILLS SORT diagnostic (fixed in v1.6.2). First production team to roll the interceptor to trunk. Designed and contributed the embeddable proxy factory (`startProxy()` / `createProxyServer()`) shipped in v3.6.0 (PR #123).
653
730
  - **[@jmarianski](https://github.com/jmarianski)** — Root cause analysis via MITM proxy capture and Ghidra reverse engineering, multi-mode cache test script
654
731
  - **[@cnighswonger](https://github.com/cnighswonger)** — Fingerprint stabilization, tool ordering fix, image stripping, monitoring features, overage TTL downgrade discovery, proxy architecture, package maintainer
655
732
  - **[@ArkNill](https://github.com/ArkNill)** — Microcompact mechanism analysis, GrowthBook flag documentation, false rate limiter identification, fingerprint verification fix for CC v2.1.108+ (PR #21), Korean README (PR #22), [claude-code-hidden-problem-analysis](https://github.com/ArkNill/claude-code-hidden-problem-analysis) research
@@ -662,6 +739,7 @@ We monitor 30+ upstream Claude Code issues related to cache, quota, and context
662
739
  - **[@X-15](https://github.com/X-15)** — VS Code extension validation, per-fix health status analysis confirming safety check behavior on v2.1.105 (#16)
663
740
  - **[@deafsquad](https://github.com/deafsquad)** — Universal smoosh_split un-smoosh fix (PR #26), source-level function attribution of resume scatter bug (anthropics/claude-code#43657), OTEL telemetry discovery, proposed and built proxy architecture for v3.0.0
664
741
  - **[@vmfarms](https://github.com/vmfarms)** — Concurrent multi-runner production validation, surfaced proxy-mode resume-marker regex no-op (#96), TTL tier detection gap (#97), and image-strip stderr leak (#98)
742
+ - **[@ojura](https://github.com/ojura)** — Opus 4.7 thinking-summaries root-cause analysis: filed [anthropics/claude-code#59844](https://github.com/anthropics/claude-code/issues/59844) with the CLI-binary decode (`!getIsNonInteractiveSession()` gate at offset 230510599 in v2.1.142) and the two-stacked-special-cases framing, which made the `thinking-display` extension (v3.6.1) a clean proxy-side complement to the proposed upstream fix
665
743
 
666
744
  If you contributed to the community effort on these issues and aren't listed here, please open an issue or PR — we want to credit everyone properly.
667
745
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "claude-code-cache-fix",
-   "version": "3.6.0",
+   "version": "3.6.2",
    "description": "Cache optimization proxy and interceptor for Claude Code. Fixes prompt cache bugs, stabilizes prefix, reduces quota burn.",
    "type": "module",
    "exports": {
@@ -0,0 +1,79 @@
+ // thinking-display extension
+ //
+ // Restores Opus 4.7 thinking summaries in non-interactive CC surfaces
+ // (VS Code chat panel, SDK, `claude --print`, anything spawned with
+ // `--input-format stream-json`).
+ //
+ // Background: Anthropic flipped the `thinking.display` default to `"omitted"`
+ // on Opus 4.7. CC's CLI propagates `display: "summarized"` only when the
+ // session is interactive (the `!getIsNonInteractiveSession()` gate); every
+ // non-interactive subprocess gets the API default of omitted thinking, which
+ // renders as an empty stub in the IDE. Upstream root cause and patch proposed
+ // in anthropics/claude-code#59844 (credit: @ojura).
+ //
+ // This extension is the proxy-side workaround. When a request has thinking
+ // enabled but `display` unset, inject the configured mode at the API
+ // boundary. Works on any CC version routed through cache-fix-proxy without
+ // waiting for Anthropic to ship the CLI fix.
+ //
+ // Config (env var; built-in default is "summarized"):
+ //   CACHE_FIX_THINKING_DISPLAY=summarized — inject display: "summarized"
+ //     (main case; restores summaries in IDE/SDK/--print). DEFAULT.
+ //   CACHE_FIX_THINKING_DISPLAY=omitted — inject display: "omitted"
+ //     (force-suppress override; for agent runtimes that don't want thinking
+ //     blocks at all, regardless of what their CLI sends)
+ //   CACHE_FIX_THINKING_DISPLAY=disabled — no injection; extension is a no-op
+ //
+ // Default flipped to "summarized" in v3.6.1 after the cache-prefix test on
+ // Opus 4.7 measured 0% absolute drop in steady-state cache_read ratio with
+ // injection enabled (well inside the ≤5% "preserved" threshold). Users who
+ // want the older "no injection" behavior set CACHE_FIX_THINKING_DISPLAY=disabled.
+
+ const MODEL_REGEX = /^claude-opus-4-7/;
+
+ function resolveMode() {
+   const v = process.env.CACHE_FIX_THINKING_DISPLAY;
+   if (v === "summarized" || v === "omitted" || v === "disabled") return v;
+   return "summarized";
+ }
+
+ // Thinking types that produce thinking blocks. CC v2.1.131+ ships
+ // `type: "adaptive"` (dynamic-budget mode) by default for the Bun binary's
+ // non-interactive paths; older versions and explicit-budget configs may
+ // still send `"enabled"`. Both produce the same empty-thinking symptom
+ // when `display` is unset on Opus 4.7, so both are in scope.
+ const ACTIVE_THINKING_TYPES = new Set(["enabled", "adaptive"]);
+
+ function shouldInject(body) {
+   if (!body || typeof body !== "object") return false;
+   if (typeof body.model !== "string") return false;
+   if (!MODEL_REGEX.test(body.model)) return false;
+   if (!body.thinking || typeof body.thinking !== "object") return false;
+   if (!ACTIVE_THINKING_TYPES.has(body.thinking.type)) return false;
+   // Only inject when display is unset. Preserve any explicit user choice
+   // (including explicit "omitted" for compliance opt-out).
+   return body.thinking.display === undefined;
+ }
+
+ export { MODEL_REGEX, ACTIVE_THINKING_TYPES, resolveMode, shouldInject };
+
+ export default {
+   name: "thinking-display",
+   description:
+     "Inject thinking.display on Opus 4.7 requests when unset, to restore " +
+     "thinking summaries lost to CC's non-interactive CLI gate (claude-code#59844)",
+   enabled: false,
+   order: 360,
+
+   async onRequest(ctx) {
+     const mode = resolveMode();
+     if (mode === "disabled") return;
+     if (!shouldInject(ctx.body)) return;
+
+     ctx.body.thinking.display = mode;
+
+     if (ctx.meta) {
+       ctx.meta.thinkingDisplayInjected = mode;
+     }
+   },
+ };
@@ -1,18 +1,70 @@
  {
-   "ttl-tier-detect": { "enabled": true, "order": 75 },
-   "fingerprint-strip": { "enabled": true, "order": 100 },
-   "image-strip": { "enabled": true, "order": 150 },
-   "sort-stabilization": { "enabled": true, "order": 200 },
-   "fresh-session-sort": { "enabled": true, "order": 250 },
-   "identity-normalization": { "enabled": true, "order": 300 },
-   "smoosh-split": { "enabled": true, "order": 320 },
-   "content-strip": { "enabled": true, "order": 330 },
-   "tool-input-normalize": { "enabled": true, "order": 340 },
-   "microcompact-stability": { "enabled": true, "order": 350 },
-   "cache-control-normalize": { "enabled": true, "order": 400 },
-   "messages-cache-breakpoint": { "enabled": true, "order": 410 },
-   "ttl-management": { "enabled": true, "order": 500 },
-   "cache-telemetry": { "enabled": true, "order": 600 },
-   "overage-warning": { "enabled": true, "order": 610 },
-   "request-log": { "enabled": false, "order": 700 }
+   "ttl-tier-detect": {
+     "enabled": true,
+     "order": 75
+   },
+   "fingerprint-strip": {
+     "enabled": true,
+     "order": 100
+   },
+   "image-strip": {
+     "enabled": true,
+     "order": 150
+   },
+   "sort-stabilization": {
+     "enabled": true,
+     "order": 200
+   },
+   "fresh-session-sort": {
+     "enabled": true,
+     "order": 250
+   },
+   "identity-normalization": {
+     "enabled": true,
+     "order": 300
+   },
+   "smoosh-split": {
+     "enabled": true,
+     "order": 320
+   },
+   "content-strip": {
+     "enabled": true,
+     "order": 330
+   },
+   "tool-input-normalize": {
+     "enabled": true,
+     "order": 340
+   },
+   "microcompact-stability": {
+     "enabled": true,
+     "order": 350
+   },
+   "thinking-display": {
+     "enabled": true,
+     "order": 360
+   },
+   "cache-control-normalize": {
+     "enabled": true,
+     "order": 400
+   },
+   "messages-cache-breakpoint": {
+     "enabled": true,
+     "order": 410
+   },
+   "ttl-management": {
+     "enabled": true,
+     "order": 500
+   },
+   "cache-telemetry": {
+     "enabled": true,
+     "order": 600
+   },
+   "overage-warning": {
+     "enabled": true,
+     "order": 610
+   },
+   "request-log": {
+     "enabled": false,
+     "order": 700
+   }
  }
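As a rough illustration of where the new `thinking-display` entry lands in the pipeline, the sketch below derives a run order from a subset of this config. It assumes that extensions run in ascending `order` and that entries with `enabled: false` are skipped; the diff does not show the loader, and `runOrder` is a hypothetical helper, not part of the package.

```javascript
// Assumption: ascending `order` decides run order; disabled entries are skipped.
// Subset of the config above, deliberately listed out of order.
const config = {
  "cache-control-normalize": { enabled: true, order: 400 },
  "thinking-display": { enabled: true, order: 360 },
  "microcompact-stability": { enabled: true, order: 350 },
  "request-log": { enabled: false, order: 700 },
};

// Hypothetical helper: names of enabled extensions, sorted by `order`.
function runOrder(cfg) {
  return Object.entries(cfg)
    .filter(([, entry]) => entry.enabled)
    .sort(([, a], [, b]) => a.order - b.order)
    .map(([name]) => name);
}

const order = runOrder(config);
console.log(order.join(" -> "));
```

Under that assumption, `thinking-display` (360) runs after `microcompact-stability` (350) and before `cache-control-normalize` (400), so any injected `thinking.display` field is already in the body when the cache-control extensions see it.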
@@ -41,7 +41,7 @@ fi
  # through os.environ, never via a shell-substituted string.
  result=$(python3 <<'PYEOF' 2>/dev/null
  import sys, json, os, re, hashlib
- from datetime import datetime, timezone, timedelta
+ from datetime import datetime, timezone
 
  home = os.path.expanduser('~')
  account_path = os.path.join(home, '.claude', 'quota-status', 'account.json')
@@ -98,28 +98,76 @@ ts = sess.get('timestamp') or acc.get('timestamp', '')
 
  now = datetime.fromisoformat(ts.replace('Z', '+00:00')) if ts else datetime.now(timezone.utc)
 
- # Q5h burn rate
- rate5 = ''
- if q5h_reset > 0 and q5h > 0:
-     window_start = datetime.fromtimestamp(q5h_reset, tz=timezone.utc) - timedelta(hours=5)
-     elapsed_min = (now - window_start).total_seconds() / 60
-     if elapsed_min > 1:
-         rate5 = '{:+.1f}'.format(q5h / elapsed_min)
-
- # Q7d burn rate
- rate7 = ''
- if q7d_reset > 0 and q7d > 0:
-     window_start_7d = datetime.fromtimestamp(q7d_reset, tz=timezone.utc) - timedelta(days=7)
-     elapsed_hr = (now - window_start_7d).total_seconds() / 3600
-     if elapsed_hr > 0.1:
-         rate7 = '{:+.1f}'.format(q7d / elapsed_hr)
-
- label = 'Q5h: {}%'.format(q5h)
- if rate5:
-     label += ' ({}%/m)'.format(rate5)
- label += ' | Q7d: {}%'.format(q7d)
- if rate7:
-     label += ' ({}%/hr)'.format(rate7)
+ BAR_WIDTH = 10
+
+ def draw_bar(consumed_pct, elapsed_pct, width=BAR_WIDTH):
+     # Tick overlays a fill cell when consumed > elapsed, keeping bar width
+     # constant; that's what makes the over-pace state legible (┃ inside the
+     # filled run) rather than just pushing fill cells around.
+     fill = int(round(max(0, min(100, consumed_pct)) / 100 * width))
+     if elapsed_pct is None:
+         tick = -1
+     else:
+         tick = min(int(max(0, min(100, elapsed_pct)) / 100 * width), width - 1)
+     cells = []
+     remaining = fill
+     for i in range(width):
+         if i == tick:
+             cells.append('┃')
+         elif remaining > 0:
+             cells.append('█')
+             remaining -= 1
+         else:
+             cells.append('░')
+     return '[' + ''.join(cells) + ']'
+
+ def fmt_hm(secs):
+     if secs is None or secs <= 0:
+         return ''
+     return '{}h{:02d}m'.format(int(secs // 3600), int((secs % 3600) // 60))
+
+ def fmt_dh(secs):
+     if secs is None or secs <= 0:
+         return ''
+     return '{}d {}h'.format(int(secs // 86400), int((secs % 86400) // 3600))
+
+ def window_view(reset_ts, window_secs):
+     # Returns (elapsed_sec, secs_left). elapsed_sec may be negative (server
+     # gave us a reset_at past the window head — invalid) or exceed window_secs
+     # (stale reset_at not yet refreshed by the next API call). Callers handle
+     # both; downstream rendering clamps the tick to the bar edges.
+     if reset_ts <= 0:
+         return None, None
+     window_start = datetime.fromtimestamp(reset_ts - window_secs, tz=timezone.utc)
+     return (now - window_start).total_seconds(), reset_ts - now.timestamp()
+
+ def time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec):
+     # (100 - pct) divided by current burn rate (pct / elapsed_sec). Gated on
+     # min_elapsed_sec so very-fresh windows don't project off noise.
+     if elapsed_sec is None or elapsed_sec <= min_elapsed_sec:
+         return None
+     if pct <= 0 or pct >= 100:
+         return None
+     return (100 - pct) * elapsed_sec / pct
+
+ def format_window(name, pct, elapsed_sec, window_secs, secs_left, fmt_time, min_elapsed_sec):
+     ep = None if elapsed_sec is None or elapsed_sec < 0 else elapsed_sec / window_secs * 100
+     extras = []
+     stale = secs_left is not None and secs_left <= 0
+     if not stale:
+         exhaust = time_to_exhaust_sec(pct, elapsed_sec, min_elapsed_sec)
+         if exhaust is not None:
+             extras.append('exhaust ' + fmt_time(exhaust))
+     if secs_left is not None and secs_left > 0:
+         extras.append('reset ' + fmt_time(secs_left))
+     tail = ' (' + ', '.join(extras) + ')' if extras else ''
+     return '{} {} {}%{}'.format(name, draw_bar(pct, ep), pct, tail)
+
+ elapsed_5h, left_5h = window_view(q5h_reset, 5 * 3600)
+ elapsed_7d, left_7d = window_view(q7d_reset, 7 * 86400)
+
+ label = format_window('Q5h', q5h, elapsed_5h, 5 * 3600, left_5h, fmt_hm, 60)
+ label += ' | ' + format_window('Q7d', q7d, elapsed_7d, 7 * 86400, left_7d, fmt_dh, 360)
  if overage == 'active':
      label += ' | OVERAGE'