@askalf/dario 3.13.1 → 3.13.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3) hide show
  1. package/README.md +176 -202
  2. package/dist/cc-template.js +16 -16
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  <p align="center">
2
2
  <h1 align="center">dario</h1>
3
- <p align="center"><strong>A local LLM router. One endpoint on your machine, every provider behind it, your tools don't need to change.</strong></p>
3
+ <p align="center"><strong>Your Claude Max subscription, in every tool you use.<br>Plus OpenAI, Groq, OpenRouter, Ollama one local URL. Your tools don't change a line.</strong></p>
4
4
  </p>
5
5
 
6
6
  <p align="center">
@@ -11,134 +11,111 @@
11
11
  <a href="https://www.npmjs.com/package/@askalf/dario"><img src="https://img.shields.io/npm/dm/@askalf/dario" alt="Downloads"></a>
12
12
  </p>
13
13
 
14
- <p align="center">
15
- <a href="#quick-start">Quick Start</a> &bull;
16
- <a href="#who-this-is-for">Who it's for</a> &bull;
17
- <a href="#backends">Backends</a> &bull;
18
- <a href="#why-switch">Why switch</a> &bull;
19
- <a href="#trust--transparency">Trust</a> &bull;
20
- <a href="#faq">FAQ</a>
21
- </p>
22
-
23
- ---
24
-
25
- ## What it is
26
-
27
- Dario runs on your machine and gives every tool you use one local URL that reaches **every LLM you use.** Point Cursor, Continue, Aider, LiteLLM, your own scripts — anything that speaks the Anthropic or OpenAI API — at `http://localhost:3456`, and dario routes each request to the right backend:
28
-
29
- - **OpenAI** — your API key, routed to `api.openai.com` straight through.
30
- - **Any OpenAI-compat endpoint** — OpenRouter, Groq, a local LiteLLM, Ollama's openai-compat mode, self-hosted vLLM. Set the backend's `baseUrl` once, done.
31
- - **Claude Max / Pro subscriptions** — OAuth-backed, billed against your plan instead of API pricing. Multi-account pooling if you have more than one.
32
-
33
- Your tool sees one base URL. `gpt-4o` goes to OpenAI. `llama-3-70b` goes to Groq. `claude-opus-4-6` goes to your Claude subscription. None of your tools have to know about any of it.
14
+ ```bash
15
+ npm install -g @askalf/dario && dario login && dario proxy
16
+ ```
34
17
 
35
- **Backends are plugins, not the product.** Dario's job is the one local endpoint your tools point at. Each backend is a swappable adapter behind it when a provider ships, a backend entry lands, your tools don't change. That's the durable part.
18
+ Three commands. Your Claude Max subscription is now the Claude backend for **Cursor, Continue, Aider, Zed, OpenCode, Claude Code itself, your own scripts** — anything that speaks the Anthropic or OpenAI API. Point them at `http://localhost:3456` and the model name decides where the request goes: `claude-opus-4-6` hits your Max plan, `gpt-4o` hits your OpenAI key, `llama-3.3-70b` hits your Groq / OpenRouter / local vLLM.
36
19
 
37
- **No account anywhere is required.** Single-backend Claude dario works with nothing but `dario login`. Multi-backend dario works with nothing but local config files. Nothing phones home. Zero runtime dependencies. ~2,000 lines of TypeScript.
20
+ **Zero runtime dependencies. ~6,500 lines of TypeScript. 376 tests across 12 suites. [SLSA-attested](https://www.npmjs.com/package/@askalf/dario) on every release. Nothing phones home, ever.**
38
21
 
39
22
  ---
40
23
 
41
- ## Who this is for
24
+ ## Before and after
42
25
 
43
- **Best fit:**
26
+ Before dario, your $200/mo Claude Max is a Claude-Code-only thing. Every other tool you open bills per-token against the Anthropic API:
44
27
 
45
- - **Developers using multiple LLMs across multiple tools** who are tired of juggling base URLs, API keys, and per-tool provider configs.
46
- - **Teams running local or hosted OpenAI-compat servers** (LiteLLM, vLLM, Ollama, Groq, OpenRouter) who want one stable local endpoint in front of them that every tool can reuse.
47
- - **Anyone who wants to switch providers without reconfiguring every tool** — change the model name in your tool, dario picks a different backend, your tool keeps working.
48
- - **Claude Max or Pro subscribers** who want their subscription usable anywhere that speaks the Anthropic or OpenAI API — without paying API rates for every request.
49
- - **Power users running multi-agent workloads on Claude subscriptions** who want multi-account pooling with headroom-aware routing on their own machine, against their own subscriptions, without a hosted platform.
50
-
51
- **Not a fit:**
52
-
53
- - You need vendor-managed production SLAs on every request. Use the provider APIs directly.
54
- - You need a hosted multi-tenant routing platform with a dashboard. Try [askalf](https://askalf.org), a separate product in the same family — different problem, different tool.
55
- - You want a chat UI. Use claude.ai or chatgpt.com.
28
+ | Tool | Claude backend |
29
+ |---|---|
30
+ | Claude Code | subscription |
31
+ | Cursor | pay per token |
32
+ | Aider | pay per token |
33
+ | Continue | pay per token |
34
+ | Zed | pay per token |
35
+ | Your own scripts | pay per token |
36
+
37
+ After `dario proxy`, same list:
38
+
39
+ | Tool | Claude backend | OpenAI backend | Local / OpenRouter / Groq |
40
+ |---|---|---|---|
41
+ | Claude Code | subscription ✓ | gpt-4o passthrough | passthrough |
42
+ | Cursor | **subscription ✓** | gpt-4o passthrough | passthrough |
43
+ | Aider | **subscription ✓** | gpt-4o passthrough | passthrough |
44
+ | Continue | **subscription ✓** | gpt-4o passthrough | passthrough |
45
+ | Zed | **subscription ✓** | gpt-4o passthrough | passthrough |
46
+ | Your own scripts | **subscription ✓** | gpt-4o passthrough | passthrough |
47
+
48
+ The trick: every outbound request on the Claude path is rebuilt to look exactly like a request Claude Code itself would make — system prompt, tool definitions, fingerprint headers, billing tag, beta flags — using a **live-extracted template from your actually-installed CC binary** that self-heals on every Anthropic release. Anthropic's classifier sees a CC session because, from the wire up, it is one. That's what keeps your usage on subscription billing instead of API overage.
56
49
 
57
50
  ---
58
51
 
59
- ## First use case
60
-
61
- > I install dario, point every tool I already use at `http://localhost:3456`, and every LLM I have access to works through that one URL.
62
-
63
- Flow on a fresh machine:
52
+ ## Quick start
64
53
 
65
54
  ```bash
66
55
  # Install
67
56
  npm install -g @askalf/dario
68
57
 
69
- # Optional: log in to your Claude subscription (Max or Pro)
58
+ # Claude subscription path uses your Claude Code OAuth if CC is installed,
59
+ # runs its own OAuth flow otherwise
70
60
  dario login
71
61
 
72
- # Optional: add an OpenAI-compat backend
73
- dario backend add openai --key=sk-proj-...
62
+ # OpenAI or any OpenAI-compat backend (optional, additive)
63
+ dario backend add openai --key=sk-proj-...
64
+ dario backend add groq --key=gsk_... --base-url=https://api.groq.com/openai/v1
65
+ dario backend add openrouter --key=sk-or-... --base-url=https://openrouter.ai/api/v1
66
+ dario backend add local --key=anything --base-url=http://127.0.0.1:11434/v1
74
67
 
75
68
  # Start the proxy
76
69
  dario proxy
77
70
 
78
- # Use it — set these once, every tool that honors them just works
71
+ # Point every tool at one local URL
79
72
  export ANTHROPIC_BASE_URL=http://localhost:3456
80
73
  export ANTHROPIC_API_KEY=dario
81
74
  export OPENAI_BASE_URL=http://localhost:3456/v1
82
75
  export OPENAI_API_KEY=dario
83
76
  ```
84
77
 
85
- Now from the same Cursor/Continue/Aider instance:
86
-
87
- - `gpt-4o` → OpenAI, your key, straight through
88
- - `claude-opus-4-6` → Claude subscription, billed against your Max plan
89
- - `opus` → shortcut, same as above
90
- - `llama-3.1-70b` on OpenRouter → configure `dario backend add openrouter --key=sk-or-... --base-url=https://openrouter.ai/api/v1`, done
91
-
92
- One URL. Your tool doesn't know or care which provider is answering.
78
+ That's it. Every tool that honors these standard env vars now reaches every backend you configured. Switching providers is a model-name change in your tool — not a reconfigure of every SDK you've got.
93
79
 
94
80
  ---
95
81
 
96
- ## Why switch
82
+ ## Why you'll install this
97
83
 
98
- **Use dario if** you use more than one LLM provider, or more than one tool, or both and you're tired of configuring each tool with a different base URL and API key per provider.
84
+ **You pay for Claude Max but only use it in Claude Code.** Cursor, Aider, Zed, Continue they all want API keys and bill per-token while your $200/mo subscription sits idle. Dario routes Claude requests from all of them against your plan instead. The live fingerprint extractor reads your installed CC binary on every startup, so when Anthropic rotates the template, dario picks up the new one automatically — no release needed, no user action.
99
85
 
100
- **Use dario if** you want provider independence. Switching from GPT-4o to Claude to Llama is a model-name change in your tool, not a reconfigure of every SDK and base URL you've got.
86
+ **You hit 5-hour rate limits on long agent runs.** Add a second / third subscription with `dario accounts add work` and pool mode routes each request to whichever account has the most headroom. **v3.13.0 session stickiness** pins a conversation to one account for its lifetime so the Anthropic prompt cache isn't shredded across accounts mid-conversation — a 5–10× token cost saving on cache-reused turns of a long agent session. If a 429 lands mid-request, dario fails the *in-flight request* over to a different account without your client ever seeing the error.
101
87
 
102
- **Use dario if** you pay for Claude Max or Pro and you want that subscription reachable from every tool on your machine, without paying API rates or opening a second billing surface.
88
+ **You want to share capacity with a trusted group without surveilling each other.** The **v3.13.0 sealed-sender overflow protocol** uses RSA blind signatures (Chaum 1983, implemented from scratch over Node's `crypto`) so members of a trust group can lend unused Claude capacity to each other with cryptographic unlinkability. A lender verifies "this is a valid group member" without learning *which* member. It's the privacy primitive that makes friends-pool possible — and as far as I know, no other Claude router ships this.
103
89
 
104
- **Use dario pool mode if** you're running multi-agent workloads on Claude subscriptions and hitting per-account rate limits. Add 2–N accounts with `dario accounts add` and dario routes across them by per-account headroom, all on your machine, against your own subscriptions. See [Multi-Account Pool Mode](#multi-account-pool-mode).
90
+ **You want the proxy layer off the wire entirely.** **Shim mode** (v3.12.0, hardened in v3.13.0) is an in-process `globalThis.fetch` patch injected via `NODE_OPTIONS=--require`. No HTTP hop, no port to bind, no `BASE_URL` to set. Anthropic literally cannot detect this from outside the CC process without shipping signed-binary integrity checks against `globalThis` from inside their own binary. `dario shim -- claude --print "hi"` and CC thinks it's talking directly to `api.anthropic.com`.
105
91
 
106
- **Use a provider API directly if** you need vendor-managed production SLAs or high-scale orchestration primitives the providers ship themselves. Dario isn't trying to replace their APIs it's trying to put one local shim in front of all of them so your tools don't care which is which.
92
+ **You want provider independence.** Switching from Claude to GPT-4o to Llama-3.3-70b to a local Qwen-Coder is a **model-name change** in your tool. Not a reconfigure. Not new base URLs. Not new API keys. Not a new SDK import. One URL, one fake key, every real provider behind it.
107
93
 
108
- **Don't use dario if** you want a subprocess bridge that shells out to `claude --print` under the hood (openclaw-claude-bridge and similar). That's a valid answer for single-team single-machine workloads that can accept a one-subscription rate ceiling and a one-machine deployment different tradeoffs, different tool.
94
+ **You want to actually audit the thing.** ~6,500 lines of TypeScript across ~15 files. Zero runtime dependencies (`npm ls --production` confirms). Credentials stored at `~/.dario/` with `0600` permissions. `127.0.0.1`-only by default. Every release [SLSA-attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions. Nothing phones home. It's small enough to read in a weekend.
109
95
 
110
96
  ---
111
97
 
112
- ## Quick Start
113
-
114
- ```bash
115
- # Install
116
- npm install -g @askalf/dario
117
-
118
- # Claude subscription path (detects Claude Code credentials if CC is installed,
119
- # runs its own OAuth flow otherwise)
120
- dario login
98
+ ## Who this is for
121
99
 
122
- # OpenAI or any OpenAI-compat provider (optional, additive)
123
- dario backend add openai --key=sk-proj-...
100
+ **Best fit:**
124
101
 
125
- # Start the proxy
126
- dario proxy
102
+ - **Developers using multiple LLMs across multiple tools** tired of juggling base URLs, keys, and per-tool provider configs.
103
+ - **Claude Max / Pro subscribers** who want their subscription usable from every tool on their machine, not just Claude Code.
104
+ - **Teams running local or hosted OpenAI-compat servers** (LiteLLM, vLLM, Ollama, Groq, OpenRouter) who want one stable local endpoint that every tool reuses.
105
+ - **Power users on multi-agent workloads** who want multi-account pooling, session stickiness, and in-flight 429 failover on their own machine, against their own subscriptions.
106
+ - **Anyone building AI coding tools** who wants provider independence without writing an OpenAI ↔ Anthropic translator themselves.
127
107
 
128
- # Point anything that speaks the Anthropic or OpenAI API at localhost:3456
129
- export ANTHROPIC_BASE_URL=http://localhost:3456
130
- export ANTHROPIC_API_KEY=dario
131
- export OPENAI_BASE_URL=http://localhost:3456/v1
132
- export OPENAI_API_KEY=dario
133
- ```
108
+ **Not a fit if:**
134
109
 
135
- Opus, Sonnet, Haiku, GPT-4o, o1, o3, o4, plus anything the configured OpenAI-compat backend serves. Streaming, tool use, prompt caching, extended thinking. **Zero runtime dependencies.** Auto-launches under [Bun](https://bun.sh) when available for TLS fingerprint fidelity with Claude Code's runtime on the Claude path.
110
+ - You need vendor-managed production SLAs on every request. Use the provider APIs directly.
111
+ - You need a hosted multi-tenant routing platform with a dashboard. Try [askalf](https://askalf.org) — different product, same family.
112
+ - You want a chat UI. Use claude.ai or chatgpt.com.
136
113
 
137
114
  ---
138
115
 
139
116
  ## Backends
140
117
 
141
- Dario's routing is organized around **backends**, each with its own auth and its own target. Backends are swappable adapters — add one, your tools reach it at `localhost:3456` with whatever API shape they already speak. v3.6.0 ships two backends, with more coming.
118
+ Dario's routing is organized around **backends**, each with its own auth and its own target. Backends are swappable adapters — add one, your tools reach it at `localhost:3456` with whatever API shape they already speak.
142
119
 
143
120
  ### 1. Claude subscription backend (built in)
144
121
 
@@ -146,9 +123,10 @@ OAuth-backed Claude Max / Pro, billed against your plan instead of the API. Acti
146
123
 
147
124
  **What it does:**
148
125
 
149
- - Every request is replaced with a Claude Code template before it goes upstream — 25 tool definitions, 25KB system prompt, exact CC field order, exact beta headers, exact metadata structure. Only the conversation content is preserved. Anthropic's classifier sees what looks like a Claude Code session because, from the wire up, it *is* one — and that's what keeps your usage on subscription billing instead of Extra Usage.
126
+ - Every request is replaced with a Claude Code template before it goes upstream — 25 tool definitions, ~25KB system prompt, exact CC field order, exact beta headers, exact metadata structure. Only the conversation content is preserved. Anthropic's classifier sees what looks like a Claude Code session because, from the wire up, it *is* one — and that's what keeps your usage on subscription billing instead of Extra Usage.
127
+ - **Live fingerprint extraction** (v3.11.0). Dario spawns your installed `claude` binary against a loopback MITM endpoint on startup, captures its outbound request, and extracts the live template (system prompt, tools, user-agent, beta flags, and as of v3.13.0 the exact header insertion order). Eliminates the "Anthropic ships a new CC, dario is stale for 48 hours" window. Cached at `~/.dario/cc-template.live.json` with a 24h TTL. Falls back to the bundled snapshot if CC isn't installed.
150
128
  - **Billing tag** reconstructed using CC's own algorithm: `x-anthropic-billing-header: cc_version=<version>.<build_tag>; cc_entrypoint=cli; cch=<5-char-hex>;` where `build_tag = SHA-256(seed + chars[4,7,20] of user message + version).slice(0,3)`.
151
- - **OAuth config** auto-detected from the installed CC binary at startup. When Anthropic rotates `client_id`, authorize URL, or scopes, dario picks up the new values on the next run without needing a release.
129
+ - **OAuth config auto-detection** from the installed CC binary. When Anthropic rotates `client_id`, authorize URL, or scopes, dario picks up the new values on the next run without needing a release.
152
130
  - **Multi-account pool mode** — see below. Automatic when 2+ accounts are configured.
153
131
  - **Framework scrubbing** — known fingerprint tokens (`OpenClaw`, `sessions_*` prefixes, orchestration tags) stripped from system prompt and message content before the request leaves your machine.
154
132
  - **Bun auto-relaunch** — when Bun is installed, dario relaunches under it so the TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js.
@@ -157,7 +135,7 @@ OAuth-backed Claude Max / Pro, billed against your plan instead of the API. Acti
157
135
 
158
136
  **Detection scope.** The Claude backend is a per-request layer. Template replay and scrubbing are designed to be indistinguishable from Claude Code at the request level. What they *cannot* defend against is Anthropic's session-level behavioral classifier, which operates on cumulative per-OAuth aggregates (token throughput, conversation depth, streaming duration, inter-arrival timing). The practical answer to that is **pool mode** — distributing load across multiple subscriptions so no one account accumulates enough signal to trip anything. See the [FAQ entry](#faq) for the full mechanism.
159
137
 
160
- ### 2. OpenAI-compat backend (v3.6.0+)
138
+ ### 2. OpenAI-compat backend
161
139
 
162
140
  Any provider that speaks the OpenAI Chat Completions API. Activated by:
163
141
 
@@ -185,19 +163,15 @@ Credentials live at `~/.dario/backends/<name>.json` with mode `0600`.
185
163
  | `claude-*` (or `opus` / `sonnet` / `haiku`) | Claude subscription backend |
186
164
  | Anything else | Claude backend with OpenAI-compat translation |
187
165
 
188
- Dario's passthrough for the OpenAI-compat backend is literal: client request body goes upstream as-is, only the `Authorization` header is swapped for the configured API key and the URL is pointed at `baseUrl + /chat/completions`. Response body streams back unchanged.
189
-
190
- ### Coming in a follow-up
166
+ Passthrough for the OpenAI-compat backend is literal: client request body goes upstream as-is, only the `Authorization` header is swapped and the URL is pointed at `baseUrl + /chat/completions`. Response body streams back unchanged.
191
167
 
192
- - **Anthropic OpenAI request translation** for `/v1/messages` requests with GPT-family model names (tool_use format, streaming delta conversion).
193
- - **Multiple simultaneous openai-compat backends** with per-model routing rules (`gpt-*` → OpenAI, `llama-*` → Groq, `mixtral-*` → OpenRouter).
194
- - **Fallback rules.** "If Claude 429s, use Gemini." v3.6.0 ships the routing plumbing; fallback logic layers on top.
168
+ You can also force the backend with a **provider prefix** on the model field (`openai:gpt-4o`, `groq:llama-3.3-70b`, `claude:opus`, `local:qwen-coder`) regardless of what the model name looks like. See [Provider prefix](#provider-prefix).
195
169
 
196
170
  ---
197
171
 
198
- ## Multi-Account Pool Mode
172
+ ## Multi-account pool mode
199
173
 
200
- *New in v3.5.0, for the Claude subscription backend.* Dario can manage multiple Claude subscriptions and route each request to the account with the most headroom. Single-account Claude dario is unchanged — pool mode activates **only** when `~/.dario/accounts/` contains 2+ accounts.
174
+ Dario can manage multiple Claude subscriptions and route each request to the account with the most headroom. Single-account dario is unchanged — pool mode activates **only** when `~/.dario/accounts/` contains 2+ accounts.
201
175
 
202
176
  ```bash
203
177
  dario accounts add work
@@ -213,18 +187,82 @@ Each request picks the account with the highest headroom:
213
187
  headroom = 1 - max(util_5h, util_7d)
214
188
  ```
215
189
 
216
- The response's `anthropic-ratelimit-unified-*` headers are parsed back into the pool so the next selection sees fresh utilization. An account that returns a 429 is marked `rejected` and routed around until its window resets. When every account is exhausted, requests queue for up to 60 seconds waiting for headroom to reappear.
190
+ The response's `anthropic-ratelimit-unified-*` headers are parsed back into the pool so the next selection sees fresh utilization. An account that returns a 429 is marked `rejected` and routed around until its window resets. When every account is exhausted, requests queue for up to 60 seconds waiting for headroom to reappear. Accounts can mix plans — Max and Pro accounts can sit in the same pool; dario doesn't care about tier, only headroom.
191
+
192
+ ### Session stickiness (v3.13.0)
193
+
194
+ Multi-turn agent sessions now pin to one account for the life of the conversation, so the Anthropic prompt cache isn't destroyed by account rotation between turns.
195
+
196
+ **The problem.** Claude Max prompt cache is scoped to `{account × cache_control key}`. When the pool rotates a long agent conversation across accounts on headroom alone, turn 1 builds a cache entry on account A, turn 2 lands on account B and reads nothing from A's cache — paying full cache-create cost again. For a long agent session that's a 5–10× token cost multiplier on the cache-reused portion of every turn after the first.
217
197
 
218
- Accounts can mix plans Max and Pro accounts can sit in the same pool; dario doesn't care about tier, only headroom.
198
+ **The fix.** Dario hashes a conversation's first user message into a 16-hex-char `stickyKey` (SHA-256 truncated, deterministic) and binds the key to whichever account `select()` would have picked on turn 1. Subsequent turns re-use that account as long as it's still healthy (not rejected, token not near expiry, headroom > 2%). On 429 failover, dario rebinds the key to the new account so the next turn doesn't re-select the exhausted one. 6h TTL, 2,000-entry cap, lazy cleanup. No client cooperation required — it works through the normal proxy path with nothing to configure.
219
199
 
220
- **Pool inspection endpoints:**
200
+ ### In-flight 429 failover (v3.8.0+)
201
+
202
+ When a Claude request hits a 429 mid-flight, dario retries the *same request* against a different account before the client ever sees an error. The client sees one successful response; the pool sees the rejected account go cold until its window resets. Combined with session stickiness, this means long agent runs survive pool-level exhaustion without dropping user-facing turns.
203
+
204
+ ### Inspection
221
205
 
222
206
  ```bash
223
- curl http://localhost:3456/accounts # per-account utilization, claim, status
207
+ curl http://localhost:3456/accounts # per-account utilization, claim, sticky bindings, status
224
208
  curl http://localhost:3456/analytics # per-account / per-model stats, burn rate, exhaustion predictions
225
209
  ```
226
210
 
227
- **Scope.** v3.5.0 ships headroom-aware selection *across* requests — a 429 on one request marks the account rejected and the next request goes to a different one. Retrying a single in-flight request against a different account when that request 429s (inside-request failover) ships in v3.5.1 along with analytics recording wiring.
211
+ ---
212
+
213
+ ## Sealed-sender overflow protocol (v3.13.0)
214
+
215
+ Trust-group members can lend each other Claude capacity with **cryptographic unlinkability**: a lender can verify the borrower is a valid group member without learning *which* member, so no one in the pool can surveil another through borrow telemetry.
216
+
217
+ **The primitive.** RSA blind signatures (Chaum 1983), implemented from scratch on top of Node's `crypto` module using `RSA_NO_PADDING` for raw `m^e mod n` / `c^d mod n` primitives. Full-Domain Hash via MGF1-SHA256 (with counter retry) prevents multiplicative forgery. The flow: the group admin signs *blinded* tokens in a batch without seeing their real values; the member unblinds locally to obtain valid RSA-FDH signatures on random tokens the admin has never seen and can never correlate back to the member. When a member spends a token with a lender, the lender verifies the signature with the group public key — it proves "some member got this signed" without identifying who.
218
+
219
+ **What this is, and what it isn't.** This is **privacy between group members**, not anonymity from Anthropic. When a lender accepts a borrow, the actual upstream request still lands under the lender's attributable Claude account identity — Anthropic sees the lender as the originator, exactly as they would for any other request on that account. The cryptographic unlinkability protects group members from each other: no member can correlate borrow telemetry back to another member's identity.
220
+
221
+ **What's in v3.13.0:**
222
+
223
+ - `src/sealed-pool.ts` — ~550 lines. `GroupAdmin` / `GroupMember` / `GroupLender` classes with quota/expiry enforcement, SHA-256-hashed double-spend set, JSON wire envelope (`{v:1, groupId, token, sig, request}`), and key export/import for distributing group credentials.
224
+ - `POST /v1/pool/borrow` endpoint on the proxy, gated on `~/.dario/group.json`. Positioned before `checkAuth` — the group signature *is* the authentication, so doubling it with a local API key would add nothing. Verified borrows delegate to `pool.select()` and forward upstream under the lender's account.
225
+ - 57 test assertions covering raw RSA roundtrip, unlinkability, wrong-key / tampered-sig / wrong-group / double-spend rejection, key export/import, admin membership / quota / expiry enforcement, concurrent-borrow double-spend prevention, and end-to-end two-member unlinkability.
226
+
227
+ Full feature-parity with `/v1/messages` (streaming, inside-request 429 failover, reverse tool mapping) for borrowed requests is intentionally a follow-up — v3.13.0 ships the cryptographic primitive and a working minimal endpoint; full integration layers on top.
228
+
229
+ ---
230
+
231
+ ## Shim mode
232
+
233
+ *Experimental, opt-in. The default path is still the HTTP proxy — shim mode is a second transport, not a replacement.*
234
+
235
+ Shim mode runs a child process with an **in-process `globalThis.fetch` patch** that rewrites the child's outbound requests to `api.anthropic.com/v1/messages` exactly the way the proxy would, then sends them directly from the child to Anthropic. No localhost HTTP hop. No port to bind. No `ANTHROPIC_BASE_URL` to set.
236
+
237
+ ```bash
238
+ dario shim -- claude --print "hello"
239
+ dario shim -v -- claude --print "hello" # verbose
240
+ ```
241
+
242
+ Under the hood: `dario shim` spawns the child with `NODE_OPTIONS=--require <dario-runtime.cjs>` and a unix socket / named pipe for telemetry. The runtime patches `globalThis.fetch` only for Anthropic messages requests, applies the same template replay the proxy does, and relays per-request events back to the parent so analytics still work. Every other fetch call in the child is untouched and failsafe-passes through on any internal error.
243
+
244
+ **Why it matters.** Anthropic can fingerprint a proxy via TLS, headers, IP, or `BASE_URL` env. They literally cannot easily detect a `globalThis.fetch` monkey-patch from inside their own process without shipping signed-binary integrity checks against `globalThis` — and even then, the shim runs *before* CC's code loads, so it could patch the integrity check too. This is the transport that has the longest half-life against classifier evolution.
245
+
246
+ **v3.13.0 hardening.** The shim got a significant robustness pass:
247
+
248
+ - **Runtime detection** — `detectRuntime()` checks `globalThis.Bun` / `globalThis.Deno` / `process.versions.node` and logs a warning for non-Node runtimes. Canary for the day Anthropic ships a Bun-compiled CC.
249
+ - **Template mtime-based auto-reload** — long-running child processes pick up mid-session fingerprint refreshes from dario's live capture without restart.
250
+ - **Strict defensive `rewriteBody`** — the previous logic accepted `length >= 1` on the system array and invented `[1]`/`[2]` blocks out of thin air. Now requires exactly `length === 3` with all-text blocks; any mismatch passes through unchanged. Passthrough on an unknown shape is safer than blind replacement.
251
+ - **`rewriteHeaders` honors captured header order** — the live fingerprint capture now records the exact order CC emits headers on the wire, and the shim replays that order on every outbound request. Header sequence alone is a fingerprint vector; v3.13.0 removes it.
252
+ - **`checkVersionDrift`** — logs when the child's UA `cc_version` differs from the template's, so stale-cache windows during CC upgrades are visible in debug output.
253
+
254
+ **When to use shim mode:**
255
+ - Running a single CC instance on a locked-down machine where binding a local port is inconvenient or forbidden.
256
+ - Wrapping one-off scripts (`dario shim -- node my-agent.js`) without setting up environment variables.
257
+ - Debugging a specific child process in isolation — verbose logs are scoped to that process.
258
+ - You suspect Anthropic is fingerprinting your proxy traffic and you want to take the proxy off the wire.
259
+
260
+ **When to stay on the proxy** (which is still the default):
261
+ - Multi-client routing. The proxy serves every tool on the machine through one endpoint; the shim wraps one child at a time.
262
+ - Multi-account pool mode. Pooling across subscriptions needs a shared OAuth pool the proxy owns — a shim patch inside one child can't see pool state across other processes.
263
+ - Anything that isn't a Node / Bun child. The shim relies on `NODE_OPTIONS`, so non-JS runtimes (Python SDK, a Go CLI) still need the proxy.
264
+
265
+ See the [v3.12.0 release notes](https://github.com/askalf/dario/releases/tag/v3.12.0) for the original design writeup and [v3.13.0](https://github.com/askalf/dario/releases/tag/v3.13.0) for the hardening notes.
228
266
 
229
267
  ---
230
268
 
@@ -243,7 +281,7 @@ curl http://localhost:3456/analytics # per-account / per-model stats, burn ra
243
281
  | `dario backend list` | List configured OpenAI-compat backends |
244
282
  | `dario backend add <name> --key=<key> [--base-url=<url>]` | Add an OpenAI-compat backend |
245
283
  | `dario backend remove <name>` | Remove an OpenAI-compat backend |
246
- | `dario shim -- <cmd> [args...]` | **Experimental (v3.12.0).** Run a child process with an in-process fetch patch that rewrites its outbound Anthropic requests — no HTTP proxy involved. See [Experimental: Shim mode](#experimental-shim-mode). |
284
+ | `dario shim -- <cmd> [args...]` | Run a child process with the in-process fetch patch (see [Shim mode](#shim-mode)) |
247
285
  | `dario help` | Full command reference |
248
286
 
249
287
  ### Proxy options
@@ -251,8 +289,8 @@ curl http://localhost:3456/analytics # per-account / per-model stats, burn ra
251
289
  | Flag / env | Description | Default |
252
290
  |---|---|---|
253
291
  | `--passthrough` / `--thin` | Thin proxy for the Claude backend — OAuth swap only, no template injection | off |
254
- | `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC's `Bash/Read/Grep/Glob/WebSearch/WebFetch`. Required for clients whose tools have fields CC doesn't (`sessionId`, custom ids, etc.) — see [Custom tool schemas](#custom-tool-schemas). Trade-off: drops the CC request fingerprint. | off |
255
- | `--hybrid-tools` / `--context-inject` | Remap to CC tools **and** inject request-context values (`sessionId`, `requestId`, `channelId`, `userId`, `timestamp`) into client-declared fields CC's schema doesn't carry. Preserves the CC fingerprint while keeping custom schemas functional — see [Hybrid tool mode](#hybrid-tool-mode). Mutually exclusive with `--preserve-tools`. | off |
292
+ | `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC's `Bash/Read/Grep/Glob/WebSearch/WebFetch`. Required for clients whose tools have fields CC doesn't — see [Custom tool schemas](#custom-tool-schemas). | off |
293
+ | `--hybrid-tools` / `--context-inject` | Remap to CC tools **and** inject request-context values (`sessionId`, `requestId`, `channelId`, `userId`, `timestamp`) into client-declared fields CC's schema doesn't carry. See [Hybrid tool mode](#hybrid-tool-mode). | off |
256
294
  | `--model=<name>` | Force a model. Shortcuts (`opus`, `sonnet`, `haiku`), full IDs (`claude-opus-4-6`), or a **provider prefix** (`openai:gpt-4o`, `groq:llama-3.3-70b`, `claude:opus`, `local:qwen-coder`) to force the backend server-wide. See [Provider prefix](#provider-prefix). | passthrough |
257
295
  | `--port=<n>` | Port to listen on | `3456` |
258
296
  | `--host=<addr>` / `DARIO_HOST` | Bind address. Use `0.0.0.0` for LAN, or a specific IP (e.g. a Tailscale interface). When non-loopback, also set `DARIO_API_KEY`. | `127.0.0.1` |
@@ -356,7 +394,7 @@ All supported. Claude backend: full Anthropic SSE format plus OpenAI-SSE transla
356
394
 
357
395
  ### Provider prefix
358
396
 
359
- Any request's `model` field can be written as `<provider>:<name>` to force which backend handles it, regardless of what the model name looks like. This is useful when regex-based routing (`gpt-*` → OpenAI, `claude-*` → Claude) doesn't match — for example when routing a `llama-3.3-70b` request through an OpenAI-compat backend, or when you want the same model name to go to different providers on different requests.
397
+ Any request's `model` field can be written as `<provider>:<name>` to force which backend handles it, regardless of what the model name looks like. Useful when regex-based routing (`gpt-*` → OpenAI, `claude-*` → Claude) doesn't match — for example when routing a `llama-3.3-70b` request through an OpenAI-compat backend, or when you want the same model name to go to different providers on different requests.
360
398
 
361
399
  Recognized prefixes:
362
400
 
@@ -370,45 +408,25 @@ Recognized prefixes:
370
408
  | `claude:` | Claude subscription backend |
371
409
  | `anthropic:` | Claude subscription backend |
372
410
 
373
- Examples:
374
-
375
- ```bash
376
- # Force openai backend
377
- curl http://localhost:3456/v1/chat/completions \
378
- -H "Authorization: Bearer dario" \
379
- -d '{"model":"openai:gpt-4o","messages":[{"role":"user","content":"hi"}]}'
380
-
381
- # Force a non-gpt model through the openai-compat backend (e.g. OpenRouter)
382
- curl http://localhost:3456/v1/chat/completions \
383
- -H "Authorization: Bearer dario" \
384
- -d '{"model":"openrouter:meta-llama/llama-3.1-70b-instruct","messages":[...]}'
385
-
386
- # Force Claude subscription backend — same as `opus` shortcut but explicit
387
- curl http://localhost:3456/v1/messages \
388
- -d '{"model":"claude:opus","max_tokens":1024,"messages":[...]}'
389
- ```
390
-
391
- The prefix gets stripped before the request goes upstream — the backend only sees the bare model name. Unrecognized prefixes are ignored, so ollama-style `llama3:8b` passes through untouched.
392
-
393
- **Server-wide override.** `dario proxy --model=openai:gpt-4o` applies the prefix to every request, regardless of what the client sends. Useful for "I want everything routed to this specific backend and model" without editing every tool's config.
411
+ The prefix gets stripped before the request goes upstream — the backend only sees the bare model name. Unrecognized prefixes are ignored, so ollama-style `llama3:8b` passes through untouched. `dario proxy --model=openai:gpt-4o` applies the prefix to every request server-wide.
394
412
 
395
413
  ### Custom tool schemas
396
414
 
397
415
  By default, on the Claude backend, dario replaces your client's tool definitions with the real Claude Code tools (`Bash`, `Read`, `Grep`, `Glob`, `WebSearch`, `WebFetch`) and translates parameters back and forth. That's how dario looks like CC on the wire, which is what lets your request bill against your Claude subscription instead of API pricing.
398
416
 
399
- The trade-off: if your client's tools carry fields CC's schema doesn't have — a `sessionId`, a custom request id, a channel-bound context token, anything — those fields don't survive the round trip. The model only ever sees `Bash({command})`, responds with `Bash({command})`, and dario's reverse map rebuilds your tool call without the fields the model never saw. Your validator then rejects the call for a missing required field.
417
+ The trade-off: if your client's tools carry fields CC's schema doesn't have — a `sessionId`, a custom request id, a channel-bound context token — those fields don't survive the round trip. The model only ever sees `Bash({command})`, responds with `Bash({command})`, and dario's reverse map rebuilds your tool call without the fields the model never saw. Your validator then rejects the call for a missing required field.
400
418
 
401
419
  Symptom: your tool calls come back looking stripped-down, or your runtime complains about a required field being absent *only when routed through dario's Claude backend*, while the same tools work fine against a direct API key or the OpenAI-compat backend.
402
420
 
403
- Fix: run dario with `--preserve-tools` (or `--keep-tools`). That skips the CC tool remap entirely, passes your client's tool definitions through to the model unchanged, and lets the model populate every field your schema expects.
421
+ Fix: run dario with `--preserve-tools`. That skips the CC tool remap entirely, passes your client's tool definitions through to the model unchanged, and lets the model populate every field your schema expects.
404
422
 
405
423
  ```bash
406
424
  dario proxy --preserve-tools
407
425
  ```
408
426
 
409
- The cost: requests no longer look like CC on the wire, so the CC subscription fingerprint is gone. On a Max/Pro plan, that means the request may be counted against your API usage rather than your subscription quota. If you're on API-key billing already, `--preserve-tools` is free; if you're using dario specifically to route against a subscription, the [hybrid tool mode](#hybrid-tool-mode) below is the compromise that keeps both.
427
+ The cost: requests no longer look like CC on the wire, so the CC subscription fingerprint is gone. On a Max/Pro plan, that means the request may be counted against your API usage rather than your subscription quota. If you're on API-key billing already, `--preserve-tools` is free; if you're using dario specifically to route against a subscription, [hybrid tool mode](#hybrid-tool-mode) below is the compromise that keeps both.
410
428
 
411
- The openai-compat backend (OpenRouter, OpenAI, Groq, local LiteLLM, etc.) is unaffected — it forwards tool definitions byte-for-byte and doesn't need this flag.
429
+ The openai-compat backend (OpenRouter, OpenAI, Groq, local LiteLLM) is unaffected — it forwards tool definitions byte-for-byte and doesn't need this flag.
412
430
 
413
431
  ### Hybrid tool mode
414
432
 
@@ -420,7 +438,7 @@ dario proxy --hybrid-tools
420
438
 
421
439
  **How it works.** On each request, dario builds a `RequestContext` from headers (`x-session-id`, `x-request-id`, `x-channel-id`, `x-user-id`) plus its own generated ids and the current timestamp. After `translateBack` produces the client-shaped tool call on the response path, any field declared on the client's tool schema whose name matches a known context field (`sessionId`/`session_id`, `requestId`/`request_id`, `channelId`/`channel_id`, `userId`/`user_id`, `timestamp`/`created_at`/`createdAt`) and isn't already populated gets filled from the context. Fields the model genuinely populated via `translateBack` are never overwritten.
422
440
 
423
- **When to use which flag.**
441
+ **When to use which flag:**
424
442
 
425
443
  | Your situation | Flag | Why |
426
444
  |---|---|---|
@@ -428,12 +446,6 @@ dario proxy --hybrid-tools
428
446
  | Your custom fields need the model's reasoning (e.g. `confidence`, `reasoning_trace`, `tool_selection_rationale`) | `--preserve-tools` | The model has to see the real schema to populate these. Accept the fingerprint loss. |
429
447
  | Your client's tools are already a subset of CC's `Bash/Read/Grep/Glob/WebSearch/WebFetch` | *(neither)* | Default mode works as-is. |
430
448
 
431
- **Limitations of hybrid mode.**
432
-
433
- - Top-level fields only. If your custom field is nested (e.g. `meta: {sessionId: ...}`), v1 doesn't reach into the nested object. Tracked in [#33](https://github.com/askalf/dario/issues/33).
434
- - The field-to-context mapping is a fixed list. If you need arbitrary fields (e.g. an internal `tenant_id`) pulled from headers, file an issue and we'll extend the map.
435
- - No type coercion beyond string. If your schema requires a numeric `sessionId`, dario sends the string it got from headers — override at your client level or use `--preserve-tools`.
436
-
437
449
  Hybrid mode was built to resolve [#29](https://github.com/askalf/dario/issues/29) cleanly for OpenClaw-style agents whose `process` tool declares `sessionId`, after the full provider-comparison diagnostic from [@boeingchoco](https://github.com/boeingchoco) made clear that the problem wasn't fixable in the translation layer alone.
438
450
 
439
451
  ### Library mode
@@ -455,69 +467,38 @@ curl http://localhost:3456/health
455
467
 
456
468
  ---
457
469
 
458
- ## Experimental: Shim mode
459
-
460
- *New in v3.12.0. Opt-in. The default path is still the HTTP proxy — shim mode is a second transport, not a replacement.*
461
-
462
- Shim mode runs a child process with an **in-process `globalThis.fetch` patch** that rewrites the child's outbound requests to `api.anthropic.com/v1/messages` exactly the way the proxy would, then sends them directly from the child to Anthropic. No localhost HTTP hop. No port to bind. No `ANTHROPIC_BASE_URL` to set.
463
-
464
- ```bash
465
- dario shim -- claude --print "hello"
466
- dario shim -v -- claude --print "hello" # verbose
467
- ```
468
-
469
- Under the hood: `dario shim` spawns the child with `NODE_OPTIONS=--require <dario-runtime.cjs>` and a unix socket / named pipe for telemetry. The runtime patches `globalThis.fetch` only for Anthropic messages requests, applies the same template replay the proxy does (system prompt, tools, user agent, beta flags), and relays per-request events back to the parent so analytics still work. Every other fetch call in the child is untouched and failsafe-passes through on any internal error.
470
-
471
- **When to use shim mode**
472
- - Running a single CC instance on a locked-down machine where binding a local port is inconvenient or forbidden.
473
- - Wrapping one-off scripts (`dario shim -- node my-agent.js`) without setting up environment variables.
474
- - Debugging a specific child process in isolation — verbose logs are scoped to that process.
475
-
476
- **When to stay on the proxy** (which is still the default)
477
- - Multi-client routing. The proxy serves every tool on the machine through one endpoint; the shim wraps one child at a time.
478
- - Multi-account pool mode. Pooling across subscriptions needs a shared OAuth pool the proxy owns — a shim patch inside one child can't see the pool state.
479
- - Anything that isn't a Node / Bun child. The shim relies on `NODE_OPTIONS`, so non-JS runtimes (Python SDK, a Go CLI) still need the proxy.
480
-
481
- Limitations at v3.12.0:
482
- - Bun child detection is partial — known-good with `claude --print` on Node.
483
- - No `--replace claude` global wrapper yet; you call `dario shim -- claude ...` explicitly.
484
- - Per-request token cost recording in shim mode is still being wired into analytics.
485
- - Windows named-pipe CI coverage is incomplete.
486
-
487
- The shim runtime lives at `src/shim/runtime.cjs` (hand-written CJS so `--require` can load it) and the host orchestrator at `src/shim/host.ts`. ~180 lines total. See the [v3.12.0 release notes](https://github.com/askalf/dario/releases/tag/v3.12.0) for the full design writeup.
488
-
489
- ---
490
-
491
470
  ## Endpoints
492
471
 
493
472
  | Path | Description |
494
473
  |---|---|
495
474
  | `POST /v1/messages` | Anthropic Messages API (Claude backend) |
496
475
  | `POST /v1/chat/completions` | OpenAI-compatible Chat API (routes by model name) |
476
+ | `POST /v1/pool/borrow` | Sealed-sender borrow endpoint (v3.13.0). Accepts group-signed tokens and forwards the request through the lender's pool. |
497
477
  | `GET /v1/models` | Model list (Claude models — OpenAI models come from the OpenAI backend directly) |
498
478
  | `GET /health` | Proxy health + OAuth status + request count |
499
479
  | `GET /status` | Detailed Claude OAuth token status |
500
- | `GET /accounts` | Pool snapshot (pool mode only) |
501
- | `GET /analytics` | Per-account / per-model stats, burn rate, exhaustion predictions. **v3.11.1+:** every request carries a `billingBucket` field (`five_hour` / `seven_day` / `overage` / `unknown`) so you can see, at a glance, which bucket each request billed against. (pool mode only) |
480
+ | `GET /accounts` | Pool snapshot including sticky binding count (pool mode only) |
481
+ | `GET /analytics` | Per-account / per-model stats, burn rate, exhaustion predictions. Every request carries a `billingBucket` field (`subscription` / `subscription_fallback` / `extra_usage` / `api` / `unknown`) so you can see which bucket each request billed against. |
502
482
 
503
483
  ---
504
484
 
505
- ## Trust & Transparency
485
+ ## Trust and transparency
506
486
 
507
487
  Dario handles your OAuth tokens and API keys locally. Here's why you can trust it:
508
488
 
509
489
  | Signal | Status |
510
490
  |---|---|
511
- | **Source code** | ~2,500 lines of TypeScript across 10 files — small enough to audit in one sitting |
491
+ | **Source code** | ~6,500 lines of TypeScript across ~15 files — small enough to audit in a weekend |
512
492
  | **Dependencies** | 0 runtime dependencies. Verify: `npm ls --production` |
513
- | **npm provenance** | Every release is [SLSA-attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions |
493
+ | **npm provenance** | Every release is [SLSA-attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions with sigstore provenance attached to the transparency log |
514
494
  | **Security scanning** | [CodeQL](https://github.com/askalf/dario/actions/workflows/codeql.yml) runs on every push and weekly |
495
+ | **Test footprint** | 376 assertions across 12 files. Full `npm test` green on every release |
515
496
  | **Credential handling** | Tokens and API keys never logged, redacted from errors, stored with `0600` permissions |
516
497
  | **OAuth flow** | PKCE (Proof Key for Code Exchange), no client secret |
517
498
  | **Network scope** | Binds to `127.0.0.1` by default. `--host` allows LAN/mesh with `DARIO_API_KEY` gating. Upstream traffic goes only to the configured backend target URLs over HTTPS |
518
499
  | **SSRF protection** | `/v1/messages` hits `api.anthropic.com` only; `/v1/chat/completions` hits the configured backend `baseUrl` only — hardcoded allowlist |
519
500
  | **Telemetry** | None. Zero analytics, tracking, or data collection |
520
- | **Audit trail** | [CHANGELOG.md](CHANGELOG.md) documents every release |
501
+ | **Audit trail** | [CHANGELOG.md](CHANGELOG.md) documents every release with file-level rationale |
521
502
 
522
503
  Verify the npm tarball matches this repo:
523
504
 
@@ -541,7 +522,7 @@ Claude Max and Claude Pro. Any plan that lets you use Claude Code.
541
522
  Should work if your plan includes Claude Code access. Not widely tested yet — open an issue with results.
542
523
 
543
524
  **Do I need Claude Code installed?**
544
- Recommended for the Claude backend, not strictly required. With CC installed, `dario login` picks up your credentials automatically. Without CC, dario runs its own OAuth flow against Anthropic's authorize endpoint.
525
+ Recommended for the Claude backend, not strictly required. With CC installed, `dario login` picks up your credentials automatically, and the live fingerprint extractor reads your CC binary on every startup so the template stays current. Without CC, dario runs its own OAuth flow and falls back to the bundled template snapshot.
545
526
 
546
527
  **Do I need Bun?**
547
528
  Optional, recommended for Claude-backend requests. Dario auto-relaunches under Bun when available so the TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js and works fine; the TLS fingerprint is the only difference.
@@ -555,13 +536,13 @@ claude --print "hello"
555
536
  This establishes a session baseline. Without priming, brand-new accounts occasionally see billing classification issues on first use.
556
537
 
557
538
  **What happens when Anthropic rotates the OAuth config?**
558
- Dario auto-detects OAuth config from the installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next run. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by the CC binary fingerprint. Falls back to hardcoded CC 2.1.104 prod values if CC isn't installed.
539
+ Dario auto-detects OAuth config from the installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next run. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by the CC binary fingerprint.
559
540
 
560
541
  **What happens when Anthropic changes the CC request template?**
561
- *New in v3.11.0.* Dario extracts the live request template from your installed Claude Code binary on startup — the system prompt slices, tool schemas, user-agent, beta flags — and uses those to replay requests instead of a version pinned into dario itself. When CC ships a new version with a tweaked template, the next `dario proxy` run picks it up automatically. Fallback: the hand-curated `src/cc-template-data.json` bundled with the release, so dario still works even if the installed CC binary is a version the extractor doesn't know how to read. See `src/live-fingerprint.ts`.
542
+ Dario extracts the live request template from your installed Claude Code binary on startup — the system prompt, tool schemas, user-agent, beta flags, and as of v3.13.0 the exact header insertion order — and uses those to replay requests instead of a version pinned into dario itself. When CC ships a new version with a tweaked template, the next `dario proxy` run picks it up automatically. Fallback: the hand-curated `src/cc-template-data.json` bundled with the release.
562
543
 
563
544
  **I'm hitting rate limits on the Claude backend. What do I do?**
564
- Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
545
+ Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`. As of v3.13.0, pool mode also keeps long conversations pinned to one account so the Anthropic prompt cache isn't destroyed by rotation.
565
546
 
566
547
  **I'm seeing `representative-claim: seven_day` in my rate-limit headers instead of `five_hour`. Am I being downgraded to API billing?**
567
548
 
@@ -569,10 +550,10 @@ Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilizat
569
550
 
570
551
  Here's the full picture. Every Claude Max and Pro subscription has **two rolling usage windows**:
571
552
 
572
- - **5-hour window** — your short-term usage bucket. Refreshes on a rolling 5-hour schedule. It's the one you'll see most of the time if you use Claude casually.
573
- - **7-day window** — your longer-term usage bucket. Refreshes on a rolling 7-day schedule. It's intentionally larger than the 5-hour one so you can keep working past brief bursts of heavy usage.
553
+ - **5-hour window** — your short-term usage bucket. Refreshes on a rolling 5-hour schedule.
554
+ - **7-day window** — your longer-term usage bucket. Refreshes on a rolling 7-day schedule. Intentionally larger than the 5-hour one so you can keep working past brief bursts of heavy usage.
574
555
 
575
- When Anthropic bills a request, it decides which bucket to charge it against based on your current utilization. That decision comes back to you in the `anthropic-ratelimit-unified-representative-claim` response header:
556
+ When Anthropic bills a request, it decides which bucket to charge it against based on your current utilization. That decision comes back in the `anthropic-ratelimit-unified-representative-claim` response header:
576
557
 
577
558
  | Claim | What it means |
578
559
  |---|---|
@@ -580,36 +561,27 @@ When Anthropic bills a request, it decides which bucket to charge it against bas
580
561
  | `seven_day` | You've exhausted (or come close to exhausting) the 5-hour window for this rolling cycle, so Anthropic is now charging this request against the 7-day bucket. **Still subscription billing. Still your plan.** Not API pricing, not overage. |
581
562
  | `overage` | Both subscription windows are effectively exhausted. *This* is where per-token Extra Usage charges kick in — if you've enabled Extra Usage on the account. If you haven't, you get 429'd instead. |
582
563
 
583
- **Seeing `seven_day` is a healthy state.** It means your Max/Pro plan is doing exactly what it's supposed to do: letting you keep working past short bursts of heavy use by absorbing them into the larger 7-day bucket. Your subscription is not being "downgraded." You're not being charged API rates. Nothing has reclassified you to a worse billing tier. When your 5-hour window rolls forward enough, the claim on new requests will go back to `five_hour` on its own.
584
-
585
- **What about `overage`?** That's the state to watch. It means both windows are saturated and Anthropic is either billing you per-token under Extra Usage (if enabled) or refusing the request (if disabled). If you see this on a Claude Max account under normal use, it usually means (a) you're running a multi-agent workload that's genuinely outgrowing one subscription, or (b) Anthropic's session-level classifier has reclassified your long-running OAuth session as agentic load — see the next FAQ entry for the mechanism.
586
-
587
- **Checking where you stand.** You can inspect your current utilization three ways:
588
- 1. **Claude Code's built-in command** — run `/usage` inside a `claude` session. Shows both windows as percentages with reset times.
589
- 2. **The statusline** — see [Claude Code's statusline docs](https://code.claude.com/docs/en/statusline) for a per-prompt readout.
590
- 3. **Dario's pool endpoint** — `curl http://localhost:3456/accounts` when running pool mode. The returned snapshot includes `util5h`, `util7d`, and `claim` per account.
591
-
592
- **Practical answer if `seven_day` is painful for your workload.** Add more Claude subscriptions to the pool. Each account has its own independent 5-hour and 7-day windows, and dario pool mode will route each request to the account with the most headroom (`1 - max(util5h, util7d)`). With 2-3 accounts, you almost never see the `seven_day` bucket get touched because the router steers traffic to whichever account still has `five_hour` headroom. `dario accounts add <alias>`.
564
+ **Seeing `seven_day` is a healthy state.** Your Max/Pro plan is doing exactly what it's supposed to do: letting you keep working past short bursts of heavy use by absorbing them into the larger 7-day bucket. Your subscription is not being "downgraded." When your 5-hour window rolls forward enough, the claim on new requests will go back to `five_hour` on its own.
593
565
 
594
- **Dario's test suite asserts `five_hour` — what if I see failures saying `got: seven_day`?** Some of dario's stealth-test assertions use `representative-claim == "five_hour"` as a shorthand for "is subscription billing classification working?" That assertion is correct for a fresh account but noisy for an account that's been developed against heavily exactly the situation our own CI hits after an afternoon of test runs. If you're running the stealth suite against an account that's been busy recently and you see failures of the form `Billing claim is five_hour` / `got: seven_day`, that's a test infrastructure limitation, not a dario bug. The request was still billed against your subscription, which is what matters. These assertions will be tightened in a follow-up so they accept both buckets.
566
+ **Practical answer if `seven_day` is painful for your workload.** Add more Claude subscriptions to the pool. Each account has its own independent 5-hour and 7-day windows, and pool mode routes each request to the account with the most headroom. With 2-3 accounts, you almost never see the `seven_day` bucket get touched. `dario accounts add <alias>`.
595
567
 
596
- Standalone writeup with more detail: [Discussion #32 — why you see `representative-claim: seven_day` and why it's not a downgrade](https://github.com/askalf/dario/discussions/32).
568
+ Standalone writeup: [Discussion #32 — why you see `representative-claim: seven_day` and why it's not a downgrade](https://github.com/askalf/dario/discussions/32).
597
569
 
598
570
  **My multi-agent workload is getting reclassified to overage even though dario template-replays per request. Why?**
599
- Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates — token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the v3.4.3/v3.4.5 hardening that landed as a result. The practical answer at the dario layer is **pool mode** — distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-Account Pool Mode](#multi-account-pool-mode).
571
+ Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates — token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23). The practical answer at the dario layer is **pool mode** — distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-account pool mode](#multi-account-pool-mode).
600
572
 
601
573
  **Can I route non-OpenAI providers through dario?**
602
- Yes — anything that speaks the OpenAI Chat Completions API. `dario backend add groq --key=... --base-url=https://api.groq.com/openai/v1`, `dario backend add openrouter --key=... --base-url=https://openrouter.ai/api/v1`, or point at a local LiteLLM / vLLM / Ollama-openai-compat server with `--base-url=http://localhost:4000/v1`. v3.6.0 supports one active OpenAI-compat backend at a time; per-model routing to multiple OpenAI-compat backends ships in a follow-up.
574
+ Yes — anything that speaks the OpenAI Chat Completions API. Groq, OpenRouter, LiteLLM, vLLM, Ollama's openai-compat mode. Just `dario backend add <name> --key=... --base-url=...`.
603
575
 
604
576
  **Does dario work with only the OpenAI backend, no Claude subscription?**
605
- Yes. Don't run `dario login`, just run `dario backend add openai --key=...` and `dario proxy`. Claude-backend requests will return an authentication error; OpenAI-compat requests will work normally. Dario becomes a local OpenAI-compat shim with no Claude involvement.
577
+ Yes. Skip `dario login`, just run `dario backend add openai --key=...` and `dario proxy`. Claude-backend requests will return an authentication error; OpenAI-compat requests will work normally. Dario becomes a local OpenAI-compat shim with no Claude involvement.
606
578
 
607
579
  **Why "dario"?**
608
580
  It's a name, not an acronym. Don't overthink it.
609
581
 
610
582
  ---
611
583
 
612
- ## Technical Deep Dives
584
+ ## Technical deep dives
613
585
 
614
586
  Longer-form writing on how dario works and why it works that way:
615
587
 
@@ -623,22 +595,23 @@ Longer-form writing on how dario works and why it works that way:
623
595
 
624
596
  ## Contributing
625
597
 
626
- PRs welcome. The codebase is small TypeScript — around ~3,000 lines across ~14 files:
598
+ PRs welcome. The codebase is small TypeScript — around ~6,500 lines across ~15 files:
627
599
 
628
600
  | File | Purpose |
629
601
  |---|---|
630
602
  | `src/proxy.ts` | HTTP proxy server, request handler, rate governor, Claude backend dispatch |
631
- | `src/cc-template.ts` | CC request template engine, tool mapping, orchestration & framework scrubbing |
603
+ | `src/cc-template.ts` | CC request template engine, tool mapping, orchestration and framework scrubbing |
632
604
  | `src/cc-template-data.json` | Bundled fallback CC request template (used when live-fingerprint extraction isn't possible) |
633
605
  | `src/cc-oauth-detect.ts` | OAuth config auto-detection from the installed CC binary |
634
- | `src/live-fingerprint.ts` | **v3.11.0.** Live extraction of the CC request template (system prompt, tools, user-agent, beta flags) from the installed Claude Code binary |
606
+ | `src/live-fingerprint.ts` | Live extraction of the CC request template (system prompt, tools, user-agent, beta flags, header order) from the installed Claude Code binary |
635
607
  | `src/oauth.ts` | Single-account token storage, PKCE flow, auto-refresh |
636
608
  | `src/accounts.ts` | Multi-account credential storage and independent OAuth lifecycle |
637
- | `src/pool.ts` | Account pool, headroom-aware routing, failover target selection |
609
+ | `src/pool.ts` | Account pool, headroom-aware routing, session stickiness, failover target selection |
610
+ | `src/sealed-pool.ts` | **v3.13.0.** Sealed-sender overflow protocol — RSA blind signatures for unlinkable group pooling |
638
611
  | `src/analytics.ts` | Rolling request history, per-account / per-model stats, burn-rate, billing bucket classification |
639
612
  | `src/openai-backend.ts` | OpenAI-compat backend credential storage and request forwarder |
640
- | `src/shim/runtime.cjs` | **v3.12.0.** Hand-written CJS payload loaded into child processes via `NODE_OPTIONS=--require`; patches `globalThis.fetch` for Anthropic messages requests only |
641
- | `src/shim/host.ts` | **v3.12.0.** Parent-side orchestrator for `dario shim` — spawns the child, owns the telemetry socket / named pipe, feeds analytics |
613
+ | `src/shim/runtime.cjs` | Hand-written CJS payload loaded into child processes via `NODE_OPTIONS=--require`; patches `globalThis.fetch` for Anthropic messages requests only |
614
+ | `src/shim/host.ts` | Parent-side orchestrator for `dario shim` — spawns the child, owns the telemetry socket / named pipe, feeds analytics |
642
615
  | `src/cli.ts` | CLI entry point, command routing, Bun auto-relaunch |
643
616
  | `src/index.ts` | Library exports |
644
617
 
@@ -647,6 +620,7 @@ git clone https://github.com/askalf/dario
647
620
  cd dario
648
621
  npm install
649
622
  npm run dev # runs with tsx, no build step
623
+ npm test # 376 assertions across 12 suites
650
624
  ```
651
625
 
652
626
  ---
@@ -659,7 +633,7 @@ npm run dev # runs with tsx, no build step
659
633
  | [@belangertrading](https://github.com/belangertrading) | Billing classification investigation ([#4](https://github.com/askalf/dario/issues/4)), cache_control fingerprinting ([#6](https://github.com/askalf/dario/issues/6)), billing reclassification root cause ([#7](https://github.com/askalf/dario/issues/7)), OAuth client_id discovery ([#12](https://github.com/askalf/dario/issues/12)), multi-agent session-level billing analysis ([#23](https://github.com/askalf/dario/issues/23)) |
660
634
  | [@nathan-widjaja](https://github.com/nathan-widjaja) | README positioning rewrite structure ([#21](https://github.com/askalf/dario/issues/21)) |
661
635
  | [@iNicholasBE](https://github.com/iNicholasBE) | macOS keychain credential detection ([#30](https://github.com/askalf/dario/pull/30)) |
662
- | [@boeingchoco](https://github.com/boeingchoco) | Reverse-direction tool parameter translation ([#29](https://github.com/askalf/dario/issues/29)), SSE event-group framing regression catch (v3.7.1), provider-comparison diagnostic that surfaced the `--preserve-tools` discoverability gap (v3.8.1), and the motivating case for hybrid tool mode ([#33](https://github.com/askalf/dario/issues/33), v3.9.0) |
636
+ | [@boeingchoco](https://github.com/boeingchoco) | Reverse-direction tool parameter translation ([#29](https://github.com/askalf/dario/issues/29)), SSE event-group framing regression catch (v3.7.1), provider-comparison diagnostic that surfaced the `--preserve-tools` discoverability gap (v3.8.1), motivating case for hybrid tool mode ([#33](https://github.com/askalf/dario/issues/33), v3.9.0) |
663
637
 
664
638
  ---
665
639
 
@@ -106,33 +106,33 @@ const TOOL_MAP = {
106
106
  // but default mode relies on these output names being the common case.
107
107
  bash: {
108
108
  ccTool: 'Bash',
109
- translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '' }),
110
- translateBack: (a) => ({ command: a.command ?? '' }),
109
+ translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '', ...(a.description ? { description: a.description } : {}) }),
110
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
111
111
  },
112
112
  exec: {
113
113
  ccTool: 'Bash',
114
- translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '' }),
115
- translateBack: (a) => ({ command: a.command ?? '' }),
114
+ translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '', ...(a.description ? { description: a.description } : {}) }),
115
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
116
116
  },
117
117
  shell: {
118
118
  ccTool: 'Bash',
119
- translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '' }),
120
- translateBack: (a) => ({ command: a.command ?? '' }),
119
+ translateArgs: (a) => ({ command: a.cmd || a.command || a.c || '', ...(a.description ? { description: a.description } : {}) }),
120
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
121
121
  },
122
122
  run: {
123
123
  ccTool: 'Bash',
124
- translateArgs: (a) => ({ command: a.cmd || a.command || '' }),
125
- translateBack: (a) => ({ command: a.command ?? '' }),
124
+ translateArgs: (a) => ({ command: a.cmd || a.command || '', ...(a.description ? { description: a.description } : {}) }),
125
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
126
126
  },
127
127
  command: {
128
128
  ccTool: 'Bash',
129
- translateArgs: (a) => ({ command: a.cmd || a.command || '' }),
130
- translateBack: (a) => ({ command: a.command ?? '' }),
129
+ translateArgs: (a) => ({ command: a.cmd || a.command || '', ...(a.description ? { description: a.description } : {}) }),
130
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
131
131
  },
132
132
  terminal: {
133
133
  ccTool: 'Bash',
134
- translateArgs: (a) => ({ command: a.cmd || a.command || '' }),
135
- translateBack: (a) => ({ command: a.command ?? '' }),
134
+ translateArgs: (a) => ({ command: a.cmd || a.command || '', ...(a.description ? { description: a.description } : {}) }),
135
+ translateBack: (a) => ({ command: a.command ?? '', ...(a.description ? { description: a.description } : { description: a.command ?? '' }) }),
136
136
  },
137
137
  // `process` is OpenClaw's session-manager tool — it's an action-discriminator
138
138
  // shape {action: "list"|"poll"|"log"|..., sessionId?, ...}. Flattening it onto
@@ -192,13 +192,13 @@ const TOOL_MAP = {
192
192
  grep: { ccTool: 'Grep' },
193
193
  search: {
194
194
  ccTool: 'Grep',
195
- translateArgs: (a) => ({ pattern: a.query || a.pattern || '' }),
196
- translateBack: (a) => ({ query: a.pattern ?? '' }),
195
+ translateArgs: (a) => ({ pattern: a.query || a.pattern || '', ...(a.path ? { path: a.path } : {}) }),
196
+ translateBack: (a) => ({ query: a.pattern ?? '', pattern: a.pattern ?? '', path: a.path ?? '.' }),
197
197
  },
198
198
  search_files: {
199
199
  ccTool: 'Grep',
200
- translateArgs: (a) => ({ pattern: a.query || a.pattern || '' }),
201
- translateBack: (a) => ({ query: a.pattern ?? '' }),
200
+ translateArgs: (a) => ({ pattern: a.query || a.pattern || a.regex || '', ...(a.path ? { path: a.path } : {}), ...(a.filePattern ? { glob: a.filePattern } : {}) }),
201
+ translateBack: (a) => ({ query: a.pattern ?? '', pattern: a.pattern ?? '', regex: a.pattern ?? '', path: a.path ?? '.', filePattern: a.glob ?? '' }),
202
202
  },
203
203
  web_search: {
204
204
  ccTool: 'WebSearch',
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@askalf/dario",
3
- "version": "3.13.1",
3
+ "version": "3.13.3",
4
4
  "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
5
5
  "type": "module",
6
6
  "bin": {