@askalf/dario 3.6.0 → 3.7.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  <p align="center">
2
2
  <h1 align="center">dario</h1>
3
- <p align="center"><strong>Use your Claude subscription inside your local tools and scripts without moving to API billing or rebuilding your workflow around a separate stack.</strong></p>
3
+ <p align="center"><strong>A local LLM router. One endpoint on your machine, every provider behind it, your tools don't need to change.</strong></p>
4
4
  </p>
5
5
 
6
6
  <p align="center">
@@ -14,9 +14,8 @@
14
14
  <p align="center">
15
15
  <a href="#quick-start">Quick Start</a> &bull;
16
16
  <a href="#who-this-is-for">Who it's for</a> &bull;
17
+ <a href="#backends">Backends</a> &bull;
17
18
  <a href="#why-switch">Why switch</a> &bull;
18
- <a href="#how-it-works">How it works</a> &bull;
19
- <a href="#from-standalone-to-askalf">Standalone → askalf</a> &bull;
20
19
  <a href="#trust--transparency">Trust</a> &bull;
21
20
  <a href="#faq">FAQ</a>
22
21
  </p>
@@ -25,191 +24,137 @@
25
24
 
26
25
  ## What it is
27
26
 
28
- dario is a local process that turns your Claude Max or Pro subscription into an API endpoint that any tool on your machine can use. It talks to Anthropic the same way Claude Code does, so your subscription's rate limits are what you spend against, not a separate API billing tier.
27
+ Dario runs on your machine and gives every tool you use one local URL that reaches **every LLM you use.** Point Cursor, Continue, Aider, LiteLLM, or your own scripts (anything that speaks the Anthropic or OpenAI API) at `http://localhost:3456`, and dario routes each request to the right backend:
29
28
 
30
- Install it once, log in once (using your existing Claude Code credentials if you have them), and from that point on every tool that speaks the Anthropic API or the OpenAI chat completions API (Cursor, Continue, Aider, LiteLLM, your own scripts, whatever) can reach Claude through `http://localhost:3456`.
29
+ - **Claude Max / Pro subscriptions**: OAuth-backed, billed against your plan instead of API pricing. Multi-account pooling if you have more than one.
30
+ - **OpenAI**: your API key, routed straight through to `api.openai.com`.
31
+ - **Any OpenAI-compat endpoint**: OpenRouter, Groq, a local LiteLLM, Ollama's openai-compat mode, self-hosted vLLM. Set the backend's `baseUrl` once, done.
31
32
 
32
- **Single-account mode is the default.** You don't need an account anywhere, you don't need to wait for anything, and nothing phones home. Install and run.
33
+ Your tool sees one base URL. `gpt-4` goes to OpenAI. `claude-opus-4-6` goes to your Claude subscription. `llama-3-70b` goes to Groq. None of your tools have to know about any of it.
33
34
 
34
- **Pool mode** (new in v3.5.0) lifts multi-account routing into dario itself. Add two or more Claude subscriptions with `dario accounts add`, and dario starts selecting per request by the account with the most headroom, marking exhausted accounts rejected until they reset. No hosted platform required — you run the pool on your machine, against your own subscriptions. See [Multi-Account Pool Mode](#multi-account-pool-mode) below.
35
-
36
- **Multi-provider routing** (new in v3.6.0) lets dario speak to more than just Claude. Configure an OpenAI-compat backend (OpenAI, OpenRouter, Groq, local LiteLLM, Ollama's openai-compat mode) with `dario backend add openai --key=... [--base-url=...]`, and GPT-family model names at `/v1/chat/completions` route to that backend while Claude model names keep flowing through the Claude subscription path. Point your tool at `http://localhost:3456` once and use any model from any provider. See [Multi-Provider Routing](#multi-provider-routing) below.
37
-
38
- Separately, [askalf](https://askalf.org) is the hosted platform that does the things a local proxy on your machine can't — browser and desktop control, scheduling, persistent memory, 24/7 hosted fleets. Different problem, different tool. Dario does not depend on askalf, and askalf is not required to use any dario feature.
35
+ **No account anywhere is required.** Single-backend Claude dario works with nothing but `dario login`. Multi-backend dario works with nothing but local config files. Nothing phones home. Zero runtime dependencies. ~2,000 lines of TypeScript.
39
36
 
40
37
  ---
41
38
 
42
39
  ## Who this is for
43
40
 
44
41
  **Best fit:**
45
- - Power users already paying for Claude Max or Pro who want Claude available inside their local stack — editors, terminals, scripts, internal tools — without moving to API billing.
46
- - Small teams that work in terminals and IDEs, not in hosted agent stacks, and want Claude as a drop-in provider for whatever tools they already use.
47
- - Anyone who wants Claude Code's billing behavior on their own requests, from their own code.
42
+
43
+ - **Developers using multiple LLMs across multiple tools** who are tired of juggling base URLs, API keys, and per-tool provider configs.
44
+ - **Claude Max or Pro subscribers** who want their subscription usable anywhere that speaks the Anthropic or OpenAI API — without paying API rates for every request.
45
+ - **Teams running local or hosted OpenAI-compat servers** (LiteLLM, vLLM, Ollama, Groq, OpenRouter) who want one stable local endpoint in front of them that every tool can reuse.
46
+ - **Power users running multi-agent workloads on Claude subscriptions** who want multi-account pooling with headroom-aware routing on their own machine, against their own subscriptions, without a hosted platform.
48
47
 
49
48
  **Not a fit:**
50
- - You need vendor-managed production SLAs on every request. Use the Anthropic API directly.
51
- - You need high-scale agent orchestration, multi-account pooling, or session-level classifier shaping. That's [askalf](https://askalf.org), which dario bridges into.
52
- - You want a hosted chat UI. Use claude.ai.
49
+
50
+ - You need vendor-managed production SLAs on every request. Use the provider APIs directly.
51
+ - You need a hosted multi-tenant routing platform with a dashboard. Try [askalf](https://askalf.org), a separate product in the same family — different problem, different tool.
52
+ - You want a chat UI. Use claude.ai or chatgpt.com.
53
53
 
54
54
  ---
55
55
 
56
56
  ## First use case
57
57
 
58
- > Use Claude in your local automation and developer workflows the way you'd normally reach for an API, but backed by the subscription you already pay for.
59
-
60
- You install dario, point your existing tool at `http://localhost:3456`, and that tool now sees Claude. The tool doesn't know it's going through a proxy; from its perspective dario *is* the Anthropic API. Your subscription handles the billing. Your Max plan limits are what count against usage.
58
+ > I install dario, point every tool I already use at `http://localhost:3456`, and every LLM I have access to works through that one URL.
61
59
 
62
60
  Flow on a fresh machine:
63
61
 
64
- 1. `npm install -g @askalf/dario`
65
- 2. `dario login` — detects your installed Claude Code credentials, or runs its own OAuth flow if you don't have CC installed
66
- 3. `dario proxy` — starts the local server on port 3456
67
- 4. Set two environment variables in the tool you already use:
68
- ```bash
69
- ANTHROPIC_BASE_URL=http://localhost:3456
70
- ANTHROPIC_API_KEY=dario
71
- ```
72
- 5. That tool now uses your Claude subscription. Streaming works, tool use works, prompt caching works.
73
-
74
- No separate API key. No Extra Usage charges. No rebuilding your workflow around a new provider.
75
-
76
- ---
77
-
78
- ## Why switch
79
-
80
- **Use dario if** you already pay for Claude Max or Pro and you want Claude inside the tools you already use, without paying API rates for every request or routing your work through a second hosted stack.
81
-
82
- **Use dario pool mode if** you're running multi-agent workloads and hitting per-subscription rate limits — add 2–N accounts with `dario accounts add` and dario handles headroom-aware routing across them, all on your machine, against your own subscriptions. No hosted stack to sign up for. See [Multi-Account Pool Mode](#multi-account-pool-mode).
83
-
84
- **Use dario multi-provider routing if** you want one local endpoint that speaks to Claude subscriptions *and* OpenAI / OpenRouter / Groq / a local LiteLLM / anything else OpenAI-compat. `dario backend add openai --key=...`, point your tool at `http://localhost:3456`, and every model from every provider flows through the same URL. See [Multi-Provider Routing](#multi-provider-routing).
85
-
86
- **Use the Anthropic API directly if** you need platform-native primitives, vendor-managed production usage, high-scale control, or SLAs your subscription tier doesn't cover. Dario isn't trying to replace the API — it's trying to unlock the subscription you already bought.
87
-
88
- **Don't use dario if** you want a subprocess bridge that shells out to `claude --print` under the hood. Those tools (openclaw-claude-bridge and similar) work well for single-team single-machine workloads that can accept a one-subscription rate ceiling and a one-machine deployment. Dario is the API-path alternative, which trades that simplicity for pooling-friendly behavior on the wire. Different tradeoffs, different tool.
89
-
90
- ---
91
-
92
- ## Quick Start
93
-
94
62
  ```bash
95
63
  # Install
96
64
  npm install -g @askalf/dario
97
65
 
98
- # Log in (detects Claude Code credentials if installed)
66
+ # Optional: log in to your Claude subscription (Max or Pro)
99
67
  dario login
100
68
 
69
+ # Optional: add an OpenAI-compat backend
70
+ dario backend add openai --key=sk-proj-...
71
+
101
72
  # Start the proxy
102
73
  dario proxy
103
74
 
104
- # Anthropic SDK
75
+ # Use it — set these once, every tool that honors them just works
105
76
  export ANTHROPIC_BASE_URL=http://localhost:3456
106
77
  export ANTHROPIC_API_KEY=dario
107
-
108
- # or OpenAI-compatible tools
109
78
  export OPENAI_BASE_URL=http://localhost:3456/v1
110
79
  export OPENAI_API_KEY=dario
111
80
  ```
112
81
 
113
- Opus, Sonnet, Haiku — all models, streaming, tool use, prompt caching, extended thinking. **Zero runtime dependencies.** ~2,000 lines of TypeScript. Auto-launches under [Bun](https://bun.sh) when available for TLS fingerprint fidelity with Claude Code's runtime. **Auto-detects OAuth config from your installed CC binary** so dario stays in sync forever — Anthropic can rotate client IDs and dario picks them up on the next run.
114
-
115
- ---
116
-
117
- ## How It Works
82
+ Now from the same Cursor/Continue/Aider instance:
118
83
 
119
- Dario has two modes: **direct API mode** (the default) and **passthrough mode** (`--passthrough`).
84
+ - `gpt-4o` → OpenAI, your key, straight through
85
+ - `claude-opus-4-6` → Claude subscription, billed against your Max plan
86
+ - `opus` → shortcut, same as above
87
+ - `llama-3.1-70b` on OpenRouter → configure `dario backend add openrouter --key=sk-or-... --base-url=https://openrouter.ai/api/v1`, done
120
88
 
121
- ### Direct API Mode: Template Replay
89
+ One URL. Your tool doesn't know or care which provider is answering.
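
As a rough illustration of what "one URL" means for a script, the sketch below builds an OpenAI-style chat request aimed at the local endpoint. It assumes the proxy is running on the default port and that the placeholder key `dario` from the exports above is accepted; `build_chat_request` and `send` are hypothetical helper names, not part of dario.

```python
import json
import urllib.request


def build_chat_request(model, prompt, base_url="http://localhost:3456/v1", api_key="dario"):
    """Build (but do not send) a Chat Completions request for the local proxy.

    Only the model name changes between providers; the URL and key stay the same.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (requires a running proxy)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_chat_request("gpt-4o", "hello"))` and `send(build_chat_request("opus", "hello"))` would then exercise two different backends through the same base URL, assuming both are configured.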
122
90
 
123
- This is the mode you want for almost every case. Dario takes each request you send it and replaces it with a Claude Code request: same 25 tool definitions, same 25KB system prompt, same field order, same beta headers, same metadata structure, same device identity. Only your conversation content is preserved. Anthropic's classifier sees what looks like a Claude Code session because, from the wire up, it *is* one — and that's what keeps your usage on subscription billing instead of Extra Usage.
124
-
125
- ```
126
- ┌───────────┐ ┌─────────────────────┐ ┌──────────────────┐
127
- │ Your App │ ──> │ dario (proxy) │ ──> │ api.anthropic.com│
128
- │ │ │ localhost:3456 │ │ │
129
- │ sends │ │ │ │ sees a genuine │
130
- │ its own │ │ replaces request │ │ Claude Code │
131
- │ tools & │ │ with CC template │ │ request │
132
- │ params │ │ keeps only content │ │ │
133
- └───────────┘ └─────────────────────┘ └──────────────────┘
134
- ```
91
+ ---
135
92
 
136
- The details that matter:
93
+ ## Why switch
137
94
 
138
- - **Billing tag** reconstructed using CC's own algorithm extracted from the binary: `x-anthropic-billing-header: cc_version=<version>.<build_tag>; cc_entrypoint=cli; cch=<5-char-hex>;` where `build_tag = SHA-256(seed + chars[4,7,20] of user message + version).slice(0,3)`.
139
- - **Beta set** — CC's exact beta list in CC's order, minus any beta that would require Extra Usage to be enabled on your account.
140
- - **OAuth config** auto-detected from the installed CC binary at startup. When Anthropic rotates `client_id`, authorize URL, or scopes, dario picks up the new values on the next run without needing a new release. Falls back to hardcoded CC 2.1.104 prod values if CC isn't installed. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by binary fingerprint.
141
- - **Session ID** rotates per request via `x-claude-code-session-id`, matching how `claude --print` behaves. A persistent session ID across rapid requests is a behavioral signal.
142
- - **Framework scrub** — framework identifiers and known fingerprint tokens (`OpenClaw`, `sessions_*` tool prefixes, orchestration tags, etc.) are stripped from both the system prompt and message content before the request goes upstream.
143
- - **Bun auto-relaunch** — when Bun is installed, dario relaunches under it so its TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js and works fine; the TLS fingerprint is the only difference.
95
+ **Use dario if** you use more than one LLM provider, or more than one tool, or both, and you're tired of configuring each tool with a different base URL and API key per provider.
144
96
 
145
- ### Passthrough Mode `--passthrough`
97
+ **Use dario if** you pay for Claude Max or Pro and you want that subscription reachable from every tool on your machine, without paying API rates or opening a second billing surface.
146
98
 
147
- For tools that need exact Anthropic protocol fidelity with nothing injected. This mode does an OAuth token swap and nothing else: no billing tag, no template, no device identity.
99
+ **Use dario pool mode if** you're running multi-agent workloads on Claude subscriptions and hitting per-account rate limits. Add 2–N accounts with `dario accounts add` and dario routes across them by per-account headroom, all on your machine, against your own subscriptions. See [Multi-Account Pool Mode](#multi-account-pool-mode).
148
100
 
149
- ```bash
150
- dario proxy --passthrough
151
- ```
101
+ **Use a provider API directly if** you need vendor-managed production SLAs or high-scale orchestration primitives the providers ship themselves. Dario isn't trying to replace their APIs — it's trying to put one local shim in front of all of them so your tools don't care which is which.
152
102
 
153
- Use it when the upstream tool already builds a Claude-Code-shaped request on its own and you just need the token auth.
154
-
155
- ### Detection scope
156
-
157
- Dario is a **per-request layer**. Every request it sends upstream is designed to be indistinguishable from a Claude Code request, and the per-request scrubbing hardened in v3.4.5 makes that meaningfully harder to fingerprint than it was when v3.0 first shipped. What dario cannot do at the per-request level is defend against Anthropic's session-level behavioral classifiers — those operate on cumulative per-OAuth aggregates (token throughput, conversation depth, streaming duration, inter-arrival timing) and no amount of per-request hardening reaches them. The practical answer to that problem is *distributing* load across multiple subscriptions so no single account accumulates enough signal to trip the classifier — which is what pool mode (below) does.
103
+ **Don't use dario if** you want a subprocess bridge that shells out to `claude --print` under the hood (openclaw-claude-bridge and similar). That's a valid answer for single-team single-machine workloads that can accept a one-subscription rate ceiling and a one-machine deployment: different tradeoffs, different tool.
158
104
 
159
105
  ---
160
106
 
161
- ## Multi-Account Pool Mode
162
-
163
- *New in v3.5.0.* Dario can manage multiple Claude subscriptions and route each request to the account with the most headroom. Single-account dario is unchanged and remains the default — pool mode activates **only** when `~/.dario/accounts/` contains 2+ accounts.
107
+ ## Quick Start
164
108
 
165
109
  ```bash
166
- # Add accounts to the pool. Each runs its own OAuth flow.
167
- dario accounts add work
168
- dario accounts add personal
169
- dario accounts add side-project
170
-
171
- # List them
172
- dario accounts list
110
+ # Install
111
+ npm install -g @askalf/dario
173
112
 
174
- # Start the proxy (pool mode activates automatically)
175
- dario proxy
176
- ```
113
+ # Claude subscription path (detects Claude Code credentials if CC is installed,
114
+ # runs its own OAuth flow otherwise)
115
+ dario login
177
116
 
178
- ### How it routes
117
+ # OpenAI or any OpenAI-compat provider (optional, additive)
118
+ dario backend add openai --key=sk-proj-...
179
119
 
180
- Each incoming request picks the account with the highest **headroom**:
120
+ # Start the proxy
121
+ dario proxy
181
122
 
182
- ```
183
- headroom = 1 - max(util_5h, util_7d)
123
+ # Point anything that speaks the Anthropic or OpenAI API at localhost:3456
124
+ export ANTHROPIC_BASE_URL=http://localhost:3456
125
+ export ANTHROPIC_API_KEY=dario
126
+ export OPENAI_BASE_URL=http://localhost:3456/v1
127
+ export OPENAI_API_KEY=dario
184
128
  ```
185
129
 
186
- The response's `anthropic-ratelimit-unified-*` headers are parsed back into the pool so the next request sees fresh utilization. An account that returns a 429 is marked `rejected` and routed around until its window resets. When every account is exhausted, incoming requests queue for up to 60 seconds waiting for headroom to reappear, with backoff-aware draining.
130
+ Opus, Sonnet, Haiku, GPT-4o, o1, o3, o4, plus anything the configured OpenAI-compat backend serves. Streaming, tool use, prompt caching, extended thinking. **Zero runtime dependencies.** Auto-launches under [Bun](https://bun.sh) when available for TLS fingerprint fidelity with Claude Code's runtime on the Claude path.
187
131
 
188
- Accounts can use different plans — mix Max and Pro accounts freely. The pool doesn't care about tier, only headroom.
132
+ ---
189
133
 
190
- ### Why pool over per-request tricks alone
134
+ ## Backends
191
135
 
192
- Per-request template replay is necessary but not sufficient for multi-agent workloads. Anthropic's classifier operates on cumulative per-OAuth-session aggregates (see the [FAQ entry](#faq) on multi-agent reclassification), and no amount of per-request hardening reaches that layer. The practical answer is *distribution* — spread load so no single account accumulates enough signal to trip anything. Pool mode is the piece that does that, and the headroom-aware selection means you don't have to think about which account is which; dario picks.
136
+ Dario's routing is organized around **backends**, each with its own auth and its own target. v3.6.0 ships two backends, with more coming.
193
137
 
194
- ### Inspection endpoints
138
+ ### 1. Claude subscription backend (built in)
195
139
 
196
- ```bash
197
- # Live pool snapshot — per-account utilization, claim, status
198
- curl http://localhost:3456/accounts
140
+ OAuth-backed Claude Max / Pro, billed against your plan instead of the API. Activated by `dario login`.
199
141
 
200
- # Pool analytics — per-account / per-model stats, burn-rate, exhaustion predictions
201
- curl http://localhost:3456/analytics
202
- ```
142
+ **What it does:**
203
143
 
204
- ### Known scope for v3.5.0
144
+ - Every request is replaced with a Claude Code template before it goes upstream — 25 tool definitions, 25KB system prompt, exact CC field order, exact beta headers, exact metadata structure. Only the conversation content is preserved. Anthropic's classifier sees what looks like a Claude Code session because, from the wire up, it *is* one — and that's what keeps your usage on subscription billing instead of Extra Usage.
145
+ - **Billing tag** reconstructed using CC's own algorithm: `x-anthropic-billing-header: cc_version=<version>.<build_tag>; cc_entrypoint=cli; cch=<5-char-hex>;` where `build_tag = SHA-256(seed + chars[4,7,20] of user message + version).slice(0,3)`.
146
+ - **OAuth config** auto-detected from the installed CC binary at startup. When Anthropic rotates `client_id`, authorize URL, or scopes, dario picks up the new values on the next run without needing a release.
147
+ - **Multi-account pool mode** — see below. Automatic when 2+ accounts are configured.
148
+ - **Framework scrubbing** — known fingerprint tokens (`OpenClaw`, `sessions_*` prefixes, orchestration tags) stripped from system prompt and message content before the request leaves your machine.
149
+ - **Bun auto-relaunch** — when Bun is installed, dario relaunches under it so the TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js.
205
150
 
206
- Pool mode v3.5.0 ships **headroom-aware selection across requests**. It does not yet retry a single in-flight request against a different account when that request 429s — that ships in v3.5.1 along with analytics recording wiring. Across-request routing is already effective: a 429 on one request immediately marks that account rejected, and the next request goes somewhere else.
151
+ **Passthrough mode** (`dario proxy --passthrough`) does an OAuth swap and nothing else: no template, no identity, no scrubbing. Use it when the upstream tool already builds a Claude-Code-shaped request on its own and you just need the token auth.
207
152
 
208
- ---
153
+ **Detection scope.** The Claude backend is a per-request layer. Template replay and scrubbing are designed to be indistinguishable from Claude Code at the request level. What they *cannot* defend against is Anthropic's session-level behavioral classifier, which operates on cumulative per-OAuth aggregates (token throughput, conversation depth, streaming duration, inter-arrival timing). The practical answer to that is **pool mode** — distributing load across multiple subscriptions so no one account accumulates enough signal to trip anything. See the [FAQ entry](#faq) for the full mechanism.
209
154
 
210
- ## Multi-Provider Routing
155
+ ### 2. OpenAI-compat backend (v3.6.0+)
211
156
 
212
- *New in v3.6.0.* Dario is no longer Claude-only. Configure an OpenAI-compat backend once, and GPT-family model names at `/v1/chat/completions` route to that backend while Claude model names keep flowing through the Claude subscription path. Works with **any** OpenAI-compat provider (OpenAI, OpenRouter, Groq, a local LiteLLM instance, Ollama's openai-compat mode, whatever) via a configurable `--base-url`.
157
+ Any provider that speaks the OpenAI Chat Completions API. Activated by:
213
158
 
214
159
  ```bash
215
160
  # OpenAI itself (default base URL)
@@ -221,74 +166,77 @@ dario backend add groq --key=gsk_... --base-url=https://api.groq.com/openai/v1
221
166
  # OpenRouter
222
167
  dario backend add openrouter --key=sk-or-... --base-url=https://openrouter.ai/api/v1
223
168
 
224
- # Local LiteLLM / Ollama / any openai-compat server
169
+ # Local LiteLLM / vLLM / Ollama openai-compat mode
225
170
  dario backend add local --key=anything --base-url=http://127.0.0.1:4000/v1
226
-
227
- # Inspect
228
- dario backend list
229
171
  ```
230
172
 
231
- ### How it routes
173
+ Credentials live at `~/.dario/backends/<name>.json` with mode `0600`.
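
To make the `0600` point concrete, here is a minimal sketch of writing a per-backend credentials file that only the owning user can read. The JSON field names (`name`, `apiKey`) and the helper `write_backend_config` are assumptions for illustration; only `baseUrl`, the `<name>.json` layout, and the file mode come from the text above, and dario's real on-disk schema may differ. It targets POSIX permissions.

```python
import json
import os


def write_backend_config(directory, name, api_key, base_url):
    """Write <directory>/<name>.json readable and writable by the owner only."""
    os.makedirs(directory, mode=0o700, exist_ok=True)
    path = os.path.join(directory, f"{name}.json")
    # Create the file with mode 0600 from the start, so the key is never
    # briefly world-readable between creation and a later chmod.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Field names other than baseUrl are hypothetical.
        json.dump({"name": name, "apiKey": api_key, "baseUrl": base_url}, handle)
    # Tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
    return path
```

Passing the mode to `os.open` rather than calling `chmod` afterwards is the design choice worth copying: it closes the window in which another local user could read the key.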
232
174
 
233
- After the proxy starts, every request at `/v1/chat/completions` is checked:
175
+ **How it routes.** When the OpenAI-compat backend is configured, each request at `/v1/chat/completions` is checked:
234
176
 
235
177
  | Request model | Route |
236
178
  |---|---|
237
- | `gpt-*`, `o1-*`, `o3-*`, `o4-*`, `chatgpt-*`, `text-davinci-*`, `text-embedding-*` | OpenAI-compat backend (if configured) |
238
- | `claude-*` (or the shortcut `opus`/`sonnet`/`haiku`) | Claude subscription path |
239
- | Anything else on `/v1/chat/completions` | Claude subscription path with existing OpenAI-compat translation |
179
+ | `gpt-*`, `o1-*`, `o3-*`, `o4-*`, `chatgpt-*`, `text-davinci-*`, `text-embedding-*` | OpenAI-compat backend |
180
+ | `claude-*` (or `opus` / `sonnet` / `haiku`) | Claude subscription backend |
181
+ | Anything else | Claude backend with OpenAI-compat translation |
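
The table can be read as a small pattern-matching function. The sketch below is one way to express it, not dario's actual code: the route labels and the `route` function are hypothetical, and the fallback used when no OpenAI-compat backend is configured is an assumption based on the "anything else" row.

```python
from fnmatch import fnmatchcase

# Patterns copied from the routing table above.
OPENAI_PATTERNS = (
    "gpt-*", "o1-*", "o3-*", "o4-*",
    "chatgpt-*", "text-davinci-*", "text-embedding-*",
)
CLAUDE_SHORTCUTS = ("opus", "sonnet", "haiku")


def route(model, openai_configured=True):
    """Return a label for the backend a /v1/chat/completions request would use."""
    if model in CLAUDE_SHORTCUTS or fnmatchcase(model, "claude-*"):
        return "claude"
    if openai_configured and any(fnmatchcase(model, p) for p in OPENAI_PATTERNS):
        return "openai-compat"
    # Everything else goes to the Claude backend with OpenAI-compat translation.
    # Treating an unconfigured OpenAI backend the same way is an assumption.
    return "claude-translated"
```

Read literally, the patterns require the trailing hyphen, so `o1-mini` matches `o1-*` while a bare `o1` would fall through to the last row.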
240
182
 
241
- Point any tool that speaks the OpenAI Chat Completions API at `http://localhost:3456/v1` once, and both GPT and Claude models work through the same base URL. Cursor, Continue, Aider, any OpenAI SDK: none of them need to know anything changed.
183
+ Dario's passthrough for the OpenAI-compat backend is literal: client request body goes upstream as-is, only the `Authorization` header is swapped for the configured API key and the URL is pointed at `baseUrl + /chat/completions`. Response body streams back unchanged.
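
A literal passthrough of this kind can be sketched as a pure function: the body bytes are returned untouched, the `Authorization` header is replaced, and the target URL is derived from `baseUrl`. The function name `build_upstream_request` and the tuple it returns are hypothetical; this shows the described behavior under those assumptions, not the project's implementation.

```python
def build_upstream_request(base_url, api_key, client_headers, body):
    """Return (url, headers, body) for forwarding a chat completions request.

    The request body is forwarded byte-for-byte; only the Authorization
    header and the destination URL change.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    # Drop whatever Authorization header the client sent (any casing) ...
    headers = {
        key: value
        for key, value in client_headers.items()
        if key.lower() != "authorization"
    }
    # ... and substitute the configured backend key.
    headers["Authorization"] = f"Bearer {api_key}"
    return url, headers, body
```

Because nothing in the body is parsed or rewritten, provider-specific request fields pass through unchanged, which is what lets one backend type cover any OpenAI-compat server.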
242
184
 
243
- ### Why this matters
185
+ ### Coming in a follow-up
244
186
 
245
- Dario's earlier layers (template replay, framework scrubbing, pool routing) keep the Claude subscription path defensible against Anthropic's classifier. Multi-provider routing changes the *game board*: when dario also speaks OpenAI and OpenAI-compat, a squeeze on the Claude side stops being an existential issue — traffic for affected workloads shifts to another backend, and dario is still useful. The moment Anthropic ships their own "bring your subscription to the API" feature, dario's Claude backend simplifies and keeps working. Either way, dario is still the local router between every model and every tool on your machine.
187
+ - **Anthropic → OpenAI request translation** for `/v1/messages` requests with GPT-family model names (tool_use format, streaming delta conversion).
188
+ - **Multiple simultaneous openai-compat backends** with per-model routing rules (`gpt-*` → OpenAI, `llama-*` → Groq, `mixtral-*` → OpenRouter).
189
+ - **Fallback rules.** "If Claude 429s, use Gemini." v3.6.0 ships the routing plumbing; fallback logic layers on top.
246
190
 
247
- ### Not yet in this release
191
+ ---
248
192
 
249
- - **Cross-format translation** (Anthropic → OpenAI). Requests at `/v1/messages` with GPT-family model names fall through to the existing Claude-side handling (which maps them to Claude equivalents). Full Anthropic → OpenAI request translation, including tool_use format conversion, lands in a follow-up.
250
- - **Per-model routing rules.** v3.6.0 supports one active openai-compat backend. Per-model selection (`llama-*` → Groq, `mixtral-*` → OpenRouter) ships next.
251
- - **Fallback rules.** "If Claude 429s, use Gemini" is a follow-up goal. v3.6.0 ships the routing plumbing; fallback logic layers on top.
193
+ ## Multi-Account Pool Mode
252
194
 
253
- ---
195
+ *New in v3.5.0, for the Claude subscription backend.* Dario can manage multiple Claude subscriptions and route each request to the account with the most headroom. Single-account Claude dario is unchanged — pool mode activates **only** when `~/.dario/accounts/` contains 2+ accounts.
254
196
 
255
- ## Dario and askalf
197
+ ```bash
198
+ dario accounts add work
199
+ dario accounts add personal
200
+ dario accounts add side-project
201
+ dario accounts list
202
+ dario proxy
203
+ ```
256
204
 
257
- Dario is fully useful on its own — single-account mode is the default, pool mode (above) scales to as many Claude subscriptions as you want to add, and neither mode requires an account anywhere. Everything dario does is open-source and self-hosted.
205
+ Each request picks the account with the highest headroom:
258
206
 
259
- [askalf](https://askalf.org) is the hosted platform built on top of the same OAuth and billing infrastructure, targeting the things a local proxy can't deliver by design:
207
+ ```
208
+ headroom = 1 - max(util_5h, util_7d)
209
+ ```
260
210
 
261
- | | dario | askalf |
262
- |---|---|---|
263
- | **Accounts** | 1 (single) or N (pool mode) | Managed pool, no setup |
264
- | **Rate limits** | Distributed across your own pool | Distributed across the hosted fleet |
265
- | **Browser / desktop control** | No | Yes — full computer use |
266
- | **Scheduling** | No | Cron, webhooks, triggers |
267
- | **Persistent memory** | No | Per-agent context and state |
268
- | **Hosted dashboard** | No | Yes |
269
- | **Runs where** | Your machine | Hosted |
270
- | **Price** | Free | Paid |
211
+ The response's `anthropic-ratelimit-unified-*` headers are parsed back into the pool so the next selection sees fresh utilization. An account that returns a 429 is marked `rejected` and routed around until its window resets. When every account is exhausted, requests queue for up to 60 seconds waiting for headroom to reappear.
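
The selection rule is simple enough to sketch directly from the formula. In the example below, the `Account` class and `pick_account` are hypothetical names, and skipping accounts with zero headroom (in addition to `rejected` ones) is an assumption about what "exhausted" means; how utilization is parsed from the response headers is left out because the exact header names are not given here.

```python
from dataclasses import dataclass


@dataclass
class Account:
    alias: str
    util_5h: float  # utilization of the 5-hour window, 0.0 to 1.0
    util_7d: float  # utilization of the 7-day window, 0.0 to 1.0
    rejected: bool = False  # set after a 429 until the window resets

    @property
    def headroom(self):
        # headroom = 1 - max(util_5h, util_7d), as in the formula above
        return 1 - max(self.util_5h, self.util_7d)


def pick_account(accounts):
    """Return the usable account with the most headroom, or None if all are exhausted."""
    usable = [a for a in accounts if not a.rejected and a.headroom > 0]
    if not usable:
        return None  # the caller would queue the request and retry later
    return max(usable, key=lambda a: a.headroom)
```

Taking the maximum of the two windows means an account is only as available as its tightest limit, so an account that is idle today but near its weekly cap is not preferred.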
212
+
213
+ Accounts can mix plans: Max and Pro accounts can sit in the same pool; dario doesn't care about tier, only headroom.
214
+
215
+ **Pool inspection endpoints:**
271
216
 
272
- Pool mode in dario covers the "I want multi-account routing on my own machine with my own subscriptions" case. askalf covers the "I want someone else to run this, with a dashboard, and 24/7 fleet capabilities my own machine can't give me" case. Dario is and will remain open-source and free.
217
+ ```bash
218
+ curl http://localhost:3456/accounts # per-account utilization, claim, status
219
+ curl http://localhost:3456/analytics # per-account / per-model stats, burn rate, exhaustion predictions
220
+ ```
273
221
 
274
- **[Join the askalf waitlist →](https://askalf.org)**
222
+ **Scope.** v3.5.0 ships headroom-aware selection *across* requests — a 429 on one request marks the account rejected and the next request goes to a different one. Retrying a single in-flight request against a different account when that request 429s (inside-request failover) ships in v3.5.1 along with analytics recording wiring.
275
223
 
276
224
  ---
277
225
 
278
226
  ## Commands
279
227
 
280
228
  | Command | Description |
281
- |---------|-------------|
282
- | `dario login` | Log in (detects CC credentials or runs its own OAuth flow) |
229
+ |---|---|
230
+ | `dario login` | Log in to the Claude backend (detects CC credentials or runs its own OAuth flow) |
283
231
  | `dario proxy` | Start the local API proxy on port 3456 |
284
- | `dario status` | Show OAuth token health and expiry |
285
- | `dario refresh` | Force an immediate token refresh |
286
- | `dario logout` | Delete stored credentials |
232
+ | `dario status` | Show Claude backend OAuth token health and expiry |
233
+ | `dario refresh` | Force an immediate Claude token refresh |
234
+ | `dario logout` | Delete stored Claude credentials |
287
235
  | `dario accounts list` | List accounts in the multi-account pool |
288
- | `dario accounts add <alias>` | Add a new account to the pool (runs OAuth flow) |
236
+ | `dario accounts add <alias>` | Add a Claude account to the pool (runs OAuth flow) |
289
237
  | `dario accounts remove <alias>` | Remove an account from the pool |
290
238
  | `dario backend list` | List configured OpenAI-compat backends |
291
- | `dario backend add <name> --key=<k> [--base-url=<u>]` | Add an OpenAI-compat backend |
239
+ | `dario backend add <name> --key=<key> [--base-url=<url>]` | Add an OpenAI-compat backend |
292
240
  | `dario backend remove <name>` | Remove an OpenAI-compat backend |
293
241
  | `dario help` | Full command reference |
294
242
 
@@ -296,23 +244,23 @@ Pool mode in dario covers the "I want multi-account routing on my own machine wi
296
244
 
297
245
  | Flag / env | Description | Default |
298
246
  |---|---|---|
299
- | `--passthrough` / `--thin` | Thin proxy — OAuth swap only, no template injection | off |
300
- | `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC tools | off |
301
- | `--model=<name>` | Force a model (`opus`, `sonnet`, `haiku`, or full ID) | passthrough |
247
+ | `--passthrough` / `--thin` | Thin proxy for the Claude backend — OAuth swap only, no template injection | off |
248
+ | `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC tools (Claude backend) | off |
249
+ | `--model=<name>` | Force a model (`opus`, `sonnet`, `haiku`, or full ID). Applies to the Claude backend. | passthrough |
302
250
  | `--port=<n>` | Port to listen on | `3456` |
303
251
  | `--host=<addr>` / `DARIO_HOST` | Bind address. Use `0.0.0.0` for LAN, or a specific IP (e.g. a Tailscale interface). When non-loopback, also set `DARIO_API_KEY`. | `127.0.0.1` |
304
252
  | `--verbose` / `-v` | Log every request | off |
305
253
  | `DARIO_API_KEY` | If set, all endpoints (except `/health`) require a matching `x-api-key` or `Authorization: Bearer` header. Required when `--host` binds non-loopback. | unset (open) |
306
- | `DARIO_CORS_ORIGIN` | Override the browser CORS `Access-Control-Allow-Origin`. Useful for browser clients reaching dario over a mesh network. | `http://localhost:${port}` |
254
+ | `DARIO_CORS_ORIGIN` | Override browser CORS origin | `http://localhost:${port}` |
307
255
  | `DARIO_NO_BUN` | Disable automatic Bun relaunch | unset |
308
- | `DARIO_MIN_INTERVAL_MS` | Minimum ms between requests (rate governor) | `500` |
256
+ | `DARIO_MIN_INTERVAL_MS` | Minimum ms between Claude-backend requests (rate governor) | `500` |
309
257
  | `DARIO_CC_PATH` | Override path to the Claude Code binary for OAuth detection | auto-detect |
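
When `DARIO_API_KEY` is set, a client has to send the matching value in either an `x-api-key` header or an `Authorization: Bearer` header. A small client-side sketch of both styles follows; the helper name `dario_auth_headers` is hypothetical, and reading the key from the client's own environment is an assumption made for the example.

```python
import os


def dario_auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Build one of the two header styles the flags table describes.

    style="bearer"    -> Authorization: Bearer <key>
    style="x-api-key" -> x-api-key: <key>
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown header style: {style}")


# Assumption: the client exports the same value the proxy was started with.
key = os.environ.get("DARIO_API_KEY", "dario")
headers = {"Content-Type": "application/json", **dario_auth_headers(key)}
```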
310
258
 
311
259
  ---
312
260
 
313
261
  ## Usage
314
262
 
315
- ### Python
263
+ ### Python (Anthropic SDK)
316
264
 
317
265
  ```python
318
266
  import anthropic
@@ -330,6 +278,29 @@ msg = client.messages.create(
330
278
  print(msg.content[0].text)
331
279
  ```
332
280
 
281
+ ### Python (OpenAI SDK — same proxy, different provider)
282
+
283
+ ```python
284
+ from openai import OpenAI
285
+
286
+ client = OpenAI(
287
+ base_url="http://localhost:3456/v1",
288
+ api_key="dario",
289
+ )
290
+
291
+ # gpt-4o routes to the configured OpenAI backend
292
+ msg = client.chat.completions.create(
293
+ model="gpt-4o",
294
+ messages=[{"role": "user", "content": "Hello!"}],
295
+ )
296
+
297
+ # claude-opus-4-6 routes to the Claude subscription backend — same SDK, same URL
298
+ claude_msg = client.chat.completions.create(
299
+ model="claude-opus-4-6",
300
+ messages=[{"role": "user", "content": "Hello!"}],
301
+ )
302
+ ```
303
+
333
304
  ### TypeScript / Node.js
334
305
 
335
306
  ```typescript
@@ -347,38 +318,44 @@ const msg = await client.messages.create({
347
318
  });
348
319
  ```
349
320
 
350
- ### OpenAI-compatible (Cursor, Continue, LiteLLM, Aider, …)
321
+ ### OpenAI-compatible tools (Cursor, Continue, Aider, LiteLLM, …)
351
322
 
352
323
  ```bash
353
324
  export OPENAI_BASE_URL=http://localhost:3456/v1
354
325
  export OPENAI_API_KEY=dario
355
326
  ```
356
327
 
357
- Any tool that accepts an OpenAI base URL works as-is. Claude model names pass through directly; GPT-style names (`gpt-4`, `gpt-5.4`, etc.) map to their closest Claude equivalents so tools with hardcoded OpenAI model lists work without code changes.
328
+ Any tool that accepts an OpenAI base URL works. Use Claude model names (`claude-opus-4-6`, `opus`, `sonnet`, `haiku`) for the Claude backend, or GPT-family names for the configured OpenAI-compat backend.
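
As a rough mental model of routing by model name, the sketch below sends the Claude names listed above to the Claude backend and everything else to the OpenAI-compat backend. The exact matching rules dario applies are not spelled out here, so the `claude-` prefix check, the alias set, and the `pick_backend` name are assumptions.

```python
# Short aliases mentioned in the README; treated here as Claude-backend names.
CLAUDE_ALIASES = {"opus", "sonnet", "haiku"}


def pick_backend(model: str) -> str:
    """Return "claude" for Claude model names, "openai-compat" for anything else."""
    name = model.strip().lower()
    if name in CLAUDE_ALIASES or name.startswith("claude-"):
        return "claude"
    return "openai-compat"
```

Under these assumptions, `pick_backend("claude-opus-4-6")` and `pick_backend("sonnet")` select the Claude backend, while `pick_backend("gpt-4o")` selects the configured OpenAI-compat backend.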
358
329
 
359
330
  ### curl
360
331
 
361
332
  ```bash
333
+ # Claude backend via Anthropic format
362
334
  curl http://localhost:3456/v1/messages \
363
335
  -H "Content-Type: application/json" \
364
336
  -H "anthropic-version: 2023-06-01" \
365
337
  -d '{"model":"claude-opus-4-6","max_tokens":1024,"messages":[{"role":"user","content":"Hello!"}]}'
338
+
339
+ # OpenAI backend via OpenAI format
340
+ curl http://localhost:3456/v1/chat/completions \
341
+ -H "Content-Type: application/json" \
342
+ -H "Authorization: Bearer dario" \
343
+ -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
366
344
  ```
367
345
 
368
346
  ### Streaming, tool use, prompt caching, extended thinking
369
347
 
370
- All supported in both Anthropic and OpenAI SSE formats. Tool-use streaming emits `input_json_delta` events. Prompt caching works as-is. Extended thinking is routed through `reasoning_effort` in OpenAI format or the native `thinking` field in Anthropic format.
348
+ All supported. Claude backend: full Anthropic SSE format plus OpenAI-SSE translation for tool_use streaming. OpenAI-compat backend: streaming body forwarded byte-for-byte.
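
For clients that read the OpenAI-format stream by hand rather than through an SDK, the sketch below pulls text out of standard OpenAI chat-completion SSE chunks (`data: {...}` lines ending with `data: [DONE]`). The function name `iter_text_deltas` and the sample payloads are illustrative, not captured dario output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE chunk lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative stream: a role-only chunk, two content chunks, then the sentinel.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "",
    "data: [DONE]",
]
reply = "".join(iter_text_deltas(sample))
```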
371
349
 
372
350
  ### Library mode
373
351
 
374
- Dario is also importable:
375
-
376
352
  ```typescript
377
- import { startProxy, getAccessToken, getStatus } from "@askalf/dario";
353
+ import { startProxy, getAccessToken, getStatus, listBackends } from "@askalf/dario";
378
354
 
379
355
  await startProxy({ port: 3456, verbose: true });
380
356
  const token = await getAccessToken();
381
357
  const status = await getStatus();
358
+ const backends = await listBackends();
382
359
  ```
383
360
 
384
361
  ### Health check
@@ -387,43 +364,36 @@ const status = await getStatus();
387
364
  curl http://localhost:3456/health
388
365
  ```
389
366
 
390
- ```json
391
- {
392
- "status": "ok",
393
- "oauth": "healthy",
394
- "expiresIn": "11h 42m",
395
- "requests": 47
396
- }
397
- ```
398
-
399
367
  ---
400
368
 
401
369
  ## Endpoints
402
370
 
403
371
  | Path | Description |
404
- |------|-------------|
405
- | `POST /v1/messages` | Anthropic Messages API |
406
- | `POST /v1/chat/completions` | OpenAI-compatible Chat API |
407
- | `GET /v1/models` | Model list (works with both SDKs) |
372
+ |---|---|
373
+ | `POST /v1/messages` | Anthropic Messages API (Claude backend) |
374
+ | `POST /v1/chat/completions` | OpenAI-compatible Chat API (routes by model name) |
375
+ | `GET /v1/models` | Model list (Claude models; OpenAI models come from the OpenAI backend directly) |
408
376
  | `GET /health` | Proxy health + OAuth status + request count |
409
- | `GET /status` | Detailed OAuth token status |
377
+ | `GET /status` | Detailed Claude OAuth token status |
378
+ | `GET /accounts` | Pool snapshot (pool mode only) |
379
+ | `GET /analytics` | Per-account / per-model stats, burn rate, exhaustion predictions (pool mode only) |
410
380
 
411
381
  ---
412
382
 
413
383
  ## Trust & Transparency
414
384
 
415
- Dario handles your OAuth tokens. Here's why you can trust it:
385
+ Dario handles your OAuth tokens and API keys locally. Here's why you can trust it:
416
386
 
417
387
  | Signal | Status |
418
388
  |---|---|
419
- | **Source code** | ~2,000 lines of TypeScript across 7 files — small enough to audit in one sitting |
389
+ | **Source code** | ~2,500 lines of TypeScript across 10 files — small enough to audit in one sitting |
420
390
  | **Dependencies** | 0 runtime dependencies. Verify: `npm ls --production` |
421
391
  | **npm provenance** | Every release is [SLSA-attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions |
422
392
  | **Security scanning** | [CodeQL](https://github.com/askalf/dario/actions/workflows/codeql.yml) runs on every push and weekly |
423
- | **Credential handling** | Tokens never logged, redacted from errors, stored with `0600` permissions |
424
- | **OAuth flow** | PKCE (Proof Key for Code Exchange); no client secret |
425
- | **Network scope** | Binds to `127.0.0.1` by default. `--host` allows LAN/mesh with `DARIO_API_KEY` gating. Upstream traffic goes only to `api.anthropic.com` over HTTPS |
426
- | **SSRF protection** | Only `/v1/messages` and `/v1/complete` proxy upstream — hardcoded allowlist |
393
+ | **Credential handling** | Tokens and API keys never logged, redacted from errors, stored with `0600` permissions |
394
+ | **OAuth flow** | PKCE (Proof Key for Code Exchange), no client secret |
395
+ | **Network scope** | Binds to `127.0.0.1` by default. `--host` allows LAN/mesh with `DARIO_API_KEY` gating. Upstream traffic goes only to the configured backend target URLs over HTTPS |
396
+ | **SSRF protection** | `/v1/messages` hits `api.anthropic.com` only; `/v1/chat/completions` hits the configured backend `baseUrl` only — hardcoded allowlist |
427
397
  | **Telemetry** | None. Zero analytics, tracking, or data collection |
428
398
  | **Audit trail** | [CHANGELOG.md](CHANGELOG.md) documents every release |
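
The SSRF row in the table above describes a fixed mapping from request path to upstream target. A minimal sketch of that idea follows, assuming a hypothetical `resolve_upstream` helper and a backend base URL that already ends in `/v1`; this is not dario's implementation.

```python
from typing import Optional

ANTHROPIC_BASE = "https://api.anthropic.com"


def resolve_upstream(path: str, backend_base_url: Optional[str]) -> Optional[str]:
    """Map an incoming path to its only permitted upstream URL, or None to refuse."""
    if path == "/v1/messages":
        return ANTHROPIC_BASE + path
    if path == "/v1/chat/completions" and backend_base_url:
        # The upstream host comes from stored configuration, never from the request.
        return backend_base_url.rstrip("/") + "/chat/completions"
    return None  # anything not on the allowlist is not proxied
```

The property that matters is that no part of the upstream host is taken from client input, so a crafted request cannot steer the proxy toward an arbitrary address.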
429
399
 
@@ -440,21 +410,21 @@ cd $(npm root -g)/@askalf/dario && npm ls --production
440
410
  ## FAQ
441
411
 
442
412
  **Does this violate Anthropic's terms of service?**
443
- Dario uses your existing Claude Code credentials with the same OAuth tokens CC uses. It authenticates you as you, with your subscription, through Anthropic's official API endpoints.
413
+ Dario's Claude backend uses your existing Claude Code credentials with the same OAuth tokens CC uses. It authenticates you as you, with your subscription, through Anthropic's official API endpoints.
444
414
 
445
- **What subscription plans work?**
415
+ **What subscription plans work on the Claude backend?**
446
416
  Claude Max and Claude Pro. Any plan that lets you use Claude Code.
447
417
 
448
418
  **Does it work with Team / Enterprise?**
449
419
  Should work if your plan includes Claude Code access. Not widely tested yet — open an issue with results.
450
420
 
451
421
  **Do I need Claude Code installed?**
452
- Recommended, not required. With CC installed, `dario login` picks up your credentials automatically. Without CC, dario runs its own OAuth flow against Anthropic's authorize endpoint.
422
+ Recommended for the Claude backend, not strictly required. With CC installed, `dario login` picks up your credentials automatically. Without CC, dario runs its own OAuth flow against Anthropic's authorize endpoint.
453
423
 
454
424
  **Do I need Bun?**
455
- Optional, recommended. Dario auto-relaunches under Bun when it's available so the TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js and works fine; the TLS fingerprint is the only difference. Install: `curl -fsSL https://bun.sh/install | bash`.
425
+ Optional, recommended for Claude-backend requests. Dario auto-relaunches under Bun when available so the TLS fingerprint matches CC's runtime. Without Bun, dario runs on Node.js and works fine; the TLS fingerprint is the only difference.
456
426
 
457
- **First time setup on a fresh account.**
427
+ **First time setup on a fresh Claude account.**
458
428
  If dario is the first thing you run against a brand-new Claude account, prime the account with a few real Claude Code commands first:
459
429
  ```bash
460
430
  claude --print "hello"
@@ -462,17 +432,20 @@ claude --print "hello"
462
432
  ```
463
433
  This establishes a session baseline. Without priming, brand-new accounts occasionally see billing classification issues on first use.
464
434
 
465
- **What happens when my token expires?**
466
- Dario auto-refreshes tokens 30 minutes before expiry. `dario refresh` forces an immediate refresh if something goes wrong.
435
+ **What happens when Anthropic rotates the OAuth config?**
436
+ Dario auto-detects OAuth config from the installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next run. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by the CC binary fingerprint. Falls back to hardcoded CC 2.1.104 prod values if CC isn't installed.
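
A cache keyed by a binary fingerprint can be sketched like this. The fingerprint method (SHA-256 of the file bytes), the cache layout, and the `detect` callback are assumptions for illustration; the real cache format and the hardcoded fallback are not shown.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def fingerprint(binary: Path) -> str:
    """Hypothetical fingerprint: SHA-256 of the binary's bytes."""
    return hashlib.sha256(binary.read_bytes()).hexdigest()


def load_oauth_config(
    binary: Path, cache_file: Path, detect: Callable[[Path], dict]
) -> dict:
    """Reuse cached config while the binary is unchanged; re-detect when it changes."""
    key = fingerprint(binary)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    if cache.get("fingerprint") == key:
        return cache["config"]  # binary unchanged: reuse the cached values
    config = detect(binary)     # binary changed or no cache yet: detect again
    cache_file.write_text(json.dumps({"fingerprint": key, "config": config}))
    return config
```

Keying on the binary rather than on a timestamp means a Claude Code upgrade invalidates the cache on the next run without any manual step.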
467
437
 
468
- **What happens when Anthropic rotates the OAuth client_id or URL?**
469
- Dario auto-detects OAuth config from your installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next run — no dario release needed. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by the CC binary fingerprint. If CC isn't installed, dario falls back to hardcoded CC 2.1.104 prod values.
470
-
471
- **I'm hitting rate limits. What do I do?**
472
- Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). Dario's rate-limit errors include utilization percentages and reset times so you can see exactly when capacity returns.
438
+ **I'm hitting rate limits on the Claude backend. What do I do?**
439
+ Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
473
440
 
474
441
  **My multi-agent workload is getting reclassified to overage even though dario template-replays per request. Why?**
475
- Because reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session behavioral aggregates: token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the per-request v3.4.3 and v3.4.5 hardening that landed as a result. For the session-layer shaping itself (multi-account pooling, session rotation, workload distribution that keeps any single account from concentrating the behavioral signal), that's what [askalf](https://askalf.org) is built for. Different layer, different tool.
442
+ Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates: token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the v3.4.3/v3.4.5 hardening that landed as a result. The practical answer at the dario layer is **pool mode**: distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-Account Pool Mode](#multi-account-pool-mode).
443
+
444
+ **Can I route non-OpenAI providers through dario?**
445
+ Yes — anything that speaks the OpenAI Chat Completions API. `dario backend add groq --key=... --base-url=https://api.groq.com/openai/v1`, `dario backend add openrouter --key=... --base-url=https://openrouter.ai/api/v1`, or point at a local LiteLLM / vLLM / Ollama-openai-compat server with `--base-url=http://localhost:4000/v1`. v3.6.0 supports one active OpenAI-compat backend at a time; per-model routing to multiple OpenAI-compat backends ships in a follow-up.
446
+
447
+ **Does dario work with only the OpenAI backend, no Claude subscription?**
448
+ Yes. Don't run `dario login`, just run `dario backend add openai --key=...` and `dario proxy`. Claude-backend requests will return an authentication error; OpenAI-compat requests will work normally. Dario becomes a local OpenAI-compat shim with no Claude involvement.
476
449
 
477
450
  **Why "dario"?**
478
451
  It's a name, not an acronym. Don't overthink it.
@@ -493,15 +466,19 @@ Longer-form writing on how dario works and why it works that way:
493
466
 
494
467
  ## Contributing
495
468
 
496
- PRs welcome. The codebase is ~2,000 lines of TypeScript across 7 files:
469
+ PRs welcome. The codebase is ~2,500 lines of TypeScript across 10 files:
497
470
 
498
471
  | File | Purpose |
499
472
  |---|---|
500
- | `src/proxy.ts` | HTTP proxy server, rate governor, billing tag, response forwarding |
501
- | `src/cc-template.ts` | Template engine, tool mapping, orchestration & framework scrubbing |
473
+ | `src/proxy.ts` | HTTP proxy server, request handler, rate governor, Claude backend dispatch |
474
+ | `src/cc-template.ts` | CC request template engine, tool mapping, orchestration & framework scrubbing |
502
475
  | `src/cc-template-data.json` | CC request template data (25 tools, 25KB system prompt) |
503
476
  | `src/cc-oauth-detect.ts` | OAuth config auto-detection from the installed CC binary |
504
- | `src/oauth.ts` | Token storage, PKCE flow, auto-refresh |
477
+ | `src/oauth.ts` | Single-account token storage, PKCE flow, auto-refresh |
478
+ | `src/accounts.ts` | Multi-account credential storage and independent OAuth lifecycle |
479
+ | `src/pool.ts` | Account pool, headroom-aware routing, failover target selection |
480
+ | `src/analytics.ts` | Rolling request history, per-account / per-model stats, burn-rate |
481
+ | `src/openai-backend.ts` | OpenAI-compat backend credential storage and request forwarder |
505
482
  | `src/cli.ts` | CLI entry point, command routing, Bun auto-relaunch |
506
483
  | `src/index.ts` | Library exports |
507
484