@askalf/dario 3.4.0 → 3.4.1 (npm)

Files changed (7)

package/README.md CHANGED

@@ -2,7 +2,9 @@
   <h1 align="center">dario</h1>
   <p align="center"><strong>Use your Claude subscription as an API. The only proxy that bills correctly.</strong></p>
   <p align="center">
-    No API key needed. Your Claude Max/Pro subscription becomes a local API endpoint<br/>that any tool, SDK, or framework can use. Template replay makes every request<br/>indistinguishable from real Claude Code — so your Max plan limits actually work.
+    No API key needed. Your Claude Max/Pro subscription becomes a local API endpoint<br/>
+    that any tool, SDK, or framework can use. Template replay makes every request<br/>
+    indistinguishable from real Claude Code — so your Max plan limits actually work.
   </p>
 </p>
@@ -17,8 +19,8 @@
 <p align="center">
   <a href="#quick-start">Quick Start</a> &bull;
   <a href="#openai-compatibility">OpenAI Compat</a> &bull;
-  <a href="#cli-backend">CLI Backend</a> &bull;
   <a href="#usage-examples">Examples</a> &bull;
+  <a href="#askalf">askalf</a> &bull;
   <a href="#trust--transparency">Trust</a> &bull;
   <a href="#faq">FAQ</a>
 </p>
@@ -33,7 +35,9 @@ export ANTHROPIC_BASE_URL=http://localhost:3456   # or OPENAI_BASE_URL=http://lo
 export ANTHROPIC_API_KEY=dario                    # or OPENAI_API_KEY=dario
 ```
-Opus, Sonnet, Haiku — all models, streaming, tool use. **Zero dependencies.** ~2,100 lines of TypeScript. Works with Cursor, Continue, Aider, LiteLLM, Hermes, OpenClaw, or any tool that speaks the Anthropic or OpenAI API. When rate limited, `--cli` routes through Claude Code for uninterrupted Opus access. Auto-launches under [Bun](https://bun.sh) when available for TLS fingerprint fidelity.
+Opus, Sonnet, Haiku — all models, streaming, tool use. **Zero dependencies.** ~2,000 lines of TypeScript. Works with Cursor, Continue, Aider, LiteLLM, Hermes, OpenClaw, or any tool that speaks the Anthropic or OpenAI API. Auto-launches under [Bun](https://bun.sh) when available for TLS fingerprint fidelity. **Auto-detects OAuth config from your installed CC binary** so dario stays in sync forever — Anthropic can rotate client IDs and dario picks them up on the next run.
+dario is built and maintained by [askalf](https://askalf.org) — the open-source foundation of the askalf agent platform. If you need more than a proxy, [see below](#askalf).
 <table>
 <tr>
@@ -44,7 +48,7 @@ Opus, Sonnet, Haiku — all models, streaming, tool use. **Zero dependencies.**
 **Grok** (xAI)
-*"Dario works great and is safe. Fully functional with OpenClaw / Hermes. Gives you Opus 4.6, Sonnet & Haiku using your existing Claude Max/Pro sub. No extra API key or billing needed. Streaming + tools work perfectly. 100% open-source, runs locally only, proper OAuth (PKCE), no telemetry. Highly recommended if you want a clean local proxy."*
+*"Dario works great and is safe. Fully functional with OpenClaw / Hermes. Gives you Opus, Sonnet & Haiku using your existing Claude Max/Pro sub. No extra API key or billing needed. Streaming + tools work perfectly. 100% open-source, runs locally only, proper OAuth (PKCE), no telemetry. Highly recommended if you want a clean local proxy."*
 </td>
 <td width="33%" valign="top">
@@ -74,15 +78,13 @@ Opus, Sonnet, Haiku — all models, streaming, tool use. **Zero dependencies.**
 </tr>
 </table>
-> **Need more than a proxy?** Dario solves the API access problem. If you need a full agent fleet — desktop control, browser automation, scheduling, custom tools, persistent memory — check out the [askalf platform](https://askalf.org). Same team, different execution model that solves the proxy ceiling entirely.
 ---
 ## Why dario
 Most Claude subscription proxies have a critical billing problem: **Anthropic classifies their requests as third-party and routes all usage to Extra Usage billing** — even when you have Max plan limits available. You're paying for your subscription twice.
-dario is the only proxy that solves this. Instead of transforming your requests signal by signal, dario uses **template replay** — it replaces the entire request with Claude Code's exact template, extracted via MITM capture from CC v2.1.104. 25 tool definitions, 25KB system prompt, exact field order, exact beta headers, exact metadata structure. Only your conversation content is preserved. When Bun is installed, dario auto-relaunches under Bun for TLS fingerprint fidelity matching CC's runtime. Anthropic's classifier sees a genuine Claude Code request because it IS one.
+dario is the only proxy that solves this. Instead of transforming your requests signal by signal, dario uses **template replay** — it replaces the entire request with Claude Code's exact template. 25 tool definitions, 25KB system prompt, exact field order, exact beta headers, exact metadata structure. Only your conversation content is preserved. When Bun is installed, dario auto-relaunches under Bun for TLS fingerprint fidelity matching CC's runtime. Anthropic's classifier sees a genuine Claude Code request because it IS one.
 | | dario | Other proxies |
 |---|---|---|
@@ -95,15 +97,14 @@ dario is the only proxy that solves this. Instead of transforming your requests
 <details>
 <summary><strong>vs competitors</strong></summary>
-| Feature | dario | Meridian (710 stars) | CLIProxyAPI (24K stars) |
+| Feature | dario | Meridian | CLIProxyAPI |
 |---------|-------|---------|------------|
-| Template replay (undetectable) | **Yes** | No | Inherited (CLI-only) |
+| Template replay (undetectable) | **Yes** | No | No |
 | Direct OAuth (streaming, tools) | **Yes** | Yes (SDK-based) | No |
-| CLI fallback (rate limit bypass) | **Yes** | No | Yes (only mode) |
 | OpenAI API compat | **Yes** | Yes | Yes |
 | Orchestration sanitization | **Yes** | Yes | No |
 | Token anomaly detection | **Yes** | Yes | No |
-| Codebase size | ~2,100 lines | ~9,000 lines | Platform |
+| Codebase size | ~2,000 lines | ~9,000 lines | Platform |
 | Dependencies | 0 | Many | Many |
 | Setup | 2 commands | Config + build | Config + dashboard |
@@ -113,15 +114,15 @@ dario is the only proxy that solves this. Instead of transforming your requests
 You pay $100-200/mo for Claude Max or Pro. But that subscription only works on claude.ai and Claude Code. If you want to use Claude with **any other tool** — Cursor, Continue, Aider, your own scripts — you need a separate API key with separate billing.
-**Note:** Claude subscriptions have [usage limits](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) that reset on rolling 5-hour and 7-day windows. When exceeded, Opus and Sonnet may return 429 errors while Haiku continues working. You can check your utilization via Claude Code's `/usage` command or [statusline](https://code.claude.com/docs/en/statusline). Use `--cli` mode to route through Claude Code's binary, which is not affected by these limits.
+**dario fixes this.** It creates a local proxy that translates API key auth into your subscription's OAuth tokens. Your subscription handles the billing. No API key needed.
-**dario fixes this.** It creates a local proxy that translates API key auth into your subscription's OAuth tokens — and with `--cli` mode, routes through the Claude Code binary for uninterrupted access.
+**Note:** Claude subscriptions have [usage limits](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) that reset on rolling 5-hour and 7-day windows. You can check your utilization via Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline).
 ## Quick Start
 ### Prerequisites
-[Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) installed and logged in (recommended). Dario detects your existing Claude Code credentials automatically.
+[Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) installed and logged in (recommended). Dario detects your existing Claude Code credentials automatically and also auto-extracts the current OAuth client config from the installed CC binary so dario stays in sync with whatever CC version you have, even when Anthropic rotates client IDs.
 If Claude Code isn't installed, dario runs its own OAuth flow — opens your browser, you authorize, done.
@@ -180,46 +181,13 @@ continue  # in VS Code, set base URL in config
 python my_script.py
 ```
-## CLI Backend
-If you're getting rate limited on Opus or Sonnet, use `--cli` mode. This routes requests through the Claude Code binary instead of hitting the API directly. Claude Code has priority routing that continues working even when direct API calls return 429.
-```bash
-dario proxy --cli                    # Opus works even when rate limited
-dario proxy --cli --model=opus       # Force Opus + CLI backend
-```
-```
-dario — http://localhost:3456
-Your Claude subscription is now an API.
-Usage:
-  ANTHROPIC_BASE_URL=http://localhost:3456
-  ANTHROPIC_API_KEY=dario
-Backend: Claude CLI (bypasses rate limits)
-Model: claude-opus-4-6 (all requests)
-```
-**Trade-offs vs direct API mode:**
-| | Direct API (default) | CLI Backend (`--cli`) | Passthrough (`--passthrough`) |
-|---|---|---|---|
-| Streaming | Native SSE | SSE (converted from JSON) | Native SSE |
-| Tool use | Yes | No | Yes |
-| Thinking/billing injection | Yes (Claude-optimized) | N/A | No (OAuth swap only) |
-| Latency | Low | Higher (process spawn) | Low |
-| Rate limits | Priority routing | Not affected | Standard (no priority) |
-| Opus when throttled | Auto CLI fallback | **Always works** | May return 429 |
 ## Passthrough Mode
-For tools that need exact Anthropic protocol fidelity with zero modification, use `--passthrough`. This does OAuth swap only — no billing tag, no thinking injection, no device identity, no extra beta flags. Note: most tools (including Hermes and OpenClaw) work better through default mode, which handles billing classification and token optimization automatically.
+For tools that need exact Anthropic protocol fidelity with zero modification, use `--passthrough` (alias: `--thin`). This does OAuth swap only — no billing tag, no template replay, no device identity, no extra beta flags. Note: most tools (including Hermes and OpenClaw) work better through default mode, which handles billing classification automatically.
 ```bash
 dario proxy --passthrough               # Thin proxy, zero injection
-dario proxy --passthrough --model=opus   # Thin proxy + model override
+dario proxy --thin --model=opus         # Thin proxy + model override
 ```
 ## Model Selection
@@ -235,12 +203,6 @@ dario proxy                   # Passthrough (client decides)
 Full model IDs also work: `--model=claude-opus-4-6`
-Combine with `--cli` for rate-limit-proof Opus:
-```bash
-dario proxy --cli --model=opus
-```
 ## OpenAI Compatibility
 Dario implements `/v1/chat/completions` — any tool built for the OpenAI API works with your Claude subscription. No code changes needed.
@@ -344,8 +306,6 @@ model:
   default: claude-opus-4-6
 ```
-Then run `hermes` normally — it routes through dario using your Claude subscription.
 ### OpenClaw
 Add to your `openclaw.json` models config:
@@ -384,52 +344,81 @@ Add to your `openclaw.json` models config:
 **Note:** Use `http://127.0.0.1:3456` without `/v1` — OpenClaw adds the path itself.
+---
 ## How It Works
 ### Direct API Mode (default) — Template Replay
 ```
-┌──────────┐     ┌─────────────────────┐     ┌──────────────────┐
-│ Your App │ ──> │   dario (proxy)     │ ──> │ api.anthropic.com│
-│          │     │   localhost:3456    │     │                  │
-│ sends    │     │                     │     │  sees a genuine  │
-│ its own  │     │  replaces request   │     │  Claude Code     │
-│ tools &  │     │  with CC template   │     │  request         │
-│ params   │     │  keeps only content │     │                  │
-└──────────┘     └─────────────────────┘     └──────────────────┘
+┌───────────┐     ┌─────────────────────┐     ┌──────────────────┐
+│ Your App  │ ──> │   dario (proxy)     │ ──> │ api.anthropic.com│
+│           │     │   localhost:3456    │     │                  │
+│ sends     │     │                     │     │  sees a genuine  │
+│ its own   │     │  replaces request   │     │  Claude Code     │
+│ tools &   │     │  with CC template   │     │  request         │
+│ params    │     │  keeps only content │     │                  │
+└───────────┘     └─────────────────────┘     └──────────────────┘
 ```
 Your app sends whatever it wants — any tools, any parameters. dario replaces the entire request with Claude Code's template and injects only your conversation content. The upstream sees CC's exact tool definitions, field structure, and parameters.
-### CLI Backend Mode (`--cli`)
+### Passthrough Mode (`--passthrough`)
 ```
-┌──────────┐     ┌─────────────────┐     ┌──────────────────┐
-│ Your App │ ──> │  dario (proxy)  │ ──> │  claude --print  │
-│          │     │  localhost:3456  │     │  (Claude Code)   │
-│ sends    │     │  extracts prompt│     │                  │
-│ API      │     │  spawns CLI     │     │  has priority    │
-│ request  │     │  wraps response │     │  routing         │
-└──────────┘     └─────────────────┘     └──────────────────┘
+┌───────────┐     ┌─────────────────┐     ┌──────────────────┐
+│ Your App  │ ──> │  dario (proxy)  │ ──> │ api.anthropic.com│
+│           │     │  localhost:3456 │     │                  │
+│ sends     │     │  swaps API key  │     │  sees valid      │
+│ API       │     │  for OAuth      │     │  OAuth bearer    │
+│ request   │     │  nothing else   │     │  token           │
+└───────────┘     └─────────────────┘     └──────────────────┘
 ```
-### Passthrough Mode (`--passthrough`)
+### What dario actually sends upstream
+In direct mode, every request dario sends to Anthropic is a genuine Claude Code request. Key fields injected or enforced:
+**Billing tag** — reconstructed using Claude Code's own algorithm extracted from the CC binary:
 ```
-┌──────────┐     ┌─────────────────┐     ┌──────────────────┐
-│ Your App │ ──> │  dario (proxy)  │ ──> │ api.anthropic.com│
-│          │     │  localhost:3456  │     │                  │
-│ sends    │     │  swaps API key  │     │  sees valid      │
-│ API      │     │  for OAuth      │     │  OAuth bearer    │
-│ request  │     │  nothing else   │     │  token           │
-└──────────┘     └─────────────────┘     └──────────────────┘
+x-anthropic-billing-header: cc_version=<version>.<build_tag>; cc_entrypoint=cli; cch=<5-char-hex>;
 ```
+The build tag is `SHA-256(seed + chars[4,7,20] of user message + version).slice(0,3)`. The `cch` is a fresh random 5-char hex per request. Both were extracted via MITM capture.
-1. **`dario login`** — Detects your existing Claude Code credentials (`~/.claude/.credentials.json`) and starts the proxy automatically. If Claude Code isn't installed, runs a PKCE OAuth flow with a local callback server to capture the token automatically.
+**Beta set** — exactly 8 betas from CC, in CC's order:
+```
+claude-code-20250219, oauth-2025-04-20, context-1m-2025-08-07,
+interleaved-thinking-2025-05-14, context-management-2025-06-27,
+prompt-caching-scope-2026-01-05, advisor-tool-2026-03-01, effort-2025-11-24
+```
+**Request headers** — CC's exact Stainless SDK headers, including `x-stainless-runtime-version: v24.3.0` (the Node.js compat version CC reports when running on Bun), `x-app: cli`, `user-agent: claude-cli/<version>`, `anthropic-dangerous-direct-browser-access: true`.
+**Upstream URL** — `api.anthropic.com/v1/messages?beta=true`, matching CC's own request format.
+**Device identity** — `metadata.user_id` loaded from `~/.claude/.claude.json`. Without this, Anthropic classifies the request as third-party and routes it to Extra Usage billing instead of the Max plan allocation.
+**Session ID** — rotates per request via `x-claude-code-session-id`. A persistent session ID across many rapid requests is a behavioral detection signal; CC `--print` creates a new session each invocation.
+**Rate governor** — 500ms minimum between requests (configurable via `DARIO_MIN_INTERVAL_MS`). Configurable for agent workloads that need tighter pacing.
+### OAuth Config Auto-Detection
-2. **`dario proxy`** — Starts an HTTP server on localhost that implements the Anthropic Messages API. In direct mode, it swaps your API key for an OAuth bearer token. In CLI mode, it routes through the Claude Code binary.
+Anthropic periodically rotates the OAuth `client_id`, authorize URL, token URL, and scopes that Claude Code uses. Historically this caused `"Invalid client id"` errors until a new dario release shipped.
-3. **Auto-refresh** — OAuth tokens expire. Dario refreshes them automatically in the background every 15 minutes. Refresh tokens rotate on each use.
+Dario scans the installed CC binary at startup and extracts the current config directly:
+- **Anchor**: `OAUTH_FILE_SUFFIX:"-local-oauth"` — the config block CC uses for clients that run their own localhost callback.
+- **Extracted**: `CLIENT_ID`, `CLAUDE_AI_AUTHORIZE_URL`, `TOKEN_URL`, and the full `user:*` scope string.
+- **Cached**: Results stored at `~/.dario/cc-oauth-cache.json` keyed by binary fingerprint (first 64KB sha256 + size + mtime). Cold scan ~500ms, cache hit ~5ms. Re-scans only when CC is upgraded.
+- **Fallback**: If CC is not installed or scanning fails, dario uses known-good hardcoded values. No user action needed.
+- **Override**: Set `DARIO_CC_PATH=/path/to/claude` to point dario at a non-standard CC binary location.
+CC ships **two** OAuth client configurations in one binary — a `-local-oauth` flow (localhost callback) and a platform-hosted flow (`platform.claude.com/oauth/code/callback`). Dario must use the former. The scanner anchors specifically on the local block.
+End-to-end verification lives at [`test/oauth-detector.mjs`](test/oauth-detector.mjs).
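
Reviewer note: the added README text describes a cache key built from "first 64KB sha256 + size + mtime". The sketch below shows one way such a fingerprint could be computed. It is not taken from the package source; the names `fingerprintFromParts` and `binaryFingerprint` and the `hash:size:mtime` key layout are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";
import { closeSync, fstatSync, openSync, readSync } from "node:fs";

// Assumed constant: hash only the first 64KB, as the README describes.
const HEAD_BYTES = 64 * 1024;

// Hypothetical pure helper: combine the head hash, file size, and mtime
// into one string that changes whenever any of the three changes.
export function fingerprintFromParts(
  head: Uint8Array,
  size: number,
  mtimeMs: number,
): string {
  const headHash = createHash("sha256").update(head).digest("hex");
  return `${headHash}:${size}:${Math.trunc(mtimeMs)}`;
}

// Hypothetical wrapper: open the file read-only, read at most 64KB,
// and build the fingerprint from what was read plus the stat metadata.
export function binaryFingerprint(path: string): string {
  const fd = openSync(path, "r");
  try {
    const stat = fstatSync(fd);
    const head = Buffer.alloc(Math.min(HEAD_BYTES, stat.size));
    const bytesRead = readSync(fd, head, 0, head.length, 0);
    return fingerprintFromParts(head.subarray(0, bytesRead), stat.size, stat.mtimeMs);
  } finally {
    closeSync(fd);
  }
}
```

Hashing only a fixed-size prefix keeps the check cheap on a large binary, while size and mtime catch most changes that leave the first 64KB untouched. A cache keyed this way would be looked up with `binaryFingerprint(path)` and rebuilt on a miss.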
+---
 ## Commands
@@ -446,49 +435,45 @@ Your app sends whatever it wants — any tools, any parameters. dario replaces t
 | Flag/Env | Description | Default |
 |----------|-------------|---------|
-| `--cli` | Use Claude CLI as backend (bypasses rate limits) | off |
-| `--passthrough` | Thin proxy — OAuth swap only, no injection | off |
+| `--passthrough` / `--thin` | Thin proxy — OAuth swap only, no injection | off |
+| `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC tools | off |
 | `--model=MODEL` | Force a model (`opus`, `sonnet`, `haiku`, or full ID) | passthrough |
 | `--port=PORT` | Port to listen on | `3456` |
 | `--verbose` / `-v` | Log every request | off |
-| `DARIO_API_KEY` | If set, all endpoints (except `/health`) require matching `x-api-key` header or `Authorization: Bearer` header | unset (open) |
+| `DARIO_API_KEY` | If set, all endpoints (except `/health`) require matching `x-api-key` or `Authorization: Bearer` | unset (open) |
 | `DARIO_NO_BUN` | Disable automatic Bun relaunch (stay on Node.js) | unset |
 | `DARIO_MIN_INTERVAL_MS` | Minimum ms between requests (rate governor) | `500` |
+| `DARIO_CC_PATH` | Override path to Claude Code binary for OAuth detection | auto-detect |
 ## Supported Features
 ### Direct API Mode
 - All Claude models (Opus 4.6, Sonnet 4.6, Haiku 4.5) + 1M extended context aliases (`opus1m`, `sonnet1m`)
-- **Template replay** (v3.0+) — replaces the entire request with Claude Code's exact template, extracted via MITM capture from CC v2.1.104. 25 tool definitions, 25KB system prompt, exact body key order, exact beta headers (model-conditional), exact metadata structure. Client tools are mapped to CC equivalents and reverse-mapped in responses. Template data stored as JSON for easy updates. See [Discussion 13](https://github.com/askalf/dario/discussions/13) and [Discussion 14](https://github.com/askalf/dario/discussions/14).
-- **Bun auto-relaunch** (v3.2) — auto-detects Bun and relaunches under it for TLS fingerprint fidelity. CC runs on Bun; Node.js has a different TLS fingerprint visible at the network level.
-- **Session ID rotation** (v3.2) — each request gets a fresh session ID, matching CC `--print` behavior.
-- **Rate governor** (v3.2) — 500ms minimum interval between requests prevents inhuman cadence. Configurable via `DARIO_MIN_INTERVAL_MS`.
-- **Enriched 429 errors** — rate limit errors include utilization %, limiting window, and reset time instead of Anthropic's default `"Error"` message
-- **Auto CLI fallback** — if the API returns 429 and Claude Code is installed, transparently retries through `claude --print` with SSE conversion
-- **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool
-- Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming)
-- Tool use / function calling
-- System prompts and multi-turn conversations
-- Prompt caching and extended thinking
-- **Billable beta filtering** — strips `extended-cache-ttl` from client betas (the only prefix requiring Extra Usage)
-- **Beta deduplication** — client-provided betas are deduplicated against the base set before appending
-- **Orchestration tag sanitization** — strips agent-injected XML (`<system-reminder>`, `<env>`, `<task_metadata>`, etc.) before forwarding
-- **Token anomaly detection** — warns on context spike (>60% input growth) or output explosion (>2x previous)
-- Concurrency control (max 10 concurrent upstream requests)
-- CORS enabled (works from browser apps on localhost)
-### CLI Backend Mode
-- All Claude models — including Opus when rate limited
-- Streaming via SSE conversion (client sends `stream: true`, CLI JSON response is converted to Anthropic or OpenAI SSE events)
-- OpenAI compatibility (translates OpenAI → Anthropic before CLI, Anthropic → OpenAI after)
-- System prompts and multi-turn conversations (via context injection)
-- Not affected by API rate limits
+- **Template replay** — replaces the entire request with Claude Code's exact template. 25 tool definitions, 25KB system prompt, exact body key order, exact beta headers (model-conditional), exact metadata structure. Client tools are mapped to CC equivalents and reverse-mapped in responses. Template data stored as JSON for easy updates.
+- **`--preserve-tools` mode** — opt-out of CC tool schema replacement for agent frameworks that rely on their own custom tool definitions. Default mode still remaps for maximum detection resistance.
+- **OAuth auto-detect** — scans the installed CC binary for OAuth config at startup. Stays in sync with whatever CC version you have; falls back to known-good hardcoded values if no binary is found. Override with `DARIO_CC_PATH`.
+- **Bun auto-relaunch** — auto-detects Bun and relaunches under it for TLS fingerprint fidelity. CC runs on Bun; Node.js has a different TLS fingerprint visible at the network level.
+- **Session ID rotation** — each request gets a fresh session ID via `x-claude-code-session-id`, matching CC behavior.
+- **Rate governor** — configurable minimum interval between requests via `DARIO_MIN_INTERVAL_MS`.
+- **Enriched 429 errors** — rate limit errors include utilization %, limiting window, and reset time instead of Anthropic's default `"Error"` message.
+- **Auto-retry on long-context errors** — when Anthropic returns 400 or 429 with `"long context beta is not yet available"` or `"Extra usage is required"`, dario transparently retries without the `context-1m-2025-08-07` beta flag.
+- **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool.
+- Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming).
+- Tool use / function calling.
+- System prompts and multi-turn conversations.
+- Prompt caching and extended thinking.
+- **Billable beta filtering** — strips `extended-cache-ttl` from client betas (the only beta requiring Extra Usage enabled on the account).
+- **Beta deduplication** — client-provided betas are deduplicated against the base set before appending.
+- **Orchestration tag sanitization** — strips agent-injected XML (`<system-reminder>`, `<env>`, `<task_metadata>`, etc.) before forwarding.
+- **Token anomaly detection** — warns on context spike (>60% input growth) or output explosion (>2x previous).
+- Concurrency control (max 10 concurrent upstream requests).
+- CORS enabled (works from browser apps on localhost).
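
Reviewer note: the "Token anomaly detection" bullet above gives two thresholds (more than 60% input growth, more than 2x the previous output). A minimal sketch of a check with those thresholds follows. The `Usage` shape, the `detectAnomalies` name, and the warning strings are hypothetical, and the package's actual logic may differ.

```typescript
// Hypothetical per-request usage record.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Compare the current request against the previous one and return
// a warning for each threshold from the README that is exceeded.
export function detectAnomalies(prev: Usage | undefined, curr: Usage): string[] {
  const warnings: string[] = [];
  if (prev === undefined) return warnings; // nothing to compare against yet

  // Context spike: input grew by more than 60% (integer math avoids float rounding).
  if (prev.inputTokens > 0 &&
      (curr.inputTokens - prev.inputTokens) * 100 > prev.inputTokens * 60) {
    warnings.push("context spike");
  }

  // Output explosion: output is more than twice the previous output.
  if (prev.outputTokens > 0 && curr.outputTokens > prev.outputTokens * 2) {
    warnings.push("output explosion");
  }
  return warnings;
}
```

Both comparisons are strict, so growth of exactly 60% or an output of exactly 2x does not warn. That matches the ">" wording in the bullet, though the real boundary behavior is not stated in the diff.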
 ### Passthrough Mode
-- All Claude models with native streaming and tool use
-- OAuth token swap only — no billing tag, thinking, effort, or device identity injection
-- Minimal beta flags (`oauth-2025-04-20` + client betas only)
-- For tools that need exact Anthropic protocol fidelity with zero modification
+- All Claude models with native streaming and tool use.
+- OAuth token swap only — no billing tag, no template injection, no device identity.
+- Minimal beta flags (`oauth-2025-04-20` + client betas only).
+- For tools that need exact Anthropic protocol fidelity with zero modification.
 ## Endpoints
@@ -521,17 +506,45 @@ curl http://localhost:3456/health
 |---------|---------------------|
 | Credential storage | Reads from Claude Code (`~/.claude/.credentials.json`) or its own store (`~/.dario/credentials.json`) with `0600` permissions |
 | OAuth flow | PKCE (Proof Key for Code Exchange) — no client secret needed |
-| Token transmission | OAuth tokens never leave localhost. Only forwarded to `api.anthropic.com` over HTTPS |
-| Network exposure | Proxy binds to `127.0.0.1` only — not accessible from other machines |
-| SSRF protection | Hardcoded allowlist of API paths — only `/v1/messages`, `/v1/models`, `/v1/complete` are proxied |
-| Token rotation | Refresh tokens rotate on every use (single-use) |
-| Error sanitization | Token patterns redacted from all error messages |
-| Data collection | Zero. No telemetry, no analytics, no phoning home |
+| OAuth config source | Auto-detected from local CC binary at runtime; cached at `~/.dario/cc-oauth-cache.json`. Detector reads binary in read-only mode, never modifies it. |
+| Token exposure | Tokens never logged; redacted from all error messages. |
+| Network binding | Binds exclusively to `127.0.0.1`. Upstream traffic goes only to `api.anthropic.com` over HTTPS. |
+| Auth timing | `timingSafeEqual` used for `DARIO_API_KEY` comparison. |
+| SSRF protection | Only `/v1/messages` and `/v1/complete` are proxied upstream — hardcoded allowlist. |
+| Body size | 10MB hard cap per request. 30s read timeout prevents slow-loris. |
+| Token refresh | Auto-refreshes 30 minutes before expiry. Refresh tokens rotate on each use. Mutex prevents concurrent refresh races. |
+| Telemetry | None. Zero analytics, tracking, or data collection of any kind. |
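
Reviewer note: the "Auth timing" row says `timingSafeEqual` is used for the `DARIO_API_KEY` comparison, and the options table says the key may arrive as `x-api-key` or `Authorization: Bearer`. The sketch below shows how such a check is commonly written with Node's `crypto` module. The function names and the hash-before-compare step are assumptions, not the package's code.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: constant-time comparison of a client-supplied key.
// Both values are hashed first so the buffers always have equal length,
// because timingSafeEqual throws when the lengths differ.
export function apiKeyMatches(provided: string | undefined, expected: string): boolean {
  if (provided === undefined) return false;
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Hypothetical helper: accept either `x-api-key: <key>` or
// `Authorization: Bearer <key>` (header names assumed lower-cased).
export function extractKey(
  headers: Record<string, string | undefined>,
): string | undefined {
  const direct = headers["x-api-key"];
  if (direct) return direct;
  const auth = headers["authorization"];
  if (auth && auth.startsWith("Bearer ")) return auth.slice("Bearer ".length);
  return undefined;
}
```

A request handler would call `apiKeyMatches(extractKey(req.headers), expectedKey)` and reject on `false`. Comparing with `===` instead can return early at the first differing character, which leaks timing information.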
+---
+## askalf
+dario solves the API access problem — your $200/mo subscription, usable everywhere, billed correctly.
+But a proxy has a ceiling. Every request still runs on your single account, with your subscription's rate limits, on your machine. When you need to scale beyond that — multiple accounts, persistent browser sessions, desktop control, scheduled workflows, a fleet of agents that can run while you sleep — that's what [askalf](https://askalf.org) is built for.
+**askalf** is the agent platform built on top of the same OAuth and billing infrastructure that powers dario:
+| | dario | askalf |
+|---|---|---|
+| **What it is** | Local proxy, single account | Hosted agent fleet, multi-account |
+| **Rate limits** | Your subscription's limits | Distributed across fleet, near-zero 429s |
+| **Browser / desktop** | No | Yes — full computer use |
+| **Scheduling** | No | Yes — cron, webhooks, triggers |
+| **Persistent memory** | No | Yes — per-agent memory and context |
+| **Custom tools** | Via `--preserve-tools` | Native MCP tool server |
+| **Setup** | 2 commands | Waitlist → dashboard |
+If you're running multi-agent workflows, hitting rate limits on Claude Max, or want agents that run 24/7 without babysitting, **[join the waitlist at askalf.org](https://askalf.org)**.
+dario will always be open-source and free. askalf is the hosted tier for teams who need more.
+---
 ## FAQ
 **Does this violate Anthropic's terms of service?**
-Dario uses your existing Claude Code credentials with the same OAuth tokens. It authenticates you as you, with your subscription, through Anthropic's official API. The `--cli` mode literally uses Claude Code itself as the backend.
+Dario uses your existing Claude Code credentials with the same OAuth tokens. It authenticates you as you, with your subscription, through Anthropic's official API.
 **What subscription plans work?**
 Claude Max and Claude Pro. Any plan that lets you use Claude Code.
@@ -540,7 +553,7 @@ Claude Max and Claude Pro. Any plan that lets you use Claude Code.
 Should work if your plan includes Claude Code access. Not tested yet — please open an issue with results.
 **Do I need Claude Code installed?**
-Recommended but not required. If Claude Code is installed and logged in, `dario login` picks up your credentials automatically. Without Claude Code, dario runs its own OAuth flow to authenticate directly. Note: `--cli` mode requires Claude Code (`npm install -g @anthropic-ai/claude-code`).
+Recommended but not required. If Claude Code is installed and logged in, `dario login` picks up your credentials automatically. Without Claude Code, dario runs its own OAuth flow to authenticate directly.
 **First time setup — account priming**
 If dario is the first thing you use with a new Claude account, run a few real Claude Code commands first to establish a session baseline:
@@ -557,14 +570,19 @@ Optional but recommended. If [Bun](https://bun.sh) is installed, dario auto-rela
 **What happens when my token expires?**
 Dario auto-refreshes tokens 30 minutes before expiry. You should never see an auth error in normal use. If something goes wrong, `dario refresh` forces an immediate refresh.
-**I'm getting rate limited on Opus. What do I do?**
-Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. In default mode, dario automatically falls back to CLI when it detects a 429 (if Claude Code is installed). Rate limit errors include utilization percentages and reset times so you can see exactly when capacity returns. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
+**What happens when Anthropic rotates the OAuth client_id or URL?**
+Dario auto-detects OAuth config from your installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next startup — no dario release needed. The detector is cached at `~/.dario/cc-oauth-cache.json` and only re-scans when the binary fingerprint changes. If CC isn't installed, dario falls back to known-good hardcoded values.
+**I'm hitting rate limits. What do I do?**
+Claude subscriptions have rolling 5-hour and 7-day usage windows. Check your utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). Rate limit errors from dario include utilization percentages and reset times so you can see exactly when capacity returns.
+If you're running a multi-agent workload and consistently hitting limits, [askalf](https://askalf.org) distributes load across multiple accounts automatically.
 **What are the usage limits?**
-Claude subscriptions have rolling 5-hour and 7-day usage windows shared across claude.ai and Claude Code. See [Anthropic's docs](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) for details. In Claude Code, use `/usage` to check your current limits, or configure the [statusline](https://code.claude.com/docs/en/statusline) to show real-time 5h and 7d utilization percentages.
+Claude subscriptions have rolling 5-hour and 7-day usage windows shared across claude.ai and Claude Code. See [Anthropic's docs](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) for details.
 **Can I run this on a server?**
-Dario binds to localhost by default. For server use, you'd need to handle the initial login on a machine with a browser, then copy `~/.claude/.credentials.json` (or `~/.dario/credentials.json`) to your server. Auto-refresh will keep it alive from there.
+Dario binds to localhost by default. For server use, handle the initial login on a machine with a browser, then copy `~/.claude/.credentials.json` (or `~/.dario/credentials.json`) to your server. Auto-refresh will keep it alive from there.
 **Why "dario"?**
 Named after [Dario Amodei](https://en.wikipedia.org/wiki/Dario_Amodei), CEO of Anthropic.
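
Reviewer note on the new OAuth FAQ entry in this hunk: the README says the detector result is cached and only re-scanned when the binary fingerprint changes, with a hardcoded fallback when Claude Code is not installed. The sketch below shows one way that flow could look. It is not taken from `src/cc-oauth-detect.ts` (which this diff does not show); the `OAuthConfig` shape, the size-plus-mtime fingerprint, and the `resolveOAuthConfig` name are all assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto';
import { existsSync, readFileSync, statSync, writeFileSync } from 'node:fs';

// Hypothetical shapes: the real cache file layout is not shown in this diff.
interface OAuthConfig { clientId: string; authorizeUrl: string }
interface CacheEntry { fingerprint: string; config: OAuthConfig }

// Assumed fingerprint: hash of file size + mtime, cheaper than hashing the whole binary.
function binaryFingerprint(binaryPath: string): string {
  const s = statSync(binaryPath);
  return createHash('sha256').update(`${s.size}:${s.mtimeMs}`).digest('hex').slice(0, 16);
}

function resolveOAuthConfig(
  binaryPath: string,
  cachePath: string,
  scan: (path: string) => OAuthConfig, // expensive binary scan, injected by the caller
  fallback: OAuthConfig,               // known-good hardcoded values
): OAuthConfig {
  // No binary installed: use the fallback and leave the cache untouched.
  if (!existsSync(binaryPath)) return fallback;

  const fp = binaryFingerprint(binaryPath);
  if (existsSync(cachePath)) {
    try {
      const cached = JSON.parse(readFileSync(cachePath, 'utf-8')) as CacheEntry;
      if (cached.fingerprint === fp) return cached.config; // cache hit, skip the scan
    } catch {
      // Corrupt cache file: fall through and re-scan.
    }
  }

  const config = scan(binaryPath);
  writeFileSync(cachePath, JSON.stringify({ fingerprint: fp, config }), { mode: 0o600 });
  return config;
}
```

Injecting `scan` keeps the caching logic separate from however the values are actually extracted from the binary.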
@@ -579,12 +597,12 @@ import { startProxy, getAccessToken, getStatus } from "@askalf/dario";
 // Start the proxy programmatically
 await startProxy({ port: 3456, verbose: true });
-// CLI backend mode
-await startProxy({ port: 3456, cliBackend: true, model: "opus" });
 // Passthrough mode (OAuth swap only, no injection)
 await startProxy({ port: 3456, passthrough: true });
+// Preserve-tools mode (keep client tool schemas)
+await startProxy({ port: 3456, preserveTools: true });
 // Or just get a raw access token
 const token = await getAccessToken();
@@ -599,7 +617,7 @@ Dario handles your OAuth tokens. Here's why you can trust it:
 | Signal | Status |
 |--------|--------|
-| **Source code** | ~2,100 lines of TypeScript — small enough to audit in one sitting |
+| **Source code** | ~2,000 lines of TypeScript — small enough to audit in one sitting |
 | **Dependencies** | 0 runtime dependencies. Verify: `npm ls --production` |
 | **npm provenance** | Every release is [SLSA attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions |
 | **Security scanning** | [CodeQL](https://github.com/askalf/dario/actions/workflows/codeql.yml) runs on every push and weekly |
@@ -625,21 +643,22 @@ cd $(npm root -g)/@askalf/dario && npm ls --production
 |-------|------|
 | v3.0 Template Replay — why we stopped matching signals | [Discussion 14](https://github.com/askalf/dario/discussions/14) |
 | Claude Code defaults are detection signals, not optimizations | [Discussion 13](https://github.com/askalf/dario/discussions/13) |
-| Why Opus 4.6 feels worse and how to fix it | [Discussion 9](https://github.com/askalf/dario/discussions/9) |
+| Why Opus feels worse through other proxies and how to fix it | [Discussion 9](https://github.com/askalf/dario/discussions/9) |
 | Billing tag algorithm and fingerprint analysis | [Discussion 8](https://github.com/askalf/dario/discussions/8) |
 | Rate limit header analysis | [Discussion 1](https://github.com/askalf/dario/discussions/1) |
 ## Contributing
-PRs welcome. The codebase is ~2,100 lines of TypeScript across 5 files:
+PRs welcome. The codebase is ~2,000 lines of TypeScript across 7 files:
 | File | Purpose |
 |------|---------|
-| `src/proxy.ts` | HTTP proxy server, CLI backend, rate governor |
-| `src/cc-template.ts` | CC template engine + tool mapping |
-| `src/cc-template-data.json` | MITM-extracted CC data (25 tools, 25KB system prompt) |
-| `src/oauth.ts` | Token storage, refresh, credential detection |
-| `src/cli.ts` | CLI entry point + Bun auto-relaunch |
+| `src/proxy.ts` | HTTP proxy server, rate governor, billing tag, response forwarding |
+| `src/cc-template.ts` | Template engine, tool mapping, orchestration sanitization |
+| `src/cc-template-data.json` | CC request template data (25 tools, 25KB system prompt) |
+| `src/cc-oauth-detect.ts` | Auto-detect OAuth config from the installed CC binary |
+| `src/oauth.ts` | Token storage, PKCE flow, auto-refresh, credential detection |
+| `src/cli.ts` | CLI entry point, command routing, Bun auto-relaunch |
 | `src/index.ts` | Library exports |
 ```bash
@@ -654,7 +673,7 @@ npm run dev   # runs with tsx (no build needed)
 | Who | Contributions |
 |-----|---------------|
 | [@GodsBoy](https://github.com/GodsBoy) | Proxy authentication, token redaction, error sanitization ([#2](https://github.com/askalf/dario/pull/2)) |
-| [@belangertrading](https://github.com/belangertrading) | Billing classification investigation ([#4](https://github.com/askalf/dario/issues/4)), Opus/Sonnet 429 diagnosis + CLI fallback workaround ([#6](https://github.com/askalf/dario/issues/6)), billing reclassification root cause ([#7](https://github.com/askalf/dario/issues/7)) |
+| [@belangertrading](https://github.com/belangertrading) | Billing classification investigation ([#4](https://github.com/askalf/dario/issues/4)), billing reclassification root cause ([#7](https://github.com/askalf/dario/issues/7)) |
 ## License

package/dist/cc-template.d.ts CHANGED

@@ -1,11 +1,11 @@
 /**
- * Claude Code request template — auto-extracted from CC v2.1.104 MITM capture.
+ * Claude Code request template.
  *
  * Tool definitions, system prompt, and request structure are loaded from
- * cc-template-data.json (extracted via MITM proxy from real CC session).
- * This ensures byte-level fidelity with real CC requests.
+ * cc-template-data.json and sent verbatim — this gives byte-level fidelity
+ * with the shape of a real Claude Code request.
  */
-/** CC's exact tool definitions — loaded from MITM capture. */
+/** CC's exact tool definitions — loaded from the template JSON. */
 export declare const CC_TOOL_DEFINITIONS: {
     name: string;
     description: string;

package/dist/cc-template.js CHANGED

@@ -1,9 +1,9 @@
 /**
- * Claude Code request template — auto-extracted from CC v2.1.104 MITM capture.
+ * Claude Code request template.
  *
  * Tool definitions, system prompt, and request structure are loaded from
- * cc-template-data.json (extracted via MITM proxy from real CC session).
- * This ensures byte-level fidelity with real CC requests.
+ * cc-template-data.json and sent verbatim — this gives byte-level fidelity
+ * with the shape of a real Claude Code request.
  */
 import { readFileSync } from 'node:fs';
 import { join, dirname } from 'node:path';
@@ -11,7 +11,7 @@ import { fileURLToPath } from 'node:url';
 const __dirname = dirname(fileURLToPath(import.meta.url));
 // Load template data at module init — fail fast if missing
 const TEMPLATE = JSON.parse(readFileSync(join(__dirname, 'cc-template-data.json'), 'utf-8'));
-/** CC's exact tool definitions — loaded from MITM capture. */
+/** CC's exact tool definitions — loaded from the template JSON. */
 export const CC_TOOL_DEFINITIONS = TEMPLATE.tools;
 /** CC's static system prompt (~25KB). */
 export const CC_SYSTEM_PROMPT = TEMPLATE.system_prompt;
@@ -182,7 +182,7 @@ export function buildCCRequest(clientBody, billingTag, cache1h, identity, opts =
         systemText = systemText.replace(pattern, '');
     }
     // ── Build the CC request from template ──
-    // Key order matches CC v2.1.104 MITM capture exactly:
+    // Key order matches CC v2.1.104 exactly:
     // model, messages, system, tools, metadata, max_tokens, thinking, context_management, output_config, stream
     //
     // System prompt structure (3 blocks, matching real CC):

package/dist/cli.js CHANGED

@@ -126,12 +126,11 @@ async function proxy() {
         process.exit(1);
     }
     const verbose = args.includes('--verbose') || args.includes('-v');
-    const cliBackend = args.includes('--cli');
     const passthrough = args.includes('--passthrough') || args.includes('--thin');
     const preserveTools = args.includes('--preserve-tools') || args.includes('--keep-tools');
     const modelArg = args.find(a => a.startsWith('--model='));
     const model = modelArg ? modelArg.split('=')[1] : undefined;
-    await startProxy({ port, verbose, model, cliBackend, passthrough, preserveTools });
+    await startProxy({ port, verbose, model, passthrough, preserveTools });
 }
 async function help() {
     console.log(`
@@ -149,15 +148,14 @@ async function help() {
                              Shortcuts: opus, sonnet, haiku
                              Full IDs: claude-opus-4-6, claude-sonnet-4-6
                              Default: passthrough (client decides)
-    --cli                    Use Claude CLI as backend (bypasses rate limits)
-    --passthrough            Thin proxy — OAuth swap only, no injection
+    --passthrough, --thin    Thin proxy — OAuth swap only, no injection
     --preserve-tools         Keep client tool schemas (for agents with custom tools)
     --port=PORT              Port to listen on (default: 3456)
     --verbose, -v            Log all requests
   Quick start:
     dario login              # auto-detects Claude Code credentials
-    dario proxy              # or: dario proxy --cli --model=opus
+    dario proxy --model=opus # or: dario proxy --passthrough
   Then point any Anthropic SDK at http://localhost:3456:
     export ANTHROPIC_BASE_URL=http://localhost:3456

package/dist/proxy.d.ts CHANGED

@@ -2,7 +2,6 @@ interface ProxyOptions {
     port?: number;
     verbose?: boolean;
     model?: string;
-    cliBackend?: boolean;
     passthrough?: boolean;
     preserveTools?: boolean;
 }

package/dist/proxy.js CHANGED

@@ -1,9 +1,9 @@
 import { createServer } from 'node:http';
 import { randomUUID, randomBytes, timingSafeEqual, createHash } from 'node:crypto';
-import { execSync, spawn } from 'node:child_process';
-import { readFileSync, readdirSync, writeFileSync, unlinkSync } from 'node:fs';
+import { execSync } from 'node:child_process';
+import { readFileSync, readdirSync } from 'node:fs';
 import { join } from 'node:path';
-import { homedir, tmpdir } from 'node:os';
+import { homedir } from 'node:os';
 import { arch, platform } from 'node:process';
 import { getAccessToken, getStatus } from './oauth.js';
 import { buildCCRequest, reverseMapResponse } from './cc-template.js';
@@ -36,63 +36,29 @@ class Semaphore {
             next();
     }
 }
-// Billing tag hash seed — extracted from Claude Code binary (constant XGA)
+// Billing tag hash seed — matches Claude Code's value
 const BILLING_SEED = '59cf53e54c78';
-// Compute per-request build tag matching Claude Code's Oz$ algorithm:
+// Compute per-request build tag:
 // SHA-256(seed + chars[4,7,20] of user message + version).slice(0,3)
 function computeBuildTag(userMessage, version) {
     const chars = [4, 7, 20].map(i => userMessage[i] || '0').join('');
     return createHash('sha256').update(`${BILLING_SEED}${chars}${version}`).digest('hex').slice(0, 3);
 }
-// Per-request cch: real Claude Code generates a random 5-char hex value each request.
-// Confirmed via MITM: 10 identical requests → 10 unique cch values, no deterministic pattern.
+// Per-request cch: random 5-char hex value each request (Claude Code does the same).
 function computeCch() {
     return randomBytes(3).toString('hex').slice(0, 5);
 }
-// Detect installed Claude Code binary at startup (single exec for both version + availability)
-let cliAvailable = false;
-function detectCli() {
+// Detect installed Claude Code version for the build-tag computation.
+// Falls back to a known-good version if claude isn't on PATH.
+function detectCliVersion() {
     try {
         const out = execSync('claude --version', { timeout: 5000, stdio: 'pipe' }).toString().trim();
-        cliAvailable = true;
-        // Capture major version (e.g., 2.1.100) — build tag is computed per-request
         return out.match(/^([\d]+\.[\d]+\.[\d]+)/)?.[1] ?? '2.1.100';
     }
     catch {
-        cliAvailable = false;
         return '2.1.100';
     }
 }
-/** Convert a non-streaming Messages API response to SSE event stream. */
-function jsonToSse(jsonBody) {
-    try {
-        const msg = JSON.parse(jsonBody);
-        const events = [];
-        // message_start
-        events.push(`event: message_start\ndata: ${JSON.stringify({ type: 'message_start', message: { ...msg, content: [], stop_reason: null } })}\n\n`);
-        // content blocks
-        const content = msg.content;
-        if (content) {
-            for (let i = 0; i < content.length; i++) {
-                const block = content[i];
-                events.push(`event: content_block_start\ndata: ${JSON.stringify({ type: 'content_block_start', index: i, content_block: { type: block.type, ...(block.type === 'text' ? { text: '' } : { thinking: '' }) } })}\n\n`);
-                if (block.type === 'text' && block.text) {
-                    events.push(`event: content_block_delta\ndata: ${JSON.stringify({ type: 'content_block_delta', index: i, delta: { type: 'text_delta', text: block.text } })}\n\n`);
-                }
-                else if (block.type === 'thinking' && block.thinking) {
-                    events.push(`event: content_block_delta\ndata: ${JSON.stringify({ type: 'content_block_delta', index: i, delta: { type: 'thinking_delta', thinking: block.thinking } })}\n\n`);
-                }
-                events.push(`event: content_block_stop\ndata: ${JSON.stringify({ type: 'content_block_stop', index: i })}\n\n`);
-            }
-        }
-        // message_stop
-        events.push(`event: message_stop\ndata: ${JSON.stringify({ type: 'message_stop' })}\n\n`);
-        return events.join('');
-    }
-    catch {
-        return '';
-    }
-}
 /** Extract first user message text from a request body for billing tag computation. */
 function extractFirstUserMessage(body) {
     const messages = body.messages;
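
Reviewer note on the build-tag and `cch` helpers in this hunk: the typed sketch below restates both so their behavior can be checked in isolation. The function bodies follow the lines shown in the diff; the TypeScript annotations are additions, and the observations after the block are derived from reading the code, not from the package's documentation.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Seed value as it appears in the diff.
const BILLING_SEED = '59cf53e54c78';

// SHA-256 over seed + three sampled characters + version, truncated to 3 hex chars.
function computeBuildTag(userMessage: string, version: string): string {
  // Positions past the end of the message are padded with '0'.
  const chars = [4, 7, 20].map(i => userMessage[i] || '0').join('');
  return createHash('sha256')
    .update(`${BILLING_SEED}${chars}${version}`)
    .digest('hex')
    .slice(0, 3);
}

// 3 random bytes give 6 hex characters; the first 5 are kept.
function computeCch(): string {
  return randomBytes(3).toString('hex').slice(0, 5);
}
```

Two consequences follow from the code as written. Any message shorter than five characters samples as `000`, so all such messages share one tag for a given version. And because only positions 4, 7, and 20 feed the hash, two different messages that agree at those positions also produce the same tag.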
@@ -109,42 +75,8 @@ function extractFirstUserMessage(body) {
     }
     return '';
 }
-/** Convert CLI JSON response to OpenAI SSE format. */
-function jsonToOpenaiSse(jsonBody) {
-    try {
-        const parsed = JSON.parse(jsonBody);
-        const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
-        const ts = Math.floor(Date.now() / 1000);
-        return `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n` +
-            `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
-    }
-    catch {
-        return '';
-    }
-}
-/** Send a CLI result to the client, handling streaming/format translation. */
-function sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, securityHeaders) {
-    const headers = { 'Access-Control-Allow-Origin': corsOrigin, ...securityHeaders };
-    const ok = cliResult.status >= 200 && cliResult.status < 300;
-    if (ok && clientWantsStream) {
-        const sseData = isOpenAI ? jsonToOpenaiSse(cliResult.body) : jsonToSse(cliResult.body);
-        if (sseData) {
-            res.writeHead(200, { 'Content-Type': 'text/event-stream', ...headers });
-            res.end(sseData);
-            return;
-        }
-    }
-    if (ok && isOpenAI) {
-        try {
-            cliResult.body = JSON.stringify(anthropicToOpenai(JSON.parse(cliResult.body)));
-        }
-        catch { }
-    }
-    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, ...headers });
-    res.end(cliResult.body);
-}
-// Session ID rotates per request — each CC --print invocation creates a new session.
-// A persistent session ID across many requests is a detection signal.
+// Session ID rotates per request — fresh UUID per invocation.
+// A persistent session ID across many requests is a behavioral fingerprint.
 let SESSION_ID = randomUUID();
 const OS_NAME = platform === 'win32' ? 'Windows' : platform === 'darwin' ? 'MacOS' : 'Linux';
 // Claude Code device identity — required for Max plan billing classification.
@@ -205,8 +137,7 @@ function filterBillableBetas(betas) {
 const ORCHESTRATION_TAG_NAMES = [
     'system-reminder', 'env', 'system_information', 'current_working_directory',
     'operating_system', 'default_shell', 'home_directory', 'task_metadata',
-    'tool_exec', 'tool_output', 'skill_content', 'skill_files',
-    'directories', 'available_skills', 'thinking',
+    'directories', 'thinking',
 ];
 const ORCHESTRATION_PATTERNS = ORCHESTRATION_TAG_NAMES.flatMap(tag => [
     new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}>`, 'gi'),
@@ -375,123 +306,6 @@ function enrich429(body, headers) {
         return body;
     }
 }
-/**
- * CLI Backend: route requests through `claude --print` instead of direct API.
- * This bypasses rate limiting because Claude Code's binary has priority routing.
- */
-async function handleViaCli(body, model, verbose) {
-    try {
-        const parsed = JSON.parse(body.toString());
-        // Extract the last user message as the prompt
-        const messages = parsed.messages ?? [];
-        const lastUser = [...messages].reverse().find(m => m.role === 'user');
-        if (!lastUser) {
-            return { status: 400, body: JSON.stringify({ error: 'No user message' }), contentType: 'application/json' };
-        }
-        const rawModel = model ?? parsed.model ?? 'claude-opus-4-6';
-        // Validate model name — only allow alphanumeric, hyphens, dots, underscores
-        const effectiveModel = /^[a-zA-Z0-9._-]+$/.test(rawModel) ? rawModel : 'claude-opus-4-6';
-        const prompt = typeof lastUser.content === 'string'
-            ? lastUser.content
-            : JSON.stringify(lastUser.content);
-        // Build claude --print command
-        const args = ['--print', '--model', effectiveModel];
-        // Flatten system prompt — API accepts string or array of content blocks,
-        // but claude --print only accepts a string
-        let systemPrompt = '';
-        if (typeof parsed.system === 'string') {
-            systemPrompt = parsed.system;
-        }
-        else if (Array.isArray(parsed.system)) {
-            systemPrompt = parsed.system
-                .filter(b => b.text)
-                .map(b => b.text)
-                .join('\n\n');
-        }
-        // Include conversation history as context
-        const history = messages.slice(0, -1);
-        if (history.length > 0) {
-            const historyText = history.map(m => `${m.role}: ${typeof m.content === 'string' ? m.content : JSON.stringify(m.content)}`).join('\n');
-            systemPrompt = systemPrompt ? `${systemPrompt}\n\nConversation history:\n${historyText}` : `Conversation history:\n${historyText}`;
-        }
-        // Write system prompt to temp file instead of passing as arg to avoid E2BIG
-        // on large conversation contexts (OS arg size limit ~2MB)
-        let systemPromptFile = null;
-        if (systemPrompt) {
-            systemPromptFile = join(tmpdir(), `dario-sysprompt-${randomUUID()}.txt`);
-            writeFileSync(systemPromptFile, systemPrompt, { mode: 0o600 });
-            args.push('--append-system-prompt-file', systemPromptFile);
-        }
-        if (verbose) {
-            console.log(`[dario:cli] model=${effectiveModel} prompt=${prompt.substring(0, 60)}...`);
-        }
-        // Spawn claude --print
-        return new Promise((resolve) => {
-            // Cleanup temp file when done
-            const cleanup = () => { if (systemPromptFile)
-                try {
-                    unlinkSync(systemPromptFile);
-                }
-                catch { } };
-            const child = spawn('claude', args, {
-                stdio: ['pipe', 'pipe', 'pipe'],
-                timeout: 300_000,
-            });
-            let stdout = '';
-            let stderr = '';
-            const MAX_CLI_OUTPUT = 5_000_000; // 5MB cap per stream — prevents OOM from runaway CLI
-            child.stdout.on('data', (d) => { if (stdout.length < MAX_CLI_OUTPUT)
-                stdout += d.toString(); });
-            child.stderr.on('data', (d) => { if (stderr.length < MAX_CLI_OUTPUT)
-                stderr += d.toString(); });
-            child.stdin.write(prompt);
-            child.stdin.end();
-            child.on('close', (code) => {
-                cleanup();
-                if (code !== 0 || !stdout.trim()) {
-                    resolve({
-                        status: 502,
-                        body: JSON.stringify({ type: 'error', error: { type: 'api_error', message: sanitizeError(stderr.substring(0, 200)) || 'CLI backend failed' } }),
-                        contentType: 'application/json',
-                    });
-                    return;
-                }
-                // Build a proper Messages API response
-                const text = stdout.trim();
-                const estimatedTokens = Math.ceil(text.length / 4);
-                const response = {
-                    id: `msg_${randomUUID().replace(/-/g, '').substring(0, 24)}`,
-                    type: 'message',
-                    role: 'assistant',
-                    model: effectiveModel,
-                    content: [{ type: 'text', text }],
-                    stop_reason: 'end_turn',
-                    stop_sequence: null,
-                    usage: {
-                        input_tokens: Math.ceil(prompt.length / 4),
-                        output_tokens: estimatedTokens,
-                    },
-                };
-                resolve({ status: 200, body: JSON.stringify(response), contentType: 'application/json' });
-            });
-            child.on('error', (err) => {
-                cleanup();
-                resolve({
-                    status: 502,
-                    body: JSON.stringify({ type: 'error', error: { type: 'api_error', message: 'Claude CLI not found. Install Claude Code first.' } }),
-                    contentType: 'application/json',
-                });
-            });
-        });
-    }
-    catch (err) {
-        return {
-            status: 400,
-            body: JSON.stringify({ type: 'error', error: { type: 'invalid_request_error', message: 'Invalid request body' } }),
-            contentType: 'application/json',
-        };
-    }
-}
 export async function startProxy(opts = {}) {
     const port = opts.port ?? DEFAULT_PORT;
     const verbose = opts.verbose ?? false;
@@ -502,7 +316,7 @@ export async function startProxy(opts = {}) {
         console.error('[dario] Not authenticated. Run `dario login` first.');
         process.exit(1);
     }
-    const cliVersion = detectCli();
+    const cliVersion = detectCliVersion();
     const modelOverride = opts.model ? (MODEL_ALIASES[opts.model] ?? opts.model) : null;
     const identity = loadClaudeIdentity();
     if (identity.deviceId) {
@@ -512,7 +326,7 @@ export async function startProxy(opts = {}) {
         console.warn('[dario] WARNING: No Claude Code device identity found. Requests may be billed as Extra Usage.');
         console.warn('[dario] Run Claude Code at least once to generate ~/.claude/.claude.json');
     }
-    // Pre-build static headers (matches real Claude Code captured via MITM)
+    // Pre-build static headers — matches the set a real Claude Code client sends.
     const staticHeaders = passthrough ? {
         'accept': 'application/json',
         'Content-Type': 'application/json',
@@ -531,12 +345,10 @@ export async function startProxy(opts = {}) {
         // Claude Code runs on Bun which reports v24.3.0 as Node compat version
         'x-stainless-runtime-version': 'v24.3.0',
     };
-    const useCli = opts.cliBackend ?? false;
     let requestCount = 0;
     const semaphore = new Semaphore(MAX_CONCURRENT);
-    // Rate governor: CC --print takes ~2-3s per invocation.
-    // Rapid-fire requests from one "session" is an inhuman pattern.
-    // Minimum 500ms between requests — fast enough for agents, slow enough to not flag.
+    // Rate governor — minimum 500ms between requests. Fast enough for agents,
+    // slow enough to not look like a scripted flood of identical traffic.
     let lastRequestTime = 0;
     const MIN_REQUEST_INTERVAL_MS = parseInt(process.env.DARIO_MIN_INTERVAL_MS || '500', 10);
     // Optional proxy authentication — pre-encode key buffer for performance
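
Reviewer note on the rate governor in this hunk: the sketch below isolates the minimum-interval logic so it can be reasoned about without the HTTP server. The `delayNeeded`, `parseInterval`, and `RateGovernor` names are hypothetical and do not appear in the package. The `NaN` guard in `parseInterval` is an addition; the diff itself passes the environment variable straight to `parseInt`.

```typescript
// Milliseconds a request must still wait so that at least minIntervalMs
// separates it from the previous request.
function delayNeeded(nowMs: number, lastRequestMs: number, minIntervalMs: number): number {
  return Math.max(0, minIntervalMs - (nowMs - lastRequestMs));
}

// Parse an interval such as process.env.DARIO_MIN_INTERVAL_MS, falling back
// to the default when the value is missing, negative, or not a number.
function parseInterval(raw: string | undefined, fallbackMs = 500): number {
  const n = parseInt(raw ?? '', 10);
  return Number.isFinite(n) && n >= 0 ? n : fallbackMs;
}

class RateGovernor {
  private lastRequestMs = 0;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Sleep if the previous request was too recent, then record this one.
  async wait(): Promise<void> {
    const delay = delayNeeded(Date.now(), this.lastRequestMs, this.minIntervalMs);
    if (delay > 0) {
      await new Promise<void>(resolve => setTimeout(resolve, delay));
    }
    this.lastRequestMs = Date.now();
  }
}
```

A caller would construct one governor per proxy, for example `new RateGovernor(parseInterval(process.env.DARIO_MIN_INTERVAL_MS))`, and `await governor.wait()` before each upstream request. As in the diff, callers that arrive at the same instant compute the same delay, so this spaces sequential traffic but is not a strict queue.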
@@ -656,31 +468,6 @@ export async function startProxy(opts = {}) {
                 clearTimeout(bodyTimeout);
             }
             const body = Buffer.concat(chunks);
-            // CLI backend mode: route through claude --print (works for both Anthropic and OpenAI endpoints)
-            if (useCli && req.method === 'POST' && body.length > 0) {
-                let cliBody = body;
-                let clientWantsStream = false;
-                // Translate OpenAI format before passing to CLI
-                if (isOpenAI) {
-                    try {
-                        const parsed = JSON.parse(body.toString());
-                        clientWantsStream = !!parsed.stream;
-                        cliBody = Buffer.from(JSON.stringify(openaiToAnthropic(parsed, modelOverride)));
-                    }
-                    catch { /* send as-is */ }
-                }
-                else {
-                    try {
-                        const parsed = JSON.parse(body.toString());
-                        clientWantsStream = !!parsed.stream;
-                    }
-                    catch { }
-                }
-                const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
-                requestCount++;
-                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
-                return;
-            }
             // Parse body once, apply OpenAI translation, model override, and sanitization
             let finalBody = body.length > 0 ? body : undefined;
             let ccToolMap = null;
@@ -731,7 +518,7 @@ export async function startProxy(opts = {}) {
                     beta += ',' + clientBeta;
             }
             else {
-                // CC v2.1.104 beta set (exact 8 from MITM capture, exact order).
+                // CC v2.1.104 beta set — 8 flags in the order Claude Code sends them.
                 // context-1m requires Extra Usage — if it 400s, we auto-retry without it.
                 beta = 'claude-code-20250219,oauth-2025-04-20,context-1m-2025-08-07,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,advisor-tool-2026-03-01,effort-2025-11-24';
                 if (clientBeta) {
@@ -749,7 +536,7 @@ export async function startProxy(opts = {}) {
                 await new Promise(r => setTimeout(r, MIN_REQUEST_INTERVAL_MS - elapsed));
             }
             lastRequestTime = Date.now();
-            // Rotate session ID per request — CC --print creates a new session each invocation
+            // Rotate session ID per request — fresh UUID avoids persistent-session fingerprinting
             SESSION_ID = randomUUID();
             const headers = {
                 ...staticHeaders,
@@ -798,24 +585,20 @@ export async function startProxy(opts = {}) {
                 else if (upstream.status === 429) {
                     // Not a context-1m issue — return enriched 429 directly
                     const enriched = enrich429(peekedBody, upstream.headers);
-                    if (!(cliAvailable && !useCli)) {
-                        const responseHeaders = {
-                            'Content-Type': 'application/json',
-                            'Access-Control-Allow-Origin': corsOrigin,
-                            ...SECURITY_HEADERS,
-                        };
-                        for (const [key, value] of upstream.headers.entries()) {
-                            if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
-                                responseHeaders[key] = value;
-                            }
+                    const responseHeaders = {
+                        'Content-Type': 'application/json',
+                        'Access-Control-Allow-Origin': corsOrigin,
+                        ...SECURITY_HEADERS,
+                    };
+                    for (const [key, value] of upstream.headers.entries()) {
+                        if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
+                            responseHeaders[key] = value;
                         }
-                        requestCount++;
-                        res.writeHead(429, responseHeaders);
-                        res.end(enriched);
-                        return;
                     }
-                    // Fall through to CLI fallback below — need to re-handle 429 with
-                    // already-consumed body; stash it for the fallback path.
+                    requestCount++;
+                    res.writeHead(429, responseHeaders);
+                    res.end(enriched);
+                    return;
                 }
                 else if (upstream.status === 400) {
                     // Non-long-context 400 — forward upstream error directly.
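
Reviewer note on the simplified 429 path in this hunk: with the CLI fallback removed, the proxy now always returns the enriched 429 and forwards only a filtered set of upstream headers. The sketch below shows that filter on its own. The `pickRateLimitHeaders` name is hypothetical; the three matching rules are the ones visible in the diff, and the header names in the usage note are examples only.

```typescript
// Copy only rate-limit and request-id headers from an upstream response,
// dropping everything else (cookies, encoding, connection headers).
function pickRateLimitHeaders(upstream: Headers): Record<string, string> {
  const picked: Record<string, string> = {};
  // Headers normalizes names to lowercase, so prefix checks are case-safe.
  upstream.forEach((value, key) => {
    if (
      key.startsWith('x-ratelimit') ||
      key.startsWith('anthropic-ratelimit') ||
      key === 'request-id'
    ) {
      picked[key] = value;
    }
  });
  return picked;
}
```

For instance, passing `new Headers({ 'Request-Id': 'req_1', 'Set-Cookie': 'a=b' })` keeps `request-id` and drops the cookie, so clients see when capacity returns without the proxy relaying unrelated upstream headers.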
@@ -836,7 +619,7 @@ export async function startProxy(opts = {}) {
                 }
             }
             // Enrich 429 errors with rate limit details from headers (Anthropic only returns "Error")
-            if (upstream.status === 429 && !(cliAvailable && !useCli)) {
+            if (upstream.status === 429) {
                 const errBody = await upstream.text().catch(() => '');
                 const enriched = enrich429(errBody, upstream.headers);
                 const responseHeaders = {
@@ -854,42 +637,6 @@ export async function startProxy(opts = {}) {
                 res.end(enriched);
                 return;
             }
-            // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary
-            if (upstream.status === 429 && cliAvailable && !useCli) {
-                const errBody429 = await upstream.text().catch(() => '');
-                if (verbose)
-                    console.log(`[dario] #${requestCount} 429 from API — falling back to CLI`);
-                let clientWantsStream = false;
-                try {
-                    clientWantsStream = !!JSON.parse(body.toString()).stream;
-                }
-                catch { }
-                const cliResult = await handleViaCli(body, modelOverride, verbose);
-                // If CLI fallback also failed, return the original 429 with enriched details
-                // instead of a cryptic 502 from CLI failure
-                if (cliResult.status >= 500) {
-                    if (verbose)
-                        console.log(`[dario] #${requestCount} CLI fallback failed (${cliResult.status}) — returning original 429`);
-                    const enriched = enrich429(errBody429, upstream.headers);
-                    const responseHeaders = {
-                        'Content-Type': 'application/json',
-                        'Access-Control-Allow-Origin': corsOrigin,
-                        ...SECURITY_HEADERS,
-                    };
-                    for (const [key, value] of upstream.headers.entries()) {
-                        if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
-                            responseHeaders[key] = value;
-                        }
-                    }
-                    requestCount++;
-                    res.writeHead(429, responseHeaders);
-                    res.end(enriched);
-                    return;
-                }
-                requestCount++;
-                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
-                return;
-            }
             // Detect streaming from content-type (reliable) or body (fallback)
             const contentType = upstream.headers.get('content-type') ?? '';
             const isStream = contentType.includes('text/event-stream');
@@ -1008,7 +755,9 @@ export async function startProxy(opts = {}) {
         process.exit(1);
     });
     server.listen(port, LOCALHOST, () => {
-        const modeLine = passthrough ? 'Mode: passthrough (OAuth swap only, no injection)' : useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
+        const modeLine = passthrough
+            ? 'Mode: passthrough (OAuth swap only, no injection)'
+            : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
         const modelLine = modelOverride ? `Model: ${modelOverride} (all requests)` : 'Model: passthrough (client decides)';
         console.log('');
         console.log(`  dario — http://localhost:${port}`);
@@ -1023,8 +772,8 @@ export async function startProxy(opts = {}) {
         console.log(`  ${modelLine}`);
         console.log('');
     });
-    // Session presence heartbeat — registers this proxy as an active Claude Code session
-    // Claude Code sends this every 5 seconds; the server uses it for priority routing
+    // Session presence heartbeat — keeps the OAuth session marked active
+    // (matches the ~5s cadence of a real Claude Code session).
     const clientId = randomUUID();
     const connectedAt = new Date().toISOString();
     let lastPresencePulse = 0;

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "3.4.0",
+  "version": "3.4.1",
   "description": "Use your Claude subscription as an API. No API key needed. Local proxy for Claude Max/Pro subscriptions.",
   "type": "module",
   "bin": {