npm - @askalf/dario - Versions diffs - 2.7.1 → 2.8.1 - Mend

@askalf/dario 2.7.1 → 2.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/README.md CHANGED Viewed

@@ -193,13 +193,23 @@ Model: claude-opus-4-6 (all requests)
 **Trade-offs vs direct API mode:**
-| | Direct API (default) | CLI Backend (`--cli`) |
-|---|---|---|
-| Streaming | Yes | No (full response) |
-| Tool use passthrough | Yes | No |
-| Latency | Low | Higher (process spawn) |
-| Rate limits | Subject to 5h/7d quotas | Not affected |
-| Opus when throttled | May return 429 | **Works** |
+| | Direct API (default) | CLI Backend (`--cli`) | Passthrough (`--passthrough`) |
+|---|---|---|---|
+| Streaming | Native SSE | SSE (converted from JSON) | Native SSE |
+| Tool use | Yes | No | Yes |
+| Thinking/billing injection | Yes (Claude-optimized) | N/A | No (OAuth swap only) |
+| Latency | Low | Higher (process spawn) | Low |
+| Rate limits | Priority routing | Not affected | Standard (no priority) |
+| Opus when throttled | Auto CLI fallback | **Always works** | May return 429 |
+## Passthrough Mode
+For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity, use `--passthrough`. This does OAuth swap only — no billing tag, no thinking injection, no device identity, no extra beta flags.
+```bash
+dario proxy --passthrough               # Thin proxy, zero injection
+dario proxy --passthrough --model=opus   # Thin proxy + model override
+```
 ## Model Selection
@@ -285,7 +295,7 @@ const message = await client.messages.create({
 });
 ```
-### Streaming (direct API mode only)
+### Streaming
 ```bash
 curl http://localhost:3456/v1/messages \
@@ -351,6 +361,18 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 └──────────┘     └─────────────────┘     └──────────────────┘
 ```
+### Passthrough Mode (`--passthrough`)
+```
+┌──────────┐     ┌─────────────────┐     ┌──────────────────┐
+│ Your App │ ──> │  dario (proxy)  │ ──> │ api.anthropic.com│
+│          │     │  localhost:3456  │     │                  │
+│ sends    │     │  swaps API key  │     │  sees valid      │
+│ API      │     │  for OAuth      │     │  OAuth bearer    │
+│ request  │     │  nothing else   │     │  token           │
+└──────────┘     └─────────────────┘     └──────────────────┘
+```
 1. **`dario login`** — Detects your existing Claude Code credentials (`~/.claude/.credentials.json`) and starts the proxy automatically. If Claude Code isn't installed, runs a PKCE OAuth flow with a local callback server to capture the token automatically.
 2. **`dario proxy`** — Starts an HTTP server on localhost that implements the Anthropic Messages API. In direct mode, it swaps your API key for an OAuth bearer token. In CLI mode, it routes through the Claude Code binary.
@@ -373,6 +395,7 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 | Flag/Env | Description | Default |
 |----------|-------------|---------|
 | `--cli` | Use Claude CLI as backend (bypasses rate limits) | off |
+| `--passthrough` | Thin proxy — OAuth swap only, no injection | off |
 | `--model=MODEL` | Force a model (`opus`, `sonnet`, `haiku`, or full ID) | passthrough |
 | `--port=PORT` | Port to listen on | `3456` |
 | `--verbose` / `-v` | Log every request | off |
@@ -383,8 +406,10 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 ### Direct API Mode
 - All Claude models (Opus 4.6, Sonnet 4.6, Haiku 4.5) + 1M extended context aliases (`opus1m`, `sonnet1m`)
 - **Native billing classification** — device identity metadata ensures Max plan limits work correctly
-- **Priority routing** — billing tag injection + `service_tier: auto` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
-- **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning
+- **Priority routing** — billing tag injection + `service_tier: 'auto'` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
+- **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning (auto-skipped for Haiku 4.5)
+- **Effort control** — injects `output_config: { effort: 'high' }` by default, or passes through client-specified effort level
+- **Enriched 429 errors** — rate limit errors include utilization %, limiting window, and reset time instead of Anthropic's default `"Error"` message
 - **Auto CLI fallback** — if the API returns 429 and Claude Code is installed, transparently retries through `claude --print` with SSE conversion
 - **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool
 - Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming)
@@ -399,10 +424,17 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 ### CLI Backend Mode
 - All Claude models — including Opus when rate limited
-- Non-streaming responses
+- Streaming via SSE conversion (client sends `stream: true`, CLI JSON response is converted to Anthropic or OpenAI SSE events)
+- OpenAI compatibility (translates OpenAI → Anthropic before CLI, Anthropic → OpenAI after)
 - System prompts and multi-turn conversations (via context injection)
 - Not affected by API rate limits
+### Passthrough Mode
+- All Claude models with native streaming and tool use
+- OAuth token swap only — no billing tag, thinking, effort, service_tier, or device identity injection
+- Minimal beta flags (`oauth-2025-04-20` + client betas only)
+- For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity
 ## Endpoints
 | Path | Description |
@@ -459,7 +491,7 @@ Recommended but not required. If Claude Code is installed and logged in, `dario
 Dario auto-refreshes tokens 30 minutes before expiry. You should never see an auth error in normal use. If something goes wrong, `dario refresh` forces an immediate refresh.
 **I'm getting rate limited on Opus. What do I do?**
-Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
+Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. In default mode, dario automatically falls back to CLI when it detects a 429 (if Claude Code is installed). Rate limit errors include utilization percentages and reset times so you can see exactly when capacity returns. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
 **What are the usage limits?**
 Claude subscriptions have rolling 5-hour and 7-day usage windows shared across claude.ai and Claude Code. See [Anthropic's docs](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) for details. In Claude Code, use `/usage` to check your current limits, or configure the [statusline](https://code.claude.com/docs/en/statusline) to show real-time 5h and 7d utilization percentages.
@@ -483,6 +515,9 @@ await startProxy({ port: 3456, verbose: true });
 // CLI backend mode
 await startProxy({ port: 3456, cliBackend: true, model: "opus" });
+// Passthrough mode (OAuth swap only, no injection)
+await startProxy({ port: 3456, passthrough: true });
 // Or just get a raw access token
 const token = await getAccessToken();

package/dist/cli.js CHANGED Viewed

@@ -9,10 +9,10 @@
  *   dario refresh   — Force token refresh
  *   dario logout    — Remove saved credentials
  */
-import { readFile, unlink } from 'node:fs/promises';
+import { unlink } from 'node:fs/promises';
 import { join } from 'node:path';
 import { homedir } from 'node:os';
-import { startAutoOAuthFlow, getStatus, refreshTokens } from './oauth.js';
+import { startAutoOAuthFlow, getStatus, refreshTokens, loadCredentials } from './oauth.js';
 import { startProxy, sanitizeError } from './proxy.js';
 const args = process.argv.slice(2);
 const command = args[0] ?? 'proxy';
@@ -21,22 +21,14 @@ async function login() {
     console.log('  dario — Claude Login');
     console.log('  ───────────────────');
     console.log('');
-    // Check if Claude Code credentials exist
-    const ccPath = join(homedir(), '.claude', '.credentials.json');
-    try {
-        const raw = await readFile(ccPath, 'utf-8');
-        const parsed = JSON.parse(raw);
-        if (parsed?.claudeAiOauth?.accessToken && parsed?.claudeAiOauth?.refreshToken) {
-            const expiresAt = parsed.claudeAiOauth.expiresAt;
-            if (expiresAt > Date.now()) {
-                console.log('  Found Claude Code credentials. Starting proxy...');
-                console.log('');
-                await proxy();
-                return;
-            }
-        }
+    // Check for existing credentials (Claude Code or dario's own)
+    const creds = await loadCredentials();
+    if (creds?.claudeAiOauth?.accessToken && creds.claudeAiOauth.expiresAt > Date.now()) {
+        console.log('  Found credentials. Starting proxy...');
+        console.log('');
+        await proxy();
+        return;
     }
-    catch { /* no Claude Code credentials, fall through to OAuth */ }
     console.log('  No Claude Code credentials found. Starting OAuth flow...');
     console.log('');
     try {
@@ -112,9 +104,10 @@ async function proxy() {
     }
     const verbose = args.includes('--verbose') || args.includes('-v');
     const cliBackend = args.includes('--cli');
+    const passthrough = args.includes('--passthrough') || args.includes('--thin');
     const modelArg = args.find(a => a.startsWith('--model='));
     const model = modelArg ? modelArg.split('=')[1] : undefined;
-    await startProxy({ port, verbose, model, cliBackend });
+    await startProxy({ port, verbose, model, cliBackend, passthrough });
 }
 async function help() {
     console.log(`
@@ -133,6 +126,7 @@ async function help() {
                              Full IDs: claude-opus-4-6, claude-sonnet-4-6
                              Default: passthrough (client decides)
     --cli                    Use Claude CLI as backend (bypasses rate limits)
+    --passthrough            Thin proxy — OAuth swap only, no injection
     --port=PORT              Port to listen on (default: 3456)
     --verbose, -v            Log all requests
@@ -155,12 +149,11 @@ async function help() {
 `);
 }
 async function version() {
-    const { readFile } = await import('node:fs/promises');
-    const { fileURLToPath } = await import('node:url');
-    const { dirname, join } = await import('node:path');
     try {
-        const dir = dirname(fileURLToPath(import.meta.url));
-        const pkg = JSON.parse(await readFile(join(dir, '..', 'package.json'), 'utf-8'));
+        const { fileURLToPath } = await import('node:url');
+        const { readFile: rf } = await import('node:fs/promises');
+        const dir = join(fileURLToPath(import.meta.url), '..', '..');
+        const pkg = JSON.parse(await rf(join(dir, 'package.json'), 'utf-8'));
         console.log(pkg.version);
     }
     catch {

package/dist/oauth.js CHANGED Viewed

@@ -5,7 +5,7 @@
  * Handles authorization, token exchange, storage, and auto-refresh.
  */
 import { randomBytes, createHash } from 'node:crypto';
-import { readFile, writeFile, mkdir, chmod, rename } from 'node:fs/promises';
+import { readFile, writeFile, mkdir, rename } from 'node:fs/promises';
 import { dirname, join } from 'node:path';
 import { homedir } from 'node:os';
 // Claude Code's public OAuth client (PKCE, no secret needed)
@@ -62,11 +62,6 @@ async function saveCredentials(creds) {
     const tmpPath = `${path}.tmp.${Date.now()}`;
     await writeFile(tmpPath, JSON.stringify(creds, null, 2), { mode: 0o600 });
     await rename(tmpPath, path);
-    // Set permissions (best-effort — no-op on Windows where mode is ignored)
-    try {
-        await chmod(path, 0o600);
-    }
-    catch { /* Windows ignores file modes */ }
     // Invalidate cache so next read picks up the new tokens
     credentialsCache = creds;
     credentialsCacheTime = Date.now();
@@ -222,10 +217,10 @@ async function doRefreshTokens() {
         }
         const data = await res.json();
         const tokens = {
-            ...oauth,
             accessToken: data.access_token,
             refreshToken: data.refresh_token,
             expiresAt: Date.now() + data.expires_in * 1000,
+            scopes: oauth.scopes,
         };
         await saveCredentials({ claudeAiOauth: tokens });
         return tokens;

package/dist/proxy.d.ts CHANGED Viewed

@@ -3,6 +3,7 @@ interface ProxyOptions {
     verbose?: boolean;
     model?: string;
     cliBackend?: boolean;
+    passthrough?: boolean;
 }
 export declare function sanitizeError(err: unknown): string;
 export declare function startProxy(opts?: ProxyOptions): Promise<void>;

package/dist/proxy.js CHANGED Viewed

@@ -1,7 +1,7 @@
 import { createServer } from 'node:http';
 import { randomUUID, timingSafeEqual } from 'node:crypto';
 import { execSync, spawn } from 'node:child_process';
-import { readFileSync } from 'node:fs';
+import { readFileSync, readdirSync } from 'node:fs';
 import { join } from 'node:path';
 import { homedir } from 'node:os';
 import { arch, platform, version as nodeVersion } from 'node:process';
@@ -35,27 +35,19 @@ class Semaphore {
             next();
     }
 }
-// Detect installed Claude Code binary at startup
-function detectClaudeVersion() {
+// Detect installed Claude Code binary at startup (single exec for both version + availability)
+let cliAvailable = false;
+function detectCli() {
     try {
         const out = execSync('claude --version', { timeout: 5000, stdio: 'pipe' }).toString().trim();
-        const match = out.match(/^([\d.]+)/);
-        return match?.[1] ?? '2.1.96';
+        cliAvailable = true;
+        return out.match(/^([\d.]+)/)?.[1] ?? '2.1.96';
     }
     catch {
+        cliAvailable = false;
         return '2.1.96';
     }
 }
-let cliAvailable = false;
-function detectCliAvailable() {
-    try {
-        execSync('claude --version', { timeout: 5000, stdio: 'pipe' });
-        return true;
-    }
-    catch {
-        return false;
-    }
-}
 /** Convert a non-streaming Messages API response to SSE event stream. */
 function jsonToSse(jsonBody) {
     try {
@@ -86,6 +78,40 @@ function jsonToSse(jsonBody) {
         return '';
     }
 }
+/** Convert CLI JSON response to OpenAI SSE format. */
+function jsonToOpenaiSse(jsonBody) {
+    try {
+        const parsed = JSON.parse(jsonBody);
+        const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
+        const ts = Math.floor(Date.now() / 1000);
+        return `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n` +
+            `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
+    }
+    catch {
+        return '';
+    }
+}
+/** Send a CLI result to the client, handling streaming/format translation. */
+function sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, securityHeaders) {
+    const headers = { 'Access-Control-Allow-Origin': corsOrigin, ...securityHeaders };
+    const ok = cliResult.status >= 200 && cliResult.status < 300;
+    if (ok && clientWantsStream) {
+        const sseData = isOpenAI ? jsonToOpenaiSse(cliResult.body) : jsonToSse(cliResult.body);
+        if (sseData) {
+            res.writeHead(200, { 'Content-Type': 'text/event-stream', ...headers });
+            res.end(sseData);
+            return;
+        }
+    }
+    if (ok && isOpenAI) {
+        try {
+            cliResult.body = JSON.stringify(anthropicToOpenai(JSON.parse(cliResult.body)));
+        }
+        catch { }
+    }
+    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, ...headers });
+    res.end(cliResult.body);
+}
 const SESSION_ID = randomUUID();
 const OS_NAME = platform === 'win32' ? 'Windows' : platform === 'darwin' ? 'MacOS' : 'Linux';
 // Claude Code device identity — required for Max plan billing classification.
@@ -100,7 +126,7 @@ function loadClaudeIdentity() {
     // Also check backup files as fallback
     try {
         const backupDir = join(homedir(), '.claude', 'backups');
-        const files = require('fs').readdirSync(backupDir);
+        const files = readdirSync(backupDir);
         const backups = files
             .filter((f) => f.startsWith('.claude.json.backup.'))
             .sort()
@@ -180,28 +206,6 @@ function sanitizeMessages(body) {
         }
     }
 }
-let lastTokenSnapshot = null;
-function checkTokenAnomalies(usage, requestId) {
-    const current = {
-        inputTokens: usage.input_tokens ?? 0,
-        outputTokens: usage.output_tokens ?? 0,
-        cacheRead: usage.cache_read_input_tokens ?? 0,
-    };
-    if (lastTokenSnapshot && lastTokenSnapshot.inputTokens > 0) {
-        const growth = (current.inputTokens - lastTokenSnapshot.inputTokens) / lastTokenSnapshot.inputTokens;
-        if (growth > 0.6) {
-            const pct = Math.round(growth * 100);
-            console.warn(`[dario] TOKEN WARN ${requestId}: Input grew ${pct}% (${lastTokenSnapshot.inputTokens} → ${current.inputTokens}). Possible full replay.`);
-        }
-        if (current.outputTokens > lastTokenSnapshot.outputTokens * 2 && current.outputTokens > 2000) {
-            console.warn(`[dario] TOKEN WARN ${requestId}: Output explosion ${current.outputTokens} tokens (${Math.round(current.outputTokens / lastTokenSnapshot.outputTokens)}x previous).`);
-        }
-    }
-    lastTokenSnapshot = current;
-}
-// Extended context fallback — cooldown after 1M context failure
-let extendedContextUnavailableAt = 0;
-const EXTENDED_CONTEXT_COOLDOWN_MS = 60 * 60 * 1000; // 1 hour
 // OpenAI model names → Anthropic (fallback if client sends GPT names)
 const OPENAI_MODEL_MAP = {
     'gpt-5.4': 'claude-opus-4-6',
@@ -301,6 +305,37 @@ export function sanitizeError(err) {
         .replace(/eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+/g, '[REDACTED_JWT]')
         .replace(/Bearer\s+[^\s,;]+/gi, 'Bearer [REDACTED]');
 }
+/**
+ * Enrich Anthropic's unhelpful 429 "Error" body with rate limit details from headers.
+ */
+function enrich429(body, headers) {
+    try {
+        const parsed = JSON.parse(body);
+        const err = parsed.error;
+        if (err && (err.message === 'Error' || !err.message)) {
+            const claim = headers.get('anthropic-ratelimit-unified-representative-claim') || 'unknown';
+            const status = headers.get('anthropic-ratelimit-unified-status') || 'rejected';
+            const util5h = headers.get('anthropic-ratelimit-unified-5h-utilization');
+            const util7d = headers.get('anthropic-ratelimit-unified-7d-utilization');
+            const reset = headers.get('anthropic-ratelimit-unified-reset');
+            const parts = [`Rate limited (${status}). Limiting window: ${claim}`];
+            if (util5h)
+                parts.push(`5h utilization: ${Math.round(parseFloat(util5h) * 100)}%`);
+            if (util7d)
+                parts.push(`7d utilization: ${Math.round(parseFloat(util7d) * 100)}%`);
+            if (reset) {
+                const resetDate = new Date(parseInt(reset) * 1000);
+                const mins = Math.max(0, Math.round((resetDate.getTime() - Date.now()) / 60000));
+                parts.push(`resets in ${mins}m`);
+            }
+            err.message = parts.join('. ');
+        }
+        return JSON.stringify(parsed);
+    }
+    catch {
+        return body;
+    }
+}
 /**
  * CLI Backend: route requests through `claude --print` instead of direct API.
  * This bypasses rate limiting because Claude Code's binary has priority routing.
@@ -398,14 +433,14 @@ async function handleViaCli(body, model, verbose) {
 export async function startProxy(opts = {}) {
     const port = opts.port ?? DEFAULT_PORT;
     const verbose = opts.verbose ?? false;
+    const passthrough = opts.passthrough ?? false;
     // Verify auth before starting
     const status = await getStatus();
     if (!status.authenticated) {
         console.error('[dario] Not authenticated. Run `dario login` first.');
         process.exit(1);
     }
-    const cliVersion = detectClaudeVersion();
-    cliAvailable = detectCliAvailable();
+    const cliVersion = detectCli();
     const modelOverride = opts.model ? (MODEL_ALIASES[opts.model] ?? opts.model) : null;
     const identity = loadClaudeIdentity();
     if (identity.deviceId) {
@@ -415,8 +450,11 @@ export async function startProxy(opts = {}) {
         console.warn('[dario] WARNING: No Claude Code device identity found. Requests may be billed as Extra Usage.');
         console.warn('[dario] Run Claude Code at least once to generate ~/.claude/.claude.json');
     }
-    // Pre-build static headers (only auth, version, beta, request-id change per request)
-    const staticHeaders = {
+    // Pre-build static headers
+    const staticHeaders = passthrough ? {
+        'accept': 'application/json',
+        'Content-Type': 'application/json',
+    } : {
         'accept': 'application/json',
         'Content-Type': 'application/json',
         'anthropic-dangerous-direct-browser-access': 'true',
@@ -556,30 +594,26 @@ export async function startProxy(opts = {}) {
             // CLI backend mode: route through claude --print (works for both Anthropic and OpenAI endpoints)
             if (useCli && req.method === 'POST' && body.length > 0) {
                 let cliBody = body;
+                let clientWantsStream = false;
                 // Translate OpenAI format before passing to CLI
                 if (isOpenAI) {
                     try {
                         const parsed = JSON.parse(body.toString());
+                        clientWantsStream = !!parsed.stream;
                         cliBody = Buffer.from(JSON.stringify(openaiToAnthropic(parsed, modelOverride)));
                     }
                     catch { /* send as-is */ }
                 }
-                const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
-                requestCount++;
-                // Translate CLI response back to OpenAI format if needed
-                if (isOpenAI && cliResult.status >= 200 && cliResult.status < 300) {
+                else {
                     try {
-                        const parsed = JSON.parse(cliResult.body);
-                        cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
+                        const parsed = JSON.parse(body.toString());
+                        clientWantsStream = !!parsed.stream;
                     }
-                    catch { /* send as-is */ }
+                    catch { }
                 }
-                res.writeHead(cliResult.status, {
-                    'Content-Type': cliResult.contentType,
-                    'Access-Control-Allow-Origin': corsOrigin,
-                    ...SECURITY_HEADERS,
-                });
-                res.end(cliResult.body);
+                const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
+                requestCount++;
+                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
                 return;
             }
             // Parse body once, apply OpenAI translation, model override, and sanitization
@@ -589,60 +623,64 @@ export async function startProxy(opts = {}) {
                     const parsed = JSON.parse(body.toString());
                     // Strip orchestration tags from messages (Aider, Cursor, etc.)
                     sanitizeMessages(parsed);
-                    // Handle 1M context: strip [1m] suffix if in cooldown
-                    if (modelOverride?.includes('[1m]') && extendedContextUnavailableAt > 0 && Date.now() - extendedContextUnavailableAt < EXTENDED_CONTEXT_COOLDOWN_MS) {
-                        parsed.model = modelOverride.replace('[1m]', '');
-                    }
                     const result = isOpenAI ? openaiToAnthropic(parsed, modelOverride) : (modelOverride ? { ...parsed, model: modelOverride } : parsed);
                     const r = result;
-                    // Inject device identity metadata for session tracking
-                    if (identity.deviceId) {
-                        r.metadata = {
-                            user_id: JSON.stringify({
-                                device_id: identity.deviceId,
-                                account_uuid: identity.accountUuid,
-                                session_id: SESSION_ID,
-                            }),
-                        };
-                    }
-                    // Enable adaptive thinking for models that support it (Opus/Sonnet 4.6+)
-                    // Haiku 4.5 does not support thinking at all
-                    const modelName = (r.model || '').toLowerCase();
-                    const supportsThinking = !modelName.includes('haiku');
-                    if (supportsThinking && !r.thinking) {
-                        r.thinking = { type: 'adaptive' };
-                        // Ensure max_tokens is reasonable for thinking models
-                        const clientMax = r.max_tokens || 8192;
-                        r.max_tokens = Math.max(clientMax, 16000);
-                    }
-                    // Request priority capacity when available
-                    if (!r.service_tier) {
-                        r.service_tier = 'auto';
-                    }
-                    // Enable context management (matches Claude Code default)
-                    // Requires thinking to be enabled — skip for models without thinking support (e.g. Haiku)
-                    if (supportsThinking && !r.context_management) {
-                        r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
-                    }
-                    // Inject Claude Code billing header into system prompt.
-                    // Anthropic uses this to route requests through priority rate limiting
-                    // instead of the general API quota. Without it, Opus/Sonnet get 429
-                    // when overall utilization is high, even though model-specific limits
-                    // have headroom. The CLI binary embeds this in its system prompt.
-                    const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
-                    if (typeof r.system === 'string') {
-                        if (!r.system.includes('x-anthropic-billing-header:')) {
-                            r.system = billingTag + '\n' + r.system;
+                    // In passthrough mode, skip all Claude-specific injection — OAuth swap only
+                    if (!passthrough) {
+                        // Inject device identity metadata for session tracking
+                        if (identity.deviceId) {
+                            r.metadata = {
+                                user_id: JSON.stringify({
+                                    device_id: identity.deviceId,
+                                    account_uuid: identity.accountUuid,
+                                    session_id: SESSION_ID,
+                                }),
+                            };
                         }
-                    }
-                    else if (Array.isArray(r.system)) {
-                        const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
-                        if (!hasTag) {
-                            r.system.unshift({ type: 'text', text: billingTag });
+                        // Enable adaptive thinking for models that support it (Opus/Sonnet 4.6+)
+                        // Haiku 4.5 does not support thinking at all
+                        const modelName = (r.model || '').toLowerCase();
+                        const supportsThinking = !modelName.includes('haiku');
+                        if (supportsThinking && !r.thinking) {
+                            r.thinking = { type: 'adaptive' };
+                            // Ensure max_tokens is reasonable for thinking models
+                            const clientMax = r.max_tokens || 8192;
+                            r.max_tokens = Math.max(clientMax, 16000);
+                        }
+                        // Request priority capacity when available
+                        if (!r.service_tier) {
+                            r.service_tier = 'auto';
+                        }
+                        // Set reasoning effort (pass through client value or default)
+                        // Haiku does not support the effort parameter
+                        if (supportsThinking && !r.output_config) {
+                            r.output_config = { effort: 'high' };
+                        }
+                        // Enable context management (matches Claude Code default)
+                        // Requires thinking to be enabled — skip for models without thinking support (e.g. Haiku)
+                        if (supportsThinking && !r.context_management) {
+                            r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
+                        }
+                        // Inject Claude Code billing header into system prompt.
+                        // Anthropic uses this to route requests through priority rate limiting
+                        // instead of the general API quota. Without it, Opus/Sonnet get 429
+                        // when overall utilization is high, even though model-specific limits
+                        // have headroom. The CLI binary embeds this in its system prompt.
+                        const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
+                        if (typeof r.system === 'string') {
+                            if (!r.system.includes('x-anthropic-billing-header:')) {
+                                r.system = billingTag + '\n' + r.system;
+                            }
+                        }
+                        else if (Array.isArray(r.system)) {
+                            const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
+                            if (!hasTag) {
+                                r.system.unshift({ type: 'text', text: billingTag });
+                            }
+                        }
+                        else {
+                            r.system = billingTag;
                         }
-                    }
-                    else {
-                        r.system = billingTag;
                     }
                     finalBody = Buffer.from(JSON.stringify(r));
                 }
@@ -652,15 +690,23 @@ export async function startProxy(opts = {}) {
                 const modelInfo = modelOverride ? ` (model: ${modelOverride})` : '';
                 console.log(`[dario] #${requestCount} ${req.method} ${urlPath}${modelInfo}`);
             }
-            // Beta defaults — matches native Claude Code v2.1.98 headers exactly.
-            // Billing classification is determined by the OAuth token alone, not beta flags.
-            // context-management and prompt-caching-scope are safe for all subscription types.
+            // Beta headers
             const clientBeta = req.headers['anthropic-beta'];
-            let beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
-            if (clientBeta) {
-                const filtered = filterBillableBetas(clientBeta);
-                if (filtered)
-                    beta += ',' + filtered;
+            let beta;
+            if (passthrough) {
+                // Passthrough: only add oauth beta, forward client betas as-is
+                beta = 'oauth-2025-04-20';
+                if (clientBeta)
+                    beta += ',' + clientBeta;
+            }
+            else {
+                // Claude-optimized: full beta set matching CLI v2.1.100
+                beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
+                if (clientBeta) {
+                    const filtered = filterBillableBetas(clientBeta);
+                    if (filtered)
+                        beta += ',' + filtered;
+                }
             }
             const headers = {
                 ...staticHeaders,
@@ -675,74 +721,38 @@ export async function startProxy(opts = {}) {
                 body: finalBody ? new Uint8Array(finalBody) : undefined,
                 signal: AbortSignal.timeout(UPSTREAM_TIMEOUT_MS),
             });
-            // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary.
-            // The CLI gets priority routing from Anthropic's server — a separate rate limit pool
-            // that continues working when the direct API quota is exhausted for expensive models.
+            // Enrich 429 errors with rate limit details from headers (Anthropic only returns "Error")
+            if (upstream.status === 429 && !(cliAvailable && !useCli)) {
+                const errBody = await upstream.text().catch(() => '');
+                const enriched = enrich429(errBody, upstream.headers);
+                const responseHeaders = {
+                    'Content-Type': 'application/json',
+                    'Access-Control-Allow-Origin': corsOrigin,
+                    ...SECURITY_HEADERS,
+                };
+                for (const [key, value] of upstream.headers.entries()) {
+                    if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
+                        responseHeaders[key] = value;
+                    }
+                }
+                requestCount++;
+                res.writeHead(429, responseHeaders);
+                res.end(enriched);
+                return;
+            }
+            // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary
             if (upstream.status === 429 && cliAvailable && !useCli) {
-                // Drain the upstream response
                 await upstream.text().catch(() => { });
                 if (verbose)
                     console.log(`[dario] #${requestCount} 429 from API — falling back to CLI`);
-                // Determine if the client requested streaming
                 let clientWantsStream = false;
-                if (body.length > 0) {
-                    try {
-                        const p = JSON.parse(body.toString());
-                        clientWantsStream = !!p.stream;
-                    }
-                    catch { }
+                try {
+                    clientWantsStream = !!JSON.parse(body.toString()).stream;
                 }
+                catch { }
                 const cliResult = await handleViaCli(body, modelOverride, verbose);
                 requestCount++;
-                if (cliResult.status >= 200 && cliResult.status < 300) {
-                    if (isOpenAI) {
-                        // Translate to OpenAI format
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
-                        }
-                        catch { }
-                    }
-                    if (clientWantsStream && !isOpenAI) {
-                        // Client requested SSE streaming — convert CLI JSON to SSE events
-                        const sseData = jsonToSse(cliResult.body);
-                        res.writeHead(200, {
-                            'Content-Type': 'text/event-stream',
-                            'Access-Control-Allow-Origin': corsOrigin,
-                            ...SECURITY_HEADERS,
-                        });
-                        res.end(sseData);
-                    }
-                    else if (clientWantsStream && isOpenAI) {
-                        // OpenAI streaming — convert Anthropic JSON to OpenAI SSE
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
-                            const ts = Math.floor(Date.now() / 1000);
-                            let sseData = `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n`;
-                            sseData += `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
-                            res.writeHead(200, {
-                                'Content-Type': 'text/event-stream',
-                                'Access-Control-Allow-Origin': corsOrigin,
-                                ...SECURITY_HEADERS,
-                            });
-                            res.end(sseData);
-                        }
-                        catch {
-                            res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                            res.end(cliResult.body);
-                        }
-                    }
-                    else {
-                        res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                        res.end(cliResult.body);
-                    }
-                }
-                else {
-                    // CLI also failed — return the CLI error
-                    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                    res.end(cliResult.body);
-                }
+                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
                 return;
             }
             // Detect streaming from content-type (reliable) or body (fallback)
@@ -817,21 +827,6 @@ export async function startProxy(opts = {}) {
             else {
                 // Buffer and forward
                 const responseBody = await upstream.text();
-                // Check for extended context failure — cooldown to avoid repeated failures
-                if (upstream.status === 400 && responseBody.includes('extra_usage') && modelOverride?.includes('[1m]')) {
-                    extendedContextUnavailableAt = Date.now();
-                    console.warn('[dario] 1M context requires Extra Usage — falling back to standard context for 1 hour');
-                }
-                // Token anomaly detection on non-streaming responses
-                if (upstream.status >= 200 && upstream.status < 300) {
-                    try {
-                        const parsed = JSON.parse(responseBody);
-                        const usage = parsed.usage;
-                        if (usage)
-                            checkTokenAnomalies(usage, responseHeaders['request-id'] ?? '');
-                    }
-                    catch { /* ignore parse errors */ }
-                }
                 if (isOpenAI && upstream.status >= 200 && upstream.status < 300) {
                     // Translate Anthropic response → OpenAI format
                     try {
@@ -869,7 +864,7 @@ export async function startProxy(opts = {}) {
         process.exit(1);
     });
     server.listen(port, LOCALHOST, () => {
-        const oauthLine = useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
+        const modeLine = passthrough ? 'Mode: passthrough (OAuth swap only, no injection)' : useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
         const modelLine = modelOverride ? `Model: ${modelOverride} (all requests)` : 'Model: passthrough (client decides)';
         console.log('');
         console.log(`  dario — http://localhost:${port}`);
@@ -880,7 +875,7 @@ export async function startProxy(opts = {}) {
         console.log(`    ANTHROPIC_BASE_URL=http://localhost:${port}`);
         console.log('    ANTHROPIC_API_KEY=dario');
         console.log('');
-        console.log(`  ${oauthLine}`);
+        console.log(`  ${modeLine}`);
         console.log(`  ${modelLine}`);
         console.log('');
     });

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "2.7.1",
+  "version": "2.8.1",
   "description": "Use your Claude subscription as an API. No API key needed. Local proxy for Claude Max/Pro subscriptions.",
   "type": "module",
   "bin": {
@@ -24,7 +24,8 @@
     "audit": "npm audit --production --audit-level=high",
     "prepublishOnly": "npm run build",
     "start": "node dist/cli.js",
-    "dev": "tsx src/cli.ts"
+    "dev": "tsx src/cli.ts",
+    "e2e": "node test/e2e.mjs"
   },
   "keywords": [
     "claude",