npm - @askalf/dario - Versions diffs - 2.8.0 → 2.8.2 - Mend

@askalf/dario 2.8.0 → 2.8.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/README.md CHANGED Viewed

@@ -60,6 +60,16 @@ Opus, Sonnet, Haiku — all models, streaming, tool use. Works with Cursor, Cont
 *"Highly recommended for personal, local development. Solves a massive pain point for developers by bridging Claude Max/Pro subscriptions with developer IDEs, saving substantial API costs. Modular & lean (~1100 lines), modern PKCE auth, SSRF protection, mature CI/CD pipeline with CodeQL and npm provenance attestations."*
+</td>
+</tr>
+<tr>
+<td colspan="3" align="center"><br/><strong>In production</strong><br/><br/></td>
+</tr>
+<tr>
+<td colspan="3" valign="top">
+*"The 429s were driving us crazy running a multi-agent stack on Claude Max — the CLI fallback was duct tape until you found the real fix. Billing tag in the system prompt is wild. v2.8.0 running clean, zero 429s."* — [@belangertrading](https://github.com/belangertrading), multi-agent stack on Claude Max
 </td>
 </tr>
 </table>
@@ -193,13 +203,23 @@ Model: claude-opus-4-6 (all requests)
 **Trade-offs vs direct API mode:**
-| | Direct API (default) | CLI Backend (`--cli`) |
-|---|---|---|
-| Streaming | Yes | No (full response) |
-| Tool use passthrough | Yes | No |
-| Latency | Low | Higher (process spawn) |
-| Rate limits | Subject to 5h/7d quotas | Not affected |
-| Opus when throttled | May return 429 | **Works** |
+| | Direct API (default) | CLI Backend (`--cli`) | Passthrough (`--passthrough`) |
+|---|---|---|---|
+| Streaming | Native SSE | SSE (converted from JSON) | Native SSE |
+| Tool use | Yes | No | Yes |
+| Thinking/billing injection | Yes (Claude-optimized) | N/A | No (OAuth swap only) |
+| Latency | Low | Higher (process spawn) | Low |
+| Rate limits | Priority routing | Not affected | Standard (no priority) |
+| Opus when throttled | Auto CLI fallback | **Always works** | May return 429 |
+## Passthrough Mode
+For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity, use `--passthrough`. This does OAuth swap only — no billing tag, no thinking injection, no device identity, no extra beta flags.
+```bash
+dario proxy --passthrough               # Thin proxy, zero injection
+dario proxy --passthrough --model=opus   # Thin proxy + model override
+```
 ## Model Selection
@@ -285,7 +305,7 @@ const message = await client.messages.create({
 });
 ```
-### Streaming (direct API mode only)
+### Streaming
 ```bash
 curl http://localhost:3456/v1/messages \
@@ -351,6 +371,18 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 └──────────┘     └─────────────────┘     └──────────────────┘
 ```
+### Passthrough Mode (`--passthrough`)
+```
+┌──────────┐     ┌─────────────────┐     ┌──────────────────┐
+│ Your App │ ──> │  dario (proxy)  │ ──> │ api.anthropic.com│
+│          │     │  localhost:3456  │     │                  │
+│ sends    │     │  swaps API key  │     │  sees valid      │
+│ API      │     │  for OAuth      │     │  OAuth bearer    │
+│ request  │     │  nothing else   │     │  token           │
+└──────────┘     └─────────────────┘     └──────────────────┘
+```
 1. **`dario login`** — Detects your existing Claude Code credentials (`~/.claude/.credentials.json`) and starts the proxy automatically. If Claude Code isn't installed, runs a PKCE OAuth flow with a local callback server to capture the token automatically.
 2. **`dario proxy`** — Starts an HTTP server on localhost that implements the Anthropic Messages API. In direct mode, it swaps your API key for an OAuth bearer token. In CLI mode, it routes through the Claude Code binary.
@@ -373,6 +405,7 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 | Flag/Env | Description | Default |
 |----------|-------------|---------|
 | `--cli` | Use Claude CLI as backend (bypasses rate limits) | off |
+| `--passthrough` | Thin proxy — OAuth swap only, no injection | off |
 | `--model=MODEL` | Force a model (`opus`, `sonnet`, `haiku`, or full ID) | passthrough |
 | `--port=PORT` | Port to listen on | `3456` |
 | `--verbose` / `-v` | Log every request | off |
@@ -383,8 +416,10 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 ### Direct API Mode
 - All Claude models (Opus 4.6, Sonnet 4.6, Haiku 4.5) + 1M extended context aliases (`opus1m`, `sonnet1m`)
 - **Native billing classification** — device identity metadata ensures Max plan limits work correctly
-- **Priority routing** — billing tag injection + `service_tier: auto` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
-- **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning
+- **Priority routing** — billing tag injection + `service_tier: 'auto'` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
+- **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning (auto-skipped for Haiku 4.5)
+- **Effort control** — injects `output_config: { effort: 'high' }` by default, or passes through client-specified effort level
+- **Enriched 429 errors** — rate limit errors include utilization %, limiting window, and reset time instead of Anthropic's default `"Error"` message
 - **Auto CLI fallback** — if the API returns 429 and Claude Code is installed, transparently retries through `claude --print` with SSE conversion
 - **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool
 - Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming)
@@ -399,10 +434,17 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 ### CLI Backend Mode
 - All Claude models — including Opus when rate limited
-- Non-streaming responses
+- Streaming via SSE conversion (client sends `stream: true`, CLI JSON response is converted to Anthropic or OpenAI SSE events)
+- OpenAI compatibility (translates OpenAI → Anthropic before CLI, Anthropic → OpenAI after)
 - System prompts and multi-turn conversations (via context injection)
 - Not affected by API rate limits
+### Passthrough Mode
+- All Claude models with native streaming and tool use
+- OAuth token swap only — no billing tag, thinking, effort, service_tier, or device identity injection
+- Minimal beta flags (`oauth-2025-04-20` + client betas only)
+- For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity
 ## Endpoints
 | Path | Description |
@@ -459,7 +501,7 @@ Recommended but not required. If Claude Code is installed and logged in, `dario
 Dario auto-refreshes tokens 30 minutes before expiry. You should never see an auth error in normal use. If something goes wrong, `dario refresh` forces an immediate refresh.
 **I'm getting rate limited on Opus. What do I do?**
-Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
+Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. In default mode, dario automatically falls back to CLI when it detects a 429 (if Claude Code is installed). Rate limit errors include utilization percentages and reset times so you can see exactly when capacity returns. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
 **What are the usage limits?**
 Claude subscriptions have rolling 5-hour and 7-day usage windows shared across claude.ai and Claude Code. See [Anthropic's docs](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) for details. In Claude Code, use `/usage` to check your current limits, or configure the [statusline](https://code.claude.com/docs/en/statusline) to show real-time 5h and 7d utilization percentages.
@@ -483,6 +525,9 @@ await startProxy({ port: 3456, verbose: true });
 // CLI backend mode
 await startProxy({ port: 3456, cliBackend: true, model: "opus" });
+// Passthrough mode (OAuth swap only, no injection)
+await startProxy({ port: 3456, passthrough: true });
 // Or just get a raw access token
 const token = await getAccessToken();

package/dist/cli.js CHANGED Viewed

@@ -9,10 +9,10 @@
  *   dario refresh   — Force token refresh
  *   dario logout    — Remove saved credentials
  */
-import { readFile, unlink } from 'node:fs/promises';
+import { unlink } from 'node:fs/promises';
 import { join } from 'node:path';
 import { homedir } from 'node:os';
-import { startAutoOAuthFlow, getStatus, refreshTokens } from './oauth.js';
+import { startAutoOAuthFlow, getStatus, refreshTokens, loadCredentials } from './oauth.js';
 import { startProxy, sanitizeError } from './proxy.js';
 const args = process.argv.slice(2);
 const command = args[0] ?? 'proxy';
@@ -21,22 +21,14 @@ async function login() {
     console.log('  dario — Claude Login');
     console.log('  ───────────────────');
     console.log('');
-    // Check if Claude Code credentials exist
-    const ccPath = join(homedir(), '.claude', '.credentials.json');
-    try {
-        const raw = await readFile(ccPath, 'utf-8');
-        const parsed = JSON.parse(raw);
-        if (parsed?.claudeAiOauth?.accessToken && parsed?.claudeAiOauth?.refreshToken) {
-            const expiresAt = parsed.claudeAiOauth.expiresAt;
-            if (expiresAt > Date.now()) {
-                console.log('  Found Claude Code credentials. Starting proxy...');
-                console.log('');
-                await proxy();
-                return;
-            }
-        }
+    // Check for existing credentials (Claude Code or dario's own)
+    const creds = await loadCredentials();
+    if (creds?.claudeAiOauth?.accessToken && creds.claudeAiOauth.expiresAt > Date.now()) {
+        console.log('  Found credentials. Starting proxy...');
+        console.log('');
+        await proxy();
+        return;
     }
-    catch { /* no Claude Code credentials, fall through to OAuth */ }
     console.log('  No Claude Code credentials found. Starting OAuth flow...');
     console.log('');
     try {
@@ -157,12 +149,11 @@ async function help() {
 `);
 }
 async function version() {
-    const { readFile } = await import('node:fs/promises');
-    const { fileURLToPath } = await import('node:url');
-    const { dirname, join } = await import('node:path');
     try {
-        const dir = dirname(fileURLToPath(import.meta.url));
-        const pkg = JSON.parse(await readFile(join(dir, '..', 'package.json'), 'utf-8'));
+        const { fileURLToPath } = await import('node:url');
+        const { readFile: rf } = await import('node:fs/promises');
+        const dir = join(fileURLToPath(import.meta.url), '..', '..');
+        const pkg = JSON.parse(await rf(join(dir, 'package.json'), 'utf-8'));
         console.log(pkg.version);
     }
     catch {

package/dist/oauth.js CHANGED Viewed

@@ -5,7 +5,7 @@
  * Handles authorization, token exchange, storage, and auto-refresh.
  */
 import { randomBytes, createHash } from 'node:crypto';
-import { readFile, writeFile, mkdir, chmod, rename } from 'node:fs/promises';
+import { readFile, writeFile, mkdir, rename } from 'node:fs/promises';
 import { dirname, join } from 'node:path';
 import { homedir } from 'node:os';
 // Claude Code's public OAuth client (PKCE, no secret needed)
@@ -62,11 +62,6 @@ async function saveCredentials(creds) {
     const tmpPath = `${path}.tmp.${Date.now()}`;
     await writeFile(tmpPath, JSON.stringify(creds, null, 2), { mode: 0o600 });
     await rename(tmpPath, path);
-    // Set permissions (best-effort — no-op on Windows where mode is ignored)
-    try {
-        await chmod(path, 0o600);
-    }
-    catch { /* Windows ignores file modes */ }
     // Invalidate cache so next read picks up the new tokens
     credentialsCache = creds;
     credentialsCacheTime = Date.now();
@@ -222,10 +217,10 @@ async function doRefreshTokens() {
         }
         const data = await res.json();
         const tokens = {
-            ...oauth,
             accessToken: data.access_token,
             refreshToken: data.refresh_token,
             expiresAt: Date.now() + data.expires_in * 1000,
+            scopes: oauth.scopes,
         };
         await saveCredentials({ claudeAiOauth: tokens });
         return tokens;

package/dist/proxy.js CHANGED Viewed

@@ -1,9 +1,9 @@
 import { createServer } from 'node:http';
 import { randomUUID, timingSafeEqual } from 'node:crypto';
 import { execSync, spawn } from 'node:child_process';
-import { readFileSync } from 'node:fs';
+import { readFileSync, readdirSync, writeFileSync, unlinkSync } from 'node:fs';
 import { join } from 'node:path';
-import { homedir } from 'node:os';
+import { homedir, tmpdir } from 'node:os';
 import { arch, platform, version as nodeVersion } from 'node:process';
 import { getAccessToken, getStatus } from './oauth.js';
 const ANTHROPIC_API = 'https://api.anthropic.com';
@@ -35,27 +35,19 @@ class Semaphore {
             next();
     }
 }
-// Detect installed Claude Code binary at startup
-function detectClaudeVersion() {
+// Detect installed Claude Code binary at startup (single exec for both version + availability)
+let cliAvailable = false;
+function detectCli() {
     try {
         const out = execSync('claude --version', { timeout: 5000, stdio: 'pipe' }).toString().trim();
-        const match = out.match(/^([\d.]+)/);
-        return match?.[1] ?? '2.1.96';
+        cliAvailable = true;
+        return out.match(/^([\d.]+)/)?.[1] ?? '2.1.96';
     }
     catch {
+        cliAvailable = false;
         return '2.1.96';
     }
 }
-let cliAvailable = false;
-function detectCliAvailable() {
-    try {
-        execSync('claude --version', { timeout: 5000, stdio: 'pipe' });
-        return true;
-    }
-    catch {
-        return false;
-    }
-}
 /** Convert a non-streaming Messages API response to SSE event stream. */
 function jsonToSse(jsonBody) {
     try {
@@ -86,6 +78,40 @@ function jsonToSse(jsonBody) {
         return '';
     }
 }
+/** Convert CLI JSON response to OpenAI SSE format. */
+function jsonToOpenaiSse(jsonBody) {
+    try {
+        const parsed = JSON.parse(jsonBody);
+        const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
+        const ts = Math.floor(Date.now() / 1000);
+        return `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n` +
+            `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
+    }
+    catch {
+        return '';
+    }
+}
+/** Send a CLI result to the client, handling streaming/format translation. */
+function sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, securityHeaders) {
+    const headers = { 'Access-Control-Allow-Origin': corsOrigin, ...securityHeaders };
+    const ok = cliResult.status >= 200 && cliResult.status < 300;
+    if (ok && clientWantsStream) {
+        const sseData = isOpenAI ? jsonToOpenaiSse(cliResult.body) : jsonToSse(cliResult.body);
+        if (sseData) {
+            res.writeHead(200, { 'Content-Type': 'text/event-stream', ...headers });
+            res.end(sseData);
+            return;
+        }
+    }
+    if (ok && isOpenAI) {
+        try {
+            cliResult.body = JSON.stringify(anthropicToOpenai(JSON.parse(cliResult.body)));
+        }
+        catch { }
+    }
+    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, ...headers });
+    res.end(cliResult.body);
+}
 const SESSION_ID = randomUUID();
 const OS_NAME = platform === 'win32' ? 'Windows' : platform === 'darwin' ? 'MacOS' : 'Linux';
 // Claude Code device identity — required for Max plan billing classification.
@@ -100,7 +126,7 @@ function loadClaudeIdentity() {
     // Also check backup files as fallback
     try {
         const backupDir = join(homedir(), '.claude', 'backups');
-        const files = require('fs').readdirSync(backupDir);
+        const files = readdirSync(backupDir);
         const backups = files
             .filter((f) => f.startsWith('.claude.json.backup.'))
             .sort()
@@ -180,28 +206,6 @@ function sanitizeMessages(body) {
         }
     }
 }
-let lastTokenSnapshot = null;
-function checkTokenAnomalies(usage, requestId) {
-    const current = {
-        inputTokens: usage.input_tokens ?? 0,
-        outputTokens: usage.output_tokens ?? 0,
-        cacheRead: usage.cache_read_input_tokens ?? 0,
-    };
-    if (lastTokenSnapshot && lastTokenSnapshot.inputTokens > 0) {
-        const growth = (current.inputTokens - lastTokenSnapshot.inputTokens) / lastTokenSnapshot.inputTokens;
-        if (growth > 0.6) {
-            const pct = Math.round(growth * 100);
-            console.warn(`[dario] TOKEN WARN ${requestId}: Input grew ${pct}% (${lastTokenSnapshot.inputTokens} → ${current.inputTokens}). Possible full replay.`);
-        }
-        if (current.outputTokens > lastTokenSnapshot.outputTokens * 2 && current.outputTokens > 2000) {
-            console.warn(`[dario] TOKEN WARN ${requestId}: Output explosion ${current.outputTokens} tokens (${Math.round(current.outputTokens / lastTokenSnapshot.outputTokens)}x previous).`);
-        }
-    }
-    lastTokenSnapshot = current;
-}
-// Extended context fallback — cooldown after 1M context failure
-let extendedContextUnavailableAt = 0;
-const EXTENDED_CONTEXT_COOLDOWN_MS = 60 * 60 * 1000; // 1 hour
 // OpenAI model names → Anthropic (fallback if client sends GPT names)
 const OPENAI_MODEL_MAP = {
     'gpt-5.4': 'claude-opus-4-6',
@@ -361,14 +365,25 @@ async function handleViaCli(body, model, verbose) {
             const historyText = history.map(m => `${m.role}: ${typeof m.content === 'string' ? m.content : JSON.stringify(m.content)}`).join('\n');
             systemPrompt = systemPrompt ? `${systemPrompt}\n\nConversation history:\n${historyText}` : `Conversation history:\n${historyText}`;
         }
+        // Write system prompt to temp file instead of passing as arg to avoid E2BIG
+        // on large conversation contexts (OS arg size limit ~2MB)
+        let systemPromptFile = null;
         if (systemPrompt) {
-            args.push('--append-system-prompt', systemPrompt);
+            systemPromptFile = join(tmpdir(), `dario-sysprompt-${randomUUID()}.txt`);
+            writeFileSync(systemPromptFile, systemPrompt, { mode: 0o600 });
+            args.push('--append-system-prompt-file', systemPromptFile);
         }
         if (verbose) {
             console.log(`[dario:cli] model=${effectiveModel} prompt=${prompt.substring(0, 60)}...`);
         }
         // Spawn claude --print
         return new Promise((resolve) => {
+            // Cleanup temp file when done
+            const cleanup = () => { if (systemPromptFile)
+                try {
+                    unlinkSync(systemPromptFile);
+                }
+                catch { } };
             const child = spawn('claude', args, {
                 stdio: ['pipe', 'pipe', 'pipe'],
                 timeout: 300_000,
@@ -383,6 +398,7 @@ async function handleViaCli(body, model, verbose) {
             child.stdin.write(prompt);
             child.stdin.end();
             child.on('close', (code) => {
+                cleanup();
                 if (code !== 0 || !stdout.trim()) {
                     resolve({
                         status: 502,
@@ -410,6 +426,7 @@ async function handleViaCli(body, model, verbose) {
                 resolve({ status: 200, body: JSON.stringify(response), contentType: 'application/json' });
             });
             child.on('error', (err) => {
+                cleanup();
                 resolve({
                     status: 502,
                     body: JSON.stringify({ type: 'error', error: { type: 'api_error', message: 'Claude CLI not found. Install Claude Code first.' } }),
@@ -436,8 +453,7 @@ export async function startProxy(opts = {}) {
         console.error('[dario] Not authenticated. Run `dario login` first.');
         process.exit(1);
     }
-    const cliVersion = detectClaudeVersion();
-    cliAvailable = detectCliAvailable();
+    const cliVersion = detectCli();
     const modelOverride = opts.model ? (MODEL_ALIASES[opts.model] ?? opts.model) : null;
     const identity = loadClaudeIdentity();
     if (identity.deviceId) {
@@ -610,41 +626,7 @@ export async function startProxy(opts = {}) {
                 }
                 const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
                 requestCount++;
-                if (cliResult.status >= 200 && cliResult.status < 300 && clientWantsStream) {
-                    // Client requested streaming — convert CLI JSON to SSE
-                    if (isOpenAI) {
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
-                            const ts = Math.floor(Date.now() / 1000);
-                            let sseData = `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n`;
-                            sseData += `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
-                            res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                            res.end(sseData);
-                        }
-                        catch {
-                            res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                            res.end(cliResult.body);
-                        }
-                    }
-                    else {
-                        const sseData = jsonToSse(cliResult.body);
-                        res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                        res.end(sseData);
-                    }
-                }
-                else {
-                    // Non-streaming or error — translate and return as JSON
-                    if (isOpenAI && cliResult.status >= 200 && cliResult.status < 300) {
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
-                        }
-                        catch { /* send as-is */ }
-                    }
-                    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                    res.end(cliResult.body);
-                }
+                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
                 return;
             }
             // Parse body once, apply OpenAI translation, model override, and sanitization
@@ -654,10 +636,6 @@ export async function startProxy(opts = {}) {
                     const parsed = JSON.parse(body.toString());
                     // Strip orchestration tags from messages (Aider, Cursor, etc.)
                     sanitizeMessages(parsed);
-                    // Handle 1M context: strip [1m] suffix if in cooldown
-                    if (modelOverride?.includes('[1m]') && extendedContextUnavailableAt > 0 && Date.now() - extendedContextUnavailableAt < EXTENDED_CONTEXT_COOLDOWN_MS) {
-                        parsed.model = modelOverride.replace('[1m]', '');
-                    }
                     const result = isOpenAI ? openaiToAnthropic(parsed, modelOverride) : (modelOverride ? { ...parsed, model: modelOverride } : parsed);
                     const r = result;
                     // In passthrough mode, skip all Claude-specific injection — OAuth swap only
@@ -687,7 +665,8 @@ export async function startProxy(opts = {}) {
                             r.service_tier = 'auto';
                         }
                         // Set reasoning effort (pass through client value or default)
-                        if (!r.output_config) {
+                        // Haiku does not support the effort parameter
+                        if (supportsThinking && !r.output_config) {
                             r.output_config = { effort: 'high' };
                         }
                         // Enable context management (matches Claude Code default)
@@ -774,74 +753,19 @@ export async function startProxy(opts = {}) {
                 res.end(enriched);
                 return;
             }
-            // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary.
-            // The CLI gets priority routing from Anthropic's server — a separate rate limit pool
-            // that continues working when the direct API quota is exhausted for expensive models.
+            // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary
             if (upstream.status === 429 && cliAvailable && !useCli) {
-                // Drain the upstream response
                 await upstream.text().catch(() => { });
                 if (verbose)
                     console.log(`[dario] #${requestCount} 429 from API — falling back to CLI`);
-                // Determine if the client requested streaming
                 let clientWantsStream = false;
-                if (body.length > 0) {
-                    try {
-                        const p = JSON.parse(body.toString());
-                        clientWantsStream = !!p.stream;
-                    }
-                    catch { }
+                try {
+                    clientWantsStream = !!JSON.parse(body.toString()).stream;
                 }
+                catch { }
                 const cliResult = await handleViaCli(body, modelOverride, verbose);
                 requestCount++;
-                if (cliResult.status >= 200 && cliResult.status < 300) {
-                    if (isOpenAI) {
-                        // Translate to OpenAI format
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
-                        }
-                        catch { }
-                    }
-                    if (clientWantsStream && !isOpenAI) {
-                        // Client requested SSE streaming — convert CLI JSON to SSE events
-                        const sseData = jsonToSse(cliResult.body);
-                        res.writeHead(200, {
-                            'Content-Type': 'text/event-stream',
-                            'Access-Control-Allow-Origin': corsOrigin,
-                            ...SECURITY_HEADERS,
-                        });
-                        res.end(sseData);
-                    }
-                    else if (clientWantsStream && isOpenAI) {
-                        // OpenAI streaming — convert Anthropic JSON to OpenAI SSE
-                        try {
-                            const parsed = JSON.parse(cliResult.body);
-                            const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
-                            const ts = Math.floor(Date.now() / 1000);
-                            let sseData = `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n`;
-                            sseData += `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
-                            res.writeHead(200, {
-                                'Content-Type': 'text/event-stream',
-                                'Access-Control-Allow-Origin': corsOrigin,
-                                ...SECURITY_HEADERS,
-                            });
-                            res.end(sseData);
-                        }
-                        catch {
-                            res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                            res.end(cliResult.body);
-                        }
-                    }
-                    else {
-                        res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                        res.end(cliResult.body);
-                    }
-                }
-                else {
-                    // CLI also failed — return the CLI error
-                    res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                    res.end(cliResult.body);
-                }
+                sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
                 return;
             }
             // Detect streaming from content-type (reliable) or body (fallback)
@@ -916,21 +840,6 @@ export async function startProxy(opts = {}) {
             else {
                 // Buffer and forward
                 const responseBody = await upstream.text();
-                // Check for extended context failure — cooldown to avoid repeated failures
-                if (upstream.status === 400 && responseBody.includes('extra_usage') && modelOverride?.includes('[1m]')) {
-                    extendedContextUnavailableAt = Date.now();
-                    console.warn('[dario] 1M context requires Extra Usage — falling back to standard context for 1 hour');
-                }
-                // Token anomaly detection on non-streaming responses
-                if (upstream.status >= 200 && upstream.status < 300) {
-                    try {
-                        const parsed = JSON.parse(responseBody);
-                        const usage = parsed.usage;
-                        if (usage)
-                            checkTokenAnomalies(usage, responseHeaders['request-id'] ?? '');
-                    }
-                    catch { /* ignore parse errors */ }
-                }
                 if (isOpenAI && upstream.status >= 200 && upstream.status < 300) {
                     // Translate Anthropic response → OpenAI format
                     try {

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "2.8.0",
+  "version": "2.8.2",
   "description": "Use your Claude subscription as an API. No API key needed. Local proxy for Claude Max/Pro subscriptions.",
   "type": "module",
   "bin": {
@@ -25,7 +25,8 @@
     "prepublishOnly": "npm run build",
     "start": "node dist/cli.js",
     "dev": "tsx src/cli.ts",
-    "e2e": "node test/e2e.mjs"
+    "e2e": "node test/e2e.mjs",
+    "compat": "node test/compat.mjs"
   },
   "keywords": [
     "claude",