@askalf/dario 2.7.1 → 2.8.1

package/README.md CHANGED
@@ -193,13 +193,23 @@ Model: claude-opus-4-6 (all requests)
 
  **Trade-offs vs direct API mode:**
 
- | | Direct API (default) | CLI Backend (`--cli`) |
- |---|---|---|
- | Streaming | Yes | No (full response) |
- | Tool use passthrough | Yes | No |
- | Latency | Low | Higher (process spawn) |
- | Rate limits | Subject to 5h/7d quotas | Not affected |
- | Opus when throttled | May return 429 | **Works** |
+ | | Direct API (default) | CLI Backend (`--cli`) | Passthrough (`--passthrough`) |
+ |---|---|---|---|
+ | Streaming | Native SSE | SSE (converted from JSON) | Native SSE |
+ | Tool use | Yes | No | Yes |
+ | Thinking/billing injection | Yes (Claude-optimized) | N/A | No (OAuth swap only) |
+ | Latency | Low | Higher (process spawn) | Low |
+ | Rate limits | Priority routing | Not affected | Standard (no priority) |
+ | Opus when throttled | Auto CLI fallback | **Always works** | May return 429 |
+
+ ## Passthrough Mode
+
+ For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity, use `--passthrough`. This does OAuth swap only — no billing tag, no thinking injection, no device identity, no extra beta flags.
+
+ ```bash
+ dario proxy --passthrough # Thin proxy, zero injection
+ dario proxy --passthrough --model=opus # Thin proxy + model override
+ ```
 
  ## Model Selection
 
@@ -285,7 +295,7 @@ const message = await client.messages.create({
  });
  ```
 
- ### Streaming (direct API mode only)
+ ### Streaming
 
  ```bash
  curl http://localhost:3456/v1/messages \
@@ -351,6 +361,18 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
  └──────────┘     └─────────────────┘     └──────────────────┘
  ```
 
+ ### Passthrough Mode (`--passthrough`)
+
+ ```
+ ┌──────────┐     ┌─────────────────┐     ┌──────────────────┐
+ │ Your App │ ──> │ dario (proxy)   │ ──> │ api.anthropic.com│
+ │          │     │ localhost:3456  │     │                  │
+ │ sends    │     │ swaps API key   │     │ sees valid       │
+ │ API      │     │ for OAuth       │     │ OAuth bearer     │
+ │ request  │     │ nothing else    │     │ token            │
+ └──────────┘     └─────────────────┘     └──────────────────┘
+ ```
+
  1. **`dario login`** — Detects your existing Claude Code credentials (`~/.claude/.credentials.json`) and starts the proxy automatically. If Claude Code isn't installed, runs a PKCE OAuth flow with a local callback server to capture the token automatically.
 
  2. **`dario proxy`** — Starts an HTTP server on localhost that implements the Anthropic Messages API. In direct mode, it swaps your API key for an OAuth bearer token. In CLI mode, it routes through the Claude Code binary.
@@ -373,6 +395,7 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
  | Flag/Env | Description | Default |
  |----------|-------------|---------|
  | `--cli` | Use Claude CLI as backend (bypasses rate limits) | off |
+ | `--passthrough` | Thin proxy — OAuth swap only, no injection | off |
  | `--model=MODEL` | Force a model (`opus`, `sonnet`, `haiku`, or full ID) | passthrough |
  | `--port=PORT` | Port to listen on | `3456` |
  | `--verbose` / `-v` | Log every request | off |
@@ -383,8 +406,10 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
  ### Direct API Mode
  - All Claude models (Opus 4.6, Sonnet 4.6, Haiku 4.5) + 1M extended context aliases (`opus1m`, `sonnet1m`)
  - **Native billing classification** — device identity metadata ensures Max plan limits work correctly
- - **Priority routing** — billing tag injection + `service_tier: auto` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
- - **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning
+ - **Priority routing** — billing tag injection + `service_tier: 'auto'` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
+ - **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning (auto-skipped for Haiku 4.5)
+ - **Effort control** — injects `output_config: { effort: 'high' }` by default, or passes through client-specified effort level
+ - **Enriched 429 errors** — rate limit errors include utilization %, limiting window, and reset time instead of Anthropic's default `"Error"` message
  - **Auto CLI fallback** — if the API returns 429 and Claude Code is installed, transparently retries through `claude --print` with SSE conversion
  - **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool
  - Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming)
@@ -399,10 +424,17 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
 
  ### CLI Backend Mode
  - All Claude models — including Opus when rate limited
- - Non-streaming responses
+ - Streaming via SSE conversion (client sends `stream: true`, CLI JSON response is converted to Anthropic or OpenAI SSE events)
+ - OpenAI compatibility (translates OpenAI → Anthropic before CLI, Anthropic → OpenAI after)
  - System prompts and multi-turn conversations (via context injection)
  - Not affected by API rate limits
 
+ ### Passthrough Mode
+ - All Claude models with native streaming and tool use
+ - OAuth token swap only — no billing tag, thinking, effort, service_tier, or device identity injection
+ - Minimal beta flags (`oauth-2025-04-20` + client betas only)
+ - For tools like Hermes or OpenClaw that need exact Anthropic protocol fidelity
+
  ## Endpoints
 
  | Path | Description |
@@ -459,7 +491,7 @@ Recommended but not required. If Claude Code is installed and logged in, `dario
  Dario auto-refreshes tokens 30 minutes before expiry. You should never see an auth error in normal use. If something goes wrong, `dario refresh` forces an immediate refresh.
 
  **I'm getting rate limited on Opus. What do I do?**
- Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
+ Use `--cli` mode: `dario proxy --cli`. This routes through the Claude Code binary, which continues working when direct API calls are rate limited. In default mode, dario automatically falls back to CLI when it detects a 429 (if Claude Code is installed). Rate limit errors include utilization percentages and reset times so you can see exactly when capacity returns. You can also enable [extra usage](https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans) in your Anthropic account settings to extend your limits at API rates.
 
  **What are the usage limits?**
  Claude subscriptions have rolling 5-hour and 7-day usage windows shared across claude.ai and Claude Code. See [Anthropic's docs](https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work) for details. In Claude Code, use `/usage` to check your current limits, or configure the [statusline](https://code.claude.com/docs/en/statusline) to show real-time 5h and 7d utilization percentages.
@@ -483,6 +515,9 @@ await startProxy({ port: 3456, verbose: true });
  // CLI backend mode
  await startProxy({ port: 3456, cliBackend: true, model: "opus" });
 
+ // Passthrough mode (OAuth swap only, no injection)
+ await startProxy({ port: 3456, passthrough: true });
+
  // Or just get a raw access token
  const token = await getAccessToken();
 
package/dist/cli.js CHANGED
@@ -9,10 +9,10 @@
  * dario refresh — Force token refresh
  * dario logout — Remove saved credentials
  */
- import { readFile, unlink } from 'node:fs/promises';
+ import { unlink } from 'node:fs/promises';
  import { join } from 'node:path';
  import { homedir } from 'node:os';
- import { startAutoOAuthFlow, getStatus, refreshTokens } from './oauth.js';
+ import { startAutoOAuthFlow, getStatus, refreshTokens, loadCredentials } from './oauth.js';
  import { startProxy, sanitizeError } from './proxy.js';
  const args = process.argv.slice(2);
  const command = args[0] ?? 'proxy';
@@ -21,22 +21,14 @@ async function login() {
  console.log(' dario — Claude Login');
  console.log(' ───────────────────');
  console.log('');
- // Check if Claude Code credentials exist
- const ccPath = join(homedir(), '.claude', '.credentials.json');
- try {
- const raw = await readFile(ccPath, 'utf-8');
- const parsed = JSON.parse(raw);
- if (parsed?.claudeAiOauth?.accessToken && parsed?.claudeAiOauth?.refreshToken) {
- const expiresAt = parsed.claudeAiOauth.expiresAt;
- if (expiresAt > Date.now()) {
- console.log(' Found Claude Code credentials. Starting proxy...');
- console.log('');
- await proxy();
- return;
- }
- }
+ // Check for existing credentials (Claude Code or dario's own)
+ const creds = await loadCredentials();
+ if (creds?.claudeAiOauth?.accessToken && creds.claudeAiOauth.expiresAt > Date.now()) {
+ console.log(' Found credentials. Starting proxy...');
+ console.log('');
+ await proxy();
+ return;
  }
- catch { /* no Claude Code credentials, fall through to OAuth */ }
  console.log(' No Claude Code credentials found. Starting OAuth flow...');
  console.log('');
  try {
@@ -112,9 +104,10 @@ async function proxy() {
  }
  const verbose = args.includes('--verbose') || args.includes('-v');
  const cliBackend = args.includes('--cli');
+ const passthrough = args.includes('--passthrough') || args.includes('--thin');
  const modelArg = args.find(a => a.startsWith('--model='));
  const model = modelArg ? modelArg.split('=')[1] : undefined;
- await startProxy({ port, verbose, model, cliBackend });
+ await startProxy({ port, verbose, model, cliBackend, passthrough });
  }
  async function help() {
  console.log(`
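The flag handling added to `proxy()` in this hunk can be sketched on its own. The helper name `parseProxyArgs` is hypothetical, and port parsing is left out because it is not part of the hunk. One detail worth noticing: `--thin` is accepted as an alias, although the help text and README rows shown in this diff list only `--passthrough`.

```javascript
// Hypothetical helper (not in the package): mirrors the flag checks from the hunk above.
function parseProxyArgs(argv) {
  const modelArg = argv.find((a) => a.startsWith('--model='));
  return {
    verbose: argv.includes('--verbose') || argv.includes('-v'),
    cliBackend: argv.includes('--cli'),
    // Either spelling switches the proxy into passthrough mode.
    passthrough: argv.includes('--passthrough') || argv.includes('--thin'),
    model: modelArg ? modelArg.split('=')[1] : undefined,
  };
}

console.log(parseProxyArgs(['--thin', '--model=opus']));
```

Because the checks are plain `includes` calls, flag order does not matter, and nothing in the lines shown here rejects `--cli` combined with `--passthrough`.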
@@ -133,6 +126,7 @@ async function help() {
  Full IDs: claude-opus-4-6, claude-sonnet-4-6
  Default: passthrough (client decides)
  --cli Use Claude CLI as backend (bypasses rate limits)
+ --passthrough Thin proxy — OAuth swap only, no injection
  --port=PORT Port to listen on (default: 3456)
  --verbose, -v Log all requests
 
@@ -155,12 +149,11 @@ async function help() {
  `);
  }
  async function version() {
- const { readFile } = await import('node:fs/promises');
- const { fileURLToPath } = await import('node:url');
- const { dirname, join } = await import('node:path');
  try {
- const dir = dirname(fileURLToPath(import.meta.url));
- const pkg = JSON.parse(await readFile(join(dir, '..', 'package.json'), 'utf-8'));
+ const { fileURLToPath } = await import('node:url');
+ const { readFile: rf } = await import('node:fs/promises');
+ const dir = join(fileURLToPath(import.meta.url), '..', '..');
+ const pkg = JSON.parse(await rf(join(dir, 'package.json'), 'utf-8'));
  console.log(pkg.version);
  }
  catch {
package/dist/oauth.js CHANGED
@@ -5,7 +5,7 @@
  * Handles authorization, token exchange, storage, and auto-refresh.
  */
  import { randomBytes, createHash } from 'node:crypto';
- import { readFile, writeFile, mkdir, chmod, rename } from 'node:fs/promises';
+ import { readFile, writeFile, mkdir, rename } from 'node:fs/promises';
  import { dirname, join } from 'node:path';
  import { homedir } from 'node:os';
  // Claude Code's public OAuth client (PKCE, no secret needed)
@@ -62,11 +62,6 @@ async function saveCredentials(creds) {
  const tmpPath = `${path}.tmp.${Date.now()}`;
  await writeFile(tmpPath, JSON.stringify(creds, null, 2), { mode: 0o600 });
  await rename(tmpPath, path);
- // Set permissions (best-effort — no-op on Windows where mode is ignored)
- try {
- await chmod(path, 0o600);
- }
- catch { /* Windows ignores file modes */ }
  // Invalidate cache so next read picks up the new tokens
  credentialsCache = creds;
  credentialsCacheTime = Date.now();
@@ -222,10 +217,10 @@ async function doRefreshTokens() {
  }
  const data = await res.json();
  const tokens = {
- ...oauth,
  accessToken: data.access_token,
  refreshToken: data.refresh_token,
  expiresAt: Date.now() + data.expires_in * 1000,
+ scopes: oauth.scopes,
  };
  await saveCredentials({ claudeAiOauth: tokens });
  return tokens;
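The effect of replacing the `...oauth` spread with an explicit `scopes` field can be illustrated with a small sketch. The helper name `buildRefreshedTokens`, the `now` parameter, and every sample value are assumptions made for the example; `legacyField` stands for any extra key an older credentials record might have contained.

```javascript
// Hypothetical helper wrapping the object literal from doRefreshTokens in the hunk above.
function buildRefreshedTokens(oauth, data, now = Date.now()) {
  return {
    accessToken: data.access_token,
    refreshToken: data.refresh_token,
    expiresAt: now + data.expires_in * 1000,
    scopes: oauth.scopes, // the only value carried over from the previous record
  };
}

// Made-up values for illustration only.
const previousRecord = {
  accessToken: 'old', refreshToken: 'old-r', expiresAt: 1,
  scopes: ['example:scope'], legacyField: 'x',
};
const refreshedRecord = buildRefreshedTokens(
  previousRecord,
  { access_token: 'new', refresh_token: 'new-r', expires_in: 3600 },
  0,
);
console.log(refreshedRecord);
```

Under this reading, any key other than `scopes` on the old record is no longer written back after a refresh, whereas the 2.7.1 spread would have preserved it.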
package/dist/proxy.d.ts CHANGED
@@ -3,6 +3,7 @@ interface ProxyOptions {
  verbose?: boolean;
  model?: string;
  cliBackend?: boolean;
+ passthrough?: boolean;
  }
  export declare function sanitizeError(err: unknown): string;
  export declare function startProxy(opts?: ProxyOptions): Promise<void>;
package/dist/proxy.js CHANGED
@@ -1,7 +1,7 @@
  import { createServer } from 'node:http';
  import { randomUUID, timingSafeEqual } from 'node:crypto';
  import { execSync, spawn } from 'node:child_process';
- import { readFileSync } from 'node:fs';
+ import { readFileSync, readdirSync } from 'node:fs';
  import { join } from 'node:path';
  import { homedir } from 'node:os';
  import { arch, platform, version as nodeVersion } from 'node:process';
@@ -35,27 +35,19 @@ class Semaphore {
  next();
  }
  }
- // Detect installed Claude Code binary at startup
- function detectClaudeVersion() {
+ // Detect installed Claude Code binary at startup (single exec for both version + availability)
+ let cliAvailable = false;
+ function detectCli() {
  try {
  const out = execSync('claude --version', { timeout: 5000, stdio: 'pipe' }).toString().trim();
- const match = out.match(/^([\d.]+)/);
- return match?.[1] ?? '2.1.96';
+ cliAvailable = true;
+ return out.match(/^([\d.]+)/)?.[1] ?? '2.1.96';
  }
  catch {
+ cliAvailable = false;
  return '2.1.96';
  }
  }
- let cliAvailable = false;
- function detectCliAvailable() {
- try {
- execSync('claude --version', { timeout: 5000, stdio: 'pipe' });
- return true;
- }
- catch {
- return false;
- }
- }
  /** Convert a non-streaming Messages API response to SSE event stream. */
  function jsonToSse(jsonBody) {
  try {
@@ -86,6 +78,40 @@ function jsonToSse(jsonBody) {
  return '';
  }
  }
+ /** Convert CLI JSON response to OpenAI SSE format. */
+ function jsonToOpenaiSse(jsonBody) {
+ try {
+ const parsed = JSON.parse(jsonBody);
+ const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
+ const ts = Math.floor(Date.now() / 1000);
+ return `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n` +
+ `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
+ }
+ catch {
+ return '';
+ }
+ }
+ /** Send a CLI result to the client, handling streaming/format translation. */
+ function sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, securityHeaders) {
+ const headers = { 'Access-Control-Allow-Origin': corsOrigin, ...securityHeaders };
+ const ok = cliResult.status >= 200 && cliResult.status < 300;
+ if (ok && clientWantsStream) {
+ const sseData = isOpenAI ? jsonToOpenaiSse(cliResult.body) : jsonToSse(cliResult.body);
+ if (sseData) {
+ res.writeHead(200, { 'Content-Type': 'text/event-stream', ...headers });
+ res.end(sseData);
+ return;
+ }
+ }
+ if (ok && isOpenAI) {
+ try {
+ cliResult.body = JSON.stringify(anthropicToOpenai(JSON.parse(cliResult.body)));
+ }
+ catch { }
+ }
+ res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, ...headers });
+ res.end(cliResult.body);
+ }
  const SESSION_ID = randomUUID();
  const OS_NAME = platform === 'win32' ? 'Windows' : platform === 'darwin' ? 'MacOS' : 'Linux';
  // Claude Code device identity — required for Max plan billing classification.
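To see what a client receives from the new OpenAI SSE conversion, here is a minimal sketch that produces the same two-chunk stream and reads it back. `toOpenaiSse` follows the shape of `jsonToOpenaiSse` in the hunk above, with a `ts` parameter added (an assumption, for repeatable output) and without the `try`/`catch`; `readOpenaiSse` is a hypothetical client-side reader and not part of the package.

```javascript
// Same event shape as jsonToOpenaiSse in the diff; `ts` is an added parameter.
function toOpenaiSse(jsonBody, ts = Math.floor(Date.now() / 1000)) {
  const parsed = JSON.parse(jsonBody);
  const text = parsed.content?.find((c) => c.type === 'text')?.text ?? '';
  const chunk = (delta, finishReason) => ({
    id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude',
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  });
  return `data: ${JSON.stringify(chunk({ content: text }, null))}\n\n` +
    `data: ${JSON.stringify(chunk({}, 'stop'))}\n\ndata: [DONE]\n\n`;
}

// Hypothetical reader: concatenates delta.content across events until [DONE].
function readOpenaiSse(sse) {
  let text = '';
  for (const event of sse.split('\n\n')) {
    if (!event.startsWith('data: ')) continue; // skips the trailing empty piece
    const payload = event.slice('data: '.length);
    if (payload === '[DONE]') break;
    text += JSON.parse(payload).choices[0].delta.content ?? '';
  }
  return text;
}

const sampleSse = toOpenaiSse(
  JSON.stringify({ content: [{ type: 'text', text: 'Hello from the CLI' }] }),
  0,
);
console.log(readOpenaiSse(sampleSse));
```

The whole reply arrives in a single `delta`, so a client sees valid SSE framing but no incremental tokens, which matches the README's "SSE (converted from JSON)" wording for CLI mode.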
@@ -100,7 +126,7 @@ function loadClaudeIdentity() {
  // Also check backup files as fallback
  try {
  const backupDir = join(homedir(), '.claude', 'backups');
- const files = require('fs').readdirSync(backupDir);
+ const files = readdirSync(backupDir);
  const backups = files
  .filter((f) => f.startsWith('.claude.json.backup.'))
  .sort()
@@ -180,28 +206,6 @@ function sanitizeMessages(body) {
  }
  }
  }
- let lastTokenSnapshot = null;
- function checkTokenAnomalies(usage, requestId) {
- const current = {
- inputTokens: usage.input_tokens ?? 0,
- outputTokens: usage.output_tokens ?? 0,
- cacheRead: usage.cache_read_input_tokens ?? 0,
- };
- if (lastTokenSnapshot && lastTokenSnapshot.inputTokens > 0) {
- const growth = (current.inputTokens - lastTokenSnapshot.inputTokens) / lastTokenSnapshot.inputTokens;
- if (growth > 0.6) {
- const pct = Math.round(growth * 100);
- console.warn(`[dario] TOKEN WARN ${requestId}: Input grew ${pct}% (${lastTokenSnapshot.inputTokens} → ${current.inputTokens}). Possible full replay.`);
- }
- if (current.outputTokens > lastTokenSnapshot.outputTokens * 2 && current.outputTokens > 2000) {
- console.warn(`[dario] TOKEN WARN ${requestId}: Output explosion ${current.outputTokens} tokens (${Math.round(current.outputTokens / lastTokenSnapshot.outputTokens)}x previous).`);
- }
- }
- lastTokenSnapshot = current;
- }
- // Extended context fallback — cooldown after 1M context failure
- let extendedContextUnavailableAt = 0;
- const EXTENDED_CONTEXT_COOLDOWN_MS = 60 * 60 * 1000; // 1 hour
  // OpenAI model names → Anthropic (fallback if client sends GPT names)
  const OPENAI_MODEL_MAP = {
  'gpt-5.4': 'claude-opus-4-6',
@@ -301,6 +305,37 @@ export function sanitizeError(err) {
  .replace(/eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+/g, '[REDACTED_JWT]')
  .replace(/Bearer\s+[^\s,;]+/gi, 'Bearer [REDACTED]');
  }
+ /**
+ * Enrich Anthropic's unhelpful 429 "Error" body with rate limit details from headers.
+ */
+ function enrich429(body, headers) {
+ try {
+ const parsed = JSON.parse(body);
+ const err = parsed.error;
+ if (err && (err.message === 'Error' || !err.message)) {
+ const claim = headers.get('anthropic-ratelimit-unified-representative-claim') || 'unknown';
+ const status = headers.get('anthropic-ratelimit-unified-status') || 'rejected';
+ const util5h = headers.get('anthropic-ratelimit-unified-5h-utilization');
+ const util7d = headers.get('anthropic-ratelimit-unified-7d-utilization');
+ const reset = headers.get('anthropic-ratelimit-unified-reset');
+ const parts = [`Rate limited (${status}). Limiting window: ${claim}`];
+ if (util5h)
+ parts.push(`5h utilization: ${Math.round(parseFloat(util5h) * 100)}%`);
+ if (util7d)
+ parts.push(`7d utilization: ${Math.round(parseFloat(util7d) * 100)}%`);
+ if (reset) {
+ const resetDate = new Date(parseInt(reset) * 1000);
+ const mins = Math.max(0, Math.round((resetDate.getTime() - Date.now()) / 60000));
+ parts.push(`resets in ${mins}m`);
+ }
+ err.message = parts.join('. ');
+ }
+ return JSON.stringify(parsed);
+ }
+ catch {
+ return body;
+ }
+ }
  /**
  * CLI Backend: route requests through `claude --print` instead of direct API.
  * This bypasses rate limiting because Claude Code's binary has priority routing.
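The message that `enrich429` builds is easier to picture with concrete input. The sketch below restates the same steps in compact form and adds a `now` parameter (an assumption, not in the diff) so the "resets in" value is deterministic. All header values are made up; only the header names come from the hunk. It relies on the global `Headers` class, available in Node 18 and later.

```javascript
// Compact restatement of the enrichment logic; `now` is an added parameter.
function enrich429Sketch(body, headers, now = Date.now()) {
  try {
    const parsed = JSON.parse(body);
    const err = parsed.error;
    if (err && (err.message === 'Error' || !err.message)) {
      const get = (name) => headers.get(`anthropic-ratelimit-unified-${name}`);
      const parts = [
        `Rate limited (${get('status') || 'rejected'}). Limiting window: ${get('representative-claim') || 'unknown'}`,
      ];
      for (const win of ['5h', '7d']) {
        const util = get(`${win}-utilization`);
        if (util) parts.push(`${win} utilization: ${Math.round(parseFloat(util) * 100)}%`);
      }
      const reset = get('reset'); // treated as Unix seconds, as in the diff
      if (reset) {
        const mins = Math.max(0, Math.round((parseInt(reset, 10) * 1000 - now) / 60000));
        parts.push(`resets in ${mins}m`);
      }
      err.message = parts.join('. ');
    }
    return JSON.stringify(parsed);
  } catch {
    return body; // body was not JSON: forward it unchanged
  }
}

// Made-up header values for illustration only.
const sampleHeaders = new Headers({
  'anthropic-ratelimit-unified-status': 'rejected',
  'anthropic-ratelimit-unified-representative-claim': 'five_hour',
  'anthropic-ratelimit-unified-5h-utilization': '0.93',
  'anthropic-ratelimit-unified-reset': '1800',
});
const sampleBody = JSON.stringify({ type: 'error', error: { type: 'rate_limit_error', message: 'Error' } });
console.log(JSON.parse(enrich429Sketch(sampleBody, sampleHeaders, 0)).error.message);
```

A body whose `error.message` already says something other than `"Error"` is left as it is, so the enrichment only fills in for the uninformative default.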
@@ -398,14 +433,14 @@ async function handleViaCli(body, model, verbose) {
  export async function startProxy(opts = {}) {
  const port = opts.port ?? DEFAULT_PORT;
  const verbose = opts.verbose ?? false;
+ const passthrough = opts.passthrough ?? false;
  // Verify auth before starting
  const status = await getStatus();
  if (!status.authenticated) {
  console.error('[dario] Not authenticated. Run `dario login` first.');
  process.exit(1);
  }
- const cliVersion = detectClaudeVersion();
- cliAvailable = detectCliAvailable();
+ const cliVersion = detectCli();
  const modelOverride = opts.model ? (MODEL_ALIASES[opts.model] ?? opts.model) : null;
  const identity = loadClaudeIdentity();
  if (identity.deviceId) {
@@ -415,8 +450,11 @@ export async function startProxy(opts = {}) {
  console.warn('[dario] WARNING: No Claude Code device identity found. Requests may be billed as Extra Usage.');
  console.warn('[dario] Run Claude Code at least once to generate ~/.claude/.claude.json');
  }
- // Pre-build static headers (only auth, version, beta, request-id change per request)
- const staticHeaders = {
+ // Pre-build static headers
+ const staticHeaders = passthrough ? {
+ 'accept': 'application/json',
+ 'Content-Type': 'application/json',
+ } : {
  'accept': 'application/json',
  'Content-Type': 'application/json',
  'anthropic-dangerous-direct-browser-access': 'true',
@@ -556,30 +594,26 @@ export async function startProxy(opts = {}) {
  // CLI backend mode: route through claude --print (works for both Anthropic and OpenAI endpoints)
  if (useCli && req.method === 'POST' && body.length > 0) {
  let cliBody = body;
+ let clientWantsStream = false;
  // Translate OpenAI format before passing to CLI
  if (isOpenAI) {
  try {
  const parsed = JSON.parse(body.toString());
+ clientWantsStream = !!parsed.stream;
  cliBody = Buffer.from(JSON.stringify(openaiToAnthropic(parsed, modelOverride)));
  }
  catch { /* send as-is */ }
  }
- const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
- requestCount++;
- // Translate CLI response back to OpenAI format if needed
- if (isOpenAI && cliResult.status >= 200 && cliResult.status < 300) {
+ else {
  try {
- const parsed = JSON.parse(cliResult.body);
- cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
+ const parsed = JSON.parse(body.toString());
+ clientWantsStream = !!parsed.stream;
  }
- catch { /* send as-is */ }
+ catch { }
  }
- res.writeHead(cliResult.status, {
- 'Content-Type': cliResult.contentType,
- 'Access-Control-Allow-Origin': corsOrigin,
- ...SECURITY_HEADERS,
- });
- res.end(cliResult.body);
+ const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
+ requestCount++;
+ sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
  return;
  }
  // Parse body once, apply OpenAI translation, model override, and sanitization
@@ -589,60 +623,64 @@ export async function startProxy(opts = {}) {
  const parsed = JSON.parse(body.toString());
  // Strip orchestration tags from messages (Aider, Cursor, etc.)
  sanitizeMessages(parsed);
- // Handle 1M context: strip [1m] suffix if in cooldown
- if (modelOverride?.includes('[1m]') && extendedContextUnavailableAt > 0 && Date.now() - extendedContextUnavailableAt < EXTENDED_CONTEXT_COOLDOWN_MS) {
- parsed.model = modelOverride.replace('[1m]', '');
- }
  const result = isOpenAI ? openaiToAnthropic(parsed, modelOverride) : (modelOverride ? { ...parsed, model: modelOverride } : parsed);
  const r = result;
- // Inject device identity metadata for session tracking
- if (identity.deviceId) {
- r.metadata = {
- user_id: JSON.stringify({
- device_id: identity.deviceId,
- account_uuid: identity.accountUuid,
- session_id: SESSION_ID,
- }),
- };
- }
- // Enable adaptive thinking for models that support it (Opus/Sonnet 4.6+)
- // Haiku 4.5 does not support thinking at all
- const modelName = (r.model || '').toLowerCase();
- const supportsThinking = !modelName.includes('haiku');
- if (supportsThinking && !r.thinking) {
- r.thinking = { type: 'adaptive' };
- // Ensure max_tokens is reasonable for thinking models
- const clientMax = r.max_tokens || 8192;
- r.max_tokens = Math.max(clientMax, 16000);
- }
- // Request priority capacity when available
- if (!r.service_tier) {
- r.service_tier = 'auto';
- }
- // Enable context management (matches Claude Code default)
- // Requires thinking to be enabled — skip for models without thinking support (e.g. Haiku)
- if (supportsThinking && !r.context_management) {
- r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
- }
- // Inject Claude Code billing header into system prompt.
- // Anthropic uses this to route requests through priority rate limiting
- // instead of the general API quota. Without it, Opus/Sonnet get 429
- // when overall utilization is high, even though model-specific limits
- // have headroom. The CLI binary embeds this in its system prompt.
- const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
- if (typeof r.system === 'string') {
- if (!r.system.includes('x-anthropic-billing-header:')) {
- r.system = billingTag + '\n' + r.system;
+ // In passthrough mode, skip all Claude-specific injection — OAuth swap only
+ if (!passthrough) {
+ // Inject device identity metadata for session tracking
+ if (identity.deviceId) {
+ r.metadata = {
+ user_id: JSON.stringify({
+ device_id: identity.deviceId,
+ account_uuid: identity.accountUuid,
+ session_id: SESSION_ID,
+ }),
+ };
  }
- }
- else if (Array.isArray(r.system)) {
- const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
- if (!hasTag) {
- r.system.unshift({ type: 'text', text: billingTag });
+ // Enable adaptive thinking for models that support it (Opus/Sonnet 4.6+)
+ // Haiku 4.5 does not support thinking at all
+ const modelName = (r.model || '').toLowerCase();
+ const supportsThinking = !modelName.includes('haiku');
+ if (supportsThinking && !r.thinking) {
+ r.thinking = { type: 'adaptive' };
+ // Ensure max_tokens is reasonable for thinking models
+ const clientMax = r.max_tokens || 8192;
+ r.max_tokens = Math.max(clientMax, 16000);
+ }
+ // Request priority capacity when available
+ if (!r.service_tier) {
+ r.service_tier = 'auto';
+ }
+ // Set reasoning effort (pass through client value or default)
+ // Haiku does not support the effort parameter
+ if (supportsThinking && !r.output_config) {
+ r.output_config = { effort: 'high' };
+ }
+ // Enable context management (matches Claude Code default)
+ // Requires thinking to be enabled — skip for models without thinking support (e.g. Haiku)
+ if (supportsThinking && !r.context_management) {
+ r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
+ }
+ // Inject Claude Code billing header into system prompt.
+ // Anthropic uses this to route requests through priority rate limiting
+ // instead of the general API quota. Without it, Opus/Sonnet get 429
+ // when overall utilization is high, even though model-specific limits
+ // have headroom. The CLI binary embeds this in its system prompt.
+ const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
+ if (typeof r.system === 'string') {
+ if (!r.system.includes('x-anthropic-billing-header:')) {
+ r.system = billingTag + '\n' + r.system;
+ }
+ }
+ else if (Array.isArray(r.system)) {
+ const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
+ if (!hasTag) {
+ r.system.unshift({ type: 'text', text: billingTag });
+ }
+ }
+ else {
+ r.system = billingTag;
  }
- }
- else {
- r.system = billingTag;
  }
  finalBody = Buffer.from(JSON.stringify(r));
  }
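The field defaults applied in default mode, and skipped under `--passthrough`, can be condensed into a small sketch. `shapeRequest` is a hypothetical name; the billing tag and device metadata steps are left out to keep it short, and the model strings are example inputs. Note that in the hunk above `sanitizeMessages` and the model override run before the `if (!passthrough)` guard, so those two steps appear to apply in passthrough mode as well.

```javascript
// Hypothetical condensation of the injection block; billing tag and metadata omitted.
function shapeRequest(request, { passthrough = false } = {}) {
  const r = { ...request };
  if (passthrough) return r; // no fields are added in passthrough mode
  const supportsThinking = !(r.model || '').toLowerCase().includes('haiku');
  if (supportsThinking && !r.thinking) {
    r.thinking = { type: 'adaptive' };
    // Raise small client limits so thinking has room; larger values are kept.
    r.max_tokens = Math.max(r.max_tokens || 8192, 16000);
  }
  if (!r.service_tier) r.service_tier = 'auto';
  if (supportsThinking && !r.output_config) r.output_config = { effort: 'high' };
  if (supportsThinking && !r.context_management) {
    r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
  }
  return r;
}

console.log(shapeRequest({ model: 'claude-opus-4-6', max_tokens: 1024 }));
console.log(shapeRequest({ model: 'claude-haiku-4-5', max_tokens: 1024 }));
```

Every default is guarded by a "not already set" check, so a client that sends its own `thinking`, `service_tier`, or `output_config` keeps its values.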
@@ -652,15 +690,23 @@ export async function startProxy(opts = {}) {
  const modelInfo = modelOverride ? ` (model: ${modelOverride})` : '';
  console.log(`[dario] #${requestCount} ${req.method} ${urlPath}${modelInfo}`);
  }
- // Beta defaults — matches native Claude Code v2.1.98 headers exactly.
- // Billing classification is determined by the OAuth token alone, not beta flags.
- // context-management and prompt-caching-scope are safe for all subscription types.
+ // Beta headers
  const clientBeta = req.headers['anthropic-beta'];
- let beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
- if (clientBeta) {
- const filtered = filterBillableBetas(clientBeta);
- if (filtered)
- beta += ',' + filtered;
+ let beta;
+ if (passthrough) {
+ // Passthrough: only add oauth beta, forward client betas as-is
+ beta = 'oauth-2025-04-20';
+ if (clientBeta)
+ beta += ',' + clientBeta;
+ }
+ else {
+ // Claude-optimized: full beta set matching CLI v2.1.100
+ beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
+ if (clientBeta) {
+ const filtered = filterBillableBetas(clientBeta);
+ if (filtered)
+ beta += ',' + filtered;
+ }
  }
  const headers = {
  ...staticHeaders,
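The two ways the `anthropic-beta` value is assembled can be compared side by side in a short sketch. `buildBetaHeader` is a hypothetical name, and because the body of `filterBillableBetas` is not part of this diff, the filter is passed in as a parameter; `dropExampleBeta` is a made-up stand-in, not the real filter.

```javascript
// Default-mode beta list, copied from the hunk above.
const DEFAULT_BETAS = [
  'oauth-2025-04-20', 'interleaved-thinking-2025-05-14', 'context-management-2025-06-27',
  'prompt-caching-scope-2026-01-05', 'claude-code-20250219', 'advisor-tool-2026-03-01',
  'effort-2025-11-24',
].join(',');

// Hypothetical helper: `filterBetas` plays the role of filterBillableBetas.
function buildBetaHeader(clientBeta, passthrough, filterBetas) {
  if (passthrough) {
    // Only the OAuth beta is added; client betas are forwarded unfiltered.
    return clientBeta ? `oauth-2025-04-20,${clientBeta}` : 'oauth-2025-04-20';
  }
  const filtered = clientBeta ? filterBetas(clientBeta) : '';
  return filtered ? `${DEFAULT_BETAS},${filtered}` : DEFAULT_BETAS;
}

// Made-up stand-in filter: drops one invented beta name.
const dropExampleBeta = (list) =>
  list.split(',').filter((b) => b.trim() !== 'example-billable-beta').join(',');

console.log(buildBetaHeader('example-client-beta', true, dropExampleBeta));
console.log(buildBetaHeader('example-billable-beta', false, dropExampleBeta));
```

One consequence of forwarding client betas unchanged is that a client which already sends `oauth-2025-04-20` would have it listed twice in passthrough mode; the diff does not show any deduplication.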
@@ -675,74 +721,38 @@ export async function startProxy(opts = {}) {
      body: finalBody ? new Uint8Array(finalBody) : undefined,
      signal: AbortSignal.timeout(UPSTREAM_TIMEOUT_MS),
  });
- // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary.
- // The CLI gets priority routing from Anthropic's server — a separate rate limit pool
- // that continues working when the direct API quota is exhausted for expensive models.
+ // Enrich 429 errors with rate limit details from headers (Anthropic only returns "Error")
+ if (upstream.status === 429 && !(cliAvailable && !useCli)) {
+     const errBody = await upstream.text().catch(() => '');
+     const enriched = enrich429(errBody, upstream.headers);
+     const responseHeaders = {
+         'Content-Type': 'application/json',
+         'Access-Control-Allow-Origin': corsOrigin,
+         ...SECURITY_HEADERS,
+     };
+     for (const [key, value] of upstream.headers.entries()) {
+         if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
+             responseHeaders[key] = value;
+         }
+     }
+     requestCount++;
+     res.writeHead(429, responseHeaders);
+     res.end(enriched);
+     return;
+ }
+ // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary
  if (upstream.status === 429 && cliAvailable && !useCli) {
-     // Drain the upstream response
      await upstream.text().catch(() => { });
      if (verbose)
          console.log(`[dario] #${requestCount} 429 from API — falling back to CLI`);
-     // Determine if the client requested streaming
      let clientWantsStream = false;
-     if (body.length > 0) {
-         try {
-             const p = JSON.parse(body.toString());
-             clientWantsStream = !!p.stream;
-         }
-         catch { }
+     try {
+         clientWantsStream = !!JSON.parse(body.toString()).stream;
      }
+     catch { }
      const cliResult = await handleViaCli(body, modelOverride, verbose);
      requestCount++;
-     if (cliResult.status >= 200 && cliResult.status < 300) {
-         if (isOpenAI) {
-             // Translate to OpenAI format
-             try {
-                 const parsed = JSON.parse(cliResult.body);
-                 cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
-             }
-             catch { }
-         }
-         if (clientWantsStream && !isOpenAI) {
-             // Client requested SSE streaming — convert CLI JSON to SSE events
-             const sseData = jsonToSse(cliResult.body);
-             res.writeHead(200, {
-                 'Content-Type': 'text/event-stream',
-                 'Access-Control-Allow-Origin': corsOrigin,
-                 ...SECURITY_HEADERS,
-             });
-             res.end(sseData);
-         }
-         else if (clientWantsStream && isOpenAI) {
-             // OpenAI streaming — convert Anthropic JSON to OpenAI SSE
-             try {
-                 const parsed = JSON.parse(cliResult.body);
-                 const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
-                 const ts = Math.floor(Date.now() / 1000);
-                 let sseData = `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n`;
-                 sseData += `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
-                 res.writeHead(200, {
-                     'Content-Type': 'text/event-stream',
-                     'Access-Control-Allow-Origin': corsOrigin,
-                     ...SECURITY_HEADERS,
-                 });
-                 res.end(sseData);
-             }
-             catch {
-                 res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-                 res.end(cliResult.body);
-             }
-         }
-         else {
-             res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-             res.end(cliResult.body);
-         }
-     }
-     else {
-         // CLI also failed — return the CLI error
-         res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
-         res.end(cliResult.body);
-     }
+     sendCliResponse(res, cliResult, clientWantsStream, isOpenAI, corsOrigin, SECURITY_HEADERS);
      return;
  }
  // Detect streaming from content-type (reliable) or body (fallback)
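The header-forwarding loop added in the 429 branch above can be illustrated on its own. This is a minimal sketch under stated assumptions: `pickRateLimitHeaders` is a hypothetical helper name (the diff inlines this loop), and the sample header values are made up. It relies on the standard `Headers` class, available as a global in Node 18 and later, which lowercases header names so the prefix checks match.

```javascript
// Sketch only: pickRateLimitHeaders is a hypothetical helper, not part of dario.
// Copies only rate-limit and request-id headers from an upstream response.
function pickRateLimitHeaders(upstreamHeaders) {
  const picked = {};
  for (const [key, value] of upstreamHeaders.entries()) {
    if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
      picked[key] = value;
    }
  }
  return picked;
}

// Sample values are invented for illustration.
const sample = new Headers({
  'Anthropic-RateLimit-Requests-Remaining': '0',
  'Request-Id': 'req_example',
  'Content-Type': 'application/json',
});
const picked = pickRateLimitHeaders(sample);
console.log(picked);
```

Forwarding an allowlist instead of every upstream header keeps the proxy's own `Content-Type`, CORS, and security headers authoritative on the 429 response.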
@@ -817,21 +827,6 @@ export async function startProxy(opts = {}) {
  else {
      // Buffer and forward
      const responseBody = await upstream.text();
-     // Check for extended context failure — cooldown to avoid repeated failures
-     if (upstream.status === 400 && responseBody.includes('extra_usage') && modelOverride?.includes('[1m]')) {
-         extendedContextUnavailableAt = Date.now();
-         console.warn('[dario] 1M context requires Extra Usage — falling back to standard context for 1 hour');
-     }
-     // Token anomaly detection on non-streaming responses
-     if (upstream.status >= 200 && upstream.status < 300) {
-         try {
-             const parsed = JSON.parse(responseBody);
-             const usage = parsed.usage;
-             if (usage)
-                 checkTokenAnomalies(usage, responseHeaders['request-id'] ?? '');
-         }
-         catch { /* ignore parse errors */ }
-     }
      if (isOpenAI && upstream.status >= 200 && upstream.status < 300) {
          // Translate Anthropic response → OpenAI format
          try {
@@ -869,7 +864,7 @@ export async function startProxy(opts = {}) {
      process.exit(1);
  });
  server.listen(port, LOCALHOST, () => {
-     const oauthLine = useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
+     const modeLine = passthrough ? 'Mode: passthrough (OAuth swap only, no injection)' : useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
      const modelLine = modelOverride ? `Model: ${modelOverride} (all requests)` : 'Model: passthrough (client decides)';
      console.log('');
      console.log(` dario — http://localhost:${port}`);
@@ -880,7 +875,7 @@ export async function startProxy(opts = {}) {
      console.log(` ANTHROPIC_BASE_URL=http://localhost:${port}`);
      console.log(' ANTHROPIC_API_KEY=dario');
      console.log('');
-     console.log(` ${oauthLine}`);
+     console.log(` ${modeLine}`);
      console.log(` ${modelLine}`);
      console.log('');
  });
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@askalf/dario",
-   "version": "2.7.1",
+   "version": "2.8.1",
    "description": "Use your Claude subscription as an API. No API key needed. Local proxy for Claude Max/Pro subscriptions.",
    "type": "module",
    "bin": {
@@ -24,7 +24,8 @@
      "audit": "npm audit --production --audit-level=high",
      "prepublishOnly": "npm run build",
      "start": "node dist/cli.js",
-     "dev": "tsx src/cli.ts"
+     "dev": "tsx src/cli.ts",
+     "e2e": "node test/e2e.mjs"
    },
    "keywords": [
      "claude",