@askalf/dario 2.7.0 → 2.8.0

package/README.md CHANGED
@@ -70,14 +70,15 @@ Opus, Sonnet, Haiku — all models, streaming, tool use. Works with Cursor, Cont
 
  Most Claude subscription proxies have a critical billing problem: **Anthropic classifies their requests as third-party and routes all usage to Extra Usage billing** — even when you have Max plan limits available. You're paying for your subscription twice.
 
- dario is the only proxy that solves this. It injects native Claude Code device identity (`metadata.user_id`) into every request, so Anthropic's billing system treats your requests exactly like Claude Code itself. Your Max plan limits work correctly.
+ dario is the only proxy that solves this. It injects native Claude Code device identity, billing classification tags, and priority routing into every request so Anthropic's billing system treats your requests exactly like Claude Code itself. Your Max plan limits work correctly, and Opus/Sonnet stay available even at high utilization.
 
  | | dario | Other proxies |
  |---|---|---|
  | **Billing classification** | Native Claude Code session | Third-party (Extra Usage) |
  | **Max plan limits** | Used correctly | Bypassed — billed separately |
  | **Device identity** | Injected automatically | Missing |
- | **Beta flags** | Match Claude Code v2.1.98 | Outdated or missing |
+ | **Priority routing** | Billing tag + service_tier auto | Missing |
+ | **Beta flags** | Match Claude Code v2.1.100 | Outdated or missing |
  | **Billable beta filtering** | Strips surprise charges | Passes everything through |
 
  <details>
@@ -91,7 +92,7 @@ dario is the only proxy that solves this. It injects native Claude Code device i
  | OpenAI API compat | **Yes** | Yes | Yes | Yes |
  | Orchestration sanitization | **Yes** | Yes | No | No |
  | Token anomaly detection | **Yes** | Yes | No | No |
- | Codebase size | ~1,200 lines | ~9,000 lines | Platform | Rust binary |
+ | Codebase size | ~1,500 lines | ~9,000 lines | Platform | Rust binary |
  | Dependencies | 1 | Many | Many | Compiled |
  | Setup | 2 commands | Config + build | Config + dashboard | Config |
 
@@ -382,6 +383,9 @@ Then run `hermes` normally — it routes through dario using your Claude subscri
  ### Direct API Mode
  - All Claude models (Opus 4.6, Sonnet 4.6, Haiku 4.5) + 1M extended context aliases (`opus1m`, `sonnet1m`)
  - **Native billing classification** — device identity metadata ensures Max plan limits work correctly
+ - **Priority routing** — billing tag injection + `service_tier: auto` activates per-model rate limits, keeping Opus/Sonnet available even at 100% overall utilization
+ - **Adaptive thinking** — matches Claude Code's `{ type: 'adaptive' }` mode for optimal reasoning
+ - **Auto CLI fallback** — if the API returns 429 and Claude Code is installed, transparently retries through `claude --print` with SSE conversion
  - **OpenAI-compatible** (`/v1/chat/completions`) — works with any OpenAI SDK or tool
  - Streaming and non-streaming (both Anthropic and OpenAI SSE formats, including tool_use streaming)
  - Tool use / function calling
@@ -493,7 +497,7 @@ Dario handles your OAuth tokens. Here's why you can trust it:
 
  | Signal | Status |
  |--------|--------|
- | **Source code** | ~1,300 lines of TypeScript — small enough to audit in one sitting |
+ | **Source code** | ~1,500 lines of TypeScript — small enough to audit in one sitting |
  | **Dependencies** | 1 production dep (`@anthropic-ai/sdk`). Verify: `npm ls --production` |
  | **npm provenance** | Every release is [SLSA attested](https://www.npmjs.com/package/@askalf/dario) via GitHub Actions |
  | **Security scanning** | [CodeQL](https://github.com/askalf/dario/actions/workflows/codeql.yml) runs on every push and weekly |
@@ -515,7 +519,7 @@ cd $(npm root -g)/@askalf/dario && npm ls --production
 
  ## Contributing
 
- PRs welcome. The codebase is ~1,300 lines of TypeScript across 4 files:
+ PRs welcome. The codebase is ~1,500 lines of TypeScript across 4 files:
 
  | File | Purpose |
  |------|---------|
@@ -536,7 +540,7 @@ npm run dev # runs with tsx (no build needed)
  | Who | Contributions |
  |-----|---------------|
  | [@GodsBoy](https://github.com/GodsBoy) | Proxy authentication, token redaction, error sanitization ([#2](https://github.com/askalf/dario/pull/2)) |
- | [@belangertrading](https://github.com/belangertrading) | Billing classification investigation — reported, tested 5 versions, confirmed fix via response header analysis ([#4](https://github.com/askalf/dario/issues/4)) |
+ | [@belangertrading](https://github.com/belangertrading) | Billing classification investigation ([#4](https://github.com/askalf/dario/issues/4)), Opus/Sonnet 429 diagnosis + CLI fallback workaround ([#6](https://github.com/askalf/dario/issues/6)) |
 
  ## Also by AskAlf
 
package/dist/cli.js CHANGED
@@ -112,9 +112,10 @@ async function proxy() {
  }
  const verbose = args.includes('--verbose') || args.includes('-v');
  const cliBackend = args.includes('--cli');
+ const passthrough = args.includes('--passthrough') || args.includes('--thin');
  const modelArg = args.find(a => a.startsWith('--model='));
  const model = modelArg ? modelArg.split('=')[1] : undefined;
- await startProxy({ port, verbose, model, cliBackend });
+ await startProxy({ port, verbose, model, cliBackend, passthrough });
  }
  async function help() {
  console.log(`
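The flag handling in the hunk above can be read in isolation as the following minimal sketch. The name `parseProxyArgs` is hypothetical and not part of the package; port parsing is omitted because the diff does not show it. Note that `--thin` is accepted as an alias for `--passthrough`, although the help text in the next hunk lists only `--passthrough`.

```javascript
// Hypothetical helper (not in the package): mirrors the flag checks
// visible in proxy() in the hunk above.
function parseProxyArgs(args) {
  const verbose = args.includes('--verbose') || args.includes('-v');
  const cliBackend = args.includes('--cli');
  // --thin behaves as an alias for --passthrough.
  const passthrough = args.includes('--passthrough') || args.includes('--thin');
  const modelArg = args.find(a => a.startsWith('--model='));
  const model = modelArg ? modelArg.split('=')[1] : undefined;
  return { verbose, cliBackend, passthrough, model };
}

console.log(parseProxyArgs(['--thin', '--model=opus']));
```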
@@ -133,6 +134,7 @@ async function help() {
  Full IDs: claude-opus-4-6, claude-sonnet-4-6
  Default: passthrough (client decides)
  --cli Use Claude CLI as backend (bypasses rate limits)
+ --passthrough Thin proxy — OAuth swap only, no injection
  --port=PORT Port to listen on (default: 3456)
  --verbose, -v Log all requests
 
package/dist/proxy.d.ts CHANGED
@@ -3,6 +3,7 @@ interface ProxyOptions {
  verbose?: boolean;
  model?: string;
  cliBackend?: boolean;
+ passthrough?: boolean;
  }
  export declare function sanitizeError(err: unknown): string;
  export declare function startProxy(opts?: ProxyOptions): Promise<void>;
package/dist/proxy.js CHANGED
@@ -301,6 +301,37 @@ export function sanitizeError(err) {
  .replace(/eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+/g, '[REDACTED_JWT]')
  .replace(/Bearer\s+[^\s,;]+/gi, 'Bearer [REDACTED]');
  }
+ /**
+ * Enrich Anthropic's unhelpful 429 "Error" body with rate limit details from headers.
+ */
+ function enrich429(body, headers) {
+ try {
+ const parsed = JSON.parse(body);
+ const err = parsed.error;
+ if (err && (err.message === 'Error' || !err.message)) {
+ const claim = headers.get('anthropic-ratelimit-unified-representative-claim') || 'unknown';
+ const status = headers.get('anthropic-ratelimit-unified-status') || 'rejected';
+ const util5h = headers.get('anthropic-ratelimit-unified-5h-utilization');
+ const util7d = headers.get('anthropic-ratelimit-unified-7d-utilization');
+ const reset = headers.get('anthropic-ratelimit-unified-reset');
+ const parts = [`Rate limited (${status}). Limiting window: ${claim}`];
+ if (util5h)
+ parts.push(`5h utilization: ${Math.round(parseFloat(util5h) * 100)}%`);
+ if (util7d)
+ parts.push(`7d utilization: ${Math.round(parseFloat(util7d) * 100)}%`);
+ if (reset) {
+ const resetDate = new Date(parseInt(reset) * 1000);
+ const mins = Math.max(0, Math.round((resetDate.getTime() - Date.now()) / 60000));
+ parts.push(`resets in ${mins}m`);
+ }
+ err.message = parts.join('. ');
+ }
+ return JSON.stringify(parsed);
+ }
+ catch {
+ return body;
+ }
+ }
  /**
  * CLI Backend: route requests through `claude --print` instead of direct API.
  * This bypasses rate limiting because Claude Code's binary has priority routing.
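To see what the new `enrich429` helper produces, the sketch below reproduces the function from the hunk above (re-indented only) and calls it with a WHATWG `Headers` object. The header values and the shape of the sample error body are illustrative assumptions, not captured from a real Anthropic response; the `reset` header is left out so the result does not depend on the current time.

```javascript
// Copied from the hunk above, re-indented only.
function enrich429(body, headers) {
  try {
    const parsed = JSON.parse(body);
    const err = parsed.error;
    if (err && (err.message === 'Error' || !err.message)) {
      const claim = headers.get('anthropic-ratelimit-unified-representative-claim') || 'unknown';
      const status = headers.get('anthropic-ratelimit-unified-status') || 'rejected';
      const util5h = headers.get('anthropic-ratelimit-unified-5h-utilization');
      const util7d = headers.get('anthropic-ratelimit-unified-7d-utilization');
      const reset = headers.get('anthropic-ratelimit-unified-reset');
      const parts = [`Rate limited (${status}). Limiting window: ${claim}`];
      if (util5h) parts.push(`5h utilization: ${Math.round(parseFloat(util5h) * 100)}%`);
      if (util7d) parts.push(`7d utilization: ${Math.round(parseFloat(util7d) * 100)}%`);
      if (reset) {
        const resetDate = new Date(parseInt(reset) * 1000);
        const mins = Math.max(0, Math.round((resetDate.getTime() - Date.now()) / 60000));
        parts.push(`resets in ${mins}m`);
      }
      err.message = parts.join('. ');
    }
    return JSON.stringify(parsed);
  } catch {
    return body; // non-JSON bodies are returned untouched
  }
}

// Illustrative input only: the body shape and header values are assumptions.
const sampleBody = JSON.stringify({ type: 'error', error: { type: 'rate_limit_error', message: 'Error' } });
const sampleHeaders = new Headers({ 'anthropic-ratelimit-unified-5h-utilization': '0.97' });
console.log(JSON.parse(enrich429(sampleBody, sampleHeaders)).error.message);
// prints "Rate limited (rejected). Limiting window: unknown. 5h utilization: 97%"
```

A message other than the literal `'Error'` is left alone, so a future, more descriptive upstream body would pass through unchanged.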
@@ -398,6 +429,7 @@ async function handleViaCli(body, model, verbose) {
  export async function startProxy(opts = {}) {
  const port = opts.port ?? DEFAULT_PORT;
  const verbose = opts.verbose ?? false;
+ const passthrough = opts.passthrough ?? false;
  // Verify auth before starting
  const status = await getStatus();
  if (!status.authenticated) {
@@ -415,8 +447,11 @@ export async function startProxy(opts = {}) {
  console.warn('[dario] WARNING: No Claude Code device identity found. Requests may be billed as Extra Usage.');
  console.warn('[dario] Run Claude Code at least once to generate ~/.claude/.claude.json');
  }
- // Pre-build static headers (only auth, version, beta, request-id change per request)
- const staticHeaders = {
+ // Pre-build static headers
+ const staticHeaders = passthrough ? {
+ 'accept': 'application/json',
+ 'Content-Type': 'application/json',
+ } : {
  'accept': 'application/json',
  'Content-Type': 'application/json',
  'anthropic-dangerous-direct-browser-access': 'true',
@@ -556,30 +591,60 @@ export async function startProxy(opts = {}) {
  // CLI backend mode: route through claude --print (works for both Anthropic and OpenAI endpoints)
  if (useCli && req.method === 'POST' && body.length > 0) {
  let cliBody = body;
+ let clientWantsStream = false;
  // Translate OpenAI format before passing to CLI
  if (isOpenAI) {
  try {
  const parsed = JSON.parse(body.toString());
+ clientWantsStream = !!parsed.stream;
  cliBody = Buffer.from(JSON.stringify(openaiToAnthropic(parsed, modelOverride)));
  }
  catch { /* send as-is */ }
  }
+ else {
+ try {
+ const parsed = JSON.parse(body.toString());
+ clientWantsStream = !!parsed.stream;
+ }
+ catch { }
+ }
  const cliResult = await handleViaCli(cliBody, modelOverride, verbose);
  requestCount++;
- // Translate CLI response back to OpenAI format if needed
- if (isOpenAI && cliResult.status >= 200 && cliResult.status < 300) {
- try {
- const parsed = JSON.parse(cliResult.body);
- cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
+ if (cliResult.status >= 200 && cliResult.status < 300 && clientWantsStream) {
+ // Client requested streaming convert CLI JSON to SSE
+ if (isOpenAI) {
+ try {
+ const parsed = JSON.parse(cliResult.body);
+ const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
+ const ts = Math.floor(Date.now() / 1000);
+ let sseData = `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: { content: text }, finish_reason: null }] })}\n\n`;
+ sseData += `data: ${JSON.stringify({ id: 'chatcmpl-dario', object: 'chat.completion.chunk', created: ts, model: 'claude', choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] })}\n\ndata: [DONE]\n\n`;
+ res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
+ res.end(sseData);
+ }
+ catch {
+ res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
+ res.end(cliResult.body);
+ }
+ }
+ else {
+ const sseData = jsonToSse(cliResult.body);
+ res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
+ res.end(sseData);
  }
- catch { /* send as-is */ }
  }
- res.writeHead(cliResult.status, {
- 'Content-Type': cliResult.contentType,
- 'Access-Control-Allow-Origin': corsOrigin,
- ...SECURITY_HEADERS,
- });
- res.end(cliResult.body);
+ else {
+ // Non-streaming or error — translate and return as JSON
+ if (isOpenAI && cliResult.status >= 200 && cliResult.status < 300) {
+ try {
+ const parsed = JSON.parse(cliResult.body);
+ cliResult.body = JSON.stringify(anthropicToOpenai(parsed));
+ }
+ catch { /* send as-is */ }
+ }
+ res.writeHead(cliResult.status, { 'Content-Type': cliResult.contentType, 'Access-Control-Allow-Origin': corsOrigin, ...SECURITY_HEADERS });
+ res.end(cliResult.body);
+ }
  return;
  }
  // Parse body once, apply OpenAI translation, model override, and sanitization
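The OpenAI streaming branch added above does not stream incrementally: it wraps the complete CLI response in two `chat.completion.chunk` events followed by `[DONE]`. A minimal standalone sketch of that conversion follows. The function name `anthropicJsonToOpenAiSse` and the injectable `nowMs` parameter are hypothetical and added here only to make the behavior easy to test; the chunk fields are taken from the hunk.

```javascript
// Hypothetical helper (not in the package): mirrors the OpenAI branch above.
// Only the first text block of the Anthropic-format response is emitted.
function anthropicJsonToOpenAiSse(cliBody, nowMs = Date.now()) {
  const parsed = JSON.parse(cliBody);
  const text = parsed.content?.find(c => c.type === 'text')?.text ?? '';
  const base = {
    id: 'chatcmpl-dario',
    object: 'chat.completion.chunk',
    created: Math.floor(nowMs / 1000),
    model: 'claude',
  };
  // First event carries the whole text; second event signals completion.
  const first = { ...base, choices: [{ index: 0, delta: { content: text }, finish_reason: null }] };
  const last = { ...base, choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] };
  return `data: ${JSON.stringify(first)}\n\n` +
    `data: ${JSON.stringify(last)}\n\n` +
    'data: [DONE]\n\n';
}

const sse = anthropicJsonToOpenAiSse(
  JSON.stringify({ content: [{ type: 'text', text: 'hi' }] }),
  1700000000000,
);
console.log(sse);
```

One consequence visible in the hunk: content blocks other than the first `text` block (for example tool calls) are not represented in this synthetic stream.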
@@ -595,51 +660,61 @@ export async function startProxy(opts = {}) {
  }
  const result = isOpenAI ? openaiToAnthropic(parsed, modelOverride) : (modelOverride ? { ...parsed, model: modelOverride } : parsed);
  const r = result;
- // Inject device identity metadata for session tracking
- if (identity.deviceId) {
- r.metadata = {
- user_id: JSON.stringify({
- device_id: identity.deviceId,
- account_uuid: identity.accountUuid,
- session_id: SESSION_ID,
- }),
- };
- }
- // Enable adaptive thinking (matches Claude Code default)
- // adaptive lets the model decide when/how much to think — preferred for Opus/Sonnet 4.6
- if (!r.thinking) {
- r.thinking = { type: 'adaptive' };
- // Ensure max_tokens is reasonable for thinking models
- const clientMax = r.max_tokens || 8192;
- r.max_tokens = Math.max(clientMax, 16000);
- }
- // Request priority capacity when available
- if (!r.service_tier) {
- r.service_tier = 'auto';
- }
- // Enable context management (matches Claude Code default)
- if (!r.context_management) {
- r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
- }
- // Inject Claude Code billing header into system prompt.
- // Anthropic uses this to route requests through priority rate limiting
- // instead of the general API quota. Without it, Opus/Sonnet get 429
- // when overall utilization is high, even though model-specific limits
- // have headroom. The CLI binary embeds this in its system prompt.
- const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
- if (typeof r.system === 'string') {
- if (!r.system.includes('x-anthropic-billing-header:')) {
- r.system = billingTag + '\n' + r.system;
+ // In passthrough mode, skip all Claude-specific injection — OAuth swap only
+ if (!passthrough) {
+ // Inject device identity metadata for session tracking
+ if (identity.deviceId) {
+ r.metadata = {
+ user_id: JSON.stringify({
+ device_id: identity.deviceId,
+ account_uuid: identity.accountUuid,
+ session_id: SESSION_ID,
+ }),
+ };
  }
- }
- else if (Array.isArray(r.system)) {
- const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
- if (!hasTag) {
- r.system.unshift({ type: 'text', text: billingTag });
+ // Enable adaptive thinking for models that support it (Opus/Sonnet 4.6+)
+ // Haiku 4.5 does not support thinking at all
+ const modelName = (r.model || '').toLowerCase();
+ const supportsThinking = !modelName.includes('haiku');
+ if (supportsThinking && !r.thinking) {
+ r.thinking = { type: 'adaptive' };
+ // Ensure max_tokens is reasonable for thinking models
+ const clientMax = r.max_tokens || 8192;
+ r.max_tokens = Math.max(clientMax, 16000);
+ }
+ // Request priority capacity when available
+ if (!r.service_tier) {
+ r.service_tier = 'auto';
+ }
+ // Set reasoning effort (pass through client value or default)
+ if (!r.output_config) {
+ r.output_config = { effort: 'high' };
+ }
+ // Enable context management (matches Claude Code default)
+ // Requires thinking to be enabled — skip for models without thinking support (e.g. Haiku)
+ if (supportsThinking && !r.context_management) {
+ r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
+ }
+ // Inject Claude Code billing header into system prompt.
+ // Anthropic uses this to route requests through priority rate limiting
+ // instead of the general API quota. Without it, Opus/Sonnet get 429
+ // when overall utilization is high, even though model-specific limits
+ // have headroom. The CLI binary embeds this in its system prompt.
+ const billingTag = `x-anthropic-billing-header: cc_version=${cliVersion}; cc_entrypoint=cli; cch=98638;`;
+ if (typeof r.system === 'string') {
+ if (!r.system.includes('x-anthropic-billing-header:')) {
+ r.system = billingTag + '\n' + r.system;
+ }
+ }
+ else if (Array.isArray(r.system)) {
+ const hasTag = r.system.some(b => typeof b.text === 'string' && b.text.includes('x-anthropic-billing-header:'));
+ if (!hasTag) {
+ r.system.unshift({ type: 'text', text: billingTag });
+ }
+ }
+ else {
+ r.system = billingTag;
  }
- }
- else {
- r.system = billingTag;
  }
  finalBody = Buffer.from(JSON.stringify(r));
  }
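The behavioral change in this hunk, apart from the `passthrough` gate, is that thinking and context management are now skipped for Haiku models. The sketch below isolates that gating as a pure function. The name `applyThinkingDefaults` is hypothetical, and unlike the code in the diff (which mutates `r` in place) it works on a shallow copy; the model strings in the usage lines are illustrative.

```javascript
// Hypothetical helper (not in the package): isolates the 2.8.0 gating logic.
// The diff mutates the request in place; this sketch copies it instead.
function applyThinkingDefaults(request) {
  const r = { ...request };
  const modelName = (r.model || '').toLowerCase();
  // Any model whose name contains "haiku" is treated as not supporting thinking.
  const supportsThinking = !modelName.includes('haiku');
  if (supportsThinking && !r.thinking) {
    r.thinking = { type: 'adaptive' };
    // Raise max_tokens to at least 16000 when thinking is injected.
    const clientMax = r.max_tokens || 8192;
    r.max_tokens = Math.max(clientMax, 16000);
  }
  // Context management is only added alongside thinking support.
  if (supportsThinking && !r.context_management) {
    r.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
  }
  return r;
}

const opus = applyThinkingDefaults({ model: 'claude-opus-4-6', max_tokens: 1024 });
const haiku = applyThinkingDefaults({ model: 'haiku', max_tokens: 1024 });
console.log(opus.thinking, opus.max_tokens, haiku.thinking, haiku.max_tokens);
```

Because the check is a substring match, a request with no `model` field is treated as supporting thinking, and `max_tokens` is only raised when the proxy itself injects `thinking`.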
@@ -649,15 +724,23 @@ export async function startProxy(opts = {}) {
  const modelInfo = modelOverride ? ` (model: ${modelOverride})` : '';
  console.log(`[dario] #${requestCount} ${req.method} ${urlPath}${modelInfo}`);
  }
- // Beta defaults — matches native Claude Code v2.1.98 headers exactly.
- // Billing classification is determined by the OAuth token alone, not beta flags.
- // context-management and prompt-caching-scope are safe for all subscription types.
+ // Beta headers
  const clientBeta = req.headers['anthropic-beta'];
- let beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
- if (clientBeta) {
- const filtered = filterBillableBetas(clientBeta);
- if (filtered)
- beta += ',' + filtered;
+ let beta;
+ if (passthrough) {
+ // Passthrough: only add oauth beta, forward client betas as-is
+ beta = 'oauth-2025-04-20';
+ if (clientBeta)
+ beta += ',' + clientBeta;
+ }
+ else {
+ // Claude-optimized: full beta set matching CLI v2.1.100
+ beta = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';
+ if (clientBeta) {
+ const filtered = filterBillableBetas(clientBeta);
+ if (filtered)
+ beta += ',' + filtered;
+ }
  }
  const headers = {
  ...staticHeaders,
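The two ways the `anthropic-beta` value is now assembled can be compared side by side with a small sketch. `buildBetaHeader` is a hypothetical name, and because the body of `filterBillableBetas` is not part of this diff, the filter is passed in as a parameter; the `dropExample` stand-in and the flag name `example-billable-beta` are invented purely for illustration.

```javascript
// Beta list copied from the hunk above.
const CLAUDE_CODE_BETAS = 'oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,claude-code-20250219,advisor-tool-2026-03-01,effort-2025-11-24';

// Hypothetical helper (not in the package): mirrors the branch above.
function buildBetaHeader({ passthrough, clientBeta, filterBillableBetas }) {
  if (passthrough) {
    // Passthrough: oauth beta plus whatever the client sent, unfiltered.
    return clientBeta ? 'oauth-2025-04-20,' + clientBeta : 'oauth-2025-04-20';
  }
  let beta = CLAUDE_CODE_BETAS;
  if (clientBeta) {
    const filtered = filterBillableBetas(clientBeta);
    if (filtered) beta += ',' + filtered;
  }
  return beta;
}

// Stand-in filter (assumption): the real filterBillableBetas is not shown in
// the diff. This one drops a single made-up flag name.
const dropExample = list => list
  .split(',')
  .map(s => s.trim())
  .filter(flag => flag !== 'example-billable-beta')
  .join(',');

console.log(buildBetaHeader({ passthrough: true, clientBeta: 'example-billable-beta' }));
console.log(buildBetaHeader({ passthrough: false, clientBeta: 'example-billable-beta', filterBillableBetas: dropExample }));
```

The practical difference is that passthrough mode forwards client betas without the billable-beta filter, so the "Strips surprise charges" protection described in the README would not apply in that mode.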
@@ -672,6 +755,25 @@ export async function startProxy(opts = {}) {
  body: finalBody ? new Uint8Array(finalBody) : undefined,
  signal: AbortSignal.timeout(UPSTREAM_TIMEOUT_MS),
  });
+ // Enrich 429 errors with rate limit details from headers (Anthropic only returns "Error")
+ if (upstream.status === 429 && !(cliAvailable && !useCli)) {
+ const errBody = await upstream.text().catch(() => '');
+ const enriched = enrich429(errBody, upstream.headers);
+ const responseHeaders = {
+ 'Content-Type': 'application/json',
+ 'Access-Control-Allow-Origin': corsOrigin,
+ ...SECURITY_HEADERS,
+ };
+ for (const [key, value] of upstream.headers.entries()) {
+ if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
+ responseHeaders[key] = value;
+ }
+ }
+ requestCount++;
+ res.writeHead(429, responseHeaders);
+ res.end(enriched);
+ return;
+ }
  // Auto-fallback: if API returns 429 and CLI is available, retry through CLI binary.
  // The CLI gets priority routing from Anthropic's server — a separate rate limit pool
  // that continues working when the direct API quota is exhausted for expensive models.
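The condition `!(cliAvailable && !useCli)` is easy to misread, so here is a minimal sketch that spells it out together with the header filter from the same hunk. Both function names are hypothetical. Read against the auto-fallback comment that follows in the file, the enriched 429 is returned only when the CLI fallback will not run: either the CLI is not installed, or the proxy is already in CLI mode.

```javascript
// Hypothetical helper: same boolean expression as the hunk above.
function shouldEnrich429(status, cliAvailable, useCli) {
  return status === 429 && !(cliAvailable && !useCli);
}

// Hypothetical helper: keeps only the rate-limit headers and request-id,
// matching the loop in the hunk. Headers#entries() yields lowercase names.
function pickRateLimitHeaders(upstreamHeaders) {
  const out = {};
  for (const [key, value] of upstreamHeaders.entries()) {
    if (key.startsWith('x-ratelimit') || key.startsWith('anthropic-ratelimit') || key === 'request-id') {
      out[key] = value;
    }
  }
  return out;
}

// Truth table for a 429 response.
for (const cliAvailable of [false, true]) {
  for (const useCli of [false, true]) {
    console.log({ cliAvailable, useCli, enrich: shouldEnrich429(429, cliAvailable, useCli) });
  }
}
```

Only the combination `cliAvailable: true, useCli: false` skips enrichment, which leaves that case to the fallback path.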
@@ -866,7 +968,7 @@ export async function startProxy(opts = {}) {
  process.exit(1);
  });
  server.listen(port, LOCALHOST, () => {
- const oauthLine = useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
+ const modeLine = passthrough ? 'Mode: passthrough (OAuth swap only, no injection)' : useCli ? 'Backend: Claude CLI (bypasses rate limits)' : `OAuth: ${status.status} (expires in ${status.expiresIn})`;
  const modelLine = modelOverride ? `Model: ${modelOverride} (all requests)` : 'Model: passthrough (client decides)';
  console.log('');
  console.log(` dario — http://localhost:${port}`);
@@ -877,7 +979,7 @@ export async function startProxy(opts = {}) {
  console.log(` ANTHROPIC_BASE_URL=http://localhost:${port}`);
  console.log(' ANTHROPIC_API_KEY=dario');
  console.log('');
- console.log(` ${oauthLine}`);
+ console.log(` ${modeLine}`);
  console.log(` ${modelLine}`);
  console.log('');
  });
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@askalf/dario",
- "version": "2.7.0",
+ "version": "2.8.0",
  "description": "Use your Claude subscription as an API. No API key needed. Local proxy for Claude Max/Pro subscriptions.",
  "type": "module",
  "bin": {
@@ -24,7 +24,8 @@
  "audit": "npm audit --production --audit-level=high",
  "prepublishOnly": "npm run build",
  "start": "node dist/cli.js",
- "dev": "tsx src/cli.ts"
+ "dev": "tsx src/cli.ts",
+ "e2e": "node test/e2e.mjs"
  },
  "keywords": [
  "claude",