@askalf/dario 3.9.6 → 3.10.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -26,11 +26,13 @@
26
26
 
27
27
  Dario runs on your machine and gives every tool you use one local URL that reaches **every LLM you use.** Point Cursor, Continue, Aider, LiteLLM, your own scripts — anything that speaks the Anthropic or OpenAI API — at `http://localhost:3456`, and dario routes each request to the right backend:
28
28
 
29
- - **Claude Max / Pro subscriptions** — OAuth-backed, billed against your plan instead of API pricing. Multi-account pooling if you have more than one.
30
29
  - **OpenAI** — your API key, routed to `api.openai.com` straight through.
31
30
  - **Any OpenAI-compat endpoint** — OpenRouter, Groq, a local LiteLLM, Ollama's openai-compat mode, self-hosted vLLM. Set the backend's `baseUrl` once, done.
31
+ - **Claude Max / Pro subscriptions** — OAuth-backed, billed against your plan instead of API pricing. Multi-account pooling if you have more than one.
32
+
33
+ Your tool sees one base URL. `gpt-4o` goes to OpenAI. `llama-3-70b` goes to Groq. `claude-opus-4-6` goes to your Claude subscription. None of your tools have to know about any of it.
32
34
 
33
- Your tool sees one base URL. `gpt-4` goes to OpenAI. `claude-opus-4-6` goes to your Claude subscription. `llama-3-70b` goes to Groq. None of your tools have to know about any of it.
35
+ **Backends are plugins, not the product.** Dario's job is the one local endpoint your tools point at. Each backend is a swappable adapter behind it — when a provider ships, a backend entry lands, your tools don't change. That's the durable part.
34
36
 
35
37
  **No account anywhere is required.** Single-backend Claude dario works with nothing but `dario login`. Multi-backend dario works with nothing but local config files. Nothing phones home. Zero runtime dependencies. ~2,000 lines of TypeScript.
36
38
 
@@ -41,8 +43,9 @@ Your tool sees one base URL. `gpt-4` goes to OpenAI. `claude-opus-4-6` goes to y
41
43
  **Best fit:**
42
44
 
43
45
  - **Developers using multiple LLMs across multiple tools** who are tired of juggling base URLs, API keys, and per-tool provider configs.
44
- - **Claude Max or Pro subscribers** who want their subscription usable anywhere that speaks the Anthropic or OpenAI API — without paying API rates for every request.
45
46
  - **Teams running local or hosted OpenAI-compat servers** (LiteLLM, vLLM, Ollama, Groq, OpenRouter) who want one stable local endpoint in front of them that every tool can reuse.
47
+ - **Anyone who wants to switch providers without reconfiguring every tool** — change the model name in your tool, dario picks a different backend, your tool keeps working.
48
+ - **Claude Max or Pro subscribers** who want their subscription usable anywhere that speaks the Anthropic or OpenAI API — without paying API rates for every request.
46
49
  - **Power users running multi-agent workloads on Claude subscriptions** who want multi-account pooling with headroom-aware routing on their own machine, against their own subscriptions, without a hosted platform.
47
50
 
48
51
  **Not a fit:**
@@ -94,6 +97,8 @@ One URL. Your tool doesn't know or care which provider is answering.
94
97
 
95
98
  **Use dario if** you use more than one LLM provider, or more than one tool, or both — and you're tired of configuring each tool with a different base URL and API key per provider.
96
99
 
100
+ **Use dario if** you want provider independence. Switching from GPT-4o to Claude to Llama is a model-name change in your tool, not a reconfigure of every SDK and base URL you've got.
101
+
97
102
  **Use dario if** you pay for Claude Max or Pro and you want that subscription reachable from every tool on your machine, without paying API rates or opening a second billing surface.
98
103
 
99
104
  **Use dario pool mode if** you're running multi-agent workloads on Claude subscriptions and hitting per-account rate limits. Add 2–N accounts with `dario accounts add` and dario routes across them by per-account headroom, all on your machine, against your own subscriptions. See [Multi-Account Pool Mode](#multi-account-pool-mode).
@@ -133,7 +138,7 @@ Opus, Sonnet, Haiku, GPT-4o, o1, o3, o4, plus anything the configured OpenAI-com
133
138
 
134
139
  ## Backends
135
140
 
136
- Dario's routing is organized around **backends**, each with its own auth and its own target. v3.6.0 ships two backends, with more coming.
141
+ Dario's routing is organized around **backends**, each with its own auth and its own target. Backends are swappable adapters — add one, your tools reach it at `localhost:3456` with whatever API shape they already speak. v3.6.0 ships two backends, with more coming.
137
142
 
138
143
  ### 1. Claude subscription backend (built in)
139
144
 
@@ -247,7 +252,7 @@ curl http://localhost:3456/analytics # per-account / per-model stats, burn ra
247
252
  | `--passthrough` / `--thin` | Thin proxy for the Claude backend — OAuth swap only, no template injection | off |
248
253
  | `--preserve-tools` / `--keep-tools` | Keep client tool schemas instead of remapping to CC's `Bash/Read/Grep/Glob/WebSearch/WebFetch`. Required for clients whose tools have fields CC doesn't (`sessionId`, custom ids, etc.) — see [Custom tool schemas](#custom-tool-schemas). Trade-off: drops the CC request fingerprint. | off |
249
254
  | `--hybrid-tools` / `--context-inject` | Remap to CC tools **and** inject request-context values (`sessionId`, `requestId`, `channelId`, `userId`, `timestamp`) into client-declared fields CC's schema doesn't carry. Preserves the CC fingerprint while keeping custom schemas functional — see [Hybrid tool mode](#hybrid-tool-mode). Mutually exclusive with `--preserve-tools`. | off |
250
- | `--model=<name>` | Force a model (`opus`, `sonnet`, `haiku`, or full ID). Applies to the Claude backend. | passthrough |
255
+ | `--model=<name>` | Force a model. Shortcuts (`opus`, `sonnet`, `haiku`), full IDs (`claude-opus-4-6`), or a **provider prefix** (`openai:gpt-4o`, `groq:llama-3.3-70b`, `claude:opus`, `local:qwen-coder`) to force the backend server-wide. See [Provider prefix](#provider-prefix). | passthrough |
251
256
  | `--port=<n>` | Port to listen on | `3456` |
252
257
  | `--host=<addr>` / `DARIO_HOST` | Bind address. Use `0.0.0.0` for LAN, or a specific IP (e.g. a Tailscale interface). When non-loopback, also set `DARIO_API_KEY`. | `127.0.0.1` |
253
258
  | `--verbose` / `-v` | Log every request | off |
@@ -348,6 +353,44 @@ curl http://localhost:3456/v1/chat/completions \
348
353
 
349
354
  All supported. Claude backend: full Anthropic SSE format plus OpenAI-SSE translation for tool_use streaming. OpenAI-compat backend: streaming body forwarded byte-for-byte.
350
355
 
356
+ ### Provider prefix
357
+
358
+ Any request's `model` field can be written as `<provider>:<name>` to force which backend handles it, regardless of what the model name looks like. This is useful when regex-based routing (`gpt-*` → OpenAI, `claude-*` → Claude) doesn't match — for example when routing a `llama-3.3-70b` request through an OpenAI-compat backend, or when you want the same model name to go to different providers on different requests.
359
+
360
+ Recognized prefixes:
361
+
362
+ | Prefix | Backend |
363
+ |---|---|
364
+ | `openai:` | OpenAI-compat backend (the configured one) |
365
+ | `groq:` | OpenAI-compat backend |
366
+ | `openrouter:` | OpenAI-compat backend |
367
+ | `local:` | OpenAI-compat backend |
368
+ | `compat:` | OpenAI-compat backend |
369
+ | `claude:` | Claude subscription backend |
370
+ | `anthropic:` | Claude subscription backend |
371
+
372
+ Examples:
373
+
374
+ ```bash
375
+ # Force openai backend
376
+ curl http://localhost:3456/v1/chat/completions \
377
+ -H "Authorization: Bearer dario" \
378
+ -d '{"model":"openai:gpt-4o","messages":[{"role":"user","content":"hi"}]}'
379
+
380
+ # Force a non-gpt model through the openai-compat backend (e.g. OpenRouter)
381
+ curl http://localhost:3456/v1/chat/completions \
382
+ -H "Authorization: Bearer dario" \
383
+ -d '{"model":"openrouter:meta-llama/llama-3.1-70b-instruct","messages":[...]}'
384
+
385
+ # Force Claude subscription backend — same as `opus` shortcut but explicit
386
+ curl http://localhost:3456/v1/messages \
387
+ -d '{"model":"claude:opus","max_tokens":1024,"messages":[...]}'
388
+ ```
389
+
390
+ The prefix gets stripped before the request goes upstream — the backend only sees the bare model name. Unrecognized prefixes are ignored, so ollama-style `llama3:8b` passes through untouched.
391
+
392
+ **Server-wide override.** `dario proxy --model=openai:gpt-4o` applies the prefix to every request, regardless of what the client sends. Useful for "I want everything routed to this specific backend and model" without editing every tool's config.
393
+
351
394
  ### Custom tool schemas
352
395
 
353
396
  By default, on the Claude backend, dario replaces your client's tool definitions with the real Claude Code tools (`Bash`, `Read`, `Grep`, `Glob`, `WebSearch`, `WebFetch`) and translates parameters back and forth. That's how dario looks like CC on the wire, which is what lets your request bill against your Claude subscription instead of API pricing.
@@ -291,6 +291,26 @@ export function buildCCRequest(clientBody, billingTag, cache1h, identity, opts =
291
291
  }
292
292
  }
293
293
  }
294
+ // ── Drop trailing empty/assistant turns ──
295
+ // An assistant turn that was thinking-only before the strip above becomes
296
+ // content: []. Forwarding that shape makes Anthropic interpret the request
297
+ // as a prefill ("continue from this assistant text"), which Opus 4.6 under
298
+ // adaptive thinking + the claude-code beta refuses with:
299
+ // "This model does not support assistant message prefill. The
300
+ // conversation must end with a user message."
301
+ // Clients that preserve full thinking in history (OpenClaw, Hermes) hit
302
+ // this any time a prior turn was interrupted mid-thinking. Drop trailing
303
+ // assistant/empty turns so the request ends on a user message.
304
+ while (messages.length > 0) {
305
+ const last = messages[messages.length - 1];
306
+ const contentEmpty = Array.isArray(last.content) && last.content.length === 0;
307
+ const isTrailingAssistant = last.role === 'assistant';
308
+ if (contentEmpty || isTrailingAssistant) {
309
+ messages.pop();
310
+ continue;
311
+ }
312
+ break;
313
+ }
294
314
  // ── Build tool mapping ──
295
315
  // In preserveTools mode, skip the tool name/arg rewriting entirely.
296
316
  // Tool routing in real agents requires bidirectional schema fidelity that
package/dist/cli.js CHANGED
@@ -371,6 +371,8 @@ async function help() {
371
371
  --model=MODEL Force a model for all requests
372
372
  Shortcuts: opus, sonnet, haiku
373
373
  Full IDs: claude-opus-4-6, claude-sonnet-4-6
374
+ Provider prefix: openai:gpt-4o, groq:llama-3.3-70b,
375
+ claude:opus, local:qwen-coder (forces backend)
374
376
  Default: passthrough (client decides)
375
377
  --passthrough, --thin Thin proxy — OAuth swap only, no injection
376
378
  --preserve-tools Forward client tool schemas unchanged
package/dist/proxy.d.ts CHANGED
@@ -1,3 +1,7 @@
1
+ export declare function parseProviderPrefix(model: string): {
2
+ provider: 'openai' | 'claude';
3
+ model: string;
4
+ } | null;
1
5
  interface ProxyOptions {
2
6
  port?: number;
3
7
  host?: string;
package/dist/proxy.js CHANGED
@@ -134,6 +134,33 @@ const MODEL_ALIASES = {
134
134
  'sonnet1m': 'claude-sonnet-4-6[1m]',
135
135
  'haiku': 'claude-haiku-4-5',
136
136
  };
137
+ // Provider prefix in the `model` field — `<provider>:<model>`. Forces
138
+ // routing regardless of model-name regex. Only recognized prefixes are
139
+ // parsed, so ollama-style `llama3:8b` (without a recognized prefix)
140
+ // passes through untouched and reaches the configured openai-compat
141
+ // backend as-is.
142
+ const PROVIDER_PREFIXES = {
143
+ openai: 'openai',
144
+ openrouter: 'openai',
145
+ groq: 'openai',
146
+ compat: 'openai',
147
+ local: 'openai',
148
+ claude: 'claude',
149
+ anthropic: 'claude',
150
+ };
151
+ export function parseProviderPrefix(model) {
152
+ const idx = model.indexOf(':');
153
+ if (idx <= 0)
154
+ return null;
155
+ const prefix = model.slice(0, idx).toLowerCase();
156
+ const provider = PROVIDER_PREFIXES[prefix];
157
+ if (!provider)
158
+ return null;
159
+ const stripped = model.slice(idx + 1);
160
+ if (!stripped)
161
+ return null;
162
+ return { provider, model: stripped };
163
+ }
137
164
  // Beta prefixes that require Extra Usage to be ENABLED on the account.
138
165
  // context-management and prompt-caching-scope are safe — billing is determined
139
166
  // solely by the OAuth token's subscription type, not by beta flags.
@@ -390,7 +417,13 @@ export async function startProxy(opts = {}) {
390
417
  }
391
418
  }
392
419
  const cliVersion = detectCliVersion();
393
- const modelOverride = opts.model ? (MODEL_ALIASES[opts.model] ?? opts.model) : null;
420
+ // Parse --model once at startup. Supports `<provider>:<model>` to force
421
+ // a backend for every request (e.g. `--model=openai:gpt-4o`). Back-compat:
422
+ // bare names like `opus` resolve via MODEL_ALIASES.
423
+ const modelPrefix = opts.model ? parseProviderPrefix(opts.model) : null;
424
+ const cliModelRaw = modelPrefix ? modelPrefix.model : opts.model;
425
+ const cliProviderOverride = modelPrefix ? modelPrefix.provider : null;
426
+ const modelOverride = cliModelRaw ? (MODEL_ALIASES[cliModelRaw] ?? cliModelRaw) : null;
394
427
  const identity = loadClaudeIdentity();
395
428
  if (identity.deviceId) {
396
429
  console.log(' Device identity: detected');
@@ -608,18 +641,47 @@ export async function startProxy(opts = {}) {
608
641
  finally {
609
642
  clearTimeout(bodyTimeout);
610
643
  }
611
- const body = Buffer.concat(chunks);
644
+ let body = Buffer.concat(chunks);
645
+ // Provider prefix (v3.10.0). If the body's model field is `<provider>:<model>`
646
+ // with a recognized prefix, strip the prefix and force routing regardless of
647
+ // regex. CLI-level `--model=<provider>:<name>` applies the same override
648
+ // server-wide. Rewrites the body in place once so both code paths below
649
+ // see the stripped model name.
650
+ let forcedProvider = cliProviderOverride;
651
+ if (body.length > 0) {
652
+ try {
653
+ const parsed = JSON.parse(body.toString());
654
+ const rawModel = parsed.model ?? '';
655
+ const prefix = parseProviderPrefix(rawModel);
656
+ if (prefix) {
657
+ forcedProvider = prefix.provider;
658
+ parsed.model = prefix.model;
659
+ body = Buffer.from(JSON.stringify(parsed));
660
+ if (verbose) {
661
+ console.log(`[dario] provider prefix: ${rawModel} → ${prefix.provider} backend with model ${prefix.model}`);
662
+ }
663
+ }
664
+ else if (cliProviderOverride === 'openai' && cliModelRaw) {
665
+ // --model=openai:<name> forces the openai backend and replaces
666
+ // the model name server-wide. Body gets rewritten so the openai
667
+ // route below sees the CLI-chosen model.
668
+ parsed.model = cliModelRaw;
669
+ body = Buffer.from(JSON.stringify(parsed));
670
+ }
671
+ }
672
+ catch { /* not JSON — fall through */ }
673
+ }
612
674
  // Multi-provider routing (v3.6.0+). When an OpenAI-compat backend is
613
675
  // configured and the request is on /v1/chat/completions with a
614
- // GPT-family model, forward it straight through to the backend
615
- // instead of running it through the Claude template path. Requests
616
- // on /v1/messages or with Claude-family models fall through to
617
- // existing behavior.
618
- if (openaiBackend && isOpenAI && body.length > 0) {
676
+ // GPT-family model (or a forced `openai:` prefix), forward it straight
677
+ // through to the backend instead of running it through the Claude
678
+ // template path. Requests on /v1/messages or with Claude-family models
679
+ // fall through to existing behavior.
680
+ if (openaiBackend && isOpenAI && forcedProvider !== 'claude' && body.length > 0) {
619
681
  try {
620
682
  const peek = JSON.parse(body.toString());
621
683
  const rawModel = (peek.model || '').toString();
622
- if (rawModel && isOpenAIModel(rawModel)) {
684
+ if (rawModel && (forcedProvider === 'openai' || isOpenAIModel(rawModel))) {
623
685
  if (verbose) {
624
686
  console.log(`[dario] #${requestCount} ${req.method} ${urlPath} (model: ${rawModel}) → openai backend`);
625
687
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@askalf/dario",
3
- "version": "3.9.6",
3
+ "version": "3.10.1",
4
4
  "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -21,7 +21,7 @@
21
21
  ],
22
22
  "scripts": {
23
23
  "build": "tsc && cp src/cc-template-data.json dist/",
24
- "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/scrub-paths.mjs && node test/analytics-recording.mjs && node test/failover-429.mjs",
24
+ "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/scrub-paths.mjs && node test/provider-prefix.mjs && node test/analytics-recording.mjs && node test/failover-429.mjs",
25
25
  "audit": "npm audit --production --audit-level=high",
26
26
  "prepublishOnly": "npm run build",
27
27
  "start": "node dist/cli.js",