@askalf/dario 3.30.8 → 3.30.10
- package/README.md +91 -1
- package/dist/cc-template.d.ts +15 -0
- package/dist/cc-template.js +26 -4
- package/dist/cli.d.ts +15 -0
- package/dist/cli.js +75 -1
- package/dist/proxy.d.ts +15 -0
- package/dist/proxy.js +42 -27
- package/dist/request-queue.d.ts +75 -0
- package/dist/request-queue.js +108 -0
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -443,12 +443,102 @@ const msg = await client.messages.create({
|
|
|
443
443
|
|
|
444
444
|
### OpenAI-compatible tools (Cursor, Continue, Aider, LiteLLM, …)
|
|
445
445
|
|
|
446
|
+
Any tool that accepts an OpenAI-compatible base URL + API key works with dario. The universal env-var setup:
|
|
447
|
+
|
|
446
448
|
```bash
|
|
447
449
|
export OPENAI_BASE_URL=http://localhost:3456/v1
|
|
448
450
|
export OPENAI_API_KEY=dario
|
|
449
451
|
```
|
|
450
452
|
|
|
451
|
-
|
|
453
|
+
Use Claude model names (`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`, or shortcuts `opus` / `sonnet` / `haiku`) for the Claude subscription backend, or GPT-family / Llama / any-other-model names for your configured OpenAI-compat backends.
|
|
454
|
+
|
|
455
|
+
Some tools use env vars (above works as-is); others want settings-UI entries:
|
|
456
|
+
|
|
457
|
+
#### Cursor
|
|
458
|
+
|
|
459
|
+
1. **Cmd/Ctrl + ,** to open Settings → **Models**
|
|
460
|
+
2. Under the **OpenAI API Key** section:
|
|
461
|
+
- Check **Override OpenAI Base URL**: `http://localhost:3456/v1`
|
|
462
|
+
- API key: `dario`
|
|
463
|
+
3. Under the **Model Names** section (or the Add Model button):
|
|
464
|
+
- Add `claude-sonnet-4-6`
|
|
465
|
+
- Add `claude-opus-4-7` (premium)
|
|
466
|
+
- Add `claude-haiku-4-5` (cheap)
|
|
467
|
+
4. Select one of the new models in the chat input's model picker.
|
|
468
|
+
|
|
469
|
+
Cursor now routes those model names through dario → your Claude Max / Pro subscription. `gpt-*` and `o*` model names still route through Cursor's default OpenAI path — dario doesn't interfere with non-Claude traffic unless you point Cursor's base URL at it exclusively.
|
|
470
|
+
|
|
471
|
+
#### Continue.dev
|
|
472
|
+
|
|
473
|
+
In `~/.continue/config.yaml` (or the Continue settings UI, which edits the same file):
|
|
474
|
+
|
|
475
|
+
```yaml
|
|
476
|
+
models:
|
|
477
|
+
- name: Claude Sonnet (dario)
|
|
478
|
+
provider: anthropic
|
|
479
|
+
model: claude-sonnet-4-6
|
|
480
|
+
apiBase: http://localhost:3456
|
|
481
|
+
apiKey: dario
|
|
482
|
+
- name: Claude Opus (dario)
|
|
483
|
+
provider: anthropic
|
|
484
|
+
model: claude-opus-4-7
|
|
485
|
+
apiBase: http://localhost:3456
|
|
486
|
+
apiKey: dario
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
`provider: anthropic` + `apiBase: http://localhost:3456` points Continue's Anthropic SDK path at dario instead of `api.anthropic.com`. dario runs the full Claude Code wire replay on the outbound path.
|
|
490
|
+
|
|
491
|
+
#### Aider
|
|
492
|
+
|
|
493
|
+
```bash
|
|
494
|
+
export ANTHROPIC_BASE_URL=http://localhost:3456
|
|
495
|
+
export ANTHROPIC_API_KEY=dario
|
|
496
|
+
aider --model sonnet
|
|
497
|
+
```
|
|
498
|
+
|
|
499
|
+
Aider's Anthropic path honors `ANTHROPIC_BASE_URL` directly. `--model opus`, `--model haiku`, or any explicit `claude-*` model name works.
|
|
500
|
+
|
|
501
|
+
#### Cline / Roo Code / Kilo Code
|
|
502
|
+
|
|
503
|
+
Cline and its forks use a UI-based "API Provider" dropdown. Pick **Anthropic** as the provider and fill in:
|
|
504
|
+
|
|
505
|
+
- **API Key**: `dario`
|
|
506
|
+
- **Anthropic Base URL**: `http://localhost:3456`
|
|
507
|
+
- **Model**: `claude-sonnet-4-6` / `claude-opus-4-7` / `claude-haiku-4-5`
|
|
508
|
+
|
|
509
|
+
Cline's tool-invocation protocol is XML-based (`<execute_command>`, `<write_to_file>`, etc.), not Anthropic's tool-use format. Dario auto-detects Cline-family clients via system-prompt fingerprint and flips into preserve-tools mode automatically — Cline's own tool schema passes through to Anthropic, your commands route back to Cline's parser. No flag required. Override: `--no-auto-detect` if you'd rather force the CC fingerprint and deal with the parser mismatch yourself (see [Agent compatibility](#agent-compatibility)).
|
|
510
|
+
|
|
511
|
+
#### Zed
|
|
512
|
+
|
|
513
|
+
Zed's Anthropic provider config (`~/.config/zed/settings.json` or Cmd/Ctrl+,):
|
|
514
|
+
|
|
515
|
+
```json
|
|
516
|
+
{
|
|
517
|
+
"language_models": {
|
|
518
|
+
"anthropic": {
|
|
519
|
+
"api_url": "http://localhost:3456",
|
|
520
|
+
"version": "2023-06-01"
|
|
521
|
+
}
|
|
522
|
+
}
|
|
523
|
+
}
|
|
524
|
+
```
|
|
525
|
+
|
|
526
|
+
Set the `ANTHROPIC_API_KEY` env var to `dario` before launching Zed. Model picker then shows Claude models routed through your subscription.
|
|
527
|
+
|
|
528
|
+
#### OpenHands
|
|
529
|
+
|
|
530
|
+
```bash
|
|
531
|
+
export LLM_BASE_URL=http://localhost:3456
|
|
532
|
+
export LLM_API_KEY=dario
|
|
533
|
+
export LLM_MODEL=anthropic/claude-sonnet-4-6
|
|
534
|
+
python -m openhands.core.main -t "task description"
|
|
535
|
+
```
|
|
536
|
+
|
|
537
|
+
Prefix the model with `anthropic/` so LiteLLM (OpenHands' inner routing layer) knows to hit the Anthropic path, which dario is now fronting.
|
|
538
|
+
|
|
539
|
+
#### Everything else
|
|
540
|
+
|
|
541
|
+
If your tool isn't listed, check whether it reads `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` from the environment. Most do. For tools that don't, look in their settings for "Base URL" / "API URL" / "Endpoint" / "OpenAI-compatible endpoint" — all of those map to dario's `http://localhost:3456` (Anthropic-protocol) or `http://localhost:3456/v1` (OpenAI-protocol). If the tool only accepts `https://`, you'll need a loopback TLS shim (out of scope here — open an issue if you need one for a specific tool).
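The base-URL mapping described above can be sketched in code for a tool that has no dedicated section. This is a minimal illustration, not dario's own client code: `darioBaseUrl` and `buildChatCompletionRequest` are hypothetical helper names, and the `/chat/completions` suffix is assumed from the usual OpenAI-style path layout rather than confirmed by this README excerpt. Only the two base URLs and the `dario` API key come from the text.

```javascript
// Hypothetical helpers; only the two base URLs and the "dario" key come from the README.
function darioBaseUrl(protocol) {
  const origin = 'http://localhost:3456';
  // OpenAI-protocol clients get the /v1 suffix; Anthropic-protocol clients use the bare origin.
  return protocol === 'openai' ? `${origin}/v1` : origin;
}

function buildChatCompletionRequest(model, userText) {
  return {
    // Assumption: the standard OpenAI-style chat completions path under /v1.
    url: `${darioBaseUrl('openai')}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: 'Bearer dario',
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userText }],
      }),
    },
  };
}

const chatRequest = buildChatCompletionRequest('claude-sonnet-4-6', 'Hello');
console.log(chatRequest.url);
```

With dario running locally, the result could be passed to `fetch(chatRequest.url, chatRequest.init)` on a runtime that provides a global `fetch`.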
|
|
452
542
|
|
|
453
543
|
### curl
|
|
454
544
|
|
package/dist/cc-template.d.ts
CHANGED
|
@@ -170,6 +170,20 @@ export interface RequestContext {
|
|
|
170
170
|
* Replaces the entire request structure — tools, fields, ordering — with
|
|
171
171
|
* what real CC sends. Only the conversation content is preserved.
|
|
172
172
|
*/
|
|
173
|
+
/** Valid values for the `--effort` flag. `'client'` passes through the client's own `output_config.effort` (falling back to `'high'` if the client didn't send one). dario#87. */
|
|
174
|
+
export type EffortValue = 'low' | 'medium' | 'high' | 'xhigh' | 'client';
|
|
175
|
+
export declare const VALID_EFFORT_VALUES: ReadonlyArray<EffortValue>;
|
|
176
|
+
/**
|
|
177
|
+
* Resolve the outbound `output_config.effort` value.
|
|
178
|
+
*
|
|
179
|
+
* undefined / 'high' → 'high' (current default, matches CC 2.1.116 wire value)
|
|
180
|
+
* 'low' / 'medium' / 'xhigh' → pin to that value
|
|
181
|
+
* 'client' → extract from `clientBody.output_config.effort`; fall back
|
|
182
|
+
* to 'high' if the client didn't send one or sent a non-string
|
|
183
|
+
*
|
|
184
|
+
* Exported for tests.
|
|
185
|
+
*/
|
|
186
|
+
export declare function resolveEffort(flag: EffortValue | undefined, clientBody: Record<string, unknown>): string;
|
|
173
187
|
export declare function buildCCRequest(clientBody: Record<string, unknown>, billingTag: string, cacheControl: {
|
|
174
188
|
type: 'ephemeral';
|
|
175
189
|
}, identity: {
|
|
@@ -180,6 +194,7 @@ export declare function buildCCRequest(clientBody: Record<string, unknown>, bill
|
|
|
180
194
|
preserveTools?: boolean;
|
|
181
195
|
hybridTools?: boolean;
|
|
182
196
|
noAutoDetect?: boolean;
|
|
197
|
+
effort?: EffortValue;
|
|
183
198
|
}): {
|
|
184
199
|
body: Record<string, unknown>;
|
|
185
200
|
toolMap: Map<string, ToolMapping>;
|
package/dist/cc-template.js
CHANGED
|
@@ -708,11 +708,29 @@ const TOOL_MAP = {
|
|
|
708
708
|
},
|
|
709
709
|
exit_worktree: { ccTool: 'ExitWorktree' },
|
|
710
710
|
};
|
|
711
|
+
export const VALID_EFFORT_VALUES = ['low', 'medium', 'high', 'xhigh', 'client'];
|
|
711
712
|
/**
|
|
712
|
-
*
|
|
713
|
-
*
|
|
714
|
-
*
|
|
713
|
+
* Resolve the outbound `output_config.effort` value.
|
|
714
|
+
*
|
|
715
|
+
* undefined / 'high' → 'high' (current default, matches CC 2.1.116 wire value)
|
|
716
|
+
* 'low' / 'medium' / 'xhigh' → pin to that value
|
|
717
|
+
* 'client' → extract from `clientBody.output_config.effort`; fall back
|
|
718
|
+
* to 'high' if the client didn't send one or sent a non-string
|
|
719
|
+
*
|
|
720
|
+
* Exported for tests.
|
|
715
721
|
*/
|
|
722
|
+
export function resolveEffort(flag, clientBody) {
|
|
723
|
+
if (flag === undefined)
|
|
724
|
+
return 'high';
|
|
725
|
+
if (flag === 'client') {
|
|
726
|
+
const clientOC = clientBody.output_config;
|
|
727
|
+
const clientEffort = clientOC?.effort;
|
|
728
|
+
if (typeof clientEffort === 'string' && clientEffort.length > 0)
|
|
729
|
+
return clientEffort;
|
|
730
|
+
return 'high';
|
|
731
|
+
}
|
|
732
|
+
return flag;
|
|
733
|
+
}
|
|
716
734
|
export function buildCCRequest(clientBody, billingTag, cacheControl, identity, opts = {}) {
|
|
717
735
|
const model = clientBody.model || 'claude-sonnet-4-6';
|
|
718
736
|
const isHaiku = model.toLowerCase().includes('haiku');
|
|
@@ -979,7 +997,11 @@ export function buildCCRequest(clientBody, billingTag, cacheControl, identity, o
|
|
|
979
997
|
if (!isHaiku) {
|
|
980
998
|
ccRequest.thinking = { type: 'adaptive' };
|
|
981
999
|
ccRequest.context_management = { edits: [{ type: 'clear_thinking_20251015', keep: 'all' }] };
|
|
982
|
-
|
|
1000
|
+
// output_config.effort default is `'high'` (matches CC 2.1.116's wire
|
|
1001
|
+
// value). `--effort` flag overrides; `'client'` passes through whatever
|
|
1002
|
+
// the client sent (or falls back to `'high'` if the client didn't
|
|
1003
|
+
// include an output_config). See dario#87.
|
|
1004
|
+
ccRequest.output_config = { effort: resolveEffort(opts.effort, clientBody) };
|
|
983
1005
|
}
|
|
984
1006
|
ccRequest.stream = stream;
|
|
985
1007
|
// Replay the captured top-level key order. The hardcoded build order above
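The effort resolution added in this file can be exercised in isolation. The sketch below re-states the logic of `resolveEffort` as shown in the hunk above under a different name (`resolveEffortSketch` is illustrative, not a package export) and runs it over a few sample bodies. One thing it suggests: in `'client'` mode any non-empty string from the client appears to pass through without being checked against `VALID_EFFORT_VALUES`.

```javascript
// Sketch mirroring the resolveEffort logic from the diff; the name is illustrative.
function resolveEffortSketch(flag, clientBody) {
  if (flag === undefined) return 'high';
  if (flag === 'client') {
    const clientEffort = clientBody.output_config?.effort;
    // Only a non-empty string is passed through; anything else falls back to 'high'.
    if (typeof clientEffort === 'string' && clientEffort.length > 0) return clientEffort;
    return 'high';
  }
  return flag;
}

// Sample inputs (made up for illustration): [flag, client request body].
const effortCases = [
  [undefined, {}],
  ['low', { output_config: { effort: 'xhigh' } }],
  ['client', { output_config: { effort: 'medium' } }],
  ['client', {}],
  ['client', { output_config: { effort: 42 } }],
  ['client', { output_config: { effort: 'bogus' } }],
];

const effortResults = effortCases.map(([flag, body]) => resolveEffortSketch(flag, body));
console.log(effortResults);
```

A pinned flag value (`'low'` here) wins over whatever the client sent, while `'client'` defers to the request body and only falls back to `'high'` when the value is missing or not a string.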
|
package/dist/cli.d.ts
CHANGED
|
@@ -9,6 +9,21 @@
|
|
|
9
9
|
* dario refresh — Force token refresh
|
|
10
10
|
* dario logout — Remove saved credentials
|
|
11
11
|
*/
|
|
12
|
+
import { type EffortValue } from './cc-template.js';
|
|
13
|
+
/**
|
|
14
|
+
* Parse the `--effort` flag + `DARIO_EFFORT` env. Validates against the
|
|
15
|
+
* allowed set; unrecognised values cause a non-zero exit with the list of
|
|
16
|
+
* valid choices (same philosophy as other strict parsers in this CLI).
|
|
17
|
+
* Flag value wins over env. Exported for tests. dario#87.
|
|
18
|
+
*/
|
|
19
|
+
export declare function resolveEffortFlag(args: string[], env: string | undefined): EffortValue | undefined;
|
|
20
|
+
/**
|
|
21
|
+
* Parse a positive-integer env var. Returns undefined on unset, empty,
|
|
22
|
+
* non-numeric, or non-positive values so the caller's default applies.
|
|
23
|
+
* Sibling of parsePositiveIntFlag; exported for tests + used by the
|
|
24
|
+
* dario#80 queue env mirrors.
|
|
25
|
+
*/
|
|
26
|
+
export declare function parsePositiveIntEnv(value: string | undefined): number | undefined;
|
|
12
27
|
/**
|
|
13
28
|
* Parse a boolean env var. Accepts "1", "true", "yes", "on" (case-insensitive)
|
|
14
29
|
* as truthy; everything else (including unset) is undefined/false. Exported
|
package/dist/cli.js
CHANGED
|
@@ -37,6 +37,7 @@ import { join } from 'node:path';
|
|
|
37
37
|
import { homedir } from 'node:os';
|
|
38
38
|
import { startAutoOAuthFlow, startManualOAuthFlow, detectHeadlessEnvironment, getStatus, refreshTokens, loadCredentials } from './oauth.js';
|
|
39
39
|
import { startProxy, sanitizeError } from './proxy.js';
|
|
40
|
+
import { VALID_EFFORT_VALUES } from './cc-template.js';
|
|
40
41
|
import { listAccountAliases, loadAllAccounts, addAccountViaOAuth, removeAccount } from './accounts.js';
|
|
41
42
|
import { listBackends, saveBackend, removeBackend } from './openai-backend.js';
|
|
42
43
|
const args = process.argv.slice(2);
|
|
@@ -250,6 +251,26 @@ async function proxy() {
|
|
|
250
251
|
const strictTemplate = args.includes('--strict-template')
|
|
251
252
|
|| parseBooleanEnv(process.env['DARIO_STRICT_TEMPLATE'])
|
|
252
253
|
|| undefined;
|
|
254
|
+
// --max-concurrent=N / --max-queued=N / --queue-timeout=MS — bounded
|
|
255
|
+
// request queue knobs (dario#80). Defaults preserve v3.30.x-and-earlier
|
|
256
|
+
// behaviour for typical single-user workloads; tune up for high-fan-out
|
|
257
|
+
// agent setups that otherwise hit dario-level 429s before upstream.
|
|
258
|
+
const maxConcurrent = parsePositiveIntFlag('--max-concurrent=')
|
|
259
|
+
?? parsePositiveIntEnv(process.env['DARIO_MAX_CONCURRENT']);
|
|
260
|
+
const maxQueued = parsePositiveIntFlag('--max-queued=')
|
|
261
|
+
?? parsePositiveIntEnv(process.env['DARIO_MAX_QUEUED']);
|
|
262
|
+
const queueTimeoutMs = parsePositiveIntFlag('--queue-timeout=')
|
|
263
|
+
?? parsePositiveIntEnv(process.env['DARIO_QUEUE_TIMEOUT_MS']);
|
|
264
|
+
// --effort=low|medium|high|xhigh|client — override the outbound
|
|
265
|
+
// output_config.effort (dario#87). Default (unset) pins 'high' to match
|
|
266
|
+
// CC 2.1.116's wire value. 'client' passes through whatever the client
|
|
267
|
+
// sent, falling back to 'high' if the client didn't include one.
|
|
268
|
+
//
|
|
269
|
+
// Risk: setting effort to a non-CC-default value may cause Anthropic's
|
|
270
|
+
// classifier to flip requests to 'overage' billing. Users opting in
|
|
271
|
+
// should watch the `representative-claim` response header via -v logs
|
|
272
|
+
// and revert to default if subscription billing breaks.
|
|
273
|
+
const effort = resolveEffortFlag(args, process.env['DARIO_EFFORT']);
|
|
253
274
|
// Non-loopback bind without DARIO_API_KEY turns dario into an open
|
|
254
275
|
// OAuth-subscription relay for anyone on the reachable network. Refuse
|
|
255
276
|
// to start rather than rely on the operator to read the startup banner.
|
|
@@ -269,7 +290,39 @@ async function proxy() {
|
|
|
269
290
|
console.error(`[dario] Override (not recommended): pass --unsafe-no-auth if you have out-of-band network controls and accept the risk.`);
|
|
270
291
|
process.exit(1);
|
|
271
292
|
}
|
|
272
|
-
await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags, noLiveCapture, strictTemplate });
|
|
293
|
+
await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags, noLiveCapture, strictTemplate, maxConcurrent, maxQueued, queueTimeoutMs, effort });
|
|
294
|
+
}
|
|
295
|
+
/**
|
|
296
|
+
* Parse the `--effort` flag + `DARIO_EFFORT` env. Validates against the
|
|
297
|
+
* allowed set; unrecognised values cause a non-zero exit with the list of
|
|
298
|
+
* valid choices (same philosophy as other strict parsers in this CLI).
|
|
299
|
+
* Flag value wins over env. Exported for tests. dario#87.
|
|
300
|
+
*/
|
|
301
|
+
export function resolveEffortFlag(args, env) {
|
|
302
|
+
const withValue = args.find(a => a.startsWith('--effort='));
|
|
303
|
+
const raw = withValue ? withValue.slice('--effort='.length) : env;
|
|
304
|
+
if (raw === undefined || raw === '')
|
|
305
|
+
return undefined;
|
|
306
|
+
const normalized = raw.trim().toLowerCase();
|
|
307
|
+
if (VALID_EFFORT_VALUES.includes(normalized)) {
|
|
308
|
+
return normalized;
|
|
309
|
+
}
|
|
310
|
+
console.error(`[dario] Invalid --effort value: ${JSON.stringify(raw)}. Must be one of: ${VALID_EFFORT_VALUES.join(', ')}.`);
|
|
311
|
+
process.exit(1);
|
|
312
|
+
}
|
|
313
|
+
/**
|
|
314
|
+
* Parse a positive-integer env var. Returns undefined on unset, empty,
|
|
315
|
+
* non-numeric, or non-positive values so the caller's default applies.
|
|
316
|
+
* Sibling of parsePositiveIntFlag; exported for tests + used by the
|
|
317
|
+
* dario#80 queue env mirrors.
|
|
318
|
+
*/
|
|
319
|
+
export function parsePositiveIntEnv(value) {
|
|
320
|
+
if (value === undefined || value === '')
|
|
321
|
+
return undefined;
|
|
322
|
+
const n = Number.parseInt(value.trim(), 10);
|
|
323
|
+
if (!Number.isFinite(n) || n <= 0)
|
|
324
|
+
return undefined;
|
|
325
|
+
return n;
|
|
273
326
|
}
|
|
274
327
|
/**
|
|
275
328
|
* Parse a boolean env var. Accepts "1", "true", "yes", "on" (case-insensitive)
|
|
@@ -648,6 +701,27 @@ async function help() {
|
|
|
648
701
|
as --strict-tls: make the unsafe state
|
|
649
702
|
require intent. Env: DARIO_STRICT_TEMPLATE=1.
|
|
650
703
|
(v3.30.8, dario#77)
|
|
704
|
+
--max-concurrent=N Max in-flight requests (default: 10).
|
|
705
|
+
Env: DARIO_MAX_CONCURRENT. (dario#80)
|
|
706
|
+
--max-queued=N Max requests buffered waiting for a
|
|
707
|
+
concurrency slot before dario returns
|
|
708
|
+
429 "queue-full" (default: 128).
|
|
709
|
+
Env: DARIO_MAX_QUEUED. (dario#80)
|
|
710
|
+
--queue-timeout=MS Max ms a queued request waits before
|
|
711
|
+
dario returns 504 "queue-timeout"
|
|
712
|
+
(default: 60000).
|
|
713
|
+
Env: DARIO_QUEUE_TIMEOUT_MS. (dario#80)
|
|
714
|
+
--effort=<low|medium|high|xhigh|client>
|
|
715
|
+
Override the outbound output_config.effort
|
|
716
|
+
on non-haiku requests. Default (unset)
|
|
717
|
+
pins 'high' — matches CC 2.1.116's wire
|
|
718
|
+
value. 'client' passes through what the
|
|
719
|
+
client sent (falls back to 'high' if none).
|
|
720
|
+
WARNING: non-'high' values may cause
|
|
721
|
+
Anthropic's classifier to flip requests
|
|
722
|
+
to 'overage' billing; watch -v logs for
|
|
723
|
+
representative-claim changes.
|
|
724
|
+
Env: DARIO_EFFORT. (dario#87)
|
|
651
725
|
--port=PORT Port to listen on (default: 3456)
|
|
652
726
|
--host=ADDRESS Address to bind to (default: 127.0.0.1)
|
|
653
727
|
Use 0.0.0.0 for LAN; see README for DARIO_API_KEY
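The env parsing added in this file can be checked with a few inputs. The sketch below copies the body of `parsePositiveIntEnv` from the hunk above under an illustrative name and shows the flag-then-env-then-default chain; `flagValue` stands in for the result of `parsePositiveIntFlag`, whose implementation is not part of this diff. Because the parser relies on `Number.parseInt`, inputs with a numeric prefix such as `12abc` or `10.9` are accepted as `12` and `10` rather than rejected.

```javascript
// Sketch copying the parsePositiveIntEnv logic from the diff; the name is illustrative.
function parsePositiveIntEnvSketch(value) {
  if (value === undefined || value === '') return undefined;
  const n = Number.parseInt(value.trim(), 10);
  if (!Number.isFinite(n) || n <= 0) return undefined;
  return n;
}

// Sample env values (made up for illustration).
const envSamples = ['20', ' 7 ', '', 'abc', '0', '-5', '12abc', '10.9'];
const envParsed = envSamples.map((sample) => parsePositiveIntEnvSketch(sample));
console.log(envParsed);

// Precedence as wired in proxy(): flag first, then env, then the queue's default.
const flagValue = undefined; // stand-in: no --max-concurrent= flag was passed
const effectiveMaxConcurrent = flagValue ?? parsePositiveIntEnvSketch('20') ?? 10;
console.log(effectiveMaxConcurrent);
```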
|
package/dist/proxy.d.ts
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
1
|
import { type IncomingMessage } from 'node:http';
|
|
2
|
+
import { type EffortValue } from './cc-template.js';
|
|
2
3
|
export declare function parseProviderPrefix(model: string): {
|
|
3
4
|
provider: 'openai' | 'claude';
|
|
4
5
|
model: string;
|
|
@@ -65,6 +66,20 @@ interface ProxyOptions {
|
|
|
65
66
|
* --strict-tls. dario#77.
|
|
66
67
|
*/
|
|
67
68
|
strictTemplate?: boolean;
|
|
69
|
+
/** Max concurrent in-flight requests. Default 10. dario#80. */
|
|
70
|
+
maxConcurrent?: number;
|
|
71
|
+
/** Max requests buffered waiting for a concurrency slot. Default 128. dario#80. */
|
|
72
|
+
maxQueued?: number;
|
|
73
|
+
/** Max ms a queued request waits before it times out with 504. Default 60000. dario#80. */
|
|
74
|
+
queueTimeoutMs?: number;
|
|
75
|
+
/**
|
|
76
|
+
* Override the outbound `output_config.effort` value on non-haiku
|
|
77
|
+
* requests. Default (undefined) pins `'high'`, matching CC 2.1.116's
|
|
78
|
+
* wire value. `'client'` passes through whatever the client sent (or
|
|
79
|
+
* falls back to `'high'` if the client didn't include an output_config).
|
|
80
|
+
* dario#87.
|
|
81
|
+
*/
|
|
82
|
+
effort?: EffortValue;
|
|
68
83
|
}
|
|
69
84
|
export declare function sanitizeError(err: unknown): string;
|
|
70
85
|
/**
|
package/dist/proxy.js
CHANGED
|
@@ -12,12 +12,12 @@ import { AccountPool, computeStickyKey, parseRateLimits } from './pool.js';
|
|
|
12
12
|
import { Analytics, billingBucketFromClaim } from './analytics.js';
|
|
13
13
|
import { loadAllAccounts, loadAccount, refreshAccountToken } from './accounts.js';
|
|
14
14
|
import { getOpenAIBackend, isOpenAIModel, forwardToOpenAI } from './openai-backend.js';
|
|
15
|
+
import { RequestQueue, QueueFullError, QueueTimeoutError, DEFAULT_MAX_CONCURRENT, DEFAULT_MAX_QUEUED, DEFAULT_QUEUE_TIMEOUT_MS } from './request-queue.js';
|
|
15
16
|
const ANTHROPIC_API = 'https://api.anthropic.com';
|
|
16
17
|
const DEFAULT_PORT = 3456;
|
|
17
18
|
const MAX_BODY_BYTES = 10 * 1024 * 1024; // 10 MB — generous for large prompts, prevents abuse
|
|
18
19
|
const UPSTREAM_TIMEOUT_MS = 300_000; // 5 min — matches Anthropic SDK default
|
|
19
20
|
const BODY_READ_TIMEOUT_MS = 30_000; // 30s — prevents slow-loris on body reads
|
|
20
|
-
const MAX_CONCURRENT = 10; // Max concurrent upstream requests
|
|
21
21
|
const DEFAULT_HOST = '127.0.0.1';
|
|
22
22
|
// A host is "loopback" if it's one of the well-known localhost literals.
|
|
23
23
|
// Used to decide whether to warn at startup about binding to a reachable
|
|
@@ -28,28 +28,8 @@ function isLoopbackHost(host) {
|
|
|
28
28
|
return true;
|
|
29
29
|
return host.startsWith('127.');
|
|
30
30
|
}
|
|
31
|
-
//
|
|
32
|
-
|
|
33
|
-
max;
|
|
34
|
-
queue = [];
|
|
35
|
-
active = 0;
|
|
36
|
-
constructor(max) {
|
|
37
|
-
this.max = max;
|
|
38
|
-
}
|
|
39
|
-
async acquire() {
|
|
40
|
-
if (this.active < this.max) {
|
|
41
|
-
this.active++;
|
|
42
|
-
return;
|
|
43
|
-
}
|
|
44
|
-
return new Promise(resolve => { this.queue.push(() => { this.active++; resolve(); }); });
|
|
45
|
-
}
|
|
46
|
-
release() {
|
|
47
|
-
this.active--;
|
|
48
|
-
const next = this.queue.shift();
|
|
49
|
-
if (next)
|
|
50
|
-
next();
|
|
51
|
-
}
|
|
52
|
-
}
|
|
31
|
+
// Concurrency control: see src/request-queue.ts for the bounded queue
|
|
32
|
+
// (replaced the v3.30.x-and-earlier simple unbounded semaphore in dario#80).
|
|
53
33
|
// Billing tag hash seed — matches Claude Code's value
|
|
54
34
|
const BILLING_SEED = '59cf53e54c78';
|
|
55
35
|
// Compute per-request build tag:
|
|
@@ -572,7 +552,11 @@ export async function startProxy(opts = {}) {
|
|
|
572
552
|
}
|
|
573
553
|
}
|
|
574
554
|
let requestCount = 0;
|
|
575
|
-
const
|
|
555
|
+
const queue = new RequestQueue({
|
|
556
|
+
maxConcurrent: opts.maxConcurrent ?? DEFAULT_MAX_CONCURRENT,
|
|
557
|
+
maxQueued: opts.maxQueued ?? DEFAULT_MAX_QUEUED,
|
|
558
|
+
queueTimeoutMs: opts.queueTimeoutMs ?? DEFAULT_QUEUE_TIMEOUT_MS,
|
|
559
|
+
});
|
|
576
560
|
// Cache context-1m beta availability. Set false once per account (or process
|
|
577
561
|
// in single-account mode) after the first "long context" rejection, so we
|
|
578
562
|
// skip sending context-1m on every subsequent request instead of paying the
|
|
@@ -771,8 +755,38 @@ export async function startProxy(opts = {}) {
|
|
|
771
755
|
res.end(ERR_METHOD);
|
|
772
756
|
return;
|
|
773
757
|
}
|
|
774
|
-
// Proxy to Anthropic (with concurrency control)
|
|
775
|
-
|
|
758
|
+
// Proxy to Anthropic (with concurrency control). The bounded queue
|
|
759
|
+
// replaces the v3.30.x-and-earlier unbounded semaphore — dario#80. A
|
|
760
|
+
// queue-full condition returns an explicit 429 with a `"queue-full"`
|
|
761
|
+
// marker in the body; a queue-timeout returns 504 with `"queue-timeout"`.
|
|
762
|
+
try {
|
|
763
|
+
await queue.acquire();
|
|
764
|
+
}
|
|
765
|
+
catch (err) {
|
|
766
|
+
if (err instanceof QueueFullError) {
|
|
767
|
+
res.writeHead(429, JSON_HEADERS);
|
|
768
|
+
res.end(JSON.stringify({
|
|
769
|
+
type: 'error',
|
|
770
|
+
error: {
|
|
771
|
+
type: 'rate_limit_error',
|
|
772
|
+
message: `dario queue full — ${queue.maxConcurrent} concurrent + ${queue.maxQueued} queued already in flight. Tune --max-concurrent / --max-queued, or reduce client-side concurrency. (dario#80)`,
|
|
773
|
+
},
|
|
774
|
+
}));
|
|
775
|
+
return;
|
|
776
|
+
}
|
|
777
|
+
if (err instanceof QueueTimeoutError) {
|
|
778
|
+
res.writeHead(504, JSON_HEADERS);
|
|
779
|
+
res.end(JSON.stringify({
|
|
780
|
+
type: 'error',
|
|
781
|
+
error: {
|
|
782
|
+
type: 'timeout_error',
|
|
783
|
+
message: `dario queue timeout — request waited longer than ${queue.queueTimeoutMs}ms for a concurrency slot. Tune --queue-timeout, or reduce client-side concurrency. (dario#80)`,
|
|
784
|
+
},
|
|
785
|
+
}));
|
|
786
|
+
return;
|
|
787
|
+
}
|
|
788
|
+
throw err;
|
|
789
|
+
}
|
|
776
790
|
// Hoisted so the finally block can clean up whatever was set.
|
|
777
791
|
let upstreamTimeout = null;
|
|
778
792
|
let onClientClose = null;
|
|
@@ -964,6 +978,7 @@ export async function startProxy(opts = {}) {
|
|
|
964
978
|
preserveTools: opts.preserveTools ?? false,
|
|
965
979
|
hybridTools: opts.hybridTools ?? false,
|
|
966
980
|
noAutoDetect: opts.noAutoDetect ?? false,
|
|
981
|
+
effort: opts.effort,
|
|
967
982
|
});
|
|
968
983
|
// Log the auto-preserve-tools switch once per text-tool
|
|
969
984
|
// client family. Skip when the operator already opted into
|
|
@@ -1617,7 +1632,7 @@ export async function startProxy(opts = {}) {
|
|
|
1617
1632
|
clearTimeout(upstreamTimeout);
|
|
1618
1633
|
if (onClientClose !== null)
|
|
1619
1634
|
req.off('close', onClientClose);
|
|
1620
|
-
|
|
1635
|
+
queue.release();
|
|
1621
1636
|
}
|
|
1622
1637
|
});
|
|
1623
1638
|
server.on('error', (err) => {
|
|
package/dist/request-queue.d.ts
@@ -0,0 +1,75 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Bounded request queue — replaces the simple in-process semaphore so that
|
|
3
|
+
* overload conditions are visible and tunable instead of silently queuing
|
|
4
|
+
* unbounded or rejecting with generic 429s before upstream had a chance.
|
|
5
|
+
*
|
|
6
|
+
* Three knobs:
|
|
7
|
+
* - maxConcurrent : in-flight requests allowed at once (default 10)
|
|
8
|
+
* - maxQueued : buffered requests waiting for a concurrency slot
|
|
9
|
+
* (default 128); beyond this, the queue is "full" and
|
|
10
|
+
* admission is rejected with a clear 429 body.
|
|
11
|
+
* - queueTimeoutMs: how long a queued request waits before it 504s with
|
|
12
|
+
* a "queue-timeout" reason (default 60_000).
|
|
13
|
+
*
|
|
14
|
+
* Behaviour:
|
|
15
|
+
* - active < maxConcurrent → admit immediately
|
|
16
|
+
* - else, queued < maxQueued → enqueue
|
|
17
|
+
* - else → reject with `queue-full`
|
|
18
|
+
* - queued > queueTimeoutMs → reject with `queue-timeout`
|
|
19
|
+
*
|
|
20
|
+
* The decision logic is split out as a pure `decideAdmit(state)` function so
|
|
21
|
+
* tests can exercise all three branches without side effects or timers.
|
|
22
|
+
*
|
|
23
|
+
* dario#80 (Gemini review push-back).
|
|
24
|
+
*/
|
|
25
|
+
export interface QueueState {
|
|
26
|
+
active: number;
|
|
27
|
+
queued: number;
|
|
28
|
+
maxConcurrent: number;
|
|
29
|
+
maxQueued: number;
|
|
30
|
+
}
|
|
31
|
+
export type AdmitDecision = {
|
|
32
|
+
action: 'admit';
|
|
33
|
+
} | {
|
|
34
|
+
action: 'enqueue';
|
|
35
|
+
} | {
|
|
36
|
+
action: 'reject';
|
|
37
|
+
reason: 'queue-full';
|
|
38
|
+
};
|
|
39
|
+
/** Pure admission decision — no side effects, no clock dep. */
|
|
40
|
+
export declare function decideAdmit(state: QueueState): AdmitDecision;
|
|
41
|
+
/** Pure timeout check — separated so tests can pass an explicit clock. */
|
|
42
|
+
export declare function isQueueEntryExpired(enqueuedAt: number, now: number, timeoutMs: number): boolean;
|
|
43
|
+
export declare class QueueFullError extends Error {
|
|
44
|
+
constructor();
|
|
45
|
+
}
|
|
46
|
+
export declare class QueueTimeoutError extends Error {
|
|
47
|
+
constructor();
|
|
48
|
+
}
|
|
49
|
+
export interface RequestQueueOptions {
|
|
50
|
+
maxConcurrent?: number;
|
|
51
|
+
maxQueued?: number;
|
|
52
|
+
queueTimeoutMs?: number;
|
|
53
|
+
}
|
|
54
|
+
export declare const DEFAULT_MAX_CONCURRENT = 10;
|
|
55
|
+
export declare const DEFAULT_MAX_QUEUED = 128;
|
|
56
|
+
export declare const DEFAULT_QUEUE_TIMEOUT_MS = 60000;
|
|
57
|
+
export declare class RequestQueue {
|
|
58
|
+
readonly maxConcurrent: number;
|
|
59
|
+
readonly maxQueued: number;
|
|
60
|
+
readonly queueTimeoutMs: number;
|
|
61
|
+
private active;
|
|
62
|
+
private queue;
|
|
63
|
+
constructor(opts?: RequestQueueOptions);
|
|
64
|
+
/**
|
|
65
|
+
* Acquire a concurrency slot. Resolves when admitted; throws
|
|
66
|
+
* `QueueFullError` when the queue is at its `maxQueued` cap, throws
|
|
67
|
+
* `QueueTimeoutError` when a queued request waited longer than
|
|
68
|
+
* `queueTimeoutMs`.
|
|
69
|
+
*/
|
|
70
|
+
acquire(): Promise<void>;
|
|
71
|
+
/** Release a slot. The next queued entry (if any) is admitted in FIFO order. */
|
|
72
|
+
release(): void;
|
|
73
|
+
/** Snapshot of queue state — exposed for /analytics + tests. */
|
|
74
|
+
snapshot(): QueueState;
|
|
75
|
+
}
|
package/dist/request-queue.js
|
@@ -0,0 +1,108 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Bounded request queue — replaces the simple in-process semaphore so that
|
|
3
|
+
* overload conditions are visible and tunable instead of silently queuing
|
|
4
|
+
* unbounded or rejecting with generic 429s before upstream had a chance.
|
|
5
|
+
*
|
|
6
|
+
* Three knobs:
|
|
7
|
+
* - maxConcurrent : in-flight requests allowed at once (default 10)
|
|
8
|
+
* - maxQueued : buffered requests waiting for a concurrency slot
|
|
9
|
+
* (default 128); beyond this, the queue is "full" and
|
|
10
|
+
* admission is rejected with a clear 429 body.
|
|
11
|
+
* - queueTimeoutMs: how long a queued request waits before it 504s with
|
|
12
|
+
* a "queue-timeout" reason (default 60_000).
|
|
13
|
+
*
|
|
14
|
+
* Behaviour:
|
|
15
|
+
* - active < maxConcurrent → admit immediately
|
|
16
|
+
* - else, queued < maxQueued → enqueue
|
|
17
|
+
* - else → reject with `queue-full`
|
|
18
|
+
* - queued > queueTimeoutMs → reject with `queue-timeout`
|
|
19
|
+
*
|
|
20
|
+
* The decision logic is split out as a pure `decideAdmit(state)` function so
|
|
21
|
+
* tests can exercise all three branches without side effects or timers.
|
|
22
|
+
*
|
|
23
|
+
* dario#80 (Gemini review push-back).
|
+ */
+/** Pure admission decision — no side effects, no clock dep. */
+export function decideAdmit(state) {
+    if (state.active < state.maxConcurrent)
+        return { action: 'admit' };
+    if (state.queued < state.maxQueued)
+        return { action: 'enqueue' };
+    return { action: 'reject', reason: 'queue-full' };
+}
+/** Pure timeout check — separated so tests can pass an explicit clock. */
+export function isQueueEntryExpired(enqueuedAt, now, timeoutMs) {
+    return (now - enqueuedAt) > timeoutMs;
+}
+export class QueueFullError extends Error {
+    constructor() { super('queue-full'); this.name = 'QueueFullError'; }
+}
+export class QueueTimeoutError extends Error {
+    constructor() { super('queue-timeout'); this.name = 'QueueTimeoutError'; }
+}
+export const DEFAULT_MAX_CONCURRENT = 10;
+export const DEFAULT_MAX_QUEUED = 128;
+export const DEFAULT_QUEUE_TIMEOUT_MS = 60_000;
+export class RequestQueue {
+    maxConcurrent;
+    maxQueued;
+    queueTimeoutMs;
+    active = 0;
+    queue = [];
+    constructor(opts = {}) {
+        this.maxConcurrent = opts.maxConcurrent ?? DEFAULT_MAX_CONCURRENT;
+        this.maxQueued = opts.maxQueued ?? DEFAULT_MAX_QUEUED;
+        this.queueTimeoutMs = opts.queueTimeoutMs ?? DEFAULT_QUEUE_TIMEOUT_MS;
+    }
+    /**
+     * Acquire a concurrency slot. Resolves when admitted; throws
+     * `QueueFullError` when the queue is at its `maxQueued` cap, throws
+     * `QueueTimeoutError` when a queued request waited longer than
+     * `queueTimeoutMs`.
+     */
+    async acquire() {
+        const decision = decideAdmit(this.snapshot());
+        if (decision.action === 'admit') {
+            this.active++;
+            return;
+        }
+        if (decision.action === 'reject') {
+            throw new QueueFullError();
+        }
+        return new Promise((resolve, reject) => {
+            const enqueuedAt = Date.now();
+            const timeoutHandle = setTimeout(() => {
+                const idx = this.queue.indexOf(entry);
+                if (idx >= 0) {
+                    this.queue.splice(idx, 1);
+                    reject(new QueueTimeoutError());
+                }
+            }, this.queueTimeoutMs);
+            // Keep the timer from pinning the event loop open on shutdown. A queued
+            // request waiting for a slot shouldn't by itself keep the process alive.
+            timeoutHandle.unref?.();
+            const entry = { resolve, reject, enqueuedAt, timeoutHandle };
+            this.queue.push(entry);
+        });
+    }
+    /** Release a slot. The next queued entry (if any) is admitted in FIFO order. */
+    release() {
+        if (this.active > 0)
+            this.active--;
+        const next = this.queue.shift();
+        if (next) {
+            clearTimeout(next.timeoutHandle);
+            this.active++;
+            next.resolve();
+        }
+    }
+    /** Snapshot of queue state — exposed for /analytics + tests. */
+    snapshot() {
+        return {
+            active: this.active,
+            queued: this.queue.length,
+            maxConcurrent: this.maxConcurrent,
+            maxQueued: this.maxQueued,
+        };
+    }
+}
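Reviewer note: the admission logic in `request-queue.js` is easiest to check through its two pure helpers. The sketch below copies `decideAdmit` and `isQueueEntryExpired` from the diff (without the `export` keyword, so the snippet runs standalone) and exercises them against a few hand-picked states. The sample states and the `states` variable are illustrative assumptions, not taken from the package's own `test/request-queue.mjs`.

```javascript
// Copied from the diff: pure admission decision over a state snapshot.
function decideAdmit(state) {
    if (state.active < state.maxConcurrent)
        return { action: 'admit' };
    if (state.queued < state.maxQueued)
        return { action: 'enqueue' };
    return { action: 'reject', reason: 'queue-full' };
}

// Copied from the diff: pure timeout check with an explicit clock.
function isQueueEntryExpired(enqueuedAt, now, timeoutMs) {
    return (now - enqueuedAt) > timeoutMs;
}

// Illustrative snapshots using the default caps from the diff (10 / 128).
const states = [
    { active: 3, queued: 0, maxConcurrent: 10, maxQueued: 128 },    // free slot
    { active: 10, queued: 5, maxConcurrent: 10, maxQueued: 128 },   // slots full, queue has room
    { active: 10, queued: 128, maxConcurrent: 10, maxQueued: 128 }, // both full
];

for (const s of states) {
    console.log(decideAdmit(s).action);
}
// prints: admit, enqueue, reject (one per line)

// The comparison is strict, so an entry that has waited exactly
// timeoutMs is not yet considered expired.
console.log(isQueueEntryExpired(1000, 61000, 60000)); // false
console.log(isQueueEntryExpired(1000, 61001, 60000)); // true
```

One thing this makes visible: a free concurrency slot is checked before the queue cap, so a request is admitted whenever `active < maxConcurrent`, regardless of how many entries are queued.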
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "3.30.
+  "version": "3.30.10",
   "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
   "type": "module",
   "bin": {
@@ -21,7 +21,7 @@
   ],
   "scripts": {
     "build": "tsc && cp src/cc-template-data.json dist/ && node -e \"require('fs').mkdirSync('dist/shim',{recursive:true})\" && cp src/shim/runtime.cjs dist/shim/",
-    "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/tool-schema-contract.mjs && node test/scrub-paths.mjs && node test/provider-prefix.mjs && node test/analytics-recording.mjs && node test/analytics-billing-bucket.mjs && node test/failover-429.mjs && node test/pool-sticky.mjs && node test/live-fingerprint.mjs && node test/shim-runtime.mjs && node test/shim-e2e.mjs && node test/proxy-header-order.mjs && node test/proxy-body-order.mjs && node test/runtime-fingerprint.mjs && node test/pacing.mjs && node test/stream-drain.mjs && node test/subagent.mjs && node test/mcp-protocol.mjs && node test/mcp-tools.mjs && node test/mcp-e2e.mjs && node test/session-rotation.mjs && node test/drift-detection.mjs && node test/cc-authorize-probe-classifier.mjs && node test/compat-range.mjs && node test/doctor-formatter.mjs && node test/atomic-write.mjs && node test/account-refresh-singleflight.mjs && node test/streaming-edge-cases.mjs && node test/client-detection.mjs && node test/manual-oauth-flow.mjs && node test/scrub-template.mjs && node test/sanitize-messages.mjs && node test/platform-tools.mjs && node test/strict-template-flags.mjs",
+    "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/tool-schema-contract.mjs && node test/scrub-paths.mjs && node test/provider-prefix.mjs && node test/analytics-recording.mjs && node test/analytics-billing-bucket.mjs && node test/failover-429.mjs && node test/pool-sticky.mjs && node test/live-fingerprint.mjs && node test/shim-runtime.mjs && node test/shim-e2e.mjs && node test/proxy-header-order.mjs && node test/proxy-body-order.mjs && node test/runtime-fingerprint.mjs && node test/pacing.mjs && node test/stream-drain.mjs && node test/subagent.mjs && node test/mcp-protocol.mjs && node test/mcp-tools.mjs && node test/mcp-e2e.mjs && node test/session-rotation.mjs && node test/drift-detection.mjs && node test/cc-authorize-probe-classifier.mjs && node test/compat-range.mjs && node test/doctor-formatter.mjs && node test/atomic-write.mjs && node test/account-refresh-singleflight.mjs && node test/streaming-edge-cases.mjs && node test/client-detection.mjs && node test/manual-oauth-flow.mjs && node test/scrub-template.mjs && node test/sanitize-messages.mjs && node test/platform-tools.mjs && node test/strict-template-flags.mjs && node test/request-queue.mjs && node test/effort-flag.mjs",
     "audit": "npm audit --production --audit-level=high",
     "prepublishOnly": "npm run build",
     "start": "node dist/cli.js",