@askalf/dario 3.30.7 → 3.30.9

package/README.md CHANGED
@@ -443,12 +443,102 @@ const msg = await client.messages.create({
443
443
 
444
444
  ### OpenAI-compatible tools (Cursor, Continue, Aider, LiteLLM, …)
445
445
 
446
+ Any tool that accepts an OpenAI-compatible base URL + API key works with dario. The universal env-var setup:
447
+
446
448
  ```bash
447
449
  export OPENAI_BASE_URL=http://localhost:3456/v1
448
450
  export OPENAI_API_KEY=dario
449
451
  ```
450
452
 
451
- Any tool that accepts an OpenAI base URL works. Use Claude model names (`claude-opus-4-7`, `opus`, `sonnet`, `haiku`) for the Claude backend, or GPT-family names for the configured OpenAI-compat backend.
453
+ Use Claude model names (`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`, or shortcuts `opus` / `sonnet` / `haiku`) for the Claude subscription backend, or GPT-family / Llama / any-other-model names for your configured OpenAI-compat backends.
454
+
455
+ Some tools use env vars (above works as-is); others want settings-UI entries:
456
+
457
+ #### Cursor
458
+
459
+ 1. **Cmd/Ctrl + ,** to open Settings → **Models**
460
+ 2. Under the **OpenAI API Key** section:
461
+ - Check **Override OpenAI Base URL**: `http://localhost:3456/v1`
462
+ - API key: `dario`
463
+ 3. Under the **Model Names** section (or the Add Model button):
464
+ - Add `claude-sonnet-4-6`
465
+ - Add `claude-opus-4-7` (premium)
466
+ - Add `claude-haiku-4-5` (cheap)
467
+ 4. Select one of the new models in the chat input's model picker.
468
+
469
+ Cursor now routes those model names through dario → your Claude Max / Pro subscription. `gpt-*` and `o*` model names still route through Cursor's default OpenAI path — dario doesn't interfere with non-Claude traffic unless you point Cursor's base URL at it exclusively.
470
+
471
+ #### Continue.dev
472
+
473
+ In `~/.continue/config.yaml` (or the Continue settings UI, which edits the same file):
474
+
475
+ ```yaml
476
+ models:
477
+ - name: Claude Sonnet (dario)
478
+ provider: anthropic
479
+ model: claude-sonnet-4-6
480
+ apiBase: http://localhost:3456
481
+ apiKey: dario
482
+ - name: Claude Opus (dario)
483
+ provider: anthropic
484
+ model: claude-opus-4-7
485
+ apiBase: http://localhost:3456
486
+ apiKey: dario
487
+ ```
488
+
489
+ `provider: anthropic` + `apiBase: http://localhost:3456` points Continue's Anthropic SDK path at dario instead of `api.anthropic.com`. dario runs the full Claude Code wire replay on the outbound path.
490
+
491
+ #### Aider
492
+
493
+ ```bash
494
+ export ANTHROPIC_BASE_URL=http://localhost:3456
495
+ export ANTHROPIC_API_KEY=dario
496
+ aider --model sonnet
497
+ ```
498
+
499
+ Aider's Anthropic path honors `ANTHROPIC_BASE_URL` directly. `--model opus`, `--model haiku`, or any explicit `claude-*` model name works.
500
+
501
+ #### Cline / Roo Code / Kilo Code
502
+
503
+ Cline and its forks use a UI-based "API Provider" dropdown. Pick **Anthropic** as the provider and fill in:
504
+
505
+ - **API Key**: `dario`
506
+ - **Anthropic Base URL**: `http://localhost:3456`
507
+ - **Model**: `claude-sonnet-4-6` / `claude-opus-4-7` / `claude-haiku-4-5`
508
+
509
+ Cline's tool-invocation protocol is XML-based (`<execute_command>`, `<write_to_file>`, etc.), not Anthropic's tool-use format. Dario auto-detects Cline-family clients via a system-prompt fingerprint and switches to preserve-tools mode automatically: Cline's own tool schema passes through to Anthropic, and your commands route back to Cline's parser. No flag is required. To override, pass `--no-auto-detect` if you'd rather force the CC fingerprint and deal with the parser mismatch yourself (see [Agent compatibility](#agent-compatibility)).
510
+
511
+ #### Zed
512
+
513
+ Zed's Anthropic provider config (`~/.config/zed/settings.json` or Cmd/Ctrl+,):
514
+
515
+ ```json
516
+ {
517
+ "language_models": {
518
+ "anthropic": {
519
+ "api_url": "http://localhost:3456",
520
+ "version": "2023-06-01"
521
+ }
522
+ }
523
+ }
524
+ ```
525
+
526
+ Set the `ANTHROPIC_API_KEY` env var to `dario` before launching Zed. The model picker then shows Claude models routed through your subscription.
527
+
528
+ #### OpenHands
529
+
530
+ ```bash
531
+ export LLM_BASE_URL=http://localhost:3456
532
+ export LLM_API_KEY=dario
533
+ export LLM_MODEL=anthropic/claude-sonnet-4-6
534
+ python -m openhands.core.main -t "task description"
535
+ ```
536
+
537
+ Prefix the model with `anthropic/` so LiteLLM (OpenHands' inner routing layer) knows to hit the Anthropic path, which dario is now fronting.
538
+
539
+ #### Everything else
540
+
541
+ If your tool isn't listed, check whether it reads `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` from the environment. Most do. For tools that don't, look in their settings for "Base URL" / "API URL" / "Endpoint" / "OpenAI-compatible endpoint" — all of those map to dario's `http://localhost:3456` (Anthropic-protocol) or `http://localhost:3456/v1` (OpenAI-protocol). If the tool only accepts `https://`, you'll need a loopback TLS shim (out of scope here — open an issue if you need one for a specific tool).
452
542
 
453
543
  ### curl
454
544
 
package/dist/cli.d.ts CHANGED
@@ -9,6 +9,20 @@
9
9
  * dario refresh — Force token refresh
10
10
  * dario logout — Remove saved credentials
11
11
  */
12
+ /**
13
+ * Parse a positive-integer env var. Returns undefined on unset, empty,
14
+ * non-numeric, or non-positive values so the caller's default applies.
15
+ * Sibling of parsePositiveIntFlag; exported for tests + used by the
16
+ * dario#80 queue env mirrors.
17
+ */
18
+ export declare function parsePositiveIntEnv(value: string | undefined): number | undefined;
19
+ /**
20
+ * Parse a boolean env var. Accepts "1", "true", "yes", "on" (case-insensitive)
21
+ * as truthy; everything else (including unset) is undefined/false. Exported
22
+ * for tests. Used by dario#77 DARIO_STRICT_TEMPLATE / DARIO_NO_LIVE_CAPTURE
23
+ * and any future boolean env mirror.
24
+ */
25
+ export declare function parseBooleanEnv(value: string | undefined): boolean | undefined;
12
26
  /**
13
27
  * Parse --preserve-orchestration-tags (bare or =value) + env mirror.
14
28
  * Exported for tests.
package/dist/cli.js CHANGED
@@ -236,6 +236,30 @@ async function proxy() {
236
236
  // DARIO_PRESERVE_ORCHESTRATION_TAGS=* (preserve all)
237
237
  // DARIO_PRESERVE_ORCHESTRATION_TAGS=thinking,env (preserve listed)
238
238
  const preserveOrchestrationTags = resolvePreserveOrchestrationTags(args, process.env['DARIO_PRESERVE_ORCHESTRATION_TAGS']);
239
+ // --no-live-capture / --strict-template — template fail-closed knobs.
240
+ // Convergent push-back from Grok + GPT in reviews/: drift resilience
241
+ // should be opt-in-verifiable, not silently best-effort. dario#77.
242
+ // --no-live-capture → skip the background CC capture entirely, use
243
+ // the bundled snapshot; for air-gapped / CI.
244
+ // --strict-template → refuse to start if the loaded template is
245
+ // bundled (no live capture) or drifted from
246
+ // the installed CC; same shape as --strict-tls.
247
+ const noLiveCapture = args.includes('--no-live-capture')
248
+ || parseBooleanEnv(process.env['DARIO_NO_LIVE_CAPTURE'])
249
+ || undefined;
250
+ const strictTemplate = args.includes('--strict-template')
251
+ || parseBooleanEnv(process.env['DARIO_STRICT_TEMPLATE'])
252
+ || undefined;
253
+ // --max-concurrent=N / --max-queued=N / --queue-timeout=MS — bounded
254
+ // request queue knobs (dario#80). Defaults preserve v3.30.x-and-earlier
255
+ // behaviour for typical single-user workloads; tune up for high-fan-out
256
+ // agent setups that otherwise hit dario-level 429s before upstream.
257
+ const maxConcurrent = parsePositiveIntFlag('--max-concurrent=')
258
+ ?? parsePositiveIntEnv(process.env['DARIO_MAX_CONCURRENT']);
259
+ const maxQueued = parsePositiveIntFlag('--max-queued=')
260
+ ?? parsePositiveIntEnv(process.env['DARIO_MAX_QUEUED']);
261
+ const queueTimeoutMs = parsePositiveIntFlag('--queue-timeout=')
262
+ ?? parsePositiveIntEnv(process.env['DARIO_QUEUE_TIMEOUT_MS']);
239
263
  // Non-loopback bind without DARIO_API_KEY turns dario into an open
240
264
  // OAuth-subscription relay for anyone on the reachable network. Refuse
241
265
  // to start rather than rely on the operator to read the startup banner.
@@ -255,7 +279,35 @@ async function proxy() {
255
279
  console.error(`[dario] Override (not recommended): pass --unsafe-no-auth if you have out-of-band network controls and accept the risk.`);
256
280
  process.exit(1);
257
281
  }
258
- await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags });
282
+ await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags, noLiveCapture, strictTemplate, maxConcurrent, maxQueued, queueTimeoutMs });
283
+ }
284
+ /**
285
+ * Parse a positive-integer env var. Returns undefined on unset, empty,
286
+ * non-numeric, or non-positive values so the caller's default applies.
287
+ * Sibling of parsePositiveIntFlag; exported for tests + used by the
288
+ * dario#80 queue env mirrors.
289
+ */
290
+ export function parsePositiveIntEnv(value) {
291
+ if (value === undefined || value === '')
292
+ return undefined;
293
+ const n = Number.parseInt(value.trim(), 10);
294
+ if (!Number.isFinite(n) || n <= 0)
295
+ return undefined;
296
+ return n;
297
+ }
298
+ /**
299
+ * Parse a boolean env var. Accepts "1", "true", "yes", "on" (case-insensitive)
300
+ * as truthy; everything else (including unset) is undefined/false. Exported
301
+ * for tests. Used by dario#77 DARIO_STRICT_TEMPLATE / DARIO_NO_LIVE_CAPTURE
302
+ * and any future boolean env mirror.
303
+ */
304
+ export function parseBooleanEnv(value) {
305
+ if (value === undefined)
306
+ return undefined;
307
+ const normalized = value.trim().toLowerCase();
308
+ if (normalized === '1' || normalized === 'true' || normalized === 'yes' || normalized === 'on')
309
+ return true;
310
+ return undefined;
259
311
  }
260
312
  /**
261
313
  * Parse --preserve-orchestration-tags (bare or =value) + env mirror.
@@ -605,7 +657,31 @@ async function help() {
605
657
  stripped. Default: strip every tag in
606
658
  ORCHESTRATION_TAG_NAMES. Env mirror:
607
659
  DARIO_PRESERVE_ORCHESTRATION_TAGS=*
608
- or =tag1,tag2. (v3.31, dario#78)
660
+ or =tag1,tag2. (v3.30.7, dario#78)
661
+ --no-live-capture Skip the background live-fingerprint
662
+ refresh entirely. dario uses the bundled
663
+ snapshot and will NOT spawn the installed
664
+ Claude Code binary. For air-gapped /
665
+ reproducible-build / CI-harness runs.
666
+ Env: DARIO_NO_LIVE_CAPTURE=1.
667
+ (v3.30.8, dario#77)
668
+ --strict-template Refuse to start if the loaded template
669
+ is the bundled snapshot (no live capture
670
+ ever succeeded) or drifts from the
671
+ installed CC version. Same philosophy
672
+ as --strict-tls: make the unsafe state
673
+ require intent. Env: DARIO_STRICT_TEMPLATE=1.
674
+ (v3.30.8, dario#77)
675
+ --max-concurrent=N Max in-flight requests (default: 10).
676
+ Env: DARIO_MAX_CONCURRENT. (dario#80)
677
+ --max-queued=N Max requests buffered waiting for a
678
+ concurrency slot before dario returns
679
+ 429 "queue-full" (default: 128).
680
+ Env: DARIO_MAX_QUEUED. (dario#80)
681
+ --queue-timeout=MS Max ms a queued request waits before
682
+ dario returns 504 "queue-timeout"
683
+ (default: 60000).
684
+ Env: DARIO_QUEUE_TIMEOUT_MS. (dario#80)
609
685
  --port=PORT Port to listen on (default: 3456)
610
686
  --host=ADDRESS Address to bind to (default: 127.0.0.1)
611
687
  Use 0.0.0.0 for LAN; see README for DARIO_API_KEY
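The env-var helpers added to `cli.js` in this release can be exercised in isolation. The sketch below restates `parsePositiveIntEnv` and `parseBooleanEnv` as they appear in the diff and adds a `resolveInt` helper, which is hypothetical (not part of dario) and only illustrates the `flag ?? env ?? default` precedence that `cli.js` and `startProxy` apply between them. One behaviour worth knowing: because the parser relies on `Number.parseInt`, a value with leading digits such as `12abc` is accepted as `12` rather than rejected.

```javascript
// Restated from the cli.js hunk above.
function parsePositiveIntEnv(value) {
  if (value === undefined || value === '') return undefined;
  const n = Number.parseInt(value.trim(), 10);
  if (!Number.isFinite(n) || n <= 0) return undefined;
  return n;
}

// Restated from the cli.js hunk above (same truthy set, condensed).
function parseBooleanEnv(value) {
  if (value === undefined) return undefined;
  const normalized = value.trim().toLowerCase();
  return ['1', 'true', 'yes', 'on'].includes(normalized) ? true : undefined;
}

// Hypothetical helper, for illustration only: CLI flag wins over the env
// mirror, and the env mirror wins over the built-in default.
function resolveInt(flagValue, envValue, fallback) {
  return flagValue ?? parsePositiveIntEnv(envValue) ?? fallback;
}

console.log(parsePositiveIntEnv('32'));        // 32
console.log(parsePositiveIntEnv('0'));         // undefined (non-positive)
console.log(parsePositiveIntEnv('12abc'));     // 12 (parseInt reads the leading digits)
console.log(parseBooleanEnv(' YES '));         // true
console.log(parseBooleanEnv('0'));             // undefined (only truthy spellings are recognised)
console.log(resolveInt(undefined, '64', 10));  // 64 (env applies when no flag is given)
console.log(resolveInt(4, '64', 10));          // 4  (flag wins)
```

Returning `undefined` instead of `false` for unrecognised boolean values is what lets the `|| undefined` chains in `cli.js` leave the option unset, so the proxy's own defaults still apply.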
package/dist/proxy.d.ts CHANGED
@@ -50,6 +50,27 @@ interface ProxyOptions {
50
50
  * Set(['thinking','env']) = strip everything except those two. dario#78.
51
51
  */
52
52
  preserveOrchestrationTags?: Set<string>;
53
+ /**
54
+ * Skip the background live-fingerprint refresh entirely. Use the bundled
55
+ * snapshot even when a live capture would have been possible. For
56
+ * air-gapped / reproducible-build / CI-harness operators who want no
57
+ * subprocess capture of the installed CC binary. dario#77.
58
+ */
59
+ noLiveCapture?: boolean;
60
+ /**
61
+ * Fail-closed mode for the template. If the loaded template is the
62
+ * bundled snapshot (live capture has never been run or failed), or if
63
+ * it's a live cache that drifts from the installed CC, refuse to start
64
+ * rather than silently serve the stale shape. Same philosophy as
65
+ * --strict-tls. dario#77.
66
+ */
67
+ strictTemplate?: boolean;
68
+ /** Max concurrent in-flight requests. Default 10. dario#80. */
69
+ maxConcurrent?: number;
70
+ /** Max requests buffered waiting for a concurrency slot. Default 128. dario#80. */
71
+ maxQueued?: number;
72
+ /** Max ms a queued request waits before it times out with 504. Default 60000. dario#80. */
73
+ queueTimeoutMs?: number;
53
74
  }
54
75
  export declare function sanitizeError(err: unknown): string;
55
76
  /**
package/dist/proxy.js CHANGED
@@ -12,12 +12,12 @@ import { AccountPool, computeStickyKey, parseRateLimits } from './pool.js';
12
12
  import { Analytics, billingBucketFromClaim } from './analytics.js';
13
13
  import { loadAllAccounts, loadAccount, refreshAccountToken } from './accounts.js';
14
14
  import { getOpenAIBackend, isOpenAIModel, forwardToOpenAI } from './openai-backend.js';
15
+ import { RequestQueue, QueueFullError, QueueTimeoutError, DEFAULT_MAX_CONCURRENT, DEFAULT_MAX_QUEUED, DEFAULT_QUEUE_TIMEOUT_MS } from './request-queue.js';
15
16
  const ANTHROPIC_API = 'https://api.anthropic.com';
16
17
  const DEFAULT_PORT = 3456;
17
18
  const MAX_BODY_BYTES = 10 * 1024 * 1024; // 10 MB — generous for large prompts, prevents abuse
18
19
  const UPSTREAM_TIMEOUT_MS = 300_000; // 5 min — matches Anthropic SDK default
19
20
  const BODY_READ_TIMEOUT_MS = 30_000; // 30s — prevents slow-loris on body reads
20
- const MAX_CONCURRENT = 10; // Max concurrent upstream requests
21
21
  const DEFAULT_HOST = '127.0.0.1';
22
22
  // A host is "loopback" if it's one of the well-known localhost literals.
23
23
  // Used to decide whether to warn at startup about binding to a reachable
@@ -28,28 +28,8 @@ function isLoopbackHost(host) {
28
28
  return true;
29
29
  return host.startsWith('127.');
30
30
  }
31
- // Simple semaphore for concurrency control
32
- class Semaphore {
33
- max;
34
- queue = [];
35
- active = 0;
36
- constructor(max) {
37
- this.max = max;
38
- }
39
- async acquire() {
40
- if (this.active < this.max) {
41
- this.active++;
42
- return;
43
- }
44
- return new Promise(resolve => { this.queue.push(() => { this.active++; resolve(); }); });
45
- }
46
- release() {
47
- this.active--;
48
- const next = this.queue.shift();
49
- if (next)
50
- next();
51
- }
52
- }
31
+ // Concurrency control: see src/request-queue.ts for the bounded queue
32
+ // (replaced the v3.30.x-and-earlier simple unbounded semaphore in dario#80).
53
33
  // Billing tag hash seed — matches Claude Code's value
54
34
  const BILLING_SEED = '59cf53e54c78';
55
35
  // Compute per-request build tag:
@@ -572,7 +552,11 @@ export async function startProxy(opts = {}) {
572
552
  }
573
553
  }
574
554
  let requestCount = 0;
575
- const semaphore = new Semaphore(MAX_CONCURRENT);
555
+ const queue = new RequestQueue({
556
+ maxConcurrent: opts.maxConcurrent ?? DEFAULT_MAX_CONCURRENT,
557
+ maxQueued: opts.maxQueued ?? DEFAULT_MAX_QUEUED,
558
+ queueTimeoutMs: opts.queueTimeoutMs ?? DEFAULT_QUEUE_TIMEOUT_MS,
559
+ });
576
560
  // Cache context-1m beta availability. Set false once per account (or process
577
561
  // in single-account mode) after the first "long context" rejection, so we
578
562
  // skip sending context-1m on every subsequent request instead of paying the
@@ -771,8 +755,38 @@ export async function startProxy(opts = {}) {
771
755
  res.end(ERR_METHOD);
772
756
  return;
773
757
  }
774
- // Proxy to Anthropic (with concurrency control)
775
- await semaphore.acquire();
758
+ // Proxy to Anthropic (with concurrency control). The bounded queue
759
+ // replaces the v3.30.x-and-earlier unbounded semaphore (dario#80). A
760
+ // queue-full condition returns an explicit 429 whose message starts with
761
+ // "dario queue full"; a queue-timeout returns 504 with "dario queue timeout".
762
+ try {
763
+ await queue.acquire();
764
+ }
765
+ catch (err) {
766
+ if (err instanceof QueueFullError) {
767
+ res.writeHead(429, JSON_HEADERS);
768
+ res.end(JSON.stringify({
769
+ type: 'error',
770
+ error: {
771
+ type: 'rate_limit_error',
772
+ message: `dario queue full — ${queue.maxConcurrent} concurrent + ${queue.maxQueued} queued already in flight. Tune --max-concurrent / --max-queued, or reduce client-side concurrency. (dario#80)`,
773
+ },
774
+ }));
775
+ return;
776
+ }
777
+ if (err instanceof QueueTimeoutError) {
778
+ res.writeHead(504, JSON_HEADERS);
779
+ res.end(JSON.stringify({
780
+ type: 'error',
781
+ error: {
782
+ type: 'timeout_error',
783
+ message: `dario queue timeout — request waited longer than ${queue.queueTimeoutMs}ms for a concurrency slot. Tune --queue-timeout, or reduce client-side concurrency. (dario#80)`,
784
+ },
785
+ }));
786
+ return;
787
+ }
788
+ throw err;
789
+ }
776
790
  // Hoisted so the finally block can clean up whatever was set.
777
791
  let upstreamTimeout = null;
778
792
  let onClientClose = null;
@@ -1617,7 +1631,7 @@ export async function startProxy(opts = {}) {
1617
1631
  clearTimeout(upstreamTimeout);
1618
1632
  if (onClientClose !== null)
1619
1633
  req.off('close', onClientClose);
1620
- semaphore.release();
1634
+ queue.release();
1621
1635
  }
1622
1636
  });
1623
1637
  server.on('error', (err) => {
@@ -1640,6 +1654,23 @@ export async function startProxy(opts = {}) {
1640
1654
  if (drift.drifted) {
1641
1655
  console.log(`[dario] ⚠ template drift: ${drift.message}`);
1642
1656
  }
1657
+ // Strict-template fail-closed mode. Template must be from a live capture
1658
+ // (not the bundled snapshot) and must not have drifted from the installed
1659
+ // CC. Operator opts in via --strict-template / DARIO_STRICT_TEMPLATE=1.
1660
+ // Same philosophy as --strict-tls: make the unsafe state require intent.
1661
+ // dario#77.
1662
+ if (opts.strictTemplate) {
1663
+ if (CC_TEMPLATE._source === 'bundled') {
1664
+ console.error(`[dario] Refusing to start proxy in --strict-template mode: template source is 'bundled' (no live capture available).`);
1665
+ console.error(`[dario] Fix: run \`claude --print hello\` once so dario can capture the live template, then retry. Or drop --strict-template if the bundled fingerprint is acceptable for this run.`);
1666
+ process.exit(1);
1667
+ }
1668
+ if (drift.drifted) {
1669
+ console.error(`[dario] Refusing to start proxy in --strict-template mode: template drift detected (${drift.message}).`);
1670
+ console.error(`[dario] Fix: rm ~/.dario/cc-template.live.json and retry (the next capture will be against your current CC), or drop --strict-template if the drift is acceptable.`);
1671
+ process.exit(1);
1672
+ }
1673
+ }
1643
1674
  // Compat check: is the installed CC inside the range this dario
1644
1675
  // release has been tested against? Only log when non-OK so the happy
1645
1676
  // path stays quiet. `unknown` (no CC on PATH) is also quiet — bundled
@@ -1661,7 +1692,16 @@ export async function startProxy(opts = {}) {
1661
1692
  // user's own CC binary request shape and updates ~/.dario/cc-template.live.json
1662
1693
  // for the next startup. No-op if CC isn't installed or the cache is fresh.
1663
1694
  // Never blocks proxy startup; never throws.
1664
- void import('./live-fingerprint.js').then(({ refreshLiveFingerprintAsync }) => refreshLiveFingerprintAsync({ silent: false, force: drift.drifted }).catch(() => { }));
1695
+ //
1696
+ // Skipped entirely under --no-live-capture / DARIO_NO_LIVE_CAPTURE=1 —
1697
+ // the operator has opted into a bundled-only shape (air-gapped runs,
1698
+ // reproducible-build CI, deliberate pinning). dario#77.
1699
+ if (!opts.noLiveCapture) {
1700
+ void import('./live-fingerprint.js').then(({ refreshLiveFingerprintAsync }) => refreshLiveFingerprintAsync({ silent: false, force: drift.drifted }).catch(() => { }));
1701
+ }
1702
+ else {
1703
+ console.log('[dario] --no-live-capture: background live fingerprint refresh skipped; using bundled template.');
1704
+ }
1665
1705
  server.listen(port, host, () => {
1666
1706
  const modeLine = passthrough
1667
1707
  ? 'Mode: passthrough (OAuth swap only, no injection)'
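A client that wants to tell dario's own back-pressure apart from an upstream rate limit can inspect the response body. The sketch below is a hypothetical client-side helper (`classifyDarioError` is not part of dario); it relies only on the status codes, `error.type` values, and message prefixes visible in the `proxy.js` hunk above. The message prefix matters because Anthropic's API also uses `rate_limit_error` for its own 429s.

```javascript
// Hypothetical helper, for illustration only. Returns 'queue-full',
// 'queue-timeout', or 'other' for a response received from the proxy.
function classifyDarioError(status, bodyText) {
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return 'other'; // not JSON, so not one of dario's queue errors
  }
  const type = parsed?.error?.type;
  const message = String(parsed?.error?.message ?? '');
  if (status === 429 && type === 'rate_limit_error' && message.startsWith('dario queue full')) {
    return 'queue-full';
  }
  if (status === 504 && type === 'timeout_error' && message.startsWith('dario queue timeout')) {
    return 'queue-timeout';
  }
  return 'other';
}

// Sample bodies shaped like the ones written by the proxy in the diff.
const fullBody = JSON.stringify({
  type: 'error',
  error: { type: 'rate_limit_error', message: 'dario queue full (sample text)' },
});
const upstreamBody = JSON.stringify({
  type: 'error',
  error: { type: 'rate_limit_error', message: 'upstream rate limit (sample text)' },
});

console.log(classifyDarioError(429, fullBody));     // queue-full
console.log(classifyDarioError(429, upstreamBody)); // other
```

A `queue-full` result suggests lowering client-side concurrency or raising `--max-queued`; an `other` 429 is more likely an upstream limit that the queue knobs will not change.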
@@ -0,0 +1,75 @@
1
+ /**
2
+ * Bounded request queue — replaces the simple in-process semaphore so that
3
+ * overload conditions are visible and tunable instead of silently queuing
4
+ * unbounded or rejecting with generic 429s before upstream had a chance.
5
+ *
6
+ * Three knobs:
7
+ * - maxConcurrent : in-flight requests allowed at once (default 10)
8
+ * - maxQueued : buffered requests waiting for a concurrency slot
9
+ * (default 128); beyond this, the queue is "full" and
10
+ * admission is rejected with a clear 429 body.
11
+ * - queueTimeoutMs: how long a queued request waits before it 504s with
12
+ * a "queue-timeout" reason (default 60_000).
13
+ *
14
+ * Behaviour:
15
+ * - active < maxConcurrent → admit immediately
16
+ * - else, queued < maxQueued → enqueue
17
+ * - else → reject with `queue-full`
18
+ * - queued > queueTimeoutMs → reject with `queue-timeout`
19
+ *
20
+ * The decision logic is split out as a pure `decideAdmit(state)` function so
21
+ * tests can exercise all three branches without side effects or timers.
22
+ *
23
+ * dario#80 (Gemini review push-back).
24
+ */
25
+ export interface QueueState {
26
+ active: number;
27
+ queued: number;
28
+ maxConcurrent: number;
29
+ maxQueued: number;
30
+ }
31
+ export type AdmitDecision = {
32
+ action: 'admit';
33
+ } | {
34
+ action: 'enqueue';
35
+ } | {
36
+ action: 'reject';
37
+ reason: 'queue-full';
38
+ };
39
+ /** Pure admission decision — no side effects, no clock dep. */
40
+ export declare function decideAdmit(state: QueueState): AdmitDecision;
41
+ /** Pure timeout check — separated so tests can pass an explicit clock. */
42
+ export declare function isQueueEntryExpired(enqueuedAt: number, now: number, timeoutMs: number): boolean;
43
+ export declare class QueueFullError extends Error {
44
+ constructor();
45
+ }
46
+ export declare class QueueTimeoutError extends Error {
47
+ constructor();
48
+ }
49
+ export interface RequestQueueOptions {
50
+ maxConcurrent?: number;
51
+ maxQueued?: number;
52
+ queueTimeoutMs?: number;
53
+ }
54
+ export declare const DEFAULT_MAX_CONCURRENT = 10;
55
+ export declare const DEFAULT_MAX_QUEUED = 128;
56
+ export declare const DEFAULT_QUEUE_TIMEOUT_MS = 60000;
57
+ export declare class RequestQueue {
58
+ readonly maxConcurrent: number;
59
+ readonly maxQueued: number;
60
+ readonly queueTimeoutMs: number;
61
+ private active;
62
+ private queue;
63
+ constructor(opts?: RequestQueueOptions);
64
+ /**
65
+ * Acquire a concurrency slot. Resolves when admitted; throws
66
+ * `QueueFullError` when the queue is at its `maxQueued` cap, throws
67
+ * `QueueTimeoutError` when a queued request waited longer than
68
+ * `queueTimeoutMs`.
69
+ */
70
+ acquire(): Promise<void>;
71
+ /** Release a slot. The next queued entry (if any) is admitted in FIFO order. */
72
+ release(): void;
73
+ /** Snapshot of queue state — exposed for /analytics + tests. */
74
+ snapshot(): QueueState;
75
+ }
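The two pure functions declared above are small enough to check by hand. This sketch restates `decideAdmit` and `isQueueEntryExpired` from the implementation in this diff and walks through the boundary cases; the limits used (`maxConcurrent: 2`, `maxQueued: 1`) are arbitrary values chosen for the example.

```javascript
// Restated from request-queue.js in this diff.
function decideAdmit(state) {
  if (state.active < state.maxConcurrent) return { action: 'admit' };
  if (state.queued < state.maxQueued) return { action: 'enqueue' };
  return { action: 'reject', reason: 'queue-full' };
}

function isQueueEntryExpired(enqueuedAt, now, timeoutMs) {
  return (now - enqueuedAt) > timeoutMs;
}

// Example limits, chosen only for this walkthrough.
const limits = { maxConcurrent: 2, maxQueued: 1 };

console.log(decideAdmit({ active: 1, queued: 0, ...limits }).action); // admit
console.log(decideAdmit({ active: 2, queued: 0, ...limits }).action); // enqueue
console.log(decideAdmit({ active: 2, queued: 1, ...limits }).reason); // queue-full

// The comparison is strict, so waiting exactly timeoutMs is not yet expired.
console.log(isQueueEntryExpired(1000, 61000, 60000)); // false
console.log(isQueueEntryExpired(1000, 61001, 60000)); // true
```

Neither function reads a clock or mutates state, which is what allows tests to cover admission and expiry without timers.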
@@ -0,0 +1,108 @@
1
+ /**
2
+ * Bounded request queue — replaces the simple in-process semaphore so that
3
+ * overload conditions are visible and tunable instead of silently queuing
4
+ * unbounded or rejecting with generic 429s before upstream had a chance.
5
+ *
6
+ * Three knobs:
7
+ * - maxConcurrent : in-flight requests allowed at once (default 10)
8
+ * - maxQueued : buffered requests waiting for a concurrency slot
9
+ * (default 128); beyond this, the queue is "full" and
10
+ * admission is rejected with a clear 429 body.
11
+ * - queueTimeoutMs: how long a queued request waits before it 504s with
12
+ * a "queue-timeout" reason (default 60_000).
13
+ *
14
+ * Behaviour:
15
+ * - active < maxConcurrent → admit immediately
16
+ * - else, queued < maxQueued → enqueue
17
+ * - else → reject with `queue-full`
18
+ * - queued > queueTimeoutMs → reject with `queue-timeout`
19
+ *
20
+ * The decision logic is split out as a pure `decideAdmit(state)` function so
21
+ * tests can exercise all three branches without side effects or timers.
22
+ *
23
+ * dario#80 (Gemini review push-back).
24
+ */
25
+ /** Pure admission decision — no side effects, no clock dep. */
26
+ export function decideAdmit(state) {
27
+ if (state.active < state.maxConcurrent)
28
+ return { action: 'admit' };
29
+ if (state.queued < state.maxQueued)
30
+ return { action: 'enqueue' };
31
+ return { action: 'reject', reason: 'queue-full' };
32
+ }
33
+ /** Pure timeout check — separated so tests can pass an explicit clock. */
34
+ export function isQueueEntryExpired(enqueuedAt, now, timeoutMs) {
35
+ return (now - enqueuedAt) > timeoutMs;
36
+ }
37
+ export class QueueFullError extends Error {
38
+ constructor() { super('queue-full'); this.name = 'QueueFullError'; }
39
+ }
40
+ export class QueueTimeoutError extends Error {
41
+ constructor() { super('queue-timeout'); this.name = 'QueueTimeoutError'; }
42
+ }
43
+ export const DEFAULT_MAX_CONCURRENT = 10;
44
+ export const DEFAULT_MAX_QUEUED = 128;
45
+ export const DEFAULT_QUEUE_TIMEOUT_MS = 60_000;
46
+ export class RequestQueue {
47
+ maxConcurrent;
48
+ maxQueued;
49
+ queueTimeoutMs;
50
+ active = 0;
51
+ queue = [];
52
+ constructor(opts = {}) {
53
+ this.maxConcurrent = opts.maxConcurrent ?? DEFAULT_MAX_CONCURRENT;
54
+ this.maxQueued = opts.maxQueued ?? DEFAULT_MAX_QUEUED;
55
+ this.queueTimeoutMs = opts.queueTimeoutMs ?? DEFAULT_QUEUE_TIMEOUT_MS;
56
+ }
57
+ /**
58
+ * Acquire a concurrency slot. Resolves when admitted; throws
59
+ * `QueueFullError` when the queue is at its `maxQueued` cap, throws
60
+ * `QueueTimeoutError` when a queued request waited longer than
61
+ * `queueTimeoutMs`.
62
+ */
63
+ async acquire() {
64
+ const decision = decideAdmit(this.snapshot());
65
+ if (decision.action === 'admit') {
66
+ this.active++;
67
+ return;
68
+ }
69
+ if (decision.action === 'reject') {
70
+ throw new QueueFullError();
71
+ }
72
+ return new Promise((resolve, reject) => {
73
+ const enqueuedAt = Date.now();
74
+ const timeoutHandle = setTimeout(() => {
75
+ const idx = this.queue.indexOf(entry);
76
+ if (idx >= 0) {
77
+ this.queue.splice(idx, 1);
78
+ reject(new QueueTimeoutError());
79
+ }
80
+ }, this.queueTimeoutMs);
81
+ // Keep the timer from pinning the event loop open on shutdown. A queued
82
+ // request waiting for a slot shouldn't by itself keep the process alive.
83
+ timeoutHandle.unref?.();
84
+ const entry = { resolve, reject, enqueuedAt, timeoutHandle };
85
+ this.queue.push(entry);
86
+ });
87
+ }
88
+ /** Release a slot. The next queued entry (if any) is admitted in FIFO order. */
89
+ release() {
90
+ if (this.active > 0)
91
+ this.active--;
92
+ const next = this.queue.shift();
93
+ if (next) {
94
+ clearTimeout(next.timeoutHandle);
95
+ this.active++;
96
+ next.resolve();
97
+ }
98
+ }
99
+ /** Snapshot of queue state — exposed for /analytics + tests. */
100
+ snapshot() {
101
+ return {
102
+ active: this.active,
103
+ queued: this.queue.length,
104
+ maxConcurrent: this.maxConcurrent,
105
+ maxQueued: this.maxQueued,
106
+ };
107
+ }
108
+ }
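To see the queue's runtime behaviour end to end, the sketch below uses a condensed restatement of `RequestQueue` (same admit, enqueue, and reject rules as the file above, with the admission check inlined) and a hypothetical `withSlot` helper that is not part of dario. It assumes one concurrency slot and one queue slot so that three overlapping calls hit all three outcomes: admitted, queued, and rejected.

```javascript
class QueueFullError extends Error {
  constructor() { super('queue-full'); this.name = 'QueueFullError'; }
}
class QueueTimeoutError extends Error {
  constructor() { super('queue-timeout'); this.name = 'QueueTimeoutError'; }
}

// Condensed restatement of the class in this diff, for illustration.
class RequestQueue {
  constructor({ maxConcurrent = 10, maxQueued = 128, queueTimeoutMs = 60_000 } = {}) {
    Object.assign(this, { maxConcurrent, maxQueued, queueTimeoutMs });
    this.active = 0;
    this.queue = [];
  }
  async acquire() {
    if (this.active < this.maxConcurrent) { this.active++; return; }
    if (this.queue.length >= this.maxQueued) throw new QueueFullError();
    return new Promise((resolve, reject) => {
      const entry = { resolve };
      entry.timeoutHandle = setTimeout(() => {
        const idx = this.queue.indexOf(entry);
        if (idx >= 0) { this.queue.splice(idx, 1); reject(new QueueTimeoutError()); }
      }, this.queueTimeoutMs);
      entry.timeoutHandle.unref?.(); // a waiting entry should not keep the process alive
      this.queue.push(entry);
    });
  }
  release() {
    if (this.active > 0) this.active--;
    const next = this.queue.shift();
    if (next) { clearTimeout(next.timeoutHandle); this.active++; next.resolve(); }
  }
}

// Hypothetical helper: acquire sits outside the try block so that a rejected
// acquire never releases a slot it did not hold.
async function withSlot(queue, task) {
  await queue.acquire();
  try { return await task(); } finally { queue.release(); }
}

async function demo() {
  const queue = new RequestQueue({ maxConcurrent: 1, maxQueued: 1, queueTimeoutMs: 1000 });
  const order = [];
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const first = withSlot(queue, async () => { await sleep(20); order.push('first'); });
  const second = withSlot(queue, async () => { order.push('second'); });
  const third = withSlot(queue, async () => { order.push('third'); }).catch((err) => err.name);
  await Promise.all([first, second]);
  return { order, third: await third };
}

demo().then((result) => console.log(result));
// prints { order: [ 'first', 'second' ], third: 'QueueFullError' }
```

The second call waits for the first to release and then runs in FIFO order, while the third is rejected immediately because the single queue slot is already taken.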
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@askalf/dario",
3
- "version": "3.30.7",
3
+ "version": "3.30.9",
4
4
  "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -21,7 +21,7 @@
21
21
  ],
22
22
  "scripts": {
23
23
  "build": "tsc && cp src/cc-template-data.json dist/ && node -e \"require('fs').mkdirSync('dist/shim',{recursive:true})\" && cp src/shim/runtime.cjs dist/shim/",
24
- "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/tool-schema-contract.mjs && node test/scrub-paths.mjs && node test/provider-prefix.mjs && node test/analytics-recording.mjs && node test/analytics-billing-bucket.mjs && node test/failover-429.mjs && node test/pool-sticky.mjs && node test/live-fingerprint.mjs && node test/shim-runtime.mjs && node test/shim-e2e.mjs && node test/proxy-header-order.mjs && node test/proxy-body-order.mjs && node test/runtime-fingerprint.mjs && node test/pacing.mjs && node test/stream-drain.mjs && node test/subagent.mjs && node test/mcp-protocol.mjs && node test/mcp-tools.mjs && node test/mcp-e2e.mjs && node test/session-rotation.mjs && node test/drift-detection.mjs && node test/cc-authorize-probe-classifier.mjs && node test/compat-range.mjs && node test/doctor-formatter.mjs && node test/atomic-write.mjs && node test/account-refresh-singleflight.mjs && node test/streaming-edge-cases.mjs && node test/client-detection.mjs && node test/manual-oauth-flow.mjs && node test/scrub-template.mjs && node test/sanitize-messages.mjs && node test/platform-tools.mjs",
24
+ "test": "node test/issue-29-tool-translation.mjs && node test/hybrid-tools.mjs && node test/tool-schema-contract.mjs && node test/scrub-paths.mjs && node test/provider-prefix.mjs && node test/analytics-recording.mjs && node test/analytics-billing-bucket.mjs && node test/failover-429.mjs && node test/pool-sticky.mjs && node test/live-fingerprint.mjs && node test/shim-runtime.mjs && node test/shim-e2e.mjs && node test/proxy-header-order.mjs && node test/proxy-body-order.mjs && node test/runtime-fingerprint.mjs && node test/pacing.mjs && node test/stream-drain.mjs && node test/subagent.mjs && node test/mcp-protocol.mjs && node test/mcp-tools.mjs && node test/mcp-e2e.mjs && node test/session-rotation.mjs && node test/drift-detection.mjs && node test/cc-authorize-probe-classifier.mjs && node test/compat-range.mjs && node test/doctor-formatter.mjs && node test/atomic-write.mjs && node test/account-refresh-singleflight.mjs && node test/streaming-edge-cases.mjs && node test/client-detection.mjs && node test/manual-oauth-flow.mjs && node test/scrub-template.mjs && node test/sanitize-messages.mjs && node test/platform-tools.mjs && node test/strict-template-flags.mjs && node test/request-queue.mjs",
25
25
  "audit": "npm audit --production --audit-level=high",
26
26
  "prepublishOnly": "npm run build",
27
27
  "start": "node dist/cli.js",