@askalf/dario 3.37.20 → 3.38.1
- package/README.md +27 -24
- package/dist/cli.js +42 -1
- package/dist/pacing.d.ts +73 -0
- package/dist/pacing.js +48 -0
- package/dist/proxy.d.ts +6 -0
- package/dist/proxy.js +76 -6
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -62,6 +62,33 @@ Something not right? `dario doctor` prints a single paste-ready health report. P
|
|
|
62
62
|
|
|
63
63
|
---
|
|
64
64
|
|
|
65
|
+
## The 2026-06-15 billing cliff
|
|
66
|
+
|
|
67
|
+
Starting **2026-06-15**, Anthropic splits Claude plan usage into two separate pools. The one that matters for coding agents:
|
|
68
|
+
|
|
69
|
+
| Plan | New Agent-SDK / `claude -p` credit | What happens when it runs out |
|
|
70
|
+
|---|---|---|
|
|
71
|
+
| Pro | **$20/mo** | Per-token API pricing |
|
|
72
|
+
| Max 5x | **$100/mo** | Per-token API pricing |
|
|
73
|
+
| Max 20x | **$200/mo** | Per-token API pricing |
|
|
74
|
+
|
|
75
|
+
A sustained Cline or Aider session burns $100 of API-rate tokens in an evening. Agentic loops blow past $200 in days. Any proxy that forwards requests in their original Agent-SDK or `claude -p` wire shape — which is most of them — puts your agentic traffic into this new credit pool instead of your subscription pool. Once it's gone, you're on metered pricing.
|
|
76
|
+
|
|
77
|
+
**Dario doesn't.** Every outbound request is rebuilt as **interactive Claude Code wire-shape** before it leaves your machine: headers, body key order, TLS stack, session-id lifecycle — the same six axes the live template extractor has been closing since v3.22. Anthropic's billing classifier sees an interactive CC session. Your traffic stays in the subscription pool you already pay for.
|
|
78
|
+
|
|
79
|
+
| Your setup | Post-2026-06-15 billing path |
|
|
80
|
+
|---|---|
|
|
81
|
+
| Any tool → Anthropic API direct | Per-token API |
|
|
82
|
+
| Any tool → proxy forwarding requests as-is | **Agent-SDK credit ($20–200/mo cap), then per-token API** |
|
|
83
|
+
| **Any tool → dario** | **Subscription pool — unchanged** |
|
|
84
|
+
| Claude Code interactive | Subscription pool — unchanged |
|
|
85
|
+
|
|
86
|
+
Same install. Same `localhost:3456`. No config change needed for the cliff.
|
|
87
|
+
|
|
88
|
+
**Verify it's working:** `dario doctor --usage` fires one Haiku request through your OAuth and surfaces the rate-limit headers Anthropic returned. The `representative-claim` field should read `five_hour` or `seven_day` — subscription billing buckets. If it reads anything else after 2026-06-15 lands, [file an issue](https://github.com/askalf/dario/issues/new) — that's what the drift detector exists to catch. Full technical breakdown + post-cliff verification procedure: [`docs/why-now-2026-06.md`](./docs/why-now-2026-06.md).
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
65
92
|
## What it actually does
|
|
66
93
|
|
|
67
94
|
You point every tool at one URL. Dario reads each request, decides which backend owns it, and forwards in that backend's native protocol.
|
|
@@ -96,29 +123,6 @@ Already have **Pro + Max** stacked? Pool mode (`dario accounts add work` / `dari
|
|
|
96
123
|
|
|
97
124
|
---
|
|
98
125
|
|
|
99
|
-
## What changes 2026-06-15 (and why dario doesn't)
|
|
100
|
-
|
|
101
|
-
Anthropic announced that starting **2026-06-15**, Claude Agent SDK and `claude -p` (Claude Code headless mode) usage will no longer count toward Claude plan usage limits. Eligible plans get a separate fixed monthly credit instead — **$20/mo on Pro, $100/mo on Max 5x, $200/mo on Max 20x**. Once exhausted, those calls go to metered API pricing.
|
|
102
|
-
|
|
103
|
-
For autonomous workloads — long-running coding agents, background scripts, anything that runs unattended — that credit pool is small. A single sustained Cline or Aider session can chew through $100 in an evening; agentic workloads blow past $200 in days.
|
|
104
|
-
|
|
105
|
-
**Dario is unaffected by this split.** The Claude backend was designed from day one to send requests as **interactive Claude Code wire-shape** — full template replay from your installed CC binary (headers, body key order, TLS stack, session-id lifecycle). The upstream billing path sees a Claude Code interactive session, not a `claude -p` invocation or an Agent SDK call, regardless of what tool actually originated the request locally. That's the whole point of the wire-fidelity work in [`docs/wire-fidelity.md`](./docs/wire-fidelity.md) — and it was already shipping in production before the 2026-06-15 announcement.
|
|
106
|
-
|
|
107
|
-
What this means concretely:
|
|
108
|
-
|
|
109
|
-
| Setup | Pre-2026-06-15 | Post-2026-06-15 |
|
|
110
|
-
|---|---|---|
|
|
111
|
-
| Cline / Aider / Cursor → Anthropic API direct with API key | Per-token API billing | Per-token API billing (unchanged) |
|
|
112
|
-
| Cline / Aider / Cursor → naive proxy that passes `claude -p` / Agent SDK calls through unchanged | Subscription pool | **Separate $20–200/mo credit pool, then per-token API** |
|
|
113
|
-
| **Cline / Aider / Cursor → dario** | **Subscription pool** | **Subscription pool (unchanged — dario rewrites as interactive CC)** |
|
|
114
|
-
| Claude Code itself, used interactively | Subscription pool | Subscription pool (unchanged) |
|
|
115
|
-
|
|
116
|
-
If you're coming from a proxy that doesn't replay the full CC wire shape, your agentic workloads are about to land in the smaller credit bucket on 2026-06-15. Dario keeps them on the subscription pool you already pay for.
|
|
117
|
-
|
|
118
|
-
Same install. Same `localhost:3456`. No config change needed for the cliff. Full technical breakdown — including the two diagnostic checks you can run after 2026-06-15 lands to verify the classification on your own setup — is in [`docs/why-now-2026-06.md`](./docs/why-now-2026-06.md).
|
|
119
|
-
|
|
120
|
-
---
|
|
121
|
-
|
|
122
126
|
## Why you'll install this
|
|
123
127
|
|
|
124
128
|
- **One URL for every provider.** Cursor, Aider, Continue, Zed, OpenHands, Claude Code, your own scripts — every tool you own has its own per-provider config. Dario collapses that into a single `localhost:3456` that speaks both Anthropic and OpenAI protocols and routes by model name.
|
|
@@ -378,7 +382,6 @@ MIT — see [LICENSE](LICENSE) and [DISCLAIMER.md](DISCLAIMER.md).
|
|
|
378
382
|
| Project | What it does |
|
|
379
383
|
|---------|-------------|
|
|
380
384
|
| [arnie](https://github.com/askalf/arnie) | Portable IT troubleshooting companion. Networking, AD, Windows Update, package managers, log triage, hardware checks. |
|
|
381
|
-
| [brio](https://github.com/askalf/brio) | Capability layer for AI workloads — semantic cache, cost tiering, policy. Sits in front of any Anthropic-compat endpoint. |
|
|
382
385
|
| [browser-bridge](https://github.com/askalf/browser-bridge) | Stealth headless Chromium in a container. CDP on 9222 — Playwright/Puppeteer/MCP-compatible. |
|
|
383
386
|
| [claude-bridge](https://github.com/askalf/claude-bridge) | Bridge Claude Code sessions to Discord. |
|
|
384
387
|
| [deepdive](https://github.com/askalf/deepdive) | Local research agent. Plan → search → fetch → extract → synthesize. Cited answers. |
|
package/dist/cli.js
CHANGED
|
@@ -251,6 +251,16 @@ async function proxy() {
|
|
|
251
251
|
// calc lives in src/pacing.ts; the flags just feed it.
|
|
252
252
|
const pacingMinMs = parsePositiveIntFlag('--pace-min=');
|
|
253
253
|
const pacingJitterMs = parsePositiveIntFlag('--pace-jitter=');
|
|
254
|
+
// --think-time-* / --session-start-* — behavioral smoothing extension.
|
|
255
|
+
// Closes the temporal axis the wire-fidelity work doesn't touch:
|
|
256
|
+
// response-length-correlated read time between requests, and per-
|
|
257
|
+
// session opening latency. All defaults 0 = off (opt-in).
|
|
258
|
+
const thinkTimeBaseMs = parsePositiveIntFlag('--think-time-base=');
|
|
259
|
+
const thinkTimePerTokenMs = parsePositiveIntFlag('--think-time-per-token=');
|
|
260
|
+
const thinkTimeJitterMs = parsePositiveIntFlag('--think-time-jitter=');
|
|
261
|
+
const thinkTimeMaxMs = parsePositiveIntFlag('--think-time-max=');
|
|
262
|
+
const sessionStartMinMs = parsePositiveIntFlag('--session-start-min=');
|
|
263
|
+
const sessionStartJitterMs = parsePositiveIntFlag('--session-start-jitter=');
|
|
254
264
|
// --drain-on-close (v3.25, direction #5). When set, a client
|
|
255
265
|
// disconnect no longer aborts the upstream SSE — dario keeps
|
|
256
266
|
// draining the stream to EOF so Anthropic sees the CC-shaped
|
|
@@ -389,7 +399,7 @@ async function proxy() {
|
|
|
389
399
|
console.error(`[dario] Override (not recommended): pass --unsafe-no-auth if you have out-of-band network controls and accept the risk.`);
|
|
390
400
|
process.exit(1);
|
|
391
401
|
}
|
|
392
|
-
await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, mergeTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags, noLiveCapture, strictTemplate, maxConcurrent, maxQueued, queueTimeoutMs, effort, maxTokens, logFile, passthroughBetas, systemPrompt });
|
|
402
|
+
await startProxy({ port, host, verbose, verboseBodies, model, passthrough, preserveTools, hybridTools, mergeTools, noAutoDetect, strictTls, pacingMinMs, pacingJitterMs, thinkTimeBaseMs, thinkTimePerTokenMs, thinkTimeJitterMs, thinkTimeMaxMs, sessionStartMinMs, sessionStartJitterMs, drainOnClose, sessionIdleRotateMs, sessionRotateJitterMs, sessionMaxAgeMs, sessionPerClient, preserveOrchestrationTags, noLiveCapture, strictTemplate, maxConcurrent, maxQueued, queueTimeoutMs, effort, maxTokens, logFile, passthroughBetas, systemPrompt });
|
|
393
403
|
}
|
|
394
404
|
/**
|
|
395
405
|
* Parse `--system-prompt=<verbatim|partial|aggressive|filepath>` (or the
|
|
@@ -1013,6 +1023,37 @@ async function help() {
|
|
|
1013
1023
|
Default: 0 (off). Set to e.g. 300 to hide
|
|
1014
1024
|
the floor from long-run inter-arrival
|
|
1015
1025
|
statistics. (v3.24)
|
|
1026
|
+
--think-time-base=MS Post-response "think time" base — constant
|
|
1027
|
+
ms added before the next request fires.
|
|
1028
|
+
Models the wall-clock pause between an
|
|
1029
|
+
interactive CC user reading a response and
|
|
1030
|
+
typing the next message. Default: 0 (off).
|
|
1031
|
+
Env: DARIO_THINK_TIME_BASE_MS.
|
|
1032
|
+
--think-time-per-token=MS
|
|
1033
|
+
Additional ms per output token of the
|
|
1034
|
+
previous response (linear). e.g. 5 → a
|
|
1035
|
+
1000-token response adds 5s of read time
|
|
1036
|
+
before the next request. Default: 0.
|
|
1037
|
+
Env: DARIO_THINK_TIME_PER_TOKEN_MS.
|
|
1038
|
+
--think-time-jitter=MS Max uniform-random jitter on top of
|
|
1039
|
+
base+perToken*tokens. Hides the formula
|
|
1040
|
+
from long-run inter-arrival statistics.
|
|
1041
|
+
Default: 0.
|
|
1042
|
+
Env: DARIO_THINK_TIME_JITTER_MS.
|
|
1043
|
+
--think-time-max=MS Upper bound on think time so a 50k-token
|
|
1044
|
+
response doesn't pause for minutes.
|
|
1045
|
+
Default: 30000 (30s).
|
|
1046
|
+
Env: DARIO_THINK_TIME_MAX_MS.
|
|
1047
|
+
--session-start-min=MS Floor on session-start delay — applied to
|
|
1048
|
+
the first request only (lastResponseTime
|
|
1049
|
+
=== 0). Real CC sessions open with seconds
|
|
1050
|
+
of startup latency, not microseconds.
|
|
1051
|
+
Default: 0 (off).
|
|
1052
|
+
Env: DARIO_SESSION_START_MIN_MS.
|
|
1053
|
+
--session-start-jitter=MS
|
|
1054
|
+
Max uniform-random jitter on session-start
|
|
1055
|
+
delay. Default: 0.
|
|
1056
|
+
Env: DARIO_SESSION_START_JITTER_MS.
|
|
1016
1057
|
--drain-on-close When the client disconnects mid-stream,
|
|
1017
1058
|
keep consuming the upstream SSE to EOF
|
|
1018
1059
|
so Anthropic sees the same read-to-
|
package/dist/pacing.d.ts
CHANGED
|
@@ -60,3 +60,76 @@ export declare function resolvePacingConfig(explicit?: {
|
|
|
60
60
|
minGapMs?: number;
|
|
61
61
|
jitterMs?: number;
|
|
62
62
|
}, env?: NodeJS.ProcessEnv): PacingConfig;
|
|
63
|
+
/**
|
|
64
|
+
* Post-response "think time" simulation (behavioral smoothing extension).
|
|
65
|
+
*
|
|
66
|
+
* Inter-request `computePacingDelay` enforces a floor on the wall-clock
|
|
67
|
+
* distance between two outbound requests. Think time models the
|
|
68
|
+
* orthogonal axis: how long a real interactive Claude Code user would
|
|
69
|
+
* spend reading a response before sending the next message. Without it,
|
|
70
|
+
* agentic loops fire the next request as fast as the client can stamp
|
|
71
|
+
* one out, which creates an inter-arrival distribution that's
|
|
72
|
+
* structurally absent in real interactive sessions (read-then-type has
|
|
73
|
+
* variance correlated with response length; agent loops don't).
|
|
74
|
+
*
|
|
75
|
+
* delay = baseMs + perTokenMs * lastResponseTokens + U(0, jitterMs)
|
|
76
|
+
*
|
|
77
|
+
* Then clamped to [0, maxMs] and reduced by elapsed time since the
|
|
78
|
+
* response completed (so a slow downstream consumer doesn't double-pay).
|
|
79
|
+
*
|
|
80
|
+
* `lastResponseTime === 0` returns 0 — there's no response to read on
|
|
81
|
+
* the first request of a session. Session-start jitter is a separate
|
|
82
|
+
* function (`computeSessionStartDelay`) since it has different semantics.
|
|
83
|
+
*/
|
|
84
|
+
export interface ThinkTimeConfig {
|
|
85
|
+
/** Constant ms added to every think-time sample, regardless of tokens. */
|
|
86
|
+
baseMs: number;
|
|
87
|
+
/** Additional ms per output token of the previous response (linear). */
|
|
88
|
+
perTokenMs: number;
|
|
89
|
+
/** Max uniform-random jitter (ms) added on top. */
|
|
90
|
+
jitterMs: number;
|
|
91
|
+
/** Upper bound on think time. Prevents pathological pauses on very long responses. */
|
|
92
|
+
maxMs: number;
|
|
93
|
+
}
|
|
94
|
+
export declare function computeThinkTimeDelay(now: number, lastResponseTime: number, lastResponseTokens: number, cfg: ThinkTimeConfig, rng?: () => number): number;
|
|
95
|
+
/**
|
|
96
|
+
* Resolve a ThinkTimeConfig from explicit options, env vars, and
|
|
97
|
+
* defaults. All defaults are 0 — feature is opt-in. `maxMs` defaults to
|
|
98
|
+
* 30000 (30s) when any think-time knob is enabled and the user hasn't
|
|
99
|
+
* set their own cap; on a fully-disabled config the cap doesn't matter
|
|
100
|
+
* since the short-circuit above returns 0 first.
|
|
101
|
+
*/
|
|
102
|
+
export declare function resolveThinkTimeConfig(explicit?: {
|
|
103
|
+
baseMs?: number;
|
|
104
|
+
perTokenMs?: number;
|
|
105
|
+
jitterMs?: number;
|
|
106
|
+
maxMs?: number;
|
|
107
|
+
}, env?: NodeJS.ProcessEnv): ThinkTimeConfig;
|
|
108
|
+
/**
|
|
109
|
+
* Session-start delay (behavioral smoothing extension).
|
|
110
|
+
*
|
|
111
|
+
* Every new single-account session — first request after startup, first
|
|
112
|
+
* request after a session-id rotation — currently fires at machine
|
|
113
|
+
* speed (lastRequestTime resets to 0, computePacingDelay returns 0).
|
|
114
|
+
* Every session opens with an identical zero-delay first request, which
|
|
115
|
+
* is a detectable signal on long-run traffic statistics. Real CC users
|
|
116
|
+
* open a new session by opening the binary and typing a prompt — that's
|
|
117
|
+
* seconds of latency, not microseconds.
|
|
118
|
+
*
|
|
119
|
+
* delay = minMs + U(0, jitterMs)
|
|
120
|
+
*
|
|
121
|
+
* Returns the sampled delay directly (no elapsed-time check — this is a
|
|
122
|
+
* one-shot delay applied to the first request of a session, before any
|
|
123
|
+
* upstream call has happened).
|
|
124
|
+
*/
|
|
125
|
+
export interface SessionStartConfig {
|
|
126
|
+
/** Constant ms floor for session-start delay. */
|
|
127
|
+
minMs: number;
|
|
128
|
+
/** Max uniform-random jitter (ms) added on top. */
|
|
129
|
+
jitterMs: number;
|
|
130
|
+
}
|
|
131
|
+
export declare function computeSessionStartDelay(cfg: SessionStartConfig, rng?: () => number): number;
|
|
132
|
+
export declare function resolveSessionStartConfig(explicit?: {
|
|
133
|
+
minMs?: number;
|
|
134
|
+
jitterMs?: number;
|
|
135
|
+
}, env?: NodeJS.ProcessEnv): SessionStartConfig;
|
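The think-time formula in the declaration comments above can be checked with a small worked example. The sketch below restates the documented arithmetic (`baseMs + perTokenMs * tokens + U(0, jitterMs)`, capped at `maxMs`, minus time already elapsed) as a standalone function. The name `sketchThinkTimeDelay`, the sample config values, and the fixed `rng` are assumptions made for illustration; this does not import or replace the package's `computeThinkTimeDelay`.

```javascript
// Illustrative restatement of the documented think-time formula.
// `sketchThinkTimeDelay` is a hypothetical name, not the package's export.
function sketchThinkTimeDelay(now, lastResponseTime, lastResponseTokens, cfg, rng = Math.random) {
  // No completed response yet (first request of a session): nothing to "read".
  if (lastResponseTime <= 0) return 0;

  const base = Math.max(0, cfg.baseMs);
  const perToken = Math.max(0, cfg.perTokenMs);
  const jitter = Math.max(0, cfg.jitterMs);
  const max = Math.max(0, cfg.maxMs);
  const tokens = Math.max(0, lastResponseTokens);

  // All knobs at zero means the feature is off.
  if (base === 0 && perToken === 0 && jitter === 0) return 0;

  // target = baseMs + perTokenMs * tokens + U(0, jitterMs), capped at maxMs.
  const jitterAdd = jitter > 0 ? Math.floor(rng() * jitter) : 0;
  let target = base + perToken * tokens + jitterAdd;
  if (max > 0 && target > max) target = max;

  // Time already spent since the response completed counts toward the pause.
  const elapsed = now - lastResponseTime;
  return elapsed >= target ? 0 : target - elapsed;
}

// Assumed sample values: 1s base, 5ms per output token, 30s cap.
const cfg = { baseMs: 1000, perTokenMs: 5, jitterMs: 0, maxMs: 30000 };

// 1000-token response, 2s already elapsed: 1000 + 5*1000 = 6000, minus 2000.
const afterShortReply = sketchThinkTimeDelay(10000, 8000, 1000, cfg);

// 50k-token response, nothing elapsed: 251000 is capped to 30000.
const afterLongReply = sketchThinkTimeDelay(10000, 10000, 50000, cfg);

// First request of a session: no think time at all.
const firstRequest = sketchThinkTimeDelay(10000, 0, 1000, cfg);

// With 400ms jitter and a fixed rng of 0.5, 200ms is added before the elapsed-time reduction.
const withJitter = sketchThinkTimeDelay(10000, 8000, 1000, { ...cfg, jitterMs: 400 }, () => 0.5);

console.log({ afterShortReply, afterLongReply, firstRequest, withJitter });
```

Passing `rng` as a parameter, as the declared signature also does, is what makes the jitter term testable without mocking `Math.random`.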
package/dist/pacing.js
CHANGED
|
@@ -76,3 +76,51 @@ function pickNonNegativeInt(...candidates) {
|
|
|
76
76
|
}
|
|
77
77
|
return undefined;
|
|
78
78
|
}
|
|
79
|
+
export function computeThinkTimeDelay(now, lastResponseTime, lastResponseTokens, cfg, rng = Math.random) {
|
|
80
|
+
if (lastResponseTime <= 0)
|
|
81
|
+
return 0;
|
|
82
|
+
const base = Math.max(0, cfg.baseMs);
|
|
83
|
+
const perToken = Math.max(0, cfg.perTokenMs);
|
|
84
|
+
const jitter = Math.max(0, cfg.jitterMs);
|
|
85
|
+
const max = Math.max(0, cfg.maxMs);
|
|
86
|
+
const tokens = Math.max(0, lastResponseTokens);
|
|
87
|
+
// Short-circuit when all knobs are zero — avoids unnecessary rng calls
|
|
88
|
+
// and the elapsed-time math on the hot path when think time is off.
|
|
89
|
+
if (base === 0 && perToken === 0 && jitter === 0)
|
|
90
|
+
return 0;
|
|
91
|
+
const jitterAdd = jitter > 0 ? Math.floor(rng() * jitter) : 0;
|
|
92
|
+
let target = base + perToken * tokens + jitterAdd;
|
|
93
|
+
if (max > 0 && target > max)
|
|
94
|
+
target = max;
|
|
95
|
+
const elapsed = now - lastResponseTime;
|
|
96
|
+
if (elapsed >= target)
|
|
97
|
+
return 0;
|
|
98
|
+
return target - elapsed;
|
|
99
|
+
}
|
|
100
|
+
/**
|
|
101
|
+
* Resolve a ThinkTimeConfig from explicit options, env vars, and
|
|
102
|
+
* defaults. All defaults are 0 — feature is opt-in. `maxMs` defaults to
|
|
103
|
+
* 30000 (30s) when any think-time knob is enabled and the user hasn't
|
|
104
|
+
* set their own cap; on a fully-disabled config the cap doesn't matter
|
|
105
|
+
* since the short-circuit above returns 0 first.
|
|
106
|
+
*/
|
|
107
|
+
export function resolveThinkTimeConfig(explicit = {}, env = process.env) {
|
|
108
|
+
const base = pickNonNegativeInt(explicit.baseMs, env.DARIO_THINK_TIME_BASE_MS) ?? 0;
|
|
109
|
+
const perToken = pickNonNegativeInt(explicit.perTokenMs, env.DARIO_THINK_TIME_PER_TOKEN_MS) ?? 0;
|
|
110
|
+
const jitter = pickNonNegativeInt(explicit.jitterMs, env.DARIO_THINK_TIME_JITTER_MS) ?? 0;
|
|
111
|
+
const max = pickNonNegativeInt(explicit.maxMs, env.DARIO_THINK_TIME_MAX_MS) ?? 30000;
|
|
112
|
+
return { baseMs: base, perTokenMs: perToken, jitterMs: jitter, maxMs: max };
|
|
113
|
+
}
|
|
114
|
+
export function computeSessionStartDelay(cfg, rng = Math.random) {
|
|
115
|
+
const min = Math.max(0, cfg.minMs);
|
|
116
|
+
const jitter = Math.max(0, cfg.jitterMs);
|
|
117
|
+
if (min === 0 && jitter === 0)
|
|
118
|
+
return 0;
|
|
119
|
+
const jitterAdd = jitter > 0 ? Math.floor(rng() * jitter) : 0;
|
|
120
|
+
return min + jitterAdd;
|
|
121
|
+
}
|
|
122
|
+
export function resolveSessionStartConfig(explicit = {}, env = process.env) {
|
|
123
|
+
const min = pickNonNegativeInt(explicit.minMs, env.DARIO_SESSION_START_MIN_MS) ?? 0;
|
|
124
|
+
const jitter = pickNonNegativeInt(explicit.jitterMs, env.DARIO_SESSION_START_JITTER_MS) ?? 0;
|
|
125
|
+
return { minMs: min, jitterMs: jitter };
|
|
126
|
+
}
|
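The two `resolve*Config` functions above follow an explicit-option, then environment-variable, then default precedence. The sketch below shows one way that chain could behave. The body of `pickNonNegativeInt` is not visible in this diff, so `firstNonNegativeInt` is a hypothetical stand-in and its parsing rules (accepting numeric strings, skipping blanks and non-integers) are assumptions, not the package's confirmed behaviour.

```javascript
// Hypothetical stand-in for the package's `pickNonNegativeInt`, whose body
// is not shown in this diff. Returns the first candidate that is a
// non-negative integer, or undefined if none qualifies.
function firstNonNegativeInt(...candidates) {
  for (const c of candidates) {
    if (c === undefined || c === null) continue;
    if (typeof c === 'string' && c.trim() === '') continue;
    const n = typeof c === 'number' ? c : Number(c);
    if (Number.isInteger(n) && n >= 0) return n;
  }
  return undefined;
}

// Mirrors the shape of resolveThinkTimeConfig: explicit option first,
// then the env var, then the default (0 for the knobs, 30000 for the cap).
function sketchResolveThinkTime(explicit = {}, env = {}) {
  return {
    baseMs: firstNonNegativeInt(explicit.baseMs, env.DARIO_THINK_TIME_BASE_MS) ?? 0,
    perTokenMs: firstNonNegativeInt(explicit.perTokenMs, env.DARIO_THINK_TIME_PER_TOKEN_MS) ?? 0,
    jitterMs: firstNonNegativeInt(explicit.jitterMs, env.DARIO_THINK_TIME_JITTER_MS) ?? 0,
    maxMs: firstNonNegativeInt(explicit.maxMs, env.DARIO_THINK_TIME_MAX_MS) ?? 30000,
  };
}

// Env var only: picked up as a number.
const fromEnv = sketchResolveThinkTime({}, { DARIO_THINK_TIME_BASE_MS: '250' });

// Explicit option and env var both set: the explicit option wins.
const explicitWins = sketchResolveThinkTime({ baseMs: 100 }, { DARIO_THINK_TIME_BASE_MS: '250' });

// Unparseable env var: falls through to the default.
const badEnv = sketchResolveThinkTime({}, { DARIO_THINK_TIME_BASE_MS: 'abc' });

console.log(fromEnv, explicitWins, badEnv);
```

Using `??` rather than `||` for the default matters here: an explicit `0` is a valid non-negative integer and should not be replaced by the fallback.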
package/dist/proxy.d.ts
CHANGED
|
@@ -62,6 +62,12 @@ interface ProxyOptions {
|
|
|
62
62
|
strictTls?: boolean;
|
|
63
63
|
pacingMinMs?: number;
|
|
64
64
|
pacingJitterMs?: number;
|
|
65
|
+
thinkTimeBaseMs?: number;
|
|
66
|
+
thinkTimePerTokenMs?: number;
|
|
67
|
+
thinkTimeJitterMs?: number;
|
|
68
|
+
thinkTimeMaxMs?: number;
|
|
69
|
+
sessionStartMinMs?: number;
|
|
70
|
+
sessionStartJitterMs?: number;
|
|
65
71
|
drainOnClose?: boolean;
|
|
66
72
|
sessionIdleRotateMs?: number;
|
|
67
73
|
sessionRotateJitterMs?: number;
|
package/dist/proxy.js
CHANGED
|
@@ -717,14 +717,39 @@ export async function startProxy(opts = {}) {
|
|
|
717
717
|
// 500ms floor keeps the default behavior identical to v3.23; `--pace-min`
|
|
718
718
|
// and `--pace-jitter` let callers tune the distribution. Pure calc lives
|
|
719
719
|
// in src/pacing.ts so the edge cases are unit-tested without timers.
|
|
720
|
-
const { computePacingDelay, resolvePacingConfig } = await import('./pacing.js');
|
|
720
|
+
const { computePacingDelay, resolvePacingConfig, computeThinkTimeDelay, resolveThinkTimeConfig, computeSessionStartDelay, resolveSessionStartConfig, } = await import('./pacing.js');
|
|
721
721
|
let lastRequestTime = 0;
|
|
722
|
+
// Behavioral smoothing state: when the last response *completed* and
|
|
723
|
+
// how many output tokens it had. Used by computeThinkTimeDelay to
|
|
724
|
+
// model human read-time before the next request. Distinct from
|
|
725
|
+
// lastRequestTime (which tracks when the last request *started* and
|
|
726
|
+
// feeds the inter-request floor).
|
|
727
|
+
let lastResponseTime = 0;
|
|
728
|
+
let lastResponseTokens = 0;
|
|
722
729
|
const pacingCfg = resolvePacingConfig({
|
|
723
730
|
minGapMs: opts.pacingMinMs,
|
|
724
731
|
jitterMs: opts.pacingJitterMs,
|
|
725
732
|
});
|
|
733
|
+
const thinkTimeCfg = resolveThinkTimeConfig({
|
|
734
|
+
baseMs: opts.thinkTimeBaseMs,
|
|
735
|
+
perTokenMs: opts.thinkTimePerTokenMs,
|
|
736
|
+
jitterMs: opts.thinkTimeJitterMs,
|
|
737
|
+
maxMs: opts.thinkTimeMaxMs,
|
|
738
|
+
});
|
|
739
|
+
const sessionStartCfg = resolveSessionStartConfig({
|
|
740
|
+
minMs: opts.sessionStartMinMs,
|
|
741
|
+
jitterMs: opts.sessionStartJitterMs,
|
|
742
|
+
});
|
|
743
|
+
const thinkTimeEnabled = thinkTimeCfg.baseMs > 0 || thinkTimeCfg.perTokenMs > 0 || thinkTimeCfg.jitterMs > 0;
|
|
744
|
+
const sessionStartEnabled = sessionStartCfg.minMs > 0 || sessionStartCfg.jitterMs > 0;
|
|
726
745
|
if (verbose) {
|
|
727
746
|
console.log(`[dario] pacing: min=${pacingCfg.minGapMs}ms jitter=${pacingCfg.jitterMs}ms`);
|
|
747
|
+
if (thinkTimeEnabled) {
|
|
748
|
+
console.log(`[dario] think-time: base=${thinkTimeCfg.baseMs}ms perToken=${thinkTimeCfg.perTokenMs}ms jitter=${thinkTimeCfg.jitterMs}ms max=${thinkTimeCfg.maxMs}ms`);
|
|
749
|
+
}
|
|
750
|
+
if (sessionStartEnabled) {
|
|
751
|
+
console.log(`[dario] session-start: min=${sessionStartCfg.minMs}ms jitter=${sessionStartCfg.jitterMs}ms`);
|
|
752
|
+
}
|
|
728
753
|
}
|
|
729
754
|
// Stream-consumption replay (v3.25, direction #5). When on, a client
|
|
730
755
|
// disconnect no longer aborts the upstream fetch — we keep consuming
|
|
@@ -1287,10 +1312,29 @@ export async function startProxy(opts = {}) {
|
|
|
1287
1312
|
}
|
|
1288
1313
|
}
|
|
1289
1314
|
// Rate governor — prevent inhuman request cadence. See src/pacing.ts
|
|
1290
|
-
// for the pure delay
|
|
1291
|
-
|
|
1292
|
-
|
|
1293
|
-
|
|
1315
|
+
// for the pure delay calculators. Three layers, all defaults preserve
|
|
1316
|
+
// v3.37.20 behaviour:
|
|
1317
|
+
// 1. pacingDelay — floor on inter-request distance (always on,
|
|
1318
|
+
// 500ms default since v3.24).
|
|
1319
|
+
// 2. thinkTimeDelay — post-response read-time, proportional to
|
|
1320
|
+
// the previous response's output tokens.
|
|
1321
|
+
// Opt-in via --think-time-* flags.
|
|
1322
|
+
// 3. sessionStartDelay — one-shot startup latency on the first
|
|
1323
|
+
// request of a session (lastResponseTime===0).
|
|
1324
|
+
// Opt-in via --session-start-* flags.
|
|
1325
|
+
// We take the max because each layer enforces an independent floor
|
|
1326
|
+
// — waiting longer satisfies all of them, so we never need to sum.
|
|
1327
|
+
const nowForPacing = Date.now();
|
|
1328
|
+
const pacingDelay = computePacingDelay(nowForPacing, lastRequestTime, pacingCfg);
|
|
1329
|
+
const thinkDelay = thinkTimeEnabled
|
|
1330
|
+
? computeThinkTimeDelay(nowForPacing, lastResponseTime, lastResponseTokens, thinkTimeCfg)
|
|
1331
|
+
: 0;
|
|
1332
|
+
const sessionStartDelay = (sessionStartEnabled && lastResponseTime === 0 && lastRequestTime === 0)
|
|
1333
|
+
? computeSessionStartDelay(sessionStartCfg)
|
|
1334
|
+
: 0;
|
|
1335
|
+
const totalDelay = Math.max(pacingDelay, thinkDelay, sessionStartDelay);
|
|
1336
|
+
if (totalDelay > 0) {
|
|
1337
|
+
await new Promise(r => setTimeout(r, totalDelay));
|
|
1294
1338
|
}
|
|
1295
1339
|
lastRequestTime = Date.now();
|
|
1296
1340
|
// Session ID: pool mode uses the per-account identity.sessionId (stable
|
|
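The comment in the hunk above explains that the three delays are combined with `max` rather than summed, because each one is an independent floor measured from the same instant. A minimal sketch of that reasoning, with the hypothetical helper name `combineFloors` and made-up sample delays:

```javascript
// Each argument is a "wait at least this long from now" floor.
// Waiting for the largest one satisfies all of them at once.
function combineFloors(pacingDelay, thinkDelay, sessionStartDelay) {
  return Math.max(pacingDelay, thinkDelay, sessionStartDelay);
}

// Assumed sample values in ms: 500 pacing floor, 4000 think time, no session-start delay.
const combined = combineFloors(500, 4000, 0);

// Summing would wait 4500ms, which is 500ms longer than any single floor requires.
const summed = 500 + 4000 + 0;

console.log({ combined, summed });
```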
@@ -1817,6 +1861,15 @@ export async function startProxy(opts = {}) {
|
|
|
1817
1861
|
console.error('[dario] Stream error:', sanitizeError(err));
|
|
1818
1862
|
}
|
|
1819
1863
|
res.end();
|
|
1864
|
+
// Stamp the response-completion timestamp + token count so the
|
|
1865
|
+
// next request's think-time delay can model human read time.
|
|
1866
|
+
// Only on 2xx — error responses don't represent content the user
|
|
1867
|
+
// would read, and using their (often zero) output_tokens would
|
|
1868
|
+
// pin think time to baseMs+jitter on the next request needlessly.
|
|
1869
|
+
if (upstream.status >= 200 && upstream.status < 300) {
|
|
1870
|
+
lastResponseTime = Date.now();
|
|
1871
|
+
lastResponseTokens = streamOutputTokens;
|
|
1872
|
+
}
|
|
1820
1873
|
if (analytics && poolAccount) {
|
|
1821
1874
|
analytics.record({
|
|
1822
1875
|
timestamp: Date.now(), account: poolAccount.alias, model: requestModel,
|
|
@@ -1867,6 +1920,14 @@ export async function startProxy(opts = {}) {
|
|
|
1867
1920
|
bufferedUsage = Analytics.parseUsage(parsed);
|
|
1868
1921
|
}
|
|
1869
1922
|
catch { /* malformed body — log without usage */ }
|
|
1923
|
+
// Stamp response-completion state for the next request's think-time
|
|
1924
|
+
// delay. Same 2xx-only rule as the streaming path. Falls back to 0
|
|
1925
|
+
// tokens when the body wasn't JSON or had no usage block — base +
|
|
1926
|
+
// jitter still apply but the per-token component is 0.
|
|
1927
|
+
if (upstream.status >= 200 && upstream.status < 300) {
|
|
1928
|
+
lastResponseTime = Date.now();
|
|
1929
|
+
lastResponseTokens = bufferedUsage?.outputTokens ?? 0;
|
|
1930
|
+
}
|
|
1870
1931
|
if (analytics && poolAccount && bufferedUsage) {
|
|
1871
1932
|
try {
|
|
1872
1933
|
analytics.record({
|
|
@@ -1960,10 +2021,19 @@ export async function startProxy(opts = {}) {
|
|
|
1960
2021
|
const res = await fetch(`http://${displayHost}:${port}/health`);
|
|
1961
2022
|
const body = await res.json();
|
|
1962
2023
|
if (body && (body.status === 'ok' || body.status === 'degraded')) {
|
|
2024
|
+
// The /health endpoint's `oauth` field is a status enum
|
|
2025
|
+
// ('healthy' | 'expired' | 'broken' | 'none') — not a token
|
|
2026
|
+
// and not any kind of credential. CodeQL's clear-text-logging
|
|
2027
|
+
// heuristic flags any logged field whose key contains "oauth",
|
|
2028
|
+
// so we whitelist by allow-list rather than disable the rule.
|
|
2029
|
+
const allowedOauthStatuses = new Set(['healthy', 'expired', 'broken', 'none', 'degraded']);
|
|
2030
|
+
const rawOauth = typeof body.oauth === 'string' ? body.oauth : '';
|
|
2031
|
+
const oauthStatusLabel = allowedOauthStatuses.has(rawOauth) ? rawOauth : 'unknown';
|
|
2032
|
+
const requestsServed = typeof body.requests === 'number' ? body.requests : 0;
|
|
1963
2033
|
console.log('');
|
|
1964
2034
|
console.log(` dario — already running on http://${displayHost}:${port}`);
|
|
1965
2035
|
console.log('');
|
|
1966
|
-
console.log(` OAuth: ${
|
|
2036
|
+
console.log(` OAuth: ${oauthStatusLabel} | requests served: ${requestsServed}`);
|
|
1967
2037
|
console.log('');
|
|
1968
2038
|
console.log(' Usage:');
|
|
1969
2039
|
console.log(` ANTHROPIC_BASE_URL=http://${displayHost}:${port}`);
|
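The last hunk above replaces direct interpolation of `/health` fields with an allow-list check before logging. The sketch below isolates that pattern as a pure function so its behaviour on unexpected input is easy to see. The function name `summarizeHealth` and the sample bodies are assumptions for illustration; the set of allowed values is copied from the hunk.

```javascript
// Status labels accepted for logging, as listed in the hunk above.
const allowedOauthStatuses = new Set(['healthy', 'expired', 'broken', 'none', 'degraded']);

// Hypothetical helper: builds the status line from a parsed /health body,
// logging only values that are on the allow-list or of the expected type.
function summarizeHealth(body) {
  const rawOauth = body && typeof body.oauth === 'string' ? body.oauth : '';
  const oauthStatusLabel = allowedOauthStatuses.has(rawOauth) ? rawOauth : 'unknown';
  const requestsServed = body && typeof body.requests === 'number' ? body.requests : 0;
  return `OAuth: ${oauthStatusLabel} | requests served: ${requestsServed}`;
}

// A well-formed body is reported as-is.
const normalLine = summarizeHealth({ status: 'ok', oauth: 'healthy', requests: 3 });

// An unexpected string in `oauth` is never echoed to the log.
const unexpectedLine = summarizeHealth({ status: 'ok', oauth: 'not-a-status-value', requests: '7' });

console.log(normalLine);
console.log(unexpectedLine);
```

Mapping unknown values to a fixed label keeps the log line's content bounded to a known set, whatever the endpoint returns.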
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@askalf/dario",
|
|
3
|
-
"version": "3.
|
|
3
|
+
"version": "3.38.1",
|
|
4
4
|
"description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|