@askalf/dario 3.7.0 → 3.7.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +32 -0
- package/dist/cc-template.js +104 -50
- package/dist/cli.js +6 -4
- package/dist/openai-backend.js +6 -1
- package/package.json +1 -1
package/README.md
CHANGED
```diff
@@ -438,6 +438,38 @@ Dario auto-detects OAuth config from the installed Claude Code binary. When CC s
 **I'm hitting rate limits on the Claude backend. What do I do?**
 Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
 
+**I'm seeing `representative-claim: seven_day` in my rate-limit headers instead of `five_hour`. Am I being downgraded to API billing?**
+
+**No.** You're still on subscription billing. Both `five_hour` and `seven_day` are the same subscription billing mode — they're just two different accounting buckets inside it.
+
+Here's the full picture. Every Claude Max and Pro subscription has **two rolling usage windows**:
+
+- **5-hour window** — your short-term usage bucket. Refreshes on a rolling 5-hour schedule. It's the one you'll see most of the time if you use Claude casually.
+- **7-day window** — your longer-term usage bucket. Refreshes on a rolling 7-day schedule. It's intentionally larger than the 5-hour one so you can keep working past brief bursts of heavy usage.
+
+When Anthropic bills a request, it decides which bucket to charge it against based on your current utilization. That decision comes back to you in the `anthropic-ratelimit-unified-representative-claim` response header:
+
+| Claim | What it means |
+|---|---|
+| `five_hour` | You're well inside your 5-hour window; billing against the short-term bucket. |
+| `seven_day` | You've exhausted (or come close to exhausting) the 5-hour window for this rolling cycle, so Anthropic is now charging this request against the 7-day bucket. **Still subscription billing. Still your plan.** Not API pricing, not overage. |
+| `overage` | Both subscription windows are effectively exhausted. *This* is where per-token Extra Usage charges kick in — if you've enabled Extra Usage on the account. If you haven't, you get 429'd instead. |
+
+**Seeing `seven_day` is a healthy state.** It means your Max/Pro plan is doing exactly what it's supposed to do: letting you keep working past short bursts of heavy use by absorbing them into the larger 7-day bucket. Your subscription is not being "downgraded." You're not being charged API rates. Nothing has reclassified you to a worse billing tier. When your 5-hour window rolls forward enough, the claim on new requests will go back to `five_hour` on its own.
+
+**What about `overage`?** That's the state to watch. It means both windows are saturated and Anthropic is either billing you per-token under Extra Usage (if enabled) or refusing the request (if disabled). If you see this on a Claude Max account under normal use, it usually means (a) you're running a multi-agent workload that's genuinely outgrowing one subscription, or (b) Anthropic's session-level classifier has reclassified your long-running OAuth session as agentic load — see the next FAQ entry for the mechanism.
+
+**Checking where you stand.** You can inspect your current utilization three ways:
+1. **Claude Code's built-in command** — run `/usage` inside a `claude` session. Shows both windows as percentages with reset times.
+2. **The statusline** — see [Claude Code's statusline docs](https://code.claude.com/docs/en/statusline) for a per-prompt readout.
+3. **Dario's pool endpoint** — `curl http://localhost:3456/accounts` when running pool mode. The returned snapshot includes `util5h`, `util7d`, and `claim` per account.
+
+**Practical answer if `seven_day` is painful for your workload.** Add more Claude subscriptions to the pool. Each account has its own independent 5-hour and 7-day windows, and dario pool mode will route each request to the account with the most headroom (`1 - max(util5h, util7d)`). With 2-3 accounts, you almost never see the `seven_day` bucket get touched because the router steers traffic to whichever account still has `five_hour` headroom. `dario accounts add <alias>`.
+
+**Dario's test suite asserts `five_hour` — what if I see failures saying `got: seven_day`?** Some of dario's stealth-test assertions use `representative-claim == "five_hour"` as a shorthand for "is subscription billing classification working?" That assertion is correct for a fresh account but noisy for an account that's been developed against heavily — exactly the situation our own CI hits after an afternoon of test runs. If you're running the stealth suite against an account that's been busy recently and you see failures of the form `Billing claim is five_hour` / `got: seven_day`, that's a test infrastructure limitation, not a dario bug. The request was still billed against your subscription, which is what matters. These assertions will be tightened in a follow-up so they accept both buckets.
+
+Standalone writeup with more detail: [Discussion #32 — why you see `representative-claim: seven_day` and why it's not a downgrade](https://github.com/askalf/dario/discussions/32).
+
 **My multi-agent workload is getting reclassified to overage even though dario template-replays per request. Why?**
 Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates — token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the v3.4.3/v3.4.5 hardening that landed as a result. The practical answer at the dario layer is **pool mode** — distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-Account Pool Mode](#multi-account-pool-mode).
 
```
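The claim table added to the README maps each value of the `anthropic-ratelimit-unified-representative-claim` response header to a billing state. A minimal sketch of that mapping as code follows. `describeClaim` and the shape of its return value are hypothetical and not part of dario; the billing labels only restate the README's wording.

```javascript
// Hypothetical helper: restates the README's claim table as code.
// The billing labels follow the README's wording, not a documented API contract.
function describeClaim(claim) {
  switch (claim) {
    case 'five_hour':
      return { bucket: '5-hour window', billing: 'subscription' };
    case 'seven_day':
      // Still subscription billing, charged against the larger 7-day bucket.
      return { bucket: '7-day window', billing: 'subscription' };
    case 'overage':
      // Per-token Extra Usage if enabled on the account, otherwise a 429.
      return { bucket: 'overage', billing: 'extra-usage-or-429' };
    default:
      // Header missing (get() returns null) or an unrecognized value.
      return { bucket: 'unknown', billing: 'unknown' };
  }
}

// Header names are case-insensitive in the Fetch API Headers class.
const headers = new Headers({
  'Anthropic-Ratelimit-Unified-Representative-Claim': 'seven_day',
});
const info = describeClaim(
  headers.get('anthropic-ratelimit-unified-representative-claim'),
);
console.log(info); // { bucket: '7-day window', billing: 'subscription' }
```

Looking the header up through `Headers.get` avoids depending on how the upstream capitalizes the header name.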
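The README describes pool mode as routing each request to the account with the most headroom, computed as `1 - max(util5h, util7d)`. The sketch below applies that formula to an in-memory snapshot. Only `util5h`, `util7d`, and `claim` are named in the README; the `alias` field, utilization expressed as a fraction in [0, 1], and the tie-breaking rule are assumptions for illustration, not dario's actual router.

```javascript
// Hypothetical pool snapshot. util5h/util7d/claim are the fields the README
// lists for the /accounts endpoint; `alias` and the values are made up.
const snapshot = [
  { alias: 'work', util5h: 0.92, util7d: 0.40, claim: 'seven_day' },
  { alias: 'home', util5h: 0.35, util7d: 0.55, claim: 'five_hour' },
  { alias: 'spare', util5h: 0.10, util7d: 0.80, claim: 'five_hour' },
];

// Headroom as stated in the README: 1 - max(util5h, util7d).
function headroom(account) {
  return 1 - Math.max(account.util5h, account.util7d);
}

// Pick the account with the most headroom; on a tie the earlier account wins.
// Returns null for an empty pool.
function pickAccount(accounts) {
  let best = null;
  for (const account of accounts) {
    if (best === null || headroom(account) > headroom(best)) best = account;
  }
  return best;
}

console.log(pickAccount(snapshot).alias); // home
```

`spare` has the lowest 5-hour utilization but still loses to `home`: taking the max means whichever window is more saturated decides an account's headroom.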
package/dist/cc-template.js
CHANGED
```diff
@@ -503,23 +503,64 @@ export function createStreamingReverseMapper(toolMap) {
         return noop;
     const decoder = new TextDecoder();
     const encoder = new TextEncoder();
-
-    //
-    //
+    // We process on SSE event-group boundaries, not line boundaries.
+    // Events are separated by a blank line (two consecutive newlines);
+    // within an event group there may be multiple header lines like
+    // `event: content_block_delta` and `data: {...}`. The old code
+    // processed one line at a time, which meant swallowed deltas left
+    // orphan `event:` lines and synthetic delta+stop emissions joined
+    // two `data:` lines without a blank-line separator — which SSE
+    // parsers concatenate into one malformed multi-line event that
+    // fails JSON.parse downstream. v3.7.1 fixes both by processing
+    // whole event groups.
+    let groupBuffer = '';
+    // index → BufferedToolBlock for tool_use content blocks currently
+    // being held for end-of-block translation.
     const buffered = new Map();
-
-
-
-
-
-
-
+    /**
+     * Build a complete SSE event group string with an `event:` header
+     * and a `data:` line. Used when emitting rewritten or synthetic
+     * events so the wire format matches what upstream produces.
+     */
+    function buildEvent(type, payload) {
+        return `event: ${type}\ndata: ${JSON.stringify(payload)}`;
+    }
+    /**
+     * Process one complete SSE event group. Returns:
+     * - a string with one or more rewritten event groups separated
+     *   by "\n\n" (no trailing blank line — the caller adds that)
+     * - null to drop the event group entirely (swallow)
+     * - the original `eventText` to pass through unchanged
+     *
+     * An event group is the text between blank lines. It may contain
+     * lines like `event: <type>`, `data: <payload>`, `id:`, `retry:`
+     * in any order. We only look at the `data:` line (Anthropic never
+     * uses multi-line data payloads).
+     */
+    function processEventGroup(eventText) {
+        if (eventText === '')
+            return eventText;
+        // Find the data: line. Anthropic's SSE uses one data: per event.
+        const lines = eventText.split('\n');
+        let dataLineIdx = -1;
+        let dataText = '';
+        for (let i = 0; i < lines.length; i++) {
+            const line = lines[i];
+            if (line.startsWith('data:')) {
+                dataLineIdx = i;
+                dataText = line.slice(5).trim();
+                break;
+            }
+        }
+        if (dataLineIdx === -1 || dataText === '' || dataText === '[DONE]') {
+            return eventText;
+        }
         let event;
         try {
-            event = JSON.parse(
+            event = JSON.parse(dataText);
         }
         catch {
-            return
+            return eventText;
         }
         const type = event.type;
         if (type === 'content_block_start') {
```
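This hunk replaces line-at-a-time processing with whole SSE event groups. A standalone sketch of the two pieces it introduces: `buildEvent` mirrors the helper in the diff, while `readData` is a hypothetical helper that condenses the `data:` lookup in `processEventGroup` and returns `null` in the cases where that function passes the group through unchanged (no data line, empty data, `[DONE]`, or unparseable JSON).

```javascript
// Mirrors the diff: one event group = `event:` header line + `data:` line.
function buildEvent(type, payload) {
  return `event: ${type}\ndata: ${JSON.stringify(payload)}`;
}

// Hypothetical helper: parsed payload of the first `data:` line in one event
// group, or null where processEventGroup would pass the group through as-is.
function readData(eventText) {
  const line = eventText.split('\n').find((l) => l.startsWith('data:'));
  if (line === undefined) return null; // no data line at all
  const dataText = line.slice(5).trim();
  if (dataText === '' || dataText === '[DONE]') return null;
  try {
    return JSON.parse(dataText);
  } catch {
    return null; // malformed JSON
  }
}

const group = buildEvent('content_block_stop', { type: 'content_block_stop', index: 1 });
console.log(group);
// event: content_block_stop
// data: {"type":"content_block_stop","index":1}
console.log(readData(group).index); // 1
console.log(readData('event: ping')); // null
```

Because `buildEvent` returns a group without a trailing blank line, the caller stays responsible for the `\n\n` terminator, which is how the diff keeps framing in one place.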
```diff
@@ -529,55 +570,50 @@ export function createStreamingReverseMapper(toolMap) {
                 const entry = reverseMap.get(block.name);
                 if (entry && entry.mapping.translateBack && idx >= 0) {
                     // Stash the block so we can flush a translated version at
-                    // content_block_stop. Emit a rewritten start event
-                    // the client sees its own tool name immediately
-                    // associate subsequent events with the right call.
+                    // content_block_stop. Emit a rewritten start event now so
+                    // the client sees its own tool name immediately.
                     buffered.set(idx, {
                         ccName: block.name,
                         mapping: entry.mapping,
                         clientName: entry.clientName,
                         partial: '',
-                        startEventLines: [],
                     });
                     block.name = entry.clientName;
                     // Reset input to empty so the client doesn't see CC's empty
-                    // placeholder before
+                    // placeholder before the translated full input arrives.
                     block.input = {};
-                    return
+                    return buildEvent('content_block_start', event);
                 }
-                // Tool we don't translate — just rewrite the name in place
-                // (matches the old non-streaming-rewrite behavior for these).
+                // Tool we don't translate — just rewrite the name in place.
                 if (entry) {
                     block.name = entry.clientName;
-                    return
+                    return buildEvent('content_block_start', event);
                 }
             }
-            return
+            return eventText;
         }
         if (type === 'content_block_delta') {
             const idx = typeof event.index === 'number' ? event.index : -1;
             const buf = idx >= 0 ? buffered.get(idx) : undefined;
             if (!buf)
-                return
+                return eventText;
             const delta = event.delta;
             if (delta && delta.type === 'input_json_delta' && typeof delta.partial_json === 'string') {
                 buf.partial += delta.partial_json;
-                // Swallow
+                // Swallow the whole event group — including any `event:`
+                // header line the upstream emitted for it — because we'll
+                // emit a synthetic combined delta at content_block_stop.
                 return null;
             }
-
-            // but pass through if it does).
-            return line;
+            return eventText;
         }
         if (type === 'content_block_stop') {
             const idx = typeof event.index === 'number' ? event.index : -1;
             const buf = idx >= 0 ? buffered.get(idx) : undefined;
             if (!buf)
-                return
-            // Parse the accumulated input JSON, apply translateBack, and
-            // emit a single synthetic delta carrying the full translated
-            // input followed by the original stop event.
+                return eventText;
             let translatedInput = {};
+            let parseOk = true;
             try {
                 const parsedInput = JSON.parse(buf.partial || '{}');
                 translatedInput = buf.mapping.translateBack
```
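The content-block handling keeps a buffer-then-flush design for translated tool calls: `input_json_delta` fragments are swallowed while they accumulate, and a single combined delta is emitted just before `content_block_stop`. The sketch below models that flow on plain event objects instead of SSE text. It simplifies under stated assumptions: every `tool_use` block is buffered (the diff only buffers tools whose mapping defines `translateBack`), tool-name rewriting is left out, and `makeToolBuffer` plus the example mapping are hypothetical.

```javascript
// Hypothetical model of the buffer-then-flush flow for tool_use blocks,
// using plain event objects. `translateBack` stands in for a tool mapping.
function makeToolBuffer(translateBack) {
  const buffered = new Map(); // content block index -> accumulated partial_json

  return function handle(event) {
    if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
      buffered.set(event.index, '');
      return [event];
    }
    if (event.type === 'content_block_delta' && buffered.has(event.index)
        && event.delta?.type === 'input_json_delta'
        && typeof event.delta.partial_json === 'string') {
      buffered.set(event.index, buffered.get(event.index) + event.delta.partial_json);
      return []; // swallowed: the full input is emitted at content_block_stop
    }
    if (event.type === 'content_block_stop' && buffered.has(event.index)) {
      const partial = buffered.get(event.index);
      buffered.delete(event.index);
      let json;
      try {
        json = JSON.stringify(translateBack(JSON.parse(partial || '{}')));
      } catch {
        json = partial; // parse or translate failed: pass upstream's JSON through
      }
      const delta = {
        type: 'content_block_delta',
        index: event.index,
        delta: { type: 'input_json_delta', partial_json: json },
      };
      return [delta, event]; // synthetic delta first, then the original stop
    }
    return [event];
  };
}

// Usage with a made-up mapping that renames the input field `q` to `query`.
const handle = makeToolBuffer((input) => ({ query: input.q }));
const out = [
  { type: 'content_block_start', index: 0, content_block: { type: 'tool_use', name: 'search', input: {} } },
  { type: 'content_block_delta', index: 0, delta: { type: 'input_json_delta', partial_json: '{"q":' } },
  { type: 'content_block_delta', index: 0, delta: { type: 'input_json_delta', partial_json: '"hello"}' } },
  { type: 'content_block_stop', index: 0 },
].flatMap(handle);
console.log(out.length); // 3
console.log(out[1].delta.partial_json); // {"query":"hello"}
```

Holding the fragments until the stop event is what allows the whole input to be parsed and translated at once; the cost is that the client sees no incremental tool input for buffered blocks.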
```diff
@@ -585,54 +621,72 @@ export function createStreamingReverseMapper(toolMap) {
                     : parsedInput;
             }
             catch {
-
-
-
-
+                parseOk = false;
+            }
+            buffered.delete(idx);
+            if (!parseOk) {
+                // Fall back to passing the original partial through unchanged
+                // so the client at least sees whatever upstream actually sent.
+                // Emit as TWO separate SSE events with blank-line separators.
                 const passthroughDelta = {
                     type: 'content_block_delta',
                     index: idx,
                     delta: { type: 'input_json_delta', partial_json: buf.partial },
                 };
-                return
+                return (buildEvent('content_block_delta', passthroughDelta) +
+                    '\n\n' +
+                    buildEvent('content_block_stop', event));
             }
-            buffered.delete(idx);
             const synthDelta = {
                 type: 'content_block_delta',
                 index: idx,
                 delta: { type: 'input_json_delta', partial_json: JSON.stringify(translatedInput) },
             };
-
+            // Emit as TWO separate SSE events joined by a blank line so
+            // downstream parsers see them as distinct events. The outer
+            // processBuffer will append one more "\n\n" after the final
+            // event in this group, which is correct SSE framing.
+            return (buildEvent('content_block_delta', synthDelta) +
+                '\n\n' +
+                buildEvent('content_block_stop', event));
         }
-        return
+        return eventText;
     }
     function processBuffer(flush) {
-        // Split
-        //
-
+        // Split the accumulated buffer on "\n\n" (SSE event separator).
+        // Every complete part is a full event group; the last part is
+        // either empty (the trailing blank after a completed event) or
+        // a partial event that needs to wait for more bytes.
+        const parts = groupBuffer.split('\n\n');
         if (!flush) {
-
+            // Hold the last (potentially incomplete) part back.
+            groupBuffer = parts.pop() ?? '';
         }
         else {
-
+            groupBuffer = '';
         }
         const out = [];
-        for (const
-
+        for (const part of parts) {
+            if (part === '')
+                continue;
+            const processed = processEventGroup(part);
             if (processed !== null)
                 out.push(processed);
         }
-
+        // Each emitted event (or multi-event group) needs a trailing
+        // blank line so the SSE framing is correct. We join with "\n\n"
+        // and append "\n\n" so both the inter-group and final
+        // separators are present.
+        return out.length > 0 ? out.join('\n\n') + '\n\n' : '';
     }
     return {
         feed(chunk) {
-
+            groupBuffer += decoder.decode(chunk, { stream: true });
             const out = processBuffer(false);
             return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
         },
         end() {
-
-            lineBuffer += decoder.decode();
+            groupBuffer += decoder.decode();
             const out = processBuffer(true);
             return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
         },
```
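`processBuffer` together with `feed`/`end` forms the framing layer: it splits on `\n\n`, holds back the trailing fragment until more bytes arrive, transforms each complete group, and terminates every emitted group with a blank line. A self-contained sketch of the same loop follows. `makeFramer` and its `transform` callback are hypothetical names; the callback plays the role of `processEventGroup` (return a string to emit, or `null` to drop).

```javascript
// Hypothetical standalone framer modelled on processBuffer/feed/end in the diff.
function makeFramer(transform) {
  const decoder = new TextDecoder();
  let groupBuffer = '';

  function drain(flush) {
    const parts = groupBuffer.split('\n\n');
    // Unless flushing, the last part may be an incomplete event: keep it.
    groupBuffer = flush ? '' : (parts.pop() ?? '');
    const out = [];
    for (const part of parts) {
      if (part === '') continue;
      const processed = transform(part);
      if (processed !== null) out.push(processed);
    }
    // Blank line between emitted groups and after the last one.
    return out.length > 0 ? out.join('\n\n') + '\n\n' : '';
  }

  return {
    feed(chunk) {
      // stream: true holds back a multi-byte UTF-8 sequence split across chunks.
      groupBuffer += decoder.decode(chunk, { stream: true });
      return drain(false);
    },
    end() {
      groupBuffer += decoder.decode();
      return drain(true);
    },
  };
}

// Usage: event "b" arrives split across two chunks, and ping events are dropped.
const encoder = new TextEncoder();
const framer = makeFramer((group) => (group.includes('"ping"') ? null : group));
const first = framer.feed(encoder.encode('event: a\ndata: {"n":1}\n\nevent: b\ndata: {"n'));
const second = framer.feed(encoder.encode('":2}\n\nevent: ping\ndata: {"type":"ping"}\n\n'));
console.log(JSON.stringify(first)); // "event: a\ndata: {\"n\":1}\n\n"
console.log(JSON.stringify(second)); // "event: b\ndata: {\"n\":2}\n\n"
```

Like the diff, this splits only on `\n\n`, so it assumes LF line endings. The SSE event-stream format also permits CRLF and bare CR line endings, which this splitting would not recognize as event boundaries.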
package/dist/cli.js
CHANGED
```diff
@@ -274,10 +274,12 @@ async function backend() {
         console.log(` ${all.length} backend${all.length === 1 ? '' : 's'} configured`);
         console.log('');
         for (const b of all) {
-
-
-
-
+            // Never emit any substring of the key itself — even partial
+            // prefixes/suffixes (like "sk-proj-...a1b2") are leakage as
+            // far as CodeQL's js/clear-text-logging rule is concerned, and
+            // it's right: partial disclosure is still disclosure. Name and
+            // baseUrl together are enough to identify a backend.
+            console.log(` ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ***`);
         }
         console.log('');
         return;
```
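The cli.js change stops printing any part of the API key, on the reasoning that partial disclosure is still disclosure. One way to check CLI output against that rule is to test whether any fragment of the secret appears in it. The sketch below is a hypothetical test helper, not something dario ships; the key, the backend record, and the prefix-plus-suffix mask are invented to echo the `sk-proj-...a1b2` example in the diff's comment.

```javascript
// True if any substring of `secret` with at least `minLength` characters
// appears in `text`. Returns false when the secret is shorter than minLength.
function containsSecretFragment(text, secret, minLength = 4) {
  for (let i = 0; i + minLength <= secret.length; i++) {
    if (text.includes(secret.slice(i, i + minLength))) return true;
  }
  return false;
}

// Hypothetical backend record and key, for illustration only.
const backend = { name: 'openai', provider: 'openai', baseUrl: 'https://api.openai.com/v1' };
const key = 'sk-proj-XYZ0000a1b2';

// A prefix + suffix mask versus a row that prints only *** for the key.
const oldStyle = `${key.slice(0, 8)}...${key.slice(-4)}`; // "sk-proj-...a1b2"
const row = `  ${backend.name.padEnd(16)} ${backend.provider.padEnd(10)} ${backend.baseUrl.padEnd(40)} ***`;

console.log(containsSecretFragment(oldStyle, key)); // true
console.log(containsSecretFragment(row, key)); // false
```

Checking only windows of exactly `minLength` characters is sufficient, because any longer shared substring contains one of them. A smaller `minLength` catches shorter leaks at the cost of more false positives.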
package/dist/openai-backend.js
CHANGED
```diff
@@ -148,11 +148,16 @@ export async function forwardToOpenAI(req, res, body, backend, corsOrigin, secur
     }
     catch (err) {
         clearTimeout(timeout);
+        // Log error details server-side only. Responding with err.message
+        // exposes internal stack / path / module information (CodeQL
+        // js/stack-trace-exposure). The client gets a generic 502.
+        const detail = err instanceof Error ? err.message : String(err);
+        if (verbose)
+            console.error(`[dario] openai backend (${backend.name}) error: ${detail}`);
         if (!res.headersSent) {
             res.writeHead(502, { 'Content-Type': 'application/json', ...securityHeaders });
             res.end(JSON.stringify({
                 error: 'Upstream OpenAI-compat backend error',
-                message: err instanceof Error ? err.message : String(err),
                 backend: backend.name,
             }));
         }
```
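The openai-backend.js change splits error reporting in two: the error detail goes to the server log only when `verbose` is set, and the client receives a fixed 502 body with no `err.message`. A sketch of that split as a pure function, so it can be checked without an HTTP server. `toClientError` and its options object (including the injectable `log`) are hypothetical names.

```javascript
// Hypothetical sketch: separate what is logged from what the client sees.
function toClientError(err, backendName, { verbose = false, log = console.error } = {}) {
  // Full detail stays server-side, and only when verbose logging is on.
  const detail = err instanceof Error ? err.message : String(err);
  if (verbose) log(`[dario] openai backend (${backendName}) error: ${detail}`);
  // The client body carries no err.message, path, or stack information.
  return {
    status: 502,
    body: JSON.stringify({ error: 'Upstream OpenAI-compat backend error', backend: backendName }),
  };
}

const logs = [];
const err = new Error('connect ECONNREFUSED 127.0.0.1:4000');
const response = toClientError(err, 'local-litellm', { verbose: true, log: (m) => logs.push(m) });
console.log(response.body); // {"error":"Upstream OpenAI-compat backend error","backend":"local-litellm"}
console.log(logs.length); // 1
```

Injecting `log` keeps the function testable; in the diff itself the equivalent call is a direct `console.error` guarded by `verbose`.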
package/package.json
CHANGED
```diff
@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "3.7.
+  "version": "3.7.2",
   "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
   "type": "module",
   "bin": {
```