@askalf/dario 3.7.0 → 3.7.2

package/README.md CHANGED
@@ -438,6 +438,38 @@ Dario auto-detects OAuth config from the installed Claude Code binary. When CC s
  **I'm hitting rate limits on the Claude backend. What do I do?**
  Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
 
+ **I'm seeing `representative-claim: seven_day` in my rate-limit headers instead of `five_hour`. Am I being downgraded to API billing?**
+
+ **No.** You're still on subscription billing. Both `five_hour` and `seven_day` are the same subscription billing mode — they're just two different accounting buckets inside it.
+
+ Here's the full picture. Every Claude Max and Pro subscription has **two rolling usage windows**:
+
+ - **5-hour window** — your short-term usage bucket. Refreshes on a rolling 5-hour schedule. It's the one you'll see most of the time if you use Claude casually.
+ - **7-day window** — your longer-term usage bucket. Refreshes on a rolling 7-day schedule. It's intentionally larger than the 5-hour one so you can keep working past brief bursts of heavy usage.
+
+ When Anthropic bills a request, it decides which bucket to charge it against based on your current utilization. That decision comes back to you in the `anthropic-ratelimit-unified-representative-claim` response header:
+
+ | Claim | What it means |
+ |---|---|
+ | `five_hour` | You're well inside your 5-hour window; billing against the short-term bucket. |
+ | `seven_day` | You've exhausted (or come close to exhausting) the 5-hour window for this rolling cycle, so Anthropic is now charging this request against the 7-day bucket. **Still subscription billing. Still your plan.** Not API pricing, not overage. |
+ | `overage` | Both subscription windows are effectively exhausted. *This* is where per-token Extra Usage charges kick in — if you've enabled Extra Usage on the account. If you haven't, you get 429'd instead. |
+
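Reviewer's sketch, not part of the published README: the claim table above could be read programmatically along these lines. Only the header name and the three claim values come from the FAQ text; the `describeClaim` helper, its return shape, and the assumption that headers arrive as a plain object with lowercase keys are hypothetical.

```javascript
// Header name as quoted in the FAQ entry.
const CLAIM_HEADER = 'anthropic-ratelimit-unified-representative-claim';

// Hypothetical helper: maps the claim value to a coarse billing reading.
// `headers` is assumed to be a plain object with lowercase keys.
function describeClaim(headers) {
    const claim = headers[CLAIM_HEADER];
    switch (claim) {
        case 'five_hour':
        case 'seven_day':
            // Both values are subscription billing, just different buckets.
            return { claim, subscription: true };
        case 'overage':
            // Per the FAQ: Extra Usage charges if enabled, otherwise a 429.
            return { claim, subscription: false };
        default:
            // Header missing or a value the FAQ does not describe.
            return { claim: claim ?? null, subscription: null };
    }
}

console.log(describeClaim({ [CLAIM_HEADER]: 'seven_day' }));
```

Treating `five_hour` and `seven_day` identically is the point of the FAQ entry: a check for "is this request on subscription billing?" should accept either value.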
+ **Seeing `seven_day` is a healthy state.** It means your Max/Pro plan is doing exactly what it's supposed to do: letting you keep working past short bursts of heavy use by absorbing them into the larger 7-day bucket. Your subscription is not being "downgraded." You're not being charged API rates. Nothing has reclassified you to a worse billing tier. When your 5-hour window rolls forward enough, the claim on new requests will go back to `five_hour` on its own.
+
+ **What about `overage`?** That's the state to watch. It means both windows are saturated and Anthropic is either billing you per-token under Extra Usage (if enabled) or refusing the request (if disabled). If you see this on a Claude Max account under normal use, it usually means (a) you're running a multi-agent workload that's genuinely outgrowing one subscription, or (b) Anthropic's session-level classifier has reclassified your long-running OAuth session as agentic load — see the next FAQ entry for the mechanism.
+
+ **Checking where you stand.** You can inspect your current utilization three ways:
+ 1. **Claude Code's built-in command** — run `/usage` inside a `claude` session. Shows both windows as percentages with reset times.
+ 2. **The statusline** — see [Claude Code's statusline docs](https://code.claude.com/docs/en/statusline) for a per-prompt readout.
+ 3. **Dario's pool endpoint** — `curl http://localhost:3456/accounts` when running pool mode. The returned snapshot includes `util5h`, `util7d`, and `claim` per account.
+
+ **Practical answer if `seven_day` is painful for your workload.** Add more Claude subscriptions to the pool. Each account has its own independent 5-hour and 7-day windows, and dario pool mode will route each request to the account with the most headroom (`1 - max(util5h, util7d)`). With 2-3 accounts, you almost never see the `seven_day` bucket get touched because the router steers traffic to whichever account still has `five_hour` headroom. `dario accounts add <alias>`.
+
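Reviewer's sketch, not part of the published README: the headroom rule quoted above, `1 - max(util5h, util7d)`, can be illustrated as follows. The `util5h` and `util7d` field names come from the FAQ; the `alias` field, the sample numbers, and the `headroom` and `pickAccount` helpers are assumptions for illustration and are not dario's actual router code.

```javascript
// Headroom as defined in the FAQ: the tighter of the two windows decides.
function headroom(account) {
    return 1 - Math.max(account.util5h, account.util7d);
}

// Hypothetical selector: returns the account with the most headroom,
// or null when the pool is empty.
function pickAccount(accounts) {
    if (accounts.length === 0) return null;
    return accounts.reduce((best, a) => (headroom(a) > headroom(best) ? a : best));
}

// Made-up snapshot; utilization values are fractions in [0, 1].
const snapshot = [
    { alias: 'work', util5h: 0.92, util7d: 0.40 },
    { alias: 'personal', util5h: 0.10, util7d: 0.55 },
    { alias: 'spare', util5h: 0.30, util7d: 0.20 },
];

// Headroom is roughly 0.08, 0.45 and 0.70, so "spare" is chosen.
console.log(pickAccount(snapshot).alias);
```

Taking the maximum of the two utilizations means an account that looks idle in its 5-hour window but is nearly spent in its 7-day window is still ranked low.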
+ **Dario's test suite asserts `five_hour` — what if I see failures saying `got: seven_day`?** Some of dario's stealth-test assertions use `representative-claim == "five_hour"` as a shorthand for "is subscription billing classification working?" That assertion is correct for a fresh account but noisy for an account that's been developed against heavily — exactly the situation our own CI hits after an afternoon of test runs. If you're running the stealth suite against an account that's been busy recently and you see failures of the form `Billing claim is five_hour` / `got: seven_day`, that's a test infrastructure limitation, not a dario bug. The request was still billed against your subscription, which is what matters. These assertions will be tightened in a follow-up so they accept both buckets.
+
+ Standalone writeup with more detail: [Discussion #32 — why you see `representative-claim: seven_day` and why it's not a downgrade](https://github.com/askalf/dario/discussions/32).
+
  **My multi-agent workload is getting reclassified to overage even though dario template-replays per request. Why?**
  Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates — token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the v3.4.3/v3.4.5 hardening that landed as a result. The practical answer at the dario layer is **pool mode** — distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-Account Pool Mode](#multi-account-pool-mode).
 
@@ -503,23 +503,64 @@ export function createStreamingReverseMapper(toolMap) {
  return noop;
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
- let lineBuffer = '';
- // index BufferedToolBlock for content blocks currently being held
- // for end-of-block translation.
+ // We process on SSE event-group boundaries, not line boundaries.
+ // Events are separated by a blank line (two consecutive newlines);
+ // within an event group there may be multiple header lines like
+ // `event: content_block_delta` and `data: {...}`. The old code
+ // processed one line at a time, which meant swallowed deltas left
+ // orphan `event:` lines and synthetic delta+stop emissions joined
+ // two `data:` lines without a blank-line separator — which SSE
+ // parsers concatenate into one malformed multi-line event that
+ // fails JSON.parse downstream. v3.7.1 fixes both by processing
+ // whole event groups.
+ let groupBuffer = '';
+ // index → BufferedToolBlock for tool_use content blocks currently
+ // being held for end-of-block translation.
  const buffered = new Map();
- function processSseLine(line) {
- // Pass through empty lines and event: prefix lines unchanged.
- if (!line.startsWith('data:'))
- return line;
- const jsonText = line.slice(5).trim();
- if (jsonText === '[DONE]' || jsonText === '')
- return line;
+ /**
+ * Build a complete SSE event group string with an `event:` header
+ * and a `data:` line. Used when emitting rewritten or synthetic
+ * events so the wire format matches what upstream produces.
+ */
+ function buildEvent(type, payload) {
+ return `event: ${type}\ndata: ${JSON.stringify(payload)}`;
+ }
+ /**
+ * Process one complete SSE event group. Returns:
+ * - a string with one or more rewritten event groups separated
+ * by "\n\n" (no trailing blank line — the caller adds that)
+ * - null to drop the event group entirely (swallow)
+ * - the original `eventText` to pass through unchanged
+ *
+ * An event group is the text between blank lines. It may contain
+ * lines like `event: <type>`, `data: <payload>`, `id:`, `retry:`
+ * in any order. We only look at the `data:` line (Anthropic never
+ * uses multi-line data payloads).
+ */
+ function processEventGroup(eventText) {
+ if (eventText === '')
+ return eventText;
+ // Find the data: line. Anthropic's SSE uses one data: per event.
+ const lines = eventText.split('\n');
+ let dataLineIdx = -1;
+ let dataText = '';
+ for (let i = 0; i < lines.length; i++) {
+ const line = lines[i];
+ if (line.startsWith('data:')) {
+ dataLineIdx = i;
+ dataText = line.slice(5).trim();
+ break;
+ }
+ }
+ if (dataLineIdx === -1 || dataText === '' || dataText === '[DONE]') {
+ return eventText;
+ }
  let event;
  try {
- event = JSON.parse(jsonText);
+ event = JSON.parse(dataText);
  }
  catch {
- return line;
+ return eventText;
  }
  const type = event.type;
  if (type === 'content_block_start') {
@@ -529,55 +570,50 @@ export function createStreamingReverseMapper(toolMap) {
  const entry = reverseMap.get(block.name);
  if (entry && entry.mapping.translateBack && idx >= 0) {
  // Stash the block so we can flush a translated version at
- // content_block_stop. Emit a rewritten start event NOW so
- // the client sees its own tool name immediately and can
- // associate subsequent events with the right call.
+ // content_block_stop. Emit a rewritten start event now so
+ // the client sees its own tool name immediately.
  buffered.set(idx, {
  ccName: block.name,
  mapping: entry.mapping,
  clientName: entry.clientName,
  partial: '',
- startEventLines: [],
  });
  block.name = entry.clientName;
  // Reset input to empty so the client doesn't see CC's empty
- // placeholder before we emit the translated full input.
+ // placeholder before the translated full input arrives.
  block.input = {};
- return `data: ${JSON.stringify(event)}`;
+ return buildEvent('content_block_start', event);
  }
- // Tool we don't translate — just rewrite the name in place
- // (matches the old non-streaming-rewrite behavior for these).
+ // Tool we don't translate — just rewrite the name in place.
  if (entry) {
  block.name = entry.clientName;
- return `data: ${JSON.stringify(event)}`;
+ return buildEvent('content_block_start', event);
  }
  }
- return line;
+ return eventText;
  }
  if (type === 'content_block_delta') {
  const idx = typeof event.index === 'number' ? event.index : -1;
  const buf = idx >= 0 ? buffered.get(idx) : undefined;
  if (!buf)
- return line;
+ return eventText;
  const delta = event.delta;
  if (delta && delta.type === 'input_json_delta' && typeof delta.partial_json === 'string') {
  buf.partial += delta.partial_json;
- // Swallow this delta we'll emit a synthetic combined one at stop.
+ // Swallow the whole event group including any `event:`
+ // header line the upstream emitted for it — because we'll
+ // emit a synthetic combined delta at content_block_stop.
  return null;
  }
- // Some other delta type for a tool_use block (shouldn't happen,
- // but pass through if it does).
- return line;
+ return eventText;
  }
  if (type === 'content_block_stop') {
  const idx = typeof event.index === 'number' ? event.index : -1;
  const buf = idx >= 0 ? buffered.get(idx) : undefined;
  if (!buf)
- return line;
- // Parse the accumulated input JSON, apply translateBack, and
- // emit a single synthetic delta carrying the full translated
- // input followed by the original stop event.
+ return eventText;
  let translatedInput = {};
+ let parseOk = true;
  try {
  const parsedInput = JSON.parse(buf.partial || '{}');
  translatedInput = buf.mapping.translateBack
@@ -585,54 +621,72 @@ export function createStreamingReverseMapper(toolMap) {
  : parsedInput;
  }
  catch {
- // If we couldn't assemble valid JSON from the deltas, fall
- // back to passing the original partial through unchanged so
- // the client at least sees what Anthropic sent.
- buffered.delete(idx);
+ parseOk = false;
+ }
+ buffered.delete(idx);
+ if (!parseOk) {
+ // Fall back to passing the original partial through unchanged
+ // so the client at least sees whatever upstream actually sent.
+ // Emit as TWO separate SSE events with blank-line separators.
  const passthroughDelta = {
  type: 'content_block_delta',
  index: idx,
  delta: { type: 'input_json_delta', partial_json: buf.partial },
  };
- return `data: ${JSON.stringify(passthroughDelta)}\ndata: ${JSON.stringify(event)}`;
+ return (buildEvent('content_block_delta', passthroughDelta) +
+ '\n\n' +
+ buildEvent('content_block_stop', event));
  }
- buffered.delete(idx);
  const synthDelta = {
  type: 'content_block_delta',
  index: idx,
  delta: { type: 'input_json_delta', partial_json: JSON.stringify(translatedInput) },
  };
- return `data: ${JSON.stringify(synthDelta)}\ndata: ${JSON.stringify(event)}`;
+ // Emit as TWO separate SSE events joined by a blank line so
+ // downstream parsers see them as distinct events. The outer
+ // processBuffer will append one more "\n\n" after the final
+ // event in this group, which is correct SSE framing.
+ return (buildEvent('content_block_delta', synthDelta) +
+ '\n\n' +
+ buildEvent('content_block_stop', event));
  }
- return line;
+ return eventText;
  }
  function processBuffer(flush) {
- // Split on newlines; keep the trailing partial line in the buffer
- // unless we're flushing at end-of-stream.
- const lines = lineBuffer.split('\n');
+ // Split the accumulated buffer on "\n\n" (SSE event separator).
+ // Every complete part is a full event group; the last part is
+ // either empty (the trailing blank after a completed event) or
+ // a partial event that needs to wait for more bytes.
+ const parts = groupBuffer.split('\n\n');
  if (!flush) {
- lineBuffer = lines.pop() ?? '';
+ // Hold the last (potentially incomplete) part back.
+ groupBuffer = parts.pop() ?? '';
  }
  else {
- lineBuffer = '';
+ groupBuffer = '';
  }
  const out = [];
- for (const line of lines) {
- const processed = processSseLine(line);
+ for (const part of parts) {
+ if (part === '')
+ continue;
+ const processed = processEventGroup(part);
  if (processed !== null)
  out.push(processed);
  }
- return out.length > 0 ? out.join('\n') + '\n' : '';
+ // Each emitted event (or multi-event group) needs a trailing
+ // blank line so the SSE framing is correct. We join with "\n\n"
+ // and append "\n\n" so both the inter-group and final
+ // separators are present.
+ return out.length > 0 ? out.join('\n\n') + '\n\n' : '';
  }
  return {
  feed(chunk) {
- lineBuffer += decoder.decode(chunk, { stream: true });
+ groupBuffer += decoder.decode(chunk, { stream: true });
  const out = processBuffer(false);
  return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
  },
  end() {
- // Flush any decoder state and remaining buffer.
- lineBuffer += decoder.decode();
+ groupBuffer += decoder.decode();
  const out = processBuffer(true);
  return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
  },
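Reviewer's sketch, not part of the package: the framing problem this hunk fixes can be reproduced in isolation. The snippet below is a simplified stand-in for the event-group buffering in the hunk above, not dario's code; `createGroupSplitter` and `dataOf` are hypothetical names, and `dataOf` approximates the SSE rule that multiple `data:` lines within one event are joined with a newline.

```javascript
// Buffers decoded text and yields only complete event groups, where a
// group is the text between blank lines ("\n\n").
function createGroupSplitter() {
    let buffer = '';
    return {
        feed(text) {
            buffer += text;
            const parts = buffer.split('\n\n');
            // The last part is either '' or an incomplete group; hold it back.
            buffer = parts.pop() ?? '';
            return parts.filter((p) => p !== '');
        },
        end() {
            const rest = buffer;
            buffer = '';
            return rest === '' ? [] : [rest];
        },
    };
}

// Approximates how an SSE client assembles one event's data field:
// every data: line in the group is collected and joined with "\n".
function dataOf(group) {
    return group
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).trim())
        .join('\n');
}

// Old framing: two data: lines with no blank line between them form ONE
// event whose data is two JSON documents, which JSON.parse rejects.
const oldFraming = 'data: {"type":"content_block_delta"}\ndata: {"type":"content_block_stop"}';
let oldParses = true;
try {
    JSON.parse(dataOf(oldFraming));
} catch {
    oldParses = false;
}

// New framing: a blank line between the events yields two parseable groups.
const splitter = createGroupSplitter();
const groups = splitter.feed(
    'event: content_block_delta\ndata: {"type":"content_block_delta"}\n\n' +
    'event: content_block_stop\ndata: {"type":"content_block_stop"}\n\n'
);
console.log(oldParses, groups.map((g) => JSON.parse(dataOf(g)).type));
```

Holding back the final split part is what lets a chunk boundary fall in the middle of an event without emitting a truncated group.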
package/dist/cli.js CHANGED
@@ -274,10 +274,12 @@ async function backend() {
  console.log(` ${all.length} backend${all.length === 1 ? '' : 's'} configured`);
  console.log('');
  for (const b of all) {
- const redacted = b.apiKey.length > 8
- ? `${b.apiKey.slice(0, 3)}...${b.apiKey.slice(-4)}`
- : '***';
- console.log(` ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ${redacted}`);
+ // Never emit any substring of the key itself — even partial
+ // prefixes/suffixes (like "sk-proj-...a1b2") are leakage as
+ // far as CodeQL's js/clear-text-logging rule is concerned, and
+ // it's right: partial disclosure is still disclosure. Name and
+ // baseUrl together are enough to identify a backend.
+ console.log(` ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ***`);
  }
  console.log('');
  return;
@@ -148,11 +148,16 @@ export async function forwardToOpenAI(req, res, body, backend, corsOrigin, secur
  }
  catch (err) {
  clearTimeout(timeout);
+ // Log error details server-side only. Responding with err.message
+ // exposes internal stack / path / module information (CodeQL
+ // js/stack-trace-exposure). The client gets a generic 502.
+ const detail = err instanceof Error ? err.message : String(err);
+ if (verbose)
+ console.error(`[dario] openai backend (${backend.name}) error: ${detail}`);
  if (!res.headersSent) {
  res.writeHead(502, { 'Content-Type': 'application/json', ...securityHeaders });
  res.end(JSON.stringify({
  error: 'Upstream OpenAI-compat backend error',
- message: err instanceof Error ? err.message : String(err),
  backend: backend.name,
  }));
  }
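Reviewer's sketch, not part of the package: the pattern in this hunk (detail goes to the server log, the client gets a generic body) can be isolated into a pure function for testing. `toClientError` and its injectable `log` parameter are hypothetical, and the `verbose` guard from the hunk is omitted here for brevity; the error string and the 502 status mirror the diff.

```javascript
// Hypothetical helper: logs the error detail server-side and returns a
// response that carries no part of the error message.
function toClientError(err, backendName, log = console.error) {
    const detail = err instanceof Error ? err.message : String(err);
    log(`[dario] openai backend (${backendName}) error: ${detail}`);
    return {
        status: 502,
        body: JSON.stringify({
            error: 'Upstream OpenAI-compat backend error',
            backend: backendName,
        }),
    };
}

// Usage with a captured log so the example prints nothing sensitive.
const logged = [];
const response = toClientError(
    new Error('connect ECONNREFUSED /internal/path'),
    'local-llm',
    (line) => logged.push(line)
);
console.log(response.status, response.body);
```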
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@askalf/dario",
- "version": "3.7.0",
+ "version": "3.7.2",
  "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
  "type": "module",
  "bin": {