npm - @askalf/dario - Versions diffs - 3.7.0 → 3.7.2 - Mend

@askalf/dario 3.7.0 → 3.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -438,6 +438,38 @@ Dario auto-detects OAuth config from the installed Claude Code binary. When CC s
 **I'm hitting rate limits on the Claude backend. What do I do?**
 Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
+**I'm seeing `representative-claim: seven_day` in my rate-limit headers instead of `five_hour`. Am I being downgraded to API billing?**
+**No.** You're still on subscription billing. Both `five_hour` and `seven_day` are the same subscription billing mode — they're just two different accounting buckets inside it.
+Here's the full picture. Every Claude Max and Pro subscription has **two rolling usage windows**:
+- **5-hour window** — your short-term usage bucket. Refreshes on a rolling 5-hour schedule. It's the one you'll see most of the time if you use Claude casually.
+- **7-day window** — your longer-term usage bucket. Refreshes on a rolling 7-day schedule. It's intentionally larger than the 5-hour one so you can keep working past brief bursts of heavy usage.
+When Anthropic bills a request, it decides which bucket to charge it against based on your current utilization. That decision comes back to you in the `anthropic-ratelimit-unified-representative-claim` response header:
+| Claim | What it means |
+|---|---|
+| `five_hour` | You're well inside your 5-hour window; billing against the short-term bucket. |
+| `seven_day` | You've exhausted (or come close to exhausting) the 5-hour window for this rolling cycle, so Anthropic is now charging this request against the 7-day bucket. **Still subscription billing. Still your plan.** Not API pricing, not overage. |
+| `overage` | Both subscription windows are effectively exhausted. *This* is where per-token Extra Usage charges kick in — if you've enabled Extra Usage on the account. If you haven't, you get 429'd instead. |
+**Seeing `seven_day` is a healthy state.** It means your Max/Pro plan is doing exactly what it's supposed to do: letting you keep working past short bursts of heavy use by absorbing them into the larger 7-day bucket. Your subscription is not being "downgraded." You're not being charged API rates. Nothing has reclassified you to a worse billing tier. When your 5-hour window rolls forward enough, the claim on new requests will go back to `five_hour` on its own.
+**What about `overage`?** That's the state to watch. It means both windows are saturated and Anthropic is either billing you per-token under Extra Usage (if enabled) or refusing the request (if disabled). If you see this on a Claude Max account under normal use, it usually means (a) you're running a multi-agent workload that's genuinely outgrowing one subscription, or (b) Anthropic's session-level classifier has reclassified your long-running OAuth session as agentic load — see the next FAQ entry for the mechanism.
+**Checking where you stand.** You can inspect your current utilization three ways:
+1. **Claude Code's built-in command** — run `/usage` inside a `claude` session. Shows both windows as percentages with reset times.
+2. **The statusline** — see [Claude Code's statusline docs](https://code.claude.com/docs/en/statusline) for a per-prompt readout.
+3. **Dario's pool endpoint** — `curl http://localhost:3456/accounts` when running pool mode. The returned snapshot includes `util5h`, `util7d`, and `claim` per account.
+**Practical answer if `seven_day` is painful for your workload.** Add more Claude subscriptions to the pool. Each account has its own independent 5-hour and 7-day windows, and dario pool mode will route each request to the account with the most headroom (`1 - max(util5h, util7d)`). With 2-3 accounts, you almost never see the `seven_day` bucket get touched because the router steers traffic to whichever account still has `five_hour` headroom. `dario accounts add <alias>`.
+**Dario's test suite asserts `five_hour` — what if I see failures saying `got: seven_day`?** Some of dario's stealth-test assertions use `representative-claim == "five_hour"` as a shorthand for "is subscription billing classification working?" That assertion is correct for a fresh account but noisy for an account that's been developed against heavily — exactly the situation our own CI hits after an afternoon of test runs. If you're running the stealth suite against an account that's been busy recently and you see failures of the form `Billing claim is five_hour` / `got: seven_day`, that's a test infrastructure limitation, not a dario bug. The request was still billed against your subscription, which is what matters. These assertions will be tightened in a follow-up so they accept both buckets.
+Standalone writeup with more detail: [Discussion #32 — why you see `representative-claim: seven_day` and why it's not a downgrade](https://github.com/askalf/dario/discussions/32).
 **My multi-agent workload is getting reclassified to overage even though dario template-replays per request. Why?**
 Reclassification at high agent volume is not a per-request problem. Anthropic's classifier operates on cumulative per-OAuth-session aggregates — token throughput, conversation depth, streaming duration, inter-arrival timing, thinking-block volume. Dario's Claude backend can make each individual request indistinguishable from Claude Code and still hit this wall on a long-running agent session, because the wall isn't at the request level. Thorough diagnostic work on this was contributed by [@belangertrading](https://github.com/belangertrading) in [#23](https://github.com/askalf/dario/issues/23), including the v3.4.3/v3.4.5 hardening that landed as a result. The practical answer at the dario layer is **pool mode** — distribute load across multiple subscriptions so no single account accumulates enough signal to trip anything. See [Multi-Account Pool Mode](#multi-account-pool-mode).
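
The claim table and the pool-routing rule in the README hunk above can be sketched in a few lines. This is an illustration only, not dario's router: `describeClaim`, `headroom`, `pickAccount`, and the `alias` field are hypothetical names, and the sketch assumes `util5h` and `util7d` are fractions between 0 and 1. Only the field names `util5h`, `util7d`, `claim` and the formula `1 - max(util5h, util7d)` come from the README text.

```javascript
// Hypothetical helpers; only util5h, util7d, claim and the headroom
// formula are taken from the README text in the diff above.

// Map a representative-claim header value to the billing mode it implies.
function describeClaim(claim) {
  switch (claim) {
    case 'five_hour':
    case 'seven_day':
      return 'subscription'; // both buckets are subscription billing
    case 'overage':
      return 'overage'; // Extra Usage charges, or a 429 if disabled
    default:
      return 'unknown';
  }
}

// Headroom as stated in the FAQ: 1 - max(util5h, util7d).
// Assumes utilization is expressed as a fraction in [0, 1].
function headroom(account) {
  return 1 - Math.max(account.util5h, account.util7d);
}

// Return the account with the most headroom, or undefined for an empty pool.
function pickAccount(accounts) {
  let best;
  for (const account of accounts) {
    if (best === undefined || headroom(account) > headroom(best)) {
      best = account;
    }
  }
  return best;
}

const pool = [
  { alias: 'work', util5h: 0.92, util7d: 0.4, claim: 'seven_day' },
  { alias: 'personal', util5h: 0.15, util7d: 0.3, claim: 'five_hour' },
];

console.log(pickAccount(pool).alias); // personal
console.log(describeClaim('seven_day')); // subscription
```

With these sample numbers the `work` account has very little headroom left because its 5-hour window is nearly full, so the request goes to `personal`.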

package/dist/cc-template.js CHANGED

@@ -503,23 +503,64 @@ export function createStreamingReverseMapper(toolMap) {
         return noop;
     const decoder = new TextDecoder();
     const encoder = new TextEncoder();
-    let lineBuffer = '';
-    // index → BufferedToolBlock for content blocks currently being held
-    // for end-of-block translation.
+    // We process on SSE event-group boundaries, not line boundaries.
+    // Events are separated by a blank line (two consecutive newlines);
+    // within an event group there may be multiple header lines like
+    // `event: content_block_delta` and `data: {...}`. The old code
+    // processed one line at a time, which meant swallowed deltas left
+    // orphan `event:` lines and synthetic delta+stop emissions joined
+    // two `data:` lines without a blank-line separator — which SSE
+    // parsers concatenate into one malformed multi-line event that
+    // fails JSON.parse downstream. v3.7.1 fixes both by processing
+    // whole event groups.
+    let groupBuffer = '';
+    // index → BufferedToolBlock for tool_use content blocks currently
+    // being held for end-of-block translation.
     const buffered = new Map();
-    function processSseLine(line) {
-        // Pass through empty lines and event: prefix lines unchanged.
-        if (!line.startsWith('data:'))
-            return line;
-        const jsonText = line.slice(5).trim();
-        if (jsonText === '[DONE]' || jsonText === '')
-            return line;
+    /**
+     * Build a complete SSE event group string with an `event:` header
+     * and a `data:` line. Used when emitting rewritten or synthetic
+     * events so the wire format matches what upstream produces.
+     */
+    function buildEvent(type, payload) {
+        return `event: ${type}\ndata: ${JSON.stringify(payload)}`;
+    }
+    /**
+     * Process one complete SSE event group. Returns:
+     *   - a string with one or more rewritten event groups separated
+     *     by "\n\n" (no trailing blank line — the caller adds that)
+     *   - null to drop the event group entirely (swallow)
+     *   - the original `eventText` to pass through unchanged
+     *
+     * An event group is the text between blank lines. It may contain
+     * lines like `event: <type>`, `data: <payload>`, `id:`, `retry:`
+     * in any order. We only look at the `data:` line (Anthropic never
+     * uses multi-line data payloads).
+     */
+    function processEventGroup(eventText) {
+        if (eventText === '')
+            return eventText;
+        // Find the data: line. Anthropic's SSE uses one data: per event.
+        const lines = eventText.split('\n');
+        let dataLineIdx = -1;
+        let dataText = '';
+        for (let i = 0; i < lines.length; i++) {
+            const line = lines[i];
+            if (line.startsWith('data:')) {
+                dataLineIdx = i;
+                dataText = line.slice(5).trim();
+                break;
+            }
+        }
+        if (dataLineIdx === -1 || dataText === '' || dataText === '[DONE]') {
+            return eventText;
+        }
         let event;
         try {
-            event = JSON.parse(jsonText);
+            event = JSON.parse(dataText);
         }
         catch {
-            return line;
+            return eventText;
         }
         const type = event.type;
         if (type === 'content_block_start') {
@@ -529,55 +570,50 @@ export function createStreamingReverseMapper(toolMap) {
                 const entry = reverseMap.get(block.name);
                 if (entry && entry.mapping.translateBack && idx >= 0) {
                     // Stash the block so we can flush a translated version at
-                    // content_block_stop. Emit a rewritten start event NOW so
-                    // the client sees its own tool name immediately and can
-                    // associate subsequent events with the right call.
+                    // content_block_stop. Emit a rewritten start event now so
+                    // the client sees its own tool name immediately.
                     buffered.set(idx, {
                         ccName: block.name,
                         mapping: entry.mapping,
                         clientName: entry.clientName,
                         partial: '',
-                        startEventLines: [],
                     });
                     block.name = entry.clientName;
                     // Reset input to empty so the client doesn't see CC's empty
-                    // placeholder before we emit the translated full input.
+                    // placeholder before the translated full input arrives.
                     block.input = {};
-                    return `data: ${JSON.stringify(event)}`;
+                    return buildEvent('content_block_start', event);
                 }
-                // Tool we don't translate — just rewrite the name in place
-                // (matches the old non-streaming-rewrite behavior for these).
+                // Tool we don't translate — just rewrite the name in place.
                 if (entry) {
                     block.name = entry.clientName;
-                    return `data: ${JSON.stringify(event)}`;
+                    return buildEvent('content_block_start', event);
                 }
             }
-            return line;
+            return eventText;
         }
         if (type === 'content_block_delta') {
             const idx = typeof event.index === 'number' ? event.index : -1;
             const buf = idx >= 0 ? buffered.get(idx) : undefined;
             if (!buf)
-                return line;
+                return eventText;
             const delta = event.delta;
             if (delta && delta.type === 'input_json_delta' && typeof delta.partial_json === 'string') {
                 buf.partial += delta.partial_json;
-                // Swallow this delta — we'll emit a synthetic combined one at stop.
+                // Swallow the whole event group — including any `event:`
+                // header line the upstream emitted for it — because we'll
+                // emit a synthetic combined delta at content_block_stop.
                 return null;
             }
-            // Some other delta type for a tool_use block (shouldn't happen,
-            // but pass through if it does).
-            return line;
+            return eventText;
         }
         if (type === 'content_block_stop') {
             const idx = typeof event.index === 'number' ? event.index : -1;
             const buf = idx >= 0 ? buffered.get(idx) : undefined;
             if (!buf)
-                return line;
-            // Parse the accumulated input JSON, apply translateBack, and
-            // emit a single synthetic delta carrying the full translated
-            // input followed by the original stop event.
+                return eventText;
             let translatedInput = {};
+            let parseOk = true;
             try {
                 const parsedInput = JSON.parse(buf.partial || '{}');
                 translatedInput = buf.mapping.translateBack
@@ -585,54 +621,72 @@ export function createStreamingReverseMapper(toolMap) {
                     : parsedInput;
             }
             catch {
-                // If we couldn't assemble valid JSON from the deltas, fall
-                // back to passing the original partial through unchanged so
-                // the client at least sees what Anthropic sent.
-                buffered.delete(idx);
+                parseOk = false;
+            }
+            buffered.delete(idx);
+            if (!parseOk) {
+                // Fall back to passing the original partial through unchanged
+                // so the client at least sees whatever upstream actually sent.
+                // Emit as TWO separate SSE events with blank-line separators.
                 const passthroughDelta = {
                     type: 'content_block_delta',
                     index: idx,
                     delta: { type: 'input_json_delta', partial_json: buf.partial },
                 };
-                return `data: ${JSON.stringify(passthroughDelta)}\ndata: ${JSON.stringify(event)}`;
+                return (buildEvent('content_block_delta', passthroughDelta) +
+                    '\n\n' +
+                    buildEvent('content_block_stop', event));
             }
-            buffered.delete(idx);
             const synthDelta = {
                 type: 'content_block_delta',
                 index: idx,
                 delta: { type: 'input_json_delta', partial_json: JSON.stringify(translatedInput) },
             };
-            return `data: ${JSON.stringify(synthDelta)}\ndata: ${JSON.stringify(event)}`;
+            // Emit as TWO separate SSE events joined by a blank line so
+            // downstream parsers see them as distinct events. The outer
+            // processBuffer will append one more "\n\n" after the final
+            // event in this group, which is correct SSE framing.
+            return (buildEvent('content_block_delta', synthDelta) +
+                '\n\n' +
+                buildEvent('content_block_stop', event));
         }
-        return line;
+        return eventText;
     }
     function processBuffer(flush) {
-        // Split on newlines; keep the trailing partial line in the buffer
-        // unless we're flushing at end-of-stream.
-        const lines = lineBuffer.split('\n');
+        // Split the accumulated buffer on "\n\n" (SSE event separator).
+        // Every complete part is a full event group; the last part is
+        // either empty (the trailing blank after a completed event) or
+        // a partial event that needs to wait for more bytes.
+        const parts = groupBuffer.split('\n\n');
         if (!flush) {
-            lineBuffer = lines.pop() ?? '';
+            // Hold the last (potentially incomplete) part back.
+            groupBuffer = parts.pop() ?? '';
         }
         else {
-            lineBuffer = '';
+            groupBuffer = '';
         }
         const out = [];
-        for (const line of lines) {
-            const processed = processSseLine(line);
+        for (const part of parts) {
+            if (part === '')
+                continue;
+            const processed = processEventGroup(part);
             if (processed !== null)
                 out.push(processed);
         }
-        return out.length > 0 ? out.join('\n') + '\n' : '';
+        // Each emitted event (or multi-event group) needs a trailing
+        // blank line so the SSE framing is correct. We join with "\n\n"
+        // and append "\n\n" so both the inter-group and final
+        // separators are present.
+        return out.length > 0 ? out.join('\n\n') + '\n\n' : '';
     }
     return {
         feed(chunk) {
-            lineBuffer += decoder.decode(chunk, { stream: true });
+            groupBuffer += decoder.decode(chunk, { stream: true });
             const out = processBuffer(false);
             return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
         },
         end() {
-            // Flush any decoder state and remaining buffer.
-            lineBuffer += decoder.decode();
+            groupBuffer += decoder.decode();
             const out = processBuffer(true);
             return out.length > 0 ? encoder.encode(out) : new Uint8Array(0);
         },
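
The event-group buffering that this hunk introduces can be shown in isolation. The sketch below is a simplified stand-in, not the package's code: `createGroupSplitter` is a hypothetical name, it works on strings rather than `Uint8Array` chunks, and it does no tool-name rewriting. Like the diff, it assumes LF-only line endings, so `"\n\n"` is the only event separator it recognizes.

```javascript
// Hypothetical, simplified version of the buffering strategy in the diff:
// accumulate text, split on "\n\n", and hold back the trailing partial part.
function createGroupSplitter() {
  let buffer = '';
  return {
    // Append text and return every complete event group seen so far.
    feed(text) {
      buffer += text;
      const parts = buffer.split('\n\n');
      // The last part is either '' (the stream ended on a separator)
      // or an incomplete event that must wait for more bytes.
      buffer = parts.pop() ?? '';
      return parts.filter((part) => part !== '');
    },
    // At end of stream, return whatever is left as a final group.
    end() {
      const rest = buffer;
      buffer = '';
      return rest === '' ? [] : [rest];
    },
  };
}

const splitter = createGroupSplitter();

// The second event is cut in the middle of its `event:` header line.
const first = splitter.feed(
  'event: ping\ndata: {"type":"ping"}\n\nevent: content_block_st'
);
const second = splitter.feed(
  'op\ndata: {"type":"content_block_stop","index":0}\n\n'
);

console.log(first.length); // 1
console.log(second.length); // 1
console.log(second[0].startsWith('event: content_block_stop')); // true
```

Because each returned group keeps its `event:` and `data:` lines together, a caller that drops a group drops both lines at once. That is the orphaned-`event:` problem the comment in the hunk describes.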

package/dist/cli.js CHANGED

@@ -274,10 +274,12 @@ async function backend() {
         console.log(`  ${all.length} backend${all.length === 1 ? '' : 's'} configured`);
         console.log('');
         for (const b of all) {
-            const redacted = b.apiKey.length > 8
-                ? `${b.apiKey.slice(0, 3)}...${b.apiKey.slice(-4)}`
-                : '***';
-            console.log(`    ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ${redacted}`);
+            // Never emit any substring of the key itself — even partial
+            // prefixes/suffixes (like "sk-proj-...a1b2") are leakage as
+            // far as CodeQL's js/clear-text-logging rule is concerned, and
+            // it's right: partial disclosure is still disclosure. Name and
+            // baseUrl together are enough to identify a backend.
+            console.log(`    ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ***`);
         }
         console.log('');
         return;
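
A small sketch of the effect of this change: the row is built from the same fields as before, but the key is replaced by a fixed mask. `formatBackendRow` is a hypothetical helper written for illustration, and the backend object holds made-up sample values.

```javascript
// Hypothetical helper mirroring the new console.log line in the diff.
function formatBackendRow(b) {
  // The key is replaced by a fixed mask; no prefix or suffix of it is shown.
  return `    ${b.name.padEnd(16)} ${b.provider.padEnd(10)} ${b.baseUrl.padEnd(40)} ***`;
}

// Made-up sample backend; the apiKey value is not a real credential.
const sampleBackend = {
  name: 'local',
  provider: 'openai',
  baseUrl: 'https://example.invalid/v1',
  apiKey: 'example-key-not-real-9f3c',
};

const row = formatBackendRow(sampleBackend);
console.log(row.endsWith('***')); // true
console.log(row.includes('9f3c')); // false
```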

package/dist/openai-backend.js CHANGED

@@ -148,11 +148,16 @@ export async function forwardToOpenAI(req, res, body, backend, corsOrigin, secur
     }
     catch (err) {
         clearTimeout(timeout);
+        // Log error details server-side only. Responding with err.message
+        // exposes internal stack / path / module information (CodeQL
+        // js/stack-trace-exposure). The client gets a generic 502.
+        const detail = err instanceof Error ? err.message : String(err);
+        if (verbose)
+            console.error(`[dario] openai backend (${backend.name}) error: ${detail}`);
         if (!res.headersSent) {
             res.writeHead(502, { 'Content-Type': 'application/json', ...securityHeaders });
             res.end(JSON.stringify({
                 error: 'Upstream OpenAI-compat backend error',
-                message: err instanceof Error ? err.message : String(err),
                 backend: backend.name,
             }));
         }
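
The split between what is logged and what is returned can be sketched as a pure function. `buildUpstreamError` is a hypothetical helper, not part of the package. It reuses the log format and response fields from the hunk above and leaves out the HTTP plumbing (`res.writeHead`, `securityHeaders`, the `verbose` flag).

```javascript
// Hypothetical helper separating server-side detail from the client body.
function buildUpstreamError(err, backendName) {
  const detail = err instanceof Error ? err.message : String(err);
  return {
    // Intended for the server log only.
    logLine: `[dario] openai backend (${backendName}) error: ${detail}`,
    // Sent to the client with a 502: err.message is deliberately absent.
    body: JSON.stringify({
      error: 'Upstream OpenAI-compat backend error',
      backend: backendName,
    }),
  };
}

const result = buildUpstreamError(
  new Error('connect ECONNREFUSED 127.0.0.1:11434'),
  'local'
);

console.log(result.logLine.includes('ECONNREFUSED')); // true
console.log(result.body.includes('ECONNREFUSED')); // false
```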

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@askalf/dario",
-  "version": "3.7.0",
+  "version": "3.7.2",
   "description": "A local LLM router. One endpoint, every provider — Claude subscriptions, OpenAI, OpenRouter, Groq, local LiteLLM, any OpenAI-compat endpoint — your tools don't need to change.",
   "type": "module",
   "bin": {