@askalf/dario 3.12.0 → 3.13.1

package/README.md CHANGED
@@ -243,6 +243,7 @@ curl http://localhost:3456/analytics # per-account / per-model stats, burn ra
243
243
  | `dario backend list` | List configured OpenAI-compat backends |
244
244
  | `dario backend add <name> --key=<key> [--base-url=<url>]` | Add an OpenAI-compat backend |
245
245
  | `dario backend remove <name>` | Remove an OpenAI-compat backend |
246
+ | `dario shim -- <cmd> [args...]` | **Experimental (v3.12.0).** Run a child process with an in-process fetch patch that rewrites its outbound Anthropic requests — no HTTP proxy involved. See [Experimental: Shim mode](#experimental-shim-mode). |
246
247
  | `dario help` | Full command reference |
247
248
 
248
249
  ### Proxy options
@@ -454,6 +455,39 @@ curl http://localhost:3456/health
454
455
 
455
456
  ---
456
457
 
458
+ ## Experimental: Shim mode
459
+
460
+ *New in v3.12.0. Opt-in. The default path is still the HTTP proxy — shim mode is a second transport, not a replacement.*
461
+
462
+ Shim mode runs a child process with an **in-process `globalThis.fetch` patch** that rewrites the child's outbound requests to `api.anthropic.com/v1/messages` exactly the way the proxy would, then sends them directly from the child to Anthropic. No localhost HTTP hop. No port to bind. No `ANTHROPIC_BASE_URL` to set.
463
+
464
+ ```bash
465
+ dario shim -- claude --print "hello"
466
+ dario shim -v -- claude --print "hello" # verbose
467
+ ```
468
+
469
+ Under the hood: `dario shim` spawns the child with `NODE_OPTIONS=--require <dario-runtime.cjs>` and a unix socket / named pipe for telemetry. The runtime patches `globalThis.fetch` only for Anthropic messages requests, applies the same template replay the proxy does (system prompt, tools, user agent, beta flags), and relays per-request events back to the parent so analytics still work. Every other fetch call in the child is untouched and failsafe-passes through on any internal error.
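Reviewer sketch (not part of the package): the paragraph above describes a fetch patch that only touches Anthropic messages requests and passes everything else through, including on internal errors. A minimal illustration of that shape follows. `isAnthropicMessages`, `patchFetch`, and the `rewrite` callback are hypothetical names standing in for the real runtime and its template replay; the actual `src/shim/runtime.cjs` may be structured quite differently.

```javascript
// Hypothetical helper: true only for requests to api.anthropic.com/v1/messages.
function isAnthropicMessages(input) {
  try {
    // Accepts a string, a URL object, or a Request-like object with a .url field.
    const raw = typeof input === 'string' ? input : (input && input.url) || String(input);
    const url = new URL(raw);
    return url.hostname === 'api.anthropic.com' && url.pathname === '/v1/messages';
  } catch {
    return false; // unparsable URL: never intercept
  }
}

// Wrap a fetch implementation. `rewrite` is an assumed stand-in for template replay.
function patchFetch(originalFetch, rewrite) {
  return async function patchedFetch(input, init) {
    // Every non-Anthropic call is forwarded untouched.
    if (!isAnthropicMessages(input)) return originalFetch(input, init);
    let nextInit = init;
    try {
      nextInit = rewrite(init);
    } catch {
      nextInit = init; // failsafe: on any internal error, send the request unmodified
    }
    return originalFetch(input, nextInit);
  };
}

// Demo with a fake fetch so nothing touches the network.
const calls = [];
const fakeFetch = (input, init) => {
  calls.push([input, init]);
  return Promise.resolve('ok');
};
const patched = patchFetch(fakeFetch, (init) => ({ ...init, headers: { 'x-demo': '1' } }));
patched('https://example.com/', { method: 'GET' });
patched('https://api.anthropic.com/v1/messages', { method: 'POST' });
```

In a real preload script the last step would presumably be an assignment such as `globalThis.fetch = patchFetch(globalThis.fetch, rewrite)`; the fake fetch here only exists to make the behaviour observable.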
470
+
471
+ **When to use shim mode**
472
+ - Running a single CC instance on a locked-down machine where binding a local port is inconvenient or forbidden.
473
+ - Wrapping one-off scripts (`dario shim -- node my-agent.js`) without setting up environment variables.
474
+ - Debugging a specific child process in isolation — verbose logs are scoped to that process.
475
+
476
+ **When to stay on the proxy** (which is still the default)
477
+ - Multi-client routing. The proxy serves every tool on the machine through one endpoint; the shim wraps one child at a time.
478
+ - Multi-account pool mode. Pooling across subscriptions needs a shared OAuth pool the proxy owns — a shim patch inside one child can't see the pool state.
479
+ - Anything that isn't a Node / Bun child. The shim relies on `NODE_OPTIONS`, so non-JS runtimes (Python SDK, a Go CLI) still need the proxy.
480
+
481
+ Limitations at v3.12.0:
482
+ - Bun child detection is partial — known-good with `claude --print` on Node.
483
+ - No `--replace claude` global wrapper yet; you call `dario shim -- claude ...` explicitly.
484
+ - Per-request token cost recording in shim mode is still being wired into analytics.
485
+ - Windows named-pipe CI coverage is incomplete.
486
+
487
+ The shim runtime lives at `src/shim/runtime.cjs` (hand-written CJS so `--require` can load it) and the host orchestrator at `src/shim/host.ts`. ~180 lines total. See the [v3.12.0 release notes](https://github.com/askalf/dario/releases/tag/v3.12.0) for the full design writeup.
488
+
489
+ ---
490
+
457
491
  ## Endpoints
458
492
 
459
493
  | Path | Description |
@@ -464,7 +498,7 @@ curl http://localhost:3456/health
464
498
  | `GET /health` | Proxy health + OAuth status + request count |
465
499
  | `GET /status` | Detailed Claude OAuth token status |
466
500
  | `GET /accounts` | Pool snapshot (pool mode only) |
467
- | `GET /analytics` | Per-account / per-model stats, burn rate, exhaustion predictions (pool mode only) |
501
+ | `GET /analytics` | Per-account / per-model stats, burn rate, exhaustion predictions. **v3.11.1+:** every request carries a `billingBucket` field (`five_hour` / `seven_day` / `overage` / `unknown`) so you can see, at a glance, which bucket each request billed against. (pool mode only) |
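Reviewer sketch (not part of the package): one way a client could summarize the `billingBucket` field mentioned in the `/analytics` row. The response shape is not shown in this diff, so the input here is an assumed plain array of request records that each carry a `billingBucket` string; `tallyBuckets` is a hypothetical helper.

```javascript
// Bucket names as listed in the README row above.
const BUCKETS = ['five_hour', 'seven_day', 'overage', 'unknown'];

// Count requests per billing bucket. Unrecognized or missing values
// are folded into 'unknown' rather than creating new keys.
function tallyBuckets(requests) {
  const counts = Object.fromEntries(BUCKETS.map((b) => [b, 0]));
  for (const r of requests) {
    const bucket = BUCKETS.includes(r.billingBucket) ? r.billingBucket : 'unknown';
    counts[bucket] += 1;
  }
  return counts;
}

// Assumed sample records, for illustration only.
const sample = [
  { billingBucket: 'five_hour' },
  { billingBucket: 'five_hour' },
  { billingBucket: 'overage' },
  {},
];
const bucketCounts = tallyBuckets(sample);
```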
468
502
 
469
503
  ---
470
504
 
@@ -523,6 +557,9 @@ This establishes a session baseline. Without priming, brand-new accounts occasio
523
557
  **What happens when Anthropic rotates the OAuth config?**
524
558
  Dario auto-detects OAuth config from the installed Claude Code binary. When CC ships a new version with rotated values, dario picks them up on the next run. Cache at `~/.dario/cc-oauth-cache-v3.json`, keyed by the CC binary fingerprint. Falls back to hardcoded CC 2.1.104 prod values if CC isn't installed.
525
559
 
560
+ **What happens when Anthropic changes the CC request template?**
561
+ *New in v3.11.0.* Dario extracts the live request template from your installed Claude Code binary on startup — the system prompt slices, tool schemas, user-agent, beta flags — and uses those to replay requests instead of a version pinned into dario itself. When CC ships a new version with a tweaked template, the next `dario proxy` run picks it up automatically. Fallback: the hand-curated `src/cc-template-data.json` bundled with the release, so dario still works even if the installed CC binary is a version the extractor doesn't know how to read. See `src/live-fingerprint.ts`.
562
+
526
563
  **I'm hitting rate limits on the Claude backend. What do I do?**
527
564
  Claude subscriptions have rolling 5-hour and 7-day usage windows. Check utilization with Claude Code's `/usage` command or the [statusline](https://code.claude.com/docs/en/statusline). For multi-agent workloads, add more accounts and let pool mode distribute the load: `dario accounts add <alias>`.
528
565
 
@@ -586,19 +623,22 @@ Longer-form writing on how dario works and why it works that way:
586
623
 
587
624
  ## Contributing
588
625
 
589
- PRs welcome. The codebase is ~2,500 lines of TypeScript across 10 files:
626
+ PRs welcome. The codebase is small TypeScript: roughly 3,000 lines across about 14 files:
590
627
 
591
628
  | File | Purpose |
592
629
  |---|---|
593
630
  | `src/proxy.ts` | HTTP proxy server, request handler, rate governor, Claude backend dispatch |
594
631
  | `src/cc-template.ts` | CC request template engine, tool mapping, orchestration & framework scrubbing |
595
- | `src/cc-template-data.json` | CC request template data (25 tools, 25KB system prompt) |
632
+ | `src/cc-template-data.json` | Bundled fallback CC request template (used when live-fingerprint extraction isn't possible) |
596
633
  | `src/cc-oauth-detect.ts` | OAuth config auto-detection from the installed CC binary |
634
+ | `src/live-fingerprint.ts` | **v3.11.0.** Live extraction of the CC request template (system prompt, tools, user-agent, beta flags) from the installed Claude Code binary |
597
635
  | `src/oauth.ts` | Single-account token storage, PKCE flow, auto-refresh |
598
636
  | `src/accounts.ts` | Multi-account credential storage and independent OAuth lifecycle |
599
637
  | `src/pool.ts` | Account pool, headroom-aware routing, failover target selection |
600
- | `src/analytics.ts` | Rolling request history, per-account / per-model stats, burn-rate |
638
+ | `src/analytics.ts` | Rolling request history, per-account / per-model stats, burn-rate, billing bucket classification |
601
639
  | `src/openai-backend.ts` | OpenAI-compat backend credential storage and request forwarder |
640
+ | `src/shim/runtime.cjs` | **v3.12.0.** Hand-written CJS payload loaded into child processes via `NODE_OPTIONS=--require`; patches `globalThis.fetch` for Anthropic messages requests only |
641
+ | `src/shim/host.ts` | **v3.12.0.** Parent-side orchestrator for `dario shim` — spawns the child, owns the telemetry socket / named pipe, feeds analytics |
602
642
  | `src/cli.ts` | CLI entry point, command routing, Bun auto-relaunch |
603
643
  | `src/index.ts` | Library exports |
604
644
 
@@ -154,28 +154,28 @@ const TOOL_MAP = {
154
154
  },
155
155
  read: {
156
156
  ccTool: 'Read',
157
- translateArgs: (a) => ({ file_path: a.path || a.file_path || '' }),
158
- translateBack: (a) => ({ path: a.file_path ?? '' }),
157
+ translateArgs: (a) => ({ file_path: a.filePath || a.path || a.file_path || '' }),
158
+ translateBack: (a) => ({ path: a.file_path ?? '', filePath: a.file_path ?? '' }),
159
159
  },
160
160
  read_file: {
161
161
  ccTool: 'Read',
162
- translateArgs: (a) => ({ file_path: a.path || a.file_path || '' }),
163
- translateBack: (a) => ({ path: a.file_path ?? '' }),
162
+ translateArgs: (a) => ({ file_path: a.filePath || a.path || a.file_path || '' }),
163
+ translateBack: (a) => ({ path: a.file_path ?? '', filePath: a.file_path ?? '' }),
164
164
  },
165
165
  write: {
166
166
  ccTool: 'Write',
167
- translateArgs: (a) => ({ file_path: a.path || a.file_path || '', content: a.content || '' }),
168
- translateBack: (a) => ({ path: a.file_path ?? '', content: a.content ?? '' }),
167
+ translateArgs: (a) => ({ file_path: a.filePath || a.path || a.file_path || '', content: a.content || '' }),
168
+ translateBack: (a) => ({ path: a.file_path ?? '', filePath: a.file_path ?? '', content: a.content ?? '' }),
169
169
  },
170
170
  write_file: {
171
171
  ccTool: 'Write',
172
- translateArgs: (a) => ({ file_path: a.path || a.file_path || '', content: a.content || '' }),
173
- translateBack: (a) => ({ path: a.file_path ?? '', content: a.content ?? '' }),
172
+ translateArgs: (a) => ({ file_path: a.filePath || a.path || a.file_path || '', content: a.content || '' }),
173
+ translateBack: (a) => ({ path: a.file_path ?? '', filePath: a.file_path ?? '', content: a.content ?? '' }),
174
174
  },
175
175
  edit: {
176
176
  ccTool: 'Edit',
177
- translateArgs: (a) => ({ file_path: a.path || a.file_path || '', old_string: a.old || a.old_string || '', new_string: a.new || a.new_string || '' }),
178
- translateBack: (a) => ({ path: a.file_path ?? '', old: a.old_string ?? '', new: a.new_string ?? '' }),
177
+ translateArgs: (a) => ({ file_path: a.filePath || a.path || a.file_path || '', old_string: a.oldString || a.old || a.old_string || '', new_string: a.newString || a.new || a.new_string || '' }),
178
+ translateBack: (a) => ({ path: a.file_path ?? '', filePath: a.file_path ?? '', old: a.old_string ?? '', oldString: a.old_string ?? '', new: a.new_string ?? '', newString: a.new_string ?? '' }),
179
179
  },
180
180
  edit_file: { ccTool: 'Edit' },
181
181
  glob: { ccTool: 'Glob' },
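Reviewer sketch: the hunk above adds camelCase aliases (`filePath`, `oldString`, `newString`) to the forward translation and echoes both spellings on the way back. The snippet below copies the new `edit` mapping from the diff and runs one round trip; the sample argument values are made up for illustration.

```javascript
// The `edit` entry as it appears on the "+" side of the hunk above.
const editMapping = {
  ccTool: 'Edit',
  translateArgs: (a) => ({
    file_path: a.filePath || a.path || a.file_path || '',
    old_string: a.oldString || a.old || a.old_string || '',
    new_string: a.newString || a.new || a.new_string || '',
  }),
  translateBack: (a) => ({
    path: a.file_path ?? '',
    filePath: a.file_path ?? '',
    old: a.old_string ?? '',
    oldString: a.old_string ?? '',
    new: a.new_string ?? '',
    newString: a.new_string ?? '',
  }),
};

// A client that uses camelCase argument names (illustrative values).
const forward = editMapping.translateArgs({ filePath: 'src/a.ts', oldString: 'foo', newString: 'bar' });

// The reverse path returns both spellings, so either client style can read the result.
const back = editMapping.translateBack(forward);
```

Before this change the camelCase call would have produced empty strings for all three fields, because only `path` / `file_path`, `old` / `old_string`, and `new` / `new_string` were consulted.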
@@ -390,6 +390,14 @@ export function buildCCRequest(clientBody, billingTag, cache1h, identity, opts =
390
390
  default: return a;
391
391
  }
392
392
  },
393
+ // Unmapped-fallback mappings must always lose the reverse-lookup
394
+ // collision to any legitimate mapping that targets the same CC tool.
395
+ // Otherwise a client that declares both an unmapped tool (e.g.
396
+ // OpenClaw's `image`) round-robin'd onto Glob AND a real `glob` /
397
+ // `find_files` / `list_files` mapping can have the reverse path
398
+ // route real Glob tool_use blocks back to `image`, which then fails
399
+ // its own input validation ("image required"). dario#37, Glob half.
400
+ reverseScore: 0,
393
401
  });
394
402
  }
395
403
  }
@@ -554,12 +562,19 @@ function buildReverseLookup(toolMap) {
554
562
  }
555
563
  }
556
564
  // Score-based collision resolution in the non-identity pass.
565
+ // reverseScore: 0 means "never claim a reverse slot at all" — used for
566
+ // unmapped-fallback mappings whose forward path exists for round-robin
567
+ // distribution but whose reverse path would corrupt real CC tool calls
568
+ // (e.g. routing a real Glob tool_use back to an unmapped `image` client
569
+ // tool with the wrong input shape, dario#37 Glob half).
557
570
  const scoreOf = (m) => m.reverseScore ?? 10;
558
571
  for (const [clientName, mapping] of toolMap) {
559
572
  if (clientName.toLowerCase() === mapping.ccTool.toLowerCase())
560
573
  continue;
561
574
  if (identityClaimed.has(mapping.ccTool))
562
575
  continue;
576
+ if (scoreOf(mapping) === 0)
577
+ continue;
563
578
  const existing = reverseMap.get(mapping.ccTool);
564
579
  if (!existing || scoreOf(mapping) > scoreOf(existing.mapping)) {
565
580
  reverseMap.set(mapping.ccTool, { clientName, mapping });
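Reviewer sketch: a reduced version of the non-identity pass shown above, to make the effect of `reverseScore: 0` concrete. `buildReverseLookupSketch` is a hypothetical name, and the identity pass (the `identityClaimed` set) is left out because its code is not visible in this hunk.

```javascript
// Simplified non-identity pass: highest reverseScore wins a CC tool's
// reverse slot; a score of 0 never claims a slot at all.
function buildReverseLookupSketch(toolMap) {
  const reverseMap = new Map();
  const scoreOf = (m) => m.reverseScore ?? 10; // same default as the diff
  for (const [clientName, mapping] of toolMap) {
    if (clientName.toLowerCase() === mapping.ccTool.toLowerCase()) continue;
    if (scoreOf(mapping) === 0) continue;
    const existing = reverseMap.get(mapping.ccTool);
    if (!existing || scoreOf(mapping) > scoreOf(existing.mapping)) {
      reverseMap.set(mapping.ccTool, { clientName, mapping });
    }
  }
  return reverseMap;
}

// An unmapped `image` tool round-robin'd onto Glob, declared before a real mapping.
const mixed = buildReverseLookupSketch(new Map([
  ['image', { ccTool: 'Glob', reverseScore: 0 }],
  ['find_files', { ccTool: 'Glob' }],
]));
const globOwner = mixed.get('Glob').clientName;

// With only the fallback mapping present, Glob gets no reverse entry.
const fallbackOnly = buildReverseLookupSketch(new Map([
  ['image', { ccTool: 'Glob', reverseScore: 0 }],
]));
```

Without the `=== 0` guard, `image` would claim the Glob slot first in the `mixed` case, and `find_files` (default score 10) would only displace it because 10 > 0; in the `fallbackOnly` case `image` would keep the slot, which is the misrouting the comment describes.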
@@ -15,6 +15,74 @@
15
15
  * only runs long enough to capture a single request. CC's OAuth token
16
16
  * never leaves the machine — we send CC to a loopback URL that CC itself
17
17
  * trusts because we set ANTHROPIC_BASE_URL in the child's environment.
18
+ *
19
+ * --------------------------------------------------------------------
20
+ * "Hide in the population" roadmap (v3.13 → ?)
21
+ * --------------------------------------------------------------------
22
+ *
23
+ * The fingerprint pipeline has historically cared about one axis: what
24
+ * goes INSIDE the /v1/messages body (agent identity, system prompt, tool
25
+ * list). That's only one fingerprint vector. Anthropic can (and likely
26
+ * does) look at several others:
27
+ *
28
+ * 1. Header ORDER. Node's http module emits headers in alphabetical
29
+ * order via setHeader(). Undici preserves insertion order. Real CC
30
+ * uses undici with a specific insertion pattern. If dario sends
31
+ * headers in a different order than CC, the difference is trivially
32
+ * observable on the server side via the raw header array.
33
+ * → Captured as `header_order` below. Outbound proxy paths should
34
+ * use the captured order when rebuilding fetch() headers.
35
+ *
36
+ * 2. TLS ClientHello (JA3 / JA4 fingerprint). The cipher list, elliptic
37
+ * curves, extension order, and ALPN negotiation are determined by
38
+ * the TLS library, and Node's TLS (OpenSSL) produces a distinctive
39
+ * fingerprint that differs from any browser or from curl. Real CC
40
+ * running on top of Node has the Node JA3 — so we already match,
41
+ * provided both run on the same Node major. A cross-runtime worry
42
+ * surfaces when Anthropic ships Bun- or bundled-binary CC: at that
43
+ * point Node-dario and Bun-CC would JA-differ.
44
+ * → Mitigation: detect Bun-compiled CC, fall back to shim mode
45
+ * (which patches fetch INSIDE the CC process, inheriting CC's
46
+ * own TLS stack for free).
47
+ *
48
+ * 3. HTTP/2 frame ordering + SETTINGS parameters. Similar to TLS, this
49
+ * is controlled by the HTTP library. Node and undici produce a
50
+ * consistent H2 fingerprint. Matches as long as both ends run the
51
+ * same library.
52
+ *
53
+ * 4. Request timing distribution. Real CC sends requests with jitter
54
+ * driven by user typing, tool-call sequencing, and internal retry
55
+ * logic. Dario-through-a-client sends requests with jitter driven
56
+ * by WHATEVER client is on the other end (OpenClaw, Hermes, curl).
57
+ * That distribution differs from CC's. Anthropic could pattern-match
58
+ * "no inter-request jitter" as a fingerprint for automated usage.
59
+ * → Deferred. Adds latency for debatable gain. Analytics already
60
+ * tracks per-request timing — could drive a replay distribution
61
+ * later.
62
+ *
63
+ * 5. sessionId rotation cadence. CC rotates its internal session id
64
+ * on a specific cadence (observed: roughly once per conversation
65
+ * start, not per-request). Dario today uses a static session id
66
+ * from loadClaudeIdentity. A proxy that kept rotating sessionId
67
+ * randomly would stand out; a proxy that never rotates also stands
68
+ * out. Matching CC's cadence requires observing CC over a longer
69
+ * period than a single capture session.
70
+ * → Deferred. Requires a longer-running capture mode.
71
+ *
72
+ * 6. Request body field ordering. JSON is unordered, but the wire
73
+ * serialization IS ordered. Real CC uses a specific field order
74
+ * for /v1/messages (e.g., `model` before `messages` before
75
+ * `system` before `tools`). A proxy that serializes in a different
76
+ * order leaks its origin.
77
+ * → Worth matching. Cheap to implement — the template capture
78
+ * already produces a body we can walk to recover field order.
79
+ * Deferred to a follow-up.
80
+ *
81
+ * The concrete v3.13 move is (1): capture header_order and make it
82
+ * available on the template so the outbound proxy paths can reproduce
83
+ * it. Everything else is documented here as a roadmap so the next
84
+ * contributor — or dario maintainer six months from now — can pick up
85
+ * the right piece without re-deriving the threat model.
18
86
  */
19
87
  export interface TemplateData {
20
88
  _version: string;
@@ -28,6 +96,14 @@ export interface TemplateData {
28
96
  input_schema: Record<string, unknown>;
29
97
  }>;
30
98
  tool_names: string[];
99
+ /**
100
+ * The exact order CC emitted HTTP headers in when it hit the capture
101
+ * endpoint. Lowercased. Populated only from live captures — bundled
102
+ * snapshots leave this undefined and callers fall back to their own
103
+ * default order. Used by outbound proxy paths to reproduce CC's
104
+ * header ordering instead of Node's alphabetical default.
105
+ */
106
+ header_order?: string[];
31
107
  }
32
108
  /**
33
109
  * Load the template synchronously. Prefers the live cache (fresh capture
@@ -59,6 +135,12 @@ interface CapturedRequest {
59
135
  method: string;
60
136
  path: string;
61
137
  headers: Record<string, string>;
138
+ /**
139
+ * The flat [k1, v1, k2, v2, ...] array exactly as Node exposes it via
140
+ * req.rawHeaders. Preserves insertion order and duplicates, which the
141
+ * flattened `headers` map does not. Used to recover CC's header order.
142
+ */
143
+ rawHeaders: string[];
62
144
  body: Record<string, unknown>;
63
145
  }
64
146
  /**
@@ -15,6 +15,74 @@
15
15
  * only runs long enough to capture a single request. CC's OAuth token
16
16
  * never leaves the machine — we send CC to a loopback URL that CC itself
17
17
  * trusts because we set ANTHROPIC_BASE_URL in the child's environment.
18
+ *
19
+ * --------------------------------------------------------------------
20
+ * "Hide in the population" roadmap (v3.13 → ?)
21
+ * --------------------------------------------------------------------
22
+ *
23
+ * The fingerprint pipeline has historically cared about one axis: what
24
+ * goes INSIDE the /v1/messages body (agent identity, system prompt, tool
25
+ * list). That's only one fingerprint vector. Anthropic can (and likely
26
+ * does) look at several others:
27
+ *
28
+ * 1. Header ORDER. Node's http module emits headers in alphabetical
29
+ * order via setHeader(). Undici preserves insertion order. Real CC
30
+ * uses undici with a specific insertion pattern. If dario sends
31
+ * headers in a different order than CC, the difference is trivially
32
+ * observable on the server side via the raw header array.
33
+ * → Captured as `header_order` below. Outbound proxy paths should
34
+ * use the captured order when rebuilding fetch() headers.
35
+ *
36
+ * 2. TLS ClientHello (JA3 / JA4 fingerprint). The cipher list, elliptic
37
+ * curves, extension order, and ALPN negotiation are determined by
38
+ * the TLS library, and Node's TLS (OpenSSL) produces a distinctive
39
+ * fingerprint that differs from any browser or from curl. Real CC
40
+ * running on top of Node has the Node JA3 — so we already match,
41
+ * provided both run on the same Node major. A cross-runtime worry
42
+ * surfaces when Anthropic ships Bun- or bundled-binary CC: at that
43
+ * point Node-dario and Bun-CC would JA-differ.
44
+ * → Mitigation: detect Bun-compiled CC, fall back to shim mode
45
+ * (which patches fetch INSIDE the CC process, inheriting CC's
46
+ * own TLS stack for free).
47
+ *
48
+ * 3. HTTP/2 frame ordering + SETTINGS parameters. Similar to TLS, this
49
+ * is controlled by the HTTP library. Node and undici produce a
50
+ * consistent H2 fingerprint. Matches as long as both ends run the
51
+ * same library.
52
+ *
53
+ * 4. Request timing distribution. Real CC sends requests with jitter
54
+ * driven by user typing, tool-call sequencing, and internal retry
55
+ * logic. Dario-through-a-client sends requests with jitter driven
56
+ * by WHATEVER client is on the other end (OpenClaw, Hermes, curl).
57
+ * That distribution differs from CC's. Anthropic could pattern-match
58
+ * "no inter-request jitter" as a fingerprint for automated usage.
59
+ * → Deferred. Adds latency for debatable gain. Analytics already
60
+ * tracks per-request timing — could drive a replay distribution
61
+ * later.
62
+ *
63
+ * 5. sessionId rotation cadence. CC rotates its internal session id
64
+ * on a specific cadence (observed: roughly once per conversation
65
+ * start, not per-request). Dario today uses a static session id
66
+ * from loadClaudeIdentity. A proxy that kept rotating sessionId
67
+ * randomly would stand out; a proxy that never rotates also stands
68
+ * out. Matching CC's cadence requires observing CC over a longer
69
+ * period than a single capture session.
70
+ * → Deferred. Requires a longer-running capture mode.
71
+ *
72
+ * 6. Request body field ordering. JSON is unordered, but the wire
73
+ * serialization IS ordered. Real CC uses a specific field order
74
+ * for /v1/messages (e.g., `model` before `messages` before
75
+ * `system` before `tools`). A proxy that serializes in a different
76
+ * order leaks its origin.
77
+ * → Worth matching. Cheap to implement — the template capture
78
+ * already produces a body we can walk to recover field order.
79
+ * Deferred to a follow-up.
80
+ *
81
+ * The concrete v3.13 move is (1): capture header_order and make it
82
+ * available on the template so the outbound proxy paths can reproduce
83
+ * it. Everything else is documented here as a roadmap so the next
84
+ * contributor — or dario maintainer six months from now — can pick up
85
+ * the right piece without re-deriving the threat model.
18
86
  */
19
87
  import { spawn } from 'node:child_process';
20
88
  import { createServer } from 'node:http';
@@ -165,6 +233,7 @@ async function runCapture(timeoutMs) {
165
233
  method: req.method ?? 'POST',
166
234
  path: req.url ?? '/v1/messages',
167
235
  headers,
236
+ rawHeaders: Array.isArray(req.rawHeaders) ? [...req.rawHeaders] : [],
168
237
  body,
169
238
  };
170
239
  }
@@ -324,6 +393,7 @@ export function extractTemplate(captured) {
324
393
  if (tools.length === 0)
325
394
  return null;
326
395
  const version = extractCCVersion(captured.headers) ?? 'unknown';
396
+ const headerOrder = extractHeaderOrder(captured.rawHeaders);
327
397
  return {
328
398
  _version: version,
329
399
  _captured: new Date().toISOString(),
@@ -332,8 +402,32 @@ export function extractTemplate(captured) {
332
402
  system_prompt: systemPrompt,
333
403
  tools,
334
404
  tool_names: tools.map((t) => t.name),
405
+ header_order: headerOrder,
335
406
  };
336
407
  }
408
+ /**
409
+ * Walk rawHeaders (flat [k1, v1, k2, v2, ...] array) and return the
410
+ * header names in insertion order, lowercased, de-duplicated. If the
411
+ * raw array is empty or unusable, returns undefined so the caller
412
+ * falls back to default ordering.
413
+ */
414
+ function extractHeaderOrder(rawHeaders) {
415
+ if (!Array.isArray(rawHeaders) || rawHeaders.length === 0)
416
+ return undefined;
417
+ const order = [];
418
+ const seen = new Set();
419
+ for (let i = 0; i < rawHeaders.length; i += 2) {
420
+ const name = rawHeaders[i];
421
+ if (typeof name !== 'string')
422
+ continue;
423
+ const lower = name.toLowerCase();
424
+ if (seen.has(lower))
425
+ continue;
426
+ seen.add(lower);
427
+ order.push(lower);
428
+ }
429
+ return order.length > 0 ? order : undefined;
430
+ }
337
431
  function pickTextBlock(block) {
338
432
  if (!block || typeof block !== 'object')
339
433
  return null;
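Reviewer sketch: the diff captures `header_order` but this excerpt does not show the code that consumes it. One plausible way to apply a captured order when rebuilding outbound headers is shown below. `orderHeaders` is a hypothetical helper, not a function from the package, and whether a given HTTP client preserves this order on the wire is a separate question the sketch does not answer.

```javascript
// Return [name, value] pairs: captured names first, in captured order,
// then any remaining headers in their original insertion order.
function orderHeaders(headers, headerOrder) {
  const byLower = new Map();
  for (const [name, value] of Object.entries(headers)) {
    byLower.set(name.toLowerCase(), value);
  }
  const out = [];
  // headerOrder may be undefined (bundled snapshots leave it unset).
  for (const name of headerOrder ?? []) {
    if (byLower.has(name)) {
      out.push([name, byLower.get(name)]);
      byLower.delete(name);
    }
  }
  for (const [name, value] of byLower) out.push([name, value]);
  return out;
}

// Illustrative header set and captured order.
const ordered = orderHeaders(
  { 'Content-Type': 'application/json', 'User-Agent': 'x', Accept: '*/*' },
  ['accept', 'user-agent'],
);
const unordered = orderHeaders({ 'Content-Type': 'application/json', Accept: '*/*' }, undefined);
```

Appending uncaptured headers at the end keeps the helper safe when the outbound request carries a header the capture never saw, at the cost of those headers landing in a position real CC would not use.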
package/dist/pool.d.ts CHANGED
@@ -1,3 +1,15 @@
1
+ /**
2
+ * Compute a stable stickiness key from a conversation's first user
3
+ * message. Multi-turn agent sessions carry the same first user message
4
+ * on every turn, so hashing it gives a stable per-conversation key that
5
+ * doesn't require client cooperation. Empty / whitespace-only inputs
6
+ * return null so callers bypass stickiness on unhashable requests.
7
+ *
8
+ * Uses SHA-256 truncated to 16 hex chars (64 bits) — plenty of collision
9
+ * headroom for a pool of at most a few hundred active conversations per
10
+ * proxy instance, and small enough to log without spam.
11
+ */
12
+ export declare function computeStickyKey(firstUserMessage: string | null | undefined): string | null;
1
13
  export interface AccountIdentity {
2
14
  deviceId: string;
3
15
  accountUuid: string;
@@ -39,6 +51,7 @@ export declare class AccountPool {
39
51
  private queueMaxSize;
40
52
  private queueTimeoutMs;
41
53
  private drainTimer;
54
+ private sticky;
42
55
  add(alias: string, opts: {
43
56
  accessToken: string;
44
57
  refreshToken: string;
@@ -50,6 +63,41 @@ export declare class AccountPool {
50
63
  get size(): number;
51
64
  /** Select the best account for the next request. */
52
65
  select(): PoolAccount | null;
66
+ /**
67
+ * Select with session stickiness. If `stickyKey` is already bound to a
68
+ * healthy account (not rejected, token not near expiry, headroom > 2%),
69
+ * return that account. Otherwise pick by headroom (`select()`) and
70
+ * rebind the key to the chosen account. Null key bypasses stickiness
71
+ * and delegates to `select()`.
72
+ *
73
+ * Rebinding also fires when the previously-bound account is marked
74
+ * rejected (429) or has its headroom drop below 2% — at that point the
75
+ * conversation's cache entry on the old account is effectively stranded
76
+ * until reset anyway, so there's no cost to moving. The new account
77
+ * starts building its own cache for this conversation from turn 1 of
78
+ * the rebind.
79
+ *
80
+ * Also performs lazy cleanup of expired bindings (TTL or size cap).
81
+ */
82
+ selectSticky(stickyKey: string | null): PoolAccount | null;
83
+ /**
84
+ * Rebind a sticky key to a different account — called by proxy after an
85
+ * in-request 429 failover moves to the next-best account. Without this
86
+ * the next turn of the same conversation would re-select the exhausted
87
+ * account via the stale binding, eat another 429, and failover again.
88
+ */
89
+ rebindSticky(stickyKey: string | null, alias: string): void;
90
+ /**
91
+ * Drop any binding that points at an account no longer in the pool, any
92
+ * binding past the TTL, and if we're over the size cap drop the oldest
93
+ * entries until we're back under. O(n) but n is small (capped at 2k)
94
+ * and this only runs on selectSticky, not on every method.
95
+ */
96
+ private cleanupSticky;
97
+ /** Test/inspection helper — number of live sticky bindings. */
98
+ stickyCount(): number;
99
+ /** Test/inspection helper — current alias bound to a key, or null. */
100
+ stickyAliasFor(stickyKey: string): string | null;
53
101
  /** Select the next-best account, excluding the given set of aliases. */
54
102
  selectExcluding(excluded: Set<string>): PoolAccount | null;
55
103
  updateRateLimits(alias: string, snapshot: RateLimitSnapshot): void;
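Reviewer sketch: the `rebindSticky` doc comment above describes a proxy-side flow (select by sticky key, fail over on 429, then rebind so the next turn skips the exhausted account). The proxy code itself is not in this excerpt, so the loop below is an assumed shape. `dispatchWithFailover` and the synchronous `send` callback are hypothetical; a real proxy would be asynchronous.

```javascript
// Try the sticky account first; on 429 move to the next-best account
// and rebind the key so later turns do not retry the exhausted one.
function dispatchWithFailover(pool, stickyKey, send) {
  const tried = new Set();
  let account = pool.selectSticky(stickyKey);
  while (account) {
    const status = send(account);
    if (status !== 429) {
      if (tried.size > 0) pool.rebindSticky(stickyKey, account.alias);
      return { alias: account.alias, status };
    }
    tried.add(account.alias);
    account = pool.selectExcluding(tried);
  }
  return null; // every account was exhausted
}

// Stub pool exposing only the three methods the sketch relies on.
const rebinds = [];
const stubPool = {
  selectSticky: () => ({ alias: 'a' }),
  selectExcluding: (excluded) => (excluded.has('b') ? null : { alias: 'b' }),
  rebindSticky: (key, alias) => { rebinds.push([key, alias]); },
};
const failoverResult = dispatchWithFailover(stubPool, 'k1', (acc) => (acc.alias === 'a' ? 429 : 200));
```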
package/dist/pool.js CHANGED
@@ -6,7 +6,24 @@
6
6
  * path it has always had; the pool only runs when there are multiple
7
7
  * accounts to distribute against.
8
8
  */
9
- import { randomUUID } from 'node:crypto';
9
+ import { createHash, randomUUID } from 'node:crypto';
10
+ /**
11
+ * Compute a stable stickiness key from a conversation's first user
12
+ * message. Multi-turn agent sessions carry the same first user message
13
+ * on every turn, so hashing it gives a stable per-conversation key that
14
+ * doesn't require client cooperation. Empty / whitespace-only inputs
15
+ * return null so callers bypass stickiness on unhashable requests.
16
+ *
17
+ * Uses SHA-256 truncated to 16 hex chars (64 bits) — plenty of collision
18
+ * headroom for a pool of at most a few hundred active conversations per
19
+ * proxy instance, and small enough to log without spam.
20
+ */
21
+ export function computeStickyKey(firstUserMessage) {
22
+ const trimmed = (firstUserMessage ?? '').trim();
23
+ if (trimmed.length === 0)
24
+ return null;
25
+ return createHash('sha256').update(trimmed).digest('hex').slice(0, 16);
26
+ }
10
27
  export const EMPTY_SNAPSHOT = {
11
28
  status: 'unknown',
12
29
  util5h: 0,
@@ -31,12 +48,15 @@ export function parseRateLimits(headers) {
31
48
  updatedAt: Date.now(),
32
49
  };
33
50
  }
51
+ const STICKY_TTL_MS = 6 * 60 * 60 * 1000; // 6h
52
+ const STICKY_MAX_ENTRIES = 2_000; // lazy cleanup cap
34
53
  export class AccountPool {
35
54
  accounts = new Map();
36
55
  queue = [];
37
56
  queueMaxSize = 50;
38
57
  queueTimeoutMs = 60_000;
39
58
  drainTimer = null;
59
+ sticky = new Map();
40
60
  add(alias, opts) {
41
61
  const existing = this.accounts.get(alias);
42
62
  this.accounts.set(alias, {
@@ -82,6 +102,84 @@ export class AccountPool {
82
102
  // No rate-limit data at all — least-used first
83
103
  return all.reduce((a, b) => a.requestCount < b.requestCount ? a : b);
84
104
  }
105
+ /**
106
+ * Select with session stickiness. If `stickyKey` is already bound to a
107
+ * healthy account (not rejected, token not near expiry, headroom > 2%),
108
+ * return that account. Otherwise pick by headroom (`select()`) and
109
+ * rebind the key to the chosen account. Null key bypasses stickiness
110
+ * and delegates to `select()`.
111
+ *
112
+ * Rebinding also fires when the previously-bound account is marked
113
+ * rejected (429) or has its headroom drop below 2% — at that point the
114
+ * conversation's cache entry on the old account is effectively stranded
115
+ * until reset anyway, so there's no cost to moving. The new account
116
+ * starts building its own cache for this conversation from turn 1 of
117
+ * the rebind.
118
+ *
119
+ * Also performs lazy cleanup of expired bindings (TTL or size cap).
120
+ */
121
+ selectSticky(stickyKey) {
122
+ if (!stickyKey)
123
+ return this.select();
124
+ this.cleanupSticky();
125
+ const binding = this.sticky.get(stickyKey);
126
+ if (binding) {
127
+ const bound = this.accounts.get(binding.alias);
128
+ const now = Date.now();
129
+ if (bound
130
+ && bound.rateLimit.status !== 'rejected'
131
+ && bound.expiresAt > now + 30_000
132
+ && (1 - Math.max(bound.rateLimit.util5h, bound.rateLimit.util7d)) > 0.02) {
133
+ return bound;
134
+ }
135
+ }
136
+ const picked = this.select();
137
+ if (picked) {
138
+ this.sticky.set(stickyKey, { alias: picked.alias, boundAt: Date.now() });
139
+ }
140
+ return picked;
141
+ }
142
+ /**
143
+ * Rebind a sticky key to a different account — called by proxy after an
144
+ * in-request 429 failover moves to the next-best account. Without this
145
+ * the next turn of the same conversation would re-select the exhausted
146
+ * account via the stale binding, eat another 429, and failover again.
147
+ */
148
+ rebindSticky(stickyKey, alias) {
149
+ if (!stickyKey)
150
+ return;
151
+ if (!this.accounts.has(alias))
152
+ return;
153
+ this.sticky.set(stickyKey, { alias, boundAt: Date.now() });
154
+ }
155
+ /**
156
+ * Drop any binding that points at an account no longer in the pool, any
157
+ * binding past the TTL, and if we're over the size cap drop the oldest
158
+ * entries until we're back under. O(n) but n is small (capped at 2k)
159
+ * and this only runs on selectSticky, not on every method.
160
+ */
161
+ cleanupSticky() {
162
+ const now = Date.now();
163
+ for (const [key, b] of this.sticky) {
164
+ if (!this.accounts.has(b.alias) || now - b.boundAt > STICKY_TTL_MS) {
165
+ this.sticky.delete(key);
166
+ }
167
+ }
168
+ if (this.sticky.size > STICKY_MAX_ENTRIES) {
169
+ const sorted = [...this.sticky.entries()].sort((a, b) => a[1].boundAt - b[1].boundAt);
170
+ const toDrop = sorted.slice(0, this.sticky.size - STICKY_MAX_ENTRIES);
171
+ for (const [key] of toDrop)
172
+ this.sticky.delete(key);
173
+ }
174
+ }
175
+ /** Test/inspection helper — number of live sticky bindings. */
176
+ stickyCount() {
177
+ return this.sticky.size;
178
+ }
179
+ /** Test/inspection helper — current alias bound to a key, or null. */
180
+ stickyAliasFor(stickyKey) {
181
+ return this.sticky.get(stickyKey)?.alias ?? null;
182
+ }
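Reviewer sketch: a stripped-down model of the sticky selection added above, useful for seeing when a binding holds and when it moves. `MiniStickyPool` is hypothetical. It collapses `util5h` / `util7d` into one `util` number, replaces `status !== 'rejected'` with a boolean, omits the token-expiry check and the size cap, and uses a simple "lowest utilization" rule in place of the real `select()`.

```javascript
const TTL_MS = 6 * 60 * 60 * 1000; // mirrors STICKY_TTL_MS in the diff

class MiniStickyPool {
  constructor(accounts) {
    this.accounts = new Map(accounts.map((a) => [a.alias, a]));
    this.sticky = new Map();
  }

  // Stand-in for select(): the non-rejected account with the lowest utilization.
  select() {
    let best = null;
    for (const a of this.accounts.values()) {
      if (a.rejected) continue;
      if (!best || a.util < best.util) best = a;
    }
    return best;
  }

  // Keep the bound account while it is healthy; otherwise pick again and rebind.
  selectSticky(key, now = Date.now()) {
    if (!key) return this.select();
    const binding = this.sticky.get(key);
    if (binding && now - binding.boundAt <= TTL_MS) {
      const bound = this.accounts.get(binding.alias);
      if (bound && !bound.rejected && 1 - bound.util > 0.02) return bound;
    }
    const picked = this.select();
    if (picked) this.sticky.set(key, { alias: picked.alias, boundAt: now });
    return picked;
  }
}

const pool = new MiniStickyPool([
  { alias: 'a', util: 0.1, rejected: false },
  { alias: 'b', util: 0.5, rejected: false },
]);
const first = pool.selectSticky('conv-1', 0).alias;
pool.accounts.get('b').util = 0; // b now has more headroom, but the binding holds
const second = pool.selectSticky('conv-1', 1000).alias;
pool.accounts.get('a').rejected = true; // a 429 on the bound account forces a rebind
const third = pool.selectSticky('conv-1', 2000).alias;
```

The second call is the interesting one: a plain headroom-based pick would switch to `b`, while the sticky path stays on `a`, which is what lets a conversation keep reusing whatever cache it built on that account.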
85
183
  /** Select the next-best account, excluding the given set of aliases. */
86
184
  selectExcluding(excluded) {
87
185
  if (this.accounts.size <= 1)