@kylebrodeur/pi-model-router 0.1.3 → 0.2.0

package/CHANGELOG.md CHANGED
@@ -7,21 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.2.0] - 2026-05-17
+
  ### Added
+ - Transparent wait and retry interception for string-based rate limit errors
  - Ollama auto-sync feature
- - Rate-limit fallback with transparent HTTP error handling (402, 429, 503, 529)
+ - Rate-limit fallback with transparent HTTP error handling
  - Feature toggles in config (`features` object)
  - Scope shim for syncing router profiles to Pi enabled models
  - Progressive enhancement (auto-detect qmd-ledger and agent-bus)
- - Progressive config files (`model-router.ledger.json`, `model-router.agent-bus.json`, `model-router.essential.json`)
- - GitHub issue templates and pull request template
- - Code of Conduct
 
  ### Changed
- - Updated minimum Pi SDK version from 0.68.0 to 0.70.2
- - Merged `README_FORK.md` into canonical `README.md`
+ - **BREAKING:** Migrate from `@mariozechner/pi-coding-agent` to `@earendil-works/pi-coding-agent` v0.75.0
+ - Update `detectPlugins` to use `pi.getAllTools()` for Pi v0.74.1+ compatibility
+ - Updated minimum Pi SDK version to `>=0.75.0`
  - Replaced `@sinclair/typebox` peer dependency with `typebox`
 
+ ## [0.1.4] - 2026-04-27
+
+ ### Added
+ - Wait/retry interception for string-based rate limit errors
+
+ ## [0.1.3] - 2026-04-24
+
+ ## [0.1.2] - 2026-04-23
+
+ ### Fixed
+ - Config merge bug where features/ollamaSync/rateLimitFallback were dropped
+
  ## [0.1.1] - 2025-04-22
 
  ### Fixed
@@ -37,6 +50,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - Scope shim module
  - Progressive enhancement with plugin detection
 
- [Unreleased]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.1...HEAD
+ [Unreleased]: https://github.com/kylebrodeur/pi-model-router/compare/v0.2.0...HEAD
+ [0.2.0]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.4...v0.2.0
+ [0.1.4]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.3...v0.1.4
+ [0.1.3]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.2...v0.1.3
+ [0.1.2]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.1...v0.1.2
  [0.1.1]: https://github.com/kylebrodeur/pi-model-router/compare/v0.1.0...v0.1.1
  [0.1.0]: https://github.com/kylebrodeur/pi-model-router/releases/tag/v0.1.0
package/LEARNINGS.md CHANGED
@@ -73,10 +73,11 @@ The fallback mechanism uses a user-configurable sequence of models: `fallbackSeq
  * **Key benefit**: Prevents catastrophic failures when a primary model is unavailable.
 
  ### 3. Graceful Error Handling
- The extension transparently handles errors. For "out of credits" (`402`) or "rate limit" (`429`), it automatically switches to a fallback model and emits a custom session entry (`router-fallback`) for headless tooling to detect.
+ The extension transparently handles errors. For "out of credits" (`402`) or "rate limit" (`429`), it automatically switches to a fallback model and emits a custom session entry (`router-fallback`) for headless tooling to detect.
+ Additionally, for string-based 429 errors specifying a cooldown (e.g., "quota will reset after 58s"), the router can intercept the stream, pause for the required duration (if under `shortDelayThreshold`), and automatically retry the original request without failing the turn.
 
  * **When to use**: For any extension exposed to external API services.
- * **Key insight**: Never mask API errors; provide enough detail (status codes) in UI notifications for users to diagnose.
+ * **Key insight**: Never mask API errors; provide enough detail (status codes) in UI notifications for users to diagnose, but handle transient issues (like short rate limits) invisibly where possible.
 
  ## 🔌 Pi Integration Patterns
 
package/README.md CHANGED
@@ -121,6 +121,30 @@ Copy the example config to one of:
 
  **Priority:** Project config `.pi/model-router.json` overrides user config `~/.pi/agent/model-router.json`. Both override defaults.
 
+ ### Rate Limit Interception & Fallback
+
+ The router can gracefully handle 429 Rate Limit and Quota errors. If the error specifies a wait time (e.g., "reset after 58s"), the router will pause and automatically retry the prompt if the wait time is under your threshold. If it exceeds the threshold or is unparseable, it fails over to the next available model in your fallback sequence.
+
+ ```json
+ {
+   "rateLimitFallback": {
+     "enabled": true,
+     "shortDelayThreshold": 60,
+     "autoFallback": true,
+     "autoRestore": true,
+     "restoreCheckInterval": 300,
+     "fallbackSequence": ["anthropic/claude-3-haiku-20240307", "ollama/*"]
+   }
+ }
+ ```
+
+ | Field | Description |
+ |-------|-------------|
+ | `shortDelayThreshold` | Maximum time (in seconds) the router will pause and wait to retry when encountering a rate limit. If the cooldown is longer than this, it triggers a fallback. |
+ | `fallbackSequence` | Array of model IDs (or wildcards like `ollama/*`) to try if the primary model fails or the wait time is too long. |
+ | `autoFallback` | (Optional) Automatically switch session to the fallback model globally after a hard failure. |
+ | `autoRestore` | (Optional) If fallback was triggered, automatically try to restore the original cloud model after `restoreCheckInterval` seconds. |
+
  ### Progressive Enhancement Configs
 
  After installing optional extensions, copy one of these to `.pi/model-router.json`:
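
Reviewer sketch of the rule the new README section describes: wait and retry when the parsed cooldown fits within `shortDelayThreshold`, otherwise fall back. Only `shortDelayThreshold` (in seconds) comes from the README; `planRateLimitAction` and the `RateLimitAction` type are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical helper: decides between pausing and falling back.
type RateLimitAction =
  | { kind: 'wait-and-retry'; sleepMs: number }
  | { kind: 'fallback' };

function planRateLimitAction(
  cooldownMs: number | null, // parsed wait time, or null when the error is unparseable
  shortDelayThresholdSec: number, // README: maximum pause, in seconds
): RateLimitAction {
  const thresholdMs = shortDelayThresholdSec * 1000;
  // Inclusive comparison: a cooldown equal to the threshold is still waited out.
  if (cooldownMs !== null && cooldownMs > 0 && cooldownMs <= thresholdMs) {
    return { kind: 'wait-and-retry', sleepMs: cooldownMs };
  }
  // Too long, zero, or unparseable: move on to the fallback sequence.
  return { kind: 'fallback' };
}

console.log(planRateLimitAction(58_000, 60)); // cooldown under the 60 s threshold
console.log(planRateLimitAction(120_000, 60)); // cooldown over the threshold
console.log(planRateLimitAction(null, 60)); // no parseable cooldown
```

With the example config (`shortDelayThreshold: 60`), a "reset after 58s" error would be waited out, while a two-minute cooldown or an error without a duration would go to `fallbackSequence`.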
@@ -2,10 +2,10 @@ import {
    getAgentDir,
    type ExtensionAPI,
    type ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
  import { existsSync, writeFileSync } from 'node:fs';
  import { join } from 'node:path';
- import type { AutocompleteItem } from '@mariozechner/pi-tui';
+ import type { AutocompleteItem } from '@earendil-works/pi-tui';
  import type {
    RouterConfig,
    RouterPinByProfile,
@@ -1,7 +1,7 @@
  import { existsSync, readFileSync } from 'node:fs';
  import { join } from 'node:path';
- import { getAgentDir } from '@mariozechner/pi-coding-agent';
- import type { ThinkingLevel } from '@mariozechner/pi-agent-core';
+ import { getAgentDir } from '@earendil-works/pi-coding-agent';
+ import type { ThinkingLevel } from '@earendil-works/pi-agent-core';
  import type {
    RouterConfig,
    RouterProfile,
@@ -1,7 +1,7 @@
  import type {
    ExtensionAPI,
    ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
  import {
    type RouterConfig,
    type RouterPersistedState,
@@ -34,11 +34,25 @@ interface PluginStatus {
  }
 
  const detectPlugins = (pi: ExtensionAPI): PluginStatus => {
-   const tools = (pi as any).tools ?? {};
    const log = (pi as any).log || console;
+
+   // Pi v0.74.1+: tools are exposed via pi.getAllTools() returning { name, description, ... }[]
+   // Legacy Pi versions exposed tools as pi.tools.<name>() directly.
+   let allTools: { name: string }[] = [];
+   try {
+     allTools = (pi as any).getAllTools?.() ?? [];
+   } catch {
+     allTools = [];
+   }
+
+   const legacyTools = (pi as any).tools ?? {};
+   const hasTool = (name: string) =>
+     allTools.some((t) => t.name === name) ||
+     typeof legacyTools[name] === 'function';
+
    return {
-     ledger: typeof tools.append_ledger === 'function',
-     agentBus: typeof tools.link_send === 'function',
+     ledger: hasTool('append_ledger'),
+     agentBus: hasTool('link_send'),
    };
  };
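
Reviewer sketch of how the new `hasTool` check behaves against both API shapes. `ToolHost` is a hypothetical stand-in for the Pi extension API (the real `ExtensionAPI` type is not reproduced here), and `hasToolOn` and the three sample hosts are made up for illustration.

```typescript
// Hypothetical minimal shape covering both the list-based and the legacy tool APIs.
interface ToolHost {
  getAllTools?: () => { name: string }[];
  tools?: Record<string, unknown>;
}

function hasToolOn(host: ToolHost, toolName: string): boolean {
  let listed: { name: string }[] = [];
  try {
    listed = host.getAllTools?.() ?? [];
  } catch {
    listed = []; // a throwing getAllTools() is treated as "no tools listed"
  }
  const legacy = host.tools ?? {};
  return (
    listed.some((t) => t.name === toolName) ||
    typeof legacy[toolName] === 'function'
  );
}

// Sample hosts: list-based API, legacy function map, and a getAllTools() that throws.
const modernHost: ToolHost = { getAllTools: () => [{ name: 'append_ledger' }] };
const legacyHost: ToolHost = { tools: { link_send: () => undefined } };
const brokenHost: ToolHost = {
  getAllTools: () => {
    throw new Error('not ready');
  },
};

console.log(hasToolOn(modernHost, 'append_ledger')); // true
console.log(hasToolOn(legacyHost, 'link_send')); // true
console.log(hasToolOn(brokenHost, 'append_ledger')); // false
```

Checking both sources means detection keeps working whichever API shape the host exposes, at the cost of two `any` casts in the shipped code.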
@@ -10,7 +10,7 @@ import { homedir } from 'node:os';
  import type {
    ExtensionAPI,
    ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
 
  // ─── Types ──────────────────────────────────────────────────────────────────
 
@@ -8,11 +8,11 @@ import {
    type Model,
    type SimpleStreamOptions,
    type Message,
- } from '@mariozechner/pi-ai';
+ } from '@earendil-works/pi-ai';
  import type {
    ExtensionAPI,
    ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
  import type {
    RouterConfig,
    RoutingDecision,
@@ -30,6 +30,20 @@ import {
    hasImageAttachment,
  } from './routing';
 
+ const rateLimitRegex = /(?:429|rate limit|quota).*?(?:reset after|try again in|wait)\s*(\d+)\s*([smh])/i;
+
+ function extractWaitTimeMs(errorText: string): number | null {
+   const match = errorText.match(rateLimitRegex);
+   if (!match) return null;
+   const value = parseInt(match[1], 10);
+   const unit = match[2].toLowerCase();
+
+   if (unit === 's') return value * 1000;
+   if (unit === 'm') return value * 60000;
+   if (unit === 'h') return value * 3600000;
+   return null;
+ }
+
  export const createErrorMessage = (
    model: Model<Api>,
    message: string,
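
Reviewer sketch exercising the new rate-limit pattern on a few error strings. The regex is copied from the hunk above; `parseCooldownMs` restates `extractWaitTimeMs` with explicit undefined checks so it compiles under strict settings, and the sample messages are invented for illustration rather than taken from any provider.

```typescript
const cooldownPattern =
  /(?:429|rate limit|quota).*?(?:reset after|try again in|wait)\s*(\d+)\s*([smh])/i;

function parseCooldownMs(errorText: string): number | null {
  const match = cooldownPattern.exec(errorText);
  if (!match) return null;
  const [, rawValue, rawUnit] = match;
  if (rawValue === undefined || rawUnit === undefined) return null;
  const value = parseInt(rawValue, 10);
  switch (rawUnit.toLowerCase()) {
    case 's':
      return value * 1000;
    case 'm':
      return value * 60_000;
    case 'h':
      return value * 3_600_000;
    default:
      return null;
  }
}

// Invented sample messages paired with the value the pattern yields for each.
const cooldownSamples: Array<[string, number | null]> = [
  ['429: quota will reset after 58s', 58_000],
  ['Rate limit exceeded, try again in 2 minutes', 120_000],
  ['Quota exhausted; please wait 1h', 3_600_000],
  ['Rate limit exceeded', null], // no duration phrase, so nothing to parse
  ['rate limit: wait 500 ms', 30_000_000], // "ms" is read as minutes
];

for (const [sampleText, expected] of cooldownSamples) {
  console.log(sampleText, '->', parseCooldownMs(sampleText), 'expected', expected);
}
```

The last sample shows a possible edge case: the unit class `[smh]` only inspects the first letter, so a millisecond value such as "500 ms" would be read as 500 minutes. With a 60-second threshold that still results in a fallback rather than a long sleep.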
@@ -457,74 +471,109 @@ export const registerRouterProvider = (
    const apiKey = auth.apiKey;
    const headers = auth.headers;
 
-   try {
-     // HONESTY CHECK & AUTO-TRUNCATION
-     // If the picked model has a smaller context than what we reported, truncate now.
-     let effectiveContext = context;
-     const targetLimit = targetModel.contextWindow || 128_000;
-     if (targetLimit < model.contextWindow!) {
-       effectiveContext = truncateContext(context, targetLimit);
-     }
+   let retryCount = 0;
+   let modelSuccess = false;
 
-     const thinkingOverride = actions.getThinkingOverride(
-       model.id,
-       decision.tier,
-     );
-     const delegatedReasoning =
-       targetModel.reasoning &&
-       (thinkingOverride ?? decision.thinking) !== 'off'
-         ? (thinkingOverride ?? decision.thinking)
-         : undefined;
-
-     if (state.lastExtensionContext) {
-       if (delegatedReasoning) {
-         state.lastExtensionContext.ui.setHiddenThinkingLabel?.(
-           `Thinking (${targetProvider}/${targetModelId})...`,
-         );
-       } else {
-         state.lastExtensionContext.ui.setHiddenThinkingLabel?.();
+   while (retryCount < 2) {
+     let contentReceived = false;
+     try {
+       // HONESTY CHECK & AUTO-TRUNCATION
+       // If the picked model has a smaller context than what we reported, truncate now.
+       let effectiveContext = context;
+       const targetLimit = targetModel.contextWindow || 128_000;
+       if (targetLimit < model.contextWindow!) {
+         effectiveContext = truncateContext(context, targetLimit);
        }
-     }
 
-     const delegatedStream = streamSimple(
-       targetModel,
-       effectiveContext,
-       {
-         ...options,
-         apiKey,
-         headers,
-         ...(delegatedReasoning
-           ? { reasoning: delegatedReasoning }
-           : {}),
-       },
-     );
+       const thinkingOverride = actions.getThinkingOverride(
+         model.id,
+         decision.tier,
+       );
+       const delegatedReasoning =
+         targetModel.reasoning &&
+         (thinkingOverride ?? decision.thinking) !== 'off'
+           ? (thinkingOverride ?? decision.thinking)
+           : undefined;
+
+       if (state.lastExtensionContext) {
+         if (delegatedReasoning) {
+           state.lastExtensionContext.ui.setHiddenThinkingLabel?.(
+             `Thinking (${targetProvider}/${targetModelId})...`,
+           );
+         } else {
+           state.lastExtensionContext.ui.setHiddenThinkingLabel?.();
+         }
+       }
 
-     let contentReceived = false;
-     for await (const event of delegatedStream) {
-       if (event.type === 'done') {
-         const cost = event.message.usage?.cost?.total ?? 0;
-         state.accumulatedCost += cost;
+       const delegatedStream = streamSimple(
+         targetModel,
+         effectiveContext,
+         {
+           ...options,
+           apiKey,
+           headers,
+           ...(delegatedReasoning
+             ? { reasoning: delegatedReasoning }
+             : {}),
+         },
+       );
+
+       for await (const event of delegatedStream) {
+         if (event.type === 'done') {
+           const cost = event.message.usage?.cost?.total ?? 0;
+           state.accumulatedCost += cost;
+         }
+         if (event.type === 'error' && !contentReceived) {
+           throw new Error(
+             (event as any).error?.errorMessage ||
+               'Model failed before sending content.',
+           );
+         }
+         const isContent =
+           event.type === 'text_delta' ||
+           event.type === 'thinking_delta' ||
+           event.type === 'toolcall_delta' ||
+           event.type === 'toolcall_end';
+         if (isContent) contentReceived = true;
+         stream.push(event);
        }
-       if (event.type === 'error' && !contentReceived) {
-         throw new Error(
-           (event as any).error?.errorMessage ||
-             'Model failed before sending content.',
-         );
+       modelSuccess = true;
+       success = true;
+       if (i > 0) decision.isFallback = true;
+       break; // break the retry loop
+     } catch (err) {
+       const errMsg = err instanceof Error ? err.message : String(err);
+       const waitMs = extractWaitTimeMs(errMsg);
+       const maxWaitMs = (state.currentConfig.rateLimitFallback?.shortDelayThreshold ?? 60) * 1000;
+
+       if (waitMs && waitMs <= maxWaitMs && retryCount === 0 && !contentReceived) {
+         const partialMsg = {
+           role: 'assistant',
+           content: [],
+           api: model.api,
+           provider: targetProvider,
+           model: targetModelId,
+           usage: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, totalTokens: 0, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 } },
+           timestamp: Date.now(),
+         } as unknown as AssistantMessage;
+
+         stream.push({
+           type: 'text_delta',
+           contentIndex: 0,
+           delta: `\n_⏳ [Router] Rate limit reached on ${targetProvider}/${targetModelId}. Waiting ${Math.ceil(waitMs/1000)}s before retrying..._\n`,
+           partial: partialMsg
+         });
+         await new Promise(resolve => setTimeout(resolve, waitMs + 1000)); // buffer 1s
+         retryCount++;
+         continue; // try the same model again
        }
-       const isContent =
-         event.type === 'text_delta' ||
-         event.type === 'thinking_delta' ||
-         event.type === 'toolcall_delta' ||
-         event.type === 'toolcall_end';
-       if (isContent) contentReceived = true;
-       stream.push(event);
+
+       lastError = err;
+       break; // model failed completely, break retry loop to go to next fallback model
      }
-     success = true;
-     if (i > 0) decision.isFallback = true;
-     break;
-   } catch (err) {
-     lastError = err;
    }
+
+   if (modelSuccess) break; // break fallback loop
  }
 
  if (!success) {
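
Reviewer sketch of the control flow this hunk introduces: an inner retry loop (at most one wait-and-retry per model) nested inside the existing fallback loop. It is a dry-run simulation, not the shipped code: `traceRouting`, `ScriptedAttempt`, and the model IDs are hypothetical, nothing sleeps or streams, and the `!contentReceived` guard from the real condition is omitted for brevity.

```typescript
// Each scripted attempt either succeeds or fails with a parsed cooldown
// (null means the error carried no usable wait time).
type ScriptedAttempt = { ok: true } | { ok: false; waitMs: number | null };

function traceRouting(
  script: Record<string, ScriptedAttempt[]>, // per-model outcomes, in attempt order
  candidates: string[], // primary model first, then fallbacks
  maxWaitMs: number,
): string[] {
  const trace: string[] = [];
  let success = false;
  for (const modelId of candidates) {
    const attempts = script[modelId] ?? [];
    let retryCount = 0;
    let modelSuccess = false;
    while (retryCount < 2) {
      // Missing script entries are treated as hard failures.
      const outcome: ScriptedAttempt =
        attempts[retryCount] ?? { ok: false, waitMs: null };
      if (outcome.ok) {
        trace.push(`${modelId}: success`);
        modelSuccess = true;
        success = true;
        break; // leave the retry loop
      }
      const canWait =
        outcome.waitMs !== null &&
        outcome.waitMs > 0 &&
        outcome.waitMs <= maxWaitMs &&
        retryCount === 0; // only the first failure may be waited out
      if (canWait) {
        trace.push(`${modelId}: wait ${outcome.waitMs}ms, retry`);
        retryCount++;
        continue; // same model again
      }
      trace.push(`${modelId}: failed, next model`);
      break; // give up on this model
    }
    if (modelSuccess) break; // leave the fallback loop
  }
  if (!success) trace.push('all models failed');
  return trace;
}

console.log(
  traceRouting(
    { 'cloud/a': [{ ok: false, waitMs: 5000 }, { ok: true }] },
    ['cloud/a', 'local/b'],
    60_000,
  ),
);
```

Because a retry is only allowed when `retryCount === 0`, the `while (retryCount < 2)` bound acts as a safety net: a second failure on the same model always falls through to the next candidate.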
@@ -7,7 +7,7 @@
  import type {
    ExtensionAPI,
    ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
 
  // ─── Types ──────────────────────────────────────────────────────────────────
 
@@ -1,5 +1,5 @@
- import { streamSimple, type Context, type Message } from '@mariozechner/pi-ai';
- import type { ExtensionContext } from '@mariozechner/pi-coding-agent';
+ import { streamSimple, type Context, type Message } from '@earendil-works/pi-ai';
+ import type { ExtensionContext } from '@earendil-works/pi-coding-agent';
  import type {
    RouterTier,
    RouterPhase,
@@ -8,12 +8,12 @@
   */
  import { readFileSync, writeFileSync } from 'node:fs';
  import { join } from 'node:path';
- import { getAgentDir } from '@mariozechner/pi-coding-agent';
- import type { Model } from '@mariozechner/pi-ai';
+ import { getAgentDir } from '@earendil-works/pi-coding-agent';
+ import type { Model } from '@earendil-works/pi-ai';
  import type {
    ExtensionAPI,
    ExtensionContext,
- } from '@mariozechner/pi-coding-agent';
+ } from '@earendil-works/pi-coding-agent';
  import type { RouterProfile, RouterConfig } from './types';
  import { parseCanonicalModelRef } from './config';
 
@@ -1,4 +1,4 @@
- import type { ThinkingLevel } from '@mariozechner/pi-agent-core';
+ import type { ThinkingLevel } from '@earendil-works/pi-agent-core';
 
  // ─── Feature Toggles (added by fork) ──────────────────────────────────────
 
package/extensions/ui.ts CHANGED
@@ -1,4 +1,4 @@
- import type { ExtensionContext } from '@mariozechner/pi-coding-agent';
+ import type { ExtensionContext } from '@earendil-works/pi-coding-agent';
  import type {
    RoutingDecision,
    RouterConfig,
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@kylebrodeur/pi-model-router",
-   "version": "0.1.3",
+   "version": "0.2.0",
    "type": "module",
    "description": "Intelligent per-turn model router extension for the pi coding agent (Enhanced Fork)",
    "keywords": [
@@ -50,14 +50,14 @@
      "prepublishOnly": "npm run tsc"
    },
    "peerDependencies": {
-     "@mariozechner/pi-agent-core": "*",
-     "@mariozechner/pi-ai": "*",
-     "@mariozechner/pi-coding-agent": ">=0.70.2",
-     "@mariozechner/pi-tui": "*",
+     "@earendil-works/pi-agent-core": "^0.75.0",
+     "@earendil-works/pi-ai": "^0.75.0",
+     "@earendil-works/pi-coding-agent": ">=0.75.0",
+     "@earendil-works/pi-tui": "^0.75.0",
      "typebox": "*"
    },
    "devDependencies": {
-     "@mariozechner/pi-coding-agent": "^0.70.2",
+     "@earendil-works/pi-coding-agent": "^0.75.0",
      "prettier": "^3.8.1",
      "typescript": "^6.0.2"
    }