@kylebrodeur/pi-model-router 0.1.2
- package/CHANGELOG.md +42 -0
- package/CONTRIBUTING.md +310 -0
- package/LEARNINGS.md +181 -0
- package/LICENSE +21 -0
- package/QUICKSTART.md +111 -0
- package/README.md +195 -0
- package/TESTING.md +374 -0
- package/docs/ARCHITECTURE.md +54 -0
- package/docs/UPSTREAM_ISSUE_scoped_models.md +94 -0
- package/extensions/commands.ts +1068 -0
- package/extensions/config.ts +415 -0
- package/extensions/constants.ts +1 -0
- package/extensions/index.ts +583 -0
- package/extensions/ollama-sync.ts +254 -0
- package/extensions/provider.ts +558 -0
- package/extensions/rate-limit.ts +317 -0
- package/extensions/routing.ts +418 -0
- package/extensions/scope-shim.ts +213 -0
- package/extensions/state.ts +49 -0
- package/extensions/types.ts +148 -0
- package/extensions/ui.ts +130 -0
- package/model-router.agent-bus.json +15 -0
- package/model-router.essential.json +31 -0
- package/model-router.example.json +70 -0
- package/model-router.ledger.json +15 -0
- package/package.json +64 -0
package/README.md
ADDED
|
@@ -0,0 +1,195 @@
|
|
|
1
|
+
# pi-model-router
|
|
2
|
+
|
|
3
|
+
[npm package](https://www.npmjs.com/package/@kylebrodeur/pi-model-router)
|
|
4
|
+
[License](LICENSE)
|
|
5
|
+
[pi coding agent](https://github.com/mariozechner/pi-coding-agent)
|
|
6
|
+
|
|
7
|
+
Intelligent per-turn model router extension for the [pi](https://github.com/mariozechner/pi-coding-agent) coding agent. Automatically selects between high, medium, and low-tier LLMs based on task intent, session budget, context size, and custom rules — with automatic fallbacks, phase awareness, Ollama sync, and transparent rate-limit recovery.
|
|
8
|
+
|
|
9
|
+
## ✨ Features
|
|
10
|
+
|
|
11
|
+
| Feature | Description |
|
|
12
|
+
|---------|-------------|
|
|
13
|
+
| **Logical Router Provider** | Registers a `router` provider that exposes stable profiles (e.g., `router/auto`) as models. |
|
|
14
|
+
| **Per-Turn Routing** | Intelligently chooses between `high`, `medium`, and `low` tiers for every turn based on task intent and complexity. |
|
|
15
|
+
| **Task-Aware Heuristics** | Detects planning vs. implementation vs. lightweight tasks using keyword analysis, word count, and conversation history. |
|
|
16
|
+
| **LLM Intent Classifier** | Optionally use a fast model to categorize intent (overrides heuristics). |
|
|
17
|
+
| **Custom Rules** | Define keyword-based tier overrides for specific patterns (e.g., `deploy` → `high`). |
|
|
18
|
+
| **Context Trigger** | Automatically upgrade to high-tier when token usage exceeds a threshold. |
|
|
19
|
+
| **Cost Budgeting** | Set a session spend limit; high tier downgrades to medium once exceeded. |
|
|
20
|
+
| **Fallback Chains** | Automatic retry with alternative models if the primary choice fails. |
|
|
21
|
+
| **Phase Memory** | Biased stickiness to keep you in the same tier during multi-turn planning or implementation work. |
|
|
22
|
+
| **Thinking Control** | Full control over reasoning/thinking levels per tier and profile. |
|
|
23
|
+
| **Persistent State** | Pins, profiles, costs, and debug history are remembered across agent restarts and conversation branches. |
|
|
24
|
+
| **Ollama Auto-Sync** | Auto-detect and register local Ollama models. |
|
|
25
|
+
| **Rate-Limit Fallback** | Detect HTTP 402/429/503/529 and transparently fall back to local models. |
|
|
26
|
+
| **Feature Toggles** | Enable/disable features at user or project level. |
|
|
27
|
+
| **Progressive Enhancement** | Detects installed plugins (qmd-ledger, agent-bus) and integrates conditionally. |
|
|
28
|
+
|
|
29
|
+
## 📦 Installation
|
|
30
|
+
|
|
31
|
+
### From npm
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
pi install npm:@kylebrodeur/pi-model-router
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
### From source
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
git clone https://github.com/kylebrodeur/pi-model-router.git
|
|
41
|
+
cd pi-model-router
|
|
42
|
+
pi install .
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
### Quick test
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
pi -e ./extensions/index.ts
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
## 🚀 Quick Start
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
# 1. Install the extension
|
|
55
|
+
pi install npm:@kylebrodeur/pi-model-router
|
|
56
|
+
|
|
57
|
+
# 2. Create default config
|
|
58
|
+
/router init
|
|
59
|
+
|
|
60
|
+
# 3. Reload to apply
|
|
61
|
+
/reload
|
|
62
|
+
|
|
63
|
+
# 4. Check status
|
|
64
|
+
/router status
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## ⚙️ Configuration
|
|
68
|
+
|
|
69
|
+
Copy the example config to one of:
|
|
70
|
+
|
|
71
|
+
- `~/.pi/agent/model-router.json` (Global)
|
|
72
|
+
- `.pi/model-router.json` (Project-specific)
|
|
73
|
+
|
|
74
|
+
### Basic Config Shape
|
|
75
|
+
|
|
76
|
+
```json
|
|
77
|
+
{
|
|
78
|
+
"defaultProfile": "auto",
|
|
79
|
+
"classifierModel": "google/gemini-flash-latest",
|
|
80
|
+
"maxSessionBudget": 1.0,
|
|
81
|
+
"profiles": {
|
|
82
|
+
"auto": {
|
|
83
|
+
"high": { "model": "openai/gpt-5.4-pro", "thinking": "high" },
|
|
84
|
+
"medium": { "model": "google/gemini-flash-latest", "thinking": "medium" },
|
|
85
|
+
"low": { "model": "openai/gpt-5.4-nano", "thinking": "low" }
|
|
86
|
+
}
|
|
87
|
+
}
|
|
88
|
+
}
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
### Configuration Fields
|
|
92
|
+
|
|
93
|
+
| Field | Description |
|
|
94
|
+
|-------|-------------|
|
|
95
|
+
| `defaultProfile` | The profile to use when starting a new session. |
|
|
96
|
+
| `classifierModel` | (Optional) Model used to categorize intent. If omitted, fast heuristics are used. |
|
|
97
|
+
| `maxSessionBudget` | (Optional) Session budget in USD. Once spend exceeds it, `high`-tier requests are downgraded to `medium`. |
|
|
98
|
+
| `largeContextThreshold` | (Optional) Token count trigger to force `high` tier for large contexts. |
|
|
99
|
+
| `phaseBias` | (0.0 - 1.0) Stickiness of the current phase. Higher = more stable. Default `0.5`. |
|
|
100
|
+
| `rules` | List of custom keyword rules (e.g. `{ "matches": "deploy", "tier": "high" }`). |
|
|
101
|
+
| `profiles` | Map of profile definitions, each containing `high`, `medium`, and `low` tiers. |
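
The `rules` field above can be pictured with a small matcher. This is a minimal sketch, not the extension's actual code: the `KeywordRule` type and `matchRule` function are hypothetical names, and case-insensitive substring matching with first-match-wins is an assumption; only the `matches` and `tier` keys come from the table.

```typescript
// Hypothetical types: only the `matches` and `tier` keys appear in the README.
type Tier = "high" | "medium" | "low";

interface KeywordRule {
  matches: string;
  tier: Tier;
}

// Returns the tier of the first rule whose keyword occurs in the prompt,
// or undefined when no rule applies. Case-insensitive substring matching
// is an assumption of this sketch.
function matchRule(prompt: string, rules: KeywordRule[]): Tier | undefined {
  const text = prompt.toLowerCase();
  for (const rule of rules) {
    if (text.indexOf(rule.matches.toLowerCase()) !== -1) {
      return rule.tier;
    }
  }
  return undefined;
}

const rules: KeywordRule[] = [
  { matches: "deploy", tier: "high" },
  { matches: "changelog", tier: "low" },
];

console.log(matchRule("Deploy the service to staging", rules)); // high
console.log(matchRule("What is 2+2?", rules)); // undefined
```

When no rule matches, the router would presumably fall through to the classifier or heuristics described elsewhere in the README.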
|
|
102
|
+
|
|
103
|
+
### Feature Toggles
|
|
104
|
+
|
|
105
|
+
```json
|
|
106
|
+
{
|
|
107
|
+
"features": {
|
|
108
|
+
"ollamaSync": true,
|
|
109
|
+
"rateLimitFallback": true,
|
|
110
|
+
"scopeShim": true,
|
|
111
|
+
"perTurnRouting": true,
|
|
112
|
+
"intentClassifier": false,
|
|
113
|
+
"costBudgeting": true,
|
|
114
|
+
"phaseMemory": true,
|
|
115
|
+
"contextCompression": true,
|
|
116
|
+
"ledgerIntegration": false,
|
|
117
|
+
"agentBusIntegration": false
|
|
118
|
+
}
|
|
119
|
+
}
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
**Priority:** Project config `.pi/model-router.json` overrides user config `~/.pi/agent/model-router.json`. Both override defaults.
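
The priority order can be illustrated with a shallow merge. This is a sketch under assumptions: `mergeFeatures` is a hypothetical helper, the default values shown are placeholders, and the real loader in `extensions/config.ts` may merge more deeply or normalize values differently.

```typescript
// Feature flags keyed by name, as in the "features" object above.
type Features = Record<string, boolean>;

// Placeholder defaults for illustration only.
const defaultFeatures: Features = {
  ollamaSync: true,
  rateLimitFallback: true,
  perTurnRouting: true,
};

// Later sources win: defaults < user config < project config.
function mergeFeatures(
  defaults: Features,
  user: Features = {},
  project: Features = {},
): Features {
  return { ...defaults, ...user, ...project };
}

// User enables everything; the project turns two features off.
const userFeatures: Features = { ollamaSync: true, rateLimitFallback: true, perTurnRouting: true };
const projectFeatures: Features = { ollamaSync: false, rateLimitFallback: false };

const merged = mergeFeatures(defaultFeatures, userFeatures, projectFeatures);
console.log(merged.ollamaSync); // false
console.log(merged.rateLimitFallback); // false
console.log(merged.perTurnRouting); // true
```

A flag that the project config does not mention keeps the user-level value, which is why `perTurnRouting` stays enabled here.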
|
|
123
|
+
|
|
124
|
+
### Progressive Enhancement Configs
|
|
125
|
+
|
|
126
|
+
After installing optional extensions, copy one of these to `.pi/model-router.json`:
|
|
127
|
+
|
|
128
|
+
| File | Feature |
|
|
129
|
+
|------|---------|
|
|
130
|
+
| `model-router.ledger.json` | Log routing decisions to qmd-ledger |
|
|
131
|
+
| `model-router.agent-bus.json` | Publish model changes to agent-bus |
|
|
132
|
+
| `model-router.essential.json` | All integrations enabled |
|
|
133
|
+
|
|
134
|
+
## ⌨️ Commands
|
|
135
|
+
|
|
136
|
+
### Core Router Commands
|
|
137
|
+
|
|
138
|
+
| Command | Description |
|
|
139
|
+
|---------|-------------|
|
|
140
|
+
| `/router` | Show detailed status, current profile, spend, and settings. |
|
|
141
|
+
| `/router status` | Alias for `/router` (show current status). |
|
|
142
|
+
| `/router profile [name]` | Switch to a profile or list available ones (enables router if off). |
|
|
143
|
+
| `/router pin [profile] <tier\|auto>` | Pin a tier (`high`, `medium`, `low`) or release the pin with `auto`, for the current or specified profile. |
|
|
144
|
+
| `/router fix <tier>` | Correct the *last* decision and pin that tier for the current profile. |
|
|
145
|
+
| `/router thinking ...` | Override thinking levels (e.g., `/router thinking low xhigh`). |
|
|
146
|
+
| `/router disable` | Disable the router and switch back to the last non-router model. |
|
|
147
|
+
| `/router widget <on\|off>` | Toggle the persistent state widget (supports `toggle`). |
|
|
148
|
+
| `/router debug <on\|off>` | Toggle turn-by-turn routing notifications (supports `toggle`, `clear`, `show`). |
|
|
149
|
+
| `/router reload` | Hot-reload the configuration JSON. |
|
|
150
|
+
| `/router help` | Show usage help for all subcommands. |
|
|
151
|
+
|
|
152
|
+
### Fork-Added Commands
|
|
153
|
+
|
|
154
|
+
| Command | Feature | Description |
|
|
155
|
+
|---------|---------|-------------|
|
|
156
|
+
| `/router ollama-sync` | ollamaSync | Manually sync Ollama models |
|
|
157
|
+
| `/router fallback` | rateLimitFallback | Switch to fallback model sequence |
|
|
158
|
+
| `/router restore` | rateLimitFallback | Restore cloud model |
|
|
159
|
+
| `/router init` | | Scaffold default config file |
|
|
160
|
+
| `/router scope apply` | scopeShim | Sync router profiles to Pi enabled models |
|
|
161
|
+
| `/router scope reset` | scopeShim | Clear router profiles from Pi enabled list |
|
|
162
|
+
| `/router scope show` | scopeShim | Show current Pi scoped models settings |
|
|
163
|
+
|
|
164
|
+
## 🔒 Requirements
|
|
165
|
+
|
|
166
|
+
- **Pi >= 0.70.2** — Required for `after_provider_response` event (rate limit detection)
|
|
167
|
+
- **Ollama** — Running locally for Ollama sync and fallback
|
|
168
|
+
- **Node.js >= 18** — For TypeScript compilation
|
|
169
|
+
|
|
170
|
+
## 📚 Documentation
|
|
171
|
+
|
|
172
|
+
- [Architecture Guide](docs/ARCHITECTURE.md): Deep dive into routing logic and modular design.
|
|
173
|
+
- [Testing Guide](TESTING.md): Step-by-step testing checklist.
|
|
174
|
+
- [Contributing Guide](CONTRIBUTING.md): How to contribute back to upstream.
|
|
175
|
+
- [Quick Start](QUICKSTART.md): 6-step install guide.
|
|
176
|
+
- [Sample Configuration](model-router.example.json): Diverse profile examples (`cheap`, `deep`, `balanced`).
|
|
177
|
+
- [Learnings](LEARNINGS.md): Development insights and best practices.
|
|
178
|
+
|
|
179
|
+
## 🤝 Contributing
|
|
180
|
+
|
|
181
|
+
See [CONTRIBUTING.md](CONTRIBUTING.md) for:
|
|
182
|
+
|
|
183
|
+
- How to split changes into upstream-friendly PRs
|
|
184
|
+
- Commit message conventions
|
|
185
|
+
- Testing requirements
|
|
186
|
+
- What to say in PR descriptions
|
|
187
|
+
|
|
188
|
+
## 📜 License
|
|
189
|
+
|
|
190
|
+
MIT — See [LICENSE](LICENSE).
|
|
191
|
+
|
|
192
|
+
## 🙏 Acknowledgements
|
|
193
|
+
|
|
194
|
+
- **[yeliu84/pi-model-router](https://github.com/yeliu84/pi-model-router)**: The original author and architecture behind the `router` provider.
|
|
195
|
+
- **[shouvik12/trooper](https://github.com/shouvik12/trooper)**: Inspiration for robust, transparent HTTP rate-limit fallback triggers.
|
package/TESTING.md
ADDED
|
@@ -0,0 +1,374 @@
|
|
|
1
|
+
# 🧪 Testing Checklist - pi-model-router Fork
|
|
2
|
+
|
|
3
|
+
## Pre-requisites
|
|
4
|
+
|
|
5
|
+
- [ ] **Pi updated** to 0.67+ (for `after_provider_response` event support)
|
|
6
|
+
- [ ] **Node.js** 18+ available
|
|
7
|
+
- [ ] **Ollama installed and running**
|
|
8
|
+
- [ ] At least one Ollama model pulled locally
|
|
9
|
+
|
|
10
|
+
## Installation Steps
|
|
11
|
+
|
|
12
|
+
```bash
|
|
13
|
+
# Install from GitHub
|
|
14
|
+
pi install git:github.com/kylebrodeur/pi-model-router@main
|
|
15
|
+
|
|
16
|
+
# Start Pi session
|
|
17
|
+
pi
|
|
18
|
+
|
|
19
|
+
# Scaffold config
|
|
20
|
+
/router init
|
|
21
|
+
/router reload
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
## Test 1: Basic Router Functionality (Upstream Baseline)
|
|
25
|
+
|
|
26
|
+
**Goal:** Verify upstream routing still works with our changes.
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
# Start Pi with router extension
|
|
30
|
+
pi
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
- [ ] Pi starts without errors
|
|
34
|
+
- [ ] Console shows `[router] Feature sync complete`
|
|
35
|
+
- [ ] Status bar shows router info
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
# In Pi TUI
|
|
39
|
+
/router status
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
- [ ] Shows current profile, tier, model
|
|
43
|
+
- [ ] Shows `Router enabled: true`
|
|
44
|
+
|
|
45
|
+
### Test Routing Decision
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
# Switch to router profile
|
|
49
|
+
/router profile balanced
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
- [ ] Model switches to `router/balanced`
|
|
53
|
+
- [ ] Status bar updates
|
|
54
|
+
|
|
55
|
+
```
|
|
56
|
+
# Ask a simple question (should route to low tier)
|
|
57
|
+
What is 2+2?
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
- [ ] Request uses low-tier model
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
# Ask a complex design question (should route to high tier)
|
|
64
|
+
Design a full-stack app architecture with React, Node, and PostgreSQL
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
- [ ] Request uses high-tier model
|
|
68
|
+
|
|
69
|
+
## Test 2: Ollama Auto-Sync
|
|
70
|
+
|
|
71
|
+
**Goal:** Verify new Ollama models are detected and added.
|
|
72
|
+
|
|
73
|
+
### Step 1: Pull a New Model
|
|
74
|
+
|
|
75
|
+
```bash
|
|
76
|
+
# In another terminal
|
|
77
|
+
ollama pull llama3.2:3b
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
### Step 2: Trigger Sync in Pi
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
# In Pi TUI
|
|
84
|
+
/router ollama-sync
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
**Expected:**
|
|
88
|
+
|
|
89
|
+
- [ ] Notification: `[Router] Added 1 model(s)`
|
|
90
|
+
- [ ] Notification: `Run /reload to see: llama3.2:3b`
|
|
91
|
+
|
|
92
|
+
### Step 3: Reload and Verify
|
|
93
|
+
|
|
94
|
+
```
|
|
95
|
+
/reload
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
**Expected:**
|
|
99
|
+
|
|
100
|
+
- [ ] Extension reloads
|
|
101
|
+
- [ ] New model appears in `/model` selector under "ollama"
|
|
102
|
+
|
|
103
|
+
### Step 4: Auto-Sync on Session Start
|
|
104
|
+
|
|
105
|
+
```
|
|
106
|
+
/model # Switch to a non-router model
|
|
107
|
+
/new # Start new session
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
**Expected:**
|
|
111
|
+
|
|
112
|
+
- [ ] On session start, Ollama sync runs automatically
|
|
113
|
+
- [ ] If new models exist, notifications appear
|
|
114
|
+
- [ ] Console log: `[router] ollama-sync: feature enabled`
|
|
115
|
+
|
|
116
|
+
### Step 5: Project-Level Config Override
|
|
117
|
+
|
|
118
|
+
```bash
|
|
119
|
+
# In your project directory
|
|
120
|
+
mkdir -p .pi && echo '{ "features": { "ollamaSync": false } }' > .pi/model-router.json
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
```
|
|
124
|
+
# In Pi TUI (from project directory)
|
|
125
|
+
/router
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
**Expected:**
|
|
129
|
+
|
|
130
|
+
- [ ] Auto-sync doesn't run on session start
|
|
131
|
+
- [ ] Manual `/router ollama-sync` still works
|
|
132
|
+
|
|
133
|
+
## Test 3: Rate Limit Fallback
|
|
134
|
+
|
|
135
|
+
**Goal:** Verify rate limit detection and manual fallback.
|
|
136
|
+
|
|
137
|
+
### Step 1: Verify Rate Limit Monitoring
|
|
138
|
+
|
|
139
|
+
**Note:** Real rate limits are hard to trigger intentionally. This requires actual API usage.
|
|
140
|
+
|
|
141
|
+
```
|
|
142
|
+
# In Pi TUI, ensure tracking is active
|
|
143
|
+
/router status
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
**Expected:**
|
|
147
|
+
|
|
148
|
+
- [ ] Status output includes rate limit or fallback info if relevant.
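
Since real rate limits are hard to trigger on demand, the status-code check itself can be exercised in isolation. The sketch below is an assumption about how such a check might look, not the code in `extensions/rate-limit.ts`. `isRateLimitStatus` is a hypothetical name, and the status list is taken from the README's feature table (402/429/503/529).

```typescript
// HTTP statuses the README lists as fallback triggers.
const RATE_LIMIT_STATUSES: number[] = [402, 429, 503, 529];

// Returns true when a provider response status should trigger fallback.
function isRateLimitStatus(status: number): boolean {
  return RATE_LIMIT_STATUSES.indexOf(status) !== -1;
}

// Simulated provider responses instead of real API traffic.
const simulated = [200, 402, 429, 500, 503, 529];
for (const status of simulated) {
  console.log(status, isRateLimitStatus(status));
}
```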
|
|
149
|
+
|
|
150
|
+
### Step 2: Manual Fallback
|
|
151
|
+
|
|
152
|
+
```
|
|
153
|
+
# Switch to a cloud model first
|
|
154
|
+
/model anthropic/claude-sonnet-4
|
|
155
|
+
|
|
156
|
+
# Then trigger manual fallback
|
|
157
|
+
/router fallback
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
**Expected:**
|
|
161
|
+
|
|
162
|
+
- [ ] Switches to the first available model in the configured fallback sequence
|
|
163
|
+
- [ ] Status bar shows `🏠 fallback`
|
|
164
|
+
- [ ] Console log shows fallback model name
|
|
165
|
+
|
|
166
|
+
### Step 3: Restore
|
|
167
|
+
|
|
168
|
+
```
|
|
169
|
+
/router restore
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
**Expected:**
|
|
173
|
+
|
|
174
|
+
- [ ] Restores to original cloud model
|
|
175
|
+
- [ ] Status bar clears `🏠 fallback`
|
|
176
|
+
|
|
177
|
+
### Step 4: Feature Disabled
|
|
178
|
+
|
|
179
|
+
```bash
|
|
180
|
+
echo '{ "features": { "rateLimitFallback": false } }' > ~/.pi/agent/model-router.json
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
```
|
|
184
|
+
/reload
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
**Expected:**
|
|
188
|
+
|
|
189
|
+
- [ ] Console shows `[router] rate-limit-fallback: disabled`
|
|
190
|
+
- [ ] Status bar still shows router info but no fallback indicator
|
|
191
|
+
|
|
192
|
+
## Test 4: Feature Toggles (Config Merging)
|
|
193
|
+
|
|
194
|
+
### Step 1: User-Level Config
|
|
195
|
+
|
|
196
|
+
```bash
|
|
197
|
+
cat > ~/.pi/agent/model-router.json << 'EOF'
|
|
198
|
+
{
|
|
199
|
+
"features": {
|
|
200
|
+
"ollamaSync": true,
|
|
201
|
+
"rateLimitFallback": true,
|
|
202
|
+
"perTurnRouting": true
|
|
203
|
+
}
|
|
204
|
+
}
|
|
205
|
+
EOF
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
### Step 2: Project-Level Override
|
|
209
|
+
|
|
210
|
+
```bash
|
|
211
|
+
mkdir -p .pi
|
|
212
|
+
cat > .pi/model-router.json << 'EOF'
|
|
213
|
+
{
|
|
214
|
+
"features": {
|
|
215
|
+
"ollamaSync": false,
|
|
216
|
+
"rateLimitFallback": false
|
|
217
|
+
}
|
|
218
|
+
}
|
|
219
|
+
EOF
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
### Step 3: Verify
|
|
223
|
+
|
|
224
|
+
```
|
|
225
|
+
# In Pi TUI from project directory
|
|
226
|
+
/router
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
**Expected:**
|
|
230
|
+
|
|
231
|
+
- [ ] Console shows `[router] rate-limit-fallback: disabled`
|
|
232
|
+
- [ ] Console shows `[router] ollama-sync: disabled`
|
|
233
|
+
- [ ] Per-turn routing still works (project didn't override it)
|
|
234
|
+
|
|
235
|
+
## Test 5: Edge Cases
|
|
236
|
+
|
|
237
|
+
### Missing Ollama
|
|
238
|
+
|
|
239
|
+
```
|
|
240
|
+
# Stop Ollama
|
|
241
|
+
pkill ollama # stops the server; use `sudo systemctl stop ollama` if it runs as a systemd service
|
|
242
|
+
|
|
243
|
+
# Try sync
|
|
244
|
+
/router ollama-sync
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
**Expected:**
|
|
248
|
+
|
|
249
|
+
- [ ] Notification: `Ollama not available`
|
|
250
|
+
- [ ] No crash, graceful failure
|
|
251
|
+
|
|
252
|
+
### Missing models.json
|
|
253
|
+
|
|
254
|
+
```bash
|
|
255
|
+
mv ~/.pi/agent/models.json ~/.pi/agent/models.json.bak
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
```
|
|
259
|
+
/router ollama-sync
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
**Expected:**
|
|
263
|
+
|
|
264
|
+
- [ ] Notification: `models.json not found`
|
|
265
|
+
|
|
266
|
+
```bash
|
|
267
|
+
# Restore
|
|
268
|
+
mv ~/.pi/agent/models.json.bak ~/.pi/agent/models.json
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
### No Ollama Models Configured
|
|
272
|
+
|
|
273
|
+
```bash
|
|
274
|
+
# Temporarily remove the ollama provider section from ~/.pi/agent/models.json (keep a backup copy)
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
```
|
|
278
|
+
/router fallback
|
|
279
|
+
```
|
|
280
|
+
|
|
281
|
+
**Expected:**
|
|
282
|
+
|
|
283
|
+
- [ ] `No Ollama models available` error notification
|
|
284
|
+
|
|
285
|
+
## Test 6: Combined Features (Full Workflow)
|
|
286
|
+
|
|
287
|
+
**Goal:** Verify all features work together.
|
|
288
|
+
|
|
289
|
+
```bash
|
|
290
|
+
# Pull new model
|
|
291
|
+
ollama pull qwen3.5:7b
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
```
|
|
295
|
+
# Pi session
|
|
296
|
+
/new
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
**Expected:**
|
|
300
|
+
|
|
301
|
+
1. [ ] Session starts
|
|
302
|
+
2. [ ] Auto-sync detects `qwen3.5:7b`
|
|
303
|
+
3. [ ] Notification: `Added 1 model(s)`
|
|
304
|
+
4. [ ] `/reload` prompt shown
|
|
305
|
+
5. [ ] After `/reload`, routing profiles work
|
|
306
|
+
6. [ ] `/router profile balanced` switches to router
|
|
307
|
+
7. [ ] Simple query routes to low tier (possibly Ollama)
|
|
308
|
+
8. [ ] Complex query routes to high tier (cloud)
|
|
309
|
+
|
|
310
|
+
## Regression Tests (Upstream Feature Checklist)
|
|
311
|
+
|
|
312
|
+
- [ ] `/router` shows status
|
|
313
|
+
- [ ] `/router profile <name>` switches profiles
|
|
314
|
+
- [ ] `/router pin high` pins tier
|
|
315
|
+
- [ ] `/router pin auto` releases pin
|
|
316
|
+
- [ ] `/router disable` disables router
|
|
317
|
+
- [ ] `/router debug toggle` enables debug
|
|
318
|
+
- [ ] `/router widget toggle` toggles widget
|
|
319
|
+
- [ ] `/router reload` reloads config
|
|
320
|
+
- [ ] Routing decisions shown in UI (if debug enabled)
|
|
321
|
+
- [ ] Cost tracking works (if budgeting enabled)
|
|
322
|
+
- [ ] Cost budget forces downgrade when exceeded
|
|
323
|
+
- [ ] Phase memory works across turns
|
|
324
|
+
- [ ] Custom rules match and override tiers
|
|
325
|
+
- [ ] Context trigger forces high tier on large contexts
|
|
326
|
+
|
|
327
|
+
## Troubleshooting
|
|
328
|
+
|
|
329
|
+
### Extension Not Loading
|
|
330
|
+
|
|
331
|
+
```bash
|
|
332
|
+
# Check file paths
|
|
333
|
+
ls -la ~/.pi/agent/extensions/pi-model-router/
|
|
334
|
+
# Should see: index.ts, config.ts, types.ts, routing.ts, provider.ts, commands.ts, state.ts, ui.ts, constants.ts, ollama-sync.ts, rate-limit.ts, scope-shim.ts
|
|
335
|
+
```
|
|
336
|
+
|
|
337
|
+
### TypeScript Errors After Update
|
|
338
|
+
|
|
339
|
+
```bash
|
|
340
|
+
cd ~/.pi/agent/extensions/pi-model-router
|
|
341
|
+
npm install # ensure dependencies match
|
|
342
|
+
./node_modules/.bin/tsc --noEmit
|
|
343
|
+
```
|
|
344
|
+
|
|
345
|
+
### Config Not Applying
|
|
346
|
+
|
|
347
|
+
```bash
|
|
348
|
+
# Verify files exist
|
|
349
|
+
cat ~/.pi/agent/model-router.json
|
|
350
|
+
cat .pi/model-router.json 2>/dev/null || echo "No project config"
|
|
351
|
+
```
|
|
352
|
+
|
|
353
|
+
### Rate Limit Not Detected
|
|
354
|
+
|
|
355
|
+
```
|
|
356
|
+
# Requires Pi 0.67+ — check version
|
|
357
|
+
pi --version
|
|
358
|
+
|
|
359
|
+
# If outdated:
|
|
360
|
+
npm install -g @mariozechner/pi-coding-agent
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
### Fallback Model Not In Registry
|
|
364
|
+
|
|
365
|
+
```
|
|
366
|
+
/reload
|
|
367
|
+
/model # Check if new models appear
|
|
368
|
+
```
|
|
369
|
+
|
|
370
|
+
If Ollama models don't appear after `/reload`:
|
|
371
|
+
|
|
372
|
+
1. Check `models.json` has ollama provider section
|
|
373
|
+
2. Verify `baseUrl` is correct: `http://127.0.0.1:11434/v1`
|
|
374
|
+
3. Check Ollama is running: `ollama list` in terminal
|
package/docs/ARCHITECTURE.md
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
# Architecture: Pi Model Router Extension
|
|
2
|
+
|
|
3
|
+
The `pi-model-router` is an extension-first model router for the `pi` coding agent. It registers a custom logical provider (`router`) that exposes "profiles" as models (e.g., `router/auto`). For every turn, the router intelligently selects an underlying concrete model based on task complexity, conversation phase, and user-defined rules.
|
|
4
|
+
|
|
5
|
+
## Core Concepts
|
|
6
|
+
|
|
7
|
+
### 1. Profiles & Tiers
|
|
8
|
+
|
|
9
|
+
The router is organized into **Profiles** (e.g., `auto`, `cheap`, `deep`). Each profile defines three **Tiers**:
|
|
10
|
+
|
|
11
|
+
- **High**: Reserved for architecture, design, complex debugging, and planning. Uses high-reasoning models.
|
|
12
|
+
- **Medium**: The default for standard implementation, multi-file edits, and focused fixes.
|
|
13
|
+
- **Low**: Used for summaries, changelogs, formatting, and simple read-only lookups.
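
As a rough illustration of how prompts might map onto these tiers, here is a keyword and word-count heuristic. It is a sketch only: the keyword lists, the `SHORT_PROMPT_WORDS` threshold, and the function names are assumptions made for this example, and the real heuristics in `extensions/routing.ts` also consider conversation history.

```typescript
type Tier = "high" | "medium" | "low";

// Illustrative keyword lists derived from the tier descriptions above.
const HIGH_KEYWORDS = ["architecture", "design", "debug", "plan"];
const LOW_KEYWORDS = ["summary", "summarize", "changelog", "format"];
// Assumed cutoff: very short prompts are treated as lightweight.
const SHORT_PROMPT_WORDS = 5;

function containsAny(text: string, keywords: string[]): boolean {
  for (const keyword of keywords) {
    if (text.indexOf(keyword) !== -1) return true;
  }
  return false;
}

// High-tier keywords win first, then low-tier signals, else medium.
function heuristicTier(prompt: string): Tier {
  const text = prompt.toLowerCase();
  const wordCount = text.trim().split(/\s+/).length;
  if (containsAny(text, HIGH_KEYWORDS)) return "high";
  if (containsAny(text, LOW_KEYWORDS) || wordCount <= SHORT_PROMPT_WORDS) return "low";
  return "medium";
}

console.log(heuristicTier("Design a full-stack app architecture with React, Node, and PostgreSQL")); // high
console.log(heuristicTier("What is 2+2?")); // low
console.log(heuristicTier("Rename this variable across the module")); // medium
```

Plain substring matching is crude (for example, "format" also matches "information"), which is one reason an optional LLM classifier can be useful.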
|
|
14
|
+
|
|
15
|
+
### 2. Custom Provider Implementation
|
|
16
|
+
|
|
17
|
+
The extension uses `pi.registerProvider` to hook into the `pi` model lifecycle. This ensures that the selected model in the `pi` footer remains stable (e.g., `router/auto`) while the underlying model changes transparently turn-by-turn via the `streamSimple` interception.
|
|
18
|
+
|
|
19
|
+
## Routing Decision Flow
|
|
20
|
+
|
|
21
|
+
For every request sent to a `router/*` model, the following logic is executed:
|
|
22
|
+
|
|
23
|
+
1. **Budget Check**: If a `maxSessionBudget` is configured and the session spend exceeds it, the router automatically downgrades `high` tier requests to `medium`.
|
|
24
|
+
2. **Context Trigger**: If `largeContextThreshold` is exceeded (measured in tokens), the router forces the `high` tier to ensure the model can handle the large context.
|
|
25
|
+
3. **Manual Pin**: If the user has pinned a tier via `/router pin` or `/router fix`, that tier is used.
|
|
26
|
+
4. **Custom Rules**: Keyword-based rules defined in the config are checked against the user prompt.
|
|
27
|
+
5. **LLM Classifier (Optional)**: If `classifierModel` is configured, a fast LLM is called to categorize the user's intent.
|
|
28
|
+
6. **Heuristics (Fallback)**: If the classifier is off or fails, a fast local heuristic (keyword/length/tool-use analysis) is used.
|
|
29
|
+
7. **Biased Stickiness**: The `phaseBias` setting modulates thresholds to keep the router in a consistent phase (e.g., staying in `high` tier during a multi-turn planning session).
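
The steps above can be sketched as a single function. This is one possible reading of the list, not the actual implementation: it treats earlier steps as taking precedence over later ones, applies the budget check as a final cap on `high`, and leaves out `phaseBias` entirely. The `TurnInput` shape and `decideTier` name are hypothetical.

```typescript
type Tier = "high" | "medium" | "low";

// Hypothetical per-turn inputs; field names mirror the config keys where they exist.
interface TurnInput {
  sessionSpendUsd: number;
  maxSessionBudget?: number;
  contextTokens: number;
  largeContextThreshold?: number;
  pinnedTier?: Tier;      // set via /router pin or /router fix
  ruleTier?: Tier;        // tier from a matching custom rule, if any
  classifierTier?: Tier;  // tier from the LLM classifier, if it ran and succeeded
  heuristicTier: Tier;    // local heuristic result, always available
}

function decideTier(input: TurnInput): Tier {
  let tier: Tier;
  const largeContext =
    input.largeContextThreshold !== undefined &&
    input.contextTokens > input.largeContextThreshold;

  if (largeContext) {
    tier = "high"; // context trigger
  } else if (input.pinnedTier) {
    tier = input.pinnedTier; // manual pin
  } else if (input.ruleTier) {
    tier = input.ruleTier; // custom rules
  } else {
    tier = input.classifierTier ?? input.heuristicTier; // classifier, then heuristics
  }

  // Budget check: once spend exceeds the budget, high is downgraded to medium.
  const overBudget =
    input.maxSessionBudget !== undefined &&
    input.sessionSpendUsd > input.maxSessionBudget;
  return overBudget && tier === "high" ? "medium" : tier;
}

const base: TurnInput = { sessionSpendUsd: 0.2, contextTokens: 4000, heuristicTier: "medium" };
console.log(decideTier(base)); // medium
console.log(decideTier({ ...base, pinnedTier: "low" })); // low
console.log(decideTier({ ...base, ruleTier: "high", maxSessionBudget: 0.1 })); // medium
```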
|
|
30
|
+
|
|
31
|
+
## Module Architecture
|
|
32
|
+
|
|
33
|
+
The extension is modularized for maintainability:
|
|
34
|
+
|
|
35
|
+
- `extensions/index.ts`: Orchestrator. Manages state, hooks into `pi` events, and wires modules together.
|
|
36
|
+
- `extensions/provider.ts`: Implements the `router` provider and the delegation/retry loop.
|
|
37
|
+
- `extensions/routing.ts`: Core decision logic, heuristics, and the LLM classifier.
|
|
38
|
+
- `extensions/config.ts`: Loads, merges, and normalizes the JSON configuration.
|
|
39
|
+
- `extensions/commands.ts`: Registers all `/router` subcommands and their autocompletions.
|
|
40
|
+
- `extensions/ui.ts`: Manages the status line and the optional state widget.
|
|
41
|
+
- `extensions/state.ts`: Handles session-persisted state and snapshots.
|
|
42
|
+
- `extensions/types.ts`: Centralized interface and type definitions.
|
|
43
|
+
|
|
44
|
+
## State & Persistence
|
|
45
|
+
|
|
46
|
+
The router state is persisted using `pi.appendEntry` with a custom type `router-state`. This allows the router to:
|
|
47
|
+
|
|
48
|
+
- Restore the active profile and pins across agent relaunches.
|
|
49
|
+
- Maintain independent pins and state for different conversation branches.
|
|
50
|
+
- Track accumulated session costs safely.
|
|
51
|
+
|
|
52
|
+
## Reliability: Fallback Chains
|
|
53
|
+
|
|
54
|
+
Each tier in a profile can define an optional `fallbacks` list. If the primary model fails (e.g., due to rate limits or provider downtime), the router automatically retries the next model in the chain before surfacing an error to the user.
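
A fallback chain of this kind can be sketched as a loop over the primary model followed by its `fallbacks`. The code below is an illustration under assumptions, not the retry loop in `extensions/provider.ts`: `TierConfig`, `callWithFallbacks`, and `fakeCall` are hypothetical, and the model names are placeholders.

```typescript
// Hypothetical tier shape: `model` and `fallbacks` follow the documented config keys.
interface TierConfig {
  model: string;
  fallbacks?: string[];
}

// Tries the primary model, then each fallback in order.
// Rethrows the last error only after the whole chain has failed.
async function callWithFallbacks<T>(
  tier: TierConfig,
  call: (model: string) => Promise<T>,
): Promise<T> {
  const chain = [tier.model, ...(tier.fallbacks ?? [])];
  let lastError: unknown = new Error("empty model chain");
  for (const model of chain) {
    try {
      return await call(model);
    } catch (error) {
      lastError = error; // remember the failure and move on to the next model
    }
  }
  throw lastError;
}

// Simulated provider call: the cloud model fails, the local one succeeds.
async function fakeCall(model: string): Promise<string> {
  if (model.indexOf("cloud/") === 0) {
    throw new Error("HTTP 429 from " + model);
  }
  return "ok from " + model;
}

const tier: TierConfig = { model: "cloud/primary", fallbacks: ["cloud/secondary", "ollama/local"] };
callWithFallbacks(tier, fakeCall).then((result) => console.log(result)); // ok from ollama/local
```

Keeping the last error means the user still sees a meaningful failure when every model in the chain is unavailable.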
|