@kb-labs/llm-router 1.0.0
- package/README.md +308 -0
- package/dist/index.cjs +439 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +255 -0
- package/dist/index.d.ts +255 -0
- package/dist/index.js +434 -0
- package/dist/index.js.map +1 -0
- package/package.json +62 -0
package/README.md
ADDED
@@ -0,0 +1,308 @@
# @kb-labs/llm-router

Adaptive LLM Router with tier-based model selection for the KB Labs Platform.

## Overview

LLM Router provides an abstraction layer that isolates plugins from LLM providers and models. Plugins specify **what they need** (tier + capabilities), and the platform decides **how to fulfill it**.

**Key Principles:**

- **Plugin isolation** - Plugins don't know about providers/models
- **User-defined tiers** - Users decide what "small/medium/large" means for them
- **Adaptive resolution** - Platform adapts to available models
- **Simple by default** - Minimal config works out of the box

## Installation

```bash
pnpm add @kb-labs/llm-router
```

## Quick Start

### Plugin Usage (via SDK)

```typescript
import { useLLM } from '@kb-labs/sdk';

// Simple - uses the configured default tier
const llm = useLLM();

// Request a specific tier
const llm = useLLM({ tier: 'small' }); // Simple tasks
const llm = useLLM({ tier: 'large' }); // Complex tasks

// Request capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });

// Use the LLM
if (llm) {
  const result = await llm.complete('Generate commit message');
  console.log(result.content);
}
```

### Configuration

Minimal config in `kb.config.json`:

```json
{
  "platform": {
    "adapters": {
      "llm": "@kb-labs/adapters-openai"
    },
    "adapterOptions": {
      "llm": {
        "tier": "medium",
        "defaultModel": "gpt-4o"
      }
    }
  }
}
```

### Centralized Cache/Stream Defaults

The platform can manage cache/stream defaults centrally via `adapterOptions.llm.executionDefaults`. Plugins then use plain `useLLM()` and get consistent behavior by default.

```json
{
  "platform": {
    "adapterOptions": {
      "llm": {
        "defaultTier": "medium",
        "executionDefaults": {
          "cache": {
            "mode": "prefer",
            "scope": "segments",
            "ttlSec": 3600
          },
          "stream": {
            "mode": "prefer",
            "fallbackToComplete": true
          }
        }
      }
    }
  }
}
```

Plugin-level override remains available as an escape hatch:

```typescript
const llm = useLLM({
  tier: 'medium',
  execution: {
    cache: { mode: 'require', key: 'mind-rag:v2' }
  }
});
```

Merge priority (from lowest to highest):

1. platform `executionDefaults`
2. plugin `useLLM({ execution })`
3. per-call `llm.complete(..., { execution })`
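
The three levels above can be sketched as a shallow per-section merge. A minimal sketch, assuming `execution` options are plain objects whose `cache` and `stream` sections merge shallowly (the router's actual merge logic may differ):

```typescript
type CacheMode = 'off' | 'prefer' | 'require';

interface ExecutionOptions {
  cache?: { mode?: CacheMode; scope?: string; ttlSec?: number; key?: string };
  stream?: { mode?: string; fallbackToComplete?: boolean };
}

// Merge execution options in priority order: platform defaults first,
// then plugin-level options, then per-call options (highest priority wins).
function mergeExecution(...layers: (ExecutionOptions | undefined)[]): ExecutionOptions {
  const result: ExecutionOptions = {};
  for (const layer of layers) {
    if (!layer) continue;
    if (layer.cache) result.cache = { ...result.cache, ...layer.cache };
    if (layer.stream) result.stream = { ...result.stream, ...layer.stream };
  }
  return result;
}

const merged = mergeExecution(
  { cache: { mode: 'prefer', ttlSec: 3600 } },        // platform executionDefaults
  { cache: { mode: 'require', key: 'mind-rag:v2' } }, // plugin useLLM({ execution })
);
console.log(merged.cache); // { mode: 'require', ttlSec: 3600, key: 'mind-rag:v2' }
```

Note how the plugin layer overrides `mode` while the platform's `ttlSec` survives, because only the keys actually set at a higher level replace lower-level values.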

### How To Verify Cache Is Working

Analytics events include the cache outcome and a billing breakdown:

- `llm.cache.hit`
- `llm.cache.miss`
- `llm.cache.bypass`

Completion/tool events also include:

- `cacheReadTokens`
- `cacheWriteTokens`
- `billablePromptTokens`
- `estimatedUncachedCost`
- `estimatedCost`
- `estimatedCacheSavingsUsd`

Practical signals that the cache is working:

- `llm.cache.hit` appears regularly
- `cacheReadTokens > 0`
- `billablePromptTokens < promptTokens`
- `estimatedCacheSavingsUsd > 0`
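
These checks can be scripted against collected events. A sketch, assuming a hypothetical event shape that carries the token fields listed above (the actual event schema may differ):

```typescript
// Hypothetical shape of a completion event with the billing breakdown.
interface CompletionEvent {
  promptTokens: number;
  cacheReadTokens: number;
  billablePromptTokens: number;
  estimatedCacheSavingsUsd: number;
}

// True when a single event shows the cache actually saving tokens and cost.
function cacheIsWorking(e: CompletionEvent): boolean {
  return (
    e.cacheReadTokens > 0 &&
    e.billablePromptTokens < e.promptTokens &&
    e.estimatedCacheSavingsUsd > 0
  );
}

console.log(cacheIsWorking({
  promptTokens: 1200,
  cacheReadTokens: 900,
  billablePromptTokens: 300,
  estimatedCacheSavingsUsd: 0.002,
})); // true
```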

## Tier System

### Tiers are User-Defined Slots

`small` / `medium` / `large` are **NOT** tied to specific models. They are abstract slots that users fill with whatever models they want.

| Tier | Plugin Intent | User Decides |
|------|---------------|--------------|
| `small` | "This task is simple" | "What model for simple stuff" |
| `medium` | "Standard task" | "My workhorse model" |
| `large` | "Complex task, need max quality" | "When I really need quality" |

### Example Configurations

```yaml
# Budget-conscious: everything on mini
small: gpt-4o-mini
medium: gpt-4o-mini
large: gpt-4o-mini

# Standard gradient
small: gpt-4o-mini
medium: gpt-4o
large: o1

# Anthropic-first
small: claude-3-haiku
medium: claude-3.5-sonnet
large: claude-opus-4

# Local-first with cloud fallback
small: ollama/llama-3-8b
medium: ollama/llama-3-70b
large: claude-opus-4
```

## Adaptive Resolution

### Escalation (Silent)

If a plugin requests a **lower** tier than configured, the platform **escalates silently**:

```
Plugin requests: small
Configured:      medium
Result:          medium (no warning)
```

### Degradation (With Warning)

If a plugin requests a **higher** tier than configured, the platform **degrades with a warning**:

```
Plugin requests: large
Configured:      medium
Result:          medium (⚠️ warning logged)
```

### Resolution Table

| Request | Configured | Result | Note |
|---------|------------|--------|------|
| small | small | small | Exact match |
| small | medium | medium | Escalate ✅ |
| small | large | large | Escalate ✅ |
| medium | small | small | Degrade ⚠️ |
| medium | medium | medium | Exact match |
| medium | large | large | Escalate ✅ |
| large | small | small | Degrade ⚠️ |
| large | medium | medium | Degrade ⚠️ |
| large | large | large | Exact match |
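
The table above reduces to a comparison of tier ranks: the configured tier always wins, and only the warning differs. A minimal sketch (the warning string is illustrative; the actual router may report it differently):

```typescript
type LLMTier = 'small' | 'medium' | 'large';

const TIER_RANK: Record<LLMTier, number> = { small: 0, medium: 1, large: 2 };

// The configured tier is always the result; we only decide whether to warn.
// Requesting below the configured tier escalates silently; requesting
// above it degrades with a warning.
function resolveTier(
  requested: LLMTier,
  configured: LLMTier,
): { tier: LLMTier; warning?: string } {
  if (TIER_RANK[requested] > TIER_RANK[configured]) {
    return {
      tier: configured,
      warning: `Requested '${requested}' but only '${configured}' is configured; degrading.`,
    };
  }
  return { tier: configured }; // exact match or silent escalation
}

console.log(resolveTier('small', 'medium')); // { tier: 'medium' } - silent escalation
console.log(resolveTier('large', 'medium').warning !== undefined); // true - degradation warns
```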

## Capabilities

Capabilities describe task-specific requirements:

| Capability | Description | Typical Models |
|------------|-------------|----------------|
| `fast` | Lowest latency | gpt-4o-mini, haiku, flash |
| `reasoning` | Complex reasoning | o1, claude-opus |
| `coding` | Code-optimized | claude-sonnet, gpt-4o |
| `vision` | Image support | gpt-4o, claude-sonnet, gemini |

```typescript
// Request with capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });
const llm = useLLM({ capabilities: ['vision'] });
```
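
Capability matching boils down to checking that a candidate model advertises every requested capability. A sketch, assuming a hypothetical registry-entry shape (not the package's actual internals):

```typescript
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

// Hypothetical registry entry; real adapter metadata may look different.
interface ModelInfo {
  name: string;
  capabilities: LLMCapability[];
}

// A model satisfies a request when it advertises every requested capability.
function matches(model: ModelInfo, requested: LLMCapability[]): boolean {
  return requested.every((cap) => model.capabilities.includes(cap));
}

const model: ModelInfo = { name: 'gpt-4o', capabilities: ['coding', 'vision'] };
console.log(matches(model, ['coding']));              // true
console.log(matches(model, ['coding', 'reasoning'])); // false
```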

## API Reference

### Types

```typescript
// Tier (user-defined quality slot)
type LLMTier = 'small' | 'medium' | 'large';

// Capability (task-specific requirements)
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

// Options for useLLM()
interface UseLLMOptions {
  tier?: LLMTier;
  capabilities?: LLMCapability[];
}
```

### Functions

```typescript
// Get LLM with tier selection
function useLLM(options?: UseLLMOptions): ILLM | undefined;

// Check if LLM is available
function isLLMAvailable(): boolean;

// Get configured tier
function getLLMTier(): LLMTier | undefined;
```

### ILLMRouter Interface

```typescript
interface ILLMRouter {
  getConfiguredTier(): LLMTier;
  resolve(options?: UseLLMOptions): LLMResolution;
  hasCapability(capability: LLMCapability): boolean;
  getCapabilities(): LLMCapability[];
}
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        PLUGIN LAYER                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Plugins see ONLY:                                    │  │
│  │  • tier: 'small' | 'medium' | 'large'                 │  │
│  │  • capabilities: 'reasoning' | 'coding' | ...         │  │
│  │                                                       │  │
│  │  Plugins DON'T KNOW:                                  │  │
│  │  ✗ Providers (openai, anthropic, google)              │  │
│  │  ✗ Models (gpt-4o, claude-3-opus)                     │  │
│  │  ✗ API keys, endpoints, pricing                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                            │                                │
│               useLLM({ tier, capabilities })                │
│                            │                                │
├────────────────────────────┼────────────────────────────────┤
│                       PLATFORM LAYER                        │
│                            ▼                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                      LLM Router                       │  │
│  │  • Adaptive tier resolution                           │  │
│  │  • Capability matching                                │  │
│  │  • Transparent ILLM delegation                        │  │
│  └───────────────────────────────────────────────────────┘  │
│                            │                                │
│                            ▼                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     ILLM Adapter                      │  │
│  │      (OpenAI, Anthropic, Google, Ollama, etc.)        │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## Related

- [ADR-0046: LLM Router](../../docs/adr/0046-llm-router.md)
- [LLM Router Plan](../../../../docs/LLM-ROUTER-PLAN.md)
- [@kb-labs/sdk](../../../kb-labs-sdk/packages/sdk/README.md)
- [@kb-labs/core-platform](../core-platform/README.md)

## License

MIT