@kb-labs/llm-router 1.0.0

package/README.md ADDED
# @kb-labs/llm-router

Adaptive LLM Router with tier-based model selection for the KB Labs Platform.

## Overview

LLM Router provides an abstraction layer that isolates plugins from LLM providers and models. Plugins specify **what they need** (tier + capabilities), and the platform decides **how to fulfill it**.

**Key Principles:**

- **Plugin isolation** - Plugins don't know about providers/models
- **User-defined tiers** - Users decide what "small/medium/large" means for them
- **Adaptive resolution** - Platform adapts to available models
- **Simple by default** - Minimal config works out of the box

## Installation

```bash
pnpm add @kb-labs/llm-router
```

## Quick Start

### Plugin Usage (via SDK)

```typescript
import { useLLM } from '@kb-labs/sdk';

// Simple - uses the configured default tier
const llm = useLLM();

// Request a specific tier
const llmSmall = useLLM({ tier: 'small' }); // Simple tasks
const llmLarge = useLLM({ tier: 'large' }); // Complex tasks

// Request capabilities
const llmCoding = useLLM({ tier: 'medium', capabilities: ['coding'] });

// Use the LLM (undefined when no adapter is configured)
if (llm) {
  const result = await llm.complete('Generate commit message');
  console.log(result.content);
}
```

### Configuration

Minimal config in `kb.config.json`:

```json
{
  "platform": {
    "adapters": {
      "llm": "@kb-labs/adapters-openai"
    },
    "adapterOptions": {
      "llm": {
        "tier": "medium",
        "defaultModel": "gpt-4o"
      }
    }
  }
}
```

### Centralized Cache/Stream Defaults

The platform can manage cache/stream defaults centrally via `adapterOptions.llm.executionDefaults`.
Plugins then use plain `useLLM()` and get consistent behavior by default.

```json
{
  "platform": {
    "adapterOptions": {
      "llm": {
        "defaultTier": "medium",
        "executionDefaults": {
          "cache": {
            "mode": "prefer",
            "scope": "segments",
            "ttlSec": 3600
          },
          "stream": {
            "mode": "prefer",
            "fallbackToComplete": true
          }
        }
      }
    }
  }
}
```

Plugin-level override remains available as an escape hatch:

```typescript
const llm = useLLM({
  tier: 'medium',
  execution: {
    cache: { mode: 'require', key: 'mind-rag:v2' }
  }
});
```

Merge priority (lowest to highest; later levels override earlier ones):

1. platform `executionDefaults`
2. plugin `useLLM({ execution })`
3. per-call `llm.complete(..., { execution })`

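Under those rules, a later layer's fields win over an earlier layer's, field by field. A minimal merge sketch (the `ExecutionOptions` shape below is illustrative, not the package's real type):

```typescript
// Illustrative shapes only; the real execution types live in the platform.
interface CacheOptions { mode?: string; scope?: string; ttlSec?: number; key?: string }
interface StreamOptions { mode?: string; fallbackToComplete?: boolean }
interface ExecutionOptions { cache?: CacheOptions; stream?: StreamOptions }

// Merge layers in priority order: later layers override earlier ones,
// one nested level deep, keeping fields the later layer does not set.
function mergeExecution(...layers: Array<ExecutionOptions | undefined>): ExecutionOptions {
  const out: ExecutionOptions = {};
  for (const layer of layers) {
    if (!layer) continue;
    if (layer.cache) out.cache = { ...out.cache, ...layer.cache };
    if (layer.stream) out.stream = { ...out.stream, ...layer.stream };
  }
  return out;
}

const merged = mergeExecution(
  { cache: { mode: 'prefer', ttlSec: 3600 } },       // platform executionDefaults
  { cache: { mode: 'require', key: 'mind-rag:v2' } } // plugin useLLM({ execution })
);
// merged.cache: mode 'require' (plugin wins), ttlSec 3600 (default kept), key 'mind-rag:v2'
```
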
### How To Verify Cache Is Working

Analytics events include the cache outcome and a billing breakdown:

- `llm.cache.hit`
- `llm.cache.miss`
- `llm.cache.bypass`

Completion/tool events also include:

- `cacheReadTokens`
- `cacheWriteTokens`
- `billablePromptTokens`
- `estimatedUncachedCost`
- `estimatedCost`
- `estimatedCacheSavingsUsd`

Practical signals that the cache is working:

- `llm.cache.hit` appears regularly
- `cacheReadTokens > 0`
- `billablePromptTokens < promptTokens`
- `estimatedCacheSavingsUsd > 0`

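Those signals can also be checked mechanically. A sketch, assuming a completion event carries the fields listed above (the event type here is hypothetical, not the package's actual analytics schema):

```typescript
// Hypothetical event shape built from the billing fields listed above.
interface LLMCompletionEvent {
  promptTokens: number;
  cacheReadTokens: number;
  billablePromptTokens: number;
  estimatedCost: number;
  estimatedUncachedCost: number;
}

// True when a single event shows the cache actually saved tokens and money.
function cacheLooksHealthy(e: LLMCompletionEvent): boolean {
  return (
    e.cacheReadTokens > 0 &&
    e.billablePromptTokens < e.promptTokens &&
    e.estimatedUncachedCost - e.estimatedCost > 0
  );
}
```
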
## Tier System

### Tiers are User-Defined Slots

`small` / `medium` / `large` are **NOT** tied to specific models. They are abstract slots that users fill with whatever models they want.

| Tier | Plugin Intent | User Decides |
|------|---------------|--------------|
| `small` | "This task is simple" | "What model for simple stuff" |
| `medium` | "Standard task" | "My workhorse model" |
| `large` | "Complex task, need max quality" | "When I really need quality" |

### Example Configurations

```yaml
# Budget-conscious: everything on mini
small: gpt-4o-mini
medium: gpt-4o-mini
large: gpt-4o-mini

# Standard gradient
small: gpt-4o-mini
medium: gpt-4o
large: o1

# Anthropic-first
small: claude-3-haiku
medium: claude-3.5-sonnet
large: claude-opus-4

# Local-first with cloud fallback
small: ollama/llama-3-8b
medium: ollama/llama-3-70b
large: claude-opus-4
```

## Adaptive Resolution

### Escalation (Silent)

If a plugin requests a **lower** tier than configured, the platform **escalates silently**:

```
Plugin requests: small
Configured:      medium
Result:          medium (no warning)
```

### Degradation (With Warning)

If a plugin requests a **higher** tier than configured, the platform **degrades with a warning**:

```
Plugin requests: large
Configured:      medium
Result:          medium (⚠️ warning logged)
```

### Resolution Table

| Request | Configured | Result | Note |
|---------|------------|--------|------|
| small | small | small | Exact match |
| small | medium | medium | Escalate ✅ |
| small | large | large | Escalate ✅ |
| medium | small | small | Degrade ⚠️ |
| medium | medium | medium | Exact match |
| medium | large | large | Escalate ✅ |
| large | small | small | Degrade ⚠️ |
| large | medium | medium | Degrade ⚠️ |
| large | large | large | Exact match |

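The table collapses to one rule: the configured tier always wins, and the router warns only when that is a downgrade. A sketch of that rule (not the package's actual implementation):

```typescript
type LLMTier = 'small' | 'medium' | 'large';

const TIER_ORDER: Record<LLMTier, number> = { small: 0, medium: 1, large: 2 };

// The configured tier always wins; a downgrade (configured < requested)
// is flagged and logged, an upgrade is silent.
function resolveTier(
  requested: LLMTier,
  configured: LLMTier
): { tier: LLMTier; degraded: boolean } {
  const degraded = TIER_ORDER[configured] < TIER_ORDER[requested];
  if (degraded) {
    console.warn(`LLM tier degraded: requested '${requested}', using '${configured}'`);
  }
  return { tier: configured, degraded };
}
```
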
## Capabilities

Capabilities describe task-specific requirements:

| Capability | Description | Typical Models |
|------------|-------------|----------------|
| `fast` | Lowest latency | gpt-4o-mini, haiku, flash |
| `reasoning` | Complex reasoning | o1, claude-opus |
| `coding` | Code-optimized | claude-sonnet, gpt-4o |
| `vision` | Image support | gpt-4o, claude-sonnet, gemini |

```typescript
// Request with capabilities
const coder = useLLM({ tier: 'medium', capabilities: ['coding'] });
const vision = useLLM({ capabilities: ['vision'] });
```

## API Reference

### Types

```typescript
// Tier (user-defined quality slot)
type LLMTier = 'small' | 'medium' | 'large';

// Capability (task-specific requirements)
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

// Options for useLLM()
interface UseLLMOptions {
  tier?: LLMTier;
  capabilities?: LLMCapability[];
}
```

### Functions

```typescript
// Get LLM with tier selection
function useLLM(options?: UseLLMOptions): ILLM | undefined;

// Check if LLM is available
function isLLMAvailable(): boolean;

// Get configured tier
function getLLMTier(): LLMTier | undefined;
```

### ILLMRouter Interface

```typescript
interface ILLMRouter {
  getConfiguredTier(): LLMTier;
  resolve(options?: UseLLMOptions): LLMResolution;
  hasCapability(capability: LLMCapability): boolean;
  getCapabilities(): LLMCapability[];
}
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        PLUGIN LAYER                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Plugins see ONLY:                                     │  │
│  │   • tier: 'small' | 'medium' | 'large'                │  │
│  │   • capabilities: 'reasoning' | 'coding' | ...        │  │
│  │                                                       │  │
│  │ Plugins DON'T KNOW:                                   │  │
│  │   ✗ Providers (openai, anthropic, google)             │  │
│  │   ✗ Models (gpt-4o, claude-3-opus)                    │  │
│  │   ✗ API keys, endpoints, pricing                      │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│               useLLM({ tier, capabilities })                │
│                              │                              │
├──────────────────────────────┼──────────────────────────────┤
│                       PLATFORM LAYER                        │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ LLM Router                                            │  │
│  │   • Adaptive tier resolution                          │  │
│  │   • Capability matching                               │  │
│  │   • Transparent ILLM delegation                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ ILLM Adapter                                          │  │
│  │ (OpenAI, Anthropic, Google, Ollama, etc.)             │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## Related

- [ADR-0046: LLM Router](../../docs/adr/0046-llm-router.md)
- [LLM Router Plan](../../../../docs/LLM-ROUTER-PLAN.md)
- [@kb-labs/sdk](../../../kb-labs-sdk/packages/sdk/README.md)
- [@kb-labs/core-platform](../core-platform/README.md)

## License

MIT