@agentlify/mcp-server 2.0.0

# Router Configuration Guide

Learn how to configure Agentlify routers for optimal performance, cost savings, and reliability.

## 🎯 Overview

Agentlify routers intelligently select the best AI model for each request based on your optimization preferences. This guide covers advanced configuration options and best practices.

## 🧠 Router Modes

### Smart Router Mode (Recommended)

Smart Router mode automatically selects the optimal model for each request based on your configured weights and requirements.

```javascript
// Router automatically selects the best model
const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Analyze this data...' }],
  // No model specified - router decides
});
```

**Benefits:**

- Automatic cost optimization
- Performance-based selection
- Quality-aware routing
- Carbon footprint consideration

**Best for:**

- Production applications
- Cost-sensitive workloads
- Variable request types
- Long-term optimization

### Passthrough Mode

Passthrough mode always routes requests to your specified preferred model, providing consistent behavior.

```javascript
// Always uses the configured preferred model
const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello' }],
  // Uses preferred model (e.g., gpt-4)
});
```

**Benefits:**

- Predictable behavior
- Consistent response style
- Simple debugging
- Model-specific features

**Best for:**

- Testing and development
- Model-specific requirements
- Consistent user experience
- Compliance requirements
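
To make the difference between the two modes concrete, here is a minimal sketch of how a router might resolve the model for a single request. It assumes a `mode` field (the templates later in this guide use `"smartRouter"`), a hypothetical `"passthrough"` mode value, and a hypothetical `preferredModel` field; none of these names beyond `"smartRouter"` and `availableModels` are confirmed Agentlify API.

```javascript
// Hypothetical sketch: resolve which model serves a request in each mode.
// The 'passthrough' mode value and `preferredModel` field are assumptions.
function resolveModel(config, pickBestModel) {
  if (config.mode === 'passthrough') {
    // Passthrough: always the configured model; no scoring involved.
    if (!config.preferredModel) {
      throw new Error('passthrough mode requires a preferredModel');
    }
    return config.preferredModel;
  }
  // Smart routing: delegate to a scoring function over the candidates.
  return pickBestModel(config.availableModels);
}

// Usage with a stand-in scorer that simply takes the first candidate.
const first = (models) => models[0];
console.log(resolveModel({ mode: 'passthrough', preferredModel: 'gpt-4' }, first)); // gpt-4
console.log(resolveModel({ mode: 'smartRouter', availableModels: ['claude-3-haiku', 'gpt-4'] }, first)); // claude-3-haiku
```

Passing the scorer in as a parameter keeps the passthrough path trivially predictable, which is the property that makes it useful for debugging.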

## ⚖️ Optimization Weights

Configure how the router prioritizes different factors when selecting models.

### Weight Configuration

```javascript
{
  costWeight: 0.4,    // 40% - Cost optimization
  latencyWeight: 0.3, // 30% - Speed optimization
  qualityWeight: 0.2, // 20% - Response quality
  carbonWeight: 0.1   // 10% - Environmental impact
}
// Total must equal 1.0
```
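
The exact scoring formula Agentlify uses is not documented here. To illustrate how weights like these can drive selection, the sketch below uses one common approach: min-max normalize each metric across the candidate models, invert the metrics where lower is better (cost, latency, carbon), and take the weighted sum. The model IDs and metric values are made-up placeholders, and `scoreModels` is a hypothetical helper, not Agentlify's algorithm.

```javascript
// Illustrative weighted scoring, not Agentlify's actual algorithm.
// Metric values below are placeholders, not real benchmarks or prices.
const candidates = [
  { id: 'model-a', cost: 30, latency: 2000, quality: 0.95, carbon: 5 },
  { id: 'model-b', cost: 2, latency: 600, quality: 0.8, carbon: 1 },
  { id: 'model-c', cost: 10, latency: 1200, quality: 0.9, carbon: 3 },
];

// Map each value into [0, 1]; `lowerIsBetter` flips the scale so 1 is always best.
// A metric that is identical for every model adds the same amount to every score.
function normalize(values, lowerIsBetter) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => {
    const unit = max === min ? 0 : (v - min) / (max - min);
    return lowerIsBetter ? 1 - unit : unit;
  });
}

// Return the models sorted from highest to lowest weighted score.
function scoreModels(models, weights) {
  const cost = normalize(models.map((m) => m.cost), true);
  const latency = normalize(models.map((m) => m.latency), true);
  const quality = normalize(models.map((m) => m.quality), false);
  const carbon = normalize(models.map((m) => m.carbon), true);
  return models
    .map((m, i) => ({
      id: m.id,
      score:
        weights.costWeight * cost[i] +
        weights.latencyWeight * latency[i] +
        weights.qualityWeight * quality[i] +
        weights.carbonWeight * carbon[i],
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = scoreModels(candidates, {
  costWeight: 0.4, latencyWeight: 0.3, qualityWeight: 0.2, carbonWeight: 0.1,
});
console.log(ranked[0].id); // model-b
```

Because min-max normalization is relative to the candidate list, adding or removing a model can shift how the remaining models rank against each other.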

### Priority Levels

| Priority  | Weight  | Description                  |
| --------- | ------- | ---------------------------- |
| Very High | 0.5-0.7 | Dominant factor in selection |
| High      | 0.3-0.4 | Major consideration          |
| Medium    | 0.2-0.3 | Moderate influence           |
| Low       | 0.1-0.2 | Minor factor                 |

### Common Configurations

#### Cost-Optimized (Startups/High Volume)

```javascript
{
  costWeight: 0.6,     // Primary focus on cost savings
  qualityWeight: 0.25, // Maintain reasonable quality
  latencyWeight: 0.1,  // Speed less important
  carbonWeight: 0.05   // Minimal environmental focus
}
```

#### Performance-Optimized (Real-time Apps)

```javascript
{
  latencyWeight: 0.5, // Speed is critical
  qualityWeight: 0.3, // Good quality needed
  costWeight: 0.15,   // Cost secondary
  carbonWeight: 0.05  // Environmental consideration
}
```

#### Quality-Optimized (Content Creation)

```javascript
{
  qualityWeight: 0.5, // Best possible responses
  costWeight: 0.25,   // Reasonable cost control
  latencyWeight: 0.2, // Acceptable speed
  carbonWeight: 0.05  // Environmental awareness
}
```

#### Balanced (General Purpose)

```javascript
{
  costWeight: 0.3,     // Moderate cost focus
  qualityWeight: 0.3,  // Good quality
  latencyWeight: 0.25, // Reasonable speed
  carbonWeight: 0.15   // Environmental responsibility
}
```

## 🎯 Model Selection

### Available Models

Agentlify supports 100+ models across multiple providers:

#### OpenAI Models

- **GPT-4**: Highest quality, higher cost
- **GPT-4 Turbo**: Balanced quality and speed
- **GPT-3.5 Turbo**: Fast and cost-effective

#### Anthropic Models

- **Claude-3 Opus**: Premium quality
- **Claude-3 Sonnet**: Balanced performance
- **Claude-3 Haiku**: Fast and efficient

#### Google Models

- **Gemini Pro**: High-quality reasoning
- **Gemini Flash**: Fast responses

#### Open Source Models

- **Llama-2**: Cost-effective, privacy-focused
- **Mistral**: European (EU-based) provider, suited to GDPR-sensitive workloads
- **CodeLlama**: Specialized for code

### Model Selection Strategy

```javascript
// Recommended starter set (3-5 models)
availableModels: [
  'gpt-4',           // High quality
  'gpt-3.5-turbo',   // Cost effective
  'claude-3-sonnet', // Alternative high quality
  'gemini-pro',      // Google's offering
  'llama-2-70b',     // Open source option
];
```
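
The advice to include models from different providers is easy to check mechanically. This sketch infers a model's vendor from its ID prefix; the prefix table is an assumption built only from the model IDs used in this guide, so extend it for the models you actually enable. `providerOf` and `providerCoverage` are hypothetical helpers.

```javascript
// Hypothetical helper: infer each model's vendor from its ID prefix.
// The prefix table covers only the model IDs that appear in this guide.
const PROVIDER_PREFIXES = [
  ['gpt-', 'openai'],
  ['claude-', 'anthropic'],
  ['gemini-', 'google'],
  ['llama-', 'meta'],
  ['codellama-', 'meta'],
  ['mistral-', 'mistral'],
];

function providerOf(modelId) {
  const match = PROVIDER_PREFIXES.find(([prefix]) => modelId.startsWith(prefix));
  return match ? match[1] : 'unknown';
}

// Count distinct vendors so a single-vendor outage cannot remove every candidate.
function providerCoverage(models) {
  return new Set(models.map(providerOf)).size;
}

const starterSet = ['gpt-4', 'gpt-3.5-turbo', 'claude-3-sonnet', 'gemini-pro', 'llama-2-70b'];
console.log(providerCoverage(starterSet)); // 4
```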

### Model Categories by Use Case

#### Code Generation

```javascript
recommendedModels: [
  'gpt-4',           // Best for complex code
  'claude-3-sonnet', // Good reasoning
  'codellama-34b',   // Specialized for code
  'gpt-3.5-turbo',   // Fast iterations
];
```

#### Content Writing

```javascript
recommendedModels: [
  'gpt-4',         // Creative writing
  'claude-3-opus', // Long-form content
  'gemini-pro',    // Research-based content
  'mistral-large', // European compliance
];
```

#### Data Analysis

```javascript
recommendedModels: [
  'gpt-4',           // Complex analysis
  'claude-3-sonnet', // Structured thinking
  'gemini-pro',      // Mathematical reasoning
  'gpt-3.5-turbo',   // Quick insights
];
```

#### Customer Support

```javascript
recommendedModels: [
  'gpt-3.5-turbo',  // Fast responses
  'claude-3-haiku', // Efficient handling
  'gemini-flash',   // Quick turnaround
  'llama-2-13b',    // Cost-effective
];
```

## 🛡️ Fallback Configuration

### Basic Fallback Setup

```javascript
{
  enableFallback: true,
  retryAttempts: 2,
  fallbackModels: [
    'gpt-3.5-turbo', // Reliable fallback
    'claude-3-haiku' // Secondary fallback
  ]
}
```
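
This guide does not spell out exactly how `retryAttempts` interacts with `fallbackModels`. The sketch below picks one plausible reading: the router's chosen model is tried first, and each retry moves on to the next fallback model until `retryAttempts` retries are used up. `completeWithFallback` and `callModel` are hypothetical; `callModel` stands in for whatever actually sends the request.

```javascript
// One plausible fallback loop; Agentlify's server-side behavior may differ.
// `callModel(model, request)` is a hypothetical async function you supply.
async function completeWithFallback(config, primaryModel, request, callModel) {
  const chain = config.enableFallback
    ? [primaryModel, ...(config.fallbackModels ?? [])]
    : [primaryModel];
  // One initial try plus up to `retryAttempts` retries, capped by the chain length.
  const retries = config.enableFallback ? (config.retryAttempts ?? 0) : 0;
  const maxTries = Math.min(chain.length, 1 + retries);

  const errors = [];
  for (let i = 0; i < maxTries; i++) {
    try {
      const response = await callModel(chain[i], request);
      return { model: chain[i], response, errors };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      errors.push({ model: chain[i], message: err.message });
    }
  }
  throw new Error(`All ${maxTries} attempts failed: ${JSON.stringify(errors)}`);
}

// Usage with a fake caller where the primary model is rate limited.
const fakeCall = async (model) => {
  if (model === 'gpt-4') throw new Error('429 rate limited');
  return `answer from ${model}`;
};

completeWithFallback(
  { enableFallback: true, retryAttempts: 2, fallbackModels: ['gpt-3.5-turbo', 'claude-3-haiku'] },
  'gpt-4',
  { messages: [{ role: 'user', content: 'Hello' }] },
  fakeCall,
).then((result) => console.log(result.model)); // gpt-3.5-turbo
```

Returning the collected `errors` alongside a successful response makes it easy to count how often fallback actually fires.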

### Advanced Fallback Strategies

#### Tiered Fallback

```javascript
{
  enableFallback: true,
  retryAttempts: 3,
  fallbackModels: [
    'gpt-4',         // Try premium first
    'gpt-3.5-turbo', // Then cost-effective
    'llama-2-70b'    // Finally open source
  ]
}
```

#### Provider Diversification

```javascript
{
  enableFallback: true,
  retryAttempts: 2,
  fallbackModels: [
    'claude-3-sonnet', // Different provider
    'gemini-pro',      // Another provider
    'gpt-3.5-turbo'    // Reliable backup
  ]
}
```

### Fallback Triggers

- **Rate Limits**: Provider rate limit reached; the model is temporarily unavailable
- **Timeouts**: Model taking too long to respond
- **Errors**: Model returning error responses
- **Overload**: Model capacity exceeded
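
A client that mirrors these triggers has to decide which failures are worth falling back on. This sketch maps a failed request to one of the four trigger categories using HTTP status codes. The mapping (429 for rate limits, 408/504 for timeouts, 503 and 529 for overload, other 5xx for errors) follows common provider conventions and is an assumption, not Agentlify's documented behavior. Client errors such as 400 or 401 return `null` because a different model will usually reject the same malformed or unauthorized request.

```javascript
// Hypothetical mapping from a failed request to a fallback trigger.
// Returns null when falling back is unlikely to help (e.g. 400 Bad Request).
function fallbackTrigger(error) {
  if (error.name === 'TimeoutError') return 'timeout';
  const status = error.status;
  if (status === 429) return 'rateLimit';
  if (status === 408 || status === 504) return 'timeout';
  // 529 is used by some providers to signal "overloaded".
  if (status === 503 || status === 529) return 'overload';
  if (status >= 500 && status < 600) return 'error';
  return null;
}

const examples = [
  { status: 429 },
  { status: 503 },
  { status: 500 },
  { status: 400 },
  { name: 'TimeoutError' },
];
console.log(examples.map(fallbackTrigger)); // [ 'rateLimit', 'overload', 'error', null, 'timeout' ]
```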

## 🔧 Advanced Settings

### Request Requirements

#### Maximum Latency

```javascript
{
  maxLatencyMs: 5000, // Reject models slower than 5s
  // Router will only consider models expected to respond within this limit
}
```

#### Maximum Cost

```javascript
{
  maxCostPerToken: 0.02, // Reject models priced above this; confirm whether the limit is per token or per 1K tokens
  // Router will only use cost-effective options
}
```

#### Capability Requirements

```javascript
{
  functionCalling: true,  // Require function calling support
  structuredOutput: true, // Require JSON mode
  fileUpload: false,      // Don't need file upload
  streaming: true         // Require streaming support
}
```
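
Requirements like these behave as hard filters applied before any weighting: a model that violates one is removed from consideration rather than scored lower. The sketch below shows that filtering step over a hypothetical model catalog; the catalog fields (`p95LatencyMs`, `costPerToken`, `capabilities`) and their values are placeholders invented for the example. A capability set to `false` is treated as "not required", matching the `fileUpload: false` comment above.

```javascript
// Hypothetical catalog; field names and numbers are placeholders, not real model data.
const catalog = [
  { id: 'fast-small', p95LatencyMs: 800, costPerToken: 0.000002, capabilities: ['functionCalling', 'streaming'] },
  { id: 'big-slow', p95LatencyMs: 9000, costPerToken: 0.00006, capabilities: ['functionCalling', 'structuredOutput', 'streaming'] },
  { id: 'mid', p95LatencyMs: 2500, costPerToken: 0.00001, capabilities: ['functionCalling', 'structuredOutput', 'streaming'] },
];

const CAPABILITY_FLAGS = ['functionCalling', 'structuredOutput', 'fileUpload', 'streaming'];

// Keep only models that satisfy every hard requirement in `req`.
function eligibleModels(models, req) {
  // Flags set to true are required; false or missing means "don't care".
  const required = CAPABILITY_FLAGS.filter((flag) => req[flag] === true);
  return models.filter(
    (m) =>
      (req.maxLatencyMs === undefined || m.p95LatencyMs <= req.maxLatencyMs) &&
      (req.maxCostPerToken === undefined || m.costPerToken <= req.maxCostPerToken) &&
      required.every((flag) => m.capabilities.includes(flag)),
  );
}

const picked = eligibleModels(catalog, {
  maxLatencyMs: 5000,
  functionCalling: true,
  structuredOutput: true,
  fileUpload: false,
  streaming: true,
});
console.log(picked.map((m) => m.id)); // [ 'mid' ]
```

If the filter leaves no models, a router must either fail the request or relax a constraint; that choice is worth making explicitly.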

### Performance Tuning

#### Timeout Configuration

```javascript
{
  timeout: 30, // 30-second timeout (seconds, unlike maxLatencyMs, which is in milliseconds)
  // Requests exceeding this will trigger fallback
}
```
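
On the client side you may also want your own deadline around each call, so a stuck request fails fast and can be retried or routed elsewhere. This generic sketch uses `Promise.race` and converts the seconds-based value to milliseconds; `withTimeout` is a hypothetical helper name. It only stops waiting: the underlying request keeps running unless the call also supports cancellation (for example via an `AbortSignal`).

```javascript
// Generic client-side deadline; rejects with a TimeoutError after `seconds`.
function withTimeout(promise, seconds) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Timed out after ${seconds}s`);
      err.name = 'TimeoutError';
      reject(err);
    }, seconds * 1000);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage: a fake request that takes 50 ms, against 1 s and 0.01 s deadlines.
const slowRequest = () => new Promise((resolve) => setTimeout(() => resolve('done'), 50));

withTimeout(slowRequest(), 1).then(console.log); // done
withTimeout(slowRequest(), 0.01).catch((err) => console.log(err.name)); // TimeoutError
```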

#### Rate Limiting

```javascript
{
  rateLimit: 100, // 100 requests per minute
  // Helps manage costs and prevent abuse
}
```
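
The `rateLimit` value above is part of the router configuration. If you also want to stay under the limit from your own code (for example, to queue work instead of receiving errors), a sliding-window counter is a simple option. This is a generic client-side sketch, not part of any Agentlify SDK; the clock is injectable so the behavior can be tested without waiting.

```javascript
// Generic sliding-window rate limiter: at most `limit` calls per `windowMs`.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 60_000, now = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
    this.timestamps = []; // start times of calls still inside the window
  }

  // Returns true and records the call if it fits; otherwise returns false.
  tryAcquire() {
    const t = this.now();
    // Drop calls that have aged out of the window.
    while (this.timestamps.length > 0 && t - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }
}

// Usage with a fake clock: 100 calls fit, the 101st is refused until a minute passes.
let fakeTime = 0;
const limiter = new SlidingWindowLimiter(100, 60_000, () => fakeTime);
let accepted = 0;
for (let i = 0; i < 101; i++) if (limiter.tryAcquire()) accepted++;
console.log(accepted); // 100
fakeTime = 60_000;
console.log(limiter.tryAcquire()); // true
```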

#### Caching

```javascript
{
  enableCaching: true, // Cache similar requests
  // Improves performance and reduces costs
}
```
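
This guide does not say how Agentlify decides that two requests are "similar". The simplest form of the idea is an exact-match cache: build a key from the parts of the request that determine the answer and reuse the stored response until it expires. The sketch below does that with an in-memory `Map` and a TTL; `ResponseCache` and the choice of key fields are assumptions. It would not catch paraphrased prompts, which require semantic (embedding-based) matching.

```javascript
// Exact-match response cache with a time-to-live; illustrative only.
class ResponseCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Build the key from fields we assume determine the response.
  static keyFor(request) {
    return JSON.stringify({
      model: request.model ?? null,
      messages: request.messages,
      temperature: request.temperature ?? null,
    });
  }

  get(request) {
    const key = ResponseCache.keyFor(request);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(request, value) {
    this.entries.set(ResponseCache.keyFor(request), {
      value,
      expiresAt: this.now() + this.ttlMs,
    });
  }
}

// Usage with a fake clock and a 5-minute TTL.
let clock = 0;
const cache = new ResponseCache(5 * 60_000, () => clock);
const req = { messages: [{ role: 'user', content: 'Hello' }] };
cache.set(req, 'Hi there!');
console.log(cache.get({ messages: [{ role: 'user', content: 'Hello' }] })); // Hi there!
clock = 5 * 60_000;
console.log(cache.get(req)); // undefined
```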

## 📊 Monitoring and Analytics

### Key Metrics to Track

#### Success Metrics

- **Success Rate**: Should be >99%
- **Average Latency**: Track by model and request type
- **Cost Per Request**: Monitor spending patterns
- **Model Distribution**: See which models are selected

#### Performance Indicators

```javascript
// Example analytics data
{
  successRate: 0.995, // 99.5% success
  avgLatencyMs: 1250, // 1.25s average
  avgCostUsd: 0.0015, // $0.0015 per request
  modelDistribution: {
    'gpt-3.5-turbo': 0.6,   // 60% of requests
    'gpt-4': 0.25,          // 25% of requests
    'claude-3-sonnet': 0.15 // 15% of requests
  }
}
```
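
If you log each request yourself, the same indicators can be computed from those records. The sketch below assumes a hypothetical log record shape (`{ model, ok, latencyMs, costUsd }`) and returns an object with the same fields as the example above. Its averages include failed requests; whether failures should count toward latency and cost is a choice you may want to make differently.

```javascript
// Aggregate per-request log records into summary indicators.
// Record shape { model, ok, latencyMs, costUsd } is a hypothetical example.
function summarize(records) {
  const n = records.length;
  if (n === 0) {
    return { successRate: 0, avgLatencyMs: 0, avgCostUsd: 0, modelDistribution: {} };
  }
  const counts = {};
  let successes = 0;
  let latencyTotal = 0;
  let costTotal = 0;
  for (const r of records) {
    if (r.ok) successes++;
    latencyTotal += r.latencyMs;
    costTotal += r.costUsd;
    counts[r.model] = (counts[r.model] ?? 0) + 1;
  }
  // Convert raw counts into fractions of all requests.
  const modelDistribution = {};
  for (const [model, count] of Object.entries(counts)) {
    modelDistribution[model] = count / n;
  }
  return {
    successRate: successes / n,
    avgLatencyMs: latencyTotal / n,
    avgCostUsd: costTotal / n,
    modelDistribution,
  };
}

const stats = summarize([
  { model: 'gpt-3.5-turbo', ok: true, latencyMs: 800, costUsd: 0.001 },
  { model: 'gpt-3.5-turbo', ok: true, latencyMs: 1000, costUsd: 0.001 },
  { model: 'gpt-4', ok: false, latencyMs: 3000, costUsd: 0.004 },
  { model: 'claude-3-sonnet', ok: true, latencyMs: 1200, costUsd: 0.002 },
]);
console.log(stats.successRate, stats.avgLatencyMs); // 0.75 1500
```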

### Optimization Based on Analytics

#### High Cost Issues

```javascript
// If avgCostUsd > target
{
  costWeight: 0.6,     // Increase cost priority
  qualityWeight: 0.25, // Reduce quality slightly
  // Set latencyWeight and carbonWeight so all four weights still sum to 1.0
  // Add more cost-effective models (`existing` = your current availableModels)
  availableModels: [...existing, 'llama-2-70b', 'mistral-7b']
}
```

#### High Latency Issues

```javascript
// If avgLatencyMs > target
{
  latencyWeight: 0.5, // Prioritize speed
  costWeight: 0.3,    // Accept higher costs
  // Set qualityWeight and carbonWeight so all four weights still sum to 1.0
  // Remove slow models (`models` = current list, `slowModels` = models missing your latency target)
  availableModels: models.filter((m) => !slowModels.includes(m))
}
```

#### Low Quality Issues

```javascript
// If quality scores < target
{
  qualityWeight: 0.5, // Prioritize quality
  costWeight: 0.2,    // Accept higher costs
  // Set latencyWeight and carbonWeight so all four weights still sum to 1.0
  // Add premium models (`existing` = your current availableModels)
  availableModels: [...existing, 'gpt-4', 'claude-3-opus']
}
```
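
Each of these adjustments raises one weight, so the others have to shrink to keep the total at 1.0. One simple rule is to scale the remaining weights proportionally, which preserves their relative order. The helper below is a hypothetical sketch of that rule, not an Agentlify function.

```javascript
const WEIGHT_KEYS = ['costWeight', 'qualityWeight', 'latencyWeight', 'carbonWeight'];

// Set one weight to `value` and scale the other weights so the total stays 1.0.
function setWeight(weights, key, value) {
  if (!WEIGHT_KEYS.includes(key)) throw new Error(`Unknown weight: ${key}`);
  if (value < 0 || value > 1) throw new Error('Weight must be between 0 and 1');
  const others = WEIGHT_KEYS.filter((k) => k !== key);
  const othersTotal = others.reduce((sum, k) => sum + (weights[k] ?? 0), 0);
  const result = { [key]: value };
  for (const k of others) {
    // If the other weights are all zero, split the remainder evenly instead.
    result[k] = othersTotal > 0
      ? ((weights[k] ?? 0) / othersTotal) * (1 - value)
      : (1 - value) / others.length;
  }
  return result;
}

// The Balanced profile from this guide, pushed toward cost as in "High Cost Issues".
const balanced = { costWeight: 0.3, qualityWeight: 0.3, latencyWeight: 0.25, carbonWeight: 0.15 };
const costFirst = setWeight(balanced, 'costWeight', 0.6);
console.log(costFirst);
```

Proportional scaling keeps quality ahead of latency and latency ahead of carbon after the change, so the profile's character survives the adjustment.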

## 🔒 Security and Compliance

### Data Governance

#### Regional Compliance

```javascript
{
  availableModels: [
    'mistral-large',   // EU-based for GDPR
    'claude-3-sonnet', // US-based
    // Exclude models not meeting compliance requirements
  ]
}
```

#### Data Retention

```javascript
{
  detailedAnalytics: false, // Disable detailed logging
  enableCaching: false,     // Disable request caching
  // Minimize data retention for sensitive workloads
}
```

### Access Control

#### API Key Management

- Rotate keys regularly
- Use separate keys for different environments
- Monitor key usage patterns
- Implement rate limiting per key

#### Router Isolation

- Separate routers for different teams/projects
- Environment-specific configurations
- Granular access controls

## 🧪 Testing and Validation

### A/B Testing Router Configurations

```javascript
// Configuration A: Cost-optimized
const routerA = {
  costWeight: 0.6,
  qualityWeight: 0.3,
  latencyWeight: 0.1,
};

// Configuration B: Quality-optimized
const routerB = {
  qualityWeight: 0.6,
  costWeight: 0.2,
  latencyWeight: 0.2,
};

// Split traffic and compare metrics
```
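
To split traffic, assign each user to a variant deterministically, so the same user always sees the same configuration and the metrics stay comparable. A common technique is to hash a stable identifier into a bucket from 0 to 99; the sketch below uses the 32-bit FNV-1a hash. The same function covers the canary step in the rollout below if you pass a small percentage such as 5. `assignVariant`, the salt string, and the user IDs are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string (over UTF-16 code units), as an unsigned int.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Deterministically place `userId` in variant B with probability percentB / 100.
// Changing the salt reshuffles users for a new experiment.
function assignVariant(userId, percentB, salt = 'router-experiment-1') {
  const bucket = fnv1a(`${salt}:${userId}`) % 100; // 0..99
  return bucket < percentB ? 'B' : 'A';
}

// Usage: a 50/50 A/B split, then a 5% canary.
const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const shareB = users.filter((u) => assignVariant(u, 50) === 'B').length / users.length;
const shareCanary = users.filter((u) => assignVariant(u, 5) === 'B').length / users.length;
console.log(shareB.toFixed(2), shareCanary.toFixed(2));
```

Because buckets are fixed per user, raising the percentage only adds users to variant B; anyone already in the 5% canary stays in it at 50%.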

### Gradual Rollout Strategy

1. **Development**: Test with small traffic
2. **Staging**: Validate with realistic load
3. **Canary**: Deploy to 5% of production traffic
4. **Full Rollout**: Gradually increase to 100%

### Configuration Validation

```javascript
import assert from 'node:assert/strict';

// Validate configuration before deployment.
// `supportsFunctionCalling(model)` is a helper you provide (e.g. a lookup table).
function validateRouterConfig(config, supportsFunctionCalling) {
  // Check weights sum to 1.0 (a missing weight counts as 0)
  const totalWeight =
    (config.costWeight ?? 0) +
    (config.qualityWeight ?? 0) +
    (config.latencyWeight ?? 0) +
    (config.carbonWeight ?? 0);
  assert(Math.abs(totalWeight - 1.0) < 0.001, `Weights sum to ${totalWeight}, expected 1.0`);

  // Ensure fallback models are available
  const fallbackModels = config.fallbackModels ?? [];
  assert(
    fallbackModels.every((model) => config.availableModels.includes(model)),
    'Every fallback model must also be listed in availableModels',
  );

  // Validate model capabilities
  if (config.functionCalling) {
    assert(
      config.availableModels.some((model) => supportsFunctionCalling(model)),
      'functionCalling is required but no available model supports it',
    );
  }
}
```

## 📋 Configuration Templates

### Startup Template

```javascript
{
  name: "Startup Router",
  mode: "smartRouter",
  costWeight: 0.5,
  qualityWeight: 0.3,
  latencyWeight: 0.15,
  carbonWeight: 0.05,
  availableModels: [
    'gpt-3.5-turbo',
    'claude-3-haiku',
    'llama-2-70b'
  ],
  enableFallback: true,
  fallbackModels: ['gpt-3.5-turbo'],
  retryAttempts: 2
}
```

### Enterprise Template

```javascript
{
  name: "Enterprise Router",
  mode: "smartRouter",
  costWeight: 0.25,
  qualityWeight: 0.4,
  latencyWeight: 0.25,
  carbonWeight: 0.1,
  availableModels: [
    'gpt-4',
    'claude-3-opus',
    'claude-3-sonnet',
    'gemini-pro',
    'gpt-3.5-turbo'
  ],
  enableFallback: true,
  fallbackModels: ['claude-3-sonnet', 'gpt-3.5-turbo'],
  retryAttempts: 3,
  functionCalling: true,
  structuredOutput: true
}
```

### Real-time Template

```javascript
{
  name: "Real-time Router",
  mode: "smartRouter",
  latencyWeight: 0.6,
  qualityWeight: 0.25,
  costWeight: 0.1,
  carbonWeight: 0.05,
  availableModels: [
    'gpt-3.5-turbo',
    'claude-3-haiku',
    'gemini-flash'
  ],
  maxLatencyMs: 3000,
  enableFallback: true,
  fallbackModels: ['gpt-3.5-turbo'],
  retryAttempts: 1
}
```

## 🚀 Best Practices

### Configuration Management

- Version control your router configurations
- Document configuration changes
- Test configurations in staging first
- Monitor metrics after changes

### Model Selection

- Start with 3-5 models for simplicity
- Include models from different providers
- Balance cost, quality, and speed
- Review and optimize your model list regularly

### Fallback Strategy

- Always enable fallback for production
- Use reliable models as fallbacks
- Test fallback scenarios regularly
- Monitor fallback trigger rates

### Performance Optimization

- Adjust weights based on actual usage
- Remove underperforming models
- Add new models as they become available
- Review performance regularly

## 🆘 Troubleshooting

### Common Configuration Issues

**Issue**: Router always selects expensive models

```javascript
// Solution: Increase cost weight
{
  costWeight: 0.6, // Increase from current value
  qualityWeight: 0.3,
  latencyWeight: 0.1
}
```

**Issue**: Responses are too slow

```javascript
// Solution: Prioritize latency, set max latency
{
  latencyWeight: 0.5, // Rebalance the other weights so the total stays 1.0
  maxLatencyMs: 5000, // Reject slow models
  // Remove slow models from available list
}
```

**Issue**: Frequent fallback triggers

```javascript
// Solution: Add more reliable models, increase retry attempts
{
  availableModels: [...existing, 'gpt-3.5-turbo'], // Add reliable model (`existing` = your current list)
  retryAttempts: 3, // Increase retries
  fallbackModels: ['gpt-3.5-turbo', 'claude-3-haiku'] // Multiple fallbacks
}
```

## 🔗 Related Documentation

- **[Getting Started](getting-started.md)** - Basic router setup
- **[NPM Package](npm-package.md)** - Using routers in code
- **[Backend Architecture](backend-architecture.md)** - How routing works
- **[API Reference](api-reference.md)** - Complete API docs

---