@agentlify/mcp-server 2.0.0

@@ -0,0 +1,644 @@
1
+ # Router Configuration Guide
2
+
3
+ Learn how to configure Agentlify routers for optimal performance, cost savings, and reliability.
4
+
5
+ ## 🎯 Overview
6
+
7
+ Agentlify routers intelligently select the best AI model for each request based on your optimization preferences. This guide covers advanced configuration options and best practices.
8
+
9
+ ## 🧠 Router Modes
10
+
11
+ ### Smart Router Mode (Recommended)
12
+
13
+ Smart Router mode automatically selects the optimal model for each request based on your configured weights and requirements.
14
+
15
+ ```javascript
16
+ // Router automatically selects the best model
17
+ const completion = await client.chat.completions.create({
18
+ messages: [{ role: 'user', content: 'Analyze this data...' }],
19
+ // No model specified - router decides
20
+ });
21
+ ```
22
+
23
+ **Benefits:**
24
+
25
+ - Automatic cost optimization
26
+ - Performance-based selection
27
+ - Quality-aware routing
28
+ - Carbon footprint consideration
29
+
30
+ **Best for:**
31
+
32
+ - Production applications
33
+ - Cost-sensitive workloads
34
+ - Variable request types
35
+ - Long-term optimization
36
+
37
+ ### Passthrough Mode
38
+
39
+ Passthrough mode always routes requests to the preferred model you configure, which keeps behavior consistent from request to request.
40
+
41
+ ```javascript
42
+ // Always uses the configured preferred model
43
+ const completion = await client.chat.completions.create({
44
+ messages: [{ role: 'user', content: 'Hello' }],
45
+ // Uses preferred model (e.g., gpt-4)
46
+ });
47
+ ```
48
+
49
+ **Benefits:**
50
+
51
+ - Predictable behavior
52
+ - Consistent response style
53
+ - Simple debugging
54
+ - Model-specific features
55
+
56
+ **Best for:**
57
+
58
+ - Testing and development
59
+ - Model-specific requirements
60
+ - Consistent user experience
61
+ - Compliance requirements
62
+
63
+ ## ⚖️ Optimization Weights
64
+
65
+ Configure how the router prioritizes different factors when selecting models.
66
+
67
+ ### Weight Configuration
68
+
69
+ ```javascript
70
+ {
71
+ costWeight: 0.4, // 40% - Cost optimization
72
+ latencyWeight: 0.3, // 30% - Speed optimization
73
+ qualityWeight: 0.2, // 20% - Response quality
74
+ carbonWeight: 0.1 // 10% - Environmental impact
75
+ }
76
+ // The four weights must sum to 1.0
77
+ ```
78
+
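To make the effect of the weights concrete, here is a minimal sketch of weighted scoring. It is not Agentlify's actual selection algorithm, which this guide does not describe. The candidate names, the metric values, and the `scoreModel` and `pickModel` helpers are all hypothetical, and each metric is assumed to be normalized to 0..1 where 1 is best.

```javascript
// Hypothetical candidates with made-up metrics, normalized so that 1 is best
// (cheapest, fastest, highest quality, lowest carbon).
const candidates = [
  { model: 'model-premium', cost: 0.2, latency: 0.4, quality: 0.95, carbon: 0.3 },
  { model: 'model-budget', cost: 0.9, latency: 0.9, quality: 0.7, carbon: 0.8 },
];

// Weighted sum of the four normalized metrics.
function scoreModel(metrics, weights) {
  return (
    weights.costWeight * metrics.cost +
    weights.latencyWeight * metrics.latency +
    weights.qualityWeight * metrics.quality +
    weights.carbonWeight * metrics.carbon
  );
}

// Return the name of the highest-scoring candidate.
function pickModel(models, weights) {
  if (models.length === 0) throw new Error('No candidate models');
  return models.reduce((best, current) =>
    scoreModel(current, weights) > scoreModel(best, weights) ? current : best,
  ).model;
}

const weights = {
  costWeight: 0.4,
  latencyWeight: 0.3,
  qualityWeight: 0.2,
  carbonWeight: 0.1,
};

console.log(pickModel(candidates, weights)); // logs "model-budget"
```

With these sample numbers the budget model scores 0.85 against 0.42 for the premium one. Shifting most of the weight to `qualityWeight` flips the choice.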
79
+ ### Priority Levels
80
+
81
+ | Priority | Weight | Description |
82
+ | --------- | ------- | ---------------------------- |
83
+ | Very High | 0.5-0.7 | Dominant factor in selection |
84
+ | High | 0.3-0.4 | Major consideration |
85
+ | Medium | 0.2-0.3 | Moderate influence |
86
+ | Low | 0.1-0.2 | Minor factor |
87
+
88
+ ### Common Configurations
89
+
90
+ #### Cost-Optimized (Startups/High Volume)
91
+
92
+ ```javascript
93
+ {
94
+ costWeight: 0.6, // Primary focus on cost savings
95
+ qualityWeight: 0.25, // Maintain reasonable quality
96
+ latencyWeight: 0.1, // Speed less important
97
+ carbonWeight: 0.05 // Minimal environmental focus
98
+ }
99
+ ```
100
+
101
+ #### Performance-Optimized (Real-time Apps)
102
+
103
+ ```javascript
104
+ {
105
+ latencyWeight: 0.5, // Speed is critical
106
+ qualityWeight: 0.3, // Good quality needed
107
+ costWeight: 0.15, // Cost secondary
108
+ carbonWeight: 0.05 // Environmental consideration
109
+ }
110
+ ```
111
+
112
+ #### Quality-Optimized (Content Creation)
113
+
114
+ ```javascript
115
+ {
116
+ qualityWeight: 0.5, // Best possible responses
117
+ costWeight: 0.25, // Reasonable cost control
118
+ latencyWeight: 0.2, // Acceptable speed
119
+ carbonWeight: 0.05 // Environmental awareness
120
+ }
121
+ ```
122
+
123
+ #### Balanced (General Purpose)
124
+
125
+ ```javascript
126
+ {
127
+ costWeight: 0.3, // Moderate cost focus
128
+ qualityWeight: 0.3, // Good quality
129
+ latencyWeight: 0.25, // Reasonable speed
130
+ carbonWeight: 0.15 // Environmental responsibility
131
+ }
132
+ ```
133
+
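Hand-edited weights can easily drift away from a total of 1.0. The sketch below rescales a set of raw priorities so they sum to 1.0. The `normalizeWeights` helper is hypothetical and not part of the Agentlify package. Treating a missing weight as 0 is an assumption made here for convenience.

```javascript
const WEIGHT_KEYS = ['costWeight', 'latencyWeight', 'qualityWeight', 'carbonWeight'];

// Rescale raw, non-negative priorities so the four weights sum to 1.0.
// Assumption: a weight that is not provided counts as 0.
function normalizeWeights(raw) {
  const values = WEIGHT_KEYS.map((key) => raw[key] ?? 0);
  if (values.some((v) => typeof v !== 'number' || !Number.isFinite(v) || v < 0)) {
    throw new RangeError('Weights must be finite, non-negative numbers');
  }
  const total = values.reduce((sum, v) => sum + v, 0);
  if (total <= 0) {
    throw new RangeError('At least one weight must be positive');
  }
  return Object.fromEntries(WEIGHT_KEYS.map((key, i) => [key, values[i] / total]));
}

// Raw priorities on an arbitrary scale become the cost-optimized profile.
const normalized = normalizeWeights({
  costWeight: 6,
  qualityWeight: 2.5,
  latencyWeight: 1,
  carbonWeight: 0.5,
});
console.log(normalized);
```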
134
+ ## 🎯 Model Selection
135
+
136
+ ### Available Models
137
+
138
+ Agentlify supports 100+ models across multiple providers:
139
+
140
+ #### OpenAI Models
141
+
142
+ - **GPT-4**: Highest quality, higher cost
143
+ - **GPT-4 Turbo**: Balanced quality and speed
144
+ - **GPT-3.5 Turbo**: Fast and cost-effective
145
+
146
+ #### Anthropic Models
147
+
148
+ - **Claude-3 Opus**: Premium quality
149
+ - **Claude-3 Sonnet**: Balanced performance
150
+ - **Claude-3 Haiku**: Fast and efficient
151
+
152
+ #### Google Models
153
+
154
+ - **Gemini Pro**: High-quality reasoning
155
+ - **Gemini Flash**: Fast responses
156
+
157
+ #### Open Source Models
158
+
159
+ - **Llama-2**: Cost-effective, privacy-focused
160
+ - **Mistral**: European, GDPR-compliant
161
+ - **CodeLlama**: Specialized for code
162
+
163
+ ### Model Selection Strategy
164
+
165
+ ```javascript
166
+ // Recommended starter set (3-5 models)
167
+ availableModels: [
168
+ 'gpt-4', // High quality
169
+ 'gpt-3.5-turbo', // Cost effective
170
+ 'claude-3-sonnet', // Alternative high quality
171
+ 'gemini-pro', // Google's offering
172
+ 'llama-2-70b', // Open source option
173
+ ];
174
+ ```
175
+
176
+ ### Model Categories by Use Case
177
+
178
+ #### Code Generation
179
+
180
+ ```javascript
181
+ recommendedModels: [
182
+ 'gpt-4', // Best for complex code
183
+ 'claude-3-sonnet', // Good reasoning
184
+ 'codellama-34b', // Specialized for code
185
+ 'gpt-3.5-turbo', // Fast iterations
186
+ ];
187
+ ```
188
+
189
+ #### Content Writing
190
+
191
+ ```javascript
192
+ recommendedModels: [
193
+ 'gpt-4', // Creative writing
194
+ 'claude-3-opus', // Long-form content
195
+ 'gemini-pro', // Research-based content
196
+ 'mistral-large', // European compliance
197
+ ];
198
+ ```
199
+
200
+ #### Data Analysis
201
+
202
+ ```javascript
203
+ recommendedModels: [
204
+ 'gpt-4', // Complex analysis
205
+ 'claude-3-sonnet', // Structured thinking
206
+ 'gemini-pro', // Mathematical reasoning
207
+ 'gpt-3.5-turbo', // Quick insights
208
+ ];
209
+ ```
210
+
211
+ #### Customer Support
212
+
213
+ ```javascript
214
+ recommendedModels: [
215
+ 'gpt-3.5-turbo', // Fast responses
216
+ 'claude-3-haiku', // Efficient handling
217
+ 'gemini-flash', // Quick turnaround
218
+ 'llama-2-13b', // Cost-effective
219
+ ];
220
+ ```
221
+
222
+ ## 🛡️ Fallback Configuration
223
+
224
+ ### Basic Fallback Setup
225
+
226
+ ```javascript
227
+ {
228
+ enableFallback: true,
229
+ retryAttempts: 2,
230
+ fallbackModels: [
231
+ 'gpt-3.5-turbo', // Reliable fallback
232
+ 'claude-3-haiku' // Secondary fallback
233
+ ]
234
+ }
235
+ ```
236
+
237
+ ### Advanced Fallback Strategies
238
+
239
+ #### Tiered Fallback
240
+
241
+ ```javascript
242
+ {
243
+ enableFallback: true,
244
+ retryAttempts: 3,
245
+ fallbackModels: [
246
+ 'gpt-4', // Try premium first
247
+ 'gpt-3.5-turbo', // Then cost-effective
248
+ 'llama-2-70b' // Finally open source
249
+ ]
250
+ }
251
+ ```
252
+
253
+ #### Provider Diversification
254
+
255
+ ```javascript
256
+ {
257
+ enableFallback: true,
258
+ retryAttempts: 2,
259
+ fallbackModels: [
260
+ 'claude-3-sonnet', // Different provider
261
+ 'gemini-pro', // Another provider
262
+ 'gpt-3.5-turbo' // Reliable backup
263
+ ]
264
+ }
265
+ ```
266
+
267
+ ### Fallback Triggers
268
+
269
+ - **Rate Limits**: Provider rejects the request because a rate limit was reached
270
+ - **Timeouts**: Model taking too long to respond
271
+ - **Errors**: Model returning error responses
272
+ - **Overload**: Model capacity exceeded
273
+
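The router handles fallback for you, but a small sketch helps show how `retryAttempts` and `fallbackModels` might interact. This is an illustration under stated assumptions, not the router's real implementation. `completeWithFallback`, `primaryModel`, and the caller-supplied `callModel` function are hypothetical names, and the sketch assumes each model gets one initial try plus `retryAttempts` retries before the next model is tried.

```javascript
// callModel(model, request) is a caller-supplied async function (hypothetical)
// that throws on rate limits, timeouts, errors, or overload.
async function completeWithFallback(request, config, callModel) {
  const {
    primaryModel,
    enableFallback = false,
    fallbackModels = [],
    retryAttempts = 0,
  } = config;
  const chain = enableFallback ? [primaryModel, ...fallbackModels] : [primaryModel];
  const attempts = [];

  for (const model of chain) {
    // Assumption: one initial try plus `retryAttempts` retries per model.
    for (let attempt = 0; attempt <= retryAttempts; attempt++) {
      try {
        const response = await callModel(model, request);
        return { model, response, attempts };
      } catch (err) {
        attempts.push({ model, attempt, message: err.message });
      }
    }
  }

  const failure = new Error(`All ${chain.length} model(s) failed`);
  failure.attempts = attempts; // keep the history for debugging
  throw failure;
}
```

Recording every failed attempt makes it easier to see which trigger fired and how often, which ties in with monitoring fallback trigger rates.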
274
+ ## 🔧 Advanced Settings
275
+
276
+ ### Request Requirements
277
+
278
+ #### Maximum Latency
279
+
280
+ ```javascript
281
+ {
282
+ maxLatencyMs: 5000, // Reject models slower than 5s
283
+ // Router will only consider fast models
284
+ }
285
+ ```
286
+
287
+ #### Maximum Cost
288
+
289
+ ```javascript
290
+ {
291
+ maxCostPerToken: 0.02, // Reject expensive models
292
+ // Router will only use cost-effective options
293
+ }
294
+ ```
295
+
296
+ #### Capability Requirements
297
+
298
+ ```javascript
299
+ {
300
+ functionCalling: true, // Require function calling support
301
+ structuredOutput: true, // Require JSON mode
302
+ fileUpload: false, // Don't need file upload
303
+ streaming: true // Require streaming support
304
+ }
305
+ ```
306
+
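Taken together, the latency, cost, and capability requirements act as hard filters applied before any weighting. The sketch below shows one way such a filter could work. The catalog entries, their field names (`typicalLatencyMs`, `costPerToken`, `capabilities`), and the `eligibleModels` helper are hypothetical, and the numbers are invented for illustration.

```javascript
// Hypothetical catalog; latency, cost, and capability values are made up.
const catalog = [
  { id: 'model-a', typicalLatencyMs: 6500, costPerToken: 0.03,
    capabilities: ['functionCalling', 'structuredOutput', 'streaming'] },
  { id: 'model-b', typicalLatencyMs: 1800, costPerToken: 0.002,
    capabilities: ['functionCalling', 'structuredOutput', 'streaming'] },
  { id: 'model-c', typicalLatencyMs: 900, costPerToken: 0.0005,
    capabilities: ['streaming'] },
];

// Keep only models that satisfy every limit and every capability set to true.
function eligibleModels(models, requirements) {
  const {
    maxLatencyMs = Infinity,
    maxCostPerToken = Infinity,
    ...capabilityFlags
  } = requirements;
  // Flags set to false mean "not needed", so they do not exclude anything.
  const required = Object.keys(capabilityFlags).filter(
    (name) => capabilityFlags[name] === true,
  );
  return models
    .filter((m) => m.typicalLatencyMs <= maxLatencyMs)
    .filter((m) => m.costPerToken <= maxCostPerToken)
    .filter((m) => required.every((cap) => m.capabilities.includes(cap)))
    .map((m) => m.id);
}

const eligible = eligibleModels(catalog, {
  maxLatencyMs: 5000,
  maxCostPerToken: 0.02,
  functionCalling: true,
  structuredOutput: true,
  fileUpload: false,
  streaming: true,
});
console.log(eligible); // logs [ 'model-b' ]
```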
307
+ ### Performance Tuning
308
+
309
+ #### Timeout Configuration
310
+
311
+ ```javascript
312
+ {
313
+ timeout: 30, // 30 second timeout
314
+ // Requests exceeding this will trigger fallback
315
+ }
316
+ ```
317
+
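For intuition, a timeout of this kind can be sketched client-side with `Promise.race`. The `withTimeout` helper below is hypothetical and is not how the router enforces its own `timeout` setting. It takes seconds, matching the unit used above, and it does not cancel the underlying request. It only stops waiting for it.

```javascript
// Reject if `promise` has not settled within `timeoutSeconds`.
function withTimeout(promise, timeoutSeconds) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutSeconds}s`)),
      timeoutSeconds * 1000,
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

A caller could catch the timeout error and move on to the next fallback model.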
318
+ #### Rate Limiting
319
+
320
+ ```javascript
321
+ {
322
+ rateLimit: 100, // 100 requests per minute
323
+ // Helps manage costs and prevent abuse
324
+ }
325
+ ```
326
+
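A per-minute limit like this is commonly enforced with a sliding window. The sketch below is a generic illustration, not Agentlify's implementation. `createRateLimiter` is a hypothetical helper, and the injectable `now` clock exists only to make the example easy to test.

```javascript
// Allow at most `limit` calls within any `windowMs` period.
function createRateLimiter(limit, windowMs = 60_000, now = Date.now) {
  const timestamps = []; // times of accepted calls, oldest first
  return function tryAcquire() {
    const t = now();
    // Drop calls that have fallen out of the window.
    while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limit) return false;
    timestamps.push(t);
    return true;
  };
}

// 100 requests per minute, matching the setting above.
const allowRequest = createRateLimiter(100);
console.log(allowRequest()); // true for the first call
```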
327
+ #### Caching
328
+
329
+ ```javascript
330
+ {
331
+ enableCaching: true, // Cache similar requests
332
+ // Improves performance and reduces costs
333
+ }
334
+ ```
335
+
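This guide does not say how the router decides that two requests are "similar". As a simplified illustration, the sketch below treats requests as the same when their messages match after whitespace is normalized. `createResponseCache` is a hypothetical helper, message `content` is assumed to be a string, and the oldest entry is evicted once `maxEntries` is exceeded.

```javascript
function createResponseCache(maxEntries = 1000) {
  const entries = new Map(); // Map preserves insertion order

  // Build a key from role and whitespace-normalized content.
  const keyFor = (messages) =>
    JSON.stringify(
      messages.map(({ role, content }) => [role, content.trim().replace(/\s+/g, ' ')]),
    );

  return {
    get(messages) {
      return entries.get(keyFor(messages));
    },
    set(messages, response) {
      const key = keyFor(messages);
      entries.delete(key); // re-insert so this key becomes the newest
      entries.set(key, response);
      if (entries.size > maxEntries) {
        entries.delete(entries.keys().next().value); // evict the oldest
      }
    },
  };
}

const cache = createResponseCache();
cache.set([{ role: 'user', content: 'Hello   world ' }], 'cached reply');
console.log(cache.get([{ role: 'user', content: 'Hello world' }])); // logs "cached reply"
```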
336
+ ## 📊 Monitoring and Analytics
337
+
338
+ ### Key Metrics to Track
339
+
340
+ #### Success Metrics
341
+
342
+ - **Success Rate**: Should be >99%
343
+ - **Average Latency**: Track by model and request type
344
+ - **Cost Per Request**: Monitor spending patterns
345
+ - **Model Distribution**: See which models are selected
346
+
347
+ #### Performance Indicators
348
+
349
+ ```javascript
350
+ // Example analytics data
351
+ {
352
+ successRate: 0.995, // 99.5% success
353
+ avgLatencyMs: 1250, // 1.25s average
354
+ avgCostUsd: 0.0015, // $0.0015 per request
355
+ modelDistribution: {
356
+ 'gpt-3.5-turbo': 0.6, // 60% of requests
357
+ 'gpt-4': 0.25, // 25% of requests
358
+ 'claude-3-sonnet': 0.15 // 15% of requests
359
+ }
360
+ }
361
+ ```
362
+
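If you keep your own request log, the same indicators can be derived with a few lines of code. The sketch below assumes a hypothetical log entry shape of `{ model, ok, latencyMs, costUsd }`. Neither that shape nor `summarizeRequests` comes from the Agentlify API.

```javascript
// Aggregate a request log into the indicators shown above.
function summarizeRequests(log) {
  if (log.length === 0) return null;

  const counts = {};
  let successes = 0;
  let latencySum = 0;
  let costSum = 0;

  for (const entry of log) {
    if (entry.ok) successes += 1;
    latencySum += entry.latencyMs;
    costSum += entry.costUsd;
    counts[entry.model] = (counts[entry.model] ?? 0) + 1;
  }

  return {
    successRate: successes / log.length,
    avgLatencyMs: latencySum / log.length,
    avgCostUsd: costSum / log.length,
    // Share of requests handled by each model.
    modelDistribution: Object.fromEntries(
      Object.entries(counts).map(([model, n]) => [model, n / log.length]),
    ),
  };
}

// Sample log with invented values.
const summary = summarizeRequests([
  { model: 'gpt-3.5-turbo', ok: true, latencyMs: 1000, costUsd: 0.001 },
  { model: 'gpt-3.5-turbo', ok: true, latencyMs: 2000, costUsd: 0.002 },
  { model: 'gpt-4', ok: false, latencyMs: 3000, costUsd: 0.003 },
  { model: 'gpt-3.5-turbo', ok: true, latencyMs: 2000, costUsd: 0.002 },
]);
console.log(summary);
```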
363
+ ### Optimization Based on Analytics
364
+
365
+ #### High Cost Issues
366
+
367
+ ```javascript
368
+ // If avgCostUsd > target
369
+ {
370
+ costWeight: 0.6, // Increase cost priority
371
+ qualityWeight: 0.25, // Reduce quality slightly
372
+ // Add more cost-effective models
373
+ availableModels: [...existing, 'llama-2-70b', 'mistral-7b']
374
+ }
375
+ ```
376
+
377
+ #### High Latency Issues
378
+
379
+ ```javascript
380
+ // If avgLatencyMs > target
381
+ {
382
+ latencyWeight: 0.5, // Prioritize speed
383
+ costWeight: 0.3, // Accept higher costs
384
+ // Remove slow models
385
+ availableModels: models.filter(m => !slowModels.includes(m))
386
+ }
387
+ ```
388
+
389
+ #### Low Quality Issues
390
+
391
+ ```javascript
392
+ // If quality scores < target
393
+ {
394
+ qualityWeight: 0.5, // Prioritize quality
395
+ costWeight: 0.2, // Accept higher costs
396
+ // Add premium models
397
+ availableModels: [...existing, 'gpt-4', 'claude-3-opus']
398
+ }
399
+ ```
400
+
401
+ ## 🔒 Security and Compliance
402
+
403
+ ### Data Governance
404
+
405
+ #### Regional Compliance
406
+
407
+ ```javascript
408
+ {
409
+ availableModels: [
410
+ 'mistral-large', // EU-based for GDPR
411
+ 'claude-3-sonnet', // US-based
412
+ // Exclude models not meeting compliance requirements
413
+ ]
414
+ }
415
+ ```
416
+
417
+ #### Data Retention
418
+
419
+ ```javascript
420
+ {
421
+ detailedAnalytics: false, // Disable detailed logging
422
+ enableCaching: false, // Disable request caching
423
+ // Minimize data retention for sensitive workloads
424
+ }
425
+ ```
426
+
427
+ ### Access Control
428
+
429
+ #### API Key Management
430
+
431
+ - Rotate keys regularly
432
+ - Use separate keys for different environments
433
+ - Monitor key usage patterns
434
+ - Implement rate limiting per key
435
+
436
+ #### Router Isolation
437
+
438
+ - Separate routers for different teams/projects
439
+ - Environment-specific configurations
440
+ - Granular access controls
441
+
442
+ ## 🧪 Testing and Validation
443
+
444
+ ### A/B Testing Router Configurations
445
+
446
+ ```javascript
447
+ // Configuration A: Cost-optimized
448
+ const routerA = {
449
+ costWeight: 0.6,
450
+ qualityWeight: 0.3,
451
+ latencyWeight: 0.1,
452
+ };
453
+
454
+ // Configuration B: Quality-optimized
455
+ const routerB = {
456
+ qualityWeight: 0.6,
457
+ costWeight: 0.2,
458
+ latencyWeight: 0.2,
459
+ };
460
+
461
+ // Split traffic and compare metrics
462
+ ```
463
+
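One common way to split traffic is to hash a stable identifier, so each user always lands on the same configuration. The sketch below is a generic illustration, not an Agentlify feature. `assignVariant` and the FNV-1a-style `hashString` helper are hypothetical. The same function can drive a canary rollout by starting `percentB` at 5 and raising it over time.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units: small and deterministic.
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Send roughly `percentB` percent of users to configuration B.
function assignVariant(userId, percentB = 50) {
  const bucket = hashString(String(userId)) % 100; // 0..99
  return bucket < percentB ? 'B' : 'A';
}

const variant = assignVariant('user-42');
console.log(`user-42 uses router ${variant}`);
```

Because the assignment depends only on the identifier, metrics for the two configurations can be compared without one user's requests being spread across both.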
464
+ ### Gradual Rollout Strategy
465
+
466
+ 1. **Development**: Test with a small amount of traffic
467
+ 2. **Staging**: Validate with realistic load
468
+ 3. **Canary**: Deploy to 5% of production traffic
469
+ 4. **Full Rollout**: Gradually increase to 100%
470
+
471
+ ### Configuration Validation
472
+
473
+ ```javascript
474
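+ // Validate configuration before deployment
+ // Assumes Node.js; supportsFunctionCalling(model) is a helper you provide
+ const assert = require('node:assert');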
475
+ function validateRouterConfig(config) {
476
+ // Check weights sum to 1.0
477
+ const totalWeight =
478
+ config.costWeight +
479
+ config.qualityWeight +
480
+ config.latencyWeight +
481
+ (config.carbonWeight ?? 0); // an omitted carbonWeight counts as 0 here
482
+ assert(Math.abs(totalWeight - 1.0) < 0.001);
483
+
484
+ // Ensure fallback models are available
485
+ assert(
486
+ config.fallbackModels.every((model) =>
487
+ config.availableModels.includes(model),
488
+ ),
489
+ );
490
+
491
+ // Validate model capabilities
492
+ if (config.functionCalling) {
493
+ assert(
494
+ config.availableModels.some((model) => supportsFunctionCalling(model)),
495
+ );
496
+ }
497
+ }
498
+ ```
499
+
500
+ ## 📋 Configuration Templates
501
+
502
+ ### Startup Template
503
+
504
+ ```javascript
505
+ {
506
+ name: "Startup Router",
507
+ mode: "smartRouter",
508
+ costWeight: 0.5,
509
+ qualityWeight: 0.3,
510
+ latencyWeight: 0.15,
511
+ carbonWeight: 0.05,
512
+ availableModels: [
513
+ 'gpt-3.5-turbo',
514
+ 'claude-3-haiku',
515
+ 'llama-2-70b'
516
+ ],
517
+ enableFallback: true,
518
+ fallbackModels: ['gpt-3.5-turbo'],
519
+ retryAttempts: 2
520
+ }
521
+ ```
522
+
523
+ ### Enterprise Template
524
+
525
+ ```javascript
526
+ {
527
+ name: "Enterprise Router",
528
+ mode: "smartRouter",
529
+ costWeight: 0.25,
530
+ qualityWeight: 0.4,
531
+ latencyWeight: 0.25,
532
+ carbonWeight: 0.1,
533
+ availableModels: [
534
+ 'gpt-4',
535
+ 'claude-3-opus',
536
+ 'claude-3-sonnet',
537
+ 'gemini-pro',
538
+ 'gpt-3.5-turbo'
539
+ ],
540
+ enableFallback: true,
541
+ fallbackModels: ['claude-3-sonnet', 'gpt-3.5-turbo'],
542
+ retryAttempts: 3,
543
+ functionCalling: true,
544
+ structuredOutput: true
545
+ }
546
+ ```
547
+
548
+ ### Real-time Template
549
+
550
+ ```javascript
551
+ {
552
+ name: "Real-time Router",
553
+ mode: "smartRouter",
554
+ latencyWeight: 0.6,
555
+ qualityWeight: 0.25,
556
+ costWeight: 0.1,
557
+ carbonWeight: 0.05,
558
+ availableModels: [
559
+ 'gpt-3.5-turbo',
560
+ 'claude-3-haiku',
561
+ 'gemini-flash'
562
+ ],
563
+ maxLatencyMs: 3000,
564
+ enableFallback: true,
565
+ fallbackModels: ['gpt-3.5-turbo'],
566
+ retryAttempts: 1
567
+ }
568
+ ```
569
+
570
+ ## 🚀 Best Practices
571
+
572
+ ### Configuration Management
573
+
574
+ - Version control your router configurations
575
+ - Document configuration changes
576
+ - Test configurations in staging first
577
+ - Monitor metrics after changes
578
+
579
+ ### Model Selection
580
+
581
+ - Start with 3-5 models for simplicity
582
+ - Include models from different providers
583
+ - Balance cost, quality, and speed
584
+ - Review and optimize the model list regularly
585
+
586
+ ### Fallback Strategy
587
+
588
+ - Always enable fallback for production
589
+ - Use reliable models as fallbacks
590
+ - Test fallback scenarios regularly
591
+ - Monitor fallback trigger rates
592
+
593
+ ### Performance Optimization
594
+
595
+ - Adjust weights based on actual usage
596
+ - Remove underperforming models
597
+ - Add new models as they become available
598
+ - Review performance regularly
599
+
600
+ ## 🆘 Troubleshooting
601
+
602
+ ### Common Configuration Issues
603
+
604
+ **Issue**: Router always selects expensive models
605
+
606
+ ```javascript
607
+ // Solution: Increase cost weight
608
+ {
609
+ costWeight: 0.6, // Increase from current value
610
+ qualityWeight: 0.3,
611
+ latencyWeight: 0.1
612
+ }
613
+ ```
614
+
615
+ **Issue**: Responses are too slow
616
+
617
+ ```javascript
618
+ // Solution: Prioritize latency, set max latency
619
+ {
620
+ latencyWeight: 0.5,
621
+ maxLatencyMs: 5000, // Reject slow models
622
+ // Remove slow models from available list
623
+ }
624
+ ```
625
+
626
+ **Issue**: Frequent fallback triggers
627
+
628
+ ```javascript
629
+ // Solution: Add more reliable models, increase retry attempts
630
+ {
631
+ availableModels: [...existing, 'gpt-3.5-turbo'], // Add reliable model
632
+ retryAttempts: 3, // Increase retries
633
+ fallbackModels: ['gpt-3.5-turbo', 'claude-3-haiku'] // Multiple fallbacks
634
+ }
635
+ ```
636
+
637
+ ## 🔗 Related Documentation
638
+
639
+ - **[Getting Started](getting-started.md)** - Basic router setup
640
+ - **[NPM Package](npm-package.md)** - Using routers in code
641
+ - **[Backend Architecture](backend-architecture.md)** - How routing works
642
+ - **[API Reference](api-reference.md)** - Complete API docs
643
+
644
+ ---