lynkr 7.2.5 → 8.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. package/README.md +2 -2
  2. package/config/model-tiers.json +89 -0
  3. package/docs/docs.html +1 -0
  4. package/docs/index.md +7 -0
  5. package/docs/toon-integration-spec.md +130 -0
  6. package/documentation/README.md +3 -2
  7. package/documentation/claude-code-cli.md +23 -16
  8. package/documentation/cursor-integration.md +17 -14
  9. package/documentation/docker.md +11 -4
  10. package/documentation/embeddings.md +7 -5
  11. package/documentation/faq.md +66 -12
  12. package/documentation/features.md +22 -15
  13. package/documentation/installation.md +66 -14
  14. package/documentation/production.md +43 -8
  15. package/documentation/providers.md +145 -42
  16. package/documentation/routing.md +476 -0
  17. package/documentation/token-optimization.md +7 -5
  18. package/documentation/troubleshooting.md +81 -5
  19. package/install.sh +6 -1
  20. package/package.json +4 -2
  21. package/scripts/setup.js +0 -1
  22. package/src/agents/executor.js +14 -6
  23. package/src/api/middleware/session.js +15 -2
  24. package/src/api/openai-router.js +130 -37
  25. package/src/api/providers-handler.js +15 -1
  26. package/src/api/router.js +107 -2
  27. package/src/budget/index.js +4 -3
  28. package/src/clients/databricks.js +431 -234
  29. package/src/clients/gpt-utils.js +181 -0
  30. package/src/clients/ollama-utils.js +66 -140
  31. package/src/clients/routing.js +0 -1
  32. package/src/clients/standard-tools.js +76 -3
  33. package/src/config/index.js +113 -35
  34. package/src/context/toon.js +173 -0
  35. package/src/logger/index.js +23 -0
  36. package/src/orchestrator/index.js +686 -211
  37. package/src/routing/agentic-detector.js +320 -0
  38. package/src/routing/complexity-analyzer.js +202 -2
  39. package/src/routing/cost-optimizer.js +305 -0
  40. package/src/routing/index.js +168 -159
  41. package/src/routing/model-tiers.js +365 -0
  42. package/src/server.js +2 -2
  43. package/src/sessions/cleanup.js +3 -3
  44. package/src/sessions/record.js +10 -1
  45. package/src/sessions/store.js +7 -2
  46. package/src/tools/agent-task.js +48 -1
  47. package/src/tools/index.js +15 -2
  48. package/te +11622 -0
  49. package/test/README.md +1 -1
  50. package/test/azure-openai-config.test.js +17 -8
  51. package/test/azure-openai-integration.test.js +7 -1
  52. package/test/azure-openai-routing.test.js +41 -43
  53. package/test/bedrock-integration.test.js +18 -32
  54. package/test/hybrid-routing-integration.test.js +35 -20
  55. package/test/hybrid-routing-performance.test.js +74 -64
  56. package/test/llamacpp-integration.test.js +28 -9
  57. package/test/lmstudio-integration.test.js +20 -8
  58. package/test/openai-integration.test.js +17 -20
  59. package/test/performance-tests.js +1 -1
  60. package/test/routing.test.js +65 -59
  61. package/test/toon-compression.test.js +131 -0
  62. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  63. package/ROUTER_COMPARISON.md +0 -173
  64. package/TIER_ROUTING_PLAN.md +0 -771
@@ -1,771 +0,0 @@
# 3-Tier Routing Implementation Plan

## Overview

Add explicit 3-tier configuration to Lynkr for predictable, cost-aware routing:
- **Tier 1**: Local/Free (Ollama, llama.cpp, LM Studio)
- **Tier 2**: Cloud/Cost-Effective (OpenRouter, Bedrock with cheap models, Azure OpenAI mini)
- **Tier 3**: Cloud/Premium (Databricks, Azure Anthropic, Bedrock with premium models, OpenAI direct, etc.)

---

## Current System vs Proposed System

### Current Configuration
```bash
PREFER_OLLAMA=true
OLLAMA_MAX_TOOLS_FOR_ROUTING=3
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

**Current Routing Logic**:
- 0-2 tools → Ollama
- 3+ tools → Checks if OpenRouter/OpenAI/Azure/etc. configured (hardcoded order in routing.js lines 56-93)
- Heavy tools OR not configured → FALLBACK_PROVIDER

**Problem**: No explicit Tier 2 choice. System checks providers in hardcoded priority order.

---

### Proposed Configuration
```bash
# Tier 1: Local (always enabled if PREFER_OLLAMA=true)
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3 # 0-2 tools → Tier 1

# Tier 2: Cost-Effective (NEW - explicit)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter # Explicit choice
TIER_2_MAX_TOOLS=15 # 3-15 tools → Tier 2

# Tier 3: Premium (existing FALLBACK_PROVIDER)
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks # 16+ tools → Tier 3
```

**Proposed Routing Logic**:
- 0-2 tools → Tier 1 (Ollama/llamacpp/lmstudio)
- 3-15 tools → Tier 2 (openrouter/bedrock/azure-openai) - **explicit, predictable**
- 16+ tools → Tier 3 (databricks/azure-anthropic/bedrock/openrouter/openai/azure-openai) - **explicit, predictable**
- Any tier fails → Try next tier

---

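The proposed split above can be sketched as a small standalone function. This is illustrative only — `pickTier` and the inline provider names are hypothetical defaults, not the actual `routing.js` API:

```javascript
// Illustrative sketch of the proposed 3-tier decision flow.
// Thresholds mirror the defaults above: OLLAMA_MAX_TOOLS_FOR_ROUTING=3, TIER_2_MAX_TOOLS=15.
function pickTier(toolCount, { ollamaMaxTools = 3, tier2MaxTools = 15 } = {}) {
  if (toolCount < ollamaMaxTools) return { tier: 1, provider: "ollama" };     // 0-2 tools
  if (toolCount <= tier2MaxTools) return { tier: 2, provider: "openrouter" }; // 3-15 tools
  return { tier: 3, provider: "databricks" };                                 // 16+ tools
}

console.log(pickTier(2));  // → { tier: 1, provider: 'ollama' }
console.log(pickTier(8));  // → { tier: 2, provider: 'openrouter' }
console.log(pickTier(20)); // → { tier: 3, provider: 'databricks' }
```

Note the boundary semantics: the Tier 1 check is strict (`<`), so `OLLAMA_MAX_TOOLS_FOR_ROUTING=3` means 0-2 tools stay local, while the Tier 2 check is inclusive (`<=`).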
## Tier Classification (Final)

### Tier 1: Local/Free (No API Costs)
**Providers**: ollama, llamacpp, lmstudio

**Configuration**: Set as `MODEL_PROVIDER` or use `PREFER_OLLAMA=true`

**Use Cases**: Local inference, offline usage, privacy, development, zero API costs

**Tool Count**: 0-2 tools (configured via `OLLAMA_MAX_TOOLS_FOR_ROUTING`)

---

### Tier 2: Cloud/Cost-Effective ($ - Cheap Cloud Only)

**Valid Providers**:
- `openrouter` - 100+ models, cheapest option ($0.15/1M for GPT-4o-mini)
- `bedrock` - AWS ecosystem with cheap models (Llama $0.99/1M, Mistral, Titan)
- `azure-openai` - Azure with cheap deployments (gpt-4o-mini)

**NOT Allowed**:
- ❌ `openai` - Direct OpenAI API is Tier 3 only (premium positioning)
- ❌ `ollama`, `llamacpp`, `lmstudio` - Local providers are Tier 1 only

**Configuration**:
```bash
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter # or bedrock, azure-openai
TIER_2_MAX_TOOLS=15
```

**Tool Count**: 3-15 tools (configurable via `TIER_2_MAX_TOOLS`)

**Use Cases**: Cost optimization, medium complexity requests, development with cloud

---

### Tier 3: Cloud/Premium ($$$ - Expensive Cloud)

**Valid Providers** (All Cloud Providers):
- `databricks` - Claude Opus/Sonnet ($3-15/1M), enterprise MLOps
- `azure-anthropic` - Azure-hosted Claude ($3-15/1M)
- `bedrock` - AWS with premium models (Claude 4.5 Sonnet $3+/1M)
- `openrouter` - With premium models (GPT-4o, Claude Opus)
- `openai` - Direct OpenAI API (official, premium)
- `azure-openai` - Azure with premium models (GPT-4o, o1)

**NOT Allowed**:
- ❌ `ollama`, `llamacpp`, `lmstudio` - Local providers should not be fallback for cloud

**Configuration**:
```bash
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks # or any cloud provider above
```

**Tool Count**: 16+ tools, OR when Tier 2 fails

**Use Cases**: Complex reasoning, heavy tool usage, production reliability, premium quality

---

## Implementation Phases

### Phase 1: Configuration (src/config/index.js)

**File**: `src/config/index.js`

**Location**: After line 109 (following existing model provider config)

**Add Environment Variable Parsing**:
```javascript
// Tier 2 configuration (explicit middle tier)
const tier2Enabled = process.env.TIER_2_ENABLED?.toLowerCase() === "true";
const tier2Provider = process.env.TIER_2_PROVIDER?.trim()?.toLowerCase() || null;
const tier2MaxTools = parseInt(process.env.TIER_2_MAX_TOOLS) || 15;
```

**Add Validation Logic** (after line 226):
```javascript
// Validate Tier 2 if enabled
if (tier2Enabled) {
  if (!tier2Provider) {
    throw new Error(
      "TIER_2_ENABLED is true but TIER_2_PROVIDER is not set. " +
        "Set TIER_2_PROVIDER to: openrouter, bedrock, azure-openai"
    );
  }

  const validTier2Providers = ["openrouter", "bedrock", "azure-openai"];
  if (!validTier2Providers.includes(tier2Provider)) {
    throw new Error(
      `TIER_2_PROVIDER '${tier2Provider}' is invalid. ` +
        `Valid cost-effective cloud providers: ${validTier2Providers.join(", ")}. ` +
        `Note: OpenAI direct API is Tier 3 only (use openrouter for cheaper OpenAI access). ` +
        `Local providers (ollama, llamacpp, lmstudio) should use Tier 1.`
    );
  }

  // Verify Tier 2 provider is configured
  const providerConfigured = {
    openrouter: config.openrouter?.apiKey,
    bedrock: config.bedrock?.apiKey,
    "azure-openai": config.azureOpenAI?.apiKey,
  };

  if (!providerConfigured[tier2Provider]) {
    throw new Error(
      `TIER_2_PROVIDER is set to '${tier2Provider}' but this provider is not configured. ` +
        `Please configure ${tier2Provider.toUpperCase()} environment variables.`
    );
  }
}

// Validate Tier 3 (FALLBACK_PROVIDER) - prevent local providers
if (fallbackEnabled) {
  const localProviders = ["ollama", "llamacpp", "lmstudio"];

  if (localProviders.includes(fallbackProvider)) {
    throw new Error(
      `FALLBACK_PROVIDER cannot be '${fallbackProvider}' (local provider). ` +
        `Tier 3 fallback should be a cloud provider: databricks, azure-anthropic, bedrock, openrouter, openai, azure-openai. ` +
        `Local providers (ollama, llamacpp, lmstudio) should only be used as Tier 1.`
    );
  }
}
```

**Export Tier 2 Config** (after line 446 in modelProvider section):
```javascript
modelProvider: {
  type: modelProvider,
  preferOllama,
  fallbackEnabled,
  fallbackProvider,
  ollamaMaxToolsForRouting,
  openRouterMaxToolsForRouting,
  // NEW: Tier 2 configuration
  tier2Enabled,
  tier2Provider,
  tier2MaxTools,
},
```

---

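The parsing rules in Phase 1 have a couple of edge cases worth noting; a standalone sketch (a hypothetical `parseTier2` helper, not the actual config module) shows how the same expressions coerce their inputs:

```javascript
// Standalone restatement of the Phase 1 env-parsing expressions.
function parseTier2(env) {
  return {
    enabled: env.TIER_2_ENABLED?.toLowerCase() === "true",
    provider: env.TIER_2_PROVIDER?.trim()?.toLowerCase() || null,
    // NaN (missing or non-numeric) falls back to 15; so would an explicit "0",
    // a quirk of using `||` rather than `??` here.
    maxTools: parseInt(env.TIER_2_MAX_TOOLS) || 15,
  };
}

console.log(parseTier2({ TIER_2_ENABLED: "TRUE", TIER_2_PROVIDER: " OpenRouter " }));
// → { enabled: true, provider: 'openrouter', maxTools: 15 }
console.log(parseTier2({ TIER_2_MAX_TOOLS: "abc" }).maxTools); // → 15
```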
### Phase 2: Routing Logic (src/clients/routing.js)

**File**: `src/clients/routing.js`

**Replace Lines 56-94** with new tier-based logic:

```javascript
// Moderate tool count → Check if Tier 2 is enabled
if (toolCount < maxToolsForOpenRouter && isFallbackEnabled()) {
  const tier2Enabled = config.modelProvider?.tier2Enabled ?? false;
  const tier2Provider = config.modelProvider?.tier2Provider;
  const tier2MaxTools = config.modelProvider?.tier2MaxTools ?? 15;

  // If Tier 2 explicitly enabled, route to configured provider
  if (tier2Enabled && toolCount <= tier2MaxTools) {
    logger.debug(
      { toolCount, tier: 2, provider: tier2Provider, tier2MaxTools, decision: tier2Provider },
      "Routing to Tier 2 (explicit cost-effective cloud)"
    );
    return tier2Provider;
  }

  // If Tier 2 disabled, check providers in order (backward compatibility)
  if (!tier2Enabled) {
    logger.debug({ toolCount }, "Tier 2 disabled, using legacy provider check order");

    if (config.openrouter?.apiKey) {
      logger.debug({ toolCount, decision: "openrouter" }, "Routing to OpenRouter (legacy mode)");
      return "openrouter";
    } else if (config.openai?.apiKey) {
      logger.debug({ toolCount, decision: "openai" }, "Routing to OpenAI (legacy mode)");
      return "openai";
    } else if (config.azureOpenAI?.apiKey) {
      logger.debug({ toolCount, decision: "azure-openai" }, "Routing to Azure OpenAI (legacy mode)");
      return "azure-openai";
    } else if (config.llamacpp?.endpoint) {
      logger.debug({ toolCount, decision: "llamacpp" }, "Routing to llama.cpp (legacy mode)");
      return "llamacpp";
    } else if (config.lmstudio?.endpoint) {
      logger.debug({ toolCount, decision: "lmstudio" }, "Routing to LM Studio (legacy mode)");
      return "lmstudio";
    } else if (config.bedrock?.apiKey) {
      logger.debug({ toolCount, decision: "bedrock" }, "Routing to AWS Bedrock (legacy mode)");
      return "bedrock";
    }
  }
}

// Heavy tool count → Tier 3 (fallback provider)
if (isFallbackEnabled()) {
  const fallback = config.modelProvider?.fallbackProvider ?? "databricks";
  logger.debug(
    { toolCount, tier: 3, provider: fallback, decision: fallback },
    "Routing to Tier 3 (premium cloud - heavy tools or Tier 2 exceeded threshold)"
  );
  return fallback;
}

// Fallback disabled, route to Ollama regardless of complexity
logger.debug(
  { toolCount, maxToolsForOllama, fallbackEnabled: false, decision: "ollama" },
  "Routing to Ollama (fallback disabled)"
);
return "ollama";
```

**Add Helper Function** (after line 130):
```javascript
/**
 * Get the tier for the current request based on tool count
 *
 * @param {number} toolCount - Number of tools in request
 * @returns {number|null} Tier number (1, 2, or 3), or null when tiered routing is off
 */
function getTierForRequest(toolCount) {
  const preferOllama = config.modelProvider?.preferOllama ?? false;
  if (!preferOllama) return null; // Not using tiered routing

  const ollamaMaxTools = config.modelProvider?.ollamaMaxToolsForRouting ?? 3;
  const tier2MaxTools = config.modelProvider?.tier2MaxTools ?? 15;

  if (toolCount < ollamaMaxTools) return 1; // Ollama
  if (toolCount <= tier2MaxTools) return 2; // Tier 2
  return 3; // Tier 3
}

module.exports = {
  determineProvider,
  isFallbackEnabled,
  getFallbackProvider,
  getTierForRequest, // NEW
};
```

---

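The "any tier fails → try next tier" behavior called out in the proposal has no snippet of its own. A minimal sketch of cheapest-first escalation, assuming a hypothetical `callWithTierFallback` helper (not part of the actual orchestrator code):

```javascript
// Minimal sketch of "any tier fails → try next tier" escalation.
async function callWithTierFallback(tiers, request) {
  let lastErr;
  for (const tier of tiers) {
    try {
      // First tier that succeeds handles the request.
      return { tier: tier.name, result: await tier.invoke(request) };
    } catch (err) {
      lastErr = err; // escalate to the next (more capable, more expensive) tier
    }
  }
  throw lastErr; // every tier failed
}

// Tiers ordered cheapest-first: local → cost-effective cloud → premium cloud.
const tiers = [
  { name: "ollama", invoke: async () => { throw new Error("local model down"); } },
  { name: "openrouter", invoke: async (req) => `ok: ${req}` },
  { name: "databricks", invoke: async (req) => `ok: ${req}` },
];

callWithTierFallback(tiers, "hello").then((out) => console.log(out.tier)); // → openrouter
```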
### Phase 3: Documentation Updates

#### 3.1 Update .env.example

**File**: `.env.example`

**Add After Line 36** (after OLLAMA_MAX_TOOLS_FOR_ROUTING):

```bash
# ==============================================================================
# Tier 2 Configuration (Explicit Middle Tier - Cost-Effective Cloud)
# ==============================================================================

# Enable Tier 2 routing for cost-effective cloud provider
# When enabled, requests with 3-15 tools route to this provider instead of checking providers in order
# TIER_2_ENABLED=true

# Which provider to use for Tier 2 (cost-effective cloud only)
# Options: openrouter, bedrock, azure-openai
# NOT allowed: openai (Tier 3 only), ollama/llamacpp/lmstudio (Tier 1 only)
# TIER_2_PROVIDER=openrouter

# Maximum tools for Tier 2 routing (requests above this go to Tier 3)
# Default: 15
# TIER_2_MAX_TOOLS=15

# ==============================================================================
# 3-Tier Routing Configuration Example
# ==============================================================================
#
# ┌─────────────┬────────────────┬──────────────────┬─────────────┐
# │ Tool Count  │ Tier           │ Provider         │ Cost        │
# ├─────────────┼────────────────┼──────────────────┼─────────────┤
# │ 0-2 tools   │ Tier 1 (Local) │ Ollama           │ FREE        │
# │ 3-15 tools  │ Tier 2 (Cloud) │ OpenRouter       │ $ (cheap)   │
# │ 16+ tools   │ Tier 3 (Cloud) │ Databricks       │ $$$ (exp)   │
# └─────────────┴────────────────┴──────────────────┴─────────────┘
#
# Complete Example:
# PREFER_OLLAMA=true
# OLLAMA_MAX_TOOLS_FOR_ROUTING=3
# TIER_2_ENABLED=true
# TIER_2_PROVIDER=openrouter
# TIER_2_MAX_TOOLS=15
# FALLBACK_ENABLED=true
# FALLBACK_PROVIDER=databricks
```

#### 3.2 Update README.md

**File**: `README.md`

**Add Section After "Hybrid Routing" Section** (after line ~275):

````markdown
### **3-Tier Routing (Explicit Cost Control)**

For predictable cost management, use explicit tier configuration:

```bash
# Tier 1: Local (FREE) - 0-2 tools
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# Tier 2: Cost-Effective (CHEAP $) - 3-15 tools
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
TIER_2_MAX_TOOLS=15

# Tier 3: Premium (EXPENSIVE $$$) - 16+ tools
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

**How 3-Tier Routing Works**:

| Tool Count | Tier | Provider | Cost | Example |
|------------|------|----------|------|---------|
| 0-2 tools | **Tier 1** (Local) | Ollama | FREE | Simple questions, basic file reads |
| 3-15 tools | **Tier 2** (Cloud) | OpenRouter | $ (cheap) | Medium complexity, moderate tool usage |
| 16+ tools | **Tier 3** (Cloud) | Databricks | $$$ (expensive) | Complex refactoring, heavy analysis |

**Cost Predictability**:
```
Simple request (1 tool):    Tier 1 → FREE
Medium request (8 tools):   Tier 2 → $0.15 per 1M tokens (OpenRouter)
Complex request (20 tools): Tier 3 → $3.00 per 1M tokens (Databricks)
```

**Tier 2 Valid Providers** (Cost-Effective):
- `openrouter` - 100+ models, cheapest option ($0.15/1M)
- `bedrock` - AWS with cheap models (Llama, Mistral, Titan)
- `azure-openai` - Azure with mini deployments

**Tier 3 Valid Providers** (Premium):
- `databricks` - Claude Opus/Sonnet, enterprise
- `azure-anthropic` - Azure-hosted Claude
- `bedrock` - AWS with Claude 4.5 Sonnet
- `openrouter` - With premium models
- `openai` - Direct OpenAI API (premium only)
- `azure-openai` - Azure with premium models

⚠️ **Note**: Direct OpenAI API (`openai`) is Tier 3 only. For cheaper GPT access, use `openrouter` in Tier 2.

**Automatic Fallback**: If any tier fails, automatically tries the next tier for resilience.
````

---

### Phase 4: Testing Strategy

#### 4.1 Unit Tests

**File**: `test/tier-routing.test.js` (NEW FILE)

```javascript
const assert = require("assert");
const { describe, it, beforeEach, afterEach } = require("node:test");

describe("3-Tier Routing", () => {
  let originalEnv;

  beforeEach(() => {
    originalEnv = { ...process.env };
    delete require.cache[require.resolve("../src/config")];
    delete require.cache[require.resolve("../src/clients/routing")];
  });

  afterEach(() => {
    process.env = originalEnv;
  });

  describe("Tier 1 (Local)", () => {
    it("should route 0-2 tools to Tier 1 (Ollama)", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.OLLAMA_MAX_TOOLS_FOR_ROUTING = "3";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: [{}, {}] }); // 2 tools

      assert.strictEqual(provider, "ollama");
    });
  });

  describe("Tier 2 (Cost-Effective Cloud)", () => {
    it("should route 3-15 tools to Tier 2 when enabled", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openrouter";
      process.env.TIER_2_MAX_TOOLS = "15";
      process.env.OPENROUTER_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(8).fill({}) }); // 8 tools

      assert.strictEqual(provider, "openrouter");
    });

    it("should route to bedrock when configured as Tier 2", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "bedrock";
      process.env.AWS_BEDROCK_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(10).fill({}) });

      assert.strictEqual(provider, "bedrock");
    });

    it("should NOT allow openai in Tier 2", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openai";

      assert.throws(
        () => require("../src/config"),
        /OpenAI direct API is Tier 3 only/
      );
    });

    it("should NOT allow local providers in Tier 2", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "ollama";

      assert.throws(
        () => require("../src/config"),
        /Local providers.*should use Tier 1/
      );
    });
  });

  describe("Tier 3 (Premium Cloud)", () => {
    it("should route 16+ tools to Tier 3", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_MAX_TOOLS = "15";
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "databricks";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(20).fill({}) }); // 20 tools

      assert.strictEqual(provider, "databricks");
    });

    it("should allow openai in Tier 3", () => {
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "openai";
      process.env.OPENAI_API_KEY = "test-key";

      const config = require("../src/config");
      assert.strictEqual(config.modelProvider.fallbackProvider, "openai");
    });

    it("should NOT allow local providers in Tier 3", () => {
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "ollama";

      assert.throws(
        () => require("../src/config"),
        /FALLBACK_PROVIDER cannot be 'ollama'/
      );
    });
  });

  describe("Validation", () => {
    it("should throw error if Tier 2 enabled but provider not set", () => {
      process.env.TIER_2_ENABLED = "true";
      // Missing TIER_2_PROVIDER

      assert.throws(
        () => require("../src/config"),
        /TIER_2_PROVIDER is not set/
      );
    });

    it("should throw error if Tier 2 provider not configured", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openrouter";
      // Missing OPENROUTER_API_KEY

      assert.throws(
        () => require("../src/config"),
        /openrouter.*is not configured/
      );
    });
  });

  describe("Backward Compatibility", () => {
    it("should fall back to legacy mode when Tier 2 disabled", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "false";
      process.env.OPENROUTER_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(8).fill({}) }); // 8 tools

      assert.strictEqual(provider, "openrouter"); // Legacy order check
    });
  });
});
```

#### 4.2 Manual Integration Tests

```bash
# Test 1: Tier 1 routing (0-2 tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [{"name": "test1"}]
  }'
# Expected: Routes to Ollama, logs show "Tier 1"

# Test 2: Tier 2 routing (3-15 tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [
      {"name": "t1"}, {"name": "t2"}, {"name": "t3"},
      {"name": "t4"}, {"name": "t5"}, {"name": "t6"}
    ]
  }'
# Expected: Routes to openrouter (Tier 2), logs show "tier: 2"

# Test 3: Tier 3 routing (16+ tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [... 20 tools ...]
  }'
# Expected: Routes to databricks (Tier 3), logs show "tier: 3"
```

---

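Test 3 above elides its 20-entry tools array. One way to generate an equivalent request body (a sketch — the stub tool names here are arbitrary placeholders; the other fields mirror the curl examples):

```javascript
// Build a request body with 20 stub tools for the Tier 3 manual test.
const body = {
  model: "claude",
  max_tokens: 50,
  messages: [{ role: "user", content: "Hello" }],
  // 20 minimal tool entries: t1 .. t20
  tools: Array.from({ length: 20 }, (_, i) => ({ name: `t${i + 1}` })),
};

console.log(body.tools.length);            // → 20
console.log(JSON.stringify(body.tools[0])); // → {"name":"t1"}
```

Piping `JSON.stringify(body)` into curl's `-d @-` avoids hand-writing the array.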
## Configuration Examples

### Example 1: Standard 3-Tier (Recommended)
```bash
# Tier 1: Local (FREE)
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# Tier 2: OpenRouter (CHEAP $)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
TIER_2_MAX_TOOLS=15
OPENROUTER_MODEL=openai/gpt-4o-mini

# Tier 3: Databricks (PREMIUM $$$)
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

### Example 2: AWS Bedrock Ecosystem
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: Bedrock with cheap models
TIER_2_ENABLED=true
TIER_2_PROVIDER=bedrock
TIER_2_MAX_TOOLS=15
AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0 # Cheap $0.99/1M

# Tier 3: Bedrock with Claude
FALLBACK_PROVIDER=bedrock
# Would use: us.anthropic.claude-sonnet-4-5-* (expensive $3+/1M)
```

### Example 3: Azure Ecosystem
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: Azure OpenAI mini
TIER_2_ENABLED=true
TIER_2_PROVIDER=azure-openai
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini

# Tier 3: Azure Anthropic
FALLBACK_PROVIDER=azure-anthropic
```

### Example 4: OpenRouter → OpenAI Direct
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: OpenRouter (cheap GPT via aggregator)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
OPENROUTER_MODEL=openai/gpt-4o-mini

# Tier 3: OpenAI Direct (official API, premium)
FALLBACK_PROVIDER=openai
OPENAI_MODEL=gpt-4o
```

---

## Backward Compatibility

### Strategy
Make Tier 2 **opt-in** to preserve existing behavior.

**Old Config (Still Works)**:
```bash
PREFER_OLLAMA=true
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```
Result: 0-2 tools → Ollama, 3+ tools → checks providers in order (legacy mode)

**New Config (Explicit Tiers)**:
```bash
PREFER_OLLAMA=true
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
FALLBACK_PROVIDER=databricks
```
Result: 0-2 tools → Ollama, 3-15 tools → openrouter, 16+ tools → databricks

### Migration Path
1. **Existing users**: No changes needed, system works as before
2. **New users**: Can enable `TIER_2_ENABLED=true` for explicit routing
3. **Documentation**: Show both old and new configs

---

## Summary

### Files to Modify

| File | Changes | Lines | Type |
|------|---------|-------|------|
| `src/config/index.js` | Add tier2 config parsing & validation | ~50 | Modify |
| `src/clients/routing.js` | Replace lines 56-94 with tier logic | ~60 | Modify |
| `.env.example` | Document Tier 2 configuration | ~40 | Add |
| `README.md` | Add 3-Tier Routing section | ~60 | Add |
| `test/tier-routing.test.js` | Unit tests for tier routing | ~120 | Create |

**Total Effort**: ~330 lines of changes/additions

---

## Tier Provider Matrix (Quick Reference)

| Provider | Tier 1 | Tier 2 | Tier 3 |
|----------|--------|--------|--------|
| **ollama** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **llamacpp** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **lmstudio** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **openrouter** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **bedrock** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **azure-openai** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **openai** | ❌ NO | ❌ NO | ✅ ALLOWED |
| **databricks** | ❌ NO | ❌ NO | ✅ ALLOWED |
| **azure-anthropic** | ❌ NO | ❌ NO | ✅ ALLOWED |

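The matrix reads naturally as data. A sketch of how it could back a single validity check (illustrative only — the plan's Phase 1 validation uses separate allow-lists in `src/config/index.js`, and `TIER_MATRIX`/`allowedIn` are hypothetical names):

```javascript
// The provider/tier matrix above, expressed as a lookup table.
const TIER_MATRIX = {
  ollama: [1], llamacpp: [1], lmstudio: [1],
  openrouter: [2, 3], bedrock: [2, 3], "azure-openai": [2, 3],
  openai: [3], databricks: [3], "azure-anthropic": [3],
};

// Unknown providers map to an empty list, i.e. allowed nowhere.
const allowedIn = (provider, tier) => (TIER_MATRIX[provider] ?? []).includes(tier);

console.log(allowedIn("openai", 2));  // → false (Tier 3 only)
console.log(allowedIn("bedrock", 2)); // → true
console.log(allowedIn("ollama", 3));  // → false (Tier 1 only)
```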
---

## Decision Rationale

### Why OpenAI is Tier 3 Only?
1. Direct OpenAI API is premium-priced vs OpenRouter
2. Users choosing direct OpenAI want "official" API = premium intent
3. Tier 2 should be "cost optimization" - use OpenRouter for cheaper GPT access
4. Clear separation: Tier 2 = cheap aggregators, Tier 3 = direct APIs

### Why Local Providers are Tier 1 Only?
1. Local = FREE, doesn't make sense as "fallback" for cloud
2. Tier progression should be: Free → Cheap Cloud → Expensive Cloud
3. If local provider fails, escalate to cloud, not to another local

### Why Same Provider Can Be Tier 2 or Tier 3?
1. **Provider ≠ Tier** - Model choice determines cost
2. Example: Bedrock with Llama ($0.99/1M) = Tier 2, Bedrock with Claude ($3/1M) = Tier 3
3. User controls tier assignment via model configuration

---

## Next Steps

1. ✅ User commits Bedrock changes first
2. ✅ Implement Phase 1 (Config validation)
3. ✅ Implement Phase 2 (Routing logic)
4. ✅ Implement Phase 3 (Documentation)
5. ✅ Implement Phase 4 (Tests)
6. ✅ Test with real requests
7. ✅ Update CHANGELOG.md

---

## Open Questions

_(To be filled in based on user feedback)_

1. Should Tier 2 be opt-in (current plan) or opt-out?
2. Should we add metrics to track tier usage?
3. Should we add auto-learning (if Tier 2 fails X times, skip it)?
4. Should TIER_2_MAX_TOOLS default match OPENROUTER_MAX_TOOLS_FOR_ROUTING (15)?
- 4. Should TIER_2_MAX_TOOLS default match OPENROUTER_MAX_TOOLS_FOR_ROUTING (15)?