lynkr 7.2.5 → 8.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -2
- package/config/model-tiers.json +89 -0
- package/docs/docs.html +1 -0
- package/docs/index.md +7 -0
- package/docs/toon-integration-spec.md +130 -0
- package/documentation/README.md +3 -2
- package/documentation/claude-code-cli.md +23 -16
- package/documentation/cursor-integration.md +17 -14
- package/documentation/docker.md +11 -4
- package/documentation/embeddings.md +7 -5
- package/documentation/faq.md +66 -12
- package/documentation/features.md +22 -15
- package/documentation/installation.md +66 -14
- package/documentation/production.md +43 -8
- package/documentation/providers.md +145 -42
- package/documentation/routing.md +476 -0
- package/documentation/token-optimization.md +7 -5
- package/documentation/troubleshooting.md +81 -5
- package/install.sh +6 -1
- package/package.json +4 -2
- package/scripts/setup.js +0 -1
- package/src/agents/executor.js +14 -6
- package/src/api/middleware/session.js +15 -2
- package/src/api/openai-router.js +130 -37
- package/src/api/providers-handler.js +15 -1
- package/src/api/router.js +107 -2
- package/src/budget/index.js +4 -3
- package/src/clients/databricks.js +431 -234
- package/src/clients/gpt-utils.js +181 -0
- package/src/clients/ollama-utils.js +66 -140
- package/src/clients/routing.js +0 -1
- package/src/clients/standard-tools.js +76 -3
- package/src/config/index.js +113 -35
- package/src/context/toon.js +173 -0
- package/src/logger/index.js +23 -0
- package/src/orchestrator/index.js +686 -211
- package/src/routing/agentic-detector.js +320 -0
- package/src/routing/complexity-analyzer.js +202 -2
- package/src/routing/cost-optimizer.js +305 -0
- package/src/routing/index.js +168 -159
- package/src/routing/model-tiers.js +365 -0
- package/src/server.js +2 -2
- package/src/sessions/cleanup.js +3 -3
- package/src/sessions/record.js +10 -1
- package/src/sessions/store.js +7 -2
- package/src/tools/agent-task.js +48 -1
- package/src/tools/index.js +15 -2
- package/te +11622 -0
- package/test/README.md +1 -1
- package/test/azure-openai-config.test.js +17 -8
- package/test/azure-openai-integration.test.js +7 -1
- package/test/azure-openai-routing.test.js +41 -43
- package/test/bedrock-integration.test.js +18 -32
- package/test/hybrid-routing-integration.test.js +35 -20
- package/test/hybrid-routing-performance.test.js +74 -64
- package/test/llamacpp-integration.test.js +28 -9
- package/test/lmstudio-integration.test.js +20 -8
- package/test/openai-integration.test.js +17 -20
- package/test/performance-tests.js +1 -1
- package/test/routing.test.js +65 -59
- package/test/toon-compression.test.js +131 -0
- package/CLAWROUTER_ROUTING_PLAN.md +0 -910
- package/ROUTER_COMPARISON.md +0 -173
- package/TIER_ROUTING_PLAN.md +0 -771
package/TIER_ROUTING_PLAN.md
DELETED
@@ -1,771 +0,0 @@

# 3-Tier Routing Implementation Plan

## Overview

Add explicit 3-tier configuration to Lynkr for predictable, cost-aware routing:

- **Tier 1**: Local/Free (Ollama, llama.cpp, LM Studio)
- **Tier 2**: Cloud/Cost-Effective (OpenRouter, Bedrock with cheap models, Azure OpenAI mini)
- **Tier 3**: Cloud/Premium (Databricks, Azure Anthropic, Bedrock with premium models, OpenAI direct, etc.)

---

## Current System vs Proposed System

### Current Configuration
```bash
PREFER_OLLAMA=true
OLLAMA_MAX_TOOLS_FOR_ROUTING=3
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

**Current Routing Logic**:
- 0-2 tools → Ollama
- 3+ tools → Checks if OpenRouter/OpenAI/Azure/etc. configured (hardcoded order in routing.js lines 56-93)
- Heavy tools OR not configured → FALLBACK_PROVIDER

**Problem**: No explicit Tier 2 choice. System checks providers in hardcoded priority order.

---

### Proposed Configuration
```bash
# Tier 1: Local (always enabled if PREFER_OLLAMA=true)
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3   # 0-2 tools → Tier 1

# Tier 2: Cost-Effective (NEW - explicit)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter       # Explicit choice
TIER_2_MAX_TOOLS=15              # 3-15 tools → Tier 2

# Tier 3: Premium (existing FALLBACK_PROVIDER)
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks     # 16+ tools → Tier 3
```

**Proposed Routing Logic**:
- 0-2 tools → Tier 1 (ollama/llamacpp/lmstudio)
- 3-15 tools → Tier 2 (openrouter/bedrock/azure-openai) - **explicit, predictable**
- 16+ tools → Tier 3 (databricks/azure-anthropic/bedrock/openrouter/openai/azure-openai) - **explicit, predictable**
- Any tier fails → Try next tier

---

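The thresholds above can be captured as a tiny pure function. This is only an illustrative sketch of the planned decision table, not the actual routing.js implementation; `pickTier` is a hypothetical name, and the defaults mirror `OLLAMA_MAX_TOOLS_FOR_ROUTING=3` and `TIER_2_MAX_TOOLS=15`:

```javascript
// Hypothetical sketch of the proposed tool-count → tier mapping.
// Defaults mirror OLLAMA_MAX_TOOLS_FOR_ROUTING=3 and TIER_2_MAX_TOOLS=15.
function pickTier(toolCount, ollamaMaxTools = 3, tier2MaxTools = 15) {
  if (toolCount < ollamaMaxTools) return 1; // Tier 1: local, free
  if (toolCount <= tier2MaxTools) return 2; // Tier 2: cost-effective cloud
  return 3;                                 // Tier 3: premium cloud
}

console.log(pickTier(2), pickTier(8), pickTier(20)); // 1 2 3
```

Note the boundary semantics: Tier 1 uses a strict `<` (matching "0-2 tools" with a threshold of 3), while Tier 2 uses `<=` (so exactly 15 tools still stays on the cheap tier).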
## Tier Classification (Final)

### Tier 1: Local/Free (No API Costs)
**Providers**: ollama, llamacpp, lmstudio

**Configuration**: Set as `MODEL_PROVIDER` or use `PREFER_OLLAMA=true`

**Use Cases**: Local inference, offline usage, privacy, development, zero API costs

**Tool Count**: 0-2 tools (configured via `OLLAMA_MAX_TOOLS_FOR_ROUTING`)

---

### Tier 2: Cloud/Cost-Effective ($ - Cheap Cloud Only)

**Valid Providers**:
- `openrouter` - 100+ models, cheapest option ($0.15/1M for GPT-4o-mini)
- `bedrock` - AWS ecosystem with cheap models (Llama $0.99/1M, Mistral, Titan)
- `azure-openai` - Azure with cheap deployments (gpt-4o-mini)

**NOT Allowed**:
- ❌ `openai` - Direct OpenAI API is Tier 3 only (premium positioning)
- ❌ `ollama`, `llamacpp`, `lmstudio` - Local providers are Tier 1 only

**Configuration**:
```bash
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter  # or bedrock, azure-openai
TIER_2_MAX_TOOLS=15
```

**Tool Count**: 3-15 tools (configurable via `TIER_2_MAX_TOOLS`)

**Use Cases**: Cost optimization, medium complexity requests, development with cloud

---

### Tier 3: Cloud/Premium ($$$ - Expensive Cloud)

**Valid Providers** (All Cloud Providers):
- `databricks` - Claude Opus/Sonnet ($3-15/1M), enterprise MLOps
- `azure-anthropic` - Azure-hosted Claude ($3-15/1M)
- `bedrock` - AWS with premium models (Claude 4.5 Sonnet $3+/1M)
- `openrouter` - With premium models (GPT-4o, Claude Opus)
- `openai` - Direct OpenAI API (official, premium)
- `azure-openai` - Azure with premium models (GPT-4o, o1)

**NOT Allowed**:
- ❌ `ollama`, `llamacpp`, `lmstudio` - Local providers should not be fallback for cloud

**Configuration**:
```bash
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks  # or any cloud provider above
```

**Tool Count**: 16+ tools, OR when Tier 2 fails

**Use Cases**: Complex reasoning, heavy tool usage, production reliability, premium quality

---
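The "any tier fails → try next tier" escalation can be sketched as a simple loop. This is a hedged illustration only: `callProvider` and `callWithTierFallback` are placeholder names, not Lynkr's actual orchestrator API, and real error handling would be more selective about which failures justify escalation:

```javascript
// Hypothetical escalation chain: walk the tiers in order, escalate on failure.
// `callProvider` is a placeholder for whatever actually invokes a provider.
async function callWithTierFallback(tiers, callProvider) {
  let lastError;
  for (const provider of tiers) {
    try {
      return await callProvider(provider); // first tier that succeeds wins
    } catch (err) {
      lastError = err; // remember why this tier failed, then try the next
    }
  }
  throw lastError; // every tier failed
}
```

Called as, say, `callWithTierFallback(["ollama", "openrouter", "databricks"], fn)`, a Tier 1 outage degrades gracefully to Tier 2 and only reaches the expensive Tier 3 provider last.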
## Implementation Phases

### Phase 1: Configuration (src/config/index.js)

**File**: `src/config/index.js`

**Location**: After line 109 (following existing model provider config)

**Add Environment Variable Parsing**:
```javascript
// Tier 2 configuration (explicit middle tier)
const tier2Enabled = process.env.TIER_2_ENABLED?.toLowerCase() === "true";
const tier2Provider = process.env.TIER_2_PROVIDER?.trim()?.toLowerCase() || null;
const tier2MaxTools = parseInt(process.env.TIER_2_MAX_TOOLS, 10) || 15;
```

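Note how the `|| 15` fallback behaves: an unset or non-numeric `TIER_2_MAX_TOOLS` silently becomes 15, and so does an explicit `0`, since `parseInt` then yields a falsy value. The same parsing, isolated for illustration (`parseTier2MaxTools` is a hypothetical helper name, not part of the config module):

```javascript
// Mirrors the env parsing above, isolated so the fallback behavior is visible.
function parseTier2MaxTools(raw) {
  return parseInt(raw, 10) || 15;
}

console.log(parseTier2MaxTools("10"));      // 10
console.log(parseTier2MaxTools(undefined)); // 15 (parseInt yields NaN)
console.log(parseTier2MaxTools("abc"));     // 15 (NaN again)
console.log(parseTier2MaxTools("0"));       // 15 (0 is falsy, falls back)
```

If `TIER_2_MAX_TOOLS=0` ever needs to mean "never use Tier 2", the parsing would have to use a `Number.isFinite` check instead of `||`.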
**Add Validation Logic** (after line 226):
```javascript
// Validate Tier 2 if enabled
if (tier2Enabled) {
  if (!tier2Provider) {
    throw new Error(
      "TIER_2_ENABLED is true but TIER_2_PROVIDER is not set. " +
        "Set TIER_2_PROVIDER to: openrouter, bedrock, azure-openai"
    );
  }

  const validTier2Providers = ["openrouter", "bedrock", "azure-openai"];
  if (!validTier2Providers.includes(tier2Provider)) {
    throw new Error(
      `TIER_2_PROVIDER '${tier2Provider}' is invalid. ` +
        `Valid cost-effective cloud providers: ${validTier2Providers.join(", ")}. ` +
        `Note: OpenAI direct API is Tier 3 only (use openrouter for cheaper OpenAI access). ` +
        `Local providers (ollama, llamacpp, lmstudio) should use Tier 1.`
    );
  }

  // Verify Tier 2 provider is configured
  const providerConfigured = {
    openrouter: config.openrouter?.apiKey,
    bedrock: config.bedrock?.apiKey,
    "azure-openai": config.azureOpenAI?.apiKey,
  };

  if (!providerConfigured[tier2Provider]) {
    throw new Error(
      `TIER_2_PROVIDER is set to '${tier2Provider}' but this provider is not configured. ` +
        `Please configure ${tier2Provider.toUpperCase()} environment variables.`
    );
  }
}

// Validate Tier 3 (FALLBACK_PROVIDER) - prevent local providers
if (fallbackEnabled) {
  const localProviders = ["ollama", "llamacpp", "lmstudio"];

  if (localProviders.includes(fallbackProvider)) {
    throw new Error(
      `FALLBACK_PROVIDER cannot be '${fallbackProvider}' (local provider). ` +
        `Tier 3 fallback should be a cloud provider: databricks, azure-anthropic, bedrock, openrouter, openai, azure-openai. ` +
        `Local providers (ollama, llamacpp, lmstudio) should only be used as Tier 1.`
    );
  }
}
```

**Export Tier 2 Config** (after line 446 in modelProvider section):
```javascript
modelProvider: {
  type: modelProvider,
  preferOllama,
  fallbackEnabled,
  fallbackProvider,
  ollamaMaxToolsForRouting,
  openRouterMaxToolsForRouting,
  // NEW: Tier 2 configuration
  tier2Enabled,
  tier2Provider,
  tier2MaxTools,
},
```

---

### Phase 2: Routing Logic (src/clients/routing.js)

**File**: `src/clients/routing.js`

**Replace Lines 56-94** with new tier-based logic:

```javascript
// Moderate tool count → check if Tier 2 is enabled
if (toolCount < maxToolsForOpenRouter && isFallbackEnabled()) {
  const tier2Enabled = config.modelProvider?.tier2Enabled ?? false;
  const tier2Provider = config.modelProvider?.tier2Provider;
  const tier2MaxTools = config.modelProvider?.tier2MaxTools ?? 15;

  // If Tier 2 explicitly enabled, route to the configured provider
  if (tier2Enabled && toolCount <= tier2MaxTools) {
    logger.debug(
      { toolCount, tier: 2, provider: tier2Provider, tier2MaxTools, decision: tier2Provider },
      "Routing to Tier 2 (explicit cost-effective cloud)"
    );
    return tier2Provider;
  }

  // If Tier 2 disabled, check providers in order (backward compatibility)
  if (!tier2Enabled) {
    logger.debug({ toolCount }, "Tier 2 disabled, using legacy provider check order");

    if (config.openrouter?.apiKey) {
      logger.debug({ toolCount, decision: "openrouter" }, "Routing to OpenRouter (legacy mode)");
      return "openrouter";
    } else if (config.openai?.apiKey) {
      logger.debug({ toolCount, decision: "openai" }, "Routing to OpenAI (legacy mode)");
      return "openai";
    } else if (config.azureOpenAI?.apiKey) {
      logger.debug({ toolCount, decision: "azure-openai" }, "Routing to Azure OpenAI (legacy mode)");
      return "azure-openai";
    } else if (config.llamacpp?.endpoint) {
      logger.debug({ toolCount, decision: "llamacpp" }, "Routing to llama.cpp (legacy mode)");
      return "llamacpp";
    } else if (config.lmstudio?.endpoint) {
      logger.debug({ toolCount, decision: "lmstudio" }, "Routing to LM Studio (legacy mode)");
      return "lmstudio";
    } else if (config.bedrock?.apiKey) {
      logger.debug({ toolCount, decision: "bedrock" }, "Routing to AWS Bedrock (legacy mode)");
      return "bedrock";
    }
  }
}

// Heavy tool count → Tier 3 (fallback provider)
if (isFallbackEnabled()) {
  const fallback = config.modelProvider?.fallbackProvider ?? "databricks";
  logger.debug(
    { toolCount, tier: 3, provider: fallback, decision: fallback },
    "Routing to Tier 3 (premium cloud - heavy tools or Tier 2 threshold exceeded)"
  );
  return fallback;
}

// Fallback disabled, route to Ollama regardless of complexity
logger.debug(
  { toolCount, maxToolsForOllama, fallbackEnabled: false, decision: "ollama" },
  "Routing to Ollama (fallback disabled)"
);
return "ollama";
```

**Add Helper Function** (after line 130):
```javascript
/**
 * Get the tier for the current request based on tool count
 *
 * @param {number} toolCount - Number of tools in request
 * @returns {number|null} Tier number (1, 2, or 3), or null when tiered routing is off
 */
function getTierForRequest(toolCount) {
  const preferOllama = config.modelProvider?.preferOllama ?? false;
  if (!preferOllama) return null; // Not using tiered routing

  const ollamaMaxTools = config.modelProvider?.ollamaMaxToolsForRouting ?? 3;
  const tier2MaxTools = config.modelProvider?.tier2MaxTools ?? 15;

  if (toolCount < ollamaMaxTools) return 1; // Ollama
  if (toolCount <= tier2MaxTools) return 2; // Tier 2
  return 3; // Tier 3
}

module.exports = {
  determineProvider,
  isFallbackEnabled,
  getFallbackProvider,
  getTierForRequest, // NEW
};
```

---

### Phase 3: Documentation Updates

#### 3.1 Update .env.example

**File**: `.env.example`

**Add After Line 36** (after OLLAMA_MAX_TOOLS_FOR_ROUTING):

```bash
# ==============================================================================
# Tier 2 Configuration (Explicit Middle Tier - Cost-Effective Cloud)
# ==============================================================================

# Enable Tier 2 routing for cost-effective cloud provider
# When enabled, requests with 3-15 tools route to this provider instead of checking providers in order
# TIER_2_ENABLED=true

# Which provider to use for Tier 2 (cost-effective cloud only)
# Options: openrouter, bedrock, azure-openai
# NOT allowed: openai (Tier 3 only), ollama/llamacpp/lmstudio (Tier 1 only)
# TIER_2_PROVIDER=openrouter

# Maximum tools for Tier 2 routing (requests above this go to Tier 3)
# Default: 15
# TIER_2_MAX_TOOLS=15

# ==============================================================================
# 3-Tier Routing Configuration Example
# ==============================================================================
#
# ┌─────────────┬────────────────┬──────────────────┬─────────────┐
# │ Tool Count  │ Tier           │ Provider         │ Cost        │
# ├─────────────┼────────────────┼──────────────────┼─────────────┤
# │ 0-2 tools   │ Tier 1 (Local) │ Ollama           │ FREE        │
# │ 3-15 tools  │ Tier 2 (Cloud) │ OpenRouter       │ $ (cheap)   │
# │ 16+ tools   │ Tier 3 (Cloud) │ Databricks       │ $$$ (exp)   │
# └─────────────┴────────────────┴──────────────────┴─────────────┘
#
# Complete Example:
# PREFER_OLLAMA=true
# OLLAMA_MAX_TOOLS_FOR_ROUTING=3
# TIER_2_ENABLED=true
# TIER_2_PROVIDER=openrouter
# TIER_2_MAX_TOOLS=15
# FALLBACK_ENABLED=true
# FALLBACK_PROVIDER=databricks
```

#### 3.2 Update README.md

**File**: `README.md`

**Add Section After "Hybrid Routing" Section** (after line ~275):

````markdown
### **3-Tier Routing (Explicit Cost Control)**

For predictable cost management, use explicit tier configuration:

```bash
# Tier 1: Local (FREE) - 0-2 tools
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# Tier 2: Cost-Effective (CHEAP $) - 3-15 tools
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
TIER_2_MAX_TOOLS=15

# Tier 3: Premium (EXPENSIVE $$$) - 16+ tools
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

**How 3-Tier Routing Works**:

| Tool Count | Tier | Provider | Cost | Example |
|------------|------|----------|------|---------|
| 0-2 tools | **Tier 1** (Local) | Ollama | FREE | Simple questions, basic file reads |
| 3-15 tools | **Tier 2** (Cloud) | OpenRouter | $ (cheap) | Medium complexity, moderate tool usage |
| 16+ tools | **Tier 3** (Cloud) | Databricks | $$$ (expensive) | Complex refactoring, heavy analysis |

**Cost Predictability**:
```
Simple request (1 tool):    Tier 1 → FREE
Medium request (8 tools):   Tier 2 → $0.15 per 1M tokens (OpenRouter)
Complex request (20 tools): Tier 3 → $3.00 per 1M tokens (Databricks)
```

**Tier 2 Valid Providers** (Cost-Effective):
- `openrouter` - 100+ models, cheapest option ($0.15/1M)
- `bedrock` - AWS with cheap models (Llama, Mistral, Titan)
- `azure-openai` - Azure with mini deployments

**Tier 3 Valid Providers** (Premium):
- `databricks` - Claude Opus/Sonnet, enterprise
- `azure-anthropic` - Azure-hosted Claude
- `bedrock` - AWS with Claude 4.5 Sonnet
- `openrouter` - With premium models
- `openai` - Direct OpenAI API (premium only)
- `azure-openai` - Azure with premium models

⚠️ **Note**: Direct OpenAI API (`openai`) is Tier 3 only. For cheaper GPT access, use `openrouter` in Tier 2.

**Automatic Fallback**: If any tier fails, automatically tries the next tier for resilience.
````

---

### Phase 4: Testing Strategy

#### 4.1 Unit Tests

**File**: `test/tier-routing.test.js` (NEW FILE)

```javascript
const assert = require("assert");
const { describe, it, beforeEach, afterEach } = require("node:test");

describe("3-Tier Routing", () => {
  let originalEnv;

  beforeEach(() => {
    originalEnv = { ...process.env };
    delete require.cache[require.resolve("../src/config")];
    delete require.cache[require.resolve("../src/clients/routing")];
  });

  afterEach(() => {
    process.env = originalEnv;
  });

  describe("Tier 1 (Local)", () => {
    it("should route 0-2 tools to Tier 1 (Ollama)", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.OLLAMA_MAX_TOOLS_FOR_ROUTING = "3";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: [{}, {}] }); // 2 tools

      assert.strictEqual(provider, "ollama");
    });
  });

  describe("Tier 2 (Cost-Effective Cloud)", () => {
    it("should route 3-15 tools to Tier 2 when enabled", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openrouter";
      process.env.TIER_2_MAX_TOOLS = "15";
      process.env.OPENROUTER_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(8).fill({}) }); // 8 tools

      assert.strictEqual(provider, "openrouter");
    });

    it("should route to bedrock when configured as Tier 2", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "bedrock";
      process.env.AWS_BEDROCK_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(10).fill({}) });

      assert.strictEqual(provider, "bedrock");
    });

    it("should NOT allow openai in Tier 2", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openai";

      assert.throws(
        () => require("../src/config"),
        /OpenAI direct API is Tier 3 only/
      );
    });

    it("should NOT allow local providers in Tier 2", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "ollama";

      assert.throws(
        () => require("../src/config"),
        /Local providers.*should use Tier 1/
      );
    });
  });

  describe("Tier 3 (Premium Cloud)", () => {
    it("should route 16+ tools to Tier 3", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_MAX_TOOLS = "15";
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "databricks";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(20).fill({}) }); // 20 tools

      assert.strictEqual(provider, "databricks");
    });

    it("should allow openai in Tier 3", () => {
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "openai";
      process.env.OPENAI_API_KEY = "test-key";

      const config = require("../src/config");
      assert.strictEqual(config.modelProvider.fallbackProvider, "openai");
    });

    it("should NOT allow local providers in Tier 3", () => {
      process.env.FALLBACK_ENABLED = "true";
      process.env.FALLBACK_PROVIDER = "ollama";

      assert.throws(
        () => require("../src/config"),
        /FALLBACK_PROVIDER cannot be 'ollama'/
      );
    });
  });

  describe("Validation", () => {
    it("should throw error if Tier 2 enabled but provider not set", () => {
      process.env.TIER_2_ENABLED = "true";
      // Missing TIER_2_PROVIDER

      assert.throws(
        () => require("../src/config"),
        /TIER_2_PROVIDER is not set/
      );
    });

    it("should throw error if Tier 2 provider not configured", () => {
      process.env.TIER_2_ENABLED = "true";
      process.env.TIER_2_PROVIDER = "openrouter";
      // Missing OPENROUTER_API_KEY

      assert.throws(
        () => require("../src/config"),
        /openrouter.*is not configured/
      );
    });
  });

  describe("Backward Compatibility", () => {
    it("should fall back to legacy mode when Tier 2 disabled", () => {
      process.env.PREFER_OLLAMA = "true";
      process.env.TIER_2_ENABLED = "false";
      process.env.OPENROUTER_API_KEY = "test-key";

      const { determineProvider } = require("../src/clients/routing");
      const provider = determineProvider({ tools: Array(8).fill({}) }); // 8 tools

      assert.strictEqual(provider, "openrouter"); // Legacy order check
    });
  });
});
```

#### 4.2 Manual Integration Tests

```bash
# Test 1: Tier 1 routing (0-2 tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [{"name": "test1"}]
  }'
# Expected: Routes to Ollama, logs show "Tier 1"

# Test 2: Tier 2 routing (3-15 tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [
      {"name": "t1"}, {"name": "t2"}, {"name": "t3"},
      {"name": "t4"}, {"name": "t5"}, {"name": "t6"}
    ]
  }'
# Expected: Routes to openrouter (Tier 2), logs show "tier: 2"

# Test 3: Tier 3 routing (16+ tools)
curl -X POST http://localhost:8081/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [... 20 tools ...]
  }'
# Expected: Routes to databricks (Tier 3), logs show "tier: 3"
```

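A large tool list like the `[... 20 tools ...]` placeholder in Test 3 is tedious to type by hand. A small Node helper can generate the request body; this is a convenience sketch only (`buildRequestBody` is a hypothetical name and assumes nothing about Lynkr itself):

```javascript
// Build an n-tool request body matching the manual curl tests above.
function buildRequestBody(toolCount) {
  return {
    model: "claude",
    max_tokens: 50,
    messages: [{ role: "user", content: "Hello" }],
    tools: Array.from({ length: toolCount }, (_, i) => ({ name: `t${i + 1}` })),
  };
}

// Print the JSON so it can be piped into curl via `-d @-`.
console.log(JSON.stringify(buildRequestBody(20)));
```

Saved as, say, a one-off script, its output can be fed to curl with `-d @-` (curl reads the request body from stdin when given `@-`).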
---

## Configuration Examples

### Example 1: Standard 3-Tier (Recommended)
```bash
# Tier 1: Local (FREE)
PREFER_OLLAMA=true
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# Tier 2: OpenRouter (CHEAP $)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
TIER_2_MAX_TOOLS=15
OPENROUTER_MODEL=openai/gpt-4o-mini

# Tier 3: Databricks (PREMIUM $$$)
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

### Example 2: AWS Bedrock Ecosystem
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: Bedrock with cheap models
TIER_2_ENABLED=true
TIER_2_PROVIDER=bedrock
TIER_2_MAX_TOOLS=15
AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0  # Cheap $0.99/1M

# Tier 3: Bedrock with Claude
FALLBACK_PROVIDER=bedrock
# Would use: us.anthropic.claude-sonnet-4-5-* (expensive $3+/1M)
```

### Example 3: Azure Ecosystem
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: Azure OpenAI mini
TIER_2_ENABLED=true
TIER_2_PROVIDER=azure-openai
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini

# Tier 3: Azure Anthropic
FALLBACK_PROVIDER=azure-anthropic
```

### Example 4: OpenRouter → OpenAI Direct
```bash
# Tier 1: Local
PREFER_OLLAMA=true

# Tier 2: OpenRouter (cheap GPT via aggregator)
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
OPENROUTER_MODEL=openai/gpt-4o-mini

# Tier 3: OpenAI Direct (official API, premium)
FALLBACK_PROVIDER=openai
OPENAI_MODEL=gpt-4o
```

---

## Backward Compatibility

### Strategy
Make Tier 2 **opt-in** to preserve existing behavior.

**Old Config (Still Works)**:
```bash
PREFER_OLLAMA=true
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```
Result: 0-2 tools → Ollama, 3+ tools → checks providers in order (legacy mode)

**New Config (Explicit Tiers)**:
```bash
PREFER_OLLAMA=true
TIER_2_ENABLED=true
TIER_2_PROVIDER=openrouter
FALLBACK_PROVIDER=databricks
```
Result: 0-2 tools → Ollama, 3-15 tools → openrouter, 16+ tools → databricks

### Migration Path
1. **Existing users**: No changes needed, system works as before
2. **New users**: Can enable `TIER_2_ENABLED=true` for explicit routing
3. **Documentation**: Show both old and new configs

---

## Summary

### Files to Modify

| File | Changes | Lines | Type |
|------|---------|-------|------|
| `src/config/index.js` | Add tier2 config parsing & validation | ~50 | Modify |
| `src/clients/routing.js` | Replace lines 56-94 with tier logic | ~60 | Modify |
| `.env.example` | Document Tier 2 configuration | ~40 | Add |
| `README.md` | Add 3-Tier Routing section | ~60 | Add |
| `test/tier-routing.test.js` | Unit tests for tier routing | ~120 | Create |

**Total Effort**: ~330 lines of changes/additions

---

## Tier Provider Matrix (Quick Reference)

| Provider | Tier 1 | Tier 2 | Tier 3 |
|----------|--------|--------|--------|
| **ollama** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **llamacpp** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **lmstudio** | ✅ PRIMARY | ❌ NO | ❌ NO |
| **openrouter** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **bedrock** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **azure-openai** | ❌ NO | ✅ ALLOWED | ✅ ALLOWED |
| **openai** | ❌ NO | ❌ NO | ✅ ALLOWED |
| **databricks** | ❌ NO | ❌ NO | ✅ ALLOWED |
| **azure-anthropic** | ❌ NO | ❌ NO | ✅ ALLOWED |
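The matrix could also back the Phase 1 validation as a single lookup table instead of separate allow-lists. An illustrative sketch only (`TIER_MATRIX` and `isAllowedInTier` are hypothetical names; Phase 1 as planned hardcodes the lists):

```javascript
// The matrix above as data: which tiers each provider may serve.
const TIER_MATRIX = {
  ollama: [1], llamacpp: [1], lmstudio: [1],
  openrouter: [2, 3], bedrock: [2, 3], "azure-openai": [2, 3],
  openai: [3], databricks: [3], "azure-anthropic": [3],
};

function isAllowedInTier(provider, tier) {
  return (TIER_MATRIX[provider] ?? []).includes(tier);
}

console.log(isAllowedInTier("openrouter", 2)); // true
console.log(isAllowedInTier("openai", 2));     // false (Tier 3 only)
```

One table keeps the Tier 2 and Tier 3 validation rules from drifting apart if the provider list grows.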

---

## Decision Rationale

### Why OpenAI is Tier 3 Only?
1. Direct OpenAI API is premium-priced vs OpenRouter
2. Users choosing direct OpenAI want "official" API = premium intent
3. Tier 2 should be "cost optimization" - use OpenRouter for cheaper GPT access
4. Clear separation: Tier 2 = cheap aggregators, Tier 3 = direct APIs

### Why Local Providers are Tier 1 Only?
1. Local = FREE, doesn't make sense as "fallback" for cloud
2. Tier progression should be: Free → Cheap Cloud → Expensive Cloud
3. If local provider fails, escalate to cloud, not to another local

### Why Same Provider Can Be Tier 2 or Tier 3?
1. **Provider ≠ Tier** - Model choice determines cost
2. Example: Bedrock with Llama ($0.99/1M) = Tier 2, Bedrock with Claude ($3/1M) = Tier 3
3. User controls tier assignment via model configuration
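The "provider ≠ tier" point can be made concrete with a per-tier model map. This is purely illustrative (`bedrockByTier` is a hypothetical structure, not an existing Lynkr config; the model IDs echo Example 2 above, including its `-*` wildcard placeholder):

```javascript
// Same provider on two tiers - the model, not the provider, drives the cost.
// Model IDs echo Example 2; the tier-3 entry keeps that example's wildcard.
const bedrockByTier = {
  2: { provider: "bedrock", model: "meta.llama3-1-8b-instruct-v1:0" },  // ~$0.99/1M
  3: { provider: "bedrock", model: "us.anthropic.claude-sonnet-4-5-*" }, // ~$3+/1M
};

console.log(bedrockByTier[2].model); // the cheap Llama deployment
```

The tier boundary is thus a configuration decision per deployment, not a property of the provider itself.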

---

## Next Steps

1. ✅ User commits Bedrock changes first
2. ✅ Implement Phase 1 (Config validation)
3. ✅ Implement Phase 2 (Routing logic)
4. ✅ Implement Phase 3 (Documentation)
5. ✅ Implement Phase 4 (Tests)
6. ✅ Test with real requests
7. ✅ Update CHANGELOG.md

---

## Open Questions

_(To be filled in based on user feedback)_

1. Should Tier 2 be opt-in (current plan) or opt-out?
2. Should we add metrics to track tier usage?
3. Should we add auto-learning (if Tier 2 fails X times, skip it)?
4. Should TIER_2_MAX_TOOLS default match OPENROUTER_MAX_TOOLS_FOR_ROUTING (15)?