lynkr 8.0.0 → 9.0.1

Files changed (128)
  1. package/.lynkr/telemetry.db +0 -0
  2. package/.lynkr/telemetry.db-shm +0 -0
  3. package/.lynkr/telemetry.db-wal +0 -0
  4. package/README.md +196 -322
  5. package/lynkr-skill.tar.gz +0 -0
  6. package/package.json +4 -3
  7. package/src/api/openai-router.js +64 -13
  8. package/src/api/providers-handler.js +171 -3
  9. package/src/api/router.js +9 -2
  10. package/src/clients/circuit-breaker.js +10 -247
  11. package/src/clients/codex-process.js +342 -0
  12. package/src/clients/codex-utils.js +143 -0
  13. package/src/clients/databricks.js +210 -63
  14. package/src/clients/resilience.js +540 -0
  15. package/src/clients/retry.js +22 -167
  16. package/src/clients/standard-tools.js +23 -0
  17. package/src/config/index.js +77 -0
  18. package/src/context/compression.js +42 -9
  19. package/src/context/distill.js +492 -0
  20. package/src/orchestrator/index.js +48 -8
  21. package/src/routing/complexity-analyzer.js +258 -5
  22. package/src/routing/index.js +12 -2
  23. package/src/routing/latency-tracker.js +148 -0
  24. package/src/routing/model-tiers.js +2 -0
  25. package/src/routing/quality-scorer.js +113 -0
  26. package/src/routing/telemetry.js +464 -0
  27. package/src/server.js +13 -12
  28. package/src/tools/code-graph.js +538 -0
  29. package/src/tools/code-mode.js +304 -0
  30. package/src/tools/index.js +4 -0
  31. package/src/tools/lazy-loader.js +18 -0
  32. package/src/tools/mcp-remote.js +7 -0
  33. package/src/tools/smart-selection.js +11 -0
  34. package/src/tools/tinyfish.js +358 -0
  35. package/src/tools/truncate.js +1 -0
  36. package/src/utils/payload.js +206 -0
  37. package/src/utils/perf-timer.js +80 -0
  38. package/.github/FUNDING.yml +0 -15
  39. package/.github/workflows/README.md +0 -215
  40. package/.github/workflows/ci.yml +0 -69
  41. package/.github/workflows/index.yml +0 -62
  42. package/.github/workflows/web-tools-tests.yml +0 -56
  43. package/CITATIONS.bib +0 -6
  44. package/DEPLOYMENT.md +0 -1001
  45. package/LYNKR-TUI-PLAN.md +0 -984
  46. package/PERFORMANCE-REPORT.md +0 -866
  47. package/PLAN-per-client-model-routing.md +0 -252
  48. package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
  49. package/docs/BingSiteAuth.xml +0 -4
  50. package/docs/docs-style.css +0 -478
  51. package/docs/docs.html +0 -198
  52. package/docs/google5be250e608e6da39.html +0 -1
  53. package/docs/index.html +0 -577
  54. package/docs/index.md +0 -584
  55. package/docs/robots.txt +0 -4
  56. package/docs/sitemap.xml +0 -44
  57. package/docs/style.css +0 -1223
  58. package/docs/toon-integration-spec.md +0 -130
  59. package/documentation/README.md +0 -101
  60. package/documentation/api.md +0 -806
  61. package/documentation/claude-code-cli.md +0 -679
  62. package/documentation/codex-cli.md +0 -397
  63. package/documentation/contributing.md +0 -571
  64. package/documentation/cursor-integration.md +0 -734
  65. package/documentation/docker.md +0 -874
  66. package/documentation/embeddings.md +0 -762
  67. package/documentation/faq.md +0 -713
  68. package/documentation/features.md +0 -403
  69. package/documentation/headroom.md +0 -519
  70. package/documentation/installation.md +0 -758
  71. package/documentation/memory-system.md +0 -476
  72. package/documentation/production.md +0 -636
  73. package/documentation/providers.md +0 -1009
  74. package/documentation/routing.md +0 -476
  75. package/documentation/testing.md +0 -629
  76. package/documentation/token-optimization.md +0 -325
  77. package/documentation/tools.md +0 -697
  78. package/documentation/troubleshooting.md +0 -969
  79. package/final-test.js +0 -33
  80. package/headroom-sidecar/config.py +0 -93
  81. package/headroom-sidecar/requirements.txt +0 -14
  82. package/headroom-sidecar/server.py +0 -451
  83. package/monitor-agents.sh +0 -31
  84. package/scripts/audit-log-reader.js +0 -399
  85. package/scripts/compact-dictionary.js +0 -204
  86. package/scripts/test-deduplication.js +0 -448
  87. package/src/db/database.sqlite +0 -0
  88. package/te +0 -11622
  89. package/test/README.md +0 -212
  90. package/test/azure-openai-config.test.js +0 -213
  91. package/test/azure-openai-error-resilience.test.js +0 -238
  92. package/test/azure-openai-format-conversion.test.js +0 -354
  93. package/test/azure-openai-integration.test.js +0 -287
  94. package/test/azure-openai-routing.test.js +0 -175
  95. package/test/azure-openai-streaming.test.js +0 -171
  96. package/test/bedrock-integration.test.js +0 -457
  97. package/test/comprehensive-test-suite.js +0 -928
  98. package/test/config-validation.test.js +0 -207
  99. package/test/cursor-integration.test.js +0 -484
  100. package/test/format-conversion.test.js +0 -578
  101. package/test/hybrid-routing-integration.test.js +0 -269
  102. package/test/hybrid-routing-performance.test.js +0 -428
  103. package/test/llamacpp-integration.test.js +0 -882
  104. package/test/lmstudio-integration.test.js +0 -347
  105. package/test/memory/extractor.test.js +0 -398
  106. package/test/memory/retriever.test.js +0 -613
  107. package/test/memory/retriever.test.js.bak +0 -585
  108. package/test/memory/search.test.js +0 -537
  109. package/test/memory/search.test.js.bak +0 -389
  110. package/test/memory/store.test.js +0 -344
  111. package/test/memory/store.test.js.bak +0 -312
  112. package/test/memory/surprise.test.js +0 -300
  113. package/test/memory-performance.test.js +0 -472
  114. package/test/openai-integration.test.js +0 -683
  115. package/test/openrouter-error-resilience.test.js +0 -418
  116. package/test/passthrough-mode.test.js +0 -385
  117. package/test/performance-benchmark.js +0 -351
  118. package/test/performance-tests.js +0 -528
  119. package/test/routing.test.js +0 -225
  120. package/test/toon-compression.test.js +0 -131
  121. package/test/web-tools.test.js +0 -329
  122. package/test-agents-simple.js +0 -43
  123. package/test-cli-connection.sh +0 -33
  124. package/test-learning-unit.js +0 -126
  125. package/test-learning.js +0 -112
  126. package/test-parallel-agents.sh +0 -124
  127. package/test-parallel-direct.js +0 -155
  128. package/test-subagents.sh +0 -117
@@ -1,476 +0,0 @@
1
- # Intelligent Routing & Model Tiering
2
-
3
- Lynkr's intelligent routing system automatically selects the optimal model and provider for each request based on complexity analysis, agentic workflow detection, and cost optimization.
4
-
5
- ---
6
-
7
- ## Overview
8
-
9
- ```
10
- Request → Force Patterns → Tool Thresholds → Complexity Analysis → Agentic Detection → Tier Selection → Cost Optimization → Provider
11
- ```
12
-
13
- The routing pipeline evaluates every incoming request through multiple stages to determine which model tier and provider should handle it. Simple requests go to cheap or local models; complex ones go to powerful cloud models.
14
-
15
- **Key benefits:**
16
- - 60-80% cost reduction by routing simple tasks to cheaper models
17
- - Better quality on complex tasks by using capable models when needed
18
- - Automatic agentic workflow detection with tier upgrades
19
- - Multi-source pricing for optimal cost decisions
20
-
21
- ---
22
-
23
- ## 4-Tier Model System
24
-
25
- Every request is mapped to one of four complexity tiers:
26
-
27
- | Tier | Score Range | Description | Example Tasks |
28
- |------|-----------|-------------|---------------|
29
- | **SIMPLE** | 0-25 | Greetings, simple Q&A, confirmations | "Hello", "What is a variable?", "Yes" |
30
- | **MEDIUM** | 26-50 | Code reading, simple edits, research | "Read this file", "Fix this typo", "Search for X" |
31
- | **COMPLEX** | 51-75 | Multi-file changes, debugging, architecture | "Refactor auth module", "Debug this race condition" |
32
- | **REASONING** | 76-100 | Complex analysis, security audits, novel problems | "Security audit", "Design microservices architecture" |
33
-
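The score ranges in the tier table can be sketched as a small lookup. This is an illustrative helper, not code from the package; the name `scoreToTier` is hypothetical, and only the range boundaries are taken from the table.

```javascript
// Hypothetical helper: map a 0-100 complexity score to a tier name,
// using the score ranges listed in the tier table.
function scoreToTier(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score <= 25) return "SIMPLE";
  if (score <= 50) return "MEDIUM";
  if (score <= 75) return "COMPLEX";
  return "REASONING";
}
```

With these boundaries, a score of 65 falls in the COMPLEX tier and a score of 76 in REASONING.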
34
- ### Configuration
35
-
36
- Tiers are configured via mandatory environment variables in `provider:model` format:
37
-
38
- ```bash
39
- # Required - one per tier
40
- TIER_SIMPLE=ollama:llama3.2
41
- TIER_MEDIUM=openai:gpt-4o
42
- TIER_COMPLEX=openai:o1-mini
43
- TIER_REASONING=openai:o1
44
-
45
- # Examples with other providers
46
- TIER_SIMPLE=ollama:qwen2.5-coder
47
- TIER_MEDIUM=databricks:databricks-claude-sonnet-4-5
48
- TIER_COMPLEX=azure-openai:gpt-5.2-chat
49
- TIER_REASONING=databricks:databricks-claude-opus-4-6
50
- ```
51
-
52
- If a model name is given without a provider prefix, the default provider (`MODEL_PROVIDER`) is used.
53
-
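A minimal sketch of how a `provider:model` value might be parsed, including the fallback to `MODEL_PROVIDER` when no prefix is given. The function name `parseTierSpec` is hypothetical, and splitting on the first colon is an assumption; the source does not say how model names that themselves contain a colon are handled.

```javascript
// Hypothetical parser for TIER_* values in `provider:model` format.
// Assumption: the first colon separates provider from model.
function parseTierSpec(value, defaultProvider) {
  const trimmed = String(value).trim();
  const sep = trimmed.indexOf(":");
  if (sep === -1) {
    // No provider prefix: fall back to the default provider (MODEL_PROVIDER).
    return { provider: defaultProvider, model: trimmed };
  }
  return {
    provider: trimmed.slice(0, sep),
    model: trimmed.slice(sep + 1),
  };
}
```

For example, `parseTierSpec("llama3.2", "ollama")` resolves to provider `ollama` and model `llama3.2`.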
54
- ### Routing Precedence
55
-
56
- There are three routing-related settings. Here is exactly how they interact:
57
-
58
- #### 1. `TIER_*` Environment Variables (Highest Priority)
59
-
60
- When **all four** `TIER_*` vars are set (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`), tiered routing is **active**. Every incoming request is scored for complexity (0-100), mapped to a tier, and routed to the `provider:model` specified in the matching `TIER_*` var.
61
-
62
- In this mode, `MODEL_PROVIDER` is **not consulted** for routing decisions. The provider comes directly from the `TIER_*` value (e.g., `ollama:llama3.2` routes to Ollama, `openai:gpt-4o` routes to OpenAI).
63
-
64
- If any of the four `TIER_*` vars are missing, tiered routing is **completely disabled** and the system falls back to `MODEL_PROVIDER`.
65
-
66
- #### 2. `MODEL_PROVIDER` (Default / Fallback)
67
-
68
- `MODEL_PROVIDER` controls routing in two scenarios:
69
-
70
- - **When tiered routing is disabled** (any `TIER_*` var missing) — all requests go to the provider set in `MODEL_PROVIDER`, regardless of complexity. This is static routing.
71
- - **When a `TIER_*` value has no provider prefix** (e.g., `TIER_SIMPLE=llama3.2` instead of `TIER_SIMPLE=ollama:llama3.2`) — `MODEL_PROVIDER` is used as the default provider for that tier.
72
-
73
- Even when tiered routing is active and overrides it for request routing, `MODEL_PROVIDER` is still used for:
74
- - **Startup checks** — e.g., if `MODEL_PROVIDER=ollama`, the server waits for Ollama to be reachable before accepting requests
75
- - **Provider discovery API** (`/v1/providers`) — marks which provider is "primary" in the response
76
- - **Embeddings routing** — the OpenAI-compatible router checks `MODEL_PROVIDER` for embedding provider selection
77
-
78
- **Always set `MODEL_PROVIDER`** even when using tier routing.
79
-
80
- #### 3. `PREFER_OLLAMA` (Removed)
81
-
82
- `PREFER_OLLAMA` has been **removed and has no effect**. If it is still set, a warning is logged at startup:
83
-
84
- ```
85
- [DEPRECATION] PREFER_OLLAMA is removed. Use TIER_* env vars for routing.
86
- ```
87
-
88
- To route simple requests to Ollama, use `TIER_SIMPLE=ollama:<model>` instead.
89
-
90
- #### Summary Table
91
-
92
- | Configuration | Routing Behavior |
93
- |---|---|
94
- | All 4 `TIER_*` set | Tier routing active. Each request scored and routed to its tier's `provider:model`. `MODEL_PROVIDER` ignored for routing. |
95
- | 1-3 `TIER_*` set | Tier routing **disabled**. All requests go to `MODEL_PROVIDER` (static). |
96
- | No `TIER_*` set | Static routing. All requests go to `MODEL_PROVIDER`. |
97
- | `TIER_*` value without provider prefix | `MODEL_PROVIDER` used as the default provider for that tier. |
98
- | `PREFER_OLLAMA` set | No effect. Deprecation warning logged. |
99
-
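The summary table can be read as a single decision: tiered routing is active only when all four `TIER_*` variables are present. The sketch below is illustrative; `resolveRoutingMode` is a hypothetical name, and treating an empty string as "not set" is an assumption.

```javascript
const TIER_VARS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

// Hypothetical helper: decide between tiered and static routing from an
// environment-like object. Assumption: blank values count as missing.
function resolveRoutingMode(env) {
  const allSet = TIER_VARS.every(
    (name) => typeof env[name] === "string" && env[name].trim() !== ""
  );
  const warnings = [];
  if (env.PREFER_OLLAMA !== undefined) {
    warnings.push("[DEPRECATION] PREFER_OLLAMA is removed. Use TIER_* env vars for routing.");
  }
  return {
    mode: allSet ? "tiered" : "static",
    // In static mode every request goes to MODEL_PROVIDER.
    staticProvider: allSet ? null : (env.MODEL_PROVIDER ?? null),
    warnings,
  };
}
```

Calling `resolveRoutingMode(process.env)` at startup would cover the first three rows of the table; the fourth row (a tier value without a provider prefix) is handled when the value itself is parsed.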
100
- #### Example: Mixed Local + Cloud Setup
101
-
102
- ```bash
103
- MODEL_PROVIDER=ollama # Startup checks + default provider
104
- TIER_SIMPLE=ollama:llama3.2 # Score 0-25 → Ollama (free, local)
105
- TIER_MEDIUM=openai:gpt-4o # Score 26-50 → OpenAI
106
- TIER_COMPLEX=databricks:claude-sonnet-4-5 # Score 51-75 → Databricks
107
- TIER_REASONING=databricks:claude-opus-4-6 # Score 76-100 → Databricks
108
- ```
109
-
110
- In this setup, a "Hello" message (score ~5) routes to Ollama. A "Refactor the auth module" message (score ~65) routes to Databricks. `MODEL_PROVIDER=ollama` ensures the server waits for Ollama at startup but does not affect where complex requests go.
111
-
112
- ### Tier Config File
113
-
114
- Additional tier preferences (fallback models per provider) can be defined in `config/model-tiers.json`:
115
-
116
- ```json
117
- {
118
- "tiers": {
119
- "SIMPLE": { "preferred": { "ollama": ["llama3.2"], "openai": ["gpt-4o-mini"] } },
120
- "MEDIUM": { "preferred": { "openai": ["gpt-4o"], "anthropic": ["claude-sonnet-4-20250514"] } },
121
- "COMPLEX": { "preferred": { "openai": ["o1-mini"], "anthropic": ["claude-sonnet-4-20250514"] } },
122
- "REASONING": { "preferred": { "openai": ["o1"], "anthropic": ["claude-opus-4-20250514"] } }
123
- },
124
- "localProviders": {
125
- "ollama": { "free": true, "defaultTier": "SIMPLE" },
126
- "llamacpp": { "free": true, "defaultTier": "SIMPLE" },
127
- "lmstudio": { "free": true, "defaultTier": "SIMPLE" }
128
- }
129
- }
130
- ```
131
-
132
- ---
133
-
134
- ## Complexity Scoring Algorithm
135
-
136
- The complexity analyzer runs four phases to produce a score from 0 to 100.
137
-
138
- ### Phase 1: Basic Scoring
139
-
140
- Three components scored independently:
141
-
142
- **Token Count (0-20 points):**
143
-
144
- | Tokens | Score |
145
- |--------|-------|
146
- | < 500 | 0 |
147
- | 500-999 | 4 |
148
- | 1,000-1,999 | 8 |
149
- | 2,000-3,999 | 12 |
150
- | 4,000-7,999 | 16 |
151
- | 8,000+ | 20 |
152
-
153
- **Tool Count (0-20 points):**
154
-
155
- | Tools | Score |
156
- |-------|-------|
157
- | 0 | 0 |
158
- | 1-3 | 4 |
159
- | 4-6 | 8 |
160
- | 7-10 | 12 |
161
- | 11-15 | 16 |
162
- | 16+ | 20 |
163
-
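The token-count and tool-count tables are both step functions, so one bucket lookup can serve both. This is a sketch under that reading; the names `TOKEN_BUCKETS`, `TOOL_BUCKETS`, and `bucketScore` are hypothetical.

```javascript
// [lowerBound, score] pairs in descending order, copied from the two tables.
const TOKEN_BUCKETS = [[8000, 20], [4000, 16], [2000, 12], [1000, 8], [500, 4], [0, 0]];
const TOOL_BUCKETS = [[16, 20], [11, 16], [7, 12], [4, 8], [1, 4], [0, 0]];

// Return the score of the first bucket whose lower bound the value reaches.
function bucketScore(value, buckets) {
  for (const [lowerBound, score] of buckets) {
    if (value >= lowerBound) return score;
  }
  return 0; // Negative or non-numeric input falls through to zero.
}
```

For instance, a 1,200-token request with 5 tools scores 8 for token count and 8 for tool count.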
164
- **Task Type (0-25 points):**
165
- - Greetings / yes-no: 0-2
166
- - Simple questions: 3
167
- - General non-technical: 5
168
- - Technical content: 10
169
- - Refactoring: 16
170
- - New implementation: 18
171
- - From scratch: 20
172
- - Entire codebase scope: 22
173
- - Force cloud patterns (security audit, architecture review): 25
174
-
175
- ### Phase 2: Advanced Classification
176
-
177
- Additional scoring on top of Phase 1:
178
-
179
- **Code Complexity (0-20 points):**
180
-
181
- | Pattern | Points |
182
- |---------|--------|
183
- | Multi-file operations | +5 |
184
- | Architecture concerns | +5 |
185
- | Security | +4 |
186
- | Concurrency | +3 |
187
- | Performance | +3 |
188
- | Database operations | +3 |
189
- | Testing | +2 |
190
-
191
- **Reasoning Requirements (0-15 points):**
192
-
193
- | Pattern | Points |
194
- |---------|--------|
195
- | Step-by-step reasoning | +4 |
196
- | Trade-off analysis | +4 |
197
- | General analysis | +3 |
198
- | Planning | +3 |
199
- | Edge cases | +2 |
200
-
201
- **Conversation Bonus:**
202
- - 6-10 messages: +2
203
- - 11+ messages: +5
204
-
205
- The standard score is the sum of all components, capped at 100.
206
-
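A sketch of the additive standard score, assuming each component is clamped to the range given in its heading before summing. That clamping is an assumption: the listed pattern points for code complexity and reasoning add up to slightly more than their stated maxima, and the source does not say how the excess is handled. All names here are hypothetical.

```javascript
// Per-component maxima taken from the Phase 1 and Phase 2 headings.
const COMPONENT_CAPS = {
  tokenCount: 20,
  toolCount: 20,
  taskType: 25,
  codeComplexity: 20,
  reasoning: 15,
};

// Conversation bonus: +2 for 6-10 messages, +5 for 11 or more.
function conversationBonus(messageCount) {
  if (messageCount >= 11) return 5;
  if (messageCount >= 6) return 2;
  return 0;
}

// Sum the clamped components plus the conversation bonus, capped at 100.
function standardScore(components, messageCount) {
  let total = conversationBonus(messageCount);
  for (const [name, cap] of Object.entries(COMPONENT_CAPS)) {
    const raw = components[name] ?? 0;
    total += Math.min(Math.max(raw, 0), cap);
  }
  return Math.min(total, 100);
}
```

As a worked example, `standardScore({ tokenCount: 8, toolCount: 8, taskType: 16, codeComplexity: 5 }, 7)` yields 8 + 8 + 16 + 5 + 2 = 39, which falls in the MEDIUM range.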
207
- ### Weighted Scoring Mode (15 Dimensions)
208
-
209
- When `ROUTING_WEIGHTED_SCORING=true`, the analyzer uses a 15-dimension weighted scoring system instead of the standard additive scoring:
210
-
211
- ```
212
- Score = Sum of (dimension_value * weight) for all 15 dimensions
213
- ```
214
-
215
- #### Dimension Weights
216
-
217
- **Content Analysis (35% total):**
218
-
219
- | Dimension | Weight | Measures |
220
- |-----------|--------|----------|
221
- | tokenCount | 0.08 | Request size (token estimate) |
222
- | promptComplexity | 0.10 | Sentence structure, average length |
223
- | technicalDepth | 0.10 | Technical keyword density |
224
- | domainSpecificity | 0.07 | Number of specialized domains (security, ML, distributed, database, frontend, devops) |
225
-
226
- **Tool Analysis (25% total):**
227
-
228
- | Dimension | Weight | Measures |
229
- |-----------|--------|----------|
230
- | toolCount | 0.08 | Number of tools in request |
231
- | toolComplexity | 0.10 | Weighted average of tool complexity (Bash=0.9, Write=0.8, Edit=0.7, Read=0.3, Glob/Grep=0.2) |
232
- | toolChainPotential | 0.07 | Sequential operation indicators ("then", "after", "step 1") |
233
-
234
- **Reasoning Requirements (25% total):**
235
-
236
- | Dimension | Weight | Measures |
237
- |-----------|--------|----------|
238
- | multiStepReasoning | 0.10 | Step-by-step / planning patterns |
239
- | codeGeneration | 0.08 | Code creation requests |
240
- | analysisDepth | 0.07 | Trade-off / analysis patterns |
241
-
242
- **Context Factors (15% total):**
243
-
244
- | Dimension | Weight | Measures |
245
- |-----------|--------|----------|
246
- | conversationDepth | 0.05 | Message count in conversation |
247
- | priorToolUsage | 0.05 | Tool results already in conversation |
248
- | ambiguity | 0.05 | Inverse of request specificity |
249
-
250
- Each dimension is scored 0-100 independently, then multiplied by its weight. The final score is the rounded sum.
251
-
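The weighted sum can be sketched directly from the dimension tables. Note that the tables name 13 dimensions, whose weights add up to 1.00, although the heading refers to 15; the sketch uses only the 13 that are listed. `WEIGHTS` and `weightedScore` are hypothetical names, and clamping each input to 0-100 is an assumption.

```javascript
// Weights copied from the four dimension tables (13 named dimensions).
const WEIGHTS = {
  tokenCount: 0.08, promptComplexity: 0.10, technicalDepth: 0.10, domainSpecificity: 0.07,
  toolCount: 0.08, toolComplexity: 0.10, toolChainPotential: 0.07,
  multiStepReasoning: 0.10, codeGeneration: 0.08, analysisDepth: 0.07,
  conversationDepth: 0.05, priorToolUsage: 0.05, ambiguity: 0.05,
};

// Each dimension is a 0-100 value; missing dimensions count as 0.
function weightedScore(dimensions) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = Math.min(Math.max(dimensions[name] ?? 0, 0), 100);
    total += value * weight;
  }
  return Math.round(total);
}
```

A request that maxes out only `technicalDepth` scores 10, while one that maxes out every listed dimension scores 100.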
252
- ### Phase 3: Metrics Tracking
253
-
254
- Every routing decision is recorded in-memory (last 1,000 decisions) for analytics:
255
- - Total decisions, local vs. cloud split
256
- - Average complexity score
257
- - Per-provider and per-tier distribution
258
-
259
- Metrics are exposed via the `/metrics` endpoint and `X-Lynkr-*` response headers.
260
-
261
- ### Phase 4: Embeddings-Based Similarity (Optional)
262
-
263
- When an embeddings model is configured (`OLLAMA_EMBEDDINGS_MODEL`), the analyzer can compare request content against reference embeddings for complex and simple tasks using cosine similarity. This produces a score adjustment of -10 to +10 points.
264
-
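Cosine similarity itself is standard, but the source does not say how the two similarities become a -10 to +10 adjustment. The sketch below assumes one simple mapping (difference of similarities, scaled by 10 and clamped); that mapping and the function names are illustrative assumptions only.

```javascript
// Standard cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // Zero vector: define similarity as 0.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMED mapping: closer to the "complex" reference raises the score,
// closer to the "simple" reference lowers it, clamped to [-10, +10].
function similarityAdjustment(requestVec, complexRef, simpleRef) {
  const delta = cosineSimilarity(requestVec, complexRef) - cosineSimilarity(requestVec, simpleRef);
  return Math.max(-10, Math.min(10, Math.round(delta * 10)));
}
```

Under this assumed mapping, a request embedding identical to the complex reference and orthogonal to the simple one receives the full +10.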
265
- ---
266
-
267
- ## Agentic Workflow Detection
268
-
269
- The agentic detector identifies multi-step tool chains and autonomous agent patterns, boosting the complexity tier accordingly.
270
-
271
- ### Agent Types
272
-
273
- | Type | Score Boost | Min Tier | Description |
274
- |------|------------|----------|-------------|
275
- | **SINGLE_SHOT** | +0 | SIMPLE | Simple request-response, no tool chains |
276
- | **TOOL_CHAIN** | +15 | MEDIUM | Sequential tool usage (read -> edit -> test) |
277
- | **ITERATIVE** | +25 | COMPLEX | Retry loops, debugging cycles, iterative refinement |
278
- | **AUTONOMOUS** | +35 | REASONING | Open-ended tasks, full autonomy, complex decision making |
279
-
280
- ### Detection Signals
281
-
282
- The detector evaluates 6 signal categories:
283
-
284
- **1. Tool Count**
285
- - 4-5 tools: +8
286
- - 6-10 tools: +15
287
- - 11+ tools: +25
288
-
289
- **2. Agentic Tools Present** (Bash, Write, Edit, Task, Git, Test)
290
- - 1 agentic tool: +8
291
- - 2-3 agentic tools: +15
292
- - 4+ agentic tools: +25
293
-
294
- **3. Prior Tool Results** (already in an agentic loop)
295
- - 1-2 tool results: +10
296
- - 3-5 tool results: +20
297
- - 6+ tool results: +30
298
-
299
- **4. Content Pattern Matching**
300
- - Autonomous patterns ("figure out", "solve", "make it work"): +25
301
- - Iterative patterns ("keep trying", "debug", "retry"): +20
302
- - Tool chain patterns ("then use", "next step", "step 1"): +15
303
- - Multi-file work: +15
304
- - Planning required: +10
305
- - Implementation + testing: +15
306
-
307
- **5. Conversation Depth**
308
- - 5-8 messages: +6
309
- - 9-15 messages: +12
310
- - 16+ messages: +20
311
-
312
- **6. Content Length**
313
- - 2,000+ characters: +10
314
-
315
- ### Classification Thresholds
316
-
317
- | Agent Type | Score Threshold | Additional Conditions |
318
- |------------|----------------|----------------------|
319
- | AUTONOMOUS | >= 60 | or autonomous pattern + score >= 40 |
320
- | ITERATIVE | >= 40 | or deep tool loop + score >= 30 |
321
- | TOOL_CHAIN | >= 20 | or many agentic tools present |
322
- | SINGLE_SHOT | < 20 | Default |
323
-
324
- When an agentic workflow is detected (agentic `score >= 25`), the request's complexity score is boosted by the agent type's `scoreBoost` value, and the tier is upgraded to at least the agent type's `minTier`.
325
-
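The thresholds and boosts can be sketched as follows. The signal flags (`autonomousPattern`, `deepToolLoop`, `manyAgenticTools`) and function names are hypothetical stand-ins for the "additional conditions" column. The sketch applies the 25-point detection threshold and the 20-point TOOL_CHAIN threshold literally, so agentic scores of 20 to 24 classify as TOOL_CHAIN without a boost; capping the boosted score at 100 is an assumption.

```javascript
// Boosts and minimum tiers copied from the Agent Types table.
const AGENT_TYPES = {
  SINGLE_SHOT: { scoreBoost: 0, minTier: "SIMPLE" },
  TOOL_CHAIN: { scoreBoost: 15, minTier: "MEDIUM" },
  ITERATIVE: { scoreBoost: 25, minTier: "COMPLEX" },
  AUTONOMOUS: { scoreBoost: 35, minTier: "REASONING" },
};

// Classify by agentic score plus the optional conditions from the table.
function classifyAgent(agenticScore, signals = {}) {
  const { autonomousPattern = false, deepToolLoop = false, manyAgenticTools = false } = signals;
  if (agenticScore >= 60 || (autonomousPattern && agenticScore >= 40)) return "AUTONOMOUS";
  if (agenticScore >= 40 || (deepToolLoop && agenticScore >= 30)) return "ITERATIVE";
  if (agenticScore >= 20 || manyAgenticTools) return "TOOL_CHAIN";
  return "SINGLE_SHOT";
}

// Apply the boost only when the workflow counts as detected (score >= 25).
function applyAgenticBoost(complexityScore, agenticScore, signals = {}) {
  const agentType = classifyAgent(agenticScore, signals);
  if (agenticScore < 25) {
    return { agentType, score: complexityScore, minTier: "SIMPLE" };
  }
  const { scoreBoost, minTier } = AGENT_TYPES[agentType];
  return { agentType, score: Math.min(complexityScore + scoreBoost, 100), minTier };
}
```

For example, a complexity score of 40 with an agentic score of 45 classifies as ITERATIVE, is boosted to 65, and must be served by at least the COMPLEX tier.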
326
- ---
327
-
328
- ## Force Patterns
329
-
330
- Certain requests bypass the scoring algorithm entirely:
331
-
332
- ### Force Local (always local model)
333
- - Greetings: "hi", "hello", "thanks", "bye"
334
- - Time queries: "what time is it"
335
- - Confirmations: "yes", "no", "ok", "sure"
336
- - Help requests: "help", "commands"
337
-
338
- ### Force Cloud (always cloud model)
339
- - Security audits/reviews
340
- - Architecture design/review
341
- - Complete codebase refactoring
342
- - Code/PR reviews
343
- - Complex debugging
344
- - Production incidents
345
-
346
- ---
347
-
348
- ## Cost Optimization
349
-
350
- When `ROUTING_COST_OPTIMIZATION=true`, the router checks if a cheaper model can handle the determined tier.
351
-
352
- ### Model Registry
353
-
354
- Pricing data is fetched from three sources (in priority order):
355
-
356
- 1. **LiteLLM** (highest priority) - Community-maintained pricing from [BerriAI/litellm](https://github.com/BerriAI/litellm)
357
- 2. **models.dev** - API pricing aggregator
358
- 3. **Databricks Fallback** - Hardcoded pricing for common models (Claude, Llama, GPT, Gemini, DBRX)
359
-
360
- Pricing data is cached locally in `data/model-prices-cache.json` with a 24-hour TTL. Background refresh happens automatically when the cache is stale.
361
-
362
- ### Cost Tracking
363
-
364
- The optimizer tracks costs at both session and global levels:
365
- - Per-request cost recording (input + output tokens)
366
- - Per-model, per-provider, per-tier breakdowns
367
- - Savings calculation when routing to cheaper alternatives
368
-
369
- ### Pricing Lookup
370
-
371
- The registry supports flexible model name lookup:
372
- - Direct match: `gpt-4o`
373
- - Provider prefix stripping: `databricks-claude-sonnet-4-5` -> `claude-sonnet-4-5`
374
- - Fuzzy matching for partial names
375
-
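A sketch of the three lookup strategies in order. The source does not describe the fuzzy-matching rule, so a case-insensitive substring match is assumed here; `findPricing`, the prefix list, and the shape of the price table are all hypothetical.

```javascript
// priceTable: Map of lowercase model name -> pricing object.
// providerPrefixes: assumed list of prefixes to strip before a second lookup.
function findPricing(modelName, priceTable, providerPrefixes = ["databricks-"]) {
  const name = modelName.toLowerCase();

  // 1. Direct match.
  if (priceTable.has(name)) {
    return { key: name, match: "direct", price: priceTable.get(name) };
  }

  // 2. Provider prefix stripping, e.g. databricks-claude-sonnet-4-5 -> claude-sonnet-4-5.
  for (const prefix of providerPrefixes) {
    if (name.startsWith(prefix)) {
      const stripped = name.slice(prefix.length);
      if (priceTable.has(stripped)) {
        return { key: stripped, match: "prefix_stripped", price: priceTable.get(stripped) };
      }
    }
  }

  // 3. Fuzzy match (ASSUMED rule: substring containment in either direction).
  for (const [key, price] of priceTable) {
    if (key.includes(name) || name.includes(key)) {
      return { key, match: "fuzzy", price };
    }
  }
  return null;
}
```

A caller would typically treat a `null` result as "no pricing known" and skip cost optimization for that model.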
376
- ---
377
-
378
- ## Routing Headers
379
-
380
- Every response includes routing metadata in `X-Lynkr-*` headers:
381
-
382
- | Header | Description | Example |
383
- |--------|-------------|---------|
384
- | `X-Lynkr-Routing-Method` | How the decision was made | `tier_config`, `force`, `tool_threshold`, `agentic`, `cost_optimized` |
385
- | `X-Lynkr-Provider` | Selected provider | `databricks`, `ollama`, `openrouter` |
386
- | `X-Lynkr-Complexity-Score` | Complexity score (0-100) | `42` |
387
- | `X-Lynkr-Complexity-Threshold` | Score threshold for cloud routing | `40` |
388
- | `X-Lynkr-Routing-Reason` | Human-readable reason | `force_local_pattern`, `autonomous_workflow` |
389
- | `X-Lynkr-Tier` | Selected model tier | `SIMPLE`, `MEDIUM`, `COMPLEX`, `REASONING` |
390
- | `X-Lynkr-Model` | Selected model | `llama3.2`, `gpt-4o`, `claude-opus-4-6` |
391
- | `X-Lynkr-Agentic` | Agentic workflow type (if detected) | `TOOL_CHAIN`, `ITERATIVE`, `AUTONOMOUS` |
392
- | `X-Lynkr-Cost-Optimized` | Whether cost optimization was applied | `true` |
393
-
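On the client side, the routing headers can be collected into one object for logging or debugging. This is a sketch; `readRoutingHeaders` is a hypothetical helper that accepts any object with a `get(name)` method, such as the `headers` property of a `fetch` response.

```javascript
// Collect the X-Lynkr-* routing headers into a plain object.
// Missing headers are reported as null (or false for the boolean flag).
function readRoutingHeaders(headers) {
  const get = (name) => headers.get(name) ?? null;
  const rawScore = get("X-Lynkr-Complexity-Score");
  return {
    method: get("X-Lynkr-Routing-Method"),
    provider: get("X-Lynkr-Provider"),
    tier: get("X-Lynkr-Tier"),
    model: get("X-Lynkr-Model"),
    reason: get("X-Lynkr-Routing-Reason"),
    agentic: get("X-Lynkr-Agentic"),
    score: rawScore === null ? null : Number(rawScore),
    costOptimized: get("X-Lynkr-Cost-Optimized") === "true",
  };
}
```

After a request, `readRoutingHeaders(response.headers)` would show which tier and provider handled it and why.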
394
- ---
395
-
396
- ## Configuration Reference
397
-
398
- ### Environment Variables
399
-
400
- | Variable | Default | Description |
401
- |----------|---------|-------------|
402
- | `TIER_SIMPLE` | *required* | Model for simple tier (`provider:model`) |
403
- | `TIER_MEDIUM` | *required* | Model for medium tier (`provider:model`) |
404
- | `TIER_COMPLEX` | *required* | Model for complex tier (`provider:model`) |
405
- | `TIER_REASONING` | *required* | Model for reasoning tier (`provider:model`) |
406
- | `SMART_TOOL_SELECTION_MODE` | `heuristic` | Scoring mode: `aggressive` (threshold=60), `heuristic` (threshold=40), `conservative` (threshold=25) |
407
- | `ROUTING_WEIGHTED_SCORING` | `false` | Enable 15-dimension weighted scoring |
408
- | `ROUTING_AGENTIC_DETECTION` | `true` | Enable agentic workflow detection |
409
- | `ROUTING_COST_OPTIMIZATION` | `false` | Enable cost-based model selection |
410
- | `OLLAMA_MAX_TOOLS_FOR_ROUTING` | `3` | Max tools before routing away from Ollama |
411
- | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | `15` | Max tools before routing away from OpenRouter |
412
- | `OLLAMA_EMBEDDINGS_MODEL` | *(none)* | Embeddings model for Phase 4 similarity |
413
-
414
- ### Smart Tool Selection Modes
415
-
416
- | Mode | Threshold | Behavior |
417
- |------|-----------|----------|
418
- | `aggressive` | 60 | More requests go to local (saves cost) |
419
- | `heuristic` | 40 | Balanced local/cloud split |
420
- | `conservative` | 25 | More requests go to cloud (better quality) |
421
-
422
- ---
423
-
424
- ## Routing Decision Flow
425
-
426
- ```
427
- 1. Are all 4 TIER_* env vars configured?
428
- └─ No → Return static provider (MODEL_PROVIDER), skip all routing
429
-
430
- 2. Does content match FORCE_LOCAL patterns?
431
- └─ Yes → Route to local provider
432
-
433
- 3. Does content match FORCE_CLOUD patterns?
434
- └─ Yes → Route to best cloud provider (requires FALLBACK_ENABLED)
435
-
436
- 4. Analyze complexity:
437
- └─ Calculate score 0-100 (standard or weighted mode)
438
-
439
- 5. Optional: Embeddings adjustment:
440
- └─ Adjust score by -10 to +10 based on semantic similarity
441
-
442
- 6. Agentic detection:
443
- └─ If agentic → Boost score, enforce minimum tier
444
- └─ If AUTONOMOUS → Force cloud provider
445
-
446
- 7. Map score to tier (SIMPLE/MEDIUM/COMPLEX/REASONING)
447
-
448
- 8. Select provider:model from matching TIER_* env var
449
-
450
- 9. Optional: Cost optimization
451
- └─ Check for cheaper model that can handle the tier
452
-
453
- 10. Return { provider, model, tier, score, method }
454
- ```
455
-
456
- ---
457
-
458
- ## Source Files
459
-
460
- | File | Description |
461
- |------|-------------|
462
- | `src/routing/index.js` | Main routing orchestrator (`determineProviderSmart()`) |
463
- | `src/routing/complexity-analyzer.js` | 4-phase complexity analysis, 15-dimension weighted scoring |
464
- | `src/routing/agentic-detector.js` | Agentic workflow detection and classification |
465
- | `src/routing/model-tiers.js` | Tier definitions, model selection from `TIER_*` env vars |
466
- | `src/routing/model-registry.js` | Multi-source pricing (LiteLLM, models.dev, Databricks fallback) |
467
- | `src/routing/cost-optimizer.js` | Cost tracking, cheapest model finder, savings calculation |
468
-
469
- ---
470
-
471
- ## Next Steps
472
-
473
- - **[Features Overview](features.md)** - Architecture and request flow
474
- - **[Token Optimization](token-optimization.md)** - Cost reduction strategies
475
- - **[Provider Configuration](providers.md)** - Setting up providers
476
- - **[Production Guide](production.md)** - Deploy with routing enabled