lynkr 7.2.4 → 8.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/README.md +2 -2
  2. package/config/model-tiers.json +89 -0
  3. package/docs/docs.html +1 -0
  4. package/docs/index.md +7 -0
  5. package/docs/toon-integration-spec.md +130 -0
  6. package/documentation/README.md +3 -2
  7. package/documentation/claude-code-cli.md +23 -16
  8. package/documentation/cursor-integration.md +17 -14
  9. package/documentation/docker.md +11 -4
  10. package/documentation/embeddings.md +7 -5
  11. package/documentation/faq.md +66 -12
  12. package/documentation/features.md +22 -15
  13. package/documentation/installation.md +66 -14
  14. package/documentation/production.md +43 -8
  15. package/documentation/providers.md +145 -42
  16. package/documentation/routing.md +476 -0
  17. package/documentation/token-optimization.md +7 -5
  18. package/documentation/troubleshooting.md +81 -5
  19. package/install.sh +6 -1
  20. package/package.json +5 -3
  21. package/scripts/setup.js +0 -1
  22. package/src/agents/executor.js +14 -6
  23. package/src/api/middleware/session.js +15 -2
  24. package/src/api/openai-router.js +130 -37
  25. package/src/api/providers-handler.js +15 -1
  26. package/src/api/router.js +107 -2
  27. package/src/budget/index.js +4 -3
  28. package/src/clients/databricks.js +431 -234
  29. package/src/clients/gpt-utils.js +181 -0
  30. package/src/clients/ollama-utils.js +66 -140
  31. package/src/clients/routing.js +0 -1
  32. package/src/clients/standard-tools.js +82 -5
  33. package/src/config/index.js +119 -35
  34. package/src/context/toon.js +173 -0
  35. package/src/headroom/launcher.js +8 -3
  36. package/src/logger/index.js +23 -0
  37. package/src/orchestrator/index.js +765 -212
  38. package/src/routing/agentic-detector.js +320 -0
  39. package/src/routing/complexity-analyzer.js +202 -2
  40. package/src/routing/cost-optimizer.js +305 -0
  41. package/src/routing/index.js +168 -159
  42. package/src/routing/model-registry.js +437 -0
  43. package/src/routing/model-tiers.js +365 -0
  44. package/src/server.js +2 -2
  45. package/src/sessions/cleanup.js +3 -3
  46. package/src/sessions/record.js +10 -1
  47. package/src/sessions/store.js +7 -2
  48. package/src/tools/agent-task.js +48 -1
  49. package/src/tools/index.js +15 -2
  50. package/src/tools/workspace.js +35 -4
  51. package/src/workspace/index.js +30 -0
  52. package/te +11622 -0
  53. package/test/README.md +1 -1
  54. package/test/azure-openai-config.test.js +17 -8
  55. package/test/azure-openai-integration.test.js +7 -1
  56. package/test/azure-openai-routing.test.js +41 -43
  57. package/test/bedrock-integration.test.js +18 -32
  58. package/test/hybrid-routing-integration.test.js +35 -20
  59. package/test/hybrid-routing-performance.test.js +74 -64
  60. package/test/llamacpp-integration.test.js +28 -9
  61. package/test/lmstudio-integration.test.js +20 -8
  62. package/test/openai-integration.test.js +17 -20
  63. package/test/performance-tests.js +1 -1
  64. package/test/routing.test.js +65 -59
  65. package/test/toon-compression.test.js +131 -0
  66. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  67. package/ROUTER_COMPARISON.md +0 -173
  68. package/TIER_ROUTING_PLAN.md +0 -771
package/README.md CHANGED
@@ -238,7 +238,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
238
238
 
239
239
  ### Getting Started
240
240
  - 📦 **[Installation Guide](documentation/installation.md)** - Detailed installation for all methods
241
- - ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 9+ providers
241
+ - ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 12+ providers
242
242
  - 🎯 **[Quick Start Examples](documentation/installation.md#quick-start-examples)** - Copy-paste configs
243
243
 
244
244
  ### IDE & CLI Integration
@@ -277,7 +277,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
277
277
 
278
278
  ## Key Features Highlights
279
279
 
280
- - ✅ **Multi-Provider Support** - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
280
+ - ✅ **Multi-Provider Support** - 12+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter, Moonshot AI)
281
281
  - ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
282
282
  - ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
283
283
  - ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
@@ -0,0 +1,89 @@
1
+ {
2
+ "tiers": {
3
+ "SIMPLE": {
4
+ "description": "Greetings, simple Q&A, confirmations, basic lookups",
5
+ "range": [0, 25],
6
+ "priority": 1,
7
+ "preferred": {
8
+ "ollama": ["llama3.2", "gemma2", "phi3", "qwen2.5:7b", "mistral"],
9
+ "llamacpp": ["default"],
10
+ "lmstudio": ["default"],
11
+ "openai": ["gpt-4o-mini", "gpt-3.5-turbo"],
12
+ "azure-openai": ["gpt-4o-mini", "gpt-35-turbo"],
13
+ "anthropic": ["claude-3-haiku-20240307", "claude-3-5-haiku-20241022"],
14
+ "bedrock": ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.nova-lite-v1:0"],
15
+ "databricks": ["databricks-claude-haiku-4-5", "databricks-gpt-5-nano"],
16
+ "google": ["gemini-2.0-flash", "gemini-1.5-flash"],
17
+ "openrouter": ["google/gemini-flash-1.5", "deepseek/deepseek-chat"],
18
+ "zai": ["GLM-4-Flash"],
19
+ "moonshot": ["kimi-k2-turbo-preview"]
20
+ }
21
+ },
22
+ "MEDIUM": {
23
+ "description": "Code reading, simple edits, research, documentation",
24
+ "range": [26, 50],
25
+ "priority": 2,
26
+ "preferred": {
27
+ "ollama": ["qwen2.5:32b", "deepseek-coder:33b", "codellama:34b"],
28
+ "llamacpp": ["default"],
29
+ "lmstudio": ["default"],
30
+ "openai": ["gpt-4o", "gpt-4-turbo"],
31
+ "azure-openai": ["gpt-4o", "gpt-4"],
32
+ "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
33
+ "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0", "amazon.nova-pro-v1:0"],
34
+ "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1"],
35
+ "google": ["gemini-1.5-pro", "gemini-2.0-pro"],
36
+ "openrouter": ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"],
37
+ "zai": ["GLM-4.7"],
38
+ "moonshot": ["kimi-k2-turbo-preview"]
39
+ }
40
+ },
41
+ "COMPLEX": {
42
+ "description": "Multi-file changes, debugging, architecture, refactoring",
43
+ "range": [51, 75],
44
+ "priority": 3,
45
+ "preferred": {
46
+ "ollama": ["qwen2.5:72b", "llama3.1:70b", "deepseek-coder-v2:236b"],
47
+ "openai": ["o1-mini", "o3-mini", "gpt-4o"],
48
+ "azure-openai": ["o1-mini", "gpt-4o"],
49
+ "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
50
+ "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0"],
51
+ "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1-codex-max"],
52
+ "google": ["gemini-2.5-pro", "gemini-1.5-pro"],
53
+ "openrouter": ["anthropic/claude-3.5-sonnet", "meta-llama/llama-3.1-405b"],
54
+ "zai": ["GLM-4.7"],
55
+ "moonshot": ["kimi-k2-turbo-preview"]
56
+ }
57
+ },
58
+ "REASONING": {
59
+ "description": "Complex analysis, security audits, novel problems, deep thinking",
60
+ "range": [76, 100],
61
+ "priority": 4,
62
+ "preferred": {
63
+ "openai": ["o1", "o1-pro", "o3"],
64
+ "azure-openai": ["o1", "o1-pro"],
65
+ "anthropic": ["claude-opus-4-20250514", "claude-3-opus-20240229"],
66
+ "bedrock": ["anthropic.claude-3-opus-20240229-v1:0"],
67
+ "databricks": ["databricks-claude-opus-4-6", "databricks-claude-opus-4-5", "databricks-gpt-5-2"],
68
+ "google": ["gemini-2.5-pro"],
69
+ "openrouter": ["anthropic/claude-3-opus", "deepseek/deepseek-reasoner", "openai/o1"],
70
+ "deepseek": ["deepseek-reasoner", "deepseek-r1"],
71
+ "moonshot": ["kimi-k2-thinking", "kimi-k2-turbo-preview"]
72
+ }
73
+ }
74
+ },
75
+ "localProviders": {
76
+ "ollama": { "free": true, "defaultTier": "SIMPLE" },
77
+ "llamacpp": { "free": true, "defaultTier": "SIMPLE" },
78
+ "lmstudio": { "free": true, "defaultTier": "SIMPLE" }
79
+ },
80
+ "providerAliases": {
81
+ "azure": "azure-openai",
82
+ "aws": "bedrock",
83
+ "amazon": "bedrock",
84
+ "claude": "anthropic",
85
+ "gemini": "google",
86
+ "vertex": "google",
87
+ "kimi": "moonshot"
88
+ }
89
+ }
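For illustration, the score-to-tier mapping implied by the `range` fields above can be sketched as follows. This is a minimal sketch, not the package's actual loader: the `tierForScore` name and the out-of-range fallback are assumptions.

```javascript
// Tier ranges as declared in config/model-tiers.json above.
const tiers = {
  SIMPLE: [0, 25],
  MEDIUM: [26, 50],
  COMPLEX: [51, 75],
  REASONING: [76, 100],
};

// Map a complexity score (0-100) to a tier name.
function tierForScore(score) {
  for (const [name, [lo, hi]] of Object.entries(tiers)) {
    if (score >= lo && score <= hi) return name;
  }
  return "REASONING"; // assumed: out-of-range scores fall back to the top tier
}
```

The ranges are contiguous and inclusive, so every integer score from 0 to 100 lands in exactly one tier.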
package/docs/docs.html CHANGED
@@ -51,6 +51,7 @@
51
51
  <div class="doc-sidebar-title">Features</div>
52
52
  <ul class="doc-sidebar-links">
53
53
  <li><a href="?doc=features" data-doc="features">Core Features</a></li>
54
+ <li><a href="?doc=routing" data-doc="routing">Routing & Model Tiering</a></li>
54
55
  <li><a href="?doc=token-optimization" data-doc="token-optimization">Token Optimization</a></li>
55
56
  <li><a href="?doc=memory-system" data-doc="memory-system">Memory System</a></li>
56
57
  <li><a href="?doc=headroom" data-doc="headroom">Headroom Compression</a></li>
package/docs/index.md CHANGED
@@ -311,6 +311,13 @@
311
311
  <span class="provider-badge paid">GPT-4o, o1</span>
312
312
  </div>
313
313
 
314
+ <div class="provider-card">
315
+ <span class="provider-icon">🌙</span>
316
+ <div class="provider-name">Moonshot AI</div>
317
+ <div class="provider-type">Cloud</div>
318
+ <span class="provider-badge paid">KIMI K2</span>
319
+ </div>
320
+
314
321
  <div class="provider-card">
315
322
  <span class="provider-icon" style="font-weight: 900; font-size: 28px; background: linear-gradient(135deg, #1a1a2e, #16213e); color: #fff; width: 42px; height: 42px; border-radius: 10px; display: inline-flex; align-items: center; justify-content: center;">Z</span>
316
323
  <div class="provider-name">z.ai</div>
@@ -0,0 +1,130 @@
1
+ # TOON Integration Spec (Lynkr Spike)
2
+
3
+ Date: 2026-02-17
4
+ Branch: `codex/toon-integration-spike`
5
+ Status: Implemented behind flags (`TOON_ENABLED=false` by default).
6
+
7
+ ## 1) Goal
8
+
9
+ Reduce prompt token usage for large structured JSON context while preserving current Lynkr routing, tool execution semantics, and reliability.
10
+
11
+ ## 2) Non-Goals
12
+
13
+ 1. Do not replace Lynkr routing/fallback logic.
14
+ 2. Do not change MCP/tool protocol behavior.
15
+ 3. Do not change provider request envelope formats.
16
+ 4. Do not require TOON for normal operation.
17
+
18
+ ## 3) Integration Strategy (Minimal, Reversible)
19
+
20
+ 1. Add a TOON adapter module (encode-only for prompt context).
21
+ 2. Apply TOON only to eligible large JSON blobs before they are inserted into model-visible context.
22
+ 3. Keep original JSON in memory/session for execution and audit; only prompt copy is compressed.
23
+ 4. Fail open: if TOON conversion fails, send original JSON unchanged.
24
+
25
+ ## 4) What We Will Compress
26
+
27
+ Eligible inputs (all required):
28
+
29
+ 1. Payload is valid JSON object/array.
30
+ 2. Payload size exceeds threshold (for example, `TOON_MIN_BYTES`).
31
+ 3. Payload is read-only context for model comprehension (not protocol-critical).
32
+
33
+ Primary targets:
34
+
35
+ 1. Large tool output summaries inserted into prompt context.
36
+ 2. Large search/result payloads injected for reasoning.
37
+ 3. Structured data snapshots used for analysis tasks.
38
+
39
+ ## 5) What We Will Never Compress
40
+
41
+ Hard exclusions:
42
+
43
+ 1. Tool schemas/definitions (`tools`, `input_schema`, function signatures).
44
+ 2. Tool call argument payloads that are executed by systems.
45
+ 3. Provider request envelopes (`/v1/messages`, `/chat/completions` body schema fields).
46
+ 4. Protocol control fields (roles, stop reasons, tool IDs, request IDs).
47
+ 5. Stored canonical session payloads used for replay/debug/audit.
48
+
49
+ Rule: if a payload is machine-validated/executed downstream, keep JSON.
50
+
51
+ ## 6) Config Flags (Default Safe)
52
+
53
+ Proposed env flags:
54
+
55
+ 1. `TOON_ENABLED=false` (default off)
56
+ 2. `TOON_MIN_BYTES=4096` (only convert larger payloads)
57
+ 3. `TOON_FAIL_OPEN=true` (fallback to JSON on any TOON error)
58
+ 4. `TOON_LOG_STATS=true` (log before/after token estimate for observability)
59
+
60
+ ## 7) Verification Gates
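Taken together, the flags above describe a fail-open gate around the adapter. A minimal sketch, assuming a `toonEncode` function as a stand-in for the real adapter module:

```javascript
// Fail-open gating for TOON conversion, per the flags above.
// `toonEncode` is a hypothetical stand-in for the actual adapter.
function maybeCompressContext(jsonText, toonEncode, env = process.env) {
  if (env.TOON_ENABLED !== "true") return jsonText;                    // default off
  const minBytes = Number(env.TOON_MIN_BYTES || 4096);
  if (Buffer.byteLength(jsonText, "utf8") < minBytes) return jsonText; // below threshold
  let parsed;
  try {
    parsed = JSON.parse(jsonText);                                     // must be valid JSON
  } catch {
    return jsonText;
  }
  try {
    return toonEncode(parsed);
  } catch (err) {
    if (env.TOON_FAIL_OPEN !== "false") return jsonText;               // fail open: keep JSON
    throw err;
  }
}
```

Note that the original `jsonText` is returned unchanged on any ineligibility or error, matching the "fail open" rule in section 3.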
61
+
62
+ Before enabling:
63
+
64
+ 1. Existing unit tests pass unchanged.
65
+ 2. Existing MCP smoke passes (`find_tool`/`call_tool` path).
66
+
67
+ With `TOON_ENABLED=true`:
68
+
69
+ 1. Prompt A/B benchmark still passes functionally.
70
+ 2. No regression in Task/subagent behavior.
71
+ 3. Data-heavy prompt shows token reduction vs baseline.
72
+ 4. No increase in protocol/tool-call errors.
73
+
74
+ ## 8) Rollback Rules
75
+
76
+ Immediate rollback:
77
+
78
+ 1. Set `TOON_ENABLED=false`.
79
+ 2. Restart Lynkr service.
80
+
81
+ Code rollback:
82
+
83
+ 1. Revert TOON integration commit(s) on this branch.
84
+ 2. Re-run unit + MCP smoke gates.
85
+
86
+ ## 9) Risks and Mitigations
87
+
88
+ 1. Risk: semantic drift from transformed payloads.
89
+ - Mitigation: apply only to read-only context, fail-open on error, keep canonical JSON.
90
+ 2. Risk: negligible gains on non-tabular/deeply nested payloads.
91
+ - Mitigation: threshold + eligibility checks; skip low-value payloads.
92
+ 3. Risk: harder debugging.
93
+ - Mitigation: log conversion stats and keep original payload for diagnostics.
94
+
95
+ ## 10) Stock Provider Validation (Ollama Cloud)
96
+
97
+ Date: 2026-02-17
98
+
99
+ Runtime under test:
100
+
101
+ 1. `MODEL_PROVIDER=ollama`
102
+ 2. `OLLAMA_ENDPOINT=http://127.0.0.1:11434`
103
+ 3. `OLLAMA_MODEL=glm-5:cloud`
104
+ 4. `TOON_MIN_BYTES=256`
105
+ 5. `TOON_FAIL_OPEN=true`
106
+ 6. `TOON_LOG_STATS=true`
107
+
108
+ Probe used:
109
+
110
+ 1. Send a two-message request where the second message is a large JSON blob.
111
+ 2. Ask model to classify the next message as `JSON` vs `OTHER` based on first character.
112
+ 3. Run once with `TOON_ENABLED=false`, once with `TOON_ENABLED=true`.
113
+
114
+ Observed results:
115
+
116
+ 1. `TOON_ENABLED=false`
117
+ - Reply: `JSON`
118
+ - Provider header: `x-lynkr-provider: ollama`
119
+ - TOON log entries: `0`
120
+ 2. `TOON_ENABLED=true`
121
+ - Reply: `OTHER`
122
+ - Provider header: `x-lynkr-provider: ollama`
123
+ - TOON log entries: `1`
124
+ - Logged conversion stats: `originalBytes=6416`, `compressedBytes=5854` (saved `562` bytes, `8.76%`)
125
+
126
+ Conclusion:
127
+
128
+ 1. TOON gating works on stock Ollama cloud path (not moonshot-specific).
129
+ 2. Compression is applied only when flag-enabled.
130
+ 3. Provider routing remains unchanged (`ollama`) during TOON transformation.
@@ -9,7 +9,7 @@ Welcome to the comprehensive documentation for Lynkr, the self-hosted Claude Cod
9
9
  New to Lynkr? Start here:
10
10
 
11
11
  - **[Installation Guide](installation.md)** - Complete installation instructions for all methods (npm, git clone, homebrew, Docker)
12
- - **[Provider Configuration](providers.md)** - Detailed setup for all 9+ supported providers (Databricks, Bedrock, OpenRouter, Ollama, llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio)
12
+ - **[Provider Configuration](providers.md)** - Detailed setup for all 12+ supported providers (Databricks, Bedrock, OpenRouter, Ollama, llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio, Moonshot AI, Z.AI, Vertex AI)
13
13
  - **[Quick Start Examples](installation.md#quick-start-examples)** - Copy-paste configurations to get running fast
14
14
 
15
15
  ---
@@ -30,6 +30,7 @@ Connect Lynkr to your development tools:
30
30
  Understand Lynkr's capabilities:
31
31
 
32
32
  - **[Architecture & Features](features.md)** - System architecture, request flow, format conversion, and core capabilities
33
+ - **[Routing & Model Tiering](routing.md)** - 4-tier model system, 15-dimension complexity scoring, agentic workflow detection, and cost optimization
33
34
  - **[Memory System](memory-system.md)** - Titans-inspired long-term memory with surprise-based filtering and decay
34
35
  - **[Token Optimization](token-optimization.md)** - Achieve 60-80% cost reduction through smart tool selection, prompt caching, and memory deduplication
35
36
  - **[Headroom Compression](headroom.md)** - 47-92% token reduction through intelligent context compression (Smart Crusher, CCR, LLMLingua)
@@ -73,7 +74,7 @@ Get help and contribute:
73
74
  - [Installation](installation.md) | [Providers](providers.md) | [Claude Code](claude-code-cli.md) | [Codex CLI](codex-cli.md) | [Cursor](cursor-integration.md) | [Embeddings](embeddings.md)
74
75
 
75
76
  ### Features & Optimization
76
- - [Features](features.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
77
+ - [Features](features.md) | [Routing](routing.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
77
78
 
78
79
  ### Deployment & Production
79
80
  - [Docker](docker.md) | [Production](production.md) | [API Reference](api.md)
@@ -11,7 +11,7 @@ Lynkr acts as a drop-in replacement for Anthropic's backend, enabling Claude Cod
11
11
  ### Why Use Lynkr with Claude Code CLI?
12
12
 
13
13
  - 💰 **60-80% cost savings** through token optimization
14
- - 🔓 **Provider choice** - Use any of 9+ supported providers
14
+ - 🔓 **Provider choice** - Use any of 12+ supported providers
15
15
  - 🏠 **Self-hosted** - Full control over your AI infrastructure
16
16
  - 🔒 **Local option** - Run 100% offline with Ollama or llama.cpp
17
17
  - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
@@ -74,7 +74,7 @@ export DATABRICKS_API_BASE=https://your-workspace.databricks.com
74
74
  export DATABRICKS_API_KEY=dapi1234567890abcdef
75
75
  ```
76
76
 
77
- See [Provider Configuration Guide](providers.md) for all 9+ providers.
77
+ See [Provider Configuration Guide](providers.md) for all 12+ providers.
78
78
 
79
79
  ---
80
80
 
@@ -341,15 +341,16 @@ export MODEL_PROVIDER=databricks
341
341
 
342
342
  ---
343
343
 
344
- ## Hybrid Routing (Cost Optimization)
344
+ ## Tier-Based Routing (Cost Optimization)
345
345
 
346
- Use local Ollama for simple tasks, fallback to cloud for complex ones:
346
+ Use local Ollama for simple tasks, cloud for complex ones:
347
347
 
348
348
  ```bash
349
- # Configure hybrid routing
350
- export MODEL_PROVIDER=ollama
351
- export OLLAMA_MODEL=llama3.1:8b
352
- export PREFER_OLLAMA=true
349
+ # Configure tier-based routing (set all 4 to enable)
350
+ export TIER_SIMPLE=ollama:llama3.2
351
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
352
+ export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
353
+ export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
353
354
  export FALLBACK_ENABLED=true
354
355
  export FALLBACK_PROVIDER=databricks
355
356
  export DATABRICKS_API_BASE=https://your-workspace.databricks.com
@@ -360,13 +361,15 @@ lynkr start
360
361
  ```
361
362
 
362
363
  **How it works:**
363
- - **0-2 tools**: Ollama (free, local, fast)
364
- - **3-15 tools**: OpenRouter (if configured) or fallback
365
- - **16+ tools**: Databricks/Azure (most capable)
366
- - **Ollama failures**: Automatic transparent fallback to cloud
364
+ - Each request is scored for complexity (0-100) and mapped to a tier
365
+ - **SIMPLE (0-25)**: Ollama (free, local, fast)
366
+ - **MEDIUM (26-50)**: OpenRouter (affordable cloud)
367
+ - **COMPLEX (51-75)**: Databricks (most capable)
368
+ - **REASONING (76-100)**: Databricks (best available)
369
+ - **Provider failures**: Automatic transparent fallback to cloud
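The `TIER_*` values above use a `provider:model` format. One way such a value could be split is sketched below; the split-on-first-colon rule is an assumption, chosen so that model IDs containing `:` or `/` (like `openai/gpt-4o-mini` or `qwen2.5:7b`) survive intact:

```javascript
// Split a TIER_* value ("provider:model") on the FIRST colon only,
// so model IDs that themselves contain ":" or "/" pass through unchanged.
function parseTier(value) {
  const idx = value.indexOf(":");
  if (idx < 1) return null; // malformed value: treat the tier as unset
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}
```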
367
370
 
368
371
  **Cost savings:**
369
- - **65-100%** for requests that stay on Ollama
372
+ - **65-100%** for requests routed to local models
370
373
  - **40-87%** faster for simple requests
371
374
 
372
375
  ---
@@ -534,9 +537,13 @@ claude "What files are in the current directory?"
534
537
  - Local (Ollama): Should be 100-500ms
535
538
  - Cloud: Should be 500ms-2s
536
539
 
537
- 2. **Enable hybrid routing:**
540
+ 2. **Enable tier-based routing:**
538
541
  ```bash
539
- export PREFER_OLLAMA=true
542
+ # Set all 4 TIER_* env vars to enable tier-based routing
543
+ export TIER_SIMPLE=ollama:llama3.2
544
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
545
+ export TIER_COMPLEX=azure-openai:gpt-4o
546
+ export TIER_REASONING=azure-openai:gpt-4o
540
547
  export FALLBACK_ENABLED=true
541
548
  ```
542
549
 
@@ -655,7 +662,7 @@ Claude Code CLI (displays result)
655
662
 
656
663
  ## Next Steps
657
664
 
658
- - **[Provider Configuration](providers.md)** - Configure all 9+ providers
665
+ - **[Provider Configuration](providers.md)** - Configure all 12+ providers
659
666
  - **[Installation Guide](installation.md)** - Detailed installation
660
667
  - **[Features Guide](features.md)** - Learn about advanced features
661
668
  - **[Token Optimization](token-optimization.md)** - Maximize cost savings
@@ -534,11 +534,14 @@ AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
534
534
  - **Cloud** (OpenRouter/Databricks): Should be 500ms-2s
535
535
  - **Distant regions**: Can be 2-5s
536
536
 
537
- 2. **Enable hybrid routing** for speed:
537
+ 2. **Enable tier-based routing** for speed:
538
538
  ```env
539
- # Use Ollama for simple requests (fast)
540
- # Cloud for complex requests
541
- PREFER_OLLAMA=true
539
+ # Use Ollama for simple requests (fast), cloud for complex requests
540
+ # Set all 4 TIER_* env vars to enable tier-based routing
541
+ TIER_SIMPLE=ollama:llama3.2
542
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
543
+ TIER_COMPLEX=azure-openai:gpt-4o
544
+ TIER_REASONING=azure-openai:gpt-4o
542
545
  FALLBACK_ENABLED=true
543
546
  ```
544
547
 
@@ -675,12 +678,12 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
675
678
  ### Setup 3: Hybrid (Best of Both Worlds)
676
679
 
677
680
  ```bash
678
- # Chat: Ollama for simple requests, Databricks for complex
679
- PREFER_OLLAMA=true
681
+ # Chat: Tier-based routing (set all 4 to enable)
682
+ TIER_SIMPLE=ollama:llama3.2
683
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
684
+ TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
685
+ TIER_REASONING=databricks:databricks-claude-sonnet-4-5
680
686
  FALLBACK_ENABLED=true
681
- OLLAMA_MODEL=llama3.1:8b
682
-
683
- # Fallback to Databricks for complex requests
684
687
  FALLBACK_PROVIDER=databricks
685
688
  DATABRICKS_API_BASE=https://your-workspace.databricks.com
686
689
  DATABRICKS_API_KEY=your-key
@@ -688,15 +691,15 @@ DATABRICKS_API_KEY=your-key
688
691
  # Embeddings: Ollama (local, private)
689
692
  OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
690
693
 
691
- # Cost: Mostly FREE (Ollama handles 70-80% of requests)
692
- # Only complex tool-heavy requests go to Databricks
694
+ # Cost: Mostly FREE (Ollama handles 70-80% of simple requests)
695
+ # Only complex/reasoning requests go to Databricks
693
696
  ```
694
697
 
695
698
  **Benefits:**
696
- - ✅ Mostly FREE (70-80% of requests on Ollama)
699
+ - ✅ Mostly FREE (70-80% of requests on Ollama via TIER_SIMPLE)
697
700
  - ✅ Private embeddings (local search)
698
701
  - ✅ Cloud quality for complex tasks
699
- - ✅ Automatic intelligent routing
702
+ - ✅ Automatic intelligent tier-based routing
700
703
 
701
704
  ---
702
705
 
@@ -704,7 +707,7 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
704
707
 
705
708
  | Aspect | Cursor Native | Lynkr + Cursor |
706
709
  |--------|---------------|----------------|
707
- | **Providers** | OpenAI only | 9+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, etc.) |
710
+ | **Providers** | OpenAI only | 12+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, Moonshot, etc.) |
708
711
  | **Costs** | OpenAI pricing | 60-80% cheaper (or 100% FREE with Ollama) |
709
712
  | **Privacy** | Cloud-only | Can run 100% locally (Ollama + local embeddings) |
710
713
  | **Embeddings** | Built-in (cloud) | 4 options: Ollama (local), llama.cpp (local), OpenRouter (cloud), OpenAI (cloud) |
@@ -73,10 +73,14 @@ services:
73
73
  ports:
74
74
  - "8081:8081"
75
75
  environment:
76
- # Hybrid routing: Ollama first, fallback to cloud
76
+ # Tier-based routing: local for simple, cloud for complex
77
77
  - MODEL_PROVIDER=ollama
78
78
  - OLLAMA_API_BASE=http://ollama:11434
79
- - PREFER_OLLAMA=true
79
+ # Set all 4 TIER_* vars to enable tier-based routing
80
+ - TIER_SIMPLE=ollama:llama3.2
81
+ - TIER_MEDIUM=ollama:llama3.2
82
+ - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
83
+ - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
80
84
  - FALLBACK_ENABLED=true
81
85
  - FALLBACK_PROVIDER=databricks
82
86
  - DATABRICKS_API_BASE=${DATABRICKS_API_BASE}
@@ -452,8 +456,11 @@ environment:
452
456
  - DATABRICKS_API_BASE=https://your-workspace.databricks.com
453
457
  - DATABRICKS_API_KEY=${DATABRICKS_API_KEY}
454
458
 
455
- # Hybrid routing
456
- - PREFER_OLLAMA=true
459
+ # Tier-based routing (set all 4 to enable)
460
+ - TIER_SIMPLE=ollama:llama3.2
461
+ - TIER_MEDIUM=openrouter:openai/gpt-4o-mini
462
+ - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
463
+ - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
457
464
  - FALLBACK_ENABLED=true
458
465
  - FALLBACK_PROVIDER=databricks
459
466
 
@@ -532,10 +532,12 @@ OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
532
532
  **Best for:** Privacy + Quality + Cost Optimization
533
533
 
534
534
  ```env
535
- # Chat: Ollama + Cloud fallback
536
- PREFER_OLLAMA=true
535
+ # Chat: Tier-based routing (set all 4 to enable)
536
+ TIER_SIMPLE=ollama:llama3.2
537
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
538
+ TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
539
+ TIER_REASONING=databricks:databricks-claude-sonnet-4-5
537
540
  FALLBACK_ENABLED=true
538
- OLLAMA_MODEL=llama3.1:8b
539
541
  FALLBACK_PROVIDER=databricks
540
542
  DATABRICKS_API_BASE=https://your-workspace.databricks.com
541
543
  DATABRICKS_API_KEY=your-key
@@ -547,10 +549,10 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
547
549
  ```
548
550
 
549
551
  **Benefits:**
550
- - ✅ 70-80% of chat requests FREE (Ollama)
552
+ - ✅ 70-80% of chat requests FREE (Ollama via TIER_SIMPLE)
551
553
  - ✅ 100% private embeddings (local)
552
554
  - ✅ Cloud quality for complex tasks
553
- - ✅ Intelligent automatic routing
555
+ - ✅ Intelligent automatic tier-based routing
554
556
 
555
557
  ---
556
558
 
@@ -8,11 +8,11 @@ Common questions about Lynkr, installation, configuration, and usage.
8
8
 
9
9
  ### What is Lynkr?
10
10
 
11
- Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, etc.) instead of being locked to Anthropic's API.
11
+ Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Moonshot AI, etc.) instead of being locked to Anthropic's API.
12
12
 
13
13
  **Key benefits:**
14
14
  - 💰 **60-80% cost savings** through token optimization
15
- - 🔓 **Provider flexibility** - Choose from 9+ providers
15
+ - 🔓 **Provider flexibility** - Choose from 12+ providers
16
16
  - 🔒 **Privacy** - Run 100% locally with Ollama or llama.cpp
17
17
  - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
18
18
 
@@ -67,7 +67,7 @@ Lynkr itself is **100% FREE** and open source (Apache 2.0 license).
67
67
 
68
68
  | Feature | Native Claude Code | Lynkr |
69
69
  |---------|-------------------|-------|
70
- | **Providers** | Anthropic only | 9+ providers |
70
+ | **Providers** | Anthropic only | 12+ providers |
71
71
  | **Cost** | Full Anthropic pricing | 60-80% cheaper |
72
72
  | **Local models** | ❌ Cloud-only | ✅ Ollama, llama.cpp |
73
73
  | **Privacy** | ☁️ Cloud | 🔒 Can run 100% locally |
@@ -126,6 +126,11 @@ See [Installation Guide](installation.md) for all methods.
126
126
  - **Setup:** 5 minutes
127
127
  - **Cost:** ~$10-20/month
128
128
 
129
+ **For Affordable Cloud + Reasoning:**
130
+ - ✅ **Moonshot AI** - Kimi K2, thinking models
131
+ - **Setup:** 2 minutes
132
+ - **Cost:** ~$5-10/month
133
+
129
134
  **For Enterprise:**
130
135
  - ✅ **Databricks** - Claude 4.5, enterprise SLA
131
136
  - **Setup:** 10 minutes
@@ -137,23 +142,71 @@ See [Provider Configuration Guide](providers.md) for detailed comparison.
137
142
 
138
143
  ### Can I use multiple providers?
139
144
 
140
- **Yes!** Lynkr supports hybrid routing:
145
+ **Yes!** Lynkr supports tier-based routing:
141
146
 
142
147
  ```bash
143
- # Use Ollama for simple requests, Databricks for complex ones
144
- export PREFER_OLLAMA=true
145
- export OLLAMA_MODEL=llama3.1:8b
148
+ # Set all 4 TIER_* env vars to enable tier-based routing
149
+ export TIER_SIMPLE=ollama:llama3.2
150
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
151
+ export TIER_COMPLEX=azure-openai:gpt-4o
152
+ export TIER_REASONING=azure-openai:gpt-4o
146
153
  export FALLBACK_ENABLED=true
147
154
  export FALLBACK_PROVIDER=databricks
148
155
  ```
149
156
 
150
157
  **How it works:**
151
- - **0-2 tools**: Ollama (free, local, fast)
152
- - **3-15 tools**: OpenRouter (if configured) or fallback
153
- - **16+ tools**: Databricks/Azure (most capable)
154
- - **Ollama failures**: Automatic transparent fallback
158
+ - Each request is scored for complexity (0-100) and mapped to a tier
159
+ - **SIMPLE (0-25)**: Ollama (free, local, fast) or Moonshot (affordable cloud)
160
+ - **MEDIUM (26-50)**: OpenRouter or mid-range cloud model
161
+ - **COMPLEX (51-75)**: Capable cloud models
162
+ - **REASONING (76-100)**: Best available models
163
+ - **Provider failures**: Automatic transparent fallback
164
+
165
+ **Cost savings:** 65-100% for requests routed to local/cheap models.
166
+
167
+ ---
168
+
169
+ ### What is MODEL_PROVIDER and do I still need it?
170
+
171
+ `MODEL_PROVIDER` sets a single static provider for all requests. When you set `MODEL_PROVIDER=ollama`, every request goes to Ollama regardless of complexity.
172
+
173
+ **With TIER_\* vars configured:** `MODEL_PROVIDER` is not used for routing — the tier system picks the provider per-request. However, `MODEL_PROVIDER` is still read for startup checks (e.g. waiting for Ollama) and as a fallback default in edge cases. Keep it set to your most-used provider.
174
+
175
+ **Without TIER_\* vars:** `MODEL_PROVIDER` is the only thing that controls where requests go.
176
+
177
+ ---
178
+
179
+ ### How do MODEL_PROVIDER and TIER_\* work together?
180
+
181
+ They are two separate routing modes:
182
+
183
+ | Scenario | What happens |
184
+ |----------|-------------|
185
+ | `MODEL_PROVIDER` only | Static routing — all requests go to that provider |
186
+ | All 4 `TIER_*` set | Tier routing — TIER_\* **overrides** MODEL_PROVIDER for routing |
187
+ | Only 1-3 `TIER_*` set | Tier routing disabled — falls back to `MODEL_PROVIDER` |
188
+ | Both set | TIER_\* takes priority for routing; MODEL_PROVIDER is kept as a config default |
189
+
190
+ **Example:** If you have `MODEL_PROVIDER=ollama` and `TIER_COMPLEX=databricks:claude-sonnet`, complex requests go to Databricks even though MODEL_PROVIDER says ollama.
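The all-or-nothing activation rule from the table can be sketched as follows (the `routingMode` name is illustrative, not the package's API):

```javascript
// Tier routing activates only when all four TIER_* variables are set;
// otherwise routing falls back to the static MODEL_PROVIDER.
const TIER_KEYS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

function routingMode(env) {
  return TIER_KEYS.every((k) => env[k]) ? "tier" : "static";
}
```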
191
+
192
+ ---
193
+
194
+ ### What happens if I only set some TIER_\* vars?
195
+
196
+ All 4 must be set (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`) for tier routing to activate. If any are missing, tier routing is disabled entirely and `MODEL_PROVIDER` is used for all requests.
197
+
198
+ This is intentional — partial tier config could lead to unexpected gaps where some complexity levels have no provider assigned.
199
+
200
+ ---
201
+
202
+ ### What is FALLBACK_PROVIDER?
203
+
204
+ The fallback provider is a safety net for when the tier-selected provider fails (timeout, connection refused, rate limit). If `FALLBACK_ENABLED=true` and the primary provider for a request fails, Lynkr retries the request against `FALLBACK_PROVIDER` transparently.
155
205
 
156
- **Cost savings:** 65-100% for requests that stay on Ollama.
206
+ - Only triggers when tier routing is active
207
+ - Cannot be a local provider (ollama, llamacpp, lmstudio) — use cloud providers
208
+ - Defaults to `databricks`
209
+ - If you don't have cloud credentials, set `FALLBACK_ENABLED=false`
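The retry behavior described above can be sketched as follows. This is a simplified sketch: `callProvider` is a hypothetical stand-in for the real provider client, and error classification (timeout vs. rate limit) is omitted.

```javascript
// Transparent fallback: retry a failed request against FALLBACK_PROVIDER.
function withFallback(request, primary, callProvider, env = process.env) {
  try {
    return callProvider(primary, request);
  } catch (err) {
    if (env.FALLBACK_ENABLED !== "true") throw err;         // no safety net configured
    const fallback = env.FALLBACK_PROVIDER || "databricks"; // documented default
    return callProvider(fallback, request);                 // transparent retry
  }
}
```

The caller never sees the primary failure when the fallback succeeds, which is what "transparent" means here.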
157
210
 
158
211
  ---
159
212
 
@@ -227,6 +280,7 @@ See [Embeddings Guide](embeddings.md) for details.
227
280
  | **OpenRouter** | 500ms-2s | $-$$ | Excellent | Flexibility, 100+ models |
228
281
  | **Databricks/Azure** | 500ms-2s | $$$ | Excellent | Enterprise, Claude 4.5 |
229
282
  | **AWS Bedrock** | 500ms-2s | $-$$$ | Excellent* | AWS, 100+ models |
283
+ | **Moonshot AI** | 500ms-2s | $ | Good | Affordable, thinking models |
230
284
  | **OpenAI** | 500ms-2s | $$ | Excellent | GPT-4o, o1, o3 |
231
285
 
232
286
  _* Tool calling only supported by Claude models on Bedrock_