lynkr 7.2.4 → 8.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -2
- package/config/model-tiers.json +89 -0
- package/docs/docs.html +1 -0
- package/docs/index.md +7 -0
- package/docs/toon-integration-spec.md +130 -0
- package/documentation/README.md +3 -2
- package/documentation/claude-code-cli.md +23 -16
- package/documentation/cursor-integration.md +17 -14
- package/documentation/docker.md +11 -4
- package/documentation/embeddings.md +7 -5
- package/documentation/faq.md +66 -12
- package/documentation/features.md +22 -15
- package/documentation/installation.md +66 -14
- package/documentation/production.md +43 -8
- package/documentation/providers.md +145 -42
- package/documentation/routing.md +476 -0
- package/documentation/token-optimization.md +7 -5
- package/documentation/troubleshooting.md +81 -5
- package/install.sh +6 -1
- package/package.json +5 -3
- package/scripts/setup.js +0 -1
- package/src/agents/executor.js +14 -6
- package/src/api/middleware/session.js +15 -2
- package/src/api/openai-router.js +130 -37
- package/src/api/providers-handler.js +15 -1
- package/src/api/router.js +107 -2
- package/src/budget/index.js +4 -3
- package/src/clients/databricks.js +431 -234
- package/src/clients/gpt-utils.js +181 -0
- package/src/clients/ollama-utils.js +66 -140
- package/src/clients/routing.js +0 -1
- package/src/clients/standard-tools.js +82 -5
- package/src/config/index.js +119 -35
- package/src/context/toon.js +173 -0
- package/src/headroom/launcher.js +8 -3
- package/src/logger/index.js +23 -0
- package/src/orchestrator/index.js +765 -212
- package/src/routing/agentic-detector.js +320 -0
- package/src/routing/complexity-analyzer.js +202 -2
- package/src/routing/cost-optimizer.js +305 -0
- package/src/routing/index.js +168 -159
- package/src/routing/model-registry.js +437 -0
- package/src/routing/model-tiers.js +365 -0
- package/src/server.js +2 -2
- package/src/sessions/cleanup.js +3 -3
- package/src/sessions/record.js +10 -1
- package/src/sessions/store.js +7 -2
- package/src/tools/agent-task.js +48 -1
- package/src/tools/index.js +15 -2
- package/src/tools/workspace.js +35 -4
- package/src/workspace/index.js +30 -0
- package/te +11622 -0
- package/test/README.md +1 -1
- package/test/azure-openai-config.test.js +17 -8
- package/test/azure-openai-integration.test.js +7 -1
- package/test/azure-openai-routing.test.js +41 -43
- package/test/bedrock-integration.test.js +18 -32
- package/test/hybrid-routing-integration.test.js +35 -20
- package/test/hybrid-routing-performance.test.js +74 -64
- package/test/llamacpp-integration.test.js +28 -9
- package/test/lmstudio-integration.test.js +20 -8
- package/test/openai-integration.test.js +17 -20
- package/test/performance-tests.js +1 -1
- package/test/routing.test.js +65 -59
- package/test/toon-compression.test.js +131 -0
- package/CLAWROUTER_ROUTING_PLAN.md +0 -910
- package/ROUTER_COMPARISON.md +0 -173
- package/TIER_ROUTING_PLAN.md +0 -771
package/README.md
CHANGED
@@ -238,7 +238,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
 
 ### Getting Started
 - 📦 **[Installation Guide](documentation/installation.md)** - Detailed installation for all methods
-- ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all
+- ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 12+ providers
 - 🎯 **[Quick Start Examples](documentation/installation.md#quick-start-examples)** - Copy-paste configs
 
 ### IDE & CLI Integration
@@ -277,7 +277,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
 
 ## Key Features Highlights
 
-- ✅ **Multi-Provider Support** -
+- ✅ **Multi-Provider Support** - 12+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter, Moonshot AI)
 - ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
 - ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
 - ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
package/config/model-tiers.json
ADDED
@@ -0,0 +1,89 @@
+{
+  "tiers": {
+    "SIMPLE": {
+      "description": "Greetings, simple Q&A, confirmations, basic lookups",
+      "range": [0, 25],
+      "priority": 1,
+      "preferred": {
+        "ollama": ["llama3.2", "gemma2", "phi3", "qwen2.5:7b", "mistral"],
+        "llamacpp": ["default"],
+        "lmstudio": ["default"],
+        "openai": ["gpt-4o-mini", "gpt-3.5-turbo"],
+        "azure-openai": ["gpt-4o-mini", "gpt-35-turbo"],
+        "anthropic": ["claude-3-haiku-20240307", "claude-3-5-haiku-20241022"],
+        "bedrock": ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.nova-lite-v1:0"],
+        "databricks": ["databricks-claude-haiku-4-5", "databricks-gpt-5-nano"],
+        "google": ["gemini-2.0-flash", "gemini-1.5-flash"],
+        "openrouter": ["google/gemini-flash-1.5", "deepseek/deepseek-chat"],
+        "zai": ["GLM-4-Flash"],
+        "moonshot": ["kimi-k2-turbo-preview"]
+      }
+    },
+    "MEDIUM": {
+      "description": "Code reading, simple edits, research, documentation",
+      "range": [26, 50],
+      "priority": 2,
+      "preferred": {
+        "ollama": ["qwen2.5:32b", "deepseek-coder:33b", "codellama:34b"],
+        "llamacpp": ["default"],
+        "lmstudio": ["default"],
+        "openai": ["gpt-4o", "gpt-4-turbo"],
+        "azure-openai": ["gpt-4o", "gpt-4"],
+        "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
+        "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0", "amazon.nova-pro-v1:0"],
+        "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1"],
+        "google": ["gemini-1.5-pro", "gemini-2.0-pro"],
+        "openrouter": ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"],
+        "zai": ["GLM-4.7"],
+        "moonshot": ["kimi-k2-turbo-preview"]
+      }
+    },
+    "COMPLEX": {
+      "description": "Multi-file changes, debugging, architecture, refactoring",
+      "range": [51, 75],
+      "priority": 3,
+      "preferred": {
+        "ollama": ["qwen2.5:72b", "llama3.1:70b", "deepseek-coder-v2:236b"],
+        "openai": ["o1-mini", "o3-mini", "gpt-4o"],
+        "azure-openai": ["o1-mini", "gpt-4o"],
+        "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
+        "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0"],
+        "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1-codex-max"],
+        "google": ["gemini-2.5-pro", "gemini-1.5-pro"],
+        "openrouter": ["anthropic/claude-3.5-sonnet", "meta-llama/llama-3.1-405b"],
+        "zai": ["GLM-4.7"],
+        "moonshot": ["kimi-k2-turbo-preview"]
+      }
+    },
+    "REASONING": {
+      "description": "Complex analysis, security audits, novel problems, deep thinking",
+      "range": [76, 100],
+      "priority": 4,
+      "preferred": {
+        "openai": ["o1", "o1-pro", "o3"],
+        "azure-openai": ["o1", "o1-pro"],
+        "anthropic": ["claude-opus-4-20250514", "claude-3-opus-20240229"],
+        "bedrock": ["anthropic.claude-3-opus-20240229-v1:0"],
+        "databricks": ["databricks-claude-opus-4-6", "databricks-claude-opus-4-5", "databricks-gpt-5-2"],
+        "google": ["gemini-2.5-pro"],
+        "openrouter": ["anthropic/claude-3-opus", "deepseek/deepseek-reasoner", "openai/o1"],
+        "deepseek": ["deepseek-reasoner", "deepseek-r1"],
+        "moonshot": ["kimi-k2-thinking", "kimi-k2-turbo-preview"]
+      }
+    }
+  },
+  "localProviders": {
+    "ollama": { "free": true, "defaultTier": "SIMPLE" },
+    "llamacpp": { "free": true, "defaultTier": "SIMPLE" },
+    "lmstudio": { "free": true, "defaultTier": "SIMPLE" }
+  },
+  "providerAliases": {
+    "azure": "azure-openai",
+    "aws": "bedrock",
+    "amazon": "bedrock",
+    "claude": "anthropic",
+    "gemini": "google",
+    "vertex": "google",
+    "kimi": "moonshot"
+  }
+}
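The tier config above pairs numeric score ranges with per-provider model lists and an alias table. A minimal sketch of how a router could consume it — `tierForScore` and `canonicalProvider` are illustrative helpers, not exports of the shipped `src/routing/model-tiers.js`, and the inlined config is a trimmed excerpt:

```javascript
// Trimmed excerpt of config/model-tiers.json (ranges and aliases only).
const tiers = {
  SIMPLE: { range: [0, 25] },
  MEDIUM: { range: [26, 50] },
  COMPLEX: { range: [51, 75] },
  REASONING: { range: [76, 100] },
};
const providerAliases = { azure: "azure-openai", aws: "bedrock", kimi: "moonshot" };

// Map a 0-100 complexity score to the tier whose range contains it.
function tierForScore(score) {
  for (const [name, { range: [lo, hi] }] of Object.entries(tiers)) {
    if (score >= lo && score <= hi) return name;
  }
  return "REASONING"; // clamp out-of-range scores to the top tier
}

// Normalize a user-supplied provider name through the alias table.
function canonicalProvider(name) {
  return providerAliases[name] ?? name;
}

tierForScore(10);           // "SIMPLE"
tierForScore(60);           // "COMPLEX"
canonicalProvider("kimi");  // "moonshot"
```

The ranges are contiguous and cover 0-100, so every score lands in exactly one tier; the clamp only matters for malformed scores.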
package/docs/docs.html
CHANGED
@@ -51,6 +51,7 @@
   <div class="doc-sidebar-title">Features</div>
   <ul class="doc-sidebar-links">
     <li><a href="?doc=features" data-doc="features">Core Features</a></li>
+    <li><a href="?doc=routing" data-doc="routing">Routing & Model Tiering</a></li>
     <li><a href="?doc=token-optimization" data-doc="token-optimization">Token Optimization</a></li>
     <li><a href="?doc=memory-system" data-doc="memory-system">Memory System</a></li>
     <li><a href="?doc=headroom" data-doc="headroom">Headroom Compression</a></li>
package/docs/index.md
CHANGED
@@ -311,6 +311,13 @@
     <span class="provider-badge paid">GPT-4o, o1</span>
   </div>
 
+  <div class="provider-card">
+    <span class="provider-icon">🌙</span>
+    <div class="provider-name">Moonshot AI</div>
+    <div class="provider-type">Cloud</div>
+    <span class="provider-badge paid">KIMI K2</span>
+  </div>
+
   <div class="provider-card">
     <span class="provider-icon" style="font-weight: 900; font-size: 28px; background: linear-gradient(135deg, #1a1a2e, #16213e); color: #fff; width: 42px; height: 42px; border-radius: 10px; display: inline-flex; align-items: center; justify-content: center;">Z</span>
     <div class="provider-name">z.ai</div>
package/docs/toon-integration-spec.md
ADDED
@@ -0,0 +1,130 @@
+# TOON Integration Spec (Lynkr Spike)
+
+Date: 2026-02-17
+Branch: `codex/toon-integration-spike`
+Status: Implemented behind flags (`TOON_ENABLED=false` by default).
+
+## 1) Goal
+
+Reduce prompt token usage for large structured JSON context while preserving current Lynkr routing, tool execution semantics, and reliability.
+
+## 2) Non-Goals
+
+1. Do not replace Lynkr routing/fallback logic.
+2. Do not change MCP/tool protocol behavior.
+3. Do not change provider request envelope formats.
+4. Do not require TOON for normal operation.
+
+## 3) Integration Strategy (Minimal, Reversible)
+
+1. Add a TOON adapter module (encode-only for prompt context).
+2. Apply TOON only to eligible large JSON blobs before they are inserted into model-visible context.
+3. Keep original JSON in memory/session for execution and audit; only the prompt copy is compressed.
+4. Fail open: if TOON conversion fails, send the original JSON unchanged.
+
+## 4) What We Will Compress
+
+Eligible inputs (all required):
+
+1. Payload is a valid JSON object/array.
+2. Payload size exceeds the threshold (for example, `TOON_MIN_BYTES`).
+3. Payload is read-only context for model comprehension (not protocol-critical).
+
+Primary targets:
+
+1. Large tool output summaries inserted into prompt context.
+2. Large search/result payloads injected for reasoning.
+3. Structured data snapshots used for analysis tasks.
+
+## 5) What We Will Never Compress
+
+Hard exclusions:
+
+1. Tool schemas/definitions (`tools`, `input_schema`, function signatures).
+2. Tool call argument payloads that are executed by systems.
+3. Provider request envelopes (`/v1/messages`, `/chat/completions` body schema fields).
+4. Protocol control fields (roles, stop reasons, tool IDs, request IDs).
+5. Stored canonical session payloads used for replay/debug/audit.
+
+Rule: if a payload is machine-validated/executed downstream, keep JSON.
+
+## 6) Config Flags (Default Safe)
+
+Proposed env flags:
+
+1. `TOON_ENABLED=false` (default off)
+2. `TOON_MIN_BYTES=4096` (only convert larger payloads)
+3. `TOON_FAIL_OPEN=true` (fall back to JSON on any TOON error)
+4. `TOON_LOG_STATS=true` (log before/after token estimates for observability)
+
+## 7) Verification Gates
+
+Before enabling:
+
+1. Existing unit tests pass unchanged.
+2. Existing MCP smoke passes (`find_tool`/`call_tool` path).
+
+With `TOON_ENABLED=true`:
+
+1. Prompt A/B benchmark still passes functionally.
+2. No regression in Task/subagent behavior.
+3. Data-heavy prompt shows token reduction vs baseline.
+4. No increase in protocol/tool-call errors.
+
+## 8) Rollback Rules
+
+Immediate rollback:
+
+1. Set `TOON_ENABLED=false`.
+2. Restart the Lynkr service.
+
+Code rollback:
+
+1. Revert TOON integration commit(s) on this branch.
+2. Re-run unit + MCP smoke gates.
+
+## 9) Risks and Mitigations
+
+1. Risk: semantic drift from transformed payloads.
+   - Mitigation: apply only to read-only context, fail open on error, keep canonical JSON.
+2. Risk: negligible gains on non-tabular/deeply nested payloads.
+   - Mitigation: threshold + eligibility checks; skip low-value payloads.
+3. Risk: harder debugging.
+   - Mitigation: log conversion stats and keep the original payload for diagnostics.
+
+## 10) Stock Provider Validation (Ollama Cloud)
+
+Date: 2026-02-17
+
+Runtime under test:
+
+1. `MODEL_PROVIDER=ollama`
+2. `OLLAMA_ENDPOINT=http://127.0.0.1:11434`
+3. `OLLAMA_MODEL=glm-5:cloud`
+4. `TOON_MIN_BYTES=256`
+5. `TOON_FAIL_OPEN=true`
+6. `TOON_LOG_STATS=true`
+
+Probe used:
+
+1. Send a two-message request where the second message is a large JSON blob.
+2. Ask the model to classify the next message as `JSON` vs `OTHER` based on its first character.
+3. Run once with `TOON_ENABLED=false`, once with `TOON_ENABLED=true`.
+
+Observed results:
+
+1. `TOON_ENABLED=false`
+   - Reply: `JSON`
+   - Provider header: `x-lynkr-provider: ollama`
+   - TOON log entries: `0`
+2. `TOON_ENABLED=true`
+   - Reply: `OTHER`
+   - Provider header: `x-lynkr-provider: ollama`
+   - TOON log entries: `1`
+   - Logged conversion stats: `originalBytes=6416`, `compressedBytes=5854` (saved `562` bytes, `8.76%`)
+
+Conclusion:
+
+1. TOON gating works on the stock Ollama cloud path (not moonshot-specific).
+2. Compression is applied only when flag-enabled.
+3. Provider routing remains unchanged (`ollama`) during TOON transformation.
package/documentation/README.md
CHANGED
@@ -9,7 +9,7 @@ Welcome to the comprehensive documentation for Lynkr, the self-hosted Claude Cod
 New to Lynkr? Start here:
 
 - **[Installation Guide](installation.md)** - Complete installation instructions for all methods (npm, git clone, homebrew, Docker)
-- **[Provider Configuration](providers.md)** - Detailed setup for all
+- **[Provider Configuration](providers.md)** - Detailed setup for all 12+ supported providers (Databricks, Bedrock, OpenRouter, Ollama, llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio, Moonshot AI, Z.AI, Vertex AI)
 - **[Quick Start Examples](installation.md#quick-start-examples)** - Copy-paste configurations to get running fast
 
 ---
@@ -30,6 +30,7 @@ Connect Lynkr to your development tools:
 Understand Lynkr's capabilities:
 
 - **[Architecture & Features](features.md)** - System architecture, request flow, format conversion, and core capabilities
+- **[Routing & Model Tiering](routing.md)** - 4-tier model system, 15-dimension complexity scoring, agentic workflow detection, and cost optimization
 - **[Memory System](memory-system.md)** - Titans-inspired long-term memory with surprise-based filtering and decay
 - **[Token Optimization](token-optimization.md)** - Achieve 60-80% cost reduction through smart tool selection, prompt caching, and memory deduplication
 - **[Headroom Compression](headroom.md)** - 47-92% token reduction through intelligent context compression (Smart Crusher, CCR, LLMLingua)
@@ -73,7 +74,7 @@ Get help and contribute:
 - [Installation](installation.md) | [Providers](providers.md) | [Claude Code](claude-code-cli.md) | [Codex CLI](codex-cli.md) | [Cursor](cursor-integration.md) | [Embeddings](embeddings.md)
 
 ### Features & Optimization
-- [Features](features.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
+- [Features](features.md) | [Routing](routing.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
 
 ### Deployment & Production
 - [Docker](docker.md) | [Production](production.md) | [API Reference](api.md)
package/documentation/claude-code-cli.md
CHANGED
@@ -11,7 +11,7 @@ Lynkr acts as a drop-in replacement for Anthropic's backend, enabling Claude Cod
 ### Why Use Lynkr with Claude Code CLI?
 
 - 💰 **60-80% cost savings** through token optimization
-- 🔓 **Provider choice** - Use any of
+- 🔓 **Provider choice** - Use any of 12+ supported providers
 - 🏠 **Self-hosted** - Full control over your AI infrastructure
 - 🔒 **Local option** - Run 100% offline with Ollama or llama.cpp
 - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
@@ -74,7 +74,7 @@ export DATABRICKS_API_BASE=https://your-workspace.databricks.com
 export DATABRICKS_API_KEY=dapi1234567890abcdef
 ```
 
-See [Provider Configuration Guide](providers.md) for all
+See [Provider Configuration Guide](providers.md) for all 12+ providers.
 
 ---
 
@@ -341,15 +341,16 @@ export MODEL_PROVIDER=databricks
 
 ---
 
-##
+## Tier-Based Routing (Cost Optimization)
 
-Use local Ollama for simple tasks,
+Use local Ollama for simple tasks, cloud for complex ones:
 
 ```bash
-# Configure
-export
-export
-export
+# Configure tier-based routing (set all 4 to enable)
+export TIER_SIMPLE=ollama:llama3.2
+export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
 export FALLBACK_ENABLED=true
 export FALLBACK_PROVIDER=databricks
 export DATABRICKS_API_BASE=https://your-workspace.databricks.com
@@ -360,13 +361,15 @@ lynkr start
 ```
 
 **How it works:**
--
-- **
-- **
-- **
+- Each request is scored for complexity (0-100) and mapped to a tier
+- **SIMPLE (0-25)**: Ollama (free, local, fast)
+- **MEDIUM (26-50)**: OpenRouter (affordable cloud)
+- **COMPLEX (51-75)**: Databricks (most capable)
+- **REASONING (76-100)**: Databricks (best available)
+- **Provider failures**: Automatic transparent fallback to cloud
 
 **Cost savings:**
-- **65-100%** for requests
+- **65-100%** for requests routed to local models
 - **40-87%** faster for simple requests
 
 ---
@@ -534,9 +537,13 @@ claude "What files are in the current directory?"
 - Local (Ollama): Should be 100-500ms
 - Cloud: Should be 500ms-2s
 
-2. **Enable
+2. **Enable tier-based routing:**
 ```bash
-
+# Set all 4 TIER_* env vars to enable tier-based routing
+export TIER_SIMPLE=ollama:llama3.2
+export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+export TIER_COMPLEX=azure-openai:gpt-4o
+export TIER_REASONING=azure-openai:gpt-4o
 export FALLBACK_ENABLED=true
 ```
 
@@ -655,7 +662,7 @@ Claude Code CLI (displays result)
 
 ## Next Steps
 
-- **[Provider Configuration](providers.md)** - Configure all
+- **[Provider Configuration](providers.md)** - Configure all 12+ providers
 - **[Installation Guide](installation.md)** - Detailed installation
 - **[Features Guide](features.md)** - Learn about advanced features
 - **[Token Optimization](token-optimization.md)** - Maximize cost savings
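The `TIER_*` values above all use a `provider:model` form in which the model part may itself contain separators (`openrouter:openai/gpt-4o-mini`, `ollama:qwen2.5:7b`). A minimal parsing sketch under the assumption that only the first colon delimits the provider — `parseTierTarget` is illustrative, not the shipped config loader:

```javascript
// Parse "provider:model" on the FIRST colon only, so model IDs that
// contain "/" or ":" (OpenRouter paths, Ollama tags) survive intact.
function parseTierTarget(value) {
  const i = value.indexOf(":");
  if (i <= 0 || i === value.length - 1) {
    throw new Error(`Invalid tier target "${value}" (expected provider:model)`);
  }
  return { provider: value.slice(0, i), model: value.slice(i + 1) };
}

parseTierTarget("ollama:qwen2.5:7b");
// → { provider: "ollama", model: "qwen2.5:7b" }
```

Rejecting empty provider or model parts up front gives a clear startup error instead of a silent mis-route later.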
package/documentation/cursor-integration.md
CHANGED
@@ -534,11 +534,14 @@ AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
 - **Cloud** (OpenRouter/Databricks): Should be 500ms-2s
 - **Distant regions**: Can be 2-5s
 
-2. **Enable
+2. **Enable tier-based routing** for speed:
 ```env
-# Use Ollama for simple requests (fast)
-#
-
+# Use Ollama for simple requests (fast), cloud for complex requests
+# Set all 4 TIER_* env vars to enable tier-based routing
+TIER_SIMPLE=ollama:llama3.2
+TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+TIER_COMPLEX=azure-openai:gpt-4o
+TIER_REASONING=azure-openai:gpt-4o
 FALLBACK_ENABLED=true
 ```
 
@@ -675,12 +678,12 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
 ### Setup 3: Hybrid (Best of Both Worlds)
 
 ```bash
-# Chat:
-
+# Chat: Tier-based routing (set all 4 to enable)
+TIER_SIMPLE=ollama:llama3.2
+TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+TIER_REASONING=databricks:databricks-claude-sonnet-4-5
 FALLBACK_ENABLED=true
-OLLAMA_MODEL=llama3.1:8b
-
-# Fallback to Databricks for complex requests
 FALLBACK_PROVIDER=databricks
 DATABRICKS_API_BASE=https://your-workspace.databricks.com
 DATABRICKS_API_KEY=your-key
@@ -688,15 +691,15 @@ DATABRICKS_API_KEY=your-key
 # Embeddings: Ollama (local, private)
 OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
 
-# Cost: Mostly FREE (Ollama handles 70-80% of requests)
-# Only complex
+# Cost: Mostly FREE (Ollama handles 70-80% of simple requests)
+# Only complex/reasoning requests go to Databricks
 ```
 
 **Benefits:**
-- ✅ Mostly FREE (70-80% of requests on Ollama)
+- ✅ Mostly FREE (70-80% of requests on Ollama via TIER_SIMPLE)
 - ✅ Private embeddings (local search)
 - ✅ Cloud quality for complex tasks
-- ✅ Automatic intelligent routing
+- ✅ Automatic intelligent tier-based routing
 
 ---
 
@@ -704,7 +707,7 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
 
 | Aspect | Cursor Native | Lynkr + Cursor |
 |--------|---------------|----------------|
-| **Providers** | OpenAI only |
+| **Providers** | OpenAI only | 12+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, Moonshot, etc.) |
 | **Costs** | OpenAI pricing | 60-80% cheaper (or 100% FREE with Ollama) |
 | **Privacy** | Cloud-only | Can run 100% locally (Ollama + local embeddings) |
 | **Embeddings** | Built-in (cloud) | 4 options: Ollama (local), llama.cpp (local), OpenRouter (cloud), OpenAI (cloud) |
package/documentation/docker.md
CHANGED
@@ -73,10 +73,14 @@ services:
     ports:
       - "8081:8081"
     environment:
-      #
+      # Tier-based routing: local for simple, cloud for complex
       - MODEL_PROVIDER=ollama
       - OLLAMA_API_BASE=http://ollama:11434
-      -
+      # Set all 4 TIER_* vars to enable tier-based routing
+      - TIER_SIMPLE=ollama:llama3.2
+      - TIER_MEDIUM=ollama:llama3.2
+      - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+      - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
       - FALLBACK_ENABLED=true
       - FALLBACK_PROVIDER=databricks
      - DATABRICKS_API_BASE=${DATABRICKS_API_BASE}
@@ -452,8 +456,11 @@ environment:
   - DATABRICKS_API_BASE=https://your-workspace.databricks.com
   - DATABRICKS_API_KEY=${DATABRICKS_API_KEY}
 
-  #
-  -
+  # Tier-based routing (set all 4 to enable)
+  - TIER_SIMPLE=ollama:llama3.2
+  - TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+  - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+  - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
   - FALLBACK_ENABLED=true
   - FALLBACK_PROVIDER=databricks
 
package/documentation/embeddings.md
CHANGED
@@ -532,10 +532,12 @@ OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
 **Best for:** Privacy + Quality + Cost Optimization
 
 ```env
-# Chat:
-
+# Chat: Tier-based routing (set all 4 to enable)
+TIER_SIMPLE=ollama:llama3.2
+TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+TIER_REASONING=databricks:databricks-claude-sonnet-4-5
 FALLBACK_ENABLED=true
-OLLAMA_MODEL=llama3.1:8b
 FALLBACK_PROVIDER=databricks
 DATABRICKS_API_BASE=https://your-workspace.databricks.com
 DATABRICKS_API_KEY=your-key
@@ -547,10 +549,10 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
 ```
 
 **Benefits:**
-- ✅ 70-80% of chat requests FREE (Ollama)
+- ✅ 70-80% of chat requests FREE (Ollama via TIER_SIMPLE)
 - ✅ 100% private embeddings (local)
 - ✅ Cloud quality for complex tasks
-- ✅ Intelligent automatic routing
+- ✅ Intelligent automatic tier-based routing
 
 ---
 
package/documentation/faq.md
CHANGED
@@ -8,11 +8,11 @@ Common questions about Lynkr, installation, configuration, and usage.
 
 ### What is Lynkr?
 
-Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, etc.) instead of being locked to Anthropic's API.
+Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Moonshot AI, etc.) instead of being locked to Anthropic's API.
 
 **Key benefits:**
 - 💰 **60-80% cost savings** through token optimization
-- 🔓 **Provider flexibility** - Choose from
+- 🔓 **Provider flexibility** - Choose from 12+ providers
 - 🔒 **Privacy** - Run 100% locally with Ollama or llama.cpp
 - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
 
@@ -67,7 +67,7 @@ Lynkr itself is **100% FREE** and open source (Apache 2.0 license).
 
 | Feature | Native Claude Code | Lynkr |
 |---------|-------------------|-------|
-| **Providers** | Anthropic only |
+| **Providers** | Anthropic only | 12+ providers |
 | **Cost** | Full Anthropic pricing | 60-80% cheaper |
 | **Local models** | ❌ Cloud-only | ✅ Ollama, llama.cpp |
 | **Privacy** | ☁️ Cloud | 🔒 Can run 100% locally |
@@ -126,6 +126,11 @@ See [Installation Guide](installation.md) for all methods.
 - **Setup:** 5 minutes
 - **Cost:** ~$10-20/month
 
+**For Affordable Cloud + Reasoning:**
+- ✅ **Moonshot AI** - Kimi K2, thinking models
+- **Setup:** 2 minutes
+- **Cost:** ~$5-10/month
+
 **For Enterprise:**
 - ✅ **Databricks** - Claude 4.5, enterprise SLA
 - **Setup:** 10 minutes
@@ -137,23 +142,71 @@ See [Provider Configuration Guide](providers.md) for detailed comparison.
 
 ### Can I use multiple providers?
 
-**Yes!** Lynkr supports
+**Yes!** Lynkr supports tier-based routing:
 
 ```bash
-#
-export
-export
+# Set all 4 TIER_* env vars to enable tier-based routing
+export TIER_SIMPLE=ollama:llama3.2
+export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+export TIER_COMPLEX=azure-openai:gpt-4o
+export TIER_REASONING=azure-openai:gpt-4o
 export FALLBACK_ENABLED=true
 export FALLBACK_PROVIDER=databricks
 ```
 
 **How it works:**
--
-- **
-- **
-- **
+- Each request is scored for complexity (0-100) and mapped to a tier
+- **SIMPLE (0-25)**: Ollama (free, local, fast) or Moonshot (affordable cloud)
+- **MEDIUM (26-50)**: OpenRouter or mid-range cloud model
+- **COMPLEX (51-75)**: Capable cloud models
+- **REASONING (76-100)**: Best available models
+- **Provider failures**: Automatic transparent fallback
+
+**Cost savings:** 65-100% for requests routed to local/cheap models.
+
+---
+
+### What is MODEL_PROVIDER and do I still need it?
+
+`MODEL_PROVIDER` sets a single static provider for all requests. When you set `MODEL_PROVIDER=ollama`, every request goes to Ollama regardless of complexity.
+
+**With TIER_\* vars configured:** `MODEL_PROVIDER` is not used for routing — the tier system picks the provider per-request. However, `MODEL_PROVIDER` is still read for startup checks (e.g. waiting for Ollama) and as a fallback default in edge cases. Keep it set to your most-used provider.
+
+**Without TIER_\* vars:** `MODEL_PROVIDER` is the only thing that controls where requests go.
+
+---
+
+### How do MODEL_PROVIDER and TIER_\* work together?
+
+They are two separate routing modes:
+
+| Scenario | What happens |
+|----------|-------------|
+| `MODEL_PROVIDER` only | Static routing — all requests go to that provider |
+| All 4 `TIER_*` set | Tier routing — TIER_\* **overrides** MODEL_PROVIDER for routing |
+| Only 1-3 `TIER_*` set | Tier routing disabled — falls back to `MODEL_PROVIDER` |
+| Both set | TIER_\* takes priority for routing; MODEL_PROVIDER is kept as a config default |
+
+**Example:** If you have `MODEL_PROVIDER=ollama` and `TIER_COMPLEX=databricks:claude-sonnet`, complex requests go to Databricks even though MODEL_PROVIDER says ollama.
+
+---
+
+### What happens if I only set some TIER_\* vars?
+
+All 4 must be set (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`) for tier routing to activate. If any are missing, tier routing is disabled entirely and `MODEL_PROVIDER` is used for all requests.
+
+This is intentional — partial tier config could lead to unexpected gaps where some complexity levels have no provider assigned.
+
+---
+
+### What is FALLBACK_PROVIDER?
+
+The fallback provider is a safety net for when the tier-selected provider fails (timeout, connection refused, rate limit). If `FALLBACK_ENABLED=true` and the primary provider for a request fails, Lynkr retries the request against `FALLBACK_PROVIDER` transparently.
 
-
+- Only triggers when tier routing is active
+- Cannot be a local provider (ollama, llamacpp, lmstudio) — use cloud providers
+- Defaults to `databricks`
+- If you don't have cloud credentials, set `FALLBACK_ENABLED=false`
 
 ---
 
@@ -227,6 +280,7 @@ See [Embeddings Guide](embeddings.md) for details.
 | **OpenRouter** | 500ms-2s | $-$$ | Excellent | Flexibility, 100+ models |
 | **Databricks/Azure** | 500ms-2s | $$$ | Excellent | Enterprise, Claude 4.5 |
 | **AWS Bedrock** | 500ms-2s | $-$$$ | Excellent* | AWS, 100+ models |
+| **Moonshot AI** | 500ms-2s | $ | Good | Affordable, thinking models |
 | **OpenAI** | 500ms-2s | $$ | Excellent | GPT-4o, o1, o3 |
 
 _* Tool calling only supported by Claude models on Bedrock_