lynkr 7.2.4 → 8.0.0
- package/README.md +2 -2
- package/config/model-tiers.json +89 -0
- package/docs/docs.html +1 -0
- package/docs/index.md +7 -0
- package/docs/toon-integration-spec.md +130 -0
- package/documentation/README.md +3 -2
- package/documentation/claude-code-cli.md +23 -16
- package/documentation/cursor-integration.md +17 -14
- package/documentation/docker.md +11 -4
- package/documentation/embeddings.md +7 -5
- package/documentation/faq.md +66 -12
- package/documentation/features.md +22 -15
- package/documentation/installation.md +66 -14
- package/documentation/production.md +43 -8
- package/documentation/providers.md +145 -42
- package/documentation/routing.md +476 -0
- package/documentation/token-optimization.md +7 -5
- package/documentation/troubleshooting.md +81 -5
- package/install.sh +6 -1
- package/package.json +5 -3
- package/scripts/setup.js +0 -1
- package/src/agents/executor.js +14 -6
- package/src/api/middleware/session.js +15 -2
- package/src/api/openai-router.js +130 -37
- package/src/api/providers-handler.js +15 -1
- package/src/api/router.js +107 -2
- package/src/budget/index.js +4 -3
- package/src/clients/databricks.js +431 -234
- package/src/clients/gpt-utils.js +181 -0
- package/src/clients/ollama-utils.js +66 -140
- package/src/clients/routing.js +0 -1
- package/src/clients/standard-tools.js +82 -5
- package/src/config/index.js +119 -35
- package/src/context/toon.js +173 -0
- package/src/headroom/launcher.js +8 -3
- package/src/logger/index.js +23 -0
- package/src/orchestrator/index.js +765 -212
- package/src/routing/agentic-detector.js +320 -0
- package/src/routing/complexity-analyzer.js +202 -2
- package/src/routing/cost-optimizer.js +305 -0
- package/src/routing/index.js +168 -159
- package/src/routing/model-registry.js +437 -0
- package/src/routing/model-tiers.js +365 -0
- package/src/server.js +2 -2
- package/src/sessions/cleanup.js +3 -3
- package/src/sessions/record.js +10 -1
- package/src/sessions/store.js +7 -2
- package/src/tools/agent-task.js +48 -1
- package/src/tools/index.js +15 -2
- package/src/tools/workspace.js +35 -4
- package/src/workspace/index.js +30 -0
- package/te +11622 -0
- package/test/README.md +1 -1
- package/test/azure-openai-config.test.js +17 -8
- package/test/azure-openai-integration.test.js +7 -1
- package/test/azure-openai-routing.test.js +41 -43
- package/test/bedrock-integration.test.js +18 -32
- package/test/hybrid-routing-integration.test.js +35 -20
- package/test/hybrid-routing-performance.test.js +74 -64
- package/test/llamacpp-integration.test.js +28 -9
- package/test/lmstudio-integration.test.js +20 -8
- package/test/openai-integration.test.js +17 -20
- package/test/performance-tests.js +1 -1
- package/test/routing.test.js +65 -59
- package/test/toon-compression.test.js +131 -0
- package/CLAWROUTER_ROUTING_PLAN.md +0 -910
- package/ROUTER_COMPARISON.md +0 -173
- package/TIER_ROUTING_PLAN.md +0 -771
|
@@ -26,6 +26,7 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
|
|
|
26
26
|
├──→ Databricks (Claude 4.5)
|
|
27
27
|
├──→ AWS Bedrock (100+ models)
|
|
28
28
|
├──→ OpenRouter (100+ models)
|
|
29
|
+
├──→ Moonshot AI (Kimi K2)
|
|
29
30
|
├──→ Ollama (local, free)
|
|
30
31
|
├──→ llama.cpp (local, free)
|
|
31
32
|
├──→ Azure OpenAI (GPT-4o, o1)
|
|
@@ -52,17 +53,19 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
|
|
|
52
53
|
|
|
53
54
|
### 2. Provider Routing
|
|
54
55
|
|
|
55
|
-
**
|
|
56
|
+
**4-Tier Intelligent Routing:**
|
|
56
57
|
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
58
|
+
Lynkr uses a multi-phase complexity analysis to route each request to the optimal model tier:
|
|
59
|
+
|
|
60
|
+
| Tier (Score) | Example Tasks | Routes To |
|
|
61
|
+
|------|-------|-----------|
|
|
62
|
+
| SIMPLE (0-25) | Greetings, simple Q&A | Cheap/local models (Ollama, llama.cpp) |
|
|
63
|
+
| MEDIUM (26-50) | Code reading, simple edits | Mid-range models (GPT-4o, Claude Sonnet) |
|
|
64
|
+
| COMPLEX (51-75) | Multi-file changes, debugging | Capable models (o1-mini, Claude Sonnet) |
|
|
65
|
+
| REASONING (76-100) | Security audits, architecture | Best models (o1, Claude Opus) |
|
|
66
|
+
|
|
67
|
+
Includes agentic workflow detection, 15-dimension weighted scoring, and cost optimization.
|
|
68
|
+
See **[Routing & Model Tiering](routing.md)** for full details.
|
|
66
69
|
|
|
67
70
|
**Automatic Fallback:**
|
|
68
71
|
- If primary provider fails → Use FALLBACK_PROVIDER
|
|
@@ -171,6 +174,8 @@ data: {}
|
|
|
171
174
|
- `invokeOllama()` - Ollama local
|
|
172
175
|
- `invokeLlamaCpp()` - llama.cpp
|
|
173
176
|
- `invokeBedrock()` - AWS Bedrock
|
|
177
|
+
- `invokeMoonshot()` - Moonshot AI (Kimi)
|
|
178
|
+
- `invokeZai()` - Z.AI (Zhipu AI)
|
|
174
179
|
|
|
175
180
|
**Format converters:**
|
|
176
181
|
- `openrouter-utils.js` - OpenAI format conversion
|
|
@@ -271,14 +276,15 @@ data: {}
|
|
|
271
276
|
|
|
272
277
|
### 1. Multi-Provider Support
|
|
273
278
|
|
|
274
|
-
**
|
|
275
|
-
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
|
|
279
|
+
**12+ Providers:**
|
|
280
|
+
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI, Moonshot AI, Z.AI, Vertex AI
|
|
276
281
|
- Local: Ollama, llama.cpp, LM Studio
|
|
277
282
|
|
|
278
283
|
**Hybrid Routing:**
|
|
279
|
-
-
|
|
280
|
-
-
|
|
281
|
-
-
|
|
284
|
+
- [4-tier intelligent routing](routing.md) with complexity scoring
|
|
285
|
+
- Automatic provider selection and transparent failover
|
|
286
|
+
- Agentic workflow detection with tier upgrades
|
|
287
|
+
- Cost optimization with multi-source pricing
|
|
282
288
|
|
|
283
289
|
### 2. Token Optimization
|
|
284
290
|
|
|
@@ -383,6 +389,7 @@ PROMPT_CACHE_MAX_ENTRIES=256
|
|
|
383
389
|
|
|
384
390
|
## Next Steps
|
|
385
391
|
|
|
392
|
+
- **[Routing & Model Tiering](routing.md)** - Intelligent routing and scoring algorithm
|
|
386
393
|
- **[Memory System](memory-system.md)** - Long-term memory details
|
|
387
394
|
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
|
|
388
395
|
- **[Production Guide](production.md)** - Deploy to production
|
|
@@ -16,6 +16,7 @@ Before installing Lynkr, ensure you have:
|
|
|
16
16
|
- **OpenRouter API key** (get from [openrouter.ai/keys](https://openrouter.ai/keys))
|
|
17
17
|
- **Azure OpenAI** or **Azure Anthropic** subscription
|
|
18
18
|
- **OpenAI API key** (get from [platform.openai.com/api-keys](https://platform.openai.com/api-keys))
|
|
19
|
+
- **Moonshot AI API key** (get from [platform.moonshot.ai](https://platform.moonshot.ai))
|
|
19
20
|
- **Ollama** installed locally (for free local models)
|
|
20
21
|
- Optional: **Docker** for containerized deployment or MCP sandboxing
|
|
21
22
|
- Optional: **Claude Code CLI** (latest release) for CLI usage
|
|
@@ -236,6 +237,25 @@ MEMORY_RETRIEVAL_LIMIT=5
|
|
|
236
237
|
|
|
237
238
|
---
|
|
238
239
|
|
|
240
|
+
## Understanding Provider Selection
|
|
241
|
+
|
|
242
|
+
Lynkr has two modes for selecting which AI provider handles your requests:
|
|
243
|
+
|
|
244
|
+
| Mode | Config | How it works | Best for |
|
|
245
|
+
|------|--------|-------------|----------|
|
|
246
|
+
| **Static** | `MODEL_PROVIDER=ollama` | All requests go to one provider | Simple setups, single provider |
|
|
247
|
+
| **Tier-based** | All 4 `TIER_*` vars set | Requests route by complexity score | Cost optimization, multi-provider |
|
|
248
|
+
|
|
249
|
+
**Static mode** — Set `MODEL_PROVIDER` to your provider. Every request goes there. Simple and predictable.
|
|
250
|
+
|
|
251
|
+
**Tier-based mode** — Set all 4 `TIER_*` env vars (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`). Each request is scored for complexity and routed to the appropriate tier's provider. When all 4 are set, they **override** `MODEL_PROVIDER` for routing decisions.
|
|
252
|
+
|
|
253
|
+
> **Note:** If only some `TIER_*` vars are set (not all 4), tier routing is disabled and `MODEL_PROVIDER` is used instead. `MODEL_PROVIDER` is always required as a fallback default even when tiers are configured.
|
|
254
|
+
|
|
255
|
+
See [Tier-Based Routing](#tier-based-routing-cost-optimization) below for full setup, or pick a single provider from the Quick Start examples to get running immediately.
|
|
256
|
+
|
|
257
|
+
---
|
|
258
|
+
|
|
239
259
|
## Quick Start Examples
|
|
240
260
|
|
|
241
261
|
Choose your provider and follow the setup steps:
|
|
@@ -501,7 +521,36 @@ lynkr start
|
|
|
501
521
|
|
|
502
522
|
---
|
|
503
523
|
|
|
504
|
-
### 9.
|
|
524
|
+
### 9. Moonshot AI / Kimi (Affordable Cloud)
|
|
525
|
+
|
|
526
|
+
**Best for:** Affordable cloud models, thinking/reasoning models
|
|
527
|
+
|
|
528
|
+
```bash
|
|
529
|
+
# Install
|
|
530
|
+
npm install -g lynkr
|
|
531
|
+
|
|
532
|
+
# Configure
|
|
533
|
+
export MODEL_PROVIDER=moonshot
|
|
534
|
+
export MOONSHOT_API_KEY=sk-your-moonshot-api-key
|
|
535
|
+
export MOONSHOT_MODEL=kimi-k2-turbo-preview
|
|
536
|
+
|
|
537
|
+
# Start
|
|
538
|
+
lynkr start
|
|
539
|
+
```
|
|
540
|
+
|
|
541
|
+
**Get Moonshot API key:**
|
|
542
|
+
1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
|
|
543
|
+
2. Sign up or log in
|
|
544
|
+
3. Create a new API key
|
|
545
|
+
4. Add credits to your account
|
|
546
|
+
|
|
547
|
+
**Available models:**
|
|
548
|
+
- `kimi-k2-turbo-preview` - Fast, efficient, tool calling support
|
|
549
|
+
- `kimi-k2-thinking` - Chain-of-thought reasoning model
|
|
550
|
+
|
|
551
|
+
---
|
|
552
|
+
|
|
553
|
+
### 10. LM Studio (Local with GUI)
|
|
505
554
|
|
|
506
555
|
**Best for:** Local models with graphical interface
|
|
507
556
|
|
|
@@ -525,19 +574,20 @@ lynkr start
|
|
|
525
574
|
|
|
526
575
|
---
|
|
527
576
|
|
|
528
|
-
##
|
|
577
|
+
## Tier-Based Routing (Cost Optimization)
|
|
529
578
|
|
|
530
|
-
**Use local Ollama for simple tasks,
|
|
579
|
+
**Use local Ollama for simple tasks, cloud for complex ones:**
|
|
531
580
|
|
|
532
581
|
```bash
|
|
533
582
|
# Start Ollama
|
|
534
583
|
ollama serve
|
|
535
|
-
ollama pull llama3.
|
|
584
|
+
ollama pull llama3.2
|
|
536
585
|
|
|
537
|
-
# Configure
|
|
538
|
-
export
|
|
539
|
-
export
|
|
540
|
-
export
|
|
586
|
+
# Configure tier-based routing (set all 4 to enable)
|
|
587
|
+
export TIER_SIMPLE=ollama:llama3.2
|
|
588
|
+
export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
|
|
589
|
+
export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
|
|
590
|
+
export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
|
|
541
591
|
export FALLBACK_ENABLED=true
|
|
542
592
|
export FALLBACK_PROVIDER=databricks
|
|
543
593
|
export DATABRICKS_API_BASE=https://your-workspace.databricks.com
|
|
@@ -548,13 +598,15 @@ lynkr start
|
|
|
548
598
|
```
|
|
549
599
|
|
|
550
600
|
**How it works:**
|
|
551
|
-
-
|
|
552
|
-
- **
|
|
553
|
-
- **
|
|
554
|
-
- **
|
|
601
|
+
- Each request is scored for complexity (0-100) and mapped to a tier
|
|
602
|
+
- **SIMPLE (0-25)**: Ollama (free, local, fast)
|
|
603
|
+
- **MEDIUM (26-50)**: OpenRouter (affordable cloud)
|
|
604
|
+
- **COMPLEX (51-75)**: Databricks (most capable)
|
|
605
|
+
- **REASONING (76-100)**: Databricks (best available)
|
|
606
|
+
- **Provider failures**: Automatic transparent fallback to cloud
|
|
555
607
|
|
|
556
608
|
**Cost savings:**
|
|
557
|
-
- **65-100%** for requests
|
|
609
|
+
- **65-100%** for requests routed to local models
|
|
558
610
|
- **40-87%** faster for simple requests
|
|
559
611
|
- **Privacy**: Simple queries never leave your machine
|
|
560
612
|
|
|
@@ -614,7 +666,7 @@ See [Provider Configuration Guide](providers.md) for complete environment variab
|
|
|
614
666
|
|
|
615
667
|
| Variable | Description | Default |
|
|
616
668
|
|----------|-------------|---------|
|
|
617
|
-
| `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
|
|
669
|
+
| `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `moonshot`, `zai`, `vertex`) | `databricks` |
|
|
618
670
|
| `PORT` | HTTP port for proxy server | `8081` |
|
|
619
671
|
| `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
|
|
620
672
|
| `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
|
|
@@ -190,15 +190,35 @@ METRICS_ENABLED=true # default: true
|
|
|
190
190
|
|
|
191
191
|
### 6. Structured Logging
|
|
192
192
|
|
|
193
|
-
JSON logs with request ID correlation.
|
|
193
|
+
JSON logs with request ID correlation via [Pino](https://github.com/pinojs/pino).
|
|
194
|
+
|
|
195
|
+
**Log Level Philosophy:**
|
|
196
|
+
- **`info`** — Meaningful milestones: request received (minimal), request completed (duration + tokens), errors, retries, fallbacks
|
|
197
|
+
- **`debug`** — Operational details: request body previews, tool injection, streaming chunks, intermediate conversions, tool mapping
|
|
198
|
+
|
|
199
|
+
**Console Configuration:**
|
|
200
|
+
```bash
|
|
201
|
+
LOG_LEVEL=info # options: error, warn, info, debug (default: info)
|
|
202
|
+
REQUEST_LOGGING_ENABLED=true # default: true
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
In development mode (`NODE_ENV=development`), logs are pretty-printed via `pino-pretty`.
|
|
206
|
+
|
|
207
|
+
**File Logging (optional):**
|
|
208
|
+
|
|
209
|
+
Persistent log files with automatic daily rotation via [pino-roll](https://github.com/pinojs/pino-roll). Enable by setting `LOG_FILE_ENABLED=true`.
|
|
194
210
|
|
|
195
|
-
**Configuration:**
|
|
196
211
|
```bash
|
|
197
|
-
|
|
198
|
-
|
|
212
|
+
LOG_FILE_ENABLED=true # default: false
|
|
213
|
+
LOG_FILE_PATH=./logs/lynkr.log # default: <cwd>/logs/lynkr.log
|
|
214
|
+
LOG_FILE_LEVEL=debug # default: debug (captures all levels)
|
|
215
|
+
LOG_FILE_FREQUENCY=daily # options: daily, hourly, custom (default: daily)
|
|
216
|
+
LOG_FILE_MAX_FILES=14 # rotated files to keep (default: 14)
|
|
199
217
|
```
|
|
200
218
|
|
|
201
|
-
|
|
219
|
+
Rotated files are named with timestamps (e.g., `lynkr.log.2025-07-12`). The log directory is created automatically.
|
|
220
|
+
|
|
221
|
+
**Log format (JSON):**
|
|
202
222
|
```json
|
|
203
223
|
{
|
|
204
224
|
"level": "info",
|
|
@@ -216,10 +236,25 @@ REQUEST_LOGGING_ENABLED=true # default: true
|
|
|
216
236
|
}
|
|
217
237
|
```
|
|
218
238
|
|
|
239
|
+
**Querying log files:**
|
|
240
|
+
```bash
|
|
241
|
+
# Tail live logs
|
|
242
|
+
tail -f ./logs/lynkr.log | npx pino-pretty
|
|
243
|
+
|
|
244
|
+
# Find error and fatal entries (assumes Pino's numeric levels, where error is 50)
|
|
245
|
+
cat ./logs/lynkr.log | jq 'select(.level >= 50)'
|
|
246
|
+
|
|
247
|
+
# Filter by provider
|
|
248
|
+
cat ./logs/lynkr.log | jq 'select(.provider == "databricks")'
|
|
249
|
+
|
|
250
|
+
# Search for slow requests (>2s)
|
|
251
|
+
cat ./logs/lynkr.log | jq 'select(.duration > 2000)'
|
|
252
|
+
```
|
|
253
|
+
|
|
219
254
|
**Log aggregation:**
|
|
220
|
-
- Stdout
|
|
221
|
-
-
|
|
222
|
-
-
|
|
255
|
+
- **Stdout** — Captured by Docker/K8s log drivers
|
|
256
|
+
- **File rotation** — For standalone deployments or local debugging
|
|
257
|
+
- **External** — Forward JSON logs to Elasticsearch, Splunk, Grafana Loki, etc.
|
|
223
258
|
|
|
224
259
|
### 7. Health Checks
|
|
225
260
|
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Provider Configuration Guide
|
|
2
2
|
|
|
3
|
-
Complete configuration reference for all
|
|
3
|
+
Complete configuration reference for all 12+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -18,6 +18,7 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
|
|
|
18
18
|
| **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
|
|
19
19
|
| **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
|
|
20
20
|
| **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
|
|
21
|
+
| **Moonshot AI (Kimi)** | Cloud | Kimi K2 (thinking + turbo) | $ | Cloud | Easy |
|
|
21
22
|
| **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
|
|
22
23
|
| **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |
|
|
23
24
|
|
|
@@ -25,7 +26,11 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
|
|
|
25
26
|
|
|
26
27
|
## Configuration Methods
|
|
27
28
|
|
|
28
|
-
|
|
29
|
+
There are two routing modes. Choose based on your needs:
|
|
30
|
+
|
|
31
|
+
### Static Routing (Single Provider)
|
|
32
|
+
|
|
33
|
+
Set `MODEL_PROVIDER` to send every request to a single provider, regardless of complexity:
|
|
29
34
|
|
|
30
35
|
```bash
|
|
31
36
|
export MODEL_PROVIDER=databricks
|
|
@@ -34,6 +39,23 @@ export DATABRICKS_API_KEY=your-key
|
|
|
34
39
|
lynkr start
|
|
35
40
|
```
|
|
36
41
|
|
|
42
|
+
### Tier-Based Routing (Recommended for Cost Optimization)
|
|
43
|
+
|
|
44
|
+
Set **all 4** `TIER_*` vars to route requests by complexity. Each request is scored 0-100 and routed to the `provider:model` matching its complexity tier. When all four are configured, they **override** `MODEL_PROVIDER` for routing decisions:
|
|
45
|
+
|
|
46
|
+
```bash
|
|
47
|
+
export MODEL_PROVIDER=ollama # Still needed for startup checks
|
|
48
|
+
export TIER_SIMPLE=ollama:llama3.2 # Score 0-25 → local (free)
|
|
49
|
+
export TIER_MEDIUM=openrouter:openai/gpt-4o-mini # Score 26-50 → affordable cloud
|
|
50
|
+
export TIER_COMPLEX=databricks:claude-sonnet # Score 51-75 → capable cloud
|
|
51
|
+
export TIER_REASONING=databricks:claude-sonnet # Score 76-100 → best available
|
|
52
|
+
lynkr start
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
> **Important:** All 4 `TIER_*` vars must be set to enable tier routing. If any are missing, tier routing is disabled and `MODEL_PROVIDER` is used for all requests. `MODEL_PROVIDER` should always be set — even with tier routing active, it is used for startup checks, provider discovery, and as the default provider when a `TIER_*` value has no `provider:` prefix.
|
|
56
|
+
>
|
|
57
|
+
> **`PREFER_OLLAMA` is deprecated** and has no effect. Use `TIER_SIMPLE=ollama:<model>` to route simple requests to Ollama. See [Routing Precedence](routing.md#routing-precedence) for full details.
|
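The all-or-nothing rule for the `TIER_*` variables and the `provider:model` syntax can be illustrated with a small sketch. This is not Lynkr's actual implementation: `parseTierValue` and `resolveRoutingMode` are hypothetical names, and the known-provider check is an assumption made here so that a model name containing a colon (such as `llama3.1:8b`) is not mistaken for a provider prefix. The provider names and the `databricks` default come from the `MODEL_PROVIDER` option list in this guide.

```javascript
// Provider names taken from the MODEL_PROVIDER options listed in this guide.
const KNOWN_PROVIDERS = new Set([
  "databricks", "bedrock", "openrouter", "ollama", "llamacpp",
  "azure-openai", "azure-anthropic", "openai", "lmstudio",
  "moonshot", "zai", "vertex",
]);

const TIER_VARS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

// Hypothetical parser: split "provider:model" on the first colon, but only
// when the prefix is a known provider. Otherwise the whole value is treated
// as a model name served by the default provider.
function parseTierValue(value, defaultProvider) {
  const idx = value.indexOf(":");
  if (idx > 0) {
    const prefix = value.slice(0, idx);
    if (KNOWN_PROVIDERS.has(prefix)) {
      return { provider: prefix, model: value.slice(idx + 1) };
    }
  }
  return { provider: defaultProvider, model: value };
}

// Hypothetical resolver: tier routing is active only when all four
// TIER_* variables are set; otherwise every request uses MODEL_PROVIDER.
function resolveRoutingMode(env) {
  const defaultProvider = env.MODEL_PROVIDER || "databricks";
  const allSet = TIER_VARS.every((name) => Boolean(env[name] && env[name].trim()));
  if (!allSet) {
    return { mode: "static", provider: defaultProvider };
  }
  const tiers = {};
  for (const name of TIER_VARS) {
    tiers[name.replace("TIER_", "")] = parseTierValue(env[name].trim(), defaultProvider);
  }
  return { mode: "tier", provider: defaultProvider, tiers };
}

console.log(resolveRoutingMode({
  MODEL_PROVIDER: "ollama",
  TIER_SIMPLE: "ollama:llama3.2",
  TIER_MEDIUM: "openrouter:openai/gpt-4o-mini",
  TIER_COMPLEX: "databricks:claude-sonnet",
  TIER_REASONING: "databricks:claude-sonnet",
}));
```

Leaving out any one of the four variables in the example makes the sketch return `mode: "static"`, mirroring the behavior described in the note above.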
|
58
|
+
|
|
37
59
|
### .env File (Recommended for Production)
|
|
38
60
|
|
|
39
61
|
```bash
|
|
@@ -46,11 +68,17 @@ nano .env
|
|
|
46
68
|
|
|
47
69
|
Example `.env`:
|
|
48
70
|
```env
|
|
49
|
-
MODEL_PROVIDER=
|
|
71
|
+
MODEL_PROVIDER=ollama
|
|
50
72
|
DATABRICKS_API_BASE=https://your-workspace.databricks.com
|
|
51
73
|
DATABRICKS_API_KEY=dapi1234567890abcdef
|
|
52
74
|
PORT=8081
|
|
53
75
|
LOG_LEVEL=info
|
|
76
|
+
|
|
77
|
+
# Tier routing (optional — set all 4 to enable)
|
|
78
|
+
TIER_SIMPLE=ollama:llama3.2
|
|
79
|
+
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
|
|
80
|
+
TIER_COMPLEX=databricks:claude-sonnet
|
|
81
|
+
TIER_REASONING=databricks:claude-sonnet
|
|
54
82
|
```
|
|
55
83
|
|
|
56
84
|
---
|
|
@@ -685,7 +713,82 @@ LMSTUDIO_API_KEY=your-optional-api-key
|
|
|
685
713
|
|
|
686
714
|
---
|
|
687
715
|
|
|
688
|
-
### 10.
|
|
716
|
+
### 10. Moonshot AI / Kimi (OpenAI-Compatible)
|
|
717
|
+
|
|
718
|
+
**Best for:** Affordable cloud models, thinking/reasoning models, OpenAI-compatible API
|
|
719
|
+
|
|
720
|
+
#### Configuration
|
|
721
|
+
|
|
722
|
+
```env
|
|
723
|
+
MODEL_PROVIDER=moonshot
|
|
724
|
+
MOONSHOT_API_KEY=sk-your-moonshot-api-key
|
|
725
|
+
MOONSHOT_ENDPOINT=https://api.moonshot.ai/v1/chat/completions
|
|
726
|
+
MOONSHOT_MODEL=kimi-k2-turbo-preview
|
|
727
|
+
```
|
|
728
|
+
|
|
729
|
+
#### Getting Moonshot API Key
|
|
730
|
+
|
|
731
|
+
1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
|
|
732
|
+
2. Sign up or log in
|
|
733
|
+
3. Navigate to API Keys section
|
|
734
|
+
4. Create a new API key
|
|
735
|
+
5. Add credits to your account
|
|
736
|
+
|
|
737
|
+
#### Available Models
|
|
738
|
+
|
|
739
|
+
```env
|
|
740
|
+
MOONSHOT_MODEL=kimi-k2-turbo-preview # Fast, efficient (recommended)
|
|
741
|
+
MOONSHOT_MODEL=kimi-k2-thinking # Chain-of-thought reasoning model
|
|
742
|
+
```
|
|
743
|
+
|
|
744
|
+
**Model Details:**
|
|
745
|
+
|
|
746
|
+
| Model | Type | Best For |
|
|
747
|
+
|-------|------|----------|
|
|
748
|
+
| `kimi-k2-turbo-preview` | Standard | Fast responses, tool calling, general tasks |
|
|
749
|
+
| `kimi-k2-thinking` | Thinking/Reasoning | Complex analysis, multi-step reasoning |
|
|
750
|
+
|
|
751
|
+
#### How It Works
|
|
752
|
+
|
|
753
|
+
Moonshot uses an **OpenAI-compatible** chat completions API. Lynkr handles all format conversion automatically:
|
|
754
|
+
|
|
755
|
+
1. Claude Code CLI sends Anthropic-format request to Lynkr
|
|
756
|
+
2. Lynkr converts Anthropic messages → OpenAI chat completions format
|
|
757
|
+
3. Request is sent to Moonshot's `/v1/chat/completions` endpoint
|
|
758
|
+
4. Moonshot response is converted back to Anthropic format
|
|
759
|
+
5. Claude Code CLI receives a standard Anthropic response
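The conversion steps above can be sketched roughly as follows. This is a simplified illustration, not Lynkr's converter: the function names are hypothetical, only text content is handled, and tool calls, images, and error handling are left out. The request and response field names follow the public Anthropic Messages and OpenAI chat completions formats.

```javascript
// Flatten Anthropic content (a string or an array of content blocks) into
// plain text. Non-text blocks (tool_use, images) are ignored in this sketch.
function flattenContent(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .filter((block) => block && block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Step 2: Anthropic-format request -> OpenAI chat completions request.
function anthropicToOpenAI(request, model) {
  const messages = [];
  if (request.system) {
    // The system prompt becomes a message with the "system" role.
    messages.push({ role: "system", content: flattenContent(request.system) });
  }
  for (const msg of request.messages ?? []) {
    messages.push({ role: msg.role, content: flattenContent(msg.content) });
  }
  // stream: false matches the non-streaming behavior described for Moonshot.
  return { model, messages, max_tokens: request.max_tokens, stream: false };
}

// Step 4: OpenAI chat completion -> Anthropic-format response (text only;
// tool-call finish reasons are out of scope here).
function openAIToAnthropic(completion) {
  const choice = completion.choices[0];
  return {
    type: "message",
    role: "assistant",
    model: completion.model,
    content: [{ type: "text", text: choice.message.content ?? "" }],
    stop_reason: choice.finish_reason === "length" ? "max_tokens" : "end_turn",
    usage: {
      input_tokens: completion.usage?.prompt_tokens ?? 0,
      output_tokens: completion.usage?.completion_tokens ?? 0,
    },
  };
}

const exampleRequest = {
  system: "You are a helpful assistant.",
  max_tokens: 256,
  messages: [{ role: "user", content: [{ type: "text", text: "Hello" }] }],
};
console.log(JSON.stringify(anthropicToOpenAI(exampleRequest, "kimi-k2-turbo-preview"), null, 2));
```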
|
|
760
|
+
|
|
761
|
+
#### Thinking Model Support
|
|
762
|
+
|
|
763
|
+
When using `kimi-k2-thinking`, the model returns both `reasoning_content` (chain-of-thought) and `content` (final answer). Lynkr automatically extracts only the final answer for clean CLI output. The reasoning content is used as a fallback only when the final answer is empty.
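A minimal sketch of that fallback rule, assuming an OpenAI-style completion object whose message carries `content` and `reasoning_content`. The helper name `extractFinalAnswer` is hypothetical and not taken from Lynkr's source:

```javascript
// Hypothetical helper: choose the text to display from a thinking-model reply.
// Prefers the final answer; falls back to the chain-of-thought only when the
// final answer is missing or empty.
function extractFinalAnswer(completion) {
  const message = completion?.choices?.[0]?.message ?? {};
  const content = typeof message.content === "string" ? message.content.trim() : "";
  if (content) return content;
  return typeof message.reasoning_content === "string"
    ? message.reasoning_content.trim()
    : "";
}

const reply = {
  choices: [{
    message: {
      reasoning_content: "The user greeted me, so I should greet them back.",
      content: "Hello! How can I help?",
    },
  }],
};
console.log(extractFinalAnswer(reply));
```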
|
|
764
|
+
|
|
765
|
+
#### Important Notes
|
|
766
|
+
|
|
767
|
+
- **Streaming:** Streaming is disabled for Moonshot (responses arrive as complete JSON). This ensures clean terminal rendering since OpenAI SSE → Anthropic SSE conversion is not yet implemented.
|
|
768
|
+
- **Rate Limits:** Moonshot has a max concurrency of ~3 requests. Lynkr retries with backoff on 429 errors.
|
|
769
|
+
- **Tool Calling:** Full tool calling support via OpenAI function calling format (automatically converted from Anthropic format).
|
|
770
|
+
- **System Messages:** Moonshot natively supports the `system` role, so system prompts are passed directly.
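The rate-limit note says Lynkr retries with backoff on 429 responses but does not give the schedule. The sketch below shows one common approach, capped exponential backoff. The function names, retry count, and delay values are illustrative assumptions, not Lynkr's actual settings:

```javascript
// Exponential backoff delay in milliseconds, capped at maxMs.
// attempt is zero-based: 0 -> baseMs, 1 -> 2 * baseMs, 2 -> 4 * baseMs, ...
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `send` (an async function resolving to an object with a numeric
// `status`) and retry while it returns 429, up to maxRetries extra attempts.
// `sleep` is injectable so the wait can be replaced in tests.
async function sendWithRetry(send, options = {}) {
  const maxRetries = options.maxRetries ?? 3;
  const sleep = options.sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(backoffDelayMs(attempt));
  }
}
```

With the defaults assumed here, the waits between attempts are 500 ms, 1 s, and 2 s before the last 429 response is returned to the caller.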
|
|
771
|
+
|
|
772
|
+
#### Benefits
|
|
773
|
+
|
|
774
|
+
- ✅ **Affordable** — Competitive pricing for capable models
|
|
775
|
+
- ✅ **Thinking models** — Chain-of-thought reasoning with `kimi-k2-thinking`
|
|
776
|
+
- ✅ **Full tool calling** — Native function calling support
|
|
777
|
+
- ✅ **OpenAI-compatible** — Standard chat completions API
|
|
778
|
+
- ✅ **System role support** — Native system message handling
|
|
779
|
+
|
|
780
|
+
#### Test Connection
|
|
781
|
+
|
|
782
|
+
```bash
|
|
783
|
+
curl -X POST https://api.moonshot.ai/v1/chat/completions \
|
|
784
|
+
-H "Content-Type: application/json" \
|
|
785
|
+
-H "Authorization: Bearer $MOONSHOT_API_KEY" \
|
|
786
|
+
-d '{"model":"kimi-k2-turbo-preview","messages":[{"role":"user","content":"Hello"}]}'
|
|
787
|
+
```
|
|
788
|
+
|
|
789
|
+
---
|
|
790
|
+
|
|
791
|
+
### 11. MLX OpenAI Server (Apple Silicon)
|
|
689
792
|
|
|
690
793
|
**Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)
|
|
691
794
|
|
|
@@ -776,56 +879,53 @@ curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: applica
|
|
|
776
879
|
|
|
777
880
|
---
|
|
778
881
|
|
|
779
|
-
##
|
|
882
|
+
## Tier-Based Routing & Fallback
|
|
780
883
|
|
|
781
|
-
### Intelligent
|
|
884
|
+
### Intelligent 4-Tier Routing
|
|
782
885
|
|
|
783
886
|
Optimize costs by routing requests based on complexity:
|
|
784
887
|
|
|
785
888
|
```env
|
|
786
|
-
#
|
|
787
|
-
|
|
788
|
-
|
|
889
|
+
# Tier-based routing (set all 4 to enable)
|
|
890
|
+
TIER_SIMPLE=ollama:llama3.2
|
|
891
|
+
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
|
|
892
|
+
TIER_COMPLEX=azure-openai:gpt-4o
|
|
893
|
+
TIER_REASONING=azure-openai:gpt-4o
|
|
789
894
|
|
|
790
|
-
|
|
791
|
-
MODEL_PROVIDER=ollama
|
|
792
|
-
OLLAMA_MODEL=llama3.1:8b
|
|
793
|
-
OLLAMA_MAX_TOOLS_FOR_ROUTING=3
|
|
895
|
+
FALLBACK_ENABLED=true
|
|
794
896
|
|
|
795
|
-
#
|
|
897
|
+
# Provider credentials
|
|
898
|
+
OLLAMA_ENDPOINT=http://localhost:11434
|
|
796
899
|
OPENROUTER_API_KEY=your-key
|
|
797
|
-
|
|
798
|
-
|
|
799
|
-
|
|
800
|
-
# Heavy workload (complex requests)
|
|
801
|
-
FALLBACK_PROVIDER=databricks
|
|
802
|
-
DATABRICKS_API_BASE=your-base
|
|
803
|
-
DATABRICKS_API_KEY=your-key
|
|
900
|
+
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/...
|
|
901
|
+
AZURE_OPENAI_API_KEY=your-key
|
|
804
902
|
```
|
|
805
903
|
|
|
806
904
|
### How It Works
|
|
807
905
|
|
|
808
906
|
**Routing Logic:**
|
|
809
|
-
1.
|
|
810
|
-
2.
|
|
811
|
-
3.
|
|
907
|
+
1. Each request is scored for complexity (0-100)
|
|
908
|
+
2. Score maps to a tier: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
|
|
909
|
+
3. The request is routed to the `provider:model` configured for that tier
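The score-to-tier mapping in the steps above can be sketched as a pair of small functions. The score bands come from the list above; the function names and the `exampleTiers` object are hypothetical, and the scoring step itself is not shown:

```javascript
// Hypothetical helper: map a 0-100 complexity score to a tier name using the
// bands SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100).
function scoreToTier(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`complexity score must be in 0-100, got ${score}`);
  }
  if (score <= 25) return "SIMPLE";
  if (score <= 50) return "MEDIUM";
  if (score <= 75) return "COMPLEX";
  return "REASONING";
}

// Hypothetical helper: look up the provider:model target for a score.
// `tiers` mirrors the four TIER_* variables.
function routeForScore(score, tiers) {
  const tier = scoreToTier(score);
  return { tier, target: tiers[tier] };
}

const exampleTiers = {
  SIMPLE: "ollama:llama3.2",
  MEDIUM: "openrouter:openai/gpt-4o-mini",
  COMPLEX: "azure-openai:gpt-4o",
  REASONING: "azure-openai:gpt-4o",
};

console.log(routeForScore(12, exampleTiers));
console.log(routeForScore(80, exampleTiers));
```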
|
|
812
910
|
|
|
813
911
|
**Automatic Fallback:**
|
|
814
|
-
-
|
|
815
|
-
-
|
|
816
|
-
- ✅ Transparent to the user
|
|
912
|
+
- If the selected provider fails, Lynkr falls back to `FALLBACK_PROVIDER`
|
|
913
|
+
- Transparent to the user
|
|
817
914
|
|
|
818
915
|
### Cost Savings
|
|
819
916
|
|
|
820
|
-
- **65-100%** for requests
|
|
917
|
+
- **65-100%** for requests routed to local/cheap models
|
|
821
918
|
- **40-87%** faster for simple requests
|
|
822
|
-
- **Privacy**: Simple queries
|
|
919
|
+
- **Privacy**: Simple queries can stay on your machine when using a local `TIER_SIMPLE` model
|
|
823
920
|
|
|
824
921
|
### Configuration Options
|
|
825
922
|
|
|
826
923
|
| Variable | Description | Default |
|
|
827
924
|
|----------|-------------|---------|
|
|
828
|
-
| `
|
|
925
|
+
| `TIER_SIMPLE` | Model for simple tier (`provider:model`) | *required for tier routing* |
|
|
926
|
+
| `TIER_MEDIUM` | Model for medium tier (`provider:model`) | *required for tier routing* |
|
|
927
|
+
| `TIER_COMPLEX` | Model for complex tier (`provider:model`) | *required for tier routing* |
|
|
928
|
+
| `TIER_REASONING` | Model for reasoning tier (`provider:model`) | *required for tier routing* |
|
|
829
929
|
| `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
|
|
830
930
|
| `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
|
|
831
931
|
| `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
|
|
@@ -841,7 +941,7 @@ DATABRICKS_API_KEY=your-key
|
|
|
841
941
|
|
|
842
942
|
| Variable | Description | Default |
|
|
843
943
|
|----------|-------------|---------|
|
|
844
|
-
| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
|
|
944
|
+
| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `zai`, `moonshot`, `vertex`) | `databricks` |
|
|
845
945
|
| `PORT` | HTTP port for proxy server | `8081` |
|
|
846
946
|
| `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
|
|
847
947
|
| `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
|
|
@@ -858,17 +958,19 @@ See individual provider sections above for complete variable lists.
|
|
|
858
958
|
|
|
859
959
|
### Feature Comparison
|
|
860
960
|
|
|
861
|
-
| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp | LM Studio |
|
|
862
|
-
|
|
863
|
-
| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium | Easy |
|
|
864
|
-
| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | **Free** | **Free** | **Free** |
|
|
865
|
-
| **Latency** | Low | Low | Low | Low | Low | Medium | **Very Low** | **Very Low** | **Very Low** |
|
|
866
|
-
| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 50+ | Unlimited | 50+ |
|
|
867
|
-
| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good | Fair |
|
|
868
|
-
| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent | 32K-128K |
|
|
869
|
-
| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
|
|
870
|
-
| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | **Local** | **Local** | **Local** |
|
|
871
|
-
| **Offline** | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
|
|
961
|
+
| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Moonshot | Ollama | llama.cpp | LM Studio |
|
|
962
|
+
|---------|-----------|---------|--------|--------------|-----------------|------------|----------|--------|-----------|-----------|
|
|
963
|
+
| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Easy | Medium | Easy |
|
|
964
|
+
| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | $ | **Free** | **Free** | **Free** |
|
|
965
|
+
| **Latency** | Low | Low | Low | Low | Low | Medium | Low | **Very Low** | **Very Low** | **Very Low** |
|
|
966
|
+
| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 2+ | 50+ | Unlimited | 50+ |
|
|
967
|
+
| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Good | Fair | Good | Fair |
|
|
968
|
+
| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 128K | 32K-128K | Model-dependent | 32K-128K |
|
|
969
|
+
| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Non-streaming** | Yes | Yes | Yes |
|
|
970
|
+
| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Third-party | **Local** | **Local** | **Local** |
|
|
971
|
+
| **Offline** | No | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
|
|
972
|
+
|
|
973
|
+
_** Moonshot uses non-streaming mode (responses arrive as complete JSON) for clean terminal rendering_
|
|
872
974
|
|
|
873
975
|
_* Tool calling only supported by Claude models on Bedrock_
|
|
874
976
|
|
|
@@ -882,6 +984,7 @@ _* Tool calling only supported by Claude models on Bedrock_
|
|
|
882
984
|
| **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
|
|
883
985
|
| **OpenAI** | GPT-4o | $2.50 | $10.00 |
|
|
884
986
|
| **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
|
|
987
|
+
| **Moonshot** | Kimi K2 Turbo | See moonshot.ai | See moonshot.ai |
|
|
885
988
|
| **Ollama** | Any model | **FREE** | **FREE** |
|
|
886
989
|
| **llama.cpp** | Any model | **FREE** | **FREE** |
|
|
887
990
|
| **LM Studio** | Any model | **FREE** | **FREE** |
|