lynkr 7.2.4 → 8.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/README.md +2 -2
  2. package/config/model-tiers.json +89 -0
  3. package/docs/docs.html +1 -0
  4. package/docs/index.md +7 -0
  5. package/docs/toon-integration-spec.md +130 -0
  6. package/documentation/README.md +3 -2
  7. package/documentation/claude-code-cli.md +23 -16
  8. package/documentation/cursor-integration.md +17 -14
  9. package/documentation/docker.md +11 -4
  10. package/documentation/embeddings.md +7 -5
  11. package/documentation/faq.md +66 -12
  12. package/documentation/features.md +22 -15
  13. package/documentation/installation.md +66 -14
  14. package/documentation/production.md +43 -8
  15. package/documentation/providers.md +145 -42
  16. package/documentation/routing.md +476 -0
  17. package/documentation/token-optimization.md +7 -5
  18. package/documentation/troubleshooting.md +81 -5
  19. package/install.sh +6 -1
  20. package/package.json +5 -3
  21. package/scripts/setup.js +0 -1
  22. package/src/agents/executor.js +14 -6
  23. package/src/api/middleware/session.js +15 -2
  24. package/src/api/openai-router.js +130 -37
  25. package/src/api/providers-handler.js +15 -1
  26. package/src/api/router.js +107 -2
  27. package/src/budget/index.js +4 -3
  28. package/src/clients/databricks.js +431 -234
  29. package/src/clients/gpt-utils.js +181 -0
  30. package/src/clients/ollama-utils.js +66 -140
  31. package/src/clients/routing.js +0 -1
  32. package/src/clients/standard-tools.js +82 -5
  33. package/src/config/index.js +119 -35
  34. package/src/context/toon.js +173 -0
  35. package/src/headroom/launcher.js +8 -3
  36. package/src/logger/index.js +23 -0
  37. package/src/orchestrator/index.js +765 -212
  38. package/src/routing/agentic-detector.js +320 -0
  39. package/src/routing/complexity-analyzer.js +202 -2
  40. package/src/routing/cost-optimizer.js +305 -0
  41. package/src/routing/index.js +168 -159
  42. package/src/routing/model-registry.js +437 -0
  43. package/src/routing/model-tiers.js +365 -0
  44. package/src/server.js +2 -2
  45. package/src/sessions/cleanup.js +3 -3
  46. package/src/sessions/record.js +10 -1
  47. package/src/sessions/store.js +7 -2
  48. package/src/tools/agent-task.js +48 -1
  49. package/src/tools/index.js +15 -2
  50. package/src/tools/workspace.js +35 -4
  51. package/src/workspace/index.js +30 -0
  52. package/te +11622 -0
  53. package/test/README.md +1 -1
  54. package/test/azure-openai-config.test.js +17 -8
  55. package/test/azure-openai-integration.test.js +7 -1
  56. package/test/azure-openai-routing.test.js +41 -43
  57. package/test/bedrock-integration.test.js +18 -32
  58. package/test/hybrid-routing-integration.test.js +35 -20
  59. package/test/hybrid-routing-performance.test.js +74 -64
  60. package/test/llamacpp-integration.test.js +28 -9
  61. package/test/lmstudio-integration.test.js +20 -8
  62. package/test/openai-integration.test.js +17 -20
  63. package/test/performance-tests.js +1 -1
  64. package/test/routing.test.js +65 -59
  65. package/test/toon-compression.test.js +131 -0
  66. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  67. package/ROUTER_COMPARISON.md +0 -173
  68. package/TIER_ROUTING_PLAN.md +0 -771
@@ -26,6 +26,7 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
26
26
  ├──→ Databricks (Claude 4.5)
27
27
  ├──→ AWS Bedrock (100+ models)
28
28
  ├──→ OpenRouter (100+ models)
29
+ ├──→ Moonshot AI (Kimi K2)
29
30
  ├──→ Ollama (local, free)
30
31
  ├──→ llama.cpp (local, free)
31
32
  ├──→ Azure OpenAI (GPT-4o, o1)
@@ -52,17 +53,19 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
52
53
 
53
54
  ### 2. Provider Routing
54
55
 
55
- **Smart Routing Logic:**
56
+ **4-Tier Intelligent Routing:**
56
57
 
57
- ```javascript
58
- if (PREFER_OLLAMA && toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
59
- provider = "ollama"; // Local, fast, free
60
- } else if (toolCount <= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
61
- provider = "openrouter"; // Cloud, moderate complexity
62
- } else {
63
- provider = fallbackProvider; // Databricks/Azure, complex
64
- }
65
- ```
58
+ Lynkr uses a multi-phase complexity analysis to route each request to the optimal model tier:
59
+
60
+ | Tier (score) | Typical requests | Routes To |
61
+ |------|-------|-----------|
62
+ | SIMPLE (0-25) | Greetings, simple Q&A | Cheap/local models (Ollama, llama.cpp) |
63
+ | MEDIUM (26-50) | Code reading, simple edits | Mid-range models (GPT-4o, Claude Sonnet) |
64
+ | COMPLEX (51-75) | Multi-file changes, debugging | Capable models (o1-mini, Claude Sonnet) |
65
+ | REASONING (76-100) | Security audits, architecture | Best models (o1, Claude Opus) |
66
+
67
+ Includes agentic workflow detection, 15-dimension weighted scoring, and cost optimization.
68
+ See **[Routing & Model Tiering](routing.md)** for full details.
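The score ranges in the tier table above can be sketched as a small lookup. This is only an illustration of the boundaries listed in the table: `TIER_BOUNDS` and `tierForScore` are hypothetical names, and the clamping of out-of-range scores is an assumption, not necessarily what `src/routing/model-tiers.js` does.

```javascript
// Hypothetical sketch: map a 0-100 complexity score to a tier name.
// Boundaries are taken from the table above; names are illustrative only.
const TIER_BOUNDS = [
  { tier: "SIMPLE", max: 25 },
  { tier: "MEDIUM", max: 50 },
  { tier: "COMPLEX", max: 75 },
  { tier: "REASONING", max: 100 },
];

function tierForScore(score) {
  if (!Number.isFinite(score)) {
    throw new TypeError("score must be a finite number");
  }
  // Assumption: clamp out-of-range scores so they still land in a tier.
  const clamped = Math.min(100, Math.max(0, score));
  // The first bound whose max covers the score decides the tier.
  return TIER_BOUNDS.find((bound) => clamped <= bound.max).tier;
}

console.log(tierForScore(12)); // SIMPLE
console.log(tierForScore(26)); // MEDIUM
console.log(tierForScore(80)); // REASONING
```

Keeping the boundaries in a table rather than in nested `if` statements makes it easier to compare them against the documented ranges.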
66
69
 
67
70
  **Automatic Fallback:**
68
71
  - If primary provider fails → Use FALLBACK_PROVIDER
@@ -171,6 +174,8 @@ data: {}
171
174
  - `invokeOllama()` - Ollama local
172
175
  - `invokeLlamaCpp()` - llama.cpp
173
176
  - `invokeBedrock()` - AWS Bedrock
177
+ - `invokeMoonshot()` - Moonshot AI (Kimi)
178
+ - `invokeZai()` - Z.AI (Zhipu AI)
174
179
 
175
180
  **Format converters:**
176
181
  - `openrouter-utils.js` - OpenAI format conversion
@@ -271,14 +276,15 @@ data: {}
271
276
 
272
277
  ### 1. Multi-Provider Support
273
278
 
274
- **9+ Providers:**
275
- - Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
279
+ **12+ Providers:**
280
+ - Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI, Moonshot AI, Z.AI, Vertex AI
276
281
  - Local: Ollama, llama.cpp, LM Studio
277
282
 
278
283
  **Hybrid Routing:**
279
- - Automatic provider selection
280
- - Transparent failover
281
- - Cost optimization
284
+ - [4-tier intelligent routing](routing.md) with complexity scoring
285
+ - Automatic provider selection and transparent failover
286
+ - Agentic workflow detection with tier upgrades
287
+ - Cost optimization with multi-source pricing
282
288
 
283
289
  ### 2. Token Optimization
284
290
 
@@ -383,6 +389,7 @@ PROMPT_CACHE_MAX_ENTRIES=256
383
389
 
384
390
  ## Next Steps
385
391
 
392
+ - **[Routing & Model Tiering](routing.md)** - Intelligent routing and scoring algorithm
386
393
  - **[Memory System](memory-system.md)** - Long-term memory details
387
394
  - **[Token Optimization](token-optimization.md)** - Cost reduction strategies
388
395
  - **[Production Guide](production.md)** - Deploy to production
@@ -16,6 +16,7 @@ Before installing Lynkr, ensure you have:
16
16
  - **OpenRouter API key** (get from [openrouter.ai/keys](https://openrouter.ai/keys))
17
17
  - **Azure OpenAI** or **Azure Anthropic** subscription
18
18
  - **OpenAI API key** (get from [platform.openai.com/api-keys](https://platform.openai.com/api-keys))
19
+ - **Moonshot AI API key** (get from [platform.moonshot.ai](https://platform.moonshot.ai))
19
20
  - **Ollama** installed locally (for free local models)
20
21
  - Optional: **Docker** for containerized deployment or MCP sandboxing
21
22
  - Optional: **Claude Code CLI** (latest release) for CLI usage
@@ -236,6 +237,25 @@ MEMORY_RETRIEVAL_LIMIT=5
236
237
 
237
238
  ---
238
239
 
240
+ ## Understanding Provider Selection
241
+
242
+ Lynkr has two modes for selecting which AI provider handles your requests:
243
+
244
+ | Mode | Config | How it works | Best for |
245
+ |------|--------|-------------|----------|
246
+ | **Static** | `MODEL_PROVIDER=ollama` | All requests go to one provider | Simple setups, single provider |
247
+ | **Tier-based** | All 4 `TIER_*` vars set | Requests route by complexity score | Cost optimization, multi-provider |
248
+
249
+ **Static mode** — Set `MODEL_PROVIDER` to your provider. Every request goes there. Simple and predictable.
250
+
251
+ **Tier-based mode** — Set all 4 `TIER_*` env vars (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`). Each request is scored for complexity and routed to the appropriate tier's provider. When all 4 are set, they **override** `MODEL_PROVIDER` for routing decisions.
252
+
253
+ > **Note:** If only some `TIER_*` vars are set (not all 4), tier routing is disabled and `MODEL_PROVIDER` is used instead. `MODEL_PROVIDER` is always required as a fallback default even when tiers are configured.
254
+
255
+ See [Tier-Based Routing](#tier-based-routing-cost-optimization) below for full setup, or pick a single provider from the Quick Start examples to get running immediately.
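The all-or-nothing rule described above (tier routing only when all four `TIER_*` variables are set, otherwise `MODEL_PROVIDER`) can be sketched as follows. `resolveRoutingMode` is a hypothetical helper written for illustration, and treating whitespace-only values as unset is an assumption. It is not Lynkr's actual configuration code.

```javascript
// Hypothetical sketch of the mode-selection rule from the documentation.
const TIER_VARS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

function resolveRoutingMode(env) {
  // Assumption: a variable counts as set only if it is a non-blank string.
  const allTiersSet = TIER_VARS.every(
    (name) => typeof env[name] === "string" && env[name].trim() !== ""
  );
  // Partial tier configuration falls back to static routing.
  return allTiersSet ? "tier" : "static";
}

// Only one tier variable set: static routing via MODEL_PROVIDER.
console.log(resolveRoutingMode({ MODEL_PROVIDER: "ollama", TIER_SIMPLE: "ollama:llama3.2" }));

// All four set: tier-based routing.
console.log(
  resolveRoutingMode({
    MODEL_PROVIDER: "ollama",
    TIER_SIMPLE: "ollama:llama3.2",
    TIER_MEDIUM: "openrouter:openai/gpt-4o-mini",
    TIER_COMPLEX: "databricks:databricks-claude-sonnet-4-5",
    TIER_REASONING: "databricks:databricks-claude-sonnet-4-5",
  })
);
```

In a real process the argument would typically be `process.env`.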
256
+
257
+ ---
258
+
239
259
  ## Quick Start Examples
240
260
 
241
261
  Choose your provider and follow the setup steps:
@@ -501,7 +521,36 @@ lynkr start
501
521
 
502
522
  ---
503
523
 
504
- ### 9. LM Studio (Local with GUI)
524
+ ### 9. Moonshot AI / Kimi (Affordable Cloud)
525
+
526
+ **Best for:** Affordable cloud models, thinking/reasoning models
527
+
528
+ ```bash
529
+ # Install
530
+ npm install -g lynkr
531
+
532
+ # Configure
533
+ export MODEL_PROVIDER=moonshot
534
+ export MOONSHOT_API_KEY=sk-your-moonshot-api-key
535
+ export MOONSHOT_MODEL=kimi-k2-turbo-preview
536
+
537
+ # Start
538
+ lynkr start
539
+ ```
540
+
541
+ **Get Moonshot API key:**
542
+ 1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
543
+ 2. Sign up or log in
544
+ 3. Create a new API key
545
+ 4. Add credits to your account
546
+
547
+ **Available models:**
548
+ - `kimi-k2-turbo-preview` - Fast, efficient, tool calling support
549
+ - `kimi-k2-thinking` - Chain-of-thought reasoning model
550
+
551
+ ---
552
+
553
+ ### 10. LM Studio (Local with GUI)
505
554
 
506
555
  **Best for:** Local models with graphical interface
507
556
 
@@ -525,19 +574,20 @@ lynkr start
525
574
 
526
575
  ---
527
576
 
528
- ## Hybrid Routing (Cost Optimization)
577
+ ## Tier-Based Routing (Cost Optimization)
529
578
 
530
- **Use local Ollama for simple tasks, fallback to cloud for complex ones:**
579
+ **Use local Ollama for simple tasks, cloud for complex ones:**
531
580
 
532
581
  ```bash
533
582
  # Start Ollama
534
583
  ollama serve
535
- ollama pull llama3.1:8b
584
+ ollama pull llama3.2
536
585
 
537
- # Configure hybrid routing
538
- export MODEL_PROVIDER=ollama
539
- export OLLAMA_MODEL=llama3.1:8b
540
- export PREFER_OLLAMA=true
586
+ # Configure tier-based routing (set all 4 to enable)
587
+ export TIER_SIMPLE=ollama:llama3.2
588
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
589
+ export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
590
+ export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
541
591
  export FALLBACK_ENABLED=true
542
592
  export FALLBACK_PROVIDER=databricks
543
593
  export DATABRICKS_API_BASE=https://your-workspace.databricks.com
@@ -548,13 +598,15 @@ lynkr start
548
598
  ```
549
599
 
550
600
  **How it works:**
551
- - **0-2 tools**: Ollama (free, local, fast)
552
- - **3-15 tools**: OpenRouter (if configured) or fallback to Databricks
553
- - **16+ tools**: Databricks/Azure (most capable)
554
- - **Ollama failures**: Automatic transparent fallback to cloud
601
+ - Each request is scored for complexity (0-100) and mapped to a tier
602
+ - **SIMPLE (0-25)**: Ollama (free, local, fast)
603
+ - **MEDIUM (26-50)**: OpenRouter (affordable cloud)
604
+ - **COMPLEX (51-75)**: Databricks (most capable)
605
+ - **REASONING (76-100)**: Databricks (best available)
606
+ - **Provider failures**: Automatic transparent fallback to cloud
555
607
 
556
608
  **Cost savings:**
557
- - **65-100%** for requests that stay on Ollama
609
+ - **65-100%** for requests routed to local models
558
610
  - **40-87%** faster for simple requests
559
611
  - **Privacy**: Simple queries never leave your machine
560
612
 
@@ -614,7 +666,7 @@ See [Provider Configuration Guide](providers.md) for complete environment variab
614
666
 
615
667
  | Variable | Description | Default |
616
668
  |----------|-------------|---------|
617
- | `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
669
+ | `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `moonshot`, `zai`, `vertex`) | `databricks` |
618
670
  | `PORT` | HTTP port for proxy server | `8081` |
619
671
  | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
620
672
  | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
@@ -190,15 +190,35 @@ METRICS_ENABLED=true # default: true
190
190
 
191
191
  ### 6. Structured Logging
192
192
 
193
- JSON logs with request ID correlation.
193
+ JSON logs with request ID correlation via [Pino](https://github.com/pinojs/pino).
194
+
195
+ **Log Level Philosophy:**
196
+ - **`info`** — Meaningful milestones: request received (minimal), request completed (duration + tokens), errors, retries, fallbacks
197
+ - **`debug`** — Operational details: request body previews, tool injection, streaming chunks, intermediate conversions, tool mapping
198
+
199
+ **Console Configuration:**
200
+ ```bash
201
+ LOG_LEVEL=info # options: error, warn, info, debug (default: info)
202
+ REQUEST_LOGGING_ENABLED=true # default: true
203
+ ```
204
+
205
+ In development mode (`NODE_ENV=development`), logs are pretty-printed via `pino-pretty`.
206
+
207
+ **File Logging (optional):**
208
+
209
+ Persistent log files with automatic daily rotation via [pino-roll](https://github.com/pinojs/pino-roll). Enable by setting `LOG_FILE_ENABLED=true`.
194
210
 
195
- **Configuration:**
196
211
  ```bash
197
- LOG_LEVEL=info # options: error, warn, info, debug
198
- REQUEST_LOGGING_ENABLED=true # default: true
212
+ LOG_FILE_ENABLED=true # default: false
213
+ LOG_FILE_PATH=./logs/lynkr.log # default: <cwd>/logs/lynkr.log
214
+ LOG_FILE_LEVEL=debug # default: debug (captures all levels)
215
+ LOG_FILE_FREQUENCY=daily # options: daily, hourly, custom (default: daily)
216
+ LOG_FILE_MAX_FILES=14 # rotated files to keep (default: 14)
199
217
  ```
200
218
 
201
- **Log format:**
219
+ Rotated files are named with timestamps (e.g., `lynkr.log.2025-07-12`). The log directory is created automatically.
220
+
221
+ **Log format (JSON):**
202
222
  ```json
203
223
  {
204
224
  "level": "info",
@@ -216,10 +236,25 @@ REQUEST_LOGGING_ENABLED=true # default: true
216
236
  }
217
237
  ```
218
238
 
239
+ **Querying log files:**
240
+ ```bash
241
+ # Tail live logs
242
+ tail -f ./logs/lynkr.log | npx pino-pretty
243
+
244
+ # Find errors (Pino level 50 = error, 60 = fatal)
245
+ cat ./logs/lynkr.log | jq 'select(.level >= 50)'
246
+
247
+ # Filter by provider
248
+ cat ./logs/lynkr.log | jq 'select(.provider == "databricks")'
249
+
250
+ # Search for slow requests (>2s)
251
+ cat ./logs/lynkr.log | jq 'select(.duration > 2000)'
252
+ ```
253
+
219
254
  **Log aggregation:**
220
- - Stdout (captured by Docker/K8s)
221
- - Parse with structured log tools
222
- - Send to Elasticsearch, Splunk, etc.
255
+ - **Stdout**: Captured by Docker/K8s log drivers
256
+ - **File rotation**: For standalone deployments or local debugging
257
+ - **External**: Forward JSON logs to Elasticsearch, Splunk, Grafana Loki, etc.
223
258
 
224
259
  ### 7. Health Checks
225
260
 
@@ -1,6 +1,6 @@
1
1
  # Provider Configuration Guide
2
2
 
3
- Complete configuration reference for all 9+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
3
+ Complete configuration reference for all 12+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
4
4
 
5
5
  ---
6
6
 
@@ -18,6 +18,7 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
18
18
  | **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
19
19
  | **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
20
20
  | **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
21
+ | **Moonshot AI (Kimi)** | Cloud | Kimi K2 (thinking + turbo) | $ | Cloud | Easy |
21
22
  | **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
22
23
  | **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |
23
24
 
@@ -25,7 +26,11 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
25
26
 
26
27
  ## Configuration Methods
27
28
 
28
- ### Environment Variables (Quick Start)
29
+ There are two routing modes. Choose based on your needs:
30
+
31
+ ### Static Routing (Single Provider)
32
+
33
+ Set `MODEL_PROVIDER` to send all requests to one provider. All requests go to this provider regardless of complexity:
29
34
 
30
35
  ```bash
31
36
  export MODEL_PROVIDER=databricks
@@ -34,6 +39,23 @@ export DATABRICKS_API_KEY=your-key
34
39
  lynkr start
35
40
  ```
36
41
 
42
+ ### Tier-Based Routing (Recommended for Cost Optimization)
43
+
44
+ Set **all 4** `TIER_*` vars to route requests by complexity. Each request is scored 0-100 and routed to the `provider:model` matching its complexity tier. When all four are configured, they **override** `MODEL_PROVIDER` for routing decisions:
45
+
46
+ ```bash
47
+ export MODEL_PROVIDER=ollama # Still needed for startup checks
48
+ export TIER_SIMPLE=ollama:llama3.2 # Score 0-25 → local (free)
49
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini # Score 26-50 → affordable cloud
50
+ export TIER_COMPLEX=databricks:claude-sonnet # Score 51-75 → capable cloud
51
+ export TIER_REASONING=databricks:claude-sonnet # Score 76-100 → best available
52
+ lynkr start
53
+ ```
54
+
55
+ > **Important:** All 4 `TIER_*` vars must be set to enable tier routing. If any are missing, tier routing is disabled and `MODEL_PROVIDER` is used for all requests. `MODEL_PROVIDER` should always be set — even with tier routing active, it is used for startup checks, provider discovery, and as the default provider when a `TIER_*` value has no `provider:` prefix.
56
+ >
57
+ > **`PREFER_OLLAMA` is deprecated** and has no effect. Use `TIER_SIMPLE=ollama:<model>` to route simple requests to Ollama. See [Routing Precedence](routing.md#routing-precedence) for full details.
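The `provider:model` format, and the fallback to `MODEL_PROVIDER` when a `TIER_*` value has no `provider:` prefix, might be parsed along these lines. `parseTierValue` is a hypothetical name, and splitting on the first colon is an assumption made for this sketch. The real parser may treat model names that contain colons differently.

```javascript
// Hypothetical sketch: split a TIER_* value into provider and model.
function parseTierValue(value, defaultProvider) {
  const separator = value.indexOf(":");
  if (separator === -1) {
    // No "provider:" prefix, so the default provider (MODEL_PROVIDER) applies.
    return { provider: defaultProvider, model: value };
  }
  // Assumption: everything before the first colon is the provider.
  // A bare model name that itself contains a colon would be misread here.
  return {
    provider: value.slice(0, separator),
    model: value.slice(separator + 1),
  };
}

console.log(parseTierValue("openrouter:openai/gpt-4o-mini", "ollama"));
// { provider: 'openrouter', model: 'openai/gpt-4o-mini' }
console.log(parseTierValue("llama3.2", "ollama"));
// { provider: 'ollama', model: 'llama3.2' }
```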
58
+
37
59
  ### .env File (Recommended for Production)
38
60
 
39
61
  ```bash
@@ -46,11 +68,17 @@ nano .env
46
68
 
47
69
  Example `.env`:
48
70
  ```env
49
- MODEL_PROVIDER=databricks
71
+ MODEL_PROVIDER=ollama
50
72
  DATABRICKS_API_BASE=https://your-workspace.databricks.com
51
73
  DATABRICKS_API_KEY=dapi1234567890abcdef
52
74
  PORT=8081
53
75
  LOG_LEVEL=info
76
+
77
+ # Tier routing (optional — set all 4 to enable)
78
+ TIER_SIMPLE=ollama:llama3.2
79
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
80
+ TIER_COMPLEX=databricks:claude-sonnet
81
+ TIER_REASONING=databricks:claude-sonnet
54
82
  ```
55
83
 
56
84
  ---
@@ -685,7 +713,82 @@ LMSTUDIO_API_KEY=your-optional-api-key
685
713
 
686
714
  ---
687
715
 
688
- ### 10. MLX OpenAI Server (Apple Silicon)
716
+ ### 10. Moonshot AI / Kimi (OpenAI-Compatible)
717
+
718
+ **Best for:** Affordable cloud models, thinking/reasoning models, OpenAI-compatible API
719
+
720
+ #### Configuration
721
+
722
+ ```env
723
+ MODEL_PROVIDER=moonshot
724
+ MOONSHOT_API_KEY=sk-your-moonshot-api-key
725
+ MOONSHOT_ENDPOINT=https://api.moonshot.ai/v1/chat/completions
726
+ MOONSHOT_MODEL=kimi-k2-turbo-preview
727
+ ```
728
+
729
+ #### Getting Moonshot API Key
730
+
731
+ 1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
732
+ 2. Sign up or log in
733
+ 3. Navigate to API Keys section
734
+ 4. Create a new API key
735
+ 5. Add credits to your account
736
+
737
+ #### Available Models
738
+
739
+ ```env
740
+ MOONSHOT_MODEL=kimi-k2-turbo-preview # Fast, efficient (recommended)
741
+ MOONSHOT_MODEL=kimi-k2-thinking # Chain-of-thought reasoning model
742
+ ```
743
+
744
+ **Model Details:**
745
+
746
+ | Model | Type | Best For |
747
+ |-------|------|----------|
748
+ | `kimi-k2-turbo-preview` | Standard | Fast responses, tool calling, general tasks |
749
+ | `kimi-k2-thinking` | Thinking/Reasoning | Complex analysis, multi-step reasoning |
750
+
751
+ #### How It Works
752
+
753
+ Moonshot uses an **OpenAI-compatible** chat completions API. Lynkr handles all format conversion automatically:
754
+
755
+ 1. Claude Code CLI sends Anthropic-format request to Lynkr
756
+ 2. Lynkr converts Anthropic messages → OpenAI chat completions format
757
+ 3. Request is sent to Moonshot's `/v1/chat/completions` endpoint
758
+ 4. Moonshot response is converted back to Anthropic format
759
+ 5. Claude Code CLI receives a standard Anthropic response
760
+
761
+ #### Thinking Model Support
762
+
763
+ When using `kimi-k2-thinking`, the model returns both `reasoning_content` (chain-of-thought) and `content` (final answer). Lynkr automatically extracts only the final answer for clean CLI output. The reasoning content is used as a fallback only when the final answer is empty.
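The extraction rule for thinking models can be sketched as below. The field names `content` and `reasoning_content` come from the description above. `extractFinalAnswer` and the trimming behaviour are assumptions for illustration and may differ from Lynkr's actual response handling.

```javascript
// Hypothetical sketch: prefer the final answer, fall back to the reasoning text.
function extractFinalAnswer(message) {
  const content =
    typeof message.content === "string" ? message.content.trim() : "";
  if (content !== "") {
    return content; // Normal case: only the final answer is surfaced.
  }
  // Fallback: the final answer is empty, so use the chain-of-thought text.
  return typeof message.reasoning_content === "string"
    ? message.reasoning_content.trim()
    : "";
}

console.log(extractFinalAnswer({ reasoning_content: "Let me think.", content: "42" })); // 42
console.log(extractFinalAnswer({ reasoning_content: "Let me think.", content: "" })); // Let me think.
```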
764
+
765
+ #### Important Notes
766
+
767
+ - **Streaming:** Streaming is disabled for Moonshot (responses arrive as complete JSON). This ensures clean terminal rendering since OpenAI SSE → Anthropic SSE conversion is not yet implemented.
768
+ - **Rate Limits:** Moonshot has a max concurrency of ~3 requests. Lynkr retries with backoff on 429 errors.
769
+ - **Tool Calling:** Full tool calling support via OpenAI function calling format (automatically converted from Anthropic format).
770
+ - **System Messages:** Moonshot natively supports the `system` role, so system prompts are passed directly.
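The rate-limit note above mentions retrying with backoff on 429 errors. A minimal sketch of that pattern follows, assuming exponential backoff with a cap. The function names, the 500 ms base, the 8 s cap, and the retry count are all hypothetical. The diff does not state the values Lynkr uses.

```javascript
// Hypothetical sketch: exponential backoff delay for a given retry attempt.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  // attempt 0 -> 500 ms, 1 -> 1000 ms, 2 -> 2000 ms, capped at capMs.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry a request while the provider answers 429 (Too Many Requests).
// `send` is an assumed callback returning a promise of { status, ... }.
async function withRetryOn429(send, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // Success, a non-429 error, or retries exhausted.
    }
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
  }
}
```

With a concurrency limit of about 3 requests, capping the delay keeps a burst of rejected requests from waiting indefinitely while still spacing out the retries.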
771
+
772
+ #### Benefits
773
+
774
+ - ✅ **Affordable** — Competitive pricing for capable models
775
+ - ✅ **Thinking models** — Chain-of-thought reasoning with `kimi-k2-thinking`
776
+ - ✅ **Full tool calling** — Native function calling support
777
+ - ✅ **OpenAI-compatible** — Standard chat completions API
778
+ - ✅ **System role support** — Native system message handling
779
+
780
+ #### Test Connection
781
+
782
+ ```bash
783
+ curl -X POST https://api.moonshot.ai/v1/chat/completions \
784
+ -H "Content-Type: application/json" \
785
+ -H "Authorization: Bearer $MOONSHOT_API_KEY" \
786
+ -d '{"model":"kimi-k2-turbo-preview","messages":[{"role":"user","content":"Hello"}]}'
787
+ ```
788
+
789
+ ---
790
+
791
+ ### 11. MLX OpenAI Server (Apple Silicon)
689
792
 
690
793
  **Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)
691
794
 
@@ -776,56 +879,53 @@ curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: applica
776
879
 
777
880
  ---
778
881
 
779
- ## Hybrid Routing & Fallback
882
+ ## Tier-Based Routing & Fallback
780
883
 
781
- ### Intelligent 3-Tier Routing
884
+ ### Intelligent 4-Tier Routing
782
885
 
783
886
  Optimize costs by routing requests based on complexity:
784
887
 
785
888
  ```env
786
- # Enable hybrid routing
787
- PREFER_OLLAMA=true
788
- FALLBACK_ENABLED=true
889
+ # Tier-based routing (set all 4 to enable)
890
+ TIER_SIMPLE=ollama:llama3.2
891
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
892
+ TIER_COMPLEX=azure-openai:gpt-4o
893
+ TIER_REASONING=azure-openai:gpt-4o
789
894
 
790
- # Configure providers for each tier
791
- MODEL_PROVIDER=ollama
792
- OLLAMA_MODEL=llama3.1:8b
793
- OLLAMA_MAX_TOOLS_FOR_ROUTING=3
895
+ FALLBACK_ENABLED=true
794
896
 
795
- # Mid-tier (moderate complexity)
897
+ # Provider credentials
898
+ OLLAMA_ENDPOINT=http://localhost:11434
796
899
  OPENROUTER_API_KEY=your-key
797
- OPENROUTER_MODEL=openai/gpt-4o-mini
798
- OPENROUTER_MAX_TOOLS_FOR_ROUTING=15
799
-
800
- # Heavy workload (complex requests)
801
- FALLBACK_PROVIDER=databricks
802
- DATABRICKS_API_BASE=your-base
803
- DATABRICKS_API_KEY=your-key
900
+ AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/...
901
+ AZURE_OPENAI_API_KEY=your-key
804
902
  ```
805
903
 
806
904
  ### How It Works
807
905
 
808
906
  **Routing Logic:**
809
- 1. **0-2 tools**: Try Ollama first (free, local, fast)
810
- 2. **3-15 tools**: Route to OpenRouter (affordable cloud)
811
- 3. **16+ tools**: Route directly to Databricks/Azure (most capable)
907
+ 1. Each request is scored for complexity (0-100)
908
+ 2. Score maps to a tier: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
909
+ 3. The request is routed to the provider:model configured for that tier
812
910
 
813
911
  **Automatic Fallback:**
814
- - If Ollama fails → Fallback to OpenRouter or Databricks
815
- - If OpenRouter fails → Fallback to Databricks
816
- - ✅ Transparent to the user
912
+ - If the selected provider fails, Lynkr falls back to `FALLBACK_PROVIDER`
913
+ - Transparent to the user
817
914
 
818
915
  ### Cost Savings
819
916
 
820
- - **65-100%** for requests that stay on Ollama
917
+ - **65-100%** for requests routed to local/cheap models
821
918
  - **40-87%** faster for simple requests
822
- - **Privacy**: Simple queries never leave your machine
919
+ - **Privacy**: Simple queries can stay on your machine when using a local TIER_SIMPLE model
823
920
 
824
921
  ### Configuration Options
825
922
 
826
923
  | Variable | Description | Default |
827
924
  |----------|-------------|---------|
828
- | `PREFER_OLLAMA` | Enable Ollama preference for simple requests | `false` |
925
+ | `TIER_SIMPLE` | Model for simple tier (`provider:model`) | *required for tier routing* |
926
+ | `TIER_MEDIUM` | Model for medium tier (`provider:model`) | *required for tier routing* |
927
+ | `TIER_COMPLEX` | Model for complex tier (`provider:model`) | *required for tier routing* |
928
+ | `TIER_REASONING` | Model for reasoning tier (`provider:model`) | *required for tier routing* |
829
929
  | `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
830
930
  | `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
831
931
  | `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
@@ -841,7 +941,7 @@ DATABRICKS_API_KEY=your-key
841
941
 
842
942
  | Variable | Description | Default |
843
943
  |----------|-------------|---------|
844
- | `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
944
+ | `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `zai`, `moonshot`, `vertex`) | `databricks` |
845
945
  | `PORT` | HTTP port for proxy server | `8081` |
846
946
  | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
847
947
  | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
@@ -858,17 +958,19 @@ See individual provider sections above for complete variable lists.
858
958
 
859
959
  ### Feature Comparison
860
960
 
861
- | Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp | LM Studio |
862
- |---------|-----------|---------|--------|--------------|-----------------|------------|--------|-----------|-----------|
863
- | **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium | Easy |
864
- | **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | **Free** | **Free** | **Free** |
865
- | **Latency** | Low | Low | Low | Low | Low | Medium | **Very Low** | **Very Low** | **Very Low** |
866
- | **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 50+ | Unlimited | 50+ |
867
- | **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good | Fair |
868
- | **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent | 32K-128K |
869
- | **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
870
- | **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | **Local** | **Local** | **Local** |
871
- | **Offline** | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
961
+ | Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Moonshot | Ollama | llama.cpp | LM Studio |
962
+ |---------|-----------|---------|--------|--------------|-----------------|------------|----------|--------|-----------|-----------|
963
+ | **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Easy | Medium | Easy |
964
+ | **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | $ | **Free** | **Free** | **Free** |
965
+ | **Latency** | Low | Low | Low | Low | Low | Medium | Low | **Very Low** | **Very Low** | **Very Low** |
966
+ | **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 2+ | 50+ | Unlimited | 50+ |
967
+ | **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Good | Fair | Good | Fair |
968
+ | **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 128K | 32K-128K | Model-dependent | 32K-128K |
969
+ | **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Non-streaming** | Yes | Yes | Yes |
970
+ | **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Third-party | **Local** | **Local** | **Local** |
971
+ | **Offline** | No | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
972
+
973
+ _** Moonshot uses non-streaming mode (responses arrive as complete JSON) for clean terminal rendering_
872
974
 
873
975
  _* Tool calling only supported by Claude models on Bedrock_
874
976
 
@@ -882,6 +984,7 @@ _* Tool calling only supported by Claude models on Bedrock_
882
984
  | **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
883
985
  | **OpenAI** | GPT-4o | $2.50 | $10.00 |
884
986
  | **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
987
+ | **Moonshot** | Kimi K2 Turbo | See moonshot.ai | See moonshot.ai |
885
988
  | **Ollama** | Any model | **FREE** | **FREE** |
886
989
  | **llama.cpp** | Any model | **FREE** | **FREE** |
887
990
  | **LM Studio** | Any model | **FREE** | **FREE** |