npm - lynkr - Versions diffs - 7.2.4 → 8.0.0 - Mend

lynkr 7.2.4 → 8.0.0

Files changed (68)

package/README.md +2 -2
package/config/model-tiers.json +89 -0
package/docs/docs.html +1 -0
package/docs/index.md +7 -0
package/docs/toon-integration-spec.md +130 -0
package/documentation/README.md +3 -2
package/documentation/claude-code-cli.md +23 -16
package/documentation/cursor-integration.md +17 -14
package/documentation/docker.md +11 -4
package/documentation/embeddings.md +7 -5
package/documentation/faq.md +66 -12
package/documentation/features.md +22 -15
package/documentation/installation.md +66 -14
package/documentation/production.md +43 -8
package/documentation/providers.md +145 -42
package/documentation/routing.md +476 -0
package/documentation/token-optimization.md +7 -5
package/documentation/troubleshooting.md +81 -5
package/install.sh +6 -1
package/package.json +5 -3
package/scripts/setup.js +0 -1
package/src/agents/executor.js +14 -6
package/src/api/middleware/session.js +15 -2
package/src/api/openai-router.js +130 -37
package/src/api/providers-handler.js +15 -1
package/src/api/router.js +107 -2
package/src/budget/index.js +4 -3
package/src/clients/databricks.js +431 -234
package/src/clients/gpt-utils.js +181 -0
package/src/clients/ollama-utils.js +66 -140
package/src/clients/routing.js +0 -1
package/src/clients/standard-tools.js +82 -5
package/src/config/index.js +119 -35
package/src/context/toon.js +173 -0
package/src/headroom/launcher.js +8 -3
package/src/logger/index.js +23 -0
package/src/orchestrator/index.js +765 -212
package/src/routing/agentic-detector.js +320 -0
package/src/routing/complexity-analyzer.js +202 -2
package/src/routing/cost-optimizer.js +305 -0
package/src/routing/index.js +168 -159
package/src/routing/model-registry.js +437 -0
package/src/routing/model-tiers.js +365 -0
package/src/server.js +2 -2
package/src/sessions/cleanup.js +3 -3
package/src/sessions/record.js +10 -1
package/src/sessions/store.js +7 -2
package/src/tools/agent-task.js +48 -1
package/src/tools/index.js +15 -2
package/src/tools/workspace.js +35 -4
package/src/workspace/index.js +30 -0
package/te +11622 -0
package/test/README.md +1 -1
package/test/azure-openai-config.test.js +17 -8
package/test/azure-openai-integration.test.js +7 -1
package/test/azure-openai-routing.test.js +41 -43
package/test/bedrock-integration.test.js +18 -32
package/test/hybrid-routing-integration.test.js +35 -20
package/test/hybrid-routing-performance.test.js +74 -64
package/test/llamacpp-integration.test.js +28 -9
package/test/lmstudio-integration.test.js +20 -8
package/test/openai-integration.test.js +17 -20
package/test/performance-tests.js +1 -1
package/test/routing.test.js +65 -59
package/test/toon-compression.test.js +131 -0
package/CLAWROUTER_ROUTING_PLAN.md +0 -910
package/ROUTER_COMPARISON.md +0 -173
package/TIER_ROUTING_PLAN.md +0 -771

package/documentation/features.md CHANGED

@@ -26,6 +26,7 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
          ├──→ Databricks (Claude 4.5)
          ├──→ AWS Bedrock (100+ models)
          ├──→ OpenRouter (100+ models)
+         ├──→ Moonshot AI (Kimi K2)
          ├──→ Ollama (local, free)
          ├──→ llama.cpp (local, free)
          ├──→ Azure OpenAI (GPT-4o, o1)
@@ -52,17 +53,19 @@ Complete guide to Lynkr's architecture, request flow, and core capabilities.
 ### 2. Provider Routing
-**Smart Routing Logic:**
+**4-Tier Intelligent Routing:**
-```javascript
-if (PREFER_OLLAMA && toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
-  provider = "ollama";  // Local, fast, free
-} else if (toolCount <= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
-  provider = "openrouter";  // Cloud, moderate complexity
-} else {
-  provider = fallbackProvider;  // Databricks/Azure, complex
-}
-```
+Lynkr uses a multi-phase complexity analysis to route each request to the optimal model tier:
+| Tier (Score) | Typical Requests | Routes To |
+|------|-------|-----------|
+| SIMPLE (0-25) | Greetings, simple Q&A | Cheap/local models (Ollama, llama.cpp) |
+| MEDIUM (26-50) | Code reading, simple edits | Mid-range models (GPT-4o, Claude Sonnet) |
+| COMPLEX (51-75) | Multi-file changes, debugging | Capable models (o1-mini, Claude Sonnet) |
+| REASONING (76-100) | Security audits, architecture | Best models (o1, Claude Opus) |
+Includes agentic workflow detection, 15-dimension weighted scoring, and cost optimization.
+See **[Routing & Model Tiering](routing.md)** for full details.
 **Automatic Fallback:**
 - If primary provider fails → Use FALLBACK_PROVIDER
@@ -171,6 +174,8 @@ data: {}
 - `invokeOllama()` - Ollama local
 - `invokeLlamaCpp()` - llama.cpp
 - `invokeBedrock()` - AWS Bedrock
+- `invokeMoonshot()` - Moonshot AI (Kimi)
+- `invokeZai()` - Z.AI (Zhipu AI)
 **Format converters:**
 - `openrouter-utils.js` - OpenAI format conversion
@@ -271,14 +276,15 @@ data: {}
 ### 1. Multi-Provider Support
-**9+ Providers:**
-- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
+**12+ Providers:**
+- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI, Moonshot AI, Z.AI, Vertex AI
 - Local: Ollama, llama.cpp, LM Studio
 **Hybrid Routing:**
-- Automatic provider selection
-- Transparent failover
-- Cost optimization
+- [4-tier intelligent routing](routing.md) with complexity scoring
+- Automatic provider selection and transparent failover
+- Agentic workflow detection with tier upgrades
+- Cost optimization with multi-source pricing
 ### 2. Token Optimization
@@ -383,6 +389,7 @@ PROMPT_CACHE_MAX_ENTRIES=256
 ## Next Steps
+- **[Routing & Model Tiering](routing.md)** - Intelligent routing and scoring algorithm
 - **[Memory System](memory-system.md)** - Long-term memory details
 - **[Token Optimization](token-optimization.md)** - Cost reduction strategies
 - **[Production Guide](production.md)** - Deploy to production
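The tier table added to features.md maps a 0-100 complexity score onto four bands. A minimal sketch of that mapping follows, assuming inclusive upper bounds as the table suggests. `TIER_BOUNDS` and `tierForScore` are hypothetical names used for illustration and are not taken from Lynkr's source; how the score itself is computed is not shown in this diff.

```javascript
// Hypothetical helper: maps a 0-100 complexity score to the tier names
// listed in the features.md table. Not taken from Lynkr's source.
const TIER_BOUNDS = [
  { tier: "SIMPLE", max: 25 },
  { tier: "MEDIUM", max: 50 },
  { tier: "COMPLEX", max: 75 },
  { tier: "REASONING", max: 100 },
];

function tierForScore(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  // The first band whose upper bound covers the score wins.
  return TIER_BOUNDS.find((band) => score <= band.max).tier;
}

console.log(tierForScore(12)); // SIMPLE
console.log(tierForScore(51)); // COMPLEX
```

Fractional scores fall into the next band up in this sketch (25.5 maps to MEDIUM); the documentation only lists integer ranges, so that behavior is an assumption.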

package/documentation/installation.md CHANGED

@@ -16,6 +16,7 @@ Before installing Lynkr, ensure you have:
   - **OpenRouter API key** (get from [openrouter.ai/keys](https://openrouter.ai/keys))
   - **Azure OpenAI** or **Azure Anthropic** subscription
   - **OpenAI API key** (get from [platform.openai.com/api-keys](https://platform.openai.com/api-keys))
+  - **Moonshot AI API key** (get from [platform.moonshot.ai](https://platform.moonshot.ai))
   - **Ollama** installed locally (for free local models)
 - Optional: **Docker** for containerized deployment or MCP sandboxing
 - Optional: **Claude Code CLI** (latest release) for CLI usage
@@ -236,6 +237,25 @@ MEMORY_RETRIEVAL_LIMIT=5
 ---
+## Understanding Provider Selection
+Lynkr has two modes for selecting which AI provider handles your requests:
+| Mode | Config | How it works | Best for |
+|------|--------|-------------|----------|
+| **Static** | `MODEL_PROVIDER=ollama` | All requests go to one provider | Simple setups, single provider |
+| **Tier-based** | All 4 `TIER_*` vars set | Requests route by complexity score | Cost optimization, multi-provider |
+**Static mode** — Set `MODEL_PROVIDER` to your provider. Every request goes there. Simple and predictable.
+**Tier-based mode** — Set all 4 `TIER_*` env vars (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`). Each request is scored for complexity and routed to the appropriate tier's provider. When all 4 are set, they **override** `MODEL_PROVIDER` for routing decisions.
+> **Note:** If only some `TIER_*` vars are set (not all 4), tier routing is disabled and `MODEL_PROVIDER` is used instead. `MODEL_PROVIDER` is always required as a fallback default even when tiers are configured.
+See [Tier-Based Routing](#tier-based-routing-cost-optimization) below for full setup, or pick a single provider from the Quick Start examples to get running immediately.
+---
 ## Quick Start Examples
 Choose your provider and follow the setup steps:
@@ -501,7 +521,36 @@ lynkr start
 ---
-### 9. LM Studio (Local with GUI)
+### 9. Moonshot AI / Kimi (Affordable Cloud)
+**Best for:** Affordable cloud models, thinking/reasoning models
+```bash
+# Install
+npm install -g lynkr
+# Configure
+export MODEL_PROVIDER=moonshot
+export MOONSHOT_API_KEY=sk-your-moonshot-api-key
+export MOONSHOT_MODEL=kimi-k2-turbo-preview
+# Start
+lynkr start
+```
+**Get Moonshot API key:**
+1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
+2. Sign up or log in
+3. Create a new API key
+4. Add credits to your account
+**Available models:**
+- `kimi-k2-turbo-preview` - Fast, efficient, tool calling support
+- `kimi-k2-thinking` - Chain-of-thought reasoning model
+---
+### 10. LM Studio (Local with GUI)
 **Best for:** Local models with graphical interface
@@ -525,19 +574,20 @@ lynkr start
 ---
-## Hybrid Routing (Cost Optimization)
+## Tier-Based Routing (Cost Optimization)
-**Use local Ollama for simple tasks, fallback to cloud for complex ones:**
+**Use local Ollama for simple tasks, cloud for complex ones:**
 ```bash
 # Start Ollama
 ollama serve
-ollama pull llama3.1:8b
+ollama pull llama3.2
-# Configure hybrid routing
-export MODEL_PROVIDER=ollama
-export OLLAMA_MODEL=llama3.1:8b
-export PREFER_OLLAMA=true
+# Configure tier-based routing (set all 4 to enable)
+export TIER_SIMPLE=ollama:llama3.2
+export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
+export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
 export FALLBACK_ENABLED=true
 export FALLBACK_PROVIDER=databricks
 export DATABRICKS_API_BASE=https://your-workspace.databricks.com
@@ -548,13 +598,15 @@ lynkr start
 ```
 **How it works:**
-- **0-2 tools**: Ollama (free, local, fast)
-- **3-15 tools**: OpenRouter (if configured) or fallback to Databricks
-- **16+ tools**: Databricks/Azure (most capable)
-- **Ollama failures**: Automatic transparent fallback to cloud
+- Each request is scored for complexity (0-100) and mapped to a tier
+- **SIMPLE (0-25)**: Ollama (free, local, fast)
+- **MEDIUM (26-50)**: OpenRouter (affordable cloud)
+- **COMPLEX (51-75)**: Databricks (most capable)
+- **REASONING (76-100)**: Databricks (best available)
+- **Provider failures**: Automatic transparent fallback to cloud
 **Cost savings:**
-- **65-100%** for requests that stay on Ollama
+- **65-100%** for requests routed to local models
 - **40-87%** faster for simple requests
 - **Privacy**: Simple queries never leave your machine
@@ -614,7 +666,7 @@ See [Provider Configuration Guide](providers.md) for complete environment variab
 | Variable | Description | Default |
 |----------|-------------|---------|
-| `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
+| `MODEL_PROVIDER` | Provider to use (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `moonshot`, `zai`, `vertex`) | `databricks` |
 | `PORT` | HTTP port for proxy server | `8081` |
 | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
 | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
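The "Understanding Provider Selection" hunk above states that tier routing is active only when all four `TIER_*` variables are set, and that `MODEL_PROVIDER` is used otherwise. A rough sketch of that rule is shown below. `resolveRoutingMode` is a hypothetical helper, and treating an empty string as "not set" is an assumption; Lynkr's actual config loader may behave differently.

```javascript
// Hypothetical helper illustrating the documented rule; Lynkr's real
// config loader may differ.
const TIER_VARS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

function resolveRoutingMode(env) {
  // Assumption: an empty or whitespace-only value counts as unset.
  const isSet = (name) => typeof env[name] === "string" && env[name].trim() !== "";
  const missing = TIER_VARS.filter((name) => !isSet(name));
  return {
    // Tier routing is active only when all four variables are set.
    mode: missing.length === 0 ? "tier" : "static",
    // MODEL_PROVIDER stays the default; "databricks" is its documented default.
    defaultProvider: isSet("MODEL_PROVIDER") ? env.MODEL_PROVIDER.trim() : "databricks",
    missing,
  };
}

console.log(resolveRoutingMode(process.env));
```

Returning the list of missing variables makes it easy to log why tier routing was not enabled when only some of the four are configured.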

package/documentation/production.md CHANGED

@@ -190,15 +190,35 @@ METRICS_ENABLED=true  # default: true
 ### 6. Structured Logging
-JSON logs with request ID correlation.
+JSON logs with request ID correlation via [Pino](https://github.com/pinojs/pino).
+**Log Level Philosophy:**
+- **`info`** — Meaningful milestones: request received (minimal), request completed (duration + tokens), errors, retries, fallbacks
+- **`debug`** — Operational details: request body previews, tool injection, streaming chunks, intermediate conversions, tool mapping
+**Console Configuration:**
+```bash
+LOG_LEVEL=info                  # options: error, warn, info, debug (default: info)
+REQUEST_LOGGING_ENABLED=true    # default: true
+```
+In development mode (`NODE_ENV=development`), logs are pretty-printed via `pino-pretty`.
+**File Logging (optional):**
+Persistent log files with automatic daily rotation via [pino-roll](https://github.com/pinojs/pino-roll). Enable by setting `LOG_FILE_ENABLED=true`.
-**Configuration:**
 ```bash
-LOG_LEVEL=info  # options: error, warn, info, debug
-REQUEST_LOGGING_ENABLED=true  # default: true
+LOG_FILE_ENABLED=true           # default: false
+LOG_FILE_PATH=./logs/lynkr.log  # default: <cwd>/logs/lynkr.log
+LOG_FILE_LEVEL=debug            # default: debug (captures all levels)
+LOG_FILE_FREQUENCY=daily        # options: daily, hourly, custom (default: daily)
+LOG_FILE_MAX_FILES=14           # rotated files to keep (default: 14)
 ```
-**Log format:**
+Rotated files are named with timestamps (e.g., `lynkr.log.2025-07-12`). The log directory is created automatically.
+**Log format (JSON):**
 ```json
 {
   "level": "info",
@@ -216,10 +236,25 @@ REQUEST_LOGGING_ENABLED=true  # default: true
 }
 ```
+**Querying log files:**
+```bash
+# Tail live logs
+tail -f ./logs/lynkr.log | npx pino-pretty
+# Find errors in the last 24 hours
+cat ./logs/lynkr.log | jq 'select(.level >= 50)'
+# Filter by provider
+cat ./logs/lynkr.log | jq 'select(.provider == "databricks")'
+# Search for slow requests (>2s)
+cat ./logs/lynkr.log | jq 'select(.duration > 2000)'
+```
 **Log aggregation:**
-- Stdout (captured by Docker/K8s)
-- Parse with structured log tools
-- Send to Elasticsearch, Splunk, etc.
+- **Stdout** — Captured by Docker/K8s log drivers
+- **File rotation** — For standalone deployments or local debugging
+- **External** — Forward JSON logs to Elasticsearch, Splunk, Grafana Loki, etc.
 ### 7. Health Checks
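The `jq` queries in the "Querying log files" hunk above compare `.level` numerically, while the sample log entry shows `"level": "info"` as a string. The sketch below filters newline-delimited JSON logs and accepts either form. `filterLogLines` is a hypothetical helper; the field names `level`, `provider`, and `duration` (in milliseconds) are taken from the queries above, and the numeric values are pino's standard level numbers.

```javascript
// Hypothetical log filter. Field names (level, provider, duration) follow
// the jq queries above; the numeric values are pino's standard levels.
const PINO_LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

function toNumericLevel(level) {
  // Accept both numeric levels and string labels; unknown labels map to 0.
  return typeof level === "number" ? level : (PINO_LEVELS[level] ?? 0);
}

function filterLogLines(text, { minLevel = "info", provider, minDurationMs } = {}) {
  const threshold = toNumericLevel(minLevel);
  const matches = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (entry === null || typeof entry !== "object") continue;
    if (toNumericLevel(entry.level) < threshold) continue;
    if (provider !== undefined && entry.provider !== provider) continue;
    if (minDurationMs !== undefined && !(entry.duration > minDurationMs)) continue;
    matches.push(entry);
  }
  return matches;
}

const sample = [
  '{"level":30,"provider":"ollama","duration":120}',
  '{"level":50,"provider":"databricks","duration":2400}',
  '{"level":"error","provider":"databricks","duration":900}',
].join("\n");

console.log(filterLogLines(sample, { minLevel: "error" }).length); // 2
console.log(filterLogLines(sample, { provider: "databricks", minDurationMs: 2000 }).length); // 1
```

For a real log file, the text could be read with `fs.readFileSync("./logs/lynkr.log", "utf8")`, using the default path documented above.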

package/documentation/providers.md CHANGED

@@ -1,6 +1,6 @@
 # Provider Configuration Guide
-Complete configuration reference for all 9+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
+Complete configuration reference for all 12+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
 ---
@@ -18,6 +18,7 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
 | **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
 | **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
 | **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
+| **Moonshot AI (Kimi)** | Cloud | Kimi K2 (thinking + turbo) | $ | Cloud | Easy |
 | **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
 | **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |
@@ -25,7 +26,11 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
 ## Configuration Methods
-### Environment Variables (Quick Start)
+There are two routing modes. Choose based on your needs:
+### Static Routing (Single Provider)
+Set `MODEL_PROVIDER` to send all requests to one provider. All requests go to this provider regardless of complexity:
 ```bash
 export MODEL_PROVIDER=databricks
@@ -34,6 +39,23 @@ export DATABRICKS_API_KEY=your-key
 lynkr start
 ```
+### Tier-Based Routing (Recommended for Cost Optimization)
+Set **all 4** `TIER_*` vars to route requests by complexity. Each request is scored 0-100 and routed to the `provider:model` matching its complexity tier. When all four are configured, they **override** `MODEL_PROVIDER` for routing decisions:
+```bash
+export MODEL_PROVIDER=ollama                            # Still needed for startup checks
+export TIER_SIMPLE=ollama:llama3.2                      # Score 0-25 → local (free)
+export TIER_MEDIUM=openrouter:openai/gpt-4o-mini        # Score 26-50 → affordable cloud
+export TIER_COMPLEX=databricks:claude-sonnet             # Score 51-75 → capable cloud
+export TIER_REASONING=databricks:claude-sonnet            # Score 76-100 → best available
+lynkr start
+```
+> **Important:** All 4 `TIER_*` vars must be set to enable tier routing. If any are missing, tier routing is disabled and `MODEL_PROVIDER` is used for all requests. `MODEL_PROVIDER` should always be set — even with tier routing active, it is used for startup checks, provider discovery, and as the default provider when a `TIER_*` value has no `provider:` prefix.
+>
+> **`PREFER_OLLAMA` is deprecated** and has no effect. Use `TIER_SIMPLE=ollama:<model>` to route simple requests to Ollama. See [Routing Precedence](routing.md#routing-precedence) for full details.
 ### .env File (Recommended for Production)
 ```bash
@@ -46,11 +68,17 @@ nano .env
 Example `.env`:
 ```env
-MODEL_PROVIDER=databricks
+MODEL_PROVIDER=ollama
 DATABRICKS_API_BASE=https://your-workspace.databricks.com
 DATABRICKS_API_KEY=dapi1234567890abcdef
 PORT=8081
 LOG_LEVEL=info
+# Tier routing (optional — set all 4 to enable)
+TIER_SIMPLE=ollama:llama3.2
+TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+TIER_COMPLEX=databricks:claude-sonnet
+TIER_REASONING=databricks:claude-sonnet
 ```
 ---
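The tier-routing hunks above describe `TIER_*` values in `provider:model` form, with `MODEL_PROVIDER` as the default when a value has no `provider:` prefix. The sketch below shows one way such a value could be split. `parseTierValue` and `KNOWN_PROVIDERS` are hypothetical; the provider names are copied from the `MODEL_PROVIDER` list in this diff. Checking the prefix against known providers is an assumption, made so that a bare model tag containing a colon is not misread as a provider.

```javascript
// Provider names come from the MODEL_PROVIDER list in this diff.
const KNOWN_PROVIDERS = new Set([
  "databricks", "bedrock", "openrouter", "ollama", "llamacpp", "azure-openai",
  "azure-anthropic", "openai", "lmstudio", "moonshot", "zai", "vertex",
]);

// Hypothetical parser for a TIER_* value of the form `provider:model`.
function parseTierValue(value, defaultProvider) {
  const idx = value.indexOf(":");
  const prefix = idx === -1 ? "" : value.slice(0, idx);
  // Only treat the prefix as a provider if it is a known provider name, so a
  // bare model tag such as "llama3.1:8b" is not misread (an assumption).
  if (KNOWN_PROVIDERS.has(prefix)) {
    return { provider: prefix, model: value.slice(idx + 1) };
  }
  return { provider: defaultProvider, model: value };
}

console.log(parseTierValue("openrouter:openai/gpt-4o-mini", "ollama"));
console.log(parseTierValue("llama3.2", "ollama"));
```

Splitting on the first colon only keeps model identifiers such as `openai/gpt-4o-mini` or `llama3.1:8b` intact after the provider prefix is removed.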
@@ -685,7 +713,82 @@ LMSTUDIO_API_KEY=your-optional-api-key
 ---
-### 10. MLX OpenAI Server (Apple Silicon)
+### 10. Moonshot AI / Kimi (OpenAI-Compatible)
+**Best for:** Affordable cloud models, thinking/reasoning models, OpenAI-compatible API
+#### Configuration
+```env
+MODEL_PROVIDER=moonshot
+MOONSHOT_API_KEY=sk-your-moonshot-api-key
+MOONSHOT_ENDPOINT=https://api.moonshot.ai/v1/chat/completions
+MOONSHOT_MODEL=kimi-k2-turbo-preview
+```
+#### Getting Moonshot API Key
+1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
+2. Sign up or log in
+3. Navigate to API Keys section
+4. Create a new API key
+5. Add credits to your account
+#### Available Models
+```env
+MOONSHOT_MODEL=kimi-k2-turbo-preview    # Fast, efficient (recommended)
+MOONSHOT_MODEL=kimi-k2-thinking         # Chain-of-thought reasoning model
+```
+**Model Details:**
+| Model | Type | Best For |
+|-------|------|----------|
+| `kimi-k2-turbo-preview` | Standard | Fast responses, tool calling, general tasks |
+| `kimi-k2-thinking` | Thinking/Reasoning | Complex analysis, multi-step reasoning |
+#### How It Works
+Moonshot uses an **OpenAI-compatible** chat completions API. Lynkr handles all format conversion automatically:
+1. Claude Code CLI sends Anthropic-format request to Lynkr
+2. Lynkr converts Anthropic messages → OpenAI chat completions format
+3. Request is sent to Moonshot's `/v1/chat/completions` endpoint
+4. Moonshot response is converted back to Anthropic format
+5. Claude Code CLI receives a standard Anthropic response
+#### Thinking Model Support
+When using `kimi-k2-thinking`, the model returns both `reasoning_content` (chain-of-thought) and `content` (final answer). Lynkr automatically extracts only the final answer for clean CLI output. The reasoning content is used as a fallback only when the final answer is empty.
+#### Important Notes
+- **Streaming:** Streaming is disabled for Moonshot (responses arrive as complete JSON). This ensures clean terminal rendering since OpenAI SSE → Anthropic SSE conversion is not yet implemented.
+- **Rate Limits:** Moonshot has a max concurrency of ~3 requests. Lynkr retries with backoff on 429 errors.
+- **Tool Calling:** Full tool calling support via OpenAI function calling format (automatically converted from Anthropic format).
+- **System Messages:** Moonshot natively supports the `system` role, so system prompts are passed directly.
+#### Benefits
+- ✅ **Affordable** — Competitive pricing for capable models
+- ✅ **Thinking models** — Chain-of-thought reasoning with `kimi-k2-thinking`
+- ✅ **Full tool calling** — Native function calling support
+- ✅ **OpenAI-compatible** — Standard chat completions API
+- ✅ **System role support** — Native system message handling
+#### Test Connection
+```bash
+curl -X POST https://api.moonshot.ai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
+  -d '{"model":"kimi-k2-turbo-preview","messages":[{"role":"user","content":"Hello"}]}'
+```
+---
+### 11. MLX OpenAI Server (Apple Silicon)
 **Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)
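The "Thinking Model Support" paragraph in the Moonshot hunk above says the final answer (`content`) is preferred and `reasoning_content` is used only when that answer is empty. A minimal sketch of the selection follows. `extractFinalAnswer` is a hypothetical helper, and placing both fields on the OpenAI-style `choices[0].message` object is an assumption about the response shape.

```javascript
// Hypothetical helper; `message` is assumed to be the `choices[0].message`
// object of an OpenAI-style chat completion response.
function extractFinalAnswer(message) {
  const asText = (value) => (typeof value === "string" ? value.trim() : "");
  const content = asText(message?.content);
  if (content !== "") return content;
  // Fall back to the chain-of-thought only when the final answer is empty.
  return asText(message?.reasoning_content);
}

console.log(extractFinalAnswer({ content: "42", reasoning_content: "6 * 7 = 42" })); // 42
console.log(extractFinalAnswer({ content: "", reasoning_content: "6 * 7 = 42" })); // 6 * 7 = 42
```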
@@ -776,56 +879,53 @@ curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: applica
 ---
-## Hybrid Routing & Fallback
+## Tier-Based Routing & Fallback
-### Intelligent 3-Tier Routing
+### Intelligent 4-Tier Routing
 Optimize costs by routing requests based on complexity:
 ```env
-# Enable hybrid routing
-PREFER_OLLAMA=true
-FALLBACK_ENABLED=true
+# Tier-based routing (set all 4 to enable)
+TIER_SIMPLE=ollama:llama3.2
+TIER_MEDIUM=openrouter:openai/gpt-4o-mini
+TIER_COMPLEX=azure-openai:gpt-4o
+TIER_REASONING=azure-openai:gpt-4o
-# Configure providers for each tier
-MODEL_PROVIDER=ollama
-OLLAMA_MODEL=llama3.1:8b
-OLLAMA_MAX_TOOLS_FOR_ROUTING=3
+FALLBACK_ENABLED=true
-# Mid-tier (moderate complexity)
+# Provider credentials
+OLLAMA_ENDPOINT=http://localhost:11434
 OPENROUTER_API_KEY=your-key
-OPENROUTER_MODEL=openai/gpt-4o-mini
-OPENROUTER_MAX_TOOLS_FOR_ROUTING=15
-# Heavy workload (complex requests)
-FALLBACK_PROVIDER=databricks
-DATABRICKS_API_BASE=your-base
-DATABRICKS_API_KEY=your-key
+AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/...
+AZURE_OPENAI_API_KEY=your-key
 ```
 ### How It Works
 **Routing Logic:**
-1. **0-2 tools**: Try Ollama first (free, local, fast)
-2. **3-15 tools**: Route to OpenRouter (affordable cloud)
-3. **16+ tools**: Route directly to Databricks/Azure (most capable)
+1. Each request is scored for complexity (0-100)
+2. Score maps to a tier: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
+3. The request is routed to the provider:model configured for that tier
 **Automatic Fallback:**
-- ❌ If Ollama fails → Fallback to OpenRouter or Databricks
-- ❌ If OpenRouter fails → Fallback to Databricks
-- ✅ Transparent to the user
+- If the selected provider fails, Lynkr falls back to `FALLBACK_PROVIDER`
+- Transparent to the user
 ### Cost Savings
-- **65-100%** for requests that stay on Ollama
+- **65-100%** for requests routed to local/cheap models
 - **40-87%** faster for simple requests
-- **Privacy**: Simple queries never leave your machine
+- **Privacy**: Simple queries can stay on your machine when using a local TIER_SIMPLE model
 ### Configuration Options
 | Variable | Description | Default |
 |----------|-------------|---------|
-| `PREFER_OLLAMA` | Enable Ollama preference for simple requests | `false` |
+| `TIER_SIMPLE` | Model for simple tier (`provider:model`) | *required for tier routing* |
+| `TIER_MEDIUM` | Model for medium tier (`provider:model`) | *required for tier routing* |
+| `TIER_COMPLEX` | Model for complex tier (`provider:model`) | *required for tier routing* |
+| `TIER_REASONING` | Model for reasoning tier (`provider:model`) | *required for tier routing* |
 | `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
 | `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
 | `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
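The "Automatic Fallback" lines in the hunk above say a failed request falls back to `FALLBACK_PROVIDER` when fallback is enabled. The sketch below computes the order in which providers would be tried under that rule. `providerChain` is a hypothetical helper; the defaults (`true` and `databricks`) come from the configuration table above, while the way the boolean is parsed (anything other than `false` counts as enabled) is an assumption.

```javascript
// Hypothetical helper: returns the providers to try, in order, for a request
// whose tier resolved to `primary`. Not taken from Lynkr's source.
function providerChain(primary, env) {
  // Assumption: any value other than "false" (case-insensitive) enables fallback.
  const enabled = String(env.FALLBACK_ENABLED ?? "true").toLowerCase() !== "false";
  const fallback = env.FALLBACK_PROVIDER || "databricks";
  // Skip the fallback when it is disabled or identical to the primary.
  return enabled && fallback !== primary ? [primary, fallback] : [primary];
}

console.log(providerChain("ollama", {})); // [ 'ollama', 'databricks' ]
console.log(providerChain("ollama", { FALLBACK_ENABLED: "false" })); // [ 'ollama' ]
```

A caller could iterate over the returned list and stop at the first provider that responds successfully.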
@@ -841,7 +941,7 @@ DATABRICKS_API_KEY=your-key
 | Variable | Description | Default |
 |----------|-------------|---------|
-| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
+| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `zai`, `moonshot`, `vertex`) | `databricks` |
 | `PORT` | HTTP port for proxy server | `8081` |
 | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
 | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
@@ -858,17 +958,19 @@ See individual provider sections above for complete variable lists.
 ### Feature Comparison
-| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp | LM Studio |
-|---------|-----------|---------|--------|--------------|-----------------|------------|--------|-----------|-----------|
-| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium | Easy |
-| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | **Free** | **Free** | **Free** |
-| **Latency** | Low | Low | Low | Low | Low | Medium | **Very Low** | **Very Low** | **Very Low** |
-| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 50+ | Unlimited | 50+ |
-| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good | Fair |
-| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent | 32K-128K |
-| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
-| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | **Local** | **Local** | **Local** |
-| **Offline** | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
+| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Moonshot | Ollama | llama.cpp | LM Studio |
+|---------|-----------|---------|--------|--------------|-----------------|------------|----------|--------|-----------|-----------|
+| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Easy | Medium | Easy |
+| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | $ | **Free** | **Free** | **Free** |
+| **Latency** | Low | Low | Low | Low | Low | Medium | Low | **Very Low** | **Very Low** | **Very Low** |
+| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 2+ | 50+ | Unlimited | 50+ |
+| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Good | Fair | Good | Fair |
+| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 128K | 32K-128K | Model-dependent | 32K-128K |
+| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Non-streaming** | Yes | Yes | Yes |
+| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Third-party | **Local** | **Local** | **Local** |
+| **Offline** | No | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
+_** Moonshot uses non-streaming mode (responses arrive as complete JSON) for clean terminal rendering_
 _* Tool calling only supported by Claude models on Bedrock_
@@ -882,6 +984,7 @@ _* Tool calling only supported by Claude models on Bedrock_
 | **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
 | **OpenAI** | GPT-4o | $2.50 | $10.00 |
 | **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
+| **Moonshot** | Kimi K2 Turbo | See moonshot.ai | See moonshot.ai |
 | **Ollama** | Any model | **FREE** | **FREE** |
 | **llama.cpp** | Any model | **FREE** | **FREE** |
 | **LM Studio** | Any model | **FREE** | **FREE** |