lynkr 7.2.4 → 8.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/README.md +2 -2
  2. package/config/model-tiers.json +89 -0
  3. package/docs/docs.html +1 -0
  4. package/docs/index.md +7 -0
  5. package/docs/toon-integration-spec.md +130 -0
  6. package/documentation/README.md +3 -2
  7. package/documentation/claude-code-cli.md +23 -16
  8. package/documentation/cursor-integration.md +17 -14
  9. package/documentation/docker.md +11 -4
  10. package/documentation/embeddings.md +7 -5
  11. package/documentation/faq.md +66 -12
  12. package/documentation/features.md +22 -15
  13. package/documentation/installation.md +66 -14
  14. package/documentation/production.md +43 -8
  15. package/documentation/providers.md +145 -42
  16. package/documentation/routing.md +476 -0
  17. package/documentation/token-optimization.md +7 -5
  18. package/documentation/troubleshooting.md +81 -5
  19. package/install.sh +6 -1
  20. package/package.json +5 -3
  21. package/scripts/setup.js +0 -1
  22. package/src/agents/executor.js +14 -6
  23. package/src/api/middleware/session.js +15 -2
  24. package/src/api/openai-router.js +130 -37
  25. package/src/api/providers-handler.js +15 -1
  26. package/src/api/router.js +107 -2
  27. package/src/budget/index.js +4 -3
  28. package/src/clients/databricks.js +431 -234
  29. package/src/clients/gpt-utils.js +181 -0
  30. package/src/clients/ollama-utils.js +66 -140
  31. package/src/clients/routing.js +0 -1
  32. package/src/clients/standard-tools.js +82 -5
  33. package/src/config/index.js +119 -35
  34. package/src/context/toon.js +173 -0
  35. package/src/headroom/launcher.js +8 -3
  36. package/src/logger/index.js +23 -0
  37. package/src/orchestrator/index.js +765 -212
  38. package/src/routing/agentic-detector.js +320 -0
  39. package/src/routing/complexity-analyzer.js +202 -2
  40. package/src/routing/cost-optimizer.js +305 -0
  41. package/src/routing/index.js +168 -159
  42. package/src/routing/model-registry.js +437 -0
  43. package/src/routing/model-tiers.js +365 -0
  44. package/src/server.js +2 -2
  45. package/src/sessions/cleanup.js +3 -3
  46. package/src/sessions/record.js +10 -1
  47. package/src/sessions/store.js +7 -2
  48. package/src/tools/agent-task.js +48 -1
  49. package/src/tools/index.js +15 -2
  50. package/src/tools/workspace.js +35 -4
  51. package/src/workspace/index.js +30 -0
  52. package/te +11622 -0
  53. package/test/README.md +1 -1
  54. package/test/azure-openai-config.test.js +17 -8
  55. package/test/azure-openai-integration.test.js +7 -1
  56. package/test/azure-openai-routing.test.js +41 -43
  57. package/test/bedrock-integration.test.js +18 -32
  58. package/test/hybrid-routing-integration.test.js +35 -20
  59. package/test/hybrid-routing-performance.test.js +74 -64
  60. package/test/llamacpp-integration.test.js +28 -9
  61. package/test/lmstudio-integration.test.js +20 -8
  62. package/test/openai-integration.test.js +17 -20
  63. package/test/performance-tests.js +1 -1
  64. package/test/routing.test.js +65 -59
  65. package/test/toon-compression.test.js +131 -0
  66. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  67. package/ROUTER_COMPARISON.md +0 -173
  68. package/TIER_ROUTING_PLAN.md +0 -771
package/README.md CHANGED
@@ -238,7 +238,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
238
238
 
239
239
  ### Getting Started
240
240
  - 📦 **[Installation Guide](documentation/installation.md)** - Detailed installation for all methods
241
- - ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 9+ providers
241
+ - ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 12+ providers
242
242
  - 🎯 **[Quick Start Examples](documentation/installation.md#quick-start-examples)** - Copy-paste configs
243
243
 
244
244
  ### IDE & CLI Integration
@@ -277,7 +277,7 @@ Lynkr supports [ClawdBot](https://github.com/openclaw/openclaw) via its OpenAI-c
277
277
 
278
278
  ## Key Features Highlights
279
279
 
280
- - ✅ **Multi-Provider Support** - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
280
+ - ✅ **Multi-Provider Support** - 12+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter, Moonshot AI)
281
281
  - ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
282
282
  - ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
283
283
  - ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
@@ -0,0 +1,89 @@
1
+ {
2
+ "tiers": {
3
+ "SIMPLE": {
4
+ "description": "Greetings, simple Q&A, confirmations, basic lookups",
5
+ "range": [0, 25],
6
+ "priority": 1,
7
+ "preferred": {
8
+ "ollama": ["llama3.2", "gemma2", "phi3", "qwen2.5:7b", "mistral"],
9
+ "llamacpp": ["default"],
10
+ "lmstudio": ["default"],
11
+ "openai": ["gpt-4o-mini", "gpt-3.5-turbo"],
12
+ "azure-openai": ["gpt-4o-mini", "gpt-35-turbo"],
13
+ "anthropic": ["claude-3-haiku-20240307", "claude-3-5-haiku-20241022"],
14
+ "bedrock": ["anthropic.claude-3-haiku-20240307-v1:0", "amazon.nova-lite-v1:0"],
15
+ "databricks": ["databricks-claude-haiku-4-5", "databricks-gpt-5-nano"],
16
+ "google": ["gemini-2.0-flash", "gemini-1.5-flash"],
17
+ "openrouter": ["google/gemini-flash-1.5", "deepseek/deepseek-chat"],
18
+ "zai": ["GLM-4-Flash"],
19
+ "moonshot": ["kimi-k2-turbo-preview"]
20
+ }
21
+ },
22
+ "MEDIUM": {
23
+ "description": "Code reading, simple edits, research, documentation",
24
+ "range": [26, 50],
25
+ "priority": 2,
26
+ "preferred": {
27
+ "ollama": ["qwen2.5:32b", "deepseek-coder:33b", "codellama:34b"],
28
+ "llamacpp": ["default"],
29
+ "lmstudio": ["default"],
30
+ "openai": ["gpt-4o", "gpt-4-turbo"],
31
+ "azure-openai": ["gpt-4o", "gpt-4"],
32
+ "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
33
+ "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0", "amazon.nova-pro-v1:0"],
34
+ "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1"],
35
+ "google": ["gemini-1.5-pro", "gemini-2.0-pro"],
36
+ "openrouter": ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"],
37
+ "zai": ["GLM-4.7"],
38
+ "moonshot": ["kimi-k2-turbo-preview"]
39
+ }
40
+ },
41
+ "COMPLEX": {
42
+ "description": "Multi-file changes, debugging, architecture, refactoring",
43
+ "range": [51, 75],
44
+ "priority": 3,
45
+ "preferred": {
46
+ "ollama": ["qwen2.5:72b", "llama3.1:70b", "deepseek-coder-v2:236b"],
47
+ "openai": ["o1-mini", "o3-mini", "gpt-4o"],
48
+ "azure-openai": ["o1-mini", "gpt-4o"],
49
+ "anthropic": ["claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022"],
50
+ "bedrock": ["anthropic.claude-3-5-sonnet-20241022-v2:0"],
51
+ "databricks": ["databricks-claude-sonnet-4-5", "databricks-gpt-5-1-codex-max"],
52
+ "google": ["gemini-2.5-pro", "gemini-1.5-pro"],
53
+ "openrouter": ["anthropic/claude-3.5-sonnet", "meta-llama/llama-3.1-405b"],
54
+ "zai": ["GLM-4.7"],
55
+ "moonshot": ["kimi-k2-turbo-preview"]
56
+ }
57
+ },
58
+ "REASONING": {
59
+ "description": "Complex analysis, security audits, novel problems, deep thinking",
60
+ "range": [76, 100],
61
+ "priority": 4,
62
+ "preferred": {
63
+ "openai": ["o1", "o1-pro", "o3"],
64
+ "azure-openai": ["o1", "o1-pro"],
65
+ "anthropic": ["claude-opus-4-20250514", "claude-3-opus-20240229"],
66
+ "bedrock": ["anthropic.claude-3-opus-20240229-v1:0"],
67
+ "databricks": ["databricks-claude-opus-4-6", "databricks-claude-opus-4-5", "databricks-gpt-5-2"],
68
+ "google": ["gemini-2.5-pro"],
69
+ "openrouter": ["anthropic/claude-3-opus", "deepseek/deepseek-reasoner", "openai/o1"],
70
+ "deepseek": ["deepseek-reasoner", "deepseek-r1"],
71
+ "moonshot": ["kimi-k2-thinking", "kimi-k2-turbo-preview"]
72
+ }
73
+ }
74
+ },
75
+ "localProviders": {
76
+ "ollama": { "free": true, "defaultTier": "SIMPLE" },
77
+ "llamacpp": { "free": true, "defaultTier": "SIMPLE" },
78
+ "lmstudio": { "free": true, "defaultTier": "SIMPLE" }
79
+ },
80
+ "providerAliases": {
81
+ "azure": "azure-openai",
82
+ "aws": "bedrock",
83
+ "amazon": "bedrock",
84
+ "claude": "anthropic",
85
+ "gemini": "google",
86
+ "vertex": "google",
87
+ "kimi": "moonshot"
88
+ }
89
+ }
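For illustration, the score-to-tier mapping implied by the `range` fields above can be sketched as follows. This is a minimal sketch, not the package's actual loader: the `tierForScore` name and the out-of-range fallback are assumptions.

```javascript
// Tier ranges as declared in config/model-tiers.json above.
const tiers = {
  SIMPLE: [0, 25],
  MEDIUM: [26, 50],
  COMPLEX: [51, 75],
  REASONING: [76, 100],
};

// Map a complexity score (0-100) to a tier name.
function tierForScore(score) {
  for (const [name, [lo, hi]] of Object.entries(tiers)) {
    if (score >= lo && score <= hi) return name;
  }
  return "REASONING"; // assumed: out-of-range scores fall back to the top tier
}
```

The ranges are contiguous and inclusive, so every integer score from 0 to 100 lands in exactly one tier.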
package/docs/docs.html CHANGED
@@ -51,6 +51,7 @@
51
51
  <div class="doc-sidebar-title">Features</div>
52
52
  <ul class="doc-sidebar-links">
53
53
  <li><a href="?doc=features" data-doc="features">Core Features</a></li>
54
+ <li><a href="?doc=routing" data-doc="routing">Routing & Model Tiering</a></li>
54
55
  <li><a href="?doc=token-optimization" data-doc="token-optimization">Token Optimization</a></li>
55
56
  <li><a href="?doc=memory-system" data-doc="memory-system">Memory System</a></li>
56
57
  <li><a href="?doc=headroom" data-doc="headroom">Headroom Compression</a></li>
package/docs/index.md CHANGED
@@ -311,6 +311,13 @@
311
311
  <span class="provider-badge paid">GPT-4o, o1</span>
312
312
  </div>
313
313
 
314
+ <div class="provider-card">
315
+ <span class="provider-icon">🌙</span>
316
+ <div class="provider-name">Moonshot AI</div>
317
+ <div class="provider-type">Cloud</div>
318
+ <span class="provider-badge paid">KIMI K2</span>
319
+ </div>
320
+
314
321
  <div class="provider-card">
315
322
  <span class="provider-icon" style="font-weight: 900; font-size: 28px; background: linear-gradient(135deg, #1a1a2e, #16213e); color: #fff; width: 42px; height: 42px; border-radius: 10px; display: inline-flex; align-items: center; justify-content: center;">Z</span>
316
323
  <div class="provider-name">z.ai</div>
@@ -0,0 +1,130 @@
1
+ # TOON Integration Spec (Lynkr Spike)
2
+
3
+ Date: 2026-02-17
4
+ Branch: `codex/toon-integration-spike`
5
+ Status: Implemented behind flags (`TOON_ENABLED=false` by default).
6
+
7
+ ## 1) Goal
8
+
9
+ Reduce prompt token usage for large structured JSON context while preserving current Lynkr routing, tool execution semantics, and reliability.
10
+
11
+ ## 2) Non-Goals
12
+
13
+ 1. Do not replace Lynkr routing/fallback logic.
14
+ 2. Do not change MCP/tool protocol behavior.
15
+ 3. Do not change provider request envelope formats.
16
+ 4. Do not require TOON for normal operation.
17
+
18
+ ## 3) Integration Strategy (Minimal, Reversible)
19
+
20
+ 1. Add a TOON adapter module (encode-only for prompt context).
21
+ 2. Apply TOON only to eligible large JSON blobs before they are inserted into model-visible context.
22
+ 3. Keep original JSON in memory/session for execution and audit; only prompt copy is compressed.
23
+ 4. Fail open: if TOON conversion fails, send original JSON unchanged.
24
+
25
+ ## 4) What We Will Compress
26
+
27
+ Eligible inputs (all required):
28
+
29
+ 1. Payload is valid JSON object/array.
30
+ 2. Payload size exceeds threshold (for example, `TOON_MIN_BYTES`).
31
+ 3. Payload is read-only context for model comprehension (not protocol-critical).
32
+
33
+ Primary targets:
34
+
35
+ 1. Large tool output summaries inserted into prompt context.
36
+ 2. Large search/result payloads injected for reasoning.
37
+ 3. Structured data snapshots used for analysis tasks.
38
+
39
+ ## 5) What We Will Never Compress
40
+
41
+ Hard exclusions:
42
+
43
+ 1. Tool schemas/definitions (`tools`, `input_schema`, function signatures).
44
+ 2. Tool call argument payloads that are executed by systems.
45
+ 3. Provider request envelopes (`/v1/messages`, `/chat/completions` body schema fields).
46
+ 4. Protocol control fields (roles, stop reasons, tool IDs, request IDs).
47
+ 5. Stored canonical session payloads used for replay/debug/audit.
48
+
49
+ Rule: if a payload is machine-validated/executed downstream, keep JSON.
50
+
51
+ ## 6) Config Flags (Default Safe)
52
+
53
+ Proposed env flags:
54
+
55
+ 1. `TOON_ENABLED=false` (default off)
56
+ 2. `TOON_MIN_BYTES=4096` (only convert larger payloads)
57
+ 3. `TOON_FAIL_OPEN=true` (fallback to JSON on any TOON error)
58
+ 4. `TOON_LOG_STATS=true` (log before/after token estimate for observability)
59
+
60
+ ## 7) Verification Gates
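Taken together, the flags above describe a fail-open gate around the adapter. A minimal sketch, assuming a `toonEncode` function as a stand-in for the real adapter module:

```javascript
// Fail-open gating for TOON conversion, per the flags above.
// `toonEncode` is a hypothetical stand-in for the actual adapter.
function maybeCompressContext(jsonText, toonEncode, env = process.env) {
  if (env.TOON_ENABLED !== "true") return jsonText;                    // default off
  const minBytes = Number(env.TOON_MIN_BYTES || 4096);
  if (Buffer.byteLength(jsonText, "utf8") < minBytes) return jsonText; // below threshold
  let parsed;
  try {
    parsed = JSON.parse(jsonText);                                     // must be valid JSON
  } catch {
    return jsonText;
  }
  try {
    return toonEncode(parsed);
  } catch (err) {
    if (env.TOON_FAIL_OPEN !== "false") return jsonText;               // fail open: keep JSON
    throw err;
  }
}
```

Note that the original `jsonText` is returned unchanged on any ineligibility or error, matching the "fail open" rule in section 3.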
61
+
62
+ Before enabling:
63
+
64
+ 1. Existing unit tests pass unchanged.
65
+ 2. Existing MCP smoke passes (`find_tool`/`call_tool` path).
66
+
67
+ With `TOON_ENABLED=true`:
68
+
69
+ 1. Prompt A/B benchmark still passes functionally.
70
+ 2. No regression in Task/subagent behavior.
71
+ 3. Data-heavy prompt shows token reduction vs baseline.
72
+ 4. No increase in protocol/tool-call errors.
73
+
74
+ ## 8) Rollback Rules
75
+
76
+ Immediate rollback:
77
+
78
+ 1. Set `TOON_ENABLED=false`.
79
+ 2. Restart Lynkr service.
80
+
81
+ Code rollback:
82
+
83
+ 1. Revert TOON integration commit(s) on this branch.
84
+ 2. Re-run unit + MCP smoke gates.
85
+
86
+ ## 9) Risks and Mitigations
87
+
88
+ 1. Risk: semantic drift from transformed payloads.
89
+ - Mitigation: apply only to read-only context, fail-open on error, keep canonical JSON.
90
+ 2. Risk: negligible gains on non-tabular/deeply nested payloads.
91
+ - Mitigation: threshold + eligibility checks; skip low-value payloads.
92
+ 3. Risk: harder debugging.
93
+ - Mitigation: log conversion stats and keep original payload for diagnostics.
94
+
95
+ ## 10) Stock Provider Validation (Ollama Cloud)
96
+
97
+ Date: 2026-02-17
98
+
99
+ Runtime under test:
100
+
101
+ 1. `MODEL_PROVIDER=ollama`
102
+ 2. `OLLAMA_ENDPOINT=http://127.0.0.1:11434`
103
+ 3. `OLLAMA_MODEL=glm-5:cloud`
104
+ 4. `TOON_MIN_BYTES=256`
105
+ 5. `TOON_FAIL_OPEN=true`
106
+ 6. `TOON_LOG_STATS=true`
107
+
108
+ Probe used:
109
+
110
+ 1. Send a two-message request where the second message is a large JSON blob.
111
+ 2. Ask model to classify the next message as `JSON` vs `OTHER` based on first character.
112
+ 3. Run once with `TOON_ENABLED=false`, once with `TOON_ENABLED=true`.
113
+
114
+ Observed results:
115
+
116
+ 1. `TOON_ENABLED=false`
117
+ - Reply: `JSON`
118
+ - Provider header: `x-lynkr-provider: ollama`
119
+ - TOON log entries: `0`
120
+ 2. `TOON_ENABLED=true`
121
+ - Reply: `OTHER`
122
+ - Provider header: `x-lynkr-provider: ollama`
123
+ - TOON log entries: `1`
124
+ - Logged conversion stats: `originalBytes=6416`, `compressedBytes=5854` (saved `562` bytes, `8.76%`)
125
+
126
+ Conclusion:
127
+
128
+ 1. TOON gating works on stock Ollama cloud path (not moonshot-specific).
129
+ 2. Compression is applied only when flag-enabled.
130
+ 3. Provider routing remains unchanged (`ollama`) during TOON transformation.
@@ -9,7 +9,7 @@ Welcome to the comprehensive documentation for Lynkr, the self-hosted Claude Cod
9
9
  New to Lynkr? Start here:
10
10
 
11
11
  - **[Installation Guide](installation.md)** - Complete installation instructions for all methods (npm, git clone, homebrew, Docker)
12
- - **[Provider Configuration](providers.md)** - Detailed setup for all 9+ supported providers (Databricks, Bedrock, OpenRouter, Ollama, llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio)
12
+ - **[Provider Configuration](providers.md)** - Detailed setup for all 12+ supported providers (Databricks, Bedrock, OpenRouter, Ollama, llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio, Moonshot AI, Z.AI, Vertex AI)
13
13
  - **[Quick Start Examples](installation.md#quick-start-examples)** - Copy-paste configurations to get running fast
14
14
 
15
15
  ---
@@ -30,6 +30,7 @@ Connect Lynkr to your development tools:
30
30
  Understand Lynkr's capabilities:
31
31
 
32
32
  - **[Architecture & Features](features.md)** - System architecture, request flow, format conversion, and core capabilities
33
+ - **[Routing & Model Tiering](routing.md)** - 4-tier model system, 15-dimension complexity scoring, agentic workflow detection, and cost optimization
33
34
  - **[Memory System](memory-system.md)** - Titans-inspired long-term memory with surprise-based filtering and decay
34
35
  - **[Token Optimization](token-optimization.md)** - Achieve 60-80% cost reduction through smart tool selection, prompt caching, and memory deduplication
35
36
  - **[Headroom Compression](headroom.md)** - 47-92% token reduction through intelligent context compression (Smart Crusher, CCR, LLMLingua)
@@ -73,7 +74,7 @@ Get help and contribute:
73
74
  - [Installation](installation.md) | [Providers](providers.md) | [Claude Code](claude-code-cli.md) | [Codex CLI](codex-cli.md) | [Cursor](cursor-integration.md) | [Embeddings](embeddings.md)
74
75
 
75
76
  ### Features & Optimization
76
- - [Features](features.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
77
+ - [Features](features.md) | [Routing](routing.md) | [Memory System](memory-system.md) | [Token Optimization](token-optimization.md) | [Headroom](headroom.md) | [Tools](tools.md)
77
78
 
78
79
  ### Deployment & Production
79
80
  - [Docker](docker.md) | [Production](production.md) | [API Reference](api.md)
@@ -11,7 +11,7 @@ Lynkr acts as a drop-in replacement for Anthropic's backend, enabling Claude Cod
11
11
  ### Why Use Lynkr with Claude Code CLI?
12
12
 
13
13
  - 💰 **60-80% cost savings** through token optimization
14
- - 🔓 **Provider choice** - Use any of 9+ supported providers
14
+ - 🔓 **Provider choice** - Use any of 12+ supported providers
15
15
  - 🏠 **Self-hosted** - Full control over your AI infrastructure
16
16
  - 🔒 **Local option** - Run 100% offline with Ollama or llama.cpp
17
17
  - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
@@ -74,7 +74,7 @@ export DATABRICKS_API_BASE=https://your-workspace.databricks.com
74
74
  export DATABRICKS_API_KEY=dapi1234567890abcdef
75
75
  ```
76
76
 
77
- See [Provider Configuration Guide](providers.md) for all 9+ providers.
77
+ See [Provider Configuration Guide](providers.md) for all 12+ providers.
78
78
 
79
79
  ---
80
80
 
@@ -341,15 +341,16 @@ export MODEL_PROVIDER=databricks
341
341
 
342
342
  ---
343
343
 
344
- ## Hybrid Routing (Cost Optimization)
344
+ ## Tier-Based Routing (Cost Optimization)
345
345
 
346
- Use local Ollama for simple tasks, fallback to cloud for complex ones:
346
+ Use local Ollama for simple tasks, cloud for complex ones:
347
347
 
348
348
  ```bash
349
- # Configure hybrid routing
350
- export MODEL_PROVIDER=ollama
351
- export OLLAMA_MODEL=llama3.1:8b
352
- export PREFER_OLLAMA=true
349
+ # Configure tier-based routing (set all 4 to enable)
350
+ export TIER_SIMPLE=ollama:llama3.2
351
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
352
+ export TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
353
+ export TIER_REASONING=databricks:databricks-claude-sonnet-4-5
353
354
  export FALLBACK_ENABLED=true
354
355
  export FALLBACK_PROVIDER=databricks
355
356
  export DATABRICKS_API_BASE=https://your-workspace.databricks.com
@@ -360,13 +361,15 @@ lynkr start
360
361
  ```
361
362
 
362
363
  **How it works:**
363
- - **0-2 tools**: Ollama (free, local, fast)
364
- - **3-15 tools**: OpenRouter (if configured) or fallback
365
- - **16+ tools**: Databricks/Azure (most capable)
366
- - **Ollama failures**: Automatic transparent fallback to cloud
364
+ - Each request is scored for complexity (0-100) and mapped to a tier
365
+ - **SIMPLE (0-25)**: Ollama (free, local, fast)
366
+ - **MEDIUM (26-50)**: OpenRouter (affordable cloud)
367
+ - **COMPLEX (51-75)**: Databricks (most capable)
368
+ - **REASONING (76-100)**: Databricks (best available)
369
+ - **Provider failures**: Automatic transparent fallback to cloud
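The `TIER_*` values above use a `provider:model` format. One way such a value could be split is sketched below; the split-on-first-colon rule is an assumption, chosen so that model IDs containing `:` or `/` (like `openai/gpt-4o-mini` or `qwen2.5:7b`) survive intact:

```javascript
// Split a TIER_* value ("provider:model") on the FIRST colon only,
// so model IDs that themselves contain ":" or "/" pass through unchanged.
function parseTier(value) {
  const idx = value.indexOf(":");
  if (idx < 1) return null; // malformed value: treat the tier as unset
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}
```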
367
370
 
368
371
  **Cost savings:**
369
- - **65-100%** for requests that stay on Ollama
372
+ - **65-100%** for requests routed to local models
370
373
  - **40-87%** faster for simple requests
371
374
 
372
375
  ---
@@ -534,9 +537,13 @@ claude "What files are in the current directory?"
534
537
  - Local (Ollama): Should be 100-500ms
535
538
  - Cloud: Should be 500ms-2s
536
539
 
537
- 2. **Enable hybrid routing:**
540
+ 2. **Enable tier-based routing:**
538
541
  ```bash
539
- export PREFER_OLLAMA=true
542
+ # Set all 4 TIER_* env vars to enable tier-based routing
543
+ export TIER_SIMPLE=ollama:llama3.2
544
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
545
+ export TIER_COMPLEX=azure-openai:gpt-4o
546
+ export TIER_REASONING=azure-openai:gpt-4o
540
547
  export FALLBACK_ENABLED=true
541
548
  ```
542
549
 
@@ -655,7 +662,7 @@ Claude Code CLI (displays result)
655
662
 
656
663
  ## Next Steps
657
664
 
658
- - **[Provider Configuration](providers.md)** - Configure all 9+ providers
665
+ - **[Provider Configuration](providers.md)** - Configure all 12+ providers
659
666
  - **[Installation Guide](installation.md)** - Detailed installation
660
667
  - **[Features Guide](features.md)** - Learn about advanced features
661
668
  - **[Token Optimization](token-optimization.md)** - Maximize cost savings
@@ -534,11 +534,14 @@ AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
534
534
  - **Cloud** (OpenRouter/Databricks): Should be 500ms-2s
535
535
  - **Distant regions**: Can be 2-5s
536
536
 
537
- 2. **Enable hybrid routing** for speed:
537
+ 2. **Enable tier-based routing** for speed:
538
538
  ```env
539
- # Use Ollama for simple requests (fast)
540
- # Cloud for complex requests
541
- PREFER_OLLAMA=true
539
+ # Use Ollama for simple requests (fast), cloud for complex requests
540
+ # Set all 4 TIER_* env vars to enable tier-based routing
541
+ TIER_SIMPLE=ollama:llama3.2
542
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
543
+ TIER_COMPLEX=azure-openai:gpt-4o
544
+ TIER_REASONING=azure-openai:gpt-4o
542
545
  FALLBACK_ENABLED=true
543
546
  ```
544
547
 
@@ -675,12 +678,12 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
675
678
  ### Setup 3: Hybrid (Best of Both Worlds)
676
679
 
677
680
  ```bash
678
- # Chat: Ollama for simple requests, Databricks for complex
679
- PREFER_OLLAMA=true
681
+ # Chat: Tier-based routing (set all 4 to enable)
682
+ TIER_SIMPLE=ollama:llama3.2
683
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
684
+ TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
685
+ TIER_REASONING=databricks:databricks-claude-sonnet-4-5
680
686
  FALLBACK_ENABLED=true
681
- OLLAMA_MODEL=llama3.1:8b
682
-
683
- # Fallback to Databricks for complex requests
684
687
  FALLBACK_PROVIDER=databricks
685
688
  DATABRICKS_API_BASE=https://your-workspace.databricks.com
686
689
  DATABRICKS_API_KEY=your-key
@@ -688,15 +691,15 @@ DATABRICKS_API_KEY=your-key
688
691
  # Embeddings: Ollama (local, private)
689
692
  OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
690
693
 
691
- # Cost: Mostly FREE (Ollama handles 70-80% of requests)
692
- # Only complex tool-heavy requests go to Databricks
694
+ # Cost: Mostly FREE (Ollama handles 70-80% of simple requests)
695
+ # Only complex/reasoning requests go to Databricks
693
696
  ```
694
697
 
695
698
  **Benefits:**
696
- - ✅ Mostly FREE (70-80% of requests on Ollama)
699
+ - ✅ Mostly FREE (70-80% of requests on Ollama via TIER_SIMPLE)
697
700
  - ✅ Private embeddings (local search)
698
701
  - ✅ Cloud quality for complex tasks
699
- - ✅ Automatic intelligent routing
702
+ - ✅ Automatic intelligent tier-based routing
700
703
 
701
704
  ---
702
705
 
@@ -704,7 +707,7 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
704
707
 
705
708
  | Aspect | Cursor Native | Lynkr + Cursor |
706
709
  |--------|---------------|----------------|
707
- | **Providers** | OpenAI only | 9+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, etc.) |
710
+ | **Providers** | OpenAI only | 12+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, Moonshot, etc.) |
708
711
  | **Costs** | OpenAI pricing | 60-80% cheaper (or 100% FREE with Ollama) |
709
712
  | **Privacy** | Cloud-only | Can run 100% locally (Ollama + local embeddings) |
710
713
  | **Embeddings** | Built-in (cloud) | 4 options: Ollama (local), llama.cpp (local), OpenRouter (cloud), OpenAI (cloud) |
@@ -73,10 +73,14 @@ services:
73
73
  ports:
74
74
  - "8081:8081"
75
75
  environment:
76
- # Hybrid routing: Ollama first, fallback to cloud
76
+ # Tier-based routing: local for simple, cloud for complex
77
77
  - MODEL_PROVIDER=ollama
78
78
  - OLLAMA_API_BASE=http://ollama:11434
79
- - PREFER_OLLAMA=true
79
+ # Set all 4 TIER_* vars to enable tier-based routing
80
+ - TIER_SIMPLE=ollama:llama3.2
81
+ - TIER_MEDIUM=ollama:llama3.2
82
+ - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
83
+ - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
80
84
  - FALLBACK_ENABLED=true
81
85
  - FALLBACK_PROVIDER=databricks
82
86
  - DATABRICKS_API_BASE=${DATABRICKS_API_BASE}
@@ -452,8 +456,11 @@ environment:
452
456
  - DATABRICKS_API_BASE=https://your-workspace.databricks.com
453
457
  - DATABRICKS_API_KEY=${DATABRICKS_API_KEY}
454
458
 
455
- # Hybrid routing
456
- - PREFER_OLLAMA=true
459
+ # Tier-based routing (set all 4 to enable)
460
+ - TIER_SIMPLE=ollama:llama3.2
461
+ - TIER_MEDIUM=openrouter:openai/gpt-4o-mini
462
+ - TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
463
+ - TIER_REASONING=databricks:databricks-claude-sonnet-4-5
457
464
  - FALLBACK_ENABLED=true
458
465
  - FALLBACK_PROVIDER=databricks
459
466
 
@@ -532,10 +532,12 @@ OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
532
532
  **Best for:** Privacy + Quality + Cost Optimization
533
533
 
534
534
  ```env
535
- # Chat: Ollama + Cloud fallback
536
- PREFER_OLLAMA=true
535
+ # Chat: Tier-based routing (set all 4 to enable)
536
+ TIER_SIMPLE=ollama:llama3.2
537
+ TIER_MEDIUM=openrouter:openai/gpt-4o-mini
538
+ TIER_COMPLEX=databricks:databricks-claude-sonnet-4-5
539
+ TIER_REASONING=databricks:databricks-claude-sonnet-4-5
537
540
  FALLBACK_ENABLED=true
538
- OLLAMA_MODEL=llama3.1:8b
539
541
  FALLBACK_PROVIDER=databricks
540
542
  DATABRICKS_API_BASE=https://your-workspace.databricks.com
541
543
  DATABRICKS_API_KEY=your-key
@@ -547,10 +549,10 @@ OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
547
549
  ```
548
550
 
549
551
  **Benefits:**
550
- - ✅ 70-80% of chat requests FREE (Ollama)
552
+ - ✅ 70-80% of chat requests FREE (Ollama via TIER_SIMPLE)
551
553
  - ✅ 100% private embeddings (local)
552
554
  - ✅ Cloud quality for complex tasks
553
- - ✅ Intelligent automatic routing
555
+ - ✅ Intelligent automatic tier-based routing
554
556
 
555
557
  ---
556
558
 
@@ -8,11 +8,11 @@ Common questions about Lynkr, installation, configuration, and usage.
8
8
 
9
9
  ### What is Lynkr?
10
10
 
11
- Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, etc.) instead of being locked to Anthropic's API.
11
+ Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Moonshot AI, etc.) instead of being locked to Anthropic's API.
12
12
 
13
13
  **Key benefits:**
14
14
  - 💰 **60-80% cost savings** through token optimization
15
- - 🔓 **Provider flexibility** - Choose from 9+ providers
15
+ - 🔓 **Provider flexibility** - Choose from 12+ providers
16
16
  - 🔒 **Privacy** - Run 100% locally with Ollama or llama.cpp
17
17
  - ✅ **Zero code changes** - Drop-in replacement for Anthropic backend
18
18
 
@@ -67,7 +67,7 @@ Lynkr itself is **100% FREE** and open source (Apache 2.0 license).
67
67
 
68
68
  | Feature | Native Claude Code | Lynkr |
69
69
  |---------|-------------------|-------|
70
- | **Providers** | Anthropic only | 9+ providers |
70
+ | **Providers** | Anthropic only | 12+ providers |
71
71
  | **Cost** | Full Anthropic pricing | 60-80% cheaper |
72
72
  | **Local models** | ❌ Cloud-only | ✅ Ollama, llama.cpp |
73
73
  | **Privacy** | ☁️ Cloud | 🔒 Can run 100% locally |
@@ -126,6 +126,11 @@ See [Installation Guide](installation.md) for all methods.
126
126
  - **Setup:** 5 minutes
127
127
  - **Cost:** ~$10-20/month
128
128
 
129
+ **For Affordable Cloud + Reasoning:**
130
+ - ✅ **Moonshot AI** - Kimi K2, thinking models
131
+ - **Setup:** 2 minutes
132
+ - **Cost:** ~$5-10/month
133
+
129
134
  **For Enterprise:**
130
135
  - ✅ **Databricks** - Claude 4.5, enterprise SLA
131
136
  - **Setup:** 10 minutes
@@ -137,23 +142,71 @@ See [Provider Configuration Guide](providers.md) for detailed comparison.
137
142
 
138
143
  ### Can I use multiple providers?
139
144
 
140
- **Yes!** Lynkr supports hybrid routing:
145
+ **Yes!** Lynkr supports tier-based routing:
141
146
 
142
147
  ```bash
143
- # Use Ollama for simple requests, Databricks for complex ones
144
- export PREFER_OLLAMA=true
145
- export OLLAMA_MODEL=llama3.1:8b
148
+ # Set all 4 TIER_* env vars to enable tier-based routing
149
+ export TIER_SIMPLE=ollama:llama3.2
150
+ export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
151
+ export TIER_COMPLEX=azure-openai:gpt-4o
152
+ export TIER_REASONING=azure-openai:gpt-4o
146
153
  export FALLBACK_ENABLED=true
147
154
  export FALLBACK_PROVIDER=databricks
148
155
  ```
149
156
 
150
157
  **How it works:**
151
- - **0-2 tools**: Ollama (free, local, fast)
152
- - **3-15 tools**: OpenRouter (if configured) or fallback
153
- - **16+ tools**: Databricks/Azure (most capable)
154
- - **Ollama failures**: Automatic transparent fallback
158
+ - Each request is scored for complexity (0-100) and mapped to a tier
159
+ - **SIMPLE (0-25)**: Ollama (free, local, fast) or Moonshot (affordable cloud)
160
+ - **MEDIUM (26-50)**: OpenRouter or mid-range cloud model
161
+ - **COMPLEX (51-75)**: Capable cloud models
162
+ - **REASONING (76-100)**: Best available models
163
+ - **Provider failures**: Automatic transparent fallback
164
+
165
+ **Cost savings:** 65-100% for requests routed to local/cheap models.
166
+
167
+ ---
168
+
169
+ ### What is MODEL_PROVIDER and do I still need it?
170
+
171
+ `MODEL_PROVIDER` sets a single static provider for all requests. When you set `MODEL_PROVIDER=ollama`, every request goes to Ollama regardless of complexity.
172
+
173
+ **With TIER_\* vars configured:** `MODEL_PROVIDER` is not used for routing — the tier system picks the provider per-request. However, `MODEL_PROVIDER` is still read for startup checks (e.g. waiting for Ollama) and as a fallback default in edge cases. Keep it set to your most-used provider.
174
+
175
+ **Without TIER_\* vars:** `MODEL_PROVIDER` is the only thing that controls where requests go.
176
+
177
+ ---
178
+
179
+ ### How do MODEL_PROVIDER and TIER_\* work together?
180
+
181
+ They are two separate routing modes:
182
+
183
+ | Scenario | What happens |
184
+ |----------|-------------|
185
+ | `MODEL_PROVIDER` only | Static routing — all requests go to that provider |
186
+ | All 4 `TIER_*` set | Tier routing — TIER_\* **overrides** MODEL_PROVIDER for routing |
187
+ | Only 1-3 `TIER_*` set | Tier routing disabled — falls back to `MODEL_PROVIDER` |
188
+ | Both set | TIER_\* takes priority for routing; MODEL_PROVIDER is kept as a config default |
189
+
190
+ **Example:** If you have `MODEL_PROVIDER=ollama` and `TIER_COMPLEX=databricks:claude-sonnet`, complex requests go to Databricks even though MODEL_PROVIDER says ollama.
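The all-or-nothing activation rule from the table can be sketched as follows (the `routingMode` name is illustrative, not the package's API):

```javascript
// Tier routing activates only when all four TIER_* variables are set;
// otherwise routing falls back to the static MODEL_PROVIDER.
const TIER_KEYS = ["TIER_SIMPLE", "TIER_MEDIUM", "TIER_COMPLEX", "TIER_REASONING"];

function routingMode(env) {
  return TIER_KEYS.every((k) => env[k]) ? "tier" : "static";
}
```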
191
+
192
+ ---
193
+
194
+ ### What happens if I only set some TIER_\* vars?
195
+
196
+ All 4 must be set (`TIER_SIMPLE`, `TIER_MEDIUM`, `TIER_COMPLEX`, `TIER_REASONING`) for tier routing to activate. If any are missing, tier routing is disabled entirely and `MODEL_PROVIDER` is used for all requests.
197
+
198
+ This is intentional — partial tier config could lead to unexpected gaps where some complexity levels have no provider assigned.
199
+
200
+ ---
201
+
202
+ ### What is FALLBACK_PROVIDER?
203
+
204
+ The fallback provider is a safety net for when the tier-selected provider fails (timeout, connection refused, rate limit). If `FALLBACK_ENABLED=true` and the primary provider for a request fails, Lynkr retries the request against `FALLBACK_PROVIDER` transparently.
155
205
 
156
- **Cost savings:** 65-100% for requests that stay on Ollama.
206
+ - Only triggers when tier routing is active
207
+ - Cannot be a local provider (ollama, llamacpp, lmstudio) — use cloud providers
208
+ - Defaults to `databricks`
209
+ - If you don't have cloud credentials, set `FALLBACK_ENABLED=false`
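The retry behavior described above can be sketched as follows. This is a simplified sketch: `callProvider` is a hypothetical stand-in for the real provider client, and error classification (timeout vs. rate limit) is omitted.

```javascript
// Transparent fallback: retry a failed request against FALLBACK_PROVIDER.
function withFallback(request, primary, callProvider, env = process.env) {
  try {
    return callProvider(primary, request);
  } catch (err) {
    if (env.FALLBACK_ENABLED !== "true") throw err;         // no safety net configured
    const fallback = env.FALLBACK_PROVIDER || "databricks"; // documented default
    return callProvider(fallback, request);                 // transparent retry
  }
}
```

The caller never sees the primary failure when the fallback succeeds, which is what "transparent" means here.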
157
210
 
158
211
  ---
159
212
 
@@ -227,6 +280,7 @@ See [Embeddings Guide](embeddings.md) for details.
227
280
  | **OpenRouter** | 500ms-2s | $-$$ | Excellent | Flexibility, 100+ models |
228
281
  | **Databricks/Azure** | 500ms-2s | $$$ | Excellent | Enterprise, Claude 4.5 |
229
282
  | **AWS Bedrock** | 500ms-2s | $-$$$ | Excellent* | AWS, 100+ models |
283
+ | **Moonshot AI** | 500ms-2s | $ | Good | Affordable, thinking models |
230
284
  | **OpenAI** | 500ms-2s | $$ | Excellent | GPT-4o, o1, o3 |
231
285
 
232
286
  _* Tool calling only supported by Claude models on Bedrock_