lynkr 8.0.0 → 9.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.lynkr/telemetry.db +0 -0
- package/.lynkr/telemetry.db-shm +0 -0
- package/.lynkr/telemetry.db-wal +0 -0
- package/README.md +196 -322
- package/lynkr-skill.tar.gz +0 -0
- package/package.json +4 -3
- package/src/api/openai-router.js +64 -13
- package/src/api/providers-handler.js +171 -3
- package/src/api/router.js +9 -2
- package/src/clients/circuit-breaker.js +10 -247
- package/src/clients/codex-process.js +342 -0
- package/src/clients/codex-utils.js +143 -0
- package/src/clients/databricks.js +210 -63
- package/src/clients/resilience.js +540 -0
- package/src/clients/retry.js +22 -167
- package/src/clients/standard-tools.js +23 -0
- package/src/config/index.js +77 -0
- package/src/context/compression.js +42 -9
- package/src/context/distill.js +492 -0
- package/src/orchestrator/index.js +48 -8
- package/src/routing/complexity-analyzer.js +258 -5
- package/src/routing/index.js +12 -2
- package/src/routing/latency-tracker.js +148 -0
- package/src/routing/model-tiers.js +2 -0
- package/src/routing/quality-scorer.js +113 -0
- package/src/routing/telemetry.js +464 -0
- package/src/server.js +13 -12
- package/src/tools/code-graph.js +538 -0
- package/src/tools/code-mode.js +304 -0
- package/src/tools/index.js +4 -0
- package/src/tools/lazy-loader.js +18 -0
- package/src/tools/mcp-remote.js +7 -0
- package/src/tools/smart-selection.js +11 -0
- package/src/tools/tinyfish.js +358 -0
- package/src/tools/truncate.js +1 -0
- package/src/utils/payload.js +206 -0
- package/src/utils/perf-timer.js +80 -0
- package/.github/FUNDING.yml +0 -15
- package/.github/workflows/README.md +0 -215
- package/.github/workflows/ci.yml +0 -69
- package/.github/workflows/index.yml +0 -62
- package/.github/workflows/web-tools-tests.yml +0 -56
- package/CITATIONS.bib +0 -6
- package/DEPLOYMENT.md +0 -1001
- package/LYNKR-TUI-PLAN.md +0 -984
- package/PERFORMANCE-REPORT.md +0 -866
- package/PLAN-per-client-model-routing.md +0 -252
- package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
- package/docs/BingSiteAuth.xml +0 -4
- package/docs/docs-style.css +0 -478
- package/docs/docs.html +0 -198
- package/docs/google5be250e608e6da39.html +0 -1
- package/docs/index.html +0 -577
- package/docs/index.md +0 -584
- package/docs/robots.txt +0 -4
- package/docs/sitemap.xml +0 -44
- package/docs/style.css +0 -1223
- package/docs/toon-integration-spec.md +0 -130
- package/documentation/README.md +0 -101
- package/documentation/api.md +0 -806
- package/documentation/claude-code-cli.md +0 -679
- package/documentation/codex-cli.md +0 -397
- package/documentation/contributing.md +0 -571
- package/documentation/cursor-integration.md +0 -734
- package/documentation/docker.md +0 -874
- package/documentation/embeddings.md +0 -762
- package/documentation/faq.md +0 -713
- package/documentation/features.md +0 -403
- package/documentation/headroom.md +0 -519
- package/documentation/installation.md +0 -758
- package/documentation/memory-system.md +0 -476
- package/documentation/production.md +0 -636
- package/documentation/providers.md +0 -1009
- package/documentation/routing.md +0 -476
- package/documentation/testing.md +0 -629
- package/documentation/token-optimization.md +0 -325
- package/documentation/tools.md +0 -697
- package/documentation/troubleshooting.md +0 -969
- package/final-test.js +0 -33
- package/headroom-sidecar/config.py +0 -93
- package/headroom-sidecar/requirements.txt +0 -14
- package/headroom-sidecar/server.py +0 -451
- package/monitor-agents.sh +0 -31
- package/scripts/audit-log-reader.js +0 -399
- package/scripts/compact-dictionary.js +0 -204
- package/scripts/test-deduplication.js +0 -448
- package/src/db/database.sqlite +0 -0
- package/te +0 -11622
- package/test/README.md +0 -212
- package/test/azure-openai-config.test.js +0 -213
- package/test/azure-openai-error-resilience.test.js +0 -238
- package/test/azure-openai-format-conversion.test.js +0 -354
- package/test/azure-openai-integration.test.js +0 -287
- package/test/azure-openai-routing.test.js +0 -175
- package/test/azure-openai-streaming.test.js +0 -171
- package/test/bedrock-integration.test.js +0 -457
- package/test/comprehensive-test-suite.js +0 -928
- package/test/config-validation.test.js +0 -207
- package/test/cursor-integration.test.js +0 -484
- package/test/format-conversion.test.js +0 -578
- package/test/hybrid-routing-integration.test.js +0 -269
- package/test/hybrid-routing-performance.test.js +0 -428
- package/test/llamacpp-integration.test.js +0 -882
- package/test/lmstudio-integration.test.js +0 -347
- package/test/memory/extractor.test.js +0 -398
- package/test/memory/retriever.test.js +0 -613
- package/test/memory/retriever.test.js.bak +0 -585
- package/test/memory/search.test.js +0 -537
- package/test/memory/search.test.js.bak +0 -389
- package/test/memory/store.test.js +0 -344
- package/test/memory/store.test.js.bak +0 -312
- package/test/memory/surprise.test.js +0 -300
- package/test/memory-performance.test.js +0 -472
- package/test/openai-integration.test.js +0 -683
- package/test/openrouter-error-resilience.test.js +0 -418
- package/test/passthrough-mode.test.js +0 -385
- package/test/performance-benchmark.js +0 -351
- package/test/performance-tests.js +0 -528
- package/test/routing.test.js +0 -225
- package/test/toon-compression.test.js +0 -131
- package/test/web-tools.test.js +0 -329
- package/test-agents-simple.js +0 -43
- package/test-cli-connection.sh +0 -33
- package/test-learning-unit.js +0 -126
- package/test-learning.js +0 -112
- package/test-parallel-agents.sh +0 -124
- package/test-parallel-direct.js +0 -155
- package/test-subagents.sh +0 -117
@@ -1,1009 +0,0 @@

# Provider Configuration Guide

Complete configuration reference for all 12+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.

---

## Overview

Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:

| Provider | Type | Models | Cost | Privacy | Setup Complexity |
|----------|------|--------|------|---------|------------------|
| **AWS Bedrock** | Cloud | 100+ (Claude, DeepSeek, Qwen, Nova, Titan, Llama, Mistral) | $-$$$ | Cloud | Easy |
| **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud | Medium |
| **OpenRouter** | Cloud | 100+ (GPT, Claude, Gemini, Llama, Mistral, etc.) | $-$$ | Cloud | Easy |
| **Ollama** | Local | Unlimited (free, offline) | **FREE** | 🔒 100% Local | Easy |
| **llama.cpp** | Local | Any GGUF model | **FREE** | 🔒 100% Local | Medium |
| **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
| **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
| **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
| **Moonshot AI (Kimi)** | Cloud | Kimi K2 (thinking + turbo) | $ | Cloud | Easy |
| **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
| **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |

---

## Configuration Methods

There are two routing modes. Choose based on your needs:

### Static Routing (Single Provider)

Set `MODEL_PROVIDER` to send every request to a single provider, regardless of complexity:

```bash
export MODEL_PROVIDER=databricks
export DATABRICKS_API_BASE=https://your-workspace.databricks.com
export DATABRICKS_API_KEY=your-key
lynkr start
```

### Tier-Based Routing (Recommended for Cost Optimization)

Set **all 4** `TIER_*` vars to route requests by complexity. Each request is scored 0-100 and routed to the `provider:model` matching its complexity tier. When all four are configured, they **override** `MODEL_PROVIDER` for routing decisions:

```bash
export MODEL_PROVIDER=ollama                      # Still needed for startup checks
export TIER_SIMPLE=ollama:llama3.2                # Score 0-25 → local (free)
export TIER_MEDIUM=openrouter:openai/gpt-4o-mini  # Score 26-50 → affordable cloud
export TIER_COMPLEX=databricks:claude-sonnet      # Score 51-75 → capable cloud
export TIER_REASONING=databricks:claude-sonnet    # Score 76-100 → best available
lynkr start
```

> **Important:** All 4 `TIER_*` vars must be set to enable tier routing. If any are missing, tier routing is disabled and `MODEL_PROVIDER` is used for all requests. `MODEL_PROVIDER` should always be set — even with tier routing active, it is used for startup checks, provider discovery, and as the default provider when a `TIER_*` value has no `provider:` prefix (a minimal sketch of this rule follows below).
>
> **`PREFER_OLLAMA` is deprecated** and has no effect. Use `TIER_SIMPLE=ollama:<model>` to route simple requests to Ollama. See [Routing Precedence](routing.md#routing-precedence) for full details.
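
The `provider:model` split deserves care because model names can themselves contain colons (e.g. `ollama:llama3.1:8b`). Here is a minimal sketch of the rule, assuming a split on the first colon with fallback to `MODEL_PROVIDER`; this is a hypothetical helper, not Lynkr's actual implementation:

```javascript
// Hypothetical sketch of TIER_* parsing; not Lynkr's real code.
// Splits "provider:model" on the FIRST colon so model names that
// contain colons (e.g. "ollama:llama3.1:8b") survive intact.
function parseTierValue(value, defaultProvider) {
  const idx = value.indexOf(":");
  if (idx === -1) {
    // No "provider:" prefix, so fall back to MODEL_PROVIDER.
    return { provider: defaultProvider, model: value };
  }
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}

// parseTierValue("ollama:llama3.1:8b", "databricks")
//   → { provider: "ollama", model: "llama3.1:8b" }
// parseTierValue("claude-sonnet", "databricks")
//   → { provider: "databricks", model: "claude-sonnet" }
```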

### .env File (Recommended for Production)

```bash
# Copy example file
cp .env.example .env

# Edit with your credentials
nano .env
```

Example `.env`:
```env
MODEL_PROVIDER=ollama
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
PORT=8081
LOG_LEVEL=info

# Tier routing (optional — set all 4 to enable)
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
TIER_COMPLEX=databricks:claude-sonnet
TIER_REASONING=databricks:claude-sonnet
```

---

## Remote/Network Configuration

**All provider endpoints support remote addresses** - you're not limited to `localhost`. This enables powerful setups like:

- 🖥️ **GPU Server**: Run Ollama/llama.cpp on a dedicated GPU machine
- 🏢 **Team Sharing**: Multiple developers using one Lynkr instance
- ☁️ **Hybrid**: Lynkr on local machine, models on cloud VM

### Examples

**Ollama on Remote GPU Server**
```env
OLLAMA_ENDPOINT=http://192.168.1.100:11434        # Local network IP
# or
OLLAMA_ENDPOINT=http://gpu-server.local:11434     # Hostname
# or
OLLAMA_ENDPOINT=http://ollama.mycompany.com:11434 # Domain
```

**llama.cpp on Remote Machine**
```env
MODEL_PROVIDER=llamacpp
LLAMACPP_ENDPOINT=http://10.0.0.50:8080
```

**LM Studio on Another Computer**
```env
MODEL_PROVIDER=lmstudio
LMSTUDIO_ENDPOINT=http://workstation.local:1234
```

### Network Requirements

| Setup | Requirement |
|-------|-------------|
| Same machine | `localhost` or `127.0.0.1` |
| Local network | IP address or hostname, firewall allows port |
| Remote/Internet | Public IP/domain, port forwarding, consider VPN/auth |

> ⚠️ **Security Note**: When exposing endpoints over a network, ensure proper firewall rules and consider using a VPN or SSH tunnel for sensitive deployments.
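
Before wiring a remote endpoint into Lynkr, it is worth confirming the endpoint is reachable from the machine that will run Lynkr. A minimal sketch, assuming Node 18+ for the built-in `fetch`; `/api/tags` is Ollama's model-list route, so adjust the path for other providers:

```javascript
// Minimal reachability check, assuming Node 18+ (built-in fetch).
// /api/tags is Ollama's model-list endpoint; llama-server exposes /health.
const endpoint = process.env.OLLAMA_ENDPOINT || "http://localhost:11434";

async function checkEndpoint() {
  try {
    const res = await fetch(`${endpoint}/api/tags`, {
      signal: AbortSignal.timeout(5000), // fail fast on firewalled hosts
    });
    console.log(res.ok ? "endpoint reachable" : `HTTP ${res.status}`);
  } catch (err) {
    console.error(`cannot reach ${endpoint}: ${err.message}`);
  }
}

checkEndpoint();
```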

---

## Provider-Specific Configuration

### 1. AWS Bedrock (100+ Models)

**Best for:** AWS ecosystem, multi-model flexibility, Claude + alternatives

#### Configuration

```env
MODEL_PROVIDER=bedrock
AWS_BEDROCK_API_KEY=your-bearer-token
AWS_BEDROCK_REGION=us-east-1
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
```

#### Getting AWS Bedrock API Key

1. Log in to [AWS Console](https://console.aws.amazon.com/)
2. Navigate to **Bedrock** → **API Keys**
3. Click **Generate API Key**
4. Copy the bearer token (this is your `AWS_BEDROCK_API_KEY`)
5. Enable model access in Bedrock console
6. See: [AWS Bedrock API Keys Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-generate.html)

#### Available Regions

- `us-east-1` (N. Virginia) - Most models available
- `us-west-2` (Oregon)
- `us-east-2` (Ohio)
- `ap-southeast-1` (Singapore)
- `ap-northeast-1` (Tokyo)
- `eu-central-1` (Frankfurt)

#### Model Catalog

**Claude Models (Best for Tool Calling)** ✅

Claude 4.5 (latest - requires inference profiles):
```env
AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0      # Regional US
AWS_BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0       # Fast, efficient
AWS_BEDROCK_MODEL_ID=global.anthropic.claude-sonnet-4-5-20250929-v1:0  # Cross-region
```

Claude 3.x models:
```env
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0  # Excellent tool calling
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-opus-20240229-v1:0      # Most capable
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0     # Fast, cheap
```

**DeepSeek Models (NEW - 2025)**
```env
AWS_BEDROCK_MODEL_ID=us.deepseek.r1-v1:0  # DeepSeek R1 - reasoning model (o1-style)
```

**Qwen Models (Alibaba - NEW 2025)**
```env
AWS_BEDROCK_MODEL_ID=qwen.qwen3-235b-a22b-2507-v1:0   # Largest, 235B parameters
AWS_BEDROCK_MODEL_ID=qwen.qwen3-32b-v1:0              # Balanced, 32B
AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-480b-a35b-v1:0  # Coding specialist, 480B
AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-30b-a3b-v1:0    # Coding, smaller
```

**OpenAI Open-Weight Models (NEW - 2025)**
```env
AWS_BEDROCK_MODEL_ID=openai.gpt-oss-120b-1:0  # 120B parameters, open-weight
AWS_BEDROCK_MODEL_ID=openai.gpt-oss-20b-1:0   # 20B parameters, efficient
```

**Google Gemma Models (Open-Weight)**
```env
AWS_BEDROCK_MODEL_ID=google.gemma-3-27b  # 27B parameters
AWS_BEDROCK_MODEL_ID=google.gemma-3-12b  # 12B parameters
AWS_BEDROCK_MODEL_ID=google.gemma-3-4b   # 4B parameters, efficient
```

**Amazon Models**

Nova (multimodal):
```env
AWS_BEDROCK_MODEL_ID=us.amazon.nova-pro-v1:0    # Best quality, multimodal, 300K context
AWS_BEDROCK_MODEL_ID=us.amazon.nova-lite-v1:0   # Fast, cost-effective
AWS_BEDROCK_MODEL_ID=us.amazon.nova-micro-v1:0  # Ultra-fast, text-only
```

Titan:
```env
AWS_BEDROCK_MODEL_ID=amazon.titan-text-premier-v1:0  # Largest
AWS_BEDROCK_MODEL_ID=amazon.titan-text-express-v1    # Fast
AWS_BEDROCK_MODEL_ID=amazon.titan-text-lite-v1       # Cheapest
```

**Meta Llama Models**
```env
AWS_BEDROCK_MODEL_ID=meta.llama3-1-70b-instruct-v1:0  # Most capable
AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0   # Fast, efficient
```

**Mistral Models**
```env
AWS_BEDROCK_MODEL_ID=mistral.mistral-large-2407-v1:0     # Largest, coding, multilingual
AWS_BEDROCK_MODEL_ID=mistral.mistral-small-2402-v1:0     # Efficient
AWS_BEDROCK_MODEL_ID=mistral.mixtral-8x7b-instruct-v0:1  # Mixture of experts
```

**Cohere Command Models**
```env
AWS_BEDROCK_MODEL_ID=cohere.command-r-plus-v1:0  # Best for RAG, search
AWS_BEDROCK_MODEL_ID=cohere.command-r-v1:0       # Balanced
```

**AI21 Jamba Models**
```env
AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-large-v1:0  # Hybrid architecture, 256K context
AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-mini-v1:0   # Fast
```

#### Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| Titan Text Express | $0.20 | $0.60 |
| Llama 3 70B | $0.99 | $0.99 |
| Nova Pro | $0.80 | $3.20 |

#### Important Notes

⚠️ **Tool Calling:** Only **Claude models** support tool calling on Bedrock. Other models work via the Converse API but won't use Read/Write/Bash tools.

📖 **Full Documentation:** See [BEDROCK_MODELS.md](../BEDROCK_MODELS.md) for the complete model catalog with capabilities and use cases.

---

### 2. Databricks (Claude Sonnet 4.5, Opus 4.5)

**Best for:** Enterprise production use, managed Claude endpoints

#### Configuration

```env
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
```

Optional endpoint path override:
```env
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
```

#### Getting Databricks Credentials

1. Log in to your Databricks workspace
2. Navigate to **Settings** → **User Settings**
3. Click **Generate New Token**
4. Copy the token (this is your `DATABRICKS_API_KEY`)
5. Your workspace URL is the base URL (e.g., `https://your-workspace.cloud.databricks.com`)

#### Available Models

- **Claude Sonnet 4.5** - Excellent for tool calling, balanced performance
- **Claude Opus 4.5** - Most capable model for complex reasoning

#### Pricing

Contact Databricks for enterprise pricing.

---

### 3. OpenRouter (100+ Models)

**Best for:** Quick setup, model flexibility, cost optimization

#### Configuration

```env
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
```

Optional for hybrid routing:
```env
OPENROUTER_MAX_TOOLS_FOR_ROUTING=15  # Max tools to route to OpenRouter
```

#### Getting OpenRouter API Key

1. Visit [openrouter.ai](https://openrouter.ai)
2. Sign in with GitHub, Google, or email
3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
4. Create a new API key
5. Add credits (pay-as-you-go, no subscription required)

#### Popular Models

**Claude Models (Best for Coding)**
```env
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet  # $3/$15 per 1M tokens
OPENROUTER_MODEL=anthropic/claude-opus-4.5    # $15/$75 per 1M tokens
OPENROUTER_MODEL=anthropic/claude-3-haiku     # $0.25/$1.25 per 1M tokens
```

**OpenAI Models**
```env
OPENROUTER_MODEL=openai/gpt-4o       # $2.50/$10 per 1M tokens
OPENROUTER_MODEL=openai/gpt-4o-mini  # $0.15/$0.60 per 1M tokens (default)
OPENROUTER_MODEL=openai/o1-preview   # $15/$60 per 1M tokens
OPENROUTER_MODEL=openai/o1-mini      # $3/$12 per 1M tokens
```

**Google Models**
```env
OPENROUTER_MODEL=google/gemini-pro-1.5    # $1.25/$5 per 1M tokens
OPENROUTER_MODEL=google/gemini-flash-1.5  # $0.075/$0.30 per 1M tokens
```

**Meta Llama Models**
```env
OPENROUTER_MODEL=meta-llama/llama-3.1-405b  # $2.70/$2.70 per 1M tokens
OPENROUTER_MODEL=meta-llama/llama-3.1-70b   # $0.52/$0.75 per 1M tokens
OPENROUTER_MODEL=meta-llama/llama-3.1-8b    # $0.06/$0.06 per 1M tokens
```

**Mistral Models**
```env
OPENROUTER_MODEL=mistralai/mistral-large     # $2/$6 per 1M tokens
OPENROUTER_MODEL=mistralai/codestral-latest  # $0.30/$0.90 per 1M tokens
```

**DeepSeek Models**
```env
OPENROUTER_MODEL=deepseek/deepseek-chat   # $0.14/$0.28 per 1M tokens
OPENROUTER_MODEL=deepseek/deepseek-coder  # $0.14/$0.28 per 1M tokens
```

#### Benefits

- ✅ **100+ models** through one API
- ✅ **Automatic fallbacks** if primary model unavailable
- ✅ **Competitive pricing** with volume discounts
- ✅ **Full tool calling support**
- ✅ **No monthly fees** - pay only for usage
- ✅ **Rate limit pooling** across models

See [openrouter.ai/models](https://openrouter.ai/models) for the complete list with pricing.

---

### 4. Ollama (Local Models)

**Best for:** Local development, privacy, offline use, no API costs

#### Configuration

```env
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434  # Or any remote IP/hostname
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TIMEOUT_MS=120000
```

> 🌐 **Remote Support**: `OLLAMA_ENDPOINT` can be any address - `http://192.168.1.100:11434`, `http://gpu-server:11434`, etc. See [Remote/Network Configuration](#remotenetwork-configuration).

#### Performance Optimization

**Prevent Cold Starts:** Ollama unloads models after 5 minutes of inactivity by default. This causes slow first requests (10-30+ seconds) while the model reloads. To keep models loaded:

**Option 1: Environment Variable (Recommended)**
```bash
# Set on the Ollama server (not Lynkr)

# macOS
launchctl setenv OLLAMA_KEEP_ALIVE "24h"

# Linux (systemd) - edit with: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_KEEP_ALIVE=24h"

# Docker
docker run -e OLLAMA_KEEP_ALIVE=24h -d ollama/ollama
```

**Option 2: Per-Request Keep Alive**
```bash
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","keep_alive":"24h"}'
```

**Keep Alive Values:**

| Value | Behavior |
|-------|----------|
| `5m` | Default - unload after 5 minutes |
| `24h` | Keep loaded for 24 hours |
| `-1` | Never unload (keep forever) |
| `0` | Unload immediately after request |

#### Installation & Setup

```bash
# Install Ollama
brew install ollama  # macOS
# Or download from: https://ollama.ai/download

# Start Ollama service
ollama serve

# Pull a model
ollama pull llama3.1:8b

# Verify model is available
ollama list
```

#### Recommended Models

**For Tool Calling** ✅ (Required for Claude Code CLI)
```bash
ollama pull llama3.1:8b          # Good balance (4.7GB)
ollama pull llama3.2             # Latest Llama (4.7GB)
ollama pull qwen2.5:14b          # Strong reasoning (8GB; 7b struggles with tools)
ollama pull mistral:7b-instruct  # Fast and capable (4.1GB)
```

**NOT Recommended for Tools** ❌
```bash
qwen2.5-coder  # Code-only, slow with tool calling
codellama      # Code-only, poor tool support
```

#### Tool Calling Support

Lynkr supports **native tool calling** for compatible Ollama models (see the sketch below):

- ✅ **Supported models**: llama3.1, llama3.2, qwen2.5, mistral, mistral-nemo
- ✅ **Automatic detection**: Lynkr detects tool-capable models
- ✅ **Format conversion**: Transparent Anthropic ↔ Ollama conversion
- ❌ **Unsupported models**: llama3, older models (tools filtered automatically)
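
As a rough illustration of what such detection can look like, here is a hypothetical name-prefix allowlist check; Lynkr's actual detection logic may differ:

```javascript
// Sketch of tool-capability detection via a model-family allowlist.
// Hypothetical helper; Lynkr's real detection may differ.
const TOOL_CAPABLE_FAMILIES = [
  "llama3.1", "llama3.2", "qwen2.5", "mistral", "mistral-nemo",
];

function supportsTools(modelName) {
  // Compare against the base name, ignoring size/quant tags like ":8b".
  const base = modelName.split(":")[0].toLowerCase();
  return TOOL_CAPABLE_FAMILIES.some((family) => base === family);
}

// supportsTools("llama3.1:8b") → true
// supportsTools("llama3")      → false (tools would be filtered)
```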

#### Pricing

**100% FREE** - Models run on your hardware with no API costs.

#### Model Sizes

- **7B models**: ~4-5GB download, 8GB RAM required
- **8B models**: ~4.7GB download, 8GB RAM required
- **14B models**: ~8GB download, 16GB RAM required
- **32B models**: ~18GB download, 32GB RAM required

---

### 5. llama.cpp (GGUF Models)

**Best for:** Maximum performance, custom quantization, any GGUF model

#### Configuration

```env
MODEL_PROVIDER=llamacpp
LLAMACPP_ENDPOINT=http://localhost:8080  # Or any remote IP/hostname
LLAMACPP_MODEL=qwen2.5-coder-7b
LLAMACPP_TIMEOUT_MS=120000
```

Optional API key (for secured servers):
```env
LLAMACPP_API_KEY=your-optional-api-key
```

> 🌐 **Remote Support**: `LLAMACPP_ENDPOINT` can be any address. See [Remote/Network Configuration](#remotenetwork-configuration).

#### Installation & Setup

```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Download a GGUF model (example: Qwen2.5-Coder-7B)
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf

# Start llama-server
./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080

# Verify server is running
curl http://localhost:8080/health
```

#### GPU Support

llama.cpp supports multiple GPU backends:

- **CUDA** (NVIDIA): `make LLAMA_CUDA=1`
- **Metal** (Apple Silicon): `make LLAMA_METAL=1`
- **ROCm** (AMD): `make LLAMA_ROCM=1`
- **Vulkan** (Universal): `make LLAMA_VULKAN=1`

#### llama.cpp vs Ollama

| Feature | Ollama | llama.cpp |
|---------|--------|-----------|
| Setup | Easy (app) | Manual (compile/download) |
| Model Format | Ollama-specific | Any GGUF model |
| Performance | Good | **Excellent** (optimized C++) |
| GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
| Memory Usage | Higher | **Lower** (quantization options) |
| API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
| Flexibility | Limited models | **Any GGUF** from HuggingFace |
| Tool Calling | Limited models | Grammar-based, more reliable |

**Choose llama.cpp when you need:**
- Maximum performance
- Specific quantization options (Q4, Q5, Q8)
- GGUF models not available in Ollama
- Fine-grained control over inference parameters

---

### 6. Azure OpenAI

**Best for:** Azure integration, Microsoft ecosystem, GPT-4o, o1, o3

#### Configuration

```env
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2025-01-01-preview
AZURE_OPENAI_API_KEY=your-azure-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o
```

Optional:
```env
AZURE_OPENAI_API_VERSION=2024-08-01-preview  # Latest stable version
```

#### Getting Azure OpenAI Credentials

1. Log in to [Azure Portal](https://portal.azure.com)
2. Navigate to **Azure OpenAI** service
3. Go to **Keys and Endpoint**
4. Copy **KEY 1** (this is your API key)
5. Copy **Endpoint** URL
6. Create a deployment (gpt-4o, gpt-4o-mini, etc.)

#### Important: Full Endpoint URL Required

The `AZURE_OPENAI_ENDPOINT` must include:
- Resource name
- Deployment path
- API version query parameter

**Example:**
```
https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview
```
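
One way to verify all three parts are present is to assemble the URL from its pieces. An illustrative sketch (the resource, deployment, and version values are placeholders):

```javascript
// Illustrative sketch: compose a full Azure OpenAI endpoint URL from
// its three required parts. Resource/deployment names are placeholders.
function azureOpenAIEndpoint(resource, deployment, apiVersion) {
  return (
    `https://${resource}.openai.azure.com` +       // resource name
    `/openai/deployments/${deployment}` +          // deployment path
    `/chat/completions?api-version=${apiVersion}`  // API version query param
  );
}

// azureOpenAIEndpoint("your-resource", "gpt-4o", "2025-01-01-preview")
//   → "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview"
```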

#### Available Deployments

You can deploy any of these models in Azure AI Foundry:

```env
AZURE_OPENAI_DEPLOYMENT=gpt-4o       # Latest GPT-4o
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini  # Smaller, faster, cheaper
AZURE_OPENAI_DEPLOYMENT=gpt-5-chat   # GPT-5 (if available)
AZURE_OPENAI_DEPLOYMENT=o1-preview   # Reasoning model
AZURE_OPENAI_DEPLOYMENT=o3-mini      # Latest reasoning model
AZURE_OPENAI_DEPLOYMENT=kimi-k2      # Kimi K2 (if available)
```

---

### 7. Azure Anthropic

**Best for:** Azure-hosted Claude models with enterprise integration

#### Configuration

```env
MODEL_PROVIDER=azure-anthropic
AZURE_ANTHROPIC_ENDPOINT=https://your-resource.services.ai.azure.com/anthropic/v1/messages
AZURE_ANTHROPIC_API_KEY=your-azure-api-key
AZURE_ANTHROPIC_VERSION=2023-06-01
```

#### Getting Azure Anthropic Credentials

1. Log in to [Azure Portal](https://portal.azure.com)
2. Navigate to your Azure Anthropic resource
3. Go to **Keys and Endpoint**
4. Copy the API key
5. Copy the endpoint URL (includes `/anthropic/v1/messages`)

#### Available Models

- **Claude Sonnet 4.5** - Best for tool calling, balanced
- **Claude Opus 4.5** - Most capable for complex reasoning

---

### 8. OpenAI (Direct)

**Best for:** Direct OpenAI API access, lowest latency

#### Configuration

```env
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_MODEL=gpt-4o
OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
```

Optional for organization-level keys:
```env
OPENAI_ORGANIZATION=org-your-org-id
```

#### Getting OpenAI API Key

1. Visit [platform.openai.com](https://platform.openai.com)
2. Sign up or log in
3. Go to [API Keys](https://platform.openai.com/api-keys)
4. Create a new API key
5. Add credits to your account (pay-as-you-go)

#### Available Models

```env
OPENAI_MODEL=gpt-4o       # Latest GPT-4o ($2.50/$10 per 1M)
OPENAI_MODEL=gpt-4o-mini  # Smaller, faster ($0.15/$0.60 per 1M)
OPENAI_MODEL=gpt-4-turbo  # GPT-4 Turbo
OPENAI_MODEL=o1-preview   # Reasoning model
OPENAI_MODEL=o1-mini      # Smaller reasoning model
```

#### Benefits

- ✅ **Direct API access** - No intermediaries, lowest latency
- ✅ **Full tool calling support** - Excellent function calling
- ✅ **Parallel tool calls** - Execute multiple tools simultaneously
- ✅ **Organization support** - Use org-level API keys
- ✅ **Simple setup** - Just one API key needed

---

### 9. LM Studio (Local with GUI)

**Best for:** Local models with graphical interface

#### Configuration

```env
MODEL_PROVIDER=lmstudio
LMSTUDIO_ENDPOINT=http://localhost:1234
LMSTUDIO_MODEL=default
LMSTUDIO_TIMEOUT_MS=120000
```

Optional API key (for secured servers):
```env
LMSTUDIO_API_KEY=your-optional-api-key
```

#### Setup

1. Download and install [LM Studio](https://lmstudio.ai)
2. Launch LM Studio
3. Download a model (e.g., Qwen2.5-Coder-7B, Llama 3.1)
4. Click **Start Server** (default port: 1234)
5. Configure Lynkr to use LM Studio

#### Benefits

- ✅ **Graphical interface** for model management
- ✅ **Easy model downloads** from HuggingFace
- ✅ **Built-in server** with OpenAI-compatible API
- ✅ **GPU acceleration** support
- ✅ **Model presets** and configurations

---

### 10. Moonshot AI / Kimi (OpenAI-Compatible)

**Best for:** Affordable cloud models, thinking/reasoning models, OpenAI-compatible API

#### Configuration

```env
MODEL_PROVIDER=moonshot
MOONSHOT_API_KEY=sk-your-moonshot-api-key
MOONSHOT_ENDPOINT=https://api.moonshot.ai/v1/chat/completions
MOONSHOT_MODEL=kimi-k2-turbo-preview
```

#### Getting Moonshot API Key

1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
2. Sign up or log in
3. Navigate to the API Keys section
4. Create a new API key
5. Add credits to your account

#### Available Models

```env
MOONSHOT_MODEL=kimi-k2-turbo-preview  # Fast, efficient (recommended)
MOONSHOT_MODEL=kimi-k2-thinking       # Chain-of-thought reasoning model
```

**Model Details:**

| Model | Type | Best For |
|-------|------|----------|
| `kimi-k2-turbo-preview` | Standard | Fast responses, tool calling, general tasks |
| `kimi-k2-thinking` | Thinking/Reasoning | Complex analysis, multi-step reasoning |

#### How It Works

Moonshot uses an **OpenAI-compatible** chat completions API. Lynkr handles all format conversion automatically (a rough sketch follows this list):

1. Claude Code CLI sends an Anthropic-format request to Lynkr
2. Lynkr converts Anthropic messages → OpenAI chat completions format
3. The request is sent to Moonshot's `/v1/chat/completions` endpoint
4. The Moonshot response is converted back to Anthropic format
5. Claude Code CLI receives a standard Anthropic response
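
A minimal sketch of step 2, assuming text-only messages; Lynkr's real converter also handles tool calls and structured content blocks:

```javascript
// Minimal sketch of Anthropic → OpenAI message conversion for
// text-only requests. Illustrative only; Lynkr's real converter
// also handles tool calls and content blocks.
function anthropicToOpenAI(anthropicReq) {
  const messages = [];

  // Anthropic carries the system prompt as a top-level field;
  // OpenAI expects it as the first message with role "system".
  if (anthropicReq.system) {
    messages.push({ role: "system", content: anthropicReq.system });
  }

  for (const msg of anthropicReq.messages) {
    // Anthropic content may be a plain string or an array of blocks.
    const text = Array.isArray(msg.content)
      ? msg.content.filter((b) => b.type === "text").map((b) => b.text).join("\n")
      : msg.content;
    messages.push({ role: msg.role, content: text });
  }

  return {
    model: process.env.MOONSHOT_MODEL || "kimi-k2-turbo-preview",
    messages,
    max_tokens: anthropicReq.max_tokens,
  };
}
```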

#### Thinking Model Support

When using `kimi-k2-thinking`, the model returns both `reasoning_content` (chain-of-thought) and `content` (final answer). Lynkr automatically extracts only the final answer for clean CLI output. The reasoning content is used as a fallback only when the final answer is empty.
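
In sketch form, with field names following the OpenAI-style response shape described above (illustrative only):

```javascript
// Sketch of final-answer extraction for thinking models.
// Illustrative only; mirrors the fallback rule described above.
function extractAnswer(choice) {
  const { content, reasoning_content } = choice.message;
  // Prefer the final answer; fall back to the chain-of-thought
  // only when the final answer is empty.
  return content && content.trim() ? content : (reasoning_content || "");
}
```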

#### Important Notes

- **Streaming:** Streaming is disabled for Moonshot (responses arrive as complete JSON). This ensures clean terminal rendering, since OpenAI SSE → Anthropic SSE conversion is not yet implemented.
- **Rate Limits:** Moonshot has a max concurrency of ~3 requests. Lynkr retries with backoff on 429 errors.
- **Tool Calling:** Full tool calling support via the OpenAI function calling format (automatically converted from Anthropic format).
- **System Messages:** Moonshot natively supports the `system` role, so system prompts are passed directly.

#### Benefits

- ✅ **Affordable** — Competitive pricing for capable models
- ✅ **Thinking models** — Chain-of-thought reasoning with `kimi-k2-thinking`
- ✅ **Full tool calling** — Native function calling support
- ✅ **OpenAI-compatible** — Standard chat completions API
- ✅ **System role support** — Native system message handling

#### Test Connection

```bash
curl -X POST https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{"model":"kimi-k2-turbo-preview","messages":[{"role":"user","content":"Hello"}]}'
```

---

### 11. MLX OpenAI Server (Apple Silicon)

**Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)

[MLX OpenAI Server](https://github.com/cubist38/mlx-openai-server) is a high-performance local LLM server optimized for Apple's MLX framework. It provides OpenAI-compatible endpoints for text, vision, audio, and image generation models.

#### Installation

```bash
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install
pip install mlx-openai-server

# Optional: for audio transcription
brew install ffmpeg
```

#### Start the Server

```bash
# Text/Code models (recommended for coding)
mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit --model-type lm

# Smaller model (faster, less RAM)
mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-1.5B-Instruct-4bit --model-type lm

# General purpose
mlx-openai-server launch --model-path mlx-community/Qwen2.5-3B-Instruct-4bit --model-type lm
```

Server runs at `http://localhost:8000/v1` by default.

#### Configuration

```env
MODEL_PROVIDER=openai
OPENAI_ENDPOINT=http://localhost:8000/v1/chat/completions
OPENAI_API_KEY=not-needed
```

> 🌐 **Remote Support**: `OPENAI_ENDPOINT` can be any address (e.g., `http://192.168.1.100:8000/v1/chat/completions` for a Mac Studio GPU server).

#### Recommended Models for Coding

| Model | Size | RAM | Best For |
|-------|------|-----|----------|
| `Qwen2.5-Coder-1.5B-Instruct-4bit` | ~1GB | 4GB | Fast, simple code tasks |
| `Qwen2.5-3B-Instruct-4bit` | ~2GB | 6GB | General + code |
| `Qwen2.5-Coder-7B-Instruct-4bit` | ~4GB | 8GB | Best for coding |
| `Qwen2.5-Coder-14B-Instruct-4bit` | ~8GB | 16GB | Complex reasoning |
| `Llama-3.2-3B-Instruct-4bit` | ~2GB | 6GB | General purpose |
| `Phi-3-mini-4k-instruct-4bit` | ~2GB | 6GB | Reasoning tasks |

#### Server Options

```bash
# --host 0.0.0.0 allows remote connections; --port defaults to 8000;
# --max-concurrency caps parallel requests; --context-length sets the
# maximum context window.
mlx-openai-server launch \
  --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --model-type lm \
  --host 0.0.0.0 \
  --port 8000 \
  --max-concurrency 2 \
  --context-length 4096
```

#### MLX vs Ollama Comparison

| Feature | MLX OpenAI Server | Ollama |
|---------|-------------------|--------|
| Platform | Apple Silicon only | Cross-platform |
| Performance | Native MLX optimization | Good on Apple Silicon |
| Model Format | HuggingFace MLX | Ollama-specific |
| Vision/Audio | ✅ Built-in | Limited |
| Image Generation | ✅ Flux support | ❌ |
| Quantization | 4/8/16-bit flexible | Model-specific |

#### Test Connection

```bash
curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```

#### Pricing

**100% FREE** - Models run locally on your Apple Silicon Mac.

---

## Tier-Based Routing & Fallback

### Intelligent 4-Tier Routing

Optimize costs by routing requests based on complexity:

```env
# Tier-based routing (set all 4 to enable)
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
TIER_COMPLEX=azure-openai:gpt-4o
TIER_REASONING=azure-openai:gpt-4o

FALLBACK_ENABLED=true

# Provider credentials
OLLAMA_ENDPOINT=http://localhost:11434
OPENROUTER_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/...
AZURE_OPENAI_API_KEY=your-key
```

### How It Works

**Routing Logic** (see the sketch after this list):
1. Each request is scored for complexity (0-100)
2. The score maps to a tier: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100)
3. The request is routed to the `provider:model` configured for that tier
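
The score-to-tier mapping in step 2 is a plain threshold lookup. A minimal sketch (hypothetical helper; the actual scoring presumably lives in `src/routing/complexity-analyzer.js`, per the file list above):

```javascript
// Hypothetical sketch of the score → tier threshold lookup.
// The real scoring presumably lives in src/routing/complexity-analyzer.js.
function tierForScore(score) {
  if (score <= 25) return process.env.TIER_SIMPLE;   // 0-25
  if (score <= 50) return process.env.TIER_MEDIUM;   // 26-50
  if (score <= 75) return process.env.TIER_COMPLEX;  // 51-75
  return process.env.TIER_REASONING;                 // 76-100
}

// tierForScore(18) → "ollama:llama3.2" (with the config above)
// tierForScore(90) → "azure-openai:gpt-4o"
```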

**Automatic Fallback:**
- If the selected provider fails, Lynkr falls back to `FALLBACK_PROVIDER` (outlined below)
- Transparent to the user
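
In outline, the fallback is a try-primary-then-retry pattern. A hypothetical sketch, not Lynkr's actual resilience code:

```javascript
// Outline of the try-primary-then-fallback pattern. Hypothetical
// helper; Lynkr's actual resilience logic is more involved.
async function completeWithFallback(request, primary, fallback) {
  try {
    return await primary.complete(request);
  } catch (err) {
    if (process.env.FALLBACK_ENABLED === "false") throw err;
    // Fall back transparently; the caller sees one successful response.
    return await fallback.complete(request);
  }
}
```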

### Cost Savings

- **65-100% savings** for requests routed to local/cheap models
- **40-87% faster** for simple requests
- **Privacy**: Simple queries can stay on your machine when using a local `TIER_SIMPLE` model

### Configuration Options

| Variable | Description | Default |
|----------|-------------|---------|
| `TIER_SIMPLE` | Model for simple tier (`provider:model`) | *required for tier routing* |
| `TIER_MEDIUM` | Model for medium tier (`provider:model`) | *required for tier routing* |
| `TIER_COMPLEX` | Model for complex tier (`provider:model`) | *required for tier routing* |
| `TIER_REASONING` | Model for reasoning tier (`provider:model`) | *required for tier routing* |
| `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
| `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
| `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
| `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Max tools to route to OpenRouter | `15` |

**Note:** Local providers (ollama, llamacpp, lmstudio) cannot be used as `FALLBACK_PROVIDER`.

---

## Complete Configuration Reference

### Core Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `zai`, `moonshot`, `vertex`) | `databricks` |
| `PORT` | HTTP port for proxy server | `8081` |
| `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
| `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
| `TOOL_EXECUTION_MODE` | Where tools execute (`server`, `client`) | `server` |
| `MODEL_DEFAULT` | Override default model/deployment name | Provider-specific |

### Provider-Specific Variables

See individual provider sections above for complete variable lists.

---

## Provider Comparison

### Feature Comparison

| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Moonshot | Ollama | llama.cpp | LM Studio |
|---------|-----------|---------|--------|--------------|-----------------|------------|----------|--------|-----------|-----------|
| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Easy | Medium | Easy |
| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | $ | **Free** | **Free** | **Free** |
| **Latency** | Low | Low | Low | Low | Low | Medium | Low | **Very Low** | **Very Low** | **Very Low** |
| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 2+ | 50+ | Unlimited | 50+ |
| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Good | Fair | Good | Fair |
| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 128K | 32K-128K | Model-dependent | 32K-128K |
| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Non-streaming** | Yes | Yes | Yes |
| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Third-party | **Local** | **Local** | **Local** |
| **Offline** | No | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |

_* Tool calling only supported by Claude models on Bedrock_

_** Moonshot uses non-streaming mode (responses arrive as complete JSON) for clean terminal rendering_

### Cost Comparison (per 1M tokens)

| Provider | Model | Input | Output |
|----------|-------|-------|--------|
| **Bedrock** | Claude 3.5 Sonnet | $3.00 | $15.00 |
| **Databricks** | Contact for pricing | - | - |
| **OpenRouter** | Claude 3.5 Sonnet | $3.00 | $15.00 |
| **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
| **OpenAI** | GPT-4o | $2.50 | $10.00 |
| **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
| **Moonshot** | Kimi K2 Turbo | See moonshot.ai | See moonshot.ai |
| **Ollama** | Any model | **FREE** | **FREE** |
| **llama.cpp** | Any model | **FREE** | **FREE** |
| **LM Studio** | Any model | **FREE** | **FREE** |

---

## Next Steps

- **[Installation Guide](installation.md)** - Install Lynkr with your chosen provider
- **[Claude Code CLI Setup](claude-code-cli.md)** - Connect Claude Code CLI
- **[Cursor Integration](cursor-integration.md)** - Connect Cursor IDE
- **[Embeddings Configuration](embeddings.md)** - Enable @Codebase semantic search
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions

---

## Getting Help

- **[FAQ](faq.md)** - Frequently asked questions
- **[Troubleshooting Guide](troubleshooting.md)** - Common issues
- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report bugs