lynkr 3.3.1 → 4.1.0
- package/README.md +276 -2177
- package/README.md.backup +2996 -0
- package/docs/GSD_LEARNINGS.md +1116 -0
- package/docs/LOCAL_EMBEDDINGS_PLAN.md +1024 -0
- package/documentation/README.md +98 -0
- package/documentation/api.md +806 -0
- package/documentation/claude-code-cli.md +672 -0
- package/documentation/contributing.md +571 -0
- package/documentation/cursor-integration.md +731 -0
- package/documentation/docker.md +867 -0
- package/documentation/embeddings.md +760 -0
- package/documentation/faq.md +659 -0
- package/documentation/features.md +396 -0
- package/documentation/installation.md +706 -0
- package/documentation/memory-system.md +476 -0
- package/documentation/production.md +601 -0
- package/documentation/providers.md +735 -0
- package/documentation/testing.md +629 -0
- package/documentation/token-optimization.md +323 -0
- package/documentation/tools.md +697 -0
- package/documentation/troubleshooting.md +864 -0
- package/package.json +2 -2
- package/src/api/openai-router.js +919 -0
- package/src/api/router.js +4 -0
- package/src/clients/openai-format.js +427 -0
- package/src/config/index.js +8 -0
- package/test/cursor-integration.test.js +484 -0
@@ -0,0 +1,735 @@

# Provider Configuration Guide

Complete configuration reference for all 9+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.

---

## Overview

Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:

| Provider | Type | Models | Cost | Privacy | Setup Complexity |
|----------|------|--------|------|---------|------------------|
| **AWS Bedrock** | Cloud | 100+ (Claude, DeepSeek, Qwen, Nova, Titan, Llama, Mistral) | $-$$$ | Cloud | Easy |
| **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud | Medium |
| **OpenRouter** | Cloud | 100+ (GPT, Claude, Gemini, Llama, Mistral, etc.) | $-$$ | Cloud | Easy |
| **Ollama** | Local | Unlimited (free, offline) | **FREE** | 🔒 100% Local | Easy |
| **llama.cpp** | Local | Any GGUF model | **FREE** | 🔒 100% Local | Medium |
| **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
| **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
| **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
| **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |

---

## Configuration Methods

### Environment Variables (Quick Start)

```bash
export MODEL_PROVIDER=databricks
export DATABRICKS_API_BASE=https://your-workspace.databricks.com
export DATABRICKS_API_KEY=your-key
lynkr start
```

### .env File (Recommended for Production)

```bash
# Copy example file
cp .env.example .env

# Edit with your credentials
nano .env
```

Example `.env`:
```env
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
PORT=8081
LOG_LEVEL=info
```
---

### 1. AWS Bedrock (100+ Models)

**Best for:** AWS ecosystem, multi-model flexibility, Claude + alternatives

#### Configuration

```env
MODEL_PROVIDER=bedrock
AWS_BEDROCK_API_KEY=your-bearer-token
AWS_BEDROCK_REGION=us-east-1
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
```

#### Getting AWS Bedrock API Key

1. Log in to [AWS Console](https://console.aws.amazon.com/)
2. Navigate to **Bedrock** → **API Keys**
3. Click **Generate API Key**
4. Copy the bearer token (this is your `AWS_BEDROCK_API_KEY`)
5. Enable model access in Bedrock console
6. See: [AWS Bedrock API Keys Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-generate.html)

#### Available Regions

- `us-east-1` (N. Virginia) - Most models available
- `us-west-2` (Oregon)
- `us-east-2` (Ohio)
- `ap-southeast-1` (Singapore)
- `ap-northeast-1` (Tokyo)
- `eu-central-1` (Frankfurt)

#### Model Catalog

**Claude Models (Best for Tool Calling)** ✅

Claude 4.5 (latest - requires inference profiles):
```env
AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0      # Regional US
AWS_BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0       # Fast, efficient
AWS_BEDROCK_MODEL_ID=global.anthropic.claude-sonnet-4-5-20250929-v1:0  # Cross-region
```

Claude 3.x models:
```env
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0  # Excellent tool calling
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-opus-20240229-v1:0      # Most capable
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0     # Fast, cheap
```

**DeepSeek Models (NEW - 2025)**
```env
AWS_BEDROCK_MODEL_ID=us.deepseek.r1-v1:0  # DeepSeek R1 - reasoning model (o1-style)
```

**Qwen Models (Alibaba - NEW 2025)**
```env
AWS_BEDROCK_MODEL_ID=qwen.qwen3-235b-a22b-2507-v1:0   # Largest, 235B parameters
AWS_BEDROCK_MODEL_ID=qwen.qwen3-32b-v1:0              # Balanced, 32B
AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-480b-a35b-v1:0  # Coding specialist, 480B
AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-30b-a3b-v1:0    # Coding, smaller
```

**OpenAI Open-Weight Models (NEW - 2025)**
```env
AWS_BEDROCK_MODEL_ID=openai.gpt-oss-120b-1:0  # 120B parameters, open-weight
AWS_BEDROCK_MODEL_ID=openai.gpt-oss-20b-1:0   # 20B parameters, efficient
```

**Google Gemma Models (Open-Weight)**
```env
AWS_BEDROCK_MODEL_ID=google.gemma-3-27b  # 27B parameters
AWS_BEDROCK_MODEL_ID=google.gemma-3-12b  # 12B parameters
AWS_BEDROCK_MODEL_ID=google.gemma-3-4b   # 4B parameters, efficient
```

**Amazon Models**

Nova (multimodal):
```env
AWS_BEDROCK_MODEL_ID=us.amazon.nova-pro-v1:0    # Best quality, multimodal, 300K context
AWS_BEDROCK_MODEL_ID=us.amazon.nova-lite-v1:0   # Fast, cost-effective
AWS_BEDROCK_MODEL_ID=us.amazon.nova-micro-v1:0  # Ultra-fast, text-only
```

Titan:
```env
AWS_BEDROCK_MODEL_ID=amazon.titan-text-premier-v1:0  # Largest
AWS_BEDROCK_MODEL_ID=amazon.titan-text-express-v1    # Fast
AWS_BEDROCK_MODEL_ID=amazon.titan-text-lite-v1       # Cheapest
```

**Meta Llama Models**
```env
AWS_BEDROCK_MODEL_ID=meta.llama3-1-70b-instruct-v1:0  # Most capable
AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0   # Fast, efficient
```

**Mistral Models**
```env
AWS_BEDROCK_MODEL_ID=mistral.mistral-large-2407-v1:0     # Largest, coding, multilingual
AWS_BEDROCK_MODEL_ID=mistral.mistral-small-2402-v1:0     # Efficient
AWS_BEDROCK_MODEL_ID=mistral.mixtral-8x7b-instruct-v0:1  # Mixture of experts
```

**Cohere Command Models**
```env
AWS_BEDROCK_MODEL_ID=cohere.command-r-plus-v1:0  # Best for RAG, search
AWS_BEDROCK_MODEL_ID=cohere.command-r-v1:0       # Balanced
```

**AI21 Jamba Models**
```env
AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-large-v1:0  # Hybrid architecture, 256K context
AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-mini-v1:0   # Fast
```

#### Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| Titan Text Express | $0.20 | $0.60 |
| Llama 3 70B | $0.99 | $0.99 |
| Nova Pro | $0.80 | $3.20 |

#### Important Notes

⚠️ **Tool Calling:** Only **Claude models** support tool calling on Bedrock. Other models work via the Converse API but won't use Read/Write/Bash tools.

📖 **Full Documentation:** See [BEDROCK_MODELS.md](../BEDROCK_MODELS.md) for the complete model catalog with capabilities and use cases.
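Whichever Bedrock model is selected, clients still speak the Anthropic Messages format to the proxy. A hedged sketch of the request body a client would send (field names follow the public Anthropic Messages API; `buildMessagesRequest` is an illustrative helper, not part of Lynkr):

```javascript
// Illustrative only: an Anthropic Messages-style request body as a client
// would send to the Lynkr proxy, which forwards it to the configured
// Bedrock model. Actual routing lives in src/api/router.js.
function buildMessagesRequest(modelId, userText) {
  return {
    model: modelId,              // Bedrock model id from AWS_BEDROCK_MODEL_ID
    max_tokens: 1024,            // required by the Messages API
    messages: [{ role: "user", content: userText }],
  };
}

const body = buildMessagesRequest(
  "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "Summarize this repository."
);
```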
---

### 2. Databricks (Claude Sonnet 4.5, Opus 4.5)

**Best for:** Enterprise production use, managed Claude endpoints

#### Configuration

```env
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
```

Optional endpoint path override:
```env
DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
```

#### Getting Databricks Credentials

1. Log in to your Databricks workspace
2. Navigate to **Settings** → **User Settings**
3. Click **Generate New Token**
4. Copy the token (this is your `DATABRICKS_API_KEY`)
5. Your workspace URL is the base URL (e.g., `https://your-workspace.cloud.databricks.com`)

#### Available Models

- **Claude Sonnet 4.5** - Excellent for tool calling, balanced performance
- **Claude Opus 4.5** - Most capable model for complex reasoning

#### Pricing

Contact Databricks for enterprise pricing.

---
### 3. OpenRouter (100+ Models)

**Best for:** Quick setup, model flexibility, cost optimization

#### Configuration

```env
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
```

Optional for hybrid routing:
```env
OPENROUTER_MAX_TOOLS_FOR_ROUTING=15  # Max tools to route to OpenRouter
```

#### Getting OpenRouter API Key

1. Visit [openrouter.ai](https://openrouter.ai)
2. Sign in with GitHub, Google, or email
3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
4. Create a new API key
5. Add credits (pay-as-you-go, no subscription required)

#### Popular Models

**Claude Models (Best for Coding)**
```env
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet  # $3/$15 per 1M tokens
OPENROUTER_MODEL=anthropic/claude-opus-4.5    # $15/$75 per 1M tokens
OPENROUTER_MODEL=anthropic/claude-3-haiku     # $0.25/$1.25 per 1M tokens
```

**OpenAI Models**
```env
OPENROUTER_MODEL=openai/gpt-4o       # $2.50/$10 per 1M tokens
OPENROUTER_MODEL=openai/gpt-4o-mini  # $0.15/$0.60 per 1M tokens (default)
OPENROUTER_MODEL=openai/o1-preview   # $15/$60 per 1M tokens
OPENROUTER_MODEL=openai/o1-mini      # $3/$12 per 1M tokens
```

**Google Models**
```env
OPENROUTER_MODEL=google/gemini-pro-1.5    # $1.25/$5 per 1M tokens
OPENROUTER_MODEL=google/gemini-flash-1.5  # $0.075/$0.30 per 1M tokens
```

**Meta Llama Models**
```env
OPENROUTER_MODEL=meta-llama/llama-3.1-405b  # $2.70/$2.70 per 1M tokens
OPENROUTER_MODEL=meta-llama/llama-3.1-70b   # $0.52/$0.75 per 1M tokens
OPENROUTER_MODEL=meta-llama/llama-3.1-8b    # $0.06/$0.06 per 1M tokens
```

**Mistral Models**
```env
OPENROUTER_MODEL=mistralai/mistral-large     # $2/$6 per 1M tokens
OPENROUTER_MODEL=mistralai/codestral-latest  # $0.30/$0.90 per 1M tokens
```

**DeepSeek Models**
```env
OPENROUTER_MODEL=deepseek/deepseek-chat   # $0.14/$0.28 per 1M tokens
OPENROUTER_MODEL=deepseek/deepseek-coder  # $0.14/$0.28 per 1M tokens
```

#### Benefits

- ✅ **100+ models** through one API
- ✅ **Automatic fallbacks** if the primary model is unavailable
- ✅ **Competitive pricing** with volume discounts
- ✅ **Full tool calling support**
- ✅ **No monthly fees** - pay only for usage
- ✅ **Rate limit pooling** across models

See [openrouter.ai/models](https://openrouter.ai/models) for the complete list with pricing.

---
### 4. Ollama (Local Models)

**Best for:** Local development, privacy, offline use, no API costs

#### Configuration

```env
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_TIMEOUT_MS=120000
```

#### Installation & Setup

```bash
# Install Ollama
brew install ollama  # macOS
# Or download from: https://ollama.ai/download

# Start Ollama service
ollama serve

# Pull a model
ollama pull llama3.1:8b

# Verify model is available
ollama list
```

#### Recommended Models

**For Tool Calling** ✅ (Required for Claude Code CLI)
```bash
ollama pull llama3.1:8b          # Good balance (4.7GB)
ollama pull llama3.2             # Latest Llama (4.7GB)
ollama pull qwen2.5:14b          # Strong reasoning (8GB; the 7B variant struggles with tools)
ollama pull mistral:7b-instruct  # Fast and capable (4.1GB)
```

**NOT Recommended for Tools** ❌
```bash
qwen2.5-coder  # Code-only, slow with tool calling
codellama      # Code-only, poor tool support
```

#### Tool Calling Support

Lynkr supports **native tool calling** for compatible Ollama models:

- ✅ **Supported models**: llama3.1, llama3.2, qwen2.5, mistral, mistral-nemo
- ✅ **Automatic detection**: Lynkr detects tool-capable models
- ✅ **Format conversion**: Transparent Anthropic ↔ Ollama conversion
- ❌ **Unsupported models**: llama3, older models (tools filtered automatically)
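A rough sketch of how detection by model family can work, mirroring the supported/unsupported lists above (`supportsTools` is a hypothetical helper; Lynkr's actual detection logic may differ):

```javascript
// Hypothetical sketch: classify an Ollama model tag as tool-capable by its
// family name, matching the supported list documented above.
const TOOL_CAPABLE_FAMILIES = ["llama3.1", "llama3.2", "qwen2.5", "mistral", "mistral-nemo"];

function supportsTools(modelName) {
  // An Ollama tag like "llama3.1:8b" has the family before the colon.
  const family = modelName.split(":")[0];
  return TOOL_CAPABLE_FAMILIES.includes(family);
}
```

Requests targeting a model that fails this check would have tool definitions filtered out instead of producing a server error.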
#### Pricing

**100% FREE** - Models run on your hardware with no API costs.

#### Model Sizes

- **7B models**: ~4-5GB download, 8GB RAM required
- **8B models**: ~4.7GB download, 8GB RAM required
- **14B models**: ~8GB download, 16GB RAM required
- **32B models**: ~18GB download, 32GB RAM required

---
### 5. llama.cpp (GGUF Models)

**Best for:** Maximum performance, custom quantization, any GGUF model

#### Configuration

```env
MODEL_PROVIDER=llamacpp
LLAMACPP_ENDPOINT=http://localhost:8080
LLAMACPP_MODEL=qwen2.5-coder-7b
LLAMACPP_TIMEOUT_MS=120000
```

Optional API key (for secured servers):
```env
LLAMACPP_API_KEY=your-optional-api-key
```

#### Installation & Setup

```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Download a GGUF model (example: Qwen2.5-Coder-7B)
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf

# Start llama-server
./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080

# Verify server is running
curl http://localhost:8080/health
```

#### GPU Support

llama.cpp supports multiple GPU backends:

- **CUDA** (NVIDIA): `make LLAMA_CUDA=1`
- **Metal** (Apple Silicon): `make LLAMA_METAL=1`
- **ROCm** (AMD): `make LLAMA_ROCM=1`
- **Vulkan** (Universal): `make LLAMA_VULKAN=1`

#### llama.cpp vs Ollama

| Feature | Ollama | llama.cpp |
|---------|--------|-----------|
| Setup | Easy (app) | Manual (compile/download) |
| Model Format | Ollama-specific | Any GGUF model |
| Performance | Good | **Excellent** (optimized C++) |
| GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
| Memory Usage | Higher | **Lower** (quantization options) |
| API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
| Flexibility | Limited models | **Any GGUF** from HuggingFace |
| Tool Calling | Limited models | Grammar-based, more reliable |

**Choose llama.cpp when you need:**
- Maximum performance
- Specific quantization options (Q4, Q5, Q8)
- GGUF models not available in Ollama
- Fine-grained control over inference parameters
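Because llama-server exposes the OpenAI-compatible `/v1/chat/completions` route, a request to it can be assembled like this (an illustrative sketch under the config above; `buildChatRequest` is not a Lynkr function, and actually sending the body is left to `fetch`):

```javascript
// Sketch: build an OpenAI Chat Completions-style request for llama-server.
// Field names follow the OpenAI schema that llama.cpp's server implements.
function buildChatRequest(model, prompt) {
  return {
    url: "http://localhost:8080/v1/chat/completions", // LLAMACPP_ENDPOINT + route
    body: {
      model,
      messages: [{ role: "user", content: prompt }],
      temperature: 0.2, // illustrative value for deterministic-ish coding output
    },
  };
}

const req = buildChatRequest("qwen2.5-coder-7b", "Write a hello world in C.");
```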
---

### 6. Azure OpenAI

**Best for:** Azure integration, Microsoft ecosystem, GPT-4o, o1, o3

#### Configuration

```env
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2025-01-01-preview
AZURE_OPENAI_API_KEY=your-azure-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o
```

Optional:
```env
AZURE_OPENAI_API_VERSION=2024-08-01-preview  # Override the API version
```

#### Getting Azure OpenAI Credentials

1. Log in to [Azure Portal](https://portal.azure.com)
2. Navigate to **Azure OpenAI** service
3. Go to **Keys and Endpoint**
4. Copy **KEY 1** (this is your API key)
5. Copy **Endpoint** URL
6. Create a deployment (gpt-4o, gpt-4o-mini, etc.)

#### Important: Full Endpoint URL Required

The `AZURE_OPENAI_ENDPOINT` must include:
- Resource name
- Deployment path
- API version query parameter

**Example:**
```
https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview
```
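A small sketch showing how the three required parts combine into the full URL (`azureEndpoint` is an illustrative helper with placeholder values, not part of Lynkr):

```javascript
// Assemble the full Azure OpenAI endpoint from resource name, deployment
// name, and api-version, matching the example above. Placeholder values only.
function azureEndpoint(resource, deployment, apiVersion) {
  return (
    `https://${resource}.openai.azure.com` +
    `/openai/deployments/${deployment}/chat/completions` +
    `?api-version=${apiVersion}`
  );
}

const url = azureEndpoint("your-resource", "gpt-4o", "2025-01-01-preview");
```

If any of the three parts is missing from `AZURE_OPENAI_ENDPOINT`, Azure rejects the request, so it is worth checking the assembled URL against the example before starting Lynkr.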
#### Available Deployments

You can deploy any of these models in Azure AI Foundry:

```env
AZURE_OPENAI_DEPLOYMENT=gpt-4o       # Latest GPT-4o
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini  # Smaller, faster, cheaper
AZURE_OPENAI_DEPLOYMENT=gpt-5-chat   # GPT-5 (if available)
AZURE_OPENAI_DEPLOYMENT=o1-preview   # Reasoning model
AZURE_OPENAI_DEPLOYMENT=o3-mini      # Latest reasoning model
AZURE_OPENAI_DEPLOYMENT=kimi-k2      # Kimi K2 (if available)
```

---

### 7. Azure Anthropic

**Best for:** Azure-hosted Claude models with enterprise integration

#### Configuration

```env
MODEL_PROVIDER=azure-anthropic
AZURE_ANTHROPIC_ENDPOINT=https://your-resource.services.ai.azure.com/anthropic/v1/messages
AZURE_ANTHROPIC_API_KEY=your-azure-api-key
AZURE_ANTHROPIC_VERSION=2023-06-01
```

#### Getting Azure Anthropic Credentials

1. Log in to [Azure Portal](https://portal.azure.com)
2. Navigate to your Azure Anthropic resource
3. Go to **Keys and Endpoint**
4. Copy the API key
5. Copy the endpoint URL (it includes `/anthropic/v1/messages`)

#### Available Models

- **Claude Sonnet 4.5** - Best for tool calling, balanced
- **Claude Opus 4.5** - Most capable for complex reasoning

---
### 8. OpenAI (Direct)

**Best for:** Direct OpenAI API access, lowest latency

#### Configuration

```env
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_MODEL=gpt-4o
OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
```

Optional for organization-level keys:
```env
OPENAI_ORGANIZATION=org-your-org-id
```

#### Getting OpenAI API Key

1. Visit [platform.openai.com](https://platform.openai.com)
2. Sign up or log in
3. Go to [API Keys](https://platform.openai.com/api-keys)
4. Create a new API key
5. Add credits to your account (pay-as-you-go)

#### Available Models

```env
OPENAI_MODEL=gpt-4o       # Latest GPT-4o ($2.50/$10 per 1M)
OPENAI_MODEL=gpt-4o-mini  # Smaller, faster ($0.15/$0.60 per 1M)
OPENAI_MODEL=gpt-4-turbo  # GPT-4 Turbo
OPENAI_MODEL=o1-preview   # Reasoning model
OPENAI_MODEL=o1-mini      # Smaller reasoning model
```

#### Benefits

- ✅ **Direct API access** - No intermediaries, lowest latency
- ✅ **Full tool calling support** - Excellent function calling
- ✅ **Parallel tool calls** - Execute multiple tools simultaneously
- ✅ **Organization support** - Use org-level API keys
- ✅ **Simple setup** - Just one API key needed

---
### 9. LM Studio (Local with GUI)

**Best for:** Local models with a graphical interface

#### Configuration

```env
MODEL_PROVIDER=lmstudio
LMSTUDIO_ENDPOINT=http://localhost:1234
LMSTUDIO_MODEL=default
LMSTUDIO_TIMEOUT_MS=120000
```

Optional API key (for secured servers):
```env
LMSTUDIO_API_KEY=your-optional-api-key
```

#### Setup

1. Download and install [LM Studio](https://lmstudio.ai)
2. Launch LM Studio
3. Download a model (e.g., Qwen2.5-Coder-7B, Llama 3.1)
4. Click **Start Server** (default port: 1234)
5. Configure Lynkr to use LM Studio

#### Benefits

- ✅ **Graphical interface** for model management
- ✅ **Easy model downloads** from HuggingFace
- ✅ **Built-in server** with OpenAI-compatible API
- ✅ **GPU acceleration** support
- ✅ **Model presets** and configurations

---
## Hybrid Routing & Fallback

### Intelligent 3-Tier Routing

Optimize costs by routing requests based on complexity:

```env
# Enable hybrid routing
PREFER_OLLAMA=true
FALLBACK_ENABLED=true

# Configure providers for each tier
MODEL_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_MAX_TOOLS_FOR_ROUTING=3

# Mid-tier (moderate complexity)
OPENROUTER_API_KEY=your-key
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_MAX_TOOLS_FOR_ROUTING=15

# Heavy workload (complex requests)
FALLBACK_PROVIDER=databricks
DATABRICKS_API_BASE=your-base
DATABRICKS_API_KEY=your-key
```

### How It Works

**Routing Logic:**
1. **0-2 tools**: Try Ollama first (free, local, fast)
2. **3-15 tools**: Route to OpenRouter (affordable cloud)
3. **16+ tools**: Route directly to Databricks/Azure (most capable)

**Automatic Fallback:**
- ❌ If Ollama fails → Fallback to OpenRouter or Databricks
- ❌ If OpenRouter fails → Fallback to Databricks
- ✅ Transparent to the user
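The tier decision above can be sketched as a function of tool count (thresholds mirror the documented `OLLAMA_MAX_TOOLS_FOR_ROUTING` and `OPENROUTER_MAX_TOOLS_FOR_ROUTING` defaults; Lynkr's real router may weigh additional signals):

```javascript
// Sketch of the 3-tier routing decision keyed on tool count, matching the
// 0-2 / 3-15 / 16+ split described above. Illustrative, not Lynkr's router.
function pickTier(toolCount, { ollamaMax = 3, openrouterMax = 15 } = {}) {
  if (toolCount < ollamaMax) return "ollama";          // simple: free, local
  if (toolCount <= openrouterMax) return "openrouter"; // moderate: cheap cloud
  return "fallback";                                   // complex: Databricks/Azure
}
```

With the defaults, a request carrying 2 tools stays local, one carrying 10 goes to OpenRouter, and one carrying 20 goes straight to the fallback provider.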
### Cost Savings

- **65-100%** cost savings for requests that stay on Ollama
- **40-87%** faster responses for simple requests
- **Privacy**: Simple queries never leave your machine

### Configuration Options

| Variable | Description | Default |
|----------|-------------|---------|
| `PREFER_OLLAMA` | Enable Ollama preference for simple requests | `false` |
| `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
| `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
| `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
| `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Max tools to route to OpenRouter | `15` |

**Note:** Local providers (ollama, llamacpp, lmstudio) cannot be used as `FALLBACK_PROVIDER`.

---

## Complete Configuration Reference

### Core Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
| `PORT` | HTTP port for proxy server | `8081` |
| `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
| `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
| `TOOL_EXECUTION_MODE` | Where tools execute (`server`, `client`) | `server` |
| `MODEL_DEFAULT` | Override default model/deployment name | Provider-specific |

### Provider-Specific Variables

See individual provider sections above for complete variable lists.

---
## Provider Comparison

### Feature Comparison

| Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp | LM Studio |
|---------|-----------|---------|--------|--------------|-----------------|------------|--------|-----------|-----------|
| **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium | Easy |
| **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | **Free** | **Free** | **Free** |
| **Latency** | Low | Low | Low | Low | Low | Medium | **Very Low** | **Very Low** | **Very Low** |
| **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 50+ | Unlimited | 50+ |
| **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good | Fair |
| **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent | 32K-128K |
| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | **Local** | **Local** | **Local** |
| **Offline** | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |

_* Tool calling only supported by Claude models on Bedrock_

### Cost Comparison (per 1M tokens)

| Provider | Model | Input | Output |
|----------|-------|-------|--------|
| **Bedrock** | Claude 3.5 Sonnet | $3.00 | $15.00 |
| **Databricks** | Contact for pricing | - | - |
| **OpenRouter** | Claude 3.5 Sonnet | $3.00 | $15.00 |
| **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
| **OpenAI** | GPT-4o | $2.50 | $10.00 |
| **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
| **Ollama** | Any model | **FREE** | **FREE** |
| **llama.cpp** | Any model | **FREE** | **FREE** |
| **LM Studio** | Any model | **FREE** | **FREE** |

---

## Next Steps

- **[Installation Guide](installation.md)** - Install Lynkr with your chosen provider
- **[Claude Code CLI Setup](claude-code-cli.md)** - Connect Claude Code CLI
- **[Cursor Integration](cursor-integration.md)** - Connect Cursor IDE
- **[Embeddings Configuration](embeddings.md)** - Enable @Codebase semantic search
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions

---

## Getting Help

- **[FAQ](faq.md)** - Frequently asked questions
- **[Troubleshooting Guide](troubleshooting.md)** - Common issues
- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report bugs