lynkr 7.2.5 → 8.0.1

Files changed (124)
  1. package/README.md +3 -3
  2. package/config/model-tiers.json +89 -0
  3. package/install.sh +6 -1
  4. package/package.json +4 -2
  5. package/scripts/setup.js +0 -1
  6. package/src/agents/executor.js +14 -6
  7. package/src/api/middleware/session.js +15 -2
  8. package/src/api/openai-router.js +162 -37
  9. package/src/api/providers-handler.js +15 -1
  10. package/src/api/router.js +107 -2
  11. package/src/budget/index.js +4 -3
  12. package/src/clients/databricks.js +431 -234
  13. package/src/clients/gpt-utils.js +181 -0
  14. package/src/clients/ollama-utils.js +66 -140
  15. package/src/clients/routing.js +0 -1
  16. package/src/clients/standard-tools.js +99 -3
  17. package/src/config/index.js +133 -35
  18. package/src/context/toon.js +173 -0
  19. package/src/logger/index.js +23 -0
  20. package/src/orchestrator/index.js +688 -213
  21. package/src/routing/agentic-detector.js +320 -0
  22. package/src/routing/complexity-analyzer.js +202 -2
  23. package/src/routing/cost-optimizer.js +305 -0
  24. package/src/routing/index.js +168 -159
  25. package/src/routing/model-tiers.js +365 -0
  26. package/src/server.js +4 -14
  27. package/src/sessions/cleanup.js +3 -3
  28. package/src/sessions/record.js +10 -1
  29. package/src/sessions/store.js +7 -2
  30. package/src/tools/agent-task.js +48 -1
  31. package/src/tools/index.js +19 -2
  32. package/src/tools/lazy-loader.js +7 -0
  33. package/src/tools/tinyfish.js +358 -0
  34. package/src/tools/truncate.js +1 -0
  35. package/.github/FUNDING.yml +0 -15
  36. package/.github/workflows/README.md +0 -215
  37. package/.github/workflows/ci.yml +0 -69
  38. package/.github/workflows/index.yml +0 -62
  39. package/.github/workflows/web-tools-tests.yml +0 -56
  40. package/CITATIONS.bib +0 -6
  41. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  42. package/DEPLOYMENT.md +0 -1001
  43. package/LYNKR-TUI-PLAN.md +0 -984
  44. package/PERFORMANCE-REPORT.md +0 -866
  45. package/PLAN-per-client-model-routing.md +0 -252
  46. package/ROUTER_COMPARISON.md +0 -173
  47. package/TIER_ROUTING_PLAN.md +0 -771
  48. package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
  49. package/docs/BingSiteAuth.xml +0 -4
  50. package/docs/docs-style.css +0 -478
  51. package/docs/docs.html +0 -197
  52. package/docs/google5be250e608e6da39.html +0 -1
  53. package/docs/index.html +0 -577
  54. package/docs/index.md +0 -577
  55. package/docs/robots.txt +0 -4
  56. package/docs/sitemap.xml +0 -44
  57. package/docs/style.css +0 -1223
  58. package/documentation/README.md +0 -100
  59. package/documentation/api.md +0 -806
  60. package/documentation/claude-code-cli.md +0 -672
  61. package/documentation/codex-cli.md +0 -397
  62. package/documentation/contributing.md +0 -571
  63. package/documentation/cursor-integration.md +0 -731
  64. package/documentation/docker.md +0 -867
  65. package/documentation/embeddings.md +0 -760
  66. package/documentation/faq.md +0 -659
  67. package/documentation/features.md +0 -396
  68. package/documentation/headroom.md +0 -519
  69. package/documentation/installation.md +0 -706
  70. package/documentation/memory-system.md +0 -476
  71. package/documentation/production.md +0 -601
  72. package/documentation/providers.md +0 -906
  73. package/documentation/testing.md +0 -629
  74. package/documentation/token-optimization.md +0 -323
  75. package/documentation/tools.md +0 -697
  76. package/documentation/troubleshooting.md +0 -893
  77. package/final-test.js +0 -33
  78. package/headroom-sidecar/config.py +0 -93
  79. package/headroom-sidecar/requirements.txt +0 -14
  80. package/headroom-sidecar/server.py +0 -451
  81. package/monitor-agents.sh +0 -31
  82. package/scripts/audit-log-reader.js +0 -399
  83. package/scripts/compact-dictionary.js +0 -204
  84. package/scripts/test-deduplication.js +0 -448
  85. package/src/db/database.sqlite +0 -0
  86. package/test/README.md +0 -212
  87. package/test/azure-openai-config.test.js +0 -204
  88. package/test/azure-openai-error-resilience.test.js +0 -238
  89. package/test/azure-openai-format-conversion.test.js +0 -354
  90. package/test/azure-openai-integration.test.js +0 -281
  91. package/test/azure-openai-routing.test.js +0 -177
  92. package/test/azure-openai-streaming.test.js +0 -171
  93. package/test/bedrock-integration.test.js +0 -471
  94. package/test/comprehensive-test-suite.js +0 -928
  95. package/test/config-validation.test.js +0 -207
  96. package/test/cursor-integration.test.js +0 -484
  97. package/test/format-conversion.test.js +0 -578
  98. package/test/hybrid-routing-integration.test.js +0 -254
  99. package/test/hybrid-routing-performance.test.js +0 -418
  100. package/test/llamacpp-integration.test.js +0 -863
  101. package/test/lmstudio-integration.test.js +0 -335
  102. package/test/memory/extractor.test.js +0 -398
  103. package/test/memory/retriever.test.js +0 -613
  104. package/test/memory/retriever.test.js.bak +0 -585
  105. package/test/memory/search.test.js +0 -537
  106. package/test/memory/search.test.js.bak +0 -389
  107. package/test/memory/store.test.js +0 -344
  108. package/test/memory/store.test.js.bak +0 -312
  109. package/test/memory/surprise.test.js +0 -300
  110. package/test/memory-performance.test.js +0 -472
  111. package/test/openai-integration.test.js +0 -686
  112. package/test/openrouter-error-resilience.test.js +0 -418
  113. package/test/passthrough-mode.test.js +0 -385
  114. package/test/performance-benchmark.js +0 -351
  115. package/test/performance-tests.js +0 -528
  116. package/test/routing.test.js +0 -219
  117. package/test/web-tools.test.js +0 -329
  118. package/test-agents-simple.js +0 -43
  119. package/test-cli-connection.sh +0 -33
  120. package/test-learning-unit.js +0 -126
  121. package/test-learning.js +0 -112
  122. package/test-parallel-agents.sh +0 -124
  123. package/test-parallel-direct.js +0 -155
  124. package/test-subagents.sh +0 -117
@@ -1,906 +0,0 @@
1
- # Provider Configuration Guide
2
-
3
- Complete configuration reference for all 9+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
4
-
5
- ---
6
-
7
- ## Overview
8
-
9
- Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:
10
-
11
- | Provider | Type | Models | Cost | Privacy | Setup Complexity |
12
- |----------|------|--------|------|---------|------------------|
13
- | **AWS Bedrock** | Cloud | 100+ (Claude, DeepSeek, Qwen, Nova, Titan, Llama, Mistral) | $-$$$ | Cloud | Easy |
14
- | **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud | Medium |
15
- | **OpenRouter** | Cloud | 100+ (GPT, Claude, Gemini, Llama, Mistral, etc.) | $-$$ | Cloud | Easy |
16
- | **Ollama** | Local | Unlimited (free, offline) | **FREE** | 🔒 100% Local | Easy |
17
- | **llama.cpp** | Local | Any GGUF model | **FREE** | 🔒 100% Local | Medium |
18
- | **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
19
- | **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
20
- | **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
21
- | **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
22
- | **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |
23
-
24
- ---
25
-
26
- ## Configuration Methods
27
-
28
- ### Environment Variables (Quick Start)
29
-
30
- ```bash
31
- export MODEL_PROVIDER=databricks
32
- export DATABRICKS_API_BASE=https://your-workspace.databricks.com
33
- export DATABRICKS_API_KEY=your-key
34
- lynkr start
35
- ```
36
-
37
- ### .env File (Recommended for Production)
38
-
39
- ```bash
40
- # Copy example file
41
- cp .env.example .env
42
-
43
- # Edit with your credentials
44
- nano .env
45
- ```
46
-
47
- Example `.env`:
48
- ```env
49
- MODEL_PROVIDER=databricks
50
- DATABRICKS_API_BASE=https://your-workspace.databricks.com
51
- DATABRICKS_API_KEY=dapi1234567890abcdef
52
- PORT=8081
53
- LOG_LEVEL=info
54
- ```
55
-
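A minimal sketch of how a `.env` file like the one above could be parsed and checked before starting the proxy. The `parseEnv` and `missingVars` helpers and the `REQUIRED_VARS` map are hypothetical names introduced for illustration; the required-variable list is inferred from the Databricks example in this guide, not from Lynkr's source.

```javascript
// Hypothetical map: which variables each provider needs, inferred from this guide.
const REQUIRED_VARS = {
  databricks: ["DATABRICKS_API_BASE", "DATABRICKS_API_KEY"],
};

// Parse KEY=VALUE lines, skipping blank lines and full-line comments.
// Inline comments after a value are NOT stripped in this sketch.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Return the names of variables that are required but unset.
function missingVars(env) {
  const provider = env.MODEL_PROVIDER;
  if (!provider) return ["MODEL_PROVIDER"];
  const required = REQUIRED_VARS[provider] || [];
  return required.filter((name) => !env[name]);
}
```

Calling `missingVars(parseEnv(fileContents))` on the example `.env` would return an empty array; dropping `DATABRICKS_API_KEY` from the file would return that name.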
56
- ---
57
-
58
- ## Remote/Network Configuration
59
-
60
- **All provider endpoints support remote addresses** - you're not limited to `localhost`. This enables powerful setups like:
61
-
62
- - 🖥️ **GPU Server**: Run Ollama/llama.cpp on a dedicated GPU machine
63
- - 🏢 **Team Sharing**: Multiple developers using one Lynkr instance
64
- - ☁️ **Hybrid**: Lynkr on local machine, models on cloud VM
65
-
66
- ### Examples
67
-
68
- **Ollama on Remote GPU Server**
69
- ```env
70
- MODEL_PROVIDER=ollama
71
- OLLAMA_ENDPOINT=http://192.168.1.100:11434 # Local network IP
72
- # or
73
- OLLAMA_ENDPOINT=http://gpu-server.local:11434 # Hostname
74
- # or
75
- OLLAMA_ENDPOINT=http://ollama.mycompany.com:11434 # Domain
76
- ```
77
-
78
- **llama.cpp on Remote Machine**
79
- ```env
80
- MODEL_PROVIDER=llamacpp
81
- LLAMACPP_ENDPOINT=http://10.0.0.50:8080
82
- ```
83
-
84
- **LM Studio on Another Computer**
85
- ```env
86
- MODEL_PROVIDER=lmstudio
87
- LMSTUDIO_ENDPOINT=http://workstation.local:1234
88
- ```
89
-
90
- ### Network Requirements
91
-
92
- | Setup | Requirement |
93
- |-------|-------------|
94
- | Same machine | `localhost` or `127.0.0.1` |
95
- | Local network | IP address or hostname, firewall allows port |
96
- | Remote/Internet | Public IP/domain, port forwarding, consider VPN/auth |
97
-
98
- > ⚠️ **Security Note**: When exposing endpoints over a network, ensure proper firewall rules and consider using a VPN or SSH tunnel for sensitive deployments.
99
-
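The Network Requirements table can be approximated in code. The sketch below classifies an endpoint by its hostname so a setup script could warn when an endpoint leaves the local machine. `classifyEndpoint` is a hypothetical helper, and treating `.local` hostnames and RFC 1918 IPv4 ranges as "local network" is an assumption of this example.

```javascript
// Classify an endpoint URL as same-machine, local-network, or remote.
function classifyEndpoint(endpoint) {
  const { hostname } = new URL(endpoint);
  if (hostname === "localhost" || hostname === "127.0.0.1") {
    return "same-machine";
  }
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(hostname);
  // Private IPv4 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
  const isPrivateV4 =
    isIPv4 && /^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(hostname);
  if (isPrivateV4 || hostname.endsWith(".local")) {
    return "local-network";
  }
  // Anything else may be reachable from the internet: consider VPN or auth.
  return "remote";
}
```

For the examples above, `http://192.168.1.100:11434` and `http://gpu-server.local:11434` classify as `local-network`, while `http://ollama.mycompany.com:11434` classifies as `remote`.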
100
- ---
101
-
102
- ## Provider-Specific Configuration
103
-
104
- ### 1. AWS Bedrock (100+ Models)
105
-
106
- **Best for:** AWS ecosystem, multi-model flexibility, Claude + alternatives
107
-
108
- #### Configuration
109
-
110
- ```env
111
- MODEL_PROVIDER=bedrock
112
- AWS_BEDROCK_API_KEY=your-bearer-token
113
- AWS_BEDROCK_REGION=us-east-1
114
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
115
- ```
116
-
117
- #### Getting AWS Bedrock API Key
118
-
119
- 1. Log in to [AWS Console](https://console.aws.amazon.com/)
120
- 2. Navigate to **Bedrock** → **API Keys**
121
- 3. Click **Generate API Key**
122
- 4. Copy the bearer token (this is your `AWS_BEDROCK_API_KEY`)
123
- 5. Enable model access in Bedrock console
124
- 6. See: [AWS Bedrock API Keys Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-generate.html)
125
-
126
- #### Available Regions
127
-
128
- - `us-east-1` (N. Virginia) - Most models available
129
- - `us-west-2` (Oregon)
130
- - `us-east-2` (Ohio)
131
- - `ap-southeast-1` (Singapore)
132
- - `ap-northeast-1` (Tokyo)
133
- - `eu-central-1` (Frankfurt)
134
-
135
- #### Model Catalog
136
-
137
- **Claude Models (Best for Tool Calling)** ✅
138
-
139
- Claude 4.5 (latest - requires inference profiles):
140
- ```env
141
- AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0 # Regional US
142
- AWS_BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0 # Fast, efficient
143
- AWS_BEDROCK_MODEL_ID=global.anthropic.claude-sonnet-4-5-20250929-v1:0 # Cross-region
144
- ```
145
-
146
- Claude 3.x models:
147
- ```env
148
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0 # Excellent tool calling
149
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-opus-20240229-v1:0 # Most capable
150
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0 # Fast, cheap
151
- ```
152
-
153
- **DeepSeek Models (NEW - 2025)**
154
- ```env
155
- AWS_BEDROCK_MODEL_ID=us.deepseek.r1-v1:0 # DeepSeek R1 - reasoning model (o1-style)
156
- ```
157
-
158
- **Qwen Models (Alibaba - NEW 2025)**
159
- ```env
160
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-235b-a22b-2507-v1:0 # Largest, 235B parameters
161
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-32b-v1:0 # Balanced, 32B
162
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-480b-a35b-v1:0 # Coding specialist, 480B
163
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-30b-a3b-v1:0 # Coding, smaller
164
- ```
165
-
166
- **OpenAI Open-Weight Models (NEW - 2025)**
167
- ```env
168
- AWS_BEDROCK_MODEL_ID=openai.gpt-oss-120b-1:0 # 120B parameters, open-weight
169
- AWS_BEDROCK_MODEL_ID=openai.gpt-oss-20b-1:0 # 20B parameters, efficient
170
- ```
171
-
172
- **Google Gemma Models (Open-Weight)**
173
- ```env
174
- AWS_BEDROCK_MODEL_ID=google.gemma-3-27b # 27B parameters
175
- AWS_BEDROCK_MODEL_ID=google.gemma-3-12b # 12B parameters
176
- AWS_BEDROCK_MODEL_ID=google.gemma-3-4b # 4B parameters, efficient
177
- ```
178
-
179
- **Amazon Models**
180
-
181
- Nova (multimodal):
182
- ```env
183
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-pro-v1:0 # Best quality, multimodal, 300K context
184
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-lite-v1:0 # Fast, cost-effective
185
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-micro-v1:0 # Ultra-fast, text-only
186
- ```
187
-
188
- Titan:
189
- ```env
190
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-premier-v1:0 # Largest
191
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-express-v1 # Fast
192
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-lite-v1 # Cheapest
193
- ```
194
-
195
- **Meta Llama Models**
196
- ```env
197
- AWS_BEDROCK_MODEL_ID=meta.llama3-1-70b-instruct-v1:0 # Most capable
198
- AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0 # Fast, efficient
199
- ```
200
-
201
- **Mistral Models**
202
- ```env
203
- AWS_BEDROCK_MODEL_ID=mistral.mistral-large-2407-v1:0 # Largest, coding, multilingual
204
- AWS_BEDROCK_MODEL_ID=mistral.mistral-small-2402-v1:0 # Efficient
205
- AWS_BEDROCK_MODEL_ID=mistral.mixtral-8x7b-instruct-v0:1 # Mixture of experts
206
- ```
207
-
208
- **Cohere Command Models**
209
- ```env
210
- AWS_BEDROCK_MODEL_ID=cohere.command-r-plus-v1:0 # Best for RAG, search
211
- AWS_BEDROCK_MODEL_ID=cohere.command-r-v1:0 # Balanced
212
- ```
213
-
214
- **AI21 Jamba Models**
215
- ```env
216
- AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-large-v1:0 # Hybrid architecture, 256K context
217
- AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-mini-v1:0 # Fast
218
- ```
219
-
220
- #### Pricing (per 1M tokens)
221
-
222
- | Model | Input | Output |
223
- |-------|-------|--------|
224
- | Claude 3.5 Sonnet | $3.00 | $15.00 |
225
- | Claude 3 Opus | $15.00 | $75.00 |
226
- | Claude 3 Haiku | $0.25 | $1.25 |
227
- | Titan Text Express | $0.20 | $0.60 |
228
- | Llama 3 70B | $0.99 | $0.99 |
229
- | Nova Pro | $0.80 | $3.20 |
230
-
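Since the prices above are quoted per 1M tokens, the cost of a request is `(input tokens × input price + output tokens × output price) / 1,000,000`. A short sketch of that arithmetic, using the table's figures; `estimateCostUsd` and `PRICE_PER_MILLION` are names made up for this example, and actual Bedrock pricing may differ from the table.

```javascript
// USD per 1M tokens, copied from the pricing table above.
const PRICE_PER_MILLION = {
  "Claude 3.5 Sonnet": { input: 3.0, output: 15.0 },
  "Claude 3 Opus": { input: 15.0, output: 75.0 },
  "Claude 3 Haiku": { input: 0.25, output: 1.25 },
  "Titan Text Express": { input: 0.2, output: 0.6 },
  "Llama 3 70B": { input: 0.99, output: 0.99 },
  "Nova Pro": { input: 0.8, output: 3.2 },
};

// Estimate the cost in USD of one request from its token counts.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICE_PER_MILLION[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// 200,000 input + 50,000 output tokens on Claude 3.5 Sonnet:
// (200000 * 3 + 50000 * 15) / 1e6 = 1.35 USD
```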
231
- #### Important Notes
232
-
233
- ⚠️ **Tool Calling:** Only **Claude models** support tool calling on Bedrock. Other models work via Converse API but won't use Read/Write/Bash tools.
234
-
235
- 📖 **Full Documentation:** See [BEDROCK_MODELS.md](../BEDROCK_MODELS.md) for complete model catalog with capabilities and use cases.
236
-
237
- ---
238
-
239
- ### 2. Databricks (Claude Sonnet 4.5, Opus 4.5)
240
-
241
- **Best for:** Enterprise production use, managed Claude endpoints
242
-
243
- #### Configuration
244
-
245
- ```env
246
- MODEL_PROVIDER=databricks
247
- DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
248
- DATABRICKS_API_KEY=dapi1234567890abcdef
249
- ```
250
-
251
- Optional endpoint path override:
252
- ```env
253
- DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
254
- ```
255
-
256
- #### Getting Databricks Credentials
257
-
258
- 1. Log in to your Databricks workspace
259
- 2. Navigate to **Settings** → **User Settings**
260
- 3. Click **Generate New Token**
261
- 4. Copy the token (this is your `DATABRICKS_API_KEY`)
262
- 5. Your workspace URL is the base URL (e.g., `https://your-workspace.cloud.databricks.com`)
263
-
264
- #### Available Models
265
-
266
- - **Claude Sonnet 4.5** - Excellent for tool calling, balanced performance
267
- - **Claude Opus 4.5** - Most capable model for complex reasoning
268
-
269
- #### Pricing
270
-
271
- Contact Databricks for enterprise pricing.
272
-
273
- ---
274
-
275
- ### 3. OpenRouter (100+ Models)
276
-
277
- **Best for:** Quick setup, model flexibility, cost optimization
278
-
279
- #### Configuration
280
-
281
- ```env
282
- MODEL_PROVIDER=openrouter
283
- OPENROUTER_API_KEY=sk-or-v1-your-key
284
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
285
- OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
286
- ```
287
-
288
- Optional for hybrid routing:
289
- ```env
290
- OPENROUTER_MAX_TOOLS_FOR_ROUTING=15 # Max tools to route to OpenRouter
291
- ```
292
-
293
- #### Getting OpenRouter API Key
294
-
295
- 1. Visit [openrouter.ai](https://openrouter.ai)
296
- 2. Sign in with GitHub, Google, or email
297
- 3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
298
- 4. Create a new API key
299
- 5. Add credits (pay-as-you-go, no subscription required)
300
-
301
- #### Popular Models
302
-
303
- **Claude Models (Best for Coding)**
304
- ```env
305
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet # $3/$15 per 1M tokens
306
- OPENROUTER_MODEL=anthropic/claude-opus-4.5 # $15/$75 per 1M tokens
307
- OPENROUTER_MODEL=anthropic/claude-3-haiku # $0.25/$1.25 per 1M tokens
308
- ```
309
-
310
- **OpenAI Models**
311
- ```env
312
- OPENROUTER_MODEL=openai/gpt-4o # $2.50/$10 per 1M tokens
313
- OPENROUTER_MODEL=openai/gpt-4o-mini # $0.15/$0.60 per 1M tokens (default)
314
- OPENROUTER_MODEL=openai/o1-preview # $15/$60 per 1M tokens
315
- OPENROUTER_MODEL=openai/o1-mini # $3/$12 per 1M tokens
316
- ```
317
-
318
- **Google Models**
319
- ```env
320
- OPENROUTER_MODEL=google/gemini-pro-1.5 # $1.25/$5 per 1M tokens
321
- OPENROUTER_MODEL=google/gemini-flash-1.5 # $0.075/$0.30 per 1M tokens
322
- ```
323
-
324
- **Meta Llama Models**
325
- ```env
326
- OPENROUTER_MODEL=meta-llama/llama-3.1-405b # $2.70/$2.70 per 1M tokens
327
- OPENROUTER_MODEL=meta-llama/llama-3.1-70b # $0.52/$0.75 per 1M tokens
328
- OPENROUTER_MODEL=meta-llama/llama-3.1-8b # $0.06/$0.06 per 1M tokens
329
- ```
330
-
331
- **Mistral Models**
332
- ```env
333
- OPENROUTER_MODEL=mistralai/mistral-large # $2/$6 per 1M tokens
334
- OPENROUTER_MODEL=mistralai/codestral-latest # $0.30/$0.90 per 1M tokens
335
- ```
336
-
337
- **DeepSeek Models**
338
- ```env
339
- OPENROUTER_MODEL=deepseek/deepseek-chat # $0.14/$0.28 per 1M tokens
340
- OPENROUTER_MODEL=deepseek/deepseek-coder # $0.14/$0.28 per 1M tokens
341
- ```
342
-
343
- #### Benefits
344
-
345
- - ✅ **100+ models** through one API
346
- - ✅ **Automatic fallbacks** if primary model unavailable
347
- - ✅ **Competitive pricing** with volume discounts
348
- - ✅ **Full tool calling support**
349
- - ✅ **No monthly fees** - pay only for usage
350
- - ✅ **Rate limit pooling** across models
351
-
352
- See [openrouter.ai/models](https://openrouter.ai/models) for complete list with pricing.
353
-
354
- ---
355
-
356
- ### 4. Ollama (Local Models)
357
-
358
- **Best for:** Local development, privacy, offline use, no API costs
359
-
360
- #### Configuration
361
-
362
- ```env
363
- MODEL_PROVIDER=ollama
364
- OLLAMA_ENDPOINT=http://localhost:11434 # Or any remote IP/hostname
365
- OLLAMA_MODEL=llama3.1:8b
366
- OLLAMA_TIMEOUT_MS=120000
367
- ```
368
-
369
- > 🌐 **Remote Support**: `OLLAMA_ENDPOINT` can be any address - `http://192.168.1.100:11434`, `http://gpu-server:11434`, etc. See [Remote/Network Configuration](#remotenetwork-configuration).
370
-
371
- #### Performance Optimization
372
-
373
- **Prevent Cold Starts:** Ollama unloads models after 5 minutes of inactivity by default. This causes slow first requests (10-30+ seconds) while the model reloads. To keep models loaded:
374
-
375
- **Option 1: Environment Variable (Recommended)**
376
- ```bash
377
- # Set on Ollama server (not Lynkr)
378
- # macOS
379
- launchctl setenv OLLAMA_KEEP_ALIVE "24h"
380
-
381
- # Linux (systemd): run `sudo systemctl edit ollama` and add these lines to the override file:
382
- #   [Service]
383
- #   Environment="OLLAMA_KEEP_ALIVE=24h"
384
-
385
- # Docker
386
- docker run -e OLLAMA_KEEP_ALIVE=24h -d ollama/ollama
387
- ```
388
-
389
- **Option 2: Per-Request Keep Alive**
390
- ```bash
391
- curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","keep_alive":"24h"}'
392
- ```
393
-
394
- **Keep Alive Values:**
395
- | Value | Behavior |
396
- |-------|----------|
397
- | `5m` | Default - unload after 5 minutes |
398
- | `24h` | Keep loaded for 24 hours |
399
- | `-1` | Never unload (keep forever) |
400
- | `0` | Unload immediately after request |
401
-
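The per-request option can also be sent from Node. The sketch below only builds the request, mirroring the `curl` call above (`POST /api/generate` with `model` and `keep_alive`); `buildKeepAliveRequest` is a hypothetical helper, and sending the request assumes a running Ollama server and Node 18+ for the global `fetch`.

```javascript
// Build a request that asks Ollama to keep a model loaded.
// keepAlive takes the values from the table above, e.g. "24h", -1, or 0.
function buildKeepAliveRequest(endpoint, model, keepAlive) {
  return {
    url: new URL("/api/generate", endpoint).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, keep_alive: keepAlive }),
    },
  };
}

// Usage (needs a reachable Ollama server):
// const { url, options } = buildKeepAliveRequest(
//   "http://localhost:11434", "llama3.1:8b", "24h");
// await fetch(url, options);
```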
402
- #### Installation & Setup
403
-
404
- ```bash
405
- # Install Ollama
406
- brew install ollama # macOS
407
- # Or download from: https://ollama.ai/download
408
-
409
- # Start Ollama service
410
- ollama serve
411
-
412
- # Pull a model
413
- ollama pull llama3.1:8b
414
-
415
- # Verify model is available
416
- ollama list
417
- ```
418
-
419
- #### Recommended Models
420
-
421
- **For Tool Calling** ✅ (Required for Claude Code CLI)
422
- ```bash
423
- ollama pull llama3.1:8b # Good balance (4.7GB)
424
- ollama pull llama3.2 # Llama 3.2 3B, smallest option (2.0GB)
425
- ollama pull qwen2.5:14b # Strong reasoning (8GB, 7b struggles with tools)
426
- ollama pull mistral:7b-instruct # Fast and capable (4.1GB)
427
- ```
428
-
429
- **NOT Recommended for Tools** ❌
430
- ```bash
431
- qwen2.5-coder # Code-only, slow with tool calling
432
- codellama # Code-only, poor tool support
433
- ```
434
-
435
- #### Tool Calling Support
436
-
437
- Lynkr supports **native tool calling** for compatible Ollama models:
438
-
439
- - ✅ **Supported models**: llama3.1, llama3.2, qwen2.5, mistral, mistral-nemo
440
- - ✅ **Automatic detection**: Lynkr detects tool-capable models
441
- - ✅ **Format conversion**: Transparent Anthropic ↔ Ollama conversion
442
- - ❌ **Unsupported models**: llama3, older models (tools filtered automatically)
443
-
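The detection and filtering described above could look roughly like this. It is an illustrative sketch, not Lynkr's actual implementation: the family list is copied from the bullets above, and `supportsTools` and `filterTools` are hypothetical names. Matching on the exact family name (the part before `:`) keeps `llama3` from being mistaken for `llama3.1`.

```javascript
// Model families listed above as tool-capable.
const TOOL_CAPABLE_FAMILIES = [
  "llama3.1",
  "llama3.2",
  "qwen2.5",
  "mistral",
  "mistral-nemo",
];

// "llama3.1:8b" -> family "llama3.1"; compare the whole family name.
function supportsTools(model) {
  const family = model.split(":")[0].toLowerCase();
  return TOOL_CAPABLE_FAMILIES.includes(family);
}

// Drop tool definitions for models that cannot use them.
function filterTools(model, tools) {
  return supportsTools(model) ? tools : [];
}
```

With this sketch, `filterTools("llama3:8b", tools)` returns an empty list, while `filterTools("mistral:7b-instruct", tools)` passes the tools through unchanged.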
444
- #### Pricing
445
-
446
- **100% FREE** - Models run on your hardware with no API costs.
447
-
448
- #### Model Sizes
449
-
450
- - **7B models**: ~4-5GB download, 8GB RAM required
451
- - **8B models**: ~4.7GB download, 8GB RAM required
452
- - **14B models**: ~8GB download, 16GB RAM required
453
- - **32B models**: ~18GB download, 32GB RAM required
454
-
455
- ---
456
-
457
- ### 5. llama.cpp (GGUF Models)
458
-
459
- **Best for:** Maximum performance, custom quantization, any GGUF model
460
-
461
- #### Configuration
462
-
463
- ```env
464
- MODEL_PROVIDER=llamacpp
465
- LLAMACPP_ENDPOINT=http://localhost:8080 # Or any remote IP/hostname
466
- LLAMACPP_MODEL=qwen2.5-coder-7b
467
- LLAMACPP_TIMEOUT_MS=120000
468
- ```
469
-
470
- Optional API key (for secured servers):
471
- ```env
472
- LLAMACPP_API_KEY=your-optional-api-key
473
- ```
474
-
475
- > 🌐 **Remote Support**: `LLAMACPP_ENDPOINT` can be any address. See [Remote/Network Configuration](#remotenetwork-configuration).
476
-
477
- #### Installation & Setup
478
-
479
- ```bash
480
- # Clone and build llama.cpp
481
- git clone https://github.com/ggerganov/llama.cpp
482
- cd llama.cpp && make
483
-
484
- # Download a GGUF model (example: Qwen2.5-Coder-7B)
485
- wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
486
-
487
- # Start llama-server
488
- ./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080
489
-
490
- # Verify server is running
491
- curl http://localhost:8080/health
492
- ```
493
-
494
- #### GPU Support
495
-
496
- llama.cpp supports multiple GPU backends:
497
-
498
- - **CUDA** (NVIDIA): `make LLAMA_CUDA=1`
499
- - **Metal** (Apple Silicon): `make LLAMA_METAL=1`
500
- - **ROCm** (AMD): `make LLAMA_HIPBLAS=1`
501
- - **Vulkan** (Universal): `make LLAMA_VULKAN=1`
502
-
503
- #### llama.cpp vs Ollama
504
-
505
- | Feature | Ollama | llama.cpp |
506
- |---------|--------|-----------|
507
- | Setup | Easy (app) | Manual (compile/download) |
508
- | Model Format | Ollama-specific | Any GGUF model |
509
- | Performance | Good | **Excellent** (optimized C++) |
510
- | GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
511
- | Memory Usage | Higher | **Lower** (quantization options) |
512
- | API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
513
- | Flexibility | Limited models | **Any GGUF** from HuggingFace |
514
- | Tool Calling | Limited models | Grammar-based, more reliable |
515
-
516
- **Choose llama.cpp when you need:**
517
- - Maximum performance
518
- - Specific quantization options (Q4, Q5, Q8)
519
- - GGUF models not available in Ollama
520
- - Fine-grained control over inference parameters
521
-
522
- ---
523
-
524
- ### 6. Azure OpenAI
525
-
526
- **Best for:** Azure integration, Microsoft ecosystem, GPT-4o, o1, o3
527
-
528
- #### Configuration
529
-
530
- ```env
531
- MODEL_PROVIDER=azure-openai
532
- AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2025-01-01-preview
533
- AZURE_OPENAI_API_KEY=your-azure-api-key
534
- AZURE_OPENAI_DEPLOYMENT=gpt-4o
535
- ```
536
-
537
- Optional:
538
- ```env
539
- AZURE_OPENAI_API_VERSION=2024-08-01-preview # API version (preview release)
540
- ```
541
-
542
- #### Getting Azure OpenAI Credentials
543
-
544
- 1. Log in to [Azure Portal](https://portal.azure.com)
545
- 2. Navigate to **Azure OpenAI** service
546
- 3. Go to **Keys and Endpoint**
547
- 4. Copy **KEY 1** (this is your API key)
548
- 5. Copy **Endpoint** URL
549
- 6. Create a deployment (gpt-4o, gpt-4o-mini, etc.)
550
-
551
- #### Important: Full Endpoint URL Required
552
-
553
- The `AZURE_OPENAI_ENDPOINT` must include:
554
- - Resource name
555
- - Deployment path
556
- - API version query parameter
557
-
558
- **Example:**
559
- ```
560
- https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview
561
- ```
562
-
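To make the three required parts concrete, here is a small sketch that assembles the full URL and checks an existing one. `buildAzureOpenAIEndpoint` and `isFullAzureEndpoint` are hypothetical helpers written for this example; the URL shape is taken from the example above.

```javascript
// Assemble the endpoint from resource name, deployment, and API version.
function buildAzureOpenAIEndpoint(resource, deployment, apiVersion) {
  const url = new URL(`https://${resource}.openai.azure.com`);
  url.pathname =
    `/openai/deployments/${encodeURIComponent(deployment)}/chat/completions`;
  url.searchParams.set("api-version", apiVersion);
  return url.toString();
}

// Check that an endpoint contains all three required parts.
function isFullAzureEndpoint(endpoint) {
  const url = new URL(endpoint);
  return (
    url.hostname.endsWith(".openai.azure.com") &&
    /^\/openai\/deployments\/[^/]+\/chat\/completions$/.test(url.pathname) &&
    url.searchParams.has("api-version")
  );
}
```

A bare resource URL such as `https://your-resource.openai.azure.com` fails the check because it lacks both the deployment path and the `api-version` parameter.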
563
- #### Available Deployments
564
-
565
- You can deploy any of these models in Azure AI Foundry:
566
-
567
- ```env
568
- AZURE_OPENAI_DEPLOYMENT=gpt-4o # Latest GPT-4o
569
- AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini # Smaller, faster, cheaper
570
- AZURE_OPENAI_DEPLOYMENT=gpt-5-chat # GPT-5 (if available)
571
- AZURE_OPENAI_DEPLOYMENT=o1-preview # Reasoning model
572
- AZURE_OPENAI_DEPLOYMENT=o3-mini # Latest reasoning model
573
- AZURE_OPENAI_DEPLOYMENT=kimi-k2 # Kimi K2 (if available)
574
- ```
575
-
576
- ---
577
-
578
- ### 7. Azure Anthropic
579
-
580
- **Best for:** Azure-hosted Claude models with enterprise integration
581
-
582
- #### Configuration
583
-
584
- ```env
585
- MODEL_PROVIDER=azure-anthropic
586
- AZURE_ANTHROPIC_ENDPOINT=https://your-resource.services.ai.azure.com/anthropic/v1/messages
587
- AZURE_ANTHROPIC_API_KEY=your-azure-api-key
588
- AZURE_ANTHROPIC_VERSION=2023-06-01
589
- ```
590
-
591
- #### Getting Azure Anthropic Credentials
592
-
593
- 1. Log in to [Azure Portal](https://portal.azure.com)
594
- 2. Navigate to your Azure Anthropic resource
595
- 3. Go to **Keys and Endpoint**
596
- 4. Copy the API key
597
- 5. Copy the endpoint URL (includes `/anthropic/v1/messages`)
598
-
599
- #### Available Models
600
-
601
- - **Claude Sonnet 4.5** - Best for tool calling, balanced
602
- - **Claude Opus 4.5** - Most capable for complex reasoning
603
-
604
- ---
605
-
606
- ### 8. OpenAI (Direct)
607
-
608
- **Best for:** Direct OpenAI API access, lowest latency
609
-
610
- #### Configuration
611
-
612
- ```env
613
- MODEL_PROVIDER=openai
614
- OPENAI_API_KEY=sk-your-openai-api-key
615
- OPENAI_MODEL=gpt-4o
616
- OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
617
- ```
618
-
619
- Optional for organization-level keys:
620
- ```env
621
- OPENAI_ORGANIZATION=org-your-org-id
622
- ```
623
-
624
- #### Getting OpenAI API Key
625
-
626
- 1. Visit [platform.openai.com](https://platform.openai.com)
627
- 2. Sign up or log in
628
- 3. Go to [API Keys](https://platform.openai.com/api-keys)
629
- 4. Create a new API key
630
- 5. Add credits to your account (pay-as-you-go)
631
-
632
- #### Available Models
633
-
634
- ```env
635
- OPENAI_MODEL=gpt-4o # Latest GPT-4o ($2.50/$10 per 1M)
636
- OPENAI_MODEL=gpt-4o-mini # Smaller, faster ($0.15/$0.60 per 1M)
637
- OPENAI_MODEL=gpt-4-turbo # GPT-4 Turbo
638
- OPENAI_MODEL=o1-preview # Reasoning model
639
- OPENAI_MODEL=o1-mini # Smaller reasoning model
640
- ```
641
-
642
- #### Benefits
643
-
644
- - ✅ **Direct API access** - No intermediaries, lowest latency
645
- - ✅ **Full tool calling support** - Excellent function calling
646
- - ✅ **Parallel tool calls** - Execute multiple tools simultaneously
647
- - ✅ **Organization support** - Use org-level API keys
648
- - ✅ **Simple setup** - Just one API key needed
649
-
650
- ---
651
-
652
- ### 9. LM Studio (Local with GUI)
653
-
654
- **Best for:** Local models with graphical interface
655
-
656
- #### Configuration
657
-
658
- ```env
659
- MODEL_PROVIDER=lmstudio
660
- LMSTUDIO_ENDPOINT=http://localhost:1234
661
- LMSTUDIO_MODEL=default
662
- LMSTUDIO_TIMEOUT_MS=120000
663
- ```
664
-
665
- Optional API key (for secured servers):
666
- ```env
667
- LMSTUDIO_API_KEY=your-optional-api-key
668
- ```
669
-
670
- #### Setup
671
-
672
- 1. Download and install [LM Studio](https://lmstudio.ai)
673
- 2. Launch LM Studio
674
- 3. Download a model (e.g., Qwen2.5-Coder-7B, Llama 3.1)
675
- 4. Click **Start Server** (default port: 1234)
676
- 5. Configure Lynkr to use LM Studio
677
-
678
- #### Benefits
679
-
680
- - ✅ **Graphical interface** for model management
681
- - ✅ **Easy model downloads** from HuggingFace
682
- - ✅ **Built-in server** with OpenAI-compatible API
683
- - ✅ **GPU acceleration** support
684
- - ✅ **Model presets** and configurations
685
-
686
- ---
687
-
688
- ### 10. MLX OpenAI Server (Apple Silicon)
689
-
690
- **Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)
691
-
692
- [MLX OpenAI Server](https://github.com/cubist38/mlx-openai-server) is a high-performance local LLM server optimized for Apple's MLX framework. It provides OpenAI-compatible endpoints for text, vision, audio, and image generation models.
693
-
694
- #### Installation
695
-
696
- ```bash
697
- # Create virtual environment
698
- python3.11 -m venv .venv
699
- source .venv/bin/activate
700
-
701
- # Install
702
- pip install mlx-openai-server
703
-
704
- # Optional: for audio transcription
705
- brew install ffmpeg
706
- ```
707
-
708
- #### Start the Server
709
-
710
- ```bash
711
- # Text/Code models (recommended for coding)
712
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit --model-type lm
713
-
714
- # Smaller model (faster, less RAM)
715
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-1.5B-Instruct-4bit --model-type lm
716
-
717
- # General purpose
718
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-3B-Instruct-4bit --model-type lm
719
- ```
720
-
721
- Server runs at `http://localhost:8000/v1` by default.
722
-
723
- #### Configuration
724
-
725
- ```env
726
- MODEL_PROVIDER=openai
727
- OPENAI_ENDPOINT=http://localhost:8000/v1/chat/completions
728
- OPENAI_API_KEY=not-needed
729
- ```
730
-
731
- > 🌐 **Remote Support**: `OPENAI_ENDPOINT` can be any address (e.g., `http://192.168.1.100:8000/v1/chat/completions` for a Mac Studio GPU server).
732
-
733
- #### Recommended Models for Coding
734
-
735
- | Model | Size | RAM | Best for |
736
- |-------|------|-----|---------|
737
- | `Qwen2.5-Coder-1.5B-Instruct-4bit` | ~1GB | 4GB | Fast, simple code tasks |
738
- | `Qwen2.5-3B-Instruct-4bit` | ~2GB | 6GB | General + code |
739
- | `Qwen2.5-Coder-7B-Instruct-4bit` | ~4GB | 8GB | Best for coding |
740
- | `Qwen2.5-Coder-14B-Instruct-4bit` | ~8GB | 16GB | Complex reasoning |
741
- | `Llama-3.2-3B-Instruct-4bit` | ~2GB | 6GB | General purpose |
742
- | `Phi-3-mini-4k-instruct-4bit` | ~2GB | 6GB | Reasoning tasks |
743
-
744
- #### Server Options
745
-
746
- ```bash
747
- # --host 0.0.0.0 allows remote connections; --port 8000 is the default port.
748
- # --max-concurrency sets parallel requests; --context-length sets the max context window.
749
- mlx-openai-server launch --model-type lm \
750
- --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
751
- --host 0.0.0.0 --port 8000 \
752
- --max-concurrency 2 \
753
- --context-length 4096
754
- ```
755
-
- #### MLX vs Ollama Comparison
-
- | Feature | MLX OpenAI Server | Ollama |
- |---------|-------------------|--------|
- | Platform | Apple Silicon only | Cross-platform |
- | Performance | Native MLX optimization | Good on Apple Silicon |
- | Model Format | HuggingFace MLX | Ollama-specific |
- | Vision/Audio | ✅ Built-in | Limited |
- | Image Generation | ✅ Flux support | ❌ |
- | Quantization | 4/8/16-bit flexible | Model-specific |
-
- #### Test Connection
-
- ```bash
- curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
- ```
-
- #### Pricing
-
- **100% FREE** - Models run locally on your Apple Silicon Mac.
-
- ---
-
- ## Hybrid Routing & Fallback
-
- ### Intelligent 3-Tier Routing
-
- Optimize costs by routing requests based on complexity:
-
- ```env
- # Enable hybrid routing
- PREFER_OLLAMA=true
- FALLBACK_ENABLED=true
-
- # Configure providers for each tier
- MODEL_PROVIDER=ollama
- OLLAMA_MODEL=llama3.1:8b
- OLLAMA_MAX_TOOLS_FOR_ROUTING=3
-
- # Mid-tier (moderate complexity)
- OPENROUTER_API_KEY=your-key
- OPENROUTER_MODEL=openai/gpt-4o-mini
- OPENROUTER_MAX_TOOLS_FOR_ROUTING=15
-
- # Heavy workload (complex requests)
- FALLBACK_PROVIDER=databricks
- DATABRICKS_API_BASE=your-base
- DATABRICKS_API_KEY=your-key
- ```
-
- ### How It Works
-
- **Routing Logic:**
- 1. **0-2 tools**: Try Ollama first (free, local, fast)
- 2. **3-15 tools**: Route to OpenRouter (affordable cloud)
- 3. **16+ tools**: Route directly to Databricks/Azure (most capable)
-
- **Automatic Fallback:**
- - ❌ If Ollama fails → fall back to OpenRouter or Databricks
- - ❌ If OpenRouter fails → fall back to Databricks
- - ✅ Transparent to the user
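
The routing and fallback rules above can be sketched in a few lines of JavaScript. This is only an illustration, not Lynkr's actual routing code: `pickProvider` and `fallbackChain` are hypothetical names, the thresholds mirror `OLLAMA_MAX_TOOLS_FOR_ROUTING` and `OPENROUTER_MAX_TOOLS_FOR_ROUTING`, and the boundary handling (exclusive for Ollama, inclusive for OpenRouter) is an assumption chosen to reproduce the documented 0-2 / 3-15 / 16+ ranges. The fallback order Ollama, then OpenRouter, then the fallback provider is likewise assumed.

```javascript
// Hypothetical sketch of tool-count routing; not Lynkr's real implementation.
function pickProvider(toolCount, opts = {}) {
  const {
    ollamaMaxTools = 3,        // mirrors OLLAMA_MAX_TOOLS_FOR_ROUTING
    openRouterMaxTools = 15,   // mirrors OPENROUTER_MAX_TOOLS_FOR_ROUTING
    fallbackProvider = "databricks", // mirrors FALLBACK_PROVIDER
  } = opts;

  // Assumed boundaries: 0-2 tools stay local, 3-15 go to the mid tier.
  if (toolCount < ollamaMaxTools) return "ollama";
  if (toolCount <= openRouterMaxTools) return "openrouter";
  return fallbackProvider;
}

// Providers to try, in order, when the selected one fails (assumed order).
function fallbackChain(provider, fallbackProvider = "databricks") {
  if (provider === "ollama") return ["ollama", "openrouter", fallbackProvider];
  if (provider === "openrouter") return ["openrouter", fallbackProvider];
  return [provider];
}
```

Under these assumptions, a request with 2 tools is tried on Ollama first, while one with 16 tools skips both cheaper tiers and goes straight to the fallback provider.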
817
-
818
- ### Cost Savings
819
-
820
- - **65-100%** for requests that stay on Ollama
821
- - **40-87%** faster for simple requests
822
- - **Privacy**: Simple queries never leave your machine
823
-
824
- ### Configuration Options
825
-
826
- | Variable | Description | Default |
827
- |----------|-------------|---------|
828
- | `PREFER_OLLAMA` | Enable Ollama preference for simple requests | `false` |
829
- | `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
830
- | `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
831
- | `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
832
- | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Max tools to route to OpenRouter | `15` |
833
-
834
- **Note:** Local providers (ollama, llamacpp, lmstudio) cannot be used as `FALLBACK_PROVIDER`.
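
The restriction in the note above could be enforced at startup with a small check like the following. This is a minimal sketch, not Lynkr's own config code: `validateFallbackProvider` and `LOCAL_PROVIDERS` are hypothetical names, and the defaults are taken from the Configuration Options table.

```javascript
// Hypothetical startup check; names are illustrative, not Lynkr's API.
const LOCAL_PROVIDERS = new Set(["ollama", "llamacpp", "lmstudio"]);

function validateFallbackProvider(env) {
  // Defaults follow the table: FALLBACK_ENABLED=true, FALLBACK_PROVIDER=databricks.
  const enabled = (env.FALLBACK_ENABLED ?? "true") !== "false";
  const provider = (env.FALLBACK_PROVIDER ?? "databricks").toLowerCase();

  if (enabled && LOCAL_PROVIDERS.has(provider)) {
    throw new Error(
      `FALLBACK_PROVIDER cannot be a local provider: ${provider}`
    );
  }
  return provider;
}
```

Calling `validateFallbackProvider(process.env)` before the server starts would surface a misconfigured fallback immediately rather than on the first failed request.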
835
-
836
- ---
837
-
838
- ## Complete Configuration Reference
839
-
840
- ### Core Variables
841
-
842
- | Variable | Description | Default |
843
- |----------|-------------|---------|
844
- | `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`) | `databricks` |
845
- | `PORT` | HTTP port for proxy server | `8081` |
846
- | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
847
- | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
848
- | `TOOL_EXECUTION_MODE` | Where tools execute (`server`, `client`) | `server` |
849
- | `MODEL_DEFAULT` | Override default model/deployment name | Provider-specific |
850
-
851
- ### Provider-Specific Variables
852
-
853
- See individual provider sections above for complete variable lists.
854
-
855
- ---
856
-
- ## Provider Comparison
-
- ### Feature Comparison
-
- | Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp | LM Studio |
- |---------|-----------|---------|--------|--------------|-----------------|------------|--------|-----------|-----------|
- | **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium | Easy |
- | **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | **Free** | **Free** | **Free** |
- | **Latency** | Low | Low | Low | Low | Low | Medium | **Very Low** | **Very Low** | **Very Low** |
- | **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 50+ | Unlimited | 50+ |
- | **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good | Fair |
- | **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent | 32K-128K |
- | **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
- | **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | **Local** | **Local** | **Local** |
- | **Offline** | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
-
- _* On Bedrock, tool calling is supported only by Claude models._
-
- ### Cost Comparison (per 1M tokens)
-
- | Provider | Model | Input | Output |
- |----------|-------|-------|--------|
- | **Bedrock** | Claude 3.5 Sonnet | $3.00 | $15.00 |
- | **Databricks** | Contact for pricing | - | - |
- | **OpenRouter** | Claude 3.5 Sonnet | $3.00 | $15.00 |
- | **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
- | **OpenAI** | GPT-4o | $2.50 | $10.00 |
- | **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
- | **Ollama** | Any model | **FREE** | **FREE** |
- | **llama.cpp** | Any model | **FREE** | **FREE** |
- | **LM Studio** | Any model | **FREE** | **FREE** |
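
To turn the per-1M-token prices above into a per-request figure, a rough estimate can be computed as below. This is an illustrative sketch only: `PRICES` and `estimateCost` are hypothetical names, the model keys are made up for the example, and the prices are simply copied from the table, so they may not match current provider pricing.

```javascript
// USD per 1M tokens, copied from the table above (keys are illustrative).
const PRICES = {
  "claude-3.5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  local: { input: 0, output: 0 }, // Ollama, llama.cpp, LM Studio
};

// Estimated cost in USD for a single request.
function estimateCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (
    (inputTokens * price.input + outputTokens * price.output) / 1_000_000
  );
}
```

At these prices, a request with 10,000 input tokens and 2,000 output tokens comes to about $0.045 on GPT-4o and about $0.0027 on GPT-4o mini, and costs nothing on a local model.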
-
- ---
-
- ## Next Steps
-
- - **[Installation Guide](installation.md)** - Install Lynkr with your chosen provider
- - **[Claude Code CLI Setup](claude-code-cli.md)** - Connect Claude Code CLI
- - **[Cursor Integration](cursor-integration.md)** - Connect Cursor IDE
- - **[Embeddings Configuration](embeddings.md)** - Enable @Codebase semantic search
- - **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
-
- ---
-
- ## Getting Help
-
- - **[FAQ](faq.md)** - Frequently asked questions
- - **[Troubleshooting Guide](troubleshooting.md)** - Common issues
- - **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
- - **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report bugs