lynkr 8.0.0 → 9.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. package/.lynkr/telemetry.db +0 -0
  2. package/.lynkr/telemetry.db-shm +0 -0
  3. package/.lynkr/telemetry.db-wal +0 -0
  4. package/README.md +196 -322
  5. package/lynkr-skill.tar.gz +0 -0
  6. package/package.json +4 -3
  7. package/src/api/openai-router.js +64 -13
  8. package/src/api/providers-handler.js +171 -3
  9. package/src/api/router.js +9 -2
  10. package/src/clients/circuit-breaker.js +10 -247
  11. package/src/clients/codex-process.js +342 -0
  12. package/src/clients/codex-utils.js +143 -0
  13. package/src/clients/databricks.js +210 -63
  14. package/src/clients/resilience.js +540 -0
  15. package/src/clients/retry.js +22 -167
  16. package/src/clients/standard-tools.js +23 -0
  17. package/src/config/index.js +77 -0
  18. package/src/context/compression.js +42 -9
  19. package/src/context/distill.js +492 -0
  20. package/src/orchestrator/index.js +48 -8
  21. package/src/routing/complexity-analyzer.js +258 -5
  22. package/src/routing/index.js +12 -2
  23. package/src/routing/latency-tracker.js +148 -0
  24. package/src/routing/model-tiers.js +2 -0
  25. package/src/routing/quality-scorer.js +113 -0
  26. package/src/routing/telemetry.js +464 -0
  27. package/src/server.js +13 -12
  28. package/src/tools/code-graph.js +538 -0
  29. package/src/tools/code-mode.js +304 -0
  30. package/src/tools/index.js +4 -0
  31. package/src/tools/lazy-loader.js +18 -0
  32. package/src/tools/mcp-remote.js +7 -0
  33. package/src/tools/smart-selection.js +11 -0
  34. package/src/tools/tinyfish.js +358 -0
  35. package/src/tools/truncate.js +1 -0
  36. package/src/utils/payload.js +206 -0
  37. package/src/utils/perf-timer.js +80 -0
  38. package/.github/FUNDING.yml +0 -15
  39. package/.github/workflows/README.md +0 -215
  40. package/.github/workflows/ci.yml +0 -69
  41. package/.github/workflows/index.yml +0 -62
  42. package/.github/workflows/web-tools-tests.yml +0 -56
  43. package/CITATIONS.bib +0 -6
  44. package/DEPLOYMENT.md +0 -1001
  45. package/LYNKR-TUI-PLAN.md +0 -984
  46. package/PERFORMANCE-REPORT.md +0 -866
  47. package/PLAN-per-client-model-routing.md +0 -252
  48. package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
  49. package/docs/BingSiteAuth.xml +0 -4
  50. package/docs/docs-style.css +0 -478
  51. package/docs/docs.html +0 -198
  52. package/docs/google5be250e608e6da39.html +0 -1
  53. package/docs/index.html +0 -577
  54. package/docs/index.md +0 -584
  55. package/docs/robots.txt +0 -4
  56. package/docs/sitemap.xml +0 -44
  57. package/docs/style.css +0 -1223
  58. package/docs/toon-integration-spec.md +0 -130
  59. package/documentation/README.md +0 -101
  60. package/documentation/api.md +0 -806
  61. package/documentation/claude-code-cli.md +0 -679
  62. package/documentation/codex-cli.md +0 -397
  63. package/documentation/contributing.md +0 -571
  64. package/documentation/cursor-integration.md +0 -734
  65. package/documentation/docker.md +0 -874
  66. package/documentation/embeddings.md +0 -762
  67. package/documentation/faq.md +0 -713
  68. package/documentation/features.md +0 -403
  69. package/documentation/headroom.md +0 -519
  70. package/documentation/installation.md +0 -758
  71. package/documentation/memory-system.md +0 -476
  72. package/documentation/production.md +0 -636
  73. package/documentation/providers.md +0 -1009
  74. package/documentation/routing.md +0 -476
  75. package/documentation/testing.md +0 -629
  76. package/documentation/token-optimization.md +0 -325
  77. package/documentation/tools.md +0 -697
  78. package/documentation/troubleshooting.md +0 -969
  79. package/final-test.js +0 -33
  80. package/headroom-sidecar/config.py +0 -93
  81. package/headroom-sidecar/requirements.txt +0 -14
  82. package/headroom-sidecar/server.py +0 -451
  83. package/monitor-agents.sh +0 -31
  84. package/scripts/audit-log-reader.js +0 -399
  85. package/scripts/compact-dictionary.js +0 -204
  86. package/scripts/test-deduplication.js +0 -448
  87. package/src/db/database.sqlite +0 -0
  88. package/te +0 -11622
  89. package/test/README.md +0 -212
  90. package/test/azure-openai-config.test.js +0 -213
  91. package/test/azure-openai-error-resilience.test.js +0 -238
  92. package/test/azure-openai-format-conversion.test.js +0 -354
  93. package/test/azure-openai-integration.test.js +0 -287
  94. package/test/azure-openai-routing.test.js +0 -175
  95. package/test/azure-openai-streaming.test.js +0 -171
  96. package/test/bedrock-integration.test.js +0 -457
  97. package/test/comprehensive-test-suite.js +0 -928
  98. package/test/config-validation.test.js +0 -207
  99. package/test/cursor-integration.test.js +0 -484
  100. package/test/format-conversion.test.js +0 -578
  101. package/test/hybrid-routing-integration.test.js +0 -269
  102. package/test/hybrid-routing-performance.test.js +0 -428
  103. package/test/llamacpp-integration.test.js +0 -882
  104. package/test/lmstudio-integration.test.js +0 -347
  105. package/test/memory/extractor.test.js +0 -398
  106. package/test/memory/retriever.test.js +0 -613
  107. package/test/memory/retriever.test.js.bak +0 -585
  108. package/test/memory/search.test.js +0 -537
  109. package/test/memory/search.test.js.bak +0 -389
  110. package/test/memory/store.test.js +0 -344
  111. package/test/memory/store.test.js.bak +0 -312
  112. package/test/memory/surprise.test.js +0 -300
  113. package/test/memory-performance.test.js +0 -472
  114. package/test/openai-integration.test.js +0 -683
  115. package/test/openrouter-error-resilience.test.js +0 -418
  116. package/test/passthrough-mode.test.js +0 -385
  117. package/test/performance-benchmark.js +0 -351
  118. package/test/performance-tests.js +0 -528
  119. package/test/routing.test.js +0 -225
  120. package/test/toon-compression.test.js +0 -131
  121. package/test/web-tools.test.js +0 -329
  122. package/test-agents-simple.js +0 -43
  123. package/test-cli-connection.sh +0 -33
  124. package/test-learning-unit.js +0 -126
  125. package/test-learning.js +0 -112
  126. package/test-parallel-agents.sh +0 -124
  127. package/test-parallel-direct.js +0 -155
  128. package/test-subagents.sh +0 -117
@@ -1,1009 +0,0 @@
- # Provider Configuration Guide
- 
- Complete configuration reference for all 12+ supported LLM providers. Each provider section includes setup instructions, model options, pricing, and example configurations.
- 
- ---
- 
- ## Overview
- 
- Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:
- 
- | Provider | Type | Models | Cost | Privacy | Setup Complexity |
- |----------|------|--------|------|---------|------------------|
- | **AWS Bedrock** | Cloud | 100+ (Claude, DeepSeek, Qwen, Nova, Titan, Llama, Mistral) | $-$$$ | Cloud | Easy |
- | **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud | Medium |
- | **OpenRouter** | Cloud | 100+ (GPT, Claude, Gemini, Llama, Mistral, etc.) | $-$$ | Cloud | Easy |
- | **Ollama** | Local | Unlimited (free, offline) | **FREE** | 🔒 100% Local | Easy |
- | **llama.cpp** | Local | Any GGUF model | **FREE** | 🔒 100% Local | Medium |
- | **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud | Medium |
- | **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud | Medium |
- | **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud | Easy |
- | **Moonshot AI (Kimi)** | Cloud | Kimi K2 (thinking + turbo) | $ | Cloud | Easy |
- | **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local | Easy |
- | **MLX OpenAI Server** | Local | Apple Silicon optimized | **FREE** | 🔒 100% Local | Easy |
- 
- ---
- 
- ## Configuration Methods
- 
- There are two routing modes. Choose based on your needs:
- 
- ### Static Routing (Single Provider)
- 
- Set `MODEL_PROVIDER` to send every request to a single provider, regardless of complexity:
- 
- ```bash
- export MODEL_PROVIDER=databricks
- export DATABRICKS_API_BASE=https://your-workspace.databricks.com
- export DATABRICKS_API_KEY=your-key
- lynkr start
- ```
- 
- ### Tier-Based Routing (Recommended for Cost Optimization)
- 
- Set **all 4** `TIER_*` vars to route requests by complexity. Each request is scored 0-100 and routed to the `provider:model` matching its complexity tier. When all four are configured, they **override** `MODEL_PROVIDER` for routing decisions:
- 
- ```bash
- export MODEL_PROVIDER=ollama                      # Still needed for startup checks
- export TIER_SIMPLE=ollama:llama3.2                # Score 0-25 → local (free)
- export TIER_MEDIUM=openrouter:openai/gpt-4o-mini  # Score 26-50 → affordable cloud
- export TIER_COMPLEX=databricks:claude-sonnet      # Score 51-75 → capable cloud
- export TIER_REASONING=databricks:claude-sonnet    # Score 76-100 → best available
- lynkr start
- ```
- 
- > **Important:** All 4 `TIER_*` vars must be set to enable tier routing. If any are missing, tier routing is disabled and `MODEL_PROVIDER` is used for all requests. `MODEL_PROVIDER` should always be set — even with tier routing active, it is used for startup checks, provider discovery, and as the default provider when a `TIER_*` value has no `provider:` prefix.
- >
- > **`PREFER_OLLAMA` is deprecated** and has no effect. Use `TIER_SIMPLE=ollama:<model>` to route simple requests to Ollama. See [Routing Precedence](routing.md#routing-precedence) for full details.
- 
- ### .env File (Recommended for Production)
- 
- ```bash
- # Copy example file
- cp .env.example .env
- 
- # Edit with your credentials
- nano .env
- ```
- 
- Example `.env`:
- ```env
- MODEL_PROVIDER=ollama
- DATABRICKS_API_BASE=https://your-workspace.databricks.com
- DATABRICKS_API_KEY=dapi1234567890abcdef
- PORT=8081
- LOG_LEVEL=info
- 
- # Tier routing (optional — set all 4 to enable)
- TIER_SIMPLE=ollama:llama3.2
- TIER_MEDIUM=openrouter:openai/gpt-4o-mini
- TIER_COMPLEX=databricks:claude-sonnet
- TIER_REASONING=databricks:claude-sonnet
- ```
- 
- ---
- 
- ## Remote/Network Configuration
- 
- **All provider endpoints support remote addresses** - you're not limited to `localhost`. This enables powerful setups like:
- 
- - 🖥️ **GPU Server**: Run Ollama/llama.cpp on a dedicated GPU machine
- - 🏢 **Team Sharing**: Multiple developers using one Lynkr instance
- - ☁️ **Hybrid**: Lynkr on local machine, models on cloud VM
- 
- ### Examples
- 
- **Ollama on Remote GPU Server**
- ```env
- MODEL_PROVIDER=ollama
- OLLAMA_ENDPOINT=http://192.168.1.100:11434 # Local network IP
- # or
- OLLAMA_ENDPOINT=http://gpu-server.local:11434 # Hostname
- # or
- OLLAMA_ENDPOINT=http://ollama.mycompany.com:11434 # Domain
- ```
- 
- **llama.cpp on Remote Machine**
- ```env
- MODEL_PROVIDER=llamacpp
- LLAMACPP_ENDPOINT=http://10.0.0.50:8080
- ```
- 
- **LM Studio on Another Computer**
- ```env
- MODEL_PROVIDER=lmstudio
- LMSTUDIO_ENDPOINT=http://workstation.local:1234
- ```
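- 
- Before pointing Lynkr at a remote endpoint, it is worth confirming the endpoint is reachable from the machine running Lynkr. A quick check, assuming default ports and the listing/health routes these servers expose:
- 
- ```bash
- # Ollama: list installed models (confirms the API is reachable)
- curl http://192.168.1.100:11434/api/tags
- 
- # llama.cpp: built-in health endpoint
- curl http://10.0.0.50:8080/health
- 
- # LM Studio: OpenAI-compatible model listing
- curl http://workstation.local:1234/v1/models
- ```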
- 
- ### Network Requirements
- 
- | Setup | Requirement |
- |-------|-------------|
- | Same machine | `localhost` or `127.0.0.1` |
- | Local network | IP address or hostname, firewall allows port |
- | Remote/Internet | Public IP/domain, port forwarding, consider VPN/auth |
- 
- > ⚠️ **Security Note**: When exposing endpoints over a network, ensure proper firewall rules and consider using a VPN or SSH tunnel for sensitive deployments.
- 
- ---
- 
- ## Provider-Specific Configuration
- 
- ### 1. AWS Bedrock (100+ Models)
- 
- **Best for:** AWS ecosystem, multi-model flexibility, Claude + alternatives
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=bedrock
- AWS_BEDROCK_API_KEY=your-bearer-token
- AWS_BEDROCK_REGION=us-east-1
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
- ```
- 
- #### Getting AWS Bedrock API Key
- 
- 1. Log in to [AWS Console](https://console.aws.amazon.com/)
- 2. Navigate to **Bedrock** → **API Keys**
- 3. Click **Generate API Key**
- 4. Copy the bearer token (this is your `AWS_BEDROCK_API_KEY`)
- 5. Enable model access in Bedrock console
- 6. See: [AWS Bedrock API Keys Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-generate.html)
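- 
- To sanity-check the token outside Lynkr, you can call the Bedrock runtime's `InvokeModel` REST endpoint directly. A sketch, assuming the `us-east-1` runtime host and a Claude model (Claude request bodies on Bedrock require the `anthropic_version` field):
- 
- ```bash
- # Sketch: direct InvokeModel call with a Bedrock API key (bearer token)
- curl -X POST \
-   "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-5-sonnet-20241022-v2:0/invoke" \
-   -H "Authorization: Bearer $AWS_BEDROCK_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"anthropic_version":"bedrock-2023-05-31","max_tokens":64,"messages":[{"role":"user","content":"Hello"}]}'
- ```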
- 
- #### Available Regions
- 
- - `us-east-1` (N. Virginia) - Most models available
- - `us-west-2` (Oregon)
- - `us-east-2` (Ohio)
- - `ap-southeast-1` (Singapore)
- - `ap-northeast-1` (Tokyo)
- - `eu-central-1` (Frankfurt)
- 
- #### Model Catalog
- 
- **Claude Models (Best for Tool Calling)** ✅
- 
- Claude 4.5 (latest - requires inference profiles):
- ```env
- AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0 # Regional US
- AWS_BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0 # Fast, efficient
- AWS_BEDROCK_MODEL_ID=global.anthropic.claude-sonnet-4-5-20250929-v1:0 # Cross-region
- ```
- 
- Claude 3.x models:
- ```env
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0 # Excellent tool calling
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-opus-20240229-v1:0 # Most capable
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0 # Fast, cheap
- ```
- 
- **DeepSeek Models (NEW - 2025)**
- ```env
- AWS_BEDROCK_MODEL_ID=us.deepseek.r1-v1:0 # DeepSeek R1 - reasoning model (o1-style)
- ```
- 
- **Qwen Models (Alibaba - NEW 2025)**
- ```env
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-235b-a22b-2507-v1:0 # Largest, 235B parameters
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-32b-v1:0 # Balanced, 32B
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-480b-a35b-v1:0 # Coding specialist, 480B
- AWS_BEDROCK_MODEL_ID=qwen.qwen3-coder-30b-a3b-v1:0 # Coding, smaller
- ```
- 
- **OpenAI Open-Weight Models (NEW - 2025)**
- ```env
- AWS_BEDROCK_MODEL_ID=openai.gpt-oss-120b-1:0 # 120B parameters, open-weight
- AWS_BEDROCK_MODEL_ID=openai.gpt-oss-20b-1:0 # 20B parameters, efficient
- ```
- 
- **Google Gemma Models (Open-Weight)**
- ```env
- AWS_BEDROCK_MODEL_ID=google.gemma-3-27b # 27B parameters
- AWS_BEDROCK_MODEL_ID=google.gemma-3-12b # 12B parameters
- AWS_BEDROCK_MODEL_ID=google.gemma-3-4b # 4B parameters, efficient
- ```
- 
- **Amazon Models**
- 
- Nova (multimodal):
- ```env
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-pro-v1:0 # Best quality, multimodal, 300K context
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-lite-v1:0 # Fast, cost-effective
- AWS_BEDROCK_MODEL_ID=us.amazon.nova-micro-v1:0 # Ultra-fast, text-only
- ```
- 
- Titan:
- ```env
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-premier-v1:0 # Largest
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-express-v1 # Fast
- AWS_BEDROCK_MODEL_ID=amazon.titan-text-lite-v1 # Cheapest
- ```
- 
- **Meta Llama Models**
- ```env
- AWS_BEDROCK_MODEL_ID=meta.llama3-1-70b-instruct-v1:0 # Most capable
- AWS_BEDROCK_MODEL_ID=meta.llama3-1-8b-instruct-v1:0 # Fast, efficient
- ```
- 
- **Mistral Models**
- ```env
- AWS_BEDROCK_MODEL_ID=mistral.mistral-large-2407-v1:0 # Largest, coding, multilingual
- AWS_BEDROCK_MODEL_ID=mistral.mistral-small-2402-v1:0 # Efficient
- AWS_BEDROCK_MODEL_ID=mistral.mixtral-8x7b-instruct-v0:1 # Mixture of experts
- ```
- 
- **Cohere Command Models**
- ```env
- AWS_BEDROCK_MODEL_ID=cohere.command-r-plus-v1:0 # Best for RAG, search
- AWS_BEDROCK_MODEL_ID=cohere.command-r-v1:0 # Balanced
- ```
- 
- **AI21 Jamba Models**
- ```env
- AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-large-v1:0 # Hybrid architecture, 256K context
- AWS_BEDROCK_MODEL_ID=ai21.jamba-1-5-mini-v1:0 # Fast
- ```
- 
- #### Pricing (per 1M tokens)
- 
- | Model | Input | Output |
- |-------|-------|--------|
- | Claude 3.5 Sonnet | $3.00 | $15.00 |
- | Claude 3 Opus | $15.00 | $75.00 |
- | Claude 3 Haiku | $0.25 | $1.25 |
- | Titan Text Express | $0.20 | $0.60 |
- | Llama 3 70B | $0.99 | $0.99 |
- | Nova Pro | $0.80 | $3.20 |
- 
- #### Important Notes
- 
- ⚠️ **Tool Calling:** Only **Claude models** support tool calling on Bedrock. Other models work via Converse API but won't use Read/Write/Bash tools.
- 
- 📖 **Full Documentation:** See [BEDROCK_MODELS.md](../BEDROCK_MODELS.md) for complete model catalog with capabilities and use cases.
- 
- ---
- 
- ### 2. Databricks (Claude Sonnet 4.5, Opus 4.5)
- 
- **Best for:** Enterprise production use, managed Claude endpoints
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=databricks
- DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
- DATABRICKS_API_KEY=dapi1234567890abcdef
- ```
- 
- Optional endpoint path override:
- ```env
- DATABRICKS_ENDPOINT_PATH=/serving-endpoints/databricks-claude-sonnet-4-5/invocations
- ```
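- 
- To verify the workspace URL and token outside Lynkr, you can hit the serving endpoint directly. A sketch, assuming the endpoint path shown above and an OpenAI-style chat payload (the shape Databricks-served chat endpoints accept):
- 
- ```bash
- # Sketch: direct call to a Databricks serving endpoint
- curl -X POST \
-   "$DATABRICKS_API_BASE/serving-endpoints/databricks-claude-sonnet-4-5/invocations" \
-   -H "Authorization: Bearer $DATABRICKS_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
- ```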
- 
- #### Getting Databricks Credentials
- 
- 1. Log in to your Databricks workspace
- 2. Navigate to **Settings** → **User Settings**
- 3. Click **Generate New Token**
- 4. Copy the token (this is your `DATABRICKS_API_KEY`)
- 5. Your workspace URL is the base URL (e.g., `https://your-workspace.cloud.databricks.com`)
- 
- #### Available Models
- 
- - **Claude Sonnet 4.5** - Excellent for tool calling, balanced performance
- - **Claude Opus 4.5** - Most capable model for complex reasoning
- 
- #### Pricing
- 
- Contact Databricks for enterprise pricing.
- 
- ---
- 
- ### 3. OpenRouter (100+ Models)
- 
- **Best for:** Quick setup, model flexibility, cost optimization
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=openrouter
- OPENROUTER_API_KEY=sk-or-v1-your-key
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
- OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
- ```
- 
- Optional for hybrid routing:
- ```env
- OPENROUTER_MAX_TOOLS_FOR_ROUTING=15 # Max tools to route to OpenRouter
- ```
- 
- #### Getting OpenRouter API Key
- 
- 1. Visit [openrouter.ai](https://openrouter.ai)
- 2. Sign in with GitHub, Google, or email
- 3. Go to [openrouter.ai/keys](https://openrouter.ai/keys)
- 4. Create a new API key
- 5. Add credits (pay-as-you-go, no subscription required)
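- 
- The key can then be verified with a direct chat completion call, since OpenRouter's API is OpenAI-compatible:
- 
- ```bash
- curl https://openrouter.ai/api/v1/chat/completions \
-   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
- ```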
- 
- #### Popular Models
- 
- **Claude Models (Best for Coding)**
- ```env
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet # $3/$15 per 1M tokens
- OPENROUTER_MODEL=anthropic/claude-opus-4.5 # $15/$75 per 1M tokens
- OPENROUTER_MODEL=anthropic/claude-3-haiku # $0.25/$1.25 per 1M tokens
- ```
- 
- **OpenAI Models**
- ```env
- OPENROUTER_MODEL=openai/gpt-4o # $2.50/$10 per 1M tokens
- OPENROUTER_MODEL=openai/gpt-4o-mini # $0.15/$0.60 per 1M tokens (default)
- OPENROUTER_MODEL=openai/o1-preview # $15/$60 per 1M tokens
- OPENROUTER_MODEL=openai/o1-mini # $3/$12 per 1M tokens
- ```
- 
- **Google Models**
- ```env
- OPENROUTER_MODEL=google/gemini-pro-1.5 # $1.25/$5 per 1M tokens
- OPENROUTER_MODEL=google/gemini-flash-1.5 # $0.075/$0.30 per 1M tokens
- ```
- 
- **Meta Llama Models**
- ```env
- OPENROUTER_MODEL=meta-llama/llama-3.1-405b # $2.70/$2.70 per 1M tokens
- OPENROUTER_MODEL=meta-llama/llama-3.1-70b # $0.52/$0.75 per 1M tokens
- OPENROUTER_MODEL=meta-llama/llama-3.1-8b # $0.06/$0.06 per 1M tokens
- ```
- 
- **Mistral Models**
- ```env
- OPENROUTER_MODEL=mistralai/mistral-large # $2/$6 per 1M tokens
- OPENROUTER_MODEL=mistralai/codestral-latest # $0.30/$0.90 per 1M tokens
- ```
- 
- **DeepSeek Models**
- ```env
- OPENROUTER_MODEL=deepseek/deepseek-chat # $0.14/$0.28 per 1M tokens
- OPENROUTER_MODEL=deepseek/deepseek-coder # $0.14/$0.28 per 1M tokens
- ```
- 
- #### Benefits
- 
- - ✅ **100+ models** through one API
- - ✅ **Automatic fallbacks** if primary model unavailable
- - ✅ **Competitive pricing** with volume discounts
- - ✅ **Full tool calling support**
- - ✅ **No monthly fees** - pay only for usage
- - ✅ **Rate limit pooling** across models
- 
- See [openrouter.ai/models](https://openrouter.ai/models) for complete list with pricing.
- 
- ---
- 
- ### 4. Ollama (Local Models)
- 
- **Best for:** Local development, privacy, offline use, no API costs
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=ollama
- OLLAMA_ENDPOINT=http://localhost:11434 # Or any remote IP/hostname
- OLLAMA_MODEL=llama3.1:8b
- OLLAMA_TIMEOUT_MS=120000
- ```
- 
- > 🌐 **Remote Support**: `OLLAMA_ENDPOINT` can be any address - `http://192.168.1.100:11434`, `http://gpu-server:11434`, etc. See [Remote/Network Configuration](#remotenetwork-configuration).
- 
- #### Performance Optimization
- 
- **Prevent Cold Starts:** Ollama unloads models after 5 minutes of inactivity by default. This causes slow first requests (10-30+ seconds) while the model reloads. To keep models loaded:
- 
- **Option 1: Environment Variable (Recommended)**
- ```bash
- # Set on Ollama server (not Lynkr)
- # macOS
- launchctl setenv OLLAMA_KEEP_ALIVE "24h"
- 
- # Linux (systemd) - edit with: sudo systemctl edit ollama
- [Service]
- Environment="OLLAMA_KEEP_ALIVE=24h"
- 
- # Docker
- docker run -e OLLAMA_KEEP_ALIVE=24h -d ollama/ollama
- ```
- 
- **Option 2: Per-Request Keep Alive**
- ```bash
- curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","keep_alive":"24h"}'
- ```
- 
- **Keep Alive Values:**
- | Value | Behavior |
- |-------|----------|
- | `5m` | Default - unload after 5 minutes |
- | `24h` | Keep loaded for 24 hours |
- | `-1` | Never unload (keep forever) |
- | `0` | Unload immediately after request |
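- 
- To confirm a model stayed resident after setting keep-alive, `ollama ps` lists loaded models and when each will be unloaded (the sample output below is illustrative):
- 
- ```bash
- ollama ps
- # NAME           ID    SIZE     PROCESSOR    UNTIL
- # llama3.1:8b    ...   6.2 GB   100% GPU     24 hours from now
- ```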
- 
- #### Installation & Setup
- 
- ```bash
- # Install Ollama
- brew install ollama # macOS
- # Or download from: https://ollama.ai/download
- 
- # Start Ollama service
- ollama serve
- 
- # Pull a model
- ollama pull llama3.1:8b
- 
- # Verify model is available
- ollama list
- ```
- 
- #### Recommended Models
- 
- **For Tool Calling** ✅ (Required for Claude Code CLI)
- ```bash
- ollama pull llama3.1:8b # Good balance (4.7GB)
- ollama pull llama3.2 # Latest Llama (3B, ~2GB)
- ollama pull qwen2.5:14b # Strong reasoning (8GB; the 7B variant struggles with tools)
- ollama pull mistral:7b-instruct # Fast and capable (4.1GB)
- ```
- 
- **NOT Recommended for Tools** ❌
- ```bash
- qwen2.5-coder # Code-only, slow with tool calling
- codellama # Code-only, poor tool support
- ```
- 
- #### Tool Calling Support
- 
- Lynkr supports **native tool calling** for compatible Ollama models:
- 
- - ✅ **Supported models**: llama3.1, llama3.2, qwen2.5, mistral, mistral-nemo
- - ✅ **Automatic detection**: Lynkr detects tool-capable models
- - ✅ **Format conversion**: Transparent Anthropic ↔ Ollama conversion
- - ❌ **Unsupported models**: llama3, older models (tools filtered automatically)
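- 
- For reference, this is roughly the Ollama-side shape that Lynkr's conversion targets: a direct `/api/chat` request carrying an OpenAI-style function tool. The `get_weather` tool is a made-up example and the schema is abbreviated:
- 
- ```bash
- # Sketch: native tool calling against Ollama's /api/chat
- curl http://localhost:11434/api/chat -d '{
-   "model": "llama3.1:8b",
-   "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
-   "tools": [{
-     "type": "function",
-     "function": {
-       "name": "get_weather",
-       "description": "Get the current weather for a city",
-       "parameters": {
-         "type": "object",
-         "properties": {"city": {"type": "string"}},
-         "required": ["city"]
-       }
-     }
-   }],
-   "stream": false
- }'
- ```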
- 
- #### Pricing
- 
- **100% FREE** - Models run on your hardware with no API costs.
- 
- #### Model Sizes
- 
- - **7B models**: ~4-5GB download, 8GB RAM required
- - **8B models**: ~4.7GB download, 8GB RAM required
- - **14B models**: ~8GB download, 16GB RAM required
- - **32B models**: ~18GB download, 32GB RAM required
- 
- ---
- 
- ### 5. llama.cpp (GGUF Models)
- 
- **Best for:** Maximum performance, custom quantization, any GGUF model
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=llamacpp
- LLAMACPP_ENDPOINT=http://localhost:8080 # Or any remote IP/hostname
- LLAMACPP_MODEL=qwen2.5-coder-7b
- LLAMACPP_TIMEOUT_MS=120000
- ```
- 
- Optional API key (for secured servers):
- ```env
- LLAMACPP_API_KEY=your-optional-api-key
- ```
- 
- > 🌐 **Remote Support**: `LLAMACPP_ENDPOINT` can be any address. See [Remote/Network Configuration](#remotenetwork-configuration).
- 
- #### Installation & Setup
- 
- ```bash
- # Clone and build llama.cpp
- git clone https://github.com/ggerganov/llama.cpp
- cd llama.cpp && make
- 
- # Download a GGUF model (example: Qwen2.5-Coder-7B)
- wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
- 
- # Start llama-server
- ./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080
- 
- # Verify server is running
- curl http://localhost:8080/health
- ```
- 
- #### GPU Support
- 
- llama.cpp supports multiple GPU backends:
- 
- - **CUDA** (NVIDIA): `make LLAMA_CUDA=1`
- - **Metal** (Apple Silicon): `make LLAMA_METAL=1`
- - **ROCm** (AMD): `make LLAMA_ROCM=1`
- - **Vulkan** (Universal): `make LLAMA_VULKAN=1`
- 
- #### llama.cpp vs Ollama
- 
- | Feature | Ollama | llama.cpp |
- |---------|--------|-----------|
- | Setup | Easy (app) | Manual (compile/download) |
- | Model Format | Ollama-specific | Any GGUF model |
- | Performance | Good | **Excellent** (optimized C++) |
- | GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
- | Memory Usage | Higher | **Lower** (quantization options) |
- | API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
- | Flexibility | Limited models | **Any GGUF** from HuggingFace |
- | Tool Calling | Limited models | Grammar-based, more reliable |
- 
- **Choose llama.cpp when you need:**
- - Maximum performance
- - Specific quantization options (Q4, Q5, Q8)
- - GGUF models not available in Ollama
- - Fine-grained control over inference parameters
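- 
- Because llama-server speaks the OpenAI-compatible `/v1/chat/completions` route (see the comparison table above), a direct request makes a useful smoke test:
- 
- ```bash
- curl http://localhost:8080/v1/chat/completions \
-   -H "Content-Type: application/json" \
-   -d '{"model":"qwen2.5-coder-7b","messages":[{"role":"user","content":"Hello"}]}'
- ```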
- 
- ---
- 
- ### 6. Azure OpenAI
- 
- **Best for:** Azure integration, Microsoft ecosystem, GPT-4o, o1, o3
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=azure-openai
- AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2025-01-01-preview
- AZURE_OPENAI_API_KEY=your-azure-api-key
- AZURE_OPENAI_DEPLOYMENT=gpt-4o
- ```
- 
- Optional:
- ```env
- AZURE_OPENAI_API_VERSION=2024-08-01-preview # Override the default API version
- ```
- 
- #### Getting Azure OpenAI Credentials
- 
- 1. Log in to [Azure Portal](https://portal.azure.com)
- 2. Navigate to **Azure OpenAI** service
- 3. Go to **Keys and Endpoint**
- 4. Copy **KEY 1** (this is your API key)
- 5. Copy **Endpoint** URL
- 6. Create a deployment (gpt-4o, gpt-4o-mini, etc.)
- 
- #### Important: Full Endpoint URL Required
- 
- The `AZURE_OPENAI_ENDPOINT` must include:
- - Resource name
- - Deployment path
- - API version query parameter
- 
- **Example:**
- ```
- https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview
- ```
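- 
- A direct call against the full endpoint URL confirms both the key and the deployment. Note that Azure OpenAI authenticates with an `api-key` header rather than a `Bearer` token:
- 
- ```bash
- curl -X POST "$AZURE_OPENAI_ENDPOINT" \
-   -H "api-key: $AZURE_OPENAI_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"messages":[{"role":"user","content":"Hello"}]}'
- ```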
- 
- #### Available Deployments
- 
- You can deploy any of these models in Azure AI Foundry:
- 
- ```env
- AZURE_OPENAI_DEPLOYMENT=gpt-4o # Latest GPT-4o
- AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini # Smaller, faster, cheaper
- AZURE_OPENAI_DEPLOYMENT=gpt-5-chat # GPT-5 (if available)
- AZURE_OPENAI_DEPLOYMENT=o1-preview # Reasoning model
- AZURE_OPENAI_DEPLOYMENT=o3-mini # Latest reasoning model
- AZURE_OPENAI_DEPLOYMENT=kimi-k2 # Kimi K2 (if available)
- ```
- 
- ---
- 
- ### 7. Azure Anthropic
- 
- **Best for:** Azure-hosted Claude models with enterprise integration
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=azure-anthropic
- AZURE_ANTHROPIC_ENDPOINT=https://your-resource.services.ai.azure.com/anthropic/v1/messages
- AZURE_ANTHROPIC_API_KEY=your-azure-api-key
- AZURE_ANTHROPIC_VERSION=2023-06-01
- ```
- 
- #### Getting Azure Anthropic Credentials
- 
- 1. Log in to [Azure Portal](https://portal.azure.com)
- 2. Navigate to your Azure Anthropic resource
- 3. Go to **Keys and Endpoint**
- 4. Copy the API key
- 5. Copy the endpoint URL (includes `/anthropic/v1/messages`)
- 
- #### Available Models
- 
- - **Claude Sonnet 4.5** - Best for tool calling, balanced
- - **Claude Opus 4.5** - Most capable for complex reasoning
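- 
- A minimal connectivity check, assuming the endpoint accepts standard Anthropic Messages API headers and payloads. The model name below is a placeholder; use whatever your resource exposes:
- 
- ```bash
- # Sketch: Anthropic Messages API call against the Azure-hosted endpoint
- curl -X POST "$AZURE_ANTHROPIC_ENDPOINT" \
-   -H "x-api-key: $AZURE_ANTHROPIC_API_KEY" \
-   -H "anthropic-version: 2023-06-01" \
-   -H "Content-Type: application/json" \
-   -d '{"model":"claude-sonnet-4-5","max_tokens":64,"messages":[{"role":"user","content":"Hello"}]}'
- ```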
- 
- ---
- 
- ### 8. OpenAI (Direct)
- 
- **Best for:** Direct OpenAI API access, lowest latency
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=openai
- OPENAI_API_KEY=sk-your-openai-api-key
- OPENAI_MODEL=gpt-4o
- OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
- ```
- 
- Optional for organization-level keys:
- ```env
- OPENAI_ORGANIZATION=org-your-org-id
- ```
- 
- #### Getting OpenAI API Key
- 
- 1. Visit [platform.openai.com](https://platform.openai.com)
- 2. Sign up or log in
- 3. Go to [API Keys](https://platform.openai.com/api-keys)
- 4. Create a new API key
- 5. Add credits to your account (pay-as-you-go)
- 
- #### Available Models
- 
- ```env
- OPENAI_MODEL=gpt-4o # Latest GPT-4o ($2.50/$10 per 1M)
- OPENAI_MODEL=gpt-4o-mini # Smaller, faster ($0.15/$0.60 per 1M)
- OPENAI_MODEL=gpt-4-turbo # GPT-4 Turbo
- OPENAI_MODEL=o1-preview # Reasoning model
- OPENAI_MODEL=o1-mini # Smaller reasoning model
- ```
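- 
- To verify the key independently of Lynkr:
- 
- ```bash
- curl https://api.openai.com/v1/chat/completions \
-   -H "Authorization: Bearer $OPENAI_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
- ```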
- 
- #### Benefits
- 
- - ✅ **Direct API access** - No intermediaries, lowest latency
- - ✅ **Full tool calling support** - Excellent function calling
- - ✅ **Parallel tool calls** - Execute multiple tools simultaneously
- - ✅ **Organization support** - Use org-level API keys
- - ✅ **Simple setup** - Just one API key needed
- 
- ---
- 
- ### 9. LM Studio (Local with GUI)
- 
- **Best for:** Local models with graphical interface
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=lmstudio
- LMSTUDIO_ENDPOINT=http://localhost:1234
- LMSTUDIO_MODEL=default
- LMSTUDIO_TIMEOUT_MS=120000
- ```
- 
- Optional API key (for secured servers):
- ```env
- LMSTUDIO_API_KEY=your-optional-api-key
- ```
- 
- #### Setup
- 
- 1. Download and install [LM Studio](https://lmstudio.ai)
- 2. Launch LM Studio
- 3. Download a model (e.g., Qwen2.5-Coder-7B, Llama 3.1)
- 4. Click **Start Server** (default port: 1234)
- 5. Configure Lynkr to use LM Studio
- 
- #### Benefits
- 
- - ✅ **Graphical interface** for model management
- - ✅ **Easy model downloads** from HuggingFace
- - ✅ **Built-in server** with OpenAI-compatible API
- - ✅ **GPU acceleration** support
- - ✅ **Model presets** and configurations
- 
- ---
- 
- ### 10. Moonshot AI / Kimi (OpenAI-Compatible)
- 
- **Best for:** Affordable cloud models, thinking/reasoning models, OpenAI-compatible API
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=moonshot
- MOONSHOT_API_KEY=sk-your-moonshot-api-key
- MOONSHOT_ENDPOINT=https://api.moonshot.ai/v1/chat/completions
- MOONSHOT_MODEL=kimi-k2-turbo-preview
- ```
- 
- #### Getting Moonshot API Key
- 
- 1. Visit [platform.moonshot.ai](https://platform.moonshot.ai)
- 2. Sign up or log in
- 3. Navigate to API Keys section
- 4. Create a new API key
- 5. Add credits to your account
- 
- #### Available Models
- 
- ```env
- MOONSHOT_MODEL=kimi-k2-turbo-preview # Fast, efficient (recommended)
- MOONSHOT_MODEL=kimi-k2-thinking # Chain-of-thought reasoning model
- ```
- 
- **Model Details:**
- 
- | Model | Type | Best For |
- |-------|------|----------|
- | `kimi-k2-turbo-preview` | Standard | Fast responses, tool calling, general tasks |
- | `kimi-k2-thinking` | Thinking/Reasoning | Complex analysis, multi-step reasoning |
- 
- #### How It Works
- 
- Moonshot uses an **OpenAI-compatible** chat completions API. Lynkr handles all format conversion automatically:
- 
- 1. Claude Code CLI sends Anthropic-format request to Lynkr
- 2. Lynkr converts Anthropic messages → OpenAI chat completions format
- 3. Request is sent to Moonshot's `/v1/chat/completions` endpoint
- 4. Moonshot response is converted back to Anthropic format
- 5. Claude Code CLI receives a standard Anthropic response
- 
- #### Thinking Model Support
- 
- When using `kimi-k2-thinking`, the model returns both `reasoning_content` (chain-of-thought) and `content` (final answer). Lynkr automatically extracts only the final answer for clean CLI output. The reasoning content is used as a fallback only when the final answer is empty.
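- 
- The same extraction can be reproduced outside Lynkr with `jq`, using the field names described above: prefer `content`, and fall back to `reasoning_content` only when the final answer is empty.
- 
- ```bash
- curl -s https://api.moonshot.ai/v1/chat/completions \
-   -H "Authorization: Bearer $MOONSHOT_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{"model":"kimi-k2-thinking","messages":[{"role":"user","content":"Why is the sky blue?"}]}' \
-   | jq -r '.choices[0].message | if (.content // "") != "" then .content else .reasoning_content end'
- ```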
- 
- #### Important Notes
- 
- - **Streaming:** Streaming is disabled for Moonshot (responses arrive as complete JSON). This ensures clean terminal rendering since OpenAI SSE → Anthropic SSE conversion is not yet implemented.
- - **Rate Limits:** Moonshot has a max concurrency of ~3 requests. Lynkr retries with backoff on 429 errors.
- - **Tool Calling:** Full tool calling support via OpenAI function calling format (automatically converted from Anthropic format).
- - **System Messages:** Moonshot natively supports the `system` role, so system prompts are passed directly.
- 
- #### Benefits
- 
- - ✅ **Affordable** — Competitive pricing for capable models
- - ✅ **Thinking models** — Chain-of-thought reasoning with `kimi-k2-thinking`
- - ✅ **Full tool calling** — Native function calling support
- - ✅ **OpenAI-compatible** — Standard chat completions API
- - ✅ **System role support** — Native system message handling
- 
- #### Test Connection
- 
- ```bash
- curl -X POST https://api.moonshot.ai/v1/chat/completions \
-   -H "Content-Type: application/json" \
-   -H "Authorization: Bearer $MOONSHOT_API_KEY" \
-   -d '{"model":"kimi-k2-turbo-preview","messages":[{"role":"user","content":"Hello"}]}'
- ```
- 
- ---
- 
- ### 11. MLX OpenAI Server (Apple Silicon)
- 
- **Best for:** Maximum performance on Apple Silicon Macs (M1/M2/M3/M4)
- 
- [MLX OpenAI Server](https://github.com/cubist38/mlx-openai-server) is a high-performance local LLM server optimized for Apple's MLX framework. It provides OpenAI-compatible endpoints for text, vision, audio, and image generation models.
- 
- #### Installation
- 
- ```bash
- # Create virtual environment
- python3.11 -m venv .venv
- source .venv/bin/activate
- 
- # Install
- pip install mlx-openai-server
- 
- # Optional: for audio transcription
- brew install ffmpeg
- ```
- 
- #### Start the Server
- 
- ```bash
- # Text/Code models (recommended for coding)
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit --model-type lm
- 
- # Smaller model (faster, less RAM)
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-Coder-1.5B-Instruct-4bit --model-type lm
- 
- # General purpose
- mlx-openai-server launch --model-path mlx-community/Qwen2.5-3B-Instruct-4bit --model-type lm
- ```
- 
- Server runs at `http://localhost:8000/v1` by default.
- 
- #### Configuration
- 
- ```env
- MODEL_PROVIDER=openai
- OPENAI_ENDPOINT=http://localhost:8000/v1/chat/completions
- OPENAI_API_KEY=not-needed
- ```
- 
- > 🌐 **Remote Support**: `OPENAI_ENDPOINT` can be any address (e.g., `http://192.168.1.100:8000/v1/chat/completions` for a Mac Studio GPU server).
- 
- #### Recommended Models for Coding
- 
- | Model | Size | RAM | Best For |
- |-------|------|-----|----------|
- | `Qwen2.5-Coder-1.5B-Instruct-4bit` | ~1GB | 4GB | Fast, simple code tasks |
- | `Qwen2.5-3B-Instruct-4bit` | ~2GB | 6GB | General + code |
- | `Qwen2.5-Coder-7B-Instruct-4bit` | ~4GB | 8GB | Best for coding |
- | `Qwen2.5-Coder-14B-Instruct-4bit` | ~8GB | 16GB | Complex reasoning |
- | `Llama-3.2-3B-Instruct-4bit` | ~2GB | 6GB | General purpose |
- | `Phi-3-mini-4k-instruct-4bit` | ~2GB | 6GB | Reasoning tasks |
- 
- #### Server Options
- 
- ```bash
- # --host 0.0.0.0 allows remote connections; --max-concurrency sets the number
- # of parallel requests; --context-length caps the context window.
- mlx-openai-server launch \
-   --model-path mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
-   --model-type lm \
-   --host 0.0.0.0 \
-   --port 8000 \
-   --max-concurrency 2 \
-   --context-length 4096
- ```
- 
- #### MLX vs Ollama Comparison
- 
- | Feature | MLX OpenAI Server | Ollama |
- |---------|-------------------|--------|
- | Platform | Apple Silicon only | Cross-platform |
- | Performance | Native MLX optimization | Good on Apple Silicon |
- | Model Format | HuggingFace MLX | Ollama-specific |
- | Vision/Audio | ✅ Built-in | Limited |
- | Image Generation | ✅ Flux support | ❌ |
- | Quantization | 4/8/16-bit flexible | Model-specific |
- 
- #### Test Connection
- 
- ```bash
- curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
- ```
- 
- #### Pricing
- 
- **100% FREE** - Models run locally on your Apple Silicon Mac.
- 
- ---
- 
- ## Tier-Based Routing & Fallback
- 
- ### Intelligent 4-Tier Routing
- 
- Optimize costs by routing requests based on complexity:
- 
- ```env
- # Tier-based routing (set all 4 to enable)
- TIER_SIMPLE=ollama:llama3.2
- TIER_MEDIUM=openrouter:openai/gpt-4o-mini
- TIER_COMPLEX=azure-openai:gpt-4o
- TIER_REASONING=azure-openai:gpt-4o
- 
- FALLBACK_ENABLED=true
- 
- # Provider credentials
- OLLAMA_ENDPOINT=http://localhost:11434
- OPENROUTER_API_KEY=your-key
- AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/...
- AZURE_OPENAI_API_KEY=your-key
- ```
- 
- ### How It Works
- 
- **Routing Logic:**
- 1. Each request is scored for complexity (0-100)
- 2. Score maps to a tier: SIMPLE (0-25), MEDIUM (26-50), COMPLEX (51-75), REASONING (76-100); see the sketch after this list
- 3. The request is routed to the provider:model configured for that tier
- 
- **Automatic Fallback:**
- - If the selected provider fails, Lynkr falls back to `FALLBACK_PROVIDER`
- - Fallback is transparent to the user
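- 
- The score-to-tier mapping itself is simple. The sketch below mirrors the documented thresholds only; it is not Lynkr's actual scoring code:
- 
- ```bash
- # Illustrative only: map a complexity score (0-100) to its tier
- tier_for_score() {
-   local score=$1
-   if   [ "$score" -le 25 ]; then echo "SIMPLE"     # -> TIER_SIMPLE
-   elif [ "$score" -le 50 ]; then echo "MEDIUM"     # -> TIER_MEDIUM
-   elif [ "$score" -le 75 ]; then echo "COMPLEX"    # -> TIER_COMPLEX
-   else                           echo "REASONING"  # -> TIER_REASONING
-   fi
- }
- tier_for_score 42   # prints MEDIUM
- ```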
- 
- ### Cost Savings
- 
- - **65-100%** for requests routed to local/cheap models
- - **40-87%** faster for simple requests
- - **Privacy**: Simple queries can stay on your machine when using a local TIER_SIMPLE model
- 
- ### Configuration Options
- 
- | Variable | Description | Default |
- |----------|-------------|---------|
- | `TIER_SIMPLE` | Model for simple tier (`provider:model`) | *required for tier routing* |
- | `TIER_MEDIUM` | Model for medium tier (`provider:model`) | *required for tier routing* |
- | `TIER_COMPLEX` | Model for complex tier (`provider:model`) | *required for tier routing* |
- | `TIER_REASONING` | Model for reasoning tier (`provider:model`) | *required for tier routing* |
- | `FALLBACK_ENABLED` | Enable automatic fallback | `true` |
- | `FALLBACK_PROVIDER` | Provider to use when primary fails | `databricks` |
- | `OLLAMA_MAX_TOOLS_FOR_ROUTING` | Max tools to route to Ollama | `3` |
- | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Max tools to route to OpenRouter | `15` |
- 
- **Note:** Local providers (ollama, llamacpp, lmstudio) cannot be used as `FALLBACK_PROVIDER`.
- 
- ---
- 
- ## Complete Configuration Reference
- 
- ### Core Variables
- 
- | Variable | Description | Default |
- |----------|-------------|---------|
- | `MODEL_PROVIDER` | Primary provider (`databricks`, `bedrock`, `openrouter`, `ollama`, `llamacpp`, `azure-openai`, `azure-anthropic`, `openai`, `lmstudio`, `zai`, `moonshot`, `vertex`) | `databricks` |
- | `PORT` | HTTP port for proxy server | `8081` |
- | `WORKSPACE_ROOT` | Workspace directory path | `process.cwd()` |
- | `LOG_LEVEL` | Logging level (`error`, `warn`, `info`, `debug`) | `info` |
- | `TOOL_EXECUTION_MODE` | Where tools execute (`server`, `client`) | `server` |
- | `MODEL_DEFAULT` | Override default model/deployment name | Provider-specific |
- 
- ### Provider-Specific Variables
- 
- See individual provider sections above for complete variable lists.
- 
- ---
- 
- ## Provider Comparison
- 
- ### Feature Comparison
- 
- | Feature | Databricks | Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Moonshot | Ollama | llama.cpp | LM Studio |
- |---------|-----------|---------|--------|--------------|-----------------|------------|----------|--------|-----------|-----------|
- | **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Easy | Medium | Easy |
- | **Cost** | $$$ | $-$$$ | $$ | $$ | $$$ | $-$$ | $ | **Free** | **Free** | **Free** |
- | **Latency** | Low | Low | Low | Low | Low | Medium | Low | **Very Low** | **Very Low** | **Very Low** |
- | **Model Variety** | 2 | **100+** | 10+ | 10+ | 2 | **100+** | 2+ | 50+ | Unlimited | 50+ |
- | **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Good | Fair | Good | Fair |
- | **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 128K | 32K-128K | Model-dependent | 32K-128K |
- | **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Non-streaming** | Yes | Yes | Yes |
- | **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Third-party | **Local** | **Local** | **Local** |
- | **Offline** | No | No | No | No | No | No | No | **Yes** | **Yes** | **Yes** |
- 
- _* Tool calling only supported by Claude models on Bedrock_
- 
- _** Moonshot uses non-streaming mode (responses arrive as complete JSON) for clean terminal rendering_
- 
- ### Cost Comparison (per 1M tokens)
- 
- | Provider | Model | Input | Output |
- |----------|-------|-------|--------|
- | **Bedrock** | Claude 3.5 Sonnet | $3.00 | $15.00 |
- | **Databricks** | Contact for pricing | - | - |
- | **OpenRouter** | Claude 3.5 Sonnet | $3.00 | $15.00 |
- | **OpenRouter** | GPT-4o mini | $0.15 | $0.60 |
- | **OpenAI** | GPT-4o | $2.50 | $10.00 |
- | **Azure OpenAI** | GPT-4o | $2.50 | $10.00 |
- | **Moonshot** | Kimi K2 Turbo | See moonshot.ai | See moonshot.ai |
- | **Ollama** | Any model | **FREE** | **FREE** |
- | **llama.cpp** | Any model | **FREE** | **FREE** |
- | **LM Studio** | Any model | **FREE** | **FREE** |
- 
- ---
- 
- ## Next Steps
- 
- - **[Installation Guide](installation.md)** - Install Lynkr with your chosen provider
- - **[Claude Code CLI Setup](claude-code-cli.md)** - Connect Claude Code CLI
- - **[Cursor Integration](cursor-integration.md)** - Connect Cursor IDE
- - **[Embeddings Configuration](embeddings.md)** - Enable @Codebase semantic search
- - **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- 
- ---
- 
- ## Getting Help
- 
- - **[FAQ](faq.md)** - Frequently asked questions
- - **[Troubleshooting Guide](troubleshooting.md)** - Common issues
- - **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
- - **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report bugs