lynkr 2.0.0 → 3.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +226 -15
- package/docs/index.md +230 -11
- package/install.sh +260 -0
- package/package.json +4 -3
- package/src/clients/databricks.js +158 -0
- package/src/clients/routing.js +13 -1
- package/src/config/index.js +68 -1
- package/src/db/index.js +118 -0
- package/src/memory/extractor.js +350 -0
- package/src/memory/index.js +55 -0
- package/src/memory/retriever.js +266 -0
- package/src/memory/search.js +239 -0
- package/src/memory/store.js +411 -0
- package/src/memory/surprise.js +306 -0
- package/src/memory/tools.js +348 -0
- package/src/orchestrator/index.js +170 -0
- package/test/llamacpp-integration.test.js +686 -0
- package/test/memory/extractor.test.js +360 -0
- package/test/memory/retriever.test.js +583 -0
- package/test/memory/search.test.js +389 -0
- package/test/memory/store.test.js +312 -0
- package/test/memory/surprise.test.js +300 -0
- package/test/memory-performance.test.js +472 -0
- package/test/openai-integration.test.js +681 -0
package/README.md
CHANGED
@@ -5,7 +5,9 @@
 [](LICENSE)
 [](https://deepwiki.com/vishalveerareddy123/Lynkr)
 [](https://www.databricks.com/)
+[](https://openai.com/)
 [](https://ollama.ai/)
+[](https://github.com/ggerganov/llama.cpp)
 [](https://www.indexnow.org/)
 [](https://devhunt.org/tool/lynkr)
 
@@ -66,7 +68,7 @@ Key highlights:
 
 The result is a production-ready, self-hosted alternative that stays close to Anthropic's ergonomics while providing enterprise-grade reliability, observability, and performance.
 
-> **Compatibility note:** Claude models hosted on Databricks work out of the box. Set `MODEL_PROVIDER=azure-anthropic` (and related credentials) to target the Azure-hosted Anthropic `/anthropic/v1/messages` endpoint. Set `MODEL_PROVIDER=openrouter` to access 100+ models through OpenRouter (GPT-4o, Claude, Gemini, etc.). Set `MODEL_PROVIDER=ollama` to use locally-running Ollama models (qwen2.5-coder, llama3, mistral, etc.).
+> **Compatibility note:** Claude models hosted on Databricks work out of the box. Set `MODEL_PROVIDER=openai` to use OpenAI's API directly (GPT-4o, GPT-4o-mini, o1, etc.). Set `MODEL_PROVIDER=azure-anthropic` (and related credentials) to target the Azure-hosted Anthropic `/anthropic/v1/messages` endpoint. Set `MODEL_PROVIDER=openrouter` to access 100+ models through OpenRouter (GPT-4o, Claude, Gemini, etc.). Set `MODEL_PROVIDER=ollama` to use locally-running Ollama models (qwen2.5-coder, llama3, mistral, etc.).
 
 Further documentation and usage notes are available on [DeepWiki](https://deepwiki.com/vishalveerareddy123/Lynkr).
 
@@ -81,10 +83,12 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
 | Provider | Configuration | Models Available | Best For |
 |----------|--------------|------------------|----------|
 | **Databricks** (Default) | `MODEL_PROVIDER=databricks` | Claude Sonnet 4.5, Claude Opus 4.5 | Production use, enterprise deployment |
-| **
+| **OpenAI** | `MODEL_PROVIDER=openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, GPT-4-turbo, o1, o1-mini | Direct OpenAI API access |
+| **Azure OpenAI** | `MODEL_PROVIDER=azure-openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, o1, o3 | Azure integration, Microsoft ecosystem |
 | **Azure Anthropic** | `MODEL_PROVIDER=azure-anthropic` | Claude Sonnet 4.5, Claude Opus 4.5 | Azure-hosted Claude models |
 | **OpenRouter** | `MODEL_PROVIDER=openrouter` | 100+ models (GPT-4o, Claude, Gemini, Llama, etc.) | Model flexibility, cost optimization |
 | **Ollama** (Local) | `MODEL_PROVIDER=ollama` | Llama 3.1, Qwen2.5, Mistral, CodeLlama | Local/offline use, privacy, no API costs |
+| **llama.cpp** (Local) | `MODEL_PROVIDER=llamacpp` | Any GGUF model | Maximum performance, full model control |
 
 ### **Recommended Models by Use Case**
 
@@ -158,21 +162,68 @@ FALLBACK_PROVIDER=databricks # or azure-openai, openrouter, azure-anthropic
 
 ### **Provider Comparison**
 
-| Feature | Databricks | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama |
-|---------|-----------|--------------|-----------------|------------|--------|
-| **Setup Complexity** | Medium | Medium | Medium | Easy | Easy |
-| **Cost** | $$$ | $$ | $$$ | $ | Free |
-| **Latency** | Low | Low | Low | Medium | Very Low |
-| **Tool Calling** | Excellent | Excellent | Excellent | Good | Fair |
-| **Context Length** | 200K | 128K | 200K | Varies | 32K-128K |
-| **Streaming** | Yes | Yes | Yes | Yes | Yes |
-| **Privacy** | Enterprise | Enterprise | Enterprise | Third-party | Local |
-| **Offline** | No | No | No | No | Yes |
+| Feature | Databricks | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp |
+|---------|-----------|--------|--------------|-----------------|------------|--------|-----------|
+| **Setup Complexity** | Medium | Easy | Medium | Medium | Easy | Easy | Medium |
+| **Cost** | $$$ | $$ | $$ | $$$ | $ | Free | Free |
+| **Latency** | Low | Low | Low | Low | Medium | Very Low | Very Low |
+| **Tool Calling** | Excellent | Excellent | Excellent | Excellent | Good | Fair | Good |
+| **Context Length** | 200K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent |
+| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+| **Privacy** | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Local | Local |
+| **Offline** | No | No | No | No | No | Yes | Yes |
 
 ---
 
 ## Core Capabilities
 
+### Long-Term Memory System (Titans-Inspired)
+
+**NEW:** Lynkr now includes a comprehensive long-term memory system inspired by Google's Titans architecture, enabling persistent context across conversations and intelligent memory management.
+
+**Key Features:**
+- 🧠 **Surprise-Based Memory Updates** – Automatically extracts and stores only important, novel, or surprising information from conversations using a 5-factor heuristic scoring system (novelty, contradiction, specificity, emphasis, context switch).
+- 🔍 **FTS5 Semantic Search** – Full-text search with Porter stemmer and keyword expansion for finding relevant memories.
+- 📊 **Multi-Signal Retrieval** – Ranks memories using recency (30%), importance (40%), and relevance (30%) for optimal context injection.
+- ⚡ **Automatic Integration** – Memories are extracted after each response and injected before model calls with minimal latency overhead (<50ms retrieval, <100ms async extraction).
+- 🎯 **5 Memory Types** – Tracks preferences, decisions, facts, entities, and relationships.
+- 🛠️ **Management Tools** – `memory_search`, `memory_add`, `memory_forget`, `memory_stats` for explicit control.
+
+**Quick Start:**
+```bash
+# Memory system is enabled by default - just use Lynkr!
+# Test it:
+# 1. Say: "I prefer Python for data processing"
+# 2. Later ask: "What language should I use for data tasks?"
+# → Model will remember your preference and recommend Python
+```
+
+**Configuration:**
+```env
+MEMORY_ENABLED=true            # Enable/disable (default: true)
+MEMORY_RETRIEVAL_LIMIT=5       # Memories per request (default: 5)
+MEMORY_SURPRISE_THRESHOLD=0.3  # Min score to store (default: 0.3)
+MEMORY_MAX_AGE_DAYS=90         # Auto-prune age (default: 90)
+MEMORY_MAX_COUNT=10000         # Max memories (default: 10000)
+```
+
+**What Gets Remembered:**
+- ✅ User preferences ("I prefer X")
+- ✅ Important decisions ("Decided to use Y")
+- ✅ Project facts ("This app uses Z")
+- ✅ New entities (first mentions of files, functions)
+- ✅ Contradictions ("Actually, A not B")
+- ❌ Greetings, confirmations, repeated info (filtered by surprise threshold)
+
+**Benefits:**
+- 🎯 **Better context understanding** across sessions
+- 💾 **Persistent knowledge** stored in SQLite
+- 🚀 **Minimal performance impact** (<50ms retrieval, async extraction)
+- 🔒 **Privacy-preserving** (all local, no external APIs)
+- 📈 **Scales efficiently** (supports 10K+ memories)
+
+See [MEMORY_SYSTEM.md](MEMORY_SYSTEM.md) for complete documentation and [QUICKSTART_MEMORY.md](QUICKSTART_MEMORY.md) for usage examples.
+
 ### Repo Intelligence & Navigation
 
 - Fast indexer builds a lightweight SQLite catalog of files, symbols, references, and framework hints.
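The multi-signal retrieval described in the memory section (recency 30%, importance 40%, relevance 30%) can be sketched in a few lines. This is a hedged illustration only: the function names, the exponential recency decay, and the sample fields are assumptions, not Lynkr's actual `src/memory/retriever.js` API.

```javascript
// Illustrative sketch of weighted multi-signal memory ranking.
// Weights come from the README; everything else is hypothetical.

function recencyScore(ageDays, halfLifeDays = 30) {
  // Exponential decay: a memory counts half as much every `halfLifeDays`.
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function rankMemories(memories, weights = { recency: 0.3, importance: 0.4, relevance: 0.3 }) {
  return memories
    .map((m) => ({
      ...m,
      score:
        weights.recency * recencyScore(m.ageDays) +
        weights.importance * m.importance + // 0..1, assigned at extraction time
        weights.relevance * m.relevance,    // 0..1, e.g. FTS5 match strength
    }))
    .sort((a, b) => b.score - a.score); // best candidates first
}

const ranked = rankMemories([
  { text: "prefers Python", ageDays: 1, importance: 0.9, relevance: 0.8 },
  { text: "greeting", ageDays: 0, importance: 0.1, relevance: 0.2 },
]);
// ranked[0] is the high-importance, high-relevance memory
```

The top `MEMORY_RETRIEVAL_LIMIT` entries of such a ranking would then be injected as context before the model call.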
@@ -428,6 +479,23 @@ Lynkr includes comprehensive production-ready features designed for reliability,
 
 Lynkr offers multiple installation methods to fit your workflow:
 
+#### Quick Install (curl)
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash
+```
+
+This will:
+- Clone Lynkr to `~/.lynkr`
+- Install dependencies
+- Create a default `.env` file
+- Set up the `lynkr` command
+
+**Custom installation directory:**
+```bash
+curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash -s -- --dir /opt/lynkr
+```
+
 #### Option 1: Simple Databricks Setup (Quickest)
 
 **No Ollama needed** - Just use Databricks APIs directly:
@@ -603,6 +671,52 @@ ollama pull qwen2.5-coder:latest
 ollama list
 ```
 
+**llama.cpp configuration:**
+
+llama.cpp provides maximum performance and flexibility for running GGUF models locally. It uses an OpenAI-compatible API, making integration seamless.
+
+```env
+MODEL_PROVIDER=llamacpp
+LLAMACPP_ENDPOINT=http://localhost:8081  # llama.cpp server; use a port that doesn't clash with Lynkr's PORT
+LLAMACPP_MODEL=qwen2.5-coder-7b          # model name (for logging)
+LLAMACPP_TIMEOUT_MS=120000               # request timeout
+PORT=8080
+WORKSPACE_ROOT=/path/to/your/repo
+```
+
+Before starting Lynkr with llama.cpp, ensure llama-server is running:
+
+```bash
+# Download and build llama.cpp (if not already done)
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp && make
+
+# Download a GGUF model (e.g., from HuggingFace)
+# Example: Qwen2.5-Coder-7B-Instruct
+wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
+
+# Start llama-server on a port that doesn't clash with Lynkr's PORT
+./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081
+
+# Verify server is running
+curl http://localhost:8081/health
+```
+
+**Why llama.cpp over Ollama?**
+
+| Feature | Ollama | llama.cpp |
+|---------|--------|-----------|
+| Setup | Easy (app) | Manual (compile/download) |
+| Model Format | Ollama-specific | Any GGUF model |
+| Performance | Good | Excellent (optimized C++) |
+| GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
+| Memory Usage | Higher | Lower (quantization options) |
+| API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
+| Flexibility | Limited models | Any GGUF from HuggingFace |
+| Tool Calling | Limited models | Grammar-based, more reliable |
+
+Choose llama.cpp when you need maximum performance, specific quantization options, or want to use GGUF models not available in Ollama.
+
 **OpenRouter configuration:**
 
 OpenRouter provides unified access to 100+ AI models through a single API, including GPT-4o, Claude, Gemini, Llama, Mixtral, and more. It offers competitive pricing, automatic fallbacks, and no need to manage multiple API keys.
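To make the "OpenAI-compatible API" point above concrete: any HTTP client can talk to llama-server exactly the way it would talk to OpenAI's chat endpoint. A minimal sketch, assuming a llama-server reachable at the endpoint you configured; `buildChatRequest` and `chat` are hypothetical helper names, not Lynkr code.

```javascript
// Hypothetical helpers showing what "OpenAI-compatible" buys you:
// the same request shape works against llama-server and api.openai.com.

function buildChatRequest(messages, { model = "qwen2.5-coder-7b", tools } = {}) {
  const body = { model, messages, stream: false };
  if (tools && tools.length) body.tools = tools; // OpenAI-style tool schemas
  return body;
}

async function chat(endpoint, messages) {
  // POST to the standard chat-completions route exposed by llama-server.
  const res = await fetch(`${endpoint}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(messages)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example (requires a running llama-server):
// chat("http://localhost:8081", [{ role: "user", content: "Say hi" }]).then(console.log);
```

Because the request and response shapes match OpenAI's, a proxy like Lynkr can reuse one conversion path for both providers.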
@@ -624,6 +738,33 @@ WORKSPACE_ROOT=/path/to/your/repo
 
 See https://openrouter.ai/models for the complete list with pricing.
 
+**OpenAI configuration:**
+
+OpenAI provides direct access to GPT-4o, GPT-4o-mini, o1, and other models through their official API. This is the simplest way to use OpenAI models without going through Azure or OpenRouter.
+
+```env
+MODEL_PROVIDER=openai
+OPENAI_API_KEY=sk-your-openai-api-key  # Get from https://platform.openai.com/api-keys
+OPENAI_MODEL=gpt-4o                    # Model to use (default: gpt-4o)
+PORT=8080
+WORKSPACE_ROOT=/path/to/your/repo
+```
+
+**Getting an OpenAI API key:**
+1. Visit https://platform.openai.com
+2. Sign up or log in to your account
+3. Go to https://platform.openai.com/api-keys
+4. Create a new API key
+5. Add credits to your account (pay-as-you-go)
+
+**OpenAI benefits:**
+- ✅ **Direct API access** – No intermediaries, lowest latency to OpenAI
+- ✅ **Full tool calling support** – Excellent function calling compatible with Claude Code CLI
+- ✅ **Parallel tool calls** – Execute multiple tools simultaneously for faster workflows
+- ✅ **Organization support** – Use organization-level API keys for team billing
+- ✅ **Simple setup** – Just one API key needed
+
 **Getting an OpenRouter API key:**
 1. Visit https://openrouter.ai
 2. Sign in with GitHub, Google, or email
@@ -647,7 +788,7 @@ See https://openrouter.ai/models for the complete list with pricing.
 |----------|-------------|---------|
 | `PORT` | HTTP port for the proxy server. | `8080` |
 | `WORKSPACE_ROOT` | Filesystem path exposed to workspace tools and indexer. | `process.cwd()` |
-| `MODEL_PROVIDER` | Selects the model backend (`databricks`, `azure-anthropic`, `openrouter`, `ollama`). | `databricks` |
+| `MODEL_PROVIDER` | Selects the model backend (`databricks`, `openai`, `azure-openai`, `azure-anthropic`, `openrouter`, `ollama`, `llamacpp`). | `databricks` |
 | `MODEL_DEFAULT` | Overrides the default model/deployment name sent to the provider. | Provider-specific default |
 | `DATABRICKS_API_BASE` | Base URL of your Databricks workspace (required when `MODEL_PROVIDER=databricks`). | – |
 | `DATABRICKS_API_KEY` | Databricks PAT used for the serving endpoint (required for Databricks). | – |
@@ -659,9 +800,17 @@ See https://openrouter.ai/models for the complete list with pricing.
 | `OPENROUTER_MODEL` | OpenRouter model to use (e.g., `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`). See https://openrouter.ai/models | `openai/gpt-4o-mini` |
 | `OPENROUTER_ENDPOINT` | OpenRouter API endpoint URL. | `https://openrouter.ai/api/v1/chat/completions` |
 | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Maximum tool count for routing to OpenRouter in hybrid mode. | `15` |
+| `OPENAI_API_KEY` | OpenAI API key (required when `MODEL_PROVIDER=openai`). Get from https://platform.openai.com/api-keys | – |
+| `OPENAI_MODEL` | OpenAI model to use (e.g., `gpt-4o`, `gpt-4o-mini`, `o1-preview`). | `gpt-4o` |
+| `OPENAI_ENDPOINT` | OpenAI API endpoint URL (rarely needs changing). | `https://api.openai.com/v1/chat/completions` |
+| `OPENAI_ORGANIZATION` | OpenAI organization ID for organization-level API keys (optional). | – |
 | `OLLAMA_ENDPOINT` | Ollama API endpoint URL (required when `MODEL_PROVIDER=ollama`). | `http://localhost:11434` |
 | `OLLAMA_MODEL` | Ollama model name to use (e.g., `qwen2.5-coder:latest`, `llama3`, `mistral`). | `qwen2.5-coder:7b` |
 | `OLLAMA_TIMEOUT_MS` | Request timeout for Ollama API calls in milliseconds. | `120000` (2 minutes) |
+| `LLAMACPP_ENDPOINT` | llama.cpp server endpoint URL (required when `MODEL_PROVIDER=llamacpp`). | `http://localhost:8080` |
+| `LLAMACPP_MODEL` | llama.cpp model name (for logging purposes). | `default` |
+| `LLAMACPP_TIMEOUT_MS` | Request timeout for llama.cpp API calls in milliseconds. | `120000` (2 minutes) |
+| `LLAMACPP_API_KEY` | Optional API key for secured llama.cpp servers. | – |
 | `PROMPT_CACHE_ENABLED` | Toggle the prompt cache system. | `true` |
 | `PROMPT_CACHE_TTL_MS` | Milliseconds before cached prompts expire. | `300000` (5 minutes) |
 | `PROMPT_CACHE_MAX_ENTRIES` | Maximum number of cached prompts retained. | `64` |
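The environment-variable table follows a regular pattern: a `MODEL_PROVIDER` switch plus per-provider `*_ENDPOINT`/`*_MODEL` overrides with documented defaults. A minimal sketch of that resolution logic, using only defaults listed in the table; this is illustrative, not Lynkr's actual `src/config/index.js`, and it omits the Databricks and Azure variants.

```javascript
// Illustrative config resolution using defaults from the table above.
function resolveProviderConfig(env) {
  const provider = env.MODEL_PROVIDER || "databricks";
  const defaults = {
    openai: { model: "gpt-4o", endpoint: "https://api.openai.com/v1/chat/completions" },
    openrouter: { model: "openai/gpt-4o-mini", endpoint: "https://openrouter.ai/api/v1/chat/completions" },
    ollama: { model: "qwen2.5-coder:7b", endpoint: "http://localhost:11434" },
    llamacpp: { model: "default", endpoint: "http://localhost:8080" },
  };
  const d = defaults[provider] || {};
  const prefix = provider.toUpperCase(); // e.g. OPENAI_MODEL, LLAMACPP_ENDPOINT
  return {
    provider,
    // MODEL_DEFAULT overrides the provider-specific model, per the table.
    model: env.MODEL_DEFAULT || env[`${prefix}_MODEL`] || d.model,
    endpoint: env[`${prefix}_ENDPOINT`] || d.endpoint,
  };
}
```

For example, with only `MODEL_PROVIDER=llamacpp` set, the sketch falls back to `http://localhost:8080` and model `default`, matching the table.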
@@ -1282,9 +1431,12 @@ Replace `<workspace>` and `<endpoint-name>` with your Databricks workspace host
 ### Provider-specific behaviour
 
 - **Databricks** – Mirrors Anthropic's hosted behaviour. Automatic policy web fallbacks (`needsWebFallback`) can trigger an extra `web_fetch`, and the upstream service executes dynamic pages on your behalf.
+- **OpenAI** – Connects directly to OpenAI's API for GPT-4o, GPT-4o-mini, o1, and other models. Full tool calling support with parallel tool execution enabled by default. Messages and tools are automatically converted between Anthropic and OpenAI formats. Supports organization-level API keys. Best used when you want direct access to OpenAI's latest models with the simplest setup.
+- **Azure OpenAI** – Connects to Azure-hosted OpenAI models. Similar to direct OpenAI but through Azure's infrastructure for enterprise compliance, data residency, and Azure billing integration.
 - **Azure Anthropic** – Requests are normalised to Azure's payload shape. The proxy disables automatic `web_fetch` fallbacks to avoid duplicate tool executions; instead, the assistant surfaces a diagnostic message and you can trigger the tool manually if required.
 - **OpenRouter** – Connects to OpenRouter's unified API for access to 100+ models. Full tool calling support with automatic format conversion between Anthropic and OpenAI formats. Messages are converted to OpenAI's format, tool calls are properly translated, and responses are converted back to Anthropic-compatible format. Best used for cost optimization, model flexibility, or when you want to experiment with different models without changing your codebase.
 - **Ollama** – Connects to locally-running Ollama models. Tool support varies by model (llama3.1, qwen2.5, mistral support tools; llama3 and older models don't). System prompts are merged into the first user message. Response format is converted from Ollama's format to Anthropic-compatible content blocks. Best used for simple text generation tasks, offline development, or as a cost-effective development environment.
+- **llama.cpp** – Connects to a local llama-server instance running GGUF models. Uses OpenAI-compatible API format (`/v1/chat/completions`), enabling full tool calling support with grammar-based generation. Provides maximum performance with optimized C++ inference, lower memory usage through quantization, and support for any GGUF model from HuggingFace. Best used when you need maximum performance, specific quantization options, or models not available in Ollama.
 - In all cases, `web_search` and `web_fetch` run locally. They do not execute JavaScript, so pages that render data client-side (Google Finance, etc.) will return scaffolding only. Prefer JSON/CSV quote APIs (e.g. Yahoo chart API) when you need live financial data.
 
 ---
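The "automatically converted between Anthropic and OpenAI formats" step mentioned for several providers can be illustrated with a stripped-down converter. A hedged sketch only: the real converters in `src/clients/` also translate tool calls and tool results, which this version deliberately omits.

```javascript
// Simplified Anthropic -> OpenAI chat message conversion (text blocks only).
// Anthropic keeps the system prompt separate and uses content-block arrays;
// OpenAI expects a flat message list with a leading system message.
function toOpenAIMessages(system, anthropicMessages) {
  const out = [];
  if (system) out.push({ role: "system", content: system });
  for (const msg of anthropicMessages) {
    const content = Array.isArray(msg.content)
      ? msg.content
          .filter((block) => block.type === "text") // drop non-text blocks in this sketch
          .map((block) => block.text)
          .join("\n")
      : msg.content; // plain-string shorthand passes through
    out.push({ role: msg.role, content });
  }
  return out;
}

const converted = toOpenAIMessages("You are helpful.", [
  { role: "user", content: [{ type: "text", text: "hello" }] },
]);
// converted[0] is the system message; converted[1] is the flattened user turn
```

The reverse direction (wrapping an OpenAI response back into Anthropic-style content blocks) follows the same idea in mirror image.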
@@ -1460,7 +1612,42 @@ A:
 - **OpenRouter**: ~300ms-1.5s latency, cloud-hosted, competitive pricing ($0.15/1M for GPT-4o-mini), 100+ models, full tool support
 - **Ollama**: ~100-500ms first token, runs locally, free, limited tool support (model-dependent)
 
-Choose Databricks/Azure for enterprise production with guaranteed SLAs. Choose OpenRouter for flexibility, cost optimization, and access to multiple models. Choose Ollama for fast iteration, offline development, or maximum cost savings.
+Choose Databricks/Azure for enterprise production with guaranteed SLAs. Choose OpenRouter for flexibility, cost optimization, and access to multiple models. Choose Ollama for fast iteration, offline development, or maximum cost savings. Choose llama.cpp for maximum performance and full GGUF model control.
+
+**Q: What is llama.cpp and when should I use it over Ollama?**
+A: llama.cpp is a high-performance C++ inference engine for running large language models locally. Unlike Ollama (which is an application with its own model format), llama.cpp:
+- **Runs any GGUF model** from HuggingFace directly
+- **Provides better performance** through optimized C++ code
+- **Uses less memory** with advanced quantization options (Q2_K to Q8_0)
+- **Supports more GPU backends** (CUDA, Metal, ROCm, Vulkan, SYCL)
+- **Uses OpenAI-compatible API**, making integration seamless
+
+Use llama.cpp when you need:
+- Maximum inference speed and minimum memory usage
+- Specific quantization levels not available in Ollama
+- GGUF models not packaged for Ollama
+- Fine-grained control over model parameters (context length, GPU layers, etc.)
+
+Use Ollama when you prefer easier setup and don't need the extra control.
+
+**Q: How do I set up llama.cpp with Lynkr?**
+A:
+```bash
+# 1. Build llama.cpp (or download a pre-built binary)
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp && make
+
+# 2. Download a GGUF model
+wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
+
+# 3. Start the server on a port that doesn't clash with Lynkr's PORT
+./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081
+
+# 4. Configure Lynkr
+export MODEL_PROVIDER=llamacpp
+export LLAMACPP_ENDPOINT=http://localhost:8081
+npm start
+```
 
 **Q: What is OpenRouter and why should I use it?**
 A: OpenRouter is a unified API gateway that provides access to 100+ AI models from multiple providers (OpenAI, Anthropic, Google, Meta, Mistral, etc.) through a single API key. Benefits include:
@@ -1492,7 +1679,31 @@ A: Popular choices:
 
 See https://openrouter.ai/models for the complete list with pricing and features.
 
-**Q:
+**Q: How do I use OpenAI directly with Lynkr?**
+A: Set `MODEL_PROVIDER=openai` and configure your API key:
+```env
+MODEL_PROVIDER=openai
+OPENAI_API_KEY=sk-your-api-key
+OPENAI_MODEL=gpt-4o  # or gpt-4o-mini, o1-preview, etc.
+```
+Then start Lynkr and connect Claude CLI as usual. All requests will be routed to OpenAI's API with automatic format conversion.
+
+**Q: What's the difference between OpenAI, Azure OpenAI, and OpenRouter?**
+A:
+- **OpenAI** – Direct access to OpenAI's API. Simplest setup, lowest latency to OpenAI, pay-as-you-go billing directly with OpenAI.
+- **Azure OpenAI** – OpenAI models hosted on Azure infrastructure. Enterprise features (private endpoints, data residency, Azure AD integration), billed through Azure.
+- **OpenRouter** – Third-party API gateway providing access to 100+ models (including OpenAI). Competitive pricing, automatic fallbacks, single API key for multiple providers.
+
+Choose OpenAI for simplicity and direct access, Azure OpenAI for enterprise requirements, or OpenRouter for model flexibility and cost optimization.
+
+**Q: Which OpenAI model should I use?**
+A:
+- **Best quality**: `gpt-4o` – Most capable, multimodal (text + vision), excellent tool calling
+- **Best value**: `gpt-4o-mini` – Fast, affordable ($0.15/$0.60 per 1M tokens), good for most tasks
+- **Complex reasoning**: `o1-preview` – Advanced reasoning for math, logic, and complex problems
+- **Fast reasoning**: `o1-mini` – Efficient reasoning for coding and math tasks
+
+**Q: Can I use OpenAI with the 3-tier hybrid routing?**
 A: Yes! The recommended configuration uses:
 - **Tier 1 (0-2 tools)**: Ollama (free, local, fast)
 - **Tier 2 (3-14 tools)**: OpenRouter (affordable, full tool support)