lynkr 2.0.0 → 3.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +226 -15
- package/docs/index.md +230 -11
- package/install.sh +260 -0
- package/package.json +4 -3
- package/src/clients/databricks.js +158 -0
- package/src/clients/routing.js +13 -1
- package/src/config/index.js +68 -1
- package/src/db/index.js +118 -0
- package/src/memory/extractor.js +350 -0
- package/src/memory/index.js +55 -0
- package/src/memory/retriever.js +266 -0
- package/src/memory/search.js +239 -0
- package/src/memory/store.js +411 -0
- package/src/memory/surprise.js +306 -0
- package/src/memory/tools.js +348 -0
- package/src/orchestrator/index.js +170 -0
- package/test/llamacpp-integration.test.js +686 -0
- package/test/memory/extractor.test.js +360 -0
- package/test/memory/retriever.test.js +583 -0
- package/test/memory/search.test.js +389 -0
- package/test/memory/store.test.js +312 -0
- package/test/memory/surprise.test.js +300 -0
- package/test/memory-performance.test.js +472 -0
- package/test/openai-integration.test.js +681 -0
package/README.md
CHANGED
@@ -5,7 +5,9 @@
 [](LICENSE)
 [](https://deepwiki.com/vishalveerareddy123/Lynkr)
 [](https://www.databricks.com/)
+[](https://openai.com/)
 [](https://ollama.ai/)
+[](https://github.com/ggerganov/llama.cpp)
 [](https://www.indexnow.org/)
 [](https://devhunt.org/tool/lynkr)
 
@@ -66,7 +68,7 @@ Key highlights:
 
 The result is a production-ready, self-hosted alternative that stays close to Anthropic's ergonomics while providing enterprise-grade reliability, observability, and performance.
 
-> **Compatibility note:** Claude models hosted on Databricks work out of the box. Set `MODEL_PROVIDER=azure-anthropic` (and related credentials) to target the Azure-hosted Anthropic `/anthropic/v1/messages` endpoint. Set `MODEL_PROVIDER=openrouter` to access 100+ models through OpenRouter (GPT-4o, Claude, Gemini, etc.). Set `MODEL_PROVIDER=ollama` to use locally-running Ollama models (qwen2.5-coder, llama3, mistral, etc.).
+> **Compatibility note:** Claude models hosted on Databricks work out of the box. Set `MODEL_PROVIDER=openai` to use OpenAI's API directly (GPT-4o, GPT-4o-mini, o1, etc.). Set `MODEL_PROVIDER=azure-anthropic` (and related credentials) to target the Azure-hosted Anthropic `/anthropic/v1/messages` endpoint. Set `MODEL_PROVIDER=openrouter` to access 100+ models through OpenRouter (GPT-4o, Claude, Gemini, etc.). Set `MODEL_PROVIDER=ollama` to use locally-running Ollama models (qwen2.5-coder, llama3, mistral, etc.).
 
 Further documentation and usage notes are available on [DeepWiki](https://deepwiki.com/vishalveerareddy123/Lynkr).
 
@@ -81,10 +83,12 @@ Lynkr supports multiple AI model providers, giving you flexibility in choosing t
 | Provider | Configuration | Models Available | Best For |
 |----------|--------------|------------------|----------|
 | **Databricks** (Default) | `MODEL_PROVIDER=databricks` | Claude Sonnet 4.5, Claude Opus 4.5 | Production use, enterprise deployment |
-| **
+| **OpenAI** | `MODEL_PROVIDER=openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, GPT-4-turbo, o1, o1-mini | Direct OpenAI API access |
+| **Azure OpenAI** | `MODEL_PROVIDER=azure-openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, o1, o3 | Azure integration, Microsoft ecosystem |
 | **Azure Anthropic** | `MODEL_PROVIDER=azure-anthropic` | Claude Sonnet 4.5, Claude Opus 4.5 | Azure-hosted Claude models |
 | **OpenRouter** | `MODEL_PROVIDER=openrouter` | 100+ models (GPT-4o, Claude, Gemini, Llama, etc.) | Model flexibility, cost optimization |
 | **Ollama** (Local) | `MODEL_PROVIDER=ollama` | Llama 3.1, Qwen2.5, Mistral, CodeLlama | Local/offline use, privacy, no API costs |
+| **llama.cpp** (Local) | `MODEL_PROVIDER=llamacpp` | Any GGUF model | Maximum performance, full model control |
 
 ### **Recommended Models by Use Case**
 
@@ -158,21 +162,68 @@ FALLBACK_PROVIDER=databricks # or azure-openai, openrouter, azure-anthropic
 
 ### **Provider Comparison**
 
-| Feature | Databricks | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama |
-|---------|-----------|--------------|-----------------|------------|--------|
-| **Setup Complexity** | Medium | Medium | Medium | Easy | Easy |
-| **Cost** | $$$ | $$ | $$$ | $ | Free |
-| **Latency** | Low | Low | Low | Medium | Very Low |
-| **Tool Calling** | Excellent | Excellent | Excellent | Good | Fair |
-| **Context Length** | 200K | 128K | 200K | Varies | 32K-128K |
-| **Streaming** | Yes | Yes | Yes | Yes | Yes |
-| **Privacy** | Enterprise | Enterprise | Enterprise | Third-party | Local |
-| **Offline** | No | No | No | No | Yes |
+| Feature | Databricks | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp |
+|---------|-----------|--------|--------------|-----------------|------------|--------|-----------|
+| **Setup Complexity** | Medium | Easy | Medium | Medium | Easy | Easy | Medium |
+| **Cost** | $$$ | $$ | $$ | $$$ | $ | Free | Free |
+| **Latency** | Low | Low | Low | Low | Medium | Very Low | Very Low |
+| **Tool Calling** | Excellent | Excellent | Excellent | Excellent | Good | Fair | Good |
+| **Context Length** | 200K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent |
+| **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+| **Privacy** | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Local | Local |
+| **Offline** | No | No | No | No | No | Yes | Yes |
 
 ---
 
 ## Core Capabilities
 
+### Long-Term Memory System (Titans-Inspired)
+
+**NEW:** Lynkr now includes a comprehensive long-term memory system inspired by Google's Titans architecture, enabling persistent context across conversations and intelligent memory management.
+
+**Key Features:**
+- 🧠 **Surprise-Based Memory Updates** – Automatically extracts and stores only important, novel, or surprising information from conversations using a 5-factor heuristic scoring system (novelty, contradiction, specificity, emphasis, context switch).
+- 🔍 **FTS5 Semantic Search** – Full-text search with Porter stemmer and keyword expansion for finding relevant memories.
+- 📊 **Multi-Signal Retrieval** – Ranks memories using recency (30%), importance (40%), and relevance (30%) for optimal context injection.
+- ⚡ **Automatic Integration** – Memories are extracted after each response and injected before model calls with minimal latency overhead (<50ms retrieval, <100ms async extraction).
+- 🎯 **5 Memory Types** – Tracks preferences, decisions, facts, entities, and relationships.
+- 🛠️ **Management Tools** – `memory_search`, `memory_add`, `memory_forget`, `memory_stats` for explicit control.
+
+**Quick Start:**
+```bash
+# Memory system is enabled by default - just use Lynkr!
+# Test it:
+# 1. Say: "I prefer Python for data processing"
+# 2. Later ask: "What language should I use for data tasks?"
+# → Model will remember your preference and recommend Python
+```
+
+**Configuration:**
+```env
+MEMORY_ENABLED=true            # Enable/disable (default: true)
+MEMORY_RETRIEVAL_LIMIT=5       # Memories per request (default: 5)
+MEMORY_SURPRISE_THRESHOLD=0.3  # Min score to store (default: 0.3)
+MEMORY_MAX_AGE_DAYS=90         # Auto-prune age (default: 90)
+MEMORY_MAX_COUNT=10000         # Max memories (default: 10000)
+```
+
+**What Gets Remembered:**
+- ✅ User preferences ("I prefer X")
+- ✅ Important decisions ("Decided to use Y")
+- ✅ Project facts ("This app uses Z")
+- ✅ New entities (first mentions of files, functions)
+- ✅ Contradictions ("Actually, A not B")
+- ❌ Greetings, confirmations, repeated info (filtered by surprise threshold)
+
+**Benefits:**
+- 🎯 **Better context understanding** across sessions
+- 💾 **Persistent knowledge** stored in SQLite
+- 🚀 **Minimal performance impact** (<50ms retrieval, async extraction)
+- 🔒 **Privacy-preserving** (all local, no external APIs)
+- 📈 **Scales efficiently** (supports 10K+ memories)
+
+See [MEMORY_SYSTEM.md](MEMORY_SYSTEM.md) for complete documentation and [QUICKSTART_MEMORY.md](QUICKSTART_MEMORY.md) for usage examples.
+
 ### Repo Intelligence & Navigation
 
 - Fast indexer builds a lightweight SQLite catalog of files, symbols, references, and framework hints.
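The multi-signal retrieval described in the memory section (recency 30%, importance 40%, relevance 30%) can be sketched in a few lines. This is a hedged illustration only: the function names, the exponential recency decay, and the sample fields are assumptions, not Lynkr's actual `src/memory/retriever.js` API.

```javascript
// Illustrative sketch of weighted multi-signal memory ranking.
// Weights come from the README; everything else is hypothetical.

function recencyScore(ageDays, halfLifeDays = 30) {
  // Exponential decay: a memory counts half as much every `halfLifeDays`.
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function rankMemories(memories, weights = { recency: 0.3, importance: 0.4, relevance: 0.3 }) {
  return memories
    .map((m) => ({
      ...m,
      score:
        weights.recency * recencyScore(m.ageDays) +
        weights.importance * m.importance + // 0..1, assigned at extraction time
        weights.relevance * m.relevance,    // 0..1, e.g. FTS5 match strength
    }))
    .sort((a, b) => b.score - a.score); // best candidates first
}

const ranked = rankMemories([
  { text: "prefers Python", ageDays: 1, importance: 0.9, relevance: 0.8 },
  { text: "greeting", ageDays: 0, importance: 0.1, relevance: 0.2 },
]);
// ranked[0] is the high-importance, high-relevance memory
```

The top `MEMORY_RETRIEVAL_LIMIT` entries of such a ranking would then be injected as context before the model call.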
@@ -428,6 +479,23 @@ Lynkr includes comprehensive production-ready features designed for reliability,
 
 Lynkr offers multiple installation methods to fit your workflow:
 
+#### Quick Install (curl)
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash
+```
+
+This will:
+- Clone Lynkr to `~/.lynkr`
+- Install dependencies
+- Create a default `.env` file
+- Set up the `lynkr` command
+
+**Custom installation directory:**
+```bash
+curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash -s -- --dir /opt/lynkr
+```
+
 #### Option 1: Simple Databricks Setup (Quickest)
 
 **No Ollama needed** - Just use Databricks APIs directly:
@@ -603,6 +671,52 @@ ollama pull qwen2.5-coder:latest
 ollama list
 ```
 
+**llama.cpp configuration:**
+
+llama.cpp provides maximum performance and flexibility for running GGUF models locally. It uses an OpenAI-compatible API, making integration seamless.
+
+```env
+MODEL_PROVIDER=llamacpp
+LLAMACPP_ENDPOINT=http://localhost:8081  # llama.cpp server; use a port that doesn't clash with Lynkr's PORT
+LLAMACPP_MODEL=qwen2.5-coder-7b          # model name (for logging)
+LLAMACPP_TIMEOUT_MS=120000               # request timeout
+PORT=8080
+WORKSPACE_ROOT=/path/to/your/repo
+```
+
+Before starting Lynkr with llama.cpp, ensure llama-server is running:
+
+```bash
+# Download and build llama.cpp (if not already done)
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp && make
+
+# Download a GGUF model (e.g., from HuggingFace)
+# Example: Qwen2.5-Coder-7B-Instruct
+wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
+
+# Start llama-server on a port that doesn't clash with Lynkr's PORT
+./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081
+
+# Verify server is running
+curl http://localhost:8081/health
+```
+
+**Why llama.cpp over Ollama?**
+
+| Feature | Ollama | llama.cpp |
+|---------|--------|-----------|
+| Setup | Easy (app) | Manual (compile/download) |
+| Model Format | Ollama-specific | Any GGUF model |
+| Performance | Good | Excellent (optimized C++) |
+| GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
+| Memory Usage | Higher | Lower (quantization options) |
+| API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
+| Flexibility | Limited models | Any GGUF from HuggingFace |
+| Tool Calling | Limited models | Grammar-based, more reliable |
+
+Choose llama.cpp when you need maximum performance, specific quantization options, or want to use GGUF models not available in Ollama.
+
 **OpenRouter configuration:**
 
 OpenRouter provides unified access to 100+ AI models through a single API, including GPT-4o, Claude, Gemini, Llama, Mixtral, and more. It offers competitive pricing, automatic fallbacks, and no need to manage multiple API keys.
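To make the "OpenAI-compatible API" point above concrete: any HTTP client can talk to llama-server exactly the way it would talk to OpenAI's chat endpoint. A minimal sketch, assuming a llama-server reachable at the endpoint you configured; `buildChatRequest` and `chat` are hypothetical helper names, not Lynkr code.

```javascript
// Hypothetical helpers showing what "OpenAI-compatible" buys you:
// the same request shape works against llama-server and api.openai.com.

function buildChatRequest(messages, { model = "qwen2.5-coder-7b", tools } = {}) {
  const body = { model, messages, stream: false };
  if (tools && tools.length) body.tools = tools; // OpenAI-style tool schemas
  return body;
}

async function chat(endpoint, messages) {
  // POST to the standard chat-completions route exposed by llama-server.
  const res = await fetch(`${endpoint}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(messages)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example (requires a running llama-server):
// chat("http://localhost:8081", [{ role: "user", content: "Say hi" }]).then(console.log);
```

Because the request and response shapes match OpenAI's, a proxy like Lynkr can reuse one conversion path for both providers.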
@@ -624,6 +738,33 @@ WORKSPACE_ROOT=/path/to/your/repo
 
 See https://openrouter.ai/models for the complete list with pricing.
 
+**OpenAI configuration:**
+
+OpenAI provides direct access to GPT-4o, GPT-4o-mini, o1, and other models through their official API. This is the simplest way to use OpenAI models without going through Azure or OpenRouter.
+
+```env
+MODEL_PROVIDER=openai
+OPENAI_API_KEY=sk-your-openai-api-key  # Get from https://platform.openai.com/api-keys
+OPENAI_MODEL=gpt-4o                    # Model to use (default: gpt-4o)
+PORT=8080
+WORKSPACE_ROOT=/path/to/your/repo
+```
+
+**Getting an OpenAI API key:**
+1. Visit https://platform.openai.com
+2. Sign up or log in to your account
+3. Go to https://platform.openai.com/api-keys
+4. Create a new API key
+5. Add credits to your account (pay-as-you-go)
+
+**OpenAI benefits:**
+- ✅ **Direct API access** – No intermediaries, lowest latency to OpenAI
+- ✅ **Full tool calling support** – Excellent function calling compatible with Claude Code CLI
+- ✅ **Parallel tool calls** – Execute multiple tools simultaneously for faster workflows
+- ✅ **Organization support** – Use organization-level API keys for team billing
+- ✅ **Simple setup** – Just one API key needed
+
 **Getting an OpenRouter API key:**
 1. Visit https://openrouter.ai
 2. Sign in with GitHub, Google, or email
@@ -647,7 +788,7 @@ See https://openrouter.ai/models for the complete list with pricing.
 |----------|-------------|---------|
 | `PORT` | HTTP port for the proxy server. | `8080` |
 | `WORKSPACE_ROOT` | Filesystem path exposed to workspace tools and indexer. | `process.cwd()` |
-| `MODEL_PROVIDER` | Selects the model backend (`databricks`, `azure-anthropic`, `openrouter`, `ollama`). | `databricks` |
+| `MODEL_PROVIDER` | Selects the model backend (`databricks`, `openai`, `azure-openai`, `azure-anthropic`, `openrouter`, `ollama`, `llamacpp`). | `databricks` |
 | `MODEL_DEFAULT` | Overrides the default model/deployment name sent to the provider. | Provider-specific default |
 | `DATABRICKS_API_BASE` | Base URL of your Databricks workspace (required when `MODEL_PROVIDER=databricks`). | – |
 | `DATABRICKS_API_KEY` | Databricks PAT used for the serving endpoint (required for Databricks). | – |
@@ -659,9 +800,17 @@ See https://openrouter.ai/models for the complete list with pricing.
 | `OPENROUTER_MODEL` | OpenRouter model to use (e.g., `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`). See https://openrouter.ai/models | `openai/gpt-4o-mini` |
 | `OPENROUTER_ENDPOINT` | OpenRouter API endpoint URL. | `https://openrouter.ai/api/v1/chat/completions` |
 | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Maximum tool count for routing to OpenRouter in hybrid mode. | `15` |
+| `OPENAI_API_KEY` | OpenAI API key (required when `MODEL_PROVIDER=openai`). Get from https://platform.openai.com/api-keys | – |
+| `OPENAI_MODEL` | OpenAI model to use (e.g., `gpt-4o`, `gpt-4o-mini`, `o1-preview`). | `gpt-4o` |
+| `OPENAI_ENDPOINT` | OpenAI API endpoint URL (rarely needs changing). | `https://api.openai.com/v1/chat/completions` |
+| `OPENAI_ORGANIZATION` | OpenAI organization ID for organization-level API keys (optional). | – |
 | `OLLAMA_ENDPOINT` | Ollama API endpoint URL (required when `MODEL_PROVIDER=ollama`). | `http://localhost:11434` |
 | `OLLAMA_MODEL` | Ollama model name to use (e.g., `qwen2.5-coder:latest`, `llama3`, `mistral`). | `qwen2.5-coder:7b` |
 | `OLLAMA_TIMEOUT_MS` | Request timeout for Ollama API calls in milliseconds. | `120000` (2 minutes) |
+| `LLAMACPP_ENDPOINT` | llama.cpp server endpoint URL (required when `MODEL_PROVIDER=llamacpp`). | `http://localhost:8080` |
+| `LLAMACPP_MODEL` | llama.cpp model name (for logging purposes). | `default` |
+| `LLAMACPP_TIMEOUT_MS` | Request timeout for llama.cpp API calls in milliseconds. | `120000` (2 minutes) |
+| `LLAMACPP_API_KEY` | Optional API key for secured llama.cpp servers. | – |
 | `PROMPT_CACHE_ENABLED` | Toggle the prompt cache system. | `true` |
 | `PROMPT_CACHE_TTL_MS` | Milliseconds before cached prompts expire. | `300000` (5 minutes) |
 | `PROMPT_CACHE_MAX_ENTRIES` | Maximum number of cached prompts retained. | `64` |
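The environment-variable table follows a regular pattern: a `MODEL_PROVIDER` switch plus per-provider `*_ENDPOINT`/`*_MODEL` overrides with documented defaults. A minimal sketch of that resolution logic, using only defaults listed in the table; this is illustrative, not Lynkr's actual `src/config/index.js`, and it omits the Databricks and Azure variants.

```javascript
// Illustrative config resolution using defaults from the table above.
function resolveProviderConfig(env) {
  const provider = env.MODEL_PROVIDER || "databricks";
  const defaults = {
    openai: { model: "gpt-4o", endpoint: "https://api.openai.com/v1/chat/completions" },
    openrouter: { model: "openai/gpt-4o-mini", endpoint: "https://openrouter.ai/api/v1/chat/completions" },
    ollama: { model: "qwen2.5-coder:7b", endpoint: "http://localhost:11434" },
    llamacpp: { model: "default", endpoint: "http://localhost:8080" },
  };
  const d = defaults[provider] || {};
  const prefix = provider.toUpperCase(); // e.g. OPENAI_MODEL, LLAMACPP_ENDPOINT
  return {
    provider,
    // MODEL_DEFAULT overrides the provider-specific model, per the table.
    model: env.MODEL_DEFAULT || env[`${prefix}_MODEL`] || d.model,
    endpoint: env[`${prefix}_ENDPOINT`] || d.endpoint,
  };
}
```

For example, with only `MODEL_PROVIDER=llamacpp` set, the sketch falls back to `http://localhost:8080` and model `default`, matching the table.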
@@ -1282,9 +1431,12 @@ Replace `<workspace>` and `<endpoint-name>` with your Databricks workspace host
 ### Provider-specific behaviour
 
 - **Databricks** – Mirrors Anthropic's hosted behaviour. Automatic policy web fallbacks (`needsWebFallback`) can trigger an extra `web_fetch`, and the upstream service executes dynamic pages on your behalf.
+- **OpenAI** – Connects directly to OpenAI's API for GPT-4o, GPT-4o-mini, o1, and other models. Full tool calling support with parallel tool execution enabled by default. Messages and tools are automatically converted between Anthropic and OpenAI formats. Supports organization-level API keys. Best used when you want direct access to OpenAI's latest models with the simplest setup.
+- **Azure OpenAI** – Connects to Azure-hosted OpenAI models. Similar to direct OpenAI but through Azure's infrastructure for enterprise compliance, data residency, and Azure billing integration.
 - **Azure Anthropic** – Requests are normalised to Azure's payload shape. The proxy disables automatic `web_fetch` fallbacks to avoid duplicate tool executions; instead, the assistant surfaces a diagnostic message and you can trigger the tool manually if required.
 - **OpenRouter** – Connects to OpenRouter's unified API for access to 100+ models. Full tool calling support with automatic format conversion between Anthropic and OpenAI formats. Messages are converted to OpenAI's format, tool calls are properly translated, and responses are converted back to Anthropic-compatible format. Best used for cost optimization, model flexibility, or when you want to experiment with different models without changing your codebase.
 - **Ollama** – Connects to locally-running Ollama models. Tool support varies by model (llama3.1, qwen2.5, mistral support tools; llama3 and older models don't). System prompts are merged into the first user message. Response format is converted from Ollama's format to Anthropic-compatible content blocks. Best used for simple text generation tasks, offline development, or as a cost-effective development environment.
+- **llama.cpp** – Connects to a local llama-server instance running GGUF models. Uses OpenAI-compatible API format (`/v1/chat/completions`), enabling full tool calling support with grammar-based generation. Provides maximum performance with optimized C++ inference, lower memory usage through quantization, and support for any GGUF model from HuggingFace. Best used when you need maximum performance, specific quantization options, or models not available in Ollama.
 - In all cases, `web_search` and `web_fetch` run locally. They do not execute JavaScript, so pages that render data client-side (Google Finance, etc.) will return scaffolding only. Prefer JSON/CSV quote APIs (e.g. Yahoo chart API) when you need live financial data.
 
 ---
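The "automatically converted between Anthropic and OpenAI formats" step mentioned for several providers can be illustrated with a stripped-down converter. A hedged sketch only: the real converters in `src/clients/` also translate tool calls and tool results, which this version deliberately omits.

```javascript
// Simplified Anthropic -> OpenAI chat message conversion (text blocks only).
// Anthropic keeps the system prompt separate and uses content-block arrays;
// OpenAI expects a flat message list with a leading system message.
function toOpenAIMessages(system, anthropicMessages) {
  const out = [];
  if (system) out.push({ role: "system", content: system });
  for (const msg of anthropicMessages) {
    const content = Array.isArray(msg.content)
      ? msg.content
          .filter((block) => block.type === "text") // drop non-text blocks in this sketch
          .map((block) => block.text)
          .join("\n")
      : msg.content; // plain-string shorthand passes through
    out.push({ role: msg.role, content });
  }
  return out;
}

const converted = toOpenAIMessages("You are helpful.", [
  { role: "user", content: [{ type: "text", text: "hello" }] },
]);
// converted[0] is the system message; converted[1] is the flattened user turn
```

The reverse direction (wrapping an OpenAI response back into Anthropic-style content blocks) follows the same idea in mirror image.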
@@ -1460,7 +1612,42 @@ A:
 - **OpenRouter**: ~300ms-1.5s latency, cloud-hosted, competitive pricing ($0.15/1M for GPT-4o-mini), 100+ models, full tool support
 - **Ollama**: ~100-500ms first token, runs locally, free, limited tool support (model-dependent)
 
-Choose Databricks/Azure for enterprise production with guaranteed SLAs. Choose OpenRouter for flexibility, cost optimization, and access to multiple models. Choose Ollama for fast iteration, offline development, or maximum cost savings.
+Choose Databricks/Azure for enterprise production with guaranteed SLAs. Choose OpenRouter for flexibility, cost optimization, and access to multiple models. Choose Ollama for fast iteration, offline development, or maximum cost savings. Choose llama.cpp for maximum performance and full GGUF model control.
+
+**Q: What is llama.cpp and when should I use it over Ollama?**
+A: llama.cpp is a high-performance C++ inference engine for running large language models locally. Unlike Ollama (which is an application with its own model format), llama.cpp:
+- **Runs any GGUF model** from HuggingFace directly
+- **Provides better performance** through optimized C++ code
+- **Uses less memory** with advanced quantization options (Q2_K to Q8_0)
+- **Supports more GPU backends** (CUDA, Metal, ROCm, Vulkan, SYCL)
+- **Uses OpenAI-compatible API**, making integration seamless
+
+Use llama.cpp when you need:
+- Maximum inference speed and minimum memory usage
+- Specific quantization levels not available in Ollama
+- GGUF models not packaged for Ollama
+- Fine-grained control over model parameters (context length, GPU layers, etc.)
+
+Use Ollama when you prefer easier setup and don't need the extra control.
+
+**Q: How do I set up llama.cpp with Lynkr?**
+A:
+```bash
+# 1. Build llama.cpp (or download a pre-built binary)
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp && make
+
+# 2. Download a GGUF model
+wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
+
+# 3. Start the server on a port that doesn't clash with Lynkr's PORT
+./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8081
+
+# 4. Configure Lynkr
+export MODEL_PROVIDER=llamacpp
+export LLAMACPP_ENDPOINT=http://localhost:8081
+npm start
+```
 
 **Q: What is OpenRouter and why should I use it?**
 A: OpenRouter is a unified API gateway that provides access to 100+ AI models from multiple providers (OpenAI, Anthropic, Google, Meta, Mistral, etc.) through a single API key. Benefits include:
@@ -1492,7 +1679,31 @@ A: Popular choices:
 
 See https://openrouter.ai/models for the complete list with pricing and features.
 
-**Q:
+**Q: How do I use OpenAI directly with Lynkr?**
+A: Set `MODEL_PROVIDER=openai` and configure your API key:
+```env
+MODEL_PROVIDER=openai
+OPENAI_API_KEY=sk-your-api-key
+OPENAI_MODEL=gpt-4o  # or gpt-4o-mini, o1-preview, etc.
+```
+Then start Lynkr and connect Claude CLI as usual. All requests will be routed to OpenAI's API with automatic format conversion.
+
+**Q: What's the difference between OpenAI, Azure OpenAI, and OpenRouter?**
+A:
+- **OpenAI** – Direct access to OpenAI's API. Simplest setup, lowest latency to OpenAI, pay-as-you-go billing directly with OpenAI.
+- **Azure OpenAI** – OpenAI models hosted on Azure infrastructure. Enterprise features (private endpoints, data residency, Azure AD integration), billed through Azure.
+- **OpenRouter** – Third-party API gateway providing access to 100+ models (including OpenAI). Competitive pricing, automatic fallbacks, single API key for multiple providers.
+
+Choose OpenAI for simplicity and direct access, Azure OpenAI for enterprise requirements, or OpenRouter for model flexibility and cost optimization.
+
+**Q: Which OpenAI model should I use?**
+A:
+- **Best quality**: `gpt-4o` – Most capable, multimodal (text + vision), excellent tool calling
+- **Best value**: `gpt-4o-mini` – Fast, affordable ($0.15/$0.60 per 1M tokens), good for most tasks
+- **Complex reasoning**: `o1-preview` – Advanced reasoning for math, logic, and complex problems
+- **Fast reasoning**: `o1-mini` – Efficient reasoning for coding and math tasks
+
+**Q: Can I use OpenAI with the 3-tier hybrid routing?**
 A: Yes! The recommended configuration uses:
 - **Tier 1 (0-2 tools)**: Ollama (free, local, fast)
 - **Tier 2 (3-14 tools)**: OpenRouter (affordable, full tool support)