lynkr 4.0.0 → 4.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,2996 +1,399 @@
1
- # Lynkr - Claude Code Proxy with Multi-Provider Support, MCP Integration & Token Optimization
1
+ # Lynkr - Claude Code Proxy with Multi-Provider Support
2
2
 
3
- [![npm version](https://img.shields.io/npm/v/lynkr.svg)](https://www.npmjs.com/package/lynkr "Lynkr NPM Package - Claude Code Proxy Server")
4
- [![Homebrew Tap](https://img.shields.io/badge/homebrew-lynkr-brightgreen.svg)](https://github.com/vishalveerareddy123/homebrew-lynkr "Install Lynkr via Homebrew")
5
- [![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE "Apache 2.0 License - Open Source Claude Code Alternative")
6
- [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/vishalveerareddy123/Lynkr "Lynkr Documentation on DeepWiki")
7
- [![Databricks Supported](https://img.shields.io/badge/Databricks-Supported-orange)](https://www.databricks.com/ "Databricks Claude Integration")
8
- [![AWS Bedrock](https://img.shields.io/badge/AWS%20Bedrock-100%2B%20Models-FF9900)](https://aws.amazon.com/bedrock/ "AWS Bedrock - 100+ Models")
9
- [![OpenAI Compatible](https://img.shields.io/badge/OpenAI-Compatible-412991)](https://openai.com/ "OpenAI GPT Integration")
10
- [![Ollama Compatible](https://img.shields.io/badge/Ollama-Compatible-brightgreen)](https://ollama.ai/ "Local Ollama Model Support")
11
- [![llama.cpp Compatible](https://img.shields.io/badge/llama.cpp-Compatible-blue)](https://github.com/ggerganov/llama.cpp "llama.cpp GGUF Model Support")
12
- [![IndexNow Enabled](https://img.shields.io/badge/IndexNow-Enabled-success?style=flat-square)](https://www.indexnow.org/ "SEO Optimized with IndexNow")
13
- [![DevHunt](https://img.shields.io/badge/DevHunt-Lynkr-orange)](https://devhunt.org/tool/lynkr "Lynkr on DevHunt")
3
+ [![npm version](https://img.shields.io/npm/v/lynkr.svg)](https://www.npmjs.com/package/lynkr)
4
+ [![Homebrew Tap](https://img.shields.io/badge/homebrew-lynkr-brightgreen.svg)](https://github.com/vishalveerareddy123/homebrew-lynkr)
5
+ [![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
6
+ [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/vishalveerareddy123/Lynkr)
7
+ [![Databricks Supported](https://img.shields.io/badge/Databricks-Supported-orange)](https://www.databricks.com/)
8
+ [![AWS Bedrock](https://img.shields.io/badge/AWS%20Bedrock-100%2B%20Models-FF9900)](https://aws.amazon.com/bedrock/)
9
+ [![OpenAI Compatible](https://img.shields.io/badge/OpenAI-Compatible-412991)](https://openai.com/)
10
+ [![Ollama Compatible](https://img.shields.io/badge/Ollama-Compatible-brightgreen)](https://ollama.ai/)
11
+ [![llama.cpp Compatible](https://img.shields.io/badge/llama.cpp-Compatible-blue)](https://github.com/ggerganov/llama.cpp)
14
12
 
15
- > **Claude Code proxy server supporting Databricks, AWS Bedrock (100+ models), OpenRouter, Ollama & Azure. Features MCP integration, prompt caching & 60-80% token optimization savings.**
16
-
17
- ## 🔖 Keywords
18
-
19
- `claude-code` `claude-proxy` `anthropic-api` `databricks-llm` `aws-bedrock` `bedrock-models` `deepseek-r1` `qwen3-coder` `openrouter-integration` `ollama-local` `llama-cpp` `azure-openai` `azure-anthropic` `mcp-server` `prompt-caching` `token-optimization` `ai-coding-assistant` `llm-proxy` `self-hosted-ai` `git-automation` `code-generation` `developer-tools` `ci-cd-automation` `llm-gateway` `cost-reduction` `multi-provider-llm`
20
-
21
- ---
22
-
23
- ## Table of Contents
24
-
25
- 1. [Why Lynkr?](#why-lynkr)
26
- 2. [Quick Start (3 minutes)](#quick-start-3-minutes)
27
- 3. [Overview](#overview)
28
- 4. [Supported AI Model Providers](#supported-ai-model-providers-databricks-aws-bedrock-openrouter-ollama-azure-llamacpp)
29
- 5. [Lynkr vs Native Claude Code](#lynkr-vs-native-claude-code)
30
- 6. [Core Capabilities](#core-capabilities)
31
- - [Repo Intelligence & Navigation](#repo-intelligence--navigation)
32
- - [Git Workflow Enhancements](#git-workflow-enhancements)
33
- - [Diff & Change Management](#diff--change-management)
34
- - [Execution & Tooling](#execution--tooling)
35
- - [Workflow & Collaboration](#workflow--collaboration)
36
- - [UX, Monitoring, and Logs](#ux-monitoring-and-logs)
37
- 7. [Production-Ready Features for Enterprise Deployment](#production-ready-features-for-enterprise-deployment)
38
- - [Reliability & Resilience](#reliability--resilience)
39
- - [Observability & Monitoring](#observability--monitoring)
40
- - [Security & Governance](#security--governance)
41
- 8. [Architecture](#architecture)
42
- 9. [Getting Started: Installation & Setup Guide](#getting-started-installation--setup-guide)
43
- 10. [Configuration Reference](#configuration-reference)
44
- 11. [Runtime Operations](#runtime-operations)
45
- - [Launching the Proxy](#launching-the-proxy)
46
- - [Connecting Claude Code CLI](#connecting-claude-code-cli)
47
- - [Using Ollama Models](#using-ollama-models)
48
- - [Hybrid Routing with Automatic Fallback](#hybrid-routing-with-automatic-fallback)
49
- - [Using Built-in Workspace Tools](#using-built-in-workspace-tools)
50
- - [Working with Prompt Caching](#working-with-prompt-caching)
51
- - [Integrating MCP Servers](#integrating-mcp-servers)
52
- - [Health Checks & Monitoring](#health-checks--monitoring)
53
- - [Metrics & Observability](#metrics--observability)
54
- 12. [Manual Test Matrix](#manual-test-matrix)
55
- 13. [Troubleshooting](#troubleshooting)
56
- 14. [Roadmap & Known Gaps](#roadmap--known-gaps)
57
- 15. [Frequently Asked Questions (FAQ)](#frequently-asked-questions-faq)
58
- 16. [References & Further Reading](#references--further-reading)
59
- 17. [Community & Adoption](#community--adoption)
60
- 18. [License](#license)
61
-
62
- ---
63
-
64
- ## Why Lynkr?
65
-
66
- ### The Problem
67
- Claude Code CLI is locked to Anthropic's API, limiting your choice of LLM providers, increasing costs, and preventing local/offline usage.
68
-
69
- ### The Solution
70
- Lynkr is a **production-ready proxy server** that unlocks Claude Code CLI's full potential:
71
-
72
- - ✅ **Any LLM Provider** - [Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Ollama (local), Azure, OpenAI, llama.cpp](#supported-ai-model-providers-databricks-aws-bedrock-openrouter-ollama-azure-llamacpp)
73
- - ✅ **60-80% Cost Reduction** - Built-in [token optimization](#token-optimization-implementation) (6 optimization phases implemented)
74
- - ✅ **Zero Code Changes** - [Drop-in replacement](#connecting-claude-code-cli) for Anthropic backend
75
- - ✅ **Local & Offline** - Run Claude Code with [Ollama](#using-ollama-models) or [llama.cpp](#using-llamacpp-with-lynkr) (no internet required)
76
- - ✅ **Enterprise Features** - [Circuit breakers, load balancing, metrics, K8s-ready health checks](#production-ready-features-for-enterprise-deployment)
77
- - ✅ **MCP Integration** - Automatically discover and orchestrate [Model Context Protocol servers](#integrating-mcp-servers)
78
- - ✅ **Privacy & Control** - Self-hosted, open-source ([Apache 2.0](#license)), no vendor lock-in
79
-
80
- ### Perfect For
81
- - 🔧 **Developers** who want flexibility and cost control
82
- - ðŸĒ **Enterprises** needing self-hosted AI with observability
83
- - 🔒 **Privacy-focused teams** requiring local model execution
84
- - 💰 **Cost-conscious projects** seeking token optimization
85
- - 🚀 **DevOps teams** wanting production-ready AI infrastructure
86
-
87
- ---
88
-
89
- ## Quick Start (3 minutes)
90
-
91
- ### 1ïļâƒĢ Install
92
- ```bash
93
- npm install -g lynkr
94
- ```
95
-
96
- ### 2ïļâƒĢ Configure Your Provider
97
- ```bash
98
- # Option A: Use AWS Bedrock (100+ models) 🆕
99
- export MODEL_PROVIDER=bedrock
100
- export AWS_BEDROCK_API_KEY=your-bearer-token
101
- export AWS_BEDROCK_REGION=us-east-2
102
- export AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
103
-
104
- # Option B: Use local Ollama (free, offline)
105
- export MODEL_PROVIDER=ollama
106
- export OLLAMA_MODEL=llama3.1:8b
107
-
108
- # Option C: Use Databricks (production)
109
- export MODEL_PROVIDER=databricks
110
- export DATABRICKS_API_BASE=https://your-workspace.databricks.net
111
- export DATABRICKS_API_KEY=your-api-key
112
-
113
- # Option D: Use OpenRouter (100+ models)
114
- export MODEL_PROVIDER=openrouter
115
- export OPENROUTER_API_KEY=your-api-key
116
- export OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
117
- ```
118
-
119
- ### 3ïļâƒĢ Start the Proxy
120
- ```bash
121
- lynkr start
122
- # Server running at http://localhost:8080
123
- ```
124
-
125
- ### 4ïļâƒĢ Connect Claude Code CLI
126
- ```bash
127
- # Point Claude Code CLI to Lynkr
128
- export ANTHROPIC_BASE_URL=http://localhost:8080
129
- export ANTHROPIC_API_KEY=dummy # Ignored by Lynkr, but required by CLI
130
-
131
- # Start coding!
132
- claude "Hello, world!"
133
- ```
134
-
135
- ### 🎉 You're Done!
136
- Claude Code CLI now works with your chosen provider.
137
-
138
- **Next steps:**
139
- - 📖 [Configuration Guide](#configuration-reference) - Customize settings
140
- - 🏭 [Production Setup](#production-ready-features-for-enterprise-deployment) - Deploy to production
141
- - 💰 [Token Optimization](#token-optimization) - Enable 60-80% cost savings
142
- - 🔧 [MCP Integration](#integrating-mcp-servers) - Add custom tools
13
+ > **Production-ready Claude Code proxy supporting 9+ LLM providers with 60-80% cost reduction through token optimization.**
143
14
 
144
15
  ---
145
16
 
146
17
  ## Overview
147
18
 
148
- This repository contains a Node.js service that emulates the Anthropic Claude Code backend so that the Claude Code CLI (or any compatible client) can operate against alternative model providers and custom tooling.
149
-
150
- Key highlights:
151
-
152
- - **Production-ready architecture** – 14 production hardening features including circuit breakers, load shedding, graceful shutdown, comprehensive metrics (Prometheus format), and Kubernetes-ready health checks. Minimal overhead (~7µs per request) with 140K req/sec throughput.
153
- - **Multi-provider support** – Works with Databricks (default), Azure-hosted Anthropic endpoints, OpenRouter (100+ models), and local Ollama models; requests are normalized to each provider while returning Claude-flavored responses.
154
- - **Enterprise observability** – Real-time metrics collection, structured logging with request ID correlation, latency percentiles (p50, p95, p99), token usage tracking, and cost attribution. Multiple export formats (JSON, Prometheus).
155
- - **Resilience & reliability** – Exponential backoff with jitter for retries, circuit breaker protection against cascading failures, automatic load shedding during overload, and zero-downtime deployments via graceful shutdown.
156
- - **Workspace awareness** – Local repo indexing, `CLAUDE.md` summaries, language-aware navigation, and Git helpers mirror core Claude Code workflows.
157
- - **Model Context Protocol (MCP) orchestration** – Automatically discovers MCP manifests, launches JSON-RPC 2.0 servers, and re-exposes their tools inside the proxy.
158
- - **Prompt caching** – Re-uses repeated prompts to reduce latency and token consumption, matching Claude's own cache semantics.
159
- - **Smart tool selection** – Intelligently filters tools based on request type (conversational, coding, research), reducing tool tokens by 50-70% for simple queries. Automatically enabled across all providers.
160
- - **Policy enforcement** – Environment-driven guardrails control Git operations, test requirements, web fetch fallbacks, and sandboxing rules. Input validation and consistent error handling ensure API reliability.
161
-
162
- The result is a production-ready, self-hosted alternative that stays close to Anthropic's ergonomics while providing enterprise-grade reliability, observability, and performance.
163
-
164
- > **Compatibility note:** Claude models hosted on Databricks work out of the box. Set `MODEL_PROVIDER=openai` to use OpenAI's API directly (GPT-4o, GPT-4o-mini, o1, etc.). Set `MODEL_PROVIDER=azure-anthropic` (and related credentials) to target the Azure-hosted Anthropic `/anthropic/v1/messages` endpoint. Set `MODEL_PROVIDER=openrouter` to access 100+ models through OpenRouter (GPT-4o, Claude, Gemini, etc.). Set `MODEL_PROVIDER=ollama` to use locally-running Ollama models (qwen2.5-coder, llama3, mistral, etc.).
165
-
166
- Further documentation and usage notes are available on [DeepWiki](https://deepwiki.com/vishalveerareddy123/Lynkr).
167
-
168
- ---
169
-
170
- ## Supported AI Model Providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Azure, llama.cpp)
171
-
172
- Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:
173
-
174
- ### **Provider Options**
175
-
176
- | Provider | Configuration | Models Available | Best For |
177
- |----------|--------------|------------------|----------|
178
- | **Databricks** (Default) | `MODEL_PROVIDER=databricks` | Claude Sonnet 4.5, Claude Opus 4.5 | Production use, enterprise deployment |
179
- | **AWS Bedrock** 🆕 | `MODEL_PROVIDER=bedrock` | 100+ models (Claude, DeepSeek R1, Qwen3, Nova, Titan, Llama, Mistral, etc.) | AWS ecosystem, multi-model flexibility, Claude + alternatives |
180
- | **OpenAI** | `MODEL_PROVIDER=openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, GPT-4-turbo, o1, o1-mini | Direct OpenAI API access |
181
- | **Azure OpenAI** | `MODEL_PROVIDER=azure-openai` | GPT-5, GPT-5.2, GPT-4o, GPT-4o-mini, o1, o3, Kimi-K2 | Azure integration, Microsoft ecosystem |
182
- | **Azure Anthropic** | `MODEL_PROVIDER=azure-anthropic` | Claude Sonnet 4.5, Claude Opus 4.5 | Azure-hosted Claude models |
183
- | **OpenRouter** | `MODEL_PROVIDER=openrouter` | 100+ models (GPT-4o, Claude, Gemini, Llama, etc.) | Model flexibility, cost optimization |
184
- | **Ollama** (Local) | `MODEL_PROVIDER=ollama` | Llama 3.1, Qwen2.5, Mistral, CodeLlama | Local/offline use, privacy, no API costs |
185
- | **llama.cpp** (Local) | `MODEL_PROVIDER=llamacpp` | Any GGUF model | Maximum performance, full model control |
186
-
187
- ### **Recommended Models by Use Case**
188
-
189
- #### **For Production Code Assistance**
190
- - **Best**: Claude Sonnet 4.5 (via Databricks or Azure Anthropic)
191
- - **Alternative**: GPT-4o (via Azure OpenAI or OpenRouter)
192
- - **Budget**: GPT-4o-mini (via Azure OpenAI) or Claude Haiku (via OpenRouter)
193
-
194
- #### **For Code Generation**
195
- - **Best**: Claude Opus 4.5 (via Databricks or Azure Anthropic)
196
- - **Alternative**: GPT-4o (via Azure OpenAI)
197
- - **Local**: Qwen2.5-Coder 32B (via Ollama)
198
-
199
- #### **For Fast Exploration**
200
- - **Best**: Claude Haiku (via OpenRouter or Azure Anthropic)
201
- - **Alternative**: GPT-4o-mini (via Azure OpenAI)
202
- - **Local**: Llama 3.1 8B (via Ollama)
203
-
204
- #### **For Cost Optimization**
205
- - **Cheapest Cloud**: Amazon Nova models (via OpenRouter) - free tier available
206
- - **Cheapest Local**: Ollama (any model) - completely free, runs on your hardware
207
-
208
- ### **Azure OpenAI Specific Models**
209
-
210
- When using `MODEL_PROVIDER=azure-openai`, you can deploy any of the models available in Azure AI Foundry.
211
-
212
-
213
- **Note**: Azure OpenAI deployment names are configurable via `AZURE_OPENAI_DEPLOYMENT` environment variable.
214
-
215
- ### **AWS Bedrock Model Catalog (100+ Models)**
216
-
217
- When using `MODEL_PROVIDER=bedrock`, you have access to **100+ models** via AWS Bedrock's unified Converse API:
218
-
219
- #### **🆕 NEW Models (2025-2026)**
220
- - **DeepSeek R1** - `us.deepseek.r1-v1:0` - Reasoning model (o1-style)
221
- - **Qwen3** - `qwen.qwen3-235b-*`, `qwen.qwen3-coder-480b-*` - Up to 480B parameters!
222
- - **OpenAI GPT-OSS** - `openai.gpt-oss-120b-1:0` - Open-weight GPT models
223
- - **Google Gemma 3** - `google.gemma-3-27b` - Open-weight from Google
224
- - **MiniMax M2** - `minimax.m2-v1:0` - Chinese AI company
225
-
226
- #### **Claude Models (Best for Tool Calling)**
227
- - **Claude 4.5** - `us.anthropic.claude-sonnet-4-5-*` - Best for coding with tools
228
- - **Claude 3.5** - `anthropic.claude-3-5-sonnet-*` - Excellent tool calling
229
- - **Claude 3 Haiku** - `anthropic.claude-3-haiku-*` - Fast and cost-effective
230
-
231
- #### **Amazon Models**
232
- - **Nova** - `us.amazon.nova-pro-v1:0` - Multimodal, 300K context
233
- - **Titan** - `amazon.titan-text-express-v1` - General purpose
234
-
235
- #### **Other Major Models**
236
- - **Meta Llama** - `meta.llama3-1-70b-*` - Open-source Llama 3.1
237
- - **Mistral** - `mistral.mistral-large-*` - Coding, multilingual
238
- - **Cohere** - `cohere.command-r-plus-v1:0` - RAG, search
239
- - **AI21 Jamba** - `ai21.jamba-1-5-large-v1:0` - 256K context
19
+ Lynkr is a **self-hosted proxy server** that unlocks Claude Code CLI and Cursor IDE by enabling:
240
20
 
241
- #### **Quick Setup**
242
- ```bash
243
- export MODEL_PROVIDER=bedrock
244
- export AWS_BEDROCK_API_KEY=your-bearer-token # Get from AWS Console → Bedrock → API Keys
245
- export AWS_BEDROCK_REGION=us-east-2
246
- export AWS_BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
247
- ```
248
-
249
- 📖 **Full Documentation**: See [BEDROCK_MODELS.md](BEDROCK_MODELS.md) for complete model catalog, pricing, capabilities, and use cases.
250
-
251
- ⚠ïļ **Tool Calling Note**: Only **Claude models** support tool calling on Bedrock. Other models work via Converse API but won't use Read/Write/Bash tools.
252
-
253
- ### **Ollama Model Recommendations**
254
-
255
- For tool calling support (required for Claude Code CLI functionality):
256
-
257
- ✅ **Recommended**:
258
- - `llama3.1:8b` - Good balance of speed and capability
259
- - `llama3.2` - Latest Llama model
260
- - `qwen2.5:14b` - Strong reasoning (larger model needed, 7b struggles with tools)
261
- - `mistral:7b-instruct` - Fast and capable
262
-
263
- ❌ **Not Recommended for Tools**:
264
- - `qwen2.5-coder` - Code-only, slow with tool calling
265
- - `codellama` - Code-only, poor tool support
266
-
267
- ### **Hybrid Routing (Ollama + Cloud Fallback)**
268
-
269
- Lynkr supports intelligent hybrid routing for cost optimization:
270
-
271
- ```bash
272
- # Use Ollama for simple tasks, fallback to cloud for complex ones
273
- PREFER_OLLAMA=true
274
- FALLBACK_ENABLED=true
275
- FALLBACK_PROVIDER=databricks # or azure-openai, openrouter, azure-anthropic
276
- ```
277
-
278
- **How it works**:
279
- - Requests with few/no tools → Ollama (free, local)
280
- - Requests with many tools → Cloud provider (more capable)
281
- - Ollama failures → Automatic fallback to cloud
282
-
283
- **Routing Logic**:
284
- - 0-2 tools: Ollama
285
- - 3-15 tools: OpenRouter or Azure OpenAI (if configured)
286
- - 16+ tools: Databricks or Azure Anthropic (most capable)
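The tool-count thresholds above can be sketched as a small routing function. This is illustrative only: the function name `pickProvider` and the config shape are hypothetical, not Lynkr's actual API; the thresholds and provider names come from the routing logic described above.

```javascript
// Illustrative sketch of the tool-count routing described above.
// Thresholds mirror the README (0-2 → Ollama, 3-15 → mid-tier,
// 16+ → most capable); names and config shape are hypothetical.
function pickProvider(toolCount, config) {
  if (config.preferOllama && toolCount <= 2) return "ollama";
  if (toolCount <= 15 && config.midTierProvider) return config.midTierProvider; // e.g. "openrouter"
  return config.fallbackProvider; // e.g. "databricks"
}

// Example: a request carrying 4 tools routes to the mid-tier provider.
const route = pickProvider(4, {
  preferOllama: true,
  midTierProvider: "openrouter",
  fallbackProvider: "databricks",
});
console.log(route); // → "openrouter"
```

In a real deployment the failure path matters as much as the happy path: when the Ollama call errors out, the request is retried against the configured fallback provider.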
21
+ - 🚀 **Any LLM Provider** - Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Ollama (local), llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio
22
+ - 💰 **60-80% Cost Reduction** - Built-in token optimization with smart tool selection, prompt caching, and memory deduplication
23
+ - 🔒 **100% Local/Private** - Run completely offline with Ollama or llama.cpp
24
+ - 🎯 **Zero Code Changes** - Drop-in replacement for Anthropic's backend
25
+ - 🏢 **Enterprise-Ready** - Circuit breakers, load shedding, Prometheus metrics, health checks
287
26
 
288
- ### **Provider Comparison**
289
-
290
- | Feature | Databricks | AWS Bedrock | OpenAI | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama | llama.cpp |
291
- |---------|-----------|-------------|--------|--------------|-----------------|------------|--------|-----------|
292
- | **Setup Complexity** | Medium | Easy | Easy | Medium | Medium | Easy | Easy | Medium |
293
- | **Cost** | $$$ | $$ | $$ | $$ | $$$ | $ | Free | Free |
294
- | **Latency** | Low | Low | Low | Low | Low | Medium | Very Low | Very Low |
295
- | **Model Variety** | 2 | 100+ | 10+ | 10+ | 2 | 100+ | 50+ | Unlimited |
296
- | **Tool Calling** | Excellent | Excellent* | Excellent | Excellent | Excellent | Good | Fair | Good |
297
- | **Context Length** | 200K | Up to 300K | 128K | 128K | 200K | Varies | 32K-128K | Model-dependent |
298
- | **Streaming** | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
299
- | **Privacy** | Enterprise | Enterprise | Third-party | Enterprise | Enterprise | Third-party | Local | Local |
300
- | **Offline** | No | No | No | No | No | No | Yes | Yes |
301
-
302
- _* Tool calling only supported by Claude models on Bedrock_
303
-
304
- ---
305
-
306
- ## Lynkr vs Native Claude Code
307
-
308
- **Feature Comparison for Developers and Enterprises**
309
-
310
- | Feature | Native Claude Code | Lynkr (This Project) |
311
- |---------|-------------------|----------------------|
312
- | **Provider Lock-in** | ❌ Anthropic only | ✅ 8+ providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Azure OpenAI, Azure Anthropic, OpenAI, llama.cpp) |
313
- | **Token Costs** | 💸 Full price | ✅ **60-80% savings** (built-in optimization) |
314
- | **Local Models** | ❌ Cloud-only | ✅ **Ollama, llama.cpp** (offline support) |
315
- | **Self-Hosted** | ❌ Managed service | ✅ **Full control** (open-source) |
316
- | **MCP Support** | Limited | ✅ **Full orchestration** with auto-discovery |
317
- | **Prompt Caching** | Basic | ✅ **Advanced caching** with deduplication |
318
- | **Token Optimization** | ❌ None | ✅ **6 phases** (smart tool selection, history compression, tool truncation, dynamic prompts) |
319
- | **Enterprise Features** | Limited | ✅ **Circuit breakers, load shedding, metrics, K8s-ready** |
320
- | **Privacy** | ☁️ Cloud-dependent | ✅ **Self-hosted** (air-gapped deployments possible) |
321
- | **Cost Transparency** | Hidden usage | ✅ **Full tracking** (per-request, per-session, Prometheus metrics) |
322
- | **Hybrid Routing** | ❌ Not supported | ✅ **Automatic** (simple → Ollama, complex → Databricks) |
323
- | **Health Checks** | ❌ N/A | ✅ **Kubernetes-ready** (liveness, readiness, startup probes) |
324
- | **License** | Proprietary | ✅ **Apache 2.0** (open-source) |
325
-
326
- ### Cost Comparison Example
327
-
328
- **Scenario:** 100,000 API requests/month, average 50k input tokens, 2k output tokens per request
329
-
330
- | Provider | Without Lynkr | With Lynkr (60% savings) | Monthly Savings |
331
- |----------|---------------|-------------------------|-----------------|
332
- | **Claude Sonnet 4.5** (via Databricks) | $16,000 | $6,400 | **$9,600** |
333
- | **GPT-4o** (via OpenRouter) | $12,000 | $4,800 | **$7,200** |
334
- | **Ollama (Local)** | API costs + compute | Local compute only | **$12,000+** |
335
-
336
- ### Why Choose Lynkr?
337
-
338
- **For Developers:**
339
- - 🆓 Use free local models (Ollama) for development
340
- - 🔧 Switch providers without code changes
341
- - 🚀 Faster iteration with local models
342
-
343
- **For Enterprises:**
344
- - 💰 Massive cost savings (ROI: $77k-115k/year)
345
- - ðŸĒ Self-hosted = data stays private
346
- - 📊 Full observability and metrics
347
- - ðŸ›Ąïļ Production-ready reliability features
348
-
349
- **For Privacy-Focused Teams:**
350
- - 🔒 Air-gapped deployments possible
351
- - 🏠 All data stays on-premises
352
- - 🔐 No third-party API calls required
353
-
354
- ---
355
-
356
- ## 🚀 Ready to Get Started?
357
-
358
- **Reduce your Claude Code costs by 60-80% in under 3 minutes:**
359
-
360
- 1. ⭐ **[Star this repo](https://github.com/vishalveerareddy123/Lynkr)** to show support and stay updated
361
- 2. 📖 **[Follow the Quick Start Guide](#quick-start-3-minutes)** to install and configure Lynkr
362
- 3. 💎 **[Join our Discord](https://discord.gg/qF7DDxrX)** for real-time community support
363
- 4. 💎 **[Join the Discussion](https://github.com/vishalveerareddy123/Lynkr/discussions)** for questions and ideas
364
- 5. 🐛 **[Report Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** to help improve Lynkr
27
+ **Perfect for:**
28
+ - Developers who want provider flexibility and cost control
29
+ - Enterprises needing self-hosted AI with observability
30
+ - Privacy-focused teams requiring local model execution
31
+ - Teams seeking 60-80% cost reduction through optimization
365
32
 
366
33
  ---
367
34
 
368
- ## Core Capabilities
369
-
370
- ### Long-Term Memory System (Titans-Inspired)
371
-
372
- **NEW:** Lynkr now includes a comprehensive long-term memory system inspired by Google's Titans architecture, enabling persistent context across conversations and intelligent memory management.
373
-
374
- **Key Features:**
375
- - 🧠 **Surprise-Based Memory Updates** – Automatically extracts and stores only important, novel, or surprising information from conversations using a 5-factor heuristic scoring system (novelty, contradiction, specificity, emphasis, context switch).
376
- - 🔍 **FTS5 Semantic Search** – Full-text search with Porter stemmer and keyword expansion for finding relevant memories.
377
- - 📊 **Multi-Signal Retrieval** – Ranks memories using recency (30%), importance (40%), and relevance (30%) for optimal context injection.
378
- - ⚡ **Automatic Integration** – Memories are extracted after each response and injected before model calls with zero latency overhead (<50ms retrieval, <100ms async extraction).
379
- - 🎯 **5 Memory Types** – Tracks preferences, decisions, facts, entities, and relationships.
381
- - 🛠️ **Management Tools** – `memory_search`, `memory_add`, `memory_forget`, `memory_stats` for explicit control.
381
-
382
- **Quick Start:**
383
- ```bash
384
- # Memory system is enabled by default - just use Lynkr!
385
- # Test it:
386
- # 1. Say: "I prefer Python for data processing"
387
- # 2. Later ask: "What language should I use for data tasks?"
388
- # → Model will remember your preference and recommend Python
389
- ```
390
-
391
- **Configuration:**
392
- ```env
393
- MEMORY_ENABLED=true # Enable/disable (default: true)
394
- MEMORY_RETRIEVAL_LIMIT=5 # Memories per request (default: 5)
395
- MEMORY_SURPRISE_THRESHOLD=0.3 # Min score to store (default: 0.3)
396
- MEMORY_MAX_AGE_DAYS=90 # Auto-prune age (default: 90)
397
- MEMORY_MAX_COUNT=10000 # Max memories (default: 10000)
398
- ```
399
-
400
- **What Gets Remembered:**
401
- - ✅ User preferences ("I prefer X")
402
- - ✅ Important decisions ("Decided to use Y")
403
- - ✅ Project facts ("This app uses Z")
404
- - ✅ New entities (first mentions of files, functions)
405
- - ✅ Contradictions ("Actually, A not B")
406
- - ❌ Greetings, confirmations, repeated info (filtered by surprise threshold)
407
-
408
- **Benefits:**
409
- - 🎯 **Better context understanding** across sessions
410
- - 💾 **Persistent knowledge** stored in SQLite
411
- - 🚀 **Zero performance impact** (<50ms retrieval, async extraction)
412
- - 🔒 **Privacy-preserving** (all local, no external APIs)
413
- - 📈 **Scales efficiently** (supports 10K+ memories)
35
+ ## 💰 Cost Savings
414
36
 
415
- See [MEMORY_SYSTEM.md](MEMORY_SYSTEM.md) for complete documentation and [QUICKSTART_MEMORY.md](QUICKSTART_MEMORY.md) for usage examples.
37
+ Lynkr reduces AI costs by **60-80%** through intelligent token optimization:
416
38
 
417
- ### Repo Intelligence & Navigation
39
+ ### Real-World Savings Example
418
40
 
419
- - Fast indexer builds a lightweight SQLite catalog of files, symbols, references, and framework hints.
420
- - `CLAUDE.md` summary highlights language mix, frameworks, lint configs, and dependency signals.
421
- - Symbol search and reference lookups return definition sites and cross-file usage for supported languages (TypeScript/JavaScript/Python via Tree-sitter parsers) with heuristic fallbacks for others.
422
- - Automatic invalidation ensures removed files disappear from search results after `workspace_index_rebuild`.
41
+ **Scenario:** 100,000 API requests/month, 50k input tokens, 2k output tokens per request
423
42
 
424
- ### Git Workflow Enhancements
43
+ | Provider | Without Lynkr | With Lynkr | **Monthly Savings** | **Annual Savings** |
44
+ |----------|---------------|------------|---------------------|-------------------|
45
+ | **Claude Sonnet 4.5** (Databricks) | $16,000 | $6,400 | **$9,600** | **$115,200** |
46
+ | **GPT-4o** (OpenRouter) | $12,000 | $4,800 | **$7,200** | **$86,400** |
47
+ | **Ollama** (Local) | API costs | **$0** | **$12,000+** | **$144,000+** |
425
48
 
426
- - Git status, diff, stage, commit, push, and pull tooling via `src/tools/git.js`.
427
- - Policy flags such as `POLICY_GIT_ALLOW_PUSH` and `POLICY_GIT_REQUIRE_TESTS` enforce push restrictions or test gating.
428
- - Diff review endpoints summarise changes and highlight risks, feeding the AI review surface.
429
- - Release note generator composes summarized change logs for downstream publishing.
49
+ ### How We Achieve 60-80% Cost Reduction
430
50
 
431
- ### Diff & Change Management
51
+ **6 Token Optimization Phases:**
432
52
 
433
- - Unified diff summaries with optional AI review (`workspace_diff_review`).
434
- - Release note synthesis from Git history.
435
- - Test harness integrates with git policies to ensure guarding before commit/push events.
436
- - (Planned) Per-file threaded reviews and automated risk estimation (see [Roadmap](#roadmap--known-gaps)).
53
+ 1. **Smart Tool Selection** (50-70% reduction)
54
+ - Filters tools based on request type
55
+ - Chat queries don't get file/git tools
56
+ - Only sends relevant tools to model
437
57
 
438
- ### Execution & Tooling
58
+ 2. **Prompt Caching** (30-45% reduction)
59
+ - Caches repeated prompts and system messages
60
+ - Reuses context across conversations
61
+ - Reduces redundant token usage
439
62
 
440
- - **Flexible tool execution modes**: Configure where tools execute via `TOOL_EXECUTION_MODE`:
441
- - `server` (default) – Tools run on the proxy server where Lynkr is hosted
442
- - `client`/`passthrough` – Tools execute on the Claude Code CLI side, enabling local file operations and commands on the client machine
443
- - **Client-side tool execution** – When in passthrough mode, the proxy returns Anthropic-formatted `tool_use` blocks to the CLI, which executes them locally and sends back `tool_result` blocks. This enables:
444
- - File operations on the CLI user's local filesystem
445
- - Local command execution in the user's environment
446
- - Access to local credentials and SSH keys
447
- - Integration with local development tools
448
- - Tool execution pipeline sandboxes or runs tools in the host workspace based on policy (server mode).
449
- - MCP sandbox orchestration (Docker runtime by default) optionally isolates external tools with mount and permission controls.
450
- - Automated testing harness exposes `workspace_test_run`, `workspace_test_history`, and `workspace_test_summary`.
451
- - Prompt caching reduces repeated token usage for iterative conversations.
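In passthrough mode the round trip looks roughly like this. The block shapes follow Anthropic's Messages API (`tool_use` / `tool_result` content blocks); the `id` value and tool payload are made-up examples.

```javascript
// What the proxy returns to the CLI in passthrough mode: an assistant
// turn containing a tool_use block (Anthropic Messages API shape).
const assistantTurn = {
  role: "assistant",
  content: [
    { type: "tool_use", id: "toolu_example", name: "Read", input: { file_path: "src/index.js" } },
  ],
};

// After executing the tool locally, the CLI replies with a tool_result
// block that references the tool_use id.
const clientReply = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_example", content: "console.log('hi')" },
  ],
};

console.log(clientReply.content[0].tool_use_id === assistantTurn.content[0].id); // → true
```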
63
+ 3. **Memory Deduplication** (20-30% reduction)
64
+ - Removes duplicate conversation context
65
+ - Compresses historical messages
66
+ - Eliminates redundant information
452
67
 
453
- ### Workflow & Collaboration
68
+ 4. **Tool Response Truncation** (15-25% reduction)
69
+ - Truncates long tool outputs intelligently
70
+ - Keeps only relevant portions
71
+ - Reduces tool result tokens
454
72
 
455
- - Lightweight task tracker (`workspace_task_*` tools) persists TODO items in SQLite.
456
- - Session database (`data/sessions.db`) stores conversational transcripts for auditing.
457
- - Policy web fallback fetches limited remote data when explicitly permitted.
73
+ 5. **Dynamic System Prompts** (10-20% reduction)
74
+ - Adapts prompts to request complexity
75
+ - Shorter prompts for simple queries
76
+ - Full prompts only when needed
458
77
 
459
- ### UX, Monitoring, and Logs
78
+ 6. **Conversation Compression** (15-25% reduction)
79
+ - Summarizes old conversation turns
80
+ - Keeps recent context detailed
81
+ - Archives historical context
460
82
 
461
- - Pino-based structured logs with timestamps and severity.
462
- - Request/response logging for Databricks interactions (visible in stdout).
463
- - Session appenders log every user, assistant, and tool turn for reproducibility.
464
- - Metrics directory ready for future Prometheus/StatsD integration.
83
+ 📖 **[Detailed Token Optimization Guide](documentation/token-optimization.md)**
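For intuition, the tool response truncation phase above can be sketched as a head/tail trim. This is a minimal illustration only: the function name, character budget, and 70/30 head/tail split are assumptions, not Lynkr's actual implementation.

```javascript
// Hypothetical sketch of head/tail truncation for long tool outputs.
// The 70/30 split and character budget are illustrative assumptions.
function truncateToolResult(text, maxChars = 4000, marker = "\n...[truncated]...\n") {
  if (text.length <= maxChars) return text; // short outputs pass through untouched
  const keep = maxChars - marker.length;
  const head = Math.ceil(keep * 0.7); // bias toward the start (command + context)
  const tail = keep - head;           // keep the end too (final results, errors)
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}
```

Keeping both ends matters for tool output: the beginning usually names the command and its context, while the end carries the final result or error.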
465
84
 
466
85
  ---
467
86
 
468
- ## Production-Ready Features for Enterprise Deployment
469
-
470
- Lynkr includes comprehensive production-hardened features designed for reliability, observability, and security in enterprise environments. These features add minimal performance overhead while providing robust operational capabilities for mission-critical AI deployments.
471
-
472
- ### Reliability & Resilience
473
-
474
- #### **Exponential Backoff with Jitter**
475
- - Automatic retry logic for transient failures
476
- - Configurable retry attempts (default: 3), initial delay (1s), and max delay (30s)
477
- - Jitter prevents thundering herd problems during outages
478
- - Intelligent retry logic distinguishes retryable errors (5xx, network timeouts) from permanent failures (4xx)
479
-
480
- #### **Circuit Breaker Pattern**
481
- - Protects against cascading failures to external services (Databricks, Azure Anthropic)
482
- - Three states: CLOSED (normal), OPEN (failing fast), HALF_OPEN (testing recovery)
483
- - Configurable failure threshold (default: 5) and success threshold (default: 2)
484
- - Per-provider circuit breaker instances with independent state tracking
485
- - Automatic recovery attempts after timeout period (default: 60s)
486
-
487
- #### **Load Shedding**
488
- - Proactive request rejection when system is overloaded
489
- - Monitors heap usage (90% threshold), total memory (85% threshold), and active request count (1000 threshold)
490
- - Returns HTTP 503 with Retry-After header during overload
491
- - Cached overload state (1s cache) minimizes performance impact
492
- - Graceful degradation prevents complete system failure
493
-
494
- #### **Graceful Shutdown**
495
- - SIGTERM/SIGINT signal handling for zero-downtime deployments
496
- - Health check endpoints immediately return "not ready" during shutdown
497
- - Connections drain with configurable timeout (default: 30s)
498
- - Database connections and resources cleanly closed
499
- - Kubernetes-friendly shutdown sequence
500
-
501
- #### **HTTP Connection Pooling**
502
- - Keep-alive connections reduce latency and connection overhead
503
- - Configurable socket pools (50 max sockets, 10 free sockets)
504
- - Separate HTTP and HTTPS agents with optimized settings
505
- - Connection timeouts (60s) and keep-alive intervals (30s)
506
-
507
- ### Observability & Monitoring
508
-
509
- #### **Metrics Collection**
510
- - High-performance in-memory metrics with minimal overhead (0.2ms per operation)
511
- - Request counts, error rates, latency percentiles (p50, p95, p99)
512
- - Token usage tracking (input/output tokens) and cost estimation
513
- - Databricks API metrics (success/failure rates, retry counts)
514
- - Circuit breaker state tracking per provider
515
-
516
- #### **Metrics Export Formats**
517
- - **JSON endpoint** (`/metrics/observability`): Human-readable metrics for dashboards
518
- - **Prometheus endpoint** (`/metrics/prometheus`): Industry-standard format for Prometheus scraping
519
- - **Circuit breaker endpoint** (`/metrics/circuit-breakers`): Real-time circuit breaker state
520
-
521
- #### **Health Check Endpoints**
522
- - **Liveness probe** (`/health/live`): Basic process health for Kubernetes
523
- - **Readiness probe** (`/health/ready`): Comprehensive dependency checks
524
- - Database connectivity and responsiveness
525
- - Memory usage within acceptable limits
526
- - Shutdown state detection
527
- - Returns detailed health status with per-dependency breakdown
528
-
529
- #### **Structured Request Logging**
530
- - Request ID correlation across distributed systems (X-Request-ID header)
531
- - Automatic request ID generation when not provided
532
- - Structured JSON logs with request context (method, path, IP, user agent)
533
- - Request/response timing and outcome logging
534
- - Error context preservation for debugging
535
-
536
- ### Security & Governance
537
-
538
- #### **Input Validation**
539
- - Zero-dependency JSON schema-like validation
540
- - Type checking (string, number, boolean, array, object)
541
- - Range validation (min/max length, min/max value, array size limits)
542
- - Enum validation and pattern matching
543
- - Nested object validation with detailed error reporting
544
- - Request body size limits and sanitization
545
-
546
- #### **Error Handling**
547
- - Consistent error response format across all endpoints
548
- - Operational vs non-operational error classification
549
- - 8 predefined error types (validation, authentication, authorization, not found, rate limit, external API, database, internal)
550
- - User-friendly error messages (stack traces only in development)
551
- - Request ID in all error responses for traceability
552
-
553
- #### **Path Allowlisting & Sandboxing**
554
- - Configurable filesystem path restrictions
555
- - Command execution sandboxing (Docker runtime support)
556
- - MCP tool isolation with permission controls
557
- - Environment variable filtering and secrets protection
558
-
559
- #### **Rate Limiting & Budget Enforcement**
560
- - Token budget tracking per session
561
- - Configurable budget limits and enforcement policies
562
- - Cost tracking and budget exhaustion handling
563
- - Request-level cost attribution
564
-
565
-
566
-
567
- ## Architecture
568
-
569
- ```
570
- ┌────────────────────┐ ┌───────────────────────────────────────────┐
571
- │ Claude Code CLI │──HTTP│ Claude Code Proxy (Express API Gateway) │
572
- │ (or Claude client) │ │ ┌─────────────────────────────────────┐ │
573
- └────────────────────┘ │ │ Production Middleware Stack │ │
574
-                             │ │ • Load Shedding (503 on overload)   │   │
575
-                             │ │ • Request Logging (Request IDs)     │   │
576
-                             │ │ • Metrics Collection (Prometheus)   │   │
577
-                             │ │ • Input Validation (JSON schema)    │   │
578
-                             │ │ • Error Handling (Consistent format)│   │
579
- │ └─────────────────────────────────────┘ │
580
-                             └──────────┬────────────────────────────────┘
581
- │
582
-         ┌──────────────────────────────┼──────────────────────────────────┐
583
- │ │ │
584
- ┌───────▾───────┐ ┌───────▾────────┐ ┌─────────▾────────┐
585
- │ Orchestrator │ │ Prompt Cache │ │ Session Store │
586
- │ (agent loop) │ │ (LRU + TTL) │ │ (SQLite) │
587
- └───────┬───────┘    └────────────────┘    └──────────────────┘
588
- │
589
- │ ┌─────────────────────────────────────────────────────────────┐
590
- │ │ Health Checks & Metrics Endpoints │
591
-         │  │ • /health/live - Kubernetes liveness probe                  │
592
-         └──│ • /health/ready - Readiness with dependency checks          │
593
-            │ • /metrics/observability - JSON metrics                     │
594
-            │ • /metrics/prometheus - Prometheus format                   │
595
-            │ • /metrics/circuit-breakers - Circuit breaker state         │
596
- └─────────────────────────────────────────────────────────────┘
597
- │
598
-         ┌──────────────────────────────┼──────────────────────────────────┐
599
- │ │ │
600
- ┌───────▾────────────────────────────┐ │ ┌──────────────────────────────▾──┐
601
- │ Tool Registry & Policy Engine │ │ │ Indexer / Repo Intelligence │
602
- │ (workspace, git, diff, MCP tools) │ │ │ (SQLite catalog + CLAUDE.md) │
603
- └───────┬────────────────────────────┘          │          └─────────────────────────────────┘
604
- │ │
605
- │ ┌──────────▾─────────────────────┐
606
- │ │ Observability & Resilience │
607
-         │           │ • MetricsCollector (in-memory)   │
608
-         │           │ • Circuit Breakers (per-provider)│
609
-         │           │ • Load Shedder (resource monitor)│
610
-         │           │ • Shutdown Manager (graceful)    │
611
-         │           └──────────┬─────────────────────┘
612
- │ │
613
- ┌───────▾────────┐ ┌─────────▾──────────────────────┐ ┌──────────────┐
614
- │ MCP Registry │ │ Provider Adapters │ │ Sandbox │
615
- │ (manifest ->   │──RPC─────│ • Databricks (circuit-breaker) │──┐  │ Runtime      │
616
- │ JSON-RPC client│          │ • Azure Anthropic (retry logic)│  │  │ (Docker)     │
617
- └────────────────┘          │ • OpenRouter (100+ models)     │  │  └──────────────┘
618
-                             │ • Ollama (local models)        │  │
619
-                             │ • HTTP Connection Pooling      │  │
620
-                             │ • Exponential Backoff + Jitter │  │
621
-                             └────────────┬───────────────────┘  │
622
- │ │
623
-         ┌────────────────┼─────────────────┐                    │
624
- │ │ │ │
625
- ┌─────────▾────────┐ ┌───▾────────┐ ┌─────▾─────────┐
626
- │ Databricks │ │ Azure │ │ OpenRouter API│
627
- │ Serving Endpoint │ │ Anthropic │ │ (GPT-4o, etc.)│
628
- │ (REST) │ │ /anthropic │ └───────────────┘
629
- └──────────────────┘ │ /v1/messages│ ┌──────────────┐
630
- └────────────┘ │ Ollama API │───────┘
631
- │ │ (localhost) │
632
- ┌────────▾──────────│ qwen2.5-coder│
633
- │ External MCP tools└──────────────┘
634
- │ (GitHub, Jira) │
635
- └───────────────────┘
636
- ```
637
-
638
- ### Request Flow Diagram
639
-
640
- ```mermaid
641
- graph TB
642
- A[Claude Code CLI] -->|HTTP POST /v1/messages| B[Lynkr Proxy Server]
643
- B --> C{Middleware Stack}
644
- C -->|Load Shedding| D{Load OK?}
645
- D -->|Yes| E[Request Logging]
646
- D -->|No| Z1[503 Service Unavailable]
647
- E --> F[Metrics Collection]
648
- F --> G[Input Validation]
649
- G --> H[Orchestrator]
650
-
651
- H --> I{Check Prompt Cache}
652
- I -->|Cache Hit| J[Return Cached Response]
653
- I -->|Cache Miss| K{Determine Provider}
654
-
655
- K -->|Simple 0-2 tools| L[Ollama Local]
656
- K -->|Moderate 3-14 tools| M[OpenRouter / Azure]
657
- K -->|Complex 15+ tools| N[Databricks]
658
-
659
- L --> O[Circuit Breaker Check]
660
- M --> O
661
- N --> O
662
-
663
- O -->|Closed| P{Provider API}
664
- O -->|Open| Z2[Fallback Provider]
665
-
666
- P -->|Databricks| Q1[Databricks API]
667
- P -->|OpenRouter| Q2[OpenRouter API]
668
- P -->|Ollama| Q3[Ollama Local]
669
- P -->|Azure| Q4[Azure Anthropic API]
670
-
671
- Q1 --> R[Response Processing]
672
- Q2 --> R
673
- Q3 --> R
674
- Q4 --> R
675
- Z2 --> R
676
-
677
- R --> S[Format Conversion]
678
- S --> T[Cache Response]
679
- T --> U[Update Metrics]
680
- U --> V[Return to Client]
681
- J --> V
682
-
683
- style B fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
684
- style H fill:#7b68ee,stroke:#333,stroke-width:2px,color:#fff
685
- style K fill:#f39c12,stroke:#333,stroke-width:2px
686
- style P fill:#2ecc71,stroke:#333,stroke-width:2px,color:#fff
687
- ```
688
-
689
- **Key Components:**
690
-
691
- - **`src/api/router.js`** – Express routes that accept Claude-compatible `/v1/messages` requests.
692
- - **`src/api/middleware/*`** – Production middleware stack:
693
- - `load-shedding.js` – Proactive overload protection with resource monitoring
694
- - `request-logging.js` – Structured logging with request ID correlation
695
- - `metrics.js` – High-performance metrics collection middleware
696
- - `validation.js` – Zero-dependency input validation
697
- - `error-handling.js` – Consistent error response formatting
698
- - **`src/api/health.js`** – Kubernetes-ready liveness and readiness probes
699
- - **`src/orchestrator/index.js`** – Agent loop handling model invocation, tool execution, prompt caching, and policy enforcement.
700
- - **`src/cache/prompt.js`** – LRU cache implementation with SHA-256 keying and TTL eviction.
701
- - **`src/observability/metrics.js`** – In-memory metrics collector with Prometheus export
702
- - **`src/clients/circuit-breaker.js`** – Circuit breaker implementation for external service protection
703
- - **`src/clients/retry.js`** – Exponential backoff with jitter for transient failure handling
704
- - **`src/server/shutdown.js`** – Graceful shutdown manager for zero-downtime deployments
705
- - **`src/mcp/*`** – Manifest discovery, JSON-RPC 2.0 client, and dynamic tool registration for MCP servers.
706
- - **`src/tools/*`** – Built-in workspace, git, diff, testing, task, and MCP bridging tools.
707
- - **`src/indexer/index.js`** – File crawler and metadata extractor that persists into SQLite and regenerates `CLAUDE.md`.
87
+ ## 🚀 Key Features
88
+
89
+ ### Multi-Provider Support (9+ Providers)
90
+ - ✅ **Cloud Providers:** Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
91
+ - ✅ **Local Providers:** Ollama (free), llama.cpp (free), LM Studio (free)
92
+ - ✅ **Hybrid Routing:** Automatically route between local (fast/free) and cloud (powerful) based on complexity
93
+ - ✅ **Automatic Fallback:** Transparent failover if primary provider is unavailable
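A rough sketch of what complexity-based routing can look like. The tool-count tiers mirror the simple/moderate/complex split described in this README; the function name and token threshold are illustrative assumptions, not Lynkr's exact policy.

```javascript
// Illustrative complexity-based router: cheap local model for simple
// requests, cloud providers as complexity grows. Thresholds are assumptions.
function pickProvider({ toolCount = 0, promptTokens = 0 } = {}) {
  if (toolCount <= 2 && promptTokens < 2000) return "ollama";  // fast + free, local
  if (toolCount <= 14) return "openrouter";                    // moderate complexity
  return "databricks";                                         // heavy agentic requests
}
```

Combined with automatic fallback, a routing decision like this lets simple requests stay free and local while complex agentic work still reaches the most capable cloud models.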
94
+
95
+ ### Cost Optimization
96
+ - 💰 **60-80% Token Reduction** - 6-phase optimization pipeline
97
+ - 💰 **$77k-$115k Annual Savings** - For typical enterprise usage (100k requests/month)
98
+ - 💰 **100% FREE Option** - Run completely locally with Ollama or llama.cpp
99
+ - 💰 **Hybrid Routing** - 65-100% cost savings by using local models for simple requests
100
+
101
+ ### Privacy & Security
102
+ - 🔒 **100% Local Operation** - Run completely offline with Ollama/llama.cpp
103
+ - 🔒 **Air-Gapped Deployments** - No internet required for local providers
104
+ - 🔒 **Self-Hosted** - Full control over your data and infrastructure
105
+ - 🔒 **Local Embeddings** - Private @Codebase search with Ollama/llama.cpp
106
+ - 🔐 **Policy Enforcement** - Git restrictions, test requirements, web fetch controls
107
+ - 🔐 **Sandboxing** - Optional Docker isolation for MCP tools
108
+
109
+ ### Enterprise Features
110
+ - ðŸĒ **Production-Ready** - Circuit breakers, load shedding, graceful shutdown
111
+ - ðŸĒ **Observability** - Prometheus metrics, structured logging, health checks
112
+ - ðŸĒ **Kubernetes-Ready** - Liveness, readiness, startup probes
113
+ - ðŸĒ **High Performance** - ~7Ξs overhead, 140K req/sec throughput
114
+ - ðŸĒ **Reliability** - Exponential backoff, automatic retries, error resilience
115
+ - ðŸĒ **Scalability** - Horizontal scaling, connection pooling, load balancing
116
+
117
+ ### IDE Integration
118
+ - ✅ **Claude Code CLI** - Drop-in replacement for Anthropic backend
119
+ - ✅ **Cursor IDE** - Full OpenAI API compatibility (Requires Cursor Pro)
120
+ - ✅ **Continue.dev** - Works with any OpenAI-compatible client
121
+ - ✅ **Cline + VSCode** - Configure it like Cursor, using the OpenAI-compatible settings
122
+
123
+ ### Advanced Capabilities
124
+ - 🧠 **Long-Term Memory** - Titans-inspired memory system with surprise-based filtering
125
+ - 🧠 **Semantic Memory** - FTS5 search with multi-signal retrieval (recency, importance, relevance)
126
+ - 🧠 **Automatic Extraction** - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
127
+ - 🔧 **MCP Integration** - Automatic Model Context Protocol server discovery
128
+ - 🔧 **Tool Calling** - Full tool support with server and client execution modes
129
+ - 🔧 **Custom Tools** - Easy integration of custom tool implementations
130
+ - 🔍 **Embeddings Support** - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
131
+ - 📊 **Token Tracking** - Real-time usage monitoring and cost attribution
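The multi-signal memory retrieval above can be sketched as a weighted score over recency, importance, and relevance. The weights and 30-day decay constant below are invented for illustration; they are not Lynkr's actual parameters.

```javascript
// Sketch of multi-signal memory ranking. Weights and decay are assumptions.
function scoreMemory(m, now = Date.now()) {
  const ageDays = (now - m.createdAt) / 86_400_000;
  const recency = Math.exp(-ageDays / 30); // newer memories score higher
  return 0.3 * recency + 0.3 * m.importance + 0.4 * m.relevance;
}

function topMemories(memories, k = 3, now = Date.now()) {
  return [...memories]
    .sort((a, b) => scoreMemory(b, now) - scoreMemory(a, now))
    .slice(0, k);
}
```

Blending signals this way keeps a highly relevant but older memory competitive with a recent but marginal one, instead of ranking on any single dimension.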
132
+
133
+ ### Developer Experience
134
+ - 🎯 **Zero Code Changes** - Works with existing Claude Code CLI/Cursor setups
135
+ - 🎯 **Hot Reload** - Development mode with auto-restart
136
+ - 🎯 **Comprehensive Logging** - Structured logs with request ID correlation
137
+ - 🎯 **Easy Configuration** - Environment variables or .env file
138
+ - 🎯 **Docker Support** - docker-compose with GPU support
139
+ - 🎯 **400+ Tests** - Comprehensive test coverage for reliability
140
+
141
+ ### Streaming & Performance
142
+ - ⚡ **Real-Time Streaming** - Token-by-token streaming for all providers
143
+ - ⚡ **Low Latency** - Minimal overhead (~7μs per request)
144
+ - ⚡ **High Throughput** - 140K requests/second capacity
145
+ - ⚡ **Connection Pooling** - Efficient connection reuse
146
+ - ⚡ **Prompt Caching** - LRU cache with SHA-256 keying
147
+
148
+ 📖 **[Complete Feature Documentation](documentation/features.md)**
708
149
 
709
150
  ---
710
151
 
711
- ## Getting Started: Installation & Setup Guide
712
-
713
- ### Prerequisites
714
-
715
- - **Node.js 18+** (required for the global `fetch` API).
716
- - **npm** (bundled with Node).
717
- - **Databricks account** with a Claude-compatible serving endpoint (e.g., `databricks-claude-sonnet-4-5`).
718
- - Optional: **Docker** for MCP sandboxing and tool isolation.
719
- - Optional: **Claude Code CLI** (latest release). Configure it to target the proxy URL instead of api.anthropic.com.
152
+ ## Quick Start
720
153
 
721
154
  ### Installation
722
155
 
723
- Lynkr offers multiple installation methods to fit your workflow:
724
-
725
- #### Quick Install (curl)
726
-
727
- ```bash
728
- curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash
729
- ```
730
-
731
- This will:
732
- - Clone Lynkr to `~/.lynkr`
733
- - Install dependencies
734
- - Create a default `.env` file
735
- - Set up the `lynkr` command
736
-
737
- **Custom installation directory:**
738
- ```bash
739
- curl -fsSL https://raw.githubusercontent.com/vishalveerareddy123/Lynkr/main/install.sh | bash -s -- --dir /opt/lynkr
740
- ```
741
-
742
- #### Option 1: Simple Databricks Setup (Quickest)
743
-
744
- **No Ollama needed** - Just use Databricks APIs directly:
745
-
746
- ```bash
747
- # Install Lynkr
748
- npm install -g lynkr
749
-
750
- # Configure Databricks credentials
751
- export DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
752
- export DATABRICKS_API_KEY=dapi1234567890abcdef
753
-
754
- # Start Lynkr
755
- lynkr
756
- ```
757
-
758
- That's it! Lynkr will use Databricks Claude models for all requests.
759
-
760
- **Or use a .env file:**
761
- ```env
762
- MODEL_PROVIDER=databricks
763
- DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
764
- DATABRICKS_API_KEY=dapi1234567890abcdef
765
- PORT=8080
766
- ```
767
-
768
- #### Option 2: Hybrid Setup with Ollama (Cost Savings)
769
-
770
- For 40% faster responses and 65% cost savings on simple requests:
771
-
156
+ **Option 1: NPM Package (Recommended)**
772
157
  ```bash
773
- # Install Lynkr
158
+ # Install globally
774
159
  npm install -g lynkr
775
160
 
776
- # Run setup wizard (installs Ollama + downloads model)
777
- lynkr-setup
778
-
779
- # Start Lynkr
780
- lynkr
161
+ # Or run directly with npx
162
+ npx lynkr
781
163
  ```
782
164
 
783
- **The `lynkr-setup` wizard will:**
784
- - ✅ Check if Ollama is installed (auto-installs if missing on macOS/Linux)
785
- - ✅ Start Ollama service
786
- - ✅ Download qwen2.5-coder model (~4.7GB)
787
- - ✅ Create `.env` configuration file
788
- - ✅ Guide you through Databricks credential setup
789
-
790
- **Note**: On Windows, you'll need to manually install Ollama from https://ollama.ai/download, then run `lynkr-setup`.
791
-
792
- #### Option 3: Docker Compose (Bundled)
793
-
794
- For a complete bundled experience with Ollama included:
795
-
165
+ **Option 2: Git Clone**
796
166
  ```bash
797
167
  # Clone repository
798
168
  git clone https://github.com/vishalveerareddy123/Lynkr.git
799
169
  cd Lynkr
800
170
 
801
- # Copy environment template
171
+ # Install dependencies
172
+ npm install
173
+
174
+ # Create .env from example
802
175
  cp .env.example .env
803
176
 
804
- # Edit .env with your Databricks credentials
177
+ # Edit .env with your provider credentials
805
178
  nano .env
806
179
 
807
- # Start both services (Lynkr + Ollama)
808
- docker-compose up -d
809
-
810
- # Pull model (first time only)
811
- docker exec ollama ollama pull qwen2.5-coder:latest
812
-
813
- # Verify it's running
814
- curl http://localhost:8080/health
180
+ # Start server
181
+ npm start
815
182
  ```
816
183
 
817
- See [DEPLOYMENT.md](DEPLOYMENT.md) for advanced deployment options (Kubernetes, systemd, etc.).
818
-
819
- #### Option 4: Homebrew (macOS)
820
-
184
+ **Option 3: Homebrew (macOS/Linux)**
821
185
  ```bash
822
186
  brew tap vishalveerareddy123/lynkr
823
- brew install vishalveerareddy123/lynkr/lynkr
824
-
825
- # Configure Databricks (Ollama optional)
826
- export DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
827
- export DATABRICKS_API_KEY=dapi1234567890abcdef
828
-
829
- # Start Lynkr
830
- lynkr
831
- ```
832
-
833
- **Optional**: Install Ollama for hybrid routing:
834
- ```bash
835
- brew install ollama
836
- ollama serve
837
- ollama pull qwen2.5-coder:latest
187
+ brew install lynkr
188
+ lynkr start
838
189
  ```
839
190
 
840
- #### Option 5: From Source
841
-
191
+ **Option 4: Docker**
842
192
  ```bash
843
- # Clone repository
844
- git clone https://github.com/vishalveerareddy123/Lynkr.git
845
- cd Lynkr
846
-
847
- # Install dependencies
848
- npm install
849
-
850
- # Start server
851
- npm start
852
- ```
853
-
854
- #### Configuration
855
-
856
- After installation, configure Lynkr by creating a `.env` file or exporting environment variables:
857
-
858
- ```env
859
- # For Databricks-only setup (no Ollama)
860
- MODEL_PROVIDER=databricks
861
- DATABRICKS_API_BASE=https://<your-workspace>.cloud.databricks.com
862
- DATABRICKS_API_KEY=<personal-access-token>
863
- PORT=8080
864
- WORKSPACE_ROOT=/path/to/your/repo
865
- PROMPT_CACHE_ENABLED=true
193
+ docker-compose up -d
866
194
  ```
867
195
 
868
- For hybrid routing with Ollama + cloud fallback, see [Hybrid Routing](#hybrid-routing-with-automatic-fallback) section below.
869
-
870
- You can copy `.env.example` if you maintain one, or rely on shell exports.
871
-
872
- #### Selecting a model provider
196
+ ---
873
197
 
874
- Set `MODEL_PROVIDER` to select the upstream endpoint:
198
+ ## Supported Providers
875
199
 
876
- - `MODEL_PROVIDER=databricks` (default) – expects `DATABRICKS_API_BASE`, `DATABRICKS_API_KEY`, and optionally `DATABRICKS_ENDPOINT_PATH`.
877
- - `MODEL_PROVIDER=azure-anthropic` – routes requests to Azure's `/anthropic/v1/messages` endpoint and uses the headers Azure expects.
878
- - `MODEL_PROVIDER=openrouter` – connects to OpenRouter for access to 100+ models (GPT-4o, Claude, Gemini, Llama, etc.). Requires `OPENROUTER_API_KEY`.
879
- - `MODEL_PROVIDER=ollama` – connects to a locally-running Ollama instance for models like qwen2.5-coder, llama3, mistral, etc.
200
+ Lynkr supports **9+ LLM providers**:
880
201
 
881
- **Azure-hosted Anthropic configuration:**
202
+ | Provider | Type | Models | Cost | Privacy |
203
+ |----------|------|--------|------|---------|
204
+ | **AWS Bedrock** | Cloud | 100+ (Claude, Titan, Llama, Mistral, etc.) | $$-$$$ | Cloud |
205
+ | **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud |
206
+ | **OpenRouter** | Cloud | 100+ (GPT, Claude, Llama, Gemini, etc.) | $-$$ | Cloud |
207
+ | **Ollama** | Local | Unlimited (free, offline) | **FREE** | 🔒 100% Local |
208
+ | **llama.cpp** | Local | GGUF models | **FREE** | 🔒 100% Local |
209
+ | **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud |
210
+ | **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud |
211
+ | **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud |
212
+ | **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local |
882
213
 
883
- ```env
884
- MODEL_PROVIDER=azure-anthropic
885
- AZURE_ANTHROPIC_ENDPOINT=https://<resource-name>.services.ai.azure.com/anthropic/v1/messages
886
- AZURE_ANTHROPIC_API_KEY=<azure-api-key>
887
- AZURE_ANTHROPIC_VERSION=2023-06-01
888
- PORT=8080
889
- WORKSPACE_ROOT=/path/to/your/repo
890
- ```
214
+ 📖 **[Full Provider Configuration Guide](documentation/providers.md)**
891
215
 
892
- **Ollama configuration:**
216
+ ---
893
217
 
894
- ```env
895
- MODEL_PROVIDER=ollama
896
- OLLAMA_ENDPOINT=http://localhost:11434 # default Ollama endpoint
897
- OLLAMA_MODEL=qwen2.5-coder:latest # model to use
898
- OLLAMA_TIMEOUT_MS=120000 # request timeout
899
- PORT=8080
900
- WORKSPACE_ROOT=/path/to/your/repo
901
- ```
218
+ ## Claude Code Integration
902
219
 
903
- Before starting Lynkr with Ollama, ensure Ollama is running:
220
+ Configure Claude Code CLI to use Lynkr:
904
221
 
905
222
  ```bash
906
- # Start Ollama (in a separate terminal)
907
- ollama serve
908
-
909
- # Pull your desired model
910
- ollama pull qwen2.5-coder:latest
911
- # Or: ollama pull llama3, mistral, etc.
223
+ # Set Lynkr as backend
224
+ export ANTHROPIC_BASE_URL=http://localhost:8081
225
+ export ANTHROPIC_API_KEY=dummy
912
226
 
913
- # Verify model is available
914
- ollama list
227
+ # Run Claude Code
228
+ claude "Your prompt here"
915
229
  ```
916
230
 
917
- **llama.cpp configuration:**
231
+ That's it! Claude Code now uses your configured provider.
918
232
 
919
- llama.cpp provides maximum performance and flexibility for running GGUF models locally. It uses an OpenAI-compatible API, making integration seamless.
233
+ 📖 **[Detailed Claude Code Setup](documentation/claude-code-cli.md)**
920
234
 
921
- ```env
922
- MODEL_PROVIDER=llamacpp
923
- LLAMACPP_ENDPOINT=http://localhost:8080 # default llama.cpp server port
924
- LLAMACPP_MODEL=qwen2.5-coder-7b # model name (for logging)
925
- LLAMACPP_TIMEOUT_MS=120000 # request timeout
926
- PORT=8080
927
- WORKSPACE_ROOT=/path/to/your/repo
928
- ```
929
-
930
- Before starting Lynkr with llama.cpp, ensure llama-server is running:
931
-
932
- ```bash
933
- # Download and build llama.cpp (if not already done)
934
- git clone https://github.com/ggerganov/llama.cpp
935
- cd llama.cpp && make
235
+ ---
936
236
 
937
- # Download a GGUF model (e.g., from HuggingFace)
938
- # Example: Qwen2.5-Coder-7B-Instruct
939
- wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
237
+ ## Cursor Integration
940
238
 
941
- # Start llama-server
942
- ./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080
239
+ Configure Cursor IDE to use Lynkr:
943
240
 
944
- # Verify server is running
945
- curl http://localhost:8080/health
946
- ```
241
+ 1. **Open Cursor Settings**
242
+ - Mac: `Cmd+,` | Windows/Linux: `Ctrl+,`
243
+ - Navigate to: **Features** → **Models**
947
244
 
948
- **Why llama.cpp over Ollama?**
245
+ 2. **Configure OpenAI API Settings**
246
+ - **API Key**: `sk-lynkr` (any non-empty value)
247
+ - **Base URL**: `http://localhost:8081/v1`
248
+ - **Model**: `claude-3.5-sonnet` (or your provider's model)
949
249
 
950
- | Feature | Ollama | llama.cpp |
951
- |---------|--------|-----------|
952
- | Setup | Easy (app) | Manual (compile/download) |
953
- | Model Format | Ollama-specific | Any GGUF model |
954
- | Performance | Good | Excellent (optimized C++) |
955
- | GPU Support | Yes | Yes (CUDA, Metal, ROCm, Vulkan) |
956
- | Memory Usage | Higher | Lower (quantization options) |
957
- | API | Custom `/api/chat` | OpenAI-compatible `/v1/chat/completions` |
958
- | Flexibility | Limited models | Any GGUF from HuggingFace |
959
- | Tool Calling | Limited models | Grammar-based, more reliable |
250
+ 3. **Test It**
251
+ - Chat: `Cmd+L` / `Ctrl+L`
252
+ - Inline edits: `Cmd+K` / `Ctrl+K`
253
+ - @Codebase search: Requires [embeddings setup](documentation/embeddings.md)
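Once configured, Cursor issues standard OpenAI-style chat completion requests to Lynkr's `/v1` endpoint. A request body looks roughly like this (the model name shown is an example; the one you set in Cursor is forwarded to whichever provider you configured):

```json
{
  "model": "claude-3.5-sonnet",
  "messages": [
    { "role": "user", "content": "Explain what this function does" }
  ],
  "stream": true
}
```

Lynkr translates this into the format of your configured provider and streams the response back in OpenAI format.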
960
254
 
961
- Choose llama.cpp when you need maximum performance, specific quantization options, or want to use GGUF models not available in Ollama.
255
+ 📖 **[Full Cursor Setup Guide](documentation/cursor-integration.md)** | **[Embeddings Configuration](documentation/embeddings.md)**
962
256
 
963
- **OpenRouter configuration:**
257
+ ---
964
258
 
965
- OpenRouter provides unified access to 100+ AI models through a single API, including GPT-4o, Claude, Gemini, Llama, Mixtral, and more. It offers competitive pricing, automatic fallbacks, and no need to manage multiple API keys.
259
+ ## Documentation
966
260
 
967
- ```env
968
- MODEL_PROVIDER=openrouter
969
- OPENROUTER_API_KEY=sk-or-v1-... # Get from https://openrouter.ai/keys
970
- OPENROUTER_MODEL=openai/gpt-4o-mini # Model to use (see https://openrouter.ai/models)
971
- OPENROUTER_ENDPOINT=https://openrouter.ai/api/v1/chat/completions # API endpoint
972
- PORT=8080
973
- WORKSPACE_ROOT=/path/to/your/repo
974
- ```
261
+ ### Getting Started
262
+ - 📦 **[Installation Guide](documentation/installation.md)** - Detailed installation for all methods
263
+ - ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 9+ providers
264
+ - 🎯 **[Quick Start Examples](documentation/installation.md#quick-start-examples)** - Copy-paste configs
975
265
 
976
- **Popular OpenRouter models:**
977
- - `openai/gpt-4o-mini` – Fast, affordable GPT-4o mini ($0.15/$0.60 per 1M tokens)
978
- - `anthropic/claude-3.5-sonnet` – Claude 3.5 Sonnet for complex reasoning
979
- - `google/gemini-pro-1.5` – Google's Gemini Pro with large context
980
- - `meta-llama/llama-3.1-70b-instruct` – Meta's open-source Llama 3.1
266
+ ### IDE Integration
267
+ - 🖥️ **[Claude Code CLI Setup](documentation/claude-code-cli.md)** - Connect Claude Code CLI
268
+ - 🎨 **[Cursor IDE Setup](documentation/cursor-integration.md)** - Full Cursor integration with troubleshooting
269
+ - 🔍 **[Embeddings Guide](documentation/embeddings.md)** - Enable @Codebase semantic search (4 options: Ollama, llama.cpp, OpenRouter, OpenAI)
981
270
 
982
- See https://openrouter.ai/models for the complete list with pricing.
271
+ ### Features & Capabilities
272
+ - ✨ **[Core Features](documentation/features.md)** - Architecture, request flow, format conversion
273
+ - 🧠 **[Memory System](documentation/memory-system.md)** - Titans-inspired long-term memory
274
+ - 💰 **[Token Optimization](documentation/token-optimization.md)** - 60-80% cost reduction strategies
275
+ - 🔧 **[Tools & Execution](documentation/tools.md)** - Tool calling, execution modes, custom tools
983
276
 
984
- **OpenAI configuration:**
277
+ ### Deployment & Operations
278
+ - ðŸģ **[Docker Deployment](documentation/docker.md)** - docker-compose setup with GPU support
279
+ - 🏭 **[Production Hardening](documentation/production.md)** - Circuit breakers, load shedding, metrics
280
+ - 📊 **[API Reference](documentation/api.md)** - All endpoints and formats
985
281
 
986
- OpenAI provides direct access to GPT-4o, GPT-4o-mini, o1, and other models through their official API. This is the simplest way to use OpenAI models without going through Azure or OpenRouter.
282
+ ### Support
283
+ - 🔧 **[Troubleshooting](documentation/troubleshooting.md)** - Common issues and solutions
284
+ - ❓ **[FAQ](documentation/faq.md)** - Frequently asked questions
285
+ - 🧪 **[Testing Guide](documentation/testing.md)** - Running tests and validation
987
286
 
988
- ```env
989
- MODEL_PROVIDER=openai
990
- OPENAI_API_KEY=sk-your-openai-api-key # Get from https://platform.openai.com/api-keys
991
- OPENAI_MODEL=gpt-4o # Model to use (default: gpt-4o)
992
- PORT=8080
993
- WORKSPACE_ROOT=/path/to/your/repo
994
- ```
287
+ ---
995
288
 
289
+ ## External Resources
996
290
 
997
- **Getting an OpenAI API key:**
998
- 1. Visit https://platform.openai.com
999
- 2. Sign up or log in to your account
1000
- 3. Go to https://platform.openai.com/api-keys
1001
- 4. Create a new API key
1002
- 5. Add credits to your account (pay-as-you-go)
1003
-
1004
- **OpenAI benefits:**
1005
- - ✅ **Direct API access** – No intermediaries, lowest latency to OpenAI
1006
- - ✅ **Full tool calling support** – Excellent function calling compatible with Claude Code CLI
1007
- - ✅ **Parallel tool calls** – Execute multiple tools simultaneously for faster workflows
1008
- - ✅ **Organization support** – Use organization-level API keys for team billing
1009
- - ✅ **Simple setup** – Just one API key needed
1010
-
1011
- **Getting an OpenRouter API key:**
1012
- 1. Visit https://openrouter.ai
1013
- 2. Sign in with GitHub, Google, or email
1014
- 3. Go to https://openrouter.ai/keys
1015
- 4. Create a new API key
1016
- 5. Add credits to your account (pay-as-you-go, no subscription required)
1017
-
1018
- **OpenRouter benefits:**
1019
- - ✅ **100+ models** through one API (no need to manage multiple provider accounts)
1020
- - ✅ **Automatic fallbacks** if your primary model is unavailable
1021
- - ✅ **Competitive pricing** with volume discounts
1022
- - ✅ **Full tool calling support** (function calling compatible with Claude Code CLI)
1023
- - ✅ **No monthly fees** – pay only for what you use
1024
- - ✅ **Rate limit pooling** across models
291
+ - 📚 **[DeepWiki Documentation](https://deepwiki.com/vishalveerareddy123/Lynkr)** - AI-powered documentation search
292
+ - 💎 **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
293
+ - 🐛 **[Report Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Bug reports and feature requests
294
+ - ðŸ“Ķ **[NPM Package](https://www.npmjs.com/package/lynkr)** - Official npm package
1025
295
 
1026
296
  ---
1027
297
 
- ## Configuration Reference
-
- | Variable | Description | Default |
- |----------|-------------|---------|
- | `PORT` | HTTP port for the proxy server. | `8080` |
- | `WORKSPACE_ROOT` | Filesystem path exposed to workspace tools and indexer. | `process.cwd()` |
- | `MODEL_PROVIDER` | Selects the model backend (`databricks`, `openai`, `azure-openai`, `azure-anthropic`, `openrouter`, `ollama`, `llamacpp`). | `databricks` |
- | `MODEL_DEFAULT` | Overrides the default model/deployment name sent to the provider. | Provider-specific default |
- | `DATABRICKS_API_BASE` | Base URL of your Databricks workspace (required when `MODEL_PROVIDER=databricks`). | – |
- | `DATABRICKS_API_KEY` | Databricks PAT used for the serving endpoint (required for Databricks). | – |
- | `DATABRICKS_ENDPOINT_PATH` | Optional override for the Databricks serving endpoint path. | `/serving-endpoints/databricks-claude-sonnet-4-5/invocations` |
- | `AZURE_ANTHROPIC_ENDPOINT` | Full HTTPS endpoint for Azure-hosted Anthropic `/anthropic/v1/messages` (required when `MODEL_PROVIDER=azure-anthropic`). | – |
- | `AZURE_ANTHROPIC_API_KEY` | API key supplied via the `x-api-key` header for Azure Anthropic. | – |
- | `AZURE_ANTHROPIC_VERSION` | Anthropic API version header for Azure Anthropic calls. | `2023-06-01` |
- | `OPENROUTER_API_KEY` | OpenRouter API key (required when `MODEL_PROVIDER=openrouter`). Get from https://openrouter.ai/keys | – |
- | `OPENROUTER_MODEL` | OpenRouter model to use (e.g., `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`). See https://openrouter.ai/models | `openai/gpt-4o-mini` |
- | `OPENROUTER_ENDPOINT` | OpenRouter API endpoint URL. | `https://openrouter.ai/api/v1/chat/completions` |
- | `OPENROUTER_MAX_TOOLS_FOR_ROUTING` | Maximum tool count for routing to OpenRouter in hybrid mode. | `15` |
- | `OPENAI_API_KEY` | OpenAI API key (required when `MODEL_PROVIDER=openai`). Get from https://platform.openai.com/api-keys | – |
- | `OPENAI_MODEL` | OpenAI model to use (e.g., `gpt-4o`, `gpt-4o-mini`, `o1-preview`). | `gpt-4o` |
- | `OPENAI_ENDPOINT` | OpenAI API endpoint URL (usually don't need to change). | `https://api.openai.com/v1/chat/completions` |
- | `OPENAI_ORGANIZATION` | OpenAI organization ID for organization-level API keys (optional). | – |
- | `OLLAMA_ENDPOINT` | Ollama API endpoint URL (required when `MODEL_PROVIDER=ollama`). | `http://localhost:11434` |
- | `OLLAMA_MODEL` | Ollama model name to use (e.g., `qwen2.5-coder:latest`, `llama3`, `mistral`). | `qwen2.5-coder:7b` |
- | `OLLAMA_TIMEOUT_MS` | Request timeout for Ollama API calls in milliseconds. | `120000` (2 minutes) |
- | `LLAMACPP_ENDPOINT` | llama.cpp server endpoint URL (required when `MODEL_PROVIDER=llamacpp`). | `http://localhost:8080` |
- | `LLAMACPP_MODEL` | llama.cpp model name (for logging purposes). | `default` |
- | `LLAMACPP_TIMEOUT_MS` | Request timeout for llama.cpp API calls in milliseconds. | `120000` (2 minutes) |
- | `LLAMACPP_API_KEY` | Optional API key for secured llama.cpp servers. | – |
- | `PROMPT_CACHE_ENABLED` | Toggle the prompt cache system. | `true` |
- | `PROMPT_CACHE_TTL_MS` | Milliseconds before cached prompts expire. | `300000` (5 minutes) |
- | `PROMPT_CACHE_MAX_ENTRIES` | Maximum number of cached prompts retained. | `64` |
- | `TOOL_EXECUTION_MODE` | Controls where tools execute: `server` (default, tools run on proxy server), `client`/`passthrough` (tools execute on Claude Code CLI side). | `server` |
- | `POLICY_MAX_STEPS` | Max agent loop iterations before timeout. | `8` |
- | `POLICY_GIT_ALLOW_PUSH` | Allow/disallow `workspace_git_push`. | `false` |
- | `POLICY_GIT_REQUIRE_TESTS` | Enforce passing tests before `workspace_git_commit`. | `false` |
- | `POLICY_GIT_TEST_COMMAND` | Custom test command invoked by policies. | `null` |
- | `WEB_SEARCH_ENDPOINT` | URL for policy-driven web fetch fallback. | `http://localhost:8888/search` |
- | `WEB_SEARCH_ALLOWED_HOSTS` | Comma-separated allowlist for `web_fetch`. | `null` |
- | `MCP_SERVER_MANIFEST` | Single manifest file for MCP server. | `null` |
- | `MCP_MANIFEST_DIRS` | Semicolon-separated directories scanned for manifests. | `~/.claude/mcp` |
- | `MCP_SANDBOX_ENABLED` | Enable container sandbox for MCP tools (requires `MCP_SANDBOX_IMAGE`). | `true` |
- | `MCP_SANDBOX_IMAGE` | Docker/OCI image name used for sandboxing. | `null` |
- | `WORKSPACE_TEST_COMMAND` | Default CLI used by `workspace_test_run`. | `null` |
- | `WORKSPACE_TEST_TIMEOUT_MS` | Test harness timeout. | `600000` |
- | `WORKSPACE_TEST_COVERAGE_FILES` | Comma-separated coverage summary files. | `coverage/coverage-summary.json` |
-
- ### Production Hardening Configuration
-
- | Variable | Description | Default |
- |----------|-------------|---------|
- | `API_RETRY_MAX_RETRIES` | Maximum retry attempts for transient failures. | `3` |
- | `API_RETRY_INITIAL_DELAY` | Initial retry delay in milliseconds. | `1000` |
- | `API_RETRY_MAX_DELAY` | Maximum retry delay in milliseconds. | `30000` |
- | `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | Failures before circuit opens. | `5` |
- | `CIRCUIT_BREAKER_SUCCESS_THRESHOLD` | Successes needed to close circuit from half-open. | `2` |
- | `CIRCUIT_BREAKER_TIMEOUT` | Time before attempting recovery (ms). | `60000` |
- | `LOAD_SHEDDING_MEMORY_THRESHOLD` | Memory usage threshold (0-1) before shedding load. | `0.85` |
- | `LOAD_SHEDDING_HEAP_THRESHOLD` | Heap usage threshold (0-1) before shedding load. | `0.90` |
- | `LOAD_SHEDDING_ACTIVE_REQUESTS_THRESHOLD` | Max concurrent requests before shedding. | `1000` |
- | `GRACEFUL_SHUTDOWN_TIMEOUT` | Shutdown timeout in milliseconds. | `30000` |
- | `METRICS_ENABLED` | Enable metrics collection. | `true` |
- | `HEALTH_CHECK_ENABLED` | Enable health check endpoints. | `true` |
- | `REQUEST_LOGGING_ENABLED` | Enable structured request logging. | `true` |
-
- See `src/config/index.js` for the full configuration matrix, including sandbox mounts, permissions, and MCP networking policies.
-
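The `API_RETRY_*` settings describe a standard exponential-backoff schedule. As a rough sketch of how the three values plausibly combine (an illustration only, not Lynkr's actual retry implementation):

```javascript
// Sketch: how API_RETRY_MAX_RETRIES, API_RETRY_INITIAL_DELAY and
// API_RETRY_MAX_DELAY can combine into a backoff schedule (assumption --
// see src/config/index.js for the real logic).
function backoffDelays(maxRetries, initialDelay, maxDelay) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // double the delay on each attempt, capped at maxDelay
    delays.push(Math.min(initialDelay * 2 ** attempt, maxDelay));
  }
  return delays;
}

// Defaults from the table: 3 retries, 1000 ms initial, 30000 ms cap
console.log(backoffDelays(3, 1000, 30000)); // [ 1000, 2000, 4000 ]
```

With the default cap, the delay stops growing once doubling would exceed 30 seconds.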
- ---
+ ## Key Features Highlights
 
- ## Runtime Operations
+ - ✅ **Multi-Provider Support** - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
+ - ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
+ - ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
+ - ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
+ - ✅ **Embeddings Support** - 4 options for @Codebase search: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
+ - ✅ **MCP Integration** - Automatic Model Context Protocol server discovery and orchestration
+ - ✅ **Enterprise Features** - Circuit breakers, load shedding, Prometheus metrics, K8s health checks
+ - ✅ **Streaming Support** - Real-time token streaming for all providers
+ - ✅ **Memory System** - Titans-inspired long-term memory with surprise-based filtering
+ - ✅ **Tool Calling** - Full tool support with server and passthrough execution modes
+ - ✅ **Production Ready** - Battle-tested with 400+ tests, observability, and error resilience
 
- ### Launching the Proxy
+ ---
 
- ```bash
- # global install
- lynkr start
+ ## Architecture
 
- # local checkout
- npm run dev # development: auto-restarts on file changes
- npm start # production
  ```
+ ┌─────────────────┐
+ │ Claude Code CLI │ or Cursor IDE
+ └────────┬────────┘
+ │ Anthropic/OpenAI Format
+ ↓
+ ┌─────────────────┐
+ │ Lynkr Proxy │
+ │ Port: 8081 │
+ │ │
+ │ • Format Conv. │
+ │ • Token Optim. │
+ │ • Provider Route│
+ │ • Tool Calling │
+ │ • Caching │
+ └────────┬────────┘
+ │
+ ├──→ Databricks (Claude 4.5)
+ ├──→ AWS Bedrock (100+ models)
+ ├──→ OpenRouter (100+ models)
+ ├──→ Ollama (local, free)
+ ├──→ llama.cpp (local, free)
+ ├──→ Azure OpenAI (GPT-4o, o1)
+ ├──→ OpenAI (GPT-4o, o3)
+ └──→ Azure Anthropic (Claude)
+ ```
+
+ 📖 **[Detailed Architecture](documentation/features.md#architecture)**
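The "Format Conv." step in the diagram above translates between the Anthropic and OpenAI request shapes. A minimal sketch of that idea, assuming text-only content (field names mirror the two public APIs; Lynkr's real converter handles tools, streaming, and more):

```javascript
// Illustrative sketch of Anthropic-style -> OpenAI-style request conversion.
// Not Lynkr's actual code; text blocks only, no tool calls.
function anthropicToOpenAI(body) {
  const messages = [];
  // Anthropic carries the system prompt as a top-level field;
  // OpenAI expects it as the first message.
  if (body.system) messages.push({ role: "system", content: body.system });
  for (const msg of body.messages) {
    const content = Array.isArray(msg.content)
      ? msg.content.filter(b => b.type === "text").map(b => b.text).join("\n")
      : msg.content;
    messages.push({ role: msg.role, content });
  }
  return { model: body.model, messages, max_tokens: body.max_tokens };
}

const out = anthropicToOpenAI({
  model: "claude-proxy",
  system: "You are helpful.",
  max_tokens: 256,
  messages: [{ role: "user", content: [{ type: "text", text: "Hi" }] }],
});
console.log(out.messages.length); // 2
```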
 
- Logs stream to stdout. The server listens on `PORT` and exposes `/v1/messages` in the Anthropic-compatible shape. If you installed via npm globally, `lynkr start` reads the same environment variables described above.
-
- ### Connecting Claude Code CLI
-
- 1. Install or upgrade Claude Code CLI.
- 2. Export the proxy endpoint:
- ```bash
- export ANTHROPIC_BASE_URL=http://localhost:8080
- export ANTHROPIC_API_KEY=dummy # not used, but Anthropic CLI requires it
- ```
- 3. Launch `claude` CLI within `WORKSPACE_ROOT`.
- 4. Invoke commands as normal; the CLI will route requests through the proxy.
-
- ### Using Ollama Models
-
- Lynkr can connect to locally-running Ollama models for fast, offline AI assistance. This is ideal for development environments, air-gapped systems, or cost optimization.
+ ---
 
- **Quick Start with Ollama:**
+ ## Quick Configuration Examples
 
+ **100% Local (FREE)**
  ```bash
- # Terminal 1: Start Ollama
- ollama serve
-
- # Terminal 2: Pull and verify model
- ollama pull qwen2.5-coder:latest
- ollama list
-
- # Terminal 3: Start Lynkr with Ollama
  export MODEL_PROVIDER=ollama
- export OLLAMA_ENDPOINT=http://localhost:11434
  export OLLAMA_MODEL=qwen2.5-coder:latest
+ export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
  npm start
-
- # Terminal 4: Connect Claude CLI
- export ANTHROPIC_BASE_URL=http://localhost:8080
- export ANTHROPIC_API_KEY=dummy
- claude
- ```
-
- **Supported Ollama Models:**
-
- Lynkr works with any Ollama model. Popular choices:
-
- - **qwen2.5-coder:latest** – Optimized for code generation (7B parameters, 4.7GB)
- - **llama3:latest** – General-purpose conversational model (8B parameters, 4.7GB)
- - **mistral:latest** – Fast, efficient model (7B parameters, 4.1GB)
- - **codellama:latest** – Meta's code-focused model (7B-34B variants)
-
- **Ollama Health Check:**
-
- ```bash
- # Basic health check
- curl http://localhost:8080/health/ready
-
- # Deep health check (includes Ollama connectivity)
- curl "http://localhost:8080/health/ready?deep=true" | jq .checks.ollama
  ```
 
- **Tool Calling Support:**
-
- Lynkr now supports **native tool calling** for compatible Ollama models:
-
- - ✅ **Supported models**: llama3.1, llama3.2, qwen2.5, qwen2.5-coder, mistral, mistral-nemo, firefunction-v2
- - ✅ **Automatic detection**: Lynkr detects tool-capable models and enables tools automatically
- - ✅ **Format conversion**: Transparent conversion between Anthropic and Ollama tool formats
- - ❌ **Unsupported models**: llama3, older models (tools are filtered out automatically)
-
- **Limitations:**
-
- - Tool choice parameter is not supported (Ollama always uses "auto" mode)
- - Some advanced Claude features (extended thinking, prompt caching) are not available with Ollama
-
- ### Hybrid Routing with Automatic Fallback
-
- Lynkr supports **intelligent 3-tier hybrid routing** that automatically routes requests between Ollama (local/fast), OpenRouter (moderate complexity), and cloud providers (Databricks/Azure for heavy workloads) based on request complexity, with transparent fallback when any provider is unavailable.
-
- **Why Hybrid Routing?**
-
- - 🚀 **40-87% faster** for simple requests (local Ollama)
- - 💰 **65-100% cost savings** for requests that stay on Ollama
- - 🎯 **Smart cost optimization** – use affordable OpenRouter models for moderate complexity
- - 🛡️ **Automatic fallback** ensures reliability when any provider fails
- - 🔒 **Privacy-preserving** for simple queries (never leave your machine with Ollama)
-
- **Quick Start:**
-
+ **AWS Bedrock (100+ models)**
  ```bash
- # Terminal 1: Start Ollama
- ollama serve
- ollama pull qwen2.5-coder:latest
-
- # Terminal 2: Start Lynkr with 3-tier routing
- export PREFER_OLLAMA=true
- export OLLAMA_ENDPOINT=http://localhost:11434
- export OLLAMA_MODEL=qwen2.5-coder:latest
- export OPENROUTER_API_KEY=your_openrouter_key # Mid-tier provider
- export OPENROUTER_MODEL=openai/gpt-4o-mini # Mid-tier model
- export DATABRICKS_API_KEY=your_key # Heavy workload provider
- export DATABRICKS_API_BASE=your_base_url # Heavy workload provider
+ export MODEL_PROVIDER=bedrock
+ export AWS_BEDROCK_API_KEY=your-key
+ export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
  npm start
-
- # Terminal 3: Connect Claude CLI (works transparently)
- export ANTHROPIC_BASE_URL=http://localhost:8080
- export ANTHROPIC_API_KEY=dummy
- claude
- ```
-
- **How It Works:**
-
- Lynkr intelligently routes each request based on complexity:
-
- 1. **Simple requests (0-2 tools)** → Try Ollama first
-    - ✅ If Ollama succeeds: Fast, local, free response (100-500ms)
-    - ❌ If Ollama fails: Automatic transparent fallback to OpenRouter or Databricks
-
- 2. **Moderate requests (3-14 tools)** → Route to OpenRouter
-    - Uses affordable models like GPT-4o-mini ($0.15/1M input tokens)
-    - Full tool calling support
-    - ❌ If OpenRouter fails or not configured: Fallback to Databricks
-
- 3. **Complex requests (15+ tools)** → Route directly to Databricks
-    - Heavy workloads get the most capable models
-    - Enterprise features and reliability
-
- 4. **Tool-incompatible models** → Route directly to cloud
-    - Requests requiring tools with non-tool-capable Ollama models skip Ollama
-
- **Configuration:**
-
- ```bash
- # Required
- PREFER_OLLAMA=true # Enable hybrid routing mode
-
- # Optional (with defaults)
- FALLBACK_ENABLED=true # Enable automatic fallback (default: true)
- OLLAMA_MAX_TOOLS_FOR_ROUTING=3 # Max tools to route to Ollama (default: 3)
- OPENROUTER_MAX_TOOLS_FOR_ROUTING=15 # Max tools to route to OpenRouter (default: 15)
- FALLBACK_PROVIDER=databricks # Final fallback provider (default: databricks)
- OPENROUTER_API_KEY=your_key # Required for OpenRouter tier
- OPENROUTER_MODEL=openai/gpt-4o-mini # OpenRouter model (default: gpt-4o-mini)
  ```
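The three-tier decision described above reduces to a comparison against the two documented thresholds. A minimal sketch, assuming tool count is the only signal (the real router also weighs model tool support, fallback configuration, and circuit-breaker state):

```javascript
// Sketch of the 3-tier routing decision: hypothetical helper, not Lynkr's
// actual router. Thresholds mirror OLLAMA_MAX_TOOLS_FOR_ROUTING (3) and
// OPENROUTER_MAX_TOOLS_FOR_ROUTING (15).
function pickTier(toolCount, { ollamaMax = 3, openrouterMax = 15 } = {}) {
  if (toolCount < ollamaMax) return "ollama";         // simple: 0-2 tools
  if (toolCount < openrouterMax) return "openrouter"; // moderate: 3-14 tools
  return "databricks";                                // complex: 15+ tools
}

console.log(pickTier(1));  // "ollama"
console.log(pickTier(8));  // "openrouter"
console.log(pickTier(20)); // "databricks"
```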
 
- **Example Scenarios:**
-
+ **OpenRouter (simplest cloud)**
  ```bash
- # Scenario 1: Simple code generation (no tools)
- User: "Write a hello world function in Python"
- → Routes to Ollama (fast, local, free)
- → Response in ~300ms
-
- # Scenario 2: Moderate workflow (3-14 tools)
- User: "Search the codebase, read 5 files, and refactor them"
- → Routes to OpenRouter (moderate complexity)
- → Uses affordable GPT-4o-mini
- → Response in ~1500ms
-
- # Scenario 3: Heavy workflow (15+ tools)
- User: "Analyze 20 files, run tests, update documentation, commit changes"
- → Routes directly to Databricks (complex task needs most capable model)
- → Response in ~2500ms
-
- # Scenario 4: Automatic fallback chain
- User: "What is 2+2?"
- → Tries Ollama (connection refused)
- → Falls back to OpenRouter (if configured)
- → Falls back to Databricks (if OpenRouter unavailable)
- → User sees no error, just gets response
+ export MODEL_PROVIDER=openrouter
+ export OPENROUTER_API_KEY=sk-or-v1-your-key
+ npm start
  ```
 
- **Circuit Breaker Protection:**
+ 📖 **[More Examples](documentation/providers.md#quick-start-examples)**
 
- After 5 consecutive Ollama failures, the circuit breaker opens:
- - Subsequent requests skip Ollama entirely (fail-fast)
- - Fallback happens in <100ms instead of waiting for timeout
- - Circuit auto-recovers after 60 seconds
-
- **Monitoring:**
+ ---
 
- Track routing performance via `/metrics/observability`:
+ ## Contributing
 
- ```bash
- curl http://localhost:8080/metrics/observability | jq '.routing, .fallback, .cost_savings'
- ```
+ We welcome contributions! Please see:
+ - **[Contributing Guide](documentation/contributing.md)** - How to contribute
+ - **[Testing Guide](documentation/testing.md)** - Running tests
 
- Example output:
- ```json
- {
-   "routing": {
-     "by_provider": {"ollama": 100, "databricks": 20},
-     "successes_by_provider": {"ollama": 85, "databricks": 20},
-     "failures_by_provider": {"ollama": 15}
-   },
-   "fallback": {
-     "attempts_total": 15,
-     "successes_total": 13,
-     "failures_total": 2,
-     "success_rate": "86.67%",
-     "reasons": {
-       "circuit_breaker": 8,
-       "timeout": 4,
-       "service_unavailable": 3
-     }
-   },
-   "cost_savings": {
-     "ollama_savings_usd": "1.2345",
-     "ollama_latency_ms": { "mean": 450, "p95": 1200 }
-   }
- }
- ```
+ ---
 
- **Rollback:**
+ ## License
 
- Disable hybrid routing anytime:
+ Apache 2.0 - See [LICENSE](LICENSE) file for details.
 
- ```bash
- # Option 1: Disable entirely (use static MODEL_PROVIDER)
- export PREFER_OLLAMA=false
- npm start
+ ---
 
- # Option 2: Ollama-only mode (no fallback)
- export PREFER_OLLAMA=true
- export FALLBACK_ENABLED=false
- npm start
- ```
-
- **Performance Comparison:**
-
- | Metric | Cloud Only | Hybrid Routing | Improvement |
- |--------|-----------|----------------|-------------|
- | **Simple requests** | 1500-2500ms | 300-600ms | 70-87% faster ⚡ |
- | **Complex requests** | 1500-2500ms | 1500-2500ms | No change (routes to cloud) |
- | **Cost per simple request** | $0.002-0.005 | $0.00 | 100% savings 💰 |
- | **Fallback latency** | N/A | <100ms | Transparent to user |
-
- ### Using Built-in Workspace Tools
-
- You can call tools programmatically via HTTP:
-
- ```bash
- curl http://localhost:8080/v1/messages \
-   -H 'Content-Type: application/json' \
-   -H 'x-session-id: manual-test' \
-   -d '{
-     "model": "claude-proxy",
-     "messages": [{ "role": "user", "content": "Rebuild the workspace index." }],
-     "tools": [{
-       "name": "workspace_index_rebuild",
-       "type": "function",
-       "description": "Rebuild the repo index and project summary",
-       "input_schema": { "type": "object" }
-     }],
-     "tool_choice": {
-       "type": "function",
-       "function": { "name": "workspace_index_rebuild" }
-     }
-   }'
- ```
-
- Tool responses appear in the assistant content block with structured JSON.
-
- ### Client-Side Tool Execution (Passthrough Mode)
-
- Lynkr supports **client-side tool execution**, where tools execute on the Claude Code CLI machine instead of the proxy server. This enables local file operations, commands, and access to local resources.
-
- **Enable client-side execution:**
-
- ```bash
- # Set in .env or export before starting
- export TOOL_EXECUTION_MODE=client
- npm start
- ```
-
- **How it works:**
-
- 1. **Model generates tool calls** – Databricks/OpenRouter/Ollama model returns tool calls
- 2. **Proxy converts to Anthropic format** – Tool calls converted to `tool_use` blocks
- 3. **CLI executes tools locally** – Claude Code CLI receives `tool_use` blocks and runs them on the user's machine
- 4. **CLI sends results back** – Tool results sent back to proxy in next request as `tool_result` blocks
- 5. **Conversation continues** – Proxy forwards the complete conversation (including tool results) back to the model
-
- **Example response in passthrough mode:**
-
- ```json
- {
-   "id": "msg_123",
-   "type": "message",
-   "role": "assistant",
-   "content": [
-     {
-       "type": "text",
-       "text": "I'll create that file for you."
-     },
-     {
-       "type": "tool_use",
-       "id": "toolu_abc",
-       "name": "Write",
-       "input": {
-         "file_path": "/tmp/test.txt",
-         "content": "Hello World"
-       }
-     }
-   ],
-   "stop_reason": "tool_use"
- }
- ```
-
- **Benefits:**
- - ✅ Tools execute on CLI user's local filesystem
- - ✅ Access to local credentials, SSH keys, environment variables
- - ✅ Integration with local development tools (git, npm, docker, etc.)
- - ✅ Reduced network latency for file operations
- - ✅ Server doesn't need filesystem access or permissions
-
- **Use cases:**
- - Remote proxy server, local CLI execution
- - Multi-user environments where each user needs their own workspace
- - Security-sensitive environments where server shouldn't access user files
-
- **Supported modes:**
- - `TOOL_EXECUTION_MODE=server` – Tools run on proxy server (default)
- - `TOOL_EXECUTION_MODE=client` – Tools run on CLI side
- - `TOOL_EXECUTION_MODE=passthrough` – Alias for `client`
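Step 4 of the passthrough flow above can be sketched as follows: the CLI replays the conversation with the assistant's `tool_use` turn followed by a user turn carrying the matching `tool_result`. The shapes follow the Anthropic Messages format shown in the example; the helper itself is hypothetical:

```javascript
// Sketch: building the follow-up request after a tool runs client-side.
// The tool_result block must reference the original tool_use id.
function appendToolResult(messages, assistantMsg, toolUseId, resultText) {
  return [
    ...messages,
    { role: "assistant", content: assistantMsg.content },
    {
      role: "user",
      content: [
        { type: "tool_result", tool_use_id: toolUseId, content: resultText },
      ],
    },
  ];
}

const next = appendToolResult(
  [{ role: "user", content: "Create /tmp/test.txt" }],
  { content: [{ type: "tool_use", id: "toolu_abc", name: "Write", input: {} }] },
  "toolu_abc",
  "File written"
);
console.log(next.length); // 3
```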
-
- ### Working with Prompt Caching
-
- - Set `PROMPT_CACHE_ENABLED=true` (default) to activate the cache.
- - The cache retains up to `PROMPT_CACHE_MAX_ENTRIES` entries for `PROMPT_CACHE_TTL_MS` milliseconds.
- - A cache hit skips the Databricks call; response metadata populates `cache_read_input_tokens`.
- - Cache misses record `cache_creation_input_tokens`, indicating a fresh prompt was cached.
- - Cache entries are invalidated automatically when they age out; no manual maintenance required.
- - Disable caching temporarily by exporting `PROMPT_CACHE_ENABLED=false` and restarting the server.
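The behaviour described above amounts to a size-bounded cache with per-entry TTL. A minimal sketch, assuming insertion-order eviction (the key scheme, eviction policy, and metadata of Lynkr's real cache are not specified here):

```javascript
// Illustrative TTL + size-bounded prompt cache; not Lynkr's implementation.
class PromptCache {
  constructor(maxEntries, ttlMs, now = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  get(key) {
    const hit = this.entries.get(key);
    if (!hit) return undefined; // miss: a fresh prompt would be cached
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // aged out, invalidated automatically
      return undefined;
    }
    return hit.value; // hit: the upstream call is skipped
  }
  set(key, value) {
    if (this.entries.size >= this.maxEntries) {
      // evict the oldest insertion to respect PROMPT_CACHE_MAX_ENTRIES
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const cache = new PromptCache(64, 300000); // defaults: 64 entries, 5 min TTL
cache.set("prompt-hash", "cached response");
console.log(cache.get("prompt-hash")); // "cached response"
```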
-
- ### Integrating MCP Servers
-
- 1. Place MCP manifest JSON files under `~/.claude/mcp` or configure `MCP_MANIFEST_DIRS`.
- 2. Each manifest should define the server command, arguments, and capabilities per the MCP spec.
- 3. Restart the proxy; manifests are loaded at boot. Registered tools appear with names `mcp_<server>_<tool>`.
- 4. Invoke tools via `workspace_mcp_call` or indirectly when the assistant selects them.
- 5. Sandbox settings (`MCP_SANDBOX_*`) control Docker runtime, mounts, environment passthrough, and permission prompts.
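The steps above can look roughly like this. The manifest below is an illustrative sketch only: the authoritative schema comes from the MCP spec and Lynkr's loader, and the exact field names shown here are assumptions chosen to show the shape:

```json
{
  "name": "filesystem",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"],
  "env": {}
}
```

Saved as, say, `~/.claude/mcp/filesystem.json`, a manifest like this would surface its tools as `mcp_filesystem_<tool>` after a proxy restart.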
-
- ### Health Checks & Monitoring
-
- Lynkr exposes Kubernetes-ready health check endpoints for orchestrated deployments:
-
- #### Liveness Probe
- ```bash
- curl http://localhost:8080/health/live
- ```
-
- Returns `200 OK` with basic process health. Use this for Kubernetes liveness probes to detect crashed or frozen processes.
-
- **Kubernetes Configuration:**
- ```yaml
- livenessProbe:
-   httpGet:
-     path: /health/live
-     port: 8080
-   initialDelaySeconds: 10
-   periodSeconds: 10
-   timeoutSeconds: 5
-   failureThreshold: 3
- ```
-
- #### Readiness Probe
- ```bash
- curl http://localhost:8080/health/ready
- ```
-
- Returns `200 OK` when ready to serve traffic, or `503 Service Unavailable` when:
- - System is shutting down
- - Database connections are unavailable
- - Memory usage exceeds safe thresholds
-
- **Response Format:**
- ```json
- {
-   "status": "healthy",
-   "timestamp": "2024-01-15T10:30:00.000Z",
-   "checks": {
-     "database": {
-       "healthy": true,
-       "latency": 12
-     },
-     "memory": {
-       "healthy": true,
-       "heapUsedPercent": 45.2,
-       "totalUsedPercent": 52.1
-     }
-   }
- }
- ```
-
- **Kubernetes Configuration:**
- ```yaml
- readinessProbe:
-   httpGet:
-     path: /health/ready
-     port: 8080
-   initialDelaySeconds: 5
-   periodSeconds: 5
-   timeoutSeconds: 3
-   failureThreshold: 2
- ```
-
- ### Metrics & Observability
-
- Lynkr collects comprehensive metrics with minimal performance overhead (7.1μs per request). Three endpoints provide different views:
-
- #### JSON Metrics (Human-Readable)
- ```bash
- curl http://localhost:8080/metrics/observability
- ```
-
- Returns detailed metrics in JSON format:
- ```json
- {
-   "requests": {
-     "total": 15234,
-     "errors": 127,
-     "errorRate": 0.0083
-   },
-   "latency": {
-     "p50": 125.3,
-     "p95": 342.1,
-     "p99": 521.8,
-     "count": 15234
-   },
-   "tokens": {
-     "input": 1523421,
-     "output": 823456,
-     "total": 2346877
-   },
-   "cost": {
-     "total": 234.56,
-     "currency": "USD"
-   },
-   "databricks": {
-     "requests": 15234,
-     "successes": 15107,
-     "failures": 127,
-     "successRate": 0.9917,
-     "retries": 89
-   }
- }
- ```
-
- #### Prometheus Format (Scraping)
- ```bash
- curl http://localhost:8080/metrics/prometheus
- ```
-
- Returns metrics in Prometheus text format for scraping:
- ```
- # HELP http_requests_total Total number of HTTP requests
- # TYPE http_requests_total counter
- http_requests_total 15234
-
- # HELP http_request_errors_total Total number of HTTP request errors
- # TYPE http_request_errors_total counter
- http_request_errors_total 127
-
- # HELP http_request_duration_seconds HTTP request latency
- # TYPE http_request_duration_seconds summary
- http_request_duration_seconds{quantile="0.5"} 0.1253
- http_request_duration_seconds{quantile="0.95"} 0.3421
- http_request_duration_seconds{quantile="0.99"} 0.5218
- http_request_duration_seconds_count 15234
- ```
-
- **Prometheus Configuration:**
- ```yaml
- scrape_configs:
-   - job_name: 'lynkr'
-     static_configs:
-       - targets: ['localhost:8080']
-     metrics_path: '/metrics/prometheus'
-     scrape_interval: 15s
- ```
-
- #### Circuit Breaker State
- ```bash
- curl http://localhost:8080/metrics/circuit-breakers
- ```
-
- Returns real-time circuit breaker states:
- ```json
- {
-   "databricks": {
-     "state": "CLOSED",
-     "failureCount": 2,
-     "successCount": 1523,
-     "lastFailure": null,
-     "nextAttempt": null
-   },
-   "azure-anthropic": {
-     "state": "OPEN",
-     "failureCount": 5,
-     "successCount": 823,
-     "lastFailure": "2024-01-15T10:25:00.000Z",
-     "nextAttempt": "2024-01-15T10:26:00.000Z"
-   }
- }
- ```
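The `CLOSED`/`OPEN` states reported above follow the standard circuit-breaker cycle, with a `HALF_OPEN` probe phase in between. A minimal sketch using the documented defaults (5 failures to open, 60 s recovery timeout, 2 successes to close); illustrative only, not Lynkr's actual implementation:

```javascript
// Sketch of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle.
class CircuitBreaker {
  constructor({ failureThreshold = 5, successThreshold = 2,
                timeoutMs = 60000, now = Date.now } = {}) {
    Object.assign(this, { failureThreshold, successThreshold, timeoutMs, now });
    this.state = "CLOSED";
    this.failures = 0;
    this.successes = 0;
    this.openedAt = 0;
  }
  canRequest() {
    // after the timeout, let one probe through in HALF_OPEN
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.timeoutMs) {
      this.state = "HALF_OPEN";
      this.successes = 0;
    }
    return this.state !== "OPEN";
  }
  onSuccess() {
    if (this.state === "HALF_OPEN" && ++this.successes >= this.successThreshold) {
      this.state = "CLOSED";
      this.failures = 0;
    }
  }
  onFailure() {
    // a failed probe reopens immediately; otherwise count toward threshold
    if (this.state === "HALF_OPEN" || ++this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }
}

const cb = new CircuitBreaker();
for (let i = 0; i < 5; i++) cb.onFailure();
console.log(cb.state); // "OPEN"
```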
-
1616
- #### Grafana Dashboard
1617
-
1618
- For visualization, import the included Grafana dashboard (`monitoring/grafana-dashboard.json`) or create custom panels:
1619
- - Request rate and error rate over time
1620
- - Latency percentiles (p50, p95, p99)
1621
- - Token usage and cost tracking
1622
- - Circuit breaker state transitions
1623
- - Memory and CPU usage correlation
1624
-
1625
- ### Running with Docker
1626
-
1627
- A `Dockerfile` and `docker-compose.yml` are included for reproducible deployments.
1628
-
1629
- #### Build & run with Docker Compose
1630
-
1631
- ```bash
1632
- cp .env.example .env # populate with Databricks/Azure credentials, workspace path, etc.
1633
- docker compose up --build
1634
- ```
1635
-
1636
- The compose file exposes:
1637
-
1638
- - Proxy HTTP API on `8080`
1639
- - Optional SearxNG instance on `8888` (started automatically when `WEB_SEARCH_ENDPOINT` is the default)
1640
-
1641
- Workspace files are mounted into the container (`./:/workspace`), and `./data` is persisted for SQLite state. If you launch the proxy outside of this compose setup you must provide your own search backend and point `WEB_SEARCH_ENDPOINT` at it (for example, a self-hosted SearxNG instance). Without a reachable search service the `web_search` and `web_fetch` tools will return placeholder responses or fail.
1642
-
1643
- #### Manual Docker build
1644
-
1645
- ```bash
1646
- docker build -t claude-code-proxy .
1647
- docker run --rm -p 8080:8080 -p 8888:8888 \
1648
- -v "$(pwd)":/workspace \
1649
- -v "$(pwd)/data":/app/data \
1650
- --env-file .env \
1651
- claude-code-proxy
1652
- ```
1653
-
1654
- Adjust port and volume mappings to suit your environment. Ensure the container has access to the target workspace and required credentials.
1655
-
1656
- #### Direct `docker run` with inline environment variables
1657
-
1658
- ```bash
1659
- docker run --rm -p 8080:8080 \
1660
- -v "$(pwd)":/workspace \
1661
- -v "$(pwd)/data":/app/data \
1662
- -e MODEL_PROVIDER=databricks \
1663
- -e DATABRICKS_API_BASE=https://<workspace>.cloud.databricks.com \
1664
- -e DATABRICKS_ENDPOINT_PATH=/serving-endpoints/<endpoint-name>/invocations \
1665
- -e DATABRICKS_API_KEY=<personal-access-token> \
1666
- -e WORKSPACE_ROOT=/workspace \
1667
- -e PORT=8080 \
1668
- claude-code-proxy
1669
- ```
1670
-
1671
- Use additional `-e` flags (or `--env-file`) to pass Azure Anthropic credentials or other configuration values as needed.
1672
- Replace `<workspace>` and `<endpoint-name>` with your Databricks workspace host and the Serving Endpoint you want to target (e.g. `/serving-endpoints/databricks-gpt-4o-mini/invocations`) so you can choose any available model.
1673
-
1674
- ### Provider-specific behaviour
1675
-
1676
- - **Databricks** – Mirrors Anthropic's hosted behaviour. Automatic policy web fallbacks (`needsWebFallback`) can trigger an extra `web_fetch`, and the upstream service executes dynamic pages on your behalf.
1677
- - **OpenAI** – Connects directly to OpenAI's API for GPT-4o, GPT-4o-mini, o1, and other models. Full tool calling support with parallel tool execution enabled by default. Messages and tools are automatically converted between Anthropic and OpenAI formats. Supports organization-level API keys. Best used when you want direct access to OpenAI's latest models with the simplest setup.
1678
- - **Azure OpenAI** – Connects to Azure-hosted OpenAI models. Similar to direct OpenAI but through Azure's infrastructure for enterprise compliance, data residency, and Azure billing integration.
1679
- - **Azure Anthropic** – Requests are normalised to Azure's payload shape. The proxy disables automatic `web_fetch` fallbacks to avoid duplicate tool executions; instead, the assistant surfaces a diagnostic message and you can trigger the tool manually if required.
1680
- - **OpenRouter** – Connects to OpenRouter's unified API for access to 100+ models. Full tool calling support with automatic format conversion between Anthropic and OpenAI formats. Messages are converted to OpenAI's format, tool calls are properly translated, and responses are converted back to Anthropic-compatible format. Best used for cost optimization, model flexibility, or when you want to experiment with different models without changing your codebase.
1681
- - **Ollama** – Connects to locally-running Ollama models. Tool support varies by model (llama3.1, qwen2.5, mistral support tools; llama3 and older models don't). System prompts are merged into the first user message. Response format is converted from Ollama's format to Anthropic-compatible content blocks. Best used for simple text generation tasks, offline development, or as a cost-effective development environment.
1682
- - **llama.cpp** – Connects to a local llama-server instance running GGUF models. Uses OpenAI-compatible API format (`/v1/chat/completions`), enabling full tool calling support with grammar-based generation. Provides maximum performance with optimized C++ inference, lower memory usage through quantization, and support for any GGUF model from HuggingFace. Best used when you need maximum performance, specific quantization options, or models not available in Ollama.
1683
- - In all cases, `web_search` and `web_fetch` run locally. They do not execute JavaScript, so pages that render data client-side (Google Finance, etc.) will return scaffolding only. Prefer JSON/CSV quote APIs (e.g. Yahoo chart API) when you need live financial data.
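Because `web_fetch` returns static HTML only, live quotes are better pulled from a JSON endpoint and parsed. The sketch below is illustrative: the `curl` URL and response shape mimic Yahoo's v8 chart API but are assumptions, so a sample payload is written out rather than fetched live.

```bash
# Illustrative sketch: parse a chart-style quote payload. In practice you would
# fetch it first (the URL shape is an assumption; verify before relying on it):
#   curl -s "https://query1.finance.yahoo.com/v8/finance/chart/AAPL" > quote.json
cat > quote.json <<'EOF'
{"chart":{"result":[{"meta":{"symbol":"AAPL","regularMarketPrice":189.84}}]}}
EOF

# Extract the symbol and last price from the JSON payload
python3 - <<'PY'
import json
meta = json.load(open("quote.json"))["chart"]["result"][0]["meta"]
print(f'{meta["symbol"]}: {meta["regularMarketPrice"]}')
PY
```

The same parse works on the live response as long as the `chart.result[0].meta` shape holds.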
1684
-
1685
- ---
1686
-
1687
- ## Manual Test Matrix
1688
-
1689
- | Area | Scenario | Steps | Expected Outcome |
1690
- |------|----------|-------|------------------|
1691
- | **Indexing & Repo Intelligence** | Rebuild index | 1. `workspace_index_rebuild` 2. Inspect `CLAUDE.md` 3. Run `workspace_symbol_search` | CLAUDE.md and symbol catalog reflect current repo state. |
1692
- | | Remove file & reindex | 1. Delete a tracked file 2. Rebuild index 3. Search for removed symbol | Symbol search returns no hits; CLAUDE.md drops the file from language counts. |
1693
- | **Language Navigation** | Cross-file definition | 1. Choose TS symbol defined/imported across files 2. Search for symbol 3. Get references | Definition points to source file; references list usages in other files only. |
1694
- | | Unsupported language fallback | 1. Use Ruby file with unique method 2. Symbol search and references | Heuristic matches return without crashing. |
1695
- | **Project Summary** | After tests | 1. Run `workspace_index_rebuild` 2. Call `project_summary` | Summary includes latest test stats and style hints (e.g., ESLint). |
1696
- | | Missing coverage files | 1. Move coverage JSON 2. Call `project_summary` | Response notes missing coverage gracefully. |
1697
- | **Task Tracker** | CRUD flow | 1. `workspace_task_create` 2. `workspace_tasks_list` 3. `workspace_task_update` 4. `workspace_task_set_status` 5. `workspace_task_delete` | Tasks persist across calls; deletion removes entry. |
1698
- | **Git Guards** | Push policy | 1. `POLICY_GIT_ALLOW_PUSH=false` 2. `workspace_git_push` | Request denied with policy message. |
1699
- | | Require tests before commit | 1. `POLICY_GIT_REQUIRE_TESTS=true` 2. Attempt commit without running tests | Commit blocked until tests executed. |
1700
- | **Prompt Cache** | Cache hit | 1. Send identical prompt twice 2. Check logs | Second response logs cache hit; response usage shows `cache_read_input_tokens`. |
1701
- | **MCP** | Manifest discovery | 1. Add manifest 2. Restart proxy 3. Call `workspace_mcp_call` | MCP tools execute via JSON-RPC bridge. |
1702
- | **Health Checks** | Liveness probe | 1. `curl http://localhost:8080/health/live` | Returns 200 with basic health status. |
1703
- | | Readiness probe | 1. `curl http://localhost:8080/health/ready` | Returns 200 when ready, 503 during shutdown or unhealthy state. |
1704
- | **Metrics** | JSON metrics | 1. Make requests 2. `curl http://localhost:8080/metrics/observability` | Returns JSON with request counts, latency percentiles, token usage. |
1705
- | | Prometheus export | 1. Make requests 2. `curl http://localhost:8080/metrics/prometheus` | Returns Prometheus text format with counters and summaries. |
1706
- | | Circuit breaker state | 1. `curl http://localhost:8080/metrics/circuit-breakers` | Returns current state (CLOSED/OPEN/HALF_OPEN) for each provider. |
1707
- | **Load Shedding** | Overload protection | 1. Set low threshold 2. Make requests 3. Check response | Returns 503 with Retry-After header when overloaded. |
1708
- | **Circuit Breaker** | Failure threshold | 1. Simulate 5 consecutive failures 2. Check state | Circuit opens, subsequent requests fail fast with circuit breaker error. |
1709
- | | Recovery | 1. Wait for timeout 2. Make successful request | Circuit transitions to HALF_OPEN, then CLOSED after success threshold. |
1710
- | **Graceful Shutdown** | Zero-downtime | 1. Send SIGTERM 2. Check health endpoints 3. Wait for connections to drain | Health checks return 503, connections close gracefully within timeout. |
1711
- | **Input Validation** | Valid input | 1. Send valid request body 2. Check response | Request processes normally. |
1712
- | | Invalid input | 1. Send invalid request (missing required field) 2. Check response | Returns 400 with detailed validation errors. |
1713
- | **Error Handling** | Consistent format | 1. Trigger various errors (404, 500, validation) 2. Check responses | All errors follow consistent format with request ID. |
1714
- | **Request Logging** | Request ID correlation | 1. Make request with X-Request-ID header 2. Check logs 3. Check response headers | Logs show request ID, response includes same ID in header. |
1715
-
1716
- ---
1717
-
1718
- ## Troubleshooting
1719
-
1720
- ### General Issues
1721
-
1722
- - **`path must be a non-empty string` errors** – Tool calls like `fs_read` require explicit paths. Verify the CLI sent a valid `path` argument.
1723
- - **Agent loop exceeding limits** – Increase `POLICY_MAX_STEPS` or fix the misbehaving tool that is looping.
1724
- - **`spawn npm test ENOENT`** – Configure `WORKSPACE_TEST_COMMAND` or ensure `npm test` exists in the workspace.
1725
- - **MCP server not discovered** – Confirm manifests live inside `MCP_MANIFEST_DIRS` and contain executable commands. Check logs for discovery errors.
1726
- - **Prompt cache not activating** – Ensure `PROMPT_CACHE_ENABLED=true`. Cache only stores tool-free completions; tool use requests bypass caching by design.
1727
- - **Claude CLI prompts for missing tools** – Verify `tools` array in the client request lists the functions you expect. The proxy only exposes registered handlers.
1728
- - **Dynamic finance pages return stale data** – `web_fetch` fetches static HTML only. Use an API endpoint (e.g. Yahoo Finance chart JSON) or the Databricks-hosted tooling if you need rendered values from heavily scripted pages.
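The cache-hit behaviour above (an identical prompt sent twice serves the second response from cache) can be pictured with a toy key derivation. This is a hypothetical sketch, not Lynkr's actual keying, which may also fold in the model and sampling parameters:

```bash
# Hypothetical cache keying: hash the prompt text; identical prompts collide
# on the same key, so the second request becomes a cache hit.
cache_key() { printf '%s' "$1" | sha256sum | awk '{print $1}'; }

k1=$(cache_key "Summarize README.md")
k2=$(cache_key "Summarize README.md")
k3=$(cache_key "Summarize CHANGELOG.md")

[ "$k1" = "$k2" ] && echo "identical prompt -> cache hit"
[ "$k1" = "$k3" ] || echo "different prompt -> cache miss"
```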
1729
-
1730
- ### OpenRouter Issues
1731
-
1732
- - **"No choices in OpenRouter response" errors** – OpenRouter sometimes returns error responses (rate limits, model unavailable) with JSON but no `choices` array. As of the latest update, Lynkr gracefully handles these errors and returns proper error responses instead of crashing. Check logs for "OpenRouter response missing choices array" warnings to see the full error details.
1733
- - **Multi-prompt behavior with certain models** – Some OpenRouter models (particularly open-source models like `openai/gpt-oss-120b`) may be overly cautious and ask for confirmation multiple times before executing tools. This is model-specific behavior. Consider switching to:
1734
- - `anthropic/claude-3.5-sonnet` – More decisive tool execution
1735
- - `openai/gpt-4o` or `openai/gpt-4o-mini` – Better tool calling behavior
1736
- - Use Databricks provider with Claude models for optimal tool execution
1737
- - **Rate limit errors** – OpenRouter applies per-model rate limits. If you hit limits frequently, check your OpenRouter dashboard for current usage and consider upgrading your plan or spreading requests across multiple models.
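The missing-`choices` handling described above boils down to a defensive check before reading the first choice. The guard below is illustrative only (it is not Lynkr's source), run against a made-up rate-limit payload:

```bash
# Sample error body of the kind OpenRouter can return with no "choices" array
cat > resp.json <<'EOF'
{"error":{"code":429,"message":"Rate limit exceeded"}}
EOF

# Guard: surface the upstream error instead of crashing on resp["choices"][0]
python3 - <<'PY'
import json
resp = json.load(open("resp.json"))
choices = resp.get("choices")
if not choices:
    err = resp.get("error", {})
    print(f'OpenRouter response missing choices array: {err.get("message", "unknown error")}')
else:
    print(choices[0]["message"]["content"])
PY
```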
1738
-
1739
- ### Production Hardening Issues
1740
-
1741
- - **503 Service Unavailable errors during normal load** – Check load shedding thresholds (`LOAD_SHEDDING_*`). Thresholds set too low may shed load too aggressively. Check `/metrics/observability` for memory usage patterns.
1742
- - **Circuit breaker stuck in OPEN state** – Check `/metrics/circuit-breakers` to see failure counts. Verify backend service (Databricks/Azure) is accessible. Circuit will automatically attempt recovery after `CIRCUIT_BREAKER_TIMEOUT` (default: 60s).
1743
- - **"Circuit breaker is OPEN" errors** – The circuit breaker detected too many failures and is protecting against cascading failures. Wait for timeout or fix the underlying issue. Check logs for root cause of failures.
1744
- - **Azure OpenAI specific**: If using Azure OpenAI and seeing circuit breaker errors, verify your `AZURE_OPENAI_ENDPOINT` includes the full path (including `/openai/deployments/YOUR-DEPLOYMENT/chat/completions`). Missing endpoint variable or undefined returns can trigger circuit breaker protection.
1745
- - **High latency after adding production features** – This is unexpected; middleware adds only ~7μs overhead. Check `/metrics/prometheus` for actual latency distribution. Verify network latency to backend services.
1746
- - **Health check endpoint returns 503 but service seems healthy** – Check individual health check components in the response JSON. Database connectivity or memory issues may trigger this. Review logs for specific health check failures.
1747
- - **Metrics endpoint shows incorrect data** – Metrics are in-memory and reset on restart. For persistent metrics, configure Prometheus scraping. Check that `METRICS_ENABLED=true`.
1748
- - **Request IDs not appearing in logs** – Ensure `REQUEST_LOGGING_ENABLED=true`. Check that structured logging is configured correctly in `src/logger.js`.
1749
- - **Validation errors on valid requests** – Check request body against schemas in `src/api/middleware/validation.js`. Validation is strict by design. Review error details in 400 response.
1750
- - **Graceful shutdown not working** – Ensure process receives SIGTERM (not SIGKILL). Check `GRACEFUL_SHUTDOWN_TIMEOUT` is sufficient for your workload. Kubernetes needs proper `terminationGracePeriodSeconds`.
1751
- - **Prometheus scraping fails** – Verify `/metrics/prometheus` endpoint is accessible. Check Prometheus configuration targets and `metrics_path`. Ensure firewall rules allow scraping.
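The CLOSED/OPEN/HALF_OPEN transitions referenced above follow the standard circuit-breaker pattern. This toy state machine is illustrative, not Lynkr's implementation; the failure threshold of 5 and the recovery timeout come from the defaults mentioned in this README:

```bash
# Toy circuit breaker: CLOSED -> OPEN after 5 consecutive failures,
# OPEN -> HALF_OPEN once CIRCUIT_BREAKER_TIMEOUT elapses,
# HALF_OPEN -> CLOSED after a successful probe request.
state=CLOSED failures=0 threshold=5

record_failure() {
  failures=$((failures + 1))
  if [ "$failures" -ge "$threshold" ]; then state=OPEN; fi
}

for _ in 1 2 3 4 5; do record_failure; done
echo "after 5 failures: $state"        # requests now fail fast

state=HALF_OPEN                        # timeout elapsed, allow a probe
echo "after timeout: $state"

state=CLOSED failures=0                # probe succeeded
echo "after successful probe: $state"
```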
1752
-
1753
- ### Performance Debugging
1754
-
1755
- Run the included benchmarks to verify performance:
1756
- ```bash
1757
- # Run comprehensive test suite
1758
- node comprehensive-test-suite.js
1759
-
1760
- # Run performance benchmarks
1761
- node performance-benchmark.js
1762
- ```
1763
-
1764
- Expected results:
1765
- - Test pass rate: 100% (80/80 tests)
1766
- - Combined middleware overhead: <10μs per request
1767
- - Throughput: >100K requests/second
1768
-
1769
- If performance is degraded:
1770
- 1. Check `/metrics/observability` for latency patterns
1771
- 2. Review memory usage (should be <200MB for typical workload)
1772
- 3. Check circuit breaker states (stuck OPEN states add latency)
1773
- 4. Verify backend API latency (primary bottleneck)
1774
- 5. Review logs for retry patterns (excessive retries indicate backend issues)
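Step 1 above is about reading latency percentiles. To sanity-check the endpoint's numbers against raw timings, the nearest-rank method below reproduces typical p50/p95 reporting (the sample latencies are made up, and Lynkr's own percentile math may differ):

```bash
python3 - <<'PY'
import math

# Made-up per-request latencies in milliseconds; one slow backend call
latencies = sorted([12, 15, 11, 240, 13, 14, 18, 16, 13, 12])

def pct(p):
    # nearest-rank percentile: ceil(p/100 * n), 1-indexed
    return latencies[max(0, math.ceil(p / 100 * len(latencies)) - 1)]

print(f"p50={pct(50)}ms p95={pct(95)}ms")
PY
```

With these numbers p50 stays at 13 ms while p95 jumps to 240 ms, the signature of a backend outlier rather than middleware overhead.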
1775
-
1776
- ---
1777
-
1778
- ## Roadmap & Known Gaps
1779
-
1780
- ### ✅ Recently Completed
1781
-
1782
- **Production Hardening (All 14 features implemented with 100% pass rate):**
1783
- - ✅ Exponential backoff with jitter retry logic
1784
- - ✅ Circuit breaker pattern for external services
1785
- - ✅ Load shedding with resource monitoring
1786
- - ✅ Graceful shutdown for zero-downtime deployments
1787
- - ✅ HTTP connection pooling
1788
- - ✅ Comprehensive metrics collection (Prometheus format)
1789
- - ✅ Health check endpoints (Kubernetes-ready)
1790
- - ✅ Structured request logging with correlation IDs
1791
- - ✅ Consistent error handling with 8 error types
1792
- - ✅ Input validation (zero-dependency, JSON schema-like)
1793
- - ✅ Token budget enforcement
1794
- - ✅ Path allowlisting and sandboxing
1795
- - ✅ Rate limiting capabilities
1796
- - ✅ Safe command DSL
1797
-
1798
-
1799
- **Latest Features (December 2025):**
1800
- - ✅ **Client-side tool execution** (`TOOL_EXECUTION_MODE=client/passthrough`) – Tools can now execute on the Claude Code CLI side instead of the server, enabling local file operations, local commands, and access to local credentials
1801
- - ✅ **OpenRouter error resilience** – Graceful handling of malformed OpenRouter responses (missing `choices` array), preventing crashes during rate limits or service errors
1802
- - ✅ **Enhanced format conversion** – Improved Anthropic ↔ OpenRouter format conversion for tool calls, ensuring proper `tool_use` block generation and session consistency across providers
1803
-
1804
- ### 🔮 Future Enhancements
1805
-
1806
- - **Per-file diff comments & conversation threading** – Planned to mirror Claude's review UX.
1807
- - **Automated risk assessment tied to diffs** – Future enhancement leveraging test outcomes and static analysis.
1808
- - **Expanded language-server fidelity** – Currently Tree-sitter-based; deeper AST integration or LSP bridging is a future goal.
1809
- - **Claude Skills parity** – Skills are not reproduced; designing a safe, declarative skill layer is an open area.
1810
- - **Coverage dashboards & historical trends** – Test summary tracks latest runs but no long-term history yet.
1811
- - **Response caching** – Redis-backed response cache for frequently repeated requests (Option 3, Feature 13).
1812
-
1813
- ---
1814
-
1815
- ## Cursor IDE Integration (OpenAI API Compatibility)
1816
-
1817
- Lynkr provides **full Cursor IDE support** through OpenAI-compatible API endpoints, enabling you to use Cursor with any provider (Databricks, Bedrock, OpenRouter, Ollama, etc.) while maintaining all Cursor features.
1818
-
1819
- ### Why Use Lynkr with Cursor?
1820
-
1821
- - 💰 **60-80% cost savings** vs Cursor's default GPT-4 pricing
1822
- - 🔓 **Provider choice** - Use Claude, local models, or any supported provider
1823
- - 🏠 **Self-hosted** - Full control over your AI infrastructure
1824
- - ✅ **Full compatibility** - All Cursor features work (chat, autocomplete, @Codebase search)
1825
-
1826
- ### Quick Setup (5 Minutes)
1827
-
1828
- #### Step 1: Start Lynkr Server
1829
-
1830
- ```bash
1831
- # Navigate to Lynkr directory
1832
- cd /path/to/claude-code
1833
-
1834
- # Start with any provider (Databricks, Bedrock, OpenRouter, Ollama, etc.)
1835
- npm start
1836
-
1837
- # Wait for: "Server listening at http://0.0.0.0:8081" (or your configured PORT)
1838
- ```
1839
-
1840
- **Note**: Lynkr runs on port **8081** by default (configured in `.env` as `PORT=8081`)
1841
-
1842
- #### Step 2: Configure Cursor (Detailed Steps)
1843
-
1844
- 1. **Open Cursor Settings**
1845
- - Mac: Click **Cursor** menu → **Settings** (or press `Cmd+,`)
1846
- - Windows/Linux: Click **File** → **Settings** (or press `Ctrl+,`)
1847
-
1848
- 2. **Navigate to Models Section**
1849
- - In the Settings sidebar, find **Features** section
1850
- - Click on **Models**
1851
-
1852
- 3. **Configure OpenAI API Settings**
1853
-
1854
- Fill in these three fields:
1855
-
1856
- **API Key:**
1857
- ```
1858
- sk-lynkr
1859
- ```
1860
- *(Cursor requires a non-empty value, but Lynkr ignores it. You can use any text like "dummy" or "lynkr")*
1861
-
1862
- **Base URL:**
1863
- ```
1864
- http://localhost:8081/v1
1865
- ```
1866
- ⚠ïļ **Critical:**
1867
- - Use port **8081** (or your configured PORT in .env)
1868
- - Must end with `/v1`
1869
- - Include `http://` prefix
1870
-
1871
- **Model:**
1872
-
1873
- Choose based on your `MODEL_PROVIDER` in `.env`:
1874
- - **Bedrock**: `claude-3.5-sonnet` or `claude-sonnet-4.5`
1875
- - **Databricks**: `claude-sonnet-4.5`
1876
- - **OpenRouter**: `anthropic/claude-3.5-sonnet`
1877
- - **Ollama**: `qwen2.5-coder:latest` (or your OLLAMA_MODEL)
1878
- - **Azure OpenAI**: `gpt-4o` or your deployment name
1879
-
1880
- 4. **Save Settings** (auto-saves in Cursor)
1881
-
1882
- #### Step 3: Test the Integration
1883
-
1884
- **Test 1: Basic Chat** (`Cmd+L` / `Ctrl+L`)
1885
- ```
1886
- You: "Hello, can you see this?"
1887
- Expected: Response from Claude via Lynkr ✅
1888
- ```
1889
-
1890
- **Test 2: Inline Edits** (`Cmd+K` / `Ctrl+K`)
1891
- ```
1892
- 1. Select some code
1893
- 2. Press Cmd+K (Mac) or Ctrl+K (Windows/Linux)
1894
- 3. Type: "add a comment explaining this code"
1895
- Expected: Code suggestions appear inline ✅
1896
- ```
1897
-
1898
- **Test 3: @Codebase Search** (requires embeddings)
1899
- ```
1900
- You: "@Codebase where is the config file?"
1901
- Expected:
1902
- ✅ If embeddings configured: Semantic search finds relevant files
1903
- ❌ If no embeddings: Error or "not available" message
1904
- ```
1905
-
1906
- #### Step 4: Verify Lynkr Logs
1907
-
1908
- Check your terminal where Lynkr is running. You should see:
1909
- ```
1910
- [INFO] POST /v1/chat/completions
1911
- [INFO] Routing to bedrock (or your provider)
1912
- [INFO] Response sent: 200
1913
- ```
1914
-
1915
- ### Feature Compatibility Matrix
1916
-
1917
- | Feature | Without Embeddings | With Embeddings |
1918
- |---------|-------------------|-----------------|
1919
- | **Chat in current file** | ✅ Works | ✅ Works |
1920
- | **Inline autocomplete** | ✅ Works | ✅ Works |
1921
- | **Cmd+K edits** | ✅ Works | ✅ Works |
1922
- | **Manual @file references** | ✅ Works | ✅ Works |
1923
- | **Terminal commands** | ✅ Works | ✅ Works |
1924
- | **@Codebase semantic search** | ❌ Requires embeddings | ✅ Works |
1925
- | **Automatic context** | ❌ Requires embeddings | ✅ Works |
1926
- | **Find similar code** | ❌ Requires embeddings | ✅ Works |
1927
-
1928
- ### Enabling @Codebase Semantic Search (Optional)
1929
-
1930
- For Cursor's @Codebase semantic search, you need embeddings support.
1931
-
1932
- **⚡ Already using OpenRouter? You're all set!**
1933
-
1934
- If you configured `MODEL_PROVIDER=openrouter`, embeddings **work automatically** with the same `OPENROUTER_API_KEY` - no additional setup needed. OpenRouter handles both chat completions AND embeddings with one key.
1935
-
1936
- **🔧 Using a different provider? Add embeddings:**
1937
-
1938
- If you're using Databricks, Bedrock, Ollama, or other providers for chat, add ONE of these for embeddings (ordered by privacy):
1939
-
1940
- **Option A: Ollama (100% Local - Most Private) 🔒**
1941
- ```bash
1942
- ollama pull nomic-embed-text
1943
- # Add to .env
1944
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
1945
-
1946
- # Cost: FREE, Privacy: 100% local, Quality: Good
1947
- # No cloud APIs, perfect for privacy-sensitive work
1948
- ```
1949
-
1950
- **Option B: llama.cpp (100% Local - GGUF Models) 🔒**
1951
- ```bash
1952
- # Download model and start server
1953
- ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
1954
-
1955
- # Add to .env
1956
- LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
1957
-
1958
- # Cost: FREE, Privacy: 100% local, Quality: Good
1959
- # Uses quantized GGUF models for efficiency
1960
- ```
1961
-
1962
- **Option C: OpenRouter (Cloud - Cheapest Cloud Option)**
1963
- ```bash
1964
- # Add to .env
1965
- OPENROUTER_API_KEY=sk-or-v1-your-key-here
1966
-
1967
- # Get key from: https://openrouter.ai/keys
1968
- # Cost: ~$0.0001 per 1K tokens (~$0.01-0.10/month typical usage)
1969
- # Privacy: Cloud, Quality: Excellent
1970
-
1971
- # Advanced: Use separate models for chat vs embeddings (optional)
1972
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet # Chat model
1973
- OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small # Embeddings model
1974
- # (Defaults to text-embedding-ada-002 if not specified)
1975
- ```
1976
-
1977
- **Option D: OpenAI Direct (Cloud)**
1978
- ```bash
1979
- # Add to .env
1980
- OPENAI_API_KEY=sk-your-key-here
1981
-
1982
- # Get key from: https://platform.openai.com/api-keys
1983
- # Cost: ~$0.0001 per 1K tokens
1984
- # Privacy: Cloud, Quality: Excellent
1985
- ```
1986
-
1987
- **🎯 Advanced: Override Embeddings Provider Explicitly (Optional)**
1988
-
1989
- By default, embeddings use the same provider as `MODEL_PROVIDER` (if supported) or automatically select the first available provider. To force a specific provider:
1990
-
1991
- ```bash
1992
- # Add to .env to explicitly choose embeddings provider
1993
- EMBEDDINGS_PROVIDER=ollama # Use Ollama embeddings
1994
- # OR
1995
- EMBEDDINGS_PROVIDER=llamacpp # Use llama.cpp embeddings
1996
- # OR
1997
- EMBEDDINGS_PROVIDER=openrouter # Use OpenRouter embeddings
1998
- # OR
1999
- EMBEDDINGS_PROVIDER=openai # Use OpenAI embeddings
2000
- ```
2001
-
2002
- **Example use case**: Use Databricks for chat but force Ollama for embeddings (privacy)
2003
- ```bash
2004
- MODEL_PROVIDER=databricks
2005
- DATABRICKS_API_KEY=your-key
2006
- EMBEDDINGS_PROVIDER=ollama # Force local embeddings
2007
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
2008
- ```
2009
-
2010
- #### Embedding Models Comparison
2011
-
2012
- | Option | Privacy | Cost/Month | Setup Complexity | Quality | Speed |
2013
- |--------|---------|------------|------------------|---------|-------|
2014
- | **Ollama** | 🔒 100% Local | FREE | Easy (1 command) | Good | Fast |
2015
- | **llama.cpp** | 🔒 100% Local | FREE | Medium (download GGUF) | Good | Fast |
2016
- | **OpenRouter** | ☁️ Cloud | $0.01-0.10 | Easy (get API key) | Excellent | Very Fast |
2017
- | **OpenAI** | ☁️ Cloud | $0.01-0.10 | Easy (get API key) | Excellent | Very Fast |
2018
-
2019
- **Recommended setups:**
2020
- - **100% Local/Private**: Ollama chat + Ollama embeddings (zero cloud dependencies)
2021
- - **Hybrid**: Databricks/Bedrock chat + Ollama embeddings (private search, cloud chat)
2022
- - **Simple Cloud**: OpenRouter chat + OpenRouter embeddings (one key for both)
2023
-
2024
- **Restart Lynkr**, and @Codebase will work!
2025
-
2026
- ### Available Endpoints
2027
-
2028
- Lynkr implements the four OpenAI API endpoints Cursor needs for full compatibility:
2029
-
2030
- 1. **POST /v1/chat/completions** - Chat with streaming support
2031
- - Handles all chat/completion requests
2032
- - Converts OpenAI format ↔ Anthropic format automatically
2033
- - Full tool calling support
2034
-
2035
- 2. **GET /v1/models** - List available models
2036
- - Returns models based on configured provider
2037
- - Updates dynamically when you change providers
2038
-
2039
- 3. **POST /v1/embeddings** - Generate embeddings
2040
- - Required for @Codebase semantic search
2041
- - Supports: Ollama (local), llama.cpp (local), OpenRouter (cloud), OpenAI (cloud)
2042
- - Smart provider detection: explicit → same as chat → first available
2043
- - Falls back gracefully if not configured (returns 501)
2044
-
2045
- 4. **GET /v1/health** - Health check
2046
- - Verify Lynkr is running
2047
- - Check provider status
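The "explicit → same as chat → first available" embeddings resolution can be sketched as a shell function. The env var names are the ones used in this README; the ordering logic itself is an illustrative assumption, not Lynkr's actual source:

```bash
# Sketch of embeddings provider resolution:
# 1) explicit EMBEDDINGS_PROVIDER, 2) MODEL_PROVIDER if it supports embeddings,
# 3) first provider with credentials configured, else "none".
pick_embeddings_provider() {
  if [ -n "$EMBEDDINGS_PROVIDER" ]; then echo "$EMBEDDINGS_PROVIDER"; return; fi
  case "$MODEL_PROVIDER" in
    ollama|llamacpp|openrouter|openai) echo "$MODEL_PROVIDER"; return ;;
  esac
  if   [ -n "$OLLAMA_EMBEDDINGS_MODEL" ];      then echo "ollama"
  elif [ -n "$LLAMACPP_EMBEDDINGS_ENDPOINT" ]; then echo "llamacpp"
  elif [ -n "$OPENROUTER_API_KEY" ];           then echo "openrouter"
  elif [ -n "$OPENAI_API_KEY" ];               then echo "openai"
  else echo "none"; fi
}

# Databricks chat + local Ollama embeddings resolves to "ollama"
EMBEDDINGS_PROVIDER= MODEL_PROVIDER=databricks OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text \
  pick_embeddings_provider
```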
2048
-
2049
- ### Cost Comparison (100K requests/month)
2050
-
2051
- | Setup | Monthly Cost | Embeddings Setup | Features | Privacy |
2052
- |-------|--------------|------------------|----------|---------|
2053
- | **Cursor native (GPT-4)** | $20-50 | Built-in | All features | Cloud |
2054
- | **Lynkr + OpenRouter** | $5-10 | ⚡ **Same key for both** | All features, simplest setup | Cloud |
2055
- | **Lynkr + Databricks** | $15-30 | +Ollama/OpenRouter | All features | Cloud chat, local/cloud search |
2056
- | **Lynkr + Ollama + Ollama embeddings** | **100% FREE** 🔒 | Ollama (local) | All features, 100% local | 100% Local |
2057
- | **Lynkr + Ollama + llama.cpp embeddings** | **100% FREE** 🔒 | llama.cpp (local) | All features, 100% local | 100% Local |
2058
- | **Lynkr + Ollama + OpenRouter embeddings** | $0.01-0.10 | OpenRouter (cloud) | All features, hybrid | Local chat, cloud search |
2059
- | **Lynkr + Ollama (no embeddings)** | **FREE** | None | Chat/Cmd+K only, no @Codebase | 100% Local |
2060
-
2061
- ### Provider Recommendations for Cursor
2062
-
2063
- **Best for Privacy (100% Local) 🔒:**
2064
- - **Ollama + Ollama embeddings** - Zero cloud dependencies, completely private
2065
- - Cost: **100% FREE**
2066
- - Privacy: All data stays on your machine
2067
- - Full @Codebase support with local embeddings
2068
- - Perfect for: Sensitive codebases, offline work, privacy requirements
2069
-
2070
- **Best for Simplicity (Recommended for most users):**
2071
- - **OpenRouter** - ONE key for chat + embeddings, no extra setup
2072
- - Cost: ~$5-10/month (100K requests)
2073
- - Full @Codebase support out of the box
2074
- - Access to 100+ models (Claude, GPT, Llama, etc.)
2075
-
2076
- **Best for Production:**
2077
- - **Databricks** - Claude Sonnet 4.5, enterprise-grade
2078
- - **Bedrock** - AWS infrastructure, 100+ models
2079
- - Add Ollama embeddings (local) or OpenRouter (cloud) for @Codebase
2080
-
2081
- **Best for Hybrid (Local Chat + Cloud Search):**
2082
- - **Ollama** - FREE (local, offline) + $0.01-0.10/month for cloud embeddings
2083
- - Privacy: 100% local chat, cloud @Codebase search
2084
- - Add OpenRouter key only for @Codebase search
2085
-
2086
- **Best for Speed:**
2087
- - **Ollama** - Local inference, 100-500ms latency
2088
- - **llama.cpp** - Optimized GGUF models, 50-300ms latency
2089
- - **OpenRouter** - Fast cloud inference, global CDN
2090
-
2091
- ### Troubleshooting
2092
-
2093
- #### Issue: "Connection refused" or "Network error"
2094
-
2095
- **Symptoms:** Cursor shows connection errors, can't reach Lynkr
2096
-
2097
- **Solutions:**
2098
- 1. **Verify Lynkr is running:**
2099
- ```bash
2100
- # Check if Lynkr process is running on port 8081
2101
- lsof -i :8081
2102
- # Should show node process
2103
- ```
2104
-
2105
- 2. **Test health endpoint:**
2106
- ```bash
2107
- curl http://localhost:8081/v1/health
2108
- # Should return: {"status":"ok"}
2109
- ```
2110
-
2111
- 3. **Check port number:**
2112
- - Verify Cursor Base URL uses correct port: `http://localhost:8081/v1`
2113
- - Check `.env` file: `PORT=8081`
2114
- - If you changed PORT, update Cursor settings to match
2115
-
2116
- 4. **Verify URL format:**
2117
- - ✅ Correct: `http://localhost:8081/v1`
2118
- - ❌ Wrong: `http://localhost:8081` (missing `/v1`)
2119
- - ❌ Wrong: `localhost:8081/v1` (missing `http://`)
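The format checklist above can be automated with a small helper (illustrative, not part of Lynkr) before pasting the value into Cursor:

```bash
# Validate a Cursor Base URL: needs an http(s):// prefix and a /v1 suffix
check_base_url() {
  case "$1" in
    http://*/v1|https://*/v1) echo "ok: $1" ;;
    *://*)                    echo "error: must end with /v1" ;;
    *)                        echo "error: missing http:// prefix" ;;
  esac
}

check_base_url "http://localhost:8081/v1"   # ok
check_base_url "http://localhost:8081"      # error: must end with /v1
check_base_url "localhost:8081/v1"          # error: missing http:// prefix
```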
2120
-
2121
- #### Issue: "Invalid API key" or "Unauthorized"
2122
-
2123
- **Symptoms:** Cursor says API key is invalid
2124
-
2125
- **Solutions:**
2126
- - Lynkr doesn't validate API keys from Cursor
2127
- - This error means Cursor isn't reaching Lynkr at all
2128
- - Double-check Base URL in Cursor: `http://localhost:8081/v1`
2129
- - Make sure you included `/v1` at the end
2130
- - Try clearing and re-entering the Base URL
2131
-
2132
- #### Issue: "Model not found" or "Invalid model"
2133
-
2134
- **Symptoms:** Cursor can't find the model you specified
2135
-
2136
- **Solutions:**
2137
- 1. **Match model name to your provider:**
2138
- - **Bedrock**: Use `claude-3.5-sonnet` or `claude-sonnet-4.5`
2139
- - **Databricks**: Use `claude-sonnet-4.5`
2140
- - **OpenRouter**: Use `anthropic/claude-3.5-sonnet`
2141
- - **Ollama**: Use your actual model name like `qwen2.5-coder:latest`
2142
-
2143
- 2. **Try generic names:**
2144
- - Lynkr translates generic names, so try:
2145
- - `claude-3.5-sonnet`
2146
- - `gpt-4o`
2147
- - `claude-sonnet-4.5`
2148
-
2149
- 3. **Check provider logs:**
2150
- - Look at Lynkr terminal for actual error from provider
2151
- - Provider might not support the model you requested
2152
-
2153
- #### Issue: @Codebase doesn't work
2154
-
2155
- **Symptoms:** @Codebase search shows "not available" or errors
2156
-
2157
- **Solutions:**
2158
- 1. **Configure embeddings** (choose ONE):
2159
-
2160
- **Option A: Ollama (Local, Free)**
2161
- ```bash
2162
- # Pull embedding model
2163
- ollama pull nomic-embed-text
2164
-
2165
- # Add to .env
2166
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
2167
-
2168
- # Restart Lynkr
2169
- npm start
2170
- ```
2171
-
2172
- **Option B: llama.cpp (Local, Free)**
2173
- ```bash
2174
- # Start llama.cpp server with embedding model
2175
- ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
2176
-
2177
- # Add to .env
2178
- LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
2179
-
2180
- # Restart Lynkr
2181
- npm start
2182
- ```
2183
-
2184
- **Option C: OpenRouter (Cloud, ~$0.01-0.10/month)**
2185
- ```bash
2186
- # Add to .env
2187
- OPENROUTER_API_KEY=sk-or-v1-your-key-here
2188
-
2189
- # Restart Lynkr
2190
- npm start
2191
- ```
2192
-
2193
- 2. **Test embeddings endpoint:**
2194
- ```bash
2195
- curl http://localhost:8081/v1/embeddings \
2196
- -H "Content-Type: application/json" \
2197
- -d '{"input": "test"}'
2198
-
2199
- # Should return JSON with embeddings array
2200
- # If 501 error: Embeddings not configured
2201
- ```
2202
-
2203
- 3. **Check Lynkr logs:**
2204
- - Look for: `Embeddings not configured` warning
2205
- - Or: `Generating embeddings with ollama/openrouter/etc.`
2206
-
2207
- #### Issue: Chat works but responses are slow
2208
-
2209
- **Symptoms:** Long wait times for responses
2210
-
2211
- **Solutions:**
2212
- 1. **Check provider latency:**
2213
- - **Ollama/llama.cpp**: Local models may be slow on weak hardware
2214
- - **Cloud providers**: Check your internet connection
2215
- - **Bedrock/Databricks**: Check region latency
2216
-
2217
- 2. **Optimize Ollama:**
2218
- ```bash
2219
- # Use smaller/faster models
2220
- ollama pull qwen2.5-coder:1.5b # Smaller = faster
2221
- ```
2222
-
2223
- 3. **Check Lynkr logs:**
2224
- - Look for actual response times
2225
- - Example: `Response time: 2500ms`
2226
-
2227
- #### Issue: Autocomplete doesn't work with Lynkr
2228
-
2229
- **Symptoms:** Code autocomplete suggestions don't appear
2230
-
2231
- **This is expected behavior:**
2232
- - Cursor's inline autocomplete uses Cursor's built-in models (fast, local)
2233
- - Autocomplete does NOT go through Lynkr
2234
- - Only these features use Lynkr:
2235
- - ✅ Chat (`Cmd+L` / `Ctrl+L`)
2236
- - ✅ Cmd+K inline edits
2237
- - ✅ @Codebase search
2238
- - ❌ Autocomplete (uses Cursor's models)
2239
-
2240
- #### Issue: Embeddings work but search results are poor
2241
-
2242
- **Symptoms:** @Codebase returns irrelevant files
2243
-
2244
- **Solutions:**
2245
- 1. **Try better embedding models:**
2246
- ```bash
2247
- # For Ollama - upgrade to larger model
2248
- ollama pull mxbai-embed-large # Better quality than nomic-embed-text
2249
- OLLAMA_EMBEDDINGS_MODEL=mxbai-embed-large
2250
- ```
2251
-
2252
- 2. **Use cloud embeddings for better quality:**
2253
- ```bash
2254
- # OpenRouter has excellent embeddings
2255
- OPENROUTER_API_KEY=sk-or-v1-your-key
2256
- ```
2257
-
2258
- 3. **This is a Cursor indexing issue, not Lynkr:**
2259
- - Cursor needs to re-index your codebase
2260
- - Try closing and reopening the workspace
2261
-
- #### Issue: "Too many requests" or rate limiting
-
- **Symptoms:** Provider returns 429 errors
-
- **Solutions:**
- 1. **Enable a fallback provider:**
-    ```bash
-    # In .env
-    FALLBACK_ENABLED=true
-    FALLBACK_PROVIDER=openrouter  # Or another provider
-    ```
-
- 2. **Use a different provider:**
-    - Some providers have higher rate limits
-    - OpenRouter aggregates multiple providers
-
- 3. **Check your provider dashboard:**
-    - Verify you haven't exceeded quota
-    - Some providers have free tier limits
-
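Solution 1 above shifts traffic to a fallback provider; when scripting against a rate-limited endpoint yourself, the standard companion is capped exponential backoff. A minimal sketch (illustrative only; this is not Lynkr's internal retry logic):

```shell
#!/usr/bin/env bash
# Compute capped exponential backoff delays (in seconds) for retrying
# after a 429. Attempt 1 waits base seconds, attempt 2 waits 2x, etc.
backoff_delay() {
  local attempt=$1 base=${2:-1} cap=${3:-60}
  local delay=$(( base * (2 ** (attempt - 1)) ))
  if (( delay > cap )); then delay=$cap; fi
  echo "$delay"
}

# Delays for five attempts with a 1s base and 60s cap: 1 2 4 8 16
for i in 1 2 3 4 5; do
  backoff_delay "$i"
done
```

Wrap your `curl` call in a loop that sleeps `$(backoff_delay "$attempt")` between tries; add jitter in production so concurrent clients don't retry in lockstep.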
- #### Getting More Help
-
- **Check Lynkr logs in terminal:**
- ```bash
- # Look for error messages in the terminal where Lynkr is running
- # Logs show:
- # - Which provider was used
- # - Request/response details
- # - Error messages from providers
- ```
-
- **Test individual endpoints:**
- ```bash
- # Test health
- curl http://localhost:8081/v1/health
-
- # Test chat
- curl http://localhost:8081/v1/chat/completions \
-   -H "Content-Type: application/json" \
-   -d '{"model":"claude-3.5-sonnet","messages":[{"role":"user","content":"hi"}]}'
-
- # Test embeddings
- curl http://localhost:8081/v1/embeddings \
-   -H "Content-Type: application/json" \
-   -d '{"input":"test"}'
- ```
-
- **Enable debug logging:**
- ```bash
- # In .env
- LOG_LEVEL=debug
-
- # Restart Lynkr
- npm start
- ```
-
- ### Architecture
-
- ```
- Cursor IDE
-   ↓ OpenAI API format
- Lynkr Proxy
-   ↓ Converts to Anthropic format
- Your Provider (Databricks/Bedrock/OpenRouter/Ollama/etc.)
-   ↓ Returns response
- Lynkr Proxy
-   ↓ Converts back to OpenAI format
- Cursor IDE (displays result)
- ```
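The conversion step in the diagram can be sketched with `jq`. This is a simplified illustration of the payload translation, not Lynkr's actual converter (which also handles tools, streaming, and images); it assumes `jq` is installed:

```shell
#!/usr/bin/env bash
# Sketch: map an OpenAI-style chat body to an Anthropic-style one.
# Anthropic puts the system prompt in a top-level "system" field and
# requires max_tokens; everything else stays in "messages".
openai_to_anthropic() {
  jq '{
    model: .model,
    max_tokens: (.max_tokens // 1024),
    system: ([.messages[] | select(.role == "system") | .content] | join("\n")),
    messages: [.messages[] | select(.role != "system")]
  }'
}

echo '{"model":"claude-3.5-sonnet","messages":[{"role":"system","content":"Be brief."},{"role":"user","content":"hi"}]}' \
  | openai_to_anthropic
```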
-
- ### Advanced Configuration
-
- #### Setup 1: Simplest (One key for everything - OpenRouter)
- ```bash
- # Chat + Embeddings: OpenRouter handles both with ONE key
- MODEL_PROVIDER=openrouter
- OPENROUTER_API_KEY=sk-or-v1-your-key-here
-
- # Optional: Use different models for chat vs embeddings
- OPENROUTER_MODEL=anthropic/claude-3.5-sonnet  # Chat
- OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small  # Embeddings
-
- # That's it! Both chat and @Codebase search work.
- # Cost: ~$5-10/month for 100K requests
- ```
-
- #### Setup 2: Most Private (100% Local - Ollama + Ollama)
- ```bash
- # Chat: Ollama local model (FREE)
- MODEL_PROVIDER=ollama
- OLLAMA_MODEL=qwen2.5-coder:latest
- OLLAMA_ENDPOINT=http://localhost:11434
-
- # Embeddings: Ollama local embeddings (FREE)
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
- OLLAMA_EMBEDDINGS_ENDPOINT=http://localhost:11434/api/embeddings
-
- # Zero cloud dependencies, 100% private
- # Cost: FREE
- # Privacy: All data stays on your machine
- ```
-
- #### Setup 3: Most Private (100% Local - Ollama + llama.cpp)
- ```bash
- # Chat: Ollama local model (FREE)
- MODEL_PROVIDER=ollama
- OLLAMA_MODEL=qwen2.5-coder:latest
-
- # Embeddings: llama.cpp with GGUF model (FREE)
- LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
-
- # Start llama.cpp separately:
- # ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
-
- # Cost: FREE
- # Privacy: All data stays on your machine
- ```
-
- #### Setup 4: Hybrid (Premium Chat + Local Search)
- ```bash
- # Chat: Databricks Claude Sonnet 4.5 (best quality)
- MODEL_PROVIDER=databricks
- DATABRICKS_API_KEY=your-key
- DATABRICKS_API_BASE=https://your-workspace.databricks.com
-
- # Embeddings: Ollama local (private @Codebase search)
- EMBEDDINGS_PROVIDER=ollama  # Force Ollama for embeddings
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
-
- # Cost: ~$15-30/month for chat, FREE for embeddings
- # Privacy: Cloud chat, local @Codebase search
- ```
-
- #### Setup 5: Hybrid (Premium Chat + Cloud Search)
- ```bash
- # Chat: Bedrock with Claude (AWS infrastructure)
- MODEL_PROVIDER=bedrock
- AWS_BEDROCK_API_KEY=your-key
- AWS_BEDROCK_REGION=us-east-2
- AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
-
- # Embeddings: OpenRouter (cheaper than Bedrock embeddings)
- OPENROUTER_API_KEY=sk-or-v1-your-key-here
-
- # Cost: ~$15-30/month for chat + $0.01-0.10/month for embeddings
- ```
-
- #### Setup 6: Cost Optimized (Hybrid Routing)
- ```bash
- # Simple queries → Ollama (FREE, local)
- # Complex queries → Databricks (premium, cloud)
- MODEL_PROVIDER=ollama
- PREFER_OLLAMA=true
- FALLBACK_ENABLED=true
- FALLBACK_PROVIDER=databricks
-
- # Embeddings: Local for privacy
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
-
- # Cost: Mostly FREE (Ollama handles 70-80% of requests)
- # Only complex tool-heavy requests go to Databricks
- ```
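Before starting any of these setups, it helps to verify that the variables the chosen provider needs are actually set. A hypothetical helper (the variable names mirror the setups above; the function itself is not part of Lynkr):

```shell
#!/usr/bin/env bash
# Map each provider to the env vars it needs (names taken from the
# example setups above), then check they are non-empty.
required_vars() {
  case "$1" in
    openrouter) echo "OPENROUTER_API_KEY" ;;
    databricks) echo "DATABRICKS_API_KEY DATABRICKS_API_BASE" ;;
    bedrock)    echo "AWS_BEDROCK_API_KEY AWS_BEDROCK_REGION AWS_BEDROCK_MODEL_ID" ;;
    ollama)     echo "OLLAMA_MODEL" ;;
    *)          echo "" ;;
  esac
}

check_provider_env() {
  local missing=0 v
  for v in $(required_vars "$1"); do
    if [ -z "${!v:-}" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_provider_env "$MODEL_PROVIDER" || exit 1
```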
-
- ### Visual Setup Summary
-
- ```
- ┌─────────────────────────────────────────────────────────┐
- │  Cursor Settings → Models → OpenAI API                  │
- ├─────────────────────────────────────────────────────────┤
- │                                                         │
- │  API Key:   sk-lynkr                                    │
- │             (or any non-empty value)                    │
- │                                                         │
- │  Base URL:  http://localhost:8081/v1                    │
- │             ⚠️  Must include /v1                        │
- │                                                         │
- │  Model:     claude-3.5-sonnet                           │
- │             (or your provider's model)                  │
- │                                                         │
- └─────────────────────────────────────────────────────────┘
- ```
-
- ### What Makes This Different from Cursor Native?
-
- | Aspect | Cursor Native | Lynkr + Cursor |
- |--------|---------------|----------------|
- | **Providers** | OpenAI only | 9+ providers (Bedrock, Databricks, OpenRouter, Ollama, llama.cpp, etc.) |
- | **Costs** | OpenAI pricing | 60-80% cheaper (or 100% FREE with Ollama) |
- | **Privacy** | Cloud-only | Can run 100% locally (Ollama + local embeddings) |
- | **Embeddings** | Built-in (cloud) | 4 options: Ollama (local), llama.cpp (local), OpenRouter (cloud), OpenAI (cloud) |
- | **Control** | Black box | Full observability, logs, metrics |
- | **Features** | All Cursor features | All Cursor features (chat, Cmd+K, @Codebase) |
- | **Flexibility** | Fixed setup | Mix providers (e.g., Bedrock chat + Ollama embeddings) |
-
- ---
-
- ## Frequently Asked Questions (FAQ)
-
- <details>
- <summary><strong>Q: Can I use Lynkr with the official Claude Code CLI?</strong></summary>
-
- **A:** Yes! Lynkr is designed as a drop-in replacement for Anthropic's backend. Simply set `ANTHROPIC_BASE_URL` to point to your Lynkr server:
-
- ```bash
- export ANTHROPIC_BASE_URL=http://localhost:8081
- export ANTHROPIC_API_KEY=dummy  # Required by CLI, but ignored by Lynkr
- claude "Your prompt here"
- ```
-
- **Note:** Default port is 8081 (configured in `.env` as `PORT=8081`)
-
- *Related searches: Claude Code proxy setup, Claude Code alternative backend, self-hosted Claude Code*
- </details>
-
- <details>
- <summary><strong>Q: How much money does Lynkr save on token costs?</strong></summary>
-
- **A:** With all 5 optimization phases enabled, Lynkr achieves **up to 60-80% token reduction**. Typical savings by workload:
-
- - **Normal workloads:** 20-30% reduction
- - **Memory-heavy:** 30-45% reduction
- - **Tool-heavy:** 25-35% reduction
- - **Long conversations:** 35-40% reduction
-
- At 100k requests/month, this translates to **$6,400-9,600/month savings** ($77k-115k/year).
-
- *Related searches: Claude Code cost reduction, token optimization strategies, AI cost savings*
- </details>
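The dollar figures above follow from simple arithmetic: savings = monthly spend × reduction rate. A quick sanity check in bash (the baseline spend figures are assumptions chosen to reproduce the quoted range, not measured data; whole dollars keep us inside bash integer arithmetic):

```shell
#!/usr/bin/env bash
# savings = monthly_spend_dollars * reduction_percent / 100
monthly_savings() {
  echo $(( $1 * $2 / 100 ))
}

monthly_savings 10667 60   # -> 6400  (low end: ~$10.7k baseline spend, 60% reduction)
monthly_savings 12000 80   # -> 9600  (high end: $12k baseline spend, 80% reduction)
echo $(( $(monthly_savings 12000 80) * 12 ))   # -> 115200 per year
```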
-
- <details>
- <summary><strong>Q: Can I use Ollama models with Lynkr and Cursor?</strong></summary>
-
- **A:** Yes! Ollama works for both chat AND embeddings (100% local, FREE):
-
- **Chat setup:**
- ```bash
- export MODEL_PROVIDER=ollama
- export OLLAMA_MODEL=qwen2.5-coder:latest  # or llama3.1, mistral, etc.
- lynkr start
- ```
-
- **Embeddings setup (for @Codebase search):**
- ```bash
- # Pull embedding model
- ollama pull nomic-embed-text
-
- # Add to .env
- OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
- ```
-
- **Best Ollama models for coding:**
- - **Chat**: `qwen2.5-coder:latest` (7B) - Optimized for code generation
- - **Chat**: `llama3.1:8b` - General-purpose, good balance
- - **Chat**: `codellama:13b` - Higher quality, needs more RAM
- - **Embeddings**: `nomic-embed-text` (137M) - Best all-around
- - **Embeddings**: `mxbai-embed-large` (335M) - Higher quality
-
- **100% local, 100% private, 100% FREE!** 🔒
-
- *Related searches: Ollama Claude Code integration, local LLM for coding, offline AI assistant, private embeddings*
- </details>
-
- <details>
- <summary><strong>Q: How do I enable @Codebase search in Cursor with Lynkr?</strong></summary>
-
- **A:** @Codebase semantic search requires embeddings. Choose ONE option:
-
- **Option 1: Ollama (100% Local, FREE)** 🔒
- ```bash
- ollama pull nomic-embed-text
- # Add to .env: OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
- ```
-
- **Option 2: llama.cpp (100% Local, FREE)** 🔒
- ```bash
- ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
- # Add to .env: LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
- ```
-
- **Option 3: OpenRouter (Cloud, ~$0.01-0.10/month)**
- ```bash
- # Add to .env: OPENROUTER_API_KEY=sk-or-v1-your-key
- ```
-
- **Option 4: OpenAI (Cloud, ~$0.01-0.10/month)**
- ```bash
- # Add to .env: OPENAI_API_KEY=sk-your-key
- ```
-
- **After configuring, restart Lynkr.** @Codebase will then work in Cursor!
-
- **Smart provider detection:**
- - Uses same provider as chat (if supported)
- - Or automatically selects first available
- - Or use `EMBEDDINGS_PROVIDER=ollama` to force a specific provider
-
- *Related searches: Cursor @Codebase setup, semantic code search, local embeddings, private codebase search*
- </details>
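The detection order described above can be sketched as a precedence check. This is an illustration of the behavior, not Lynkr's source; the env var names come from the options listed above:

```shell
#!/usr/bin/env bash
# Precedence: explicit override > chat provider (if it can embed) >
# first configured embeddings backend > none.
pick_embeddings_provider() {
  if [ -n "${EMBEDDINGS_PROVIDER:-}" ]; then
    echo "$EMBEDDINGS_PROVIDER"; return
  fi
  case "${MODEL_PROVIDER:-}" in
    ollama|openrouter|openai) echo "$MODEL_PROVIDER"; return ;;
  esac
  if   [ -n "${OLLAMA_EMBEDDINGS_MODEL:-}" ];      then echo ollama
  elif [ -n "${LLAMACPP_EMBEDDINGS_ENDPOINT:-}" ]; then echo llamacpp
  elif [ -n "${OPENROUTER_API_KEY:-}" ];           then echo openrouter
  elif [ -n "${OPENAI_API_KEY:-}" ];               then echo openai
  else echo none
  fi
}
```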
-
- <details>
- <summary><strong>Q: What are the performance differences between providers?</strong></summary>
-
- **A:** Performance comparison:
-
- | Provider | Latency | Cost | Tool Support | Best For |
- |----------|---------|------|--------------|----------|
- | **Databricks/Azure** | 500ms-2s | $$$ | Excellent | Enterprise production |
- | **OpenRouter** | 300ms-1.5s | $$ | Excellent | Flexibility, cost optimization |
- | **Ollama** | 100-500ms | Free | Limited | Local development, privacy |
- | **llama.cpp** | 50-300ms | Free | Limited | Maximum performance |
-
- *Related searches: LLM provider comparison, Claude Code performance, best AI model for coding*
- </details>
-
- <details>
- <summary><strong>Q: Is this an exact drop-in replacement for Anthropic's backend?</strong></summary>
-
- **A:** No. Lynkr mimics key Claude Code CLI behaviors but is intentionally extensible. Some premium Anthropic features (Claude Skills, hosted sandboxes) are out of scope for self-hosted deployments.
-
- **What works:** Core workflows (chat, tool calls, repo operations, Git integration, MCP servers)
- **What's different:** Self-hosted = you control infrastructure, security, and scaling
-
- *Related searches: Claude Code alternatives, self-hosted AI coding assistant*
- </details>
-
- <details>
- <summary><strong>Q: How does Lynkr compare with Anthropic's hosted backend?</strong></summary>
-
- **A:** Functionally they overlap on core workflows (chat, tool calls, repo ops), but differ in scope:
-
- | Capability | Anthropic Hosted Backend | Claude Code Proxy |
- |------------|-------------------------|-------------------|
- | Claude models | Anthropic-operated Sonnet/Opus | Adapters for Databricks (default), Azure Anthropic, OpenRouter (100+ models), and Ollama (local models) |
- | Prompt cache | Managed, opaque | Local LRU cache with configurable TTL/size |
- | Git & workspace tools | Anthropic-managed hooks | Local Node handlers (`src/tools/`) with policy gate |
- | Web search/fetch | Hosted browsing agent, JS-capable | Local HTTP fetch (no JS) plus optional policy fallback |
- | MCP orchestration | Anthropic-managed sandbox | Local MCP discovery, optional Docker sandbox |
- | Secure sandboxes | Anthropic-provided remote sandboxes | Optional Docker runtime; full access if disabled |
- | Claude Skills / workflows | Available in hosted product | Not implemented (future roadmap) |
- | Support & SLAs | Anthropic-run service | Self-hosted; you own uptime, auth, logging |
- | Cost & scaling | Usage-billed API | Whatever infra you deploy (Node + dependencies) |
-
- The proxy is ideal when you need local control, custom tooling, or non-Anthropic model endpoints. If you require fully managed browsing, secure sandboxes, or enterprise SLA, stick with the hosted backend.
-
- *Related searches: Anthropic API alternatives, Claude Code self-hosted vs cloud*
- </details>
-
- <details>
- <summary><strong>Q: Does prompt caching work like Anthropic's cache?</strong></summary>
-
- **A:** Yes, functionally similar. Identical requests (model, messages, tools, sampling params) reuse cached responses until the TTL expires. Tool-invoking turns skip caching.
-
- Lynkr's local LRU cache approximates Claude's cache semantics, providing similar latency and cost benefits.
-
- *Related searches: Prompt caching Claude, LLM response caching, reduce AI API costs*
- </details>
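The cache-hit condition above (same model, messages, tools, and sampling params) implies a key derived from exactly those fields. A hypothetical sketch with `sha256sum` (Lynkr's real key derivation may differ, and a production version must canonicalize the JSON first so that key order does not change the hash):

```shell
#!/usr/bin/env bash
# Hash the request fields that determine a cache hit.
cache_key() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

req='{"model":"claude-3.5-sonnet","temperature":0,"messages":[{"role":"user","content":"hi"}]}'
# An identical request yields the same key; changing any field yields a new one.
if [ "$(cache_key "$req")" = "$(cache_key "$req")" ]; then
  echo "identical request: cache hit"
fi
```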
-
- <details>
- <summary><strong>Q: Can I connect multiple MCP servers?</strong></summary>
-
- **A:** Yes! Place multiple manifests in `MCP_MANIFEST_DIRS`. Each server is launched and its tools are namespaced.
-
- ```bash
- export MCP_MANIFEST_DIRS=/path/to/manifests:/another/path
- ```
-
- Lynkr automatically discovers and orchestrates all MCP servers.
-
- *Related searches: MCP server integration, Model Context Protocol setup, multiple MCP servers*
- </details>
-
- <details>
- <summary><strong>Q: How do I change the workspace root?</strong></summary>
-
- **A:** Set `WORKSPACE_ROOT` before starting the proxy:
-
- ```bash
- export WORKSPACE_ROOT=/path/to/your/project
- lynkr start
- ```
-
- The indexer and filesystem tools operate relative to this path.
-
- *Related searches: Claude Code workspace configuration, change working directory*
- </details>
-
- <details>
- <summary><strong>Q: What is llama.cpp and when should I use it over Ollama?</strong></summary>
-
- **A:** llama.cpp is a high-performance C++ inference engine for running LLMs locally. Compared to Ollama:
-
- **llama.cpp advantages:**
- - ✅ **Faster inference** - Optimized C++ code
- - ✅ **Less memory** - Advanced quantization (Q2_K to Q8_0)
- - ✅ **Any GGUF model** - Direct HuggingFace support
- - ✅ **More GPU backends** - CUDA, Metal, ROCm, Vulkan, SYCL
- - ✅ **Fine-grained control** - Context length, GPU layers, etc.
-
- **Use llama.cpp when you need:**
- - Maximum inference speed and minimum memory
- - Specific quantization levels
- - GGUF models not packaged for Ollama
-
- **Use Ollama when:** You prefer easier setup and don't need the extra control.
-
- *Related searches: llama.cpp vs Ollama, GGUF model inference, local LLM performance*
- </details>
-
- <details>
- <summary><strong>Q: How do I set up llama.cpp with Lynkr?</strong></summary>
-
- **A:** Follow these steps to integrate llama.cpp with Lynkr:
-
- ```bash
- # 1. Build llama.cpp (or download a pre-built binary)
- git clone https://github.com/ggerganov/llama.cpp
- cd llama.cpp && make
-
- # 2. Download a GGUF model
- wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
-
- # 3. Start the server
- ./llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080
-
- # 4. Configure Lynkr
- export MODEL_PROVIDER=llamacpp
- export LLAMACPP_ENDPOINT=http://localhost:8080
- lynkr start
- ```
-
- *Related searches: llama.cpp setup, GGUF model deployment, llama-server configuration*
- </details>
-
- <details>
- <summary><strong>Q: What is OpenRouter and why should I use it?</strong></summary>
-
- **A:** OpenRouter is a unified API gateway that provides access to **100+ AI models** from multiple providers (OpenAI, Anthropic, Google, Meta, Mistral, etc.) through a single API key.
-
- **Key benefits:**
- - ✅ **No vendor lock-in** - Switch models without changing code
- - ✅ **Competitive pricing** - Often cheaper than direct provider access (e.g., GPT-4o-mini at $0.15/$0.60 per 1M tokens)
- - ✅ **Automatic fallbacks** - If your primary model is unavailable, OpenRouter tries alternatives
- - ✅ **Pay-as-you-go** - No monthly fees or subscriptions
- - ✅ **Full tool calling support** - Compatible with Claude Code CLI workflows
-
- *Related searches: OpenRouter API gateway, multi-model LLM access, AI provider aggregator*
- </details>
-
- <details>
- <summary><strong>Q: How do I get started with OpenRouter?</strong></summary>
-
- **A:** Quick OpenRouter setup (5 minutes):
-
- 1. Visit https://openrouter.ai and sign in (GitHub, Google, or email)
- 2. Go to https://openrouter.ai/keys and create an API key
- 3. Add credits to your account (minimum $5, pay-as-you-go)
- 4. Configure Lynkr:
-    ```bash
-    export MODEL_PROVIDER=openrouter
-    export OPENROUTER_API_KEY=sk-or-v1-...
-    export OPENROUTER_MODEL=openai/gpt-4o-mini
-    lynkr start
-    ```
- 5. Connect Claude CLI and start coding!
-
- *Related searches: OpenRouter API key setup, OpenRouter getting started, OpenRouter credit system*
- </details>
-
- <details>
- <summary><strong>Q: Which OpenRouter model should I use?</strong></summary>
-
- **A:** Popular choices by use case:
-
- - **Budget-conscious:** `openai/gpt-4o-mini` ($0.15/$0.60 per 1M tokens) - Best value for code tasks
- - **Best quality:** `anthropic/claude-3.5-sonnet` - Claude's most capable model
- - **Free tier:** `meta-llama/llama-3.1-8b-instruct:free` - Completely free (rate-limited)
- - **Balanced:** `google/gemini-pro-1.5` - Large context window, good performance
-
- See https://openrouter.ai/models for the complete list with pricing and features.
-
- *Related searches: best OpenRouter models for coding, cheapest OpenRouter models, OpenRouter model comparison*
- </details>
-
- <details>
- <summary><strong>Q: How do I use OpenAI directly with Lynkr?</strong></summary>
-
- **A:** Set `MODEL_PROVIDER=openai` and configure your API key:
-
- ```bash
- export MODEL_PROVIDER=openai
- export OPENAI_API_KEY=sk-your-api-key
- export OPENAI_MODEL=gpt-4o  # or gpt-4o-mini, o1-preview, etc.
- lynkr start
- ```
-
- Connect Claude CLI as usual; all requests are routed to OpenAI's API with automatic format conversion.
-
- *Related searches: OpenAI API with Claude Code, GPT-4o integration, OpenAI proxy setup*
- </details>
-
- <details>
- <summary><strong>Q: What's the difference between OpenAI, Azure OpenAI, and OpenRouter?</strong></summary>
-
- **A:** Here's how they compare:
-
- - **OpenAI** - Direct access to OpenAI's API. Simplest setup, lowest latency, pay-as-you-go billing directly with OpenAI.
- - **Azure OpenAI** - OpenAI models hosted on Azure infrastructure. Enterprise features (private endpoints, data residency, Azure AD integration), billed through Azure.
- - **OpenRouter** - Third-party API gateway providing access to 100+ models (including OpenAI). Competitive pricing, automatic fallbacks, single API key for multiple providers.
-
- **Choose:**
- - OpenAI for simplicity and direct access
- - Azure OpenAI for enterprise requirements and compliance
- - OpenRouter for model flexibility and cost optimization
-
- *Related searches: OpenAI vs Azure OpenAI, OpenRouter vs OpenAI pricing, enterprise AI deployment*
- </details>
-
- <details>
- <summary><strong>Q: Which OpenAI model should I use?</strong></summary>
-
- **A:** Recommended models by use case:
-
- - **Best quality:** `gpt-4o` - Most capable, multimodal (text + vision), excellent tool calling
- - **Best value:** `gpt-4o-mini` - Fast, affordable ($0.15/$0.60 per 1M tokens), good for most tasks
- - **Complex reasoning:** `o1-preview` - Advanced reasoning for math, logic, and complex problems
- - **Fast reasoning:** `o1-mini` - Efficient reasoning for coding and math tasks
-
- For coding tasks, `gpt-4o-mini` offers the best balance of cost and quality.
-
- *Related searches: best GPT model for coding, o1-preview vs gpt-4o, OpenAI model selection*
- </details>
-
- <details>
- <summary><strong>Q: Can I use OpenAI with the 3-tier hybrid routing?</strong></summary>
-
- **A:** Yes! The recommended configuration uses multi-tier routing for optimal cost/performance:
-
- - **Tier 1 (0-2 tools):** Ollama (free, local, fast)
- - **Tier 2 (3-14 tools):** OpenRouter (affordable, full tool support)
- - **Tier 3 (15+ tools):** Databricks (most capable, enterprise features)
-
- This gives you the best of all worlds: free for simple tasks, affordable for moderate complexity, and enterprise-grade for heavy workloads.
-
- *Related searches: hybrid AI routing, multi-provider LLM strategy, cost-optimized AI architecture*
- </details>
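The tier boundaries above reduce to a small dispatch function. This is illustrative only; Lynkr's actual router is internal:

```shell
#!/usr/bin/env bash
# Map a request's tool count to a provider tier (thresholds from the
# 3-tier configuration described above).
route_by_tool_count() {
  local tools=$1
  if   [ "$tools" -le 2 ];  then echo ollama       # Tier 1: 0-2 tools
  elif [ "$tools" -le 14 ]; then echo openrouter   # Tier 2: 3-14 tools
  else                           echo databricks   # Tier 3: 15+ tools
  fi
}

route_by_tool_count 0    # ollama
route_by_tool_count 7    # openrouter
route_by_tool_count 20   # databricks
```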
-
- <details>
- <summary><strong>Q: Where are session transcripts stored?</strong></summary>
-
- **A:** Session transcripts are stored in SQLite at `data/sessions.db` (configurable via the `SESSION_DB_PATH` environment variable).
-
- This allows for full conversation history, debugging, and audit trails.
-
- *Related searches: Claude Code session storage, conversation history location, SQLite session database*
- </details>
-
- <details>
- <summary><strong>Q: What production hardening features are included?</strong></summary>
-
- **A:** Lynkr includes **14 production-ready features** across three categories:
-
- **Reliability:**
- - Retry logic with exponential backoff
- - Circuit breakers
- - Load shedding
- - Graceful shutdown
- - Connection pooling
-
- **Observability:**
- - Metrics collection (Prometheus format)
- - Health checks (Kubernetes-ready)
- - Structured logging with request IDs
-
- **Security:**
- - Input validation
- - Consistent error handling
- - Path allowlisting
- - Budget enforcement
-
- All features add **minimal overhead** (~7µs per request) and are battle-tested with **80 comprehensive tests**.
-
- *Related searches: production-ready Node.js API, Kubernetes health checks, circuit breaker pattern*
- </details>
-
- <details>
- <summary><strong>Q: How does circuit breaker protection work?</strong></summary>
-
- **A:** Circuit breakers protect against cascading failures by implementing a state machine:
-
- **CLOSED (Normal):** Requests pass through normally
- **OPEN (Failed):** After 5 consecutive failures, the circuit opens and fails fast for 60 seconds (prevents overwhelming failing services)
- **HALF-OPEN (Testing):** The circuit automatically attempts recovery, testing whether the service has recovered
-
- This pattern prevents your application from wasting resources on requests likely to fail, and gives failing services time to recover.
-
- *Related searches: circuit breaker pattern explained, microservices resilience, failure recovery strategies*
- </details>
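The state machine above can be sketched in a few lines of bash. This is a toy model showing the CLOSED to OPEN transition and the fail-fast check; Lynkr's breaker additionally runs the 60-second HALF-OPEN recovery timer:

```shell
#!/usr/bin/env bash
# Toy circuit breaker: open after 5 consecutive failures, fail fast
# while open, reset on success.
CB_STATE=CLOSED
CB_FAILURES=0
CB_THRESHOLD=5

cb_record() {              # $1 = "ok" or "fail" for the last upstream call
  if [ "$1" = "ok" ]; then
    CB_FAILURES=0
    CB_STATE=CLOSED
  else
    CB_FAILURES=$(( CB_FAILURES + 1 ))
    if [ "$CB_FAILURES" -ge "$CB_THRESHOLD" ]; then CB_STATE=OPEN; fi
  fi
}

cb_allow() {               # succeeds only while the circuit is CLOSED
  [ "$CB_STATE" = "CLOSED" ]
}

for _ in 1 2 3 4 5; do cb_record fail; done
cb_allow || echo "circuit open: failing fast"
```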
-
- <details>
- <summary><strong>Q: What metrics are collected and how can I access them?</strong></summary>
-
- **A:** Lynkr collects comprehensive metrics including:
-
- - Request counts and error rates
- - Latency percentiles (p50, p95, p99)
- - Token usage and costs
- - Circuit breaker states
-
- **Access metrics via:**
- - `/metrics/observability` - JSON format for dashboards
- - `/metrics/prometheus` - Prometheus scraping
- - `/metrics/circuit-breakers` - Circuit breaker state
-
- Perfect for Grafana dashboards, alerting, and production monitoring.
-
- *Related searches: Prometheus metrics endpoint, Node.js observability, API metrics collection*
- </details>
-
- <details>
- <summary><strong>Q: Is Lynkr production-ready?</strong></summary>
-
- **A:** Yes! Lynkr is designed for production deployments with:
-
- - ✅ **Zero-downtime deployments** (graceful shutdown)
- - ✅ **Kubernetes integration** (health checks, metrics)
- - ✅ **Horizontal scaling** (stateless design)
- - ✅ **Enterprise monitoring** (Prometheus, Grafana)
- - ✅ **Battle-tested reliability** (80 comprehensive tests, 100% pass rate)
- - ✅ **Minimal overhead** (<10µs middleware, <200MB memory)
-
- Used in production environments with >100K requests/day.
-
- *Related searches: production Node.js proxy, enterprise AI deployment, Kubernetes AI infrastructure*
- </details>
-
- <details>
- <summary><strong>Q: How do I deploy Lynkr to Kubernetes?</strong></summary>
-
- **A:** Deploy Lynkr to Kubernetes in 4 steps:
-
- 1. **Build Docker image:** `docker build -t lynkr .`
- 2. **Configure secrets:** Store environment variables in Kubernetes secrets
- 3. **Deploy application:** Apply Kubernetes deployment manifests
- 4. **Configure monitoring:** Set up Prometheus scraping and Grafana dashboards
-
- **Key features for K8s:**
- - Health check endpoints (liveness, readiness, startup probes)
- - Graceful shutdown (respects SIGTERM)
- - Stateless design (horizontal pod autoscaling)
- - Prometheus metrics (ServiceMonitor ready)
-
- The graceful shutdown and health check endpoints ensure zero-downtime deployments.
-
- *Related searches: Kubernetes deployment best practices, Docker proxy deployment, K8s health probes*
- </details>
-
- ### Still have questions?
- - 💬 [Ask on GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)
- - 🐛 [Report an Issue](https://github.com/vishalveerareddy123/Lynkr/issues)
- - 📚 [Read Full Documentation](https://deepwiki.com/vishalveerareddy123/Lynkr)
-
- ---
-
- ## References & Further Reading
-
- ### Academic & Technical Resources
-
- **Agentic AI Systems:**
- - **Zhang et al. (2024)**. *Agentic Context Engineering*. arXiv:2510.04618. [arXiv](https://arxiv.org/abs/2510.04618)
-
- **Long-Term Memory & RAG:**
- - **Mohtashami & Jaggi (2023)**. *Landmark Attention: Random-Access Infinite Context Length for Transformers*. [arXiv](https://arxiv.org/abs/2305.16300)
- - **Google DeepMind (2024)**. *Titans: Learning to Memorize at Test Time*. [arXiv](https://arxiv.org/abs/2411.07043)
-
- For BibTeX citations, see [CITATIONS.bib](CITATIONS.bib).
-
- ### Official Documentation
-
- - [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code/overview) - Official Claude Code reference
- - [Model Context Protocol (MCP) Specification](https://spec.modelcontextprotocol.io/) - MCP protocol documentation
- - [Databricks Foundation Models](https://docs.databricks.com/en/machine-learning/foundation-models/index.html) - Databricks LLM documentation
- - [Anthropic API Documentation](https://docs.anthropic.com/en/api/getting-started) - Claude API reference
-
- ### Related Projects & Tools
-
- - [Ollama](https://ollama.ai/) - Local LLM runtime for running open-source models
- - [OpenRouter](https://openrouter.ai/) - Multi-provider LLM API gateway (100+ models)
- - [llama.cpp](https://github.com/ggerganov/llama.cpp) - High-performance C++ LLM inference engine
- - [LiteLLM](https://github.com/BerriAI/litellm) - Multi-provider LLM proxy (alternative approach)
- - [Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers) - Curated list of MCP server implementations
-
- ---
-
- ## Community & Adoption
-
- ### Get Involved
-
- **⭐ Star this repository** to show your support and help others discover Lynkr!
-
- [![GitHub stars](https://img.shields.io/github/stars/vishalveerareddy123/Lynkr?style=social)](https://github.com/vishalveerareddy123/Lynkr)
-
- ### Support & Resources
-
- - 🐛 **Report Issues:** [GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues) - Bug reports and feature requests
- - 💬 **Discussions:** [GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions) - Questions, ideas, and community help
- - 💬 **Discord Community:** [Join our Discord](https://discord.gg/qF7DDxrX) - Real-time chat and community support
- - 📚 **Documentation:** [DeepWiki](https://deepwiki.com/vishalveerareddy123/Lynkr) - Comprehensive guides and examples
- - 🔧 **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) - How to contribute to Lynkr
-
- ### Share Lynkr
-
- Help spread the word about Lynkr:
-
- - 🐦 [Share on Twitter](https://twitter.com/intent/tweet?text=Check%20out%20Lynkr%20-%20a%20production-ready%20Claude%20Code%20proxy%20with%20multi-provider%20support%20and%2060-80%25%20token%20savings!&url=https://github.com/vishalveerareddy123/Lynkr&hashtags=AI,ClaudeCode,LLM,OpenSource)
- - 💼 [Share on LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https://github.com/vishalveerareddy123/Lynkr)
- - 📰 [Share on Hacker News](https://news.ycombinator.com/submitlink?u=https://github.com/vishalveerareddy123/Lynkr&t=Lynkr%20-%20Production-Ready%20Claude%20Code%20Proxy)
- - 📱 [Share on Reddit](https://www.reddit.com/submit?url=https://github.com/vishalveerareddy123/Lynkr&title=Lynkr%20-%20Production-Ready%20Claude%20Code%20Proxy%20with%20Multi-Provider%20Support)
-
- ### Why Developers Choose Lynkr
-
- - 💰 **Massive cost savings** - Save 60-80% on token costs with built-in optimization
- - 🔓 **Provider freedom** - Choose from 7+ LLM providers (Databricks, OpenRouter, Ollama, Azure, llama.cpp)
- - 🏠 **Privacy & control** - Self-hosted, open-source, no vendor lock-in
- - 🚀 **Production-ready** - Enterprise features: circuit breakers, metrics, health checks
- - 🛠️ **Active development** - Regular updates, responsive maintainers, growing community
-
- ---
-
- ## License
+ ## Community & Support

- Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.
+ - ⭐ **Star this repo** if Lynkr helps you!
+ - 💬 **[Join Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions, share tips
+ - 🐛 **[Report Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Bug reports welcome
+ - 📖 **[Read the Docs](documentation/)** - Comprehensive guides

 ---

- If you find Lynkr useful, please ⭐ the repo — it helps more people discover it.
+ **Made with ❤️ by developers, for developers.**