lynkr 1.0.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/CITATIONS.bib +6 -0
  2. package/DEPLOYMENT.md +1001 -0
  3. package/README.md +215 -71
  4. package/docs/index.md +55 -2
  5. package/monitor-agents.sh +31 -0
  6. package/package.json +7 -3
  7. package/src/agents/context-manager.js +220 -0
  8. package/src/agents/definitions/loader.js +563 -0
  9. package/src/agents/executor.js +412 -0
  10. package/src/agents/index.js +157 -0
  11. package/src/agents/parallel-coordinator.js +68 -0
  12. package/src/agents/reflector.js +321 -0
  13. package/src/agents/skillbook.js +331 -0
  14. package/src/agents/store.js +244 -0
  15. package/src/api/router.js +55 -0
  16. package/src/clients/databricks.js +214 -17
  17. package/src/clients/routing.js +15 -7
  18. package/src/clients/standard-tools.js +341 -0
  19. package/src/config/index.js +41 -5
  20. package/src/orchestrator/index.js +254 -37
  21. package/src/server.js +2 -0
  22. package/src/tools/agent-task.js +96 -0
  23. package/test/azure-openai-config.test.js +203 -0
  24. package/test/azure-openai-error-resilience.test.js +238 -0
  25. package/test/azure-openai-format-conversion.test.js +354 -0
  26. package/test/azure-openai-integration.test.js +281 -0
  27. package/test/azure-openai-routing.test.js +148 -0
  28. package/test/azure-openai-streaming.test.js +171 -0
  29. package/test/format-conversion.test.js +578 -0
  30. package/test/hybrid-routing-integration.test.js +18 -11
  31. package/test/openrouter-error-resilience.test.js +418 -0
  32. package/test/passthrough-mode.test.js +385 -0
  33. package/test/routing.test.js +9 -3
  34. package/test/web-tools.test.js +3 -0
  35. package/test-agents-simple.js +43 -0
  36. package/test-cli-connection.sh +33 -0
  37. package/test-learning-unit.js +126 -0
  38. package/test-learning.js +112 -0
  39. package/test-parallel-agents.sh +124 -0
  40. package/test-parallel-direct.js +155 -0
  41. package/test-subagents.sh +117 -0
package/README.md CHANGED
@@ -15,22 +15,22 @@
  ## Table of Contents

  1. [Overview](#overview)
- 2. [Core Capabilities](#core-capabilities)
+ 2. [Supported Models & Providers](#supported-models--providers)
+ 3. [Core Capabilities](#core-capabilities)
  - [Repo Intelligence & Navigation](#repo-intelligence--navigation)
  - [Git Workflow Enhancements](#git-workflow-enhancements)
  - [Diff & Change Management](#diff--change-management)
  - [Execution & Tooling](#execution--tooling)
  - [Workflow & Collaboration](#workflow--collaboration)
  - [UX, Monitoring, and Logs](#ux-monitoring-and-logs)
- 3. [Production Hardening Features](#production-hardening-features)
+ 4. [Production Hardening Features](#production-hardening-features)
  - [Reliability & Resilience](#reliability--resilience)
  - [Observability & Monitoring](#observability--monitoring)
  - [Security & Governance](#security--governance)
- - [Performance Characteristics](#performance-characteristics)
- 4. [Architecture](#architecture)
- 5. [Getting Started](#getting-started)
- 6. [Configuration Reference](#configuration-reference)
- 7. [Runtime Operations](#runtime-operations)
+ 5. [Architecture](#architecture)
+ 6. [Getting Started](#getting-started)
+ 7. [Configuration Reference](#configuration-reference)
+ 8. [Runtime Operations](#runtime-operations)
  - [Launching the Proxy](#launching-the-proxy)
  - [Connecting Claude Code CLI](#connecting-claude-code-cli)
  - [Using Ollama Models](#using-ollama-models)
@@ -40,11 +40,12 @@
  - [Integrating MCP Servers](#integrating-mcp-servers)
  - [Health Checks & Monitoring](#health-checks--monitoring)
  - [Metrics & Observability](#metrics--observability)
- 8. [Manual Test Matrix](#manual-test-matrix)
- 9. [Troubleshooting](#troubleshooting)
- 10. [Roadmap & Known Gaps](#roadmap--known-gaps)
- 11. [FAQ](#faq)
- 12. [License](#license)
+ 9. [Manual Test Matrix](#manual-test-matrix)
+ 10. [Troubleshooting](#troubleshooting)
+ 11. [Roadmap & Known Gaps](#roadmap--known-gaps)
+ 12. [FAQ](#faq)
+ 13. [References](#references)
+ 14. [License](#license)

  ---

@@ -71,6 +72,105 @@ Further documentation and usage notes are available on [DeepWiki](https://deepwi

  ---

+ ## Supported Models & Providers
+
+ Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:
+
+ ### **Provider Options**
+
+ | Provider | Configuration | Models Available | Best For |
+ |----------|--------------|------------------|----------|
+ | **Databricks** (Default) | `MODEL_PROVIDER=databricks` | Claude Sonnet 4.5, Claude Opus 4.5 | Production use, enterprise deployment |
+ | **Azure OpenAI** | `MODEL_PROVIDER=azure-openai` | GPT-4o, GPT-4o-mini, GPT-5, o1, o3 | Azure integration, Microsoft ecosystem |
+ | **Azure Anthropic** | `MODEL_PROVIDER=azure-anthropic` | Claude Sonnet 4.5, Claude Opus 4.5 | Azure-hosted Claude models |
+ | **OpenRouter** | `MODEL_PROVIDER=openrouter` | 100+ models (GPT-4o, Claude, Gemini, Llama, etc.) | Model flexibility, cost optimization |
+ | **Ollama** (Local) | `MODEL_PROVIDER=ollama` | Llama 3.1, Qwen2.5, Mistral, CodeLlama | Local/offline use, privacy, no API costs |
+
+ ### **Recommended Models by Use Case**
+
+ #### **For Production Code Assistance**
+ - **Best**: Claude Sonnet 4.5 (via Databricks or Azure Anthropic)
+ - **Alternative**: GPT-4o (via Azure OpenAI or OpenRouter)
+ - **Budget**: GPT-4o-mini (via Azure OpenAI) or Claude Haiku (via OpenRouter)
+
+ #### **For Code Generation**
+ - **Best**: Claude Opus 4.5 (via Databricks or Azure Anthropic)
+ - **Alternative**: GPT-4o (via Azure OpenAI)
+ - **Local**: Qwen2.5-Coder 32B (via Ollama)
+
+ #### **For Fast Exploration**
+ - **Best**: Claude Haiku (via OpenRouter or Azure Anthropic)
+ - **Alternative**: GPT-4o-mini (via Azure OpenAI)
+ - **Local**: Llama 3.1 8B (via Ollama)
+
+ #### **For Cost Optimization**
+ - **Cheapest Cloud**: Amazon Nova models (via OpenRouter) – free tier available
+ - **Cheapest Local**: Ollama (any model) – completely free, runs on your hardware
+
+ ### **Azure OpenAI Specific Models**
+
+ When using `MODEL_PROVIDER=azure-openai`, you can deploy any of these models:
+
+ | Model | Deployment Name | Capabilities | Best For |
+ |-------|----------------|--------------|----------|
+ | **GPT-4o** | `gpt-4o` | Text, vision, function calling | General-purpose, multimodal tasks |
+ | **GPT-4o-mini** | `gpt-4o-mini` | Text, function calling | Fast responses, cost-effective |
+ | **GPT-5** | `gpt-5-chat` or custom | Advanced reasoning, longer context | Complex problem-solving |
+ | **o1-preview** | `o1-preview` | Deep reasoning, chain of thought | Mathematical, logic problems |
+ | **o3-mini** | `o3-mini` | Efficient reasoning | Fast reasoning tasks |
+
+ **Note**: Azure OpenAI deployment names are configurable via the `AZURE_OPENAI_DEPLOYMENT` environment variable.
+
+ ### **Ollama Model Recommendations**
+
+ For tool calling support (required for Claude Code CLI functionality):
+
+ ✅ **Recommended**:
+ - `llama3.1:8b` – Good balance of speed and capability
+ - `llama3.2` – Latest Llama model
+ - `qwen2.5:14b` – Strong reasoning (larger model needed; 7b struggles with tools)
+ - `mistral:7b-instruct` – Fast and capable
+
+ ❌ **Not Recommended for Tools**:
+ - `qwen2.5-coder` – Code-only, slow with tool calling
+ - `codellama` – Code-only, poor tool support
+
+ ### **Hybrid Routing (Ollama + Cloud Fallback)**
+
+ Lynkr supports intelligent hybrid routing for cost optimization:
+
+ ```bash
+ # Use Ollama for simple tasks, fall back to cloud for complex ones
+ PREFER_OLLAMA=true
+ FALLBACK_ENABLED=true
+ FALLBACK_PROVIDER=databricks # or azure-openai, openrouter, azure-anthropic
+ ```
+
+ **How it works**:
+ - Requests with few/no tools → Ollama (free, local)
+ - Requests with many tools → Cloud provider (more capable)
+ - Ollama failures → Automatic fallback to cloud
+
+ **Routing Logic**:
+ - 0-2 tools: Ollama
+ - 3-15 tools: OpenRouter or Azure OpenAI (if configured)
+ - 16+ tools: Databricks or Azure Anthropic (most capable)
+
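The routing thresholds above can be sketched as a small decision function. This is an illustrative sketch only — `pickProvider` and the exact precedence between configured providers are hypothetical, not Lynkr's actual routing code:

```javascript
// Hypothetical sketch of the tool-count routing heuristic described above.
// Provider names mirror the README tables; the function itself is illustrative.
function pickProvider(toolCount, env) {
  if (toolCount <= 2 && env.PREFER_OLLAMA === "true") {
    return "ollama"; // few/no tools → free local model
  }
  if (toolCount <= 15) {
    // mid-size tool sets → lighter cloud provider when one is configured
    return env.OPENROUTER_API_KEY ? "openrouter" : "azure-openai";
  }
  return env.FALLBACK_PROVIDER || "databricks"; // 16+ tools → most capable
}

console.log(pickProvider(0, { PREFER_OLLAMA: "true" }));   // "ollama"
console.log(pickProvider(8, { OPENROUTER_API_KEY: "x" })); // "openrouter"
console.log(pickProvider(20, {}));                         // "databricks"
```

The real router also has to handle Ollama failures (automatic cloud fallback), which a sketch like this omits.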
+ ### **Provider Comparison**
+
+ | Feature | Databricks | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama |
+ |---------|-----------|--------------|-----------------|------------|--------|
+ | **Setup Complexity** | Medium | Medium | Medium | Easy | Easy |
+ | **Cost** | $$$ | $$ | $$$ | $ | Free |
+ | **Latency** | Low | Low | Low | Medium | Very Low |
+ | **Tool Calling** | Excellent | Excellent | Excellent | Good | Fair |
+ | **Context Length** | 200K | 128K | 200K | Varies | 32K-128K |
+ | **Streaming** | Yes | Yes | Yes | Yes | Yes |
+ | **Privacy** | Enterprise | Enterprise | Enterprise | Third-party | Local |
+ | **Offline** | No | No | No | No | Yes |
+
+ ---
+
  ## Core Capabilities

  ### Repo Intelligence & Navigation
@@ -96,7 +196,15 @@ Further documentation and usage notes are available on [DeepWiki](https://deepwi

  ### Execution & Tooling

- - Tool execution pipeline sandboxes or runs tools in the host workspace based on policy.
+ - **Flexible tool execution modes**: Configure where tools execute via `TOOL_EXECUTION_MODE`:
+   - `server` (default) – Tools run on the proxy server where Lynkr is hosted
+   - `client`/`passthrough` – Tools execute on the Claude Code CLI side, enabling local file operations and commands on the client machine
+ - **Client-side tool execution** – When in passthrough mode, the proxy returns Anthropic-formatted `tool_use` blocks to the CLI, which executes them locally and sends back `tool_result` blocks. This enables:
+   - File operations on the CLI user's local filesystem
+   - Local command execution in the user's environment
+   - Access to local credentials and SSH keys
+   - Integration with local development tools
+ - Tool execution pipeline sandboxes or runs tools in the host workspace based on policy (server mode).
  - MCP sandbox orchestration (Docker runtime by default) optionally isolates external tools with mount and permission controls.
  - Automated testing harness exposes `workspace_test_run`, `workspace_test_history`, and `workspace_test_summary`.
  - Prompt caching reduces repeated token usage for iterative conversations.
@@ -213,43 +321,7 @@ Lynkr includes comprehensive production-ready features designed for reliability,
  - Cost tracking and budget exhaustion handling
  - Request-level cost attribution

- ### Performance Characteristics
-
- #### **Benchmark Results**
- Based on comprehensive performance testing with 100,000+ operations:
-
- | Component | Throughput | Latency | Overhead |
- |-----------|------------|---------|----------|
- | Baseline (no-op) | 21.3M ops/sec | 0.00005ms | - |
- | Metrics Collection | 4.7M ops/sec | 0.0002ms | 0.15ms |
- | Load Shedding Check | 7.6M ops/sec | 0.0001ms | 0.08ms |
- | Circuit Breaker | 4.3M ops/sec | 0.0002ms | 0.18ms |
- | Input Validation (simple) | 5.8M ops/sec | 0.0002ms | 0.12ms |
- | Input Validation (complex) | 890K ops/sec | 0.0011ms | 0.96ms |
- | Combined Middleware Stack | 140K ops/sec | 0.0071ms | 7.1μs |
-
- **Overall Performance Rating:** ⭐ **EXCELLENT**
- - Total middleware overhead: **7.1 microseconds** per request
- - Throughput: **140,000 requests/second**
- - Memory overhead: **~4MB** for typical workload
-
- #### **Production Deployment Metrics**
- - **Test Coverage:** 80 comprehensive tests with 100% pass rate
- - **Feature Completeness:** 14/14 production features implemented
- - **Zero-downtime Deployments:** Supported via graceful shutdown
- - **Horizontal Scaling:** Stateless design enables unlimited horizontal scaling
- - **Vertical Scaling:** Efficient resource usage supports high request volumes
-
- #### **Scalability Profile**
- - Single instance handles 140K req/sec under test conditions
- - Linear scaling with additional instances (no shared state)
- - Memory usage: ~100MB baseline + ~4MB per 10K active requests
- - CPU usage: <5% per core at moderate load
- - Network: Limited by backend API latency, not proxy overhead
-
- For detailed performance analysis, benchmarks, and deployment guidance, see [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md).

- ---

  ## Architecture

@@ -593,6 +665,7 @@ See https://openrouter.ai/models for the complete list with pricing.
  | `PROMPT_CACHE_ENABLED` | Toggle the prompt cache system. | `true` |
  | `PROMPT_CACHE_TTL_MS` | Milliseconds before cached prompts expire. | `300000` (5 minutes) |
  | `PROMPT_CACHE_MAX_ENTRIES` | Maximum number of cached prompts retained. | `64` |
+ | `TOOL_EXECUTION_MODE` | Controls where tools execute: `server` (default, tools run on proxy server), `client`/`passthrough` (tools execute on Claude Code CLI side). | `server` |
  | `POLICY_MAX_STEPS` | Max agent loop iterations before timeout. | `8` |
  | `POLICY_GIT_ALLOW_PUSH` | Allow/disallow `workspace_git_push`. | `false` |
  | `POLICY_GIT_REQUIRE_TESTS` | Enforce passing tests before `workspace_git_commit`. | `false` |
@@ -690,12 +763,6 @@ Lynkr works with any Ollama model. Popular choices:
  - **mistral:latest** – Fast, efficient model (7B parameters, 4.1GB)
  - **codellama:latest** – Meta's code-focused model (7B-34B variants)

- **Performance Characteristics:**
-
- - **Latency**: ~100-500ms first token (depending on model size and hardware)
- - **Throughput**: ~20-50 tokens/sec on M1/M2 Macs, ~10-30 tokens/sec on typical CPUs
- - **Memory**: 8GB RAM minimum recommended for 7B models, 16GB for 13B models
- - **Disk**: 4-10GB per model (quantized)

  **Ollama Health Check:**

@@ -716,7 +783,6 @@ Lynkr now supports **native tool calling** for compatible Ollama models:
  - ✅ **Format conversion**: Transparent conversion between Anthropic and Ollama tool formats
  - ❌ **Unsupported models**: llama3, older models (tools are filtered out automatically)

- See [OLLAMA-TOOL-CALLING.md](OLLAMA-TOOL-CALLING.md) for implementation details.

  **Limitations:**

@@ -885,8 +951,6 @@ npm start
  | **Cost per simple request** | $0.002-0.005 | $0.00 | 100% savings 💰 |
  | **Fallback latency** | N/A | <100ms | Transparent to user |

- See [HYBRID-ROUTING-ANALYSIS.md](HYBRID-ROUTING-ANALYSIS.md) for detailed performance analysis.
-
  ### Using Built-in Workspace Tools

  You can call tools programmatically via HTTP:
@@ -913,6 +977,69 @@ curl http://localhost:8080/v1/messages \

  Tool responses appear in the assistant content block with structured JSON.

+ ### Client-Side Tool Execution (Passthrough Mode)
+
+ Lynkr supports **client-side tool execution**, where tools execute on the Claude Code CLI machine instead of the proxy server. This enables local file operations, commands, and access to local resources.
+
+ **Enable client-side execution:**
+
+ ```bash
+ # Set in .env or export before starting
+ export TOOL_EXECUTION_MODE=client
+ npm start
+ ```
+
+ **How it works:**
+
+ 1. **Model generates tool calls** – Databricks/OpenRouter/Ollama model returns tool calls
+ 2. **Proxy converts to Anthropic format** – Tool calls converted to `tool_use` blocks
+ 3. **CLI executes tools locally** – Claude Code CLI receives `tool_use` blocks and runs them on the user's machine
+ 4. **CLI sends results back** – Tool results sent back to proxy in next request as `tool_result` blocks
+ 5. **Conversation continues** – Proxy forwards the complete conversation (including tool results) back to the model
+
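The round trip above can be illustrated with a small helper that turns an assistant message's `tool_use` blocks into the `tool_result` blocks the CLI sends back. Only the block shapes (`tool_use`, `tool_result`, `tool_use_id`, `stop_reason`) come from the Anthropic message format shown here; the `toolResultsFor` helper itself is hypothetical, not Lynkr or Claude Code source:

```javascript
// Hypothetical client-side helper: map each tool_use block in an assistant
// message to the tool_result block that gets sent back in the next request.
function toolResultsFor(message, runTool) {
  return message.content
    .filter((block) => block.type === "tool_use")
    .map((block) => ({
      type: "tool_result",
      tool_use_id: block.id,                       // ties result to its call
      content: runTool(block.name, block.input),   // executes on this machine
    }));
}

// Using the Write tool_use block from the example response above:
const msg = {
  stop_reason: "tool_use",
  content: [
    { type: "text", text: "I'll create that file for you." },
    { type: "tool_use", id: "toolu_abc", name: "Write",
      input: { file_path: "/tmp/test.txt", content: "Hello World" } },
  ],
};
const results = toolResultsFor(msg, () => "ok");
// results → [{ type: "tool_result", tool_use_id: "toolu_abc", content: "ok" }]
```

In a real client loop these results would be appended as a `user` message and the whole conversation resent to the proxy, repeating until `stop_reason` is no longer `tool_use`.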
+ **Example response in passthrough mode:**
+
+ ```json
+ {
+   "id": "msg_123",
+   "type": "message",
+   "role": "assistant",
+   "content": [
+     {
+       "type": "text",
+       "text": "I'll create that file for you."
+     },
+     {
+       "type": "tool_use",
+       "id": "toolu_abc",
+       "name": "Write",
+       "input": {
+         "file_path": "/tmp/test.txt",
+         "content": "Hello World"
+       }
+     }
+   ],
+   "stop_reason": "tool_use"
+ }
+ ```
+
+ **Benefits:**
+ - ✅ Tools execute on CLI user's local filesystem
+ - ✅ Access to local credentials, SSH keys, environment variables
+ - ✅ Integration with local development tools (git, npm, docker, etc.)
+ - ✅ Reduced network latency for file operations
+ - ✅ Server doesn't need filesystem access or permissions
+
+ **Use cases:**
+ - Remote proxy server, local CLI execution
+ - Multi-user environments where each user needs their own workspace
+ - Security-sensitive environments where server shouldn't access user files
+
+ **Supported modes:**
+ - `TOOL_EXECUTION_MODE=server` – Tools run on proxy server (default)
+ - `TOOL_EXECUTION_MODE=client` – Tools run on CLI side
+ - `TOOL_EXECUTION_MODE=passthrough` – Alias for `client`
+
  ### Working with Prompt Caching

  - Set `PROMPT_CACHE_ENABLED=true` (default) to activate the cache.
@@ -1205,11 +1332,21 @@ Replace `<workspace>` and `<endpoint-name>` with your Databricks workspace host
  - **Claude CLI prompts for missing tools** – Verify `tools` array in the client request lists the functions you expect. The proxy only exposes registered handlers.
  - **Dynamic finance pages return stale data** – `web_fetch` fetches static HTML only. Use an API endpoint (e.g. Yahoo Finance chart JSON) or the Databricks-hosted tooling if you need rendered values from heavily scripted pages.

+ ### OpenRouter Issues
+
+ - **"No choices in OpenRouter response" errors** – OpenRouter sometimes returns error responses (rate limits, model unavailable) with JSON but no `choices` array. As of the latest update, Lynkr gracefully handles these errors and returns proper error responses instead of crashing. Check logs for "OpenRouter response missing choices array" warnings to see the full error details.
+ - **Multi-prompt behavior with certain models** – Some OpenRouter models (particularly open-source models like `openai/gpt-oss-120b`) may be overly cautious and ask for confirmation multiple times before executing tools. This is model-specific behavior. Consider switching to:
+   - `anthropic/claude-3.5-sonnet` – More decisive tool execution
+   - `openai/gpt-4o` or `openai/gpt-4o-mini` – Better tool calling behavior
+   - The Databricks provider with Claude models for optimal tool execution
+ - **Rate limit errors** – OpenRouter applies per-model rate limits. If you hit limits frequently, check your OpenRouter dashboard for current usage and consider upgrading your plan or spreading requests across multiple models.
+
  ### Production Hardening Issues

  - **503 Service Unavailable errors during normal load** – Check load shedding thresholds (`LOAD_SHEDDING_*`). Lower values may trigger too aggressively. Check `/metrics/observability` for memory usage patterns.
  - **Circuit breaker stuck in OPEN state** – Check `/metrics/circuit-breakers` to see failure counts. Verify backend service (Databricks/Azure) is accessible. Circuit will automatically attempt recovery after `CIRCUIT_BREAKER_TIMEOUT` (default: 60s).
  - **"Circuit breaker is OPEN" errors** – The circuit breaker detected too many failures and is protecting against cascading failures. Wait for timeout or fix the underlying issue. Check logs for root cause of failures.
+ - **Azure OpenAI specific**: If using Azure OpenAI and seeing circuit breaker errors, verify your `AZURE_OPENAI_ENDPOINT` includes the full path (including `/openai/deployments/YOUR-DEPLOYMENT/chat/completions`). Missing endpoint variable or undefined returns can trigger circuit breaker protection.
  - **High latency after adding production features** – This is unexpected; middleware adds only ~7μs overhead. Check `/metrics/prometheus` for actual latency distribution. Verify network latency to backend services.
  - **Health check endpoint returns 503 but service seems healthy** – Check individual health check components in the response JSON. Database connectivity or memory issues may trigger this. Review logs for specific health check failures.
  - **Metrics endpoint shows incorrect data** – Metrics are in-memory and reset on restart. For persistent metrics, configure Prometheus scraping. Check that `METRICS_ENABLED=true`.
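The Azure OpenAI endpoint requirement from the troubleshooting list above can be sanity-checked at startup. The helper below is a hypothetical illustration of the expected URL shape, not part of Lynkr:

```javascript
// Illustrative check that AZURE_OPENAI_ENDPOINT contains the full
// deployments path described in the troubleshooting note above.
// The helper name and regex are assumptions for this sketch.
function looksLikeFullAzureEndpoint(url) {
  return /\/openai\/deployments\/[^/]+\/chat\/completions/.test(url || "");
}

console.log(looksLikeFullAzureEndpoint(
  "https://myres.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01"
)); // true
console.log(looksLikeFullAzureEndpoint("https://myres.openai.azure.com")); // false
```

A check like this can fail fast with a clear message instead of letting a malformed endpoint surface later as circuit-breaker errors.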
@@ -1245,9 +1382,9 @@ If performance is degraded:

  ## Roadmap & Known Gaps

- ### ✅ Recently Completed (Production Hardening)
+ ### ✅ Recently Completed

- All 14 production hardening features have been implemented and tested with 100% pass rate:
+ **Production Hardening (All 14 features implemented with 100% pass rate):**
  - ✅ Exponential backoff with jitter retry logic
  - ✅ Circuit breaker pattern for external services
  - ✅ Load shedding with resource monitoring
@@ -1263,7 +1400,11 @@ All 14 production hardening features have been implemented and tested with 100%
  - ✅ Rate limiting capabilities
  - ✅ Safe command DSL

- Performance verified: 7.1μs overhead, 140K req/sec throughput. See [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md) for details.
+
+ **Latest Features (December 2025):**
+ - ✅ **Client-side tool execution** (`TOOL_EXECUTION_MODE=client/passthrough`) – Tools can now execute on the Claude Code CLI side instead of the server, enabling local file operations, local commands, and access to local credentials
+ - ✅ **OpenRouter error resilience** – Graceful handling of malformed OpenRouter responses (missing `choices` array), preventing crashes during rate limits or service errors
+ - ✅ **Enhanced format conversion** – Improved Anthropic ↔ OpenRouter format conversion for tool calls, ensuring proper `tool_use` block generation and session consistency across providers

  ### 🔮 Future Enhancements

@@ -1380,32 +1521,35 @@ A: Lynkr collects request counts, error rates, latency percentiles (p50, p95, p9
  - `/metrics/circuit-breakers` - Circuit breaker state

  **Q: Is Lynkr production-ready?**
- A: Yes. Excellent performance (140K req/sec), and comprehensive observability, Lynkr is designed for production deployments. It supports:
+ A: Yes. With excellent performance and comprehensive observability, Lynkr is designed for production deployments. It supports:
  - Zero-downtime deployments (graceful shutdown)
  - Kubernetes integration (health checks, metrics)
  - Horizontal scaling (stateless design)
  - Enterprise monitoring (Prometheus, Grafana)

- **Q: What's the performance impact of production features?**
- A: Minimal. Comprehensive benchmarking shows:
- - Total middleware overhead: 7.1 microseconds per request
- - Throughput: 140,000 requests/second
- - Memory overhead: ~4MB for typical workload

- This is considered "EXCELLENT" performance - the overhead is negligible compared to network and API latency.

  **Q: How do I deploy Lynkr to Kubernetes?**
  A: Use the included Kubernetes configurations and Docker support. Key steps:
  1. Build Docker image: `docker build -t lynkr .`
  2. Configure environment variables in Kubernetes secrets
- 3. Deploy with health checks (see examples in [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md))
- 4. Configure Prometheus scraping for metrics
- 5. Set up Grafana dashboards for visualization
+ 3. Configure Prometheus scraping for metrics
+ 4. Set up Grafana dashboards for visualization

  The graceful shutdown and health check endpoints ensure zero-downtime deployments.

  ---

+ ## References
+
+ Lynkr's design also includes an ACE framework, informed by research in agentic AI systems and context engineering:
+
+ - **Zhang et al. (2025)**. *Agentic Context Engineering*. arXiv:2510.04618. [arXiv](https://arxiv.org/abs/2510.04618)
+
+ For BibTeX citations, see [CITATIONS.bib](CITATIONS.bib).
+
+ ---
+
  ## License

  MIT License. See [LICENSE](LICENSE) for details.
package/docs/index.md CHANGED
@@ -73,10 +73,13 @@ Commit, push, diff, stage, generate release notes, etc.
  ### ✔ Prompt Caching (LRU + TTL)
  Reuses identical prompts to reduce cost + latency.

- ### ✔ Workspace Tools
+ ### ✔ Workspace Tools
  Task tracker, file I/O, test runner, index rebuild, etc.

- ### ✔ Fully extensible Node.js architecture
+ ### ✔ Client-Side Tool Execution (Passthrough Mode)
+ Tools can execute on the Claude Code CLI side instead of the server, enabling local file operations and commands.
+
+ ### ✔ Fully extensible Node.js architecture
  Add custom tools, policies, or backend adapters.

  ---
@@ -92,6 +95,7 @@ Add custom tools, policies, or backend adapters.
  - [Prompt Caching](#-prompt-caching)
  - [MCP (Model Context Protocol) Integration](#-model-context-protocol-mcp)
  - [Git Tools](#-git-tools)
+ - [Client-Side Tool Execution (Passthrough Mode)](#-client-side-tool-execution-passthrough-mode)
  - [API Examples](#-api-examples)
  - [Roadmap](#-roadmap)
  - [Links](#-links)
@@ -363,6 +367,47 @@ Example:

  ---

+ # 🔄 Client-Side Tool Execution (Passthrough Mode)
+
+ Lynkr supports **client-side tool execution**, enabling tools to execute on the Claude Code CLI machine instead of the proxy server.
+
+ **Enable passthrough mode:**
+
+ ```bash
+ export TOOL_EXECUTION_MODE=client
+ npm start
+ ```
+
+ **How it works:**
+
+ 1. Model generates tool calls (from Databricks/OpenRouter/Ollama)
+ 2. Proxy converts to Anthropic format with `tool_use` blocks
+ 3. Claude Code CLI receives `tool_use` blocks and executes locally
+ 4. CLI sends `tool_result` blocks back in the next request
+ 5. Proxy forwards complete conversation back to the model
+
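The mode flag driving this flow can be normalized with a tiny helper. This sketch is hypothetical (Lynkr's actual config parsing may differ); it only encodes the alias rule documented in this section — `passthrough` is an alias for `client`, and anything unset falls back to `server`:

```javascript
// Hypothetical normalization of TOOL_EXECUTION_MODE, per the config notes:
// "passthrough" aliases "client"; unknown/unset values default to "server".
function resolveToolExecutionMode(raw) {
  const mode = String(raw || "server").toLowerCase();
  if (mode === "client" || mode === "passthrough") return "client";
  return "server";
}

console.log(resolveToolExecutionMode("passthrough")); // "client"
console.log(resolveToolExecutionMode(undefined));     // "server"
```

Collapsing the alias early keeps the rest of the proxy code to a single two-way branch (server vs. client execution).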
+ **Benefits:**
+
+ * ✅ Local filesystem access on CLI user's machine
+ * ✅ Local credentials, SSH keys, environment variables
+ * ✅ Integration with local dev tools (git, npm, docker)
+ * ✅ Reduced network latency for file operations
+ * ✅ Server doesn't need filesystem permissions
+
+ **Use cases:**
+
+ * Remote proxy server with local CLI execution
+ * Multi-user environments where each needs their own workspace
+ * Security-sensitive setups where server shouldn't access user files
+
+ **Configuration:**
+
+ * `TOOL_EXECUTION_MODE=server` – Tools run on proxy (default)
+ * `TOOL_EXECUTION_MODE=client` – Tools run on CLI side
+ * `TOOL_EXECUTION_MODE=passthrough` – Alias for client mode
+
+ ---
+
  # 🧪 API Example (Index Rebuild)

  ```bash
@@ -382,6 +427,14 @@ curl http://localhost:8080/v1/messages \

  # 🛣 Roadmap

+ ## ✅ Recently Completed (December 2025)
+
+ * **Client-side tool execution** (`TOOL_EXECUTION_MODE=client/passthrough`) – Tools can execute on the Claude Code CLI side, enabling local file operations, commands, and access to local credentials
+ * **OpenRouter error resilience** – Graceful handling of malformed OpenRouter responses, preventing crashes during rate limits or service errors
+ * **Enhanced format conversion** – Improved Anthropic ↔ OpenRouter format conversion for tool calls with proper `tool_use` block generation
+
+ ## 🔮 Future Features
+
  * LSP integration (TypeScript, Python, more languages)
  * Per-file diff comments
  * Risk scoring for Git diffs
package/monitor-agents.sh ADDED
@@ -0,0 +1,31 @@
+ #!/bin/bash
+
+ # Monitor agent activity in real-time
+
+ echo "🔍 Monitoring Agent Activity"
+ echo "=============================="
+ echo ""
+
+ while true; do
+   clear
+   echo "🔍 Agent Statistics (refreshing every 3s)"
+   echo "=========================================="
+   echo ""
+
+   # Get stats
+   curl -s http://localhost:8080/v1/agents/stats | jq -r '.stats[] |
+     "Agent: \(.agent_type)
+     Executions: \(.total_executions) (\(.completed) completed, \(.failed) failed)
+     Avg Duration: \(.avg_duration_ms)ms
+     Tokens: \(.total_input_tokens) in / \(.total_output_tokens) out
+     "' || echo "Proxy not responding..."
+
+   echo ""
+   echo "Latest transcripts:"
+   ls -lt data/agent-transcripts/*.jsonl 2>/dev/null | head -3 || echo "No transcripts yet"
+
+   echo ""
+   echo "Press Ctrl+C to stop monitoring"
+
+   sleep 3
+ done
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "lynkr",
-   "version": "1.0.0",
-   "description": "Self-hosted Claude Code proxy with Databricks and Azure Anthropic adapters, workspace tooling, and MCP integration.",
+   "version": "2.0.0",
+   "description": "Self-hosted Claude Code proxy with Databricks and Azure adapters, workspace tooling, and MCP integration.",
    "main": "index.js",
    "bin": {
      "lynkr": "./bin/cli.js",
@@ -12,7 +12,8 @@
    "dev": "nodemon index.js",
    "lint": "eslint src index.js",
    "test": "npm run test:unit && npm run test:performance",
-   "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js",
+   "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js",
+   "test:new-features": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js",
    "test:performance": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/hybrid-routing-performance.test.js && DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-tests.js",
    "test:benchmark": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-benchmark.js",
    "test:quick": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js",
@@ -40,6 +41,7 @@
      "node": ">=20.0.0"
    },
    "dependencies": {
+     "@azure/openai": "^2.0.0",
      "better-sqlite3": "^9.4.0",
      "compression": "^1.7.4",
      "diff": "^5.2.0",
@@ -47,6 +49,8 @@
      "express": "^5.1.0",
      "express-rate-limit": "^8.2.1",
      "fast-glob": "^3.3.2",
+     "js-yaml": "^4.1.1",
+     "openai": "^6.14.0",
      "pino": "^8.17.2",
      "pino-http": "^8.6.0",
      "tree-sitter": "^0.20.1",