lynkr 1.0.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/CITATIONS.bib +6 -0
  2. package/DEPLOYMENT.md +1001 -0
  3. package/README.md +215 -71
  4. package/docs/index.md +55 -2
  5. package/monitor-agents.sh +31 -0
  6. package/package.json +7 -3
  7. package/src/agents/context-manager.js +220 -0
  8. package/src/agents/definitions/loader.js +563 -0
  9. package/src/agents/executor.js +412 -0
  10. package/src/agents/index.js +157 -0
  11. package/src/agents/parallel-coordinator.js +68 -0
  12. package/src/agents/reflector.js +321 -0
  13. package/src/agents/skillbook.js +331 -0
  14. package/src/agents/store.js +244 -0
  15. package/src/api/router.js +55 -0
  16. package/src/clients/databricks.js +214 -17
  17. package/src/clients/routing.js +15 -7
  18. package/src/clients/standard-tools.js +341 -0
  19. package/src/config/index.js +41 -5
  20. package/src/orchestrator/index.js +254 -37
  21. package/src/server.js +2 -0
  22. package/src/tools/agent-task.js +96 -0
  23. package/test/azure-openai-config.test.js +203 -0
  24. package/test/azure-openai-error-resilience.test.js +238 -0
  25. package/test/azure-openai-format-conversion.test.js +354 -0
  26. package/test/azure-openai-integration.test.js +281 -0
  27. package/test/azure-openai-routing.test.js +148 -0
  28. package/test/azure-openai-streaming.test.js +171 -0
  29. package/test/format-conversion.test.js +578 -0
  30. package/test/hybrid-routing-integration.test.js +18 -11
  31. package/test/openrouter-error-resilience.test.js +418 -0
  32. package/test/passthrough-mode.test.js +385 -0
  33. package/test/routing.test.js +9 -3
  34. package/test/web-tools.test.js +3 -0
  35. package/test-agents-simple.js +43 -0
  36. package/test-cli-connection.sh +33 -0
  37. package/test-learning-unit.js +126 -0
  38. package/test-learning.js +112 -0
  39. package/test-parallel-agents.sh +124 -0
  40. package/test-parallel-direct.js +155 -0
  41. package/test-subagents.sh +117 -0
package/README.md CHANGED
@@ -15,22 +15,22 @@
  ## Table of Contents

  1. [Overview](#overview)
- 2. [Core Capabilities](#core-capabilities)
+ 2. [Supported Models & Providers](#supported-models--providers)
+ 3. [Core Capabilities](#core-capabilities)
  - [Repo Intelligence & Navigation](#repo-intelligence--navigation)
  - [Git Workflow Enhancements](#git-workflow-enhancements)
  - [Diff & Change Management](#diff--change-management)
  - [Execution & Tooling](#execution--tooling)
  - [Workflow & Collaboration](#workflow--collaboration)
  - [UX, Monitoring, and Logs](#ux-monitoring-and-logs)
- 3. [Production Hardening Features](#production-hardening-features)
+ 4. [Production Hardening Features](#production-hardening-features)
  - [Reliability & Resilience](#reliability--resilience)
  - [Observability & Monitoring](#observability--monitoring)
  - [Security & Governance](#security--governance)
- - [Performance Characteristics](#performance-characteristics)
- 4. [Architecture](#architecture)
- 5. [Getting Started](#getting-started)
- 6. [Configuration Reference](#configuration-reference)
- 7. [Runtime Operations](#runtime-operations)
+ 5. [Architecture](#architecture)
+ 6. [Getting Started](#getting-started)
+ 7. [Configuration Reference](#configuration-reference)
+ 8. [Runtime Operations](#runtime-operations)
  - [Launching the Proxy](#launching-the-proxy)
  - [Connecting Claude Code CLI](#connecting-claude-code-cli)
  - [Using Ollama Models](#using-ollama-models)
@@ -40,11 +40,12 @@
  - [Integrating MCP Servers](#integrating-mcp-servers)
  - [Health Checks & Monitoring](#health-checks--monitoring)
  - [Metrics & Observability](#metrics--observability)
- 8. [Manual Test Matrix](#manual-test-matrix)
- 9. [Troubleshooting](#troubleshooting)
- 10. [Roadmap & Known Gaps](#roadmap--known-gaps)
- 11. [FAQ](#faq)
- 12. [License](#license)
+ 9. [Manual Test Matrix](#manual-test-matrix)
+ 10. [Troubleshooting](#troubleshooting)
+ 11. [Roadmap & Known Gaps](#roadmap--known-gaps)
+ 12. [FAQ](#faq)
+ 13. [References](#references)
+ 14. [License](#license)

  ---

@@ -71,6 +72,105 @@ Further documentation and usage notes are available on [DeepWiki](https://deepwi

  ---

+ ## Supported Models & Providers
+
+ Lynkr supports multiple AI model providers, giving you flexibility in choosing the right model for your needs:
+
+ ### **Provider Options**
+
+ | Provider | Configuration | Models Available | Best For |
+ |----------|--------------|------------------|----------|
+ | **Databricks** (Default) | `MODEL_PROVIDER=databricks` | Claude Sonnet 4.5, Claude Opus 4.5 | Production use, enterprise deployment |
+ | **Azure OpenAI** | `MODEL_PROVIDER=azure-openai` | GPT-4o, GPT-4o-mini, GPT-5, o1, o3 | Azure integration, Microsoft ecosystem |
+ | **Azure Anthropic** | `MODEL_PROVIDER=azure-anthropic` | Claude Sonnet 4.5, Claude Opus 4.5 | Azure-hosted Claude models |
+ | **OpenRouter** | `MODEL_PROVIDER=openrouter` | 100+ models (GPT-4o, Claude, Gemini, Llama, etc.) | Model flexibility, cost optimization |
+ | **Ollama** (Local) | `MODEL_PROVIDER=ollama` | Llama 3.1, Qwen2.5, Mistral, CodeLlama | Local/offline use, privacy, no API costs |
+
+ ### **Recommended Models by Use Case**
+
+ #### **For Production Code Assistance**
+ - **Best**: Claude Sonnet 4.5 (via Databricks or Azure Anthropic)
+ - **Alternative**: GPT-4o (via Azure OpenAI or OpenRouter)
+ - **Budget**: GPT-4o-mini (via Azure OpenAI) or Claude Haiku (via OpenRouter)
+
+ #### **For Code Generation**
+ - **Best**: Claude Opus 4.5 (via Databricks or Azure Anthropic)
+ - **Alternative**: GPT-4o (via Azure OpenAI)
+ - **Local**: Qwen2.5-Coder 32B (via Ollama)
+
+ #### **For Fast Exploration**
+ - **Best**: Claude Haiku (via OpenRouter or Azure Anthropic)
+ - **Alternative**: GPT-4o-mini (via Azure OpenAI)
+ - **Local**: Llama 3.1 8B (via Ollama)
+
+ #### **For Cost Optimization**
+ - **Cheapest Cloud**: Amazon Nova models (via OpenRouter) – free tier available
+ - **Cheapest Local**: Ollama (any model) – completely free, runs on your hardware
+
+ ### **Azure OpenAI Specific Models**
+
+ When using `MODEL_PROVIDER=azure-openai`, you can deploy any of these models:
+
+ | Model | Deployment Name | Capabilities | Best For |
+ |-------|----------------|--------------|----------|
+ | **GPT-4o** | `gpt-4o` | Text, vision, function calling | General-purpose, multimodal tasks |
+ | **GPT-4o-mini** | `gpt-4o-mini` | Text, function calling | Fast responses, cost-effective |
+ | **GPT-5** | `gpt-5-chat` or custom | Advanced reasoning, longer context | Complex problem-solving |
+ | **o1-preview** | `o1-preview` | Deep reasoning, chain of thought | Mathematical, logic problems |
+ | **o3-mini** | `o3-mini` | Efficient reasoning | Fast reasoning tasks |
+
+ **Note**: Azure OpenAI deployment names are configurable via the `AZURE_OPENAI_DEPLOYMENT` environment variable.
+
+ ### **Ollama Model Recommendations**
+
+ For tool calling support (required for Claude Code CLI functionality):
+
+ ✅ **Recommended**:
+ - `llama3.1:8b` – Good balance of speed and capability
+ - `llama3.2` – Latest Llama model
+ - `qwen2.5:14b` – Strong reasoning (larger model needed; 7b struggles with tools)
+ - `mistral:7b-instruct` – Fast and capable
+
+ ❌ **Not Recommended for Tools**:
+ - `qwen2.5-coder` – Code-only, slow with tool calling
+ - `codellama` – Code-only, poor tool support
+
+ ### **Hybrid Routing (Ollama + Cloud Fallback)**
+
+ Lynkr supports intelligent hybrid routing for cost optimization:
+
+ ```bash
+ # Use Ollama for simple tasks, fall back to cloud for complex ones
+ PREFER_OLLAMA=true
+ FALLBACK_ENABLED=true
+ FALLBACK_PROVIDER=databricks # or azure-openai, openrouter, azure-anthropic
+ ```
+
+ **How it works**:
+ - Requests with few/no tools → Ollama (free, local)
+ - Requests with many tools → Cloud provider (more capable)
+ - Ollama failures → Automatic fallback to cloud
+
+ **Routing Logic**:
+ - 0-2 tools: Ollama
+ - 3-15 tools: OpenRouter or Azure OpenAI (if configured)
+ - 16+ tools: Databricks or Azure Anthropic (most capable)
+
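The routing thresholds above can be sketched as a small decision function. This is an illustrative sketch only — `pickProvider` and the exact precedence between configured providers are hypothetical, not Lynkr's actual routing code:

```javascript
// Hypothetical sketch of the tool-count routing heuristic described above.
// Provider names mirror the README tables; the function itself is illustrative.
function pickProvider(toolCount, env) {
  if (toolCount <= 2 && env.PREFER_OLLAMA === "true") {
    return "ollama"; // few/no tools → free local model
  }
  if (toolCount <= 15) {
    // mid-size tool sets → lighter cloud provider when one is configured
    return env.OPENROUTER_API_KEY ? "openrouter" : "azure-openai";
  }
  return env.FALLBACK_PROVIDER || "databricks"; // 16+ tools → most capable
}

console.log(pickProvider(0, { PREFER_OLLAMA: "true" }));   // "ollama"
console.log(pickProvider(8, { OPENROUTER_API_KEY: "x" })); // "openrouter"
console.log(pickProvider(20, {}));                         // "databricks"
```

The real router also has to handle Ollama failures (automatic cloud fallback), which a sketch like this omits.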
+ ### **Provider Comparison**
+
+ | Feature | Databricks | Azure OpenAI | Azure Anthropic | OpenRouter | Ollama |
+ |---------|-----------|--------------|-----------------|------------|--------|
+ | **Setup Complexity** | Medium | Medium | Medium | Easy | Easy |
+ | **Cost** | $$$ | $$ | $$$ | $ | Free |
+ | **Latency** | Low | Low | Low | Medium | Very Low |
+ | **Tool Calling** | Excellent | Excellent | Excellent | Good | Fair |
+ | **Context Length** | 200K | 128K | 200K | Varies | 32K-128K |
+ | **Streaming** | Yes | Yes | Yes | Yes | Yes |
+ | **Privacy** | Enterprise | Enterprise | Enterprise | Third-party | Local |
+ | **Offline** | No | No | No | No | Yes |
+
+ ---
+
  ## Core Capabilities

  ### Repo Intelligence & Navigation
@@ -96,7 +196,15 @@ Further documentation and usage notes are available on [DeepWiki](https://deepwi

  ### Execution & Tooling

- - Tool execution pipeline sandboxes or runs tools in the host workspace based on policy.
+ - **Flexible tool execution modes**: Configure where tools execute via `TOOL_EXECUTION_MODE`:
+   - `server` (default) – Tools run on the proxy server where Lynkr is hosted
+   - `client`/`passthrough` – Tools execute on the Claude Code CLI side, enabling local file operations and commands on the client machine
+ - **Client-side tool execution** – When in passthrough mode, the proxy returns Anthropic-formatted `tool_use` blocks to the CLI, which executes them locally and sends back `tool_result` blocks. This enables:
+   - File operations on the CLI user's local filesystem
+   - Local command execution in the user's environment
+   - Access to local credentials and SSH keys
+   - Integration with local development tools
+ - Tool execution pipeline sandboxes or runs tools in the host workspace based on policy (server mode).
  - MCP sandbox orchestration (Docker runtime by default) optionally isolates external tools with mount and permission controls.
  - Automated testing harness exposes `workspace_test_run`, `workspace_test_history`, and `workspace_test_summary`.
  - Prompt caching reduces repeated token usage for iterative conversations.
@@ -213,43 +321,7 @@ Lynkr includes comprehensive production-ready features designed for reliability,
  - Cost tracking and budget exhaustion handling
  - Request-level cost attribution

- ### Performance Characteristics
-
- #### **Benchmark Results**
- Based on comprehensive performance testing with 100,000+ operations:
-
- | Component | Throughput | Latency | Overhead |
- |-----------|------------|---------|----------|
- | Baseline (no-op) | 21.3M ops/sec | 0.00005ms | - |
- | Metrics Collection | 4.7M ops/sec | 0.0002ms | 0.15ms |
- | Load Shedding Check | 7.6M ops/sec | 0.0001ms | 0.08ms |
- | Circuit Breaker | 4.3M ops/sec | 0.0002ms | 0.18ms |
- | Input Validation (simple) | 5.8M ops/sec | 0.0002ms | 0.12ms |
- | Input Validation (complex) | 890K ops/sec | 0.0011ms | 0.96ms |
- | Combined Middleware Stack | 140K ops/sec | 0.0071ms | 7.1μs |
-
- **Overall Performance Rating:** ⭐ **EXCELLENT**
- - Total middleware overhead: **7.1 microseconds** per request
- - Throughput: **140,000 requests/second**
- - Memory overhead: **~4MB** for typical workload
-
- #### **Production Deployment Metrics**
- - **Test Coverage:** 80 comprehensive tests with 100% pass rate
- - **Feature Completeness:** 14/14 production features implemented
- - **Zero-downtime Deployments:** Supported via graceful shutdown
- - **Horizontal Scaling:** Stateless design enables unlimited horizontal scaling
- - **Vertical Scaling:** Efficient resource usage supports high request volumes
-
- #### **Scalability Profile**
- - Single instance handles 140K req/sec under test conditions
- - Linear scaling with additional instances (no shared state)
- - Memory usage: ~100MB baseline + ~4MB per 10K active requests
- - CPU usage: <5% per core at moderate load
- - Network: Limited by backend API latency, not proxy overhead
-
- For detailed performance analysis, benchmarks, and deployment guidance, see [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md).

- ---

  ## Architecture

@@ -593,6 +665,7 @@ See https://openrouter.ai/models for the complete list with pricing.
  | `PROMPT_CACHE_ENABLED` | Toggle the prompt cache system. | `true` |
  | `PROMPT_CACHE_TTL_MS` | Milliseconds before cached prompts expire. | `300000` (5 minutes) |
  | `PROMPT_CACHE_MAX_ENTRIES` | Maximum number of cached prompts retained. | `64` |
+ | `TOOL_EXECUTION_MODE` | Controls where tools execute: `server` (default, tools run on proxy server), `client`/`passthrough` (tools execute on Claude Code CLI side). | `server` |
  | `POLICY_MAX_STEPS` | Max agent loop iterations before timeout. | `8` |
  | `POLICY_GIT_ALLOW_PUSH` | Allow/disallow `workspace_git_push`. | `false` |
  | `POLICY_GIT_REQUIRE_TESTS` | Enforce passing tests before `workspace_git_commit`. | `false` |
@@ -690,12 +763,6 @@ Lynkr works with any Ollama model. Popular choices:
  - **mistral:latest** – Fast, efficient model (7B parameters, 4.1GB)
  - **codellama:latest** – Meta's code-focused model (7B-34B variants)

- **Performance Characteristics:**
-
- - **Latency**: ~100-500ms first token (depending on model size and hardware)
- - **Throughput**: ~20-50 tokens/sec on M1/M2 Macs, ~10-30 tokens/sec on typical CPUs
- - **Memory**: 8GB RAM minimum recommended for 7B models, 16GB for 13B models
- - **Disk**: 4-10GB per model (quantized)

  **Ollama Health Check:**

@@ -716,7 +783,6 @@ Lynkr now supports **native tool calling** for compatible Ollama models:
  - ✅ **Format conversion**: Transparent conversion between Anthropic and Ollama tool formats
  - ❌ **Unsupported models**: llama3, older models (tools are filtered out automatically)

- See [OLLAMA-TOOL-CALLING.md](OLLAMA-TOOL-CALLING.md) for implementation details.

  **Limitations:**

@@ -885,8 +951,6 @@ npm start
  | **Cost per simple request** | $0.002-0.005 | $0.00 | 100% savings 💰 |
  | **Fallback latency** | N/A | <100ms | Transparent to user |

- See [HYBRID-ROUTING-ANALYSIS.md](HYBRID-ROUTING-ANALYSIS.md) for detailed performance analysis.
-
  ### Using Built-in Workspace Tools

  You can call tools programmatically via HTTP:
@@ -913,6 +977,69 @@ curl http://localhost:8080/v1/messages \

  Tool responses appear in the assistant content block with structured JSON.

+ ### Client-Side Tool Execution (Passthrough Mode)
+
+ Lynkr supports **client-side tool execution**, where tools execute on the Claude Code CLI machine instead of the proxy server. This enables local file operations, commands, and access to local resources.
+
+ **Enable client-side execution:**
+
+ ```bash
+ # Set in .env or export before starting
+ export TOOL_EXECUTION_MODE=client
+ npm start
+ ```
+
+ **How it works:**
+
+ 1. **Model generates tool calls** – Databricks/OpenRouter/Ollama model returns tool calls
+ 2. **Proxy converts to Anthropic format** – Tool calls converted to `tool_use` blocks
+ 3. **CLI executes tools locally** – Claude Code CLI receives `tool_use` blocks and runs them on the user's machine
+ 4. **CLI sends results back** – Tool results sent back to proxy in next request as `tool_result` blocks
+ 5. **Conversation continues** – Proxy forwards the complete conversation (including tool results) back to the model
+
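The round trip above can be illustrated with a small helper that turns an assistant message's `tool_use` blocks into the `tool_result` blocks the CLI sends back. Only the block shapes (`tool_use`, `tool_result`, `tool_use_id`, `stop_reason`) come from the Anthropic message format shown here; the `toolResultsFor` helper itself is hypothetical, not Lynkr or Claude Code source:

```javascript
// Hypothetical client-side helper: map each tool_use block in an assistant
// message to the tool_result block that gets sent back in the next request.
function toolResultsFor(message, runTool) {
  return message.content
    .filter((block) => block.type === "tool_use")
    .map((block) => ({
      type: "tool_result",
      tool_use_id: block.id,                       // ties result to its call
      content: runTool(block.name, block.input),   // executes on this machine
    }));
}

// Using the Write tool_use block from the example response above:
const msg = {
  stop_reason: "tool_use",
  content: [
    { type: "text", text: "I'll create that file for you." },
    { type: "tool_use", id: "toolu_abc", name: "Write",
      input: { file_path: "/tmp/test.txt", content: "Hello World" } },
  ],
};
const results = toolResultsFor(msg, () => "ok");
// results → [{ type: "tool_result", tool_use_id: "toolu_abc", content: "ok" }]
```

In a real client loop these results would be appended as a `user` message and the whole conversation resent to the proxy, repeating until `stop_reason` is no longer `tool_use`.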
+ **Example response in passthrough mode:**
+
+ ```json
+ {
+   "id": "msg_123",
+   "type": "message",
+   "role": "assistant",
+   "content": [
+     {
+       "type": "text",
+       "text": "I'll create that file for you."
+     },
+     {
+       "type": "tool_use",
+       "id": "toolu_abc",
+       "name": "Write",
+       "input": {
+         "file_path": "/tmp/test.txt",
+         "content": "Hello World"
+       }
+     }
+   ],
+   "stop_reason": "tool_use"
+ }
+ ```
+
+ **Benefits:**
+ - ✅ Tools execute on CLI user's local filesystem
+ - ✅ Access to local credentials, SSH keys, environment variables
+ - ✅ Integration with local development tools (git, npm, docker, etc.)
+ - ✅ Reduced network latency for file operations
+ - ✅ Server doesn't need filesystem access or permissions
+
+ **Use cases:**
+ - Remote proxy server, local CLI execution
+ - Multi-user environments where each user needs their own workspace
+ - Security-sensitive environments where server shouldn't access user files
+
+ **Supported modes:**
+ - `TOOL_EXECUTION_MODE=server` – Tools run on proxy server (default)
+ - `TOOL_EXECUTION_MODE=client` – Tools run on CLI side
+ - `TOOL_EXECUTION_MODE=passthrough` – Alias for `client`
+
  ### Working with Prompt Caching

  - Set `PROMPT_CACHE_ENABLED=true` (default) to activate the cache.
@@ -1205,11 +1332,21 @@ Replace `<workspace>` and `<endpoint-name>` with your Databricks workspace host
  - **Claude CLI prompts for missing tools** – Verify `tools` array in the client request lists the functions you expect. The proxy only exposes registered handlers.
  - **Dynamic finance pages return stale data** – `web_fetch` fetches static HTML only. Use an API endpoint (e.g. Yahoo Finance chart JSON) or the Databricks-hosted tooling if you need rendered values from heavily scripted pages.

+ ### OpenRouter Issues
+
+ - **"No choices in OpenRouter response" errors** – OpenRouter sometimes returns error responses (rate limits, model unavailable) with JSON but no `choices` array. As of the latest update, Lynkr gracefully handles these errors and returns proper error responses instead of crashing. Check logs for "OpenRouter response missing choices array" warnings to see the full error details.
+ - **Multi-prompt behavior with certain models** – Some OpenRouter models (particularly open-source models like `openai/gpt-oss-120b`) may be overly cautious and ask for confirmation multiple times before executing tools. This is model-specific behavior. Consider switching to:
+   - `anthropic/claude-3.5-sonnet` – More decisive tool execution
+   - `openai/gpt-4o` or `openai/gpt-4o-mini` – Better tool calling behavior
+   - The Databricks provider with Claude models for optimal tool execution
+ - **Rate limit errors** – OpenRouter applies per-model rate limits. If you hit limits frequently, check your OpenRouter dashboard for current usage and consider upgrading your plan or spreading requests across multiple models.
+
  ### Production Hardening Issues

  - **503 Service Unavailable errors during normal load** – Check load shedding thresholds (`LOAD_SHEDDING_*`). Lower values may trigger too aggressively. Check `/metrics/observability` for memory usage patterns.
  - **Circuit breaker stuck in OPEN state** – Check `/metrics/circuit-breakers` to see failure counts. Verify backend service (Databricks/Azure) is accessible. Circuit will automatically attempt recovery after `CIRCUIT_BREAKER_TIMEOUT` (default: 60s).
  - **"Circuit breaker is OPEN" errors** – The circuit breaker detected too many failures and is protecting against cascading failures. Wait for timeout or fix the underlying issue. Check logs for root cause of failures.
+ - **Azure OpenAI specific**: If using Azure OpenAI and seeing circuit breaker errors, verify your `AZURE_OPENAI_ENDPOINT` includes the full path (including `/openai/deployments/YOUR-DEPLOYMENT/chat/completions`). Missing endpoint variable or undefined returns can trigger circuit breaker protection.
  - **High latency after adding production features** – This is unexpected; middleware adds only ~7μs overhead. Check `/metrics/prometheus` for actual latency distribution. Verify network latency to backend services.
  - **Health check endpoint returns 503 but service seems healthy** – Check individual health check components in the response JSON. Database connectivity or memory issues may trigger this. Review logs for specific health check failures.
  - **Metrics endpoint shows incorrect data** – Metrics are in-memory and reset on restart. For persistent metrics, configure Prometheus scraping. Check that `METRICS_ENABLED=true`.
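The Azure OpenAI endpoint requirement from the troubleshooting list above can be sanity-checked at startup. The helper below is a hypothetical illustration of the expected URL shape, not part of Lynkr:

```javascript
// Illustrative check that AZURE_OPENAI_ENDPOINT contains the full
// deployments path described in the troubleshooting note above.
// The helper name and regex are assumptions for this sketch.
function looksLikeFullAzureEndpoint(url) {
  return /\/openai\/deployments\/[^/]+\/chat\/completions/.test(url || "");
}

console.log(looksLikeFullAzureEndpoint(
  "https://myres.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01"
)); // true
console.log(looksLikeFullAzureEndpoint("https://myres.openai.azure.com")); // false
```

A check like this can fail fast with a clear message instead of letting a malformed endpoint surface later as circuit-breaker errors.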
@@ -1245,9 +1382,9 @@ If performance is degraded:

  ## Roadmap & Known Gaps

- ### ✅ Recently Completed (Production Hardening)
+ ### ✅ Recently Completed

- All 14 production hardening features have been implemented and tested with 100% pass rate:
+ **Production Hardening (All 14 features implemented with 100% pass rate):**
  - ✅ Exponential backoff with jitter retry logic
  - ✅ Circuit breaker pattern for external services
  - ✅ Load shedding with resource monitoring
@@ -1263,7 +1400,11 @@ All 14 production hardening features have been implemented and tested with 100%
  - ✅ Rate limiting capabilities
  - ✅ Safe command DSL

- Performance verified: 7.1μs overhead, 140K req/sec throughput. See [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md) for details.
+
+ **Latest Features (December 2025):**
+ - ✅ **Client-side tool execution** (`TOOL_EXECUTION_MODE=client/passthrough`) – Tools can now execute on the Claude Code CLI side instead of the server, enabling local file operations, local commands, and access to local credentials
+ - ✅ **OpenRouter error resilience** – Graceful handling of malformed OpenRouter responses (missing `choices` array), preventing crashes during rate limits or service errors
+ - ✅ **Enhanced format conversion** – Improved Anthropic ↔ OpenRouter format conversion for tool calls, ensuring proper `tool_use` block generation and session consistency across providers

  ### 🔮 Future Enhancements

@@ -1380,32 +1521,35 @@ A: Lynkr collects request counts, error rates, latency percentiles (p50, p95, p9
  - `/metrics/circuit-breakers` - Circuit breaker state

  **Q: Is Lynkr production-ready?**
- A: Yes. Excellent performance (140K req/sec), and comprehensive observability, Lynkr is designed for production deployments. It supports:
+ A: Yes. With excellent performance and comprehensive observability, Lynkr is designed for production deployments. It supports:
  - Zero-downtime deployments (graceful shutdown)
  - Kubernetes integration (health checks, metrics)
  - Horizontal scaling (stateless design)
  - Enterprise monitoring (Prometheus, Grafana)

- **Q: What's the performance impact of production features?**
- A: Minimal. Comprehensive benchmarking shows:
- - Total middleware overhead: 7.1 microseconds per request
- - Throughput: 140,000 requests/second
- - Memory overhead: ~4MB for typical workload

- This is considered "EXCELLENT" performance - the overhead is negligible compared to network and API latency.

  **Q: How do I deploy Lynkr to Kubernetes?**
  A: Use the included Kubernetes configurations and Docker support. Key steps:
  1. Build Docker image: `docker build -t lynkr .`
  2. Configure environment variables in Kubernetes secrets
- 3. Deploy with health checks (see examples in [PERFORMANCE-REPORT.md](PERFORMANCE-REPORT.md))
- 4. Configure Prometheus scraping for metrics
- 5. Set up Grafana dashboards for visualization
+ 3. Configure Prometheus scraping for metrics
+ 4. Set up Grafana dashboards for visualization

  The graceful shutdown and health check endpoints ensure zero-downtime deployments.

  ---

+ ## References
+
+ Lynkr's design also includes an ACE framework, informed by research in agentic AI systems and context engineering:
+
+ - **Zhang et al. (2025)**. *Agentic Context Engineering*. arXiv:2510.04618. [arXiv](https://arxiv.org/abs/2510.04618)
+
+ For BibTeX citations, see [CITATIONS.bib](CITATIONS.bib).
+
+ ---
+
  ## License

  MIT License. See [LICENSE](LICENSE) for details.
package/docs/index.md CHANGED
@@ -73,10 +73,13 @@ Commit, push, diff, stage, generate release notes, etc.
  ### ✔ Prompt Caching (LRU + TTL)
  Reuses identical prompts to reduce cost + latency.

- ### ✔ Workspace Tools
+ ### ✔ Workspace Tools
  Task tracker, file I/O, test runner, index rebuild, etc.

- ### ✔ Fully extensible Node.js architecture
+ ### ✔ Client-Side Tool Execution (Passthrough Mode)
+ Tools can execute on the Claude Code CLI side instead of the server, enabling local file operations and commands.
+
+ ### ✔ Fully extensible Node.js architecture
  Add custom tools, policies, or backend adapters.

  ---
@@ -92,6 +95,7 @@ Add custom tools, policies, or backend adapters.
  - [Prompt Caching](#-prompt-caching)
  - [MCP (Model Context Protocol) Integration](#-model-context-protocol-mcp)
  - [Git Tools](#-git-tools)
+ - [Client-Side Tool Execution (Passthrough Mode)](#-client-side-tool-execution-passthrough-mode)
  - [API Examples](#-api-examples)
  - [Roadmap](#-roadmap)
  - [Links](#-links)
@@ -363,6 +367,47 @@ Example:

  ---

+ # 🔄 Client-Side Tool Execution (Passthrough Mode)
+
+ Lynkr supports **client-side tool execution**, enabling tools to execute on the Claude Code CLI machine instead of the proxy server.
+
+ **Enable passthrough mode:**
+
+ ```bash
+ export TOOL_EXECUTION_MODE=client
+ npm start
+ ```
+
+ **How it works:**
+
+ 1. Model generates tool calls (from Databricks/OpenRouter/Ollama)
+ 2. Proxy converts to Anthropic format with `tool_use` blocks
+ 3. Claude Code CLI receives `tool_use` blocks and executes locally
+ 4. CLI sends `tool_result` blocks back in the next request
+ 5. Proxy forwards complete conversation back to the model
+
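The mode flag driving this flow can be normalized with a tiny helper. This sketch is hypothetical (Lynkr's actual config parsing may differ); it only encodes the alias rule documented in this section — `passthrough` is an alias for `client`, and anything unset falls back to `server`:

```javascript
// Hypothetical normalization of TOOL_EXECUTION_MODE, per the config notes:
// "passthrough" aliases "client"; unknown/unset values default to "server".
function resolveToolExecutionMode(raw) {
  const mode = String(raw || "server").toLowerCase();
  if (mode === "client" || mode === "passthrough") return "client";
  return "server";
}

console.log(resolveToolExecutionMode("passthrough")); // "client"
console.log(resolveToolExecutionMode(undefined));     // "server"
```

Collapsing the alias early keeps the rest of the proxy code to a single two-way branch (server vs. client execution).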
+ **Benefits:**
+
+ * ✅ Local filesystem access on CLI user's machine
+ * ✅ Local credentials, SSH keys, environment variables
+ * ✅ Integration with local dev tools (git, npm, docker)
+ * ✅ Reduced network latency for file operations
+ * ✅ Server doesn't need filesystem permissions
+
+ **Use cases:**
+
+ * Remote proxy server with local CLI execution
+ * Multi-user environments where each needs their own workspace
+ * Security-sensitive setups where server shouldn't access user files
+
+ **Configuration:**
+
+ * `TOOL_EXECUTION_MODE=server` – Tools run on proxy (default)
+ * `TOOL_EXECUTION_MODE=client` – Tools run on CLI side
+ * `TOOL_EXECUTION_MODE=passthrough` – Alias for client mode
+
+ ---
+
  # 🧪 API Example (Index Rebuild)

  ```bash
@@ -382,6 +427,14 @@ curl http://localhost:8080/v1/messages \

  # 🛣 Roadmap

+ ## ✅ Recently Completed (December 2025)
+
+ * **Client-side tool execution** (`TOOL_EXECUTION_MODE=client/passthrough`) – Tools can execute on the Claude Code CLI side, enabling local file operations, commands, and access to local credentials
+ * **OpenRouter error resilience** – Graceful handling of malformed OpenRouter responses, preventing crashes during rate limits or service errors
+ * **Enhanced format conversion** – Improved Anthropic ↔ OpenRouter format conversion for tool calls with proper `tool_use` block generation
+
+ ## 🔮 Future Features
+
  * LSP integration (TypeScript, Python, more languages)
  * Per-file diff comments
  * Risk scoring for Git diffs
package/monitor-agents.sh ADDED
@@ -0,0 +1,31 @@
+ #!/bin/bash
+
+ # Monitor agent activity in real-time
+
+ echo "🔍 Monitoring Agent Activity"
+ echo "=============================="
+ echo ""
+
+ while true; do
+   clear
+   echo "🔍 Agent Statistics (refreshing every 3s)"
+   echo "=========================================="
+   echo ""
+
+   # Get stats
+   curl -s http://localhost:8080/v1/agents/stats | jq -r '.stats[] |
+     "Agent: \(.agent_type)
+     Executions: \(.total_executions) (\(.completed) completed, \(.failed) failed)
+     Avg Duration: \(.avg_duration_ms)ms
+     Tokens: \(.total_input_tokens) in / \(.total_output_tokens) out
+     "' || echo "Proxy not responding..."
+
+   echo ""
+   echo "Latest transcripts:"
+   ls -lt data/agent-transcripts/*.jsonl 2>/dev/null | head -3 || echo "No transcripts yet"
+
+   echo ""
+   echo "Press Ctrl+C to stop monitoring"
+
+   sleep 3
+ done
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
    "name": "lynkr",
-   "version": "1.0.0",
-   "description": "Self-hosted Claude Code proxy with Databricks and Azure Anthropic adapters, workspace tooling, and MCP integration.",
+   "version": "2.0.0",
+   "description": "Self-hosted Claude Code proxy with Databricks and Azure adapters, workspace tooling, and MCP integration.",
    "main": "index.js",
    "bin": {
      "lynkr": "./bin/cli.js",
@@ -12,7 +12,8 @@
    "dev": "nodemon index.js",
    "lint": "eslint src index.js",
    "test": "npm run test:unit && npm run test:performance",
-   "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js",
+   "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js",
+   "test:new-features": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js",
    "test:performance": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/hybrid-routing-performance.test.js && DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-tests.js",
    "test:benchmark": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-benchmark.js",
    "test:quick": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js",
@@ -40,6 +41,7 @@
      "node": ">=20.0.0"
    },
    "dependencies": {
+     "@azure/openai": "^2.0.0",
      "better-sqlite3": "^9.4.0",
      "compression": "^1.7.4",
      "diff": "^5.2.0",
@@ -47,6 +49,8 @@
      "express": "^5.1.0",
      "express-rate-limit": "^8.2.1",
      "fast-glob": "^3.3.2",
+     "js-yaml": "^4.1.1",
+     "openai": "^6.14.0",
      "pino": "^8.17.2",
      "pino-http": "^8.6.0",
      "tree-sitter": "^0.20.1",