lynkr 4.1.0 → 4.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LYNKR-TUI-PLAN.md +984 -0
- package/README.md +0 -117
- package/bin/cli.js +6 -0
- package/docs/index.md +78 -790
- package/package.json +1 -1
- package/src/api/openai-router.js +187 -0
- package/src/api/router.js +172 -22
- package/src/clients/databricks.js +82 -7
- package/src/clients/openai-format.js +11 -9
- package/src/clients/openrouter-utils.js +15 -5
- package/src/clients/responses-format.js +214 -0
- package/src/clients/standard-tools.js +4 -4
- package/src/orchestrator/index.js +32 -0
- package/README.md.backup +0 -2996
- package/docs/LOCAL_EMBEDDINGS_PLAN.md +0 -1024
- package/lynkr-0.1.1.tgz +0 -0
package/README.md
CHANGED
@@ -32,123 +32,6 @@ Lynkr is a **self-hosted proxy server** that unlocks Claude Code CLI and Cursor

---

## 💰 Cost Savings

Lynkr reduces AI costs by **60-80%** through intelligent token optimization:

### Real-World Savings Example

**Scenario:** 100,000 API requests/month, 50k input tokens, 2k output tokens per request

| Provider | Without Lynkr | With Lynkr | **Monthly Savings** | **Annual Savings** |
|----------|---------------|------------|---------------------|--------------------|
| **Claude Sonnet 4.5** (Databricks) | $16,000 | $6,400 | **$9,600** | **$115,200** |
| **GPT-4o** (OpenRouter) | $12,000 | $4,800 | **$7,200** | **$86,400** |
| **Ollama** (Local) | API costs | **$0** | **$12,000+** | **$144,000+** |

### How We Achieve 60-80% Cost Reduction

**6 Token Optimization Phases:**

1. **Smart Tool Selection** (50-70% reduction)
   - Filters tools based on request type
   - Chat queries don't get file/git tools
   - Only sends relevant tools to the model

2. **Prompt Caching** (30-45% reduction)
   - Caches repeated prompts and system messages
   - Reuses context across conversations
   - Reduces redundant token usage

3. **Memory Deduplication** (20-30% reduction)
   - Removes duplicate conversation context
   - Compresses historical messages
   - Eliminates redundant information

4. **Tool Response Truncation** (15-25% reduction)
   - Truncates long tool outputs intelligently
   - Keeps only relevant portions
   - Reduces tool result tokens

5. **Dynamic System Prompts** (10-20% reduction)
   - Adapts prompts to request complexity
   - Shorter prompts for simple queries
   - Full prompts only when needed

6. **Conversation Compression** (15-25% reduction)
   - Summarizes old conversation turns
   - Keeps recent context detailed
   - Archives historical context

📖 **[Detailed Token Optimization Guide](documentation/token-optimization.md)**

---
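Phase 1 (smart tool selection) can be pictured with a sketch like the following. The request classifier, tool groups, and function names here are illustrative assumptions, not Lynkr's actual implementation:

```javascript
// Illustrative sketch of smart tool selection (phase 1): classify the
// request, then send the model only the tool schemas that group needs.
// A plain chat query pays zero tokens for file/git tool definitions.
// TOOL_GROUPS, classifyRequest, and selectTools are hypothetical names.
const TOOL_GROUPS = {
  chat: [],
  file_edit: ['read_file', 'write_file'],
  git: ['git_status', 'git_diff'],
};

function classifyRequest(messages) {
  const text = messages.map((m) => m.content).join(' ').toLowerCase();
  if (/\b(commit|branch|merge|rebase)\b/.test(text)) return 'git';
  if (/\b(file|edit|refactor|write)\b/.test(text)) return 'file_edit';
  return 'chat';
}

function selectTools(allTools, messages) {
  const allowed = new Set(TOOL_GROUPS[classifyRequest(messages)]);
  return allTools.filter((t) => allowed.has(t.name));
}
```

Because tool definitions are resent on every request, dropping unused schemas from each call is plausibly where the bulk of the claimed 50-70% reduction comes from.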

## 🚀 Key Features

### Multi-Provider Support (9+ Providers)
- ✅ **Cloud Providers:** Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
- ✅ **Local Providers:** Ollama (free), llama.cpp (free), LM Studio (free)
- ✅ **Hybrid Routing:** Automatically route between local (fast/free) and cloud (powerful) based on complexity
- ✅ **Automatic Fallback:** Transparent failover if primary provider is unavailable
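The hybrid-routing idea above (local for simple requests, cloud for complex ones) can be sketched as a small complexity heuristic. The scoring, thresholds, and provider/model names are illustrative assumptions, not Lynkr's actual routing logic:

```javascript
// Hypothetical sketch of complexity-based hybrid routing: cheap signals
// (request length, code-related intent) pick between a free local model
// and a more capable cloud model.
function estimateComplexity(messages) {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  const wantsCode = messages.some((m) => /```|refactor|implement/.test(m.content));
  return (wantsCode ? 2 : 0) + (chars > 2000 ? 2 : chars > 500 ? 1 : 0);
}

function pickProvider(messages) {
  // Score >= 2 goes to the cloud; otherwise the local model is good enough.
  return estimateComplexity(messages) >= 2
    ? { provider: 'databricks', model: 'claude-sonnet-4-5' }
    : { provider: 'ollama', model: 'qwen2.5-coder' };
}
```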
### Cost Optimization
- 💰 **60-80% Token Reduction** - 6-phase optimization pipeline
- 💰 **$77k-$115k Annual Savings** - For typical enterprise usage (100k requests/month)
- 💰 **100% FREE Option** - Run completely locally with Ollama or llama.cpp
- 💰 **Hybrid Routing** - 65-100% cost savings by using local models for simple requests

### Privacy & Security
- 🔒 **100% Local Operation** - Run completely offline with Ollama/llama.cpp
- 🔒 **Air-Gapped Deployments** - No internet required for local providers
- 🔒 **Self-Hosted** - Full control over your data and infrastructure
- 🔒 **Local Embeddings** - Private @Codebase search with Ollama/llama.cpp
- 🔒 **Policy Enforcement** - Git restrictions, test requirements, web fetch controls
- 🔒 **Sandboxing** - Optional Docker isolation for MCP tools
### Enterprise Features
- 🏢 **Production-Ready** - Circuit breakers, load shedding, graceful shutdown
- 🏢 **Observability** - Prometheus metrics, structured logging, health checks
- 🏢 **Kubernetes-Ready** - Liveness, readiness, startup probes
- 🏢 **High Performance** - ~7µs overhead, 140K req/sec throughput
- 🏢 **Reliability** - Exponential backoff, automatic retries, error resilience
- 🏢 **Scalability** - Horizontal scaling, connection pooling, load balancing
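Of the reliability features listed above, retry with exponential backoff is the easiest to sketch. The delay schedule and defaults here are illustrative, not Lynkr's actual settings:

```javascript
// Hypothetical sketch of retry with exponential backoff for provider
// calls: each failed attempt doubles the wait before the next try,
// plus random jitter so many clients don't retry in lockstep.
async function withRetry(fn, { retries = 3, baseMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: surface the error
      const delay = baseMs * 2 ** attempt + Math.random() * 100; // 200, 400, 800ms...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```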
### IDE Integration
- ✅ **Claude Code CLI** - Drop-in replacement for Anthropic backend
- ✅ **Cursor IDE** - Full OpenAI API compatibility
- ✅ **Continue.dev** - Works with any OpenAI-compatible client
- ✅ **All Features Work** - Chat, file operations, tool calling, streaming
### Advanced Capabilities
- 🧠 **Long-Term Memory** - Titans-inspired memory system with surprise-based filtering
- 🧠 **Semantic Memory** - FTS5 search with multi-signal retrieval (recency, importance, relevance)
- 🧠 **Automatic Extraction** - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
- 🧠 **MCP Integration** - Automatic Model Context Protocol server discovery
- 🧠 **Tool Calling** - Full tool support with server and client execution modes
- 🧠 **Custom Tools** - Easy integration of custom tool implementations
- 📊 **Embeddings Support** - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- 📊 **Token Tracking** - Real-time usage monitoring and cost attribution
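Multi-signal retrieval as described above could blend the three signals into one ranking score. The weights, decay constant, and field names below are illustrative assumptions, not Lynkr's actual schema:

```javascript
// Hypothetical sketch of multi-signal memory retrieval: each stored
// memory gets a weighted score from recency (exponential decay),
// importance, and relevance, and the top-scoring entries are returned.
function scoreMemory(mem, now = Date.now()) {
  const ageDays = (now - mem.createdAt) / 86_400_000;
  const recency = Math.exp(-ageDays / 30); // decays over roughly a month
  return 0.3 * recency + 0.3 * mem.importance + 0.4 * mem.relevance;
}

function retrieveMemories(memories, limit = 5) {
  return [...memories]
    .sort((a, b) => scoreMemory(b) - scoreMemory(a))
    .slice(0, limit);
}
```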
### Developer Experience
- 🎯 **Zero Code Changes** - Works with existing Claude Code CLI/Cursor setups
- 🎯 **Hot Reload** - Development mode with auto-restart
- 🎯 **Comprehensive Logging** - Structured logs with request ID correlation
- 🎯 **Easy Configuration** - Environment variables or .env file
- 🎯 **Docker Support** - docker-compose with GPU support
- 🎯 **400+ Tests** - Comprehensive test coverage for reliability
### Streaming & Performance
- ⚡ **Real-Time Streaming** - Token-by-token streaming for all providers
- ⚡ **Low Latency** - Minimal overhead (~7µs per request)
- ⚡ **High Throughput** - 140K requests/second capacity
- ⚡ **Connection Pooling** - Efficient connection reuse
- ⚡ **Prompt Caching** - LRU cache with SHA-256 keying
📖 **[Complete Feature Documentation](documentation/features.md)**

---

## Quick Start

### Installation