lynkr 4.1.0 → 4.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -32,123 +32,6 @@ Lynkr is a **self-hosted proxy server** that unlocks Claude Code CLI and Cursor
 
  ---
 
- ## 💰 Cost Savings
-
- Lynkr reduces AI costs by **60-80%** through intelligent token optimization:
-
- ### Real-World Savings Example
-
- **Scenario:** 100,000 API requests/month, 50k input tokens, 2k output tokens per request
-
- | Provider | Without Lynkr | With Lynkr | **Monthly Savings** | **Annual Savings** |
- |----------|---------------|------------|---------------------|-------------------|
- | **Claude Sonnet 4.5** (Databricks) | $16,000 | $6,400 | **$9,600** | **$115,200** |
- | **GPT-4o** (OpenRouter) | $12,000 | $4,800 | **$7,200** | **$86,400** |
- | **Ollama** (Local) | API costs | **$0** | **$12,000+** | **$144,000+** |
-
- ### How We Achieve 60-80% Cost Reduction
-
- **6 Token Optimization Phases:**
-
- 1. **Smart Tool Selection** (50-70% reduction)
-    - Filters tools based on request type
-    - Chat queries don't get file/git tools
-    - Only sends relevant tools to model
-
- 2. **Prompt Caching** (30-45% reduction)
-    - Caches repeated prompts and system messages
-    - Reuses context across conversations
-    - Reduces redundant token usage
-
- 3. **Memory Deduplication** (20-30% reduction)
-    - Removes duplicate conversation context
-    - Compresses historical messages
-    - Eliminates redundant information
-
- 4. **Tool Response Truncation** (15-25% reduction)
-    - Truncates long tool outputs intelligently
-    - Keeps only relevant portions
-    - Reduces tool result tokens
-
- 5. **Dynamic System Prompts** (10-20% reduction)
-    - Adapts prompts to request complexity
-    - Shorter prompts for simple queries
-    - Full prompts only when needed
-
- 6. **Conversation Compression** (15-25% reduction)
-    - Summarizes old conversation turns
-    - Keeps recent context detailed
-    - Archives historical context
-
- 📖 **[Detailed Token Optimization Guide](documentation/token-optimization.md)**
-
- ---
-
- ## 🚀 Key Features
-
- ### Multi-Provider Support (9+ Providers)
- - ✅ **Cloud Providers:** Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
- - ✅ **Local Providers:** Ollama (free), llama.cpp (free), LM Studio (free)
- - ✅ **Hybrid Routing:** Automatically route between local (fast/free) and cloud (powerful) based on complexity
- - ✅ **Automatic Fallback:** Transparent failover if primary provider is unavailable
-
- ### Cost Optimization
- - 💰 **60-80% Token Reduction** - 6-phase optimization pipeline
- - 💰 **$77k-$115k Annual Savings** - For typical enterprise usage (100k requests/month)
- - 💰 **100% FREE Option** - Run completely locally with Ollama or llama.cpp
- - 💰 **Hybrid Routing** - 65-100% cost savings by using local models for simple requests
-
- ### Privacy & Security
- - 🔒 **100% Local Operation** - Run completely offline with Ollama/llama.cpp
- - 🔒 **Air-Gapped Deployments** - No internet required for local providers
- - 🔒 **Self-Hosted** - Full control over your data and infrastructure
- - 🔒 **Local Embeddings** - Private @Codebase search with Ollama/llama.cpp
- - 🔐 **Policy Enforcement** - Git restrictions, test requirements, web fetch controls
- - 🔐 **Sandboxing** - Optional Docker isolation for MCP tools
-
- ### Enterprise Features
- - 🏢 **Production-Ready** - Circuit breakers, load shedding, graceful shutdown
- - 🏢 **Observability** - Prometheus metrics, structured logging, health checks
- - 🏢 **Kubernetes-Ready** - Liveness, readiness, startup probes
- - 🏢 **High Performance** - ~7µs overhead, 140K req/sec throughput
- - 🏢 **Reliability** - Exponential backoff, automatic retries, error resilience
- - 🏢 **Scalability** - Horizontal scaling, connection pooling, load balancing
-
- ### IDE Integration
- - ✅ **Claude Code CLI** - Drop-in replacement for Anthropic backend
- - ✅ **Cursor IDE** - Full OpenAI API compatibility
- - ✅ **Continue.dev** - Works with any OpenAI-compatible client
- - ✅ **All Features Work** - Chat, file operations, tool calling, streaming
-
- ### Advanced Capabilities
- - 🧠 **Long-Term Memory** - Titans-inspired memory system with surprise-based filtering
- - 🧠 **Semantic Memory** - FTS5 search with multi-signal retrieval (recency, importance, relevance)
- - 🧠 **Automatic Extraction** - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
- - 🔧 **MCP Integration** - Automatic Model Context Protocol server discovery
- - 🔧 **Tool Calling** - Full tool support with server and client execution modes
- - 🔧 **Custom Tools** - Easy integration of custom tool implementations
- - 🔍 **Embeddings Support** - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- - 📊 **Token Tracking** - Real-time usage monitoring and cost attribution
-
- ### Developer Experience
- - 🎯 **Zero Code Changes** - Works with existing Claude Code CLI/Cursor setups
- - 🎯 **Hot Reload** - Development mode with auto-restart
- - 🎯 **Comprehensive Logging** - Structured logs with request ID correlation
- - 🎯 **Easy Configuration** - Environment variables or .env file
- - 🎯 **Docker Support** - docker-compose with GPU support
- - 🎯 **400+ Tests** - Comprehensive test coverage for reliability
-
- ### Streaming & Performance
- - ⚡ **Real-Time Streaming** - Token-by-token streaming for all providers
- - ⚡ **Low Latency** - Minimal overhead (~7µs per request)
- - ⚡ **High Throughput** - 140K requests/second capacity
- - ⚡ **Connection Pooling** - Efficient connection reuse
- - ⚡ **Prompt Caching** - LRU cache with SHA-256 keying
-
- 📖 **[Complete Feature Documentation](documentation/features.md)**
-
- ---
-
  ## Quick Start
 
  ### Installation
package/bin/cli.js CHANGED
@@ -1,3 +1,9 @@
  #!/usr/bin/env node
 
+ if (process.argv.includes('--version') || process.argv.includes('-v')) {
+   const pkg = require('../package.json');
+   console.log(pkg.version);
+   process.exit(0);
+ }
+
  require("../index.js");