lynkr 8.0.0 → 9.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.lynkr/telemetry.db +0 -0
- package/.lynkr/telemetry.db-shm +0 -0
- package/.lynkr/telemetry.db-wal +0 -0
- package/README.md +196 -322
- package/lynkr-skill.tar.gz +0 -0
- package/package.json +4 -3
- package/src/api/openai-router.js +64 -13
- package/src/api/providers-handler.js +171 -3
- package/src/api/router.js +9 -2
- package/src/clients/circuit-breaker.js +10 -247
- package/src/clients/codex-process.js +342 -0
- package/src/clients/codex-utils.js +143 -0
- package/src/clients/databricks.js +210 -63
- package/src/clients/resilience.js +540 -0
- package/src/clients/retry.js +22 -167
- package/src/clients/standard-tools.js +23 -0
- package/src/config/index.js +77 -0
- package/src/context/compression.js +42 -9
- package/src/context/distill.js +492 -0
- package/src/orchestrator/index.js +48 -8
- package/src/routing/complexity-analyzer.js +258 -5
- package/src/routing/index.js +12 -2
- package/src/routing/latency-tracker.js +148 -0
- package/src/routing/model-tiers.js +2 -0
- package/src/routing/quality-scorer.js +113 -0
- package/src/routing/telemetry.js +464 -0
- package/src/server.js +13 -12
- package/src/tools/code-graph.js +538 -0
- package/src/tools/code-mode.js +304 -0
- package/src/tools/index.js +4 -0
- package/src/tools/lazy-loader.js +18 -0
- package/src/tools/mcp-remote.js +7 -0
- package/src/tools/smart-selection.js +11 -0
- package/src/tools/tinyfish.js +358 -0
- package/src/tools/truncate.js +1 -0
- package/src/utils/payload.js +206 -0
- package/src/utils/perf-timer.js +80 -0
- package/.github/FUNDING.yml +0 -15
- package/.github/workflows/README.md +0 -215
- package/.github/workflows/ci.yml +0 -69
- package/.github/workflows/index.yml +0 -62
- package/.github/workflows/web-tools-tests.yml +0 -56
- package/CITATIONS.bib +0 -6
- package/DEPLOYMENT.md +0 -1001
- package/LYNKR-TUI-PLAN.md +0 -984
- package/PERFORMANCE-REPORT.md +0 -866
- package/PLAN-per-client-model-routing.md +0 -252
- package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
- package/docs/BingSiteAuth.xml +0 -4
- package/docs/docs-style.css +0 -478
- package/docs/docs.html +0 -198
- package/docs/google5be250e608e6da39.html +0 -1
- package/docs/index.html +0 -577
- package/docs/index.md +0 -584
- package/docs/robots.txt +0 -4
- package/docs/sitemap.xml +0 -44
- package/docs/style.css +0 -1223
- package/docs/toon-integration-spec.md +0 -130
- package/documentation/README.md +0 -101
- package/documentation/api.md +0 -806
- package/documentation/claude-code-cli.md +0 -679
- package/documentation/codex-cli.md +0 -397
- package/documentation/contributing.md +0 -571
- package/documentation/cursor-integration.md +0 -734
- package/documentation/docker.md +0 -874
- package/documentation/embeddings.md +0 -762
- package/documentation/faq.md +0 -713
- package/documentation/features.md +0 -403
- package/documentation/headroom.md +0 -519
- package/documentation/installation.md +0 -758
- package/documentation/memory-system.md +0 -476
- package/documentation/production.md +0 -636
- package/documentation/providers.md +0 -1009
- package/documentation/routing.md +0 -476
- package/documentation/testing.md +0 -629
- package/documentation/token-optimization.md +0 -325
- package/documentation/tools.md +0 -697
- package/documentation/troubleshooting.md +0 -969
- package/final-test.js +0 -33
- package/headroom-sidecar/config.py +0 -93
- package/headroom-sidecar/requirements.txt +0 -14
- package/headroom-sidecar/server.py +0 -451
- package/monitor-agents.sh +0 -31
- package/scripts/audit-log-reader.js +0 -399
- package/scripts/compact-dictionary.js +0 -204
- package/scripts/test-deduplication.js +0 -448
- package/src/db/database.sqlite +0 -0
- package/te +0 -11622
- package/test/README.md +0 -212
- package/test/azure-openai-config.test.js +0 -213
- package/test/azure-openai-error-resilience.test.js +0 -238
- package/test/azure-openai-format-conversion.test.js +0 -354
- package/test/azure-openai-integration.test.js +0 -287
- package/test/azure-openai-routing.test.js +0 -175
- package/test/azure-openai-streaming.test.js +0 -171
- package/test/bedrock-integration.test.js +0 -457
- package/test/comprehensive-test-suite.js +0 -928
- package/test/config-validation.test.js +0 -207
- package/test/cursor-integration.test.js +0 -484
- package/test/format-conversion.test.js +0 -578
- package/test/hybrid-routing-integration.test.js +0 -269
- package/test/hybrid-routing-performance.test.js +0 -428
- package/test/llamacpp-integration.test.js +0 -882
- package/test/lmstudio-integration.test.js +0 -347
- package/test/memory/extractor.test.js +0 -398
- package/test/memory/retriever.test.js +0 -613
- package/test/memory/retriever.test.js.bak +0 -585
- package/test/memory/search.test.js +0 -537
- package/test/memory/search.test.js.bak +0 -389
- package/test/memory/store.test.js +0 -344
- package/test/memory/store.test.js.bak +0 -312
- package/test/memory/surprise.test.js +0 -300
- package/test/memory-performance.test.js +0 -472
- package/test/openai-integration.test.js +0 -683
- package/test/openrouter-error-resilience.test.js +0 -418
- package/test/passthrough-mode.test.js +0 -385
- package/test/performance-benchmark.js +0 -351
- package/test/performance-tests.js +0 -528
- package/test/routing.test.js +0 -225
- package/test/toon-compression.test.js +0 -131
- package/test/web-tools.test.js +0 -329
- package/test-agents-simple.js +0 -43
- package/test-cli-connection.sh +0 -33
- package/test-learning-unit.js +0 -126
- package/test-learning.js +0 -112
- package/test-parallel-agents.sh +0 -124
- package/test-parallel-direct.js +0 -155
- package/test-subagents.sh +0 -117
@@ -1,325 +0,0 @@

# Token Optimization Guide

Comprehensive guide to Lynkr's token optimization strategies that achieve 60-80% cost reduction.

---

## Overview

Lynkr reduces token usage by **60-80%** through 6 intelligent optimization phases. At 100,000 requests/month, this translates to **$77k-$115k in annual savings**.

---
## Cost Savings Breakdown

### Real-World Example

**Scenario:** 100,000 requests/month, 50k input tokens and 2k output tokens per request

| Provider | Without Lynkr | With Lynkr (60% savings) | Monthly Savings | Annual Savings |
|----------|---------------|--------------------------|-----------------|----------------|
| **Claude Sonnet 4.5** | $16,000 | $6,400 | **$9,600** | **$115,200** |
| **GPT-4o** | $12,000 | $4,800 | **$7,200** | **$86,400** |
| **Ollama (Local)** | API costs | **$0** | **$12,000+** | **$144,000+** |
---

## 6 Optimization Phases

### Phase 1: Smart Tool Selection (50-70% reduction)

**Problem:** Sending all tools with every request wastes tokens.

**Solution:** Intelligently filter tools based on request type.

**How it works:**
- **Chat queries** → Only send Read tool
- **File operations** → Send Read, Write, Edit tools
- **Git operations** → Send git_* tools
- **Code execution** → Send Bash tool

**Example:**
```
Original: 30 tools × 150 tokens = 4,500 tokens
Optimized: 3 tools × 150 tokens = 450 tokens
Savings: 90% (4,050 tokens saved)
```

**Configuration:**
```bash
# Automatic - no configuration needed
# Lynkr detects request type and filters tools
```
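The filtering above can be sketched in a few lines. This is an illustrative sketch only, not Lynkr's actual implementation: the keyword heuristics, tool-group names, and the `classifyRequest`/`selectTools` functions are all hypothetical.

```javascript
// Hypothetical sketch of request-type tool filtering (not Lynkr's internals).
const TOOL_GROUPS = {
  chat: ['Read'],
  file: ['Read', 'Write', 'Edit'],
  git: ['git_status', 'git_diff', 'git_commit'],
  exec: ['Bash'],
};

// Crude keyword heuristic standing in for a real request classifier.
function classifyRequest(text) {
  if (/\bgit\b/i.test(text)) return 'git';
  if (/\b(run|execute|shell|command)\b/i.test(text)) return 'exec';
  if (/\b(file|write|edit|create)\b/i.test(text)) return 'file';
  return 'chat';
}

// Keep only the tools allowed for the detected request type.
function selectTools(allTools, requestText) {
  const allowed = new Set(TOOL_GROUPS[classifyRequest(requestText)]);
  return allTools.filter((t) => allowed.has(t.name));
}
```

A chat-style question would keep only the Read tool, shrinking the tool payload from all 30 definitions to one or a handful.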
---

### Phase 2: Prompt Caching (30-45% reduction)

**Problem:** Repeated system prompts consume tokens.

**Solution:** Cache and reuse prompts across requests.

**How it works:**
- SHA-256 hash of prompt
- LRU cache with TTL (default: 5 minutes)
- Cache hit = free tokens

**Example:**
```
First request: 2,000 token system prompt
Subsequent requests: 0 tokens (cache hit)
10 requests: Save 18,000 tokens (90% reduction)
```

**Configuration:**
```bash
# Enable prompt caching (default: enabled)
PROMPT_CACHE_ENABLED=true

# Cache TTL in milliseconds (default: 300000 = 5 minutes)
PROMPT_CACHE_TTL_MS=300000

# Max cached entries (default: 64)
PROMPT_CACHE_MAX_ENTRIES=64
```
---

### Phase 3: Memory Deduplication (20-30% reduction)

**Problem:** Duplicate memories inject redundant context.

**Solution:** Deduplicate memories before injection.

**How it works:**
- Track last N memories injected
- Skip if same memory was in last 5 requests
- Only inject novel context

**Example:**
```
Original: 5 memories × 200 tokens × 10 requests = 10,000 tokens
With dedup: 5 memories × 200 tokens + 3 new × 200 = 1,600 tokens
Savings: 84% (8,400 tokens saved)
```

**Configuration:**
```bash
# Enable memory deduplication (default: enabled)
MEMORY_DEDUP_ENABLED=true

# Lookback window for dedup (default: 5)
MEMORY_DEDUP_LOOKBACK=5
```
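The lookback window can be sketched as a small class: a memory is skipped if its id appeared in any of the last N requests. The `MemoryDeduper` name and shape are hypothetical.

```javascript
// Illustrative lookback deduplication: only memories not seen in the last
// `lookback` requests are injected.
class MemoryDeduper {
  constructor(lookback = 5) {
    this.lookback = lookback;
    this.history = []; // one Set of memory ids per recent request
  }

  filter(memories) {
    const seen = new Set(this.history.flatMap((s) => [...s]));
    const novel = memories.filter((m) => !seen.has(m.id));
    // Record what was offered this request, then trim to the window.
    this.history.push(new Set(memories.map((m) => m.id)));
    if (this.history.length > this.lookback) this.history.shift();
    return novel;
  }
}
```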
---

### Phase 4: Tool Response Truncation (15-25% reduction)

**Problem:** Long tool outputs (file contents, bash output) waste tokens.

**Solution:** Intelligently truncate tool responses.

**How it works:**
- File Read: limit to 2,000 lines
- Bash output: limit to 1,000 lines
- Keep most relevant portions
- Add truncation indicator

**Example:**
```
Original file read: 10,000 lines = 50,000 tokens
Truncated: 2,000 lines = 10,000 tokens
Savings: 80% (40,000 tokens saved)
```

**Configuration:**
```bash
# Automatic - no configuration needed
# Built into Read and Bash tools
```
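The core of line-based truncation with an indicator is a few lines; this sketch keeps the head of the output (the function name and indicator text are illustrative, and a real implementation may keep other "most relevant" portions instead).

```javascript
// Keep at most `maxLines` lines and append a truncation indicator.
function truncateLines(text, maxLines) {
  const lines = text.split('\n');
  if (lines.length <= maxLines) return text;
  const omitted = lines.length - maxLines;
  return lines.slice(0, maxLines).join('\n') +
    `\n... [truncated: ${omitted} more line(s)]`;
}
```

A Read tool would call this with 2,000 and Bash with 1,000 to match the documented limits.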
---

### Phase 5: Dynamic System Prompts (10-20% reduction)

**Problem:** Long system prompts for simple queries.

**Solution:** Adapt prompt complexity to request type.

**How it works:**
- **Simple chat**: Minimal system prompt (500 tokens)
- **File operations**: Medium prompt (1,000 tokens)
- **Complex multi-tool**: Full prompt (2,000 tokens)

**Example:**
```
10 simple queries with full prompt: 10 × 2,000 = 20,000 tokens
10 simple queries with minimal: 10 × 500 = 5,000 tokens
Savings: 75% (15,000 tokens saved)
```

**Configuration:**
```bash
# Automatic - no configuration needed
# Lynkr detects request complexity
```
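The tier mapping above amounts to selecting one of three prompts by detected complexity. In this sketch the prompt strings, tier names, and `systemPromptFor` are placeholders (real prompts would be far longer than these stand-ins).

```javascript
// Placeholder prompts; token counts in comments refer to the table above.
const MINIMAL_PROMPT = 'You are a helpful assistant.';                       // ~500 tokens
const MEDIUM_PROMPT = MINIMAL_PROMPT + ' You can read and edit files.';      // ~1,000 tokens
const FULL_PROMPT = MEDIUM_PROMPT + ' You can orchestrate multiple tools.';  // ~2,000 tokens

const PROMPT_BY_COMPLEXITY = {
  simple: MINIMAL_PROMPT,
  file: MEDIUM_PROMPT,
  complex: FULL_PROMPT,
};

function systemPromptFor(complexity) {
  // Fall back to the full prompt for unknown tiers: better to spend tokens
  // than to under-instruct the model.
  return PROMPT_BY_COMPLEXITY[complexity] ?? FULL_PROMPT;
}
```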
---

### Phase 6: Conversation Compression (15-25% reduction)

**Problem:** Long conversation history accumulates tokens.

**Solution:** Compress old messages while keeping recent ones detailed.

**How it works:**
- Last 5 messages: Full detail
- Messages 6-20: Summarized
- Messages 21+: Archived (not sent)

**Example:**
```
20-turn conversation without compression: 100,000 tokens
With compression: 25,000 tokens (last 5 full + 15 summarized)
Savings: 75% (75,000 tokens saved)
```

**Configuration:**
```bash
# Automatic - no configuration needed
# Lynkr manages conversation history
```
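The windowing policy above can be sketched as follows: keep the last 5 messages verbatim, summarize messages 6-20, and drop anything older. The `compressHistory` function is illustrative, and `summarize` is a placeholder for whatever summarizer the proxy actually uses.

```javascript
// Keep the 5 newest messages in full, fold messages 6-20 into one summary
// message, and never send messages older than 20 turns.
function compressHistory(messages, summarize) {
  const recent = messages.slice(-5);
  const middle = messages.slice(-20, -5);
  const parts = [];
  if (middle.length > 0) {
    parts.push({ role: 'system', content: `Earlier context: ${summarize(middle)}` });
  }
  return parts.concat(recent);
}
```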
---

## Combined Savings

When all 6 phases work together:

**Example Request Flow:**

1. **Original request**: 50,000 input tokens
   - System prompt: 2,000 tokens
   - Tools: 4,500 tokens (30 tools)
   - Memories: 1,000 tokens (5 memories)
   - Conversation: 20,000 tokens (20 messages)
   - User query: 22,500 tokens

2. **After optimization**: 28,150 input tokens
   - System prompt: 0 tokens (cache hit)
   - Tools: 450 tokens (3 relevant tools)
   - Memories: 200 tokens (deduplicated)
   - Conversation: 5,000 tokens (compressed)
   - User query: 22,500 tokens (same)

3. **Savings**: ~44% reduction (21,850 tokens saved); excluding the fixed 22,500-token user query, the overhead shrinks from 27,500 to 5,650 tokens (~79%)

---
## Monitoring Token Usage

### Real-Time Tracking

```bash
# Check metrics endpoint
curl http://localhost:8081/metrics | grep lynkr_tokens

# Output:
# lynkr_tokens_input_total{provider="databricks"} 1234567
# lynkr_tokens_output_total{provider="databricks"} 234567
# lynkr_tokens_cached_total 500000
```

### Per-Request Logging

```bash
# Enable token logging
LOG_LEVEL=info

# Logs show:
# {"level":"info","tokens":{"input":1250,"output":234,"cached":750}}
```

---
## Best Practices

### 1. Enable All Optimizations

```bash
# All optimizations are enabled by default
# No configuration needed
```

### 2. Use Tier-Based Routing

```bash
# Route simple requests to free Ollama, complex to cloud
# Set all 4 TIER_* env vars to enable tier-based routing
TIER_SIMPLE=ollama:llama3.2
TIER_MEDIUM=openrouter:openai/gpt-4o-mini
TIER_COMPLEX=azure-openai:gpt-4o
TIER_REASONING=azure-openai:gpt-4o
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=databricks
```

### 3. Monitor and Tune

```bash
# Check cache hit rate
curl http://localhost:8081/metrics | grep cache_hits

# Adjust cache size if needed
PROMPT_CACHE_MAX_ENTRIES=128  # Increase for more caching
```

---
## ROI Calculator

Calculate your potential savings:

**Formula:**
```
Monthly Requests = 100,000
Avg Input Tokens = 50,000
Avg Output Tokens = 2,000
Cost per 1M Input = $3.00
Cost per 1M Output = $15.00

Without Lynkr:
Input Cost = (100,000 × 50,000 ÷ 1,000,000) × $3 = $15,000
Output Cost = (100,000 × 2,000 ÷ 1,000,000) × $15 = $3,000
Total = $18,000/month

With Lynkr (60% savings):
Total = $7,200/month

Savings = $10,800/month = $129,600/year
```

**Your numbers:**
- Monthly requests: _____
- Avg input tokens: _____
- Avg output tokens: _____
- Provider cost: _____

**Result:** $_____ saved per year
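The formula maps directly to a small function you can plug your own numbers into; the function name and parameter object are illustrative, and the default 60% reduction is the documented average, not a guarantee.

```javascript
// Compute monthly cost with and without the documented average reduction.
function monthlySavings({ requests, inputTokens, outputTokens, inPerM, outPerM, reduction = 0.6 }) {
  const inputCost = (requests * inputTokens / 1e6) * inPerM;   // $ per month
  const outputCost = (requests * outputTokens / 1e6) * outPerM; // $ per month
  const total = inputCost + outputCost;
  return { total, withLynkr: total * (1 - reduction), saved: total * reduction };
}
```

With the worked example's inputs (100,000 requests, 50k/2k tokens, $3/$15 per million) this reproduces the $18,000 → $7,200 figures above.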
---

## Next Steps

- **[Installation Guide](installation.md)** - Install Lynkr
- **[Provider Configuration](providers.md)** - Configure providers
- **[Production Guide](production.md)** - Deploy to production
- **[FAQ](faq.md)** - Common questions

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues