lynkr 7.2.5 → 8.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (124) hide show
  1. package/README.md +3 -3
  2. package/config/model-tiers.json +89 -0
  3. package/install.sh +6 -1
  4. package/package.json +4 -2
  5. package/scripts/setup.js +0 -1
  6. package/src/agents/executor.js +14 -6
  7. package/src/api/middleware/session.js +15 -2
  8. package/src/api/openai-router.js +162 -37
  9. package/src/api/providers-handler.js +15 -1
  10. package/src/api/router.js +107 -2
  11. package/src/budget/index.js +4 -3
  12. package/src/clients/databricks.js +431 -234
  13. package/src/clients/gpt-utils.js +181 -0
  14. package/src/clients/ollama-utils.js +66 -140
  15. package/src/clients/routing.js +0 -1
  16. package/src/clients/standard-tools.js +99 -3
  17. package/src/config/index.js +133 -35
  18. package/src/context/toon.js +173 -0
  19. package/src/logger/index.js +23 -0
  20. package/src/orchestrator/index.js +688 -213
  21. package/src/routing/agentic-detector.js +320 -0
  22. package/src/routing/complexity-analyzer.js +202 -2
  23. package/src/routing/cost-optimizer.js +305 -0
  24. package/src/routing/index.js +168 -159
  25. package/src/routing/model-tiers.js +365 -0
  26. package/src/server.js +4 -14
  27. package/src/sessions/cleanup.js +3 -3
  28. package/src/sessions/record.js +10 -1
  29. package/src/sessions/store.js +7 -2
  30. package/src/tools/agent-task.js +48 -1
  31. package/src/tools/index.js +19 -2
  32. package/src/tools/lazy-loader.js +7 -0
  33. package/src/tools/tinyfish.js +358 -0
  34. package/src/tools/truncate.js +1 -0
  35. package/.github/FUNDING.yml +0 -15
  36. package/.github/workflows/README.md +0 -215
  37. package/.github/workflows/ci.yml +0 -69
  38. package/.github/workflows/index.yml +0 -62
  39. package/.github/workflows/web-tools-tests.yml +0 -56
  40. package/CITATIONS.bib +0 -6
  41. package/CLAWROUTER_ROUTING_PLAN.md +0 -910
  42. package/DEPLOYMENT.md +0 -1001
  43. package/LYNKR-TUI-PLAN.md +0 -984
  44. package/PERFORMANCE-REPORT.md +0 -866
  45. package/PLAN-per-client-model-routing.md +0 -252
  46. package/ROUTER_COMPARISON.md +0 -173
  47. package/TIER_ROUTING_PLAN.md +0 -771
  48. package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
  49. package/docs/BingSiteAuth.xml +0 -4
  50. package/docs/docs-style.css +0 -478
  51. package/docs/docs.html +0 -197
  52. package/docs/google5be250e608e6da39.html +0 -1
  53. package/docs/index.html +0 -577
  54. package/docs/index.md +0 -577
  55. package/docs/robots.txt +0 -4
  56. package/docs/sitemap.xml +0 -44
  57. package/docs/style.css +0 -1223
  58. package/documentation/README.md +0 -100
  59. package/documentation/api.md +0 -806
  60. package/documentation/claude-code-cli.md +0 -672
  61. package/documentation/codex-cli.md +0 -397
  62. package/documentation/contributing.md +0 -571
  63. package/documentation/cursor-integration.md +0 -731
  64. package/documentation/docker.md +0 -867
  65. package/documentation/embeddings.md +0 -760
  66. package/documentation/faq.md +0 -659
  67. package/documentation/features.md +0 -396
  68. package/documentation/headroom.md +0 -519
  69. package/documentation/installation.md +0 -706
  70. package/documentation/memory-system.md +0 -476
  71. package/documentation/production.md +0 -601
  72. package/documentation/providers.md +0 -906
  73. package/documentation/testing.md +0 -629
  74. package/documentation/token-optimization.md +0 -323
  75. package/documentation/tools.md +0 -697
  76. package/documentation/troubleshooting.md +0 -893
  77. package/final-test.js +0 -33
  78. package/headroom-sidecar/config.py +0 -93
  79. package/headroom-sidecar/requirements.txt +0 -14
  80. package/headroom-sidecar/server.py +0 -451
  81. package/monitor-agents.sh +0 -31
  82. package/scripts/audit-log-reader.js +0 -399
  83. package/scripts/compact-dictionary.js +0 -204
  84. package/scripts/test-deduplication.js +0 -448
  85. package/src/db/database.sqlite +0 -0
  86. package/test/README.md +0 -212
  87. package/test/azure-openai-config.test.js +0 -204
  88. package/test/azure-openai-error-resilience.test.js +0 -238
  89. package/test/azure-openai-format-conversion.test.js +0 -354
  90. package/test/azure-openai-integration.test.js +0 -281
  91. package/test/azure-openai-routing.test.js +0 -177
  92. package/test/azure-openai-streaming.test.js +0 -171
  93. package/test/bedrock-integration.test.js +0 -471
  94. package/test/comprehensive-test-suite.js +0 -928
  95. package/test/config-validation.test.js +0 -207
  96. package/test/cursor-integration.test.js +0 -484
  97. package/test/format-conversion.test.js +0 -578
  98. package/test/hybrid-routing-integration.test.js +0 -254
  99. package/test/hybrid-routing-performance.test.js +0 -418
  100. package/test/llamacpp-integration.test.js +0 -863
  101. package/test/lmstudio-integration.test.js +0 -335
  102. package/test/memory/extractor.test.js +0 -398
  103. package/test/memory/retriever.test.js +0 -613
  104. package/test/memory/retriever.test.js.bak +0 -585
  105. package/test/memory/search.test.js +0 -537
  106. package/test/memory/search.test.js.bak +0 -389
  107. package/test/memory/store.test.js +0 -344
  108. package/test/memory/store.test.js.bak +0 -312
  109. package/test/memory/surprise.test.js +0 -300
  110. package/test/memory-performance.test.js +0 -472
  111. package/test/openai-integration.test.js +0 -686
  112. package/test/openrouter-error-resilience.test.js +0 -418
  113. package/test/passthrough-mode.test.js +0 -385
  114. package/test/performance-benchmark.js +0 -351
  115. package/test/performance-tests.js +0 -528
  116. package/test/routing.test.js +0 -219
  117. package/test/web-tools.test.js +0 -329
  118. package/test-agents-simple.js +0 -43
  119. package/test-cli-connection.sh +0 -33
  120. package/test-learning-unit.js +0 -126
  121. package/test-learning.js +0 -112
  122. package/test-parallel-agents.sh +0 -124
  123. package/test-parallel-direct.js +0 -155
  124. package/test-subagents.sh +0 -117
@@ -1,519 +0,0 @@
1
- # Headroom Context Compression
2
-
3
- Headroom is an intelligent context compression system that reduces LLM token usage by 47-92% while preserving semantic meaning. It runs as a Python sidecar container that Lynkr manages automatically via Docker.
4
-
5
- ---
6
-
7
- ## Overview
8
-
9
- ### What is Headroom?
10
-
11
- Headroom is a context optimization SDK that compresses LLM prompts and tool outputs using:
12
-
13
- 1. **Smart Crusher** - Statistical JSON compression based on field analysis
14
- 2. **Cache Aligner** - Stabilizes dynamic content (UUIDs, timestamps) for provider cache hits
15
- 3. **CCR (Compress-Cache-Retrieve)** - Reversible compression with on-demand retrieval
16
- 4. **Rolling Window** - Token budget enforcement with turn-based windowing
17
- 5. **LLMLingua** (optional) - ML-based 20x compression using BERT
18
-
19
- ### Benefits
20
-
21
- | Metric | Without Headroom | With Headroom |
22
- |--------|-----------------|---------------|
23
- | Token usage | 100% | 8-53% (47-92% reduction) |
24
- | Cache hit rate | ~20% | ~60-80% |
25
- | Cost per request | $0.01-0.05 | $0.002-0.02 |
26
- | Context overflow | Common | Rare |
27
-
28
- ---
29
-
30
- ## Quick Start
31
-
32
- ### 1. Enable Headroom
33
-
34
- Add to your `.env`:
35
-
36
- ```bash
37
- # Enable Headroom compression
38
- HEADROOM_ENABLED=true
39
- ```
40
-
41
- ### 2. Start Lynkr
42
-
43
- ```bash
44
- npm start
45
- ```
46
-
47
- Lynkr will automatically:
48
- 1. Pull the `lynkr/headroom-sidecar:latest` Docker image
49
- 2. Start the container with configured settings
50
- 3. Wait for health checks to pass
51
- 4. Begin compressing requests
52
-
53
- ### 3. Verify It's Working
54
-
55
- Check the health endpoint:
56
-
57
- ```bash
58
- curl http://localhost:8081/health/headroom
59
- ```
60
-
61
- Expected response:
62
- ```json
63
- {
64
- "enabled": true,
65
- "healthy": true,
66
- "service": {
67
- "available": true,
68
- "ccrEnabled": true,
69
- "llmlinguaEnabled": false
70
- },
71
- "docker": {
72
- "running": true,
73
- "status": "running",
74
- "health": "healthy"
75
- }
76
- }
77
- ```
78
-
79
- ---
80
-
81
- ## How It Works
82
-
83
- ### Transform Pipeline
84
-
85
- When a request arrives, Headroom processes it through a three-stage pipeline:
86
-
87
- ```
88
- Request → Cache Aligner → Smart Crusher → Context Manager → Compressed Request
89
- ↓ ↓ ↓
90
- Stabilize IDs Compress JSON Enforce budget
91
- ```
92
-
93
- ### 1. Cache Aligner
94
-
95
- **Problem**: Dynamic content like UUIDs and timestamps change every request, preventing provider cache hits.
96
-
97
- **Solution**: Replace dynamic values with stable placeholders:
98
-
99
- ```json
100
- // Before
101
- {"id": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "created": "2024-01-15T10:30:00Z"}
102
-
103
- // After
104
- {"id": "[ID:1]", "created": "[TS:1]"}
105
- ```
106
-
107
- **Result**: 60-80% cache hit rate instead of ~20%.
108
-
109
- ### 2. Smart Crusher
110
-
111
- **Problem**: Tool outputs often contain repetitive JSON with many similar items.
112
-
113
- **Solution**: Statistical analysis to identify and compress redundant fields:
114
-
115
- ```json
116
- // Before (100 search results, ~50KB)
117
- [
118
- {"title": "Result 1", "url": "...", "snippet": "...", "score": 0.95, ...},
119
- {"title": "Result 2", "url": "...", "snippet": "...", "score": 0.93, ...},
120
- // ... 98 more items
121
- ]
122
-
123
- // After (~5KB)
124
- {
125
- "_meta": {"compressed": true, "original_count": 100, "kept": 12},
126
- "items": [
127
- // Top 12 most relevant items with essential fields only
128
- ]
129
- }
130
- ```
131
-
132
- **Compression strategies**:
133
- - **High-variance fields**: Keep (they're informative)
134
- - **Low-variance fields**: Remove (they're redundant)
135
- - **Unique fields**: Keep first occurrence only
136
- - **Repetitive arrays**: Sample representative items
137
-
138
- ### 3. CCR (Compress-Cache-Retrieve)
139
-
140
- **Problem**: Sometimes you need to retrieve compressed content later.
141
-
142
- **Solution**: Hash-based reversible compression:
143
-
144
- ```json
145
- // Compressed message
146
- {
147
- "content": "[CCR:abc123] 100 files found. Use ccr_retrieve to explore.",
148
- "ccr_available": true
149
- }
150
-
151
- // Tool definition injected
152
- {
153
- "name": "ccr_retrieve",
154
- "description": "Retrieve compressed content by hash",
155
- "input_schema": {
156
- "hash": "string",
157
- "query": "string (optional search within results)"
158
- }
159
- }
160
- ```
161
-
162
- When the LLM calls `ccr_retrieve`, Headroom returns the full original content.
163
-
164
- ---
165
-
166
- ## Configuration
167
-
168
- ### Basic Settings
169
-
170
- ```bash
171
- # Enable/disable Headroom
172
- HEADROOM_ENABLED=true
173
-
174
- # Sidecar endpoint
175
- HEADROOM_ENDPOINT=http://localhost:8787
176
-
177
- # Request timeout (ms)
178
- HEADROOM_TIMEOUT_MS=5000
179
-
180
- # Skip compression for small requests (tokens)
181
- HEADROOM_MIN_TOKENS=500
182
-
183
- # Mode: "audit" (observe) or "optimize" (apply)
184
- HEADROOM_MODE=optimize
185
- ```
186
-
187
- ### Docker Settings
188
-
189
- ```bash
190
- # Enable automatic container management
191
- HEADROOM_DOCKER_ENABLED=true
192
-
193
- # Docker image
194
- HEADROOM_DOCKER_IMAGE=lynkr/headroom-sidecar:latest
195
-
196
- # Container name
197
- HEADROOM_DOCKER_CONTAINER_NAME=lynkr-headroom
198
-
199
- # Port mapping
200
- HEADROOM_DOCKER_PORT=8787
201
-
202
- # Resource limits
203
- HEADROOM_DOCKER_MEMORY_LIMIT=512m
204
- HEADROOM_DOCKER_CPU_LIMIT=1.0
205
-
206
- # Restart policy
207
- HEADROOM_DOCKER_RESTART_POLICY=unless-stopped
208
- ```
209
-
210
- ### Transform Settings
211
-
212
- ```bash
213
- # Smart Crusher (statistical JSON compression)
214
- HEADROOM_SMART_CRUSHER=true
215
- HEADROOM_SMART_CRUSHER_MIN_TOKENS=200
216
- HEADROOM_SMART_CRUSHER_MAX_ITEMS=15
217
-
218
- # Tool Crusher (fixed-rules compression)
219
- HEADROOM_TOOL_CRUSHER=true
220
-
221
- # Cache Aligner (stabilize dynamic content)
222
- HEADROOM_CACHE_ALIGNER=true
223
-
224
- # Rolling Window (context overflow management)
225
- HEADROOM_ROLLING_WINDOW=true
226
- HEADROOM_KEEP_TURNS=3
227
- ```
228
-
229
- ### CCR Settings
230
-
231
- ```bash
232
- # Enable CCR for reversible compression
233
- HEADROOM_CCR=true
234
-
235
- # Cache TTL in seconds
236
- HEADROOM_CCR_TTL=300
237
- ```
238
-
239
- ### LLMLingua Settings (Optional)
240
-
241
- LLMLingua provides ML-based compression using BERT token classification. Requires GPU for reasonable performance.
242
-
243
- ```bash
244
- # Enable LLMLingua (default: false)
245
- HEADROOM_LLMLINGUA=true
246
-
247
- # Device: cuda, cpu, auto
248
- HEADROOM_LLMLINGUA_DEVICE=cuda
249
- ```
250
-
251
- **Note**: LLMLingua adds 100-500ms latency per request. Only enable if you have a GPU and need maximum compression.
252
-
253
- ---
254
-
255
- ## API Endpoints
256
-
257
- ### Health Check
258
-
259
- ```bash
260
- GET /health/headroom
261
- ```
262
-
263
- Returns Headroom health status including container and service state.
264
-
265
- ### Compression Metrics
266
-
267
- ```bash
268
- GET /metrics/compression
269
- ```
270
-
271
- Returns compression statistics:
272
-
273
- ```json
274
- {
275
- "enabled": true,
276
- "endpoint": "http://localhost:8787",
277
- "client": {
278
- "totalCalls": 150,
279
- "successfulCompressions": 120,
280
- "skippedCompressions": 25,
281
- "failures": 5,
282
- "totalTokensSaved": 450000,
283
- "averageLatencyMs": 45,
284
- "compressionRate": 80,
285
- "failureRate": 3
286
- },
287
- "server": {
288
- "requests_total": 150,
289
- "compressions_applied": 120,
290
- "average_compression_ratio": 0.35,
291
- "ccr_retrievals": 45
292
- }
293
- }
294
- ```
295
-
296
- ### Detailed Status
297
-
298
- ```bash
299
- GET /headroom/status
300
- ```
301
-
302
- Returns full status including configuration, metrics, and recent logs.
303
-
304
- ### Container Restart
305
-
306
- ```bash
307
- POST /headroom/restart
308
- ```
309
-
310
- Restarts the Headroom container (useful for applying config changes).
311
-
312
- ### Container Logs
313
-
314
- ```bash
315
- GET /headroom/logs?tail=100
316
- ```
317
-
318
- Returns recent container logs for debugging.
319
-
320
- ---
321
-
322
- ## Monitoring
323
-
324
- ### Health Check Integration
325
-
326
- Headroom status is included in the `/health/ready` endpoint:
327
-
328
- ```json
329
- {
330
- "status": "ready",
331
- "checks": {
332
- "database": { "healthy": true },
333
- "memory": { "healthy": true },
334
- "headroom": {
335
- "healthy": true,
336
- "enabled": true,
337
- "service": "available",
338
- "docker": "running"
339
- }
340
- }
341
- }
342
- ```
343
-
344
- **Note**: Headroom is non-critical. If it fails, Lynkr continues without compression.
345
-
346
- ### Logging
347
-
348
- Headroom logs compression events:
349
-
350
- ```
351
- INFO: Headroom compression applied
352
- tokensBefore: 15000
353
- tokensAfter: 5200
354
- savingsPercent: 65.3
355
- latencyMs: 42
356
- transforms: ["cache_aligner", "smart_crusher"]
357
- ```
358
-
359
- ---
360
-
361
- ## Troubleshooting
362
-
363
- ### Container Won't Start
364
-
365
- **Check Docker is running:**
366
- ```bash
367
- docker ps
368
- ```
369
-
370
- **Check for port conflicts:**
371
- ```bash
372
- lsof -i :8787
373
- ```
374
-
375
- **View container logs:**
376
- ```bash
377
- curl http://localhost:8081/headroom/logs
378
- # or
379
- docker logs lynkr-headroom
380
- ```
381
-
382
- ### High Latency
383
-
384
- 1. **Reduce transforms**: Disable LLMLingua if not needed
385
- 2. **Increase resources**: Raise `HEADROOM_DOCKER_MEMORY_LIMIT`
386
- 3. **Skip small requests**: Increase `HEADROOM_MIN_TOKENS`
387
-
388
- ### Compression Not Applied
389
-
390
- Check:
391
- 1. `HEADROOM_ENABLED=true` in `.env`
392
- 2. Request has more than `HEADROOM_MIN_TOKENS` tokens
393
- 3. Health endpoint shows `healthy: true`
394
-
395
- ### CCR Retrieval Fails
396
-
397
- 1. Check `HEADROOM_CCR=true`
398
- 2. Verify TTL hasn't expired (`HEADROOM_CCR_TTL`)
399
- 3. Ensure same session is used (CCR is session-scoped)
400
-
401
- ---
402
-
403
- ## Architecture
404
-
405
- ### System Diagram
406
-
407
- ```
408
- ┌─────────────────────────────────────────────────────────────────┐
409
- │ Lynkr (Node.js) │
410
- │ ┌──────────────────────────────────────────────────────────┐ │
411
- │ │ Request Handler │ │
412
- │ │ ↓ │ │
413
- │ │ src/headroom/client.js ──HTTP──→ Headroom Sidecar │ │
414
- │ │ ↓ (Python Container) │ │
415
- │ │ Compressed Request │ │ │
416
- │ │ ↓ ↓ │ │
417
- │ │ LLM Provider ┌─────────────┐ │ │
418
- │ │ │ Transforms │ │ │
419
- │ └──────────────────────────────────│ - Aligner │─────────┘ │
420
- │ │ - Crusher │ │
421
- │ │ - CCR Store │ │
422
- │ │ - LLMLingua │ │
423
- │ └─────────────┘ │
424
- └─────────────────────────────────────────────────────────────────┘
425
- ```
426
-
427
- ### Request Flow
428
-
429
- 1. **Request arrives** at Lynkr
430
- 2. **Token estimation** - Skip if below `HEADROOM_MIN_TOKENS`
431
- 3. **Send to sidecar** - HTTP POST to `/compress`
432
- 4. **Transform pipeline** executes:
433
- - Cache Aligner stabilizes dynamic content
434
- - Smart Crusher compresses JSON structures
435
- - Context Manager enforces token budget
436
- 5. **Return compressed** messages and tools
437
- 6. **Forward to LLM** provider
438
- 7. **On CCR tool call** - Retrieve original content
439
-
440
- ### File Structure
441
-
442
- ```
443
- src/headroom/
444
- ├── index.js # HeadroomManager singleton, exports
445
- ├── launcher.js # Docker container lifecycle (dockerode)
446
- ├── client.js # HTTP client for sidecar API
447
- └── health.js # Health check functionality
448
- ```
449
-
450
- ---
451
-
452
- ## Best Practices
453
-
454
- ### 1. Start with Defaults
455
-
456
- The default configuration is optimized for most use cases:
457
- - Smart Crusher: Enabled
458
- - Cache Aligner: Enabled
459
- - CCR: Enabled
460
- - LLMLingua: Disabled (enable only with GPU)
461
-
462
- ### 2. Monitor Compression Rates
463
-
464
- Check `/metrics/compression` regularly:
465
- - **Good**: 60-80% compression rate
466
- - **Warning**: Below 40% (check transform settings)
467
- - **Issue**: High failure rate (check container health)
468
-
469
- ### 3. Tune for Your Workload
470
-
471
- | Workload | Recommended Settings |
472
- |----------|---------------------|
473
- | Code assistance | `SMART_CRUSHER_MAX_ITEMS=20` |
474
- | Search-heavy | `SMART_CRUSHER_MAX_ITEMS=10`, CCR enabled |
475
- | Long conversations | `ROLLING_WINDOW=true`, `KEEP_TURNS=5` |
476
- | Cost-sensitive | Enable LLMLingua with GPU |
477
-
478
- ### 4. Use Audit Mode First
479
-
480
- Test compression without applying it:
481
-
482
- ```bash
483
- HEADROOM_MODE=audit
484
- ```
485
-
486
- This logs what would be compressed without modifying requests.
487
-
488
- ---
489
-
490
- ## FAQ
491
-
492
- ### Does Headroom affect response quality?
493
-
494
- Minimal impact. Smart Crusher preserves high-variance (informative) fields and CCR allows full retrieval when needed. LLMLingua may have ~1.5% quality reduction.
495
-
496
- ### Can I use Headroom without Docker?
497
-
498
- Yes. Disable Docker management and run the sidecar manually:
499
-
500
- ```bash
501
- HEADROOM_DOCKER_ENABLED=false
502
- HEADROOM_ENDPOINT=http://your-headroom-server:8787
503
- ```
504
-
505
- ### Is Headroom required?
506
-
507
- No. If Headroom fails or is disabled, Lynkr works normally without compression.
508
-
509
- ### What providers benefit most?
510
-
511
- All providers benefit from compression. Anthropic and OpenAI see additional benefits from Cache Aligner improving cache hit rates.
512
-
513
- ---
514
-
515
- ## References
516
-
517
- - [Headroom GitHub Repository](https://github.com/chopratejas/headroom)
518
- - [LLMLingua Paper](https://arxiv.org/abs/2310.05736)
519
- - [Anthropic Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)