lynkr 8.0.0 → 8.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (102)
  1. package/README.md +1 -1
  2. package/package.json +1 -1
  3. package/src/api/openai-router.js +34 -2
  4. package/src/clients/standard-tools.js +23 -0
  5. package/src/config/index.js +20 -0
  6. package/src/orchestrator/index.js +2 -2
  7. package/src/server.js +2 -12
  8. package/src/tools/index.js +4 -0
  9. package/src/tools/lazy-loader.js +7 -0
  10. package/src/tools/tinyfish.js +358 -0
  11. package/src/tools/truncate.js +1 -0
  12. package/.github/FUNDING.yml +0 -15
  13. package/.github/workflows/README.md +0 -215
  14. package/.github/workflows/ci.yml +0 -69
  15. package/.github/workflows/index.yml +0 -62
  16. package/.github/workflows/web-tools-tests.yml +0 -56
  17. package/CITATIONS.bib +0 -6
  18. package/DEPLOYMENT.md +0 -1001
  19. package/LYNKR-TUI-PLAN.md +0 -984
  20. package/PERFORMANCE-REPORT.md +0 -866
  21. package/PLAN-per-client-model-routing.md +0 -252
  22. package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
  23. package/docs/BingSiteAuth.xml +0 -4
  24. package/docs/docs-style.css +0 -478
  25. package/docs/docs.html +0 -198
  26. package/docs/google5be250e608e6da39.html +0 -1
  27. package/docs/index.html +0 -577
  28. package/docs/index.md +0 -584
  29. package/docs/robots.txt +0 -4
  30. package/docs/sitemap.xml +0 -44
  31. package/docs/style.css +0 -1223
  32. package/docs/toon-integration-spec.md +0 -130
  33. package/documentation/README.md +0 -101
  34. package/documentation/api.md +0 -806
  35. package/documentation/claude-code-cli.md +0 -679
  36. package/documentation/codex-cli.md +0 -397
  37. package/documentation/contributing.md +0 -571
  38. package/documentation/cursor-integration.md +0 -734
  39. package/documentation/docker.md +0 -874
  40. package/documentation/embeddings.md +0 -762
  41. package/documentation/faq.md +0 -713
  42. package/documentation/features.md +0 -403
  43. package/documentation/headroom.md +0 -519
  44. package/documentation/installation.md +0 -758
  45. package/documentation/memory-system.md +0 -476
  46. package/documentation/production.md +0 -636
  47. package/documentation/providers.md +0 -1009
  48. package/documentation/routing.md +0 -476
  49. package/documentation/testing.md +0 -629
  50. package/documentation/token-optimization.md +0 -325
  51. package/documentation/tools.md +0 -697
  52. package/documentation/troubleshooting.md +0 -969
  53. package/final-test.js +0 -33
  54. package/headroom-sidecar/config.py +0 -93
  55. package/headroom-sidecar/requirements.txt +0 -14
  56. package/headroom-sidecar/server.py +0 -451
  57. package/monitor-agents.sh +0 -31
  58. package/scripts/audit-log-reader.js +0 -399
  59. package/scripts/compact-dictionary.js +0 -204
  60. package/scripts/test-deduplication.js +0 -448
  61. package/src/db/database.sqlite +0 -0
  62. package/te +0 -11622
  63. package/test/README.md +0 -212
  64. package/test/azure-openai-config.test.js +0 -213
  65. package/test/azure-openai-error-resilience.test.js +0 -238
  66. package/test/azure-openai-format-conversion.test.js +0 -354
  67. package/test/azure-openai-integration.test.js +0 -287
  68. package/test/azure-openai-routing.test.js +0 -175
  69. package/test/azure-openai-streaming.test.js +0 -171
  70. package/test/bedrock-integration.test.js +0 -457
  71. package/test/comprehensive-test-suite.js +0 -928
  72. package/test/config-validation.test.js +0 -207
  73. package/test/cursor-integration.test.js +0 -484
  74. package/test/format-conversion.test.js +0 -578
  75. package/test/hybrid-routing-integration.test.js +0 -269
  76. package/test/hybrid-routing-performance.test.js +0 -428
  77. package/test/llamacpp-integration.test.js +0 -882
  78. package/test/lmstudio-integration.test.js +0 -347
  79. package/test/memory/extractor.test.js +0 -398
  80. package/test/memory/retriever.test.js +0 -613
  81. package/test/memory/retriever.test.js.bak +0 -585
  82. package/test/memory/search.test.js +0 -537
  83. package/test/memory/search.test.js.bak +0 -389
  84. package/test/memory/store.test.js +0 -344
  85. package/test/memory/store.test.js.bak +0 -312
  86. package/test/memory/surprise.test.js +0 -300
  87. package/test/memory-performance.test.js +0 -472
  88. package/test/openai-integration.test.js +0 -683
  89. package/test/openrouter-error-resilience.test.js +0 -418
  90. package/test/passthrough-mode.test.js +0 -385
  91. package/test/performance-benchmark.js +0 -351
  92. package/test/performance-tests.js +0 -528
  93. package/test/routing.test.js +0 -225
  94. package/test/toon-compression.test.js +0 -131
  95. package/test/web-tools.test.js +0 -329
  96. package/test-agents-simple.js +0 -43
  97. package/test-cli-connection.sh +0 -33
  98. package/test-learning-unit.js +0 -126
  99. package/test-learning.js +0 -112
  100. package/test-parallel-agents.sh +0 -124
  101. package/test-parallel-direct.js +0 -155
  102. package/test-subagents.sh +0 -117
@@ -1,519 +0,0 @@
1
- # Headroom Context Compression
2
-
3
- Headroom is an intelligent context compression system that reduces LLM token usage by 47-92% while preserving semantic meaning. It runs as a Python sidecar container that Lynkr manages automatically via Docker.
4
-
5
- ---
6
-
7
- ## Overview
8
-
9
- ### What is Headroom?
10
-
11
- Headroom is a context optimization SDK that compresses LLM prompts and tool outputs using:
12
-
13
- 1. **Smart Crusher** - Statistical JSON compression based on field analysis
14
- 2. **Cache Aligner** - Stabilizes dynamic content (UUIDs, timestamps) for provider cache hits
15
- 3. **CCR (Compress-Cache-Retrieve)** - Reversible compression with on-demand retrieval
16
- 4. **Rolling Window** - Token budget enforcement with turn-based windowing
17
- 5. **LLMLingua** (optional) - ML-based 20x compression using BERT
18
-
19
- ### Benefits
20
-
21
- | Metric | Without Headroom | With Headroom |
22
- |--------|-----------------|---------------|
23
- | Token usage | 100% | 8-53% (47-92% reduction) |
24
- | Cache hit rate | ~20% | ~60-80% |
25
- | Cost per request | $0.01-0.05 | $0.002-0.02 |
26
- | Context overflow | Common | Rare |
27
-
28
- ---
29
-
30
- ## Quick Start
31
-
32
- ### 1. Enable Headroom
33
-
34
- Add to your `.env`:
35
-
36
- ```bash
37
- # Enable Headroom compression
38
- HEADROOM_ENABLED=true
39
- ```
40
-
41
- ### 2. Start Lynkr
42
-
43
- ```bash
44
- npm start
45
- ```
46
-
47
- Lynkr will automatically:
48
- 1. Pull the `lynkr/headroom-sidecar:latest` Docker image
49
- 2. Start the container with configured settings
50
- 3. Wait for health checks to pass
51
- 4. Begin compressing requests
52
-
53
- ### 3. Verify It's Working
54
-
55
- Check the health endpoint:
56
-
57
- ```bash
58
- curl http://localhost:8081/health/headroom
59
- ```
60
-
61
- Expected response:
62
- ```json
63
- {
64
-   "enabled": true,
65
-   "healthy": true,
66
-   "service": {
67
-     "available": true,
68
-     "ccrEnabled": true,
69
-     "llmlinguaEnabled": false
70
-   },
71
-   "docker": {
72
-     "running": true,
73
-     "status": "running",
74
-     "health": "healthy"
75
-   }
76
- }
77
- ```
78
-
79
- ---
80
-
81
- ## How It Works
82
-
83
- ### Transform Pipeline
84
-
85
- When a request arrives, Headroom processes it through a three-stage pipeline:
86
-
87
- ```
88
- Request → Cache Aligner → Smart Crusher → Context Manager → Compressed Request
89
-                 ↓               ↓                ↓
90
-           Stabilize IDs    Compress JSON    Enforce budget
91
- ```
92
-
93
- ### 1. Cache Aligner
94
-
95
- **Problem**: Dynamic content like UUIDs and timestamps change every request, preventing provider cache hits.
96
-
97
- **Solution**: Replace dynamic values with stable placeholders:
98
-
99
- ```json
100
- // Before
101
- {"id": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "created": "2024-01-15T10:30:00Z"}
102
-
103
- // After
104
- {"id": "[ID:1]", "created": "[TS:1]"}
105
- ```
106
-
107
- **Result**: 60-80% cache hit rate instead of ~20%.
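The placeholder substitution described above can be sketched as follows. This is a minimal illustration, not Headroom's actual implementation: the function name `alignForCache`, the two regular expressions, and the choice to number placeholders in order of first appearance are all assumptions made for the example.

```javascript
// Hypothetical sketch of Cache Aligner-style substitution (not Headroom's code).
// Matches canonical 8-4-4-4-12 hex UUIDs and ISO 8601 timestamps with a zone.
const UUID_RE = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
const ISO_TS_RE = /\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})/g;

function alignForCache(text) {
  const ids = new Map();
  const stamps = new Map();

  // Number each distinct value in order of first appearance, so a value
  // that repeats within the text always maps to the same placeholder.
  const label = (table, prefix, value) => {
    if (!table.has(value)) table.set(value, table.size + 1);
    return `[${prefix}:${table.get(value)}]`;
  };

  const aligned = text
    .replace(UUID_RE, (m) => label(ids, "ID", m.toLowerCase()))
    .replace(ISO_TS_RE, (m) => label(stamps, "TS", m));

  // The maps are returned so a caller could restore the original values.
  return { aligned, ids, stamps };
}
```

Applied to the "Before" object in the example above, `alignForCache(...).aligned` yields the "After" form, because both values are the first of their kind in the text.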
108
-
109
- ### 2. Smart Crusher
110
-
111
- **Problem**: Tool outputs often contain repetitive JSON with many similar items.
112
-
113
- **Solution**: Statistical analysis to identify and compress redundant fields:
114
-
115
- ```json
116
- // Before (100 search results, ~50KB)
117
- [
118
-   {"title": "Result 1", "url": "...", "snippet": "...", "score": 0.95, ...},
119
-   {"title": "Result 2", "url": "...", "snippet": "...", "score": 0.93, ...},
120
-   // ... 98 more items
121
- ]
122
-
123
- // After (~5KB)
124
- {
125
-   "_meta": {"compressed": true, "original_count": 100, "kept": 12},
126
-   "items": [
127
-     // Top 12 most relevant items with essential fields only
128
-   ]
129
- }
130
- ```
131
-
132
- **Compression strategies**:
133
- - **High-variance fields**: Keep (they're informative)
134
- - **Low-variance fields**: Remove (they're redundant)
135
- - **Unique fields**: Keep first occurrence only
136
- - **Repetitive arrays**: Sample representative items
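As a rough illustration of the field analysis listed above, the sketch below drops fields whose value never changes across items and keeps a bounded number of items. It is an assumption-laden stand-in: `crushItems` is a hypothetical name, "low variance" is simplified to "exactly one distinct value", and taking the first `maxItems` entries replaces whatever relevance ranking the real Smart Crusher applies. The default of 15 mirrors `HEADROOM_SMART_CRUSHER_MAX_ITEMS=15` from the configuration section.

```javascript
// Hypothetical sketch of variance-based JSON crushing (not Headroom's code).
function crushItems(items, maxItems = 15) {
  // Small or non-array inputs are returned unchanged.
  if (!Array.isArray(items) || items.length <= maxItems) return items;

  // Collect the set of distinct serialized values seen for each field.
  const distinct = new Map();
  for (const item of items) {
    for (const [key, value] of Object.entries(item)) {
      if (!distinct.has(key)) distinct.set(key, new Set());
      distinct.get(key).add(JSON.stringify(value));
    }
  }

  // A field with a single distinct value carries no per-item information.
  const informative = [...distinct]
    .filter(([, values]) => values.size > 1)
    .map(([key]) => key);

  // Assumption: the first maxItems entries stand in for the "most relevant".
  const kept = items.slice(0, maxItems).map((item) =>
    Object.fromEntries(
      informative.filter((key) => key in item).map((key) => [key, item[key]])
    )
  );

  return {
    _meta: { compressed: true, original_count: items.length, kept: kept.length },
    items: kept,
  };
}
```

The returned shape follows the `_meta` / `items` layout of the "After" example; everything else about the selection logic is illustrative.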
137
-
138
- ### 3. CCR (Compress-Cache-Retrieve)
139
-
140
- **Problem**: Sometimes you need to retrieve compressed content later.
141
-
142
- **Solution**: Hash-based reversible compression:
143
-
144
- ```json
145
- // Compressed message
146
- {
147
-   "content": "[CCR:abc123] 100 files found. Use ccr_retrieve to explore.",
148
-   "ccr_available": true
149
- }
150
-
151
- // Tool definition injected
152
- {
153
-   "name": "ccr_retrieve",
154
-   "description": "Retrieve compressed content by hash",
155
-   "input_schema": {
156
-     "hash": "string",
157
-     "query": "string (optional search within results)"
158
-   }
159
- }
160
- ```
161
-
162
- When the LLM calls `ccr_retrieve`, Headroom returns the full original content.
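The compress-then-retrieve cycle can be sketched with a small in-memory store. This is only an illustration under stated assumptions: `CcrStore` and `fnv1a` are hypothetical names, the FNV-1a hash (computed over UTF-16 code units) is a dependency-free stand-in for whatever hash Headroom uses, and the 300-second default mirrors `HEADROOM_CCR_TTL=300` from the configuration section.

```javascript
// Hypothetical sketch of a CCR-style store (not Headroom's code).

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

class CcrStore {
  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlSeconds = 300, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
    this.entries = new Map();
  }

  // Store the full content and return the short message sent to the LLM.
  compress(content, summary) {
    const hash = fnv1a(content);
    this.entries.set(hash, { content, expiresAt: this.now() + this.ttlMs });
    return { content: `[CCR:${hash}] ${summary}`, ccr_available: true };
  }

  // Return the original content, or null if unknown or past its TTL.
  retrieve(hash) {
    const entry = this.entries.get(hash);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(hash);
      return null;
    }
    return entry.content;
  }
}
```

A `ccr_retrieve` tool handler would then call `store.retrieve(hash)` and treat `null` as an expired or unknown reference, which matches the TTL check suggested in the troubleshooting section.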
163
-
164
- ---
165
-
166
- ## Configuration
167
-
168
- ### Basic Settings
169
-
170
- ```bash
171
- # Enable/disable Headroom
172
- HEADROOM_ENABLED=true
173
-
174
- # Sidecar endpoint
175
- HEADROOM_ENDPOINT=http://localhost:8787
176
-
177
- # Request timeout (ms)
178
- HEADROOM_TIMEOUT_MS=5000
179
-
180
- # Skip compression for small requests (tokens)
181
- HEADROOM_MIN_TOKENS=500
182
-
183
- # Mode: "audit" (observe) or "optimize" (apply)
184
- HEADROOM_MODE=optimize
185
- ```
186
-
187
- ### Docker Settings
188
-
189
- ```bash
190
- # Enable automatic container management
191
- HEADROOM_DOCKER_ENABLED=true
192
-
193
- # Docker image
194
- HEADROOM_DOCKER_IMAGE=lynkr/headroom-sidecar:latest
195
-
196
- # Container name
197
- HEADROOM_DOCKER_CONTAINER_NAME=lynkr-headroom
198
-
199
- # Port mapping
200
- HEADROOM_DOCKER_PORT=8787
201
-
202
- # Resource limits
203
- HEADROOM_DOCKER_MEMORY_LIMIT=512m
204
- HEADROOM_DOCKER_CPU_LIMIT=1.0
205
-
206
- # Restart policy
207
- HEADROOM_DOCKER_RESTART_POLICY=unless-stopped
208
- ```
209
-
210
- ### Transform Settings
211
-
212
- ```bash
213
- # Smart Crusher (statistical JSON compression)
214
- HEADROOM_SMART_CRUSHER=true
215
- HEADROOM_SMART_CRUSHER_MIN_TOKENS=200
216
- HEADROOM_SMART_CRUSHER_MAX_ITEMS=15
217
-
218
- # Tool Crusher (fixed-rules compression)
219
- HEADROOM_TOOL_CRUSHER=true
220
-
221
- # Cache Aligner (stabilize dynamic content)
222
- HEADROOM_CACHE_ALIGNER=true
223
-
224
- # Rolling Window (context overflow management)
225
- HEADROOM_ROLLING_WINDOW=true
226
- HEADROOM_KEEP_TURNS=3
227
- ```
228
-
229
- ### CCR Settings
230
-
231
- ```bash
232
- # Enable CCR for reversible compression
233
- HEADROOM_CCR=true
234
-
235
- # Cache TTL in seconds
236
- HEADROOM_CCR_TTL=300
237
- ```
238
-
239
- ### LLMLingua Settings (Optional)
240
-
241
- LLMLingua provides ML-based compression using BERT token classification. Requires GPU for reasonable performance.
242
-
243
- ```bash
244
- # Enable LLMLingua (default: false)
245
- HEADROOM_LLMLINGUA=true
246
-
247
- # Device: cuda, cpu, auto
248
- HEADROOM_LLMLINGUA_DEVICE=cuda
249
- ```
250
-
251
- **Note**: LLMLingua adds 100-500ms latency per request. Only enable if you have a GPU and need maximum compression.
252
-
253
- ---
254
-
255
- ## API Endpoints
256
-
257
- ### Health Check
258
-
259
- ```bash
260
- GET /health/headroom
261
- ```
262
-
263
- Returns Headroom health status including container and service state.
264
-
265
- ### Compression Metrics
266
-
267
- ```bash
268
- GET /metrics/compression
269
- ```
270
-
271
- Returns compression statistics:
272
-
273
- ```json
274
- {
275
-   "enabled": true,
276
-   "endpoint": "http://localhost:8787",
277
-   "client": {
278
-     "totalCalls": 150,
279
-     "successfulCompressions": 120,
280
-     "skippedCompressions": 25,
281
-     "failures": 5,
282
-     "totalTokensSaved": 450000,
283
-     "averageLatencyMs": 45,
284
-     "compressionRate": 80,
285
-     "failureRate": 3
286
-   },
287
-   "server": {
288
-     "requests_total": 150,
289
-     "compressions_applied": 120,
290
-     "average_compression_ratio": 0.35,
291
-     "ccr_retrievals": 45
292
-   }
293
- }
294
- ```
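The two rate fields in the sample response are consistent with simple percentages of `totalCalls`, rounded to the nearest integer (120/150 gives 80, 5/150 gives 3). The sketch below makes that reading explicit; it is an inference from the sample numbers, not a documented formula, and `summarizeClientMetrics` is a hypothetical helper.

```javascript
// Hypothetical sketch: derive the rate fields from the raw client counters.
// Assumption: both rates are integer percentages of totalCalls.
function summarizeClientMetrics(client) {
  const pct = (count) =>
    client.totalCalls === 0 ? 0 : Math.round((count / client.totalCalls) * 100);

  return {
    compressionRate: pct(client.successfulCompressions),
    failureRate: pct(client.failures),
  };
}
```

A monitoring script could fetch `/metrics/compression`, pass the `client` object to this helper, and compare the result with the thresholds listed under Best Practices.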
295
-
296
- ### Detailed Status
297
-
298
- ```bash
299
- GET /headroom/status
300
- ```
301
-
302
- Returns full status including configuration, metrics, and recent logs.
303
-
304
- ### Container Restart
305
-
306
- ```bash
307
- POST /headroom/restart
308
- ```
309
-
310
- Restarts the Headroom container (useful for applying config changes).
311
-
312
- ### Container Logs
313
-
314
- ```bash
315
- GET /headroom/logs?tail=100
316
- ```
317
-
318
- Returns recent container logs for debugging.
319
-
320
- ---
321
-
322
- ## Monitoring
323
-
324
- ### Health Check Integration
325
-
326
- Headroom status is included in the `/health/ready` endpoint:
327
-
328
- ```json
329
- {
330
-   "status": "ready",
331
-   "checks": {
332
-     "database": { "healthy": true },
333
-     "memory": { "healthy": true },
334
-     "headroom": {
335
-       "healthy": true,
336
-       "enabled": true,
337
-       "service": "available",
338
-       "docker": "running"
339
-     }
340
-   }
341
- }
342
- ```
343
-
344
- **Note**: Headroom is non-critical. If it fails, Lynkr continues without compression.
345
-
346
- ### Logging
347
-
348
- Headroom logs compression events:
349
-
350
- ```
351
- INFO: Headroom compression applied
352
-   tokensBefore: 15000
353
-   tokensAfter: 5200
354
-   savingsPercent: 65.3
355
-   latencyMs: 42
356
-   transforms: ["cache_aligner", "smart_crusher"]
357
- ```
358
-
359
- ---
360
-
361
- ## Troubleshooting
362
-
363
- ### Container Won't Start
364
-
365
- **Check Docker is running:**
366
- ```bash
367
- docker ps
368
- ```
369
-
370
- **Check for port conflicts:**
371
- ```bash
372
- lsof -i :8787
373
- ```
374
-
375
- **View container logs:**
376
- ```bash
377
- curl http://localhost:8081/headroom/logs
378
- # or
379
- docker logs lynkr-headroom
380
- ```
381
-
382
- ### High Latency
383
-
384
- 1. **Reduce transforms**: Disable LLMLingua if not needed
385
- 2. **Increase resources**: Raise `HEADROOM_DOCKER_MEMORY_LIMIT`
386
- 3. **Skip small requests**: Increase `HEADROOM_MIN_TOKENS`
387
-
388
- ### Compression Not Applied
389
-
390
- Check:
391
- 1. `HEADROOM_ENABLED=true` in `.env`
392
- 2. Request has more than `HEADROOM_MIN_TOKENS` tokens
393
- 3. Health endpoint shows `healthy: true`
394
-
395
- ### CCR Retrieval Fails
396
-
397
- 1. Check `HEADROOM_CCR=true`
398
- 2. Verify TTL hasn't expired (`HEADROOM_CCR_TTL`)
399
- 3. Ensure same session is used (CCR is session-scoped)
400
-
401
- ---
402
-
403
- ## Architecture
404
-
405
- ### System Diagram
406
-
407
- ```
408
- ┌─────────────────────────────────────────────────────────────────┐
409
- │ Lynkr (Node.js) │
410
- │ ┌──────────────────────────────────────────────────────────┐ │
411
- │ │ Request Handler │ │
412
- │ │ ↓ │ │
413
- │ │ src/headroom/client.js ──HTTP──→ Headroom Sidecar │ │
414
- │ │ ↓ (Python Container) │ │
415
- │ │ Compressed Request │ │ │
416
- │ │ ↓ ↓ │ │
417
- │ │ LLM Provider ┌─────────────┐ │ │
418
- │ │ │ Transforms │ │ │
419
- │ └──────────────────────────────────│ - Aligner │─────────┘ │
420
- │ │ - Crusher │ │
421
- │ │ - CCR Store │ │
422
- │ │ - LLMLingua │ │
423
- │ └─────────────┘ │
424
- └─────────────────────────────────────────────────────────────────┘
425
- ```
426
-
427
- ### Request Flow
428
-
429
- 1. **Request arrives** at Lynkr
430
- 2. **Token estimation** - Skip if below `HEADROOM_MIN_TOKENS`
431
- 3. **Send to sidecar** - HTTP POST to `/compress`
432
- 4. **Transform pipeline** executes:
433
-    - Cache Aligner stabilizes dynamic content
434
-    - Smart Crusher compresses JSON structures
435
-    - Context Manager enforces token budget
436
- 5. **Return compressed** messages and tools
437
- 6. **Forward to LLM** provider
438
- 7. **On CCR tool call** - Retrieve original content
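Steps 2 through 6 of the flow above can be sketched as a small gate around the sidecar call. The sketch rests on several assumptions: `maybeCompress` and `estimateTokens` are hypothetical names, the four-characters-per-token estimate is a common heuristic rather than Lynkr's estimator, and the `compress` callback stands in for the HTTP POST to `/compress`. The fallback on error reflects the statement that Headroom is non-critical, and the `audit` branch reflects `HEADROOM_MODE=audit`.

```javascript
// Hypothetical sketch of the request gate (not Lynkr's code).

// Assumption: roughly 4 characters per token.
function estimateTokens(messages) {
  const chars = messages.reduce((n, m) => n + String(m.content).length, 0);
  return Math.ceil(chars / 4);
}

async function maybeCompress(messages, { minTokens = 500, mode = "optimize", compress }) {
  // Step 2: skip small requests entirely.
  if (estimateTokens(messages) < minTokens) {
    return { messages, skipped: true };
  }
  try {
    // Steps 3-5: the sidecar runs the transform pipeline and returns messages.
    const result = await compress(messages);
    // Audit mode records the result but forwards the original messages.
    if (mode === "audit") return { messages, skipped: false, audited: result };
    return { messages: result.messages, skipped: false };
  } catch (err) {
    // Headroom is non-critical: fall back to the uncompressed request.
    return { messages, skipped: false, error: String(err) };
  }
}
```

The caller forwards `messages` from the returned object to the LLM provider (step 6) regardless of which branch was taken.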
439
-
440
- ### File Structure
441
-
442
- ```
443
- src/headroom/
444
- ├── index.js # HeadroomManager singleton, exports
445
- ├── launcher.js # Docker container lifecycle (dockerode)
446
- ├── client.js # HTTP client for sidecar API
447
- └── health.js # Health check functionality
448
- ```
449
-
450
- ---
451
-
452
- ## Best Practices
453
-
454
- ### 1. Start with Defaults
455
-
456
- The default configuration is optimized for most use cases:
457
- - Smart Crusher: Enabled
458
- - Cache Aligner: Enabled
459
- - CCR: Enabled
460
- - LLMLingua: Disabled (enable only with GPU)
461
-
462
- ### 2. Monitor Compression Rates
463
-
464
- Check `/metrics/compression` regularly:
465
- - **Good**: 60-80% compression rate
466
- - **Warning**: Below 40% (check transform settings)
467
- - **Issue**: High failure rate (check container health)
468
-
469
- ### 3. Tune for Your Workload
470
-
471
- | Workload | Recommended Settings |
472
- |----------|---------------------|
473
- | Code assistance | `SMART_CRUSHER_MAX_ITEMS=20` |
474
- | Search-heavy | `SMART_CRUSHER_MAX_ITEMS=10`, CCR enabled |
475
- | Long conversations | `ROLLING_WINDOW=true`, `KEEP_TURNS=5` |
476
- | Cost-sensitive | Enable LLMLingua with GPU |
477
-
478
- ### 4. Use Audit Mode First
479
-
480
- Test compression without applying it:
481
-
482
- ```bash
483
- HEADROOM_MODE=audit
484
- ```
485
-
486
- This logs what would be compressed without modifying requests.
487
-
488
- ---
489
-
490
- ## FAQ
491
-
492
- ### Does Headroom affect response quality?
493
-
494
- Minimal impact. Smart Crusher preserves high-variance (informative) fields and CCR allows full retrieval when needed. LLMLingua may have ~1.5% quality reduction.
495
-
496
- ### Can I use Headroom without Docker?
497
-
498
- Yes. Disable Docker management and run the sidecar manually:
499
-
500
- ```bash
501
- HEADROOM_DOCKER_ENABLED=false
502
- HEADROOM_ENDPOINT=http://your-headroom-server:8787
503
- ```
504
-
505
- ### Is Headroom required?
506
-
507
- No. If Headroom fails or is disabled, Lynkr works normally without compression.
508
-
509
- ### What providers benefit most?
510
-
511
- All providers benefit from compression. Anthropic and OpenAI see additional benefits from Cache Aligner improving cache hit rates.
512
-
513
- ---
514
-
515
- ## References
516
-
517
- - [Headroom GitHub Repository](https://github.com/chopratejas/headroom)
518
- - [LLMLingua Paper](https://arxiv.org/abs/2310.05736)
519
- - [Anthropic Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)