lynkr 8.0.0 → 9.0.1
- package/.lynkr/telemetry.db +0 -0
- package/.lynkr/telemetry.db-shm +0 -0
- package/.lynkr/telemetry.db-wal +0 -0
- package/README.md +196 -322
- package/lynkr-skill.tar.gz +0 -0
- package/package.json +4 -3
- package/src/api/openai-router.js +64 -13
- package/src/api/providers-handler.js +171 -3
- package/src/api/router.js +9 -2
- package/src/clients/circuit-breaker.js +10 -247
- package/src/clients/codex-process.js +342 -0
- package/src/clients/codex-utils.js +143 -0
- package/src/clients/databricks.js +210 -63
- package/src/clients/resilience.js +540 -0
- package/src/clients/retry.js +22 -167
- package/src/clients/standard-tools.js +23 -0
- package/src/config/index.js +77 -0
- package/src/context/compression.js +42 -9
- package/src/context/distill.js +492 -0
- package/src/orchestrator/index.js +48 -8
- package/src/routing/complexity-analyzer.js +258 -5
- package/src/routing/index.js +12 -2
- package/src/routing/latency-tracker.js +148 -0
- package/src/routing/model-tiers.js +2 -0
- package/src/routing/quality-scorer.js +113 -0
- package/src/routing/telemetry.js +464 -0
- package/src/server.js +13 -12
- package/src/tools/code-graph.js +538 -0
- package/src/tools/code-mode.js +304 -0
- package/src/tools/index.js +4 -0
- package/src/tools/lazy-loader.js +18 -0
- package/src/tools/mcp-remote.js +7 -0
- package/src/tools/smart-selection.js +11 -0
- package/src/tools/tinyfish.js +358 -0
- package/src/tools/truncate.js +1 -0
- package/src/utils/payload.js +206 -0
- package/src/utils/perf-timer.js +80 -0
- package/.github/FUNDING.yml +0 -15
- package/.github/workflows/README.md +0 -215
- package/.github/workflows/ci.yml +0 -69
- package/.github/workflows/index.yml +0 -62
- package/.github/workflows/web-tools-tests.yml +0 -56
- package/CITATIONS.bib +0 -6
- package/DEPLOYMENT.md +0 -1001
- package/LYNKR-TUI-PLAN.md +0 -984
- package/PERFORMANCE-REPORT.md +0 -866
- package/PLAN-per-client-model-routing.md +0 -252
- package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
- package/docs/BingSiteAuth.xml +0 -4
- package/docs/docs-style.css +0 -478
- package/docs/docs.html +0 -198
- package/docs/google5be250e608e6da39.html +0 -1
- package/docs/index.html +0 -577
- package/docs/index.md +0 -584
- package/docs/robots.txt +0 -4
- package/docs/sitemap.xml +0 -44
- package/docs/style.css +0 -1223
- package/docs/toon-integration-spec.md +0 -130
- package/documentation/README.md +0 -101
- package/documentation/api.md +0 -806
- package/documentation/claude-code-cli.md +0 -679
- package/documentation/codex-cli.md +0 -397
- package/documentation/contributing.md +0 -571
- package/documentation/cursor-integration.md +0 -734
- package/documentation/docker.md +0 -874
- package/documentation/embeddings.md +0 -762
- package/documentation/faq.md +0 -713
- package/documentation/features.md +0 -403
- package/documentation/headroom.md +0 -519
- package/documentation/installation.md +0 -758
- package/documentation/memory-system.md +0 -476
- package/documentation/production.md +0 -636
- package/documentation/providers.md +0 -1009
- package/documentation/routing.md +0 -476
- package/documentation/testing.md +0 -629
- package/documentation/token-optimization.md +0 -325
- package/documentation/tools.md +0 -697
- package/documentation/troubleshooting.md +0 -969
- package/final-test.js +0 -33
- package/headroom-sidecar/config.py +0 -93
- package/headroom-sidecar/requirements.txt +0 -14
- package/headroom-sidecar/server.py +0 -451
- package/monitor-agents.sh +0 -31
- package/scripts/audit-log-reader.js +0 -399
- package/scripts/compact-dictionary.js +0 -204
- package/scripts/test-deduplication.js +0 -448
- package/src/db/database.sqlite +0 -0
- package/te +0 -11622
- package/test/README.md +0 -212
- package/test/azure-openai-config.test.js +0 -213
- package/test/azure-openai-error-resilience.test.js +0 -238
- package/test/azure-openai-format-conversion.test.js +0 -354
- package/test/azure-openai-integration.test.js +0 -287
- package/test/azure-openai-routing.test.js +0 -175
- package/test/azure-openai-streaming.test.js +0 -171
- package/test/bedrock-integration.test.js +0 -457
- package/test/comprehensive-test-suite.js +0 -928
- package/test/config-validation.test.js +0 -207
- package/test/cursor-integration.test.js +0 -484
- package/test/format-conversion.test.js +0 -578
- package/test/hybrid-routing-integration.test.js +0 -269
- package/test/hybrid-routing-performance.test.js +0 -428
- package/test/llamacpp-integration.test.js +0 -882
- package/test/lmstudio-integration.test.js +0 -347
- package/test/memory/extractor.test.js +0 -398
- package/test/memory/retriever.test.js +0 -613
- package/test/memory/retriever.test.js.bak +0 -585
- package/test/memory/search.test.js +0 -537
- package/test/memory/search.test.js.bak +0 -389
- package/test/memory/store.test.js +0 -344
- package/test/memory/store.test.js.bak +0 -312
- package/test/memory/surprise.test.js +0 -300
- package/test/memory-performance.test.js +0 -472
- package/test/openai-integration.test.js +0 -683
- package/test/openrouter-error-resilience.test.js +0 -418
- package/test/passthrough-mode.test.js +0 -385
- package/test/performance-benchmark.js +0 -351
- package/test/performance-tests.js +0 -528
- package/test/routing.test.js +0 -225
- package/test/toon-compression.test.js +0 -131
- package/test/web-tools.test.js +0 -329
- package/test-agents-simple.js +0 -43
- package/test-cli-connection.sh +0 -33
- package/test-learning-unit.js +0 -126
- package/test-learning.js +0 -112
- package/test-parallel-agents.sh +0 -124
- package/test-parallel-direct.js +0 -155
- package/test-subagents.sh +0 -117
@@ -1,519 +0,0 @@
-# Headroom Context Compression
-
-Headroom is an intelligent context compression system that reduces LLM token usage by 47-92% while preserving semantic meaning. It runs as a Python sidecar container that Lynkr manages automatically via Docker.
-
----
-
-## Overview
-
-### What is Headroom?
-
-Headroom is a context optimization SDK that compresses LLM prompts and tool outputs using:
-
-1. **Smart Crusher** - Statistical JSON compression based on field analysis
-2. **Cache Aligner** - Stabilizes dynamic content (UUIDs, timestamps) for provider cache hits
-3. **CCR (Compress-Cache-Retrieve)** - Reversible compression with on-demand retrieval
-4. **Rolling Window** - Token budget enforcement with turn-based windowing
-5. **LLMLingua** (optional) - ML-based 20x compression using BERT
-
-### Benefits
-
-| Metric | Without Headroom | With Headroom |
-|--------|-----------------|---------------|
-| Token usage | 100% | 8-53% (47-92% reduction) |
-| Cache hit rate | ~20% | ~60-80% |
-| Cost per request | $0.01-0.05 | $0.002-0.02 |
-| Context overflow | Common | Rare |
-
----
-
-## Quick Start
-
-### 1. Enable Headroom
-
-Add to your `.env`:
-
-```bash
-# Enable Headroom compression
-HEADROOM_ENABLED=true
-```
-
-### 2. Start Lynkr
-
-```bash
-npm start
-```
-
-Lynkr will automatically:
-1. Pull the `lynkr/headroom-sidecar:latest` Docker image
-2. Start the container with configured settings
-3. Wait for health checks to pass
-4. Begin compressing requests
-
-### 3. Verify It's Working
-
-Check the health endpoint:
-
-```bash
-curl http://localhost:8081/health/headroom
-```
-
-Expected response:
-```json
-{
-  "enabled": true,
-  "healthy": true,
-  "service": {
-    "available": true,
-    "ccrEnabled": true,
-    "llmlinguaEnabled": false
-  },
-  "docker": {
-    "running": true,
-    "status": "running",
-    "health": "healthy"
-  }
-}
-```
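
The health check above could be automated along these lines. This is a minimal sketch, not Lynkr's own logic: the function name `headroom_is_ready` and the rule that all four flags must be true are assumptions made for illustration, and the sample payload is copied from the expected response shown above.

```python
import json

# Sample payload copied from the expected response above.
SAMPLE_HEALTH = json.loads("""
{
  "enabled": true,
  "healthy": true,
  "service": {"available": true, "ccrEnabled": true, "llmlinguaEnabled": false},
  "docker": {"running": true, "status": "running", "health": "healthy"}
}
""")


def headroom_is_ready(payload: dict) -> bool:
    """Return True only if Headroom is enabled, healthy, and its sidecar is up.

    Which flags to combine is an assumption for illustration.
    """
    service = payload.get("service", {})
    docker = payload.get("docker", {})
    return bool(
        payload.get("enabled")
        and payload.get("healthy")
        and service.get("available")
        and docker.get("running")
    )


if __name__ == "__main__":
    print(headroom_is_ready(SAMPLE_HEALTH))
```

In practice the payload would come from the `/health/headroom` endpoint rather than from an embedded string.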
-
----
-
-## How It Works
-
-### Transform Pipeline
-
-When a request arrives, Headroom processes it through a three-stage pipeline:
-
-```
-Request → Cache Aligner → Smart Crusher → Context Manager → Compressed Request
-                ↓               ↓                ↓
-          Stabilize IDs   Compress JSON   Enforce budget
-```
-
-### 1. Cache Aligner
-
-**Problem**: Dynamic content like UUIDs and timestamps change every request, preventing provider cache hits.
-
-**Solution**: Replace dynamic values with stable placeholders:
-
-```json
-// Before
-{"id": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "created": "2024-01-15T10:30:00Z"}
-
-// After
-{"id": "[ID:1]", "created": "[TS:1]"}
-```
-
-**Result**: 60-80% cache hit rate instead of ~20%.
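
The placeholder substitution described above can be sketched as follows. This is an illustrative approximation only: the regular expressions, the `align` function name, and the returned mapping are assumptions, and Headroom's actual Cache Aligner may detect and label dynamic values differently.

```python
import re

# Patterns for the two kinds of dynamic values named above (assumed formats).
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
TIMESTAMP_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)


def align(text):
    """Replace UUIDs and ISO 8601 timestamps with numbered placeholders.

    Returns the rewritten text plus a placeholder -> original mapping so the
    substitution can be reversed later.
    """
    mapping = {}

    def substitute(pattern, prefix, source):
        seen = {}  # original value -> placeholder, so repeats reuse one label

        def replace(match):
            value = match.group(0)
            if value not in seen:
                placeholder = f"[{prefix}:{len(seen) + 1}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]

        return pattern.sub(replace, source)

    text = substitute(UUID_RE, "ID", text)
    text = substitute(TIMESTAMP_RE, "TS", text)
    return text, mapping
```

Because a repeated value always maps to the same placeholder, two requests that differ only in their IDs and timestamps produce identical text, which is what lets a provider-side prompt cache match them.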
-
-### 2. Smart Crusher
-
-**Problem**: Tool outputs often contain repetitive JSON with many similar items.
-
-**Solution**: Statistical analysis to identify and compress redundant fields:
-
-```json
-// Before (100 search results, ~50KB)
-[
-  {"title": "Result 1", "url": "...", "snippet": "...", "score": 0.95, ...},
-  {"title": "Result 2", "url": "...", "snippet": "...", "score": 0.93, ...},
-  // ... 98 more items
-]
-
-// After (~5KB)
-{
-  "_meta": {"compressed": true, "original_count": 100, "kept": 12},
-  "items": [
-    // Top 12 most relevant items with essential fields only
-  ]
-}
-```
-
-**Compression strategies**:
-- **High-variance fields**: Keep (they're informative)
-- **Low-variance fields**: Remove (they're redundant)
-- **Unique fields**: Keep first occurrence only
-- **Repetitive arrays**: Sample representative items
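
Two of the strategies above, dropping low-variance fields and sampling a repetitive array, can be sketched as shown below. This is a simplified stand-in rather than Headroom's algorithm: here "low variance" means the field has the same value in every item, "representative" means highest `score`, and the `crush` name and its parameters are hypothetical.

```python
import json


def crush(items, max_items=15, score_key="score"):
    """Keep the top-scoring items and drop fields that never vary.

    Assumptions: every item is a JSON-serializable dict, and a numeric
    `score_key` field (defaulting to 0 when absent) ranks relevance.
    """
    if len(items) > 1:
        all_keys = set().union(*(item.keys() for item in items))
        # A field is informative only if at least two items disagree on it.
        informative = {
            key
            for key in all_keys
            if len({json.dumps(item.get(key), sort_keys=True) for item in items}) > 1
        }
    else:
        # With zero or one item there is nothing to compare against.
        informative = set().union(*(item.keys() for item in items)) if items else set()

    ranked = sorted(items, key=lambda item: item.get(score_key, 0), reverse=True)
    kept = [
        {key: value for key, value in item.items() if key in informative}
        for item in ranked[:max_items]
    ]
    return {
        "_meta": {"compressed": True, "original_count": len(items), "kept": len(kept)},
        "items": kept,
    }
```

The `_meta` envelope mirrors the "After" example above, so a consumer can tell how many items were dropped.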
-
-### 3. CCR (Compress-Cache-Retrieve)
-
-**Problem**: Sometimes you need to retrieve compressed content later.
-
-**Solution**: Hash-based reversible compression:
-
-```json
-// Compressed message
-{
-  "content": "[CCR:abc123] 100 files found. Use ccr_retrieve to explore.",
-  "ccr_available": true
-}
-
-// Tool definition injected
-{
-  "name": "ccr_retrieve",
-  "description": "Retrieve compressed content by hash",
-  "input_schema": {
-    "hash": "string",
-    "query": "string (optional search within results)"
-  }
-}
-```
-
-When the LLM calls `ccr_retrieve`, Headroom returns the full original content.
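
A hash-keyed store with expiry is one way to picture the compress/retrieve round trip. The sketch below is an assumption-laden illustration: the `CCRStore` class, the truncated SHA-256 digest, and the line-based `query` filter are all hypothetical, and only the message shape and the TTL idea come from the text above.

```python
import hashlib
import time
from typing import Optional


class CCRStore:
    """In-memory store mapping a content hash to the original text, with a TTL."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # digest -> (expires_at, original content)

    def compress(self, content: str, summary: str) -> dict:
        """Store the original and return a short stand-in message."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._entries[digest] = (self._clock() + self._ttl, content)
        return {
            "content": f"[CCR:{digest}] {summary} Use ccr_retrieve to explore.",
            "ccr_available": True,
        }

    def retrieve(self, digest: str, query: Optional[str] = None) -> Optional[str]:
        """Return the original content, or None if unknown or expired."""
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:
            del self._entries[digest]
            return None
        if query:
            # Optional search: keep only lines containing the query text.
            needle = query.lower()
            return "\n".join(
                line for line in content.splitlines() if needle in line.lower()
            )
        return content
```

An expired or unknown hash returns `None`, which corresponds to the TTL-related retrieval failures listed under Troubleshooting.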
-
----
-
-## Configuration
-
-### Basic Settings
-
-```bash
-# Enable/disable Headroom
-HEADROOM_ENABLED=true
-
-# Sidecar endpoint
-HEADROOM_ENDPOINT=http://localhost:8787
-
-# Request timeout (ms)
-HEADROOM_TIMEOUT_MS=5000
-
-# Skip compression for small requests (tokens)
-HEADROOM_MIN_TOKENS=500
-
-# Mode: "audit" (observe) or "optimize" (apply)
-HEADROOM_MODE=optimize
-```
-
-### Docker Settings
-
-```bash
-# Enable automatic container management
-HEADROOM_DOCKER_ENABLED=true
-
-# Docker image
-HEADROOM_DOCKER_IMAGE=lynkr/headroom-sidecar:latest
-
-# Container name
-HEADROOM_DOCKER_CONTAINER_NAME=lynkr-headroom
-
-# Port mapping
-HEADROOM_DOCKER_PORT=8787
-
-# Resource limits
-HEADROOM_DOCKER_MEMORY_LIMIT=512m
-HEADROOM_DOCKER_CPU_LIMIT=1.0
-
-# Restart policy
-HEADROOM_DOCKER_RESTART_POLICY=unless-stopped
-```
-
-### Transform Settings
-
-```bash
-# Smart Crusher (statistical JSON compression)
-HEADROOM_SMART_CRUSHER=true
-HEADROOM_SMART_CRUSHER_MIN_TOKENS=200
-HEADROOM_SMART_CRUSHER_MAX_ITEMS=15
-
-# Tool Crusher (fixed-rules compression)
-HEADROOM_TOOL_CRUSHER=true
-
-# Cache Aligner (stabilize dynamic content)
-HEADROOM_CACHE_ALIGNER=true
-
-# Rolling Window (context overflow management)
-HEADROOM_ROLLING_WINDOW=true
-HEADROOM_KEEP_TURNS=3
-```
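
To make the `HEADROOM_KEEP_TURNS` setting concrete, here is a rough sketch of turn-based windowing. It covers only the windowing half of the Rolling Window transform (no token budgeting), and it assumes that a turn begins at each `user` message and that `system` messages are always retained; neither assumption is confirmed by the text above.

```python
def rolling_window(messages, keep_turns=3):
    """Keep system messages plus the most recent `keep_turns` turns.

    A "turn" is assumed to start at each user message and to include every
    message up to the next user message.
    """
    if keep_turns < 1:
        raise ValueError("keep_turns must be at least 1")

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Positions in `rest` where a new turn begins.
    turn_starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(turn_starts) <= keep_turns:
        return system + rest  # nothing to trim

    cutoff = turn_starts[-keep_turns]
    return system + rest[cutoff:]
```

With `keep_turns=3`, a five-turn conversation is trimmed to the system prompt plus turns three through five.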
-
-### CCR Settings
-
-```bash
-# Enable CCR for reversible compression
-HEADROOM_CCR=true
-
-# Cache TTL in seconds
-HEADROOM_CCR_TTL=300
-```
-
-### LLMLingua Settings (Optional)
-
-LLMLingua provides ML-based compression using BERT token classification. Requires GPU for reasonable performance.
-
-```bash
-# Enable LLMLingua (default: false)
-HEADROOM_LLMLINGUA=true
-
-# Device: cuda, cpu, auto
-HEADROOM_LLMLINGUA_DEVICE=cuda
-```
-
-**Note**: LLMLingua adds 100-500ms latency per request. Only enable if you have a GPU and need maximum compression.
-
----
-
-## API Endpoints
-
-### Health Check
-
-```bash
-GET /health/headroom
-```
-
-Returns Headroom health status including container and service state.
-
-### Compression Metrics
-
-```bash
-GET /metrics/compression
-```
-
-Returns compression statistics:
-
-```json
-{
-  "enabled": true,
-  "endpoint": "http://localhost:8787",
-  "client": {
-    "totalCalls": 150,
-    "successfulCompressions": 120,
-    "skippedCompressions": 25,
-    "failures": 5,
-    "totalTokensSaved": 450000,
-    "averageLatencyMs": 45,
-    "compressionRate": 80,
-    "failureRate": 3
-  },
-  "server": {
-    "requests_total": 150,
-    "compressions_applied": 120,
-    "average_compression_ratio": 0.35,
-    "ccr_retrievals": 45
-  }
-}
-```
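
The two rate fields in the sample above are consistent with simple percentages of `totalCalls`, rounded to whole numbers (120/150 gives 80, and 5/150 gives about 3.3). The sketch below reproduces that arithmetic; treating the rates this way is an inference from the sample values, not a documented formula, and `derive_rates` is a hypothetical helper.

```python
def derive_rates(client: dict) -> dict:
    """Recompute compressionRate and failureRate as rounded percentages."""
    total = client["totalCalls"]
    if total == 0:
        return {"compressionRate": 0, "failureRate": 0}
    return {
        "compressionRate": round(100 * client["successfulCompressions"] / total),
        "failureRate": round(100 * client["failures"] / total),
    }


# Counters copied from the sample response above.
SAMPLE_CLIENT = {
    "totalCalls": 150,
    "successfulCompressions": 120,
    "skippedCompressions": 25,
    "failures": 5,
}

if __name__ == "__main__":
    print(derive_rates(SAMPLE_CLIENT))
```

The three outcome counters in the sample also sum to the call total (120 + 25 + 5 = 150), which is a quick sanity check when reading these metrics.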
-
-### Detailed Status
-
-```bash
-GET /headroom/status
-```
-
-Returns full status including configuration, metrics, and recent logs.
-
-### Container Restart
-
-```bash
-POST /headroom/restart
-```
-
-Restarts the Headroom container (useful for applying config changes).
-
-### Container Logs
-
-```bash
-GET /headroom/logs?tail=100
-```
-
-Returns recent container logs for debugging.
-
----
-
-## Monitoring
-
-### Health Check Integration
-
-Headroom status is included in the `/health/ready` endpoint:
-
-```json
-{
-  "status": "ready",
-  "checks": {
-    "database": { "healthy": true },
-    "memory": { "healthy": true },
-    "headroom": {
-      "healthy": true,
-      "enabled": true,
-      "service": "available",
-      "docker": "running"
-    }
-  }
-}
-```
-
-**Note**: Headroom is non-critical. If it fails, Lynkr continues without compression.
-
-### Logging
-
-Headroom logs compression events:
-
-```
-INFO: Headroom compression applied
-  tokensBefore: 15000
-  tokensAfter: 5200
-  savingsPercent: 65.3
-  latencyMs: 42
-  transforms: ["cache_aligner", "smart_crusher"]
-```
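
The `savingsPercent` value in the log sample matches the share of tokens removed, rounded to one decimal place: (15000 - 5200) / 15000 is about 65.3%. A small helper makes the relationship explicit; the function name is hypothetical and the rounding rule is inferred from the sample.

```python
def savings_percent(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed by compression, to one decimal place."""
    if tokens_before <= 0:
        return 0.0
    return round(100 * (tokens_before - tokens_after) / tokens_before, 1)


if __name__ == "__main__":
    # Values taken from the log sample above.
    print(savings_percent(15000, 5200))
```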
-
----
-
-## Troubleshooting
-
-### Container Won't Start
-
-**Check Docker is running:**
-```bash
-docker ps
-```
-
-**Check for port conflicts:**
-```bash
-lsof -i :8787
-```
-
-**View container logs:**
-```bash
-curl http://localhost:8081/headroom/logs
-# or
-docker logs lynkr-headroom
-```
-
-### High Latency
-
-1. **Reduce transforms**: Disable LLMLingua if not needed
-2. **Increase resources**: Raise `HEADROOM_DOCKER_MEMORY_LIMIT`
-3. **Skip small requests**: Increase `HEADROOM_MIN_TOKENS`
-
-### Compression Not Applied
-
-Check:
-1. `HEADROOM_ENABLED=true` in `.env`
-2. Request has more than `HEADROOM_MIN_TOKENS` tokens
-3. Health endpoint shows `healthy: true`
-
-### CCR Retrieval Fails
-
-1. Check `HEADROOM_CCR=true`
-2. Verify TTL hasn't expired (`HEADROOM_CCR_TTL`)
-3. Ensure same session is used (CCR is session-scoped)
-
----
-
-## Architecture
-
-### System Diagram
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ Lynkr (Node.js) │
-│ ┌──────────────────────────────────────────────────────────┐ │
-│ │ Request Handler │ │
-│ │ ↓ │ │
-│ │ src/headroom/client.js ──HTTP──→ Headroom Sidecar │ │
-│ │ ↓ (Python Container) │ │
-│ │ Compressed Request │ │ │
-│ │ ↓ ↓ │ │
-│ │ LLM Provider ┌─────────────┐ │ │
-│ │ │ Transforms │ │ │
-│ └──────────────────────────────────│ - Aligner │─────────┘ │
-│ │ - Crusher │ │
-│ │ - CCR Store │ │
-│ │ - LLMLingua │ │
-│ └─────────────┘ │
-└─────────────────────────────────────────────────────────────────┘
-```
-
-### Request Flow
-
-1. **Request arrives** at Lynkr
-2. **Token estimation** - Skip if below `HEADROOM_MIN_TOKENS`
-3. **Send to sidecar** - HTTP POST to `/compress`
-4. **Transform pipeline** executes:
-   - Cache Aligner stabilizes dynamic content
-   - Smart Crusher compresses JSON structures
-   - Context Manager enforces token budget
-5. **Return compressed** messages and tools
-6. **Forward to LLM** provider
-7. **On CCR tool call** - Retrieve original content
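
Steps 2 through 5 of the flow above, together with the documented fall-back when Headroom is unavailable, might look like this from the caller's side. This is a sketch under stated assumptions: the real client is `src/headroom/client.js` in Node.js, and the request and response body shapes, the four-characters-per-token estimate, and every function name here are hypothetical.

```python
import json
import urllib.request


def estimate_tokens(messages) -> int:
    """Crude size estimate (assumption: roughly 4 characters per token)."""
    return sum(len(json.dumps(message)) for message in messages) // 4


def post_json(url, payload, timeout_s):
    """POST a JSON body and decode the JSON reply using only the stdlib."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def maybe_compress(messages, tools, *, endpoint="http://localhost:8787",
                   min_tokens=500, timeout_ms=5000, transport=post_json):
    """Return (messages, tools, outcome), never raising on sidecar errors."""
    # Step 2: skip small requests entirely.
    if estimate_tokens(messages) < min_tokens:
        return messages, tools, "skipped"
    try:
        # Step 3: hand the request to the sidecar (body shape is assumed).
        result = transport(
            f"{endpoint}/compress",
            {"messages": messages, "tools": tools},
            timeout_ms / 1000,
        )
        # Step 5: use the compressed messages and tools.
        return result["messages"], result.get("tools", tools), "compressed"
    except Exception:
        # Headroom is non-critical: fall back to the original request.
        return messages, tools, "failed"
```

Passing the transport as a parameter keeps the decision logic testable without a running sidecar.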
-
-### File Structure
-
-```
-src/headroom/
-├── index.js      # HeadroomManager singleton, exports
-├── launcher.js   # Docker container lifecycle (dockerode)
-├── client.js     # HTTP client for sidecar API
-└── health.js     # Health check functionality
-```
-
----
-
-## Best Practices
-
-### 1. Start with Defaults
-
-The default configuration is optimized for most use cases:
-- Smart Crusher: Enabled
-- Cache Aligner: Enabled
-- CCR: Enabled
-- LLMLingua: Disabled (enable only with GPU)
-
-### 2. Monitor Compression Rates
-
-Check `/metrics/compression` regularly:
-- **Good**: 60-80% compression rate
-- **Warning**: Below 40% (check transform settings)
-- **Issue**: High failure rate (check container health)
-
-### 3. Tune for Your Workload
-
-| Workload | Recommended Settings |
-|----------|---------------------|
-| Code assistance | `SMART_CRUSHER_MAX_ITEMS=20` |
-| Search-heavy | `SMART_CRUSHER_MAX_ITEMS=10`, CCR enabled |
-| Long conversations | `ROLLING_WINDOW=true`, `KEEP_TURNS=5` |
-| Cost-sensitive | Enable LLMLingua with GPU |
-
-### 4. Use Audit Mode First
-
-Test compression without applying it:
-
-```bash
-HEADROOM_MODE=audit
-```
-
-This logs what would be compressed without modifying requests.
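
The difference between the two `HEADROOM_MODE` values can be pictured as a single branch: both modes compute and log the compression result, but only `optimize` forwards it. The sketch below illustrates that reading; the `select_request` function and its log format are hypothetical, not Lynkr's implementation.

```python
def select_request(mode, original, compressed, tokens_before, tokens_after, log=print):
    """Log the would-be savings, then pick which request body to forward."""
    if mode not in ("audit", "optimize"):
        raise ValueError(f"unknown mode: {mode!r}")
    log(f"headroom mode={mode} tokensBefore={tokens_before} tokensAfter={tokens_after}")
    # Audit mode observes only; optimize mode applies the compressed request.
    return compressed if mode == "optimize" else original
```

Running in `audit` first gives the same log lines as `optimize`, so savings can be compared before any request is altered.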
-
----
-
-## FAQ
-
-### Does Headroom affect response quality?
-
-Minimal impact. Smart Crusher preserves high-variance (informative) fields and CCR allows full retrieval when needed. LLMLingua may have ~1.5% quality reduction.
-
-### Can I use Headroom without Docker?
-
-Yes. Disable Docker management and run the sidecar manually:
-
-```bash
-HEADROOM_DOCKER_ENABLED=false
-HEADROOM_ENDPOINT=http://your-headroom-server:8787
-```
-
-### Is Headroom required?
-
-No. If Headroom fails or is disabled, Lynkr works normally without compression.
-
-### What providers benefit most?
-
-All providers benefit from compression. Anthropic and OpenAI see additional benefits from Cache Aligner improving cache hit rates.
-
----
-
-## References
-
-- [Headroom GitHub Repository](https://github.com/chopratejas/headroom)
-- [LLMLingua Paper](https://arxiv.org/abs/2310.05736)
-- [Anthropic Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)