npm - lynkr - Versions diffs - 7.2.5 → 8.0.1 - Mend

lynkr 7.2.5 → 8.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (124) hide show

package/README.md +3 -3
package/config/model-tiers.json +89 -0
package/install.sh +6 -1
package/package.json +4 -2
package/scripts/setup.js +0 -1
package/src/agents/executor.js +14 -6
package/src/api/middleware/session.js +15 -2
package/src/api/openai-router.js +162 -37
package/src/api/providers-handler.js +15 -1
package/src/api/router.js +107 -2
package/src/budget/index.js +4 -3
package/src/clients/databricks.js +431 -234
package/src/clients/gpt-utils.js +181 -0
package/src/clients/ollama-utils.js +66 -140
package/src/clients/routing.js +0 -1
package/src/clients/standard-tools.js +99 -3
package/src/config/index.js +133 -35
package/src/context/toon.js +173 -0
package/src/logger/index.js +23 -0
package/src/orchestrator/index.js +688 -213
package/src/routing/agentic-detector.js +320 -0
package/src/routing/complexity-analyzer.js +202 -2
package/src/routing/cost-optimizer.js +305 -0
package/src/routing/index.js +168 -159
package/src/routing/model-tiers.js +365 -0
package/src/server.js +4 -14
package/src/sessions/cleanup.js +3 -3
package/src/sessions/record.js +10 -1
package/src/sessions/store.js +7 -2
package/src/tools/agent-task.js +48 -1
package/src/tools/index.js +19 -2
package/src/tools/lazy-loader.js +7 -0
package/src/tools/tinyfish.js +358 -0
package/src/tools/truncate.js +1 -0
package/.github/FUNDING.yml +0 -15
package/.github/workflows/README.md +0 -215
package/.github/workflows/ci.yml +0 -69
package/.github/workflows/index.yml +0 -62
package/.github/workflows/web-tools-tests.yml +0 -56
package/CITATIONS.bib +0 -6
package/CLAWROUTER_ROUTING_PLAN.md +0 -910
package/DEPLOYMENT.md +0 -1001
package/LYNKR-TUI-PLAN.md +0 -984
package/PERFORMANCE-REPORT.md +0 -866
package/PLAN-per-client-model-routing.md +0 -252
package/ROUTER_COMPARISON.md +0 -173
package/TIER_ROUTING_PLAN.md +0 -771
package/docs/42642f749da6234f41b6b425c3bb07c9.txt +0 -1
package/docs/BingSiteAuth.xml +0 -4
package/docs/docs-style.css +0 -478
package/docs/docs.html +0 -197
package/docs/google5be250e608e6da39.html +0 -1
package/docs/index.html +0 -577
package/docs/index.md +0 -577
package/docs/robots.txt +0 -4
package/docs/sitemap.xml +0 -44
package/docs/style.css +0 -1223
package/documentation/README.md +0 -100
package/documentation/api.md +0 -806
package/documentation/claude-code-cli.md +0 -672
package/documentation/codex-cli.md +0 -397
package/documentation/contributing.md +0 -571
package/documentation/cursor-integration.md +0 -731
package/documentation/docker.md +0 -867
package/documentation/embeddings.md +0 -760
package/documentation/faq.md +0 -659
package/documentation/features.md +0 -396
package/documentation/headroom.md +0 -519
package/documentation/installation.md +0 -706
package/documentation/memory-system.md +0 -476
package/documentation/production.md +0 -601
package/documentation/providers.md +0 -906
package/documentation/testing.md +0 -629
package/documentation/token-optimization.md +0 -323
package/documentation/tools.md +0 -697
package/documentation/troubleshooting.md +0 -893
package/final-test.js +0 -33
package/headroom-sidecar/config.py +0 -93
package/headroom-sidecar/requirements.txt +0 -14
package/headroom-sidecar/server.py +0 -451
package/monitor-agents.sh +0 -31
package/scripts/audit-log-reader.js +0 -399
package/scripts/compact-dictionary.js +0 -204
package/scripts/test-deduplication.js +0 -448
package/src/db/database.sqlite +0 -0
package/test/README.md +0 -212
package/test/azure-openai-config.test.js +0 -204
package/test/azure-openai-error-resilience.test.js +0 -238
package/test/azure-openai-format-conversion.test.js +0 -354
package/test/azure-openai-integration.test.js +0 -281
package/test/azure-openai-routing.test.js +0 -177
package/test/azure-openai-streaming.test.js +0 -171
package/test/bedrock-integration.test.js +0 -471
package/test/comprehensive-test-suite.js +0 -928
package/test/config-validation.test.js +0 -207
package/test/cursor-integration.test.js +0 -484
package/test/format-conversion.test.js +0 -578
package/test/hybrid-routing-integration.test.js +0 -254
package/test/hybrid-routing-performance.test.js +0 -418
package/test/llamacpp-integration.test.js +0 -863
package/test/lmstudio-integration.test.js +0 -335
package/test/memory/extractor.test.js +0 -398
package/test/memory/retriever.test.js +0 -613
package/test/memory/retriever.test.js.bak +0 -585
package/test/memory/search.test.js +0 -537
package/test/memory/search.test.js.bak +0 -389
package/test/memory/store.test.js +0 -344
package/test/memory/store.test.js.bak +0 -312
package/test/memory/surprise.test.js +0 -300
package/test/memory-performance.test.js +0 -472
package/test/openai-integration.test.js +0 -686
package/test/openrouter-error-resilience.test.js +0 -418
package/test/passthrough-mode.test.js +0 -385
package/test/performance-benchmark.js +0 -351
package/test/performance-tests.js +0 -528
package/test/routing.test.js +0 -219
package/test/web-tools.test.js +0 -329
package/test-agents-simple.js +0 -43
package/test-cli-connection.sh +0 -33
package/test-learning-unit.js +0 -126
package/test-learning.js +0 -112
package/test-parallel-agents.sh +0 -124
package/test-parallel-direct.js +0 -155
package/test-subagents.sh +0 -117

package/documentation/headroom.md DELETED Viewed

@@ -1,519 +0,0 @@
-# Headroom Context Compression
-Headroom is an intelligent context compression system that reduces LLM token usage by 47-92% while preserving semantic meaning. It runs as a Python sidecar container that Lynkr manages automatically via Docker.
----
-## Overview
-### What is Headroom?
-Headroom is a context optimization SDK that compresses LLM prompts and tool outputs using:
-1. **Smart Crusher** - Statistical JSON compression based on field analysis
-2. **Cache Aligner** - Stabilizes dynamic content (UUIDs, timestamps) for provider cache hits
-3. **CCR (Compress-Cache-Retrieve)** - Reversible compression with on-demand retrieval
-4. **Rolling Window** - Token budget enforcement with turn-based windowing
-5. **LLMLingua** (optional) - ML-based 20x compression using BERT
-### Benefits
-| Metric | Without Headroom | With Headroom |
-|--------|-----------------|---------------|
-| Token usage | 100% | 8-53% (47-92% reduction) |
-| Cache hit rate | ~20% | ~60-80% |
-| Cost per request | $0.01-0.05 | $0.002-0.02 |
-| Context overflow | Common | Rare |
----
-## Quick Start
-### 1. Enable Headroom
-Add to your `.env`:
-```bash
-# Enable Headroom compression
-HEADROOM_ENABLED=true
-```
-### 2. Start Lynkr
-```bash
-npm start
-```
-Lynkr will automatically:
-1. Pull the `lynkr/headroom-sidecar:latest` Docker image
-2. Start the container with configured settings
-3. Wait for health checks to pass
-4. Begin compressing requests
-### 3. Verify It's Working
-Check the health endpoint:
-```bash
-curl http://localhost:8081/health/headroom
-```
-Expected response:
-```json
-{
-  "enabled": true,
-  "healthy": true,
-  "service": {
-    "available": true,
-    "ccrEnabled": true,
-    "llmlinguaEnabled": false
-  },
-  "docker": {
-    "running": true,
-    "status": "running",
-    "health": "healthy"
-  }
-}
-```
----
-## How It Works
-### Transform Pipeline
-When a request arrives, Headroom processes it through a three-stage pipeline:
-```
-Request → Cache Aligner → Smart Crusher → Context Manager → Compressed Request
-                ↓               ↓                ↓
-         Stabilize IDs    Compress JSON    Enforce budget
-```
-### 1. Cache Aligner
-**Problem**: Dynamic content like UUIDs and timestamps change every request, preventing provider cache hits.
-**Solution**: Replace dynamic values with stable placeholders:
-```json
-// Before
-{"id": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "created": "2024-01-15T10:30:00Z"}
-// After
-{"id": "[ID:1]", "created": "[TS:1]"}
-```
-**Result**: 60-80% cache hit rate instead of ~20%.
-### 2. Smart Crusher
-**Problem**: Tool outputs often contain repetitive JSON with many similar items.
-**Solution**: Statistical analysis to identify and compress redundant fields:
-```json
-// Before (100 search results, ~50KB)
-[
-  {"title": "Result 1", "url": "...", "snippet": "...", "score": 0.95, ...},
-  {"title": "Result 2", "url": "...", "snippet": "...", "score": 0.93, ...},
-  // ... 98 more items
-]
-// After (~5KB)
-{
-  "_meta": {"compressed": true, "original_count": 100, "kept": 12},
-  "items": [
-    // Top 12 most relevant items with essential fields only
-  ]
-}
-```
-**Compression strategies**:
-- **High-variance fields**: Keep (they're informative)
-- **Low-variance fields**: Remove (they're redundant)
-- **Unique fields**: Keep first occurrence only
-- **Repetitive arrays**: Sample representative items
-### 3. CCR (Compress-Cache-Retrieve)
-**Problem**: Sometimes you need to retrieve compressed content later.
-**Solution**: Hash-based reversible compression:
-```json
-// Compressed message
-{
-  "content": "[CCR:abc123] 100 files found. Use ccr_retrieve to explore.",
-  "ccr_available": true
-}
-// Tool definition injected
-{
-  "name": "ccr_retrieve",
-  "description": "Retrieve compressed content by hash",
-  "input_schema": {
-    "hash": "string",
-    "query": "string (optional search within results)"
-  }
-}
-```
-When the LLM calls `ccr_retrieve`, Headroom returns the full original content.
----
-## Configuration
-### Basic Settings
-```bash
-# Enable/disable Headroom
-HEADROOM_ENABLED=true
-# Sidecar endpoint
-HEADROOM_ENDPOINT=http://localhost:8787
-# Request timeout (ms)
-HEADROOM_TIMEOUT_MS=5000
-# Skip compression for small requests (tokens)
-HEADROOM_MIN_TOKENS=500
-# Mode: "audit" (observe) or "optimize" (apply)
-HEADROOM_MODE=optimize
-```
-### Docker Settings
-```bash
-# Enable automatic container management
-HEADROOM_DOCKER_ENABLED=true
-# Docker image
-HEADROOM_DOCKER_IMAGE=lynkr/headroom-sidecar:latest
-# Container name
-HEADROOM_DOCKER_CONTAINER_NAME=lynkr-headroom
-# Port mapping
-HEADROOM_DOCKER_PORT=8787
-# Resource limits
-HEADROOM_DOCKER_MEMORY_LIMIT=512m
-HEADROOM_DOCKER_CPU_LIMIT=1.0
-# Restart policy
-HEADROOM_DOCKER_RESTART_POLICY=unless-stopped
-```
-### Transform Settings
-```bash
-# Smart Crusher (statistical JSON compression)
-HEADROOM_SMART_CRUSHER=true
-HEADROOM_SMART_CRUSHER_MIN_TOKENS=200
-HEADROOM_SMART_CRUSHER_MAX_ITEMS=15
-# Tool Crusher (fixed-rules compression)
-HEADROOM_TOOL_CRUSHER=true
-# Cache Aligner (stabilize dynamic content)
-HEADROOM_CACHE_ALIGNER=true
-# Rolling Window (context overflow management)
-HEADROOM_ROLLING_WINDOW=true
-HEADROOM_KEEP_TURNS=3
-```
-### CCR Settings
-```bash
-# Enable CCR for reversible compression
-HEADROOM_CCR=true
-# Cache TTL in seconds
-HEADROOM_CCR_TTL=300
-```
-### LLMLingua Settings (Optional)
-LLMLingua provides ML-based compression using BERT token classification. Requires GPU for reasonable performance.
-```bash
-# Enable LLMLingua (default: false)
-HEADROOM_LLMLINGUA=true
-# Device: cuda, cpu, auto
-HEADROOM_LLMLINGUA_DEVICE=cuda
-```
-**Note**: LLMLingua adds 100-500ms latency per request. Only enable if you have a GPU and need maximum compression.
----
-## API Endpoints
-### Health Check
-```bash
-GET /health/headroom
-```
-Returns Headroom health status including container and service state.
-### Compression Metrics
-```bash
-GET /metrics/compression
-```
-Returns compression statistics:
-```json
-{
-  "enabled": true,
-  "endpoint": "http://localhost:8787",
-  "client": {
-    "totalCalls": 150,
-    "successfulCompressions": 120,
-    "skippedCompressions": 25,
-    "failures": 5,
-    "totalTokensSaved": 450000,
-    "averageLatencyMs": 45,
-    "compressionRate": 80,
-    "failureRate": 3
-  },
-  "server": {
-    "requests_total": 150,
-    "compressions_applied": 120,
-    "average_compression_ratio": 0.35,
-    "ccr_retrievals": 45
-  }
-}
-```
-### Detailed Status
-```bash
-GET /headroom/status
-```
-Returns full status including configuration, metrics, and recent logs.
-### Container Restart
-```bash
-POST /headroom/restart
-```
-Restarts the Headroom container (useful for applying config changes).
-### Container Logs
-```bash
-GET /headroom/logs?tail=100
-```
-Returns recent container logs for debugging.
----
-## Monitoring
-### Health Check Integration
-Headroom status is included in the `/health/ready` endpoint:
-```json
-{
-  "status": "ready",
-  "checks": {
-    "database": { "healthy": true },
-    "memory": { "healthy": true },
-    "headroom": {
-      "healthy": true,
-      "enabled": true,
-      "service": "available",
-      "docker": "running"
-    }
-  }
-}
-```
-**Note**: Headroom is non-critical. If it fails, Lynkr continues without compression.
-### Logging
-Headroom logs compression events:
-```
-INFO: Headroom compression applied
-  tokensBefore: 15000
-  tokensAfter: 5200
-  savingsPercent: 65.3
-  latencyMs: 42
-  transforms: ["cache_aligner", "smart_crusher"]
-```
----
-## Troubleshooting
-### Container Won't Start
-**Check Docker is running:**
-```bash
-docker ps
-```
-**Check for port conflicts:**
-```bash
-lsof -i :8787
-```
-**View container logs:**
-```bash
-curl http://localhost:8081/headroom/logs
-# or
-docker logs lynkr-headroom
-```
-### High Latency
-1. **Reduce transforms**: Disable LLMLingua if not needed
-2. **Increase resources**: Raise `HEADROOM_DOCKER_MEMORY_LIMIT`
-3. **Skip small requests**: Increase `HEADROOM_MIN_TOKENS`
-### Compression Not Applied
-Check:
-1. `HEADROOM_ENABLED=true` in `.env`
-2. Request has more than `HEADROOM_MIN_TOKENS` tokens
-3. Health endpoint shows `healthy: true`
-### CCR Retrieval Fails
-1. Check `HEADROOM_CCR=true`
-2. Verify TTL hasn't expired (`HEADROOM_CCR_TTL`)
-3. Ensure same session is used (CCR is session-scoped)
----
-## Architecture
-### System Diagram
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         Lynkr (Node.js)                         │
-│  ┌──────────────────────────────────────────────────────────┐  │
-│  │  Request Handler                                          │  │
-│  │    ↓                                                      │  │
-│  │  src/headroom/client.js ──HTTP──→ Headroom Sidecar       │  │
-│  │    ↓                              (Python Container)      │  │
-│  │  Compressed Request                    │                  │  │
-│  │    ↓                                   ↓                  │  │
-│  │  LLM Provider                    ┌─────────────┐         │  │
-│  │                                  │ Transforms  │         │  │
-│  └──────────────────────────────────│ - Aligner   │─────────┘  │
-│                                     │ - Crusher   │            │
-│                                     │ - CCR Store │            │
-│                                     │ - LLMLingua │            │
-│                                     └─────────────┘            │
-└─────────────────────────────────────────────────────────────────┘
-```
-### Request Flow
-1. **Request arrives** at Lynkr
-2. **Token estimation** - Skip if below `HEADROOM_MIN_TOKENS`
-3. **Send to sidecar** - HTTP POST to `/compress`
-4. **Transform pipeline** executes:
-   - Cache Aligner stabilizes dynamic content
-   - Smart Crusher compresses JSON structures
-   - Context Manager enforces token budget
-5. **Return compressed** messages and tools
-6. **Forward to LLM** provider
-7. **On CCR tool call** - Retrieve original content
-### File Structure
-```
-src/headroom/
-├── index.js        # HeadroomManager singleton, exports
-├── launcher.js     # Docker container lifecycle (dockerode)
-├── client.js       # HTTP client for sidecar API
-└── health.js       # Health check functionality
-```
----
-## Best Practices
-### 1. Start with Defaults
-The default configuration is optimized for most use cases:
-- Smart Crusher: Enabled
-- Cache Aligner: Enabled
-- CCR: Enabled
-- LLMLingua: Disabled (enable only with GPU)
-### 2. Monitor Compression Rates
-Check `/metrics/compression` regularly:
-- **Good**: 60-80% compression rate
-- **Warning**: Below 40% (check transform settings)
-- **Issue**: High failure rate (check container health)
-### 3. Tune for Your Workload
-| Workload | Recommended Settings |
-|----------|---------------------|
-| Code assistance | `SMART_CRUSHER_MAX_ITEMS=20` |
-| Search-heavy | `SMART_CRUSHER_MAX_ITEMS=10`, CCR enabled |
-| Long conversations | `ROLLING_WINDOW=true`, `KEEP_TURNS=5` |
-| Cost-sensitive | Enable LLMLingua with GPU |
-### 4. Use Audit Mode First
-Test compression without applying it:
-```bash
-HEADROOM_MODE=audit
-```
-This logs what would be compressed without modifying requests.
----
-## FAQ
-### Does Headroom affect response quality?
-Minimal impact. Smart Crusher preserves high-variance (informative) fields and CCR allows full retrieval when needed. LLMLingua may have ~1.5% quality reduction.
-### Can I use Headroom without Docker?
-Yes. Disable Docker management and run the sidecar manually:
-```bash
-HEADROOM_DOCKER_ENABLED=false
-HEADROOM_ENDPOINT=http://your-headroom-server:8787
-```
-### Is Headroom required?
-No. If Headroom fails or is disabled, Lynkr works normally without compression.
-### What providers benefit most?
-All providers benefit from compression. Anthropic and OpenAI see additional benefits from Cache Aligner improving cache hit rates.
----
-## References
-- [Headroom GitHub Repository](https://github.com/chopratejas/headroom)
-- [LLMLingua Paper](https://arxiv.org/abs/2310.05736)
-- [Anthropic Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)