squeezr-ai 1.46.3 → 1.80.6

Files changed (119)
  1. package/README.md +189 -315
  2. package/bin/squeezr.js +2535 -2251
  3. package/dist/__tests__/aiRateLimit.test.d.ts +1 -0
  4. package/dist/__tests__/aiRateLimit.test.js +20 -0
  5. package/dist/__tests__/attachmentDedup.test.d.ts +1 -0
  6. package/dist/__tests__/attachmentDedup.test.js +89 -0
  7. package/dist/__tests__/compressibilityProbe.test.d.ts +1 -0
  8. package/dist/__tests__/compressibilityProbe.test.js +45 -0
  9. package/dist/__tests__/compressionGuard.test.d.ts +1 -0
  10. package/dist/__tests__/compressionGuard.test.js +57 -0
  11. package/dist/__tests__/compressor.test.js +104 -51
  12. package/dist/__tests__/diffRead.test.d.ts +1 -0
  13. package/dist/__tests__/diffRead.test.js +83 -0
  14. package/dist/__tests__/glossaryStore.test.d.ts +1 -0
  15. package/dist/__tests__/glossaryStore.test.js +37 -0
  16. package/dist/__tests__/glossarySub.test.d.ts +1 -0
  17. package/dist/__tests__/glossarySub.test.js +162 -0
  18. package/dist/__tests__/imageDedup.test.d.ts +1 -0
  19. package/dist/__tests__/imageDedup.test.js +80 -0
  20. package/dist/__tests__/largeBlock.test.d.ts +1 -0
  21. package/dist/__tests__/largeBlock.test.js +35 -0
  22. package/dist/__tests__/mcpFilter.test.d.ts +1 -0
  23. package/dist/__tests__/mcpFilter.test.js +87 -0
  24. package/dist/__tests__/newFeatures.test.d.ts +1 -0
  25. package/dist/__tests__/newFeatures.test.js +124 -0
  26. package/dist/__tests__/qualityHarness.test.d.ts +1 -0
  27. package/dist/__tests__/qualityHarness.test.js +98 -0
  28. package/dist/__tests__/rateLimitHeaders.test.js +6 -0
  29. package/dist/__tests__/requestCapture.test.d.ts +1 -0
  30. package/dist/__tests__/requestCapture.test.js +37 -0
  31. package/dist/__tests__/skillDedup.test.d.ts +1 -0
  32. package/dist/__tests__/skillDedup.test.js +57 -0
  33. package/dist/__tests__/staleTurns.test.d.ts +1 -0
  34. package/dist/__tests__/staleTurns.test.js +113 -0
  35. package/dist/__tests__/structuredGuard.test.d.ts +1 -0
  36. package/dist/__tests__/structuredGuard.test.js +72 -0
  37. package/dist/__tests__/toolDescComp.test.d.ts +1 -0
  38. package/dist/__tests__/toolDescComp.test.js +157 -0
  39. package/dist/__tests__/toolResultDedup.test.d.ts +1 -0
  40. package/dist/__tests__/toolResultDedup.test.js +40 -0
  41. package/dist/aiRateLimit.d.ts +19 -0
  42. package/dist/aiRateLimit.js +35 -0
  43. package/dist/aiToggle.d.ts +14 -0
  44. package/dist/aiToggle.js +53 -0
  45. package/dist/attachmentCompress.d.ts +9 -0
  46. package/dist/attachmentCompress.js +211 -0
  47. package/dist/attachmentDedup.d.ts +9 -0
  48. package/dist/attachmentDedup.js +89 -0
  49. package/dist/bypass.d.ts +6 -3
  50. package/dist/bypass.js +37 -5
  51. package/dist/cache.d.ts +3 -0
  52. package/dist/cache.js +10 -0
  53. package/dist/circuitBreaker.d.ts +4 -2
  54. package/dist/circuitBreaker.js +6 -3
  55. package/dist/compressibilityProbe.d.ts +8 -0
  56. package/dist/compressibilityProbe.js +47 -0
  57. package/dist/compressionGuard.d.ts +31 -0
  58. package/dist/compressionGuard.js +101 -0
  59. package/dist/compressor.d.ts +51 -1
  60. package/dist/compressor.js +599 -73
  61. package/dist/config.d.ts +21 -1
  62. package/dist/config.js +58 -2
  63. package/dist/dashboard.d.ts +1 -1
  64. package/dist/dashboard.js +621 -116
  65. package/dist/diffRead.d.ts +9 -0
  66. package/dist/diffRead.js +149 -0
  67. package/dist/expand.d.ts +2 -0
  68. package/dist/expand.js +6 -0
  69. package/dist/glossaryStore.d.ts +28 -0
  70. package/dist/glossaryStore.js +131 -0
  71. package/dist/glossarySub.d.ts +38 -0
  72. package/dist/glossarySub.js +123 -0
  73. package/dist/history.d.ts +35 -1
  74. package/dist/history.js +31 -5
  75. package/dist/identGlossary.d.ts +20 -0
  76. package/dist/identGlossary.js +215 -0
  77. package/dist/imageDedup.d.ts +12 -0
  78. package/dist/imageDedup.js +98 -0
  79. package/dist/index.js +7 -0
  80. package/dist/limits.d.ts +5 -2
  81. package/dist/limits.js +47 -4
  82. package/dist/logFeed.d.ts +10 -0
  83. package/dist/logFeed.js +42 -0
  84. package/dist/mcpFilter.d.ts +43 -0
  85. package/dist/mcpFilter.js +89 -0
  86. package/dist/mcpToolFilter.d.ts +32 -0
  87. package/dist/mcpToolFilter.js +140 -0
  88. package/dist/probePort.js +5 -1
  89. package/dist/promptCache.d.ts +44 -0
  90. package/dist/promptCache.js +121 -0
  91. package/dist/qualityGovernor.d.ts +11 -0
  92. package/dist/qualityGovernor.js +69 -0
  93. package/dist/requestCapture.d.ts +21 -0
  94. package/dist/requestCapture.js +79 -0
  95. package/dist/semanticRead.d.ts +9 -0
  96. package/dist/semanticRead.js +188 -0
  97. package/dist/server.js +447 -46
  98. package/dist/sessionCache.js +9 -2
  99. package/dist/skillDedup.d.ts +5 -0
  100. package/dist/skillDedup.js +89 -0
  101. package/dist/staleTurnSummary.d.ts +9 -0
  102. package/dist/staleTurnSummary.js +110 -0
  103. package/dist/staleTurns.d.ts +14 -0
  104. package/dist/staleTurns.js +80 -0
  105. package/dist/stats.d.ts +16 -3
  106. package/dist/stats.js +157 -21
  107. package/dist/stockToolDescs.d.ts +12 -0
  108. package/dist/stockToolDescs.js +69 -0
  109. package/dist/structuredGuard.d.ts +25 -0
  110. package/dist/structuredGuard.js +116 -0
  111. package/dist/systemPrompt.js +6 -2
  112. package/dist/systemSectioning.d.ts +21 -0
  113. package/dist/systemSectioning.js +111 -0
  114. package/dist/toolDescComp.d.ts +30 -0
  115. package/dist/toolDescComp.js +81 -0
  116. package/dist/toolResultDedup.d.ts +9 -0
  117. package/dist/toolResultDedup.js +88 -0
  118. package/package.json +69 -66
  119. package/squeezr.toml +18 -1
package/README.md CHANGED
@@ -1,315 +1,189 @@
1
- # Squeezr
2
-
3
- **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session. Includes a real-time web dashboard and MCP integration.
4
-
5
- [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE)
6
-
7
- ## Supported CLIs & apps
8
-
9
- | Client | Protocol | Proxy method |
10
- |--------|----------|--------------|
11
- | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
- | **Claude Desktop** | **HTTP to Anthropic API** | **Windows: `setx ANTHROPIC_BASE_URL` (set by `squeezr setup`); macOS: `launchctl setenv`; Linux: `~/.config/environment.d/`** |
13
- | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
- | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
15
- | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
16
- | Ollama | HTTP (local) | Transparent via dummy API key detection |
17
- | **Codex Desktop** | **HTTP to OpenAI API** | **`~/.codex/config.toml` → `openai_base_url` (set by `squeezr setup`)** |
18
- | **Codex CLI** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
19
- | Cursor IDE | HTTP to OpenAI API (BYOK mode only) | `localhost:8080` directly (CORS supported) or `squeezr tunnel` if calls come from Cursor's servers |
20
- | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
21
-
22
- Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
23
-
24
- ## Quick start
25
-
26
- ```bash
27
- npm install -g squeezr-ai
28
- squeezr setup # configures env vars, auto-start, CA trust, and MCP server
29
- squeezr start
30
- ```
31
-
32
- `squeezr setup` handles everything automatically:
33
- - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `NODE_EXTRA_CA_CERTS`
34
- - Installs a shell wrapper (PowerShell on Windows, bash/zsh on Linux/macOS/WSL) that auto-refreshes env vars after `squeezr start/setup/update`, so there is no need to restart the terminal
35
- - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
36
- - Registers the MCP server in Claude Code, Cursor, Windsurf, and Cline
37
- - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
38
- - **macOS/Linux/WSL:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `NODE_EXTRA_CA_CERTS`
39
-
40
- ## How it works
41
-
42
- Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
43
-
44
- ### Layer 1: System prompt compression
45
-
46
- The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
47
-
48
- ### Layer 2: Deterministic preprocessing
49
-
50
- Zero-latency, rule-based transformations applied to every tool result:
51
-
52
- - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
53
- - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
54
- - **Minification:** JSON whitespace, collapsed blank lines
55
-
56
- ### Layer 3: Tool-specific patterns (~30 rules)
57
-
58
- Each tool result is matched against specialized compression rules:
59
-
60
- | Category | Tools | What it does |
61
- |----------|-------|--------------|
62
- | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
63
- | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
64
- | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
65
- | Build | next build, cargo build | Errors only |
66
- | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
67
- | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
68
- | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
69
-
70
- ### Exclusive patterns
71
-
72
- Applied to specific content types regardless of tool:
73
-
74
- - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
75
- - **Large code files** (>500 lines) → imports + function/class signatures only
76
- - **Long output** (>200 lines) → head + tail + omission note
77
- - **Grep results** → grouped by file, matches capped
78
- - **Glob results** (>30 files) → directory tree summary
79
- - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
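The "long output" rule in the list above can be pictured with a small sketch. This is an illustration only, not Squeezr's code: the 200-line trigger comes from the README, while the function name `headTail`, the 50-line head/tail size, and the wording of the omission note are assumptions.

```typescript
// Hypothetical sketch of a head + tail + omission-note rule.
// maxLines = 200 follows the README; keep = 50 is an assumed value.
function headTail(output: string, maxLines: number = 200, keep: number = 50): string {
  const lines = output.split("\n");
  // Short outputs pass through untouched.
  if (lines.length <= maxLines) return output;
  const omitted = lines.length - 2 * keep;
  return [
    ...lines.slice(0, keep),               // first `keep` lines
    `[... ${omitted} lines omitted ...]`,  // assumed note format
    ...lines.slice(-keep),                 // last `keep` lines
  ].join("\n");
}
```

Because the result depends only on the input text, a rule of this shape produces the same bytes for the same tool result on every request.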
80
-
81
- ### Adaptive pressure
82
-
83
- Compression aggressiveness scales with context window usage:
84
-
85
- | Context usage | Threshold | Behavior |
86
- |---------------|-----------|----------|
87
- | < 50% | 1,500 chars | Light — only compress large results |
88
- | 50–75% | 800 chars | Normal — standard compression |
89
- | 75–90% | 400 chars | Aggressive: compress most results |
90
- | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
91
-
92
- ### Session optimizations
93
-
94
- - **Session cache:** Each compressed tool result is stored by MD5 hash. If the same content appears again in a later request, the cached version is reused instantly with no API call
95
- - **KV cache warming:** IDs are deterministic (MD5) so the compressed string is byte-for-byte identical across requests — Anthropic's KV cache stays warm and historical tokens cost zero compute
96
- - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
97
- - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
98
-
99
- ## Web dashboard
100
-
101
- Live dashboard at `http://localhost:PORT/squeezr/dashboard` with 5 pages:
102
-
103
- | Page | What it shows |
104
- |------|---------------|
105
- | **Overview** | Tokens saved, compression %, requests, cost saved, per-tool breakdown, sparkline chart, context pressure bars, active project badge, savings breakdown (deterministic, AI, dedup, system prompt, overhead) |
106
- | **Projects** | Per-project aggregate stats across all sessions, auto-detected from working directory or set manually via MCP |
107
- | **History** | Past proxy sessions grouped by project and day — start/end time, duration, request count, tokens saved, relative timestamps |
108
- | **Limits** | Real-time rate limit gauges per CLI: Anthropic token/request limits, OpenAI billing & credit balance, Gemini 429 tracking, input/output token usage (session + daily), personal monthly budget bar |
109
- | **Settings** | Compression mode selector (Soft/Normal/Aggressive/Critical), threshold tuning |
110
-
111
- Updates every 2 seconds via SSE. Works with both API key and subscription (OAuth) authentication.
112
-
113
- ## MCP server
114
-
115
- Built-in MCP server (`squeezr-mcp`) that gives any MCP-capable AI CLI real-time awareness and control of Squeezr.
116
-
117
- **Installed automatically** by `squeezr setup` into Claude Code, Cursor, Windsurf, and Cline.
118
-
119
- | Tool | Description |
120
- |------|-------------|
121
- | `squeezr_status` | Is proxy running? Version, port, uptime, mode, circuit breaker state, bypass status |
122
- | `squeezr_stats` | Token savings, compression %, cost saved, savings breakdown, per-tool breakdown, latency (p50/p95/p99), expand rate |
123
- | `squeezr_set_mode` | Change compression mode instantly (soft / normal / aggressive / critical) |
124
- | `squeezr_config` | Current thresholds, keepRecent, cache sizes, AI-skipped tools |
125
- | `squeezr_habits` | Detect wasteful patterns this session (duplicate reads, high Bash count, cache efficiency) |
126
- | `squeezr_stop` | Stop the proxy gracefully (persists caches before exit) |
127
- | `squeezr_check_updates` | Check npm for newer Squeezr version |
128
- | `squeezr_update` | Update to latest version via `npm install -g squeezr-ai@latest` |
129
- | `squeezr_set_project` | Manually set/clear the current project name (overrides auto-detection) |
130
- | `squeezr_bypass` | Toggle bypass mode: disable compression instantly without restart (runtime-only) |
131
-
132
- Every MCP tool response automatically checks for updates and appends a notification banner when a new version is available.
133
-
134
- ## Honest savings tracking
135
-
136
- Squeezr tracks token savings with full transparency. `squeezr gain` and the dashboard break down savings by source:
137
-
138
- | Source | Description |
139
- |--------|-------------|
140
- | Deterministic | Rule-based preprocessing (ANSI strip, dedup, minification) — free, zero latency |
141
- | AI compression | Haiku/GPT-mini summarization of tool results — near-free, slight latency |
142
- | Read dedup | Cross-turn deduplication of repeated file reads |
143
- | System prompt | One-time AI compression of the system prompt, cached across requests |
144
- | Tag overhead | Bytes added by `[squeezr:ID]` markers (subtracted from savings) |
145
- | AI cost | Estimated token cost of compression API calls (subtracted from NET) |
146
-
147
- **NET savings** = total savings − tag overhead − AI compression cost.
148
-
149
- ### `squeezr gain` subcommands
150
-
151
- ```bash
152
- squeezr gain # all-time savings summary
153
- squeezr gain --session # live session savings from the running proxy
154
- squeezr gain --details # all-time stats with per-tool breakdown
155
- squeezr gain --reset # reset all-time counters
156
- ```
157
-
158
- ## Project tracking
159
-
160
- Squeezr automatically detects the active project from the CLI's working directory (e.g. Claude Code's `<cwd>` tag in the system prompt). Per-project stats are tracked across sessions.
161
-
162
- - **Auto-detection:** extracts the project name from the last meaningful path segment
163
- - **Manual override:** `squeezr_set_project` MCP tool or `POST /squeezr/project` REST endpoint
164
- - **Per-project stats:** visible on the Dashboard's Projects page and in `squeezr gain --session`
165
-
166
- ## Codex support (MITM proxy)
167
-
168
- Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
169
-
170
- The MITM proxy **only intercepts `chatgpt.com`** traffic. All other HTTPS requests (npm, git, curl, etc.) pass through as a transparent TCP tunnel — no certificate needed, no interference.
171
-
172
- ## Configuration
173
-
174
- ### Global config: `squeezr.toml` (next to the binary)
175
-
176
- ```toml
177
- # Compression thresholds
178
- threshold = 800 # min chars to apply compression
179
- keep_recent = 3 # skip the N most recent tool results
180
- ai_compression = false # enable AI (Haiku) for tool result compression
181
-
182
- # Ports
183
- port = 8080 # HTTP proxy port
184
- mitm_port = 8081 # MITM proxy port (Codex)
185
-
186
- # Models
187
- local_model = "qwen2.5-coder:1.5b" # model for local compression
188
- local_upstream = "http://localhost:11434"
189
-
190
- # Tools to never AI-compress (deterministic-only)
191
- ai_skip_tools = ["Read", "View"]
192
-
193
- # Compression modes override thresholds
194
- [modes.soft]
195
- threshold = 1500
196
- keep_recent = 10
197
- ai_compression = false
198
-
199
- [modes.normal]
200
- threshold = 800
201
- keep_recent = 3
202
-
203
- [modes.aggressive]
204
- threshold = 200
205
- keep_recent = 1
206
- ai_compression = true
207
-
208
- [modes.critical]
209
- threshold = 50
210
- keep_recent = 0
211
- ai_compression = true
212
- ```
213
-
214
- ### Project-level config: `squeezr.project.toml` (in project root)
215
-
216
- Project-level config is deep-merged over global config. Useful for per-repo tuning.
217
-
218
- ### Environment variables
219
-
220
- | Variable | Default | Description |
221
- |----------|---------|-------------|
222
- | `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
223
- | `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
224
- | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
225
- | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
226
- | `SQUEEZR_DISABLED` | `false` | Disable all compression |
227
- | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
228
- | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
229
- | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
230
-
231
- ### Per-command skip
232
-
233
- Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
234
-
235
- ## CLI commands
236
-
237
- ```bash
238
- squeezr setup # configure env vars, auto-start, CA trust, install MCP server
239
- squeezr start # start the proxy (auto-restarts if version mismatch after update)
240
- squeezr update # kill old processes, install latest from npm, restart
241
- squeezr stop # stop the proxy
242
- squeezr status # check if proxy is running
243
- squeezr logs # show last 50 log lines
244
- squeezr config # print current config
245
- squeezr ports # change HTTP and MITM proxy ports
246
- squeezr gain # all-time token savings summary
247
- squeezr gain --session # live session savings from the running proxy
248
- squeezr gain --details # all-time stats with per-tool breakdown
249
- squeezr gain --reset # reset all-time counters
250
- squeezr discover # detect which AI CLIs are installed
251
- squeezr bypass # toggle bypass mode (skip compression, keep logging)
252
- squeezr bypass --on # enable bypass (disable compression)
253
- squeezr bypass --off # disable bypass (resume compression)
254
- squeezr tunnel # expose proxy via Cloudflare Tunnel (for Cursor IDE)
255
- squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
256
- squeezr mcp uninstall # remove MCP server registration
257
- squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
258
- squeezr version # print version
259
- ```
260
-
261
- ## Resilience
262
-
263
- Squeezr sits in the critical path between your AI CLI and the upstream API. It's designed to never break your workflow:
264
-
265
- - **Circuit breaker** — If the AI compression backend (Haiku, GPT-4o-mini, etc.) fails 3 times in a row, Squeezr automatically skips AI compression for 60 seconds, then probes recovery. Deterministic compression continues working. Visible in dashboard, `squeezr status`, and MCP.
266
- - **5-second AI timeout** — Each AI compression call has a hard 5s timeout. If the backend is slow, the original content passes through unmodified.
267
- - **Bypass mode** — `squeezr bypass` instantly disables all compression without restarting. Requests still pass through and are logged. Toggle via CLI, MCP, dashboard, or REST API.
268
- - **Expand rate tracking** — Monitors how often the model calls `squeezr_expand` to recover compressed content. High expand rate signals the compression is too aggressive.
269
- - **Latency tracking** — p50/p95/p99 compression latency visible in dashboard and MCP stats.
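The circuit-breaker behaviour described in the list above (3 consecutive failures, a 60-second pause, then a recovery probe) can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the contents of `circuitBreaker.js`: `createBreaker` and its method names are hypothetical.

```typescript
// Hypothetical circuit breaker: opens after `maxFailures` consecutive failures,
// blocks AI calls for `cooldownMs`, then lets a probe call through.
function createBreaker(maxFailures: number = 3, cooldownMs: number = 60 * 1000) {
  let failures = 0;
  let openedAt: number | null = null;
  return {
    // True when an AI call may be attempted: breaker closed, or cooldown elapsed.
    allow(now: number = Date.now()): boolean {
      return openedAt === null || now - openedAt >= cooldownMs;
    },
    // A successful call (including a probe) closes the breaker.
    recordSuccess(): void {
      failures = 0;
      openedAt = null;
    },
    // A failed call counts toward the limit; a failed probe reopens the breaker.
    recordFailure(now: number = Date.now()): void {
      failures += 1;
      if (failures >= maxFailures) openedAt = now;
    },
  };
}
```

While `allow()` returns false, a caller would skip the AI pass and keep only the deterministic rules, which matches the fallback the README describes.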
270
-
271
- ## Compression backends
272
-
273
- Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
274
-
275
- | Backend | Model | Used for | Cost |
276
- |---------|-------|----------|------|
277
- | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
278
- | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
279
- | Gemini | Flash-8B | Fallback compression | Free |
280
- | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
281
- | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
282
-
283
- ## Requirements
284
-
285
- - Node.js 18+ (compatible with Node.js 24)
286
- - For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
287
- - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
288
-
289
- ## Troubleshooting
290
-
291
- ### Claude Code throws `undefined is not an object (evaluating '$.speed')`
292
-
293
- Symptom: every prompt in Claude Code immediately errors with `undefined is not an object (evaluating '$.speed')` (or similar `$.X` parse errors). This means Claude Code is sending its API requests to **something that is not Squeezr** but happens to occupy Squeezr's port — typically a Docker container (Apache, nginx, WordPress) bound to `8080`.
294
-
295
- To diagnose, run:
296
-
297
- ```bash
298
- squeezr status
299
- ```
300
-
301
- If the output says `a foreign service is` listening on the port, you have three options:
302
-
303
- 1. **Move Squeezr to a different port** (recommended): `squeezr ports` and pick something free, then reopen your terminal.
304
- 2. **Stop the offending service**: `docker ps` to find what owns 8080, then `docker stop <id>`.
305
- 3. **Inspect runtime info**: `cat ~/.squeezr/runtime.json` shows the *actual* port Squeezr is bound to. If it differs from your `ANTHROPIC_BASE_URL`, run `squeezr setup` to refresh your shell profile.
306
-
307
- Squeezr v1.24.0+ runs a self-test on every startup that detects this exact failure mode and prints actionable hints. You can re-run it any time with:
308
-
309
- ```bash
310
- curl -s "http://localhost:$(jq -r .port ~/.squeezr/runtime.json)/squeezr/selftest?run=1" | jq
311
- ```
312
-
313
- ## License
314
-
315
- MIT
1
+ # Squeezr
2
+
3
+ **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly — **without ever breaking Anthropic's prompt cache** — and saves thousands of tokens per session. Real-time dashboard, MCP integration, and an optional AI compression layer powered by **Zest**, Squeezr's own model.
4
+
5
+ [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE)
6
+
7
+ ## Supported CLIs & apps
8
+
9
+ | Client | Protocol | Proxy method |
10
+ |--------|----------|--------------|
11
+ | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
+ | Claude Desktop | HTTP to Anthropic API | Windows: `setx` (via `squeezr setup`); macOS: `launchctl setenv`; Linux: `environment.d` |
13
+ | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
+ | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
15
+ | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
16
+ | Ollama | HTTP (local) | Transparent via dummy API key detection |
17
+ | Codex Desktop | HTTP to OpenAI API | `~/.codex/config.toml` → `openai_base_url` (via `squeezr setup`) |
18
+ | Codex CLI | WebSocket to chatgpt.com | TLS-terminating MITM proxy on :8081 |
19
+ | Cursor IDE | HTTP to OpenAI API (BYOK only) | `localhost:8080` directly (CORS) or `squeezr tunnel` |
20
+ | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
21
+
22
+ Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
23
+
24
+ ## Quick start
25
+
26
+ ```bash
27
+ npm install -g squeezr-ai
28
+ squeezr setup # env vars, auto-start, CA trust, MCP server — all automatic
29
+ squeezr start
30
+ ```
31
+
32
+ ## Prompt-cache safety (the core design principle)
33
+
34
+ Anthropic bills cached context at **0.1x** — but only if the request prefix arrives **byte-for-byte identical** between requests. A proxy that mutates history differently on each request silently invalidates that cache and re-bills the full context at full price, costing far more than compression saves.
35
+
36
+ Squeezr is built around this constraint:
37
+
38
+ - **Deterministic compression is byte-stable**: same input always produces the same output (fixed-pressure rules), so the cached prefix never changes → Anthropic's cache keeps hitting.
39
+ - **Unstable passes never touch the cached prefix** — AI compression, cross-turn dedup, diff-reads and stale-turn summaries only operate past the last `cache_control` marker (or freely for clients that don't use caching).
40
+ - **Live cache health monitoring** — the dashboard's *Prompt Cache* card shows `cache_read` vs `cache_creation` tokens and a Hit Health %. Green (≥80%) means you're paying the minimum possible; red means something is invalidating your prefix.
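The Hit Health figure mentioned in the last bullet could be derived roughly as follows. This is a sketch under an assumption: the README does not state the formula, so the ratio below (cache reads over reads plus creations) is a guess. `cache_read_input_tokens` and `cache_creation_input_tokens` are the usage fields returned by Anthropic's Messages API; `hitHealth` and `isGreen` are hypothetical names.

```typescript
// Subset of the `usage` object returned with an Anthropic Messages API response.
interface CacheUsage {
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Assumed formula: share of cached-prefix tokens that were read (cheap)
// rather than re-created (full price), as a percentage.
function hitHealth(samples: CacheUsage[]): number {
  let read = 0;
  let created = 0;
  for (const s of samples) {
    read += s.cache_read_input_tokens;
    created += s.cache_creation_input_tokens;
  }
  const total = read + created;
  return total === 0 ? 0 : (read * 100) / total;
}

// The README marks 80% and above as green.
function isGreen(percent: number): boolean {
  return percent >= 80;
}
```

A falling percentage across consecutive requests would suggest that something upstream of the last `cache_control` marker is changing between requests.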
41
+
42
+ Full write-up: [docs/PROMPT_CACHE.md](docs/PROMPT_CACHE.md)
43
+
44
+ ## How it works
45
+
46
+ Every request passes through Squeezr on `localhost:8080`. Compression layers, in order:
47
+
48
+ 1. **MCP tool filtering** (opt-in) — drop tool definitions from MCP servers you block (`mcp_block_servers`) or that aren't allow-listed. A single chatty MCP server can cost ~18k tokens *per request*. Servers used in the conversation are never filtered.
49
+ 2. **Tool description compression** — tool descriptions are truncated to their first paragraph (Bash: 10,441 → 53 chars), with the full spec stored in the expand store. The model retrieves it on demand via `squeezr_expand`. Saves ~17k tokens/request on a stock Claude Code session.
50
+ 3. **System prompt compression**: skill/plugin duplicate blocks are deduplicated; optional AI summarization (gated behind `ai_compression`).
51
+ 4. **Deterministic preprocessing** — zero-latency regex rules on every tool result: ANSI/progress-bar/timestamp stripping, line dedup, JSON minification, plus ~30 tool-specific patterns (git, vitest/jest, tsc, eslint, cargo, pytest, docker, kubectl, gh…). Byte-stable → cache-safe.
52
+ 5. **Cross-turn dedup & diff-reads**: repeated tool outputs collapse to references; repeated file reads become diffs against the latest read. (Only past the cache barrier.)
53
+ 6. **Stale-turn summarization**: conversations >40 turns get old assistant prose collapsed to keyword summaries. (Only for clients without prompt caching.)
54
+ 7. **AI compression** (opt-in, off by default) — blocks ≥1500 chars summarized by a small model. Measured on real data: 75–91% compression on large blocks. Backends: **Zest (local, free, deterministic)**, Haiku, GPT-4o-mini, Gemini Flash. Guarded by a rate limiter (20 calls/5 min), a persistent on/off toggle, and the cache barrier.
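Step 2 of the list above (first-paragraph truncation with recovery through the expand store) can be illustrated with a short sketch. It is not the code in `toolDescComp.js`: the `expandStore` map, the `compressDescription` name, the id scheme, and the exact marker text are assumptions made for the example.

```typescript
// Hypothetical in-memory expand store: id -> full original text.
const expandStore = new Map<string, string>();

// Keep only the first paragraph of a tool description and leave a marker
// that tells the model how to fetch the full text.
function compressDescription(id: string, description: string): string {
  const firstParagraph = description.split(/\n\s*\n/)[0]?.trim() ?? "";
  // Single-paragraph descriptions are already minimal: leave them untouched.
  if (firstParagraph.length === description.trim().length) return description;
  expandStore.set(id, description);
  return `${firstParagraph} [squeezr_expand(${id})]`; // assumed marker format
}
```

The same input always yields the same output here, which is the property the README relies on to keep the cached prefix stable.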
55
+
56
+ ### Recovery: nothing is ever lost
57
+
58
+ Every compressed block embeds a `squeezr_expand(id)` reference. A `squeezr_expand` tool is injected into each request — if the model needs the original content, it retrieves it in one call. The dashboard tracks expand rate as the compression-quality metric (0 = nothing important was lost).
59
+
60
+ ### Adaptive pressure
61
+
62
+ Compression aggressiveness scales with context usage: <50% → light (1500-char threshold), 50–75% → normal (800), 75–90% → aggressive (400), >90% → critical (150).
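The bands above map directly onto a lookup function. The sketch below uses the thresholds from the README. The name `pressureFor` is hypothetical, and so is the handling of exact boundaries: 50% is treated as normal and 75% as aggressive, while 90% stays aggressive because the README reserves critical for values above 90%.

```typescript
type PressureLevel = "light" | "normal" | "aggressive" | "critical";

interface Pressure {
  level: PressureLevel;
  thresholdChars: number; // minimum tool-result size that gets compressed
}

// `usage` is the fraction of the context window in use, from 0 to 1.
function pressureFor(usage: number): Pressure {
  if (usage < 0.5) return { level: "light", thresholdChars: 1500 };
  if (usage < 0.75) return { level: "normal", thresholdChars: 800 };
  if (usage <= 0.9) return { level: "aggressive", thresholdChars: 400 };
  return { level: "critical", thresholdChars: 150 };
}
```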
63
+
64
+ ## Web dashboard
65
+
66
+ `http://localhost:8080/squeezr/dashboard` (3 pages, SSE-updated):
67
+
68
+ | Page | What it shows |
69
+ |------|---------------|
70
+ | **Overview** | All-time tokens saved (single source of truth), ratio + per-request average, cost saved, Top Tools (real per-tool block counts), Session Cache (AI layer), AI Compression card (calls / saved / spent / net), **Prompt Cache health** (read vs creation + hit %), Savings by type (per-technique breakdown), by model (incl. what compression backends spend), by client, compression mode + **Bypass / AI Compression toggles** |
71
+ | **Savings** | Day / Week / Month / All-time filters with period navigation — per-period tokens, cost, sessions, charts, By Model / By Client / Top Tools / AI Compression / Session Cache, all persisted across restarts |
72
+ | **Settings** | Client base-URL reference, ports, version/uptime, bypass & circuit breaker state, **AI Compression on/off**, **Restart / Stop buttons**, update check |
73
+
74
+ ## Safety & resilience
75
+
76
+ Squeezr sits in the critical path. It is designed to never break your workflow — and never burn your plan:
77
+
78
+ - **Bypass mode (persisted)**: one click/command disables all compression; survives restarts. The emergency stop.
79
+ - **AI compression master switch (persisted, default OFF)**: with a subscription OAuth token, AI compression calls bill against *your own plan*; only enable it with a separately billed API key or the free local Zest backend.
80
+ - **AI rate limiter** — hard cap of 20 AI calls per 5-minute sliding window, process-global.
81
+ - **AI minimum block size (1500 chars)** — measured on real data: small blocks *expand* under AI compression; Squeezr never AI-compresses them.
82
+ - **Cache barrier** — unstable passes can't touch the cached prefix (see prompt-cache safety above).
83
+ - **Circuit breaker**: 3 consecutive AI backend failures → AI compression disabled for 60s, deterministic continues.
84
+ - **Atomic persistence** — stats, history, caches and toggles are written atomically (tmp + rename); a crash can't corrupt them.
85
+ - **Self-test on startup**: detects port squatting (the classic `$.speed` Claude Code error), env-var drift, and pipeline issues.
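The AI rate limiter in the list above (20 calls per 5-minute sliding window) can be sketched with a queue of timestamps. This is an illustration under assumptions, not the contents of `aiRateLimit.js`: `createLimiter` and `tryAcquire` are hypothetical names, and only the limit and window size come from the README.

```typescript
// Hypothetical sliding-window limiter: at most `maxCalls` within any `windowMs` span.
function createLimiter(maxCalls: number = 20, windowMs: number = 5 * 60 * 1000) {
  const stamps: number[] = []; // timestamps of accepted calls, oldest first
  return function tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - windowMs;
    // Drop timestamps that have slid out of the window.
    while (stamps.length > 0) {
      const oldest = stamps[0];
      if (oldest === undefined || oldest > cutoff) break;
      stamps.shift();
    }
    if (stamps.length >= maxCalls) return false; // over the cap: skip the AI call
    stamps.push(now);
    return true;
  };
}
```

Creating a single limiter at module scope and sharing it between all request handlers would be one way to make the cap process-global, as the README puts it.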
86
+
87
+ ## Honest metrics
88
+
89
+ One source of truth (`~/.squeezr/stats.json`; continuous net counters, never inflated per-session sums):
90
+
91
+ - **Net saved** = what actually left your requests, after `[squeezr:id]` tag overhead.
92
+ - **Savings by type** — deterministic, dedup, tool descriptions, stale turns, AI, MCP filter, system prompt (gross per-technique, labeled as such).
93
+ - **AI spend tracking** — every compression backend call's real token usage (input/output, per model) is counted and shown against what it saved.
94
+ - **Prompt cache**: `cache_read` vs `cache_creation` from Anthropic's real usage fields. Anthropic's cache discount is shown separately and *not* claimed as Squeezr savings.
95
+
96
+ ## Zest: Squeezr's own compression model
97
+
98
+ Zest (`zest-0.8b`, fine-tuned from Qwen3.5-0.8B with LoRA) is Squeezr's local compression model: free, runs on CPU via Ollama, and **deterministic in greedy decoding** — which makes AI compression byte-stable and therefore cache-safe. Status: v3 trained (89% eval accuracy), GGUF packaging in progress. Design doc: [docs/REINVENT_AI.md](docs/REINVENT_AI.md)
99
+
100
+ ## MCP server
101
+
102
+ Installed automatically into Claude Code, Cursor, Windsurf and Cline by `squeezr setup`.
103
+
104
+ Tools: `squeezr_status`, `squeezr_stats`, `squeezr_set_mode`, `squeezr_config`, `squeezr_habits`, `squeezr_stop`, `squeezr_check_updates`, `squeezr_update`, `squeezr_set_project`, `squeezr_bypass`.
105
+
106
+ ## Configuration
107
+
108
+ User config lives at **`~/.squeezr/squeezr.toml`** (survives npm updates). A project-local `.squeezr.toml` deep-merges on top.
109
+
110
+ ```toml
111
+ [compression]
112
+ threshold = 800 # min chars to compress a tool result
113
+ keep_recent = 3 # recent tool results never touched
114
+ ai_compression = false # MASTER switch for AI calls — default OFF (see Safety)
115
+ compress_system_prompt = true
116
+ compress_conversation = true
117
+ stale_turns = true # auto-disabled when prompt-cache markers are present
118
+ tool_desc_compress = true # first-paragraph truncation + expand recovery
119
+ tool_desc_expand = true
120
+ # mcp_block_servers = ["some-mcp"] # drop these servers' tools (~18k tok/req each)
121
+ # mcp_allow_servers = ["github-mcp"] # if set, only these survive
122
+ # skip_tools = ["Read"] # never compress these tool types
123
+ # Per-command: append "# squeezr:skip" to any Bash command to skip its result
124
+
125
+ [cache]
126
+ enabled = true
127
+ max_entries = 1000
128
+
129
+ [adaptive]
130
+ enabled = true # pressure-based thresholds (see Adaptive pressure)
131
+
132
+ [local]
133
+ enabled = true
134
+ upstream_url = "http://localhost:11434"
135
+ compression_model = "qwen2.5-coder:1.5b" # or zest-0.8b once published
136
+ ```
137
+
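To illustrate what "deep-merges on top" means for the two config files, here is a minimal sketch of a recursive merge over already-parsed TOML tables. The `deepMerge` helper and the `Table` type are hypothetical, and the rule that arrays are replaced wholesale rather than concatenated is an assumption, not something the README confirms.

```typescript
type Table = { [key: string]: unknown };

function isTable(v: unknown): v is Table {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `override` onto `base`: nested tables merge key by key;
// scalars and arrays from `override` replace the base value (assumed behavior).
function deepMerge(base: Table, override: Table): Table {
  const out: Table = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = out[key];
    out[key] = isTable(current) && isTable(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// User config (~/.squeezr/squeezr.toml), as it might look once parsed.
const userConfig: Table = {
  compression: { threshold: 800, keep_recent: 3, ai_compression: false },
  cache: { enabled: true, max_entries: 1000 },
};

// Project-local .squeezr.toml overriding a single key.
const projectConfig: Table = { compression: { threshold: 1200 } };

const effective = deepMerge(userConfig, projectConfig);
console.log(effective);
```

Under these assumptions the project file only needs to list the keys it changes: `threshold` becomes 1200, while `keep_recent`, `ai_compression` and the whole `[cache]` table carry over from the user config.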
138
+ Env vars: `SQUEEZR_PORT`, `SQUEEZR_MITM_PORT`, `SQUEEZR_THRESHOLD`, `SQUEEZR_KEEP_RECENT`, `SQUEEZR_DISABLED`, `SQUEEZR_DRY_RUN`, `SQUEEZR_LOCAL_UPSTREAM`, `SQUEEZR_LOCAL_MODEL`.
139
+
140
+ ## CLI commands
141
+
142
+ ```bash
143
+ squeezr setup # configure everything (env, auto-start, CA, MCP)
144
+ squeezr start # start the proxy
145
+ squeezr restart # stop + start (reloads config)
146
+ squeezr stop # stop the proxy
147
+ squeezr update # install latest from npm + restart
148
+ squeezr status # is it running? version, port, self-test
149
+ squeezr logs # last 50 log lines
150
+ squeezr config # print current config
151
+ squeezr ports # change HTTP / MITM ports
152
+ squeezr gain # all-time savings (--session, --details, --reset)
153
+ squeezr discover # detect installed AI CLIs
154
+ squeezr bypass # toggle bypass (--on / --off), persisted
155
+ squeezr tunnel # Cloudflare Tunnel (Cursor IDE)
156
+ squeezr mcp install # register MCP server (mcp uninstall to remove)
157
+ squeezr desktop start # separate proxy for Claude/Codex Desktop (stop/status)
158
+ squeezr uninstall # remove completely
159
+ squeezr version
160
+ ```
161
+
162
+ ## REST endpoints
163
+
164
+ `/squeezr/stats` · `/squeezr/history` · `/squeezr/health` · `/squeezr/bypass` (GET/POST) · `/squeezr/ai-compression` (GET/POST) · `/squeezr/config` · `/squeezr/native-compact` · `/squeezr/control/restart` · `/squeezr/control/stop` · `/squeezr/dashboard`
165
+
166
+ ## Requirements
167
+
168
+ Node.js ≥18, ~140 MB RAM, no GPU. Full details: [docs/HARDWARE_REQUIREMENTS.md](docs/HARDWARE_REQUIREMENTS.md)
169
+
170
+ ## Documentation
171
+
172
+ | Doc | Contents |
173
+ |-----|----------|
174
+ | [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Internal architecture |
175
+ | [docs/PROMPT_CACHE.md](docs/PROMPT_CACHE.md) | How Anthropic's prompt cache works + the 3 cache-breakers we found and fixed |
176
+ | [docs/REINVENT_AI.md](docs/REINVENT_AI.md) | Data-driven design of the AI compression layer (Zest) |
177
+ | [docs/HARDWARE_REQUIREMENTS.md](docs/HARDWARE_REQUIREMENTS.md) | Measured hardware requirements |
178
+ | [docs/TOOLS_ISSUE.md](docs/TOOLS_ISSUE.md) | The 28k-tokens-per-request tool descriptions problem |
179
+ | [CHANGELOG.md](CHANGELOG.md) | Full version history |
180
+
181
+ ## Troubleshooting
182
+
183
+ **Claude Code throws `undefined is not an object (evaluating '$.speed')`** — something that isn't Squeezr is squatting on the port (often a Docker container on 8080). Run `squeezr status`; if it reports a foreign service: `squeezr ports` to move, or stop the offender. `cat ~/.squeezr/runtime.json` shows the actual bound port.
184
+
185
+ **Dashboard shows 0 AI calls with AI Compression ON** — expected with Claude Code: the prompt-cache barrier leaves nothing safe for AI to compress (everything cacheable is protected, everything recent is preserved). AI compression shines for clients without prompt caching, or via the (upcoming) stable Zest pipeline.
186
+
187
+ ## License
188
+
189
+ MIT