squeezr-ai 1.46.3 → 1.80.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +199 -315
- package/bin/squeezr.js +2535 -2251
- package/dist/__tests__/aiRateLimit.test.d.ts +1 -0
- package/dist/__tests__/aiRateLimit.test.js +20 -0
- package/dist/__tests__/attachmentDedup.test.d.ts +1 -0
- package/dist/__tests__/attachmentDedup.test.js +89 -0
- package/dist/__tests__/compressibilityProbe.test.d.ts +1 -0
- package/dist/__tests__/compressibilityProbe.test.js +45 -0
- package/dist/__tests__/compressionGuard.test.d.ts +1 -0
- package/dist/__tests__/compressionGuard.test.js +57 -0
- package/dist/__tests__/compressor.test.js +104 -51
- package/dist/__tests__/diffRead.test.d.ts +1 -0
- package/dist/__tests__/diffRead.test.js +83 -0
- package/dist/__tests__/glossaryStore.test.d.ts +1 -0
- package/dist/__tests__/glossaryStore.test.js +37 -0
- package/dist/__tests__/glossarySub.test.d.ts +1 -0
- package/dist/__tests__/glossarySub.test.js +162 -0
- package/dist/__tests__/imageDedup.test.d.ts +1 -0
- package/dist/__tests__/imageDedup.test.js +80 -0
- package/dist/__tests__/largeBlock.test.d.ts +1 -0
- package/dist/__tests__/largeBlock.test.js +35 -0
- package/dist/__tests__/mcpFilter.test.d.ts +1 -0
- package/dist/__tests__/mcpFilter.test.js +87 -0
- package/dist/__tests__/newFeatures.test.d.ts +1 -0
- package/dist/__tests__/newFeatures.test.js +124 -0
- package/dist/__tests__/qualityHarness.test.d.ts +1 -0
- package/dist/__tests__/qualityHarness.test.js +98 -0
- package/dist/__tests__/rateLimitHeaders.test.js +6 -0
- package/dist/__tests__/requestCapture.test.d.ts +1 -0
- package/dist/__tests__/requestCapture.test.js +37 -0
- package/dist/__tests__/skillDedup.test.d.ts +1 -0
- package/dist/__tests__/skillDedup.test.js +57 -0
- package/dist/__tests__/staleTurns.test.d.ts +1 -0
- package/dist/__tests__/staleTurns.test.js +113 -0
- package/dist/__tests__/structuredGuard.test.d.ts +1 -0
- package/dist/__tests__/structuredGuard.test.js +72 -0
- package/dist/__tests__/toolDescComp.test.d.ts +1 -0
- package/dist/__tests__/toolDescComp.test.js +157 -0
- package/dist/__tests__/toolResultDedup.test.d.ts +1 -0
- package/dist/__tests__/toolResultDedup.test.js +40 -0
- package/dist/aiRateLimit.d.ts +19 -0
- package/dist/aiRateLimit.js +35 -0
- package/dist/aiToggle.d.ts +14 -0
- package/dist/aiToggle.js +53 -0
- package/dist/attachmentCompress.d.ts +9 -0
- package/dist/attachmentCompress.js +211 -0
- package/dist/attachmentDedup.d.ts +9 -0
- package/dist/attachmentDedup.js +89 -0
- package/dist/bypass.d.ts +6 -3
- package/dist/bypass.js +37 -5
- package/dist/cache.d.ts +3 -0
- package/dist/cache.js +10 -0
- package/dist/circuitBreaker.d.ts +4 -2
- package/dist/circuitBreaker.js +6 -3
- package/dist/compressibilityProbe.d.ts +8 -0
- package/dist/compressibilityProbe.js +47 -0
- package/dist/compressionGuard.d.ts +31 -0
- package/dist/compressionGuard.js +101 -0
- package/dist/compressor.d.ts +51 -1
- package/dist/compressor.js +599 -73
- package/dist/config.d.ts +21 -1
- package/dist/config.js +64 -4
- package/dist/dashboard.d.ts +1 -1
- package/dist/dashboard.js +621 -116
- package/dist/diffRead.d.ts +9 -0
- package/dist/diffRead.js +149 -0
- package/dist/expand.d.ts +2 -0
- package/dist/expand.js +6 -0
- package/dist/glossaryStore.d.ts +28 -0
- package/dist/glossaryStore.js +131 -0
- package/dist/glossarySub.d.ts +38 -0
- package/dist/glossarySub.js +123 -0
- package/dist/history.d.ts +35 -1
- package/dist/history.js +31 -5
- package/dist/identGlossary.d.ts +20 -0
- package/dist/identGlossary.js +215 -0
- package/dist/imageDedup.d.ts +12 -0
- package/dist/imageDedup.js +98 -0
- package/dist/index.js +7 -0
- package/dist/limits.d.ts +5 -2
- package/dist/limits.js +47 -4
- package/dist/logFeed.d.ts +10 -0
- package/dist/logFeed.js +42 -0
- package/dist/mcpFilter.d.ts +43 -0
- package/dist/mcpFilter.js +89 -0
- package/dist/mcpToolFilter.d.ts +32 -0
- package/dist/mcpToolFilter.js +140 -0
- package/dist/probePort.js +5 -1
- package/dist/promptCache.d.ts +44 -0
- package/dist/promptCache.js +121 -0
- package/dist/qualityGovernor.d.ts +11 -0
- package/dist/qualityGovernor.js +69 -0
- package/dist/requestCapture.d.ts +21 -0
- package/dist/requestCapture.js +79 -0
- package/dist/semanticRead.d.ts +9 -0
- package/dist/semanticRead.js +188 -0
- package/dist/server.js +447 -46
- package/dist/sessionCache.js +9 -2
- package/dist/skillDedup.d.ts +5 -0
- package/dist/skillDedup.js +89 -0
- package/dist/staleTurnSummary.d.ts +9 -0
- package/dist/staleTurnSummary.js +110 -0
- package/dist/staleTurns.d.ts +14 -0
- package/dist/staleTurns.js +80 -0
- package/dist/stats.d.ts +16 -3
- package/dist/stats.js +157 -21
- package/dist/stockToolDescs.d.ts +12 -0
- package/dist/stockToolDescs.js +69 -0
- package/dist/structuredGuard.d.ts +25 -0
- package/dist/structuredGuard.js +116 -0
- package/dist/systemPrompt.js +6 -2
- package/dist/systemSectioning.d.ts +21 -0
- package/dist/systemSectioning.js +111 -0
- package/dist/toolDescComp.d.ts +30 -0
- package/dist/toolDescComp.js +81 -0
- package/dist/toolResultDedup.d.ts +9 -0
- package/dist/toolResultDedup.js +88 -0
- package/package.json +69 -66
- package/squeezr.toml +18 -1
package/README.md
CHANGED
|
@@ -1,315 +1,199 @@
|
|
|
1
|
-
# Squeezr
|
|
2
|
-
|
|
3
|
-
**Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly
|
|
4
|
-
|
|
5
|
-
[](https://www.npmjs.com/package/squeezr-ai) [](LICENSE)
|
|
6
|
-
|
|
7
|
-
## Supported CLIs & apps
|
|
8
|
-
|
|
9
|
-
| Client | Protocol | Proxy method |
|
|
10
|
-
|--------|----------|--------------|
|
|
11
|
-
| Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
|
|
12
|
-
|
|
|
13
|
-
| Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
|
|
14
|
-
| OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
|
|
15
|
-
| Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
|
|
16
|
-
| Ollama | HTTP (local) | Transparent via dummy API key detection |
|
|
17
|
-
|
|
|
18
|
-
|
|
|
19
|
-
| Cursor IDE | HTTP to OpenAI API (BYOK
|
|
20
|
-
| Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
|
|
21
|
-
|
|
22
|
-
Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
|
|
23
|
-
|
|
24
|
-
## Quick start
|
|
25
|
-
|
|
26
|
-
```bash
|
|
27
|
-
npm install -g squeezr-ai
|
|
28
|
-
squeezr setup #
|
|
29
|
-
squeezr start
|
|
30
|
-
```
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
- **
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
- **
|
|
53
|
-
- **
|
|
54
|
-
- **
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
- **
|
|
77
|
-
|
|
78
|
-
- **
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
squeezr
|
|
154
|
-
squeezr
|
|
155
|
-
squeezr
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
##
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
|
|
176
|
-
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
threshold = 800
|
|
201
|
-
keep_recent = 3
|
|
202
|
-
|
|
203
|
-
[modes.aggressive]
|
|
204
|
-
threshold = 200
|
|
205
|
-
keep_recent = 1
|
|
206
|
-
ai_compression = true
|
|
207
|
-
|
|
208
|
-
[modes.critical]
|
|
209
|
-
threshold = 50
|
|
210
|
-
keep_recent = 0
|
|
211
|
-
ai_compression = true
|
|
212
|
-
```
|
|
213
|
-
|
|
214
|
-
### Project-level config: `squeezr.project.toml` (in project root)
|
|
215
|
-
|
|
216
|
-
Project-level config is deep-merged over global config. Useful for per-repo tuning.
|
|
217
|
-
|
|
218
|
-
### Environment variables
|
|
219
|
-
|
|
220
|
-
| Variable | Default | Description |
|
|
221
|
-
|----------|---------|-------------|
|
|
222
|
-
| `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
|
|
223
|
-
| `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
|
|
224
|
-
| `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
|
|
225
|
-
| `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
|
|
226
|
-
| `SQUEEZR_DISABLED` | `false` | Disable all compression |
|
|
227
|
-
| `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
|
|
228
|
-
| `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
|
|
229
|
-
| `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
|
|
230
|
-
|
|
231
|
-
### Per-command skip
|
|
232
|
-
|
|
233
|
-
Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
|
|
234
|
-
|
|
235
|
-
## CLI commands
|
|
236
|
-
|
|
237
|
-
```bash
|
|
238
|
-
squeezr setup # configure env vars, auto-start, CA trust, install MCP server
|
|
239
|
-
squeezr start # start the proxy (auto-restarts if version mismatch after update)
|
|
240
|
-
squeezr update # kill old processes, install latest from npm, restart
|
|
241
|
-
squeezr stop # stop the proxy
|
|
242
|
-
squeezr status # check if proxy is running
|
|
243
|
-
squeezr logs # show last 50 log lines
|
|
244
|
-
squeezr config # print current config
|
|
245
|
-
squeezr ports # change HTTP and MITM proxy ports
|
|
246
|
-
squeezr gain # all-time token savings summary
|
|
247
|
-
squeezr gain --session # live session savings from the running proxy
|
|
248
|
-
squeezr gain --details # all-time stats with per-tool breakdown
|
|
249
|
-
squeezr gain --reset # reset all-time counters
|
|
250
|
-
squeezr discover # detect which AI CLIs are installed
|
|
251
|
-
squeezr bypass # toggle bypass mode (skip compression, keep logging)
|
|
252
|
-
squeezr bypass --on # enable bypass (disable compression)
|
|
253
|
-
squeezr bypass --off # disable bypass (resume compression)
|
|
254
|
-
squeezr tunnel # expose proxy via Cloudflare Tunnel (for Cursor IDE)
|
|
255
|
-
squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
|
|
256
|
-
squeezr mcp uninstall # remove MCP server registration
|
|
257
|
-
squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
|
|
258
|
-
squeezr version # print version
|
|
259
|
-
```
|
|
260
|
-
|
|
261
|
-
## Resilience
|
|
262
|
-
|
|
263
|
-
Squeezr sits in the critical path between your AI CLI and the upstream API. It's designed to never break your workflow:
|
|
264
|
-
|
|
265
|
-
- **Circuit breaker** — If the AI compression backend (Haiku, GPT-4o-mini, etc.) fails 3 times in a row, Squeezr automatically skips AI compression for 60 seconds, then probes recovery. Deterministic compression continues working. Visible in dashboard, `squeezr status`, and MCP.
|
|
266
|
-
- **5-second AI timeout** — Each AI compression call has a hard 5s timeout. If the backend is slow, the original content passes through unmodified.
|
|
267
|
-
- **Bypass mode** — `squeezr bypass` instantly disables all compression without restarting. Requests still pass through and are logged. Toggle via CLI, MCP, dashboard, or REST API.
|
|
268
|
-
- **Expand rate tracking** — Monitors how often the model calls `squeezr_expand` to recover compressed content. High expand rate signals the compression is too aggressive.
|
|
269
|
-
- **Latency tracking** — p50/p95/p99 compression latency visible in dashboard and MCP stats.
|
|
270
|
-
|
|
271
|
-
## Compression backends
|
|
272
|
-
|
|
273
|
-
Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
|
|
274
|
-
|
|
275
|
-
| Backend | Model | Used for | Cost |
|
|
276
|
-
|---------|-------|----------|------|
|
|
277
|
-
| Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
|
|
278
|
-
| OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
|
|
279
|
-
| Gemini | Flash-8B | Fallback compression | Free |
|
|
280
|
-
| Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
|
|
281
|
-
| ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
|
|
282
|
-
|
|
283
|
-
## Requirements
|
|
284
|
-
|
|
285
|
-
- Node.js 18+ (compatible with Node.js 24)
|
|
286
|
-
- For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
|
|
287
|
-
- For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
|
|
288
|
-
|
|
289
|
-
## Troubleshooting
|
|
290
|
-
|
|
291
|
-
### Claude Code throws `undefined is not an object (evaluating '$.speed')`
|
|
292
|
-
|
|
293
|
-
Symptom: every prompt in Claude Code immediately errors with `undefined is not an object (evaluating '$.speed')` (or similar `$.X` parse errors). This means Claude Code is sending its API requests to **something that is not Squeezr** but happens to occupy Squeezr's port — typically a Docker container (Apache, nginx, WordPress) bound to `8080`.
|
|
294
|
-
|
|
295
|
-
To diagnose, run:
|
|
296
|
-
|
|
297
|
-
```bash
|
|
298
|
-
squeezr status
|
|
299
|
-
```
|
|
300
|
-
|
|
301
|
-
If the output says `a foreign service is` listening on the port, you have three options:
|
|
302
|
-
|
|
303
|
-
1. **Move Squeezr to a different port** (recommended): `squeezr ports` and pick something free, then reopen your terminal.
|
|
304
|
-
2. **Stop the offending service**: `docker ps` to find what owns 8080, then `docker stop <id>`.
|
|
305
|
-
3. **Inspect runtime info**: `cat ~/.squeezr/runtime.json` shows the *actual* port Squeezr is bound to. If it differs from your `ANTHROPIC_BASE_URL`, run `squeezr setup` to refresh your shell profile.
|
|
306
|
-
|
|
307
|
-
Squeezr v1.24.0+ runs a self-test on every startup that detects this exact failure mode and prints actionable hints. You can re-run it any time with:
|
|
308
|
-
|
|
309
|
-
```bash
|
|
310
|
-
curl -s "http://localhost:$(jq -r .port ~/.squeezr/runtime.json)/squeezr/selftest?run=1" | jq
|
|
311
|
-
```
|
|
312
|
-
|
|
313
|
-
## License
|
|
314
|
-
|
|
315
|
-
MIT
|
|
1
|
+
# Squeezr
|
|
2
|
+
|
|
3
|
+
**Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly — **without ever breaking Anthropic's prompt cache** — and saves thousands of tokens per session. Real-time dashboard, MCP integration, and an optional AI compression layer powered by **Zest**, Squeezr's own model.
|
|
4
|
+
|
|
5
|
+
[](https://www.npmjs.com/package/squeezr-ai) [](LICENSE)
|
|
6
|
+
|
|
7
|
+
## Supported CLIs & apps
|
|
8
|
+
|
|
9
|
+
| Client | Protocol | Proxy method |
|
|
10
|
+
|--------|----------|--------------|
|
|
11
|
+
| Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
|
|
12
|
+
| Claude Desktop | HTTP to Anthropic API | Windows: `setx` (via `squeezr setup`); macOS: `launchctl setenv`; Linux: `environment.d` |
|
|
13
|
+
| Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
|
|
14
|
+
| OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
|
|
15
|
+
| Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
|
|
16
|
+
| Ollama | HTTP (local) | Transparent via dummy API key detection |
|
|
17
|
+
| Codex Desktop | HTTP to OpenAI API | `~/.codex/config.toml` → `openai_base_url` (via `squeezr setup`) |
|
|
18
|
+
| Codex CLI | WebSocket to chatgpt.com | TLS-terminating MITM proxy on :8081 |
|
|
19
|
+
| Cursor IDE | HTTP to OpenAI API (BYOK only) | `localhost:8080` directly (CORS) or `squeezr tunnel` |
|
|
20
|
+
| Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
|
|
21
|
+
|
|
22
|
+
Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
|
|
23
|
+
|
|
24
|
+
## Quick start
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
npm install -g squeezr-ai
|
|
28
|
+
squeezr setup # env vars, auto-start, CA trust, MCP server — all automatic
|
|
29
|
+
squeezr start
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
## Prompt-cache safety (the core design principle)
|
|
33
|
+
|
|
34
|
+
Anthropic bills cached context at **0.1x** — but only if the request prefix arrives **byte-for-byte identical** between requests. A proxy that mutates history differently on each request silently invalidates that cache and re-bills the full context at full price, costing far more than compression saves.
|
|
35
|
+
|
|
36
|
+
Squeezr is built around this constraint:
|
|
37
|
+
|
|
38
|
+
- **Deterministic compression is byte-stable** — same input always produces the same output (fixed-pressure rules), so the cached prefix never changes → Anthropic's cache keeps hitting.
|
|
39
|
+
- **Unstable passes never touch the cached prefix** — AI compression, cross-turn dedup, diff-reads and stale-turn summaries only operate past the last `cache_control` marker (or freely for clients that don't use caching).
|
|
40
|
+
- **Live cache health monitoring** — the dashboard's *Prompt Cache* card shows `cache_read` vs `cache_creation` tokens and a Hit Health %. Green (≥80%) means you're paying the minimum possible; red means something is invalidating your prefix.
|
|
41
|
+
|
|
42
|
+
Full write-up: [docs/PROMPT_CACHE.md](docs/PROMPT_CACHE.md)
|
|
43
|
+
|
|
44
|
+
## How it works
|
|
45
|
+
|
|
46
|
+
Every request passes through Squeezr on `localhost:8080`. Compression layers, in order:
|
|
47
|
+
|
|
48
|
+
1. **MCP tool filtering** (opt-in) — drop tool definitions from MCP servers you block (`mcp_block_servers`) or that aren't allow-listed. A single chatty MCP server can cost ~18k tokens *per request*. Servers used in the conversation are never filtered.
|
|
49
|
+
2. **Tool description compression** — tool descriptions are truncated to their first paragraph (Bash: 10,441 → 53 chars), with the full spec stored in the expand store. The model retrieves it on demand via `squeezr_expand`. Saves ~17k tokens/request on a stock Claude Code session.
|
|
50
|
+
3. **System prompt compression** — skill/plugin duplicate blocks deduplicated; optional AI summarization (gated behind `ai_compression`).
|
|
51
|
+
4. **Deterministic preprocessing** — zero-latency regex rules on every tool result: ANSI/progress-bar/timestamp stripping, line dedup, JSON minification, plus ~30 tool-specific patterns (git, vitest/jest, tsc, eslint, cargo, pytest, docker, kubectl, gh…). Byte-stable → cache-safe.
|
|
52
|
+
5. **Cross-turn dedup & diff-reads** — repeated tool outputs collapse to references; repeated file reads become diffs against the latest read. (Only past the cache barrier.)
|
|
53
|
+
6. **Stale-turn summarization** — conversations >40 turns get old assistant prose collapsed to keyword summaries. (Only for clients without prompt caching.)
|
|
54
|
+
7. **AI compression** (opt-in, off by default) — old blocks above the AI floor (~1000 chars, auto-raised by the quality governor) summarized by a small model. Backends: **Zest (local, free, deterministic)**, Haiku, GPT-4o-mini, Gemini Flash. Heavily guarded so it only ever helps:
|
|
55
|
+
- **Structured-data guard** — JSON / JSONL / record dumps / tables are *never* AI-rewritten (a model can silently blank a field value); they stay in their deterministic form. Prose/logs still get compressed.
|
|
56
|
+
- **Compressibility probe** — a one-shot `deflate` estimate skips already-dense blocks (path/error/test dumps) that wouldn't beat the min-ratio, so no wasted backend calls.
|
|
57
|
+
- **Acceptance guardrail + retry-with-correction** — every AI result is validated; if it dropped a critical token (path/URL/error code) the model is re-prompted with the exact tokens to restore, else the result is rejected and the deterministic form is kept. Nothing that loses a hard token is ever used.
|
|
58
|
+
- **Quality governor** — watches expand-rate and guard-reject-rate and auto-raises the min block size (or pauses) when quality dips.
|
|
59
|
+
- **Backend-aware limits** — local Zest is free → no rate limit, generous timeout, processed sequentially (Ollama serialises anyway). Cloud backends keep a hard cap (20 calls/5 min) and a short timeout to protect spend.
|
|
60
|
+
- Plus the persistent on/off toggle and the cache barrier.
|
|
61
|
+
|
|
62
|
+
### Recovery: nothing is ever lost
|
|
63
|
+
|
|
64
|
+
Every compressed block embeds a `squeezr_expand(id)` reference. A `squeezr_expand` tool is injected into each request — if the model needs the original content, it retrieves it in one call. The dashboard tracks expand rate as the compression-quality metric (0 = nothing important was lost).
|
|
65
|
+
|
|
66
|
+
### Adaptive pressure
|
|
67
|
+
|
|
68
|
+
Compression aggressiveness scales with context usage: <50% → light (1500-char threshold), 50–75% → normal (800), 75–90% → aggressive (400), >90% → critical (150).
|
|
69
|
+
|
|
70
|
+
## Web dashboard
|
|
71
|
+
|
|
72
|
+
`http://localhost:8080/squeezr/dashboard` — 3 pages, SSE-updated:
|
|
73
|
+
|
|
74
|
+
| Page | What it shows |
|
|
75
|
+
|------|---------------|
|
|
76
|
+
| **Overview** | **Today-scoped** (resets at midnight): tokens saved, two honest ratios — **% of total sent today** and **% of the last request** (changes every turn), cost comparison (today), Cost/Savings-by-type breakdown (today), Top Tools, Session Cache, AI Compression card (calls / saved / spent / net), **Prompt Cache health** (read vs creation + hit %, **persisted across restarts**), by model / by client, compression mode + **Bypass / AI Compression toggles** |
|
|
77
|
+
| **Savings** | Day / Week / Month / All-time filters with period navigation — per-period tokens, cost, sessions, charts, By Model / By Client / Top Tools / AI Compression / Session Cache, all persisted across restarts |
|
|
78
|
+
| **Settings** | Client base-URL reference, ports, version/uptime, bypass & circuit breaker state, **AI Compression on/off**, **Restart / Stop buttons**, update check |
|
|
79
|
+
|
|
80
|
+
## Safety & resilience
|
|
81
|
+
|
|
82
|
+
Squeezr sits in the critical path. It is designed to never break your workflow — and never burn your plan:
|
|
83
|
+
|
|
84
|
+
- **Bypass mode (persisted)** — one click/command disables all compression; survives restarts. The emergency stop.
|
|
85
|
+
- **AI compression master switch (persisted, default OFF)** — with a subscription OAuth token, AI compression calls bill against *your own plan*; Squeezr refuses to auto-route to Haiku on an OAuth token. Use the free local Zest backend or a separately billed API key.
|
|
86
|
+
- **AI rate limiter (cloud only)** — hard cap of 20 AI calls per 5-minute sliding window for paid cloud backends (protects spend). Local Zest is free → not rate-limited.
|
|
87
|
+
- **AI minimum block size (~1000 chars, governed)** — small blocks can't be compressed without loss; Squeezr never AI-compresses below the floor, and the quality governor raises it automatically if reject/expand rates climb.
|
|
88
|
+
- **Structured-data & compressibility guards** — AI never rewrites structured data (JSON/records → no field corruption), and dense/incompressible blocks skip AI entirely.
|
|
89
|
+
- **Acceptance guardrail + retry-with-correction** — AI output that drops a critical token or doesn't save enough is rejected (after one corrective retry); the deterministic form is kept.
|
|
90
|
+
- **Cache barrier** — unstable passes can't touch the cached prefix (see prompt-cache safety above).
|
|
91
|
+
- **Circuit breaker + backend-aware timeouts** — 3 consecutive AI backend failures → AI disabled for 60s, deterministic continues. Local calls get a generous timeout and run sequentially (Ollama serialises) so they don't false-timeout.
|
|
92
|
+
- **Atomic persistence** — stats, history, caches and toggles are written atomically (tmp + rename); a crash can't corrupt them.
|
|
93
|
+
- **Self-test on startup** — detects port squatting (the classic `$.speed` Claude Code error), env-var drift, and pipeline issues.
|
|
94
|
+
|
|
95
|
+
## Honest metrics
|
|
96
|
+
|
|
97
|
+
One source of truth (`~/.squeezr/stats.json`, continuous net counters — never inflated per-session sums):
|
|
98
|
+
|
|
99
|
+
- **Net saved** = what actually left your requests, after `[squeezr:id]` tag overhead.
|
|
100
|
+
- **Savings by type** — deterministic, dedup, tool descriptions, stale turns, AI, MCP filter, system prompt (gross per-technique, labeled as such).
|
|
101
|
+
- **AI spend tracking** — every compression backend call's real token usage (input/output, per model) is counted and shown against what it saved.
|
|
102
|
+
- **Prompt cache** — `cache_read` vs `cache_creation` from Anthropic's real usage fields. Anthropic's cache discount is shown separately and *not* claimed as Squeezr savings.
|
|
103
|
+
|
|
104
|
+
## Zest — Squeezr's own compression model
|
|
105
|
+
|
|
106
|
+
Zest (`zest-0.8b`, fine-tuned from Qwen3.5-0.8B with LoRA) is Squeezr's local compression model: free, runs on CPU via Ollama, and **deterministic in greedy decoding** (temperature 0) — which makes AI compression byte-stable and therefore cache-safe. Status: deployed and selectable as the `local` backend (Ollama). Training data is being regenerated against Squeezr's own runtime guard (every example must keep all hard tokens — paths/URLs/error codes — and clear the min-ratio) so the model learns guard-passing compression instead of token-dropping. Design doc: [docs/REINVENT_AI.md](docs/REINVENT_AI.md)
|
|
107
|
+
|
|
108
|
+
## MCP server
|
|
109
|
+
|
|
110
|
+
Installed automatically into Claude Code, Cursor, Windsurf and Cline by `squeezr setup`.
|
|
111
|
+
|
|
112
|
+
Tools: `squeezr_status`, `squeezr_stats`, `squeezr_set_mode`, `squeezr_config`, `squeezr_habits`, `squeezr_stop`, `squeezr_check_updates`, `squeezr_update`, `squeezr_set_project`, `squeezr_bypass`.
|
|
113
|
+
|
|
114
|
+
## Configuration
|
|
115
|
+
|
|
116
|
+
User config lives at **`~/.squeezr/squeezr.toml`** (survives npm updates). A project-local `.squeezr.toml` deep-merges on top.
|
|
117
|
+
|
|
118
|
+
```toml
|
|
119
|
+
[compression]
|
|
120
|
+
threshold = 800 # min chars to compress a tool result
|
|
121
|
+
keep_recent = 3 # recent tool results never touched
|
|
122
|
+
ai_compression = false # MASTER switch for AI calls — default OFF (see Safety)
|
|
123
|
+
backend = "local" # auto | local (Zest) | haiku | gpt-mini | gemini-flash
|
|
124
|
+
compress_system_prompt = true
|
|
125
|
+
compress_conversation = true
|
|
126
|
+
compress_assistant_ai = false # AI-compress long old assistant turns (prose-heavy chats)
|
|
127
|
+
stale_turns = true # auto-disabled when prompt-cache markers are present
|
|
128
|
+
tool_desc_compress = true # first-paragraph truncation + expand recovery
|
|
129
|
+
tool_desc_expand = true
|
|
130
|
+
# mcp_block_servers = ["some-mcp"] # drop these servers' tools (~18k tok/req each)
|
|
131
|
+
# mcp_allow_servers = ["github-mcp"] # if set, only these survive
|
|
132
|
+
# skip_tools = ["Read"] # never compress these tool types
|
|
133
|
+
# Per-command: append "# squeezr:skip" to any Bash command to skip its result
|
|
134
|
+
|
|
135
|
+
[cache]
|
|
136
|
+
enabled = true
|
|
137
|
+
max_entries = 1000
|
|
138
|
+
|
|
139
|
+
[adaptive]
|
|
140
|
+
enabled = true # pressure-based thresholds (see Adaptive pressure)
|
|
141
|
+
|
|
142
|
+
[local]
|
|
143
|
+
enabled = true
|
|
144
|
+
upstream_url = "http://localhost:11434"
|
|
145
|
+
compression_model = "qwen2.5-coder:1.5b" # or zest-0.8b once published
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
Env vars: `SQUEEZR_PORT`, `SQUEEZR_MITM_PORT`, `SQUEEZR_THRESHOLD`, `SQUEEZR_KEEP_RECENT`, `SQUEEZR_DISABLED`, `SQUEEZR_DRY_RUN`, `SQUEEZR_LOCAL_UPSTREAM`, `SQUEEZR_LOCAL_MODEL`.
|
|
149
|
+
|
|
150
|
+
## CLI commands
|
|
151
|
+
|
|
152
|
+
```bash
|
|
153
|
+
squeezr setup # configure everything (env, auto-start, CA, MCP)
|
|
154
|
+
squeezr start # start the proxy
|
|
155
|
+
squeezr restart # stop + start (reloads config)
|
|
156
|
+
squeezr stop # stop the proxy
|
|
157
|
+
squeezr update # install latest from npm + restart
|
|
158
|
+
squeezr status # is it running? version, port, self-test
|
|
159
|
+
squeezr logs # last 50 log lines
|
|
160
|
+
squeezr config # print current config
|
|
161
|
+
squeezr ports # change HTTP / MITM ports
|
|
162
|
+
squeezr gain # all-time savings (--session, --details, --reset)
|
|
163
|
+
squeezr discover # detect installed AI CLIs
|
|
164
|
+
squeezr bypass # toggle bypass (--on / --off) — persisted
|
|
165
|
+
squeezr tunnel # Cloudflare Tunnel (Cursor IDE)
|
|
166
|
+
squeezr mcp install # register MCP server (mcp uninstall to remove)
|
|
167
|
+
squeezr desktop start # separate proxy for Claude/Codex Desktop (stop/status)
|
|
168
|
+
squeezr uninstall # remove completely
|
|
169
|
+
squeezr version
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
## REST endpoints
|
|
173
|
+
|
|
174
|
+
`/squeezr/stats` · `/squeezr/history` · `/squeezr/health` · `/squeezr/bypass` (GET/POST) · `/squeezr/ai-compression` (GET/POST) · `/squeezr/config` · `/squeezr/native-compact` · `/squeezr/control/restart` · `/squeezr/control/stop` · `/squeezr/dashboard`
|
|
175
|
+
|
|
176
|
+
## Requirements
|
|
177
|
+
|
|
178
|
+
Node.js ≥18, ~140 MB RAM, no GPU. Full details: [docs/HARDWARE_REQUIREMENTS.md](docs/HARDWARE_REQUIREMENTS.md)
|
|
179
|
+
|
|
180
|
+
## Documentation
|
|
181
|
+
|
|
182
|
+
| Doc | Contents |
|
|
183
|
+
|-----|----------|
|
|
184
|
+
| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Internal architecture |
|
|
185
|
+
| [docs/PROMPT_CACHE.md](docs/PROMPT_CACHE.md) | How Anthropic's prompt cache works + the 3 cache-breakers we found and fixed |
|
|
186
|
+
| [docs/REINVENT_AI.md](docs/REINVENT_AI.md) | Data-driven design of the AI compression layer (Zest) |
|
|
187
|
+
| [docs/HARDWARE_REQUIREMENTS.md](docs/HARDWARE_REQUIREMENTS.md) | Measured hardware requirements |
|
|
188
|
+
| [docs/TOOLS_ISSUE.md](docs/TOOLS_ISSUE.md) | The 28k-tokens-per-request tool descriptions problem |
|
|
189
|
+
| [CHANGELOG.md](CHANGELOG.md) | Full version history |
|
|
190
|
+
|
|
191
|
+
## Troubleshooting
|
|
192
|
+
|
|
193
|
+
**Claude Code throws `undefined is not an object (evaluating '$.speed')`** — something that isn't Squeezr is squatting on the port (often a Docker container on 8080). Run `squeezr status`; if it reports a foreign service: `squeezr ports` to move, or stop the offender. `cat ~/.squeezr/runtime.json` shows the actual bound port.
|
|
194
|
+
|
|
195
|
+
**Dashboard shows 0 AI calls with AI Compression ON** — expected with Claude Code: the prompt-cache barrier leaves nothing safe for AI to compress (everything cacheable is protected, everything recent is preserved). AI compression shines for clients without prompt caching, or via the (upcoming) stable Zest pipeline.
|
|
196
|
+
|
|
197
|
+
## License
|
|
198
|
+
|
|
199
|
+
MIT
|