squeezr-ai 1.22.0 → 1.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,288 +1,313 @@
1
- # Squeezr
2
-
3
- **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session. Includes a real-time web dashboard and MCP integration.
4
-
5
- [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE)
6
-
7
- ## Supported CLIs
8
-
9
- | CLI | Protocol | Proxy method |
10
- |-----|----------|--------------|
11
- | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
- | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
- | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
- | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
- | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
- | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
- | **Cursor IDE** | **ConnectRPC/HTTP2 to api2.cursor.sh** | **`squeezr cursor` — MITM proxy on :8082** |
18
- | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
19
-
20
- Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
21
-
22
- ## Quick start
23
-
24
- ```bash
25
- npm install -g squeezr-ai
26
- squeezr setup # configures env vars, auto-start, CA trust, and MCP server
27
- squeezr start
28
- ```
29
-
30
- `squeezr setup` handles everything automatically:
31
- - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `NODE_EXTRA_CA_CERTS`
32
- - Installs a shell wrapper (PowerShell on Windows, bash/zsh on Linux/macOS/WSL) that auto-refreshes env vars after `squeezr start/setup/update` — no need to restart the terminal
33
- - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
34
- - Registers the MCP server in Claude Code, Cursor, Windsurf, and Cline
35
- - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
36
- - **macOS/Linux/WSL:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `NODE_EXTRA_CA_CERTS`
37
-
38
- ## How it works
39
-
40
- Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
41
-
42
- ### Layer 1: System prompt compression
43
-
44
- The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
45
-
46
- ### Layer 2: Deterministic preprocessing
47
-
48
- Zero-latency, rule-based transformations applied to every tool result:
49
-
50
- - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
51
- - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
52
- - **Minification:** JSON whitespace, collapsed blank lines
53
-
54
- ### Layer 3: Tool-specific patterns (~30 rules)
55
-
56
- Each tool result is matched against specialized compression rules:
57
-
58
- | Category | Tools | What it does |
59
- |----------|-------|--------------|
60
- | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
61
- | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
62
- | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
63
- | Build | next build, cargo build | Errors only |
64
- | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
65
- | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
66
- | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
67
-
68
- ### Exclusive patterns
69
-
70
- Applied to specific content types regardless of tool:
71
-
72
- - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
73
- - **Large code files** (>500 lines) → imports + function/class signatures only
74
- - **Long output** (>200 lines) → head + tail + omission note
75
- - **Grep results** → grouped by file, matches capped
76
- - **Glob results** (>30 files) → directory tree summary
77
- - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
78
-
79
- ### Adaptive pressure
80
-
81
- Compression aggressiveness scales with context window usage:
82
-
83
- | Context usage | Threshold | Behavior |
84
- |---------------|-----------|----------|
85
- | < 50% | 1,500 chars | Light — only compress large results |
86
- | 50–75% | 800 chars | Normal — standard compression |
87
- | 75–90% | 400 chars | Aggressive — compress most results |
88
- | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
89
-
90
- ### Session optimizations
91
-
92
- - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
93
- - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
94
- - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
95
- - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
96
-
97
- ## Web dashboard
98
-
99
- Live dashboard at `http://localhost:PORT/squeezr/dashboard` with 5 pages:
100
-
101
- | Page | What it shows |
102
- |------|---------------|
103
- | **Overview** | Tokens saved, compression %, requests, cost saved, per-tool breakdown, sparkline chart, context pressure bars, active project badge, savings breakdown (deterministic, AI, dedup, system prompt, overhead) |
104
- | **Projects** | Per-project aggregate stats across all sessions, auto-detected from working directory or set manually via MCP |
105
- | **History** | Past proxy sessions grouped by project and day — start/end time, duration, request count, tokens saved, relative timestamps |
106
- | **Limits** | Real-time rate limit gauges per CLI: Anthropic token/request limits, OpenAI billing & credit balance, Gemini 429 tracking, input/output token usage (session + daily), personal monthly budget bar |
107
- | **Settings** | Compression mode selector (Soft/Normal/Aggressive/Critical), threshold tuning |
108
-
109
- Updates every 2 seconds via SSE. Works with both API key and subscription (OAuth) authentication.
110
-
111
- ## MCP server
112
-
113
- Built-in MCP server (`squeezr-mcp`) that gives any MCP-capable AI CLI real-time awareness and control of Squeezr.
114
-
115
- **Installed automatically** by `squeezr setup` into Claude Code, Cursor, Windsurf, and Cline.
116
-
117
- | Tool | Description |
118
- |------|-------------|
119
- | `squeezr_status` | Is proxy running? Version, port, uptime, mode, dry-run state |
120
- | `squeezr_stats` | Token savings, compression %, cost saved, savings breakdown (deterministic/AI/dedup/system prompt/overhead), per-tool breakdown |
121
- | `squeezr_set_mode` | Change compression mode instantly (soft / normal / aggressive / critical) |
122
- | `squeezr_config` | Current thresholds, keepRecent, cache sizes, AI-skipped tools |
123
- | `squeezr_habits` | Detect wasteful patterns this session (duplicate reads, high Bash count, cache efficiency) |
124
- | `squeezr_stop` | Stop the proxy gracefully (persists caches before exit) |
125
- | `squeezr_check_updates` | Check npm for newer Squeezr version |
126
- | `squeezr_update` | Update to latest version via `npm install -g squeezr-ai@latest` |
127
- | `squeezr_set_project` | Manually set/clear the current project name (overrides auto-detection) |
128
- | `squeezr_bypass` | Toggle bypass mode — disable compression instantly without restart (runtime-only) |
129
-
130
- Every MCP tool response automatically checks for updates and appends a notification banner when a new version is available.
131
-
132
- ## Honest savings tracking
133
-
134
- Squeezr tracks token savings with full transparency. `squeezr gain` and the dashboard break down savings by source:
135
-
136
- | Source | Description |
137
- |--------|-------------|
138
- | Deterministic | Rule-based preprocessing (ANSI strip, dedup, minification) — free, zero latency |
139
- | AI compression | Haiku/GPT-mini summarization of tool results — near-free, slight latency |
140
- | Read dedup | Cross-turn deduplication of repeated file reads |
141
- | System prompt | One-time AI compression of the system prompt, cached across requests |
142
- | Tag overhead | Bytes added by `[squeezr:ID]` markers (subtracted from savings) |
143
- | AI cost | Estimated token cost of compression API calls (subtracted from NET) |
144
-
145
- **NET savings** = total savings − tag overhead − AI compression cost.
146
-
147
- ### `squeezr gain` subcommands
148
-
149
- ```bash
150
- squeezr gain # all-time savings summary
151
- squeezr gain --session # live session savings from the running proxy
152
- squeezr gain --details # all-time stats with per-tool breakdown
153
- squeezr gain --reset # reset all-time counters
154
- ```
155
-
156
- ## Project tracking
157
-
158
- Squeezr automatically detects the active project from the CLI's working directory (e.g. Claude Code's `<cwd>` tag in the system prompt). Per-project stats are tracked across sessions.
159
-
160
- - **Auto-detection:** extracts the project name from the last meaningful path segment
161
- - **Manual override:** `squeezr_set_project` MCP tool or `POST /squeezr/project` REST endpoint
162
- - **Per-project stats:** visible on the Dashboard's Projects page and in `squeezr gain --session`
163
-
164
- ## Codex support (MITM proxy)
165
-
166
- Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
167
-
168
- The MITM proxy **only intercepts `chatgpt.com`** traffic. All other HTTPS requests (npm, git, curl, etc.) pass through as a transparent TCP tunnel — no certificate needed, no interference.
169
-
170
- ## Configuration
171
-
172
- ### Global config: `squeezr.toml` (next to the binary)
173
-
174
- ```toml
175
- # Compression thresholds
176
- threshold = 800 # min chars to apply compression
177
- keep_recent = 3 # skip the N most recent tool results
178
- ai_compression = false # enable AI (Haiku) for tool result compression
179
-
180
- # Ports
181
- port = 8080 # HTTP proxy port
182
- mitm_port = 8081 # MITM proxy port (Codex)
183
-
184
- # Models
185
- local_model = "qwen2.5-coder:1.5b" # model for local compression
186
- local_upstream = "http://localhost:11434"
187
-
188
- # Tools to never AI-compress (deterministic-only)
189
- ai_skip_tools = ["Read", "View"]
190
-
191
- # Compression modes override thresholds
192
- [modes.soft]
193
- threshold = 1500
194
- keep_recent = 10
195
- ai_compression = false
196
-
197
- [modes.normal]
198
- threshold = 800
199
- keep_recent = 3
200
-
201
- [modes.aggressive]
202
- threshold = 200
203
- keep_recent = 1
204
- ai_compression = true
205
-
206
- [modes.critical]
207
- threshold = 50
208
- keep_recent = 0
209
- ai_compression = true
210
- ```
211
-
212
- ### Project-level config: `squeezr.project.toml` (in project root)
213
-
214
- Project-level config is deep-merged over global config. Useful for per-repo tuning.
215
-
216
- ### Environment variables
217
-
218
- | Variable | Default | Description |
219
- |----------|---------|-------------|
220
- | `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
221
- | `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
222
- | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
223
- | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
224
- | `SQUEEZR_DISABLED` | `false` | Disable all compression |
225
- | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
226
- | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
227
- | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
228
-
229
- ### Per-command skip
230
-
231
- Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
232
-
233
- ## CLI commands
234
-
235
- ```bash
236
- squeezr setup # configure env vars, auto-start, CA trust, install MCP server
237
- squeezr start # start the proxy (auto-restarts if version mismatch after update)
238
- squeezr update # kill old processes, install latest from npm, restart
239
- squeezr stop # stop the proxy
240
- squeezr status # check if proxy is running
241
- squeezr logs # show last 50 log lines
242
- squeezr config # print current config
243
- squeezr ports # change HTTP and MITM proxy ports
244
- squeezr gain # all-time token savings summary
245
- squeezr gain --session # live session savings from the running proxy
246
- squeezr gain --details # all-time stats with per-tool breakdown
247
- squeezr gain --reset # reset all-time counters
248
- squeezr discover # detect which AI CLIs are installed
249
- squeezr bypass # toggle bypass mode (skip compression, keep logging)
250
- squeezr bypass --on # enable bypass (disable compression)
251
- squeezr bypass --off # disable bypass (resume compression)
252
- squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
253
- squeezr mcp uninstall # remove MCP server registration
254
- squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
255
- squeezr version # print version
256
- ```
257
-
258
- ## Resilience
259
-
260
- Squeezr sits in the critical path between your AI CLI and the upstream API. It's designed to never break your workflow:
261
-
262
- - **Circuit breaker** — If the AI compression backend (Haiku, GPT-4o-mini, etc.) fails 3 times in a row, Squeezr automatically skips AI compression for 60 seconds, then probes recovery. Deterministic compression continues working. Visible in dashboard, `squeezr status`, and MCP.
263
- - **5-second AI timeout** — Each AI compression call has a hard 5s timeout. If the backend is slow, the original content passes through unmodified.
264
- - **Bypass mode** — `squeezr bypass` instantly disables all compression without restarting. Requests still pass through and are logged. Toggle via CLI, MCP, dashboard, or REST API.
265
- - **Expand rate tracking** — Monitors how often the model calls `squeezr_expand` to recover compressed content. High expand rate signals the compression is too aggressive.
266
- - **Latency tracking** — p50/p95/p99 compression latency visible in dashboard and MCP stats.
267
-
268
- ## Compression backends
269
-
270
- Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
271
-
272
- | Backend | Model | Used for | Cost |
273
- |---------|-------|----------|------|
274
- | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
275
- | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
276
- | Gemini | Flash-8B | Fallback compression | Free |
277
- | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
278
- | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
279
-
280
- ## Requirements
281
-
282
- - Node.js 18+ (compatible with Node.js 24)
283
- - For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
284
- - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
285
-
286
- ## License
287
-
288
- MIT
1
+ # Squeezr
2
+
3
+ **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session. Includes a real-time web dashboard and MCP integration.
4
+
5
+ [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE)
6
+
7
+ ## Supported CLIs
8
+
9
+ | CLI | Protocol | Proxy method |
10
+ |-----|----------|--------------|
11
+ | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
+ | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
+ | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
+ | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
+ | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
+ | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
+ | **Cursor IDE** | **ConnectRPC/HTTP2 to api2.cursor.sh** | **`squeezr cursor` — MITM proxy on :8082** |
18
+ | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
19
+
20
+ Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
21
+
22
+ ## Quick start
23
+
24
+ ```bash
25
+ npm install -g squeezr-ai
26
+ squeezr setup # configures env vars, auto-start, CA trust, and MCP server
27
+ squeezr start
28
+ ```
29
+
30
+ `squeezr setup` handles everything automatically:
31
+ - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `NODE_EXTRA_CA_CERTS`
32
+ - Installs a shell wrapper (PowerShell on Windows, bash/zsh on Linux/macOS/WSL) that auto-refreshes env vars after `squeezr start/setup/update` — no need to restart the terminal
33
+ - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
34
+ - Registers the MCP server in Claude Code, Cursor, Windsurf, and Cline
35
+ - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
36
+ - **macOS/Linux/WSL:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `NODE_EXTRA_CA_CERTS`
37
+
38
+ ## How it works
39
+
40
+ Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
41
+
42
+ ### Layer 1: System prompt compression
43
+
44
+ The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
45
+
46
+ ### Layer 2: Deterministic preprocessing
47
+
48
+ Zero-latency, rule-based transformations applied to every tool result:
49
+
50
+ - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
51
+ - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
52
+ - **Minification:** JSON whitespace, collapsed blank lines
53
+
54
+ ### Layer 3: Tool-specific patterns (~30 rules)
55
+
56
+ Each tool result is matched against specialized compression rules:
57
+
58
+ | Category | Tools | What it does |
59
+ |----------|-------|--------------|
60
+ | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
61
+ | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
62
+ | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
63
+ | Build | next build, cargo build | Errors only |
64
+ | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
65
+ | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
66
+ | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
67
+
68
+ ### Exclusive patterns
69
+
70
+ Applied to specific content types regardless of tool:
71
+
72
+ - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
73
+ - **Large code files** (>500 lines) → imports + function/class signatures only
74
+ - **Long output** (>200 lines) → head + tail + omission note
75
+ - **Grep results** → grouped by file, matches capped
76
+ - **Glob results** (>30 files) → directory tree summary
77
+ - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
78
+
79
+ ### Adaptive pressure
80
+
81
+ Compression aggressiveness scales with context window usage:
82
+
83
+ | Context usage | Threshold | Behavior |
84
+ |---------------|-----------|----------|
85
+ | < 50% | 1,500 chars | Light — only compress large results |
86
+ | 50–75% | 800 chars | Normal — standard compression |
87
+ | 75–90% | 400 chars | Aggressive — compress most results |
88
+ | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
89
+
90
+ ### Session optimizations
91
+
92
+ - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
93
+ - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
94
+ - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
95
+ - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
96
+
97
+ ## Web dashboard
98
+
99
+ Live dashboard at `http://localhost:PORT/squeezr/dashboard` with 5 pages:
100
+
101
+ | Page | What it shows |
102
+ |------|---------------|
103
+ | **Overview** | Tokens saved, compression %, requests, cost saved, per-tool breakdown, sparkline chart, context pressure bars, active project badge, savings breakdown (deterministic, AI, dedup, system prompt, overhead) |
104
+ | **Projects** | Per-project aggregate stats across all sessions, auto-detected from working directory or set manually via MCP |
105
+ | **History** | Past proxy sessions grouped by project and day — start/end time, duration, request count, tokens saved, relative timestamps |
106
+ | **Limits** | Real-time rate limit gauges per CLI: Anthropic token/request limits, OpenAI billing & credit balance, Gemini 429 tracking, input/output token usage (session + daily), personal monthly budget bar |
107
+ | **Settings** | Compression mode selector (Soft/Normal/Aggressive/Critical), threshold tuning |
108
+
109
+ Updates every 2 seconds via SSE. Works with both API key and subscription (OAuth) authentication.
110
+
111
+ ## MCP server
112
+
113
+ Built-in MCP server (`squeezr-mcp`) that gives any MCP-capable AI CLI real-time awareness and control of Squeezr.
114
+
115
+ **Installed automatically** by `squeezr setup` into Claude Code, Cursor, Windsurf, and Cline.
116
+
117
+ | Tool | Description |
118
+ |------|-------------|
119
+ | `squeezr_status` | Is proxy running? Version, port, uptime, mode, circuit breaker state, bypass status |
120
+ | `squeezr_stats` | Token savings, compression %, cost saved, savings breakdown, per-tool breakdown, latency (p50/p95/p99), expand rate |
121
+ | `squeezr_set_mode` | Change compression mode instantly (soft / normal / aggressive / critical) |
122
+ | `squeezr_config` | Current thresholds, keepRecent, cache sizes, AI-skipped tools |
123
+ | `squeezr_habits` | Detect wasteful patterns this session (duplicate reads, high Bash count, cache efficiency) |
124
+ | `squeezr_stop` | Stop the proxy gracefully (persists caches before exit) |
125
+ | `squeezr_check_updates` | Check npm for newer Squeezr version |
126
+ | `squeezr_update` | Update to latest version via `npm install -g squeezr-ai@latest` |
127
+ | `squeezr_set_project` | Manually set/clear the current project name (overrides auto-detection) |
128
+ | `squeezr_bypass` | Toggle bypass mode — disable compression instantly without restart (runtime-only) |
129
+
130
+ Every MCP tool response automatically checks for updates and appends a notification banner when a new version is available.
131
+
132
+ ## Honest savings tracking
133
+
134
+ Squeezr tracks token savings with full transparency. `squeezr gain` and the dashboard break down savings by source:
135
+
136
+ | Source | Description |
137
+ |--------|-------------|
138
+ | Deterministic | Rule-based preprocessing (ANSI strip, dedup, minification) — free, zero latency |
139
+ | AI compression | Haiku/GPT-mini summarization of tool results — near-free, slight latency |
140
+ | Read dedup | Cross-turn deduplication of repeated file reads |
141
+ | System prompt | One-time AI compression of the system prompt, cached across requests |
142
+ | Tag overhead | Bytes added by `[squeezr:ID]` markers (subtracted from savings) |
143
+ | AI cost | Estimated token cost of compression API calls (subtracted from NET) |
144
+
145
+ **NET savings** = total savings − tag overhead − AI compression cost.
146
+
147
+ ### `squeezr gain` subcommands
148
+
149
+ ```bash
150
+ squeezr gain # all-time savings summary
151
+ squeezr gain --session # live session savings from the running proxy
152
+ squeezr gain --details # all-time stats with per-tool breakdown
153
+ squeezr gain --reset # reset all-time counters
154
+ ```
155
+
156
+ ## Project tracking
157
+
158
+ Squeezr automatically detects the active project from the CLI's working directory (e.g. Claude Code's `<cwd>` tag in the system prompt). Per-project stats are tracked across sessions.
159
+
160
+ - **Auto-detection:** extracts the project name from the last meaningful path segment
161
+ - **Manual override:** `squeezr_set_project` MCP tool or `POST /squeezr/project` REST endpoint
162
+ - **Per-project stats:** visible on the Dashboard's Projects page and in `squeezr gain --session`
163
+
164
+ ## Codex support (MITM proxy)
165
+
166
+ Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
167
+
168
+ The MITM proxy **only intercepts `chatgpt.com`** traffic. All other HTTPS requests (npm, git, curl, etc.) pass through as a transparent TCP tunnel — no certificate needed, no interference.
169
+
170
+ ## Configuration
171
+
172
+ ### Global config: `squeezr.toml` (next to the binary)
173
+
174
+ ```toml
175
+ # Compression thresholds
176
+ threshold = 800 # min chars to apply compression
177
+ keep_recent = 3 # skip the N most recent tool results
178
+ ai_compression = false # enable AI (Haiku) for tool result compression
179
+
180
+ # Ports
181
+ port = 8080 # HTTP proxy port
182
+ mitm_port = 8081 # MITM proxy port (Codex)
183
+
184
+ # Models
185
+ local_model = "qwen2.5-coder:1.5b" # model for local compression
186
+ local_upstream = "http://localhost:11434"
187
+
188
+ # Tools to never AI-compress (deterministic-only)
189
+ ai_skip_tools = ["Read", "View"]
190
+
191
+ # Compression modes override thresholds
192
+ [modes.soft]
193
+ threshold = 1500
194
+ keep_recent = 10
195
+ ai_compression = false
196
+
197
+ [modes.normal]
198
+ threshold = 800
199
+ keep_recent = 3
200
+
201
+ [modes.aggressive]
202
+ threshold = 200
203
+ keep_recent = 1
204
+ ai_compression = true
205
+
206
+ [modes.critical]
207
+ threshold = 50
208
+ keep_recent = 0
209
+ ai_compression = true
210
+ ```
211
+
212
+ ### Project-level config: `squeezr.project.toml` (in project root)
213
+
214
+ Project-level config is deep-merged over global config. Useful for per-repo tuning.
215
+
216
+ ### Environment variables
217
+
218
+ | Variable | Default | Description |
219
+ |----------|---------|-------------|
220
+ | `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
221
+ | `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
222
+ | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
223
+ | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
224
+ | `SQUEEZR_DISABLED` | `false` | Disable all compression |
225
+ | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
226
+ | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
227
+ | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
228
+
229
+ ### Per-command skip
230
+
231
+ Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
232
+
233
+ ## CLI commands
234
+
235
+ ```bash
236
+ squeezr setup # configure env vars, auto-start, CA trust, install MCP server
237
+ squeezr start # start the proxy (auto-restarts if version mismatch after update)
238
+ squeezr update # kill old processes, install latest from npm, restart
239
+ squeezr stop # stop the proxy
240
+ squeezr status # check if proxy is running
241
+ squeezr logs # show last 50 log lines
242
+ squeezr config # print current config
243
+ squeezr ports # change HTTP and MITM proxy ports
244
+ squeezr gain # all-time token savings summary
245
+ squeezr gain --session # live session savings from the running proxy
246
+ squeezr gain --details # all-time stats with per-tool breakdown
247
+ squeezr gain --reset # reset all-time counters
248
+ squeezr discover # detect which AI CLIs are installed
249
+ squeezr bypass # toggle bypass mode (skip compression, keep logging)
250
+ squeezr bypass --on # enable bypass (disable compression)
251
+ squeezr bypass --off # disable bypass (resume compression)
252
+ squeezr tunnel # expose proxy via Cloudflare Tunnel (for Cursor IDE)
253
+ squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
254
+ squeezr mcp uninstall # remove MCP server registration
255
+ squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
256
+ squeezr version # print version
257
+ ```
258
+
259
+ ## Resilience
260
+
261
+ Squeezr sits in the critical path between your AI CLI and the upstream API. It's designed to never break your workflow:
262
+
263
+ - **Circuit breaker** — If the AI compression backend (Haiku, GPT-4o-mini, etc.) fails 3 times in a row, Squeezr automatically skips AI compression for 60 seconds, then probes recovery. Deterministic compression continues working. Visible in dashboard, `squeezr status`, and MCP.
264
+ - **5-second AI timeout** — Each AI compression call has a hard 5s timeout. If the backend is slow, the original content passes through unmodified.
265
+ - **Bypass mode** — `squeezr bypass` instantly disables all compression without restarting. Requests still pass through and are logged. Toggle via CLI, MCP, dashboard, or REST API.
266
+ - **Expand rate tracking** — Monitors how often the model calls `squeezr_expand` to recover compressed content. High expand rate signals the compression is too aggressive.
267
+ - **Latency tracking** — p50/p95/p99 compression latency visible in dashboard and MCP stats.
268
+
269
+ ## Compression backends
270
+
271
+ Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
272
+
273
+ | Backend | Model | Used for | Cost |
274
+ |---------|-------|----------|------|
275
+ | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
276
+ | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
277
+ | Gemini | Flash-8B | Fallback compression | Free |
278
+ | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
279
+ | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
280
+
281
+ ## Requirements
282
+
283
+ - Node.js 18+ (compatible with Node.js 24)
284
+ - For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
285
+ - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
286
+
287
+ ## Troubleshooting
288
+
289
+ ### Claude Code throws `undefined is not an object (evaluating '$.speed')`
290
+
291
+ Symptom: every prompt in Claude Code immediately errors with `undefined is not an object (evaluating '$.speed')` (or similar `$.X` parse errors). This means Claude Code is sending its API requests to **something that is not Squeezr** but happens to occupy Squeezr's port — typically a Docker container (Apache, nginx, WordPress) bound to `8080`.
292
+
293
+ To diagnose, run:
294
+
295
+ ```bash
296
+ squeezr status
297
+ ```
298
+
299
+ If the output says `a foreign service is` listening on the port, you have three options:
300
+
301
+ 1. **Move Squeezr to a different port** (recommended): `squeezr ports` and pick something free, then reopen your terminal.
302
+ 2. **Stop the offending service**: `docker ps` to find what owns 8080, then `docker stop <id>`.
303
+ 3. **Inspect runtime info**: `cat ~/.squeezr/runtime.json` shows the *actual* port Squeezr is bound to. If it differs from your `ANTHROPIC_BASE_URL`, run `squeezr setup` to refresh your shell profile.
304
+
305
+ Squeezr v1.23.0+ runs a self-test on every startup that detects this exact failure mode and prints actionable hints. You can re-run it any time with:
306
+
307
+ ```bash
308
+ curl -s "http://localhost:$(jq -r .port ~/.squeezr/runtime.json)/squeezr/selftest?run=1" | jq
309
+ ```
310
+
311
+ ## License
312
+
313
+ MIT