squeezr-ai 1.21.0 → 1.21.2

package/README.md CHANGED
@@ -1,217 +1,274 @@
1
- # Squeezr
2
-
3
- **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session.
4
-
5
- [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE) [![tests](https://img.shields.io/badge/tests-237%20passing-brightgreen)]()
6
-
7
- ## Supported CLIs
8
-
9
- | CLI | Protocol | Proxy method |
10
- |-----|----------|-------------|
11
- | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
- | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
- | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
- | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
- | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
- | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
- | **Cursor IDE** | **ConnectRPC/HTTP2 to api2.cursor.sh** | **`squeezr cursor` — MITM proxy on :8082** |
18
- | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
19
-
20
- ## Quick start
21
-
22
- ```bash
23
- npm install -g squeezr-ai
24
- squeezr setup # configures env vars, auto-start, and CA trust
25
- squeezr start
26
- ```
27
-
28
- `squeezr setup` handles everything automatically:
29
- - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `NODE_EXTRA_CA_CERTS`
30
- - Installs a shell wrapper (PowerShell on Windows, bash/zsh on Linux/macOS/WSL) that auto-refreshes env vars after `squeezr start/setup/update`, so there is no need to restart the terminal
31
- - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
32
- - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
33
- - **macOS/Linux/WSL:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `SSL_CERT_FILE`
34
-
35
- ## How it works
36
-
37
- Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
38
-
39
- ### Layer 1: System prompt compression
40
-
41
- The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
42
-
43
- ### Layer 2: Deterministic preprocessing
44
-
45
- Zero-latency, rule-based transformations applied to every tool result:
46
-
47
- - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
48
- - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
49
- - **Minification:** JSON whitespace, collapsed blank lines
50
-
51
- ### Layer 3: Tool-specific patterns (~30 rules)
52
-
53
- Each tool result is matched against specialized compression rules:
54
-
55
- | Category | Tools | What it does |
56
- |----------|-------|-------------|
57
- | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
58
- | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
59
- | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
60
- | Build | next build, cargo build | Errors only |
61
- | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
62
- | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
63
- | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
64
-
65
- ### Exclusive patterns
66
-
67
- Applied to specific content types regardless of tool:
68
-
69
- - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
70
- - **Large code files** (>500 lines) → imports + function/class signatures only
71
- - **Long output** (>200 lines) → head + tail + omission note
72
- - **Grep results** → grouped by file, matches capped
73
- - **Glob results** (>30 files) → directory tree summary
74
- - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
75
-
76
- ### Adaptive pressure
77
-
78
- Compression aggressiveness scales with context window usage:
79
-
80
- | Context usage | Threshold | Behavior |
81
- |--------------|-----------|----------|
82
- | < 50% | 1,500 chars | Light — only compress large results |
83
- | 50–75% | 800 chars | Normal — standard compression |
84
- | 75–90% | 400 chars | Aggressive — compress most results |
85
- | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
86
-
87
- ### Session optimizations
88
-
89
- - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
90
- - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
91
- - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
92
- - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
93
-
94
- ## Codex support (MITM proxy)
95
-
96
- Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
97
-
98
- The MITM proxy **only intercepts `chatgpt.com`** traffic. All other HTTPS requests (npm, git, curl, etc.) pass through as a transparent TCP tunnel — no certificate needed, no interference.
99
-
100
- ## Configuration
101
-
102
- ### Global config: `squeezr.toml` (next to the binary)
103
-
104
- ```toml
105
- [proxy]
106
- port = 8080 # HTTP proxy (Claude, Aider, Gemini)
107
- mitm_port = 8081 # MITM proxy (Codex); defaults to port + 1
108
-
109
- [compression]
110
- threshold = 800 # min chars to trigger compression
111
- keep_recent = 3 # last N results left uncompressed
112
- compress_system_prompt = true
113
- compress_conversation = false # aggressive: compress assistant messages too
114
- # skip_tools = ["Read"] # skip ALL compression for these tools (deterministic + AI)
115
- # only_tools = ["Bash"] # only compress these tools
116
- ai_skip_tools = ["Read"] # skip AI compression only (default); deterministic still runs
117
-
118
- [cache]
119
- enabled = true
120
- max_entries = 1000
121
-
122
- [adaptive]
123
- enabled = true
124
- low_threshold = 1500
125
- mid_threshold = 800
126
- high_threshold = 400
127
- critical_threshold = 150
128
-
129
- [local]
130
- enabled = true
131
- upstream_url = "http://localhost:11434" # Ollama
132
- compression_model = "qwen2.5-coder:1.5b"
133
- ```
134
-
135
- ### Project config: `.squeezr.toml` (in project root)
136
-
137
- Project-level config is deep-merged over global config. Useful for per-repo tuning.
138
-
139
- ### Environment variables
140
-
141
- | Variable | Default | Description |
142
- |----------|---------|-------------|
143
- | `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
144
- | `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
145
- | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
146
- | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
147
- | `SQUEEZR_DISABLED` | `false` | Disable all compression |
148
- | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
149
- | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
150
- | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
151
-
152
- ### Per-command skip
153
-
154
- Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
155
-
156
- ## Compression backends
157
-
158
- Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
159
-
160
- | Backend | Model | Used for | Cost |
161
- |---------|-------|----------|------|
162
- | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
163
- | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
164
- | Gemini | Flash-8B | Fallback compression | Free |
165
- | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
166
- | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
167
-
168
- ### Typical savings
169
-
170
- - **Per tool result:** 70–95% reduction depending on tool
171
- - **Per session (2 hours):** ~200K tokens → ~80K tokens (60% savings)
172
- - **System prompt:** ~13KB → ~600 tokens (cached)
173
-
174
- ## CLI commands
175
-
176
- ```bash
177
- squeezr setup # configure env vars, auto-start, CA trust, install MCP server
178
- squeezr start # start the proxy (auto-restarts if version mismatch after update)
179
- squeezr update # kill old processes, install latest from npm, restart
180
- squeezr stop # stop the proxy
181
- squeezr status # check if proxy is running
182
- squeezr logs # show last 50 log lines
183
- squeezr config # print current config
184
- squeezr ports # change HTTP and MITM proxy ports
185
- squeezr gain # estimate token savings for a directory
186
- squeezr discover # detect which AI CLIs are installed
187
- squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
188
- squeezr mcp uninstall # remove MCP server registration
189
- squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
190
- squeezr version # print version
191
- ```
192
-
193
- ## MCP server
194
-
195
- Squeezr ships with a built-in MCP server (`squeezr-mcp`) that gives any MCP-capable AI CLI real-time awareness of Squeezr's state and control over it.
196
-
197
- **Installed automatically** by `squeezr setup` into Claude Code, Cursor, Windsurf, and Cline.
198
-
199
- Available MCP tools:
200
-
201
- | Tool | Description |
202
- |---|---|
203
- | `squeezr_status` | Is proxy running? Version, port, uptime, mode |
204
- | `squeezr_stats` | Token savings, compression %, cost saved, per-tool breakdown |
205
- | `squeezr_set_mode` | Change compression mode instantly (soft / normal / aggressive / critical) |
206
- | `squeezr_config` | Current thresholds, keepRecent, cache sizes |
207
- | `squeezr_habits` | Detect wasteful patterns this session (duplicate reads, high Bash count, cache efficiency) |
208
-
209
- ## Requirements
210
-
211
- - Node.js 18+ (compatible with Node.js 24)
212
- - For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
213
- - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
214
-
215
- ## License
216
-
217
- MIT
1
+ # Squeezr
2
+
3
+ **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session. Includes a real-time web dashboard and MCP integration.
4
+
5
+ [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE)
6
+
7
+ ## Supported CLIs
8
+
9
+ | CLI | Protocol | Proxy method |
10
+ |-----|----------|--------------|
11
+ | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
+ | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
+ | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
+ | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
+ | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
+ | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
+ | **Cursor IDE** | **ConnectRPC/HTTP2 to api2.cursor.sh** | **`squeezr cursor` — MITM proxy on :8082** |
18
+ | Continue (VS Code) | HTTP to OpenAI-compat | `apiBase: http://localhost:8080/v1` |
19
+
20
+ Works with both API keys and subscription plans (OAuth) — Claude Code Max/Pro, OpenAI Plus, etc.
21
+
22
+ ## Quick start
23
+
24
+ ```bash
25
+ npm install -g squeezr-ai
26
+ squeezr setup # configures env vars, auto-start, CA trust, and MCP server
27
+ squeezr start
28
+ ```
29
+
30
+ `squeezr setup` handles everything automatically:
31
+ - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `NODE_EXTRA_CA_CERTS`
32
+ - Installs a shell wrapper (PowerShell on Windows, bash/zsh on Linux/macOS/WSL) that auto-refreshes env vars after `squeezr start/setup/update`, so there is no need to restart the terminal
33
+ - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
34
+ - Registers the MCP server in Claude Code, Cursor, Windsurf, and Cline
35
+ - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
36
+ - **macOS/Linux/WSL:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `NODE_EXTRA_CA_CERTS`
37
+
38
+ ## How it works
39
+
40
+ Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
41
+
42
+ ### Layer 1: System prompt compression
43
+
44
+ The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
45
+
46
+ ### Layer 2: Deterministic preprocessing
47
+
48
+ Zero-latency, rule-based transformations applied to every tool result:
49
+
50
+ - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
51
+ - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
52
+ - **Minification:** JSON whitespace, collapsed blank lines
53
+
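The rule-based transformations listed above can be sketched roughly as follows. This is a minimal illustration, not Squeezr's actual code: the names `ANSI_PATTERN`, `preprocess`, and `minifyJson` are hypothetical, and it assumes that "duplicate lines" means consecutive repeats.

```javascript
// Hypothetical sketch of a deterministic preprocessing pass (not Squeezr's source).
// Matches ANSI CSI escape sequences such as colour codes: ESC [ params final-byte.
const ANSI_PATTERN = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

function preprocess(text) {
  const lines = text.replace(ANSI_PATTERN, '').split('\n');
  const out = [];
  for (const line of lines) {
    const trimmed = line.trimEnd();
    // Skip a line identical to the previous kept line. Because blank lines
    // compare equal too, this also collapses runs of blank lines into one.
    if (out.length > 0 && trimmed === out[out.length - 1]) continue;
    out.push(trimmed);
  }
  return out.join('\n');
}

function minifyJson(text) {
  try {
    // Re-serialising drops all insignificant whitespace.
    return JSON.stringify(JSON.parse(text));
  } catch {
    return text; // not valid JSON: leave the input untouched
  }
}
```

Both functions are pure string transformations, which is consistent with the "zero-latency" claim: no network call or model is involved.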
54
+ ### Layer 3: Tool-specific patterns (~30 rules)
55
+
56
+ Each tool result is matched against specialized compression rules:
57
+
58
+ | Category | Tools | What it does |
59
+ |----------|-------|--------------|
60
+ | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
61
+ | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
62
+ | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
63
+ | Build | next build, cargo build | Errors only |
64
+ | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
65
+ | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
66
+ | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
67
+
68
+ ### Exclusive patterns
69
+
70
+ Applied to specific content types regardless of tool:
71
+
72
+ - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
73
+ - **Large code files** (>500 lines) → imports + function/class signatures only
74
+ - **Long output** (>200 lines) → head + tail + omission note
75
+ - **Grep results** → grouped by file, matches capped
76
+ - **Glob results** (>30 files) → directory tree summary
77
+ - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
78
+
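As an illustration of the "long output" pattern above (head + tail + omission note), here is a small sketch. The function name `headTail`, the `keep` size of 40 lines per side, and the wording of the omission note are assumptions; the README only states the 200-line trigger.

```javascript
// Hypothetical sketch: truncate long output to head + tail with an omission note.
// maxLines mirrors the README's ">200 lines" trigger; keep is an assumed value.
function headTail(text, maxLines = 200, keep = 40) {
  const lines = text.split('\n');
  if (lines.length <= maxLines) return text; // short enough: return unchanged
  const omitted = lines.length - 2 * keep;
  return [
    ...lines.slice(0, keep),
    `[... ${omitted} lines omitted ...]`,
    // slice(length - keep) rather than slice(-keep) so that keep = 0 yields no tail
    ...lines.slice(lines.length - keep),
  ].join('\n');
}
```

Keeping both ends preserves the command banner at the top and the final status or error at the bottom, which is usually where the useful signal sits.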
79
+ ### Adaptive pressure
80
+
81
+ Compression aggressiveness scales with context window usage:
82
+
83
+ | Context usage | Threshold | Behavior |
84
+ |---------------|-----------|----------|
85
+ | < 50% | 1,500 chars | Light — only compress large results |
86
+ | 50–75% | 800 chars | Normal — standard compression |
87
+ | 75–90% | 400 chars | Aggressive — compress most results |
88
+ | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
89
+
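The table above maps context usage to a minimum result size. A sketch of that lookup is shown below; `thresholdForUsage` and `shouldCompress` are hypothetical names, and the handling of the exact boundary values (50%, 75%, 90%) is an assumption, since the table does not say which band they fall into.

```javascript
// Hypothetical sketch of the adaptive-pressure lookup (not Squeezr's source).
// usage is the fraction of the context window in use, from 0 to 1.
function thresholdForUsage(usage) {
  if (usage > 0.9) return 150;   // critical
  if (usage >= 0.75) return 400; // aggressive (assumes 75% and 90% belong here)
  if (usage >= 0.5) return 800;  // normal (assumes 50% belongs here)
  return 1500;                   // light
}

// A tool result is a compression candidate once it reaches the current threshold.
function shouldCompress(result, usage) {
  return result.length >= thresholdForUsage(usage);
}
```

For example, a 1,000-character result would be left alone at 30% usage but compressed at 60%.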
90
+ ### Session optimizations
91
+
92
+ - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
93
+ - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
94
+ - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
95
+ - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
96
+
97
+ ## Web dashboard
98
+
99
+ Live dashboard at `http://localhost:PORT/squeezr/dashboard` with 5 pages:
100
+
101
+ | Page | What it shows |
102
+ |------|---------------|
103
+ | **Overview** | Tokens saved, compression %, requests, cost saved, per-tool breakdown, sparkline chart, context pressure bars, active project badge, savings breakdown (deterministic, AI, dedup, system prompt, overhead) |
104
+ | **Projects** | Per-project aggregate stats across all sessions, auto-detected from working directory or set manually via MCP |
105
+ | **History** | Past proxy sessions grouped by project and day — start/end time, duration, request count, tokens saved, relative timestamps |
106
+ | **Limits** | Real-time rate limit gauges per CLI: Anthropic token/request limits, OpenAI billing & credit balance, Gemini 429 tracking, input/output token usage (session + daily), personal monthly budget bar |
107
+ | **Settings** | Compression mode selector (Soft/Normal/Aggressive/Critical), threshold tuning |
108
+
109
+ Updates every 2 seconds via SSE. Works with both API key and subscription (OAuth) authentication.
110
+
111
+ ## MCP server
112
+
113
+ Built-in MCP server (`squeezr-mcp`) that gives any MCP-capable AI CLI real-time awareness and control of Squeezr.
114
+
115
+ **Installed automatically** by `squeezr setup` into Claude Code, Cursor, Windsurf, and Cline.
116
+
117
+ | Tool | Description |
118
+ |------|-------------|
119
+ | `squeezr_status` | Is proxy running? Version, port, uptime, mode, dry-run state |
120
+ | `squeezr_stats` | Token savings, compression %, cost saved, savings breakdown (deterministic/AI/dedup/system prompt/overhead), per-tool breakdown |
121
+ | `squeezr_set_mode` | Change compression mode instantly (soft / normal / aggressive / critical) |
122
+ | `squeezr_config` | Current thresholds, keepRecent, cache sizes, AI-skipped tools |
123
+ | `squeezr_habits` | Detect wasteful patterns this session (duplicate reads, high Bash count, cache efficiency) |
124
+ | `squeezr_stop` | Stop the proxy gracefully (persists caches before exit) |
125
+ | `squeezr_check_updates` | Check npm for newer Squeezr version |
126
+ | `squeezr_update` | Update to latest version via `npm install -g squeezr-ai@latest` |
127
+ | `squeezr_set_project` | Manually set/clear the current project name (overrides auto-detection) |
128
+
129
+ Every MCP tool response automatically checks for updates and appends a notification banner when a new version is available.
130
+
131
+ ## Honest savings tracking
132
+
133
+ Squeezr tracks token savings with full transparency. `squeezr gain` and the dashboard break down savings by source:
134
+
135
+ | Source | Description |
136
+ |--------|-------------|
137
+ | Deterministic | Rule-based preprocessing (ANSI strip, dedup, minification); free, zero latency |
138
+ | AI compression | Haiku/GPT-mini summarization of tool results — near-free, slight latency |
139
+ | Read dedup | Cross-turn deduplication of repeated file reads |
140
+ | System prompt | One-time AI compression of the system prompt, cached across requests |
141
+ | Tag overhead | Bytes added by `[squeezr:ID]` markers (subtracted from savings) |
142
+ | AI cost | Estimated token cost of compression API calls (subtracted from NET) |
143
+
144
+ **NET savings** = total savings − tag overhead − AI compression cost.
145
+
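The NET figure can be worked through with a small sketch. It assumes that "total savings" is the sum of the four saving sources in the table and that every quantity has already been converted to tokens; the function and field names are hypothetical.

```javascript
// Hypothetical sketch of the NET savings arithmetic (all values assumed to be tokens).
function netSavings({ deterministic, ai, readDedup, systemPrompt, tagOverhead, aiCost }) {
  // Assumption: total savings is the sum of the four saving sources.
  const total = deterministic + ai + readDedup + systemPrompt;
  // Both overhead terms are subtracted, as the table describes.
  return total - tagOverhead - aiCost;
}

const example = netSavings({
  deterministic: 5000,
  ai: 3000,
  readDedup: 1500,
  systemPrompt: 3000,
  tagOverhead: 200,
  aiCost: 300,
});
console.log(example); // 12000
```

With these illustrative numbers, 12,500 tokens of gross savings shrink to 12,000 once the marker overhead and the compression calls are charged against them.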
146
+ ### `squeezr gain` subcommands
147
+
148
+ ```bash
149
+ squeezr gain # all-time savings summary
150
+ squeezr gain --session # live session savings from the running proxy
151
+ squeezr gain --details # all-time stats with per-tool breakdown
152
+ squeezr gain --reset # reset all-time counters
153
+ ```
154
+
155
+ ## Project tracking
156
+
157
+ Squeezr automatically detects the active project from the CLI's working directory (e.g. Claude Code's `<cwd>` tag in the system prompt). Per-project stats are tracked across sessions.
158
+
159
+ - **Auto-detection:** extracts the project name from the last meaningful path segment
160
+ - **Manual override:** `squeezr_set_project` MCP tool or `POST /squeezr/project` REST endpoint
161
+ - **Per-project stats:** visible on the Dashboard's Projects page and in `squeezr gain --session`
162
+
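Auto-detection from "the last meaningful path segment" might look something like the sketch below. The README does not define "meaningful", so the list of generic directory names skipped here (`GENERIC_SEGMENTS`) is purely an assumption, as is the function name `detectProject`.

```javascript
// Hypothetical sketch of project-name detection from a working directory.
// Assumption: generic folder names are not meaningful and are skipped.
const GENERIC_SEGMENTS = new Set(['src', 'app', 'lib', 'dist', 'build']);

function detectProject(cwd) {
  // Split on both POSIX and Windows separators, dropping empty segments.
  const segments = cwd.split(/[\\/]+/).filter(Boolean);
  for (let i = segments.length - 1; i >= 0; i--) {
    if (!GENERIC_SEGMENTS.has(segments[i].toLowerCase())) return segments[i];
  }
  return null; // nothing usable, e.g. the filesystem root
}
```

A manual override via `squeezr_set_project` would simply take precedence over whatever this kind of heuristic returns.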
163
+ ## Codex support (MITM proxy)
164
+
165
+ Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
166
+
167
+ The MITM proxy **only intercepts `chatgpt.com`** traffic. All other HTTPS requests (npm, git, curl, etc.) pass through as a transparent TCP tunnel — no certificate needed, no interference.
168
+
169
+ ## Configuration
170
+
171
+ ### Global config: `squeezr.toml` (next to the binary)
172
+
173
+ ```toml
174
+ # Compression thresholds
175
+ threshold = 800 # min chars to apply compression
176
+ keep_recent = 3 # skip the N most recent tool results
177
+ ai_compression = false # enable AI (Haiku) for tool result compression
178
+
179
+ # Ports
180
+ port = 8080 # HTTP proxy port
181
+ mitm_port = 8081 # MITM proxy port (Codex)
182
+
183
+ # Models
184
+ local_model = "qwen2.5-coder:1.5b" # model for local compression
185
+ local_upstream = "http://localhost:11434"
186
+
187
+ # Tools to never AI-compress (deterministic-only)
188
+ ai_skip_tools = ["Read", "View"]
189
+
190
+ # Compression modes override thresholds
191
+ [modes.soft]
192
+ threshold = 1500
193
+ keep_recent = 10
194
+ ai_compression = false
195
+
196
+ [modes.normal]
197
+ threshold = 800
198
+ keep_recent = 3
199
+
200
+ [modes.aggressive]
201
+ threshold = 200
202
+ keep_recent = 1
203
+ ai_compression = true
204
+
205
+ [modes.critical]
206
+ threshold = 50
207
+ keep_recent = 0
208
+ ai_compression = true
209
+ ```
210
+
211
+ ### Project-level config: `squeezr.project.toml` (in project root)
212
+
213
+ Project-level config is deep-merged over global config. Useful for per-repo tuning.
214
+
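To make "deep-merged" concrete, here is a sketch operating on already-parsed config objects (TOML parsing is not shown). `deepMerge` and `isPlainObject` are hypothetical helpers, and replacing arrays wholesale rather than concatenating them is an assumption.

```javascript
// Hypothetical sketch of merging a project config over the global config.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function deepMerge(base, override) {
  const out = { ...base };
  for (const [key, value] of Object.entries(override)) {
    // Recurse only when both sides are tables; scalars and arrays are replaced.
    out[key] = isPlainObject(value) && isPlainObject(base[key])
      ? deepMerge(base[key], value)
      : value;
  }
  return out;
}

// Values taken from the global config shown in the README; the project
// overrides are illustrative.
const globalConfig = {
  threshold: 800,
  keep_recent: 3,
  ai_skip_tools: ['Read', 'View'],
  modes: { aggressive: { threshold: 200, keep_recent: 1 } },
};
const projectConfig = {
  threshold: 400,
  modes: { aggressive: { keep_recent: 0 } },
};
const merged = deepMerge(globalConfig, projectConfig);
```

Here `merged.threshold` becomes 400 while `merged.modes.aggressive.threshold` stays at 200, because only the keys present in the project file are overridden.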
215
+ ### Environment variables
216
+
217
+ | Variable | Default | Description |
218
+ |----------|---------|-------------|
219
+ | `SQUEEZR_PORT` | `8080` | HTTP proxy port (Claude, Aider, Gemini) |
220
+ | `SQUEEZR_MITM_PORT` | `8081` | MITM proxy port (Codex) — defaults to SQUEEZR_PORT + 1 |
221
+ | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
222
+ | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
223
+ | `SQUEEZR_DISABLED` | `false` | Disable all compression |
224
+ | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
225
+ | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
226
+ | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
227
+
228
+ ### Per-command skip
229
+
230
+ Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
231
+
232
+ ## CLI commands
233
+
234
+ ```bash
235
+ squeezr setup # configure env vars, auto-start, CA trust, install MCP server
236
+ squeezr start # start the proxy (auto-restarts if version mismatch after update)
237
+ squeezr update # kill old processes, install latest from npm, restart
238
+ squeezr stop # stop the proxy
239
+ squeezr status # check if proxy is running
240
+ squeezr logs # show last 50 log lines
241
+ squeezr config # print current config
242
+ squeezr ports # change HTTP and MITM proxy ports
243
+ squeezr gain # all-time token savings summary
244
+ squeezr gain --session # live session savings from the running proxy
245
+ squeezr gain --details # all-time stats with per-tool breakdown
246
+ squeezr gain --reset # reset all-time counters
247
+ squeezr discover # detect which AI CLIs are installed
248
+ squeezr mcp install # register MCP server in Claude Code, Cursor, Windsurf, Cline
249
+ squeezr mcp uninstall # remove MCP server registration
250
+ squeezr uninstall # remove Squeezr completely (env vars, CA, auto-start, logs)
251
+ squeezr version # print version
252
+ ```
253
+
254
+ ## Compression backends
255
+
256
+ Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
257
+
258
+ | Backend | Model | Used for | Cost |
259
+ |---------|-------|----------|------|
260
+ | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
261
+ | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
262
+ | Gemini | Flash-8B | Fallback compression | Free |
263
+ | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
264
+ | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
265
+
266
+ ## Requirements
267
+
268
+ - Node.js 18+ (compatible with Node.js 24)
269
+ - For Codex MITM: set `HTTPS_PROXY=http://localhost:8081` in the terminal where you run Codex (not set globally to avoid interfering with other tools)
270
+ - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
271
+
272
+ ## License
273
+
274
+ MIT
package/bin/squeezr.js CHANGED
@@ -1,4 +1,4 @@
1
- #!/usr/bin/env node
1
+ #!/usr/bin/env node
2
2
 
3
3
  import { spawn, execSync } from 'child_process'
4
4
  import http from 'http'