squeezr-ai 1.14.7 → 1.14.8

package/README.md CHANGED
@@ -1,188 +1,188 @@
1
- # Squeezr
2
-
3
- **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session.
4
-
5
- [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE) [![tests](https://img.shields.io/badge/tests-190%20passing-brightgreen)]()
6
-
7
- ## Supported CLIs
8
-
9
- | CLI | Protocol | Proxy method |
10
- |-----|----------|-------------|
11
- | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
- | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
- | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
- | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
- | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
- | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
-
18
- ## Quick start
19
-
20
- ```bash
21
- npm install -g squeezr-ai
22
- squeezr setup # configures env vars, auto-start, and CA trust
23
- squeezr start
24
- ```
25
-
26
- `squeezr setup` handles everything automatically:
27
- - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `HTTPS_PROXY`, `NODE_EXTRA_CA_CERTS`, `NO_PROXY`
28
- - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
29
- - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
30
- - **macOS/Linux:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `SSL_CERT_FILE`
31
-
32
- ## How it works
33
-
34
- Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
35
-
36
- ### Layer 1: System prompt compression
37
-
38
- The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
39
-
40
- ### Layer 2: Deterministic preprocessing
41
-
42
- Zero-latency, rule-based transformations applied to every tool result:
43
-
44
- - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
45
- - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
46
- - **Minification:** JSON whitespace, collapsed blank lines
47
-
48
- ### Layer 3: Tool-specific patterns (~30 rules)
49
-
50
- Each tool result is matched against specialized compression rules:
51
-
52
- | Category | Tools | What it does |
53
- |----------|-------|-------------|
54
- | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
55
- | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
56
- | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
57
- | Build | next build, cargo build | Errors only |
58
- | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
59
- | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
60
- | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
61
-
62
- ### Exclusive patterns
63
-
64
- Applied to specific content types regardless of tool:
65
-
66
- - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
67
- - **Large code files** (>500 lines) → imports + function/class signatures only
68
- - **Long output** (>200 lines) → head + tail + omission note
69
- - **Grep results** → grouped by file, matches capped
70
- - **Glob results** (>30 files) → directory tree summary
71
- - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
72
-
73
- ### Adaptive pressure
74
-
75
- Compression aggressiveness scales with context window usage:
76
-
77
- | Context usage | Threshold | Behavior |
78
- |--------------|-----------|----------|
79
- | < 50% | 1,500 chars | Light — only compress large results |
80
- | 50–75% | 800 chars | Normal — standard compression |
81
- | 75–90% | 400 chars | Aggressive — compress most results |
82
- | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
83
-
84
- ### Session optimizations
85
-
86
- - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
87
- - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
88
- - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
89
- - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
90
-
91
- ## Codex support (MITM proxy)
92
-
93
- Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
94
-
95
- ## Configuration
96
-
97
- ### Global config: `squeezr.toml` (next to the binary)
98
-
99
- ```toml
100
- [proxy]
101
- port = 8080
102
-
103
- [compression]
104
- threshold = 800 # min chars to trigger compression
105
- keep_recent = 3 # last N results left uncompressed
106
- compress_system_prompt = true
107
- compress_conversation = false # aggressive: compress assistant messages too
108
- # skip_tools = ["Read"] # never compress these tools
109
- # only_tools = ["Bash"] # only compress these tools
110
-
111
- [cache]
112
- enabled = true
113
- max_entries = 1000
114
-
115
- [adaptive]
116
- enabled = true
117
- low_threshold = 1500
118
- mid_threshold = 800
119
- high_threshold = 400
120
- critical_threshold = 150
121
-
122
- [local]
123
- enabled = true
124
- upstream_url = "http://localhost:11434" # Ollama
125
- compression_model = "qwen2.5-coder:1.5b"
126
- ```
127
-
128
- ### Project config: `.squeezr.toml` (in project root)
129
-
130
- Project-level config is deep-merged over global config. Useful for per-repo tuning.
131
-
132
- ### Environment variables
133
-
134
- | Variable | Default | Description |
135
- |----------|---------|-------------|
136
- | `SQUEEZR_PORT` | `8080` | Proxy port (MITM port = this + 1) |
137
- | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
138
- | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
139
- | `SQUEEZR_DISABLED` | `false` | Disable all compression |
140
- | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
141
- | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
142
- | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
143
-
144
- ### Per-command skip
145
-
146
- Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
147
-
148
- ## Compression backends
149
-
150
- Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
151
-
152
- | Backend | Model | Used for | Cost |
153
- |---------|-------|----------|------|
154
- | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
155
- | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
156
- | Gemini | Flash-8B | Fallback compression | Free |
157
- | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
158
- | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
159
-
160
- ### Typical savings
161
-
162
- - **Per tool result:** 70–95% reduction depending on tool
163
- - **Per session (2 hours):** ~200K tokens → ~80K tokens (60% savings)
164
- - **System prompt:** ~13KB → ~600 tokens (cached)
165
-
166
- ## CLI commands
167
-
168
- ```bash
169
- squeezr setup # configure env vars, auto-start, CA trust
170
- squeezr start # start the proxy (foreground)
171
- squeezr stop # stop the proxy
172
- squeezr status # check if proxy is running
173
- squeezr logs # show last 50 log lines
174
- squeezr config # print current config
175
- squeezr gain # estimate token savings for a directory
176
- squeezr discover # detect which AI CLIs are installed
177
- squeezr version # print version
178
- ```
179
-
180
- ## Requirements
181
-
182
- - Node.js 18+
183
- - For Codex MITM: `HTTPS_PROXY=http://localhost:8081` (set automatically by `squeezr setup`)
184
- - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
185
-
186
- ## License
187
-
188
- MIT
1
+ # Squeezr
2
+
3
+ **Token compression proxy for AI coding CLIs.** Sits between your CLI and the API, compresses context on the fly, saves thousands of tokens per session.
4
+
5
+ [![npm](https://img.shields.io/npm/v/squeezr-ai)](https://www.npmjs.com/package/squeezr-ai) [![license](https://img.shields.io/npm/l/squeezr-ai)](LICENSE) [![tests](https://img.shields.io/badge/tests-190%20passing-brightgreen)]()
6
+
7
+ ## Supported CLIs
8
+
9
+ | CLI | Protocol | Proxy method |
10
+ |-----|----------|-------------|
11
+ | Claude Code | HTTP to Anthropic API | `ANTHROPIC_BASE_URL=http://localhost:8080` |
12
+ | Aider | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
13
+ | OpenCode | HTTP to Anthropic/OpenAI API | `ANTHROPIC_BASE_URL` / `openai_base_url` |
14
+ | Gemini CLI | HTTP to Gemini API | `GEMINI_API_BASE_URL=http://localhost:8080` |
15
+ | Ollama | HTTP (local) | Transparent via dummy API key detection |
16
+ | **Codex** | **WebSocket to chatgpt.com** | **TLS-terminating MITM proxy on :8081** |
17
+
18
+ ## Quick start
19
+
20
+ ```bash
21
+ npm install -g squeezr-ai
22
+ squeezr setup # configures env vars, auto-start, and CA trust
23
+ squeezr start
24
+ ```
25
+
26
+ `squeezr setup` handles everything automatically:
27
+ - Sets `ANTHROPIC_BASE_URL`, `GEMINI_API_BASE_URL`, `HTTPS_PROXY`, `NODE_EXTRA_CA_CERTS`, `NO_PROXY`
28
+ - Registers auto-start (launchd on macOS, systemd on Linux, Task Scheduler/NSSM on Windows)
29
+ - **Windows:** imports the MITM CA into the Windows Certificate Store (user-level, no admin required) so Rust-based CLIs like Codex trust the proxy's TLS certificates
30
+ - **macOS/Linux:** generates a CA bundle at `~/.squeezr/mitm-ca/bundle.crt` for `SSL_CERT_FILE`
31
+
32
+ ## How it works
33
+
34
+ Every request from your AI CLI passes through Squeezr on `localhost:8080`. The proxy applies three compression layers before forwarding to the upstream API:
35
+
36
+ ### Layer 1: System prompt compression
37
+
38
+ The system prompt (~13KB for Claude Code) is compressed once using an AI model and cached. Subsequent requests reuse the cached version. Saves ~3,000 tokens per request.
39
+
40
+ ### Layer 2: Deterministic preprocessing
41
+
42
+ Zero-latency, rule-based transformations applied to every tool result:
43
+
44
+ - **Noise removal:** ANSI escape codes, progress bars, timestamps, spinner output
45
+ - **Deduplication:** repeated stack frames, duplicate lines, redundant git hunks
46
+ - **Minification:** JSON whitespace, collapsed blank lines
47
+
48
+ ### Layer 3: Tool-specific patterns (~30 rules)
49
+
50
+ Each tool result is matched against specialized compression rules:
51
+
52
+ | Category | Tools | What it does |
53
+ |----------|-------|-------------|
54
+ | Git | diff, log, status, branch | 1-line diff context, capped log, compact status |
55
+ | JS/TS | vitest, jest, playwright, tsc, eslint, biome, prettier | Failures/errors only, grouped by file |
56
+ | Package managers | pnpm, npm | Install summary, list capped at 30, outdated only |
57
+ | Build | next build, cargo build | Errors only |
58
+ | Test | cargo test, pytest, go test | FAIL blocks + tracebacks only |
59
+ | Infra | terraform, docker, kubectl | Resource changes, compact tables, last 50 log lines |
60
+ | Other | prisma, gh CLI, curl/wget | Strip ASCII art, cap output, remove verbose headers |
61
+
62
+ ### Exclusive patterns
63
+
64
+ Applied to specific content types regardless of tool:
65
+
66
+ - **Lockfiles** (package-lock.json, Cargo.lock, etc.) → dependency count summary
67
+ - **Large code files** (>500 lines) → imports + function/class signatures only
68
+ - **Long output** (>200 lines) → head + tail + omission note
69
+ - **Grep results** → grouped by file, matches capped
70
+ - **Glob results** (>30 files) → directory tree summary
71
+ - **Noisy output** (>50% non-essential) → auto-extract errors/warnings
72
+
73
+ ### Adaptive pressure
74
+
75
+ Compression aggressiveness scales with context window usage:
76
+
77
+ | Context usage | Threshold | Behavior |
78
+ |--------------|-----------|----------|
79
+ | < 50% | 1,500 chars | Light — only compress large results |
80
+ | 50–75% | 800 chars | Normal — standard compression |
81
+ | 75–90% | 400 chars | Aggressive — compress most results |
82
+ | > 90% | 150 chars | Critical — compress everything, 0 git diff context |
83
+
84
+ ### Session optimizations
85
+
86
+ - **Session cache:** After ~50 tool results, older results are batch-summarized into a single compact block
87
+ - **KV cache warming:** Deterministic MD5-based IDs keep compressed content prefix-stable across requests
88
+ - **Cross-turn dedup:** If the same file is read multiple times, earlier reads are replaced with reference pointers
89
+ - **Expand on demand:** Compressed blocks include a `squeezr_expand(id)` callback to retrieve full content
90
+
91
+ ## Codex support (MITM proxy)
92
+
93
+ Codex uses WebSocket over TLS to `chatgpt.com` with OAuth authentication — it cannot be proxied via `OPENAI_BASE_URL`. Squeezr runs a TLS-terminating MITM proxy on port 8081 that intercepts and compresses WebSocket frames. See [CODEX.md](CODEX.md) for the full technical breakdown.
94
+
95
+ ## Configuration
96
+
97
+ ### Global config: `squeezr.toml` (next to the binary)
98
+
99
+ ```toml
100
+ [proxy]
101
+ port = 8080
102
+
103
+ [compression]
104
+ threshold = 800 # min chars to trigger compression
105
+ keep_recent = 3 # last N results left uncompressed
106
+ compress_system_prompt = true
107
+ compress_conversation = false # aggressive: compress assistant messages too
108
+ # skip_tools = ["Read"] # never compress these tools
109
+ # only_tools = ["Bash"] # only compress these tools
110
+
111
+ [cache]
112
+ enabled = true
113
+ max_entries = 1000
114
+
115
+ [adaptive]
116
+ enabled = true
117
+ low_threshold = 1500
118
+ mid_threshold = 800
119
+ high_threshold = 400
120
+ critical_threshold = 150
121
+
122
+ [local]
123
+ enabled = true
124
+ upstream_url = "http://localhost:11434" # Ollama
125
+ compression_model = "qwen2.5-coder:1.5b"
126
+ ```
127
+
128
+ ### Project config: `.squeezr.toml` (in project root)
129
+
130
+ Project-level config is deep-merged over global config. Useful for per-repo tuning.
131
+
132
+ ### Environment variables
133
+
134
+ | Variable | Default | Description |
135
+ |----------|---------|-------------|
136
+ | `SQUEEZR_PORT` | `8080` | Proxy port (MITM port = this + 1) |
137
+ | `SQUEEZR_THRESHOLD` | `800` | Min chars to compress |
138
+ | `SQUEEZR_KEEP_RECENT` | `3` | Recent results to skip |
139
+ | `SQUEEZR_DISABLED` | `false` | Disable all compression |
140
+ | `SQUEEZR_DRY_RUN` | `false` | Log savings without compressing |
141
+ | `SQUEEZR_LOCAL_UPSTREAM` | `http://localhost:11434` | Ollama/LM Studio URL |
142
+ | `SQUEEZR_LOCAL_MODEL` | `qwen2.5-coder:1.5b` | Local model for compression |
143
+
144
+ ### Per-command skip
145
+
146
+ Add `# squeezr:skip` anywhere in a Bash command to bypass compression for that result.
147
+
148
+ ## Compression backends
149
+
150
+ Squeezr uses cheap/free models for AI compression (the deterministic layer is pure regex, no API calls):
151
+
152
+ | Backend | Model | Used for | Cost |
153
+ |---------|-------|----------|------|
154
+ | Anthropic | Haiku | System prompt, session cache | ~$0.0001/call |
155
+ | OpenAI | GPT-4o-mini | Fallback compression | ~$0.0001/call |
156
+ | Gemini | Flash-8B | Fallback compression | Free |
157
+ | Local | qwen2.5-coder:1.5b | Compression when using Ollama | Free |
158
+ | ChatGPT (WS) | GPT-5.4-mini | Codex frame compression | $0 (same subscription) |
159
+
160
+ ### Typical savings
161
+
162
+ - **Per tool result:** 70–95% reduction depending on tool
163
+ - **Per session (2 hours):** ~200K tokens → ~80K tokens (60% savings)
164
+ - **System prompt:** ~13KB → ~600 tokens (cached)
165
+
166
+ ## CLI commands
167
+
168
+ ```bash
169
+ squeezr setup # configure env vars, auto-start, CA trust
170
+ squeezr start # start the proxy (foreground)
171
+ squeezr stop # stop the proxy
172
+ squeezr status # check if proxy is running
173
+ squeezr logs # show last 50 log lines
174
+ squeezr config # print current config
175
+ squeezr gain # estimate token savings for a directory
176
+ squeezr discover # detect which AI CLIs are installed
177
+ squeezr version # print version
178
+ ```
179
+
180
+ ## Requirements
181
+
182
+ - Node.js 18+
183
+ - For Codex MITM: `HTTPS_PROXY=http://localhost:8081` (set automatically by `squeezr setup`)
184
+ - For local compression: [Ollama](https://ollama.ai) with `qwen2.5-coder:1.5b`
185
+
186
+ ## License
187
+
188
+ MIT
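
The "Adaptive pressure" table in the README above maps context window usage to a character threshold. A minimal sketch of that lookup follows, written in Python for illustration only; it is not Squeezr's actual implementation. The names `AdaptiveThresholds`, `pick_threshold`, and `should_compress` are hypothetical. The handling of the exact 50%, 75%, and 90% boundaries, and the reading of a threshold as "compress when the result is at least this many characters", are assumptions that the README does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveThresholds:
    """Defaults mirror the [adaptive] section of squeezr.toml shown in the README."""
    low_threshold: int = 1500
    mid_threshold: int = 800
    high_threshold: int = 400
    critical_threshold: int = 150


def pick_threshold(context_usage: float,
                   thresholds: AdaptiveThresholds = AdaptiveThresholds()) -> int:
    """Return the minimum result size (in chars) that triggers compression.

    context_usage is the fraction of the context window in use (0.0 to 1.0).
    Boundary handling is an assumption: 50% falls in the "Normal" band,
    75% and 90% fall in the "Aggressive" band.
    """
    if not 0.0 <= context_usage <= 1.0:
        raise ValueError("context_usage must be between 0.0 and 1.0")
    if context_usage < 0.50:
        return thresholds.low_threshold       # Light
    if context_usage < 0.75:
        return thresholds.mid_threshold       # Normal
    if context_usage <= 0.90:
        return thresholds.high_threshold      # Aggressive
    return thresholds.critical_threshold      # Critical


def should_compress(result: str, context_usage: float) -> bool:
    """Assumed rule: compress a tool result once it reaches the active threshold."""
    return len(result) >= pick_threshold(context_usage)
```

Under these assumptions, a 500-character tool result is left alone while the context window is mostly empty and becomes a compression candidate once usage passes 75%.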