squeezr-ai 1.11.1 → 1.11.2

package/README.md CHANGED
@@ -107,6 +107,67 @@ Your provider's API (Anthropic / OpenAI / Google / Ollama)
 
  Recent content is always preserved untouched — by default the last 3 tool results are never compressed. Your CLI always has full context for what it's currently working on.
 
+ ### Does compression make the AI "dumber"?
+
+ No — it's the opposite. Without Squeezr, long sessions hit the context window limit and the CLI **silently drops old messages entirely**. You lose them with no way to get them back.
+
+ With Squeezr, old messages are **summarized, not deleted**. A 3,000-token git diff from message #15 becomes a ~150-token summary like:
+
+ ```
+ [squeezr:a3f2c1] git diff: modified src/auth.ts — validateToken:
+ added expiry logging + refreshToken call. 3 files changed.
+ ```
+
+ The AI knows *what* you did, not every exact line. And if it needs the full original, it calls `squeezr_expand(a3f2c1)` and gets it back — losslessly.
+
+ | Scenario | Message #15 at turn #100 |
+ |---|---|
+ | **No compression** | Probably dropped by the CLI (doesn't fit) |
+ | **With Squeezr** | Summarized but present, expandable on demand |
+
+ The trade-off: less detail, but more memory. Without Squeezr the AI forgets entire messages. With Squeezr it has a one-line note about every decision made — and can retrieve the full context when needed.
+
+ ---
+
+ ## Deterministic compression engine
+
+ Before any AI model is involved, Squeezr runs a full deterministic compression pipeline on every tool result. This is a zero-cost, zero-latency layer that handles the most common developer outputs with specialized parsers:
+
+ | Tool output | What Squeezr does | Typical savings |
+ |---|---|---|
+ | **git status** | Parses staged/modified/untracked, drops noise lines | 70-85% |
+ | **git diff** | Extracts changed function names, strips context lines (adaptive), summarizes large diffs | 65-92% |
+ | **git log** | Compacts to `hash msg (author, date)`, caps entries by pressure | 70-90% |
+ | **cargo test/build/clippy** | Extracts only failures, errors, warnings | 80-95% |
+ | **vitest/jest/playwright** | Extracts failed tests and assertion errors | 80-95% |
+ | **tsc** | Groups errors by file, keeps only error lines | 75-90% |
+ | **eslint/biome** | Compacts to file + rule + message | 70-85% |
+ | **prettier** | Keeps only files-changed summary | 80-90% |
+ | **next build** | Extracts errors and route summary | 75-85% |
+ | **pytest** | Extracts FAILED lines and short summaries | 80-95% |
+ | **npm/pnpm install** | Strips progress bars, keeps final summary | 85-90% |
+ | **npm outdated** | Compact table format | 60-75% |
+ | **docker ps/images/logs** | Compact output, strips timestamps | 70-80% |
+ | **kubectl get/describe/logs** | Strips timestamps, compacts tables | 70-80% |
+ | **gh pr/run/issue** | Strips decorations, keeps data | 65-75% |
+ | **curl/wget** | Strips progress, keeps response body/headers | 60-80% |
+ | **terraform plan/apply** | Extracts changes summary | 70-85% |
+ | **prisma** | Compacts migration and schema output | 65-80% |
+ | **Grep results** | Groups by file, caps matches per file (adaptive) | 60-80% |
+ | **Read (large files)** | >500 lines: imports + signatures only. >200 lines: head + tail | 70-95% |
+ | **Glob** | Compacts file listings into directory summaries | 50-70% |
+
+ Additionally, on **all** outputs regardless of tool:
+ - ANSI escape codes stripped
+ - Progress bars and spinners removed
+ - Repeated lines collapsed (`"... repeated N more times"`)
+ - Duplicate stack traces deduplicated (Node.js and Python)
+ - Inline JSON minified (objects >200 chars)
+ - Timestamps stripped (ISO 8601, bracketed, bare time formats)
+ - Excessive whitespace collapsed
+
+ This engine runs in pure Node.js — microseconds per result, no API calls, no cost. It handles the bulk of the compression work. The AI layer (Haiku/GPT-4o-mini) only kicks in afterward on older messages where further summarization is needed.
+
  ---
 
  ## Supported CLIs and providers
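
Two of the generic passes listed in the README addition above (ANSI stripping and collapsing repeated lines) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Squeezr's actual implementation: the function names `stripAnsi`, `collapseRepeats`, and `cleanOutput` and the regular expression are hypothetical, and only the `... repeated N more times` marker text is taken from the README.

```javascript
// Hypothetical sketch of two generic cleanup passes; not Squeezr's real code.

// Matches common ANSI CSI sequences such as "\x1b[31m" (colour) or "\x1b[2K" (erase line).
const ANSI_CSI = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

// Remove ANSI CSI escape sequences from a string.
function stripAnsi(text) {
  return text.replace(ANSI_CSI, '');
}

// Collapse each run of identical consecutive lines into the first line
// followed by a marker stating how many further copies were dropped.
function collapseRepeats(text) {
  const lines = text.split('\n');
  const out = [];
  let i = 0;
  while (i < lines.length) {
    let j = i + 1;
    while (j < lines.length && lines[j] === lines[i]) j++;
    out.push(lines[i]);
    const extra = j - i - 1;
    if (extra > 0) out.push(`... repeated ${extra} more times`);
    i = j;
  }
  return out.join('\n');
}

// Run the passes in order: escape codes first, so that lines differing
// only in colour codes can be recognised as repeats.
function cleanOutput(text) {
  return collapseRepeats(stripAnsi(text));
}
```

Stripping escape codes before collapsing is a design choice of this sketch; the README does not state the order in which the real engine applies its passes.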
@@ -116,12 +177,12 @@ Squeezr auto-detects which provider each request targets from the auth headers.
  | CLI | Set this env var | Compresses with | Extra keys needed |
  |---|---|---|---|
  | **Claude Code** | `ANTHROPIC_BASE_URL=http://localhost:8080` | Claude Haiku | None |
- | **Codex CLI** | `OPENAI_BASE_URL=http://localhost:8080` | GPT-4o-mini | None |
- | **Aider** (OpenAI backend) | `OPENAI_BASE_URL=http://localhost:8080` | GPT-4o-mini | None |
+ | **Codex CLI** | `openai_base_url=http://localhost:8080` | GPT-4o-mini | None |
+ | **Aider** (OpenAI backend) | `openai_base_url=http://localhost:8080` | GPT-4o-mini | None |
  | **Aider** (Anthropic backend) | `ANTHROPIC_BASE_URL=http://localhost:8080` | Claude Haiku | None |
- | **OpenCode** | `OPENAI_BASE_URL=http://localhost:8080` | GPT-4o-mini | None |
+ | **OpenCode** | `openai_base_url=http://localhost:8080` | GPT-4o-mini | None |
  | **Gemini CLI** | `GEMINI_API_BASE_URL=http://localhost:8080` | Gemini Flash 8B | None |
- | **Ollama** (any CLI) | `OPENAI_BASE_URL=http://localhost:8080` | Local model (configurable) | None |
+ | **Ollama** (any CLI) | `openai_base_url=http://localhost:8080` | Local model (configurable) | None |
 
  Squeezr extracts the API key from the request itself and reuses it for compression. Zero extra setup.
 
@@ -142,13 +203,13 @@ export ANTHROPIC_BASE_URL=http://localhost:8080 # macOS / Linux
  $env:ANTHROPIC_BASE_URL="http://localhost:8080" # Windows PowerShell
 
  # Codex / Aider / OpenCode
- export OPENAI_BASE_URL=http://localhost:8080
+ export openai_base_url=http://localhost:8080
 
  # Gemini CLI
  export GEMINI_API_BASE_URL=http://localhost:8080
 
  # Ollama
- export OPENAI_BASE_URL=http://localhost:8080
+ export openai_base_url=http://localhost:8080
  ```
 
  Or use the shell installer to set up the env var permanently and register Squeezr as a login service:
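
The README's new FAQ entry describes summaries tagged with an id such as `a3f2c1` that can later be expanded back to the full original. One way such a lossless round trip could work is sketched below. Everything here is an assumption made for illustration: the in-memory `Map`, the FNV-1a based `shortId`, and the `compress` / `expand` names are hypothetical, and the diff does not show how Squeezr actually derives ids or stores originals.

```javascript
// Hypothetical sketch: keep originals in memory, keyed by a short content-derived id.
const originals = new Map();

// FNV-1a 32-bit hash rendered as 6 hex characters (an assumed id scheme).
function shortId(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0').slice(0, 6);
}

// Store the full text and return the tagged summary line that replaces it.
function compress(original, summary) {
  const id = shortId(original);
  originals.set(id, original);
  return `[squeezr:${id}] ${summary}`;
}

// Return the stored original for an id, or null if the id is unknown.
function expand(id) {
  return originals.has(id) ? originals.get(id) : null;
}
```

Because the original text is stored unchanged, expanding an id returns exactly what was compressed. The summary only decides what stays in the context window in the meantime.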
package/bin/squeezr.js CHANGED
@@ -181,7 +181,7 @@ function setupWindows() {
    // 1. Set env vars permanently via setx (user scope, no admin needed)
    const vars = {
      ANTHROPIC_BASE_URL: 'http://localhost:8080',
-     OPENAI_BASE_URL: 'http://localhost:8080',
+     openai_base_url: 'http://localhost:8080',
      GEMINI_API_BASE_URL: 'http://localhost:8080',
    }
    for (const [key, value] of Object.entries(vars)) {
@@ -257,7 +257,7 @@ function setupUnix() {
    const shellBlock = [
      `# squeezr env vars`,
      `export ANTHROPIC_BASE_URL=http://localhost:${port}`,
-     `export OPENAI_BASE_URL=http://localhost:${port}`,
+     `export openai_base_url=http://localhost:${port}`,
      `export GEMINI_API_BASE_URL=http://localhost:${port}`,
      `# squeezr auto-heal: start proxy if not running`,
      `if ! curl -sf http://localhost:${port}/squeezr/health >/dev/null 2>&1; then`,
@@ -384,7 +384,7 @@ function setupWSL() {
    const shellBlock = [
      `# squeezr env vars`,
      `export ANTHROPIC_BASE_URL=http://localhost:${port}`,
-     `export OPENAI_BASE_URL=http://localhost:${port}`,
+     `export openai_base_url=http://localhost:${port}`,
      `export GEMINI_API_BASE_URL=http://localhost:${port}`,
      `# squeezr auto-heal: start proxy if not running`,
      `if ! curl -sf http://localhost:${port}/squeezr/health >/dev/null 2>&1; then`,
@@ -421,7 +421,7 @@ function setupWSL() {
    const setxExe = '/mnt/c/Windows/System32/setx.exe'
    const winVars = {
      ANTHROPIC_BASE_URL: 'http://localhost:8080',
-     OPENAI_BASE_URL: 'http://localhost:8080',
+     openai_base_url: 'http://localhost:8080',
      GEMINI_API_BASE_URL: 'http://localhost:8080',
    }
    if (fs.existsSync(setxExe)) {
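
The hunks above change only the case of the variable name. Environment variable names are case-sensitive on Linux and macOS, so `OPENAI_BASE_URL` and `openai_base_url` are two distinct variables there, while Windows treats names case-insensitively. The diff does not say which spelling each CLI reads. For code that wants to tolerate both, a small lookup along these lines may help. The helper name `resolveOpenAIBaseUrl` is hypothetical and not part of the package.

```javascript
// Hypothetical helper, not part of squeezr-ai: accept either spelling of the
// OpenAI base URL variable, preferring the upper-case one when both are set.
function resolveOpenAIBaseUrl(env) {
  return env.OPENAI_BASE_URL ?? env.openai_base_url ?? null;
}
```

In practice this would be called as `resolveOpenAIBaseUrl(process.env)`. Taking the environment as a parameter keeps the function easy to test with a plain object.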
package/dist/server.js CHANGED
@@ -236,7 +236,8 @@ app.get('/squeezr/expand/:id', (c) => {
  app.all('*', async (c) => {
    const upstream = detectUpstream(c.req.raw.headers);
    const url = new URL(c.req.url);
-   const targetUrl = `${upstream}${url.pathname}${url.search}`;
+   const targetPath = url.pathname === '/responses' ? '/v1/responses' : url.pathname;
+   const targetUrl = `${upstream}${targetPath}${url.search}`;
    const body = await c.req.arrayBuffer();
    const fwdHeaders = forwardHeaders(c.req.raw.headers);
    const resp = await fetch(targetUrl, {
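
The path rewrite added in this hunk can be isolated into a small pure function to see its effect on concrete URLs. The sketch below mirrors the two changed lines. The wrapper name `buildTargetUrl` and the example upstream `https://api.openai.com` are assumptions for illustration. In the real handler the upstream comes from `detectUpstream`.

```javascript
// Sketch of the rewrite in isolation: a bare "/responses" path is mapped to
// "/v1/responses"; every other path is forwarded unchanged.
// `buildTargetUrl` is a hypothetical wrapper, not a function in server.js.
function buildTargetUrl(upstream, requestUrl) {
  const url = new URL(requestUrl);
  const targetPath = url.pathname === '/responses' ? '/v1/responses' : url.pathname;
  return `${upstream}${targetPath}${url.search}`;
}
```

For example, `buildTargetUrl('https://api.openai.com', 'http://localhost:8080/responses?x=1')` yields `https://api.openai.com/v1/responses?x=1`, while a request to `/v1/chat/completions` keeps its path. The comparison is an exact match, so a path such as `/responses/abc` is not rewritten.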
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "squeezr-ai",
-   "version": "1.11.1",
+   "version": "1.11.2",
    "description": "AI proxy that compresses Claude Code, Codex, Aider, Gemini CLI and Ollama context windows to save thousands of tokens per session",
    "keywords": [
      "claude",