npm - squeezr-ai - Versions diffs - 1.80.6 → 1.80.15 - Mend

squeezr-ai 1.80.6 → 1.80.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +17 -7
package/dist/__tests__/compressor.test.js +18 -6
package/dist/__tests__/deterministic.test.js +229 -142
package/dist/__tests__/expand.test.js +99 -1
package/dist/__tests__/staleTurns.test.js +9 -4
package/dist/compressor.js +9 -3
package/dist/config.js +6 -2
package/dist/deterministic.d.ts +0 -6
package/dist/deterministic.js +76 -12
package/dist/expand.d.ts +16 -0
package/dist/expand.js +120 -10
package/dist/mcp.js +33 -0
package/dist/server.js +8 -2
package/dist/staleTurns.js +0 -13
package/package.json +69 -69

package/README.md CHANGED

@@ -51,7 +51,13 @@ Every request passes through Squeezr on `localhost:8080`. Compression layers, in
 4. **Deterministic preprocessing** — zero-latency regex rules on every tool result: ANSI/progress-bar/timestamp stripping, line dedup, JSON minification, plus ~30 tool-specific patterns (git, vitest/jest, tsc, eslint, cargo, pytest, docker, kubectl, gh…). Byte-stable → cache-safe.
 5. **Cross-turn dedup & diff-reads** — repeated tool outputs collapse to references; repeated file reads become diffs against the latest read. (Only past the cache barrier.)
 6. **Stale-turn summarization** — conversations >40 turns get old assistant prose collapsed to keyword summaries. (Only for clients without prompt caching.)
-7. **AI compression** (opt-in, off by default) — blocks ≥1500 chars summarized by a small model. Measured on real data: 75–91% compression on large blocks. Backends: **Zest (local, free, deterministic)**, Haiku, GPT-4o-mini, Gemini Flash. Guarded by a rate limiter (20 calls/5 min), a persistent on/off toggle, and the cache barrier.
+7. **AI compression** (opt-in, off by default) — old blocks above the AI floor (~1000 chars, auto-raised by the quality governor) summarized by a small model. Backends: **Zest (local, free, deterministic)**, Haiku, GPT-4o-mini, Gemini Flash. Heavily guarded so it only ever helps:
+   - **Structured-data guard** — JSON / JSONL / record dumps / tables are *never* AI-rewritten (a model can silently blank a field value); they stay in their deterministic form. Prose/logs still get compressed.
+   - **Compressibility probe** — a one-shot `deflate` estimate skips already-dense blocks (path/error/test dumps) that wouldn't beat the min-ratio, so no wasted backend calls.
+   - **Acceptance guardrail + retry-with-correction** — every AI result is validated; if it dropped a critical token (path/URL/error code) the model is re-prompted with the exact tokens to restore, else the result is rejected and the deterministic form is kept. Nothing that loses a hard token is ever used.
+   - **Quality governor** — watches expand-rate and guard-reject-rate and auto-raises the min block size (or pauses) when quality dips.
+   - **Backend-aware limits** — local Zest is free → no rate limit, generous timeout, processed sequentially (Ollama serialises anyway). Cloud backends keep a hard cap (20 calls/5 min) and a short timeout to protect spend.
+   - Plus the persistent on/off toggle and the cache barrier.
 ### Recovery: nothing is ever lost
@@ -67,7 +73,7 @@ Compression aggressiveness scales with context usage: <50% → light (1500-char
 | Page | What it shows |
 |------|---------------|
-| **Overview** | All-time tokens saved (single source of truth), ratio + per-request average, cost saved, Top Tools (real per-tool block counts), Session Cache (AI layer), AI Compression card (calls / saved / spent / net), **Prompt Cache health** (read vs creation + hit %), Savings by type (per-technique breakdown), by model (incl. what compression backends spend), by client, compression mode + **Bypass / AI Compression toggles** |
+| **Overview** | **Today-scoped** (resets at midnight): tokens saved, two honest ratios — **% of total sent today** and **% of the last request** (changes every turn), cost comparison (today), Cost/Savings-by-type breakdown (today), Top Tools, Session Cache, AI Compression card (calls / saved / spent / net), **Prompt Cache health** (read vs creation + hit %, **persisted across restarts**), by model / by client, compression mode + **Bypass / AI Compression toggles** |
 | **Savings** | Day / Week / Month / All-time filters with period navigation — per-period tokens, cost, sessions, charts, By Model / By Client / Top Tools / AI Compression / Session Cache, all persisted across restarts |
 | **Settings** | Client base-URL reference, ports, version/uptime, bypass & circuit breaker state, **AI Compression on/off**, **Restart / Stop buttons**, update check |
@@ -76,11 +82,13 @@ Compression aggressiveness scales with context usage: <50% → light (1500-char
 Squeezr sits in the critical path. It is designed to never break your workflow — and never burn your plan:
 - **Bypass mode (persisted)** — one click/command disables all compression; survives restarts. The emergency stop.
-- **AI compression master switch (persisted, default OFF)** — with a subscription OAuth token, AI compression calls bill against *your own plan*; only enable it with a separately billed API key or the free local Zest backend.
-- **AI rate limiter** — hard cap of 20 AI calls per 5-minute sliding window, process-global.
-- **AI minimum block size (1500 chars)** — measured on real data: small blocks *expand* under AI compression; Squeezr never AI-compresses them.
+- **AI compression master switch (persisted, default OFF)** — with a subscription OAuth token, AI compression calls bill against *your own plan*; Squeezr refuses to auto-route to Haiku on an OAuth token. Use the free local Zest backend or a separately billed API key.
+- **AI rate limiter (cloud only)** — hard cap of 20 AI calls per 5-minute sliding window for paid cloud backends (protects spend). Local Zest is free → not rate-limited.
+- **AI minimum block size (~1000 chars, governed)** — small blocks can't be compressed without loss; Squeezr never AI-compresses below the floor, and the quality governor raises it automatically if reject/expand rates climb.
+- **Structured-data & compressibility guards** — AI never rewrites structured data (JSON/records → no field corruption), and dense/incompressible blocks skip AI entirely.
+- **Acceptance guardrail + retry-with-correction** — AI output that drops a critical token or doesn't save enough is rejected (after one corrective retry); the deterministic form is kept.
 - **Cache barrier** — unstable passes can't touch the cached prefix (see prompt-cache safety above).
-- **Circuit breaker** — 3 consecutive AI backend failures → AI compression disabled for 60s, deterministic continues.
+- **Circuit breaker + backend-aware timeouts** — 3 consecutive AI backend failures → AI disabled for 60s, deterministic continues. Local calls get a generous timeout and run sequentially (Ollama serialises) so they don't false-timeout.
 - **Atomic persistence** — stats, history, caches and toggles are written atomically (tmp + rename); a crash can't corrupt them.
 - **Self-test on startup** — detects port squatting (the classic `$.speed` Claude Code error), env-var drift, and pipeline issues.
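The cloud-only rate limiter and the circuit breaker listed under Safety are standard patterns. Below is a minimal sketch, assuming an injectable clock so the windows can be tested; the class and function names (`SlidingWindowLimiter`, `CircuitBreaker`, `canCallBackend`) are hypothetical and this is not Squeezr's code. Only the numbers (20 calls per 5 minutes, 3 failures, 60 s) come from the README.

```javascript
// Hypothetical sketch, not Squeezr's implementation.
// Sliding-window limiter: admit a call only if fewer than maxCalls
// calls were admitted during the last windowMs milliseconds.
class SlidingWindowLimiter {
  constructor(maxCalls, windowMs, now = () => Date.now()) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
    this.calls = []; // timestamps of admitted calls, oldest first
  }

  tryAcquire() {
    const t = this.now();
    // Evict timestamps that have slid out of the window.
    while (this.calls.length > 0 && t - this.calls[0] >= this.windowMs) {
      this.calls.shift();
    }
    if (this.calls.length >= this.maxCalls) return false;
    this.calls.push(t);
    return true;
  }
}

// Circuit breaker: after `threshold` consecutive failures, stay open
// (no AI calls) for cooldownMs; deterministic compression keeps running.
class CircuitBreaker {
  constructor(threshold, cooldownMs, now = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openUntil = 0;
  }

  isOpen() {
    return this.now() < this.openUntil;
  }

  recordSuccess() {
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openUntil = this.now() + this.cooldownMs;
      this.failures = 0;
    }
  }
}

// Local Zest (Ollama) is free, so only cloud backends pass through the limiter;
// an open breaker blocks every backend.
function canCallBackend(backend, limiter, breaker) {
  if (breaker.isOpen()) return false;
  if (backend === 'local') return true;
  return limiter.tryAcquire();
}
```

A process would hold one `new SlidingWindowLimiter(20, 5 * 60 * 1000)` and one `new CircuitBreaker(3, 60 * 1000)`, calling `recordSuccess` or `recordFailure` after each backend call. Keeping raw timestamps makes the window exact; with at most 20 entries the `shift()` cost is negligible.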
@@ -95,7 +103,7 @@ One source of truth (`~/.squeezr/stats.json`, continuous net counters — never
 ## Zest — Squeezr's own compression model
-Zest (`zest-0.8b`, fine-tuned from Qwen3.5-0.8B with LoRA) is Squeezr's local compression model: free, runs on CPU via Ollama, and **deterministic in greedy decoding** — which makes AI compression byte-stable and therefore cache-safe. Status: v3 trained (89% eval accuracy), GGUF packaging in progress. Design doc: [docs/REINVENT_AI.md](docs/REINVENT_AI.md)
+Zest (`zest-0.8b`, fine-tuned from Qwen3.5-0.8B with LoRA) is Squeezr's local compression model: free, runs on CPU via Ollama, and **deterministic in greedy decoding** (temperature 0) — which makes AI compression byte-stable and therefore cache-safe. Status: deployed and selectable as the `local` backend (Ollama). Training data is being regenerated against Squeezr's own runtime guard (every example must keep all hard tokens — paths/URLs/error codes — and clear the min-ratio) so the model learns guard-passing compression instead of token-dropping. Design doc: [docs/REINVENT_AI.md](docs/REINVENT_AI.md)
 ## MCP server
@@ -112,8 +120,10 @@ User config lives at **`~/.squeezr/squeezr.toml`** (survives npm updates). A pro
 threshold = 800              # min chars to compress a tool result
 keep_recent = 3              # recent tool results never touched
 ai_compression = false       # MASTER switch for AI calls — default OFF (see Safety)
+backend = "local"            # auto | local (Zest) | haiku | gpt-mini | gemini-flash
 compress_system_prompt = true
 compress_conversation = true
+compress_assistant_ai = false # AI-compress long old assistant turns (prose-heavy chats)
 stale_turns = true           # auto-disabled when prompt-cache markers are present
 tool_desc_compress = true    # first-paragraph truncation + expand recovery
 tool_desc_expand = true
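The acceptance guardrail with retry-with-correction, as the README describes it, reduces to three steps: extract the hard tokens (paths, URLs, error codes) from the original, reject any candidate that drops one or saves too little, and retry once with the missing tokens named. A minimal sketch follows; the regexes, the 0.3 minimum ratio, and the `compressFn(text, correction)` backend callback are assumptions, not Squeezr's API.

```javascript
// Hypothetical sketch of an acceptance guardrail with one corrective retry.
// Token patterns and the default minimum ratio are illustrative assumptions.
const HARD_TOKEN_PATTERNS = [
  /https?:\/\/[^\s"'<>)]+/g,                 // URLs
  /(?:[A-Za-z]:)?[\w.-]*(?:[\\/][\w.-]+)+/g, // paths with / or \ separators
  /\b(?:TS\d{4}|E\d{4})\b/g,                 // tsc / rustc style error codes
];

function extractHardTokens(text) {
  const tokens = new Set();
  for (const re of HARD_TOKEN_PATTERNS) {
    for (const match of text.matchAll(re)) tokens.add(match[0]);
  }
  return [...tokens];
}

// A candidate passes only if it keeps every hard token AND saves at least minRatio.
function checkCandidate(original, candidate, minRatio) {
  const missing = extractHardTokens(original).filter((tok) => !candidate.includes(tok));
  const saved = 1 - candidate.length / original.length;
  return { ok: missing.length === 0 && saved >= minRatio, missing, saved };
}

// compressFn(text, correction) -> Promise<string> stands in for the backend call.
async function acceptOrKeepDeterministic(original, deterministicForm, compressFn, minRatio = 0.3) {
  let candidate = await compressFn(original, null);
  let verdict = checkCandidate(original, candidate, minRatio);
  if (!verdict.ok && verdict.missing.length > 0) {
    // One corrective retry that names the exact tokens to restore.
    const correction = `Restore these tokens verbatim: ${verdict.missing.join(', ')}`;
    candidate = await compressFn(original, correction);
    verdict = checkCandidate(original, candidate, minRatio);
  }
  // Anything that still fails keeps the deterministic form.
  return verdict.ok ? candidate : deterministicForm;
}
```

This sketch retries only when tokens are missing; a candidate that keeps every token but saves too little is rejected straight away. Whether Squeezr also retries on an insufficient ratio is not stated in the diff.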

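The quality governor can be pictured as a sliding window of recent AI outcomes that raises the AI floor when guard rejections or expand calls become too frequent, and pauses AI compression when rejections dominate. A minimal sketch under that reading; every threshold below except the ~1000-char base floor is an illustrative assumption, and `QualityGovernor` is a hypothetical name.

```javascript
// Hypothetical sketch of a quality governor. Thresholds are illustrative.
class QualityGovernor {
  constructor({
    baseFloor = 1000,     // README's "~1000 chars" AI floor
    maxFloor = 4000,
    step = 500,
    windowSize = 50,      // how many recent outcomes are considered
    minSamples = 10,      // do not react to a handful of events
    maxRejectRate = 0.3,
    maxExpandRate = 0.2,
    pauseRejectRate = 0.6,
  } = {}) {
    Object.assign(this, { baseFloor, maxFloor, step, windowSize, minSamples,
      maxRejectRate, maxExpandRate, pauseRejectRate });
    this.floor = baseFloor;
    this.paused = false;
    // 'accepted' | 'rejected' (guard said no) | 'expanded' (model asked for the full text)
    this.outcomes = [];
  }

  rate(kind) {
    if (this.outcomes.length === 0) return 0;
    return this.outcomes.filter((o) => o === kind).length / this.outcomes.length;
  }

  record(outcome) {
    this.outcomes.push(outcome);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
    if (this.outcomes.length < this.minSamples) return;
    const rejectRate = this.rate('rejected');
    if (rejectRate >= this.pauseRejectRate) {
      this.paused = true; // quality collapsed: stop AI compression entirely
    } else if (rejectRate > this.maxRejectRate || this.rate('expanded') > this.maxExpandRate) {
      this.floor = Math.min(this.maxFloor, this.floor + this.step); // only larger blocks from now on
    }
  }

  shouldCompress(blockLength) {
    return !this.paused && blockLength >= this.floor;
  }
}
```

This version only ratchets the floor upward; how, or whether, Squeezr relaxes it again is not shown in the diff.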
package/dist/__tests__/compressor.test.js CHANGED

@@ -1,6 +1,7 @@
 import { describe, it, expect, vi, beforeEach } from 'vitest';
 import { clearExpandStore } from '../expand.js';
 import { clearSessionCache } from '../sessionCache.js';
+import { runtimeOverrides } from '../config.js';
 // Mock AI SDKs before importing compressor
 vi.mock('@anthropic-ai/sdk', () => ({
     // function (not arrow) — `new Anthropic()` requires a constructable implementation
@@ -34,10 +35,17 @@ vi.mock('../aiToggle.js', () => ({
     setAiCompression: () => { },
     toggleAiCompression: () => true,
 }));
-// Mock fetch for Gemini
+// Mock fetch for the fetch-based backends. Must satisfy BOTH shapes because the
+// default backend is now `local` (Ollama, /api/chat → {message:{content}}) and
+// effectiveBackend() reads the global config singleton, not the per-test config.
+// Gemini uses {candidates}. `ok: true` keeps ollamaCompressChunk from throwing.
 const mockFetch = vi.fn().mockResolvedValue({
+    ok: true,
     json: async () => ({
         candidates: [{ content: { parts: [{ text: 'AI compressed summary' }] } }],
+        message: { content: 'AI compressed summary' },
+        prompt_eval_count: 10,
+        eval_count: 5,
     }),
 });
 vi.stubGlobal('fetch', mockFetch);
@@ -72,6 +80,10 @@ beforeEach(() => {
     clearExpandStore();
     clearSessionCache();
     vi.clearAllMocks();
+    // effectiveBackend() reads the GLOBAL config singleton (default `local`), so by
+    // default these tests exercise the Ollama path (mock fetch above returns a valid
+    // Ollama-shaped body). Tests that need a specific cloud backend set it explicitly.
+    runtimeOverrides.compressionBackend = undefined;
 });
 // ── Anthropic format ──────────────────────────────────────────────────────────
 describe('compressAnthropicMessages', () => {
@@ -118,7 +130,7 @@ describe('compressAnthropicMessages', () => {
         const msgs = makeMessages(['x'.repeat(1600), 'y'.repeat(1600)]);
         const [result] = await compressAnthropicMessages(msgs, 'key', baseConfig);
         const compressed = result[1].content[0].content;
-        expect(compressed).toMatch(/\[squeezr:[a-f0-9]{6} -\d+%\]/);
+        expect(compressed).toMatch(/\[squeezr:[a-f0-9]{6} -\d+% — squeezr_expand\("[a-f0-9]{6}"\) for full exact text\]/);
     });
     it('does not compress blocks below threshold', async () => {
         const shortText = 'short'; // below threshold of 50
@@ -234,11 +246,10 @@ describe('compressOpenAIMessages', () => {
         expect(savings.compressed).toBe(1);
     });
     it('uses Ollama backend for local keys', async () => {
-        const OpenAI = (await import('openai')).default;
-        const msgs = makeMessages(['z'.repeat(200), 'v'.repeat(200)]);
+        const msgs = makeMessages(['z'.repeat(1600), 'v'.repeat(1600)]);
         await compressOpenAIMessages(msgs, 'ollama-key', { ...baseConfig, isLocalKey: () => true }, true);
-        // OpenAI client should be called (Ollama uses OpenAI-compatible API)
-        expect(OpenAI).toHaveBeenCalled();
+        // Local compression uses Ollama's native /api/chat over fetch (not the OpenAI SDK).
+        expect(mockFetch).toHaveBeenCalledWith(expect.stringContaining('/api/chat'), expect.any(Object));
     });
     it('does not inject expand tool for local requests', async () => {
         const msgs = makeMessages(['short']);
@@ -281,6 +292,7 @@ describe('compressGeminiContents', () => {
         expect(savings.compressed).toBe(1);
     });
     it('uses fetch with Gemini API URL', async () => {
+        runtimeOverrides.compressionBackend = 'auto'; // use the per-API default (Gemini), not the global `local`
         const cts = makeContents(['g'.repeat(200), 'h'.repeat(200)]);
         await compressGeminiContents(cts, 'my-google-key', baseConfig);
         expect(mockFetch).toHaveBeenCalledWith(expect.stringContaining('generativelanguage.googleapis.com'), expect.any(Object));
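The comment on the new `mockFetch` explains that one stubbed body must satisfy both Ollama's `/api/chat` shape (`{ message: { content } }`, with `prompt_eval_count` and `eval_count` token counts) and Gemini's `{ candidates }` shape. A minimal sketch of a fetch-based local call that such a mock would satisfy, assuming Ollama's default address and a `zest-0.8b` model tag; the helper names are hypothetical, and injecting `fetchImpl` is one way to test it without `vi.stubGlobal`.

```javascript
// Hypothetical helpers. Field names follow Ollama's /api/chat and Gemini's
// generateContent JSON shapes; the model tag is an assumption.
const OLLAMA_HOST = 'http://localhost:11434'; // Ollama's default address

function buildOllamaChatRequest(model, text) {
  return {
    model,
    stream: false,               // return one JSON object, not a stream of chunks
    options: { temperature: 0 }, // greedy decoding, as the README describes for Zest
    messages: [
      { role: 'system', content: 'Compress this tool output. Keep paths, URLs and error codes verbatim.' },
      { role: 'user', content: text },
    ],
  };
}

// Accepts either body shape, which is why the test mock carries both.
function extractCompressedText(body) {
  if (typeof body?.message?.content === 'string') return body.message.content; // Ollama
  const part = body?.candidates?.[0]?.content?.parts?.[0];
  if (typeof part?.text === 'string') return part.text;                          // Gemini
  throw new Error('unrecognised compression response shape');
}

async function ollamaCompress(text, { fetchImpl = globalThis.fetch, model = 'zest-0.8b' } = {}) {
  const res = await fetchImpl(`${OLLAMA_HOST}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildOllamaChatRequest(model, text)),
  });
  // A non-2xx response is treated as a backend failure (hence `ok: true` in the mock).
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const body = await res.json();
  return {
    text: extractCompressedText(body),
    inputTokens: body.prompt_eval_count,
    outputTokens: body.eval_count,
  };
}
```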

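The stricter assertion in `compressAnthropicMessages` now requires the marker to name the recovery call, `squeezr_expand("<id>")`, next to the 6-hex id and the percentage saved. A minimal sketch of producing and resolving such a marker; the FNV-1a id derivation (over UTF-16 code units) and the in-memory `Map` store are assumptions, and the function names are hypothetical.

```javascript
// Hypothetical sketch: the FNV-1a id scheme and the Map store are assumptions.
const expandStore = new Map(); // id -> exact original text

// 6 lowercase hex chars from a 32-bit FNV-1a hash of the text's UTF-16 code units.
function shortId(text) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return h.toString(16).padStart(8, '0').slice(0, 6);
}

function compressWithMarker(original, summary) {
  const id = shortId(original);
  expandStore.set(id, original);
  // Clamp at 0 so a summary longer than the original cannot produce "--5%".
  const pct = Math.max(0, Math.round((1 - summary.length / original.length) * 100));
  // \u2014 is the em dash the test regex expects between "%" and "squeezr_expand".
  return `${summary}\n[squeezr:${id} -${pct}% \u2014 squeezr_expand("${id}") for full exact text]`;
}

// Resolve a marker back to the stored original (what the expand tool would return).
function expandFromMarker(text) {
  const match = text.match(/squeezr_expand\("([0-9a-f]{6})"\)/);
  return match ? expandStore.get(match[1]) : undefined;
}
```

Six hex characters give only 2^24 ids, so a real store has to cope with collisions; this sketch simply overwrites.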
package/dist/__tests__/deterministic.test.js CHANGED

@@ -1,5 +1,77 @@
 import { describe, it, expect } from 'vitest';
 import { preprocess, preprocessForTool, preprocessRatio } from '../deterministic.js';
+// ── Read fidelity (Edit-mismatch / corruption guards) ─────────────────────────
+describe('preprocessForTool - Read stays verbatim', () => {
+    it('does not minify embedded JSON in a read (Edit would mismatch)', () => {
+        const file = '{\n  "name": "pkg",\n  "version": "1.0.0",\n  "scripts": {\n    "build": "tsc"\n  }\n}';
+        expect(preprocessForTool(file, 'Read')).toBe(file);
+    });
+    it('does not strip timestamp-like substrings from a read', () => {
+        const file = 'const RELEASE = "2026-01-02T03:04:05Z"\nconst T = "12:34:56 "';
+        expect(preprocessForTool(file, 'Read')).toBe(file);
+    });
+    it('does not collapse blank lines or dedup repeated lines in a read', () => {
+        const file = 'a\n\n\n\nb\nx\nx\nx\nx\n';
+        expect(preprocessForTool(file, 'Read')).toBe(file);
+    });
+    it('still strips ANSI/control noise from a read', () => {
+        expect(preprocessForTool('\x1B[32mcode\x1B[0m', 'Read')).toBe('code');
+    });
+});
+describe('minifyJson - big integer safety', () => {
+    it('leaves blocks with 16+ digit integers untouched (precision corruption)', () => {
+        const block = '{ "id": 1234567890123456789, "name": "x", "padding": "' + 'y'.repeat(200) + '" }';
+        // The big id must survive verbatim (JSON.parse would round it)
+        expect(preprocess(block)).toContain('1234567890123456789');
+    });
+    it('still minifies safe JSON', () => {
+        const block = '{\n  "a": 1,\n  "b": "' + 'z'.repeat(200) + '"\n}';
+        const out = preprocess(block);
+        expect(out).not.toContain('\n  "a"'); // got minified
+    });
+});
+describe('compactGrepOutput - Windows paths', () => {
+    it('groups by full drive-letter path, not the bare drive', () => {
+        const lines = Array.from({ length: 25 }, (_, i) => `C:\\src\\app.ts:${i + 1}:match ${i}`);
+        const out = preprocessForTool(lines.join('\n'), 'Grep');
+        expect(out).toContain('C:\\src\\app.ts');
+        expect(out).not.toMatch(/^C \(/m); // not grouped under bogus file "C"
+    });
+});
+// ── Expand results are never re-compressed ────────────────────────────────────
+describe('preprocessForTool - squeezr_expand result is verbatim', () => {
+    it('returns an expand-call result untouched, even when huge', () => {
+        const huge = Array.from({ length: 500 }, (_, i) => `recovered line ${i}`).join('\n');
+        // mcp-prefixed name (how Claude Code routes the MCP tool)
+        expect(preprocessForTool(huge, 'mcp__squeezr__squeezr_expand')).toBe(huge);
+        expect(preprocessForTool(huge, 'squeezr_expand')).toBe(huge);
+    });
+});
+// ── Reversibility: lossy deterministic compaction gets an expand pointer ──────
+describe('preprocessForTool - lossy compaction is recoverable', () => {
+    it('appends a squeezr_expand pointer when a huge read is truncated', () => {
+        const big = Array.from({ length: 400 }, (_, i) => `line ${i}`).join('\n');
+        const out = preprocessForTool(big, 'Read');
+        expect(out).toContain('squeezr_expand("');
+        expect(out).toContain('omitted');
+    });
+    it('does NOT append a pointer to a small verbatim read', () => {
+        const small = 'const a = 1\nconst b = 2\n';
+        expect(preprocessForTool(small, 'Read')).toBe(small);
+    });
+    it('appends a pointer when a long bash output is truncated', () => {
+        const log = Array.from({ length: 200 }, (_, i) => `noise output line ${i}`).join('\n');
+        expect(preprocessForTool(log, 'Bash')).toContain('squeezr_expand("');
+    });
+    it('the pointer id round-trips through the expand store', async () => {
+        const big = Array.from({ length: 400 }, (_, i) => `unique-line-${i}`).join('\n');
+        const out = preprocessForTool(big, 'Read');
+        const id = out.match(/squeezr_expand\("([0-9a-f]{6})"\)/)?.[1];
+        expect(id).toBeTruthy();
+        const { retrieveOriginal } = await import('../expand.js');
+        expect(retrieveOriginal(id)).toBe(big);
+    });
+});
 // ── Base pipeline ─────────────────────────────────────────────────────────────
 describe('preprocess - base pipeline', () => {
     it('strips ANSI escape codes', () => {
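The new `minifyJson - big integer safety` tests exist because `JSON.parse` converts every number to a double, which is exact only up to `Number.MAX_SAFE_INTEGER` (9007199254740991). A minimal sketch of the guard, assuming a simple 16-digit-run check as the tests suggest; `minifyJsonSafe` is a hypothetical name, not the function in `deterministic.js`.

```javascript
// Hypothetical sketch of the guard the new minifyJson tests describe.
// Any run of 16+ digits is treated as unsafe. This is conservative: it also
// skips blocks whose long digit runs sit inside strings, which is harmless.
function minifyJsonSafe(block) {
  const trimmed = block.trim();
  if (!/^[[{]/.test(trimmed)) return block;  // only objects and arrays
  if (/\d{16,}/.test(trimmed)) return block; // would lose precision: keep verbatim
  try {
    return JSON.stringify(JSON.parse(trimmed)); // re-serialise without whitespace
  } catch {
    return block; // not valid JSON: leave untouched
  }
}
```

For example, round-tripping the test's `1234567890123456789` id through `JSON.parse` and `JSON.stringify` emits a rounded number, which is exactly the silent corruption the guard avoids.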
@@ -74,25 +146,25 @@ describe('preprocessRatio', () => {
 });
 // ── Git diff ──────────────────────────────────────────────────────────────────
 describe('preprocessForTool - git diff', () => {
-    const sampleDiff = `diff --git a/src/foo.ts b/src/foo.ts
-index abc123..def456 100644
---- a/src/foo.ts
-+++ b/src/foo.ts
-@@ -1,7 +1,7 @@
- import { foo } from './bar'
--const x = 1
-+const x = 2
- function hello() {
-   return x
- }
-@@ -10,5 +10,5 @@
- context before
--old line
-+new line
- context after
- more context
+    const sampleDiff = `diff --git a/src/foo.ts b/src/foo.ts
+index abc123..def456 100644
+--- a/src/foo.ts
++++ b/src/foo.ts
+@@ -1,7 +1,7 @@
+ import { foo } from './bar'
+-const x = 1
++const x = 2
+ function hello() {
+   return x
+ }
+@@ -10,5 +10,5 @@
+ context before
+-old line
++new line
+ context after
+ more context
  even more context`;
     it('keeps diff headers', () => {
         const out = preprocessForTool(sampleDiff, 'Bash');
@@ -121,24 +193,24 @@ index abc123..def456 100644
 });
 // ── Cargo test ────────────────────────────────────────────────────────────────
 describe('preprocessForTool - cargo test', () => {
-    const allPassing = `running 5 tests
-test foo::test_a ... ok
-test foo::test_b ... ok
-test foo::test_c ... ok
-test foo::test_d ... ok
-test foo::test_e ... ok
+    const allPassing = `running 5 tests
+test foo::test_a ... ok
+test foo::test_b ... ok
+test foo::test_c ... ok
+test foo::test_d ... ok
+test foo::test_e ... ok
 test result: ok. 5 passed; 0 failed; 0 ignored`;
-    const withFailures = `running 3 tests
-test foo::test_a ... ok
-test foo::test_b ... FAILED
-test foo::test_c ... ok
-failures:
----- foo::test_b stdout ----
-thread 'foo::test_b' panicked at 'assertion failed', src/lib.rs:10
+    const withFailures = `running 3 tests
+test foo::test_a ... ok
+test foo::test_b ... FAILED
+test foo::test_c ... ok
+failures:
+---- foo::test_b stdout ----
+thread 'foo::test_b' panicked at 'assertion failed', src/lib.rs:10
 test result: FAILED. 2 passed; 1 failed`;
     it('returns only summary when all tests pass', () => {
         const out = preprocessForTool(allPassing, 'Bash');
@@ -161,14 +233,14 @@ test result: FAILED. 2 passed; 1 failed`;
 });
 // ── Cargo build / clippy ──────────────────────────────────────────────────────
 describe('preprocessForTool - cargo build errors', () => {
-    const buildOutput = `   Compiling foo v0.1.0
-   Compiling bar v1.2.3
-error[E0308]: mismatched types
-  --> src/main.rs:5:10
-   |
- 5 |     let x: i32 = "hello";
-   |            ---   ^^^^^^^ expected i32, found &str
-   |
+    const buildOutput = `   Compiling foo v0.1.0
+   Compiling bar v1.2.3
+error[E0308]: mismatched types
+  --> src/main.rs:5:10
+   |
+ 5 |     let x: i32 = "hello";
+   |            ---   ^^^^^^^ expected i32, found &str
+   |
 error: aborting due to 1 previous error`;
     it('removes Compiling lines', () => {
         const out = preprocessForTool(buildOutput, 'Bash');
@@ -188,23 +260,23 @@ error: aborting due to 1 previous error`;
 });
 // ── Vitest ────────────────────────────────────────────────────────────────────
 describe('preprocessForTool - vitest', () => {
-    const allPass = `✓ src/foo.test.ts (3)
-  ✓ test one 5ms
-  ✓ test two 3ms
-  ✓ test three 2ms
-Test Files  1 passed (1)
-Tests       3 passed (3)
+    const allPass = `✓ src/foo.test.ts (3)
+  ✓ test one 5ms
+  ✓ test two 3ms
+  ✓ test three 2ms
+Test Files  1 passed (1)
+Tests       3 passed (3)
 Duration    120ms`;
-    const withFail = `✓ src/foo.test.ts (2)
-× src/bar.test.ts (1)
-  × failing test 10ms
-    AssertionError: expected 1 to equal 2
-      - Expected: 2
-      + Received: 1
-Test Files  1 failed | 1 passed (2)
-Tests       1 failed | 2 passed (3)
+    const withFail = `✓ src/foo.test.ts (2)
+× src/bar.test.ts (1)
+  × failing test 10ms
+    AssertionError: expected 1 to equal 2
+      - Expected: 2
+      + Received: 1
+Test Files  1 failed | 1 passed (2)
+Tests       1 failed | 2 passed (3)
 Duration    150ms`;
     it('returns only summary when all tests pass', () => {
         const out = preprocessForTool(allPass, 'Bash');
@@ -230,8 +302,8 @@ Duration    150ms`;
 });
 // ── TypeScript ────────────────────────────────────────────────────────────────
 describe('preprocessForTool - tsc errors', () => {
-    const tscOutput = `src/foo.ts(10,5): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.
-src/foo.ts(20,3): error TS2551: Property 'bar' does not exist on type 'Foo'.
+    const tscOutput = `src/foo.ts(10,5): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.
+src/foo.ts(20,3): error TS2551: Property 'bar' does not exist on type 'Foo'.
 src/bar.ts(5,10): error TS2304: Cannot find name 'baz'.`;
     it('groups errors by file', () => {
         const out = preprocessForTool(tscOutput, 'Bash');
@@ -249,13 +321,13 @@ src/bar.ts(5,10): error TS2304: Cannot find name 'baz'.`;
 });
 // ── ESLint ────────────────────────────────────────────────────────────────────
 describe('preprocessForTool - eslint', () => {
-    const eslintOutput = `/src/foo.ts
-  10:5  error  'x' is defined but never used  no-unused-vars
-  20:1  warning  Unexpected console statement  no-console
-/src/bar.ts
-  5:10  error  Missing semicolon  semi
+    const eslintOutput = `/src/foo.ts
+  10:5  error  'x' is defined but never used  no-unused-vars
+  20:1  warning  Unexpected console statement  no-console
+/src/bar.ts
+  5:10  error  Missing semicolon  semi
 ✖ 3 problems (2 errors, 1 warning)`;
     it('keeps error/warning lines', () => {
         const out = preprocessForTool(eslintOutput, 'Bash');
@@ -288,7 +360,7 @@ describe('preprocessForTool - pnpm install', () => {
 });
 // ── Docker ────────────────────────────────────────────────────────────────────
 describe('preprocessForTool - docker ps', () => {
-    const dockerPs = `CONTAINER ID   IMAGE         COMMAND       CREATED       STATUS        PORTS     NAMES
+    const dockerPs = `CONTAINER ID   IMAGE         COMMAND       CREATED       STATUS        PORTS     NAMES
 abc123def456   nginx:latest  "/docker-e…"  2 hours ago   Up 2 hours    80/tcp    web`;
     it('keeps header and container rows', () => {
         const out = preprocessForTool(dockerPs, 'Bash');
@@ -312,15 +384,15 @@ describe('preprocessForTool - long bash output (generic truncation)', () => {
 });
 // ── gh CLI ────────────────────────────────────────────────────────────────────
 describe('preprocessForTool - gh pr', () => {
-    const ghPr = `title:  Fix the bug
-state:  OPEN
-author: sergioramosv
-url:    https://github.com/sergioramosv/squeezr/pull/5
-number: 5
-labels: bug, help wanted
-This is a long PR body with lots of text explaining the changes
-in great detail that we don't really need in a summary.
+    const ghPr = `title:  Fix the bug
+state:  OPEN
+author: sergioramosv
+url:    https://github.com/sergioramosv/squeezr/pull/5
+number: 5
+labels: bug, help wanted
+This is a long PR body with lots of text explaining the changes
+in great detail that we don't really need in a summary.
 More text here. Even more text. Lots and lots of text.`;
     it('keeps key metadata fields', () => {
         const out = preprocessForTool(ghPr, 'Bash');
@@ -379,6 +451,21 @@ describe('preprocessForTool - Read tool', () => {
         expect(out).toContain('omitted');
         expect(out.length).toBeLessThan(lockfile.length / 10);
     });
+    it('does NOT misclassify a source file that merely mentions lockfile patterns', () => {
+        // Regression: a 600-line source file that contains the lockfile signature
+        // strings ONCE (as the detector's own patterns) must not be nuked as a
+        // lockfile. Real harm: this destroys content with no expand copy.
+        const src = [
+            `function looksLikeLockfile(text) {`,
+            `  return text.includes('integrity sha') || text.includes('"resolved"') || text.includes('# yarn lockfile')`,
+            `}`,
+            ...Array.from({ length: 600 }, (_, i) => `const value${i} = compute(${i})`),
+        ].join('\n');
+        const out = preprocessForTool(src, 'Read');
+        expect(out).not.toContain('lockfile —'); // not the lockfile-omitted summary
+        // It is a >500-line TS file → semantic structure extraction keeps signatures
+        expect(out).toContain('looksLikeLockfile');
+    });
 });
 // ── Glob tool ─────────────────────────────────────────────────────────────────
 describe('preprocessForTool - Glob tool', () => {
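The lockfile regression test shows the failure mode: a detector that fires on a single occurrence of a signature string misclassifies ordinary source. One way to avoid that, sketched here under assumed signatures and thresholds (`isLikelyLockfile` is a hypothetical name, not Squeezr's detector), is to require the signatures on many lines and on a meaningful fraction of the file.

```javascript
// Hypothetical detector; signatures and thresholds are illustrative, not Squeezr's.
// Matches npm (`"integrity": "sha512-`), yarn v1 (`integrity sha512-`) and
// pnpm (`integrity: sha512-`) integrity lines, plus npm "resolved" URLs.
const LOCKFILE_LINE_SIGNATURES = [
  /integrity[":\s]+sha(?:1|256|384|512)-/,
  /"resolved":\s*"/,
];

// A real lockfile repeats its signatures on many lines; a source file that
// mentions them once (as in the regression test) stays far below both bars.
function isLikelyLockfile(text, { minHits = 20, minFraction = 0.05 } = {}) {
  const lines = text.split('\n');
  let hits = 0;
  for (const line of lines) {
    if (LOCKFILE_LINE_SIGNATURES.some((re) => re.test(line))) hits += 1;
  }
  return hits >= minHits && hits / lines.length >= minFraction;
}
```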
@@ -397,18 +484,18 @@ describe('preprocessForTool - Glob tool', () => {
 });
 // ── git status ────────────────────────────────────────────────────────────────
 describe('preprocessForTool - git status', () => {
-    const status = `On branch main
-Your branch is up to date with 'origin/main'.
-Changes not staged for commit:
-  (use "git add <file>..." to update what will be committed)
-	modified:   src/foo.ts
-	modified:   src/bar.ts
-Untracked files:
-  (use "git add <file>..." to include in what will be committed)
-	new-file.ts
+    const status = `On branch main
+Your branch is up to date with 'origin/main'.
+Changes not staged for commit:
+  (use "git add <file>..." to update what will be committed)
+	modified:   src/foo.ts
+	modified:   src/bar.ts
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+	new-file.ts
 no changes added to commit`;
     it('shows branch name', () => {
         const out = preprocessForTool(status, 'Bash');
@@ -459,14 +546,14 @@ describe('preprocessForTool - git log --oneline', () => {
 });
 // ── pnpm list ─────────────────────────────────────────────────────────────────
 describe('preprocessForTool - pnpm/npm list', () => {
-    const npmList = `my-app@1.0.0
-├── express@4.18.2
-│   ├── accepts@1.3.8
-│   │   └── mime-types@2.1.35
-│   └── body-parser@1.20.2
-├── react@18.2.0
-│   └── scheduler@0.23.0
-└── typescript@5.8.3
+    const npmList = `my-app@1.0.0
+├── express@4.18.2
+│   ├── accepts@1.3.8
+│   │   └── mime-types@2.1.35
+│   └── body-parser@1.20.2
+├── react@18.2.0
+│   └── scheduler@0.23.0
+└── typescript@5.8.3
     └── typescript@5.8.3 deduped`;
     it('keeps direct dependencies', () => {
         const out = preprocessForTool(npmList, 'Bash');
@@ -508,15 +595,15 @@ describe('preprocessForTool - pnpm outdated', () => {
 });
 // ── prisma ────────────────────────────────────────────────────────────────────
 describe('preprocessForTool - prisma', () => {
-    const prismaOutput = `Prisma schema loaded from prisma/schema.prisma
-Environment variables loaded from .env
-✔ Generated Prisma Client (v5.10.2) to ./node_modules/@prisma/client in 127ms
-┌─────────────────────────────────────────────────────────┐
-│  Starter Prisma Tip:                                    │
-│  Understand your Prisma schema better with the          │
-│  Prisma VS Code Extension, for free!                    │
+    const prismaOutput = `Prisma schema loaded from prisma/schema.prisma
+Environment variables loaded from .env
+✔ Generated Prisma Client (v5.10.2) to ./node_modules/@prisma/client in 127ms
+┌─────────────────────────────────────────────────────────┐
+│  Starter Prisma Tip:                                    │
+│  Understand your Prisma schema better with the          │
+│  Prisma VS Code Extension, for free!                    │
 └─────────────────────────────────────────────────────────┘`;
     it('keeps important output lines', () => {
         const out = preprocessForTool(prismaOutput, 'Bash');
@@ -551,18 +638,18 @@ describe('preprocessForTool - gh pr checks', () => {
 });
 // ── Playwright ────────────────────────────────────────────────────────────────
 describe('preprocessForTool - playwright', () => {
-    const withFail = `Running 5 tests using 2 workers
-  ✘ tests/login.spec.ts:12:5 › Login › should log in [chromium] (5.2s)
-  Error: Timed out 5000ms waiting for expect(locator).toBeVisible()
-  Locator: getByRole('button', { name: 'Submit' })
-  Expected: visible
-  Received: hidden
-    at tests/login.spec.ts:15:22
-  ✓ tests/home.spec.ts:5:5 › Home › loads [chromium] (1.1s)
+    const withFail = `Running 5 tests using 2 workers
+  ✘ tests/login.spec.ts:12:5 › Login › should log in [chromium] (5.2s)
+  Error: Timed out 5000ms waiting for expect(locator).toBeVisible()
+  Locator: getByRole('button', { name: 'Submit' })
+  Expected: visible
+  Received: hidden
+    at tests/login.spec.ts:15:22
+  ✓ tests/home.spec.ts:5:5 › Home › loads [chromium] (1.1s)
   1 failed, 4 passed (12s)`;
     it('keeps failure blocks', () => {
         const out = preprocessForTool(withFail, 'Bash');
@@ -580,11 +667,11 @@ describe('preprocessForTool - playwright', () => {
 });
 // ── Python / pytest ───────────────────────────────────────────────────────────
 describe('preprocessForTool - python traceback', () => {
-    const traceback = `Traceback (most recent call last):
-  File "app.py", line 42, in process
-    result = calculate(x)
-  File "app.py", line 17, in calculate
-    return x / 0
+    const traceback = `Traceback (most recent call last):
+  File "app.py", line 42, in process
+    result = calculate(x)
+  File "app.py", line 17, in calculate
+    return x / 0
 ZeroDivisionError: division by zero`;
     it('keeps traceback lines', () => {
         const out = preprocessForTool(traceback, 'Bash');
@@ -600,11 +687,11 @@ ZeroDivisionError: division by zero`;
 });
 // ── Go test ───────────────────────────────────────────────────────────────────
 describe('preprocessForTool - go test', () => {
-    const goOutput = `--- PASS: TestAdd (0.00s)
---- FAIL: TestDivide (0.00s)
-    calc_test.go:15: expected 5, got 0
---- PASS: TestMultiply (0.00s)
-FAIL
+    const goOutput = `--- PASS: TestAdd (0.00s)
+--- FAIL: TestDivide (0.00s)
+    calc_test.go:15: expected 5, got 0
+--- PASS: TestMultiply (0.00s)
+FAIL
 FAIL\tgithub.com/user/calc\t0.003s`;
     it('keeps failure lines', () => {
         const out = preprocessForTool(goOutput, 'Bash');
@@ -623,18 +710,18 @@ FAIL\tgithub.com/user/calc\t0.003s`;
 });
 // ── Terraform ─────────────────────────────────────────────────────────────────
 describe('preprocessForTool - terraform', () => {
-    const planOutput = `Terraform will perform the following actions:
-  # aws_instance.web will be created
-  + resource "aws_instance" "web" {
-      + ami           = "ami-0c55b159cbfafe1f0"
-      + instance_type = "t2.micro"
-      ... (many attributes)
-    }
-  # aws_s3_bucket.data must be replaced
-  -/+ resource "aws_s3_bucket" "data" {
+    const planOutput = `Terraform will perform the following actions:
+  # aws_instance.web will be created
+  + resource "aws_instance" "web" {
+      + ami           = "ami-0c55b159cbfafe1f0"
+      + instance_type = "t2.micro"
+      ... (many attributes)
+    }
+  # aws_s3_bucket.data must be replaced
+  -/+ resource "aws_s3_bucket" "data" {
 Plan: 1 to add, 0 to change, 1 to destroy.`;
     it('keeps resource change summary lines', () => {
         const out = preprocessForTool(planOutput, 'Bash');