@sliday/tamp 0.1.0 → 0.2.0
- package/README.md +235 -0
- package/bin/tamp.js +15 -9
- package/compress.js +17 -47
- package/config.js +5 -0
- package/index.js +15 -7
- package/package.json +15 -4
- package/providers.js +147 -0
- package/stats.js +14 -6
package/README.md
ADDED

# Tamp

**Token compression proxy for coding agents.** 33.9% fewer input tokens, zero code changes. Works with Claude Code, Aider, Cursor, Cline, Windsurf, and any OpenAI-compatible agent.

```bash
npx @sliday/tamp
```

Or install globally:

```bash
curl -fsSL https://tamp.dev/setup.sh | bash
```

## How It Works

Tamp auto-detects your agent's API format and compresses tool result blocks before forwarding upstream. Source code, error results, and non-JSON content pass through untouched.

```
Claude Code ──►  Tamp (localhost:7778) ──► Anthropic API
Aider/Cursor ──►        │              ──► OpenAI API
Gemini CLI ────►        │              ──► Google AI API
                        │
                        ├─ JSON → minify whitespace
                        ├─ Arrays → TOON columnar encoding
                        ├─ Line-numbered → strip prefixes + minify
                        ├─ Source code → passthrough
                        └─ Errors → skip
```

### Supported API Formats

| Format | Endpoint | Agents |
|--------|----------|--------|
| Anthropic Messages | `POST /v1/messages` | Claude Code |
| OpenAI Chat Completions | `POST /v1/chat/completions` | Aider, Cursor, Cline, Windsurf, OpenCode |
| Google Gemini | `POST .../generateContent` | Gemini CLI |

### Compression Stages

| Stage | What it does | When it applies |
|-------|-------------|-----------------|
| `minify` | Strips JSON whitespace | Pretty-printed JSON objects/arrays |
| `toon` | Columnar [TOON encoding](https://github.com/nicholasgasior/toon-format) | Homogeneous arrays (file listings, routes, deps) |
| `llmlingua` | Neural text compression via [LLMLingua](https://github.com/microsoft/LLMLingua) sidecar | Natural language text (requires sidecar) |

Only `minify` is enabled by default. Enable more with `TOONA_STAGES=minify,toon`.
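Conceptually, the default `minify` stage is just a parse/re-serialize round trip. A minimal sketch (illustrative only; the function name and return shape here are assumptions, not Tamp's actual `compress.js`):

```javascript
// Hypothetical sketch of the minify stage: re-serialize pretty-printed
// JSON without whitespace. Returns null when the input is not JSON or
// is already compact, so the proxy falls back to passthrough.
function minifyStage(text) {
  let parsed
  try {
    parsed = JSON.parse(text)
  } catch {
    return null // not JSON → passthrough
  }
  const compact = JSON.stringify(parsed)
  if (compact.length >= text.length) return null // already minified → skip
  return { text: compact, method: 'minify' }
}

const pretty = JSON.stringify({ name: 'tamp', deps: ['toon'] }, null, 2)
console.log(minifyStage(pretty).text) // single-line JSON, fewer characters
```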

## Quick Start

### 1. Start the proxy

```bash
npx @sliday/tamp
```

```
┌─ Tamp ─────────────────────────────────┐
│  Proxy:   http://localhost:7778        │
│  Status:  ● Ready                      │
│                                        │
│  Claude Code:                          │
│    ANTHROPIC_BASE_URL=http://localhost:7778
│                                        │
│  Aider / Cursor / Cline:               │
│    OPENAI_BASE_URL=http://localhost:7778
└────────────────────────────────────────┘
```

### 2. Point your agent at the proxy

**Claude Code:**
```bash
export ANTHROPIC_BASE_URL=http://localhost:7778
claude
```

**Aider:**
```bash
export OPENAI_API_BASE=http://localhost:7778
aider
```

**Cursor / Cline / Windsurf:**
Set the API base URL to `http://localhost:7778` in your editor's settings.

That's it. Use your agent as normal — Tamp compresses silently in the background.

## Configuration

All configuration is via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `TOONA_PORT` | `7778` | Proxy listen port |
| `TOONA_UPSTREAM` | `https://api.anthropic.com` | Default upstream API URL |
| `TOONA_UPSTREAM_OPENAI` | `https://api.openai.com` | Upstream for OpenAI-format requests |
| `TOONA_UPSTREAM_GEMINI` | `https://generativelanguage.googleapis.com` | Upstream for Gemini-format requests |
| `TOONA_STAGES` | `minify` | Comma-separated compression stages |
| `TOONA_MIN_SIZE` | `200` | Minimum content size (chars) to attempt compression |
| `TOONA_LOG` | `true` | Enable request logging to stderr |
| `TOONA_LOG_FILE` | _(none)_ | Write logs to file |
| `TOONA_MAX_BODY` | `10485760` | Max request body size (bytes) before passthrough |
| `TOONA_LLMLINGUA_URL` | _(none)_ | LLMLingua sidecar URL for text compression |

### Recommended setup

```bash
# Maximum compression
TOONA_STAGES=minify,toon npx @sliday/tamp
```

## Installation Methods

### npx (no install)

```bash
npx @sliday/tamp
```

### npm global

```bash
npm install -g @sliday/tamp
tamp
```

### Git clone

```bash
git clone https://github.com/sliday/tamp.git
cd tamp && npm install
node bin/tamp.js
```

### One-line installer

```bash
curl -fsSL https://tamp.dev/setup.sh | bash
```

The installer clones to `~/.tamp`, adds `ANTHROPIC_BASE_URL` to your shell profile, and creates a `tamp` alias.

## What Gets Compressed

Tamp only compresses the **last user message** in each request (the most recent `tool_result` blocks). Historical messages are left untouched to avoid redundant recompression.

| Content Type | Action | Example |
|-------------|--------|---------|
| Pretty-printed JSON | Minify whitespace | `package.json`, config files |
| JSON with line numbers | Strip prefixes + minify | Read tool output (` 1→{...}`) |
| Homogeneous JSON arrays | TOON encode | File listings, route tables, dependencies |
| Already-minified JSON | Skip | Single-line JSON |
| Source code (text) | Passthrough | `.ts`, `.py`, `.rs` files |
| `is_error: true` results | Skip entirely | Error tool results |
| TOON-encoded content | Skip | Already compressed |
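The decision table above can be read as a small classifier. A simplified sketch for illustration (an approximation; Tamp's real logic lives in `detect.js` and may differ):

```javascript
// Simplified content classifier mirroring the table above.
// Illustrative approximation only, not Tamp's actual detect.js.
function classify(text) {
  const trimmed = text.trim()
  // Line-numbered Read output, e.g. "  1→{..." or "  1\t{..."
  if (/^\d+[→\t]/.test(trimmed)) return 'line-numbered'
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed)
      // Single-line JSON is already minified → skip
      return trimmed.includes('\n') ? 'json' : 'minified-json'
    } catch {
      return 'text'
    }
  }
  return 'text' // source code and prose pass through untouched
}

console.log(classify('{\n  "a": 1\n}')) // 'json'
```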

## Architecture

```
bin/tamp.js    CLI entry point
index.js       HTTP proxy server
providers.js   API format adapters (Anthropic, OpenAI, Gemini) + auto-detection
compress.js    Compression pipeline (compressRequest, compressText)
detect.js      Content classification (classifyContent, tryParseJSON, stripLineNumbers)
config.js      Environment-based configuration
stats.js       Session statistics and request logging
setup.sh       One-line installer script
```

### How the proxy works

1. `detectProvider()` auto-detects the API format from the request path
2. Unrecognized requests are piped through unmodified
3. Matched requests are buffered, parsed, and tool results are extracted via the provider adapter
4. Extracted blocks are classified and compressed
5. The modified body is forwarded to the correct upstream with updated `Content-Length`
6. The upstream response is streamed back to the client unmodified

Bodies exceeding `TOONA_MAX_BODY` are piped through without buffering.
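In Node terms, the buffering step (3) looks roughly like this. The `collectBody` helper below is hypothetical; the real `index.js` tracks `chunks`, `size`, and `overflow` inline:

```javascript
// Sketch of the buffer-or-passthrough decision: collect the request
// body up to maxBody bytes; resolve null once the limit is exceeded
// so the caller can stream the request through instead.
function collectBody(req, maxBody) {
  return new Promise((resolve, reject) => {
    const chunks = []
    let size = 0
    req.on('data', (chunk) => {
      size += chunk.length
      if (size > maxBody) return resolve(null) // too large: pipe through
      chunks.push(chunk)
    })
    req.on('end', () => resolve(Buffer.concat(chunks)))
    req.on('error', reject)
  })
}
```

Once buffered, the body is parsed, compressed, re-serialized, and forwarded with a recalculated `Content-Length` (step 5).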

## Benchmarking

The `bench/` directory contains a reproducible A/B benchmark that measures actual token savings via OpenRouter:

```bash
OPENROUTER_API_KEY=... node bench/runner.js   # 70 API calls, ~2 min
node bench/analyze.js                         # Statistical analysis
node bench/render.js                          # White paper (HTML + PDF)
```

Seven scenarios cover the full range: small and large JSON, tabular data, source code, multi-turn conversations, line-numbered output, and error results. Each runs 5 times for statistical confidence (95% CI via Student's t-distribution).

Results are written to `bench/results/` (gitignored).

## Development

```bash
# Run tests
npm test

# Smoke test (spins up proxy + echo server, validates compression)
node smoke.js

# Run a specific test file
node --test test/compress.test.js
```

### Test files

```
test/compress.test.js    Compression pipeline tests (Anthropic + OpenAI formats)
test/providers.test.js   Provider adapter + auto-detection tests
test/detect.test.js      Content classification tests
test/config.test.js      Configuration loading tests
test/proxy.test.js       HTTP proxy integration tests
test/stats.test.js       Statistics and logging tests
test/fixtures/           Sample API payloads
```

## How Token Savings Work

Claude Code sends the full conversation history on every API call. As a session progresses, tool results accumulate — file contents, directory listings, command outputs — all re-sent as input tokens on each request.

At $3/million input tokens (Sonnet 4), a 200-request session consuming 3M input tokens costs $9. If 60% of tool results are compressible JSON, and compression removes 30-50% of those tokens, that's $1.60-2.70 saved per session.

For teams with 5 developers doing 2 sessions/day, that's $500-800/month in savings.
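Spelled out, the arithmetic above (using this section's illustrative numbers, not measured data):

```javascript
// Worked version of the savings estimate: $3 per million input tokens,
// 3M input tokens per session, 60% of tool-result tokens compressible,
// 30-50% of those removed by compression.
const pricePerToken = 3 / 1_000_000          // Sonnet 4 input price, $/token
const tokensPerSession = 3_000_000
const sessionCost = tokensPerSession * pricePerToken // $9

function savingsPerSession(compressibleShare, removalRate) {
  return sessionCost * compressibleShare * removalRate
}

const low = savingsPerSession(0.6, 0.3)  // ≈ $1.62
const high = savingsPerSession(0.6, 0.5) // ≈ $2.70

// 5 developers × 2 sessions/day × 30 days ≈ 300 sessions/month
const sessionsPerMonth = 5 * 2 * 30
console.log(low * sessionsPerMonth, high * sessionsPerMonth) // ≈ $486-810/month
```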

## License

MIT

## Author

[Stas Kulesh](mailto:stas@sliday.com) — [sliday.com](https://sliday.com)
package/bin/tamp.js
CHANGED
@@ -4,17 +4,23 @@ import { createProxy } from '../index.js'
 const { config, server } = createProxy()
 
 server.listen(config.port, () => {
+  const url = `http://localhost:${config.port}`
   console.error('')
-  console.error(' ┌─ Tamp
-  console.error(` │ Proxy:
-  console.error(' │ Status: ● Ready
-  console.error(' │
-  console.error(' │
-  console.error(` │
-  console.error(' │
-  console.error('
+  console.error(' ┌─ Tamp ─────────────────────────────────┐')
+  console.error(` │ Proxy: ${url} │`)
+  console.error(' │ Status: ● Ready │')
+  console.error(' │ │')
+  console.error(' │ Claude Code: │')
+  console.error(` │ ANTHROPIC_BASE_URL=${url} │`)
+  console.error(' │ │')
+  console.error(' │ Aider / Cursor / Cline: │')
+  console.error(` │ OPENAI_BASE_URL=${url} │`)
+  console.error(' └────────────────────────────────────────┘')
   console.error('')
-  console.error(`
+  console.error(`  Upstreams:`)
+  console.error(`    anthropic → ${config.upstreams.anthropic}`)
+  console.error(`    openai → ${config.upstreams.openai}`)
+  console.error(`    gemini → ${config.upstreams.gemini}`)
   console.error(`  Stages: ${config.stages.join(', ')}`)
   console.error('')
 })
package/compress.js
CHANGED
@@ -1,5 +1,7 @@
 import { encode } from '@toon-format/toon'
+import { countTokens } from '@anthropic-ai/tokenizer'
 import { tryParseJSON, classifyContent, stripLineNumbers } from './detect.js'
+import { anthropic } from './providers.js'
 
 export function compressText(text, config) {
   if (text.length < config.minSize) return null
@@ -31,7 +33,7 @@ export function compressText(text, config) {
     } catch { /* fall back to minified */ }
   }
 
-  return { text: best.text, method: best.method, originalLen: text.length, compressedLen: best.text.length }
+  return { text: best.text, method: best.method, originalLen: text.length, compressedLen: best.text.length, originalTokens: countTokens(text), compressedTokens: countTokens(best.text) }
 }
 
 async function compressWithLLMLingua(text, config) {
@@ -47,7 +49,7 @@ async function compressWithLLMLingua(text, config) {
     clearTimeout(timeout)
     if (!res.ok) return null
     const data = await res.json()
-    return { text: data.text, method: 'llmlingua', originalLen: text.length, compressedLen: data.text.length }
+    return { text: data.text, method: 'llmlingua', originalLen: text.length, compressedLen: data.text.length, originalTokens: countTokens(text), compressedTokens: countTokens(data.text) }
   } catch {
     return null
   }
@@ -61,54 +63,22 @@ async function compressBlock(text, config) {
   return sync
 }
 
-export async function
+export async function compressRequest(body, config, provider) {
+  const targets = provider.extract(body)
   const stats = []
-
-
-
-  for (let i = body.messages.length - 1; i >= 0; i--) {
-    if (body.messages[i].role === 'user') { lastUserIdx = i; break }
-  }
-  if (lastUserIdx === -1) return { body, stats }
-
-  const msg = body.messages[lastUserIdx]
-  const debug = config.log
-
-  if (typeof msg.content === 'string') {
-    const result = await compressBlock(msg.content, config)
+  for (const target of targets) {
+    if (target.skip) { stats.push({ index: target.index, skipped: target.skip }); continue }
+    const result = await compressBlock(target.text, config)
     if (result) {
-
-      stats.push({ index:
-    }
-  } else if (Array.isArray(msg.content)) {
-    for (let i = 0; i < msg.content.length; i++) {
-      const block = msg.content[i]
-      if (block.type !== 'tool_result') continue
-      if (block.is_error) { stats.push({ index: i, skipped: 'error' }); continue }
-
-      if (typeof block.content === 'string') {
-        if (debug) {
-          const cls = classifyContent(block.content)
-          const len = block.content.length
-          console.error(`[toona] debug block[${i}]: type=${cls} len=${len} tool_use_id=${block.tool_use_id || '?'}`)
-        }
-        const result = await compressBlock(block.content, config)
-        if (result) { block.content = result.text; stats.push({ index: i, ...result }) }
-      } else if (Array.isArray(block.content)) {
-        for (const sub of block.content) {
-          if (sub.type === 'text') {
-            if (debug) {
-              const cls = classifyContent(sub.text)
-              const len = sub.text.length
-              console.error(`[toona] debug sub-block: type=${cls} len=${len}`)
-            }
-            const result = await compressBlock(sub.text, config)
-            if (result) { sub.text = result.text; stats.push({ index: i, ...result }) }
-          }
-        }
-      }
+      target.compressed = result.text
+      stats.push({ index: target.index, ...result })
     }
   }
-
+  provider.apply(body, targets)
   return { body, stats }
 }
+
+export async function compressMessages(body, config) {
+  if (!body?.messages?.length) return { body, stats: [] }
+  return compressRequest(body, config, anthropic)
+}
package/config.js
CHANGED
@@ -3,6 +3,11 @@ export function loadConfig(env = process.env) {
   return Object.freeze({
     port: parseInt(env.TOONA_PORT, 10) || 7778,
     upstream: env.TOONA_UPSTREAM || 'https://api.anthropic.com',
+    upstreams: Object.freeze({
+      anthropic: env.TOONA_UPSTREAM || 'https://api.anthropic.com',
+      openai: env.TOONA_UPSTREAM_OPENAI || 'https://api.openai.com',
+      gemini: env.TOONA_UPSTREAM_GEMINI || 'https://generativelanguage.googleapis.com',
+    }),
     minSize: parseInt(env.TOONA_MIN_SIZE, 10) || 200,
     stages,
     log: env.TOONA_LOG !== 'false',
package/index.js
CHANGED
@@ -1,11 +1,16 @@
 import http from 'node:http'
 import https from 'node:https'
 import { loadConfig } from './config.js'
-import {
+import { compressRequest } from './compress.js'
+import { detectProvider } from './providers.js'
 import { createSession, formatRequestLog } from './stats.js'
 
 export function createProxy(overrides = {}) {
-  const
+  const base = loadConfig()
+  const config = { ...base, ...overrides }
+  if (overrides.upstream && !overrides.upstreams) {
+    config.upstreams = { anthropic: overrides.upstream, openai: overrides.upstream, gemini: overrides.upstream }
+  }
   const session = createSession()
   return { config, session, server: _createServer(config, session) }
 }
@@ -81,13 +86,16 @@ function pipeRequest(req, res, upstreamUrl, prefixChunks) {
 
   return http.createServer(async (req, res) => {
     if (config.log) console.error(`[tamp] ${req.method} ${req.url}`)
-    const
-    const isMessages = req.method === 'POST' && req.url.startsWith('/v1/messages')
+    const provider = detectProvider(req.method, req.url)
 
-    if (!
+    if (!provider) {
+      const upstreamUrl = new URL(req.url, config.upstream)
       return pipeRequest(req, res, upstreamUrl)
    }
 
+    const upstream = config.upstreams?.[provider.name] || config.upstream
+    const upstreamUrl = new URL(req.url, upstream)
+
    const chunks = []
    let size = 0
    let overflow = false
@@ -113,12 +121,12 @@ return http.createServer(async (req, res) => {
 
    try {
      const parsed = JSON.parse(rawBody.toString('utf-8'))
-      const { body, stats } = await
+      const { body, stats } = await compressRequest(parsed, config, provider)
      finalBody = Buffer.from(JSON.stringify(body), 'utf-8')
 
      if (config.log && stats.length) {
        session.record(stats)
-        console.error(formatRequestLog(stats, session))
+        console.error(formatRequestLog(stats, session, provider.name, req.url))
      }
    } catch (err) {
      if (config.log) console.error(`[tamp] passthrough (parse error): ${err.message}`)
package/package.json
CHANGED
@@ -6,10 +6,11 @@
     "compress.js",
     "config.js",
     "detect.js",
+    "providers.js",
     "stats.js"
   ],
-  "version": "0.
-  "description": "Token compression proxy for Claude Code.
+  "version": "0.2.0",
+  "description": "Token compression proxy for coding agents. Works with Claude Code, Aider, Cursor, Cline, Windsurf. 33.9% fewer input tokens.",
   "type": "module",
   "main": "index.js",
   "bin": {
@@ -17,9 +18,18 @@
   },
   "scripts": {
     "start": "node bin/tamp.js",
-    "test": "node --test test/*.test.js"
+    "test": "node --test test/*.test.js",
+    "test:sidecar": "node test/capture-golden.js --verify",
+    "bench:semantic": "node bench/semantic-eval.js"
   },
-  "keywords": [
+  "keywords": [
+    "claude",
+    "anthropic",
+    "proxy",
+    "compression",
+    "tokens",
+    "llm"
+  ],
   "author": "Stas Kulesh <stas@sliday.com>",
   "license": "MIT",
   "repository": {
@@ -28,6 +38,7 @@
   },
   "homepage": "https://github.com/sliday/tamp",
   "dependencies": {
+    "@anthropic-ai/tokenizer": "^0.0.4",
     "@toon-format/toon": "^2.1.0"
   }
 }
package/providers.js
ADDED
const anthropic = {
  name: 'anthropic',
  match(method, url) {
    return method === 'POST' && url.startsWith('/v1/messages')
  },
  extract(body) {
    const targets = []
    if (!body?.messages?.length) return targets

    let lastUserIdx = -1
    for (let i = body.messages.length - 1; i >= 0; i--) {
      if (body.messages[i].role === 'user') { lastUserIdx = i; break }
    }
    if (lastUserIdx === -1) return targets

    const msg = body.messages[lastUserIdx]

    if (typeof msg.content === 'string') {
      targets.push({ path: ['messages', lastUserIdx, 'content'], text: msg.content })
    } else if (Array.isArray(msg.content)) {
      for (let i = 0; i < msg.content.length; i++) {
        const block = msg.content[i]
        if (block.type !== 'tool_result') continue
        if (block.is_error) { targets.push({ skip: 'error', index: i }); continue }

        if (typeof block.content === 'string') {
          targets.push({ path: ['messages', lastUserIdx, 'content', i, 'content'], text: block.content, index: i })
        } else if (Array.isArray(block.content)) {
          for (let j = 0; j < block.content.length; j++) {
            const sub = block.content[j]
            if (sub.type === 'text') {
              targets.push({ path: ['messages', lastUserIdx, 'content', i, 'content', j, 'text'], text: sub.text, index: i })
            }
          }
        }
      }
    }
    return targets
  },
  apply(body, targets) {
    for (const t of targets) {
      if (t.skip || !t.compressed) continue
      let obj = body
      const path = t.path
      for (let i = 0; i < path.length - 1; i++) obj = obj[path[i]]
      obj[path[path.length - 1]] = t.compressed
    }
  },
}

const openai = {
  name: 'openai',
  match(method, url) {
    return method === 'POST' && url.startsWith('/v1/chat/completions')
  },
  extract(body) {
    const targets = []
    if (!body?.messages?.length) return targets

    // Find last assistant message with tool_calls
    let lastAssistantIdx = -1
    for (let i = body.messages.length - 1; i >= 0; i--) {
      if (body.messages[i].role === 'assistant' && body.messages[i].tool_calls?.length) {
        lastAssistantIdx = i
        break
      }
    }
    if (lastAssistantIdx === -1) return targets

    // Collect all subsequent role:tool messages
    for (let i = lastAssistantIdx + 1; i < body.messages.length; i++) {
      const msg = body.messages[i]
      if (msg.role !== 'tool') break
      if (typeof msg.content === 'string') {
        targets.push({ path: ['messages', i, 'content'], text: msg.content, index: i })
      }
    }
    return targets
  },
  apply(body, targets) {
    for (const t of targets) {
      if (t.skip || !t.compressed) continue
      let obj = body
      const path = t.path
      for (let i = 0; i < path.length - 1; i++) obj = obj[path[i]]
      obj[path[path.length - 1]] = t.compressed
    }
  },
}

const gemini = {
  name: 'gemini',
  match(method, url) {
    return method === 'POST' && url.includes('generateContent')
  },
  extract(body) {
    const targets = []
    if (!body?.contents?.length) return targets

    // Find last content with functionResponse parts
    for (let ci = body.contents.length - 1; ci >= 0; ci--) {
      const content = body.contents[ci]
      if (!content.parts?.length) continue
      for (let pi = 0; pi < content.parts.length; pi++) {
        const part = content.parts[pi]
        if (!part.functionResponse?.response) continue
        const resp = part.functionResponse.response
        const text = typeof resp === 'string' ? resp : JSON.stringify(resp, null, 2)
        targets.push({
          path: ['contents', ci, 'parts', pi, 'functionResponse', 'response'],
          text,
          index: pi,
          wasObject: typeof resp !== 'string',
        })
      }
      if (targets.length) break
    }
    return targets
  },
  apply(body, targets) {
    for (const t of targets) {
      if (t.skip || !t.compressed) continue
      let obj = body
      const path = t.path
      for (let i = 0; i < path.length - 1; i++) obj = obj[path[i]]
      // If original was object, try to parse compressed back to object
      if (t.wasObject) {
        try {
          obj[path[path.length - 1]] = JSON.parse(t.compressed)
          continue
        } catch { /* fall through to string */ }
      }
      obj[path[path.length - 1]] = t.compressed
    }
  },
}

const providers = [anthropic, openai, gemini]

export function detectProvider(method, url) {
  for (const p of providers) {
    if (p.match(method, url)) return p
  }
  return null
}

export { anthropic, openai, gemini }
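The new auto-detection in providers.js reduces to first-match routing on method and path. A standalone condensation of that logic, for illustration:

```javascript
// Condensed version of detectProvider from providers.js: each adapter
// matches on HTTP method + URL path, and the first match wins.
const providers = [
  { name: 'anthropic', match: (m, u) => m === 'POST' && u.startsWith('/v1/messages') },
  { name: 'openai', match: (m, u) => m === 'POST' && u.startsWith('/v1/chat/completions') },
  { name: 'gemini', match: (m, u) => m === 'POST' && u.includes('generateContent') },
]

function detectProvider(method, url) {
  for (const p of providers) {
    if (p.match(method, url)) return p
  }
  return null // unrecognized → the proxy pipes the request through unmodified
}

console.log(detectProvider('POST', '/v1/messages').name) // anthropic
```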
package/stats.js
CHANGED
@@ -1,27 +1,33 @@
-export function formatRequestLog(stats, session) {
+export function formatRequestLog(stats, session, providerName, url) {
   const compressed = stats.filter(s => s.method)
   const skipped = stats.filter(s => s.skipped)
-  const
+  const label = providerName || 'anthropic'
+  const path = url || '/v1/messages'
+  const lines = [`[toona] ${label} ${path} — ${stats.length} blocks, ${compressed.length} compressed`]
 
   for (const s of stats) {
     if (s.skipped) {
       lines.push(`[toona] block[${s.index}]: skipped (${s.skipped})`)
     } else if (s.method) {
       const pct = (((s.originalLen - s.compressedLen) / s.originalLen) * 100).toFixed(1)
-
+      const tokInfo = s.originalTokens ? ` ${s.originalTokens}->${s.compressedTokens} tok` : ''
+      lines.push(`[toona] block[${s.index}]: ${s.originalLen}->${s.compressedLen} chars (-${pct}%)${tokInfo} [${s.method}]`)
     }
   }
 
   const totalOrig = compressed.reduce((a, s) => a + s.originalLen, 0)
   const totalComp = compressed.reduce((a, s) => a + s.compressedLen, 0)
+  const totalOrigTok = compressed.reduce((a, s) => a + (s.originalTokens || 0), 0)
+  const totalCompTok = compressed.reduce((a, s) => a + (s.compressedTokens || 0), 0)
   if (compressed.length > 0) {
     const pct = (((totalOrig - totalComp) / totalOrig) * 100).toFixed(1)
-
+    const tokPct = totalOrigTok > 0 ? (((totalOrigTok - totalCompTok) / totalOrigTok) * 100).toFixed(1) : '0.0'
+    lines.push(`[toona] total: ${totalOrig}->${totalComp} chars (-${pct}%), ${totalOrigTok}->${totalCompTok} tokens (-${tokPct}%)`)
   }
 
   if (session) {
     const totals = session.getTotals()
-    lines.push(`[toona] session: ${totals.totalSaved} chars saved across ${totals.compressionCount} compressions`)
+    lines.push(`[toona] session: ${totals.totalSaved} chars, ${totals.totalTokensSaved} tokens saved across ${totals.compressionCount} compressions`)
   }
 
   return lines.join('\n')
@@ -29,6 +35,7 @@ export function formatRequestLog(stats, session) {
 
 export function createSession() {
   let totalSaved = 0
+  let totalTokensSaved = 0
   let compressionCount = 0
 
   return {
@@ -36,12 +43,13 @@ export function createSession() {
       for (const s of stats) {
         if (s.method && s.originalLen && s.compressedLen) {
           totalSaved += s.originalLen - s.compressedLen
+          totalTokensSaved += (s.originalTokens || 0) - (s.compressedTokens || 0)
           compressionCount++
         }
       }
     },
     getTotals() {
-      return { totalSaved, compressionCount }
+      return { totalSaved, totalTokensSaved, compressionCount }
    },
  }
 }
|