runcap 0.1.1 → 0.2.0 (npm package version diff, via Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +34 -15
package/bin/runcap.mjs +79 -5
package/package.json +3 -3
package/src/alerts.mjs +145 -0
package/src/cloud.mjs +90 -0
package/src/compressor.mjs +169 -0
package/src/mission-control.mjs +486 -81

package/README.md CHANGED

@@ -2,30 +2,42 @@
 [![CI](https://github.com/kirder24-code/ai-agent-manager/actions/workflows/ci.yml/badge.svg)](https://github.com/kirder24-code/ai-agent-manager/actions/workflows/ci.yml)
-**Know what your coding agent will cost before you build it — and set a hard ceiling so it never surprises you.**
+![Runcap terminal demo: estimate, cap, compress, stop](docs/assets/demo.svg)
+**Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.**
 Runcap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.
-> Every other tool here is a rear-view mirror — it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.
+> Every other tool here is a rear-view mirror - it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.
 ## Why
-Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice — or the subscription limit — arrives.
+Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.
 Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend *before* it happens. Runcap does the one thing the rear-view mirror can't:
 ```text
-estimate before build  →  cap during run  →  rescue when stuck  →  verify it finished
+estimate before build  →  cap during run  →  compress every call  →  rescue when stuck
 ```
 ## The honest claim
-Runcap does **not** promise an exact cost oracle. Agent trajectories are stochastic — nobody, including the model labs, can predict the exact token count of a run. So Runcap gives you a **range plus a hard cap**:
+Runcap does **not** promise an exact cost oracle. Agent trajectories are stochastic - nobody, including the model labs, can predict the exact token count of a run. So Runcap gives you a **range plus a hard cap**:
-> "This build is roughly $3–7. Cap it at $10." — then it kills the run the second it hits the ceiling.
+> "This build is roughly $3-7. Cap it at $10." - then it kills the run the second it hits the ceiling.
 The range is the headline. The hard cap is the product.
+## Who this is for
+Runcap is a developer tool. It works by running a local gateway that your agent's API calls pass through, so it can price and cap them before they reach the paid provider. That means you need three things already in place:
+- **Your own provider API key** (OpenAI or Anthropic). Runcap does not sell or supply model access.
+- **Your own agent** - Claude Code, Codex, or any script that calls the OpenAI/Anthropic API.
+- **Comfort running a CLI** and a local process on your machine.
+If you have those, Runcap caps your spend in one command. If you are looking for a no-account web app that runs the AI for you, this is not that - it is a circuit breaker for a setup you already own.
 ## 60-second demo
 No API key required.
@@ -48,7 +60,7 @@ Fuel: 24% (medium confidence)
 Recommendation: Do not launch as one broad mission. Split into one vertical slice with a verification command.
 ```
-**2. Wrap a run — and get a rescue prompt the moment it gets stuck:**
+**2. Wrap a run - and get a rescue prompt the moment it gets stuck:**
 ```text
 $ runcap run --label demo -- npm run build
@@ -111,19 +123,25 @@ ANTHROPIC_API_KEY=sk-ant-... AIM_DAILY_BUDGET_USD=5 runcap gateway
 When spend crosses the ceiling, the next call returns `429 budget_guard` instead of money leaving your account. Try it with no key: `runcap gateway --mock`.
+## Token compression (built in, no extra deps)
+Every request that passes through the gateway is compressed before it's forwarded: embedded JSON is re-serialized compactly, long log/stack-trace dumps are collapsed to head + tail, and trailing whitespace is squeezed. This is **lossless by construction** - your prose instructions and code semantics are never altered, only machine "garbage" is trimmed. It's pure Node with **zero ML or native dependencies**, so it installs everywhere without the build pain heavier compressors have.
+The dashboard shows the result as one number: **"You saved $X · N tokens compressed · would have spent $Y."** Disable it with `AIM_COMPRESS=off` if you ever want raw passthrough.
 ## Pricing table
-Costs are calculated from a sourced multi-provider table — Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled — labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.
+Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.
 ## Trust model
 Runcap is built not to fake certainty. Every important output carries a truth label:
-- `observed` — git diff, exit code, file changes, terminal output;
-- `calculated` — parsed errors, diff hashes, stuck score, cost from the sourced price table;
-- `provider_usage` — token usage returned by the upstream provider;
-- `manual_calibration` — subscription % you entered before/after a run;
-- `unknown` — Runcap cannot honestly know.
+- `observed` - git diff, exit code, file changes, terminal output;
+- `calculated` - parsed errors, diff hashes, stuck score, cost from the sourced price table;
+- `provider_usage` - token usage returned by the upstream provider;
+- `manual_calibration` - subscription % you entered before/after a run;
+- `unknown` - Runcap cannot honestly know.
 If it cannot prove something, it says so.
@@ -132,14 +150,15 @@ If it cannot prove something, it says so.
 | Tier | Price | What you get |
 |---|---|---|
 | **OSS** (MIT, local) | $0 forever | All local runs, cost estimation, hard cap, run wrapping, stuck detection, rescue prompts, local dashboard. Never crippleware. |
+| **Founding Pro** (limited) | **$49 once** | Lifetime Pro at the founder price - pay once, keep Pro forever, before it moves to $19/mo. |
 | **Pro** | $19/mo | Cloud sync across machines, hosted dashboard, estimate-vs-actual trends, shareable reports, alerts on cap breach |
 | **Team** | $49/seat/mo | Shared budget pools, org-wide ceilings, per-project rollups, role-based caps |
-The local core is free forever. Only persistence, collaboration, and aggregation are paid — the things that only matter once data leaves your laptop.
+The local core is free forever. Only persistence, collaboration, and aggregation are paid - the things that only matter once data leaves your laptop.
 ## Current stage
-A working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, and proving rescue prompts save time. Not yet: a hosted cloud platform or a universal observability standard. It is not trying to replace Langfuse or LiteLLM — it does the thing they don't.
+A working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, and proving rescue prompts save time. Not yet: a hosted cloud platform or a universal observability standard. It is not trying to replace Langfuse or LiteLLM - it does the thing they don't.
 ## Documentation
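
The gateway behaviour described in the README hunk above (once spend crosses the ceiling, the next call gets `429 budget_guard`) can be sketched roughly as follows. This is a minimal illustration, not Runcap's gateway code, which is imported from `package/src/mission-control.mjs` and is not shown here. The function name `checkBudget`, its arguments, and the returned object shape are assumptions; only the `429` status and the `budget_guard` label come from the README.

```javascript
// Hypothetical sketch of a hard-cap check; names and shapes are assumptions.
function checkBudget(spentUsd, capUsd) {
  // No cap configured: nothing to enforce, let the request through.
  if (capUsd === null || capUsd === undefined) {
    return { forward: true };
  }
  // At or over the ceiling: refuse before the request reaches the provider.
  if (spentUsd >= capUsd) {
    return { forward: false, status: 429, error: "budget_guard" };
  }
  return { forward: true };
}

console.log(checkBudget(4.2, 5));  // below the cap: forwarded
console.log(checkBudget(5.01, 5)); // over the cap: blocked with 429
```

The check runs before the upstream call, which is what makes the cap a circuit breaker rather than a report: a blocked request never reaches the paid provider.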

package/bin/runcap.mjs CHANGED

@@ -16,19 +16,36 @@ import {
   startDashboard,
   startGateway,
   showStatus,
+  setBudgetCap,
+  clearBudgetCap,
+  currentBudgetCap,
+  hasStoredCap,
+  welcome,
   templates
 } from "../src/mission-control.mjs";
+import {
+  loginCommand,
+  logoutCommand,
+  whoamiCommand,
+  syncRun,
+  planToRun
+} from "../src/cloud.mjs";
+import { alertsCommand } from "../src/alerts.mjs";
 const args = process.argv.slice(2);
-const command = args[0] ?? "help";
+const command = args[0] ?? "welcome";
 function usage() {
   console.log(`Runcap — cap every agent run before it starts
 Usage:
-  runcap run [--label name] [--fuel-before 24] -- <command...>
-  runcap plan [--fuel 24] [--quality high|balanced|cheap] -- <goal...>
+  runcap run [--label name] [--cap|--no-cap] [--mock] -- <command...>
+                                 (auto-enforces your cap; no manual gateway/base-URL setup)
+  runcap plan [--fuel 24] [--quality high|balanced|cheap] [--apply-cap] -- <goal...>
   runcap plans
+  runcap cap <usd>               (set the hard cap the gateway enforces)
+  runcap cap show                (show the current cap)
+  runcap cap clear               (remove the stored cap)
   runcap preflight -- <command or prompt...>
   runcap status
   runcap list
@@ -40,6 +57,10 @@ Usage:
   runcap gateway [--port 8792] [--mock]
   runcap setup
   runcap doctor
+  runcap login <license-key>     (Pro: enable cloud sync + hosted dashboard)
+  runcap logout
+  runcap whoami
+  runcap alerts [list|add|test|clear]   (Pro: phone alerts when a run hits its cap)
   runcap fuel set <percent>
   runcap fuel calibrate <mission-id> <after-percent>
@@ -62,24 +83,50 @@ function takeOption(input, name) {
   return value;
 }
+function takeFlag(input, name) {
+  const index = input.indexOf(name);
+  if (index === -1) return false;
+  input.splice(index, 1);
+  return true;
+}
 try {
-  if (command === "help" || command === "--help" || command === "-h") {
+  if (command === "welcome") {
+    console.log(await welcome());
+  } else if (command === "help" || command === "--help" || command === "-h") {
     usage();
   } else if (command === "run") {
     const runArgs = args.slice(1);
     const label = takeOption(runArgs, "--label");
     const fuelBefore = takeOption(runArgs, "--fuel-before");
+    const forceCap = takeFlag(runArgs, "--cap");
+    const noCap = takeFlag(runArgs, "--no-cap");
+    const mock = takeFlag(runArgs, "--mock");
     const separator = runArgs.indexOf("--");
     const childArgs = separator === -1 ? runArgs : runArgs.slice(separator + 1);
     if (childArgs.length === 0) {
       throw new Error("Missing command after `aim run --`.");
     }
+    // Zero-config: auto-wrap with the cap gateway when a cap is set (or forced),
+    // unless explicitly disabled. No manual gateway start, no base-URL exports.
+    const capConfigured = Boolean(process.env.AIM_DAILY_BUDGET_USD) || hasStoredCap();
+    const autoGateway = !noCap && (forceCap || mock || capConfigured);
+    if (!autoGateway && !noCap && !capConfigured) {
+      console.log("runcap: no cap set, running without the gateway. Set one with `runcap cap <usd>` to enforce a budget.\n");
+    }
     const result = await runMission({
       command: childArgs,
       label,
-      fuelBefore: fuelBefore === undefined ? undefined : Number(fuelBefore)
+      fuelBefore: fuelBefore === undefined ? undefined : Number(fuelBefore),
+      autoGateway,
+      mock
     });
     console.log(result.summary);
+    if (result.capSummary) {
+      const c = result.capSummary;
+      const capLine = c.capUsd === null ? "no cap" : `cap $${c.capUsd.toFixed(2)}`;
+      console.log(`\nRuncap: cap enforced (${capLine}). This run spent ~$${c.spentThisRunUsd.toFixed(4)} (window total $${c.spentWindowUsd.toFixed(4)}).`);
+    }
   } else if (command === "preflight") {
     const runArgs = args.slice(1);
     const separator = runArgs.indexOf("--");
@@ -90,6 +137,9 @@ try {
     console.log(await preflightMission(childArgs));
   } else if (command === "plan") {
     const planArgs = args.slice(1);
+    const applyCapIndex = planArgs.indexOf("--apply-cap");
+    const applyCap = applyCapIndex !== -1;
+    if (applyCap) planArgs.splice(applyCapIndex, 1);
     const fuelPercent = takeOption(planArgs, "--fuel");
     const quality = takeOption(planArgs, "--quality") ?? "high";
     const separator = planArgs.indexOf("--");
@@ -117,6 +167,30 @@ try {
       `Report: .runcap/plans/${plan.id}/plan.md`,
       ""
     ].join("\n"));
+    if (applyCap) {
+      console.log(await setBudgetCap(plan.budget.recommendedCapUsd, { source: `plan:${plan.id}` }));
+      console.log("");
+    }
+    const sync = await syncRun(planToRun(plan));
+    if (sync === "synced") console.log("Cloud: synced to your Runcap Pro dashboard.");
+    else if (sync && sync.startsWith("sync_failed")) console.log(`Cloud: ${sync}`);
+  } else if (command === "login") {
+    console.log(await loginCommand(args[1]));
+  } else if (command === "logout") {
+    console.log(await logoutCommand());
+  } else if (command === "whoami") {
+    console.log(await whoamiCommand());
+  } else if (command === "alerts") {
+    console.log(await alertsCommand(args.slice(1)));
+  } else if (command === "cap") {
+    const sub = args[1];
+    if (sub === undefined || sub === "show") {
+      console.log(currentBudgetCap());
+    } else if (sub === "clear") {
+      console.log(await clearBudgetCap());
+    } else {
+      console.log(await setBudgetCap(sub));
+    }
   } else if (command === "plans") {
     console.log(await listPlans());
   } else if (command === "status") {
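
One thing worth checking in the `run` branch above: `takeFlag` removes the first occurrence of a flag anywhere in the argument list, and it is called before the `--` separator is located. As far as this hunk shows, a wrapped command's own `--mock` or `--cap` argument would therefore be consumed by `runcap run`. The sketch below reproduces `takeFlag` as it appears in the diff and contrasts it with a separator-aware variant. `takeFlagBeforeSeparator` is a hypothetical name and is not part of the package.

```javascript
// As shown in the diff: removes the first match anywhere in the array.
function takeFlag(input, name) {
  const index = input.indexOf(name);
  if (index === -1) return false;
  input.splice(index, 1);
  return true;
}

// Hypothetical variant: only consumes flags that appear before "--",
// so arguments that belong to the wrapped command are left alone.
function takeFlagBeforeSeparator(input, name) {
  const separator = input.indexOf("--");
  const limit = separator === -1 ? input.length : separator;
  const index = input.indexOf(name);
  if (index === -1 || index >= limit) return false;
  input.splice(index, 1);
  return true;
}

// The child command has its own --mock flag after the separator.
const argsA = ["--label", "demo", "--", "mytool", "--mock"];
const argsB = [...argsA];

console.log(takeFlag(argsA, "--mock"), argsA);
console.log(takeFlagBeforeSeparator(argsB, "--mock"), argsB);
```

With the first helper the child loses its `--mock` argument and Runcap switches to mock mode; with the second the child's arguments pass through unchanged.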

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "runcap",
-  "version": "0.1.1",
+  "version": "0.2.0",
   "description": "Cap every agent run before it starts: estimate cost, set a hard ceiling that stops the run, rescue stuck agents. Local, MIT, nothing uploaded.",
   "license": "MIT",
   "type": "module",
@@ -34,8 +34,8 @@
     "LICENSE"
   ],
   "bin": {
-    "runcap": "./bin/runcap.mjs",
-    "aim": "./bin/runcap.mjs"
+    "runcap": "bin/runcap.mjs",
+    "aim": "bin/runcap.mjs"
   },
   "scripts": {
     "setup": "node ./bin/runcap.mjs setup",

package/src/alerts.mjs ADDED

@@ -0,0 +1,145 @@
+import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { existsSync } from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { readLicense } from "./cloud.mjs";
+const CONFIG_DIR = path.join(os.homedir(), ".runcap");
+const ALERTS_FILE = path.join(CONFIG_DIR, "alerts.json");
+async function readAlerts() {
+  if (!existsSync(ALERTS_FILE)) return { channels: [] };
+  try {
+    const raw = await readFile(ALERTS_FILE, "utf8");
+    const data = JSON.parse(raw);
+    return { channels: Array.isArray(data.channels) ? data.channels : [] };
+  } catch {
+    return { channels: [] };
+  }
+}
+async function writeAlerts(config) {
+  await mkdir(CONFIG_DIR, { recursive: true });
+  await writeFile(ALERTS_FILE, JSON.stringify(config, null, 2));
+}
+function describeChannel(c) {
+  if (c.type === "telegram") return `telegram (chat ${c.chatId})`;
+  if (c.type === "whatsapp") return `whatsapp (${c.phone})`;
+  if (c.type === "webhook") return `webhook (${c.url})`;
+  return c.type;
+}
+async function deliverToChannel(channel, text) {
+  if (channel.type === "telegram") {
+    const resp = await fetch(`https://api.telegram.org/bot${channel.token}/sendMessage`, {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify({ chat_id: channel.chatId, text })
+    });
+    return resp.ok;
+  }
+  if (channel.type === "whatsapp") {
+    // CallMeBot free WhatsApp API: user supplies their own phone + apikey.
+    const url = `https://api.callmebot.com/whatsapp.php?phone=${encodeURIComponent(channel.phone)}&text=${encodeURIComponent(text)}&apikey=${encodeURIComponent(channel.apikey)}`;
+    const resp = await fetch(url);
+    return resp.ok;
+  }
+  if (channel.type === "webhook") {
+    // Send both keys so Slack ("text") and Discord ("content") both work.
+    const resp = await fetch(channel.url, {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify({ text, content: text })
+    });
+    return resp.ok;
+  }
+  return false;
+}
+// Best-effort, never throws into the caller. Pro-gated: requires a stored license.
+export async function sendAlert(text) {
+  const license = await readLicense();
+  if (!license) return null; // free tier: no alerts
+  const { channels } = await readAlerts();
+  if (!channels.length) return null;
+  const results = [];
+  for (const ch of channels) {
+    try {
+      const ok = await deliverToChannel(ch, text);
+      results.push(ok ? describeChannel(ch) : `${describeChannel(ch)} (failed)`);
+    } catch (err) {
+      results.push(`${describeChannel(ch)} (error: ${err.message})`);
+    }
+  }
+  return results;
+}
+export async function alertsCommand(args) {
+  const sub = args[0] ?? "list";
+  if (sub === "list") {
+    const { channels } = await readAlerts();
+    const license = await readLicense();
+    const lines = [];
+    if (!license) {
+      lines.push("Alerts are a Runcap Pro feature. Run `runcap login <key>` to enable them.");
+      lines.push("");
+    }
+    if (!channels.length) {
+      lines.push("No alert channels configured.");
+      lines.push("");
+      lines.push("Add one (the run that breaches your cap will ping you on your phone):");
+      lines.push("  runcap alerts add telegram <bot-token> <chat-id>");
+      lines.push("  runcap alerts add whatsapp <phone> <callmebot-apikey>");
+      lines.push("  runcap alerts add webhook <url>          (Slack / Discord / custom)");
+      return lines.join("\n");
+    }
+    lines.push("Configured alert channels:");
+    channels.forEach((c, i) => lines.push(`  ${i + 1}. ${describeChannel(c)}`));
+    lines.push("");
+    lines.push("Test them with: runcap alerts test");
+    return lines.join("\n");
+  }
+  if (sub === "add") {
+    const type = args[1];
+    const { channels } = await readAlerts();
+    let channel;
+    if (type === "telegram") {
+      const token = args[2];
+      const chatId = args[3];
+      if (!token || !chatId) throw new Error("Usage: runcap alerts add telegram <bot-token> <chat-id>");
+      channel = { type: "telegram", token, chatId };
+    } else if (type === "whatsapp") {
+      const phone = args[2];
+      const apikey = args[3];
+      if (!phone || !apikey) throw new Error("Usage: runcap alerts add whatsapp <phone> <callmebot-apikey>");
+      channel = { type: "whatsapp", phone, apikey };
+    } else if (type === "webhook") {
+      const url = args[2];
+      if (!url) throw new Error("Usage: runcap alerts add webhook <url>");
+      channel = { type: "webhook", url };
+    } else {
+      throw new Error("Unknown channel type. Use: telegram | whatsapp | webhook");
+    }
+    channels.push(channel);
+    await writeAlerts({ channels });
+    return `Added ${describeChannel(channel)}. Run \`runcap alerts test\` to confirm it reaches your phone.`;
+  }
+  if (sub === "test") {
+    const license = await readLicense();
+    if (!license) return "Alerts are Pro-only. Run `runcap login <key>` first.";
+    const results = await sendAlert("Runcap test alert — your cap-breach notifications are working.");
+    if (!results) return "No channels configured. Add one with `runcap alerts add ...`.";
+    return `Test sent to: ${results.join(", ")}`;
+  }
+  if (sub === "clear" || sub === "off") {
+    await writeAlerts({ channels: [] });
+    return "Cleared all alert channels.";
+  }
+  throw new Error("Usage: runcap alerts [list|add|test|clear]");
+}
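
To make the three delivery paths in `deliverToChannel` easier to inspect without sending anything, the request construction can be separated from the `fetch` call. The sketch below mirrors the URLs and payloads shown in the diff. `buildAlertRequest` and its `{ url, init }` return shape are assumptions introduced for illustration; they do not exist in the package.

```javascript
// Hypothetical helper: builds fetch() arguments but performs no network I/O.
function buildAlertRequest(channel, text) {
  const jsonPost = (body) => ({
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (channel.type === "telegram") {
    return {
      url: `https://api.telegram.org/bot${channel.token}/sendMessage`,
      init: jsonPost({ chat_id: channel.chatId, text })
    };
  }
  if (channel.type === "whatsapp") {
    // Every query value is percent-encoded, as in the diff.
    const query = [
      `phone=${encodeURIComponent(channel.phone)}`,
      `text=${encodeURIComponent(text)}`,
      `apikey=${encodeURIComponent(channel.apikey)}`
    ].join("&");
    return { url: `https://api.callmebot.com/whatsapp.php?${query}`, init: {} };
  }
  if (channel.type === "webhook") {
    // Both keys are sent so Slack ("text") and Discord ("content") accept it.
    return { url: channel.url, init: jsonPost({ text, content: text }) };
  }
  return null; // unknown channel type
}

// Placeholder phone number and key, for illustration only.
const alertRequest = buildAlertRequest(
  { type: "whatsapp", phone: "+15550100", apikey: "k" },
  "cap hit"
);
console.log(alertRequest.url);
```

Note that the Telegram bot token and the CallMeBot key end up inside the request URL, and all channel settings are stored as plain JSON in `~/.runcap/alerts.json`, so that file is worth treating as a secret.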

package/src/cloud.mjs ADDED

@@ -0,0 +1,90 @@
+import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { existsSync } from "node:fs";
+import os from "node:os";
+import path from "node:path";
+const CONFIG_DIR = path.join(os.homedir(), ".runcap");
+const LICENSE_FILE = path.join(CONFIG_DIR, "license.json");
+const DEFAULT_ENDPOINT = "https://launchsoloai.com/api/runcap-ingest";
+export async function readLicense() {
+  if (!existsSync(LICENSE_FILE)) return null;
+  try {
+    const raw = await readFile(LICENSE_FILE, "utf8");
+    const data = JSON.parse(raw);
+    return data.key ? data : null;
+  } catch {
+    return null;
+  }
+}
+export async function saveLicense(key, endpoint) {
+  await mkdir(CONFIG_DIR, { recursive: true });
+  const data = { key: key.trim(), endpoint: endpoint || DEFAULT_ENDPOINT, savedAt: new Date().toISOString() };
+  await writeFile(LICENSE_FILE, JSON.stringify(data, null, 2));
+  return data;
+}
+export async function clearLicense() {
+  if (existsSync(LICENSE_FILE)) {
+    await writeFile(LICENSE_FILE, JSON.stringify({}, null, 2));
+  }
+}
+function maskKey(key) {
+  if (!key || key.length < 8) return "****";
+  return `${key.slice(0, 4)}...${key.slice(-4)}`;
+}
+export async function loginCommand(key) {
+  if (!key) throw new Error("Usage: runcap login <license-key>");
+  const saved = await saveLicense(key);
+  return [
+    `Saved Runcap Pro license ${maskKey(saved.key)}.`,
+    `Cloud sync is now ON. Future plans and runs sync to your hosted dashboard.`,
+    `Dashboard: https://launchsoloai.com/runcap/dashboard`
+  ].join("\n");
+}
+export async function logoutCommand() {
+  await clearLicense();
+  return "Logged out. Cloud sync is OFF. The local core keeps working as before.";
+}
+export async function whoamiCommand() {
+  const lic = await readLicense();
+  if (!lic) return "Not logged in. Local core only (free). Run `runcap login <key>` to enable Pro cloud sync.";
+  return `Logged in with license ${maskKey(lic.key)}. Cloud sync ON → ${lic.endpoint}`;
+}
+// Best-effort: never throws into the caller's flow. Returns a short status string.
+export async function syncRun(run) {
+  const lic = await readLicense();
+  if (!lic) return null; // free mode, silent
+  try {
+    const resp = await fetch(lic.endpoint, {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify({ license_key: lic.key, run })
+    });
+    if (resp.ok) return "synced";
+    if (resp.status === 403) return "sync_failed: license rejected (run `runcap whoami`)";
+    return `sync_failed: server ${resp.status}`;
+  } catch (err) {
+    return `sync_failed: ${err.message}`;
+  }
+}
+export function planToRun(plan) {
+  return {
+    mission_id: plan.id,
+    label: plan.goal,
+    estimate_low: plan.budget?.costLowUsd ?? 0,
+    estimate_high: plan.budget?.costHighUsd ?? 0,
+    cap: plan.budget?.recommendedCapUsd ?? null,
+    actual: null,
+    capped: false,
+    status: "planned"
+  };
+}
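
A quick way to see what `syncRun(planToRun(plan))` would send, and what `maskKey` prints, is to run the two pure helpers on sample input. The functions below are copied from the diff without the `export` keyword. The plan objects and key strings are made-up placeholders, not real Runcap data.

```javascript
// Copied from the diff above (export keyword dropped).
function maskKey(key) {
  if (!key || key.length < 8) return "****";
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

function planToRun(plan) {
  return {
    mission_id: plan.id,
    label: plan.goal,
    estimate_low: plan.budget?.costLowUsd ?? 0,
    estimate_high: plan.budget?.costHighUsd ?? 0,
    cap: plan.budget?.recommendedCapUsd ?? null,
    actual: null,
    capped: false,
    status: "planned"
  };
}

// Hypothetical plan; only the fields planToRun reads are filled in.
const fullPlan = {
  id: "plan-001",
  goal: "add login page",
  budget: { costLowUsd: 3, costHighUsd: 7, recommendedCapUsd: 10 }
};
console.log(planToRun(fullPlan));

// A plan without a budget falls back to 0 / 0 / null.
console.log(planToRun({ id: "plan-002", goal: "no budget yet" }));

console.log(maskKey("placeholder-key-0001")); // first 4 and last 4 characters
console.log(maskKey("12345678"));             // exactly 8 characters
```

Two details stand out. The plan's `goal` text is sent as `label`, so with a license stored the free-text goal leaves the machine. And for a key of exactly 8 characters the two slices cover the whole string, so the "masked" form shows every character of it.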

package/src/compressor.mjs ADDED

@@ -0,0 +1,169 @@
+// Runcap token compressor — pure Node, no ML, no native deps.
+//
+// Headroom (the popular Python tool) proves the demand but pays for it with
+// onnxruntime/HF model weights that break installs on macOS Intel, Windows MSVC,
+// etc. Runcap takes the opposite bet: only the deterministic, lossless-by-construction
+// reductions that need zero dependencies and can never silently change an answer.
+//
+// What we compress (and why it is safe):
+//   - JSON whitespace inside string-embedded JSON blobs (re-serialize compact).
+//   - Repeated blank lines and trailing whitespace in long text blocks.
+//   - Long log / stack-trace runs collapsed to head + tail + "(N lines elided)".
+// What we never touch:
+//   - The user's actual prose instructions.
+//   - Code semantics (we only strip trailing whitespace, never tokens).
+//   - Anything under a conservative size threshold (compression has overhead).
+//
+// Every reduction is COUNTED so the gateway can show one honest number:
+// "X tokens saved by compression". Token counts are an estimate (~4 chars/token),
+// labeled `estimated`, never claimed as provider-exact.
+const CHARS_PER_TOKEN = 4;
+const MIN_FIELD_CHARS = 200; // below this, compression overhead isn't worth it
+const LOG_HEAD_LINES = 12;
+const LOG_TAIL_LINES = 8;
+const LOG_COLLAPSE_THRESHOLD = 40; // collapse runs longer than this
+export function estimateTokens(text) {
+  if (!text) return 0;
+  return Math.ceil(String(text).length / CHARS_PER_TOKEN);
+}
+// Re-serialize an embedded JSON string compactly. Handles two shapes safely:
+//   1. The whole field is JSON ("{...}" or "[...]").
+//   2. A short text prefix followed by a JSON blob ("Here is the data:\n{...}").
+// In case 2 we only touch the JSON tail and keep the prefix verbatim, so prose
+// is never altered. Returns null if nothing valid/smaller was found.
+function compactEmbeddedJson(value) {
+  const trimmed = value.trim();
+  // Case 1: entire field is JSON.
+  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
+    try {
+      const compact = JSON.stringify(JSON.parse(trimmed));
+      if (compact.length < value.length) return compact;
+    } catch {
+      // fall through to prefix handling
+    }
+  }
+  // Case 2: a prefix then a JSON blob. Find the first { or [ and try to parse
+  // from there to end. Only accept if the tail is valid JSON in full.
+  const idx = value.search(/[{[]/);
+  if (idx > 0) {
+    const prefix = value.slice(0, idx);
+    // Keep the prefix small/prose-like; don't swallow huge text blocks.
+    if (prefix.length <= 200) {
+      const tail = value.slice(idx).trim();
+      try {
+        const compact = JSON.stringify(JSON.parse(tail));
+        const rebuilt = prefix + compact;
+        if (rebuilt.length < value.length) return rebuilt;
+      } catch {
+        return null;
+      }
+    }
+  }
+  return null;
+}
+const LOG_LINE_RE = /^\s*(\d{4}-\d{2}-\d{2}[T ]|\[?\d{2}:\d{2}:\d{2}|DEBUG|INFO|WARN|ERROR|TRACE|at\s+\w|\s+File ")/;
+// Collapse a long, log-like block: keep the head and tail (the parts a model
+// actually needs to diagnose), elide the repetitive middle.
+function collapseLogBlock(value) {
+  const lines = value.split("\n");
+  if (lines.length <= LOG_COLLAPSE_THRESHOLD) return null;
+  const logish = lines.filter((l) => LOG_LINE_RE.test(l)).length;
+  // Only collapse if it really looks like logs/stack traces, not prose.
+  if (logish < lines.length * 0.5) return null;
+  const head = lines.slice(0, LOG_HEAD_LINES);
+  const tail = lines.slice(-LOG_TAIL_LINES);
+  const elided = lines.length - head.length - tail.length;
+  if (elided <= 0) return null;
+  return [...head, `... (${elided} repetitive log lines elided by Runcap) ...`, ...tail].join("\n");
+}
+// Collapse 3+ blank lines to 1, and strip trailing whitespace ONLY on lines
+// that are part of a multi-line block. We deliberately leave single-line prose
+// (and its final trailing space) untouched so instructions are never altered.
+function squeezeWhitespace(value) {
+  const lines = value.split("\n");
+  if (lines.length < 3) return null; // not a structural block; leave prose alone
+  const squeezed = lines
+    .map((l) => l.replace(/[ \t]+$/g, ""))
+    .join("\n")
+    .replace(/\n{3,}/g, "\n\n");
+  return squeezed.length < value.length ? squeezed : null;
+}
+// Compress a single string field through the safe ladder. Returns the smallest
+// safe result (or the original if nothing helped).
+function compressField(value) {
+  if (typeof value !== "string" || value.length < MIN_FIELD_CHARS) return value;
+  let out = value;
+  const json = compactEmbeddedJson(out);
+  if (json !== null) out = json;
+  const logs = collapseLogBlock(out);
+  if (logs !== null && logs.length < out.length) out = logs;
+  const ws = squeezeWhitespace(out);
+  if (ws !== null && ws.length < out.length) out = ws;
+  return out;
+}
+// Walk an OpenAI- or Anthropic-shaped request body and compress message content.
+// Returns { body, before, after, savedChars, savedTokens, touched }.
+export function compressRequestBody(body) {
+  const result = { body, savedChars: 0, savedTokens: 0, touched: 0, before: 0, after: 0 };
+  if (!body || typeof body !== "object") return result;
+  const measureBefore = JSON.stringify(body).length;
+  let touched = 0;
+  const compressContent = (content) => {
+    if (typeof content === "string") {
+      const next = compressField(content);
+      if (next !== content) touched += 1;
+      return next;
+    }
+    if (Array.isArray(content)) {
+      return content.map((part) => {
+        if (part && typeof part === "object" && typeof part.text === "string") {
+          const next = compressField(part.text);
+          if (next !== part.text) touched += 1;
+          return { ...part, text: next };
+        }
+        return part;
+      });
+    }
+    return content;
+  };
+  let next = body;
+  // OpenAI chat.completions: messages[].content
+  if (Array.isArray(body.messages)) {
+    next = {
+      ...body,
+      messages: body.messages.map((m) =>
+        m && typeof m === "object" && "content" in m ? { ...m, content: compressContent(m.content) } : m
+      )
+    };
+  }
+  // Anthropic system prompt (string or block array)
+  if (next.system !== undefined) {
+    next = { ...next, system: compressContent(next.system) };
+  }
+  // OpenAI responses API / raw input
+  if (typeof next.input === "string") {
+    next = { ...next, input: compressContent(next.input) };
+  }
+  const measureAfter = JSON.stringify(next).length;
+  const savedChars = Math.max(0, measureBefore - measureAfter);
+  return {
+    body: next,
+    before: measureBefore,
+    after: measureAfter,
+    savedChars,
+    savedTokens: Math.round(savedChars / CHARS_PER_TOKEN),
+    touched
+  };
+}
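
The log-collapse step is the largest source of savings in this file, so it helps to run it on a concrete input. The sketch below copies `estimateTokens`, `collapseLogBlock`, and the relevant constants from the diff and feeds them 50 synthetic log lines. The input lines are invented for illustration.

```javascript
// Constants and helpers copied from the diff above.
const CHARS_PER_TOKEN = 4;
const LOG_HEAD_LINES = 12;
const LOG_TAIL_LINES = 8;
const LOG_COLLAPSE_THRESHOLD = 40;
const LOG_LINE_RE = /^\s*(\d{4}-\d{2}-\d{2}[T ]|\[?\d{2}:\d{2}:\d{2}|DEBUG|INFO|WARN|ERROR|TRACE|at\s+\w|\s+File ")/;

function estimateTokens(text) {
  if (!text) return 0;
  return Math.ceil(String(text).length / CHARS_PER_TOKEN);
}

function collapseLogBlock(value) {
  const lines = value.split("\n");
  if (lines.length <= LOG_COLLAPSE_THRESHOLD) return null;
  const logish = lines.filter((l) => LOG_LINE_RE.test(l)).length;
  if (logish < lines.length * 0.5) return null;
  const head = lines.slice(0, LOG_HEAD_LINES);
  const tail = lines.slice(-LOG_TAIL_LINES);
  const elided = lines.length - head.length - tail.length;
  if (elided <= 0) return null;
  return [...head, `... (${elided} repetitive log lines elided by Runcap) ...`, ...tail].join("\n");
}

// Synthetic input: 50 timestamped lines, all matching LOG_LINE_RE.
const logLines = [];
for (let i = 0; i < 50; i += 1) {
  logLines.push(`2024-01-01 00:00:${String(i).padStart(2, "0")} INFO step ${i} ok`);
}
const originalLog = logLines.join("\n");
const collapsedLog = collapseLogBlock(originalLog);
const collapsedLines = collapsedLog.split("\n");

console.log(collapsedLines.length); // 12 head + 1 marker + 8 tail = 21
console.log(collapsedLines[12]);    // the marker line, reporting 30 elided lines
console.log(estimateTokens(originalLog) - estimateTokens(collapsedLog)); // estimated tokens saved
```

The 30 middle lines are dropped from the forwarded request and cannot be recovered from it, so the README's "lossless by construction" is best read as "prose instructions and code tokens are untouched" rather than byte-for-byte lossless. The saving is also an estimate at roughly 4 characters per token, as the file's header comment states.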