
runcap 0.2.2 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md +15 -3
package/package.json +3 -2
package/scripts/loop-test.mjs +84 -0
package/scripts/make-linkedin-delta-video.mjs +412 -0
package/src/compressor.mjs +57 -1
package/src/mission-control.mjs +39 -3

package/README.md CHANGED

@@ -4,15 +4,21 @@
 ![Runcap terminal demo: estimate, cap, compress, stop](docs/assets/demo.svg)
-**Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.**
+**Your AI coding agent re-reads the same files over and over and quietly burns your money. Runcap estimates the bill before you build, hard-caps the spend so it physically stops at your ceiling, and losslessly compresses every call. Free, MIT, 100% local. Your code and tokens never touch a server.**
-Runcap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.
+On a real OpenAI call, one edited-file re-read dropped from **1,186 to 737 prompt tokens (37.9% saved)** with the model still answering correctly about the changed line. No other proxy does this:
+| | Without Runcap | With Runcap |
+|---|---|---|
+| Re-read of an edited file | 1,186 prompt tokens | **737 prompt tokens** |
+| You find out the cost | when the invoice arrives | **before you press go, capped at your ceiling** |
+| When the agent gets stuck | it keeps spending | **run stops, you get the exact rescue prompt** |
 > Every other tool here is a rear-view mirror - it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.
 ## Why
-Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.
+**Agents loop on the same error, rewrite plans, and re-read files they just edited - every loop is tokens you pay for.** Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). They hand you a confident summary while the task is not actually done, and you find out what it cost when the invoice - or the subscription limit - arrives.
 Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend *before* it happens. Runcap does the one thing the rear-view mirror can't:
@@ -135,6 +141,12 @@ It's pure Node with **zero ML or native dependencies**, so it installs everywher
 The dashboard shows the result as one number: **"You saved $X · N tokens compressed · would have spent $Y."** Disable it with `AIM_COMPRESS=off` if you ever want raw passthrough.
+## Loop detection (the "looks productive but stuck" signal)
+The hard case in stuck-detection is the agent that keeps producing output but is really circling the same failure, just reworded each time. Plain hashing misses it because the prompt is *similar but never byte-identical* between loops. Because the gateway sees every request, Runcap compares each request's conversation shape against the recent run with the same line-similarity primitive the delta-encoder uses: when several prompts in a row are near-identical (default: 3 prompts at 92%+ similarity) while the conversation never moves forward, it flags `loop.looping` on the event, surfaces a warning in `runcap status`, and fires an alert.
+This is a **calculated** signal, not a proven dollar-saving: it tells you *"the agent has sent 3 near-identical prompts in a row with no progress"* so you can step in before the loop burns more budget. Disable it with `AIM_LOOP_DETECT=off`. (Today's [`detectStuck`](src/mission-control.mjs) post-run score is outcome-based: exit code, parsed errors, and zero-diff. The loop signal adds the missing in-flight behavioral signal on top of it.)
 ## Pricing table
 Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.
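
Reviewer note on the headline figure in the new README text: the 37.9% saving follows directly from the two token counts quoted in the table. A minimal sketch of that arithmetic, where `percentSaved` is a hypothetical helper and not part of runcap:

```javascript
// Hypothetical helper (not part of runcap): percentage of prompt tokens saved,
// rounded to one decimal place.
function percentSaved(baselineTokens, compressedTokens) {
  if (baselineTokens <= 0) throw new RangeError("baselineTokens must be positive");
  const saved = baselineTokens - compressedTokens;
  return Number(((saved / baselineTokens) * 100).toFixed(1));
}

const baseline = 1186;   // prompt tokens without Runcap, as quoted in the README
const compressed = 737;  // prompt tokens with Runcap, as quoted in the README

console.log(baseline - compressed);              // 449
console.log(percentSaved(baseline, compressed)); // 37.9
```

The figure describes a single re-read of one edited file, so it should not be read as an average saving across a whole run.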

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "runcap",
-  "version": "0.2.2",
+  "version": "0.3.0",
   "description": "Cap every agent run before it starts: estimate cost, set a hard ceiling that stops the run, rescue stuck agents. Local, MIT, nothing uploaded.",
   "license": "MIT",
   "type": "module",
@@ -45,8 +45,9 @@
     "acceptance": "node ./scripts/acceptance.mjs",
     "smoke": "node ./bin/runcap.mjs run --label smoke -- npm --prefix examples/broken-ts-app run build",
     "demo:broken": "node ./bin/runcap.mjs run --label broken-ts-demo -- npm --prefix examples/broken-ts-app run build",
-    "test": "node ./scripts/delta-test.mjs && node ./scripts/validate-demo.mjs",
+    "test": "node ./scripts/delta-test.mjs && node ./scripts/loop-test.mjs && node ./scripts/validate-demo.mjs",
     "test:delta": "node ./scripts/delta-test.mjs",
+    "test:loop": "node ./scripts/loop-test.mjs",
     "status": "node ./bin/runcap.mjs status",
     "report": "node ./bin/runcap.mjs report",
     "export": "node ./bin/runcap.mjs export",

package/scripts/loop-test.mjs ADDED

@@ -0,0 +1,84 @@
+// Loop / circling detection tests, run against the REAL compressor exports.
+// Proves the "looks productive but stuck" signal the gateway emits:
+//   1. Reworded same-failure attempts (similar-but-not-identical prompts) are
+//      flagged as a loop once they repeat enough times.
+//   2. Genuine progress (the conversation tail actually changing) is NOT flagged.
+//   3. A single slow/long legit step is NOT flagged.
+//
+// Pure Node, no test framework. Exits non-zero on any failure so it can gate CI.
+import { detectLoop, requestShapeText } from "../src/compressor.mjs";
+let failures = 0;
+function check(name, pass, detail) {
+  if (!pass) failures++;
+  console.log(`${pass ? "PASS" : "FAIL"}  ${name}${detail ? "  — " + detail : ""}`);
+}
+// A long, stable conversation tail (system + history the agent keeps resending),
+// plus a final attempt line that the agent only REWORDS each loop. This is the
+// exact case that fools cheap hashing: 99% identical, never byte-equal.
+const stableTail = [
+  "You are a coding agent. Fix the failing build.",
+  ...Array.from({ length: 40 }, (_, i) => `context line ${i}: prior file content the agent keeps resending`),
+  "The test still fails with: TypeError: cannot read property 'id' of undefined"
+].join("\n");
+function attempt(wording) {
+  return stableTail + "\n" + "Let me try this: " + wording;
+}
+// --- Test 1: reworded same-failure attempts are flagged as a loop ---
+{
+  const history = [
+    attempt("guard the undefined with an if check"),
+    attempt("add an optional chain before .id"),
+    attempt("default the object to {} before reading id")
+  ];
+  const current = attempt("wrap the access in a try/catch and read id safely");
+  const r = detectLoop(current, history);
+  check("reworded same-failure attempts flagged as loop", r.looping && r.repeats >= 3,
+    `repeats=${r.repeats}, similarity=${r.similarity}`);
+}
+// --- Test 2: real progress is NOT flagged ---
+// Each turn the conversation tail genuinely changes (new files, new errors).
+{
+  const history = [
+    "Fix the build. Error: missing module 'parser'.\n" + "ctx A ".repeat(40),
+    "Installed parser. New error: parser.parse is not a function.\n" + "ctx B ".repeat(40)
+  ];
+  const current = "Fixed the call signature. Now the test passes; writing the next feature.\n" + "ctx C ".repeat(40);
+  const r = detectLoop(current, history);
+  check("genuine progress is NOT flagged as loop", !r.looping,
+    `looping=${r.looping}, repeats=${r.repeats}`);
+}
+// --- Test 3: a single slow/long legit step is NOT flagged ---
+// One big request with no prior near-identical history must never trip.
+{
+  const current = attempt("first and only attempt at this step");
+  const r = detectLoop(current, []);
+  check("single long step is NOT flagged", !r.looping && r.repeats === 0,
+    `repeats=${r.repeats}`);
+}
+// --- Test 4: two repeats is at_risk but below the warn threshold ---
+{
+  const history = [attempt("try A"), attempt("try B")];
+  const current = attempt("try C");
+  const r = detectLoop(current, history);
+  check("two near-identical repeats not yet a loop (under threshold)", !r.looping && r.repeats === 2,
+    `repeats=${r.repeats}`);
+}
+// --- Test 5: requestShapeText pulls the same text from OpenAI and Anthropic shapes ---
+{
+  const openai = requestShapeText({ messages: [{ role: "user", content: "hello world" }] });
+  const anthropic = requestShapeText({ messages: [{ role: "user", content: [{ type: "text", text: "hello world" }] }] });
+  check("requestShapeText normalizes OpenAI and Anthropic content", openai === "hello world" && anthropic === "hello world",
+    `openai="${openai}" anthropic="${anthropic}"`);
+}
+console.log("\n" + (failures === 0 ? "ALL LOOP TESTS PASSED" : `${failures} LOOP TEST(S) FAILED`));
+process.exit(failures === 0 ? 0 : 1);
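
Reviewer note on why the reworded-attempt fixture clears the 92% threshold: each `attempt(...)` string has 43 lines, and two attempts share every line except the last. The sketch below rebuilds the fixture and measures the overlap. `overlapRatio` is a stand-in, since the diff shows only the first lines of `lineSimilarity`; its denominator (the longer line count) is an assumption.

```javascript
// Rebuild the fixture from loop-test.mjs.
const stableTail = [
  "You are a coding agent. Fix the failing build.",
  ...Array.from({ length: 40 }, (_, i) => `context line ${i}: prior file content the agent keeps resending`),
  "The test still fails with: TypeError: cannot read property 'id' of undefined"
].join("\n");
const attempt = (wording) => stableTail + "\n" + "Let me try this: " + wording;

// ASSUMPTION: stand-in for lineSimilarity. The shared-line count matches the
// lines visible in the diff; dividing by the longer line count is a guess.
function overlapRatio(aLines, bLines) {
  const aSet = new Set(aLines);
  let shared = 0;
  for (const line of bLines) if (aSet.has(line)) shared++;
  return shared / Math.max(aLines.length, bLines.length, 1);
}

const a = attempt("guard the undefined with an if check").split("\n");
const b = attempt("add an optional chain before .id").split("\n");
const ratio = overlapRatio(a, b);

console.log(a.length, ratio.toFixed(3)); // 43 0.977
```

Both inputs have the same number of lines here, so a denominator based on either length gives the same 42/43 ratio for this fixture.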

package/scripts/make-linkedin-delta-video.mjs ADDED

@@ -0,0 +1,412 @@
+// Renders a LinkedIn-ready MP4 for the Runcap delta-encoding post.
+// Output: docs/assets/media/runcap-linkedin-delta-demo.mp4
+// Requires: playwright + ffmpeg available on the machine.
+import { spawnSync } from "node:child_process";
+import { mkdirSync, readdirSync, rmSync } from "node:fs";
+import { dirname, join, resolve } from "node:path";
+import { fileURLToPath } from "node:url";
+import { chromium } from "playwright";
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const root = resolve(__dirname, "..");
+const outDir = resolve(root, "docs/assets/media");
+const framesDir = "/private/tmp/runcap-linkedin-delta-frames"; // macOS-style temp path; adjust on other platforms
+const outFile = join(outDir, "runcap-linkedin-delta-demo.mp4");
+const width = 1080;
+const height = 1080;
+const fps = 30;
+const duration = 12;
+const frameCount = fps * duration;
+mkdirSync(outDir, { recursive: true });
+mkdirSync(framesDir, { recursive: true });
+for (const file of readdirSync(framesDir)) {
+  if (file.startsWith("frame-") && file.endsWith(".png")) {
+    rmSync(join(framesDir, file));
+  }
+}
+const html = `<!doctype html>
+<html>
+<head>
+  <meta charset="utf-8" />
+  <style>
+    * { box-sizing: border-box; }
+    html, body {
+      margin: 0;
+      width: ${width}px;
+      height: ${height}px;
+      overflow: hidden;
+      background: #f4f6fb;
+      font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
+      color: #f8fafc;
+    }
+    .stage {
+      width: ${width}px;
+      height: ${height}px;
+      padding: 58px;
+      display: grid;
+      place-items: center;
+      background:
+        radial-gradient(circle at 15% 10%, rgba(34, 211, 238, .18), transparent 32%),
+        radial-gradient(circle at 85% 12%, rgba(99, 102, 241, .16), transparent 34%),
+        linear-gradient(135deg, #eef2ff, #f8fafc);
+    }
+    .card {
+      width: 964px;
+      height: 964px;
+      border-radius: 42px;
+      padding: 42px;
+      background: #080b12;
+      box-shadow: 0 36px 90px rgba(15, 23, 42, .25);
+      position: relative;
+      overflow: hidden;
+    }
+    .card::before {
+      content: "";
+      position: absolute;
+      inset: 0;
+      background:
+        radial-gradient(circle at 50% -10%, rgba(45, 212, 191, .18), transparent 36%),
+        linear-gradient(180deg, rgba(255,255,255,.06), transparent 28%);
+      pointer-events: none;
+    }
+    .top {
+      position: relative;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      color: #94a3b8;
+      font-size: 23px;
+      letter-spacing: -0.02em;
+    }
+    .brand {
+      display: flex;
+      gap: 14px;
+      align-items: center;
+      font-weight: 800;
+      color: #fff;
+      font-size: 30px;
+    }
+    .logo {
+      width: 42px;
+      height: 42px;
+      border-radius: 13px;
+      display: grid;
+      place-items: center;
+      background: linear-gradient(135deg, #22d3ee, #34d399);
+      color: #021014;
+      font-weight: 900;
+    }
+    .pill {
+      border: 1px solid rgba(148, 163, 184, .28);
+      background: rgba(15, 23, 42, .68);
+      color: #cbd5e1;
+      border-radius: 999px;
+      padding: 10px 16px;
+      font-size: 18px;
+      font-weight: 650;
+    }
+    .content {
+      position: relative;
+      height: 818px;
+      padding-top: 44px;
+    }
+    .headline {
+      margin: 0;
+      color: #f8fafc;
+      font-size: 70px;
+      line-height: .96;
+      letter-spacing: -0.06em;
+      max-width: 830px;
+    }
+    .sub {
+      margin-top: 22px;
+      color: #cbd5e1;
+      font-size: 29px;
+      line-height: 1.28;
+      letter-spacing: -0.03em;
+      max-width: 820px;
+    }
+    .accent { color: #67e8f9; }
+    .green { color: #34d399; }
+    .red { color: #fb7185; }
+    .violet { color: #a78bfa; }
+    .mono {
+      font-family: "SF Mono", "JetBrains Mono", Menlo, Consolas, monospace;
+      letter-spacing: -0.04em;
+    }
+    .terminal {
+      margin-top: 38px;
+      border: 1px solid rgba(148, 163, 184, .22);
+      background: rgba(2, 6, 23, .82);
+      border-radius: 24px;
+      padding: 26px;
+      font-size: 24px;
+      line-height: 1.42;
+      color: #dbeafe;
+      box-shadow: inset 0 1px 0 rgba(255,255,255,.05);
+    }
+    .terminal .line { opacity: 1; }
+    .grid2 {
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 24px;
+      margin-top: 34px;
+    }
+    .file {
+      border: 1px solid rgba(148, 163, 184, .22);
+      background: rgba(15, 23, 42, .9);
+      border-radius: 22px;
+      padding: 22px;
+      min-height: 290px;
+    }
+    .file h3 {
+      margin: 0 0 16px;
+      color: #94a3b8;
+      font-size: 20px;
+      letter-spacing: -0.02em;
+    }
+    .code {
+      font-size: 20px;
+      line-height: 1.42;
+      white-space: pre-wrap;
+      color: #dbeafe;
+    }
+    .changed {
+      display: inline-block;
+      padding: 2px 6px;
+      border-radius: 7px;
+      background: rgba(52, 211, 153, .16);
+      color: #6ee7b7;
+    }
+    .warning {
+      margin-top: 25px;
+      border: 1px solid rgba(251, 113, 133, .35);
+      background: rgba(251, 113, 133, .1);
+      color: #fecdd3;
+      border-radius: 22px;
+      padding: 20px 24px;
+      font-size: 27px;
+      font-weight: 850;
+      letter-spacing: -0.04em;
+    }
+    .flow {
+      display: grid;
+      grid-template-columns: 1fr 88px 1fr;
+      align-items: center;
+      gap: 18px;
+      margin-top: 42px;
+    }
+    .box {
+      min-height: 220px;
+      border: 1px solid rgba(148, 163, 184, .24);
+      background: rgba(15, 23, 42, .88);
+      border-radius: 24px;
+      padding: 24px;
+    }
+    .box-title {
+      color: #94a3b8;
+      font-size: 20px;
+      font-weight: 750;
+      margin-bottom: 16px;
+    }
+    .arrow {
+      height: 88px;
+      border-radius: 50%;
+      display: grid;
+      place-items: center;
+      background: linear-gradient(135deg, #22d3ee, #34d399);
+      color: #031015;
+      font-size: 42px;
+      font-weight: 900;
+    }
+    .numbers {
+      margin-top: 46px;
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 28px;
+      align-items: end;
+    }
+    .number-card {
+      border-radius: 26px;
+      padding: 28px;
+      background: rgba(15, 23, 42, .9);
+      border: 1px solid rgba(148, 163, 184, .22);
+    }
+    .label {
+      color: #94a3b8;
+      font-size: 22px;
+      margin-bottom: 12px;
+      letter-spacing: -0.03em;
+    }
+    .big {
+      font-size: 78px;
+      line-height: .9;
+      font-weight: 900;
+      letter-spacing: -0.08em;
+    }
+    .bar {
+      margin-top: 32px;
+      height: 34px;
+      border-radius: 999px;
+      background: rgba(148, 163, 184, .16);
+      overflow: hidden;
+      border: 1px solid rgba(148, 163, 184, .24);
+    }
+    .fill {
+      height: 100%;
+      width: 37.9%;
+      border-radius: 999px;
+      background: linear-gradient(90deg, #22d3ee, #34d399);
+    }
+    .footer {
+      position: absolute;
+      left: 42px;
+      right: 42px;
+      bottom: 34px;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      color: #94a3b8;
+      font-size: 20px;
+    }
+    .scene {
+      position: absolute;
+      inset: 44px 0 0 0;
+      opacity: 0;
+      transform: translateY(24px) scale(.985);
+      transition: opacity .24s ease, transform .24s ease;
+    }
+    .scene.active {
+      opacity: 1;
+      transform: translateY(0) scale(1);
+    }
+  </style>
+</head>
+<body>
+  <div class="stage">
+    <div class="card">
+      <div class="top">
+        <div class="brand"><div class="logo">R</div> Runcap</div>
+        <div class="pill">local-first AI cost control</div>
+      </div>
+      <div class="content">
+        <section class="scene active" id="s0">
+          <h1 class="headline">Your AI coding agent has a hidden tax.</h1>
+          <p class="sub">It reads a file, edits one line, then re-reads the whole file. The API charges full price again.</p>
+          <div class="terminal mono">
+            <div class="line accent">agent loop</div>
+            <div class="line">read auth.ts → edit one line → read auth.ts again</div>
+            <div class="line red">same context, full token bill</div>
+          </div>
+        </section>
+        <section class="scene" id="s1">
+          <h1 class="headline">A tiny edit becomes a full re-read.</h1>
+          <div class="grid2">
+            <div class="file mono">
+              <h3>auth.ts · first read</h3>
+              <div class="code">if (!token) {
+  throw new Error("no token");
+}
+return verify(token);</div>
+            </div>
+            <div class="file mono">
+              <h3>auth.ts · after one-line edit</h3>
+              <div class="code">if (!token) {
+  <span class="changed">return res.status(401)</span>;
+}
+return verify(token);</div>
+            </div>
+          </div>
+          <div class="warning">Without a delta layer, the agent pays to send the whole file again.</div>
+        </section>
+        <section class="scene" id="s2">
+          <h1 class="headline">Runcap sends a lossless delta instead.</h1>
+          <p class="sub">It refuses to emit the diff unless it can rebuild the edited file byte-for-byte first.</p>
+          <div class="flow mono">
+            <div class="box">
+              <div class="box-title">model already saw</div>
+              <div class="code">throw new Error("no token")</div>
+            </div>
+            <div class="arrow">→</div>
+            <div class="box">
+              <div class="box-title">Runcap sends only the change</div>
+              <div class="code red">- throw new Error("no token")</div>
+              <div class="code green">+ return res.status(401)</div>
+            </div>
+          </div>
+        </section>
+        <section class="scene" id="s3">
+          <h1 class="headline">Real OpenAI call. Real provider usage.</h1>
+          <div class="numbers">
+            <div class="number-card">
+              <div class="label">baseline prompt</div>
+              <div class="big red mono">1,186</div>
+              <div class="label">tokens</div>
+            </div>
+            <div class="number-card">
+              <div class="label">with Runcap delta</div>
+              <div class="big green mono">737</div>
+              <div class="label">tokens</div>
+            </div>
+          </div>
+          <div class="bar"><div class="fill"></div></div>
+          <p class="sub"><span class="green">37.9% saved</span>. The model still answered correctly about the changed line.</p>
+        </section>
+        <section class="scene" id="s4">
+          <h1 class="headline">Then cap the run before it gets expensive.</h1>
+          <p class="sub">Point OpenAI or Anthropic-compatible tools at the local gateway. When the ceiling is crossed, the next call stops.</p>
+          <div class="terminal mono">
+            <div class="line green">$ AIM_DAILY_BUDGET_USD=10 runcap gateway</div>
+            <div class="line">gateway up · compression on · hard cap armed</div>
+            <div class="line red">HTTP 429 budget_guard</div>
+            <div class="line accent">stopped before money left your account</div>
+          </div>
+        </section>
+      </div>
+      <div class="footer">
+        <span class="mono">npm install -g runcap</span>
+        <span>Free · MIT · 100% local</span>
+      </div>
+    </div>
+  </div>
+  <script>
+    const scenes = [...document.querySelectorAll(".scene")];
+    window.renderFrame = (seconds) => {
+      const index = seconds < 2.4 ? 0 : seconds < 4.8 ? 1 : seconds < 7.2 ? 2 : seconds < 9.8 ? 3 : 4;
+      scenes.forEach((scene, i) => scene.classList.toggle("active", i === index));
+    };
+  </script>
+</body>
+</html>`;
+const browser = await chromium.launch({ headless: true });
+const page = await browser.newPage({ viewport: { width, height }, deviceScaleFactor: 1 });
+await page.setContent(html);
+await page.waitForTimeout(100);
+for (let i = 0; i < frameCount; i += 1) {
+  const seconds = i / fps;
+  await page.evaluate((t) => window.renderFrame(t), seconds);
+  await page.screenshot({ path: join(framesDir, `frame-${String(i).padStart(4, "0")}.png`) });
+}
+await browser.close();
+const ffmpeg = spawnSync("ffmpeg", [
+  "-y",
+  "-framerate", String(fps),
+  "-i", join(framesDir, "frame-%04d.png"),
+  "-c:v", "libx264",
+  "-pix_fmt", "yuv420p",
+  "-movflags", "+faststart",
+  "-crf", "18",
+  outFile
+], { stdio: "inherit" });
+if (ffmpeg.status !== 0) {
+  process.exit(ffmpeg.status ?? 1);
+}
+console.log(`wrote ${outFile}`);
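
Reviewer note on the timing constants in this script: at 30 fps for 12 seconds it captures 360 frames, and the thresholds in `window.renderFrame` decide how many frames each scene gets. A small sketch that replays the same selection logic without a browser:

```javascript
const fps = 30;
const duration = 12;
const frameCount = fps * duration;

// Same thresholds as window.renderFrame in the script above.
function sceneIndex(seconds) {
  return seconds < 2.4 ? 0 : seconds < 4.8 ? 1 : seconds < 7.2 ? 2 : seconds < 9.8 ? 3 : 4;
}

// Count how many captured frames land in each of the five scenes.
const framesPerScene = [0, 0, 0, 0, 0];
for (let i = 0; i < frameCount; i += 1) {
  framesPerScene[sceneIndex(i / fps)] += 1;
}

console.log(frameCount);     // 360
console.log(framesPerScene); // [ 72, 72, 72, 78, 66 ]
```

In other words, the first three scenes run 2.4 s each, the results scene 2.6 s, and the closing scene 2.2 s.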

package/src/compressor.mjs CHANGED

@@ -46,7 +46,7 @@ function shortHash(text) {
 // Cheap line-overlap ratio. Used only to decide whether a full LCS diff is
 // worth computing; the real saving is measured against the emitted delta.
-function lineSimilarity(aLines, bLines) {
+export function lineSimilarity(aLines, bLines) {
   const aSet = new Set(aLines);
   let shared = 0;
   for (const l of bLines) if (aSet.has(l)) shared++;
@@ -378,3 +378,59 @@ export function compressRequestBody(body) {
     deltas: deduped.deltas
   };
 }
+// --- loop / circling detection (the "looks productive but stuck" signal) ---
+// The gateway sees every request the agent sends. An agent that is circling the
+// same failure with reworded attempts sends prompts that are SIMILAR-but-not-
+// identical turn after turn: the conversation tail barely moves while tokens
+// keep burning. Plain hashing misses this (the text differs slightly each loop);
+// this catches it with the same line-similarity primitive the delta-encoder uses.
+const LOOP_SIMILARITY = 0.92; // current prompt vs. a prior prompt at or above this ratio = no real progress between them
+const LOOP_MIN_REPEATS = 3;   // how many consecutive prior prompts must be near-identical to the current one before we warn
+// Pull the comparable "shape" of a request: the concatenated text the agent is
+// actually sending this turn (messages / input / system), order-preserving.
+export function requestShapeText(body) {
+  if (!body || typeof body !== "object") return "";
+  const parts = [];
+  const push = (content) => {
+    if (typeof content === "string") parts.push(content);
+    else if (Array.isArray(content)) {
+      for (const p of content) if (p && typeof p === "object" && typeof p.text === "string") parts.push(p.text);
+    }
+  };
+  if (Array.isArray(body.messages)) for (const m of body.messages) if (m && typeof m === "object") push(m.content);
+  if (body.system !== undefined) push(body.system);
+  if (typeof body.input === "string") push(body.input);
+  return parts.join("\n");
+}
+// Given the current request and a rolling history of prior request shapes,
+// decide whether the agent is circling. Returns { looping, repeats, similarity }.
+// History is oldest->newest of prior requestShapeText() strings in this session.
+export function detectLoop(currentShape, history, {
+  similarityThreshold = LOOP_SIMILARITY,
+  minRepeats = LOOP_MIN_REPEATS
+} = {}) {
+  if (!currentShape || !Array.isArray(history) || history.length === 0) {
+    return { looping: false, repeats: 0, similarity: 0 };
+  }
+  const curLines = String(currentShape).split("\n");
+  let repeats = 0;
+  let lastSimilarity = 0;
+  // Walk backward through history; count the unbroken run of near-identical turns.
+  for (let i = history.length - 1; i >= 0; i--) {
+    const sim = lineSimilarity(curLines, String(history[i]).split("\n"));
+    if (sim >= similarityThreshold) {
+      repeats += 1;
+      lastSimilarity = sim;
+    } else {
+      break;
+    }
+  }
+  return {
+    looping: repeats >= minRepeats,
+    repeats,
+    similarity: Number(lastSimilarity.toFixed(3))
+  };
+}
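
Reviewer note on how `repeats` is counted: `detectLoop` compares the current prompt against each history entry, newest first, and stops at the first entry below the threshold. `repeats` therefore counts prior prompts only, so `repeats === 3` describes a run of four near-identical prompts including the current one. The sketch below isolates that counting rule on precomputed similarity scores. `trailingRun` and `verdict` are hypothetical names, not exports of the package.

```javascript
// Hypothetical helper: count the unbroken run of scores at or above the
// threshold, walking from the newest history entry backward.
// `similarities` is ordered oldest -> newest, like the history array.
function trailingRun(similarities, threshold = 0.92) {
  let repeats = 0;
  for (let i = similarities.length - 1; i >= 0; i--) {
    if (similarities[i] >= threshold) repeats += 1;
    else break;
  }
  return repeats;
}

const minRepeats = 3; // same default as LOOP_MIN_REPEATS

function verdict(similarities) {
  const repeats = trailingRun(similarities);
  return {
    repeats,
    looping: repeats >= minRepeats,
    // The current prompt is part of the run but is not counted in `repeats`.
    promptsInRun: repeats === 0 ? 0 : repeats + 1
  };
}

console.log(verdict([0.98, 0.97]));       // { repeats: 2, looping: false, promptsInRun: 3 }
console.log(verdict([0.98, 0.97, 0.96])); // { repeats: 3, looping: true, promptsInRun: 4 }
console.log(verdict([0.98, 0.97, 0.4]));  // { repeats: 0, looping: false, promptsInRun: 0 }
```

The last call shows that one dissimilar prompt at the newest end of the history resets the run, however many similar prompts came before it.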

package/src/mission-control.mjs CHANGED

@@ -7,7 +7,7 @@ import path from "node:path";
 import process from "node:process";
 import { syncRun } from "./cloud.mjs";
 import { sendAlert } from "./alerts.mjs";
-import { compressRequestBody, estimateTokens } from "./compressor.mjs";
+import { compressRequestBody, estimateTokens, requestShapeText, detectLoop } from "./compressor.mjs";
 const STORE_DIR = ".runcap";
 const MISSIONS_DIR = path.join(STORE_DIR, "missions");
@@ -523,6 +523,12 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
   if (gatewayMode !== "mock" && !openaiKey && !anthropicKey) {
     throw new Error("Missing upstream key. Set OPENAI_API_KEY (for /v1/chat/completions) and/or ANTHROPIC_API_KEY (for /v1/messages). The gateway cannot proxy without at least one.");
   }
+  // Rolling history of recent request shapes (per gateway process) so we can
+  // detect an agent circling the same failure with reworded prompts: similar-
+  // but-not-identical turns, which plain hashing never catches.
+  const loopEnabled = (process.env.AIM_LOOP_DETECT ?? "on").toLowerCase() !== "off";
+  const shapeHistory = [];
+  const SHAPE_HISTORY_MAX = 12;
   const server = http.createServer(async (request, response) => {
     const started = Date.now();
     try {
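
Reviewer note on the hunk above: the flag check treats only the literal value `off` (case-insensitive) as disabling loop detection, and the shape history is a plain array capped at 12 entries. A minimal sketch of both behaviors, using hypothetical helper names and an `env` parameter in place of `process.env`:

```javascript
// Mirrors the flag expression in the hunk: anything other than "off"
// (case-insensitive) leaves loop detection enabled.
function loopDetectEnabled(env) {
  return (env.AIM_LOOP_DETECT ?? "on").toLowerCase() !== "off";
}

// Hypothetical helper: the same push-then-shift pattern used for shapeHistory.
function pushBounded(history, item, max = 12) {
  history.push(item);
  if (history.length > max) history.shift();
  return history;
}

console.log(loopDetectEnabled({}));                           // true
console.log(loopDetectEnabled({ AIM_LOOP_DETECT: "OFF" }));   // false
console.log(loopDetectEnabled({ AIM_LOOP_DETECT: "false" })); // true

const recentShapes = [];
for (let i = 0; i < 15; i += 1) pushBounded(recentShapes, `shape-${i}`);
console.log(recentShapes.length, recentShapes[0]); // 12 shape-3
```

Values such as `false` or `0` do not turn the feature off under this check. The history also lives in the gateway process rather than per client, so prompts from two agents sharing one gateway would interleave in the same buffer.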
@@ -545,6 +551,17 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
       const bodyText = await readRequestBody(request);
       const requestBody = safeJson(bodyText) ?? {};
+      // Loop signal: compare this request's shape against the recent run.
+      let loop = null;
+      if (loopEnabled) {
+        const shape = requestShapeText(requestBody);
+        if (shape) {
+          const result = detectLoop(shape, shapeHistory);
+          loop = { looping: result.looping, repeats: result.repeats, similarity: result.similarity, truth: "calculated" };
+          shapeHistory.push(shape);
+          if (shapeHistory.length > SHAPE_HISTORY_MAX) shapeHistory.shift();
+        }
+      }
       const budget = readBudget();
       const summary = await readGatewaySummary({ windowMs: budgetWindowMs() });
       // Compress the request body once (safe, lossless-by-construction). Disable with AIM_COMPRESS=off.
@@ -591,6 +608,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
             capUsd: budget,
             blockedByThisCall
           },
+          loop,
           error: blockedByThisCall
             ? `Budget would be exceeded by this call: $${summary.estimatedCostUsd} spent + ~$${callEstimate} this call > cap $${budget}`
             : `Budget exceeded: ${summary.estimatedCostUsd} >= ${budget}`,
@@ -631,6 +649,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
           usage: responseBody.usage,
           cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
           compression,
+          loop,
           truth: "mock_provider_usage",
           requestHash: createHash("sha1").update(bodyText).digest("hex")
         });
@@ -682,9 +701,14 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
         usage: responseBody.usage ?? null,
         cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
         compression,
+        loop,
         truth: responseBody.usage ? "provider_usage" : "unknown",
         requestHash: createHash("sha1").update(bodyText).digest("hex")
       });
+      if (loop && loop.looping) {
+        sendAlert(`Runcap: possible stuck loop. The agent has sent ${loop.repeats} near-identical prompts in a row (${Math.round(loop.similarity * 100)}% similar) without the conversation moving forward. It may be circling the same failure with reworded attempts.`)
+          .catch(() => {});
+      }
       if (responseBody.usage) {
         const spent = await readGatewaySummary({ windowMs: budgetWindowMs() });
         syncRun({
@@ -769,19 +793,23 @@ export async function showStatus(options = {}) {
   const gateway = await readGatewaySummary();
   const gatewayLine = `Gateway: ${gateway.callCount} calls, ${gateway.totalTokens} tokens, $${gateway.estimatedCostUsd} estimated (${gateway.truth})`;
+  const loopLine = gateway.loop?.looping
+    ? `Loop warning: last ${gateway.loop.repeats} prompts were ${Math.round(gateway.loop.similarity * 100)}% identical with no progress. The agent may be circling the same failure (truth: calculated).`
+    : null;
   const latest = await latestMissionId();
-  if (!latest) return `${fuelLine}\n${gatewayLine}\nNo missions recorded yet.`;
+  if (!latest) return [fuelLine, gatewayLine, loopLine, "No missions recorded yet."].filter(Boolean).join("\n");
   const mission = await readMission(latest);
   return [
     fuelLine,
     gatewayLine,
+    loopLine,
     `Latest mission: ${mission.id}`,
     `Status: ${mission.stuck.status}`,
     `Exit code: ${mission.exitCode}`,
     `Changed files: ${mission.diffEvidence.changedFiles.length}`,
     `Errors: ${mission.errors.length}`,
     `Report: ${path.join(MISSIONS_DIR, mission.id, "report.md")}`
-  ].join("\n");
+  ].filter(Boolean).join("\n");
 }
 export async function recordFuel(value) {
@@ -1419,6 +1447,13 @@ async function readGatewaySummary({ windowMs } = {}) {
     const inputRate = pricing ? pricing.inputPerMillion : 3; // fall back to a mid Sonnet-ish rate
     return sum + (saved * inputRate) / 1_000_000;
   }, 0);
+  // Loop signal: the most recent event that carries a loop verdict tells us
+  // whether the agent is currently circling (similar-but-not-identical prompts
+  // repeated without progress). This is the "looks productive but stuck" case.
+  const lastWithLoop = [...events].reverse().find((event) => event.loop);
+  const loop = lastWithLoop
+    ? { ...lastWithLoop.loop, at: lastWithLoop.at, model: lastWithLoop.model }
+    : { looping: false, repeats: 0, similarity: 0, truth: "calculated" };
   return {
     callCount: events.length,
     successfulCallCount: successful.length,
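
Reviewer note on the summary lookup in this hunk: the status line reads the newest event whose `loop` field is truthy, so events recorded with `loop: null` are skipped rather than treated as a clear signal. A sketch of the same lookup on hypothetical event data (the `at` and `model` values are made up for illustration):

```javascript
// Same lookup as the hunk: newest event that carries a loop verdict,
// falling back to a neutral verdict when none exists.
function latestLoop(events) {
  const last = [...events].reverse().find((event) => event.loop);
  return last
    ? { ...last.loop, at: last.at, model: last.model }
    : { looping: false, repeats: 0, similarity: 0, truth: "calculated" };
}

// Hypothetical events, oldest -> newest.
const events = [
  { at: "t1", model: "example-model", loop: { looping: true, repeats: 3, similarity: 0.977, truth: "calculated" } },
  { at: "t2", model: "example-model", loop: null } // e.g. empty request shape or detection disabled
];

console.log(latestLoop(events).looping); // true
console.log(latestLoop(events).at);      // t1
console.log(latestLoop([]).repeats);     // 0
```

As the first call shows, a later event without a verdict does not clear an earlier loop warning. Only a newer event with `looping: false` does.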
@@ -1427,6 +1462,7 @@ async function readGatewaySummary({ windowMs } = {}) {
     savedTokens,
     savedUsd: Number(savedUsd.toFixed(6)),
     wouldHaveSpentUsd: Number((estimatedCost + savedUsd).toFixed(6)),
+    loop,
     truth: events.some((event) => event.truth === "provider_usage" || event.truth === "mock_provider_usage")
       ? "usage_plus_static_price_table"
       : "unknown",