runcap 0.2.2 → 0.3.0

package/README.md CHANGED
@@ -4,15 +4,21 @@
4
4
 
5
5
  ![Runcap terminal demo: estimate, cap, compress, stop](docs/assets/demo.svg)
6
6
 
7
- **Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.**
7
+ **Your AI coding agent re-reads the same files over and over and quietly burns your money. Runcap estimates the bill before you build, hard-caps the spend so it physically stops at your ceiling, and losslessly compresses every call. Free, MIT, 100% local. Your code and tokens never touch a server.**
8
8
 
9
- Runcap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.
9
+ On a real OpenAI call, one edited-file re-read dropped from **1,186 to 737 prompt tokens (37.9% saved)** with the model still answering correctly about the changed line. No other proxy does this:
10
+
11
+ | | Without Runcap | With Runcap |
12
+ |---|---|---|
13
+ | Re-read of an edited file | 1,186 prompt tokens | **737 prompt tokens** |
14
+ | You find out the cost | when the invoice arrives | **before you press go, capped at your ceiling** |
15
+ | When the agent gets stuck | it keeps spending | **run stops, you get the exact rescue prompt** |
10
16
 
11
17
  > Every other tool here is a rear-view mirror - it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.
12
18
 
13
19
  ## Why
14
20
 
15
- Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.
21
+ **Agents loop on the same error, rewrite plans, and re-read files they just edited - every loop is tokens you pay for.** Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). They hand you a confident summary while the task is not actually done, and you find out what it cost when the invoice - or the subscription limit - arrives.
16
22
 
17
23
  Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend *before* it happens. Runcap does the one thing the rear-view mirror can't:
18
24
 
@@ -135,6 +141,12 @@ It's pure Node with **zero ML or native dependencies**, so it installs everywher
135
141
 
136
142
  The dashboard shows the result as one number: **"You saved $X · N tokens compressed · would have spent $Y."** Disable it with `AIM_COMPRESS=off` if you ever want raw passthrough.
137
143
 
144
+ ## Loop detection (the "looks productive but stuck" signal)
145
+
146
+ The hard case in stuck-detection is the agent that keeps producing output but is really circling the same failure, just reworded each time. Plain hashing misses it because the prompt is *similar but never byte-identical* between loops. Because the gateway sees every request, Runcap compares each request's conversation shape against the recent run with the same line-similarity primitive the delta-encoder uses: when several prompts in a row are near-identical (default: 3 prompts at 92%+ similarity) while the conversation never moves forward, it flags `loop.looping` on the event, surfaces a warning in `runcap status`, and fires an alert.
147
+
148
+ This is a **calculated** signal, not a proven dollar saving: it tells you *"the agent has sent 3 near-identical prompts in a row with no progress"* so you can step in before the loop burns more budget. Disable it with `AIM_LOOP_DETECT=off`; the thresholds themselves are constants in `src/compressor.mjs`. (Today's [`detectStuck`](src/mission-control.mjs) post-run score is outcome-based: exit code, parsed errors, and zero-diff. The loop signal adds the missing in-flight behavioral signal on top of it.)
149
+
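As a rough illustration of the loop signal described in the section above (not Runcap's actual implementation), the sketch below counts how many prior prompts in a row are near-identical to the current one. `overlapRatio` and `countNearRepeats` are hypothetical names, and dividing by the longer input is an assumption: the diff does not show how `lineSimilarity` normalizes its shared-line count.

```javascript
// Hypothetical stand-in for a line-overlap similarity primitive.
// Assumption: lines of `b` also present in `a`, divided by the longer input.
function overlapRatio(aLines, bLines) {
  const seen = new Set(aLines);
  let shared = 0;
  for (const line of bLines) if (seen.has(line)) shared += 1;
  const longest = Math.max(aLines.length, bLines.length);
  return longest === 0 ? 0 : shared / longest;
}

// Walk the history newest-first and count the unbroken run of prior prompts
// that are at least `threshold` similar to the current prompt.
function countNearRepeats(current, history, threshold = 0.92) {
  const curLines = current.split("\n");
  let repeats = 0;
  for (let i = history.length - 1; i >= 0; i -= 1) {
    if (overlapRatio(curLines, history[i].split("\n")) < threshold) break;
    repeats += 1;
  }
  return repeats;
}

// 40 stable context lines plus one reworded final line per attempt.
const shared = Array.from({ length: 40 }, (_, i) => `context line ${i}`).join("\n");
const attempt = (wording) => `${shared}\nLet me try this: ${wording}`;

const history = [attempt("add a guard"), attempt("use optional chaining"), attempt("default to {}")];
const repeats = countNearRepeats(attempt("wrap in try/catch"), history);
const looping = repeats >= 3; // same default as the README: 3 repeats at 92%+
console.log({ repeats, looping });
```

Each attempt shares 40 of its 41 lines with the others, so a byte-level hash differs every time while the line-overlap ratio stays well above the 92% threshold.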
138
150
  ## Pricing table
139
151
 
140
152
  Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "runcap",
3
- "version": "0.2.2",
3
+ "version": "0.3.0",
4
4
  "description": "Cap every agent run before it starts: estimate cost, set a hard ceiling that stops the run, rescue stuck agents. Local, MIT, nothing uploaded.",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -45,8 +45,9 @@
45
45
  "acceptance": "node ./scripts/acceptance.mjs",
46
46
  "smoke": "node ./bin/runcap.mjs run --label smoke -- npm --prefix examples/broken-ts-app run build",
47
47
  "demo:broken": "node ./bin/runcap.mjs run --label broken-ts-demo -- npm --prefix examples/broken-ts-app run build",
48
- "test": "node ./scripts/delta-test.mjs && node ./scripts/validate-demo.mjs",
48
+ "test": "node ./scripts/delta-test.mjs && node ./scripts/loop-test.mjs && node ./scripts/validate-demo.mjs",
49
49
  "test:delta": "node ./scripts/delta-test.mjs",
50
+ "test:loop": "node ./scripts/loop-test.mjs",
50
51
  "status": "node ./bin/runcap.mjs status",
51
52
  "report": "node ./bin/runcap.mjs report",
52
53
  "export": "node ./bin/runcap.mjs export",
@@ -0,0 +1,84 @@
1
+ // Loop / circling detection tests, run against the REAL compressor exports.
2
+ // Proves the "looks productive but stuck" signal the gateway emits:
3
+ // 1. Reworded same-failure attempts (similar-but-not-identical prompts) are
4
+ // flagged as a loop once they repeat enough times.
5
+ // 2. Genuine progress (the conversation tail actually changing) is NOT flagged.
6
+ // 3. A single slow/long legit step is NOT flagged.
7
+ //
8
+ // Pure Node, no test framework. Exits non-zero on any failure so it can gate CI.
9
+
10
+ import { detectLoop, requestShapeText } from "../src/compressor.mjs";
11
+
12
+ let failures = 0;
13
+ function check(name, pass, detail) {
14
+ if (!pass) failures++;
15
+ console.log(`${pass ? "PASS" : "FAIL"} ${name}${detail ? " — " + detail : ""}`);
16
+ }
17
+
18
+ // A long, stable conversation tail (system + history the agent keeps resending),
19
+ // plus a final attempt line that the agent only REWORDS each loop. This is the
20
+ // exact case that fools cheap hashing: 99% identical, never byte-equal.
21
+ const stableTail = [
22
+ "You are a coding agent. Fix the failing build.",
23
+ ...Array.from({ length: 40 }, (_, i) => `context line ${i}: prior file content the agent keeps resending`),
24
+ "The test still fails with: TypeError: cannot read property 'id' of undefined"
25
+ ].join("\n");
26
+
27
+ function attempt(wording) {
28
+ return stableTail + "\n" + "Let me try this: " + wording;
29
+ }
30
+
31
+ // --- Test 1: reworded same-failure attempts are flagged as a loop ---
32
+ {
33
+ const history = [
34
+ attempt("guard the undefined with an if check"),
35
+ attempt("add an optional chain before .id"),
36
+ attempt("default the object to {} before reading id")
37
+ ];
38
+ const current = attempt("wrap the access in a try/catch and read id safely");
39
+ const r = detectLoop(current, history);
40
+ check("reworded same-failure attempts flagged as loop", r.looping && r.repeats >= 3,
41
+ `repeats=${r.repeats}, similarity=${r.similarity}`);
42
+ }
43
+
44
+ // --- Test 2: real progress is NOT flagged ---
45
+ // Each turn the conversation tail genuinely changes (new files, new errors).
46
+ {
47
+ const history = [
48
+ "Fix the build. Error: missing module 'parser'.\n" + "ctx A ".repeat(40),
49
+ "Installed parser. New error: parser.parse is not a function.\n" + "ctx B ".repeat(40)
50
+ ];
51
+ const current = "Fixed the call signature. Now the test passes; writing the next feature.\n" + "ctx C ".repeat(40);
52
+ const r = detectLoop(current, history);
53
+ check("genuine progress is NOT flagged as loop", !r.looping,
54
+ `looping=${r.looping}, repeats=${r.repeats}`);
55
+ }
56
+
57
+ // --- Test 3: a single slow/long legit step is NOT flagged ---
58
+ // One big request with no prior near-identical history must never trip.
59
+ {
60
+ const current = attempt("first and only attempt at this step");
61
+ const r = detectLoop(current, []);
62
+ check("single long step is NOT flagged", !r.looping && r.repeats === 0,
63
+ `repeats=${r.repeats}`);
64
+ }
65
+
66
+ // --- Test 4: two repeats stay below the warn threshold (not yet a loop) ---
67
+ {
68
+ const history = [attempt("try A"), attempt("try B")];
69
+ const current = attempt("try C");
70
+ const r = detectLoop(current, history);
71
+ check("two near-identical repeats not yet a loop (under threshold)", !r.looping && r.repeats === 2,
72
+ `repeats=${r.repeats}`);
73
+ }
74
+
75
+ // --- Test 5: requestShapeText pulls the same text from OpenAI and Anthropic shapes ---
76
+ {
77
+ const openai = requestShapeText({ messages: [{ role: "user", content: "hello world" }] });
78
+ const anthropic = requestShapeText({ messages: [{ role: "user", content: [{ type: "text", text: "hello world" }] }] });
79
+ check("requestShapeText normalizes OpenAI and Anthropic content", openai === "hello world" && anthropic === "hello world",
80
+ `openai="${openai}" anthropic="${anthropic}"`);
81
+ }
82
+
83
+ console.log("\n" + (failures === 0 ? "ALL LOOP TESTS PASSED" : `${failures} LOOP TEST(S) FAILED`));
84
+ process.exit(failures === 0 ? 0 : 1);
@@ -0,0 +1,412 @@
1
+ // Renders a LinkedIn-ready MP4 for the Runcap delta-encoding post.
2
+ // Output: docs/assets/media/runcap-linkedin-delta-demo.mp4
3
+ // Requires: playwright + ffmpeg available on the machine.
4
+ import { spawnSync } from "node:child_process";
5
+ import { mkdirSync, readdirSync, rmSync } from "node:fs";
6
+ import { dirname, join, resolve } from "node:path";
7
+ import { fileURLToPath } from "node:url";
8
+ import { chromium } from "playwright";
9
+
10
+ const __dirname = dirname(fileURLToPath(import.meta.url));
11
+ const root = resolve(__dirname, "..");
12
+ const outDir = resolve(root, "docs/assets/media");
13
+ const framesDir = "/private/tmp/runcap-linkedin-delta-frames";
14
+ const outFile = join(outDir, "runcap-linkedin-delta-demo.mp4");
15
+
16
+ const width = 1080;
17
+ const height = 1080;
18
+ const fps = 30;
19
+ const duration = 12;
20
+ const frameCount = fps * duration;
21
+
22
+ mkdirSync(outDir, { recursive: true });
23
+ mkdirSync(framesDir, { recursive: true });
24
+ for (const file of readdirSync(framesDir)) {
25
+ if (file.startsWith("frame-") && file.endsWith(".png")) {
26
+ rmSync(join(framesDir, file));
27
+ }
28
+ }
29
+
30
+ const html = `<!doctype html>
31
+ <html>
32
+ <head>
33
+ <meta charset="utf-8" />
34
+ <style>
35
+ * { box-sizing: border-box; }
36
+ html, body {
37
+ margin: 0;
38
+ width: ${width}px;
39
+ height: ${height}px;
40
+ overflow: hidden;
41
+ background: #f4f6fb;
42
+ font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
43
+ color: #f8fafc;
44
+ }
45
+ .stage {
46
+ width: ${width}px;
47
+ height: ${height}px;
48
+ padding: 58px;
49
+ display: grid;
50
+ place-items: center;
51
+ background:
52
+ radial-gradient(circle at 15% 10%, rgba(34, 211, 238, .18), transparent 32%),
53
+ radial-gradient(circle at 85% 12%, rgba(99, 102, 241, .16), transparent 34%),
54
+ linear-gradient(135deg, #eef2ff, #f8fafc);
55
+ }
56
+ .card {
57
+ width: 964px;
58
+ height: 964px;
59
+ border-radius: 42px;
60
+ padding: 42px;
61
+ background: #080b12;
62
+ box-shadow: 0 36px 90px rgba(15, 23, 42, .25);
63
+ position: relative;
64
+ overflow: hidden;
65
+ }
66
+ .card::before {
67
+ content: "";
68
+ position: absolute;
69
+ inset: 0;
70
+ background:
71
+ radial-gradient(circle at 50% -10%, rgba(45, 212, 191, .18), transparent 36%),
72
+ linear-gradient(180deg, rgba(255,255,255,.06), transparent 28%);
73
+ pointer-events: none;
74
+ }
75
+ .top {
76
+ position: relative;
77
+ display: flex;
78
+ justify-content: space-between;
79
+ align-items: center;
80
+ color: #94a3b8;
81
+ font-size: 23px;
82
+ letter-spacing: -0.02em;
83
+ }
84
+ .brand {
85
+ display: flex;
86
+ gap: 14px;
87
+ align-items: center;
88
+ font-weight: 800;
89
+ color: #fff;
90
+ font-size: 30px;
91
+ }
92
+ .logo {
93
+ width: 42px;
94
+ height: 42px;
95
+ border-radius: 13px;
96
+ display: grid;
97
+ place-items: center;
98
+ background: linear-gradient(135deg, #22d3ee, #34d399);
99
+ color: #021014;
100
+ font-weight: 900;
101
+ }
102
+ .pill {
103
+ border: 1px solid rgba(148, 163, 184, .28);
104
+ background: rgba(15, 23, 42, .68);
105
+ color: #cbd5e1;
106
+ border-radius: 999px;
107
+ padding: 10px 16px;
108
+ font-size: 18px;
109
+ font-weight: 650;
110
+ }
111
+ .content {
112
+ position: relative;
113
+ height: 818px;
114
+ padding-top: 44px;
115
+ }
116
+ .headline {
117
+ margin: 0;
118
+ color: #f8fafc;
119
+ font-size: 70px;
120
+ line-height: .96;
121
+ letter-spacing: -0.06em;
122
+ max-width: 830px;
123
+ }
124
+ .sub {
125
+ margin-top: 22px;
126
+ color: #cbd5e1;
127
+ font-size: 29px;
128
+ line-height: 1.28;
129
+ letter-spacing: -0.03em;
130
+ max-width: 820px;
131
+ }
132
+ .accent { color: #67e8f9; }
133
+ .green { color: #34d399; }
134
+ .red { color: #fb7185; }
135
+ .violet { color: #a78bfa; }
136
+ .mono {
137
+ font-family: "SF Mono", "JetBrains Mono", Menlo, Consolas, monospace;
138
+ letter-spacing: -0.04em;
139
+ }
140
+ .terminal {
141
+ margin-top: 38px;
142
+ border: 1px solid rgba(148, 163, 184, .22);
143
+ background: rgba(2, 6, 23, .82);
144
+ border-radius: 24px;
145
+ padding: 26px;
146
+ font-size: 24px;
147
+ line-height: 1.42;
148
+ color: #dbeafe;
149
+ box-shadow: inset 0 1px 0 rgba(255,255,255,.05);
150
+ }
151
+ .terminal .line { opacity: 1; }
152
+ .grid2 {
153
+ display: grid;
154
+ grid-template-columns: 1fr 1fr;
155
+ gap: 24px;
156
+ margin-top: 34px;
157
+ }
158
+ .file {
159
+ border: 1px solid rgba(148, 163, 184, .22);
160
+ background: rgba(15, 23, 42, .9);
161
+ border-radius: 22px;
162
+ padding: 22px;
163
+ min-height: 290px;
164
+ }
165
+ .file h3 {
166
+ margin: 0 0 16px;
167
+ color: #94a3b8;
168
+ font-size: 20px;
169
+ letter-spacing: -0.02em;
170
+ }
171
+ .code {
172
+ font-size: 20px;
173
+ line-height: 1.42;
174
+ white-space: pre-wrap;
175
+ color: #dbeafe;
176
+ }
177
+ .changed {
178
+ display: inline-block;
179
+ padding: 2px 6px;
180
+ border-radius: 7px;
181
+ background: rgba(52, 211, 153, .16);
182
+ color: #6ee7b7;
183
+ }
184
+ .warning {
185
+ margin-top: 25px;
186
+ border: 1px solid rgba(251, 113, 133, .35);
187
+ background: rgba(251, 113, 133, .1);
188
+ color: #fecdd3;
189
+ border-radius: 22px;
190
+ padding: 20px 24px;
191
+ font-size: 27px;
192
+ font-weight: 850;
193
+ letter-spacing: -0.04em;
194
+ }
195
+ .flow {
196
+ display: grid;
197
+ grid-template-columns: 1fr 88px 1fr;
198
+ align-items: center;
199
+ gap: 18px;
200
+ margin-top: 42px;
201
+ }
202
+ .box {
203
+ min-height: 220px;
204
+ border: 1px solid rgba(148, 163, 184, .24);
205
+ background: rgba(15, 23, 42, .88);
206
+ border-radius: 24px;
207
+ padding: 24px;
208
+ }
209
+ .box-title {
210
+ color: #94a3b8;
211
+ font-size: 20px;
212
+ font-weight: 750;
213
+ margin-bottom: 16px;
214
+ }
215
+ .arrow {
216
+ height: 88px;
217
+ border-radius: 50%;
218
+ display: grid;
219
+ place-items: center;
220
+ background: linear-gradient(135deg, #22d3ee, #34d399);
221
+ color: #031015;
222
+ font-size: 42px;
223
+ font-weight: 900;
224
+ }
225
+ .numbers {
226
+ margin-top: 46px;
227
+ display: grid;
228
+ grid-template-columns: 1fr 1fr;
229
+ gap: 28px;
230
+ align-items: end;
231
+ }
232
+ .number-card {
233
+ border-radius: 26px;
234
+ padding: 28px;
235
+ background: rgba(15, 23, 42, .9);
236
+ border: 1px solid rgba(148, 163, 184, .22);
237
+ }
238
+ .label {
239
+ color: #94a3b8;
240
+ font-size: 22px;
241
+ margin-bottom: 12px;
242
+ letter-spacing: -0.03em;
243
+ }
244
+ .big {
245
+ font-size: 78px;
246
+ line-height: .9;
247
+ font-weight: 900;
248
+ letter-spacing: -0.08em;
249
+ }
250
+ .bar {
251
+ margin-top: 32px;
252
+ height: 34px;
253
+ border-radius: 999px;
254
+ background: rgba(148, 163, 184, .16);
255
+ overflow: hidden;
256
+ border: 1px solid rgba(148, 163, 184, .24);
257
+ }
258
+ .fill {
259
+ height: 100%;
260
+ width: 37.9%;
261
+ border-radius: 999px;
262
+ background: linear-gradient(90deg, #22d3ee, #34d399);
263
+ }
264
+ .footer {
265
+ position: absolute;
266
+ left: 42px;
267
+ right: 42px;
268
+ bottom: 34px;
269
+ display: flex;
270
+ justify-content: space-between;
271
+ align-items: center;
272
+ color: #94a3b8;
273
+ font-size: 20px;
274
+ }
275
+ .scene {
276
+ position: absolute;
277
+ inset: 44px 0 0 0;
278
+ opacity: 0;
279
+ transform: translateY(24px) scale(.985);
280
+ transition: opacity .24s ease, transform .24s ease;
281
+ }
282
+ .scene.active {
283
+ opacity: 1;
284
+ transform: translateY(0) scale(1);
285
+ }
286
+ </style>
287
+ </head>
288
+ <body>
289
+ <div class="stage">
290
+ <div class="card">
291
+ <div class="top">
292
+ <div class="brand"><div class="logo">R</div> Runcap</div>
293
+ <div class="pill">local-first AI cost control</div>
294
+ </div>
295
+ <div class="content">
296
+ <section class="scene active" id="s0">
297
+ <h1 class="headline">Your AI coding agent has a hidden tax.</h1>
298
+ <p class="sub">It reads a file, edits one line, then re-reads the whole file. The API charges full price again.</p>
299
+ <div class="terminal mono">
300
+ <div class="line accent">agent loop</div>
301
+ <div class="line">read auth.ts → edit one line → read auth.ts again</div>
302
+ <div class="line red">same context, full token bill</div>
303
+ </div>
304
+ </section>
305
+ <section class="scene" id="s1">
306
+ <h1 class="headline">A tiny edit becomes a full re-read.</h1>
307
+ <div class="grid2">
308
+ <div class="file mono">
309
+ <h3>auth.ts · first read</h3>
310
+ <div class="code">if (!token) {
311
+ throw new Error("no token");
312
+ }
313
+ return verify(token);</div>
314
+ </div>
315
+ <div class="file mono">
316
+ <h3>auth.ts · after one-line edit</h3>
317
+ <div class="code">if (!token) {
318
+ <span class="changed">return res.status(401)</span>;
319
+ }
320
+ return verify(token);</div>
321
+ </div>
322
+ </div>
323
+ <div class="warning">Without a delta layer, the agent pays to send the whole file again.</div>
324
+ </section>
325
+ <section class="scene" id="s2">
326
+ <h1 class="headline">Runcap sends a lossless delta instead.</h1>
327
+ <p class="sub">It refuses to emit the diff unless it can rebuild the edited file byte-for-byte first.</p>
328
+ <div class="flow mono">
329
+ <div class="box">
330
+ <div class="box-title">model already saw</div>
331
+ <div class="code">throw new Error("no token")</div>
332
+ </div>
333
+ <div class="arrow">→</div>
334
+ <div class="box">
335
+ <div class="box-title">Runcap sends only the change</div>
336
+ <div class="code red">- throw new Error("no token")</div>
337
+ <div class="code green">+ return res.status(401)</div>
338
+ </div>
339
+ </div>
340
+ </section>
341
+ <section class="scene" id="s3">
342
+ <h1 class="headline">Real OpenAI call. Real provider usage.</h1>
343
+ <div class="numbers">
344
+ <div class="number-card">
345
+ <div class="label">baseline prompt</div>
346
+ <div class="big red mono">1,186</div>
347
+ <div class="label">tokens</div>
348
+ </div>
349
+ <div class="number-card">
350
+ <div class="label">with Runcap delta</div>
351
+ <div class="big green mono">737</div>
352
+ <div class="label">tokens</div>
353
+ </div>
354
+ </div>
355
+ <div class="bar"><div class="fill"></div></div>
356
+ <p class="sub"><span class="green">37.9% saved</span>. The model still answered correctly about the changed line.</p>
357
+ </section>
358
+ <section class="scene" id="s4">
359
+ <h1 class="headline">Then cap the run before it gets expensive.</h1>
360
+ <p class="sub">Point OpenAI or Anthropic-compatible tools at the local gateway. When the ceiling is crossed, the next call stops.</p>
361
+ <div class="terminal mono">
362
+ <div class="line green">$ AIM_DAILY_BUDGET_USD=10 runcap gateway</div>
363
+ <div class="line">gateway up · compression on · hard cap armed</div>
364
+ <div class="line red">HTTP 429 budget_guard</div>
365
+ <div class="line accent">stopped before money left your account</div>
366
+ </div>
367
+ </section>
368
+ </div>
369
+ <div class="footer">
370
+ <span class="mono">npm install -g runcap</span>
371
+ <span>Free · MIT · 100% local</span>
372
+ </div>
373
+ </div>
374
+ </div>
375
+ <script>
376
+ const scenes = [...document.querySelectorAll(".scene")];
377
+ window.renderFrame = (seconds) => {
378
+ const index = seconds < 2.4 ? 0 : seconds < 4.8 ? 1 : seconds < 7.2 ? 2 : seconds < 9.8 ? 3 : 4;
379
+ scenes.forEach((scene, i) => scene.classList.toggle("active", i === index));
380
+ };
381
+ </script>
382
+ </body>
383
+ </html>`;
384
+
385
+ const browser = await chromium.launch({ headless: true });
386
+ const page = await browser.newPage({ viewport: { width, height }, deviceScaleFactor: 1 });
387
+ await page.setContent(html);
388
+ await page.waitForTimeout(100);
389
+
390
+ for (let i = 0; i < frameCount; i += 1) {
391
+ const seconds = i / fps;
392
+ await page.evaluate((t) => window.renderFrame(t), seconds);
393
+ await page.screenshot({ path: join(framesDir, `frame-${String(i).padStart(4, "0")}.png`) });
394
+ }
395
+ await browser.close();
396
+
397
+ const ffmpeg = spawnSync("ffmpeg", [
398
+ "-y",
399
+ "-framerate", String(fps),
400
+ "-i", join(framesDir, "frame-%04d.png"),
401
+ "-c:v", "libx264",
402
+ "-pix_fmt", "yuv420p",
403
+ "-movflags", "+faststart",
404
+ "-crf", "18",
405
+ outFile
406
+ ], { stdio: "inherit" });
407
+
408
+ if (ffmpeg.status !== 0) {
409
+ process.exit(ffmpeg.status ?? 1);
410
+ }
411
+
412
+ console.log(`wrote ${outFile}`);
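The scene switching in `renderFrame` is a chain of ternaries over elapsed seconds. A table-driven equivalent, sketched below, makes it easier to see how many frames each scene receives at 30 fps over 12 seconds. `sceneEnds` and `sceneIndexAt` are hypothetical names introduced for illustration; the boundary values are copied from the script.

```javascript
// Scene end times in seconds, copied from the renderFrame ternary chain.
// The final scene has no end time: it runs until the video finishes.
const sceneEnds = [2.4, 4.8, 7.2, 9.8];

// Return the index of the first scene whose end time is after `seconds`.
function sceneIndexAt(seconds) {
  const i = sceneEnds.findIndex((end) => seconds < end);
  return i === -1 ? sceneEnds.length : i;
}

const fps = 30;
const duration = 12;

// Tally how many rendered frames fall into each scene.
const framesPerScene = new Array(sceneEnds.length + 1).fill(0);
for (let frame = 0; frame < fps * duration; frame += 1) {
  framesPerScene[sceneIndexAt(frame / fps)] += 1;
}
console.log(framesPerScene);
```

Keeping the boundaries in one array means a retimed scene is a one-number change rather than an edit to nested conditionals.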
@@ -46,7 +46,7 @@ function shortHash(text) {
46
46
 
47
47
  // Cheap line-overlap ratio. Used only to decide whether a full LCS diff is
48
48
  // worth computing; the real saving is measured against the emitted delta.
49
- function lineSimilarity(aLines, bLines) {
49
+ export function lineSimilarity(aLines, bLines) {
50
50
  const aSet = new Set(aLines);
51
51
  let shared = 0;
52
52
  for (const l of bLines) if (aSet.has(l)) shared++;
@@ -378,3 +378,59 @@ export function compressRequestBody(body) {
378
378
  deltas: deduped.deltas
379
379
  };
380
380
  }
381
+
382
+ // --- loop / circling detection (the "looks productive but stuck" signal) ---
383
+ // The gateway sees every request the agent sends. An agent that is circling the
384
+ // same failure with reworded attempts sends prompts that are SIMILAR-but-not-
385
+ // identical turn after turn: the conversation tail barely moves while tokens
386
+ // keep burning. Plain hashing misses this (the text differs slightly each loop);
387
+ // this catches it with the same line-similarity primitive the delta-encoder uses.
388
+ const LOOP_SIMILARITY = 0.92; // a prior prompt at least this similar to the current one counts as no real progress
389
+ const LOOP_MIN_REPEATS = 3; // how many consecutive prior prompts must match the current one before we warn
390
+
391
+ // Pull the comparable "shape" of a request: the concatenated text the agent is
392
+ // actually sending this turn (messages / input / system), order-preserving.
393
+ export function requestShapeText(body) {
394
+ if (!body || typeof body !== "object") return "";
395
+ const parts = [];
396
+ const push = (content) => {
397
+ if (typeof content === "string") parts.push(content);
398
+ else if (Array.isArray(content)) {
399
+ for (const p of content) if (p && typeof p === "object" && typeof p.text === "string") parts.push(p.text);
400
+ }
401
+ };
402
+ if (Array.isArray(body.messages)) for (const m of body.messages) if (m && typeof m === "object") push(m.content);
403
+ if (body.system !== undefined) push(body.system);
404
+ if (typeof body.input === "string") push(body.input);
405
+ return parts.join("\n");
406
+ }
407
+
408
+ // Given the current request and a rolling history of prior request shapes,
409
+ // decide whether the agent is circling. Returns { looping, repeats, similarity }.
410
+ // History is oldest->newest of prior requestShapeText() strings in this session.
411
+ export function detectLoop(currentShape, history, {
412
+ similarityThreshold = LOOP_SIMILARITY,
413
+ minRepeats = LOOP_MIN_REPEATS
414
+ } = {}) {
415
+ if (!currentShape || !Array.isArray(history) || history.length === 0) {
416
+ return { looping: false, repeats: 0, similarity: 0 };
417
+ }
418
+ const curLines = String(currentShape).split("\n");
419
+ let repeats = 0;
420
+ let lastSimilarity = 0;
421
+ // Walk backward through history; count the unbroken run of near-identical turns.
422
+ for (let i = history.length - 1; i >= 0; i--) {
423
+ const sim = lineSimilarity(curLines, String(history[i]).split("\n"));
424
+ if (sim >= similarityThreshold) {
425
+ repeats += 1;
426
+ lastSimilarity = sim;
427
+ } else {
428
+ break;
429
+ }
430
+ }
431
+ return {
432
+ looping: repeats >= minRepeats,
433
+ repeats,
434
+ similarity: Number(lastSimilarity.toFixed(3))
435
+ };
436
+ }
@@ -7,7 +7,7 @@ import path from "node:path";
7
7
  import process from "node:process";
8
8
  import { syncRun } from "./cloud.mjs";
9
9
  import { sendAlert } from "./alerts.mjs";
10
- import { compressRequestBody, estimateTokens } from "./compressor.mjs";
10
+ import { compressRequestBody, estimateTokens, requestShapeText, detectLoop } from "./compressor.mjs";
11
11
 
12
12
  const STORE_DIR = ".runcap";
13
13
  const MISSIONS_DIR = path.join(STORE_DIR, "missions");
@@ -523,6 +523,12 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
523
523
  if (gatewayMode !== "mock" && !openaiKey && !anthropicKey) {
524
524
  throw new Error("Missing upstream key. Set OPENAI_API_KEY (for /v1/chat/completions) and/or ANTHROPIC_API_KEY (for /v1/messages). The gateway cannot proxy without at least one.");
525
525
  }
526
+ // Rolling history of recent request shapes (per gateway process) so we can
527
+ // detect an agent circling the same failure with reworded prompts: similar-
528
+ // but-not-identical turns, which plain hashing never catches.
529
+ const loopEnabled = (process.env.AIM_LOOP_DETECT ?? "on").toLowerCase() !== "off";
530
+ const shapeHistory = [];
531
+ const SHAPE_HISTORY_MAX = 12;
526
532
  const server = http.createServer(async (request, response) => {
527
533
  const started = Date.now();
528
534
  try {
@@ -545,6 +551,17 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
545
551
 
546
552
  const bodyText = await readRequestBody(request);
547
553
  const requestBody = safeJson(bodyText) ?? {};
554
+ // Loop signal: compare this request's shape against the recent run.
555
+ let loop = null;
556
+ if (loopEnabled) {
557
+ const shape = requestShapeText(requestBody);
558
+ if (shape) {
559
+ const result = detectLoop(shape, shapeHistory);
560
+ loop = { looping: result.looping, repeats: result.repeats, similarity: result.similarity, truth: "calculated" };
561
+ shapeHistory.push(shape);
562
+ if (shapeHistory.length > SHAPE_HISTORY_MAX) shapeHistory.shift();
563
+ }
564
+ }
548
565
  const budget = readBudget();
549
566
  const summary = await readGatewaySummary({ windowMs: budgetWindowMs() });
550
567
  // Compress the request body once (safe, lossless-by-construction). Disable with AIM_COMPRESS=off.
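One detail of the gateway hunk above is easy to miss: the current shape is checked against the history first and only then pushed, so a request is never compared with itself, and the buffer is trimmed to a fixed size. A minimal sketch of that ordering follows, using a hypothetical `createShapeHistory` helper and a toy equality check in place of the real similarity test.

```javascript
// Hypothetical helper (not part of Runcap): a bounded, oldest-first history.
function createShapeHistory(max = 12) {
  const shapes = [];
  return {
    // Run the check against prior shapes only, then record the current one.
    observe(shape, check) {
      const verdict = check(shape, shapes);
      shapes.push(shape);
      if (shapes.length > max) shapes.shift(); // drop the oldest entry
      return verdict;
    },
    size: () => shapes.length
  };
}

// Toy check for the demo: count prior shapes identical to the current one.
const countIdentical = (shape, prior) => prior.filter((p) => p === shape).length;

const history = createShapeHistory(3);
const seen = ["a", "a", "b", "a", "a"].map((s) => history.observe(s, countIdentical));
console.log(seen, history.size());
```

Because the history lives in the gateway process, it resets on restart and is shared by every client that talks to that process, which is worth keeping in mind when reading the loop verdicts.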
@@ -591,6 +608,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
591
608
  capUsd: budget,
592
609
  blockedByThisCall
593
610
  },
611
+ loop,
594
612
  error: blockedByThisCall
595
613
  ? `Budget would be exceeded by this call: $${summary.estimatedCostUsd} spent + ~$${callEstimate} this call > cap $${budget}`
596
614
  : `Budget exceeded: ${summary.estimatedCostUsd} >= ${budget}`,
@@ -631,6 +649,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
631
649
  usage: responseBody.usage,
632
650
  cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
633
651
  compression,
652
+ loop,
634
653
  truth: "mock_provider_usage",
635
654
  requestHash: createHash("sha1").update(bodyText).digest("hex")
636
655
  });
@@ -682,9 +701,14 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
682
701
  usage: responseBody.usage ?? null,
683
702
  cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
684
703
  compression,
704
+ loop,
685
705
  truth: responseBody.usage ? "provider_usage" : "unknown",
686
706
  requestHash: createHash("sha1").update(bodyText).digest("hex")
687
707
  });
708
+ if (loop && loop.looping) {
709
+ sendAlert(`Runcap: possible stuck loop. The agent has sent ${loop.repeats} near-identical prompts in a row (${Math.round(loop.similarity * 100)}% similar) without the conversation moving forward. It may be circling the same failure with reworded attempts.`)
710
+ .catch(() => {});
711
+ }
688
712
  if (responseBody.usage) {
689
713
  const spent = await readGatewaySummary({ windowMs: budgetWindowMs() });
690
714
  syncRun({
@@ -769,19 +793,23 @@ export async function showStatus(options = {}) {
769
793
 
770
794
  const gateway = await readGatewaySummary();
771
795
  const gatewayLine = `Gateway: ${gateway.callCount} calls, ${gateway.totalTokens} tokens, $${gateway.estimatedCostUsd} estimated (${gateway.truth})`;
796
+ const loopLine = gateway.loop?.looping
797
+ ? `Loop warning: last ${gateway.loop.repeats} prompts were ${Math.round(gateway.loop.similarity * 100)}% identical with no progress. The agent may be circling the same failure (truth: calculated).`
798
+ : null;
772
799
  const latest = await latestMissionId();
773
- if (!latest) return `${fuelLine}\n${gatewayLine}\nNo missions recorded yet.`;
800
+ if (!latest) return [fuelLine, gatewayLine, loopLine, "No missions recorded yet."].filter(Boolean).join("\n");
774
801
  const mission = await readMission(latest);
775
802
  return [
776
803
  fuelLine,
777
804
  gatewayLine,
805
+ loopLine,
778
806
  `Latest mission: ${mission.id}`,
779
807
  `Status: ${mission.stuck.status}`,
780
808
  `Exit code: ${mission.exitCode}`,
781
809
  `Changed files: ${mission.diffEvidence.changedFiles.length}`,
782
810
  `Errors: ${mission.errors.length}`,
783
811
  `Report: ${path.join(MISSIONS_DIR, mission.id, "report.md")}`
784
- ].join("\n");
812
+ ].filter(Boolean).join("\n");
785
813
  }
786
814
 
787
815
  export async function recordFuel(value) {
@@ -1419,6 +1447,13 @@ async function readGatewaySummary({ windowMs } = {}) {
1419
1447
  const inputRate = pricing ? pricing.inputPerMillion : 3; // fall back to a mid Sonnet-ish rate
1420
1448
  return sum + (saved * inputRate) / 1_000_000;
1421
1449
  }, 0);
1450
+ // Loop signal: the most recent event that carries a loop verdict tells us
1451
+ // whether the agent is currently circling (similar-but-not-identical prompts
1452
+ // repeated without progress). This is the "looks productive but stuck" case.
1453
+ const lastWithLoop = [...events].reverse().find((event) => event.loop);
1454
+ const loop = lastWithLoop
1455
+ ? { ...lastWithLoop.loop, at: lastWithLoop.at, model: lastWithLoop.model }
1456
+ : { looping: false, repeats: 0, similarity: 0, truth: "calculated" };
1422
1457
  return {
1423
1458
  callCount: events.length,
1424
1459
  successfulCallCount: successful.length,
@@ -1427,6 +1462,7 @@ async function readGatewaySummary({ windowMs } = {}) {
1427
1462
  savedTokens,
1428
1463
  savedUsd: Number(savedUsd.toFixed(6)),
1429
1464
  wouldHaveSpentUsd: Number((estimatedCost + savedUsd).toFixed(6)),
1465
+ loop,
1430
1466
  truth: events.some((event) => event.truth === "provider_usage" || event.truth === "mock_provider_usage")
1431
1467
  ? "usage_plus_static_price_table"
1432
1468
  : "unknown",