runcap 0.2.2 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +15 -3
- package/package.json +3 -2
- package/scripts/loop-test.mjs +84 -0
- package/scripts/make-linkedin-delta-video.mjs +412 -0
- package/src/compressor.mjs +57 -1
- package/src/mission-control.mjs +39 -3
package/README.md
CHANGED
@@ -4,15 +4,21 @@
 
 
 
-**
+**Your AI coding agent re-reads the same files over and over and quietly burns your money. Runcap estimates the bill before you build, hard-caps the spend so it physically stops at your ceiling, and losslessly compresses every call. Free, MIT, 100% local. Your code and tokens never touch a server.**
 
-
+On a real OpenAI call, one edited-file re-read dropped from **1,186 to 737 prompt tokens (37.9% saved)** with the model still answering correctly about the changed line. No other proxy does this:
+
+| | Without Runcap | With Runcap |
+|---|---|---|
+| Re-read of an edited file | 1,186 prompt tokens | **737 prompt tokens** |
+| You find out the cost | when the invoice arrives | **before you press go, capped at your ceiling** |
+| When the agent gets stuck | it keeps spending | **run stops, you get the exact rescue prompt** |
 
 > Every other tool here is a rear-view mirror - it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.
 
 ## Why
 
-Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)).
+**Agents loop on the same error, rewrite plans, and re-read files they just edited - every loop is tokens you pay for.** Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). They hand you a confident summary while the task is not actually done, and you find out what it cost when the invoice - or the subscription limit - arrives.
 
 Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend *before* it happens. Runcap does the one thing the rear-view mirror can't:
 
@@ -135,6 +141,12 @@ It's pure Node with **zero ML or native dependencies**, so it installs everywher
 
 The dashboard shows the result as one number: **"You saved $X · N tokens compressed · would have spent $Y."** Disable it with `AIM_COMPRESS=off` if you ever want raw passthrough.
 
+## Loop detection (the "looks productive but stuck" signal)
+
+The hard case in stuck-detection is the agent that keeps producing output but is really circling the same failure, just reworded each time. Plain hashing misses it because the prompt is *similar but never byte-identical* between loops. Because the gateway sees every request, Runcap compares each request's conversation shape against the recent run with the same line-similarity primitive the delta-encoder uses: when several prompts in a row are near-identical (default: 3 prompts at 92%+ similarity) while the conversation never moves forward, it flags `loop.looping` on the event, surfaces a warning in `runcap status`, and fires an alert.
+
+This is a **calculated** signal, not a proven dollar-saving: it tells you *"the agent has sent 3 near-identical prompts in a row with no progress"* so you can step in before the loop burns more budget. Tune or disable it with `AIM_LOOP_DETECT=off`. (Today's [`detectStuck`](src/mission-control.mjs) post-run score is outcome-based: exit code, parsed errors, and zero-diff. The loop signal adds the missing in-flight behavioral signal on top of it.)
+
 ## Pricing table
 
 Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.
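The README's headline saving follows directly from the two prompt-token counts in its new table. A quick check of that arithmetic (it only re-derives the percentage from the published numbers; it does not reproduce the measurement itself):

```javascript
// Re-derive the README's "37.9% saved" from its two prompt-token counts.
const baselineTokens = 1186; // edited-file re-read without Runcap
const deltaTokens = 737; // the same re-read with the delta encoding
const savedTokens = baselineTokens - deltaTokens;
const savedPct = (savedTokens / baselineTokens) * 100;

console.log(`${savedTokens} tokens saved (${savedPct.toFixed(1)}%)`); // 449 tokens saved (37.9%)
```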
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "runcap",
-  "version": "0.
+  "version": "0.3.0",
   "description": "Cap every agent run before it starts: estimate cost, set a hard ceiling that stops the run, rescue stuck agents. Local, MIT, nothing uploaded.",
   "license": "MIT",
   "type": "module",
@@ -45,8 +45,9 @@
     "acceptance": "node ./scripts/acceptance.mjs",
     "smoke": "node ./bin/runcap.mjs run --label smoke -- npm --prefix examples/broken-ts-app run build",
     "demo:broken": "node ./bin/runcap.mjs run --label broken-ts-demo -- npm --prefix examples/broken-ts-app run build",
-    "test": "node ./scripts/delta-test.mjs && node ./scripts/validate-demo.mjs",
+    "test": "node ./scripts/delta-test.mjs && node ./scripts/loop-test.mjs && node ./scripts/validate-demo.mjs",
     "test:delta": "node ./scripts/delta-test.mjs",
+    "test:loop": "node ./scripts/loop-test.mjs",
     "status": "node ./bin/runcap.mjs status",
     "report": "node ./bin/runcap.mjs report",
     "export": "node ./bin/runcap.mjs export",
package/scripts/loop-test.mjs
@@ -0,0 +1,84 @@
+// Loop / circling detection tests, run against the REAL compressor exports.
+// Proves the "looks productive but stuck" signal the gateway emits:
+//   1. Reworded same-failure attempts (similar-but-not-identical prompts) are
+//      flagged as a loop once they repeat enough times.
+//   2. Genuine progress (the conversation tail actually changing) is NOT flagged.
+//   3. A single slow/long legit step is NOT flagged.
+//
+// Pure Node, no test framework. Exits non-zero on any failure so it can gate CI.
+
+import { detectLoop, requestShapeText } from "../src/compressor.mjs";
+
+let failures = 0;
+function check(name, pass, detail) {
+  if (!pass) failures++;
+  console.log(`${pass ? "PASS" : "FAIL"} ${name}${detail ? " — " + detail : ""}`);
+}
+
+// A long, stable conversation tail (system + history the agent keeps resending),
+// plus a final attempt line that the agent only REWORDS each loop. This is the
+// exact case that fools cheap hashing: 99% identical, never byte-equal.
+const stableTail = [
+  "You are a coding agent. Fix the failing build.",
+  ...Array.from({ length: 40 }, (_, i) => `context line ${i}: prior file content the agent keeps resending`),
+  "The test still fails with: TypeError: cannot read property 'id' of undefined"
+].join("\n");
+
+function attempt(wording) {
+  return stableTail + "\n" + "Let me try this: " + wording;
+}
+
+// --- Test 1: reworded same-failure attempts are flagged as a loop ---
+{
+  const history = [
+    attempt("guard the undefined with an if check"),
+    attempt("add an optional chain before .id"),
+    attempt("default the object to {} before reading id")
+  ];
+  const current = attempt("wrap the access in a try/catch and read id safely");
+  const r = detectLoop(current, history);
+  check("reworded same-failure attempts flagged as loop", r.looping && r.repeats >= 3,
+    `repeats=${r.repeats}, similarity=${r.similarity}`);
+}
+
+// --- Test 2: real progress is NOT flagged ---
+// Each turn the conversation tail genuinely changes (new files, new errors).
+{
+  const history = [
+    "Fix the build. Error: missing module 'parser'.\n" + "ctx A ".repeat(40),
+    "Installed parser. New error: parser.parse is not a function.\n" + "ctx B ".repeat(40)
+  ];
+  const current = "Fixed the call signature. Now the test passes; writing the next feature.\n" + "ctx C ".repeat(40);
+  const r = detectLoop(current, history);
+  check("genuine progress is NOT flagged as loop", !r.looping,
+    `looping=${r.looping}, repeats=${r.repeats}`);
+}
+
+// --- Test 3: a single slow/long legit step is NOT flagged ---
+// One big request with no prior near-identical history must never trip.
+{
+  const current = attempt("first and only attempt at this step");
+  const r = detectLoop(current, []);
+  check("single long step is NOT flagged", !r.looping && r.repeats === 0,
+    `repeats=${r.repeats}`);
+}
+
+// --- Test 4: two repeats is at_risk but below the warn threshold ---
+{
+  const history = [attempt("try A"), attempt("try B")];
+  const current = attempt("try C");
+  const r = detectLoop(current, history);
+  check("two near-identical repeats not yet a loop (under threshold)", !r.looping && r.repeats === 2,
+    `repeats=${r.repeats}`);
+}
+
+// --- Test 5: requestShapeText pulls the same text from OpenAI and Anthropic shapes ---
+{
+  const openai = requestShapeText({ messages: [{ role: "user", content: "hello world" }] });
+  const anthropic = requestShapeText({ messages: [{ role: "user", content: [{ type: "text", text: "hello world" }] }] });
+  check("requestShapeText normalizes OpenAI and Anthropic content", openai === "hello world" && anthropic === "hello world",
+    `openai="${openai}" anthropic="${anthropic}"`);
+}
+
+console.log("\n" + (failures === 0 ? "ALL LOOP TESTS PASSED" : `${failures} LOOP TEST(S) FAILED`));
+process.exit(failures === 0 ? 0 : 1);
package/scripts/make-linkedin-delta-video.mjs
@@ -0,0 +1,412 @@
+// Renders a LinkedIn-ready MP4 for the Runcap delta-encoding post.
+// Output: docs/assets/media/runcap-linkedin-delta-demo.mp4
+// Requires: playwright + ffmpeg available on the machine.
+import { spawnSync } from "node:child_process";
+import { mkdirSync, readdirSync, rmSync } from "node:fs";
+import { dirname, join, resolve } from "node:path";
+import { fileURLToPath } from "node:url";
+import { chromium } from "playwright";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const root = resolve(__dirname, "..");
+const outDir = resolve(root, "docs/assets/media");
+const framesDir = "/private/tmp/runcap-linkedin-delta-frames";
+const outFile = join(outDir, "runcap-linkedin-delta-demo.mp4");
+
+const width = 1080;
+const height = 1080;
+const fps = 30;
+const duration = 12;
+const frameCount = fps * duration;
+
+mkdirSync(outDir, { recursive: true });
+mkdirSync(framesDir, { recursive: true });
+for (const file of readdirSync(framesDir)) {
+  if (file.startsWith("frame-") && file.endsWith(".png")) {
+    rmSync(join(framesDir, file));
+  }
+}
+
+const html = `<!doctype html>
+<html>
+<head>
+  <meta charset="utf-8" />
+  <style>
+    * { box-sizing: border-box; }
+    html, body {
+      margin: 0;
+      width: ${width}px;
+      height: ${height}px;
+      overflow: hidden;
+      background: #f4f6fb;
+      font-family: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
+      color: #f8fafc;
+    }
+    .stage {
+      width: ${width}px;
+      height: ${height}px;
+      padding: 58px;
+      display: grid;
+      place-items: center;
+      background:
+        radial-gradient(circle at 15% 10%, rgba(34, 211, 238, .18), transparent 32%),
+        radial-gradient(circle at 85% 12%, rgba(99, 102, 241, .16), transparent 34%),
+        linear-gradient(135deg, #eef2ff, #f8fafc);
+    }
+    .card {
+      width: 964px;
+      height: 964px;
+      border-radius: 42px;
+      padding: 42px;
+      background: #080b12;
+      box-shadow: 0 36px 90px rgba(15, 23, 42, .25);
+      position: relative;
+      overflow: hidden;
+    }
+    .card::before {
+      content: "";
+      position: absolute;
+      inset: 0;
+      background:
+        radial-gradient(circle at 50% -10%, rgba(45, 212, 191, .18), transparent 36%),
+        linear-gradient(180deg, rgba(255,255,255,.06), transparent 28%);
+      pointer-events: none;
+    }
+    .top {
+      position: relative;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      color: #94a3b8;
+      font-size: 23px;
+      letter-spacing: -0.02em;
+    }
+    .brand {
+      display: flex;
+      gap: 14px;
+      align-items: center;
+      font-weight: 800;
+      color: #fff;
+      font-size: 30px;
+    }
+    .logo {
+      width: 42px;
+      height: 42px;
+      border-radius: 13px;
+      display: grid;
+      place-items: center;
+      background: linear-gradient(135deg, #22d3ee, #34d399);
+      color: #021014;
+      font-weight: 900;
+    }
+    .pill {
+      border: 1px solid rgba(148, 163, 184, .28);
+      background: rgba(15, 23, 42, .68);
+      color: #cbd5e1;
+      border-radius: 999px;
+      padding: 10px 16px;
+      font-size: 18px;
+      font-weight: 650;
+    }
+    .content {
+      position: relative;
+      height: 818px;
+      padding-top: 44px;
+    }
+    .headline {
+      margin: 0;
+      color: #f8fafc;
+      font-size: 70px;
+      line-height: .96;
+      letter-spacing: -0.06em;
+      max-width: 830px;
+    }
+    .sub {
+      margin-top: 22px;
+      color: #cbd5e1;
+      font-size: 29px;
+      line-height: 1.28;
+      letter-spacing: -0.03em;
+      max-width: 820px;
+    }
+    .accent { color: #67e8f9; }
+    .green { color: #34d399; }
+    .red { color: #fb7185; }
+    .violet { color: #a78bfa; }
+    .mono {
+      font-family: "SF Mono", "JetBrains Mono", Menlo, Consolas, monospace;
+      letter-spacing: -0.04em;
+    }
+    .terminal {
+      margin-top: 38px;
+      border: 1px solid rgba(148, 163, 184, .22);
+      background: rgba(2, 6, 23, .82);
+      border-radius: 24px;
+      padding: 26px;
+      font-size: 24px;
+      line-height: 1.42;
+      color: #dbeafe;
+      box-shadow: inset 0 1px 0 rgba(255,255,255,.05);
+    }
+    .terminal .line { opacity: 1; }
+    .grid2 {
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 24px;
+      margin-top: 34px;
+    }
+    .file {
+      border: 1px solid rgba(148, 163, 184, .22);
+      background: rgba(15, 23, 42, .9);
+      border-radius: 22px;
+      padding: 22px;
+      min-height: 290px;
+    }
+    .file h3 {
+      margin: 0 0 16px;
+      color: #94a3b8;
+      font-size: 20px;
+      letter-spacing: -0.02em;
+    }
+    .code {
+      font-size: 20px;
+      line-height: 1.42;
+      white-space: pre-wrap;
+      color: #dbeafe;
+    }
+    .changed {
+      display: inline-block;
+      padding: 2px 6px;
+      border-radius: 7px;
+      background: rgba(52, 211, 153, .16);
+      color: #6ee7b7;
+    }
+    .warning {
+      margin-top: 25px;
+      border: 1px solid rgba(251, 113, 133, .35);
+      background: rgba(251, 113, 133, .1);
+      color: #fecdd3;
+      border-radius: 22px;
+      padding: 20px 24px;
+      font-size: 27px;
+      font-weight: 850;
+      letter-spacing: -0.04em;
+    }
+    .flow {
+      display: grid;
+      grid-template-columns: 1fr 88px 1fr;
+      align-items: center;
+      gap: 18px;
+      margin-top: 42px;
+    }
+    .box {
+      min-height: 220px;
+      border: 1px solid rgba(148, 163, 184, .24);
+      background: rgba(15, 23, 42, .88);
+      border-radius: 24px;
+      padding: 24px;
+    }
+    .box-title {
+      color: #94a3b8;
+      font-size: 20px;
+      font-weight: 750;
+      margin-bottom: 16px;
+    }
+    .arrow {
+      height: 88px;
+      border-radius: 50%;
+      display: grid;
+      place-items: center;
+      background: linear-gradient(135deg, #22d3ee, #34d399);
+      color: #031015;
+      font-size: 42px;
+      font-weight: 900;
+    }
+    .numbers {
+      margin-top: 46px;
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 28px;
+      align-items: end;
+    }
+    .number-card {
+      border-radius: 26px;
+      padding: 28px;
+      background: rgba(15, 23, 42, .9);
+      border: 1px solid rgba(148, 163, 184, .22);
+    }
+    .label {
+      color: #94a3b8;
+      font-size: 22px;
+      margin-bottom: 12px;
+      letter-spacing: -0.03em;
+    }
+    .big {
+      font-size: 78px;
+      line-height: .9;
+      font-weight: 900;
+      letter-spacing: -0.08em;
+    }
+    .bar {
+      margin-top: 32px;
+      height: 34px;
+      border-radius: 999px;
+      background: rgba(148, 163, 184, .16);
+      overflow: hidden;
+      border: 1px solid rgba(148, 163, 184, .24);
+    }
+    .fill {
+      height: 100%;
+      width: 37.9%;
+      border-radius: 999px;
+      background: linear-gradient(90deg, #22d3ee, #34d399);
+    }
+    .footer {
+      position: absolute;
+      left: 42px;
+      right: 42px;
+      bottom: 34px;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      color: #94a3b8;
+      font-size: 20px;
+    }
+    .scene {
+      position: absolute;
+      inset: 44px 0 0 0;
+      opacity: 0;
+      transform: translateY(24px) scale(.985);
+      transition: opacity .24s ease, transform .24s ease;
+    }
+    .scene.active {
+      opacity: 1;
+      transform: translateY(0) scale(1);
+    }
+  </style>
+</head>
+<body>
+  <div class="stage">
+    <div class="card">
+      <div class="top">
+        <div class="brand"><div class="logo">R</div> Runcap</div>
+        <div class="pill">local-first AI cost control</div>
+      </div>
+      <div class="content">
+        <section class="scene active" id="s0">
+          <h1 class="headline">Your AI coding agent has a hidden tax.</h1>
+          <p class="sub">It reads a file, edits one line, then re-reads the whole file. The API charges full price again.</p>
+          <div class="terminal mono">
+            <div class="line accent">agent loop</div>
+            <div class="line">read auth.ts → edit one line → read auth.ts again</div>
+            <div class="line red">same context, full token bill</div>
+          </div>
+        </section>
+        <section class="scene" id="s1">
+          <h1 class="headline">A tiny edit becomes a full re-read.</h1>
+          <div class="grid2">
+            <div class="file mono">
+              <h3>auth.ts · first read</h3>
+              <div class="code">if (!token) {
+  throw new Error("no token");
+}
+return verify(token);</div>
+            </div>
+            <div class="file mono">
+              <h3>auth.ts · after one-line edit</h3>
+              <div class="code">if (!token) {
+  <span class="changed">return res.status(401)</span>;
+}
+return verify(token);</div>
+            </div>
+          </div>
+          <div class="warning">Without a delta layer, the agent pays to send the whole file again.</div>
+        </section>
+        <section class="scene" id="s2">
+          <h1 class="headline">Runcap sends a lossless delta instead.</h1>
+          <p class="sub">It refuses to emit the diff unless it can rebuild the edited file byte-for-byte first.</p>
+          <div class="flow mono">
+            <div class="box">
+              <div class="box-title">model already saw</div>
+              <div class="code">throw new Error("no token")</div>
+            </div>
+            <div class="arrow">→</div>
+            <div class="box">
+              <div class="box-title">Runcap sends only the change</div>
+              <div class="code red">- throw new Error("no token")</div>
+              <div class="code green">+ return res.status(401)</div>
+            </div>
+          </div>
+        </section>
+        <section class="scene" id="s3">
+          <h1 class="headline">Real OpenAI call. Real provider usage.</h1>
+          <div class="numbers">
+            <div class="number-card">
+              <div class="label">baseline prompt</div>
+              <div class="big red mono">1,186</div>
+              <div class="label">tokens</div>
+            </div>
+            <div class="number-card">
+              <div class="label">with Runcap delta</div>
+              <div class="big green mono">737</div>
+              <div class="label">tokens</div>
+            </div>
+          </div>
+          <div class="bar"><div class="fill"></div></div>
+          <p class="sub"><span class="green">37.9% saved</span>. The model still answered correctly about the changed line.</p>
+        </section>
+        <section class="scene" id="s4">
+          <h1 class="headline">Then cap the run before it gets expensive.</h1>
+          <p class="sub">Point OpenAI or Anthropic-compatible tools at the local gateway. When the ceiling is crossed, the next call stops.</p>
+          <div class="terminal mono">
+            <div class="line green">$ AIM_DAILY_BUDGET_USD=10 runcap gateway</div>
+            <div class="line">gateway up · compression on · hard cap armed</div>
+            <div class="line red">HTTP 429 budget_guard</div>
+            <div class="line accent">stopped before money left your account</div>
+          </div>
+        </section>
+      </div>
+      <div class="footer">
+        <span class="mono">npm install -g runcap</span>
+        <span>Free · MIT · 100% local</span>
+      </div>
+    </div>
+  </div>
+  <script>
+    const scenes = [...document.querySelectorAll(".scene")];
+    window.renderFrame = (seconds) => {
+      const index = seconds < 2.4 ? 0 : seconds < 4.8 ? 1 : seconds < 7.2 ? 2 : seconds < 9.8 ? 3 : 4;
+      scenes.forEach((scene, i) => scene.classList.toggle("active", i === index));
+    };
+  </script>
+</body>
+</html>`;
+
+const browser = await chromium.launch({ headless: true });
+const page = await browser.newPage({ viewport: { width, height }, deviceScaleFactor: 1 });
+await page.setContent(html);
+await page.waitForTimeout(100);
+
+for (let i = 0; i < frameCount; i += 1) {
+  const seconds = i / fps;
+  await page.evaluate((t) => window.renderFrame(t), seconds);
+  await page.screenshot({ path: join(framesDir, `frame-${String(i).padStart(4, "0")}.png`) });
+}
+await browser.close();
+
+const ffmpeg = spawnSync("ffmpeg", [
+  "-y",
+  "-framerate", String(fps),
+  "-i", join(framesDir, "frame-%04d.png"),
+  "-c:v", "libx264",
+  "-pix_fmt", "yuv420p",
+  "-movflags", "+faststart",
+  "-crf", "18",
+  outFile
+], { stdio: "inherit" });
+
+if (ffmpeg.status !== 0) {
+  process.exit(ffmpeg.status ?? 1);
+}
+
+console.log(`wrote ${outFile}`);
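The video script chooses a scene from the capture time using fixed thresholds in `window.renderFrame()`. Re-deriving the schedule from the same constants shows how the 12-second clip is divided between the five scenes. `sceneIndex` below is a hypothetical standalone copy of that ternary, run in Node instead of the page:

```javascript
// Re-derive the scene schedule of make-linkedin-delta-video.mjs using the same
// fps/duration constants and the same thresholds as window.renderFrame().
const fps = 30;
const duration = 12;
const frameCount = fps * duration; // 360 screenshots

// Hypothetical copy of the scene-selection ternary from renderFrame().
function sceneIndex(seconds) {
  return seconds < 2.4 ? 0 : seconds < 4.8 ? 1 : seconds < 7.2 ? 2 : seconds < 9.8 ? 3 : 4;
}

// Frame i is captured at t = i / fps, exactly as in the script's render loop.
const framesPerScene = [0, 0, 0, 0, 0];
for (let i = 0; i < frameCount; i += 1) framesPerScene[sceneIndex(i / fps)] += 1;

console.log(framesPerScene); // 72, 72, 72, 78, 66 frames
console.log(framesPerScene.map((n) => (n / fps).toFixed(1))); // 2.4, 2.4, 2.4, 2.6, 2.2 seconds
```

Two things the script leaves to the environment: `framesDir` is hard-coded to a `/private/tmp/...` path, which is a macOS location (`os.tmpdir()` from `node:os` would be the portable choice), and because each frame is a screenshot taken in wall-clock time rather than on a virtual clock, how much of each 0.24 s CSS fade appears in the captured frames may vary between runs.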
package/src/compressor.mjs
CHANGED
@@ -46,7 +46,7 @@ function shortHash(text) {
 
 // Cheap line-overlap ratio. Used only to decide whether a full LCS diff is
 // worth computing; the real saving is measured against the emitted delta.
-function lineSimilarity(aLines, bLines) {
+export function lineSimilarity(aLines, bLines) {
   const aSet = new Set(aLines);
   let shared = 0;
   for (const l of bLines) if (aSet.has(l)) shared++;
@@ -378,3 +378,59 @@ export function compressRequestBody(body) {
     deltas: deduped.deltas
   };
 }
+
+// --- loop / circling detection (the "looks productive but stuck" signal) ---
+// The gateway sees every request the agent sends. An agent that is circling the
+// same failure with reworded attempts sends prompts that are SIMILAR-but-not-
+// identical turn after turn: the conversation tail barely moves while tokens
+// keep burning. Plain hashing misses this (the text differs slightly each loop);
+// this catches it with the same line-similarity primitive the delta-encoder uses.
+const LOOP_SIMILARITY = 0.92; // two consecutive prompts this similar = no real progress made between them
+const LOOP_MIN_REPEATS = 3; // how many near-identical prompts in a row before we warn
+
+// Pull the comparable "shape" of a request: the concatenated text the agent is
+// actually sending this turn (messages / input / system), order-preserving.
+export function requestShapeText(body) {
+  if (!body || typeof body !== "object") return "";
+  const parts = [];
+  const push = (content) => {
+    if (typeof content === "string") parts.push(content);
+    else if (Array.isArray(content)) {
+      for (const p of content) if (p && typeof p === "object" && typeof p.text === "string") parts.push(p.text);
+    }
+  };
+  if (Array.isArray(body.messages)) for (const m of body.messages) if (m && typeof m === "object") push(m.content);
+  if (body.system !== undefined) push(body.system);
+  if (typeof body.input === "string") push(body.input);
+  return parts.join("\n");
+}
+
+// Given the current request and a rolling history of prior request shapes,
+// decide whether the agent is circling. Returns { looping, repeats, similarity }.
+// History is oldest->newest of prior requestShapeText() strings in this session.
+export function detectLoop(currentShape, history, {
+  similarityThreshold = LOOP_SIMILARITY,
+  minRepeats = LOOP_MIN_REPEATS
+} = {}) {
+  if (!currentShape || !Array.isArray(history) || history.length === 0) {
+    return { looping: false, repeats: 0, similarity: 0 };
+  }
+  const curLines = String(currentShape).split("\n");
+  let repeats = 0;
+  let lastSimilarity = 0;
+  // Walk backward through history; count the unbroken run of near-identical turns.
+  for (let i = history.length - 1; i >= 0; i--) {
+    const sim = lineSimilarity(curLines, String(history[i]).split("\n"));
+    if (sim >= similarityThreshold) {
+      repeats += 1;
+      lastSimilarity = sim;
+    } else {
+      break;
+    }
+  }
+  return {
+    looping: repeats >= minRepeats,
+    repeats,
+    similarity: Number(lastSimilarity.toFixed(3))
+  };
+}
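A minimal sketch of how the new detector behaves turn by turn, modeled on `detectLoop()` above and on the gateway's bounded `shapeHistory` buffer in mission-control.mjs. It is a simplified re-implementation, not the package's code: options are positional here, the empty-input guard and the rounded `similarity` field are dropped, `observe()` is a hypothetical helper, and the `lineSimilarity` denominator (shared lines over the longer prompt's line count) is an assumption, since the diff shows only the first lines of that function.

```javascript
// ASSUMPTION: similarity = shared lines / line count of the longer prompt.
// The diff truncates lineSimilarity(), so the real denominator is not shown.
function lineSimilarity(aLines, bLines) {
  const aSet = new Set(aLines);
  let shared = 0;
  for (const line of bLines) if (aSet.has(line)) shared++;
  return shared / Math.max(aLines.length, bLines.length, 1);
}

// Count the unbroken run of prior prompts (newest first) that are each
// near-identical to the CURRENT prompt.
function detectLoop(currentShape, history, threshold = 0.92, minRepeats = 3) {
  const curLines = currentShape.split("\n");
  let repeats = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (lineSimilarity(curLines, history[i].split("\n")) >= threshold) repeats++;
    else break;
  }
  return { looping: repeats >= minRepeats, repeats };
}

// Hypothetical bounded history mirroring SHAPE_HISTORY_MAX = 12: check the new
// prompt against the history first, then record it (same order as the gateway).
const SHAPE_HISTORY_MAX = 12;
const shapeHistory = [];
function observe(shape) {
  const result = detectLoop(shape, shapeHistory);
  shapeHistory.push(shape);
  if (shapeHistory.length > SHAPE_HISTORY_MAX) shapeHistory.shift();
  return result;
}

// 40 resent context lines plus one reworded attempt line per turn.
const tail = Array.from({ length: 40 }, (_, i) => `context line ${i}`).join("\n");
const results = ["try A", "try B", "try C", "try D"].map((w) => observe(`${tail}\nLet me try: ${w}`));

console.log(results.map((r) => r.repeats)); // repeats: 0, 1, 2, 3
console.log(results.map((r) => r.looping)); // looping: false, false, false, true
```

As written in the diff, `repeats` counts only earlier prompts that clear the threshold against the current one, so with the defaults the first `looping: true` arrives on the fourth near-identical request (the current prompt plus three before it). Each earlier prompt is compared with the current prompt rather than with its neighbour, so the run ends at the first older turn that has drifted past the threshold.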
package/src/mission-control.mjs
CHANGED
|
@@ -7,7 +7,7 @@ import path from "node:path";
|
|
|
7
7
|
import process from "node:process";
|
|
8
8
|
import { syncRun } from "./cloud.mjs";
|
|
9
9
|
import { sendAlert } from "./alerts.mjs";
|
|
10
|
-
import { compressRequestBody, estimateTokens } from "./compressor.mjs";
|
|
10
|
+
import { compressRequestBody, estimateTokens, requestShapeText, detectLoop } from "./compressor.mjs";
|
|
11
11
|
|
|
12
12
|
const STORE_DIR = ".runcap";
|
|
13
13
|
const MISSIONS_DIR = path.join(STORE_DIR, "missions");
|
|
@@ -523,6 +523,12 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
|
|
|
523
523
|
if (gatewayMode !== "mock" && !openaiKey && !anthropicKey) {
|
|
524
524
|
throw new Error("Missing upstream key. Set OPENAI_API_KEY (for /v1/chat/completions) and/or ANTHROPIC_API_KEY (for /v1/messages). The gateway cannot proxy without at least one.");
|
|
525
525
|
}
|
|
526
|
+
// Rolling history of recent request shapes (per gateway process) so we can
|
|
527
|
+
// detect an agent circling the same failure with reworded prompts: similar-
|
|
528
|
+
// but-not-identical turns, which plain hashing never catches.
|
|
529
|
+
const loopEnabled = (process.env.AIM_LOOP_DETECT ?? "on").toLowerCase() !== "off";
|
|
530
|
+
const shapeHistory = [];
|
|
531
|
+
const SHAPE_HISTORY_MAX = 12;
|
|
526
532
|
const server = http.createServer(async (request, response) => {
|
|
527
533
|
const started = Date.now();
|
|
528
534
|
try {
|
|
@@ -545,6 +551,17 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
|
|
|
545
551
|
|
|
546
552
|
const bodyText = await readRequestBody(request);
|
|
547
553
|
const requestBody = safeJson(bodyText) ?? {};
|
|
554
|
+
// Loop signal: compare this request's shape against the recent run.
|
|
555
|
+
let loop = null;
|
|
556
|
+
if (loopEnabled) {
|
|
557
|
+
const shape = requestShapeText(requestBody);
|
|
558
|
+
if (shape) {
|
|
559
|
+
const result = detectLoop(shape, shapeHistory);
|
|
560
|
+
loop = { looping: result.looping, repeats: result.repeats, similarity: result.similarity, truth: "calculated" };
|
|
561
|
+
shapeHistory.push(shape);
|
|
562
|
+
if (shapeHistory.length > SHAPE_HISTORY_MAX) shapeHistory.shift();
|
|
563
|
+
}
|
|
564
|
+
}
|
|
548
565
|
const budget = readBudget();
|
|
549
566
|
const summary = await readGatewaySummary({ windowMs: budgetWindowMs() });
|
|
550
567
|
// Compress the request body once (safe, lossless-by-construction). Disable with AIM_COMPRESS=off.
|
|
@@ -591,6 +608,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
   capUsd: budget,
   blockedByThisCall
 },
+loop,
 error: blockedByThisCall
   ? `Budget would be exceeded by this call: $${summary.estimatedCostUsd} spent + ~$${callEstimate} this call > cap $${budget}`
   : `Budget exceeded: ${summary.estimatedCostUsd} >= ${budget}`,
@@ -631,6 +649,7 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
   usage: responseBody.usage,
   cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
   compression,
+  loop,
   truth: "mock_provider_usage",
   requestHash: createHash("sha1").update(bodyText).digest("hex")
 });
@@ -682,9 +701,14 @@ function createGatewayServer({ port = 8792, mock = false, upstream = {} } = {})
   usage: responseBody.usage ?? null,
   cost: estimateApiCost(responseBody.usage, requestBody.model ?? responseBody.model),
   compression,
+  loop,
   truth: responseBody.usage ? "provider_usage" : "unknown",
   requestHash: createHash("sha1").update(bodyText).digest("hex")
 });
+if (loop && loop.looping) {
+  sendAlert(`Runcap: possible stuck loop. The agent has sent ${loop.repeats} near-identical prompts in a row (${Math.round(loop.similarity * 100)}% similar) without the conversation moving forward. It may be circling the same failure with reworded attempts.`)
+    .catch(() => {});
+}
 if (responseBody.usage) {
   const spent = await readGatewaySummary({ windowMs: budgetWindowMs() });
   syncRun({
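The new alert is fired without `await` and with `.catch(() => {})`, so the request handler does not wait on the alert channel, and a failing alert cannot surface as an unhandled promise rejection (which Node.js 15 and later treat as fatal by default). A minimal sketch of that pattern: `sendAlertStub` is a hypothetical stand-in for Runcap's `sendAlert`, whose implementation is not in this diff, and it deliberately rejects; `maybeAlert` is a hypothetical wrapper around the `if` block.

```javascript
// Hypothetical stand-in for sendAlert: it always rejects, like an unreachable webhook.
async function sendAlertStub(message) {
  throw new Error(`alert channel unreachable (${message.length} chars dropped)`);
}

// Same message template as the gateway's alert.
function formatLoopAlert(loop) {
  return `Runcap: possible stuck loop. The agent has sent ${loop.repeats} near-identical prompts in a row (${Math.round(loop.similarity * 100)}% similar) without the conversation moving forward. It may be circling the same failure with reworded attempts.`;
}

// Hypothetical wrapper around the gateway's if-block. The promise is not awaited,
// and the attached .catch turns a rejection into a silent no-op.
function maybeAlert(loop, send) {
  if (!(loop && loop.looping)) return false;
  send(formatLoopAlert(loop)).catch(() => {});
  return true;
}

console.log(maybeAlert(null, sendAlertStub));
console.log(maybeAlert({ looping: true, repeats: 3, similarity: 7 / 9 }, sendAlertStub));
```

This relies on `sendAlert` returning a promise: `.catch` only handles rejections, so an exception thrown synchronously before the promise exists would still escape.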
@@ -769,19 +793,23 @@ export async function showStatus(options = {}) {
 
   const gateway = await readGatewaySummary();
   const gatewayLine = `Gateway: ${gateway.callCount} calls, ${gateway.totalTokens} tokens, $${gateway.estimatedCostUsd} estimated (${gateway.truth})`;
+  const loopLine = gateway.loop?.looping
+    ? `Loop warning: last ${gateway.loop.repeats} prompts were ${Math.round(gateway.loop.similarity * 100)}% identical with no progress. The agent may be circling the same failure (truth: calculated).`
+    : null;
   const latest = await latestMissionId();
-  if (!latest) return
+  if (!latest) return [fuelLine, gatewayLine, loopLine, "No missions recorded yet."].filter(Boolean).join("\n");
   const mission = await readMission(latest);
   return [
     fuelLine,
     gatewayLine,
+    loopLine,
     `Latest mission: ${mission.id}`,
     `Status: ${mission.stuck.status}`,
     `Exit code: ${mission.exitCode}`,
     `Changed files: ${mission.diffEvidence.changedFiles.length}`,
     `Errors: ${mission.errors.length}`,
     `Report: ${path.join(MISSIONS_DIR, mission.id, "report.md")}`
-  ].join("\n");
+  ].filter(Boolean).join("\n");
 }
 
 export async function recordFuel(value) {
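`showStatus` now builds `loopLine` as either a string or `null` and adds `.filter(Boolean)` before joining, so the warning line simply disappears when no loop is flagged. A condensed sketch of that pattern, using a hypothetical `statusLines` helper whose input field names are assumptions:

```javascript
// Hypothetical condensed version of showStatus's line assembly.
// `loop` has the shape readGatewaySummary returns: { looping, repeats, similarity }.
function statusLines({ fuelLine, gatewayLine, loop }) {
  // Optional lines are null when there is nothing to report.
  const loopLine = loop?.looping
    ? `Loop warning: last ${loop.repeats} prompts were ${Math.round(loop.similarity * 100)}% identical with no progress.`
    : null;
  // filter(Boolean) removes null (and any other falsy entry) before joining.
  return [fuelLine, gatewayLine, loopLine, "No missions recorded yet."].filter(Boolean).join("\n");
}

console.log(statusLines({ fuelLine: "Fuel: $5.00", gatewayLine: "Gateway: 4 calls", loop: { looping: false } }));
console.log(statusLines({ fuelLine: "Fuel: $5.00", gatewayLine: "Gateway: 4 calls", loop: { looping: true, repeats: 3, similarity: 0.9 } }));
```

`filter(Boolean)` also drops empty strings, so an empty `fuelLine` would vanish rather than print as a blank line; harmless here, but worth knowing if a line is ever meant to be intentionally empty.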
@@ -1419,6 +1447,13 @@ async function readGatewaySummary({ windowMs } = {}) {
   const inputRate = pricing ? pricing.inputPerMillion : 3; // fall back to a mid Sonnet-ish rate
   return sum + (saved * inputRate) / 1_000_000;
 }, 0);
+// Loop signal: the most recent event that carries a loop verdict tells us
+// whether the agent is currently circling (similar-but-not-identical prompts
+// repeated without progress). This is the "looks productive but stuck" case.
+const lastWithLoop = [...events].reverse().find((event) => event.loop);
+const loop = lastWithLoop
+  ? { ...lastWithLoop.loop, at: lastWithLoop.at, model: lastWithLoop.model }
+  : { looping: false, repeats: 0, similarity: 0, truth: "calculated" };
 return {
   callCount: events.length,
   successfulCallCount: successful.length,
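`readGatewaySummary` reports the verdict of the newest event whose `loop` field is truthy. Two consequences follow from the code: an event recorded with `loop: null` (detection turned off, or no shape extracted) is skipped, so an older warning can keep showing; and a newer `{ looping: false, ... }` verdict is a truthy object, so it does clear an earlier warning. A sketch of the same selection, written as a backward scan instead of `[...events].reverse().find(...)` so the event array is not copied; `latestLoopVerdict` is a hypothetical name, and the sample events are made up for illustration.

```javascript
// Default used when no event carries a verdict (copied from readGatewaySummary).
const NO_LOOP = { looping: false, repeats: 0, similarity: 0, truth: "calculated" };

// Hypothetical helper: same result as
//   [...events].reverse().find((event) => event.loop)
// but scans backwards in place instead of copying and reversing the array.
function latestLoopVerdict(events) {
  for (let i = events.length - 1; i >= 0; i -= 1) {
    const event = events[i];
    if (event.loop) return { ...event.loop, at: event.at, model: event.model };
  }
  return { ...NO_LOOP };
}

// Made-up sample events; only the field names come from the diff.
const events = [
  { at: "t1", model: "model-a", loop: { looping: true, repeats: 3, similarity: 0.8, truth: "calculated" } },
  { at: "t2", model: "model-a", loop: null }, // detection off for this call: skipped
];
console.log(latestLoopVerdict(events)); // still reports the t1 warning
```

On Node.js 18 and later, `events.findLast((event) => event.loop)` expresses the same backward search directly.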
@@ -1427,6 +1462,7 @@ async function readGatewaySummary({ windowMs } = {}) {
 savedTokens,
 savedUsd: Number(savedUsd.toFixed(6)),
 wouldHaveSpentUsd: Number((estimatedCost + savedUsd).toFixed(6)),
+loop,
 truth: events.some((event) => event.truth === "provider_usage" || event.truth === "mock_provider_usage")
   ? "usage_plus_static_price_table"
   : "unknown",