@valescoagency/runway 0.3.0 → 0.4.0
- package/README.md +22 -6
- package/dist/commands/doctor.js +203 -2
- package/dist/commands/run.js +32 -4
- package/dist/config.js +7 -1
- package/dist/linear.js +41 -0
- package/dist/orchestrator.js +244 -47
- package/dist/policy.js +76 -0
- package/dist/prompts.js +44 -1
- package/package.json +2 -1
- package/prompts/implement.md +46 -2
- package/templates/Dockerfile.claude-code.base +24 -0
package/README.md
CHANGED
@@ -14,7 +14,7 @@ zero-secrets-at-rest, and the `gh` CLI for PR creation.
 |---|---|
 | `runway doctor` | Read-only preflight diagnostic: host tooling, env vars, repo state, and the agent docker image. Use when something stopped working and you want a sanity report. `--json` for CI / scripted health checks. |
 | `runway init` | Scaffold the cwd repo for runway: write `.sandcastle/Dockerfile` + (tier 2) `.env.schema` with op:// references. Run **once per target repo**. |
-| `runway run` | Drain a Linear queue. For each `Todo` issue: branch, agent works, sub-agent reviews, PR opens (or `
+| `runway run` | Drain a Linear queue. For each `Todo` issue: branch, agent works, sub-agent reviews, PR opens (or `ready-for-human` label). Run **whenever you want a batch of work done**. |
 | `runway upgrade` | Update the runway CLI itself: `git pull` the local clone, `pnpm install`, typecheck. `--check` for a dry-run, `--force` to override dirty/branch refusals. |
 | `runway upgrade-repo` | Re-render the cwd repo's runway scaffold against the current vendored templates. Use after a runway version bump that changed the Dockerfile or template shape — `init` writes them, `upgrade-repo` keeps them current without re-prompting for op:// values. |
 
@@ -50,7 +50,7 @@ runway (this CLI, on your Mac, run from inside the target repo)
 │ → REVIEW: APPROVED | REVIEW: REJECTED — <reason>
 │
 ├── approved → git push → gh pr create → Linear "In Review"
-└── rejected → Linear label "
+└── rejected → Linear label "ready-for-human", comment with reason
 ↓ next issue
 ```
 
@@ -161,10 +161,19 @@ export LINEAR_API_KEY=lin_api_...
 # export RUNWAY_READY_STATUS="Todo"
 # export RUNWAY_IN_PROGRESS_STATUS="In Progress"
 # export RUNWAY_IN_REVIEW_STATUS="In Review"
-# export RUNWAY_HITL_LABEL="
+# export RUNWAY_HITL_LABEL="ready-for-human"
 # export RUNWAY_MAX_ITERATIONS=5
 ```
 
+`RUNWAY_HITL_LABEL` defaults to `ready-for-human`, matching the
+[Flightplan](https://github.com/valescoagency/flightplan) canonical
+state-label vocabulary (`needs-triage`, `needs-info`,
+`ready-for-agent`, `ready-for-human`, `wontfix`) that Bedrock and
+other Valesco repos use. Override the env var if your workspace uses
+a different label. `runway doctor` validates that the configured
+team, workflow states, and HITL label all exist before any agent run
+— misconfiguration surfaces immediately instead of mid-drain.
+
 ### From source (development)
 
 ```bash
@@ -214,14 +223,21 @@ Skip a single hook invocation with `LEFTHOOK=0 git commit …` (or
 ```bash
 cd /path/to/the/repo/you/want/agents/working/on
 runway run # drain the entire ready queue
-runway run --max 3 #
+runway run --max 3 # attempt at most 3 issues then exit
 runway --help
 ```
 
 `runway` (no subcommand) is an alias for `runway run` for back-compat.
 
+`--max N` bounds **attempts**, not successes. Every issue picked up
+counts as one attempt, whether it ends in a PR, a `needs-human` label,
+or a revert-to-`Todo` after an infrastructure failure. An issue
+reverted in this invocation will not be re-picked in the same
+invocation — re-run runway after fixing the underlying config to retry
+it.
+
 The CLI exits with 0 even if some issues hit HITL or errored — those
-are normal outcomes. Check Linear for the `
+are normal outcomes. Check Linear for the `ready-for-human` label and the
 per-issue comments for what happened.
 
 ## Linear conventions
@@ -239,7 +255,7 @@ It transitions them through:
 agent has committed to its branch — startup failures before any
 commits revert the issue back to `Todo` rather than stranding it)
 - `In Review` when the PR opens
-- (label `
+- (label `ready-for-human`) if the agent or reviewer can't finish *after*
 the agent has committed real work
 
 These names are configurable per env var; the queries match by name so
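The `--max N` semantics described in the README hunks above (attempts rather than successes, and no re-pick of a reverted issue within one invocation) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `drainAttempts`, `fetchReady`, and `processIssue` are hypothetical stand-ins written synchronously for brevity, whereas the real orchestrator is asynchronous and talks to Linear.

```javascript
// Hypothetical simulation of attempt-bounded draining (not the package's code).
// `fetchReady` returns the current ready queue; `processIssue` returns
// "opened" or "hitl", or throws to simulate an infrastructure failure that
// leaves the issue in the queue.
function drainAttempts(fetchReady, processIssue, max = Number.POSITIVE_INFINITY) {
    const seen = new Set();
    const counts = { attempts: 0, opened: 0, hitl: 0, errored: 0 };
    while (counts.attempts < max) {
        // Skip anything already attempted in this invocation.
        const issue = fetchReady().find((i) => !seen.has(i.id));
        if (!issue) break;
        seen.add(issue.id);
        counts.attempts += 1;
        try {
            const kind = processIssue(issue);
            if (kind === "opened") counts.opened += 1;
            if (kind === "hitl") counts.hitl += 1;
        } catch {
            // The issue is still in the queue, but `seen` prevents a retry loop.
            counts.errored += 1;
        }
    }
    return counts;
}

// Issue "A" always fails at startup and never leaves the queue.
const queue = [{ id: "A" }, { id: "B" }, { id: "C" }];
const handler = (issue) => {
    if (issue.id === "A") throw new Error("missing image");
    return issue.id === "B" ? "opened" : "hitl";
};
const bounded = drainAttempts(() => queue, handler, 2);
console.log(bounded);
```

With `max` set to 2, the failing issue consumes one of the two attempts, which is the behavior the README paragraph describes. Without the `seen` set, the same failing issue would be picked again on every pass.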
package/dist/commands/doctor.js
CHANGED
@@ -2,6 +2,9 @@ import { existsSync, readFileSync } from "node:fs";
 import { join } from "node:path";
 import { execa } from "execa";
 import { detectBaseBranch } from "../git.js";
+import { loadPolicy } from "../policy.js";
+import { loadConfig } from "../config.js";
+import { validateLinearConfig } from "../linear.js";
 // ---------------------------------------------------------------------------
 // Usage
 // ---------------------------------------------------------------------------
@@ -87,12 +90,14 @@ export async function doctorCommand(argv) {
         sections.push(await checkEnvironment(tierForToolingChecks, cwd, repo));
         sections.push(await checkRepoState(cwd, repo));
         sections.push(await checkDockerImage(cwd));
+        sections.push(await checkLinearConfig());
     }
     else {
         // Push placeholder skipped sections so JSON output stays well-shaped.
         sections.push(skippedSection("Environment"));
         sections.push(skippedSection("Repo state"));
         sections.push(skippedSection("Docker image"));
+        sections.push(skippedSection("Linear configuration"));
     }
     // Render
     if (opts.json) {
@@ -102,8 +107,14 @@ export async function doctorCommand(argv) {
         renderText(sections, tier, initialised, opts.detailed);
     }
     // Exit code: required-check failures = 1.
-    //
-
+    // Required: 0 host tooling, 1 environment, 3 docker image, 4 Linear
+    // config. Section 2 (repo state) is informational.
+    const requiredSections = [
+        sections[0],
+        sections[1],
+        sections[3],
+        sections[4],
+    ];
     const failed = requiredSections.some((s) => s?.ran && [...s.checks.values()].some((c) => c.status === "fail"));
     process.exit(failed ? 1 : 0);
 }
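The exit-code rule in the hunk above (only sections 0, 1, 3, and 4 can fail the run, and only if they actually ran) may be easier to follow in isolation. The sketch below assumes the section shape visible in the diff (`{ title, ran, checks: Map }`); `exitCodeFor`, `section`, and the sample data are hypothetical and exist only for illustration.

```javascript
// Sketch of the exit-code rule: only "required" sections can fail the run.
// The default indexes mirror the comment in the diff; the helper name is made up.
function exitCodeFor(sections, requiredIndexes = [0, 1, 3, 4]) {
    const required = requiredIndexes.map((i) => sections[i]);
    const failed = required.some(
        (s) => s?.ran && [...s.checks.values()].some((c) => c.status === "fail"),
    );
    return failed ? 1 : 0;
}

// Hypothetical helper that builds a one-check section for the demo.
const section = (title, status, ran = true) => ({
    title,
    ran,
    checks: new Map([["sample", { status }]]),
});

const sampleSections = [
    section("Host tooling", "ok"),
    section("Environment", "ok"),
    section("Repo state", "fail"), // informational: does not affect the exit code
    section("Docker image", "warn"), // warnings do not fail the run
    section("Linear configuration", "skip"),
];
console.log(exitCodeFor(sampleSections));
```

A `fail` in the repo-state section leaves the exit code at 0, while the same status in any of the four required sections would flip it to 1.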
@@ -321,6 +332,23 @@ async function checkEnvironment(tier, cwd, repo) {
             });
         }
     }
+    // VA-352: surface the active impl-pass write-path policy so the
+    // operator can see whether an agent run can touch CI workflows, etc.
+    try {
+        const policy = loadPolicy(cwd);
+        checks.set("policy", {
+            status: "ok",
+            label: "impl policy",
+            detail: `${policy.source} (${policy.forbiddenPaths.length} forbidden path${policy.forbiddenPaths.length === 1 ? "" : "s"})`,
+        });
+    }
+    catch (err) {
+        checks.set("policy", {
+            status: "fail",
+            label: "impl policy",
+            detail: errMsg(err),
+        });
+    }
     return { title: "Environment", checks, ran: true };
 }
 function envSet(name, missingStatus) {
@@ -445,6 +473,51 @@ async function checkDockerImage(cwd) {
                 detail: imageUser ? `User=${imageUser}` : `User unset (root); host=${expected}`,
             });
         }
+        // VA-351: container readiness — pnpm on PATH + HOME/cache env
+        // baked in. Cheap one-shot run; fails fast if the image is stale.
+        try {
+            const probe = await execa("docker", [
+                "run",
+                "--rm",
+                imageName,
+                "bash",
+                "-lc",
+                'set -e; which pnpm >/dev/null && printf "HOME=%s\\nXDG_CACHE_HOME=%s\\nTURBO_CACHE_DIR=%s\\n" "$HOME" "$XDG_CACHE_HOME" "$TURBO_CACHE_DIR"',
+            ], { reject: false });
+            const out = probe.stdout ?? "";
+            const missing = [];
+            if (probe.exitCode !== 0)
+                missing.push("pnpm");
+            if (!/^HOME=\/home\/agent\s*$/m.test(out))
+                missing.push("HOME");
+            if (!/^XDG_CACHE_HOME=\/home\/agent\/.cache\s*$/m.test(out)) {
+                missing.push("XDG_CACHE_HOME");
+            }
+            if (!/^TURBO_CACHE_DIR=\/tmp\/turbo-cache\s*$/m.test(out)) {
+                missing.push("TURBO_CACHE_DIR");
+            }
+            if (missing.length === 0) {
+                checks.set("container_ready", {
+                    status: "ok",
+                    label: "container readiness",
+                    detail: "pnpm on PATH; HOME, XDG_CACHE_HOME, TURBO_CACHE_DIR set",
+                });
+            }
+            else {
+                checks.set("container_ready", {
+                    status: "warn",
+                    label: "container readiness",
+                    detail: `missing or wrong inside container: ${missing.join(", ")} — rebuild via \`runway upgrade-repo && docker build .sandcastle -t ${imageName}\``,
+                });
+            }
+        }
+        catch (err) {
+            checks.set("container_ready", {
+                status: "warn",
+                label: "container readiness",
+                detail: `probe failed: ${errMsg(err)}`,
+            });
+        }
     }
     catch (err) {
         checks.set("image_present", {
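The readiness probe above reduces to a pure function of the probe's exit code and stdout, which can be exercised without Docker. The following is a minimal sketch: `missingFromProbe` is a hypothetical name, the expected values are the ones hard-coded in the hunk, and the sketch escapes the dot in `.cache`, whereas the diff's pattern leaves it unescaped (so it would match any character there).

```javascript
// Sketch: turn the probe's exit code and stdout into a list of missing items.
// The function name is hypothetical; the expected values come from the diff.
function missingFromProbe(exitCode, stdout) {
    const missing = [];
    // A non-zero exit means `which pnpm` failed, so nothing was printed either.
    if (exitCode !== 0) missing.push("pnpm");
    if (!/^HOME=\/home\/agent\s*$/m.test(stdout)) missing.push("HOME");
    if (!/^XDG_CACHE_HOME=\/home\/agent\/\.cache\s*$/m.test(stdout)) missing.push("XDG_CACHE_HOME");
    if (!/^TURBO_CACHE_DIR=\/tmp\/turbo-cache\s*$/m.test(stdout)) missing.push("TURBO_CACHE_DIR");
    return missing;
}

const healthy = "HOME=/home/agent\nXDG_CACHE_HOME=/home/agent/.cache\nTURBO_CACHE_DIR=/tmp/turbo-cache\n";
const staleCache = healthy.replace("/home/agent/.cache", "");

console.log(missingFromProbe(0, healthy));
console.log(missingFromProbe(0, staleCache));
console.log(missingFromProbe(1, ""));
```

Because the shell command chains `which pnpm` and `printf` with `&&`, a missing `pnpm` also suppresses the environment lines, so all four items are reported together in that case.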
@@ -455,6 +528,134 @@ async function checkDockerImage(cwd) {
     }
     return { title: "Docker image", checks, ran: true };
 }
+// ---------------------------------------------------------------------------
+// Section: Linear configuration (VA-354)
+// ---------------------------------------------------------------------------
+/**
+ * Validate that the team, workflow states, and HITL label `runway run`
+ * would use actually exist on the Linear workspace. Without this,
+ * misconfiguration only surfaces deep inside a long agent run — too
+ * late to fix without losing the work.
+ */
+async function checkLinearConfig() {
+    const checks = new Map();
+    // The config loader's only hard requirement is LINEAR_API_KEY; the
+    // rest defaults. If the key is missing, the Environment section
+    // already fails — surface a skip here rather than re-failing.
+    if (!process.env.LINEAR_API_KEY) {
+        checks.set("linear_config", {
+            status: "skip",
+            label: "Linear config",
+            detail: "LINEAR_API_KEY unset — skipped",
+        });
+        return { title: "Linear configuration", checks, ran: true };
+    }
+    let config;
+    try {
+        config = loadConfig();
+    }
+    catch (err) {
+        checks.set("linear_config", {
+            status: "fail",
+            label: "Linear config",
+            detail: `failed to load runway config: ${errMsg(err)}`,
+        });
+        return { title: "Linear configuration", checks, ran: true };
+    }
+    let result;
+    try {
+        result = await validateLinearConfig(config);
+    }
+    catch (err) {
+        checks.set("linear_api", {
+            status: "fail",
+            label: "Linear API",
+            detail: `validation request failed: ${errMsg(err)}`,
+        });
+        return { title: "Linear configuration", checks, ran: true };
+    }
+    if (result.team.kind === "missing") {
+        checks.set("team", {
+            status: "fail",
+            label: `team ${config.linearTeam}`,
+            detail: `Linear team key "${result.team.key}" not found — set RUNWAY_LINEAR_TEAM`,
+        });
+        // States/labels are skipped when the team missing; surface
+        // explicitly so the user knows they weren't checked.
+        checks.set("states", {
+            status: "skip",
+            label: "workflow states",
+            detail: "skipped (team missing)",
+        });
+        checks.set("hitl_label", {
+            status: "skip",
+            label: "HITL label",
+            detail: "skipped (team missing)",
+        });
+        return { title: "Linear configuration", checks, ran: true };
+    }
+    checks.set("team", {
+        status: "ok",
+        label: `team ${config.linearTeam}`,
+        detail: `id=${result.team.id}`,
+    });
+    for (const [key, configured, state] of [
+        ["ready_state", config.readyStatus, result.readyStatus],
+        ["in_progress_state", config.inProgressStatus, result.inProgressStatus],
+        ["in_review_state", config.inReviewStatus, result.inReviewStatus],
+    ]) {
+        if (state.kind === "ok") {
+            checks.set(key, {
+                status: "ok",
+                label: `workflow state "${configured}"`,
+                detail: "present",
+            });
+        }
+        else if (state.kind === "skipped") {
+            checks.set(key, {
+                status: "skip",
+                label: `workflow state "${configured}"`,
+                detail: state.reason,
+            });
+        }
+        else {
+            checks.set(key, {
+                status: "fail",
+                label: `workflow state "${configured}"`,
+                detail: `not found on team; available: ${formatList(state.available)}`,
+            });
+        }
+    }
+    if (result.hitlLabel.kind === "ok") {
+        checks.set("hitl_label", {
+            status: "ok",
+            label: `HITL label "${config.hitlLabel}"`,
+            detail: "present",
+        });
+    }
+    else if (result.hitlLabel.kind === "skipped") {
+        checks.set("hitl_label", {
+            status: "skip",
+            label: `HITL label "${config.hitlLabel}"`,
+            detail: result.hitlLabel.reason,
+        });
+    }
+    else {
+        checks.set("hitl_label", {
+            status: "fail",
+            label: `HITL label "${config.hitlLabel}"`,
+            detail: `not found on team — set RUNWAY_HITL_LABEL or create the label. Available: ${formatList(result.hitlLabel.available)}`,
+        });
+    }
+    return { title: "Linear configuration", checks, ran: true };
+}
+function formatList(items) {
+    if (items.length === 0)
+        return "(none)";
+    if (items.length <= 8)
+        return items.join(", ");
+    return `${items.slice(0, 8).join(", ")}, …(+${items.length - 8} more)`;
+}
 /**
  * Sanitize the cwd's basename the same way sandcastle's `defaultImageName`
  * does: lowercase, replace any char outside `[a-z0-9_.-]` with `-`, fall
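The `formatList` helper added at the end of `doctor.js` keeps failure details short when a workspace has many states or labels. The snippet below restates the function from the hunk so it can run standalone; the sample label names are made up for the demonstration.

```javascript
// Restated from the diff: show at most eight items, then a count of the rest.
function formatList(items) {
    if (items.length === 0) return "(none)";
    if (items.length <= 8) return items.join(", ");
    return `${items.slice(0, 8).join(", ")}, …(+${items.length - 8} more)`;
}

// Made-up label names, only to exercise the three branches.
const tenLabels = Array.from({ length: 10 }, (_, i) => `label-${i + 1}`);

console.log(formatList([]));
console.log(formatList(["bug", "chore"]));
console.log(formatList(tenLabels));
```

The third call prints the first eight names followed by `…(+2 more)`, so a `fail` detail stays on one readable line even for large workspaces.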
package/dist/commands/run.js
CHANGED
@@ -4,6 +4,16 @@ import { createGithubGateway } from "../github.js";
 import { assertSandcastleInitialised, drainQueue, } from "../orchestrator.js";
 export function parseRunArgs(argv) {
     const opts = {};
+    const collectAllow = (raw) => {
+        const paths = raw
+            .split(",")
+            .map((s) => s.trim())
+            .filter(Boolean);
+        if (paths.length === 0) {
+            throw new Error("--allow-paths requires at least one glob");
+        }
+        opts.allowPaths = [...(opts.allowPaths ?? []), ...paths];
+    };
     for (let i = 0; i < argv.length; i += 1) {
         const a = argv[i];
         if (a === "--max" || a === "-n") {
@@ -27,6 +37,16 @@ export function parseRunArgs(argv) {
         else if (a?.startsWith("--project=")) {
             opts.project = a.slice("--project=".length);
         }
+        else if (a === "--allow-paths") {
+            const v = argv[i + 1];
+            if (!v)
+                throw new Error("--allow-paths requires a value");
+            collectAllow(v);
+            i += 1;
+        }
+        else if (a?.startsWith("--allow-paths=")) {
+            collectAllow(a.slice("--allow-paths=".length));
+        }
         else if (a === "--help" || a === "-h") {
             printRunUsage();
             process.exit(0);
@@ -45,10 +65,18 @@ USAGE
 runway run [--max N]
 
 OPTIONS
---max, -n N
+--max, -n N Attempt at most N issues then exit (counts every
+attempt — success, HITL, or revert-to-Todo).
+Default: drain queue.
 --project ID Scope the queue to a single Linear project under the
 team. Accepts project UUID, slug, or name. Overrides
 RUNWAY_LINEAR_PROJECT. Default: team-wide.
+--allow-paths GLOBS
+Comma-separated globs removed from the impl policy's
+forbidden-paths list for this invocation only.
+Example: --allow-paths='.github/workflows/**' lets
+the agent touch CI for issues whose AC require it.
+Repeatable; pairs with .runway/policy.yml.
 --help, -h Show this help.
 
 ENVIRONMENT
@@ -62,7 +90,7 @@ ENVIRONMENT
 RUNWAY_READY_STATUS default "Todo"
 RUNWAY_IN_PROGRESS_STATUS default "In Progress"
 RUNWAY_IN_REVIEW_STATUS default "In Review"
-RUNWAY_HITL_LABEL default "
+RUNWAY_HITL_LABEL default "ready-for-human"
 RUNWAY_MAX_ITERATIONS default 5
 `);
 }
@@ -80,6 +108,6 @@ export async function runCommand(argv) {
         ? `team ${config.linearTeam} / project ${config.linearProject}`
         : `team ${config.linearTeam}`;
     console.log(`[runway] draining queue from ${scope} (status="${config.readyStatus}") against ${cwd}`);
-    const result = await drainQueue({ config, linear, github, cwd }, { max: opts.max });
-    console.log(`[runway] done —
+    const result = await drainQueue({ config, linear, github, cwd }, { max: opts.max, allowPaths: opts.allowPaths });
+    console.log(`[runway] done — attempts=${result.attempts} opened=${result.opened} hitl=${result.hitl} errored=${result.errored}`);
 }
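The `--allow-paths` handling in the `run.js` hunks accepts two spellings and accumulates across repeats. A reduced sketch of just that flag follows; `parseAllowPaths` is a hypothetical standalone function (the package folds this logic into `parseRunArgs`), and the example globs other than `.github/workflows/**` are invented.

```javascript
// Sketch of the two accepted spellings: `--allow-paths a,b` and `--allow-paths=a,b`.
// Hypothetical standalone version; other flags are simply ignored here.
function parseAllowPaths(argv) {
    const allowPaths = [];
    const collect = (raw) => {
        // Split on commas, trim whitespace, and drop empty entries.
        const paths = raw.split(",").map((s) => s.trim()).filter(Boolean);
        if (paths.length === 0) {
            throw new Error("--allow-paths requires at least one glob");
        }
        allowPaths.push(...paths);
    };
    for (let i = 0; i < argv.length; i += 1) {
        const a = argv[i];
        if (a === "--allow-paths") {
            const v = argv[i + 1];
            if (!v) throw new Error("--allow-paths requires a value");
            collect(v);
            i += 1; // consume the value so it is not parsed as a flag
        } else if (a.startsWith("--allow-paths=")) {
            collect(a.slice("--allow-paths=".length));
        }
    }
    return allowPaths;
}

const parsedGlobs = parseAllowPaths([
    "--allow-paths", ".github/workflows/**, docs/**",
    "--allow-paths=package.json",
]);
console.log(parsedGlobs);
```

Both spellings feed the same accumulator, so repeating the flag extends the list rather than replacing it.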
package/dist/config.js
CHANGED
@@ -46,7 +46,13 @@ const ConfigSchema = z.object({
     readyStatus: z.string().default("Todo"),
     inProgressStatus: z.string().default("In Progress"),
     inReviewStatus: z.string().default("In Review"),
-
+    // VA-354: default to the Flightplan canonical state label
+    // `ready-for-human`. The previous default (`needs-human`) doesn't
+    // exist on Flightplan-aligned Linear workspaces (the common case
+    // for Valesco repos), and `linear.applyLabel` failures cascaded
+    // into the substantive rejection reason being lost. Workspaces that
+    // use a different label override via `RUNWAY_HITL_LABEL`.
+    hitlLabel: z.string().default("ready-for-human"),
     maxIterations: z.coerce.number().int().positive().default(5),
 });
 export function loadConfig() {
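The effect of the new default can be shown without the schema library. The sketch below is a dependency-free stand-in, not the package's `loadConfig`: `resolveHitlLabel` is a hypothetical name, and it assumes (as the README hunk states) that `RUNWAY_HITL_LABEL` is the variable feeding `hitlLabel`.

```javascript
// Hypothetical, dependency-free stand-in for the schema default shown above.
// An unset variable falls back to the Flightplan label; any set value wins.
function resolveHitlLabel(env = process.env) {
    return env.RUNWAY_HITL_LABEL ?? "ready-for-human";
}

console.log(resolveHitlLabel({}));
console.log(resolveHitlLabel({ RUNWAY_HITL_LABEL: "needs-human" }));
```

A workspace that still uses the old label keeps working by exporting `RUNWAY_HITL_LABEL="needs-human"` before running the CLI.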
package/dist/linear.js
CHANGED
@@ -104,3 +104,44 @@ export function createLinearGateway(config) {
         },
     };
 }
+export async function validateLinearConfig(config) {
+    const client = new LinearClient({ apiKey: config.linearApiKey });
+    const teams = await client.teams({
+        filter: { key: { eq: config.linearTeam } },
+    });
+    const team = teams.nodes[0];
+    if (!team) {
+        return {
+            team: { kind: "missing", key: config.linearTeam },
+            readyStatus: { kind: "skipped", reason: "team missing" },
+            inProgressStatus: { kind: "skipped", reason: "team missing" },
+            inReviewStatus: { kind: "skipped", reason: "team missing" },
+            hitlLabel: { kind: "skipped", reason: "team missing" },
+        };
+    }
+    const states = await client.workflowStates({
+        filter: { team: { id: { eq: team.id } } },
+    });
+    const stateNames = states.nodes.map((s) => s.name);
+    const checkState = (want) => stateNames.includes(want)
+        ? { kind: "ok", name: want }
+        : { kind: "missing", name: want, available: stateNames };
+    const labels = await client.issueLabels({
+        filter: { team: { id: { eq: team.id } } },
+    });
+    const labelNames = labels.nodes.map((l) => l.name);
+    const hitlLabel = labelNames.includes(config.hitlLabel)
+        ? { kind: "ok", name: config.hitlLabel }
+        : {
+            kind: "missing",
+            name: config.hitlLabel,
+            available: labelNames.slice(0, 50),
+        };
+    return {
+        team: { kind: "ok", id: team.id },
+        readyStatus: checkState(config.readyStatus),
+        inProgressStatus: checkState(config.inProgressStatus),
+        inReviewStatus: checkState(config.inReviewStatus),
+        hitlLabel,
+    };
+}
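Stripped of the Linear API calls, `validateLinearConfig` is a name-matching step over two lists. The sketch below isolates that step so it can run offline; `checkNames` is a hypothetical helper, and the state and label names in the demo are invented sample data.

```javascript
// Sketch of the name-matching step, with the Linear API calls replaced by
// plain arrays. The result shape follows the diff; the helper name is made up.
function checkNames(config, stateNames, labelNames) {
    const checkState = (want) => stateNames.includes(want)
        ? { kind: "ok", name: want }
        : { kind: "missing", name: want, available: stateNames };
    const hitlLabel = labelNames.includes(config.hitlLabel)
        ? { kind: "ok", name: config.hitlLabel }
        : { kind: "missing", name: config.hitlLabel, available: labelNames.slice(0, 50) };
    return {
        readyStatus: checkState(config.readyStatus),
        inProgressStatus: checkState(config.inProgressStatus),
        inReviewStatus: checkState(config.inReviewStatus),
        hitlLabel,
    };
}

// A workspace using the Flightplan labels, checked against the old default label.
const validation = checkNames(
    { readyStatus: "Todo", inProgressStatus: "In Progress", inReviewStatus: "In Review", hitlLabel: "needs-human" },
    ["Backlog", "Todo", "In Progress", "In Review", "Done"],
    ["ready-for-agent", "ready-for-human"],
);
console.log(validation.readyStatus.kind, validation.hitlLabel);
```

Matching uses `Array.prototype.includes`, so names must match exactly, including case. The `available` list is what lets `runway doctor` suggest the labels that do exist.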
package/dist/orchestrator.js
CHANGED
@@ -3,9 +3,25 @@ import { join } from "node:path";
 import { run, claudeCode } from "@ai-hero/sandcastle";
 import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
 import { execa } from "execa";
-import { implementVars, loadImplementPrompt, loadReviewPrompt, renderPrompt, reviewVars, } from "./prompts.js";
+import { buildIterationSummary, implementVars, loadImplementPrompt, loadReviewPrompt, renderPrompt, reviewVars, tailOfMessage, } from "./prompts.js";
 import { detectBaseBranch } from "./git.js";
-
+import { loadPolicy } from "./policy.js";
+// VA-353: review verdict marker. Global flag because sandcastle
+// appends wrapper output ("Agent stopped", "Capturing session",
+// "Reached max iterations (1).", "Run complete: …") AFTER the agent's
+// final message — so the marker is rarely the last line. We scan
+// every line-start match and keep the LAST one, which is the most
+// recent agent verdict. Standalone-line: ^…$ with /m anchors prevent
+// mid-prose matches like "the reviewer should output REVIEW: APPROVED
+// when…".
+const REVIEW_VERDICT_RE = /^REVIEW:\s*(APPROVED|REJECTED)(?:\s+—\s+(.*))?$/gm;
+// VA-350: impl-pass termination contract. Last `IMPL:` marker line in
+// the agent's output wins (most recent iteration's verdict). DONE →
+// proceed to review; BLOCKED → HITL with reason; CONTINUE or missing →
+// fall through (back-compat). The trailing reason after `—` is
+// captured for BLOCKED.
+const IMPL_VERDICT_RE = /^IMPL:\s*(DONE|BLOCKED|CONTINUE)(?:\s+—\s+(.*))?$/gm;
+const IMPL_COMPLETION_SIGNALS = ["IMPL: DONE", "IMPL: BLOCKED"];
 /**
  * Confirms the cwd looks like a sandcastle-initialised repo. If not,
  * we error early with a clear message rather than letting Sandcastle
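The "last standalone marker wins" rule described in the VA-353 comment can be tried on a sample transcript. The sketch below reuses the review pattern from the hunk (with the dash written as `\u2014`); `lastReviewVerdict` and its return shape for a missing marker are assumptions, since the body of the package's `parseReviewVerdict` is not part of this diff.

```javascript
// Same pattern as in the diff, with the dash spelled as an escape sequence.
const REVIEW_VERDICT_RE = /^REVIEW:\s*(APPROVED|REJECTED)(?:\s+\u2014\s+(.*))?$/gm;

// Hypothetical helper: return the last standalone verdict line, or "missing".
function lastReviewVerdict(output) {
    // matchAll works on a copy of the regex, so the shared constant's
    // lastIndex is never mutated between calls.
    const matches = [...output.matchAll(REVIEW_VERDICT_RE)];
    const last = matches[matches.length - 1];
    if (!last) return { kind: "missing" };
    return last[1] === "APPROVED"
        ? { kind: "approved" }
        : { kind: "rejected", reason: last[2] ?? "(no reason given)" };
}

const transcript = [
    "The reviewer should output REVIEW: APPROVED when the diff is fine.",
    "REVIEW: REJECTED \u2014 tests are missing",
    "REVIEW: APPROVED",
    "Agent stopped",
    "Run complete",
].join("\n");

console.log(lastReviewVerdict(transcript));
```

The first line is ignored because the marker is not at the start of the line, and the trailing wrapper lines do not hide the verdict because the scan keeps the last match rather than inspecting the final line.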
@@ -24,7 +40,7 @@ export function assertSandcastleInitialised(cwd) {
 export async function drainQueue(deps, opts = {}) {
     const { config, linear } = deps;
     const max = opts.max ?? Number.POSITIVE_INFINITY;
-    let
+    let attempts = 0;
     let opened = 0;
     let hitl = 0;
     let errored = 0;
@@ -33,19 +49,33 @@ export async function drainQueue(deps, opts = {}) {
     // fast, before we touch any Linear state).
     const baseBranch = config.baseBranch ?? (await detectBaseBranch(deps.cwd));
     console.log(`[runway] base branch resolved to "${baseBranch}"`);
-    const
-
+    const policy = loadPolicy(deps.cwd, { allowPathsOverride: opts.allowPaths });
+    console.log(`[runway] policy: ${policy.source}`);
+    const runDeps = { ...deps, baseBranch, policy };
+    // VA-344: never re-pick an issue in the same invocation, even if
+    // VA-342 reverted it to `Todo`. Without this, a deterministic
+    // startup failure (broken .env.schema, missing image, expired token)
+    // would loop on the same issue until --max was exhausted.
+    const seen = new Set();
+    const outcomes = [];
+    while (attempts < max) {
         const queue = await linear.fetchReady();
-
+        const issue = queue.find((i) => !seen.has(i.id));
+        if (!issue)
             break;
-
+        seen.add(issue.id);
+        attempts += 1;
         try {
-            const
-
-            if (verdict === "opened")
+            const result = await processIssue(issue, runDeps);
+            if (result.kind === "opened")
                 opened += 1;
-            if (
+            if (result.kind === "hitl")
                 hitl += 1;
+            outcomes.push({
+                identifier: issue.identifier,
+                kind: result.kind,
+                detail: result.detail,
+            });
         }
         catch (err) {
             errored += 1;
@@ -53,30 +83,44 @@ export async function drainQueue(deps, opts = {}) {
             // If the agent crashed before producing any commits (missing
             // image, varlock validation, container failed to boot, etc.),
             // it's an infrastructure failure — not a HITL. Revert the issue
-            // to
-            //
-            //
+            // to the ready state and skip the HITL label so the next run can
+            // pick it up cleanly. `In Progress` is reserved for "agent has
+            // committed to the branch".
             const branch = `agent/${issue.identifier.toLowerCase()}`;
             const startedRealWork = await hasCommits(deps.cwd, baseBranch, branch);
+            const errDetail = err instanceof Error ? err.message : String(err);
             if (!startedRealWork) {
                 await linear
                     .transition(issue.id, config.readyStatus)
                     .catch(() => undefined);
                 await linear
-                    .comment(issue.id, `Runway hit a startup failure before the agent produced any commits — reverting to \`${config.readyStatus}\` for retry:\n\n\`\`\`\n${
+                    .comment(issue.id, `Runway hit a startup failure before the agent produced any commits — reverting to \`${config.readyStatus}\` for retry:\n\n\`\`\`\n${errDetail}\n\`\`\``)
                     .catch(() => undefined);
+                outcomes.push({
+                    identifier: issue.identifier,
+                    kind: "reverted",
+                    detail: errDetail,
+                });
             }
             else {
-
-
-
-
-
-
+                // VA-355: comment first with the substantive reason, label
+                // second (best-effort). If we labeled first and the label
+                // didn't exist (Flightplan workspaces hitting the
+                // `needs-human` default — see VA-354), the orchestrator's
+                // catch would never get to the reason and the operator would
+                // see an infrastructure error in Linear with no clue what
+                // the agent actually found.
+                await flagHitl(issue, runDeps, `Runway hit an unrecoverable error and flagged for human review: ${errDetail}`);
+                outcomes.push({
+                    identifier: issue.identifier,
+                    kind: "errored",
+                    detail: errDetail,
+                });
             }
         }
     }
-
+    printExitSummary(outcomes);
+    return { attempts, opened, hitl, errored, outcomes };
 }
 async function processIssue(issue, deps) {
     const { config, linear, github, cwd, baseBranch } = deps;
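The ordering argument in the VA-355 comment (post the reason first, apply the label second and best-effort) can be sketched as follows. This is not the package's `flagHitl`, whose body lies outside this hunk: `flagForHuman`, the shape of the `linear` object, and the `applyLabel(issueId, label)` signature are all assumptions made for the illustration.

```javascript
// Hypothetical sketch of the ordering: comment first, label second.
// `linear` is any object with async `comment` and `applyLabel` methods;
// the real gateway's signatures may differ.
async function flagForHuman(linear, issueId, label, reason) {
    let commented = false;
    try {
        await linear.comment(issueId, reason);
        commented = true;
    } catch {
        // Last resort: make sure the operator still sees the reason.
        console.error(`[sketch] could not comment on ${issueId}: ${reason}`);
    }
    let labeled = false;
    try {
        await linear.applyLabel(issueId, label);
        labeled = true;
    } catch {
        // Best-effort: a missing label must not hide the comment.
    }
    return { commented, labeled };
}

// Fake gateway whose label does not exist, as on a workspace without `needs-human`.
const recordedCalls = [];
const fakeLinear = {
    comment: async (id, body) => { recordedCalls.push(["comment", id, body]); },
    applyLabel: async () => { throw new Error("label not found"); },
};

const pendingFlag = flagForHuman(fakeLinear, "VA-1", "needs-human", "Sub-agent review rejected: tests are missing");
pendingFlag.then((outcome) => console.log(outcome, recordedCalls.length));
```

Even though the label call fails, the comment has already been recorded, which is the property the reordering is meant to guarantee.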
@@ -84,21 +128,64 @@ async function processIssue(issue, deps) {
|
|
|
84
128
|
await linear.transition(issue.id, config.inProgressStatus);
|
|
85
129
|
await linear.comment(issue.id, `Runway picked up this issue. Branch: \`${branch}\`.`);
|
|
86
130
|
// 1. Implementation pass.
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
}
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
131
|
+
//
|
|
132
|
+
// VA-349 + VA-350: run iterations one at a time so we can (a) inject
|
|
133
|
+
// a summary of the previous iteration into the next prompt — no more
|
|
134
|
+
// "I'll start by understanding the current state of the repository"
|
|
135
|
+
// 5x per issue — and (b) break early on IMPL: DONE/BLOCKED parsed
|
|
136
|
+
// from our own code rather than relying on sandcastle's substring
|
|
137
|
+
// completionSignal.
|
|
138
|
+
const implementTemplate = await loadImplementPrompt();
+    const maxIters = Math.max(1, config.maxIterations);
+    let prevSummary = "";
+    let implementResult;
+    let implVerdict = { kind: "missing" };
+    for (let iter = 1; iter <= maxIters; iter += 1) {
+        const implementPrompt = renderPrompt(implementTemplate, implementVars(issue, {
+            previousIterations: prevSummary,
+            policy: deps.policy,
+        }));
+        implementResult = await run({
+            agent: claudeCode("claude-opus-4-6"),
+            sandbox: docker({
+                env: dockerEnv(config),
+            }),
+            cwd,
+            prompt: implementPrompt,
+            branchStrategy: { type: "branch", branch },
+            maxIterations: 1,
+            completionSignal: [...IMPL_COMPLETION_SIGNALS],
+            name: `impl-${issue.identifier}-iter-${iter}`,
+        });
+        implVerdict = parseImplVerdict(implementResult);
+        if (implVerdict.kind === "done" || implVerdict.kind === "blocked")
+            break;
+        // CONTINUE / missing — build the summary that the NEXT iteration
+        // will see at the top of its prompt.
+        const commits = await captureCommitLog(cwd, baseBranch, branch).catch(() => "");
+        prevSummary = buildIterationSummary({
+            iterationsRun: iter,
+            commits,
+            finalMessageTail: tailOfMessage(implementResult.stdout ?? ""),
+        });
+    }
+    // implementResult is set after the first iteration. The `!` is safe
+    // because maxIters >= 1.
+    const finalResult = implementResult;
+    // VA-350: BLOCKED short-circuits straight to HITL — no reviewer pass
+    // for a self-declared blocker.
+    if (implVerdict.kind === "blocked") {
+        const reason = `Implementation pass blocked: ${implVerdict.reason}`;
+        await flagHitl(issue, deps, reason);
+        return { kind: "hitl", detail: reason };
+    }
+    if (implVerdict.kind === "missing") {
+        console.warn(`[runway] ${issue.identifier}: impl agent ended without an IMPL: marker after ${maxIters} iteration(s); proceeding to review for back-compat.`);
+    }
+    if (finalResult.commits.length === 0) {
+        const reason = "Agent produced no commits — the issue may need clarification or human input.";
+        await flagHitl(issue, deps, reason);
+        return { kind: "hitl", detail: reason };
     }
     // 2. Review pass — read-only-ish, just looking at the diff.
     const diff = await captureDiff(cwd, baseBranch, branch);
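The control flow of the implementation loop in this hunk can be sketched without the sandbox. This is a simplified, synchronous illustration: `runImplLoop`, `runOnce`, and `summarize` are hypothetical stand-ins for the awaited agent run and `buildIterationSummary`, not names from the package.

```javascript
// Sketch only: `runOnce` and `summarize` are hypothetical stubs. The real
// loop awaits a sandboxed agent run and builds the summary from git history.
function runImplLoop(maxIterations, runOnce, summarize) {
    const maxIters = Math.max(1, maxIterations); // always at least one pass
    let prevSummary = "";
    let verdict = { kind: "missing" };
    let itersRun = 0;
    for (let iter = 1; iter <= maxIters; iter += 1) {
        verdict = runOnce(iter, prevSummary);
        itersRun = iter;
        // DONE and BLOCKED end the loop; CONTINUE and a missing marker retry.
        if (verdict.kind === "done" || verdict.kind === "blocked")
            break;
        prevSummary = summarize(iter);
    }
    return { verdict, itersRun, prevSummary };
}

// Example: the stub agent asks for one more pass, then finishes.
const seenSummaries = [];
const outcome = runImplLoop(3, (iter, prev) => {
    seenSummaries.push(prev);
    return iter < 2 ? { kind: "continue" } : { kind: "done" };
}, (iter) => `summary after iteration ${iter}`);
console.log(outcome.verdict.kind, outcome.itersRun); // done 2
```

The first iteration sees an empty summary and the second sees the summary built after the first, which mirrors how `prevSummary` feeds `previousIterations` above.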
@@ -117,8 +204,9 @@ async function processIssue(issue, deps) {
     });
     const verdict = parseReviewVerdict(reviewResult);
     if (verdict.kind === "rejected") {
-
-
+        const reason = `Sub-agent review rejected: ${verdict.reason}`;
+        await flagHitl(issue, deps, reason);
+        return { kind: "hitl", detail: reason };
     }
     // 3. Push + PR.
     await github.pushBranch(cwd, branch);
@@ -132,12 +220,78 @@ async function processIssue(issue, deps) {
     });
     await linear.transition(issue.id, config.inReviewStatus);
     await linear.comment(issue.id, `Runway opened a PR for review: ${prUrl}`);
-    return "opened";
+    return { kind: "opened", detail: prUrl };
 }
+/**
+ * VA-355: comment is the load-bearing artifact, label is metadata.
+ * Post the comment FIRST so the substantive reason lands on the issue
+ * even if the label apply later fails (Flightplan workspaces hitting
+ * the `needs-human` default, transient Linear errors, etc.). On full
+ * failure (comment didn't even post), dump the reason to stderr with
+ * a clear banner so the operator sees it terminal-side.
+ */
 async function flagHitl(issue, deps, reason) {
     const { config, linear } = deps;
-
-
+    const body = `Runway flagged for human review: ${reason}`;
+    let commentPosted = false;
+    try {
+        await linear.comment(issue.id, body);
+        commentPosted = true;
+    }
+    catch (err) {
+        console.error(`[runway] ${issue.identifier}: failed to post HITL comment:`, errMsg(err));
+    }
+    try {
+        await linear.applyLabel(issue.id, config.hitlLabel);
+    }
+    catch (err) {
+        const detail = errMsg(err);
+        console.error(`[runway] ${issue.identifier}: failed to apply HITL label "${config.hitlLabel}":`, detail);
+        if (commentPosted) {
+            // Best-effort follow-up note; the real reason is already on the
+            // issue from the first comment.
+            await linear
+                .comment(issue.id, `Note: could not apply \`${config.hitlLabel}\` label — please apply it manually. (${detail})`)
+                .catch(() => undefined);
+        }
+    }
+    if (!commentPosted) {
+        // Last resort: the operator at least sees the reason in their
+        // terminal, even with Linear entirely unreachable.
+        process.stderr.write([
+            "",
+            `===== REJECTION REASON FOLLOWS (${issue.identifier}) =====`,
+            reason,
+            "===== END REJECTION REASON =====",
+            "",
+            "",
+        ].join("\n"));
+    }
+}
+function errMsg(err) {
+    if (err instanceof Error)
+        return err.message.split("\n")[0] ?? err.message;
+    return String(err);
+}
+/**
+ * VA-355: render a per-issue verdict trail at the end of the drain so
+ * the operator can scan results without opening Linear. Skipped when
+ * no issues were attempted.
+ */
+function printExitSummary(outcomes) {
+    if (outcomes.length === 0)
+        return;
+    console.log("\n[runway] per-issue outcomes:");
+    for (const o of outcomes) {
+        const tag = o.kind === "opened"
+            ? "APPROVED → PR opened"
+            : o.kind === "hitl"
+                ? "HITL"
+                : o.kind === "reverted"
+                    ? "REVERTED → Todo"
+                    : "INFRA_ERROR";
+        console.log(` ${o.identifier} ${tag} ${o.detail}`);
+    }
 }
 /**
  * Whether the agent branch has any commits beyond `base`. Used by the
@@ -166,25 +320,59 @@ async function captureCommitLog(repoPath, base, branch) {
     const { stdout } = await execa("git", ["log", "--oneline", `${base}..${branch}`], { cwd: repoPath });
     return stdout;
 }
+/**
+ * Pulls the last `IMPL:` marker line out of the agent's output. The
+ * orchestrator uses this to distinguish a clean completion (DONE) from
+ * a self-declared block (BLOCKED — reason) from a multi-iteration
+ * in-progress signal (CONTINUE). A missing marker is treated as
+ * CONTINUE-with-warning for back-compat.
+ */
+export function parseImplVerdict(result) {
+    const text = stringifyResult(result);
+    // Take the LAST match — later iterations override earlier ones if
+    // the agent emitted multiple markers across an iteration loop.
+    const matches = [...text.matchAll(IMPL_VERDICT_RE)];
+    const last = matches[matches.length - 1];
+    if (!last)
+        return { kind: "missing" };
+    if (last[1] === "DONE")
+        return { kind: "done" };
+    if (last[1] === "CONTINUE")
+        return { kind: "continue" };
+    return {
+        kind: "blocked",
+        reason: last[2]?.trim() || "no reason given",
+    };
+}
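The last-marker-wins parsing can be tried in isolation. The diff does not show `IMPL_VERDICT_RE`, so the pattern below is an assumption made for illustration, and `parseImplVerdictText` is a hypothetical variant that takes plain text instead of a `RunResult`.

```javascript
// Hypothetical pattern: the real IMPL_VERDICT_RE is not part of this diff.
// \u2014 is the long dash used by the BLOCKED marker; a plain hyphen is
// accepted as well. The `g` flag is required by matchAll, and `m` makes
// ^ and $ anchor to individual lines.
const IMPL_VERDICT_RE = /^IMPL:\s*(DONE|CONTINUE|BLOCKED)(?:\s*[\u2014-]\s*(.*))?$/gm;

function parseImplVerdictText(text) {
    const matches = [...text.matchAll(IMPL_VERDICT_RE)];
    const last = matches[matches.length - 1]; // later markers override earlier ones
    if (!last) return { kind: "missing" };
    if (last[1] === "DONE") return { kind: "done" };
    if (last[1] === "CONTINUE") return { kind: "continue" };
    return { kind: "blocked", reason: last[2]?.trim() || "no reason given" };
}

const sample = [
    "made progress on the migration",
    "IMPL: CONTINUE",
    "cannot touch the workflow file",
    "IMPL: BLOCKED \u2014 needs a CI change",
    "Run complete.",
].join("\n");
console.log(parseImplVerdictText(sample)); // { kind: 'blocked', reason: 'needs a CI change' }
```

Trailing wrapper output after the marker does not matter here, because the parser looks for the last matching line rather than the last line.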
 /**
  * Sandcastle's `RunResult` shape varies by version; defensively dig out
  * the last assistant message text. We only need to match the
  * `REVIEW: APPROVED` / `REVIEW: REJECTED — …` line at the tail.
  */
-
+/**
+ * VA-353: parse the reviewer's final `REVIEW: APPROVED` /
+ * `REVIEW: REJECTED — <reason>` marker. Scans the agent's combined
+ * stdout for *all* matches and returns the LAST one, since sandcastle
+ * appends its own wrapper output ("Agent stopped", "Capturing
+ * session", "Reached max iterations (N).", "Run complete: …") after
+ * the agent's final message. A missing marker is itself a rejection —
+ * a reviewer pass that didn't terminate cleanly is not trustworthy.
+ */
+export function parseReviewVerdict(result) {
     const text = stringifyResult(result);
-    const
-
+    const matches = [...text.matchAll(REVIEW_VERDICT_RE)];
+    const last = matches[matches.length - 1];
+    if (!last) {
         return {
             kind: "rejected",
             reason: "review output did not contain a REVIEW: verdict line",
         };
     }
-    if (
+    if (last[1] === "APPROVED")
         return { kind: "approved", reason: "" };
     return {
         kind: "rejected",
-        reason:
+        reason: last[2]?.trim() || "no reason given",
     };
 }
 function stringifyResult(result) {
@@ -192,6 +380,15 @@ function stringifyResult(result) {
         return result;
     if (result && typeof result === "object") {
         const r = result;
+        // VA-353: sandcastle's RunResult carries the combined agent output
+        // on `stdout`. Prefer it — falling through to JSON.stringify (the
+        // old behavior) replaces real newlines with `\n` escapes and
+        // breaks `^…$/m` line anchoring, which is the exact reason the
+        // reviewer's verdict was being silently dropped for issues like
+        // VA-312 tonight. The iterations/output fallbacks remain for
+        // back-compat with older shapes and inline test fixtures.
+        if (typeof r.stdout === "string" && r.stdout.length > 0)
+            return r.stdout;
         if (r.iterations?.length) {
             return r.iterations
                 .map((i) => i.output ?? i.text ?? "")
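The VA-353 comment in `stringifyResult` describes why serializing the whole result broke verdict matching. A small reproduction, assuming a line-anchored pattern (`VERDICT_RE` is a hypothetical stand-in for `REVIEW_VERDICT_RE`, which this diff does not show):

```javascript
// Hypothetical pattern standing in for REVIEW_VERDICT_RE.
const VERDICT_RE = /^REVIEW:\s*(APPROVED|REJECTED)\b.*$/m;

const result = { stdout: "Looks good.\nREVIEW: APPROVED\nRun complete." };

// Real newlines: the marker starts its own line, so ^ can anchor to it.
const viaStdout = VERDICT_RE.test(result.stdout);
// JSON.stringify escapes each newline as a backslash followed by "n", so the
// serialized text is a single line and the marker is never at a line start.
const viaJson = VERDICT_RE.test(JSON.stringify(result));
console.log(viaStdout, viaJson); // true false
```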
package/dist/policy.js
ADDED
@@ -0,0 +1,76 @@
+import { existsSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+import { parse as parseYaml } from "yaml";
+import { z } from "zod";
+/**
+ * VA-352: per-repo + per-run write-path policy for the impl agent.
+ *
+ * Defaults are conservative — secrets and sandbox-internals are always
+ * denied. Repos that need agents to touch CI workflows (the common
+ * case) opt in by creating `.runway/policy.yml` with `allowedPaths`,
+ * or by passing `--allow-paths=` for a single invocation.
+ *
+ * The policy is reflected back to the agent in the rendered prompt
+ * (`prompts/implement.md`'s "Working style" denylist sentence) so the
+ * sentence the agent sees matches what runway will enforce at review
+ * time. Enforcement itself (refusing to push a PR that touches a
+ * denied path) lives in the reviewer pass — out of scope for this
+ * change; the goal here is that the agent gets a correct denylist
+ * and surfaces `IMPL: BLOCKED` when an AC requires a denied path.
+ */
+export const DEFAULT_FORBIDDEN_PATHS = [
+    ".github/workflows/**",
+    ".env*",
+    "*.pem",
+    "*.key",
+    "pnpm-lock.yaml",
+    ".sandcastle/**",
+];
+const PolicyFileSchema = z.object({
+    allowedPaths: z.array(z.string()).optional(),
+    forbiddenPaths: z.array(z.string()).optional(),
+});
+const POLICY_RELATIVE_PATH = join(".runway", "policy.yml");
+/**
+ * Resolve the effective policy for `cwd`. Reads `.runway/policy.yml`
+ * when present, layers it on top of the conservative defaults, then
+ * applies any `--allow-paths` CLI override.
+ */
+export function loadPolicy(cwd, opts = {}) {
+    const sources = [];
+    let forbidden = new Set(DEFAULT_FORBIDDEN_PATHS);
+    const policyPath = join(cwd, POLICY_RELATIVE_PATH);
+    if (existsSync(policyPath)) {
+        sources.push(POLICY_RELATIVE_PATH);
+        const raw = readFileSync(policyPath, "utf8");
+        const parsed = PolicyFileSchema.parse(parseYaml(raw) ?? {});
+        if (parsed.forbiddenPaths) {
+            forbidden = new Set(parsed.forbiddenPaths);
+        }
+        for (const allow of parsed.allowedPaths ?? [])
+            forbidden.delete(allow);
+    }
+    else {
+        sources.push("defaults");
+    }
+    if (opts.allowPathsOverride?.length) {
+        for (const allow of opts.allowPathsOverride)
+            forbidden.delete(allow);
+        sources.push("--allow-paths");
+    }
+    return {
+        forbiddenPaths: [...forbidden],
+        source: sources.join(" + "),
+    };
+}
+/**
+ * Render the bullet sentence the impl prompt shows the agent. Stable
+ * formatting so a missing path is visible in a diff.
+ */
+export function renderForbiddenPathsBullet(policy) {
+    if (policy.forbiddenPaths.length === 0) {
+        return "- (No write-path restrictions for this repo. Use judgment.)";
+    }
+    const quoted = policy.forbiddenPaths.map((p) => `\`${p}\``).join(", ");
+    return `- Never modify ${quoted}. If the issue's acceptance criteria require modifying one of these paths, **stop and emit \`IMPL: BLOCKED — issue requires modifying <path>, which working-style policy forbids\`** — do not silently skip the work.`;
+}
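The layering in `loadPolicy` can be sketched without file I/O or schema validation. `resolveForbidden` and its parameters are hypothetical names for this illustration; the point it shows is that entries are removed by exact string comparison (`Set.prototype.delete`), so an allow entry has to repeat the denylist pattern verbatim rather than name a file the pattern would cover.

```javascript
// Sketch of the layering in loadPolicy, minus file reads and zod validation.
// `resolveForbidden` and `policyFile` are hypothetical names.
const DEFAULTS = [".github/workflows/**", ".env*", "*.pem", "*.key", "pnpm-lock.yaml", ".sandcastle/**"];

function resolveForbidden(policyFile, allowPathsOverride = []) {
    let forbidden = new Set(DEFAULTS);
    if (policyFile) {
        // A forbiddenPaths list in the file replaces the defaults outright.
        if (policyFile.forbiddenPaths) forbidden = new Set(policyFile.forbiddenPaths);
        for (const allow of policyFile.allowedPaths ?? []) forbidden.delete(allow);
    }
    // The per-run override is applied last.
    for (const allow of allowPathsOverride) forbidden.delete(allow);
    return [...forbidden];
}

// Exact pattern string: removed from the denylist.
const exact = resolveForbidden({ allowedPaths: [".github/workflows/**"] });
// A concrete file path is a different string, so the glob entry stays denied.
const narrow = resolveForbidden({ allowedPaths: [".github/workflows/release.yml"] });
console.log(exact.includes(".github/workflows/**"), narrow.includes(".github/workflows/**")); // false true
```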
package/dist/prompts.js
CHANGED
@@ -1,6 +1,7 @@
 import { readFile } from "node:fs/promises";
 import { fileURLToPath } from "node:url";
 import { dirname, join } from "node:path";
+import { renderForbiddenPathsBullet } from "./policy.js";
 const __dirname = dirname(fileURLToPath(import.meta.url));
 // Prompts ship with the runway package, NOT in the target repo's
 // .sandcastle/. Runway substitutes {{KEY}} placeholders before passing
@@ -22,13 +23,55 @@ export async function loadReviewPrompt() {
 export function renderPrompt(template, vars) {
     return template.replace(/\{\{(\w+)\}\}/g, (_, k) => vars[k] ?? `{{${k}}}`);
 }
-export function implementVars(issue) {
+export function implementVars(issue, opts = {}) {
     return {
         ISSUE_IDENTIFIER: issue.identifier,
         ISSUE_TITLE: issue.title,
         ISSUE_DESCRIPTION: issue.description || "(no description)",
+        // VA-349: empty for iteration 1, a structured summary for 2+.
+        PREVIOUS_ITERATIONS: opts.previousIterations ?? "",
+        // VA-352: render the working-style denylist from the active policy
+        // so the agent never sees a hardcoded list that diverges from what
+        // runway actually enforces.
+        POLICY_FORBIDDEN_BULLET: opts.policy
+            ? renderForbiddenPathsBullet(opts.policy)
+            : "",
     };
 }
+/**
+ * VA-349: build the "## Previous iterations" block that gets prepended
+ * to iteration N+1's prompt. Carries the agent's commit log and the
+ * tail of its final message so the next iteration doesn't re-explore
+ * the repo from scratch.
+ */
+export function buildIterationSummary(args) {
+    const { iterationsRun, commits, finalMessageTail } = args;
+    return [
+        "## Previous iterations",
+        "",
+        `You have already completed ${iterationsRun} iteration(s) on this issue.`,
+        "Do **not** re-explore the repository — pick up where the last iteration left off.",
+        "",
+        "### Commits so far on this branch",
+        "",
+        "```",
+        commits.trim() || "(no commits yet)",
+        "```",
+        "",
+        "### Tail of the last iteration's final message",
+        "",
+        "```",
+        finalMessageTail.trim() || "(no output captured)",
+        "```",
+        "",
+    ].join("\n");
+}
+/** Keep the tail of an iteration's stdout small enough to fit alongside the prompt. */
+export function tailOfMessage(stdout, maxChars = 2000) {
+    if (stdout.length <= maxChars)
+        return stdout;
+    return `…(earlier output truncated)\n${stdout.slice(-maxChars)}`;
+}
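Two behaviors of these helpers are easy to miss when reading the hunk: an empty-string variable still substitutes (only `null`/`undefined` leave the placeholder in place), and `tailOfMessage` keeps the end of the output, which is where the `IMPL:` marker lives. The functions below are restated from the diff; only the sample inputs are made up.

```javascript
// Restated from the hunk above so the example runs on its own.
function renderPrompt(template, vars) {
    return template.replace(/\{\{(\w+)\}\}/g, (_, k) => vars[k] ?? `{{${k}}}`);
}
function tailOfMessage(stdout, maxChars = 2000) {
    if (stdout.length <= maxChars) return stdout;
    return `…(earlier output truncated)\n${stdout.slice(-maxChars)}`;
}

// "" is not nullish, so the placeholder is replaced by nothing on iteration 1.
// A key that is absent from `vars` is left untouched in the output.
const rendered = renderPrompt("A:{{PREVIOUS_ITERATIONS}}|B:{{UNKNOWN}}", { PREVIOUS_ITERATIONS: "" });
console.log(rendered); // A:|B:{{UNKNOWN}}

// Truncation drops the head of the output and keeps the last maxChars characters.
const tail = tailOfMessage("x".repeat(10) + "IMPL: CONTINUE", 14);
console.log(tail.endsWith("IMPL: CONTINUE")); // true
```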
 export function reviewVars(args) {
     return {
         ISSUE_IDENTIFIER: args.issue.identifier,
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@valescoagency/runway",
-  "version": "0.
+  "version": "0.4.0",
   "description": "Linear-driven orchestrator + scaffolder for coding agents on Sandcastle. `runway init` scaffolds a target repo (sandcastle + varlock + 1Password); `runway run` drains a Linear queue against it; `runway doctor`, `runway upgrade`, `runway upgrade-repo` round out the lifecycle.",
   "license": "MIT",
   "author": {
@@ -42,6 +42,7 @@
     "@ai-hero/sandcastle": "^0.5.10",
     "@linear/sdk": "^41.0.0",
     "execa": "^9.5.2",
+    "yaml": "^2.9.0",
     "zod": "^3.23.8"
   },
   "devDependencies": {
package/prompts/implement.md
CHANGED
@@ -6,6 +6,8 @@ You are an autonomous coding agent working on a single Linear issue.

 {{ISSUE_DESCRIPTION}}

+{{PREVIOUS_ITERATIONS}}
+
 # Repository context

 You are operating inside a clean checkout of the target repository on a
@@ -29,9 +31,51 @@ fresh branch named `agent/{{ISSUE_IDENTIFIER}}`. Branch off `main`.
 - If the issue is ambiguous and you can't make a reasonable judgment
   call, stop and explain what's missing in your final message — runway
   will route to a human.
-
-  `pnpm-lock.yaml` (unless the task is a dep bump), or `.sandcastle/**`.
+{{POLICY_FORBIDDEN_BULLET}}

 # Stop conditions

 When all five "done" criteria pass, stop. Don't keep polishing.
+
+# Termination contract — REQUIRED
+
+End **every** response with exactly one of these markers, on its own
+line, as the **last non-empty line** of your message. Nothing after it.
+
+- `IMPL: DONE` — all five "done" criteria are met. The reviewer pass
+  will run next; no further iterations are needed.
+- `IMPL: BLOCKED — <one-line reason>` — you cannot proceed without
+  human input (issue is ambiguous, requires a decision outside the
+  agent's purview, conflicts with a working-style constraint, hits a
+  permission wall, etc.). Runway will route the issue to a human with
+  your reason attached and will not run the reviewer pass.
+- `IMPL: CONTINUE` — you made progress but the work isn't done yet.
+  Runway will run another iteration so you can pick up where you left
+  off.
+
+Examples:
+
+```
+…all tests pass, typecheck clean, lint clean. Commit pushed.
+
+IMPL: DONE
+```
+
+```
+…the issue's acceptance criteria require modifying
+`.github/workflows/release.yml`, which the working-style policy
+forbids. Cannot proceed.
+
+IMPL: BLOCKED — issue requires CI workflow changes that working-style policy forbids
+```
+
+```
+…added the migration and the RLS policy. Tests for the policy
+helper still need to be written next iteration.
+
+IMPL: CONTINUE
+```
+
+The marker is parsed mechanically by runway. A missing or malformed
+marker is treated as `CONTINUE` for back-compat, but **always** emit
+one explicitly — silent completions waste budget on re-exploration.
package/templates/Dockerfile.claude-code.base
CHANGED
@@ -39,6 +39,30 @@ RUN if ! getent group $AGENT_GID >/dev/null; then \
         groupmod -g $AGENT_GID node; \
     fi \
     && usermod -u $AGENT_UID -g $AGENT_GID -d /home/agent -m -l agent node
+
+# VA-351: bake the container env up front so agents don't manually
+# work around host-path leaks, missing pnpm, or unset HOME on every
+# iteration. Without these, every agent run repeats the same
+# corepack/TURBO_CACHE_DIR/HOME setup commands — see VA-312's run log
+# for the receipts.
+ENV HOME=/home/agent
+ENV XDG_CACHE_HOME=/home/agent/.cache
+ENV TURBO_CACHE_DIR=/tmp/turbo-cache
+ENV npm_config_cache=/home/agent/.cache/npm
+
+# Pre-create cache dirs with agent ownership so the first pnpm/turbo
+# run doesn't have to chown them. Both are inside paths the agent owns
+# anyway; this just makes them exist.
+RUN mkdir -p /home/agent/.cache /home/agent/.cache/npm /tmp/turbo-cache \
+    && chown -R $AGENT_UID:$AGENT_GID /home/agent/.cache /tmp/turbo-cache
+
+# Bake pnpm via corepack at build time so `pnpm` is on PATH inside the
+# container before any agent command runs. Pin a default; target repos
+# can override at runtime via `packageManager` in package.json +
+# `corepack use`.
+RUN corepack enable \
+    && corepack prepare pnpm@10.0.0 --activate
+
 USER ${AGENT_UID}:${AGENT_GID}

 # Install Claude Code CLI