nubos-pilot 0.9.4 → 0.9.6
- package/agents/np-critic-acceptance.md +4 -0
- package/agents/np-critic-style.md +4 -0
- package/agents/np-critic-tests.md +4 -0
- package/bin/np-tools/commit-task.cjs +37 -10
- package/bin/np-tools/commit-task.test.cjs +117 -5
- package/bin/np-tools/loop-audit-tool-use.cjs +27 -10
- package/bin/np-tools/loop-commands.test.cjs +266 -0
- package/bin/np-tools/loop-run-round.cjs +81 -0
- package/docs/adr/0010-nubosloop.md +34 -0
- package/lib/nubosloop.cjs +51 -0
- package/package.json +1 -1
- package/workflows/execute-phase.md +23 -4
@@ -26,6 +26,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-acceptance --tool-use-log <json>` after you emit your findings JSON. This is the orchestrator's responsibility, not yours — but if you observe (in the verify output or task summary) that a prior round's critic-schwarm completed without an audit stamp, surface that as a finding of category `locked-decision-violation` because it indicates a bypass of ADR-0010 Layer C. The post-critics gate (`loop-run-round --phase post-critics`) refuses without the three critic stamps; missing your stamp blocks the entire round.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
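The three-stamp requirement described in the added section can be sketched as a small check. This is a minimal illustration, not the package's code: the function name `missingCriticStamps` and the idea of passing a flat list of stamped agent names are assumptions made for the example; only the three agent names come from the diff.

```javascript
// Hypothetical sketch: the helper name and input shape are assumptions.
// The three agent names are the critics named in this diff.
const REQUIRED_CRITICS = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Given the agents that were stamped for a round, report which critic
// stamps are still missing. The gate would refuse while `missing` is non-empty.
function missingCriticStamps(stampedAgents) {
  const seen = new Set(stampedAgents);
  const missing = REQUIRED_CRITICS.filter((agent) => !seen.has(agent));
  return { satisfied: missing.length === 0, missing };
}

// Two of three critics stamped: the round stays blocked.
console.log(missingCriticStamps(['np-critic-style', 'np-critic-tests']));
```

A single missing stamp is enough to block the round, which is why each critic's prompt carries the same notice.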
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-style --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-tests --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
@@ -11,27 +11,53 @@ const { deleteCheckpoint, readCheckpoint } = require('../../lib/checkpoint.cjs')
 
 const BYPASS_FLAG = '--bypass-nubosloop';
 
+// Evidence-based gate: a complete Nubosloop run accumulates fields on the
+// checkpoint envelope (cache_hit from preflight, verify_exit_code from
+// post-executor, findings from post-critics, committed_at from commit). A
+// gamed run that only invokes `loop-run-round --phase commit` directly leaves
+// verify_exit_code and findings undefined. Checking last_phase alone is not
+// enough — we require the cumulative signature.
 function _assertLoopGate(taskId, cwd, bypass, stderr) {
   const cp = readCheckpoint(taskId, cwd);
-  const
-
-  const
-
+  const np = (cp && cp.nubosloop) || null;
+  const last = np && np.last_phase;
+  const checks = [
+    { ok: !!cp, reason: 'no-checkpoint', missing: 'checkpoint', observed: 'no-checkpoint' },
+    { ok: last === 'commit', reason: 'last-phase-mismatch', missing: 'last_phase=commit', observed: last || 'none' },
+    { ok: np && np.verify_exit_code === 0, reason: 'post-executor-not-green', missing: 'verify_exit_code=0', observed: np && np.verify_exit_code !== undefined ? String(np.verify_exit_code) : 'undefined' },
+    { ok: np && Array.isArray(np.findings), reason: 'post-critics-missing', missing: 'findings (array)', observed: np && np.findings !== undefined ? JSON.stringify(np.findings).slice(0, 60) : 'undefined' },
+    { ok: np && !!np.committed_at, reason: 'commit-phase-not-stamped', missing: 'committed_at', observed: (np && np.committed_at) || 'undefined' },
+  ];
+  const failed = checks.find((c) => !c.ok);
+  if (!failed) {
+    return { bypassed: false, last_phase: last, forced_commit_phase: !!(np && np.forced_commit_phase) };
+  }
   if (bypass) {
     stderr.write(
       '[nubos-pilot] WARNING: commit-task ' + taskId +
-      ' bypassing Nubosloop gate (' + BYPASS_FLAG +
+      ' bypassing Nubosloop gate (' + BYPASS_FLAG +
+      '; reason=' + failed.reason + '; missing=' + failed.missing +
+      '; observed=' + failed.observed +
       '). Single-pass commit, no critic review enforced.\n',
     );
-    return { bypassed: true, last_phase: last || null };
+    return { bypassed: true, last_phase: last || null, forced_commit_phase: !!(np && np.forced_commit_phase) };
   }
   throw new NubosPilotError(
     'commit-task-loop-bypass-violation',
-    'commit-task refused: Nubosloop
-    ' (
-    '
+    'commit-task refused: Nubosloop sequence incomplete for ' + taskId +
+    ' (reason=' + failed.reason + '; missing=' + failed.missing +
+    '; observed=' + failed.observed + '). ' +
+    'Run the full loop (preflight → post-executor verify-green → post-critics → commit) first, or pass ' + BYPASS_FLAG +
     ' for an explicit single-pass override.',
-    {
+    {
+      taskId,
+      reason: failed.reason,
+      missing: failed.missing,
+      observed_last_phase: last || null,
+      observed_verify_exit_code: np && np.verify_exit_code !== undefined ? np.verify_exit_code : null,
+      observed_findings_is_array: !!(np && Array.isArray(np.findings)),
+      observed_committed_at: (np && np.committed_at) || null,
+    },
   );
 }
 
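To make the ordering of the evidence checks concrete, here is a reduced sketch that runs the same sequence over a plain object instead of a checkpoint file. The function name `firstMissingEvidence` is hypothetical; the field names and reason strings are copied from the hunk above, and everything else (no file I/O, no bypass handling) is a simplification for illustration.

```javascript
// Hypothetical reduction of the gate: same check order and reason strings as
// the diff, but operating on an in-memory object. Not the package's function.
function firstMissingEvidence(cp) {
  const np = (cp && cp.nubosloop) || null;
  const checks = [
    { ok: !!cp, reason: 'no-checkpoint' },
    { ok: !!np && np.last_phase === 'commit', reason: 'last-phase-mismatch' },
    { ok: !!np && np.verify_exit_code === 0, reason: 'post-executor-not-green' },
    { ok: !!np && Array.isArray(np.findings), reason: 'post-critics-missing' },
    { ok: !!np && !!np.committed_at, reason: 'commit-phase-not-stamped' },
  ];
  // The first failing check wins, so earlier phases are reported first.
  const failed = checks.find((c) => !c.ok);
  return failed ? failed.reason : null;
}

// A "gamed" envelope: commit was stamped, but verify and critics never ran.
const gamed = { nubosloop: { last_phase: 'commit', committed_at: '2026-05-04T12:00:00Z' } };
console.log(firstMissingEvidence(gamed)); // 'post-executor-not-green'
```

Because the list is evaluated in order, an envelope that lacks both `verify_exit_code` and `findings` is reported as `post-executor-not-green`, which matches what tests CT-12 and CT-15 in this diff expect.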
@@ -141,6 +167,7 @@ function run(args, ctx) {
     files: safeFiles,
     files_source: filesSource,
     nubosloop_bypassed: gate.bypassed,
+    nubosloop_forced_commit_phase: !!gate.forced_commit_phase,
   };
   stdout.write(JSON.stringify(payload));
   return payload;
@@ -82,9 +82,10 @@ function _capture() {
   return { stub, get: () => buf };
 }
 
-// Seed a checkpoint that satisfies the Nubosloop gate (
-//
-//
+// Seed a checkpoint that satisfies the full Nubosloop gate (sequence-integrity).
+// A real loop accumulates evidence on the envelope; the gate refuses unless
+// every required marker is present. Tests that exercise game-paths build their
+// own partial fixtures.
 function seedLoopReadyCheckpoint(root, taskId, extra) {
   const cpPath = path.join(root, '.nubos-pilot', 'checkpoints', taskId + '.json');
   fs.mkdirSync(path.dirname(cpPath), { recursive: true });
@@ -94,12 +95,21 @@ function seedLoopReadyCheckpoint(root, taskId, extra) {
     status: 'pre-commit',
     files_touched: [],
     nubosloop: {
+      round: 1,
+      cache_hit: false,
       last_phase: 'commit',
       last_action: 'commit',
+      verify_exit_code: 0,
+      findings: [],
       committed_at: '2026-05-04T12:00:00Z',
     },
   };
-
+  // Allow the test to override individual fields (incl. nubosloop sub-fields).
+  const merged = Object.assign({}, base, extra || {});
+  if (extra && extra.nubosloop) {
+    merged.nubosloop = Object.assign({}, base.nubosloop, extra.nubosloop);
+  }
+  fs.writeFileSync(cpPath, JSON.stringify(merged), 'utf-8');
   return cpPath;
 }
 
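The two-step merge in the fixture helper above matters because `Object.assign` is shallow: a test that overrides a single `nubosloop` field would otherwise discard the rest of the seeded evidence. The sketch below isolates that merge; `mergeFixture` is a hypothetical name and the base object is trimmed for the example.

```javascript
// Hypothetical extraction of the merge step from the fixture helper.
function mergeFixture(base, extra) {
  // Shallow merge first: top-level keys in `extra` replace those in `base`.
  const merged = Object.assign({}, base, extra || {});
  // Then merge one level deeper for `nubosloop`, so a partial override
  // keeps the remaining seeded fields instead of replacing the whole object.
  if (extra && extra.nubosloop) {
    merged.nubosloop = Object.assign({}, base.nubosloop, extra.nubosloop);
  }
  return merged;
}

const base = {
  status: 'pre-commit',
  nubosloop: { last_phase: 'commit', verify_exit_code: 0, findings: [] },
};

// Override only the verify result, as test CT-14 does.
const merged = mergeFixture(base, { nubosloop: { verify_exit_code: 1 } });
console.log(merged.nubosloop);
// { last_phase: 'commit', verify_exit_code: 1, findings: [] }
```

With a plain shallow merge, `merged.nubosloop` would be just `{ verify_exit_code: 1 }` and the gate would fail on `last_phase` before it ever looked at the verify result.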
@@ -249,10 +259,112 @@ test('CT-9: refuse commit when nubosloop.last_phase ≠ commit', () => {
     () => subcmd.run(['M006-S001-T0021'], { cwd: root, stdout: cap.stub, stderr: stderr.stub }),
     (err) => err && err.code === 'commit-task-loop-bypass-violation'
       && err.details && err.details.reason === 'last-phase-mismatch'
-      && err.details.
+      && err.details.observed_last_phase === 'verifying',
+  );
+});
+
+test('CT-12: refuse gamed commit (last_phase=commit but no verify_exit_code)', () => {
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0030', ['src/g.ts']);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'g.ts'), 'export const g = 7;\n', 'utf-8');
+  // Simulates an agent that ran ONLY `loop-run-round --phase commit` to game
+  // the gate, without going through preflight/post-executor/post-critics.
+  // verify_exit_code is undefined → post-executor never ran.
+  const cpPath = path.join(root, '.nubos-pilot', 'checkpoints', 'M006-S001-T0030.json');
+  fs.mkdirSync(path.dirname(cpPath), { recursive: true });
+  fs.writeFileSync(cpPath, JSON.stringify({
+    schema_version: 1,
+    task_id: 'M006-S001-T0030',
+    status: 'pre-commit',
+    files_touched: [],
+    nubosloop: { last_phase: 'commit', last_action: 'commit', committed_at: '2026-05-04T12:00:00Z' },
+  }), 'utf-8');
+  const cap = _capture();
+  const stderr = _capture();
+  assert.throws(
+    () => subcmd.run(['M006-S001-T0030'], { cwd: root, stdout: cap.stub, stderr: stderr.stub }),
+    (err) => err && err.code === 'commit-task-loop-bypass-violation'
+      && err.details && err.details.reason === 'post-executor-not-green',
+  );
+});
+
+test('CT-13: refuse gamed commit when verify ran but post-critics findings missing', () => {
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0031', ['src/h.ts']);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'h.ts'), 'export const h = 8;\n', 'utf-8');
+  // verify ran (exit_code=0) but critics never produced findings — agent
+  // skipped the critic-schwarm step.
+  const cpPath = path.join(root, '.nubos-pilot', 'checkpoints', 'M006-S001-T0031.json');
+  fs.mkdirSync(path.dirname(cpPath), { recursive: true });
+  fs.writeFileSync(cpPath, JSON.stringify({
+    schema_version: 1,
+    task_id: 'M006-S001-T0031',
+    status: 'pre-commit',
+    files_touched: [],
+    nubosloop: {
+      last_phase: 'commit', last_action: 'commit',
+      verify_exit_code: 0, // post-executor ran
+      committed_at: '2026-05-04T12:00:00Z',
+      // findings: missing → post-critics never ran
+    },
+  }), 'utf-8');
+  const cap = _capture();
+  const stderr = _capture();
+  assert.throws(
+    () => subcmd.run(['M006-S001-T0031'], { cwd: root, stdout: cap.stub, stderr: stderr.stub }),
+    (err) => err && err.code === 'commit-task-loop-bypass-violation'
+      && err.details && err.details.reason === 'post-critics-missing',
+  );
+});
+
+test('CT-14: refuse when verify-red was recorded (post-executor failed)', () => {
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0032', ['src/i.ts']);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'i.ts'), 'export const i = 9;\n', 'utf-8');
+  // Loop reached commit-stamp somehow but verify was red — must refuse.
+  seedLoopReadyCheckpoint(root, 'M006-S001-T0032', {
+    nubosloop: { verify_exit_code: 1 },
+  });
+  const cap = _capture();
+  const stderr = _capture();
+  assert.throws(
+    () => subcmd.run(['M006-S001-T0032'], { cwd: root, stdout: cap.stub, stderr: stderr.stub }),
+    (err) => err && err.code === 'commit-task-loop-bypass-violation'
+      && err.details && err.details.reason === 'post-executor-not-green'
+      && err.details.observed_verify_exit_code === 1,
   );
 });
 
+test('CT-15: bypass on gamed commit logs precise reason in stderr', () => {
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0033', ['src/j.ts']);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'j.ts'), 'export const j = 10;\n', 'utf-8');
+  const cpPath = path.join(root, '.nubos-pilot', 'checkpoints', 'M006-S001-T0033.json');
+  fs.mkdirSync(path.dirname(cpPath), { recursive: true });
+  fs.writeFileSync(cpPath, JSON.stringify({
+    schema_version: 1, task_id: 'M006-S001-T0033', status: 'pre-commit', files_touched: [],
+    nubosloop: { last_phase: 'commit', last_action: 'commit', committed_at: 'z' },
+  }), 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  const cap = _capture();
+  const stderr = _capture();
+  try {
+    subcmd.run(['M006-S001-T0033', '--bypass-nubosloop'], { cwd: root, stdout: cap.stub, stderr: stderr.stub });
+  } finally {
+    process.chdir(prev);
+  }
+  const payload = JSON.parse(cap.get());
+  assert.equal(payload.ok, true);
+  assert.equal(payload.nubosloop_bypassed, true);
+  assert.match(stderr.get(), /reason=post-executor-not-green/);
+  assert.match(stderr.get(), /missing=verify_exit_code=0/);
+});
+
 test('CT-10: --bypass-nubosloop allows single-pass commit and warns on stderr', () => {
   const root = makeRepo();
   seedPlanAndTask(root, '06-01', 'M006-S001-T0022', ['src/e.ts']);
@@ -31,18 +31,35 @@ function run(argv, ctx) {
       { hint: 'agents requiring search tools: ' + nubosloop.AUDITED_AGENTS.join(', ') },
     );
   }
-
-
-
-
-
-  );
-
+  // --tool-use-log is required for AUDITED_AGENTS (Rule 9 enforcement reads
+  // the tool list to verify search-knowledge / match-existing-learning calls).
+  // For non-audited spawns (critics, plan-checker, etc.) the orchestrator may
+  // omit it — we still record the spawn for Layer-C audit-trail evidence with
+  // an empty log. Explicit empty-array is also accepted.
+  const isAuditedAgent = nubosloop.AUDITED_AGENTS.includes(agent);
+  let log;
+  if (tail.includes('--tool-use-log')) {
+    log = args.getJsonFlag(
+      tail,
+      '--tool-use-log',
+      'loop-audit-missing-log',
+      "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
+    );
+    if (!Array.isArray(log)) {
+      throw new (require('../../lib/core.cjs').NubosPilotError)(
+        'loop-audit-invalid-log',
+        '--tool-use-log must be a JSON array',
+        { got: typeof log },
+      );
+    }
+  } else if (isAuditedAgent) {
     throw new (require('../../lib/core.cjs').NubosPilotError)(
-      'loop-audit-
-      '--tool-use-log
-      {
+      'loop-audit-missing-log',
+      'loop-audit-tool-use requires --tool-use-log for audited agent: ' + agent,
+      { hint: 'audited agents drive Rule 9 enforcement; pass --tool-use-log \'[]\' to record an empty spawn' },
     );
+  } else {
+    log = [];
   }
   const result = nubosloop.auditToolUse(taskId, agent, log, cwd);
   const payload = { task_id: taskId, ...result };
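The three-way branch on `--tool-use-log` above can be read as a small decision function. The sketch below is an assumption-laden restatement, not the tool's code: `resolveToolUseLog` and `codedError` are hypothetical names, the `AUDITED_AGENTS` contents are inferred from a test comment elsewhere in this diff, and a plain `Error` with a `code` property stands in for `NubosPilotError`.

```javascript
// Assumption: these are the audited agents (inferred from a test comment in
// this diff); the real list lives in lib/nubosloop.cjs.
const AUDITED_AGENTS = ['np-executor', 'np-build-fixer'];

// Stand-in for NubosPilotError: a plain Error carrying a machine-readable code.
function codedError(code, message) {
  const err = new Error(message);
  err.code = code;
  return err;
}

// `rawLog` is the parsed flag value, or undefined when the flag was omitted.
function resolveToolUseLog(agent, rawLog) {
  if (rawLog !== undefined) {
    // Flag present: it must be an array, even for non-audited agents.
    if (!Array.isArray(rawLog)) {
      throw codedError('loop-audit-invalid-log', '--tool-use-log must be a JSON array');
    }
    return rawLog;
  }
  if (AUDITED_AGENTS.includes(agent)) {
    // Flag omitted for an audited agent: refuse, since Rule 9 needs the list.
    throw codedError('loop-audit-missing-log', 'missing --tool-use-log for ' + agent);
  }
  // Flag omitted for a critic or similar: record the spawn with an empty log.
  return [];
}

console.log(resolveToolUseLog('np-critic-style', undefined)); // []
```

The practical effect is that critics can be stamped with a bare `loop-audit-tool-use --agent <name>` call, while executor-type agents still have to declare their tool list.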
@@ -349,9 +349,25 @@ test('LCLI-RR-2: loop-run-round preflight on populated store → spawn-executor-
   assert.ok(out.cache_hit);
 });
 
+// Helper: seed the per-round spawn-evidence audit log so Layer-C gates accept
+// post-executor / post-critics. Tests that exercise the gate explicitly
+// (LCLI-RR-12+) build their own partial fixtures.
+function _seedSpawnEvidence(taskId, round, agents, cwd) {
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState(taskId, { round }, cwd);
+  for (const a of agents) {
+    // Pass an empty tool-use log — these are evidence stamps, not Rule 9 audits.
+    // For AUDITED_AGENTS in this test (np-executor / np-build-fixer) we need to
+    // pass a valid search-tool to avoid generating a rule-9-violation finding.
+    const log = nubosloop.AUDITED_AGENTS.includes(a) ? ['search-knowledge'] : [];
+    nubosloop.auditToolUse(taskId, a, log, cwd);
+  }
+}
+
 test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-critic-schwarm', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -366,6 +382,7 @@ test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-
 test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-build-fixer', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -380,6 +397,8 @@ test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-bu
 test('LCLI-RR-5: loop-run-round phase=post-critics with zero findings → commit', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -399,6 +418,10 @@ test('LCLI-RR-5b: post-critics surfaces rule-9-violation from audit log even wit
   // Round 1, executor shipped without searching → audit captures violation
   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read', 'Edit'], r);
+  // Seed the three critic spawn evidences so the Layer-C gate is satisfied —
+  // we want the rule-9-violation to surface from the audit log, not the gate.
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   // Critics return zero findings (style/tests/acceptance all clean) — without
   // the Rule 9 chain the loop would commit. With it, the audit violation must
   // still route the round to executor.
@@ -428,6 +451,9 @@ test('LCLI-RR-5c: post-critics scopes audit findings to current round only', ()
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read'], r);
   nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-build-fixer', ['search-knowledge'], r);
+  // Seed critic-spawn evidence for round 2 so the Layer-C gate is satisfied.
+  _seedSpawnEvidence('M001-S001-T0001', 2,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
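The round-2 seeding in the hunk above is needed because spawn evidence appears to be looked up per round: a stamp recorded in round 1 does not satisfy a gate evaluated in round 2. A minimal sketch of such a lookup follows. The entry shape `{ agent, round }` and the name `findStampForRound` are assumptions for illustration; the package's own `findSpawnAuditForRound` reads from the checkpoint and may store more fields.

```javascript
// Assumed entry shape: { agent: string, round: number }. Hypothetical helper.
function findStampForRound(entries, agent, round) {
  // Both the agent and the round must match; stamps from other rounds are ignored.
  return entries.find((e) => e.agent === agent && e.round === round) || null;
}

const entries = [
  { agent: 'np-critic-style', round: 1 },
  { agent: 'np-build-fixer', round: 2 },
];

console.log(findStampForRound(entries, 'np-critic-style', 1)); // { agent: 'np-critic-style', round: 1 }
console.log(findStampForRound(entries, 'np-critic-style', 2)); // null
```

Scoping by round means every retry has to re-run and re-stamp its critics, so an earlier round's review cannot be reused to pass a later one.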
@@ -467,6 +493,246 @@ test('LCLI-RR-7: loop-run-round rejects unknown --phase', () => {
   );
 });
 
+test('LCLI-RR-8: phase=commit refuses without verify_exit_code (post-executor never ran)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'commit'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-commit-precondition-missing'
+      && err.details && err.details.missing === 'verify_exit_code',
+  );
+});
+
+test('LCLI-RR-9: phase=commit refuses without findings (post-critics never ran)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  // post-executor ran (verify-green) but critics never produced findings.
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  checkpoint.mergeCheckpoint('M001-S001-T0001', (cur) => ({
+    nubosloop: Object.assign({}, (cur && cur.nubosloop) || {}, { verify_exit_code: 0 }),
+  }), r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'commit'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-commit-precondition-missing'
+      && err.details && err.details.missing === 'findings',
+  );
+});
+
+test('LCLI-RR-10: phase=commit accepts complete loop state (verify-green + findings array)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  checkpoint.mergeCheckpoint('M001-S001-T0001', (cur) => ({
+    nubosloop: Object.assign({}, (cur && cur.nubosloop) || {}, {
+      verify_exit_code: 0,
+      findings: [],
+    }),
+  }), r);
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(['M001-S001-T0001', '--phase', 'commit'], { cwd: r, stdout: cap.stub });
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit-task');
+  assert.equal(out.forced, false);
+  const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
+  assert.equal(cp.nubosloop.last_phase, 'commit');
+  assert.equal(cp.nubosloop.forced_commit_phase, false);
+});
+
+test('LCLI-RR-11: phase=commit --force-commit-phase bypasses preconditions and stamps the override', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Empty checkpoint — no verify, no findings. Force should still allow.
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'commit', '--force-commit-phase'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit-task');
+  assert.equal(out.forced, true);
+  const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
+  assert.equal(cp.nubosloop.forced_commit_phase, true);
+});
+
+// Layer C — audit-trail evidence enforcement -------------------------------
+
+test('LCLI-RR-12: post-executor refuses without np-executor audit (R1)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Round defaults to 1 with no audit entries.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && Array.isArray(err.details && err.details.missing)
+      && err.details.missing.includes('np-executor')
+      && err.details.round === 1,
+  );
+});
+
+test('LCLI-RR-13: post-executor refuses on R1 if only np-build-fixer was audited (wrong agent)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-build-fixer'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-executor'),
+  );
+});
+
+test('LCLI-RR-14: post-executor on R≥2 requires np-build-fixer audit, not np-executor', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Advance to round 2; audit only the wrong agent (np-executor).
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['search-knowledge'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-build-fixer')
+      && err.details.round === 2,
+  );
+});
+
+test('LCLI-RR-15: post-critics refuses without any critic audit (synthetic-JSON bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
+  // No critic-spawn audit → gate must refuse even if --critic-outputs is valid.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && Array.isArray(err.details.missing)
+      && err.details.missing.length === 3,
+  );
+});
+
+test('LCLI-RR-16: post-critics refuses with only 2 of 3 critic audits (partial bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests'], r); // missing acceptance
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && err.details.missing.length === 1
+      && err.details.missing[0] === 'np-critic-acceptance',
+  );
+});
+
+test('LCLI-RR-17: --force-post-executor bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // No audit entries; force flag must let us through.
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0', '--force-post-executor'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'spawn-critic-schwarm');
+});
+
+test('LCLI-RR-18: --force-post-critics bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r); // executor audited, critics not
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+      '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]',
+      '--force-post-critics'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit');
+});
+
+test('LCLI-RR-19: assertSpawnsAuditedForRound returns ordered missing list', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  const v = nubosloop.assertSpawnsAuditedForRound(
+    'M001-S001-T0001', nubosloop.POST_CRITICS_EVIDENCE, 1, r,
+  );
+  assert.equal(v.satisfied, false);
+  assert.deepEqual(v.missing, ['np-critic-tests', 'np-critic-acceptance']);
+});
+
+test('LCLI-RR-20: findSpawnAuditForRound is round-scoped (round-1 audit not visible from round-2)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+  assert.equal(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 2, r), null);
+});
+
+test('LCLI-RR-21: loop-audit-tool-use accepts critics without --tool-use-log (records empty spawn)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  const loopAudit = require('./loop-audit-tool-use.cjs');
+  const cap = _cap();
|
|
718
|
+
loopAudit.run(['M001-S001-T0001', '--agent', 'np-critic-style'], { cwd: r, stdout: cap.stub });
|
|
719
|
+
const out = JSON.parse(cap.get());
|
|
720
|
+
assert.equal(out.agent, 'np-critic-style');
|
|
721
|
+
assert.equal(out.violation, null); // critics aren't audited for Rule 9
|
|
722
|
+
// The audit log must still record the spawn so Layer C can find it.
|
|
723
|
+
assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
|
|
724
|
+
});
|
|
725
|
+
|
|
726
|
+
test('LCLI-RR-22: loop-audit-tool-use still REQUIRES --tool-use-log for AUDITED_AGENTS', () => {
|
|
727
|
+
const r = _mkRoot();
|
|
728
|
+
checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
|
|
729
|
+
const loopAudit = require('./loop-audit-tool-use.cjs');
|
|
730
|
+
assert.throws(
|
|
731
|
+
() => loopAudit.run(['M001-S001-T0001', '--agent', 'np-executor'], { cwd: r, stdout: _cap().stub }),
|
|
732
|
+
(err) => err && err.code === 'loop-audit-missing-log',
|
|
733
|
+
);
|
|
734
|
+
});
|
|
735
|
+
|
|
470
736
|
test('LCLI-22: learning-match queries the local store', () => {
|
|
471
737
|
const r = _mkRoot();
|
|
472
738
|
const lr = require('../../lib/learnings.cjs');
|
package/bin/np-tools/loop-run-round.cjs
CHANGED
|
@@ -81,6 +81,27 @@ function _runPostExecutor(taskId, list, cwd) {
|
|
|
81
81
|
{ hint: 'pass the exit code of the task verify command' },
|
|
82
82
|
);
|
|
83
83
|
}
|
|
84
|
+
// Layer C: audit-trail enforcement — refuse if no executor spawn was
|
|
85
|
+
// recorded for this round via `loop-audit-tool-use`. This blocks the
|
|
86
|
+
// bypass where an orchestrator stamps verify-green without actually
|
|
87
|
+
// spawning np-executor / np-build-fixer.
|
|
88
|
+
const force = list.includes('--force-post-executor');
|
|
89
|
+
if (!force) {
|
|
90
|
+
const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
|
|
91
|
+
const round = Number((cur.nubosloop && cur.nubosloop.round)) || 1;
|
|
92
|
+
const required = round === 1 ? nubosloop.POST_EXECUTOR_EVIDENCE_R1 : nubosloop.POST_EXECUTOR_EVIDENCE_RN;
|
|
93
|
+
const verdict = nubosloop.assertSpawnsAuditedForRound(taskId, required, round, cwd);
|
|
94
|
+
if (!verdict.satisfied) {
|
|
95
|
+
throw new NubosPilotError(
|
|
96
|
+
'loop-post-executor-missing-spawn-audit',
|
|
97
|
+
'phase=post-executor refused: no `loop-audit-tool-use` record found for round=' + round +
|
|
98
|
+
', agent=' + verdict.missing.join('/') + ' on ' + taskId + '. ' +
|
|
99
|
+
'Spawn the executor/build-fixer agent and call `loop-audit-tool-use ' + taskId +
|
|
100
|
+
' --agent <name> --tool-use-log <json>` first, or pass --force-post-executor for an explicit override.',
|
|
101
|
+
{ taskId, round, missing: verdict.missing.slice(), required: required.slice() },
|
|
102
|
+
);
|
|
103
|
+
}
|
|
104
|
+
}
|
|
84
105
|
const code = Number(verifyExitCode);
|
|
85
106
|
const verifyOutputPath = args.getFlag(list, '--verify-output-path');
|
|
86
107
|
let verifyOutput = '';
|
|
@@ -132,6 +153,27 @@ function _runPostCritics(taskId, list, cwd) {
|
|
|
132
153
|
const pb = cp.nubosloop || {};
|
|
133
154
|
return Number(pb.round) || 1;
|
|
134
155
|
})();
|
|
156
|
+
// Layer C: audit-trail enforcement — refuse if the three critic spawns
|
|
157
|
+
// (style/tests/acceptance) are not present in the audit log for this round.
|
|
158
|
+
// This blocks the bypass where an orchestrator hand-writes synthetic
|
|
159
|
+
// critic-output JSON without actually spawning the critic agents.
|
|
160
|
+
const force = list.includes('--force-post-critics');
|
|
161
|
+
if (!force) {
|
|
162
|
+
const verdict = nubosloop.assertSpawnsAuditedForRound(
|
|
163
|
+
taskId, nubosloop.POST_CRITICS_EVIDENCE, round, cwd,
|
|
164
|
+
);
|
|
165
|
+
if (!verdict.satisfied) {
|
|
166
|
+
throw new NubosPilotError(
|
|
167
|
+
'loop-post-critics-missing-critic-audit',
|
|
168
|
+
'phase=post-critics refused: critic-schwarm spawn-evidence missing for round=' + round +
|
|
169
|
+
' on ' + taskId + ' (missing audits: ' + verdict.missing.join(', ') + '). ' +
|
|
170
|
+
'For each critic agent, call `loop-audit-tool-use ' + taskId +
|
|
171
|
+
' --agent <np-critic-style|np-critic-tests|np-critic-acceptance> --tool-use-log <json>` ' +
|
|
172
|
+
'after the spawn, then re-run --phase post-critics. Pass --force-post-critics for an explicit override.',
|
|
173
|
+
{ taskId, round, missing: verdict.missing.slice(), required: nubosloop.POST_CRITICS_EVIDENCE.slice() },
|
|
174
|
+
);
|
|
175
|
+
}
|
|
176
|
+
}
|
|
135
177
|
const opts = nubosloop.resolveLoopOpts(cwd);
|
|
136
178
|
// Rule 9 chain: convert this round's audit violations into rule-9-violation
|
|
137
179
|
// findings so they participate in routing alongside critic findings.
|
|
@@ -164,6 +206,43 @@ function _runPostCritics(taskId, list, cwd) {
|
|
|
164
206
|
}
|
|
165
207
|
|
|
166
208
|
function _runCommit(taskId, list, cwd) {
|
|
209
|
+
// Sequence-integrity guard: the commit phase MUST follow a complete loop run.
|
|
210
|
+
// Stamping last_phase='commit' is what unlocks commit-task, so without this
|
|
211
|
+
// check an agent could shell out `loop-run-round --phase commit` directly,
|
|
212
|
+
// skip preflight/executor/critics, and bypass the entire Nubosloop. The
|
|
213
|
+
// commit-task gate then sees a satisfied last_phase and lets the commit
|
|
214
|
+
// through. Defense-in-depth: refuse here AND in commit-task.
|
|
215
|
+
//
|
|
216
|
+
// Required evidence on the checkpoint envelope:
|
|
217
|
+
// - verify_exit_code === 0 → post-executor ran AND verify was green
|
|
218
|
+
// - findings is an array → post-critics ran (empty array = passed)
|
|
219
|
+
//
|
|
220
|
+
// Bypass for legitimate test fixtures / migration: --force-commit-phase.
|
|
221
|
+
const force = list.includes('--force-commit-phase');
|
|
222
|
+
if (!force) {
|
|
223
|
+
const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
|
|
224
|
+
const np = (cur && cur.nubosloop) || {};
|
|
225
|
+
if (np.verify_exit_code !== 0) {
|
|
226
|
+
throw new NubosPilotError(
|
|
227
|
+
'loop-commit-precondition-missing',
|
|
228
|
+
'phase=commit refused: post-executor did not record a verify-green run for ' + taskId +
|
|
229
|
+
' (observed verify_exit_code=' + (np.verify_exit_code === undefined ? 'undefined' : np.verify_exit_code) + '). ' +
|
|
230
|
+
'Run `loop-run-round ' + taskId + ' --phase post-executor --verify-exit-code 0 --verify-output-path ...` first, ' +
|
|
231
|
+
'or pass --force-commit-phase for an explicit override.',
|
|
232
|
+
{ taskId, missing: 'verify_exit_code', observed: np.verify_exit_code === undefined ? null : np.verify_exit_code },
|
|
233
|
+
);
|
|
234
|
+
}
|
|
235
|
+
if (!Array.isArray(np.findings)) {
|
|
236
|
+
throw new NubosPilotError(
|
|
237
|
+
'loop-commit-precondition-missing',
|
|
238
|
+
'phase=commit refused: post-critics did not produce a findings array for ' + taskId +
|
|
239
|
+
' (observed findings=' + (np.findings === undefined ? 'undefined' : JSON.stringify(np.findings)) + '). ' +
|
|
240
|
+
'Run `loop-run-round ' + taskId + ' --phase post-critics --critic-outputs <json>` first, ' +
|
|
241
|
+
'or pass --force-commit-phase for an explicit override.',
|
|
242
|
+
{ taskId, missing: 'findings', observed: np.findings === undefined ? null : np.findings },
|
|
243
|
+
);
|
|
244
|
+
}
|
|
245
|
+
}
|
|
167
246
|
const pattern = args.getFlag(list, '--learning-pattern') || null;
|
|
168
247
|
const outcome = args.getFlag(list, '--learning-outcome') || 'verified';
|
|
169
248
|
let logged = null;
|
|
@@ -184,6 +263,7 @@ function _runCommit(taskId, list, cwd) {
|
|
|
184
263
|
last_phase: 'commit',
|
|
185
264
|
last_action: 'commit',
|
|
186
265
|
committed_at: new Date().toISOString(),
|
|
266
|
+
forced_commit_phase: force ? true : (cur && cur.nubosloop && cur.nubosloop.forced_commit_phase) || false,
|
|
187
267
|
}),
|
|
188
268
|
}),
|
|
189
269
|
cwd,
|
|
@@ -192,6 +272,7 @@ function _runCommit(taskId, list, cwd) {
|
|
|
192
272
|
phase: 'commit',
|
|
193
273
|
next_action: 'commit-task',
|
|
194
274
|
learning_logged: logged,
|
|
275
|
+
forced: force,
|
|
195
276
|
};
|
|
196
277
|
}
|
|
197
278
|
|
package/docs/adr/0010-nubosloop.md
CHANGED
|
@@ -77,6 +77,39 @@ When `loop.maxRounds` is hit:
|
|
|
77
77
|
* Bad, because per-task token cost grows compared to the single-pass model. Accepted — that cost is the price of completeness, and the cache + cap bound it.
|
|
78
78
|
* Bad, because the orchestrator must coordinate 1 Executor + 3 Critics + occasional Researcher-Schwarm per task. Accepted — that coordination is what makes per-task adversarial review possible.
|
|
79
79
|
|
|
80
|
+
## Trust Layer (amended 2026-05-04)
|
|
81
|
+
|
|
82
|
+
The original spec assumed a cooperative orchestrator: each `loop-run-round --phase X` call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong: under user pressure or budget constraints, an orchestrator can rationalize partial or fully synthetic loops while still emitting the right CLI calls. Three failure modes were observed in production:
|
|
83
|
+
|
|
84
|
+
1. **Single-pass bypass** — `executor → commit-task` directly, skipping the loop. (Closed by `commit-task` Layer-A gate; refuses without `nubosloop.last_phase=commit`.)
|
|
85
|
+
2. **Stamp bypass** — `loop-run-round --phase commit` invoked directly without prior phases, just to satisfy Layer A. (Closed by Layer-B precondition in `_runCommit`; refuses without `verify_exit_code=0` and `findings: []` on the checkpoint.)
|
|
86
|
+
3. **Synthetic-evidence bypass** — orchestrator invokes every `loop-run-round` phase but with hand-written `--critic-outputs '[{"critic":"style","findings":[]}, ...]'` JSON, never actually spawning the three critic agents. Layers A and B see a perfectly-shaped checkpoint and accept. (Closed by Layer-C audit-trail gate, this amendment.)
|
|
87
|
+
|
|
88
|
+
### Layer-C — Spawn-evidence audit-trail
|
|
89
|
+
|
|
90
|
+
Each LLM spawn (executor, build-fixer, three critics) MUST be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use <task-id> --agent <name> --tool-use-log <json>`. The round number is sourced automatically from `nubosloop.round` on the checkpoint, so callers never pass it explicitly.
|
|
91
|
+
|
|
92
|
+
Two phase verbs now consult this log before accepting an advance:
|
|
93
|
+
|
|
94
|
+
* **`loop-run-round --phase post-executor`** requires an audit entry for `np-executor` (round 1) or `np-build-fixer` (round ≥ 2) in the current round. Refuses with `loop-post-executor-missing-spawn-audit` otherwise.
|
|
95
|
+
* **`loop-run-round --phase post-critics`** requires audit entries for all three: `np-critic-style`, `np-critic-tests`, `np-critic-acceptance`. Refuses with `loop-post-critics-missing-critic-audit` otherwise.
|
|
96
|
+
|
|
97
|
+
Both phases accept explicit overrides — `--force-post-executor` / `--force-post-critics` — for legitimate test fixtures and migration. The override stamps `forced_*` flags on the checkpoint so dashboards can count them.
|
|
98
|
+
|
|
99
|
+
### Defense-in-depth summary
|
|
100
|
+
|
|
101
|
+
| Layer | Where | What it proves | Bypass cost |
|
|
102
|
+
|-------|---------------------|---------------------------------------------------------------|----------------------------------------|
|
|
103
|
+
| A     | `commit-task.cjs`   | The full sequence signature is on the checkpoint              | Lie in all five evidence fields        |
|
|
104
|
+
| B | `_runCommit` | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake `verify_exit_code=0` and `findings: []` to the checkpoint manually |
|
|
105
|
+
| C | `_runPostExecutor` + `_runPostCritics` | Each declared spawn appears in the per-round audit log | Issue extra `loop-audit-tool-use` calls naming agents that didn't actually run |
|
|
106
|
+
|
|
107
|
+
No layer is sufficient alone. Together they require a deliberate, multi-step lie across multiple verbs to bypass — far more deliberate than the "pragmatic compression" rationalizations that produced bypasses 1–3 in production.
|
|
108
|
+
|
|
109
|
+
### What the Trust Layer cannot prove
|
|
110
|
+
|
|
111
|
+
Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic-style …` without spawning the critic. Closing this gap requires runtime instrumentation: the LLM runtime itself would stamp spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stage 2" ("Stufe 2") and is tracked separately; this amendment closes the practical bypass class without it.
|
|
112
|
+
|
|
80
113
|
## More Information
|
|
81
114
|
|
|
82
115
|
* **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
|
|
@@ -85,3 +118,4 @@ When `loop.maxRounds` is hit:
|
|
|
85
118
|
* **Related ADR:** [ADR-0012](0012-completeness-doctrine.md) — the loop enforces the Completeness Mandate.
|
|
86
119
|
* **Concept page:** [`v1/concepts/nubosloop.md`](../../knowledge/libraries/nubos-pilot/v1/concepts/nubosloop.md).
|
|
87
120
|
* **Library:** `lib/nubosloop.cjs`.
|
|
121
|
+
* **Gate code:** `bin/np-tools/commit-task.cjs::_assertLoopGate` (Layer A); `bin/np-tools/loop-run-round.cjs::_runCommit` (Layer B); `bin/np-tools/loop-run-round.cjs::_runPostExecutor` + `_runPostCritics` (Layer C).
|
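The Layer-B precondition in `_runCommit` reduces to two checks on the checkpoint envelope. The following is a minimal sketch of that logic under the assumption of a plain in-memory checkpoint object; `commitPrecondition` is a hypothetical helper introduced for illustration and is not part of the package.

```javascript
'use strict';

// Hypothetical helper (not part of nubos-pilot): mirrors the two Layer-B
// checks from `_runCommit` and returns the name of the first missing
// evidence field, or null when both preconditions hold.
function commitPrecondition(checkpoint) {
  const np = (checkpoint && checkpoint.nubosloop) || {};
  // Strict comparison: undefined, null, and the string '0' all fail.
  if (np.verify_exit_code !== 0) return 'verify_exit_code';
  // Any array passes, empty or not; only a missing or non-array value fails.
  if (!Array.isArray(np.findings)) return 'findings';
  return null;
}

const results = [
  commitPrecondition({}),
  commitPrecondition({ nubosloop: { verify_exit_code: 0 } }),
  commitPrecondition({ nubosloop: { verify_exit_code: '0', findings: [] } }),
  commitPrecondition({ nubosloop: { verify_exit_code: 0, findings: [] } }),
];
console.log(results);
```

Note that the check is on array-ness, not emptiness: a checkpoint whose `findings` array still holds entries would also pass this particular gate, so routing on the findings themselves has to happen elsewhere.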
package/lib/nubosloop.cjs
CHANGED
|
@@ -341,6 +341,52 @@ const SEARCH_TOOLS = Object.freeze([
|
|
|
341
341
|
|
|
342
342
|
const AUDITED_AGENTS = Object.freeze(['np-researcher', 'np-executor', 'np-build-fixer']);
|
|
343
343
|
|
|
344
|
+
// Spawn-evidence agent groups (ADR-0010 Layer-C audit-trail enforcement).
|
|
345
|
+
// These lists are NOT about Rule 9 (which AUDITED_AGENTS gates) — they declare
|
|
346
|
+
// which spawns MUST appear in the per-round tool-use audit log before the
|
|
347
|
+
// orchestrator can advance loop-run-round through `post-executor`/`post-critics`.
|
|
348
|
+
// An entry in tool_use_audit with matching agent+round is the only evidence
|
|
349
|
+
// the gate accepts that the spawn actually happened.
|
|
350
|
+
const POST_EXECUTOR_EVIDENCE_R1 = Object.freeze(['np-executor']);
|
|
351
|
+
const POST_EXECUTOR_EVIDENCE_RN = Object.freeze(['np-build-fixer']);
|
|
352
|
+
const POST_CRITICS_EVIDENCE = Object.freeze([
|
|
353
|
+
'np-critic-style',
|
|
354
|
+
'np-critic-tests',
|
|
355
|
+
'np-critic-acceptance',
|
|
356
|
+
]);
|
|
357
|
+
|
|
358
|
+
/**
|
|
359
|
+
* Look up a spawn-audit entry for a given (taskId, agent, round). Returns the
|
|
360
|
+
* audit entry object if found, null otherwise. Used by Layer-C gates in
|
|
361
|
+
* loop-run-round to assert that real spawns preceded each phase advance.
|
|
362
|
+
*/
|
|
363
|
+
function findSpawnAuditForRound(taskId, agent, round, cwd) {
|
|
364
|
+
if (!checkpoint.TASK_ID_RE.test(taskId)) return null;
|
|
365
|
+
const target = Number(round);
|
|
366
|
+
if (!Number.isFinite(target) || target < 1) return null;
|
|
367
|
+
const audits = readToolUseAudit(taskId, cwd) || [];
|
|
368
|
+
for (const a of audits) {
|
|
369
|
+
if (!a) continue;
|
|
370
|
+
if (a.agent !== agent) continue;
|
|
371
|
+
if ((Number(a.round) || 1) !== target) continue;
|
|
372
|
+
return a;
|
|
373
|
+
}
|
|
374
|
+
return null;
|
|
375
|
+
}
|
|
376
|
+
|
|
377
|
+
/**
|
|
378
|
+
* Assert every required spawn for a phase exists in the audit log for the
|
|
379
|
+
* current round. Returns { satisfied, missing } — the orchestrator-side gate
|
|
380
|
+
* uses `missing` to compose actionable error messages.
|
|
381
|
+
*/
|
|
382
|
+
function assertSpawnsAuditedForRound(taskId, requiredAgents, round, cwd) {
|
|
383
|
+
const missing = [];
|
|
384
|
+
for (const agent of requiredAgents) {
|
|
385
|
+
if (!findSpawnAuditForRound(taskId, agent, round, cwd)) missing.push(agent);
|
|
386
|
+
}
|
|
387
|
+
return { satisfied: missing.length === 0, missing };
|
|
388
|
+
}
|
|
389
|
+
|
|
344
390
|
/**
|
|
345
391
|
* Rule 9 mechanical check (Completeness Doctrine + ADR-0010 Step 4).
|
|
346
392
|
* The orchestrator collects each spawn's tool-use log (most LLM APIs
|
|
@@ -637,6 +683,11 @@ module.exports = {
|
|
|
637
683
|
auditToolUse,
|
|
638
684
|
readToolUseAudit,
|
|
639
685
|
auditFindingsForRound,
|
|
686
|
+
findSpawnAuditForRound,
|
|
687
|
+
assertSpawnsAuditedForRound,
|
|
688
|
+
POST_EXECUTOR_EVIDENCE_R1,
|
|
689
|
+
POST_EXECUTOR_EVIDENCE_RN,
|
|
690
|
+
POST_CRITICS_EVIDENCE,
|
|
640
691
|
KNOWN_ROUTING_BUCKETS,
|
|
641
692
|
SEARCH_TOOLS,
|
|
642
693
|
AUDITED_AGENTS,
|
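The round-scoped lookup added in `findSpawnAuditForRound` and `assertSpawnsAuditedForRound` can be tried out without a checkpoint on disk. The sketch below is an in-memory approximation, assuming the audit log is a plain array of `{ agent, round }` entries; `auditLog`, `findEntry`, and `checkRound` are hypothetical names, and the real functions read the log via `readToolUseAudit` and also validate the task id.

```javascript
'use strict';

// Hypothetical in-memory stand-in for the per-task tool_use_audit log.
const auditLog = [
  { agent: 'np-executor', round: 1 },
  { agent: 'np-critic-style', round: 1 },
  { agent: 'np-critic-tests', round: 2 },
];

const REQUIRED_CRITICS = Object.freeze([
  'np-critic-style',
  'np-critic-tests',
  'np-critic-acceptance',
]);

// Returns the first entry matching agent + round, or null.
function findEntry(log, agent, round) {
  const target = Number(round);
  if (!Number.isFinite(target) || target < 1) return null;
  for (const entry of log) {
    if (!entry || entry.agent !== agent) continue;
    // Entries without a usable round count as round 1, as in the diff.
    if ((Number(entry.round) || 1) !== target) continue;
    return entry;
  }
  return null;
}

// Collects the required agents that have no entry for the given round.
function checkRound(log, required, round) {
  const missing = required.filter((agent) => !findEntry(log, agent, round));
  return { satisfied: missing.length === 0, missing };
}

const round1 = checkRound(auditLog, REQUIRED_CRITICS, 1);
const round2 = checkRound(auditLog, REQUIRED_CRITICS, 2);
console.log(round1.missing); // [ 'np-critic-tests', 'np-critic-acceptance' ]
console.log(round2.missing); // [ 'np-critic-style', 'np-critic-acceptance' ]
```

Because the check iterates over the required list rather than the log, `missing` keeps the order of the required agents, which is what test LCLI-RR-19 asserts. An entry stamped in round 1 never satisfies a round-2 check.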
package/package.json
CHANGED
|
package/workflows/execute-phase.md
CHANGED
|
@@ -223,13 +223,19 @@ for WAVE_INDEX in 0 1 2 ...; do
|
|
|
223
223
|
|
|
224
224
|
node .nubos-pilot/bin/np-tools.cjs checkpoint transition "$TASK_ID" verifying
|
|
225
225
|
|
|
226
|
-
# === Step 4: Mechanical Checks +
|
|
226
|
+
# === Step 4: Mechanical Checks + spawn-evidence audit (orchestrator-side) ===
|
|
227
227
|
VERIFY_LOG="${TMPDIR:-/tmp}/np-verify-${TASK_ID}-r${ROUND}.log"
|
|
228
228
|
# Orchestrator (NOT the agent) runs the task's <verify> command + stack
|
|
229
229
|
# linters; redirect stdout+stderr to $VERIFY_LOG.
|
|
230
230
|
VERIFY_EXIT=$?
|
|
231
|
+
# Stamp executor spawn-evidence into the audit log. EXECUTOR_TOOL_LOG is
|
|
232
|
+
# the tool-name JSON array harvested from the spawn's tool_use stream
|
|
233
|
+
# (e.g. '["Read","search-knowledge","Edit","Bash"]'). For AUDITED_AGENTS
|
|
234
|
+
# this drives Rule 9 enforcement; the round number is sourced automatically
|
|
235
|
+
# from the checkpoint by loop-audit-tool-use. The post-executor gate (Layer C)
|
|
236
|
+
# refuses to advance unless this evidence stamp exists for the current round.
|
|
231
237
|
node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
|
|
232
|
-
--
|
|
238
|
+
--agent "$EXECUTOR_AGENT" --tool-use-log "$EXECUTOR_TOOL_LOG"
|
|
233
239
|
|
|
234
240
|
POST_EXEC=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
|
|
235
241
|
--phase post-executor \
|
|
@@ -249,6 +255,19 @@ for WAVE_INDEX in 0 1 2 ...; do
|
|
|
249
255
|
# - agents/np-critic-acceptance.md (sonnet) → CRITIC_ACCEPTANCE_JSON
|
|
250
256
|
CRITIC_OUTPUTS_JSON=$(printf '[%s,%s,%s]' "$CRITIC_STYLE_JSON" "$CRITIC_TESTS_JSON" "$CRITIC_ACCEPTANCE_JSON")
|
|
251
257
|
|
|
258
|
+
# === Step 5b: Stamp critic spawn-evidence (one audit entry per critic) ===
|
|
259
|
+
# MANDATORY — without these three stamps, post-critics refuses with
|
|
260
|
+
# `loop-post-critics-missing-critic-audit` (Layer C, ADR-0010 Trust-Layer).
|
|
261
|
+
# The orchestrator MUST issue all three calls AFTER the critic spawns
|
|
262
|
+
# have actually run; synthetic --critic-outputs JSON without these
|
|
263
|
+
# corresponding audit entries is mechanically blocked.
|
|
264
|
+
# --tool-use-log may be empty for critics (they aren't AUDITED_AGENTS for
|
|
265
|
+
# Rule 9), but supplying the actual critic tool list is preferred for
|
|
266
|
+
# observability on np:dashboard.
|
|
267
|
+
node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-style --tool-use-log '[]'
|
|
268
|
+
node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-tests --tool-use-log '[]'
|
|
269
|
+
node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-acceptance --tool-use-log '[]'
|
|
270
|
+
|
|
252
271
|
# === Step 6: Route via loop-evaluate (post-critics) ===
|
|
253
272
|
POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
|
|
254
273
|
--phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
|
|
@@ -347,7 +366,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
|
|
|
347
366
|
- Spawn the three Critic agents (`np-critic-style`, `np-critic-tests`, `np-critic-acceptance`) IN PARALLEL — single message, three Agent blocks per task per round.
|
|
348
367
|
- Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
|
|
349
368
|
- Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
|
|
350
|
-
- Run `loop-audit-tool-use` per round per spawn — Rule 9 (
|
|
369
|
+
- Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for the three Critic agents this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). All four audit calls per round are mandatory before the corresponding `loop-run-round --phase post-{executor|critics}` invocation.
|
|
351
370
|
- Route every commit through `node .nubos-pilot/bin/np-tools.cjs commit-task` so `assertCommittablePaths` (D-25) runs.
|
|
352
371
|
- Hard-stop the wave when `commit-task` returns non-zero, OR a task hits `stuck`/`plan-checker`.
|
|
353
372
|
|
|
@@ -357,7 +376,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
|
|
|
357
376
|
- Skip the Nubosloop and call `commit-task` directly after the executor (single-pass executor → commit is forbidden — ADR-0010).
|
|
358
377
|
- Spawn the Critic agents serially — they MUST run in parallel (single message, three Agent blocks).
|
|
359
378
|
- Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
|
|
360
|
-
- Skip `loop-audit-tool-use`
|
|
379
|
+
- Skip `loop-audit-tool-use` for ANY spawn (executor/build-fixer/the three Critics). Skipping the executor audit silences Rule 9; skipping any critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning real critic agents is the canonical bypass — Layer C blocks it mechanically.
|
|
361
380
|
- Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
|
|
362
381
|
- Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
|
|
363
382
|
- Bundle two tasks into one commit (ADR-0004 atomicity).
|