nubos-pilot 0.9.5 → 0.9.7
- package/agents/np-critic-acceptance.md +4 -0
- package/agents/np-critic-style.md +4 -0
- package/agents/np-critic-tests.md +4 -0
- package/bin/np-tools/loop-audit-tool-use.cjs +27 -10
- package/bin/np-tools/loop-commands.test.cjs +193 -0
- package/bin/np-tools/loop-run-round.cjs +42 -0
- package/docs/adr/0010-nubosloop.md +34 -0
- package/lib/nubosloop.cjs +51 -0
- package/package.json +1 -1
- package/workflows/execute-phase.md +23 -4

package/agents/np-critic-acceptance.md
CHANGED

```diff
@@ -26,6 +26,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-acceptance --tool-use-log <json>` after you emit your findings JSON. This is the orchestrator's responsibility, not yours — but if you observe (in the verify output or task summary) that a prior round's critic-schwarm completed without an audit stamp, surface that as a finding of category `locked-decision-violation` because it indicates a bypass of ADR-0010 Layer C. The post-critics gate (`loop-run-round --phase post-critics`) refuses without the three critic stamps; missing your stamp blocks the entire round.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
```
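The acceptance critic's instruction to flag an unstamped prior round amounts to a per-round scan of the audit log. Below is a minimal sketch. It assumes audit entries shaped like `{ agent, round }`, the two fields `findSpawnAuditForRound` compares. The finding shape `{ category, message }` and the helper name are hypothetical. Only the category name `locked-decision-violation` and the three critic names come from the text above.

```javascript
// The three critic spawns the post-critics gate expects (POST_CRITICS_EVIDENCE).
const CRITICS = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Hypothetical helper: for each completed round before `currentRound`, report
// critics with no audit entry. Entries lacking a round count as round 1,
// mirroring `Number(a.round) || 1` in lib/nubosloop.cjs.
function unstampedCriticFindings(auditEntries, currentRound) {
  const findings = [];
  for (let round = 1; round < currentRound; round++) {
    const stamped = new Set(
      auditEntries
        .filter((e) => e && (Number(e.round) || 1) === round)
        .map((e) => e.agent),
    );
    const missing = CRITICS.filter((critic) => !stamped.has(critic));
    if (missing.length > 0) {
      findings.push({
        category: 'locked-decision-violation', // category named in the agent prompt
        message: `round ${round}: critic-schwarm finished without audit stamps for ${missing.join(', ')}`,
      });
    }
  }
  return findings;
}
```

For example, a log holding only style and tests stamps for round 1 yields one finding naming `np-critic-acceptance` once the loop is in round 2.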

package/agents/np-critic-style.md
CHANGED

```diff
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-style --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
```

package/agents/np-critic-tests.md
CHANGED

```diff
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-tests --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
 ## Inputs
 
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
```

package/bin/np-tools/loop-audit-tool-use.cjs
CHANGED

```diff
@@ -31,18 +31,35 @@ function run(argv, ctx) {
       { hint: 'agents requiring search tools: ' + nubosloop.AUDITED_AGENTS.join(', ') },
     );
   }
-
-
-
-
-
-  );
-
+  // --tool-use-log is required for AUDITED_AGENTS (Rule 9 enforcement reads
+  // the tool list to verify search-knowledge / match-existing-learning calls).
+  // For non-audited spawns (critics, plan-checker, etc.) the orchestrator may
+  // omit it — we still record the spawn for Layer-C audit-trail evidence with
+  // an empty log. Explicit empty-array is also accepted.
+  const isAuditedAgent = nubosloop.AUDITED_AGENTS.includes(agent);
+  let log;
+  if (tail.includes('--tool-use-log')) {
+    log = args.getJsonFlag(
+      tail,
+      '--tool-use-log',
+      'loop-audit-missing-log',
+      "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
+    );
+    if (!Array.isArray(log)) {
+      throw new (require('../../lib/core.cjs').NubosPilotError)(
+        'loop-audit-invalid-log',
+        '--tool-use-log must be a JSON array',
+        { got: typeof log },
+      );
+    }
+  } else if (isAuditedAgent) {
     throw new (require('../../lib/core.cjs').NubosPilotError)(
-      'loop-audit-
-      '--tool-use-log
-      {
+      'loop-audit-missing-log',
+      'loop-audit-tool-use requires --tool-use-log for audited agent: ' + agent,
+      { hint: 'audited agents drive Rule 9 enforcement; pass --tool-use-log \'[]\' to record an empty spawn' },
     );
+  } else {
+    log = [];
   }
   const result = nubosloop.auditToolUse(taskId, agent, log, cwd);
   const payload = { task_id: taskId, ...result };
```
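The new `--tool-use-log` handling has three outcomes:

* When the flag is present, it is parsed and type-checked.
* When an audited agent omits it, the command refuses.
* Otherwise, the command falls back to an empty log.

Below is a standalone sketch of that decision. It assumes the raw flag value arrives as a string, or `undefined` when the flag is absent. The real command parses with `args.getJsonFlag` and throws `NubosPilotError`. Here `JSON.parse` and a plain `Error` carrying a `code` property stand in for them, so whatever `args.getJsonFlag` does with malformed JSON is not reproduced. The helper names `codedError` and `resolveToolUseLog` are hypothetical.

```javascript
// Mirrors AUDITED_AGENTS from lib/nubosloop.cjs.
const AUDITED_AGENTS = ['np-researcher', 'np-executor', 'np-build-fixer'];

// Build an Error carrying a machine-readable code, standing in for NubosPilotError.
function codedError(code, message) {
  const err = new Error(message);
  err.code = code;
  return err;
}

// Sketch of the 0.9.7 resolution rules; `rawLog` is the flag value as a string,
// or undefined when --tool-use-log was not passed.
function resolveToolUseLog(agent, rawLog) {
  if (rawLog !== undefined) {
    const log = JSON.parse(rawLog); // assumption: stands in for args.getJsonFlag
    if (!Array.isArray(log)) {
      throw codedError('loop-audit-invalid-log', '--tool-use-log must be a JSON array');
    }
    return log;
  }
  if (AUDITED_AGENTS.includes(agent)) {
    throw codedError(
      'loop-audit-missing-log',
      'loop-audit-tool-use requires --tool-use-log for audited agent: ' + agent,
    );
  }
  return []; // non-audited spawns (critics, plan-checker, ...) are still recorded
}
```

An explicit `'[]'` is accepted even for audited agents, matching the hint in the error message. Whether that empty log then counts as a Rule 9 violation is left to `auditToolUse`.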

package/bin/np-tools/loop-commands.test.cjs
CHANGED

```diff
@@ -349,9 +349,25 @@ test('LCLI-RR-2: loop-run-round preflight on populated store → spawn-executor-
   assert.ok(out.cache_hit);
 });
 
+// Helper: seed the per-round spawn-evidence audit log so Layer-C gates accept
+// post-executor / post-critics. Tests that exercise the gate explicitly
+// (LCLI-RR-12+) build their own partial fixtures.
+function _seedSpawnEvidence(taskId, round, agents, cwd) {
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState(taskId, { round }, cwd);
+  for (const a of agents) {
+    // Pass an empty tool-use log — these are evidence stamps, not Rule 9 audits.
+    // For AUDITED_AGENTS in this test (np-executor / np-build-fixer) we need to
+    // pass a valid search-tool to avoid generating a rule-9-violation finding.
+    const log = nubosloop.AUDITED_AGENTS.includes(a) ? ['search-knowledge'] : [];
+    nubosloop.auditToolUse(taskId, a, log, cwd);
+  }
+}
+
 test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-critic-schwarm', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
```
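`_seedSpawnEvidence` passes `['search-knowledge']` only for audited agents. An audited spawn without a search call is recorded as a Rule 9 violation, while critics are never checked: LCLI-RR-21 expects `violation` to be `null`. Below is a sketch of that rule, with two assumptions:

* `SEARCH_TOOLS` holds `search-knowledge` and `match-existing-learning`. These are the two calls named in the `loop-audit-tool-use` comment; the real frozen list in `lib/nubosloop.cjs` is not part of this diff.
* The violation object shape and the helper names `rule9Violation` and `seedLogFor` are hypothetical.

```javascript
// Assumed contents; the real SEARCH_TOOLS list is defined in lib/nubosloop.cjs.
const SEARCH_TOOLS = ['search-knowledge', 'match-existing-learning'];
const AUDITED_AGENTS = ['np-researcher', 'np-executor', 'np-build-fixer'];

// Sketch of the Rule 9 decision: only audited agents are checked, and the check
// passes when at least one tool in the spawn's log is a search tool.
function rule9Violation(agent, toolLog) {
  if (!AUDITED_AGENTS.includes(agent)) return null; // critics etc. are never flagged
  const searched = toolLog.some((tool) => SEARCH_TOOLS.includes(tool));
  return searched ? null : { agent, reason: 'no search tool in tool-use log' };
}

// Mirrors the log choice made by _seedSpawnEvidence.
function seedLogFor(agent) {
  return AUDITED_AGENTS.includes(agent) ? ['search-knowledge'] : [];
}
```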

```diff
@@ -366,6 +382,7 @@ test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-
 test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-build-fixer', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
```

```diff
@@ -380,6 +397,8 @@ test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-bu
 test('LCLI-RR-5: loop-run-round phase=post-critics with zero findings → commit', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
```

```diff
@@ -399,6 +418,10 @@ test('LCLI-RR-5b: post-critics surfaces rule-9-violation from audit log even wit
   // Round 1, executor shipped without searching → audit captures violation
   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read', 'Edit'], r);
+  // Seed the three critic spawn evidences so the Layer-C gate is satisfied —
+  // we want the rule-9-violation to surface from the audit log, not the gate.
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   // Critics return zero findings (style/tests/acceptance all clean) — without
   // the Rule 9 chain the loop would commit. With it, the audit violation must
   // still route the round to executor.
```

```diff
@@ -428,6 +451,9 @@ test('LCLI-RR-5c: post-critics scopes audit findings to current round only', ()
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read'], r);
   nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-build-fixer', ['search-knowledge'], r);
+  // Seed critic-spawn evidence for round 2 so the Layer-C gate is satisfied.
+  _seedSpawnEvidence('M001-S001-T0001', 2,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
```

```diff
@@ -540,6 +566,173 @@ test('LCLI-RR-11: phase=commit --force-commit-phase bypasses preconditions and s
   assert.equal(cp.nubosloop.forced_commit_phase, true);
 });
 
+// Layer C — audit-trail evidence enforcement -------------------------------
+
+test('LCLI-RR-12: post-executor refuses without np-executor audit (R1)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Round defaults to 1 with no audit entries.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && Array.isArray(err.details && err.details.missing)
+      && err.details.missing.includes('np-executor')
+      && err.details.round === 1,
+  );
+});
+
+test('LCLI-RR-13: post-executor refuses on R1 if only np-build-fixer was audited (wrong agent)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-build-fixer'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-executor'),
+  );
+});
+
+test('LCLI-RR-14: post-executor on R≥2 requires np-build-fixer audit, not np-executor', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Advance to round 2; audit only the wrong agent (np-executor).
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['search-knowledge'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-build-fixer')
+      && err.details.round === 2,
+  );
+});
+
+test('LCLI-RR-15: post-critics refuses without any critic audit (synthetic-JSON bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
+  // No critic-spawn audit → gate must refuse even if --critic-outputs is valid.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && Array.isArray(err.details.missing)
+      && err.details.missing.length === 3,
+  );
+});
+
+test('LCLI-RR-16: post-critics refuses with only 2 of 3 critic audits (partial bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests'], r); // missing acceptance
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && err.details.missing.length === 1
+      && err.details.missing[0] === 'np-critic-acceptance',
+  );
+});
+
+test('LCLI-RR-17: --force-post-executor bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // No audit entries; force flag must let us through.
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0', '--force-post-executor'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'spawn-critic-schwarm');
+});
+
+test('LCLI-RR-18: --force-post-critics bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r); // executor audited, critics not
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+      '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]',
+      '--force-post-critics'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit');
+});
+
+test('LCLI-RR-19: assertSpawnsAuditedForRound returns ordered missing list', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  const v = nubosloop.assertSpawnsAuditedForRound(
+    'M001-S001-T0001', nubosloop.POST_CRITICS_EVIDENCE, 1, r,
+  );
+  assert.equal(v.satisfied, false);
+  assert.deepEqual(v.missing, ['np-critic-tests', 'np-critic-acceptance']);
+});
+
+test('LCLI-RR-20: findSpawnAuditForRound is round-scoped (round-1 audit not visible from round-2)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+  assert.equal(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 2, r), null);
+});
+
+test('LCLI-RR-21: loop-audit-tool-use accepts critics without --tool-use-log (records empty spawn)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  const loopAudit = require('./loop-audit-tool-use.cjs');
+  const cap = _cap();
+  loopAudit.run(['M001-S001-T0001', '--agent', 'np-critic-style'], { cwd: r, stdout: cap.stub });
+  const out = JSON.parse(cap.get());
+  assert.equal(out.agent, 'np-critic-style');
+  assert.equal(out.violation, null); // critics aren't audited for Rule 9
+  // The audit log must still record the spawn so Layer C can find it.
+  assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+});
+
+test('LCLI-RR-22: loop-audit-tool-use still REQUIRES --tool-use-log for AUDITED_AGENTS', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const loopAudit = require('./loop-audit-tool-use.cjs');
+  assert.throws(
+    () => loopAudit.run(['M001-S001-T0001', '--agent', 'np-executor'], { cwd: r, stdout: _cap().stub }),
+    (err) => err && err.code === 'loop-audit-missing-log',
+  );
+});
+
 test('LCLI-22: learning-match queries the local store', () => {
   const r = _mkRoot();
   const lr = require('../../lib/learnings.cjs');
```

package/bin/np-tools/loop-run-round.cjs
CHANGED

```diff
@@ -81,6 +81,27 @@ function _runPostExecutor(taskId, list, cwd) {
       { hint: 'pass the exit code of the task verify command' },
     );
   }
+  // Layer C: audit-trail enforcement — refuse if no executor spawn was
+  // recorded for this round via `loop-audit-tool-use`. This blocks the
+  // bypass where an orchestrator stamps verify-green without actually
+  // spawning np-executor / np-build-fixer.
+  const force = list.includes('--force-post-executor');
+  if (!force) {
+    const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
+    const round = Number((cur.nubosloop && cur.nubosloop.round)) || 1;
+    const required = round === 1 ? nubosloop.POST_EXECUTOR_EVIDENCE_R1 : nubosloop.POST_EXECUTOR_EVIDENCE_RN;
+    const verdict = nubosloop.assertSpawnsAuditedForRound(taskId, required, round, cwd);
+    if (!verdict.satisfied) {
+      throw new NubosPilotError(
+        'loop-post-executor-missing-spawn-audit',
+        'phase=post-executor refused: no `loop-audit-tool-use` record found for round=' + round +
+          ', agent=' + verdict.missing.join('/') + ' on ' + taskId + '. ' +
+          'Spawn the executor/build-fixer agent and call `loop-audit-tool-use ' + taskId +
+          ' --agent <name> --tool-use-log <json>` first, or pass --force-post-executor for an explicit override.',
+        { taskId, round, missing: verdict.missing.slice(), required: required.slice() },
+      );
+    }
+  }
   const code = Number(verifyExitCode);
   const verifyOutputPath = args.getFlag(list, '--verify-output-path');
   let verifyOutput = '';
```
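Isolating the post-executor gate into a pure function makes the round-dependent choice of required agent easy to see. LCLI-RR-12 through RR-14 and RR-17 exercise exactly this choice. Below is a sketch with these assumptions and differences:

* The checkpoint and the audit entries are passed in as plain objects rather than read from disk.
* `taskId` is omitted from `details`.
* The `forced: true` result marks the point where, per the ADR, the real command records a `forced_*` flag.
* `checkPostExecutor` is a hypothetical name.

```javascript
// Mirrors POST_EXECUTOR_EVIDENCE_R1 / _RN from lib/nubosloop.cjs.
const POST_EXECUTOR_EVIDENCE_R1 = ['np-executor'];
const POST_EXECUTOR_EVIDENCE_RN = ['np-build-fixer'];

// Hypothetical pure version of the Layer-C check in _runPostExecutor.
// `checkpoint` is the parsed task checkpoint, `audits` the tool_use_audit
// entries, `argv` the remaining CLI arguments.
function checkPostExecutor(checkpoint, audits, argv) {
  if (argv.includes('--force-post-executor')) return { ok: true, forced: true };
  // A missing checkpoint or round falls back to round 1, as in the diff.
  const round = Number(checkpoint && checkpoint.nubosloop && checkpoint.nubosloop.round) || 1;
  const required = round === 1 ? POST_EXECUTOR_EVIDENCE_R1 : POST_EXECUTOR_EVIDENCE_RN;
  const missing = required.filter(
    (agent) => !audits.some((a) => a && a.agent === agent && (Number(a.round) || 1) === round),
  );
  if (missing.length === 0) return { ok: true, forced: false };
  return {
    ok: false,
    code: 'loop-post-executor-missing-spawn-audit',
    details: { round, missing, required: required.slice() },
  };
}
```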

```diff
@@ -132,6 +153,27 @@ function _runPostCritics(taskId, list, cwd) {
     const pb = cp.nubosloop || {};
     return Number(pb.round) || 1;
   })();
+  // Layer C: audit-trail enforcement — refuse if the three critic spawns
+  // (style/tests/acceptance) are not present in the audit log for this round.
+  // This blocks the bypass where an orchestrator hand-writes synthetic
+  // critic-output JSON without actually spawning the critic agents.
+  const force = list.includes('--force-post-critics');
+  if (!force) {
+    const verdict = nubosloop.assertSpawnsAuditedForRound(
+      taskId, nubosloop.POST_CRITICS_EVIDENCE, round, cwd,
+    );
+    if (!verdict.satisfied) {
+      throw new NubosPilotError(
+        'loop-post-critics-missing-critic-audit',
+        'phase=post-critics refused: critic-schwarm spawn-evidence missing for round=' + round +
+          ' on ' + taskId + ' (missing audits: ' + verdict.missing.join(', ') + '). ' +
+          'For each critic agent, call `loop-audit-tool-use ' + taskId +
+          ' --agent <np-critic-style|np-critic-tests|np-critic-acceptance> --tool-use-log <json>` ' +
+          'after the spawn, then re-run --phase post-critics. Pass --force-post-critics for an explicit override.',
+        { taskId, round, missing: verdict.missing.slice(), required: nubosloop.POST_CRITICS_EVIDENCE.slice() },
+      );
+    }
+  }
   const opts = nubosloop.resolveLoopOpts(cwd);
   // Rule 9 chain: convert this round's audit violations into rule-9-violation
   // findings so they participate in routing alongside critic findings.
```
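The post-critics refusal names each missing critic in `details.missing`. Its message asks for one `loop-audit-tool-use` call per critic, made after that critic has actually been spawned. Below is a hypothetical formatter for those follow-up calls. It uses the positional task-ID form that the tests and the `execute-phase` workflow use. `toolLogs` is an assumed map from agent name to the harvested tool list; it defaults to an empty log, which critics are allowed to have.

```javascript
// Hypothetical formatter for the follow-up calls a refused post-critics run asks for.
// Assumes tool names never contain single quotes, so single-quoting the JSON is safe.
function remediationCommands(taskId, missing, toolLogs = {}) {
  return missing.map((agent) => {
    const log = JSON.stringify(toolLogs[agent] || []); // critics may pass an empty log
    return `node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use ${taskId} --agent ${agent} --tool-use-log '${log}'`;
  });
}

// Example: `details.missing` from a refusal where only np-critic-style was stamped.
for (const cmd of remediationCommands('M001-S001-T0001', ['np-critic-tests', 'np-critic-acceptance'])) {
  console.log(cmd);
}
```

These commands only record spawns that really happened. Emitting them without spawning the critics is exactly the forgery that, per ADR-0010, Layer C cannot detect on its own.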

package/docs/adr/0010-nubosloop.md
CHANGED

```diff
@@ -77,6 +77,39 @@ When `loop.maxRounds` is hit:
 * Bad, because per-task token cost grows compared to the single-pass model. Accepted — that cost is the price of completeness, and the cache + cap bound it.
 * Bad, because the orchestrator must coordinate 1 Executor + 3 Critics + occasional Researcher-Schwarm per task. Accepted — that coordination is what makes per-task adversarial review possible.
 
+## Trust Layer (amended 2026-05-04)
+
+The original spec assumed a cooperative orchestrator: each `loop-run-round --phase X` call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong — under user-pressure or budget constraints, an orchestrator can rationalize partial-loops or fully-synthetic loops while still emitting the right CLI calls. Three failure modes observed in the wild:
+
+1. **Single-pass bypass** — `executor → commit-task` directly, skipping the loop. (Closed by `commit-task` Layer-A gate; refuses without `nubosloop.last_phase=commit`.)
+2. **Stamp bypass** — `loop-run-round --phase commit` invoked directly without prior phases, just to satisfy Layer A. (Closed by Layer-B precondition in `_runCommit`; refuses without `verify_exit_code=0` and `findings: []` on the checkpoint.)
+3. **Synthetic-evidence bypass** — orchestrator invokes every `loop-run-round` phase but with hand-written `--critic-outputs '[{"critic":"style","findings":[]}, ...]'` JSON, never actually spawning the three critic agents. Layers A and B see a perfectly-shaped checkpoint and accept. (Closed by Layer-C audit-trail gate, this amendment.)
+
+### Layer-C — Spawn-evidence audit-trail
+
+Each LLM spawn (executor, build-fixer, three critics) MUST be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --task-id … --agent <name> --tool-use-log <json>`. The round number is sourced automatically from `nubosloop.round` to keep the chain idempotent.
+
+Two phase verbs now consult this log before accepting an advance:
+
+* **`loop-run-round --phase post-executor`** requires an audit entry for `np-executor` (round 1) or `np-build-fixer` (round ≥ 2) in the current round. Refuses with `loop-post-executor-missing-spawn-audit` otherwise.
+* **`loop-run-round --phase post-critics`** requires audit entries for all three: `np-critic-style`, `np-critic-tests`, `np-critic-acceptance`. Refuses with `loop-post-critics-missing-critic-audit` otherwise.
+
+Both phases accept explicit overrides — `--force-post-executor` / `--force-post-critics` — for legitimate test fixtures and migration. The override stamps `forced_*` flags on the checkpoint so dashboards can count them.
+
+### Defense-in-depth summary
+
+| Layer | Where | What it proves | Bypass cost |
+|-------|---------------------|---------------------------------------------------------------|----------------------------------------|
+| A | `commit-task.cjs` | The full sequence signature is on the checkpoint | Lie at all five evidence fields |
+| B | `_runCommit` | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake `verify_exit_code=0` and `findings: []` to the checkpoint manually |
+| C | `_runPostExecutor` + `_runPostCritics` | Each declared spawn appears in the per-round audit log | Issue extra `loop-audit-tool-use` calls naming agents that didn't actually run |
+
+No layer is sufficient alone. Together they require a deliberate, multi-step lie across multiple verbs to bypass — far more deliberate than the "pragmatic compression" rationalizations that produced bypasses 1–3 in production.
+
+### What the Trust Layer cannot prove
+
+Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic-style …` without spawning the critic. Closing this gap requires runtime instrumentation — the LLM runtime itself stamps spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stufe 2" and tracked separately; this amendment closes the practical bypass class without it.
+
 ## More Information
 
 * **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
```
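The synthetic-evidence bypass is easiest to see by evaluating all three layers against the same checkpoint. This is a toy model, not the real gate code:

* Layer A is reduced to the `last_phase` check; the real gate inspects five evidence fields.
* Placing `verify_exit_code`, `findings`, and `tool_use_audit` all under `nubosloop` is an assumption.
* In the real loop, Layer C fires at `post-critics`, before `last_phase` could ever reach `commit`. The model checks everything at once only to show which layer catches the lie.

```javascript
// Toy model of the three Trust-Layer gates in ADR-0010 (simplified, see lead-in).
const POST_CRITICS_EVIDENCE = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Layer A: the loop reached its commit phase.
const layerA = (cp) => cp.nubosloop.last_phase === 'commit';
// Layer B: verify was green and post-critics produced an empty findings array.
const layerB = (cp) => cp.nubosloop.verify_exit_code === 0
  && Array.isArray(cp.nubosloop.findings) && cp.nubosloop.findings.length === 0;
// Layer C: every critic has an audit entry for the current round.
const layerC = (cp) => POST_CRITICS_EVIDENCE.every((agent) =>
  cp.nubosloop.tool_use_audit.some((a) => a.agent === agent && (Number(a.round) || 1) === cp.nubosloop.round));

// Bypass 3: every phase was invoked, but the critic JSON was hand-written and
// no critic was ever spawned, so only the executor spawn was stamped.
const syntheticRun = {
  nubosloop: {
    round: 1,
    last_phase: 'commit',
    verify_exit_code: 0,
    findings: [],
    tool_use_audit: [{ agent: 'np-executor', round: 1, tools: ['search-knowledge'] }],
  },
};

console.log({ A: layerA(syntheticRun), B: layerB(syntheticRun), C: layerC(syntheticRun) });
// → { A: true, B: true, C: false }
```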

```diff
@@ -85,3 +118,4 @@ When `loop.maxRounds` is hit:
 * **Related ADR:** [ADR-0012](0012-completeness-doctrine.md) — the loop enforces the Completeness Mandate.
 * **Concept page:** [`v1/concepts/nubosloop.md`](../../knowledge/libraries/nubos-pilot/v1/concepts/nubosloop.md).
 * **Library:** `lib/nubosloop.cjs`.
+* **Gate code:** `bin/np-tools/commit-task.cjs::_assertLoopGate` (Layer A); `bin/np-tools/loop-run-round.cjs::_runCommit` (Layer B); `bin/np-tools/loop-run-round.cjs::_runPostExecutor` + `_runPostCritics` (Layer C).
```

package/lib/nubosloop.cjs
CHANGED

```diff
@@ -341,6 +341,52 @@ const SEARCH_TOOLS = Object.freeze([
 
 const AUDITED_AGENTS = Object.freeze(['np-researcher', 'np-executor', 'np-build-fixer']);
 
+// Spawn-evidence agent groups (ADR-0010 Layer-C audit-trail enforcement).
+// These lists are NOT about Rule 9 (which AUDITED_AGENTS gates) — they declare
+// which spawns MUST appear in the per-round tool-use audit log before the
+// orchestrator can advance loop-run-round through `post-executor`/`post-critics`.
+// An entry in tool_use_audit with matching agent+round is the only evidence
+// the gate accepts that the spawn actually happened.
+const POST_EXECUTOR_EVIDENCE_R1 = Object.freeze(['np-executor']);
+const POST_EXECUTOR_EVIDENCE_RN = Object.freeze(['np-build-fixer']);
+const POST_CRITICS_EVIDENCE = Object.freeze([
+  'np-critic-style',
+  'np-critic-tests',
+  'np-critic-acceptance',
+]);
+
+/**
+ * Look up a spawn-audit entry for a given (taskId, agent, round). Returns the
+ * audit entry object if found, null otherwise. Used by Layer-C gates in
+ * loop-run-round to assert that real spawns preceded each phase advance.
+ */
+function findSpawnAuditForRound(taskId, agent, round, cwd) {
+  if (!checkpoint.TASK_ID_RE.test(taskId)) return null;
+  const target = Number(round);
+  if (!Number.isFinite(target) || target < 1) return null;
+  const audits = readToolUseAudit(taskId, cwd) || [];
+  for (const a of audits) {
+    if (!a) continue;
+    if (a.agent !== agent) continue;
+    if ((Number(a.round) || 1) !== target) continue;
+    return a;
+  }
+  return null;
+}
+
+/**
+ * Assert every required spawn for a phase exists in the audit log for the
+ * current round. Returns { satisfied, missing } — the orchestrator-side gate
+ * uses `missing` to compose actionable error messages.
+ */
+function assertSpawnsAuditedForRound(taskId, requiredAgents, round, cwd) {
+  const missing = [];
+  for (const agent of requiredAgents) {
+    if (!findSpawnAuditForRound(taskId, agent, round, cwd)) missing.push(agent);
+  }
+  return { satisfied: missing.length === 0, missing };
+}
+
 /**
  * Rule 9 mechanical check (Completeness Doctrine + ADR-0010 Step 4).
  * The orchestrator collects each spawn's tool-use log (most LLM APIs
```
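`findSpawnAuditForRound` is a linear scan with one normalization step, so its behavior can be reproduced over an in-memory array. This sketch differs from the diff in two ways: it takes the entries directly instead of calling `readToolUseAudit(taskId, cwd)`, and it skips the `checkpoint.TASK_ID_RE` check. Otherwise it follows the diff, including the rule that an entry without a `round` counts as round 1.

```javascript
// In-memory sketch of findSpawnAuditForRound: `audits` replaces readToolUseAudit().
function findSpawnAuditForRound(audits, agent, round) {
  const target = Number(round);
  if (!Number.isFinite(target) || target < 1) return null; // invalid round: no match
  for (const a of audits || []) {
    if (!a || a.agent !== agent) continue;
    // Entries written without a round (or with round 0) are treated as round 1.
    if ((Number(a.round) || 1) !== target) continue;
    return a;
  }
  return null;
}

// In-memory sketch of assertSpawnsAuditedForRound.
function assertSpawnsAuditedForRound(audits, requiredAgents, round) {
  // filter() keeps the order of requiredAgents, which LCLI-RR-19 relies on.
  const missing = requiredAgents.filter((agent) => !findSpawnAuditForRound(audits, agent, round));
  return { satisfied: missing.length === 0, missing };
}
```

Because of the `Number(a.round) || 1` fallback, an entry with no round can satisfy only a round-1 gate. An entry stamped in an earlier round can never unlock a later one.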

```diff
@@ -637,6 +683,11 @@ module.exports = {
   auditToolUse,
   readToolUseAudit,
   auditFindingsForRound,
+  findSpawnAuditForRound,
+  assertSpawnsAuditedForRound,
+  POST_EXECUTOR_EVIDENCE_R1,
+  POST_EXECUTOR_EVIDENCE_RN,
+  POST_CRITICS_EVIDENCE,
   KNOWN_ROUTING_BUCKETS,
   SEARCH_TOOLS,
   AUDITED_AGENTS,
```

package/package.json
CHANGED
|
@@ -223,13 +223,19 @@ for WAVE_INDEX in 0 1 2 ...; do
 
 node .nubos-pilot/bin/np-tools.cjs checkpoint transition "$TASK_ID" verifying
 
-# === Step 4: Mechanical Checks +
+# === Step 4: Mechanical Checks + spawn-evidence audit (orchestrator-side) ===
 VERIFY_LOG="${TMPDIR:-/tmp}/np-verify-${TASK_ID}-r${ROUND}.log"
 # Orchestrator (NOT the agent) runs the task's <verify> command + stack
 # linters; redirect stdout+stderr to $VERIFY_LOG.
 VERIFY_EXIT=$?
+# Stamp executor spawn-evidence into the audit log. EXECUTOR_TOOL_LOG is
+# the tool-name JSON array harvested from the spawn's tool_use stream
+# (e.g. '["Read","search-knowledge","Edit","Bash"]'). For AUDITED_AGENTS
+# this drives Rule 9 enforcement; the round number is sourced automatically
+# from the checkpoint by loop-audit-tool-use. The post-executor gate (Layer C)
+# refuses to advance unless this evidence stamp exists for the current round.
 node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
-  --
+  --agent "$EXECUTOR_AGENT" --tool-use-log "$EXECUTOR_TOOL_LOG"
 
 POST_EXEC=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
   --phase post-executor \
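`EXECUTOR_TOOL_LOG` is described only as the tool-name JSON array harvested from the spawn's `tool_use` stream. One way to build it is sketched here, assuming the transcript uses Anthropic Messages-style content blocks (`type: "tool_use"` with a `name` field); `harvestToolNames` and the sample transcript are hypothetical, and other LLM APIs shape their tool calls differently.

```javascript
// Assumption: assistant messages carry an array of content blocks, and each
// tool call is a block of type "tool_use" whose `name` is the tool name.
function harvestToolNames(messages) {
  const names = [];
  for (const msg of messages) {
    if (msg.role !== 'assistant' || !Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type === 'tool_use') names.push(block.name); // keeps call order
    }
  }
  return JSON.stringify(names); // candidate value for --tool-use-log
}

// Hypothetical transcript of one executor spawn.
const transcript = [
  { role: 'user', content: 'Implement task T-01' },
  { role: 'assistant', content: [
    { type: 'text', text: 'Reading the plan first.' },
    { type: 'tool_use', id: 'tu_1', name: 'Read', input: { path: 'PLAN.md' } },
  ] },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'tu_1', content: '...' }] },
  { role: 'assistant', content: [
    { type: 'tool_use', id: 'tu_2', name: 'search-knowledge', input: { query: 'audit log' } },
    { type: 'tool_use', id: 'tu_3', name: 'Edit', input: {} },
  ] },
];

console.log(harvestToolNames(transcript)); // ["Read","search-knowledge","Edit"]
```

In the workflow, a string like this would be what `EXECUTOR_TOOL_LOG` holds when `loop-audit-tool-use` is called.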
@@ -249,6 +255,19 @@ for WAVE_INDEX in 0 1 2 ...; do
 # - agents/np-critic-acceptance.md (sonnet) → CRITIC_ACCEPTANCE_JSON
 CRITIC_OUTPUTS_JSON=$(printf '[%s,%s,%s]' "$CRITIC_STYLE_JSON" "$CRITIC_TESTS_JSON" "$CRITIC_ACCEPTANCE_JSON")
 
+# === Step 5b: Stamp critic spawn-evidence (one audit entry per critic) ===
+# MANDATORY — without these three stamps, post-critics refuses with
+# `loop-post-critics-missing-critic-audit` (Layer C, ADR-0010 Trust-Layer).
+# The orchestrator MUST issue all three calls AFTER the critic spawns
+# have actually run; synthetic --critic-outputs JSON without these
+# corresponding audit entries is mechanically blocked.
+# --tool-use-log may be empty for critics (they aren't AUDITED_AGENTS for
+# Rule 9), but supplying the actual critic tool list is preferred for
+# observability on np:dashboard.
+node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-style --tool-use-log '[]'
+node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-tests --tool-use-log '[]'
+node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-acceptance --tool-use-log '[]'
+
 # === Step 6: Route via loop-evaluate (post-critics) ===
 POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
   --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
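The Layer C rule in Step 5b can be modelled as a small check. This sketch assumes `POST_CRITICS_EVIDENCE` lists the three critic agents and that the gate compares round-tagged audit entries; the function `postCriticsGate`, its return shape, and the `findings` field in the sample critic JSON are hypothetical. It illustrates why synthesized `--critic-outputs` JSON is not enough on its own: only the audit stamps count as evidence that each critic ran.

```javascript
// Assumed value: the diff exports POST_CRITICS_EVIDENCE but does not show it.
const POST_CRITICS_EVIDENCE = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Hypothetical post-critics gate: the --critic-outputs JSON is parsed, but
// only audit-log stamps for the current round prove that each critic ran.
function postCriticsGate(entries, round, criticOutputsJson) {
  const outputs = JSON.parse(criticOutputsJson);
  const missing = POST_CRITICS_EVIDENCE.filter(
    (agent) => !entries.some((a) => a.agent === agent && (Number(a.round) || 1) === round),
  );
  if (missing.length > 0) {
    return { ok: false, error: 'loop-post-critics-missing-critic-audit', missing };
  }
  return { ok: true, criticCount: outputs.length };
}

// Synthetic outputs with no audit stamps: the canonical bypass is refused.
const synthetic = '[{"findings":[]},{"findings":[]},{"findings":[]}]';
console.log(JSON.stringify(postCriticsGate([], 1, synthetic)));

// The same outputs pass once all three Step 5b stamps exist for round 1.
const stamped = POST_CRITICS_EVIDENCE.map((agent) => ({ agent, round: 1 }));
console.log(JSON.stringify(postCriticsGate(stamped, 1, synthetic)));
```

Because the check keys on the round, stamps left over from an earlier round do not satisfy a later one.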
@@ -347,7 +366,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Spawn the three Critic agents (`np-critic-style`, `np-critic-tests`, `np-critic-acceptance`) IN PARALLEL — single message, three Agent blocks per task per round.
 - Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
 - Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
-- Run `loop-audit-tool-use` per round per spawn — Rule 9 (
+- Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for the three Critic agents this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). All four audit calls per round are mandatory before the corresponding `loop-run-round --phase post-{executor|critics}` invocation.
 - Route every commit through `node .nubos-pilot/bin/np-tools.cjs commit-task` so `assertCommittablePaths` (D-25) runs.
 - Hard-stop the wave when `commit-task` returns non-zero, OR a task hits `stuck`/`plan-checker`.
 
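Taken together, these bullets fix an order of calls within each round. The planner below is a hypothetical illustration that only assembles command strings and executes nothing. The task ID `T-01`, the placeholder shell variables, and the boolean `verifyGreen` (standing in for the post-executor `next_action`) are assumptions, and the post-executor call's remaining arguments are omitted because the diff does not show them.

```javascript
const NP = 'node .nubos-pilot/bin/np-tools.cjs';
const CRITICS = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Hypothetical planner for one round: builds command strings in the order
// the workflow mandates; nothing is executed.
function planRound(taskId, round, verifyGreen) {
  // Round 1 spawns np-executor; Round >= 2 must spawn np-build-fixer instead.
  const builder = round === 1 ? 'np-executor' : 'np-build-fixer';
  const audit = (agent, toolLog) =>
    `${NP} loop-audit-tool-use ${taskId} --agent ${agent} --tool-use-log ${toolLog}`;
  const steps = [
    audit(builder, '"$EXECUTOR_TOOL_LOG"'), // stamp before the post-executor gate
    `${NP} loop-run-round ${taskId} --phase post-executor`, // remaining args not shown in the diff
  ];
  // Verify red: post-executor answers spawn-build-fixer, so critics are skipped.
  if (!verifyGreen) return steps;
  return steps.concat(
    CRITICS.map((agent) => audit(agent, "'[]'")), // all three critic stamps first
    `${NP} loop-run-round ${taskId} --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON"`,
  );
}

for (const cmd of planRound('T-01', 2, true)) console.log(cmd);
```

Listing the calls this way makes the invariant easy to check: every `loop-run-round` gate appears only after the audit stamps it depends on.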
@@ -357,7 +376,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Skip the Nubosloop and call `commit-task` directly after the executor (single-pass executor → commit is forbidden — ADR-0010).
 - Spawn the Critic agents serially — they MUST run in parallel (single message, three Agent blocks).
 - Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
-- Skip `loop-audit-tool-use`
+- Skip `loop-audit-tool-use` for ANY spawn (executor/build-fixer/the three Critics). Skipping the executor audit silences Rule 9; skipping any critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning real critic agents is the canonical bypass — Layer C blocks it mechanically.
 - Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
 - Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
 - Bundle two tasks into one commit (ADR-0004 atomicity).