nubos-pilot 0.9.5 → 0.9.7

@@ -26,6 +26,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
  Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+ ## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+ Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-acceptance --tool-use-log <json>` after you emit your findings JSON. This is the orchestrator's responsibility, not yours — but if you observe (in the verify output or task summary) that a prior round's critic-schwarm completed without an audit stamp, surface that as a finding of category `locked-decision-violation` because it indicates a bypass of ADR-0010 Layer C. The post-critics gate (`loop-run-round --phase post-critics`) refuses without the three critic stamps; missing your stamp blocks the entire round.
+
  ## Inputs
 
  The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
  Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+ ## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+ Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-style --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
  ## Inputs
 
  The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 
  Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
 
+ ## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+
+ Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-tests --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
+
  ## Inputs
 
  The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
@@ -31,18 +31,35 @@ function run(argv, ctx) {
        { hint: 'agents requiring search tools: ' + nubosloop.AUDITED_AGENTS.join(', ') },
      );
    }
-   const log = args.getJsonFlag(
-     tail,
-     '--tool-use-log',
-     'loop-audit-missing-log',
-     "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
-   );
-   if (!Array.isArray(log)) {
+   // --tool-use-log is required for AUDITED_AGENTS (Rule 9 enforcement reads
+   // the tool list to verify search-knowledge / match-existing-learning calls).
+   // For non-audited spawns (critics, plan-checker, etc.) the orchestrator may
+   // omit it — we still record the spawn for Layer-C audit-trail evidence with
+   // an empty log. Explicit empty-array is also accepted.
+   const isAuditedAgent = nubosloop.AUDITED_AGENTS.includes(agent);
+   let log;
+   if (tail.includes('--tool-use-log')) {
+     log = args.getJsonFlag(
+       tail,
+       '--tool-use-log',
+       'loop-audit-missing-log',
+       "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
+     );
+     if (!Array.isArray(log)) {
+       throw new (require('../../lib/core.cjs').NubosPilotError)(
+         'loop-audit-invalid-log',
+         '--tool-use-log must be a JSON array',
+         { got: typeof log },
+       );
+     }
+   } else if (isAuditedAgent) {
      throw new (require('../../lib/core.cjs').NubosPilotError)(
-       'loop-audit-invalid-log',
-       '--tool-use-log must be a JSON array',
-       { got: typeof log },
+       'loop-audit-missing-log',
+       'loop-audit-tool-use requires --tool-use-log for audited agent: ' + agent,
+       { hint: 'audited agents drive Rule 9 enforcement; pass --tool-use-log \'[]\' to record an empty spawn' },
      );
+   } else {
+     log = [];
    }
    const result = nubosloop.auditToolUse(taskId, agent, log, cwd);
    const payload = { task_id: taskId, ...result };
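
Reviewer note: the three-way `--tool-use-log` decision in the hunk above can be restated as a small standalone function. This is a minimal sketch under assumptions, not the package's code: `resolveToolUseLog` and `fail` are hypothetical names, a plain `Error` with a `code` property stands in for `NubosPilotError`, and the flag value is read naively from the next argv slot instead of going through `args.getJsonFlag`.

```javascript
// Hypothetical sketch of the --tool-use-log decision; names are illustrative only.
// AUDITED_AGENTS mirrors the list shown in lib/nubosloop.cjs.
const AUDITED_AGENTS = Object.freeze(['np-researcher', 'np-executor', 'np-build-fixer']);

// Stand-in for NubosPilotError: a plain Error carrying a machine-readable code.
function fail(code, message) {
  const err = new Error(message);
  err.code = code;
  return err;
}

function resolveToolUseLog(tail, agent) {
  const i = tail.indexOf('--tool-use-log');
  if (i !== -1) {
    // Flag present: the value must parse as JSON and be an array.
    let parsed;
    try {
      parsed = JSON.parse(tail[i + 1]);
    } catch (_) {
      throw fail('loop-audit-missing-log', '--tool-use-log needs a JSON value');
    }
    if (!Array.isArray(parsed)) {
      throw fail('loop-audit-invalid-log', '--tool-use-log must be a JSON array');
    }
    return parsed;
  }
  // Flag absent: audited agents must supply it, everyone else gets an empty log.
  if (AUDITED_AGENTS.includes(agent)) {
    throw fail('loop-audit-missing-log', 'audited agent requires --tool-use-log: ' + agent);
  }
  return [];
}
```

The point of the third branch is that a critic spawn still produces an audit entry (with an empty log), which is what the Layer-C gates later look for.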
@@ -349,9 +349,25 @@ test('LCLI-RR-2: loop-run-round preflight on populated store → spawn-executor-
    assert.ok(out.cache_hit);
  });
 
+ // Helper: seed the per-round spawn-evidence audit log so Layer-C gates accept
+ // post-executor / post-critics. Tests that exercise the gate explicitly
+ // (LCLI-RR-12+) build their own partial fixtures.
+ function _seedSpawnEvidence(taskId, round, agents, cwd) {
+   const nubosloop = require('../../lib/nubosloop.cjs');
+   nubosloop.recordLoopState(taskId, { round }, cwd);
+   for (const a of agents) {
+     // Pass an empty tool-use log — these are evidence stamps, not Rule 9 audits.
+     // For AUDITED_AGENTS in this test (np-executor / np-build-fixer) we need to
+     // pass a valid search-tool to avoid generating a rule-9-violation finding.
+     const log = nubosloop.AUDITED_AGENTS.includes(a) ? ['search-knowledge'] : [];
+     nubosloop.auditToolUse(taskId, a, log, cwd);
+   }
+ }
+
  test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-critic-schwarm', () => {
    const r = _mkRoot();
    checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
    const cap = _cap();
    const loopRunRound = require('./loop-run-round.cjs');
    loopRunRound.run(
@@ -366,6 +382,7 @@ test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-
  test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-build-fixer', () => {
    const r = _mkRoot();
    checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
    const cap = _cap();
    const loopRunRound = require('./loop-run-round.cjs');
    loopRunRound.run(
@@ -380,6 +397,8 @@ test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-bu
  test('LCLI-RR-5: loop-run-round phase=post-critics with zero findings → commit', () => {
    const r = _mkRoot();
    checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1,
+     ['np-executor', 'np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
    const cap = _cap();
    const loopRunRound = require('./loop-run-round.cjs');
    loopRunRound.run(
@@ -399,6 +418,10 @@ test('LCLI-RR-5b: post-critics surfaces rule-9-violation from audit log even wit
    // Round 1, executor shipped without searching → audit captures violation
    nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
    nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read', 'Edit'], r);
+   // Seed the three critic spawn evidences so the Layer-C gate is satisfied —
+   // we want the rule-9-violation to surface from the audit log, not the gate.
+   _seedSpawnEvidence('M001-S001-T0001', 1,
+     ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
    // Critics return zero findings (style/tests/acceptance all clean) — without
    // the Rule 9 chain the loop would commit. With it, the audit violation must
    // still route the round to executor.
@@ -428,6 +451,9 @@ test('LCLI-RR-5c: post-critics scopes audit findings to current round only', ()
    nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read'], r);
    nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
    nubosloop.auditToolUse('M001-S001-T0001', 'np-build-fixer', ['search-knowledge'], r);
+   // Seed critic-spawn evidence for round 2 so the Layer-C gate is satisfied.
+   _seedSpawnEvidence('M001-S001-T0001', 2,
+     ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
    const cap = _cap();
    const loopRunRound = require('./loop-run-round.cjs');
    loopRunRound.run(
@@ -540,6 +566,173 @@ test('LCLI-RR-11: phase=commit --force-commit-phase bypasses preconditions and s
    assert.equal(cp.nubosloop.forced_commit_phase, true);
  });
 
+ // Layer C — audit-trail evidence enforcement -------------------------------
+
+ test('LCLI-RR-12: post-executor refuses without np-executor audit (R1)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   // Round defaults to 1 with no audit entries.
+   const loopRunRound = require('./loop-run-round.cjs');
+   assert.throws(
+     () => loopRunRound.run(
+       ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+       { cwd: r, stdout: _cap().stub },
+     ),
+     (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+       && Array.isArray(err.details && err.details.missing)
+       && err.details.missing.includes('np-executor')
+       && err.details.round === 1,
+   );
+ });
+
+ test('LCLI-RR-13: post-executor refuses on R1 if only np-build-fixer was audited (wrong agent)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1, ['np-build-fixer'], r);
+   const loopRunRound = require('./loop-run-round.cjs');
+   assert.throws(
+     () => loopRunRound.run(
+       ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+       { cwd: r, stdout: _cap().stub },
+     ),
+     (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+       && err.details.missing.includes('np-executor'),
+   );
+ });
+
+ test('LCLI-RR-14: post-executor on R≥2 requires np-build-fixer audit, not np-executor', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   // Advance to round 2; audit only the wrong agent (np-executor).
+   const nubosloop = require('../../lib/nubosloop.cjs');
+   nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
+   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['search-knowledge'], r);
+   const loopRunRound = require('./loop-run-round.cjs');
+   assert.throws(
+     () => loopRunRound.run(
+       ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+       { cwd: r, stdout: _cap().stub },
+     ),
+     (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+       && err.details.missing.includes('np-build-fixer')
+       && err.details.round === 2,
+   );
+ });
+
+ test('LCLI-RR-15: post-critics refuses without any critic audit (synthetic-JSON bypass)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
+   // No critic-spawn audit → gate must refuse even if --critic-outputs is valid.
+   const loopRunRound = require('./loop-run-round.cjs');
+   assert.throws(
+     () => loopRunRound.run(
+       ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+         '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+       { cwd: r, stdout: _cap().stub },
+     ),
+     (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+       && Array.isArray(err.details.missing)
+       && err.details.missing.length === 3,
+   );
+ });
+
+ test('LCLI-RR-16: post-critics refuses with only 2 of 3 critic audits (partial bypass)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1,
+     ['np-executor', 'np-critic-style', 'np-critic-tests'], r); // missing acceptance
+   const loopRunRound = require('./loop-run-round.cjs');
+   assert.throws(
+     () => loopRunRound.run(
+       ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+         '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+       { cwd: r, stdout: _cap().stub },
+     ),
+     (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+       && err.details.missing.length === 1
+       && err.details.missing[0] === 'np-critic-acceptance',
+   );
+ });
+
+ test('LCLI-RR-17: --force-post-executor bypasses Layer-C gate', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   // No audit entries; force flag must let us through.
+   const cap = _cap();
+   const loopRunRound = require('./loop-run-round.cjs');
+   loopRunRound.run(
+     ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0', '--force-post-executor'],
+     { cwd: r, stdout: cap.stub },
+   );
+   const out = JSON.parse(cap.get());
+   assert.equal(out.next_action, 'spawn-critic-schwarm');
+ });
+
+ test('LCLI-RR-18: --force-post-critics bypasses Layer-C gate', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r); // executor audited, critics not
+   const cap = _cap();
+   const loopRunRound = require('./loop-run-round.cjs');
+   loopRunRound.run(
+     ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+       '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]',
+       '--force-post-critics'],
+     { cwd: r, stdout: cap.stub },
+   );
+   const out = JSON.parse(cap.get());
+   assert.equal(out.next_action, 'commit');
+ });
+
+ test('LCLI-RR-19: assertSpawnsAuditedForRound returns ordered missing list', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   const nubosloop = require('../../lib/nubosloop.cjs');
+   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+   nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+   const v = nubosloop.assertSpawnsAuditedForRound(
+     'M001-S001-T0001', nubosloop.POST_CRITICS_EVIDENCE, 1, r,
+   );
+   assert.equal(v.satisfied, false);
+   assert.deepEqual(v.missing, ['np-critic-tests', 'np-critic-acceptance']);
+ });
+
+ test('LCLI-RR-20: findSpawnAuditForRound is round-scoped (round-1 audit not visible from round-2)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   const nubosloop = require('../../lib/nubosloop.cjs');
+   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+   nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+   assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+   assert.equal(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 2, r), null);
+ });
+
+ test('LCLI-RR-21: loop-audit-tool-use accepts critics without --tool-use-log (records empty spawn)', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   const nubosloop = require('../../lib/nubosloop.cjs');
+   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+   const loopAudit = require('./loop-audit-tool-use.cjs');
+   const cap = _cap();
+   loopAudit.run(['M001-S001-T0001', '--agent', 'np-critic-style'], { cwd: r, stdout: cap.stub });
+   const out = JSON.parse(cap.get());
+   assert.equal(out.agent, 'np-critic-style');
+   assert.equal(out.violation, null); // critics aren't audited for Rule 9
+   // The audit log must still record the spawn so Layer C can find it.
+   assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+ });
+
+ test('LCLI-RR-22: loop-audit-tool-use still REQUIRES --tool-use-log for AUDITED_AGENTS', () => {
+   const r = _mkRoot();
+   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+   const loopAudit = require('./loop-audit-tool-use.cjs');
+   assert.throws(
+     () => loopAudit.run(['M001-S001-T0001', '--agent', 'np-executor'], { cwd: r, stdout: _cap().stub }),
+     (err) => err && err.code === 'loop-audit-missing-log',
+   );
+ });
+
  test('LCLI-22: learning-match queries the local store', () => {
    const r = _mkRoot();
    const lr = require('../../lib/learnings.cjs');
@@ -81,6 +81,27 @@ function _runPostExecutor(taskId, list, cwd) {
        { hint: 'pass the exit code of the task verify command' },
      );
    }
+   // Layer C: audit-trail enforcement — refuse if no executor spawn was
+   // recorded for this round via `loop-audit-tool-use`. This blocks the
+   // bypass where an orchestrator stamps verify-green without actually
+   // spawning np-executor / np-build-fixer.
+   const force = list.includes('--force-post-executor');
+   if (!force) {
+     const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
+     const round = Number((cur.nubosloop && cur.nubosloop.round)) || 1;
+     const required = round === 1 ? nubosloop.POST_EXECUTOR_EVIDENCE_R1 : nubosloop.POST_EXECUTOR_EVIDENCE_RN;
+     const verdict = nubosloop.assertSpawnsAuditedForRound(taskId, required, round, cwd);
+     if (!verdict.satisfied) {
+       throw new NubosPilotError(
+         'loop-post-executor-missing-spawn-audit',
+         'phase=post-executor refused: no `loop-audit-tool-use` record found for round=' + round +
+         ', agent=' + verdict.missing.join('/') + ' on ' + taskId + '. ' +
+         'Spawn the executor/build-fixer agent and call `loop-audit-tool-use ' + taskId +
+         ' --agent <name> --tool-use-log <json>` first, or pass --force-post-executor for an explicit override.',
+         { taskId, round, missing: verdict.missing.slice(), required: required.slice() },
+       );
+     }
+   }
    const code = Number(verifyExitCode);
    const verifyOutputPath = args.getFlag(list, '--verify-output-path');
    let verifyOutput = '';
@@ -132,6 +153,27 @@ function _runPostCritics(taskId, list, cwd) {
      const pb = cp.nubosloop || {};
      return Number(pb.round) || 1;
    })();
+   // Layer C: audit-trail enforcement — refuse if the three critic spawns
+   // (style/tests/acceptance) are not present in the audit log for this round.
+   // This blocks the bypass where an orchestrator hand-writes synthetic
+   // critic-output JSON without actually spawning the critic agents.
+   const force = list.includes('--force-post-critics');
+   if (!force) {
+     const verdict = nubosloop.assertSpawnsAuditedForRound(
+       taskId, nubosloop.POST_CRITICS_EVIDENCE, round, cwd,
+     );
+     if (!verdict.satisfied) {
+       throw new NubosPilotError(
+         'loop-post-critics-missing-critic-audit',
+         'phase=post-critics refused: critic-schwarm spawn-evidence missing for round=' + round +
+         ' on ' + taskId + ' (missing audits: ' + verdict.missing.join(', ') + '). ' +
+         'For each critic agent, call `loop-audit-tool-use ' + taskId +
+         ' --agent <np-critic-style|np-critic-tests|np-critic-acceptance> --tool-use-log <json>` ' +
+         'after the spawn, then re-run --phase post-critics. Pass --force-post-critics for an explicit override.',
+         { taskId, round, missing: verdict.missing.slice(), required: nubosloop.POST_CRITICS_EVIDENCE.slice() },
+       );
+     }
+   }
    const opts = nubosloop.resolveLoopOpts(cwd);
    // Rule 9 chain: convert this round's audit violations into rule-9-violation
    // findings so they participate in routing alongside critic findings.
@@ -77,6 +77,39 @@ When `loop.maxRounds` is hit:
  * Bad, because per-task token cost grows compared to the single-pass model. Accepted — that cost is the price of completeness, and the cache + cap bound it.
  * Bad, because the orchestrator must coordinate 1 Executor + 3 Critics + occasional Researcher-Schwarm per task. Accepted — that coordination is what makes per-task adversarial review possible.
 
+ ## Trust Layer (amended 2026-05-04)
+
+ The original spec assumed a cooperative orchestrator: each `loop-run-round --phase X` call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong — under user-pressure or budget constraints, an orchestrator can rationalize partial-loops or fully-synthetic loops while still emitting the right CLI calls. Three failure modes observed in the wild:
+
+ 1. **Single-pass bypass** — `executor → commit-task` directly, skipping the loop. (Closed by `commit-task` Layer-A gate; refuses without `nubosloop.last_phase=commit`.)
+ 2. **Stamp bypass** — `loop-run-round --phase commit` invoked directly without prior phases, just to satisfy Layer A. (Closed by Layer-B precondition in `_runCommit`; refuses without `verify_exit_code=0` and `findings: []` on the checkpoint.)
+ 3. **Synthetic-evidence bypass** — orchestrator invokes every `loop-run-round` phase but with hand-written `--critic-outputs '[{"critic":"style","findings":[]}, ...]'` JSON, never actually spawning the three critic agents. Layers A and B see a perfectly-shaped checkpoint and accept. (Closed by Layer-C audit-trail gate, this amendment.)
+
+ ### Layer-C — Spawn-evidence audit-trail
+
+ Each LLM spawn (executor, build-fixer, three critics) MUST be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --task-id … --agent <name> --tool-use-log <json>`. The round number is sourced automatically from `nubosloop.round` to keep the chain idempotent.
+
+ Two phase verbs now consult this log before accepting an advance:
+
+ * **`loop-run-round --phase post-executor`** requires an audit entry for `np-executor` (round 1) or `np-build-fixer` (round ≥ 2) in the current round. Refuses with `loop-post-executor-missing-spawn-audit` otherwise.
+ * **`loop-run-round --phase post-critics`** requires audit entries for all three: `np-critic-style`, `np-critic-tests`, `np-critic-acceptance`. Refuses with `loop-post-critics-missing-critic-audit` otherwise.
+
+ Both phases accept explicit overrides — `--force-post-executor` / `--force-post-critics` — for legitimate test fixtures and migration. The override stamps `forced_*` flags on the checkpoint so dashboards can count them.
+
+ ### Defense-in-depth summary
+
+ | Layer | Where | What it proves | Bypass cost |
+ |-------|---------------------|---------------------------------------------------------------|----------------------------------------|
+ | A | `commit-task.cjs` | The full sequence signature is on the checkpoint | Lie at all five evidence fields |
+ | B | `_runCommit` | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake `verify_exit_code=0` and `findings: []` to the checkpoint manually |
+ | C | `_runPostExecutor` + `_runPostCritics` | Each declared spawn appears in the per-round audit log | Issue extra `loop-audit-tool-use` calls naming agents that didn't actually run |
+
+ No layer is sufficient alone. Together they require a deliberate, multi-step lie across multiple verbs to bypass — far more deliberate than the "pragmatic compression" rationalizations that produced bypasses 1–3 in production.
+
+ ### What the Trust Layer cannot prove
+
+ Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic-style …` without spawning the critic. Closing this gap requires runtime instrumentation — the LLM runtime itself stamps spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stufe 2" and tracked separately; this amendment closes the practical bypass class without it.
+
  ## More Information
 
  * **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
@@ -85,3 +118,4 @@ When `loop.maxRounds` is hit:
  * **Related ADR:** [ADR-0012](0012-completeness-doctrine.md) — the loop enforces the Completeness Mandate.
  * **Concept page:** [`v1/concepts/nubosloop.md`](../../knowledge/libraries/nubos-pilot/v1/concepts/nubosloop.md).
  * **Library:** `lib/nubosloop.cjs`.
+ * **Gate code:** `bin/np-tools/commit-task.cjs::_assertLoopGate` (Layer A); `bin/np-tools/loop-run-round.cjs::_runCommit` (Layer B); `bin/np-tools/loop-run-round.cjs::_runPostExecutor` + `_runPostCritics` (Layer C).
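
Reviewer note: the Layer-C rule described in the amended ADR (which spawns must be on record per phase and round, and how the force flags short-circuit the check) can be modelled in a few lines. This is only a sketch under assumptions: `requiredFor`, `gate`, and the `stampedThisRound` array of agent names are hypothetical, chosen here for illustration. The actual gates are `_runPostExecutor` and `_runPostCritics`, which read the checkpoint and the audit log.

```javascript
// Hypothetical model of the Layer-C gates; not the package's implementation.
const CRITICS = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Which spawns must be on record before a phase may advance.
function requiredFor(phase, round) {
  if (phase === 'post-executor') {
    // Round 1 is produced by the executor, later rounds by the build-fixer.
    return round === 1 ? ['np-executor'] : ['np-build-fixer'];
  }
  if (phase === 'post-critics') return CRITICS.slice();
  return [];
}

// stampedThisRound: agent names already audited in the current round (assumed input).
function gate(phase, round, stampedThisRound, force = false) {
  // An explicit override skips the evidence check entirely.
  if (force) return { ok: true, forced: true, missing: [] };
  const missing = requiredFor(phase, round).filter((a) => !stampedThisRound.includes(a));
  return { ok: missing.length === 0, forced: false, missing };
}
```

A hand-written `--critic-outputs` payload never adds names to the stamped list, so in this model `gate('post-critics', ...)` still reports all three critics as missing unless the separate audit calls were issued.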
package/lib/nubosloop.cjs CHANGED
@@ -341,6 +341,52 @@ const SEARCH_TOOLS = Object.freeze([
 
  const AUDITED_AGENTS = Object.freeze(['np-researcher', 'np-executor', 'np-build-fixer']);
 
+ // Spawn-evidence agent groups (ADR-0010 Layer-C audit-trail enforcement).
+ // These lists are NOT about Rule 9 (which AUDITED_AGENTS gates) — they declare
+ // which spawns MUST appear in the per-round tool-use audit log before the
+ // orchestrator can advance loop-run-round through `post-executor`/`post-critics`.
+ // An entry in tool_use_audit with matching agent+round is the only evidence
+ // the gate accepts that the spawn actually happened.
+ const POST_EXECUTOR_EVIDENCE_R1 = Object.freeze(['np-executor']);
+ const POST_EXECUTOR_EVIDENCE_RN = Object.freeze(['np-build-fixer']);
+ const POST_CRITICS_EVIDENCE = Object.freeze([
+   'np-critic-style',
+   'np-critic-tests',
+   'np-critic-acceptance',
+ ]);
+
+ /**
+  * Look up a spawn-audit entry for a given (taskId, agent, round). Returns the
+  * audit entry object if found, null otherwise. Used by Layer-C gates in
+  * loop-run-round to assert that real spawns preceded each phase advance.
+  */
+ function findSpawnAuditForRound(taskId, agent, round, cwd) {
+   if (!checkpoint.TASK_ID_RE.test(taskId)) return null;
+   const target = Number(round);
+   if (!Number.isFinite(target) || target < 1) return null;
+   const audits = readToolUseAudit(taskId, cwd) || [];
+   for (const a of audits) {
+     if (!a) continue;
+     if (a.agent !== agent) continue;
+     if ((Number(a.round) || 1) !== target) continue;
+     return a;
+   }
+   return null;
+ }
+
+ /**
+  * Assert every required spawn for a phase exists in the audit log for the
+  * current round. Returns { satisfied, missing } — the orchestrator-side gate
+  * uses `missing` to compose actionable error messages.
+  */
+ function assertSpawnsAuditedForRound(taskId, requiredAgents, round, cwd) {
+   const missing = [];
+   for (const agent of requiredAgents) {
+     if (!findSpawnAuditForRound(taskId, agent, round, cwd)) missing.push(agent);
+   }
+   return { satisfied: missing.length === 0, missing };
+ }
+
  /**
   * Rule 9 mechanical check (Completeness Doctrine + ADR-0010 Step 4).
   * The orchestrator collects each spawn's tool-use log (most LLM APIs
@@ -637,6 +683,11 @@ module.exports = {
    auditToolUse,
    readToolUseAudit,
    auditFindingsForRound,
+   findSpawnAuditForRound,
+   assertSpawnsAuditedForRound,
+   POST_EXECUTOR_EVIDENCE_R1,
+   POST_EXECUTOR_EVIDENCE_RN,
+   POST_CRITICS_EVIDENCE,
    KNOWN_ROUTING_BUCKETS,
    SEARCH_TOOLS,
    AUDITED_AGENTS,
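
Reviewer note: the round-scoping behaviour of the two new exports is easiest to see on an in-memory list. The sketch below is an assumption-laden restatement, not the library code: `findAudit` and `missingAgents` are hypothetical names, the audit list is passed in directly instead of being read via `readToolUseAudit`, and the task-id validation is omitted.

```javascript
// Hypothetical in-memory version of the round-scoped lookup; illustrative only.
function findAudit(audits, agent, round) {
  const target = Number(round);
  // Reject non-numeric or non-positive rounds up front.
  if (!Number.isFinite(target) || target < 1) return null;
  for (const a of audits || []) {
    if (!a || a.agent !== agent) continue;
    // Entries without a usable round are treated as round 1, as in the diff.
    if ((Number(a.round) || 1) !== target) continue;
    return a;
  }
  return null;
}

// Returns the required agents that have no entry for the given round,
// preserving the order of the `required` list.
function missingAgents(audits, required, round) {
  return required.filter((agent) => !findAudit(audits, agent, round));
}
```

Because the match is on both agent and round, a stamp from round 1 does not satisfy a gate evaluated in round 2, which is the property test LCLI-RR-20 checks.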
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "nubos-pilot",
-   "version": "0.9.5",
+   "version": "0.9.7",
    "description": "AI-driven planning and execution tool for code projects",
    "homepage": "https://github.com/Nubos-AI/nubos-pilot",
    "repository": {
@@ -223,13 +223,19 @@ for WAVE_INDEX in 0 1 2 ...; do
 
  node .nubos-pilot/bin/np-tools.cjs checkpoint transition "$TASK_ID" verifying
 
- # === Step 4: Mechanical Checks + tool-use audit (orchestrator-side) ===
+ # === Step 4: Mechanical Checks + spawn-evidence audit (orchestrator-side) ===
  VERIFY_LOG="${TMPDIR:-/tmp}/np-verify-${TASK_ID}-r${ROUND}.log"
  # Orchestrator (NOT the agent) runs the task's <verify> command + stack
  # linters; redirect stdout+stderr to $VERIFY_LOG.
  VERIFY_EXIT=$?
+ # Stamp executor spawn-evidence into the audit log. EXECUTOR_TOOL_LOG is
+ # the tool-name JSON array harvested from the spawn's tool_use stream
+ # (e.g. '["Read","search-knowledge","Edit","Bash"]'). For AUDITED_AGENTS
+ # this drives Rule 9 enforcement; the round number is sourced automatically
+ # from the checkpoint by loop-audit-tool-use. The post-executor gate (Layer C)
+ # refuses to advance unless this evidence stamp exists for the current round.
  node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
-   --round "$ROUND" --agent "$EXECUTOR_AGENT"
+   --agent "$EXECUTOR_AGENT" --tool-use-log "$EXECUTOR_TOOL_LOG"
 
  POST_EXEC=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
    --phase post-executor \
@@ -249,6 +255,19 @@ for WAVE_INDEX in 0 1 2 ...; do
  # - agents/np-critic-acceptance.md (sonnet) → CRITIC_ACCEPTANCE_JSON
  CRITIC_OUTPUTS_JSON=$(printf '[%s,%s,%s]' "$CRITIC_STYLE_JSON" "$CRITIC_TESTS_JSON" "$CRITIC_ACCEPTANCE_JSON")
 
+ # === Step 5b: Stamp critic spawn-evidence (one audit entry per critic) ===
+ # MANDATORY — without these three stamps, post-critics refuses with
+ # `loop-post-critics-missing-critic-audit` (Layer C, ADR-0010 Trust-Layer).
+ # The orchestrator MUST issue all three calls AFTER the critic spawns
+ # have actually run; synthetic --critic-outputs JSON without these
+ # corresponding audit entries is mechanically blocked.
+ # --tool-use-log may be empty for critics (they aren't AUDITED_AGENTS for
+ # Rule 9), but supplying the actual critic tool list is preferred for
+ # observability on np:dashboard.
+ node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-style --tool-use-log '[]'
+ node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-tests --tool-use-log '[]'
+ node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-acceptance --tool-use-log '[]'
+
  # === Step 6: Route via loop-evaluate (post-critics) ===
  POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
    --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
@@ -347,7 +366,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
  - Spawn the three Critic agents (`np-critic-style`, `np-critic-tests`, `np-critic-acceptance`) IN PARALLEL — single message, three Agent blocks per task per round.
  - Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
  - Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
- - Run `loop-audit-tool-use` per round per spawn — Rule 9 (search-knowledge / match-existing-learning) is mechanically enforced.
+ - Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for the three Critic agents this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). All four audit calls per round are mandatory before the corresponding `loop-run-round --phase post-{executor|critics}` invocation.
  - Route every commit through `node .nubos-pilot/bin/np-tools.cjs commit-task` so `assertCommittablePaths` (D-25) runs.
  - Hard-stop the wave when `commit-task` returns non-zero, OR a task hits `stuck`/`plan-checker`.
 
@@ -357,7 +376,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
  - Skip the Nubosloop and call `commit-task` directly after the executor (single-pass executor → commit is forbidden — ADR-0010).
  - Spawn the Critic agents serially — they MUST run in parallel (single message, three Agent blocks).
  - Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
- - Skip `loop-audit-tool-use` Rule 9 violations must surface as `rule-9-violation` findings, not be silenced.
+ - Skip `loop-audit-tool-use` for ANY spawn (executor/build-fixer/the three Critics). Skipping the executor audit silences Rule 9; skipping any critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning real critic agents is the canonical bypass — Layer C blocks it mechanically.
  - Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
  - Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
  - Bundle two tasks into one commit (ADR-0004 atomicity).