npm - nubos-pilot - Versions diffs - 0.9.5 → 0.9.7 - Mend

nubos-pilot 0.9.5 → 0.9.7

Files changed (10)

package/agents/np-critic-acceptance.md +4 -0
package/agents/np-critic-style.md +4 -0
package/agents/np-critic-tests.md +4 -0
package/bin/np-tools/loop-audit-tool-use.cjs +27 -10
package/bin/np-tools/loop-commands.test.cjs +193 -0
package/bin/np-tools/loop-run-round.cjs +42 -0
package/docs/adr/0010-nubosloop.md +34 -0
package/lib/nubosloop.cjs +51 -0
package/package.json +1 -1
package/workflows/execute-phase.md +23 -4

package/agents/np-critic-acceptance.md

@@ -26,6 +26,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-acceptance --tool-use-log <json>` after you emit your findings JSON. This is the orchestrator's responsibility, not yours — but if you observe (in the verify output or task summary) that a prior round's critic-schwarm completed without an audit stamp, surface that as a finding of category `locked-decision-violation` because it indicates a bypass of ADR-0010 Layer C. The post-critics gate (`loop-run-round --phase post-critics`) refuses without the three critic stamps; missing your stamp blocks the entire round.
 ## Inputs
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.

package/agents/np-critic-style.md

@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-style --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
 ## Inputs
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.

package/agents/np-critic-tests.md

@@ -25,6 +25,10 @@ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENES
 Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic-tests --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without the three critic stamps; missing your stamp blocks the entire round. Synthesizing a fake findings JSON without spawning your sibling critics is a Layer-C violation and the orchestrator must NOT do it.
 ## Inputs
 The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.

package/bin/np-tools/loop-audit-tool-use.cjs

@@ -31,18 +31,35 @@ function run(argv, ctx) {
       { hint: 'agents requiring search tools: ' + nubosloop.AUDITED_AGENTS.join(', ') },
     );
   }
-  const log = args.getJsonFlag(
-    tail,
-    '--tool-use-log',
-    'loop-audit-missing-log',
-    "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
-  );
-  if (!Array.isArray(log)) {
+  // --tool-use-log is required for AUDITED_AGENTS (Rule 9 enforcement reads
+  // the tool list to verify search-knowledge / match-existing-learning calls).
+  // For non-audited spawns (critics, plan-checker, etc.) the orchestrator may
+  // omit it — we still record the spawn for Layer-C audit-trail evidence with
+  // an empty log. Explicit empty-array is also accepted.
+  const isAuditedAgent = nubosloop.AUDITED_AGENTS.includes(agent);
+  let log;
+  if (tail.includes('--tool-use-log')) {
+    log = args.getJsonFlag(
+      tail,
+      '--tool-use-log',
+      'loop-audit-missing-log',
+      "JSON array of tool-name strings, e.g. '[\"Read\",\"search-knowledge\",\"Edit\"]'",
+    );
+    if (!Array.isArray(log)) {
+      throw new (require('../../lib/core.cjs').NubosPilotError)(
+        'loop-audit-invalid-log',
+        '--tool-use-log must be a JSON array',
+        { got: typeof log },
+      );
+    }
+  } else if (isAuditedAgent) {
     throw new (require('../../lib/core.cjs').NubosPilotError)(
-      'loop-audit-invalid-log',
-      '--tool-use-log must be a JSON array',
-      { got: typeof log },
+      'loop-audit-missing-log',
+      'loop-audit-tool-use requires --tool-use-log for audited agent: ' + agent,
+      { hint: 'audited agents drive Rule 9 enforcement; pass --tool-use-log \'[]\' to record an empty spawn' },
     );
+  } else {
+    log = [];
   }
   const result = nubosloop.auditToolUse(taskId, agent, log, cwd);
   const payload = { task_id: taskId, ...result };
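
The new branching around `--tool-use-log` can be summarized as a small decision function. The sketch below is not the package's code: `resolveToolUseLog` is a hypothetical name, the `AUDITED_AGENTS` list is copied from `lib/nubosloop.cjs` as it appears in this diff, and plain `Error` objects carrying a `code` property stand in for `NubosPilotError`. Parsing uses `JSON.parse` directly, because the behavior of `args.getJsonFlag` is not visible in the diff.

```javascript
// Copied from lib/nubosloop.cjs as shown in this diff.
const AUDITED_AGENTS = ['np-researcher', 'np-executor', 'np-build-fixer'];

// Hypothetical helper (not a nubos-pilot export): decide which tool-use log
// to record for a spawn, following the three branches added in 0.9.7.
function resolveToolUseLog(tail, agent) {
  const idx = tail.indexOf('--tool-use-log');
  if (idx !== -1) {
    // Flag present: the value must be a JSON array, whatever the agent.
    // Assumption: malformed JSON simply surfaces JSON.parse's SyntaxError here.
    const log = JSON.parse(tail[idx + 1]);
    if (!Array.isArray(log)) {
      const err = new Error('--tool-use-log must be a JSON array');
      err.code = 'loop-audit-invalid-log';
      throw err;
    }
    return log;
  }
  if (AUDITED_AGENTS.includes(agent)) {
    // Flag absent for an audited agent: Rule 9 needs the tool list, so refuse.
    const err = new Error('--tool-use-log is required for audited agent: ' + agent);
    err.code = 'loop-audit-missing-log';
    throw err;
  }
  // Flag absent for a non-audited agent (e.g. a critic): record an empty spawn.
  return [];
}
```

Under these assumptions, `resolveToolUseLog([], 'np-critic-style')` yields an empty array, while `resolveToolUseLog([], 'np-executor')` throws with code `loop-audit-missing-log`, which matches what tests LCLI-RR-21 and LCLI-RR-22 assert against the real command.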

package/bin/np-tools/loop-commands.test.cjs

@@ -349,9 +349,25 @@ test('LCLI-RR-2: loop-run-round preflight on populated store → spawn-executor-
   assert.ok(out.cache_hit);
 });
+// Helper: seed the per-round spawn-evidence audit log so Layer-C gates accept
+// post-executor / post-critics. Tests that exercise the gate explicitly
+// (LCLI-RR-12+) build their own partial fixtures.
+function _seedSpawnEvidence(taskId, round, agents, cwd) {
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState(taskId, { round }, cwd);
+  for (const a of agents) {
+    // Pass an empty tool-use log — these are evidence stamps, not Rule 9 audits.
+    // For AUDITED_AGENTS in this test (np-executor / np-build-fixer) we need to
+    // pass a valid search-tool to avoid generating a rule-9-violation finding.
+    const log = nubosloop.AUDITED_AGENTS.includes(a) ? ['search-knowledge'] : [];
+    nubosloop.auditToolUse(taskId, a, log, cwd);
+  }
+}
 test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-critic-schwarm', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -366,6 +382,7 @@ test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-
 test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-build-fixer', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -380,6 +397,8 @@ test('LCLI-RR-4: loop-run-round phase=post-executor with verify-red → spawn-bu
 test('LCLI-RR-5: loop-run-round phase=post-critics with zero findings → commit', () => {
   const r = _mkRoot();
   checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -399,6 +418,10 @@ test('LCLI-RR-5b: post-critics surfaces rule-9-violation from audit log even wit
   // Round 1, executor shipped without searching → audit captures violation
   nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read', 'Edit'], r);
+  // Seed the three critic spawn evidences so the Layer-C gate is satisfied —
+  // we want the rule-9-violation to surface from the audit log, not the gate.
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   // Critics return zero findings (style/tests/acceptance all clean) — without
   // the Rule 9 chain the loop would commit. With it, the audit violation must
   // still route the round to executor.
@@ -428,6 +451,9 @@ test('LCLI-RR-5c: post-critics scopes audit findings to current round only', ()
   nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['Read'], r);
   nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
   nubosloop.auditToolUse('M001-S001-T0001', 'np-build-fixer', ['search-knowledge'], r);
+  // Seed critic-spawn evidence for round 2 so the Layer-C gate is satisfied.
+  _seedSpawnEvidence('M001-S001-T0001', 2,
+    ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'], r);
   const cap = _cap();
   const loopRunRound = require('./loop-run-round.cjs');
   loopRunRound.run(
@@ -540,6 +566,173 @@ test('LCLI-RR-11: phase=commit --force-commit-phase bypasses preconditions and s
   assert.equal(cp.nubosloop.forced_commit_phase, true);
 });
+// Layer C — audit-trail evidence enforcement -------------------------------
+test('LCLI-RR-12: post-executor refuses without np-executor audit (R1)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Round defaults to 1 with no audit entries.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && Array.isArray(err.details && err.details.missing)
+      && err.details.missing.includes('np-executor')
+      && err.details.round === 1,
+  );
+});
+test('LCLI-RR-13: post-executor refuses on R1 if only np-build-fixer was audited (wrong agent)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-build-fixer'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-executor'),
+  );
+});
+test('LCLI-RR-14: post-executor on R≥2 requires np-build-fixer audit, not np-executor', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // Advance to round 2; audit only the wrong agent (np-executor).
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 2 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-executor', ['search-knowledge'], r);
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-executor-missing-spawn-audit'
+      && err.details.missing.includes('np-build-fixer')
+      && err.details.round === 2,
+  );
+});
+test('LCLI-RR-15: post-critics refuses without any critic audit (synthetic-JSON bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r);
+  // No critic-spawn audit → gate must refuse even if --critic-outputs is valid.
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && Array.isArray(err.details.missing)
+      && err.details.missing.length === 3,
+  );
+});
+test('LCLI-RR-16: post-critics refuses with only 2 of 3 critic audits (partial bypass)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1,
+    ['np-executor', 'np-critic-style', 'np-critic-tests'], r); // missing acceptance
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+        '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]'],
+      { cwd: r, stdout: _cap().stub },
+    ),
+    (err) => err && err.code === 'loop-post-critics-missing-critic-audit'
+      && err.details.missing.length === 1
+      && err.details.missing[0] === 'np-critic-acceptance',
+  );
+});
+test('LCLI-RR-17: --force-post-executor bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  // No audit entries; force flag must let us through.
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-executor', '--verify-exit-code', '0', '--force-post-executor'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'spawn-critic-schwarm');
+});
+test('LCLI-RR-18: --force-post-critics bypasses Layer-C gate', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor'], r); // executor audited, critics not
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs',
+      '[{"critic":"style","findings":[]},{"critic":"tests","findings":[]},{"critic":"acceptance","findings":[],"criteria":[]}]',
+     '--force-post-critics'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit');
+});
+test('LCLI-RR-19: assertSpawnsAuditedForRound returns ordered missing list', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  const v = nubosloop.assertSpawnsAuditedForRound(
+    'M001-S001-T0001', nubosloop.POST_CRITICS_EVIDENCE, 1, r,
+  );
+  assert.equal(v.satisfied, false);
+  assert.deepEqual(v.missing, ['np-critic-tests', 'np-critic-acceptance']);
+});
+test('LCLI-RR-20: findSpawnAuditForRound is round-scoped (round-1 audit not visible from round-2)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  nubosloop.auditToolUse('M001-S001-T0001', 'np-critic-style', [], r);
+  assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+  assert.equal(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 2, r), null);
+});
+test('LCLI-RR-21: loop-audit-tool-use accepts critics without --tool-use-log (records empty spawn)', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const nubosloop = require('../../lib/nubosloop.cjs');
+  nubosloop.recordLoopState('M001-S001-T0001', { round: 1 }, r);
+  const loopAudit = require('./loop-audit-tool-use.cjs');
+  const cap = _cap();
+  loopAudit.run(['M001-S001-T0001', '--agent', 'np-critic-style'], { cwd: r, stdout: cap.stub });
+  const out = JSON.parse(cap.get());
+  assert.equal(out.agent, 'np-critic-style');
+  assert.equal(out.violation, null); // critics aren't audited for Rule 9
+  // The audit log must still record the spawn so Layer C can find it.
+  assert.ok(nubosloop.findSpawnAuditForRound('M001-S001-T0001', 'np-critic-style', 1, r));
+});
+test('LCLI-RR-22: loop-audit-tool-use still REQUIRES --tool-use-log for AUDITED_AGENTS', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const loopAudit = require('./loop-audit-tool-use.cjs');
+  assert.throws(
+    () => loopAudit.run(['M001-S001-T0001', '--agent', 'np-executor'], { cwd: r, stdout: _cap().stub }),
+    (err) => err && err.code === 'loop-audit-missing-log',
+  );
+});
 test('LCLI-22: learning-match queries the local store', () => {
   const r = _mkRoot();
   const lr = require('../../lib/learnings.cjs');

package/bin/np-tools/loop-run-round.cjs

@@ -81,6 +81,27 @@ function _runPostExecutor(taskId, list, cwd) {
       { hint: 'pass the exit code of the task verify command' },
     );
   }
+  // Layer C: audit-trail enforcement — refuse if no executor spawn was
+  // recorded for this round via `loop-audit-tool-use`. This blocks the
+  // bypass where an orchestrator stamps verify-green without actually
+  // spawning np-executor / np-build-fixer.
+  const force = list.includes('--force-post-executor');
+  if (!force) {
+    const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
+    const round = Number((cur.nubosloop && cur.nubosloop.round)) || 1;
+    const required = round === 1 ? nubosloop.POST_EXECUTOR_EVIDENCE_R1 : nubosloop.POST_EXECUTOR_EVIDENCE_RN;
+    const verdict = nubosloop.assertSpawnsAuditedForRound(taskId, required, round, cwd);
+    if (!verdict.satisfied) {
+      throw new NubosPilotError(
+        'loop-post-executor-missing-spawn-audit',
+        'phase=post-executor refused: no `loop-audit-tool-use` record found for round=' + round +
+        ', agent=' + verdict.missing.join('/') + ' on ' + taskId + '. ' +
+        'Spawn the executor/build-fixer agent and call `loop-audit-tool-use ' + taskId +
+        ' --agent <name> --tool-use-log <json>` first, or pass --force-post-executor for an explicit override.',
+        { taskId, round, missing: verdict.missing.slice(), required: required.slice() },
+      );
+    }
+  }
   const code = Number(verifyExitCode);
   const verifyOutputPath = args.getFlag(list, '--verify-output-path');
   let verifyOutput = '';
@@ -132,6 +153,27 @@ function _runPostCritics(taskId, list, cwd) {
     const pb = cp.nubosloop || {};
     return Number(pb.round) || 1;
   })();
+  // Layer C: audit-trail enforcement — refuse if the three critic spawns
+  // (style/tests/acceptance) are not present in the audit log for this round.
+  // This blocks the bypass where an orchestrator hand-writes synthetic
+  // critic-output JSON without actually spawning the critic agents.
+  const force = list.includes('--force-post-critics');
+  if (!force) {
+    const verdict = nubosloop.assertSpawnsAuditedForRound(
+      taskId, nubosloop.POST_CRITICS_EVIDENCE, round, cwd,
+    );
+    if (!verdict.satisfied) {
+      throw new NubosPilotError(
+        'loop-post-critics-missing-critic-audit',
+        'phase=post-critics refused: critic-schwarm spawn-evidence missing for round=' + round +
+        ' on ' + taskId + ' (missing audits: ' + verdict.missing.join(', ') + '). ' +
+        'For each critic agent, call `loop-audit-tool-use ' + taskId +
+        ' --agent <np-critic-style|np-critic-tests|np-critic-acceptance> --tool-use-log <json>` ' +
+        'after the spawn, then re-run --phase post-critics. Pass --force-post-critics for an explicit override.',
+        { taskId, round, missing: verdict.missing.slice(), required: nubosloop.POST_CRITICS_EVIDENCE.slice() },
+      );
+    }
+  }
   const opts = nubosloop.resolveLoopOpts(cwd);
   // Rule 9 chain: convert this round's audit violations into rule-9-violation
   // findings so they participate in routing alongside critic findings.
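
Both gates added above follow the same pattern: pick the required agent list for the phase and round, compare it with what was stamped, and refuse unless a force flag is set. A minimal sketch of that pattern, assuming the audit log has already been reduced to a list of agent names for the current round. `requiredEvidence` and `layerCGate` are hypothetical names, and the returned verdict object is an illustration (the real functions throw or fall through).

```javascript
// Evidence lists as declared in lib/nubosloop.cjs in this diff.
const POST_EXECUTOR_EVIDENCE_R1 = ['np-executor'];
const POST_EXECUTOR_EVIDENCE_RN = ['np-build-fixer'];
const POST_CRITICS_EVIDENCE = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];

// Hypothetical helper: which agents must be stamped before a phase may advance.
function requiredEvidence(phase, round) {
  if (phase === 'post-executor') {
    return round === 1 ? POST_EXECUTOR_EVIDENCE_R1 : POST_EXECUTOR_EVIDENCE_RN;
  }
  if (phase === 'post-critics') return POST_CRITICS_EVIDENCE;
  return [];
}

// Hypothetical gate: `stamped` holds the agent names audited in this round.
function layerCGate(phase, round, stamped, force) {
  // An explicit override skips the evidence check entirely.
  if (force) return { satisfied: true, missing: [] };
  const missing = requiredEvidence(phase, round).filter((a) => !stamped.includes(a));
  if (missing.length > 0) {
    const err = new Error('phase=' + phase + ' refused: missing spawn audit for ' + missing.join(', '));
    err.code = phase === 'post-executor'
      ? 'loop-post-executor-missing-spawn-audit'
      : 'loop-post-critics-missing-critic-audit';
    err.details = { round, missing };
    throw err;
  }
  return { satisfied: true, missing: [] };
}
```

For example, `layerCGate('post-executor', 2, ['np-executor'], false)` throws with `missing` equal to `['np-build-fixer']`, the wrong-agent case that LCLI-RR-14 exercises against the real gate.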

package/docs/adr/0010-nubosloop.md

@@ -77,6 +77,39 @@ When `loop.maxRounds` is hit:
 * Bad, because per-task token cost grows compared to the single-pass model. Accepted — that cost is the price of completeness, and the cache + cap bound it.
 * Bad, because the orchestrator must coordinate 1 Executor + 3 Critics + occasional Researcher-Schwarm per task. Accepted — that coordination is what makes per-task adversarial review possible.
+## Trust Layer (amended 2026-05-04)
+The original spec assumed a cooperative orchestrator: each `loop-run-round --phase X` call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong — under user-pressure or budget constraints, an orchestrator can rationalize partial-loops or fully-synthetic loops while still emitting the right CLI calls. Three failure modes observed in the wild:
+1. **Single-pass bypass** — `executor → commit-task` directly, skipping the loop. (Closed by `commit-task` Layer-A gate; refuses without `nubosloop.last_phase=commit`.)
+2. **Stamp bypass** — `loop-run-round --phase commit` invoked directly without prior phases, just to satisfy Layer A. (Closed by Layer-B precondition in `_runCommit`; refuses without `verify_exit_code=0` and `findings: []` on the checkpoint.)
+3. **Synthetic-evidence bypass** — orchestrator invokes every `loop-run-round` phase but with hand-written `--critic-outputs '[{"critic":"style","findings":[]}, ...]'` JSON, never actually spawning the three critic agents. Layers A and B see a perfectly-shaped checkpoint and accept. (Closed by Layer-C audit-trail gate, this amendment.)
+### Layer-C — Spawn-evidence audit-trail
+Each LLM spawn (executor, build-fixer, three critics) MUST be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --task-id … --agent <name> --tool-use-log <json>`. The round number is sourced automatically from `nubosloop.round` to keep the chain idempotent.
+Two phase verbs now consult this log before accepting an advance:
+* **`loop-run-round --phase post-executor`** requires an audit entry for `np-executor` (round 1) or `np-build-fixer` (round ≥ 2) in the current round. Refuses with `loop-post-executor-missing-spawn-audit` otherwise.
+* **`loop-run-round --phase post-critics`** requires audit entries for all three: `np-critic-style`, `np-critic-tests`, `np-critic-acceptance`. Refuses with `loop-post-critics-missing-critic-audit` otherwise.
+Both phases accept explicit overrides — `--force-post-executor` / `--force-post-critics` — for legitimate test fixtures and migration. The override stamps `forced_*` flags on the checkpoint so dashboards can count them.
+### Defense-in-depth summary
+| Layer | Where               | What it proves                                                | Bypass cost                            |
+|-------|---------------------|---------------------------------------------------------------|----------------------------------------|
+| A     | `commit-task.cjs`   | The full sequence signature is on the checkpoint              | Lie at all five evidence fields        |
+| B     | `_runCommit`        | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake `verify_exit_code=0` and `findings: []` to the checkpoint manually |
+| C     | `_runPostExecutor` + `_runPostCritics` | Each declared spawn appears in the per-round audit log | Issue extra `loop-audit-tool-use` calls naming agents that didn't actually run |
+No layer is sufficient alone. Together they require a deliberate, multi-step lie across multiple verbs to bypass — far more deliberate than the "pragmatic compression" rationalizations that produced bypasses 1–3 in production.
+### What the Trust Layer cannot prove
+Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic-style …` without spawning the critic. Closing this gap requires runtime instrumentation — the LLM runtime itself stamps spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stufe 2" and tracked separately; this amendment closes the practical bypass class without it.
 ## More Information
 * **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
@@ -85,3 +118,4 @@ When `loop.maxRounds` is hit:
 * **Related ADR:** [ADR-0012](0012-completeness-doctrine.md) — the loop enforces the Completeness Mandate.
 * **Concept page:** [`v1/concepts/nubosloop.md`](../../knowledge/libraries/nubos-pilot/v1/concepts/nubosloop.md).
 * **Library:** `lib/nubosloop.cjs`.
+* **Gate code:** `bin/np-tools/commit-task.cjs::_assertLoopGate` (Layer A); `bin/np-tools/loop-run-round.cjs::_runCommit` (Layer B); `bin/np-tools/loop-run-round.cjs::_runPostExecutor` + `_runPostCritics` (Layer C).

package/lib/nubosloop.cjs

@@ -341,6 +341,52 @@ const SEARCH_TOOLS = Object.freeze([
 const AUDITED_AGENTS = Object.freeze(['np-researcher', 'np-executor', 'np-build-fixer']);
+// Spawn-evidence agent groups (ADR-0010 Layer-C audit-trail enforcement).
+// These lists are NOT about Rule 9 (which AUDITED_AGENTS gates) — they declare
+// which spawns MUST appear in the per-round tool-use audit log before the
+// orchestrator can advance loop-run-round through `post-executor`/`post-critics`.
+// An entry in tool_use_audit with matching agent+round is the only evidence
+// the gate accepts that the spawn actually happened.
+const POST_EXECUTOR_EVIDENCE_R1 = Object.freeze(['np-executor']);
+const POST_EXECUTOR_EVIDENCE_RN = Object.freeze(['np-build-fixer']);
+const POST_CRITICS_EVIDENCE = Object.freeze([
+  'np-critic-style',
+  'np-critic-tests',
+  'np-critic-acceptance',
+]);
+/**
+ * Look up a spawn-audit entry for a given (taskId, agent, round). Returns the
+ * audit entry object if found, null otherwise. Used by Layer-C gates in
+ * loop-run-round to assert that real spawns preceded each phase advance.
+ */
+function findSpawnAuditForRound(taskId, agent, round, cwd) {
+  if (!checkpoint.TASK_ID_RE.test(taskId)) return null;
+  const target = Number(round);
+  if (!Number.isFinite(target) || target < 1) return null;
+  const audits = readToolUseAudit(taskId, cwd) || [];
+  for (const a of audits) {
+    if (!a) continue;
+    if (a.agent !== agent) continue;
+    if ((Number(a.round) || 1) !== target) continue;
+    return a;
+  }
+  return null;
+}
+/**
+ * Assert every required spawn for a phase exists in the audit log for the
+ * current round. Returns { satisfied, missing } — the orchestrator-side gate
+ * uses `missing` to compose actionable error messages.
+ */
+function assertSpawnsAuditedForRound(taskId, requiredAgents, round, cwd) {
+  const missing = [];
+  for (const agent of requiredAgents) {
+    if (!findSpawnAuditForRound(taskId, agent, round, cwd)) missing.push(agent);
+  }
+  return { satisfied: missing.length === 0, missing };
+}
 /**
  * Rule 9 mechanical check (Completeness Doctrine + ADR-0010 Step 4).
  * The orchestrator collects each spawn's tool-use log (most LLM APIs
@@ -637,6 +683,11 @@ module.exports = {
   auditToolUse,
   readToolUseAudit,
   auditFindingsForRound,
+  findSpawnAuditForRound,
+  assertSpawnsAuditedForRound,
+  POST_EXECUTOR_EVIDENCE_R1,
+  POST_EXECUTOR_EVIDENCE_RN,
+  POST_CRITICS_EVIDENCE,
   KNOWN_ROUTING_BUCKETS,
   SEARCH_TOOLS,
   AUDITED_AGENTS,
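
The round scoping of the two new helpers is easier to see without the checkpoint I/O. The sketch below is an in-memory variant under stated assumptions: `findSpawnAudit` and `missingSpawns` are hypothetical names, the audit array is passed in rather than read through `readToolUseAudit`, the task-ID validation is omitted, and the entry shape is assumed to carry at least `agent` and `round`.

```javascript
// Hypothetical in-memory variant of findSpawnAuditForRound.
function findSpawnAudit(audits, agent, round) {
  const target = Number(round);
  if (!Number.isFinite(target) || target < 1) return null;
  for (const a of audits || []) {
    if (!a || a.agent !== agent) continue;
    // Entries without a usable round count as round 1, as in the diff.
    if ((Number(a.round) || 1) !== target) continue;
    return a;
  }
  return null;
}

// Hypothetical in-memory variant of assertSpawnsAuditedForRound. The missing
// list keeps the order of requiredAgents.
function missingSpawns(audits, requiredAgents, round) {
  const missing = requiredAgents.filter((agent) => !findSpawnAudit(audits, agent, round));
  return { satisfied: missing.length === 0, missing };
}

// Assumed entry shape: only `agent` and `round` are relied on here.
const audits = [
  { agent: 'np-critic-style', round: 1 },
  { agent: 'np-critic-tests', round: 2 },
];
const verdict = missingSpawns(
  audits,
  ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'],
  1,
);
// verdict.satisfied is false; verdict.missing is
// ['np-critic-tests', 'np-critic-acceptance'] because the tests stamp
// belongs to round 2 and is not visible from round 1.
console.log(verdict);
```

Matching on both agent and round is what makes a stamp from an earlier round useless for a later one, which is the property LCLI-RR-20 checks.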

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "nubos-pilot",
-  "version": "0.9.5",
+  "version": "0.9.7",
   "description": "AI-driven planning and execution tool for code projects",
   "homepage": "https://github.com/Nubos-AI/nubos-pilot",
   "repository": {

package/workflows/execute-phase.md

@@ -223,13 +223,19 @@ for WAVE_INDEX in 0 1 2 ...; do
       node .nubos-pilot/bin/np-tools.cjs checkpoint transition "$TASK_ID" verifying
-      # === Step 4: Mechanical Checks + tool-use audit (orchestrator-side) ===
+      # === Step 4: Mechanical Checks + spawn-evidence audit (orchestrator-side) ===
       VERIFY_LOG="${TMPDIR:-/tmp}/np-verify-${TASK_ID}-r${ROUND}.log"
       # Orchestrator (NOT the agent) runs the task's <verify> command + stack
       # linters; redirect stdout+stderr to $VERIFY_LOG.
       VERIFY_EXIT=$?
+      # Stamp executor spawn-evidence into the audit log. EXECUTOR_TOOL_LOG is
+      # the tool-name JSON array harvested from the spawn's tool_use stream
+      # (e.g. '["Read","search-knowledge","Edit","Bash"]'). For AUDITED_AGENTS
+      # this drives Rule 9 enforcement; the round number is sourced automatically
+      # from the checkpoint by loop-audit-tool-use. The post-executor gate (Layer C)
+      # refuses to advance unless this evidence stamp exists for the current round.
       node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
-        --round "$ROUND" --agent "$EXECUTOR_AGENT"
+        --agent "$EXECUTOR_AGENT" --tool-use-log "$EXECUTOR_TOOL_LOG"
       POST_EXEC=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
         --phase post-executor \
@@ -249,6 +255,19 @@ for WAVE_INDEX in 0 1 2 ...; do
       #   - agents/np-critic-acceptance.md  (sonnet) → CRITIC_ACCEPTANCE_JSON
       CRITIC_OUTPUTS_JSON=$(printf '[%s,%s,%s]' "$CRITIC_STYLE_JSON" "$CRITIC_TESTS_JSON" "$CRITIC_ACCEPTANCE_JSON")
+      # === Step 5b: Stamp critic spawn-evidence (one audit entry per critic) ===
+      # MANDATORY — without these three stamps, post-critics refuses with
+      # `loop-post-critics-missing-critic-audit` (Layer C, ADR-0010 Trust-Layer).
+      # The orchestrator MUST issue all three calls AFTER the critic spawns
+      # have actually run; synthetic --critic-outputs JSON without these
+      # corresponding audit entries is mechanically blocked.
+      # --tool-use-log may be empty for critics (they aren't AUDITED_AGENTS for
+      # Rule 9), but supplying the actual critic tool list is preferred for
+      # observability on np:dashboard.
+      node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-style      --tool-use-log '[]'
+      node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-tests      --tool-use-log '[]'
+      node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic-acceptance --tool-use-log '[]'
       # === Step 6: Route via loop-evaluate (post-critics) ===
       POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
         --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
@@ -347,7 +366,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Spawn the three Critic agents (`np-critic-style`, `np-critic-tests`, `np-critic-acceptance`) IN PARALLEL — single message, three Agent blocks per task per round.
 - Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
 - Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
-- Run `loop-audit-tool-use` per round per spawn — Rule 9 (search-knowledge / match-existing-learning) is mechanically enforced.
+- Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for the three Critic agents this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). All four audit calls per round are mandatory before the corresponding `loop-run-round --phase post-{executor|critics}` invocation.
 - Route every commit through `node .nubos-pilot/bin/np-tools.cjs commit-task` so `assertCommittablePaths` (D-25) runs.
 - Hard-stop the wave when `commit-task` returns non-zero, OR a task hits `stuck`/`plan-checker`.
@@ -357,7 +376,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Skip the Nubosloop and call `commit-task` directly after the executor (single-pass executor → commit is forbidden — ADR-0010).
 - Spawn the Critic agents serially — they MUST run in parallel (single message, three Agent blocks).
 - Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
-- Skip `loop-audit-tool-use` — Rule 9 violations must surface as `rule-9-violation` findings, not be silenced.
+- Skip `loop-audit-tool-use` for ANY spawn (executor/build-fixer/the three Critics). Skipping the executor audit silences Rule 9; skipping any critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning real critic agents is the canonical bypass — Layer C blocks it mechanically.
 - Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
 - Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
 - Bundle two tasks into one commit (ADR-0004 atomicity).