nubos-pilot 1.0.1 → 1.0.3

@@ -1,8 +1,8 @@
1
1
  ---
2
2
  name: np-critic
3
- description: Nubosloop critic for the per-task adversarial review. Spawned ONCE after np-executor (or np-build-fixer) commits a draft. Read-only on source. Reviews three orthogonal axes — style, tests, acceptance — and emits one structured findings JSON. ADR-0010 (single-critic revision 2026-05-05).
3
+ description: Nubosloop critic for the per-task adversarial review. Spawned ONCE after np-executor (or np-build-fixer) commits a draft. Read-only on source, write-allowed for the critic-report file the orchestrator hands it. Reviews three orthogonal axes — style, tests, acceptance — writes the full findings JSON to disk and emits a tiny verdict envelope. ADR-0010 §Verdict-Only Contract (2026-05-05).
4
4
  tier: sonnet
5
- tools: Read, Bash, Grep, Glob
5
+ tools: Read, Write, Bash, Grep, Glob
6
6
  color: "#A855F7"
7
7
  ---
8
8
 
@@ -46,6 +46,7 @@ The orchestrator provides these paths in your prompt context. Read every path it
46
46
  | Executor diff (required) | The patch produced this round. | inline / captured in checkpoint |
47
47
  | Verify output (required) | stdout/stderr of the task's verify command. | inline |
48
48
  | Files modified (required) | Paths the executor was scoped to. | task plan frontmatter `files_modified` |
49
+ | **Report path (required, ADR-0010 §L5)** | The path where you `Write` the full findings JSON. The orchestrator pre-creates the parent directory; you only need to `Write`. | `.nubos-pilot/.tmp/<run-id>/critic-<task-id>-r<round>.json` |
49
50
  | Codebase docs (recommended) | `.nubos-pilot/codebase/<module>.md` for the touched modules — invariants and gotchas. | `.nubos-pilot/codebase/` |
50
51
 
51
52
  ## Audit Surface — three axis modules (load BEFORE auditing)
@@ -62,9 +63,13 @@ You produce ONE merged findings JSON covering ALL three axes — see Output Sche
62
63
 
63
64
  If any of the three module files cannot be read, emit `category: critic-error` with `remediation: "missing critic module file: <path>"` and route to `stuck` — the orchestrator must inject all three.
64
65
 
65
- ## Output Schema
66
+ ## Output Schema — Verdict-Only Contract (ADR-0010 §L5, 2026-05-05)
66
67
 
67
- Emit a single JSON object as your final response (no prose, no markdown wrapper around it).
68
+ You emit your audit as **two artefacts**: the full findings JSON is written with `Write` to a path the orchestrator hands you, and your spawn's final response is a tiny envelope. This keeps the parent context lean: verbatim multi-kB findings reports were the dominant Nubosloop token sink before this revision.
69
+
70
+ ### Step 1 — write the full report to disk
71
+
72
+ The orchestrator passes a `<report_path>` value in your spawn prompt (typically `.nubos-pilot/.tmp/<run-id>/critic-<task-id>-r<round>.json`). Use `Write` to emit this object verbatim into that path:
68
73
 
69
74
  ```json
70
75
  {
@@ -96,11 +101,29 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
96
101
  }
97
102
  ```
98
103
 
99
- `verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`.
104
+ The full-report shape is unchanged from the legacy contract: `lib/nubosloop.cjs::mergeCriticOutputs` reads this file directly via `loop-run-round --phase post-critics --critic-outputs-path`. The five-field routing contract (`category`, `severity`, `file`, `line`, `remediation`) is unchanged, as is auto-promotion of `Unsatisfied`/`Information-Missing` criteria.
105
+
106
+ ### Step 2 — emit the verdict envelope as your final response
107
+
108
+ After the `Write` succeeds, your spawn's final response — the message that lands in the orchestrator's context — is a **single small JSON object**, no prose, no markdown wrapper:
109
+
110
+ ```json
111
+ {
112
+ "critic": "critic",
113
+ "task_id": "M001-S001-T0001",
114
+ "round": 1,
115
+ "verdict": "passed | issues_found",
116
+ "blockers_count": 0,
117
+ "report_path": ".nubos-pilot/.tmp/<run-id>/critic-M001-S001-T0001-r1.json",
118
+ "run_id": "<run-id>"
119
+ }
120
+ ```
121
+
122
+ `verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`. `blockers_count` is the count of findings with `severity == "fail"` plus criteria with verdict `Unsatisfied` (so the orchestrator can sort tasks for triage without reading the full file). `report_path` is the literal path you wrote — verbatim from the orchestrator's `<report_path>` input.
100
123
 
101
- **Routing-engine contract.** `lib/nubosloop.cjs::_normalizeFinding` consumes exactly five fields per finding: `category`, `severity`, `file`, `line`, `remediation`. Every other field (`id`, `criterion_id`, `question_to_user`, etc.) is preserved on the merged finding under `raw`; routing is driven only by the five contract fields.
124
+ If `<report_path>` is missing from your prompt or you cannot write the file, do NOT silently fall back to inline JSON — that defeats the cost-control purpose of this contract. Emit a single envelope with `verdict: "issues_found"`, `blockers_count: 1`, `report_path: null`, and an inline `error` field describing the cause; the orchestrator routes that to `critic-error stuck`.
102
125
 
103
- **Note on auto-promotion.** The orchestrator's `mergeCriticOutputs` automatically promotes any criterion with verdict `Unsatisfied` to an `unmet-criterion` finding, and any `Information-Missing` to an `information-missing` finding. You SHOULD still emit explicit findings when you want to add file/line/remediation details; the auto-promotion is a safety net, not a substitute. Identical findings are deduplicated by fingerprint.
126
+ **Why two artefacts.** The full findings JSON is several kB on a typical adversarial review (one paragraph per finding × N findings + per-criterion evidence sentences). Returning that as the spawn's final message replays it into the parent's history every round. The envelope is ~150 bytes; the orchestrator only reads the file when post-critics actually needs to route findings.
104
127
 
105
128
  ## Scope Guardrail
106
129
 
@@ -110,19 +133,22 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
110
133
  - Cite file, line, and concrete remediation per finding — not vague gripes.
111
134
  - Cite passing test names from the verify output as `Satisfied` evidence.
112
135
  - Mark infra failures `Information-Missing`, never `Unsatisfied`.
113
- - Emit one JSON object only: no prose wrapper, no markdown fence.
136
+ - `Write` the full findings JSON to the orchestrator-supplied `<report_path>` BEFORE emitting your final-message envelope.
137
+ - Final message = the small verdict envelope only. No prose, no markdown fence, no inline findings array.
114
138
 
115
139
  **Don't:**
116
- - Edit source — you are read-only.
140
+ - Edit source — `Write` is allowed ONLY for the `<report_path>` the orchestrator hands you. Touching anything else is a Layer-A bypass.
117
141
  - Spawn other agents — you finish your audit and return.
118
142
  - Skip an axis "because the diff looks small". A small diff with no tests is a `missing-test` finding.
119
143
  - Pass with reservations — verdict is binary (`passed` or `issues_found`); reservations belong in findings.
120
144
  - Refuse to surface findings because "the executor will fix them anyway" — surface them, the loop closes them.
145
+ - Inline the full findings JSON in the final message. The Verdict-Only Contract exists because that response replays into the orchestrator's context every round and is the dominant token sink — defeating it silently re-introduces the cost ADR-0010 §L5 was designed to remove.
121
146
  </scope_guardrail>
122
147
 
123
148
  ## Stop Conditions
124
149
 
125
- Hard-stop (return findings + verdict; do NOT attempt recovery):
126
- - The task plan has no `<success_criteria>` block — emit a single `unmet-criterion` finding pointing at this gap and route to plan-checker.
127
- - The Critic budget (timeout) is exhausted — emit collected criteria + findings + verdict `issues_found`.
128
- - The diff is unparseable / files are missing → emit `category: critic-error` and route to stuck.
150
+ Hard-stop (`Write` the full findings JSON to `<report_path>` if possible, then emit the envelope; do NOT attempt recovery):
151
+ - The task plan has no `<success_criteria>` block — emit a single `unmet-criterion` finding pointing at this gap and route to plan-checker. Envelope `verdict: "issues_found"`, `blockers_count: 1`.
152
+ - The Critic budget (timeout) is exhausted — emit collected criteria + findings + verdict `issues_found`. Envelope reflects the partial report.
153
+ - The diff is unparseable / files are missing → emit `category: critic-error` and route to stuck. Envelope `verdict: "issues_found"`, `blockers_count: 1`.
154
+ - `<report_path>` is missing from the prompt OR `Write` to it fails → emit envelope with `report_path: null`, `verdict: "issues_found"`, `blockers_count: 1`, and an `error` field describing the cause. Routing engine treats this as `critic-error → stuck`.
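The envelope rules in the agent spec above (when `verdict` is `passed`, how `blockers_count` is counted, and what the failure envelope carries) can be sketched as plain arithmetic. This is only an illustration: the critic is an agent and does not run this code. The helper names `buildEnvelope` and `buildErrorEnvelope` are hypothetical, and the assumption that each criterion carries a `verdict` field is inferred from the prose, since the full report schema is elided in this diff.

```javascript
// Hypothetical helpers; not part of nubos-pilot.
// Assumption: each entry in report.criteria has a `verdict` field
// ('Satisfied' | 'Unsatisfied' | 'Information-Missing').
function buildEnvelope(report, reportPath, runId) {
  const findings = report.findings || [];
  const criteria = report.criteria || [];

  // blockers_count = findings with severity "fail" + criteria judged Unsatisfied.
  const failFindings = findings.filter((f) => f.severity === 'fail').length;
  const unsatisfied = criteria.filter((c) => c.verdict === 'Unsatisfied').length;

  // passed only when there are no findings AND every criterion is Satisfied.
  const passed = findings.length === 0
    && criteria.every((c) => c.verdict === 'Satisfied');

  return {
    critic: 'critic',
    task_id: report.task_id,
    round: report.round,
    verdict: passed ? 'passed' : 'issues_found',
    blockers_count: failFindings + unsatisfied,
    report_path: reportPath, // verbatim from the orchestrator's <report_path>
    run_id: runId,
  };
}

// Failure envelope for a missing <report_path> or a failed Write.
function buildErrorEnvelope(cause) {
  return { verdict: 'issues_found', blockers_count: 1, report_path: null, error: cause };
}

const sample = {
  task_id: 'M001-S001-T0001',
  round: 1,
  findings: [{ category: 'todo-marker', severity: 'fail', file: 'a.ts', line: 4, remediation: 'remove TODO' }],
  criteria: [{ verdict: 'Satisfied' }, { verdict: 'Unsatisfied' }],
};
console.log(JSON.stringify(buildEnvelope(sample, 'critic-r1.json', 'run-1')));
```

With empty `criteria` and `findings` arrays the sketch yields `passed` with zero blockers, which matches the report written in test LCLI-RR-L5-1.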
@@ -86,6 +86,7 @@ const COMMANDS = [
86
86
  { name: 'loop-run-round', category: 'Execution', description: 'Drive the per-task Nubosloop state machine — phases: preflight | post-executor | post-critics | commit | stuck', description_de: 'Treibt die Per-Task Nubosloop-State-Machine — Phasen: preflight | post-executor | post-critics | commit | stuck' },
87
87
  { name: 'loop-audit-tool-use', category: 'Execution', description: 'Record/read the tool-use audit per spawn (Completeness Rule 9 mechanical check)', description_de: 'Tool-use Audit pro Spawn schreiben/lesen (Completeness Rule 9 mechanische Prüfung)' },
88
88
  { name: 'loop-stuck', category: 'Execution', description: 'Mark a task as stuck (writes loop-state + flips checkpoint status to stuck)', description_de: 'Markiert Task als stuck (schreibt Loop-State + setzt Checkpoint-Status auf stuck)' },
89
+ { name: 'spawn-headless', category: 'Execution', description: 'Spawn an agent as a headless `claude -p` subprocess (ADR-0010 §L6); writes stdout to --output-path and returns exit code', description_de: 'Spawnt einen Agent als headless `claude -p` Subprozess (ADR-0010 §L6); schreibt stdout nach --output-path und liefert Exit-Code' },
89
90
  { name: 'loop-metrics', category: 'Utility', description: 'Aggregate Nubosloop telemetry across all checkpoints (commits, stuck, route distribution)', description_de: 'Aggregiert Nubosloop-Telemetrie über alle Checkpoints (Commits, Stuck, Routing)' },
90
91
  { name: 'learning-log', category: 'Execution', description: 'Persist a learning to the local store (or MCP adapter when configured)', description_de: 'Persistiert ein Learning im lokalen Store (oder MCP-Adapter falls konfiguriert)' },
91
92
  { name: 'learning-match', category: 'Utility', description: 'Query the learnings store for cached patterns matching a free-text query', description_de: 'Fragt den Learnings-Store nach Cached-Patterns ab' },
@@ -1421,3 +1421,155 @@ test('LCLI-20: learning-log payload carries fingerprint + was_new + occurrence',
1421
1421
  assert.equal(out2.occurrence, 2);
1422
1422
  assert.equal(out2.fingerprint, out1.fingerprint);
1423
1423
  });
1424
+
1425
+ // ADR-0010 §L5 Verdict-Only Contract — post-critics reads findings from disk.
1426
+
1427
+ test('LCLI-RR-L5-1: post-critics --critic-outputs-path reads file and routes commit on zero findings', () => {
1428
+ const r = _mkRoot();
1429
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1430
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1431
+ const reportDir = path.join(r, '.nubos-pilot', '.tmp');
1432
+ fs.mkdirSync(reportDir, { recursive: true });
1433
+ const reportPath = path.join(reportDir, 'critic-r1.json');
1434
+ fs.writeFileSync(reportPath, JSON.stringify({
1435
+ critic: 'critic', task_id: 'M001-S001-T0001', round: 1,
1436
+ criteria: [], findings: [], verdict: 'passed',
1437
+ }), 'utf-8');
1438
+ const cap = _cap();
1439
+ const loopRunRound = require('./loop-run-round.cjs');
1440
+ loopRunRound.run(
1441
+ ['M001-S001-T0001', '--phase', 'post-critics',
1442
+ '--critic-outputs-path', path.relative(r, reportPath)],
1443
+ { cwd: r, stdout: cap.stub },
1444
+ );
1445
+ const out = JSON.parse(cap.get());
1446
+ assert.equal(out.next_action, 'commit');
1447
+ assert.equal(out.findings.length, 0);
1448
+ });
1449
+
1450
+ test('LCLI-RR-L5-2: post-critics --critic-outputs-path with single object (not array) is wrapped', () => {
1451
+ const r = _mkRoot();
1452
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1453
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1454
+ const reportPath = path.join(r, 'critic-r1.json');
1455
+ fs.writeFileSync(reportPath, JSON.stringify({
1456
+ critic: 'critic', task_id: 'M001-S001-T0001', round: 1,
1457
+ findings: [{ category: 'todo-marker', severity: 'fail', file: 'a.ts', line: 4, remediation: 'remove TODO' }],
1458
+ criteria: [], verdict: 'issues_found',
1459
+ }), 'utf-8');
1460
+ const cap = _cap();
1461
+ const loopRunRound = require('./loop-run-round.cjs');
1462
+ loopRunRound.run(
1463
+ ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'critic-r1.json'],
1464
+ { cwd: r, stdout: cap.stub },
1465
+ );
1466
+ const out = JSON.parse(cap.get());
1467
+ assert.equal(out.next_action, 'executor');
1468
+ assert.equal(out.findings.length, 1);
1469
+ assert.equal(out.findings[0].category, 'todo-marker');
1470
+ });
1471
+
1472
+ test('LCLI-RR-L5-3: post-critics rejects both --critic-outputs and --critic-outputs-path', () => {
1473
+ const r = _mkRoot();
1474
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1475
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1476
+ const reportPath = path.join(r, 'critic-r1.json');
1477
+ fs.writeFileSync(reportPath, '[]', 'utf-8');
1478
+ const cap = _cap();
1479
+ const loopRunRound = require('./loop-run-round.cjs');
1480
+ assert.throws(
1481
+ () => loopRunRound.run(
1482
+ ['M001-S001-T0001', '--phase', 'post-critics',
1483
+ '--critic-outputs', '[]',
1484
+ '--critic-outputs-path', 'critic-r1.json'],
1485
+ { cwd: r, stdout: cap.stub },
1486
+ ),
1487
+ (err) => err && err.code === 'loop-run-round-post-critics-conflicting-outputs',
1488
+ );
1489
+ });
1490
+
1491
+ test('LCLI-RR-L5-4: post-critics --critic-outputs-path rejects path traversal outside cwd', () => {
1492
+ const r = _mkRoot();
1493
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1494
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1495
+ const cap = _cap();
1496
+ const loopRunRound = require('./loop-run-round.cjs');
1497
+ assert.throws(
1498
+ () => loopRunRound.run(
1499
+ ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', '/etc/passwd'],
1500
+ { cwd: r, stdout: cap.stub },
1501
+ ),
1502
+ (err) => err && err.code === 'loop-run-round-critic-outputs-path-traversal',
1503
+ );
1504
+ });
1505
+
1506
+ test('LCLI-RR-L5-5: post-critics --critic-outputs-path on missing file errors typed', () => {
1507
+ const r = _mkRoot();
1508
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1509
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1510
+ const cap = _cap();
1511
+ const loopRunRound = require('./loop-run-round.cjs');
1512
+ assert.throws(
1513
+ () => loopRunRound.run(
1514
+ ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'never-was-here.json'],
1515
+ { cwd: r, stdout: cap.stub },
1516
+ ),
1517
+ (err) => err && err.code === 'loop-run-round-critic-outputs-path-unreadable',
1518
+ );
1519
+ });
1520
+
1521
+ test('LCLI-RR-L5-6: post-critics --critic-outputs-path on invalid JSON errors typed', () => {
1522
+ const r = _mkRoot();
1523
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1524
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
1525
+ const reportPath = path.join(r, 'broken.json');
1526
+ fs.writeFileSync(reportPath, 'not valid json {{{', 'utf-8');
1527
+ const cap = _cap();
1528
+ const loopRunRound = require('./loop-run-round.cjs');
1529
+ assert.throws(
1530
+ () => loopRunRound.run(
1531
+ ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'broken.json'],
1532
+ { cwd: r, stdout: cap.stub },
1533
+ ),
1534
+ (err) => err && err.code === 'loop-run-round-critic-outputs-path-invalid-json',
1535
+ );
1536
+ });
1537
+
1538
+ test('LCLI-RR-L5-7: stuck --findings-path mirrors the post-critics path semantics', () => {
1539
+ const r = _mkRoot();
1540
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1541
+ const reportPath = path.join(r, 'stuck-findings.json');
1542
+ fs.writeFileSync(reportPath, JSON.stringify({
1543
+ critic: 'critic', findings: [{ category: 'todo-marker', severity: 'fail', file: 'x.ts', line: 1, remediation: 'fix' }],
1544
+ criteria: [], verdict: 'issues_found',
1545
+ }), 'utf-8');
1546
+ const cap = _cap();
1547
+ const loopRunRound = require('./loop-run-round.cjs');
1548
+ loopRunRound.run(
1549
+ ['M001-S001-T0001', '--phase', 'stuck', '--reason', 'manual-fix-pending',
1550
+ '--findings-path', 'stuck-findings.json'],
1551
+ { cwd: r, stdout: cap.stub },
1552
+ );
1553
+ const out = JSON.parse(cap.get());
1554
+ assert.equal(out.phase, 'stuck');
1555
+ const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
1556
+ assert.ok(Array.isArray(cp.nubosloop.findings), 'findings persisted as array');
1557
+ assert.equal(cp.nubosloop.findings[0].findings[0].category, 'todo-marker');
1558
+ });
1559
+
1560
+ test('LCLI-RR-L5-8: stuck rejects both --findings and --findings-path', () => {
1561
+ const r = _mkRoot();
1562
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
1563
+ const reportPath = path.join(r, 'f.json');
1564
+ fs.writeFileSync(reportPath, '[]', 'utf-8');
1565
+ const cap = _cap();
1566
+ const loopRunRound = require('./loop-run-round.cjs');
1567
+ assert.throws(
1568
+ () => loopRunRound.run(
1569
+ ['M001-S001-T0001', '--phase', 'stuck', '--reason', 'manual-fix-pending',
1570
+ '--findings', '[]', '--findings-path', 'f.json'],
1571
+ { cwd: r, stdout: cap.stub },
1572
+ ),
1573
+ (err) => err && err.code === 'loop-run-round-stuck-conflicting-findings',
1574
+ );
1575
+ });
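Test LCLI-RR-L5-4 above depends on a containment rule: the resolved path must equal, or sit underneath, either `cwd` or `TMPDIR`. The following standalone sketch mirrors the check that `_readCriticOutputsFromPath` performs in this diff. The function name `isInsideCwdOrTmp` and the sample paths are illustrative assumptions, and the shown results assume POSIX path semantics.

```javascript
const path = require('node:path');

// Hypothetical helper mirroring the cwd-or-TMPDIR containment check in this diff.
function isInsideCwdOrTmp(candidate, cwd, tmpDir = process.env.TMPDIR || '/tmp') {
  const resolved = path.resolve(cwd, candidate);
  const roots = [path.resolve(cwd), path.resolve(tmpDir)];
  // Appending path.sep stops a sibling such as /work/project-evil from
  // matching the /work/project prefix.
  return roots.some((root) => resolved === root || resolved.startsWith(root + path.sep));
}

const cwd = '/work/project';
const tmp = '/tmp';
console.log(isInsideCwdOrTmp('critic-r1.json', cwd, tmp));         // true
console.log(isInsideCwdOrTmp('../project-evil/x.json', cwd, tmp)); // false
console.log(isInsideCwdOrTmp('/etc/passwd', cwd, tmp));            // false
console.log(isInsideCwdOrTmp('/tmp/run/critic.json', cwd, tmp));   // true
```

Note that `path.resolve` is purely lexical, so a check of this shape does not follow symlinks; a link inside `cwd` that points elsewhere would still pass.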
@@ -264,13 +264,68 @@ function _runPostExecutor(taskId, list, cwd) {
264
264
  };
265
265
  }
266
266
 
267
- function _runPostCritics(taskId, list, cwd) {
268
- const criticOutputs = args.getJsonFlag(
269
- list,
270
- '--critic-outputs',
271
- 'loop-run-round-post-critics-missing-outputs',
272
- 'pass the merged critic JSON array (style + tests + acceptance)',
267
+ function _readCriticOutputsFromPath(criticPath, cwd) {
268
+ const resolved = path.resolve(cwd, criticPath);
269
+ const tmp = (process.env.TMPDIR || '/tmp');
270
+ const tmpResolved = path.resolve(tmp);
271
+ const cwdResolved = path.resolve(cwd);
272
+ const insideCwd = resolved === cwdResolved || resolved.startsWith(cwdResolved + path.sep);
273
+ const insideTmp = resolved === tmpResolved || resolved.startsWith(tmpResolved + path.sep);
274
+ if (!insideCwd && !insideTmp) {
275
+ throw new NubosPilotError(
276
+ 'loop-run-round-critic-outputs-path-traversal',
277
+ '--critic-outputs-path must resolve inside cwd or TMPDIR',
278
+ { path: criticPath, resolved, cwd: cwdResolved, tmp: tmpResolved },
279
+ );
280
+ }
281
+ let raw;
282
+ try { raw = fs.readFileSync(resolved, 'utf-8'); }
283
+ catch (err) {
284
+ throw new NubosPilotError(
285
+ 'loop-run-round-critic-outputs-path-unreadable',
286
+ '--critic-outputs-path could not be read',
287
+ { path: criticPath, cause: err && err.message },
288
+ );
289
+ }
290
+ let parsed;
291
+ try { parsed = JSON.parse(raw); }
292
+ catch (err) {
293
+ throw new NubosPilotError(
294
+ 'loop-run-round-critic-outputs-path-invalid-json',
295
+ '--critic-outputs-path content is not valid JSON',
296
+ { path: criticPath, cause: err && err.message },
297
+ );
298
+ }
299
+ if (Array.isArray(parsed)) return parsed;
300
+ if (parsed && typeof parsed === 'object') return [parsed];
301
+ throw new NubosPilotError(
302
+ 'loop-run-round-critic-outputs-path-invalid-shape',
303
+ '--critic-outputs-path must contain a critic-output object or array of objects',
304
+ { path: criticPath, got: typeof parsed },
273
305
  );
306
+ }
307
+
308
+ function _runPostCritics(taskId, list, cwd) {
309
+ const inlineRaw = args.getFlag(list, '--critic-outputs');
310
+ const pathFlag = args.getFlag(list, '--critic-outputs-path');
311
+ if (inlineRaw !== undefined && pathFlag !== undefined) {
312
+ throw new NubosPilotError(
313
+ 'loop-run-round-post-critics-conflicting-outputs',
314
+ 'pass exactly one of --critic-outputs or --critic-outputs-path, not both',
315
+ { hint: 'Verdict-Only contract (ADR-0010 §L5) prefers --critic-outputs-path; inline form is the legacy fallback' },
316
+ );
317
+ }
318
+ let criticOutputs;
319
+ if (pathFlag !== undefined) {
320
+ criticOutputs = _readCriticOutputsFromPath(pathFlag, cwd);
321
+ } else {
322
+ criticOutputs = args.getJsonFlag(
323
+ list,
324
+ '--critic-outputs',
325
+ 'loop-run-round-post-critics-missing-outputs',
326
+ 'pass the merged critic JSON array (style + tests + acceptance), or --critic-outputs-path <file> per ADR-0010 §L5',
327
+ );
328
+ }
274
329
  if (!Array.isArray(criticOutputs)) {
275
330
  throw new NubosPilotError(
276
331
  'loop-run-round-post-critics-invalid-outputs',
@@ -482,7 +537,22 @@ const STUCK_REASONS_THAT_CLEAR_OVERRIDE = new Set([
482
537
 
483
538
  function _runStuck(taskId, list, cwd) {
484
539
  const reason = args.getFlag(list, '--reason') || '';
485
- const findings = args.optionalJsonFlag(list, '--findings');
540
+ const findingsInline = args.getFlag(list, '--findings');
541
+ const findingsPath = args.getFlag(list, '--findings-path');
542
+ if (findingsInline !== undefined && findingsPath !== undefined) {
543
+ throw new NubosPilotError(
544
+ 'loop-run-round-stuck-conflicting-findings',
545
+ 'pass exactly one of --findings or --findings-path, not both',
546
+ { hint: 'Verdict-Only contract (ADR-0010 §L5) prefers --findings-path; inline form is the legacy fallback' },
547
+ );
548
+ }
549
+ let findings;
550
+ if (findingsPath !== undefined) {
551
+ const parsed = _readCriticOutputsFromPath(findingsPath, cwd);
552
+ findings = parsed;
553
+ } else {
554
+ findings = args.optionalJsonFlag(list, '--findings');
555
+ }
486
556
  const merged = checkpoint.mergeCheckpoint(
487
557
  taskId,
488
558
  (cur) => {
@@ -0,0 +1,188 @@
1
+ 'use strict';
2
+
3
+ const fs = require('node:fs');
4
+ const path = require('node:path');
5
+ const child_process = require('node:child_process');
6
+
7
+ const { NubosPilotError } = require('../../lib/core.cjs');
8
+ const args = require('./_args.cjs');
9
+
10
+ const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000;
11
+ const STDERR_TAIL_BYTES = 4 * 1024;
12
+
13
+ function _assertInsideCwdOrTmp(p, cwd, label) {
14
+ const resolved = path.resolve(cwd, p);
15
+ const tmp = (process.env.TMPDIR || '/tmp');
16
+ const tmpResolved = path.resolve(tmp);
17
+ const cwdResolved = path.resolve(cwd);
18
+ const insideCwd = resolved === cwdResolved || resolved.startsWith(cwdResolved + path.sep);
19
+ const insideTmp = resolved === tmpResolved || resolved.startsWith(tmpResolved + path.sep);
20
+ if (!insideCwd && !insideTmp) {
21
+ throw new NubosPilotError(
22
+ 'spawn-headless-path-traversal',
23
+ label + ' must resolve inside cwd or TMPDIR',
24
+ { path: p, resolved, cwd: cwdResolved, tmp: tmpResolved, label },
25
+ );
26
+ }
27
+ return resolved;
28
+ }
29
+
30
+ function _resolveAgentPath(agent, cwd) {
31
+ if (typeof agent !== 'string' || !agent.match(/^[a-zA-Z0-9_-]+$/)) {
32
+ throw new NubosPilotError(
33
+ 'spawn-headless-invalid-agent-name',
34
+ '--agent must be a simple identifier (alphanumeric, dash, underscore)',
35
+ { agent },
36
+ );
37
+ }
38
+ const candidates = [
39
+ path.join(cwd, '.nubos-pilot', 'agents', agent + '.md'),
40
+ path.join(cwd, '.claude', 'agents', agent + '.md'),
41
+ path.join(__dirname, '..', '..', 'agents', agent + '.md'),
42
+ ];
43
+ for (const c of candidates) {
44
+ try { if (fs.statSync(c).isFile()) return c; }
45
+ catch { /* not present at this path */ }
46
+ }
47
+ throw new NubosPilotError(
48
+ 'spawn-headless-agent-not-found',
49
+ 'Agent file not found for `' + agent + '` (searched: .nubos-pilot/agents, .claude/agents, package agents/)',
50
+ { agent, searched: candidates },
51
+ );
52
+ }
53
+
54
+ function _readPromptFile(promptPath, cwd) {
55
+ const resolved = _assertInsideCwdOrTmp(promptPath, cwd, '--prompt-path');
56
+ try { return fs.readFileSync(resolved, 'utf-8'); }
57
+ catch (err) {
58
+ throw new NubosPilotError(
59
+ 'spawn-headless-prompt-unreadable',
60
+ '--prompt-path could not be read',
61
+ { path: promptPath, cause: err && err.message },
62
+ );
63
+ }
64
+ }
65
+
66
+ function _ensureOutputDir(outputPath, cwd) {
67
+ const resolved = _assertInsideCwdOrTmp(outputPath, cwd, '--output-path');
68
+ fs.mkdirSync(path.dirname(resolved), { recursive: true });
69
+ return resolved;
70
+ }
71
+
72
+ function _claudeBinary() {
73
+ const env = process.env.NUBOS_PILOT_CLAUDE_BIN;
74
+ if (env && env.trim()) return env.trim();
75
+ return 'claude';
76
+ }
77
+
78
+ function _composePrompt(agentBody, userPrompt) {
79
+ return agentBody.trimEnd() + '\n\n---\n\n' + userPrompt.trimEnd() + '\n';
80
+ }
81
+
82
+ function _stripFrontmatter(md) {
83
+ if (!md.startsWith('---\n')) return md;
84
+ const end = md.indexOf('\n---\n', 4);
85
+ if (end === -1) return md;
86
+ return md.slice(end + 5);
87
+ }
88
+
89
+ function run(argv, ctx) {
90
+ const context = ctx || {};
91
+ const cwd = context.cwd || process.cwd();
92
+ const stdout = context.stdout || process.stdout;
93
+ const list = Array.isArray(argv) ? argv : [];
94
+
95
+ const agent = args.getFlag(list, '--agent');
96
+ if (!agent) {
97
+ throw new NubosPilotError(
98
+ 'spawn-headless-missing-agent',
99
+ 'spawn-headless requires --agent <name>',
100
+ { hint: 'agent is the basename of an .md file under agents/ (without extension)' },
101
+ );
102
+ }
103
+ const promptPath = args.getFlag(list, '--prompt-path');
104
+ if (!promptPath) {
105
+ throw new NubosPilotError(
106
+ 'spawn-headless-missing-prompt-path',
107
+ 'spawn-headless requires --prompt-path <file>',
108
+ {},
109
+ );
110
+ }
111
+ const outputPath = args.getFlag(list, '--output-path');
112
+ if (!outputPath) {
113
+ throw new NubosPilotError(
114
+ 'spawn-headless-missing-output-path',
115
+ 'spawn-headless requires --output-path <file>',
116
+ {},
117
+ );
118
+ }
119
+ const timeoutRaw = args.getFlag(list, '--timeout-ms');
120
+ const timeoutMs = timeoutRaw !== undefined ? Number(timeoutRaw) : DEFAULT_TIMEOUT_MS;
121
+ if (!Number.isFinite(timeoutMs) || timeoutMs < 1000) {
122
+ throw new NubosPilotError(
123
+ 'spawn-headless-invalid-timeout',
124
+ '--timeout-ms must be a positive number ≥ 1000',
125
+ { value: timeoutRaw },
126
+ );
127
+ }
128
+
129
+ const agentPath = _resolveAgentPath(agent, cwd);
130
+ const agentBody = _stripFrontmatter(fs.readFileSync(agentPath, 'utf-8'));
131
+ const userPrompt = _readPromptFile(promptPath, cwd);
132
+ const composedPrompt = _composePrompt(agentBody, userPrompt);
133
+ const resolvedOutput = _ensureOutputDir(outputPath, cwd);
134
+
135
+ const bin = _claudeBinary();
136
+ const claudeArgs = ['-p', '--output-format', 'json'];
137
+ let result;
138
+ try {
139
+ result = child_process.spawnSync(bin, claudeArgs, {
140
+ cwd,
141
+ input: composedPrompt,
142
+ timeout: timeoutMs,
143
+ maxBuffer: 64 * 1024 * 1024,
144
+ encoding: 'utf-8',
145
+ env: process.env,
146
+ });
147
+ } catch (err) {
148
+ throw new NubosPilotError(
149
+ 'spawn-headless-spawn-failed',
150
+ 'failed to spawn `' + bin + '`: ' + (err && err.message),
151
+ { bin, cause: err && err.code },
152
+ );
153
+ }
154
+ if (result.error && result.error.code === 'ENOENT') {
155
+ throw new NubosPilotError(
156
+ 'spawn-headless-claude-not-found',
157
+ 'binary `' + bin + '` not found on PATH (set NUBOS_PILOT_CLAUDE_BIN to override)',
158
+ { bin },
159
+ );
160
+ }
161
+ if (result.error && result.error.code === 'ETIMEDOUT') {
162
+ throw new NubosPilotError(
163
+ 'spawn-headless-timed-out',
164
+ 'subprocess `' + bin + '` exceeded --timeout-ms ' + timeoutMs,
165
+ { bin, timeoutMs },
166
+ );
167
+ }
168
+
169
+ const stderrTail = (result.stderr || '').slice(-STDERR_TAIL_BYTES);
170
+ const exitCode = result.status == null ? 1 : Number(result.status);
171
+
172
+ fs.writeFileSync(resolvedOutput, result.stdout || '', 'utf-8');
173
+
174
+ const payload = {
175
+ agent,
176
+ output_path: outputPath,
177
+ output_path_resolved: resolvedOutput,
178
+ exit_code: exitCode,
179
+ stderr_excerpt: stderrTail,
180
+ bin,
181
+ timed_out: !!(result.error && result.error.code === 'ETIMEDOUT'),
182
+ };
183
+ stdout.write(JSON.stringify(payload) + '\n');
184
+ if (exitCode !== 0) return 2;
185
+ return 0;
186
+ }
187
+
188
+ module.exports = { run };
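To make the prompt that `spawn-headless` pipes to the subprocess concrete, the sketch below re-states the frontmatter stripping and prompt composition from the file above and applies them to the agent fixture used in the accompanying tests. The function bodies follow `_stripFrontmatter` and `_composePrompt` in this diff; the un-prefixed names and the sample user prompt are only for illustration.

```javascript
// Re-statement of the two helpers from spawn-headless.cjs, for illustration only.
function stripFrontmatter(md) {
  if (!md.startsWith('---\n')) return md;  // no leading frontmatter fence
  const end = md.indexOf('\n---\n', 4);    // closing fence on its own line
  if (end === -1) return md;               // unterminated: leave untouched
  return md.slice(end + 5);                // skip past "\n---\n"
}

function composePrompt(agentBody, userPrompt) {
  // Agent body first, then a horizontal rule, then the per-spawn prompt.
  return agentBody.trimEnd() + '\n\n---\n\n' + userPrompt.trimEnd() + '\n';
}

const agentFile =
  '---\nname: np-test-critic\ntools: Read, Write\n---\n\n# Role\n\nYou are a test critic.\n';
const composed = composePrompt(stripFrontmatter(agentFile), 'do the audit');

console.log(JSON.stringify(composed));
// prints "\n# Role\n\nYou are a test critic.\n\n---\n\ndo the audit\n"
```

Because the check looks for the literal `---\n`, an agent file saved with CRLF line endings would pass through with its frontmatter intact.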
@@ -0,0 +1,196 @@
1
+ 'use strict';
2
+
3
+ const fs = require('node:fs');
4
+ const os = require('node:os');
5
+ const path = require('node:path');
6
+ const { test, afterEach } = require('node:test');
7
+ const assert = require('node:assert/strict');
8
+
9
+ const spawnHeadless = require('./spawn-headless.cjs');
10
+
11
+ const _sandboxes = [];
12
+ const _envBackup = {};
13
+
14
+ function _mkRoot() {
15
+ const r = fs.mkdtempSync(path.join(os.tmpdir(), 'np-spawn-headless-'));
16
+ fs.mkdirSync(path.join(r, '.nubos-pilot', 'agents'), { recursive: true });
17
+ fs.writeFileSync(
18
+ path.join(r, '.nubos-pilot', 'agents', 'np-test-critic.md'),
19
+ '---\nname: np-test-critic\ntools: Read, Write\n---\n\n# Role\n\nYou are a test critic.\n',
20
+ 'utf-8',
21
+ );
22
+ _sandboxes.push(r);
23
+ return r;
24
+ }
25
+
26
+ function _cap() {
27
+ let s = '';
28
+ return { stub: { write: (x) => { s += String(x); return true; } }, get: () => s };
29
+ }
30
+
31
+ afterEach(() => {
32
+ while (_sandboxes.length) {
33
+ const r = _sandboxes.pop();
34
+ try { fs.rmSync(r, { recursive: true, force: true }); } catch {}
35
+ }
36
+ for (const k of Object.keys(_envBackup)) {
37
+ if (_envBackup[k] === undefined) delete process.env[k];
38
+ else process.env[k] = _envBackup[k];
39
+ delete _envBackup[k];
40
+ }
41
+ });
42
+
43
+ function _setEnv(k, v) {
44
+ _envBackup[k] = process.env[k];
45
+ if (v == null) delete process.env[k];
46
+ else process.env[k] = v;
47
+ }
48
+
49
+ test('SH-1: spawn-headless requires --agent', () => {
50
+ const r = _mkRoot();
51
+ const cap = _cap();
52
+ assert.throws(
53
+ () => spawnHeadless.run([], { cwd: r, stdout: cap.stub }),
54
+ (err) => err && err.code === 'spawn-headless-missing-agent',
55
+ );
56
+ });
57
+
58
+ test('SH-2: spawn-headless requires --prompt-path', () => {
59
+ const r = _mkRoot();
60
+ const cap = _cap();
61
+ assert.throws(
62
+ () => spawnHeadless.run(['--agent', 'np-test-critic'], { cwd: r, stdout: cap.stub }),
63
+ (err) => err && err.code === 'spawn-headless-missing-prompt-path',
64
+ );
65
+ });
66
+
67
+ test('SH-3: spawn-headless requires --output-path', () => {
68
+ const r = _mkRoot();
69
+ fs.writeFileSync(path.join(r, 'p.md'), 'do the audit', 'utf-8');
70
+ const cap = _cap();
71
+ assert.throws(
72
+ () => spawnHeadless.run(
73
+ ['--agent', 'np-test-critic', '--prompt-path', 'p.md'],
74
+ { cwd: r, stdout: cap.stub },
75
+ ),
76
+ (err) => err && err.code === 'spawn-headless-missing-output-path',
77
+ );
78
+ });
79
+
80
+ test('SH-4: spawn-headless rejects path traversal on prompt-path', () => {
81
+ const r = _mkRoot();
82
+ const cap = _cap();
83
+ assert.throws(
84
+ () => spawnHeadless.run(
85
+ ['--agent', 'np-test-critic',
86
+ '--prompt-path', '/etc/passwd',
87
+ '--output-path', 'out.json'],
88
+ { cwd: r, stdout: cap.stub },
89
+ ),
90
+ (err) => err && err.code === 'spawn-headless-path-traversal',
91
+ );
92
+ });
93
+
94
+ test('SH-5: spawn-headless rejects unknown agent', () => {
95
+ const r = _mkRoot();
96
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
97
+ const cap = _cap();
98
+ assert.throws(
99
+ () => spawnHeadless.run(
100
+ ['--agent', 'np-does-not-exist',
101
+ '--prompt-path', 'p.md',
102
+ '--output-path', 'out.json'],
103
+ { cwd: r, stdout: cap.stub },
104
+ ),
105
+ (err) => err && err.code === 'spawn-headless-agent-not-found',
106
+ );
107
+ });
108
+
109
+ test('SH-6: spawn-headless rejects invalid agent name (path-injection guard)', () => {
110
+ const r = _mkRoot();
111
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
112
+ const cap = _cap();
113
+ assert.throws(
114
+ () => spawnHeadless.run(
115
+ ['--agent', '../../etc/passwd',
116
+ '--prompt-path', 'p.md',
117
+ '--output-path', 'out.json'],
118
+ { cwd: r, stdout: cap.stub },
119
+ ),
120
+ (err) => err && err.code === 'spawn-headless-invalid-agent-name',
121
+ );
122
+ });
123
+
124
+ test('SH-7: spawn-headless reports claude-not-found when binary missing', () => {
125
+ const r = _mkRoot();
126
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
127
+ _setEnv('NUBOS_PILOT_CLAUDE_BIN', path.join(r, 'no-such-binary'));
128
+ const cap = _cap();
129
+ assert.throws(
130
+ () => spawnHeadless.run(
131
+ ['--agent', 'np-test-critic',
132
+ '--prompt-path', 'p.md',
133
+ '--output-path', 'out.json'],
134
+ { cwd: r, stdout: cap.stub },
135
+ ),
136
+ (err) => err && err.code === 'spawn-headless-claude-not-found',
137
+ );
138
+ });
139
+
140
+ test('SH-8: spawn-headless captures stdout to output-path on success (mock binary)', () => {
141
+ const r = _mkRoot();
142
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
143
+ const mockBin = path.join(r, 'mock-claude.sh');
144
+ fs.writeFileSync(mockBin, '#!/bin/sh\ncat > /dev/null\nprintf \'{"verdict":"passed","blockers_count":0,"report_path":null}\\n\'\n', 'utf-8');
145
+ fs.chmodSync(mockBin, 0o755);
146
+ _setEnv('NUBOS_PILOT_CLAUDE_BIN', mockBin);
147
+ const cap = _cap();
148
+ const rc = spawnHeadless.run(
149
+ ['--agent', 'np-test-critic',
150
+ '--prompt-path', 'p.md',
151
+ '--output-path', 'out.json'],
152
+ { cwd: r, stdout: cap.stub },
153
+ );
154
+ assert.equal(rc, 0, 'success returns exit 0');
155
+ const payload = JSON.parse(cap.get());
156
+ assert.equal(payload.exit_code, 0);
157
+ assert.equal(payload.agent, 'np-test-critic');
158
+ const written = fs.readFileSync(path.join(r, 'out.json'), 'utf-8');
159
+ assert.match(written, /"verdict":"passed"/);
160
+ });
161
+
162
+ test('SH-9: spawn-headless surfaces non-zero subprocess exit (mock failure)', () => {
163
+ const r = _mkRoot();
164
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
165
+ const mockBin = path.join(r, 'mock-fail.sh');
166
+ fs.writeFileSync(mockBin, '#!/bin/sh\ncat > /dev/null\necho boom >&2\nexit 7\n', 'utf-8');
167
+ fs.chmodSync(mockBin, 0o755);
168
+ _setEnv('NUBOS_PILOT_CLAUDE_BIN', mockBin);
169
+ const cap = _cap();
170
+ const rc = spawnHeadless.run(
171
+ ['--agent', 'np-test-critic',
172
+ '--prompt-path', 'p.md',
173
+ '--output-path', 'out.json'],
174
+ { cwd: r, stdout: cap.stub },
175
+ );
176
+ assert.equal(rc, 2, 'non-zero subprocess returns rc=2');
177
+ const payload = JSON.parse(cap.get());
178
+ assert.equal(payload.exit_code, 7);
179
+ assert.match(payload.stderr_excerpt, /boom/);
180
+ });
181
+
182
+ test('SH-10: spawn-headless rejects --timeout-ms below 1000', () => {
183
+ const r = _mkRoot();
184
+ fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
185
+ const cap = _cap();
186
+ assert.throws(
187
+ () => spawnHeadless.run(
188
+ ['--agent', 'np-test-critic',
189
+ '--prompt-path', 'p.md',
190
+ '--output-path', 'out.json',
191
+ '--timeout-ms', '500'],
192
+ { cwd: r, stdout: cap.stub },
193
+ ),
194
+ (err) => err && err.code === 'spawn-headless-invalid-timeout',
195
+ );
196
+ });
@@ -156,6 +156,83 @@ No layer is sufficient alone. Together they require a deliberate, multi-step lie
156
156
 
157
157
  Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic …` without spawning the critic. Closing this gap requires runtime instrumentation — the LLM runtime itself stamps spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stufe 2" and tracked separately; this amendment closes the practical bypass class without it.
158
158
 
159
+ ## Cost Layer (added 2026-05-05)
160
+
161
+ The Trust Layer raises the price of dishonesty; the Cost Layer raises the price of *honesty*. Two failure modes observed alongside the Trust Layer rollout:
162
+
163
+ 1. **Verbose critic returns dominate the per-round token bill.** The critic's structured findings JSON (criteria + findings + per-finding remediation prose) routinely runs 2–5 kB. Returning it as the spawn's final message replays it into the parent context every round, and over a 3-round loop with a 30-task milestone the critic alone burns ~200–500k parent-context tokens that contribute nothing to routing — `lib/nubosloop.cjs::mergeCriticOutputs` only consumes five fields per finding, the rest is decoration the parent never inspects.
164
+ 2. **Sub-agent "context isolation" is not context offloading.** The runtime's native Agent tool isolates the *child's* context window, but the agent's final message lands verbatim in the parent's history. For a Nubosloop with 1 researcher + 1 critic per round, that is two verbose returns per round per task — the largest per-task cost driver after the executor's own output.
165
+
166
+ The Cost Layer addresses both without weakening the Trust Layer: spawn-evidence auditing is unchanged, the routing engine is unchanged, only the *transport* of critic/researcher output between child and parent contexts changes.
167
+
168
+ ### Layer L5 — Verdict-Only Critic Contract
169
+
170
+ Critics now emit their full findings JSON to a path the orchestrator hands them in the spawn prompt (`<report_path>`, typically `${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json`). The spawn's *final message* — the artefact that lands in parent context — is a small envelope:
171
+
172
+ ```json
173
+ { "critic": "critic", "task_id": "M001-S001-T0001", "round": 1,
174
+ "verdict": "passed | issues_found", "blockers_count": 0,
175
+ "report_path": "...", "run_id": "..." }
176
+ ```
177
+
178
+ `bin/np-tools/loop-run-round.cjs::_runPostCritics` accepts a new `--critic-outputs-path <file>` flag that reads the on-disk findings JSON directly. Inline `--critic-outputs <json>` remains accepted (legacy fallback for runtimes without `Write` capability and migration fixtures), but exactly one of the two MUST be passed — both at once is `loop-run-round-post-critics-conflicting-outputs`. `_runStuck` gets the symmetric `--findings-path` for the stuck-with-findings escalation paths.
179
+
180
+ The critic's `tools` frontmatter gains `Write`. `Write` is *only* permitted on the orchestrator-supplied `<report_path>`; touching anything else is a Layer-A bypass class and dealt with under existing trust-layer rules.
181
+
182
+ **Failure modes:**
183
+ - `<report_path>` missing in prompt OR `Write` fails → envelope sets `report_path: null`, `verdict: "issues_found"`, `blockers_count: 1`, with an `error` field. Routing engine treats this as `critic-error → stuck`.
184
+ - File written but unreadable / not valid JSON / shape mismatch → `loop-run-round` errors with a typed `loop-run-round-critic-outputs-path-{unreadable,invalid-json,invalid-shape}` code.
185
+ - Inline-JSON-with-no-file fallback is still routable; the legacy contract is not removed, only deprioritised.
186
+
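The failure modes above can be sketched as a small resolver. This is a minimal sketch under stated assumptions, not the code in `loop-run-round.cjs`: the names `resolveCriticOutputs`, `codedError` and `looksLikeFindings`, the `example-missing-outputs` code, and the shape predicate are all hypothetical. Only the four `loop-run-round-…` error codes are taken from this section.

```javascript
'use strict';
const fs = require('node:fs');

// Hypothetical helper: an Error that carries a machine-readable `code`.
function codedError(code, message) {
  const err = new Error(message);
  err.code = code;
  return err;
}

// Assumption: the real shape check is not shown in the ADR. This sketch only
// requires a non-null object (arrays included).
function looksLikeFindings(value) {
  return typeof value === 'object' && value !== null;
}

// Resolve critic findings from exactly one source: inline JSON (legacy) or
// an on-disk report file (Verdict-Only Contract).
function resolveCriticOutputs({ inlineJson, filePath }) {
  const hasInline = inlineJson !== undefined;
  const hasPath = filePath !== undefined;
  if (hasInline && hasPath) {
    throw codedError('loop-run-round-post-critics-conflicting-outputs',
      'pass either inline JSON or a file path, not both');
  }
  if (!hasInline && !hasPath) {
    // Hypothetical code: the ADR does not name the "neither passed" error.
    throw codedError('example-missing-outputs', 'no critic outputs supplied');
  }
  if (hasInline) return JSON.parse(inlineJson);

  let raw;
  try {
    raw = fs.readFileSync(filePath, 'utf-8');
  } catch {
    throw codedError('loop-run-round-critic-outputs-path-unreadable',
      `cannot read ${filePath}`);
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw codedError('loop-run-round-critic-outputs-path-invalid-json',
      `${filePath} is not valid JSON`);
  }
  if (!looksLikeFindings(parsed)) {
    throw codedError('loop-run-round-critic-outputs-path-invalid-shape',
      `${filePath} does not look like a findings payload`);
  }
  return parsed;
}
```

Keeping each failure on its own typed code lets a caller tell a missing file from a malformed one without parsing the message text.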
187
+ **Why a contract, not a tooling hack:** this isn't an output-truncation hack. The findings JSON's *full content* still drives routing — it just travels via filesystem rather than via parent context. Token bill drops by ~95% on the critic axis without any loss of information. Shape, dedup, and fingerprint logic in `mergeCriticOutputs` are untouched.
188
+
189
+ Layer-C audit semantics are unchanged: the orchestrator still calls `loop-audit-tool-use --agent np-critic` after the spawn returns. The audit doesn't care whether the spawn delivered its findings inline or via file.
190
+
191
+ ### Layer L6 — Headless-Subprocess Mode (opt-in)
192
+
193
+ When the runtime's native Agent tool is the wrong shape — e.g. when the parent context has bloated past comfortable limits despite L5, or when a teammate is running on a runtime where each Agent-tool result still costs noticeable cache fragmentation — the orchestrator can route critic and researcher spawns through `bin/np-tools/spawn-headless.cjs` instead. This shells out to `claude -p --output-format json` as a child process; the spawn's conversation lives entirely outside the parent session and only the final-message JSON (the envelope under L5, or whatever the agent emits) is captured to disk.
194
+
195
+ **Config (`.nubos-pilot/config.json`):**
196
+
197
+ ```json
198
+ {
199
+ "spawn": {
200
+ "headless": {
201
+ "enabled": false,
202
+ "agents": ["np-critic", "np-researcher"],
203
+ "timeout_ms": 600000,
204
+ "fallback_on_error": true
205
+ }
206
+ }
207
+ }
208
+ ```
209
+
210
+ `enabled` defaults to `false` so existing installs see no behaviour change. `agents` defaults to the two read-only/output-emitting agents — extending the list to executor-class agents is a Layer-A risk because they mutate the working tree; the list is enforced only by documentation and convention (there is no mechanical guard). `fallback_on_error: true` makes a failed `claude -p` spawn (binary missing, auth failure, timeout) fall back to the runtime's Agent tool; the fallback is stamped on the checkpoint (`nubosloop.spawn_headless_fallbacks[]`) so dashboards can count the fallback rate.
211
+
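The dispatch decision implied by this config can be sketched as follows. This is an illustration only: `resolveHeadlessConfig` and `isHeadlessSpawn` are hypothetical names, and the shallow overlay of user settings on the documented defaults is an assumption about how the config loader merges values.

```javascript
'use strict';

// Defaults as documented in the config block above.
const DEFAULT_HEADLESS = Object.freeze({
  enabled: false,
  agents: Object.freeze(['np-critic', 'np-researcher']),
  timeout_ms: 600000,
  fallback_on_error: true,
});

// Hypothetical helper: overlay the user's `spawn.headless` block on the
// defaults (assumed shallow merge).
function resolveHeadlessConfig(config) {
  const user = (config && config.spawn && config.spawn.headless) || {};
  return { ...DEFAULT_HEADLESS, ...user };
}

// Hypothetical helper: an agent goes headless only when the feature is
// enabled AND the agent name is on the list.
function isHeadlessSpawn(config, agentName) {
  const headless = resolveHeadlessConfig(config);
  return headless.enabled === true
    && Array.isArray(headless.agents)
    && headless.agents.includes(agentName);
}
```

With an empty config, `isHeadlessSpawn({}, 'np-critic')` is `false`, which matches the "no behaviour change for existing installs" default. Note that this sketch, like the ADR, does not mechanically block executor-class agents if someone adds them to the list.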
212
+ **Trade-offs (intentionally accepted):**
213
+
214
+ - **No shared prompt cache with parent.** Each headless spawn cold-loads its system prompt and agent body. Trade is worth it for critic/researcher because they don't share enough prompt prefix with the parent loop to benefit from cache anyway.
215
+ - **Separate auth.** `claude` CLI must be on `$PATH` and authenticated independently. The `NUBOS_PILOT_CLAUDE_BIN` env var overrides the binary path for split-install scenarios.
216
+ - **Cold-start latency.** Each spawn pays the boot cost of a new process. On a 30-task milestone with 3-round loops and two headless spawns per round, this is up to 30 × 3 × 2 = 180 cold starts vs. 0 in the agent-tool path. An acceptable trade for token detach when token cost dominates.
217
+ - **No streaming feedback.** Headless spawns run to completion, then return — interactive features (Agent tool's progress streaming) are not available. This is fine for critics/researchers (which are batch-shaped anyway).
218
+
219
+ **Trust Layer compatibility:** the `loop-audit-tool-use` stamp is identical in both paths. The orchestrator must call it after `spawn-headless` returns, exactly as it does after an Agent-tool spawn. Going headless to bypass the audit is a Layer-C violation by the existing definition; nothing about L6 weakens this.
220
+
221
+ **What L6 deliberately does NOT do:**
222
+
223
+ - It does not headless-spawn the executor. Editing files inside a `claude -p` subprocess works in principle but the file mutations would not surface through the parent runtime's diff/edit telemetry, breaking the Layer-A commit-task gate which depends on the runtime's tracked-write surface.
224
+ - It does not move the audit log. Audits are still appended by the orchestrator (= the parent), not by the subprocess; otherwise a hostile subprocess could fake its own audit.
225
+ - It does not collapse the multiple researcher spawns of the swarm. `swarm.research.k = 3` still produces three independent headless processes; the merge step is unchanged.
226
+
227
+ ### Cost Layer summary
228
+
229
+ | Layer | Where | What it removes | Cost |
230
+ |-------|---------------------------------------------|----------------------------------------------------------------------------|----------------------------------------------------|
231
+ | L5 | `agents/np-critic.md` + `loop-run-round.cjs` | Verbatim findings JSON in parent context every round | Critic now requires `Write` on its report path |
232
+ | L6 | `bin/np-tools/spawn-headless.cjs` + workflow dispatcher | Whole spawn conversation in parent context for critic/researcher | Cold-start per spawn, no shared prompt cache, separate auth |
233
+
234
+ L5 alone is a ~95% reduction on the critic axis with no operational cost; recommended default. L6 stacks on top for installs where parent context is the binding constraint despite L5 — opt-in.
235
+
159
236
  ## More Information
160
237
 
161
238
  * **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
@@ -43,6 +43,17 @@ const DEFAULT_SWARM = Object.freeze({
43
43
 
44
44
  const DEFAULT_AUTO_LOG_LEARNING = true;
45
45
 
46
+ const DEFAULT_SPAWN_HEADLESS = Object.freeze({
47
+ enabled: false,
48
+ agents: Object.freeze(['np-critic', 'np-researcher']),
49
+ timeout_ms: 10 * 60 * 1000,
50
+ fallback_on_error: true,
51
+ });
52
+
53
+ const DEFAULT_SPAWN = Object.freeze({
54
+ headless: DEFAULT_SPAWN_HEADLESS,
55
+ });
56
+
46
57
  const DEFAULT_MODEL_PROFILE = 'frontier';
47
58
  const DEFAULT_SCOPE = 'local';
48
59
  const DEFAULT_RESPONSE_LANGUAGE = 'en';
@@ -55,6 +66,7 @@ const DEFAULT_CONFIG_TREE = Object.freeze({
55
66
  agents: DEFAULT_AGENTS,
56
67
  loop: DEFAULT_LOOP,
57
68
  swarm: DEFAULT_SWARM,
69
+ spawn: DEFAULT_SPAWN,
58
70
  auto_log_learning: DEFAULT_AUTO_LOG_LEARNING,
59
71
  });
60
72
 
@@ -75,6 +87,14 @@ function buildInstallConfig(answers) {
75
87
  critic: { ...DEFAULT_SWARM_CRITIC },
76
88
  knowledge_adapter: DEFAULT_SWARM.knowledge_adapter,
77
89
  },
90
+ spawn: {
91
+ headless: {
92
+ enabled: DEFAULT_SPAWN_HEADLESS.enabled,
93
+ agents: [...DEFAULT_SPAWN_HEADLESS.agents],
94
+ timeout_ms: DEFAULT_SPAWN_HEADLESS.timeout_ms,
95
+ fallback_on_error: DEFAULT_SPAWN_HEADLESS.fallback_on_error,
96
+ },
97
+ },
78
98
  auto_log_learning: DEFAULT_AUTO_LOG_LEARNING,
79
99
  };
80
100
  }
@@ -87,6 +107,8 @@ module.exports = {
87
107
  DEFAULT_SWARM,
88
108
  DEFAULT_SWARM_RESEARCH,
89
109
  DEFAULT_SWARM_CRITIC,
110
+ DEFAULT_SPAWN,
111
+ DEFAULT_SPAWN_HEADLESS,
90
112
  DEFAULT_AUTO_LOG_LEARNING,
91
113
  DEFAULT_MODEL_PROFILE,
92
114
  DEFAULT_SCOPE,
@@ -11,12 +11,28 @@ const DEFAULT_K = 3;
11
11
  const MIN_K = 1;
12
12
  const MAX_K = 5;
13
13
 
14
+ // Perspectival nudges, NOT thematic preferences. Each entry varies HOW the
15
+ // spawn investigates the same `<task_query>`, never WHAT it should prefer in
16
+ // the answer. Thematic seed_deltas (e.g. "prefer native TypeScript types")
17
+ // silently turn the swarm into a topic-split — three spawns each ranking a
18
+ // different axis, intersection ≈ 0, consensus a fiction. ADR-0011 §Spawn
19
+ // Contract is explicit: identical task_query for every spawn, only the
20
+ // seed_delta varies, and the variation must not change WHICH question is
21
+ // answered or which solution dimension is favoured.
22
+ //
23
+ // Litmus test for adding a new entry: rephrase as "what does this researcher
24
+ // optimise FOR in their final answer?" — if the answer names a concrete
25
+ // solution attribute (TypeScript, smallest deps, latest version), it is
26
+ // thematic and belongs in the planner / architect, not the researcher swarm.
27
+ // Perspectival nudges answer "how does this researcher arrive at the
28
+ // answer?" — methodology, evidence weighting, contrarian stance, breadth vs.
29
+ // depth, gap surfacing.
14
30
  const SEED_DELTAS = [
15
- 'Prefer authoritative sources over training data when they conflict.',
16
- 'Prefer the smallest dependency surface that satisfies the requirement.',
17
- 'Prefer libraries that ship native TypeScript types and ESM by default.',
18
- 'Prefer architectures that compose with the project\'s existing module boundaries over greenfield rewrites.',
19
- 'Prefer documented, observable failure modes over silent fallbacks.',
31
+ 'Treat training-data recall as a hypothesis to verify against primary documentation; downgrade unverified claims to LOW confidence.',
32
+ 'Survey breadth-first before narrowing: enumerate every viable option you find, even ones that look obviously inferior, before recommending.',
33
+ 'Be contrarian: assume the obvious recommendation is wrong and justify whether it actually is. If it survives the challenge, your confidence is higher.',
34
+ 'Surface unknowns explicitly: anything you cannot verify becomes an Open Question, not an [ASSUMED] filled with a plausible default.',
35
+ 'Stress-test the leading recommendation: name the most plausible failure mode that would make it the wrong choice, then assess how likely that mode is in scope.',
20
36
  ];
21
37
 
22
38
  // Invariant — SEED_DELTAS must cover the full clamp range. The previous
package/np-tools.cjs CHANGED
@@ -103,6 +103,7 @@ const topLevelCommands = {
103
103
  'loop-audit-tool-use': require('./bin/np-tools/loop-audit-tool-use.cjs'),
104
104
  'loop-stuck': require('./bin/np-tools/loop-stuck.cjs'),
105
105
  'loop-metrics': require('./bin/np-tools/loop-metrics.cjs'),
106
+ 'spawn-headless': require('./bin/np-tools/spawn-headless.cjs'),
106
107
  'learning-log': require('./bin/np-tools/learning-log.cjs'),
107
108
  'learning-match': require('./bin/np-tools/learning-match.cjs'),
108
109
  'learning-list': require('./bin/np-tools/learning-list.cjs'),
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "nubos-pilot",
3
- "version": "1.0.1",
3
+ "version": "1.0.3",
4
4
  "description": "AI-driven planning and execution tool for code projects",
5
5
  "homepage": "https://github.com/Nubos-AI/nubos-pilot",
6
6
  "repository": {
@@ -121,8 +121,8 @@ Every task runs through the **Nubosloop** ([ADR-0010](../docs/adr/0010-nubosloop
121
121
  2. **Researcher swarm (on cache miss, or on `next_action=researcher` re-route)** — orchestrator spawns `swarm.research.k=3` independent `np-researcher` agents IN PARALLEL (single message, three Agent blocks) and merges their outputs through `lib/researcher-swarm.cjs::mergeConsensus` (majority / union / intersection). The merged consensus enters the Executor prompt with provenance.
122
122
  3. **Executor (R1) or Build-Fixer (R≥2)** — single LLM spawn. Round 1 spawns `agents/np-executor.md`. Round ≥ 2 spawns `agents/np-build-fixer.md` with prior critic findings + verify output appended. Edits ONLY paths in `files_modified` (D-04 — no scope expansion). Does NOT call `commit-task`.
123
123
  4. **Mechanical Checks (orchestrator, NOT the agent)** — run task's `<verify>` command + stack linters (`phpstan`, `pint`, `tsc`, `eslint`); capture exit code + output to `$VERIFY_LOG`. Then `loop-audit-tool-use --task-id ... --round ...` confirms the spawn invoked `search-knowledge` or `match-existing-learning` ≥ 1× (Rule 9). Audit findings get round-stamped and feed `loop-evaluate` alongside critic findings. Then call `loop-run-round --phase post-executor --verify-exit-code "$VERIFY_EXIT" --verify-output-path "$VERIFY_LOG"`. On verify-red the verb returns `next_action: spawn-build-fixer` — skip critics, advance to next round directly.
124
- 5. **Critic (verify-green only)** — one Critic agent spawns: `agents/np-critic.md` (sonnet). It emits a single structured findings JSON covering all three axes (style, tests, acceptance). Single-critic revision per ADR-0010 §Trust Layer 2026-05-05 — the prior 3-critic schwarm collapsed because three parallel spawns added latency without proportional finding-quality gains.
125
- 6. **Route** — `loop-run-round --phase post-critics --critic-outputs "$CRITIC_JSON"` returns `next_action ∈ {commit, executor, researcher, askuser, plan-checker, stuck}`:
124
+ 5. **Critic (verify-green only)** — one Critic agent spawns: `agents/np-critic.md` (sonnet). It writes the full findings JSON to `$CRITIC_REPORT_PATH` and emits a small verdict envelope as its final message (ADR-0010 §L5 Verdict-Only Contract, 2026-05-05). Single-critic revision per §Trust Layer 2026-05-05 — the prior 3-critic swarm was collapsed into one critic because three parallel spawns added latency without proportional finding-quality gains; the Verdict-Only Contract on top reduces per-round main-context tokens by an order of magnitude (verbatim findings reports were the dominant Nubosloop cost driver).
125
+ 6. **Route** — `loop-run-round --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH"` (or legacy `--critic-outputs "$CRITIC_JSON"` when the Verdict-Only Contract is unavailable) returns `next_action ∈ {commit, executor, researcher, askuser, plan-checker, stuck}`:
126
126
 
127
127
  | `next_action` | Trigger | Action |
128
128
  |------------------|------------------------------------|-----------------------------------------------------------------|
@@ -152,8 +152,56 @@ SWARM_K=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.k 2>/dev/
152
152
  SWARM_THRESHOLD=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.threshold 2>/dev/null || echo 0.9)
153
153
  SWARM_MIN_OCC=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.minOccurrence 2>/dev/null || echo 3)
154
154
  AUTO_LOG_LEARNING=$(node .nubos-pilot/bin/np-tools.cjs config-get auto_log_learning 2>/dev/null || echo true)
155
+ SPAWN_HEADLESS_ENABLED=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.enabled 2>/dev/null || echo false)
156
+ SPAWN_HEADLESS_AGENTS=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.agents 2>/dev/null || echo '["np-critic","np-researcher"]')
157
+ SPAWN_HEADLESS_FALLBACK=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.fallback_on_error 2>/dev/null || echo true)
155
158
  ```
156
159
 
160
+ ## Spawn dispatch — agent-tool vs. headless subprocess (ADR-0010 §L6)
161
+
162
+ By default, `np-researcher` and `np-critic` spawns go through the runtime's
163
+ native Agent tool — the parent context picks up the spawn's final message as a
164
+ tool result. When `spawn.headless.enabled=true` AND the agent name appears in
165
+ `spawn.headless.agents`, the orchestrator instead shells out to
166
+ `node .nubos-pilot/bin/np-tools.cjs spawn-headless --agent <name> ...`, which
167
+ runs the agent inside an isolated `claude -p` subprocess. The subprocess'
168
+ final-message is captured to disk; the parent context only sees an exit code
169
+ plus the path. This buys true context detach for the verbose-but-bounded
170
+ critic/researcher passes — at the cost of an own prompt cache, separate auth,
171
+ and a cold-start per spawn.
172
+
173
+ **Dispatch helper (use at every np-researcher / np-critic spawn point):**
174
+
175
+ ```bash
176
+ _spawn_dispatch_is_headless() {
177
+ local agent="$1"
178
+ [ "$SPAWN_HEADLESS_ENABLED" = "true" ] || return 1
179
+ echo "$SPAWN_HEADLESS_AGENTS" | node -e \
180
+ "let l=''; process.stdin.on('data',d=>l+=d); process.stdin.on('end',()=>{
181
+ try { const arr = JSON.parse(l); process.exit(arr.includes(process.argv[1]) ? 0 : 1); }
182
+ catch (e) { process.exit(1); }
183
+ })" "$agent"
184
+ }
185
+ ```
186
+
187
+ For each headless spawn the orchestrator (a) writes the rendered prompt to
188
+ `${TMPDIR:-/tmp}/nubos-pilot/prompts/<agent>-<task-id>-r<round>.md`,
189
+ (b) calls `spawn-headless --agent <name> --prompt-path … --output-path …`,
190
+ (c) on non-zero exit AND `spawn.headless.fallback_on_error=true`, falls back to
191
+ the regular agent-tool spawn. Falling back is logged on the checkpoint
192
+ (`spawn_headless_fallbacks[]`) so the fallback rate is visible on
193
+ `/np:dashboard`. **The Layer-C `loop-audit-tool-use` stamp is identical for
194
+ both paths** — it is the orchestrator's responsibility to call it after the
195
+ spawn returns, regardless of whether the spawn went through the agent tool or
196
+ the headless subprocess. Bypassing the audit by going headless is a Layer-C
197
+ violation by the same definition as before.
198
+
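The headless-then-fallback flow described above can be sketched as one function. This is a sketch under assumptions, not the orchestrator's actual code: `spawnWithFallback` is a hypothetical name, `runHeadless` and `runAgentTool` are injected stand-ins for the two spawn paths rather than real nubos-pilot APIs, and the shape of each `spawn_headless_fallbacks[]` entry is invented for illustration.

```javascript
'use strict';

// Hypothetical orchestration helper. Both spawn functions are injected so the
// sketch stays self-contained; each is assumed to return { exitCode, ... }.
function spawnWithFallback({ agent, fallbackOnError, runHeadless, runAgentTool, checkpoint }) {
  const headless = runHeadless(agent);
  if (headless.exitCode === 0) {
    return { path: 'headless', result: headless };
  }
  if (!fallbackOnError) {
    // No fallback configured: surface the failed headless result as-is.
    return { path: 'headless-failed', result: headless };
  }
  // Record the fallback on the checkpoint so the fallback rate stays visible.
  // Assumption: the entry shape below is illustrative only.
  if (!Array.isArray(checkpoint.spawn_headless_fallbacks)) {
    checkpoint.spawn_headless_fallbacks = [];
  }
  checkpoint.spawn_headless_fallbacks.push({ agent, exit_code: headless.exitCode });
  return { path: 'agent-tool', result: runAgentTool(agent) };
}
```

The audit stamp is deliberately left out of the helper: whichever path is taken, the caller still has to run `loop-audit-tool-use` after the spawn returns.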
199
+ `np-executor` and `np-build-fixer` are NEVER eligible for headless spawn —
200
+ they edit files in the working tree and depend on the parent runtime's file
201
+ write semantics. `spawn.headless.agents` defaults to `['np-critic','np-researcher']`
202
+ for exactly this reason; do not extend it without understanding which agents
203
+ mutate the working tree.
204
+
157
205
  **Per-task max-rounds override (T3, ADR-0010 Trust-Layer):** before entering the per-task while-loop, check the task's checkpoint for a `max_rounds_override` (set when the operator answered the stuck-dialog with "Weitermachen +5 Runden"). If present, it beats the config default — both for the bash while-cap and for the `post-critics` `evaluateLoop` cap.
158
206
 
159
207
  ```bash
@@ -313,11 +361,20 @@ for WAVE_INDEX in 0 1 2 ...; do
313
361
  continue
314
362
  fi
315
363
 
316
- # === Step 5: Critic — one agent, all three axes ===
364
+ # === Step 5: Critic — one agent, all three axes (Verdict-Only Contract, ADR-0010 §L5) ===
365
+ # The orchestrator pre-creates the report directory and hands the path to
366
+ # the spawn. The critic Writes the full findings JSON to that path and
367
+ # emits a tiny envelope (~150 bytes) as its final message — the verbose
368
+ # findings/criteria payload never enters the parent context. This is the
369
+ # main token-cost lever in ADR-0010; see §L5.
370
+ mkdir -p "${TMPDIR:-/tmp}/nubos-pilot/critic-reports"
371
+ CRITIC_REPORT_PATH="${TMPDIR:-/tmp}/nubos-pilot/critic-reports/critic-${TASK_ID}-r${ROUND}.json"
372
+
317
373
  # Single LLM spawn (sonnet by default — see swarm.critic.tier in config):
318
- # - agents/np-critic.md → CRITIC_OUTPUT_JSON
374
+ # - agents/np-critic.md → writes $CRITIC_REPORT_PATH, returns envelope
319
375
  # The orchestrator injects the three audit-surface modules into the
320
- # spawn's <files_to_read> block np-critic is thin (role, output schema,
376
+ # spawn's <files_to_read> block AND hands the agent <report_path> as a
377
+ # required spawn input — np-critic is thin (role, output schema,
321
378
  # trust-layer rules) and treats the three modules as canonical
322
379
  # audit-truth (categories, severity rubric, stop-conditions per axis):
323
380
  #
@@ -330,9 +387,15 @@ for WAVE_INDEX in 0 1 2 ...; do
330
387
  # - agents/np-critic-tests.md (Tests axis module)
331
388
  # - agents/np-critic-acceptance.md (Acceptance axis module)
332
389
  # </files_to_read>
390
+ # <report_path>$CRITIC_REPORT_PATH</report_path>
391
+ #
392
+ # Final-message shape from the spawn (verbatim, no markdown wrapper):
393
+ # { critic, task_id, round, verdict, blockers_count, report_path, run_id }
333
394
  #
334
- # The critic emits ONE merged JSON covering all three axes.
335
- CRITIC_OUTPUTS_JSON=$(printf '[%s]' "$CRITIC_OUTPUT_JSON")
395
+ # The orchestrator does NOT need to parse the envelope to drive routing:
396
+ # loop-run-round --phase post-critics --critic-outputs-path reads the
397
+ # full file directly. Envelope fields are surfaced on np:dashboard for
398
+ # at-a-glance triage (verdict + blockers_count per task).
336
399
 
337
400
  # === Step 5b: Stamp critic spawn-evidence ===
338
401
  # MANDATORY — without this stamp, post-critics refuses with
@@ -346,8 +409,15 @@ for WAVE_INDEX in 0 1 2 ...; do
346
409
  node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic --tool-use-log '[]'
347
410
 
348
411
  # === Step 6: Route via loop-evaluate (post-critics) ===
412
+ # Verdict-Only Contract (ADR-0010 §L5): pass --critic-outputs-path so the
413
+ # full findings JSON is read directly from disk. The envelope from the
414
+ # spawn's final message is NOT what loop-evaluate consumes; it routes on
415
+ # the on-disk findings/criteria payload. The legacy --critic-outputs
416
+ # inline form is still accepted for runtimes without Write capability or
417
+ # for migration fixtures (`--force-post-critics` overrides the audit gate
418
+ # the same way it always has).
349
419
  POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
350
- --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
420
+ --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH")
351
421
  NEXT_ACTION=$(echo "$POST_CRIT" | node -e 'process.stdin.on("data",d=>console.log(JSON.parse(d).next_action))')
352
422
 
353
423
  case "$NEXT_ACTION" in
@@ -395,17 +465,17 @@ for WAVE_INDEX in 0 1 2 ...; do
395
465
  case "$PLAN_ASK" in
396
466
  "Plan neu prüfen"*)
397
467
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
398
- --phase stuck --reason "user-requested-replan" --findings "$CRITIC_OUTPUTS_JSON"
468
+ --phase stuck --reason "user-requested-replan" --findings-path "$CRITIC_REPORT_PATH"
399
469
  echo "[np:execute-phase] $TASK_ID flagged for plan-checker. Run /np:plan-phase $PHASE --repromote, then re-run /np:execute-phase $PHASE." >&2
400
470
  exit 4 ;;
401
471
  "Task als stuck"*)
402
472
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
403
- --phase stuck --reason "plan-checker-user-stuck" --findings "$CRITIC_OUTPUTS_JSON"
473
+ --phase stuck --reason "plan-checker-user-stuck" --findings-path "$CRITIC_REPORT_PATH"
404
474
  echo "[np:execute-phase] $TASK_ID marked stuck (user choice from plan-checker dialog)." >&2
405
475
  exit 3 ;;
406
476
  "Manuell fixen"*)
407
477
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
408
- --phase stuck --reason "manual-fix-pending" --findings "$CRITIC_OUTPUTS_JSON"
478
+ --phase stuck --reason "manual-fix-pending" --findings-path "$CRITIC_REPORT_PATH"
409
479
  echo "[np:execute-phase] $TASK_ID paused for manual fix. Resume via /np:execute-phase $PHASE when ready." >&2
410
480
  exit 0 ;;
411
481
  esac ;;
@@ -436,17 +506,17 @@ for WAVE_INDEX in 0 1 2 ...; do
436
506
  continue ;;
437
507
  "Task neu planen"*)
438
508
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
439
- --phase stuck --reason "user-requested-replan" --findings "$CRITIC_OUTPUTS_JSON"
509
+ --phase stuck --reason "user-requested-replan" --findings-path "$CRITIC_REPORT_PATH"
440
510
  echo "[np:execute-phase] $TASK_ID flagged for plan-checker. Run /np:plan-phase $PHASE --repromote, then re-run /np:execute-phase $PHASE." >&2
441
511
  exit 4 ;;
442
512
  "Task als stuck"*)
443
513
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
444
- --phase stuck --reason "max-rounds-user-stuck" --findings "$CRITIC_OUTPUTS_JSON"
514
+ --phase stuck --reason "max-rounds-user-stuck" --findings-path "$CRITIC_REPORT_PATH"
445
515
  echo "[np:execute-phase] $TASK_ID marked stuck after $LOOP_MAX_ROUNDS rounds (user choice)." >&2
446
516
  exit 3 ;;
447
517
  "Manuell fixen"*)
448
518
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
449
- --phase stuck --reason "manual-fix-pending" --findings "$CRITIC_OUTPUTS_JSON"
519
+ --phase stuck --reason "manual-fix-pending" --findings-path "$CRITIC_REPORT_PATH"
450
520
  echo "[np:execute-phase] $TASK_ID paused for manual fix. Resume via /np:execute-phase $PHASE when ready." >&2
451
521
  exit 0 ;;
452
522
  esac ;;
@@ -526,7 +596,9 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
526
596
  - Start one checkpoint per task before kicking off the loop.
527
597
  - Run `loop-run-round --phase preflight` BEFORE every Round-1 executor spawn — never skip the cache lookup.
528
598
  - Spawn `agents/np-executor.md` on Round 1, `agents/np-build-fixer.md` on Round ≥ 2 — once per round, with only that task's `files_modified` in scope (D-04, no scope expansion).
529
- - Spawn the single Critic agent (`np-critic`) once per round, after a verify-green post-executor. It emits one JSON covering style + tests + acceptance.
599
+ - Spawn the single Critic agent (`np-critic`) once per round, after a verify-green post-executor. It writes the full findings JSON to `$CRITIC_REPORT_PATH` and emits a small verdict envelope as its final message (ADR-0010 §L5 Verdict-Only Contract).
600
+ - Pre-create `${TMPDIR:-/tmp}/nubos-pilot/critic-reports/` before the critic spawn so the agent's `Write` cannot fail on a missing parent directory.
601
+ - Pass `--critic-outputs-path "$CRITIC_REPORT_PATH"` to `loop-run-round --phase post-critics` so the full findings JSON is read from disk rather than replayed through the spawn's final message.
530
602
  - Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
531
603
  - Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
532
604
  - Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for `np-critic` this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). After the Single-Critic Revision (ADR-0010, 2026-05-05) the per-round audit count is **two** in rounds ≥ 2 (`np-build-fixer` + `np-critic`) and **`swarm.research.k` + 2** in round 1 (k × `np-researcher` + `np-executor` + `np-critic`). All audits in the active round are mandatory before the corresponding `loop-run-round --phase post-{researcher|executor|critics}` invocation.
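The three added bullets in this hunk (write findings to `$CRITIC_REPORT_PATH`, pre-create the report directory, pass `--critic-outputs-path` to the post-critics phase) can be sketched as below. This is a minimal illustration rather than the package's own code: the default `TASK_ID`, the `ROUND` variable, the `CRITIC_REPORT_DIR` name, and the report file-name scheme are assumptions, and the `np-tools.cjs` call is guarded so the snippet is harmless outside a nubos-pilot checkout.

```shell
# Hypothetical values for illustration; the real workflow sets these itself.
TASK_ID="${TASK_ID:-T-001}"
ROUND="${ROUND:-1}"

# Pre-create the report directory so the critic's Write cannot fail on a
# missing parent directory (directory path as named in the bullet above).
CRITIC_REPORT_DIR="${TMPDIR:-/tmp}/nubos-pilot/critic-reports"
mkdir -p "$CRITIC_REPORT_DIR"

# Assumed file-name scheme; the diff does not show how the workflow names it.
CRITIC_REPORT_PATH="$CRITIC_REPORT_DIR/critic-$TASK_ID-r$ROUND.json"

# ... spawn np-critic here; it writes the full findings JSON to
# $CRITIC_REPORT_PATH and returns only a small verdict envelope ...

# Hand the path, not the JSON itself, to the post-critics phase.
if [ -f "$CRITIC_REPORT_PATH" ] && [ -f .nubos-pilot/bin/np-tools.cjs ]; then
  node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
    --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH"
else
  echo "skipping post-critics: report file or np-tools.cjs not found" >&2
fi
```

Passing a path keeps the findings JSON out of the spawn's final message, which is the cost-control goal the bullets attribute to ADR-0010 §L5.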
@@ -540,6 +612,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
540
612
  - Spawn the Critic agent BEFORE the post-executor verify-green check — verify must pass first; the critic only runs on verify-green.
541
613
  - Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
542
614
  - Skip `loop-audit-tool-use` for ANY spawn (researcher / executor / build-fixer / `np-critic`). Skipping the executor audit silences Rule 9; skipping the critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning the real `np-critic` agent is the canonical bypass — Layer C blocks it mechanically.
615
+ - Bypass the Verdict-Only Contract by inlining the full findings JSON in the spawn's final message or by reconstructing `$CRITIC_REPORT_PATH` content from the envelope. Both defeat the cost-control purpose of ADR-0010 §L5; the critic is required to `Write` the findings file itself, and the orchestrator is required to read that file via `--critic-outputs-path` rather than the envelope.
543
616
  - Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
544
617
  - Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
545
618
  - Bundle two tasks into one commit (ADR-0004 atomicity).