npm - nubos-pilot - Versions diffs - 1.0.1 → 1.0.3 - Mend

nubos-pilot 1.0.1 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/agents/np-critic.md +39 -13
package/bin/np-tools/_commands.cjs +1 -0
package/bin/np-tools/loop-commands.test.cjs +152 -0
package/bin/np-tools/loop-run-round.cjs +77 -7
package/bin/np-tools/spawn-headless.cjs +188 -0
package/bin/np-tools/spawn-headless.test.cjs +196 -0
package/docs/adr/0010-nubosloop.md +77 -0
package/lib/config-defaults.cjs +22 -0
package/lib/researcher-swarm.cjs +21 -5
package/np-tools.cjs +1 -0
package/package.json +1 -1
package/workflows/execute-phase.md +88 -15

package/agents/np-critic.md CHANGED

@@ -1,8 +1,8 @@
 ---
 name: np-critic
-description: Nubosloop critic for the per-task adversarial review. Spawned ONCE after np-executor (or np-build-fixer) commits a draft. Read-only on source. Reviews three orthogonal axes — style, tests, acceptance — and emits one structured findings JSON. ADR-0010 (single-critic revision 2026-05-05).
+description: Nubosloop critic for the per-task adversarial review. Spawned ONCE after np-executor (or np-build-fixer) commits a draft. Read-only on source, write-allowed for the critic-report file the orchestrator hands it. Reviews three orthogonal axes — style, tests, acceptance — writes the full findings JSON to disk and emits a tiny verdict envelope. ADR-0010 §Verdict-Only Contract (2026-05-05).
 tier: sonnet
-tools: Read, Bash, Grep, Glob
+tools: Read, Write, Bash, Grep, Glob
 color: "#A855F7"
 ---
@@ -46,6 +46,7 @@ The orchestrator provides these paths in your prompt context. Read every path it
 | Executor diff (required) | The patch produced this round. | inline / captured in checkpoint |
 | Verify output (required) | stdout/stderr of the task's verify command. | inline |
 | Files modified (required) | Paths the executor was scoped to. | task plan frontmatter `files_modified` |
+| **Report path (required, ADR-0010 §L5)** | The path where you `Write` the full findings JSON. The orchestrator pre-creates the parent directory; you only need to `Write`. | `.nubos-pilot/.tmp/<run-id>/critic-<task-id>-r<round>.json` |
 | Codebase docs (recommended) | `.nubos-pilot/codebase/<module>.md` for the touched modules — invariants and gotchas. | `.nubos-pilot/codebase/` |
 ## Audit Surface — three axis modules (load BEFORE auditing)
@@ -62,9 +63,13 @@ You produce ONE merged findings JSON covering ALL three axes — see Output Sche
 If any of the three module files cannot be read, emit `category: critic-error` with `remediation: "missing critic module file: <path>"` and route to `stuck` — the orchestrator must inject all three.
-## Output Schema
+## Output Schema — Verdict-Only Contract (ADR-0010 §L5, 2026-05-05)
-Emit a single JSON object as your final response (no prose, no markdown wrapper around it).
+You emit your audit in **two artefacts**: the full findings JSON gets `Write`-n to a path the orchestrator hands you, and your spawn's final response is a tiny envelope. This keeps the parent context lean — verbatim multi-kB findings reports were the dominant Nubosloop token sink before this revision.
+### Step 1 — write the full report to disk
+The orchestrator passes a `<report_path>` value in your spawn prompt (typically `.nubos-pilot/.tmp/<run-id>/critic-<task-id>-r<round>.json`). Use `Write` to emit this object verbatim into that path:
 ```json
 {
@@ -96,11 +101,29 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
 }
 ```
-`verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`.
+The full-report shape is unchanged from the legacy contract — `lib/nubosloop.cjs::mergeCriticOutputs` reads this file directly via `loop-run-round --phase post-critics --critic-outputs-path`. Five-field routing contract (`category`, `severity`, `file`, `line`, `remediation`) is unchanged; auto-promotion of `Unsatisfied`/`Information-Missing` criteria is unchanged.
+### Step 2 — emit the verdict envelope as your final response
+After the `Write` succeeds, your spawn's final response — the message that lands in the orchestrator's context — is a **single small JSON object**, no prose, no markdown wrapper:
+```json
+{
+  "critic": "critic",
+  "task_id": "M001-S001-T0001",
+  "round": 1,
+  "verdict": "passed | issues_found",
+  "blockers_count": 0,
+  "report_path": ".nubos-pilot/.tmp/<run-id>/critic-M001-S001-T0001-r1.json",
+  "run_id": "<run-id>"
+}
+```
+`verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`. `blockers_count` is the count of findings with `severity == "fail"` plus criteria with verdict `Unsatisfied` (so the orchestrator can sort tasks for triage without reading the full file). `report_path` is the literal path you wrote — verbatim from the orchestrator's `<report_path>` input.
-**Routing-engine contract.** `lib/nubosloop.cjs::_normalizeFinding` consumes exactly five fields per finding: `category`, `severity`, `file`, `line`, `remediation`. Every other field (`id`, `criterion_id`, `question_to_user`, etc.) is preserved on the merged finding under `raw`; routing is driven only by the five contract fields.
+If `<report_path>` is missing from your prompt or you cannot write the file, do NOT silently fall back to inline JSON — that defeats the cost-control purpose of this contract. Emit a single envelope with `verdict: "issues_found"`, `blockers_count: 1`, `report_path: null`, and an inline `error` field describing the cause; the orchestrator routes that to `critic-error → stuck`.
-**Note on auto-promotion.** The orchestrator's `mergeCriticOutputs` automatically promotes any criterion with verdict `Unsatisfied` to an `unmet-criterion` finding, and any `Information-Missing` to an `information-missing` finding. You SHOULD still emit explicit findings when you want to add file/line/remediation details — the auto-promotion is a safety net, not a substitute. Identical findings are deduplicated by fingerprint.
+**Why two artefacts.** The full findings JSON is several kB on a typical adversarial review (one paragraph per finding × N findings + per-criterion evidence sentences). Returning that as the spawn's final message replays it into the parent's history every round. The envelope is ~150 bytes — the orchestrator only reads the file when post-critics actually needs to route findings.
 ## Scope Guardrail
@@ -110,19 +133,22 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
 - Cite file, line, and concrete remediation per finding — not vague gripes.
 - Cite passing test names from the verify output as `Satisfied` evidence.
 - Mark infra failures `Information-Missing`, never `Unsatisfied`.
-- Emit one JSON object only — no prose wrapper, no markdown fence.
+- `Write` the full findings JSON to the orchestrator-supplied `<report_path>` BEFORE emitting your final-message envelope.
+- Final message = the small verdict envelope only. No prose, no markdown fence, no inline findings array.
 **Don't:**
-- Edit source — you are read-only.
+- Edit source — `Write` is allowed ONLY for the `<report_path>` the orchestrator hands you. Touching anything else is a Layer-A bypass.
 - Spawn other agents — you finish your audit and return.
 - Skip an axis "because the diff looks small". A small diff with no tests is a `missing-test` finding.
 - Pass with reservations — verdict is binary (`passed` or `issues_found`); reservations belong in findings.
 - Refuse to surface findings because "the executor will fix them anyway" — surface them, the loop closes them.
+- Inline the full findings JSON in the final message. The Verdict-Only Contract exists because that response replays into the orchestrator's context every round and is the dominant token sink — defeating it silently re-introduces the cost ADR-0010 §L5 was designed to remove.
 </scope_guardrail>
 ## Stop Conditions
-Hard-stop (return findings + verdict; do NOT attempt recovery):
-- The task plan has no `<success_criteria>` block — emit a single `unmet-criterion` finding pointing at this gap and route to plan-checker.
-- The Critic budget (timeout) is exhausted — emit collected criteria + findings + verdict `issues_found`.
-- The diff is unparseable / files are missing → emit `category: critic-error` and route to stuck.
+Hard-stop (`Write` the full findings JSON to `<report_path>` if possible, then emit the envelope; do NOT attempt recovery):
+- The task plan has no `<success_criteria>` block — emit a single `unmet-criterion` finding pointing at this gap and route to plan-checker. Envelope `verdict: "issues_found"`, `blockers_count: 1`.
+- The Critic budget (timeout) is exhausted — emit collected criteria + findings + verdict `issues_found`. Envelope reflects the partial report.
+- The diff is unparseable / files are missing → emit `category: critic-error` and route to stuck. Envelope `verdict: "issues_found"`, `blockers_count: 1`.
+- `<report_path>` is missing from the prompt OR `Write` to it fails → emit envelope with `report_path: null`, `verdict: "issues_found"`, `blockers_count: 1`, and an `error` field describing the cause. Routing engine treats this as `critic-error → stuck`.
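
Note on the envelope fields: the rules for `verdict` and `blockers_count` in the new np-critic.md text can be sketched as a small function. This is only an illustration, not code from the package. `buildEnvelope` is a hypothetical name, the per-criterion `verdict` field is an assumption (the hunk elides the full report schema), and the `warn` severity and `run-1` run id are made-up sample values.

```javascript
'use strict';

// Hypothetical helper: derive the small verdict envelope from a full findings report.
// Assumption: each entry in report.criteria stores its status in a `verdict` field.
function buildEnvelope(report, reportPath, runId) {
  const findings = Array.isArray(report.findings) ? report.findings : [];
  const criteria = Array.isArray(report.criteria) ? report.criteria : [];

  // `passed` only when every criterion is Satisfied AND there are no findings.
  const allSatisfied = criteria.every((c) => c.verdict === 'Satisfied');
  const verdict = allSatisfied && findings.length === 0 ? 'passed' : 'issues_found';

  // Blockers = findings with severity "fail" + criteria marked Unsatisfied.
  const blockersCount =
    findings.filter((f) => f.severity === 'fail').length +
    criteria.filter((c) => c.verdict === 'Unsatisfied').length;

  return {
    critic: 'critic',
    task_id: report.task_id,
    round: report.round,
    verdict,
    blockers_count: blockersCount,
    report_path: reportPath,
    run_id: runId,
  };
}

// Sample report; the `warn` severity is an illustrative non-fail value.
const sampleReport = {
  critic: 'critic',
  task_id: 'M001-S001-T0001',
  round: 1,
  criteria: [
    { id: 'C1', verdict: 'Satisfied' },
    { id: 'C2', verdict: 'Unsatisfied' },
  ],
  findings: [
    { category: 'todo-marker', severity: 'fail', file: 'a.ts', line: 4, remediation: 'remove TODO' },
    { category: 'missing-test', severity: 'warn', file: 'a.ts', line: 1, remediation: 'add a test' },
  ],
};

const envelope = buildEnvelope(
  sampleReport,
  '.nubos-pilot/.tmp/run-1/critic-M001-S001-T0001-r1.json',
  'run-1',
);
console.log(JSON.stringify(envelope));
```

In this sample one `fail` finding and one `Unsatisfied` criterion give `blockers_count: 2`, and the verdict is `issues_found` because a finding exists at all, regardless of its severity.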

package/bin/np-tools/_commands.cjs CHANGED

@@ -86,6 +86,7 @@ const COMMANDS = [
   { name: 'loop-run-round',          category: 'Execution', description: 'Drive the per-task Nubosloop state machine — phases: preflight | post-executor | post-critics | commit | stuck', description_de: 'Treibt die Per-Task Nubosloop-State-Machine — Phasen: preflight | post-executor | post-critics | commit | stuck' },
   { name: 'loop-audit-tool-use',     category: 'Execution', description: 'Record/read the tool-use audit per spawn (Completeness Rule 9 mechanical check)', description_de: 'Tool-use Audit pro Spawn schreiben/lesen (Completeness Rule 9 mechanische Prüfung)' },
   { name: 'loop-stuck',              category: 'Execution', description: 'Mark a task as stuck (writes loop-state + flips checkpoint status to stuck)', description_de: 'Markiert Task als stuck (schreibt Loop-State + setzt Checkpoint-Status auf stuck)' },
+  { name: 'spawn-headless',          category: 'Execution', description: 'Spawn an agent as a headless `claude -p` subprocess (ADR-0010 §L6); writes stdout to --output-path and returns exit code', description_de: 'Spawnt einen Agent als headless `claude -p` Subprozess (ADR-0010 §L6); schreibt stdout nach --output-path und liefert Exit-Code' },
   { name: 'loop-metrics',            category: 'Utility',   description: 'Aggregate Nubosloop telemetry across all checkpoints (commits, stuck, route distribution)', description_de: 'Aggregiert Nubosloop-Telemetrie über alle Checkpoints (Commits, Stuck, Routing)' },
   { name: 'learning-log',            category: 'Execution', description: 'Persist a learning to the local store (or MCP adapter when configured)', description_de: 'Persistiert ein Learning im lokalen Store (oder MCP-Adapter falls konfiguriert)' },
   { name: 'learning-match',          category: 'Utility',   description: 'Query the learnings store for cached patterns matching a free-text query', description_de: 'Fragt den Learnings-Store nach Cached-Patterns ab' },

package/bin/np-tools/loop-commands.test.cjs CHANGED

@@ -1421,3 +1421,155 @@ test('LCLI-20: learning-log payload carries fingerprint + was_new + occurrence',
   assert.equal(out2.occurrence, 2);
   assert.equal(out2.fingerprint, out1.fingerprint);
 });
+// ADR-0010 §L5 Verdict-Only Contract — post-critics reads findings from disk.
+test('LCLI-RR-L5-1: post-critics --critic-outputs-path reads file and routes commit on zero findings', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const reportDir = path.join(r, '.nubos-pilot', '.tmp');
+  fs.mkdirSync(reportDir, { recursive: true });
+  const reportPath = path.join(reportDir, 'critic-r1.json');
+  fs.writeFileSync(reportPath, JSON.stringify({
+    critic: 'critic', task_id: 'M001-S001-T0001', round: 1,
+    criteria: [], findings: [], verdict: 'passed',
+  }), 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-critics',
+      '--critic-outputs-path', path.relative(r, reportPath)],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'commit');
+  assert.equal(out.findings.length, 0);
+});
+test('LCLI-RR-L5-2: post-critics --critic-outputs-path with single object (not array) is wrapped', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const reportPath = path.join(r, 'critic-r1.json');
+  fs.writeFileSync(reportPath, JSON.stringify({
+    critic: 'critic', task_id: 'M001-S001-T0001', round: 1,
+    findings: [{ category: 'todo-marker', severity: 'fail', file: 'a.ts', line: 4, remediation: 'remove TODO' }],
+    criteria: [], verdict: 'issues_found',
+  }), 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'critic-r1.json'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.next_action, 'executor');
+  assert.equal(out.findings.length, 1);
+  assert.equal(out.findings[0].category, 'todo-marker');
+});
+test('LCLI-RR-L5-3: post-critics rejects both --critic-outputs and --critic-outputs-path', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const reportPath = path.join(r, 'critic-r1.json');
+  fs.writeFileSync(reportPath, '[]', 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics',
+        '--critic-outputs', '[]',
+        '--critic-outputs-path', 'critic-r1.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'loop-run-round-post-critics-conflicting-outputs',
+  );
+});
+test('LCLI-RR-L5-4: post-critics --critic-outputs-path rejects path traversal outside cwd', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', '/etc/passwd'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'loop-run-round-critic-outputs-path-traversal',
+  );
+});
+test('LCLI-RR-L5-5: post-critics --critic-outputs-path on missing file errors typed', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'never-was-here.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'loop-run-round-critic-outputs-path-unreadable',
+  );
+});
+test('LCLI-RR-L5-6: post-critics --critic-outputs-path on invalid JSON errors typed', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  _seedSpawnEvidence('M001-S001-T0001', 1, ['np-executor', 'np-critic'], r);
+  const reportPath = path.join(r, 'broken.json');
+  fs.writeFileSync(reportPath, 'not valid json {{{', 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'post-critics', '--critic-outputs-path', 'broken.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'loop-run-round-critic-outputs-path-invalid-json',
+  );
+});
+test('LCLI-RR-L5-7: stuck --findings-path mirrors the post-critics path semantics', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const reportPath = path.join(r, 'stuck-findings.json');
+  fs.writeFileSync(reportPath, JSON.stringify({
+    critic: 'critic', findings: [{ category: 'todo-marker', severity: 'fail', file: 'x.ts', line: 1, remediation: 'fix' }],
+    criteria: [], verdict: 'issues_found',
+  }), 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  loopRunRound.run(
+    ['M001-S001-T0001', '--phase', 'stuck', '--reason', 'manual-fix-pending',
+      '--findings-path', 'stuck-findings.json'],
+    { cwd: r, stdout: cap.stub },
+  );
+  const out = JSON.parse(cap.get());
+  assert.equal(out.phase, 'stuck');
+  const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
+  assert.ok(Array.isArray(cp.nubosloop.findings), 'findings persisted as array');
+  assert.equal(cp.nubosloop.findings[0].findings[0].category, 'todo-marker');
+});
+test('LCLI-RR-L5-8: stuck rejects both --findings and --findings-path', () => {
+  const r = _mkRoot();
+  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
+  const reportPath = path.join(r, 'f.json');
+  fs.writeFileSync(reportPath, '[]', 'utf-8');
+  const cap = _cap();
+  const loopRunRound = require('./loop-run-round.cjs');
+  assert.throws(
+    () => loopRunRound.run(
+      ['M001-S001-T0001', '--phase', 'stuck', '--reason', 'manual-fix-pending',
+        '--findings', '[]', '--findings-path', 'f.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'loop-run-round-stuck-conflicting-findings',
+  );
+});
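
Note on tests LCLI-RR-L5-2 and LCLI-RR-L5-7: both rely on a single report object being wrapped into a one-element array when it is read from disk. The sketch below isolates that normalization step. `normalizeCriticOutputs` is a hypothetical name, and a plain `TypeError` stands in for the package's `NubosPilotError`.

```javascript
'use strict';

// Hypothetical stand-alone version of the shape normalization applied to a
// parsed report file: arrays pass through, a single object is wrapped,
// anything else is rejected.
function normalizeCriticOutputs(parsed) {
  if (Array.isArray(parsed)) return parsed;
  if (parsed && typeof parsed === 'object') return [parsed];
  throw new TypeError('expected an object or array of objects, got ' + typeof parsed);
}

// A single critic report, as the critic would write it to its report path.
const singleReport = {
  critic: 'critic',
  findings: [{ category: 'todo-marker', severity: 'fail', file: 'x.ts', line: 1, remediation: 'fix' }],
  criteria: [],
  verdict: 'issues_found',
};

const wrapped = normalizeCriticOutputs(singleReport);
const passthrough = normalizeCriticOutputs([singleReport]);

// The wrapped form explains the doubled index seen in LCLI-RR-L5-7:
// element 0 is the whole report, and its own findings array sits one level down.
console.log(wrapped.length, wrapped[0].findings[0].category);
console.log(passthrough.length);
```

The doubled index in LCLI-RR-L5-7 (`cp.nubosloop.findings[0].findings[0]`) is consistent with this: the stuck phase stores the wrapped array of reports rather than a flat findings list.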

package/bin/np-tools/loop-run-round.cjs CHANGED

@@ -264,13 +264,68 @@ function _runPostExecutor(taskId, list, cwd) {
   };
 }
-function _runPostCritics(taskId, list, cwd) {
-  const criticOutputs = args.getJsonFlag(
-    list,
-    '--critic-outputs',
-    'loop-run-round-post-critics-missing-outputs',
-    'pass the merged critic JSON array (style + tests + acceptance)',
+function _readCriticOutputsFromPath(criticPath, cwd) {
+  const resolved = path.resolve(cwd, criticPath);
+  const tmp = (process.env.TMPDIR || '/tmp');
+  const tmpResolved = path.resolve(tmp);
+  const cwdResolved = path.resolve(cwd);
+  const insideCwd = resolved === cwdResolved || resolved.startsWith(cwdResolved + path.sep);
+  const insideTmp = resolved === tmpResolved || resolved.startsWith(tmpResolved + path.sep);
+  if (!insideCwd && !insideTmp) {
+    throw new NubosPilotError(
+      'loop-run-round-critic-outputs-path-traversal',
+      '--critic-outputs-path must resolve inside cwd or TMPDIR',
+      { path: criticPath, resolved, cwd: cwdResolved, tmp: tmpResolved },
+    );
+  }
+  let raw;
+  try { raw = fs.readFileSync(resolved, 'utf-8'); }
+  catch (err) {
+    throw new NubosPilotError(
+      'loop-run-round-critic-outputs-path-unreadable',
+      '--critic-outputs-path could not be read',
+      { path: criticPath, cause: err && err.message },
+    );
+  }
+  let parsed;
+  try { parsed = JSON.parse(raw); }
+  catch (err) {
+    throw new NubosPilotError(
+      'loop-run-round-critic-outputs-path-invalid-json',
+      '--critic-outputs-path content is not valid JSON',
+      { path: criticPath, cause: err && err.message },
+    );
+  }
+  if (Array.isArray(parsed)) return parsed;
+  if (parsed && typeof parsed === 'object') return [parsed];
+  throw new NubosPilotError(
+    'loop-run-round-critic-outputs-path-invalid-shape',
+    '--critic-outputs-path must contain a critic-output object or array of objects',
+    { path: criticPath, got: typeof parsed },
   );
+}
+function _runPostCritics(taskId, list, cwd) {
+  const inlineRaw = args.getFlag(list, '--critic-outputs');
+  const pathFlag = args.getFlag(list, '--critic-outputs-path');
+  if (inlineRaw !== undefined && pathFlag !== undefined) {
+    throw new NubosPilotError(
+      'loop-run-round-post-critics-conflicting-outputs',
+      'pass exactly one of --critic-outputs or --critic-outputs-path, not both',
+      { hint: 'Verdict-Only contract (ADR-0010 §L5) prefers --critic-outputs-path; inline form is the legacy fallback' },
+    );
+  }
+  let criticOutputs;
+  if (pathFlag !== undefined) {
+    criticOutputs = _readCriticOutputsFromPath(pathFlag, cwd);
+  } else {
+    criticOutputs = args.getJsonFlag(
+      list,
+      '--critic-outputs',
+      'loop-run-round-post-critics-missing-outputs',
+      'pass the merged critic JSON array (style + tests + acceptance), or --critic-outputs-path <file> per ADR-0010 §L5',
+    );
+  }
   if (!Array.isArray(criticOutputs)) {
     throw new NubosPilotError(
       'loop-run-round-post-critics-invalid-outputs',
@@ -482,7 +537,22 @@ const STUCK_REASONS_THAT_CLEAR_OVERRIDE = new Set([
 function _runStuck(taskId, list, cwd) {
   const reason = args.getFlag(list, '--reason') || '';
-  const findings = args.optionalJsonFlag(list, '--findings');
+  const findingsInline = args.getFlag(list, '--findings');
+  const findingsPath = args.getFlag(list, '--findings-path');
+  if (findingsInline !== undefined && findingsPath !== undefined) {
+    throw new NubosPilotError(
+      'loop-run-round-stuck-conflicting-findings',
+      'pass exactly one of --findings or --findings-path, not both',
+      { hint: 'Verdict-Only contract (ADR-0010 §L5) prefers --findings-path; inline form is the legacy fallback' },
+    );
+  }
+  let findings;
+  if (findingsPath !== undefined) {
+    const parsed = _readCriticOutputsFromPath(findingsPath, cwd);
+    findings = parsed;
+  } else {
+    findings = args.optionalJsonFlag(list, '--findings');
+  }
   const merged = checkpoint.mergeCheckpoint(
     taskId,
     (cur) => {
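
Note on the path guard: `_readCriticOutputsFromPath` accepts a path only if it resolves inside `cwd` or `TMPDIR`, comparing against the root plus `path.sep`. The sketch below shows that check in isolation and why the separator matters. `isInside` is a hypothetical name, and the `project` directory is an illustrative value.

```javascript
'use strict';
const path = require('node:path');

// Hypothetical helper: true when `candidate`, resolved against `root`,
// is the root itself or lies beneath it.
function isInside(candidate, root) {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  // Appending path.sep prevents a sibling such as "project-evil" from
  // passing a bare startsWith("…/project") prefix test.
  return resolved === resolvedRoot || resolved.startsWith(resolvedRoot + path.sep);
}

const root = path.resolve('project'); // illustrative project directory

const checks = {
  relativeFile: isInside('critic-r1.json', root),
  nestedFile: isInside(path.join('.nubos-pilot', '.tmp', 'critic-r1.json'), root),
  parentEscape: isInside(path.join('..', 'elsewhere.json'), root),
  siblingPrefix: isInside(path.join('..', 'project-evil', 'critic-r1.json'), root),
  absoluteOutside: isInside('/etc/passwd', root),
};
console.log(checks);
```

This is a lexical check only: `path.resolve` normalizes `..` segments but does not follow symlinks, so a symlink inside the root that points elsewhere would still pass. Whether that matters depends on how much the caller trusts the working tree.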

package/bin/np-tools/spawn-headless.cjs ADDED

@@ -0,0 +1,188 @@
+'use strict';
+const fs = require('node:fs');
+const path = require('node:path');
+const child_process = require('node:child_process');
+const { NubosPilotError } = require('../../lib/core.cjs');
+const args = require('./_args.cjs');
+const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000;
+const STDERR_TAIL_BYTES = 4 * 1024;
+function _assertInsideCwdOrTmp(p, cwd, label) {
+  const resolved = path.resolve(cwd, p);
+  const tmp = (process.env.TMPDIR || '/tmp');
+  const tmpResolved = path.resolve(tmp);
+  const cwdResolved = path.resolve(cwd);
+  const insideCwd = resolved === cwdResolved || resolved.startsWith(cwdResolved + path.sep);
+  const insideTmp = resolved === tmpResolved || resolved.startsWith(tmpResolved + path.sep);
+  if (!insideCwd && !insideTmp) {
+    throw new NubosPilotError(
+      'spawn-headless-path-traversal',
+      label + ' must resolve inside cwd or TMPDIR',
+      { path: p, resolved, cwd: cwdResolved, tmp: tmpResolved, label },
+    );
+  }
+  return resolved;
+}
+function _resolveAgentPath(agent, cwd) {
+  if (typeof agent !== 'string' || !agent.match(/^[a-zA-Z0-9_-]+$/)) {
+    throw new NubosPilotError(
+      'spawn-headless-invalid-agent-name',
+      '--agent must be a simple identifier (alphanumeric, dash, underscore)',
+      { agent },
+    );
+  }
+  const candidates = [
+    path.join(cwd, '.nubos-pilot', 'agents', agent + '.md'),
+    path.join(cwd, '.claude', 'agents', agent + '.md'),
+    path.join(__dirname, '..', '..', 'agents', agent + '.md'),
+  ];
+  for (const c of candidates) {
+    try { if (fs.statSync(c).isFile()) return c; }
+    catch { /* not present at this path */ }
+  }
+  throw new NubosPilotError(
+    'spawn-headless-agent-not-found',
+    'Agent file not found for `' + agent + '` (searched: .nubos-pilot/agents, .claude/agents, package agents/)',
+    { agent, searched: candidates },
+  );
+}
+function _readPromptFile(promptPath, cwd) {
+  const resolved = _assertInsideCwdOrTmp(promptPath, cwd, '--prompt-path');
+  try { return fs.readFileSync(resolved, 'utf-8'); }
+  catch (err) {
+    throw new NubosPilotError(
+      'spawn-headless-prompt-unreadable',
+      '--prompt-path could not be read',
+      { path: promptPath, cause: err && err.message },
+    );
+  }
+}
+function _ensureOutputDir(outputPath, cwd) {
+  const resolved = _assertInsideCwdOrTmp(outputPath, cwd, '--output-path');
+  fs.mkdirSync(path.dirname(resolved), { recursive: true });
+  return resolved;
+}
+function _claudeBinary() {
+  const env = process.env.NUBOS_PILOT_CLAUDE_BIN;
+  if (env && env.trim()) return env.trim();
+  return 'claude';
+}
+function _composePrompt(agentBody, userPrompt) {
+  return agentBody.trimEnd() + '\n\n---\n\n' + userPrompt.trimEnd() + '\n';
+}
+function _stripFrontmatter(md) {
+  if (!md.startsWith('---\n')) return md;
+  const end = md.indexOf('\n---\n', 4);
+  if (end === -1) return md;
+  return md.slice(end + 5);
+}
+function run(argv, ctx) {
+  const context = ctx || {};
+  const cwd = context.cwd || process.cwd();
+  const stdout = context.stdout || process.stdout;
+  const list = Array.isArray(argv) ? argv : [];
+  const agent = args.getFlag(list, '--agent');
+  if (!agent) {
+    throw new NubosPilotError(
+      'spawn-headless-missing-agent',
+      'spawn-headless requires --agent <name>',
+      { hint: 'agent is the basename of an .md file under agents/ (without extension)' },
+    );
+  }
+  const promptPath = args.getFlag(list, '--prompt-path');
+  if (!promptPath) {
+    throw new NubosPilotError(
+      'spawn-headless-missing-prompt-path',
+      'spawn-headless requires --prompt-path <file>',
+      {},
+    );
+  }
+  const outputPath = args.getFlag(list, '--output-path');
+  if (!outputPath) {
+    throw new NubosPilotError(
+      'spawn-headless-missing-output-path',
+      'spawn-headless requires --output-path <file>',
+      {},
+    );
+  }
+  const timeoutRaw = args.getFlag(list, '--timeout-ms');
+  const timeoutMs = timeoutRaw !== undefined ? Number(timeoutRaw) : DEFAULT_TIMEOUT_MS;
+  if (!Number.isFinite(timeoutMs) || timeoutMs < 1000) {
+    throw new NubosPilotError(
+      'spawn-headless-invalid-timeout',
+      '--timeout-ms must be a positive number ≥ 1000',
+      { value: timeoutRaw },
+    );
+  }
+  const agentPath = _resolveAgentPath(agent, cwd);
+  const agentBody = _stripFrontmatter(fs.readFileSync(agentPath, 'utf-8'));
+  const userPrompt = _readPromptFile(promptPath, cwd);
+  const composedPrompt = _composePrompt(agentBody, userPrompt);
+  const resolvedOutput = _ensureOutputDir(outputPath, cwd);
+  const bin = _claudeBinary();
+  const claudeArgs = ['-p', '--output-format', 'json'];
+  let result;
+  try {
+    result = child_process.spawnSync(bin, claudeArgs, {
+      cwd,
+      input: composedPrompt,
+      timeout: timeoutMs,
+      maxBuffer: 64 * 1024 * 1024,
+      encoding: 'utf-8',
+      env: process.env,
+    });
+  } catch (err) {
+    throw new NubosPilotError(
+      'spawn-headless-spawn-failed',
+      'failed to spawn `' + bin + '`: ' + (err && err.message),
+      { bin, cause: err && err.code },
+    );
+  }
+  if (result.error && result.error.code === 'ENOENT') {
+    throw new NubosPilotError(
+      'spawn-headless-claude-not-found',
+      'binary `' + bin + '` not found on PATH (set NUBOS_PILOT_CLAUDE_BIN to override)',
+      { bin },
+    );
+  }
+  if (result.error && result.error.code === 'ETIMEDOUT') {
+    throw new NubosPilotError(
+      'spawn-headless-timed-out',
+      'subprocess `' + bin + '` exceeded --timeout-ms ' + timeoutMs,
+      { bin, timeoutMs },
+    );
+  }
+  const stderrTail = (result.stderr || '').slice(-STDERR_TAIL_BYTES);
+  const exitCode = result.status == null ? 1 : Number(result.status);
+  fs.writeFileSync(resolvedOutput, result.stdout || '', 'utf-8');
+  const payload = {
+    agent,
+    output_path: outputPath,
+    output_path_resolved: resolvedOutput,
+    exit_code: exitCode,
+    stderr_excerpt: stderrTail,
+    bin,
+    timed_out: !!(result.error && result.error.code === 'ETIMEDOUT'),
+  };
+  stdout.write(JSON.stringify(payload) + '\n');
+  if (exitCode !== 0) return 2;
+  return 0;
+}
+module.exports = { run };
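
Note on prompt composition: the subprocess receives the agent file with its frontmatter stripped, a `---` separator, and the user prompt, all on stdin. The sketch below copies the two helpers from the diff under unprefixed names (`stripFrontmatter`, `composePrompt`) so the resulting string can be inspected without spawning anything. The agent text is the fixture from spawn-headless.test.cjs.

```javascript
'use strict';

// Copy of _stripFrontmatter from the diff: drops a leading "---\n … \n---\n" block.
function stripFrontmatter(md) {
  if (!md.startsWith('---\n')) return md;
  const end = md.indexOf('\n---\n', 4);
  if (end === -1) return md;
  return md.slice(end + 5); // skip past the closing "\n---\n"
}

// Copy of _composePrompt from the diff: agent body, separator, user prompt.
function composePrompt(agentBody, userPrompt) {
  return agentBody.trimEnd() + '\n\n---\n\n' + userPrompt.trimEnd() + '\n';
}

// Agent fixture taken from the test file in this diff.
const agentFile =
  '---\nname: np-test-critic\ntools: Read, Write\n---\n\n# Role\n\nYou are a test critic.\n';

const composed = composePrompt(stripFrontmatter(agentFile), 'do the audit\n');
console.log(JSON.stringify(composed));

// A file saved with CRLF line endings does not match the "---\n" prefix,
// so its frontmatter is left in place by this helper.
const crlfFile = '---\r\nname: x\r\n---\r\nbody\r\n';
const crlfResult = stripFrontmatter(crlfFile);
console.log(crlfResult === crlfFile);
```

Because only the body reaches the subprocess, frontmatter keys such as `tools` and `tier` are not forwarded by this command. As written, the helper also leaves frontmatter untouched when the agent file uses CRLF line endings.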

package/bin/np-tools/spawn-headless.test.cjs ADDED

@@ -0,0 +1,196 @@
+'use strict';
+const fs = require('node:fs');
+const os = require('node:os');
+const path = require('node:path');
+const { test, afterEach } = require('node:test');
+const assert = require('node:assert/strict');
+const spawnHeadless = require('./spawn-headless.cjs');
+const _sandboxes = [];
+const _envBackup = {};
+function _mkRoot() {
+  const r = fs.mkdtempSync(path.join(os.tmpdir(), 'np-spawn-headless-'));
+  fs.mkdirSync(path.join(r, '.nubos-pilot', 'agents'), { recursive: true });
+  fs.writeFileSync(
+    path.join(r, '.nubos-pilot', 'agents', 'np-test-critic.md'),
+    '---\nname: np-test-critic\ntools: Read, Write\n---\n\n# Role\n\nYou are a test critic.\n',
+    'utf-8',
+  );
+  _sandboxes.push(r);
+  return r;
+}
+function _cap() {
+  let s = '';
+  return { stub: { write: (x) => { s += String(x); return true; } }, get: () => s };
+}
+afterEach(() => {
+  while (_sandboxes.length) {
+    const r = _sandboxes.pop();
+    try { fs.rmSync(r, { recursive: true, force: true }); } catch {}
+  }
+  for (const k of Object.keys(_envBackup)) {
+    if (_envBackup[k] === undefined) delete process.env[k];
+    else process.env[k] = _envBackup[k];
+    delete _envBackup[k];
+  }
+});
+function _setEnv(k, v) {
+  _envBackup[k] = process.env[k];
+  if (v == null) delete process.env[k];
+  else process.env[k] = v;
+}
+test('SH-1: spawn-headless requires --agent', () => {
+  const r = _mkRoot();
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run([], { cwd: r, stdout: cap.stub }),
+    (err) => err && err.code === 'spawn-headless-missing-agent',
+  );
+});
+test('SH-2: spawn-headless requires --prompt-path', () => {
+  const r = _mkRoot();
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(['--agent', 'np-test-critic'], { cwd: r, stdout: cap.stub }),
+    (err) => err && err.code === 'spawn-headless-missing-prompt-path',
+  );
+});
+test('SH-3: spawn-headless requires --output-path', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'do the audit', 'utf-8');
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', 'np-test-critic', '--prompt-path', 'p.md'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-missing-output-path',
+  );
+});
+test('SH-4: spawn-headless rejects path traversal on prompt-path', () => {
+  const r = _mkRoot();
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', 'np-test-critic',
+        '--prompt-path', '/etc/passwd',
+        '--output-path', 'out.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-path-traversal',
+  );
+});
+test('SH-5: spawn-headless rejects unknown agent', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', 'np-does-not-exist',
+        '--prompt-path', 'p.md',
+        '--output-path', 'out.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-agent-not-found',
+  );
+});
+test('SH-6: spawn-headless rejects invalid agent name (path-injection guard)', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', '../../etc/passwd',
+        '--prompt-path', 'p.md',
+        '--output-path', 'out.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-invalid-agent-name',
+  );
+});
+test('SH-7: spawn-headless reports claude-not-found when binary missing', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  _setEnv('NUBOS_PILOT_CLAUDE_BIN', path.join(r, 'no-such-binary'));
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', 'np-test-critic',
+        '--prompt-path', 'p.md',
+        '--output-path', 'out.json'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-claude-not-found',
+  );
+});
+test('SH-8: spawn-headless captures stdout to output-path on success (mock binary)', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  const mockBin = path.join(r, 'mock-claude.sh');
+  fs.writeFileSync(mockBin, '#!/bin/sh\ncat > /dev/null\nprintf \'{"verdict":"passed","blockers_count":0,"report_path":null}\\n\'\n', 'utf-8');
+  fs.chmodSync(mockBin, 0o755);
+  _setEnv('NUBOS_PILOT_CLAUDE_BIN', mockBin);
+  const cap = _cap();
+  const rc = spawnHeadless.run(
+    ['--agent', 'np-test-critic',
+      '--prompt-path', 'p.md',
+      '--output-path', 'out.json'],
+    { cwd: r, stdout: cap.stub },
+  );
+  assert.equal(rc, 0, 'success returns exit 0');
+  const payload = JSON.parse(cap.get());
+  assert.equal(payload.exit_code, 0);
+  assert.equal(payload.agent, 'np-test-critic');
+  const written = fs.readFileSync(path.join(r, 'out.json'), 'utf-8');
+  assert.match(written, /"verdict":"passed"/);
+});
+test('SH-9: spawn-headless surfaces non-zero subprocess exit (mock failure)', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  const mockBin = path.join(r, 'mock-fail.sh');
+  fs.writeFileSync(mockBin, '#!/bin/sh\ncat > /dev/null\necho boom >&2\nexit 7\n', 'utf-8');
+  fs.chmodSync(mockBin, 0o755);
+  _setEnv('NUBOS_PILOT_CLAUDE_BIN', mockBin);
+  const cap = _cap();
+  const rc = spawnHeadless.run(
+    ['--agent', 'np-test-critic',
+      '--prompt-path', 'p.md',
+      '--output-path', 'out.json'],
+    { cwd: r, stdout: cap.stub },
+  );
+  assert.equal(rc, 2, 'non-zero subprocess returns rc=2');
+  const payload = JSON.parse(cap.get());
+  assert.equal(payload.exit_code, 7);
+  assert.match(payload.stderr_excerpt, /boom/);
+});
+test('SH-10: spawn-headless rejects --timeout-ms below 1000', () => {
+  const r = _mkRoot();
+  fs.writeFileSync(path.join(r, 'p.md'), 'audit', 'utf-8');
+  const cap = _cap();
+  assert.throws(
+    () => spawnHeadless.run(
+      ['--agent', 'np-test-critic',
+        '--prompt-path', 'p.md',
+        '--output-path', 'out.json',
+        '--timeout-ms', '500'],
+      { cwd: r, stdout: cap.stub },
+    ),
+    (err) => err && err.code === 'spawn-headless-invalid-timeout',
+  );
+});
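
The SH-4, SH-6 and SH-10 cases above pin down three input guards by their error codes only. As a rough illustration of what such guards could look like, here is a minimal sketch. The helper names (`codedError`, `assertAgentName`, `resolveInsideRoot`, `parseTimeoutMs`) and the agent-name pattern are assumptions made for this example; they are not taken from `spawn-headless.cjs`, whose actual checks may differ.

```javascript
'use strict';
const path = require('node:path');

// Hypothetical helper: an Error carrying a machine-readable `code`,
// matching the style of codes the tests assert on.
function codedError(code) {
  const err = new Error(code);
  err.code = code;
  return err;
}

// ASSUMPTION: agent names are lowercase letters, digits and hyphens only.
const AGENT_NAME_RE = /^[a-z0-9][a-z0-9-]*$/;

function assertAgentName(name) {
  if (typeof name !== 'string' || !AGENT_NAME_RE.test(name)) {
    throw codedError('spawn-headless-invalid-agent-name');
  }
  return name;
}

// Resolve `candidate` against `root` and refuse anything that escapes it,
// including absolute paths such as /etc/passwd.
function resolveInsideRoot(root, candidate) {
  const absRoot = path.resolve(root);
  const abs = path.resolve(absRoot, candidate);
  const rel = path.relative(absRoot, abs);
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (rel === '' || escapes) {
    throw codedError('spawn-headless-path-traversal');
  }
  return abs;
}

// Reject anything that is not an integer of at least 1000 ms (cf. SH-10).
function parseTimeoutMs(raw) {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1000) {
    throw codedError('spawn-headless-invalid-timeout');
  }
  return n;
}
```

Rejecting on the resolved relative path, rather than scanning the raw string for `..`, covers both the absolute path used in SH-4 and relative escapes with one check.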

package/docs/adr/0010-nubosloop.md CHANGED

@@ -156,6 +156,83 @@ No layer is sufficient alone. Together they require a deliberate, multi-step lie
 Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call `loop-audit-tool-use --agent np-critic …` without spawning the critic. Closing this gap requires runtime instrumentation — the LLM runtime itself stamps spawn-provenance metadata into the audit entry, which the orchestrator cannot forge. That is "Stufe 2" and tracked separately; this amendment closes the practical bypass class without it.
+## Cost Layer (added 2026-05-05)
+The Trust Layer raises the price of dishonesty; the Cost Layer raises the price of *honesty*. Two failure modes observed alongside the Trust Layer rollout:
+1. **Verbose critic returns dominate the per-round token bill.** The critic's structured findings JSON (criteria + findings + per-finding remediation prose) routinely runs 2–5 kB. Returning it as the spawn's final message replays it into the parent context every round, and over a 3-round loop with a 30-task milestone the critic alone burns ~200–500k parent-context tokens that contribute nothing to routing — `lib/nubosloop.cjs::mergeCriticOutputs` only consumes five fields per finding, the rest is decoration the parent never inspects.
+2. **Sub-agent "context isolation" is not context offloading.** The runtime's native Agent tool isolates the *child's* context window, but the agent's final message lands verbatim in the parent's history. For a Nubosloop with 1 researcher + 1 critic per round, that is two verbose returns per round per task, the largest per-task cost driver after the executor's own output.
+The Cost Layer addresses both without weakening the Trust Layer: spawn-evidence auditing is unchanged, the routing engine is unchanged, only the *transport* of critic/researcher output between child and parent contexts changes.
+### Layer L5 — Verdict-Only Critic Contract
+Critics now emit their full findings JSON to a path the orchestrator hands them in the spawn prompt (`<report_path>`, typically `${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json`). The spawn's *final message* — the artefact that lands in parent context — is a small envelope:
+```json
+{ "critic": "critic", "task_id": "M001-S001-T0001", "round": 1,
+  "verdict": "passed | issues_found", "blockers_count": 0,
+  "report_path": "...", "run_id": "..." }
+```
+`bin/np-tools/loop-run-round.cjs::_runPostCritics` accepts a new `--critic-outputs-path <file>` flag that reads the on-disk findings JSON directly. Inline `--critic-outputs <json>` remains accepted (legacy fallback for runtimes without `Write` capability and migration fixtures), but exactly one of the two MUST be passed — both at once is `loop-run-round-post-critics-conflicting-outputs`. `_runStuck` gets the symmetric `--findings-path` for the stuck-with-findings escalation paths.
+The critic's `tools` frontmatter gains `Write`. `Write` is *only* permitted on the orchestrator-supplied `<report_path>`; touching anything else is a Layer-A bypass class and dealt with under existing trust-layer rules.
+**Failure modes:**
+- `<report_path>` missing in prompt OR `Write` fails → envelope sets `report_path: null`, `verdict: "issues_found"`, `blockers_count: 1`, with an `error` field. Routing engine treats this as `critic-error → stuck`.
+- File written but unreadable / not valid JSON / shape mismatch → `loop-run-round` errors with a typed `loop-run-round-critic-outputs-path-{unreadable,invalid-json,invalid-shape}` code.
+- Inline-JSON-with-no-file fallback is still routable; the legacy contract is not removed, only deprioritised.
+**Why a contract, not a tooling hack:** this isn't an output-truncation hack. The findings JSON's *full content* still drives routing — it just travels via filesystem rather than via parent context. Token bill drops by ~95% on the critic axis without any loss of information. Shape, dedup, and fingerprint logic in `mergeCriticOutputs` are untouched.
+Layer-C audit semantics are unchanged: the orchestrator still calls `loop-audit-tool-use --agent np-critic` after the spawn returns. The audit doesn't care whether the spawn delivered its findings inline or via file.
+### Layer L6 — Headless-Subprocess Mode (opt-in)
+When the runtime's native Agent tool is the wrong shape — e.g. when the parent context has bloated past comfortable limits despite L5, or when a teammate is running on a runtime where each Agent-tool result still costs noticeable cache fragmentation — the orchestrator can route critic and researcher spawns through `bin/np-tools/spawn-headless.cjs` instead. This shells out to `claude -p --output-format json` as a child process; the spawn's conversation lives entirely outside the parent session and only the final-message JSON (the envelope under L5, or whatever the agent emits) is captured to disk.
+**Config (`.nubos-pilot/config.json`):**
+```json
+{
+  "spawn": {
+    "headless": {
+      "enabled": false,
+      "agents": ["np-critic", "np-researcher"],
+      "timeout_ms": 600000,
+      "fallback_on_error": true
+    }
+  }
+}
+```
+`enabled` defaults to `false` so existing installs see no behaviour change. `agents` defaults to the two read-only/output-emitting agents — extending the list to executor-class agents is a Layer-A risk because they mutate the working tree; the doc tooling enforces this list socially (no mechanical guard). `fallback_on_error: true` makes a failed `claude -p` spawn (binary missing, auth-failure, timeout) fall back to the runtime's Agent tool; the fallback is stamped on the checkpoint (`nubosloop.spawn_headless_fallbacks[]`) so dashboards can count fallback rate.
+**Trade-offs (intentionally accepted):**
+- **No shared prompt cache with parent.** Each headless spawn cold-loads its system prompt and agent body. Trade is worth it for critic/researcher because they don't share enough prompt prefix with the parent loop to benefit from cache anyway.
+- **Separate auth.** `claude` CLI must be on `$PATH` and authenticated independently. The `NUBOS_PILOT_CLAUDE_BIN` env var overrides the binary path for split-install scenarios.
+- **Cold-start latency.** Each spawn pays the boot cost of a new process. On 3-round loops with a 30-task milestone this is 30 × 3 × 2 ≈ 180 cold starts vs. 0 in the agent-tool path. Acceptable trade for token detach when token cost dominates.
+- **No streaming feedback.** Headless spawns run to completion, then return — interactive features (Agent tool's progress streaming) are not available. This is fine for critics/researchers (which are batch-shaped anyway).
+**Trust Layer compatibility:** the `loop-audit-tool-use` stamp is identical in both paths. The orchestrator must call it after `spawn-headless` returns, exactly as it does after an Agent-tool spawn. Going headless to bypass the audit is a Layer-C violation by the existing definition; nothing about L6 weakens this.
+**What L6 deliberately does NOT do:**
+- It does not headless-spawn the executor. Editing files inside a `claude -p` subprocess works in principle but the file mutations would not surface through the parent runtime's diff/edit telemetry, breaking the Layer-A commit-task gate which depends on the runtime's tracked-write surface.
+- It does not move the audit log. Audits are still appended by the orchestrator (= the parent), not by the subprocess; otherwise a hostile subprocess could fake its own audit.
+- It does not collapse the multiple researcher spawns of the swarm. `swarm.research.k = 3` still produces three independent headless processes; the merge step is unchanged.
+### Cost Layer summary
+| Layer | Where                                       | What it removes                                                            | Cost                                               |
+|-------|---------------------------------------------|----------------------------------------------------------------------------|----------------------------------------------------|
+| L5    | `agents/np-critic.md` + `loop-run-round.cjs` | Verbatim findings JSON in parent context every round                       | Critic now requires `Write` on its report path     |
+| L6    | `bin/np-tools/spawn-headless.cjs` + workflow dispatcher | Whole spawn conversation in parent context for critic/researcher           | Cold-start per spawn, no shared prompt cache, separate auth |
+L5 alone is a ~95% reduction on the critic axis with no operational cost; recommended default. L6 stacks on top for installs where parent context is the binding constraint despite L5 — opt-in.
 ## More Information
 * **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the loop runs in-session; no daemon coordinates spawns.
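
The L5 failure modes in the ADR text above name three typed error codes for the on-disk findings file. The following is a minimal sketch of a reader that maps each failure to one of those codes. The function name `readCriticOutputs` is hypothetical, and the shape check (a non-null object with a `findings` array) is an assumption for illustration; the diff does not show what `loop-run-round.cjs` actually validates.

```javascript
'use strict';
const fs = require('node:fs');

// Code prefix as quoted in the ADR text.
const PREFIX = 'loop-run-round-critic-outputs-path-';

function codedError(suffix, cause) {
  const err = new Error(PREFIX + suffix);
  err.code = PREFIX + suffix;
  if (cause) err.cause = cause;
  return err;
}

// Hypothetical reader for the file passed via --critic-outputs-path.
function readCriticOutputs(file) {
  let raw;
  try {
    raw = fs.readFileSync(file, 'utf-8');
  } catch (e) {
    throw codedError('unreadable', e);
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw codedError('invalid-json', e);
  }
  // ASSUMPTION: a valid payload is an object carrying a `findings` array.
  if (parsed === null || typeof parsed !== 'object' || !Array.isArray(parsed.findings)) {
    throw codedError('invalid-shape');
  }
  return parsed;
}
```

Keeping the three failure classes distinct lets a caller tell a missing file (the critic never wrote) apart from a malformed one (the critic wrote something unusable).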

package/lib/config-defaults.cjs CHANGED

@@ -43,6 +43,17 @@ const DEFAULT_SWARM = Object.freeze({
 const DEFAULT_AUTO_LOG_LEARNING = true;
+const DEFAULT_SPAWN_HEADLESS = Object.freeze({
+  enabled: false,
+  agents: Object.freeze(['np-critic', 'np-researcher']),
+  timeout_ms: 10 * 60 * 1000,
+  fallback_on_error: true,
+});
+const DEFAULT_SPAWN = Object.freeze({
+  headless: DEFAULT_SPAWN_HEADLESS,
+});
 const DEFAULT_MODEL_PROFILE = 'frontier';
 const DEFAULT_SCOPE = 'local';
 const DEFAULT_RESPONSE_LANGUAGE = 'en';
@@ -55,6 +66,7 @@ const DEFAULT_CONFIG_TREE = Object.freeze({
   agents: DEFAULT_AGENTS,
   loop: DEFAULT_LOOP,
   swarm: DEFAULT_SWARM,
+  spawn: DEFAULT_SPAWN,
   auto_log_learning: DEFAULT_AUTO_LOG_LEARNING,
 });
@@ -75,6 +87,14 @@ function buildInstallConfig(answers) {
       critic: { ...DEFAULT_SWARM_CRITIC },
       knowledge_adapter: DEFAULT_SWARM.knowledge_adapter,
     },
+    spawn: {
+      headless: {
+        enabled: DEFAULT_SPAWN_HEADLESS.enabled,
+        agents: [...DEFAULT_SPAWN_HEADLESS.agents],
+        timeout_ms: DEFAULT_SPAWN_HEADLESS.timeout_ms,
+        fallback_on_error: DEFAULT_SPAWN_HEADLESS.fallback_on_error,
+      },
+    },
     auto_log_learning: DEFAULT_AUTO_LOG_LEARNING,
   };
 }
@@ -87,6 +107,8 @@ module.exports = {
   DEFAULT_SWARM,
   DEFAULT_SWARM_RESEARCH,
   DEFAULT_SWARM_CRITIC,
+  DEFAULT_SPAWN,
+  DEFAULT_SPAWN_HEADLESS,
   DEFAULT_AUTO_LOG_LEARNING,
   DEFAULT_MODEL_PROFILE,
   DEFAULT_SCOPE,
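
To illustrate how these defaults might be consumed, here is a minimal sketch that overlays a user config onto `DEFAULT_SPAWN_HEADLESS` and answers the question "should this agent be spawned headless?". The functions `resolveHeadless` and `isHeadless` are hypothetical names for this example; the package's real config resolution is not shown in this diff and may work differently.

```javascript
'use strict';

// Copied from the hunk above so the sketch is self-contained.
const DEFAULT_SPAWN_HEADLESS = Object.freeze({
  enabled: false,
  agents: Object.freeze(['np-critic', 'np-researcher']),
  timeout_ms: 10 * 60 * 1000,
  fallback_on_error: true,
});

// Hypothetical: overlay user-supplied values onto the frozen defaults and
// return a fresh, mutable object (the agents array is copied, not shared).
function resolveHeadless(userConfig) {
  const user = (userConfig && userConfig.spawn && userConfig.spawn.headless) || {};
  return {
    ...DEFAULT_SPAWN_HEADLESS,
    ...user,
    agents: Array.isArray(user.agents)
      ? [...user.agents]
      : [...DEFAULT_SPAWN_HEADLESS.agents],
  };
}

// An agent goes headless only when the feature is enabled AND it is listed.
function isHeadless(userConfig, agent) {
  const headless = resolveHeadless(userConfig);
  return headless.enabled === true && headless.agents.includes(agent);
}
```

With an empty config, `isHeadless({}, 'np-critic')` is `false`, which matches the stated intent that existing installs see no behaviour change until `enabled` is set.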

package/lib/researcher-swarm.cjs CHANGED

@@ -11,12 +11,28 @@ const DEFAULT_K = 3;
 const MIN_K = 1;
 const MAX_K = 5;
+// Perspectival nudges, NOT thematic preferences. Each entry varies HOW the
+// spawn investigates the same `<task_query>`, never WHAT it should prefer in
+// the answer. Thematic seed_deltas (e.g. "prefer native TypeScript types")
+// silently turn the swarm into a topic-split — three spawns each ranking a
+// different axis, intersection ≈ 0, consensus a fiction. ADR-0011 §Spawn
+// Contract is explicit: identical task_query for every spawn, only the
+// seed_delta varies, and the variation must not change WHICH question is
+// answered or which solution dimension is favoured.
+//
+// Litmus test for adding a new entry: rephrase as "what does this researcher
+// optimise FOR in their final answer?" — if the answer names a concrete
+// solution attribute (TypeScript, smallest deps, latest version), it is
+// thematic and belongs in the planner / architect, not the researcher swarm.
+// Perspectival nudges answer "how does this researcher arrive at the
+// answer?" — methodology, evidence weighting, contrarian stance, breadth vs.
+// depth, gap surfacing.
 const SEED_DELTAS = [
-  'Prefer authoritative sources over training data when they conflict.',
-  'Prefer the smallest dependency surface that satisfies the requirement.',
-  'Prefer libraries that ship native TypeScript types and ESM by default.',
-  'Prefer architectures that compose with the project\'s existing module boundaries over greenfield rewrites.',
-  'Prefer documented, observable failure modes over silent fallbacks.',
+  'Treat training-data recall as a hypothesis to verify against primary documentation; downgrade unverified claims to LOW confidence.',
+  'Survey breadth-first before narrowing — enumerate every viable option you find, even ones that look obviously inferior, before recommending.',
+  'Be contrarian: assume the obvious recommendation is wrong and justify whether it actually is. If it survives the challenge, your confidence is higher.',
+  'Surface unknowns explicitly: anything you cannot verify becomes an Open Question, not an [ASSUMED] filled with a plausible default.',
+  'Stress-test the leading recommendation: name the most plausible failure mode that would make it the wrong choice, then assess how likely that mode is in scope.',
 ];
 // Invariant — SEED_DELTAS must cover the full clamp range. The previous
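
The comment block above describes a contract in which every spawn receives the identical `task_query` and only the `seed_delta` varies, with `SEED_DELTAS` covering the full clamp range. A minimal sketch of that contract follows. The names `clampK` and `buildSpawnInputs` are hypothetical, and the seed strings are shortened stand-ins for the five entries in the diff; how the real module assigns deltas to spawns is not shown here.

```javascript
'use strict';

const DEFAULT_K = 3;
const MIN_K = 1;
const MAX_K = 5;

// Shortened stand-ins for the five perspectival nudges in the diff.
const SEED_DELTAS = [
  'verify-recall-against-primary-docs',
  'survey-breadth-first',
  'be-contrarian',
  'surface-unknowns',
  'stress-test-leading-recommendation',
];

// Clamp the requested swarm size into [MIN_K, MAX_K]; fall back to the
// default when the input is not an integer.
function clampK(k) {
  const n = Number.isInteger(k) ? k : DEFAULT_K;
  return Math.min(MAX_K, Math.max(MIN_K, n));
}

// Hypothetical: build one input per spawn. The query is identical for all
// spawns; only the seed_delta differs.
function buildSpawnInputs(taskQuery, k) {
  if (SEED_DELTAS.length < MAX_K) {
    throw new Error('SEED_DELTAS must cover the full clamp range');
  }
  return SEED_DELTAS.slice(0, clampK(k)).map((seedDelta, index) => ({
    index,
    task_query: taskQuery,
    seed_delta: seedDelta,
  }));
}
```

Because the list length is checked against `MAX_K`, any clamped `k` is guaranteed a distinct delta per spawn without wrapping around.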

package/np-tools.cjs CHANGED

@@ -103,6 +103,7 @@ const topLevelCommands = {
   'loop-audit-tool-use': require('./bin/np-tools/loop-audit-tool-use.cjs'),
   'loop-stuck':          require('./bin/np-tools/loop-stuck.cjs'),
   'loop-metrics':       require('./bin/np-tools/loop-metrics.cjs'),
+  'spawn-headless':     require('./bin/np-tools/spawn-headless.cjs'),
   'learning-log':      require('./bin/np-tools/learning-log.cjs'),
   'learning-match':    require('./bin/np-tools/learning-match.cjs'),
   'learning-list':     require('./bin/np-tools/learning-list.cjs'),

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "nubos-pilot",
-  "version": "1.0.1",
+  "version": "1.0.3",
   "description": "AI-driven planning and execution tool for code projects",
   "homepage": "https://github.com/Nubos-AI/nubos-pilot",
   "repository": {

package/workflows/execute-phase.md CHANGED

@@ -121,8 +121,8 @@ Every task runs through the **Nubosloop** ([ADR-0010](../docs/adr/0010-nubosloop
 2. **Researcher-Schwarm (on cache miss, or on `next_action=researcher` re-route)** — orchestrator spawns `swarm.research.k=3` independent `np-researcher` agents IN PARALLEL (single message, three Agent blocks) and merges their outputs through `lib/researcher-swarm.cjs::mergeConsensus` (Mehrheit / Union / Schnittmenge). The merged consensus enters the Executor prompt with provenance.
 3. **Executor (R1) or Build-Fixer (R≥2)** — single LLM spawn. Round 1 spawns `agents/np-executor.md`. Round ≥ 2 spawns `agents/np-build-fixer.md` with prior critic findings + verify output appended. Edits ONLY paths in `files_modified` (D-04 — no scope expansion). Does NOT call `commit-task`.
 4. **Mechanical Checks (orchestrator, NOT the agent)** — run task's `<verify>` command + stack linters (`phpstan`, `pint`, `tsc`, `eslint`); capture exit code + output to `$VERIFY_LOG`. Then `loop-audit-tool-use --task-id ... --round ...` confirms the spawn invoked `search-knowledge` or `match-existing-learning` ≥ 1× (Rule 9). Audit findings get round-stamped and feed `loop-evaluate` alongside critic findings. Then call `loop-run-round --phase post-executor --verify-exit-code "$VERIFY_EXIT" --verify-output-path "$VERIFY_LOG"`. On verify-red the verb returns `next_action: spawn-build-fixer` — skip critics, advance to next round directly.
-5. **Critic (verify-green only)** — one Critic agent spawns: `agents/np-critic.md` (sonnet). It emits a single structured findings JSON covering all three axes (style, tests, acceptance). Single-critic revision per ADR-0010 §Trust Layer 2026-05-05 — the prior 3-critic schwarm collapsed because three parallel spawns added latency without proportional finding-quality gains.
-6. **Route** — `loop-run-round --phase post-critics --critic-outputs "$CRITIC_JSON"` returns `next_action ∈ {commit, executor, researcher, askuser, plan-checker, stuck}`:
+5. **Critic (verify-green only)** — one Critic agent spawns: `agents/np-critic.md` (sonnet). It writes the full findings JSON to `$CRITIC_REPORT_PATH` and emits a small verdict envelope as its final message (ADR-0010 §L5 Verdict-Only Contract, 2026-05-05). Single-critic revision per §Trust Layer 2026-05-05: the prior 3-critic swarm collapsed because three parallel spawns added latency without proportional finding-quality gains. The Verdict-Only Contract on top reduces per-round main-context tokens by an order of magnitude (verbatim findings reports were the dominant Nubosloop cost driver).
+6. **Route** — `loop-run-round --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH"` (or legacy `--critic-outputs "$CRITIC_JSON"` when the Verdict-Only Contract is unavailable) returns `next_action ∈ {commit, executor, researcher, askuser, plan-checker, stuck}`:
    | `next_action`    | Trigger                            | Action                                                          |
    |------------------|------------------------------------|-----------------------------------------------------------------|
@@ -152,8 +152,56 @@ SWARM_K=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.k 2>/dev/
 SWARM_THRESHOLD=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.threshold 2>/dev/null || echo 0.9)
 SWARM_MIN_OCC=$(node .nubos-pilot/bin/np-tools.cjs config-get swarm.research.minOccurrence 2>/dev/null || echo 3)
 AUTO_LOG_LEARNING=$(node .nubos-pilot/bin/np-tools.cjs config-get auto_log_learning 2>/dev/null || echo true)
+SPAWN_HEADLESS_ENABLED=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.enabled 2>/dev/null || echo false)
+SPAWN_HEADLESS_AGENTS=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.agents 2>/dev/null || echo '["np-critic","np-researcher"]')
+SPAWN_HEADLESS_FALLBACK=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.fallback_on_error 2>/dev/null || echo true)
 ```
+## Spawn dispatch — agent-tool vs. headless subprocess (ADR-0010 §L6)
+By default, `np-researcher` and `np-critic` spawns go through the runtime's
+native Agent tool — the parent context picks up the spawn's final message as a
+tool result. When `spawn.headless.enabled=true` AND the agent name appears in
+`spawn.headless.agents`, the orchestrator instead shells out to
+`node .nubos-pilot/bin/np-tools.cjs spawn-headless --agent <name> ...`, which
+runs the agent inside an isolated `claude -p` subprocess. The subprocess'
+final-message is captured to disk; the parent context only sees an exit code
+plus the path. This buys true context detach for the verbose-but-bounded
+critic/researcher passes, at the cost of a separate prompt cache, separate auth,
+and a cold start per spawn.
+**Dispatch helper (use at every np-researcher / np-critic spawn point):**
+```bash
+_spawn_dispatch_is_headless() {
+  local agent="$1"
+  [ "$SPAWN_HEADLESS_ENABLED" = "true" ] || return 1
+  echo "$SPAWN_HEADLESS_AGENTS" | node -e \
+    "let l=''; process.stdin.on('data',d=>l+=d); process.stdin.on('end',()=>{
+      try { const arr = JSON.parse(l); process.exit(arr.includes(process.argv[1]) ? 0 : 1); }
+      catch (e) { process.exit(1); }
+    })" "$agent"
+}
+```
+For each headless spawn the orchestrator (a) writes the rendered prompt to
+`${TMPDIR:-/tmp}/nubos-pilot/prompts/<agent>-<task-id>-r<round>.md`,
+(b) calls `spawn-headless --agent <name> --prompt-path … --output-path …`,
+(c) on non-zero exit AND `spawn.headless.fallback_on_error=true`, falls back to
+the regular agent-tool spawn. Falling back is logged on the checkpoint
+(`spawn_headless_fallbacks[]`) so the fallback rate is visible on
+`/np:dashboard`. **The Layer-C `loop-audit-tool-use` stamp is identical for
+both paths** — it is the orchestrator's responsibility to call it after the
+spawn returns, regardless of whether the spawn went through the agent tool or
+the headless subprocess. Bypassing the audit by going headless is a Layer-C
+violation by the same definition as before.
+`np-executor` and `np-build-fixer` are NEVER eligible for headless spawn —
+they edit files in the working tree and depend on the parent runtime's file
+write semantics. `spawn.headless.agents` defaults to `['np-critic','np-researcher']`
+for exactly this reason; do not extend it without understanding which agents
+mutate the working tree.
 **Per-task max-rounds override (T3, ADR-0010 Trust-Layer):** before entering the per-task while-loop, check the task's checkpoint for a `max_rounds_override` (set when the operator answered the stuck-dialog with "Weitermachen +5 Runden"). If present, it beats the config default — both for the bash while-cap and for the `post-critics` `evaluateLoop` cap.
 ```bash
@@ -313,11 +361,20 @@ for WAVE_INDEX in 0 1 2 ...; do
         continue
       fi
-      # === Step 5: Critic — one agent, all three axes ===
+      # === Step 5: Critic — one agent, all three axes (Verdict-Only Contract, ADR-0010 §L5) ===
+      # The orchestrator pre-creates the report directory and hands the path to
+      # the spawn. The critic Writes the full findings JSON to that path and
+      # emits a tiny envelope (~150 bytes) as its final message — the verbose
+      # findings/criteria payload never enters the parent context. This is the
+      # main token-cost lever in ADR-0010; see §L5.
+      mkdir -p "${TMPDIR:-/tmp}/nubos-pilot/critic-reports"
+      CRITIC_REPORT_PATH="${TMPDIR:-/tmp}/nubos-pilot/critic-reports/critic-${TASK_ID}-r${ROUND}.json"
       # Single LLM spawn (sonnet by default — see swarm.critic.tier in config):
-      #   - agents/np-critic.md → CRITIC_OUTPUT_JSON
+      #   - agents/np-critic.md → writes $CRITIC_REPORT_PATH, returns envelope
       # The orchestrator injects the three audit-surface modules into the
-      # spawn's <files_to_read> block — np-critic is thin (role, output schema,
+      # spawn's <files_to_read> block AND hands the agent <report_path> as a
+      # required spawn input — np-critic is thin (role, output schema,
       # trust-layer rules) and treats the three modules as canonical
       # audit-truth (categories, severity rubric, stop-conditions per axis):
       #
@@ -330,9 +387,15 @@ for WAVE_INDEX in 0 1 2 ...; do
       #   - agents/np-critic-tests.md       (Tests axis module)
       #   - agents/np-critic-acceptance.md  (Acceptance axis module)
       #   </files_to_read>
+      #   <report_path>$CRITIC_REPORT_PATH</report_path>
+      #
+      # Final-message shape from the spawn (verbatim, no markdown wrapper):
+      #   { critic, task_id, round, verdict, blockers_count, report_path, run_id }
       #
-      # The critic emits ONE merged JSON covering all three axes.
-      CRITIC_OUTPUTS_JSON=$(printf '[%s]' "$CRITIC_OUTPUT_JSON")
+      # The orchestrator does NOT need to parse the envelope to drive routing
+      # — loop-run-round --phase post-critics --critic-outputs-path reads the
+      # full file directly. Envelope fields are surfaced on np:dashboard for
+      # at-a-glance triage (verdict + blockers_count per task).
       # === Step 5b: Stamp critic spawn-evidence ===
       # MANDATORY — without this stamp, post-critics refuses with
@@ -346,8 +409,15 @@ for WAVE_INDEX in 0 1 2 ...; do
       node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-critic --tool-use-log '[]'
       # === Step 6: Route via loop-evaluate (post-critics) ===
+      # Verdict-Only Contract (ADR-0010 §L5): pass --critic-outputs-path so the
+      # full findings JSON is read directly from disk. The envelope from the
+      # spawn's final message is NOT what loop-evaluate consumes; it routes on
+      # the on-disk findings/criteria payload. The legacy --critic-outputs
+      # inline form is still accepted for runtimes without Write capability or
+      # for migration fixtures (`--force-post-critics` overrides the audit gate
+      # the same way it always has).
       POST_CRIT=$(node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-        --phase post-critics --critic-outputs "$CRITIC_OUTPUTS_JSON")
+        --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH")
       NEXT_ACTION=$(echo "$POST_CRIT" | node -e 'process.stdin.on("data",d=>console.log(JSON.parse(d).next_action))')
       case "$NEXT_ACTION" in
@@ -395,17 +465,17 @@ for WAVE_INDEX in 0 1 2 ...; do
                        case "$PLAN_ASK" in
                          "Plan neu prüfen"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "user-requested-replan" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "user-requested-replan" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID flagged for plan-checker. Run /np:plan-phase $PHASE --repromote, then re-run /np:execute-phase $PHASE." >&2
                            exit 4 ;;
                          "Task als stuck"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "plan-checker-user-stuck" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "plan-checker-user-stuck" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID marked stuck (user choice from plan-checker dialog)." >&2
                            exit 3 ;;
                          "Manuell fixen"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "manual-fix-pending" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "manual-fix-pending" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID paused for manual fix. Resume via /np:execute-phase $PHASE when ready." >&2
                            exit 0 ;;
                        esac ;;
@@ -436,17 +506,17 @@ for WAVE_INDEX in 0 1 2 ...; do
                            continue ;;
                          "Task neu planen"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "user-requested-replan" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "user-requested-replan" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID flagged for plan-checker. Run /np:plan-phase $PHASE --repromote, then re-run /np:execute-phase $PHASE." >&2
                            exit 4 ;;
                          "Task als stuck"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "max-rounds-user-stuck" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "max-rounds-user-stuck" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID marked stuck after $LOOP_MAX_ROUNDS rounds (user choice)." >&2
                            exit 3 ;;
                          "Manuell fixen"*)
                            node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
-                             --phase stuck --reason "manual-fix-pending" --findings "$CRITIC_OUTPUTS_JSON"
+                             --phase stuck --reason "manual-fix-pending" --findings-path "$CRITIC_REPORT_PATH"
                            echo "[np:execute-phase] $TASK_ID paused for manual fix. Resume via /np:execute-phase $PHASE when ready." >&2
                            exit 0 ;;
                        esac ;;
@@ -526,7 +596,9 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Start one checkpoint per task before kicking off the loop.
 - Run `loop-run-round --phase preflight` BEFORE every Round-1 executor spawn — never skip the cache lookup.
 - Spawn `agents/np-executor.md` on Round 1, `agents/np-build-fixer.md` on Round ≥ 2 — once per round, with only that task's `files_modified` in scope (D-04, no scope expansion).
-- Spawn the single Critic agent (`np-critic`) once per round, after a verify-green post-executor. It emits one JSON covering style + tests + acceptance.
+- Spawn the single Critic agent (`np-critic`) once per round, after a verify-green post-executor. It writes the full findings JSON to `$CRITIC_REPORT_PATH` and emits a small verdict envelope as its final message (ADR-0010 §L5 Verdict-Only Contract).
+- Pre-create `${TMPDIR:-/tmp}/nubos-pilot/critic-reports/` before the critic spawn so the agent's `Write` cannot fail on a missing parent directory.
+- Pass `--critic-outputs-path "$CRITIC_REPORT_PATH"` to `loop-run-round --phase post-critics` so the full findings JSON is read from disk rather than replayed through the spawn's final message.
 - Run `loop-run-round --phase post-executor` AFTER mechanical checks; honor `next_action: spawn-build-fixer` (verify-red short-circuit, skip critics this round).
 - Run `loop-run-round --phase post-critics` AFTER critics return, to obtain the routing `next_action`.
 - Run `loop-audit-tool-use` per round per spawn — for executor/build-fixer this drives Rule 9 enforcement, AND for `np-critic` this is the spawn-evidence required by the Layer-C audit-trail gate (`loop-post-executor-missing-spawn-audit` / `loop-post-critics-missing-critic-audit`). After the Single-Critic Revision (ADR-0010, 2026-05-05) the per-round audit count is **two** in rounds ≥ 2 (`np-build-fixer` + `np-critic`) and **`swarm.research.k` + 2** in round 1 (k × `np-researcher` + `np-executor` + `np-critic`). All audits in the active round are mandatory before the corresponding `loop-run-round --phase post-{researcher|executor|critics}` invocation.
@@ -540,6 +612,7 @@ After every slice completes, point the operator at `/np:validate-phase $PHASE` t
 - Spawn the Critic agent BEFORE the post-executor verify-green check — verify must pass first; the critic only runs on verify-green.
 - Use `np-executor` on Round ≥ 2 — use `np-build-fixer` (it gets prior critic findings + verify output excerpt).
 - Skip `loop-audit-tool-use` for ANY spawn (researcher / executor / build-fixer / `np-critic`). Skipping the executor audit silences Rule 9; skipping the critic audit means the orchestrator cannot prove the critic actually ran, and the post-critics gate refuses. Synthesizing `--critic-outputs` JSON without spawning the real `np-critic` agent is the canonical bypass — Layer C blocks it mechanically.
+- Bypass the Verdict-Only Contract by inlining the full findings JSON in the spawn's final message or by reconstructing `$CRITIC_REPORT_PATH` content from the envelope. Both defeat the cost-control purpose of ADR-0010 §L5; the critic is required to `Write` the findings file itself, and the orchestrator is required to read that file via `--critic-outputs-path` rather than the envelope.
 - Extend a task's scope beyond `files_modified` — D-04 violations route to `plan-checker`, not post-hoc PLAN.md mutations.
 - Invoke `git commit`, `git add`, or any bare git command from this workflow or the spawned agent (CLAUDE.md §Git operations).
 - Bundle two tasks into one commit (ADR-0004 atomicity).
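
The two rules this diff adds (pre-create the critic-report directory, then hand the report path to `post-critics`) could be sketched in shell roughly as follows. This is only an illustration under stated assumptions: the `prepare_critic_report` helper and the `<task>-round-<n>.json` file-name pattern are hypothetical and do not come from the package. Only the directory `${TMPDIR:-/tmp}/nubos-pilot/critic-reports/`, the `$CRITIC_REPORT_PATH` variable, and the `--critic-outputs-path` flag appear in the diffed text.

```shell
#!/bin/sh
# Sketch only: prepare_critic_report and the file-name pattern are hypothetical.
# The directory location and the --critic-outputs-path flag are taken from the
# workflow rules shown in the diff.
prepare_critic_report() {
  task_id="$1"
  round="$2"
  report_dir="${TMPDIR:-/tmp}/nubos-pilot/critic-reports"
  # Create the parent directory first so the critic's Write cannot fail on a
  # missing parent.
  mkdir -p "$report_dir" || return 1
  # Hypothetical naming scheme: one report file per task and round.
  printf '%s/%s-round-%s.json\n' "$report_dir" "$task_id" "$round"
}

CRITIC_REPORT_PATH="$(prepare_critic_report "T-01" 1)"
echo "$CRITIC_REPORT_PATH"

# After the critic spawn returns, the orchestrator would read the findings from
# disk rather than from the verdict envelope, along the lines of:
#   node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
#     --phase post-critics --critic-outputs-path "$CRITIC_REPORT_PATH"
```

The helper only computes the path and creates its parent directory. It deliberately does not create the report file, since the workflow requires the critic itself to `Write` the findings.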