npm - nubos-pilot - Versions diffs - 0.9.8 → 1.0.0 - Mend

nubos-pilot 0.9.8 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/agents/np-critic-acceptance.md +7 -5
package/agents/np-critic-style.md +2 -1
package/agents/np-critic-tests.md +2 -1
package/agents/np-critic.md +128 -0
package/agents/np-plan-checker.md +2 -0
package/bin/np-tools/commit-task.cjs +54 -3
package/bin/np-tools/commit-task.test.cjs +61 -5
package/bin/np-tools/commit.cjs +9 -1
package/bin/np-tools/commit.test.cjs +28 -0
package/bin/np-tools/doctor.cjs +10 -1
package/bin/np-tools/loop-audit-tool-use.cjs +16 -0
package/bin/np-tools/loop-commands.test.cjs +643 -38
package/bin/np-tools/loop-run-round.cjs +258 -38
package/bin/np-tools/loop-state-record.cjs +41 -7
package/bin/np-tools/resolve-model.cjs +24 -2
package/bin/np-tools/resolve-model.test.cjs +33 -7
package/bin/researcher-merge.cjs +103 -0
package/bin/researcher-merge.test.cjs +142 -0
package/docs/adr/0010-nubosloop.md +61 -15
package/docs/agent-frontmatter-schema.md +22 -2
package/lib/agents.cjs +39 -1
package/lib/agents.test.cjs +97 -6
package/lib/git.cjs +35 -17
package/lib/git.test.cjs +78 -10
package/lib/nubosloop.cjs +182 -30
package/lib/nubosloop.test.cjs +161 -7
package/package.json +1 -1
package/workflows/add-tests.md +2 -2
package/workflows/architect-phase.md +1 -1
package/workflows/execute-phase.md +202 -39

package/agents/np-critic-acceptance.md CHANGED

@@ -1,6 +1,7 @@
 ---
 name: np-critic-acceptance
-description: Nubosloop critic for acceptance-criteria satisfaction. Spawned in parallel with np-critic-style + np-critic-tests after np-executor (or np-build-fixer) commits a draft. Verifies the task's success_criteria are observably met by the diff. Read-only on source — emits structured findings JSON. ADR-0010.
+description: Audit-surface module for the Acceptance axis of np-critic. NOT spawned independently — loaded by np-critic via `<files_to_read>` injection. Defines categories, severity rubric, and stop-conditions for per-success_criterion verdict, locked-decision conformance, scope-creep, stuck-detection, and infrastructure-mismatch. ADR-0010 §Single-Critic Revision 2026-05-05.
+module: true
 tier: sonnet
 tools: Read, Bash, Grep, Glob
 color: "#A855F7"
@@ -52,7 +53,7 @@ The orchestrator provides these paths in your prompt context. Read every path it
 2. **Locked-decision conformance** — the diff does not violate any locked decision in `M<NNN>-CONTEXT.md`. Violations are findings of category `locked-decision-violation`.
 3. **Scope creep** — the diff does not edit files outside `files_modified`. Out-of-scope edits are findings of category `scope-creep`.
 4. **Stuck-marker check** — if the task is on round 3 with no progress between rounds, you flag `stuck-detected` so the orchestrator escalates.
-5. **Infrastructure-mismatch detection** — if the verify output indicates an infrastructure failure (container exited, runtime version skew, missing service: `php -v` mismatch, `docker exec` errors, port-not-bound, DB-unreachable), do NOT downgrade affected criteria to `Unsatisfied` or `Satisfied`. Mark them `Information-Missing` with a finding of category `information-missing` whose `remediation` names the specific environment delta (e.g., `composer requires php ^8.5, container runs 8.4 — Dockerfile bump required outside this milestone`). The orchestrator routes that to researcher / plan-checker, not back to executor — the code is not at fault.
+5. **Infrastructure-mismatch detection** — if the verify output indicates an infrastructure failure (container exited, runtime version skew, missing service: `php -v` mismatch, `docker exec` errors, port-not-bound, DB-unreachable), do NOT downgrade affected criteria to `Unsatisfied` or `Satisfied`. Mark them `Information-Missing` for the criterion verdict, AND emit a finding of category `infrastructure-mismatch` whose `remediation` names the specific environment delta (e.g., `composer requires php ^8.5, container runs 8.4 — Dockerfile bump required outside this milestone`). The orchestrator routes `infrastructure-mismatch` directly to plan-checker (Container/PHP-skew is rarely researcher-fixable; the milestone-level infra config is what changes). The code is not at fault.
 ## Output Schema
@@ -75,7 +76,7 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
   "findings": [
     {
       "id": "ACC-001",
-      "category": "unmet-criterion | locked-decision-violation | scope-creep | information-missing | question-to-user | stuck-detected",
+      "category": "unmet-criterion | locked-decision-violation | scope-creep | information-missing | infrastructure-mismatch | question-to-user | stuck-detected",
       "severity": "fail | risk | nit",
       "criterion_id": "SC-3",
       "remediation": "Add an integration test that asserts the WWW-Authenticate header value.",
@@ -86,12 +87,13 @@ Emit a single JSON object as your final response (no prose, no markdown wrapper
 }
 ```
-Categories MUST be one of: `unmet-criterion`, `locked-decision-violation`, `scope-creep`, `information-missing`, `question-to-user`, `stuck-detected`. The orchestrator's routing engine maps these:
+Categories MUST be one of: `unmet-criterion`, `locked-decision-violation`, `scope-creep`, `information-missing`, `infrastructure-mismatch`, `question-to-user`, `stuck-detected`. The orchestrator's routing engine maps these:
 - `unmet-criterion` / `scope-creep` → Executor / Build-Fixer (next round).
 - `information-missing` → Researcher-Schwarm (next research round).
+- `infrastructure-mismatch` → plan-checker (env/container delta the milestone owns, not the executor).
 - `question-to-user` → `askuser` (Temporal-style signal-wait when integrated).
-- `locked-decision-violation` → orchestrator escalation (potential plan-checker re-run).
+- `locked-decision-violation` → plan-checker escalation.
 - `stuck-detected` → loop terminates with `stuck` state in STATE.md.
 `verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`.
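Reviewer sketch: the category-to-route mapping listed in this file can be read as a plain lookup table. The block below is a minimal illustration built only from the routes named above. The actual table in `lib/nubosloop.cjs` is not shown in this hunk, and the names `ACCEPTANCE_ROUTES` and `routeFor` are hypothetical.

```javascript
'use strict';

// Hypothetical lookup table derived from the routes listed in
// np-critic-acceptance.md. The real routing engine may differ.
const ACCEPTANCE_ROUTES = Object.freeze({
  'unmet-criterion': 'executor',            // executor or build-fixer, next round
  'scope-creep': 'executor',                // executor or build-fixer, next round
  'information-missing': 'researcher',
  'infrastructure-mismatch': 'plan-checker', // new in 1.0.0
  'question-to-user': 'askuser',
  'locked-decision-violation': 'plan-checker',
  'stuck-detected': 'stuck',
});

// Returns the route for a category, or throws on a category the
// acceptance module does not list.
function routeFor(category) {
  if (!Object.prototype.hasOwnProperty.call(ACCEPTANCE_ROUTES, category)) {
    throw new Error('unknown acceptance category: ' + category);
  }
  return ACCEPTANCE_ROUTES[category];
}
```

Under these assumptions, `routeFor('infrastructure-mismatch')` returns `'plan-checker'`, which reflects the behavioural change in this hunk: before 1.0.0 the same failure was reported as `information-missing` and routed to the researcher.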

package/agents/np-critic-style.md CHANGED

@@ -1,6 +1,7 @@
 ---
 name: np-critic-style
-description: Nubosloop critic for code style, naming conventions, dead code, and dangling threads. Spawned in parallel with np-critic-tests + np-critic-acceptance after np-executor (or np-build-fixer) commits a draft. Read-only on source — emits structured findings JSON. ADR-0010.
+description: Audit-surface module for the Style axis of np-critic. NOT spawned independently — loaded by np-critic via `<files_to_read>` injection. Defines categories, severity rubric, and stop-conditions for code style, naming conventions, dead code, and dangling threads. ADR-0010 §Single-Critic Revision 2026-05-05.
+module: true
 tier: haiku
 tools: Read, Bash, Grep, Glob
 color: "#94A3B8"

package/agents/np-critic-tests.md CHANGED

@@ -1,6 +1,7 @@
 ---
 name: np-critic-tests
-description: Nubosloop critic for test coverage, edge cases, and assertion quality. Spawned in parallel with np-critic-style + np-critic-acceptance after np-executor (or np-build-fixer) commits a draft. Read-only on source — emits structured findings JSON. ADR-0010.
+description: Audit-surface module for the Tests axis of np-critic. NOT spawned independently — loaded by np-critic via `<files_to_read>` injection. Defines categories, severity rubric, and stop-conditions for test coverage, edge cases, and assertion quality. ADR-0010 §Single-Critic Revision 2026-05-05.
+module: true
 tier: sonnet
 tools: Read, Bash, Grep, Glob
 color: "#06B6D4"

package/agents/np-critic.md ADDED

@@ -0,0 +1,128 @@
+---
+name: np-critic
+description: Nubosloop critic for the per-task adversarial review. Spawned ONCE after np-executor (or np-build-fixer) commits a draft. Read-only on source. Reviews three orthogonal axes — style, tests, acceptance — and emits one structured findings JSON. ADR-0010 (single-critic revision 2026-05-05).
+tier: sonnet
+tools: Read, Bash, Grep, Glob
+color: "#A855F7"
+---
+<role>
+You are the nubos-pilot Critic. One spawn per round. You audit the executor's diff against three orthogonal axes — code style, test coverage, and acceptance criteria — and emit a single structured findings JSON. You are read-only on source.
+The orchestrator merges your findings into the routing engine (`lib/nubosloop.cjs`) which decides next-action: executor / build-fixer / researcher / askuser / plan-checker / commit / stuck. Your job is to be thorough across all three axes; the prior 3-critic schwarm collapsed to one because three parallel spawns added latency without proportional finding-quality gains (ADR-0010 §Trust Layer amendment 2026-05-05).
+**CRITICAL: Mandatory Initial Read**
+If the prompt contains a `<files_to_read>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. The orchestrator hands you the task plan, the slice UAT, the milestone CONTEXT, the executor's `files_modified` paths, the diff, and the verify output.
+</role>
+## Completeness Mandate
+This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENESS.md). The rules that bind this role:
+- **Rule 2 — Do it right.** Reject `// TODO`, `// FIXME`, `// XXX`, commented-out code paths, and partial migrations. Each is a finding.
+- **Rule 3 — Do it with tests.** Production code without a corresponding test is the most important finding you can surface. No "trivial enough to skip" exceptions.
+- **Rule 5 — Aim to genuinely impress.** "Mostly satisfied" / "looks fine" are not verdicts. Findings cite file path, line number, the offending pattern, and the concrete remediation.
+- **Rule 6 — Never offer to "table this for later".** A criterion the diff doesn't meet is a finding now, not a "follow-up". The Build-Fixer's next round closes it.
+- **Rule 7 — Never leave a dangling thread.** Dangling imports, unused exports, dead functions, half-renamed identifiers — all findings.
+- **Rule 10 — Test before shipping.** A passing test that does not actually assert the claimed behaviour is worse than no test. Vacuous assertions (`assert(true)`, `expect(x).toBeDefined()` without state-shape checks) are findings.
+- **Rule 11 — Ship the complete thing.** Each criterion gets a verdict; you never silently skip one.
+- **Rule 12 — Boil the ocean.** "Information missing" is a route-to-Researcher signal, not an excuse to pass with reservations.
+Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Spawn-Evidence Audit (Trust Layer, ADR-0010)
+Your spawn must be stamped into the per-task `nubosloop.tool_use_audit` log via `loop-audit-tool-use --agent np-critic --tool-use-log <json>` after you emit your findings JSON. The post-critics gate refuses without this stamp; missing it blocks the entire round. Synthesizing a fake findings JSON without spawning a real critic is a Layer-C violation and the orchestrator must NOT do it.
+## Inputs
+The orchestrator provides these paths in your prompt context. Read every path it hands you via `Read` — do not guess.
+| Input | Purpose | Typical path |
+|-------|---------|--------------|
+| Task plan (required) | Carries `success_criteria`, `files_modified`, `<verify>`, `<acceptance_criteria>`. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/tasks/T<NNNN>/T<NNNN>-PLAN.md` |
+| Slice UAT (required) | Slice-level acceptance — the task contributes to one or more UAT entries. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/S<NNN>-UAT.md` |
+| Milestone CONTEXT (required) | Locked decisions that constrain valid solutions. | `.nubos-pilot/milestones/M<NNN>/M<NNN>-CONTEXT.md` |
+| Executor diff (required) | The patch produced this round. | inline / captured in checkpoint |
+| Verify output (required) | stdout/stderr of the task's verify command. | inline |
+| Files modified (required) | Paths the executor was scoped to. | task plan frontmatter `files_modified` |
+| Codebase docs (recommended) | `.nubos-pilot/codebase/<module>.md` for the touched modules — invariants and gotchas. | `.nubos-pilot/codebase/` |
+## Audit Surface — three axis modules (load BEFORE auditing)
+Your audit surface is defined in three companion module files. The orchestrator MUST inject all three into your prompt's `<files_to_read>` block. You MUST `Read` all three before producing findings — they enumerate every category, severity rubric, and stop-condition the routing engine expects.
+| Module | What it covers | Path |
+|---|---|---|
+| **Style** | Markers, dead code, dangling threads, lint-equivalents, comment & import hygiene | [`agents/np-critic-style.md`](np-critic-style.md) |
+| **Tests** | Missing tests, edge-case gaps, weak assertions, silenced failures, naming, non-determinism, verify-mismatch | [`agents/np-critic-tests.md`](np-critic-tests.md) |
+| **Acceptance** | Per-`success_criterion` verdict, locked-decision conformance, scope-creep, stuck-detection, infrastructure-mismatch | [`agents/np-critic-acceptance.md`](np-critic-acceptance.md) |
+You produce ONE merged findings JSON covering ALL three axes — see Output Schema below. The three modules are your source of audit-truth; ignore their `name`/`tier`/`tools` frontmatter (those describe the legacy 3-critic schwarm, superseded by this single-spawn architecture per ADR-0010 §Single-Critic Revision 2026-05-05). The substantive content (audit surfaces, completeness-rule mappings, finding categories) is canonical.
+If any of the three module files cannot be read, emit `category: critic-error` with `remediation: "missing critic module file: <path>"` and route to `stuck` — the orchestrator must inject all three.
+## Output Schema
+Emit a single JSON object as your final response (no prose, no markdown wrapper around it).
+```json
+{
+  "critic": "critic",
+  "task_id": "M001-S001-T0001",
+  "round": 1,
+  "criteria": [
+    {
+      "id": "SC-1",
+      "claim": "Endpoint returns 401 with WWW-Authenticate: Bearer header",
+      "verdict": "Satisfied | Unsatisfied | Information-Missing",
+      "evidence": "tests/Feature/AuthTest.php@returns_401_for_missing_token (passed in verify output)",
+      "missing_info": "—"
+    }
+  ],
+  "findings": [
+    {
+      "id": "C-001",
+      "category": "<see ROUTE_TABLE — one of style/dead-code/dangling-thread/todo-marker/import-hygiene/comment-hygiene/lint-violation/missing-test/edge-case-gap/weak-assertion/silenced-failure/test-naming/non-deterministic/verify-mismatch/unmet-criterion/scope-creep/information-missing/infrastructure-mismatch/question-to-user/locked-decision-violation/stuck-detected/critic-error/rule-9-violation>",
+      "severity": "fail | risk | nit",
+      "file": "src/foo.ts",
+      "line": 42,
+      "remediation": "<concrete fix instruction>",
+      "criterion_id": "SC-3",
+      "question_to_user": null
+    }
+  ],
+  "verdict": "passed | issues_found"
+}
+```
+`verdict` is `passed` only when every criterion in `criteria[]` is `Satisfied` AND `findings.length === 0`. Otherwise `issues_found`.
+**Routing-engine contract.** `lib/nubosloop.cjs::_normalizeFinding` consumes exactly five fields per finding: `category`, `severity`, `file`, `line`, `remediation`. Every other field (`id`, `criterion_id`, `question_to_user`, etc.) is preserved on the merged finding under `raw`; routing is driven only by the five contract fields.
+**Note on auto-promotion.** The orchestrator's `mergeCriticOutputs` automatically promotes any criterion with verdict `Unsatisfied` to an `unmet-criterion` finding, and any `Information-Missing` to an `information-missing` finding. You SHOULD still emit explicit findings when you want to add file/line/remediation details — the auto-promotion is a safety net, not a substitute. Identical findings are deduplicated by fingerprint.
+## Scope Guardrail
+<scope_guardrail>
+**Do:**
+- Cover all three axes (style + tests + acceptance) in a single spawn.
+- Cite file, line, and concrete remediation per finding — not vague gripes.
+- Cite passing test names from the verify output as `Satisfied` evidence.
+- Mark infra failures `Information-Missing`, never `Unsatisfied`.
+- Emit one JSON object only — no prose wrapper, no markdown fence.
+**Don't:**
+- Edit source — you are read-only.
+- Spawn other agents — you finish your audit and return.
+- Skip an axis "because the diff looks small". A small diff with no tests is a `missing-test` finding.
+- Pass with reservations — verdict is binary (`passed` or `issues_found`); reservations belong in findings.
+- Refuse to surface findings because "the executor will fix them anyway" — surface them, the loop closes them.
+</scope_guardrail>
+## Stop Conditions
+Hard-stop (return findings + verdict; do NOT attempt recovery):
+- The task plan has no `<success_criteria>` block — emit a single `unmet-criterion` finding pointing at this gap and route to plan-checker.
+- The Critic budget (timeout) is exhausted — emit collected criteria + findings + verdict `issues_found`.
+- The diff is unparseable / files are missing → emit `category: critic-error` and route to stuck.
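Reviewer sketch: the verdict rule and the auto-promotion note in the Output Schema section of `np-critic.md` can be illustrated together. The block below is a simplified reading, not the package's `mergeCriticOutputs`. The helper name `summarize` is hypothetical, the severity assigned to promoted findings is an assumption (the source does not state it), and fingerprint deduplication is left out.

```javascript
'use strict';

// Hypothetical helper. Promotes every criterion that is not `Satisfied`
// to a finding, then derives the binary verdict from the merged list.
function summarize(criteria, findings) {
  const promoted = criteria
    .filter((c) => c.verdict !== 'Satisfied')
    .map((c) => ({
      category: c.verdict === 'Unsatisfied' ? 'unmet-criterion' : 'information-missing',
      severity: 'fail', // assumption: promoted severity is not given in the source
      file: null,
      line: null,
      remediation: c.missing_info || c.claim || '',
      criterion_id: c.id,
    }));
  const merged = findings.concat(promoted);
  const allSatisfied = criteria.every((c) => c.verdict === 'Satisfied');
  return {
    findings: merged,
    verdict: allSatisfied && merged.length === 0 ? 'passed' : 'issues_found',
  };
}
```

Note that a style-only finding with every criterion `Satisfied` still yields `issues_found`, because the rule requires an empty findings array as well.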

package/agents/np-plan-checker.md CHANGED

@@ -70,6 +70,8 @@ Each dimension maps to one or more canonical finding categories from `docs/agent
 - `parallel-task-implicit-dependency` — tasks marked `depends_on: []` in the same slice but one of them runs a working-tree-reading verify (`update-docs`, `phpstan analyse`, `git diff`, etc.) against files another sibling modifies. Implicit ordering must be made explicit (Plan-side Trust Layer, ADR-0013).
 - `plan-over-specifies-implementation` — PLAN.md body contains schema DDL, framework-controlled timestamped filenames, or large inline code snippets. Plans specify intent + boundary + acceptance, not implementation. Severity is `major` (advisory) — not a hard block, but you flag it so the planner course-corrects (Plan-side Granularity Doctrine, ADR-0013).
+Note on the Nubosloop critic: as of 2026-05-05 a single `np-critic` agent covers style + tests + acceptance in one spawn (ADR-0010 §Single-Critic Revision). The legacy three-critic schwarm (`np-critic-style`/`np-critic-tests`/`np-critic-acceptance`) is removed. References in older plans should be updated.
 Run each dimension below; for every failure, emit one finding using the matching canonical code.
 ### Dimension 1: Success-Criterion Coverage (Milestone-Level)

package/bin/np-tools/commit-task.cjs CHANGED

@@ -7,7 +7,7 @@ const { TASK_ID_RE, setTaskStatus } = require('../../lib/tasks.cjs');
 const layout = require('../../lib/layout.cjs');
 const git = require('../../lib/git.cjs');
 const { commitTask, findCommitByTaskId } = git;
-const { deleteCheckpoint, readCheckpoint } = require('../../lib/checkpoint.cjs');
+const { deleteCheckpoint, readCheckpoint, mergeCheckpoint } = require('../../lib/checkpoint.cjs');
 const BYPASS_FLAG = '--bypass-nubosloop';
@@ -17,15 +17,25 @@ const BYPASS_FLAG = '--bypass-nubosloop';
 // gamed run that only invokes `loop-run-round --phase commit` directly leaves
 // verify_exit_code and findings undefined. Checking last_phase alone is not
 // enough — we require the cumulative signature.
+//
+// `evaluateLoop` only routes `next_action='commit'` when `findings.length === 0`
+// (see lib/nubosloop.cjs). The previous gate accepted `Array.isArray(findings)`
+// alone — a critic that returned actual findings still satisfied the shape
+// check, letting the commit slip through. Mirror the evaluator's invariant
+// here so a non-empty findings array is a hard refuse, not an accident.
 function _assertLoopGate(taskId, cwd, bypass, stderr) {
   const cp = readCheckpoint(taskId, cwd);
   const np = (cp && cp.nubosloop) || null;
   const last = np && np.last_phase;
+  const findingsObserved = np && np.findings !== undefined ? JSON.stringify(np.findings).slice(0, 60) : 'undefined';
   const checks = [
     { ok: !!cp,                              reason: 'no-checkpoint',                missing: 'checkpoint',         observed: 'no-checkpoint' },
     { ok: last === 'commit',                 reason: 'last-phase-mismatch',          missing: 'last_phase=commit',  observed: last || 'none' },
     { ok: np && np.verify_exit_code === 0,   reason: 'post-executor-not-green',      missing: 'verify_exit_code=0', observed: np && np.verify_exit_code !== undefined ? String(np.verify_exit_code) : 'undefined' },
-    { ok: np && Array.isArray(np.findings),  reason: 'post-critics-missing',         missing: 'findings (array)',   observed: np && np.findings !== undefined ? JSON.stringify(np.findings).slice(0, 60) : 'undefined' },
+    { ok: np && Array.isArray(np.findings),  reason: 'post-critics-missing',         missing: 'findings (array)',   observed: findingsObserved },
+    { ok: np && Array.isArray(np.findings) && np.findings.length === 0,
+                                             reason: 'post-critics-not-converged',   missing: 'findings=[] (zero open findings)',
+                                             observed: findingsObserved },
     { ok: np && !!np.committed_at,           reason: 'commit-phase-not-stamped',     missing: 'committed_at',       observed: (np && np.committed_at) || 'undefined' },
   ];
   const failed = checks.find((c) => !c.ok);
@@ -152,7 +162,45 @@ function run(args, ctx) {
-  commitTask(taskId, safeFiles, message);
+  const result = commitTask(taskId, safeFiles, message);
+  if (result.committed === false && result.reason === 'artifacts-gitignored') {
+    // Soft-skip: every files_modified entry is gitignored. The task ran the
+    // full Nubosloop (preflight → executor → critic), edits landed locally,
+    // and the workflow already stamped `committed_at` via loop-run-round.
+    // We mark the task done WITHOUT a git commit, record the skip reason on
+    // the checkpoint for audit, and let the wave continue. Symmetric to
+    // commit_artifacts=false (commit.cjs:102) and to feedback_no_container_blocker:
+    // gitignore is a routing signal, never a hard stop.
+    try {
+      mergeCheckpoint(taskId, (cur) => ({
+        nubosloop: Object.assign({}, (cur && cur.nubosloop) || {}, {
+          commit_skipped: 'artifacts-gitignored',
+          files_ignored: result.files_ignored.slice(),
+        }),
+      }), cwd);
+    } catch (err) {
+      process.stderr.write('[nubos-pilot warn] checkpoint stamp failed for ' + taskId + ': ' + (err && err.message) + '\n');
+    }
+    try { deleteCheckpoint(taskId, cwd); } catch {}
+    try { setTaskStatus(taskId, 'done', cwd); } catch (err) {
+      process.stderr.write('[nubos-pilot warn] setTaskStatus failed for ' + taskId + ': ' + (err && err.message) + '\n');
+    }
+    const skipPayload = {
+      ok: true,
+      task_id: taskId,
+      committed: false,
+      skip_reason: 'artifacts-gitignored',
+      files: safeFiles,
+      files_ignored: result.files_ignored,
+      files_source: filesSource,
+      nubosloop_bypassed: gate.bypassed,
+      nubosloop_forced_commit_phase: !!gate.forced_commit_phase,
+    };
+    stdout.write(JSON.stringify(skipPayload));
+    return skipPayload;
+  }
   const sha = findCommitByTaskId(taskId);
   try { deleteCheckpoint(taskId, cwd); } catch {  }
@@ -163,8 +211,11 @@ function run(args, ctx) {
   const payload = {
     ok: true,
     task_id: taskId,
+    committed: true,
     sha,
     files: safeFiles,
+    files_committed: result.files_committed,
+    files_ignored: result.files_ignored,
     files_source: filesSource,
     nubosloop_bypassed: gate.bypassed,
     nubosloop_forced_commit_phase: !!gate.forced_commit_phase,
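Reviewer sketch: the ordered checks in `_assertLoopGate`, including the new `post-critics-not-converged` check, can be exercised in isolation. The block below is a hypothetical standalone variant (`firstGateFailure` is not a function in the package) that takes the checkpoint object directly instead of reading it from disk and returns the first failing reason.

```javascript
'use strict';

// Hypothetical standalone variant of the gate checks shown in this diff.
// Returns the reason string of the first failing check, or null when the
// checkpoint satisfies every check.
function firstGateFailure(cp) {
  const np = (cp && cp.nubosloop) || null;
  const findingsOk = !!np && Array.isArray(np.findings);
  const checks = [
    { ok: !!cp, reason: 'no-checkpoint' },
    { ok: !!np && np.last_phase === 'commit', reason: 'last-phase-mismatch' },
    { ok: !!np && np.verify_exit_code === 0, reason: 'post-executor-not-green' },
    { ok: findingsOk, reason: 'post-critics-missing' },
    // New in 1.0.0: an array of open findings is no longer accepted.
    { ok: findingsOk && np.findings.length === 0, reason: 'post-critics-not-converged' },
    { ok: !!np && !!np.committed_at, reason: 'commit-phase-not-stamped' },
  ];
  const failed = checks.find((c) => !c.ok);
  return failed ? failed.reason : null;
}
```

Because `checks.find` stops at the first failure, a checkpoint with no `findings` field reports `post-critics-missing`, while one with a non-empty array reports `post-critics-not-converged`. This matches the distinction that test CT-13b relies on.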

package/bin/np-tools/commit-task.test.cjs CHANGED

@@ -164,24 +164,59 @@ test('CT-3: commit-task emits JSON with sha + files on success', () => {
   assert.ok(subject.startsWith('task(M006-S001-T0001):'), 'subject: ' + subject);
 });
-test('CT-4: commit-task LOUD-FAILS when every files_modified entry is gitignored (D-25)', () => {
+test('CT-4: commit-task SOFT-SKIPS when every files_modified entry is gitignored (artifacts-gitignored terminator)', () => {
   const root = makeRepo();
   seedPlanAndTask(root, '06-01', 'M006-S001-T0002', ['build/out.js']);
   seedLoopReadyCheckpoint(root, 'M006-S001-T0002');
   fs.writeFileSync(path.join(root, '.gitignore'), 'build/\n', 'utf-8');
   fs.mkdirSync(path.join(root, 'build'), { recursive: true });
   fs.writeFileSync(path.join(root, 'build', 'out.js'), 'noise', 'utf-8');
+  const before = execFileSync('git', ['-C', root, 'log', '--format=%H'], { encoding: 'utf-8' }).trim().split('\n').filter(Boolean).length;
   const prev = process.cwd();
   process.chdir(root);
   const cap = _capture();
+  let payload;
   try {
-    assert.throws(
-      () => subcmd.run(['M006-S001-T0002'], { cwd: root, stdout: cap.stub }),
-      (err) => err && err.code === 'commit-all-paths-gitignored',
-    );
+    payload = subcmd.run(['M006-S001-T0002'], { cwd: root, stdout: cap.stub });
   } finally {
     process.chdir(prev);
   }
+  assert.equal(payload.ok, true);
+  assert.equal(payload.committed, false);
+  assert.equal(payload.skip_reason, 'artifacts-gitignored');
+  assert.deepEqual(payload.files_ignored, ['build/out.js']);
+  const after = execFileSync('git', ['-C', root, 'log', '--format=%H'], { encoding: 'utf-8' }).trim().split('\n').filter(Boolean).length;
+  assert.equal(after, before, 'soft-skip must not produce a commit');
+  const cpPath = path.join(root, '.nubos-pilot', 'checkpoints', 'M006-S001-T0002.json');
+  assert.equal(fs.existsSync(cpPath), false, 'checkpoint must be deleted on terminal skip (symmetric to commit success)');
+});
+test('CT-4b: commit-task commits the tracked subset on mixed paths (artifacts + real source)', () => {
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0003', ['src/a.ts', '.nubos-pilot/codebase/modules/x.md']);
+  seedLoopReadyCheckpoint(root, 'M006-S001-T0003');
+  fs.writeFileSync(path.join(root, '.gitignore'), '.nubos-pilot/codebase/\n', 'utf-8');
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const x = 1;', 'utf-8');
+  fs.mkdirSync(path.join(root, '.nubos-pilot', 'codebase', 'modules'), { recursive: true });
+  fs.writeFileSync(path.join(root, '.nubos-pilot', 'codebase', 'modules', 'x.md'), '# X', 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  const cap = _capture();
+  let payload;
+  try {
+    payload = subcmd.run(['M006-S001-T0003'], { cwd: root, stdout: cap.stub });
+  } finally {
+    process.chdir(prev);
+  }
+  assert.equal(payload.ok, true);
+  assert.equal(payload.committed, true);
+  assert.deepEqual(payload.files_committed, ['src/a.ts']);
+  assert.deepEqual(payload.files_ignored, ['.nubos-pilot/codebase/modules/x.md']);
+  assert.ok(/^[0-9a-f]{40}$/.test(payload.sha));
+  const stat = execFileSync('git', ['-C', root, 'show', '--stat', '--format=', 'HEAD'], { encoding: 'utf-8' });
+  assert.match(stat, /src\/a\.ts/);
+  assert.doesNotMatch(stat, /codebase\/modules\/x\.md/);
 });
 test('CT-5: commit-task unknown task id → task-not-found', () => {
@@ -319,6 +354,27 @@ test('CT-13: refuse gamed commit when verify ran but post-critics findings missi
   );
 });
+test('CT-13b: refuse gamed commit when post-critics produced non-empty findings', () => {
+  // `evaluateLoop` only routes `next_action=commit` when findings.length===0.
+  // The earlier shape-only gate accepted any array — a critic that returned
+  // open issues still passed if the orchestrator stamped --phase commit on
+  // top. Mirror the evaluator's invariant: non-empty findings = refuse.
+  const root = makeRepo();
+  seedPlanAndTask(root, '06-01', 'M006-S001-T0033', ['src/j.ts']);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'j.ts'), 'export const j = 10;\n', 'utf-8');
+  seedLoopReadyCheckpoint(root, 'M006-S001-T0033', {
+    nubosloop: { findings: [{ category: 'todo-marker', file: 'src/j.ts', line: 1, severity: 'fail' }] },
+  });
+  const cap = _capture();
+  const stderr = _capture();
+  assert.throws(
+    () => subcmd.run(['M006-S001-T0033'], { cwd: root, stdout: cap.stub, stderr: stderr.stub }),
+    (err) => err && err.code === 'commit-task-loop-bypass-violation'
+      && err.details && err.details.reason === 'post-critics-not-converged',
+  );
+});
 test('CT-14: refuse when verify-red was recorded (post-executor failed)', () => {
   const root = makeRepo();
   seedPlanAndTask(root, '06-01', 'M006-S001-T0032', ['src/i.ts']);

package/bin/np-tools/commit.cjs CHANGED

@@ -107,7 +107,15 @@ function run(argv, ctx) {
     const normalized = _normalizeFiles(files, cwd, root);
     const committable = assertCommittablePaths(normalized, { cwd: root });
     if (committable.length === 0) {
-      throw new NubosPilotError('commit-no-paths', 'commit invoked with no committable paths', { files });
+      // All paths gitignored → soft-skip with structured payload (symmetric to
+      // commit_artifacts=false above). The earlier `commit-no-paths` throw
+      // turned a routing signal into a hard error.
+      stdout.write(JSON.stringify({
+        committed: false,
+        reason: 'artifacts-gitignored',
+        files_ignored: normalized,
+      }) + '\n');
+      return 0;
     }
     execFileSync('git', ['add', '--', ...committable], { cwd: root, stdio: 'pipe' });
     execFileSync('git', ['commit', '-m', msg, '--', ...committable], { cwd: root, stdio: 'pipe' });
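Reviewer sketch: callers that previously caught the `commit-no-paths` error now need to inspect stdout instead, since the all-gitignored case exits 0 with a JSON line. The block below shows one way a caller might detect that payload. `isGitignoreSkip` is a hypothetical helper, and it deliberately makes no assumption about what the command prints on a successful commit, which this hunk does not show.

```javascript
'use strict';

// Hypothetical caller-side helper. Returns true only when the captured
// stdout is the soft-skip payload introduced in this hunk.
function isGitignoreSkip(stdoutText) {
  const line = String(stdoutText).trim();
  if (line === '') return false;
  let payload;
  try {
    payload = JSON.parse(line);
  } catch {
    return false; // not JSON, so not the soft-skip payload
  }
  return payload !== null
    && typeof payload === 'object'
    && payload.committed === false
    && payload.reason === 'artifacts-gitignored';
}
```

A caller could then treat a `true` result as "nothing to commit, continue the wave" rather than as a failure, in line with the comment in the hunk that gitignore is a routing signal.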

package/bin/np-tools/commit.test.cjs CHANGED

@@ -141,6 +141,34 @@ test('COMMIT-5: workflow.commit_artifacts=false skips commit silently with exit
   assert.equal(logOut, '', 'expected no commits to be created');
 });
+test('COMMIT-7: all-paths-gitignored soft-skips with structured payload (exit 0, no commit)', () => {
+  const sb = makeSandbox();
+  initGit(sb);
+  fs.writeFileSync(path.join(sb, '.gitignore'), 'build/\n');
+  fs.mkdirSync(path.join(sb, 'build'), { recursive: true });
+  fs.writeFileSync(path.join(sb, 'build', 'out.js'), 'noise');
+  const stdout = makeSink();
+  const stderr = makeSink();
+  const origCwd = process.cwd();
+  process.chdir(sb);
+  let code;
+  try {
+    code = commitCli.run(['chore: artifact', '--files', 'build/out.js'], { stdout, stderr });
+  } finally {
+    process.chdir(origCwd);
+  }
+  assert.equal(code, 0, 'stderr=' + stderr.toString());
+  const payload = JSON.parse(stdout.toString().trim());
+  assert.equal(payload.committed, false);
+  assert.equal(payload.reason, 'artifacts-gitignored');
+  assert.deepEqual(payload.files_ignored, ['build/out.js']);
+  let logOut;
+  try {
+    logOut = execFileSync('git', ['log', '--format=%H'], { cwd: sb, encoding: 'utf-8' });
+  } catch { logOut = ''; }
+  assert.equal(logOut.trim(), '', 'expected no commits to be created');
+});
 test('COMMIT-6: workflow.commit_artifacts=true still commits normally', () => {
   const sb = makeSandbox();
   initGit(sb);

package/bin/np-tools/doctor.cjs CHANGED

@@ -336,7 +336,16 @@ function _checkMilestoneLayout(projectRoot) {
   return issues;
 }
-const NUBOSLOOP_CRITICS = ['np-critic-style', 'np-critic-tests', 'np-critic-acceptance'];
+// Single-critic revision (ADR-0010 §Single-Critic Revision 2026-05-05): one
+// np-critic spawned per round, with three audit-surface modules loaded as
+// <files_to_read>. The doctor checks that all four files are present —
+// missing the spawnable critic OR any of the three modules breaks the loop.
+const NUBOSLOOP_CRITICS = [
+  'np-critic',             // spawnable (sonnet)
+  'np-critic-style',       // axis module (Style)
+  'np-critic-tests',       // axis module (Tests)
+  'np-critic-acceptance',  // axis module (Acceptance)
+];
 function _checkNubosloopCritics(projectRoot) {
   const issues = [];

package/bin/np-tools/loop-audit-tool-use.cjs CHANGED

@@ -2,6 +2,7 @@
 const checkpoint = require('../../lib/checkpoint.cjs');
 const nubosloop = require('../../lib/nubosloop.cjs');
+const agentsLib = require('../../lib/agents.cjs');
 const args = require('./_args.cjs');
 const TASK_ID_RE = checkpoint.TASK_ID_RE;
@@ -31,6 +32,21 @@ function run(argv, ctx) {
       { hint: 'agents requiring search tools: ' + nubosloop.AUDITED_AGENTS.join(', ') },
     );
   }
+  if (typeof agent === 'string' && agent.startsWith('np-')) {
+    try {
+      agentsLib.loadAgentModule(agent, cwd);
+      throw new (require('../../lib/core.cjs').NubosPilotError)(
+        'loop-audit-agent-is-module',
+        'loop-audit-tool-use refuses to record a spawn for "' + agent + '": this agent is a module (module: true) and cannot be spawned independently',
+        { agent, hint: 'Modules are loaded as <files_to_read> by their parent agent. Spawn the parent and audit that name instead.' },
+      );
+    } catch (err) {
+      if (!err) throw err;
+      if (err.code === 'loop-audit-agent-is-module') throw err;
+      // Any other error (agent-not-found, agent-not-a-module) means the name
+      // is not a known module — fall through and accept the audit.
+    }
+  }
   // --tool-use-log is required for AUDITED_AGENTS (Rule 9 enforcement reads
   // the tool list to verify search-knowledge / match-existing-learning calls).
   // For non-audited spawns (critics, plan-checker, etc.) the orchestrator may