npm - nubos-pilot - Versions diffs - 0.1.0 - Mend

nubos-pilot 0.1.0


Files changed (273)

package/agents/np-ai-researcher.md +140 -0
package/agents/np-code-fixer.md +363 -0
package/agents/np-code-reviewer.md +351 -0
package/agents/np-domain-researcher.md +136 -0
package/agents/np-eval-auditor.md +167 -0
package/agents/np-eval-planner.md +153 -0
package/agents/np-executor.md +72 -0
package/agents/np-framework-selector.md +171 -0
package/agents/np-nyquist-auditor.md +185 -0
package/agents/np-plan-checker.md +165 -0
package/agents/np-planner.md +199 -0
package/agents/np-researcher.md +150 -0
package/agents/np-security-auditor.md +206 -0
package/agents/np-ui-auditor.md +369 -0
package/agents/np-ui-checker.md +192 -0
package/agents/np-ui-researcher.md +324 -0
package/agents/np-verifier.md +79 -0
package/bin/check-coverage.cjs +40 -0
package/bin/check-workflows.cjs +171 -0
package/bin/check-workflows.test.cjs +208 -0
package/bin/install.js +500 -0
package/bin/np-tools/_commands.cjs +70 -0
package/bin/np-tools/add-tests.cjs +171 -0
package/bin/np-tools/add-tests.test.cjs +122 -0
package/bin/np-tools/add-todo.cjs +108 -0
package/bin/np-tools/add-todo.test.cjs +112 -0
package/bin/np-tools/agent-skills.cjs +14 -0
package/bin/np-tools/agent-skills.test.cjs +42 -0
package/bin/np-tools/ai-integration-phase.cjs +109 -0
package/bin/np-tools/ai-integration-phase.test.cjs +123 -0
package/bin/np-tools/askuser.cjs +53 -0
package/bin/np-tools/askuser.test.cjs +49 -0
package/bin/np-tools/autonomous.cjs +69 -0
package/bin/np-tools/autonomous.test.cjs +74 -0
package/bin/np-tools/checkpoint.cjs +101 -0
package/bin/np-tools/checkpoint.test.cjs +119 -0
package/bin/np-tools/code-review.cjs +133 -0
package/bin/np-tools/code-review.test.cjs +96 -0
package/bin/np-tools/commit-task.cjs +120 -0
package/bin/np-tools/commit-task.test.cjs +160 -0
package/bin/np-tools/commit.cjs +103 -0
package/bin/np-tools/commit.test.cjs +93 -0
package/bin/np-tools/config.cjs +101 -0
package/bin/np-tools/config.test.cjs +71 -0
package/bin/np-tools/discuss-phase-power.cjs +265 -0
package/bin/np-tools/discuss-phase-power.test.cjs +242 -0
package/bin/np-tools/discuss-phase.cjs +132 -0
package/bin/np-tools/discuss-phase.test.cjs +148 -0
package/bin/np-tools/dispatch.cjs +116 -0
package/bin/np-tools/doctor.cjs +242 -0
package/bin/np-tools/eval-review.cjs +116 -0
package/bin/np-tools/eval-review.test.cjs +123 -0
package/bin/np-tools/execute-phase.cjs +182 -0
package/bin/np-tools/execute-phase.test.cjs +116 -0
package/bin/np-tools/execute-plan.cjs +124 -0
package/bin/np-tools/execute-plan.test.cjs +82 -0
package/bin/np-tools/help.cjs +28 -0
package/bin/np-tools/help.test.cjs +29 -0
package/bin/np-tools/init-dispatch.test.cjs +91 -0
package/bin/np-tools/metrics.cjs +97 -0
package/bin/np-tools/metrics.test.cjs +188 -0
package/bin/np-tools/new-milestone.cjs +288 -0
package/bin/np-tools/new-milestone.test.cjs +166 -0
package/bin/np-tools/new-project.cjs +284 -0
package/bin/np-tools/new-project.test.cjs +165 -0
package/bin/np-tools/next.cjs +7 -0
package/bin/np-tools/next.test.cjs +30 -0
package/bin/np-tools/park.cjs +48 -0
package/bin/np-tools/park.test.cjs +50 -0
package/bin/np-tools/pause-work.cjs +24 -0
package/bin/np-tools/pause-work.test.cjs +74 -0
package/bin/np-tools/phase.cjs +71 -0
package/bin/np-tools/phase.test.cjs +81 -0
package/bin/np-tools/plan-diff.cjs +57 -0
package/bin/np-tools/plan-diff.test.cjs +134 -0
package/bin/np-tools/plan-milestone-gaps.cjs +115 -0
package/bin/np-tools/plan-milestone-gaps.test.cjs +122 -0
package/bin/np-tools/plan-phase.cjs +350 -0
package/bin/np-tools/plan-phase.test.cjs +263 -0
package/bin/np-tools/progress.cjs +7 -0
package/bin/np-tools/progress.test.cjs +44 -0
package/bin/np-tools/queue.cjs +213 -0
package/bin/np-tools/research-phase.cjs +144 -0
package/bin/np-tools/research-phase.test.cjs +154 -0
package/bin/np-tools/reset-slice.cjs +17 -0
package/bin/np-tools/reset-slice.test.cjs +96 -0
package/bin/np-tools/resolve-model.cjs +110 -0
package/bin/np-tools/resolve-model.test.cjs +200 -0
package/bin/np-tools/resume-work.cjs +76 -0
package/bin/np-tools/resume-work.test.cjs +91 -0
package/bin/np-tools/skip.cjs +48 -0
package/bin/np-tools/skip.test.cjs +66 -0
package/bin/np-tools/slug.cjs +34 -0
package/bin/np-tools/slug.test.cjs +46 -0
package/bin/np-tools/state.cjs +16 -0
package/bin/np-tools/state.test.cjs +40 -0
package/bin/np-tools/stats.cjs +151 -0
package/bin/np-tools/stats.test.cjs +118 -0
package/bin/np-tools/triage.cjs +128 -0
package/bin/np-tools/ui-phase.cjs +108 -0
package/bin/np-tools/ui-phase.test.cjs +121 -0
package/bin/np-tools/ui-review.cjs +108 -0
package/bin/np-tools/ui-review.test.cjs +120 -0
package/bin/np-tools/undo-task.cjs +31 -0
package/bin/np-tools/undo-task.test.cjs +117 -0
package/bin/np-tools/undo.cjs +43 -0
package/bin/np-tools/undo.test.cjs +120 -0
package/bin/np-tools/unpark.cjs +48 -0
package/bin/np-tools/unpark.test.cjs +50 -0
package/bin/np-tools/verify-work.cjs +186 -0
package/bin/np-tools/verify-work.test.cjs +97 -0
package/docs/adr/0001-no-daemon-invariant.md +82 -0
package/docs/adr/0002-zero-runtime-dependencies.md +90 -0
package/docs/adr/0003-max-six-unit-types.md +85 -0
package/docs/adr/0004-atomic-commit-per-unit.md +102 -0
package/docs/adr/0005-three-orthogonal-file-trees.md +98 -0
package/docs/adr/0006-yaml-dependency-amendment.md +60 -0
package/docs/adr/README.md +27 -0
package/docs/agent-frontmatter-schema.md +84 -0
package/docs/phase-artifact-schemas.md +292 -0
package/docs/phase-directory-layout.md +82 -0
package/lib/__tests__/README.md +1 -0
package/lib/agents.cjs +98 -0
package/lib/agents.test.cjs +286 -0
package/lib/askuser.cjs +36 -0
package/lib/askuser.test.cjs +310 -0
package/lib/checkpoint.cjs +135 -0
package/lib/checkpoint.test.cjs +184 -0
package/lib/core.cjs +165 -0
package/lib/core.test.cjs +405 -0
package/lib/fixtures/README.md +1 -0
package/lib/fixtures/phase-tree/README.md +1 -0
package/lib/fixtures/plans/cycle/PLAN.md +16 -0
package/lib/fixtures/plans/cycle/tasks/T-01.md +20 -0
package/lib/fixtures/plans/cycle/tasks/T-02.md +20 -0
package/lib/fixtures/plans/cycle/tasks/T-03.md +20 -0
package/lib/fixtures/plans/linear/PLAN.md +16 -0
package/lib/fixtures/plans/linear/tasks/T-01.md +20 -0
package/lib/fixtures/plans/linear/tasks/T-02.md +20 -0
package/lib/fixtures/plans/linear/tasks/T-03.md +20 -0
package/lib/fixtures/plans/parallel/PLAN.md +16 -0
package/lib/fixtures/plans/parallel/tasks/T-01.md +20 -0
package/lib/fixtures/plans/parallel/tasks/T-02.md +20 -0
package/lib/fixtures/plans/parallel/tasks/T-03.md +20 -0
package/lib/fixtures/plans/wave-conflict/PLAN.md +16 -0
package/lib/fixtures/plans/wave-conflict/tasks/T-01.md +20 -0
package/lib/fixtures/plans/wave-conflict/tasks/T-02.md +20 -0
package/lib/fixtures/roadmap/ROADMAP-malformed.md +3 -0
package/lib/fixtures/roadmap/ROADMAP-minimal.md +51 -0
package/lib/fixtures/roadmap/roadmap-malformed.yaml +7 -0
package/lib/fixtures/roadmap/roadmap-minimal.yaml +40 -0
package/lib/fixtures/roadmap/roadmap-ten-phases.yaml +101 -0
package/lib/fixtures/templates/phase-context.md +6 -0
package/lib/fixtures/templates/plan-skeleton.md +6 -0
package/lib/frontmatter.cjs +251 -0
package/lib/frontmatter.test.cjs +177 -0
package/lib/gaps.cjs +197 -0
package/lib/gaps.test.cjs +200 -0
package/lib/git.cjs +207 -0
package/lib/git.test.cjs +305 -0
package/lib/install/agents-md.cjs +77 -0
package/lib/install/backup.cjs +70 -0
package/lib/install/codex-toml.cjs +440 -0
package/lib/install/managed-block.cjs +30 -0
package/lib/install/manifest.cjs +148 -0
package/lib/install/mcp-writer.cjs +127 -0
package/lib/install/runtime-detect.cjs +44 -0
package/lib/install/staging.cjs +149 -0
package/lib/metrics-aggregate.cjs +229 -0
package/lib/metrics-aggregate.test.cjs +192 -0
package/lib/metrics.cjs +120 -0
package/lib/metrics.test.cjs +182 -0
package/lib/model-aliases.regression.test.cjs +16 -0
package/lib/model-profiles.cjs +42 -0
package/lib/model-profiles.test.cjs +61 -0
package/lib/next.cjs +236 -0
package/lib/next.test.cjs +194 -0
package/lib/phase.cjs +95 -0
package/lib/phase.test.cjs +189 -0
package/lib/plan-checker-contract.test.cjs +72 -0
package/lib/plan-diff.cjs +173 -0
package/lib/plan-diff.test.cjs +217 -0
package/lib/plan.cjs +85 -0
package/lib/plan.test.cjs +263 -0
package/lib/progress.cjs +95 -0
package/lib/progress.test.cjs +116 -0
package/lib/researcher-contract.test.cjs +61 -0
package/lib/roadmap-render.cjs +206 -0
package/lib/roadmap-render.test.cjs +121 -0
package/lib/roadmap.cjs +416 -0
package/lib/roadmap.test.cjs +371 -0
package/lib/runtime/_contract.test.cjs +61 -0
package/lib/runtime/_readline.cjs +119 -0
package/lib/runtime/_readline.test.cjs +126 -0
package/lib/runtime/claude.cjs +48 -0
package/lib/runtime/claude.test.cjs +101 -0
package/lib/runtime/codex.cjs +35 -0
package/lib/runtime/codex.test.cjs +114 -0
package/lib/runtime/gemini.cjs +35 -0
package/lib/runtime/gemini.test.cjs +109 -0
package/lib/runtime/index.cjs +49 -0
package/lib/runtime/index.test.cjs +181 -0
package/lib/runtime/opencode.cjs +35 -0
package/lib/runtime/opencode.test.cjs +124 -0
package/lib/state.cjs +205 -0
package/lib/state.test.cjs +264 -0
package/lib/surface-audit.test.cjs +46 -0
package/lib/tasks.cjs +327 -0
package/lib/tasks.test.cjs +389 -0
package/lib/template.cjs +66 -0
package/lib/template.test.cjs +159 -0
package/lib/undo.cjs +179 -0
package/lib/undo.test.cjs +261 -0
package/lib/verify.cjs +116 -0
package/lib/verify.test.cjs +187 -0
package/np-tools.cjs +303 -0
package/package.json +39 -0
package/templates/AI-SPEC.md +90 -0
package/templates/CONTEXT.md +32 -0
package/templates/PLAN.md +69 -0
package/templates/PROJECT.md +60 -0
package/templates/REQUIREMENTS.md +38 -0
package/templates/SECURITY.md +61 -0
package/templates/UI-SPEC.md +64 -0
package/templates/VALIDATION.md +76 -0
package/templates/claude/payload/README.md +11 -0
package/templates/opencode/opencode.json +6 -0
package/templates/opencode/payload/AGENTS.md +9 -0
package/workflows/add-backlog.md +212 -0
package/workflows/add-tests.md +69 -0
package/workflows/add-todo.md +222 -0
package/workflows/ai-integration-phase.md +230 -0
package/workflows/autonomous.md +94 -0
package/workflows/cleanup.md +325 -0
package/workflows/code-review-fix.md +435 -0
package/workflows/code-review.md +447 -0
package/workflows/discuss-phase-assumptions.md +269 -0
package/workflows/discuss-phase-power.md +139 -0
package/workflows/discuss-phase.md +386 -0
package/workflows/dispatch.md +9 -0
package/workflows/doctor.md +10 -0
package/workflows/eval-review.md +243 -0
package/workflows/execute-phase.md +142 -0
package/workflows/execute-plan.md +82 -0
package/workflows/help.md +8 -0
package/workflows/new-milestone.md +166 -0
package/workflows/new-project.md +213 -0
package/workflows/next.md +8 -0
package/workflows/note.md +244 -0
package/workflows/park.md +29 -0
package/workflows/pause-work.md +34 -0
package/workflows/plan-milestone-gaps.md +233 -0
package/workflows/plan-phase.md +351 -0
package/workflows/progress.md +8 -0
package/workflows/queue.md +9 -0
package/workflows/research-phase.md +327 -0
package/workflows/reset-slice.md +39 -0
package/workflows/resume-work.md +79 -0
package/workflows/review.md +489 -0
package/workflows/secure-phase.md +209 -0
package/workflows/session-report.md +243 -0
package/workflows/skip.md +29 -0
package/workflows/state.md +7 -0
package/workflows/stats.md +170 -0
package/workflows/thread.md +214 -0
package/workflows/triage.md +9 -0
package/workflows/ui-phase.md +246 -0
package/workflows/ui-review.md +222 -0
package/workflows/undo-task.md +42 -0
package/workflows/undo.md +55 -0
package/workflows/unpark.md +29 -0
package/workflows/validate-phase.md +231 -0
package/workflows/verify-work.md +83 -0

package/bin/np-tools/verify-work.cjs ADDED

@@ -0,0 +1,186 @@
+const fs = require('node:fs');
+const path = require('node:path');
+const os = require('node:os');
+const crypto = require('node:crypto');
+const {
+  NubosPilotError,
+  projectStateDir,
+  atomicWriteFileSync,
+  withFileLock,
+} = require('../../lib/core.cjs');
+const { getPhase } = require('../../lib/roadmap.cjs');
+const { paddedPhase, findPhaseDir } = require('../../lib/phase.cjs');
+const {
+  verifyPhase,
+  renderVerificationMd,
+  writeVerificationMd,
+} = require('../../lib/verify.cjs');
+const { getAgentSkills } = require('../../lib/agents.cjs');
+const INLINE_THRESHOLD_BYTES = 16 * 1024;
+const _VALID_SC_STATUSES = new Set(['Pass', 'Fail', 'Defer', 'Pending']);
+function _validatePhaseArg(raw) {
+  if (raw == null || raw === '' || !/^\d+(\.\d+)?$/.test(String(raw))) {
+    throw new NubosPilotError(
+      'verify-work-invalid-phase',
+      'verify-work requires a numeric phase argument',
+      { value: raw == null ? '' : String(raw) },
+    );
+  }
+  return String(raw);
+}
+function _safeSkills(name, cwd) {
+  try { return getAgentSkills(name, cwd); } catch { return []; }
+}
+function _emit(payload, stdout, cwd) {
+  const json = JSON.stringify(payload, null, 2);
+  if (Buffer.byteLength(json, 'utf-8') <= INLINE_THRESHOLD_BYTES) {
+    stdout.write(json);
+    return;
+  }
+  let tmpDir;
+  try {
+    tmpDir = path.join(projectStateDir(cwd), '.tmp');
+    fs.mkdirSync(tmpDir, { recursive: true });
+  } catch { tmpDir = os.tmpdir(); }
+  const suffix = process.pid + '-' + crypto.randomBytes(4).toString('hex');
+  const tmpPath = path.join(tmpDir, 'init-verify-work-' + suffix + '.json');
+  fs.writeFileSync(tmpPath, json, 'utf-8');
+  stdout.write('@file:' + tmpPath);
+}
+function _initPayload(phaseArg, cwd) {
+  const phaseN = Number(phaseArg);
+  const phase = getPhase(phaseN, cwd);
+  const padded = paddedPhase(phaseN);
+  const phase_dir = findPhaseDir(phaseN, cwd);
+  const results = verifyPhase(phaseN, { cwd });
+  return {
+    _workflow: 'verify-work',
+    phase: phaseArg,
+    padded,
+    phase_dir,
+    phase_name: phase.name,
+    success_criteria: Array.isArray(phase.success_criteria) ? phase.success_criteria : [],
+    draft_results: results,
+    verification_path: phase_dir ? path.join(phase_dir, padded + '-VERIFICATION.md') : null,
+    verifier_tier: 'sonnet',
+    agent_skills: { verifier: _safeSkills('np-verifier', cwd) },
+  };
+}
+function _emitDraft(phaseArg, cwd) {
+  const phaseN = Number(phaseArg);
+  writeVerificationMd(phaseN, cwd);
+  const padded = paddedPhase(phaseN);
+  const phase_dir = findPhaseDir(phaseN, cwd);
+  return { ok: true, path: path.join(phase_dir, padded + '-VERIFICATION.md') };
+}
+function _recordSc(phaseArg, scId, status, notes, cwd) {
+  if (!/^SC-\d+$/.test(String(scId))) {
+    throw new NubosPilotError(
+      'verify-work-invalid-sc-id',
+      'Invalid SC id: ' + scId + ' (expected SC-N)',
+      { scId },
+    );
+  }
+  if (!_VALID_SC_STATUSES.has(status)) {
+    throw new NubosPilotError(
+      'verify-work-invalid-status',
+      'Invalid SC status: ' + status + ' (allowed: ' + [..._VALID_SC_STATUSES].join(', ') + ')',
+      { status },
+    );
+  }
+  const phaseN = Number(phaseArg);
+  const padded = paddedPhase(phaseN);
+  const phase_dir = findPhaseDir(phaseN, cwd);
+  if (!phase_dir) {
+    throw new NubosPilotError(
+      'verify-work-phase-dir-missing',
+      'Phase directory not found for phase ' + phaseN,
+      { phase: phaseN },
+    );
+  }
+  const target = path.join(phase_dir, padded + '-VERIFICATION.md');
+  return withFileLock(target, () => {
+    let raw;
+    try { raw = fs.readFileSync(target, 'utf-8'); } catch (err) {
+      throw new NubosPilotError(
+        'verify-work-file-unreadable',
+        'VERIFICATION.md not readable at ' + target + ' — run `verify-work emit-draft` first',
+        { path: target, cause: err && err.code },
+      );
+    }
+    const blockRe = new RegExp(
+      '^(### ' + scId + ':[^\\n]*\\n)(- \\*\\*Status:\\*\\* )[^\\n]*(\\n- \\*\\*Classified by:\\*\\* )[^\\n]*',
+      'm',
+    );
+    if (!blockRe.test(raw)) {
+      throw new NubosPilotError(
+        'verify-work-sc-not-found',
+        'SC ' + scId + ' not found in VERIFICATION.md',
+        { scId, path: target },
+      );
+    }
+    let next = raw.replace(blockRe, (_m, hdr, p1, p3) => hdr + p1 + status + p3 + 'user');
+    if (notes) {
+      const afterRe = new RegExp(
+        '^(### ' + scId + ':[^\\n]*\\n- \\*\\*Status:\\*\\* [^\\n]*\\n- \\*\\*Classified by:\\*\\* [^\\n]*\\n- \\*\\*Evidence:\\*\\* [^\\n]*)(\\n- \\*\\*Notes:\\*\\* [^\\n]*)?',
+        'm',
+      );
+      next = next.replace(afterRe, (_m, head) => head + '\n- **Notes:** ' + notes);
+    }
+    atomicWriteFileSync(target, next);
+    return { ok: true, sc_id: scId, status, path: target };
+  });
+}
+function run(args, ctx) {
+  const context = ctx || {};
+  const cwd = context.cwd || process.cwd();
+  const stdout = context.stdout || process.stdout;
+  const list = Array.isArray(args) ? args : [];
+  const verb = list[0];
+  switch (verb) {
+    case 'init': {
+      const phaseArg = _validatePhaseArg(list[1]);
+      const payload = _initPayload(phaseArg, cwd);
+      _emit(payload, stdout, cwd);
+      return payload;
+    }
+    case 'emit-draft': {
+      const phaseArg = _validatePhaseArg(list[1]);
+      const result = _emitDraft(phaseArg, cwd);
+      stdout.write(JSON.stringify(result));
+      return result;
+    }
+    case 'record-sc': {
+      const phaseArg = _validatePhaseArg(list[1]);
+      const scId = list[2];
+      const status = list[3];
+      const notes = list.slice(4).join(' ') || null;
+      const result = _recordSc(phaseArg, scId, status, notes, cwd);
+      stdout.write(JSON.stringify(result));
+      return result;
+    }
+    default:
+      throw new NubosPilotError(
+        'verify-work-unknown-verb',
+        'verify-work: unknown verb: ' + String(verb),
+        { verb },
+      );
+  }
+}
+module.exports = { run, INLINE_THRESHOLD_BYTES };
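
The `record-sc` rewrite above hinges on one multiline regular expression plus a function replacer. The following is a minimal, self-contained sketch of that single step. `setScStatus` and the sample draft text (including the `auto` classifier value and the `Evidence` lines) are assumptions made for illustration; they are not the package's API or its actual draft output.

```javascript
// Hypothetical sample of a VERIFICATION.md draft; the real layout may differ.
const SAMPLE = [
  '### SC-1: Tasks commit atomically',
  '- **Status:** Pending',
  '- **Classified by:** auto',
  '- **Evidence:** none',
  '',
  '### SC-2: Verification runs',
  '- **Status:** Pending',
  '- **Classified by:** auto',
  '- **Evidence:** none',
  '',
].join('\n');

// Rewrite the Status and "Classified by" lines of exactly one SC block.
function setScStatus(markdown, scId, status) {
  // Validating the id first keeps regex metacharacters out of the pattern.
  if (!/^SC-\d+$/.test(scId)) throw new Error('invalid SC id: ' + scId);
  const blockRe = new RegExp(
    '^(### ' + scId + ':[^\\n]*\\n)(- \\*\\*Status:\\*\\* )[^\\n]*(\\n- \\*\\*Classified by:\\*\\* )[^\\n]*',
    'm', // multiline: ^ anchors at the start of any line
  );
  if (!blockRe.test(markdown)) throw new Error(scId + ' not found');
  // Keep the captured labels, swap only the two values.
  return markdown.replace(
    blockRe,
    (_m, header, statusLabel, classifiedLabel) =>
      header + statusLabel + status + classifiedLabel + 'user',
  );
}

const updated = setScStatus(SAMPLE, 'SC-1', 'Pass');
console.log(updated);
```

Because the replacer is a function, its return value is inserted literally, so a `$` in the header or status cannot be misread as a replacement pattern. The trailing colon in the pattern also stops `SC-1` from matching an `SC-10` header.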

package/bin/np-tools/verify-work.test.cjs ADDED

@@ -0,0 +1,97 @@
+const { test, afterEach } = require('node:test');
+const assert = require('node:assert/strict');
+const fs = require('node:fs');
+const path = require('node:path');
+const { makeSandbox, seedRoadmapYaml, seedPhaseDir, cleanupAll } =
+  require('../../tests/helpers/fixture.cjs');
+const subcmd = require('./verify-work.cjs');
+function _roadmapWithSCs() {
+  return {
+    schema_version: 1,
+    milestones: [{ id: 'v1.0', name: 'm1', phases: [
+      { number: 6, name: 'Execution', slug: 'execution', goal: '', depends_on: [],
+        requirements: [], success_criteria: ['Tasks commit atomically', 'Verification runs'],
+        status: 'planned', plans: [] },
+    ]}],
+  };
+}
+function _capture() { let b = ''; return { stub: { write: (s) => { b += s; return true; } }, get: () => b }; }
+afterEach(cleanupAll);
+test('VW-1: init emits payload with success_criteria + verifier_tier', () => {
+  const sandbox = makeSandbox();
+  seedRoadmapYaml(sandbox, _roadmapWithSCs());
+  seedPhaseDir(sandbox, 6, 'execution', {});
+  const cap = _capture();
+  const p = subcmd.run(['init', '6'], { cwd: sandbox, stdout: cap.stub });
+  assert.equal(p._workflow, 'verify-work');
+  assert.equal(p.verifier_tier, 'sonnet');
+  assert.deepEqual(p.success_criteria, ['Tasks commit atomically', 'Verification runs']);
+  assert.ok(Array.isArray(p.draft_results));
+  assert.equal(p.draft_results.length, 2);
+});
+test('VW-2: emit-draft writes VERIFICATION.md', () => {
+  const sandbox = makeSandbox();
+  seedRoadmapYaml(sandbox, _roadmapWithSCs());
+  const phaseDir = seedPhaseDir(sandbox, 6, 'execution', {});
+  const cap = _capture();
+  subcmd.run(['emit-draft', '6'], { cwd: sandbox, stdout: cap.stub });
+  const vp = path.join(phaseDir, '06-VERIFICATION.md');
+  assert.ok(fs.existsSync(vp));
+  const body = fs.readFileSync(vp, 'utf-8');
+  assert.ok(body.includes('### SC-1:'));
+  assert.ok(body.includes('### SC-2:'));
+  assert.ok(body.includes('**Status:** Pending'));
+});
+test('VW-3: record-sc updates a single SC status + sets classified_by=user', () => {
+  const sandbox = makeSandbox();
+  seedRoadmapYaml(sandbox, _roadmapWithSCs());
+  const phaseDir = seedPhaseDir(sandbox, 6, 'execution', {});
+  const cap1 = _capture();
+  subcmd.run(['emit-draft', '6'], { cwd: sandbox, stdout: cap1.stub });
+  const cap2 = _capture();
+  subcmd.run(['record-sc', '6', 'SC-1', 'Pass'], { cwd: sandbox, stdout: cap2.stub });
+  const body = fs.readFileSync(path.join(phaseDir, '06-VERIFICATION.md'), 'utf-8');
+  assert.ok(body.includes('### SC-1: Tasks commit atomically\n- **Status:** Pass\n- **Classified by:** user'));
+  assert.ok(body.includes('### SC-2: Verification runs\n- **Status:** Pending'));
+});
+test('VW-4: record-sc rejects unknown status', () => {
+  const sandbox = makeSandbox();
+  seedRoadmapYaml(sandbox, _roadmapWithSCs());
+  seedPhaseDir(sandbox, 6, 'execution', {});
+  const cap1 = _capture();
+  subcmd.run(['emit-draft', '6'], { cwd: sandbox, stdout: cap1.stub });
+  const cap2 = _capture();
+  assert.throws(
+    () => subcmd.run(['record-sc', '6', 'SC-1', 'Maybe'], { cwd: sandbox, stdout: cap2.stub }),
+    (err) => err && err.code === 'verify-work-invalid-status',
+  );
+});
+test('VW-5: record-sc before emit-draft → file-unreadable', () => {
+  const sandbox = makeSandbox();
+  seedRoadmapYaml(sandbox, _roadmapWithSCs());
+  seedPhaseDir(sandbox, 6, 'execution', {});
+  const cap = _capture();
+  assert.throws(
+    () => subcmd.run(['record-sc', '6', 'SC-1', 'Pass'], { cwd: sandbox, stdout: cap.stub }),
+    (err) => err && err.code === 'verify-work-file-unreadable',
+  );
+});
+test('VW-6: unknown verb throws', () => {
+  const sandbox = makeSandbox();
+  const cap = _capture();
+  assert.throws(
+    () => subcmd.run(['bogus'], { cwd: sandbox, stdout: cap.stub }),
+    (err) => err && err.code === 'verify-work-unknown-verb',
+  );
+});
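
None of the six tests above asserts on what is written to stdout, so the `@file:` spill branch of `_emit` is not covered in this file. The sketch below shows how that branch could be exercised in isolation. It reimplements the threshold logic rather than importing the module, and `emit`, `readPayload`, `capture`, and the `payload-` file prefix are hypothetical names chosen for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

const INLINE_THRESHOLD_BYTES = 16 * 1024;

// Write small payloads inline; spill large ones to a file and emit a pointer.
function emit(payload, stdout, tmpDir) {
  const json = JSON.stringify(payload, null, 2);
  if (Buffer.byteLength(json, 'utf-8') <= INLINE_THRESHOLD_BYTES) {
    stdout.write(json);
    return;
  }
  const suffix = process.pid + '-' + crypto.randomBytes(4).toString('hex');
  const tmpPath = path.join(tmpDir, 'payload-' + suffix + '.json');
  fs.writeFileSync(tmpPath, json, 'utf-8');
  stdout.write('@file:' + tmpPath);
}

// Hypothetical consumer side: accept either form and return the parsed object.
function readPayload(text) {
  const raw = text.startsWith('@file:')
    ? fs.readFileSync(text.slice('@file:'.length), 'utf-8')
    : text;
  return JSON.parse(raw);
}

// Same shape as the _capture helper in the tests: an in-memory stdout stub.
function capture() {
  let buf = '';
  return { stub: { write: (s) => { buf += s; return true; } }, get: () => buf };
}

const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'emit-sketch-'));
const small = capture();
emit({ ok: true }, small.stub, tmpDir);
const large = capture();
emit({ blob: 'x'.repeat(INLINE_THRESHOLD_BYTES) }, large.stub, tmpDir);

const smallOut = readPayload(small.get());
const largeOut = readPayload(large.get());
fs.rmSync(tmpDir, { recursive: true, force: true });
```

The threshold is measured with `Buffer.byteLength` rather than `json.length`, so multi-byte characters count at their encoded size.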

package/docs/adr/0001-no-daemon-invariant.md ADDED

@@ -0,0 +1,82 @@
+# ADR-0001: No-Daemon Invariant
+* Status: Accepted
+* Date: 2026-04-14
+* Supersedes: None
+## Context and Problem Statement
+nubos-pilot must run inside existing agent CLIs (Claude Code, Codex, Gemini, OpenCode). Any long-lived background process, RPC server, or OS-level service registration would break the install-anywhere promise and directly violate PROJECT.md's Core Value ("ohne eigenes Daemon", i.e. "without a daemon of its own"). The question is binary: should nubos-pilot ever spawn or require a background process?
+For the purposes of this ADR, **"daemon" means any one of:**
+1. An OS-level service (systemd unit, launchd agent, Windows service).
+2. A long-lived runtime process kept alive between slash-command invocations.
+3. An RPC server exposing a local port, socket, or pipe.
+4. A between-invocation file-watcher or async-job runner.
+**What does NOT count as a daemon:** a short-lived, foreground `node` process launched by a slash-command's bash block that exits when its work completes. That is an ordinary synchronous invocation, not a background process.
+## Decision Drivers
+* **Install-anywhere** — no `sudo`, no `systemd`/`launchd`, no Windows service registration.
+* **Multi-runtime compatibility** — all four target runtimes (Claude Code, Codex, Gemini, OpenCode) treat this tool as a synchronous set of slash-commands. None of them host background services.
+* **Simplicity** — no process lifecycle to reason about; no "is it running?" diagnostics; no log rotation; no PID files.
+* **Security / footprint** — no always-on listener, no accidental RPC surface, no persistent state in a running process.
+## Considered Options
+* **Stay daemon-free** — in-session, foreground-only execution via short-lived `node` invocations. (CHOSEN)
+* **SDK-embedded coding-agent runtime** — adopt a persistent coding-agent SDK as a runtime layer (e.g. an `@anthropic-ai/claude-agent-sdk`-style loader pattern).
+* **Cross-session daemon via launchd/systemd** — a user-installed service that auto-advances plans while no agent CLI is open (REQUIREMENTS.md FUT-06).
+* **RPC multi-agent system** — spawn local worker processes communicating over sockets to achieve real parallelism beyond Claude's `Task` tool.
+## Decision Outcome
+Chosen option: **"Stay daemon-free"**, because it is the only option that satisfies install-anywhere + multi-runtime compatibility + simplicity drivers simultaneously. Any feature that would require a persistent background process is out of scope by construction; the same user intent can be satisfied via the in-session auto-advance loop (`np:autonomous` — forward-reference Phase 6 / EXEC-03) which runs inside the agent CLI's own session.
+### Consequences
+* Good, because any feature that requires a daemon is out of scope by construction — the invariant resolves scope disputes before they start.
+* Good, because the tool is fully removable by deleting files — there is no service to stop, no PID to kill, no socket to clean up.
+* Good, because there is no security surface from an always-on listener; no accidental RPC port; no attack surface grown by simply installing nubos-pilot.
+* Good, because install works on machines where the user cannot `sudo` (corporate/managed environments, air-gapped dev boxes).
+* Bad, because background auto-advance while the user's agent CLI is closed is impossible. Mitigated by `np:autonomous` (in-session loop, Phase 6 / EXEC-03). The cross-session variant is explicitly deferred (REQUIREMENTS.md FUT-06) and would require a future ADR that supersedes this one.
+* Bad, because we forgo a richer SDK-embedded interactive loop; mitigated by the fact that the four supported agent CLIs already provide their own interactive loops — we compose with them instead of competing.
+## Pros and Cons of the Options
+### Stay daemon-free — chosen
+* Good, because `.cjs` files invoked inline from slash-command bash blocks require zero persistent processes — the implementation cost is bounded and well-understood.
+* Good, because it gives us zero supply-chain exposure from a daemon framework (no coding-agent SDK transitive tree, no `@modelcontextprotocol/sdk` runtime dep).
+* Good, because it lines up with PROJECT.md Constraint "Keine eigenen Prozesse" ("no processes of our own").
+* Bad, because we must implement auto-advance as an in-session loop rather than a scheduled background worker — accepted cost, captured by `np:autonomous` design in Phase 6.
+### SDK-embedded coding-agent runtime — rejected
+* Good, because it provides a richer interactive loop and a unified tool/capability abstraction.
+* Bad, because it reintroduces a runtime — directly contradicts PROJECT.md Constraint "Keine eigenen Prozesse".
+* Bad, because it couples nubos-pilot's lifecycle to a library's API-stability — a coupling CLAUDE.md §"External runtime SDKs" calls out as prohibited.
+* Bad, because maintaining a process means maintaining crash-recovery, log files, and version migrations — all scope we explicitly reject.
+### Cross-session daemon via launchd/systemd — rejected (deferred)
+* Good, because it would enable true "leave it running overnight" plan execution.
+* Bad, because it requires per-OS service registration (root on Linux, `launchctl` on macOS, Service Control Manager on Windows) — install-anywhere dies.
+* Bad, because it requires a PID/lock management story that the synchronous in-session model avoids entirely.
+* Deferred: REQUIREMENTS.md FUT-06 captures this as a future-scope item. Adoption would require a new ADR superseding ADR-0001.
+### RPC multi-agent system — rejected
+* Good, because it would unlock true multi-agent parallelism beyond Claude's in-session `Task` tool.
+* Bad, because it requires a long-lived server process, the exact thing this ADR forbids. (REQUIREMENTS.md §"Out of Scope" row "Echtes RPC-basiertes Multi-Agent-System", i.e. "true RPC-based multi-agent system".)
+* Bad, because the parallelism Claude's `Task` tool already provides is sufficient for the scope outlined in PROJECT.md; RPC would be a solution in search of a problem.
+* Bad, because it introduces local-socket attack surface for negligible benefit over in-session `Task`.
+## More Information
+* **Related ADR:** [ADR-0004](0004-atomic-commit-per-unit.md) — atomic-commit-per-unit works precisely because no daemon holds state across sessions; each commit is self-contained.
+* **PROJECT.md:** §"Constraints": "Keine eigenen Prozesse: alles läuft inline im Agent-CLI" ("No processes of our own: everything runs inline in the agent CLI").
+* **CLAUDE.md:** §"External runtime SDKs" — runtime coding-agent SDKs (including `@anthropic-ai/claude-agent-sdk` and similar) are explicitly prohibited.
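+* **REQUIREMENTS.md:** §"Out of Scope" → rows "Eigene Runtime / Daemon / Background-Prozess" ("own runtime / daemon / background process"), "Cross-Session Daemon für Auto-Advance ohne offene Session" ("cross-session daemon for auto-advance without an open session"), "Echtes RPC-basiertes Multi-Agent-System" ("true RPC-based multi-agent system").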
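
The ADR's Context section distinguishes a daemon from a short-lived foreground `node` process. That distinction can be illustrated with Node's builtin `child_process` module. This is only a sketch under assumptions: the child script is a stand-in for a real tool invocation, and the ADR itself does not prescribe `spawnSync`.

```javascript
const { spawnSync } = require('node:child_process');

// Stand-in child script: does its work, prints a JSON result, then exits on its own.
const childScript =
  'process.stdout.write(JSON.stringify({ ok: true, pid: process.pid }))';

// spawnSync blocks until the child has exited, so nothing outlives this call.
const result = spawnSync(process.execPath, ['-e', childScript], {
  encoding: 'utf-8',
  timeout: 10000, // kill the child if it has not exited within 10 seconds
});

if (result.error) throw result.error;
const payload = JSON.parse(result.stdout);
console.log('exit status:', result.status, 'child pid:', payload.pid);

// By contrast, spawn(..., { detached: true }) followed by child.unref() would
// leave a process running after the caller returns. That would fall under
// item 2 of the ADR's definition (a long-lived process kept alive between
// invocations) and is therefore out of scope.
```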

package/docs/adr/0002-zero-runtime-dependencies.md ADDED

@@ -0,0 +1,90 @@
+# ADR-0002: Zero Runtime Dependencies
+* Status: Accepted
+* Date: 2026-04-14
+* Supersedes: None
+* **Amendment:** [ADR-0006](0006-yaml-dependency-amendment.md) permits `yaml@^2.8` as a narrowly-scoped runtime dependency (2026-04-15).
+## Context and Problem Statement
+`package.json`'s `dependencies` block is the only way nubos-pilot can ship transitive complexity to end users through `npx`. Every runtime dependency is a three-headed cost: a supply-chain surface, a version-compatibility constraint, and an install-failure mode (Windows path quirks, corporate proxies, air-gapped networks, peer-dep conflicts, abandoned maintainers). The question: should nubos-pilot ever declare runtime dependencies?
+## Scope
+This ADR is about `package.json.dependencies` specifically — the subset of `package.json` that ships to end users via `npm install` / `npx`:
+* **"Zero runtime deps" means:** `package.json.dependencies === {}` (empty object, not absent) — no library is pulled down at install time on an end-user machine.
+* **`devDependencies` are explicitly permitted.** Test runners (`c8` for coverage), optional hook bundlers (`esbuild`, for patterns like a future `scripts/build-hooks.js`), and similar authoring-time tooling live there. They are never shipped to end users.
+* **Environment assumptions are not dependencies.** `git` (FND-04 commits), `node >=22` (the engine we target), and the host agent CLI (Claude Code / Codex / Gemini / OpenCode) are assumed to exist on the user's machine — they are prerequisites, not things nubos-pilot ships.
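
The Scope rules above are mechanical enough to check in a script. The function below is a hypothetical sketch, not something shipped in the package: `checkZeroRuntimeDeps` and its `allowed` parameter are invented for illustration. The `allowed` list is there for a narrowly scoped exemption such as the `yaml` amendment referenced at the top of this ADR.

```javascript
// Hypothetical check of the "zero runtime deps" invariant on a parsed package.json.
function checkZeroRuntimeDeps(pkg, allowed = []) {
  const problems = [];

  // The ADR asks for an explicit empty object, not a missing key.
  if (pkg.dependencies === undefined) {
    problems.push('dependencies is absent (expected an explicit empty object)');
  } else {
    for (const name of Object.keys(pkg.dependencies)) {
      if (!allowed.includes(name)) {
        problems.push('unexpected runtime dependency: ' + name);
      }
    }
  }

  // optionalDependencies are also installed on end-user machines, so treat
  // any entry there as a violation too.
  for (const name of Object.keys(pkg.optionalDependencies || {})) {
    problems.push('unexpected optional dependency: ' + name);
  }

  // devDependencies are deliberately ignored: they never ship to end users.
  return problems;
}

console.log(checkZeroRuntimeDeps({ dependencies: {}, devDependencies: { c8: '*' } }));
console.log(checkZeroRuntimeDeps({ dependencies: { chalk: '^5' } }));
```

A check like this could run in CI or a pre-publish script; where it is wired in is left open here.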
+## Decision Drivers
+* **Sufficiency of Node builtins** — the full markdown-workflow surface (frontmatter parsing, readline prompts, file locking, ANSI output, child-process spawn) is reachable through `fs`, `path`, `os`, `child_process`, `readline`, `crypto`, and `util` alone.
+* **`npx` install reliability** — zero deps ≈ zero failure modes on Windows, corporate networks, and air-gapped environments where `npm install` is notoriously flaky.
+* **Patchability (Core Value)** — users copy `.cjs` files verbatim into `.claude/nubos-pilot/` and sometimes patch them locally; there is no `node_modules/` tree to keep in sync.
+* **Security** — zero runtime deps ≈ zero supply-chain surface. No transitive vulnerabilities, no "abandoned maintainer" risk, no post-install script execution from third parties.
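
To make the "sufficiency of Node builtins" driver concrete, here is a small sketch of two helpers written without any third-party library: ANSI coloring and a flat frontmatter split. The names `ANSI`, `colorize`, and `parseFrontmatter` are hypothetical, as are the sample keys. The package's own `lib/frontmatter.cjs` is not shown in this section and may work quite differently.

```javascript
// Raw ANSI escape constants stand in for a color library.
const ANSI = { reset: '\x1b[0m', red: '\x1b[31m', green: '\x1b[32m' };

// Wrap text in a color only when output is a terminal (or when forced).
function colorize(text, color, enabled = process.stdout.isTTY === true) {
  return enabled ? ANSI[color] + text + ANSI.reset : text;
}

// Split a leading '---' block into flat key/value pairs plus the remaining body.
// Deliberately minimal: no nesting, no lists, no quoting rules.
function parseFrontmatter(source) {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source);
  if (!match) return { data: {}, body: source };
  const data = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip lines that are not key: value
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { data, body: source.slice(match[0].length) };
}

const doc = '---\nname: np-verifier\ntier: sonnet\n---\nBody text';
const parsed = parseFrontmatter(doc);
console.log(colorize(parsed.data.name, 'green', false), parsed.body);
```

The trade-off named in the ADR is visible even at this size: the parser is a handful of lines with no install cost, but it handles only the flat subset of YAML.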
+## Considered Options
+* **Zero runtime dependencies** — Node builtins (`fs`, `path`, `os`, `child_process`, `readline`, `crypto`, `util`) + hand-rolled helpers. (CHOSEN)
+* **Rich dependency tree** — adopt a broad runtime surface (e.g. a coding-agent SDK, `playwright`, `sharp`, `sql.js`, `chokidar`, an image-processing native addon, `@modelcontextprotocol/sdk`, `chalk`/`picocolors`).
+* **Native Rust N-API engine** — publish per-platform prebuilt binaries (`-darwin-arm64`, `-linux-x64`, etc.) as `optionalDependencies`.
+* **Accept a single narrow dependency pragmatically** — e.g. `yaml` for frontmatter parsing, because a hand-rolled regex is limited; ship it as a runtime dep rather than write a small parser.
+## Decision Outcome
+Chosen: **"Zero runtime dependencies"**, because it is the only option that reinforces the Core-Value patchability story and minimizes install-failure modes on the weakest user environments (Windows + corporate proxy + air-gapped) simultaneously. The `devDependencies` escape hatch covers authoring-time needs without leaking into end-user installs.
+**Escape hatch for future exceptions:** if a concrete future feature genuinely requires a runtime dep that builtins cannot satisfy, the exception is introduced by a new ADR (e.g. `NNNN-accept-yaml-dependency.md`) that either supersedes ADR-0002 wholesale or amends it narrowly with a name-scoped exemption. The escape is deliberately bureaucratic so that "just add a dep" never becomes the reflex answer (per CONTEXT.md D-07, existing ADRs are not rewritten — they are superseded).
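The zero-runtime-dependency rule can be checked mechanically against a parsed `package.json`. The following is a minimal sketch and not part of the package: the name `findRuntimeDeps` and both sample manifests are hypothetical. It assumes that `dependencies` and `optionalDependencies` are the forbidden fields and that `devDependencies` are ignored, which matches the rule as stated in this ADR.

```javascript
'use strict';

// Hypothetical helper (not from the nubos-pilot source): list every entry that
// would violate the zero-runtime-dependency rule in a parsed package.json.
const FORBIDDEN_FIELDS = ['dependencies', 'optionalDependencies'];

function findRuntimeDeps(manifest) {
  const violations = [];
  for (const field of FORBIDDEN_FIELDS) {
    // A missing field and an empty object both count as "no dependencies".
    for (const name of Object.keys(manifest[field] || {})) {
      violations.push(`${field}: ${name}`);
    }
  }
  return violations;
}

// Illustrative manifests; devDependencies are deliberately ignored.
const clean = { name: 'example', dependencies: {}, devDependencies: { c8: '*' } };
const dirty = { name: 'example', dependencies: { yaml: '^2.8.0' } };

console.log(findRuntimeDeps(clean)); // []
console.log(findRuntimeDeps(dirty)); // [ 'dependencies: yaml' ]
```

During PR review the same function could be fed `JSON.parse(fs.readFileSync('package.json', 'utf8'))`; a non-empty result would indicate that a superseding or amending ADR is required first.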
+### Consequences
+* Good, because `npm install` is effectively a no-op for end users — nothing to download, nothing to audit, nothing to break.
+* Good, because supply-chain audits are trivial (there is no chain beyond what ships with Node itself).
+* Good, because users can copy-patch `.cjs` files without module-resolution confusion — no `require()` path that resolves differently in their project vs. ours.
+* Good, because the install-payload tree (see [ADR-0005](0005-three-orthogonal-file-trees.md)) contains only `.cjs` files and markdown — no `node_modules/` subtree ever appears there.
+* Bad, because we reimplement small utilities — YAML frontmatter via hand-rolled parser (`lib/frontmatter.cjs`), readline prompts instead of `inquirer`/`@clack/prompts`, raw ANSI escape constants instead of `chalk`. This is an accepted cost documented at length in CLAUDE.md §"Alternatives Considered".
+* Neutral, because `devDependencies` are permitted and do not ship to users: we can still adopt `c8` for coverage and `esbuild` for optional hook bundling, while the test runner is the builtin `node:test`, which needs no dependency at all.
+* Neutral, because `optionalDependencies` for native prebuilt binaries is ALSO rejected by this ADR — see Rust N-API option below — so no accidental backdoor.
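To make the "hand-rolled helpers" cost concrete, here is a minimal sketch of dependency-free frontmatter parsing. It is an illustration only and does not reproduce `lib/frontmatter.cjs`: `parseFrontmatter` is a hypothetical name, and the sketch assumes a flat `key: value` subset with no nesting, sequences, or anchors.

```javascript
'use strict';

// Hypothetical sketch: parse a leading `---` block of flat `key: value` lines.
// Nested maps, sequences, and anchors are intentionally out of scope.
function parseFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(text);
  if (!match) return { data: {}, body: text };

  const data = {};
  for (const line of match[1].split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks/comments
    const colon = trimmed.indexOf(':');
    if (colon === -1) continue; // not a key/value line; ignored in this sketch
    const key = trimmed.slice(0, colon).trim();
    let value = trimmed.slice(colon + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    data[key] = value;
  }
  return { data, body: text.slice(match[0].length) };
}

const sample = '---\nname: np-planner\ndescription: "Plans a phase"\n---\n# Body\n';
const parsed = parseFrontmatter(sample);
console.log(parsed.data.name);        // np-planner
console.log(parsed.data.description); // Plans a phase
```

The limits of this approach (multiline sequences, anchors) are the ones the "single narrow dependency" option in this ADR would address.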
+## Pros and Cons of the Options
+### Zero runtime dependencies — chosen
+* Good, because Node builtins cover the entire markdown-workflow surface without external packages.
+* Good, because it preserves the Core Value "markdown-only, multi-runtime, ohne eigenes Daemon" ("without a daemon of its own"); any dep is one step toward a runtime.
+* Good, because `devDependencies` remain available for the authoring-time needs that do not leak to users.
+* Bad, because every small utility must be hand-rolled — accepted, documented exhaustively in CLAUDE.md §"Alternatives Considered".
+### Rich dependency tree — rejected
+* Good, because `chalk`/`picocolors` produce nicer terminal output; `@clack/prompts` produces nicer Q&A flows; `marked` and image-processing addons enable TUI rendering.
+* Bad, because `playwright`, `sharp`, `sql.js`, `chokidar`, image addons target TUI/image/async-job features we do not implement.
+* Bad, because `@modelcontextprotocol/sdk` as a runtime dep contradicts REQUIREMENTS.md §"Out of Scope" row "Nubos-MCP als First-Class-Dependency" ("Nubos-MCP as a first-class dependency"). MCP integration is the concern of the user's agent CLI, not ours.
+* Bad, because `@anthropic-ai/claude-agent-sdk` implies we spawn agents — that's the daemon pattern ADR-0001 forbids.
+* Bad, because every transitive node_module is an install-failure risk on Windows + corporate proxies, the environments that most need nubos-pilot to "just work".
+### Native Rust N-API engine — rejected
+* Good, because native binaries offer raw-speed `grep`/`ast-grep`/syntax-highlighting.
+* Bad, because it requires publishing per-platform prebuilt binaries (`-darwin-arm64`, `-linux-x64`, `-linux-arm64`, `-win32-x64`, ...) as `optionalDependencies`, with the associated CI/release plumbing.
+* Bad, because nubos-pilot has no TUI, no image pipeline, and no watcher — the use cases a native engine is built for do not exist in our scope.
+* Bad, because Claude Code already exposes `Grep`, `Read`, `Bash` as first-class tools; we don't need a native re-implementation of grep/ast/read.
+* Bad, because introducing a binary-ship story violates Core Value patchability (users can't copy-patch a `.node` binary the way they can a `.cjs` file).
+### Accept a single narrow dependency pragmatically — rejected (for now)
+* Good, because one dep like `yaml@^2.8` would make frontmatter parsing robust against multiline sequences and anchors.
+* Bad, because "just one dep" is a slippery slope; the hand-rolled parser covers the subset we actually use, and once the door is open, `semver`, `glob`, `minimatch` follow.
+* Bad, because the escape-hatch route (new ADR superseding ADR-0002) exists for exactly this situation — it forces the author to demonstrate the concrete need. Open-ended "pragmatism" removes the forcing function.
+* Neutral, because the escape hatch in Decision Outcome explicitly permits this option on demonstrated need via a new ADR, so rejecting it today is not a permanent closure.
+## More Information
+* **Related ADR:** [ADR-0005](0005-three-orthogonal-file-trees.md) — the install-payload tree contains only `.cjs` + markdown; no `node_modules/` subtree ever ships.
+* **CLAUDE.md:** §"Technology Stack" → "Installation" (matches `"dependencies": {}` shape); §"External runtime dependencies" (row-by-row rejection rationale for the heavy deps enumerated above); §"Alternatives Considered" (accepted-cost catalogue).
+* **REQUIREMENTS.md:** §"Out of Scope" → row "Nubos-MCP als First-Class-Dependency" ("Nubos-MCP as a first-class dependency") and, implicitly, the runtime/daemon rows that would require heavy deps.
+---
+*This ADR does not describe CI enforcement. CI-gate enforcement of the zero-deps rule (dep-growth-block) is deferred to a later deploy/CI phase per ROADMAP.md; Phase 1 enforcement consists of human PR review and this ADR as the authoritative reference.*

package/docs/adr/0003-max-six-unit-types.md ADDED

@@ -0,0 +1,85 @@
+# ADR-0003: Max Six Unit-Types
+* Status: Accepted
+* Date: 2026-04-14
+* Supersedes: None
+## Context and Problem Statement
+Planning ontologies tend to grow without upper bound — epics, stories, sub-stories, initiatives, spikes, tickets, tasks, subtasks. Each new type fragments workflows, multiplies templates, and forces the user to internalize yet another naming distinction. The question for nubos-pilot is: how many user-facing planning "unit-types" exist, and what are they?
+For the purpose of this ADR, a **unit-type** is a user-facing, persistence-bearing noun with its own template, its own file path in the project-state tree, and its own lifecycle states. An AI prompt variable, an internal helper module (e.g. `lib/tasks.cjs`), a configuration key, or a subsection inside a file are **not** unit-types — they are implementation details invisible to the user's mental model.
+## Decision Drivers
+* **Transparency (Core Value)** — users must be able to mental-model the whole planning system without a reference card. A small type set keeps the system legible.
+* **Cheatsheet-legible** — six named types still fit on a cheatsheet a user internalizes in minutes; growing beyond six would require invoking a reference.
+* **Surface-area cost** — each type costs a template + a workflow verb + a state-machine chunk. N types produce on the order of `N × 3` artifacts that must stay in sync.
+* **Cap-and-escape-hatch** beats **open-ended ontology** for a tool that is supposed to be "transparent" by Core Value.
+## Considered Options
+* **Cap at exactly six** — Milestone, Phase, Plan, Task, Todo, Backlog. (CHOSEN)
+* **Open-ended type system** — let users define new unit-types at will (e.g. via configuration).
+* **Flat "ticket" model** — a single unit-type with tags/categories replacing structural distinctions.
+## Decision Outcome
+Chosen: **"Cap at exactly six"**. The six types are enumerated below as running prose with one short paragraph per type. This enumeration is deliberately NOT a YAML block, NOT a fenced code snippet, and NOT a machine-parsed table: downstream agents do not auto-consume this list, and workflows hard-code the six type names inside their own code. The list is authoritative prose for human readers (CONTEXT.md §specifics: "sie ist prose, keine Maschinen-Konvention", i.e. "it is prose, not a machine convention").
+### The Six Unit-Types
+1. **Milestone** — a top-level project goal spanning multiple phases. Milestones live as entries in `ROADMAP.md`. A milestone's completion does not itself produce a commit (see [ADR-0004](0004-atomic-commit-per-unit.md) for the milestone exception); instead, editing ROADMAP.md to mark the milestone done is the atomic commit that records the milestone's completion.
+2. **Phase** — a sequential slice of a milestone pursuing a single coherent goal. Each phase gets its own directory at `.nubos-pilot/phases/<NN>-<slug>/`, contains a PLAN.md, and has its own lifecycle (not-started → executing → complete). Phases are the primary unit most workflows operate on.
+3. **Plan** — the `PLAN.md` inside a phase describing how the phase executes: waves, tasks, verification. There is typically one PLAN per phase, and it is authored by the `np:plan-phase` workflow and consumed by the execution workflows.
+4. **Task** — an atomic unit of work inside a plan. Tasks are authored as `<task>` XML blocks inside PLAN.md by default. Promotion to standalone `tasks/*.md` files is permitted when parallelism, mixed model-tiers, or non-linear dependencies demand it (forward-reference PLAN-06). A task is the smallest unit that produces exactly one git commit ([ADR-0004](0004-atomic-commit-per-unit.md)).
+5. **Todo** — a captured-on-the-fly idea that lives under `.nubos-pilot/todos/pending/` until it is scheduled. Todos are the lightweight capture path for ideas that surface during execution but don't yet belong to any plan. Promoted todos become tasks or backlog items as part of their scheduling.
+6. **Backlog** — a deferred item parked under `.nubos-pilot/backlog/`. Backlog items are scheduled for later but not yet bound to a phase. They are heavier than todos (they carry rationale and rough scoping) and lighter than phases (they have no plan, no tasks yet).
+All six names (Milestone, Phase, Plan, Task, Todo, Backlog) appear verbatim above; the `.nubos-pilot/phases/`, `.nubos-pilot/todos/`, `.nubos-pilot/backlog/` paths referenced here are text-invariant forward-references — **Phase 1 does not create any of these directories** (D-09, Pitfall 3). They are scaffolded in later phases (Project-State directories starting Phase 4).
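The statement that workflows hard-code the six type names inside their own code could look like the following in a workflow helper. This is a sketch under assumptions: `UNIT_TYPES`, `isUnitType`, and `assertUnitType` are hypothetical names, not identifiers from the package, and the prose list in this ADR remains the authority.

```javascript
'use strict';

// Hypothetical constant: the six unit-type names from the prose list, lowercased.
const UNIT_TYPES = Object.freeze([
  'milestone', 'phase', 'plan', 'task', 'todo', 'backlog',
]);

function isUnitType(name) {
  return typeof name === 'string' && UNIT_TYPES.includes(name.toLowerCase());
}

// Fail loudly on a seventh type instead of attempting type discovery.
function assertUnitType(name) {
  if (!isUnitType(name)) {
    throw new Error(`Unknown unit-type "${name}"; expected one of: ${UNIT_TYPES.join(', ')}`);
  }
  return name.toLowerCase();
}

console.log(isUnitType('Backlog')); // true
console.log(isUnitType('epic'));    // false
```

Rejecting unknown names outright, rather than handling them gracefully, mirrors the trade-off this ADR makes against an open-ended type system.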
+### Consequences
+* Good, because every workflow knows exactly which of the six types it operates on; no type-discovery logic is needed.
+* Good, because the full type catalogue fits on a cheatsheet — new contributors and users can internalize it in minutes.
+* Bad, because users coming from tools that use "epic", "story", "sprint", or "initiative" must map those concepts onto Milestone / Phase / Plan. This is an accepted trade-off and the mapping is usually obvious.
+* Neutral, because adding a seventh type is not forbidden forever — it requires a new ADR per CONTEXT.md D-07 that supersedes or amends ADR-0003. The forcing function prevents casual ontology drift.
+## Pros and Cons of the Options
+### Cap at exactly six — chosen
+* Good, because the six types cover every planning granularity we have encountered — Milestone and Backlog fill the gaps above and below the Phase/Plan/Task/Todo core.
+* Good, because every type has a unique, obvious scope — nothing overlaps.
+* Good, because `N × 3` artifact cost stays fixed and small.
+* Bad, because there is no native notion of "epic" or "initiative" for users coming from Jira-like tools. Mitigated by the Milestone/Phase mapping documented above.
+### Open-ended type system — rejected
+* Good, because users with niche workflows could model their exact process.
+* Bad, because it fragments the workflow library: every `np:*` command either handles unknown types gracefully (complexity) or errors on them (frustration).
+* Bad, because template proliferation breaks the Core-Value transparency — a user can no longer know the whole system at a glance.
+* Bad, because cross-project conventions break down — if every install has a different type set, shared tooling (verifier, planner, reviewer) must be defensive about everything.
+### Flat "ticket" model — rejected
+* Good, because it is the simplest possible ontology — "everything is a ticket".
+* Bad, because it loses the granularity distinction between a 5-minute idea (Todo) and a 3-week milestone. Users end up re-creating hierarchy via tags, which is just a rebuilt ontology without the clarity.
+* Bad, because users want the distinction between "idea captured mid-work" (Todo), "thing I will definitely do later" (Backlog), and "thing I am doing right now" (Task). Flattening loses all three.
+* Bad, because workflow logic becomes branch-heavy (if ticket has no parent plan → behave as todo; if ticket has tasks → behave as plan) — the six-type cap replaces runtime branching with a static type tag.
+## Enforcement
+CI-gate enforcement against a seventh unit-type is deferred to a later deploy/CI phase per ROADMAP.md. Phase 1 enforcement consists of human review during PR review and this ADR as the authoritative reference. Future additions of a seventh type require a new ADR superseding or amending this one (CONTEXT.md D-07).
+## More Information
+* **Related ADR:** [ADR-0004](0004-atomic-commit-per-unit.md) — atomic-commit-per-unit binds the one-commit rule to each of the six types.
+* **Related ADR:** [ADR-0005](0005-three-orthogonal-file-trees.md) — the Project-State tree is where five of the six types physically live (Milestones live in ROADMAP.md within the same tree).
+* **REQUIREMENTS.md:** §"Foundation" row FND-03: the canonical statement "Milestone/Phase/Plan/Task/Todo/Backlog, keine weiteren ohne ADR" ("no further types without an ADR").
+* **CONTEXT.md:** §specifics: "sie ist prose, keine Maschinen-Konvention" ("it is prose, not a machine convention"), the reason the enumeration stays prose.

package/docs/adr/0004-atomic-commit-per-unit.md ADDED

@@ -0,0 +1,102 @@
+# ADR-0004: Atomic Commit per Unit
+* Status: Accepted
+* Date: 2026-04-14
+* Supersedes: None
+## Context and Problem Statement
+Executor agents must produce a legible, reversible git history. Two anti-patterns destroy that property:
+1. **Bundling** — a single commit that touches multiple units (e.g. two tasks + a plan edit in one commit). This makes `np:undo-task` impossible: there is no clean `git revert` for "only this one task".
+2. **Splitting** — a single unit that spans multiple commits (e.g. "part 1 of task N", "part 2 of task N"). This makes `np:undo` incoherent: which commit represents the unit?
+The question: what is the commit-to-unit mapping that makes phase-level, plan-level, task-level, and slice-level undo mechanically implementable?
+## The Rule
+**Every completed unit (Phase, Plan, Task, Todo, Backlog-move) produces exactly one git commit. A commit never bundles more than one unit. A unit never produces zero commits or more than one.**
+This is the atomic-commit-per-unit invariant. It is the property the EXEC-06 (`np:undo`), EXEC-07 (`np:undo-task` / `np:reset-slice`), and EXEC-09 (Executor-Subagent) requirements rely on for their implementation.
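One half of the invariant can be phrased as a check over commit subjects: group subjects by their `<type>(<id>)` prefix and flag any unit that appears more than once. The following is a minimal sketch, assuming the prefix shape described under "Commit Message Format" in this ADR. `findSplitUnits` is a hypothetical name and the sample subjects are invented for illustration. Detecting bundled commits, or units with zero commits, would need the plan as input and is not attempted here.

```javascript
'use strict';

// Matches "<type>(<id>): <title>"; the id is treated as an opaque string
// because the ADR leaves its exact punctuation open.
const SUBJECT_RE = /^([a-z]+)\(([^)]+)\): (.+)$/;

// Hypothetical helper: return the unit keys that occur in more than one commit.
function findSplitUnits(subjects) {
  const counts = new Map();
  for (const subject of subjects) {
    const match = SUBJECT_RE.exec(subject);
    if (!match) continue; // non-unit commits are outside this sketch
    const key = `${match[1]}(${match[2]})`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

// Invented subjects; in practice they might come from `git log --format=%s`.
const subjects = [
  'task(03-02-01): add parser',
  'task(03-02-02): wire command',
  'task(03-02-02): wire command, part 2', // the "splitting" anti-pattern
];
console.log(findSplitUnits(subjects)); // [ 'task(03-02-02)' ]
```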
+### Milestone Exception
+A Milestone is one of the six unit-types (ADR-0003), but **a Milestone completion is not itself a separate commit**. A milestone is represented by an entry in `ROADMAP.md`; marking it done is an edit to `ROADMAP.md`, and that edit is the unit-level commit (with commit-type `milestone(…)`). There is no "magic milestone commit" separate from the ROADMAP.md edit. Readers should not expect one.
+## Decision Drivers
+* **Reversibility** — `np:undo`, `np:undo-task`, and `np:reset-slice` all rely on the 1:1 commit-to-unit mapping; without it, these commands cannot be implemented as mechanical `git revert <sha>` operations.
+* **Legibility** — `git log --oneline` reads like a plan-trace; each line corresponds to one unit completion. Reviewers, operators, and future-us can understand the project's progress from git alone.
+* **Audit** — code review can proceed per-unit: a reviewer sees exactly what one unit changed, without having to mentally extract task-N from a mixed-unit commit.
+## Considered Options
+* **One atomic commit per unit** — the rule stated above. (CHOSEN)
+* **Squash-at-phase-boundary** — authors produce many small commits during execution, then squash the whole phase into one commit at phase-end.
+* **One commit per file change** — commit granularity is tied to file count, not semantic unit count.
+* **No commit discipline** — the executor commits whenever it feels like it (developer's choice).
+## Decision Outcome
+Chosen: **"One atomic commit per unit"**, because it is the only option that makes EXEC-06 and EXEC-07 implementable as mechanical `git revert <sha>` operations. Every other option forces `np:undo-task` into either "impossible" (squash, no-discipline) or "brittle" (one-commit-per-file with heuristics about "which files belong to task N").
+### Commit Message Format
+Every unit-producing commit uses the prefix:
+```
+<type>(<phase>-<plan>-<task>): <unit title>
+```
+Where `<type>` is the lowercased unit-type name from ADR-0003: `phase`, `plan`, `task`, `todo`, `backlog`, or `milestone`. The `<phase>-<plan>-<task>` identifier is truncated to the granularity of the unit (e.g. a Phase commit uses just `phase-03`; a Task commit for Phase 3, Plan 2, Task 4 uses `phase-03-02-04` or similar). The exact punctuation and ordering of the identifier are left to Claude's discretion in Phase 6, when the Executor-Subagent is authored; this ADR asserts only (a) the one-commit-per-unit rule and (b) the type-prefix convention. Later ADRs or the Phase-6 PLAN.md may refine the identifier format without superseding ADR-0004, provided the atomicity rule remains intact.
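A minimal sketch of building a subject line in the `<type>(<id>): <unit title>` shape follows. The function name `formatCommitSubject` is hypothetical, and the sample identifier is simply the example string from the paragraph above. Because this ADR leaves the identifier's exact punctuation open, the sketch passes the identifier through unchanged and validates only the type prefix.

```javascript
'use strict';

// Type prefixes named in this ADR (lowercased unit-type names from ADR-0003).
const COMMIT_TYPES = ['phase', 'plan', 'task', 'todo', 'backlog', 'milestone'];

// Hypothetical helper: build "<type>(<id>): <title>" for one completed unit.
function formatCommitSubject(type, id, title) {
  if (!COMMIT_TYPES.includes(type)) {
    throw new Error(`Unsupported commit type: ${type}`);
  }
  if (!id || !title) {
    throw new Error('Both an identifier and a unit title are required');
  }
  // Collapse whitespace so the subject stays on a single line.
  const cleanTitle = String(title).replace(/\s+/g, ' ').trim();
  return `${type}(${id}): ${cleanTitle}`;
}

console.log(formatCommitSubject('task', 'phase-03-02-04', 'Add frontmatter parser'));
// task(phase-03-02-04): Add frontmatter parser
```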
+### Consequences
+* Good, because `np:undo-task`, `np:reset-slice`, and `np:undo` each map to a well-defined set of commits: implementation reduces to `git log --grep='<type>(<id>)'` + `git revert <sha>`.
+* Good, because `git log --oneline` reads as a progress report; `git log --grep='phase-03'` filters one phase cleanly.
+* Good, because code review can proceed per-unit: reviewers see one unit per commit with no extraction work.
+* Good, because no daemon is required to enforce atomicity ([ADR-0001](0001-no-daemon-invariant.md)) — the Executor-Subagent enforces it in-session at commit time.
+* Bad, because small units produce many commits. Accepted — `git log --grep 'phase-03'` filters by phase; squash-merge at PR boundary is still available if a maintainer chooses.
+* Neutral, because PR-level squash-merging is compatible with this rule, provided per-unit atomic commits are preserved on the feature branch. The rule governs the executor's output, not the eventual merged-to-main shape.
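The reduction to `git log --grep` plus `git revert` can be sketched as two argument lists suitable for `child_process.execFileSync`. This is an assumption-laden illustration, not the `np:undo-task` implementation: `buildUndoCommands` is a hypothetical name, and the sketch only constructs the commands rather than running them. The `--fixed-strings` option makes git treat the parentheses in the prefix literally.

```javascript
'use strict';

// Hypothetical helper: build the git invocations for undoing one unit.
// Nothing is executed here; each entry is [command, args] for execFileSync.
function buildUndoCommands(type, id) {
  const prefix = `${type}(${id}):`;
  return {
    // Step 1: list the SHAs of commits whose message contains the unit prefix.
    find: ['git', ['log', '--fixed-strings', `--grep=${prefix}`, '--format=%H']],
    // Step 2: revert a commit once its SHA is known.
    revert: (sha) => ['git', ['revert', '--no-edit', sha]],
  };
}

const undo = buildUndoCommands('task', 'phase-03-02-04');
console.log(undo.find[1].join(' '));
// log --fixed-strings --grep=task(phase-03-02-04): --format=%H
console.log(undo.revert('abc1234')[1].join(' '));
// revert --no-edit abc1234
```

A caller would run each pair with `execFileSync(cmd, args, { encoding: 'utf8' })` from `node:child_process` and, under the one-commit-per-unit rule, should expect exactly one SHA from the first step before reverting.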
+## Pros and Cons of the Options
+### One atomic commit per unit — chosen
+* Good, because it makes EXEC-06 / EXEC-07 implementable as mechanical revert operations.
+* Good, because it produces self-documenting git history.
+* Good, because the one-to-one mapping is a well-understood git-discipline pattern with no novel enforcement cost.
+* Bad, because commit count grows linearly with plan complexity — accepted; modern git tooling handles thousands of commits trivially.
+### Squash-at-phase-boundary — rejected
+* Good, because it produces a tidy "one commit per phase" history on main.
+* Bad, because it destroys task-granularity undo: `np:undo-task` has no commit to revert once the phase is squashed.
+* Bad, because crash-recovery loses intermediate state — if the agent crashes mid-phase, a partial squash either does not exist (work lost) or is incoherent (partial phase as one commit).
+* Bad, because a verifier that wants to re-verify a single task after the phase is merged cannot isolate that task's diff.
+### One commit per file change — rejected
+* Good, because it produces the smallest possible commits.
+* Bad, because it couples commit count to file count, not semantic unit count. A task that modifies 5 files produces 5 commits; a task that modifies 1 file produces 1 commit. `np:undo-task` then needs a heuristic — "which file-commits belong to this task?" — that the one-commit-per-unit rule eliminates entirely.
+* Bad, because it breaks the mental model: readers can no longer equate "one entry in git log" with "one unit in the plan".
+* Bad, because commit messages become meaningless ("add line to foo.md") rather than intentional ("complete task N").
+### No commit discipline — rejected
+* Good, because it requires the least process.
+* Bad, because it breaks `np:undo-task` by construction — there is no deterministic commit-to-task mapping to revert.
+* Bad, because it makes code review per-unit impossible.
+* Bad, because two executor agents working the same plan at different times produce non-comparable histories.
+## More Information
+* **Related ADR:** [ADR-0001](0001-no-daemon-invariant.md) — the commit happens in the invoking agent's session, not in a background worker; no daemon holds a write-lock across sessions.
+* **Related ADR:** [ADR-0003](0003-max-six-unit-types.md) — defines the six unit-types this rule binds to.
+* **Related ADR:** [ADR-0005](0005-three-orthogonal-file-trees.md) — commits touch files in a single tree at a time (typically Source or Project-State; the Install-Payload tree at the user's install location is never committed from the user's side).
+* **REQUIREMENTS.md:** §"Execution" → rows EXEC-06 (`np:undo`), EXEC-07 (`np:undo-task` / `np:reset-slice`), EXEC-09 (Executor-Subagent — atomic-commit-per-unit enforced).
+* **CLAUDE.md:** §"Workflow Enforcement" — establishes atomic-commit-per-unit as the executor invariant.
+---
+*CI-gate enforcement of atomic-commit-per-unit (e.g. an automated rejection of multi-unit commits) is deferred to a later deploy/CI phase per ROADMAP.md. Phase 1 enforcement consists of human review and this ADR as the authoritative reference.*