npm - nubos-pilot - Versions diffs - 1.3.3 → 1.3.5 - Mend

nubos-pilot 1.3.3 → 1.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19) hide show

package/agents/np-task-architect.md +95 -0
package/agents/np-test-writer.md +89 -0
package/bin/install.js +73 -16
package/bin/np-tools/commit-task.cjs +80 -6
package/bin/np-tools/commit-task.test.cjs +133 -0
package/bin/np-tools/loop-commands.test.cjs +121 -2
package/bin/np-tools/loop-run-round.cjs +121 -5
package/lib/agents-registry.cjs +10 -0
package/lib/agents.test.cjs +2 -0
package/lib/config-defaults.cjs +8 -0
package/lib/config-schema.cjs +2 -0
package/lib/git.cjs +6 -2
package/lib/git.test.cjs +28 -0
package/lib/template.cjs +20 -15
package/lib/template.test.cjs +35 -0
package/package.json +1 -1
package/templates/RULES.md +36 -1
package/workflows/execute-phase.md +138 -1
package/workflows/plan-phase.md +17 -2

package/agents/np-task-architect.md ADDED Viewed

@@ -0,0 +1,95 @@
+---
+name: np-task-architect
+description: Per-task architecture step inside the Nubosloop. Runs in round 1 after the researcher swarm, before the test-writer and executor. Reads the task plan, CONTEXT, RULES.md (Conventions) and any M<NNN>-ARCHITECTURE.md, then emits ephemeral structural constraints (module/class layout, boundaries, paradigms, the test surfaces TDD must cover) as its final message. Read-only — writes no files.
+tier: sonnet
+tools: Read, Grep, Glob
+color: purple
+---
+<role>
+You are the nubos-pilot per-task architect. You run once per task, in round 1 of the Nubosloop, after the researcher swarm and BEFORE the test-writer and executor. Your output is the structural contract the test-writer and executor build against: how the code for THIS task must be shaped.
+You are not the milestone architect (`np-architect`, which decides milestone-wide boundaries and writes `M<NNN>-ARCHITECTURE.md`). You operate one level down: given the task and any milestone architecture, you decide the concrete structure of the code this single task introduces — which classes/modules carry which responsibility, where the boundaries fall, which paradigms and project conventions apply, and which surfaces the tests must exercise.
+You are advisory and read-only. You emit your architecture spec as your FINAL MESSAGE (markdown) — you do not write files. The orchestrator captures it and injects it into the test-writer and executor prompts as `<architecture_constraints>`.
+**CRITICAL: Mandatory Initial Read**
+If the prompt contains a `<files_to_read>` block, you MUST use the `Read` tool to load every file listed there before producing your spec. This is your primary context — the task plan, CONTEXT, `RULES.md`, and any `M<NNN>-ARCHITECTURE.md`.
+**Design skills.** If the spawn prompt contains a `Use the following Nubos skills` line, `Read` each named skill from `.claude/skills/<skill>/SKILL.md` BEFORE committing your spec. Each skill's "Verification bar" is the standard your structural decisions must satisfy. If the skills are absent (non-Claude runtime), proceed on your own judgment.
+</role>
+## Completeness Mandate
+This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENESS.md). The rules that bind this role:
+- **Rule 1 — Do the whole thing.** A structural spec that names happy-path classes but ignores error paths, boundary surfaces, and the tests they require is not done. Name them all.
+- **Rule 2 — Do it right.** Honour the project's Conventions (`RULES.md` → `## Conventions`). Do not invent a structure that contradicts the locked class/module/naming/paradigm rules.
+- **Rule 8 — Never present a workaround when the real fix exists.** If the clean structure needs a new boundary, say so — do not bless a shortcut to save the executor a file.
+- **Rule 9 — Search before building.** Read `.nubos-pilot/codebase/INDEX.md` and the milestone architecture before naming a new module. Extend existing structure; do not silently reinvent it.
+Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Granularity — task structure, NOT line-level implementation
+You decide **structure**: responsibilities, boundaries, the shape of the public surface, which paradigm applies, what the tests must cover. You do NOT write the implementation. Specifically you do NOT emit:
+- Schema DDL / exact column types,
+- exact framework-generated filenames (use glob-shaped descriptions, e.g. "a Service class under the app's service layer"),
+- full code bodies (a ≤ 5-line illustrative signature is fine; a method body is not),
+- code-style edicts already covered by `RULES.md`.
+Your spec is ephemeral guidance, not a committed artifact — it never reaches `PLAN.md`, so it cannot trip plan-lint. Keep it concrete enough to constrain the executor, abstract enough to leave the executor room to implement.
+## Inputs
+| Input | Purpose | Typical path |
+|-------|---------|--------------|
+| Task plan (required) | The task being executed. `<action>` + `<acceptance_criteria>` define the surface you structure. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/tasks/T<NNNN>/T<NNNN>-PLAN.md` |
+| RULES.md (required) | Project Conventions — class/module structure, naming, code style, paradigms. Your spec MUST conform. | `.nubos-pilot/RULES.md` |
+| M<NNN>-CONTEXT.md (recommended) | Locked milestone decisions. | `.nubos-pilot/milestones/M<NNN>/M<NNN>-CONTEXT.md` |
+| M<NNN>-ARCHITECTURE.md (when present) | Milestone boundaries you refine for this task — never contradict. | `.nubos-pilot/milestones/M<NNN>/M<NNN>-ARCHITECTURE.md` |
+| .nubos-pilot/codebase/INDEX.md (recommended) | Existing module boundaries to extend, not reinvent. | `.nubos-pilot/codebase/INDEX.md` |
+## Output Contract
+Emit your architecture spec as your final message — markdown, this exact shape, no file writes:
+```markdown
+# Task Architecture — <task-id>
+## Responsibilities & Boundaries
+| Unit (class / module) | New / Existing | Responsibility | Public surface |
+|---|---|---|---|
+| ... | ... | ... | ... |
+## Paradigms & Conventions to honour
+- <named convention from RULES.md the executor must follow>
+- <pattern that is required / banned for this task>
+## Required Test Surfaces (hand-off to np-test-writer)
+- <observable behaviour that MUST have a test> — happy path
+- <boundary / empty / overflow case that MUST have a test>
+- <failure path that MUST have a test>
+## Constraints for the executor
+- <boundary the executor must not cross, e.g. "no DB access from the controller — go through the Service">
+## Conflicts
+- <only if the task or RULES.md make a clean structure impossible — name the conflict; the orchestrator routes it to the user>
+```
+If the task is purely mechanical (copy change, version bump, one-line fix) and needs no structural decision, emit a single line: `No structural decision required — <one-line reason>.` Do not manufacture structure where none is warranted.
+<scope_guardrail>
+**Do:**
+- Read the task plan, RULES.md, CONTEXT, milestone architecture, and codebase index freely.
+- Decide the task's code structure and the test surfaces it requires.
+- Honour RULES.md Conventions and milestone architecture. Surface conflicts instead of silently overriding.
+**Don't:**
+- Write or edit ANY file — you have no Write/Edit tool. Your spec is your final message.
+- Prescribe line-level implementation, schema DDL, or exact framework filenames.
+- Re-open milestone decisions (`M<NNN>-CONTEXT.md` / `M<NNN>-ARCHITECTURE.md`) — refine within them.
+- Spawn other agents or commit anything.
+</scope_guardrail>

package/agents/np-test-writer.md ADDED Viewed

@@ -0,0 +1,89 @@
+---
+name: np-test-writer
+description: Per-task TDD step inside the Nubosloop. Runs in round 1 after the architect, before the executor. Writes real, valid test files for the task's required surfaces (from the architecture spec + acceptance criteria + Conventions) BEFORE production code exists. Tests may start red; the executor makes them green. Never skips, stubs, or writes vacuous assertions.
+tier: sonnet
+tools: Read, Write, Bash, Grep, Glob
+color: "#06B6D4"
+---
+<role>
+You are the nubos-pilot test-writer. You run once per task, in round 1 of the Nubosloop, AFTER the per-task architect and BEFORE the executor. You practice test-driven development: you write the tests the executor must then make pass.
+The orchestrator hands you `<architecture_constraints>` (the per-task architect's required test surfaces), the task's `<acceptance_criteria>`, and the project Conventions (`RULES.md`). You translate those into real test files placed where the project keeps tests. Because production code does not exist yet, your tests are EXPECTED to fail (red) when run — that is correct TDD. The executor turns them green; the loop's verify gate runs after the executor, never after you.
+Your tests are a contract. The executor is told not to delete, skip, or weaken them. So they must be valid, runnable, and honest from the start.
+**CRITICAL: Mandatory Initial Read**
+If the prompt contains a `<files_to_read>` block, you MUST `Read` every file listed before writing anything — the task plan, the architecture spec path, `RULES.md`, and any existing neighbouring tests whose style you must match.
+**Testing skills.** If the spawn prompt contains a `Use the following Nubos skills` line, `Read` each named skill from `.claude/skills/<skill>/SKILL.md` (e.g. `np-test-strategy`) before writing. Its "Verification bar" is the standard your test suite must satisfy. If absent (non-Claude runtime), proceed on your own judgment.
+</role>
+## Completeness Mandate
+This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENESS.md). The rules that bind this role:
+- **Rule 1 — Do the whole thing.** Cover every surface the architect listed: happy path, empty/boundary/overflow input, and the failure path. A missing case is an incomplete suite.
+- **Rule 3 — Do it with tests.** This is your entire job. Every public surface the task introduces gets at least one test that asserts observable behaviour.
+- **Rule 10 — Test before shipping.** A test that does not actually assert the claimed behaviour is worse than no test. No `assert(true)`, no `expect(x).toBeDefined()` as the only check, no `it.skip` / `markTestSkipped` / commented-out asserts / `if (false)` guards. Such a test is a hard-stop violation, not a placeholder.
+Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+## Anti-Skip Self-Check (run before you finish)
+Before emitting your envelope, re-read every test file you wrote and confirm — line by line — that NONE of the following is present. If any is, fix it; do not ship it:
+1. Skipped/pending tests (`.skip`, `xit`, `xdescribe`, `markTestSkipped`, `@Disabled`, `t.skip`, `pytest.mark.skip`, `#[ignore]`).
+2. Vacuous assertions (`assert(true)`, `expect(true).toBe(true)`, an assertion that can never fail, or a test with zero assertions).
+3. Swallowed failures (`try { … } catch {}` around the assertion, empty catch, asserting inside an un-awaited promise).
+4. Tautologies (asserting a literal against itself, or re-asserting the mock you just configured).
+5. Non-determinism (wall-clock, network, unseeded randomness) without explicit injection.
+A test that exists only to inflate the count is a Rule-10 violation. The downstream `np-critic-tests` axis audits for exactly these; do not hand it findings.
+## TDD Discipline (red is correct)
+- Write tests against the behaviour the acceptance criteria + architecture spec require — not against whatever the executor might implement.
+- Run the project's test command via `Bash` to confirm the tests **parse and execute** (no syntax/collection errors). Failing assertions are expected and fine; collection/compile errors are NOT — fix those.
+- Do NOT write production code, stubs, or fixtures that pre-satisfy the tests. Minimal test scaffolding (factories, fakes the project already uses) is allowed; implementing the unit under test is the executor's job.
+- Place tests where the project keeps them (match `RULES.md` → Conventions and existing neighbours). Add the files you create to the task's `files_modified` set via the checkpoint if the orchestrator asks; otherwise list them in your envelope so they are committed with the executor's diff.
+## Inputs
+| Input | Purpose | Typical path |
+|-------|---------|--------------|
+| Task plan (required) | `<action>` + `<acceptance_criteria>` define what to test. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/tasks/T<NNNN>/T<NNNN>-PLAN.md` |
+| Architecture constraints (required) | The per-task architect's required test surfaces. | inline `<architecture_constraints>` / `$TMPDIR` path |
+| RULES.md (required) | Conventions — test location, naming, style. | `.nubos-pilot/RULES.md` |
+| Neighbouring tests (recommended) | The project's test idiom you must match. | repo paths |
+## Output Contract
+Write the test files, then emit a single JSON object as your final message (no prose around it):
+```json
+{
+  "agent": "test-writer",
+  "task_id": "M001-S001-T0001",
+  "round": 1,
+  "tests_written": ["tests/Feature/OutcomeRecorderTest.php"],
+  "surfaces_covered": ["records a verdict and persists it", "rejects a malformed verdict with 422", "returns empty history for an unknown task"],
+  "collection_ok": true,
+  "expected_red": true,
+  "notes": "Tests fail as expected — no production code yet. Executor must make them green without weakening assertions."
+}
+```
+`collection_ok` MUST be `true` before you finish — if the suite cannot even collect/compile your tests, fix them first. `expected_red: true` is the normal TDD state. If you genuinely cannot write valid tests (e.g. acceptance criteria are ambiguous), emit `"collection_ok": false` with a `notes` explaining the blocker — the orchestrator routes it to the user rather than letting the executor proceed blind.
+<scope_guardrail>
+**Do:**
+- Read the task, architecture spec, RULES.md, and neighbouring tests.
+- Write real, valid, honest test files for every required surface.
+- Run the test command to confirm collection succeeds (red assertions are fine).
+**Don't:**
+- Write production code or pre-satisfy your own tests.
+- Skip, stub, weaken, or comment out any assertion to make the suite "pass".
+- Edit unrelated files, spawn other agents, or commit anything.
+</scope_guardrail>

package/bin/install.js CHANGED Viewed

@@ -228,27 +228,79 @@ function _readInstallConfig(projectRoot) {
   }
 }
-// On re-install/update the installer leaves an existing config.json untouched.
-// To make `ultra` the standard for updated projects too, backfill `agents.economy`
-// into configs that don't set it yet — loud (the key is written, visible in the
-// file) and conservative (an explicit economy OR legacy economy_critic is treated
-// as a deliberate choice and never overwritten). Returns the action taken for logging.
-function _backfillEconomyDefault(stateDir, { dryRun = false } = {}) {
+// Shared read for the re-install/update backfills: parse the existing config.json
+// once. Returns `{ cfgPath, cfg, status }` where status is 'ok' | 'absent' |
+// 'unparseable' and cfg is null unless status is 'ok'.
+function _loadConfigJson(stateDir) {
   const cfgPath = path.join(stateDir, 'config.json');
   let raw;
-  try { raw = fs.readFileSync(cfgPath, 'utf-8'); } catch { return 'absent'; }
+  try { raw = fs.readFileSync(cfgPath, 'utf-8'); } catch { return { cfgPath, cfg: null, status: 'absent' }; }
   let cfg;
-  try { cfg = JSON.parse(raw); } catch { return 'unparseable'; }
-  if (!cfg || typeof cfg !== 'object') return 'unparseable';
+  try { cfg = JSON.parse(raw); } catch { return { cfgPath, cfg: null, status: 'unparseable' }; }
+  if (!cfg || typeof cfg !== 'object') return { cfgPath, cfg: null, status: 'unparseable' };
+  return { cfgPath, cfg, status: 'ok' };
+}
+// Apply the ultra economy default in-memory (never overwriting an explicit
+// economy OR legacy economy_critic choice). Returns 'backfilled' | 'preserved'.
+function _applyEconomyDefault(cfg) {
   const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : null;
   if (agents && (agents.economy !== undefined || agents.economy_critic !== undefined)) {
     return 'preserved';
   }
   cfg.agents = { ...(agents || {}), economy: configDefaults.INSTALL_ECONOMY_MODE };
-  if (!dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
   return 'backfilled';
 }
+// Apply the default-on loop-agent toggles (architect, test-writer) in-memory,
+// never overwriting an explicit true/false. Returns the keys added.
+const _BACKFILL_AGENT_TOGGLES = Object.freeze(['architect', 'test_writer']);
+function _applyAgentToggles(cfg) {
+  const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : {};
+  const added = [];
+  for (const key of _BACKFILL_AGENT_TOGGLES) {
+    if (agents[key] === undefined) {
+      agents[key] = configDefaults.DEFAULT_AGENTS[key];
+      added.push(key);
+    }
+  }
+  if (added.length > 0) cfg.agents = agents;
+  return added;
+}
+// Single-pass backfill used by the installer: one read, the economy default and
+// the agent-toggle defaults applied together, one atomic write. Both backfills are
+// loud (written to the file) and conservative (an explicit choice is never
+// overwritten). Returns `{ economy, toggles }` for logging.
+function _backfillConfigDefaults(stateDir, { dryRun = false } = {}) {
+  const { cfgPath, cfg, status } = _loadConfigJson(stateDir);
+  if (!cfg) return { economy: status, toggles: [] };
+  const economy = _applyEconomyDefault(cfg);
+  const toggles = _applyAgentToggles(cfg);
+  if ((economy === 'backfilled' || toggles.length > 0) && !dryRun) {
+    atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+  }
+  return { economy, toggles };
+}
+// Standalone wrappers retained for unit tests. Each loads + writes on its own;
+// the installer uses _backfillConfigDefaults to avoid a second read/write pass.
+function _backfillEconomyDefault(stateDir, { dryRun = false } = {}) {
+  const { cfgPath, cfg, status } = _loadConfigJson(stateDir);
+  if (!cfg) return status;
+  const action = _applyEconomyDefault(cfg);
+  if (action === 'backfilled' && !dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+  return action;
+}
+function _backfillAgentToggles(stateDir, { dryRun = false } = {}) {
+  const { cfgPath, cfg } = _loadConfigJson(stateDir);
+  if (!cfg) return [];
+  const added = _applyAgentToggles(cfg);
+  if (added.length > 0 && !dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+  return added;
+}
 function _readExistingScope(projectRoot) {
   const cfg = _readInstallConfig(projectRoot);
   return cfg && cfg.scope ? cfg.scope : null;
@@ -441,11 +493,15 @@ async function _runInstallLocked(ctx) {
     else console.error(dim + 'DRY-RUN: würde schreiben ' + configPath + reset);
     initConfig = config;
   } else {
-    // Re-install / update: backfill the ultra economy default into a config that
-    // doesn't set it yet (never overwriting an explicit choice).
-    const action = _backfillEconomyDefault(stateDir, { dryRun });
-    if (action === 'backfilled') {
-      console.error(green + '  [config] agents.economy → ultra (backfilled default)'
+    // Re-install / update: backfill the default agent config into a config that
+    // doesn't set it yet (one read/write, never overwriting an explicit choice).
+    const { economy, toggles } = _backfillConfigDefaults(stateDir, { dryRun });
+    if (economy === 'backfilled') {
+      console.error(green + '  [config] agents.economy → ' + configDefaults.INSTALL_ECONOMY_MODE + ' (backfilled default)'
+        + (dryRun ? ' [DRY-RUN]' : '') + reset);
+    }
+    for (const key of toggles) {
+      console.error(green + '  [config] agents.' + key + ' → ' + configDefaults.DEFAULT_AGENTS[key] + ' (backfilled default)'
         + (dryRun ? ' [DRY-RUN]' : '') + reset);
     }
   }
@@ -915,5 +971,6 @@ module.exports = {
   parseInstallFlags,
   VALID_AGENTS, VALID_SCOPES,
   SOURCE_PAYLOAD_DIR, PAYLOAD_SUBPATH, STATE_SUBPATH,
-  _payloadDirFor, _stateDirFor, _backfillEconomyDefault,
+  _payloadDirFor, _stateDirFor,
+  _backfillEconomyDefault, _backfillAgentToggles, _backfillConfigDefaults,
 };

package/bin/np-tools/commit-task.cjs CHANGED Viewed

@@ -91,19 +91,90 @@ function _resolveSafe(root, p) {
 }
 const _COMMIT_NAME_MAX = 200;
+const _COMMIT_BODY_MAX = 2000;
+const _TASK_ID_PREFIX_RE = /^\s*M\d{3,}-S\d{3,}-T\d{4,}\s*[—:-]\s*/;
+const _PLACEHOLDER_RE = /^\s*(?:\{\{.*\}\}|_?TBD\b.*|_none\b.*)\s*$/i;
 function _sanitizeCommitName(s) {
   return String(s == null ? '' : s).replace(/[\r\n\t]+/g, ' ').replace(/\s+/g, ' ').trim().slice(0, _COMMIT_NAME_MAX);
 }
+// The scaffolded H1 is "# <task-id> — <name>"; without a `name:` frontmatter
+// field the body regex captures the whole heading, producing the duplicated
+// "task(ID): ID — desc" subject. Strip a leading task-id prefix so the subject
+// reads "task(ID): desc".
 function _extractName(frontmatter, body) {
+  // Strip a leading task-id prefix, but fall through to the next source when the
+  // strip empties the candidate — a `name:` of just "M001-S001-T0001 —" or an H1
+  // with no description must not produce a bare "task(ID): " subject.
   if (typeof frontmatter.name === 'string' && frontmatter.name.length > 0) {
-    return _sanitizeCommitName(frontmatter.name);
+    const stripped = _sanitizeCommitName(String(frontmatter.name).replace(_TASK_ID_PREFIX_RE, ''));
+    if (stripped) return stripped;
   }
   const m = String(body || '').match(/^#\s+(?:Task:\s*)?(.+?)\s*$/m);
-  if (m) return _sanitizeCommitName(m[1]);
+  if (m) {
+    const stripped = _sanitizeCommitName(m[1].replace(_TASK_ID_PREFIX_RE, ''));
+    if (stripped) return stripped;
+  }
   return _sanitizeCommitName(frontmatter.id || 'task');
 }
+function _innerTag(body, tag) {
+  const m = String(body || '').match(new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>'));
+  if (!m) return '';
+  const inner = m[1].trim();
+  // A still-unfilled placeholder may carry a Markdown bullet prefix
+  // (`- _TBD — …`); test the de-bulleted form so it is recognised and omitted.
+  const candidate = inner.replace(/^[-*+]\s+/, '');
+  return _PLACEHOLDER_RE.test(candidate) ? '' : inner;
+}
+function _sanitizeCommitBody(s) {
+  return String(s == null ? '' : s)
+    .replace(/\r\n?/g, '\n')
+    .replace(/[ \t]+\n/g, '\n')
+    .replace(/\n{3,}/g, '\n\n')
+    .trim()
+    .slice(0, _COMMIT_BODY_MAX);
+}
+// Compose a descriptive commit body from the task's intent so the history is
+// self-explanatory months later: what the task does (<action>), what it must
+// satisfy (<acceptance_criteria>), the task id, and the files it touched.
+function _dedupe(arr) {
+  const seen = new Set();
+  const out = [];
+  for (const x of arr) {
+    if (!seen.has(x)) { seen.add(x); out.push(x); }
+  }
+  return out;
+}
+// Test files np-test-writer wrote this task (ADR-0023). The test-writer chooses
+// their paths at runtime — the planner-authored files_modified does not list
+// them — so the post-test-writer phase records them in the checkpoint. Fold them
+// into the commit set so the executor-greened tests land with their production
+// code instead of being silently dropped.
+function _tddTestFiles(taskId, cwd, root) {
+  const cp = readCheckpoint(taskId, cwd);
+  const np = cp && cp.nubosloop;
+  const tests = np && Array.isArray(np.tdd_tests) ? np.tdd_tests : [];
+  return tests.map((p) => _resolveSafe(root, p));
+}
+function _composeCommitBody(body, taskId, files) {
+  const action = _innerTag(body, 'action');
+  const accept = _innerTag(body, 'acceptance_criteria');
+  const parts = [];
+  if (action) parts.push(action);
+  if (accept) parts.push('Acceptance:\n' + accept);
+  parts.push('Task: ' + taskId);
+  if (Array.isArray(files) && files.length > 0) {
+    parts.push('Files: ' + files.join(', '));
+  }
+  return _sanitizeCommitBody(parts.join('\n\n'));
+}
 function run(args, ctx) {
   const context = ctx || {};
   const cwd = context.cwd || process.cwd();
@@ -153,13 +224,16 @@ function run(args, ctx) {
     );
   }
   const root = findProjectRoot(cwd);
-  const safeFiles = files.map((p) => _resolveSafe(root, p));
+  const declaredSafe = files.map((p) => _resolveSafe(root, p));
+  const safeFiles = _dedupe([...declaredSafe, ..._tddTestFiles(taskId, cwd, root)]);
   const name = _extractName(frontmatter, body);
   const message = 'task(' + taskId + '): ' + name;
+  // Compose the body from the paths that will actually be committed (commitTask
+  // drops gitignored entries), so `git log` never advertises a file the diff omits.
+  const { committable } = git.classifyCommittablePaths(safeFiles);
+  const commitBody = _composeCommitBody(body, taskId, committable);
-  const result = commitTask(taskId, safeFiles, message);
+  const result = commitTask(taskId, safeFiles, message, commitBody);
   if (result.committed === false && result.reason === 'artifacts-gitignored') {
     try {

package/bin/np-tools/commit-task.test.cjs CHANGED Viewed

@@ -164,6 +164,139 @@ test('CT-3: commit-task emits JSON with sha + files on success', () => {
   assert.ok(subject.startsWith('task(M006-S001-T0001):'), 'subject: ' + subject);
 });
+test('CT-3b: subject strips the duplicated task-id prefix from the H1 heading', () => {
+  const root = makeRepo();
+  const taskId = 'M006-S001-T0050';
+  const m = taskId.match(/^(M\d{3,})-(S\d{3,})-(T\d{4,})$/);
+  const [, mId, sId, tId] = m;
+  const taskDir = path.join(root, '.nubos-pilot', 'milestones', mId, 'slices', sId, 'tasks', tId);
+  fs.mkdirSync(taskDir, { recursive: true });
+  fs.writeFileSync(path.join(taskDir, tId + '-PLAN.md'), [
+    '---',
+    `id: ${taskId}`,
+    `milestone: ${mId}`,
+    `slice: ${mId}-${sId}`,
+    'type: execute',
+    'status: in-progress',
+    'tier: sonnet',
+    'owner: np-executor',
+    'wave: 1',
+    'depends_on: []',
+    'files_modified:',
+    '  - src/a.ts',
+    'autonomous: true',
+    'must_haves: {}',
+    '---',
+    '',
+    `# ${taskId} — wire the outcome feedback loop`,
+    '',
+    '<action>',
+    'Add the OutcomeRecorder service and persist verdicts.',
+    '</action>',
+    '',
+    '<acceptance_criteria>',
+    '- Verdicts persist across restarts',
+    '- API returns 201 on record',
+    '</acceptance_criteria>',
+  ].join('\n'), 'utf-8');
+  seedLoopReadyCheckpoint(root, taskId);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  const cap = _capture();
+  try {
+    subcmd.run([taskId], { cwd: root, stdout: cap.stub });
+  } finally {
+    process.chdir(prev);
+  }
+  const subject = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%s'], { encoding: 'utf-8' }).trim();
+  assert.equal(subject, `task(${taskId}): wire the outcome feedback loop`,
+    'subject must not repeat the task id after the colon');
+  const fullBody = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%b'], { encoding: 'utf-8' });
+  assert.match(fullBody, /Add the OutcomeRecorder service/, 'body should carry the <action> intent');
+  assert.match(fullBody, /Acceptance:/);
+  assert.match(fullBody, /Verdicts persist across restarts/);
+  assert.match(fullBody, new RegExp('Task: ' + taskId));
+  assert.match(fullBody, /Files: src\/a\.ts/);
+});
+test('CT-3c: test-writer files recorded in nubosloop.tdd_tests are folded into the commit', () => {
+  const root = makeRepo();
+  const taskId = 'M006-S001-T0060';
+  seedPlanAndTask(root, '06-01', taskId, ['src/a.ts']);
+  seedLoopReadyCheckpoint(root, taskId, { nubosloop: { tdd_tests: ['tests/a.test.ts'] } });
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.mkdirSync(path.join(root, 'tests'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
+  fs.writeFileSync(path.join(root, 'tests', 'a.test.ts'), 'test("a", () => {});\n', 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  try {
+    subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
+  } finally {
+    process.chdir(prev);
+  }
+  const committed = execFileSync('git', ['-C', root, 'show', '--name-only', '--format=', 'HEAD'], { encoding: 'utf-8' });
+  assert.match(committed, /src\/a\.ts/, 'production file must be committed');
+  assert.match(committed, /tests\/a\.test\.ts/, 'tdd test file must be committed even though it is not in files_modified');
+});
+test('CT-3d: degenerate task name falls back to the id instead of an empty subject', () => {
+  const root = makeRepo();
+  const taskId = 'M006-S001-T0061';
+  const m = taskId.match(/^(M\d{3,})-(S\d{3,})-(T\d{4,})$/);
+  const [, mId, sId, tId] = m;
+  const taskDir = path.join(root, '.nubos-pilot', 'milestones', mId, 'slices', sId, 'tasks', tId);
+  fs.mkdirSync(taskDir, { recursive: true });
+  fs.writeFileSync(path.join(taskDir, tId + '-PLAN.md'), [
+    '---',
+    `id: ${taskId}`,
+    `milestone: ${mId}`,
+    `slice: ${mId}-${sId}`,
+    'type: execute', 'status: in-progress', 'tier: sonnet', 'owner: np-executor',
+    'wave: 1', 'depends_on: []',
+    'files_modified:', '  - src/a.ts',
+    'autonomous: true', 'must_haves: {}',
+    '---', '',
+    `# ${taskId} — `,
+  ].join('\n'), 'utf-8');
+  seedLoopReadyCheckpoint(root, taskId);
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  try {
+    subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
+  } finally {
+    process.chdir(prev);
+  }
+  const subject = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%s'], { encoding: 'utf-8' }).trim();
+  assert.equal(subject, `task(${taskId}): ${taskId}`, 'empty name must fall back to the id, never a bare colon');
+});
+test('CT-3e: commit body Files: lists only committed paths, not gitignored ones', () => {
+  const root = makeRepo();
+  const taskId = 'M006-S001-T0062';
+  seedPlanAndTask(root, '06-01', taskId, ['src/a.ts', 'build/out.js']);
+  seedLoopReadyCheckpoint(root, taskId);
+  fs.writeFileSync(path.join(root, '.gitignore'), 'build/\n', 'utf-8');
+  fs.mkdirSync(path.join(root, 'src'), { recursive: true });
+  fs.mkdirSync(path.join(root, 'build'), { recursive: true });
+  fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
+  fs.writeFileSync(path.join(root, 'build', 'out.js'), 'noise', 'utf-8');
+  const prev = process.cwd();
+  process.chdir(root);
+  try {
+    subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
+  } finally {
+    process.chdir(prev);
+  }
+  const fullBody = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%b'], { encoding: 'utf-8' });
+  assert.match(fullBody, /Files: src\/a\.ts/);
+  assert.doesNotMatch(fullBody, /build\/out\.js/, 'gitignored path must not be advertised in the body');
+});
 test('CT-4: commit-task SOFT-SKIPS when every files_modified entry is gitignored (artifacts-gitignored terminator)', () => {
   const root = makeRepo();
   seedPlanAndTask(root, '06-01', 'M006-S001-T0002', ['build/out.js']);