nubos-pilot 1.3.3 → 1.3.4

@@ -0,0 +1,95 @@
+ ---
+ name: np-task-architect
+ description: Per-task architecture step inside the Nubosloop. Runs in round 1 after the researcher swarm, before the test-writer and executor. Reads the task plan, CONTEXT, RULES.md (Conventions) and any M<NNN>-ARCHITECTURE.md, then emits ephemeral structural constraints (module/class layout, boundaries, paradigms, the test surfaces TDD must cover) as its final message. Read-only — writes no files.
+ tier: sonnet
+ tools: Read, Grep, Glob
+ color: purple
+ ---
+
+ <role>
+ You are the nubos-pilot per-task architect. You run once per task, in round 1 of the Nubosloop, after the researcher swarm and BEFORE the test-writer and executor. Your output is the structural contract the test-writer and executor build against: how the code for THIS task must be shaped.
+
+ You are not the milestone architect (`np-architect`, which decides milestone-wide boundaries and writes `M<NNN>-ARCHITECTURE.md`). You operate one level down: given the task and any milestone architecture, you decide the concrete structure of the code this single task introduces — which classes/modules carry which responsibility, where the boundaries fall, which paradigms and project conventions apply, and which surfaces the tests must exercise.
+
+ You are advisory and read-only. You emit your architecture spec as your FINAL MESSAGE (markdown) — you do not write files. The orchestrator captures it and injects it into the test-writer and executor prompts as `<architecture_constraints>`.
+
+ **CRITICAL: Mandatory Initial Read**
+ If the prompt contains a `<files_to_read>` block, you MUST use the `Read` tool to load every file listed there before producing your spec. This is your primary context — the task plan, CONTEXT, `RULES.md`, and any `M<NNN>-ARCHITECTURE.md`.
+
+ **Design skills.** If the spawn prompt contains a `Use the following Nubos skills` line, `Read` each named skill from `.claude/skills/<skill>/SKILL.md` BEFORE committing your spec. Each skill's "Verification bar" is the standard your structural decisions must satisfy. If the skills are absent (non-Claude runtime), proceed on your own judgment.
+ </role>
+
+ ## Completeness Mandate
+
+ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENESS.md). The rules that bind this role:
+
+ - **Rule 1 — Do the whole thing.** A structural spec that names happy-path classes but ignores error paths, boundary surfaces, and the tests they require is not done. Name them all.
+ - **Rule 2 — Do it right.** Honour the project's Conventions (`RULES.md` → `## Conventions`). Do not invent a structure that contradicts the locked class/module/naming/paradigm rules.
+ - **Rule 8 — Never present a workaround when the real fix exists.** If the clean structure needs a new boundary, say so — do not bless a shortcut to save the executor a file.
+ - **Rule 9 — Search before building.** Read `.nubos-pilot/codebase/INDEX.md` and the milestone architecture before naming a new module. Extend existing structure; do not silently reinvent it.
+
+ Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+
+ ## Granularity — task structure, NOT line-level implementation
+
+ You decide **structure**: responsibilities, boundaries, the shape of the public surface, which paradigm applies, what the tests must cover. You do NOT write the implementation. Specifically you do NOT emit:
+
+ - Schema DDL / exact column types,
+ - exact framework-generated filenames (use glob-shaped descriptions, e.g. "a Service class under the app's service layer"),
+ - full code bodies (a ≤ 5-line illustrative signature is fine; a method body is not),
+ - code-style edicts already covered by `RULES.md`.
+
+ Your spec is ephemeral guidance, not a committed artifact — it never reaches `PLAN.md`, so it cannot trip plan-lint. Keep it concrete enough to constrain the executor, abstract enough to leave the executor room to implement.
+
+ ## Inputs
+
+ | Input | Purpose | Typical path |
+ |-------|---------|--------------|
+ | Task plan (required) | The task being executed. `<action>` + `<acceptance_criteria>` define the surface you structure. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/tasks/T<NNNN>/T<NNNN>-PLAN.md` |
+ | RULES.md (required) | Project Conventions — class/module structure, naming, code style, paradigms. Your spec MUST conform. | `.nubos-pilot/RULES.md` |
+ | M<NNN>-CONTEXT.md (recommended) | Locked milestone decisions. | `.nubos-pilot/milestones/M<NNN>/M<NNN>-CONTEXT.md` |
+ | M<NNN>-ARCHITECTURE.md (when present) | Milestone boundaries you refine for this task — never contradict. | `.nubos-pilot/milestones/M<NNN>/M<NNN>-ARCHITECTURE.md` |
+ | .nubos-pilot/codebase/INDEX.md (recommended) | Existing module boundaries to extend, not reinvent. | `.nubos-pilot/codebase/INDEX.md` |
+
+ ## Output Contract
+
+ Emit your architecture spec as your final message — markdown, this exact shape, no file writes:
+
+ ```markdown
+ # Task Architecture — <task-id>
+
+ ## Responsibilities & Boundaries
+ | Unit (class / module) | New / Existing | Responsibility | Public surface |
+ |---|---|---|---|
+ | ... | ... | ... | ... |
+
+ ## Paradigms & Conventions to honour
+ - <named convention from RULES.md the executor must follow>
+ - <pattern that is required / banned for this task>
+
+ ## Required Test Surfaces (hand-off to np-test-writer)
+ - <observable behaviour that MUST have a test> — happy path
+ - <boundary / empty / overflow case that MUST have a test>
+ - <failure path that MUST have a test>
+
+ ## Constraints for the executor
+ - <boundary the executor must not cross, e.g. "no DB access from the controller — go through the Service">
+
+ ## Conflicts
+ - <only if the task or RULES.md make a clean structure impossible — name the conflict; the orchestrator routes it to the user>
+ ```
+
+ If the task is purely mechanical (copy change, version bump, one-line fix) and needs no structural decision, emit a single line: `No structural decision required — <one-line reason>.` Do not manufacture structure where none is warranted.
+
+ <scope_guardrail>
+ **Do:**
+ - Read the task plan, RULES.md, CONTEXT, milestone architecture, and codebase index freely.
+ - Decide the task's code structure and the test surfaces it requires.
+ - Honour RULES.md Conventions and milestone architecture. Surface conflicts instead of silently overriding.
+
+ **Don't:**
+ - Write or edit ANY file — you have no Write/Edit tool. Your spec is your final message.
+ - Prescribe line-level implementation, schema DDL, or exact framework filenames.
+ - Re-open milestone decisions (`M<NNN>-CONTEXT.md` / `M<NNN>-ARCHITECTURE.md`) — refine within them.
+ - Spawn other agents or commit anything.
+ </scope_guardrail>
@@ -0,0 +1,89 @@
+ ---
+ name: np-test-writer
+ description: Per-task TDD step inside the Nubosloop. Runs in round 1 after the architect, before the executor. Writes real, valid test files for the task's required surfaces (from the architecture spec + acceptance criteria + Conventions) BEFORE production code exists. Tests may start red; the executor makes them green. Never skips, stubs, or writes vacuous assertions.
+ tier: sonnet
+ tools: Read, Write, Bash, Grep, Glob
+ color: "#06B6D4"
+ ---
+
+ <role>
+ You are the nubos-pilot test-writer. You run once per task, in round 1 of the Nubosloop, AFTER the per-task architect and BEFORE the executor. You practice test-driven development: you write the tests the executor must then make pass.
+
+ The orchestrator hands you `<architecture_constraints>` (the per-task architect's required test surfaces), the task's `<acceptance_criteria>`, and the project Conventions (`RULES.md`). You translate those into real test files placed where the project keeps tests. Because production code does not exist yet, your tests are EXPECTED to fail (red) when run — that is correct TDD. The executor turns them green; the loop's verify gate runs after the executor, never after you.
+
+ Your tests are a contract. The executor is told not to delete, skip, or weaken them. So they must be valid, runnable, and honest from the start.
+
+ **CRITICAL: Mandatory Initial Read**
+ If the prompt contains a `<files_to_read>` block, you MUST `Read` every file listed before writing anything — the task plan, the architecture spec path, `RULES.md`, and any existing neighbouring tests whose style you must match.
+
+ **Testing skills.** If the spawn prompt contains a `Use the following Nubos skills` line, `Read` each named skill from `.claude/skills/<skill>/SKILL.md` (e.g. `np-test-strategy`) before writing. Its "Verification bar" is the standard your test suite must satisfy. If absent (non-Claude runtime), proceed on your own judgment.
+ </role>
+
+ ## Completeness Mandate
+
+ This agent operates under [`templates/COMPLETENESS.md`](../templates/COMPLETENESS.md). The rules that bind this role:
+
+ - **Rule 1 — Do the whole thing.** Cover every surface the architect listed: happy path, empty/boundary/overflow input, and the failure path. A missing case is an incomplete suite.
+ - **Rule 3 — Do it with tests.** This is your entire job. Every public surface the task introduces gets at least one test that asserts observable behaviour.
+ - **Rule 10 — Test before shipping.** A test that does not actually assert the claimed behaviour is worse than no test. No `assert(true)`, no `expect(x).toBeDefined()` as the only check, no `it.skip` / `markTestSkipped` / commented-out asserts / `if (false)` guards. Such a test is a hard-stop violation, not a placeholder.
+
+ Refusal of any rule is a hard-stop. Surface the violation to the orchestrator verbatim and abort the spawn.
+
+ ## Anti-Skip Self-Check (run before you finish)
+
+ Before emitting your envelope, re-read every test file you wrote and confirm — line by line — that NONE of the following is present. If any is, fix it; do not ship it:
+
+ 1. Skipped/pending tests (`.skip`, `xit`, `xdescribe`, `markTestSkipped`, `@Disabled`, `t.skip`, `pytest.mark.skip`, `#[ignore]`).
+ 2. Vacuous assertions (`assert(true)`, `expect(true).toBe(true)`, an assertion that can never fail, or a test with zero assertions).
+ 3. Swallowed failures (`try { … } catch {}` around the assertion, empty catch, asserting inside an un-awaited promise).
+ 4. Tautologies (asserting a literal against itself, or re-asserting the mock you just configured).
+ 5. Non-determinism (wall-clock, network, unseeded randomness) without explicit injection.
+
+ A test that exists only to inflate the count is a Rule-10 violation. The downstream `np-critic-tests` axis audits for exactly these; do not hand it findings.
+
+ ## TDD Discipline (red is correct)
+
+ - Write tests against the behaviour the acceptance criteria + architecture spec require — not against whatever the executor might implement.
+ - Run the project's test command via `Bash` to confirm the tests **parse and execute** (no syntax/collection errors). Failing assertions are expected and fine; collection/compile errors are NOT — fix those.
+ - Do NOT write production code, stubs, or fixtures that pre-satisfy the tests. Minimal test scaffolding (factories, fakes the project already uses) is allowed; implementing the unit under test is the executor's job.
+ - Place tests where the project keeps them (match `RULES.md` → Conventions and existing neighbours). Add the files you create to the task's `files_modified` set via the checkpoint if the orchestrator asks; otherwise list them in your envelope so they are committed with the executor's diff.
+
+ ## Inputs
+
+ | Input | Purpose | Typical path |
+ |-------|---------|--------------|
+ | Task plan (required) | `<action>` + `<acceptance_criteria>` define what to test. | `.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/tasks/T<NNNN>/T<NNNN>-PLAN.md` |
+ | Architecture constraints (required) | The per-task architect's required test surfaces. | inline `<architecture_constraints>` / `$TMPDIR` path |
+ | RULES.md (required) | Conventions — test location, naming, style. | `.nubos-pilot/RULES.md` |
+ | Neighbouring tests (recommended) | The project's test idiom you must match. | repo paths |
+
+ ## Output Contract
+
+ Write the test files, then emit a single JSON object as your final message (no prose around it):
+
+ ```json
+ {
+   "agent": "test-writer",
+   "task_id": "M001-S001-T0001",
+   "round": 1,
+   "tests_written": ["tests/Feature/OutcomeRecorderTest.php"],
+   "surfaces_covered": ["records a verdict and persists it", "rejects a malformed verdict with 422", "returns empty history for an unknown task"],
+   "collection_ok": true,
+   "expected_red": true,
+   "notes": "Tests fail as expected — no production code yet. Executor must make them green without weakening assertions."
+ }
+ ```
+
+ `collection_ok` MUST be `true` before you finish — if the suite cannot even collect/compile your tests, fix them first. `expected_red: true` is the normal TDD state. If you genuinely cannot write valid tests (e.g. acceptance criteria are ambiguous), emit `"collection_ok": false` with a `notes` explaining the blocker — the orchestrator routes it to the user rather than letting the executor proceed blind.
+
+ <scope_guardrail>
+ **Do:**
+ - Read the task, architecture spec, RULES.md, and neighbouring tests.
+ - Write real, valid, honest test files for every required surface.
+ - Run the test command to confirm collection succeeds (red assertions are fine).
+
+ **Don't:**
+ - Write production code or pre-satisfy your own tests.
+ - Skip, stub, weaken, or comment out any assertion to make the suite "pass".
+ - Edit unrelated files, spawn other agents, or commit anything.
+ </scope_guardrail>
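
Item 1 of the Anti-Skip Self-Check in the np-test-writer prompt lists concrete skip markers. As a rough illustration only, a scanner for those markers could look like the sketch below. `findSkipMarkers` and `SKIP_MARKERS` are hypothetical names and are not part of nubos-pilot; the package performs this check through the agent prompt and the `np-critic-tests` audit, not through this code.

```javascript
// Hypothetical helper (not in the package): flags the skip/pending markers
// named in item 1 of the Anti-Skip Self-Check.
const SKIP_MARKERS = [
  /\.skip\b/,            // it.skip, describe.skip, t.skip, pytest.mark.skip
  /\bxit\s*\(/,          // pending test (Jasmine/Jest style)
  /\bxdescribe\s*\(/,    // pending suite (Jasmine/Jest style)
  /\bmarkTestSkipped\b/, // PHPUnit
  /@Disabled\b/,         // JUnit 5
  /#\[ignore\]/,         // Rust
];

// Returns one { line, text } entry per source line that contains a marker.
function findSkipMarkers(source) {
  const hits = [];
  source.split('\n').forEach((text, index) => {
    if (SKIP_MARKERS.some((re) => re.test(text))) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

const sample = [
  "it('records a verdict', () => { /* real assertions here */ });",
  "it.skip('rejects a malformed verdict', () => {});",
].join('\n');

console.log(findSkipMarkers(sample));
```

A line-based regex scan like this only catches item 1. Vacuous assertions, swallowed failures, and tautologies (items 2 to 4) need a reading of the test logic, which is presumably why the prompt asks for a line-by-line re-read.
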
package/bin/install.js CHANGED
@@ -228,27 +228,79 @@ function _readInstallConfig(projectRoot) {
    }
  }

- // On re-install/update the installer leaves an existing config.json untouched.
- // To make `ultra` the standard for updated projects too, backfill `agents.economy`
- // into configs that don't set it yet loud (the key is written, visible in the
- // file) and conservative (an explicit economy OR legacy economy_critic is treated
- // as a deliberate choice and never overwritten). Returns the action taken for logging.
- function _backfillEconomyDefault(stateDir, { dryRun = false } = {}) {
+ // Shared read for the re-install/update backfills: parse the existing config.json
+ // once. Returns `{ cfgPath, cfg, status }` where status is 'ok' | 'absent' |
+ // 'unparseable' and cfg is null unless status is 'ok'.
+ function _loadConfigJson(stateDir) {
    const cfgPath = path.join(stateDir, 'config.json');
    let raw;
-   try { raw = fs.readFileSync(cfgPath, 'utf-8'); } catch { return 'absent'; }
+   try { raw = fs.readFileSync(cfgPath, 'utf-8'); } catch { return { cfgPath, cfg: null, status: 'absent' }; }
    let cfg;
-   try { cfg = JSON.parse(raw); } catch { return 'unparseable'; }
-   if (!cfg || typeof cfg !== 'object') return 'unparseable';
+   try { cfg = JSON.parse(raw); } catch { return { cfgPath, cfg: null, status: 'unparseable' }; }
+   if (!cfg || typeof cfg !== 'object') return { cfgPath, cfg: null, status: 'unparseable' };
+   return { cfgPath, cfg, status: 'ok' };
+ }
+
+ // Apply the ultra economy default in-memory (never overwriting an explicit
+ // economy OR legacy economy_critic choice). Returns 'backfilled' | 'preserved'.
+ function _applyEconomyDefault(cfg) {
    const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : null;
    if (agents && (agents.economy !== undefined || agents.economy_critic !== undefined)) {
      return 'preserved';
    }
    cfg.agents = { ...(agents || {}), economy: configDefaults.INSTALL_ECONOMY_MODE };
-   if (!dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
    return 'backfilled';
  }

+ // Apply the default-on loop-agent toggles (architect, test-writer) in-memory,
+ // never overwriting an explicit true/false. Returns the keys added.
+ const _BACKFILL_AGENT_TOGGLES = Object.freeze(['architect', 'test_writer']);
+ function _applyAgentToggles(cfg) {
+   const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : {};
+   const added = [];
+   for (const key of _BACKFILL_AGENT_TOGGLES) {
+     if (agents[key] === undefined) {
+       agents[key] = configDefaults.DEFAULT_AGENTS[key];
+       added.push(key);
+     }
+   }
+   if (added.length > 0) cfg.agents = agents;
+   return added;
+ }
+
+ // Single-pass backfill used by the installer: one read, the economy default and
+ // the agent-toggle defaults applied together, one atomic write. Both backfills are
+ // loud (written to the file) and conservative (an explicit choice is never
+ // overwritten). Returns `{ economy, toggles }` for logging.
+ function _backfillConfigDefaults(stateDir, { dryRun = false } = {}) {
+   const { cfgPath, cfg, status } = _loadConfigJson(stateDir);
+   if (!cfg) return { economy: status, toggles: [] };
+   const economy = _applyEconomyDefault(cfg);
+   const toggles = _applyAgentToggles(cfg);
+   if ((economy === 'backfilled' || toggles.length > 0) && !dryRun) {
+     atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+   }
+   return { economy, toggles };
+ }
+
+ // Standalone wrappers retained for unit tests. Each loads + writes on its own;
+ // the installer uses _backfillConfigDefaults to avoid a second read/write pass.
+ function _backfillEconomyDefault(stateDir, { dryRun = false } = {}) {
+   const { cfgPath, cfg, status } = _loadConfigJson(stateDir);
+   if (!cfg) return status;
+   const action = _applyEconomyDefault(cfg);
+   if (action === 'backfilled' && !dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+   return action;
+ }
+
+ function _backfillAgentToggles(stateDir, { dryRun = false } = {}) {
+   const { cfgPath, cfg } = _loadConfigJson(stateDir);
+   if (!cfg) return [];
+   const added = _applyAgentToggles(cfg);
+   if (added.length > 0 && !dryRun) atomicWriteFileSync(cfgPath, JSON.stringify(cfg, null, 2));
+   return added;
+ }
+
  function _readExistingScope(projectRoot) {
    const cfg = _readInstallConfig(projectRoot);
    return cfg && cfg.scope ? cfg.scope : null;
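
The in-memory backfill helpers in the hunk above can be exercised without touching the file system. The sketch below re-states their logic in a self-contained form so the "conservative" behaviour is visible on a sample config. The values in `configDefaults` are assumptions: the diff references `configDefaults.INSTALL_ECONOMY_MODE` and `configDefaults.DEFAULT_AGENTS` but does not show them (the removed log line and comments suggest `ultra` and default-on toggles).

```javascript
// Assumed default values; the real ones live in a module this diff does not show.
const configDefaults = {
  INSTALL_ECONOMY_MODE: 'ultra',
  DEFAULT_AGENTS: { architect: true, test_writer: true },
};

// Mirrors _applyEconomyDefault: an explicit economy or legacy economy_critic wins.
function applyEconomyDefault(cfg) {
  const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : null;
  if (agents && (agents.economy !== undefined || agents.economy_critic !== undefined)) {
    return 'preserved';
  }
  cfg.agents = { ...(agents || {}), economy: configDefaults.INSTALL_ECONOMY_MODE };
  return 'backfilled';
}

// Mirrors _applyAgentToggles: only keys that are undefined are filled in.
function applyAgentToggles(cfg) {
  const agents = cfg.agents && typeof cfg.agents === 'object' ? cfg.agents : {};
  const added = [];
  for (const key of ['architect', 'test_writer']) {
    if (agents[key] === undefined) {
      agents[key] = configDefaults.DEFAULT_AGENTS[key];
      added.push(key);
    }
  }
  if (added.length > 0) cfg.agents = agents;
  return added;
}

// A project that explicitly disabled the architect and never set an economy mode.
const cfg = { agents: { architect: false } };
const economy = applyEconomyDefault(cfg); // 'backfilled'
const toggles = applyAgentToggles(cfg);   // ['test_writer']
console.log(economy, toggles, JSON.stringify(cfg));
```

Under these assumed defaults, the explicit `architect: false` survives while `economy` and `test_writer` are filled in, which is the case where `_backfillConfigDefaults` would perform its single atomic write.
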
@@ -441,11 +493,15 @@ async function _runInstallLocked(ctx) {
      else console.error(dim + 'DRY-RUN: würde schreiben ' + configPath + reset);
      initConfig = config;
    } else {
-     // Re-install / update: backfill the ultra economy default into a config that
-     // doesn't set it yet (never overwriting an explicit choice).
-     const action = _backfillEconomyDefault(stateDir, { dryRun });
-     if (action === 'backfilled') {
-       console.error(green + ' [config] agents.economy → ultra (backfilled default)'
+     // Re-install / update: backfill the default agent config into a config that
+     // doesn't set it yet (one read/write, never overwriting an explicit choice).
+     const { economy, toggles } = _backfillConfigDefaults(stateDir, { dryRun });
+     if (economy === 'backfilled') {
+       console.error(green + ' [config] agents.economy → ' + configDefaults.INSTALL_ECONOMY_MODE + ' (backfilled default)'
+         + (dryRun ? ' [DRY-RUN]' : '') + reset);
+     }
+     for (const key of toggles) {
+       console.error(green + ' [config] agents.' + key + ' → ' + configDefaults.DEFAULT_AGENTS[key] + ' (backfilled default)'
          + (dryRun ? ' [DRY-RUN]' : '') + reset);
      }
    }
@@ -915,5 +971,6 @@ module.exports = {
    parseInstallFlags,
    VALID_AGENTS, VALID_SCOPES,
    SOURCE_PAYLOAD_DIR, PAYLOAD_SUBPATH, STATE_SUBPATH,
-   _payloadDirFor, _stateDirFor, _backfillEconomyDefault,
+   _payloadDirFor, _stateDirFor,
+   _backfillEconomyDefault, _backfillAgentToggles, _backfillConfigDefaults,
  };
@@ -91,19 +91,90 @@ function _resolveSafe(root, p) {
  }

  const _COMMIT_NAME_MAX = 200;
+ const _COMMIT_BODY_MAX = 2000;
+ const _TASK_ID_PREFIX_RE = /^\s*M\d{3,}-S\d{3,}-T\d{4,}\s*[—:-]\s*/;
+ const _PLACEHOLDER_RE = /^\s*(?:\{\{.*\}\}|_?TBD\b.*|_none\b.*)\s*$/i;
+
  function _sanitizeCommitName(s) {
    return String(s == null ? '' : s).replace(/[\r\n\t]+/g, ' ').replace(/\s+/g, ' ').trim().slice(0, _COMMIT_NAME_MAX);
  }

+ // The scaffolded H1 is "# <task-id> — <name>"; without a `name:` frontmatter
+ // field the body regex captures the whole heading, producing the duplicated
+ // "task(ID): ID — desc" subject. Strip a leading task-id prefix so the subject
+ // reads "task(ID): desc".
  function _extractName(frontmatter, body) {
+   // Strip a leading task-id prefix, but fall through to the next source when the
+   // strip empties the candidate — a `name:` of just "M001-S001-T0001 —" or an H1
+   // with no description must not produce a bare "task(ID): " subject.
    if (typeof frontmatter.name === 'string' && frontmatter.name.length > 0) {
-     return _sanitizeCommitName(frontmatter.name);
+     const stripped = _sanitizeCommitName(String(frontmatter.name).replace(_TASK_ID_PREFIX_RE, ''));
+     if (stripped) return stripped;
    }
    const m = String(body || '').match(/^#\s+(?:Task:\s*)?(.+?)\s*$/m);
-   if (m) return _sanitizeCommitName(m[1]);
+   if (m) {
+     const stripped = _sanitizeCommitName(m[1].replace(_TASK_ID_PREFIX_RE, ''));
+     if (stripped) return stripped;
+   }
    return _sanitizeCommitName(frontmatter.id || 'task');
  }

+ function _innerTag(body, tag) {
+   const m = String(body || '').match(new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>'));
+   if (!m) return '';
+   const inner = m[1].trim();
+   // A still-unfilled placeholder may carry a Markdown bullet prefix
+   // (`- _TBD — …`); test the de-bulleted form so it is recognised and omitted.
+   const candidate = inner.replace(/^[-*+]\s+/, '');
+   return _PLACEHOLDER_RE.test(candidate) ? '' : inner;
+ }
+
+ function _sanitizeCommitBody(s) {
+   return String(s == null ? '' : s)
+     .replace(/\r\n?/g, '\n')
+     .replace(/[ \t]+\n/g, '\n')
+     .replace(/\n{3,}/g, '\n\n')
+     .trim()
+     .slice(0, _COMMIT_BODY_MAX);
+ }
+
+ // Compose a descriptive commit body from the task's intent so the history is
+ // self-explanatory months later: what the task does (<action>), what it must
+ // satisfy (<acceptance_criteria>), the task id, and the files it touched.
+ function _dedupe(arr) {
+   const seen = new Set();
+   const out = [];
+   for (const x of arr) {
+     if (!seen.has(x)) { seen.add(x); out.push(x); }
+   }
+   return out;
+ }
+
+ // Test files np-test-writer wrote this task (ADR-0023). The test-writer chooses
+ // their paths at runtime — the planner-authored files_modified does not list
+ // them — so the post-test-writer phase records them in the checkpoint. Fold them
+ // into the commit set so the executor-greened tests land with their production
+ // code instead of being silently dropped.
+ function _tddTestFiles(taskId, cwd, root) {
+   const cp = readCheckpoint(taskId, cwd);
+   const np = cp && cp.nubosloop;
+   const tests = np && Array.isArray(np.tdd_tests) ? np.tdd_tests : [];
+   return tests.map((p) => _resolveSafe(root, p));
+ }
+
+ function _composeCommitBody(body, taskId, files) {
+   const action = _innerTag(body, 'action');
+   const accept = _innerTag(body, 'acceptance_criteria');
+   const parts = [];
+   if (action) parts.push(action);
+   if (accept) parts.push('Acceptance:\n' + accept);
+   parts.push('Task: ' + taskId);
+   if (Array.isArray(files) && files.length > 0) {
+     parts.push('Files: ' + files.join(', '));
+   }
+   return _sanitizeCommitBody(parts.join('\n\n'));
+ }
+
  function run(args, ctx) {
    const context = ctx || {};
    const cwd = context.cwd || process.cwd();
@@ -153,13 +224,16 @@ function run(args, ctx) {
      );
    }
    const root = findProjectRoot(cwd);
-   const safeFiles = files.map((p) => _resolveSafe(root, p));
+   const declaredSafe = files.map((p) => _resolveSafe(root, p));
+   const safeFiles = _dedupe([...declaredSafe, ..._tddTestFiles(taskId, cwd, root)]);
    const name = _extractName(frontmatter, body);
    const message = 'task(' + taskId + '): ' + name;
+   // Compose the body from the paths that will actually be committed (commitTask
+   // drops gitignored entries), so `git log` never advertises a file the diff omits.
+   const { committable } = git.classifyCommittablePaths(safeFiles);
+   const commitBody = _composeCommitBody(body, taskId, committable);

-
-
-   const result = commitTask(taskId, safeFiles, message);
+   const result = commitTask(taskId, safeFiles, message, commitBody);

    if (result.committed === false && result.reason === 'artifacts-gitignored') {
      try {
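
The subject-line fix in the two hunks above hinges on stripping the task-id prefix and then falling through when nothing is left. The sketch below restates `_extractName` in a self-contained form to show both outcomes. It copies the regular expressions from the diff (with the em dash written as `\u2014`) and uses shortened, illustrative function names; it is an illustration of the shown logic, not the module itself.

```javascript
// Same pattern as _TASK_ID_PREFIX_RE in the diff; \u2014 is the em dash.
const TASK_ID_PREFIX_RE = /^\s*M\d{3,}-S\d{3,}-T\d{4,}\s*[\u2014:-]\s*/;

// Mirrors _sanitizeCommitName (200 is _COMMIT_NAME_MAX in the diff).
function sanitizeName(s) {
  return String(s == null ? '' : s)
    .replace(/[\r\n\t]+/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, 200);
}

// Mirrors _extractName: frontmatter name, then the H1, then the task id.
function extractName(frontmatter, body) {
  if (typeof frontmatter.name === 'string' && frontmatter.name.length > 0) {
    const stripped = sanitizeName(frontmatter.name.replace(TASK_ID_PREFIX_RE, ''));
    if (stripped) return stripped;
  }
  const m = String(body || '').match(/^#\s+(?:Task:\s*)?(.+?)\s*$/m);
  if (m) {
    const stripped = sanitizeName(m[1].replace(TASK_ID_PREFIX_RE, ''));
    if (stripped) return stripped;
  }
  return sanitizeName(frontmatter.id || 'task');
}

// Scaffolded heading with a description: the duplicated id is removed.
const normal = extractName(
  { id: 'M006-S001-T0050' },
  '# M006-S001-T0050 \u2014 wire the outcome feedback loop',
);

// Degenerate heading with no description: falls back to the id.
const degenerate = extractName({ id: 'M006-S001-T0061' }, '# M006-S001-T0061 \u2014 ');

console.log('task(M006-S001-T0050): ' + normal);
console.log('task(M006-S001-T0061): ' + degenerate);
```

These two inputs correspond to the headings used by the CT-3b and CT-3d tests in the diff, which expect `wire the outcome feedback loop` and the bare task id respectively.
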
@@ -164,6 +164,139 @@ test('CT-3: commit-task emits JSON with sha + files on success', () => {
164
164
  assert.ok(subject.startsWith('task(M006-S001-T0001):'), 'subject: ' + subject);
165
165
  });
166
166
 
167
+ test('CT-3b: subject strips the duplicated task-id prefix from the H1 heading', () => {
168
+ const root = makeRepo();
169
+ const taskId = 'M006-S001-T0050';
170
+ const m = taskId.match(/^(M\d{3,})-(S\d{3,})-(T\d{4,})$/);
171
+ const [, mId, sId, tId] = m;
172
+ const taskDir = path.join(root, '.nubos-pilot', 'milestones', mId, 'slices', sId, 'tasks', tId);
173
+ fs.mkdirSync(taskDir, { recursive: true });
174
+ fs.writeFileSync(path.join(taskDir, tId + '-PLAN.md'), [
175
+ '---',
176
+ `id: ${taskId}`,
177
+ `milestone: ${mId}`,
178
+ `slice: ${mId}-${sId}`,
179
+ 'type: execute',
180
+ 'status: in-progress',
181
+ 'tier: sonnet',
182
+ 'owner: np-executor',
183
+ 'wave: 1',
184
+ 'depends_on: []',
185
+ 'files_modified:',
186
+ ' - src/a.ts',
187
+ 'autonomous: true',
188
+ 'must_haves: {}',
189
+ '---',
190
+ '',
191
+ `# ${taskId} — wire the outcome feedback loop`,
192
+ '',
193
+ '<action>',
194
+ 'Add the OutcomeRecorder service and persist verdicts.',
195
+ '</action>',
196
+ '',
197
+ '<acceptance_criteria>',
198
+ '- Verdicts persist across restarts',
199
+ '- API returns 201 on record',
200
+ '</acceptance_criteria>',
201
+ ].join('\n'), 'utf-8');
202
+ seedLoopReadyCheckpoint(root, taskId);
203
+ fs.mkdirSync(path.join(root, 'src'), { recursive: true });
204
+ fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
205
+ const prev = process.cwd();
206
+ process.chdir(root);
207
+ const cap = _capture();
208
+ try {
209
+ subcmd.run([taskId], { cwd: root, stdout: cap.stub });
210
+ } finally {
211
+ process.chdir(prev);
212
+ }
213
+ const subject = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%s'], { encoding: 'utf-8' }).trim();
214
+ assert.equal(subject, `task(${taskId}): wire the outcome feedback loop`,
215
+ 'subject must not repeat the task id after the colon');
216
+ const fullBody = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%b'], { encoding: 'utf-8' });
217
+ assert.match(fullBody, /Add the OutcomeRecorder service/, 'body should carry the <action> intent');
218
+ assert.match(fullBody, /Acceptance:/);
219
+ assert.match(fullBody, /Verdicts persist across restarts/);
220
+ assert.match(fullBody, new RegExp('Task: ' + taskId));
221
+ assert.match(fullBody, /Files: src\/a\.ts/);
222
+ });
223
+
224
+ test('CT-3c: test-writer files recorded in nubosloop.tdd_tests are folded into the commit', () => {
225
+ const root = makeRepo();
226
+ const taskId = 'M006-S001-T0060';
227
+ seedPlanAndTask(root, '06-01', taskId, ['src/a.ts']);
228
+ seedLoopReadyCheckpoint(root, taskId, { nubosloop: { tdd_tests: ['tests/a.test.ts'] } });
229
+ fs.mkdirSync(path.join(root, 'src'), { recursive: true });
230
+ fs.mkdirSync(path.join(root, 'tests'), { recursive: true });
231
+ fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
232
+ fs.writeFileSync(path.join(root, 'tests', 'a.test.ts'), 'test("a", () => {});\n', 'utf-8');
233
+ const prev = process.cwd();
234
+ process.chdir(root);
235
+ try {
236
+ subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
237
+ } finally {
238
+ process.chdir(prev);
239
+ }
240
+ const committed = execFileSync('git', ['-C', root, 'show', '--name-only', '--format=', 'HEAD'], { encoding: 'utf-8' });
241
+ assert.match(committed, /src\/a\.ts/, 'production file must be committed');
242
+ assert.match(committed, /tests\/a\.test\.ts/, 'tdd test file must be committed even though it is not in files_modified');
243
+ });
244
+
245
+ test('CT-3d: degenerate task name falls back to the id instead of an empty subject', () => {
246
+ const root = makeRepo();
247
+ const taskId = 'M006-S001-T0061';
248
+ const m = taskId.match(/^(M\d{3,})-(S\d{3,})-(T\d{4,})$/);
249
+ const [, mId, sId, tId] = m;
250
+ const taskDir = path.join(root, '.nubos-pilot', 'milestones', mId, 'slices', sId, 'tasks', tId);
251
+ fs.mkdirSync(taskDir, { recursive: true });
252
+ fs.writeFileSync(path.join(taskDir, tId + '-PLAN.md'), [
253
+ '---',
254
+ `id: ${taskId}`,
255
+ `milestone: ${mId}`,
256
+ `slice: ${mId}-${sId}`,
257
+ 'type: execute', 'status: in-progress', 'tier: sonnet', 'owner: np-executor',
258
+ 'wave: 1', 'depends_on: []',
259
+ 'files_modified:', ' - src/a.ts',
260
+ 'autonomous: true', 'must_haves: {}',
261
+ '---', '',
262
+ `# ${taskId} — `,
263
+ ].join('\n'), 'utf-8');
264
+ seedLoopReadyCheckpoint(root, taskId);
265
+ fs.mkdirSync(path.join(root, 'src'), { recursive: true });
266
+ fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
267
+ const prev = process.cwd();
268
+ process.chdir(root);
269
+ try {
270
+ subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
271
+ } finally {
272
+ process.chdir(prev);
273
+ }
274
+ const subject = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%s'], { encoding: 'utf-8' }).trim();
275
+ assert.equal(subject, `task(${taskId}): ${taskId}`, 'empty name must fall back to the id, never a bare colon');
276
+ });
277
+
278
+ test('CT-3e: commit body Files: lists only committed paths, not gitignored ones', () => {
279
+ const root = makeRepo();
280
+ const taskId = 'M006-S001-T0062';
281
+ seedPlanAndTask(root, '06-01', taskId, ['src/a.ts', 'build/out.js']);
282
+ seedLoopReadyCheckpoint(root, taskId);
283
+ fs.writeFileSync(path.join(root, '.gitignore'), 'build/\n', 'utf-8');
284
+ fs.mkdirSync(path.join(root, 'src'), { recursive: true });
285
+ fs.mkdirSync(path.join(root, 'build'), { recursive: true });
286
+ fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export const a = 1;\n', 'utf-8');
287
+ fs.writeFileSync(path.join(root, 'build', 'out.js'), 'noise', 'utf-8');
288
+ const prev = process.cwd();
289
+ process.chdir(root);
290
+ try {
291
+ subcmd.run([taskId], { cwd: root, stdout: _capture().stub });
292
+ } finally {
293
+ process.chdir(prev);
294
+ }
295
+ const fullBody = execFileSync('git', ['-C', root, 'log', '-n', '1', '--format=%b'], { encoding: 'utf-8' });
296
+ assert.match(fullBody, /Files: src\/a\.ts/);
297
+ assert.doesNotMatch(fullBody, /build\/out\.js/, 'gitignored path must not be advertised in the body');
298
+ });
299
+
167
300
  test('CT-4: commit-task SOFT-SKIPS when every files_modified entry is gitignored (artifacts-gitignored terminator)', () => {
168
301
  const root = makeRepo();
169
302
  seedPlanAndTask(root, '06-01', 'M006-S001-T0002', ['build/out.js']);
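The two commit-task tests added in this hunk pin two properties of the commit message: an empty task name falls back to the task id in the subject, and the body's `Files:` line lists only paths that were actually committed. A minimal sketch of a message builder with those properties follows. `buildCommitMessage` is a hypothetical helper, not the package's implementation, and the `", "` separator between paths is an assumption, since the tests only pin the `Files: ` prefix and a single path.

```javascript
// Hypothetical helper for illustration only; not part of nubos-pilot.
function buildCommitMessage(taskId, name, committedPaths) {
  // An empty or whitespace-only name falls back to the task id, so the
  // subject never ends in a bare colon.
  const title = typeof name === 'string' && name.trim() !== '' ? name.trim() : taskId;
  const subject = `task(${taskId}): ${title}`;
  // Assumption: paths are joined with ", ". Only committed paths are passed in,
  // so gitignored entries never appear in the body.
  const body = committedPaths.length > 0 ? `Files: ${committedPaths.join(', ')}` : '';
  return { subject, body };
}

const msg = buildCommitMessage('M006-S001-T0061', '', ['src/a.ts']);
console.log(msg.subject);
console.log(msg.body);
```

Filtering happens before the call: the caller is expected to pass the already-classified committable paths, which keeps the builder a pure function.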
@@ -426,6 +426,125 @@ function _seedSpawnEvidence(taskId, round, agents, cwd) {
426
426
  }
427
427
  }
428
428
 
429
+ test('LCLI-RR-2a: phase=post-architect refuses without np-task-architect spawn audit', async () => {
430
+ const r = _mkRoot();
431
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
432
+ require('../../lib/nubosloop.cjs').recordLoopState('M001-S001-T0001', { round: 1 }, r);
433
+ const loopRunRound = require('./loop-run-round.cjs');
434
+ await assert.rejects(
435
+ () => loopRunRound.run(
436
+ ['M001-S001-T0001', '--phase', 'post-architect'],
437
+ { cwd: r, stdout: _cap().stub },
438
+ ),
439
+ (err) => err && err.code === 'loop-post-architect-missing-spawn-audit',
440
+ );
441
+ });
442
+
443
+ test('LCLI-RR-2b: phase=post-architect with stamp → spawn-test-writer, no round bump', async () => {
444
+ const r = _mkRoot();
445
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
446
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-task-architect'], r);
447
+ const cap = _cap();
448
+ const loopRunRound = require('./loop-run-round.cjs');
449
+ await loopRunRound.run(['M001-S001-T0001', '--phase', 'post-architect'], { cwd: r, stdout: cap.stub });
450
+ const out = JSON.parse(cap.get());
451
+ assert.equal(out.phase, 'post-architect');
452
+ assert.equal(out.next_action, 'spawn-test-writer');
453
+ const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
454
+ assert.equal(cp.nubosloop.last_phase, 'post-architect');
455
+ assert.equal(cp.nubosloop.round, 1, 'prep step must not bump the round counter');
456
+ });
457
+
458
+ test('LCLI-RR-2c: phase=post-architect --force-post-architect bypasses the audit gate', async () => {
459
+ const r = _mkRoot();
460
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
461
+ require('../../lib/nubosloop.cjs').recordLoopState('M001-S001-T0001', { round: 1 }, r);
462
+ const cap = _cap();
463
+ const loopRunRound = require('./loop-run-round.cjs');
464
+ await loopRunRound.run(
465
+ ['M001-S001-T0001', '--phase', 'post-architect', '--force-post-architect'],
466
+ { cwd: r, stdout: cap.stub },
467
+ );
468
+ const out = JSON.parse(cap.get());
469
+ assert.equal(out.forced, true);
470
+ assert.equal(out.next_action, 'spawn-test-writer');
471
+ });
472
+
473
+ test('LCLI-RR-2d: phase=post-test-writer refuses without np-test-writer spawn audit', async () => {
474
+ const r = _mkRoot();
475
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
476
+ require('../../lib/nubosloop.cjs').recordLoopState('M001-S001-T0001', { round: 1 }, r);
477
+ const loopRunRound = require('./loop-run-round.cjs');
478
+ await assert.rejects(
479
+ () => loopRunRound.run(
480
+ ['M001-S001-T0001', '--phase', 'post-test-writer'],
481
+ { cwd: r, stdout: _cap().stub },
482
+ ),
483
+ (err) => err && err.code === 'loop-post-test-writer-missing-spawn-audit',
484
+ );
485
+ });
486
+
487
+ test('LCLI-RR-2e: phase=post-test-writer with stamp → spawn-executor, no round bump', async () => {
488
+ const r = _mkRoot();
489
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
490
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-test-writer'], r);
491
+ const cap = _cap();
492
+ const loopRunRound = require('./loop-run-round.cjs');
493
+ await loopRunRound.run(['M001-S001-T0001', '--phase', 'post-test-writer'], { cwd: r, stdout: cap.stub });
494
+ const out = JSON.parse(cap.get());
495
+ assert.equal(out.phase, 'post-test-writer');
496
+ assert.equal(out.next_action, 'spawn-executor');
497
+ const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
498
+ assert.equal(cp.nubosloop.last_phase, 'post-test-writer');
499
+ assert.equal(cp.nubosloop.round, 1);
500
+ });
501
+
502
+ test('LCLI-RR-2f: phase=post-test-writer --tests records the written paths in nubosloop.tdd_tests', async () => {
503
+ const r = _mkRoot();
504
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
505
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-test-writer'], r);
506
+ const cap = _cap();
507
+ const loopRunRound = require('./loop-run-round.cjs');
508
+ await loopRunRound.run(
509
+ ['M001-S001-T0001', '--phase', 'post-test-writer', '--tests', 'tests/a.test.ts, tests/b.test.ts'],
510
+ { cwd: r, stdout: cap.stub },
511
+ );
512
+ const out = JSON.parse(cap.get());
513
+ assert.deepEqual(out.tdd_tests, ['tests/a.test.ts', 'tests/b.test.ts']);
514
+ const cp = checkpoint.readCheckpoint('M001-S001-T0001', r);
515
+ assert.deepEqual(cp.nubosloop.tdd_tests, ['tests/a.test.ts', 'tests/b.test.ts']);
516
+ });
517
+
518
+ test('LCLI-RR-2g: post-researcher next_action skips disabled prep steps (architect off → test-writer)', async () => {
519
+ const r = _mkRoot();
520
+ fs.writeFileSync(
521
+ path.join(r, '.nubos-pilot', 'config.json'),
522
+ JSON.stringify({ agents: { architect: false } }),
523
+ 'utf-8',
524
+ );
525
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
526
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-researcher', 'np-researcher', 'np-researcher'], r);
527
+ const cap = _cap();
528
+ const loopRunRound = require('./loop-run-round.cjs');
529
+ await loopRunRound.run(['M001-S001-T0001', '--phase', 'post-researcher'], { cwd: r, stdout: cap.stub });
530
+ assert.equal(JSON.parse(cap.get()).next_action, 'spawn-test-writer');
531
+ });
532
+
533
+ test('LCLI-RR-2h: post-researcher next_action → executor when both prep steps are off', async () => {
534
+ const r = _mkRoot();
535
+ fs.writeFileSync(
536
+ path.join(r, '.nubos-pilot', 'config.json'),
537
+ JSON.stringify({ agents: { architect: false, test_writer: false } }),
538
+ 'utf-8',
539
+ );
540
+ checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
541
+ _seedSpawnEvidence('M001-S001-T0001', 1, ['np-researcher', 'np-researcher', 'np-researcher'], r);
542
+ const cap = _cap();
543
+ const loopRunRound = require('./loop-run-round.cjs');
544
+ await loopRunRound.run(['M001-S001-T0001', '--phase', 'post-researcher'], { cwd: r, stdout: cap.stub });
545
+ assert.equal(JSON.parse(cap.get()).next_action, 'spawn-executor');
546
+ });
547
+
429
548
  test('LCLI-RR-3: loop-run-round phase=post-executor with verify-green → spawn-critic-schwarm', async () => {
430
549
  const r = _mkRoot();
431
550
  checkpoint.startTask({ id: 'M001-S001-T0001' }, r);
@@ -1184,7 +1303,7 @@ test('LCLI-RR-35: post-researcher accepts when k=3 np-researcher audits exist (T
1184
1303
  { cwd: r, stdout: cap.stub },
1185
1304
  );
1186
1305
  const out = JSON.parse(cap.get());
1187
- assert.equal(out.next_action, 'spawn-executor');
1306
+ assert.equal(out.next_action, 'spawn-architect');
1188
1307
  assert.equal(out.forced, false);
1189
1308
  assert.equal(out.expected_researcher_count, 3);
1190
1309
  });
@@ -1207,7 +1326,7 @@ test('LCLI-RR-35b: post-researcher k-gate honors swarm.research.k config (Gap #6
1207
1326
  );
1208
1327
  const out = JSON.parse(cap.get());
1209
1328
  assert.equal(out.expected_researcher_count, 1);
1210
- assert.equal(out.next_action, 'spawn-executor');
1329
+ assert.equal(out.next_action, 'spawn-architect');
1211
1330
  });
1212
1331
 
1213
1332
  test('LCLI-RR-36: --force-post-researcher bypasses Layer-C gate + stamps flag (T2)', async () => {
@@ -7,6 +7,7 @@ const { NubosPilotError, safeAssign } = require('../../lib/core.cjs');
7
7
  const safePath = require('../../lib/safe-path.cjs');
8
8
 
9
9
  const checkpoint = require('../../lib/checkpoint.cjs');
10
+ const config = require('../../lib/config.cjs');
10
11
  const nubosloop = require('../../lib/nubosloop.cjs');
11
12
  const messaging = require('../../lib/messaging.cjs');
12
13
  const compress = require('../../lib/compress.cjs');
@@ -40,6 +41,8 @@ function _verifyExcerpt(verifyOutput, cwd) {
40
41
  const VALID_PHASES = new Set([
41
42
  'preflight',
42
43
  'post-researcher',
44
+ 'post-architect',
45
+ 'post-test-writer',
43
46
  'post-executor',
44
47
  'post-critics',
45
48
  'commit',
@@ -153,13 +156,124 @@ function _runPostResearcher(taskId, list, cwd) {
153
156
  );
154
157
  return {
155
158
  phase: 'post-researcher',
156
- next_action: 'spawn-executor',
159
+ next_action: _nextAfterResearcher(cwd),
157
160
  forced: force,
158
161
  expected_researcher_count: expectedK,
159
162
  round: merged.nubosloop ? merged.nubosloop.round : null,
160
163
  };
161
164
  }
162
165
 
166
+ // Round-1 preparation steps (architect, test-writer) that run between the
167
+ // researcher swarm and the executor. Each verifies its spawn was stamped via
168
+ // `loop-audit-tool-use` (Layer-C SKIP-GUARD) and records last_phase. They never
169
+ // bump the round counter — TDD writes tests once; build-fixer rounds iterate.
170
+ // Config-gated in the orchestrator: when agents.architect / agents.test_writer
171
+ // is off, the step (and this phase) is simply never invoked.
172
+ function _checkPrepSpawnAudited(taskId, list, cwd, agent, forceFlag) {
173
+ const force = list.includes(forceFlag);
174
+ const cur = checkpoint.readCheckpoint(taskId, cwd) || {};
175
+ const round = Number((cur.nubosloop && cur.nubosloop.round)) || 1;
176
+ const satisfied = force
177
+ ? true
178
+ : nubosloop.assertSpawnsAuditedForRound(taskId, [agent], round, cwd).satisfied;
179
+ return { force, round, satisfied };
180
+ }
181
+
182
+ function _stampPrepPhase(taskId, cwd, phase, lastAction, forcedKey, force, extraFn) {
183
+ return checkpoint.mergeCheckpoint(
184
+ taskId,
185
+ (curCp) => {
186
+ const prev = (curCp && curCp.nubosloop) || {};
187
+ const partial = { last_phase: phase, last_action: lastAction };
188
+ if (force) partial[forcedKey] = true;
189
+ if (typeof extraFn === 'function') safeAssign(partial, extraFn(prev));
190
+ return { nubosloop: safeAssign({}, prev, partial) };
191
+ },
192
+ cwd,
193
+ );
194
+ }
195
+
196
+ // The round-1 prep steps are config-gated, so the emitted next_action hint must
197
+ // reflect which downstream steps are actually enabled — a consumer driving off
198
+ // the JSON hint (rather than the ACTION CONTRACT prose) must not skip an enabled
199
+ // architect/test-writer or stall on a disabled one.
200
+ function _nextEnabled(cwd, steps) {
201
+ for (const [path, action] of steps) {
202
+ if (config.tryReadConfigPath(cwd, path, true, { onWarn() {} })) return action;
203
+ }
204
+ return 'spawn-executor';
205
+ }
206
+
207
+ function _nextAfterResearcher(cwd) {
208
+ return _nextEnabled(cwd, [['agents.architect', 'spawn-architect'], ['agents.test_writer', 'spawn-test-writer']]);
209
+ }
210
+
211
+ function _nextAfterArchitect(cwd) {
212
+ return _nextEnabled(cwd, [['agents.test_writer', 'spawn-test-writer']]);
213
+ }
214
+
215
+ // `--tests "a, b, c"` → ['a','b','c']; the post-test-writer phase records these
216
+ // (the paths np-test-writer wrote) so commit-task can fold them into the commit.
217
+ function _parseTestPaths(raw) {
218
+ if (typeof raw !== 'string' || raw.trim() === '') return [];
219
+ return raw.split(',').map((s) => s.trim()).filter((s) => s.length > 0);
220
+ }
221
+
222
+ function _runPostArchitect(taskId, list, cwd) {
223
+ const g = _checkPrepSpawnAudited(taskId, list, cwd, 'np-task-architect', '--force-post-architect');
224
+ if (!g.satisfied) {
225
+ throw new NubosPilotError(
226
+ 'loop-post-architect-missing-spawn-audit',
227
+ 'phase=post-architect refused: no `loop-audit-tool-use` record for round=' + g.round +
228
+ ', agent=np-task-architect on ' + taskId + '. ' +
229
+ 'Spawn np-task-architect and call `loop-audit-tool-use ' + taskId +
230
+ ' --agent np-task-architect --tool-use-log <json>` after the spawn, ' +
231
+ 'or pass --force-post-architect for an explicit override.',
232
+ { taskId, round: g.round, missing: ['np-task-architect'], required: ['np-task-architect'] },
233
+ );
234
+ }
235
+ const merged = _stampPrepPhase(taskId, cwd, 'post-architect', 'architect-spawned', 'forced_post_architect', g.force);
236
+ return {
237
+ phase: 'post-architect',
238
+ next_action: _nextAfterArchitect(cwd),
239
+ forced: g.force,
240
+ round: merged.nubosloop ? merged.nubosloop.round : g.round,
241
+ };
242
+ }
243
+
244
+ function _runPostTestWriter(taskId, list, cwd) {
245
+ const g = _checkPrepSpawnAudited(taskId, list, cwd, 'np-test-writer', '--force-post-test-writer');
246
+ if (!g.satisfied) {
247
+ throw new NubosPilotError(
248
+ 'loop-post-test-writer-missing-spawn-audit',
249
+ 'phase=post-test-writer refused: no `loop-audit-tool-use` record for round=' + g.round +
250
+ ', agent=np-test-writer on ' + taskId + '. ' +
251
+ 'Spawn np-test-writer and call `loop-audit-tool-use ' + taskId +
252
+ ' --agent np-test-writer --tool-use-log <json>` after the spawn, ' +
253
+ 'or pass --force-post-test-writer for an explicit override.',
254
+ { taskId, round: g.round, missing: ['np-test-writer'], required: ['np-test-writer'] },
255
+ );
256
+ }
257
+ const testPaths = _parseTestPaths(args.getFlag(list, '--tests'));
258
+ const merged = _stampPrepPhase(
259
+ taskId, cwd, 'post-test-writer', 'test-writer-spawned', 'forced_post_test_writer', g.force,
260
+ (prev) => {
261
+ const prior = Array.isArray(prev.tdd_tests) ? prev.tdd_tests : [];
262
+ const seen = new Set(prior);
263
+ const tdd_tests = prior.slice();
264
+ for (const p of testPaths) { if (!seen.has(p)) { seen.add(p); tdd_tests.push(p); } }
265
+ return { tdd_tests };
266
+ },
267
+ );
268
+ return {
269
+ phase: 'post-test-writer',
270
+ next_action: 'spawn-executor',
271
+ forced: g.force,
272
+ tdd_tests: merged.nubosloop && Array.isArray(merged.nubosloop.tdd_tests) ? merged.nubosloop.tdd_tests : [],
273
+ round: merged.nubosloop ? merged.nubosloop.round : g.round,
274
+ };
275
+ }
276
+
163
277
  function _runPostExecutor(taskId, list, cwd) {
164
278
  const verifyExitCode = args.getFlag(list, '--verify-exit-code');
165
279
  if (verifyExitCode === undefined) {
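The `--tests` handling in this hunk combines two small steps: splitting a comma-separated flag value into trimmed paths, then appending only unseen paths to whatever the checkpoint already holds. The sketch below restates that logic as two standalone functions so it can be run in isolation. The names `parseTestPaths` and `mergeTestPaths` are illustrative; the diff keeps the merge inline in a callback.

```javascript
// Split "a, b, c" into ['a', 'b', 'c']; non-strings and blank input yield [].
function parseTestPaths(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return [];
  return raw.split(',').map((s) => s.trim()).filter((s) => s.length > 0);
}

// Append incoming paths to the prior list, preserving order and skipping
// anything already recorded. The prior array is not mutated.
function mergeTestPaths(prior, incoming) {
  const seen = new Set(prior);
  const merged = prior.slice();
  for (const p of incoming) {
    if (!seen.has(p)) {
      seen.add(p);
      merged.push(p);
    }
  }
  return merged;
}

const prior = ['tests/a.test.ts'];
const next = mergeTestPaths(prior, parseTestPaths('tests/a.test.ts, tests/b.test.ts, ,'));
console.log(next);
```

Because the merge is order-preserving and idempotent, repeating the phase with the same `--tests` value leaves the recorded list unchanged.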
@@ -557,7 +671,7 @@ async function run(argv, ctx) {
557
671
  if (!phase) {
558
672
  throw new NubosPilotError(
559
673
  'loop-run-round-missing-phase',
560
- 'loop-run-round requires --phase <preflight|post-researcher|post-executor|post-critics|commit|stuck>',
674
+ 'loop-run-round requires --phase <preflight|post-researcher|post-architect|post-test-writer|post-executor|post-critics|commit|stuck>',
561
675
  { hint: 'each phase corresponds to a non-LLM transition between LLM spawns' },
562
676
  );
563
677
  }
@@ -572,9 +686,11 @@ async function run(argv, ctx) {
572
686
  const tail = list.slice(1);
573
687
  let result;
574
688
  switch (phase) {
575
- case 'preflight': result = await _runPreflight(taskId, tail, cwd); break;
576
- case 'post-researcher': result = _runPostResearcher(taskId, tail, cwd); break;
577
- case 'post-executor': result = _runPostExecutor(taskId, tail, cwd); break;
689
+ case 'preflight': result = await _runPreflight(taskId, tail, cwd); break;
690
+ case 'post-researcher': result = _runPostResearcher(taskId, tail, cwd); break;
691
+ case 'post-architect': result = _runPostArchitect(taskId, tail, cwd); break;
692
+ case 'post-test-writer': result = _runPostTestWriter(taskId, tail, cwd); break;
693
+ case 'post-executor': result = _runPostExecutor(taskId, tail, cwd); break;
578
694
  case 'post-critics': result = _runPostCritics(taskId, tail, cwd); break;
579
695
  case 'commit': result = _runCommit(taskId, tail, cwd); break;
580
696
  case 'stuck': result = _runStuck(taskId, tail, cwd); break;
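The `next_action` hint emitted by the post-researcher and post-architect phases depends on which prep steps are enabled: the first enabled step in order wins, and the executor is the fallback. The sketch below reproduces that selection over a plain config object. `readFlag` is a hypothetical stand-in for the package's config lookup, and treating an absent key as `true` is an assumption based on the `true` default passed to `tryReadConfigPath` in the diff and on the expectations of tests LCLI-RR-2g and LCLI-RR-2h.

```javascript
// Hypothetical stand-in for a config lookup: walk a dotted path and
// return `fallback` when any segment is missing.
function readFlag(cfg, dottedPath, fallback) {
  let cur = cfg;
  for (const key of dottedPath.split('.')) {
    if (cur === null || typeof cur !== 'object' || !(key in cur)) return fallback;
    cur = cur[key];
  }
  return cur;
}

// Return the action of the first enabled step, or the executor when none is.
function nextEnabled(cfg, steps) {
  for (const [flagPath, action] of steps) {
    if (readFlag(cfg, flagPath, true)) return action;
  }
  return 'spawn-executor';
}

const AFTER_RESEARCHER = [
  ['agents.architect', 'spawn-architect'],
  ['agents.test_writer', 'spawn-test-writer'],
];
const AFTER_ARCHITECT = [['agents.test_writer', 'spawn-test-writer']];

console.log(nextEnabled({}, AFTER_RESEARCHER));
console.log(nextEnabled({ agents: { architect: false } }, AFTER_RESEARCHER));
console.log(nextEnabled({ agents: { architect: false, test_writer: false } }, AFTER_RESEARCHER));
```

Expressing each transition as an ordered list of `[flag, action]` pairs keeps the skip logic in one place, so adding another gated prep step would only extend the lists.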
@@ -15,6 +15,14 @@ const BUILD_FIXER_AGENT = 'np-build-fixer';
15
15
 
16
16
  const RESEARCHER_AGENT = 'np-researcher';
17
17
 
18
+ // Round-1 preparation agents that run between the researcher swarm and the
19
+ // executor. Config-gated (agents.architect / agents.test_writer). They get
20
+ // Layer-C spawn-evidence stamps but are NOT Rule-9 audited (not in
21
+ // AUDITED_AGENTS) — the architect is advisory/read-only and the test-writer's
22
+ // quality is enforced downstream by the np-critic-tests axis.
23
+ const TASK_ARCHITECT_AGENT = 'np-task-architect';
24
+ const TEST_WRITER_AGENT = 'np-test-writer';
25
+
18
26
  const AUDITED_AGENTS = Object.freeze([
19
27
  EXECUTOR_AGENT,
20
28
  BUILD_FIXER_AGENT,
@@ -29,5 +37,7 @@ module.exports = {
29
37
  EXECUTOR_AGENT,
30
38
  BUILD_FIXER_AGENT,
31
39
  RESEARCHER_AGENT,
40
+ TASK_ARCHITECT_AGENT,
41
+ TEST_WRITER_AGENT,
32
42
  AUDITED_AGENTS,
33
43
  };
@@ -242,6 +242,8 @@ const NP_AGENTS = [
242
242
  { file: 'np-researcher-reconciler', expected_tier: 'sonnet' },
243
243
  { file: 'np-codebase-documenter', expected_tier: 'sonnet' },
244
244
  { file: 'np-architect', expected_tier: 'sonnet' },
245
+ { file: 'np-task-architect', expected_tier: 'sonnet' },
246
+ { file: 'np-test-writer', expected_tier: 'sonnet' },
245
247
  { file: 'np-build-fixer', expected_tier: 'sonnet' },
246
248
  { file: 'np-security-reviewer', expected_tier: 'sonnet' },
247
249
  { file: 'np-nyquist-auditor', expected_tier: 'haiku' },
@@ -18,6 +18,14 @@ const DEFAULT_AGENTS = Object.freeze({
18
18
  research: true,
19
19
  plan_checker: true,
20
20
  verifier: true,
21
+ // Per-task architecture step in the Nubosloop (np-task-architect). Runs in
22
+ // round 1 after the researcher swarm, before the test-writer and executor.
23
+ // Backfilled to true on install/update when absent (see bin/install.js).
24
+ architect: true,
25
+ // Per-task TDD step (np-test-writer). Runs in round 1 after the architect,
26
+ // before the executor — writes the tests the executor must make green.
27
+ // Backfilled to true on install/update when absent (see bin/install.js).
28
+ test_writer: true,
21
29
  // Economy axis level (off|lite|full|ultra). Default `lite` = prevention-first:
22
30
  // the climb-the-ladder discipline is on, the in-loop critic is opt-in (full/ultra).
23
31
  // Resolved via lib/economy-mode.cjs; legacy `economy_critic` bool still honoured.
@@ -33,6 +33,8 @@ const SCHEMA = Object.freeze({
33
33
  research: { type: 'boolean', optional: true },
34
34
  plan_checker: { type: 'boolean', optional: true },
35
35
  verifier: { type: 'boolean', optional: true },
36
+ architect: { type: 'boolean', optional: true },
37
+ test_writer: { type: 'boolean', optional: true },
36
38
  economy: { type: 'enum', values: VALID_ECONOMY_MODES, optional: true },
37
39
  economy_critic: { type: 'boolean', optional: true },
38
40
  },
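The comments on the new `architect` and `test_writer` defaults say the flags are backfilled to `true` on install/update when absent, and point to `bin/install.js`, which is not shown in this section. The sketch below illustrates one way such a backfill could behave. It is an assumption about the behaviour, not the installer's code, and `backfillAgents` and `PREP_AGENT_DEFAULTS` are hypothetical names.

```javascript
// Hypothetical defaults mirroring the two flags added to DEFAULT_AGENTS.
const PREP_AGENT_DEFAULTS = Object.freeze({ architect: true, test_writer: true });

// Return a copy of the config in which missing prep-agent flags are set to
// their defaults. An explicit boolean, including false, is left untouched.
function backfillAgents(cfg) {
  const agents = Object.assign({}, cfg && cfg.agents);
  for (const [key, value] of Object.entries(PREP_AGENT_DEFAULTS)) {
    if (typeof agents[key] !== 'boolean') agents[key] = value;
  }
  return Object.assign({}, cfg, { agents });
}

console.log(backfillAgents({ agents: { research: true, architect: false } }));
```

The important property is that a user who has explicitly disabled a step keeps that choice across updates; only absent keys are filled in.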
package/lib/git.cjs CHANGED
@@ -58,7 +58,7 @@ function assertCommittablePaths(paths, opts) {
58
58
  return committable;
59
59
  }
60
60
 
61
- function commitTask(taskId, files, message) {
61
+ function commitTask(taskId, files, message, body) {
62
62
  const { committable, ignored } = classifyCommittablePaths(files);
63
63
  if (committable.length === 0) {
64
64
  if (ignored.length > 0) {
@@ -84,7 +84,11 @@ function commitTask(taskId, files, message) {
84
84
  });
85
85
  }
86
86
  execFileSync('git', ['add', '--', ...committable], { stdio: 'pipe', timeout: GIT_TIMEOUT_MS });
87
- execFileSync('git', ['commit', '-m', message, '--', ...committable], { stdio: 'pipe', timeout: GIT_TIMEOUT_MS });
87
+ const commitArgs = ['commit', '-m', message];
88
+ if (typeof body === 'string' && body.trim().length > 0) {
89
+ commitArgs.push('-m', body);
90
+ }
91
+ execFileSync('git', [...commitArgs, '--', ...committable], { stdio: 'pipe', timeout: GIT_TIMEOUT_MS });
88
92
  return {
89
93
  committed: true,
90
94
  files_committed: committable.slice(),
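The change to `commitTask` adds an optional body by passing a second `-m` to `git commit`; git joins multiple `-m` values as separate paragraphs, so the first becomes the subject and the second the body. The sketch below isolates the argument construction as a pure function so the empty-body case can be checked without a repository. `buildCommitArgs` is an illustrative name, not a function in the package.

```javascript
// Build the argv for `git commit`, adding a second -m only when the body
// contains something other than whitespace.
function buildCommitArgs(message, body, files) {
  const args = ['commit', '-m', message];
  if (typeof body === 'string' && body.trim().length > 0) {
    args.push('-m', body);
  }
  // `--` separates options from the pathspec so file names are never
  // parsed as flags.
  return [...args, '--', ...files];
}

console.log(buildCommitArgs('task(M006-S001-T0001): add git helper', 'Task: M006-S001-T0001', ['lib/git.cjs']));
console.log(buildCommitArgs('task(M006-S001-T0001): add git helper', '   ', ['lib/git.cjs']));
```

Omitting the second `-m` for blank input keeps three-argument callers producing the same commit as before, which is what test GIT-5c checks.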
package/lib/git.test.cjs CHANGED
@@ -199,6 +199,34 @@ test('GIT-5: commitTask creates a single commit containing exactly the supplied
199
199
  });
200
200
  });
201
201
 
202
+ test('GIT-5b: commitTask attaches a multi-line body via a second -m when body is supplied', () => {
203
+ const root = makeRepo();
204
+ inRepo(root, () => {
205
+ writeFile(root, 'lib/git.cjs', '// stub');
206
+ git.commitTask(
207
+ 'M006-S001-T0001',
208
+ ['lib/git.cjs'],
209
+ 'task(M006-S001-T0001): add git helper',
210
+ 'Implements the git helper.\n\nTask: M006-S001-T0001',
211
+ );
212
+ const subject = execFileSync('git', ['log', '-n', '1', '--format=%s'], { encoding: 'utf-8' }).trim();
213
+ const fullBody = execFileSync('git', ['log', '-n', '1', '--format=%b'], { encoding: 'utf-8' });
214
+ assert.equal(subject, 'task(M006-S001-T0001): add git helper');
215
+ assert.match(fullBody, /Implements the git helper\./);
216
+ assert.match(fullBody, /Task: M006-S001-T0001/);
217
+ });
218
+ });
219
+
220
+ test('GIT-5c: commitTask omits the body -m when body is empty/whitespace (backward-compatible)', () => {
221
+ const root = makeRepo();
222
+ inRepo(root, () => {
223
+ writeFile(root, 'lib/git.cjs', '// stub');
224
+ git.commitTask('M006-S001-T0001', ['lib/git.cjs'], 'task(M006-S001-T0001): add git helper', ' ');
225
+ const fullBody = execFileSync('git', ['log', '-n', '1', '--format=%b'], { encoding: 'utf-8' }).trim();
226
+ assert.equal(fullBody, '');
227
+ });
228
+ });
229
+
202
230
  test('GIT-6: findCommitByTaskId returns 40-char SHA for known task commit', () => {
203
231
  const root = makeRepo();
204
232
  inRepo(root, () => {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "nubos-pilot",
3
- "version": "1.3.3",
3
+ "version": "1.3.4",
4
4
  "description": "Self-hosted AI pilot for any codebase. Researcher and critic agents plan, execute and verify each change.",
5
5
  "homepage": "https://pilot.nubos.cloud",
6
6
  "repository": {
@@ -79,7 +79,33 @@ Examples:
79
79
  -->
80
80
  - _TBD — fill with logging policy._
81
81
 
82
- ## Code Style
82
+ ## Conventions
83
+
84
+ > **How your code must be built.** These rules bind the architect (`np-task-architect`),
85
+ > the test-writer (`np-test-writer`), the executor, and the style critic
86
+ > (`np-critic-style`). They are read on every task. Each subsection is **MUST FILL** —
87
+ > use `- _none — <reason>_` only when a subsection genuinely does not apply.
88
+
89
+ ### Class / Module Structure
90
+
91
+ <!-- How classes, modules, and units are shaped. Examples:
92
+ - One public class per file; file name matches the class name.
93
+ - Constructor injection only — no service-locator / static singletons.
94
+ - Business logic lives in Services/Actions; controllers stay thin (no DB access).
95
+ - Public surface ≤ 5 methods; split when it grows past that.
96
+ -->
97
+ - _TBD — fill with class/module construction rules._
98
+
99
+ ### Naming
100
+
101
+ <!-- Identifier conventions. Examples:
102
+ - Classes PascalCase, methods camelCase, constants UPPER_SNAKE.
103
+ - Booleans read as predicates (`isActive`, `hasAccess`), never `flag`/`status`.
104
+ - Table names follow the framework's pluralization — never override.
105
+ -->
106
+ - _TBD — fill with naming conventions._
107
+
108
+ ### Code Style
83
109
 
84
110
  <!-- Format/lint/comment policy. Examples:
85
111
  - No comments inside source — names + tests carry intent.
@@ -88,6 +114,15 @@ Examples:
88
114
  -->
89
115
  - _TBD — fill with style policy._
90
116
 
117
+ ### Patterns & Paradigms
118
+
119
+ <!-- Architectural patterns that are required or banned. Examples:
120
+ - Required: Repository pattern for all persistence; Result objects over exceptions for control flow.
121
+ - Banned: anemic domain models; inheritance for code reuse (prefer composition).
122
+ - Errors are typed and surfaced — never swallowed or stringly-typed.
123
+ -->
124
+ - _TBD — fill with required/banned patterns._
125
+
91
126
  ## Out-of-Scope (Forever)
92
127
 
93
128
  <!-- Things this project explicitly will never do. Distinct from deferred ideas.
@@ -219,6 +219,10 @@ SPAWN_HEADLESS_ENABLED=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.hea
219
219
  SPAWN_HEADLESS_AGENTS=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.agents 2>/dev/null || echo '["np-critic","np-researcher"]')
220
220
  SPAWN_HEADLESS_FALLBACK=$(node .nubos-pilot/bin/np-tools.cjs config-get spawn.headless.fallback_on_error 2>/dev/null || echo true)
221
221
  CONF_INJECT_CRITERIA=$(node .nubos-pilot/bin/np-tools.cjs config-get conformance.inject_criteria 2>/dev/null || echo true)
222
+ # Round-1 prep agents (default on; backfilled on install/update). When a toggle
223
+ # is false, the matching ACTION CONTRACT (Step 2b / Step 2c) is skipped wholesale.
224
+ ARCHITECT_ENABLED=$(node .nubos-pilot/bin/np-tools.cjs config-get agents.architect 2>/dev/null || echo true)
225
+ TEST_WRITER_ENABLED=$(node .nubos-pilot/bin/np-tools.cjs config-get agents.test_writer 2>/dev/null || echo true)
222
226
  # Milestone success_criteria as the executor's acceptance target (rendered once from the INIT payload).
223
227
  # Intent-level only (ADR-0019): these describe what "done right" means, NOT how to build it.
224
228
  SUCCESS_CRITERIA_BLOCK=$(echo "$INIT" | node -e 'process.stdin.on("data",d=>{try{const c=JSON.parse(d).success_criteria||[];console.log(c.map(x=>"- "+(x.id?x.id+": ":"")+(x.text||x)).join("\n"))}catch(e){console.log("")}})')
@@ -414,6 +418,131 @@ for WAVE_INDEX in 0 1 2 ...; do
414
418
  CONSENSUS_PATTERN=""
415
419
  fi
416
420
 
421
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
422
+ # ACTION CONTRACT — Step 2b: Per-Task Architect (Round 1, config-gated)
423
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
424
+ # WHEN: $ROUND -eq 1 AND $ARCHITECT_ENABLED = true. Skip wholesale otherwise
425
+ # (agents.architect=false → no architect this run; R≥2 build-fixer rounds
426
+ # never run it).
427
+ # SKIP-GUARD: loop-post-architect-missing-spawn-audit (needs 1 architect audit).
428
+ #
429
+ # Execute EXACTLY these three groups, in order:
430
+ #
431
+ # (1) ONE Agent tool-call (real, not bash):
432
+ # Agent(subagent_type="np-task-architect", prompt=<…>)
433
+ # Prompt fields:
434
+ # <files_to_read>: task plan, slice plan, CONTEXT.md, RULES.md,
435
+ # M<NNN>-ARCHITECTURE.md (if present), .nubos-pilot/codebase/INDEX.md
436
+ # <consensus_pattern>: $CONSENSUS_PATTERN (researcher output; may be empty)
437
+ # <lang_directive>: $LANG_DIRECTIVE
438
+ # Curated skills (quality bar) — instruct the agent to Read each that
439
+ # applies from .claude/skills/<skill>/SKILL.md: np-system-design,
440
+ # np-service-boundary, np-api-design, np-composition-patterns,
441
+ # np-error-handling, np-adr (only for a costly-to-reverse choice).
442
+ # The agent is READ-ONLY: it emits its Task-Architecture spec as its FINAL
443
+ # MESSAGE (markdown per its Output Contract). Write that message verbatim
444
+ # to "$ARCH_CONSTRAINTS_PATH".
445
+ #
446
+ # (2) ONE Bash audit-stamp (same round) — architect is NOT Rule-9 audited,
447
+ # so an empty tool-use log is correct:
448
+ # node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
449
+ # --agent np-task-architect --tool-use-log '[]'
450
+ #
451
+ # (3) ONE Bash advance:
452
+ # node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
453
+ # --phase post-architect
454
+ #
455
+ # $ARCH_CONSTRAINTS is injected as <architecture_constraints> into the
456
+ # test-writer (Step 2c) AND executor (Step 3) prompts.
457
+ #
458
+ # Rationale: ADR-0023 — a per-task structural pass before tests/code so the
459
+ # test-writer and executor build against a decided shape, honouring RULES.md
460
+ # Conventions. Ephemeral ($TMPDIR, never committed) → plan-lint untouched.
461
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
462
+ ARCH_CONSTRAINTS=""
463
+ ARCH_CONSTRAINTS_PATH="${TMPDIR:-/tmp}/np-arch-${TASK_ID}.md"
464
+ if [ "$ROUND" -eq 1 ] && [ "$ARCHITECT_ENABLED" = "true" ]; then
465
+ # Off-host (ADR-0021): np-task-architect is read-only (Read/Grep/Glob), not
466
+ # Rule-9 audited, writes no files — run via spawn-offhost with default cwd
467
+ # when it routes to an openai-compat provider; its spec returns as the
468
+ # spawn's final message (content).
469
+ ARCHITECT_KIND=$(node .nubos-pilot/bin/np-tools.cjs resolve-model np-task-architect --json 2>/dev/null \
470
+ | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{console.log(JSON.parse(s).kind||"native")}catch{console.log("native")}})')
471
+ if [ "$ARCHITECT_KIND" = "openai-compat" ]; then
472
+ A_PROMPT="${TMPDIR:-/tmp}/np-offhost-task-architect-${TASK_ID}.md"
473
+ # … render the files_to_read block + consensus + skills + $LANG_DIRECTIVE into "$A_PROMPT" …
474
+ A_OUT=$(node .nubos-pilot/bin/np-tools.cjs spawn-offhost \
475
+ --agent np-task-architect --task-file "$A_PROMPT" --task-id "$TASK_ID" \
476
+ --read-only --no-audit ${SLICE_CWD:+--cwd "$SLICE_CWD"})
477
+ echo "$A_OUT" | ARCH_PATH="$ARCH_CONSTRAINTS_PATH" node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{let c="";try{c=JSON.parse(s).content||""}catch{}require("fs").writeFileSync(process.env.ARCH_PATH,c)})'
478
+ else
479
+ true # → execute group (1): native Agent spawn; write its final message to "$ARCH_CONSTRAINTS_PATH".
480
+ fi
481
+ node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-task-architect --tool-use-log '[]'
482
+ node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" --phase post-architect
483
+ [ -f "$ARCH_CONSTRAINTS_PATH" ] && ARCH_CONSTRAINTS=$(cat "$ARCH_CONSTRAINTS_PATH")
484
+ fi
485
+
486
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
487
+ # ACTION CONTRACT — Step 2c: Test-Writer / TDD (Round 1, config-gated)
488
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
489
+ # WHEN: $ROUND -eq 1 AND $TEST_WRITER_ENABLED = true. Runs AFTER the architect,
490
+ # BEFORE the executor. Skip wholesale otherwise.
491
+ # SKIP-GUARD: loop-post-test-writer-missing-spawn-audit (needs 1 test-writer audit).
492
+ #
493
+ # Execute EXACTLY these three groups, in order:
494
+ #
495
+ # (1) ONE Agent tool-call (real, not bash):
496
+ # Agent(subagent_type="np-test-writer", prompt=<…>)
497
+ # Prompt fields:
498
+ # <files_to_read>: task plan, slice plan, RULES.md, neighbouring tests
499
+ # <architecture_constraints>: $ARCH_CONSTRAINTS (the architect's required
500
+ # test surfaces; empty when the architect is disabled)
501
+ # <success_criteria>: $SUCCESS_CRITERIA_BLOCK + slice UAT path (intent-level)
502
+ # <lang_directive>: $LANG_DIRECTIVE
503
+ # Curated skill (quality bar) — instruct the agent to Read
504
+ # .claude/skills/np-test-strategy/SKILL.md and satisfy its Verification bar.
505
+ # RULES — the agent writes REAL, VALID test files for every required surface;
506
+ # it MUST NOT skip/stub/weaken assertions (Rule 10). Tests MAY be red now;
507
+ # the executor makes them green. The agent emits a JSON envelope whose
508
+ # tests_written paths you collect into $TDD_TESTS.
509
+ #
510
+ # (2) ONE Bash audit-stamp (same round) — test-writer is NOT Rule-9 audited:
511
+ # node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" \
512
+ # --agent np-test-writer --tool-use-log '[]'
513
+ #
514
+ # (3) ONE Bash advance — pass the written test paths so they are recorded in
515
+ # the checkpoint (nubosloop.tdd_tests) and commit-task folds them into the
516
+ # commit even when files_modified did not enumerate them:
517
+ # node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" \
518
+ # --phase post-test-writer --tests "$TDD_TESTS"
519
+ #
520
+ # Rationale: ADR-0023 — TDD inside the loop. The mechanical verify gate
521
+ # (Step 4) runs only AFTER the executor, so red-until-executor is expected
522
+ # and not a failure. The np-critic-tests axis (Step 5) re-audits for any
523
+ # skipped/vacuous assertions that slipped through.
524
+ # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
525
+ TDD_TESTS=""
526
+ if [ "$ROUND" -eq 1 ] && [ "$TEST_WRITER_ENABLED" = "true" ]; then
527
+ # Off-host (ADR-0021): np-test-writer writes test files, so off-host needs
528
+ # worktree isolation exactly like the executor (model-driven Write confined
529
+ # + ff-merged back). When worktree isolation is off, it runs native.
530
+ TEST_WRITER_KIND=$(node .nubos-pilot/bin/np-tools.cjs resolve-model np-test-writer --json 2>/dev/null \
531
+ | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{console.log(JSON.parse(s).kind||"native")}catch{console.log("native")}})')
532
+ if [ "$TEST_WRITER_KIND" = "openai-compat" ] && [ "$WORKTREE_ISOLATION" = "true" ] && [ -n "$SLICE_CWD" ] && [ "$SLICE_CWD" != "." ]; then
533
+ TW_PROMPT="${TMPDIR:-/tmp}/np-offhost-test-writer-${TASK_ID}.md"
534
+ # … render files_to_read + architecture_constraints + success_criteria + skill + $LANG_DIRECTIVE into "$TW_PROMPT" …
535
+ TW_OUT=$(node .nubos-pilot/bin/np-tools.cjs spawn-offhost \
536
+ --agent np-test-writer --task-file "$TW_PROMPT" --task-id "$TASK_ID" \
537
+ --cwd "$SLICE_CWD" --allow-bash --no-audit)
538
+ TDD_TESTS=$(echo "$TW_OUT" | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{const j=JSON.parse(JSON.parse(s).content||"{}");console.log((j.tests_written||[]).join(", "))}catch{console.log("")}})')
539
+ else
540
+ true # → execute group (1): native Agent spawn; collect tests_written from the envelope into $TDD_TESTS.
541
+ fi
542
+ node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use "$TASK_ID" --agent np-test-writer --tool-use-log '[]'
543
+ node .nubos-pilot/bin/np-tools.cjs loop-run-round "$TASK_ID" --phase post-test-writer --tests "$TDD_TESTS"
544
+ fi
545
+
417
546
  # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
418
547
  # ACTION CONTRACT — Step 3: Executor (R1) / Build-Fixer (R≥2)
419
548
  # ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -424,6 +553,13 @@ for WAVE_INDEX in 0 1 2 ...; do
424
553
  # Prompt fields:
425
554
  # <files_to_read>: task plan, slice plan, prior slice SUMMARYs, CONTEXT.md
426
555
  # <consensus_pattern>: $CONSENSUS_PATTERN (with [VERIFIED]/[PROVISIONAL]/[CACHED])
556
+ # <architecture_constraints>: $ARCH_CONSTRAINTS — the per-task architect's
557
+ # decided structure + constraints (empty when agents.architect is off).
558
+ # The executor builds against this shape; it is intent-level, not a code spec.
559
+ # <tdd_tests>: $TDD_TESTS — test files np-test-writer wrote (R1, empty when off).
560
+ # The executor MUST make them green WITHOUT deleting, skipping, or weakening
561
+ # any assertion. They are in scope alongside files_modified (recorded in the
562
+ # checkpoint at post-test-writer) and commit-task commits them with the diff.
427
563
  # <success_criteria>: when $CONF_INJECT_CRITERIA = true, include the milestone
428
564
  # acceptance target — $SUCCESS_CRITERIA_BLOCK plus the slice UAT path
429
565
  # (.nubos-pilot/milestones/M<NNN>/slices/S<NNN>/S<NNN>-UAT.md). Frame it as
@@ -437,7 +573,8 @@ for WAVE_INDEX in 0 1 2 ...; do
437
573
  # ultra) instruct the agent to APPLY the np-executor "Climb the ladder"
438
574
  # discipline before writing (prevention-first). When $ECONOMY_MODE = off,
439
575
  # instruct it to SKIP the ladder (no economy pressure this run).
440
- # RULES — Agent MUST: edit ONLY paths in files_modified (D-04 scope guard) —
576
+ # RULES — Agent MUST: edit ONLY paths in files_modified plus the <tdd_tests>
577
+ # paths (D-04 scope guard; the TDD tests are the sole sanctioned addition) —
441
578
  # success_criteria are the acceptance target, NEVER a licence to touch other files,
442
579
  # run `node np-tools.cjs knowledge-search "<q>" --task $TASK_ID` via Bash
443
580
  # ≥1× (Rule 9 — the --task flag writes the audit evidence ledger),
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  command: np:plan-phase
3
3
  description: Plans a milestone (M<NNN>) — breaks it into slices (waves) and tasks. Spawns np-planner (opus) + np-plan-checker (opus), 2-iteration verification, then scaffolds every task file.
4
- argument-hint: <milestone-number> [--research] [--repromote]
4
+ argument-hint: <milestone-number> [--research] [--architect] [--repromote]
5
5
  ---
6
6
 
7
7
  # np:plan-phase
@@ -41,17 +41,19 @@ Output layout:
41
41
  ```bash
42
42
  PHASE=""
43
43
  RESEARCH_FLAG=0
44
+ ARCHITECT_FLAG=0
44
45
  REPROMOTE_FLAG=0
45
46
  for arg in "$@"; do
46
47
  case "$arg" in
47
48
  --research) RESEARCH_FLAG=1 ;;
49
+ --architect) ARCHITECT_FLAG=1 ;;
48
50
  --repromote) REPROMOTE_FLAG=1 ;;
49
51
  --*) echo "Unknown flag: $arg" >&2; exit 2 ;;
50
52
  *) [[ -z "$PHASE" ]] && PHASE="$arg" ;;
51
53
  esac
52
54
  done
53
55
  if [[ -z "$PHASE" ]]; then
54
- echo "Usage: /np:plan-phase <milestone-number> [--research] [--repromote]" >&2
56
+ echo "Usage: /np:plan-phase <milestone-number> [--research] [--architect] [--repromote]" >&2
55
57
  exit 2
56
58
  fi
57
59
  ```
@@ -154,6 +156,19 @@ fi
154
156
 
155
157
  **Exit code 42 contract:** orchestrator sees exit 42 → runs `/np:research-phase $PHASE` → re-enters `/np:plan-phase $PHASE` without the `--research` flag.
156
158
 
159
+ ### Gate 2b — Optional architecture pass (`--architect`)
160
+
161
+ The `--architect` flag auto-dispatches `/np:architect-phase` before planning, so a structural ADR pass (`M<NNN>-ARCHITECTURE.md`) is decided up front and the planner consumes it as an extension of CONTEXT.md. The dispatch happens AFTER research (the established flow is research → architect → plan): when both flags are set, research dispatches first, its re-entry strips `--research`, and `--architect` dispatches on the next pass.
162
+
163
+ ```bash
164
+ if [[ "$ARCHITECT_FLAG" == "1" ]]; then
165
+ echo "architect-auto: dispatching /np:architect-phase $PHASE before planning" >&2
166
+ exit 43
167
+ fi
168
+ ```
169
+
170
+ **Exit code 43 contract:** orchestrator sees exit 43 → runs `/np:architect-phase $PHASE` → re-enters `/np:plan-phase $PHASE` without the `--architect` flag. The milestone `np-architect` stays intent-level (ADR-0019): its decisions inform the plan; they do not bake schema/filenames/code-style into `PLAN.md`.
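
The two exit-code contracts (42 for research, 43 for architect) can be read as one small re-entry loop on the orchestrator side. The following is a minimal sketch of that loop, not the actual orchestrator: `plan_phase`, `research_phase`, `architect_phase` and `dispatch_plan` are hypothetical shell stand-ins for the slash commands, and `plan_phase` only simulates the gate exit codes described above.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins: the real /np:* slash commands are not shell
# binaries. plan_phase only simulates the gate exit codes (42, then 43).
plan_phase() {
  local arg
  # Mirror the gate order: research (42) is checked before architect (43).
  for arg in "$@"; do
    if [ "$arg" = "--research" ]; then return 42; fi
  done
  for arg in "$@"; do
    if [ "$arg" = "--architect" ]; then return 43; fi
  done
  return 0
}
research_phase()  { echo "research $1"; }
architect_phase() { echo "architect $1"; }

# Run plan_phase; on exit 42 or 43 dispatch the matching phase, drop the
# flag that triggered it, and re-enter until planning proceeds.
dispatch_plan() {
  local phase="$1" rc drop f
  shift
  while :; do
    rc=0
    plan_phase "$phase" "$@" || rc=$?
    case "$rc" in
      42) research_phase "$phase";  drop="--research" ;;
      43) architect_phase "$phase"; drop="--architect" ;;
      *)  echo "plan $phase (exit $rc)"; return "$rc" ;;
    esac
    # Rebuild "$@" without the flag that was just handled.
    for f in "$@"; do
      shift
      if [ "$f" != "$drop" ]; then set -- "$@" "$f"; fi
    done
  done
}
```

Under these assumptions, `dispatch_plan 7 --research --architect` runs research first, then the architect pass, then planning, which matches the research → architect → plan order stated above.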
171
+
157
172
  **Researcher-Schwarm semantics (ADR-0011).** The dispatched `/np:research-phase` runs in Schwarm mode by default (`swarm.research.k=3`). The cache-bypass at Pre-flight short-circuits the swarm whenever the milestone goal + requirements match a stored learning at similarity ≥ `swarm.research.threshold` and `occurrence ≥ swarm.research.minOccurrence`. The merged consensus carries a `<consensus_meta>` block (`k`, `agreement_score`, `flagged_decisions`) which `np-plan-checker` reads to weight downstream verdicts. No additional flags needed at this site — the swarm runs automatically when `--research` is set.
158
173
 
159
174
  ### Gate 3 — Milestone already planned