npm - @nusoft/nuos-build-catalogue - Versions diffs - 0.23.0 → 0.25.0 - Mend

@nusoft/nuos-build-catalogue 0.23.0 → 0.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/dist/commands/init.js +2 -0
package/package.json +1 -1
package/templates/agents/coder.md +2 -0
package/templates/agents/reviewer.md +4 -0
package/templates/agents/tester.md +4 -2
package/templates/protocols/build-wu.md +31 -0
package/templates/protocols/plan-initial-wu.md +14 -2
package/templates/protocols/plan-review.md +134 -0
package/templates/starter-kit/methodfile.json +9 -0
package/templates/testing/README.md +21 -0
package/templates/testing/example.test.ts +7 -0
package/templates/testing/vitest.config.ts +16 -0

package/dist/commands/init.js CHANGED

@@ -43,6 +43,7 @@ const PROTOCOL_FILES = [
     'plan-uiux.md',
     'plan-maps.md',
     'plan-initial-wu.md',
+    'plan-review.md',
     'build-wu.md',
 ];
 /**
@@ -60,6 +61,7 @@ const PROTOCOL_DESCRIPTIONS = {
     'plan-uiux': 'Phase C of planning — enumerate every surface and build the complete design system',
     'plan-maps': 'Phase D of planning — map the journey from here to done with phases and near-term plan',
     'plan-initial-wu': 'Phase E of planning — file the first 5–10 work units ordered by dependency',
+    'plan-review': 'End-to-end planning review — surfaces gaps, inconsistencies, and optimisations before building starts',
     'build-wu': 'Orchestrate a swarm of agents to build one work unit end-to-end',
 };
 const TOOLS = {
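The two `init.js` edits above have to stay in lockstep: judging by the entries shown, every name in `PROTOCOL_FILES` needs a key in `PROTOCOL_DESCRIPTIONS` equal to the file name without `.md`. A minimal sketch of a check that would catch a protocol file added without its description, assuming both constants have the shapes visible in the hunk. `findUndescribedProtocols` is a hypothetical helper, not part of the package.

```typescript
// Shapes inferred from the init.js hunk: PROTOCOL_FILES lists file names,
// PROTOCOL_DESCRIPTIONS is keyed by the same names without ".md".
type ProtocolDescriptions = Record<string, string>;

// Hypothetical helper: every protocol file that has no description entry.
function findUndescribedProtocols(
  files: readonly string[],
  descriptions: ProtocolDescriptions,
): string[] {
  return files.filter((file) => {
    const key = file.replace(/\.md$/, '');
    return !Object.prototype.hasOwnProperty.call(descriptions, key);
  });
}

// A subset of the 0.25.0 entries from the diff (descriptions abbreviated).
const PROTOCOL_FILES = ['plan-initial-wu.md', 'plan-review.md', 'build-wu.md'];
const PROTOCOL_DESCRIPTIONS: ProtocolDescriptions = {
  'plan-initial-wu': 'Phase E of planning',
  'plan-review': 'End-to-end planning review',
  'build-wu': 'Orchestrate a swarm of agents to build one work unit end-to-end',
};

console.log(findUndescribedProtocols(PROTOCOL_FILES, PROTOCOL_DESCRIPTIONS)); // prints []
```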

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@nusoft/nuos-build-catalogue",
-  "version": "0.23.0",
+  "version": "0.25.0",
   "description": "NuOS build-catalogue tooling: semantic search (WU 110) + migration runner that lifts markdown artefacts into JSON-backed workflow records (WU 111, Phase G).",
   "type": "module",
   "bin": {

package/templates/agents/coder.md CHANGED

@@ -46,6 +46,8 @@ If anything in the work unit is ambiguous, **stop and surface the ambiguity to t
 5. **No comments unless WHY is non-obvious.** A hidden constraint, a workaround for a specific bug, behaviour that would surprise a reader. If removing the comment wouldn't confuse a future reader, don't write it.
+6. **Write testable code (vitest gate).** If `methodfile.json` declares `testing.framework: "vitest"` with `testing.enforced: true`, every source file you create or substantially modify in this WU must end up covered by at least one vitest test — the tester writes them, but your job is to make that cheap. Export the units the tester needs to reach; avoid burying observable logic inside untestable closures; keep side effects at the edges. Files that genuinely can't be unit-tested (pure type declarations, config glue) are fine — flag them in your notes so the reviewer doesn't treat them as drift.
 ## When you finish
 Append a brief note to the work unit's `## Notes / log` section:
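Rule 6 asks the coder to keep observable logic reachable and side effects at the edges. A minimal sketch of what that can look like; the scenario and the names `parseDueDate`, `loadDueDate` and `ReadFile` are hypothetical, chosen only for illustration. The parsing logic is a plain top-level function a vitest test could import and call directly, while the file read sits in a thin wrapper that receives its dependency as a parameter.

```typescript
// Hypothetical module for illustration: reading a due date from a WU file.

// Pure logic; in a real module this would be exported for the tester to import.
function parseDueDate(frontMatter: string): Date | null {
  const match = /^due:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$/m.exec(frontMatter);
  if (!match) return null;
  const date = new Date(`${match[1]}T00:00:00Z`);
  return Number.isNaN(date.getTime()) ? null : date;
}

// The side effect (reading a file) stays at the edge and is injected,
// so a test can pass a stub instead of touching the file system.
type ReadFile = (path: string) => string;

function loadDueDate(path: string, readFile: ReadFile): Date | null {
  return parseDueDate(readFile(path));
}

const stubRead: ReadFile = () => 'title: Example\ndue: 2025-03-01\n';
console.log(loadDueDate('wu-001.md', stubRead)?.toISOString()); // prints 2025-03-01T00:00:00.000Z
```

Injecting the reader is one way to keep side effects at the edge; a tester could equally mock the module that performs the I/O, but a parameter needs no mocking machinery at all.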

package/templates/agents/reviewer.md CHANGED

@@ -52,6 +52,10 @@ nuos-catalogue memory store --value="<the pattern and why it matters>" --wu=<han
 7. **Is jargon being introduced into user-facing copy?** If the work unit serves a non-engineer persona, the surface text should match the project's voice file. Flag anything that sounds like dev-speak in a user-facing surface.
+8. **Does the vitest gate pass (JS/TS projects)?** If `methodfile.json` declares `testing.framework: "vitest"` with `testing.enforced: true`, run both gates from [build-wu.md §Step 5.5](../protocols/build-wu.md):
+   - **Gate A:** Run `npx vitest run` (or whatever `testing.command` says) from the implementation repo root. Capture the full output. Non-zero exit → BLOCKER finding with the failing test list.
+   - **Gate B:** Compute `git diff --name-only <swarm-base>...HEAD`, filter to source files (`.ts/.tsx/.js/.jsx` under `src/`, `app/`, `routes/`, `pages/`, `lib/`, `components/`, `api/` — exclude `*.test.*`, `*.spec.*`, `*.d.ts`, configs). For each remaining file, grep the test directories for an import of that module or a colocated `*.test.*` file. Any uncovered file → BLOCKER finding naming the file. The coder may rebut by flagging files as genuinely untestable (type-only, config glue) in the WU notes — accept those rebuttals when reasonable.
 ## How you write findings
 Each finding has:
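Gate A's "BLOCKER finding with the failing test list" can be assembled from a machine-readable run instead of scraped console output. A minimal sketch, assuming the suite was run with vitest's JSON reporter (for example `npx vitest run --reporter=json --outputFile=vitest-report.json`) and that the report carries the Jest-compatible fields used below (`success`, `numTotalTests`, `testResults[].assertionResults[]`); verify those field names against the vitest version in use. `gateAFindings` and the `Finding` shape are hypothetical.

```typescript
// Subset of the Jest-compatible JSON report shape (assumed; verify per vitest version).
interface AssertionResult {
  fullName: string;
  status: string; // e.g. 'passed' | 'failed' | 'skipped'
  failureMessages: string[];
}
interface FileResult {
  name: string; // path of the test file
  assertionResults: AssertionResult[];
}
interface JsonReport {
  success: boolean;
  numTotalTests: number;
  testResults: FileResult[];
}

interface Finding {
  severity: 'BLOCKER';
  summary: string;
  failingTests: string[];
}

// No findings on success; otherwise one BLOCKER listing "file > test name" per failure.
function gateAFindings(report: JsonReport): Finding[] {
  if (report.success) return [];
  const failingTests = report.testResults.flatMap((file) =>
    file.assertionResults
      .filter((a) => a.status === 'failed')
      .map((a) => `${file.name} > ${a.fullName}`),
  );
  // A run can fail with no failing assertion (e.g. a test file that fails to import).
  const summary =
    failingTests.length > 0
      ? `Gate A failed: ${failingTests.length} of ${report.numTotalTests} tests failing`
      : 'Gate A failed: the run errored without a failing assertion (see captured output)';
  return [{ severity: 'BLOCKER', summary, failingTests }];
}

const sample: JsonReport = {
  success: false,
  numTotalTests: 3,
  testResults: [
    {
      name: '/repo/tests/due.test.ts',
      assertionResults: [
        { fullName: 'parseDueDate rejects bad dates', status: 'failed', failureMessages: ['expected null'] },
        { fullName: 'parseDueDate parses ISO dates', status: 'passed', failureMessages: [] },
      ],
    },
  ],
};
console.log(gateAFindings(sample)[0].summary); // prints Gate A failed: 1 of 3 tests failing
```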

package/templates/agents/tester.md CHANGED

@@ -39,9 +39,11 @@ nuos-catalogue memory store --value="<what you learned about testing this area>"
 3. **Failure paths matter as much as happy paths.** If the work unit's walkthrough mentions what happens when data is missing or the user makes a mistake, write a test for each.
-4. **Use the existing test idioms.** Don't introduce a new assertion library, a new fixture pattern, or a new way of mocking. If the project uses `node:test`, use it. If it uses Vitest, use Vitest.
+4. **Use the framework declared in `methodfile.json`'s `testing.framework`.** For JS/TS projects this defaults to **vitest** — the harness pre-wires it. If `testing.framework` is null or names a non-vitest runner (e.g. `pytest`), match the project's existing idiom instead. Never silently switch frameworks; if the declared framework is missing from the repo, escalate to the coordinator before writing tests.
-5. **Tests must be reproducible.** No flaky timing, no relying on order, no shared mutable state between tests unless the framework explicitly supports it.
+5. **Per-touched-file coverage is mandatory (vitest gate).** When `testing.framework` is vitest and `testing.enforced` is true, every source file the coder created or substantially modified in this WU must have at least one test file referencing it. Run `git diff --name-only <swarm-base>...HEAD` to get the list. For each `.ts/.tsx/.js/.jsx` file under `src/`, `app/`, `routes/`, `pages/`, `lib/`, `components/`, or `api/` (excluding test files, `.d.ts`, configs), write or extend a test that imports the module and exercises observable behaviour. Colocated (`foo.test.ts` next to `foo.ts`) or `tests/`-mirrored layout — match what already exists.
+6. **Tests must be reproducible.** No flaky timing, no relying on order, no shared mutable state between tests unless the framework explicitly supports it.
 ## When you run the tests
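Rule 5 accepts either layout, colocated or `tests/`-based, and says to match what already exists. A sketch of how a tester (or a helper script) might list candidate test paths for one touched source file and see which already exist, so it knows whether to extend or write. It assumes the `tests/` layout uses a matching basename (as in `build-wu.md` Step 5.5) and that tests keep the source file's extension; `candidateTestPaths` and `existingTests` are hypothetical names.

```typescript
// Candidate test locations for one source file: colocated next to it,
// or under tests/ with the same basename. Extensions follow the source.
function candidateTestPaths(sourceFile: string): string[] {
  const match = /^(.*\/)?([^/]+)\.(tsx?|jsx?)$/.exec(sourceFile);
  if (!match) return [];
  const [, dir = '', base, ext] = match;
  const kinds = ['test', 'spec'];
  return [
    ...kinds.map((k) => `${dir}${base}.${k}.${ext}`), // colocated
    ...kinds.map((k) => `tests/${base}.${k}.${ext}`), // tests/ layout
  ];
}

// Candidates already present in the repo: extend these rather than adding new files.
function existingTests(sourceFile: string, repoFiles: ReadonlySet<string>): string[] {
  return candidateTestPaths(sourceFile).filter((p) => repoFiles.has(p));
}

const repo = new Set(['src/lib/slugify.ts', 'tests/slugify.test.ts']);
console.log(existingTests('src/lib/slugify.ts', repo)); // prints [ 'tests/slugify.test.ts' ]
```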

package/templates/protocols/build-wu.md CHANGED

@@ -88,6 +88,14 @@ Use Claude Code's **Task tool**. Each spawn names the agent (`subagent_type`), t
 Omit `null` fields. If `techStack.defined` is `false` or the section is absent, note it in the swarm audit entry and suggest the operator define the stack (`/plan-orientation` or edit `methodfile.json` directly) before the next swarm run — agents generating code without a known stack default to generic patterns that often need rework.
+**Vitest pre-flight (JS/TS projects only).** Before spawning the coder, check `methodfile.json`'s `testing` block. If `testing.framework` is `vitest` and `testing.enforced` is `true`, verify the implementation repo has vitest installed and a `vitest.config.ts` (or `.js`/`.mts`) present. If either is missing, do the following before any agent runs:
+1. Surface to the operator in one line: *"Vitest is the declared test runner but isn't wired up in the implementation repo yet. I'll install it (`npm i -D vitest @vitest/coverage-v8`) and drop a minimal `vitest.config.ts` before the coder starts. OK?"*
+2. On confirmation: (a) run `npm i -D vitest @vitest/coverage-v8` in the implementation repo; (b) copy the harness's `vitest.config.ts` template into the repo root and `example.test.ts` into `tests/example.test.ts` (create the directory if missing). The templates ship inside the installed harness package at `node_modules/@nusoft/nuos-build-catalogue/templates/testing/` — read them from there; if the harness is being run from a checkout, they're at `<harness-repo>/templates/testing/`; (c) add a `"test": "vitest run"` script to the implementation repo's `package.json` if absent; (d) run `npx vitest run` once to confirm the wiring works.
+3. Record the setup in the swarm audit entry under a `## Setup` section so the next swarm doesn't repeat the check.
+If `testing.framework` is not vitest (e.g. the project is Python), skip this pre-flight — the existing "match the project's idiom" rule applies.
 **Spawn in parallel where possible.** If two agents can work independently (e.g. tester writing tests while reviewer reads design), spawn them in the same message. Sequential when an agent's output is the next agent's input (architect → coder).
 For each spawn:
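The pre-flight reduces to two questions: is `vitest` declared in the implementation repo's `package.json`, and is one of the accepted config files at the repo root? A minimal sketch, assuming the coordinator has already parsed `package.json` and listed the root directory. Treating a `devDependencies` or `dependencies` entry as "installed" is an approximation, since an entry does not prove `npm install` has run. `vitestPreflight` is a hypothetical name.

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Config names accepted by the pre-flight text: vitest.config.ts, .js or .mts.
const ACCEPTED_CONFIGS = ['vitest.config.ts', 'vitest.config.js', 'vitest.config.mts'];

interface PreflightResult {
  hasVitest: boolean;
  hasConfig: boolean;
  needsSetup: boolean;
}

function vitestPreflight(pkg: PackageJson, rootFiles: readonly string[]): PreflightResult {
  const hasVitest = Boolean(pkg.devDependencies?.vitest ?? pkg.dependencies?.vitest);
  const hasConfig = ACCEPTED_CONFIGS.some((file) => rootFiles.includes(file));
  // Either gap triggers the one-line operator prompt before any agent runs.
  return { hasVitest, hasConfig, needsSetup: !hasVitest || !hasConfig };
}

console.log(vitestPreflight({ devDependencies: { typescript: '^5.4.0' } }, ['package.json']));
// prints { hasVitest: false, hasConfig: false, needsSetup: true }
```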
@@ -102,6 +110,25 @@ When each agent returns, capture their output. Three outcomes are typical:
 - **REQUEST CHANGES** by reviewer → re-spawn coder with reviewer's findings as input. Cap at 3 retry loops; if still failing, escalate to debugger or operator.
 - **ESCALATE** (any agent surfaces an architectural issue, a design ambiguity, a need for the operator's call) → STOP the swarm. Surface the issue to the operator in plain English; do not auto-decide.
+## Step 5.5 — Run the test gate (JS/TS projects)
+If `methodfile.json` has `testing.framework: "vitest"` and `testing.enforced: true`, this gate is mandatory before the reviewer's APPROVE can stand. The reviewer is responsible for running it (see [reviewer.md](../agents/reviewer.md)), but the coordinator owns the outcome.
+**Gate A — the suite passes:** Run the command in `testing.command` (default `npx vitest run`) from the implementation repo root. The command must exit 0. Capture full output. If any test fails, the gate fails — re-spawn the coder + tester with the failure output, counted against the retry cap in Step 5.
+**Gate B — every touched source file is covered:**
+1. Compute the WU's changed files: `git diff --name-only <base>...HEAD` where `<base>` is the swarm's starting commit (recorded in the audit entry's `## Setup` section, or `HEAD~1` if none was recorded).
+2. Filter to source files: anything matching `*.ts`, `*.tsx`, `*.js`, `*.jsx` under `src/`, `app/`, `routes/`, `pages/`, `lib/`, `components/`, or `api/`. Exclude `*.test.*`, `*.spec.*`, `*.d.ts`, config files, and anything under `node_modules/`, `dist/`, `build/`, `.next/`.
+3. For each remaining file, check that at least one `.test.ts(x)` or `.spec.ts(x)` file references it. Acceptable references: (a) an `import` statement naming the file's module path, (b) a colocated `foo.test.ts` next to `foo.ts`, (c) a `tests/foo.test.ts` whose basename matches.
+4. Any source file with no matching test is a Gate B failure. Re-spawn the tester with the uncovered file list and a directive to add at least one passing test per file.
+If both gates pass, record `✓ vitest gate passed (N tests, M files covered)` in the swarm audit entry under `## Test gate` and continue to Step 6.
+If either gate fails, re-spawn agents per the retry rules in Step 5. Gate failures count against the 3-attempt cap. After the third failure, escalate to the operator in plain English: *"After three attempts the vitest gate still fails on [list]. Either the tests need redesign or the touched files genuinely shouldn't be tested (config glue, declaration files). How would you like to proceed?"*
+**Non-JS projects:** Skip this gate but note in the audit entry that the WU shipped without an enforced test gate (e.g. *"Python project — vitest gate N/A; pytest suite run separately"*).
 ## Step 6 — Record the swarm run
 Write an audit entry at `docs/build/swarm/YYYY-MM-DD-wu-<handle>.md`. Use the template at `docs/build/swarm/_template.md`. Capture:
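Steps 2 to 4 of Gate B are mechanical enough to script. A minimal sketch, assuming the coordinator already has the output of `git diff --name-only <base>...HEAD` as repo-relative paths and a map from each test file path to its contents. Several choices here are assumptions to tighten per repo: source directories are matched only at the repo root, config files are detected by a `*.config.*` name heuristic, and the import check resolves only relative specifiers, so path aliases such as `@/lib/foo` are missed. `gateB` and its helpers are hypothetical names.

```typescript
const SOURCE_DIRS = ['src', 'app', 'routes', 'pages', 'lib', 'components', 'api'];
const IGNORED_DIRS = ['node_modules', 'dist', 'build', '.next'];

function stripExt(file: string): string {
  return file.replace(/\.(tsx?|jsx?)$/, '');
}

// Step 2: source files only (assumes source dirs sit at the repo root).
function isGateBSource(file: string): boolean {
  const parts = file.split('/');
  const fileName = parts[parts.length - 1];
  if (parts.some((p) => IGNORED_DIRS.includes(p))) return false;
  if (!SOURCE_DIRS.includes(parts[0])) return false;
  if (!/\.(tsx?|jsx?)$/.test(fileName)) return false;
  if (/\.(test|spec)\./.test(fileName) || fileName.endsWith('.d.ts')) return false;
  return !/\.config\./.test(fileName); // name-based heuristic for config files
}

// Resolve a relative import specifier against the importing file's directory.
function resolveRelative(fromFile: string, specifier: string): string {
  const segments = fromFile.split('/').slice(0, -1);
  for (const part of specifier.split('/')) {
    if (part === '..') segments.pop();
    else if (part !== '.' && part !== '') segments.push(part);
  }
  return segments.join('/');
}

// Step 3: covered if some test (a) imports the module via a relative path,
// (b) is colocated as foo.test.* / foo.spec.*, or (c) lives under tests/ with the same basename.
function isCovered(source: string, tests: ReadonlyMap<string, string>): boolean {
  const modulePath = stripExt(source);
  const base = modulePath.split('/').pop();
  for (const [testPath, contents] of tests) {
    const testModule = stripExt(testPath).replace(/\.(test|spec)$/, '');
    if (testModule === modulePath) return true; // (b)
    if (testPath.startsWith('tests/') && testModule.split('/').pop() === base) return true; // (c)
    const importRe = /(?:from|import)\s*['"]([^'"]+)['"]/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(contents)) !== null) {
      const spec = m[1];
      if (spec.startsWith('.') && stripExt(resolveRelative(testPath, spec)) === modulePath) return true; // (a)
    }
  }
  return false;
}

// Step 4: the uncovered list handed to the re-spawned tester.
function gateB(changedFiles: readonly string[], tests: ReadonlyMap<string, string>): string[] {
  return changedFiles.filter(isGateBSource).filter((f) => !isCovered(f, tests));
}

const changed = ['src/lib/slugify.ts', 'src/lib/dates.ts', 'src/types.d.ts', 'vite.config.ts', 'README.md'];
const tests = new Map([['tests/slugify.test.ts', "import { slugify } from '../src/lib/slugify';"]]);
console.log(gateB(changed, tests)); // prints [ 'src/lib/dates.ts' ]
```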
@@ -213,6 +240,10 @@ For full-feature swarms (architect → coder → tester → reviewer), after the
 If anything looks misaligned, escalate to the operator before spending more tokens on the tester.
+### Vitest test gate (JS/TS projects)
+Step 5.5 above defines the gate in full. Restated as a verification rule: a JS/TS work unit cannot promote with reviewer APPROVE unless `vitest run` exits 0 AND every source file the WU touched has at least one test file referencing it. The reviewer runs the gate; the coordinator enforces the outcome. This is the load-bearing gate for code quality — drift here means shipping untested behaviour.
 ### Recording gate triggers
 Every gate trigger gets recorded in the swarm audit entry under a `## Gate triggers` section. Even if the swarm continues, the trigger is logged. This builds the audit trail for the operator to review when reasoning about whether the swarm pattern is paying off.
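Step 5.5 ties the gate to the Step 5 retry cap: a pass is recorded as `✓ vitest gate passed (N tests, M files covered)`, a Gate A failure re-spawns coder and tester, a Gate B failure re-spawns the tester, and the third failure goes to the operator. A sketch of that decision as a pure function, assuming the coordinator tracks the attempt number itself; `nextGateAction`, `GateOutcome` and `GateAction` are hypothetical names.

```typescript
interface GateOutcome {
  gateAPassed: boolean;
  uncoveredFiles: string[]; // Gate B failures
  testCount: number;
  coveredFileCount: number;
}

type GateAction =
  | { kind: 'continue'; auditLine: string }
  | { kind: 'retry'; respawn: Array<'coder' | 'tester'> }
  | { kind: 'escalate'; message: string };

const MAX_ATTEMPTS = 3;

function nextGateAction(outcome: GateOutcome, attempt: number): GateAction {
  if (outcome.gateAPassed && outcome.uncoveredFiles.length === 0) {
    return {
      kind: 'continue',
      auditLine: `✓ vitest gate passed (${outcome.testCount} tests, ${outcome.coveredFileCount} files covered)`,
    };
  }
  if (attempt >= MAX_ATTEMPTS) {
    const failing = outcome.gateAPassed
      ? outcome.uncoveredFiles
      : ['failing tests (Gate A)', ...outcome.uncoveredFiles];
    return { kind: 'escalate', message: `After three attempts the vitest gate still fails on ${failing.join(', ')}.` };
  }
  // Gate A failure re-spawns coder + tester; a Gate B failure alone re-spawns the tester.
  const respawn: Array<'coder' | 'tester'> = outcome.gateAPassed ? ['tester'] : ['coder', 'tester'];
  return { kind: 'retry', respawn };
}

const first = nextGateAction({ gateAPassed: true, uncoveredFiles: [], testCount: 42, coveredFileCount: 5 }, 1);
if (first.kind === 'continue') console.log(first.auditLine); // prints ✓ vitest gate passed (42 tests, 5 files covered)
```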

package/templates/protocols/plan-initial-wu.md CHANGED

@@ -53,10 +53,22 @@ Update each work unit file with its dependency links. Update `docs/build/work-un
 If multiple work units have no dependencies, pick the one that unblocks the most. Mark that one `🟡 in flight`; leave the others `🔵 proposed` with a note that they can start in parallel.
-## Step 4 — Close
+## Step 4 — Run the planning arc review (required before closing)
+Before Phase E can close, run the full planning arc review. This is not optional.
+**Invoke `/plan-review`** (or read `.claude/commands/plan-review.md` and follow it). The review agent reads every artefact in the catalogue, then surfaces what's missing, unclear, inconsistent, or improvable — before a single line of code is written.
+Do not proceed to Step 5 until:
+- All blocker findings are fixed or explicitly escalated to the operator
+- All other findings are either fixed, filed as open questions, or deferred with a stated reason
+The review typically takes 10–20 minutes. It is the difference between a catalogue an agent can build against coherently and one that produces drift from the first spawn.
+## Step 5 — Close
 Update STATE.md:
-- Phase E row → `✅ complete (YYYY-MM-DD)`
+- Phase E row → `✅ complete (YYYY-MM-DD)` (only after `plan-review` has completed)
 - "Active work unit" → the first `🟡 in flight` work unit handle and title
 - "What is currently in flight" → one sentence describing what the swarm will tackle first
 - Refresh "Last updated"
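The Step 4 exit condition is a checklist over the review's findings: blockers must be fixed or escalated, and everything else must be fixed, filed as an open question, or deferred with a stated reason. A minimal sketch of that check, assuming each finding has been tagged with a disposition during triage; the `ReviewFinding` shape and `unresolvedFindings` are hypothetical.

```typescript
type Disposition = 'open' | 'fixed' | 'escalated' | 'filed' | 'deferred';

interface ReviewFinding {
  id: string;
  blocker: boolean;
  disposition: Disposition;
  deferReason?: string;
}

// Findings that still prevent Step 5 (Close) from running.
function unresolvedFindings(findings: readonly ReviewFinding[]): ReviewFinding[] {
  return findings.filter((f) => {
    if (f.blocker) return f.disposition !== 'fixed' && f.disposition !== 'escalated';
    if (f.disposition === 'deferred') return !f.deferReason?.trim(); // a deferral needs a reason
    return f.disposition !== 'fixed' && f.disposition !== 'filed';
  });
}

const findings: ReviewFinding[] = [
  { id: 'F1', blocker: true, disposition: 'fixed' },
  { id: 'F2', blocker: false, disposition: 'deferred' },
  { id: 'F3', blocker: false, disposition: 'filed' },
];
console.log(unresolvedFindings(findings).map((f) => f.id)); // prints [ 'F2' ]
```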

package/templates/protocols/plan-review.md ADDED

@@ -0,0 +1,134 @@
+# plan-review
+You are running the **planning arc review** — a full end-to-end audit of everything the planning arc produced before a single line of code is written.
+This runs automatically at the end of Phase E. It can also be invoked at any point mid-project (e.g. after a significant pivot, after adding a new persona, or when something feels off) with `/plan-review`.
+By the end of this protocol:
+- Every gap, ambiguity, inconsistency, and optimisation opportunity in the planning catalogue has been surfaced
+- Each finding is either fixed immediately, filed as a Q-NNN open question, or explicitly deferred with a reason
+- The operator has confirmed the catalogue is complete enough to build against
+- Nothing unclear is hiding in the planning artefacts where an agent will silently make a wrong assumption
+---
+## Step 1 — Read the entire catalogue (do not skip anything)
+Before spawning the review agent, read every artefact produced by the planning arc:
+- `methodfile.json` — project metadata, tech stack, planning state
+- `docs/build/STATE.md` — current snapshot
+- All files in `docs/build/personas/` (not just `_index.md` — every persona file)
+- All files in `docs/build/architecture/`
+- All files in `docs/build/contracts/`
+- All files in `docs/build/ui-ux/`
+- All files in `docs/build/design-system/` (tokens, components, patterns, voice, accessibility)
+- All files in `docs/build/maps/`
+- All files in `docs/build/work-units/`
+- All files in `docs/build/decisions/`
+- `docs/build/open-questions/_index.md`
+- `docs/build/risks/_index.md`
+Also run the cross-agent memory search for any prior findings about this project:
+```bash
+nuos-catalogue memory search --query="planning gaps"
+nuos-catalogue memory search --query="open questions"
+```
+## Step 2 — Spawn the review agent
+Spawn an **architect** agent (Opus) with this exact brief:
+> You are reviewing the complete planning catalogue for **[project name]** before any implementation begins. Your job is to find what's missing, unclear, inconsistent, or improvable — so the agents that build this project have a complete, coherent brief to work from.
+>
+> Read every artefact provided below (personas, architecture, contracts, UI/UX surfaces, design system, maps, work units, decisions, open questions).
+>
+> Then run end to end through the entire project planning. Consider:
+> - **User journeys**: does the catalogue trace every complete path a user takes through the product? Are any paths incomplete, ambiguous, or contradictory?
+> - **Expectations and pain points**: do the personas clearly describe what users expect and what frustrates them? Would an agent reading these personas build something the real user would recognise?
+> - **Expected outcomes**: for each work unit, is the outcome unambiguous? Could two different agents read the same work unit and produce different things?
+> - **User experience**: does the design system actually govern the surfaces? Do the surfaces reference components that exist? Are there surfaces with no clear design language?
+> - **Every route**: are there user paths implied by the architecture that have no corresponding surface? Are there surfaces with no clear entry point?
+> - **Every journey**: does every persona have at least one complete journey through the product — from entry to outcome?
+> - **Every reason this tool will be used**: have all use cases been captured? Are there obvious use cases implied by the personas that have no work units?
+> - **Cross-artefact consistency**: do contracts match what modules claim to provide? Do work units reference personas and modules that exist? Do surfaces reference design-system components that are filed?
+>
+> Return your findings structured as four lists:
+>
+> **MISSING** — things the catalogue should contain but doesn't (e.g. a surface with no empty state, a persona with no journey, a module with no contract)
+>
+> **UNCLEAR** — things that are present but need more definition before an agent can act on them (e.g. an acceptance criterion that isn't binary, a design token with no stated value, a contract that says "appropriate response" without defining what appropriate means)
+>
+> **GAPS** — inconsistencies between artefacts (e.g. a work unit that consumes a contract that doesn't exist, a surface that uses a colour token not in the design system, an architecture module that depends on nothing and that nothing depends on)
+>
+> **OPTIMISE** — things that are present and correct but could be improved to produce better agent output (e.g. a persona that has seven dimensions but no acid-test scenario, a work unit with three acceptance criteria where five would give the tester better coverage, a map phase with no verification gate)
+>
+> For each finding: state what it is, which artefact it's in, and what specifically needs to change or be added. Be precise — vague findings produce vague fixes.
+Pass the full contents of every artefact as context. Do not summarise the artefacts — pass the full text.
+## Step 3 — Triage the findings with the operator
+When the review agent returns, surface the findings in plain English grouped by list. For each finding:
+1. Read it to the operator in plain language
+2. Ask: *"Fix it now, file it as an open question to address before we build, or defer it with a reason?"*
+3. Execute immediately:
+   - **Fix now**: make the change to the relevant artefact, show the operator
+   - **Open question**: file as Q-NNN in `docs/build/open-questions/`, add to `_index.md`
+   - **Defer**: note the reason in the relevant artefact's file (so the next agent to read it knows the gap was seen and deliberately left)
+Do not let findings disappear into conversation. Every finding must land somewhere in the catalogue before the review closes.
+If the review agent surfaces more than 10 findings, group them by severity before presenting:
+- **Blockers** (MISSING or GAPS that would cause an agent to build the wrong thing) — address these before any building starts, no exceptions
+- **Non-blockers** (UNCLEAR or OPTIMISE items that would improve quality but don't break the brief) — can be filed as Q-NNN and addressed in the first building sessions
+## Step 4 — Store the review
+After all findings are triaged, store a summary in cross-agent memory:
+```bash
+nuos-catalogue memory store \
+  --value="Planning review complete: [N] findings — [N] fixed, [N] filed as open questions, [N] deferred. Key issues: [one sentence summary of the most significant findings]" \
+  --agent=coordinator \
+  --key="planning-review"
+```
+Write a brief review entry to the current session log (it will be captured by `/end-of-session`).
+## Step 5 — Surface the result to the operator
+> "Planning review done. Here's what we found:
+>
+> - **[N] blockers** — [summary / "none"]
+> - **[N] clarifications** — [summary / "none"]
+> - **[N] optimisations** — [summary / "none"]
+>
+> [If blockers were fixed]: Fixed [N] things in the catalogue directly.
+> [If open questions were filed]: Filed [N] open questions — these will surface in `/start-of-session` when we start building.
+> [If deferred]: Deferred [N] items — noted in the relevant artefacts.
+>
+> The catalogue is [complete and clear to build against / has [N] open questions that should be resolved in the first session before the swarm starts]."
+Return control to whatever invoked this protocol (typically `plan-initial-wu`, which will then proceed to close Phase E).
+---
+## If invoked standalone (mid-project)
+When `/plan-review` is called outside of the planning arc close — e.g. after a significant pivot, after a new persona is added, or when something feels off — run Steps 1–5 above, then:
+- Do not update planning progress in STATE.md (Phase E may already be complete)
+- Do update STATE.md's "What is currently in flight" and "Last updated"
+- Run `/end-of-session` to commit the findings and any fixes
+---
+## What never to do
+- **Never skip the full catalogue read.** A review based on summaries misses cross-artefact inconsistencies — which are the most damaging class of gap.
+- **Never let a finding sit in conversation.** If it's not filed or fixed before the review closes, it's lost.
+- **Never block building on OPTIMISE findings.** These are improvements, not prerequisites. File them, continue.
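Step 3 groups findings by severity once there are more than ten, with MISSING and GAPS items that would cause an agent to build the wrong thing counted as blockers, and Step 4 stores a one-line count summary. A sketch of both, under two assumptions the protocol leaves to judgement: the reviewer marks each finding with whether it would mislead an agent, and MISSING/GAPS items judged harmless are grouped with the non-blockers. `groupBySeverity` and `memorySummary` are hypothetical helpers, and the summary string is adapted from the Step 4 template rather than copied from it.

```typescript
type FindingList = 'MISSING' | 'UNCLEAR' | 'GAPS' | 'OPTIMISE';
type TriageOutcome = 'fixed' | 'filed' | 'deferred';

interface TriagedFinding {
  list: FindingList;
  wouldMisbuild: boolean; // reviewer's judgement: would an agent build the wrong thing?
  outcome: TriageOutcome;
}

// Step 3 grouping: blockers are MISSING/GAPS items that would mislead an agent.
function groupBySeverity(findings: readonly TriagedFinding[]) {
  const isBlocker = (f: TriagedFinding) =>
    (f.list === 'MISSING' || f.list === 'GAPS') && f.wouldMisbuild;
  return {
    blockers: findings.filter(isBlocker),
    nonBlockers: findings.filter((f) => !isBlocker(f)),
  };
}

// Step 4: the value passed to `nuos-catalogue memory store --value=...`.
function memorySummary(findings: readonly TriagedFinding[], keyIssues: string): string {
  const count = (o: TriageOutcome) => findings.filter((f) => f.outcome === o).length;
  return (
    `Planning review complete: ${findings.length} findings, ${count('fixed')} fixed, ` +
    `${count('filed')} filed as open questions, ${count('deferred')} deferred. ` +
    `Key issues: ${keyIssues}`
  );
}

const triaged: TriagedFinding[] = [
  { list: 'MISSING', wouldMisbuild: true, outcome: 'fixed' },
  { list: 'GAPS', wouldMisbuild: false, outcome: 'filed' },
  { list: 'UNCLEAR', wouldMisbuild: false, outcome: 'filed' },
  { list: 'OPTIMISE', wouldMisbuild: false, outcome: 'deferred' },
];
console.log(groupBySeverity(triaged).blockers.length); // prints 1
console.log(memorySummary(triaged, 'one persona had no complete journey'));
```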

package/templates/starter-kit/methodfile.json CHANGED

@@ -19,6 +19,15 @@
     "externalServices": [],
     "notes": null
   },
+  "testing": {
+    "framework": "vitest",
+    "command": "npx vitest run",
+    "configPath": "vitest.config.ts",
+    "enforced": true,
+    "policy": "every-touched-source-file-must-be-covered-by-a-passing-test",
+    "appliesTo": ["typescript", "javascript", "tsx", "jsx"],
+    "comment": "Vitest is the default test runner for JS/TS implementation repos. The swarm enforces two gates before a WU can promote: (1) `npx vitest run` exits 0; (2) every source file touched by the WU is referenced by at least one .test.ts(x) or .spec.ts(x). If the project's techStack does not include JS/TS, set framework to whatever idiom the stack uses (or null to opt out — the coordinator will warn but not block)."
+  },
   "catalogue": {
     "root": "docs/build/",
     "registers": {

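The new `testing` block drives every gate above. A minimal sketch of how a coordinator might read it, assuming the field names shown in the starter kit. The three modes follow the block's own `comment`: `vitest` with `enforced: true` turns both gates on, another framework falls back to the project's idiom, and `framework: null` opts out with a warning rather than a block. Treating a missing block as an opt-out, and `vitest` with `enforced: false` as "project idiom", are assumptions; `resolveTestingMode` is a hypothetical name.

```typescript
interface TestingConfig {
  framework: string | null;
  command?: string;
  configPath?: string;
  enforced?: boolean;
}

type TestingMode =
  | { mode: 'vitest-gated'; command: string }
  | { mode: 'project-idiom'; framework: string }
  | { mode: 'opted-out'; warning: string };

function resolveTestingMode(testing: TestingConfig | undefined): TestingMode {
  if (!testing || testing.framework === null) {
    return { mode: 'opted-out', warning: 'No test framework declared; this WU ships without an enforced test gate.' };
  }
  if (testing.framework === 'vitest' && testing.enforced === true) {
    // Default command as stated in build-wu.md Step 5.5.
    return { mode: 'vitest-gated', command: testing.command ?? 'npx vitest run' };
  }
  return { mode: 'project-idiom', framework: testing.framework };
}

const starter = JSON.parse(
  '{"framework":"vitest","command":"npx vitest run","configPath":"vitest.config.ts","enforced":true}',
) as TestingConfig;
console.log(resolveTestingMode(starter).mode); // prints vitest-gated
```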
package/templates/testing/README.md ADDED

@@ -0,0 +1,21 @@
+# Vitest scaffolding
+These files are copied into a JS/TS implementation repo by the swarm coordinator (see [build-wu.md §Step 4 — Vitest pre-flight](../protocols/build-wu.md)) when:
+- `methodfile.json` has `testing.framework: "vitest"` AND `testing.enforced: true`, and
+- the implementation repo does not yet have vitest installed or a `vitest.config.ts` present.
+## Files
+- `vitest.config.ts` — minimal Node-environment config with sensible include/exclude globs for typical `src/`, `app/`, `routes/`, `lib/` layouts and v8 coverage wired in.
+- `example.test.ts` — single passing test, dropped at `tests/example.test.ts` so the operator can confirm the runner works end-to-end.
+## What the coordinator does
+1. Runs `npm i -D vitest @vitest/coverage-v8` in the implementation repo.
+2. Copies `vitest.config.ts` to the repo root.
+3. Copies `example.test.ts` to `tests/example.test.ts` (creates the directory if missing).
+4. Adds `"test": "vitest run"` to the implementation repo's `package.json` `scripts` block if absent.
+5. Runs `npx vitest run` once to confirm the wiring works before spawning the coder.
+The example test is meant to be deleted as soon as the first real WU adds real tests — its purpose is purely to prove the wiring before the swarm depends on it.
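Step 4 of the coordinator's list adds the script only when it is absent, so an existing `test` script (say, one that wraps vitest with extra flags) is never overwritten. A sketch of that idempotent edit on a parsed `package.json`, returning a new object instead of mutating the input; `ensureTestScript` is a hypothetical helper, and writing the result back to disk is left out.

```typescript
interface PackageManifest {
  scripts?: Record<string, string>;
  [key: string]: unknown;
}

// Adds "test": "vitest run" only if no test script exists; never overwrites.
function ensureTestScript(pkg: PackageManifest): { pkg: PackageManifest; changed: boolean } {
  if (pkg.scripts?.test !== undefined) return { pkg, changed: false };
  return {
    pkg: { ...pkg, scripts: { ...pkg.scripts, test: 'vitest run' } },
    changed: true,
  };
}

const before = JSON.parse('{"name":"app","scripts":{"build":"tsc"}}') as PackageManifest;
const result = ensureTestScript(before);
console.log(JSON.stringify(result.pkg.scripts)); // prints {"build":"tsc","test":"vitest run"}
```

Returning a `changed` flag lets the coordinator skip rewriting `package.json` (and record nothing under `## Setup`) when the script is already there.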

package/templates/testing/example.test.ts ADDED

@@ -0,0 +1,7 @@
+import { describe, expect, it } from 'vitest';
+describe('vitest wiring', () => {
+  it('runs', () => {
+    expect(1 + 1).toBe(2);
+  });
+});

package/templates/testing/vitest.config.ts ADDED

@@ -0,0 +1,16 @@
+import { defineConfig } from 'vitest/config';
+export default defineConfig({
+  test: {
+    include: ['tests/**/*.test.ts', 'src/**/*.test.ts', 'app/**/*.test.{ts,tsx}', 'routes/**/*.test.{ts,tsx}'],
+    exclude: ['node_modules', 'dist', '.next', 'build'],
+    environment: 'node',
+    reporters: ['default'],
+    coverage: {
+      provider: 'v8',
+      reporter: ['text', 'json-summary'],
+      include: ['src/**/*.{ts,tsx}', 'app/**/*.{ts,tsx}', 'routes/**/*.{ts,tsx}', 'lib/**/*.{ts,tsx}'],
+      exclude: ['**/*.test.*', '**/*.spec.*', '**/*.d.ts'],
+    },
+  },
+});
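One thing worth checking against this config: Gate B in `build-wu.md` accepts `.spec.ts(x)` files and colocated tests under `lib/`, `pages/`, `components/` and `api/`, but `test.include` above only collects `*.test.ts` under `tests/` and `src/` and `*.test.{ts,tsx}` under `app/` and `routes/`. A test that satisfies Gate B could therefore be skipped by `npx vitest run`. The sketch below flags such files with a deliberately small glob matcher that understands only `**/`, `*` and `{a,b}`; vitest's own glob matching supports more syntax, so treat this as an approximation. `globToRegExp` and `uncollectedTests` are hypothetical helpers.

```typescript
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Minimal glob to RegExp: "**/" (zero or more directories), "*" (within one
// path segment) and "{a,b}" alternation. Anything else is matched literally.
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    const close = glob.indexOf('}', i);
    if (glob.startsWith('**/', i)) {
      source += '(?:.*/)?';
      i += 2;
    } else if (c === '*') {
      source += '[^/]*';
    } else if (c === '{' && close > i) {
      source += `(?:${glob.slice(i + 1, close).split(',').map(escapeRegExp).join('|')})`;
      i = close;
    } else {
      source += escapeRegExp(c);
    }
  }
  return new RegExp(`^${source}$`);
}

// The include list from the template above.
const VITEST_INCLUDE = ['tests/**/*.test.ts', 'src/**/*.test.ts', 'app/**/*.test.{ts,tsx}', 'routes/**/*.test.{ts,tsx}'];

function uncollectedTests(include: readonly string[], testFiles: readonly string[]): string[] {
  const patterns = include.map(globToRegExp);
  return testFiles.filter((file) => !patterns.some((p) => p.test(file)));
}

console.log(uncollectedTests(VITEST_INCLUDE, ['tests/example.test.ts', 'app/page.test.tsx', 'lib/slugify.test.ts', 'src/dates.spec.ts']));
// prints [ 'lib/slugify.test.ts', 'src/dates.spec.ts' ]
```

Whether to widen `include` to match Gate B, or narrow Gate B to what the config collects, is a project decision; the point is that the two lists should agree.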