baldart 4.40.0 → 4.42.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32) hide show
  1. package/CHANGELOG.md +42 -0
  2. package/README.md +6 -2
  3. package/VERSION +1 -1
  4. package/framework/.claude/agents/REGISTRY.md +1 -1
  5. package/framework/.claude/agents/coder.md +4 -2
  6. package/framework/.claude/agents/qa-sentinel.md +10 -1
  7. package/framework/.claude/commands/qa.md +2 -0
  8. package/framework/.claude/skills/new/references/setup.md +19 -7
  9. package/framework/.claude/skills/toolchain-bootstrap/SKILL.md +127 -0
  10. package/framework/.claude/workflows/new-card-review.js +17 -2
  11. package/framework/.claude/workflows/new2.js +44 -3
  12. package/framework/agents/index.md +2 -0
  13. package/framework/agents/toolchain-protocol.md +80 -0
  14. package/framework/docs/TOOLCHAIN-LAYER.md +135 -0
  15. package/framework/scripts/validate-card-baseline.js +133 -3
  16. package/framework/templates/baldart.config.template.yml +40 -0
  17. package/framework/templates/ci/check-card-baseline.yml +0 -3
  18. package/package.json +1 -1
  19. package/src/commands/configure.js +81 -0
  20. package/src/commands/doctor.js +67 -0
  21. package/src/commands/update.js +12 -0
  22. package/src/utils/tool-currency.js +52 -0
  23. package/src/utils/toolchain-adapters/biome.js +92 -0
  24. package/src/utils/toolchain-adapters/eslint.js +39 -0
  25. package/src/utils/toolchain-adapters/husky.js +30 -0
  26. package/src/utils/toolchain-adapters/index.js +83 -0
  27. package/src/utils/toolchain-adapters/jest.js +34 -0
  28. package/src/utils/toolchain-adapters/lefthook.js +84 -0
  29. package/src/utils/toolchain-adapters/prettier.js +39 -0
  30. package/src/utils/toolchain-adapters/tsc.js +50 -0
  31. package/src/utils/toolchain-adapters/vitest.js +46 -0
  32. package/src/utils/toolchain-installer.js +233 -0
package/CHANGELOG.md CHANGED
@@ -5,6 +5,48 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.42.0] - 2026-06-15
9
+
10
+ **The `/new` worktree-setup can no longer pass on a fabricated baseline: the orchestrator now verifies the worktree on disk instead of trusting the subagent's self-report, and the setup subagent moves off `haiku`.** A real `/new` run reproduced **2/2** a silent failure — the background **worktree-setup** subagent (on `haiku`) returned a well-formed block reporting `baseline: pass` in ~6s **with no worktree on disk**: it pattern-matched the expected output instead of running the multi-step `/nw` skill, and the orchestrator trusted it because the baseline gate only ever checked the *returned* field, never the disk. This release closes both halves: (1) the worktree-setup subagent moves **`haiku → sonnet`** (running `/nw` via the Skill tool is a sustained tool-execution chain a too-weak model fabricates rather than executes); (2) a new **worktree integrity gate** (`setup.md §6a`) verifies the worktree with **orchestrator Bash** — `git worktree list --porcelain` + `test -d` + branch + `node_modules` — which a subagent cannot fabricate, and routes any failure to a **non-circular fallback chain** (`subagent → inline /nw → HALT`, no loop, with a build `timeout`). `new2`'s pre-flight gets the parallel mitigation (it cannot run Bash, so it returns non-falsifiable evidence the workflow string-matches — explicitly declared structurally weaker). Separately, the card-baseline validator is now **dependency-free**: it parsed cards with `js-yaml`, absent from the framework payload, so `require('js-yaml')` failed in consumers and silently disabled card-baseline validation (1b-iii) — replaced with a node-core `parseCardYaml`. **MINOR** (additive integrity gate + a behavior change to the worktree subagent model + a dependency-removing bugfix; no removed surface, the `/nw` `{path,branch,port}` contract is unchanged, and no new `baldart.config.yml` key ⇒ schema-change propagation rule N/A).
11
+
12
+ ### Added
13
+
14
+ - **`framework/.claude/skills/new/references/setup.md` §6a — worktree integrity gate (BLOCKING, orchestrator-Bash).** On `baseline: pass`, the orchestrator verifies the worktree on disk *itself* (`git -C "$MAIN" worktree list --porcelain` lists the path, `test -d`, branch matches, `node_modules` present; registry `buildVerified` + a build-artifact dir as best-effort corroboration) — **never** a `verified` field returned by a subagent (which is exactly as fabricable as the block). Any of checks 1-4 failing routes to the §4d fallback. Honest-failure signals (`baseline: fail`/`timeout`) are trusted and STOP first (recreating cannot fix a broken or hung build).
15
+ - **`framework/scripts/validate-card-baseline.js` — `parseCardYaml()`**, a node-core reader for the card-YAML subset (`|`/`>` block scalars, nested maps, scalar/map lists, inline flow collections, number fidelity for `group.sequence` which drives epic detection). Replaces the `js-yaml` dependency. Exported and guarded by a new **Check C** in `scripts/check-card-baseline.js`.
16
+
17
+ ### Changed
18
+
19
+ - **`framework/.claude/skills/new/references/setup.md` §4b — worktree-setup subagent `model: "haiku" → "sonnet"`** (rationale rewritten: a sustained Skill-tool execution chain, not one-shot plumbing; its return is verified by the §6a disk gate regardless). §4d fallback broadened from "empty / 0-tool-uses" to "did not produce a VERIFIED worktree", made a genuinely **different executor** (inline `/nw`, never a re-spawn or a frozen script) with an explicit cap (`subagent → inline → HALT`; transient-vs-deterministic classification), and the briefing wraps the build in `timeout 600` → new `baseline: timeout`.
20
+ - **`framework/.claude/agents/REGISTRY.md`** — the haiku plumbing carve-out drops the worktree-setup sub-clause (now sonnet); only the file-scoped revert agent remains a sanctioned haiku use.
21
+ - **`framework/.claude/workflows/new2.js`** — `PREFLIGHT_SCHEMA` gains `worktreeVerified` + `worktreeEvidence` (literal `worktree list --porcelain` / `ls` / baseline-log tail) and `baseline: 'timeout'`; a new **E2.5 worktree integrity gate** string-matches that evidence (the workflow JS cannot run Bash, so it is declared structurally weaker than classic `/new`'s orchestrator-Bash gate), and a timeout is batch-fatal.
22
+ - **`framework/templates/ci/check-card-baseline.yml`** — drops the now-unnecessary `npm i js-yaml` step (the validator is dependency-free).
23
+
24
+ ### Fixed
25
+
26
+ - **Card-baseline validation (1b-iii) silently skipped in every consumer lacking `js-yaml`.** The shipped validator `require`-d a package not in the framework payload, so `/new` / `/new2` pre-flight could not run it and degraded to skip-with-note. Now node-core — it runs everywhere.
27
+ - **`/new` worktree-setup accepting a fabricated `baseline: pass`** (reproduced 2/2 on a real consumer) — closed by the §6a integrity gate + the sonnet model + the broadened fallback above.
28
+
29
+ ## [4.41.0] - 2026-06-15
30
+
31
+ **BALDART becomes opinionated about the *tools*, not just the workflow: a new curated toolchain layer installs best-in-class JS/TS dev tools at first install and makes the agents actually use them.** Until now BALDART shipped agents and workflows but was agnostic about linters/formatters/test-runners — every quality gate hard-coded `eslint`/`tsc`/`jest`. This release adds an opt-in toolchain layer (`features.has_toolchain`) that, on a JS/TS project, PRESELECTS and installs a curated set as devDependencies — **Biome** (format + lint + import organizer), **Vitest**, **tsc**, **Lefthook** (pre-commit) — then records literal gate commands in `toolchain.commands.*` that the gate flows (`/new`, `/new2`, `/qa`, `qa-sentinel`, `coder`) run verbatim instead of guessing. The design is the fourth member of the install-adapter family (alongside routine-/tool-/lsp-adapters) and inherits their invariants: **opinionated but askable** (default Y, opt-out), **non-destructive** (configs written only when absent; existing ESLint/Prettier/Jest/husky are detected and a migration is only ever PROPOSED, never automatic — `.husky/` is never overwritten), **never silent in CI** (`--non-interactive` writes the flag only; `baldart doctor` backfills), and **silent fallback** (an unset command degrades to the project-standard default; the layer is invisible to non-JS projects and consumers with their own toolchain). **MINOR** (additive capability + new `features.has_toolchain` + `toolchain.*` config keys, propagated end-to-end per the schema-change propagation rule; backwards-compatible — flag defaults `false`, every gate falls back to today's behavior when unset).
32
+
33
+ ### Added
34
+
35
+ - **`src/utils/toolchain-adapters/`** — new adapter family (same dispatcher pattern as `lsp-adapters/`). Curated installers `biome.js` / `vitest.js` / `tsc.js` / `lefthook.js` (each: `installCommand`/`verifyCommand`/`commands()`/`initConfig`/`static replaces`/`static detect`) + incumbent **detectors** `eslint.js` / `prettier.js` / `jest.js` / `husky.js` (detection-only, drive the migration proposal). `index.js` exposes `REGISTRY` + `INCUMBENTS` + `detectAll` + `detectIncumbents`.
36
+ - **`src/utils/toolchain-installer.js`** — orchestrator (modeled on `graphify-installer.js`): `recommend()` (clean installs), `migrations()` (incumbent-blocked tools → manual migration), `install()` (devDeps; writes a tool's config BEFORE the package's own postinstall so Lefthook registers our Biome pre-commit, not its commented-out example), `initConfigs()` (non-destructive), `commandsFor()`, `activeTools()`, `certify()`. Never throws.
37
+ - **`framework/agents/toolchain-protocol.md`** — runtime protocol: per-gate resolution (explicit `toolchain.commands.*` → project-standard fallback → skip), the command map, and the rule that a configured command which FAILS is a real failure (fallback applies only to *unset* commands). Routed from `framework/agents/index.md`.
38
+ - **`framework/docs/TOOLCHAIN-LAYER.md`** — operator guide (plumbing, lifecycle, edge cases, how to add a language/tool).
39
+ - **`framework/.claude/skills/toolchain-bootstrap/SKILL.md`** — explicit install path (`/toolchain-bootstrap`), CI-safe; mirrors `/lsp-bootstrap`.
40
+ - **`framework/templates/baldart.config.template.yml`** — `features.has_toolchain` + the `toolchain:` block (`installed_tools`, `commands.{lint,format,typecheck,test,test_related,build,audit}`, `auto_verify`).
41
+
42
+ ### Changed
43
+
44
+ - **`src/commands/configure.js`** — autodetects the prompt default (JS/TS project AND no incumbent → Y), and on enable runs the install/migrate/write/certify flow (clean set preselected; incumbents → migration proposal, never automatic).
45
+ - **`src/commands/update.js`** — the schema-drift detector now diffs the `toolchain:` block including its nested `commands.*` map (same nested-block contract as `graph:`), so a future gate key surfaces for pre-existing consumers.
46
+ - **`src/commands/doctor.js`** — backfill actions `toolchain-install` (devDeps missing) and `toolchain-init-config` (default config missing), gated on `features.has_toolchain`, never blocking.
47
+ - **`src/utils/tool-currency.js`** — `_toolchainRecords` reports curated devDeps behind their npm `latest` (installed version read from `node_modules`, not a global binary); `autoUpgradable:false` (devDep upgrades touch `package.json` — surfaced as a command, user-run). Honest `unknown` offline.
48
+ - **Consumers wired to read `toolchain.commands.*` with silent fallback** — `framework/.claude/agents/qa-sentinel.md`, `framework/.claude/agents/coder.md`, `framework/.claude/commands/qa.md`, `framework/.claude/workflows/new-card-review.js` (post-fix re-verify + qa gate brief), `framework/.claude/workflows/new2.js` (pre-flight baseline config facts).
49
+
8
50
  ## [4.40.0] - 2026-06-15
9
51
 
10
52
  **The classic `/new` now slims its single-card final review the same way `new2` already does — closing an asymmetry that made an N=1 batch re-run review finders it had already run.** `new-final-review.js` has carried an F-041 single-card slim since v4.17.x (keep the unique-value cross-model Codex pass + the qa-sentinel merge gate; drop the duplicate Claude breadth finders that already ran per-card), but it only ever fired for **`new2`**, which passes `singleCard`. The classic `/new` final-review delegation (`final-review.md` Step F.1.5) **never passed the flag**, so `slim` was always false and a one-card `/new` batch ran the full breadth set in the final review even though the per-card pass had already covered the same files — a duplicate cross-model Codex + redundant doc review on every single-card run. The user's framing ("for one card, skip the per-card review") was the wrong lever (it would drop Simplify, break the fail-fast ordering that runs review *before* E2E + doc-sync, and lose the early security pass — and was already litigated: v3.35.0 introduced an N=1 final-review skip, v3.37.0 reverted it precisely because the per-card pass can run shallow under `light`). The right lever, already chosen for `new2`, is the inverse: keep the per-card pass, slim the *final*. This release extends that slim to classic `/new` — but **per-finder, coverage-gated**, because classic `/new` differs from `new2`: its `doc-reviewer` runs in the skill's Phase 3 (and can defer to Final under `light`), and its `api-perf-cost-auditor` is deferred-to-final *by design* (never runs per-card). **MINOR** (changes the N=1 merge-gate composition; no new surface, no `baldart.config.yml` key — `singleCard`/`slimDoc`/`slimApi` are workflow args ⇒ schema-propagation rule N/A).
package/README.md CHANGED
@@ -208,13 +208,13 @@ Skills always-ask when required keys are missing — never silently default.
208
208
  never overwrites your file. Full guide:
209
209
  [`framework/docs/PROJECT-CONFIGURATION.md`](framework/docs/PROJECT-CONFIGURATION.md).
210
210
 
211
- ### Skills (32 portable skills)
211
+ ### Skills (33 portable skills)
212
212
 
213
213
  Skills live under `.claude/skills/` and are auto-discovered by Claude Code.
214
214
  Bundled skills:
215
215
 
216
216
  - **Workflow**: `new`, `new2` (v4.16.0 — EXPERIMENTAL workflow-hosted `/new`, Claude-only, for A/B testing context economy), `prd`, `prd-add`, `bug`, `simplify`, `worktree-manager`, `issue-review`, `context-primer`
217
- - **Code quality**: `skill-creator`, `find-skills`, `webapp-testing`, `playwright-skill`, `lsp-bootstrap` (v3.10.0), `graphify-bootstrap` (v4.21.0 — code knowledge graph), `graph-align` (v4.21.0 — doc↔graph alignment), `e2e-review` (v3.18.0)
217
+ - **Code quality**: `skill-creator`, `find-skills`, `webapp-testing`, `playwright-skill`, `lsp-bootstrap` (v3.10.0), `graphify-bootstrap` (v4.21.0 — code knowledge graph), `graph-align` (v4.21.0 — doc↔graph alignment), `toolchain-bootstrap` (v4.41.0 — curated dev toolchain), `e2e-review` (v3.18.0)
218
218
  - **Design**: `frontend-design`, `ui-design`, `motion-design`, `gamification-design`, `design-system-init` (v3.11.0)
219
219
  - **Product**: `seo-audit`, `copywriting`, `api-design-principles`
220
220
  - **Knowledge**: `doc-writing-for-rag`, `capture` (LLM wiki overlay)
@@ -245,6 +245,10 @@ When `features.has_lsp_layer: true`, `codebase-architect` and the code-explorati
245
245
 
246
246
  When `features.has_code_graph: true`, agents prefer the [Graphify](https://github.com/safishamsi/graphify) code knowledge graph (tree-sitter, local/offline, native Leiden communities) for **structural / relational** queries — "what connects X to Y", blast-radius of a change, which modules cluster — via `graphify query`/`path`/`explain`/`affected`. The same graph **re-activates the LLM-wiki auto-learning loop** (dormant since the RAG removal in v4.20.0): `wiki-curator`, `/capture`, and the nightly `doc-graph-align` routine feed synthesis candidates from Graphify's native `GRAPH_REPORT.md` (god nodes, communities, suggested questions) — entirely offline. Graphify is a single language-agnostic tool (`pipx install graphifyy`); install via `baldart configure` or `/graphify-bootstrap` (never silent in CI — `baldart doctor` backfills). Falls back silently to LSP→Grep→Git. See [`framework/agents/code-graph-protocol.md`](framework/agents/code-graph-protocol.md) and [`framework/docs/CODE-GRAPH-LAYER.md`](framework/docs/CODE-GRAPH-LAYER.md).
247
247
 
248
+ ### Curated Toolchain Layer (new in v4.41.0)
249
+
250
+ When `features.has_toolchain: true`, BALDART becomes opinionated about the *tools* you build with, not just the workflow. On a JS/TS project `baldart configure` PRESELECTS and installs a curated set as devDependencies — **Biome** (format + lint + import organizer), **Vitest**, **tsc**, **Lefthook** (pre-commit) — and records literal gate commands in `toolchain.commands.*`. The quality-gate flows (`/new`, `/new2`, `/qa`, `qa-sentinel`, `coder`) then run those commands verbatim instead of hard-coding `eslint`/`tsc`/`jest`. Opinionated **but askable** (default Y, opt-out) and **non-destructive**: existing ESLint/Prettier/Jest/husky setups are detected and a migration is only ever PROPOSED, never automatic (`.husky/` is never overwritten). Never silent in CI (`baldart doctor` backfills); each gate falls back silently to the project-standard default when its command is unset, so non-JS projects and consumers with their own toolchain are unaffected. Install via `baldart configure` or `/toolchain-bootstrap`. See [`framework/agents/toolchain-protocol.md`](framework/agents/toolchain-protocol.md) and [`framework/docs/TOOLCHAIN-LAYER.md`](framework/docs/TOOLCHAIN-LAYER.md).
251
+
248
252
  ### Commands
249
253
 
250
254
  - **/new**: Batch orchestrator with QA validation, production readiness checklist, and context recovery (also available as a skill)
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.40.0
1
+ 4.42.0
@@ -200,7 +200,7 @@ Use this table when spawning agents via the `Task` tool. The `model` field in ea
200
200
 
201
201
  **Rules**: Never use haiku for any **named specialist agent** in the matrix above. Opus for code writing and creative/complex work. Sonnet for analysis, review, and documentation. `qa-sentinel` stays on sonnet even though it is a mechanical gate runner — its failure interpretation and SCOPED-vs-FULL tiering benefit from sonnet.
202
202
 
203
- **Plumbing carve-out (haiku allowed):** the "never haiku" rule above governs the specialist agents. **Mechanical plumbing spawned as `general-purpose`** — with no reasoning and an explicit ROLE BOUNDARY — MAY use `model: "haiku"` via override. Two sanctioned uses today, both in `/new`: (1) the background **worktree-setup** subagent that runs `/nw` (create worktree, install deps, allocate port, write registry — `references/setup.md` §4b); (2) the **file-scoped revert agent** that restores unauthorized files to their pre-commit state (`references/implement.md` Phase 2.4). Both are deterministic git/file ops, never code authoring.
203
+ **Plumbing carve-out (haiku allowed):** the "never haiku" rule above governs the specialist agents. **Mechanical plumbing spawned as `general-purpose`** — with no reasoning and an explicit ROLE BOUNDARY — MAY use `model: "haiku"` via override. One sanctioned use today, in `/new`: the **file-scoped revert agent** that restores unauthorized files to their pre-commit state (`references/implement.md` Phase 2.4) a deterministic git/file op, never code authoring. (The background **worktree-setup** subagent — `references/setup.md` §4b was haiku through v4.41.0 but moved to **`model: "sonnet"`** in v4.42.0: it runs the multi-step `/nw` skill **via the Skill tool**, a sustained tool-execution chain a too-weak model fabricates rather than executes — the observed 2/2 `baseline: pass`-with-no-worktree failures. Its return is verified by the orchestrator's disk gate regardless, so a capable model is the right trade.)
204
204
 
205
205
  ## Notes
206
206
 
@@ -226,9 +226,11 @@ persistence code while `stack.database` is empty, surface a warning suggesting
226
226
 
227
227
  Do NOT wait until the end to verify your work. After each sub-task that changes types, interfaces, or function signatures:
228
228
 
229
- 1. **When `stack.language` includes `typescript`**: run `npx tsc --noEmit` fix ALL errors before proceeding to the next sub-task. On non-TypeScript stacks, run the project's equivalent type/static check instead (e.g. `mypy` / `pyright` for Python, `go vet` for Go) when one is configured.
229
+ > **Toolchain (since v4.41.0):** when `features.has_toolchain: true`, use the commands in `toolchain.commands.*` (`typecheck`, `lint`, `test`/`test_related`) verbatim for the steps below per `agents/toolchain-protocol.md` falling back to the defaults shown when a key is unset.
230
+
231
+ 1. **When `stack.language` includes `typescript`**: run `toolchain.commands.typecheck` if set, else `npx tsc --noEmit` — fix ALL errors before proceeding to the next sub-task. On non-TypeScript stacks, run the project's equivalent type/static check instead (e.g. `mypy` / `pyright` for Python, `go vet` for Go) when one is configured.
230
232
  2. If the type check fails: **STOP**. Fix the errors first. Do NOT continue writing new files on top of broken types.
231
- 3. After the final sub-task, run the project's linter on your changed files (e.g. `npx eslint --max-warnings=0 <files>` for a JS/TS stack, or the configured equivalent).
233
+ 3. After the final sub-task, run the linter on your changed files: `toolchain.commands.lint` if set, else the project default (e.g. `npx eslint --max-warnings=0 <files>` for a JS/TS stack, or the configured equivalent).
232
234
  4. **Unit tests**: run ONLY the specific test file, not the full suite. `npm run test` is slow (minutes). Run the single file with the project's actual runner — check `package.json` scripts / devDependencies and use whichever is present (e.g. `npx vitest run <file>`, `npx jest <file>`, `node --import tsx <file>`, or `pytest <file>` / `go test ./<pkg>` on non-JS stacks). Do NOT assume `tsx` is installed: if the chosen command fails with a missing-loader/module error, that means the runner is wrong — try the next candidate, do NOT treat the failure as "no related tests". If the task doesn't involve a known test file, skip the test run.
233
235
 
234
236
  **Error Recovery — three-branch state machine (cap: 3 attempts)**: If a build breaks mid-implementation:
@@ -23,7 +23,16 @@ Run mechanical quality gates fast, return a PASS/FAIL verdict, and get out of th
23
23
  > patterns. Pre-commit gates below are the typical defaults — adjust the toolchain to
24
24
  > match your project (e.g. `ruff` instead of `eslint`, `pytest` instead of `npm test`).
25
25
 
26
- Default pre-commit gates (MUST pass before any commit verdict):
26
+ **Toolchain commands (since v4.41.0):** when `features.has_toolchain: true` in
27
+ `baldart.config.yml`, run the command from `toolchain.commands.<gate>` (lint,
28
+ typecheck, test, test_related, build, audit) **verbatim** instead of the default
29
+ below — per `agents/toolchain-protocol.md`. Resolve per gate: a non-empty config
30
+ command wins; an empty/absent one falls back to the default below. A configured
31
+ command that EXITS NON-ZERO is a real FAIL (do not fall back). Example: a project
32
+ on Biome runs `npx biome check .` for lint; a Vitest project runs `npx vitest run`.
33
+
34
+ Default pre-commit gates (MUST pass before any commit verdict) — used when no
35
+ `toolchain.commands.*` is configured:
27
36
  1. Lint: `npx eslint --max-warnings=0 <changed source files>` (or project equivalent)
28
37
  2. Type-check: `npx tsc --noEmit` (or project equivalent)
29
38
  3. Markdown lint: `npx markdownlint-cli2 <changed .md files>` (if .md files changed)
@@ -98,6 +98,8 @@ Write the initial file:
98
98
 
99
99
  ## Step 3 — Execute Profile
100
100
 
101
+ > **Toolchain (since v4.41.0):** when `features.has_toolchain: true` in `baldart.config.yml`, run the gate commands from `toolchain.commands.*` (lint, typecheck, test, test_related, build, audit) verbatim — per `agents/toolchain-protocol.md` — instead of the generic examples below. A configured command that fails is a real FAIL.
102
+
101
103
  ### LIGHT — Fast confidence (<3 min target [DESIGN-CHOICE: scoped to lint + type-check + related tests only; sufficient for low-risk or doc-only diffs])
102
104
 
103
105
  Run directly in this session (no sub-agents):
@@ -281,29 +281,41 @@
281
281
  - **`WT_PATH` set AND `registry.json` has a complete code entry for this slug** (a finished prior run — `buildVerified` recorded) → **resume**: read `path`/`branch`/`port`/`createdAt`/`buildVerified` from that entry, skip to step 6; re-run the baseline as a single background `Bash` (output to `/tmp`) **only if** `buildVerified` is not `true`.
282
282
  - **`WT_PATH` set but NO complete registry entry** (a prior attempt interrupted mid-setup — the normal compaction-mid-barrier state: worktree created, build unfinished, entry never written) → it is a half-built orphan with **no card work** (we are still in pre-flight, zero commits). **Reset clean and recreate**: `git -C "$MAIN" worktree remove --force "$WT_PATH"` then `git -C "$MAIN" branch -D "$WT_BRANCH"` (both ignore-if-absent), then proceed to 4b. A pre-flight worktree has nothing to lose, so a clean recreate is always safe — and it sidesteps the fail-loud collision a naive re-spawn would hit.
283
283
  This — detection by `git worktree list`, not the lagging registry — is what makes the deferred-flush pre-flight genuinely idempotent across compaction.
284
- b. **Spawn ONE background subagent** (Agent tool, **`mode: "bypassPermissions"`** — mandatory per the SKILL.md meta-rules; a background agent that hit a permission prompt with no human present would stall the barrier forever — `run_in_background: true`, `name: "worktree-setup-<FIRST-CARD-ID>"`, a subagent type that can use the Skill tool — `general-purpose`, **`model: "haiku"`** — this mission is pure deterministic plumbing (create the worktree, install deps, allocate the dev-server port, write the registry entry) with no reasoning; haiku is sufficient and is faster on the barrier, and it is the sanctioned plumbing carve-out to REGISTRY.md's "never haiku" rule) whose ENTIRE mission is to run `/nw` and return the block below. Briefing:
284
+ b. **Spawn ONE background subagent** (Agent tool, **`mode: "bypassPermissions"`** — mandatory per the SKILL.md meta-rules; a background agent that hit a permission prompt with no human present would stall the barrier forever — `run_in_background: true`, `name: "worktree-setup-<FIRST-CARD-ID>"`, a subagent type that can use the Skill tool — `general-purpose`, **`model: "sonnet"`** — the mission runs the multi-step `/nw` skill (group cards, create the worktree, install deps, allocate the dev-server port, write the registry entry, run the baseline) **via the Skill tool**: a sustained tool-execution chain, NOT one-shot plumbing. A too-weak model pattern-matches the expected return block instead of doing the work the observed 2/2 fabrication was a well-formed block reporting `baseline: pass` in ~6s with **no worktree on disk**. Sonnet executes it for real or fails honestly; either way the returned block is **never trusted as evidence** — the orchestrator's disk-verification gate (step 6a below) is the source of truth) whose ENTIRE mission is to run `/nw` and return the block below. Briefing:
285
285
  ```
286
286
  Invoke the worktree-manager skill in `/nw` programmatic mode with:
287
287
  { cards: [<all card IDs>], groupParent: <PARENT-ID|null>, slug: "<slug>" }
288
288
  Let it: group cards, derive the branch from git_strategy.branch (fallback
289
289
  feat/<PARENT-ID>-<slug>), create the worktree in .worktrees/, install deps,
290
290
  copy env files, assign a free port, update .worktrees/registry.json (all card
291
- IDs in the `cards` field), and run the baseline (tsc + lint + build).
291
+ IDs in the `cards` field), and run the baseline (tsc + lint + build)
292
+ UNDER A HARD TIMEOUT — wrap the build step in `timeout 600 <build-cmd>`
293
+ (10 min) so a hung or interactive build cannot stall the pre-flight barrier.
292
294
  Redirect ALL install/build/git output to files under /tmp — never inline it.
293
295
  Return ONLY this block, nothing else (no logs, no narration, no skill recap):
294
296
  worktree_path: <absolute path>
295
297
  branch: <branch>
296
298
  port: <n>
297
299
  created_at: <ISO-8601, stamped WHEN you create the worktree — before install/build>
298
- baseline: pass | fail
299
- baseline_log: <path on failure, else "-">
300
- If the build fails, return `baseline: fail` with the log path and STOP — do not continue.
300
+ baseline: pass | fail | timeout
301
+ baseline_log: <path on failure/timeout, else "-">
302
+ If the build fails, return `baseline: fail` with the log path and STOP.
303
+ If the timeout kills the build, return `baseline: timeout` with the partial
304
+ log path and STOP — do not continue.
301
305
  ```
302
306
  c. The subagent's context (the worktree-manager skill body, install/build logs, git output) **lives and dies inside it** — the orchestrator receives only the structured block. Combined with the Codex check (3d, also background), this replaces the old ~30-turn foreground pre-flight tail with background ops and a single resume.
303
- d. **Fallback if the subagent cannot run `/nw` (the Skill-from-subagent capability is NOT assumed).** If the subagent returns empty, or without the structured block — e.g. it cannot invoke the Skill tool in this consumer, or it returns the known "0 tool uses · Done" empty-result failure do **NOT** strand the barrier: the orchestrator **falls back to invoking `/nw` inline itself** (the pre-v4.15.0 path slug from 4a, then record `worktree_path`/`branch`/`port` and stamp `created_at` from the inline return). You lose the prefix saving for this one run, but pre-flight completes. This is the same opt-in-with-fallback discipline the Codex check (3d) and the dynamic-workflow gate use — never leave the worktree uncreated on a missing capability.
307
+ d. **Fallback executor — when the subagent does not produce a VERIFIED worktree.** The subagent's returned block is **never trusted as evidence**; step 6a verifies the worktree on disk. Trigger this fallback when EITHER (i) the subagent returns empty / without the structured block (cannot invoke the Skill tool in this consumer, or the known "0 tool uses · Done" empty-result), OR (ii) it returns a well-formed block but the **step-6a disk gate is VERIFIED:false** (the observed 2/2 fabrication: `baseline: pass` in ~6s with no worktree on disk). In both cases do **NOT** strand the barrier: the orchestrator **falls back to invoking `/nw` inline itself** — a genuinely **DIFFERENT executor** (the full-model orchestrator interpreting the skill prose), **not** a re-spawn of the same subagent and **not** a frozen script, so neither a Skill-from-subagent capability gap nor a weak-model fabrication can recur on the fallback. Slug from 4a; record `worktree_path`/`branch`/`port` and stamp `created_at` from the inline return; then **re-run the step-6a disk gate ONCE**. **Cap (no loop):** the chain is strictly `subagent → inline /nw → HALT+report` — never re-spawn the subagent, never loop the inline path. If the inline attempt ALSO fails the gate, or the failure is **deterministic** (port exhaustion across 3001-3099, a corrupted lockfile, or a genuine `baseline: fail`/`timeout` build error — recreating cannot fix these), **HALT and report** rather than retry. You lose the prefix saving for this one run, but pre-flight completes or halts cleanly — never silently on a phantom worktree. This is the same opt-in-with-fallback discipline the Codex check (3d) and the dynamic-workflow gate use.
304
308
  5. **End the turn — barrier on ALL launched background ops (wait for every one, not the first).** Having launched the Codex cross-card check (3d) and the worktree-setup subagent (4), the orchestrator has nothing to do until they return. **End the turn** — do NOT poll with `sleep`/`echo "waiting"` loops (§ "Context economy"; same rule as team-mode Step C). Background agents and background `Bash` re-invoke the orchestrator automatically on completion. **Wait for EVERY launched op before step 6**: each completion wakes you separately, so on each wake check whether *all* launched ops have returned — if one is still in flight, **end the turn again** and wait. Do NOT proceed to step 6 on the first completion, or you would read a half-written `$AUDIT_FILE` (and 3d's "If PASS or file empty: proceed normally" would silently swallow real conflicts) or a missing worktree block. (If 3d was SKIPPED by the provenance gate, the only op is the worktree subagent — or none, if step 4a2 resumed an existing worktree.) **Recovery**: a compaction mid-barrier re-enters pre-flight from step 4; the 4a2 git pre-check makes that safe (the worktree is detected via `git worktree list` and resumed-or-reset, never blindly re-created into a fail-loud collision).
305
309
  6. **On resume — flush the pre-flight tracker sections in one pass (no incremental per-sub-step churn).** When all launched ops have returned:
306
- a. **Baseline gate**: if the worktree subagent returned `baseline: fail` STOP and report (point the user at `baseline_log`). Do NOT continue to Phase 1.
310
+ a. **Worktree integrity gate (BLOCKING the disk is the source of truth, not the returned block).**
311
+ - **Honest-failure signals first** (trust these — a reported failure is real, not a fabrication, and recreating the worktree would not fix it): if the block reports `baseline: fail` → STOP and report (point the user at `baseline_log`); if `baseline: timeout` (the build exceeded the launch timeout, §4b/§5) → STOP and report the timeout. Do NOT continue to Phase 1; do NOT recreate.
312
+ - **On `baseline: pass`, VERIFY the worktree on disk YOURSELF** — these are orchestrator `Bash` calls run in this (full-model) context; **never** trust a `verified`-style field returned by a subagent (it is exactly as fabricable as the block itself). With `$WT_PATH`/`$WT_BRANCH` from the block and `$MAIN` from the tracker (`## Worktree` `Main repo:`):
313
+ 1. `git -C "$MAIN" worktree list --porcelain` lists `$WT_PATH`;
314
+ 2. `test -d "$WT_PATH"`;
315
+ 3. `git -C "$WT_PATH" rev-parse --abbrev-ref HEAD` equals `$WT_BRANCH`;
316
+ 4. `test -d "$WT_PATH/node_modules"` (proves install actually ran);
317
+ 5. *(corroboration)* the `.worktrees/registry.json` entry for this worktree has `buildVerified: true`, and — best-effort — a build artifact dir if the project emits one (`.next`/`dist`/`build`/`out`) is present with mtime newer than `$WT_PATH/package.json`. A missing build dir is **not** a hard fail on its own (a lib/CLI may emit none); checks 1-4 are the real evidence.
318
+ - **All pass → VERIFIED**: proceed to (b). **Any of 1-4 fails →** the block was fabricated or setup is incomplete (the observed 2/2: `baseline: pass`, ~6s, nothing on disk) → route to the **4d fallback chain** (inline `/nw`, re-run this gate ONCE, then HALT — never loop). This gate is the actual fix for the haiku-fabrication failure; do not skip it on the happy path.
307
319
  b. **Codex verdict**: handle it via the verdict-extraction discipline in 3d (read `$AUDIT_FILE` through the `[codex]`-stripping filter; keep distilled findings only).
308
320
  c. **One-pass tracker flush (no round-trips).** Assemble the pre-flight sections **in-context** (they are all small) and fill them with **back-to-back `Edit`s and no intervening reads**. The win is killing the old read-modify-read-modify churn (~5 incremental edits), **not** the literal tool-call count. **Do NOT `Write`-overwrite the whole file from in-context memory**: Phase 0 already wrote `Main repo:` / `Trunk branch:` / `Metrics dir:` into `## Worktree` and the `Status`/divergence lines into `## Phase 0`, and after a barrier compaction you may no longer hold those in-context (`$MAIN` "does not survive context compaction" — § Phase 0 step 1) — an overwrite-from-memory would silently drop them and HALT later with "`$MAIN` absent from tracker". Surgical `Edit`s on the placeholder sections leave Phase 0's content intact. Sections filled here:
309
321
  - `## File Ownership Map` (3b).
@@ -0,0 +1,127 @@
1
+ ---
2
+ name: toolchain-bootstrap
3
+ effort: medium
4
+ description: >
5
+ Install, verify, and wire the curated dev toolchain for this project. Detects
6
+ the JS/TS stack, installs the best-in-class tools as devDependencies (Biome for
7
+ format+lint+import, Vitest, tsc, Lefthook), writes their default config only
8
+ when absent, records toolchain.installed_tools + toolchain.commands.* in
9
+ baldart.config.yml, and proposes (never forces) a migration off any existing
10
+ ESLint/Prettier/Jest/husky. Use when the user says /toolchain-bootstrap,
11
+ "install the toolchain", "set up biome", or after enabling
12
+ features.has_toolchain for the first time. Idempotent — re-running re-verifies
13
+ and reports drift, never overwriting existing configs.
14
+ ---
15
+
16
+ # Toolchain Bootstrap
17
+
18
+ Set up the curated dev toolchain so the quality gates (`/new`, `/qa`, `/check`,
19
+ `qa-sentinel`) run best-in-class tools instead of whatever the project happened
20
+ to have. Non-destructive throughout: configs are written only when absent and
21
+ incumbent tools are never installed over or removed.
22
+
23
+ ## Project Context
24
+
25
+ **Reads from `baldart.config.yml`:** `features.has_toolchain`, `toolchain.installed_tools`, `toolchain.commands.*`, `toolchain.auto_verify`.
26
+ **Gated by features:** `features.has_toolchain` — this skill refuses to run when the flag is `false`, prompting the user to flip it via `npx baldart configure` first.
27
+ **Overlay:** loads `.baldart/overlays/toolchain-bootstrap.md` if present — project-specific tool choices or install commands.
28
+ **On missing keys:** ask the user; do not assume defaults. See `framework/agents/project-context.md` § 3.
29
+
30
+ ## Effort
31
+
32
+ **Baseline:** `effort: medium` (frontmatter). **Inline override:** pass
33
+ `effort=<low|medium|high|xhigh|max>` anywhere in the invocation to scale
34
+ reasoning depth for this run — detect it once at kickoff and strip the token
35
+ before consuming user input. Level→behavior mapping, parsing contract, and
36
+ precedence caveats: `framework/agents/effort-protocol.md`.
37
+
38
+ ## What This Skill Does
39
+
40
+ 1. Reads `baldart.config.yml` and confirms `features.has_toolchain: true`.
41
+ 2. Probes the cwd via `src/utils/toolchain-installer.js` to compute the clean
42
+ recommendation (`recommend()`) and any incumbent-blocked migrations
43
+ (`migrations()`).
44
+ 3. Installs the recommended tools as **devDependencies** (`npm install -D -E`) —
45
+ never global (these are project tools invoked via `npx`, unlike LSP servers).
46
+ 4. Writes each tool's default config **only when absent** (`biome.json`,
47
+ `lefthook.yml`) and registers Lefthook's `pre-commit` hook.
48
+ 5. For incumbents (ESLint/Prettier/Jest/husky), PROPOSES a migration with a
49
+ preview — never automatic, never touching the existing config (`.husky/` is
50
+ never overwritten).
51
+ 6. Writes `toolchain.installed_tools` + `toolchain.commands.*` to
52
+ `baldart.config.yml`, then `certify()`s the set.
53
+ 7. Prints a compact status and points to
54
+ `framework/agents/toolchain-protocol.md` for runtime behavior.
55
+
56
+ ## Workflow
57
+
58
+ 1. **Refusal check.** If `features.has_toolchain: false` (or missing):
59
+ ```
60
+ This project hasn't opted in to the curated toolchain yet.
61
+ Run `npx baldart configure` and answer YES to "Enable curated dev toolchain?"
62
+ then re-run /toolchain-bootstrap.
63
+ ```
64
+ Stop here. Do not proceed.
65
+
66
+ 2. **Probe.** Call the installer:
67
+ ```
68
+ node -e 'const T=require("./.framework/src/utils/toolchain-installer"); const t=new T(); console.log(JSON.stringify({hasPkg:t.hasPackageJson(),recommend:t.recommend(),migrations:t.migrations()},null,2))'
69
+ ```
70
+ If `hasPkg` is false, report "no package.json — JS toolchain not applicable"
71
+ and stop. Otherwise show the recommended tools and any migrations.
72
+
73
+ 3. **Confirm.** For the clean recommendation, default YES. For each migration,
74
+ show what it replaces (and for Biome, the `npx biome migrate eslint/prettier
75
+ --write` preview) and default NO.
76
+
77
+ 4. **Install + config + commands.** Run the installer's `install`, `initConfigs`,
78
+ then read back `commandsFor(activeTools())`. Capture failures into a "Skipped"
79
+ bucket — never abort the whole skill on one failure.
80
+
81
+ 5. **Persist.** Update `baldart.config.yml`:
82
+ ```yaml
83
+ toolchain:
84
+ installed_tools: [biome, vitest, tsc, lefthook]
85
+ commands:
86
+ lint: npx biome check .
87
+ format: npx biome format --write .
88
+ typecheck: npx tsc --noEmit
89
+ test: npx vitest run
90
+ test_related: npx vitest related --run
91
+ auto_verify: true
92
+ ```
93
+
94
+ 6. **Output.** A compact one-line-per-tool status, in the project's
95
+ `identity.language` from `baldart.config.yml` (English default).
96
+
97
+ ## Output Contract
98
+
99
+ A single short status block — never a multi-page report. Format:
100
+
101
+ ```
102
+ Toolchain bootstrap:
103
+ ✓ biome (devDep 2.x, biome.json written, lint+format wired)
104
+ ✓ vitest (devDep, test + test_related wired)
105
+ ✓ tsc (typescript present, typecheck wired)
106
+ ✓ lefthook (devDep, lefthook.yml + pre-commit hook installed)
107
+ · eslint → migration available (declined): npx biome migrate eslint --write
108
+ Config updated: baldart.config.yml toolchain.installed_tools = [biome, vitest, tsc, lefthook]
109
+ Next: /qa, /new, qa-sentinel will now run these commands instead of the defaults.
110
+ ```
111
+
112
+ ## Notes
113
+
114
+ - This skill never edits source code. Its only side effects are devDependency
115
+ installs, writing absent default configs, registering the Lefthook hook, and a
116
+ YAML write to `baldart.config.yml`.
117
+ - Re-run safely. `recommend()`/`migrations()` are recomputed each time, configs
118
+ are skipped when present, and already-usable tools are left untouched —
119
+ additions / repairs are idempotent.
120
+ - `baldart doctor` is the unattended backfill: it installs missing tools
121
+ (`toolchain-install`) and restores missing default config (`toolchain-init-config`).
122
+
123
+ ## See Also
124
+
125
+ - `framework/agents/toolchain-protocol.md` — runtime command resolution + fallback.
126
+ - `framework/docs/TOOLCHAIN-LAYER.md` — full plumbing + lifecycle.
127
+ - `src/utils/toolchain-adapters/` — per-tool install/verify recipes.
@@ -38,6 +38,14 @@ const cfg = a.config || {}
38
38
  const highRisk = (cfg.paths && cfg.paths.high_risk_modules) || [] // security-domain hint
39
39
  const protocolRef = '.claude/skills/new/references/review-cycle.md'
40
40
 
41
+ // Curated toolchain (since v4.41.0): when features.has_toolchain is on, the
42
+ // consumer records LITERAL gate commands in toolchain.commands.* — agents run
43
+ // THOSE instead of guessing (e.g. `npx biome check .` not `npm run lint`). Empty
44
+ // / absent → silent fallback to the project-standard default. Read from the
45
+ // config the skill already passes via args.config (never hardcoded project facts).
46
+ const tcCmds = (cfg.toolchain && cfg.toolchain.commands) || {}
47
+ const tc = (key, fallback) => (tcCmds[key] && String(tcCmds[key]).trim()) || fallback
48
+
41
49
  // Per-card result accumulator — built up-front (so the early-return guards can return it) and
42
50
  // populated with fixesApplied/residual in the Fix phase.
43
51
  const perCard = {}
@@ -193,8 +201,15 @@ const codexPrompt =
193
201
  `For each finding return: finding_id, title, severity (BLOCKER|HIGH|MEDIUM|LOW), confidence (0-100), evidence (exact file:line + code quote), minimal_fix_direction, and domain (doc|security|migration|code|perf|test). ` +
194
202
  `Run the mandatory false-positive check on every finding and suppress the unconvincing ones (your findings are treated as already FP-validated). Set codexAvailable:true when the review ran.`
195
203
 
204
+ const tcGateLines = [
205
+ ['lint', tcCmds.lint], ['typecheck', tcCmds.typecheck], ['test', tcCmds.test],
206
+ ['build', tcCmds.build], ['audit', tcCmds.audit],
207
+ ].filter(([, v]) => v && String(v).trim());
196
208
  const qaPrompt =
197
- `Run MECHANICAL GATES ONLY over the wave scope, per ${protocolRef} (Phase 3.5 qa-sentinel contract): lint, type-check (when stack uses typescript), the full test suite, build, dependency audit, and markdownlint as applicable. You are a GATE RUNNER: do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'} — cd into it first.\nChanged files:\n${unionScope.join('\n')}`
209
+ `Run MECHANICAL GATES ONLY over the wave scope, per ${protocolRef} (Phase 3.5 qa-sentinel contract): lint, type-check (when stack uses typescript), the full test suite, build, dependency audit, and markdownlint as applicable. You are a GATE RUNNER: do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'} — cd into it first.\nChanged files:\n${unionScope.join('\n')}` +
210
+ (tcGateLines.length
211
+ ? `\n\nThis project configures a curated toolchain — run THESE EXACT commands for the corresponding gates (do not substitute):\n${tcGateLines.map(([k, v]) => ` • ${k}: ${String(v).trim()}`).join('\n')}`
212
+ : '')
198
213
 
199
214
  function simplifyPrompt(c) {
200
215
  return `Simplify analysis (read-only — you do NOT edit, the workflow applies fixes afterward) over ONE card's committed diff, per ${protocolRef} (Phase 2.55). Cover all THREE lenses and return findings:\n` +
@@ -342,7 +357,7 @@ async function applyFixPass(findings, writer, label, role) {
342
357
  `You MAY edit ONLY these files (ownership map — touching anything else is a violation):\n${unionEditable.join('\n')}\n\n` +
343
358
  `Findings to fix (fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
344
359
  findings.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n evidence: ${f.evidence}\n direction: ${f.minimal_fix_direction}`).join('\n') +
345
- `\n\nAfter applying: run \`npm run lint\` and (when the project uses typescript) \`npx tsc --noEmit\` and \`npm run build\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
360
+ `\n\nAfter applying: run \`${tc('lint', 'npm run lint')}\` and (when the project uses typescript) \`${tc('typecheck', 'npx tsc --noEmit')}\` and \`${tc('build', 'npm run build')}\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
346
361
  `Do NOT commit. Do NOT git stash (refs/stash is shared across worktrees). ` +
347
362
  `Return: applied (finding_ids you fixed), unresolved (finding_ids you could NOT fix within the allowed files / 2 retries), and checks (PASS/FAIL/SKIP for lint, tsc, build).`
348
363
  const r = await agent(fixBrief, { label, phase: 'Fix', agentType: writer, schema: FIX_SCHEMA })
@@ -141,15 +141,29 @@ if (!cardIds.length) {
141
141
  // ───────────────────────────────────────────────────────────────────────────
142
142
  const PREFLIGHT_SCHEMA = {
143
143
  type: 'object',
144
- required: ['ok', 'worktreePath', 'branch', 'baseline', 'cards', 'cardGraph'],
144
+ required: ['ok', 'worktreePath', 'branch', 'baseline', 'worktreeVerified', 'cards', 'cardGraph'],
145
145
  additionalProperties: false,
146
146
  properties: {
147
147
  ok: { type: 'boolean' },
148
148
  worktreePath: { type: 'string' },
149
149
  branch: { type: 'string' },
150
150
  port: { type: ['number', 'string'] },
151
- baseline: { enum: ['pass', 'fail'] },
151
+ baseline: { enum: ['pass', 'fail', 'timeout'] },
152
152
  baselineLog: { type: 'string' },
153
+ // v4.42.0 — worktree integrity. The workflow JS CANNOT run bash, so it cannot re-verify the
154
+ // worktree itself the way classic /new does with orchestrator Bash (setup.md §6a) — this gate
155
+ // is STRUCTURALLY WEAKER. Best mitigation: the agent returns worktreeVerified + the LITERAL
156
+ // command evidence, which the workflow string-matches (a bare boolean would be as fabricable
157
+ // as the haiku block was). Real backstop: every card cd's into the worktree and re-runs gates.
158
+ worktreeVerified: { type: 'boolean', description: 'true ONLY after the agent ran `git worktree list --porcelain` + `test -d <wt>/node_modules` and they confirm the worktree on disk' },
159
+ worktreeEvidence: {
160
+ type: 'object', additionalProperties: true,
161
+ properties: {
162
+ worktreeListPorcelain: { type: 'string', description: 'literal stdout of `git -C <main> worktree list --porcelain`' },
163
+ artifactsLs: { type: 'string', description: 'literal stdout of `ls -la <wt>/node_modules <wt>/.next 2>/dev/null | head`' },
164
+ baselineLogTail: { type: 'string', description: 'last ~20 lines of the baseline log' },
165
+ },
166
+ },
153
167
  // B4 — executionMode/groups removed: the DAG scheduler is strictly sequential (single
154
168
  // worktree); computing team-mode groups was pre-flight work nobody read. Real parallelism
155
169
  // is a future release, after A/B data.
@@ -215,6 +229,9 @@ const projectBrief = [
215
229
  `Trunk: ${TRUNK}`,
216
230
  `Reference modules (Read these for the EXACT /new semantics — this workflow only sequences them): ${REF}/`,
217
231
  `Config facts: stack=${JSON.stringify(cfg.stack || {})}; features=${JSON.stringify(features)}; high_risk_modules=${JSON.stringify(highRisk)}; paths.backlog_dir=${paths.backlog_dir || '?'}; paths.references_dir=${paths.references_dir || '?'}.`,
232
+ (features.has_toolchain && cfg.toolchain && cfg.toolchain.commands)
233
+ ? `Toolchain commands (run THESE verbatim for the baseline + any gate, per agents/toolchain-protocol.md; empty → project default): ${JSON.stringify(cfg.toolchain.commands)}.`
234
+ : '',
218
235
  FLAGS.effort ? `Reasoning effort for this run: ${FLAGS.effort}.` : '',
219
236
  ].filter(Boolean).join('\n')
220
237
 
@@ -243,7 +260,7 @@ try {
243
260
  `ROLE BOUNDARY (specialization integrity): you are the OPS/GIT agent. You NEVER edit source or doc files — any needed content change belongs to the coder specialist; report it instead.\n\n` +
244
261
  `DETERMINISTIC GATE POLICIES (NO user prompts):\n` +
245
262
  `• G1 dirty-tree (main repo ${MAIN}): partition framework-managed noise exactly as setup.md step 3 ($METRICS=${METRICS}, .baldart/generated|state.json|skill-conflicts.json — NOT overlays/). Genuine user work → auto-stash 'baldart-new2-${firstCard}' (main checkout) and record the label. Never commit/abort/prompt.\n` +
246
- `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files).\n` +
263
+ `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files). Wrap the build in \`timeout 600 <build-cmd>\` (10 min); if killed, return baseline:'timeout' + the partial log. On baseline PASS, VERIFY the worktree on disk and return EVIDENCE (not just a flag): set worktreeVerified:true ONLY after running \`git -C ${MAIN} worktree list --porcelain\` (the worktree path MUST appear in the output) AND \`test -d <wt>/node_modules\` AND confirming the branch; put the LITERAL stdout into worktreeEvidence{ worktreeListPorcelain, artifactsLs:\`ls -la <wt>/node_modules <wt>/.next 2>/dev/null | head\`, baselineLogTail }. The workflow string-matches this evidence — NEVER report worktreeVerified:true without actually running the commands.\n` +
247
264
  codexResolveBullet +
248
265
  g3Bullet +
249
266
  `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
@@ -268,6 +285,12 @@ if (!preflight || preflight.ok === false) {
268
285
  ledger(firstCard, 'preflight', 'BATCH-FATAL', (preflight && preflight.workspaceNote) || 'workspace unworkable')
269
286
  return finalReturn({ fatal: true, reason: 'workspace unworkable — see pre-flight' })
270
287
  }
288
+ if (preflight.baseline === 'timeout') {
289
+ // v4.42.0 — a hung/interactive build hit the §4b 10-min timeout. An autonomous batch cannot
290
+ // ask the user (classic /new STOPs here); recreating cannot fix a hang → batch-fatal.
291
+ ledger(firstCard, 'baseline', 'BATCH-FATAL', 'baseline build timed out — see baselineLog')
292
+ return finalReturn({ fatal: true, reason: 'baseline build timed out (hung/interactive build) — see baselineLog' })
293
+ }
271
294
  if (preflight.baseline === 'fail') {
272
295
  // E2 (specialization integrity) — baseline repair is CODE work: it belongs to the coder
273
296
  // specialist, not the ops pre-flight agent (which never edits source). ONE bounded attempt;
@@ -288,6 +311,24 @@ if (preflight.baseline === 'fail') {
288
311
  }
289
312
  }
290
313
 
314
+ // E2.5 (v4.42.0) — worktree integrity gate. Mirrors classic /new setup.md §6a, but WEAKER: the
315
+ // workflow JS cannot run bash, so it can only string-match the LITERAL evidence the pre-flight
316
+ // agent returned, not re-derive it on disk. Catches the haiku-class fabrication (baseline:'pass'
317
+ // with no worktree) by requiring the worktree path to appear in the agent's `worktree list
318
+ // --porcelain` output and node_modules in its ls. Skipped on the fail→repair path (that path
319
+ // provably did real work and the coder cd'd in). Backstop: every card re-runs the gates anyway.
320
+ if (preflight.baseline === 'pass') {
321
+ const ev = preflight.worktreeEvidence || {}
322
+ const wt = String(preflight.worktreePath || '')
323
+ const porcelain = String(ev.worktreeListPorcelain || '')
324
+ const verified = preflight.worktreeVerified === true && !!wt && porcelain.includes(wt) && /node_modules/.test(String(ev.artifactsLs || ''))
325
+ if (!verified) {
326
+ ledger(firstCard, 'E2.5-worktree', 'BATCH-FATAL', `worktree not verified on disk (verified=${preflight.worktreeVerified}, pathInPorcelain=${porcelain.includes(wt)}) — possible fabricated baseline:pass`)
327
+ return finalReturn({ fatal: true, reason: 'worktree integrity gate failed — baseline:pass but the returned evidence does not confirm a worktree on disk' })
328
+ }
329
+ ledger(firstCard, 'E2.5-worktree', 'VERIFIED', 'porcelain lists the worktree + node_modules present')
330
+ }
331
+
291
332
  for (const ex of preflight.excluded || []) ledger(ex.card, 'preflight-exclude', 'EXCLUDED', ex.reason)
292
333
  if (preflight.workspaceNote) ledger(firstCard, 'G1/G2-workspace', 'AUTO', preflight.workspaceNote)
293
334
  if (preflight.crossCard) ledger(firstCard, 'G3-cross-card', 'INFO', preflight.crossCard)
@@ -24,6 +24,7 @@ Route agents to the right module with minimal reading.
24
24
  - If touching architecture, auth, or tech stack -> read `agents/architecture.md`.
25
25
  - If touching workflow/process/commits/backlog -> read `agents/workflows.md`.
26
26
  - If CREATING or MUTATING a backlog card (any prefix — `FEAT`/`CHORE`/`BUG`/`DOC`/`PERF`/`UI`), or consuming one type-blind (`/new`, `/new2`) -> read `agents/card-schema.md` (the universal, profile-aware baseline) before writing/validating fields.
27
+ - If running MECHANICAL GATES (lint, format, type-check, test, build, audit) and `features.has_toolchain: true` -> read `agents/toolchain-protocol.md` and run the command from `toolchain.commands.<gate>` verbatim, falling back to the project-standard default when a key is unset. A configured command that FAILS is a real gate failure (do not fall back).
27
28
  - If touching testing or QA issues -> read `agents/testing.md` (also documents the scope-aware,
28
29
  profile-driven test-selection strategy consumed by `qa-sentinel`).
29
30
  - If touching GitHub issues or issue workflow -> read `agents/github-issue-subagent.md`.
@@ -65,6 +66,7 @@ When adding or updating agents, update REGISTRY.md — not this file.
65
66
  - `agents/project-context.md` — Project context protocol: `baldart.config.yml` + overlays + missing-key handling (since v3.0.0)
66
67
  - `agents/code-search-protocol.md` — Retrieval hierarchy for code search: LSP → Grep → Git (since v3.10.0, gated on `features.has_lsp_layer`)
67
68
  - `agents/code-graph-protocol.md` — Structural/relational retrieval via the Graphify code knowledge graph (since v4.21.0, gated on `features.has_code_graph`)
69
+ - `agents/toolchain-protocol.md` — Mechanical-gate command resolution (lint/format/typecheck/test/build/audit) from `toolchain.commands.*` with silent project-standard fallback (since v4.41.0, gated on `features.has_toolchain`)
68
70
  - `agents/design-system-protocol.md` — Registry-first discipline for UI work: BLOCKING cascade on `INDEX.md` + `tokens-reference.md` + `components/<Name>.md` (since v3.11.0, gated on `features.has_design_system`)
69
71
  - `agents/card-schema.md` — Atomic Card Baseline Schema: the universal, profile-aware (epic/child/standalone) field contract every backlog card satisfies, plus the consumer HALT/BACK-FILL/WARN contract (since v4.35.0)
70
72
 
@@ -0,0 +1,80 @@
1
+ # Toolchain Protocol
2
+
3
+ ## Purpose
4
+
5
+ Define how agents/skills run the **mechanical quality gates** — lint, format,
6
+ type-check, test, build, audit — when a project has opted into the curated
7
+ toolchain. The commands are read from `toolchain.commands.*` in
8
+ `baldart.config.yml` and run **verbatim**, instead of each gate hard-coding
9
+ `eslint` / `tsc` / `jest` / `npm run build`. This lets a project standardize on
10
+ the best-in-class tools (Biome for format+lint+import, Vitest, tsc, Lefthook)
11
+ and have every gate flow use them consistently.
12
+
13
+ ## Scope
14
+
15
+ **In**: which command to invoke for each gate; the resolution order; the silent
16
+ fallback. **Out**: code review for correctness/quality (that is the reviewer
17
+ agents' job, not a mechanical gate); installing the tools (that is
18
+ `baldart configure` / `/toolchain-bootstrap` / `baldart doctor`, see
19
+ `framework/docs/TOOLCHAIN-LAYER.md`).
20
+
21
+ ## Gating
22
+
23
+ Conditional on `features.has_toolchain: true` in `baldart.config.yml`. The layer
24
+ degrades **silently and per-command**: for each gate, if
25
+ `toolchain.commands.<gate>` is a non-empty string, run it **exactly**; otherwise
26
+ fall back to the flow's built-in project-standard default (`npx eslint`,
27
+ `npx tsc --noEmit`, `npm test`, `npm run build`, `npm audit`, …). When the flag
28
+ is `false`/missing, every gate uses its default — identical to pre-toolchain
29
+ behavior. **Never block a task** because a command is unset or a tool is missing;
30
+ surface install gaps through `baldart doctor` (`toolchain-install` /
31
+ `toolchain-init-config` actions), not mid-task.
32
+
33
+ ## Resolution hierarchy (per gate)
34
+
35
+ 1. **Explicit config** — `toolchain.commands.<gate>` from `baldart.config.yml`,
36
+ run verbatim. This is the source of truth when set.
37
+ 2. **Project-standard fallback** — the gate's existing default command, inferred
38
+ from the stack (eslint/tsc/jest/npm scripts). Used when the config key is
39
+ empty/absent.
40
+ 3. **Skip** — when neither resolves (e.g. no build script in a library), report
41
+ `SKIP`, never fail.
42
+
43
+ ## Command map
44
+
45
+ | Gate | Config key | Curated default (Biome/Vitest/tsc) | Fallback |
46
+ | --- | --- | --- | --- |
47
+ | Lint | `toolchain.commands.lint` | `npx biome check .` | `npx eslint --max-warnings=0 <files>` |
48
+ | Format | `toolchain.commands.format` | `npx biome format --write .` | `npx prettier --write <files>` |
49
+ | Type-check | `toolchain.commands.typecheck` | `npx tsc --noEmit` | `npx tsc --noEmit` |
50
+ | Test | `toolchain.commands.test` | `npx vitest run` | `npm test` |
51
+ | Test (related) | `toolchain.commands.test_related` | `npx vitest related --run <files>` | `npx jest --findRelatedTests <files>` |
52
+ | Build | `toolchain.commands.build` | `npm run build` | `npm run build` |
53
+ | Audit | `toolchain.commands.audit` | `npm audit --audit-level=high` | `npm audit` |
54
+
55
+ Biome's `check` runs lint + format-check + import-organize in one pass, so the
56
+ `lint` gate alone covers what a separate ESLint+Prettier+import-plugin trio
57
+ would; the `format` command only `--write`s.
58
+
59
+ ## Consumers
60
+
61
+ `qa-sentinel` (gate runner), `/qa`, `/check`, the `coder` post-fix re-verify,
62
+ and the `/new` + `/new2` review workflows resolve gate commands through this
63
+ protocol. The `/new` workflow scripts receive the resolved config via their
64
+ `args.config` payload (the consuming skill passes `baldart.config.yml`) — they
65
+ must never hard-code project facts (see the workflows contamination contract).
66
+
67
+ ## Fallback rules
68
+
69
+ - A configured command that EXITS NON-ZERO is a genuine gate **FAIL** — do not
70
+ silently fall back to the default (that would mask a real failure). Fallback
71
+ applies only when the command is **unset**, not when it fails.
72
+ - A configured command whose binary is missing (`command not found`) is a setup
73
+ gap → report it and fall back to the default for that single run, then let
74
+ `baldart doctor` repair the install. Never abort the task.
75
+
76
+ ## See also
77
+
78
+ - `framework/docs/TOOLCHAIN-LAYER.md` — install lifecycle, adapters, plumbing.
79
+ - `agents/testing.md` — scope-aware, profile-driven test selection (the `test` /
80
+ `test_related` gates compose with it).