baldart 4.39.0 → 4.41.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/CHANGELOG.md +31 -0
  2. package/README.md +6 -2
  3. package/VERSION +1 -1
  4. package/framework/.claude/agents/coder.md +4 -2
  5. package/framework/.claude/agents/qa-sentinel.md +10 -1
  6. package/framework/.claude/commands/qa.md +2 -0
  7. package/framework/.claude/skills/new/references/final-review.md +53 -18
  8. package/framework/.claude/skills/toolchain-bootstrap/SKILL.md +127 -0
  9. package/framework/.claude/workflows/new-card-review.js +17 -2
  10. package/framework/.claude/workflows/new-final-review.js +21 -10
  11. package/framework/.claude/workflows/new2.js +3 -0
  12. package/framework/agents/index.md +2 -0
  13. package/framework/agents/toolchain-protocol.md +80 -0
  14. package/framework/docs/TOOLCHAIN-LAYER.md +135 -0
  15. package/framework/docs/WORKFLOWS.md +1 -1
  16. package/framework/templates/baldart.config.template.yml +40 -0
  17. package/package.json +1 -1
  18. package/src/commands/configure.js +81 -0
  19. package/src/commands/doctor.js +67 -0
  20. package/src/commands/update.js +12 -0
  21. package/src/utils/tool-currency.js +52 -0
  22. package/src/utils/toolchain-adapters/biome.js +92 -0
  23. package/src/utils/toolchain-adapters/eslint.js +39 -0
  24. package/src/utils/toolchain-adapters/husky.js +30 -0
  25. package/src/utils/toolchain-adapters/index.js +83 -0
  26. package/src/utils/toolchain-adapters/jest.js +34 -0
  27. package/src/utils/toolchain-adapters/lefthook.js +84 -0
  28. package/src/utils/toolchain-adapters/prettier.js +39 -0
  29. package/src/utils/toolchain-adapters/tsc.js +50 -0
  30. package/src/utils/toolchain-adapters/vitest.js +46 -0
  31. package/src/utils/toolchain-installer.js +233 -0
package/CHANGELOG.md CHANGED
@@ -5,6 +5,37 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.41.0] - 2026-06-15
9
+
10
+ **BALDART becomes opinionated about the *tools*, not just the workflow: a new curated toolchain layer installs best-in-class JS/TS dev tools at first install and makes the agents actually use them.** Until now BALDART shipped agents and workflows but was agnostic about linters/formatters/test-runners — every quality gate hard-coded `eslint`/`tsc`/`jest`. This release adds an opt-in toolchain layer (`features.has_toolchain`) that, on a JS/TS project, PRESELECTS and installs a curated set as devDependencies — **Biome** (format + lint + import organizer), **Vitest**, **tsc**, **Lefthook** (pre-commit) — then records literal gate commands in `toolchain.commands.*` that the gate flows (`/new`, `/new2`, `/qa`, `qa-sentinel`, `coder`) run verbatim instead of guessing. The design is the fourth member of the install-adapter family (alongside routine-/tool-/lsp-adapters) and inherits their invariants: **opinionated but askable** (default Y, opt-out), **non-destructive** (configs written only when absent; existing ESLint/Prettier/Jest/husky are detected and a migration is only ever PROPOSED, never automatic — `.husky/` is never overwritten), **never silent in CI** (`--non-interactive` writes the flag only; `baldart doctor` backfills), and **silent fallback** (an unset command degrades to the project-standard default; the layer is invisible to non-JS projects and consumers with their own toolchain). **MINOR** (additive capability + new `features.has_toolchain` + `toolchain.*` config keys, propagated end-to-end per the schema-change propagation rule; backwards-compatible — flag defaults `false`, every gate falls back to today's behavior when unset).
11
+
12
+ ### Added
13
+
14
+ - **`src/utils/toolchain-adapters/`** — new adapter family (same dispatcher pattern as `lsp-adapters/`). Curated installers `biome.js` / `vitest.js` / `tsc.js` / `lefthook.js` (each: `installCommand`/`verifyCommand`/`commands()`/`initConfig`/`static replaces`/`static detect`) + incumbent **detectors** `eslint.js` / `prettier.js` / `jest.js` / `husky.js` (detection-only, drive the migration proposal). `index.js` exposes `REGISTRY` + `INCUMBENTS` + `detectAll` + `detectIncumbents`.
15
+ - **`src/utils/toolchain-installer.js`** — orchestrator (modeled on `graphify-installer.js`): `recommend()` (clean installs), `migrations()` (incumbent-blocked tools → manual migration), `install()` (devDeps; writes a tool's config BEFORE the package's own postinstall so Lefthook registers our Biome pre-commit, not its commented-out example), `initConfigs()` (non-destructive), `commandsFor()`, `activeTools()`, `certify()`. Never throws.
16
+ - **`framework/agents/toolchain-protocol.md`** — runtime protocol: per-gate resolution (explicit `toolchain.commands.*` → project-standard fallback → skip), the command map, and the rule that a configured command which FAILS is a real failure (fallback applies only to *unset* commands). Routed from `framework/agents/index.md`.
17
+ - **`framework/docs/TOOLCHAIN-LAYER.md`** — operator guide (plumbing, lifecycle, edge cases, how to add a language/tool).
18
+ - **`framework/.claude/skills/toolchain-bootstrap/SKILL.md`** — explicit install path (`/toolchain-bootstrap`), CI-safe; mirrors `/lsp-bootstrap`.
19
+ - **`framework/templates/baldart.config.template.yml`** — `features.has_toolchain` + the `toolchain:` block (`installed_tools`, `commands.{lint,format,typecheck,test,test_related,build,audit}`, `auto_verify`).
20
+
21
+ ### Changed
22
+
23
+ - **`src/commands/configure.js`** — autodetects the prompt default (JS/TS project AND no incumbent → Y), and on enable runs the install/migrate/write/certify flow (clean set preselected; incumbents → migration proposal, never automatic).
24
+ - **`src/commands/update.js`** — the schema-drift detector now diffs the `toolchain:` block including its nested `commands.*` map (same nested-block contract as `graph:`), so a future gate key surfaces for pre-existing consumers.
25
+ - **`src/commands/doctor.js`** — backfill actions `toolchain-install` (devDeps missing) and `toolchain-init-config` (default config missing), gated on `features.has_toolchain`, never blocking.
26
+ - **`src/utils/tool-currency.js`** — `_toolchainRecords` reports curated devDeps behind their npm `latest` (installed version read from `node_modules`, not a global binary); `autoUpgradable:false` (devDep upgrades touch `package.json` — surfaced as a command, user-run). Honest `unknown` offline.
27
+ - **Consumers wired to read `toolchain.commands.*` with silent fallback** — `framework/.claude/agents/qa-sentinel.md`, `framework/.claude/agents/coder.md`, `framework/.claude/commands/qa.md`, `framework/.claude/workflows/new-card-review.js` (post-fix re-verify + qa gate brief), `framework/.claude/workflows/new2.js` (pre-flight baseline config facts).
28
+
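A minimal sketch of the currency check described for `_toolchainRecords`, assuming the installed version is read from the package's `package.json` under `node_modules` and that the npm `latest` version is supplied by the caller (or `null` when offline). The helper names `installedVersion`, `isBehind`, and `currencyRecord` are hypothetical and are not taken from `tool-currency.js`; the version comparison is deliberately simplified and ignores prerelease tags.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: read the installed devDependency version from node_modules.
// Returns null when the package (or its manifest) is missing or unreadable.
function installedVersion(projectDir, pkg) {
  try {
    const manifest = path.join(projectDir, "node_modules", pkg, "package.json");
    return JSON.parse(fs.readFileSync(manifest, "utf8")).version || null;
  } catch {
    return null;
  }
}

// Simplified comparison: numeric major.minor.patch only (assumption of this sketch).
function parseVersion(v) {
  return String(v).split("-")[0].split(".").map((n) => parseInt(n, 10) || 0);
}

function isBehind(installed, latest) {
  const a = parseVersion(installed);
  const b = parseVersion(latest);
  for (let i = 0; i < 3; i++) {
    if ((a[i] || 0) !== (b[i] || 0)) return (a[i] || 0) < (b[i] || 0);
  }
  return false;
}

// Hypothetical record shape: never auto-upgradable, "unknown" when either side is missing.
function currencyRecord(pkg, installed, latest) {
  if (!installed || !latest) {
    return { pkg, installed: installed || null, latest: latest || null,
             status: "unknown", autoUpgradable: false, upgradeCommand: null };
  }
  const behind = isBehind(installed, latest);
  return {
    pkg, installed, latest,
    status: behind ? "behind" : "current",
    autoUpgradable: false, // devDep upgrades touch package.json, so the user runs the command
    upgradeCommand: behind ? `npm install -D -E ${pkg}@${latest}` : null,
  };
}
```

Passing `latest` in, rather than fetching it inside the helper, keeps the offline case explicit: a missing value yields `unknown` instead of a guess.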
29
+ ## [4.40.0] - 2026-06-15
30
+
31
+ **The classic `/new` now slims its single-card final review the same way `new2` already does — closing an asymmetry that made an N=1 batch re-run review finders it had already run.** `new-final-review.js` has carried an F-041 single-card slim since v4.17.x (keep the unique-value cross-model Codex pass + the qa-sentinel merge gate; drop the duplicate Claude breadth finders that already ran per-card), but it only ever fired for **`new2`**, which passes `singleCard`. The classic `/new` final-review delegation (`final-review.md` Step F.1.5) **never passed the flag**, so `slim` was always false and a one-card `/new` batch ran the full breadth set in the final review even though the per-card pass had already covered the same files — a duplicate cross-model Codex + redundant doc review on every single-card run. The user's framing ("for one card, skip the per-card review") was the wrong lever (it would drop Simplify, break the fail-fast ordering that runs review *before* E2E + doc-sync, and lose the early security pass — and was already litigated: v3.35.0 introduced an N=1 final-review skip, v3.37.0 reverted it precisely because the per-card pass can run shallow under `light`). The right lever, already chosen for `new2`, is the inverse: keep the per-card pass, slim the *final*. This release extends that slim to classic `/new` — but **per-finder, coverage-gated**, because classic `/new` differs from `new2`: its `doc-reviewer` runs in the skill's Phase 3 (and can defer to Final under `light`), and its `api-perf-cost-auditor` is deferred-to-final *by design* (never runs per-card). **MINOR** (changes the N=1 merge-gate composition; no new surface, no `baldart.config.yml` key — `singleCard`/`slimDoc`/`slimApi` are workflow args ⇒ schema-propagation rule N/A).
32
+
33
+ ### Changed
34
+
35
+ - **`framework/.claude/workflows/new-final-review.js`** — the single `slim = a.singleCard` flag is decoupled into **per-finder** `slimDoc` / `slimApi`, each defaulting to `a.singleCard` when absent (so callers passing only `singleCard` — i.e. `new2` — are byte-for-byte unchanged, and a caller passing neither gets an unconditional FULL pass — a safe default that never silently over-skips). `doc-reviewer` is gated on `!slimDoc`, `api-perf-cost-auditor` on `!slimApi && hasApiDataFiles`; Codex + `qa-sentinel` always run. Per-finder skip logs replace the single coupled log line.
36
+ - **`framework/.claude/skills/new/references/final-review.md`** — Step F.1.5 now passes `singleCard: cardPaths.length === 1`, `slimDoc: cardPaths.length === 1 && <Phase-3 doc-review ran, not deferred>`, and `slimApi: false` (api-perf never runs per-card in classic `/new` — the final is its only run, so it is never slimmed). The v3.37.0 "FULL gate" invariant prose is reconciled to describe the coverage-gated slim (Codex + qa-sentinel are the unconditional safety gate; breadth finders slim per-finder for N=1 only, gated on per-card coverage — the coverage-gate is the backstop the v3.35.0 blanket skip lacked). The inline Step F.3 fallback (SSOT the workflow mirrors) and the fan-out completion barrier ("ALL THREE" → "every launched Task") carry the same rule, so the no-`Workflow`-tool path stays coherent.
37
+ - **`framework/docs/WORKFLOWS.md`** — the `new-final-review` row's single-card description corrected from "skips the duplicate doc/api reviewers" to the accurate per-finder, coverage-gated behavior (`new2` drops both; classic `/new` drops only `doc-reviewer` and keeps `api-perf-cost-auditor`).
38
+
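The per-finder gating described for `new-final-review.js` can be sketched as follows. This is an illustration of the rule as the changelog states it, not the workflow's actual code: the function name `selectFinders` and the returned array of finder names are assumptions, while `singleCard`, `slimDoc`, `slimApi`, and `hasApiDataFiles` are the argument names quoted in the entry above.

```javascript
// Hypothetical helper: decide which final-review finders run for a given set of workflow args.
function selectFinders(a) {
  // Each slim flag defaults to singleCard when the caller did not pass it.
  const slimDoc = a.slimDoc ?? Boolean(a.singleCard);
  const slimApi = a.slimApi ?? Boolean(a.singleCard);

  // Codex and qa-sentinel are the unconditional safety gate.
  const finders = ["codex", "qa-sentinel"];
  if (!slimDoc) finders.push("doc-reviewer");
  if (!slimApi && a.hasApiDataFiles) finders.push("api-perf-cost-auditor");
  return finders;
}

// new2-style caller: only singleCard is passed, so both breadth finders are dropped.
console.log(selectFinders({ singleCard: true, hasApiDataFiles: true }));

// Classic /new caller on N=1: doc-reviewer dropped, api-perf-cost-auditor kept.
console.log(selectFinders({ singleCard: true, slimDoc: true, slimApi: false, hasApiDataFiles: true }));

// Caller passing neither flag: full pass.
console.log(selectFinders({ hasApiDataFiles: true }));
```

Defaulting each flag to `singleCard` is what keeps existing `new2` callers unchanged while letting classic `/new` override one finder at a time.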
8
39
  ## [4.39.0] - 2026-06-15
9
40
 
10
41
  **`/new`, `/new2`, and `/prd` now auto-reap orphaned Codex MCP servers at their workspace-hygiene finalizers — the v4.37.0 doctor reaper, made automatic.** v4.37.0 added an on-demand reaper to `baldart doctor`, but the leak compounds *per skill run*: every batch's Codex finder calls (`/new`/`new2` per-card review + final review, `/prd` discovery-completeness + plan audit) drive `codex app-server`, whose detached broker spawns the `~/.codex/config.toml` MCP servers (Playwright, …) as children that orphan to init (ppid 1) when the broker dies and keep burning CPU. Waiting for a manual `baldart doctor` let them accumulate between runs. Now each batch ends by sweeping them. A new focused, non-interactive CLI command (`baldart reap-orphans`) is the SSOT the three finalizers call; it shares the v4.37.0 `codex-orphans.js` detection/reaping logic and the same hard safety invariant — it reaps ONLY orphaned MCP servers (ppid 1 ⇒ broker dead ⇒ stdio broken), and NEVER kills a live `codex app-server` broker (a shared, detached runtime that may still serve the user's interactive session). Because an MCP child of a still-warm broker is not yet orphaned, this is a cumulative orphan sweep (catches this run's debris once its broker dies, plus any prior runs'), not a per-run broker teardown. **MINOR** (new CLI command + skill-finalizer wiring; backwards-compatible — non-blocking hygiene step, no-op when nothing is orphaned, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
package/README.md CHANGED
@@ -208,13 +208,13 @@ Skills always-ask when required keys are missing — never silently default.
208
208
  never overwrites your file. Full guide:
209
209
  [`framework/docs/PROJECT-CONFIGURATION.md`](framework/docs/PROJECT-CONFIGURATION.md).
210
210
 
211
- ### Skills (32 portable skills)
211
+ ### Skills (33 portable skills)
212
212
 
213
213
  Skills live under `.claude/skills/` and are auto-discovered by Claude Code.
214
214
  Bundled skills:
215
215
 
216
216
  - **Workflow**: `new`, `new2` (v4.16.0 — EXPERIMENTAL workflow-hosted `/new`, Claude-only, for A/B testing context economy), `prd`, `prd-add`, `bug`, `simplify`, `worktree-manager`, `issue-review`, `context-primer`
217
- - **Code quality**: `skill-creator`, `find-skills`, `webapp-testing`, `playwright-skill`, `lsp-bootstrap` (v3.10.0), `graphify-bootstrap` (v4.21.0 — code knowledge graph), `graph-align` (v4.21.0 — doc↔graph alignment), `e2e-review` (v3.18.0)
217
+ - **Code quality**: `skill-creator`, `find-skills`, `webapp-testing`, `playwright-skill`, `lsp-bootstrap` (v3.10.0), `graphify-bootstrap` (v4.21.0 — code knowledge graph), `graph-align` (v4.21.0 — doc↔graph alignment), `toolchain-bootstrap` (v4.41.0 — curated dev toolchain), `e2e-review` (v3.18.0)
218
218
  - **Design**: `frontend-design`, `ui-design`, `motion-design`, `gamification-design`, `design-system-init` (v3.11.0)
219
219
  - **Product**: `seo-audit`, `copywriting`, `api-design-principles`
220
220
  - **Knowledge**: `doc-writing-for-rag`, `capture` (LLM wiki overlay)
@@ -245,6 +245,10 @@ When `features.has_lsp_layer: true`, `codebase-architect` and the code-explorati
245
245
 
246
246
  When `features.has_code_graph: true`, agents prefer the [Graphify](https://github.com/safishamsi/graphify) code knowledge graph (tree-sitter, local/offline, native Leiden communities) for **structural / relational** queries — "what connects X to Y", blast-radius of a change, which modules cluster — via `graphify query`/`path`/`explain`/`affected`. The same graph **re-activates the LLM-wiki auto-learning loop** (dormant since the RAG removal in v4.20.0): `wiki-curator`, `/capture`, and the nightly `doc-graph-align` routine feed synthesis candidates from Graphify's native `GRAPH_REPORT.md` (god nodes, communities, suggested questions) — entirely offline. Graphify is a single language-agnostic tool (`pipx install graphifyy`); install via `baldart configure` or `/graphify-bootstrap` (never silent in CI — `baldart doctor` backfills). Falls back silently to LSP→Grep→Git. See [`framework/agents/code-graph-protocol.md`](framework/agents/code-graph-protocol.md) and [`framework/docs/CODE-GRAPH-LAYER.md`](framework/docs/CODE-GRAPH-LAYER.md).
247
247
 
248
+ ### Curated Toolchain Layer (new in v4.41.0)
249
+
250
+ When `features.has_toolchain: true`, BALDART becomes opinionated about the *tools* you build with, not just the workflow. On a JS/TS project `baldart configure` PRESELECTS and installs a curated set as devDependencies — **Biome** (format + lint + import organizer), **Vitest**, **tsc**, **Lefthook** (pre-commit) — and records literal gate commands in `toolchain.commands.*`. The quality-gate flows (`/new`, `/new2`, `/qa`, `qa-sentinel`, `coder`) then run those commands verbatim instead of hard-coding `eslint`/`tsc`/`jest`. Opinionated **but askable** (default Y, opt-out) and **non-destructive**: existing ESLint/Prettier/Jest/husky setups are detected and a migration is only ever PROPOSED, never automatic (`.husky/` is never overwritten). Never silent in CI (`baldart doctor` backfills); each gate falls back silently to the project-standard default when its command is unset, so non-JS projects and consumers with their own toolchain are unaffected. Install via `baldart configure` or `/toolchain-bootstrap`. See [`framework/agents/toolchain-protocol.md`](framework/agents/toolchain-protocol.md) and [`framework/docs/TOOLCHAIN-LAYER.md`](framework/docs/TOOLCHAIN-LAYER.md).
251
+
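The non-destructive guarantee ("configs written only when absent") could be implemented roughly as below. This is a sketch under stated assumptions: `writeConfigIfAbsent` is a hypothetical name, and the real `initConfig` / `initConfigs()` code may work differently. It relies on Node's `wx` file flag, which fails when the target already exists.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write a default config only if no file exists at that path.
// Never throws; the result reports what happened instead.
function writeConfigIfAbsent(projectDir, fileName, contents) {
  const target = path.join(projectDir, fileName);
  try {
    // "wx" opens for writing but fails with EEXIST when the file is already there,
    // so an existing config is left untouched.
    fs.writeFileSync(target, contents, { flag: "wx" });
    return { file: fileName, written: true, reason: "created" };
  } catch (err) {
    const reason = err.code === "EEXIST" ? "exists" : String(err.message);
    return { file: fileName, written: false, reason };
  }
}
```

Using the exclusive-create flag avoids the check-then-write race that a separate `fs.existsSync` call followed by a plain write would have.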
248
252
  ### Commands
249
253
 
250
254
  - **/new**: Batch orchestrator with QA validation, production readiness checklist, and context recovery (also available as a skill)
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.39.0
1
+ 4.41.0
package/framework/.claude/agents/coder.md CHANGED
@@ -226,9 +226,11 @@ persistence code while `stack.database` is empty, surface a warning suggesting
226
226
 
227
227
  Do NOT wait until the end to verify your work. After each sub-task that changes types, interfaces, or function signatures:
228
228
 
229
- 1. **When `stack.language` includes `typescript`**: run `npx tsc --noEmit` and fix ALL errors before proceeding to the next sub-task. On non-TypeScript stacks, run the project's equivalent type/static check instead (e.g. `mypy` / `pyright` for Python, `go vet` for Go) when one is configured.
229
+ > **Toolchain (since v4.41.0):** when `features.has_toolchain: true`, use the commands in `toolchain.commands.*` (`typecheck`, `lint`, `test`/`test_related`) verbatim for the steps below per `agents/toolchain-protocol.md`, falling back to the defaults shown when a key is unset.
230
+
231
+ 1. **When `stack.language` includes `typescript`**: run `toolchain.commands.typecheck` if set, else `npx tsc --noEmit` — fix ALL errors before proceeding to the next sub-task. On non-TypeScript stacks, run the project's equivalent type/static check instead (e.g. `mypy` / `pyright` for Python, `go vet` for Go) when one is configured.
230
232
  2. If the type check fails: **STOP**. Fix the errors first. Do NOT continue writing new files on top of broken types.
231
- 3. After the final sub-task, run the project's linter on your changed files (e.g. `npx eslint --max-warnings=0 <files>` for a JS/TS stack, or the configured equivalent).
233
+ 3. After the final sub-task, run the linter on your changed files: `toolchain.commands.lint` if set, else the project default (e.g. `npx eslint --max-warnings=0 <files>` for a JS/TS stack, or the configured equivalent).
232
234
  4. **Unit tests**: run ONLY the specific test file, not the full suite. `npm run test` is slow (minutes). Run the single file with the project's actual runner — check `package.json` scripts / devDependencies and use whichever is present (e.g. `npx vitest run <file>`, `npx jest <file>`, `node --import tsx <file>`, or `pytest <file>` / `go test ./<pkg>` on non-JS stacks). Do NOT assume `tsx` is installed: if the chosen command fails with a missing-loader/module error, that means the runner is wrong — try the next candidate, do NOT treat the failure as "no related tests". If the task doesn't involve a known test file, skip the test run.
233
235
 
234
236
  **Error Recovery — three-branch state machine (cap: 3 attempts)**: If a build breaks mid-implementation:
package/framework/.claude/agents/qa-sentinel.md CHANGED
@@ -23,7 +23,16 @@ Run mechanical quality gates fast, return a PASS/FAIL verdict, and get out of th
23
23
  > patterns. Pre-commit gates below are the typical defaults — adjust the toolchain to
24
24
  > match your project (e.g. `ruff` instead of `eslint`, `pytest` instead of `npm test`).
25
25
 
26
- Default pre-commit gates (MUST pass before any commit verdict):
26
+ **Toolchain commands (since v4.41.0):** when `features.has_toolchain: true` in
27
+ `baldart.config.yml`, run the command from `toolchain.commands.<gate>` (lint,
28
+ typecheck, test, test_related, build, audit) **verbatim** instead of the default
29
+ below — per `agents/toolchain-protocol.md`. Resolve per gate: a non-empty config
30
+ command wins; an empty/absent one falls back to the default below. A configured
31
+ command that EXITS NON-ZERO is a real FAIL (do not fall back). Example: a project
32
+ on Biome runs `npx biome check .` for lint; a Vitest project runs `npx vitest run`.
33
+
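The per-gate resolution rule above (a non-empty configured command wins, an empty or absent one falls back to the default, and a failing configured command is a real FAIL) can be sketched as below. The function name `resolveGate`, the `defaults` map, and the returned object shape are assumptions made for illustration; the authoritative rule lives in `agents/toolchain-protocol.md`.

```javascript
// Hypothetical helper: pick the command to run for one gate.
// `config` is the parsed baldart.config.yml; `defaults` maps gate name -> fallback command.
function resolveGate(gate, config, defaults) {
  const enabled = Boolean(config && config.features && config.features.has_toolchain);
  const commands = (config && config.toolchain && config.toolchain.commands) || {};
  const configured = typeof commands[gate] === "string" ? commands[gate].trim() : "";

  if (enabled && configured !== "") {
    // Explicit command: run verbatim. A non-zero exit is a FAIL, never a reason to fall back.
    return { gate, command: configured, source: "toolchain" };
  }
  if (defaults[gate]) {
    // Unset command: degrade to the project-standard default.
    return { gate, command: defaults[gate], source: "default" };
  }
  // Nothing configured and no default known: skip the gate.
  return { gate, command: null, source: "skip" };
}

const config = {
  features: { has_toolchain: true },
  toolchain: { commands: { lint: "npx biome check .", typecheck: "" } },
};
const defaults = { lint: "npx eslint --max-warnings=0 .", typecheck: "npx tsc --noEmit" };

console.log(resolveGate("lint", config, defaults).command);      // npx biome check .
console.log(resolveGate("typecheck", config, defaults).command); // npx tsc --noEmit
console.log(resolveGate("audit", config, defaults).source);      // skip
```

The fallback decision is made once, at resolution time; nothing in this sketch retries a configured command with the default after it fails.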
34
+ Default pre-commit gates (MUST pass before any commit verdict) — used when no
35
+ `toolchain.commands.*` is configured:
27
36
  1. Lint: `npx eslint --max-warnings=0 <changed source files>` (or project equivalent)
28
37
  2. Type-check: `npx tsc --noEmit` (or project equivalent)
29
38
  3. Markdown lint: `npx markdownlint-cli2 <changed .md files>` (if .md files changed)
package/framework/.claude/commands/qa.md CHANGED
@@ -98,6 +98,8 @@ Write the initial file:
98
98
 
99
99
  ## Step 3 — Execute Profile
100
100
 
101
+ > **Toolchain (since v4.41.0):** when `features.has_toolchain: true` in `baldart.config.yml`, run the gate commands from `toolchain.commands.*` (lint, typecheck, test, test_related, build, audit) verbatim — per `agents/toolchain-protocol.md` — instead of the generic examples below. A configured command that fails is a real FAIL.
102
+
101
103
  ### LIGHT — Fast confidence (<3 min target [DESIGN-CHOICE: scoped to lint + type-check + related tests only; sufficient for low-risk or doc-only diffs])
102
104
 
103
105
  Run directly in this session (no sub-agents):
package/framework/.claude/skills/new/references/final-review.md CHANGED
@@ -13,21 +13,32 @@ Once ALL cards are committed in the worktree:
13
13
  > **Final-review FULL gate (since v3.37.0 — supersedes the v3.35.0 scope-reduction)** — because
14
14
  > Phase 3.7 may now run a card at `light` depth, the per-card pass can NO LONGER be assumed to have
15
15
  > full-reviewed every card. The final review is therefore the **unconditional safety gate that makes
16
- > the Phase 3.7 `light` profile safe**: it ALWAYS runs a single FULL `/codexreview` (full agent set)
17
- > over the **ENTIRE batch diff** before merge — **no N=1 skip, no cross-card scope reduction**. Every
18
- > line of every card — including any reviewed at `light` in Phase 3.7 — receives a full-depth Codex
19
- > review at least once before merge.
16
+ > the Phase 3.7 `light` profile safe**: the **cross-model Codex pass + the `qa-sentinel` merge gate
17
+ > ALWAYS run** over the **ENTIRE batch diff** before merge — **no N=1 skip, no cross-card scope
18
+ > reduction** for those two. Every line of every card — including any reviewed at `light` in Phase
19
+ > 3.7 — receives a full-depth Codex review at least once before merge.
20
20
  >
21
- > - Run Steps F.1–F.5 for **EVERY batch, including N=1**. Nothing here is skipped.
21
+ > - Run Steps F.1–F.5 for **EVERY batch, including N=1**. The safety gate (Codex + qa-sentinel) is
22
+ > never skipped.
22
23
  > - `review_scope_files` = the **FULL union** of all touched files across all cards (F.1 step 4 —
23
24
  > NEVER reduced to the cross-card subset).
24
- > - F.3 invokes the full reviewer set (Codex + doc-reviewer + api-perf-cost-auditor + qa-sentinel)
25
- > over that union; F.5 runs the final build.
25
+ > - F.3 invokes Codex + qa-sentinel over that union unconditionally, **plus** the breadth finders
26
+ > (doc-reviewer, api-perf-cost-auditor). F.5 runs the final build.
27
+ > - **F-041 per-finder slim for N=1 only (coverage-gated).** On a **single-card** batch the breadth
28
+ > finders may be dropped — but ONLY per-finder, gated on what ACTUALLY ran per-card, never a blanket
29
+ > "N=1 → skip" (that is the v3.35.0 hole this gate closed). For N=1: drop `doc-reviewer` **iff** Phase
30
+ > 3 ran it on the (single) card — if Phase 3 deferred it (`light` + no doc files), the final MUST run
31
+ > it; **keep `api-perf-cost-auditor` always**, because it is deferred-to-final by design and never
32
+ > ran per-card in classic `/new` (the final is its only run). Codex + qa-sentinel still run. For N>1
33
+ > the full breadth set always runs (a multi-card batch has genuine cross-card surface to review).
26
34
  >
27
- > Rationale: this re-introduces the post-batch full pass that v3.35.0 de-duplicated away. That
28
- > de-dup assumed Phase 3.7 had already full-reviewed every card, an assumption broken the moment
29
- > `light` became selectable. The cost of one full batch-diff review is the deliberate price of the
30
- > per-card `light` speed-up (explicit maintainer decision, v3.37.0).
35
+ > Rationale: v3.37.0 re-introduced the post-batch full pass that v3.35.0 de-duplicated away — that
36
+ > de-dup assumed Phase 3.7 had full-reviewed every card, an assumption broken the moment `light` became
37
+ > selectable. The F-041 slim is NOT a regression of that decision: the safety invariant (Codex + qa
38
+ > over the full diff) is untouched, and a breadth finder is dropped **only** when its own per-card run
39
+ > already covered the same single card — i.e. the coverage-gate IS the backstop the v3.35.0 skip lacked.
40
+ > A single card has no cross-card surface, so a finder's per-card pass and its final pass are genuinely
41
+ > the same review.
31
42
 
32
43
  ### Step F.1 — Resolve scope
33
44
 
@@ -78,10 +89,23 @@ that is a **gate violation**: log it as
78
89
  reviewScopeFiles, // the FULL union from F.1
79
90
  archBaselinePaths, // per-card baselines if ALL present (F.2 dedup), else null
80
91
  hasApiDataFiles, // true unless NO scope file falls under paths.api_* / data-model
81
- config // the parsed baldart.config.yml
92
+ config, // the parsed baldart.config.yml
93
+ singleCard: cardPaths.length === 1, // N=1 batch — enables the F-041 per-finder slim
94
+ slimDoc: cardPaths.length === 1 && <Phase-3 doc-review RAN for this card, NOT deferred>,
95
+ slimApi: false // api-perf NEVER runs per-card in classic /new → final is its ONLY run; never slim it
82
96
  }})
83
97
  ```
84
98
 
99
+ **`slimDoc` coverage gate (per-finder — do NOT collapse to a bare `singleCard`).** The
100
+ final review may drop the doc-reviewer for an N=1 batch ONLY because Phase 3 already ran
101
+ it on the same (single) card — so it is gated on Phase 3 having ACTUALLY run, not deferred.
102
+ You know this from your own Phase 3 log: if you logged `doc-review: DEFERRED to Final FULL
103
+ gate (light, no doc files in diff)` (review-cycle.md step ~235), Phase 3 did NOT run →
104
+ `slimDoc: false` (the final MUST run doc-reviewer, else doc review is skipped end-to-end).
105
+ Otherwise Phase 3 ran → `slimDoc: true`. `slimApi` is ALWAYS `false` here: api-perf-cost-auditor
106
+ is deferred-to-final by design in classic `/new` (never part of the per-card cluster), so
107
+ the final pass is its only run — slimming it would leave the api/perf domain unreviewed.
108
+
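The caller-side derivation of the three slim arguments can be sketched as follows, assuming the Phase 3 log is available as an array of strings. That representation, and the helper name `buildSlimArgs`, are hypothetical; the skill itself expresses this step as prose for the orchestrating agent rather than as code.

```javascript
// Hypothetical helper: derive the F-041 slim arguments for classic /new.
// `phase3Log` is assumed to be the list of log lines the orchestrator wrote during Phase 3.
function buildSlimArgs(cardPaths, phase3Log) {
  const singleCard = cardPaths.length === 1;
  // A "doc-review: DEFERRED ..." line means Phase 3 did NOT run doc review on the card.
  const docDeferred = phase3Log.some((line) => line.includes("doc-review: DEFERRED"));
  return {
    singleCard,
    slimDoc: singleCard && !docDeferred, // drop doc-reviewer only if it already ran per-card
    slimApi: false,                      // the final pass is api-perf's only run in classic /new
  };
}

// Single card, doc review ran in Phase 3: slimDoc is true.
console.log(buildSlimArgs(["cards/a.yml"], ["doc-review: PASS"]));

// Single card, doc review deferred under light: the final must run doc-reviewer.
console.log(buildSlimArgs(
  ["cards/a.yml"],
  ["doc-review: DEFERRED to Final FULL gate (light, no doc files in diff)"]
));
```

Keeping `slimApi` a constant `false` makes the "never slim api-perf in classic `/new`" rule impossible to override by accident from the caller side.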
85
109
  The workflow returns `{ codexEngine, findings, gateTable, summary }` where
86
110
  `findings` are already consolidated and classified (`VERIFIED` /
87
111
  `NEEDS_MANUAL_CONFIRMATION`; `FALSE_POSITIVE` already dropped) and `gateTable`
@@ -184,15 +208,26 @@ that is a **gate violation**: log it as
184
208
  | **api-perf-cost-auditor** | `api-perf-cost-auditor` | API/data/performance/cost defects (skip if no API/data files in scope) | Same findings schema |
185
209
  | **qa-sentinel** | `qa-sentinel` | **Mechanical gates ONLY** over the batch scope (lint, tsc, full test suite, build, `npm audit`, markdownlint) | A PASS/FAIL gate table — NOT a findings list. qa-sentinel does not read source files, does not emit severities, and does not do edge-case/reproducibility analysis (its system prompt forbids it). A gate FAILURE feeds the fix-loop the same way a VERIFIED finding does. |
186
210
 
211
+ **F-041 per-finder slim — N=1 only (coverage-gated; identical rule to the delegated path's
212
+ `slimDoc`/`slimApi`).** This inline prose is the SSOT the workflow mirrors, so it carries the same
213
+ slim: on a **single-card** batch, **skip the `doc-reviewer` row iff Phase 3 ran doc-review on the
214
+ card** (you logged no `doc-review: DEFERRED ...` — if Phase 3 deferred it under `light`, KEEP it here
215
+ or doc review is skipped end-to-end). **Always keep `api-perf-cost-auditor`** (subject only to its
216
+ existing "no API/data files in scope" skip): it is deferred-to-final by design and never ran
217
+ per-card, so this is its only run — slimming it would leave api/perf unreviewed. `qa-sentinel`
218
+ always runs. For N>1, run the full breadth set (genuine cross-card surface exists). Codex (step 6)
219
+ is never slimmed.
220
+
187
221
  The two code-aware agents (doc-reviewer, api-perf-cost-auditor) receive: card IDs, YAML, `review_scope_files`, codebase-architect baseline, and a Budget Block per the `/codexreview` Step 2 contract (`framework/.claude/commands/codexreview.md`). qa-sentinel receives only the worktree path + the changed-file list and runs gates. Code-correctness/edge-case analysis is Codex's job (and the per-card `/codexreview` already ran) — do NOT ask qa-sentinel to produce code findings.
188
222
 
189
- **Fan-out completion barrier (BLOCKING before F.4).** The three Claude agents write to a shared
223
+ **Fan-out completion barrier (BLOCKING before F.4).** The Claude agents write to a shared
190
224
  findings pool that F.4 step 9 fans in. Before F.4 reads ANY finding, you MUST have collected the
191
- return value of ALL THREE Task invocations from step 7 (doc-reviewer, api-perf-cost-auditor,
192
- qa-sentinel); never start the merge while a Task is still in flight. Because step 7 launches all
193
- three in a single message, the harness returns when all three complete; do NOT proceed to step 9
194
- on a partial set. (The Codex background task has its OWN barrier: step 8 below polls `$REVIEW_FILE`
195
- for completion. The two barriers are independent: wait for BOTH the three Claude Tasks AND the
225
+ return value of **every Task invocation you launched in step 7** (the set after the F-041 slim —
226
+ normally doc-reviewer + api-perf-cost-auditor + qa-sentinel, but fewer when a breadth finder was
227
+ slimmed for N=1); never start the merge while a Task is still in flight. Because step 7 launches
228
+ them in a single message, the harness returns when all complete; do NOT proceed to step 9 on a
229
+ partial set. (The Codex background task has its OWN barrier: step 8 below polls `$REVIEW_FILE`
230
+ for completion. The two barriers are independent: wait for BOTH the launched Claude Tasks AND the
196
231
  Codex background task before merging.)
197
232
 
198
233
  ### Step F.4 — Collect & merge findings
package/framework/.claude/skills/toolchain-bootstrap/SKILL.md ADDED
@@ -0,0 +1,127 @@
1
+ ---
2
+ name: toolchain-bootstrap
3
+ effort: medium
4
+ description: >
5
+ Install, verify, and wire the curated dev toolchain for this project. Detects
6
+ the JS/TS stack, installs the best-in-class tools as devDependencies (Biome for
7
+ format+lint+import, Vitest, tsc, Lefthook), writes their default config only
8
+ when absent, records toolchain.installed_tools + toolchain.commands.* in
9
+ baldart.config.yml, and proposes (never forces) a migration off any existing
10
+ ESLint/Prettier/Jest/husky. Use when the user says /toolchain-bootstrap,
11
+ "install the toolchain", "set up biome", or after enabling
12
+ features.has_toolchain for the first time. Idempotent — re-running re-verifies
13
+ and reports drift, never overwriting existing configs.
14
+ ---
15
+
16
+ # Toolchain Bootstrap
17
+
18
+ Set up the curated dev toolchain so the quality gates (`/new`, `/qa`, `/check`,
19
+ `qa-sentinel`) run best-in-class tools instead of whatever the project happened
20
+ to have. Non-destructive throughout: configs are written only when absent and
21
+ incumbent tools are never installed over or removed.
22
+
23
+ ## Project Context
24
+
25
+ **Reads from `baldart.config.yml`:** `features.has_toolchain`, `toolchain.installed_tools`, `toolchain.commands.*`, `toolchain.auto_verify`.
26
+ **Gated by features:** `features.has_toolchain` — this skill refuses to run when the flag is `false`, prompting the user to flip it via `npx baldart configure` first.
27
+ **Overlay:** loads `.baldart/overlays/toolchain-bootstrap.md` if present — project-specific tool choices or install commands.
28
+ **On missing keys:** ask the user; do not assume defaults. See `framework/agents/project-context.md` § 3.
29
+
30
+ ## Effort
31
+
32
+ **Baseline:** `effort: medium` (frontmatter). **Inline override:** pass
33
+ `effort=<low|medium|high|xhigh|max>` anywhere in the invocation to scale
34
+ reasoning depth for this run — detect it once at kickoff and strip the token
35
+ before consuming user input. Level→behavior mapping, parsing contract, and
36
+ precedence caveats: `framework/agents/effort-protocol.md`.
37
+
38
+ ## What This Skill Does
39
+
40
+ 1. Reads `baldart.config.yml` and confirms `features.has_toolchain: true`.
41
+ 2. Probes the cwd via `src/utils/toolchain-installer.js` to compute the clean
42
+ recommendation (`recommend()`) and any incumbent-blocked migrations
43
+ (`migrations()`).
44
+ 3. Installs the recommended tools as **devDependencies** (`npm install -D -E`) —
45
+ never global (these are project tools invoked via `npx`, unlike LSP servers).
46
+ 4. Writes each tool's default config **only when absent** (`biome.json`,
47
+ `lefthook.yml`) and registers Lefthook's `pre-commit` hook.
48
+ 5. For incumbents (ESLint/Prettier/Jest/husky), PROPOSES a migration with a
49
+ preview — never automatic, never touching the existing config (`.husky/` is
50
+ never overwritten).
51
+ 6. Writes `toolchain.installed_tools` + `toolchain.commands.*` to
52
+ `baldart.config.yml`, then `certify()`s the set.
53
+ 7. Prints a compact status and points to
54
+ `framework/agents/toolchain-protocol.md` for runtime behavior.
55
+
56
+ ## Workflow
57
+
58
+ 1. **Refusal check.** If `features.has_toolchain: false` (or missing):
59
+ ```
60
+ This project hasn't opted in to the curated toolchain yet.
61
+ Run `npx baldart configure` and answer YES to "Enable curated dev toolchain?"
62
+ then re-run /toolchain-bootstrap.
63
+ ```
64
+ Stop here. Do not proceed.
65
+
66
+ 2. **Probe.** Call the installer:
67
+ ```
68
+ node -e 'const T=require("./.framework/src/utils/toolchain-installer"); const t=new T(); console.log(JSON.stringify({hasPkg:t.hasPackageJson(),recommend:t.recommend(),migrations:t.migrations()},null,2))'
69
+ ```
70
+ If `hasPkg` is false, report "no package.json — JS toolchain not applicable"
71
+ and stop. Otherwise show the recommended tools and any migrations.
72
+
73
+ 3. **Confirm.** For the clean recommendation, default YES. For each migration,
74
+ show what it replaces (and for Biome, the `npx biome migrate eslint/prettier
75
+ --write` preview) and default NO.
76
+
77
+ 4. **Install + config + commands.** Run the installer's `install`, `initConfigs`,
78
+ then read back `commandsFor(activeTools())`. Capture failures into a "Skipped"
79
+ bucket — never abort the whole skill on one failure.
80
+
81
+ 5. **Persist.** Update `baldart.config.yml`:
+    ```yaml
+    toolchain:
+      installed_tools: [biome, vitest, tsc, lefthook]
+      commands:
+        lint: npx biome check .
+        format: npx biome format --write .
+        typecheck: npx tsc --noEmit
+        test: npx vitest run
+        test_related: npx vitest related --run
+      auto_verify: true
+    ```
93
+
94
+ 6. **Output.** A compact one-line-per-tool status, in the project's
95
+ `identity.language` from `baldart.config.yml` (English default).
96
+
97
+ ## Output Contract
98
+
99
+ A single short status block — never a multi-page report. Format:
100
+
101
+ ```
102
+ Toolchain bootstrap:
103
+ ✓ biome (devDep 2.x, biome.json written, lint+format wired)
104
+ ✓ vitest (devDep, test + test_related wired)
105
+ ✓ tsc (typescript present, typecheck wired)
106
+ ✓ lefthook (devDep, lefthook.yml + pre-commit hook installed)
107
+ · eslint → migration available (declined): npx biome migrate eslint --write
108
+ Config updated: baldart.config.yml toolchain.installed_tools = [biome, vitest, tsc, lefthook]
109
+ Next: /qa, /new, qa-sentinel will now run these commands instead of the defaults.
110
+ ```
111
+
112
+ ## Notes
113
+
114
+ - This skill never edits source code. Its only side effects are devDependency
115
+ installs, writing absent default configs, registering the Lefthook hook, and a
116
+ YAML write to `baldart.config.yml`.
117
+ - Re-run safely. `recommend()`/`migrations()` are recomputed each time, configs
118
+ are skipped when present, and already-usable tools are left untouched —
119
+ additions / repairs are idempotent.
120
+ - `baldart doctor` is the unattended backfill: it installs missing tools
121
+ (`toolchain-install`) and restores missing default config (`toolchain-init-config`).
122
+
123
+ ## See Also
124
+
125
+ - `framework/agents/toolchain-protocol.md` — runtime command resolution + fallback.
126
+ - `framework/docs/TOOLCHAIN-LAYER.md` — full plumbing + lifecycle.
127
+ - `src/utils/toolchain-adapters/` — per-tool install/verify recipes.
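
Editorial sketch (not part of the package diff): the skill above promises that default configs are written "only when absent" and that re-runs never overwrite an existing file. One way to get that guarantee in Node is an exclusive-create write. The helper name `writeIfAbsent` and the temporary directory are assumptions made for illustration; the package's own `initConfigs` may work differently.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (not taken from the package): create a default config
// only when no file exists at that path. The 'wx' flag makes the write fail
// with EEXIST instead of overwriting, so a re-run cannot clobber user edits.
function writeIfAbsent(filePath, contents) {
  try {
    fs.writeFileSync(filePath, contents, { flag: 'wx' });
    return 'written';
  } catch (err) {
    if (err.code === 'EEXIST') return 'skipped';
    throw err; // any other error (permissions, missing dir) is a real failure
  }
}

// Usage against a throwaway directory: the second call must not overwrite.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'toolchain-'));
const target = path.join(dir, 'biome.json');
const first = writeIfAbsent(target, '{}\n');
const second = writeIfAbsent(target, '{"changed": true}\n');
console.log(first, second); // written skipped
```

Doing the existence check and the write in one call avoids the gap between a separate `existsSync` check and the write, which matters when the skill and `baldart doctor` could both touch the same file.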
@@ -38,6 +38,14 @@ const cfg = a.config || {}
38
38
  const highRisk = (cfg.paths && cfg.paths.high_risk_modules) || [] // security-domain hint
39
39
  const protocolRef = '.claude/skills/new/references/review-cycle.md'
40
40
 
41
+ // Curated toolchain (since v4.41.0): when features.has_toolchain is on, the
42
+ // consumer records LITERAL gate commands in toolchain.commands.* — agents run
43
+ // THOSE instead of guessing (e.g. `npx biome check .` not `npm run lint`). Empty
44
+ // / absent → silent fallback to the project-standard default. Read from the
45
+ // config the skill already passes via args.config (never hardcoded project facts).
46
+ const tcCmds = (cfg.toolchain && cfg.toolchain.commands) || {}
47
+ const tc = (key, fallback) => (tcCmds[key] && String(tcCmds[key]).trim()) || fallback
48
+
41
49
  // Per-card result accumulator — built up-front (so the early-return guards can return it) and
42
50
  // populated with fixesApplied/residual in the Fix phase.
43
51
  const perCard = {}
@@ -193,8 +201,15 @@ const codexPrompt =
193
201
  `For each finding return: finding_id, title, severity (BLOCKER|HIGH|MEDIUM|LOW), confidence (0-100), evidence (exact file:line + code quote), minimal_fix_direction, and domain (doc|security|migration|code|perf|test). ` +
194
202
  `Run the mandatory false-positive check on every finding and suppress the unconvincing ones (your findings are treated as already FP-validated). Set codexAvailable:true when the review ran.`
195
203
 
204
+ const tcGateLines = [
205
+ ['lint', tcCmds.lint], ['typecheck', tcCmds.typecheck], ['test', tcCmds.test],
206
+ ['build', tcCmds.build], ['audit', tcCmds.audit],
207
+ ].filter(([, v]) => v && String(v).trim());
196
208
  const qaPrompt =
197
- `Run MECHANICAL GATES ONLY over the wave scope, per ${protocolRef} (Phase 3.5 qa-sentinel contract): lint, type-check (when stack uses typescript), the full test suite, build, dependency audit, and markdownlint as applicable. You are a GATE RUNNER: do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'} — cd into it first.\nChanged files:\n${unionScope.join('\n')}`
209
+ `Run MECHANICAL GATES ONLY over the wave scope, per ${protocolRef} (Phase 3.5 qa-sentinel contract): lint, type-check (when stack uses typescript), the full test suite, build, dependency audit, and markdownlint as applicable. You are a GATE RUNNER: do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'} — cd into it first.\nChanged files:\n${unionScope.join('\n')}` +
210
+ (tcGateLines.length
211
+ ? `\n\nThis project configures a curated toolchain — run THESE EXACT commands for the corresponding gates (do not substitute):\n${tcGateLines.map(([k, v]) => ` • ${k}: ${String(v).trim()}`).join('\n')}`
212
+ : '')
198
213
 
199
214
  function simplifyPrompt(c) {
200
215
  return `Simplify analysis (read-only — you do NOT edit, the workflow applies fixes afterward) over ONE card's committed diff, per ${protocolRef} (Phase 2.55). Cover all THREE lenses and return findings:\n` +
@@ -342,7 +357,7 @@ async function applyFixPass(findings, writer, label, role) {
342
357
  `You MAY edit ONLY these files (ownership map — touching anything else is a violation):\n${unionEditable.join('\n')}\n\n` +
343
358
  `Findings to fix (fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
344
359
  findings.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n evidence: ${f.evidence}\n direction: ${f.minimal_fix_direction}`).join('\n') +
345
- `\n\nAfter applying: run \`npm run lint\` and (when the project uses typescript) \`npx tsc --noEmit\` and \`npm run build\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
360
+ `\n\nAfter applying: run \`${tc('lint', 'npm run lint')}\` and (when the project uses typescript) \`${tc('typecheck', 'npx tsc --noEmit')}\` and \`${tc('build', 'npm run build')}\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
346
361
  `Do NOT commit. Do NOT git stash (refs/stash is shared across worktrees). ` +
347
362
  `Return: applied (finding_ids you fixed), unresolved (finding_ids you could NOT fix within the allowed files / 2 retries), and checks (PASS/FAIL/SKIP for lint, tsc, build).`
348
363
  const r = await agent(fixBrief, { label, phase: 'Fix', agentType: writer, schema: FIX_SCHEMA })
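
Editorial sketch (not part of the package diff): the `tc` helper and the `tcGateLines` filter added in the hunks above both treat an empty, whitespace-only, or missing command as "unset". The snippet below re-declares them against a made-up `tcCmds` object to show which entries survive; the sample values are assumptions for illustration only.

```javascript
// Sample config slice (illustrative values, not from any real project).
const tcCmds = {
  lint: 'npx biome check .',
  typecheck: '  npx tsc --noEmit  ', // padded on purpose to show trimming
  test: '',                          // empty string counts as unset
  // build and audit are absent entirely
};

// Same shape as the helper in the diff: configured command or the fallback.
const tc = (key, fallback) => (tcCmds[key] && String(tcCmds[key]).trim()) || fallback;

// Same filter as tcGateLines: keep only gates with a non-blank command.
const gateLines = [
  ['lint', tcCmds.lint], ['typecheck', tcCmds.typecheck], ['test', tcCmds.test],
  ['build', tcCmds.build], ['audit', tcCmds.audit],
]
  .filter(([, v]) => v && String(v).trim())
  .map(([k, v]) => `${k}: ${String(v).trim()}`);

console.log(tc('lint', 'npm run lint'));   // npx biome check .
console.log(tc('test', 'npm test'));       // npm test
console.log(tc('build', 'npm run build')); // npm run build
console.log(gateLines);
```

With this input only `lint` and `typecheck` reach the qa prompt, and when every key is blank the list is empty, so the prompt suffix is omitted and the prompt matches its pre-4.41.0 text.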
@@ -171,28 +171,39 @@ const apiPrompt =
171
171
  const qaPrompt =
172
172
  `Run MECHANICAL GATES ONLY over the batch scope, per ${protocolRef} Step F.3 (qa-sentinel row): lint, type-check, the full test suite, build, dependency audit, and markdownlint as applicable to this project. Do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'}\nChanged files:\n${scope.join('\n')}`
173
173
 
174
- // F-041 — single-card batch: the per-card review (Phase 3) already ran doc-reviewer +
175
- // api-perf-cost-auditor over these exact files, and a 1-card batch has NO cross-card
176
- // conflict to surface. Keep ONLY the cross-model Codex pass (its unique value a different
177
- // model finds different bugs) + qa-sentinel gates; skip the Claude-agent duplicates.
178
- const slim = a.singleCard === true
174
+ // F-041 — single-card batch: a per-card review already covered these exact files and a
175
+ // 1-card batch has NO cross-card conflict to surface, so the duplicate Claude finders can be
176
+ // dropped from the final pass but PER-FINDER, gated on what ACTUALLY ran per-card. The
177
+ // cross-model Codex pass (its unique value — a different model finds different bugs) and the
178
+ // qa-sentinel merge gate ALWAYS run. Callers pass coverage-gated flags:
179
+ // slimDoc — drop the final doc-reviewer (doc-reviewer ran per-card on this single card)
180
+ // slimApi — drop the final api-perf-cost-auditor (it ran per-card on this single card)
181
+ // Backward-compat: a caller passing only `singleCard` (or nothing) gets the old coupled
182
+ // behavior — slimDoc===slimApi===singleCard, and absent ⇒ FULL — a safe default that never
183
+ // silently over-skips. (`new2.js` passes only `singleCard`, so its behavior is unchanged;
184
+ // the `/new` classic skill passes slimApi:false because api-perf NEVER runs per-card there —
185
+ // it is deferred to THIS final pass by design, so this is its only run.)
186
+ const slimDoc = a.slimDoc !== undefined ? a.slimDoc === true : a.singleCard === true
187
+ const slimApi = a.slimApi !== undefined ? a.slimApi === true : a.singleCard === true
179
188
  // qa-sentinel always runs — the merge integrity gate reads its PASS/FAIL table.
180
189
  const reviewThunks = [
181
190
  () => agent(qaPrompt, { label: 'qa-sentinel', phase: 'Review', agentType: 'qa-sentinel', schema: GATES_SCHEMA }).then((r) => ({ kind: 'qa', r })),
182
191
  ]
183
- if (!slim) {
192
+ if (!slimDoc) {
184
193
  reviewThunks.unshift(() => agent(docPrompt, { label: 'doc-reviewer', phase: 'Review', agentType: 'doc-reviewer', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'doc', r })))
194
+ } else {
195
+ log('Review: single-card batch — final doc-reviewer skipped (already ran per-card); kept cross-model Codex + qa gates.')
185
196
  }
186
197
  // Codex thunk runs ONLY when the pre-flight resolved the companion (else: no wasted agent).
187
198
  if (codexResolved) {
188
199
  reviewThunks.unshift(() => agent(codexPrompt, { label: 'codex', phase: 'Review', schema: CODEX_SCHEMA }).then((r) => ({ kind: 'codex', r })))
189
200
  }
190
- // api-perf-cost-auditor: skipped when no API/data files OR on a slim single-card pass.
191
- if (!slim && a.hasApiDataFiles !== false) {
201
+ // api-perf-cost-auditor: skipped when no API/data files OR when it already ran per-card (slimApi).
202
+ if (!slimApi && a.hasApiDataFiles !== false) {
192
203
  reviewThunks.push(() => agent(apiPrompt, { label: 'api-perf-cost-auditor', phase: 'Review', agentType: 'api-perf-cost-auditor', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'api', r })))
193
204
  } else {
194
- log(slim
195
- ? 'Review: single-card batch — doc-reviewer + api-perf skipped (already run per-card); kept cross-model Codex + qa gates.'
205
+ log(slimApi
206
+ ? 'Review: single-card batch — final api-perf-cost-auditor skipped (already ran per-card).'
196
207
  : 'Review: api-perf-cost-auditor skipped (no API/data files in scope).')
197
208
  }
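
Editorial sketch (not part of the package diff): the backward-compatibility rule in the F-041 comment above can be checked in isolation. The wrapper function `resolveSlimFlags` is a hypothetical name introduced here; its two expressions are copied from the added lines.

```javascript
// Hypothetical wrapper around the two expressions from the hunk above.
function resolveSlimFlags(a) {
  const slimDoc = a.slimDoc !== undefined ? a.slimDoc === true : a.singleCard === true;
  const slimApi = a.slimApi !== undefined ? a.slimApi === true : a.singleCard === true;
  return { slimDoc, slimApi };
}

// No flags at all: the full review runs (nothing is skipped).
const none = resolveSlimFlags({});
// Legacy caller passing only singleCard: both finders are dropped together.
const legacy = resolveSlimFlags({ singleCard: true });
// Caller that overrides slimApi: api-perf still runs in the final pass.
const classic = resolveSlimFlags({ singleCard: true, slimApi: false });
// A non-boolean value is never treated as an opt-in to skipping.
const loose = resolveSlimFlags({ slimDoc: 'yes' });

console.log(none, legacy, classic, loose);
```

The strict `=== true` comparison means a truthy non-boolean value such as a string resolves to `false`, so malformed caller arguments err toward running the finder rather than skipping it.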
198
209
 
@@ -215,6 +215,9 @@ const projectBrief = [
215
215
  `Trunk: ${TRUNK}`,
216
216
  `Reference modules (Read these for the EXACT /new semantics — this workflow only sequences them): ${REF}/`,
217
217
  `Config facts: stack=${JSON.stringify(cfg.stack || {})}; features=${JSON.stringify(features)}; high_risk_modules=${JSON.stringify(highRisk)}; paths.backlog_dir=${paths.backlog_dir || '?'}; paths.references_dir=${paths.references_dir || '?'}.`,
218
+ (features.has_toolchain && cfg.toolchain && cfg.toolchain.commands)
219
+ ? `Toolchain commands (run THESE verbatim for the baseline + any gate, per agents/toolchain-protocol.md; empty → project default): ${JSON.stringify(cfg.toolchain.commands)}.`
220
+ : '',
218
221
  FLAGS.effort ? `Reasoning effort for this run: ${FLAGS.effort}.` : '',
219
222
  ].filter(Boolean).join('\n')
220
223
 
@@ -24,6 +24,7 @@ Route agents to the right module with minimal reading.
24
24
  - If touching architecture, auth, or tech stack -> read `agents/architecture.md`.
25
25
  - If touching workflow/process/commits/backlog -> read `agents/workflows.md`.
26
26
  - If CREATING or MUTATING a backlog card (any prefix — `FEAT`/`CHORE`/`BUG`/`DOC`/`PERF`/`UI`), or consuming one type-blind (`/new`, `/new2`) -> read `agents/card-schema.md` (the universal, profile-aware baseline) before writing/validating fields.
27
+ - If running MECHANICAL GATES (lint, format, type-check, test, build, audit) and `features.has_toolchain: true` -> read `agents/toolchain-protocol.md` and run the command from `toolchain.commands.<gate>` verbatim, falling back to the project-standard default when a key is unset. A configured command that FAILS is a real gate failure (do not fall back).
27
28
  - If touching testing or QA issues -> read `agents/testing.md` (also documents the scope-aware,
28
29
  profile-driven test-selection strategy consumed by `qa-sentinel`).
29
30
  - If touching GitHub issues or issue workflow -> read `agents/github-issue-subagent.md`.
@@ -65,6 +66,7 @@ When adding or updating agents, update REGISTRY.md — not this file.
65
66
  - `agents/project-context.md` — Project context protocol: `baldart.config.yml` + overlays + missing-key handling (since v3.0.0)
66
67
  - `agents/code-search-protocol.md` — Retrieval hierarchy for code search: LSP → Grep → Git (since v3.10.0, gated on `features.has_lsp_layer`)
67
68
  - `agents/code-graph-protocol.md` — Structural/relational retrieval via the Graphify code knowledge graph (since v4.21.0, gated on `features.has_code_graph`)
69
+ - `agents/toolchain-protocol.md` — Mechanical-gate command resolution (lint/format/typecheck/test/build/audit) from `toolchain.commands.*` with silent project-standard fallback (since v4.41.0, gated on `features.has_toolchain`)
68
70
  - `agents/design-system-protocol.md` — Registry-first discipline for UI work: BLOCKING cascade on `INDEX.md` + `tokens-reference.md` + `components/<Name>.md` (since v3.11.0, gated on `features.has_design_system`)
69
71
  - `agents/card-schema.md` — Atomic Card Baseline Schema: the universal, profile-aware (epic/child/standalone) field contract every backlog card satisfies, plus the consumer HALT/BACK-FILL/WARN contract (since v4.35.0)
70
72
 
@@ -0,0 +1,80 @@
1
+ # Toolchain Protocol
2
+
3
+ ## Purpose
4
+
5
+ Define how agents/skills run the **mechanical quality gates** — lint, format,
6
+ type-check, test, build, audit — when a project has opted into the curated
7
+ toolchain. The commands are read from `toolchain.commands.*` in
8
+ `baldart.config.yml` and run **verbatim**, instead of each gate hard-coding
9
+ `eslint` / `tsc` / `jest` / `npm run build`. This lets a project standardize on
10
+ the best-in-class tools (Biome for format+lint+import, Vitest, tsc, Lefthook)
11
+ and have every gate flow use them consistently.
12
+
13
+ ## Scope
14
+
15
+ **In**: which command to invoke for each gate; the resolution order; the silent
16
+ fallback. **Out**: code review for correctness/quality (that is the reviewer
17
+ agents' job, not a mechanical gate); installing the tools (that is
18
+ `baldart configure` / `/toolchain-bootstrap` / `baldart doctor`, see
19
+ `framework/docs/TOOLCHAIN-LAYER.md`).
20
+
21
+ ## Gating
22
+
23
+ Conditional on `features.has_toolchain: true` in `baldart.config.yml`. The layer
24
+ degrades **silently and per-command**: for each gate, if
25
+ `toolchain.commands.<gate>` is a non-empty string, run it **exactly**; otherwise
26
+ fall back to the flow's built-in project-standard default (`npx eslint`,
27
+ `npx tsc --noEmit`, `npm test`, `npm run build`, `npm audit`, …). When the flag
28
+ is `false`/missing, every gate uses its default — identical to pre-toolchain
29
+ behavior. **Never block a task** because a command is unset or a tool is missing;
30
+ surface install gaps through `baldart doctor` (`toolchain-install` /
31
+ `toolchain-init-config` actions), not mid-task.
32
+
33
+ ## Resolution hierarchy (per gate)
34
+
35
+ 1. **Explicit config** — `toolchain.commands.<gate>` from `baldart.config.yml`,
36
+ run verbatim. This is the source of truth when set.
37
+ 2. **Project-standard fallback** — the gate's existing default command, inferred
38
+ from the stack (eslint/tsc/jest/npm scripts). Used when the config key is
39
+ empty/absent.
40
+ 3. **Skip** — when neither resolves (e.g. no build script in a library), report
41
+ `SKIP`, never fail.
42
+
43
+ ## Command map
44
+
+ | Gate | Config key | Curated default (Biome/Vitest/tsc) | Fallback |
+ | --- | --- | --- | --- |
+ | Lint | `toolchain.commands.lint` | `npx biome check .` | `npx eslint --max-warnings=0 <files>` |
+ | Format | `toolchain.commands.format` | `npx biome format --write .` | `npx prettier --write <files>` |
+ | Type-check | `toolchain.commands.typecheck` | `npx tsc --noEmit` | `npx tsc --noEmit` |
+ | Test | `toolchain.commands.test` | `npx vitest run` | `npm test` |
+ | Test (related) | `toolchain.commands.test_related` | `npx vitest related --run <files>` | `npx jest --findRelatedTests <files>` |
+ | Build | `toolchain.commands.build` | `npm run build` | `npm run build` |
+ | Audit | `toolchain.commands.audit` | `npm audit --audit-level=high` | `npm audit` |
54
+
55
+ Biome's `check` runs lint + format-check + import-organize in one pass, so the
56
+ `lint` gate alone covers what a separate ESLint+Prettier+import-plugin trio
57
+ would; the `format` command only `--write`s.
58
+
59
+ ## Consumers
60
+
61
+ `qa-sentinel` (gate runner), `/qa`, `/check`, the `coder` post-fix re-verify,
62
+ and the `/new` + `/new2` review workflows resolve gate commands through this
63
+ protocol. The `/new` workflow scripts receive the resolved config via their
64
+ `args.config` payload (the consuming skill passes `baldart.config.yml`) — they
65
+ must never hard-code project facts (see the workflows contamination contract).
66
+
67
+ ## Fallback rules
68
+
69
+ - A configured command that EXITS NON-ZERO is a genuine gate **FAIL** — do not
70
+ silently fall back to the default (that would mask a real failure). Fallback
71
+ applies only when the command is **unset**, not when it fails.
72
+ - A configured command whose binary is missing (`command not found`) is a setup
73
+ gap → report it and fall back to the default for that single run, then let
74
+ `baldart doctor` repair the install. Never abort the task.
75
+
76
+ ## See also
77
+
78
+ - `framework/docs/TOOLCHAIN-LAYER.md` — install lifecycle, adapters, plumbing.
79
+ - `agents/testing.md` — scope-aware, profile-driven test selection (the `test` /
80
+ `test_related` gates compose with it).
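
Editorial sketch (not part of the package diff): the resolution hierarchy and fallback rules of the protocol above can be summarized as two small functions. `resolveGate` and `classifyConfiguredRun`, their return shapes, and the `defaults` map are all assumptions made for illustration; the protocol is written for agents and the package may not contain an equivalent function.

```javascript
// Hypothetical: pick the command for one gate, in the documented order
// (explicit config, then project-standard default, then SKIP).
function resolveGate(config, gate, defaults) {
  const enabled = Boolean(config.features && config.features.has_toolchain);
  const commands = (config.toolchain && config.toolchain.commands) || {};
  const configured = enabled && commands[gate] ? String(commands[gate]).trim() : '';
  if (configured) return { gate, command: configured, source: 'config' };
  if (defaults[gate]) return { gate, command: defaults[gate], source: 'default' };
  return { gate, command: null, source: 'skip' };
}

// Hypothetical: interpret the outcome of running a CONFIGURED command.
// A non-zero exit is a real FAIL; only a missing binary triggers fallback.
function classifyConfiguredRun({ exitCode, binaryMissing }) {
  if (binaryMissing) return 'FALLBACK_TO_DEFAULT';
  return exitCode === 0 ? 'PASS' : 'FAIL';
}

const config = {
  features: { has_toolchain: true },
  toolchain: { commands: { lint: 'npx biome check .', test: '   ' } },
};
const defaults = { lint: 'npx eslint --max-warnings=0 .', test: 'npm test' };

const lint = resolveGate(config, 'lint', defaults);   // explicit config wins
const test = resolveGate(config, 'test', defaults);   // blank value, so default
const build = resolveGate(config, 'build', defaults); // neither resolves, so skip
console.log(lint.source, test.source, build.source);  // config default skip
console.log(classifyConfiguredRun({ exitCode: 1, binaryMissing: false })); // FAIL
```

Keeping "which command" separate from "how to read its result" mirrors the protocol's distinction: fallback is a property of an unset key or a missing binary, never of a command that ran and failed.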