npm - baldart - Versions diffs - 4.43.0 → 4.45.0 - Mend

baldart 4.43.0 → 4.45.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19) hide show

package/CHANGELOG.md CHANGED Viewed

@@ -5,6 +5,46 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.45.0] - 2026-06-15
+**The main-repo-root (`$MAIN`) resolution across `worktree-manager` + `/prd` + `/new` is unified onto one correct, `separate-git-dir`-safe canonical — fixing two latent bugs while *refuting* the obvious "unify on `--git-common-dir`" fix the deferred plan proposed.** v4.42.0 deferred the `$MAIN` cleanup after an adversarial pass flagged it as a likely trap; this release does it properly. The root resolution lived in ~5 forms across three genuine execution contexts (main-checkout cwd, worktree cwd, the `allocate-id.sh` startup), and the deferred plan wanted to collapse them onto `git rev-parse --git-common-dir` + `/..` (the recipe already in `allocate-id.sh` `resolve_main()`). A 3-skeptic adversarial review **before implementation** demolished that plan with git experiments: (1) `--git-common-dir` + parent returns the *parent of the git dir*, which is the WRONG directory under `git init --separate-git-dir` (the git dir lives outside the working tree) — so the "canonical" recipe was itself fragile, and `resolve_main()` only worked because BALDART repos use in-tree `.git`; (2) the alleged `prd/SKILL.md` Step-1 bug ("`--show-toplevel` from inside a worktree") does **not** exist — Step 1 runs only at fresh kickoff on the main checkout, and resume reads the persisted `main_path`, so `--show-toplevel` is correct there *and* is the only form that survives `separate-git-dir`; (3) `git worktree list` head is also not `separate-git-dir`-safe (returns the git dir). The corrected design keeps the task's **context classification** but fixes the **primitive**: every site resolves the root through `--show-toplevel` (the true working-tree root in all cases), using `git -C .. rev-parse --show-toplevel` from a worktree (its parent `.worktrees/` lives inside the main repo). `resolve_main()` is rewritten as the canonical reference (detects a linked worktree via `git-dir != git-common-dir`, walks one level up) — **byte-identical output to the old form for in-tree repos** (verified), correct for `separate-git-dir`, and fails cleanly (`return 1`) instead of emitting a garbage path. Two real fragilities fixed along the way: the `/mw` `$MAIN` fallback (`--git-common-dir` + `/..` under `git -C`, relative-base hazard + `separate-git-dir` break) and the `3b` card-sync (`--show-superproject-working-tree || pwd`, which returned the SUPERPROJECT for a real submodule and only worked otherwise by a cwd accident). All recipes dogfooded across normal + `separate-git-dir` repos in every cwd context. **MINOR** (hardening + bug fixes across skills/agents; no removed surface, **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A).
+### Fixed
+- **`framework/.claude/skills/worktree-manager/scripts/allocate-id.sh` — `resolve_main()` rewritten `--show-toplevel`-based.** Now resolves the working-tree root (correct under `git init --separate-git-dir`, where `--git-common-dir` + parent pointed at the wrong directory), detects a linked worktree (`git-dir != git-common-dir`) and walks one level up. Identical output to the prior form for in-tree repos; returns non-zero instead of a garbage path when not in a git repo.
+- **`framework/.claude/skills/worktree-manager/SKILL.md` — `/mw` step 4c `$MAIN` fallback** no longer derives the root via `git -C "$WORKTREE_PATH" rev-parse --git-common-dir` + `/..` (parent-of-git-dir, wrong under a separate git dir; appending `/..` to a possibly-relative `--git-common-dir` under `git -C` resolves against the wrong base). It `cd`s into the worktree then `git -C .. rev-parse --show-toplevel`, with a loud guard replacing the silent failure path.
+- **`framework/.claude/skills/worktree-manager/SKILL.md` — `3b. Sync untracked cards`** drops the `git -C "$WORKTREE_PATH" --show-superproject-working-tree || pwd` form (returned the *superproject* for a real git submodule, where the cards do not live; only resolved otherwise by relying on cwd). At step 3b cwd is still the main checkout, so it now uses `git rev-parse --show-toplevel` — correct for normal, submodule, and `separate-git-dir` repos.
+### Changed
+- **`framework/.claude/skills/worktree-manager/SKILL.md` — new `## Resolving the main repo root` section** documenting the three execution contexts, the correct primitive per context, and an explicit "do NOT use `--git-common-dir` + `/..`" rule (codifying the adversarial finding so the next maintainer doesn't re-attempt the refuted unification). The `/nw` step-1, nw-docs step-0, and `/nw` env-copy sites gain cross-ref comments; env-copy keeps `git -C .. --show-toplevel` but replaces its silent `|| echo '../..'` fallback with a loud guard.
+- **`framework/.claude/skills/prd/references/validation-phase.md` + `framework/.claude/agents/{prd,prd-card-writer}.md`** — the legacy `$MAIN` fallback and the descriptive `$MAIN` comments switch from `--git-common-dir` to the canonical `--show-toplevel` resolution (and point at the new section). `prd/SKILL.md` Step 1 is deliberately **unchanged** — `--show-toplevel` there is correct (refuted "bug").
+- **`framework/.claude/skills/new/references/setup.md` — `$MAIN` resolution corrected**: from a worktree-invocation, resolve via the worktree's parent toplevel rather than `--show-superproject-working-tree` (a submodule primitive) or `git worktree list` (not `separate-git-dir`-safe).
+## [4.44.0] - 2026-06-15
+**The toolchain layer (v4.41.0) now reaches the *execution-side* gates too: the worktree baseline/merge builds and the classic `/new` reference suite stop hard-coding `npx tsc`/`npx eslint`/`npm run build` and run the consumer's configured `toolchain.commands.*` instead.** v4.41.0 wired the *review* gates (`qa-sentinel`, `coder`, the `/new`+`/new2` review workflows) through `toolchain.commands.*`, but a dozen gates that actually run a build/lint/typecheck were still hard-coded — so a Biome/Vitest consumer's worktree baseline silently ran `eslint`, and the classic `/new` per-card + final gates ran `npm run build` regardless of config. This release makes those gates toolchain-aware while preserving each site's existing **severity** policy (a configured command failing is the same STOP/continue verdict the default produced — the protocol's no-fallback rule governs only *which* command runs, never what to do with its exit code). The mechanism follows the **execution context**: a markdown **prose note** where a full model writes the gate command (the `/new` references, `/bug`, `/simplify`), and an inline **shell resolver** in `worktree-manager` — which has no `args.config` and whose baseline runs in a weak/background subagent, so it resolves `toolchain.commands.*` from `baldart.config.yml` on disk the same way it already resolves `git.trunk_branch`, rather than relying on a prose note a weak model could skip. The design was adversarially reviewed before implementation (3 skeptics), which corrected the initial plan on three points: (1) the `/mw` best-effort `npm run test 2>/dev/null || true` keeps its `|| true` swallow (never a STOP trigger); (2) the `/mw` changed-files lint scope is preserved as the default (a configured whole-tree command like `biome check .` runs by the consumer's contract); (3) two missed twins — `/bug` and `/simplify` verify steps — were added to scope. **Declared debt (intentionally untouched):** `npm install` stays hard-coded (there is no `toolchain.commands.install` key — adding one would be a 5-layer schema change; keeping this MINOR), `markdownlint` is not in the curated map, and the descriptive `(tsc+lint+build)` labels in `new2.js`/`setup.md` are not load-bearing (the `new2` projectBrief already injects the verbatim instruction — refuted by review). **MINOR** (additive: more gates honor the existing `features.has_toolchain` + `toolchain.commands.*`; no removed surface, **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A).
+### Changed
+- **`framework/.claude/skills/worktree-manager/SKILL.md` — gates toolchain-aware via an inline shell resolver.** New `## Toolchain-aware gates` convention + a Project-Context dependency on `features.has_toolchain` + `toolchain.commands.{typecheck,lint,build,test}`. Each gate block (`/nw` step 5 baseline, `/mw` step 2 pre-merge, step 4b rebase build, step 5 post-merge) inlines a `_tc()` resolver that reads the config from disk (mirroring the existing `git.trunk_branch`/`git.merge_strategy` grep idiom) and runs the configured command via `eval "${VAR:-<default>}"`, falling back to the hard-coded default when unset/flag-off. Three carve-outs documented: `npm install` is not a gate key, severity is unchanged, and the `/mw` best-effort `test` line keeps its `|| true` swallow.
+- **`framework/.claude/skills/new/SKILL.md` — new `## Toolchain gates` core invariant** (cited as `§ "Toolchain gates"` by the reference modules, same pattern as `§ "Context economy"`): when `has_toolchain`, the mechanical gates use `toolchain.commands.<gate>` verbatim, a non-zero configured command is a real FAIL (no masking fallback), and `npm install`/`markdownlint` are not toolchain keys.
+- **`framework/.claude/skills/new/references/{implement,review-cycle,completeness,final-review,commit,team-mode}.md` — gate sites cite `§ "Toolchain gates"`** so the classic `/new` per-card (Phase 1 verify, Phase 2.55/2.5x re-runs, Phase 3 doc re-verify, completeness gap re-run, commit pre-check), team-mode (per-card coder briefing, group build, group simplify re-run), and final-review build all resolve `toolchain.commands.*` verbatim when the flag is on.
+- **`framework/.claude/skills/{bug,simplify}/SKILL.md` — standalone verify steps toolchain-aware** (the two twins surfaced by the adversarial scope review): `/bug` PHASE 5 regression checks and `/simplify` Step 5 verify gain a self-contained Toolchain note + a Project-Context dependency on `features.has_toolchain` + `toolchain.commands.{lint,typecheck,test}`.
+- **`framework/agents/toolchain-protocol.md` — Consumers section updated** to record the execution-side gates and the two resolution mechanisms (prose note vs on-disk shell resolver), and to restate the `npm install`/`markdownlint` carve-outs.
+## [4.43.1] - 2026-06-15
+**The npm publish workflow is now race-safe: pushing several `v*.*.*` tags close together can no longer leave `latest` on a lower version.** Publishing v4.41.0/4.42.0/4.43.0 together exposed the bug — the three tag-push workflow runs executed in **parallel**, and `npm publish` (with no explicit dist-tag) points `latest` at whichever version finishes **last chronologically**, not the highest. Real outcome: `latest` landed on **4.42.0**, so `npx baldart` installed a non-latest version (fixed by hand with `npm dist-tag add baldart@4.43.0 latest`). This release closes the race two ways: (1) a workflow-level **`concurrency` group** (`group: publish-npm`, `cancel-in-progress: false`) serializes all publish runs so they never race on the dist-tag and a publish is never cancelled; (2) a new final step **reconciles `latest` to the highest published version** — it computes the max stable semver across `npm view baldart versions` (unioned with the just-published `VERSION` to survive registry propagation lag, numeric per-segment compare so `4.10.0 > 4.9.0`) and repoints `latest` only when it has drifted. Because the runs are serialized, the last run always leaves `latest` on the maximum regardless of push order. The existing "Verify tag matches VERSION" guard is untouched. **PATCH** (CI/release-machinery bugfix, no change to installed surface or `baldart.config.yml` ⇒ schema-change propagation rule N/A).
+### Fixed
+- **`.github/workflows/publish-npm.yml` — race-safe `latest` dist-tag.** Added a `concurrency` group to serialize parallel tag-push publish runs, and a `Reconcile 'latest' dist-tag to highest published version` step that repoints `latest` at `max(npm view baldart versions ∪ VERSION)` via a self-contained numeric semver comparator (no `semver` dependency), only when `latest` drifts. Fixes `latest` landing on a lower version when multiple tags are pushed close together.
+### Changed
+- **`MAINTAINING.md` — release protocol note** that pushing multiple tags at once is now safe (since v4.43.1) and why.
 ## [4.43.0] - 2026-06-15
 **The classic `/new` team-mode director stops bloating its own context by legitimately running the review cluster inline — the delegation gate is hardened to the same no-discretion enforcement the sequential path already carries.** A real run analysis (the same FEAT-0028/0029 investigation that produced v4.42.0) found the dominant orchestrator-context driver in team mode is NOT the review token volume (that already runs out-of-context in the `new-card-review` workflow) but the **director's turn count**: the team-mode delegation gate `D.1.6` was labelled "opt-in, additive" with a soft `IF available → delegate`, while the sequential path (`review-cycle.md` Phase 2.5x) carries a strict **MECHANICAL · NO discretion · MUST delegate · GATE VIOLATION if inline** clause. That asymmetry let a director legitimately run the chattiest sub-steps (the per-card Codex loop + per-card Simplify fan-out) inline, re-reading a growing prefix every turn. This release ports the strong clause verbatim into `D.1.6`, plus two pure-distillation fixes (batch the D.1.5 per-card diffs to `/tmp` and read only the compact summary; the same diffs-to-disk discipline on the sequential path for symmetry). Also lands the one **safe** survivor of the deferred worktree-prose cleanup: the `/nw` code-mode now **auto-appends** `.worktrees/` to `.gitignore` (idempotent + WARN) instead of leaving a programmatic `/new` run to create a git-tracked worktree. An adversarial review refuted the rest of that cleanup and a separate review-gate proposal (per-card AC-conformance gate + a second relevance-gating layer) as net-negative/regressive — both deferred. **MINOR** (FIX A changes which path runs by default in team mode — an observable behavior change; no removed surface, no new `baldart.config.yml` key ⇒ schema-change propagation rule N/A).

package/VERSION CHANGED Viewed

	@@ -1 +1 @@
1	- 4.43.0
1	+ 4.45.0

package/framework/.claude/agents/prd-card-writer.md CHANGED Viewed

@@ -311,7 +311,8 @@ applies even when N=1.
   worktree on this machine (lock + shared high-water mark under `.worktrees/`):
   ```bash
-  # $MAIN is the main repo root: dirname of `git rev-parse --git-common-dir`.
+  # $MAIN is the persisted main repo root (see worktree-manager § "Resolving the
+  # main repo root" — via --show-toplevel, NOT --git-common-dir).
   ALLOC="$MAIN/.claude/skills/worktree-manager/scripts/allocate-id.sh"
   if [ -x "$ALLOC" ]; then
     N="$("$ALLOC" reserve FEAT "<slug>")"   # e.g. prints 0024

package/framework/.claude/agents/prd.md CHANGED Viewed

@@ -504,7 +504,9 @@ parallelism (concurrent sessions on sibling worktrees both pick the same "next"
 integer and conflict at merge):
 ```bash
-# $MAIN = dirname of `git rev-parse --git-common-dir` (shared across worktrees).
+# $MAIN = the persisted main repo root (the same value SKILL.md Step 1 wrote; see
+# worktree-manager § "Resolving the main repo root" — resolved via --show-toplevel,
+# NOT --git-common-dir, which breaks under git init --separate-git-dir).
 ALLOC="$MAIN/.claude/skills/worktree-manager/scripts/allocate-id.sh"
 [ -x "$ALLOC" ] && N="$("$ALLOC" reserve FEAT <slug>")   # also BUG | UI | DOC | PERF
 ```

package/framework/.claude/skills/bug/SKILL.md CHANGED Viewed

@@ -17,7 +17,7 @@ Argument: optional bug description (e.g., `/bug feature X is not saving`).
 ## Project Context
-**Reads from `baldart.config.yml`:** `paths.design_system`, `paths.references_dir`, `paths.wiki_dir` (Phase 0 `rg` lookup target).
+**Reads from `baldart.config.yml`:** `paths.design_system`, `paths.references_dir`, `paths.wiki_dir` (Phase 0 `rg` lookup target), `features.has_toolchain` + `toolchain.commands.{lint,typecheck,test}` (the PHASE 5 regression gates run those verbatim when the flag is on).
 **Gated by features:** `features.has_design_system` (when `true`, Phase 4's design-system reads become BLOCKING for UI-touching bugs).
 **Overlay:** loads `.baldart/overlays/bug.md` if present — project-specific debug entry points (e.g. SWR debug switches, env summary helpers, error-code modules). The base skill stays generic; project-specific code paths live in the overlay.
 **On missing/empty keys:** ask the user; do not assume defaults. See `framework/agents/project-context.md` § 3.
@@ -202,6 +202,8 @@ If the project exposes an env-summary helper (listed in `.baldart/overlays/bug.m
 ## PHASE 5: VERIFY & CLEAN UP
+> **Toolchain:** when `features.has_toolchain: true` in `baldart.config.yml`, run `toolchain.commands.{typecheck,lint,test}` **verbatim** instead of the defaults below — per `framework/agents/toolchain-protocol.md`. Empty/absent key (or flag off) → the default. A configured command that exits non-zero is a real FAIL (do not fall back).
 1. Reproduce original scenario — confirm fix
 2. Check regressions: `npx tsc --noEmit` + `npx eslint --max-warnings=0 <changed-files>`
 3. If tests exist: `npm run test`

package/framework/.claude/skills/new/SKILL.md CHANGED Viewed

@@ -243,6 +243,24 @@ baselines. Keep that bulk on disk and pass **paths**, not bodies.
 ---
+## Toolchain gates
+When `features.has_toolchain: true` in `baldart.config.yml`, every mechanical gate
+this skill runs (`lint`, `typecheck`, `test`, `build`) uses the consumer's
+configured command from `toolchain.commands.<gate>` **verbatim** instead of the
+`npm`/`npx` default shown at the call site — per `framework/agents/toolchain-protocol.md`.
+Resolve per gate: a non-empty `toolchain.commands.{lint,typecheck,test,build}` wins;
+empty/absent (or the flag off/missing) → the default at the call site, identical to
+pre-toolchain behavior. A configured command that exits non-zero is a real gate
+**FAIL** — do NOT then run the default (that would mask the failure); fall back only
+when the key is **unset**. `npm install` and `markdownlint` are NOT toolchain keys
+(there is no `commands.install`, and markdownlint is not in the curated map) — they
+stay as written. The gate-output discipline (§ "Context economy" → redirect to disk,
+surface only the exit code) applies to the resolved command exactly as to the default.
+Reference modules cite this section as `§ "Toolchain gates"`.
+---
 ## Routing — il per-card pipeline e i moduli on-demand
@@ -250,9 +268,9 @@ baselines. Keep that bulk on disk and pass **paths**, not bodies.
 sistema, ri-letto a OGNI turno. Per non pagare 60k+ token di istruzioni di fase a
 ogni turno, il dettaglio passo-passo di ogni fase vive in un **modulo `references/<x>.md`**
 caricato on-demand. Questo file (il core) tiene solo gli invarianti cross-fase
-(Context Tracking, Progress Visibility, § "Context economy", Agent Routing, QA
-Profile, Trivial fast-lane, Risk-signal detector, Fix Application Log) + questa
-mappa di navigazione.
+(Context Tracking, Progress Visibility, § "Context economy", § "Toolchain gates",
+Agent Routing, QA Profile, Trivial fast-lane, Risk-signal detector, Fix Application
+Log) + questa mappa di navigazione.
 > **HARD RULE — leggi il modulo PRIMA di eseguire la fase.** Quando entri in una
 > fase, **Read** il suo modulo `references/<x>.md` e poi eseguilo. Eseguire una
@@ -260,9 +278,9 @@ mappa di navigazione.
 > compaction che l'ha evacuato) è una violazione di protocollo: ricaricalo. Registra
 > nel tracker, sotto `## Current Card`, il campo `phase_module_loaded: <modulo>` al
 > caricamento, così la § "Context recovery protocol" sa cosa ri-leggere dopo una
-> compaction. I `§ "..."` citati dai moduli (Context economy, Context Tracking,
-> Trivial-card fast-lane, Risk-signal detector, Fix Application Log) puntano a sezioni
-> che vivono **qui nel core** → risolvono sempre.
+> compaction. I `§ "..."` citati dai moduli (Context economy, Toolchain gates, Context
+> Tracking, Trivial-card fast-lane, Risk-signal detector, Fix Application Log) puntano a
+> sezioni che vivono **qui nel core** → risolvono sempre.
 **Sequenza (per ogni card, in ordine — i moduli per-card sono caricati per traversata):**

package/framework/.claude/skills/new/references/commit.md CHANGED Viewed

@@ -6,7 +6,7 @@
 > Sequential-mode global step numbering resumes here at 26 (Phase 3.5 ended at 25; Phase 3.7 used its own local C.0–C.6 counter). The tracker phase-string `4-commit` therefore maps to step 26, NOT a second step 25.
-26. **Update tracker**: phase = "4-commit". **Entry assertion** — before committing, verify the Phase 3.7 e2e re-run obligation was honored: read the tracker for `e2e-rerun: triggered` / `e2e-rerun: not-needed`. If Phase 3.7 touched UI files but no `e2e-rerun` entry exists, do NOT commit yet — go run the re-run per Phase 3.7 step 6 first. Also confirm Phase 3.5/3.7 fixes did not leave lint/tsc broken: if the Phase 3.7 fix sub-loop applied any patch, run `npm run lint` + `npx tsc --noEmit` (when typescript) once before committing (redirect to disk per § "Context economy").
+26. **Update tracker**: phase = "4-commit". **Entry assertion** — before committing, verify the Phase 3.7 e2e re-run obligation was honored: read the tracker for `e2e-rerun: triggered` / `e2e-rerun: not-needed`. If Phase 3.7 touched UI files but no `e2e-rerun` entry exists, do NOT commit yet — go run the re-run per Phase 3.7 step 6 first. Also confirm Phase 3.5/3.7 fixes did not leave lint/tsc broken: if the Phase 3.7 fix sub-loop applied any patch, run `npm run lint` + `npx tsc --noEmit` (when typescript) once before committing — when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck}` verbatim (§ "Toolchain gates") — (redirect to disk per § "Context economy").
 27. Stage and commit **all changes together** in the worktree using format `[CARD-ID] Brief description` (MUST per AGENTS.md). Include all relevant files — implementation, review fixes, QA-driven fixes, and doc updates in a single commit. Do NOT merge or push yet — that happens post-batch.
     - **IMPORTANT — explicit staging**: NEVER use `git add -A` or `git add .`. Always stage files by explicit name:
       ```bash

package/framework/.claude/skills/new/references/completeness.md CHANGED Viewed

@@ -118,7 +118,7 @@ Before triggering any review, you MUST verify that the coder agent implemented *
   - The exact list of unimplemented items (copy the checklist rows)
   - The file-ownership restrictions from `## File Ownership Map`
   - The instruction: "Implement ONLY these missing items. Do not refactor or expand scope."
-- After the fix agent completes, re-run the static gates the fix could have broken — `npm run lint`, `npx tsc --noEmit` (when `stack.language` includes typescript), and `npm test` — not just build + lint (a gap-fix can introduce a type error or break a test that the earlier Phase 2 gate had passed). Redirect each to `/tmp/<gate>-<CARD-ID>.txt` per § "Context economy" (never inline).
+- After the fix agent completes, re-run the static gates the fix could have broken — `npm run lint`, `npx tsc --noEmit` (when `stack.language` includes typescript), and `npm test` (when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck,test}` verbatim — § "Toolchain gates") — not just build + lint (a gap-fix can introduce a type error or break a test that the earlier Phase 2 gate had passed). Redirect each to `/tmp/<gate>-<CARD-ID>.txt` per § "Context economy" (never inline).
 - Re-verify each fixed item against the code — do NOT trust the agent's self-report.
 - Repeat this sub-loop up to **2 times** (per-item budget, shared with step 0 — see "Loop-counter scope"). After 2 loops, if items remain Partial or Missing:
   - Log in `## Issues & Flags`: list each unimplemented requirement.

package/framework/.claude/skills/new/references/final-review.md CHANGED Viewed

@@ -258,7 +258,7 @@ that is a **gate violation**: log it as
     - **`security`-domain findings** (path in `paths.high_risk_modules`, or RLS-policy SQL) → route to **security-reviewer** in write mode (canonical writer map v4.26.1 — it owns the security-invariant contract a coder lacks; NEVER route security fixes to coder). **`migration`-domain findings** (SQL under the migrations dir) → route to **coder**. For both, apply the Sub-agent failure protocol's STOP-on-crash rule (never inline-fallback on a security/migration fix). These are NOT collapsed into a generic "everything else" bucket.
     - **All remaining findings** (other code, perf, test) → invoke the **coder** agent once to apply them in a single pass.
     Run in the order doc-reviewer → security-reviewer → coder (skip any whose partition is empty). Pass only the verified findings, not false positives.
-12. Run final build: `npm run lint && npx tsc --noEmit && npm run build` (redirect each to `/tmp/final-<gate>.txt` per § "Context economy"; surface only exit code + bounded extract on failure).
+12. Run final build: `npm run lint && npx tsc --noEmit && npm run build` (when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck,build}` verbatim — § "Toolchain gates"; redirect each to `/tmp/final-<gate>.txt` per § "Context economy"; surface only exit code + bounded extract on failure).
     If any check fails, apply self-healing retry loop (up to 3 times).
 13. **Update tracker** with final review results:
     - Review engine: Codex (a non-Anthropic frontier model, resolved at runtime by `codex-companion.mjs`) (primary) | Claude code-reviewer (fallback)

package/framework/.claude/skills/new/references/implement.md CHANGED Viewed

@@ -335,6 +335,7 @@
    ```
 8. **Run the verification gates and CAPTURE their output to disk** (so step 9 can pass it to a fix agent) — **redirect, never `tee`/stream inline** (per § "Context economy" → Gate-output discipline). Each is its own gate:
+   When `features.has_toolchain: true`, substitute each command below with `toolchain.commands.{lint,typecheck,test,build}` run verbatim (§ "Toolchain gates"; defaults shown are the fallback when a key is unset).
    ```bash
    cd <worktree-path>
    npm run lint        > /tmp/lint-<CARD-ID>.txt  2>&1; echo "lint:$?"

package/framework/.claude/skills/new/references/review-cycle.md CHANGED Viewed

@@ -117,7 +117,7 @@ After completeness is verified, clean up the implementation before it reaches re
    **Telemetry (Fix Application Log)** — for EVERY finding (valid OR skipped) append one row to the tracker's `## Fix Application Log` section per the schema above. Use `domain=simplify-{reuse|quality|efficiency}` matching the originating agent. Include the `severity` trailing key. Inline: `decision=inline | applied_by=orchestrator | est_lines=<n> | severity=<HIGH|MEDIUM> | finding=<1-line>`. Delegated (domain-override): `decision=<coder|doc-reviewer> | applied_by=<coder|doc-reviewer> | est_lines=<n> | severity=<...> | finding=<1-line>`. Skipped: `decision=skipped | applied_by=- | est_lines=0 | reason=<false-positive|not-worth-addressing>`.
-5. After all fixes, run `npm run lint` and `npx tsc --noEmit` to confirm nothing broke (redirect to disk per § "Context economy"; surface only exit code + a bounded extract on failure).
+5. After all fixes, run `npm run lint` and `npx tsc --noEmit` (when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck}` verbatim — § "Toolchain gates") to confirm nothing broke (redirect to disk per § "Context economy"; surface only exit code + a bounded extract on failure).
    If either fails, fix the regression (up to **2 retries**). **If it still fails after 2 retries**: do NOT silently continue to Phase 2.6 with a broken tree — log the failure in `## Issues & Flags` as `[SIMPLIFY-REGRESSION]` and invoke `AskUserQuestion` (revert the simplify fixes / keep and have me fix manually / stop the card), mirroring the Phase 3.5 escalation.
 6. **Update tracker**: phase = "2.55-simplify DONE", log count of fixes applied (or "clean — 0 fixes").
@@ -288,7 +288,7 @@ skill's Phase 1 falls back to deriving Gherkin scenarios from
     Doc-reviewer applies all doc-domain fixes itself. The orchestrator does NOT spawn a coder for doc fixes (since v3.40.0 — `doc` is owned by `doc-reviewer`, see "Domain-Override Domains"). The only doc-reviewer output that leaves this phase unfixed is a **doc-drift→bug finding rooted in CODE** (the implementation contradicts a documented contract). Route it explicitly: if the conflicting code file matches the `security` Domain-Override match rule (`paths.high_risk_modules`) → spawn `security-reviewer` with the finding now, in this phase (a security-class code fix is not deferrable to a `light` Phase 3.7, and security is owned by `security-reviewer` — never a coder); otherwise carry the finding into the Phase 3.7 `/codexreview` input as a known code-drift bug and let the Phase 3.7 fix sub-loop apply it. Either way, append a Fix Application Log row with `domain=codex-correctness` (NOT `doc`) so telemetry attributes it as a code fix. Do NOT leave it accumulating in the tracker with no fix owner.
 14. **Knowledge-corpus sync (OPTIONAL — only if the project ships a corpus-sync agent)**: There is NO shipped `obsidian-sync` agent — do NOT dispatch one (a hard dispatch to a non-existent subagent fails silently). Only when the project provides its own knowledge-corpus sync agent (declared in `.baldart/overlays/new.md`) AND doc-reviewer's findings indicate a corpus impact, invoke that agent with the listed paths after the doc fixes are applied. Otherwise skip with a one-line notice (`knowledge-corpus sync: skipped (no corpus-sync agent configured)`). Non-blocking either way.
 15. **Telemetry** — after doc-reviewer returns, append one row per doc finding to `## Fix Application Log`: `3 | doc | est_lines=<n> | decision=doc-reviewer | applied_by=doc-reviewer | finding=<1-line>`. If 0 findings, append one row: `3 | doc | est_lines=0 | decision=skipped | applied_by=- | reason=no-findings`. **Phase-8 producer (named counter)** — ALSO record the per-card doc-gap counts as a structured line in `## Current Card` (carried into `## Completed Cards` at Phase 5): `doc_gaps: found=<N> fixed=<M>` where `N` = total doc findings doc-reviewer raised and `M` = those it applied. This is the single named producer for Phase 8's `doc_gaps_found` / `doc_gaps_fixed` fields — without it those fields have no upstream write and Phase 8 would hard-code zeros. (D.4a is the team-mode producer of the same counter — see Phase 7 § D.4a.)
-16. Run `npm run lint` and `npx tsc --noEmit` (when `stack.language` includes typescript) to verify nothing broke (redirect to disk per § "Context economy"). If doc-reviewer touched any source-adjacent file (a `.ts`/`.tsx` helper, a co-located doc export), also run `npm run build`. If any check fails, apply the self-healing retry loop (up to 3 times, no user prompt). **If still failing after 3 retries**: do NOT fall through silently to Phase 3.5 — log `[DOC-PHASE-REGRESSION]` in `## Issues & Flags` and invoke `AskUserQuestion` (revert the doc-phase edits that broke the build / keep and fix manually / stop the card).
+16. Run `npm run lint` and `npx tsc --noEmit` (when `stack.language` includes typescript) — when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck,build}` verbatim (§ "Toolchain gates") — to verify nothing broke (redirect to disk per § "Context economy"). If doc-reviewer touched any source-adjacent file (a `.ts`/`.tsx` helper, a co-located doc export), also run `npm run build`. If any check fails, apply the self-healing retry loop (up to 3 times, no user prompt). **If still failing after 3 retries**: do NOT fall through silently to Phase 3.5 — log `[DOC-PHASE-REGRESSION]` in `## Issues & Flags` and invoke `AskUserQuestion` (revert the doc-phase edits that broke the build / keep and fix manually / stop the card).
 17. **Telemetry for the step-16 self-heal** — if the retry loop spawned any fix (a code edit to recover from a doc-phase regression), append a Fix Application Log row for it AFTER the loop settles (the step-15 doc telemetry row was written before this loop ran, so it does not capture step-16 fixes). Then update tracker: phase = "3-doc-review DONE", log doc findings count, fixes applied.
     If doc-reviewer found a recurring gap, append 1-line to `## Lessons Learned`:
     `DOC: <pattern>`

package/framework/.claude/skills/new/references/setup.md CHANGED Viewed

@@ -15,7 +15,7 @@
    - Resolve `$TRUNK` = `git.trunk_branch` from `baldart.config.yml`. **When the key is absent** (consumer updated to ≥4.0.0 without re-running `configure`), do NOT hard-assume `develop` — autodetect the repo's real default branch exactly as worktree-manager does, so `/new` and `nw` agree on the base: `git -C "$MAIN" symbolic-ref --quiet refs/remotes/origin/HEAD` (strip `refs/remotes/origin/`), else the first existing local branch among `develop` / `main` / `master`. A `main`-trunk repo defaulted to `develop` here would diverge from the worktree base `nw` picks and break every `git diff "$TRUNK...HEAD"` gate. Only HALT ("Trunk branch unresolved — run `npx baldart configure`") if nothing resolves. Persist the resolved value as `Trunk branch:` in the tracker `## Worktree` section. **Every later Phase 0 / Phase 6c bash snippet that references the integration trunk MUST use `$TRUNK`, never a baked-in `develop`.** Begin every later consumer with a guard: if `$TRUNK` is empty → HALT with "Trunk branch unresolved — re-read `git.trunk_branch` from the tracker".
    - Resolve `$METRICS` = `paths.metrics` from `baldart.config.yml` (default `docs/metrics`). This is the framework-owned telemetry directory written by Phase 8 (tracker archive, `skill-runs.jsonl`, `sessions/`). The dirty-tree gate (step 3) reads `$METRICS` to recognise — and never surface — its own telemetry output. Persist it as `Metrics dir:` in the tracker `## Worktree` section.
-1. **Resolve `$MAIN`** — the absolute path of the main repo (not a worktree). If `/new` was invoked from inside a worktree, walk up to the parent repo via `git rev-parse --show-superproject-working-tree` or `git worktree list` until you find the non-worktree root. Persist as `Main repo:` in the tracker `## Worktree` section. **Write `$MAIN` to the tracker the moment it is computed** — every later consumer (Phase 6c, Phase 6b) MUST re-read it from the tracker and HALT with "`$MAIN` absent from tracker" if the field is missing or empty, never silently use an undefined `$MAIN` (it does not survive context compaction).
+1. **Resolve `$MAIN`** — the absolute path of the main repo (not a worktree). Use `git rev-parse --show-toplevel` when cwd is the main checkout. If `/new` was invoked from **inside a worktree**, that returns the worktree, so resolve the main root from the worktree's parent instead: `cd "$(git rev-parse --show-toplevel)/.." && git rev-parse --show-toplevel` (the worktree lives at `<main>/.worktrees/<name>`). Do **not** use `--git-common-dir` + `/..` (wrong under `git init --separate-git-dir`) or `git worktree list` (its main entry is the git dir, not the working tree, under a separate git dir). This is the same contract as worktree-manager § "Resolving the main repo root". Persist as `Main repo:` in the tracker `## Worktree` section. **Write `$MAIN` to the tracker the moment it is computed** — every later consumer (Phase 6c, Phase 6b) MUST re-read it from the tracker and HALT with "`$MAIN` absent from tracker" if the field is missing or empty, never silently use an undefined `$MAIN` (it does not survive context compaction).
 1b. **Migration Gate (BLOCKING only when a migration is *declared* — else a silent no-op)** — resolve DB migrations interactively **before** the worktree exists, so the schema is live before any card builds against it. *Why this exists*: a migration applied to a shared/remote DB is owner-gated, so without this gate it is deferred to the END of the batch — and every downstream card in the batch is then built and verified against a schema that is not yet live (`validation_commands` / QA / E2E / DB-generated `tsc` types fail falsely → those cards cascade into deferral/blocked). Front-loading the migration removes that root cause. **The declaration lives in the EPIC card** (`migration_plan` block — project-specific, authored by the user, typically via the `.baldart/overlays/new.md` overlay). Steps:

package/framework/.claude/skills/new/references/team-mode.md CHANGED Viewed

@@ -105,6 +105,7 @@ Agent tool call:
     a) Print the numbered requirements checklist (anti-skip measure)
     b) Implement ALL requirements
     c) Run: npx tsc --noEmit && npx eslint --max-warnings=0 <your-files>
+       (when toolchain.commands.{typecheck,lint} are configured, run those verbatim instead — per agents/toolchain-protocol.md)
     d) Self-heal up to 3 times if checks fail
     e) Verify completeness: for each requirement, confirm code exists (read it)
     f) If any requirement is missing after implementation, implement it now
@@ -151,7 +152,7 @@ For each completed agent:
 After ALL agents in the group complete successfully:
-1. **D.1 — Build verification (group)** — Run `npm run build` in the worktree to verify combined changes compile (redirect to `/tmp/build-group.txt` per § "Context economy"; surface only exit code + bounded extract on failure). If build fails, identify which card's changes broke it (from `git diff --name-only` per card), spawn a targeted fix-coder for those files only.
+1. **D.1 — Build verification (group)** — Run `npm run build` (when `has_toolchain`, the configured `toolchain.commands.build` verbatim — § "Toolchain gates") in the worktree to verify combined changes compile (redirect to `/tmp/build-group.txt` per § "Context economy"; surface only exit code + bounded extract on failure). If build fails, identify which card's changes broke it (from `git diff --name-only` per card), spawn a targeted fix-coder for those files only.
 1.5. **D.1.5 — Effective per-card review profile (compute ONCE; drives D.2 + D.4b)** — For EACH card in the group, compute its **effective codex profile** with the SAME deterministic rule the sequential Phase 3.7 Step C uses, so the two paths never disagree:
    - **Floor**: read the card's `review_profile` field (`skip`/`light`/`balanced`/`deep`) per the QA Profile Selector (fallback-computed only for legacy cards lacking the field).
@@ -242,7 +243,7 @@ After ALL agents in the group complete successfully:
 3a. **D.3a — Phase 2.5b AC-Closure Gate (per-card, BLOCKING — non-skippable)** — For EACH card in the group, **sequentially**, invoke the full Phase 2.5b gate as documented in `### Phase 2.5b — AC-Closure Gate (BLOCKING — Scope Closure Discipline)`. This includes: build the AC Closure Ledger from the card YAML, run the rationalization scan, invoke `AskUserQuestion` one-per-deferred-AC, run the `implementation_notes` deferral audit, and persist the ledger in the tracker. Until EVERY card in the group exits PASS, do NOT proceed to D.3b. Cards exiting with `not_implemented` ACs that the user routes to "Implementa adesso" must finish their fix-coder loop and re-pass the gate before D.3b starts for the next card. Log under `## AC Closure Ledger — <CARD-ID>` per card.
-3b. **D.3b — Phase 2.55 Simplify (per-card, FANNED OUT across the group)** — The Simplify agents are **read-only analysis on file-disjoint per-card diffs** (the orchestrator applies the fixes afterward), so there is NO reason to run them one card at a time. **Spawn the per-card Simplify analysis for ALL eligible cards in PARALLEL** — in a SINGLE message, fire each card's Phase 2.55 trio (Reuse / Quality / Efficiency) against that card's diff captured to `/tmp/diff-<CARD-ID>.txt` (per Phase 2.55 step 2 — pass each trio the **path**, scoped to the card's File Ownership Map; never inline the diff). Per-card (not group-aggregate) so findings stay attributable. When all analyses return, **apply fixes per card** (file-disjoint → no write conflict), then re-run `npm run lint` and `npx tsc --noEmit` on the worktree ONCE for the whole group (redirect to disk per § "Context economy"). (Concurrency is capped by the platform; passing N cards is safe — excess agents queue.)
+3b. **D.3b — Phase 2.55 Simplify (per-card, FANNED OUT across the group)** — The Simplify agents are **read-only analysis on file-disjoint per-card diffs** (the orchestrator applies the fixes afterward), so there is NO reason to run them one card at a time. **Spawn the per-card Simplify analysis for ALL eligible cards in PARALLEL** — in a SINGLE message, fire each card's Phase 2.55 trio (Reuse / Quality / Efficiency) against that card's diff captured to `/tmp/diff-<CARD-ID>.txt` (per Phase 2.55 step 2 — pass each trio the **path**, scoped to the card's File Ownership Map; never inline the diff). Per-card (not group-aggregate) so findings stay attributable. When all analyses return, **apply fixes per card** (file-disjoint → no write conflict), then re-run `npm run lint` and `npx tsc --noEmit` (when `has_toolchain`, the configured `toolchain.commands.{lint,typecheck}` verbatim — § "Toolchain gates") on the worktree ONCE for the whole group (redirect to disk per § "Context economy"). (Concurrency is capped by the platform; passing N cards is safe — excess agents queue.)
    - **Gate (enumerated, `TRIVIAL_CARDS`-driven)**: SKIP D.3b for a card in **`TRIVIAL_CARDS`** (the set already computed at D.1.5 — `review_profile == skip` AND 0 Step-A triggers AND **non-source diff**), aligning team mode with sequential Phase 2.55's `IS_TRIVIAL` re-confirmation on the ACTUAL diff and with team-mode's own D.1.5 SSOT. A trivial card has no substantive diff to simplify, and Simplify is quality-only (no merge-gate coverage to lose). Log `simplify: SKIPPED (trivial — non-source diff)`. **A card with `review_profile == skip` whose committed diff DID touch a source file is NOT in `TRIVIAL_CARDS` → run D.3b for it** (exactly as sequential 2.55 does — `skip` is the floor, the real diff is the deciding check). For `light`/`balanced`/`deep` cards D.3b runs unchanged. This is the ONLY enumerated skip — never skip D.3b "for time" on a non-trivial card. (Skipped cards are simply omitted from the parallel fan-out.)
 3c. **D.3c — Phase 2.6 E2E-Review (per-card)** — First, evaluate the existing Gate table for EVERY card at once (skip when `features.has_e2e_review: false`, backend-only diff per the diff predicate documented in Phase 2.6, or card type in the Phase 2.6 skip set — `backend`/`api`/`db`/`infra`/`docs`/`chore`/`config`). In practice most cards in a group skip this gate (backend/db/api), so the eligible set is usually 0–1. For the cards that PASS the gate, invoke `/e2e-review` in programmatic mode with that card's payload. Each `/e2e-review` keeps its own isolated state dir (`.baldart/e2e-review/<CARD-ID>/`), so multiple runs do not clobber each other's artifacts.

package/framework/.claude/skills/prd/references/validation-phase.md CHANGED Viewed

@@ -66,7 +66,7 @@ created by Step 1 (HARD RULE 17). `$WORKTREE_PATH` is set in the state file.
 **Variable guard (R6).** Before executing any item in Step 7, resolve and verify:
 - `$WORKTREE_PATH` — read from the state file `## Worktree / path:`. HALT with `"ABORT: WORKTREE_PATH not set in state file — cannot commit or merge."` if absent or empty.
-- `$MAIN` — the absolute path of the main (non-worktree) git repository root. **READ the persisted value from the state file `## Worktree / main_path:`** (written at SKILL.md Step 1, the R6 persist-then-read pattern `/new` uses). HALT with `"ABORT: MAIN absent from state — re-run from a session whose Step 1 persisted main_path."` if the field is absent or empty. Only when the field is genuinely missing (legacy state file) FALL BACK to deriving it once via `git -C "$WORKTREE_PATH" rev-parse --git-common-dir` (strips `/.git`). Every subsequent `git` command that runs outside the worktree MUST use `git -C "$MAIN" …`; never use a bare `git` command whose cwd is undefined. (This is the same variable name and resolution contract `/new` uses — the two skills must agree.)
+- `$MAIN` — the absolute path of the main (non-worktree) git repository root. **READ the persisted value from the state file `## Worktree / main_path:`** (written at SKILL.md Step 1, the R6 persist-then-read pattern `/new` uses). HALT with `"ABORT: MAIN absent from state — re-run from a session whose Step 1 persisted main_path."` if the field is absent or empty. Only when the field is genuinely missing (legacy state file) FALL BACK to deriving it once from the worktree's parent toplevel: `MAIN="$(cd "$WORKTREE_PATH" && git -C .. rev-parse --show-toplevel)"`. (Do **not** use `git rev-parse --git-common-dir` + `/..` — that returns the parent of the git dir, which is wrong under `git init --separate-git-dir`; `--show-toplevel` is the true working-tree root. This mirrors the canonical resolution in worktree-manager § "Resolving the main repo root".) Every subsequent `git` command that runs outside the worktree MUST use `git -C "$MAIN" …`; never use a bare `git` command whose cwd is undefined. (This is the same variable name and resolution contract `/new` uses — the two skills must agree.)
 ### Resolve remaining items

package/framework/.claude/skills/simplify/SKILL.md CHANGED Viewed

@@ -8,7 +8,7 @@ description: Review changed code for reuse, quality, and efficiency, then fix an
 ## Project Context
-**Reads from `baldart.config.yml`:** `paths.references_dir`, `paths.components_primitives`, `paths.components_root`, `paths.design_system`.
+**Reads from `baldart.config.yml`:** `paths.references_dir`, `paths.components_primitives`, `paths.components_root`, `paths.design_system`, `features.has_toolchain` + `toolchain.commands.{lint,typecheck,test}` (the Step 5 verify gates run those verbatim when the flag is on — see Step 5).
 **Gated by features:** `features.has_design_system` (Design System check section is BLOCKING when `true`).
 **Overlay:** loads `.baldart/overlays/simplify.md` if present — project-specific utility module paths, hook conventions.
 **On missing/empty keys:** ask the user; do not assume defaults. See `framework/agents/project-context.md` § 3.
@@ -142,6 +142,12 @@ When extracting shared code, write to the SAME `${paths.*}` keys the review phas
 These static checks run AFTER Step 4 has mutated code — a pre-fix pass would not cover the post-fix
 code, so re-running them here is mandatory whenever Step 4 changed anything.
+> **Toolchain:** when `features.has_toolchain: true` in `baldart.config.yml`, run the command from
+> `toolchain.commands.{lint,typecheck,test}` **verbatim** instead of the default shown below — per
+> `framework/agents/toolchain-protocol.md`. Per gate: a non-empty config command wins; empty/absent
+> (or the flag off) falls back to the default. A configured command that exits non-zero is a real
+> FAIL (do not fall back to the default).
 Run the linter (use the project's lint command):
 ```

package/framework/.claude/skills/worktree-manager/SKILL.md CHANGED Viewed

@@ -20,6 +20,7 @@ description: >
 - `git.merge_strategy` — `pr` (default, GitHub PR via `gh`) or `local-push` (direct fast-forward to `origin/<git.trunk_branch>`). Drives `/mw` step 4c.
 - `paths.backlog_dir` — used for syncing untracked backlog cards (`/nw` step 3b).
 - `paths.metrics` — JSONL telemetry dir (default `docs/metrics`) used by the rebase conflict-resolution table.
+- `features.has_toolchain` + `toolchain.commands.{typecheck,lint,build,test}` — when the flag is `true`, the mechanical gates below run the consumer's configured commands verbatim instead of the hardcoded `npx tsc`/`npx eslint`/`npm run build` defaults (see "## Toolchain-aware gates"). Absent/`false` → defaults, identical to pre-toolchain behavior.
 - Protocol reference: `framework/agents/project-context.md`. Skills must ASK when a needed key is missing — never assume.
 ## Effort
@@ -30,6 +31,47 @@ reasoning depth for this run — detect it once at kickoff and strip the token
 before consuming user input. Level→behavior mapping, parsing contract, and
 precedence caveats: `framework/agents/effort-protocol.md`.
+## Toolchain-aware gates
+When `features.has_toolchain: true` in `baldart.config.yml`, every mechanical
+gate in this skill (type-check, lint, build, test) runs the command from
+`toolchain.commands.<gate>` **verbatim** instead of the hardcoded default — per
+`framework/agents/toolchain-protocol.md`. Resolution is done **in shell**, the
+same way this skill already resolves `git.trunk_branch` / `git.merge_strategy`
+from disk (a background `/nw` baseline runner is a weak/background model — a prose
+"please substitute" note would be silently skipped while the hardcoded command
+ran anyway; the shell resolver is model-independent). Each gate block inlines this
+compact resolver (self-contained, since each runs as its own `Bash` call):
+```bash
+# Resolve baldart.config.yml from disk; missing/unreadable → empty → hardcoded default.
+CFG="baldart.config.yml"; [ -f "$CFG" ] || CFG="$(git rev-parse --show-toplevel 2>/dev/null)/baldart.config.yml"
+_tc() { grep -E '^[[:space:]]*has_toolchain:[[:space:]]*true' "$CFG" >/dev/null 2>&1 || return 0
+  grep -A20 '^toolchain:' "$CFG" 2>/dev/null | grep -A15 '^[[:space:]]*commands:' \
+    | grep -E "^[[:space:]]+$1:" | head -1 \
+    | sed -E "s/.*$1:[[:space:]]*\"?([^\"#]*)\"?.*/\1/" | sed -E 's/[[:space:]]+$//'; }
+```
+`eval "${VAR:-<default>}"` then runs the configured command or falls back.
+**Three carve-outs specific to this skill:**
+- **`npm install` is NOT a toolchain gate** — there is no `toolchain.commands.install`
+  key (the command map is lint/format/typecheck/test/test_related/build/audit).
+  Dependency installation stays as-is; do **not** route it through the resolver.
+- **Severity is unchanged.** The resolver picks *which* command runs, never *what
+  to do with its exit code*. The protocol's "a configured command that exits
+  non-zero is a real FAIL — do not then run the default" governs only resolution
+  (never mask a failure by running the default after the configured one failed);
+  it does **not** change each block's STOP/continue policy: a `/nw` baseline
+  type-check/lint failure is still "report but continue", a build failure is still
+  "STOP", exactly as written at each block.
+- **The `/mw` best-effort test line stays best-effort.** `npm run test 2>/dev/null
+  || true` (pre-merge step 2) is intentionally non-blocking; the resolved command
+  keeps the `|| true` swallow, so `test` is never a `/mw` STOP trigger.
+A configured **lint** command may be whole-tree (e.g. `npx biome check .`) where a
+default is changed-files-scoped (`/mw` pre-merge) — that wider scope is the
+consumer's configured contract, applied only when `has_toolchain` is on.
 **IMMEDIATE EXECUTION**: When invoked via `/nw`, `/mw`, `/lw`, or `/cw`, do NOT explain the process. Start executing the matching command flow immediately:
@@ -112,6 +154,9 @@ Output:
 #    Without this, the worktree IS git-tracked: `git status` on the main
 #    repo explodes with every file inside the worktree, exactly defeating
 #    the parallel-isolation purpose of the docs-mode.
+#    Main-checkout context (the docs worktree is created below, cwd is still the
+#    main repo) → `--show-toplevel` is the main root, separate-git-dir-safe. See
+#    § "Resolving the main repo root".
 MAIN_ROOT="$(git rev-parse --show-toplevel)"
 if [ ! -f "$MAIN_ROOT/.gitignore" ] || ! grep -qE '^\.worktrees/?$' "$MAIN_ROOT/.gitignore"; then
   echo "ERROR: .worktrees/ is not in $MAIN_ROOT/.gitignore" >&2
@@ -367,7 +412,8 @@ unmerged sibling branch, invisible to both the local backlog and the trunk
 merge-base. They collide at rebase/merge time.
 `scripts/allocate-id.sh` closes that race **on the same machine**. Every worktree
-shares one main repo root (resolved from `git rev-parse --git-common-dir`), and
+shares one main repo root (resolved by the canonical `resolve_main()` helper in
+that script — see § "Resolving the main repo root" below), and
 `.worktrees/` lives there and is gitignored — so it is the natural shared
 coordination point, exactly like `registry.json`. The allocator anchors a lock
 and a per-prefix high-water-mark there:
@@ -381,6 +427,36 @@ and a per-prefix high-water-mark there:
 .claude/skills/worktree-manager/scripts/allocate-id.sh release <worktree-path>
 ```
+---
+## Resolving the main repo root
+Several steps need `$MAIN` — the absolute path of the **main repo working tree**
+(never a worktree). It is resolved in **three distinct execution contexts**, and
+each genuinely needs a different primitive — this is NOT entropy, and "unifying"
+them onto one naked primitive regresses real cases (verified):
+1. **Main-checkout cwd** (a worktree is being *created*, or cards synced before
+   any `cd`): `git rev-parse --show-toplevel`. This is the true working-tree root
+   even when `.git` lives elsewhere (`git init --separate-git-dir`). Sites:
+   `/nw` step 1, nw-docs step 0, `3b. Sync untracked cards` (cwd is still main
+   there), and `/prd` SKILL.md Step 1.
+2. **Worktree cwd** (after `cd "$WORKTREE_PATH"`): `git -C .. rev-parse --show-toplevel`.
+   The worktree's parent is `.worktrees/`, which lives inside the main repo, so
+   its toplevel is the main root — correct under a separate git dir too. Sites:
+   `/nw` step 4 env-copy, `/mw` step 4c `$MAIN` fallback.
+3. **Arbitrary cwd, main-or-worktree** (the `allocate-id.sh` startup, called from
+   either): the canonical `resolve_main()` in `scripts/allocate-id.sh` — it uses
+   `--show-toplevel`, then detects a linked worktree (`git-dir != git-common-dir`)
+   and walks one level up.
+**Do NOT resolve `$MAIN` via `git rev-parse --git-common-dir` + `/..`.** That
+returns the parent of the *git dir*, which equals the working-tree root only when
+`.git` is in-tree; under `git init --separate-git-dir` it points at the wrong
+directory. Always derive the root through `--show-toplevel`. `resolve_main()` is
+the reference implementation; the SKILL snippets mirror it inline because the
+script is not on disk inside a freshly-checked-out worktree.
 Shared files, all under `$MAIN/.worktrees/` (already gitignored):
 - `.id-alloc.lock/` — cross-process mutex (atomic `mkdir`, stale-stolen after 30s
   via the dir's own mtime; it is a directory, NOT a git ref, so it does not touch
@@ -422,6 +498,9 @@ Supports three modes:
 # Resolve $MAIN (main repo root) and $TRUNK (git.trunk_branch) up front, and
 # persist both onto the registry entry created in step 6 (R6). Every later step
 # reads them from the registry with a presence guard — never from in-context state.
+# Main-checkout context: cwd is the main repo (the orchestrator invokes /nw from
+# $MAIN — setup.md resolves $MAIN before spawning), so `--show-toplevel` is the
+# main root and is separate-git-dir-safe. See § "Resolving the main repo root".
 MAIN="$(git rev-parse --show-toplevel)"
 # .gitignore safety (auto-heal, NON-blocking — code-mode differs from nw-docs, which
 # hard-aborts). Without `.worktrees/` ignored the worktree is git-tracked and `git status`
@@ -520,7 +599,14 @@ but are NOT on the trunk branch yet. The worktree (branched from the trunk) won'
 # when the key is absent — emit it resolved, never the literal ${paths.backlog_dir}
 # token, which is not valid bash). Same resolution as /new's card-scoped diff block.
 BACKLOG_DIR="<value of paths.backlog_dir, or 'backlog' if the key is absent>"
-MAIN_ROOT="$(git -C "$WORKTREE_PATH" rev-parse --show-superproject-working-tree 2>/dev/null || pwd)"
+# At this point cwd is still the MAIN checkout (the `cd "$WORKTREE_PATH"` happens
+# in step 4 below), so `--show-toplevel` IS the main repo root — correct for a
+# normal repo, a git submodule, and a `--separate-git-dir` repo alike. (The old
+# `git -C "$WORKTREE_PATH" --show-superproject-working-tree || pwd` was wrong for a
+# real submodule — it returned the SUPERPROJECT, not this repo's root where the
+# cards live — and only resolved otherwise by relying on this same cwd accident.
+# See § "Resolving the main repo root".)
+MAIN_ROOT="$(git rev-parse --show-toplevel)"
 # For each card in the batch, check if its YAML exists in the main repo but not in the worktree
 for CARD_FILE in $(ls "$MAIN_ROOT/$BACKLOG_DIR"/*.yml 2>/dev/null); do
@@ -549,7 +635,13 @@ npm install
 # 3. Copy environment files from main repo root
 #    Build requires Firebase env vars — this copy is critical.
-MAIN_ROOT="$(git -C .. rev-parse --show-toplevel 2>/dev/null || echo '../..')"
+#    cwd is the worktree; its parent (`.worktrees/`) lives inside the main repo,
+#    so `git -C ..` resolves the MAIN repo's toplevel — correct under a
+#    `--separate-git-dir` repo too (unlike `--git-common-dir`+parent). The old
+#    silent `|| echo '../..'` fallback masked a resolution failure with a relative
+#    guess; fail loud instead. See § "Resolving the main repo root".
+MAIN_ROOT="$(git -C .. rev-parse --show-toplevel 2>/dev/null)"
+[ -n "$MAIN_ROOT" ] || { echo "ERROR: cannot resolve main repo root from worktree $(pwd)" >&2; exit 1; }
 cp "$MAIN_ROOT/.env.local" .env.local 2>/dev/null || true
 cp "$MAIN_ROOT/.env" .env 2>/dev/null || true
@@ -597,16 +689,27 @@ fi
 ### 5. Verify baseline
 ```bash
+# Toolchain-aware (§ "Toolchain-aware gates"): when features.has_toolchain: true,
+# run toolchain.commands.{typecheck,lint,build} verbatim; else the defaults below.
+CFG="baldart.config.yml"; [ -f "$CFG" ] || CFG="$(git rev-parse --show-toplevel 2>/dev/null)/baldart.config.yml"
+_tc() { grep -E '^[[:space:]]*has_toolchain:[[:space:]]*true' "$CFG" >/dev/null 2>&1 || return 0
+  grep -A20 '^toolchain:' "$CFG" 2>/dev/null | grep -A15 '^[[:space:]]*commands:' \
+    | grep -E "^[[:space:]]+$1:" | head -1 \
+    | sed -E "s/.*$1:[[:space:]]*\"?([^\"#]*)\"?.*/\1/" | sed -E 's/[[:space:]]+$//'; }
+TC_TC=$(_tc typecheck); TC_LINT=$(_tc lint); TC_BUILD=$(_tc build)
 # TypeScript + lint (fast)
-npx tsc --noEmit
-npx eslint --max-warnings=0 src/
+eval "${TC_TC:-npx tsc --noEmit}"
+eval "${TC_LINT:-npx eslint --max-warnings=0 src/}"
 # Full build verification (required — confirms worktree is functional)
-npm run build
+eval "${TC_BUILD:-npm run build}"
 ```
 If build fails → STOP and report. Do NOT continue — the worktree is broken.
 If only tsc/lint fails → report but continue (the trunk branch should be clean, may be a transient issue).
+(Severity is by-gate as stated here; the toolchain resolver only changes *which* command runs — § "Toolchain-aware gates".)
 ### 6. Update registry
@@ -717,15 +820,27 @@ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
 fi
 # 2. Run quality checks
-# Lint only changed files (vs the trunk branch)
-npx eslint --max-warnings=0 $(git diff --name-only "origin/$TRUNK" -- '*.ts' '*.tsx')
-npx tsc --noEmit
+# Toolchain-aware (§ "Toolchain-aware gates"): when features.has_toolchain: true,
+# run toolchain.commands.{lint,typecheck,build,test} verbatim; else the defaults below.
+CFG="baldart.config.yml"; [ -f "$CFG" ] || CFG="$(git rev-parse --show-toplevel 2>/dev/null)/baldart.config.yml"
+_tc() { grep -E '^[[:space:]]*has_toolchain:[[:space:]]*true' "$CFG" >/dev/null 2>&1 || return 0
+  grep -A20 '^toolchain:' "$CFG" 2>/dev/null | grep -A15 '^[[:space:]]*commands:' \
+    | grep -E "^[[:space:]]+$1:" | head -1 \
+    | sed -E "s/.*$1:[[:space:]]*\"?([^\"#]*)\"?.*/\1/" | sed -E 's/[[:space:]]+$//'; }
+TC_LINT=$(_tc lint); TC_TC=$(_tc typecheck); TC_BUILD=$(_tc build); TC_TEST=$(_tc test)
+# Lint — default is changed-files-scoped (vs trunk); a configured whole-tree command
+# (e.g. `npx biome check .`) runs verbatim by the consumer's contract (§ carve-out).
+CHANGED_TS=$(git diff --name-only "origin/$TRUNK" -- '*.ts' '*.tsx')
+eval "${TC_LINT:-npx eslint --max-warnings=0 $CHANGED_TS}"
+eval "${TC_TC:-npx tsc --noEmit}"
 # Full build (required before PR per AGENTS.md)
-npm run build
+eval "${TC_BUILD:-npm run build}"
-# Tests
-npm run test 2>/dev/null || true
+# Tests — BEST-EFFORT by design: never a STOP trigger. Keep the `|| true` swallow
+# whether configured or default (§ carve-out: the /mw test line stays best-effort).
+{ eval "${TC_TEST:-npm run test}"; } 2>/dev/null || true
 ```
 If lint, tsc, or build fails → report and STOP. Do NOT proceed to PR.
@@ -898,7 +1013,13 @@ done
 # Continue rebase + verify build (no stash to restore — step 4b never stashes)
 git rebase --continue
-npm run build
+# Toolchain-aware (§ "Toolchain-aware gates"): toolchain.commands.build verbatim when set, else default.
+CFG="baldart.config.yml"; [ -f "$CFG" ] || CFG="$(git rev-parse --show-toplevel 2>/dev/null)/baldart.config.yml"
+_tc() { grep -E '^[[:space:]]*has_toolchain:[[:space:]]*true' "$CFG" >/dev/null 2>&1 || return 0
+  grep -A20 '^toolchain:' "$CFG" 2>/dev/null | grep -A15 '^[[:space:]]*commands:' \
+    | grep -E "^[[:space:]]+$1:" | head -1 \
+    | sed -E "s/.*$1:[[:space:]]*\"?([^\"#]*)\"?.*/\1/" | sed -E 's/[[:space:]]+$//'; }
+TC_BUILD=$(_tc build); eval "${TC_BUILD:-npm run build}"
 ```
 If build fails after rebase → STOP and report. The rebase introduced incompatibilities.
@@ -911,11 +1032,16 @@ Read `git.merge_strategy` from `baldart.config.yml` (default: `pr`):
 ```bash
 # $MAIN and $TRUNK were resolved in step 1 (read from the registry entry, R6) and
-# presence-guarded. Fall back to deriving $MAIN from git-common-dir ONLY if it is
+# presence-guarded. Fall back to deriving $MAIN from the worktree ONLY if it is
 # still unset — never silently re-derive over a value the registry already supplied.
 if [ -z "$MAIN" ]; then
-  MAIN=$(git -C "$WORKTREE_PATH" rev-parse --git-common-dir)/..
-  MAIN=$(cd "$MAIN" && pwd)
+  # Canonical worktree→main resolution (see § "Resolving the main repo root"):
+  # cd INTO the worktree so `git -C ..` resolves the MAIN repo's toplevel. NOT
+  # `--git-common-dir`+`/..` — that returns the parent of the git dir (wrong under
+  # `git init --separate-git-dir`), and appending `/..` to a possibly-relative
+  # `--git-common-dir` under `git -C` resolves against the wrong base.
+  MAIN="$(cd "$WORKTREE_PATH" 2>/dev/null && git -C .. rev-parse --show-toplevel 2>/dev/null)"
+  [ -n "$MAIN" ] || { echo "ERROR: cannot resolve \$MAIN from worktree $WORKTREE_PATH (registry mainRoot was empty)." >&2; exit 1; }
 fi
 # Resolve strategy from baldart.config.yml (default: pr)
@@ -1093,8 +1219,15 @@ fi
 ### 5. Post-merge verification
 ```bash
-npx tsc --noEmit
-npm run build
+# Toolchain-aware (§ "Toolchain-aware gates"): toolchain.commands.{typecheck,build} verbatim when set, else defaults.
+CFG="baldart.config.yml"; [ -f "$CFG" ] || CFG="$(git rev-parse --show-toplevel 2>/dev/null)/baldart.config.yml"
+_tc() { grep -E '^[[:space:]]*has_toolchain:[[:space:]]*true' "$CFG" >/dev/null 2>&1 || return 0
+  grep -A20 '^toolchain:' "$CFG" 2>/dev/null | grep -A15 '^[[:space:]]*commands:' \
+    | grep -E "^[[:space:]]+$1:" | head -1 \
+    | sed -E "s/.*$1:[[:space:]]*\"?([^\"#]*)\"?.*/\1/" | sed -E 's/[[:space:]]+$//'; }
+TC_TC=$(_tc typecheck); TC_BUILD=$(_tc build)
+eval "${TC_TC:-npx tsc --noEmit}"
+eval "${TC_BUILD:-npm run build}"
 ```
 If post-merge build fails → STOP and report. Do NOT cleanup worktree (may need to investigate).

package/framework/.claude/skills/worktree-manager/scripts/allocate-id.sh CHANGED Viewed

@@ -8,8 +8,8 @@
 # backlog scan cannot see IDs that are still in flight on an unmerged sibling
 # worktree branch.
 #
-# Mechanism: every worktree shares the same main repo root (resolved from
-# `git rev-parse --git-common-dir`), and `.worktrees/` lives there and is
+# Mechanism: every worktree shares the same main repo root (resolved by
+# `resolve_main()` below), and `.worktrees/` lives there and is
 # gitignored. We anchor a lock + a per-prefix high-water-mark file there, so a
 # reservation is atomic across every worktree on this machine. The high-water
 # bumped under the lock is the correctness mechanism; the max() against the real
@@ -34,15 +34,29 @@ LOCK_MAX_TRIES=150   # ~30s at 0.2s/try
 err() { printf '%s\n' "$*" >&2; }
-# --- Resolve the shared main repo root from any worktree -------------------
+# --- Resolve the main repo root from the main checkout OR any worktree -----
+# CANONICAL main-repo-root resolution for the whole framework (mirrored inline in
+# worktree-manager/SKILL.md + prd, where this script is not reachable from a
+# worktree checkout). Built on `--show-toplevel`, NOT `--git-common-dir`+parent:
+# the latter returns the PARENT OF THE GIT DIR, which is wrong under
+# `git init --separate-git-dir` (the git dir lives outside the working tree).
+# `--show-toplevel` is the true working-tree root in every case. From a linked
+# worktree `--show-toplevel` returns the WORKTREE root, so we detect that
+# (git-dir != git-common-dir) and walk one level up out of `.worktrees/` — the
+# worktree path is always `<main>/.worktrees/<name>` (SKILL.md R8) — and resolve
+# the main checkout's toplevel there. Fails (return 1) rather than print garbage.
 resolve_main() {
-  local common
+  local top gd common
+  top="$(git rev-parse --show-toplevel 2>/dev/null)" || return 1
+  gd="$(git rev-parse --git-dir 2>/dev/null)" || return 1
   common="$(git rev-parse --git-common-dir 2>/dev/null)" || return 1
-  case "$common" in
-    /*) ;;                          # already absolute
-    *)  common="$(pwd)/$common" ;;  # relative (we're in the main repo) → absolutise
-  esac
-  (cd "$common/.." 2>/dev/null && pwd) || return 1
+  if [ "$gd" = "$common" ]; then
+    printf '%s\n' "$top"            # main checkout — toplevel IS the main repo root
+  else
+    # linked worktree — the main root is the toplevel of the directory that
+    # contains `.worktrees/` (one level above this worktree).
+    (cd "$top/.." 2>/dev/null && git rev-parse --show-toplevel 2>/dev/null) || return 1
+  fi
 }
 # --- Read a paths.* / git.* scalar from baldart.config.yml -----------------

package/framework/agents/toolchain-protocol.md CHANGED Viewed

@@ -64,6 +64,21 @@ protocol. The `/new` workflow scripts receive the resolved config via their
 `args.config` payload (the consuming skill passes `baldart.config.yml`) — they
 must never hard-code project facts (see the workflows contamination contract).
+Since v4.42.0 the **execution-side** gates resolve through this protocol too: the
+`worktree-manager` skill (`/nw` baseline + `/mw` pre-merge / rebase / post-merge
+build gates), the classic `/new` reference suite (`implement`, `review-cycle`,
+`completeness`, `final-review`, `commit`, `team-mode` — cited as `§ "Toolchain
+gates"` from the `/new` core), and the standalone `/bug` + `/simplify` verify
+steps. Two execution-context-specific resolution mechanisms are used: a markdown
+**prose note** where a full model reads the skill and writes the gate command
+(the `/new` references, `/bug`, `/simplify`, the agents), and an inline **shell
+resolver** in `worktree-manager` (it has no `args.config` and its baseline runs in
+a weak/background subagent, so it resolves `toolchain.commands.*` from
+`baldart.config.yml` on disk — the same way it already resolves `git.trunk_branch`
+— rather than relying on a prose note a weak model could skip). `npm install` is
+**not** a gate (there is no `commands.install` key), and `markdownlint` is not in
+the curated map — both stay as written everywhere.
 ## Fallback rules
 - A configured command that EXITS NON-ZERO is a genuine gate **FAIL** — do not

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.43.0",
+  "version": "4.45.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"