npm - baldart - Versions diffs - 4.38.0 → 4.40.0 - Mend

baldart 4.38.0 → 4.40.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/CHANGELOG.md +22 -0
package/VERSION +1 -1
package/bin/baldart.js +10 -0
package/framework/.claude/skills/new/references/final-review.md +53 -18
package/framework/.claude/skills/new/references/merge-cleanup.md +15 -0
package/framework/.claude/skills/new2/SKILL.md +11 -0
package/framework/.claude/skills/prd/references/validation-phase.md +14 -0
package/framework/.claude/workflows/new-final-review.js +21 -10
package/framework/docs/WORKFLOWS.md +1 -1
package/package.json +1 -1
package/src/commands/reap-orphans.js +90 -0

package/CHANGELOG.md CHANGED

@@ -5,6 +5,28 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.40.0] - 2026-06-15
+**The classic `/new` now slims its single-card final review the same way `new2` already does — closing an asymmetry that made an N=1 batch re-run review finders it had already run.** `new-final-review.js` has carried an F-041 single-card slim since v4.17.x (keep the unique-value cross-model Codex pass + the qa-sentinel merge gate; drop the duplicate Claude breadth finders that already ran per-card), but it only ever fired for **`new2`**, which passes `singleCard`. The classic `/new` final-review delegation (`final-review.md` Step F.1.5) **never passed the flag**, so `slim` was always false and a one-card `/new` batch ran the full breadth set in the final review even though the per-card pass had already covered the same files — a duplicate cross-model Codex + redundant doc review on every single-card run. The user's framing ("for one card, skip the per-card review") was the wrong lever (it would drop Simplify, break the fail-fast ordering that runs review *before* E2E + doc-sync, and lose the early security pass — and was already litigated: v3.35.0 introduced an N=1 final-review skip, v3.37.0 reverted it precisely because the per-card pass can run shallow under `light`). The right lever, already chosen for `new2`, is the inverse: keep the per-card pass, slim the *final*. This release extends that slim to classic `/new` — but **per-finder, coverage-gated**, because classic `/new` differs from `new2`: its `doc-reviewer` runs in the skill's Phase 3 (and can defer to Final under `light`), and its `api-perf-cost-auditor` is deferred-to-final *by design* (never runs per-card). **MINOR** (changes the N=1 merge-gate composition; no new surface, no `baldart.config.yml` key — `singleCard`/`slimDoc`/`slimApi` are workflow args ⇒ schema-propagation rule N/A).
+### Changed
+- **`framework/.claude/workflows/new-final-review.js`** — the single `slim = a.singleCard` flag is decoupled into **per-finder** `slimDoc` / `slimApi`, each defaulting to `a.singleCard` when absent (so callers passing only `singleCard` — i.e. `new2` — are byte-for-byte unchanged, and a caller passing neither gets an unconditional FULL pass — a safe default that never silently over-skips). `doc-reviewer` is gated on `!slimDoc`, `api-perf-cost-auditor` on `!slimApi && hasApiDataFiles`; Codex + `qa-sentinel` always run. Per-finder skip logs replace the single coupled log line.
+- **`framework/.claude/skills/new/references/final-review.md`** — Step F.1.5 now passes `singleCard: cardPaths.length === 1`, `slimDoc: cardPaths.length === 1 && <Phase-3 doc-review ran, not deferred>`, and `slimApi: false` (api-perf never runs per-card in classic `/new` — the final is its only run, so it is never slimmed). The v3.37.0 "FULL gate" invariant prose is reconciled to describe the coverage-gated slim (Codex + qa-sentinel are the unconditional safety gate; breadth finders slim per-finder for N=1 only, gated on per-card coverage — the coverage-gate is the backstop the v3.35.0 blanket skip lacked). The inline Step F.3 fallback (SSOT the workflow mirrors) and the fan-out completion barrier ("ALL THREE" → "every launched Task") carry the same rule, so the no-`Workflow`-tool path stays coherent.
+- **`framework/docs/WORKFLOWS.md`** — the `new-final-review` row's single-card description corrected from "skips the duplicate doc/api reviewers" to the accurate per-finder, coverage-gated behavior (`new2` drops both; classic `/new` drops only `doc-reviewer` and keeps `api-perf-cost-auditor`).
+## [4.39.0] - 2026-06-15
+**`/new`, `/new2`, and `/prd` now auto-reap orphaned Codex MCP servers at their workspace-hygiene finalizers — the v4.37.0 doctor reaper, made automatic.** v4.37.0 added an on-demand reaper to `baldart doctor`, but the leak compounds *per skill run*: every batch's Codex finder calls (`/new`/`new2` per-card review + final review, `/prd` discovery-completeness + plan audit) drive `codex app-server`, whose detached broker spawns the `~/.codex/config.toml` MCP servers (Playwright, …) as children that orphan to init (ppid 1) when the broker dies and keep burning CPU. Waiting for a manual `baldart doctor` let them accumulate between runs. Now each batch ends by sweeping them. A new focused, non-interactive CLI command (`baldart reap-orphans`) is the SSOT the three finalizers call; it shares the v4.37.0 `codex-orphans.js` detection/reaping logic and the same hard safety invariant — it reaps ONLY orphaned MCP servers (ppid 1 ⇒ broker dead ⇒ stdio broken), and NEVER kills a live `codex app-server` broker (a shared, detached runtime that may still serve the user's interactive session). Because an MCP child of a still-warm broker is not yet orphaned, this is a cumulative orphan sweep (catches this run's debris once its broker dies, plus any prior runs'), not a per-run broker teardown. **MINOR** (new CLI command + skill-finalizer wiring; backwards-compatible — non-blocking hygiene step, no-op when nothing is orphaned, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
+### Added
+- **`src/commands/reap-orphans.js`** + **`bin/baldart.js`** — new `baldart reap-orphans` command: detects orphaned MCP servers (ppid 1 + MCP signature) and reaps each process tree via syscall, then prints a one-line summary. `--dry-run` reports without killing; `--json` emits a machine-readable result (`schema:"baldart.reap-orphans/1"`). Always exits 0 (hygiene, never a blocker). Reuses `src/utils/codex-orphans.js` (the v4.37.0 SSOT); live `codex app-server` brokers are detected and reported but never killed.
+### Changed
+- **`framework/.claude/skills/new/references/merge-cleanup.md`** (Phase 6c, new step 5b), **`framework/.claude/skills/prd/references/validation-phase.md`** (Step 7.5, new non-blocking closer), **`framework/.claude/skills/new2/SKILL.md`** (Step 5, new item 6) — each workspace-hygiene finalizer now runs `npx baldart reap-orphans` as a NON-BLOCKING step and folds its summary into the phase log. `new2` runs it in the main context after the workflow returns (the workflow sandbox cannot run Bash). All three carry the explicit "reaps orphans only, never the broker" note so a future maintainer does not escalate it into a broker kill.
 ## [4.38.0] - 2026-06-15
 **`baldart doctor` now checks whether the external tools BALDART installs are out of date upstream, and offers a one-command upgrade.** BALDART installs external tools into consumer machines but **pins none of them** — `pipx install graphifyy`, `npm install -g typescript-language-server`, … all grab "latest" at install time, and neither pipx nor npm ever auto-upgrades. So a consumer who installed months ago is frozen on whatever version they got, and never receives upstream security/correctness fixes; the `add`/`update`/`configure` flows can't help because they only run at install time. Concrete trigger: Graphify shipped `0.8.37` (SSRF guard thread-safety + prompt-injection mitigation + a macOS NFC/NFD re-extraction loop fix), `0.8.38` (`calls` edge-direction + JS/TS default import/export + tsconfig `paths` correctness) and `0.8.39` (a `graphify affected` `KeyError` crash fix — a command BALDART agents actually invoke) — all invisible to a frozen install. This release adds the continuous currency check that was missing: the tool-dependency analogue of the `baldart` CLI's own `UpdateNotifier`. **MINOR** (new doctor diagnostic + self-heal action; backwards-compatible — network-gated and skipped under `--offline`, zero output when every tool is current, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).

package/VERSION CHANGED

@@ -1 +1 @@
-4.38.0
+4.40.0

package/bin/baldart.js CHANGED

@@ -154,6 +154,16 @@ program
     await doctorCommand({ auto: !!options.auto, offline: !!options.offline });
   });
+program
+  .command('reap-orphans')
+  .description('Sweep orphaned MCP-server processes left by Codex calls (ppid 1 — their broker is dead). Used by the /new and /prd finalizers; safe to run anytime. Never touches a live codex app-server broker.')
+  .option('--json', 'Machine-readable output: emit a single JSON result object on stdout')
+  .option('--dry-run', 'Detect and report orphans without killing anything')
+  .action(async (options) => {
+    const reapCommand = require('../src/commands/reap-orphans');
+    await reapCommand({ json: !!options.json, dryRun: !!options.dryRun });
+  });
 const overlayGroup = program
   .command('overlay')
   .description('Author and check .baldart/overlays/ — scaffolds, validates, and detects drift on skill/agent/command overlays');
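A caller that consumes the `--json` output of the new command might guard on the schema tag before trusting the payload. The following is a minimal sketch, not part of the package: `parseReapResult` is a hypothetical helper, and the only field it relies on is the `schema:"baldart.reap-orphans/1"` tag named in the changelog. Any other fields of the result object are not shown in this diff and are deliberately left untouched.

```javascript
// Hypothetical helper (not in the package): validate `baldart reap-orphans --json` output.
// Only the schema tag is taken from the changelog; no other fields are assumed.
const EXPECTED_SCHEMA = 'baldart.reap-orphans/1';

function parseReapResult(stdout) {
  try {
    const parsed = JSON.parse(String(stdout).trim());
    // Accept only a JSON object carrying the expected schema tag.
    if (parsed && typeof parsed === 'object' && parsed.schema === EXPECTED_SCHEMA) {
      return parsed;
    }
    return null;
  } catch {
    // Non-JSON output (for example an npx error message) is treated as "no result".
    return null;
  }
}
```

Returning `null` instead of throwing keeps the caller non-blocking, which matches the stated intent that this hygiene step never gates a finalizer.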

package/framework/.claude/skills/new/references/final-review.md CHANGED

@@ -13,21 +13,32 @@ Once ALL cards are committed in the worktree:
 > **Final-review FULL gate (since v3.37.0 — supersedes the v3.35.0 scope-reduction)** — because
 > Phase 3.7 may now run a card at `light` depth, the per-card pass can NO LONGER be assumed to have
 > full-reviewed every card. The final review is therefore the **unconditional safety gate that makes
-> the Phase 3.7 `light` profile safe**: it ALWAYS runs a single FULL `/codexreview` (full agent set)
-> over the **ENTIRE batch diff** before merge — **no N=1 skip, no cross-card scope reduction**. Every
-> line of every card — including any reviewed at `light` in Phase 3.7 — receives a full-depth Codex
-> review at least once before merge.
+> the Phase 3.7 `light` profile safe**: the **cross-model Codex pass + the `qa-sentinel` merge gate
+> ALWAYS run** over the **ENTIRE batch diff** before merge — **no N=1 skip, no cross-card scope
+> reduction** for those two. Every line of every card — including any reviewed at `light` in Phase
+> 3.7 — receives a full-depth Codex review at least once before merge.
 >
-> - Run Steps F.1–F.5 for **EVERY batch, including N=1**. Nothing here is skipped.
+> - Run Steps F.1–F.5 for **EVERY batch, including N=1**. The safety gate (Codex + qa-sentinel) is
+>   never skipped.
 > - `review_scope_files` = the **FULL union** of all touched files across all cards (F.1 step 4 —
 >   NEVER reduced to the cross-card subset).
-> - F.3 invokes the full reviewer set (Codex + doc-reviewer + api-perf-cost-auditor + qa-sentinel)
->   over that union; F.5 runs the final build.
+> - F.3 invokes Codex + qa-sentinel over that union unconditionally, **plus** the breadth finders
+>   (doc-reviewer, api-perf-cost-auditor). F.5 runs the final build.
+> - **F-041 per-finder slim for N=1 only (coverage-gated).** On a **single-card** batch the breadth
+>   finders may be dropped — but ONLY per-finder, gated on what ACTUALLY ran per-card, never a blanket
+>   "N=1 → skip" (that is the v3.35.0 hole this gate closed). For N=1: drop `doc-reviewer` **iff** Phase
+>   3 ran it on the (single) card — if Phase 3 deferred it (`light` + no doc files), the final MUST run
+>   it; **keep `api-perf-cost-auditor` always**, because it is deferred-to-final by design and never
+>   ran per-card in classic `/new` (the final is its only run). Codex + qa-sentinel still run. For N>1
+>   the full breadth set always runs (a multi-card batch has genuine cross-card surface to review).
 >
-> Rationale: this re-introduces the post-batch full pass that v3.35.0 de-duplicated away. That
-> de-dup assumed Phase 3.7 had already full-reviewed every card — an assumption broken the moment
-> `light` became selectable. The cost of one full batch-diff review is the deliberate price of the
-> per-card `light` speed-up (explicit maintainer decision, v3.37.0).
+> Rationale: v3.37.0 re-introduced the post-batch full pass that v3.35.0 de-duplicated away — that
+> de-dup assumed Phase 3.7 had full-reviewed every card, an assumption broken the moment `light` became
+> selectable. The F-041 slim is NOT a regression of that decision: the safety invariant (Codex + qa
+> over the full diff) is untouched, and a breadth finder is dropped **only** when its own per-card run
+> already covered the same single card — i.e. the coverage-gate IS the backstop the v3.35.0 skip lacked.
+> A single card has no cross-card surface, so a finder's per-card pass and its final pass are genuinely
+> the same review.
 ### Step F.1 — Resolve scope
@@ -78,10 +89,23 @@ that is a **gate violation**: log it as
     reviewScopeFiles,                              // the FULL union from F.1
     archBaselinePaths,                             // per-card baselines if ALL present (F.2 dedup), else null
     hasApiDataFiles,                               // true unless NO scope file falls under paths.api_* / data-model
-    config                                         // the parsed baldart.config.yml
+    config,                                        // the parsed baldart.config.yml
+    singleCard: cardPaths.length === 1,            // N=1 batch — enables the F-041 per-finder slim
+    slimDoc:    cardPaths.length === 1 && <Phase-3 doc-review RAN for this card, NOT deferred>,
+    slimApi:    false                              // api-perf NEVER runs per-card in classic /new → final is its ONLY run; never slim it
   }})
   ```
+  **`slimDoc` coverage gate (per-finder — do NOT collapse to a bare `singleCard`).** The
+  final review may drop the doc-reviewer for an N=1 batch ONLY because Phase 3 already ran
+  it on the same (single) card — so it is gated on Phase 3 having ACTUALLY run, not deferred.
+  You know this from your own Phase 3 log: if you logged `doc-review: DEFERRED to Final FULL
+  gate (light, no doc files in diff)` (review-cycle.md step ~235), Phase 3 did NOT run →
+  `slimDoc: false` (the final MUST run doc-reviewer, else doc review is skipped end-to-end).
+  Otherwise Phase 3 ran → `slimDoc: true`. `slimApi` is ALWAYS `false` here: api-perf-cost-auditor
+  is deferred-to-final by design in classic `/new` (never part of the per-card cluster), so
+  the final pass is its only run — slimming it would leave the api/perf domain unreviewed.
   The workflow returns `{ codexEngine, findings, gateTable, summary }` where
   `findings` are already consolidated and classified (`VERIFIED` /
   `NEEDS_MANUAL_CONFIRMATION`; `FALSE_POSITIVE` already dropped) and `gateTable`
@@ -184,15 +208,26 @@ that is a **gate violation**: log it as
    | **api-perf-cost-auditor** | `api-perf-cost-auditor` | API/data/performance/cost defects (skip if no API/data files in scope) | Same findings schema |
    | **qa-sentinel** | `qa-sentinel` | **Mechanical gates ONLY** over the batch scope (lint, tsc, full test suite, build, `npm audit`, markdownlint) | A PASS/FAIL gate table — NOT a findings list. qa-sentinel does not read source files, does not emit severities, and does not do edge-case/reproducibility analysis (its system prompt forbids it). A gate FAILURE feeds the fix-loop the same way a VERIFIED finding does. |
+   **F-041 per-finder slim — N=1 only (coverage-gated; identical rule to the delegated path's
+   `slimDoc`/`slimApi`).** This inline prose is the SSOT the workflow mirrors, so it carries the same
+   slim: on a **single-card** batch, **skip the `doc-reviewer` row iff Phase 3 ran doc-review on the
+   card** (you logged no `doc-review: DEFERRED ...` — if Phase 3 deferred it under `light`, KEEP it here
+   or doc review is skipped end-to-end). **Always keep `api-perf-cost-auditor`** (subject only to its
+   existing "no API/data files in scope" skip): it is deferred-to-final by design and never ran
+   per-card, so this is its only run — slimming it would leave api/perf unreviewed. `qa-sentinel`
+   always runs. For N>1, run the full breadth set (genuine cross-card surface exists). Codex (step 6)
+   is never slimmed.
    The two code-aware agents (doc-reviewer, api-perf-cost-auditor) receive: card IDs, YAML, `review_scope_files`, codebase-architect baseline, and a Budget Block per the `/codexreview` Step 2 contract (`framework/.claude/commands/codexreview.md`). qa-sentinel receives only the worktree path + the changed-file list and runs gates. Code-correctness/edge-case analysis is Codex's job (and the per-card `/codexreview` already ran) — do NOT ask qa-sentinel to produce code findings.
-   **Fan-out completion barrier (BLOCKING before F.4).** The three Claude agents write to a shared
+   **Fan-out completion barrier (BLOCKING before F.4).** The Claude agents write to a shared
    findings pool that F.4 step 9 fans in. Before F.4 reads ANY finding, you MUST have collected the
-   return value of ALL THREE Task invocations from step 7 (doc-reviewer, api-perf-cost-auditor,
-   qa-sentinel) — never start the merge while a Task is still in flight. Because step 7 launches all
-   three in a single message, the harness returns when all three complete; do NOT proceed to step 9
-   on a partial set. (The Codex background task has its OWN barrier — step 8 below polls `$REVIEW_FILE`
-   for completion. The two barriers are independent: wait for BOTH the three Claude Tasks AND the
+   return value of **every Task invocation you launched in step 7** (the set after the F-041 slim —
+   normally doc-reviewer + api-perf-cost-auditor + qa-sentinel, but fewer when a breadth finder was
+   slimmed for N=1) — never start the merge while a Task is still in flight. Because step 7 launches
+   them in a single message, the harness returns when all complete; do NOT proceed to step 9 on a
+   partial set. (The Codex background task has its OWN barrier — step 8 below polls `$REVIEW_FILE`
+   for completion. The two barriers are independent: wait for BOTH the launched Claude Tasks AND the
    Codex background task before merging.)
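The Step F.1.5 coverage gate for classic `/new` (`singleCard`, `slimDoc`, `slimApi`) can be sketched as a small pure function. This is an illustration under stated assumptions, not the skill's implementation: `buildSlimArgs` is a hypothetical name, and the Phase 3 log is assumed to be available as a plain string in which a deferred doc review appears as the `doc-review: DEFERRED` marker quoted in the hunk above.

```javascript
// Hypothetical sketch of the classic /new caller-side flags (Step F.1.5).
// Assumption: the Phase 3 log is a string; a deferred doc review leaves this marker in it.
const DEFERRED_MARKER = 'doc-review: DEFERRED';

function buildSlimArgs(cardPaths, phase3Log) {
  const singleCard = cardPaths.length === 1;
  // Phase 3 counts as "ran" only when no deferral was logged.
  const docReviewRan = !String(phase3Log).includes(DEFERRED_MARKER);
  return {
    singleCard,
    // Drop the final doc-reviewer only for N=1 AND when Phase 3 actually ran it.
    slimDoc: singleCard && docReviewRan,
    // api-perf never runs per-card in classic /new, so the final pass is never slimmed.
    slimApi: false,
  };
}
```

Keeping `slimApi` a constant `false` rather than deriving it from `singleCard` makes the "final is its only run" rule hard to break by accident.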
 ### Step F.4 — Collect & merge findings

package/framework/.claude/skills/new/references/merge-cleanup.md CHANGED

@@ -177,6 +177,20 @@ The most common failure mode is leaving cards IN_PROGRESS after merge. This crea
      - Question: `"Restore dello stash di Phase 0 ha generato conflitti. Lo stash è ancora presente (NON eliminato). Come procedo?"`
      - Options: `[Lascia lo stash + apri istruzioni per merge manuale]` / `[Mostrami il conflitto inline]` / `[Halt]`.
+5b. **Process hygiene — reap orphaned Codex MCP servers (NON-BLOCKING)**. This
+   batch's per-card / final-review Codex calls drive `codex app-server`, whose
+   broker spawns the MCP servers declared in `~/.codex/config.toml` (Playwright,
+   …) as children; when a broker dies the OS reparents those MCP servers to init
+   (ppid 1) and they keep burning CPU. Sweep them now so the batch ends clean:
+   ```bash
+   npx baldart reap-orphans 2>/dev/null || true
+   ```
+   This reaps ONLY orphaned MCP servers (ppid 1 ⇒ their broker is already dead ⇒
+   stdio is broken ⇒ dead weight). It deliberately NEVER kills a live
+   `codex app-server` broker (a shared, detached runtime that may still serve the
+   user's interactive session). Never gate the close on this — any error or a
+   "nothing to reap" result is fine; capture its one-line summary for the log.
 6. **Log and exit**:
    ```
    ## Phase 6c — Workspace Hygiene Post-merge
@@ -185,6 +199,7 @@ The most common failure mode is leaving cards IN_PROGRESS after merge. This crea
    Divergence (local…origin/$TRUNK): <0\t0 | resolved: pushed/cherry-picked/ff-pulled/rebased>
    Sync-deferred markers: <none | reconciled | user-retained>
    Phase 0 snapshot restore: <n/a | popped clean | conflict-deferred-to-user>
+   Codex MCP hygiene: <reaped N/M | nothing to reap | skipped (error)>
    Completed: <timestamp>
    ```
    If any step ended in HALT, set `Status: HALT` and report — Phase 7 must NOT start with an unclean main repo unless the user explicitly chose `[Lascia così]`.

package/framework/.claude/skills/new2/SKILL.md CHANGED

@@ -265,3 +265,14 @@ returns when the batch is done. It returns:
    deferrals resolving too late — order the dependent card earlier), and `owner_gated_deduped` > 0
    means N defers were collapsed to one external action.
    Do NOT re-summarise the cards — the workflow already did.
+6. **Process hygiene — reap orphaned Codex MCP servers (NON-BLOCKING).** The batch's per-card Codex
+   finder calls drive `codex app-server`, whose broker spawns the `~/.codex/config.toml` MCP servers
+   (Playwright, …) as children; when a broker dies they leak to init (ppid 1) and keep burning CPU.
+   Sweep them in the main context (the workflow sandbox cannot run Bash, so this MUST run here, after
+   the workflow returns) so the run ends clean:
+   ```bash
+   npx baldart reap-orphans 2>/dev/null || true
+   ```
+   Reaps ONLY orphaned MCP servers (ppid 1 ⇒ broker dead); NEVER kills a live `codex app-server`
+   broker. Non-blocking — any error / "nothing to reap" is fine; fold its one-line summary into the
+   record (`codex_mcp_reaped`).

package/framework/.claude/skills/prd/references/validation-phase.md CHANGED

@@ -247,6 +247,20 @@ markers it can emit, then act:
 empty, no `[SYNC-NEEDS-DECISION]` marker is left unhandled, and the merged remote
 branch is gone (or its deletion is explicitly user-deferred).
+**Process hygiene — reap orphaned Codex MCP servers (NON-BLOCKING).** This run's
+Codex calls (discovery-completeness check, plan audit) drive `codex app-server`,
+whose broker spawns the MCP servers from `~/.codex/config.toml` (Playwright, …)
+as children; when a broker dies the OS reparents them to init (ppid 1) and they
+keep burning CPU. Sweep them now so the run ends clean:
+```bash
+npx baldart reap-orphans 2>/dev/null || true
+```
+This reaps ONLY orphaned MCP servers (ppid 1 ⇒ broker dead ⇒ stdio broken ⇒ dead
+weight); it NEVER kills a live `codex app-server` broker (a shared, detached
+runtime that may still serve the user's interactive session). This is NOT part
+of the blocking gate — any error or "nothing to reap" is fine; include its
+one-line summary in the final summary's hygiene line.
 ### Step 7.6 — Obsidian back-reference (NON-BLOCKING — runs only when a spec note was given)
 **Why this exists.** When the user kicked off the PRD from an Obsidian note (state file

package/framework/.claude/workflows/new-final-review.js CHANGED

@@ -171,28 +171,39 @@ const apiPrompt =
 const qaPrompt =
   `Run MECHANICAL GATES ONLY over the batch scope, per ${protocolRef} Step F.3 (qa-sentinel row): lint, type-check, the full test suite, build, dependency audit, and markdownlint as applicable to this project. Do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'}\nChanged files:\n${scope.join('\n')}`
-// F-041 — single-card batch: the per-card review (Phase 3) already ran doc-reviewer +
-// api-perf-cost-auditor over these exact files, and a 1-card batch has NO cross-card
-// conflict to surface. Keep ONLY the cross-model Codex pass (its unique value — a different
-// model finds different bugs) + qa-sentinel gates; skip the Claude-agent duplicates.
-const slim = a.singleCard === true
+// F-041 — single-card batch: a per-card review already covered these exact files and a
+// 1-card batch has NO cross-card conflict to surface, so the duplicate Claude finders can be
+// dropped from the final pass — but PER-FINDER, gated on what ACTUALLY ran per-card. The
+// cross-model Codex pass (its unique value — a different model finds different bugs) and the
+// qa-sentinel merge gate ALWAYS run. Callers pass coverage-gated flags:
+//   slimDoc — drop the final doc-reviewer (doc-reviewer ran per-card on this single card)
+//   slimApi — drop the final api-perf-cost-auditor (it ran per-card on this single card)
+// Backward-compat: a caller passing only `singleCard` (or nothing) gets the old coupled
+// behavior — slimDoc===slimApi===singleCard, and absent ⇒ FULL — a safe default that never
+// silently over-skips. (`new2.js` passes only `singleCard`, so its behavior is unchanged;
+// the `/new` classic skill passes slimApi:false because api-perf NEVER runs per-card there —
+// it is deferred to THIS final pass by design, so this is its only run.)
+const slimDoc = a.slimDoc !== undefined ? a.slimDoc === true : a.singleCard === true
+const slimApi = a.slimApi !== undefined ? a.slimApi === true : a.singleCard === true
 // qa-sentinel always runs — the merge integrity gate reads its PASS/FAIL table.
 const reviewThunks = [
   () => agent(qaPrompt, { label: 'qa-sentinel', phase: 'Review', agentType: 'qa-sentinel', schema: GATES_SCHEMA }).then((r) => ({ kind: 'qa', r })),
 ]
-if (!slim) {
+if (!slimDoc) {
   reviewThunks.unshift(() => agent(docPrompt, { label: 'doc-reviewer', phase: 'Review', agentType: 'doc-reviewer', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'doc', r })))
+} else {
+  log('Review: single-card batch — final doc-reviewer skipped (already ran per-card); kept cross-model Codex + qa gates.')
 }
 // Codex thunk runs ONLY when the pre-flight resolved the companion (else: no wasted agent).
 if (codexResolved) {
   reviewThunks.unshift(() => agent(codexPrompt, { label: 'codex', phase: 'Review', schema: CODEX_SCHEMA }).then((r) => ({ kind: 'codex', r })))
 }
-// api-perf-cost-auditor: skipped when no API/data files OR on a slim single-card pass.
-if (!slim && a.hasApiDataFiles !== false) {
+// api-perf-cost-auditor: skipped when no API/data files OR when it already ran per-card (slimApi).
+if (!slimApi && a.hasApiDataFiles !== false) {
   reviewThunks.push(() => agent(apiPrompt, { label: 'api-perf-cost-auditor', phase: 'Review', agentType: 'api-perf-cost-auditor', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'api', r })))
 } else {
-  log(slim
-    ? 'Review: single-card batch — doc-reviewer + api-perf skipped (already run per-card); kept cross-model Codex + qa gates.'
+  log(slimApi
+    ? 'Review: single-card batch — final api-perf-cost-auditor skipped (already ran per-card).'
     : 'Review: api-perf-cost-auditor skipped (no API/data files in scope).')
 }

package/framework/docs/WORKFLOWS.md CHANGED

@@ -13,7 +13,7 @@ workflows are unavailable behaves exactly as before.
 | Workflow | Used by | What it does |
 | :--- | :--- | :--- |
-| `new-final-review` | `/new` Final Review (Step F.1.5) | Runs the read-only cross-batch review fan-out — architecture baseline + Codex ‖ doc-reviewer ‖ api-perf-cost-auditor ‖ qa-sentinel — then adversarially verifies low-confidence findings and returns them classified. Applies no fixes (the skill owns fix application + user gates). **v4.17.1+:** Codex availability is resolved by a **deterministic pre-flight glob + background poll** (no false negatives from a synchronous-run timeout); a **single-card batch** skips the duplicate doc/api reviewers (already run per-card), keeping only the cross-model Codex pass + qa gates. |
+| `new-final-review` | `/new` Final Review (Step F.1.5) | Runs the read-only cross-batch review fan-out — architecture baseline + Codex ‖ doc-reviewer ‖ api-perf-cost-auditor ‖ qa-sentinel — then adversarially verifies low-confidence findings and returns them classified. Applies no fixes (the skill owns fix application + user gates). **v4.17.1+:** Codex availability is resolved by a **deterministic pre-flight glob + background poll** (no false negatives from a synchronous-run timeout); a **single-card batch** slims the breadth finders **per-finder, coverage-gated** (caller-supplied `slimDoc`/`slimApi`): it drops a duplicate Claude reviewer only when that reviewer already ran per-card on the same card — `new2` drops both doc + api (it runs both per-card, relevance-gated); classic `/new` drops only `doc-reviewer` (when Phase 3 ran it) and **keeps `api-perf-cost-auditor`** (deferred-to-final by design — the final is its only run). Codex + qa gates always run. Backward-compatible: callers without the flags get the full reviewer set. |
 | `new-card-review` (v4.34.0) | `/new` per-card review cluster — sequential (`review-cycle.md` § Phase 2.5x) **and** team-mode (`team-mode.md` D.1.6) | Hosts the **per-wave** review-cluster OUTSIDE the orchestrator context — the biggest prefix-growth source on long epics. Takes **1..N co-located cards** (1 = sequential per-card; N = a team-mode group, so it runs **once per wave, not per card**) and fans out the finders per card: Simplify + cross-model Codex (agent-launched binary, `code-reviewer` fallback) + qa-sentinel (group max tier) + security-reviewer (high-risk only), each specialist FP-checking its own findings. Then **ONE `coder` applies all VERIFIED code/perf/security/simplify findings in a single pass** (files disjoint by ownership) and re-verifies lint/tsc/build. Returns a compact `{perCard:{fixesApplied,residual}, gateTable, summary}` — the minority `residual` (doc, needs-manual, scope, unconverged) the skill resolves with the right specialist / user gate. **Doc-review and E2E stay in the skill** (doc is write-mode + must see final code; E2E is human-gated + nests a skill); **api-perf is deferred to the Final FULL gate**. Maps to `review-cycle.md` Phase 2.55+3.5+3.7. |
 | `new2` (v4.17.2) | `/new2` skill (the whole batch) | **EXPERIMENTAL A/B variant of `/new`.** Hosts the ENTIRE batch in the background runtime so subagent output never enters the main context. A **dependency-gated DAG scheduler** runs a card only when all in-batch deps are *committed* (and blocks transitive dependents of a failed dep instead of routing them to resolve); each card uses its **owner_agent** + a **specialized review fan-out** (not general-purpose); the worktree is kept **atomic per card** (rollback-to-HEAD on failure); transient API errors are retried and a sustained **outage degrades cleanly** (`degraded` return + durable resume via the skill); a **run ledger** dedups resolves and records accepted deferrals (no re-routing loop); the **merge is integrity-gated** (never force-DONE, never `git add` unreviewed code, never merge an incomplete/degraded batch); the commit step runs on **Haiku** while **follow-up cards are written by `prd-card-writer`**; telemetry carries real **cost** (`total_tokens` via `budget.spent()`, `agent_count`, skill-stamped `wall_clock_s`) + `degraded`. **v4.17.2:** the pre-flight **G3 cross-card Codex check is deterministic** (glob-first + background poll, skipped on single-card batches; `codex_resolved` in telemetry); a non-transient card crash is terminal-with-residual (no orphaned self-healing). Agents Read `/new`'s reference modules for semantics. |
 | `new2-resolve` (v4.17.2) | `new2` (self-healing) | Resolution pass for any gate that would otherwise need a human (`ac-unmet · blocker · qa-fail · e2e-blocked · merge-blocker · scope-expansion`). A **terminal short-circuit** skips the costly multi-attempt when the problem is impossible-by-definition (`out-of-ownership` verified in JS; other terminal reasons ratified by a judge); a **MANDATORY adversarial judge** cross-checks every `verified` claim — the judge independently greps the files and the workflow verifies **at least one** falls inside MAY-EDIT (`.some()`, so listing adjacent changed files is not mistaken for fabrication); accepts a **batched `findings` list** (one resolve per fix-area). **The domain is normalized** (freeform `documentation`→`doc`, …) before routing the **fixer** (doc→doc-reviewer, ui→ui-expert, security→security-reviewer, else coder) and judge, and a **doc finding gets doc-tree MAY-EDIT** (not the card's code scope); the 3-angle Tier-2 fan-out is reserved for code domains (single retry for doc/test). Follow-ups are written by **`prd-card-writer`**, offline-safe (deferred to the skill if no agent can write). |

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.38.0",
+  "version": "4.40.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"

package/src/commands/reap-orphans.js ADDED

@@ -0,0 +1,90 @@
+/**
+ * `baldart reap-orphans` — sweep orphaned MCP-server processes left by Codex
+ * calls (since v4.38.0).
+ *
+ * Non-interactive, focused companion to the `baldart doctor` reap action: it
+ * detects MCP servers that have been orphaned to init (ppid 1 — their parent
+ * `codex app-server` broker is dead) and kills each process tree. Designed to be
+ * called from the `/new` (Phase 6c) and `/prd` (Step 7.5) workspace-hygiene
+ * finalizers so each batch ends by clearing the MCP debris its Codex finder
+ * calls (and any prior runs) left behind.
+ *
+ * SCOPE & SAFETY (do not "improve" this into killing the broker):
+ *   - Reaps ONLY orphaned MCP servers (ppid 1 + MCP signature). A live, in-use
+ *     `codex app-server` broker is `detached + unref'd` BY DESIGN and also shows
+ *     ppid 1, so we never touch brokers — killing one could break the user's
+ *     concurrent interactive Codex session. Brokers are reported, never killed.
+ *   - Because MCP children of a *still-alive* broker have ppid = broker (not 1),
+ *     a run whose broker is still warm may leave its own MCP non-orphaned at
+ *     finalizer time; those get swept by the next run's finalizer (or `doctor`).
+ *     This command is a cumulative orphan sweep, not a per-run broker teardown.
+ *
+ * Always exits 0 — this is hygiene, never a blocker. The SSOT for detection /
+ * reaping logic is `src/utils/codex-orphans.js`; this command only frames it.
+ */
+const UI = require('../utils/ui');
+const CodexOrphans = require('../utils/codex-orphans');
+async function reapOrphans(opts = {}) {
+  const json = !!opts.json;
+  const dryRun = !!opts.dryRun;
+  const result = {
+    schema: 'baldart.reap-orphans/1',
+    found: 0,
+    reaped: 0,
+    failed: 0,
+    runtimeBrokers: 0,
+    dryRun,
+    orphans: [],
+    failures: [],
+  };
+  try {
+    const procs = CodexOrphans.listProcesses();
+    const { mcp, runtime } = CodexOrphans.detectOrphans(procs);
+    result.found = mcp.length;
+    result.runtimeBrokers = runtime.length;
+    result.orphans = mcp.map((p) => ({ pid: p.pid, etime: p.etime, command: p.command }));
+    if (!dryRun && mcp.length > 0) {
+      const { killed, failed } = CodexOrphans.reapOrphans(mcp, procs);
+      result.reaped = killed.length;
+      result.failed = failed.length;
+      result.failures = failed;
+    }
+  } catch (err) {
+    result.error = (err && err.message) || String(err);
+  }
+  if (json) {
+    process.stdout.write(JSON.stringify(result) + '\n');
+    return result;
+  }
+  // Human output — single concise summary line + optional detail.
+  if (result.error) {
+    UI.warning(`Codex MCP reap skipped (probe error: ${result.error}).`);
+  } else if (result.found === 0) {
+    UI.success('Codex MCP hygiene: no orphaned MCP servers — nothing to reap.');
+  } else if (dryRun) {
+    UI.warning(`Codex MCP hygiene: ${result.found} orphaned MCP server(s) found (dry-run, not killed):`);
+    result.orphans.slice(0, 8).forEach((o) =>
+      console.log(`   • pid ${o.pid} (up ${o.etime}): ${o.command.slice(0, 70)}`));
+    if (result.orphans.length > 8) console.log(`   • … and ${result.orphans.length - 8} more`);
+  } else {
+    UI.success(`Codex MCP hygiene: reaped ${result.reaped}/${result.found} orphaned MCP server(s) (incl. descendants).`);
+    if (result.failed > 0) {
+      UI.warning(`${result.failed} could not be killed:`);
+      result.failures.forEach((f) => console.log(`    pid ${f.pid}: ${f.error}`));
+    }
+  }
+  if (result.runtimeBrokers > 0) {
+    UI.info(`(${result.runtimeBrokers} codex app-server broker(s) detected — left untouched by design.)`);
+  }
+  return result;
+}
+module.exports = reapOrphans;
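
The detection logic itself lives in `src/utils/codex-orphans.js`, which this diff does not show. As a rough sketch of the rule described in the header comment (ppid 1 plus an MCP signature means orphan, while a `codex app-server` broker is counted but never killed), a classifier might look like the following. The function name `classifyOrphans`, both regular expressions, and the `{ pid, ppid, command }` record shape are assumptions for illustration, not the package's real implementation.

```javascript
// Hypothetical signatures: the real patterns are in src/utils/codex-orphans.js.
const BROKER_SIGNATURE = /codex app-server/;
const MCP_SIGNATURE = /\bmcp\b/i;

// Split a process list into orphaned MCP servers and detached brokers.
// Each entry is assumed to look like { pid, ppid, command }.
function classifyOrphans(procs) {
  const mcp = [];
  const runtime = [];
  for (const proc of procs) {
    // Only processes re-parented to init (ppid 1) are candidates at all.
    if (proc.ppid !== 1) continue;
    if (BROKER_SIGNATURE.test(proc.command)) {
      // Brokers are detached by design: report them, never reap them.
      runtime.push(proc);
    } else if (MCP_SIGNATURE.test(proc.command)) {
      mcp.push(proc);
    }
  }
  return { mcp, runtime };
}

const sample = [
  { pid: 10, ppid: 1, command: 'codex app-server' },    // live broker
  { pid: 11, ppid: 10, command: 'node some-mcp-server' }, // child of a live broker
  { pid: 12, ppid: 1, command: 'node mcp-server.js' },   // orphaned MCP server
  { pid: 13, ppid: 1, command: 'sshd' },                 // unrelated daemon
];
const classified = classifyOrphans(sample);
console.log(classified.mcp.map((p) => p.pid), classified.runtime.map((p) => p.pid));
```

Checking the broker signature first keeps a broker out of the kill list even if its command line also happens to match the MCP pattern. Process 11 is skipped because its parent broker is still alive, which is the case the header comment says is left for the next run's finalizer or `doctor`.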