npm - baldart - Versions diffs - 4.35.0 → 4.37.0 - Mend

baldart 4.35.0 → 4.37.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/CHANGELOG.md +28 -0
package/README.md +11 -2
package/VERSION +1 -1
package/framework/.claude/agents/security-reviewer.md +1 -0
package/framework/.claude/skills/new/SKILL.md +4 -2
package/framework/.claude/skills/new/references/codex-gate.md +2 -2
package/framework/.claude/skills/new/references/final-review.md +2 -2
package/framework/.claude/skills/new/references/review-cycle.md +10 -6
package/framework/.claude/skills/new/references/team-mode.md +7 -5
package/framework/.claude/workflows/new-card-review.js +58 -30
package/framework/.claude/workflows/new-final-review.js +5 -1
package/package.json +1 -1
package/src/commands/doctor.js +62 -0
package/src/utils/codex-orphans.js +182 -0

package/CHANGELOG.md CHANGED

@@ -5,6 +5,34 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.37.0] - 2026-06-15
+**`baldart doctor` now detects and reaps orphaned MCP-server processes left behind by BALDART's Codex calls.** A real machine hit ~100% CPU from ~45 orphaned `@playwright/mcp` processes (plus stray `obsidian-mcp-server` instances), all children of OpenAI Codex CLI sessions that had since died. Root cause traced through the Codex companion plugin: every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the cron review engine) drives `codex app-server` via `codex-companion.mjs`, which attaches to a **shared, `detached + unref'd` broker** (`broker-lifecycle.mjs`). That broker spawns every MCP server declared in the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children; when the broker dies the OS reparents those MCP servers to init (ppid 1) and they keep running — an `@playwright/mcp` can peg a core for days. The leak compounds across sessions. We cannot suppress the MCP spawn per-call (the companion attaches to a broker it does not control, and exposes no shutdown verb), so the fix is a **safe reaper** owned by the doctor. **MINOR** (new doctor diagnostic + self-heal action; backwards-compatible — zero output on a clean machine, no install/layout change, not a `baldart.config.yml` key ⇒ schema-propagation rule N/A).
+### Added
+- **`src/utils/codex-orphans.js`** — orphaned-MCP-server detector + reaper. `detectOrphans()` snapshots `ps -axo` and returns MCP servers that are **orphaned (ppid 1) AND match an MCP-server command signature** (`@playwright/mcp`, `playwright-mcp`, `*-mcp-server`, `@modelcontextprotocol/*`, `obsidian-mcp`, npx `*-mcp@*`). `reapOrphans()` kills each orphan's full process tree (so a Playwright MCP's browser children go too) via `process.kill(pid, 'SIGKILL')` — a direct syscall, immune to sandboxed shells that silently swallow multi-arg `kill`/for-loops. **Safety invariant**: ppid 1 means the parent is dead, so an MCP server's stdio pipe is broken and the process is unreconnectable dead weight — safe to reap. The `codex app-server` broker is deliberately NOT reaped: it is `detached + unref'd` by design, so a *live, in-use* shared runtime also shows ppid 1 and ppid 1 cannot tell a leaked broker from a healthy one. Broker processes are detected for visibility only. Fully fail-safe (Windows / any error → "no orphans"); no age threshold (an orphan is dead weight at any age).
+- **`src/commands/doctor.js`** — new probe (`state.mcpOrphans`), diagnostic line (`Codex MCP leak — N orphaned MCP server(s) running`, shown only when present so a clean machine prints nothing), and self-heal action `reap-mcp-orphans` (`autoOk: false` — killing processes warrants explicit intent; re-detects against a fresh snapshot at run time so it never acts on a stale list).
+## [4.36.0] - 2026-06-13
+**`/new` security-domain fixes are now applied by `security-reviewer`, not `coder` — the v4.26.1 canonical writer map, finally propagated from `new2` to `/new`.** Auditing the `new2` lessons for guards/logic missing on `/new` surfaced one real gap (the others — args-string guard, JS router clamp, no-self-judge + specialist-owned lane, relevance-gated fan-out — were already present on `/new`). `new2-resolve.js` routes security fixes to `security-reviewer` (`fixerAgent = {doc:'doc-reviewer', ui:'ui-expert', security:'security-reviewer'}[domain] || 'coder'`), but the canonical writer map was never propagated to `/new`'s SSOT: the `Domain-Override Domains` table (SKILL.md) and every fix-routing site still sent `security` → `coder`. A coder applying a one-line RLS/permission/auth fix lacks the security-invariant contract that lives in `security-reviewer`'s system prompt — the same class of error as "wrong agent for the card", and a direct violation of the user's standing strict-specialization principle. **MINOR** (changes which agent applies security fixes across `/new`; backwards-compatible — `migration` stays `coder`, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
+### Changed
+- **`framework/.claude/skills/new/SKILL.md`** — `Domain-Override Domains` table: `security` owning agent `coder` → **`security-reviewer`** (write mode), plus a new "Why `security` is owned by `security-reviewer`" rationale mirroring the `doc` one. The sequential-mode overview line aligned too.
+- **`framework/.claude/skills/new/references/review-cycle.md`**, **`final-review.md`**, **`team-mode.md`**, **`codex-gate.md`** — every security-fix routing site (Phase 2.55 Domain-Override delegation, the delegated-workflow residual routing, the Final FULL merge-blocking partition, the Phase 3.7 codex fix sub-loop, and the doc-drift→bug security path) now routes `security` → `security-reviewer` and runs it before the `coder` pass. `migration` stays `coder`.
+- **`framework/.claude/workflows/new-card-review.js`** — the Fix phase no longer folds security into the single coder pass. It partitions `VERIFIED` findings into a `security-reviewer` pass (domain `security`) and a `coder` pass (`code`/`perf`/`migration`/`test`/`simplify`), run sequentially (security first) over the disjoint-by-ownership editable set so shared-file edits never conflict; a FAIL in either pass fails the wave. `new-final-review.js` needed no change (it is read-only — the calling skill applies fixes — and its `domainVerifier` already routes security verification to `security-reviewer`).
+- **`framework/.claude/agents/security-reviewer.md`** — new "Dual mode — review vs. apply" Behavior Rule: by default it audits and proposes (read-only), but when invoked as the security domain writer (by `/new`/`new2`/the codex fix loop) it APPLIES the remediation directly via Edit/Write and re-verifies — security fixes are owned by it, never deferred to a coder.
+## [4.35.1] - 2026-06-13
+**`/new` workflow delegation no longer degrades to a silent no-op when `args` arrives as a JSON string.** A live `/new FEAT-0027 -full` team-mode run delegated its per-wave review cluster to the `new-card-review` workflow and got back a degenerate result (`cards:0`, 0 agents, ~24ms) — the orchestrator correctly fell back to the inline cluster, but the delegation (the single biggest context-economy win in team mode) was wasted on every wave. Root cause: the `Workflow` tool sometimes serializes a structured `args` object to a JSON **string**; `new-card-review.js` and `new-final-review.js` read `args.cards` / `args.reviewScopeFiles` directly, so a string `args` left those `undefined` → empty scope → the early-return guard fired. The `new2` family (`new2.js`, `new2-resolve.js`) had already been hardened against exactly this (`F-001/F-004` parse-or-default guard), but the fix was **never propagated** to the two `/new` workflows — a parallel-location miss. **PATCH** (bugfix to shipped workflow payload, no behaviour change to install, no config key ⇒ schema-propagation rule N/A).
+### Fixed
+- **`framework/.claude/workflows/new-card-review.js`**, **`framework/.claude/workflows/new-final-review.js`** — added the same defensive `if (typeof a === 'string') { try { a = JSON.parse(a) } catch (_) { a = {} } }` guard already present in `new2.js`/`new2-resolve.js`. All four workflows now tolerate `args` delivered as a JSON string, so `/new`'s delegated review cluster and Final Review fan-out run as intended instead of no-op'ing into the inline fallback.
 ## [4.35.0] - 2026-06-13
 **Card-baseline standardization — every backlog card, any prefix/origin, conforms to one profile-aware SSOT; `/new` normalizes foreign cards at ingestion.** A real `CHORE-0007` (consumer repo `mayo`) reached `/new` **without `review_profile`** (and without `scope`/`scope_boundaries`/`canonical_docs`): it was hand/ad-hoc authored after a graph-align finding, never by the canonical writer. `/new` and `/new2` consume cards **type-blind** — they scale per-card review depth on `review_profile` and run the same pipeline regardless of prefix — so an off-baseline card silently degrades the pipeline. Root cause: the baseline was scattered across `card-template.yml` + `prd-card-writer`'s Required-Fields + Rule C with **no single SSOT and no validator**; `prd-card-writer` only documented the `/prd` epic+children flow (its standalone single-card mode, used by `new2-resolve`, was undocumented); and three writers diverged — `/prd`/`new2`/`new2-resolve` emit the full baseline, but the `/new` AC-deferral stub (`completeness.md`) and `/issue-review` (`issue-review.md`) wrote partial cards.
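The parse-or-default guard quoted in the 4.35.1 entry above can be sketched as a small standalone helper. This is only an illustration: the name `normalizeArgs` is hypothetical (the changelog describes the guard as written inline in each workflow), and the extra non-object check is an added assumption, not something the changelog states.

```javascript
// Hypothetical helper name; the changelog shows the guard inline, not as a function.
function normalizeArgs(args) {
  let a = args;
  if (typeof a === 'string') {
    try {
      a = JSON.parse(a);
    } catch (_) {
      a = {}; // unparseable string: fall back to an empty object
    }
  }
  // Assumption beyond the quoted guard: JSON.parse can also yield null,
  // numbers, or arrays, so anything that is not a plain object is defaulted too.
  if (a === null || typeof a !== 'object' || Array.isArray(a)) a = {};
  return a;
}

const fromObject = normalizeArgs({ cards: ['FEAT-0027'] });
const fromString = normalizeArgs('{"cards":["FEAT-0027"]}');
const fromGarbage = normalizeArgs('not json');

// Both structured forms expose `cards`; the garbage input leaves it undefined.
console.log(fromObject.cards, fromString.cards, fromGarbage.cards);
```

With a guard like this in front of `args.cards` / `args.reviewScopeFiles`, a stringified payload reaches the same code path as an object payload instead of tripping the empty-scope early return.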

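The detection rule described in the 4.37.0 entry (ppid 1 AND an MCP-server command signature, with the `codex app-server` broker reported but never reaped) could look roughly like the sketch below. It is not the shipped `codex-orphans.js`: the function name `findMcpOrphans`, the regex list, and the assumed `pid ppid command` column layout are all illustrative assumptions.

```javascript
// Hypothetical sketch; not the shipped src/utils/codex-orphans.js.
// Assumes each input line looks like "<pid> <ppid> <command...>",
// for example the output of `ps -axo pid=,ppid=,command=`.

// Illustrative signatures loosely based on the changelog's list.
const MCP_SIGNATURES = [
  /@playwright\/mcp/,
  /playwright-mcp/,
  /[\w-]+-mcp-server/,
  /@modelcontextprotocol\//,
  /obsidian-mcp/,
];

// The broker is detached by design, so ppid 1 says nothing about its health.
const BROKER_SIGNATURE = /codex\s+app-server/;

function findMcpOrphans(psText) {
  const orphans = [];
  const brokers = [];
  for (const line of psText.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+(\d+)\s+(.+)$/);
    if (!match) continue; // skip blank or malformed lines
    const pid = Number(match[1]);
    const ppid = Number(match[2]);
    const command = match[3];
    if (ppid !== 1) continue; // parent still alive: not an orphan
    if (BROKER_SIGNATURE.test(command)) {
      brokers.push({ pid, command }); // visibility only, never reaped
    } else if (MCP_SIGNATURES.some((re) => re.test(command))) {
      orphans.push({ pid, command });
    }
  }
  return { orphans, brokers };
}

const sample = [
  '  101     1 node /usr/lib/node_modules/@playwright/mcp/cli.js',
  '  102   900 node /usr/lib/node_modules/@playwright/mcp/cli.js',
  '  103     1 codex app-server',
  '  104     1 /usr/sbin/sshd -D',
].join('\n');

const result = findMcpOrphans(sample);
console.log(result.orphans.map((p) => p.pid)); // only pid 101 qualifies
console.log(result.brokers.map((p) => p.pid)); // pid 103 is reported, not reaped
```

A reaper built on this would call Node's `process.kill(pid, 'SIGKILL')` for each entry in `orphans` after re-running detection on a fresh snapshot, as the changelog describes; walking the full process tree is left out of the sketch.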
package/README.md CHANGED

@@ -496,8 +496,17 @@ still exist for power users, but the seamless default makes them unnecessary.
 Smart diagnostic that detects the install state and proposes the next sensible
 action (install, migrate legacy layout, configure, refresh config schema,
-update, push, or "nothing to do"). Prints a status table then runs the
-proposed actions with confirmation per step.
+update, push, repair symlinks, reap orphaned Codex MCP servers, or "nothing to
+do"). Prints a status table then runs the proposed actions with confirmation per
+step.
+Since v4.37.0 it also surfaces **orphaned MCP-server processes left by Codex
+calls** — every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the
+cron review engine) drives `codex app-server`, whose detached broker spawns the
+MCP servers from `~/.codex/config.toml` (Playwright, …) and leaks them to init
+(ppid 1) when it dies, where they keep burning CPU. The doctor reaps the
+orphaned MCP servers (and their browser children) directly via syscall; the live
+`codex app-server` broker is never touched.
 ```bash
 npx baldart            # diagnostic + interactive prompts

package/VERSION CHANGED

@@ -1 +1 @@
-4.35.0
+4.37.0

package/framework/.claude/agents/security-reviewer.md CHANGED

@@ -50,6 +50,7 @@ Before reviewing:
 ## Behavior Rules
+- **Dual mode — review vs. apply.** By default you AUDIT and *propose* remediations (read-only). But when invoked as the **security domain writer** (e.g. by `/new` / `new2` / the Phase 3.7 codex fix loop, whose brief tells you to "apply the verified security findings" in write mode), you ARE the fixer: APPLY the minimal remediation directly with the Edit/Write tools, then re-verify (lint/tsc/build as instructed). Security fixes are owned by you — never deferred to a coder — because the auth/permission/RLS/multi-tenant-isolation invariants live in YOUR system prompt, not the coder's. Stay within the files the brief's ownership map allows; if a fix needs a file outside that scope, report it as residual rather than expanding scope.
 - Be extremely critical, thorough, and skeptical. Optimize for correctness and security, not politeness.
 - Do NOT assume the developer did things safely unless proven by code evidence.
 - Treat ALL external input as hostile.

package/framework/.claude/skills/new/SKILL.md CHANGED

@@ -291,7 +291,7 @@ per-card in the D.x sub-steps (never aggregated). Load it when Pre-flight selects
 ### Sequential mode (default for small batches)
 - Cards execute one at a time through the full per-card pipeline (Phases 1-5).
-- Code review and doc review for the same card run as **parallel read-only audits**, then fixes are applied by domain owner: **doc findings → `doc-reviewer` (write mode)**, code/security/migration findings → `coder`. (Sequential Phase 3 is even simpler — doc-reviewer runs alone, so it audits AND applies in one invocation.)
+- Code review and doc review for the same card run as **parallel read-only audits**, then fixes are applied by domain owner (see § "Domain-Override Domains"): **doc findings → `doc-reviewer` (write mode)**, **security findings → `security-reviewer` (write mode)**, code/perf/migration findings → `coder`. (Sequential Phase 3 is even simpler — doc-reviewer runs alone, so it audits AND applies in one invocation.)
 - This mode is unchanged from the original behavior.
 ### Team mode (for complex batches)
@@ -541,11 +541,13 @@ Enumerated exhaustively:
 | Domain | Owning agent | Match rule |
 |---|---|---|
 | `doc` | **`doc-reviewer`** (write mode) | File path matching `*.md` under `${paths.references_dir}`, `${paths.prd_dir}`, project root `CHANGELOG.md`, or any `ssot-registry.md`. |
-| `security` | `coder` | File path matching any entry in `paths.high_risk_modules` (`baldart.config.yml`) — the same auth/permission/payment-class paths the Phase 3.7 Step A detector reads. Also any SQL migration whose content matches `CREATE POLICY|ALTER POLICY|DROP POLICY` (RLS policy mutations). If `paths.high_risk_modules` is absent, the security match rule emits a one-line diagnostic and matches nothing (no hardcoded default). |
+| `security` | **`security-reviewer`** (write mode) | File path matching any entry in `paths.high_risk_modules` (`baldart.config.yml`) — the same auth/permission/payment-class paths the Phase 3.7 Step A detector reads. Also any SQL migration whose content matches `CREATE POLICY|ALTER POLICY|DROP POLICY` (RLS policy mutations). If `paths.high_risk_modules` is absent, the security match rule emits a one-line diagnostic and matches nothing (no hardcoded default). |
 | `migration` | `coder` | File path matching `${paths.migrations_dir}/*.sql` if `paths.migrations_dir` is defined in `baldart.config.yml`; otherwise the project's migrations dir per convention (`migrations/`, `db/migrate/`, `supabase/migrations/`, `prisma/migrations/`). |
 **Why `doc` is owned by `doc-reviewer`, not `coder` (since v3.40.0)** — the doc invariants the orchestrator must not break (freshness markers, linking protocol, frontmatter standard, tabular formatting, SSOT/registry coverage, dependency-topological order, SCIP/code refs) are encoded in the **`doc-reviewer`** system prompt, NOT the coder's. The coder is a code-oriented agent that lacks the doc-invariant contract — routing doc fixes to it is the wrong agent doing work the auditing agent already has full context for. The agent that *audits* the docs is also the agent that *fixes* them (`doc-reviewer.md` § Constraints: "WRITE missing docs directly. You are fully responsible — do not defer to other agents"). NEVER route a `doc`-domain fix to `coder`.
+**Why `security` is owned by `security-reviewer`, not `coder` (since v4.36.0)** — the same logic as `doc`, applied to the security domain (canonical writer map v4.26.1; user principle "code is written only by coder, security only by security-reviewer"). The auth/permission/RLS/multi-tenant-isolation invariants live in the **`security-reviewer`** system prompt, not the coder's; a coder applying a one-line RLS or permission fix without that contract is the same class of error as the "wrong agent for the card". `security-reviewer` is the writer for security-domain fixes — it audits AND applies. `migration` stays `coder` (SQL authoring is the coder's lane; a migration's security-policy content matching the RLS rule above is classified `security`, not `migration`). NEVER route a `security`-domain fix to `coder`.
 **Edge case explicit** — a mechanical append-a-row update to `CHANGELOG.md` or `ssot-registry.md` is still classified `doc` and still goes through `doc-reviewer`, never inline and never `coder`. The uniformity of the rule matters more than the cost of the individual spawn.
 Domains NOT listed here remain governed by the per-phase rules of the corresponding phase (e.g. `simplify-*` follows Phase 2.55 inline rule).

package/framework/.claude/skills/new/references/codex-gate.md CHANGED

@@ -128,9 +128,9 @@ For EVERY card (no conditional skip — the gate ALWAYS runs; only its DEPTH var
 4. **Apply fix sub-loop** (mirror of Phase 3.5 retry pattern):
    - If 0 BLOCKER and 0 HIGH → log `verdict: PASS — proceeding to Phase 4` in tracker. Done. (MEDIUM/LOW findings are advisory at this per-card gate; they are not silently lost — the post-batch **Final-review FULL gate** applies every VERIFIED finding ≥ MEDIUM. Log the MEDIUM count in the tracker so it is visible.)
-   - If 1+ BLOCKER OR 1+ HIGH → spawn `coder` agent with the report path + list of VERIFIED bugs. **At `full` profile** the report contains Codex-suggested inline patches: pass them and have the coder **apply the suggested patches** with the right system prompt (project conventions, naming, testing patterns) — it does NOT re-do the analysis or re-grep (since v3.28.3), BUT it MUST first confirm each patch still applies against the current file state (prior fix-loop iterations may have shifted line offsets); if a patch no longer applies cleanly, the coder re-locates the target by content and applies the equivalent edit rather than a stale-offset verbatim paste. **At `light` profile** (since v4.18.0) the findings come from **Codex** (the sole finder) — the report carries Codex's `minimal_fix_direction`; brief the coder to apply it (treat it like the `full`-profile Codex fix direction). **On the Codex-unavailable fallback** the `light` findings come from `code-reviewer` instead — brief the coder to apply the `code-reviewer` fix direction (no Codex patches to paste). After coder fixes, **re-write the lean contract `/tmp/codexreview-lean-<CARD-ID>.json` (it is consumed-once and deleted by `/codexreview`)** and re-invoke `/codexreview` via the Skill tool with `args: <CARD-ID>` (NOT a bare prose mention — the card ID MUST be passed so the retry reviews THIS card, not an inferred one). Repeat **max 2 times**.
+   - If 1+ BLOCKER OR 1+ HIGH → spawn the **domain writer** with the report path + list of VERIFIED bugs (canonical writer map v4.26.1 — see SKILL.md § "Domain-Override Domains"): **`security`-domain findings** (touching `paths.high_risk_modules` or RLS-policy SQL — the same `security` match rule) → **`security-reviewer`** in write mode (it owns the security-invariant contract a coder lacks; never route a security fix to `coder`); **all other findings** (`correctness`/code/perf/`other`) → **`coder`**. Run security-reviewer first, then coder (skip either if its partition is empty). **At `full` profile** the report contains Codex-suggested inline patches: pass them and have the coder **apply the suggested patches** with the right system prompt (project conventions, naming, testing patterns) — it does NOT re-do the analysis or re-grep (since v3.28.3), BUT it MUST first confirm each patch still applies against the current file state (prior fix-loop iterations may have shifted line offsets); if a patch no longer applies cleanly, the coder re-locates the target by content and applies the equivalent edit rather than a stale-offset verbatim paste. **At `light` profile** (since v4.18.0) the findings come from **Codex** (the sole finder) — the report carries Codex's `minimal_fix_direction`; brief the coder to apply it (treat it like the `full`-profile Codex fix direction). **On the Codex-unavailable fallback** the `light` findings come from `code-reviewer` instead — brief the coder to apply the `code-reviewer` fix direction (no Codex patches to paste). After coder fixes, **re-write the lean contract `/tmp/codexreview-lean-<CARD-ID>.json` (it is consumed-once and deleted by `/codexreview`)** and re-invoke `/codexreview` via the Skill tool with `args: <CARD-ID>` (NOT a bare prose mention — the card ID MUST be passed so the retry reviews THIS card, not an inferred one). Repeat **max 2 times**.
    - If still BLOCKER/HIGH after 2 retries → log in `## Issues & Flags` and **ask the user** whether to proceed, escalate, or stop. The Phase 4 commit MUST NOT happen until the Pre-Merge Codex Review verdict is PASS or user explicitly overrides.
-   - **Telemetry** — for EVERY codex finding processed (verified BLOCKER, verified HIGH, or false-positive-filtered), append one row to `## Fix Application Log`: `3.7 | codex-<security|correctness|other> | est_lines=<n> | decision=<coder|skipped> | applied_by=<coder|-> | severity=<BLOCKER|HIGH|FALSE-POSITIVE> | retry=<n>`. Classify domain: `security` for findings touching RLS / auth / permissions / payments; `correctness` for logic / data integrity / race conditions; `other` for everything else.
+   - **Telemetry** — for EVERY codex finding processed (verified BLOCKER, verified HIGH, or false-positive-filtered), append one row to `## Fix Application Log`: `3.7 | codex-<security|correctness|other> | est_lines=<n> | decision=<security-reviewer|coder|skipped> | applied_by=<security-reviewer|coder|-> | severity=<BLOCKER|HIGH|FALSE-POSITIVE> | retry=<n>`. (`security`-domain fixes are applied by `security-reviewer`, all others by `coder`.) Classify domain: `security` for findings touching RLS / auth / permissions / payments; `correctness` for logic / data integrity / race conditions; `other` for everything else.
 5. **Update tracker**: phase = `3.7-codexgate DONE` (the gate runs unconditionally for every card — the legacy `3.7-highrisk` name implied it only fired on high-risk cards, which is no longer true), log final verdict, retry count, list of fixed findings, and the report path.

package/framework/.claude/skills/new/references/final-review.md CHANGED

@@ -220,9 +220,9 @@ that is a **gate violation**: log it as
 10. **Persist verified findings** to `/tmp/batch-final-review-<FIRST-CARD-ID>.md`.
 11. **Merge-blocking gate (mirrors the per-card Phase 3.7 gate this final pass backstops):** if any VERIFIED **BLOCKER or HIGH** finding exists, it MUST be resolved before Phase 6 merge. Apply fixes by **domain owner** (since v3.40.0 — same Domain-Override routing as the per-card phases), then re-verify; if a BLOCKER/HIGH cannot be resolved in a single apply + one retry, log it in `## Issues & Flags` and invoke `AskUserQuestion` (override with reason / escalate to a follow-up card / halt) — do NOT proceed to Phase 6 with an unresolved BLOCKER or HIGH. VERIFIED findings of severity MEDIUM are also applied (advisory below that). Partition the verified findings by the **Domain-Override match rules** ("Domain-Override Domains"):
     - **`doc`-domain findings** (file path matching the `doc` match rule — `*.md` under `${paths.references_dir}`/`${paths.prd_dir}`, `CHANGELOG.md`, `ssot-registry.md`) → invoke the **doc-reviewer** agent once in write mode to apply them. NEVER route doc fixes to coder.
-    - **`security`-domain findings** (path in `paths.high_risk_modules`, or RLS-policy SQL) and **`migration`-domain findings** (SQL under the migrations dir) → route to **coder**, but apply the Sub-agent failure protocol's STOP-on-crash rule for these domains (never inline-fallback on a security/migration fix). These are NOT collapsed into a generic "everything else" bucket.
+    - **`security`-domain findings** (path in `paths.high_risk_modules`, or RLS-policy SQL) → route to **security-reviewer** in write mode (canonical writer map v4.26.1 — it owns the security-invariant contract a coder lacks; NEVER route security fixes to coder). **`migration`-domain findings** (SQL under the migrations dir) → route to **coder**. For both, apply the Sub-agent failure protocol's STOP-on-crash rule (never inline-fallback on a security/migration fix). These are NOT collapsed into a generic "everything else" bucket.
     - **All remaining findings** (other code, perf, test) → invoke the **coder** agent once to apply them in a single pass.
-    Run in the order doc-reviewer → coder (or skip either if its partition is empty). Pass only the verified findings, not false positives.
+    Run in the order doc-reviewer → security-reviewer → coder (skip any whose partition is empty). Pass only the verified findings, not false positives.
 12. Run final build: `npm run lint && npx tsc --noEmit && npm run build` (redirect each to `/tmp/final-<gate>.txt` per § "Context economy"; surface only exit code + bounded extract on failure).
     If any check fails, apply self-healing retry loop (up to 3 times).
 13. **Update tracker** with final review results:

package/framework/.claude/skills/new/references/review-cycle.md CHANGED

@@ -51,8 +51,10 @@ so it surfaces in telemetry.
   ```
   The workflow runs Simplify + Codex (agent-launched, code-reviewer fallback) + qa-sentinel + security,
-  FP-checks each specialist's own findings, then **one coder applies all VERIFIED
-  code/perf/security/simplify findings in a single pass** and re-verifies lint/tsc/build. It returns
+  FP-checks each specialist's own findings, then the **domain writer applies its VERIFIED findings**
+  (canonical writer map v4.26.1: `security` → `security-reviewer`; `code`/`perf`/`migration`/`test`/
+  `simplify` → `coder`) — security-reviewer pass first, then the coder pass — and re-verifies
+  lint/tsc/build. It returns
   `{ codexEngine, perCard: { <CARD-ID>: { fixesApplied, residual } }, gateTable, summary }`.
   **Skip the inline Phase 2.55 + Phase 3.5 below AND the Phase 3.7 gate in `codex-gate.md`** (all three
   are now done), then handle the workflow output HERE in the skill. **Process each `residual` finding by
@@ -61,7 +63,9 @@ so it surfaces in telemetry.
   - `classification == NEEDS_MANUAL_CONFIRMATION` (any domain) → `AskUserQuestion` — the human gate the
     workflow cannot run. (`summary.needsManual` counts these, doc included.)
   - else `domain == doc` residual → carry into **Phase 3** (the doc-reviewer runs there, post-E2E, on final code).
-  - else `code`/`perf`/`security`/`migration` residual (a fix the coder could not converge in its 2 retries)
+  - else `security` residual (a fix not converged in 2 retries) → spawn a targeted `security-reviewer`
+    now over this card's `editableFiles` (it owns the security-invariant contract — never a coder).
+  - else `code`/`perf`/`migration` residual (a fix the coder could not converge in its 2 retries)
     → spawn a targeted `coder` now over this card's `editableFiles`.
   - **QA gate (BLOCKING — mirror of inline Phase 3.5 step 24)**: if `gateTable` has any `status:"FAIL"`
     **OR** `summary.checksFailed` is true, the merge gate is NOT satisfied. Spawn a `coder` on the
@@ -107,7 +111,7 @@ After completeness is verified, clean up the implementation before it reaches re
    - **Efficiency agent** — flag unnecessary work (redundant computations, duplicate API calls, N+1), missed concurrency, hot-path bloat, recurring no-op updates without change-detection guards, TOCTOU existence checks, memory issues (unbounded structures, missing cleanup), overly broad operations.
 4. Aggregate findings from all three agents. For each finding:
-   - **Valid AND in a Domain-Override domain** (the finding's target file matches the `doc`, `security`, or `migration` match rule in "Domain-Override Domains") → do NOT apply inline. Delegate to the domain owner: `doc` → `doc-reviewer` (write mode), `security`/`migration` → `coder`. Even a one-line efficiency fix in `paths.high_risk_modules` or a migration file goes to the owning agent — the orchestrator lacks that domain's invariant contract.
+   - **Valid AND in a Domain-Override domain** (the finding's target file matches the `doc`, `security`, or `migration` match rule in "Domain-Override Domains") → do NOT apply inline. Delegate to the domain **writer** (canonical writer map v4.26.1): `doc` → `doc-reviewer` (write mode), `security` → `security-reviewer` (write mode — it owns the security-invariant contract a coder lacks), `migration` → `coder`. Even a one-line efficiency fix in `paths.high_risk_modules` (security) or a migration file goes to the owning agent — the orchestrator lacks that domain's invariant contract.
    - **Valid AND not in a Domain-Override domain** → fix directly (apply edits inline).
    - **False positive / not worth addressing** → skip, BUT record it (see telemetry). If the skip rests on a "covered by X" / "redundant" / "not needed" rationalization (the same family the AC-Closure Gate guards against), do NOT discard silently — verify the rationale by reading `X`, and if it does not hold, treat the finding as valid.
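The domain-writer delegation in this hunk amounts to a lookup with `coder` as the default, followed by ordered passes. The sketch below is a hypothetical illustration, not the shipped workflow code: the finding shape (`{ id, domain }`), the names `writerFor` and `planPasses`, and the exact pass order (doc-reviewer, then security-reviewer, then coder, borrowed from the Final Review wording) are assumptions.

```javascript
// Hypothetical sketch of the domain-writer routing; names and the finding
// shape ({ id, domain }) are assumptions for illustration only.
const DOMAIN_WRITERS = new Map([
  ['doc', 'doc-reviewer'],
  ['security', 'security-reviewer'],
]);

// Every domain without a dedicated writer (code, perf, migration, ...) goes to coder.
function writerFor(domain) {
  return DOMAIN_WRITERS.get(domain) || 'coder';
}

// Assumed pass order: specialists first, coder last.
const PASS_ORDER = ['doc-reviewer', 'security-reviewer', 'coder'];

function planPasses(findings) {
  const byWriter = new Map();
  for (const finding of findings) {
    const writer = writerFor(finding.domain);
    if (!byWriter.has(writer)) byWriter.set(writer, []);
    byWriter.get(writer).push(finding);
  }
  // Writers with an empty partition are skipped entirely.
  return PASS_ORDER
    .filter((writer) => byWriter.has(writer))
    .map((writer) => ({ writer, findings: byWriter.get(writer) }));
}

const passes = planPasses([
  { id: 'F1', domain: 'migration' },
  { id: 'F2', domain: 'security' },
  { id: 'F3', domain: 'perf' },
]);

// Security findings land in their own pass ahead of the coder pass.
console.log(passes.map((p) => `${p.writer}: ${p.findings.map((f) => f.id).join(',')}`));
```

Keeping the map small and defaulting to `coder` mirrors the rule that only explicitly listed domains override the generic writer, so adding a new specialist would be a one-line change in such a table.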
@@ -279,9 +283,9 @@ skill's Phase 1 falls back to deriving Gherkin scenarios from
       per-card Phase 3.7 gate now skips that duplicate (lean mode), so THIS pass MUST carry it.
       A doc-drift→bug finding whose root cause is in CODE (not the doc) is the ONE thing
       doc-reviewer does NOT fix itself — report it with the conflicting code location + the doc
-      it violates, and the orchestrator routes it to the `security`/code fix path as appropriate.
+      it violates, and the orchestrator routes it to the `security` (→ security-reviewer) / code (→ coder) fix path as appropriate.
     ```
-    Doc-reviewer applies all doc-domain fixes itself. The orchestrator does NOT spawn a coder for doc fixes (since v3.40.0 — `doc` is owned by `doc-reviewer`, see "Domain-Override Domains"). The only doc-reviewer output that leaves this phase unfixed is a **doc-drift→bug finding rooted in CODE** (the implementation contradicts a documented contract). Route it explicitly: if the conflicting code file matches the `security` Domain-Override match rule (`paths.high_risk_modules`) → spawn `coder` with the finding now, in this phase (a security-class code fix is not deferrable to a `light` Phase 3.7); otherwise carry the finding into the Phase 3.7 `/codexreview` input as a known code-drift bug and let the Phase 3.7 fix sub-loop apply it. Either way, append a Fix Application Log row with `domain=codex-correctness` (NOT `doc`) so telemetry attributes it as a code fix. Do NOT leave it accumulating in the tracker with no fix owner.
+    Doc-reviewer applies all doc-domain fixes itself. The orchestrator does NOT spawn a coder for doc fixes (since v3.40.0 — `doc` is owned by `doc-reviewer`, see "Domain-Override Domains"). The only doc-reviewer output that leaves this phase unfixed is a **doc-drift→bug finding rooted in CODE** (the implementation contradicts a documented contract). Route it explicitly: if the conflicting code file matches the `security` Domain-Override match rule (`paths.high_risk_modules`) → spawn `security-reviewer` with the finding now, in this phase (a security-class code fix is not deferrable to a `light` Phase 3.7, and security is owned by `security-reviewer` — never a coder); otherwise carry the finding into the Phase 3.7 `/codexreview` input as a known code-drift bug and let the Phase 3.7 fix sub-loop apply it. Either way, append a Fix Application Log row with `domain=codex-correctness` (NOT `doc`) so telemetry attributes it as a code fix. Do NOT leave it accumulating in the tracker with no fix owner.
 14. **Knowledge-corpus sync (OPTIONAL — only if the project ships a corpus-sync agent)**: There is NO shipped `obsidian-sync` agent — do NOT dispatch one (a hard dispatch to a non-existent subagent fails silently). Only when the project provides its own knowledge-corpus sync agent (declared in `.baldart/overlays/new.md`) AND doc-reviewer's findings indicate a corpus impact, invoke that agent with the listed paths after the doc fixes are applied. Otherwise skip with a one-line notice (`knowledge-corpus sync: skipped (no corpus-sync agent configured)`). Non-blocking either way.
 15. **Telemetry** — after doc-reviewer returns, append one row per doc finding to `## Fix Application Log`: `3 | doc | est_lines=<n> | decision=doc-reviewer | applied_by=doc-reviewer | finding=<1-line>`. If 0 findings, append one row: `3 | doc | est_lines=0 | decision=skipped | applied_by=- | reason=no-findings`. **Phase-8 producer (named counter)** — ALSO record the per-card doc-gap counts as a structured line in `## Current Card` (carried into `## Completed Cards` at Phase 5): `doc_gaps: found=<N> fixed=<M>` where `N` = total doc findings doc-reviewer raised and `M` = those it applied. This is the single named producer for Phase 8's `doc_gaps_found` / `doc_gaps_fixed` fields — without it those fields have no upstream write and Phase 8 would hard-code zeros. (D.4a is the team-mode producer of the same counter — see Phase 7 § D.4a.)
 16. Run `npm run lint` and `npx tsc --noEmit` (when `stack.language` includes typescript) to verify nothing broke (redirect to disk per § "Context economy"). If doc-reviewer touched any source-adjacent file (a `.ts`/`.tsx` helper, a co-located doc export), also run `npm run build`. If any check fails, apply the self-healing retry loop (up to 3 times, no user prompt). **If still failing after 3 retries**: do NOT fall through silently to Phase 3.5 — log `[DOC-PHASE-REGRESSION]` in `## Issues & Flags` and invoke `AskUserQuestion` (revert the doc-phase edits that broke the build / keep and fix manually / stop the card).

package/framework/.claude/skills/new/references/team-mode.md CHANGED

@@ -184,13 +184,15 @@ After ALL agents in the group complete successfully:
      }})
      ```
      The workflow fans out the finders per card, runs ONE Codex pass + ONE qa-sentinel (group max tier)
-     over the union, and **one coder applies all VERIFIED code/perf/security/simplify fixes for the
-     whole group in a single pass** (files disjoint by ownership → no conflict, same as D.3). It returns
-     `{ codexEngine, perCard, gateTable, summary }`. **Skip the inline D.2 (code portion), D.3, D.3b,
+     over the union, and the **domain writer applies all VERIFIED fixes for the whole group** (canonical
+     writer map v4.26.1: `security` → `security-reviewer`, then `code`/`perf`/`migration`/`test`/`simplify`
+     → `coder`; the two passes run sequentially over disjoint-by-ownership files → no conflict, same as D.3).
+     It returns `{ codexEngine, perCard, gateTable, summary }`. **Skip the inline D.2 (code portion), D.3, D.3b,
      D.4, D.4b** below. Then per card handle `perCard[<id>].residual` exactly as the sequential gate does
      (`references/review-cycle.md` § Phase 2.5x — **by classification first**: `NEEDS_MANUAL_CONFIRMATION`
-     any-domain → `AskUserQuestion`; else doc residual → the post-E2E doc step; else unconverged
-     code/perf/security residual → targeted `coder`). Apply the **same BLOCKING QA-gate consumption**:
+     any-domain → `AskUserQuestion`; else doc residual → the post-E2E doc step; else unconverged `security`
+     residual → targeted `security-reviewer`; else unconverged code/perf residual → targeted `coder`).
+     Apply the **same BLOCKING QA-gate consumption**:
      `gateTable` with any `status:"FAIL"` OR `summary.checksFailed` → coder fix (≤2 retries) then
      `AskUserQuestion`; **D.5 commit MUST NOT happen until `gateTable` is PASS/SKIP and `checksFailed` is
      false** (a delegated QA FAIL blocks exactly as inline D.4 / Phase 3.5 would — `gateTable` is

package/framework/.claude/workflows/new-card-review.js CHANGED

@@ -28,7 +28,11 @@ export const meta = {
 //     gateTable, summary }
 // ───────────────────────────────────────────────────────────────────────────
-const a = args || {}
+// Tolerate args delivered as a JSON string (parse-or-default) — the Workflow tool
+// sometimes serializes a structured `args` object to a string; without this guard
+// `a.cards` is undefined → empty `cards` → degenerate no-op return (cards:0, 0 agents).
+let a = args || {}
+if (typeof a === 'string') { try { a = JSON.parse(a) } catch (_) { a = {} } }
 const cards = (Array.isArray(a.cards) ? a.cards : []).filter((c) => c && c.cardId)
 const cfg = a.config || {}
 const highRisk = (cfg.paths && cfg.paths.high_risk_modules) || [] // security-domain hint
@@ -298,59 +302,83 @@ const surviving = classified
   .map((f) => ({ ...f, card: attributeCard(f, fileToCard, cards) }))
 // ───────────────────────────────────────────────────────────────────────────
-// Phase Fix — ONE coder applies all VERIFIED code/perf/security/simplify findings.
-//   doc findings → residual (the skill runs doc-reviewer post-E2E on final code).
-//   NEEDS_MANUAL_CONFIRMATION → residual (human gate, owned by the skill).
+// Phase Fix — the DOMAIN WRITER applies its verified findings (canonical writer
+//   map v4.26.1): security → security-reviewer (owns the security-invariant
+//   contract — never folded into the coder pass); code/perf/migration/test/simplify
+//   → coder. doc findings → residual (the skill runs doc-reviewer post-E2E on final
+//   code). NEEDS_MANUAL_CONFIRMATION → residual (human gate, owned by the skill).
 // ───────────────────────────────────────────────────────────────────────────
 phase('Fix')
 const isDoc = (f) => /doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase())
+// 'security' domain → security-reviewer. migration STAYS coder (canonical writer map: code/perf/
+// migration/test → coder), so match the exact 'security' domain, not the broader verifier regex.
+const isSecurity = (f) => String(f.domain).toLowerCase() === 'security'
 const isManual = (f) => f.classification === 'NEEDS_MANUAL_CONFIRMATION'
 // Partition `surviving` (= VERIFIED + NEEDS_MANUAL; FALSE_POSITIVE already dropped) with NO overlap:
-//   actionable    = VERIFIED non-doc  → the coder fixes these.
+//   securityFix   = VERIFIED security → security-reviewer applies (it owns the security invariants).
+//   actionable    = VERIFIED non-doc non-security → the coder fixes these.
 //   docResidual   = VERIFIED doc      → the skill runs doc-reviewer post-E2E on final code.
 //   manualResidual= NEEDS_MANUAL any  → human gate, owned by the skill (a doc-manual must NOT be
 //                   silently auto-re-reviewed: it carries its needs-manual classification out).
-const actionable = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f))
+const securityFix = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f) && isSecurity(f))
+const actionable = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f) && !isSecurity(f))
 const docResidual = surviving.filter((f) => f.classification === 'VERIFIED' && isDoc(f))
 const manualResidual = surviving.filter(isManual)
 const SKIP_CHECKS = { lint: 'SKIP', tsc: 'SKIP', build: 'SKIP' }
-let fixResult = { applied: [], unresolved: [], checks: { ...SKIP_CHECKS } }
-if (actionable.length && unionEditable.length) {
+// One fix pass: the domain WRITER applies its verified findings to the worktree, then re-verifies.
+// Passes run SEQUENTIALLY (security-reviewer before coder) so edits on shared files never conflict
+// without having to partition the ownership map; the last pass to run carries the build it verified.
+async function applyFixPass(findings, writer, label, role) {
+  if (!findings.length) return { applied: [], unresolved: [], checks: { ...SKIP_CHECKS }, ran: false }
+  if (!unionEditable.length) {
+    log(`Fix: ${findings.length} ${label} finding(s) but no editable files in scope — returned as residual (${writer} skipped).`)
+    return { applied: [], unresolved: findings.map((f) => f.finding_id), checks: { ...SKIP_CHECKS }, ran: false }
+  }
   const fixBrief =
-    `Apply ALL of the verified review findings below to the worktree, then verify the build. You are the SINGLE fix pass for this wave.\n\n` +
+    `Apply ALL of the verified ${role} review findings below to the worktree, then verify the build. You are the ${writer} fix pass for this wave.\n\n` +
     `Worktree: ${a.worktreePath || '(cwd)'} — cd into it.\n` +
     `You MAY edit ONLY these files (ownership map — touching anything else is a violation):\n${unionEditable.join('\n')}\n\n` +
-    `Findings to fix (grouped — fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
-    actionable.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n    evidence: ${f.evidence}\n    direction: ${f.minimal_fix_direction}`).join('\n') +
+    `Findings to fix (fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
+    findings.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n    evidence: ${f.evidence}\n    direction: ${f.minimal_fix_direction}`).join('\n') +
     `\n\nAfter applying: run \`npm run lint\` and (when the project uses typescript) \`npx tsc --noEmit\` and \`npm run build\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
     `Do NOT commit. Do NOT git stash (refs/stash is shared across worktrees). ` +
     `Return: applied (finding_ids you fixed), unresolved (finding_ids you could NOT fix within the allowed files / 2 retries), and checks (PASS/FAIL/SKIP for lint, tsc, build).`
-  const r = await agent(fixBrief, { label: 'fix-coder', phase: 'Fix', agentType: 'coder', schema: FIX_SCHEMA })
-  // Normalize: the coder may die (null) or return a truthy object missing fields.
-  fixResult = (r && typeof r === 'object') ? r : { applied: [], unresolved: actionable.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
-  if (!Array.isArray(fixResult.applied)) fixResult.applied = []
-  if (!Array.isArray(fixResult.unresolved)) fixResult.unresolved = []
-  if (!fixResult.checks || typeof fixResult.checks !== 'object') fixResult.checks = { ...SKIP_CHECKS }
-  log(`Fix: coder applied ${fixResult.applied.length}/${actionable.length} finding(s); checks lint=${fixResult.checks.lint} tsc=${fixResult.checks.tsc} build=${fixResult.checks.build}.`)
-} else if (actionable.length) {
-  // Actionable findings exist but NO editable files are mapped → cannot fix; return all as residual
-  // (no wasted coder spawn — the skill will route them to a targeted coder with a proper ownership scope).
-  fixResult = { applied: [], unresolved: actionable.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
-  log(`Fix: ${actionable.length} actionable finding(s) but no editable files in scope — returned as residual (coder skipped).`)
-} else {
-  log('Fix: no actionable code/perf/security/simplify findings — coder skipped.')
+  const r = await agent(fixBrief, { label, phase: 'Fix', agentType: writer, schema: FIX_SCHEMA })
+  // Normalize: the agent may die (null) or return a truthy object missing fields.
+  const res = (r && typeof r === 'object') ? r : { applied: [], unresolved: findings.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
+  if (!Array.isArray(res.applied)) res.applied = []
+  if (!Array.isArray(res.unresolved)) res.unresolved = []
+  if (!res.checks || typeof res.checks !== 'object') res.checks = { ...SKIP_CHECKS }
+  res.ran = true
+  log(`Fix: ${writer} applied ${res.applied.length}/${findings.length} ${label} finding(s); checks lint=${res.checks.lint} tsc=${res.checks.tsc} build=${res.checks.build}.`)
+  return res
 }
+// Security writer FIRST (owns the security-invariant contract), then the coder. Sequential → no
+// edit conflict on shared files; the coder pass (when it runs) carries the authoritative build.
+const secResult = await applyFixPass(securityFix, 'security-reviewer', 'fix-security', 'security')
+const codeFixResult = await applyFixPass(actionable, 'coder', 'fix-coder', 'code/perf/simplify')
+if (!securityFix.length && !actionable.length) log('Fix: no actionable code/perf/security/simplify findings — fixers skipped.')
+// Merge the two passes. A FAIL in EITHER pass fails the wave; PASS only when a pass actually ran it.
+const fixPasses = [secResult, codeFixResult]
+const allActionable = [...securityFix, ...actionable]
+const appliedIds = new Set(fixPasses.flatMap((p) => (p.applied || []).map((x) => x.finding_id)))
+const unresolvedIds = new Set(fixPasses.flatMap((p) => p.unresolved || []))
+const ranChecks = fixPasses.filter((p) => p.ran).map((p) => p.checks)
+const mergedChecks = ['lint', 'tsc', 'build'].reduce((acc, k) => {
+  acc[k] = ranChecks.some((c) => c[k] === 'FAIL') ? 'FAIL' : (ranChecks.some((c) => c[k] === 'PASS') ? 'PASS' : 'SKIP')
+  return acc
+}, {})
 // Unfixed actionable findings become residual (human/coder follow-up owned by the skill).
-const appliedIds = new Set((fixResult.applied || []).map((x) => x.finding_id))
-const unresolvedIds = new Set(fixResult.unresolved || [])
-const codeResidual = actionable.filter((f) => !appliedIds.has(f.finding_id) || unresolvedIds.has(f.finding_id))
-const checksFailed = ['lint', 'tsc', 'build'].some((k) => fixResult.checks && fixResult.checks[k] === 'FAIL')
+const codeResidual = allActionable.filter((f) => !appliedIds.has(f.finding_id) || unresolvedIds.has(f.finding_id))
+const checksFailed = ['lint', 'tsc', 'build'].some((k) => mergedChecks[k] === 'FAIL')
 // ---- Assemble per-card result ----------------------------------------------
 function bucket(cardId) { return perCard[cardId] || (perCard[cardId] = { fixesApplied: [], residual: [] }) }
-for (const f of actionable) {
+for (const f of allActionable) {
   if (appliedIds.has(f.finding_id) && !unresolvedIds.has(f.finding_id)) {
     bucket(f.card || cards[0].cardId).fixesApplied.push(`[${f.finding_id}] ${f.title}`)
   }
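
The partition and check-merge rules in the hunk above can be tried in isolation. The following is a minimal sketch, not the shipped workflow: `partition`, `mergeChecks`, and the sample findings are hypothetical names and data introduced here for illustration, while the `isDoc` / `isSecurity` predicates and the FAIL > PASS > SKIP precedence are copied from the diff.

```javascript
// Predicates as they appear in the diff.
const isDoc = (f) => /doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase());
const isSecurity = (f) => String(f.domain).toLowerCase() === 'security';

// Hypothetical helper: split surviving findings into the four non-overlapping buckets.
function partition(findings) {
  const verified = findings.filter((f) => f.classification === 'VERIFIED');
  return {
    securityFix: verified.filter((f) => !isDoc(f) && isSecurity(f)),
    actionable: verified.filter((f) => !isDoc(f) && !isSecurity(f)),
    docResidual: verified.filter(isDoc),
    manualResidual: findings.filter((f) => f.classification === 'NEEDS_MANUAL_CONFIRMATION'),
  };
}

// Hypothetical helper: merge per-pass checks. Only passes that actually ran count;
// any FAIL wins, otherwise any PASS, otherwise SKIP.
function mergeChecks(passes) {
  const ran = passes.filter((p) => p.ran).map((p) => p.checks);
  const merged = {};
  for (const key of ['lint', 'tsc', 'build']) {
    if (ran.some((c) => c[key] === 'FAIL')) merged[key] = 'FAIL';
    else if (ran.some((c) => c[key] === 'PASS')) merged[key] = 'PASS';
    else merged[key] = 'SKIP';
  }
  return merged;
}

// Illustrative data only.
const buckets = partition([
  { finding_id: 'S1', domain: 'security', classification: 'VERIFIED' },
  { finding_id: 'M1', domain: 'migration', classification: 'VERIFIED' },
  { finding_id: 'D1', domain: 'docs', classification: 'VERIFIED' },
  { finding_id: 'P1', domain: 'perf', classification: 'NEEDS_MANUAL_CONFIRMATION' },
]);
const merged = mergeChecks([
  { ran: true, checks: { lint: 'PASS', tsc: 'PASS', build: 'SKIP' } },
  { ran: true, checks: { lint: 'PASS', tsc: 'FAIL', build: 'SKIP' } },
  { ran: false, checks: { lint: 'FAIL', tsc: 'FAIL', build: 'FAIL' } }, // ignored: did not run
]);

console.log(buckets.securityFix.map((f) => f.finding_id)); // [ 'S1' ]
console.log(buckets.actionable.map((f) => f.finding_id)); // [ 'M1' ]
console.log(merged); // { lint: 'PASS', tsc: 'FAIL', build: 'SKIP' }
```

Note that `migration` lands in `actionable` (the coder bucket), which matches the comment in the diff about matching the exact `security` domain and not a broader pattern.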

package/framework/.claude/workflows/new-final-review.js CHANGED

@@ -24,7 +24,11 @@ export const meta = {
 //   { codexEngine, findings:[…classified, FALSE_POSITIVE dropped], gateTable, summary }
 // ───────────────────────────────────────────────────────────────────────────
-const a = args || {}
+// Tolerate args delivered as a JSON string (parse-or-default) — the Workflow tool
+// sometimes serializes a structured `args` object to a string; without this guard
+// `a.reviewScopeFiles`/`a.cardPaths` are undefined → empty scope → degenerate no-op return.
+let a = args || {}
+if (typeof a === 'string') { try { a = JSON.parse(a) } catch (_) { a = {} } }
 const scope = Array.isArray(a.reviewScopeFiles) ? a.reviewScopeFiles : []
 const cards = Array.isArray(a.cardPaths) ? a.cardPaths : []
 const cfg = a.config || {}
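
The parse-or-default guard added to both workflow files can be sketched as a standalone function. `normalizeArgs` is a hypothetical name used only here. The final object check is an extra hardening step that is NOT in the diff: it is added in this sketch because `JSON.parse('null')` returns `null`, which would make a later `a.cards` access throw.

```javascript
// Hypothetical helper illustrating the guard; not part of the package.
function normalizeArgs(args) {
  let a = args || {};
  if (typeof a === 'string') {
    try {
      a = JSON.parse(a);
    } catch (_) {
      a = {}; // unparseable string: fall back to an empty args object
    }
  }
  // Assumption beyond the diff: also reject parsed non-objects (null, numbers, strings).
  return a && typeof a === 'object' ? a : {};
}

const fromString = normalizeArgs('{"cards":[{"cardId":"C1"}]}');
const fromObject = normalizeArgs({ cards: [{ cardId: 'C2' }] });
const fromGarbage = normalizeArgs('not json');
const fromNull = normalizeArgs('null');

console.log(fromString.cards[0].cardId); // C1
console.log(fromObject.cards[0].cardId); // C2
console.log(Object.keys(fromGarbage).length, Object.keys(fromNull).length); // 0 0
```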

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.35.0",
+  "version": "4.37.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"

package/src/commands/doctor.js CHANGED

@@ -33,6 +33,7 @@ const Hooks = require('../utils/hooks');
 const GitHooks = require('../utils/githooks');
 const LspInstaller = require('../utils/lsp-installer');
 const GraphifyInstaller = require('../utils/graphify-installer');
+const CodexOrphans = require('../utils/codex-orphans');
 const UpdateNotifier = require('../utils/update-notifier');
 const cliPackageJson = require('../../package.json');
@@ -388,6 +389,23 @@ async function detectState(cwd, opts = {}) {
         }
       }
     } catch (_) { /* never block doctor on graph probe */ }
+    // ---- Orphaned MCP servers from Codex calls (since v4.37.0) ---------
+    // BALDART's Codex finder calls (/new, new2, /codexreview, cron engine)
+    // drive `codex app-server` via the companion plugin. That broker spawns the
+    // MCP servers from ~/.codex/config.toml (Playwright, …) as children and,
+    // being detached, leaks them to init (ppid 1) when it dies — they keep
+    // running (an @playwright/mcp can peg a core for days). Surface the orphans
+    // so the planner can offer a safe reap (orphaned MCP servers only; never the
+    // broker — see codex-orphans.js for the ppid-1 safety invariant). Fully
+    // fail-safe: any error → no orphans reported.
+    state.mcpOrphans = [];
+    state.codexRuntimeOrphans = [];
+    try {
+      const { mcp, runtime } = CodexOrphans.detectOrphans();
+      state.mcpOrphans = mcp;
+      state.codexRuntimeOrphans = runtime;
+    } catch (_) { /* never block doctor on the process probe */ }
   }
   return state;
@@ -781,6 +799,37 @@ function planActions(state) {
     });
   }
+  // Orphaned MCP servers from Codex calls (since v4.37.0). BALDART's Codex
+  // finder calls leave behind MCP-server processes (Playwright, obsidian-mcp, …)
+  // reparented to init when their `codex app-server` broker dies. They keep
+  // burning CPU. Offer a safe reap — orphaned MCP servers only (ppid 1 ⇒ parent
+  // dead ⇒ stdio broken ⇒ dead weight). The action is NOT autoOk: killing
+  // processes warrants explicit intent.
+  if (state.mcpOrphans && state.mcpOrphans.length > 0) {
+    const n = state.mcpOrphans.length;
+    actions.push({
+      key: 'reap-mcp-orphans',
+      label: `Reap ${n} orphaned MCP server process(es) left by Codex`,
+      why: `${n} MCP server(s) are orphaned (ppid 1 — their parent Codex session/broker is dead) and still running. They cannot be reconnected to (their stdio pipe is broken) and waste CPU. Reaping kills each process tree directly via syscall. The codex app-server broker itself is never touched.`,
+      autoOk: false, // kills processes — require explicit intent
+      run: async () => {
+        const procs = CodexOrphans.listProcesses();
+        // Re-detect against a fresh snapshot so we never act on a stale list.
+        const { mcp } = CodexOrphans.detectOrphans(procs);
+        if (mcp.length === 0) {
+          UI.info('No orphaned MCP servers remain — nothing to reap.');
+          return;
+        }
+        const { killed, failed } = CodexOrphans.reapOrphans(mcp, procs);
+        if (killed.length) UI.success(`Reaped ${killed.length} orphaned process(es) (incl. descendants).`);
+        if (failed.length) {
+          UI.warning(`${failed.length} could not be killed:`);
+          failed.forEach((f) => console.log(`    pid ${f.pid}: ${f.error}`));
+        }
+      },
+    });
+  }
   // v3.25.0+: drift detection is authoritative via VERSION compare (isAligned).
   // The HEAD...FETCH_HEAD commit count is subtree-merge noise and never reaches
   // 0, so we MUST NOT use it as the "needs update" signal.
@@ -1037,6 +1086,19 @@ function renderDiagnostic(state) {
     console.log(statusLine('Code graph', 'disabled', 'ok'));
   }
+  // Orphaned MCP servers left by Codex calls (v4.37.0). Only shown when present
+  // — a clean machine prints nothing here (zero noise).
+  if (state.mcpOrphans && state.mcpOrphans.length > 0) {
+    console.log(statusLine(
+      'Codex MCP leak',
+      `${state.mcpOrphans.length} orphaned MCP server(s) running — will be reaped`,
+      'warn'
+    ));
+    state.mcpOrphans.slice(0, 6).forEach((p) =>
+      console.log(`   • pid ${p.pid} (up ${p.etime}): ${p.command.slice(0, 70)}`));
+    if (state.mcpOrphans.length > 6) console.log(`   • … and ${state.mcpOrphans.length - 6} more`);
+  }
   console.log();
 }
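
The `run` handler in the `reap-mcp-orphans` action re-detects orphans against a fresh snapshot before killing anything. A minimal sketch of that pattern follows, with the three `CodexOrphans` functions passed in as parameters so it can be exercised without touching real processes. `reapWithFreshSnapshot`, the `skipped` field, and the fake dependencies are hypothetical and exist only for this illustration.

```javascript
// Hypothetical dependency-injected variant of the run() handler in the diff.
function reapWithFreshSnapshot({ listProcesses, detectOrphans, reapOrphans }) {
  const procs = listProcesses(); // fresh snapshot taken at action time
  const { mcp } = detectOrphans(procs); // re-detect so a stale list is never acted on
  if (mcp.length === 0) return { killed: [], failed: [], skipped: true };
  return { ...reapOrphans(mcp, procs), skipped: false };
}

// Fakes standing in for the real module; no process is signalled here.
const reapCalls = [];
const fakeDeps = {
  listProcesses: () => [{ pid: 501, ppid: 1, etime: '01:00', command: 'node @playwright/mcp' }],
  detectOrphans: (procs) => ({ mcp: procs.filter((p) => p.ppid === 1), runtime: [] }),
  reapOrphans: (mcp) => {
    reapCalls.push(mcp.map((p) => p.pid));
    return { killed: mcp.map((p) => p.pid), failed: [] };
  },
};

const result = reapWithFreshSnapshot(fakeDeps);
const emptyResult = reapWithFreshSnapshot({ ...fakeDeps, listProcesses: () => [] });

console.log(result); // { killed: [ 501 ], failed: [], skipped: false }
console.log(emptyResult.skipped); // true
```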

package/src/utils/codex-orphans.js ADDED

@@ -0,0 +1,182 @@
+/**
+ * Orphaned-MCP-server reaper (since v4.37.0).
+ *
+ * WHY THIS EXISTS
+ * ---------------
+ * BALDART's Codex integration (the `/new`, `new2`, `/codexreview` finder calls
+ * and the cron review engine) drives the OpenAI Codex CLI through the
+ * `codex-companion.mjs` plugin. That companion attaches to a SHARED, persistent
+ * `codex app-server` broker which is spawned `detached + unref'd`
+ * (broker-lifecycle.mjs) and which, in turn, spawns every MCP server declared in
+ * the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children.
+ *
+ * When that broker eventually dies, its MCP children are NOT reaped: the OS
+ * reparents them to init (ppid 1) and they keep running — an `@playwright/mcp`
+ * server can sit at ~100% CPU for days. Over many Codex sessions these
+ * accumulate (the symptom that motivated this utility: ~45 orphaned Playwright
+ * MCP processes pegging the machine).
+ *
+ * SAFETY INVARIANT (read before touching the matchers)
+ * ----------------------------------------------------
+ * We reap a process ONLY when BOTH hold:
+ *   1. ppid === 1  → the process was reparented to init, i.e. its controlling
+ *      parent is DEAD. An MCP server is a stdio child of whatever launched it;
+ *      once that parent dies the stdio pipe is broken and the server is dead
+ *      weight that can never be reconnected to. Reaping it is safe.
+ *   2. the command matches a known MCP-server signature (below).
+ *
+ * We deliberately DO NOT reap the `codex app-server` broker itself. The broker
+ * is `detached + unref'd` BY DESIGN, so a perfectly healthy, in-use shared
+ * runtime ALSO shows ppid 1 — ppid 1 cannot distinguish a leaked broker from a
+ * live one. Killing it could interrupt an in-flight Codex turn. We only report
+ * broker processes for visibility; we never auto-kill them.
+ *
+ * We use Node's `process.kill(pid, 'SIGKILL')` (a direct syscall) rather than shelling out
+ * to `kill` — some sandboxed shells silently swallow multi-arg `kill`/for-loops,
+ * and the syscall path is immune to that.
+ *
+ * Fully fail-safe: any internal error degrades to "no orphans found" / "nothing
+ * reaped". This is hygiene, never a blocker.
+ */
+const { execSync } = require('child_process');
+// Command signatures that identify an MCP server. When such a process is
+// orphaned (ppid 1) it is safe to reap (its stdio parent is gone).
+const MCP_SIGNATURES = [
+  /@playwright\/mcp/,
+  /\bplaywright-mcp\b/,
+  /@modelcontextprotocol\//,
+  /-mcp-server\b/,
+  /\bmcp-server\b/,
+  /\bobsidian-mcp/,
+  /[\w@/.-]+-mcp@/, // npx-launched `<pkg>-mcp@<version>`
+];
+// Codex runtime processes — DETECTED for visibility, never auto-reaped (see the
+// safety note above: a detached broker at ppid 1 may still be the live runtime).
+const CODEX_RUNTIME_SIGNATURES = [
+  /codex\s+app-server/,
+  /codex-companion\.mjs/,
+];
+function matchesAny(signatures, command) {
+  return signatures.some((re) => re.test(command));
+}
+/**
+ * Snapshot every process as { pid, ppid, etime, command }.
+ * `ps -axo` works on both macOS and Linux. Returns [] on any failure or on
+ * Windows (the orphan-reparent-to-init leak is a POSIX phenomenon).
+ */
+function listProcesses() {
+  if (process.platform === 'win32') return [];
+  let raw;
+  try {
+    raw = execSync('ps -axo pid=,ppid=,etime=,command=', {
+      encoding: 'utf8',
+      maxBuffer: 16 * 1024 * 1024,
+      timeout: 5000,
+    });
+  } catch (_) {
+    return [];
+  }
+  const procs = [];
+  for (const line of raw.split('\n')) {
+    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
+    if (!m) continue;
+    procs.push({
+      pid: Number(m[1]),
+      ppid: Number(m[2]),
+      etime: m[3],
+      command: m[4],
+    });
+  }
+  return procs;
+}
+/**
+ * Detect orphaned MCP servers (reapable) and Codex runtime processes (info only).
+ *
+ * @returns {{ mcp: Array, runtime: Array }}
+ *   mcp     — orphaned MCP servers (ppid 1 + MCP signature) safe to reap
+ *   runtime — codex app-server / companion processes (reported, NOT reaped)
+ */
+function detectOrphans(procs = listProcesses()) {
+  const self = process.pid;
+  const mcp = [];
+  const runtime = [];
+  for (const p of procs) {
+    if (p.pid === self) continue;
+    if (p.ppid !== 1) continue; // only true orphans — parent is dead
+    if (matchesAny(MCP_SIGNATURES, p.command)) mcp.push(p);
+    else if (matchesAny(CODEX_RUNTIME_SIGNATURES, p.command)) runtime.push(p);
+  }
+  return { mcp, runtime };
+}
+/**
+ * Collect a pid plus all of its descendants (so killing an orphaned MCP server
+ * also takes down the browser/worker subprocesses it spawned).
+ */
+function collectTree(rootPid, procs) {
+  const childrenOf = new Map();
+  for (const p of procs) {
+    if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
+    childrenOf.get(p.ppid).push(p.pid);
+  }
+  const tree = [];
+  const seen = new Set();
+  const stack = [rootPid];
+  while (stack.length) {
+    const pid = stack.pop();
+    if (seen.has(pid)) continue;
+    seen.add(pid);
+    tree.push(pid);
+    for (const child of childrenOf.get(pid) || []) stack.push(child);
+  }
+  return tree;
+}
+/**
+ * Reap the given orphaned MCP-server processes (and their descendant trees).
+ * Uses process.kill(pid, 'SIGKILL') per-pid — immune to shells that swallow
+ * multi-arg kills. Never throws.
+ *
+ * @param {Array} orphans  the `mcp` array from detectOrphans()
+ * @param {Array} procs    full process snapshot (for descendant resolution)
+ * @returns {{ killed: number[], failed: Array<{pid:number,error:string}> }}
+ */
+function reapOrphans(orphans = [], procs = listProcesses()) {
+  const self = process.pid;
+  const targets = new Set();
+  for (const o of orphans) {
+    for (const pid of collectTree(o.pid, procs)) {
+      if (pid !== self && Number.isInteger(pid) && pid > 1) targets.add(pid);
+    }
+  }
+  const killed = [];
+  const failed = [];
+  // Targets are not sorted by depth: SIGKILL cannot be caught or ignored, so a
+  // killed process gets no chance to re-fork, and a single pass over the set suffices.
+  for (const pid of targets) {
+    try {
+      process.kill(pid, 'SIGKILL');
+      killed.push(pid);
+    } catch (err) {
+      // ESRCH = already gone (e.g. died with its parent tree) → treat as success.
+      if (err && err.code === 'ESRCH') killed.push(pid);
+      else failed.push({ pid, error: (err && err.message) || String(err) });
+    }
+  }
+  return { killed, failed };
+}
+module.exports = {
+  MCP_SIGNATURES,
+  CODEX_RUNTIME_SIGNATURES,
+  listProcesses,
+  detectOrphans,
+  collectTree,
+  reapOrphans,
+};
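
To see how the detection rule behaves, the parsing, classification, and tree-collection steps from the file above can be run against a synthetic `ps` snapshot. This is a sketch under stated assumptions: the sample lines are invented, the signature lists are shortened subsets of the ones in the diff, and `parsePsLine` and `classify` are hypothetical names (the package exposes `listProcesses` and `detectOrphans`). No real process is inspected or killed.

```javascript
// Shortened subsets of the signature lists in the diff.
const MCP_SIGNATURES = [/@playwright\/mcp/, /-mcp-server\b/];
const RUNTIME_SIGNATURES = [/codex\s+app-server/];

// Same line regex as listProcesses(): pid, ppid, etime, then the full command.
function parsePsLine(line) {
  const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
  return m ? { pid: Number(m[1]), ppid: Number(m[2]), etime: m[3], command: m[4] } : null;
}

// Only ppid-1 processes are considered; MCP matches are reapable, runtime matches are report-only.
function classify(procs) {
  const mcp = [];
  const runtime = [];
  for (const p of procs) {
    if (p.ppid !== 1) continue;
    if (MCP_SIGNATURES.some((re) => re.test(p.command))) mcp.push(p);
    else if (RUNTIME_SIGNATURES.some((re) => re.test(p.command))) runtime.push(p);
  }
  return { mcp, runtime };
}

// A pid plus all of its descendants, as in collectTree() above.
function collectTree(rootPid, procs) {
  const childrenOf = new Map();
  for (const p of procs) {
    if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
    childrenOf.get(p.ppid).push(p.pid);
  }
  const tree = [];
  const seen = new Set();
  const stack = [rootPid];
  while (stack.length) {
    const pid = stack.pop();
    if (seen.has(pid)) continue;
    seen.add(pid);
    tree.push(pid);
    for (const child of childrenOf.get(pid) || []) stack.push(child);
  }
  return tree;
}

// Invented snapshot: an orphaned Playwright MCP with a browser child, a detached
// broker, and an MCP server whose parent (pid 650) is still alive.
const sample = [
  '  501     1 2-03:14:09 node /usr/lib/node_modules/@playwright/mcp/cli.js',
  '  502   501    03:14:09 /opt/chrome/chrome --headless',
  '  600     1    00:10:00 codex app-server',
  '  700   650    00:01:00 npx obsidian-mcp-server',
].map(parsePsLine).filter(Boolean);

const { mcp, runtime } = classify(sample);
const tree = collectTree(501, sample);

console.log(mcp.map((p) => p.pid)); // [ 501 ]
console.log(runtime.map((p) => p.pid)); // [ 600 ]
console.log(tree); // [ 501, 502 ]
```

The live `obsidian-mcp-server` (pid 700) is left alone because its parent is not init, and the broker (pid 600) is only reported, which is the safety invariant the file header describes.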