baldart 4.36.0 → 4.37.0

package/CHANGELOG.md CHANGED
@@ -5,6 +5,15 @@ All notable changes to BALDART will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [4.37.0] - 2026-06-15
+
+ **`baldart doctor` now detects and reaps orphaned MCP-server processes left behind by BALDART's Codex calls.** A real machine hit ~100% CPU from ~45 orphaned `@playwright/mcp` processes (plus stray `obsidian-mcp-server` instances), all children of OpenAI Codex CLI sessions that had since died. Root cause traced through the Codex companion plugin: every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the cron review engine) drives `codex app-server` via `codex-companion.mjs`, which attaches to a **shared, `detached + unref'd` broker** (`broker-lifecycle.mjs`). That broker spawns every MCP server declared in the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children; when the broker dies the OS reparents those MCP servers to init (ppid 1) and they keep running — an `@playwright/mcp` can peg a core for days. The leak compounds across sessions. We cannot suppress the MCP spawn per-call (the companion attaches to a broker it does not control, and exposes no shutdown verb), so the fix is a **safe reaper** owned by the doctor. **MINOR** (new doctor diagnostic + self-heal action; backwards-compatible — zero output on a clean machine, no install/layout change, not a `baldart.config.yml` key ⇒ schema-propagation rule N/A).
+
+ ### Added
+
+ - **`src/utils/codex-orphans.js`** — orphaned-MCP-server detector + reaper. `detectOrphans()` snapshots `ps -axo` and returns MCP servers that are **orphaned (ppid 1) AND match an MCP-server command signature** (`@playwright/mcp`, `playwright-mcp`, `*-mcp-server`, `@modelcontextprotocol/*`, `obsidian-mcp`, npx `*-mcp@*`). `reapOrphans()` kills each orphan's full process tree (so a Playwright MCP's browser children go too) via `process.kill(pid, 'SIGKILL')` — a direct syscall, immune to sandboxed shells that silently swallow multi-arg `kill`/for-loops. **Safety invariant**: ppid 1 means the parent is dead, so an MCP server's stdio pipe is broken and the process is unreconnectable dead weight — safe to reap. The `codex app-server` broker is deliberately NOT reaped: it is `detached + unref'd` by design, so a *live, in-use* shared runtime also shows ppid 1 and ppid 1 cannot tell a leaked broker from a healthy one. Broker processes are detected for visibility only. Fully fail-safe (Windows / any error → "no orphans"); no age threshold (an orphan is dead weight at any age).
+ - **`src/commands/doctor.js`** — new probe (`state.mcpOrphans`), diagnostic line (`Codex MCP leak — N orphaned MCP server(s) running`, shown only when present so a clean machine prints nothing), and self-heal action `reap-mcp-orphans` (`autoOk: false` — killing processes warrants explicit intent; re-detects against a fresh snapshot at run time so it never acts on a stale list).
+
  ## [4.36.0] - 2026-06-13
 
  **`/new` security-domain fixes are now applied by `security-reviewer`, not `coder` — the v4.26.1 canonical writer map, finally propagated from `new2` to `/new`.** Auditing the `new2` lessons for guards/logic missing on `/new` surfaced one real gap (the others — args-string guard, JS router clamp, no-self-judge + specialist-owned lane, relevance-gated fan-out — were already present on `/new`). `new2-resolve.js` routes security fixes to `security-reviewer` (`fixerAgent = {doc:'doc-reviewer', ui:'ui-expert', security:'security-reviewer'}[domain] || 'coder'`), but the canonical writer map was never propagated to `/new`'s SSOT: the `Domain-Override Domains` table (SKILL.md) and every fix-routing site still sent `security` → `coder`. A coder applying a one-line RLS/permission/auth fix lacks the security-invariant contract that lives in `security-reviewer`'s system prompt — the same class of error as "wrong agent for the card", and a direct violation of the user's standing strict-specialization principle. **MINOR** (changes which agent applies security fixes across `/new`; backwards-compatible — `migration` stays `coder`, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
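The routing rule quoted in the 4.36.0 entry above can be sketched as a plain lookup with a fallback. This is a minimal illustration, not the contents of `new2-resolve.js`: the names `FIXER_BY_DOMAIN` and `fixerAgentFor` are hypothetical, and only the map entries and the `|| 'coder'` fallback come from the changelog text.

```javascript
// Hypothetical names; the map and the fallback mirror the expression quoted
// in the changelog: {doc, ui, security}[domain] || 'coder'.
const FIXER_BY_DOMAIN = {
  doc: 'doc-reviewer',
  ui: 'ui-expert',
  security: 'security-reviewer',
};

// Returns the agent that should apply a fix for the given domain.
// Domains without a dedicated specialist (for example `migration`) fall
// back to `coder`.
function fixerAgentFor(domain) {
  return FIXER_BY_DOMAIN[domain] || 'coder';
}

console.log(fixerAgentFor('security'));  // security-reviewer
console.log(fixerAgentFor('migration')); // coder
```

Under this sketch, the 4.36.0 change amounts to `/new` consulting the same map that `new2` already used, so a `security` fix no longer falls through to `coder`.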
package/README.md CHANGED
@@ -496,8 +496,17 @@ still exist for power users, but the seamless default makes them unnecessary.
 
  Smart diagnostic that detects the install state and proposes the next sensible
  action (install, migrate legacy layout, configure, refresh config schema,
- update, push, or "nothing to do"). Prints a status table then runs the
- proposed actions with confirmation per step.
+ update, push, repair symlinks, reap orphaned Codex MCP servers, or "nothing to
+ do"). Prints a status table then runs the proposed actions with confirmation per
+ step.
+
+ Since v4.37.0 it also surfaces **orphaned MCP-server processes left by Codex
+ calls** — every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the
+ cron review engine) drives `codex app-server`, whose detached broker spawns the
+ MCP servers from `~/.codex/config.toml` (Playwright, …) and leaks them to init
+ (ppid 1) when it dies, where they keep burning CPU. The doctor reaps the
+ orphaned MCP servers (and their browser children) directly via syscall; the live
+ `codex app-server` broker is never touched.
 
  ```bash
  npx baldart # diagnostic + interactive prompts
package/VERSION CHANGED
@@ -1 +1 @@
- 4.36.0
+ 4.37.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "baldart",
-   "version": "4.36.0",
+   "version": "4.37.0",
    "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
    "bin": {
      "baldart": "./bin/baldart.js"
package/src/commands/doctor.js CHANGED
@@ -33,6 +33,7 @@ const Hooks = require('../utils/hooks');
  const GitHooks = require('../utils/githooks');
  const LspInstaller = require('../utils/lsp-installer');
  const GraphifyInstaller = require('../utils/graphify-installer');
+ const CodexOrphans = require('../utils/codex-orphans');
  const UpdateNotifier = require('../utils/update-notifier');
  const cliPackageJson = require('../../package.json');
 
@@ -388,6 +389,23 @@ async function detectState(cwd, opts = {}) {
          }
        }
      } catch (_) { /* never block doctor on graph probe */ }
+
+     // ---- Orphaned MCP servers from Codex calls (since v4.37.0) ---------
+     // BALDART's Codex finder calls (/new, new2, /codexreview, cron engine)
+     // drive `codex app-server` via the companion plugin. That broker spawns the
+     // MCP servers from ~/.codex/config.toml (Playwright, …) as children and,
+     // being detached, leaks them to init (ppid 1) when it dies — they keep
+     // running (an @playwright/mcp can peg a core for days). Surface the orphans
+     // so the planner can offer a safe reap (orphaned MCP servers only; never the
+     // broker — see codex-orphans.js for the ppid-1 safety invariant). Fully
+     // fail-safe: any error → no orphans reported.
+     state.mcpOrphans = [];
+     state.codexRuntimeOrphans = [];
+     try {
+       const { mcp, runtime } = CodexOrphans.detectOrphans();
+       state.mcpOrphans = mcp;
+       state.codexRuntimeOrphans = runtime;
+     } catch (_) { /* never block doctor on the process probe */ }
    }
 
    return state;
@@ -781,6 +799,37 @@ function planActions(state) {
      });
    }
 
+   // Orphaned MCP servers from Codex calls (since v4.37.0). BALDART's Codex
+   // finder calls leave behind MCP-server processes (Playwright, obsidian-mcp, …)
+   // reparented to init when their `codex app-server` broker dies. They keep
+   // burning CPU. Offer a safe reap — orphaned MCP servers only (ppid 1 ⇒ parent
+   // dead ⇒ stdio broken ⇒ dead weight). The action is NOT autoOk: killing
+   // processes warrants explicit intent.
+   if (state.mcpOrphans && state.mcpOrphans.length > 0) {
+     const n = state.mcpOrphans.length;
+     actions.push({
+       key: 'reap-mcp-orphans',
+       label: `Reap ${n} orphaned MCP server process(es) left by Codex`,
+       why: `${n} MCP server(s) are orphaned (ppid 1 — their parent Codex session/broker is dead) and still running. They cannot be reconnected to (their stdio pipe is broken) and waste CPU. Reaping kills each process tree directly via syscall. The codex app-server broker itself is never touched.`,
+       autoOk: false, // kills processes — require explicit intent
+       run: async () => {
+         const procs = CodexOrphans.listProcesses();
+         // Re-detect against a fresh snapshot so we never act on a stale list.
+         const { mcp } = CodexOrphans.detectOrphans(procs);
+         if (mcp.length === 0) {
+           UI.info('No orphaned MCP servers remain — nothing to reap.');
+           return;
+         }
+         const { killed, failed } = CodexOrphans.reapOrphans(mcp, procs);
+         if (killed.length) UI.success(`Reaped ${killed.length} orphaned process(es) (incl. descendants).`);
+         if (failed.length) {
+           UI.warning(`${failed.length} could not be killed:`);
+           failed.forEach((f) => console.log(` pid ${f.pid}: ${f.error}`));
+         }
+       },
+     });
+   }
+
    // v3.25.0+: drift detection is authoritative via VERSION compare (isAligned).
    // The HEAD...FETCH_HEAD commit count is subtree-merge noise and never reaches
    // 0, so we MUST NOT use it as the "needs update" signal.
@@ -1037,6 +1086,19 @@ function renderDiagnostic(state) {
      console.log(statusLine('Code graph', 'disabled', 'ok'));
    }
 
+   // Orphaned MCP servers left by Codex calls (v4.37.0). Only shown when present
+   // — a clean machine prints nothing here (zero noise).
+   if (state.mcpOrphans && state.mcpOrphans.length > 0) {
+     console.log(statusLine(
+       'Codex MCP leak',
+       `${state.mcpOrphans.length} orphaned MCP server(s) running — will be reaped`,
+       'warn'
+     ));
+     state.mcpOrphans.slice(0, 6).forEach((p) =>
+       console.log(` • pid ${p.pid} (up ${p.etime}): ${p.command.slice(0, 70)}`));
+     if (state.mcpOrphans.length > 6) console.log(` • … and ${state.mcpOrphans.length - 6} more`);
+   }
+
    console.log();
  }
 
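The truncated listing in `renderDiagnostic` above (first six orphans, then a summary line) can be isolated into a small pure function for illustration. This is a sketch only: the released `doctor.js` inlines the logic, and the helper name `formatOrphanLines` and its `limit` parameter are hypothetical.

```javascript
// Hypothetical helper: formats at most `limit` orphan entries and, when the
// list is longer, appends one summary line for the remainder.
function formatOrphanLines(orphans, limit = 6) {
  const lines = orphans
    .slice(0, limit)
    // Commands are cut to 70 characters so one long npx invocation cannot
    // flood the status table.
    .map((p) => ` • pid ${p.pid} (up ${p.etime}): ${p.command.slice(0, 70)}`);
  if (orphans.length > limit) {
    lines.push(` • … and ${orphans.length - limit} more`);
  }
  return lines;
}

// Made-up sample data: eight orphans, so two are summarized.
const sampleOrphans = Array.from({ length: 8 }, (_, i) => ({
  pid: 5000 + i,
  etime: '01:00:00',
  command: 'npx @playwright/mcp@latest',
}));

formatOrphanLines(sampleOrphans).forEach((line) => console.log(line));
```

Returning an array instead of printing directly keeps the formatting testable without capturing `console.log` output.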
package/src/utils/codex-orphans.js ADDED
@@ -0,0 +1,182 @@
+ /**
+  * Orphaned-MCP-server reaper (since v4.37.0).
+  *
+  * WHY THIS EXISTS
+  * ---------------
+  * BALDART's Codex integration (the `/new`, `new2`, `/codexreview` finder calls
+  * and the cron review engine) drives the OpenAI Codex CLI through the
+  * `codex-companion.mjs` plugin. That companion attaches to a SHARED, persistent
+  * `codex app-server` broker which is spawned `detached + unref'd`
+  * (broker-lifecycle.mjs) and which, in turn, spawns every MCP server declared in
+  * the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children.
+  *
+  * When that broker eventually dies, its MCP children are NOT reaped: the OS
+  * reparents them to init (ppid 1) and they keep running — an `@playwright/mcp`
+  * server can sit at ~100% CPU for days. Over many Codex sessions these
+  * accumulate (the symptom that motivated this utility: ~45 orphaned Playwright
+  * MCP processes pegging the machine).
+  *
+  * SAFETY INVARIANT (read before touching the matchers)
+  * ----------------------------------------------------
+  * We reap a process ONLY when BOTH hold:
+  *   1. ppid === 1 → the process was reparented to init, i.e. its controlling
+  *      parent is DEAD. An MCP server is a stdio child of whatever launched it;
+  *      once that parent dies the stdio pipe is broken and the server is dead
+  *      weight that can never be reconnected to. Reaping it is safe.
+  *   2. the command matches a known MCP-server signature (below).
+  *
+  * We deliberately DO NOT reap the `codex app-server` broker itself. The broker
+  * is `detached + unref'd` BY DESIGN, so a perfectly healthy, in-use shared
+  * runtime ALSO shows ppid 1 — ppid 1 cannot distinguish a leaked broker from a
+  * live one. Killing it could interrupt an in-flight Codex turn. We only report
+  * broker processes for visibility; we never auto-kill them.
+  *
+  * We use Node's `process.kill(pid, 'SIGKILL')` (a direct syscall) rather than
+  * shelling out to `kill`: some sandboxed shells silently swallow multi-arg
+  * `kill`/for-loops, and the syscall path is immune to that.
+  *
+  * Fully fail-safe: any internal error degrades to "no orphans found" / "nothing
+  * reaped". This is hygiene, never a blocker.
+  */
+
+ const { execSync } = require('child_process');
+
+ // Command signatures that identify an MCP server. When such a process is
+ // orphaned (ppid 1) it is safe to reap (its stdio parent is gone).
+ const MCP_SIGNATURES = [
+   /@playwright\/mcp/,
+   /\bplaywright-mcp\b/,
+   /@modelcontextprotocol\//,
+   /-mcp-server\b/,
+   /\bmcp-server\b/,
+   /\bobsidian-mcp/,
+   /[\w@/.-]+-mcp@/, // npx-launched `<pkg>-mcp@<version>`
+ ];
+
+ // Codex runtime processes — DETECTED for visibility, never auto-reaped (see the
+ // safety note above: a detached broker at ppid 1 may still be the live runtime).
+ const CODEX_RUNTIME_SIGNATURES = [
+   /codex\s+app-server/,
+   /codex-companion\.mjs/,
+ ];
+
+ function matchesAny(signatures, command) {
+   return signatures.some((re) => re.test(command));
+ }
+
+ /**
+  * Snapshot every process as { pid, ppid, etime, command }.
+  * `ps -axo` works on both macOS and Linux. Returns [] on any failure or on
+  * Windows (the orphan-reparent-to-init leak is a POSIX phenomenon).
+  */
+ function listProcesses() {
+   if (process.platform === 'win32') return [];
+   let raw;
+   try {
+     raw = execSync('ps -axo pid=,ppid=,etime=,command=', {
+       encoding: 'utf8',
+       maxBuffer: 16 * 1024 * 1024,
+       timeout: 5000,
+     });
+   } catch (_) {
+     return [];
+   }
+   const procs = [];
+   for (const line of raw.split('\n')) {
+     const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
+     if (!m) continue;
+     procs.push({
+       pid: Number(m[1]),
+       ppid: Number(m[2]),
+       etime: m[3],
+       command: m[4],
+     });
+   }
+   return procs;
+ }
+
+ /**
+  * Detect orphaned MCP servers (reapable) and Codex runtime processes (info only).
+  *
+  * @returns {{ mcp: Array, runtime: Array }}
+  *   mcp     — orphaned MCP servers (ppid 1 + MCP signature) safe to reap
+  *   runtime — codex app-server / companion processes (reported, NOT reaped)
+  */
+ function detectOrphans(procs = listProcesses()) {
+   const self = process.pid;
+   const mcp = [];
+   const runtime = [];
+   for (const p of procs) {
+     if (p.pid === self) continue;
+     if (p.ppid !== 1) continue; // only true orphans — parent is dead
+     if (matchesAny(MCP_SIGNATURES, p.command)) mcp.push(p);
+     else if (matchesAny(CODEX_RUNTIME_SIGNATURES, p.command)) runtime.push(p);
+   }
+   return { mcp, runtime };
+ }
+
+ /**
+  * Collect a pid plus all of its descendants (so killing an orphaned MCP server
+  * also takes down the browser/worker subprocesses it spawned).
+  */
+ function collectTree(rootPid, procs) {
+   const childrenOf = new Map();
+   for (const p of procs) {
+     if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
+     childrenOf.get(p.ppid).push(p.pid);
+   }
+   const tree = [];
+   const seen = new Set();
+   const stack = [rootPid];
+   while (stack.length) {
+     const pid = stack.pop();
+     if (seen.has(pid)) continue;
+     seen.add(pid);
+     tree.push(pid);
+     for (const child of childrenOf.get(pid) || []) stack.push(child);
+   }
+   return tree;
+ }
+
+ /**
+  * Reap the given orphaned MCP-server processes (and their descendant trees).
+  * Uses process.kill(pid, 'SIGKILL') per-pid — immune to shells that swallow
+  * multi-arg kills. Never throws.
+  *
+  * @param {Array} orphans the `mcp` array from detectOrphans()
+  * @param {Array} procs full process snapshot (for descendant resolution)
+  * @returns {{ killed: number[], failed: Array<{pid:number,error:string}> }}
+  */
+ function reapOrphans(orphans = [], procs = listProcesses()) {
+   const self = process.pid;
+   const targets = new Set();
+   for (const o of orphans) {
+     for (const pid of collectTree(o.pid, procs)) {
+       if (pid !== self && Number.isInteger(pid) && pid > 1) targets.add(pid);
+     }
+   }
+   const killed = [];
+   const failed = [];
+   // Killing descendants before roots (sorting by depth) would be overkill:
+   // SIGKILL cannot be caught, so a single pass in any order suffices.
+   for (const pid of targets) {
+     try {
+       process.kill(pid, 'SIGKILL');
+       killed.push(pid);
+     } catch (err) {
+       // ESRCH = already gone (e.g. died with its parent tree) → treat as success.
+       if (err && err.code === 'ESRCH') killed.push(pid);
+       else failed.push({ pid, error: (err && err.message) || String(err) });
+     }
+   }
+   return { killed, failed };
+ }
+
+ module.exports = {
+   MCP_SIGNATURES,
+   CODEX_RUNTIME_SIGNATURES,
+   listProcesses,
+   detectOrphans,
+   collectTree,
+   reapOrphans,
+ };
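To see how the two-part safety invariant plays out, the following dry-run sketch applies the same signatures, line parser, and tree walk to a synthetic `ps` snapshot. It is self-contained and never calls `process.kill`. The helper name `parsePsLine` and every pid and command in the sample are made up for illustration; the regular expressions are copied from the file above.

```javascript
// Signatures copied from codex-orphans.js as shown in the diff.
const MCP_SIGNATURES = [
  /@playwright\/mcp/, /\bplaywright-mcp\b/, /@modelcontextprotocol\//,
  /-mcp-server\b/, /\bmcp-server\b/, /\bobsidian-mcp/, /[\w@/.-]+-mcp@/,
];
const CODEX_RUNTIME_SIGNATURES = [/codex\s+app-server/, /codex-companion\.mjs/];

// Hypothetical helper: parses one line shaped like the output of
// `ps -axo pid=,ppid=,etime=,command=`.
function parsePsLine(line) {
  const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
  return m
    ? { pid: Number(m[1]), ppid: Number(m[2]), etime: m[3], command: m[4] }
    : null;
}

// Same rule as the module: ppid 1 AND an MCP signature is reapable;
// ppid 1 AND a Codex runtime signature is reported only.
function detectOrphans(procs) {
  const mcp = [];
  const runtime = [];
  for (const p of procs) {
    if (p.ppid !== 1) continue;
    if (MCP_SIGNATURES.some((re) => re.test(p.command))) mcp.push(p);
    else if (CODEX_RUNTIME_SIGNATURES.some((re) => re.test(p.command))) runtime.push(p);
  }
  return { mcp, runtime };
}

// Depth-first walk from a root pid through the ppid links of the snapshot.
function collectTree(rootPid, procs) {
  const childrenOf = new Map();
  for (const p of procs) {
    if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
    childrenOf.get(p.ppid).push(p.pid);
  }
  const tree = [];
  const seen = new Set();
  const stack = [rootPid];
  while (stack.length) {
    const pid = stack.pop();
    if (seen.has(pid)) continue;
    seen.add(pid);
    tree.push(pid);
    for (const child of childrenOf.get(pid) || []) stack.push(child);
  }
  return tree;
}

// Synthetic snapshot: all pids and commands are invented.
const snapshot = [
  ' 4101     1 02-03:14:55 node /usr/local/bin/npx @playwright/mcp@latest',
  ' 4102  4101    03:14:55 /opt/chrome/chrome --headless',
  ' 4200     1    10:00:01 codex app-server',
  ' 4300  4299       00:12 npx obsidian-mcp-server',
  ' 4400     1       05:00 /usr/sbin/sshd -D',
].map(parsePsLine).filter(Boolean);

const { mcp, runtime } = detectOrphans(snapshot);
const wouldKill = mcp.flatMap((o) => collectTree(o.pid, snapshot));

console.log('reapable roots:', mcp.map((p) => p.pid));
console.log('reported only:', runtime.map((p) => p.pid));
console.log('pids a reap would target:', wouldKill);
```

In this sample only pid 4101 qualifies as a reapable root, and its browser child 4102 is pulled in by the tree walk. The `obsidian-mcp-server` process is skipped because its parent is still alive, `sshd` is skipped because it matches no signature, and the broker is listed but never targeted.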