baldart 4.36.0 → 4.37.0
- package/CHANGELOG.md +9 -0
- package/README.md +11 -2
- package/VERSION +1 -1
- package/package.json +1 -1
- package/src/commands/doctor.js +62 -0
- package/src/utils/codex-orphans.js +182 -0
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,15 @@ All notable changes to BALDART will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [4.37.0] - 2026-06-15
|
|
9
|
+
|
|
10
|
+
**`baldart doctor` now detects and reaps orphaned MCP-server processes left behind by BALDART's Codex calls.** A real machine hit ~100% CPU from ~45 orphaned `@playwright/mcp` processes (plus stray `obsidian-mcp-server` instances), all children of OpenAI Codex CLI sessions that had since died. Root cause traced through the Codex companion plugin: every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the cron review engine) drives `codex app-server` via `codex-companion.mjs`, which attaches to a **shared, `detached + unref'd` broker** (`broker-lifecycle.mjs`). That broker spawns every MCP server declared in the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children; when the broker dies the OS reparents those MCP servers to init (ppid 1) and they keep running — an `@playwright/mcp` can peg a core for days. The leak compounds across sessions. We cannot suppress the MCP spawn per-call (the companion attaches to a broker it does not control, and exposes no shutdown verb), so the fix is a **safe reaper** owned by the doctor. **MINOR** (new doctor diagnostic + self-heal action; backwards-compatible — zero output on a clean machine, no install/layout change, not a `baldart.config.yml` key ⇒ schema-propagation rule N/A).
|
|
11
|
+
|
|
12
|
+
### Added
|
|
13
|
+
|
|
14
|
+
- **`src/utils/codex-orphans.js`** — orphaned-MCP-server detector + reaper. `detectOrphans()` snapshots `ps -axo` and returns MCP servers that are **orphaned (ppid 1) AND match an MCP-server command signature** (`@playwright/mcp`, `playwright-mcp`, `*-mcp-server`, `@modelcontextprotocol/*`, `obsidian-mcp`, npx `*-mcp@*`). `reapOrphans()` kills each orphan's full process tree (so a Playwright MCP's browser children go too) via `process.kill(pid, 'SIGKILL')` — a direct syscall, immune to sandboxed shells that silently swallow multi-arg `kill`/for-loops. **Safety invariant**: ppid 1 means the parent is dead, so an MCP server's stdio pipe is broken and the process is unreconnectable dead weight — safe to reap. The `codex app-server` broker is deliberately NOT reaped: it is `detached + unref'd` by design, so a *live, in-use* shared runtime also shows ppid 1 and ppid 1 cannot tell a leaked broker from a healthy one. Broker processes are detected for visibility only. Fully fail-safe (Windows / any error → "no orphans"); no age threshold (an orphan is dead weight at any age).
|
|
15
|
+
- **`src/commands/doctor.js`** — new probe (`state.mcpOrphans`), diagnostic line (`Codex MCP leak — N orphaned MCP server(s) running`, shown only when present so a clean machine prints nothing), and self-heal action `reap-mcp-orphans` (`autoOk: false` — killing processes warrants explicit intent; re-detects against a fresh snapshot at run time so it never acts on a stale list).
|
|
16
|
+
|
|
8
17
|
## [4.36.0] - 2026-06-13
|
|
9
18
|
|
|
10
19
|
**`/new` security-domain fixes are now applied by `security-reviewer`, not `coder` — the v4.26.1 canonical writer map, finally propagated from `new2` to `/new`.** Auditing the `new2` lessons for guards/logic missing on `/new` surfaced one real gap (the others — args-string guard, JS router clamp, no-self-judge + specialist-owned lane, relevance-gated fan-out — were already present on `/new`). `new2-resolve.js` routes security fixes to `security-reviewer` (`fixerAgent = {doc:'doc-reviewer', ui:'ui-expert', security:'security-reviewer'}[domain] || 'coder'`), but the canonical writer map was never propagated to `/new`'s SSOT: the `Domain-Override Domains` table (SKILL.md) and every fix-routing site still sent `security` → `coder`. A coder applying a one-line RLS/permission/auth fix lacks the security-invariant contract that lives in `security-reviewer`'s system prompt — the same class of error as "wrong agent for the card", and a direct violation of the user's standing strict-specialization principle. **MINOR** (changes which agent applies security fixes across `/new`; backwards-compatible — `migration` stays `coder`, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
|
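The detection rule described in the 4.37.0 entry (orphaned at ppid 1 AND matching an MCP-server signature, with the broker reported but never reaped) can be sketched on a synthetic process list. This is a minimal illustration, not the shipped module: the `classify` helper and the sample snapshot are hypothetical, and only a subset of the signatures named in the entry is included.

```javascript
// Hypothetical sketch of the "ppid 1 + signature" rule; not the shipped code.
const MCP_SIGNATURES = [/@playwright\/mcp/, /\bmcp-server\b/, /@modelcontextprotocol\//];
const RUNTIME_SIGNATURES = [/codex\s+app-server/];

const matchesAny = (signatures, command) => signatures.some((re) => re.test(command));

// Split a snapshot into reapable MCP orphans and report-only runtime processes.
function classify(procs) {
  const mcp = [];
  const runtime = [];
  for (const p of procs) {
    if (p.ppid !== 1) continue; // parent still alive: never a candidate
    if (matchesAny(MCP_SIGNATURES, p.command)) mcp.push(p.pid);
    else if (matchesAny(RUNTIME_SIGNATURES, p.command)) runtime.push(p.pid);
  }
  return { mcp, runtime };
}

// Synthetic snapshot (made-up pids and commands).
const snapshot = [
  { pid: 101, ppid: 1, command: 'node /x/node_modules/@playwright/mcp/cli.js' },
  { pid: 102, ppid: 900, command: 'node /x/node_modules/@playwright/mcp/cli.js' },
  { pid: 103, ppid: 1, command: 'codex app-server' },
  { pid: 104, ppid: 1, command: 'node obsidian-mcp-server' },
  { pid: 105, ppid: 1, command: '/usr/sbin/sshd -D' },
];

console.log(classify(snapshot));
```

Here pid 102 is skipped because its parent is still alive, pid 103 lands in the report-only bucket, and pid 105 matches no signature, so only 101 and 104 would be candidates for reaping.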
package/README.md
CHANGED
|
@@ -496,8 +496,17 @@ still exist for power users, but the seamless default makes them unnecessary.
|
|
|
496
496
|
|
|
497
497
|
Smart diagnostic that detects the install state and proposes the next sensible
|
|
498
498
|
action (install, migrate legacy layout, configure, refresh config schema,
|
|
499
|
-
update, push,
|
|
500
|
-
proposed actions with confirmation per
|
|
499
|
+
update, push, repair symlinks, reap orphaned Codex MCP servers, or "nothing to
|
|
500
|
+
do"). Prints a status table then runs the proposed actions with confirmation per
|
|
501
|
+
step.
|
|
502
|
+
|
|
503
|
+
Since v4.37.0 it also surfaces **orphaned MCP-server processes left by Codex
|
|
504
|
+
calls** — every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the
|
|
505
|
+
cron review engine) drives `codex app-server`, whose detached broker spawns the
|
|
506
|
+
MCP servers from `~/.codex/config.toml` (Playwright, …) and leaks them to init
|
|
507
|
+
(ppid 1) when it dies, where they keep burning CPU. The doctor reaps the
|
|
508
|
+
orphaned MCP servers (and their browser children) directly via syscall; the live
|
|
509
|
+
`codex app-server` broker is never touched.
|
|
501
510
|
|
|
502
511
|
```bash
|
|
503
512
|
npx baldart # diagnostic + interactive prompts
|
package/VERSION
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
4.
|
|
1
|
+
4.37.0
|
package/package.json
CHANGED
package/src/commands/doctor.js
CHANGED
|
@@ -33,6 +33,7 @@ const Hooks = require('../utils/hooks');
|
|
|
33
33
|
const GitHooks = require('../utils/githooks');
|
|
34
34
|
const LspInstaller = require('../utils/lsp-installer');
|
|
35
35
|
const GraphifyInstaller = require('../utils/graphify-installer');
|
|
36
|
+
const CodexOrphans = require('../utils/codex-orphans');
|
|
36
37
|
const UpdateNotifier = require('../utils/update-notifier');
|
|
37
38
|
const cliPackageJson = require('../../package.json');
|
|
38
39
|
|
|
@@ -388,6 +389,23 @@ async function detectState(cwd, opts = {}) {
|
|
|
388
389
|
}
|
|
389
390
|
}
|
|
390
391
|
} catch (_) { /* never block doctor on graph probe */ }
|
|
392
|
+
|
|
393
|
+
// ---- Orphaned MCP servers from Codex calls (since v4.37.0) ---------
|
|
394
|
+
// BALDART's Codex finder calls (/new, new2, /codexreview, cron engine)
|
|
395
|
+
// drive `codex app-server` via the companion plugin. That broker spawns the
|
|
396
|
+
// MCP servers from ~/.codex/config.toml (Playwright, …) as children and,
|
|
397
|
+
// being detached, leaks them to init (ppid 1) when it dies — they keep
|
|
398
|
+
// running (an @playwright/mcp can peg a core for days). Surface the orphans
|
|
399
|
+
// so the planner can offer a safe reap (orphaned MCP servers only; never the
|
|
400
|
+
// broker — see codex-orphans.js for the ppid-1 safety invariant). Fully
|
|
401
|
+
// fail-safe: any error → no orphans reported.
|
|
402
|
+
state.mcpOrphans = [];
|
|
403
|
+
state.codexRuntimeOrphans = [];
|
|
404
|
+
try {
|
|
405
|
+
const { mcp, runtime } = CodexOrphans.detectOrphans();
|
|
406
|
+
state.mcpOrphans = mcp;
|
|
407
|
+
state.codexRuntimeOrphans = runtime;
|
|
408
|
+
} catch (_) { /* never block doctor on the process probe */ }
|
|
391
409
|
}
|
|
392
410
|
|
|
393
411
|
return state;
|
|
@@ -781,6 +799,37 @@ function planActions(state) {
|
|
|
781
799
|
});
|
|
782
800
|
}
|
|
783
801
|
|
|
802
|
+
// Orphaned MCP servers from Codex calls (since v4.37.0). BALDART's Codex
|
|
803
|
+
// finder calls leave behind MCP-server processes (Playwright, obsidian-mcp, …)
|
|
804
|
+
// reparented to init when their `codex app-server` broker dies. They keep
|
|
805
|
+
// burning CPU. Offer a safe reap — orphaned MCP servers only (ppid 1 ⇒ parent
|
|
806
|
+
// dead ⇒ stdio broken ⇒ dead weight). The action is NOT autoOk: killing
|
|
807
|
+
// processes warrants explicit intent.
|
|
808
|
+
if (state.mcpOrphans && state.mcpOrphans.length > 0) {
|
|
809
|
+
const n = state.mcpOrphans.length;
|
|
810
|
+
actions.push({
|
|
811
|
+
key: 'reap-mcp-orphans',
|
|
812
|
+
label: `Reap ${n} orphaned MCP server process(es) left by Codex`,
|
|
813
|
+
why: `${n} MCP server(s) are orphaned (ppid 1 — their parent Codex session/broker is dead) and still running. They cannot be reconnected to (their stdio pipe is broken) and waste CPU. Reaping kills each process tree directly via syscall. The codex app-server broker itself is never touched.`,
|
|
814
|
+
autoOk: false, // kills processes — require explicit intent
|
|
815
|
+
run: async () => {
|
|
816
|
+
const procs = CodexOrphans.listProcesses();
|
|
817
|
+
// Re-detect against a fresh snapshot so we never act on a stale list.
|
|
818
|
+
const { mcp } = CodexOrphans.detectOrphans(procs);
|
|
819
|
+
if (mcp.length === 0) {
|
|
820
|
+
UI.info('No orphaned MCP servers remain — nothing to reap.');
|
|
821
|
+
return;
|
|
822
|
+
}
|
|
823
|
+
const { killed, failed } = CodexOrphans.reapOrphans(mcp, procs);
|
|
824
|
+
if (killed.length) UI.success(`Reaped ${killed.length} orphaned process(es) (incl. descendants).`);
|
|
825
|
+
if (failed.length) {
|
|
826
|
+
UI.warning(`${failed.length} could not be killed:`);
|
|
827
|
+
failed.forEach((f) => console.log(` pid ${f.pid}: ${f.error}`));
|
|
828
|
+
}
|
|
829
|
+
},
|
|
830
|
+
});
|
|
831
|
+
}
|
|
832
|
+
|
|
784
833
|
// v3.25.0+: drift detection is authoritative via VERSION compare (isAligned).
|
|
785
834
|
// The HEAD...FETCH_HEAD commit count is subtree-merge noise and never reaches
|
|
786
835
|
// 0, so we MUST NOT use it as the "needs update" signal.
|
|
@@ -1037,6 +1086,19 @@ function renderDiagnostic(state) {
|
|
|
1037
1086
|
console.log(statusLine('Code graph', 'disabled', 'ok'));
|
|
1038
1087
|
}
|
|
1039
1088
|
|
|
1089
|
+
// Orphaned MCP servers left by Codex calls (v4.37.0). Only shown when present
|
|
1090
|
+
// — a clean machine prints nothing here (zero noise).
|
|
1091
|
+
if (state.mcpOrphans && state.mcpOrphans.length > 0) {
|
|
1092
|
+
console.log(statusLine(
|
|
1093
|
+
'Codex MCP leak',
|
|
1094
|
+
`${state.mcpOrphans.length} orphaned MCP server(s) running — will be reaped`,
|
|
1095
|
+
'warn'
|
|
1096
|
+
));
|
|
1097
|
+
state.mcpOrphans.slice(0, 6).forEach((p) =>
|
|
1098
|
+
console.log(` • pid ${p.pid} (up ${p.etime}): ${p.command.slice(0, 70)}`));
|
|
1099
|
+
if (state.mcpOrphans.length > 6) console.log(` • … and ${state.mcpOrphans.length - 6} more`);
|
|
1100
|
+
}
|
|
1101
|
+
|
|
1040
1102
|
console.log();
|
|
1041
1103
|
}
|
|
1042
1104
|
|
|
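The planner hunk above re-detects orphans inside `run` rather than reusing the list captured during diagnosis. Below is a minimal sketch of that pattern, with the detector and reaper injected as plain functions so it runs without touching real processes. `makeReapAction`, `detect`, and `reap` are hypothetical names, not part of `doctor.js`.

```javascript
// Hypothetical sketch: an action that re-detects at run time instead of
// trusting the list captured when the plan was built.
function makeReapAction(staleOrphans, { detect, reap }) {
  return {
    key: 'reap-mcp-orphans',
    label: `Reap ${staleOrphans.length} orphaned MCP server process(es)`,
    autoOk: false, // killing processes needs explicit confirmation
    run: async () => {
      const fresh = detect(); // fresh snapshot; staleOrphans is ignored here
      if (fresh.length === 0) return { reaped: 0, note: 'nothing to reap' };
      const { killed } = reap(fresh);
      return { reaped: killed.length, note: 'done' };
    },
  };
}

// Demo with stubs: two orphans were seen at plan time, one remains at run time.
const action = makeReapAction([{ pid: 11 }, { pid: 12 }], {
  detect: () => [{ pid: 12 }],
  reap: (list) => ({ killed: list.map((p) => p.pid), failed: [] }),
});

action.run().then((result) => console.log(action.label, result));
```

The label still reflects the count seen at plan time, while the work done reflects the state at run time, which is why the real action reports its own result after reaping.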
package/src/utils/codex-orphans.js
@@ -0,0 +1,182 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Orphaned-MCP-server reaper (since v4.37.0).
|
|
3
|
+
*
|
|
4
|
+
* WHY THIS EXISTS
|
|
5
|
+
* ---------------
|
|
6
|
+
* BALDART's Codex integration (the `/new`, `new2`, `/codexreview` finder calls
|
|
7
|
+
* and the cron review engine) drives the OpenAI Codex CLI through the
|
|
8
|
+
* `codex-companion.mjs` plugin. That companion attaches to a SHARED, persistent
|
|
9
|
+
* `codex app-server` broker which is spawned `detached + unref'd`
|
|
10
|
+
* (broker-lifecycle.mjs) and which, in turn, spawns every MCP server declared in
|
|
11
|
+
* the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children.
|
|
12
|
+
*
|
|
13
|
+
* When that broker eventually dies, its MCP children are NOT reaped: the OS
|
|
14
|
+
* reparents them to init (ppid 1) and they keep running — an `@playwright/mcp`
|
|
15
|
+
* server can sit at ~100% CPU for days. Over many Codex sessions these
|
|
16
|
+
* accumulate (the symptom that motivated this utility: ~45 orphaned Playwright
|
|
17
|
+
* MCP processes pegging the machine).
|
|
18
|
+
*
|
|
19
|
+
* SAFETY INVARIANT (read before touching the matchers)
|
|
20
|
+
* ----------------------------------------------------
|
|
21
|
+
* We reap a process ONLY when BOTH hold:
|
|
22
|
+
* 1. ppid === 1 → the process was reparented to init, i.e. its controlling
|
|
23
|
+
* parent is DEAD. An MCP server is a stdio child of whatever launched it;
|
|
24
|
+
* once that parent dies the stdio pipe is broken and the server is dead
|
|
25
|
+
* weight that can never be reconnected to. Reaping it is safe.
|
|
26
|
+
* 2. the command matches a known MCP-server signature (below).
|
|
27
|
+
*
|
|
28
|
+
* We deliberately DO NOT reap the `codex app-server` broker itself. The broker
|
|
29
|
+
* is `detached + unref'd` BY DESIGN, so a perfectly healthy, in-use shared
|
|
30
|
+
* runtime ALSO shows ppid 1 — ppid 1 cannot distinguish a leaked broker from a
|
|
31
|
+
* live one. Killing it could interrupt an in-flight Codex turn. We only report
|
|
32
|
+
* broker processes for visibility; we never auto-kill them.
|
|
33
|
+
*
|
|
34
|
+
* We use Node's `process.kill(pid, 'SIGKILL')` (a direct syscall) rather than shelling out
|
|
35
|
+
* to `kill` — some sandboxed shells silently swallow multi-arg `kill`/for-loops,
|
|
36
|
+
* and the syscall path is immune to that.
|
|
37
|
+
*
|
|
38
|
+
* Fully fail-safe: any internal error degrades to "no orphans found" / "nothing
|
|
39
|
+
* reaped". This is hygiene, never a blocker.
|
|
40
|
+
*/
|
|
41
|
+
|
|
42
|
+
const { execSync } = require('child_process');
|
|
43
|
+
|
|
44
|
+
// Command signatures that identify an MCP server. When such a process is
|
|
45
|
+
// orphaned (ppid 1) it is safe to reap (its stdio parent is gone).
|
|
46
|
+
const MCP_SIGNATURES = [
|
|
47
|
+
/@playwright\/mcp/,
|
|
48
|
+
/\bplaywright-mcp\b/,
|
|
49
|
+
/@modelcontextprotocol\//,
|
|
50
|
+
/-mcp-server\b/,
|
|
51
|
+
/\bmcp-server\b/,
|
|
52
|
+
/\bobsidian-mcp/,
|
|
53
|
+
/[\w@/.-]+-mcp@/, // npx-launched `<pkg>-mcp@<version>`
|
|
54
|
+
];
|
|
55
|
+
|
|
56
|
+
// Codex runtime processes — DETECTED for visibility, never auto-reaped (see the
|
|
57
|
+
// safety note above: a detached broker at ppid 1 may still be the live runtime).
|
|
58
|
+
const CODEX_RUNTIME_SIGNATURES = [
|
|
59
|
+
/codex\s+app-server/,
|
|
60
|
+
/codex-companion\.mjs/,
|
|
61
|
+
];
|
|
62
|
+
|
|
63
|
+
function matchesAny(signatures, command) {
|
|
64
|
+
return signatures.some((re) => re.test(command));
|
|
65
|
+
}
|
|
66
|
+
|
|
67
|
+
/**
|
|
68
|
+
* Snapshot every process as { pid, ppid, etime, command }.
|
|
69
|
+
* `ps -axo` works on both macOS and Linux. Returns [] on any failure or on
|
|
70
|
+
* Windows (the orphan-reparent-to-init leak is a POSIX phenomenon).
|
|
71
|
+
*/
|
|
72
|
+
function listProcesses() {
|
|
73
|
+
if (process.platform === 'win32') return [];
|
|
74
|
+
let raw;
|
|
75
|
+
try {
|
|
76
|
+
raw = execSync('ps -axo pid=,ppid=,etime=,command=', {
|
|
77
|
+
encoding: 'utf8',
|
|
78
|
+
maxBuffer: 16 * 1024 * 1024,
|
|
79
|
+
timeout: 5000,
|
|
80
|
+
});
|
|
81
|
+
} catch (_) {
|
|
82
|
+
return [];
|
|
83
|
+
}
|
|
84
|
+
const procs = [];
|
|
85
|
+
for (const line of raw.split('\n')) {
|
|
86
|
+
const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
|
|
87
|
+
if (!m) continue;
|
|
88
|
+
procs.push({
|
|
89
|
+
pid: Number(m[1]),
|
|
90
|
+
ppid: Number(m[2]),
|
|
91
|
+
etime: m[3],
|
|
92
|
+
command: m[4],
|
|
93
|
+
});
|
|
94
|
+
}
|
|
95
|
+
return procs;
|
|
96
|
+
}
|
|
97
|
+
|
|
98
|
+
/**
|
|
99
|
+
* Detect orphaned MCP servers (reapable) and Codex runtime processes (info only).
|
|
100
|
+
*
|
|
101
|
+
* @returns {{ mcp: Array, runtime: Array }}
|
|
102
|
+
* mcp — orphaned MCP servers (ppid 1 + MCP signature) safe to reap
|
|
103
|
+
* runtime — codex app-server / companion processes (reported, NOT reaped)
|
|
104
|
+
*/
|
|
105
|
+
function detectOrphans(procs = listProcesses()) {
|
|
106
|
+
const self = process.pid;
|
|
107
|
+
const mcp = [];
|
|
108
|
+
const runtime = [];
|
|
109
|
+
for (const p of procs) {
|
|
110
|
+
if (p.pid === self) continue;
|
|
111
|
+
if (p.ppid !== 1) continue; // only true orphans — parent is dead
|
|
112
|
+
if (matchesAny(MCP_SIGNATURES, p.command)) mcp.push(p);
|
|
113
|
+
else if (matchesAny(CODEX_RUNTIME_SIGNATURES, p.command)) runtime.push(p);
|
|
114
|
+
}
|
|
115
|
+
return { mcp, runtime };
|
|
116
|
+
}
|
|
117
|
+
|
|
118
|
+
/**
|
|
119
|
+
* Collect a pid plus all of its descendants (so killing an orphaned MCP server
|
|
120
|
+
* also takes down the browser/worker subprocesses it spawned).
|
|
121
|
+
*/
|
|
122
|
+
function collectTree(rootPid, procs) {
|
|
123
|
+
const childrenOf = new Map();
|
|
124
|
+
for (const p of procs) {
|
|
125
|
+
if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
|
|
126
|
+
childrenOf.get(p.ppid).push(p.pid);
|
|
127
|
+
}
|
|
128
|
+
const tree = [];
|
|
129
|
+
const seen = new Set();
|
|
130
|
+
const stack = [rootPid];
|
|
131
|
+
while (stack.length) {
|
|
132
|
+
const pid = stack.pop();
|
|
133
|
+
if (seen.has(pid)) continue;
|
|
134
|
+
seen.add(pid);
|
|
135
|
+
tree.push(pid);
|
|
136
|
+
for (const child of childrenOf.get(pid) || []) stack.push(child);
|
|
137
|
+
}
|
|
138
|
+
return tree;
|
|
139
|
+
}
|
|
140
|
+
|
|
141
|
+
/**
|
|
142
|
+
* Reap the given orphaned MCP-server processes (and their descendant trees).
|
|
143
|
+
* Uses process.kill(pid, 'SIGKILL') per-pid — immune to shells that swallow
|
|
144
|
+
* multi-arg kills. Never throws.
|
|
145
|
+
*
|
|
146
|
+
* @param {Array} orphans the `mcp` array from detectOrphans()
|
|
147
|
+
* @param {Array} procs full process snapshot (for descendant resolution)
|
|
148
|
+
* @returns {{ killed: number[], failed: Array<{pid:number,error:string}> }}
|
|
149
|
+
*/
|
|
150
|
+
function reapOrphans(orphans = [], procs = listProcesses()) {
|
|
151
|
+
const self = process.pid;
|
|
152
|
+
const targets = new Set();
|
|
153
|
+
for (const o of orphans) {
|
|
154
|
+
for (const pid of collectTree(o.pid, procs)) {
|
|
155
|
+
if (pid !== self && Number.isInteger(pid) && pid > 1) targets.add(pid);
|
|
156
|
+
}
|
|
157
|
+
}
|
|
158
|
+
const killed = [];
|
|
159
|
+
const failed = [];
|
|
160
|
+
// No depth-sorted ordering (descendants before roots) is needed: SIGKILL
|
|
161
|
+
// cannot be caught or ignored, so one pass over the target set suffices.
|
|
162
|
+
for (const pid of targets) {
|
|
163
|
+
try {
|
|
164
|
+
process.kill(pid, 'SIGKILL');
|
|
165
|
+
killed.push(pid);
|
|
166
|
+
} catch (err) {
|
|
167
|
+
// ESRCH = already gone (e.g. died with its parent tree) → treat as success.
|
|
168
|
+
if (err && err.code === 'ESRCH') killed.push(pid);
|
|
169
|
+
else failed.push({ pid, error: (err && err.message) || String(err) });
|
|
170
|
+
}
|
|
171
|
+
}
|
|
172
|
+
return { killed, failed };
|
|
173
|
+
}
|
|
174
|
+
|
|
175
|
+
module.exports = {
|
|
176
|
+
MCP_SIGNATURES,
|
|
177
|
+
CODEX_RUNTIME_SIGNATURES,
|
|
178
|
+
listProcesses,
|
|
179
|
+
detectOrphans,
|
|
180
|
+
collectTree,
|
|
181
|
+
reapOrphans,
|
|
182
|
+
};
|
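To make the parsing and tree-collection steps in `codex-orphans.js` concrete, here is a small self-contained sketch run against made-up `ps` output. The regex and traversal mirror the ones in the file above. The sample text and the `parsePs` and `descendantsOf` names are illustrative assumptions, and nothing here sends a signal to any process.

```javascript
// Illustrative sketch: parse `ps -axo pid=,ppid=,etime=,command=` style text
// and collect a pid together with all of its descendants.
function parsePs(raw) {
  const procs = [];
  for (const line of raw.split('\n')) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
    if (!m) continue; // skip blank or malformed lines
    procs.push({ pid: Number(m[1]), ppid: Number(m[2]), etime: m[3], command: m[4] });
  }
  return procs;
}

function descendantsOf(rootPid, procs) {
  // Index children by parent pid once, then walk the tree iteratively.
  const childrenOf = new Map();
  for (const p of procs) {
    if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
    childrenOf.get(p.ppid).push(p.pid);
  }
  const tree = [];
  const seen = new Set();
  const stack = [rootPid];
  while (stack.length) {
    const pid = stack.pop();
    if (seen.has(pid)) continue; // guards against cycles in odd snapshots
    seen.add(pid);
    tree.push(pid);
    for (const child of childrenOf.get(pid) || []) stack.push(child);
  }
  return tree;
}

// Made-up snapshot: an orphaned MCP server (200) with a browser child (201)
// and a grandchild (202), plus an unrelated process (300).
const sample = [
  '  200     1 02-03:04:05 node @playwright/mcp',
  '  201   200    03:04:05 chrome --headless',
  '  202   201       04:05 chrome --type=renderer',
  '  300     1       00:10 /usr/sbin/sshd -D',
].join('\n');

const procs = parsePs(sample);
console.log(descendantsOf(200, procs).sort((a, b) => a - b));
```

Because the descendant lookup works from the same snapshot that produced the orphan list, a reaper built this way only targets pids that were children at snapshot time. Processes forked afterwards would need a fresh snapshot.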