baldart 4.35.0 → 4.37.0
- package/CHANGELOG.md +28 -0
- package/README.md +11 -2
- package/VERSION +1 -1
- package/framework/.claude/agents/security-reviewer.md +1 -0
- package/framework/.claude/skills/new/SKILL.md +4 -2
- package/framework/.claude/skills/new/references/codex-gate.md +2 -2
- package/framework/.claude/skills/new/references/final-review.md +2 -2
- package/framework/.claude/skills/new/references/review-cycle.md +10 -6
- package/framework/.claude/skills/new/references/team-mode.md +7 -5
- package/framework/.claude/workflows/new-card-review.js +58 -30
- package/framework/.claude/workflows/new-final-review.js +5 -1
- package/package.json +1 -1
- package/src/commands/doctor.js +62 -0
- package/src/utils/codex-orphans.js +182 -0
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,34 @@ All notable changes to BALDART will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [4.37.0] - 2026-06-15
|
|
9
|
+
|
|
10
|
+
**`baldart doctor` now detects and reaps orphaned MCP-server processes left behind by BALDART's Codex calls.** A real machine hit ~100% CPU from ~45 orphaned `@playwright/mcp` processes (plus stray `obsidian-mcp-server` instances), all children of OpenAI Codex CLI sessions that had since died. Root cause traced through the Codex companion plugin: every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the cron review engine) drives `codex app-server` via `codex-companion.mjs`, which attaches to a **shared, `detached + unref'd` broker** (`broker-lifecycle.mjs`). That broker spawns every MCP server declared in the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children; when the broker dies the OS reparents those MCP servers to init (ppid 1) and they keep running — an `@playwright/mcp` can peg a core for days. The leak compounds across sessions. We cannot suppress the MCP spawn per-call (the companion attaches to a broker it does not control, and exposes no shutdown verb), so the fix is a **safe reaper** owned by the doctor. **MINOR** (new doctor diagnostic + self-heal action; backwards-compatible — zero output on a clean machine, no install/layout change, not a `baldart.config.yml` key ⇒ schema-propagation rule N/A).
|
|
11
|
+
|
|
12
|
+
### Added
|
|
13
|
+
|
|
14
|
+
- **`src/utils/codex-orphans.js`** — orphaned-MCP-server detector + reaper. `detectOrphans()` snapshots `ps -axo` and returns MCP servers that are **orphaned (ppid 1) AND match an MCP-server command signature** (`@playwright/mcp`, `playwright-mcp`, `*-mcp-server`, `@modelcontextprotocol/*`, `obsidian-mcp`, npx `*-mcp@*`). `reapOrphans()` kills each orphan's full process tree (so a Playwright MCP's browser children go too) via `process.kill(pid, 'SIGKILL')` — a direct syscall, immune to sandboxed shells that silently swallow multi-arg `kill`/for-loops. **Safety invariant**: ppid 1 means the parent is dead, so an MCP server's stdio pipe is broken and the process is unreconnectable dead weight — safe to reap. The `codex app-server` broker is deliberately NOT reaped: it is `detached + unref'd` by design, so a *live, in-use* shared runtime also shows ppid 1 and ppid 1 cannot tell a leaked broker from a healthy one. Broker processes are detected for visibility only. Fully fail-safe (Windows / any error → "no orphans"); no age threshold (an orphan is dead weight at any age).
|
|
15
|
+
- **`src/commands/doctor.js`** — new probe (`state.mcpOrphans`), diagnostic line (`Codex MCP leak — N orphaned MCP server(s) running`, shown only when present so a clean machine prints nothing), and self-heal action `reap-mcp-orphans` (`autoOk: false` — killing processes warrants explicit intent; re-detects against a fresh snapshot at run time so it never acts on a stale list).
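The detection rule described in the entries above (parent pid 1 AND an MCP-server command signature) can be sketched as follows. This is a minimal illustration, not the shipped `codex-orphans.js`: the `ps` column list, the function names `parsePs` and `findOrphanedMcpServers`, and the regular expressions are assumptions that only approximate the signatures listed in the changelog.

```javascript
// Hypothetical signature list, approximating the patterns named in the changelog.
const MCP_SIGNATURES = [
  /@playwright\/mcp/,
  /playwright-mcp/,
  /[\w-]+-mcp-server/,
  /@modelcontextprotocol\//,
  /obsidian-mcp/,
  /\bnpx\b.*[\w-]+-mcp@/,
];

// Parse text shaped like the output of `ps -axo pid=,ppid=,command=`
// (assumed column list; the trailing `=` suppresses the header row).
function parsePs(psOutput) {
  const rows = [];
  for (const line of psOutput.split('\n')) {
    const m = line.match(/^\s*(\d+)\s+(\d+)\s+(.*)$/);
    if (m) rows.push({ pid: Number(m[1]), ppid: Number(m[2]), command: m[3] });
  }
  return rows;
}

// Keep only processes that are BOTH reparented to init (ppid 1)
// AND look like an MCP server. Either condition alone is not enough.
function findOrphanedMcpServers(psOutput) {
  return parsePs(psOutput).filter(
    (p) => p.ppid === 1 && MCP_SIGNATURES.some((re) => re.test(p.command))
  );
}
```

In this sketch a `codex app-server` process with ppid 1 matches no signature, so the filter leaves the broker alone, in line with the safety invariant stated above. A real snapshot would presumably come from running `ps` inside a `try`/`catch` that returns an empty list on any failure, which is one way to get the fail-safe behaviour the entry describes.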
|
|
16
|
+
|
|
17
|
+
## [4.36.0] - 2026-06-13
|
|
18
|
+
|
|
19
|
+
**`/new` security-domain fixes are now applied by `security-reviewer`, not `coder` — the v4.26.1 canonical writer map, finally propagated from `new2` to `/new`.** Auditing the `new2` lessons for guards/logic missing on `/new` surfaced one real gap (the others — args-string guard, JS router clamp, no-self-judge + specialist-owned lane, relevance-gated fan-out — were already present on `/new`). `new2-resolve.js` routes security fixes to `security-reviewer` (`fixerAgent = {doc:'doc-reviewer', ui:'ui-expert', security:'security-reviewer'}[domain] || 'coder'`), but the canonical writer map was never propagated to `/new`'s SSOT: the `Domain-Override Domains` table (SKILL.md) and every fix-routing site still sent `security` → `coder`. A coder applying a one-line RLS/permission/auth fix lacks the security-invariant contract that lives in `security-reviewer`'s system prompt — the same class of error as "wrong agent for the card", and a direct violation of the user's standing strict-specialization principle. **MINOR** (changes which agent applies security fixes across `/new`; backwards-compatible — `migration` stays `coder`, no install/layout change, no `baldart.config.yml` key ⇒ schema-propagation rule N/A).
|
|
20
|
+
|
|
21
|
+
### Changed
|
|
22
|
+
|
|
23
|
+
- **`framework/.claude/skills/new/SKILL.md`** — `Domain-Override Domains` table: `security` owning agent `coder` → **`security-reviewer`** (write mode), plus a new "Why `security` is owned by `security-reviewer`" rationale mirroring the `doc` one. The sequential-mode overview line aligned too.
|
|
24
|
+
- **`framework/.claude/skills/new/references/review-cycle.md`**, **`final-review.md`**, **`team-mode.md`**, **`codex-gate.md`** — every security-fix routing site (Phase 2.55 Domain-Override delegation, the delegated-workflow residual routing, the Final FULL merge-blocking partition, the Phase 3.7 codex fix sub-loop, and the doc-drift→bug security path) now routes `security` → `security-reviewer` and runs it before the `coder` pass. `migration` stays `coder`.
|
|
25
|
+
- **`framework/.claude/workflows/new-card-review.js`** — the Fix phase no longer folds security into the single coder pass. It partitions `VERIFIED` findings into a `security-reviewer` pass (domain `security`) and a `coder` pass (`code`/`perf`/`migration`/`test`/`simplify`), run sequentially (security first) over the disjoint-by-ownership editable set so shared-file edits never conflict; a FAIL in either pass fails the wave. `new-final-review.js` needed no change (it is read-only — the calling skill applies fixes — and its `domainVerifier` already routes security verification to `security-reviewer`).
|
|
26
|
+
- **`framework/.claude/agents/security-reviewer.md`** — new "Dual mode — review vs. apply" Behavior Rule: by default it audits and proposes (read-only), but when invoked as the security domain writer (by `/new`/`new2`/the codex fix loop) it APPLIES the remediation directly via Edit/Write and re-verifies — security fixes are owned by it, never deferred to a coder.
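The Fix-phase partition described in the `new-card-review.js` entry above can be sketched like this. It is a hedged illustration only: `partitionVerified`, `runFixPhase`, and the `runPass` callback are hypothetical names, and the finding shape (`domain`, `classification`) is assumed from the field names used elsewhere in this diff.

```javascript
// Domains the changelog assigns to the coder pass.
const CODER_DOMAINS = new Set(['code', 'perf', 'migration', 'test', 'simplify']);

// Split VERIFIED findings into the two writer passes; anything else
// (for example doc findings) is left for the calling skill to handle.
function partitionVerified(findings) {
  const verified = findings.filter((f) => f.classification === 'VERIFIED');
  return {
    security: verified.filter((f) => f.domain === 'security'),
    coder: verified.filter((f) => CODER_DOMAINS.has(f.domain)),
  };
}

// Run the security-reviewer pass first, then the coder pass, never in parallel.
// `runPass(agent, findings)` is a hypothetical async callback that resolves
// to an object with a `status` of 'PASS' or 'FAIL'.
async function runFixPhase(findings, runPass) {
  const { security, coder } = partitionVerified(findings);
  const results = [];
  if (security.length) results.push(await runPass('security-reviewer', security));
  if (coder.length) results.push(await runPass('coder', coder));
  // A FAIL in either pass fails the whole wave.
  const status = results.some((r) => r.status === 'FAIL') ? 'FAIL' : 'PASS';
  return { status, results };
}
```

Running the passes sequentially rather than concurrently mirrors the stated reason in the entry: the two writers then never edit a shared file at the same time.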
|
|
27
|
+
|
|
28
|
+
## [4.35.1] - 2026-06-13
|
|
29
|
+
|
|
30
|
+
**`/new` workflow delegation no longer degrades to a silent no-op when `args` arrives as a JSON string.** A live `/new FEAT-0027 -full` team-mode run delegated its per-wave review cluster to the `new-card-review` workflow and got back a degenerate result (`cards:0`, 0 agents, ~24ms) — the orchestrator correctly fell back to the inline cluster, but the delegation (the single biggest context-economy win in team mode) was wasted on every wave. Root cause: the `Workflow` tool sometimes serializes a structured `args` object to a JSON **string**; `new-card-review.js` and `new-final-review.js` read `args.cards` / `args.reviewScopeFiles` directly, so a string `args` left those `undefined` → empty scope → the early-return guard fired. The `new2` family (`new2.js`, `new2-resolve.js`) had already been hardened against exactly this (`F-001/F-004` parse-or-default guard), but the fix was **never propagated** to the two `/new` workflows — a parallel-location miss. **PATCH** (bugfix to shipped workflow payload, no behaviour change to install, no config key ⇒ schema-propagation rule N/A).
|
|
31
|
+
|
|
32
|
+
### Fixed
|
|
33
|
+
|
|
34
|
+
- **`framework/.claude/workflows/new-card-review.js`**, **`framework/.claude/workflows/new-final-review.js`** — added the same defensive `if (typeof a === 'string') { try { a = JSON.parse(a) } catch (_) { a = {} } }` guard already present in `new2.js`/`new2-resolve.js`. All four workflows now tolerate `args` delivered as a JSON string, so `/new`'s delegated review cluster and Final Review fan-out run as intended instead of no-op'ing into the inline fallback.
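The parse-or-default guard quoted above can be exercised in isolation with a small sketch. `normalizeArgs` and `selectCards` are hypothetical helper names used only for illustration, and the extra `null`/primitive check is an addition of this sketch, not something the changelog says the workflows do.

```javascript
// Accept `args` as an object, a JSON string, or nothing at all.
function normalizeArgs(args) {
  let a = args || {};
  if (typeof a === 'string') {
    try { a = JSON.parse(a); } catch (_) { a = {}; }
  }
  // Addition in this sketch: JSON.parse can legally return null or a
  // primitive (e.g. for the strings 'null' or '42'), so default those too.
  if (a === null || typeof a !== 'object') a = {};
  return a;
}

// Same card filter as the workflow line shown in this diff.
function selectCards(args) {
  const a = normalizeArgs(args);
  return (Array.isArray(a.cards) ? a.cards : []).filter((c) => c && c.cardId);
}
```

With this shape, a structured object and its JSON-string serialization select the same cards, while malformed input degrades to an empty list instead of throwing.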
|
|
35
|
+
|
|
8
36
|
## [4.35.0] - 2026-06-13
|
|
9
37
|
|
|
10
38
|
**Card-baseline standardization — every backlog card, any prefix/origin, conforms to one profile-aware SSOT; `/new` normalizes foreign cards at ingestion.** A real `CHORE-0007` (consumer repo `mayo`) reached `/new` **without `review_profile`** (and without `scope`/`scope_boundaries`/`canonical_docs`): it was hand/ad-hoc authored after a graph-align finding, never by the canonical writer. `/new` and `/new2` consume cards **type-blind** — they scale per-card review depth on `review_profile` and run the same pipeline regardless of prefix — so an off-baseline card silently degrades the pipeline. Root cause: the baseline was scattered across `card-template.yml` + `prd-card-writer`'s Required-Fields + Rule C with **no single SSOT and no validator**; `prd-card-writer` only documented the `/prd` epic+children flow (its standalone single-card mode, used by `new2-resolve`, was undocumented); and three writers diverged — `/prd`/`new2`/`new2-resolve` emit the full baseline, but the `/new` AC-deferral stub (`completeness.md`) and `/issue-review` (`issue-review.md`) wrote partial cards.
|
package/README.md
CHANGED
|
@@ -496,8 +496,17 @@ still exist for power users, but the seamless default makes them unnecessary.
|
|
|
496
496
|
|
|
497
497
|
Smart diagnostic that detects the install state and proposes the next sensible
|
|
498
498
|
action (install, migrate legacy layout, configure, refresh config schema,
|
|
499
|
-
update, push,
|
|
500
|
-
proposed actions with confirmation per
|
|
499
|
+
update, push, repair symlinks, reap orphaned Codex MCP servers, or "nothing to
|
|
500
|
+
do"). Prints a status table then runs the proposed actions with confirmation per
|
|
501
|
+
step.
|
|
502
|
+
|
|
503
|
+
Since v4.37.0 it also surfaces **orphaned MCP-server processes left by Codex
|
|
504
|
+
calls** — every BALDART Codex finder call (`/new`, `new2`, `/codexreview`, the
|
|
505
|
+
cron review engine) drives `codex app-server`, whose detached broker spawns the
|
|
506
|
+
MCP servers from `~/.codex/config.toml` (Playwright, …) and leaks them to init
|
|
507
|
+
(ppid 1) when it dies, where they keep burning CPU. The doctor reaps the
|
|
508
|
+
orphaned MCP servers (and their browser children) directly via syscall; the live
|
|
509
|
+
`codex app-server` broker is never touched.
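As a rough sketch of what reaping an orphan "and its browser children" might involve, the snippet below collects a process tree from a `{ pid, ppid }` snapshot and signals each member with Node's `process.kill`. The helper names `collectTree` and `reapTree`, the snapshot shape, and the children-first ordering are assumptions of this example, not the documented behaviour of `baldart doctor`.

```javascript
// Return rootPid plus all of its descendants, descendants first.
function collectTree(rootPid, snapshot) {
  const childrenOf = new Map();
  for (const { pid, ppid } of snapshot) {
    if (!childrenOf.has(ppid)) childrenOf.set(ppid, []);
    childrenOf.get(ppid).push(pid);
  }
  const ordered = [];
  const seen = new Set();
  (function visit(pid) {
    if (seen.has(pid)) return; // guard against a malformed, cyclic snapshot
    seen.add(pid);
    for (const child of childrenOf.get(pid) || []) visit(child);
    ordered.push(pid); // post-order: every descendant precedes its parent
  })(rootPid);
  return ordered;
}

// Send SIGKILL to every pid in the tree. `kill` is injectable so the
// function can be tried out without signalling real processes.
function reapTree(rootPid, snapshot, kill = process.kill.bind(process)) {
  const killed = [];
  for (const pid of collectTree(rootPid, snapshot)) {
    try {
      kill(pid, 'SIGKILL');
      killed.push(pid);
    } catch (_) {
      // The process already exited or cannot be signalled: skip it.
    }
  }
  return killed;
}
```

Calling `process.kill` per pid avoids going through a shell at all, which fits the "directly via syscall" wording above. The per-pid `try`/`catch` keeps one vanished process from aborting the rest of the sweep.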
|
|
501
510
|
|
|
502
511
|
```bash
|
|
503
512
|
npx baldart # diagnostic + interactive prompts
|
package/VERSION
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
4.
|
|
1
|
+
4.37.0
|
package/framework/.claude/agents/security-reviewer.md
CHANGED
|
@@ -50,6 +50,7 @@ Before reviewing:
|
|
|
50
50
|
|
|
51
51
|
## Behavior Rules
|
|
52
52
|
|
|
53
|
+
- **Dual mode — review vs. apply.** By default you AUDIT and *propose* remediations (read-only). But when invoked as the **security domain writer** (e.g. by `/new` / `new2` / the Phase 3.7 codex fix loop, whose brief tells you to "apply the verified security findings" in write mode), you ARE the fixer: APPLY the minimal remediation directly with the Edit/Write tools, then re-verify (lint/tsc/build as instructed). Security fixes are owned by you — never deferred to a coder — because the auth/permission/RLS/multi-tenant-isolation invariants live in YOUR system prompt, not the coder's. Stay within the files the brief's ownership map allows; if a fix needs a file outside that scope, report it as residual rather than expanding scope.
|
|
53
54
|
- Be extremely critical, thorough, and skeptical. Optimize for correctness and security, not politeness.
|
|
54
55
|
- Do NOT assume the developer did things safely unless proven by code evidence.
|
|
55
56
|
- Treat ALL external input as hostile.
|
package/framework/.claude/skills/new/SKILL.md
CHANGED
|
@@ -291,7 +291,7 @@ per-card in the D.x sub-steps (never aggregated). Load it when Pre-flight selects
|
|
|
291
291
|
### Sequential mode (default for small batches)
|
|
292
292
|
|
|
293
293
|
- Cards execute one at a time through the full per-card pipeline (Phases 1-5).
|
|
294
|
-
- Code review and doc review for the same card run as **parallel read-only audits**, then fixes are applied by domain owner: **doc findings → `doc-reviewer` (write mode)**, code/
|
|
294
|
+
- Code review and doc review for the same card run as **parallel read-only audits**, then fixes are applied by domain owner (see § "Domain-Override Domains"): **doc findings → `doc-reviewer` (write mode)**, **security findings → `security-reviewer` (write mode)**, code/perf/migration findings → `coder`. (Sequential Phase 3 is even simpler — doc-reviewer runs alone, so it audits AND applies in one invocation.)
|
|
295
295
|
- This mode is unchanged from the original behavior.
|
|
296
296
|
|
|
297
297
|
### Team mode (for complex batches)
|
|
@@ -541,11 +541,13 @@ Enumerated exhaustively:
|
|
|
541
541
|
| Domain | Owning agent | Match rule |
|
|
542
542
|
|---|---|---|
|
|
543
543
|
| `doc` | **`doc-reviewer`** (write mode) | File path matching `*.md` under `${paths.references_dir}`, `${paths.prd_dir}`, project root `CHANGELOG.md`, or any `ssot-registry.md`. |
|
|
544
|
-
| `security` |
|
|
544
|
+
| `security` | **`security-reviewer`** (write mode) | File path matching any entry in `paths.high_risk_modules` (`baldart.config.yml`) — the same auth/permission/payment-class paths the Phase 3.7 Step A detector reads. Also any SQL migration whose content matches `CREATE POLICY|ALTER POLICY|DROP POLICY` (RLS policy mutations). If `paths.high_risk_modules` is absent, the security match rule emits a one-line diagnostic and matches nothing (no hardcoded default). |
|
|
545
545
|
| `migration` | `coder` | File path matching `${paths.migrations_dir}/*.sql` if `paths.migrations_dir` is defined in `baldart.config.yml`; otherwise the project's migrations dir per convention (`migrations/`, `db/migrate/`, `supabase/migrations/`, `prisma/migrations/`). |
|
|
546
546
|
|
|
547
547
|
**Why `doc` is owned by `doc-reviewer`, not `coder` (since v3.40.0)** — the doc invariants the orchestrator must not break (freshness markers, linking protocol, frontmatter standard, tabular formatting, SSOT/registry coverage, dependency-topological order, SCIP/code refs) are encoded in the **`doc-reviewer`** system prompt, NOT the coder's. The coder is a code-oriented agent that lacks the doc-invariant contract — routing doc fixes to it is the wrong agent doing work the auditing agent already has full context for. The agent that *audits* the docs is also the agent that *fixes* them (`doc-reviewer.md` § Constraints: "WRITE missing docs directly. You are fully responsible — do not defer to other agents"). NEVER route a `doc`-domain fix to `coder`.
|
|
548
548
|
|
|
549
|
+
**Why `security` is owned by `security-reviewer`, not `coder` (since v4.36.0)**: the same logic as `doc`, applied to the security domain (canonical writer map v4.26.1; user principle "only coder writes code, only security-reviewer writes security fixes"). The auth/permission/RLS/multi-tenant-isolation invariants live in the **`security-reviewer`** system prompt, not the coder's; a coder applying a one-line RLS or permission fix without that contract is the same class of error as the "wrong agent for the card". `security-reviewer` is the writer for security-domain fixes: it audits AND applies. `migration` stays `coder` (SQL authoring is the coder's lane; a migration's security-policy content matching the RLS rule above is classified `security`, not `migration`). NEVER route a `security`-domain fix to `coder`.
|
|
550
|
+
|
|
549
551
|
**Edge case explicit** — a mechanical append-a-row update to `CHANGELOG.md` or `ssot-registry.md` is still classified `doc` and still goes through `doc-reviewer`, never inline and never `coder`. The uniformity of the rule matters more than the cost of the individual spawn.
|
|
550
552
|
|
|
551
553
|
Domains NOT listed here remain governed by the per-phase rules of the corresponding phase (e.g. `simplify-*` follows Phase 2.55 inline rule).
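The match rules in the Domain-Override table can be approximated in code as below. This is a sketch under stated assumptions, not the skill's actual matcher: `classifyDomain` is a hypothetical name, `paths.high_risk_modules` entries are treated as plain path prefixes (the table does not say whether they are globs), directory rules are prefix checks, and an absent `paths.high_risk_modules` is read literally as "the whole security rule matches nothing".

```javascript
const RLS_POLICY = /CREATE POLICY|ALTER POLICY|DROP POLICY/;
const CONVENTIONAL_MIGRATION_DIRS = [
  'migrations/', 'db/migrate/', 'supabase/migrations/', 'prisma/migrations/',
];

// True when `path` lies under directory `dir` (false if `dir` is unset).
const under = (path, dir) =>
  Boolean(dir) && path.startsWith(dir.endsWith('/') ? dir : dir + '/');

// file: { path, content? }; paths: the `paths` section of baldart.config.yml.
// Returns 'security', 'migration', 'doc', or null for non-override files.
function classifyDomain(file, paths = {}, warn = console.warn) {
  const { path, content = '' } = file;
  const migrationDirs = paths.migrations_dir
    ? [paths.migrations_dir]
    : CONVENTIONAL_MIGRATION_DIRS;
  const isMigration =
    path.endsWith('.sql') && migrationDirs.some((d) => under(path, d));

  const highRisk = paths.high_risk_modules;
  if (Array.isArray(highRisk)) {
    if (highRisk.some((entry) => path.startsWith(entry))) return 'security';
    // RLS policy mutations are security, even inside a migration file.
    if (isMigration && RLS_POLICY.test(content)) return 'security';
  } else {
    warn('paths.high_risk_modules is not set: security rule matches nothing');
  }
  if (isMigration) return 'migration';

  const isDoc = path.endsWith('.md') && (
    under(path, paths.references_dir) ||
    under(path, paths.prd_dir) ||
    path === 'CHANGELOG.md' ||
    path.split('/').pop() === 'ssot-registry.md'
  );
  return isDoc ? 'doc' : null;
}
```

Checking `security` before `migration` reflects the rationale text above: a migration whose content mutates an RLS policy is classified `security`, not `migration`.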
|
package/framework/.claude/skills/new/references/codex-gate.md
CHANGED
|
@@ -128,9 +128,9 @@ For EVERY card (no conditional skip — the gate ALWAYS runs; only its DEPTH var
|
|
|
128
128
|
|
|
129
129
|
4. **Apply fix sub-loop** (mirror of Phase 3.5 retry pattern):
|
|
130
130
|
- If 0 BLOCKER and 0 HIGH → log `verdict: PASS — proceeding to Phase 4` in tracker. Done. (MEDIUM/LOW findings are advisory at this per-card gate; they are not silently lost — the post-batch **Final-review FULL gate** applies every VERIFIED finding ≥ MEDIUM. Log the MEDIUM count in the tracker so it is visible.)
|
|
131
|
-
- If 1+ BLOCKER OR 1+ HIGH → spawn
|
|
131
|
+
- If 1+ BLOCKER OR 1+ HIGH → spawn the **domain writer** with the report path + list of VERIFIED bugs (canonical writer map v4.26.1 — see SKILL.md § "Domain-Override Domains"): **`security`-domain findings** (touching `paths.high_risk_modules` or RLS-policy SQL — the same `security` match rule) → **`security-reviewer`** in write mode (it owns the security-invariant contract a coder lacks; never route a security fix to `coder`); **all other findings** (`correctness`/code/perf/`other`) → **`coder`**. Run security-reviewer first, then coder (skip either if its partition is empty). **At `full` profile** the report contains Codex-suggested inline patches: pass them and have the coder **apply the suggested patches** with the right system prompt (project conventions, naming, testing patterns) — it does NOT re-do the analysis or re-grep (since v3.28.3), BUT it MUST first confirm each patch still applies against the current file state (prior fix-loop iterations may have shifted line offsets); if a patch no longer applies cleanly, the coder re-locates the target by content and applies the equivalent edit rather than a stale-offset verbatim paste. **At `light` profile** (since v4.18.0) the findings come from **Codex** (the sole finder) — the report carries Codex's `minimal_fix_direction`; brief the coder to apply it (treat it like the `full`-profile Codex fix direction). **On the Codex-unavailable fallback** the `light` findings come from `code-reviewer` instead — brief the coder to apply the `code-reviewer` fix direction (no Codex patches to paste). After coder fixes, **re-write the lean contract `/tmp/codexreview-lean-<CARD-ID>.json` (it is consumed-once and deleted by `/codexreview`)** and re-invoke `/codexreview` via the Skill tool with `args: <CARD-ID>` (NOT a bare prose mention — the card ID MUST be passed so the retry reviews THIS card, not an inferred one). Repeat **max 2 times**.
|
|
132
132
|
- If still BLOCKER/HIGH after 2 retries → log in `## Issues & Flags` and **ask the user** whether to proceed, escalate, or stop. The Phase 4 commit MUST NOT happen until the Pre-Merge Codex Review verdict is PASS or user explicitly overrides.
|
|
133
|
-
- **Telemetry** — for EVERY codex finding processed (verified BLOCKER, verified HIGH, or false-positive-filtered), append one row to `## Fix Application Log`: `3.7 | codex-<security|correctness|other> | est_lines=<n> | decision=<coder|skipped> | applied_by=<coder|-> | severity=<BLOCKER|HIGH|FALSE-POSITIVE> | retry=<n>`. Classify domain: `security` for findings touching RLS / auth / permissions / payments; `correctness` for logic / data integrity / race conditions; `other` for everything else.
|
|
133
|
+
- **Telemetry** — for EVERY codex finding processed (verified BLOCKER, verified HIGH, or false-positive-filtered), append one row to `## Fix Application Log`: `3.7 | codex-<security|correctness|other> | est_lines=<n> | decision=<security-reviewer|coder|skipped> | applied_by=<security-reviewer|coder|-> | severity=<BLOCKER|HIGH|FALSE-POSITIVE> | retry=<n>`. (`security`-domain fixes are applied by `security-reviewer`, all others by `coder`.) Classify domain: `security` for findings touching RLS / auth / permissions / payments; `correctness` for logic / data integrity / race conditions; `other` for everything else.
|
|
134
134
|
|
|
135
135
|
5. **Update tracker**: phase = `3.7-codexgate DONE` (the gate runs unconditionally for every card — the legacy `3.7-highrisk` name implied it only fired on high-risk cards, which is no longer true), log final verdict, retry count, list of fixed findings, and the report path.
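The telemetry row format from step 4 can be illustrated with a small formatter. `fixLogRow` and its parameter names are hypothetical, and mapping a `FALSE-POSITIVE` severity to `decision=skipped` / `applied_by=-` is an inference made for this sketch, not something the step states outright.

```javascript
// Build one `## Fix Application Log` row for a processed Codex finding.
// domain: 'security' | 'correctness' | 'other'
// severity: 'BLOCKER' | 'HIGH' | 'FALSE-POSITIVE'
function fixLogRow({ domain, estLines, severity, retry }) {
  const skipped = severity === 'FALSE-POSITIVE'; // assumed mapping
  // security-domain fixes go to security-reviewer, all others to coder.
  const writer = domain === 'security' ? 'security-reviewer' : 'coder';
  return [
    '3.7',
    `codex-${domain}`,
    `est_lines=${estLines}`,
    `decision=${skipped ? 'skipped' : writer}`,
    `applied_by=${skipped ? '-' : writer}`,
    `severity=${severity}`,
    `retry=${retry}`,
  ].join(' | ');
}
```

Deriving `decision` and `applied_by` from the same `writer` value keeps the two columns from disagreeing about which agent owned the fix.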
|
|
136
136
|
|
package/framework/.claude/skills/new/references/final-review.md
CHANGED
|
@@ -220,9 +220,9 @@ that is a **gate violation**: log it as
|
|
|
220
220
|
10. **Persist verified findings** to `/tmp/batch-final-review-<FIRST-CARD-ID>.md`.
|
|
221
221
|
11. **Merge-blocking gate (mirrors the per-card Phase 3.7 gate this final pass backstops):** if any VERIFIED **BLOCKER or HIGH** finding exists, it MUST be resolved before Phase 6 merge. Apply fixes by **domain owner** (since v3.40.0 — same Domain-Override routing as the per-card phases), then re-verify; if a BLOCKER/HIGH cannot be resolved in a single apply + one retry, log it in `## Issues & Flags` and invoke `AskUserQuestion` (override with reason / escalate to a follow-up card / halt) — do NOT proceed to Phase 6 with an unresolved BLOCKER or HIGH. VERIFIED findings of severity MEDIUM are also applied (advisory below that). Partition the verified findings by the **Domain-Override match rules** ("Domain-Override Domains"):
|
|
222
222
|
- **`doc`-domain findings** (file path matching the `doc` match rule — `*.md` under `${paths.references_dir}`/`${paths.prd_dir}`, `CHANGELOG.md`, `ssot-registry.md`) → invoke the **doc-reviewer** agent once in write mode to apply them. NEVER route doc fixes to coder.
|
|
223
|
-
- **`security`-domain findings** (path in `paths.high_risk_modules`, or RLS-policy SQL)
|
|
223
|
+
- **`security`-domain findings** (path in `paths.high_risk_modules`, or RLS-policy SQL) → route to **security-reviewer** in write mode (canonical writer map v4.26.1 — it owns the security-invariant contract a coder lacks; NEVER route security fixes to coder). **`migration`-domain findings** (SQL under the migrations dir) → route to **coder**. For both, apply the Sub-agent failure protocol's STOP-on-crash rule (never inline-fallback on a security/migration fix). These are NOT collapsed into a generic "everything else" bucket.
|
|
224
224
|
- **All remaining findings** (other code, perf, test) → invoke the **coder** agent once to apply them in a single pass.
|
|
225
|
-
Run in the order doc-reviewer → coder (
|
|
225
|
+
Run in the order doc-reviewer → security-reviewer → coder (skip any whose partition is empty). Pass only the verified findings, not false positives.
|
|
226
226
|
12. Run final build: `npm run lint && npx tsc --noEmit && npm run build` (redirect each to `/tmp/final-<gate>.txt` per § "Context economy"; surface only exit code + bounded extract on failure).
|
|
227
227
|
If any check fails, apply self-healing retry loop (up to 3 times).
|
|
228
228
|
13. **Update tracker** with final review results:
|
package/framework/.claude/skills/new/references/review-cycle.md
CHANGED
|
@@ -51,8 +51,10 @@ so it surfaces in telemetry.
|
|
|
51
51
|
```
|
|
52
52
|
|
|
53
53
|
The workflow runs Simplify + Codex (agent-launched, code-reviewer fallback) + qa-sentinel + security,
|
|
54
|
-
FP-checks each specialist's own findings, then **
|
|
55
|
-
|
|
54
|
+
FP-checks each specialist's own findings, then the **domain writer applies its VERIFIED findings**
|
|
55
|
+
(canonical writer map v4.26.1: `security` → `security-reviewer`; `code`/`perf`/`migration`/`test`/
|
|
56
|
+
`simplify` → `coder`) — security-reviewer pass first, then the coder pass — and re-verifies
|
|
57
|
+
lint/tsc/build. It returns
|
|
56
58
|
`{ codexEngine, perCard: { <CARD-ID>: { fixesApplied, residual } }, gateTable, summary }`.
|
|
57
59
|
**Skip the inline Phase 2.55 + Phase 3.5 below AND the Phase 3.7 gate in `codex-gate.md`** (all three
|
|
58
60
|
are now done), then handle the workflow output HERE in the skill. **Process each `residual` finding by
|
|
@@ -61,7 +63,9 @@ so it surfaces in telemetry.
|
|
|
61
63
|
- `classification == NEEDS_MANUAL_CONFIRMATION` (any domain) → `AskUserQuestion` — the human gate the
|
|
62
64
|
workflow cannot run. (`summary.needsManual` counts these, doc included.)
|
|
63
65
|
- else `domain == doc` residual → carry into **Phase 3** (the doc-reviewer runs there, post-E2E, on final code).
|
|
64
|
-
- else `
|
|
66
|
+
- else `security` residual (a fix not converged in 2 retries) → spawn a targeted `security-reviewer`
|
|
67
|
+
now over this card's `editableFiles` (it owns the security-invariant contract — never a coder).
|
|
68
|
+
- else `code`/`perf`/`migration` residual (a fix the coder could not converge in its 2 retries)
|
|
65
69
|
→ spawn a targeted `coder` now over this card's `editableFiles`.
|
|
66
70
|
- **QA gate (BLOCKING — mirror of inline Phase 3.5 step 24)**: if `gateTable` has any `status:"FAIL"`
|
|
67
71
|
**OR** `summary.checksFailed` is true, the merge gate is NOT satisfied. Spawn a `coder` on the
|
|
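The residual-handling order in the hunk above (classification first, then domain) can be summarized as a single routing function. This is an illustrative sketch: `routeResidual` and the returned labels are hypothetical, and only the branches spelled out in the hunk are covered.

```javascript
// Decide who handles one `residual` finding returned by the workflow.
function routeResidual(finding) {
  // The human gate wins regardless of domain.
  if (finding.classification === 'NEEDS_MANUAL_CONFIRMATION') return 'AskUserQuestion';
  // Doc residuals wait for Phase 3, where doc-reviewer runs on final code.
  if (finding.domain === 'doc') return 'phase-3-doc-reviewer';
  // Unconverged security fixes stay with security-reviewer, never a coder.
  if (finding.domain === 'security') return 'security-reviewer';
  if (['code', 'perf', 'migration'].includes(finding.domain)) return 'coder';
  return null; // not covered by the rules shown in this hunk
}
```

Testing the classification before the domain matters: a `security` finding that needs manual confirmation goes to the user, not to a targeted `security-reviewer` run.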
@@ -107,7 +111,7 @@ After completeness is verified, clean up the implementation before it reaches re
|
|
|
107
111
|
- **Efficiency agent** — flag unnecessary work (redundant computations, duplicate API calls, N+1), missed concurrency, hot-path bloat, recurring no-op updates without change-detection guards, TOCTOU existence checks, memory issues (unbounded structures, missing cleanup), overly broad operations.
|
|
108
112
|
|
|
109
113
|
4. Aggregate findings from all three agents. For each finding:
|
|
110
|
-
- **Valid AND in a Domain-Override domain** (the finding's target file matches the `doc`, `security`, or `migration` match rule in "Domain-Override Domains") → do NOT apply inline. Delegate to the domain
|
|
114
|
+
- **Valid AND in a Domain-Override domain** (the finding's target file matches the `doc`, `security`, or `migration` match rule in "Domain-Override Domains") → do NOT apply inline. Delegate to the domain **writer** (canonical writer map v4.26.1): `doc` → `doc-reviewer` (write mode), `security` → `security-reviewer` (write mode — it owns the security-invariant contract a coder lacks), `migration` → `coder`. Even a one-line efficiency fix in `paths.high_risk_modules` (security) or a migration file goes to the owning agent — the orchestrator lacks that domain's invariant contract.
|
|
111
115
|
- **Valid AND not in a Domain-Override domain** → fix directly (apply edits inline).
|
|
112
116
|
- **False positive / not worth addressing** → skip, BUT record it (see telemetry). If the skip rests on a "covered by X" / "redundant" / "not needed" rationalization (the same family the AC-Closure Gate guards against), do NOT discard silently — verify the rationale by reading `X`, and if it does not hold, treat the finding as valid.
|
|
113
117
|
|
|
@@ -279,9 +283,9 @@ skill's Phase 1 falls back to deriving Gherkin scenarios from
|
|
|
279
283
|
per-card Phase 3.7 gate now skips that duplicate (lean mode), so THIS pass MUST carry it.
|
|
280
284
|
A doc-drift→bug finding whose root cause is in CODE (not the doc) is the ONE thing
|
|
281
285
|
doc-reviewer does NOT fix itself — report it with the conflicting code location + the doc
|
|
282
|
-
it violates, and the orchestrator routes it to the `security
|
|
286
|
+
it violates, and the orchestrator routes it to the `security` (→ security-reviewer) / code (→ coder) fix path as appropriate.
|
|
283
287
|
```
|
|
284
|
-
Doc-reviewer applies all doc-domain fixes itself. The orchestrator does NOT spawn a coder for doc fixes (since v3.40.0 — `doc` is owned by `doc-reviewer`, see "Domain-Override Domains"). The only doc-reviewer output that leaves this phase unfixed is a **doc-drift→bug finding rooted in CODE** (the implementation contradicts a documented contract). Route it explicitly: if the conflicting code file matches the `security` Domain-Override match rule (`paths.high_risk_modules`) → spawn `
|
|
288
|
+
Doc-reviewer applies all doc-domain fixes itself. The orchestrator does NOT spawn a coder for doc fixes (since v3.40.0 — `doc` is owned by `doc-reviewer`, see "Domain-Override Domains"). The only doc-reviewer output that leaves this phase unfixed is a **doc-drift→bug finding rooted in CODE** (the implementation contradicts a documented contract). Route it explicitly: if the conflicting code file matches the `security` Domain-Override match rule (`paths.high_risk_modules`) → spawn `security-reviewer` with the finding now, in this phase (a security-class code fix is not deferrable to a `light` Phase 3.7, and security is owned by `security-reviewer` — never a coder); otherwise carry the finding into the Phase 3.7 `/codexreview` input as a known code-drift bug and let the Phase 3.7 fix sub-loop apply it. Either way, append a Fix Application Log row with `domain=codex-correctness` (NOT `doc`) so telemetry attributes it as a code fix. Do NOT leave it accumulating in the tracker with no fix owner.
|
|
285
289
|
14. **Knowledge-corpus sync (OPTIONAL — only if the project ships a corpus-sync agent)**: There is NO shipped `obsidian-sync` agent — do NOT dispatch one (a hard dispatch to a non-existent subagent fails silently). Only when the project provides its own knowledge-corpus sync agent (declared in `.baldart/overlays/new.md`) AND doc-reviewer's findings indicate a corpus impact, invoke that agent with the listed paths after the doc fixes are applied. Otherwise skip with a one-line notice (`knowledge-corpus sync: skipped (no corpus-sync agent configured)`). Non-blocking either way.
|
|
286
290
|
15. **Telemetry** — after doc-reviewer returns, append one row per doc finding to `## Fix Application Log`: `3 | doc | est_lines=<n> | decision=doc-reviewer | applied_by=doc-reviewer | finding=<1-line>`. If 0 findings, append one row: `3 | doc | est_lines=0 | decision=skipped | applied_by=- | reason=no-findings`. **Phase-8 producer (named counter)** — ALSO record the per-card doc-gap counts as a structured line in `## Current Card` (carried into `## Completed Cards` at Phase 5): `doc_gaps: found=<N> fixed=<M>` where `N` = total doc findings doc-reviewer raised and `M` = those it applied. This is the single named producer for Phase 8's `doc_gaps_found` / `doc_gaps_fixed` fields — without it those fields have no upstream write and Phase 8 would hard-code zeros. (D.4a is the team-mode producer of the same counter — see Phase 7 § D.4a.)
|
|
287
291
|
16. Run `npm run lint` and `npx tsc --noEmit` (when `stack.language` includes typescript) to verify nothing broke (redirect to disk per § "Context economy"). If doc-reviewer touched any source-adjacent file (a `.ts`/`.tsx` helper, a co-located doc export), also run `npm run build`. If any check fails, apply the self-healing retry loop (up to 3 times, no user prompt). **If still failing after 3 retries**: do NOT fall through silently to Phase 3.5 — log `[DOC-PHASE-REGRESSION]` in `## Issues & Flags` and invoke `AskUserQuestion` (revert the doc-phase edits that broke the build / keep and fix manually / stop the card).
|
package/framework/.claude/skills/new/references/team-mode.md
CHANGED
|
@@ -184,13 +184,15 @@ After ALL agents in the group complete successfully:
|
|
|
184
184
|
}})
|
|
185
185
|
```
|
|
186
186
|
The workflow fans out the finders per card, runs ONE Codex pass + ONE qa-sentinel (group max tier)
|
|
187
|
-
over the union, and **
|
|
188
|
-
|
|
189
|
-
`
|
|
187
|
+
over the union, and the **domain writer applies all VERIFIED fixes for the whole group** (canonical
|
|
188
|
+
writer map v4.26.1: `security` → `security-reviewer`, then `code`/`perf`/`migration`/`test`/`simplify`
|
|
189
|
+
→ `coder`; the two passes run sequentially over disjoint-by-ownership files → no conflict, same as D.3).
|
|
190
|
+
It returns `{ codexEngine, perCard, gateTable, summary }`. **Skip the inline D.2 (code portion), D.3, D.3b,
|
|
190
191
|
D.4, D.4b** below. Then per card handle `perCard[<id>].residual` exactly as the sequential gate does
|
|
191
192
|
(`references/review-cycle.md` § Phase 2.5x — **by classification first**: `NEEDS_MANUAL_CONFIRMATION`
|
|
192
|
-
any-domain → `AskUserQuestion`; else doc residual → the post-E2E doc step; else unconverged
|
|
193
|
-
code/perf
|
|
193
|
+
any-domain → `AskUserQuestion`; else doc residual → the post-E2E doc step; else unconverged `security`
|
|
194
|
+
residual → targeted `security-reviewer`; else unconverged code/perf residual → targeted `coder`).
|
|
195
|
+
Apply the **same BLOCKING QA-gate consumption**:
|
|
194
196
|
`gateTable` with any `status:"FAIL"` OR `summary.checksFailed` → coder fix (≤2 retries) then
|
|
195
197
|
`AskUserQuestion`; **D.5 commit MUST NOT happen until `gateTable` is PASS/SKIP and `checksFailed` is
|
|
196
198
|
false** (a delegated QA FAIL blocks exactly as inline D.4 / Phase 3.5 would — `gateTable` is
|
|
@@ -28,7 +28,11 @@ export const meta = {
|
|
|
28
28
|
// gateTable, summary }
|
|
29
29
|
// ───────────────────────────────────────────────────────────────────────────
|
|
30
30
|
|
|
31
|
-
|
|
31
|
+
// Tolerate args delivered as a JSON string (parse-or-default) — the Workflow tool
|
|
32
|
+
// sometimes serializes a structured `args` object to a string; without this guard
|
|
33
|
+
// `a.cards` is undefined → empty `cards` → degenerate no-op return (cards:0, 0 agents).
|
|
34
|
+
let a = args || {}
|
|
35
|
+
if (typeof a === 'string') { try { a = JSON.parse(a) || {} } catch (_) { a = {} } } // `|| {}`: JSON.parse('null') yields null, which would throw on `a.cards`
|
|
32
36
|
const cards = (Array.isArray(a.cards) ? a.cards : []).filter((c) => c && c.cardId)
|
|
33
37
|
const cfg = a.config || {}
|
|
34
38
|
const highRisk = (cfg.paths && cfg.paths.high_risk_modules) || [] // security-domain hint
|
|
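The parse-or-default guard in this hunk can be exercised in isolation. The sketch below is a hedged restatement, not the workflow's code: `normalizeArgs` and `cardsOf` are illustrative names, and the sample inputs are made up. It also treats any parsed non-object (for example the JSON text `null`) as empty, which is an extra precaution added for the example.

```javascript
// Illustrative restatement of the parse-or-default guard.
function normalizeArgs(args) {
  let a = args || {}
  if (typeof a === 'string') {
    try { a = JSON.parse(a) } catch (_) { a = {} }
  }
  // Extra precaution (assumption of this sketch): JSON.parse may return null,
  // a number, or an array; treat all of those as "no args".
  if (!a || typeof a !== 'object' || Array.isArray(a)) a = {}
  return a
}

// Same card filter as the workflow: keep only entries that carry a cardId.
function cardsOf(args) {
  const a = normalizeArgs(args)
  return (Array.isArray(a.cards) ? a.cards : []).filter((c) => c && c.cardId)
}

console.log(cardsOf({ cards: [{ cardId: 'C-1' }] }).length)
console.log(cardsOf('{"cards":[{"cardId":"C-1"},{}]}').length)
console.log(cardsOf('not json').length)
console.log(cardsOf('null').length)
```

The point of the guard is that a stringified `args` still yields the same card list as the structured object, instead of silently degrading to an empty run.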
@@ -298,59 +302,83 @@ const surviving = classified
|
|
|
298
302
|
.map((f) => ({ ...f, card: attributeCard(f, fileToCard, cards) }))
|
|
299
303
|
|
|
300
304
|
// ───────────────────────────────────────────────────────────────────────────
|
|
301
|
-
// Phase Fix —
|
|
302
|
-
//
|
|
303
|
-
//
|
|
305
|
+
// Phase Fix — the DOMAIN WRITER applies its verified findings (canonical writer
|
|
306
|
+
// map v4.26.1): security → security-reviewer (owns the security-invariant
|
|
307
|
+
// contract — never folded into the coder pass); code/perf/migration/test/simplify
|
|
308
|
+
// → coder. doc findings → residual (the skill runs doc-reviewer post-E2E on final
|
|
309
|
+
// code). NEEDS_MANUAL_CONFIRMATION → residual (human gate, owned by the skill).
|
|
304
310
|
// ───────────────────────────────────────────────────────────────────────────
|
|
305
311
|
phase('Fix')
|
|
306
312
|
const isDoc = (f) => /doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase())
|
|
313
|
+
// 'security' domain → security-reviewer. migration STAYS coder (canonical writer map: code/perf/
|
|
314
|
+
// migration/test → coder), so match the exact 'security' domain, not the broader verifier regex.
|
|
315
|
+
const isSecurity = (f) => String(f.domain).toLowerCase() === 'security'
|
|
307
316
|
const isManual = (f) => f.classification === 'NEEDS_MANUAL_CONFIRMATION'
|
|
308
317
|
// Partition `surviving` (= VERIFIED + NEEDS_MANUAL; FALSE_POSITIVE already dropped) with NO overlap:
|
|
309
|
-
//
|
|
318
|
+
// securityFix = VERIFIED security → security-reviewer applies (it owns the security invariants).
|
|
319
|
+
// actionable = VERIFIED non-doc non-security → the coder fixes these.
|
|
310
320
|
// docResidual = VERIFIED doc → the skill runs doc-reviewer post-E2E on final code.
|
|
311
321
|
// manualResidual= NEEDS_MANUAL any → human gate, owned by the skill (a doc-manual must NOT be
|
|
312
322
|
// silently auto-re-reviewed: it carries its needs-manual classification out).
|
|
313
|
-
const
|
|
323
|
+
const securityFix = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f) && isSecurity(f))
|
|
324
|
+
const actionable = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f) && !isSecurity(f))
|
|
314
325
|
const docResidual = surviving.filter((f) => f.classification === 'VERIFIED' && isDoc(f))
|
|
315
326
|
const manualResidual = surviving.filter(isManual)
|
|
316
327
|
|
|
317
328
|
const SKIP_CHECKS = { lint: 'SKIP', tsc: 'SKIP', build: 'SKIP' }
|
|
318
|
-
|
|
319
|
-
|
|
329
|
+
|
|
330
|
+
// One fix pass: the domain WRITER applies its verified findings to the worktree, then re-verifies.
|
|
331
|
+
// Passes run SEQUENTIALLY (security-reviewer before coder) so edits on shared files never conflict
|
|
332
|
+
// without having to partition the ownership map; the last pass to run carries the build it verified.
|
|
333
|
+
async function applyFixPass(findings, writer, label, role) {
|
|
334
|
+
if (!findings.length) return { applied: [], unresolved: [], checks: { ...SKIP_CHECKS }, ran: false }
|
|
335
|
+
if (!unionEditable.length) {
|
|
336
|
+
log(`Fix: ${findings.length} ${label} finding(s) but no editable files in scope — returned as residual (${writer} skipped).`)
|
|
337
|
+
return { applied: [], unresolved: findings.map((f) => f.finding_id), checks: { ...SKIP_CHECKS }, ran: false }
|
|
338
|
+
}
|
|
320
339
|
const fixBrief =
|
|
321
|
-
`Apply ALL of the verified review findings below to the worktree, then verify the build. You are the
|
|
340
|
+
`Apply ALL of the verified ${role} review findings below to the worktree, then verify the build. You are the ${writer} fix pass for this wave.\n\n` +
|
|
322
341
|
`Worktree: ${a.worktreePath || '(cwd)'} — cd into it.\n` +
|
|
323
342
|
`You MAY edit ONLY these files (ownership map — touching anything else is a violation):\n${unionEditable.join('\n')}\n\n` +
|
|
324
|
-
`Findings to fix (
|
|
325
|
-
|
|
343
|
+
`Findings to fix (fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
|
|
344
|
+
findings.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n evidence: ${f.evidence}\n direction: ${f.minimal_fix_direction}`).join('\n') +
|
|
326
345
|
`\n\nAfter applying: run \`npm run lint\` and (when the project uses typescript) \`npx tsc --noEmit\` and \`npm run build\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
|
|
327
346
|
`Do NOT commit. Do NOT git stash (refs/stash is shared across worktrees). ` +
|
|
328
347
|
`Return: applied (finding_ids you fixed), unresolved (finding_ids you could NOT fix within the allowed files / 2 retries), and checks (PASS/FAIL/SKIP for lint, tsc, build).`
|
|
329
|
-
const r = await agent(fixBrief, { label
|
|
330
|
-
// Normalize: the
|
|
331
|
-
|
|
332
|
-
if (!Array.isArray(
|
|
333
|
-
if (!Array.isArray(
|
|
334
|
-
if (!
|
|
335
|
-
|
|
336
|
-
}
|
|
337
|
-
|
|
338
|
-
// (no wasted coder spawn — the skill will route them to a targeted coder with a proper ownership scope).
|
|
339
|
-
fixResult = { applied: [], unresolved: actionable.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
|
|
340
|
-
log(`Fix: ${actionable.length} actionable finding(s) but no editable files in scope — returned as residual (coder skipped).`)
|
|
341
|
-
} else {
|
|
342
|
-
log('Fix: no actionable code/perf/security/simplify findings — coder skipped.')
|
|
348
|
+
const r = await agent(fixBrief, { label, phase: 'Fix', agentType: writer, schema: FIX_SCHEMA })
|
|
349
|
+
// Normalize: the agent may die (null) or return a truthy object missing fields.
|
|
350
|
+
const res = (r && typeof r === 'object') ? r : { applied: [], unresolved: findings.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
|
|
351
|
+
if (!Array.isArray(res.applied)) res.applied = []
|
|
352
|
+
if (!Array.isArray(res.unresolved)) res.unresolved = []
|
|
353
|
+
if (!res.checks || typeof res.checks !== 'object') res.checks = { ...SKIP_CHECKS }
|
|
354
|
+
res.ran = true
|
|
355
|
+
log(`Fix: ${writer} applied ${res.applied.length}/${findings.length} ${label} finding(s); checks lint=${res.checks.lint} tsc=${res.checks.tsc} build=${res.checks.build}.`)
|
|
356
|
+
return res
|
|
343
357
|
}
|
|
344
358
|
|
|
359
|
+
// Security writer FIRST (owns the security-invariant contract), then the coder. Sequential → no
|
|
360
|
+
// edit conflict on shared files; the coder pass (when it runs) carries the authoritative build.
|
|
361
|
+
const secResult = await applyFixPass(securityFix, 'security-reviewer', 'fix-security', 'security')
|
|
362
|
+
const codeFixResult = await applyFixPass(actionable, 'coder', 'fix-coder', 'code/perf/simplify')
|
|
363
|
+
if (!securityFix.length && !actionable.length) log('Fix: no actionable code/perf/security/simplify findings — fixers skipped.')
|
|
364
|
+
|
|
365
|
+
// Merge the two passes. A FAIL in EITHER pass fails the wave; PASS only when a pass actually ran it.
|
|
366
|
+
const fixPasses = [secResult, codeFixResult]
|
|
367
|
+
const allActionable = [...securityFix, ...actionable]
|
|
368
|
+
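const appliedIds = new Set(fixPasses.flatMap((p) => (p.applied || []).map((x) => (x && typeof x === 'object' ? x.finding_id : x)))) // accept bare finding_ids (as the fix brief requests) or { finding_id } objects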
|
|
369
|
+
const unresolvedIds = new Set(fixPasses.flatMap((p) => p.unresolved || []))
|
|
370
|
+
const ranChecks = fixPasses.filter((p) => p.ran).map((p) => p.checks)
|
|
371
|
+
const mergedChecks = ['lint', 'tsc', 'build'].reduce((acc, k) => {
|
|
372
|
+
acc[k] = ranChecks.some((c) => c[k] === 'FAIL') ? 'FAIL' : (ranChecks.some((c) => c[k] === 'PASS') ? 'PASS' : 'SKIP')
|
|
373
|
+
return acc
|
|
374
|
+
}, {})
|
|
345
375
|
// Unfixed actionable findings become residual (human/coder follow-up owned by the skill).
|
|
346
|
-
const
|
|
347
|
-
const
|
|
348
|
-
const codeResidual = actionable.filter((f) => !appliedIds.has(f.finding_id) || unresolvedIds.has(f.finding_id))
|
|
349
|
-
const checksFailed = ['lint', 'tsc', 'build'].some((k) => fixResult.checks && fixResult.checks[k] === 'FAIL')
|
|
376
|
+
const codeResidual = allActionable.filter((f) => !appliedIds.has(f.finding_id) || unresolvedIds.has(f.finding_id))
|
|
377
|
+
const checksFailed = ['lint', 'tsc', 'build'].some((k) => mergedChecks[k] === 'FAIL')
|
|
350
378
|
|
|
351
379
|
// ---- Assemble per-card result ----------------------------------------------
|
|
352
380
|
function bucket(cardId) { return perCard[cardId] || (perCard[cardId] = { fixesApplied: [], residual: [] }) }
|
|
353
|
-
for (const f of
|
|
381
|
+
for (const f of allActionable) {
|
|
354
382
|
if (appliedIds.has(f.finding_id) && !unresolvedIds.has(f.finding_id)) {
|
|
355
383
|
bucket(f.card || cards[0].cardId).fixesApplied.push(`[${f.finding_id}] ${f.title}`)
|
|
356
384
|
}
|
|
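The partition and the check-merge rule in this hunk can be illustrated on made-up data. This is a sketch under stated assumptions: the findings and pass results below are invented, and `mergeChecks` is an illustrative wrapper around the same reduce logic rather than a function the workflow exports.

```javascript
const SKIP_CHECKS = { lint: 'SKIP', tsc: 'SKIP', build: 'SKIP' }
const isDoc = (f) => /doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase())
const isSecurity = (f) => String(f.domain).toLowerCase() === 'security'
const isVerified = (f) => f.classification === 'VERIFIED'

// Invented findings, only to show which bucket each one lands in.
const surviving = [
  { finding_id: 'F1', domain: 'security', classification: 'VERIFIED' },
  { finding_id: 'F2', domain: 'perf', classification: 'VERIFIED' },
  { finding_id: 'F3', domain: 'docs', classification: 'VERIFIED' },
  { finding_id: 'F4', domain: 'security', classification: 'NEEDS_MANUAL_CONFIRMATION' },
]
const buckets = {
  securityFix: surviving.filter((f) => isVerified(f) && !isDoc(f) && isSecurity(f)),
  actionable: surviving.filter((f) => isVerified(f) && !isDoc(f) && !isSecurity(f)),
  docResidual: surviving.filter((f) => isVerified(f) && isDoc(f)),
  manualResidual: surviving.filter((f) => f.classification === 'NEEDS_MANUAL_CONFIRMATION'),
}

// Merge rule: a FAIL in any pass that ran wins; PASS requires at least one
// pass that actually ran the check; otherwise the check stays SKIP.
function mergeChecks(passes) {
  const ran = passes.filter((p) => p.ran).map((p) => p.checks)
  const merged = {}
  for (const k of ['lint', 'tsc', 'build']) {
    merged[k] = ran.some((c) => c[k] === 'FAIL') ? 'FAIL'
      : ran.some((c) => c[k] === 'PASS') ? 'PASS' : 'SKIP'
  }
  return merged
}

const merged = mergeChecks([
  { ran: true, checks: { lint: 'PASS', tsc: 'PASS', build: 'SKIP' } }, // security pass
  { ran: true, checks: { lint: 'PASS', tsc: 'FAIL', build: 'PASS' } }, // coder pass
  { ran: false, checks: { ...SKIP_CHECKS } },                          // pass with nothing to do
])
console.log(Object.fromEntries(
  Object.entries(buckets).map(([k, v]) => [k, v.map((f) => f.finding_id)])))
console.log(merged)
```

Note that a needs-manual security finding lands only in `manualResidual`: the three VERIFIED buckets never see it, which is what keeps the partition free of overlap.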
@@ -24,7 +24,11 @@ export const meta = {
|
|
|
24
24
|
// { codexEngine, findings:[…classified, FALSE_POSITIVE dropped], gateTable, summary }
|
|
25
25
|
// ───────────────────────────────────────────────────────────────────────────
|
|
26
26
|
|
|
27
|
-
|
|
27
|
+
// Tolerate args delivered as a JSON string (parse-or-default) — the Workflow tool
|
|
28
|
+
// sometimes serializes a structured `args` object to a string; without this guard
|
|
29
|
+
// `a.reviewScopeFiles`/`a.cardPaths` are undefined → empty scope → degenerate no-op return.
|
|
30
|
+
let a = args || {}
|
|
31
|
+
if (typeof a === 'string') { try { a = JSON.parse(a) || {} } catch (_) { a = {} } } // `|| {}`: JSON.parse('null') yields null, which would throw on `a.reviewScopeFiles`
|
|
28
32
|
const scope = Array.isArray(a.reviewScopeFiles) ? a.reviewScopeFiles : []
|
|
29
33
|
const cards = Array.isArray(a.cardPaths) ? a.cardPaths : []
|
|
30
34
|
const cfg = a.config || {}
|
package/package.json
CHANGED
package/src/commands/doctor.js
CHANGED
|
@@ -33,6 +33,7 @@ const Hooks = require('../utils/hooks');
|
|
|
33
33
|
const GitHooks = require('../utils/githooks');
|
|
34
34
|
const LspInstaller = require('../utils/lsp-installer');
|
|
35
35
|
const GraphifyInstaller = require('../utils/graphify-installer');
|
|
36
|
+
const CodexOrphans = require('../utils/codex-orphans');
|
|
36
37
|
const UpdateNotifier = require('../utils/update-notifier');
|
|
37
38
|
const cliPackageJson = require('../../package.json');
|
|
38
39
|
|
|
@@ -388,6 +389,23 @@ async function detectState(cwd, opts = {}) {
|
|
|
388
389
|
}
|
|
389
390
|
}
|
|
390
391
|
} catch (_) { /* never block doctor on graph probe */ }
|
|
392
|
+
|
|
393
|
+
// ---- Orphaned MCP servers from Codex calls (since v4.37.0) ---------
|
|
394
|
+
// BALDART's Codex finder calls (/new, new2, /codexreview, cron engine)
|
|
395
|
+
// drive `codex app-server` via the companion plugin. That broker spawns the
|
|
396
|
+
// MCP servers from ~/.codex/config.toml (Playwright, …) as children and,
|
|
397
|
+
// being detached, leaks them to init (ppid 1) when it dies — they keep
|
|
398
|
+
// running (an @playwright/mcp can peg a core for days). Surface the orphans
|
|
399
|
+
// so the planner can offer a safe reap (orphaned MCP servers only; never the
|
|
400
|
+
// broker — see codex-orphans.js for the ppid-1 safety invariant). Fully
|
|
401
|
+
// fail-safe: any error → no orphans reported.
|
|
402
|
+
state.mcpOrphans = [];
|
|
403
|
+
state.codexRuntimeOrphans = [];
|
|
404
|
+
try {
|
|
405
|
+
const { mcp, runtime } = CodexOrphans.detectOrphans();
|
|
406
|
+
state.mcpOrphans = mcp;
|
|
407
|
+
state.codexRuntimeOrphans = runtime;
|
|
408
|
+
} catch (_) { /* never block doctor on the process probe */ }
|
|
391
409
|
}
|
|
392
410
|
|
|
393
411
|
return state;
|
|
@@ -781,6 +799,37 @@ function planActions(state) {
|
|
|
781
799
|
});
|
|
782
800
|
}
|
|
783
801
|
|
|
802
|
+
// Orphaned MCP servers from Codex calls (since v4.37.0). BALDART's Codex
|
|
803
|
+
// finder calls leave behind MCP-server processes (Playwright, obsidian-mcp, …)
|
|
804
|
+
// reparented to init when their `codex app-server` broker dies. They keep
|
|
805
|
+
// burning CPU. Offer a safe reap — orphaned MCP servers only (ppid 1 ⇒ parent
|
|
806
|
+
// dead ⇒ stdio broken ⇒ dead weight). The action is NOT autoOk: killing
|
|
807
|
+
// processes warrants explicit intent.
|
|
808
|
+
if (state.mcpOrphans && state.mcpOrphans.length > 0) {
|
|
809
|
+
const n = state.mcpOrphans.length;
|
|
810
|
+
actions.push({
|
|
811
|
+
key: 'reap-mcp-orphans',
|
|
812
|
+
label: `Reap ${n} orphaned MCP server process(es) left by Codex`,
|
|
813
|
+
why: `${n} MCP server(s) are orphaned (ppid 1 — their parent Codex session/broker is dead) and still running. They cannot be reconnected to (their stdio pipe is broken) and waste CPU. Reaping kills each process tree directly via syscall. The codex app-server broker itself is never touched.`,
|
|
814
|
+
autoOk: false, // kills processes — require explicit intent
|
|
815
|
+
run: async () => {
|
|
816
|
+
const procs = CodexOrphans.listProcesses();
|
|
817
|
+
// Re-detect against a fresh snapshot so we never act on a stale list.
|
|
818
|
+
const { mcp } = CodexOrphans.detectOrphans(procs);
|
|
819
|
+
if (mcp.length === 0) {
|
|
820
|
+
UI.info('No orphaned MCP servers remain — nothing to reap.');
|
|
821
|
+
return;
|
|
822
|
+
}
|
|
823
|
+
const { killed, failed } = CodexOrphans.reapOrphans(mcp, procs);
|
|
824
|
+
if (killed.length) UI.success(`Reaped ${killed.length} orphaned process(es) (incl. descendants).`);
|
|
825
|
+
if (failed.length) {
|
|
826
|
+
UI.warning(`${failed.length} could not be killed:`);
|
|
827
|
+
failed.forEach((f) => console.log(` pid ${f.pid}: ${f.error}`));
|
|
828
|
+
}
|
|
829
|
+
},
|
|
830
|
+
});
|
|
831
|
+
}
|
|
832
|
+
|
|
784
833
|
// v3.25.0+: drift detection is authoritative via VERSION compare (isAligned).
|
|
785
834
|
// The HEAD...FETCH_HEAD commit count is subtree-merge noise and never reaches
|
|
786
835
|
// 0, so we MUST NOT use it as the "needs update" signal.
|
|
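The "re-detect against a fresh snapshot" step in the `run` callback can be sketched with injected stand-ins. Everything here is hypothetical scaffolding for illustration: `makeReapAction`, `scan`, and `kill` are not part of `doctor.js`, and no real process is touched.

```javascript
// Hypothetical factory: `scan` returns the current orphan list, `kill` reaps one pid.
// Both are injected so the sketch runs without touching real processes.
function makeReapAction(scan, kill) {
  return {
    key: 'reap-mcp-orphans',
    autoOk: false, // killing processes needs explicit intent
    run: () => {
      const fresh = scan() // re-scan at run time; never trust the plan-time list
      if (fresh.length === 0) return { killed: [], skipped: true }
      const killed = fresh.map((p) => { kill(p.pid); return p.pid })
      return { killed, skipped: false }
    },
  }
}

// Two orphans exist at plan time; pid 777 exits before the action runs.
let live = [{ pid: 501 }, { pid: 777 }]
const killedPids = []
const action = makeReapAction(() => live.slice(), (pid) => killedPids.push(pid))
live = live.filter((p) => p.pid !== 777)
const result = action.run()
console.log(result, killedPids)
```

Because the action scans again when it runs, the pid that disappeared between planning and execution is never signalled, which matters because pids can be reused by unrelated processes.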
@@ -1037,6 +1086,19 @@ function renderDiagnostic(state) {
|
|
|
1037
1086
|
console.log(statusLine('Code graph', 'disabled', 'ok'));
|
|
1038
1087
|
}
|
|
1039
1088
|
|
|
1089
|
+
// Orphaned MCP servers left by Codex calls (v4.37.0). Only shown when present
|
|
1090
|
+
// — a clean machine prints nothing here (zero noise).
|
|
1091
|
+
if (state.mcpOrphans && state.mcpOrphans.length > 0) {
|
|
1092
|
+
console.log(statusLine(
|
|
1093
|
+
'Codex MCP leak',
|
|
1094
|
+
`${state.mcpOrphans.length} orphaned MCP server(s) running; reap requires explicit confirmation`,
|
|
1095
|
+
'warn'
|
|
1096
|
+
));
|
|
1097
|
+
state.mcpOrphans.slice(0, 6).forEach((p) =>
|
|
1098
|
+
console.log(` • pid ${p.pid} (up ${p.etime}): ${p.command.slice(0, 70)}`));
|
|
1099
|
+
if (state.mcpOrphans.length > 6) console.log(` • … and ${state.mcpOrphans.length - 6} more`);
|
|
1100
|
+
}
|
|
1101
|
+
|
|
1040
1102
|
console.log();
|
|
1041
1103
|
}
|
|
1042
1104
|
|
|
@@ -0,0 +1,182 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Orphaned-MCP-server reaper (since v4.37.0).
|
|
3
|
+
*
|
|
4
|
+
* WHY THIS EXISTS
|
|
5
|
+
* ---------------
|
|
6
|
+
* BALDART's Codex integration (the `/new`, `new2`, `/codexreview` finder calls
|
|
7
|
+
* and the cron review engine) drives the OpenAI Codex CLI through the
|
|
8
|
+
* `codex-companion.mjs` plugin. That companion attaches to a SHARED, persistent
|
|
9
|
+
* `codex app-server` broker which is spawned `detached + unref'd`
|
|
10
|
+
* (broker-lifecycle.mjs) and which, in turn, spawns every MCP server declared in
|
|
11
|
+
* the user's `~/.codex/config.toml` (Playwright, Figma, …) as its own children.
|
|
12
|
+
*
|
|
13
|
+
* When that broker eventually dies, its MCP children are NOT reaped: the OS
|
|
14
|
+
* reparents them to init (ppid 1) and they keep running — an `@playwright/mcp`
|
|
15
|
+
* server can sit at ~100% CPU for days. Over many Codex sessions these
|
|
16
|
+
* accumulate (the symptom that motivated this utility: ~45 orphaned Playwright
|
|
17
|
+
* MCP processes pegging the machine).
|
|
18
|
+
*
|
|
19
|
+
* SAFETY INVARIANT (read before touching the matchers)
|
|
20
|
+
* ----------------------------------------------------
|
|
21
|
+
* We reap a process ONLY when BOTH hold:
|
|
22
|
+
* 1. ppid === 1 → the process was reparented to init, i.e. its controlling
|
|
23
|
+
* parent is DEAD. An MCP server is a stdio child of whatever launched it;
|
|
24
|
+
* once that parent dies the stdio pipe is broken and the server is dead
|
|
25
|
+
* weight that can never be reconnected to. Reaping it is safe.
|
|
26
|
+
* 2. the command matches a known MCP-server signature (below).
|
|
27
|
+
*
|
|
28
|
+
* We deliberately DO NOT reap the `codex app-server` broker itself. The broker
|
|
29
|
+
* is `detached + unref'd` BY DESIGN, so a perfectly healthy, in-use shared
|
|
30
|
+
* runtime ALSO shows ppid 1 — ppid 1 cannot distinguish a leaked broker from a
|
|
31
|
+
* live one. Killing it could interrupt an in-flight Codex turn. We only report
|
|
32
|
+
* broker processes for visibility; we never auto-kill them.
|
|
33
|
+
*
|
|
34
|
+
* We use Node's `process.kill(pid)` (a direct syscall) rather than shelling out
|
|
35
|
+
* to `kill` — some sandboxed shells silently swallow multi-arg `kill`/for-loops,
|
|
36
|
+
* and the syscall path is immune to that.
|
|
37
|
+
*
|
|
38
|
+
* Fully fail-safe: any internal error degrades to "no orphans found" / "nothing
|
|
39
|
+
* reaped". This is hygiene, never a blocker.
|
|
40
|
+
*/
|
|
41
|
+
|
|
42
|
+
const { execSync } = require('child_process');
|
|
43
|
+
|
|
44
|
+
// Command signatures that identify an MCP server. When such a process is
|
|
45
|
+
// orphaned (ppid 1) it is safe to reap (its stdio parent is gone).
|
|
46
|
+
const MCP_SIGNATURES = [
|
|
47
|
+
/@playwright\/mcp/,
|
|
48
|
+
/\bplaywright-mcp\b/,
|
|
49
|
+
/@modelcontextprotocol\//,
|
|
50
|
+
/-mcp-server\b/,
|
|
51
|
+
/\bmcp-server\b/,
|
|
52
|
+
/\bobsidian-mcp/,
|
|
53
|
+
/[\w@/.-]+-mcp@/, // npx-launched `<pkg>-mcp@<version>`
|
|
54
|
+
];
|
|
55
|
+
|
|
56
|
+
// Codex runtime processes — DETECTED for visibility, never auto-reaped (see the
|
|
57
|
+
// safety note above: a detached broker at ppid 1 may still be the live runtime).
|
|
58
|
+
const CODEX_RUNTIME_SIGNATURES = [
|
|
59
|
+
/codex\s+app-server/,
|
|
60
|
+
/codex-companion\.mjs/,
|
|
61
|
+
];
|
|
62
|
+
|
|
63
|
+
function matchesAny(signatures, command) {
|
|
64
|
+
return signatures.some((re) => re.test(command));
|
|
65
|
+
}
|
|
66
|
+
|
|
67
|
+
/**
|
|
68
|
+
* Snapshot every process as { pid, ppid, etime, command }.
|
|
69
|
+
* `ps -axo` works on both macOS and Linux. Returns [] on any failure or on
|
|
70
|
+
* Windows (the orphan-reparent-to-init leak is a POSIX phenomenon).
|
|
71
|
+
*/
|
|
72
|
+
function listProcesses() {
|
|
73
|
+
if (process.platform === 'win32') return [];
|
|
74
|
+
let raw;
|
|
75
|
+
try {
|
|
76
|
+
raw = execSync('ps -axo pid=,ppid=,etime=,command=', {
|
|
77
|
+
encoding: 'utf8',
|
|
78
|
+
maxBuffer: 16 * 1024 * 1024,
|
|
79
|
+
timeout: 5000,
|
|
80
|
+
});
|
|
81
|
+
} catch (_) {
|
|
82
|
+
return [];
|
|
83
|
+
}
|
|
84
|
+
const procs = [];
|
|
85
|
+
for (const line of raw.split('\n')) {
|
|
86
|
+
const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
|
|
87
|
+
if (!m) continue;
|
|
88
|
+
procs.push({
|
|
89
|
+
pid: Number(m[1]),
|
|
90
|
+
ppid: Number(m[2]),
|
|
91
|
+
etime: m[3],
|
|
92
|
+
command: m[4],
|
|
93
|
+
});
|
|
94
|
+
}
|
|
95
|
+
return procs;
|
|
96
|
+
}
|
|
97
|
+
|
|
98
|
+
/**
|
|
99
|
+
* Detect orphaned MCP servers (reapable) and Codex runtime processes (info only).
|
|
100
|
+
*
|
|
101
|
+
* @returns {{ mcp: Array, runtime: Array }}
|
|
102
|
+
* mcp — orphaned MCP servers (ppid 1 + MCP signature) safe to reap
|
|
103
|
+
* runtime — codex app-server / companion processes (reported, NOT reaped)
|
|
104
|
+
*/
|
|
105
|
+
function detectOrphans(procs = listProcesses()) {
|
|
106
|
+
const self = process.pid;
|
|
107
|
+
const mcp = [];
|
|
108
|
+
const runtime = [];
|
|
109
|
+
for (const p of procs) {
|
|
110
|
+
if (p.pid === self) continue;
|
|
111
|
+
if (p.ppid !== 1) continue; // only true orphans — parent is dead
|
|
112
|
+
if (matchesAny(MCP_SIGNATURES, p.command)) mcp.push(p);
|
|
113
|
+
else if (matchesAny(CODEX_RUNTIME_SIGNATURES, p.command)) runtime.push(p);
|
|
114
|
+
}
|
|
115
|
+
return { mcp, runtime };
|
|
116
|
+
}
|
|
117
|
+
|
|
118
|
+
/**
|
|
119
|
+
* Collect a pid plus all of its descendants (so killing an orphaned MCP server
|
|
120
|
+
* also takes down the browser/worker subprocesses it spawned).
|
|
121
|
+
*/
|
|
122
|
+
function collectTree(rootPid, procs) {
|
|
123
|
+
const childrenOf = new Map();
|
|
124
|
+
for (const p of procs) {
|
|
125
|
+
if (!childrenOf.has(p.ppid)) childrenOf.set(p.ppid, []);
|
|
126
|
+
childrenOf.get(p.ppid).push(p.pid);
|
|
127
|
+
}
|
|
128
|
+
const tree = [];
|
|
129
|
+
const seen = new Set();
|
|
130
|
+
const stack = [rootPid];
|
|
131
|
+
while (stack.length) {
|
|
132
|
+
const pid = stack.pop();
|
|
133
|
+
if (seen.has(pid)) continue;
|
|
134
|
+
seen.add(pid);
|
|
135
|
+
tree.push(pid);
|
|
136
|
+
for (const child of childrenOf.get(pid) || []) stack.push(child);
|
|
137
|
+
}
|
|
138
|
+
return tree;
|
|
139
|
+
}
|
|
140
|
+
|
|
141
|
+
/**
|
|
142
|
+
* Reap the given orphaned MCP-server processes (and their descendant trees).
|
|
143
|
+
* Uses process.kill(pid, 'SIGKILL') per-pid — immune to shells that swallow
|
|
144
|
+
* multi-arg kills. Never throws.
|
|
145
|
+
*
|
|
146
|
+
* @param {Array} orphans the `mcp` array from detectOrphans()
|
|
147
|
+
* @param {Array} procs full process snapshot (for descendant resolution)
|
|
148
|
+
* @returns {{ killed: number[], failed: Array<{pid:number,error:string}> }}
|
|
149
|
+
*/
|
|
150
|
+
function reapOrphans(orphans = [], procs = listProcesses()) {
|
|
151
|
+
const self = process.pid;
|
|
152
|
+
const targets = new Set();
|
|
153
|
+
for (const o of orphans) {
|
|
154
|
+
for (const pid of collectTree(o.pid, procs)) {
|
|
155
|
+
if (pid !== self && Number.isInteger(pid) && pid > 1) targets.add(pid);
|
|
156
|
+
}
|
|
157
|
+
}
|
|
158
|
+
const killed = [];
|
|
159
|
+
const failed = [];
|
|
160
|
+
// No depth ordering (descendants before roots) is needed here: SIGKILL cannot
|
|
161
|
+
// be caught or ignored, so a single pass over the target set suffices.
|
|
162
|
+
for (const pid of targets) {
|
|
163
|
+
try {
|
|
164
|
+
process.kill(pid, 'SIGKILL');
|
|
165
|
+
killed.push(pid);
|
|
166
|
+
} catch (err) {
|
|
167
|
+
// ESRCH = already gone (e.g. died with its parent tree) → treat as success.
|
|
168
|
+
if (err && err.code === 'ESRCH') killed.push(pid);
|
|
169
|
+
else failed.push({ pid, error: (err && err.message) || String(err) });
|
|
170
|
+
}
|
|
171
|
+
}
|
|
172
|
+
return { killed, failed };
|
|
173
|
+
}
|
|
174
|
+
|
|
175
|
+
module.exports = {
|
|
176
|
+
MCP_SIGNATURES,
|
|
177
|
+
CODEX_RUNTIME_SIGNATURES,
|
|
178
|
+
listProcesses,
|
|
179
|
+
detectOrphans,
|
|
180
|
+
collectTree,
|
|
181
|
+
reapOrphans,
|
|
182
|
+
};
|
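The two-part safety invariant (ppid 1 plus an MCP signature) and the descendant walk can be checked as a dry run against a synthetic snapshot. This is a minimal sketch, not the module itself: every pid, path, and command line below is invented, the signature lists are shortened, and nothing is killed.

```javascript
// Synthetic `ps -axo pid=,ppid=,etime=,command=` output; all values are made up.
const raw = [
  '  501     1 3-02:11:40 node /opt/npx/@playwright/mcp/cli.js',
  '  502   501    02:11:39 /opt/chromium/chrome --headless',
  '  610     1    10:00:00 codex app-server',
  '  700   650       05:00 node /opt/npx/@playwright/mcp/cli.js',
  '  650     1    01:00:00 /bin/zsh',
].join('\n')

// Shortened signature lists for the example.
const MCP = [/@playwright\/mcp/, /\bmcp-server\b/]
const RUNTIME = [/codex\s+app-server/]

// Same line shape as listProcesses(): pid, ppid, etime, then the full command.
const procs = raw.split('\n').flatMap((line) => {
  const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/)
  return m ? [{ pid: Number(m[1]), ppid: Number(m[2]), etime: m[3], command: m[4] }] : []
})

// Only ppid-1 processes are considered. MCP matches are reapable; the broker is report-only.
const orphans = procs.filter((p) => p.ppid === 1)
const mcp = orphans.filter((p) => MCP.some((re) => re.test(p.command)))
const runtime = orphans.filter((p) => !mcp.includes(p) && RUNTIME.some((re) => re.test(p.command)))

// Descendant walk, so the browser spawned by pid 501 joins the kill set.
function collectTree(rootPid, all) {
  const tree = []
  const seen = new Set()
  const stack = [rootPid]
  while (stack.length) {
    const pid = stack.pop()
    if (seen.has(pid)) continue
    seen.add(pid)
    tree.push(pid)
    for (const p of all) if (p.ppid === pid) stack.push(p.pid)
  }
  return tree
}

const targets = mcp.flatMap((p) => collectTree(p.pid, procs))
console.log({ mcp: mcp.map((p) => p.pid), runtime: runtime.map((p) => p.pid), targets })
```

In this snapshot pid 700 runs the same Playwright MCP command but still has a live parent (pid 650), so it is left alone, and the `codex app-server` broker at pid 610 is only reported.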