baldart 4.33.2 → 4.34.1

package/CHANGELOG.md CHANGED
@@ -5,6 +5,34 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.34.1] - 2026-06-12
9
+
10
+ **`reference-integrity` CI gate: recognize the plumbing carve-out.** v4.33.2 introduced a legitimate `subagent_type: general-purpose` dispatch for the mechanical file-revert agent (the REGISTRY.md plumbing carve-out — mechanical git/file ops, never code authoring), but the CI `reference-integrity` gate's R11 anti-fallback rule banned `general-purpose` **unconditionally**, so `main` went red on the v4.33.2 / v4.34.0 tags (the npm publish workflow is independent and succeeded — both versions are live). The gate now allows `subagent_type: general-purpose` **only** on a dispatch line explicitly marked `plumbing carve-out`; every other `general-purpose` dispatch stays banned (R11 intact, intentionally narrow so it can't be abused as a generic fallback). **PATCH** (CI gate fix; no behavior change to `/new`).
11
+
12
+ ### Changed
13
+
14
+ - **`scripts/check-reference-integrity.js`** — Check A allows `subagent_type: general-purpose` when the dispatch line carries the literal `plumbing carve-out` marker; otherwise the R11 ban stands (with an error message pointing to the marker + REGISTRY.md).
15
+
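As a rough illustration of the Check A rule in this entry, the sketch below flags `subagent_type: general-purpose` on any line that lacks the literal `plumbing carve-out` marker. The function name `findR11Violations`, the regex, and the sample lines are assumptions made for illustration; the actual `scripts/check-reference-integrity.js` is not shown in this diff and may be structured differently.

```javascript
// Hypothetical sketch of the R11 check; not the real script.
const BANNED_DISPATCH = /subagent_type:\s*general-purpose/
const CARVE_OUT_MARKER = 'plumbing carve-out'

// Returns one violation per line that dispatches general-purpose without the marker.
function findR11Violations(text) {
  const violations = []
  text.split('\n').forEach((line, index) => {
    if (BANNED_DISPATCH.test(line) && !line.includes(CARVE_OUT_MARKER)) {
      violations.push({
        line: index + 1,
        message:
          'R11: general-purpose dispatch is banned unless the line is marked "plumbing carve-out" (see REGISTRY.md)',
      })
    }
  })
  return violations
}

// Sample dispatch lines (invented for this example).
const sample = [
  'Spawn revert agent (subagent_type: general-purpose, plumbing carve-out)',
  'Spawn reviewer (subagent_type: general-purpose)',
  'Spawn coder (subagent_type: coder)',
].join('\n')

// Only line 2 is flagged: it uses general-purpose without the marker.
console.log(findR11Violations(sample).map((v) => v.line))
```

Matching on the same line keeps the exception narrow: a marker elsewhere in the file does not unlock a `general-purpose` dispatch.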
16
+ ## [4.34.0] - 2026-06-12
17
+
18
+ **`/new` per-card review cluster extracted into the `new-card-review` dynamic workflow — kills the dominant context-cost driver on long epics.** On a long epic (10-12 cards) `/new` accumulated context monotonically: every card ran the full review fan-out (Simplify ×3 + Codex/code-review + qa + the verify/FP pass) *inside the orchestrator*, and each step re-paid the whole growing prefix (cumulative `cache_read`). The driver is **turn-count × prefix**, not subagent output volume. Following the validated `new-final-review` pattern (read-only fan-out hosted in a workflow whose context is discarded), the review cluster now runs OUTSIDE the orchestrator and returns only a compact result.
19
+
20
+ A **single workflow parametrized on the number of cards** covers both modes: sequential delegates `cards:[1]`; team-mode delegates `cards:[the whole wave]`, so it runs **once per wave, not once per card** — the key win, since the team-mode per-card Codex (D.4b) + Simplify (D.3b) loops were the chattiest sub-steps. The workflow fans out the finders per card (Simplify + cross-model Codex with `code-reviewer` fallback + qa-sentinel at group-max tier + security-reviewer on high-risk only), each specialist FP-checking its own findings; then **one `coder` applies all VERIFIED code/perf/security/simplify findings in a single pass** (files disjoint by ownership) and re-verifies lint/tsc/build. Only `{perCard:{fixesApplied,residual}, gateTable, summary}` re-enters the orchestrator — the minority `residual` (doc, needs-manual, scope-expanding, unconverged) is resolved by the skill with the right specialist or a user gate.
21
+
22
+ **Boundaries (by design):** E2E (Phase 2.6 / D.3c — human-gated + nests a skill) and doc-review (Phase 3 / D.2+D.4a — write-mode, must see final code) stay in the skill; doc runs **after E2E on final code** in the delegated path. `/codexreview` (Phase 3.7 / D.4b — a skill that nests sub-agents) cannot run inside a workflow, so the workflow launches the Codex **binary** directly via an agent (haiku preflight + background poll), exactly like `new-final-review`; the **Final FULL gate** remains the cross-card depth net. api-perf stays deferred to the Final. **Opt-in + additive**: delegates only when the `Workflow` tool is available AND the script is linked, else the inline prose (SSOT) runs unchanged — Codex/cross-tool consumers are unaffected. **MINOR** (new capability, fallback inline, no breaking change; no `baldart.config.yml` key ⇒ schema-propagation rule N/A). Validation deferred to telemetry on the first real run (per `/new` convention).
23
+
24
+ ### Added
25
+
26
+ - **`framework/.claude/workflows/new-card-review.js`** — the per-wave review+fix workflow (1..N cards). Modeled on `new-final-review.js` (shared schemas, deterministic Codex preflight haiku + background poll, specialist-owned FP-check) + Simplify/security finders + the single-coder batch-fix pass.
27
+
28
+ ### Changed
29
+
30
+ - **`framework/.claude/skills/new/references/review-cycle.md`** — new "Phase 2.5x — Review-cluster workflow delegation gate" (replica of final-review.md F.1.5): delegates Phase 2.55+3.5+3.7 to `new-card-review` (`cards:[this card]`), consumes the typed `residual`, then proceeds E2E→Doc→Commit; else inline SSOT. `IS_TRIVIAL` never delegates.
31
+ - **`framework/.claude/skills/new/references/team-mode.md`** — new "D.1.6 — Review-cluster workflow delegation gate": delegates the group's D.3b+D.4+D.4b in ONE call per wave (`cards:[the wave]`), runs D.3a (AC-closure) before it and D.3c (E2E) + post-E2E doc + D.5/D.6 after; else inline.
32
+ - **`framework/.claude/skills/new/references/codex-gate.md`** — Phase 3.7 carve-out: SKIP when the cluster was delegated (the workflow already ran the Codex pass).
33
+ - **`framework/.claude/skills/new/SKILL.md`** — routing table notes the `new-card-review` delegation for the 2.55+3.5+3.7 cluster (parity with the final-review row).
34
+ - **`framework/docs/WORKFLOWS.md`** — documents the `new-card-review` workflow.
35
+
8
36
  ## [4.33.2] - 2026-06-12
9
37
 
10
38
  **`/new` runs its two purely-mechanical spawns on haiku instead of opus.** Audit of every agent `/new` spawns (not `new2`) found that the spawns inherit each agent's frontmatter default and none ever override `model:` — correct for the reasoning agents (`coder`/`ui-expert` = opus; the review/analysis agents = sonnet), but the **background worktree-setup subagent** that runs `/nw` is a `general-purpose` agent with no `model:`, so it inherited the **session model (opus)** to do pure git plumbing (create worktree, install deps, allocate port, write registry). Same shape for the **file-scoped revert agent** (was spawned as `coder`/opus to do a mechanical revert). Both now run on **haiku** — cheaper and faster on the background barrier, with no quality loss because neither does any reasoning or code authoring. `qa-sentinel` deliberately stays on sonnet (failure interpretation + test-tier selection). `/mw` is invoked inline by the orchestrator (not a subagent) so it is unaffected. No `baldart.config.yml` key involved (model selection is native) ⇒ schema-change propagation rule does not apply. **PATCH** (cost/perf, no new capability, no breaking change).
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.33.2
1
+ 4.34.1
@@ -271,8 +271,8 @@ navigation map.
271
271
  | 0 — Workspace Hygiene + Pre-flight | at batch start (once) | [`references/setup.md`](references/setup.md) | never (BLOCKING) |
272
272
  | 1-2 — Claim & Context + Implement | per card | [`references/implement.md`](references/implement.md) | — |
273
273
  | 2.5 / 2.5b — Completeness + AC-Closure | after the impl | [`references/completeness.md`](references/completeness.md) | — (2.5b is BLOCKING, never skipped) |
274
- | 2.55 / 2.6 / 3 / 3.5 — Simplify + E2E + Doc + QA | after AC-Closure | [`references/review-cycle.md`](references/review-cycle.md) | `IS_TRIVIAL`→skip 2.55+3.5; `has_e2e_review:false`/backend-only→skip 2.6; `light`+no-doc-diff→Doc at Final; `balanced`→QA at Final |
275
- | 3.7 — Pre-Merge Codex Review Gate | pre-commit | [`references/codex-gate.md`](references/codex-gate.md) | only `IS_TRIVIAL` (otherwise unconditional; `light`/`full` is DEPTH, not a skip) |
274
+ | 2.55 / 2.6 / 3 / 3.5 — Simplify + E2E + Doc + QA | after AC-Closure | [`references/review-cycle.md`](references/review-cycle.md) | `IS_TRIVIAL`→skip 2.55+3.5; `has_e2e_review:false`/backend-only→skip 2.6; `light`+no-doc-diff→Doc at Final; `balanced`→QA at Final. **The 2.55+3.5+3.7 discovery+fix cluster delegates to the `new-card-review` workflow if the `Workflow` tool is available (review-cycle.md § Phase 2.5x, then E2E→Doc→Commit); otherwise inline. E2E (2.6) and Doc (3) always stay in the skill.** |
275
+ | 3.7 — Pre-Merge Codex Review Gate | pre-commit | [`references/codex-gate.md`](references/codex-gate.md) | only `IS_TRIVIAL` (otherwise unconditional; `light`/`full` is DEPTH, not a skip). **Covered by the `new-card-review` workflow when the cluster is delegated (codex-gate.md carve-out).** |
276
276
  | 4-5 — Commit + Context Clean | end of card | [`references/commit.md`](references/commit.md) | — |
277
277
  | Final review F.1-F.6 | after the last card | [`references/final-review.md`](references/final-review.md) | never (FULL gate unconditional). **F.2–F.4 delegate to the `new-final-review` workflow if the `Workflow` tool is available (Step F.1.5); otherwise inline.** F.5/F.6 always in the skill. |
278
278
  | 6 / 6b / 6c — Merge & cleanup | after Final | [`references/merge-cleanup.md`](references/merge-cleanup.md) | never (BLOCKING) |
@@ -4,6 +4,16 @@
4
4
 
5
5
  ### Phase 3.7 — Pre-Merge Codex Review Gate (MANDATORY — UNCONDITIONAL)
6
6
 
7
+ > **Workflow-delegation carve-out (v4.34.0)**: if the review cluster was delegated to the
8
+ > `new-card-review` workflow (see `references/review-cycle.md` § "Phase 2.5x — Review-cluster workflow
9
+ > delegation gate"), this Phase 3.7 is **already covered** — the workflow ran the cross-model Codex pass
10
+ > (agent-launched binary, `code-reviewer` fallback) at the card's profile depth as part of the cluster,
11
+ > and the FP-validated findings were already fix-applied. **SKIP this gate**; log
12
+ > `codex-review: covered by new-card-review workflow`. This carve-out applies ONLY when the delegation
13
+ > branch was taken; on the inline fallback (no `Workflow` tool) this gate runs in full as written below.
14
+ > The unconditional **Final-review FULL gate** (Codex over the whole batch) remains the cross-card net
15
+ > in both paths.
16
+
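The three outcomes for this gate (trivial skip, delegated skip, inline run) can be sketched as a small decision helper. This is only an illustration under stated assumptions: `codexGateDecision` and its `isTrivial` / `clusterDelegated` flags are hypothetical names, and the skill itself expresses this logic as prose rather than code.

```javascript
// Hypothetical decision helper for Phase 3.7; flag names are illustrative.
function codexGateDecision({ isTrivial, clusterDelegated }) {
  // IS_TRIVIAL cards never delegate and have no code for Codex to review.
  if (isTrivial) return { run: false, reason: 'trivial' }
  // Delegated cluster: the workflow already ran the cross-model Codex pass.
  if (clusterDelegated) {
    return {
      run: false,
      reason: 'delegated',
      log: 'codex-review: covered by new-card-review workflow',
    }
  }
  // Inline fallback (no Workflow tool): the gate runs in full.
  return { run: true, reason: 'inline' }
}

console.log(codexGateDecision({ isTrivial: false, clusterDelegated: true }).log)
```

In every branch the Final-review FULL gate still runs afterwards, so a skip here narrows per-card work without removing the cross-card check.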
7
17
  **This gate runs for EVERY card with code to review before the Phase 4 commit.** A per-card `/codexreview` MUST run BEFORE the Phase 4 commit, regardless of file paths or perceived risk. The historical conditional High-Risk Path detector is preserved below — but only as a **signal-logging step** to record *which* triggers matched in the tracker. It NEVER suppresses the `/codexreview` invocation. Even if zero triggers match, `/codexreview` runs.
8
18
 
9
19
  > **Single carve-out — `IS_TRIVIAL` fast-lane (since v4.6.0)**: a card that is `IS_TRIVIAL` on its ACTUAL committed diff (`review_profile=skip` + non-source diff + 0 Step-A triggers — § "Trivial-card fast-lane") **skips this gate**, because there is no code for Codex to review. Its adversarial coverage is carried by the SAME unconditional **Final-review FULL gate** the `light` path already relies on (it runs Codex over the entire batch diff, including every trivial card's files, before merge). This is NOT a `time budget` skip — it is the deterministic `IS_TRIVIAL` gate reason, and the guard means a single source file in the diff flips the card back onto this gate. Log `codex-review: SKIPPED (trivial — non-source diff; covered by Final FULL gate)`.
@@ -2,6 +2,81 @@
2
2
 
3
3
  > **`/new` module**: run it, then return to the core § "Routing" for the next phase. The `§ "..."` sections (Context economy, Context Tracking, Trivial-card fast-lane, Risk-signal detector, Fix Application Log) live in the **core SKILL.md**.
4
4
 
5
+ ### Phase 2.5x — Review-cluster workflow delegation gate (v4.34.0 — opt-in, additive)
6
+
7
+ The per-card review cluster — **Phase 2.55 Simplify + Phase 3.5 QA + Phase 3.7 Codex** (the
8
+ *discovery* fan-out: Simplify trio, cross-model Codex/code-reviewer, qa-sentinel gates, security
9
+ review) plus the **fix application** — is a self-contained read-then-fix pass that touches no git
10
+ state outside the worktree and asks the user nothing on the happy path. That makes it a clean fit for a
11
+ **dynamic workflow** that runs the whole cluster OUTSIDE this orchestrator's context window (the single
12
+ biggest source of prefix growth on long epics). E2E (Phase 2.6, human-gated + nests a skill) and the
13
+ doc-review (Phase 3, write-mode, must see the FINAL code) stay in the skill.
14
+
15
+ **Branch (decide once, here — at the top of the review cluster, AFTER Phase 2.5b AC-Closure):**
16
+
17
+ - **IF `IS_TRIVIAL`** (§ "Trivial-card fast-lane", re-confirmed on the committed diff) → do NOT delegate.
18
+ Follow the trivial fast-lane in Phase 2.55 below (inline mechanical gates → Phase 3 doc → commit).
19
+
20
+ - **ELSE IF** the `Workflow` tool is available **AND** `.claude/workflows/new-card-review.js` is present
21
+ (linked by the framework on Claude-enabled installs) → **delegate the cluster** to it. First build the
22
+ inputs (reusing the existing deterministic logic — do NOT re-implement it):
23
+ - `qaTier` ← the **Phase 3.5 profile selection** (step 19-21b below: `review_profile` floor →
24
+ `skip`/`light` ⇒ `qaTier:"light"` (qa-sentinel deferred to Final), `deep` **or any Phase 3.7 Step-A
25
+ high-risk trigger on this card's diff** ⇒ `qaTier:"full"`). Compute the **Step-A detector**
26
+ (`references/codex-gate.md` Step A) once here — it also tells the workflow's Codex pass the depth.
27
+ - `scopeFiles` ← this card's committed diff (`git diff --name-only "$TRUNK...HEAD"`, fallback `HEAD~1..HEAD`).
28
+ - `editableFiles` ← this card's **File Ownership Map** entries (the coder's write scope; `setup.md` step 3b).
29
+ - `archBaselinePath` ← `/tmp/arch-baseline-<CARD-ID>.md` (persisted at `implement.md` step 5b).
30
+ - `hasSecurityFiles` ← any `scopeFiles` path matches `paths.high_risk_modules`.
31
+ - `runSimplify` ← `true` (a non-trivial card always simplifies).
32
+
33
+ ```
34
+ Workflow({ name: 'new-card-review', args: {
35
+ cards: [{ cardId, cardPath, scopeFiles, editableFiles, qaTier, hasSecurityFiles, runSimplify, archBaselinePath }],
36
+ worktreePath, baseBranch, config // from Phase 0 / tracker
37
+ }})
38
+ ```
39
+
40
+ The workflow runs Simplify + Codex (agent-launched, code-reviewer fallback) + qa-sentinel + security,
41
+ FP-checks each specialist's own findings, then **one coder applies all VERIFIED
42
+ code/perf/security/simplify findings in a single pass** and re-verifies lint/tsc/build. It returns
43
+ `{ codexEngine, perCard: { <CARD-ID>: { fixesApplied, residual } }, gateTable, summary }`.
44
+ **Skip the inline Phase 2.55 + Phase 3.5 below AND the Phase 3.7 gate in `codex-gate.md`** (all three
45
+ are now done), then handle the workflow output HERE in the skill. **Process each `residual` finding by
46
+ CLASSIFICATION FIRST, then domain** (so a needs-manual doc finding reaches the human gate, not the
47
+ automated doc re-review):
48
+ - `classification == NEEDS_MANUAL_CONFIRMATION` (any domain) → `AskUserQuestion` — the human gate the
49
+ workflow cannot run. (`summary.needsManual` counts these, doc included.)
50
+ - else `domain == doc` residual → carry into **Phase 3** (the doc-reviewer runs there, post-E2E, on final code).
51
+ - else `code`/`perf`/`security`/`migration` residual (a fix the coder could not converge in its 2 retries)
52
+ → spawn a targeted `coder` now over this card's `editableFiles`.
53
+ - **QA gate (BLOCKING — mirror of inline Phase 3.5 step 24)**: if `gateTable` has any `status:"FAIL"`
54
+ **OR** `summary.checksFailed` is true, the merge gate is NOT satisfied. Spawn a `coder` on the
55
+ failing gates / post-fix regressions (≤2 retries, re-run via the workflow or inline qa-sentinel),
56
+ then if still failing → `AskUserQuestion` (proceed/stop). **Do NOT proceed to Phase 4 commit until
57
+ `gateTable` is all PASS/SKIP and `checksFailed` is false** — a delegated QA FAIL must block exactly
58
+ as the inline path does. (`gateTable` is **wave-scoped**, not per-card — qa-sentinel runs once over
59
+ the worktree; a FAIL applies to the whole wave's commit.)
60
+ - **Telemetry**: append the workflow's `fixesApplied` + `residual` to `## Fix Application Log`
61
+ (`decision=workflow`). Derive the Phase-8 producers from the return: `qa_first_attempt` ←
62
+ `summary.qaRan ? (summary.failingGates.length === 0 && !summary.checksFailed ? "pass" : "fail") : "n/a"`
63
+ (n/a when qa was deferred to the Final); `doc_gaps` is produced by **Phase 3** (which still runs below).
64
+
65
+ Then proceed to **Phase 2.6 (E2E)** → **Phase 3 (doc review, relevance-gated)** → **Phase 4 (commit)**.
66
+ Log `review-cluster: delegated to new-card-review (engine: <codexEngine>, fixes: <n>, residual: <n>, qa: <pass|fail|deferred>)`.
67
+
68
+ - **ELSE** (no `Workflow` tool — older Claude Code, `disableWorkflows`, or a Codex install where
69
+ workflows have no equivalent) → run the **inline Phase 2.55 → 2.6 → 3 → 3.5 → 3.7 exactly as written
70
+ below**. This prose is the SSOT and the always-available fallback; the workflow only mirrors it.
71
+
72
+ > **Drift note (maintainers):** `new-card-review.js` encodes only the *orchestration shape* of the
73
+ > review cluster; its agent briefs cite THIS module for the review *semantics*, so the SSOT for what each
74
+ > finder checks stays here. When you change the Simplify lenses, the qa contract, the Codex/Step-A
75
+ > profile rule, or the FP-check/routing, update the workflow's phase/agent wiring to match. The
76
+ > delegated path runs the doc-review (Phase 3) on the FINAL post-fix/post-E2E code; the inline path runs
77
+ > it at its current position — both are doc-relevance-gated and both are backstopped by the Final FULL
78
+ > doc-reviewer.
79
+
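The way the skill consumes the workflow return in Phase 2.5x can be sketched as three small helpers: residual routing (classification first, then domain), the blocking QA gate, and the `qa_first_attempt` derivation. Field names (`classification`, `domain`, `status`, `checksFailed`, `qaRan`, `failingGates`) come from the prose above; the function names and the string labels they return are assumptions made for illustration.

```javascript
// Illustrative helpers only; function names and return labels are hypothetical.
const CODER_DOMAINS = ['code', 'perf', 'security', 'migration']

// Route one residual finding: classification first, then domain.
function routeResidual(finding) {
  if (finding.classification === 'NEEDS_MANUAL_CONFIRMATION') return 'AskUserQuestion'
  if (finding.domain === 'doc') return 'phase-3-doc-review'
  if (CODER_DOMAINS.includes(finding.domain)) return 'targeted-coder'
  return 'unspecified' // the module does not say how other domains are routed
}

// Blocking QA gate: commit is allowed only with no FAIL row and checksFailed false.
function qaGateSatisfied(gateTable, summary) {
  return !gateTable.some((row) => row.status === 'FAIL') && !summary.checksFailed
}

// qa_first_attempt, as derived in the Telemetry bullet.
function qaFirstAttempt(summary) {
  if (!summary.qaRan) return 'n/a' // qa was deferred to the Final
  return summary.failingGates.length === 0 && !summary.checksFailed ? 'pass' : 'fail'
}

// A needs-manual doc finding reaches the human gate, not the doc re-review.
const docNeedsManual = { domain: 'doc', classification: 'NEEDS_MANUAL_CONFIRMATION' }
console.log(routeResidual(docNeedsManual)) // AskUserQuestion
```

Checking `classification` before `domain` is what keeps a needs-manual doc finding out of the automated doc re-review.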
5
80
  ### Phase 2.55 — Simplify (code cleanup before review)
6
81
 
7
82
  > **Trivial fast-lane gate**: re-confirm `IS_TRIVIAL` on the ACTUAL committed diff (§ "Trivial-card fast-lane" — all 3 conditions, now including the real non-source diff check). If trivial → **SKIP this phase** AND Phase 3.5 QA + Phase 3.7 Codex; instead run the **inline mechanical gates** (`markdownlint` on changed `.md`, `lint` on lintable changed files, `build` as a sanity check — no qa-sentinel, no test suite), then proceed to Phase 3 (doc review, which DOES run) and Phase 4 (commit). Log `simplify/qa/codex: SKIPPED (trivial — non-source diff)`. If the actual diff turned out to contain a source file → NOT trivial → run this phase and the normal review path. AC-Closure (Phase 2.5b) already ran and is never skipped.
@@ -163,6 +163,63 @@ After ALL agents in the group complete successfully:
163
163
  - **Sub-classify `DOC_DEFER_CARDS`** (since v4.7.0, for #2 doc deferral) = cards with `review_profile == light` whose committed diff touches **NO documentation file** (no `.md`, no path under `${paths.references_dir}`, no data-model/ssot/api doc). Their per-card doc-review is deferred to the Final F.3 doc-reviewer. (Trivial cards, and any card whose diff touches a doc file, are NOT in this set — doc-review stays relevant for them.)
164
164
  - Log: `## D.1.5 Effective Profiles\n<CARD-ID>: profile=<floor> triggers=<n> diff=<source|non-source> → effective=<light|full> (<LIGHT_CARDS|FULL_CARDS>)<, TRIVIAL / DOC_DEFER if applicable>` per card. This single computation is the SSOT for D.2 (doc-reviewer scoping), D.3b/D.3c (already skipped for trivial), and D.4b (inclusion + per-card Codex profile: `light` cards → `/codexreview` `light`, `full` cards → `full`) — do NOT recompute it downstream.
165
165
 
166
+ 1.6. **D.1.6 — Review-cluster workflow delegation gate (v4.34.0 — opt-in, additive)** — The group's
167
+ code-review cluster — **D.3b Simplify + D.4 QA + D.4b Codex** (per-card discovery) **and their fix
168
+ application** — runs ONCE per wave OUTSIDE this orchestrator's context when delegated to a dynamic
169
+ workflow. This is the single biggest context-economy win in team mode (the D.4b per-card Codex loop +
170
+ the D.3b per-card Simplify fan-out are the chattiest sub-steps).
171
+
172
+ - **IF** the `Workflow` tool is available **AND** `.claude/workflows/new-card-review.js` is present →
173
+ **delegate the cluster for the whole group in ONE call.** First run **D.3a (AC-Closure Gate)** for
174
+ every card (it is a human gate and must precede the review — keep it here, before the workflow).
175
+ Then build one `cards[]` entry per **non-trivial** card in the group (skip `TRIVIAL_CARDS`):
176
+ `cardId`, `cardPath`, `scopeFiles` (the card's committed diff), `editableFiles` (its File Ownership
177
+ Map slice), `qaTier` (`full` iff the GROUP max is `deep`/risk-escalated per D.1.5 — mirrors D.4's
178
+ group predicate; else `light`), `hasSecurityFiles` (scope ∩ `paths.high_risk_modules`),
179
+ `runSimplify:true`, `archBaselinePath` (`/tmp/arch-baseline-group-<FIRST-CARD-ID>.md`).
180
+ ```
181
+ Workflow({ name: 'new-card-review', args: {
182
+ cards: [ …one entry per non-trivial card in the wave… ],
183
+ worktreePath, baseBranch, config
184
+ }})
185
+ ```
186
+ The workflow fans out the finders per card, runs ONE Codex pass + ONE qa-sentinel (group max tier)
187
+ over the union, and **one coder applies all VERIFIED code/perf/security/simplify fixes for the
188
+ whole group in a single pass** (files disjoint by ownership → no conflict, same as D.3). It returns
189
+ `{ codexEngine, perCard, gateTable, summary }`. **Skip the inline D.2 (code portion), D.3, D.3b,
190
+ D.4, D.4b** below. Then per card handle `perCard[<id>].residual` exactly as the sequential gate does
191
+ (`references/review-cycle.md` § Phase 2.5x — **by classification first**: `NEEDS_MANUAL_CONFIRMATION`
192
+ any-domain → `AskUserQuestion`; else doc residual → the post-E2E doc step; else unconverged
193
+ code/perf/security residual → targeted `coder`). Apply the **same BLOCKING QA-gate consumption**:
194
+ `gateTable` with any `status:"FAIL"` OR `summary.checksFailed` → coder fix (≤2 retries) then
195
+ `AskUserQuestion`; **D.5 commit MUST NOT happen until `gateTable` is PASS/SKIP and `checksFailed` is
196
+ false** (a delegated QA FAIL blocks exactly as inline D.4 / Phase 3.5 would — `gateTable` is
197
+ wave-scoped). Append the workflow `fixesApplied`/`residual` to `## Fix Application Log`
198
+ (`decision=workflow`, with `card=<ID>`).
199
+ - **Coverage-assertion reconciliation (BLOCKING — closes the FEAT-0006 hole on the delegated path)**:
200
+ the Step D coverage assertion (below) requires per-card `simplify:` / `code-review:` /
201
+ `codex-review:` / `qa:` rows whose inline producers (D.3b/D.4/D.4b) were just skipped. **Synthesize
202
+ them truthfully from the workflow return**, per non-trivial card: `simplify:` ← `<n> fixes via
203
+ workflow | decision=workflow` (or `clean`); `code-review:`/`codex-review:` ← `decision=workflow,
204
+ engine=<codexEngine>`; `qa:` ← `summary.qaRan ? (gateTable all PASS ? "PASS" : "FAIL→fixed/asked")
205
+ : "DEFERRED to Final FULL gate (group max ≤ balanced)"`; `qa_first_attempt` ← `summary.qaRan ?
206
+ (summary.failingGates.length===0 && !summary.checksFailed ? "pass" : "fail") : "n/a"`. `TRIVIAL_CARDS`
207
+ (excluded from `cards[]`) still get their `SKIPPED (trivial)` rows as in the inline path. A
208
+ `decision=workflow` row is a **truthful** producer — never write a placeholder for work that did
209
+ not run. `doc_gaps` is produced by the post-E2E doc step (D.4a logic, below).
210
+ Then proceed: **D.3c (E2E)** → **doc-review (audit+apply, group, relevance-gated — run it HERE,
211
+ after E2E, on the final code; this is the D.2-audit + D.4a-apply pair collapsed into one post-E2E
212
+ pass, excluding `DOC_DEFER_CARDS`)** → **D.5 (commit)** → **D.6 (backlog)**.
213
+ Log `D.1.6: review-cluster delegated to new-card-review (wave, <N> cards, engine: <codexEngine>, qa: <pass|fail|deferred>)`.
214
+
215
+ - **ELSE** (no `Workflow` tool / Codex install) → run the **inline D.2 → D.6 exactly as written
216
+ below** (SSOT fallback; unchanged order).
217
+
218
+ > **Drift note (maintainers):** when delegating, the doc-review (D.2 audit + D.4a apply) runs as a
219
+ > single post-E2E pass on final code; on the inline fallback it stays split at D.2/D.4a. Both exclude
220
+ > `DOC_DEFER_CARDS` and are backstopped by the Final FULL doc-reviewer. Keep the workflow's finder
221
+ > set / FP-check / profile rule in sync with `references/review-cycle.md` (the cluster SSOT).
222
+
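Building the wave's `cards[]` input as described in D.1.6 might look like the sketch below. The input record shape (`reviewProfile`, `riskEscalated`, `isTrivial`), the function name `buildWaveCards`, the prefix-based match against `paths.high_risk_modules`, and taking the group's first record as `<FIRST-CARD-ID>` are all assumptions for illustration; the skill computes these values from the D.1.5 partition and the File Ownership Map.

```javascript
// Hypothetical builder for the workflow's cards[] argument in team mode.
function buildWaveCards(group, highRiskModules) {
  // Group predicate: every entry gets 'full' if any card is deep or risk-escalated.
  const groupFull = group.some((c) => c.reviewProfile === 'deep' || c.riskEscalated)
  // Assumption: the baseline is keyed on the first card of the group.
  const archBaselinePath = `/tmp/arch-baseline-group-${group[0].cardId}.md`
  return group
    .filter((c) => !c.isTrivial) // TRIVIAL_CARDS are excluded from cards[]
    .map((c) => ({
      cardId: c.cardId,
      cardPath: c.cardPath,
      scopeFiles: c.scopeFiles,
      editableFiles: c.editableFiles,
      qaTier: groupFull ? 'full' : 'light',
      // Assumption: high_risk_modules entries are treated as path prefixes.
      hasSecurityFiles: c.scopeFiles.some((f) => highRiskModules.some((m) => f.startsWith(m))),
      runSimplify: true,
      archBaselinePath,
    }))
}

// Invented example wave: one light card, one trivial card, one deep card.
const exampleGroup = [
  { cardId: 'CARD-1', cardPath: 'cards/CARD-1.md', scopeFiles: ['src/auth/login.js'], editableFiles: ['src/auth/login.js'], reviewProfile: 'light', riskEscalated: false, isTrivial: false },
  { cardId: 'CARD-2', cardPath: 'cards/CARD-2.md', scopeFiles: ['README.md'], editableFiles: ['README.md'], reviewProfile: 'skip', riskEscalated: false, isTrivial: true },
  { cardId: 'CARD-3', cardPath: 'cards/CARD-3.md', scopeFiles: ['src/ui/button.js'], editableFiles: ['src/ui/button.js'], reviewProfile: 'deep', riskEscalated: false, isTrivial: false },
]

const waveCards = buildWaveCards(exampleGroup, ['src/auth/'])
console.log(waveCards.map((c) => `${c.cardId}:${c.qaTier}`))
```

The result is passed as `cards` in a single `Workflow` call for the whole wave, which is what makes the delegation run once per wave rather than once per card.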
166
223
  2. **D.2 — Combined static review (group)** — **doc-reviewer only** (since v4.18.0 — the code-reviewer pass moved to D.4b; see below):
167
224
  - **doc-reviewer — over the group MINUS `DOC_DEFER_CARDS`** (since v4.7.0). It runs **read-only here**, MUST **attribute every doc finding to a specific card** (by file → ownership map), and applies the full Phase 3 mandate INCLUDING the spec/docs-drift→bug lens (since v3.35.0). The `DOC_DEFER_CARDS` (light, no-doc diff per D.1.5) are excluded — their doc-review is deferred to the Final F.3 doc-reviewer (batch-wide). **If the group is entirely `DOC_DEFER_CARDS` → skip the D.2 doc-reviewer** and log `D.2 doc-reviewer: DEFERRED to Final FULL gate (all cards light + no-doc diff)`. D.4a consumes the per-card findings (for the non-deferred cards) and dispatches the doc-reviewer (in write mode) to apply them — no second AUDIT spawn, but the FIXES are still owned by doc-reviewer.
168
225
  - **No D.2 code-reviewer (since v4.18.0).** Previously `code-reviewer` ran here over `LIGHT_CARDS \ TRIVIAL_CARDS` as their single per-card code review (to keep Codex off light). Now that `light` runs Codex per-card (Codex finder + FP-gate at D.4b — `code-reviewer` validates there), the light cards' code review lives at **D.4b** alongside the full cards. D.2 no longer spawns a code-reviewer; log `D.2 code-reviewer: N/A (light cards code-reviewed at D.4b via Codex-light since v4.18.0)`.
@@ -224,7 +281,7 @@ Before moving to Step E, the orchestrator MUST verify the tracker contains, for
224
281
  - `code-review: <covered at D.4b (Codex-light FP-gate or Codex-full) | SKIPPED (trivial — non-source diff)>` (per-card — since v4.18.0 light cards are code-reviewed at D.4b, not D.2; the trivial SKIP is valid ONLY for `TRIVIAL_CARDS` per the D.1.5 partition)
225
282
  - `codex-review: <verdict (light — Codex finder + FP-gate) | verdict (full) | SKIPPED (trivial — non-source diff; covered by Final FULL gate)>` (per-card from D.4b — since v4.18.0 `LIGHT_CARDS` produce a Codex verdict, NOT a skip; the SKIPPED form is valid ONLY for `TRIVIAL_CARDS`)
226
283
 
227
- A missing entry means a sub-step was skipped. An entry whose value is a **`review_profile`-driven SKIP/DEFER**, an **`IS_TRIVIAL` SKIP**, or a **`/prd`-provenance SKIP** with a documented enumerated reason above (D.3b `skip` profile; D.4b/code/codex `TRIVIAL_CARDS`; doc-review `DOC_DEFER_CARDS`; QA `group max=balanced, no escalation`; api-perf `skip_api_perf_auditor`; plan-auditor `holistic_audit provenance`) is a VALID, present entry — not a violation; all defer to the unconditional Final FULL gate. What remains forbidden: a missing entry, or a skip whose reason is `time budget` / `to save tokens` / any model-invented constraint. If any entry is genuinely missing, do NOT proceed to Step E — return to the missing sub-step and execute it. Documenting "skipped per time budget" or similar is a protocol violation per the top-of-file rigidity clause.
284
+ A missing entry means a sub-step was skipped. An entry whose value is a **`review_profile`-driven SKIP/DEFER**, an **`IS_TRIVIAL` SKIP**, a **`/prd`-provenance SKIP**, or a **`decision=workflow` provenance** (the D.1.6 delegated path — the `new-card-review` workflow IS the producer of `simplify`/`code-review`/`codex-review`/`qa` for that card, synthesized truthfully from its return, never a placeholder) with a documented enumerated reason above (D.3b `skip` profile; D.4b/code/codex `TRIVIAL_CARDS`; doc-review `DOC_DEFER_CARDS`; QA `group max=balanced, no escalation`; api-perf `skip_api_perf_auditor`; plan-auditor `holistic_audit provenance`; **D.1.6 `decision=workflow`**) is a VALID, present entry — not a violation; all defer to the unconditional Final FULL gate. What remains forbidden: a missing entry, or a skip whose reason is `time budget` / `to save tokens` / any model-invented constraint. If any entry is genuinely missing, do NOT proceed to Step E — return to the missing sub-step and execute it. Documenting "skipped per time budget" or similar is a protocol violation per the top-of-file rigidity clause.
228
285
 
229
286
  #### Step E: Context purge + next group
230
287
 
@@ -0,0 +1,416 @@
1
+ export const meta = {
2
+ name: 'new-card-review',
3
+ description:
4
+ "Per-wave review+fix cluster for /new. Takes 1..N co-located cards (1 = sequential per-card; N = a team-mode group/wave) and runs the whole review fan-out OUTSIDE the orchestrator context: Simplify + cross-model Codex (agent-launched, code-reviewer fallback) + qa-sentinel gates + security-reviewer (high-risk only), each specialist FP-checking its OWN findings; then ONE coder applies all VERIFIED code/perf/security/simplify findings in a single pass (files disjoint by ownership) and re-verifies lint/tsc/build. Doc-review and api-perf are deliberately OUT (doc runs post-E2E in the skill; api-perf is deferred to the Final FULL gate). Returns a compact {perCard:{fixesApplied,residual}, gateTable, summary} — the only object that re-enters the orchestrator prefix. Maps to references/review-cycle.md (Phase 2.55+3.5+3.7) and references/team-mode.md (Step D.2-D.4b). NOT new2: no human gates (returned as residual), no AC-closure, no E2E, no merge/commit/backlog.",
5
+ phases: [
6
+ { title: 'Baseline', detail: 'architecture grounding for the wave scope' },
7
+ { title: 'Discovery', detail: 'parallel finders per card (simplify / codex / qa / security)' },
8
+ { title: 'Verify', detail: 'specialist-owned FP validation; residual routed to domain specialist' },
9
+ { title: 'Fix', detail: 'one coder applies all verified code/perf/security/simplify findings + re-verify' },
10
+ ],
11
+ }
12
+
13
+ // ───────────────────────────────────────────────────────────────────────────
14
+ // args contract — supplied by the /new skill (it owns git + tracker + human gates):
15
+ // cards [{ cardId, cardPath, scopeFiles[], editableFiles[], qaTier,
16
+ // hasSecurityFiles, runSimplify, archBaselinePath }]
17
+ // sequential → length 1; team-mode → the whole group/wave.
18
+ // scopeFiles files the card's committed diff touched (review surface)
19
+ // editableFiles ownership-map files the coder MAY write (fix scope)
20
+ // qaTier 'light' | 'full' (qa-sentinel runs only at 'full'; else deferred to Final)
21
+ // archBaselinePath /tmp/arch-baseline-<id>.md (or null)
22
+ // worktreePath string the batch worktree (agents cd into it)
23
+ // baseBranch string trunk the diff is taken against
24
+ // config object resolved baldart.config.yml (paths.* … )
25
+ //
26
+ // Return value (consumed by the skill):
27
+ // { codexEngine, perCard: { <cardId>: { fixesApplied:[…1-line], residual:[…finding] } },
28
+ // gateTable, summary }
29
+ // ───────────────────────────────────────────────────────────────────────────
30
+
31
+ const a = args || {}
32
+ const cards = (Array.isArray(a.cards) ? a.cards : []).filter((c) => c && c.cardId)
33
+ const cfg = a.config || {}
34
+ const highRisk = (cfg.paths && cfg.paths.high_risk_modules) || [] // security-domain hint
35
+ const protocolRef = '.claude/skills/new/references/review-cycle.md'
36
+
37
+ // Per-card result accumulator — built up-front (so the early-return guards can return it) and
38
+ // populated with fixesApplied/residual in the Fix phase.
39
+ const perCard = {}
40
+ for (const c of cards) perCard[c.cardId] = { fixesApplied: [], residual: [] }
41
+
42
+ if (!cards.length) {
43
+ log('new-card-review: no cards supplied — nothing to review.')
44
+ return { codexEngine: 'none', perCard, gateTable: [], summary: makeSummary({}) }
45
+ }
46
+
47
+ // Union scope across the wave (Codex + qa run once over the union; per-card attribution by file).
48
+ const unionScope = dedupe(cards.flatMap((c) => asArr(c.scopeFiles)))
49
+ const unionEditable = dedupe(cards.flatMap((c) => asArr(c.editableFiles)))
50
+ const maxQaTier = cards.some((c) => String(c.qaTier).toLowerCase() === 'full') ? 'full' : 'light'
51
+
52
+ if (!unionScope.length) {
53
+ log('new-card-review: empty review scope across all cards — nothing to review.')
54
+ return { codexEngine: 'none', perCard, gateTable: [], summary: makeSummary({}) }
55
+ }
56
+
57
+ // ---- Schemas (parity with new-final-review.js) ------------------------------
58
+ const FINDING = {
59
+ type: 'object',
60
+ required: ['finding_id', 'title', 'severity', 'confidence', 'evidence', 'minimal_fix_direction', 'domain'],
61
+ additionalProperties: false,
62
+ properties: {
63
+ finding_id: { type: 'string', description: '<CARD-ID>-F### or <agent>-F###' },
64
+ title: { type: 'string' },
65
+ severity: { enum: ['BLOCKER', 'HIGH', 'MEDIUM', 'LOW'] },
66
+ confidence: { type: 'number', description: '0-100' },
67
+ evidence: { type: 'string', description: 'exact file:line + code quote' },
68
+ minimal_fix_direction: { type: 'string' },
69
+ domain: { enum: ['doc', 'security', 'migration', 'code', 'perf', 'test'], description: 'Domain-Override routing bucket' },
70
+ },
71
+ }
72
+ const FINDINGS_SCHEMA = {
73
+ type: 'object', required: ['findings'], additionalProperties: false,
74
+ properties: { findings: { type: 'array', items: FINDING }, note: { type: 'string' } },
75
+ }
76
+ const CODEX_SCHEMA = {
77
+ type: 'object', required: ['codexAvailable', 'findings'], additionalProperties: false,
78
+ properties: {
79
+ codexAvailable: { type: 'boolean', description: 'false if CODEX_NOT_FOUND / TIMED_OUT' },
80
+ findings: { type: 'array', items: FINDING },
81
+ note: { type: 'string' },
82
+ },
83
+ }
84
+ const GATES_SCHEMA = {
85
+ type: 'object', required: ['gates'], additionalProperties: false,
86
+ properties: {
87
+ gates: {
88
+ type: 'array',
89
+ items: {
90
+ type: 'object', required: ['gate', 'status'], additionalProperties: false,
91
+ properties: { gate: { type: 'string' }, status: { enum: ['PASS', 'FAIL', 'SKIP'] }, detail: { type: 'string' } },
92
+ },
93
+ },
94
+ },
95
+ }
96
+ const VERDICT_SCHEMA = {
97
+ type: 'object', required: ['classification'], additionalProperties: false,
98
+ properties: {
99
+ classification: { enum: ['VERIFIED', 'FALSE_POSITIVE', 'NEEDS_MANUAL_CONFIRMATION'] },
100
+ rationale: { type: 'string' },
101
+ },
102
+ }
103
+ const FIX_SCHEMA = {
104
+ type: 'object', required: ['applied', 'unresolved', 'checks'], additionalProperties: false,
105
+ properties: {
106
+ applied: {
107
+ type: 'array',
108
+ items: {
109
+ type: 'object', required: ['finding_id'], additionalProperties: false,
110
+ properties: { finding_id: { type: 'string' }, note: { type: 'string' } },
111
+ },
112
+ },
113
+ unresolved: { type: 'array', items: { type: 'string', description: 'finding_id left unfixed' } },
114
+ checks: {
115
+ type: 'object', required: ['lint', 'tsc', 'build'], additionalProperties: false,
116
+ properties: { lint: { enum: ['PASS', 'FAIL', 'SKIP'] }, tsc: { enum: ['PASS', 'FAIL', 'SKIP'] }, build: { enum: ['PASS', 'FAIL', 'SKIP'] } },
117
+ },
118
+ note: { type: 'string' },
119
+ },
120
+ }
121
+
122
+ // ---- Shared brief fragments -------------------------------------------------
123
+ const waveBrief = [
124
+ `Worktree: ${a.worktreePath || '(cwd)'}`,
125
+ `Base branch for the diff: ${a.baseBranch || '(trunk)'}`,
126
+ `Cards under review (Read each YAML for acceptance_criteria + entrypoints):\n${cards.map((c) => `${c.cardId} — ${c.cardPath || '(no path)'}`).join('\n')}`,
127
+ `ALL changed files in this wave (review every one):\n${unionScope.join('\n')}`,
128
+ ].join('\n\n')
129
+
130
+ function cardScopeBrief(c) {
131
+ return `Card ${c.cardId} — files changed (review surface):\n${asArr(c.scopeFiles).join('\n') || '(none)'}`
132
+ }
133
+
134
+ // ───────────────────────────────────────────────────────────────────────────
135
+ // Phase Baseline — reuse per-card baselines when present, else architect ×1
136
+ // ───────────────────────────────────────────────────────────────────────────
137
+ phase('Baseline')
138
+ const baselinePaths = dedupe(cards.map((c) => c.archBaselinePath).filter(Boolean))
139
+ let baselineBrief
140
+ if (baselinePaths.length) {
141
+ baselineBrief = `Architecture baseline files (Read each — file paths, type signatures, patterns, high-risk paths):\n${baselinePaths.join('\n')}`
142
+ log(`Baseline: reusing ${baselinePaths.length} per-card baseline file(s).`)
143
+ } else {
144
+ const arch = await agent(
145
+ `You are grounding a post-implementation code review. Map the EXISTING architecture, critical patterns, and high-risk code paths relevant to this wave's changed files, so downstream reviewers can spot regressions.\n\n${waveBrief}\n\nReturn a concise baseline (key modules, their contracts, the regression-prone seams touched by this diff). Do not review the changes yet.`,
146
+ { label: 'arch-baseline', phase: 'Baseline', agentType: 'codebase-architect',
147
+ schema: { type: 'object', required: ['baseline'], additionalProperties: false, properties: { baseline: { type: 'string' } } } }
148
+ )
149
+ baselineBrief = `Architecture baseline (from codebase-architect):\n${(arch && arch.baseline) || '(unavailable)'}`
150
+ log('Baseline: spawned codebase-architect (no per-card baselines supplied).')
151
+ }
152
+
153
+ // ───────────────────────────────────────────────────────────────────────────
154
+ // Phase Discovery — per-card Simplify/security finders + wave-wide Codex + qa
155
+ // ───────────────────────────────────────────────────────────────────────────
156
+ phase('Discovery')
157
+
158
+ // Deterministic Codex pre-flight (parity with new-final-review.js F-040): a minimal Haiku agent runs
159
+ // ONLY the resolution glob and returns the path, so the existence decision is taken out of the review
160
+ // agent's hands. Found → Codex runs in the BACKGROUND with file-poll. Not found → code-reviewer fallback.
161
+ const PREFLIGHT_SCHEMA = {
162
+ type: 'object', required: ['codexScriptPath'], additionalProperties: false,
163
+ properties: { codexScriptPath: { type: 'string', description: 'absolute path to codex-companion.mjs, or "" if none' } },
164
+ }
165
+ let codexScriptPath = ''
166
+ try {
167
+ const pf = await agent(
168
+ `Resolve the Codex companion script path. Run EXACTLY this one Bash command and nothing else:\n` +
169
+ " ls -d ~/.claude/plugins/marketplaces/openai-codex/plugins/codex/scripts/codex-companion.mjs ~/.claude/plugins/cache/openai-codex/codex/*/scripts/codex-companion.mjs 2>/dev/null | sort -V | tail -1\n" +
170
+ `Return codexScriptPath = the trimmed stdout (a single absolute path), or "" if the command printed nothing. Do NOT run Codex, do NOT read any other file, do NOT reason about availability — just report the command's stdout.`,
171
+ { label: 'codex-preflight', phase: 'Discovery', model: 'haiku', schema: PREFLIGHT_SCHEMA }
172
+ )
173
+ codexScriptPath = (pf && typeof pf.codexScriptPath === 'string') ? pf.codexScriptPath.trim() : ''
174
+ } catch (_) { codexScriptPath = '' }
175
+ const codexResolved = /codex-companion\.mjs$/.test(codexScriptPath)
176
+ log(codexResolved
177
+ ? `Codex companion resolved deterministically: ${codexScriptPath}`
178
+ : 'Codex companion NOT found (deterministic pre-flight) — primary code review via code-reviewer fallback.')
179
+
180
+ const codexPrompt =
181
+ `Run a deep, cross-model code review over this wave's diff, following the review protocol summarized in ${protocolRef} (Phase 3.7). The code is already written and committed — find bugs, regressions, security issues, and quality problems.\n\n` +
182
+ `The Codex companion script is ALREADY CONFIRMED PRESENT at:\n ${codexScriptPath}\n` +
183
+ `Launch it in the BACKGROUND and poll for completion — do NOT run it synchronously (a sync run would hit the Bash tool timeout):\n` +
184
+ ` • REVIEW_FILE=a unique /tmp file (e.g. /tmp/codexreview-wave-${cards[0].cardId}-$$.md)\n` +
185
+ ` • node "${codexScriptPath}" task "<your review instructions>" > "$REVIEW_FILE" 2>&1 — run this with run_in_background:true (the launching call returns immediately).\n` +
186
+ ` • Then POLL $REVIEW_FILE (BashOutput / repeated reads) until it holds a terminal result, up to a full 10-minute window.\n` +
187
+ `Return codexAvailable:false ONLY if $REVIEW_FILE ends up containing "CODEX_NOT_FOUND" or stays empty after the FULL 10-minute window — NEVER because a single Bash call returned slowly.\n\n` +
188
+ `${waveBrief}\n\n${baselineBrief}\n\n` +
189
+ `For each finding return: finding_id, title, severity (BLOCKER|HIGH|MEDIUM|LOW), confidence (0-100), evidence (exact file:line + code quote), minimal_fix_direction, and domain (doc|security|migration|code|perf|test). ` +
190
+ `Run the mandatory false-positive check on every finding and suppress the unconvincing ones (your findings are treated as already FP-validated). Set codexAvailable:true when the review ran.`
191
+
192
+ const qaPrompt =
193
+ `Run MECHANICAL GATES ONLY over the wave scope, per ${protocolRef} (Phase 3.5 qa-sentinel contract): lint, type-check (when stack uses typescript), the full test suite, build, dependency audit, and markdownlint as applicable. You are a GATE RUNNER: do NOT read source for code findings, do NOT emit severities — return only a PASS/FAIL/SKIP gate table.\n\nWorktree: ${a.worktreePath || '(cwd)'} — cd into it first.\nChanged files:\n${unionScope.join('\n')}`
194
+
195
+ function simplifyPrompt(c) {
196
+ return `Simplify analysis (read-only — you do NOT edit, the workflow applies fixes afterward) over ONE card's committed diff, per ${protocolRef} (Phase 2.55). Cover all THREE lenses and return findings:\n` +
197
+ ` • Reuse — newly written code that duplicates an existing util/helper; inline logic that could use existing code.\n` +
198
+ ` • Quality — redundant state, parameter sprawl, copy-paste with slight variation, leaky abstractions, stringly-typed code where enums exist, unnecessary JSX nesting, WHAT-comments / narration.\n` +
199
+ ` • Efficiency — redundant computation, duplicate API calls, N+1, missed concurrency, hot-path bloat, missing change-detection guards, unbounded structures.\n\n` +
200
+ `${cardScopeBrief(c)}\n\n${baselineBrief}\n\n` +
201
+ `Run a false-positive check on every finding and SUPPRESS the unconvincing ones (your surviving findings are treated as validated). Return findings with domain in {code, perf}. Flag only a finding you genuinely cannot resolve as confidence < 80.`
202
+ }
203
+
204
+ function securityPrompt(c) {
205
+ return `AppSec review (read-only) over ONE card's committed diff — auth, permissions, secrets, webhooks, file upload, infra, multi-tenant isolation, injection. Security-sensitive paths: ${highRisk.join(', ') || '(none configured)'}.\n\n${cardScopeBrief(c)}\n\n${baselineBrief}\n\n` +
206
+ `You OWN the security domain end-to-end: run the mandatory false-positive check on every finding yourself and SUPPRESS the unconvincing ones — your surviving findings are treated as validated and are NOT re-judged by another agent. Flag only a finding you genuinely cannot resolve as confidence < 80. Return findings with domain in {security, migration}.`
207
+ }
208
+
209
+ // Build the finder fan-out. Per-card: simplify (if runSimplify), security (if hasSecurityFiles).
210
+ // Wave-wide: ONE Codex pass over the union, ONE qa-sentinel pass at full tier (else skipped → deferred).
211
+ const findThunks = []
212
+ for (const c of cards) {
213
+ if (c.runSimplify !== false) {
214
+ findThunks.push(() => agent(simplifyPrompt(c), { label: `simplify:${c.cardId}`, phase: 'Discovery', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'simplify', card: c, r })))
215
+ }
216
+ if (c.hasSecurityFiles === true) {
217
+ findThunks.push(() => agent(securityPrompt(c), { label: `security:${c.cardId}`, phase: 'Discovery', agentType: 'security-reviewer', schema: FINDINGS_SCHEMA }).then((r) => ({ kind: 'security', card: c, r })))
218
+ }
219
+ }
220
+ if (codexResolved) {
221
+ findThunks.push(() => agent(codexPrompt, { label: 'codex', phase: 'Discovery', schema: CODEX_SCHEMA }).then((r) => ({ kind: 'codex', card: null, r })))
222
+ }
223
+ if (maxQaTier === 'full') {
224
+ findThunks.push(() => agent(qaPrompt, { label: 'qa-sentinel', phase: 'Discovery', agentType: 'qa-sentinel', schema: GATES_SCHEMA }).then((r) => ({ kind: 'qa', card: null, r })))
225
+ } else {
226
+ log('Discovery: qa-sentinel SKIPPED (wave max tier ≤ light — full suite deferred to the Final FULL gate).')
227
+ }
228
+
229
+ const findResults = (await parallel(findThunks)).filter(Boolean)
230
+
231
+ // ---- Fan-in: collect findings + Codex fallback branch -----------------------
232
+ let raw = []
233
+ let gateTable = []
234
+ let codexEngine = codexResolved ? 'codex' : 'code-reviewer (fallback)'
235
+ let codexRan = false
236
+ for (const item of findResults) {
237
+ if (item.kind === 'codex') {
238
+ if (item.r && item.r.codexAvailable && Array.isArray(item.r.findings)) {
239
+ codexRan = true
240
+ raw.push(...item.r.findings.map((f) => ({ ...f, source: 'codex', preValidated: true })))
241
+ } else {
242
+ log('Discovery: Codex companion resolved but its review did not complete — falling back to code-reviewer.')
243
+ }
244
+ } else if (item.kind === 'qa') {
245
+ gateTable = (item.r && item.r.gates) || []
246
+ } else if (item.r && Array.isArray(item.r.findings)) {
247
+ // simplify / security own their lane and FP-check their OWN findings → already validated.
248
+ raw.push(...item.r.findings.map((f) => ({ ...f, source: item.kind, preValidated: true })))
249
+ }
250
+ }
251
+
252
+ // ONE deterministic fallback: pre-flight found no companion, or it ran but did not complete.
253
+ if (!codexRan) {
254
+ codexEngine = 'code-reviewer (fallback)'
255
+ const fb = await agent(
256
+ `Codex was unavailable for this wave's code review. Run the FULL code review yourself over the wave diff, per ${protocolRef} (Phase 3.7).\n\n${waveBrief}\n\n${baselineBrief}\n\nReturn findings using the schema fields, with a self false-positive check applied (your findings are treated as validated).`,
257
+ { label: 'code-reviewer (fallback)', phase: 'Discovery', agentType: 'code-reviewer', schema: FINDINGS_SCHEMA }
258
+ )
259
+ if (fb && Array.isArray(fb.findings)) raw.push(...fb.findings.map((f) => ({ ...f, source: 'code-reviewer', preValidated: true })))
260
+ }
261
+
262
+ // ───────────────────────────────────────────────────────────────────────────
263
+ // Phase Verify — specialist-owned validation (parity with new-final-review.js F.4)
264
+ // ───────────────────────────────────────────────────────────────────────────
265
+ phase('Verify')
266
+ const classified = (await parallel(raw.map((f) => () => verifyFinding(f)))).filter(Boolean)
267
+
268
+ function domainVerifier(domain) {
269
+ const d = String(domain || 'code').toLowerCase()
270
+ if (/doc|wiki|ssot|readme/.test(d)) return 'doc-reviewer'
271
+ if (/sec|auth|secret|rls|migrat|schema|ddl|\bsql\b/.test(d)) return 'security-reviewer'
272
+ if (/perf|cost|\bapi\b|data|latency|throughput|n\+1/.test(d)) return 'api-perf-cost-auditor'
273
+ if (/\btest|qa\b|spec|coverage/.test(d)) return 'qa-sentinel'
274
+ return 'code-reviewer'
275
+ }
276
+
277
+ async function verifyFinding(f) {
278
+ if (f.preValidated || (typeof f.confidence === 'number' && f.confidence >= 80)) {
279
+ return { ...f, classification: 'VERIFIED' }
280
+ }
281
+ const verifier = domainVerifier(f.domain)
282
+ if (verifier === f.source) {
283
+ // domain specialist IS the finder and already had its pass — a second instance adds no diversity.
284
+ return { ...f, classification: 'NEEDS_MANUAL_CONFIRMATION' }
285
+ }
286
+ const v = await agent(
287
+ `Adversarially validate this ${f.domain || 'code'} finding as the DOMAIN specialist over the cited file:line. Default to FALSE_POSITIVE if the evidence does not hold; use NEEDS_MANUAL_CONFIRMATION only when you genuinely cannot decide from the code.\n\n` +
288
+ `finding_id: ${f.finding_id}\nseverity: ${f.severity}\ntitle: ${f.title}\nevidence: ${f.evidence}\ndomain: ${f.domain}\nsecurity-sensitive paths: ${highRisk.join(', ') || '(none configured)'}`,
289
+ { label: `verify:${f.finding_id}`, phase: 'Verify', agentType: verifier, schema: VERDICT_SCHEMA }
290
+ )
291
+ return { ...f, classification: (v && v.classification) || 'NEEDS_MANUAL_CONFIRMATION' }
292
+ }
293
+
294
+ // Drop FALSE_POSITIVE; attribute each surviving finding to its card by file → card map.
295
+ const fileToCard = buildFileToCard(cards)
296
+ const surviving = classified
297
+ .filter((f) => f.classification !== 'FALSE_POSITIVE')
298
+ .map((f) => ({ ...f, card: attributeCard(f, fileToCard, cards) }))
299
+
300
+ // ───────────────────────────────────────────────────────────────────────────
301
+ // Phase Fix — ONE coder applies all VERIFIED code/perf/security/simplify findings.
302
+ // doc findings → residual (the skill runs doc-reviewer post-E2E on final code).
303
+ // NEEDS_MANUAL_CONFIRMATION → residual (human gate, owned by the skill).
304
+ // ───────────────────────────────────────────────────────────────────────────
305
+ phase('Fix')
306
+ const isDoc = (f) => /doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase())
307
+ const isManual = (f) => f.classification === 'NEEDS_MANUAL_CONFIRMATION'
308
+ // Partition `surviving` (= VERIFIED + NEEDS_MANUAL; FALSE_POSITIVE already dropped) with NO overlap:
309
+ // actionable = VERIFIED non-doc → the coder fixes these.
310
+ // docResidual = VERIFIED doc → the skill runs doc-reviewer post-E2E on final code.
311
+ // manualResidual= NEEDS_MANUAL any → human gate, owned by the skill (a doc-manual must NOT be
312
+ // silently auto-re-reviewed: it carries its needs-manual classification out).
313
+ const actionable = surviving.filter((f) => f.classification === 'VERIFIED' && !isDoc(f))
314
+ const docResidual = surviving.filter((f) => f.classification === 'VERIFIED' && isDoc(f))
315
+ const manualResidual = surviving.filter(isManual)
316
+
317
+ const SKIP_CHECKS = { lint: 'SKIP', tsc: 'SKIP', build: 'SKIP' }
318
+ let fixResult = { applied: [], unresolved: [], checks: { ...SKIP_CHECKS } }
319
+ if (actionable.length && unionEditable.length) {
320
+ const fixBrief =
321
+ `Apply ALL of the verified review findings below to the worktree, then verify the build. You are the SINGLE fix pass for this wave.\n\n` +
322
+ `Worktree: ${a.worktreePath || '(cwd)'} — cd into it.\n` +
323
+ `You MAY edit ONLY these files (ownership map — touching anything else is a violation):\n${unionEditable.join('\n')}\n\n` +
324
+ `Findings to fix (grouped — fix the code, not the tests unless a test itself is wrong; do NOT expand scope beyond the finding):\n` +
325
+ actionable.map((f) => `- [${f.finding_id}] (${f.card || '?'} / ${f.domain} / ${f.severity}) ${f.title}\n evidence: ${f.evidence}\n direction: ${f.minimal_fix_direction}`).join('\n') +
326
+ `\n\nAfter applying: run \`npm run lint\` and (when the project uses typescript) \`npx tsc --noEmit\` and \`npm run build\` in the worktree. If a check fails because of an edit you made, fix the regression — at most 2 retries — staying within the allowed files. ` +
327
+ `Do NOT commit. Do NOT git stash (refs/stash is shared across worktrees). ` +
328
+ `Return: applied (finding_ids you fixed), unresolved (finding_ids you could NOT fix within the allowed files / 2 retries), and checks (PASS/FAIL/SKIP for lint, tsc, build).`
329
+ const r = await agent(fixBrief, { label: 'fix-coder', phase: 'Fix', agentType: 'coder', schema: FIX_SCHEMA })
330
+ // Normalize: the coder may die (null) or return a truthy object missing fields.
331
+ fixResult = (r && typeof r === 'object') ? r : { applied: [], unresolved: actionable.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
332
+ if (!Array.isArray(fixResult.applied)) fixResult.applied = []
333
+ if (!Array.isArray(fixResult.unresolved)) fixResult.unresolved = []
334
+ if (!fixResult.checks || typeof fixResult.checks !== 'object') fixResult.checks = { ...SKIP_CHECKS }
335
+ log(`Fix: coder applied ${fixResult.applied.length}/${actionable.length} finding(s); checks lint=${fixResult.checks.lint} tsc=${fixResult.checks.tsc} build=${fixResult.checks.build}.`)
336
+ } else if (actionable.length) {
337
+ // Actionable findings exist but NO editable files are mapped → cannot fix; return all as residual
338
+ // (no wasted coder spawn — the skill will route them to a targeted coder with a proper ownership scope).
339
+ fixResult = { applied: [], unresolved: actionable.map((f) => f.finding_id), checks: { ...SKIP_CHECKS } }
340
+ log(`Fix: ${actionable.length} actionable finding(s) but no editable files in scope — returned as residual (coder skipped).`)
341
+ } else {
342
+ log('Fix: no actionable code/perf/security/simplify findings — coder skipped.')
343
+ }
344
+
345
+ // Unfixed actionable findings become residual (human/coder follow-up owned by the skill).
346
+ const appliedIds = new Set((fixResult.applied || []).map((x) => x.finding_id))
347
+ const unresolvedIds = new Set(fixResult.unresolved || [])
348
+ const codeResidual = actionable.filter((f) => !appliedIds.has(f.finding_id) || unresolvedIds.has(f.finding_id))
349
+ const checksFailed = ['lint', 'tsc', 'build'].some((k) => fixResult.checks && fixResult.checks[k] === 'FAIL')
350
+
351
+ // ---- Assemble per-card result ----------------------------------------------
352
+ function bucket(cardId) { return perCard[cardId] || (perCard[cardId] = { fixesApplied: [], residual: [] }) }
353
+ for (const f of actionable) {
354
+ if (appliedIds.has(f.finding_id) && !unresolvedIds.has(f.finding_id)) {
355
+ bucket(f.card || cards[0].cardId).fixesApplied.push(`[${f.finding_id}] ${f.title}`)
356
+ }
357
+ }
358
+ for (const f of [...codeResidual, ...docResidual, ...manualResidual]) {
359
+ bucket(f.card || cards[0].cardId).residual.push(slimFinding(f))
360
+ }
361
+
362
+ const summary = makeSummary({
363
+ cards: cards.length,
364
+ totalFindings: raw.length,
365
+ verified: surviving.filter((f) => f.classification === 'VERIFIED').length,
366
+ falsePositive: classified.filter((f) => f.classification === 'FALSE_POSITIVE').length,
367
+ needsManual: manualResidual.length,
368
+ fixesApplied: appliedIds.size,
369
+ docResidual: docResidual.length,
370
+ codeResidual: codeResidual.length,
371
+ qaRan: maxQaTier === 'full', // false ⇒ qa-sentinel deferred to the Final FULL gate (qa_first_attempt = n/a)
372
+ checksFailed, // post-fix lint/tsc/build (coder); distinct from the qa-sentinel gateTable
373
+ failingGates: gateTable.filter((g) => g.status === 'FAIL').map((g) => g.gate), // qa-sentinel FAIL gates (test/build/audit/markdownlint)
374
+ blockers: surviving.filter((f) => f.classification === 'VERIFIED' && f.severity === 'BLOCKER').length,
375
+ highs: surviving.filter((f) => f.classification === 'VERIFIED' && f.severity === 'HIGH').length,
376
+ })
377
+ log(`Wave review done: ${summary.fixesApplied} fixed, ${summary.codeResidual} code-residual, ${summary.docResidual} doc-residual, ${summary.needsManual} needs-manual, ${summary.failingGates.length} failing gate(s)${checksFailed ? ', post-fix checks FAILED' : ''}. Engine: ${codexEngine}.`)
378
+
379
+ return { codexEngine, perCard, gateTable, summary }
380
+
381
+ // ---- helpers ----------------------------------------------------------------
382
+ function asArr(x) { return Array.isArray(x) ? x.filter(Boolean) : [] }
383
+ function dedupe(xs) { return Array.from(new Set(asArr(xs))) }
384
+ function makeSummary(o) {
385
+ return Object.assign({ cards: 0, totalFindings: 0, verified: 0, falsePositive: 0, needsManual: 0, fixesApplied: 0, docResidual: 0, codeResidual: 0, qaRan: false, checksFailed: false, failingGates: [], blockers: 0, highs: 0 }, o || {})
386
+ }
387
+ function slimFinding(f) {
388
+ return { finding_id: f.finding_id, title: f.title, severity: f.severity, domain: f.domain, evidence: f.evidence, minimal_fix_direction: f.minimal_fix_direction, classification: f.classification, card: f.card }
389
+ }
390
+ function buildFileToCard(cs) {
391
+ const m = new Map()
392
+ for (const c of cs) for (const file of dedupe([...asArr(c.scopeFiles), ...asArr(c.editableFiles)])) if (!m.has(file)) m.set(file, c.cardId)
393
+ return m
394
+ }
395
+ function attributeCard(f, map, cs) {
396
+ const ev = String(f.evidence || '') + ' ' + String(f.minimal_fix_direction || '')
397
+ // longest known path that appears in the evidence AT A PATH BOUNDARY wins (most specific match).
398
+ // Boundary match avoids `components/Button` mis-matching inside `components/ButtonGroup.tsx`.
399
+ let best = null
400
+ for (const [file, cardId] of map.entries()) {
401
+ if (pathInText(file, ev) && (!best || file.length > best.len)) best = { cardId, len: file.length }
402
+ }
403
+ return best ? best.cardId : cs[0].cardId
404
+ }
405
+ function pathInText(file, text) {
406
+ if (!file) return false
407
+ const cont = /[A-Za-z0-9._/-]/ // a path-continuation char on either side ⇒ not a real boundary match
408
+ let i = text.indexOf(file)
409
+ while (i !== -1) {
410
+ const before = i === 0 ? '' : text[i - 1]
411
+ const after = i + file.length >= text.length ? '' : text[i + file.length]
412
+ if (!cont.test(before) && !cont.test(after)) return true
413
+ i = text.indexOf(file, i + 1)
414
+ }
415
+ return false
416
+ }
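
The card-attribution helpers that close the workflow file can be exercised in isolation. The following is a minimal standalone sketch, not the shipped file: it restates the boundary-aware match and the longest-path-wins rule, and the `attribute` wrapper, the sample paths, and the card ids are hypothetical names introduced only for illustration.

```javascript
// Illustrative restatement of the boundary-aware path match (not the shipped file).
const PATH_CHAR = /[A-Za-z0-9._/-]/ // a path-continuation character

// True when `file` occurs in `text` with no path-continuation character on either side.
function pathInText(file, text) {
  if (!file) return false
  let i = text.indexOf(file)
  while (i !== -1) {
    const end = i + file.length
    const before = i === 0 ? '' : text[i - 1]
    const after = end >= text.length ? '' : text[end]
    if (!PATH_CHAR.test(before) && !PATH_CHAR.test(after)) return true
    i = text.indexOf(file, i + 1)
  }
  return false
}

// Hypothetical wrapper: the longest matching path wins, else fall back to a default card.
function attribute(evidence, fileToCard, fallbackCardId) {
  let best = null
  for (const [file, cardId] of fileToCard) {
    if (pathInText(file, evidence) && (!best || file.length > best.len)) {
      best = { cardId, len: file.length }
    }
  }
  return best ? best.cardId : fallbackCardId
}

// Hypothetical file-to-card map and evidence string.
const fileToCard = new Map([
  ['components/Button', 'CARD-1'],
  ['components/ButtonGroup.tsx', 'CARD-2'],
])
const owner = attribute('components/ButtonGroup.tsx:42 duplicated handler', fileToCard, 'CARD-1')
console.log(owner) // CARD-2
```

One consequence of treating `.` as a path-continuation character is that a path followed directly by a sentence-ending period would not count as a boundary match, so such a finding would fall through to the fallback card.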
@@ -14,6 +14,7 @@ workflows are unavailable behaves exactly as before.
14
14
  | Workflow | Used by | What it does |
15
15
  | :--- | :--- | :--- |
16
16
  | `new-final-review` | `/new` Final Review (Step F.1.5) | Runs the read-only cross-batch review fan-out — architecture baseline + Codex ‖ doc-reviewer ‖ api-perf-cost-auditor ‖ qa-sentinel — then adversarially verifies low-confidence findings and returns them classified. Applies no fixes (the skill owns fix application + user gates). **v4.17.1+:** Codex availability is resolved by a **deterministic pre-flight glob + background poll** (no false negatives from a synchronous-run timeout); a **single-card batch** skips the duplicate doc/api reviewers (already run per-card), keeping only the cross-model Codex pass + qa gates. |
17
+ | `new-card-review` (v4.34.0) | `/new` per-card review cluster — sequential (`review-cycle.md` § Phase 2.5x) **and** team-mode (`team-mode.md` D.1.6) | Hosts the **per-wave** review-cluster OUTSIDE the orchestrator context — the biggest prefix-growth source on long epics. Takes **1..N co-located cards** (1 = sequential per-card; N = a team-mode group, so it runs **once per wave, not per card**) and fans out the finders per card: Simplify + cross-model Codex (agent-launched binary, `code-reviewer` fallback) + qa-sentinel (group max tier) + security-reviewer (high-risk only), each specialist FP-checking its own findings. Then **ONE `coder` applies all VERIFIED code/perf/security/simplify findings in a single pass** (files disjoint by ownership) and re-verifies lint/tsc/build. Returns a compact `{perCard:{fixesApplied,residual}, gateTable, summary}` — the minority `residual` (doc, needs-manual, scope, unconverged) the skill resolves with the right specialist / user gate. **Doc-review and E2E stay in the skill** (doc is write-mode + must see final code; E2E is human-gated + nests a skill); **api-perf is deferred to the Final FULL gate**. Maps to `review-cycle.md` Phase 2.55+3.5+3.7. |
17
18
  | `new2` (v4.17.2) | `/new2` skill (the whole batch) | **EXPERIMENTAL A/B variant of `/new`.** Hosts the ENTIRE batch in the background runtime so subagent output never enters the main context. A **dependency-gated DAG scheduler** runs a card only when all in-batch deps are *committed* (and blocks transitive dependents of a failed dep instead of routing them to resolve); each card uses its **owner_agent** + a **specialized review fan-out** (not general-purpose); the worktree is kept **atomic per card** (rollback-to-HEAD on failure); transient API errors are retried and a sustained **outage degrades cleanly** (`degraded` return + durable resume via the skill); a **run ledger** dedups resolves and records accepted deferrals (no re-routing loop); the **merge is integrity-gated** (never force-DONE, never `git add` unreviewed code, never merge an incomplete/degraded batch); the commit step runs on **Haiku** while **follow-up cards are written by `prd-card-writer`**; telemetry carries real **cost** (`total_tokens` via `budget.spent()`, `agent_count`, skill-stamped `wall_clock_s`) + `degraded`. **v4.17.2:** the pre-flight **G3 cross-card Codex check is deterministic** (glob-first + background poll, skipped on single-card batches; `codex_resolved` in telemetry); a non-transient card crash is terminal-with-residual (no orphaned self-healing). Agents Read `/new`'s reference modules for semantics. |
18
19
  | `new2-resolve` (v4.17.2) | `new2` (self-healing) | Resolution pass for any gate that would otherwise need a human (`ac-unmet · blocker · qa-fail · e2e-blocked · merge-blocker · scope-expansion`). A **terminal short-circuit** skips the costly multi-attempt when the problem is impossible-by-definition (`out-of-ownership` verified in JS; other terminal reasons ratified by a judge); a **MANDATORY adversarial judge** cross-checks every `verified` claim — the judge independently greps the files and the workflow verifies **at least one** falls inside MAY-EDIT (`.some()`, so listing adjacent changed files is not mistaken for fabrication); accepts a **batched `findings` list** (one resolve per fix-area). **The domain is normalized** (freeform `documentation`→`doc`, …) before routing the **fixer** (doc→doc-reviewer, ui→ui-expert, security→security-reviewer, else coder) and judge, and a **doc finding gets doc-tree MAY-EDIT** (not the card's code scope); the 3-angle Tier-2 fan-out is reserved for code domains (single retry for doc/test). Follow-ups are written by **`prd-card-writer`**, offline-safe (deferred to the skill if no agent can write). |
19
20
 
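
As a rough illustration of how the compact `new-card-review` result might be consumed, the sketch below splits each card's `residual` list into follow-up routes. It assumes only the return shape documented in the workflow header (`perCard`, `residual`, `classification`, `domain`); the `routeResidual` helper, the route names, and the sample data are hypothetical and are not part of the package.

```javascript
// Hypothetical skill-side helper: group residual findings by the follow-up they need.
function routeResidual(result) {
  const routes = { manual: [], doc: [], code: [] }
  const perCard = (result && result.perCard) || {}
  for (const [cardId, entry] of Object.entries(perCard)) {
    for (const f of (entry && entry.residual) || []) {
      const item = { cardId, finding_id: f.finding_id }
      if (f.classification === 'NEEDS_MANUAL_CONFIRMATION') {
        routes.manual.push(item) // human gate, regardless of domain
      } else if (/doc|wiki|ssot|readme/.test(String(f.domain).toLowerCase())) {
        routes.doc.push(item) // verified doc finding, handled after E2E
      } else {
        routes.code.push(item) // unfixed code/perf/security finding
      }
    }
  }
  return routes
}

// Hypothetical sample result in the documented shape.
const sample = {
  codexEngine: 'codex',
  perCard: {
    'CARD-1': {
      fixesApplied: ['[CARD-1-F001] remove duplicate fetch'],
      residual: [
        { finding_id: 'CARD-1-F002', domain: 'doc', classification: 'VERIFIED' },
        { finding_id: 'CARD-1-F003', domain: 'doc', classification: 'NEEDS_MANUAL_CONFIRMATION' },
      ],
    },
    'CARD-2': {
      fixesApplied: [],
      residual: [{ finding_id: 'CARD-2-F001', domain: 'perf', classification: 'VERIFIED' }],
    },
  },
  gateTable: [],
  summary: {},
}

const routes = routeResidual(sample)
console.log(routes.manual.length, routes.doc.length, routes.code.length) // 1 1 1
```

Checking the needs-manual classification before the domain mirrors the workflow's own partition, in which a doc finding that needs manual confirmation goes to the human gate rather than being re-reviewed automatically.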
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "baldart",
3
- "version": "4.33.2",
3
+ "version": "4.34.1",
4
4
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
5
5
  "bin": {
6
6
  "baldart": "./bin/baldart.js"