npm - baldart - Versions diffs - 4.24.3 → 4.25.0 - Mend

baldart 4.24.3 → 4.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/CHANGELOG.md +10 -0
package/VERSION +1 -1
package/framework/.claude/workflows/new2.js +47 -10
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -5,6 +5,16 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.25.0] - 2026-06-11
+**`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
+### Changed
+- **`framework/.claude/workflows/new2.js` — relevance-gated `balanced` fan-out.** `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` (`.md`/`.mdx` or under `paths.docs_dir|references_dir|wiki_dir|prd_dir|design_system`) → doc-reviewer (on a doc-ONLY card it IS the core finder — code-reviewer and qa-sentinel are skipped); `touchesApiData` (superset of the final review's `hasApiDataFiles` regex) → api-perf-cost-auditor; the v4.24.2 `securityRelevant` gate → security-reviewer. Surface = `scopeFiles ∪ MAY-EDIT`, so a DoD-mandated doc the implementer FORGOT to edit still triggers doc-reviewer (it's in MAY-EDIT). **Fail-safes**: `deep` keeps the unconditional full fan-out (Rule C escalation is respected); an empty surface (no evidence) falls back to the full fan-out — absence of evidence is not evidence of absence. `light`/`skip` unchanged.
+- **`new2.js` Phase Final — the F-041 `singleCard`/slim skip is now coverage-gated.** The final review's doc pass is skipped for a single-card batch ONLY when doc-reviewer ACTUALLY ran per-card (`reviewersRun`); otherwise it runs as the missing-doc-update safety net — so gating a per-card specialist off never leaves its domain unreviewed at batch level. This also closes a **latent F-041 hole**: under `light`, doc-reviewer never ran per-card, yet the slim skip still suppressed the final doc pass for single-card batches.
+- **`new2.js` telemetry — `per_card[].reviewers`** records the gated matrix per card, so the A/B spawn accounting can measure exactly what the gating saves.
 ## [4.24.3] - 2026-06-10
 **`new2`: two follow-ups from the v4.24.2 logic review — epic closure no longer misses deferred cards, and the owner-gated classification reaches the final review.** Both are "parallel location" gaps of earlier fixes (the v4.17.2 meta-lesson): v4.22.1's epic-closure and v4.24.1's owner-gated deferral each missed one site. **PATCH** (bug-fix to the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).

package/VERSION CHANGED

@@ -1 +1 @@
- 4.24.3
+ 4.25.0
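The changelog classifies this release as **MINOR** (a new review-behavior capability), and under Semantic Versioning 2.0.0 a minor release increments MINOR and resets PATCH to 0, which is why both VERSION and package.json move from 4.24.3 to 4.25.0. A minimal sketch of that rule follows; `bumpMinor` is a hypothetical helper, not part of baldart, and it deliberately rejects pre-release or build suffixes rather than handling them.

```javascript
// Hypothetical helper: SemVer MINOR bump (MINOR + 1, PATCH reset to 0).
// Only plain MAJOR.MINOR.PATCH strings are accepted; pre-release/build metadata is out of scope.
function bumpMinor(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!m) throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`)
  const major = Number(m[1])
  const minor = Number(m[2])
  return `${major}.${minor + 1}.0`
}

console.log(bumpMinor('4.24.3')) // 4.25.0
```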

package/framework/.claude/workflows/new2.js CHANGED

@@ -460,12 +460,40 @@ async function runCard(cardId, cardPath) {
   // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
   // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
   // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
-  // → code-reviewer) cross-checks the Codex finding before a fix/followup. `balanced`/`deep` unchanged.
+  // → code-reviewer) cross-checks the Codex finding before a fix/followup.
   const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
-  const reviewers = reviewProfile === 'skip' ? []
-    : reviewProfile === 'light' ? (codexAvail ? ['codex'] : ['code-reviewer'])
-    : ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(
-        securityRelevant ? ['security-reviewer'] : [])
+  // B6 (v4.25.0) — deterministic per-card review matrix. `balanced` no longer means "every
+  // specialist on every card": each one runs IFF its domain is evidenced by the card's actual
+  // surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
+  // `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
+  // to high-risk cards — respect the escalation). Coverage holds at batch level: the final
+  // review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
+  // gated on doc-reviewer having ACTUALLY run per-card — see Phase Final), and its api-perf
+  // pass keys on hasApiDataFiles with a regex this matrix supersets — so gating OFF a per-card
+  // specialist never leaves its domain unreviewed.
+  const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
+  const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
+  const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
+  const touchesDocs = surface.some(isDocFile)
+  const touchesCode = surface.some((f) => !isDocFile(f))
+  // superset of new-final-review's hasApiDataFiles regex — per-card coverage may exceed the
+  // final pass, never lag it.
+  const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
+  const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
+  const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
+  let reviewers
+  if (reviewProfile === 'skip') reviewers = []
+  else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
+  else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
+  else { // balanced — relevance-gated
+    reviewers = []
+    if (touchesCode) reviewers.push('code-reviewer')
+    if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
+    if (touchesCode) reviewers.push('qa-sentinel')  // skip for doc-only cards — no behavior to QA
+    if (touchesApiData) reviewers.push('api-perf-cost-auditor')
+    if (securityRelevant) reviewers.push('security-reviewer')
+  }
+  if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
   const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
     properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
   let reviewResults = []
@@ -535,7 +563,7 @@ async function runCard(cardId, cardPath) {
     for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
   }
-  if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
+  if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, reviewersRun: reviewers, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
   // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
   // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
@@ -578,6 +606,9 @@ async function runCard(cardId, cardPath) {
     // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
     deferred: deferredOpen,
     deferredClasses: Array.from(deferredClasses),
+    // B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
+    // decision on this, and per_card telemetry records it for the A/B spawn accounting.
+    reviewersRun: reviewers,
     scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
   }
 }
@@ -675,9 +706,15 @@ if (committed.length && !degraded) {
       firstCardId: firstCard, worktreePath: sharedCtx.worktreePath, baseBranch: TRUNK,
       cardPaths: committed.map((r) => pathById[r.card]).filter(Boolean),
       reviewScopeFiles, archBaselinePaths: allArch, hasApiDataFiles, config: cfg,
-      // F-041 — single-card batch: doc-reviewer + api-perf already ran per-card and there is
-      // NO cross-card conflict to find. Keep only the cross-model Codex pass + qa gates.
-      singleCard: committed.length === 1,
+      // F-041 + B6 — slim the final pass (skip its doc-reviewer) ONLY when doc-reviewer
+      // ACTUALLY ran per-card for this single card. The old unconditional `length === 1` was
+      // built on the premise "doc + api-perf already ran per-card" — false under the gated
+      // review matrix (and it was ALREADY false under `light`, a latent F-041 coverage hole
+      // this closes). When per-card doc review was gated off, the final doc pass is the
+      // missing-doc-update safety net and must run even for a single card. api-perf at the
+      // final pass is independently gated by hasApiDataFiles (regex the per-card matrix
+      // supersets), so its coverage is consistent either way.
+      singleCard: committed.length === 1 && ((committed[0].reviewersRun || []).includes('doc-reviewer')),
     })
   } catch (e) { if (e && isTransient(e)) noteDegraded('outage'); final = null }
@@ -840,7 +877,7 @@ function buildTelemetry() {
     // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
     total_tokens: totalTokens,
     agent_count: agentCount,
-    per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
+    per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], gates: (r.gates || []).length })),
     stats_requested: !!FLAGS.stats,
   }
 }
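The Phase Final change and the new telemetry field fit together: the single-card slim skip now applies only when per-card doc review actually ran, and `per_card[].reviewers` makes the gating's effect countable. The sketch below uses hypothetical helpers (`finalPassIsSlim`, `spawnsSaved`) that are not part of new2.js. Because the telemetry records which reviewers ran but not the review profile, `spawnsSaved` assumes every card ran at `balanced`, whose ungated baseline was the four core specialists plus security-reviewer when the card was security-relevant.

```javascript
// Hypothetical restatement of the v4.25.0 singleCard condition passed to the final review:
// skip the final doc pass only for a one-card batch whose per-card run included doc-reviewer.
function finalPassIsSlim(committed) {
  return committed.length === 1 && (committed[0].reviewersRun || []).includes('doc-reviewer')
}

// Hypothetical A/B accounting over per_card telemetry rows ({ card, reviewers, ... }).
// Assumes every card ran at `balanced`: the old baseline was 4 specialists,
// plus security-reviewer when the card was security-relevant (which the gated list still shows).
function spawnsSaved(perCard) {
  let saved = 0
  for (const row of perCard) {
    const reviewers = row.reviewers || []
    const baseline = 4 + (reviewers.includes('security-reviewer') ? 1 : 0)
    saved += baseline - reviewers.length
  }
  return saved
}

// A `light` card ran Codex only, so the final doc pass is NOT skipped (the closed F-041 hole).
console.log(finalPassIsSlim([{ card: 'C1', reviewersRun: ['codex'] }])) // false

console.log(spawnsSaved([
  { card: 'C1', reviewers: ['doc-reviewer'] }, // doc-only card: 3 spawns avoided
  { card: 'C2', reviewers: ['code-reviewer', 'qa-sentinel', 'api-perf-cost-auditor', 'security-reviewer'] }, // 1 avoided
])) // 4
```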

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.24.3",
+  "version": "4.25.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"