npm - baldart - Versions diffs - 4.24.2 → 4.25.0 - Mend

baldart 4.24.2 → 4.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md +19 -0
package/VERSION +1 -1
package/framework/.claude/skills/new2/SKILL.md +8 -0
package/framework/.claude/workflows/new2.js +60 -15
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -5,6 +5,25 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.25.0] - 2026-06-11
+**`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
+### Changed
+- **`framework/.claude/workflows/new2.js` — relevance-gated `balanced` fan-out.** `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` (`.md`/`.mdx` or under `paths.docs_dir|references_dir|wiki_dir|prd_dir|design_system`) → doc-reviewer (on a doc-ONLY card it IS the core finder — code-reviewer and qa-sentinel are skipped); `touchesApiData` (superset of the final review's `hasApiDataFiles` regex) → api-perf-cost-auditor; the v4.24.2 `securityRelevant` gate → security-reviewer. Surface = `scopeFiles ∪ MAY-EDIT`, so a DoD-mandated doc the implementer FORGOT to edit still triggers doc-reviewer (it's in MAY-EDIT). **Fail-safes**: `deep` keeps the unconditional full fan-out (Rule C escalation is respected); an empty surface (no evidence) falls back to the full fan-out — absence of evidence is not evidence of absence. `light`/`skip` unchanged.
+- **`new2.js` Phase Final — the F-041 `singleCard`/slim skip is now coverage-gated.** The final review's doc pass is skipped for a single-card batch ONLY when doc-reviewer ACTUALLY ran per-card (`reviewersRun`); otherwise it runs as the missing-doc-update safety net — so gating a per-card specialist off never leaves its domain unreviewed at batch level. This also closes a **latent F-041 hole**: under `light`, doc-reviewer never ran per-card, yet the slim skip still suppressed the final doc pass for single-card batches.
+- **`new2.js` telemetry — `per_card[].reviewers`** records the gated matrix per card, so the A/B spawn accounting can measure exactly what the gating saves.
+## [4.24.3] - 2026-06-10
+**`new2`: two follow-ups from the v4.24.2 logic review — epic closure no longer misses deferred cards, and the owner-gated classification reaches the final review.** Both are "parallel location" gaps of earlier fixes (the v4.17.2 meta-lesson): v4.22.1's epic-closure and v4.24.1's owner-gated deferral each missed one site. **PATCH** (bug-fix to the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
+### Fixed
+- **`new2/SKILL.md` (Step 5.2) — epic-closure re-check after deferred-DONE.** The merge agent's epic-closure (Phase 6b step 5e) runs BEFORE the skill marks deferred cards DONE post-run — so an epic whose last open child was a deferred card stayed TODO forever (nothing re-checked it). The skill now re-runs the all-children-DONE check for the parents of the cards it just marked DONE, closing the epic in the same reconciliation commit.
+- **`new2.js` (Phase Final) — owner-gated deferrals no longer block the merge at final review.** The `merge-blocker`/`qa-fail` resolves checked only `.status`: a finding whose sole remedy is an external/infra step (e.g. the same pending remote `db:push` a reviewer re-raises batch-wide) set `mergeBlocked` and stranded a complete batch. The `deferralClass` check from the per-card loops (v4.24.1/v4.24.2) now applies here too: `owner-gated`/`not-a-code-defect` → follow-up tracked + `DEFERRED-OWNER-GATED` ledger row, merge proceeds; a genuine unresolved code defect still blocks.
 ## [4.24.2] - 2026-06-10
 **`new2`: holistic logic review — deterministic owner_agent routing, a merge gate that no longer strands the batch, `deferralClass` end-to-end, and −N agent spawns per run.** A full logic review of `new2.js`/`new2-resolve.js`/`SKILL.md` against `/new`'s reference modules found four correctness defects and five sources of wasted spawns. The headline routing bug: the pre-flight's `ownerAgent` was passed RAW as `agentType` — the G25 "unknown→coder" rule lived only in the prompt, so any freeform value (`claude`, `backend`, a typo) was a PERMANENT spawn error → card `failed` → (combined with the merge-gate bug) the whole batch unmerged. **PATCH** (bug-fix/hardening of the EXPERIMENTAL `new2` surface only; **no `baldart.config.yml` key**, **no change to `/new`** — the schema-change propagation rule does not apply).
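
The relevance-gated `balanced` fan-out described in the 4.25.0 entry can be sketched as a standalone function. This is a minimal illustration, not the package's code: `pickReviewers` and its option names are hypothetical, and `docDirs` stands in for the configured `paths.*` doc directories. The regex and the ordering of reviewers are copied from the `new2.js` hunk in this diff.

```javascript
// Hypothetical standalone sketch of the per-card review matrix (not the package's API).
function pickReviewers({ profile, surface, docDirs = [], securityRelevant = false, codexAvail = false }) {
  // A file counts as a doc if it is .md/.mdx or its path contains a configured doc dir.
  const isDocFile = (f) => /\.(md|mdx)$/i.test(f) || docDirs.some((d) => f.includes(d))
  const apiData = /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i
  const touchesDocs = surface.some(isDocFile)
  const touchesCode = surface.some((f) => !isDocFile(f))
  const touchesApiData = surface.some((f) => apiData.test(f))
  const full = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor']
    .concat(securityRelevant ? ['security-reviewer'] : [])

  if (profile === 'skip') return []
  if (profile === 'light') return codexAvail ? ['codex'] : ['code-reviewer']
  // `deep` and an empty surface (no evidence) both keep the full fan-out.
  if (profile === 'deep' || surface.length === 0) return full

  // balanced: each specialist runs only if its domain is evidenced by the surface.
  const out = []
  if (touchesCode) out.push('code-reviewer')
  if (touchesDocs) out.push('doc-reviewer')
  if (touchesCode) out.push('qa-sentinel')
  if (touchesApiData) out.push('api-perf-cost-auditor')
  if (securityRelevant) out.push('security-reviewer')
  return out
}

const docOnly = pickReviewers({ profile: 'balanced', surface: ['docs/guide.md'] })
const codeOnly = pickReviewers({ profile: 'balanced', surface: ['src/utils/format.ts'] })
const noSurface = pickReviewers({ profile: 'balanced', surface: [] })
console.log(docOnly, codeOnly, noSurface)
```

Under these assumptions a doc-only card gets only `doc-reviewer`, a pure-code card gets `code-reviewer` and `qa-sentinel`, and an empty surface falls back to the full list.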

package/VERSION CHANGED

@@ -1 +1 @@
-4.24.2
+4.25.0

package/framework/.claude/skills/new2/SKILL.md CHANGED

@@ -157,6 +157,14 @@ returns when the batch is done. It returns:
      follow-up <id>"`. NEVER auto-DONE it — a follow-up tracks the gap, but DONE would lie (F-029).
    **If a card's follow-up could NOT be created in step 1, leave it NON-DONE and surface it** —
    fail-loud; NEVER mark a card DONE with a silently-dropped requirement (F-029).
+   **Then re-run the epic-closure check for the cards just marked DONE.** The merge agent's
+   epic-closure (Phase 6b step 5e) ran BEFORE this step, when the deferred cards were still
+   NON-DONE — so an epic whose last open child was a deferred card is still TODO and nothing
+   else will ever close it. For each distinct `group.parent` of the cards marked DONE here: if
+   `grep -l "parent: <EPIC-ID>" ${paths.backlog_dir}/*.yml | xargs grep -L "status: DONE"`
+   prints nothing, set the epic card `status: DONE` + `completed_date` + note
+   `"epic-closure gate — all children DONE (post-run, new2 skill)"`, folded into the SAME
+   reconciliation commit. If any child is still open, leave the epic untouched.
 3. **Resume if degraded.** If `degraded` is true, re-invoke the workflow with
    `Workflow({ scriptPath, resumeFromRunId })` (same `args` + the new `ts`). The
    per-card **skip-completed** guard makes the resume idempotent — already-committed
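
The epic-closure re-check added to Step 5.2 above amounts to: for each distinct parent of the cards just marked DONE, close the epic only if every child is DONE. The skill itself does this with `grep` over the backlog YAML files; the sketch below is an in-memory illustration under assumed data shapes. `epicsToClose` and the `{ id, parent, status }` card objects are hypothetical, not part of the package.

```javascript
// Hypothetical in-memory model of backlog cards; the skill works on YAML files via grep.
function epicsToClose(cards, justMarkedDone) {
  // Distinct parents of the cards that were just marked DONE post-run.
  const parents = new Set(
    cards.filter((c) => justMarkedDone.includes(c.id) && c.parent).map((c) => c.parent)
  )
  // An epic is closable only when none of its children is still open.
  return [...parents].filter((epicId) =>
    cards.filter((c) => c.parent === epicId).every((c) => c.status === 'DONE')
  )
}

const cards = [
  { id: 'EPIC-1', status: 'TODO' },
  { id: 'C-1', parent: 'EPIC-1', status: 'DONE' },
  { id: 'C-2', parent: 'EPIC-1', status: 'DONE' }, // deferred card, just marked DONE
  { id: 'EPIC-2', status: 'TODO' },
  { id: 'C-3', parent: 'EPIC-2', status: 'DONE' }, // just marked DONE
  { id: 'C-4', parent: 'EPIC-2', status: 'IN_PROGRESS' }, // still open
]
const closable = epicsToClose(cards, ['C-2', 'C-3'])
console.log(closable)
```

Here `EPIC-1` qualifies for closure and `EPIC-2` is left untouched because `C-4` is still open.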

package/framework/.claude/workflows/new2.js CHANGED

@@ -460,12 +460,40 @@ async function runCard(cardId, cardPath) {
   // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
   // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
   // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
-  // → code-reviewer) cross-checks the Codex finding before a fix/followup. `balanced`/`deep` unchanged.
+  // → code-reviewer) cross-checks the Codex finding before a fix/followup.
   const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
-  const reviewers = reviewProfile === 'skip' ? []
-    : reviewProfile === 'light' ? (codexAvail ? ['codex'] : ['code-reviewer'])
-    : ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(
-        securityRelevant ? ['security-reviewer'] : [])
+  // B6 (v4.25.0) — deterministic per-card review matrix. `balanced` no longer means "every
+  // specialist on every card": each one runs IFF its domain is evidenced by the card's actual
+  // surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
+  // `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
+  // to high-risk cards — respect the escalation). Coverage holds at batch level: the final
+  // review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
+  // gated on doc-reviewer having ACTUALLY run per-card — see Phase Final), and its api-perf
+  // pass keys on hasApiDataFiles with a regex this matrix supersets — so gating OFF a per-card
+  // specialist never leaves its domain unreviewed.
+  const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
+  const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
+  const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
+  const touchesDocs = surface.some(isDocFile)
+  const touchesCode = surface.some((f) => !isDocFile(f))
+  // superset of new-final-review's hasApiDataFiles regex — per-card coverage may exceed the
+  // final pass, never lag it.
+  const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
+  const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
+  const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
+  let reviewers
+  if (reviewProfile === 'skip') reviewers = []
+  else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
+  else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
+  else { // balanced — relevance-gated
+    reviewers = []
+    if (touchesCode) reviewers.push('code-reviewer')
+    if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
+    if (touchesCode) reviewers.push('qa-sentinel')  // skip for doc-only cards — no behavior to QA
+    if (touchesApiData) reviewers.push('api-perf-cost-auditor')
+    if (securityRelevant) reviewers.push('security-reviewer')
+  }
+  if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
   const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
     properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
   let reviewResults = []
@@ -535,7 +563,7 @@ async function runCard(cardId, cardPath) {
     for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
   }
-  if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
+  if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, reviewersRun: reviewers, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
   // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
   // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
@@ -578,6 +606,9 @@ async function runCard(cardId, cardPath) {
     // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
     deferred: deferredOpen,
     deferredClasses: Array.from(deferredClasses),
+    // B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
+    // decision on this, and per_card telemetry records it for the A/B spawn accounting.
+    reviewersRun: reviewers,
     scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
   }
 }
@@ -675,9 +706,15 @@ if (committed.length && !degraded) {
       firstCardId: firstCard, worktreePath: sharedCtx.worktreePath, baseBranch: TRUNK,
       cardPaths: committed.map((r) => pathById[r.card]).filter(Boolean),
       reviewScopeFiles, archBaselinePaths: allArch, hasApiDataFiles, config: cfg,
-      // F-041 — single-card batch: doc-reviewer + api-perf already ran per-card and there is
-      // NO cross-card conflict to find. Keep only the cross-model Codex pass + qa gates.
-      singleCard: committed.length === 1,
+      // F-041 + B6 — slim the final pass (skip its doc-reviewer) ONLY when doc-reviewer
+      // ACTUALLY ran per-card for this single card. The old unconditional `length === 1` was
+      // built on the premise "doc + api-perf already ran per-card" — false under the gated
+      // review matrix (and it was ALREADY false under `light`, a latent F-041 coverage hole
+      // this closes). When per-card doc review was gated off, the final doc pass is the
+      // missing-doc-update safety net and must run even for a single card. api-perf at the
+      // final pass is independently gated by hasApiDataFiles (regex the per-card matrix
+      // supersets), so its coverage is consistent either way.
+      singleCard: committed.length === 1 && ((committed[0].reviewersRun || []).includes('doc-reviewer')),
     })
   } catch (e) { if (e && isTransient(e)) noteDegraded('outage'); final = null }
@@ -699,17 +736,25 @@ if (committed.length && !degraded) {
       const area = (Array.isArray(f.files) && f.files[0]) || (f.file) || (f.domain || 'misc')
       ;(byArea[area] = byArea[area] || []).push(f)
     }
+    // F-040 (parallel location, v4.24.3) — the owner-gated classification applies HERE too,
+    // not just in the per-card loops: a final-review finding whose only remedy is an external/
+    // infra step (e.g. the same pending remote db:push a reviewer re-raises batch-wide) must
+    // NOT block the merge — the batch's code is complete and the residual is already tracked
+    // as a follow-up with its deferralClass. Only a genuine unresolved CODE defect blocks.
+    const ownerGatedFinal = (r) => r.status === 'followup' && (r.deferralClass === 'owner-gated' || r.deferralClass === 'not-a-code-defect')
     for (const area of Object.keys(byArea)) {
       const group = byArea[area]
-      const s = (await resolve('merge-blocker', group[0].finding_id || firstCard,
+      const r = await resolve('merge-blocker', group[0].finding_id || firstCard,
         group.map((f) => `${f.severity} ${f.title}: ${f.evidence}`).join(' || '),
         { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: group[0].domain || 'code',
-          findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })).status
-      if (s !== 'resolved') mergeBlocked = true
+          findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })
+      if (ownerGatedFinal(r)) ledger(group[0].finding_id || firstCard, 'final-merge-blocker', 'DEFERRED-OWNER-GATED', `${r.deferralClass} — follow-up tracked, merge NOT blocked`)
+      else if (r.status !== 'resolved') mergeBlocked = true
     }
     if (finalSummary && finalSummary.failingGates && finalSummary.failingGates.length) {
-      const s = (await resolve('qa-fail', firstCard, `final gates failing: ${finalSummary.failingGates.join(', ')}`, { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: 'code' })).status
-      if (s !== 'resolved') mergeBlocked = true
+      const r = await resolve('qa-fail', firstCard, `final gates failing: ${finalSummary.failingGates.join(', ')}`, { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: 'code' })
+      if (ownerGatedFinal(r)) ledger(firstCard, 'final-qa', 'DEFERRED-OWNER-GATED', `${r.deferralClass} — follow-up tracked, merge NOT blocked`)
+      else if (r.status !== 'resolved') mergeBlocked = true
     }
   } else {
     ledger(firstCard, 'final-review', 'SKIPPED', degraded ? 'degraded' : 'workflow returned null')
@@ -832,7 +877,7 @@ function buildTelemetry() {
     // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
     total_tokens: totalTokens,
     agent_count: agentCount,
-    per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
+    per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], gates: (r.gates || []).length })),
     stats_requested: !!FLAGS.stats,
   }
 }
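
The two Phase Final decisions in the hunks above (the owner-gated merge gate and the coverage-gated `singleCard` skip) can be isolated as pure functions for illustration. This is a sketch under assumptions: `isMergeBlocked` and `slimFinalPass` are hypothetical names, and the result objects carry only the `status`, `deferralClass`, and `reviewersRun` fields that appear in the diff.

```javascript
// Hypothetical helpers mirroring the Phase Final decisions; names are illustrative.
const ownerGated = (r) =>
  r.status === 'followup' &&
  (r.deferralClass === 'owner-gated' || r.deferralClass === 'not-a-code-defect')

// A resolve outcome blocks the merge unless it is resolved or owner-gated.
const isMergeBlocked = (results) =>
  results.some((r) => !ownerGated(r) && r.status !== 'resolved')

// Slim the final pass only for a single-card batch whose doc-reviewer actually ran.
const slimFinalPass = (committed) =>
  committed.length === 1 && (committed[0].reviewersRun || []).includes('doc-reviewer')

const pendingInfraStep = [
  { status: 'resolved' },
  { status: 'followup', deferralClass: 'owner-gated' }, // e.g. a pending remote db:push
]
const realDefect = [{ status: 'followup', deferralClass: 'unresolved' }]

console.log(isMergeBlocked(pendingInfraStep), isMergeBlocked(realDefect))
console.log(slimFinalPass([{ card: 'C-1', reviewersRun: ['codex'] }]))
```

With these inputs the owner-gated follow-up does not block the merge while the unresolved defect does, and a single card reviewed only by `codex` (the `light` profile) does not qualify for the slim final pass.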

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.24.2",
+  "version": "4.25.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"