baldart 4.31.1 → 4.32.0

package/CHANGELOG.md CHANGED
@@ -5,6 +5,21 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.32.0] - 2026-06-11
9
+
10
+ **`new2`: `deep` is now relevance-gated too — it no longer fires all 5 reviewers on every card regardless of content.** Diagnosing a real batch (the FEAT-0023 supplier-associated-users epic — permissions/RLS/migrations, so 7 of 11 cards are legitimately `review_profile: deep` per `prd-card-writer` Rule C) exposed an **asymmetry**, not a misclassification: the per-card review matrix relevance-gated `balanced` (a specialist runs only if its domain is evidenced by `scopeFiles ∪ MAY-EDIT`) but left `deep` on an **unconditional 5-way fan-out** (`FULL_FANOUT` = code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor + security-reviewer). So a `deep` card with no doc surface still paid for a doc-reviewer, and one with no API/data surface still paid for api-perf-cost-auditor — every time. This is exactly the "every card gets a full review regardless of what's inside" symptom.
11
+
12
+ The fix unifies the review MODEL: **the profile controls review DEPTH, the surface controls review BREADTH.**
13
+ - `deep` and `balanced` now share the same relevance-gated reviewer SET (extracted into `relevanceGated()`): `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` → doc-reviewer (the core finder on a doc-only card); `touchesApiData` → api-perf-cost-auditor; `securityRelevant` → security-reviewer.
14
+ - `deep`'s extra DEPTH is unchanged and is carried downstream, NOT by the fan-out set: the full QA suite + Codex-full posture reach each spawned reviewer via `cardBrief`'s "Review profile" line, and the Phase-1 architect audit still keys on `reviewProfile === 'deep'` (`needAudit`). So the reviewers that DO run review at full depth — only the ones whose domain the card never touches are dropped.
15
+ - **Fail-safe preserved**: `noEvidence` (empty `scopeFiles ∪ MAY-EDIT`) still falls back to `FULL_FANOUT` for BOTH balanced and deep — absence of evidence ≠ evidence of absence. `light`/`skip` unchanged. Batch-level coverage is unchanged: the Final Review's batch-wide doc + api/perf passes remain the safety nets, and `securityRelevant` (which a `deep` card almost always trips) keeps security-reviewer on genuinely sensitive cards.
16
+
17
+ Not touched: `prd-card-writer` Rule C (it classifies correctly — `deep` for migration/RLS/permission/schema/HIGH-integration cards, `light` for pure-UI, `skip` for epic/doc — verified against the FEAT-0023 cards) and `/new`'s interactive prose path (which already scales per-card by profile + relevance — api-perf is deferred to the Final Review F.3, qa deferred at balanced, doc deferred at light-no-doc — so the unconditional-deep fan-out was unique to `new2.js`). **MINOR** (review-behavior change on the EXPERIMENTAL `new2` surface only; no config key — the schema-change propagation rule does not apply; no change to `/new`).
18
+
19
+ ### Changed
20
+
21
+ - **`framework/.claude/workflows/new2.js`** — per-card review matrix (B7): the relevance gate now applies to `deep` as well as `balanced` via a shared `relevanceGated()` helper; the `reviewProfile === 'deep'` arm of the `FULL_FANOUT` branch is removed, leaving only `noEvidence` as the conservative full-fan-out fail-safe. The `review-matrix` ledger row's `→conservative-full` annotation now fires for any `noEvidence` profile (was `balanced`-only). Comment block updated to document the depth-vs-breadth model.
22
+
8
23
  ## [4.31.1] - 2026-06-11
9
24
 
10
25
  **`new2`: fix v4.31.0's dedup coverage + drop empty residuals + honest A/B cost — verified against the real FEAT-0022 telemetry/report (not the narrative).** Reading the actual run's `skill-runs.jsonl` + workflow report (instead of the prior turn's recollection) exposed three things v4.31.0 missed:
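The depth-versus-breadth model described in the 4.32.0 entry above can be sketched as a small standalone function. This is a minimal illustration, not the code in `new2.js`: the `selectReviewers` name, its argument shape, and the sample migration path are all assumptions made for the example. It follows the source hunk in adding `security-reviewer` to the full fan-out only when `securityRelevant` is set.

```javascript
// Hypothetical sketch: `selectReviewers` and its argument object are illustrative
// assumptions, not the real new2.js API. The branching mirrors the v4.32.0 hunk.
function selectReviewers({ reviewProfile, surface, touchesCode, touchesDocs, touchesApiData, securityRelevant, codexAvail }) {
  // Empty surface means no evidence either way, so fall back to the full fan-out.
  const noEvidence = surface.length === 0
  const fullFanout = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor']
    .concat(securityRelevant ? ['security-reviewer'] : [])

  if (reviewProfile === 'skip') return []
  if (reviewProfile === 'light') return codexAvail ? ['codex'] : ['code-reviewer']
  if (noEvidence) return fullFanout

  // balanced AND deep: breadth is decided by the surface, not by the profile.
  const rs = []
  if (touchesCode) rs.push('code-reviewer')
  if (touchesDocs) rs.push('doc-reviewer')
  if (touchesCode) rs.push('qa-sentinel')
  if (touchesApiData) rs.push('api-perf-cost-auditor')
  if (securityRelevant) rs.push('security-reviewer')
  return rs
}

// Made-up card: a migration with no doc surface (the file path is hypothetical).
const migrationCard = {
  reviewProfile: 'deep',
  surface: ['supabase/migrations/0042_rls.sql'],
  touchesCode: true,
  touchesDocs: false,
  touchesApiData: true,
  securityRelevant: true,
  codexAvail: false,
}

const deepReviewers = selectReviewers(migrationCard)
const balancedReviewers = selectReviewers({ ...migrationCard, reviewProfile: 'balanced' })
const failSafe = selectReviewers({ ...migrationCard, surface: [] })

console.log(deepReviewers.join('+'))
// prints "code-reviewer+qa-sentinel+api-perf-cost-auditor+security-reviewer"
```

In this sketch the `deep` and `balanced` cards get the same reviewer set and `doc-reviewer` is dropped because the card has no doc surface. Clearing `surface` restores the full fan-out.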
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.31.1
1
+ 4.32.0
package/framework/.claude/workflows/new2.js CHANGED
@@ -578,15 +578,22 @@ async function runCard(cardId, cardPath) {
578
578
  // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
579
579
  // → code-reviewer) cross-checks the Codex finding before a fix/followup.
580
580
  const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
581
- // B6 (v4.25.0) — deterministic per-card review matrix. `balanced` no longer means "every
582
- // specialist on every card": each one runs IFF its domain is evidenced by the card's actual
583
- // surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
584
- // `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
585
- // to high-risk cards; respect the escalation). Coverage holds at batch level: the final
586
- // review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
587
- // gated on doc-reviewer having ACTUALLY run per-card; see Phase Final), and its api-perf
588
- // pass keys on hasApiDataFiles with a regex this matrix supersets, so gating OFF a per-card
589
- // specialist never leaves its domain unreviewed.
581
+ // B6 (v4.25.0) / B7 (v4.32.0) — deterministic per-card review matrix. A specialist runs IFF its
582
+ // domain is evidenced by the card's actual surface (scopeFiles ∪ MAY-EDIT), computed
583
+ // deterministically HERE in JS and audited via a `review-matrix` ledger row. Since v4.32.0 this
584
+ // relevance gate applies to `deep` TOO: `deep` no longer means an unconditional 5-way fan-out.
585
+ // The asymmetry it removed: `balanced` was surface-gated while `deep` paid for doc-reviewer on a
586
+ // card with no doc surface and api-perf-cost-auditor on a card with no API/data surface, every
587
+ // time. The REVIEW MODEL is now: the **profile controls review DEPTH** (deep: full QA suite +
588
+ // Codex full + the Phase-1 audit gate, all keyed on `reviewProfile` downstream and carried to
589
+ // each spawned reviewer via `cardBrief`'s "Review profile" line, so the depth of the reviewers
590
+ // that DO run is unchanged), and the **surface controls review BREADTH**. `noEvidence` (empty
591
+ // surface) stays the conservative full-fan-out fail-safe for BOTH balanced and deep — absence of
592
+ // evidence ≠ evidence of absence. Coverage holds at batch level: the final review's doc pass
593
+ // stays the missing-doc-update safety net (its singleCard/slim skip is gated on doc-reviewer
594
+ // having ACTUALLY run per-card — see Phase Final), and its api-perf pass keys on hasApiDataFiles
595
+ // with a regex this matrix supersets — so gating OFF a per-card specialist never leaves its
596
+ // domain unreviewed.
590
597
  const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
591
598
  const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
592
599
  const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
@@ -597,19 +604,23 @@ async function runCard(cardId, cardPath) {
597
604
  const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
598
605
  const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
599
606
  const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
607
+ // Relevance-gated BREADTH — shared by `balanced` AND `deep` (profile-gated DEPTH is carried
608
+ // downstream via cardBrief + needAudit, not by which specialists spawn).
609
+ const relevanceGated = () => {
610
+ const rs = []
611
+ if (touchesCode) rs.push('code-reviewer')
612
+ if (touchesDocs) rs.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
613
+ if (touchesCode) rs.push('qa-sentinel') // skip for doc-only cards — no behavior to QA
614
+ if (touchesApiData) rs.push('api-perf-cost-auditor')
615
+ if (securityRelevant) rs.push('security-reviewer')
616
+ return rs
617
+ }
600
618
  let reviewers
601
619
  if (reviewProfile === 'skip') reviewers = []
602
620
  else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
603
- else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
604
- else { // balanced — relevance-gated
605
- reviewers = []
606
- if (touchesCode) reviewers.push('code-reviewer')
607
- if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
608
- if (touchesCode) reviewers.push('qa-sentinel') // skip for doc-only cards — no behavior to QA
609
- if (touchesApiData) reviewers.push('api-perf-cost-auditor')
610
- if (securityRelevant) reviewers.push('security-reviewer')
611
- }
612
- if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
621
+ else if (noEvidence) reviewers = FULL_FANOUT // balanced/deep, no surface evidence → conservative full
622
+ else reviewers = relevanceGated() // balanced AND deep: surface-gated breadth
623
+ if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
613
624
  const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
614
625
  properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
615
626
  let reviewResults = []
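The `review-matrix` ledger change in the hunk above can be reproduced in isolation. This is a hedged sketch only: `reviewMatrixRow` is a hypothetical helper that wraps the same template literal as the added line, and the real code passes this string to `g('review-matrix', 'PLANNED', ...)`, which is not modeled here.

```javascript
// Hypothetical helper: builds the ledger detail string with the same template
// literal as the v4.32.0 line. The function name and argument object are assumptions.
function reviewMatrixRow({ reviewProfile, noEvidence, reviewers, touchesDocs, touchesCode, touchesApiData, securityRelevant }) {
  // Since v4.32.0 the annotation depends on noEvidence alone (it was balanced-only before).
  const tag = noEvidence ? '→conservative-full (no surface evidence)' : ''
  return `[${reviewProfile}${tag}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`
}

// A deep card with an empty surface: full fan-out, annotated as conservative.
const emptySurfaceRow = reviewMatrixRow({
  reviewProfile: 'deep',
  noEvidence: true,
  reviewers: ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'],
  touchesDocs: false,
  touchesCode: false,
  touchesApiData: false,
  securityRelevant: false,
})

// A deep card with a code-only surface: gated set, no annotation.
const gatedRow = reviewMatrixRow({
  reviewProfile: 'deep',
  noEvidence: false,
  reviewers: ['code-reviewer', 'qa-sentinel'],
  touchesDocs: false,
  touchesCode: true,
  touchesApiData: false,
  securityRelevant: false,
})

console.log(emptySurfaceRow)
console.log(gatedRow)
```

Because the condition as written is only `noEvidence`, a `light` card with an empty surface would apparently carry the same annotation even though its reviewer list is not the full fan-out.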
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "baldart",
3
- "version": "4.31.1",
3
+ "version": "4.32.0",
4
4
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
5
5
  "bin": {
6
6
  "baldart": "./bin/baldart.js"