baldart 4.24.3 → 4.25.0
- package/CHANGELOG.md +10 -0
- package/VERSION +1 -1
- package/framework/.claude/workflows/new2.js +47 -10
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,16 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.25.0] - 2026-06-11
+
+**`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
+
+### Changed
+
+- **`framework/.claude/workflows/new2.js` — relevance-gated `balanced` fan-out.** `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` (`.md`/`.mdx` or under `paths.docs_dir|references_dir|wiki_dir|prd_dir|design_system`) → doc-reviewer (on a doc-ONLY card it IS the core finder — code-reviewer and qa-sentinel are skipped); `touchesApiData` (superset of the final review's `hasApiDataFiles` regex) → api-perf-cost-auditor; the v4.24.2 `securityRelevant` gate → security-reviewer. Surface = `scopeFiles ∪ MAY-EDIT`, so a DoD-mandated doc the implementer FORGOT to edit still triggers doc-reviewer (it's in MAY-EDIT). **Fail-safes**: `deep` keeps the unconditional full fan-out (Rule C escalation is respected); an empty surface (no evidence) falls back to the full fan-out — absence of evidence is not evidence of absence. `light`/`skip` unchanged.
+- **`new2.js` Phase Final — the F-041 `singleCard`/slim skip is now coverage-gated.** The final review's doc pass is skipped for a single-card batch ONLY when doc-reviewer ACTUALLY ran per-card (`reviewersRun`); otherwise it runs as the missing-doc-update safety net — so gating a per-card specialist off never leaves its domain unreviewed at batch level. This also closes a **latent F-041 hole**: under `light`, doc-reviewer never ran per-card, yet the slim skip still suppressed the final doc pass for single-card batches.
+- **`new2.js` telemetry — `per_card[].reviewers`** records the gated matrix per card, so the A/B spawn accounting can measure exactly what the gating saves.
+
 ## [4.24.3] - 2026-06-10
 
 **`new2`: two follow-ups from the v4.24.2 logic review — epic closure no longer misses deferred cards, and the owner-gated classification reaches the final review.** Both are "parallel location" gaps of earlier fixes (the v4.17.2 meta-lesson): v4.22.1's epic-closure and v4.24.1's owner-gated deferral each missed one site. **PATCH** (bug-fix to the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
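The gating rules described in the 4.25.0 entry can be sketched as a standalone function. This is a minimal illustration, not the shipped code: `selectReviewers` and its options object are hypothetical names, the sample paths are invented, and the api/data pattern is deliberately abbreviated.

```javascript
// Hypothetical standalone version of the per-card review matrix.
// Reviewer names and profile names follow the changelog entry above.
function selectReviewers({ profile, surface, docDirs = [], securityRelevant = false, codexAvail = false }) {
  const isDocFile = (f) => /\.(md|mdx)$/i.test(f) || docDirs.some((d) => f.includes(d))
  const touchesDocs = surface.some(isDocFile)
  const touchesCode = surface.some((f) => !isDocFile(f))
  // Abbreviated pattern for illustration only (assumption, not the full regex).
  const touchesApiData = surface.some((f) => /api\/|\.sql$|migrations?\//i.test(f))
  const fullFanout = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor']
    .concat(securityRelevant ? ['security-reviewer'] : [])

  if (profile === 'skip') return []
  if (profile === 'light') return codexAvail ? ['codex'] : ['code-reviewer']
  // Fail-safes: `deep` and an empty surface both get the full fan-out.
  if (profile === 'deep' || surface.length === 0) return fullFanout

  const reviewers = []
  if (touchesCode) reviewers.push('code-reviewer')
  if (touchesDocs) reviewers.push('doc-reviewer')
  if (touchesCode) reviewers.push('qa-sentinel')
  if (touchesApiData) reviewers.push('api-perf-cost-auditor')
  if (securityRelevant) reviewers.push('security-reviewer')
  return reviewers
}

const docOnly = selectReviewers({ profile: 'balanced', surface: ['docs/guide.md'] })
const codeOnly = selectReviewers({ profile: 'balanced', surface: ['src/util.js'] })
const emptySurface = selectReviewers({ profile: 'balanced', surface: [] })
console.log(docOnly)      // [ 'doc-reviewer' ]
console.log(codeOnly)     // [ 'code-reviewer', 'qa-sentinel' ]
console.log(emptySurface) // the four-specialist full fan-out
```

A doc-only card spawns one reviewer instead of four, which is the saving the entry describes, while an empty surface still gets the conservative full set.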
package/VERSION
CHANGED
@@ -1 +1 @@
-4.
+4.25.0
package/framework/.claude/workflows/new2.js
CHANGED
@@ -460,12 +460,40 @@ async function runCard(cardId, cardPath) {
 // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
 // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
 // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
-// → code-reviewer) cross-checks the Codex finding before a fix/followup.
+// → code-reviewer) cross-checks the Codex finding before a fix/followup.
 const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
-
-
-
-
+// B6 (v4.25.0) — deterministic per-card review matrix. `balanced` no longer means "every
+// specialist on every card": each one runs IFF its domain is evidenced by the card's actual
+// surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
+// `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
+// to high-risk cards — respect the escalation). Coverage holds at batch level: the final
+// review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
+// gated on doc-reviewer having ACTUALLY run per-card — see Phase Final), and its api-perf
+// pass keys on hasApiDataFiles with a regex this matrix supersets — so gating OFF a per-card
+// specialist never leaves its domain unreviewed.
+const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
+const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
+const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
+const touchesDocs = surface.some(isDocFile)
+const touchesCode = surface.some((f) => !isDocFile(f))
+// superset of new-final-review's hasApiDataFiles regex — per-card coverage may exceed the
+// final pass, never lag it.
+const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
+const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
+const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
+let reviewers
+if (reviewProfile === 'skip') reviewers = []
+else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
+else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
+else { // balanced — relevance-gated
+reviewers = []
+if (touchesCode) reviewers.push('code-reviewer')
+if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
+if (touchesCode) reviewers.push('qa-sentinel') // skip for doc-only cards — no behavior to QA
+if (touchesApiData) reviewers.push('api-perf-cost-auditor')
+if (securityRelevant) reviewers.push('security-reviewer')
+}
+if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
 const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
 properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
 let reviewResults = []
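Both classifiers in this hunk match substrings anywhere in the path: `isDocFile` uses `String.prototype.includes` against the configured directories, and the api/data pattern is an unanchored alternation. The sketch below runs the two patterns, copied from the hunk, against a few invented paths to show what that implies. The `docDirs` value and all sample paths are assumptions chosen for illustration.

```javascript
// Patterns copied from the hunk above; docDirs and the sample paths are hypothetical.
const API_DATA = /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i
const docDirs = ['docs']
const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))

const samples = ['src/api/users.ts', 'src/observer.ts', 'src/docs-helper.ts', 'README.md', 'src/math.ts']
const rows = samples.map((file) => ({ file, doc: isDocFile(file), apiData: API_DATA.test(file) }))
console.table(rows)
```

With these inputs, `src/observer.ts` counts as api/data because it contains `server`, and `src/docs-helper.ts` counts as a doc file because it contains `docs`. Over-matching of this kind errs toward running a reviewer, which fits the stated goal that per-card coverage may exceed the final pass but never lag it. A path misclassified as a doc does drop out of `touchesCode`, though, so a card whose only code file matches a doc directory name would skip code-reviewer and qa-sentinel under `balanced`.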
@@ -535,7 +563,7 @@ async function runCard(cardId, cardPath) {
 for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
 }
 
-if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
+if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, reviewersRun: reviewers, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
 
 // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
 // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
@@ -578,6 +606,9 @@ async function runCard(cardId, cardPath) {
 // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
 deferred: deferredOpen,
 deferredClasses: Array.from(deferredClasses),
+// B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
+// decision on this, and per_card telemetry records it for the A/B spawn accounting.
+reviewersRun: reviewers,
 scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
 }
 }
@@ -675,9 +706,15 @@ if (committed.length && !degraded) {
 firstCardId: firstCard, worktreePath: sharedCtx.worktreePath, baseBranch: TRUNK,
 cardPaths: committed.map((r) => pathById[r.card]).filter(Boolean),
 reviewScopeFiles, archBaselinePaths: allArch, hasApiDataFiles, config: cfg,
-// F-041 —
-//
-
+// F-041 + B6 — slim the final pass (skip its doc-reviewer) ONLY when doc-reviewer
+// ACTUALLY ran per-card for this single card. The old unconditional `length === 1` was
+// built on the premise "doc + api-perf already ran per-card" — false under the gated
+// review matrix (and it was ALREADY false under `light`, a latent F-041 coverage hole
+// this closes). When per-card doc review was gated off, the final doc pass is the
+// missing-doc-update safety net and must run even for a single card. api-perf at the
+// final pass is independently gated by hasApiDataFiles (regex the per-card matrix
+// supersets), so its coverage is consistent either way.
+singleCard: committed.length === 1 && ((committed[0].reviewersRun || []).includes('doc-reviewer')),
 })
 } catch (e) { if (e && isTransient(e)) noteDegraded('outage'); final = null }
 
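The coverage gate on `singleCard` can be checked in isolation. The sketch below lifts the expression from the hunk into a hypothetical helper, `slimFinalPass`, and feeds it invented batch results. The card ids and reviewer lists are assumptions.

```javascript
// Hypothetical helper wrapping the `singleCard` expression from the hunk above.
const slimFinalPass = (committed) =>
  committed.length === 1 && (committed[0].reviewersRun || []).includes('doc-reviewer')

// Single card, doc-reviewer ran per-card: the final doc pass may be skipped.
const reviewedDocs = slimFinalPass([{ card: 'CARD-1', reviewersRun: ['code-reviewer', 'doc-reviewer', 'qa-sentinel'] }])
// Single card at `light` (Codex only): the final doc pass must run.
const lightProfile = slimFinalPass([{ card: 'CARD-1', reviewersRun: ['codex'] }])
// Result without reviewersRun (for example from an older code path): treated as not reviewed.
const missingField = slimFinalPass([{ card: 'CARD-1' }])
// Two cards never qualify for the slim pass, whatever ran per-card.
const twoCards = slimFinalPass([
  { card: 'CARD-1', reviewersRun: ['doc-reviewer'] },
  { card: 'CARD-2', reviewersRun: ['doc-reviewer'] },
])

console.log(reviewedDocs, lightProfile, missingField, twoCards) // true false false false
```

The `|| []` fallback means a missing `reviewersRun` defaults to running the final doc pass, which is the conservative direction.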
@@ -840,7 +877,7 @@ function buildTelemetry() {
 // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
 total_tokens: totalTokens,
 agent_count: agentCount,
-per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
+per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], gates: (r.gates || []).length })),
 stats_requested: !!FLAGS.stats,
 }
 }
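One way to use the new `per_card[].reviewers` field for the A/B spawn accounting is to compare the recorded reviewer count against a fixed baseline. The sketch below is an assumption about how such accounting might be done, not part of the package: the telemetry fragment is invented, and the baseline of four specialists per card assumes the old `balanced` fan-out without security-reviewer.

```javascript
// Hypothetical telemetry fragment in the per_card shape added by this release.
const perCard = [
  { card: 'CARD-1', reviewers: ['doc-reviewer'] },
  { card: 'CARD-2', reviewers: ['code-reviewer', 'qa-sentinel'] },
]

// Assumption: the pre-4.25.0 `balanced` baseline is four specialists per card.
const BASELINE_PER_CARD = 4

const actual = perCard.reduce((n, c) => n + (c.reviewers || []).length, 0)
const baseline = perCard.length * BASELINE_PER_CARD
const saved = baseline - actual

console.log({ actual, baseline, saved }) // { actual: 3, baseline: 8, saved: 5 }
```

A real comparison would also need to account for cards where security-reviewer ran and for the final-pass reviewers, which this fragment ignores.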