baldart 4.24.2 → 4.25.0
- package/CHANGELOG.md +19 -0
- package/VERSION +1 -1
- package/framework/.claude/skills/new2/SKILL.md +8 -0
- package/framework/.claude/workflows/new2.js +60 -15
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,25 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.25.0] - 2026-06-11
+
+**`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
+
+### Changed
+
+- **`framework/.claude/workflows/new2.js` — relevance-gated `balanced` fan-out.** `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` (`.md`/`.mdx` or under `paths.docs_dir|references_dir|wiki_dir|prd_dir|design_system`) → doc-reviewer (on a doc-ONLY card it IS the core finder — code-reviewer and qa-sentinel are skipped); `touchesApiData` (superset of the final review's `hasApiDataFiles` regex) → api-perf-cost-auditor; the v4.24.2 `securityRelevant` gate → security-reviewer. Surface = `scopeFiles ∪ MAY-EDIT`, so a DoD-mandated doc the implementer FORGOT to edit still triggers doc-reviewer (it's in MAY-EDIT). **Fail-safes**: `deep` keeps the unconditional full fan-out (Rule C escalation is respected); an empty surface (no evidence) falls back to the full fan-out — absence of evidence is not evidence of absence. `light`/`skip` unchanged.
+- **`new2.js` Phase Final — the F-041 `singleCard`/slim skip is now coverage-gated.** The final review's doc pass is skipped for a single-card batch ONLY when doc-reviewer ACTUALLY ran per-card (`reviewersRun`); otherwise it runs as the missing-doc-update safety net — so gating a per-card specialist off never leaves its domain unreviewed at batch level. This also closes a **latent F-041 hole**: under `light`, doc-reviewer never ran per-card, yet the slim skip still suppressed the final doc pass for single-card batches.
+- **`new2.js` telemetry — `per_card[].reviewers`** records the gated matrix per card, so the A/B spawn accounting can measure exactly what the gating saves.
+
+## [4.24.3] - 2026-06-10
+
+**`new2`: two follow-ups from the v4.24.2 logic review — epic closure no longer misses deferred cards, and the owner-gated classification reaches the final review.** Both are "parallel location" gaps of earlier fixes (the v4.17.2 meta-lesson): v4.22.1's epic-closure and v4.24.1's owner-gated deferral each missed one site. **PATCH** (bug-fix to the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
+
+### Fixed
+
+- **`new2/SKILL.md` (Step 5.2) — epic-closure re-check after deferred-DONE.** The merge agent's epic-closure (Phase 6b step 5e) runs BEFORE the skill marks deferred cards DONE post-run — so an epic whose last open child was a deferred card stayed TODO forever (nothing re-checked it). The skill now re-runs the all-children-DONE check for the parents of the cards it just marked DONE, closing the epic in the same reconciliation commit.
+- **`new2.js` (Phase Final) — owner-gated deferrals no longer block the merge at final review.** The `merge-blocker`/`qa-fail` resolves checked only `.status`: a finding whose sole remedy is an external/infra step (e.g. the same pending remote `db:push` a reviewer re-raises batch-wide) set `mergeBlocked` and stranded a complete batch. The `deferralClass` check from the per-card loops (v4.24.1/v4.24.2) now applies here too: `owner-gated`/`not-a-code-defect` → follow-up tracked + `DEFERRED-OWNER-GATED` ledger row, merge proceeds; a genuine unresolved code defect still blocks.
+
 ## [4.24.2] - 2026-06-10
 
 **`new2`: holistic logic review — deterministic owner_agent routing, a merge gate that no longer strands the batch, `deferralClass` end-to-end, and −N agent spawns per run.** A full logic review of `new2.js`/`new2-resolve.js`/`SKILL.md` against `/new`'s reference modules found four correctness defects and five sources of wasted spawns. The headline routing bug: the pre-flight's `ownerAgent` was passed RAW as `agentType` — the G25 "unknown→coder" rule lived only in the prompt, so any freeform value (`claude`, `backend`, a typo) was a PERMANENT spawn error → card `failed` → (combined with the merge-gate bug) the whole batch unmerged. **PATCH** (bug-fix/hardening of the EXPERIMENTAL `new2` surface only; **no `baldart.config.yml` key**, **no change to `/new`** — the schema-change propagation rule does not apply).
package/VERSION
CHANGED
@@ -1 +1 @@
-4.
+4.25.0
package/framework/.claude/skills/new2/SKILL.md
CHANGED
@@ -157,6 +157,14 @@ returns when the batch is done. It returns:
 follow-up <id>"`. NEVER auto-DONE it — a follow-up tracks the gap, but DONE would lie (F-029).
 **If a card's follow-up could NOT be created in step 1, leave it NON-DONE and surface it** —
 fail-loud; NEVER mark a card DONE with a silently-dropped requirement (F-029).
+**Then re-run the epic-closure check for the cards just marked DONE.** The merge agent's
+epic-closure (Phase 6b step 5e) ran BEFORE this step, when the deferred cards were still
+NON-DONE — so an epic whose last open child was a deferred card is still TODO and nothing
+else will ever close it. For each distinct `group.parent` of the cards marked DONE here: if
+`grep -l "parent: <EPIC-ID>" ${paths.backlog_dir}/*.yml | xargs grep -L "status: DONE"`
+prints nothing, set the epic card `status: DONE` + `completed_date` + note
+`"epic-closure gate — all children DONE (post-run, new2 skill)"`, folded into the SAME
+reconciliation commit. If any child is still open, leave the epic untouched.
 3. **Resume if degraded.** If `degraded` is true, re-invoke the workflow with
 `Workflow({ scriptPath, resumeFromRunId })` (same `args` + the new `ts`). The
 per-card **skip-completed** guard makes the resume idempotent — already-committed
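The epic-closure re-check added in the hunk above can be pictured as a small pure function. This is only a sketch under stated assumptions: the skill itself works on YAML card files through the `grep` pipeline shown in the hunk, whereas the in-memory card shape `{ id, parent, status }` and the name `epicsReadyToClose` are hypothetical and exist only for illustration.

```javascript
// Hypothetical in-memory card shape: { id, parent, status }.
// The real skill inspects YAML files under paths.backlog_dir instead.
function epicsReadyToClose(cards, justMarkedDone) {
  const doneNow = new Set(justMarkedDone)
  // Distinct parents of the cards that were just marked DONE.
  const parents = new Set(
    cards.filter((c) => doneNow.has(c.id) && c.parent).map((c) => c.parent)
  )
  // An epic is closable only when every one of its children is DONE.
  return [...parents].filter((epicId) =>
    cards.filter((c) => c.parent === epicId).every((c) => c.status === 'DONE')
  )
}

const cards = [
  { id: 'EPIC-1', parent: null, status: 'TODO' },
  { id: 'C-1', parent: 'EPIC-1', status: 'DONE' },
  { id: 'C-2', parent: 'EPIC-1', status: 'DONE' }, // deferred card, just marked DONE
  { id: 'EPIC-2', parent: null, status: 'TODO' },
  { id: 'C-3', parent: 'EPIC-2', status: 'DONE' }, // just marked DONE
  { id: 'C-4', parent: 'EPIC-2', status: 'IN_PROGRESS' }, // still open
]

// EPIC-1 has all children DONE; EPIC-2 still has an open child and stays untouched.
console.log(epicsReadyToClose(cards, ['C-2', 'C-3'])) // returns ['EPIC-1']
```

Only the parents of the cards marked DONE in this step are examined, which matches the "for each distinct `group.parent`" wording: epics unrelated to the reconciliation are never re-evaluated.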
package/framework/.claude/workflows/new2.js
CHANGED
@@ -460,12 +460,40 @@ async function runCard(cardId, cardPath) {
 // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
 // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
 // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
-// → code-reviewer) cross-checks the Codex finding before a fix/followup.
+// → code-reviewer) cross-checks the Codex finding before a fix/followup.
 const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
-
-
-
-
+// B6 (v4.25.0) — deterministic per-card review matrix. `balanced` no longer means "every
+// specialist on every card": each one runs IFF its domain is evidenced by the card's actual
+// surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
+// `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
+// to high-risk cards — respect the escalation). Coverage holds at batch level: the final
+// review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
+// gated on doc-reviewer having ACTUALLY run per-card — see Phase Final), and its api-perf
+// pass keys on hasApiDataFiles with a regex this matrix supersets — so gating OFF a per-card
+// specialist never leaves its domain unreviewed.
+const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
+const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
+const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
+const touchesDocs = surface.some(isDocFile)
+const touchesCode = surface.some((f) => !isDocFile(f))
+// superset of new-final-review's hasApiDataFiles regex — per-card coverage may exceed the
+// final pass, never lag it.
+const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
+const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
+const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
+let reviewers
+if (reviewProfile === 'skip') reviewers = []
+else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
+else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
+else { // balanced — relevance-gated
+reviewers = []
+if (touchesCode) reviewers.push('code-reviewer')
+if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
+if (touchesCode) reviewers.push('qa-sentinel') // skip for doc-only cards — no behavior to QA
+if (touchesApiData) reviewers.push('api-perf-cost-auditor')
+if (securityRelevant) reviewers.push('security-reviewer')
+}
+if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
 const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
 properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
 let reviewResults = []
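To see how the matrix in this hunk behaves on concrete cards, the gating can be restated as a standalone function. This is a sketch, not the workflow's code: `selectReviewers`, its options object, and the sample paths and `docDirs` values are hypothetical, while the regexes and the branch order are copied from the hunk.

```javascript
// Same api/data pattern as the hunk (no `g` flag, so .test() is stateless).
const API_DATA_RE = /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i

// Hypothetical wrapper: the real code computes these values inline inside runCard().
function selectReviewers({ reviewProfile, surface, docDirs = [], securityRelevant = false, codexAvail = false }) {
  const isDocFile = (f) => /\.(md|mdx)$/i.test(f) || docDirs.some((d) => f.includes(d))
  const touchesDocs = surface.some(isDocFile)
  const touchesCode = surface.some((f) => !isDocFile(f))
  const touchesApiData = surface.some((f) => API_DATA_RE.test(f))
  const fullFanout = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor']
    .concat(securityRelevant ? ['security-reviewer'] : [])

  if (reviewProfile === 'skip') return []
  if (reviewProfile === 'light') return codexAvail ? ['codex'] : ['code-reviewer']
  // Fail-safes: `deep` and an empty surface both keep the full fan-out.
  if (reviewProfile === 'deep' || surface.length === 0) return fullFanout

  // balanced: each specialist runs only if its domain is evidenced.
  const reviewers = []
  if (touchesCode) reviewers.push('code-reviewer')
  if (touchesDocs) reviewers.push('doc-reviewer')
  if (touchesCode) reviewers.push('qa-sentinel')
  if (touchesApiData) reviewers.push('api-perf-cost-auditor')
  if (securityRelevant) reviewers.push('security-reviewer')
  return reviewers
}

const docDirs = ['docs/'] // hypothetical paths.docs_dir value
// Doc-only card: returns ['doc-reviewer']
console.log(selectReviewers({ reviewProfile: 'balanced', surface: ['docs/guide.md'], docDirs }))
// Pure-code card: returns ['code-reviewer', 'qa-sentinel']
console.log(selectReviewers({ reviewProfile: 'balanced', surface: ['src/utils/format.ts'], docDirs }))
// Substring match: a doc whose path contains "server" also triggers the api-perf pass.
// Returns ['doc-reviewer', 'api-perf-cost-auditor']
console.log(selectReviewers({ reviewProfile: 'balanced', surface: ['docs/server-setup.md'], docDirs }))
// Empty surface: returns all four reviewers of the full fan-out
console.log(selectReviewers({ reviewProfile: 'balanced', surface: [], docDirs }))
```

The third call shows that both `isDocFile` and the api/data pattern are substring matches, so the matrix can over-select on an unlucky path name. That appears consistent with the hunk's own comment that per-card coverage may exceed the final pass but should never lag it.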
@@ -535,7 +563,7 @@ async function runCard(cardId, cardPath) {
 for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
 }
 
-if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
+if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, reviewersRun: reviewers, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
 
 // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
 // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
@@ -578,6 +606,9 @@ async function runCard(cardId, cardPath) {
 // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
 deferred: deferredOpen,
 deferredClasses: Array.from(deferredClasses),
+// B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
+// decision on this, and per_card telemetry records it for the A/B spawn accounting.
+reviewersRun: reviewers,
 scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
 }
 }
@@ -675,9 +706,15 @@ if (committed.length && !degraded) {
 firstCardId: firstCard, worktreePath: sharedCtx.worktreePath, baseBranch: TRUNK,
 cardPaths: committed.map((r) => pathById[r.card]).filter(Boolean),
 reviewScopeFiles, archBaselinePaths: allArch, hasApiDataFiles, config: cfg,
-// F-041 —
-//
-
+// F-041 + B6 — slim the final pass (skip its doc-reviewer) ONLY when doc-reviewer
+// ACTUALLY ran per-card for this single card. The old unconditional `length === 1` was
+// built on the premise "doc + api-perf already ran per-card" — false under the gated
+// review matrix (and it was ALREADY false under `light`, a latent F-041 coverage hole
+// this closes). When per-card doc review was gated off, the final doc pass is the
+// missing-doc-update safety net and must run even for a single card. api-perf at the
+// final pass is independently gated by hasApiDataFiles (regex the per-card matrix
+// supersets), so its coverage is consistent either way.
+singleCard: committed.length === 1 && ((committed[0].reviewersRun || []).includes('doc-reviewer')),
 })
 } catch (e) { if (e && isTransient(e)) noteDegraded('outage'); final = null }
 
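The coverage-gated `singleCard` expression in this hunk can be exercised in isolation. A minimal sketch, assuming result objects that carry only the fields used here; the function name `slimFinalPass` is hypothetical, while the expression inside it is the one from the hunk.

```javascript
// Returns true only when the final doc pass can safely be skipped:
// exactly one committed card AND doc-reviewer actually ran on it.
function slimFinalPass(committed) {
  return committed.length === 1 && (committed[0].reviewersRun || []).includes('doc-reviewer')
}

const balancedDocCard = { card: 'C-1', reviewersRun: ['code-reviewer', 'doc-reviewer', 'qa-sentinel'] }
const lightCard = { card: 'C-2', reviewersRun: ['codex'] }
const legacyCard = { card: 'C-3' } // no reviewersRun recorded

console.log(slimFinalPass([balancedDocCard])) // true: doc review already happened per-card
console.log(slimFinalPass([lightCard])) // false: the final doc pass is the safety net
console.log(slimFinalPass([legacyCard])) // false: a missing record is treated as "did not run"
console.log(slimFinalPass([balancedDocCard, lightCard])) // false: not a single-card batch
```

The `|| []` fallback makes the gate conservative: a result without `reviewersRun` keeps the final doc pass rather than skipping it.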
@@ -699,17 +736,25 @@ if (committed.length && !degraded) {
 const area = (Array.isArray(f.files) && f.files[0]) || (f.file) || (f.domain || 'misc')
 ;(byArea[area] = byArea[area] || []).push(f)
 }
+// F-040 (parallel location, v4.24.3) — the owner-gated classification applies HERE too,
+// not just in the per-card loops: a final-review finding whose only remedy is an external/
+// infra step (e.g. the same pending remote db:push a reviewer re-raises batch-wide) must
+// NOT block the merge — the batch's code is complete and the residual is already tracked
+// as a follow-up with its deferralClass. Only a genuine unresolved CODE defect blocks.
+const ownerGatedFinal = (r) => r.status === 'followup' && (r.deferralClass === 'owner-gated' || r.deferralClass === 'not-a-code-defect')
 for (const area of Object.keys(byArea)) {
 const group = byArea[area]
-const
+const r = await resolve('merge-blocker', group[0].finding_id || firstCard,
 group.map((f) => `${f.severity} ${f.title}: ${f.evidence}`).join(' || '),
 { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: group[0].domain || 'code',
-findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })
-if (
+findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })
+if (ownerGatedFinal(r)) ledger(group[0].finding_id || firstCard, 'final-merge-blocker', 'DEFERRED-OWNER-GATED', `${r.deferralClass} — follow-up tracked, merge NOT blocked`)
+else if (r.status !== 'resolved') mergeBlocked = true
 }
 if (finalSummary && finalSummary.failingGates && finalSummary.failingGates.length) {
-const
-if (
+const r = await resolve('qa-fail', firstCard, `final gates failing: ${finalSummary.failingGates.join(', ')}`, { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: 'code' })
+if (ownerGatedFinal(r)) ledger(firstCard, 'final-qa', 'DEFERRED-OWNER-GATED', `${r.deferralClass} — follow-up tracked, merge NOT blocked`)
+else if (r.status !== 'resolved') mergeBlocked = true
 }
 } else {
 ledger(firstCard, 'final-review', 'SKIPPED', degraded ? 'degraded' : 'workflow returned null')
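The merge decision this hunk introduces reduces to a two-branch rule per resolve result. The sketch below reuses the `ownerGatedFinal` predicate from the hunk verbatim; `mergeDecision` and the sample result objects are hypothetical and only stand in for what `resolve()` returns in the workflow.

```javascript
// Predicate copied from the hunk.
const ownerGatedFinal = (r) => r.status === 'followup' && (r.deferralClass === 'owner-gated' || r.deferralClass === 'not-a-code-defect')

// Hypothetical helper: fold several resolve() results into one merge decision.
function mergeDecision(results) {
  const deferred = results.filter(ownerGatedFinal)
  // Anything that is neither owner-gated nor resolved blocks the merge.
  const mergeBlocked = results.some((r) => !ownerGatedFinal(r) && r.status !== 'resolved')
  return { mergeBlocked, deferredCount: deferred.length }
}

// A resolved finding plus a pending remote db:push style follow-up: merge proceeds.
console.log(mergeDecision([
  { status: 'resolved' },
  { status: 'followup', deferralClass: 'owner-gated' },
])) // returns { mergeBlocked: false, deferredCount: 1 }

// A follow-up without an owner-gated class is still a genuine blocker.
console.log(mergeDecision([
  { status: 'followup', deferralClass: 'unresolved' },
])) // returns { mergeBlocked: true, deferredCount: 0 }
```

Before this change only `.status` was checked, so the first batch would have been blocked as well.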
@@ -832,7 +877,7 @@ function buildTelemetry() {
 // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
 total_tokens: totalTokens,
 agent_count: agentCount,
-per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
+per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], gates: (r.gates || []).length })),
 stats_requested: !!FLAGS.stats,
 }
 }
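One way the new `per_card[].reviewers` field could feed spawn accounting is a simple tally per reviewer. The diff does not show how the A/B accounting is actually computed, so the function below is purely an illustrative assumption; `reviewerSpawnCounts` and the sample telemetry are hypothetical.

```javascript
// Hypothetical consumer of the telemetry: count planned reviewer spawns by name.
function reviewerSpawnCounts(perCard) {
  const counts = {}
  for (const entry of perCard) {
    for (const name of entry.reviewers || []) {
      counts[name] = (counts[name] || 0) + 1
    }
  }
  return counts
}

// Sample per_card entries shaped like the objects built in buildTelemetry().
const perCard = [
  { card: 'C-1', status: 'done', reviewers: ['doc-reviewer'] },
  { card: 'C-2', status: 'done', reviewers: ['code-reviewer', 'qa-sentinel'] },
]

console.log(reviewerSpawnCounts(perCard))
// returns { 'doc-reviewer': 1, 'code-reviewer': 1, 'qa-sentinel': 1 }
```

Under the old unconditional `balanced` fan-out these two cards would have spawned at least four reviewers each, so a tally like this makes the difference directly visible.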